
wolttam, today at 2:46 PM

I started with antirez's DwarfStar[1] on one Spark, and that (~11-14 tok/s generation, ~300-400 tok/s prompt processing) was enough of a taste for me to jump to 2 Sparks, running the native quant of DSv4 Flash.

Now at 40-50 tok/s generation and ~2000 tok/s prefill with a model that I've seen reason through race conditions, trivially pull off any straightforward coding task, and remain coherent at 500k context. And that's with a preview checkpoint of the weights!

I'm excited for the future of local LLMs. There is some buy-in, but apparently not an extreme amount, to get access to models that can stand in for the giants on all but the most challenging and/or hands-off coding tasks.

[1]: https://github.com/antirez/ds4


Replies

binyu, today at 3:00 PM

> Now at 40-50 tok/s generation and ~2000 tok/s

It's not clear how you went from ~11-14 to ~40-50 tok/s. Is it from running the native-quant model and adding a second Spark?

Cheers
