Hacker News

zozbot234, yesterday at 4:29 PM

What kind of hardware are you planning to run this on? As mentioned already, I've been trying to understand how gracefully it might degrade on 64GB of RAM or less (the weights total 80GB at the provided quant) when using SSD offload for the weights. Assuming that works and doesn't just OOM, I'd also like to know whether the tok/s figures might meaningfully improve in that scenario by running multiple sessions in parallel.
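
For a rough sense of the numbers, here is a back-of-the-envelope sketch of how much of the weights would have to stay on SSD. The function name `offload_estimate` and the `reserved_gb` allowance (room for the OS, KV cache, and runtime buffers) are my own assumptions for illustration, not measured figures or part of any inference runtime.

```python
def offload_estimate(weights_gb, ram_gb, reserved_gb=8.0):
    """Split model weights between RAM and SSD.

    reserved_gb is a hypothetical allowance for the OS, KV cache, and
    runtime buffers; adjust it to match the actual machine.
    Returns (resident_gb, streamed_gb, streamed_fraction).
    """
    usable_gb = max(ram_gb - reserved_gb, 0.0)
    resident_gb = min(weights_gb, usable_gb)   # weights that fit in RAM
    streamed_gb = weights_gb - resident_gb     # weights left on SSD
    return resident_gb, streamed_gb, streamed_gb / weights_gb


# 80GB of weights on a 64GB machine, as in the question above.
resident, streamed, fraction = offload_estimate(80.0, 64.0)
print(f"resident: {resident:.0f}GB, on SSD: {streamed:.0f}GB ({fraction:.0%})")
```

Under that 8GB assumption, roughly 30% of the weights would need to be read from SSD on demand. How much that costs in tok/s depends on which weights are actually touched per token, which this sketch does not model.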


Replies

alfiedotwtf, yesterday at 5:19 PM

I've got a 4060 Ti 12GB with 128GB RAM. I was hoping that once I could demonstrate to myself that I could run Deepseek v4 Flash locally (even at really slow speeds), it would then be worth my time and money to get something that runs it at > 20 t/s.

... currently testing out Stepfun 3.5 Flash Q4_K_M as a stopgap (unless it blows my socks off first).
