Hacker News

alfiedotwtf, yesterday at 5:19 PM

I've got a 4060 Ti 12GB with 128GB RAM. I was hoping that once I could demonstrate to myself that I could run DeepSeek v4 Flash locally (even at really slow speeds), it would be worth my time and money to get something that runs it at > 20 t/s.

... currently testing out Stepfun 3.5 Flash Q4_K_M as a stopgap (unless it blows my socks off first).


Replies

zozbot234, yesterday at 6:51 PM

I don't think the DS4 project supports the CPU/GPU split approach you'd need for best performance on that kind of hardware (shared layers on GPU, most experts on CPU). CPU-only inference would work but might be slow.
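To make the split concrete, here is a rough sketch of the placement logic, not anything taken from the DS4 project. It assumes tensor names in a GGUF-like style where routed-expert weights contain `ffn_gate_exps`, `ffn_up_exps` or `ffn_down_exps`; those names, the `plan_placement` helper, and the byte sizes are all hypothetical and only illustrate the idea of keeping routed experts in system RAM and filling VRAM with everything else.

```python
import re

# Assumption: routed-expert tensors are identifiable by name.
# This pattern is illustrative, not a guarantee about any real model file.
ROUTED_EXPERT = re.compile(r"ffn_(gate|up|down)_exps")


def plan_placement(tensor_sizes, vram_budget_bytes):
    """Return {tensor_name: "gpu" | "cpu"} for a CPU/GPU split.

    Routed experts always go to CPU. Every other tensor (attention,
    shared experts, norms) goes to GPU while the budget allows, and
    spills to CPU once the budget would be exceeded.
    """
    placement = {}
    used = 0
    for name, size in tensor_sizes.items():
        if ROUTED_EXPERT.search(name):
            placement[name] = "cpu"
        elif used + size <= vram_budget_bytes:
            placement[name] = "gpu"
            used += size
        else:
            placement[name] = "cpu"
    return placement


if __name__ == "__main__":
    # Made-up sizes, purely to show the shape of the result.
    demo = {
        "blk.0.attn_q.weight": 100,
        "blk.0.ffn_gate_exps.weight": 1000,
        "blk.0.ffn_gate_shexp.weight": 50,
    }
    for name, device in plan_placement(demo, 200).items():
        print(f"{device:3s}  {name}")
```

The point of the split is that the routed experts hold most of the weights but only a few are used per token, so they tolerate slower memory better than the layers that run on every token.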
