
NitpickLawyer • yesterday at 5:53 PM

It really depends. The new "thinking" models usually spend some time reasoning before they write the final answer. If they "think" for 1k tokens at around 17 tokens/s, that's a minute of spinning wheel you're gonna see for each question. Add the prompt processing time on top, plus generation speed that drops as the context grows, and it becomes really slow for longer sessions.
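A rough back-of-envelope sketch of the wait described above, assuming constant prompt-processing (prefill) and generation (decode) speeds. The function name and all example numbers are hypothetical; real speeds vary by hardware and model, and both tend to drop as the context grows, which this sketch ignores.

```python
def seconds_before_answer(prompt_tokens: int, prefill_tps: float,
                          thinking_tokens: int, decode_tps: float) -> float:
    """Estimate the wait before the first answer token appears.

    Hypothetical model: prefill_tps and decode_tps are treated as constant
    tokens-per-second rates; in practice both slow down at longer contexts.
    """
    prefill_seconds = prompt_tokens / prefill_tps      # processing the prompt
    thinking_seconds = thinking_tokens / decode_tps    # generating hidden "thinking"
    return prefill_seconds + thinking_seconds


# Hypothetical example: an 8k-token context at 100 t/s prefill,
# then 1k thinking tokens at 20 t/s decode.
print(seconds_before_answer(8000, 100.0, 1000, 20.0))  # 80 s + 50 s
```

The prefill term is why long sessions hurt even before any thinking starts: it grows with the size of the conversation you resend.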

Replies

mudkipdev • yesterday at 7:12 PM

Reminds me that you can run DeepSeek at 3-4 t/s by streaming the weights from an SSD. That could be viable if you're running something overnight, for example.
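To put "overnight" in numbers, here is a minimal sketch of the token budget at a constant generation speed. The 8-hour window and the function name are hypothetical assumptions for illustration; only the 3-4 t/s range comes from the comment.

```python
def overnight_tokens(tokens_per_second: float, hours: float) -> int:
    """Total tokens generated at a constant rate over a time window (hypothetical helper)."""
    return int(tokens_per_second * hours * 3600)


# Assuming an 8-hour overnight run at the 3-4 t/s quoted above:
for tps in (3.0, 4.0):
    print(tps, overnight_tokens(tps, 8))
```

At that rate an 8-hour run yields roughly 86k to 115k tokens, which is slow for chat but plausible for a batch job.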


