meatmanek • today at 5:37 AM

This model is pretty cool if you don't have a GPU - I think I got around 20 or 30 tokens per second on CPU alone (DDR4 RAM). (I don't remember whether that was with q4 or q8.)

Otherwise, if you have a GPU with more than about 4GB of VRAM, there are better models. Gemma4 and Qwen3.6 (or Qwen3.5, if you need the smaller dense models that haven't been released for 3.6 yet) are good places to start.
