logoalt Hacker News

smcleodyesterday at 12:29 PM3 repliesview on HN

Yeah that 60-150b~ range is such a sweet spot for current 'prosumer' hardware, I'd love to see something like a 120b-a14b or there about.


Replies

tarrudayesterday at 12:35 PM

I have a 128G mac studio and even 397B was a happy surprise to me due to its high quantization resilience.

I've created a 2.54BPW quant that fit on my hardware with 128k context, 20 tps tg and 200tps pp, while maintaining high scores on many benchmarks: https://huggingface.co/tarruda/Qwen3.5-397B-A17B-GGUF/discus...

show 2 replies
KronisLVyesterday at 5:40 PM

There definitely have been some options in the past, cool to see them.

Oddly enough, though, Qwen 3.6 35B A3B and Gemma got some really good reviews, despite being way smaller than any of these ones.

Qwen 3.5, 122B A10B: https://huggingface.co/unsloth/Qwen3.5-122B-A10B-GGUF

Qwen Coder Next, 80B A3B: https://huggingface.co/unsloth/Qwen3-Coder-Next-GGUF

It's kinda weird that DeepSeek V4 Flash is supposed to be 284B A13B, but shows up as 158B in HuggingFace, probably some weird bug: https://huggingface.co/unsloth/DeepSeek-V4-Flash and that's not even just Unsloth but like the official source too https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash (so also doesn't fit the category unless you get a heavily quantized version to run, but cool regardless)

Mistral Medium 3.5 is interesting because it's 128B but dense, so probably too slow for most folks: https://huggingface.co/unsloth/Mistral-Medium-3.5-128B-GGUF

GPT-OSS, 120B A5B: https://huggingface.co/unsloth/gpt-oss-120b-GGUF

gcryesterday at 12:33 PM

What’s the price point for getting into that sweet spot?

I’m on an M1 Max with 32GB VRAM, so I’m looking forward to the 27B or 35B-A3B models. Is dropping $5k for an RTX 6000 or a DGX Spark really the best option?

show 6 replies