root_axis • yesterday at 6:20 AM • 3 replies

Sorry, but you're just seeing what you want to see. The idea that a 31b model is anywhere even in the ballpark of something like Opus 4.5 is just absurd on its face.

Replies

thot_experiment • yesterday at 7:27 AM

False. The absolute capability is irrelevant; with the proper harness, 31b is more than adequate for a very large portion of the tasks I ask AI to do. The metric isn't how good the model is at Erdős problems, it's how reliably it can remove drudgery in my life. It just autonomously reverse engineered a Bluetooth protocol with minimal intervention, and its ability to react to data and ground itself is constantly impressive to me. I do a ton of testing with these models; today I had Gemma answer a physics problem that Opus 4.7 gave up on. With a decent harness and context, the set of tasks where both models are good enough is surprisingly large. The tasks I have that stump Gemma often also stump Opus 4.7.

BoredomIsFun • yesterday at 7:50 AM

It would be true if model providers did not throttle their models. I do not have definitive proof they do, but the rumors are abundant.

creativeSlumber • yesterday at 3:48 PM

I think you are missing the point here. What matters is that, for that user, the local models are good enough for their use case.
