Hacker News

root_axis · yesterday at 6:20 AM

Sorry but you're just seeing what you want to see. The idea that a 31b model is anywhere even in the ballpark of something like Opus 4.5 is just absurd on its face.


Replies

thot_experiment · yesterday at 7:27 AM

False. Absolute capability is irrelevant: with the proper harness, a 31b model is more than adequate for a very large portion of the tasks I ask AI to do. The metric isn't how good the model is at Erdős problems, it's how reliably it can remove drudgery from my life. It just autonomously reverse engineered a Bluetooth protocol with minimal intervention, and its ability to react to data and ground itself constantly impresses me. I do a ton of testing with these models; today I had Gemma answer a physics problem that Opus 4.5 gave up on. With a decent harness and context, the set of tasks where their capabilities are both good enough is surprisingly large. The tasks I have that stump Gemma often also stump Opus 4.5.

BoredomIsFun · yesterday at 7:50 AM

That would be true if model providers did not throttle their models. I do not have definitive proof that they do, but the rumors are abundant.

creativeSlumber · yesterday at 3:48 PM

I think you are missing the point here. What matters is that, for that user, local models are good enough for their use case.