logoalt Hacker News

NitpickLawyertoday at 2:45 PM0 repliesview on HN

> But is the capability difference enough [..]

This is the (m/b)illion dollar question, isn't it? I think there's also a question of what do you think capability is exactly, and how the difference manifests itself.

On the one hand, when something becomes "good enough" that's a clear capability threshold. On the other hand, what's the limit of those capabilities, and equally as important, how does capability reflect on reliability?

We've seen "local models" lately improve on capabilities where they're "good enough" for some tasks. Reliability of solving those tasks is a bit harder to measure/benchmark/test. It'll get better as more people work with those models. But, something I've noticed in the past ~6months is that the frontier models are gaining a lot in both the breadth of capabilities, as well as the reliability of solving those tasks that they're capable of solving. I think this is where scaling (both compute and data) is showing, and where having more compute is simply better (more parallel exploration, more training data output, more broad data, etc).

There's also the problem of benchmarking true capabilities. The popular ones are getting old, and aren't as reliable as they used to be (not even touching on the subject of benchmaxxing, just thinking about their saturation, even with honest intentions).

So the question then becomes what will users prefer? Do you get the best of the best, or the one that's good enough? There might be a market for both, honestly. Not everyone does SotA stuff. And a lot of what people used to do in a company is probably mundane enough that a "good enough" model with "good enough" reliability can probably handle (w/ some supervision ofc).

What I'm more interested in is if things like Thaalas succeed and they get to provide local hardware that runs models "burned in silicon". That would be interesting, because speed and all the advantages of local models are a "quality" on their own. For example, right now I'd pay ~1k$ for an external hdd-sized block that can run a ~32B model that's popular right now, even knowing that it can only run that model. I have no idea if that's feasible or not, if it makes sense from a financial pov. But I'd buy one. And local inference on dedicated chips doesn't need to be "oss only". I'm sure oAI / etc would probably take the risk of licensing one of their -mini / -lite models provided that the risk of the weights leaking is small enough (and it probably is).

> This keeps a ceiling on how much or how fast the frontier labs can raise prices.

I generally agree, but from a different perspective. Up till now we've seen that the 3 labs influence each other's price points. When gpt5 came out at a radically smaller price, the others lowered them as well. Now with opus being SotA for coding, w/ 5.5 close behind, they've raised them back. Google seems to follow slowly. But there's hope that, being 3 top labs + 2 trailing (xAI & Meta), there'll be pressure once again. If any of those trailing labs manage to get to SotA again, the prices will drop once more. Some people say that open source also provides a pressure here, but I'm not yet convinced of this. There's still a question of who'll serve the models, at what scales, etc.