Hacker News

Philip-J-Fry, yesterday at 9:36 AM

I think the difference is that with LLMs, in a lot of cases you do see diminishing returns.

I won't deny that the latest Claude models are fantastic at one-shotting loads of problems. But we have an internal proxy to a load of models running on Vertex AI, and I accidentally started using Opus/Sonnet 4 instead of 4.6. I genuinely didn't notice until I checked my configuration.

AI models will get to the point where, for 99% of problems, something like Gemma is gonna work great for people. Pair it with an on-device agentic harness that lets it open apps and click buttons, and we're done.

I still can't fathom that we're in 2026, in the middle of the AI boom, and I can't ask Gemini to turn on shuffle mode in Spotify. I don't think model intelligence is as much of an issue as people think it is.


Replies

wj, yesterday at 9:52 PM

I'm not sure I understand your last paragraph. Don't the two sentences contradict each other?

dimmke, yesterday at 3:23 PM

100% agree here. For most tasks, the actual practical bottleneck is the harness and its agentic abilities.

It's the biggest thing that stood out to me when using local AI with open-source projects versus Claude's client. The model itself is good enough, I think; Gemma 4 would be fine if it could be used with a harness as capable as Claude's.

Unfortunately, that's gonna stay locked down, especially on mobile and in cars. It needs access to APIs to do that stuff, and not just regular APIs that were built for traditional invocation.

In the same way that websites are getting llms.txt files, I think APIs will also evolve.

Tianning, yesterday at 3:23 PM

Agree on the diminishing returns; the Opus 4.6 anecdote is a good signal.

bawana, yesterday at 10:59 AM

I think security is the issue: AI is good at circumventing it. For example, AI can read paywalled articles that you cannot. Do you really want AI to have ‘free range’?

mewpmewp2, yesterday at 9:54 AM

I mean, to me even the difference between Opus and Sonnet is as clear as night and day, and so is the difference between Opus and the best GPT model. Opus 4.6 just seems much more reliable: when I ask it to do something, that thing actually happens.
