Hacker News

thot_experiment, yesterday at 7:27 AM

False. The absolute capability is irrelevant; with the proper harness, 31B is more than adequate for a very large portion of the tasks I ask AI to do. The metric isn't how good the model is at Erdős problems, it's how reliably it can remove drudgery from my life. It just autonomously reverse engineered a Bluetooth protocol with minimal intervention, and its ability to react to data and ground itself is constantly impressive to me. I do a ton of testing with these models; today I had Gemma answer a physics problem that Opus 4.7 gave up on. With a decent harness and context, the set of tasks where both models are good enough is surprisingly large. The tasks I have that stump Gemma often also stump Opus 4.7.


Replies

diordiderot, yesterday at 9:54 AM

Maybe reaching for an analogy would be helpful here.

Thot_experiment is saying that his 2016 Toyota Prius is a great and reliable car for his daily commute and running errands.

Meanwhile, everyone is screeching about its capability gap with a Lockheed Martin F-35 Lightning.

amelius, yesterday at 7:38 AM

This is like saying that 640 KB is enough for anybody.
