logoalt Hacker News

neilellisyesterday at 5:45 PM1 replyview on HN

It's ahead in raw power but not in function. Like it's got the worlds fast engine but one gear! Trouble is some benchmarks only measure horse power.


Replies

NitpickLawyeryesterday at 5:52 PM

> Trouble is some benchmarks only measure horse power.

IMO it's the other way around. Benchmarks only measure applied horse power on a set plane, with no friction and your elephant is a point sphere. Goog's models have always punched over what benchmarks said, in real world use @ high context. They don't focus on "agentic this" or "specialised that", but the raw models, with good guidance are workhorses. I don't know any other models where you can throw lots of docs at it and get proper context following and data extraction from wherever it's at to where you'd need it.