spacedoutman today at 5:12 PM

This research is useless and nearly all other LLM research is too.

GPT 5.2 is the strongest model they tested, and it is nearly 6 months old.

Traditional research cannot keep up.


Replies

acgourley today at 5:31 PM

I disagree; their findings should generalize to the frontier. Even if the latest models can deal with the extra complexity, it stands to reason that they will take more tokens to do less. This could be a useful insight for the next generation of evals.