samuelknight, yesterday at 11:58 PM

I don't know about Mythos, but the chart understates the capability of the current frontier models. The GPT and Claude models available today are capable of web app exploits, C2, and persistence in well under 10M tokens if you build a good harness.

The benchmark might be a good apples-to-apples comparison, but it does not show capability in an absolute sense.