Hacker News

lebovic, today at 7:14 PM

I think the third chart is the most notable: Mythos is the first model to saturate that eval from the UK AISI [1].

Personally, I think we crossed the threshold of meaningfully useful autonomous hacking capabilities with Opus 4.6 [2], mostly because its behavior and persistence make it useful for finding vulnerabilities out of the box [3]. Still, Mythos seems like another step up.

[1]: https://cdn.prod.website-files.com/663bd486c5e4c81588db7a48/...

[2]: https://www.noahlebovic.com/testing-an-autonomous-hacker/

[3]: https://news.ycombinator.com/item?id=46920682