Hacker News

refulgentis yesterday at 6:51 PM

The CTF charts are the less interesting result (the article itself notes: "Even expert-level CTFs only test specific skills in isolation."). Models converging at non-expert level isn't a knock on Mythos; it's the benchmark saturating. Of course GPT-5 matches it there.

The actual result is TLO, and "only 6 more steps" in OP misreads how sequential attack chains work. These aren't independent puzzles. Each step gates the next. Averaging 22 steps vs 16 means Mythos is consistently punching through bottlenecks that completely stop Opus 4.6. More importantly: Mythos completed the full chain 3/10 times. Opus 4.6 completed it 0/10 times. That's not a narrow margin. In any security-relevant framing, "achieves full network takeover" vs "does not achieve full network takeover" is a binary threshold, and exactly one model crossed it. A year ago the best models struggled with beginner CTFs. Now one autonomously replicates what AISI estimates takes human professionals 20 hours. Calling that unimpressive because the margin over second place is single digits is measuring the wrong gap.
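
A toy model makes the gating point concrete. It assumes (these are illustrative assumptions, not the article's or AISI's methodology) that each step succeeds independently with its own probability and that the first failure ends the run. Under that model, step k is completed only if steps 1..k all succeed, so one hard step caps everything after it. The chain length and per-step probabilities in `hypothetical_chain` are made up for illustration.

```python
from itertools import accumulate
from math import prod
from operator import mul


def completion_probabilities(step_probs: list[float]) -> list[float]:
    """P(step k is completed) for each k: the product of success probabilities
    for steps 1..k, since any earlier failure ends the run."""
    return list(accumulate(step_probs, mul))


def expected_steps_completed(step_probs: list[float]) -> float:
    """Expected number of steps completed before the first failure."""
    # Linearity of expectation: sum P(step k completed) over all k.
    return sum(completion_probabilities(step_probs))


def full_chain_probability(step_probs: list[float]) -> float:
    """Probability that every step succeeds in a single run."""
    return prod(step_probs)


def hypothetical_chain(bottleneck_p: float) -> list[float]:
    """Hypothetical 32-step chain: routine steps at 0.99, one bottleneck at step 16."""
    return [0.99] * 15 + [bottleneck_p] + [0.99] * 16


if __name__ == "__main__":
    for name, p in (("stalls at bottleneck", 0.20), ("gets through bottleneck", 0.80)):
        chain = hypothetical_chain(p)
        print(
            f"{name}: expected steps {expected_steps_completed(chain):.1f}, "
            f"full chain {full_chain_probability(chain):.1%}"
        )
```

Because every later step is conditioned on the bottleneck, improving that single step lifts both the average step count and the full-chain rate; the average alone says little about how often a run gets all the way through.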

re: compute, "requires lots of compute" and "scaling is a dead end" are near-opposite claims. If performance is still climbing at 100M tokens with no visible plateau, that's evidence scaling works. Whether it's cheap today is a different question, and not one that ages well. Compute costs fall reliably, so what matters is the capability at a given price point in 18 months, not today.
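
The "price point in 18 months" framing is compounding arithmetic. A rough sketch, assuming per-token prices fall at a steady exponential rate (the premise under debate in this thread, not an established figure): the $10 per million tokens and the halving periods below are placeholders, and `run_cost`/`projected_cost` are hypothetical helpers written for illustration.

```python
def run_cost(tokens: int, price_per_million_tokens: float) -> float:
    """Cost of a run at a flat per-token price."""
    return tokens / 1_000_000 * price_per_million_tokens


def projected_cost(cost_today: float, months_ahead: float, halving_months: float) -> float:
    """Cost of the same workload `months_ahead` months from now, assuming the
    price halves every `halving_months` months (steady exponential decline)."""
    if halving_months <= 0:
        raise ValueError("halving_months must be positive")
    return cost_today * 0.5 ** (months_ahead / halving_months)


if __name__ == "__main__":
    TOKENS = 100_000_000        # the 100M-token scale from the comment
    PRICE_PER_MILLION = 10.0    # placeholder $/1M tokens, not a real quote
    cost_now = run_cost(TOKENS, PRICE_PER_MILLION)
    for halving in (6, 12, 18, 24):  # placeholder halving periods, in months
        later = projected_cost(cost_now, 18, halving)
        print(f"halving every {halving:>2} months: ${cost_now:,.2f} now -> ${later:,.2f} in 18 months")
```

The result is only as good as the halving assumption; if prices stall, the projected cost stays near today's.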


Replies

traceroute66 yesterday at 8:41 PM

> Compute costs fall reliably, so what matters is the capability at a given price point in 18 months, not today.

The underlying point still stands, namely that "more compute" as the default answer is not sustainable.

Why?

Because even if we accept the unlikely dream that GPU prices will magically take a nose-dive, you still need somewhere to put all those servers stuffed with GPUs.

That means datacentres.

And "more datacentres" is absolutely not sustainable.

The cooling needs, the power needs, the land needs... none of it is remotely sustainable.

thepasch yesterday at 7:07 PM

Thanks for that context, this is valuable info I was missing and makes it read differently for sure.
