Hacker News

admax88qqq today at 8:27 PM

> beats Claude in our Cyber Benchmarks

Beats which Claude model? Whenever a "benchmark" doesn't put precise model version numbers in its headline, I am immediately skeptical. Either they don't know the difference (bad) or they are benchmarking against weaker models (misleading, also bad).

It's like when studies say "AI is bad at X" and they used GPT-3.5 in current year.


Replies

InsideOutSanta today at 8:40 PM

They say "Claude Opus 4.8" in the first paragraph.

ls612 today at 8:33 PM

Opus 4.8, according to TFA. Whether or not the safety guardrails were responsible for the difference is an open question. But for a dev who wants to secure their software and doesn't work at one of the blessed Glasswing companies, it doesn't really matter why; what matters is which tool is the best one you actually have access to.