admax88qqq • today at 8:27 PM • 2 replies

> beats Claude in our Cyber Benchmarks

Beats which Claude model? Whenever a "benchmark" doesn't put precise model numbers in its headline, I am immediately skeptical. Either they don't know the difference (bad) or they are benchmarking against weaker models (misleading, also bad).

It's like when studies say "AI is bad at X" and they used GPT-3.5 in current year.

Replies

InsideOutSanta • today at 8:40 PM

They say "Claude Opus 4.8" in the first paragraph.

ls612 • today at 8:33 PM

Opus 4.8, according to TFA. Whether or not the safety guardrails were responsible for the difference is an open question, but for a dev who wants to secure their software and doesn’t work at one of the blessed Glasswing companies, it doesn’t really matter why. What matters is what the best tool you actually have is.
