Hacker News

danslo today at 8:05 PM (5 replies)

It reads like an ad.

Secondly these are "just" IDORs, arguably the easiest class of vulnerabilities.

Thirdly it compares to GPT 5.5 and Opus 4.8.

No, we don't have Mythos at home.


Replies

vlian2088 today at 8:16 PM

>Thirdly it compares to GPT 5.5

Mythos is <10% ahead of GPT 5.5 on all benchmarks, which it gains by being several times the size of Opus. Had it been economical to provide, it would've been released to the public on day one instead of the marketing circus those effective altruism clowns put on. Admitting that it costs >1000% to run inference on a <10% better model would've been very damning.

InsideOutSanta today at 8:21 PM

In my experience, GLM 5.2 is extremely good at finding vulnerabilities, and more importantly, unlike Opus, I've never seen it refuse a command. It genuinely is a very strong model for finding and fixing vulnerabilities.

NitpickLawyer today at 9:12 PM

> Thirdly it compares to GPT 5.5 and Opus 4.8.

> No, we don't have Mythos at home.

That's still useful. To paraphrase the kids these days, GLM5.2 is in the room with us, today. Mythos is not. And for us in the EU, it's even more complicated, as Mythos might be with us in the room one day, and go poof the next day, on the whims of political entities that we have 0 control over.

Knowing where open, accessible, local models are is important. We know they're behind. But there comes a time when "good enough" is useful. Even if they're "just IDORs" today, and even if they're behind SotA today.

As someone else said above, GLM5.2 (and other models in the same tier like kimi, dsv4, etc) is / are slowly becoming "good enough" to assist in automated repo preparation work (download, install, test, edit, re-test, etc). And that translates into RL traces ready to be trained into the next generations. That might be more important than being x% behind on benchmarks.

sanid today at 9:12 PM

Technically we don't have Mythos at all? You guys have access. This tells me we have Opus at home (open weights).

jimbob45 today at 9:09 PM

Yeah, they straight up say that their criteria are narrow and primarily important for their specific use case. Never let rationality cause your pitchfork to be cast away though!