Hacker News

GLM-5.2 is a step change for open agents

339 points by vantareed last Tuesday at 3:23 AM | 194 comments

Comments

jerojero last Tuesday at 11:49 AM

Open weight models from Chinese labs tend to be significantly cheaper.

I think they're absolutely needed. I can't afford 200 USD a month for personal use of coding AI, and I don't think such prices are reasonable for most of the world economy anyway. Not to mention US firms might be giving their employees a lot more than that.

It's increasingly feeling, to me, that there's a gap building up between haves and have-nots. But then we get news of these open weight models that are reasonably priced in inference with reasonable capabilities. Yes, they take maybe 6-9 months to get there, but tbh that's not a bad trade-off at all.

guybedo today at 12:12 AM

GLM-5.2 has been a step change in how fast I can burn through tokens.

I subscribed to their max plan to try it out. It counted 700M tokens against me and drained my weekly quota in under 2 days.

Quota just reset less than 24h ago and I'm already at >60% of my weekly quota.

For reference, the kind of work I did would have used somewhere between 3% and 5% of Codex max or Claude max.

The model is good, the plan is a scam.

christophilus today at 12:03 AM

I've been working with Deepseek V4 Flash (with opencode as the harness). It's been almost indistinguishable from Codex / Claude Code for me. I'm sure I'll run into problems when I get to a stickier ticket to tackle. But so far, it's been quite good, and I find it writes straightforward code.

I do think the Chinese models are good enough for an 80/20 rule use case.

aunty_helen yesterday at 10:53 PM

I signed up to a z.ai max account, $144. Hardly been able to use it as it 429s on most requests. They’re also refusing to refund me.

timcobb yesterday at 10:44 PM

Can people share their GLM and open model setups in general please? What provider do you use? Why do you trust it with serving full quality? What harness do you use? Why do you trust it not to have malware (most harnesses are TS apps)? I am just trying GLM 5.1 from Nvidia build in opencode and would love to hear how you all do it, thanks.

sibellavia today at 6:44 AM

While I agree with the post in its entirety, I think it would have been worth mentioning DeepSeek V4 Flash as well, which, in my view, had already reached a sufficient, if not high, level of agentic coding before GLM 5.2 (see DwarfStar).

ramon156 today at 7:51 AM

I know very little about the current state of replaceability of Opus, but I do sometimes imagine a reality where Opus has been rebuilt as an open model. What plan does Anthropic have when it does happen?

Will they still rent out their own model, or will they support the open model and become a resource provider? Will they be able to repay the billions of dollars?

This is probably the first question I would ask someone from Anthropic, if I ever meet one.

fraywing today at 12:00 AM

It feels like the gap is closing from an intelligence perspective. Or at least doing some kind of log flattening.

Been playing with GLM 5.2 in different contexts. It's less good if you don't max out thinking, but at xhigh it's been able to solve most problems I was throwing at Opus in about the same amount of time (via OpenRouter).

Wild time to be alive.

nullbio today at 6:55 AM

The idea of an open-weight Mythos model is not scary at all. This space is moving so quickly that it'll be looked at in 1-2 years as child's play.

mlmonkey today at 12:30 AM

Here are the numbers from their bar chart:

    1. SWE-bench Pro
    Model            Score (%)
    GLM-5.2          62.1
    GLM-5.1          58.4
    Claude Opus 4.8  69.2
    GPT-5.5          58.6
    Gemini 3.1 Pro   54.2

    2. Terminal-Bench 2.1
    Model            Score (%)
    GLM-5.2          81.0
    GLM-5.1          63.5
    Claude Opus 4.8  85.0
    GPT-5.5          84.0
    Gemini 3.1 Pro   74.0

    3. NL2Repo
    Model            Score (%)
    GLM-5.2          48.9
    GLM-5.1          42.7
    Claude Opus 4.8  69.7
    GPT-5.5          50.7
    Gemini 3.1 Pro   33.4

    4. DeepSWE
    Model            Score (%)
    GLM-5.2          46.2
    GLM-5.1          18.0
    Claude Opus 4.8  58.0
    GPT-5.5          70.0
    Gemini 3.1 Pro   10.0

    5. ProgramBench
    Model            Score (%)
    GLM-5.2          63.7
    GLM-5.1          50.9
    Claude Opus 4.8  71.9
    GPT-5.5          70.8
    Gemini 3.1 Pro   39.5

    6. MCP-Atlas
    Model            Score (%)
    GLM-5.2          77.0
    GLM-5.1          71.8
    Claude Opus 4.8  77.8
    GPT-5.5          75.3
    Gemini 3.1 Pro   69.2

    7. Tool-Decathlon
    Model            Score (%)
    GLM-5.2          48.2
    GLM-5.1          40.7
    Claude Opus 4.8  59.9
    GPT-5.5          55.6
    Gemini 3.1 Pro   48.8

    8. Humanity's Last Exam
    Model            Base Score (%)  Score w/ Tools (%)
    GLM-5.2          40.5            54.7
    GLM-5.1          31.0            52.3
    Claude Opus 4.8  49.8            57.9
    GPT-5.5          41.4            52.2
    Gemini 3.1 Pro   45.0            51.4
Seems to be handily beating Gemini 3.1 Pro. What _is_ Google DeepMind doing (other than bleeding talent to A\)?

neosat yesterday at 11:51 PM

I've been using GLM 5.2 recently (company hosted, for non-coding tasks) and it's been strong and reliable. There are areas where GPT 5.5 and Opus 4.x still feel marginally better, but only marginally. For most tasks, if GLM 5.2 is the only model I have to use, I'm productive and happy. This was not true before GLM 5.2. No doubt in my mind that the gap is closing quickly, and for most tasks that are not very specialized, open models will be usably on par with flagship closed models and have an edge factoring in cost.

For coding I still use 5.5 w/ Codex and prefer that to other models + harness combinations.

GL26 today at 2:06 PM

If someone has a tutorial on how to run GLM-5.2 from a Raspberry Pi 5 (AI HAT), I want it!

themgt yesterday at 9:39 PM

I just tested GLM 5.2 out via Z.ai in pi for a little one-off project that was already scoped. It actually did a relatively decent job starting out, and figured important things out from context.

But the reasoning traces became increasingly hilarious, with it getting confused and going in loops, doubting itself. I began to feel almost sad; it was like listening to the internal monologue of someone with an anxiety disorder.

It made pretty good progress but wound up going in a lot of goofy loops and doing things a bit "off" from standards I'd hoped it would infer, and finally started going a bit nuts, "This is very confusing.", "OH WAIT", seemingly hallucinating a whole side-quest that didn't make sense and looking at making internal system changes to try to achieve its (now very confused) goal when I pulled the plug.

Without seeing the reasoning traces from Claude/GPT it's hard to really know, but it definitely didn't feel like the same quality of reasoning, even if dogged persistence does wind up actually working eventually.

melodyogonna today at 10:33 AM

American AI labs really need to start releasing good open-weight models.

seany today at 12:19 AM

What's the current best for ablation? Specifically chemistry and red-team/netsec?

NovaCode37 today at 8:27 AM

Honestly, GLM is staying quite close to Claude, but it can save tons of tokens compared with the Anthropic model.

yogthos today at 3:10 AM

It's by far the most competent open model I've tried yet. It's a bit slower than Claude, but in terms of coding capability it seems to get comparable results at least for the work I'm doing.

newaccountman2 today at 12:35 AM

5.1 and Qwen 3.6 are great too IMO

dools today at 12:16 AM

Is z.ai

Is 2 better than x.ai

citizenpaul yesterday at 10:08 PM

I've been using glm5 since its release and still prefer it to glm5.1 and, so far, to glm5.2.

Perhaps it is just my harness and workflow, but the older model still seems to work better. Also, the token cost is significantly lower. I rarely spend more than $20 a week with a $50 cap. Not even half of Claude's ambiguous minimum $200 a month plan.

nubg today at 1:46 PM

A question I always have is, how do the AI labs safeguard against leaks of their models? Training a cutting edge model basically costs a minimum of hundreds of millions of dollars. And it's all contained within a file. Okay, that file might be 500GB large, but it's still just one blob that is worth almost a billion dollars. And they need to train new models every few weeks, have lots of people with access to it to debug it, run inference etc. I wonder when we will see the first leaks? Imagine if e.g. Opus 4.8 got leaked. Wouldn't that bankrupt Anthropic?

alfiedotwtf today at 9:29 AM

Once open Chinese models look like they’re about to overtake closed US models, watch the US government push imperialism hidden behind increasingly hyperbolic national security concerns.

At the end of the day, open weights should be seen as nothing more than information (just numbers, after all), and so organisations like the EFF should sue over any restriction as a violation of the 1st Amendment.

Balinares last Tuesday at 11:45 AM

I can't help wondering what kind of models we'll see coming out of China once it gets its own chip fabs up and running. Right now it sounds like the US's export ban is not slowing them down a whole lot.
