GLM-5.2 is a step change for open agents

339 points • by vantareed • last Tuesday at 3:23 AM • 194 comments

Comments

Open weight models from Chinese labs tend to be significantly cheaper.

I think they're absolutely needed. I can't afford 200 USD a month for personal use of coding AI, and I don't think such prices are reasonable for most of the world economy anyway. Not to mention US firms might be giving their employees a lot more than that.

It increasingly feels, to me, like there's a gap building up between the haves and have-nots. But then we get news of these open-weight models that are reasonably priced for inference and have reasonable capabilities. Yes, they take maybe 6-9 months to get there, but tbh, that's not a bad trade-off at all.

guybedo • today at 12:12 AM

GLM-5.2 has been a step change in how fast I can burn through tokens.

I subscribed to their max plan to try it out. It counted 700M tokens against me and drained my weekly quota in under 2 days.

The quota reset less than 24h ago and I'm already at >60% of my weekly quota.

For reference, the kind of work I did would have used somewhere between 3% and 5% of Codex max or Claude max.

The model is good, the plan is a scam.

christophilus • today at 12:03 AM

I've been working with Deepseek V4 Flash (with opencode as the harness). It's been almost indistinguishable from Codex / Claude Code for me. I'm sure I'll run into problems when I get to a stickier ticket to tackle. But so far, it's been quite good, and I find it writes straightforward code.

I do think the Chinese models are good enough for an 80/20 rule use case.

aunty_helen • yesterday at 10:53 PM

I signed up to a z.ai max account, $144. Hardly been able to use it as it 429s on most requests. They’re also refusing to refund me.

timcobb • yesterday at 10:44 PM

Can people share their GLM and open model setups in general, please? What provider do you use? Why do you trust it to serve the model at full quality? What harness do you use? Why do you trust it not to have malware (most harnesses are TS apps)? I am just trying GLM 5.1 from Nvidia build in opencode and would love to hear how you all do it, thanks.

sibellavia • today at 6:44 AM

While I agree with the post in its entirety, I think it would have been worth mentioning DeepSeek V4 Flash as well, which, in my view, had already reached a sufficient, if not high, level of agentic coding before GLM 5.2 (see DwarfStar).

ramon156 • today at 7:51 AM

I know very little about the current state of replaceability of Opus, but I do sometimes imagine a reality where Opus has been rebuilt as an open model. What plan does Anthropic have for when that happens?

Will they still rent out their own model? Will they support the open model and become a resource provider? Will they be able to repay the billions of dollars?

This is probably the first question I would ask someone from Anthropic, if I ever meet one.

fraywing • today at 12:00 AM

It feels like the gap is closing from an intelligence perspective. Or at least doing some kind of log flattening.

Been playing with GLM 5.2 in different contexts. It's less good if you don't max out thinking, but at xhigh it's been able to solve most problems I was throwing at Opus in about the same amount of time (via OpenRouter).

Wild time to be alive.

nullbio • today at 6:55 AM

The idea of an open-weight Mythos model is not scary at all. This space is moving so quickly that it'll be looked at in 1-2 years as child's play.

mlmonkey • today at 12:30 AM

Here are the numbers from their bar chart:

    1. SWE-bench Pro
    Model              Score (%)
    GLM-5.2            62.1
    GLM-5.1            58.4
    Claude Opus 4.8    69.2
    GPT-5.5            58.6
    Gemini 3.1 Pro     54.2

    2. Terminal-Bench 2.1
    Model              Score (%)
    GLM-5.2            81.0
    GLM-5.1            63.5
    Claude Opus 4.8    85.0
    GPT-5.5            84.0
    Gemini 3.1 Pro     74.0

    3. NL2Repo
    Model              Score (%)
    GLM-5.2            48.9
    GLM-5.1            42.7
    Claude Opus 4.8    69.7
    GPT-5.5            50.7
    Gemini 3.1 Pro     33.4

    4. DeepSWE
    Model              Score (%)
    GLM-5.2            46.2
    GLM-5.1            18.0
    Claude Opus 4.8    58.0
    GPT-5.5            70.0
    Gemini 3.1 Pro     10.0

    5. ProgramBench
    Model              Score (%)
    GLM-5.2            63.7
    GLM-5.1            50.9
    Claude Opus 4.8    71.9
    GPT-5.5            70.8
    Gemini 3.1 Pro     39.5

    6. MCP-Atlas
    Model              Score (%)
    GLM-5.2            77.0
    GLM-5.1            71.8
    Claude Opus 4.8    77.8
    GPT-5.5            75.3
    Gemini 3.1 Pro     69.2

    7. Tool-Decathlon
    Model              Score (%)
    GLM-5.2            48.2
    GLM-5.1            40.7
    Claude Opus 4.8    59.9
    GPT-5.5            55.6
    Gemini 3.1 Pro     48.8

    8. Humanity's Last Exam
    Model              Base Score (%)    Score w/ Tools (%)
    GLM-5.2            40.5              54.7
    GLM-5.1            31.0              52.3
    Claude Opus 4.8    49.8              57.9
    GPT-5.5            41.4              52.2
    Gemini 3.1 Pro     45.0              51.4

Seems to be handily beating Gemini 3.1 Pro. What _is_ Google DeepMind doing (other than bleeding talent to A\)?
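
For anyone who wants to check the "handily beating" claim against the numbers above, here is a rough Python sketch that tallies head-to-head results for GLM-5.2. The scores are transcribed from the tables in this comment; the `SCORES` layout and the `tally` helper are names made up for this illustration, and treating the two Humanity's Last Exam columns as separate rows is an assumption of the sketch, not something the chart specifies.

```python
# Scores (percent) transcribed from the comment's tables.
# Column order: GLM-5.2, GLM-5.1, Claude Opus 4.8, GPT-5.5, Gemini 3.1 Pro.
MODELS = ["GLM-5.2", "GLM-5.1", "Claude Opus 4.8", "GPT-5.5", "Gemini 3.1 Pro"]

RAW = {
    "SWE-bench Pro":      [62.1, 58.4, 69.2, 58.6, 54.2],
    "Terminal-Bench 2.1": [81.0, 63.5, 85.0, 84.0, 74.0],
    "NL2Repo":            [48.9, 42.7, 69.7, 50.7, 33.4],
    "DeepSWE":            [46.2, 18.0, 58.0, 70.0, 10.0],
    "ProgramBench":       [63.7, 50.9, 71.9, 70.8, 39.5],
    "MCP-Atlas":          [77.0, 71.8, 77.8, 75.3, 69.2],
    "Tool-Decathlon":     [48.2, 40.7, 59.9, 55.6, 48.8],
    # Assumption: the two HLE columns are counted as separate rows.
    "HLE (base)":         [40.5, 31.0, 49.8, 41.4, 45.0],
    "HLE (w/ tools)":     [54.7, 52.3, 57.9, 52.2, 51.4],
}

# Re-key each row by model name so lookups read naturally.
SCORES = {bench: dict(zip(MODELS, row)) for bench, row in RAW.items()}


def tally(scores, model, rival):
    """Return (wins, losses): benchmarks where `model` scores above / below `rival`."""
    wins = [b for b, row in scores.items() if row[model] > row[rival]]
    losses = [b for b, row in scores.items() if row[model] < row[rival]]
    return wins, losses


if __name__ == "__main__":
    for rival in MODELS[1:]:
        wins, losses = tally(SCORES, "GLM-5.2", rival)
        print(f"GLM-5.2 vs {rival}: {len(wins)} ahead, {len(losses)} behind")
        if losses:
            print("  behind on:", ", ".join(losses))
```

On these numbers GLM-5.2 is ahead of Gemini 3.1 Pro on seven of the nine rows, trailing only on Tool-Decathlon and the HLE base score, while it is behind Claude Opus 4.8 on every row.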

neosat • yesterday at 11:51 PM

I've been using GLM 5.2 recently (company hosted, for non-coding tasks) and it's been strong and reliable. There are areas where GPT 5.5 and Opus 4.x still feel marginally better, but only marginally. For most tasks, if GLM 5.2 is the only model I have to use, I'm productive and happy. This was not true before GLM 5.2. No doubt in my mind that the gap is closing quickly, and for most tasks that are not very specialized, open models will be usably on par with flagship closed models and have an edge once you factor in cost.

For coding I still use 5.5 w/ Codex and prefer that to other models + harness combinations.

GL26 • today at 2:06 PM

If someone has a tutorial on how to run GLM-5.2 from a Raspberry Pi 5 (AI hat), I want it!

themgt • yesterday at 9:39 PM

I just tested GLM 5.2 out via Z.ai in pi for a little one-off project that was already scoped. It actually did a relatively decent job starting out, and figured important things out from context.

But the reasoning traces became increasingly hilarious, with it getting confused, going in loops, and doubting itself. I began to feel almost sad; it was like listening to the internal monologue of someone with an anxiety disorder.

It made pretty good progress but wound up going in a lot of goofy loops and doing things a bit "off" from standards I'd hoped it would infer, and finally started going a bit nuts, "This is very confusing.", "OH WAIT", seemingly hallucinating a whole side-quest that didn't make sense and looking at making internal system changes to try to achieve its (now very confused) goal when I pulled the plug.

Without seeing the reasoning traces from Claude/GPT it's hard to really know, but it definitely didn't feel like the same quality of reasoning, even if dogged persistence does wind up actually working eventually.

melodyogonna • today at 10:33 AM

American AI labs really need to start releasing good open-weight models.

seany • today at 12:19 AM

What's the current best for ablation? Specifically chemistry and red-team/netsec?

NovaCode37 • today at 8:27 AM

Honestly, GLM is staying quite close to Claude, but it can save tons of tokens compared with Anthropic's models.

yogthos • today at 3:10 AM

It's by far the most competent open model I've tried yet. It's a bit slower than Claude, but in terms of coding capability it seems to get comparable results at least for the work I'm doing.

newaccountman2 • today at 12:35 AM

5.1 and Qwen 3.6 are great too IMO

dools • today at 12:16 AM

Is z.ai

Is 2 better than x.ai

citizenpaul • yesterday at 10:08 PM

I've been using GLM-5 since its release and still prefer it to GLM-5.1 and, so far, to GLM-5.2.

Perhaps it is just my harness and workflow, but the older model still seems to work better. Also, the token cost is significantly lower. I rarely spend more than $20 a week, with a $50 cap. That's not even half of Claude's ambiguous minimum $200-a-month plan.
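
A quick way to sanity-check the "not even half" comparison is to convert the weekly figures to monthly ones. The sketch below uses only the dollar amounts quoted in this comment; the 52/12 weeks-per-month conversion and the helper name `weekly_to_monthly` are assumptions made for the illustration.

```python
# Average number of weeks in a month (assumption: 52 weeks spread over 12 months).
WEEKS_PER_MONTH = 52 / 12


def weekly_to_monthly(usd_per_week: float) -> float:
    """Convert a weekly dollar amount to an average monthly amount."""
    return usd_per_week * WEEKS_PER_MONTH


# Figures quoted in the comment.
TYPICAL_WEEKLY_SPEND = 20.0   # "rarely spend more than $20 a week"
WEEKLY_CAP = 50.0             # "$50 cap"
SUBSCRIPTION_MONTHLY = 200.0  # the $200-a-month plan used for comparison

typical_monthly = weekly_to_monthly(TYPICAL_WEEKLY_SPEND)
capped_monthly = weekly_to_monthly(WEEKLY_CAP)

if __name__ == "__main__":
    print(f"typical spend: ${typical_monthly:.2f}/month "
          f"({typical_monthly / SUBSCRIPTION_MONTHLY:.0%} of the subscription)")
    print(f"spend at cap:  ${capped_monthly:.2f}/month "
          f"({capped_monthly / SUBSCRIPTION_MONTHLY:.0%} of the subscription)")
```

Under that conversion the typical spend comes to roughly $87 a month, so the claim holds for normal usage, though hitting the $50 cap every week would land slightly above $200 a month.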

nubg • today at 1:46 PM

A question I always have is: how do the AI labs safeguard against leaks of their models? Training a cutting-edge model basically costs a minimum of hundreds of millions of dollars. And it's all contained within a file. Okay, that file might be 500GB large, but it's still just one blob that is worth almost a billion dollars. And they need to train new models every few weeks and have lots of people with access to them to debug them, run inference, etc. I wonder when we will see the first leaks? Imagine if e.g. Opus 4.8 got leaked. Wouldn't that bankrupt Anthropic?

alfiedotwtf • today at 9:29 AM

Once open Chinese models look like they’re about to overtake closed US models, watch the US government push imperialism hidden behind increasingly hyperbolic national security concerns.

At the end of the day, open weights should be seen as nothing more than information (just numbers, after all), and so organisations like the EFF should sue over any restriction on 1st Amendment grounds.

Balinares • last Tuesday at 11:45 AM

I can't help wondering what kind of models we'll see coming out of China once it gets its own chip fabs up and running. Right now it sounds like the US's export ban is not slowing them down a whole lot.
