GLM 5.2 Max = Opus 4.8 Max in thinking behavior. The thinking chains are very similar, and so is the output token usage.
If you want reasonable token usage, you need to run GLM 5.2 at High. There is little drop in quality from Max to High (for most tasks), and it cuts token usage by 2 to 2.5x. GLM 5.2 Max is really something you only need for complex tasks.
In essence, GLM 5.2 is Opus 4.8's little brother, at a way, WAY cheaper price.
There has been no training on Opus models going on, really, none, I tell you! /sarcasm
Distillation of thinking models is not particularly effective: both "Open"AI and Misanthropic don't show you the real chain of thought, only a severely condensed version of it. Both do everything in their power to combat such outrageous copyright infringement, so the bulk of the unethically scraped data the Chinese have is from several generations ago.
Looking at the scores, this is more of a Gemini 3.5 Flash competitor. Yes, it's cheaper, but the distance to Opus and Fable is as big as their price difference.
With such ridiculously long thinking traces, I'm surprised Max outperforms High. After all, performance falls off a cliff after a certain amount of context, and long thinking traces can fill that up really quickly.
> GLM 5.2 Max = Opus 4.8 Max in thinking behavior
This is insane! I can't wait until technology progresses to the point we can run these things on consumer hardware!