GLM-OCR: Accurate × Fast × Comprehensive

201 points • by ms7892 • last Saturday at 2:15 PM • 58 comments

Comments

There are a bunch of new OCR models.

I’ve also heard very good things about these two in particular:

- LightOnOCR-2-1B: https://huggingface.co/lightonai/LightOnOCR-2-1B

- PaddleOCR-VL-1.5: https://huggingface.co/PaddlePaddle/PaddleOCR-VL-1.5

The OCR leaderboards I’ve seen leave a lot to be desired.

With the rapid release of so many of these models, I wish there were a better way to know which ones are actually the best.

I also feel like most/all of these models don’t handle charts, other than to maybe include a link to a cropped image. It would be nice for the OCR model to also convert charts into markdown tables, but this is obviously challenging.

alaanor • today at 4:31 PM

So many OCR models have been released in the past few months, all of them VLMs, and yet none of them handle Korean well. Every time I try one with a random screenshot (not an A4 document), it just fails at a "simple" task. Funnily enough, Qwen3 8B VL is the model that most often gets it right (although I couldn't get good bounding boxes out of it). Even funnier, whatever runs locally on an iPhone on CPU is insanely good, and the same goes for Google's OCR API. I don't know why we don't get more of the traditional OCR stuff. PaddlePaddle v5 is the closest I could find. At this point, I feel like I might be doing something wrong with those VLMs.

aliljet • today at 3:27 PM

This is actually the thing I really desperately need. I'm routinely analyzing contracts that were faxed to me, scanned with monstrously poor resolution, wet signed, all kinds of shit. The big LLM providers choke on this raw input and I burn up the entire context window for 30 pages of text. Understandable evals of the quality of these OCR systems (which are moving wicked fast) would be helpful...

And here's the kicker. I can't afford mistakes. Missing a single character or misinterpreting it could be catastrophic. 4 units vacant? 10 days to respond? Signature missing? Incredibly critical things. I can't find an eval that gives me confidence around this.

mikae1 • today at 5:34 PM

Text me back when there's a working PDF to EPUB conversion tool. I've been waiting (and searching for one) long enough. :D

EDIT: https://github.com/overcuriousity/pdf2epub looks interesting.

surfacedamage • today at 8:54 PM

This might be a niche question, but does glm-ocr (or other libraries) have the ability to extract/interpret QR code data?

ThrowawayTestr • today at 10:13 PM

What's the current SOTA for Japanese and Korean OCR? BalloonsTranslator has a great workflow but the models are pretty old.

ks2048 • today at 5:01 PM

I've been trying different OCR models on what should be a very simple task: subtitles (simple machine-rendered text). While all models do very well (95+% accuracy), every model I've tried still occasionally makes very obvious mistakes. Maybe it will take a different approach to get the last 1%...
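
For reference, a figure like "95+% accuracy" can be made concrete as character-level accuracy (1 minus the character error rate). A minimal sketch, assuming plain-text ground truth is available for each subtitle line; the names `levenshtein` and `char_accuracy` are illustrative and do not come from any OCR library mentioned in this thread:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_accuracy(reference: str, hypothesis: str) -> float:
    """Character accuracy = 1 - (edit distance / reference length), floored at 0."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    cer = levenshtein(reference, hypothesis) / len(reference)
    return max(0.0, 1.0 - cer)


# Hypothetical example: the OCR output reads the digit "0" as the letter "O".
score = char_accuracy("10 days to respond", "1O days to respond")
```

One wrong character in this 18-character line still scores about 94%, which is why an aggregate accuracy number can hide exactly the kind of obvious mistake described above.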

rdos • today at 4:11 PM

Is it possible for such a small model to outperform Gemini 3, or is this a case of benchmarks not reflecting reality? I would love to be hopeful, but so far an open-source model has never been better than a closed one, even when benchmarks showed that it was.

sinandrei • today at 5:16 PM

Has anyone experimented with using a VLM to detect "marks"? I'm thinking of pen- or pencil-based markings like underlines, circles, checkmarks... Can these models do it?

bugglebeetle • today at 4:50 PM

I tested this pretty extensively, and it has a common failure mode that prevents me from using it: it fails to extract footnotes and similar material from the full text of academic works. For some reason, many of these models are trained in a way that results in these being excluded, despite these document sections often containing important details and context. Both versions of DeepseekOCR have the same problem. Of the others I’ve tested, dot-ocr in layout mode works best (but is slow), followed by datalab’s chandra model (which is larger and has bad license constraints).
