Hacker News

florians, yesterday at 8:12 PM

What I want are precise and tight bounding boxes. Why is this so difficult?


Replies

philipkglass, yesterday at 8:35 PM

The PP-DocLayoutV3 [1] bounding boxes are pretty good in my experience, if you want boxes around individual document headings or paragraphs. If you want boxes around individual words, similar to what's shown in the Interfaze screenshot [2], Apple has a LiveText "token" model that's proprietary but free/bundled with macOS and iOS. There are easy-to-use Python bindings here: https://github.com/straussmaximilian/ocrmac
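For anyone trying the ocrmac route, here is a rough sketch of turning its boxes into pixel rectangles. It assumes, based on my reading of the project's README, that `recognize()` returns `(text, confidence, bbox)` tuples where `bbox` is `[x, y, width, height]`, normalized to 0..1 with the origin at the bottom-left of the image, and that `framework="livetext"` selects the LiveText backend. Check both against the version you install. The names `PixelBox`, `to_pixel_box`, and `word_boxes` are mine, not part of the library.

```python
from typing import NamedTuple, Sequence


class PixelBox(NamedTuple):
    """Axis-aligned box in pixels, origin at the top-left of the image."""
    left: int
    top: int
    right: int
    bottom: int


def to_pixel_box(bbox: Sequence[float], image_width: int, image_height: int) -> PixelBox:
    """Convert a normalized [x, y, w, h] box with a bottom-left origin
    (the assumed ocrmac format) into top-left-origin pixel coordinates."""
    x, y, w, h = bbox
    left = round(x * image_width)
    right = round((x + w) * image_width)
    # Flip the vertical axis: the box's top edge sits at y + h above the bottom.
    top = round((1.0 - y - h) * image_height)
    bottom = round((1.0 - y) * image_height)
    return PixelBox(left, top, right, bottom)


def word_boxes(image_path: str, image_width: int, image_height: int):
    """Hypothetical helper: run ocrmac (macOS only) and return (text, PixelBox) pairs.
    The OCR(...) call and its return shape are assumptions taken from the README."""
    from ocrmac import ocrmac  # imported lazily so the converter works anywhere

    annotations = ocrmac.OCR(image_path, framework="livetext").recognize()
    return [
        (text, to_pixel_box(bbox, image_width, image_height))
        for text, _confidence, bbox in annotations
    ]
```

The converter is plain arithmetic and runs on any platform; only `word_boxes` needs macOS. Rounding to whole pixels can move an edge by up to half a pixel, so keep the floats if you need the tightest possible boxes.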

I presume that some otherwise-great OCR models (like Chandra) have terrible bounding boxes because generating good bounding boxes just wasn't a training priority. A lot of people are using OCR models to bulk-process documents without a lot of care for how the layout is preserved. It matters a lot if (e.g.) you want to be able to update and re-print old documents, but it doesn't matter if you are just transcribing whole documents for indexing/chunking/translation.

[1] https://huggingface.co/PaddlePaddle/PP-DocLayoutV3

[2] https://r2public.jigsawstack.com/interfaze/examples/dense_te...
