Terr_ • today at 5:34 PM • 2 replies

> OCR for construction documents does not work

I'm reminded of the Xerox JBIG2 bug back in ~2013, where certain scan settings could silently replace numbers inside documents, and bad construction-plans were one of the cases that led to it being discovered. [0]

It wasn't overt OCR per se: end users weren't intending to convert pixels to characters or vice versa.

[0] https://www.youtube.com/watch?v=c0O6UXrOZJo&t=6m03s

Replies

TehCorwiz • today at 5:48 PM

If I recall it was an artifact of the compression algo.

Full context and details: https://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres...

hackcasual • today at 8:30 PM

JBIG2 does glyph binning: as you say, not exactly OCR, but similar. Chunks of the image that look sufficiently similar get replaced with a reference to a single instance.
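To make the failure mode concrete, here is a toy Python sketch of that kind of binning. It is not the JBIG2 algorithm or any real encoder: the 3x5 bitmaps, the Hamming-distance similarity test, and the names `bin_glyphs`, `reconstruct`, and `max_diff` are all assumptions made up for illustration. It only shows how a loose "similar enough" threshold can turn a 6 into an 8 without any error being reported.

```python
def hamming(a, b):
    """Count differing pixels between two equally sized bitmaps."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def bin_glyphs(glyphs, max_diff):
    """Store each glyph as an index into a dictionary of representative glyphs.

    A glyph reuses an existing representative when it differs from it by at
    most `max_diff` pixels (hypothetical similarity rule); otherwise it becomes
    a new dictionary entry.
    """
    dictionary = []
    refs = []
    for glyph in glyphs:
        for index, representative in enumerate(dictionary):
            if hamming(glyph, representative) <= max_diff:
                refs.append(index)
                break
        else:
            dictionary.append(glyph)
            refs.append(len(dictionary) - 1)
    return dictionary, refs


def reconstruct(dictionary, refs):
    """Rebuild the glyph sequence the way a decoder would see it."""
    return [dictionary[i] for i in refs]


# Toy 3x5 bitmaps; SIX and EIGHT differ by a single pixel.
EIGHT = ("###", "#.#", "###", "#.#", "###")
SIX = ("###", "#..", "###", "#.#", "###")
ONE = ("..#", "..#", "..#", "..#", "..#")

page = [EIGHT, SIX, ONE, SIX]

# Exact matching only: every distinct glyph survives.
exact_dict, exact_refs = bin_glyphs(page, max_diff=0)
exact_page = reconstruct(exact_dict, exact_refs)

# Loose matching: both 6s are stored as references to the 8.
lossy_dict, lossy_refs = bin_glyphs(page, max_diff=1)
lossy_page = reconstruct(lossy_dict, lossy_refs)
```

With `max_diff=0` the page round-trips unchanged. With `max_diff=1` both 6s come back as crisp, perfectly legible 8s, which is why this kind of substitution is so hard to spot by eye.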
