
Terr_ today at 5:34 PM

> OCR for construction documents does not work

I'm reminded of the Xerox JBIG2 bug back in ~2013, where certain scan settings could silently replace numbers inside documents, and bad construction plans were one of the cases that led to it being discovered. [0]

It wasn't overt OCR per se: end users weren't intending to convert pixels to characters or vice versa.

[0] https://www.youtube.com/watch?v=c0O6UXrOZJo&t=6m03s


Replies

TehCorwiz today at 5:48 PM

If I recall it was an artifact of the compression algo.

Full context and details: https://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres...

hackcasual today at 8:30 PM

JBIG2 does glyph binning: as you say, not exactly OCR, but similar. Chunks of the image that look sufficiently similar get replaced with a reference to a single instance.
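To make the failure mode concrete, here is a toy sketch of the binning idea. It is not the JBIG2 codec or Xerox's implementation: the 3x5 bitmaps, the Hamming-distance comparison, the threshold, and the names `bin_glyphs` and `decode` are all assumptions made up for illustration.

```python
from typing import List, Tuple

Glyph = Tuple[str, ...]  # rows of '#' (ink) and '.' (blank) pixels


def hamming(a: Glyph, b: Glyph) -> int:
    """Count the pixels that differ between two same-sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def bin_glyphs(glyphs: List[Glyph], threshold: int) -> Tuple[List[Glyph], List[int]]:
    """Store each glyph as an index into a dictionary of representatives.

    A glyph within `threshold` differing pixels of an existing
    representative reuses it (hypothetical matching rule).
    """
    dictionary: List[Glyph] = []
    refs: List[int] = []
    for g in glyphs:
        for i, rep in enumerate(dictionary):
            if hamming(g, rep) <= threshold:
                refs.append(i)  # "close enough": reuse the stored instance
                break
        else:
            dictionary.append(g)  # no match: store a new representative
            refs.append(len(dictionary) - 1)
    return dictionary, refs


def decode(dictionary: List[Glyph], refs: List[int]) -> List[Glyph]:
    """Rebuild the page from the dictionary and the references."""
    return [dictionary[i] for i in refs]


SIX = ("###", "#..", "###", "#.#", "###")
EIGHT = ("###", "#.#", "###", "#.#", "###")
ONE = (".#.", "##.", ".#.", ".#.", "###")

# The 6 and the 8 differ by a single pixel, so a threshold of 1 merges them.
dictionary, refs = bin_glyphs([SIX, EIGHT, ONE], threshold=1)
page = decode(dictionary, refs)
print(page[1] == SIX)  # the scanned 8 now decodes as a 6
```

With `threshold=0` only identical bitmaps share an entry and the round trip is exact; any looser threshold trades accuracy for size, and at low resolution a 6 and an 8 can land in the same bin.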
