> OCR for construction documents does not work
I'm reminded of the Xerox JBIG2 bug back in ~2013, where certain scan settings could silently replace numbers inside documents. Construction plans with altered numbers were one of the cases that led to it being discovered. [0]
It wasn't overt OCR per se: end users weren't intending to convert pixels to characters or vice versa.
JBIG2 does glyph binning, which, as you say, is not exactly OCR but is similar: chunks of the image that look sufficiently similar get replaced with a reference to a single instance.
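To make the binning idea concrete, here is a toy sketch in Python. It is not the JBIG2 codec or anything Xerox shipped; the 3x5 bitmaps, the `bin_glyphs` function, and the `max_diff` threshold are all made up for illustration. It only shows how "similar enough" matching can swap one digit for another without any error being reported.

```python
def hamming(a, b):
    """Count the pixels that differ between two equal-sized bitmaps."""
    return sum(x != y for x, y in zip(a, b))


def bin_glyphs(glyphs, max_diff):
    """Replace each glyph with a reference to the first stored glyph
    that differs from it by at most max_diff pixels (hypothetical rule)."""
    dictionary = []  # one stored bitmap per bin
    refs = []        # per-glyph index into the dictionary
    for glyph in glyphs:
        for i, representative in enumerate(dictionary):
            if hamming(glyph, representative) <= max_diff:
                refs.append(i)
                break
        else:
            # No stored glyph is close enough, so start a new bin.
            dictionary.append(glyph)
            refs.append(len(dictionary) - 1)
    return dictionary, refs


# Illustrative 3x5 bitmaps, written row by row. SIX and EIGHT differ in one pixel.
SIX = "111" "100" "111" "101" "111"
EIGHT = "111" "101" "111" "101" "111"
ONE = "010" "110" "010" "010" "111"

page = [SIX, EIGHT, ONE]

# Tolerating a one-pixel difference puts the 8 in the same bin as the 6.
dictionary, refs = bin_glyphs(page, max_diff=1)
decoded = [dictionary[i] for i in refs]
print(refs)               # [0, 0, 1]
print(decoded[1] == SIX)  # True: the 8 now decodes as a 6

# Requiring an exact match keeps every glyph distinct.
exact_dictionary, exact_refs = bin_glyphs(page, max_diff=0)
exact_decoded = [exact_dictionary[i] for i in exact_refs]
print(exact_decoded == page)  # True
```

The decoded page still looks like a clean, sharp scan, which is why this kind of substitution is so hard to notice: the wrong digit is a crisp copy of a real glyph rather than a blurry artifact.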
If I recall correctly, it was an artifact of the compression algorithm.
Full context and details: https://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres...