dperfect • yesterday at 6:06 PM

Letting Claude work a little longer produced this behemoth of a script (which is supposed to be somewhat universal in correcting similar OCR'd PDFs - not yet tested on any others though): https://pastebin.com/PsaFhSP1

which uses this Rust zlib stream fixer: https://pastebin.com/iy69HWXC

and gives the best output I've seen it produce: https://imgur.com/itYWblh

This is using the same OCR'd text posted by commenter Joe.

Replies

daveguy • yesterday at 6:58 PM

> which is supposed to be somewhat universal in correcting similar OCR'd PDFs

Xerox would like a word.

https://news.ycombinator.com/item?id=29223815

Point being, "correcting" to "correct looking" may be worse than just accepting errors. A raw OCR error often shows up as a nonsense word, which a human spots immediately. "Correcting" OCR can produce plausible but wrong results that are much harder for the human in the loop to identify.
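
To make that concrete, here is a minimal sketch of the failure mode. It is a toy nearest-word corrector, not the script linked above; `edit_distance`, `autocorrect`, and `VOCABULARY` are hypothetical names, and the example assumes the true source word was "cot", which happens to be missing from the vocabulary.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute
        prev = curr
    return prev[-1]


def autocorrect(token: str, vocabulary: list[str], max_distance: int = 2) -> str:
    """Replace token with the closest vocabulary word, if one is close enough.

    Ties go to the word that appears first in the vocabulary.
    """
    if token in vocabulary:
        return token
    best = min(vocabulary, key=lambda word: edit_distance(token, word))
    return best if edit_distance(token, best) <= max_distance else token


# Hypothetical toy vocabulary; note that the true word "cot" is not in it.
VOCABULARY = ["cat", "coat", "cut"]

raw = "c0t"                            # OCR misread "o" as "0": visibly broken
fixed = autocorrect(raw, VOCABULARY)   # snaps to a real word that is not the original

print(raw)    # c0t
print(fixed)  # cat
```

A reader who sees "c0t" knows something went wrong and can probably guess the original. A reader who sees "cat" has no reason to check.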
