Hacker News

k310 today at 3:33 PM

This may be outside your plan, but I really could use a PDF editor that makes Internet Archive book scans more readable.

Apparently, the scanner(s) adopt some compromise setting that renders halftones OK, but gives all text a "dishwater gray" background.

If there are few pictures, I run the PDF through a Quartz filter in Preview to threshold the text, then use the "contact sheet" view in Preview.app to merge back the graphics pages from an unthresholded copy. This is slow and tedious.
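To show why the graphics pages have to be merged back from an untouched copy, here is a minimal sketch of what a global threshold does. It assumes the page is already a 2-D list of 0 to 255 gray values, and the cutoff of 128 is an arbitrary illustrative choice. It is not the Quartz filter's actual implementation. Every pixel at or above the cutoff turns white, so the gray background disappears, but so does any halftone detail.

```python
def threshold(page, cutoff=128):
    """Return a black-and-white copy of a grayscale page.

    page: list of rows, each a list of ints in 0..255 (0 = black).
    Pixels at or above `cutoff` become white (255); the rest become black (0).
    The cutoff value is an illustrative assumption, not a tuned setting.
    """
    return [[255 if px >= cutoff else 0 for px in row] for row in page]


# A tiny "page": dishwater-gray background (170) with dark text strokes (40).
page = [
    [170, 170, 40, 170],
    [170, 40, 40, 170],
    [170, 170, 40, 170],
]
print(threshold(page))
# → [[255, 255, 0, 255], [255, 0, 0, 255], [255, 255, 0, 255]]
```

A halftone region run through the same function loses every mid-tone, which is the trade-off the manual merge works around.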

Of course, computers are "smart," or so they tell me, and should be able to distinguish a picture from a block of text on the same page and render each one appropriately.
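One way a program might make that call is sketched below under stated assumptions. It splits the page into tiles, guesses whether each tile is text or a picture, and thresholds only the text tiles. The `is_picture` heuristic (the share of pixels falling outside the two most common brightness bins) and all parameter values are hypothetical choices for illustration. They do not come from any existing tool. Real scans would need much larger tiles and some tuning. The tiny `tile_size=2` only keeps the example readable.

```python
from collections import Counter


def is_picture(tile, bin_width=32, max_outside=0.25):
    """Hypothetical heuristic: text on a flat background is roughly two-toned,
    so most pixels land in the two most common brightness bins. A photo or
    halftone spreads across many bins, leaving a large share outside them."""
    pixels = [px for row in tile for px in row]
    bins = Counter(px // bin_width for px in pixels)
    top_two = sum(count for _, count in bins.most_common(2))
    outside = 1 - top_two / len(pixels)
    return outside > max_outside


def clean_page(page, tile_size=2, cutoff=128):
    """Threshold text tiles to pure black/white; copy picture tiles unchanged."""
    if not page:
        return []
    height, width = len(page), len(page[0])
    out = [row[:] for row in page]
    for top in range(0, height, tile_size):
        for left in range(0, width, tile_size):
            rows = range(top, min(top + tile_size, height))
            cols = range(left, min(left + tile_size, width))
            tile = [[page[r][c] for c in cols] for r in rows]
            if is_picture(tile):
                continue  # leave halftones/photos exactly as scanned
            for r in rows:
                for c in cols:
                    out[r][c] = 255 if page[r][c] >= cutoff else 0
    return out


# Left 2x2 tile: gray background plus ink. Right 2x2 tile: a gradient "photo".
page = [
    [170, 40, 30, 90],
    [170, 170, 150, 210],
]
print(clean_page(page))
# → [[255, 0, 30, 90], [255, 255, 150, 210]]
```

The text tile comes out clean black on white, and the gradient tile passes through untouched. This is the per-region behavior the comment asks for, though a real classifier would need to be far more robust than this two-bin test.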

I used to edit really important documents (like ads for pioneering computer products and gizmos like GENIAC)[0] pretty much by hand: splitting the PDF into separate images if needed, editing them by hand or in batches, then merging them again.

I could use ImageMagick ... but it's not adaptive, as described above.

[0] GENIAC ad sample (imgbb.com): https://i.ibb.co/67zpBDgh/OIP-2472099845.jpg


Replies

fn-mote today at 6:26 PM

Sounds like a job for ScanTailor [1][2]. I'm not aware of an actively developed alternative. The version on my system comes from ScanTailor Advanced [3].

[1]: https://scantailor.org/
[2]: https://github.com/scantailor/scantailor
[3]: https://github.com/4lex4/scantailor-advanced

philjohnson today at 4:34 PM

Neat idea. Basically an "Enhance Readability" button. I'm looking into how it can be done, will report back.