Hacker News

nestorD 05/03/2025

Nice! I will give it a try later today.

For people who want a non-web-based alternative, these days I use Xournal++ (https://xournalpp.github.io/) to do that kind of editing locally.

What I am still looking for is a good way to clean up scanned PDFs: split double pages, clean up the text and make sure it is black and white, fix deformations and margins, crop and possibly rotate pages, and compress the end result.


Replies

fiddlerwoaroof 05/03/2025

For splitting double pages, this is the best tool I’ve seen: https://github.com/mbaeuerle/Briss-2.0

For the other issues, I haven’t found any single good tool, but I’ve stitched together things like unpaper, ghostscript and deskew ( https://github.com/galfar/deskew ).
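Of the steps in a chain like that, the final compression pass is the easiest to script. Here is a minimal Python sketch, assuming Ghostscript (`gs`) is installed and on PATH. The helper names `gs_compress_cmd` and `compress_pdf` are hypothetical, and `/ebook` is just one reasonable default among Ghostscript's `-dPDFSETTINGS` presets:

```python
import shutil
import subprocess

# Ghostscript's built-in quality/size presets (/screen is the smallest).
PRESETS = {"/screen", "/ebook", "/printer", "/prepress", "/default"}


def gs_compress_cmd(src: str, dst: str, preset: str = "/ebook") -> list[str]:
    """Build (but do not run) a Ghostscript command that rewrites src as a smaller PDF at dst."""
    if preset not in PRESETS:
        raise ValueError(f"unknown PDFSETTINGS preset: {preset}")
    return [
        "gs",
        "-sDEVICE=pdfwrite",                 # re-render through Ghostscript's PDF writer
        "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS={preset}",           # controls image downsampling and quality
        "-dNOPAUSE", "-dQUIET", "-dBATCH",   # run non-interactively and exit when done
        f"-sOutputFile={dst}",
        src,
    ]


def compress_pdf(src: str, dst: str, preset: str = "/ebook") -> None:
    """Run the command; raises if gs is missing or exits with an error."""
    if shutil.which("gs") is None:
        raise FileNotFoundError("Ghostscript (gs) not found on PATH")
    subprocess.run(gs_compress_cmd(src, dst, preset), check=True)
```

For example, `compress_pdf("cleaned.pdf", "cleaned-small.pdf", "/screen")` trades image quality for the smallest file. Splitting, despeckling and deskewing would still happen earlier in the chain with unpaper and deskew.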

Also, if you need OCR, hocr-tools and Google’s Document AI OCR API have worked really well for me (I tried Gemini, but ran into issues with big documents).