fiddlerwoaroof, yesterday at 9:06 PM

For splitting double pages, this is the best tool I’ve seen: https://github.com/mbaeuerle/Briss-2.0

For the other issues, I haven’t found a single good tool, but I’ve stitched together things like unpaper, Ghostscript, and deskew (https://github.com/galfar/deskew).
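To give a rough idea of what "stitching together" can look like, here is a minimal sketch in Python that only builds the command lines for a render, clean, deskew chain. The function name, the file naming scheme, and the order of the steps are my own assumptions for illustration, and the flags are worth checking against each tool's help output for the version you have installed.

```python
from pathlib import Path


def build_commands(pdf: str, workdir: str, pages: int, dpi: int = 300) -> list[list[str]]:
    """Return argv lists for a render -> unpaper -> deskew chain.

    Hypothetical helper: nothing is executed here, so the commands can be
    inspected (or adjusted) before being handed to subprocess.run.
    """
    work = Path(workdir)

    # Step 1: Ghostscript renders every PDF page to a grayscale PGM file.
    # %04d is expanded by Ghostscript itself, starting at page 1.
    cmds = [[
        "gs", "-dNOPAUSE", "-dBATCH", "-dQUIET",
        "-sDEVICE=pgmraw", f"-r{dpi}",
        f"-sOutputFile={work / 'raw-%04d.pgm'}",
        pdf,
    ]]

    for n in range(1, pages + 1):
        raw = work / f"raw-{n:04d}.pgm"
        clean = work / f"clean-{n:04d}.pgm"
        final = work / f"final-{n:04d}.png"

        # Step 2: unpaper with its default cleanup filters (assumed sufficient here).
        cmds.append(["unpaper", str(raw), str(clean)])

        # Step 3: deskew writes the straightened page to the file given by -o.
        cmds.append(["deskew", "-o", str(final), str(clean)])

    return cmds


if __name__ == "__main__":
    # Print the commands instead of running them; "scan.pdf" is a placeholder.
    for argv in build_commands("scan.pdf", "out", pages=2):
        print(" ".join(argv))
```

To actually run the chain, each list can be passed to `subprocess.run(argv, check=True)` once the working directory exists and the three tools are on the PATH.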

Also, if you need OCR, hocr-tools and Google’s Document AI OCR API have worked really well for me (I tried Gemini, but you run into issues with big documents).