logoalt Hacker News

kevin_thibedeaulast Thursday at 11:53 PM1 replyview on HN

pdftoppm and Ghostscript (invoked via Imagemagick) re-rasterize full pages to generate their output. That's why it was slow. Even worse with a Q16 build of Imagemagick. Better to extract the scanned page images directly with pdfimages or mutool.

Followup: pdfimages is 13x faster than pdftoppm


Replies

masfuerteyesterday at 3:04 PM

This. Not only is it faster, the images are likely to be of better quality. If you rasterize the pages then the images will be scaled, unless you get very lucky.