Hacker News

throwaw12, today at 9:30 AM

> For a RAG project for a client with a lot of PDFs and Powerpoints with images, I used ColPali a year ago

How was the accuracy compared to pre-parsing the image and doing search in the text?


Replies

vinzenzu, today at 4:10 PM

Leaps and bounds better! I don't think I ever benchmarked it, though.

But my experience was that it could find small details in PDFs, in technical diagrams, and these were really not captured well at all with OCR.

In general, I think OCR output should be used more as an add-on for retrieval, not given to the generation model itself. It's similar to retrieving based on a text description and then giving the generation model the image.
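
A rough sketch of that pattern, in case it helps: the OCR text is only used to find the right page, and what gets handed to the generation model is the page image. Everything here is an assumption for illustration (the `Page` class, the file names, the simple word-overlap scoring); a real setup would more likely use embeddings or a retriever such as ColPali, and this is not the code from the client project.

```python
from dataclasses import dataclass
import re


@dataclass
class Page:
    image_path: str  # hypothetical path to the rendered page image
    ocr_text: str    # OCR output, used only as a retrieval signal


def tokenize(text: str) -> set[str]:
    # Lowercase and split into alphanumeric word tokens.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_images(query: str, pages: list[Page], top_k: int = 1) -> list[str]:
    """Rank pages by word overlap between the query and the OCR text,
    then return image paths (not the OCR text) for the generation model."""
    q = tokenize(query)
    scored = [(len(q & tokenize(p.ocr_text)), p) for p in pages]
    # Drop pages with no overlap so unrelated queries return nothing.
    scored = [(score, page) for score, page in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [page.image_path for _, page in scored[:top_k]]


# Hypothetical example data.
pages = [
    Page("manual_p1.png", "pump assembly diagram valve torque 12 Nm"),
    Page("manual_p2.png", "warranty terms and conditions"),
]
hits = retrieve_images("valve torque", pages)
```

The returned paths would then be loaded and passed as image inputs to whatever vision-capable generation model is in use, so the model sees the diagram itself rather than a lossy transcription of it.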