Hacker News

cpursley yesterday at 11:51 AM

How are you prepping the PDF data before shoving it into Qwen?


Replies

Alifatisk yesterday at 12:05 PM

I just compress the file as much as possible without losing quality. I didn't even know there were other ways to prep it.

I do sometimes chop the PDF up into smaller PDFs, one per chapter.

navbaker yesterday at 12:18 PM

Not OP, but we use the docling library to extract the text and convert it to Markdown before storing it for use with an LLM.
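A minimal sketch of that kind of pipeline in Python, assuming docling's quickstart API (`DocumentConverter().convert(...)` followed by `result.document.export_to_markdown()`). The `split_markdown_by_heading` helper is hypothetical: it applies the per-chapter chopping mentioned in the thread to the Markdown instead of the PDF, and it assumes chapters come out as `##` headings in the exported text, which will depend on the document.

```python
import re


def pdf_to_markdown(pdf_path: str) -> str:
    """Convert a PDF to Markdown with docling (pip install docling)."""
    # Imported lazily so the splitting helper below works without docling installed.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(pdf_path)
    return result.document.export_to_markdown()


def split_markdown_by_heading(markdown: str, level: int = 2) -> list[str]:
    """Hypothetical helper: split Markdown into one chunk per heading of `level`.

    Any text before the first matching heading becomes its own chunk.
    Empty chunks are dropped.
    """
    # Match lines starting with exactly `level` '#' characters followed by a space,
    # so "## " does not match "### " (deeper headings stay inside their chapter).
    pattern = re.compile(rf"^{'#' * level} ", re.MULTILINE)
    starts = [m.start() for m in pattern.finditer(markdown)]
    # Make sure any preamble before the first heading is kept as a chunk.
    if not starts or starts[0] != 0:
        starts.insert(0, 0)
    bounds = starts + [len(markdown)]
    chunks = [markdown[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    return [c for c in chunks if c]
```

Usage would be along the lines of `chapters = split_markdown_by_heading(pdf_to_markdown("book.pdf"))`, where `book.pdf` is a placeholder. Splitting after conversion keeps each chunk as plain Markdown, so chapters can be stored and fed to the model individually.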