Hacker News

nradov, today at 12:24 AM

It's so horrible that in 2026 people are still publishing important data and specifications in a format like PDF that's difficult for LLMs to consume. We need to drag them kicking and screaming to HTML or Markdown. Heck, even Microsoft Word DOCX is superior for reliable parsing and content extraction.


Replies

dannyw, today at 8:30 AM

Good luck. Getting rid of PDFs is going to be as hard as migrating away from JPEG everywhere.