marcus_holmes • today at 2:49 AM

I spent years in the early 2000s trying to get a computer to read unstructured PDFs and TIFF images (mainly invoices, either scanned or electronic). We had limited success: in the end, we always had to get a human to look at them.

Earlier this year we implemented the same thing in about three days, just by feeding the files to LLMs. And it's good enough that we don't need a human to check the results.

I get that this isn't a "Computer Science breakthrough" in the sense you mean, but solving it used to involve a lot of hard CS, and now it doesn't.