Hacker News

dperfect, yesterday at 1:26 AM

Nerdsnipe confirmed :)

Claude Opus came up with this script:

https://pastebin.com/ntE50PkZ

It produces a somewhat-readable PDF (first page at least) with this text output:

https://pastebin.com/SADsJZHd

(I used the cleaned output at https://pastebin.com/UXRAJdKJ mentioned in a comment by Joe on the blog page)


Replies

pests, yesterday at 2:53 AM

So it was a public event attended by 450 people:

https://www.mountsinai.org/about/newsroom/2012/dubin-breast-...

https://www.businessinsider.com/dubin-breast-center-benefit-...

Even names match up, but oddly the date is different.

dperfect, yesterday at 6:06 PM

Letting Claude work a little longer produced this behemoth of a script (which is supposed to be somewhat universal in correcting similar OCR'd PDFs, though I haven't tested it on any others yet): https://pastebin.com/PsaFhSP1

which uses this Rust zlib stream fixer: https://pastebin.com/iy69HWXC

and gives the best output I've seen it produce: https://imgur.com/itYWblh

This is using the same OCR'd text posted by commenter Joe.

notpushkin, yesterday at 4:12 AM

> It produces a somewhat-readable PDF (first page at least) with this text output

Any chance you could share a screenshot / re-export it as a (normalized) PDF? I’m curious about what’s in there, but all of my readers refuse to open it.

the_real_cher, yesterday at 12:24 PM

This is cool!