logoalt Hacker News

wahernlast Thursday at 11:25 PM4 repliesview on HN

It should be much easier than that. You should should be able to serially test if each edit decodes to a sane PDF structure, reducing the cost similar to how you can crack passwords when the server doesn't use a constant-time memcmp. Are PDFs typically compressed by default? If so that makes it even easier given built-in checksums. But it's just not something you can do by throwing data at existing tools. You'll need to build a testing harness with instrumentation deep in the bowels of the decoders. This kind of work is the polar opposite of what AI code generators or naive scripting can accomplish.


Replies

JKCalhounyesterday at 1:57 PM

Not necessarily a PDF attachment?

Someone who made some progress on one Base64 attachment got some XMP metadata that suggested a photo from an iPhone. Now I don't know if that photo was itself embedded in a PDF, but perhaps getting at least the first few hundred bytes decoded (even if it had to be done manually) would hint at the file-type of the attachment. Then you could run your tests for file fidelity.

show 1 reply
sznioyesterday at 9:56 AM

>It should be much easier than that. You should should be able to serially test if each edit decodes to a sane PDF structure

that's pointed out in the article. It's easy for plaintext sections, but not for compressed sections. Didn't notice any mention of checksums.

pimlottcyesterday at 1:40 AM

I wonder if you could leverage some of the fuzzing frameworks tools like Jepsen rely on. I’m sure there’s got to be one for PDF generation.

cluckindanlast Thursday at 11:57 PM

On the contrary, that kind of one-off tooling seems a great fit for AI. Just specify the desired inputs, outputs and behavior as accurately as possible.

show 1 reply