Giving agents Linux has compounding benefits in our experience. They're able to sort through weirdness that normal tooling wouldn't allow. For example, they can read an image, get an error back from the API, and see that it wasn't in the expected format. They read the magic bytes to see it was a jpeg despite being named .png, and read it correctly.
> They read the magic bytes to see it was a jpeg despite being named .png, and read it correctly.
Maybe I'm missing something, but reading the magic bytes seems trivial to implement. I haven't tested it, but I'd expect most Linux image viewers/editors to handle misnamed files automatically, since that is almost entirely the purpose of magic bytes.
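For a sense of how little code this takes, here is a minimal sketch in Python. It only knows a handful of signatures (JPEG, PNG, GIF, WebP), and the function names are made up for illustration. Real tools like `file(1)` and libmagic cover far more formats.

```python
from pathlib import Path
from typing import Optional, Tuple

# Leading-byte signatures for a few common image formats.
MAGIC_SIGNATURES = [
    (b"\xff\xd8\xff", "jpeg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
]

# Formats implied by file extensions (what naive tooling trusts).
EXTENSION_FORMATS = {
    ".jpg": "jpeg",
    ".jpeg": "jpeg",
    ".png": "png",
    ".gif": "gif",
    ".webp": "webp",
}


def sniff_format(head: bytes) -> Optional[str]:
    """Return the image format implied by the first bytes, or None if unknown."""
    for signature, fmt in MAGIC_SIGNATURES:
        if head.startswith(signature):
            return fmt
    # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if head[:4] == b"RIFF" and head[8:12] == b"WEBP":
        return "webp"
    return None


def extension_mismatch(filename: str, head: bytes) -> Optional[Tuple[str, str]]:
    """Return (claimed, actual) if the extension disagrees with the content."""
    claimed = EXTENSION_FORMATS.get(Path(filename).suffix.lower())
    actual = sniff_format(head)
    if claimed and actual and claimed != actual:
        return claimed, actual
    return None


def read_head(path: Path, n: int = 16) -> bytes:
    """Read just enough leading bytes to identify the format."""
    with path.open("rb") as f:
        return f.read(n)
```

For the case in the parent comment, `extension_mismatch("photo.png", b"\xff\xd8\xff\xe0")` returns `("png", "jpeg")`, which is exactly the signal needed to re-read the file as a JPEG.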
Personally, I think Microsoft is to blame for everyone relying too much on file extensions. It was a bad idea that led to a lot of security issues.
I don't understand why this is so special that somebody would need LLM slop generation for it. Any human can do this in a few seconds using normal Unix tooling.
Matches my experience with print-on-demand workflows. I tried using vision models to validate things like ICC profiles and total ink density, but they usually just hallucinate that the file is compliant. I ended up giving the agent access to ImageMagick so it could run the analysis directly. It's the only reliable way to catch issues before sending files to fulfillment; otherwise you end up eating the cost of failed prints.
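The ink-density part of that check is plain arithmetic once you have the pixel data. Total area coverage (TAC) is the sum of the C, M, Y and K percentages for a pixel, and a file fails if any pixel exceeds the printer's limit. This sketch does it in plain Python rather than ImageMagick, assuming 8-bit CMYK values where 255 means 100% ink. The 300% default is a placeholder, not a real printer spec, and the function names are hypothetical.

```python
from typing import Iterable, Tuple

# One pixel as (C, M, Y, K), each 0..255 where 255 = 100% ink (assumed encoding).
CMYKPixel = Tuple[int, int, int, int]

# Placeholder limit; real TAC limits depend on the printer, paper and profile.
DEFAULT_TAC_LIMIT = 300.0


def total_ink_percent(pixel: CMYKPixel) -> float:
    """Sum of the four channel coverages, in percent (0..400)."""
    return sum(channel / 255 * 100 for channel in pixel)


def max_total_ink(pixels: Iterable[CMYKPixel]) -> float:
    """Highest total ink coverage found anywhere in the image (0.0 if empty)."""
    return max((total_ink_percent(p) for p in pixels), default=0.0)


def exceeds_tac(pixels: Iterable[CMYKPixel], limit: float = DEFAULT_TAC_LIMIT) -> bool:
    """True if any pixel puts down more ink than the limit allows."""
    return max_total_ink(pixels) > limit
```

If the file is already in CMYK, Pillow's `Image.open(path).getdata()` should yield tuples in this shape. Checking the ICC profile itself is a separate problem that this sketch doesn't touch.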