Using a throwaway account for obvious reasons, but I’m very involved in this space using LLMs from multiple providers. I’m aware of at least two instances in which the intermediate infrastructure “swapped” responses, once impacting Claude models and once impacting GPT models, from two different providers.
One gave us a proper postmortem in which their API gateway was incorrectly handling HTTP 100 status codes, putting them into an error state where there was effectively an off by one error - you would receive the response to the prompt that came in before yours and would pay it forward (your response would go to the next caller).
The other instance never had root cause explained to us, and we were just told to trust it wouldn’t happen again.
Both of these are from $1T+ companies.
ZDR wasn’t compromised in these cases since it was responses being swapped in flight. I wouldn’t be surprised if this is a similar issue - it’s not that data is being retained, it’s just not being safely isolated in intermediate infrastructure.
Actually, it’s not obvious why you’re using a throwaway account…
Every emergent behavior from these actors - whose claim to positive moral values is barely plausible - should be reported, discussed, dissected and critiqued early and often.
Woah. Sounds plausible. However, wouldn’t that still be an implicit violation of ZDR since now the response is possibly egressed out of the enterprise network? So if I were working with PHI, the response egress is a potential violation of HIPAA even though claude didn’t retain anything — but the whole Point was to comply with HIPAA. Thoughts?
These companies(at least one of them) seem lead by idiots(Hint:his name is Dario) so I wouldn’t be surprised to have multiple wtf moment if you were to see how they treat our data…Let’s just start pushing for opening up AI models because they are too dangerous behind paid walls. That would be a great regulation.
This attack is called "HTTP desync" or "request smuggling". It's often done intentionally by a client to try and spy on other clients' responses.
Every time you multiplex requests from multiple clients onto one upstream connection, you are probably vulnerable to this, because (despite its superficial simplicity) HTTP is just too complex to reliably match the requests and responses to upstream.
For example a desync can be triggered in some systems by having more than one Content-Length header, by mixing Content-Length with chunked encoding, or by passing an HTTP/2 header called Content-Length that doesn't match the actual content length.
Here's a DEF CON talk (6 years ago) on this topic: https://www.youtube.com/watch?v=w-eJM2Pc0KI
The same attack has been applied to SMTP by messing up the line endings surrounding the end-of-message delimiter, where it's called SMTP smuggling. It may also apply to other protocols.