Haven’t looked at the code, but is the server providing the client with a system prompt that it can use, which would contain fake tool definitions when this is enabled? What enables it? And why is the client still functional when it’s giving the server back a system prompt with fake tool definitions? Is the LLM trained to ignore those definitions?
Wonder if they’re also poisoning the output when Sonnet or Opus is used directly to generate simulated agentic conversations.
Not sure, and not completely convinced by the explanation, but the way this sticks out so obviously makes it look like a honeypot to me.