"Why would you ship tests?" — Fair point. Source maps only include production bundle files — tests wouldn't appear in the map regardless. Tests may well exist in Anthropic's internal repo, and we can't claim otherwise. However, the bugs we found speak for themselves: a watchdog that doesn't protect the most vulnerable code path for 5+ months, a fallback with telemetry that never executes where it's needed, Promise.race without catch silently dropping tool results. If tests exist, they clearly don't cover the streaming pipeline adequately — these are the kind of issues that even basic integration tests would catch.
You're not beating the "written by an LLM" allegations.