They must have some sort of smoke tests for common operations, run in a test harness with the system prompts they force on users, right?
....Right?
What kind of Mickey Mouse operation are they running over there?
I wouldn't bet a chocolate chip cookie on that.
In the original Claude degradation follow-up email, Boris mentioned they are upping the percentage of engineers required to use the public version of Claude Code. I have no idea what percentage this is, or how much of a punishment it is considered to be. :)
That said, I was sympathetic to the recent bug reports: to trigger one, you’d need to have a session that waited an hour doing nothing and then very specifically tested for in-context retrieval. I don’t want to run that test, do you want to run that test?