This is preliminary, but it seems like it might somehow be related to the `## Intermediary updates` system prompt that's provided to the model. Seems like it forces the model to stop thinking and return early to provide updates. Removing that entirely makes all runs succeed [1].
I wonder if it's somehow getting confused between what's supposed to be an intermediate update vs the final result.
[1] https://github.com/openai/codex/issues/30364#issuecomment-48...