This is preliminary, but it looks related to the `## Intermediary updates` section of the system prompt that's provided to the model. It seems to push the model to stop thinking and return early to provide updates. Removing that section entirely makes all runs succeed [1].
I wonder if the model is getting confused between what's supposed to be an intermediate update and the final result.
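For anyone who wants to try the same kind of ablation, here is a rough sketch of stripping one markdown section out of a prompt string. The `strip_section` helper and the sample prompt text are made up for illustration; this is not how Codex assembles its prompt, only a way to drop a section by its heading before running a comparison.

```python
import re


def strip_section(prompt: str, heading: str) -> str:
    """Drop a markdown section: the heading line plus everything up to the
    next heading of the same or a higher level. Hypothetical helper."""
    # Heading level = number of leading '#' characters, e.g. 2 for "## Foo".
    level = len(heading) - len(heading.lstrip("#"))
    kept, skipping = [], False
    for line in prompt.splitlines():
        if line.strip() == heading:
            skipping = True  # start dropping lines at the target heading
            continue
        m = re.match(r"(#+)\s", line)
        if skipping and m and len(m.group(1)) <= level:
            skipping = False  # a sibling or parent heading ends the section
        if not skipping:
            kept.append(line)
    return "\n".join(kept)


# Made-up prompt text, only to show the behaviour.
sample_prompt = (
    "## Rules\n"
    "Be terse.\n"
    "## Intermediary updates\n"
    "Send updates often.\n"
    "### Cadence\n"
    "Every few steps.\n"
    "## Final answer\n"
    "Summarize."
)

print(strip_section(sample_prompt, "## Intermediary updates"))
```

Nested subsections under the removed heading go with it, and the next same-level heading is kept, so the rest of the prompt stays intact for an A/B run.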
[1] https://github.com/openai/codex/issues/30364#issuecomment-48...