This explains so much about why GPT 5.5 has been so bad lately. It was really puzzling why it struggled so much, when at launch it was one-shotting stuff and was totally amazing. I tried the prompt that tells you whether your plan is degraded:
codex exec --json --skip-git-repo-check --ephemeral -s read-only --disable memories -m gpt-5.5 -c model_reasoning_effort=high "Do not use external tools. A black bag contains candies with counts: round apple 7, round peach 9, round watermelon 8; star apple 7, star peach 6, star watermelon 4. Shape is distinguishable by touch before drawing; flavor is not. What is the minimum number of candies to draw to guarantee having apple and peach candies of different shapes, i.e. round apple + star peach or round peach + star apple? Give reasoning and final number. The local project dir is irrelevant for this task, do not consult it. "
1. 516, 24
2. 516, 27
3. 516, 12
4. 516, 21
5. 516, 21
This means that the whole time we've been paying for a product that was silently routing to something completely different from, and inferior to, GPT 5.5.
Also, I read through the GitHub issues, and it seems they closed a previous issue without addressing it?!
Whoo boy, somebody from OpenAI is getting fired over this. If not, a class action lawsuit is almost guaranteed at this point.
Verified this locally myself. Thanks for the concrete test. I guess it's time to give Claude another try.
The correct answer is 29, right? You could draw all the watermelons and all the round pieces before drawing a star piece. So the model never gets it right, but it does when listing the cases exhaustively?
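For what it's worth, the arithmetic can be checked by brute force. Below is a rough Python sketch, not anything from the original test: the function names are mine, and it assumes two readings of the puzzle. Under the blind reading (shape ignored, which is the reasoning above), it finds the largest hand that still fails and adds one. Under the shape-aware reading (you decide up front how many round and how many star candies to take, which is how I read "distinguishable by touch"), it looks for the cheapest fixed plan that cannot fail. Adaptive strategies, where you change the plan after seeing flavors, are not covered.

```python
from itertools import product

# Candy counts from the prompt, in (apple, peach, watermelon) order.
ROUND = {"apple": 7, "peach": 9, "watermelon": 8}
STAR = {"apple": 7, "peach": 6, "watermelon": 4}


def succeeds(round_hand, star_hand):
    """True if the hand holds round apple + star peach, or round peach + star apple."""
    ra, rp, _ = round_hand
    sa, sp, _ = star_hand
    return (ra > 0 and sp > 0) or (rp > 0 and sa > 0)


def compositions(bag):
    """Every (apple, peach, watermelon) count that could be drawn from one shape."""
    return product(*(range(n + 1) for n in bag.values()))


def hands_by_total(bag):
    """Group the possible hands of one shape by how many candies they contain."""
    groups = {}
    for hand in compositions(bag):
        groups.setdefault(sum(hand), []).append(hand)
    return groups


def blind_minimum():
    """Assumed reading 1: shape cannot be chosen. Largest failing hand, plus one."""
    worst = max(
        sum(r) + sum(s)
        for r in compositions(ROUND)
        for s in compositions(STAR)
        if not succeeds(r, s)
    )
    return worst + 1


def shape_aware_minimum():
    """Assumed reading 2: fix the number of round and star draws in advance.

    A plan is safe only if every flavor outcome the bag allows is a success.
    """
    round_groups = hands_by_total(ROUND)
    star_groups = hands_by_total(STAR)
    return min(
        r_total + s_total
        for r_total, r_hands in round_groups.items()
        for s_total, s_hands in star_groups.items()
        if all(succeeds(r, s) for r in r_hands for s in s_hands)
    )


print("blind draws:", blind_minimum())
print("shape chosen by touch:", shape_aware_minimum())
```

This prints 29 for the blind reading and 21 for the shape-aware one (9 round plus 12 star), so 29 only holds if the touch hint is ignored.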