yea but i feel like we're over the hill on benchmaxxing. many times a model has beaten Anthropic on a specific bench, but the 'feel' is that it's still not as good at coding
'feel' is no more accurate
not saying there's a better way but both suck
Your feeling is not my feeling; Codex is unambiguously the smarter model for me
When Anthropic beats benchmarks, it's somehow earned; when OpenAI games them, it's somehow about not feeling good at coding.