Hacker News

greenfish6 yesterday at 6:47 PM

Yeah, but I feel like we are over the hill on benchmaxxing. Many times a model has beaten Anthropic on a specific benchmark, but the 'feel' is that it is still not as good at coding.


Replies

falloutx yesterday at 7:33 PM

When Anthropic beats benchmarks, it's somehow earned; when OpenAI games them, it's somehow about not feeling good at coding.

AstroBen yesterday at 6:49 PM

'feel' is no more accurate

not saying there's a better way but both suck

karmasimida yesterday at 7:57 PM

Your feeling is not my feeling; Codex is unambiguously the smarter model for me.