Hacker News

jasonjmcghee today at 2:05 PM

That's Grok 4.2, not 4.3, right?

And why are you comparing to gpt-4.1? (As opposed to one of the 6? model releases since then - would have expected gpt 5.5)


Replies

michaelbuckbee today at 3:36 PM

Good catch, there was an issue with the second hardest thing in programming (caching).

Here's an updated eval with the proper models: https://a3bmfqfom3.evvl.io/
