Hacker News

nl yesterday at 11:51 PM

First model to get 100% on my agentic benchmark: https://sql-benchmark.nicklothian.com/?highlight=anthropic_c...


Replies

b--l today at 12:31 AM

grok-4.1-fast is the number 2 model on this benchmark.

~~If you've used this model in real life to do any sort of programming, and have seen its output, you would know that there is something VERY wrong with your benchmark.~~

Edit: Oh sorry, I looked at the questions, and I see this is for SQL specifically. Interesting. Maybe they tuned that grok model for SQL. Cool site. I bookmarked it.
