Hacker News

NetOpWibby · yesterday at 7:24 PM

How are they able to compare with Fable when Fable was only available for three days?


Replies

Topfi · yesterday at 7:50 PM

Terminalbench numbers are publicly available. What is more interesting is why that is the only benchmark they highlight. Maybe 5.6 isn’t that far ahead of Fable 5 in DeepSWE and FrontierCode (which I consider the most useful benchmarks and the closest to my own evals and subjective experience)…