Hacker News

epolanski, yesterday at 6:44 PM

I think they are simply evaluated on prompt-to-solution benchmarks.