Hacker News

ZeroCool2u, yesterday at 6:38 PM

FrontierMath, GPQA Diamond, and BrowseComp are the benchmarks I noticed this on.


Replies

csnweb, yesterday at 6:40 PM

Are you maybe comparing the pro model to the non-pro model with thinking? Granted, it’s a bit confusing, but the pro model is 10 times more expensive and probably much larger as well.
