Hacker News

cbg0 | today at 5:36 AM | 3 replies

Doesn't look like it: https://marginlab.ai/trackers/codex/


Replies

rahidz | today at 2:08 PM

So what are we to make of the two items:

- This tracker not showing any visible degradation.
- Clearly incorrect answers being reported due to truncated thinking.

Is the tracker not measuring 'simpler' tasks that might get auto-sent to "low reasoning hell" even on high/xhigh? Is the clustering not actually causing reasoning misses in real-life coding, or not enough of a negative effect compared to the improvements made elsewhere? Something else?

linzhangrun | today at 8:34 AM

Thanks for sharing this project. Maybe I'm being subjective.

mdgld | today at 8:05 AM

Cool resource and perfect way to track this, thanks for sharing