Doesn't look like it: | alt Hacker News

cbg0 • today at 5:36 AM • 3 replies

Doesn't look like it: https://marginlab.ai/trackers/codex/

Replies

So what are we to make of the two items:

- This tracker not showing any visible degradation.
- Clearly incorrect answers being reported due to truncated thinking.

Is the tracker not measuring 'simpler' tasks that might get auto-sent to "low reasoning hell" even on high/xhigh? Is the clustering not actually causing reasoning misses in real-life coding, or not enough of a negative effect compared to the improvements made elsewhere? Something else?

linzhangrun • today at 8:34 AM

Thanks for sharing this project. Maybe I'm being subjective.

mdgld • today at 8:05 AM

Cool resource and perfect way to track this, thanks for sharing