simonw • yesterday at 9:20 PM • 1 reply

My stupid pelican benchmark proves to be genuinely quite useful here; you get a visual representation of the quality difference between GPT-5.3-Codex-Spark and full GPT-5.3-Codex: https://simonwillison.net/2026/Feb/12/codex-spark/

Replies

lacoolj • yesterday at 9:25 PM

These are the ones I look for every time a new model is released. Incorporates so many things into one single benchmark.

Also your blog is tops. Keep it up, love the work.
