This doesn't have API access yet, but OpenAI seem to approve of the Codex API backdoor used by OpenClaw these days... https://twitter.com/steipete/status/2046775849769148838 and https://twitter.com/romainhuet/status/2038699202834841962
And that backdoor API has GPT-5.5.
So here's a pelican: https://simonwillison.net/2026/Apr/23/gpt-5-5/#and-some-peli...
I used this new plugin for LLM: https://github.com/simonw/llm-openai-via-codex
UPDATE: I got a much better pelican by setting the reasoning effort to xhigh: https://gist.github.com/simonw/a6168e4165a258e4d664aeae8e602...
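For anyone who wants to reproduce this, here's a minimal Python sketch using LLM's Python API. The model ID and the reasoning_effort option name are my assumptions (check `llm models` and the plugin README for the real names); the prompt is the usual "Generate an SVG of a pelican riding a bicycle".

    import llm

    # Assumes the plugin is installed first: llm install llm-openai-via-codex
    # The model ID below is a guess; run `llm models` to see what the plugin registers.
    model = llm.get_model("gpt-5.5-codex")

    # Model options are passed as keyword arguments; "reasoning_effort" is an
    # assumed option name based on the xhigh result linked above.
    response = model.prompt(
        "Generate an SVG of a pelican riding a bicycle",
        reasoning_effort="xhigh",
    )

    # Save whatever SVG markup the model returns.
    with open("pelican.svg", "w") as f:
        f.write(response.text())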
That pelican you posted yesterday from a local model looks nicer than this one.
Edit: this one has crossed legs lol
Isn't it awful? After 5.5 versions it still can't draw a basic bike frame. How is the front wheel supposed to turn sideways?
The pelican doesn't really matter anymore, since models are tuned for it in the knowledge that people will ask.
So the pelican must have become a mandatory test case for all model providers to pass before launch.
I made pelicans at different thinking efforts:
https://hcker.news/pelican-low.svg
https://hcker.news/pelican-medium.svg
https://hcker.news/pelican-high.svg
https://hcker.news/pelican-xhigh.svg
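If you want to generate a set like that yourself, here's a rough sketch along the same lines (same caveats as above: the model ID and option name are assumptions, not confirmed):

    import llm

    model = llm.get_model("gpt-5.5-codex")  # hypothetical ID; run `llm models` to check

    # One SVG per reasoning effort level, saved as pelican-<effort>.svg
    for effort in ["low", "medium", "high", "xhigh"]:
        response = model.prompt(
            "Generate an SVG of a pelican riding a bicycle",
            reasoning_effort=effort,  # assumed option name
        )
        with open(f"pelican-{effort}.svg", "w") as f:
            f.write(response.text())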
Someone needs to make a pelican arena, I have no idea if these are considered good or not.
Is this direct API usage allowed by their terms? I remember Anthropic really not liking such usage.
It's amazing that the default did that much in just 39 "reasoning tokens" (no idea what a reasoning token is, but that's still shockingly few tokens).
Hmm. Any idea why it's so much worse than the other ones you have posted lately? Even the open weight local models were much better, like the Qwen one you posted yesterday.
I for one delight in bicycles where neither wheel can turn!
It continues to amaze me that these models, which definitely know what bicycle geometry actually looks like somewhere in their weights, produce such implausibly bad geometry.
It's also mildly interesting, and generally consistent with my experience with LLMs, that it produced the same obvious geometry issue both times.
What is your setup for drawing the pelican? Do you ask the model to check the generated image, find issues, and iterate on it, which would demonstrate the model's real abilities?
Thank you for continuing to post these! Very interesting benchmark.
Wait, I thought we were onto raccoons on e-scooters to avoid (some of) the issues with Goodhart's Law coming into play.
You know they are 1000% training these models to draw pelicans; this hasn't been a valid benchmark for 6+ months.
At some point, OpenAI is going to cheat and hardcode a pelican on a bicycle into the model. 3D modelling has Suzanne and the teapot; LLMs will have the pelican.
OpenAI hired the guy behind OpenClaw, so it makes sense that they’re more lenient towards its usage.