The pelican is a lot: https://github.com/simonw/llm-gemini/issues/133#issuecomment...
Not a great bicycle though, it forgot the bar between the pedals and the back wheel and weirdly tangled the other bars.
Expensive too - that pelican cost 13 cents: https://www.llm-prices.com/#it=11&ot=14403&sel=gemini-3.5-fl...
This is a perfect illustration of something I noticed with LLM progress. Ask a model to improve an SVG like this, and it never fixes the missing crossbar or disconnected limbs, it just adds more stuff. In this example the models have obviously improved greatly, and the image contains a ridiculous amount of detail, but they still get the basic shape of the frame wrong. It's weird. And the pattern shows up everywhere: try it with a webpage and it will add more buttons and stuff. I've even experimented with feeding the broken pelican SVGs to an image model to look for flaws, and it still fails to spot the broken elements.
edit: fixed human hallucination
Forgetting the chainstay is typical of asking random people to draw a bicycle.
https://www.gianlucagimini.it/portfolio-item/velocipedia/
> most ended up drawing something that was pretty far off from a regular men’s bicycle
The fact it went for vaporwave styling on its own is very telling.
I feel like it embodies Google's vibe of an uncool guy trying to stay relevant to the youth pretty well.
If you sort that table by "output token price", it gets really terrifying - going from 4 cents up to $600 =8-O
We've been daily-driving this model for a few weeks and let me tell you, everything it does is a lot. Fast as fuck and it's actually not bad intelligence-wise for a fast model. It basically tries to make up for any intelligence deficit by just doing a lot, checking a lot, retrying a lot.
That's not to say I don't spend my days raging at it... a lot... but it's not that bad. It does tend to ignore completion criteria but it doesn't obviously degrade when being nudged like some models do.
I'm told there is a new Jeff Dean fact inside google: "Jeff Dean manually adjusts the weights in the model just to screw with Simon".
I'm hoping we'll have many of these pelican cyclist pictures collected. Then when all the models can do it well, we'll stop posting about them, and when the next generations of AIs train on the data, we'll have these canonical archetypes.
I wonder if they added all these unrequested details as an Easter-egg or something? (Since they must be aware of your test by now).
Same old issue with Gemini models trying to "enrich" everything
I can’t help but think that what AI is best at is convincing management that the things it creates are full-featured, which reads to their brains as mature.
I enjoy the vaporwave aesthetic it went for. Looks like the pelican has a fish in its mouth too?
That sun is very similar to the one from the background of this other top HN post about the OS museum: https://news.ycombinator.com/item?id=48195009
Wow, what’s with all the styling? Is it a manifestation of Google’s styling bias? I like the result for sure. It’s shiny and pretty. But then it’s something I didn’t ask for.
Given your pelican is very famous now, don't you think they are adding instructions to beat this benchmark these days?
I've found prompts like "capybara with spotted fur and 7 octopus tentacles instead of legs, each a different color, riding a tricycle" etc. to be a better test
Last time I tried, ChatGPT's image generator got the best result.
`<!-- Pelican Eye / Sunglasses (Cool Retro Aviators) -->`
wtf
`<!-- Gold Rim -->`
WTF??
They are just trolling you now
funny that when I try the same prompt, gemini generates an image, not an SVG. something is not right.
Love your pelicans, as always. And that one is... Wow.
I noticed the "Synthwave" aesthetic, which has been enjoying quite some success for quite some time now, has found its way into AI models (even when it's not in the user's query). It's not the first time I've seen the sun at sunset with color bands etc. in AI-generated pictures. Don't know why it's now catching on in AI too.
https://en.wikipedia.org/wiki/Synthwave
Hence the comments here about the 90s, Sonny Crockett's white Ferrari Testarossa in Miami, etc.
To be honest as a kid from the 80s and a teenager from the 90s who grew up with that aesthetic in posters, on VHS tape covers, magazine covers, etc. I do love that style and I love that it made a comeback and that that comeback somehow stayed.
at a certain point you're gonna need to change your benchmark because this will end up in the model's training set
That pelican looks like it's in Miami for a crypto conference.