Hacker News

simonw yesterday at 7:29 PM | 24 replies

The pelican is a lot: https://github.com/simonw/llm-gemini/issues/133#issuecomment...

Not a great bicycle though, it forgot the bar between the pedals and the back wheel and weirdly tangled the other bars.

Expensive too - that pelican cost 13 cents: https://www.llm-prices.com/#it=11&ot=14403&sel=gemini-3.5-fl...


Replies

hedgehog yesterday at 7:32 PM

That pelican looks like it's in Miami for a crypto conference.

irthomasthomas yesterday at 7:48 PM

This is a perfect illustration of something I noticed with LLM progress. Ask them to improve an SVG like this, and they never fix the missing crossbar or disconnected limbs, they just add more stuff. In this example they have obviously improved greatly, and it contains a ridiculous amount of detail, but they still get the basic shape of the frame wrong. It's weird. And the pattern shows up everywhere: try it with a webpage and it will add more buttons and stuff. I've even experimented with feeding the broken pelican SVGs to an image model to look for flaws, and it still fails to spot the broken elements.

edit: fixed human hallucination

tantalor yesterday at 8:01 PM

Forgetting the chainstay is typical of asking random people to draw a bicycle.

https://www.gianlucagimini.it/portfolio-item/velocipedia/

> most ended up drawing something that was pretty far off from a regular men’s bicycle

VectorLock today at 4:40 AM

The fact it went for vaporwave styling on its own is very telling.

smcleod yesterday at 7:55 PM

I feel like it embodies Google's vibe of an uncool guy trying to stay relevant to the youth pretty well.

tandr today at 12:18 AM

If you sort that table by "output token price", it gets really terrifying - going from 4 cents up to $600 =8-O

nrds today at 12:27 AM

We've been daily-driving this model for a few weeks and let me tell you, everything it does is a lot. Fast as fuck and it's actually not bad intelligence-wise for a fast model. It basically tries to make up for any intelligence deficit by just doing a lot, checking a lot, retrying a lot.

That's not to say I don't spend my days raging at it... a lot... but it's not that bad. It does tend to ignore completion criteria but it doesn't obviously degrade when being nudged like some models do.

dekhn today at 12:55 AM

I'm told there is a new Jeff Dean fact inside google: "Jeff Dean manually adjusts the weights in the model just to screw with Simon".

karmakaze today at 1:01 AM

I'm hoping we'll have many of these pelican cyclist pictures collected. Then when all the models can do it well, we'll stop posting about them, and when the next generations of AIs train on the data, we'll have these canonical archetypes.

bee_rider today at 2:22 AM

I wonder if they added all these unrequested details as an Easter-egg or something? (Since they must be aware of your test by now).

hydra-f yesterday at 7:38 PM

Same old issue with Gemini models trying to "enrich" everything

taurath today at 12:40 AM

I can’t help but think that what AI is best at is convincing management that things it creates are full featured which reads to their brains as mature

nickvec yesterday at 9:31 PM

I enjoy the vaporwave aesthetic it went for. Looks like the pelican has a fish in its mouth too?

https://en.wikipedia.org/wiki/Vaporwave

khy yesterday at 8:56 PM

That sun is very similar to the one from the background of this other top HN post about the OS museum: https://news.ycombinator.com/item?id=48195009

sbinnee yesterday at 10:16 PM

Wow, what’s with all the styling? Is it a manifestation of Google’s styling bias? I like the result for sure. It’s shiny and pretty. But then it’s something I didn’t ask for.

danilocesar today at 12:18 AM

Given your pelican is very famous now, don't you think they are adding instructions to beat this benchmark these days?

Razengan today at 1:56 AM

I've found prompts like "capybara with spotted fur and 7 octopus tentacles instead of legs, each a different color, riding a tricycle" etc. to be a better test

Last time I tried, ChatGPT's image generator got the best result.

setgree yesterday at 9:05 PM

`<!-- Pelican Eye / Sunglasses (Cool Retro Aviators) -->`

wtf

`<!-- Gold Rim -->`

WTF??

__mharrison__ yesterday at 10:03 PM

They are just trolling you now

gcgbarbosa yesterday at 7:53 PM

funny that when I try the same prompt, gemini generates an image, not an SVG. something is not right.

nashashmi yesterday at 7:44 PM

Beats a human by like $10

TacticalCoder yesterday at 10:30 PM

Love your pelicans, as always. And that one is... Wow.

I noticed the "Synthwave" aesthetic, which has been enjoying quite some success for quite some time now, has found its way into AI models (even when it's not in the user's query). It's not the first time I've seen the sun at sunset with color bands etc. in AI-generated pictures. Don't know why it's now catching on in AI too.

https://en.wikipedia.org/wiki/Synthwave

Hence the comments here about the 90s, Sonny Crockett's white Ferrari Testarossa in Miami, etc.

To be honest as a kid from the 80s and a teenager from the 90s who grew up with that aesthetic in posters, on VHS tape covers, magazine covers, etc. I do love that style and I love that it made a comeback and that that comeback somehow stayed.

holtkam2 yesterday at 7:55 PM

at a certain point you're gonna need to change your benchmark because this will end up in the model's training set
