Hacker News

rustyhancock yesterday at 7:58 PM

Competition between models is so intense right now that they are definitely benchmaxxing pelican-on-a-bike SVGs and Will Smith spaghetti dinner videos.


Replies

bonesss yesterday at 8:48 PM

Parallel hypothesis: competition between models is so intense that any high-engagement, high-relevance web discussion about any LLM/AI generation is gonna hit the self-guided, self-reinforced model training and result in de facto benchmaxxing.

Which is only to say: if we HN-front-page it, they will come (generate).

stared yesterday at 8:16 PM

There was Lenna for digital image compression (https://en.wikipedia.org/wiki/Lenna).

A pelican on a bike is SFW, inclusive, yet cool.

It is not a full benchmark - rather a litmus test.

bayindirh yesterday at 8:05 PM

So, again, when the indicator becomes a target, it stops being a good indicator.

thatguysaguy yesterday at 8:36 PM

You can just try other SVGs; I got some pretty good ones.

(*Disclaimer: I work for Google, but also I have zero idea about what they trained deepthink on)

yieldcrv yesterday at 8:13 PM

Note that, this benchmark aside, they've gotten really good at SVGs. I used to rely on the Noun Project for icons, and sometimes various libraries, but now coding agents just synthesize an SVG tag in the code and draw all the icons.