Hacker News

tarruda, yesterday at 1:22 PM

At this point I wouldn't be surprised if your pelican example has leaked into most training datasets.

I suggest starting to use a new SVG challenge, hopefully one that makes even Gemini 3 Deep Think fail ;D


Replies

jon-wood, yesterday at 2:08 PM

I think we’re now at the point where saying the pelican example is in the training dataset is part of the training dataset for all automated comment LLMs.

ertgbnm, yesterday at 2:59 PM

I'm guessing it has the opposite problem from typical benchmarks, since there is no ground-truth pelican-on-a-bike SVG to overfit on. Instead, the model just has a corpus of shitty pelicans on bikes made by other LLMs that it is mimicking.

So we might have an outer alignment failure.

Wowfunhappy, yesterday at 8:28 PM

How would that work? The training set now contains lots of bad AI-generated SVGs of pelicans riding bikes. If anything, the data is being poisoned.