Hacker News

holtkam2 yesterday at 7:55 PM

at a certain point you're gonna need to change your benchmark because this will end up in the model's training set


Replies

simonw yesterday at 8:09 PM

Gemini were the team most likely to have this in their training set - see https://x.com/JeffDean/status/2024525132266688757 - and yet their latest model still messes up the bicycle frame!

recursive yesterday at 9:58 PM

I'm sure that certain point came and went many releases ago.