Hacker News

wooger, last Friday at 11:50 AM

> unless I guess they routinely index this repo

This sounds like exactly the kind of thing any tech company would do when confronted with a competitive benchmark.


Replies

rsanek, last Friday at 1:40 PM

I mean, the repo has <200 stars, it's not like it's so mainstream that you'd expect LLM makers to be watching it actively. If they wanted to game it, they could more easily do that in RL with synthetic data anyway.