Hacker News

TonyStr, today at 1:04 PM (5 replies)

Interestingly, I looked at GitHub Insights and found that this repo had 49 clones from 28 unique cloners before I published this article. I definitely did not clone it 49 times, and certainly not as 28 unique users. It's unlikely that the handful of friends who follow me on GitHub all cloned the repo, so I can only speculate that bots are scraping new public GitHub repos and training on everything.

Maybe that's obvious to most people, but it was a bit surprising to see it myself. It feels weird to think that LLMs are being trained on my code, especially when I'm painfully aware of every corner I'm cutting.

The article doesn't contain any LLM output. I use LLMs to ask for advice on coding conventions (especially in Rust, since I'm bad at it), and sometimes as part of research (zstd was suggested by ChatGPT along with comparisons to similar algorithms).


Replies

tonnydourado, today at 2:02 PM

Particularly on GitHub, it might not even be LLMs, just regular bots looking for committed secrets (AWS key pairs, passwords, etc.).

Phelinofist, today at 1:54 PM

I self-host Gitea. The instance is crawled by AI crawlers (I checked the IPs). They never clone; they just browse and take the code directly from the web UI.

nerdponx, today at 1:38 PM

Time to start including deliberate bugs. The correct version is in a private repository.

0x696C6961, today at 2:05 PM

This has been happening since before LLMs, too.

teiferer, today at 2:40 PM

I don't really get why they need to clone in order to scrape ...?

> It feels weird to think that LLMs are being trained on my code, especially when I'm painfully aware of every corner I'm cutting.

That's very much expected. That's why the quality of LLM coding agents is like it is. (No offense.)

The "asking LLMs for advice" part is where the circular aspect starts to come into the picture. It's no worse than looking at Stack Overflow, though, which links to other people who in turn went to Stack Overflow for advice.
