Hacker News

hparadiz · yesterday at 2:34 AM

You can train models locally now and use open-source ones, and there's a robust community of people training, retraining, and generally pulling data from everywhere. New models then get trained on old models. The models in use now are already several generations deep, further trained on code freely given by the entire industry. It's like complaining about being 1/100000th of a soup with no real proof you're even in it. Can you prove that a model used your code? It's a remix of a remix of a remix.
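For context, here's a minimal sketch of what "training locally on an open-source model" can look like, using Hugging Face's transformers Trainer; the model name and corpus file below are illustrative assumptions, not anything named in the thread:

```python
# A minimal local fine-tuning sketch. Assumptions: a small open causal LM
# (here EleutherAI/pythia-160m, chosen only as an example) and a plain-text
# file "corpus.txt" of material you have the rights to train on.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "EleutherAI/pythia-160m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # pythia has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Load the local corpus as one text example per line.
dataset = load_dataset("text", data_files={"train": "corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=tokenized,
    # mlm=False gives standard next-token (causal) language modeling labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # runs entirely on local hardware
```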


Replies

Nursie · yesterday at 9:05 AM

> It's like complaining about being 1/100000th of a soup with no real proof you're even in it.

I love a good analogy, especially one that distills a complex situation full of esoteric, unusual conditions and relates it back to common experiences held by the reader, so that everyone can understand.

Next time I'm a small part of a soup I'll think of this.

whattheheckheck · yesterday at 3:04 AM

The fact that GitHub Copilot had an option to block generated code that matched public examples, and the fact that LLMs can regenerate Harry Potter books verbatim, means the training data is definitely "stored in a digital system of retrieval". But good luck having common sense win against a trillionaire incentive group stealing from everyone.
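A rough sketch of how one might probe a model for verbatim memorization of the kind described above: prompt with a known prefix, greedy-decode, and compare against the real continuation. The model name is an illustrative assumption, and a public-domain passage stands in for the copyrighted example:

```python
# Memorization probe sketch: if greedy decoding reproduces the true
# continuation of a known passage, that passage was likely memorized
# from the training data. Model choice is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/pythia-160m"  # assumption: substitute any local model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Public-domain stand-in (Dickens) for the copyrighted text in the comment.
prefix = "It was the best of times, it was the worst of times,"
known_continuation = " it was the age of wisdom, it was the age of foolishness"

inputs = tokenizer(prefix, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=20, do_sample=False)

# Decode only the newly generated tokens, after the prompt.
completion = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                              skip_special_tokens=True)
print("memorized?", completion.strip().startswith(known_continuation.strip()))
```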