Hacker News

baxtr yesterday at 5:34 PM

OK, fair enough. I think I misread your first comment.

I'm not sure that would work in this case, though, since all these companies scraped publicly available data. So with the right resources, anyone could redo it?


Replies

drooby yesterday at 6:09 PM

Well, two things worth considering.

First, training isn't a one-time event. These companies are continuously scraping new data, training new model generations, ingesting new human output. Every new model is a new extraction event. The fact that GPT-4 already trained on your 2022 blog post doesn't mean the window is closed. GPT-6 will train on your 2025 and 2026 output too. There's always a live point at which to assert a collective claim.

Most likely, these models will keep training on us indefinitely, both to understand us better and to stay commercially valuable to us.

Second, "anyone could redo it with the right resources" is technically true but practically meaningless. Anyone could theoretically drill for oil too. The barrier was never access to the crude sitting in the ground. It was the billions in infrastructure needed to extract and refine it. Same here. The data is public, but the compute required to turn it into a frontier model costs billions. That concentration of capital is exactly why a public claim on the value makes sense, just like it did with oil.