Hacker News

poly2it, yesterday at 9:29 PM

A lot? I would be kind of interested if there were any known figures. Do companies want to be implicated in AA-cooperation in any capacity?


Replies

flexagoon, yesterday at 10:11 PM

No specific figures, but see, for example:

https://annas-archive.gl/blog/ai-copyright.html

> Virtually all major companies building LLMs contacted us to train on our data. Most (but not all!) US-based companies reconsidered once they realized the illegal nature of our work. By contrast, Chinese firms have enthusiastically embraced our collection, apparently untroubled by its legality.

> We have given high-speed access to about 30 companies. Most of them are LLM companies, and some are data brokers, who will resell our collection. Most are Chinese, though we’ve also worked with companies from the US, Europe, Russia, South Korea, and Japan. DeepSeek admitted that an earlier version was trained on part of our collection, though they’re tight-lipped about their latest model (probably also trained on our data though).

It's at least 30 companies, each of which paid hundreds of thousands of dollars.

Cider9986, yesterday at 9:33 PM

They likely use intermediary companies, but NVIDIA might have purchased from them directly; I don't remember the full story.