Hacker News

tiarafawn, yesterday at 5:35 AM

Could the real reason for this be centered more on generating and controlling new training data?

I suspect the same of the high AI usage quotas being forced on developers at MS and elsewhere. We've had multiple generations of models trained on all of the code that's available, and that data now yields diminishing returns for training. A significant portion of newly published, publicly available data is also slop.

The best way to get fresh training data from real human brains might be to have real humans use your first-party tools, where you control all of the telemetry.