Hacker News

necovek today at 4:20 AM (4 replies)

Where are the training data and training scripts, since you are calling this open source?

Edit: it seems "open source" was edited out of the parent comment.


Replies

b65e8bee43c2ed0 today at 5:09 AM

doesn't it get tiring after a while? using the same (perceived) gotcha, over and over again, for three years now?

no one is ever going to release their training data because it contains every copyrighted work in existence. everyone, even the hecking-wholesome safety-first Anthropic, is using copyrighted data without permission to train their models. there you go.

woctordho today at 8:36 AM

They are exactly open source. The training data is the internet. Don't say it's on the internet. It IS the internet.

The training scripts are in Megatron and vLLM.

bl4ckneon today at 5:45 AM

Aww yes, let me push a couple petabytes to my git repo for everyone to download...

0-_-0 today at 6:03 AM

Weights are the source, training data is the compiler.
