necovek • today at 4:20 AM • 4 replies

Where's the training data and training scripts since you are calling this open source?

Edit: it seems "open source" was edited out of the parent comment.

Replies

b65e8bee43c2ed0 • today at 5:09 AM

doesn't it get tiring after a while? using the same (perceived) gotcha, over and over again, for three years now?

no one is ever going to release their training data because it contains every copyrighted work in existence. everyone, even the hecking-wholesome safety-first Anthropic, is using copyrighted data without permission to train their models. there you go.

woctordho • today at 8:36 AM

They are exactly open source. The training data is the internet. Don't say it's on the internet. It IS the internet.

The training scripts are in Megatron and vLLM.

bl4ckneon • today at 5:45 AM

Ah yes, let me push a couple petabytes to my git repo for everyone to download...

0-_-0 • today at 6:03 AM

Weights are the source, training data is the compiler.