It uses FineWeb, which is derived from Common Crawl, which is an unlicensed scrape of web pages.

simonw • yesterday at 10:21 PM

Replies

You don't need a license to scrape the public web, analyze it, turn it into tokens, or apply other transformations. Let's not expand copyright beyond the horrible monster it already is.
