Hacker News

simonw, yesterday at 10:21 PM

It uses FineWeb, which is derived from Common Crawl, which is an unlicensed scrape of web pages.


Replies

reedciccio, today at 12:25 PM

You don't need a license to scrape the public web, analyze it, turn it into tokens, or apply other transformations. Let's not expand copyright beyond the horrible monster it already is.
