>which have indexed all of the books and used pirated copies to do so
Funnily enough, people on HN often do not consider this an issue, like at all... I wonder how they'd think about it if they had created something (meaningful) that was subjected to this. I love Go and learned a lot of it over the past 2 years, but ultimately put it down in favor of more "batteries included" solutions, as I don't trust myself enough as a dev to confidently handle concurrency in Go. Still, it's a beautiful language, and if I ever come back I hope I can still find books about it, as I hate using AI for learning.
> Funnily enough, people on HN often do not consider this an issue, like at all...
I didn't have a problem with it when it was Aaron Swartz, not sure why I should have a problem with it when others do it.
A few years ago (before LLMs were as good as they are today) I wanted an LLM to act as a RAG-like memory over all the books I own. My dream was that every book I purchased would go into my LLM, making it better but also giving me a reference back to the text to look up and help me get better.
Honestly I didn't expect LLMs to progress so fast. Now it just seems like an unnecessary solution to a problem that no longer exists.
I'd rather not have copyright at all; as I said in another comment, it's not useful anymore. Information should instead just be free for everyone.
> I wonder how they'd think about it if they had created something (meaningful) that was subjected to this.
I used to write books in the past (all obsolete since, well, two decades+ now) and I'm totally fine with piracy: people who are pirating content are typically not those who are going to pay for it anyway.
As a side note, I really wish that state resources spent fighting bad actors in society were first used to catch and imprison rapists and the like, not to chase pirates sailing the digital high seas, but I digress...
Priorities.
> > which have indexed all of the books and used pirated copies to do so
> Funnily enough, people on HN often do not consider this an issue, like at all...
That is far from true - opinion is quite divided, perhaps even close to 50/50. It sometimes seems that the opinion is skewed massively towards the positive because there are a lot more “look what I did with GenAI” stories because “yeah, I'm not doing that because… here's what I did the old way” doesn't catch interest in the same manner.
This is one of the (several) reasons I'm doing my level best to avoid using the tools - I don't want to pay into the companies that have run roughshod over everyone's work because they can¹. This is a rather risky position to take in a company where the up-aboves have all but said “get with AI or get left behind”, but quite frankly at the moment “redundancy” isn't a scary word for me².
--------
[1] Take from a few (i.e. download a couple of TV shows) and it is piracy, making you liable for huge fines or even prison time; take from practically everyone (hoover up all their published writing irrespective of licence, gum up their servers with your badly written, or well written but deliberately badly behaved, scraper, etc…) and that is perfectly valid for training purposes.
[2] I appreciate that for many this is not the case, and because of economic pressures they might have to compromise on their feelings if they have the same opinions as I do on GenAI.
I have a different impression: that the folks here are divided on this issue, with one half being AI maximalists saying it's a necessary evil and the other half condemning such practices, maybe not so much to protect copyright per se, but because there is a double standard here. While teenagers get ridiculous fines for sharing MP3s, big corps get a free pass for stealing data on an industrial scale.