logoalt Hacker News

dvduvaltoday at 2:09 PM9 repliesview on HN

The broader problem of original sources not being given credit in a way that rewards them remains. Websites owners are paying to host their content so that spiders can come and crawl them and index it into the AI and then if they’re lucky, they might get a citation, but otherwise there’s very little reward for being a provider of content. And of course, this is something that’s getting worse and worse. Why look at a website when it’s all in AI? And then the counter to that is maybe we need to start closing the website to crawlers and put everything behind a login.


Replies

Ensorceledtoday at 2:18 PM

Worse, the constant AI scraping is actually costing content providers additional money for no return. At least Google/Bing/Yahoo scraping would then be used to provide links back to your content.

show 2 replies
motbus3today at 2:19 PM

About a year ago OpenAI crawled and go DDOS level the company I work. Even despite the robots.txt not allowing it, and despite some recaptcha we could assemble in time.

We found our data in the outputs of their models but who can do anything about it...

show 4 replies
b00ty4breakfasttoday at 4:12 PM

>Why look at a website when it's all in AI?

well, at least in the case of google, I'm pretty sure that's the point. Or at least, they are doing things that would seem to be moving towards being an oracle with all the answers and not the signpost that points you in the right direction. The destination rather than the gateway.

show 1 reply
spacechild1today at 2:41 PM

It's actually costing them money/time! A friend of mine is a sysadmin at a university and he constantly has to deal with AI crawler DDoS-ing his servers. He said Anthropic is actually one of the worst offenders.

These AI companies are really just a gross example of the motto "Socialize the costs, privatise the profits". It's disgusting!

aaarrmtoday at 2:39 PM

Is it possible able to host your website in a way so that it couldn't be found via search engines (and thus wouldn't be crawlable I hope)?

I know this has repercussions on findability, but if that wasn't a concern, I'm curious how one might circumvent getting crawled.

show 5 replies
wolttamtoday at 2:21 PM

I’ve been thinking of a proof-of-work scheme for accessing content where you effectively need to mine some crypto for the author, but, this idea might not fly today

show 3 replies
gabbagooltoday at 3:43 PM

I agree with this whole heartedly. What's the point of even having copyright law at this point?

What's even crazier to think about is that to use the latest versions of these models for which you supplied training data, you have to pay hundreds of dollars a month. I would love to get a settlement check proportional to my model weights. Even if it's $0.10, at least everyone out there will get what they're owed.

show 2 replies
WarmWashtoday at 2:41 PM

[flagged]

show 7 replies
internet2000today at 3:07 PM

Perhaps we should go back to back when the internet was about sharing information you liked, not about credit or making money on "content".

show 1 reply