Hacker News

News outlets are limiting the Internet Archive’s access to their journalism

122 points by jaredwiener today at 4:59 PM | 39 comments

Comments

remus today at 7:10 PM

That's a real shame. I am involved with some history-related projects, and the number of websites which go offline is huge. The Wayback Machine is incredibly helpful for unearthing these dead sites.

It is not hard to imagine a future in 50 years' time where a huge percentage of this content is lost forever, or at best incredibly hard to find.

wormius today at 7:13 PM

Ugh - our local paper used to have a wonderful archive that got limited and locked down after the pandemic. IDK if they got bought out, but it's a real shame. I think some of the problem is that things like hospital admissions used to be published as public information (birthdates, families, names); for example, I found old newspaper entries for my friends' parents and my own for being "in the hospital".

I'm sure that plays a role, but still... This obviously is about cost and money making, not security as a whole (ime)

svachalek today at 6:30 PM

There really should be a micropayments setup on the internet that's not advertising-based. Let these models pay a nickel to read the article, covered by the multi-trillion-dollar AI blank check.

sandeepkd today at 6:56 PM

I think it's bound to happen, and in some ways it's a good thing too. The current state of AI affairs is largely about outright selling someone else's intellectual property. The short-term incentives are eroding the trust and goodwill among the natural knowledge actors.

The next natural thing to happen would be privatization or consolidation of the internet itself. It's already happening in the form of grabbing and consolidating IPv4 addresses.

evanjrowley today at 7:44 PM

They should allow access after the news becomes old. That's what the archive is intended for.

arjie today at 7:58 PM

It's interesting how much we lost with the end of the advertising model (though likely its death would arrive with agentic access anyway). An unsurprising reaction to that was the advent of the widespread paywall. And in a world where every paywalled article on social media, including HN, is on an archived paywall-bypass site there was going to be a natural cat-and-mouse game. The distributed payment model of online advertising was surprisingly effective. No single person was worth very much but the aggregate of attention had a probabilistic conversion that enabled a sufficient ecosystem of news.

Now most of those who spend money get access to relatively good news in comparison to those who don't. The interesting thing is that if you model the utility of a customer base as trifactorial (subscriptions, ad-supported, influence-ability) and you set ad-support to near zero you're left with this situation where those with no ability to pay are now overwhelmingly useful to the website provider only as an influenceable base.

"If you're not paying, you're not the customer, you're the product", we used to say[0]. It turns out that's true, but if you can't pay by looking at ads, you will pay by the actions you take when you believe what the actual customer wants you to believe.

0: Though sometimes you do pay and you're still "the product" haha!

xp84 today at 7:54 PM

> "as profit margins for news thin, it’s only become more important to news publishers to protect their intellectual property."

So their argument is that people who would be paying money at their paywalls, are going to IA to get their news for free? And if they can thwart those people, they'll show up and become monthly subscribers?

I am vaguely sympathetic to newspapers as a concept, though the actual owners of approximately all of them are just PE companies looking to extract maximum profit from this dying industry, not really trying to prolong their existence.

But I think everyone who is interested in subscribing to their newspapers' paywalls already has subscribed. Those of us who bypass paywalls with that archive.whatever site, or apparently IA (I have never tried it for this purpose) are doing so because there is zero chance we're going to (recurringly!) pay the asking price for some random out-of-town newspaper, The Verge, Bloomberg, whatever. It's fair game to call us immoral for that decision, but if (and it's a big if) this move prevents more people from being able to bypass a paywall, I predict zero incremental dollars will go to the news publishers.

acidhousemcnab today at 6:45 PM

Perhaps I imagined this, but some months ago someone on X pointed out that a historical article on dailymail.co.uk related to Prince Philip and Epstein had been scrubbed, which would likely be the work of intelligence services or D-Notices. Instead of showing a 404 page, the URL redirected to an article that was similar but benign. I checked the URL on the Wayback Machine and it turned up zero results, not even the redirected article. However, the user on X had screen-grabbed the original, which everyone was reading and commenting on. As of 21st May I can't find this discussion on X, and Grok denies it ever existed. This is a "maximally truth-finding" AI, so I must be mistaken. Perhaps the Internet Archive cannot be trusted, so this is why 340 local news outlets need to limit access.

stronglikedan today at 8:00 PM

That's okay. The AI knows everything now, and forever more. Farewell, IA.

jqmccleary today at 7:57 PM

If we don't know the past, we won't know it's repeating.

flippant today at 6:44 PM

Apologies for the self-promo. Downvote and I'll know not to do it again.

This trend of outright banning the Internet Archive has me extremely worried. I fear a future where news articles are memoryholed, and no one can remember exactly what was reported and how sensational it all seemed.

I've been working on this project [0] for a while. Originally, I started with a tool that would allow people to snapshot webpages in their own browser, and they could selectively share their snapshots. Then by consensus, everyone could understand what exactly had changed, and they could draw their own conclusion about why.

While working on it, I realized that an authoritative answer to "what did it look like on $DATE" can't be produced by a no-name company. It's gotta be a non-commercial entity that's got a track record of integrity. The dream would be to allow MemoryHole customers to submit their snapshots to the Internet Archive (or other non-commercial entity). It's definitely a copyright nightmare - so no clue how this could work.

[0] - https://memoryhole.app

jmclnx today at 6:13 PM

Maybe they should allow the Internet Archive access to their articles after a week or two.

But I think this will hurt them more than help as time goes on. IIRC, one news org blocked free access and their revenue fell. I think that was in Australia.

But it seems they are using AI as the reason, and allowing access after a week will not prevent AI access.

But what happens if an AI company subscribes to the news site using a person's name (or a fake name)? They will still get the article and avoid hassles.

_ink_ today at 7:15 PM

Thanks, Big Tech!

charcircuit today at 6:55 PM

If the block is merely user-agent based, IA can spoof a different user agent to get these sites.
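
To illustrate why a user-agent-keyed rule is weak, here is a minimal sketch using Python's standard `urllib.robotparser`. It assumes the block is expressed as a robots.txt rule, which the thread does not confirm; the crawler token `ExampleArchiveBot`, the `allowed` helper, and the `news.example` URL are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is refused, everyone else is allowed.
ROBOTS_TXT = """\
User-agent: ExampleArchiveBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://news.example/story"
    # The rule matches only the name the crawler reports about itself.
    print(allowed("ExampleArchiveBot/1.0", url))
    print(allowed("SomeOtherBot/1.0", url))
```

The rule binds only a crawler that identifies itself under the blocked name, so a block of this kind depends on the crawler's cooperation rather than on anything the site can enforce.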

Gagarin1917 today at 7:31 PM

Not surprising, sites like Reddit use it to get around their paywalls.

Redditors then had the gall to pretend like it wasn’t their number one use case.

picsao today at 7:59 PM

[dead]

b00ty4breakfast today at 7:46 PM

Of course they are, because they are not primarily concerned with the reporting of noteworthy events. They are most worried about profit, with reporting as a secondary goal that matters only insofar as it serves the first. This is a wider trend across many industries.

Obviously, a business needs to have an income, but it's becoming more common for businesses to function first and foremost as revenue generators, with the thing that enables the revenue seen only as a means to an end. When the quality of the product/service and its function as a revenue generator diverge, the product/service will always take 2nd chair.

Maybe we could argue that the primary product is the revenue, especially when there are investors involved who are looking for big returns.
