Hacker News

Tell NYT, Atlantic, USA Today to keep Wayback Machine

346 points by doener yesterday at 11:11 PM | 96 comments

Comments

switzer today at 10:04 AM

I think the problem is that when Archive.org has access to NYT and other publisher content, people can scrape NYT content at scale from Archive.org even when they cannot do so directly on NYT. If Archive.org blocks scrapers, maybe the publishers would make different choices and allow Archive.org access.

ctippett today at 1:03 AM

Am I correct that this has come about because archive.org respects robots.txt and these sites have blocked their crawler from indexing their sites?

I'm not sure how to articulate my thoughts on this exactly, other than to say it's disappointing that doing the right thing (i.e. respecting robots.txt) is rewarded with the burden of soliciting responses to a petition while at the same time others are rewarded with profit for ignoring those same directives.

ajaimk today at 1:53 AM

Idea: allow scraping but can’t publish for 1 year?

someperson yesterday at 11:38 PM

Maybe they should have an escrow period. The Financial Times, for example, is available on the NewsBank service with a 30-day escrow.

WarmWash today at 1:51 AM

A bunch of people who haven't ever loaded an ad or paid a subscription to those organizations are going to make a stand to demand they leave their backdoor open?

Cider9986 today at 1:38 AM

I am looking forward to this (https://news.ycombinator.com/item?id=48070516)

JustinGoldberg9 today at 2:39 AM

Need a cryptographically verifiable internet archive. This is probably not possible without something like web 3 or nostr or gpg pgp. Idk.

JumpCrisscross today at 12:51 AM

I know a little about this debate on the Times and Atlantic sides. I’ll get some grief for this, but I asked a senior person at the former what they thought about the paywall workarounds that are frequent on HN—I was genuinely shocked to learn they hadn’t heard about it.

In the end, we settled on agreeing that making such stuff available after 30 days, and possibly with access restrictions (can’t be pulled more than N times a day, in case it becomes relevant in the future) struck the right balance.

To my knowledge, the Internet Archive hasn’t done any outreach on this issue. In addition to pressuring the publications, I’d put some pressure on them to negotiate.

crowcroft today at 3:29 AM

Ok, but what about Meta and X etc.

eranation today at 2:28 AM

I signed, but let’s be honest.

A pie chart showing the times I used the wayback machine to read an old NYT article vs the times I visited it due to a highly upvoted top HN comment linking to a relatively new article so we all can bypass the paywall is a solid circle.

karel-3d today at 5:11 AM

There is still archive.today, too bad the owner is crazy

drivingmenuts today at 5:35 AM

What is the advantage to those organizations to have their work preserved? If their work is stored in a public archive, they can’t charge for it and they lose money. If they make a mistake, then history is what they say it is and there is no external record to say otherwise.

shevy-java today at 5:29 AM

We are kind of losing the world wide web here, or at least part of how we could use it in the past. More and more key services get knocked out; see the associated rise of age sniffing and the campaign to destroy VPNs.

sublinear today at 1:29 AM

After many years of these media outlets circling the drain, this is likely the clearest signal of their irrelevance. It's not like anyone is committing these rags to microfiche anymore.

kr108sdh today at 1:06 AM

The petition should be to ban the AI theft. If it is on wayback, the bots could as well scrape the NYT directly.

The NYT is of course guilty itself. It did not investigate the possible murder of its star witness Suchir Balaji and is too reserved in examining the consequences of AI in general.

If it doesn't fulfill its journalistic and societal obligations, its own journalists will soon be replaced by AI bullet-point slop like Axios.

WarmWash today at 1:45 AM

Can we just go back to ads and normalize blocking people who ad-block?

I'm grown up now, I understand how things work, and I'd rather see Tide and Coke ads than pay $20/mo to 8 different orgs, while maintaining that ad free option for those who want it.

The children of the internet probably won't sign a truce, so let's just cut them out and let intellectually honest people have a decent internet.

LNSY today at 12:48 AM

[flagged]

righthand today at 12:43 AM

Wouldn’t it be better to let these legacy news orgs (which aren’t really anything beyond advertising and data harvesting firms) block archive.org, so that no one reads their articles and they go under? I’m struggling to think of a reason I need the NY Times. I’ve never had a subscription and never seen writing that I thought benefited me as a citizen (they’re very pro-war of any kind).

xyzzy_plugh today at 12:52 AM

The title freaked me out. I thought this was about the Wayback Machine going away but no, it's just news publications blocking being archived.

I guess I don't really care. As soon as it becomes unworkable to view these publications through archivers I'll just stop viewing them altogether. I don't see this helping their bottom line though.
