
Gone but Not Forgotten: Recovering the Dead Web

104 points by wslh | last Tuesday at 9:48 PM | 42 comments

Comments

firefoxd today at 3:51 AM

There was a website that I had quoted a long time ago. The author said something like "when the robots are taking over the world, don't panic. Buy a robot." I loved it. So I linked to it on my old blog. Then years later, I went to the source only to find that the page returned a 404. So I linked to the Wayback Machine instead. But then it was removed from archive.org. I can't even remember the name of the website at this point, just that it had the word "café" in it.

Anyway, all this to say that since there are no sources for this quote, then I'm the new original source. You can quote me on that.

https://imgflip.com/memegenerator/117370206/You-made-thisI-m...

badlibrarian today at 4:24 AM

"Do you do backups too, for example to guard against corrupt data getting mirrored across both copies, or accidental deletion?"

John Gonzalez, Internet Archive infrastructure lead, replied:

"We have done experiments to confirm that we can back up large portions of our corpus... but this is not a regular practice for us at this time."

https://blog.archive.org/2016/10/25/20000-hard-drives-on-a-m...

com2kid today at 4:38 AM

Long ago in Seattle there was a network of BBSs and the head board was called Rat City. They had a lot of work by local artists (mostly tracker files and digital artwork IIRC).

I have not been able to find a single hint of their existence. Everything about what was once a collection of artistic works, wiped from the earth.

We really need to do a better job managing our historical legacy.

Venn1 today at 2:04 PM

Anyone in their late 30s or early 40s should be grateful. We got to be stupid teenagers on the internet without any of it going on our permanent record.

johnea last Wednesday at 2:39 AM

In modern times, archive.org is an international treasure.

Which of course means it's facing major opposition from capital interests.

Apparently no one ever thought an incoming presidential administration would literally wipe gigabytes of government funded research results off the web.

Now we see in bold type how precarious is our democracy...

vbernat today at 9:43 AM

I often run a link checker on my blog and substitute broken URLs with links to the Wayback Machine. Unfortunately, it is becoming quite difficult to detect broken URLs, as everybody is fighting bots. I am using linkchecker <https://github.com/linkchecker/linkchecker/>, which respects robots.txt, but many sites now serve 503 or various other codes.
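A minimal sketch of the substitution step described above, not the commenter's actual setup. It assumes that 404 and 410 mean a page is really gone, while 403, 429, and 503 are treated as "possibly bot-blocked" and left for manual review. The function names and the status groupings are hypothetical; the `https://web.archive.org/web/<timestamp>/<url>` form is the usual Wayback Machine snapshot address, which is generally expected to resolve to the capture closest to the given timestamp.

```python
# Status codes that usually mean the page itself is gone (assumption).
GONE_STATUSES = {404, 410}
# Status codes often returned by anti-bot defenses (assumption).
BLOCKED_STATUSES = {403, 429, 503}


def classify_status(status: int) -> str:
    """Return 'ok', 'broken', or 'inconclusive' for an HTTP status code."""
    if status in GONE_STATUSES:
        return "broken"
    if status in BLOCKED_STATUSES:
        return "inconclusive"
    if 200 <= status < 400:
        return "ok"
    # Anything else (other 4xx/5xx) is not enough evidence either way.
    return "inconclusive"


def wayback_url(url: str, timestamp: str) -> str:
    """Build a Wayback Machine link for `url` near `timestamp`.

    `timestamp` is a digit string from YYYY up to YYYYMMDDhhmmss.
    """
    if not timestamp.isdigit() or not 4 <= len(timestamp) <= 14:
        raise ValueError("timestamp must be 4 to 14 digits")
    return f"https://web.archive.org/web/{timestamp}/{url}"


def replacement_for(url: str, status: int, timestamp: str) -> str:
    """Return the link to publish: the archive link only if clearly broken."""
    if classify_status(status) == "broken":
        return wayback_url(url, timestamp)
    return url
```

With this split, a 503 from a site that is fighting bots keeps its original link instead of being silently rewritten, and only clear 404/410 responses are swapped for an archive link.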

conception today at 2:01 PM

This brought to mind an inadvertent benchmark I made between the big three AIs. I had them all research an old BBS I used to use, hunting for a DOOR game I played on it. I gave the details about the BBS that I could remember and a pretty well-defined description of the game, and threw it at the deep researchers to see what they could find.

ChatGPT gave me about a ten page report on who ran the bbs and the name of the game. When I looked into it the game was totally different and the guy named had nothing to do with the bbs. “Since these were both popular items at the time, I just inferred.” But it had fabricated the entire report. Nothing in it was true.

Gemini did the same thing but the report was about twenty pages. 100% hallucinated.

Claude said it couldn’t find any information.

Best advertisement I’ve ever lived.

I still hunt for the door game today….

SubiculumCode today at 2:51 PM

Vanishing culture is the default, and not just on the internet, but for all time.

zerobees today at 5:20 AM

I'll take a contrarian view and probably get downvoted for it, but many people are adamant about the benefits of indiscriminate archiving, and I just don't see it. Do we have a moral right to keep a copy of everything that's ever been written on the internet, basically just for the sake of it?

Sure, there's a variety of official and quasi-official resources that should be treated as public record and preserved. And arguably, there are things that rise to the level of a cultural phenomenon, where the benefit of keeping receipts outweighs the jerk factor of never asking for permission and not respecting the wishes of private individuals.

But if it's some family blog from 30 years ago that's been deliberately taken down and lives on archive.org unbeknownst to the original owner? Do we have a right to that? To what end, other than "well, future historians may need it"? A historian won't look at it. A person trying to doxx you or shame you will.

david_shi today at 3:47 AM

Is this the story of Johnny Rotten?

theodric today at 4:21 PM

I had a collection of DEC LSI-11 technical/troubleshooting links captured in 2021 when I last was working on a young PDP. When I checked them again in 2024, more than HALF were gone, but (praise be to Kahle) all had been archived.

The saying goes: "when an old man dies, a library burns." Doubly so when his heirs don't pay the hosting bill. The boomers and previous were great at writing things down, but we're rapidly losing them.

Razengan today at 8:00 AM

Just yesterday I saw a post on HN about "Half baked product" and it reminded me of halfbakery.com

And it's still around! 27 years!! and doesn't seem to be enshittified..

Freezone today at 8:20 AM

[dead]