Hacker News

toomuchtodo yesterday at 9:07 PM

Not opposed. Wikimedia tech folks are very accessible in my experience; ask them to make a GET or POST to https://web.archive.org/save whenever a link is added via the Wiki editing mechanism. Easy peasy. Example CLI tools are https://github.com/palewire/savepagenow and https://github.com/akamhy/waybackpy

The shortcut is to consume the Wikimedia changelog firehose and make these HTTP requests yourself, performing a CDX lookup first to see whether a recent snapshot already exists before issuing a capture request (to be polite to the capture worker queue).
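
A rough sketch of that check-then-capture flow in Python, using only the standard library. The function names and the 30-day freshness threshold are my own assumptions, not anything IA prescribes, and the save endpoint may rate-limit or want authentication for heavy use, so treat this as a starting point rather than a finished client. Reading the firehose itself is left out.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timedelta, timezone

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"
SAVE_ENDPOINT = "https://web.archive.org/save/"


def cdx_query_url(target: str) -> str:
    """Build a CDX query that asks only for the timestamp of the latest capture."""
    params = {"url": target, "output": "json", "fl": "timestamp", "limit": "-1"}
    return CDX_ENDPOINT + "?" + urllib.parse.urlencode(params)


def latest_capture(cdx_rows):
    """Turn CDX JSON rows (header row first) into an aware datetime, or None."""
    if len(cdx_rows) < 2:
        return None  # only a header, or nothing at all: never captured
    stamp = cdx_rows[-1][0]  # 14-digit YYYYMMDDhhmmss, in UTC
    return datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def needs_capture(cdx_rows, now, max_age=timedelta(days=30)):
    """True if there is no snapshot or the newest one is older than max_age.

    max_age is a hypothetical politeness threshold, tune it to taste.
    """
    last = latest_capture(cdx_rows)
    return last is None or now - last > max_age


def archive_if_stale(target, max_age=timedelta(days=30)):
    """Look up the URL in CDX and request a capture only when it is stale."""
    with urllib.request.urlopen(cdx_query_url(target), timeout=30) as resp:
        body = resp.read().decode("utf-8").strip()
    rows = json.loads(body) if body else []
    if not needs_capture(rows, datetime.now(timezone.utc), max_age):
        return False  # recent snapshot exists, leave the capture queue alone
    with urllib.request.urlopen(SAVE_ENDPOINT + target, timeout=120):
        pass  # a successful response means the capture request was accepted
    return True
```

Calling `archive_if_stale("https://example.com/")` for each newly added link would skip anything captured within the threshold and only hit the save endpoint for the rest.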


Replies

Gander5739 yesterday at 9:36 PM

This already happens. Every link added to Wikipedia is automatically archived on the Wayback Machine.

jsheard yesterday at 9:11 PM

I didn't know you could just ask IA to grab a page before their crawler gets to it. In that case, yeah, it would make sense for Wikipedia to ping them automatically.

ferngodfather yesterday at 9:19 PM

Why wouldn't Wikipedia just capture and host this themselves? Surely it makes more sense to DIY than to rely on a third party.

RupertSalt yesterday at 9:11 PM

Spammers and pirates just got super excited at that plan!
