The biggest mistake I made was high uptime. arjie.com was up for 10 years plus on a Hetzner VPS so that by the time they wanted to sunset the machine underlying I had no idea what my teenage self had set up. I have the backups but the site hasn’t been up in a decade…
Nowadays I build things so that they move and I have moved things about a bit so I know they work.
This reminds me of Ise Shrine in Japan, which is completely dismantled then rebuilt every 20 years.
This is top of mind because I recently read Breakneck by Dan Wang. He makes the case that this practice of rebuilding the shrine preserves knowledge that would otherwise have been lost to time. Wang contrasts Ise Shrine with Notre Dame, where rebuilding the roof is apparently quite difficult, perhaps in part due to the loss of knowledge. I'm not familiar enough with either structure to judge whether this is a fair comparison, but I like the principle.
(Edit to add: This is only a minor analogy from the book, which I highly recommend overall.)
Indeed, for a VM, high uptime makes little sense, because a reboot takes a few seconds, and an upgrade requires no downtime, just switching the DNS to a new instance.
For a physical machine which you can't easily copy, it's a different story.
I started putting things in a big ansible playbook repo. Don't need to have it fully managed by ansible either I mostly just have setup configured there I still do lots of by hand management.
Sometimes I leave Architectural Decision Records for personal projects. It feels silly but it honestly comes in handy more times than expected
I hear you. On the other hand, not having to mess with something is good. I just make extensive notes in a README somewhere - usually in KeePass right next to the system info.
I stood up a dokuwiki instance recently and then documented how to stand up dokuwiki, haha.
I disabled revision history viewing and have a public portion and a private portion. I use it to track things I'm learning and document rollout procedures and commands I need for things. So far I have rclone backups into S3 Glacier, Tuwunel(Matrix) server deployment with voice/video support, and various little tutorials on server stuff I'm learning.
TLDR use a wiki!
> The biggest mistake I made was high uptime. arjie.com was up for 10 years plus on a Hetzner VPS so that by the time they wanted to sunset the machine underlying I had no idea what my teenage self had set up. I have the backups but the site hasn’t been up in a decade
LLMs have solved this problem, they’ll happily deal with the software archaeology on your behalf. This is the kind of task they really excel at.
"The biggest mistake I made was high uptime"
Quite. I'm old enough to remember machine uptime being a badge of honour.
However, being older and not really wiser, I look for service uptime these days. Yes we did have similar back in the day, that's why MX and the like DNS records exist.
Old school clusters were pretty esoteric but the lessons were learned (split brain n that) and that's why we still argue the toss with kiddies about why a Proxmox cluster with two nodes is fucked and why we recommend an additional "witness".
I don't care that VMware glossed over the whole two node HA cluster thing years ago with a massive bodge. They were wrong then and they are probably still wrong because that nonsense is probably still baked in.
Sorry, slight digression.
High uptime implies no patching. We all love patching.