I've had actual, real-life deployments in datacentres where we just left dead hardware in the racks until we needed the space, and we rarely did. Typically we'd visit a couple of times a year, because it was cheap to do so, but it'd have been totally viable to let failures accumulate over a much longer time horizon.
Failure rates tend to follow a bathtub curve, so if you burn in the hardware before launch, you'd expect low failure rates for a long period. It's quite likely it'd be cheaper not to replace components at all: build enough redundancy into the key systems (power, cooling, networking) that you can just shut down and disable any dead servers, then replace the whole unit once enough parts have failed.
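For a rough feel of what the bathtub shape plus burn-in buys you, here's a toy Monte Carlo sketch in Python. The Weibull shapes, scales and the 3-month burn-in window are made-up illustrative numbers, not real datacentre failure data:

```python
"""Toy Monte Carlo of a bathtub curve: all rates below are illustrative
assumptions, not measured failure data."""
import random

random.seed(42)

def lifetime_years():
    # Competing risks: whichever failure mode strikes first kills the unit.
    infant  = random.weibullvariate(250.0, 0.5)  # early failures (decreasing hazard, shape < 1)
    rand    = random.weibullvariate(20.0, 1.0)   # roughly constant background failure rate
    wearout = random.weibullvariate(7.0, 4.0)    # ageing/wear-out (increasing hazard, shape > 1)
    return min(infant, rand, wearout)

def survivors(n, horizon, burn_in=0.0):
    """Fraction of n units still alive `horizon` years after deployment,
    optionally after a burn-in period that weeds out early failures."""
    alive = 0
    for _ in range(n):
        t = lifetime_years()
        while t < burn_in:          # died during burn-in: swap it out before launch
            t = lifetime_years()
        if t - burn_in >= horizon:
            alive += 1
    return alive / n

for years in (1, 3, 5):
    raw    = 1 - survivors(20_000, years)
    burned = 1 - survivors(20_000, years, burn_in=0.25)  # ~3-month burn-in
    print(f"{years}y dead: {raw:5.1%} without burn-in, {burned:5.1%} with burn-in")
```

The point isn't the exact percentages; it's that burn-in shifts the infant-mortality part of the curve to the ground, so whatever you launch spends most of its life in the flat, low-failure-rate part of the bathtub.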
The analysis has zero redundancy for either servers or support systems.
Redundancy is a minor cost on Earth, but it completely changes the calculation for space: you need more of everything, which makes the already-unfavourable volume and mass budget even less plausible. Without backup cooling and power, one small failure could take the entire facility offline.
And active cooling - which is a given at these power densities - requires complex pumps and plumbing which have to survive a launch.
The whole idea is bonkers.
IMO you'd be better off thinking about a swarm of cheaper, simpler, individual serversats or racksats connected by a radio or microwave comms mesh.
I have no idea if that's any more economical, but at least it solves the most obvious redundancy and deployment issues.
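To get a feel for what the redundancy actually costs in a swarm, here's a rough binomial sizing sketch; the 90% per-unit survival probability and the 1000-unit capacity target are arbitrary assumptions, just to show the shape of the calculation:

```python
"""Back-of-envelope sizing for a serversat swarm: how many extra units to
launch so enough survive with high probability. All numbers are assumptions."""
from math import comb

def p_at_least(n, k, p):
    """P(at least k of n independent units are still alive)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def units_needed(k, p, confidence=0.99):
    """Smallest fleet size n such that >= k units survive with the given confidence."""
    n = k
    while p_at_least(n, k, p) < confidence:
        n += 1
    return n

# Suppose we need 1000 working serversats and each unit independently has a
# 90% chance of still working at end of mission (made-up figure).
need, p_survive = 1000, 0.90
n = units_needed(need, p_survive)
print(f"launch {n} units ({n - need} spares, {100 * (n / need - 1):.1f}% over-provisioning)")
```

Because failures of small independent units average out, the over-provisioning needed is only a bit above the raw failure rate, which is a much nicer mass penalty than duplicating the power/cooling/networking of one big facility.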
serious q: how much extra failure rate would you expect from the physical transition to space?
on one hand, I imagine you'd rack things up so the whole rack etc. moves into space as one unit; OTOH there's still movement and things "shaking loose", plus the vibration and acceleration of the flight and the loss of gravity...
The original article even addresses this directly. Plus, hardware turns over fast enough that you'll simply be replacing modules containing a smattering of dead servers with entirely new generations anyway.
It would be interesting to see whether that failure-rate curve over time holds up after a rocket launch and time spent in space. My guess is that it wouldn’t, but that’s just a guess.
I'd naively assume that the stress of launch (vibration, G-forces) would trigger failures in hardware that had been working on the ground. So I'd expect to see a large-ish number of failures on initial bringup in space.
Yes. I think I read a blog post from Backblaze, some 10 years ago, about running their red Storage Pod rack-mounted chassis.
They would just keep the failed drives in the chassis, and maybe swap out the entire chassis if enough drives died.
A new meaning to the term "space junk"
Exactly what I was thinking when the OP comment brought up "regular launches containing replacement hardware". This is easily solvable by actually "treating servers as cattle and not pets": over-provision servers, then replace the faulty ones in bulk around once per year.
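As a rough sanity check on that cadence, here's a toy simulation of a yearly resupply with over-provisioning; the 5% annual failure rate and 8% over-provisioning are made-up numbers, just to show the calculation:

```python
"""Toy simulation of 'cattle, not pets' with a yearly resupply launch:
over-provision, let servers die during the year, then top the fleet back up.
Failure rate and fleet size are illustrative assumptions."""
import random

random.seed(1)

def simulate(target=10_000, overprovision=0.08, annual_fail=0.05, years=5):
    fleet = int(target * (1 + overprovision))
    alive = fleet
    worst = alive
    for year in range(1, years + 1):
        # Each live server independently fails at some point during the year.
        failures = sum(random.random() < annual_fail for _ in range(alive))
        alive -= failures
        worst = min(worst, alive)   # capacity low point, just before resupply
        print(f"year {year}: {failures} failures, {alive} alive before resupply")
        alive = fleet               # resupply launch restores the full fleet
    print(f"lowest capacity seen: {worst} (target {target})")

simulate()
```

With assumptions like these, a single-digit-percent over-provision keeps capacity above target all year, and the annual launch just brings the fleet back to full strength.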
Side note: Thanks for sharing the "bathtub curve" - TIL, and I'm surprised I hadn't heard of it before, especially since it's central to reliability engineering (searching HN via Algolia, it seems no post about the bathtub curve has crossed 9 points).