Hacker News

GlenTheMachine · 06/27/2025 · 17 replies

Space roboticist here.

As with a lot of things, it isn't the initial outlay, it's the maintenance costs. Terrestrial datacenters have parts fail and get replaced all the time. The mass analysis given here -- which appears quite good, at first glance -- doesn't include any mass, energy, or thermal system numbers for the infrastructure you would need in order to replace failed components.

As a first cut, this would require:

- an autonomous rendezvous and docking system

- a fully railed robotic system, e.g. some sort of robotic manipulator that can move along rails and reach every card in every server in the system, which usually means a system of relatively stiff rails running throughout the interior of the plant

- CPU, power, comms, and cooling to support the above

- importantly, the ability of the robotic servicing system to replace itself. In other words, it would need to be at least two-fault tolerant -- which usually means dual-wound motors, redundant gears, redundant harness, redundant power, comms, and compute. Alternatively, two or more independent robotic systems that are capable of not only replacing cards but also of replacing each other.

- regular launches containing replacement hardware

- ongoing ground support staff to deal with failures

The mass analysis also doesn't appear to include the massive number of heat pipes you would need to transfer the heat from the chips to the radiators. For an orbiting datacenter, that would probably be the single biggest mass allocation.
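
To give a sense of scale, here's a back-of-envelope radiator sizing sketch using the Stefan-Boltzmann law; the waste heat, radiator temperature, sink temperature, and areal density are all assumed placeholder values, not numbers from the article:

```python
# Rough radiator sizing from the Stefan-Boltzmann law.
# Every input below is an assumption for illustration only.
SIGMA = 5.670e-8        # Stefan-Boltzmann constant, W/(m^2 K^4)

p_waste = 40e6          # assumed waste heat to reject, W (a 40 MW-class facility)
t_radiator = 320.0      # assumed radiator surface temperature, K
t_sink = 250.0          # assumed effective sink temperature, K
emissivity = 0.9        # typical for white radiator coatings
sides = 2               # a flat deployed panel radiates from both faces

# Net heat rejected per square metre of panel
q = sides * emissivity * SIGMA * (t_radiator**4 - t_sink**4)   # W/m^2

area = p_waste / q                  # total panel area needed, m^2
areal_density = 5.0                 # assumed kg/m^2 for panel plus embedded heat pipes
mass = area * areal_density         # kg

print(f"{q:.0f} W/m^2 rejected -> {area:,.0f} m^2 of radiator, ~{mass/1e3:,.0f} t")
```

With inputs in that ballpark you get tens of thousands of square metres of panel and hundreds of tonnes of thermal hardware before counting the pipework that carries heat from the chips out to it, which is why the thermal system tends to dominate the mass budget.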


Replies

vidarh · 06/27/2025

I've had actual, real-life deployments in datacentres where we just left dead hardware in the racks until we needed the space, and we rarely did. Typically we'd visit a couple of times a year, because it was cheap to do so, but it'd have been totally viable to let failures accumulate over a much longer time horizon.

Failure rates tend to follow a bathtub curve, so if you burn in the hardware before launch, you'd expect low failure rates for a long period. It's quite likely it'd be cheaper not to replace components at all: just ensure enough redundancy for key systems (power, cooling, networking) that you can shut down and disable any dead servers, and then replace the whole unit once enough parts have failed.
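
A toy simulation of that argument (all the distribution parameters below are made up purely to illustrate the shape of the bathtub curve):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000          # servers launched
burn_in = 0.25    # years of ground burn-in before launch (assumed)
mission = 6       # years on orbit with no servicing

# Toy bathtub curve: a small "weak" subpopulation that dies early, a constant
# random-failure rate for everyone, and a wear-out mode late in life.
# All parameters are invented for illustration.
weak = rng.random(n) < 0.05
t_infant = np.where(weak, rng.exponential(0.1, n), np.inf)  # infant mortality
t_random = rng.exponential(100.0, n)                        # ~1%/yr random failures
t_wearout = rng.weibull(4.0, n) * 15.0                      # wear-out past year ~10
ttf = np.minimum.reduce([t_infant, t_random, t_wearout])    # first failure wins

flies = ttf > burn_in                 # units that die during burn-in never launch
ttf_orbit = ttf[flies] - burn_in

for year in range(1, mission + 1):
    dead = int((ttf_orbit < year).sum())
    print(f"year {year}: {dead}/{int(flies.sum())} dead ({100 * dead / flies.sum():.1f}%)")
```

The point is just that once the infant-mortality tail has been screened out on the ground, the no-servicing loss rate over a hardware generation stays in the single-digit percent range under these assumptions.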

NitpickLawyer · 06/27/2025

Appreciate the insights, but I think failing hardware is the least of their problems. In that underwater pod trial (Project Natick), Microsoft saw lower failure rates than expected (the nitrogen atmosphere could be a key factor there).

> The company only lost six of the 855 submerged servers versus the eight servers that needed replacement (from the total of 135) on the parallel experiment Microsoft ran on land. It equates to a 0.7% loss in the sea versus 5.9% on land.

6/855 servers over 6 years is nothing. You'd simply re-launch the whole thing in 6 years (with advances in hardware anyways) and you'd call it a day. Just route around the bad servers. Add a bit more redundancy in your scheme. Plan for 10% to fail.
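
A quick sketch of that kind of margin (the capacity target and loss fraction are illustrative, not from the article):

```python
import math

target_servers = 800     # capacity you actually need (illustrative)
planned_loss = 0.10      # plan for 10% of servers to die over the unit's life

launched = math.ceil(target_servers / (1 - planned_loss))
print(launched)          # 889: launch ~11% extra to still have 800 usable at end of life
```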

That being said, it's a completely bonkers proposal until they figure out the big problems, like cooling, power, and so on.

protocolture · 06/27/2025

Did Microsoft do any of that with their submersible tests?

My feeling is that, a bit like Starlink, you would just deprecate failed hardware rather than bother with all the moving parts needed to replace faulty RAM.

Does mean your comms and OOB tools need to be better than the average American colo provider's, but I would hope that would be a given.

lumost · 06/27/2025

I used to build and operate data center infrastructure. There is very limited reason to do anything more than a warranty replacement on a GPU. With a high-quality hardware vendor that properly engineers the physical machine, failure rates can be contained to less than 0.5% per year, particularly if the network has redundancy to avoid critical mass failures.

In this case, I see no reason to perform any replacements of any kind. Proper networked serial port and power controls would allow maintenance for firmware/software issues.
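
For a sense of what that looks like in practice, here's a minimal sketch of an out-of-band recovery step scripted against a BMC; ipmitool is just one common tool for this, and the host and credentials below are placeholders:

```python
import subprocess

# Hypothetical out-of-band recovery: if a node stops responding, query and
# power-cycle it through its BMC rather than touching the hardware.
# The host, user, and password below are placeholders.
BMC = ["ipmitool", "-I", "lanplus", "-H", "10.0.0.42", "-U", "admin", "-P", "secret"]

def power_status() -> str:
    """Return the chassis power state reported by the BMC."""
    out = subprocess.run(BMC + ["chassis", "power", "status"],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()   # e.g. "Chassis Power is on"

def power_cycle() -> None:
    """Hard power-cycle the node via the BMC."""
    subprocess.run(BMC + ["chassis", "power", "cycle"], check=True)

if __name__ == "__main__":
    print(power_status())
    # power_cycle()  # uncomment to actually bounce the node
```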

oceanplexian · 06/27/2025

Why does it need to be robots?

On Earth we have skeleton crews maintain large datacenters. If the cost of mass to orbit drops 100x, it's not that absurd to have an on-call rotation of humans to maintain the space datacenter and install parts shipped up on space FedEx or whatever we have in the future.

monster_truck · 06/27/2025

I suspect they'd stop at automatic rendezvous & docking. Use some sort of cradle system that holds heat fins, power, etc., that boxes of racks would slot into. Once they fail, just pop 'em out and let 'em burn up. Someone else will figure out the landing bit.

I won't say it's a good idea, but it's a fun way to get rid of e-waste (I envision this as a sort of old persons' home for parted-out supercomputers).

angadh · 06/29/2025

Thanks for the thorough comment -- yes, the heat pipes etc. haven't been accounted for. That might be a future addition, but the idea was to look at some key large parts and see where that takes us in terms of launch. The pipes would definitely skew the business case further. Similarly, the analysis is missing trusses.

Don’t even get me started on the costs of maintenance. I am sweating bricks just thinking of the mission architecture for assembly and how the robotic system might actually look. Unless there’s a single 4 km long deployable array (of what width?), which would be ridiculous to imagine.

Spooky23 · 06/27/2025

Don’t you need to look at different failure scenarios or patterns in orbit due to exposure to cosmic rays as well?

It just seems funny; I recall that when servers started getting more energy dense, it was a revelation to many computer folks that safe operating temps in a datacenter should be quite high.

I’d imagine operating in space has lots of revelations in store. It’s a fascinating idea with big potential impact… but I wouldn’t expect this investment to pay out!

RecycledEle · 06/28/2025

What if we just integrate the hardware so it fails softly?

That is, as hardware fails, the system loses capacity.

That seems easier than replacing things on orbit, especially if Starship becomes the cheapest way to launch to orbit, since Starship launches huge payloads, not a few rack-mounted servers.
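
A minimal sketch of that fail-soft math, assuming a flat annual loss rate and a capacity floor (both numbers are placeholders):

```python
import math

annual_loss = 0.02      # assumed fraction of servers failing per year
capacity_floor = 0.85   # relaunch once usable capacity drops below 85%

# With no servicing, capacity after t years is roughly (1 - annual_loss)**t
years_useful = math.log(capacity_floor) / math.log(1 - annual_loss)
print(f"~{years_useful:.1f} years until a replacement launch is needed")
```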

markemer · 06/29/2025

Not to mention radiation hardening. The soft error rate alone on these single-digit-nm chips would be massive.

hamburglar · 06/27/2025

Seems prudent to achieve fully robotic datacenters on earth before doing it in space. I know, I’m a real wet blanket.

empath75 · 06/27/2025

I think what you actually do is let it gradually degrade over time and then launch a new one.

callamdelaney · 06/27/2025

What, why would you fly out and replace it? It'd be much cheaper just to launch more.

intended · 06/27/2025

It sounds like building it on the moon would be better.

spullara · 06/27/2025

you don't replace it, you just let it fail and over time the datacenter wears out.

aaron695 · 06/27/2025

[dead]