Hacker News

iso1631 yesterday at 1:04 PM, 3 replies

Been through this recently in a fairly large enterprise.

We have some in-house software which runs in k8s. Total throughput peaks at about 1 Mbit/s of control traffic - it's controlling some other devices which are on dedicated hardware. Total of 24 GB of RAM.

The software team say it needs to run across 3 different servers for resilience purposes.

The VM team want to use Nutanix as their VM platform, so they can live migrate a VM from one host to another.

They insist on 25 Gbit networking, and for resilience purposes that needs to be MLAGged.

The network team also have to have multiple switches and routers, again for resilience.

So rather than having 3 $1000 laptops running bare metal kubes hanging off a pair of $500 1G switches eating maybe 200W, we have a $140k BOM sucking up 2kW.

When something goes wrong all those layers of resilience will no doubt fight each other. The hardware drops, so the VM freezes while it's restored onto another host, so k8s moves the workloads, then the VM comes back and k8s gets confused (maybe? I don't know how k8s works).

It's all needlessly overspecced, costing 30 times as much as it should.
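As a rough sanity check on that ratio, here is a back-of-the-envelope sketch using only the figures quoted above. It assumes the cheap option is exactly three laptops plus two switches with nothing else on the bill, and the variable names are just for illustration.

```python
# Back-of-the-envelope comparison using only the figures quoted in this comment.
# Assumption: the cheap option is exactly 3 laptops + 2 switches, nothing else.
laptops = 3 * 1000          # three $1000 laptops
switches = 2 * 500          # a pair of $500 1G switches
cheap_bom = laptops + switches

enterprise_bom = 140_000    # the $140k BOM
cost_ratio = enterprise_bom / cheap_bom

cheap_watts = 200           # "maybe 200W"
enterprise_watts = 2000     # 2 kW
power_ratio = enterprise_watts / cheap_watts

print(f"cheap BOM: ${cheap_bom}, cost ratio: {cost_ratio:.0f}x, power ratio: {power_ratio:.0f}x")
```

On those numbers the cost gap comes out a little above 30x, and the power draw is about 10x.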

But from each individual team's point of view it makes sense. They don't want to be blamed if it doesn't work, and they don't have to find the money. It's different departments.


Replies

amluto yesterday at 1:52 PM

One of my favorite bits of hardware is a UPS. I’ve played with several over the years, from fancy server-grade rack-mount APC stuff to inexpensive edge stuff. Without exception, downtime is increased by use of a UPS. I used to plug a server with redundant PSUs into the UPS and the wall so it could ride out UPS glitches.

Even today, a UPS that turns itself back on once power is restored after an outage long enough to drain the battery is somewhat exotic. Amusingly, even the new UniFi UPSes, which are clearly meant to be shoved in a closet somewhere, supposedly turn off and stay off when the battery drains, according to forum posts. There are no official docs, of course.

torginus yesterday at 2:32 PM

The funniest thing about huge enterprises is that their processes are often so convoluted and restrictive that getting stuff done by the book is basically impossible, so people get creative with the limitations and we often end up with the sketchiest solutions in existence.

I hope the words 'web server hosted in Excel VBA' illustrate the magnitude of horrors that can emerge in these situations.

lijok yesterday at 3:32 PM

which is exactly why this being different departments makes no sense

one infra team - provides the entire platform

any other approach and you’re dicking around