These things are dangerous. Someone who can take AWS down, such as an employee, could place a bet.
These bets aren’t as innocent as they seem because the bettors can often influence or change the outcome.
I thought cooling was pretty much pre-planned in any data center, and you simply don't install more stuff than you can cool?
So did some cooling equipment fail here or was there an external reason for the overheating? Or does Amazon overbook the cooling in their data centers?
I wonder if Hetzner had better uptime in the EU than AWS this year.
There sure are a lot of eggs in that East basket.
It's always us-east-1... Jokes aside, I don't understand why us-east-1 goes down so much more often than other regions. It should be pretty similar to the other regions architecture-wise.
Coinbase claimed multiple AZs were down but the AWS statement was that only a single AZ was affected. Does anyone have more details?
Down 2 of the last 365 days. My Ubuntu NAS has been down 0 of the last 365 days.
Come and give me your cash if you want resilience.
Could someone explain to me why they don't build these things near oceans? Like nuclear plants, which need plenty of cooling capacity too.
A two-loop cycle with a heat exchanger to get rid of the heat.
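Just to put rough numbers on that idea (every figure below is an illustrative assumption, not a real data-center number), here's a tiny energy-balance sketch for the seawater side of such a heat exchanger:

    # Back-of-the-envelope sizing for the seawater side of a two-loop cooling
    # system. Every number here is an illustrative assumption.
    SEAWATER_SPECIFIC_HEAT = 3990.0  # J/(kg*K), roughly, for seawater
    SEAWATER_DENSITY = 1025.0        # kg/m^3, roughly

    def seawater_flow_m3_per_s(heat_load_mw: float, temp_rise_k: float) -> float:
        """Volume flow of seawater needed to carry away heat_load_mw megawatts
        if the outer loop is allowed to warm by temp_rise_k kelvin
        (energy balance: Q = m_dot * c_p * dT)."""
        mass_flow_kg_s = (heat_load_mw * 1e6) / (SEAWATER_SPECIFIC_HEAT * temp_rise_k)
        return mass_flow_kg_s / SEAWATER_DENSITY

    # Hypothetical example: a 30 MW facility, 8 K allowed temperature rise.
    print(f"{seawater_flow_m3_per_s(30, 8):.2f} m^3/s of seawater")

That works out to roughly one cubic metre of seawater per second for the made-up 30 MW / 8 K case, which is why the intake and permitting side of ocean cooling is the hard part, not the thermodynamics.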
I've been using AWS since S3 came out and I've yet to see any major company do multi-AZ failover in any capacity whatsoever. Default region FTW.
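To be fair, even a crude application-level fallback isn't much code. A minimal sketch, assuming hypothetical per-AZ endpoints with a /healthz path (none of these URLs are real):

    # Crude application-level AZ failover: try the primary endpoint, then the
    # standby in another AZ. All URLs are hypothetical.
    import urllib.error
    import urllib.request

    ENDPOINTS = [
        "https://api-use1a.example.com/healthz",  # primary, us-east-1a (made up)
        "https://api-use1c.example.com/healthz",  # standby, us-east-1c (made up)
    ]

    def first_healthy(endpoints, timeout_s=2.0):
        """Return the first endpoint whose health check answers 200, else None."""
        for url in endpoints:
            try:
                with urllib.request.urlopen(url, timeout=timeout_s) as resp:
                    if resp.status == 200:
                        return url
            except (urllib.error.URLError, TimeoutError):
                continue  # this AZ's endpoint is unreachable; try the next one
        return None

    print(first_healthy(ENDPOINTS) or "no healthy endpoint; page someone")

The hard part isn't the failover check, it's keeping the standby AZ's data and capacity warm enough that failing over actually works, which is where most companies stop.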
Right, cooling.
I don't see anything on downdetector suggesting this was particularly disruptive.
This company was once known for having super reliable services; now I've heard it's scrambling to rehire some of the engineers it overconfidently "replaced" with AI.
When customers pay for cloud services, they expect them to be maintained by competent engineers.
edit: Not sure why the downvotes. If you fire the engineers that have been keeping your systems running reliably for years, what do you expect to happen?
I bet the post-mortem will say vibe coding confused Fahrenheit and Celsius, and we're just running too hot...
us-east-1 is down? shocking! stop putting SPOF services there. this location has had frequent issues for the past 15 years.
So in the comments here we have the usual about us-east-1: it's centralized, it's a SPOF for AWS, they should fix it, don't put your stuff there, etc.
This was one data centre in one zone of a multi-zone region.
Yes, IAM/R53 and others are centralized there, and yes, reworking those services to be decentralized and cross-region would be a Good Thing. But us-east-1 is already multi-zone (6 AZs, with a seventh marked as "coming in 2026"; a quick way to check is sketched below) with multiple DCs within zones. From memory, when a global service like IAM goes out, it's more likely to be a bug in the implementation or in a dependency than an "if this were cross-region it wouldn't have died" issue.
But this wasn't an outage of any AWS global service this time. The only service that seemed to have broader impact was/is MSK, which is likely more of an issue with Kafka than anything AWS-related.
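For the "already multi-zone" point, you can see the zones a region exposes to your account in a couple of lines of boto3 (zone names are account-specific mappings; the zone IDs are the stable identifiers):

    # List the availability zones us-east-1 exposes to this account.
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")
    resp = ec2.describe_availability_zones(
        Filters=[{"Name": "state", "Values": ["available"]}]
    )
    for az in resp["AvailabilityZones"]:
        print(az["ZoneName"], az["ZoneId"])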
AWS’s US-East 1 continues to be the Achilles heel of the Internet.
And while, yes, building across multiple regions and AZs is a thing, AWS has had a string of issues where US-East 1 has had broader impact, which makes things far less redundant and resilient than AWS implies.
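If you do want the multi-region story on the client side, the read path can be as dumb as trying a replicated copy in a second region when the primary fails. A minimal sketch with boto3, assuming the data is already replicated (bucket names and key are hypothetical):

    # Cross-region read fallback, assuming the bucket is already replicated
    # (e.g. via S3 cross-region replication). Bucket names and key are hypothetical.
    import boto3
    from botocore.exceptions import ClientError, EndpointConnectionError

    REPLICAS = [
        ("us-east-1", "my-data-primary"),  # primary bucket (made up)
        ("us-west-2", "my-data-replica"),  # replicated copy (made up)
    ]

    def get_with_fallback(key):
        """Return the object body from the first region that answers."""
        last_err = None
        for region, bucket in REPLICAS:
            s3 = boto3.client("s3", region_name=region)
            try:
                return s3.get_object(Bucket=bucket, Key=key)["Body"].read()
            except (ClientError, EndpointConnectionError) as err:
                last_err = err  # region down or object missing; try the next replica
        raise RuntimeError("all replicas failed") from last_err

    print(len(get_with_fallback("reports/latest.json")), "bytes")

Of course, the moment your control plane (DNS, IAM, deploy tooling) only lives in us-east-1, the data-plane fallback doesn't save you, which is exactly the complaint upthread.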