Hacker News

Railway Blocked by Google Cloud

333 points by aarondf today at 12:23 AM | 149 comments

Comments

valgaze today at 3:32 AM

May 2024 UniSuper incident: https://cloud.google.com/blog/products/infrastructure/detail...

https://www.unisuper.com.au/about-us/media-centre/2024/a-joi...

A joint statement from UniSuper CEO Peter Chun and Google Cloud CEO Thomas Kurian

8 May 2024

UniSuper and Google Cloud understand the disruption to services experienced by members has been extremely frustrating and disappointing. We extend our sincere apologies to all members.

While supporting UniSuper to bring its systems back online, Google Cloud has been conducting a root cause analysis.

Thomas Kurian has confirmed that the disruption arose from an unprecedented sequence of events, where an inadvertent misconfiguration during provisioning of UniSuper’s Private Cloud services ultimately resulted in the deletion of UniSuper’s Private Cloud subscription.

This is described as an isolated, “one-of-a-kind occurrence” that has never before occurred with any Google Cloud client globally. This should not have happened. Google Cloud has identified the sequence of events and taken measures to ensure it does not happen again.

Why did the outage last so long?

UniSuper had duplication across two geographies as protection against outages and data loss. However, the deletion of the Private Cloud subscription triggered deletion across both geographies.

Restoring the Private Cloud required significant coordination and effort between UniSuper and Google Cloud, including recovery of hundreds of virtual machines, databases, and applications.

dangoodmanUT today at 1:22 AM

It has been 0 days since GCP has taken down a startup (again).

You see this at least once a year. Never heard of this from AWS or Azure.

In all seriousness, this is why we don't use them. They have the most ergonomic cloud of the big three, then absolutely murder it by having this kind of reputation.

binarycleric today at 2:00 AM

How the heck do these things happen, especially with companies with huge monthly spend? At my last job we had some suspicious workloads running on AWS and our TAM reached out to us before taking any action. Who wants to bet this was some AI automation gone wrong and because GCP seems to be allergic to actually contacting a human to get a response, this just sits in some support queue that outsourced workers look at after a few hours just to give a canned response?

BitWiseVibe today at 2:07 AM

As someone who runs some public APIs, the amount of spam from Railway IPs is insane. They have horrible abuse prevention. Hopefully this encourages them to improve their operations.

chatmasta today at 2:57 AM

I thought Railway was building their own data centers? [0]

> The fact of the matter is, you simply cannot build a cloud on someone else’s cloud.

Indeed…

[0] https://blog.railway.com/p/launch-week-02-welcome

mjy78 today at 4:16 AM

All in on cloud so we don’t need to worry about backups. Now your subscription is the single point of failure.

jaspanglia today at 4:36 AM

Cloud platform dependencies are becoming a huge single point of failure

bearjaws today at 2:02 AM

I will never leverage GCP in an enterprise setting, it's honestly amazing how hard they fumble the bag. Will be interesting to see when GCP support started working with them, from the updates there was an hour and change from when they identified the issue and GCP support was confirmed.

In the cloud space it seems like AWS does nothing and wins.

brokenodo today at 2:20 AM

Well, as a 2 week tenured and very happy Railway customer until now, I am now a Render customer. Somehow DNS cut over within 1 min(!) and live after about 30 minutes of work. Not bad!

steve1977 today at 4:40 AM

Lesson learned: don't rely on a single hyperscaler, even (or especially) as a startup.

UrbanNorminal today at 2:05 AM

Is Google allergic to humans or something? Can't they just send an email or call the company before taking a wrecking ball to the entire company's infra? Are they stupid?

codegeek today at 1:33 AM

This is bad. Even their own website is down at railway.com. Looks like total dependency on Google Cloud. Surprising for a company of their scale with all this VC money.

padolsey today at 1:59 AM

Does anyone know how this even happens inside the walls of Google? Is it an automated process? How is such a (presumably) high revenue account just magically blocked without human intervention? I'm quite perplexed.

r_lee today at 1:39 AM

seriously, is it possible to trust GCP with critical data/services at this point if you're not a billion dollar company?

I'm exaggerating but someone said they got "auto banned"

what if that happens to a small account which hosts some really important data/services there?

sammy2255 today at 2:53 AM

The 3-2-1 backup rule is pretty outdated in the world of cloud. You could have 3 complete copies of your data in different S3 buckets, but if they're all under the same account you've lost your blast radius protection

tux today at 1:45 AM

At this point you can’t trust Google anymore; it keeps breaking things. Imagine having Google AI do these things automatically. We’d have an apocalypse in a day.

jefborges today at 1:49 AM

Railway is back, but I’m not sure if I can trust keeping my projects there, so I’m going to migrate to another company.

usernametaken29 today at 2:46 AM

I didn’t know Railway, so with this misleading headline I thought a Google Cloud data centre was being built in the way of a railroad. That would have been a funny story to read.

zelon88 today at 3:10 AM

Wild to me that any tech sector business would want to rent an operating environment to park their entire infrastructure into. This is the equivalent to traveling shoe salesmen setting up a tent in the parking lot of a strip mall.

hnburnsy today at 2:50 AM

From their founder on X...

"Absolutely. The Railway network is a mesh ring between AWS, GCP, and Metal

So:
- High availability interconnects
- High availability path routing between clouds
- Database itself is high availability

However, Google's VPC itself is not. So we will add a shard to Metal and AWS"

bilalq today at 3:57 AM

Building a startup on GCP (or even Google Workspace) is an existential risk.

orliesaurus today at 1:51 AM

I wonder if someone has exploited a weird Google-safety automated process to report something on Railway which caused Google to block the whole thing.

gnabgib today at 12:24 AM

Dupe - join the discussion started an hour ago instead of query string work (12 points, 4 comments) https://news.ycombinator.com/item?id=48200827

koolhead17 today at 4:01 AM

Let's blame some rogue AI agent at GCP for causing this.

eezing today at 4:02 AM

“Deletion of private cloud subscription…”

Who deleted it?

parineum today at 2:19 AM

There's a lot of what seems to me unfounded blame being directed at Google for this. Isn't Railway the company that just blamed Anthropic for deleting their prod database?

thrownthatway today at 4:20 AM

Huh.

Railway dot com

Has nothing to do with railways.

I wish software people would get their own words.

jujube3 today at 3:00 AM

If you buy a cloud-on-a-cloud, you're a clown-on-a-clown.

redanddead today at 2:16 AM

one of the many reasons companies are cloud agnostic and don't want to get locked in

shevy-java today at 4:00 AM

Do not become dependent on Google. Ever.

isninkhamiss today at 1:35 AM

github got way more noise for less

fnord77 today at 3:21 AM

wish I knew what "railway" is

rvz today at 1:31 AM

Let me guess… Googler running AI agent in production that blocked this startup’s account.

rekabis today at 12:55 AM

TL;DR: putting all your eggs into one basket is bad, man.
