Hacker News

beernet (yesterday at 6:24 PM, 3 replies)

I am far more surprised by the actual uptime than by the downtime. Hard to imagine how difficult this must be, given the speed of growth.


Replies

nippoo (yesterday at 6:34 PM)

Truly! As someone who's worked with HPC and GPUs in a scientific research context, trying to get a service like this to work reliably is a different ballgame to your usual webapp stack...

wrs (yesterday at 6:44 PM)

On the other hand, the status page is blaming the authentication system, which one would think is not a frontier-class problem.

Havoc (yesterday at 11:48 PM)

Would have thought that, compared to training, the serving part is pretty easy. Less of an “everything needs to come together at once” problem and more a matter of moving demand to a working cluster if one bombs, and having some spare capacity.