> 300 megawatts of new capacity (over 220,000 NVIDIA GPUs)
The scale is just mind-boggling here. Are there any blog posts or anything discussing what kind of infrastructure is used for even just the inference side (never mind the training) for SotA models like Opus? I would have thought it might be secret, but given that you can actually run the models yourself on AWS Bedrock, doesn't that give some indication?
> but given that you can actually run the models yourself on AWS Bedrock
That's not exactly how it works. Anthropic are hosting their models on AWS Bedrock as a managed service. Customers call those LLMs just like calling any other API. There's no visibility into what kind of AWS infrastructure is serving that API request.
All evidence is that the final training runs span thousands to low tens of thousands of GPUs, and that a single instance of the resulting model runs (or could run) well within a rack (i.e. an NVL72).
The massive scale is all massively parallel: test-time compute for users, test-time compute for RL rollouts (and probably, increasingly, the environments for those rollouts), other synthetic data generation, research experiments, …
Probably the best source: https://jax-ml.github.io/scaling-book/
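To make "one model instance sharded well within a rack" a bit more concrete, here's a minimal JAX sketch in the spirit of that book. The mesh layout, shapes, and names are all invented for illustration; this says nothing about Anthropic's actual serving stack:

    import jax
    import jax.numpy as jnp
    from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

    # Hypothetical setup: all local GPUs form one "tensor parallel" axis.
    mesh = Mesh(jax.devices(), axis_names=("tp",))

    # A made-up MLP weight, column-sharded so each GPU holds a 1/N slice.
    w = jax.device_put(jnp.zeros((8192, 32768)),
                       NamedSharding(mesh, P(None, "tp")))

    @jax.jit
    def mlp_up(x, w):
        # Each device multiplies against its slice of w; XLA inserts
        # whatever collectives are needed to keep the result consistent.
        return jnp.dot(x, w)

    y = mlp_up(jnp.ones((4, 8192)), w)
    print(y.sharding)  # typically stays sharded over the "tp" axis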
> 300 megawatts of new capacity (over 220,000 NVIDIA GPUs)
That’s just for the SpaceX part (over-provisioning for Grok, lol).
The Amazon and Google deals are each over an order of magnitude larger! Pretty wild indeed!
How many instances of Doom can it run though?
I know you're probably talking about the compute infrastructure, but I think the electricity infrastructure side is interesting too. Data centers are doing things in dumb ways because the need to expand operations quickly outweighs the need to save dollars:
> It’s regulation with the utilities. There are ramp rates, there are all of these things that you’re supposed to do to not screw up the grid. Data centers have been in gross violation of that. When you think about what’s wrong with data centers, they have load volatility, which we just talked about, then they decide to power it with behind-the-meter natural gas generators. These natural gas generators, their shaft is supposed to last for seven years. It’s lasting 10 months because of all the cycling.
https://www.volts.wtf/p/doing-data-centers-the-not-dumb-way
On the compute infrastructure, there are standard NVIDIA reference designs like this:
https://www.nvidia.com/en-us/technologies/enterprise-referen...
I haven't bothered to look, but I'd guess Mellanox GPU-to-GPU networking, plus massive amounts of custom code for splitting tensors across GPUs and for shuttling activations between GPU nodes.
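For flavor, hand-rolled in JAX that "split tensors, shuttle activations" pattern might look something like the sketch below: each GPU holds a slice of the weights, computes a partial matmul, and the partial activations get all-reduced across GPUs over the interconnect. Everything here (mesh, shapes, names) is made up for illustration, not taken from any real serving stack.

    import jax
    import jax.numpy as jnp
    from functools import partial
    from jax.sharding import Mesh, PartitionSpec as P
    from jax.experimental.shard_map import shard_map

    mesh = Mesh(jax.devices(), axis_names=("tp",))

    @partial(shard_map, mesh=mesh,
             in_specs=(P(None, "tp"), P("tp", None)),  # x col-sharded, w row-sharded
             out_specs=P(None, None))                  # result replicated everywhere
    def row_parallel_matmul(x_shard, w_shard):
        partial_out = x_shard @ w_shard         # local partial result on each GPU
        return jax.lax.psum(partial_out, "tp")  # all-reduce: activations cross the network

    y = row_parallel_matmul(jnp.ones((4, 8192)), jnp.zeros((8192, 1024)))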