I know you're probably talking about the compute infrastructure, but I think the electricity infrastructure side is interesting too, data centers are doing things in dumb ways because the need for operational expansion speed is greater than the need dollars:
> It’s regulation with the utilities. There are ramp rates, there are all of these things that you’re supposed to do to not screw up the grid. Data centers have been in gross violation of that. When you think about what’s wrong with data centers, they have load volatility, which we just talked about, then they decide to power it with behind-the-meter natural gas generators. These natural gas generators, their shaft is supposed to last for seven years. It’s lasting 10 months because of all the cycling.
https://www.volts.wtf/p/doing-data-centers-the-not-dumb-way
On the compute infrastructure, there are standard NVIDIA reference designs like this:
https://www.nvidia.com/en-us/technologies/enterprise-referen...
I haven't bothered to look but I'd guess Mellanox GPU-to-GPU networks, and massive custom code for splitting tensors across GPUs, and for shuttling activations across GPU nodes.