Hacker News

EdNutting · yesterday at 12:17 PM · 3 replies

Sort of. But SRAM is not all made equal - L1 caches are small because they’re fast, and vice versa, L3 SRAM caches are slow because they’re big.

Addressing a large amount of SRAM requires approximately log(N) levels of logic just to do the addressing (a gross approximation). A lookup operation takes time to travel through that extra logic, hence large = slow.
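
A back-of-the-envelope illustration of that scaling (a sketch in Python; the 64-byte line size and the one-decode-level-per-address-bit assumption are mine, not exact hardware figures):

    # Sketch: address-decode depth grows ~log2(number of addressable lines).
    # Sizes and the 64 B line size are illustrative assumptions.
    import math

    LINE_BYTES = 64  # assumed line/word size of each addressable unit

    for label, size_bytes in [
        ("64 KB L1", 64 * 1024),
        ("2 MB L2", 2 * 1024**2),
        ("64 MB L3", 64 * 1024**2),
        ("44 GB of SRAM", 44 * 1024**3),
    ]:
        lines = size_bytes // LINE_BYTES
        depth = math.ceil(math.log2(lines))  # ~levels of decode logic
        print(f"{label:>14}: {lines:>12,} lines, ~{depth} decode levels")

Going from a 64 KB L1 to 44 GB roughly triples the decode depth (~10 levels to ~30), before you even count the wire delay of physically reaching the far side of the array.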

It’s also not one pool of SRAM. It’s thousands of small SRAM groups spread across the chip, with communication pathways in between.

So having 44GB of SRAM is a very different architecture from 80GB of (unified) HBM (although even “unified” isn’t quite true, as most chips use multiple external memory interfaces).

HBM is high bandwidth. Whether that’s “fast” or not depends on the trade-off between bandwidth and latency.
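
To make that trade-off concrete, a minimal sketch with made-up but plausible figures (none of these are real SRAM or HBM specs): total transfer time is latency plus size over bandwidth, so small accesses favour low-latency SRAM while large streaming transfers favour HBM.

    # Sketch: "fast" depends on access size. time = latency + size/bandwidth.
    # All latency/bandwidth numbers below are illustrative assumptions.

    def transfer_ns(size_bytes, latency_ns, gb_per_s):
        return latency_ns + size_bytes / gb_per_s  # 1 GB/s == 1 byte/ns

    for size in [64, 4096, 1 << 20]:  # 64 B line, 4 KB page, 1 MB block
        sram = transfer_ns(size, latency_ns=5, gb_per_s=100)   # local SRAM block
        hbm = transfer_ns(size, latency_ns=120, gb_per_s=800)  # HBM stack
        print(f"{size:>8} B: SRAM ~{sram:8.0f} ns, HBM ~{hbm:8.0f} ns")

With those (assumed) numbers, a 64-byte access is ~20x faster from SRAM, but a 1 MB transfer is ~7x faster from HBM.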

So, what I’m saying is: this is way more complicated than it seems. But overall, yeah, Cerebras’ technical strategy is “big SRAM means more fast”, and they’ve yet to prove either that it’s technically true or that it makes economic sense.


Replies

adrian_b · yesterday at 7:19 PM

Right. L3 caches, i.e. SRAMs of tens of MB or more, have a latency only 2 to 3 times better than DRAM. SRAMs of only a few MB, like most L2 caches, may have a latency 10 times lower than DRAM. L1 caches, of around 64 kB, may have a latency 3 to 5 times better than L2 caches.
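
To put rough absolute numbers on those ratios (assuming ~80 ns for a DRAM access, which is my assumption; real values vary widely by platform):

    # Turning the ratios above into rough absolute latencies.
    # The 80 ns DRAM figure is an assumption, not a measurement.
    dram_ns = 80.0
    l3_ns = dram_ns / 2.5   # "2 to 3 times better"  -> ~32 ns
    l2_ns = dram_ns / 10    # "10 times lower"       -> ~8 ns
    l1_ns = l2_ns / 4       # "3 to 5 times better"  -> ~2 ns
    for name, ns in [("DRAM", dram_ns), ("L3", l3_ns), ("L2", l2_ns), ("L1", l1_ns)]:
        print(f"{name:>4}: ~{ns:5.1f} ns")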

The throughput of caches becomes much greater than that of DRAM only when they are separate, i.e. each core has its own private L1+L2 cache, so transfers between cores and their private caches can happen concurrently, without interfering with each other.

When an SRAM cache memory is shared, the throughput remains similar to that of external DRAM.
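
A toy model of why: aggregate private-cache bandwidth scales with core count, while a shared pool is a fixed resource everyone contends for (all figures below are illustrative assumptions, not real chip specs):

    # Sketch: private caches multiply aggregate bandwidth; a shared pool doesn't.
    cores = 64
    per_core_l1_gbps = 100    # assumed per-core private-cache bandwidth
    shared_pool_gbps = 1000   # assumed shared SRAM/DRAM bandwidth

    aggregate_private = cores * per_core_l1_gbps  # scales with core count
    aggregate_shared = shared_pool_gbps           # fixed, shared by all cores
    print(f"private caches: ~{aggregate_private} GB/s aggregate")
    print(f"shared pool:    ~{aggregate_shared} GB/s total, "
          f"~{shared_pool_gbps / cores:.0f} GB/s per core under full contention")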

If the Cerebras memory is partitioned into many small blocks, then it would have low latency and high aggregate throughput for data found in the local memory block, but high latency and low throughput for data that must be fetched from far away.

On the other hand, with fewer, bigger memory blocks, the best-case latency and throughput would be worse, but the worst case would not be as bad.
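
That trade-off can be sketched with an average-memory-access-time style calculation; the latencies and local-hit fractions here are illustrative assumptions, not Cerebras figures:

    # AMAT-style sketch: average access time as a mix of local-block hits
    # and far-away fetches. All numbers are illustrative assumptions.
    def avg_latency_ns(local_fraction, local_ns, remote_ns):
        return local_fraction * local_ns + (1 - local_fraction) * remote_ns

    # Many small blocks: very fast locally, slow across the fabric.
    # Fewer big blocks: slower locally, but a smaller "remote" penalty.
    for label, local_ns, remote_ns in [("many small blocks", 2, 200),
                                       ("fewer big blocks", 10, 60)]:
        for hit in (0.99, 0.90, 0.50):
            print(f"{label:>18}, {hit:.0%} local: "
                  f"~{avg_latency_ns(hit, local_ns, remote_ns):6.1f} ns")

With those assumed numbers, many small blocks win when ~99% of accesses stay local (~4 ns vs ~11 ns), but lose badly once half the accesses go remote (~101 ns vs ~35 ns).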

SkiFire13 · yesterday at 1:43 PM

> L1 caches are small because they’re fast

I guess you meant to say they are fast because they are small?
