Right. L3 caches, i.e. SRAMs of tens of MB or more, have a latency only 2 to 3 times lower than DRAM. SRAMs of just a few MB, like most L2 caches, may have a latency around 10 times lower than DRAM, and L1 caches of around 64 kB may be another 3 to 5 times faster than L2.
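To put rough numbers on those ratios, here is a back-of-envelope sketch. The DRAM baseline of ~80 ns is an assumption for illustration; only the ratios come from the text above.

```python
# Illustrative only: the latency ratios above applied to an assumed
# DRAM latency of ~80 ns (the baseline is an assumption, not a spec).
dram_ns = 80.0

l3_ns = dram_ns / 2.5   # L3: ~2-3x lower latency than DRAM
l2_ns = dram_ns / 10.0  # small L2: ~10x lower than DRAM
l1_ns = l2_ns / 4.0     # L1: another ~3-5x lower than L2

for name, ns in [("DRAM", dram_ns), ("L3", l3_ns), ("L2", l2_ns), ("L1", l1_ns)]:
    print(f"{name}: ~{ns:.1f} ns")
```

With those assumed numbers, L3 lands around 32 ns, L2 around 8 ns, and L1 around 2 ns, which matches the ordering described above.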
The throughput of caches becomes much greater than that of DRAM only when the caches are private, i.e. each core has its own L1+L2, so transfers between cores and their private caches can proceed concurrently, without interfering with one another.
When an SRAM cache memory is shared, the throughput remains similar to that of external DRAM.
If the Cerebras memory is partitioned in many small blocks, then it would have low latency and high aggregate throughput for data that can be found in the local memory block, but high latency and low throughput for data that must be fetched from far away.
On the other hand, with fewer, bigger memory blocks, the best-case latency and throughput would be worse, but the worst case would not be as bad.
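The trade-off in the last two paragraphs can be sketched as a toy expected-latency model. All the numbers here are assumptions for illustration, not Cerebras specifics: small blocks are assumed faster locally but less likely to hold the requested data, so more accesses pay the remote penalty.

```python
# Toy model (assumed numbers, not Cerebras specifics): expected access
# latency when a fraction of accesses hit the local memory block and
# the rest must be fetched from a distant block over the fabric.
def avg_latency_ns(local_ns: float, remote_ns: float, local_fraction: float) -> float:
    """Expected latency given hit/miss latencies and the local-hit fraction."""
    return local_fraction * local_ns + (1 - local_fraction) * remote_ns

# Many small blocks: very fast locally, but fewer accesses stay local,
# and remote fetches travel farther on average.
small = avg_latency_ns(local_ns=1.0, remote_ns=100.0, local_fraction=0.7)

# Fewer big blocks: slower best case, but most accesses stay local and
# the remote penalty is milder.
big = avg_latency_ns(local_ns=5.0, remote_ns=50.0, local_fraction=0.95)

print(f"many small blocks: ~{small:.1f} ns average")  # dominated by remote misses
print(f"few big blocks:    ~{big:.1f} ns average")    # dominated by local hits
```

Under these assumed parameters the small-block layout averages ~30.7 ns against ~7.3 ns for big blocks, despite its better best case, which is the point made above: the partitioning choice shifts cost between the best and worst cases.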