Given that the dies still have L3 on them does this count as L4 or does the hardware treat it as a single pool of L3?
Would be neat to have an additional cache layer of ~1 GB of HBM on the package but I guess there's no way that happens in the consumer space any time soon.
Per compute die it functions as one 96M L3 with uniform latency. It is 4 cycles more latency than the configuration with smaller 32M L3. But there are two compute dies, each with their own L3. And like the 9950X coherency between these two L3 is maintained over global memory interconnect to the third (IO) die.