I always wondered why L1 caches couldn't just be bigger. An L1 cache has to respond within a few core clock cycles, and a bigger cache means higher latency: the bottleneck is the length of the bit lines and the number of word lines, both of which grow with capacity.
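
To make the scaling concrete, here is a rough toy model, not how any real cache is laid out. It assumes the whole data array is a single, roughly square SRAM block with one cell per bit and a power-of-two capacity; the function name `sram_array_shape` is made up for illustration.

```python
def sram_array_shape(capacity_bytes: int) -> tuple[int, int]:
    """Return (word_lines, bit_lines) for one roughly square SRAM array.

    Toy assumption: a single array, one cell per bit, power-of-two capacity.
    """
    bits = capacity_bytes * 8
    if bits <= 0 or bits & (bits - 1):
        raise ValueError("capacity must be a power of two in this toy model")
    exponent = bits.bit_length() - 1          # bits == 2 ** exponent
    word_lines = 1 << (exponent // 2)         # rows: cells hanging on each bit line
    bit_lines = bits // word_lines            # columns: cells driven by each word line
    return word_lines, bit_lines


for kib in (32, 64, 128, 256):
    rows, cols = sram_array_shape(kib * 1024)
    print(f"{kib:>4} KiB: {rows} word lines x {cols} bit lines")
```

In this model every doubling of capacity doubles either the number of cells hanging on each bit line or the number of cells each word line has to drive, so the wires get longer and slower to switch. Real designs split the array into banks and subarrays to limit this, but as far as I understand that adds decoding and routing delay of its own.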