logoalt Hacker News

adgjlsfhk1yesterday at 3:42 PM2 repliesview on HN

This the the most cursed part of modern cpu design, but the TLDR is that programs use virtual addresses while CPUs use physical addresses which means that CPU caches need to include the translation from virtual to physical adress. The problem is that for L1 cache, the latency requirement of 3-4 cycles is too strict to first do a TLB lookup and then an L1 cache lookup, so the L1 can only be keyed on the bits of ram which are identical between physical and virtual addresses. With a 4k page size, you only have 6 bits between the size of your cache line (64 bytes) and the size of your page, which means that at an 8 way associative L1D, you only get 64 buckets*64 bytes/bucket=32 kbits of L1 cache. If you want to increase that while keeping the 4k page size, you need to up the associativity, but that has massive power draw and area costs, which is why on x86, L1D on x86 hasn't increased since core 2 duo in 2006.


Replies

joha4270yesterday at 5:16 PM

Can you not take some of those virtual bits and get more buckets that way? I am sure it will make things more complicated if nothing else by them possibly being mapped to the same physical page, but it doesn't sound like an impossible barrier. Maybe something terrible where a cache line keeps bouncing between different buckets in the rare case that does happen, but as long as you can keep the common case as fast...

Otoh L1 sizes hasn't increased since my first processor, those CPU designers probably know more than I do.

show 1 reply
rslashuseryesterday at 4:11 PM

Nice HN explanation! One hopes we will not be living with 4kb pages forever, and perhaps L1 performance will be one more reason.

show 1 reply