foldl2022 • today at 2:08 AM

Gemma-4 E2B/E4B models reuse the K/V cache from other layers, which works in a "transposed" way: rather than reusing Q/K/V matrices within a single layer, they reuse them across different layers.
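A toy sketch of what cross-layer K/V reuse means at decode time. It assumes a single-head decoder. Some layers skip their own K/V projection and attend over an earlier layer's cache, selected by a hypothetical `kv_source` index. The names `Layer`, `CrossLayerKVModel`, and `kv_source`, and the choice of which layers share, are all illustrative. This is not Gemma-4's actual configuration or code.

```python
import math
from typing import Dict, List, Optional, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def attend(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention of one query over cached keys/values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)  # subtract max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


class Layer:
    """Hypothetical layer: owns K/V projections, or names a layer whose cache it reads."""

    def __init__(self, wq: Matrix, wk: Optional[Matrix] = None,
                 wv: Optional[Matrix] = None, kv_source: Optional[int] = None):
        if kv_source is None and (wk is None or wv is None):
            raise ValueError("a layer without kv_source needs its own wk and wv")
        self.wq, self.wk, self.wv, self.kv_source = wq, wk, wv, kv_source


class CrossLayerKVModel:
    """Decodes one token at a time; only K/V-owning layers store cache entries."""

    def __init__(self, layers: List[Layer]):
        for idx, layer in enumerate(layers):
            src = layer.kv_source
            # A sharing layer must point at an *earlier* layer that owns its K/V.
            if src is not None and (not 0 <= src < idx or layers[src].kv_source is not None):
                raise ValueError(f"layer {idx} has invalid kv_source {src}")
        self.layers = layers
        self.cache: Dict[int, Tuple[List[Vector], List[Vector]]] = {}

    def step(self, x: Vector) -> Vector:
        for idx, layer in enumerate(self.layers):
            q = matvec(layer.wq, x)  # every layer still computes its own query
            if layer.kv_source is None:
                keys, values = self.cache.setdefault(idx, ([], []))
                keys.append(matvec(layer.wk, x))
                values.append(matvec(layer.wv, x))
            else:
                # Cross-layer reuse: no K/V projection and no new cache entry here.
                keys, values = self.cache[layer.kv_source]
            x = [a + b for a, b in zip(x, attend(q, keys, values))]  # residual add
        return x
```

With four layers where layers 2 and 3 read layer 1's cache, only two layers' worth of K/V is ever stored, even though all four layers attend:

```python
I2 = [[1.0, 0.0], [0.0, 1.0]]
model = CrossLayerKVModel([Layer(I2, I2, I2), Layer(I2, I2, I2),
                           Layer(I2, kv_source=1), Layer(I2, kv_source=1)])
for tok in ([1.0, 0.0], [0.0, 1.0], [0.5, 0.5]):
    out = model.step(tok)
print(sorted(model.cache))  # [0, 1]
```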
