throwdbaaway • today at 2:59 AM

Very nice TG (token generation) improvement from the Flash Attention KQ fusion. Has this already been done in ik_llama.cpp? If not, it would be a welcome addition for hybrid CPU/GPU inference.
