Hacker News

xaskasdf, yesterday at 2:42 PM

I did it, but with different quantization levels, and it ran into quality issues. I'll rerun with the same quants to see if that fixes it. As for the memory that looks unused: most of it is actually in use by rotating layers that the CPU swaps in from RAM. That keeps upcoming layers warm and ready to use during inference, while already-used ones get discarded.
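To make the rotation idea concrete, here is a rough Python sketch of it. This is not the actual implementation: `RotatingLayerCache`, `slots`, and `loads` are hypothetical names, the "layers" are plain callables, and a dict assignment stands in for the real RAM-to-device copy. It only illustrates the pattern of keeping a small window of layers warm and dropping the ones already used.

```python
from collections import OrderedDict


class RotatingLayerCache:
    """Keeps at most `slots` layers resident; all layers live in host RAM."""

    def __init__(self, host_layers, slots):
        if slots < 1:
            raise ValueError("slots must be >= 1")
        self.host_layers = host_layers   # every layer, held in RAM
        self.slots = slots               # how many layers fit on the device
        self.resident = OrderedDict()    # layer index -> layer, in load order
        self.loads = 0                   # count of RAM -> device copies

    def _load(self, idx):
        if idx in self.resident:
            return
        # Make room by discarding the oldest resident layer (already used).
        while len(self.resident) >= self.slots:
            self.resident.popitem(last=False)
        # Stand-in for the real copy from RAM to the device.
        self.resident[idx] = self.host_layers[idx]
        self.loads += 1

    def run(self, x):
        n = len(self.host_layers)
        for i in range(n):
            self._load(i)
            # Warm the next layers so they are ready when needed.
            for j in range(i + 1, min(i + self.slots, n)):
                self._load(j)
            x = self.resident[i](x)
        return x


# Toy usage: four "layers" that each add their index to the input.
layers = [lambda x, k=k: x + k for k in range(4)]
cache = RotatingLayerCache(layers, slots=2)
result = cache.run(0)
```

With two slots, only two layers are ever resident at once, which is why memory that looks idle can in fact be reserved for the window. A real version would overlap the copy with compute instead of loading synchronously as this sketch does.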