Hacker News

rao-v, yesterday at 11:37 PM

Hey, some of us are on hardware (gfx906-based Radeon MI50s with 32GB of stupidly fast VRAM and basically no compute) that runs inference significantly faster with Q_0 and Q_1 quants.