deaux, yesterday at 2:59 AM

And that's at unusable speeds - it takes about triple that amount to run it decently fast at int4.

Now, as the other replies say, you should very likely run a quantized version anyway.