Hacker News

GeekyBear, yesterday at 6:27 PM

A discrete consumer GPU card doesn't have enough fast RAM to run a very large model that hasn't been quantized to hell.

That's why all the projects streaming models into the GPU from an SSD popped up recently.


Replies

manmal, yesterday at 9:20 PM

Yes. There’s just no way to get above 1 token/s that way with a large model.
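
For a rough sense of where that ceiling comes from, here is a back-of-the-envelope sketch. It assumes a dense model whose weights must all be read once per generated token, so throughput is capped at link bandwidth divided by the size of the weights. The 70B parameter count, the 4-bit quantization, and the ~7 GB/s NVMe read speed are illustrative assumptions, not figures from the thread, and the function names are made up for this example.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB (ignores KV cache and activations)."""
    return params_billion * bits_per_weight / 8


def max_tokens_per_second(params_billion: float,
                          bits_per_weight: float,
                          bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed if every weight is streamed once per token."""
    return bandwidth_gb_s / weights_gb(params_billion, bits_per_weight)


if __name__ == "__main__":
    # Assumed example: 70B dense model, 4-bit weights, ~7 GB/s SSD reads.
    size = weights_gb(70, 4)
    rate = max_tokens_per_second(70, 4, 7.0)
    print(f"weights: {size:.0f} GB, upper bound: {rate:.2f} tokens/s")
```

Under those assumptions the weights come to about 35 GB, more than a 24 GB card holds, and the bound works out to about 0.2 tokens/s. Keeping part of the model resident in VRAM, or using a mixture-of-experts model that only touches some weights per token, raises the bound, so treat this as an estimate rather than a hard limit.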