Hacker News

alfiedotwtf · yesterday at 4:07 PM

I've been keeping an eye on Antirez's Metal fork for llama.cpp, but I totally missed this. Whoa, nice. Giving it a go, thanks!!


Replies

zozbot234 · yesterday at 4:29 PM

What kind of hardware are you planning to run this on? As mentioned elsewhere in the thread, I've been trying to figure out how gracefully it might degrade on 64 GB of RAM or less (the weights total 80 GB at the provided quant) when using SSD offload for the weights. And assuming that works and doesn't just OOM, I'm wondering whether the tok/s figures would meaningfully improve in that scenario by running multiple sessions in parallel.
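For context, here's a minimal sketch of that kind of setup using llama.cpp's `llama-server` (the model path and slot count are placeholders, not from the thread). llama.cpp memory-maps GGUF weights by default, which is what lets a model larger than physical RAM page in from SSD rather than failing outright, and `-np` serves multiple sessions in parallel from one loaded copy of the weights:

```shell
# Placeholder model path; any GGUF works. Weights are mmap'd by default,
# so on a 64 GB machine an 80 GB model pages in from SSD on demand
# (expect heavy read traffic and much lower tok/s than an in-RAM run).
./llama-server \
  -m ./model-Q4_K_M.gguf \
  -c 4096 \
  -np 4
# -c 4096 : context size shared across slots
# -np 4   : 4 parallel server slots; concurrent requests share the
#           same mmap'd weights, so parallelism costs little extra RAM
```

Whether the parallel slots actually raise aggregate tok/s in the SSD-bound case depends on whether decoding is bottlenecked on storage reads or on compute.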
