xrd • today at 1:39 PM • 1 reply

How are people running this locally? I just checked llama.cpp, and it appears unsloth has a version, but it hacks a bunch of things to make it work and isn't optimal.

https://github.com/ggml-org/llama.cpp/issues/24730

Replies

jeremyjh • today at 1:45 PM

No one is doing that. For a model this size, it would have to be so heavily quantized that it wouldn't be useful, or you'd need to spend half a million dollars on hardware. People use hosted APIs. Open weight means cloud vendors can host it.
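
To make the quantization tradeoff concrete, here is a rough back-of-the-envelope sketch of how weight memory scales with bits per weight. The thread doesn't state the parameter count, so the figure below is purely a placeholder, and the estimate covers the weights only (no KV cache, activations, or runtime overhead).

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Hypothetical parameter count, chosen only for illustration;
# substitute the real figure for the model in question.
HYPOTHETICAL_PARAMS = 1e12

for bits in (16, 8, 4, 2):
    gb = weight_memory_gb(HYPOTHETICAL_PARAMS, bits)
    print(f"{bits:>2}-bit weights: ~{gb:,.0f} GB")
```

Halving the bits per weight halves the footprint, so fitting a very large model on consumer hardware means dropping to the low bit widths where quality tends to suffer.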
