Having read above article, I just gave llama.cpp a shot. It is as easy as the author says now, thoug...

nikodunk • today at 7:21 AM • 1 reply • view on HN

Having read above article, I just gave llama.cpp a shot. It is as easy as the author says now, though definitely not documented quite as well. My quickstart:

brew install llama.cpp

llama-server -hf ggml-org/gemma-4-E4B-it-GGUF --port 8000

Go to localhost:8000 for the Web UI. On Linux it accelerates correctly on my AMD GPU, which Ollama failed to do, though of course everyone's mileage seems to vary on this.

Replies

teekert • today at 7:55 AM

Was hoping it was so easy :) But I probably need to look into it some more.

llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'gemma4' llama_model_load_from_file_impl: failed to load model

Edit: @below, I used `nix-shell -p llama-cpp` so not brew related. Could indeed be an older version indeed! I'll check.

➕ show 2 replies

alt Hacker News

Replies