The local LLM ecosystem doesn’t need Ollama

514 points • by Zetaphor • today at 3:35 AM • 151 comments • view on HN

Comments

For most users that wanted to run LLM locally, ollama solved the UX problem.

One command, and you are running the models even with the rocm drivers without knowing.

If llama provides such UX, they failed terrible at communicating that. Starting with the name. Llama.cpp: that's a cpp library! Ollama is the wrapper. That's the mental model. I don't want to build my own program! I just want to have fun :-P

➕ show 8 replies

u1hcw9nx • today at 8:52 AM

Two Views of MIT-Style Licenses:

1. MIT-style licenses are "do what you want" as long as you provide a single line of attribution. Including building big closed source business around it.

2. MIT-style licenses are "do what you want" under the law, but they carry moral, GPL-like obligations to think about the "community."

To my knowledge Georgi Gerganov, the creator of llama.cpp, has only complained about attribution when it was missing. As an open-source developer, he selected a permissive license and has not complained about other issues, only the lack of credit. It seems he treats the MIT license as the first kind.

The article has other good points not related to licensing that are good to know. Like performance issues and simplicity that makes me consider llama.cpp.

➕ show 2 replies

0xbadcafebee • today at 6:30 AM

No mention of the fact that Ollama is about 1000x easier to use. Llama.cpp is a great project, but it's also one of the least user friendly pieces of software I've used. I don't think anyone in the project cares about normal users.

I started with Ollama, and it was great. But I moved to llama.cpp to have more up-to-date fixes. I still use Ollama to pull and list my models because it's so easy. I then built my own set of scripts to populate a separate cache directory of hardlinks so llama-swap can load the gguf's into llama.cpp.

➕ show 6 replies

Zetaphor • today at 3:36 AM

I got tired of repeating the same points and having to dig up sources every time, so here's the timeline (as I know it) in one place with sources.

➕ show 4 replies

dizhn • today at 9:19 AM

> the file gets copied into Ollama’s hashed blob storage, you still can’t share the GGUF with other tool

This is the reason I had stopped using it. I think they might be doing it for deduplication however it makes it impossible to use the same model with other tools. Every other tool can just point to the same existing gguf and can go. Whether its their intention or not, it's making it difficult to try out other tools. Model files are quite large as you know and storage and download can become issues. (They are for me)

denismi • today at 7:42 AM

Hmm..

  pacman -Ss ollama | wc -l                                                                                                              
  16
  pacman -Ss llama.cpp | wc -l
  0
  pacman -Ss lmstudio | wc -l
  0

Maybe some day.

➕ show 2 replies

usernomdeguerre • today at 5:04 AM

Do they still not let you change the default model folder? You had to go through this whole song and dance to manually register a model via a pointless dockerfile wannabe that then seemed to copy the original model into their hash storage (again, unable to change where that storage lived).

At the time I dropped it for LMStudio, which to be fair was not fully open source either, but at least exposed the model folder and integrated with HF rather than a proprietary model garden for no good reason.

➕ show 2 replies

blueybingo • today at 9:01 AM

the article buries what's actaully the most practical gotcha: ollama's hashed blob storage means if you've been pulling models for months, switching tools requires re-downloading everything because you can't just point another runtime at those files, and most users won't discover this until they're already invested enough that it genuinely hurts to leave.

zxcholmes • today at 7:48 AM

The name "llama.cpp" doesn't seem very friendly anymore nowadays... Back then, "llama" probably referred to those models from Facebook, and now those Llama series models clearly can't represent the strongest open-source models anymore...

➕ show 2 replies

zarzavat • today at 8:35 AM

It's as if Ollama is trying to create a walled garden, but the garden is outside of their property, so all it achieves is walling themselves in.

FeepingCreature • today at 8:54 AM

I always avoided Ollama because it smelled like a project that was trying so desperately to own the entire workflow. I guess I dodged a bigger bullet than I knew.

song • today at 8:38 AM

So, on a mac, what good alternative to ollama supports mlx for acceleration? My main use case is that I have an old m1 max macbook pro with 64 gb ram that I use as a model server.

➕ show 1 reply

rrhjm53270 • today at 10:15 AM

It is a bit off-topic, but would it possible to provide a light mode for this blog? I used to work during the day time, and my pupils had to contract to read, making it a very poor reading experience.

flux3125 • today at 8:56 AM

I stopped using Ollama a couple of months ago. Not out of frustration, but because llama.cpp has improved a lot recently with router mode, hot-swapping, a modern and simple web UI, MCP support and lots of other improvements.

fy20 • today at 6:37 AM

It feels like a bit of history is missing... If ollama was founded 3 years before llama.cpp was released, what engine did they use then? When did they transition?

➕ show 2 replies

tosh • today at 8:26 AM

This is a bit like saying stop using Ubuntu, use Debian instead.

Both llama.cpp and ollama are great and focused on different things and yet complement each other (both can be true at the same time!)

Ollama has great ux and also supports inference via mlx, which has better performance on apple silicon than llama.cpp

I'm using llama.cpp, ollama, lm studio, mlx etc etc depending on what is most convenient for me at the time to get done what I want to get done (e.g. a specific model config to run, mcp, just try a prompt quickly, …)

➕ show 5 replies

speedgoose • today at 6:24 AM

I prefer Ollama over the suggested alternatives.

I will switch once we have good user experience on simple features.

A new model is released on HF or the Ollama registry? One `ollama pull` and it's available. It's underwhelming? `ollama rm`.

➕ show 4 replies

pplonski86 • today at 9:10 AM

I like Ollama Cloud service (I'm paid pro user), because it let me test several open source LLMs very fast - I dont need to download anything locally, just change the model name in the API. If I like the model then I can download it and run locally with sensitive data. I also like their CLI, because it is simple to use.

The fact that they are trying to make money is normal - they are a company. They need to pay the bills.

I agree that they should improve communication, but I assume it is still small company with a lot of different requests, and some things might be overlooked.

Overall I like the software and services they provide.

dragochat • today at 8:15 AM

how about the others:

- vLLM https://vllm.ai/ ?

- oMLX https://github.com/jundot/omlx ?

TomGarden • today at 6:40 AM

The performance issues are crazy. Thanks for sharing this

osmsucks • today at 6:43 AM

I noticed the performance issues too. I started using Jan recently and tried running the same model via llama.cpp vs local ollama, and the llama.cpp one was noticeably faster.

utopiah • today at 6:44 AM

Not sure why VLC doesn't do that.

It's a joke... but also not really? I mean VLC is "just" an interface to play videos. Videos are content files one "interact" with, mostly play/pause and few other functions like seeking. Because there are different video formats VLC relies on codecs to decode the videos, so basically delegating the "hard" part to codecs.

Now... what's the difference here? A model is a codec, the interactions are sending text/image/etc to it, output is text/image/etc out. It's not even radically bigger in size as videos can be huge, like models.

I'm confused as why this isn't a solved problem, especially (and yes I'm being a big sarcastic here, can't help myself) in a time where "AI" supposedly made all smart wise developers who rely on it 10x or even 1000x more productive.

Weird.

➕ show 1 reply

tyfon • today at 6:17 AM

I think the biggest advantage for me with ollama is the ability to "hotswap" models with different utility instead of restarting the server with different models combined with the simple "ollama pull model". In other words, it has been quite convenient.

Due to this post I had to search a bit and it seems that llama.cpp recently got router support[1], so I need to have a look at this.

My main use for this is a discord bot where I have different models for different features like replying to messages with images/video or pure text, and non reply generation of sentiment and image descriptions. These all perform best with different models and it has been very convenient for the server to just swap in and out models on request.

[1] https://huggingface.co/blog/ggml-org/model-management-in-lla...

➕ show 6 replies

iib • today at 8:22 AM

Has anybody figured some of the best flags to compile llama.cpp for rocm? I'm using the framework desktop and the Vulkan backend, because it was easier to compile out of the box, but I feel there's large peformance gains on the table by swtiching to rocm. Not sure if installing with brew on ubuntu would be easier.

➕ show 1 reply

rothific • today at 8:48 AM

I've been experimenting with running Gemma with MLX directly within my own harness: https://github.com/cjroth/mlx-harness

mrkeen • today at 8:05 AM

> Red Hat’s ramalama is worth a look too, a container-native model runner that explicitly credits its upstream dependencies front and center. Exactly what Ollama should have done from the start.

  % ramalama run qwen3.5-9b
  Error: Manifest for qwen3.5-9b:latest was not found in the Ollama registry

➕ show 1 reply

nextlevelwizard • today at 9:10 AM

I am running ollama as back end and open webui as front end. It handled downloading and swapping between models.

What is the llama-cpp alternative?

thot_experiment • today at 7:42 AM

I was pretty big on ollama, it seemed like a great default solution. I had alpha that it was a trash organization but I didn't listen because I just liked having a reliable inference backend that didn't require me to install torch. I switched to llama.cpp for everything maybe 6 months ago because of how fucking frustrating every one of my interactions with ollama (the organization) were. I wanna publicly apologize to everyone who's concerns I brushed off. Ollama is a vampire on the culture and their demise cannot come soon enough.

FWIW llama.cpp does almost everything ollama does better than ollama with the exception of model management, but like, be real, you can just ask it to write an API of your preferred shape and qwen will handle it without issue.

mentalgear • today at 7:09 AM

> Ollama is a Y Combinator-backed (W21) startup, founded by engineers who previously built a Docker GUI that was acquired by Docker Inc. The playbook is familiar: wrap an existing open-source project in a user-friendly interface, build a user base, raise money, then figure out monetization.

    The progression follows the pattern cleanly:

    1. Launch on open source, build on llama.cpp, gain community trust
    2. Minimize attribution, make the product look self-sufficient to investors
    3. Create lock-in, proprietary model registry format, hashed filenames that don’t work with other tools
    4. Launch closed-source components, the GUI app
    5. Add cloud services, the monetization vector

san_tekart • today at 7:28 AM

The CLI is great locally, but the architecture fights you in production. Putting a stateful daemon that manages its own blob storage inside a container is a classic anti-pattern. I ended up moving to a proper stateless binary like llama-server for k8s.

opem • today at 8:52 AM

There is also lemonade-server from AMD. Although I am not sure if that is any better.

dhruv3006 • today at 7:24 AM

ollama is pretty intuitive to use still - dont see why will stop.

aquir • today at 8:35 AM

I did not know! Shady :(

I was using LM Studio since I've moved to MacOS so that's fine I guess

NamlchakKhandro • today at 7:40 AM

LM Studio is 1000x easier to use than ollama btw

asim • today at 8:29 AM

Ah man the VC death trap. It's ok. I don't mean it like that but this is classic. It's unavoidable. They gotta make money. They took money, they gotta make money. It's not easy. Everyone has principles, developers more than anyone. They are developers, they are people like you and me. They didn't even start as ollama. They started as a kubernetes infra project in YC and pivoted. Listen don't be hard on these guys. It's hard enough. Trust me I did it. And not as well them.

This is the game. We shouldn't delude ourselves into thinking there are alternative ways to become profitable around open source, there aren't. You effectively end up in this trap and there's no escape and then you have to compromise on everything to build the company, return the money, make a profit. You took people's money, now you have to make good, there's no choice. And anyone who thinks differently is deluded. Open source only goes one way. To the enterprise. Everything else is burning money and wasting time. Look at Docker. Textbook example of the enormous struggle to capture the value of a project that had so much potential, defined an industry and ultimately failed. Even the reboot failed. Sorry. It did.

This stuff is messy. Give them some credit. They give you an epic open source project. Be grateful for that. And now if you want to move on, move on. They don't need a hard time. They're already having a hard time. These guys are probably sweating bullets trying to make it work while their investors breathe down their necks waiting for the payoff. Let them breathe.

Good luck to you ollama guys!

➕ show 1 reply

renierbotha • today at 8:52 AM

Thank you, I needed to read this.

DeathArrow • today at 7:34 AM

I see no mention of vLLM in the article.

➕ show 1 reply

Havoc • today at 7:39 AM

Alas people want convenience and don’t care about this sort of stuff.

yokoprime • today at 6:21 AM

i had no idea about all this. especially the performance and bugs. thanks for informing me!

dnnddidiej • today at 6:17 AM

On a practical note if fumbles connection handling as to be unusable to download anything.

alfiedotwtf • today at 8:58 AM

I'm a llama.cpp user, but apart from the MIT licensing issue, I personally don't see what's the problem here is? Sure Ollama could have advertised better that llama.cpp was it's original backend, but were they obligated to? It's no different to Docker or VMWare that hitch a ride on kernel primitives etc.

sminchev • today at 8:11 AM

With such concurrency in the market, it is unforgivable to manage a product that way. The concurrency will kill you.

Clients get disappointed, alternatives have better services, and more are popping out monthly. If they continue that way, nothing good will happen, unfortunately :(

WhereIsTheTruth • today at 8:50 AM

The state of LLM as a service is just depressing

It is a parasitic stack that redirects investment into service wrappers while leaving core infrastructure underfunded

We have to suffer with limits and quotas as if we are living in the Soviet Union

damnitbuilds • today at 8:26 AM

I am trying to run models that are on the edge of what my hardware can support. I guess many people are.

So given, as the author states, Ollama runs the LLMs inefficiently, what is the tool that runs them most efficiently on limited hardware ?

NamlchakKhandro • today at 7:40 AM

drop ollama in the bin, no one needs it.

dackdel • today at 6:19 AM

i use goose by block

➕ show 1 reply

goodpoint • today at 7:23 AM

The missing attribution pattern is nasty.

eternaut • today at 7:49 AM

the article nails it!

arcza • today at 7:01 AM

I find the style of writing incredibly annoying (it doesn't make the point, full of hyperbole) and the website has the standard slopsite black background and glowing CSS.

➕ show 1 reply

stuaxo • today at 8:17 AM

Way too much text - feels LLM written.

At the top could have been a link to equivalent llamacpp workflows to ollamas.

I wish the op had gone back and written this as a human, I agree with not using Ollama but don't like reading slop.

➕ show 3 replies

alt Hacker News

The local LLM ecosystem doesn’t need Ollama

Comments

🔗 View 7 more comments