> Notwithstanding the fact that there's about zero difference between `ollama run model-name` and `llama-cpp -hf model-name`
There is a TON of difference. Ollama downloads the model from its own model library server, sticks it somewhere in your home folder under a hashed blob name, and layers a proprietary configuration on top instead of using the built-in metadata specified by the model creator. So you can't share the file with any other tool, you can't change parameters like temp on the fly, and you're stuck with whatever quants they offer.
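For contrast, here's roughly what the llama.cpp side looks like (a sketch — the repo and quant names are placeholders, and the cached filename is illustrative; llama.cpp's default download location can be redirected with the `LLAMA_CACHE` env var):

```shell
# llama.cpp can pull a GGUF straight from Hugging Face
# (repo/quant names below are placeholders, not a real model)
llama-server -hf some-user/some-model-GGUF:Q4_K_M --temp 0.2

# By default it lands as a plain .gguf under ~/.cache/llama.cpp/,
# so any other GGUF-aware tool can point -m at the very same file,
# with whatever sampling parameters you want this time:
llama-cli -m ~/.cache/llama.cpp/<downloaded>.gguf --temp 0.8
```

The point being: one ordinary file on disk, shared by every tool, with parameters set per-invocation rather than baked into a wrapper's config.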
This was my issue with the current client ecosystem. I get a .gguf file. I should be able to open my AI client of choice and File -> Open and select a .gguf, same as opening a .txt file. Alternatively, if I have cloned an HF model, all AI clients should automatically check the HF cache folder.
The current offerings have interfaces to Hugging Face or some model repo. They fetch the model based on what they think your hardware can handle and save it to %USERPROFILE%\AppData\Local\%app name%\... (on Windows). When I evaluated running locally I ended up with 3 different folders containing copies of the same model in different directory structures.
It seems like Hugging Face itself uses %user%/.cache/...; however, some of the apps still fetch HF models and then save them to their own directories anyway.
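For reference, the hub cache location that huggingface_hub documents resolves from two environment variables; here's a minimal sketch of that lookup (simplified — it ignores the older XDG_CACHE_HOME fallback):

```shell
# Where huggingface_hub puts downloads, per its documented
# resolution order (simplified): HF_HUB_CACHE if set, else
# $HF_HOME/hub if set, else ~/.cache/huggingface/hub
hf_cache_dir() {
    echo "${HF_HUB_CACHE:-${HF_HOME:-$HOME/.cache/huggingface}/hub}"
}

hf_cache_dir   # e.g. /home/you/.cache/huggingface/hub when nothing is overridden
```

Any app that honors `HF_HOME`/`HF_HUB_CACHE` will land in the same folder — which is exactly why apps that ignore them and roll their own directory are so annoying.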
Those features are 'fine' for a casual user who sticks with one program, but it seems designed from the start to lock you into their wrapper. In the end they are all using llama.cpp, ComfyUI, OpenVINO, etc. to abstract away the backend. Again, this is fine, but hiding the files from the user seems strange to me. If you're leaning on HF, then why not use its own .cache?
In the end I grab the latest llama.cpp releases for CUDA and SYCL and run llama-server. My best UX has been with LM Studio and AI Playground. I want to try LocalAI and vLLM next. I just want control over the damn files.