I get the sentiment for self hosting. But there are a few counter arguments:
- Self hosting is expensive. It involves expensive machines with GPUs that cost hundreds per month if you use cloud based ones. You might need several of those. And you need people to mind those machines, and they are even more expensive per month.
- If you run stuff on your laptop, it consumes a lot of resources and energy. I have qwen running on my laptop. Even minimal usage turns my laptop into a radiator. Nice as a demo, but I can't have it this hot all the time. It would run out of battery, and it's probably not great for the longevity of the components in the laptop.
- Models are evolving quickly and the self hosted smaller ones aren't as good when it comes to things like tool usage, reasoning, etc. Being able to switch to the latest model is valuable.
- It's easier to get your use case working with one of the top models than with one of the smaller self hosted ones.
- If you get the wrong hardware, it might soon be unable to run the latest models.
- Self hosting models is mostly a cost optimization. It only becomes relevant if you hit a certain scale.
- You have alternatives in the form of hosted models via a wide range of service providers. Some of those are EU based and offer all the things you'd be looking for if you are offering your services there, including compliance with legal requirements.
- Reinventing what these companies do in house is technically challenging and possibly more expensive than self hosting models because now you need a lot of engineering capacity dedicated to that. And legal. And all the rest.
If, like most companies/people, you are at the experimenting stage, the cheapest and fastest option is just getting an API key from an API provider of your choice. You can take it from there if your experiment actually works. And then it's mostly about optimizing cost. If your API usage runs into the thousands per month or more, it becomes a cost/quality trade off.
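
To make the cost side of that trade off concrete, here is a rough back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the class and function names are made up, and the numbers are placeholders, not real prices. It only compares monthly totals and ignores quality, which is the other half of the trade off.

```python
from dataclasses import dataclass


@dataclass
class SelfHostCosts:
    """Hypothetical monthly cost model for self hosting (all values assumed)."""
    gpu_machine_per_month: float  # rental cost of one GPU machine
    machines: int                 # how many of those machines you need
    ops_staff_per_month: float    # cost of the people minding the machines

    def total_per_month(self) -> float:
        # Hardware cost scales with machine count; staff is a flat amount here.
        return self.gpu_machine_per_month * self.machines + self.ops_staff_per_month


def cheaper_option(api_spend_per_month: float, self_host: SelfHostCosts) -> str:
    """Return which option costs less per month, looking at cost only."""
    if self_host.total_per_month() < api_spend_per_month:
        return "self-host"
    return "api"


if __name__ == "__main__":
    # Placeholder numbers, not quotes from any provider.
    costs = SelfHostCosts(gpu_machine_per_month=500, machines=2, ops_staff_per_month=4000)
    for api_spend in (300, 8000):
        print(api_spend, cheaper_option(api_spend, costs))
```

With placeholder numbers like these, the staffing line dominates, which is why self hosting only starts to look attractive once the API bill is well into the thousands.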