Though I wonder how many orders of magnitude in affordability the utilization rate actually gets them. Realistically, if you use a self-hosted LLM for your job, you might be using it, what, a solid 6 hours per day? That assumes you can actually keep it fed while working (so some agentic setup is probably necessary; it would need to be more than VSCode autocomplete and responding to individual prompts). Anyway, 6 of 24 hours starts you out at 1/4 utilization, and a 4x price increase might be worth paying for privacy and stability (no sudden changes in model behavior, no price changes, no days when the system is over-utilized for reasons outside your control).
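To make the back-of-envelope arithmetic concrete, here is a tiny sketch of the utilization-to-cost-multiplier relationship (the 6-hour figure is the assumption from above, not a measured number):

```python
# Rough back-of-envelope: lower utilization of a fixed-cost machine
# raises the effective price per hour actually used.

def effective_cost_multiplier(hours_used_per_day: float, total_hours: float = 24.0) -> float:
    """Price multiplier vs. a fully utilized machine amortizing the same fixed cost."""
    utilization = hours_used_per_day / total_hours
    return 1.0 / utilization

print(effective_cost_multiplier(6))   # 6 h/day -> 1/4 utilization -> 4.0x
print(effective_cost_multiplier(12))  # doubling usage halves the multiplier -> 2.0x
```

Which is why the reply below is right that batching non-urgent work to raise utilization directly shrinks that multiplier.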
Rather, I think it is just hard for local LLMs to compete at this early stage, when the cloud providers are allowed by investors to be unprofitable.
> Realistically if you use a self-hosted LLM for your job, you might be using it, what, a solid 6 hours per day?
You can grow the utilization rate well beyond that if you don't always care about getting a quick, real-time response. (And if you do, then maybe the cloud model was the better deal after all!)