Frontier models being in the hands of a handful of companies does not help either. Let's hope that the open-weight movement changes that soon.
Realistically, any 'huge' frontier model that takes a rack of H100s to run inference on is probably going to have downtime no matter who runs it.
Downtime is always going to 'scale' poorly for loads that need a lot of hardware thrown at them, even with plenty of good failover. It's probably worse for the small vendors: they don't have the contracts that get them hardware first, so availability is already at a premium for them.
So I guess I'm saying: yeah, I hope frontier-level models get out into the open soon, but I suspect the same or a similar level of exclusivity will exist as long as they take that much compute to operate.
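To put rough shape on the downtime-scaling point above, here's a back-of-the-envelope sketch. It assumes (my simplification, not a measured fact) that each GPU is up independently with the same probability, that one model replica needs all of its GPUs up at once, and that the service is up if at least one replica is. The function names and the 0.999 figure are made up for illustration.

```python
def replica_availability(per_gpu_availability: float, gpus_per_replica: int) -> float:
    """Probability that every GPU a single replica needs is up at the same time.

    Assumes independent failures, which real clusters only approximate.
    """
    return per_gpu_availability ** gpus_per_replica


def service_availability(per_gpu_availability: float,
                         gpus_per_replica: int,
                         replicas: int) -> float:
    """Probability that at least one of several fail-over replicas is up."""
    replica_up = replica_availability(per_gpu_availability, gpus_per_replica)
    return 1.0 - (1.0 - replica_up) ** replicas


if __name__ == "__main__":
    per_gpu = 0.999  # hypothetical per-GPU availability, for illustration only
    for gpus in (1, 8, 64):
        single = replica_availability(per_gpu, gpus)
        doubled = service_availability(per_gpu, gpus, replicas=2)
        print(f"{gpus:>3} GPUs/replica: one replica {single:.4f}, two replicas {doubled:.6f}")
```

The more GPUs one replica needs, the lower its availability gets, and buying it back with fail-over means duplicating the whole rack. That's the part small vendors can't easily afford.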
If it goes as well as the 'open' / federated social network alternatives of the 2010s, I wouldn't count on it.
Gemma 4 has made a lot of progress in this area. The model is phenomenal. Its size is workable. This is the worst it will ever be.