All these new datacenters are going to be a huge sunk cost. Why would you pay OpenAI when you can host your own hyper-efficient Chinese model for like 90% less cost at 90% of the performance? And that's compared to today's subsidized pricing, which they can't keep up forever.
>to today's subsidized pricing, which they can't keep up forever.
The APIs are not subsidized; they probably have quite a large margin, actually: https://lmsys.org/blog/2025-05-05-large-scale-ep/
>Why would you pay OpenAI when you can host your own hyper-efficient Chinese model
The 48GB of VRAM or unified memory required to run this model at 4-bit quantization isn't free either.
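For a rough sense of where a figure like 48GB comes from, here's a back-of-envelope sketch; the ~90B parameter count and the KV-cache allowance are illustrative assumptions, not numbers from the thread:

    # Back-of-envelope VRAM estimate for running a 4-bit quantized model.
    # Parameter count and cache allowance are assumptions for illustration.
    params = 90e9                # ~90B parameters (assumed)
    bits_per_weight = 4          # 4-bit quantization
    weights_gb = params * bits_per_weight / 8 / 1e9   # 45 GB of weights
    kv_cache_gb = 3              # rough allowance for KV cache + activations (assumed)
    print(f"~{weights_gb + kv_cache_gb:.0f} GB")      # ~48 GB

At 4 bits each weight takes half a byte, so memory for weights is roughly parameter count / 2, plus whatever the KV cache and runtime overhead need on top.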
Eventually Nvidia or a shrewd competitor will release 64/128GB consumer cards; locally hosted GPT-3.5+ is right around the corner. We're just waiting for consumer hardware to catch up at this point.