Hacker News

ycui7 · today at 12:44 AM

This is not surprising at all. For energy efficiency, the biggest advantage of cloud models is that a GPU's power draw stays roughly the same whether it is running one request or many at once. The more concurrent requests a server can handle, the less energy each request consumes. A server GPU is likely more energy efficient than a local GPU to begin with, and concurrency makes the cost structure unbeatable for local hardware. The usual assumption is that local hardware runs only one request at a time, but if a local engine is meant to serve a small business with meaningful concurrency, the economics might still work out.
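To make the amortization argument concrete, here is a minimal Python sketch. It assumes a toy linear power model in which each extra concurrent request adds only a small marginal draw, and a batch finishes in the same wall-clock time as a single request. `GpuPowerModel`, `joules_per_request`, and every wattage below are hypothetical, not measurements. Energy per request is then total power times time, divided by concurrency.

```python
from dataclasses import dataclass


@dataclass
class GpuPowerModel:
    """Toy linear power model; all numbers fed into it are hypothetical."""
    base_watts: float           # draw while serving a single request
    watts_per_extra_req: float  # small marginal draw per additional concurrent request

    def power_watts(self, concurrency: int) -> float:
        if concurrency < 1:
            raise ValueError("concurrency must be >= 1")
        return self.base_watts + self.watts_per_extra_req * (concurrency - 1)


def joules_per_request(gpu: GpuPowerModel, concurrency: int, seconds: float) -> float:
    # Assumes a batch of `concurrency` requests finishes in the same `seconds`
    # a single request would take, so the batch's energy is split evenly.
    return gpu.power_watts(concurrency) * seconds / concurrency


if __name__ == "__main__":
    server = GpuPowerModel(base_watts=700.0, watts_per_extra_req=5.0)  # hypothetical
    local = GpuPowerModel(base_watts=350.0, watts_per_extra_req=5.0)   # hypothetical
    for c in (1, 4, 32):
        print(f"concurrency={c:>2}  server={joules_per_request(server, c, 10.0):8.1f} J"
              f"  local={joules_per_request(local, c, 10.0):8.1f} J")
```

With these made-up numbers and a 10 s request, the server at concurrency 32 spends about 267 J per request, while the local card at concurrency 1 spends 3500 J. Raising the local card to concurrency 4 cuts that to 912.5 J, which is the small-business case above. Real GPUs do draw more power and slow down somewhat under larger batches, so treat the linear model as an illustration of the shape of the curve, not a prediction.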