Hacker News

jojobas · yesterday at 1:32 AM

China has cheap electricity.


Replies

ericd · yesterday at 1:40 AM

Well, also, LLM inference servers get much more efficient at request queue depths >1: tokens per second per GPU are massively higher with 100 concurrent requests than with 1 on, e.g., vLLM.
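The batching effect can be sketched with a toy roofline model (all numbers below are illustrative assumptions, not vLLM internals or real hardware specs): single-request decoding is memory-bandwidth bound because every step must stream the full model weights, so batching more requests into one step amortizes that traffic and multiplies tokens/s per GPU:

```python
# Toy roofline model of LLM decode throughput vs. batch size.
# All constants are assumed round numbers for illustration only.

def tokens_per_second(batch_size,
                      weight_bytes=14e9,      # assumed ~7B params at fp16
                      mem_bw=2e12,            # assumed ~2 TB/s HBM bandwidth
                      flops=1e15,             # assumed ~1 PFLOP/s of compute
                      flops_per_token=28e9):  # ~2 FLOPs per param per token
    # Each decode step must stream all weights once regardless of batch size,
    # so a larger batch amortizes the memory traffic across requests.
    step_mem_time = weight_bytes / mem_bw
    step_compute_time = batch_size * flops_per_token / flops
    step_time = max(step_mem_time, step_compute_time)
    return batch_size / step_time  # tokens/s summed over the batch

print(tokens_per_second(1))    # memory-bound: low aggregate throughput
print(tokens_per_second(100))  # same hardware, far higher tokens/s per GPU
```

Under these assumptions throughput scales nearly linearly with batch size until the step becomes compute-bound, which is the regime continuous-batching servers try to stay in.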

DeathArrow · yesterday at 8:42 AM

Yes, but the hardware they use for inference, like the Huawei Ascend 910C, is less efficient than the Nvidia H100 used in the US, due to the difference in process node.