Hacker News

cedws · yesterday at 2:16 PM

The network latency bit deserves more attention. I’ve been trying to find out where AI companies are physically serving LLMs from but it’s difficult to find information about this. If I’m sitting in London and use Claude, where are the requests actually being served?

The ideal world would be an edge network like Cloudflare for LLMs so a nearby POP serves your requests. I’m not sure how viable this is. On classic hardware I think it would require massive infra buildout, but maybe ASICs could be the key to making this viable.
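One rough way to get a feel for this yourself is to measure the TCP handshake time to the API's front door. This is only a lower bound on request latency, and it measures distance to the nearest edge/load balancer, not the inference cluster behind it, but a large number from London would suggest the entry point isn't nearby. A minimal sketch (the choice of `api.anthropic.com:443` as the probe target is my assumption for the Claude API endpoint):

```python
import socket
import statistics
import time

def tcp_connect_ms(host: str, port: int, samples: int = 5) -> float:
    """Median TCP connect time in milliseconds.

    A rough lower bound on the network latency any request to this
    host must pay before the server even starts working on it.
    """
    times = []
    for _ in range(samples):
        t0 = time.perf_counter()
        # create_connection performs the full TCP handshake, so the
        # elapsed time approximates one network round trip.
        with socket.create_connection((host, port), timeout=5):
            pass
        times.append((time.perf_counter() - t0) * 1000)
    return statistics.median(times)
```

Usage would be something like `tcp_connect_ms("api.anthropic.com", 443)`; note that a low number only tells you the TLS terminator is close, since the request may still be routed a long way to wherever GPU capacity is available.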


Replies

Twirrim · yesterday at 3:58 PM

> The network latency bit deserves more attention. I’ve been trying to find out where AI companies are physically serving LLMs from but it’s difficult to find information about this. If I’m sitting in London and use Claude, where are the requests actually being served?

Unfortunately, as with most of the AI providers, it's wherever they've been able to find available power and capacity. They have contracts with all of the large cloud vendors, and the lack of capacity is a significant enough issue that locality isn't really part of the equation.

The only thing they're particular about locality for is the infrastructure they use for training runs, where they need lots of interconnected capacity with low-latency links.

Inference is wherever, whenever. You could be having your requests processed halfway around the world, or right next door, from one minute to the next.
