logoalt Hacker News

josalhortoday at 8:17 AM3 repliesview on HN

We need LLM query routing at the OS level like Mobile data. I know it will sound crazy but hear me out. I think about this AI inference as infrastructure. I do not want to pay for it on every app I use it on. I do not think "I have to pay the mobile data of youtube, and the mobile data of whatsapp etc.". I pay Mobile data infrastructure and let my device route it appropiately. In fact, if we ever go the local llm route, you could have LLM capabilities without having access to the internet (or local LAN), and your OS/computer is the only one capable of doing that routing for you.


Replies

solenoid0937today at 9:09 AM

It doesn't sound crazy at all, this seems almost obvious. The OS should provide a chat completions server and the user should be able to select the underlying LLM's server. This should be just like selecting a default search engine or browser.

Hopefully the EU forces US tech giants to do this. God knows Apple and Google won't do this on their own. They gotta get that sweet default provider revenue.

show 2 replies
utopiahtoday at 12:05 PM

Honestly I don't get the point but if you want to explore that, both on desktop, mobile or headless server Linux allows you to try it.

You can run ollama with whatever you want on a Debian in literally minutes. You can even do that within a virtual machine using e.g. QEMU, so that you can do all the tests you need risk free.

Again I don't understand what that would enable that can't be done today but it's perfectly fine, you can try today anyway, no need to ask permission to anyone.

show 1 reply
throwaway894345today at 1:30 PM

I mean, the reason mobile data is part of the OS is because the antenna is hardware that must be shared across processes. Chat completions is just a network call like anything else—it’s already available to every app; they don’t need to pay separately (they can use the same account), they just pass their API key over the network to the completions server. What am I missing?

show 1 reply