Hacker News

throwthrowuknow yesterday at 4:59 PM

It doesn’t need to go in the phone if it only takes a few milliseconds to respond and is cheap.


Replies

yunwal yesterday at 7:06 PM

Perceptible latency is somewhere between 10 and 100 ms. Even if an LLM were hosted in every AWS region in the world, latency would likely be annoying if you were expecting near-realtime responses (for example, if you were using an LLM as autocomplete while typing). If, say, Apple had an LLM on a chip that any app could access through an SDK, it could feasibly unlock a whole bunch of use cases that would be impractical with a network call.

Also, offline access is still a necessity for many use cases. If you have something like an autocomplete feature that stops working when you're on the subway, the change in UX between offline and online makes the feature more disruptive than helpful.

https://www.cloudping.co/

hamdingers yesterday at 5:07 PM

It does if you care about who can access your tokens.