Hacker News

throwthrowuknow yesterday at 4:59 PM

It doesn’t need to go in the phone if it only takes a few milliseconds to respond and is cheap.


Replies

yunwal yesterday at 7:06 PM

Perceptible latency is somewhere between 10 and 100 ms. Even if an LLM were hosted in every AWS region in the world, latency would likely be annoying if you were expecting near-realtime responses (for example, if you were using an LLM as autocomplete while typing). If, say, Apple had an LLM on a chip that any app could access through an SDK, it could feasibly unlock a whole bunch of use cases that would be impractical with a network call.

Also, offline access is still a necessity for many use cases. If you have something like an autocomplete feature that stops working when you're on the subway, the change in UX between offline and online makes the feature more disruptive than helpful.

https://www.cloudping.co/

hamdingers yesterday at 5:07 PM

It does if you care about who can access your tokens.