
CamouflagedKiwi · yesterday at 11:12 PM

It's not that you want it to be faster; you want the latency to be predictable and reliable, which is much more the case for local inference than for sending requests over a network (and especially to the current set of frontier model providers, who don't exactly have standout reliability numbers).
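The distinction between "fast" and "predictable" shows up in tail latency: a remote call can have a fine median but occasional multi-x stalls from queueing, retries, or provider load. A minimal sketch of how you might measure that, with `local_infer` and `remote_infer` as hypothetical stand-ins (the jitter here is simulated, not real provider data):

```python
import random
import statistics
import time

def latency_percentiles(call, n=200):
    """Time n invocations of `call`; return (p50, p99) latency in ms."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000)
    samples.sort()
    p50 = samples[len(samples) // 2]
    p99 = samples[min(len(samples) - 1, int(len(samples) * 0.99))]
    return p50, p99

# Stand-ins: a steady "local" call vs. a jittery "remote" call.
def local_infer():
    time.sleep(0.001)  # consistent ~1 ms per token/step

def remote_infer():
    # Mostly similar, but ~5% of calls stall for an extra 20 ms
    # (simulating queueing, retries, or network hiccups).
    time.sleep(0.001 + (0.020 if random.random() < 0.05 else 0.0))

random.seed(0)  # deterministic jitter for the demo
local_p50, local_p99 = latency_percentiles(local_infer)
remote_p50, remote_p99 = latency_percentiles(remote_infer)
```

With numbers like these, the medians are close but the remote p99 blows out, which is exactly the unpredictability the comment is pointing at.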