
CamouflagedKiwi · yesterday at 11:12 PM

It's not that you want it to be faster; you want the latency to be predictable and reliable, which is much more the case for local inference than for sending requests over a network (and especially to the current set of frontier model providers, who don't exactly have standout reliability numbers).
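The distinction between "fast" and "predictable" shows up in tail latency: a remote call can have a fine median but occasional multi-x stalls from queueing, retries, or provider load. A minimal sketch of how you might measure that, with `local_infer` and `remote_infer` as hypothetical stand-ins (the jitter here is simulated, not real provider data):

```python
import random
import statistics
import time

def latency_percentiles(call, n=200):
    """Time n invocations of `call`; return (p50, p99) latency in ms."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000)
    samples.sort()
    p50 = samples[len(samples) // 2]
    p99 = samples[min(len(samples) - 1, int(len(samples) * 0.99))]
    return p50, p99

# Stand-ins: a steady "local" call vs. a jittery "remote" call.
def local_infer():
    time.sleep(0.001)  # consistent ~1 ms per token/step

def remote_infer():
    # Mostly similar, but ~5% of calls stall for an extra 20 ms
    # (simulating queueing, retries, or network hiccups).
    time.sleep(0.001 + (0.020 if random.random() < 0.05 else 0.0))

random.seed(0)  # deterministic jitter for the demo
local_p50, local_p99 = latency_percentiles(local_infer)
remote_p50, remote_p99 = latency_percentiles(remote_infer)
```

With numbers like these, the medians are close but the remote p99 blows out, which is exactly the unpredictability the comment is pointing at.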