bmc7505 • today at 5:40 PM

17k tokens per second (TPS) is slow compared to other probabilistic models. Decades ago, n-gram and PDFA models could already hit ~10-20 million TPS without custom silicon. A more informative KPI would be pass@k on a downstream reasoning task: on many such benchmarks, increasing token throughput by several orders of magnitude does not even move the needle on sample efficiency.
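
For anyone unfamiliar with the metric, here is a rough sketch of how pass@k is commonly estimated. It assumes the usual setup where you draw n samples per problem and c of them pass the checker; the function name `pass_at_k` and the argument names are mine, not from any particular benchmark harness.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples is correct.

    n: total samples drawn for the problem (assumed setup)
    c: how many of those n samples passed
    k: sampling budget being evaluated, with 1 <= k <= n
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # Fewer than k failing samples: every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # 1 minus the fraction of size-k subsets made up only of failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical numbers: 200 samples per problem, 12 of them correct.
    for k in (1, 10, 100):
        print(k, round(pass_at_k(200, 12, k), 4))
```

Nothing in the formula depends on how fast the samples were generated. Throughput only decides how large a k you can afford in a given wall-clock budget, so it helps the score only if pass@k is still climbing at that k.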