OsamaJaber • yesterday at 6:40 PM

Small models in the browser are a different optimization problem than small models on a server. On a server you chase throughput, so you batch. In the browser you're stuck at batch size 1, so every weight is loaded from memory to serve a single input, which means kernel launch overhead and memory bandwidth dominate, not FLOPs.
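
To put rough numbers on the memory bandwidth half of that claim, here is a minimal roofline-style sketch for a single dense layer. The function names, the 4096x4096 layer, the fp16 assumption, and the device figures (2 TFLOP/s peak, 100 GB/s bandwidth) are all made up for illustration and not measurements of any real browser or GPU. It also ignores kernel launch overhead entirely, so treat it as a back-of-envelope estimate only.

```python
def arithmetic_intensity(batch, d_in, d_out, bytes_per_value=2):
    """FLOPs per byte moved for one dense layer (x @ W), fp16 by default."""
    # One multiply and one add per weight per input row.
    flops = 2 * batch * d_in * d_out
    # Weights are read once; input and output activations scale with batch.
    bytes_moved = bytes_per_value * (d_in * d_out + batch * (d_in + d_out))
    return flops / bytes_moved


def is_memory_bound(batch, d_in, d_out, peak_flops, bandwidth_bytes_per_s):
    """True when the layer sits below the device's roofline ridge point."""
    ridge = peak_flops / bandwidth_bytes_per_s  # FLOPs/byte needed to saturate compute
    return arithmetic_intensity(batch, d_in, d_out) < ridge


# Hypothetical device: 2 TFLOP/s peak compute, 100 GB/s memory bandwidth.
PEAK_FLOPS = 2e12
BANDWIDTH = 100e9

for batch in (1, 64):
    ai = arithmetic_intensity(batch, 4096, 4096)
    bound = "memory" if is_memory_bound(batch, 4096, 4096, PEAK_FLOPS, BANDWIDTH) else "compute"
    print(f"batch={batch}: {ai:.2f} FLOPs/byte, {bound}-bound")
```

Under these assumptions, batch size 1 lands at just under 1 FLOP per byte, far below the hypothetical ridge point of 20, while batch size 64 clears it. That gap is why batching helps on a server and why, at batch size 1, shrinking the bytes moved per token (for example through quantization) tends to matter more than raw compute.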
