Hacker News

OsamaJaber, yesterday at 6:40 PM

Small models in the browser are a different optimization problem than small models on a server. On a server you chase throughput, so you batch. In the browser you're stuck at batch size 1, which means kernel launch overhead and memory bandwidth dominate, not FLOPs.
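To make the batch-size-1 point concrete, here's a rough roofline-style sketch for a single fp16 linear layer. The device constants (`PEAK_FLOPS`, `PEAK_BW`, `LAUNCH_OVERHEAD`) are hypothetical round numbers, not measurements of any browser, WebGPU backend, or GPU. The model also ignores caching, kernel fusion, and quantization. It only shows which term wins as batch size and layer width change.

```python
"""Roofline-style estimate for one linear layer, y = x @ W.

All device numbers are hypothetical, picked only to illustrate the trend;
they are not measurements of any real browser or GPU.
"""

PEAK_FLOPS = 10e12        # hypothetical: 10 TFLOP/s of compute
PEAK_BW = 200e9           # hypothetical: 200 GB/s of memory bandwidth
LAUNCH_OVERHEAD = 10e-6   # hypothetical: 10 microseconds per kernel dispatch


def layer_cost(batch: int, d_in: int, d_out: int, bytes_per_elem: int = 2):
    """Return (flops, bytes moved) for (batch, d_in) @ (d_in, d_out) in fp16."""
    flops = 2 * batch * d_in * d_out                     # one multiply + one add per weight per row
    weight_bytes = d_in * d_out * bytes_per_elem         # weights read once per call, shared by all rows
    act_bytes = batch * (d_in + d_out) * bytes_per_elem  # read x, write y
    return flops, weight_bytes + act_bytes


def estimate_time(batch: int, d_in: int, d_out: int):
    """Return (seconds, name of the largest cost term) under this simple model."""
    flops, nbytes = layer_cost(batch, d_in, d_out)
    compute_t = flops / PEAK_FLOPS
    memory_t = nbytes / PEAK_BW
    costs = {"launch": LAUNCH_OVERHEAD, "memory": memory_t, "compute": compute_t}
    # Assume compute and memory traffic overlap, but the dispatch cost is paid up front.
    total = LAUNCH_OVERHEAD + max(compute_t, memory_t)
    return total, max(costs, key=costs.get)


if __name__ == "__main__":
    for batch, d in [(1, 512), (1, 4096), (256, 4096)]:
        total, dominant = estimate_time(batch, d, d)
        print(f"batch={batch:3d} d={d:4d}  {total / batch * 1e6:8.2f} us/row  dominated by {dominant}")
```

With these made-up numbers, the tiny layer at batch 1 is dominated by dispatch overhead and the wider one by reading its weights. Only the batched case becomes compute-bound. Per-row time also drops sharply once the weight read is amortized over many rows, which is what a server gets from batching and a single browser tab doesn't.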