Hacker News

danudey, yesterday at 4:34 PM (3 replies)

So throughput was already good but TTFT was the metric that needed more improvement?


Replies

zamadatix, yesterday at 5:28 PM

To add to the sibling's "good is relative" point: it also depends on what you're running, not just on your own tolerance for what counts as good. E.g. with an MoE, the decode speedup makes the prompt-processing delay more noticeable for the same size of model in RAM.
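A rough back-of-the-envelope sketch of that point, assuming total latency is just prefill time plus decode time. All the numbers are made up for illustration, `prefill_share` is a hypothetical helper, and approximating TTFT as `prompt_tokens / prefill_tps` ignores the first decode step:

```python
def prefill_share(prompt_tokens, output_tokens, prefill_tps, decode_tps):
    """Return (ttft_s, total_s, fraction of total time spent before the first token)."""
    ttft = prompt_tokens / prefill_tps    # time spent processing the prompt
    decode = output_tokens / decode_tps   # time spent generating the output
    total = ttft + decode
    return ttft, total, ttft / total


# Hypothetical numbers: same prompt, same prefill speed, only decode speed differs.
dense = prefill_share(8000, 500, prefill_tps=400, decode_tps=10)
moe = prefill_share(8000, 500, prefill_tps=400, decode_tps=40)

for name, (ttft, total, share) in (("dense", dense), ("moe", moe)):
    print(f"{name}: TTFT {ttft:.1f}s of {total:.1f}s total ({share:.0%} waiting on first token)")
```

With these made-up inputs the wait for the first token is 20 s in both cases, but it goes from roughly 29% of the total time to roughly 62% once decode is 4x faster, which is why it stands out more.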

convenwis, yesterday at 4:46 PM

Good is relative, but time to first token was clearly the biggest limitation.

brookst, today at 1:22 AM

Yeah TTFT was terrible. I don’t think it’s unreasonable to benchmark the most-improved metric.