lukebechtel, today at 7:03 PM

Yes, speculative decoding will make both us and vLLM faster, but we believe the bump would be roughly even on both sides, so we didn't include it in this comparison. Worth another test!