BoredomIsFun, yesterday at 7:26 PM

No, not in milliseconds if you have a longish context. Prefill is very compute-heavy compared to token-by-token decoding.
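
A back-of-the-envelope sketch of why a long prompt does not prefill in milliseconds. It assumes the common rule of thumb of roughly 2 FLOPs per model parameter per prompt token and ignores the attention term, which grows with context length and only makes things worse. The model size, prompt lengths, and effective throughput are hypothetical numbers picked for illustration, not measurements.

```python
def prefill_seconds(n_params: float, n_prompt_tokens: int, effective_flops: float) -> float:
    """Rough prefill time estimate.

    Assumes ~2 FLOPs per parameter per prompt token (attention cost ignored)
    and a fixed effective throughput in FLOP/s. All inputs are illustrative.
    """
    return 2.0 * n_params * n_prompt_tokens / effective_flops


# Hypothetical setup: 7B-parameter model, 100 TFLOP/s of effective compute.
N_PARAMS = 7e9
EFFECTIVE_FLOPS = 1e14

short_prompt = prefill_seconds(N_PARAMS, 100, EFFECTIVE_FLOPS)     # 100-token prompt
long_prompt = prefill_seconds(N_PARAMS, 32_000, EFFECTIVE_FLOPS)   # 32k-token prompt

print(f"short prompt: {short_prompt * 1000:.1f} ms")
print(f"long prompt:  {long_prompt:.2f} s")
```

Under these assumptions the short prompt lands in the tens of milliseconds, while the 32k-token prompt takes several seconds, since the estimate scales linearly with prompt length.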