
ankit219 · yesterday at 5:33 PM

People are misunderstanding Anthropic's fast mode because of the name they chose for it. The hints all point to a specific technique. The setup is costlier, yet it's also smarter and better on harder problems, which is unheard of for a pure speed optimization. This paper[1] fits perfectly:

The setup is parallel distill-and-refine. You start with several trajectories in parallel instead of one, distill from them, then refine the distillate into an answer. Instead of running every trajectory to completion, they distill early and refine, so the output arrives fast and is still smarter.
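A minimal sketch of what that loop could look like. The async `sample()` helper is a hypothetical stand-in for a model call; the parallelism degree, token caps, and prompt wording are all illustrative assumptions, not Anthropic's actual API:

```python
import asyncio

async def sample(prompt: str, max_tokens: int) -> str:
    # Hypothetical stand-in for a model call; name and signature are
    # assumptions, not a real API. Returns a dummy string here.
    await asyncio.sleep(0)
    return f"<model output for: {prompt[:40]}>"

async def parallel_distill_refine(question: str, n: int = 8) -> str:
    # 1. Launch n trajectories in parallel, capped early rather than
    #    run to completion (the key latency saving).
    partials = await asyncio.gather(
        *[sample(question, max_tokens=512) for _ in range(n)]
    )
    # 2. Distill: compress the partial trajectories into one summary.
    summary = await sample(
        f"Question: {question}\n\nPartial attempts:\n"
        + "\n---\n".join(partials)
        + "\n\nDistill the most promising reasoning so far.",
        max_tokens=512,
    )
    # 3. Refine: produce the final answer from the distilled summary.
    return await sample(
        f"Question: {question}\n\nDistilled reasoning:\n{summary}\n\n"
        "Refine this into a final answer.",
        max_tokens=1024,
    )

# Usage: asyncio.run(parallel_distill_refine("some hard question"))
```

Note the token math: n capped trajectories plus a distill and a refine pass burn far more tokens than one sequential run, which is consistent with fast mode costing more.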

- The paper came out in October 2025.

- Three months is a realistic research-to-production timeline.

- One of the authors is at Anthropic.

- This approach will definitely burn more tokens than a plain single run, which lines up with the higher cost.

- > Anthropic explicitly warns that time to first token might still be slow (or even slower)

To those suggesting alternatives: speculative decoding wouldn't make the model smarter or make any difference to outputs. Batching changes could explain the speed, but not the extra cost.

Gemini Deep Think and gpt-5.2-pro use the same underlying parallel test-time compute, but they take each trajectory to completion before distilling and refining for the user.
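For contrast, a sketch of that run-to-completion variant, reusing the same hypothetical `sample()` stub (again purely illustrative, not any vendor's real implementation): every trajectory gets a full token budget, so distillation only starts once the slowest one finishes.

```python
import asyncio

async def sample(prompt: str, max_tokens: int) -> str:
    # Same hypothetical stand-in for a model call as in the sketch above.
    await asyncio.sleep(0)
    return f"<model output for: {prompt[:40]}>"

async def complete_then_refine(question: str, n: int = 8) -> str:
    # Every trajectory runs to completion with a full token budget,
    # so total latency is gated by the slowest sample.
    completed = await asyncio.gather(
        *[sample(question, max_tokens=8192) for _ in range(n)]
    )
    # Only after all attempts finish are they distilled and refined
    # into a single answer for the user.
    return await sample(
        f"Question: {question}\n\nCompleted attempts:\n"
        + "\n---\n".join(completed)
        + "\n\nMerge these into one best final answer.",
        max_tokens=1024,
    )
```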

[1]: https://arxiv.org/abs/2510.01123


Replies

xcodevn · yesterday at 6:30 PM

Anthropic's official documentation says:

> Fast mode is not a different model. It uses the same Opus 4.6 with a different API configuration that prioritizes speed over cost efficiency. You get identical quality and capabilities, just faster responses.