kbdiaz • yesterday at 11:16 PM • 1 reply

Isn't continuous batching the standard? If they are using continuous batching, I'm curious why generated token length matters and why they might be clustering requests by it. If they aren't, I'm curious why not and what the tradeoff is.
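
To make the question concrete, here is a rough sketch of why output length matters under static batching but much less under continuous batching. It is a toy model, not a description of any particular serving stack: it only counts decode steps, assumes one token per sequence per step, and ignores prefill, memory limits, and per-step cost. The function names and the example lengths are made up for illustration.

```python
from collections import deque


def static_batching_steps(lengths, batch_size):
    """Decode steps when each batch runs until its longest sequence finishes."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        # Every slot in the batch is held until the longest sequence is done.
        steps += max(lengths[i:i + batch_size])
    return steps


def continuous_batching_steps(lengths, batch_size):
    """Decode steps when a finished sequence's slot is refilled right away."""
    waiting = deque(lengths)
    running = []  # tokens still to generate for each active sequence
    steps = 0
    while waiting or running:
        # Admit waiting requests into any free slots before the next step.
        while waiting and len(running) < batch_size:
            running.append(waiting.popleft())
        # One decode step: each active sequence emits one token,
        # and sequences that just emitted their last token leave the batch.
        running = [r - 1 for r in running if r > 1]
        steps += 1
    return steps


# Hypothetical output lengths (in tokens), alternating short and long.
lengths = [10, 500, 12, 480, 8, 510, 15, 490]

static_mixed = static_batching_steps(lengths, batch_size=2)
static_sorted = static_batching_steps(sorted(lengths), batch_size=2)
continuous = continuous_batching_steps(lengths, batch_size=2)

print(static_mixed, static_sorted, continuous)
```

In this toy setup, grouping similar lengths together roughly halves the step count for static batching, while continuous batching gets close to the same result without any grouping. That is the reason clustering by length would be surprising if continuous batching were already in use.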

Replies

ACCount37 • today at 12:13 AM

This "~512 batching" makes me think of things like diffusion or prefill.

If they managed to put together some dirty hack that lets them generate about 512 tokens' worth of reasoning in parallel instead of in sequence, that would explain it.
