logoalt Hacker News

htrpyesterday at 11:33 PM1 replyview on HN

I've seen the same, Sparks are great at non time-sensitive tasks. if you can set up a agentic loop that does not require human intervention, you can design around the memory bandwidth limitations


Replies

girvotoday at 12:36 AM

The other benefit is that speculative decoding literally trades compute to make up for low bandwidth, so MTP/EAGLE/DFlash are unreasonably effective on the GB10 IMO, as long as your use case fits it.

I’m getting 40tk/s decode with 1000+tk/prefill with a 198B-A11B model on mine