I'm skeptical about how fast "up to" 750t/s really is. Maybe if they make it extremely expensive so that it frees up enough capacity?
GPT‑5.3‑Codex‑Spark currently runs on Cerebras chips, and it's giving me around 150t/s. That's still very fast, relatively speaking, but nowhere near the 1,000t/s they claimed at launch. (Also, it's not a very good model.)
That said, I'm super bought into the idea that faster models are better than smarter models for most use cases.
Soon the bottleneck will be how fast your laptop can grep for a string.
If it's 150t/s, that's barely faster than Nvidia GPUs, which batch a lot more and are a lot more cost-effective. Add in the Groq piece and Nvidia claims it can do 400t/s.