With this speed, you can keep looping and generating code until it passes all tests. If you have tests.
Generate lots of solutions and mix and match the best pieces. This opens up a whole new way of looking at LLMs.
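A minimal sketch of that loop, under stated assumptions: `generate_candidate` is a hypothetical stand-in for whatever fast model client you have, and `test_cmd` for your suite's runner (e.g. a pytest invocation).

```python
import subprocess
import tempfile

def generate_candidate(prompt: str) -> str:
    """Hypothetical stub: call your fast LLM and return a code string."""
    raise NotImplementedError

def passes_tests(code: str, test_cmd: list[str]) -> bool:
    """Write the candidate to disk and run the test suite against it."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    result = subprocess.run(test_cmd + [path], capture_output=True)
    return result.returncode == 0

def loop_until_green(prompt: str, test_cmd: list[str], max_iters: int = 100) -> str | None:
    """Keep sampling fresh candidates until one passes every test."""
    for _ in range(max_iters):
        candidate = generate_candidate(prompt)
        if passes_tests(candidate, test_cmd):
            return candidate
    return None  # no passing candidate within budget
```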
And then it's slow again, because you're iterating until you finally land on a correct answer...
Agreed, this is exciting, and it has me thinking about completely different orchestrator patterns. You could explore the solution space much the way a traditional optimization strategy like CMA-ES does: rather than expecting the first answer to be correct, you diverge wildly before converging.
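To make the diverge-then-converge shape concrete, here's a rough evolution-strategy-flavoured sketch. It is not real CMA-ES (which adapts a covariance matrix over continuous parameters); `sample_solutions` and `score` are hypothetical stubs. The idea: sample a hot, wide population first, then cool the sampler and mutate around the fittest candidate.

```python
def sample_solutions(prompt: str, n: int, temperature: float) -> list[str]:
    """Hypothetical stub: draw n candidate programs at this sampling temperature."""
    raise NotImplementedError

def score(code: str) -> float:
    """Hypothetical stub: fraction of the test suite the candidate passes."""
    raise NotImplementedError

def diverge_then_converge(prompt: str, generations: int = 5, pop: int = 32) -> str:
    """Start with a wide, hot population, then narrow the search around the best."""
    temperature = 1.2  # diverge wildly at first
    best = max(sample_solutions(prompt, pop, temperature), key=score)
    for _ in range(generations):
        temperature *= 0.7  # converge: cool the sampler each round
        # Re-prompt around the current best candidate (a crude "mutation")
        mutated = sample_solutions(f"{prompt}\nImprove this draft:\n{best}", pop, temperature)
        best = max(mutated + [best], key=score)
    return best
```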
This is what people already do with “ralph” loops using the top coding models. It’s slow relative to this, but still very fast compared to hand-coding.
This doesn't work. The model outputs the most probable tokens. Running it again and asking for less probable tokens just produces much the same output, with more errors.
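For what it's worth, "asking for less probable tokens" usually means raising the sampling temperature, which flattens the softmax over the logits. A toy illustration of that mechanism (the logit values are made up):

```python
import numpy as np

def sample_token(logits: np.ndarray, temperature: float, rng: np.random.Generator) -> int:
    """Temperature-scaled softmax sampling: T > 1 flattens the distribution,
    making less probable tokens more likely to be drawn."""
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

rng = np.random.default_rng(0)
logits = np.array([4.0, 2.0, 0.5])  # token 0 is strongly preferred
for t in (0.2, 1.0, 2.0):
    draws = [sample_token(logits, t, rng) for _ in range(1000)]
    print(t, np.bincount(draws, minlength=3) / 1000)  # token mix per temperature
```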
Not just looping: you could do a parallel graph search of the solution space until you hit a candidate that works.
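One possible shape for that search, assuming hypothetical `expand` and `score` stubs: best-first over candidate programs, with each node's variants generated in parallel.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def expand(candidate: str) -> list[str]:
    """Hypothetical stub: ask the model for several variants of this candidate."""
    raise NotImplementedError

def score(code: str) -> float:
    """Hypothetical stub: fraction of tests passed; 1.0 means done."""
    raise NotImplementedError

def graph_search(seed: str, width: int = 8, max_nodes: int = 200) -> str | None:
    """Best-first search over candidates, expanding nodes in parallel."""
    frontier = [(-score(seed), seed)]  # max-heap via negated scores
    seen, visited = {seed}, 0
    with ThreadPoolExecutor(max_workers=width) as pool:
        while frontier and visited < max_nodes:
            neg, node = heapq.heappop(frontier)
            if -neg >= 1.0:
                return node  # passes every test
            visited += 1
            # Fan out width expansion calls concurrently, then flatten the results
            for child in (c for cs in pool.map(expand, [node] * width) for c in cs):
                if child not in seen:
                    seen.add(child)
                    heapq.heappush(frontier, (-score(child), child))
    return None
```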