> Generated in 0.008s • 14,293 tok/s
Chat Jimmy runs ~300X faster than the ~50 tok/s you are used to. What could you do differently when you can generate code 3,000-30,000X faster than you could write it yourself? What if it were all good-quality code? What would you do differently at 100,000X? At Mtok/s? Gtok/s?
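To put those ratios in concrete terms, here is a rough back-of-envelope sketch. The 14,293 tok/s and ~50 tok/s figures come from the banner and the post. The 1,000,000-token "project size" is a hypothetical number chosen only for illustration, and the implied human typing rate simply inverts the 3,000-30,000X range above.

```python
# Back-of-envelope arithmetic for the speedups discussed above.
# OBSERVED and BASELINE come from the post; PROJECT_TOKENS is a hypothetical size.
OBSERVED = 14_293         # tok/s, from the "Generated in 0.008s" banner
BASELINE = 50             # tok/s, the "usual" speed cited in the post
PROJECT_TOKENS = 1_000_000  # hypothetical codebase size, for illustration only


def speedup(observed: float, baseline: float) -> float:
    """How many times faster `observed` is than `baseline`."""
    return observed / baseline


def implied_human_rate(observed: float, factor: float) -> float:
    """Human coding speed (tok/s) implied if the model is `factor`X faster."""
    return observed / factor


def seconds_to_generate(tokens: float, rate: float) -> float:
    """Wall-clock seconds to emit `tokens` at `rate` tok/s."""
    return tokens / rate


if __name__ == "__main__":
    print(f"speedup over baseline: {speedup(OBSERVED, BASELINE):.0f}X")
    for factor in (3_000, 30_000):
        print(f"{factor}X faster -> human ~{implied_human_rate(OBSERVED, factor):.2f} tok/s")
    for label, rate in [("50 tok/s", 50), ("14k tok/s", OBSERVED),
                        ("1 Mtok/s", 1e6), ("1 Gtok/s", 1e9)]:
        print(f"{label:>10}: {seconds_to_generate(PROJECT_TOKENS, rate):,.3f} s")
```

At ~50 tok/s the hypothetical project takes over five hours to emit. At Mtok/s it takes about a second, which is why the bottleneck shifts from generation to checking and steering.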
refine that to: what if your harness grew to encompass a larger, slower model and adapted to both the model and the project? that's where I expect the harness to go.
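One way to picture such a harness is a minimal sketch along these lines, assuming a hypothetical interface where each model is just a function from prompt to code and the project supplies a checker (e.g. its test suite). The fast model is tried first. The harness escalates to the slower model on failure and, per task kind, stops bothering the fast model once its success rate drops. `AdaptiveHarness`, the 3-attempt limit, and the 0.2 threshold are all illustrative assumptions, not a known design.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# Hypothetical interfaces: a model maps a prompt to code,
# a checker reports whether that code passes the project's tests.
Model = Callable[[str], str]
Checker = Callable[[str], bool]


@dataclass
class AdaptiveHarness:
    fast: Model                 # small, fast model (tried first)
    slow: Model                 # larger, slower model (fallback)
    check: Checker              # project-specific verification
    fast_attempts: int = 3      # illustrative retry budget for the fast model
    min_rate: float = 0.2       # illustrative cutoff for skipping the fast model
    # Per task kind: (fast-model successes, fast-model attempts).
    stats: Dict[str, Tuple[int, int]] = field(default_factory=dict)

    def fast_success_rate(self, kind: str) -> float:
        wins, tries = self.stats.get(kind, (0, 0))
        return wins / tries if tries else 1.0  # optimistic until there is data

    def run(self, kind: str, prompt: str) -> Tuple[str, str]:
        # Adapt to the project: only use the fast model where it has been earning its keep.
        if self.fast_success_rate(kind) >= self.min_rate:
            for _ in range(self.fast_attempts):
                code = self.fast(prompt)
                ok = self.check(code)
                wins, tries = self.stats.get(kind, (0, 0))
                self.stats[kind] = (wins + int(ok), tries + 1)
                if ok:
                    return "fast", code
        # Escalate to the larger, slower model.
        return "slow", self.slow(prompt)
```

The per-kind statistics are the "adapts to the project" part: a harness like this learns which kinds of tasks the small model handles and routes everything else to the big one.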
use the big models to code an adaptive small model. train it to use and build tools. give it a standard template language for any project and bake it into a chip.
right now, LLMs are great because they don't need much data pruning, but once they break through to the functional components, the first thing to do is train a well-scoped harness builder.