A billion stupid LLMs don't make a smart one; they just make one stupid LLM that's really fast at stupidity.
I think maybe there are subsets of problems where you can have either a human or a smart LLM write a verifier (e.g. a property-based test?) and a performance measurement, and then let the dumb models generate and iterate on candidates?
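Something like this loop, sketched below. It's a minimal, self-contained illustration, not a real pipeline: `generate_candidate` is a hypothetical stand-in for prompting a cheap model (here it just samples from a hardcoded pool of sort implementations so the script actually runs), the verifier is a simple randomized property test, and the performance measurement is plain wall-clock timing.

```python
import random
import timeit

def generate_candidate(rng: random.Random):
    """Hypothetical stand-in for a call to a cheap/"dumb" model.

    In practice this would prompt the model for a candidate implementation
    of the target function; here we sample from a fixed pool so the sketch
    stays runnable.
    """
    def wrong_sort(xs):  # buggy candidate: silently drops duplicates
        return sorted(set(xs))

    def slow_sort(xs):  # correct but slow candidate: bubble sort
        xs = list(xs)
        for i in range(len(xs)):
            for j in range(len(xs) - 1 - i):
                if xs[j] > xs[j + 1]:
                    xs[j], xs[j + 1] = xs[j + 1], xs[j]
        return xs

    def fast_sort(xs):  # correct and fast candidate
        return sorted(xs)

    return rng.choice([wrong_sort, slow_sort, fast_sort])

def verify(candidate, rng: random.Random, trials: int = 100) -> bool:
    """The verifier a human or smart LLM would write: a property-based test."""
    for _ in range(trials):
        xs = [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 50))]
        # Property: output equals the sorted input, which checks ordering
        # and multiset equality in one comparison.
        if candidate(xs) != sorted(xs):
            return False
    return True

def measure(candidate) -> float:
    """The performance measurement: time on a fixed adversarial workload."""
    data = list(range(300, 0, -1))  # reverse-sorted input
    return timeit.timeit(lambda: candidate(data), number=10)

def search(iterations: int = 20, seed: int = 0):
    """Generate candidates, reject failures, keep the fastest survivor."""
    rng = random.Random(seed)
    best, best_time = None, float("inf")
    for _ in range(iterations):
        cand = generate_candidate(rng)
        if not verify(cand, rng):
            continue  # candidate failed the property test; discard it
        t = measure(cand)
        if t < best_time:
            best, best_time = cand, t
    return best, best_time

if __name__ == "__main__":
    best, t = search()
    print(f"best verified candidate: {best.__name__}, {t:.4f}s on benchmark")
```

The point is that the verifier and the benchmark do all the quality control; the candidate generator only needs to be cheap and prolific, which is exactly the regime where a fast-but-dumb model might earn its keep.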