It's a race to the bottom. DeepSeek beats all others (single-shot), and it is ~50% cheaper than the cost of local electricity only.
> DeepSeek V3.2 Reasoning 86.2% ~$0.002 API, single-shot
> ATLAS V3 (pass@1-v(k=3)) 74.6% ~$0.004 Local electricity only, best-of-3 + repair pipeline
I've tested many open models, and DeepSeek 3.2 is the only one that comes close to SOTA.
You could use this approach with DeepSeek as well. The innovation here is that you can generate a bunch of solutions, use a small model to pick promising candidates and then test them. Then you feed errors back to the generator model and iterate. In a way, it's sort of like a genetic algorithm that converges on a solution.
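A rough sketch of that loop, in case it helps. This is not ATLAS's actual code: `solve`, `generate`, `score`, and `run_tests` are hypothetical names, and the toy stand-ins below replace the real generator model, the small ranking model, and the test harness.

```python
from typing import Callable, List, Optional


def solve(
    generate: Callable[[str, int], List[str]],   # big model: prompt -> k candidates (assumed interface)
    score: Callable[[str], float],               # small model: higher means more promising (assumed)
    run_tests: Callable[[str], Optional[str]],   # returns None on pass, else an error message
    task: str,
    k: int = 3,
    keep: int = 2,
    max_rounds: int = 4,
) -> Optional[str]:
    prompt = task
    for _ in range(max_rounds):
        candidates = generate(prompt, k)
        # Let the cheap scorer pick which candidates are worth testing.
        shortlist = sorted(candidates, key=score, reverse=True)[:keep]
        errors = []
        for candidate in shortlist:
            error = run_tests(candidate)
            if error is None:
                return candidate
            errors.append(error)
        # Feed the failures back to the generator and try again.
        prompt = task + "\n\nPrevious attempts failed with:\n" + "\n".join(errors)
    return None


# Toy stand-ins so the sketch runs without any model.
def toy_generate(prompt: str, k: int) -> List[str]:
    # Pretends the model only gets it right after seeing error feedback.
    body = "a + b" if "failed" in prompt else "a - b"
    return [f"def add(a, b):\n    return {body}"] * k


def toy_score(candidate: str) -> float:
    # Placeholder heuristic: prefer shorter candidates.
    return -len(candidate)


def toy_run_tests(candidate: str) -> Optional[str]:
    scope: dict = {}
    exec(candidate, scope)  # fine for a toy; sandbox real model output
    got = scope["add"](2, 3)
    return None if got == 5 else f"add(2, 3) returned {got}, expected 5"


result = solve(toy_generate, toy_score, toy_run_tests, "Write add(a, b).")
```

With the toy functions, the first round fails its test, the error goes back into the prompt, and the second round returns a passing candidate. Swapping in DeepSeek would only mean changing what `generate` calls.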
All those parameters and it still won't answer questions about Tiananmen Square in 1989... :(
> cheaper than the cost of local electricity only.
Can you explain what that means?
I will "suffer" through $0.004 of electricity if I can run it on my own computer.