74% on LCB from a single 5060 Ti. I've been paying Anthropic per task and this guy is running it on electricity money. 20 minutes per task is rough for anything interactive, though.
At 20 min per task you might as well code it yourself. Bill James needs to write a book on sabermetrics for LLM benchmarks.