Probably explains why Opus was trash for the last week: https://marginlab.ai/trackers/claude-code/. Curious whether the new baseline will now rise in line with the new benchmarks.
Nice. Can you release that for older models too? I've been using a mixture of releases recently, and cannot tell the difference between any of them.
This is cool. Thanks for sharing!