Hacker News

wongarsu, yesterday at 7:36 PM

See also https://marginlab.ai/trackers/claude-code-historical-perform... for a more conventional approach to tracking regressions.

This project's approach is somewhat unconventional, but that might surface issues that typical benchmark datasets mask.