Hacker News

syntex, today at 10:51 AM (0 replies)

These benchmarks mean very little. The real test is model + harness, i.e. an agentic system that can fulfill given goals.