Hacker News

aleksiy123, yesterday at 9:57 PM

It’s also just a useful exercise in general, especially for getting feedback on models and harnesses.

I’ve been thinking about setting up a non-trivial project to use as a benchmark for any plugin and/or harness changes I make.

Having a prebuilt verification suite is great. You can use it to assess things like token usage and run time across different harnesses, models, and plugins.
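
Roughly what I have in mind, as a minimal sketch rather than a real tool: each configuration (some harness + model + plugin combination) is wrapped in a callable that does the task, runs the verification suite, and reports whether it passed and how many tokens it used. The names here (`RunResult`, `benchmark`, `fake_run_a`, `fake_run_b`) and the token counts are all made up for illustration; a real runner would call out to the actual harness.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class RunResult:
    config: str     # label for the harness/model/plugin combination
    passed: bool    # did the verification suite pass?
    tokens: int     # tokens reported by the run
    seconds: float  # wall-clock time for the run


def benchmark(configs: Dict[str, Callable[[], Tuple[bool, int]]]) -> List[RunResult]:
    """Run each configuration once and record pass/fail, tokens, and time."""
    results = []
    for name, run in configs.items():
        start = time.perf_counter()
        passed, tokens = run()  # hypothetical: returns (suite_passed, tokens_used)
        elapsed = time.perf_counter() - start
        results.append(RunResult(name, passed, tokens, elapsed))
    return results


# Stand-ins for real runs; the numbers are placeholders, not measurements.
def fake_run_a() -> Tuple[bool, int]:
    return True, 1200


def fake_run_b() -> Tuple[bool, int]:
    return False, 800


results = benchmark({"harness-a": fake_run_a, "harness-b": fake_run_b})

# Passing runs first, then cheapest by token count.
for r in sorted(results, key=lambda r: (not r.passed, r.tokens)):
    status = "pass" if r.passed else "fail"
    print(f"{r.config}: {status}, {r.tokens} tokens, {r.seconds:.3f}s")
```

Sorting on pass/fail before tokens is deliberate: a cheap run that fails verification shouldn't rank above an expensive one that passes.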