quantumleaper • yesterday at 2:00 PM • 1 reply

How are you iterating on a system prompt and tool descriptions without an eval that gives you hard numbers for improvement or regression?

Replies

yogthos • yesterday at 5:41 PM

I look at what the model is doing in the loop and whether the harness is catching the failure cases: the model having to write scripts to balance parens, the model trying the same thing over and over again, and all the other cases I explained in detail in the blog post.

Even without having hard numbers, it's pretty easy to see from the log whether the model is getting stuck or not.
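
As a rough illustration of the "same thing over and over" check, here is a minimal Python sketch. The log format (a list of `(tool_name, arguments)` tuples), the function name `find_repeated_calls`, the sample entries, and the threshold of 3 are all assumptions made up for this example, not the harness from the blog post.

```python
from collections import Counter


def find_repeated_calls(calls, threshold=3):
    """Return the calls that occur at least `threshold` times, with their counts.

    `calls` is a hypothetical log format: a list of (tool_name, arguments) tuples.
    """
    counts = Counter(calls)
    return {call: n for call, n in counts.items() if n >= threshold}


# Hypothetical log of tool calls from one agent run.
log = [
    ("run_command", "fix_parens.py src/app"),
    ("run_command", "make test"),
    ("run_command", "fix_parens.py src/app"),
    ("run_command", "make test"),
    ("run_command", "fix_parens.py src/app"),
    ("read_file", "README"),
]

# Calls repeated three or more times are candidates for "the model is stuck".
stuck = find_repeated_calls(log)
for (tool, args), n in stuck.items():
    print(f"{tool} {args!r} repeated {n} times")
```

This only counts exact duplicates, so it is a cheap first pass over the log rather than a replacement for reading it.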
