Ah I see! Yes, I was talking about a coding harness, not an enterprise agent. I entirely agree with you that your suggestion of driving it via evals is the right thing for that use case!