Hacker News

staticassertion, yesterday at 7:34 PM

Can you elaborate a bit on what "working correctly" would look like? I have made use of agents, so me saying "they worked correctly for me" would be evidence of them doing so, but I'd have to know what "correctly" means.

Maybe this comes down to what it would mean for an agent to do something. For example, if I were to prompt an agent, would that not meet your criteria?