You could get the agents to output something structured and then use a deterministic test if you're worried about that.