> rollout will be incremental and it will self monitor by defining success conditions at rollout time.
This sounds a lot like allowing an LLM to define tests as well as the implementation, and then allowing the LLM to update the tests to make the code pass. Recently people have come to understand (again?) that testing and evaluation work better outside of the sandbox.
Sorry, I wasn't very clear about that part. I think success conditions are described by stakeholders, whoever that is, and then the implementation of monitoring them is probably created by the LLM. For engineering-level stakeholders that's going to be metrics, performance, etc., whereas for more business-side stakeholders it'll be a mix of data metrics and product feature metrics: click-through rates, stuff like that.
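To make that split concrete, here's a minimal sketch (all names and thresholds hypothetical, not from any real system): stakeholders declare the success conditions as data, and the monitoring code that evaluates them, which could be LLM-generated, only reads those conditions and never rewrites them.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class SuccessCondition:
    # Declared by a stakeholder at rollout time; the monitoring code never edits these.
    name: str
    metric: str  # e.g. "p99_latency_ms" or "click_through_rate" (hypothetical metric names)
    predicate: Callable[[float], bool]

def evaluate(conditions: list[SuccessCondition], observed: dict[str, float]) -> dict[str, bool]:
    """The generated monitoring side: check each declared condition against live metrics."""
    return {c.name: c.predicate(observed[c.metric]) for c in conditions}

# An engineering-side condition and a business-side condition:
conditions = [
    SuccessCondition("latency ok", "p99_latency_ms", lambda v: v < 250),
    SuccessCondition("ctr holds", "click_through_rate", lambda v: v >= 0.02),
]

observed = {"p99_latency_ms": 180.0, "click_through_rate": 0.015}
print(evaluate(conditions, observed))
# {'latency ok': True, 'ctr holds': False}
```

The point of the separation is that a failed condition halts or rolls back the incremental rollout, and fixing it means changing the implementation, not editing the conditions.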