I think it's because it's non-deterministic too. You can't iteratively improve design the same way you can code.
If they wanted couldn't they do something like RLHF? Instead of humans picking the best of 2 text outputs, they pick the best rendered design
If they wanted couldn't they do something like RLHF? Instead of humans picking the best of 2 text outputs, they pick the best rendered design