Note that Claude Design is most likely a DPO -> PPO -> actor-critic bootstrap play: https://arxiv.org/abs/2305.18290 / https://en.wikipedia.org/wiki/Proximal_policy_optimization / https://spinningup.openai.com/en/latest/algorithms/sac.html
Design taste is much harder to train with RL because it's not self-grounding, and human labelers have no real skin in the game. So this (having a human with a vested interest in the outcome direct the model's work) is the best way to get LLMs better at design/"taste"/aesthetic judgment themselves. We were working on the same thing 7 months ago, and then I realized that winning over designers to do this would be a huge uphill battle, setting up an inevitable fall from grace later on.
What makes me most suspicious of Claude Design is that when you disconnect and reconnect later, it loses context and nags you that the product doesn't work like that. Bullshit. At best it's an anti-abuse measure or implementation detail (to keep you from launching 10 sessions at once and coming back to them later), or a product shortcoming that just so happens to keep you from continuing your design in better tools than theirs for the inevitable follow-ups.
It's great for one-shots, and it makes sense when you're trying to build a vertical product development stack like Anthropic is, but I'm disappointed that it feels more like a tool optimized for keeping you in their product than for what you're working on. A visual self-eval loop is not that hard to build: use the Chrome DevTools Protocol to run headless Chrome and take screenshots -> feed them into a judge LLM for feedback -> continue. If a company other than Anthropic had shipped this, I don't think it would really have seen much adoption.
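To make the screenshot -> judge -> continue loop concrete, here is a minimal sketch of the control flow only. Everything named here (`Verdict`, `self_eval_loop`, `render`, `judge`, `revise`, `target`, `max_rounds`) is hypothetical and made up for illustration. In a real setup `render` would drive headless Chrome and capture a screenshot (for example via the DevTools Protocol's `Page.captureScreenshot` command), and `judge` and `revise` would be LLM calls. They are passed in as plain callables so the loop itself stays testable.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Verdict:
    score: float   # judge's rating, assumed to be in [0.0, 1.0]
    feedback: str  # free-text critique handed back to the generator


def self_eval_loop(
    html: str,
    render: Callable[[str], bytes],     # hypothetical: HTML -> screenshot bytes
    judge: Callable[[bytes], Verdict],  # hypothetical: screenshot -> verdict
    revise: Callable[[str, str], str],  # hypothetical: (HTML, feedback) -> new HTML
    target: float = 0.8,
    max_rounds: int = 5,
) -> Tuple[str, List[Verdict]]:
    """Render, judge, and revise until the judge is satisfied or rounds run out.

    Returns the last HTML that was actually judged, plus every verdict.
    """
    history: List[Verdict] = []
    for round_no in range(max_rounds):
        verdict = judge(render(html))
        history.append(verdict)
        # Stop when the score is good enough, or when this was the final round
        # (so we never return a revision the judge has not seen).
        if verdict.score >= target or round_no == max_rounds - 1:
            break
        html = revise(html, verdict.feedback)
    return html, history


if __name__ == "__main__":
    # Stand-ins so the sketch runs without a browser or a model.
    fake_render = lambda page: page.encode()
    fake_judge = lambda shot: Verdict(min(1.0, shot.count(b"x") / 3), "add an x")
    fake_revise = lambda page, feedback: page + "x"

    final, verdicts = self_eval_loop("", fake_render, fake_judge, fake_revise)
    print(final, [round(v.score, 2) for v in verdicts])
```

The loop is the easy part. The judge's taste is the hard part, which is the point about human feedback above.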
That said, AI trained with actor-critic methods and a tight human feedback loop definitely seems like the right approach to solving the problem. It's just not something I want to spend my time training for someone else unless I can do so with higher "entropy", i.e. high parallelism/optionality.
Where does the article mention Claude Design? It seems to me the author is using LLMs as a tool for iteration, given he is a designer.
Also, you're mentioning a lot of unrelated tech. DPO, PPO, actor-critic, visual self-eval loops, and Anthropic's "vertical product development stack" may be interesting, but they are mostly orthogonal to the article. Its point is simply that a designer can now turn design proposals into working prototypes faster than with Figma.
You also mention what seems to be a random product bug about disconnecting and reconnecting, which doesn't have anything to do with this workflow. It seems to me that you're post-rationalising some insights that are not really there.
Good to think things through and in public, not discouraging it. I hope this reads as constructive.