First things first: this is really cool. This feels like the right way to frame LLM-assisted engineering. AI can generate a shocking amount of code, but the actual value is in the review discipline and the tests around it. The browser Kubernetes angle is neat, but what I find more interesting is the workflow, especially testing behaviour against k8s instead of just trusting “looks right.” I do wonder how many teams are already doing this level of verification for AI-written code. It might be the direction everyone goes in over the next few years.
I mean, this is a specific case where you literally have a spec to code against. Unfortunately, not all coding endeavors have that opportunity.