With things like the latest DLSS (extremely high-quality run-time ... reinterpretation), I wonder how precise the mesh etc. has to be now.
1. extract even a super approximate mesh (meaning blocky, square edges with some visual detail) from gen AI or a scan as a starting point,
2. move things around and define volumes for gameplay needs,
3. name things ("this is a Victorian house in surprisingly good condition compared to the neighborhood it's in"), then have human-guided gen AI polish things a bit more from the labels, within the bounds of the gameplay-required volumes,
4. let run-time DLSS fix the lighting etc. from the rough geometry.
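
The "within the bounds of the gameplay volumes" part of step 3 is at least easy to check mechanically. A rough sketch in Python, assuming the volumes are plain axis-aligned boxes and the polished mesh is just a list of vertex positions (`GameplayVolume` and `escaped_vertices` are made-up names for illustration, not from any engine or tool):

```python
from dataclasses import dataclass

Vec3 = tuple[float, float, float]


@dataclass(frozen=True)
class GameplayVolume:
    """Hypothetical axis-aligned box locked down for gameplay (step 2)."""
    label: str  # free-text description handed to the gen AI pass (step 3)
    lo: Vec3    # minimum corner
    hi: Vec3    # maximum corner

    def contains(self, p: Vec3, tol: float = 0.0) -> bool:
        # A point is inside if every coordinate lies within [lo - tol, hi + tol].
        return all(l - tol <= c <= h + tol for c, l, h in zip(p, self.lo, self.hi))


def escaped_vertices(volume: GameplayVolume, vertices: list[Vec3], tol: float = 0.0) -> list[int]:
    """Return indices of vertices that the polish pass pushed outside the volume."""
    return [i for i, v in enumerate(vertices) if not volume.contains(v, tol)]


house = GameplayVolume(
    label="Victorian house in surprisingly good condition compared to the neighborhood",
    lo=(0.0, 0.0, 0.0),
    hi=(10.0, 12.0, 8.0),
)

# Stand-in for vertices coming back from the gen AI polish pass.
polished = [(0.5, 0.0, 0.5), (9.8, 11.9, 7.5), (10.4, 6.0, 4.0)]

print(escaped_vertices(house, polished))  # [2]
```

Anything that comes back in that list would get clamped or sent back for another pass, and the `tol` parameter leaves room for small decorative overhangs that don't matter for gameplay.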