Of course, the model they use for code generation may itself have been trained on the very open source code they are trying to replicate 'cleanly'.