Only need one; they're not thinking critically about the media they consume during training.
The secret sauce behind good understanding, taste, and style (both for coding and writing) has always been in the fine-tuning and RLHF steps. I'd be skeptical that the signals a few GitHub repos or blogs generate in the initial stages of training are that critical. There's probably also a quality filter applied to the initial training set, and these sets are so large that not even a single full epoch is run over the data these days.
It wouldn’t work at all.
Here's a sad prediction: over the coming few years, AIs will get significantly better at critical evaluation of sources, while humans will get even worse at it.