No, it isn't. The data that matters is largely private data, created by highly specialized, highly paid contracted teams of experts for domains like finance, SWE, consulting, etc.
Reddit data just isn't that interesting; that deal is worth like $60m/year. Labs spend 10x as much on computer-use RL environments.
Sorry, but your argument doesn't seem coherent: how is the cost of RL relevant here?
It would also help if you could substantiate your initial claim (i.e. "internet training data is not where frontier capabilities come from")