You are assuming statistical independence, which is explicitly not correct here. There is also an error in your analysis - what matters is whether they make the same wrong assumption. That is far less likely, and becomes exponentially unlikely with increasing trials.
I can attest that it works well in practice, and my organization is already deploying this technique internally.
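One way to read the "exponentially unlikely" claim is the rough sketch below. It rests on assumptions that are not stated in the thread: each trial is wrong with probability `p_wrong`, a wrong trial picks uniformly among `n_wrong_options` distinct wrong assumptions, and which one it picks is independent across trials. The function name and parameters are hypothetical. If the errors are correlated, the decay would be slower than this suggests.

```python
def p_all_share_same_wrong(p_wrong: float, n_wrong_options: int, trials: int) -> float:
    """Probability that every trial is wrong AND all pick the same wrong assumption.

    Illustrative assumptions (not from the thread): trials are independent,
    and each of the n_wrong_options wrong assumptions is equally likely.
    """
    # Chance that a single trial lands on one specific wrong assumption.
    per_option = p_wrong / n_wrong_options
    # All trials must hit that same option; sum over the possible options.
    return n_wrong_options * per_option ** trials


if __name__ == "__main__":
    # Illustrative numbers only: 30% error rate, 5 distinct ways to be wrong.
    for k in range(1, 6):
        print(k, p_all_share_same_wrong(0.3, 5, k))
```

Under these assumptions each extra trial multiplies the probability by `p_wrong / n_wrong_options`, which is the geometric decay the comment appears to describe.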
How do several wrong assumptions make it right with increasing trials?