> I get the idea: get 10k each samples of human data and AI data, train a simple classifier until it gets 99.9999% accuracy or <10k false negatives per day at your scale
The issue is, that's not a thing. AI-generated content and human-generated content have significant overlap. No amount of training data lets you distinguish them with that level of accuracy, because many outputs could have been generated by either one. Additional training data lets you say that the probability a given output is AI-generated is 55.0374% plus or minus 0.0001%, rather than only being able to say that it's 55% plus or minus 5%. It can tell you with greater precision exactly how ambiguous it is. What it can't do is remove the ambiguity.
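
To make that concrete, here's a toy sketch in Python. Everything in it is made up for illustration: the three possible "outputs" and the `P_HUMAN` / `P_AI` probabilities are hypothetical, chosen so that output "a" is AI-generated exactly 55% of the time, and the standard error is only a rough approximation. The point is that the accuracy ceiling is fixed by the overlap between the two distributions, while the sample size only controls how tightly you can estimate the 55%.

```python
import math
import random

# Hypothetical toy distributions over three possible outputs (assumed for illustration).
P_HUMAN = {"a": 0.45, "b": 0.45, "c": 0.10}
P_AI = {"a": 0.55, "b": 0.35, "c": 0.10}


def bayes_accuracy(p_human, p_ai):
    """Best accuracy any classifier can reach with equal priors:
    for each output, guess whichever source is more likely."""
    return 0.5 * sum(max(p_human[x], p_ai[x]) for x in p_human)


def estimate_posterior(output, n_per_class, rng):
    """Estimate P(AI | output) from n_per_class samples of each source.
    Returns the estimate and a rough binomial standard error."""
    keys = list(P_HUMAN)
    human = rng.choices(keys, weights=[P_HUMAN[k] for k in keys], k=n_per_class)
    ai = rng.choices(keys, weights=[P_AI[k] for k in keys], k=n_per_class)
    from_human = human.count(output)
    from_ai = ai.count(output)
    total = from_human + from_ai
    if total == 0:
        # The output never appeared, so the data says nothing about it.
        return 0.5, float("inf")
    p = from_ai / total
    return p, math.sqrt(p * (1 - p) / total)


if __name__ == "__main__":
    rng = random.Random(0)
    print("accuracy ceiling:", bayes_accuracy(P_HUMAN, P_AI))
    for n in (100, 10_000, 1_000_000):
        p, se = estimate_posterior("a", n, rng)
        print(f"n={n}: P(AI | 'a') ~ {p:.4f} +/- {se:.4f}")
```

As n grows the error bar shrinks, but the estimate settles near 0.55 instead of moving toward 0 or 1, and the ceiling from `bayes_accuracy` doesn't move at all.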