> They are very trivial to detect.
Today. Trying to detect AI is like extracting water from puddles in a lake that is quickly drying up. What is the point in the short term if it's impractical in the long term? At best it will catch some low-hanging fruit, and at worst it will produce false positives.
My point is that you should consider it effectively impossible to create truly undetectable audio end to end with AI for the foreseeable future (i.e., I would bet money it will still be trivially detectable five years from now). It won't be detectable to humans, though, only to models.