"WebRTC is the problem" is bait; his real claim is "WebRTC has annoying transport-layer characteristics that hurt cloud Voice AI scaling"...
Having just had to tackle this again for my own startup, I'm reminded of what you would lose by ditching WebRTC: the audio DSP pipeline, transmit-side VAD, echo cancellation, noise suppression, NAT traversal maturity, codec integration, browser ubiquity, etc.
You don't need NAT traversal when talking to a cloud service.