I believe Gemini uses WebSockets? I've had the same experience with heavy/custom applications that try to roll their own media handling.
You run into issues around AudioContext and resumption, etc. It's a PITA to have to handle all those corner cases :(
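
For anyone hitting the resumption part, here's a rough sketch of the kind of guard I mean. It is not how Gemini or any particular app does it. `ensureRunning` and `ResumableContext` are names I made up for illustration; the only real API assumed is the standard `AudioContext.state` property and `AudioContext.resume()` method.

```typescript
// Minimal shape of the parts of AudioContext used here (hypothetical
// interface, so the helper can also be exercised with a stand-in object).
interface ResumableContext {
  readonly state: string; // "suspended" | "running" | "closed"; some browsers add others
  resume(): Promise<void>;
}

// Hypothetical helper: try to get the context running and report whether it worked.
async function ensureRunning(ctx: ResumableContext): Promise<boolean> {
  // A closed context cannot be resumed; the caller has to create a new one.
  if (ctx.state === "closed") return false;

  if (ctx.state !== "running") {
    try {
      // Browsers may refuse or stall this outside a user gesture.
      await ctx.resume();
    } catch {
      return false;
    }
  }
  // Re-check: resume() resolving does not guarantee the state a caller expects.
  return ctx.state === "running";
}
```

In a browser you'd typically call something like this from a click/touch handler and again on `visibilitychange`, and fall back to rebuilding the context when it returns `false`. That covers only one of the corner cases, which is kind of the point.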