That happens in most speech-to-text systems, even Superwhisper, Monologue, and Wispr Flow. I read somewhere that it comes from training on YouTube audio and happens when there is silence. I guess it depends on the model, but most of them are based on Whisper, which has this problem.