logoalt Hacker News

taffydavidyesterday at 8:04 AM1 replyview on HN

I've built my own tts apps testing whisper and while it's good it does hallucinate quite a bit if there's noise, or just sometimes when the audio is perfectly clear.

It often gives the illusion of being very good but I could record a half hour of me speaking and discover some very random stuff in the middle that I did not say


Replies

cootsnuckyesterday at 4:29 PM

Yup, you're absolutely right. The open source models do have their rough edges. I use NVIDIA's Parakeet v3 model a lot locally, and it will occasionally do this thing where it just repeats a word like a dozen times.