Hacker News

yorwba yesterday at 10:28 AM

If you have the correct furigana, you could even detect when the TTS model picked the wrong reading and regenerate.

But how do you know the furigana are correct? Unless you start out with fully human-annotated text, you need some automated procedure to add furigana, which pushes the problem from "TTS AI picked the wrong reading" to "furigana AI picked the wrong reading."
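
For concreteness, the detect-and-regenerate idea could look roughly like the sketch below. It assumes the furigana reading is trusted, which is exactly the part in question. `synthesize` and `transcribe_to_kana` are hypothetical callables standing in for whatever TTS model and kana-producing speech recognizer you have; neither is a real library API.

```python
from typing import Callable, Optional


def katakana_to_hiragana(text: str) -> str:
    """Fold katakana into hiragana so readings compare equal in either script."""
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )


def synthesize_with_check(
    text: str,
    expected_reading: str,
    synthesize: Callable[[str], bytes],          # hypothetical TTS call
    transcribe_to_kana: Callable[[bytes], str],  # hypothetical ASR-to-kana call
    max_attempts: int = 3,
) -> Optional[bytes]:
    """Regenerate audio until its transcribed reading matches the furigana.

    Returns the first matching audio, or None if every attempt mismatched.
    """
    target = katakana_to_hiragana(expected_reading)
    for _ in range(max_attempts):
        audio = synthesize(text)
        heard = katakana_to_hiragana(transcribe_to_kana(audio))
        if heard == target:
            return audio
    return None
```

Note that a wrong furigana annotation makes this loop reject correct audio, so the check is only as good as the annotation step.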


Replies

mariano54 yesterday at 11:25 AM

Yes, it pushes the problem, but it's a much easier problem, and models like Gemini 2.5 Flash do very well at it.