Hacker News

mariano54 yesterday at 9:41 AM

MiniMax's new model is quite good. We use their voices for some of our Japanese tutors. The pitch accent is almost perfect.

Occasionally it produces incorrect readings or Chinese readings, but you can tell when that happens because the audio no longer matches the furigana.


Replies

yorwba yesterday at 10:28 AM

If you have the correct furigana, you could even detect when the TTS model picked the wrong reading and regenerate.

But how do you know the furigana are correct? Unless you start out with fully human-annotated text, you need some automated procedure to add furigana, which just moves the problem from "the TTS AI picked the wrong reading" to "the furigana AI picked the wrong reading."
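A rough sketch of the detect-and-regenerate idea, assuming you have some way to turn the generated audio back into a kana reading (e.g. an ASR model followed by kana conversion). All names here (`synthesize_checked`, `to_hiragana`, `synthesize`, `transcribe_kana`) are hypothetical placeholders, not any vendor's API, and the check is only as good as the furigana you compare against:

```python
import unicodedata
from typing import Any, Callable, Optional


def to_hiragana(kana: str) -> str:
    """Normalize a kana string so katakana/hiragana and half-width forms compare equal."""
    # NFKC folds half-width katakana (e.g. ｷｮｳ) into full-width katakana (キョウ).
    text = unicodedata.normalize("NFKC", kana)
    out = []
    for ch in text:
        code = ord(ch)
        # Katakana U+30A1..U+30F6 map onto hiragana by subtracting 0x60.
        if 0x30A1 <= code <= 0x30F6:
            out.append(chr(code - 0x60))
        elif not ch.isspace():
            out.append(ch)
    return "".join(out)


def synthesize_checked(
    text: str,
    expected_reading: str,
    synthesize: Callable[[str], Any],       # hypothetical TTS call: text -> audio
    transcribe_kana: Callable[[Any], str],  # hypothetical ASR + kana conversion: audio -> kana
    max_attempts: int = 3,
) -> tuple[Optional[Any], int]:
    """Regenerate until the transcribed reading matches the furigana, or give up.

    Returns (audio, attempts_used); audio is None if no attempt matched.
    """
    target = to_hiragana(expected_reading)
    for attempt in range(1, max_attempts + 1):
        audio = synthesize(text)
        if to_hiragana(transcribe_kana(audio)) == target:
            return audio, attempt
    return None, max_attempts


if __name__ == "__main__":
    # Fake stand-ins: the "TTS" just echoes the text, and the "ASR" hears the
    # Chinese-style reading こんにち for 今日 on the first take, きょう on the second.
    takes = iter(["コンニチ", "キョウ"])
    audio, attempts = synthesize_checked(
        "今日", "きょう",
        synthesize=lambda t: t,
        transcribe_kana=lambda a: next(takes),
    )
    print(audio, attempts)
```

With real models you would plug actual TTS and ASR calls into `synthesize` and `transcribe_kana`; passing them in as callables keeps the retry logic independent of any particular vendor.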
