yorwba • today at 1:11 PM

It's a variable-rate codec. The audio is still compressed, but by how much depends on the duration of the segment corresponding to a particular text token. For each text token, the TTS model predicts one audio token plus the duration of that segment, and the audio decoder fills in a waveform of the appropriate length.
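To make the "variable-rate" part concrete, here is a rough sketch of the arithmetic. Everything in it is an assumption for illustration: the sample rate, the size of an audio token in bits, and the names `Segment`, `samples_for`, and `bitrate_bps` are all hypothetical and not taken from any actual model. The point is only that with one audio token per text token, a longer segment spreads the same token over more samples, so the effective bitrate drops.

```python
from dataclasses import dataclass

# Both constants are assumed values for illustration, not from a real model.
SAMPLE_RATE = 24_000        # output sample rate in Hz
BITS_PER_AUDIO_TOKEN = 16   # size of one audio token in bits


@dataclass
class Segment:
    text_token: str
    duration_s: float  # predicted duration of this token's audio, in seconds


def samples_for(segment: Segment, sample_rate: int = SAMPLE_RATE) -> int:
    """Number of waveform samples the decoder would have to fill in."""
    return round(segment.duration_s * sample_rate)


def bitrate_bps(segment: Segment, bits_per_token: int = BITS_PER_AUDIO_TOKEN) -> float:
    """Effective bitrate: one fixed-size token spread over the segment's duration."""
    return bits_per_token / segment.duration_s


if __name__ == "__main__":
    # Hypothetical segments: a short token and a long one.
    for seg in [Segment("Hi", 0.10), Segment("there", 0.40)]:
        print(seg.text_token, samples_for(seg), "samples,", bitrate_bps(seg), "bit/s")
```

Under these assumptions the short segment is coded at a higher bitrate than the long one, even though both cost exactly one audio token.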
