Hacker News

guerython | today at 3:15 AM | 2 replies

Nice work. One metric I’d really like to see for streaming use cases is partial stability, not just final WER.

For voice agents, the painful failure mode is partials getting rewritten every few hundred ms. If you can share them, metrics like median first-token latency, real-time factor, and "% partial tokens revised after 1s / 3s" on noisy far-field audio would make comparisons much more actionable.

If those numbers look good, this seems very promising for local assistant pipelines.
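
To make "% partial tokens revised after 1s / 3s" concrete, here is a rough sketch of one way it could be computed from a log of streaming partials. The definition is my own assumption, not a standard one: a token position counts as revised if its text changes (or it disappears) at least `age_s` seconds after that position first appeared. The names `Partial` and `revised_after` are hypothetical, and the log format (emit time plus the full token list so far) is assumed.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Assumed log format: (emit time in seconds, full token list of that partial).
Partial = Tuple[float, Sequence[str]]


def revised_after(partials: List[Partial], age_s: float) -> float:
    """Fraction of token positions rewritten >= age_s seconds after first appearing."""
    first_seen: Dict[int, float] = {}  # position -> time it was first emitted
    revised: Set[int] = set()          # positions changed late at least once
    prev: List[str] = []
    for t, tokens in partials:
        cur = list(tokens)
        for i in range(max(len(prev), len(cur))):
            old = prev[i] if i < len(prev) else None
            new = cur[i] if i < len(cur) else None
            if old is None:
                # Position is new (or reappearing); keep its earliest timestamp.
                first_seen.setdefault(i, t)
            elif new != old and t - first_seen[i] >= age_s:
                # Text changed or vanished after the position was already "old".
                revised.add(i)
        prev = cur
    return len(revised) / len(first_seen) if first_seen else 0.0


# Made-up example: position 1 flips "of" -> "off" -> "on".
partials = [
    (0.2, ["turn"]),
    (0.6, ["turn", "of"]),
    (1.5, ["turn", "off"]),
    (2.0, ["turn", "off", "the"]),
    (4.0, ["turn", "on", "the", "light"]),
]
print(revised_after(partials, 1.0), revised_after(partials, 3.0))
```

Comparing by position is the simplest choice but it over-counts when an insertion shifts every later token, so an alignment-based comparison would probably be fairer for real benchmarks.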


Replies

regularfry | today at 1:50 PM

Tangentially, have you got any idea what the equivalent "partial tokens revised" rate for humans is? I know I've consciously experienced backtracking and re-interpreting words before, and presumably it happens subconsciously all the time. But that means there's a bound on how low it's reasonable to expect that rate to be, and I don't have an intuition for what it is.

PranayKumarJain | today at 10:30 AM

[flagged]