Hacker News

nkzd today at 5:50 PM

Why are you 'worried' about it? Shouldn't we strive for better technology even if it means some will 'lose'?


Replies

yorwba today at 6:09 PM

"Better" isn't just about increasing benchmark numbers. Often, it's more important that a system fails safely than how often it fails. Automatic speech recognition that guesses when the input is unclear will occasionally be right and therefore have a lower word error rate, but if it's important that the output be correct, it might be better to insert "[unintelligible]" and have a human double-check.
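A minimal sketch of the "fail safely" idea, assuming a hypothetical recognizer that returns a confidence score in [0, 1] for each word. The function name `mark_unclear`, the `(word, confidence)` input shape, and the 0.6 threshold are illustrative assumptions, not any real ASR library's API.

```python
UNCLEAR = "[unintelligible]"

def mark_unclear(words, threshold=0.6):
    """Replace low-confidence words with a marker instead of guessing.

    `words` is a list of (word, confidence) pairs from a hypothetical
    recognizer; `threshold` is an assumed cutoff, not a recommended value.
    """
    out = []
    for word, confidence in words:
        if confidence >= threshold:
            out.append(word)
        elif not out or out[-1] != UNCLEAR:
            # Collapse a run of uncertain words into a single marker.
            out.append(UNCLEAR)
    return " ".join(out)

print(mark_unclear([("take", 0.95), ("fifty", 0.40), ("milligrams", 0.90)]))
```

A human reviewer then only has to listen to the marked spans. The trade-off is that a lucky guess would have scored better on plain word error rate.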

ks2048 today at 8:08 PM

Ideally, you'd be able to specify exactly what you want - do you want to write out filled pauses ("aaah", "umm")? Do you want a transcription of the disfluencies (restarts, etc.), or just a cleaned-up version?

IshKebab today at 7:13 PM

It's better in terms of WER. It's not better in terms of not making shit up that sounds plausible.

Probably the answer is simply to tweak the metric so it's a bit smarter than WER - allow "unclear" output that is penalised less than actually incorrect answers. I'd be surprised if nobody has done that.
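One way such a tweaked metric could look, as a sketch rather than an established standard: a word-level edit distance where an "[unintelligible]" token in the hypothesis costs `unclear_cost` (assumed 0.5 here) instead of 1. The name `abstention_aware_wer` and the cost value are hypothetical choices for illustration.

```python
UNCLEAR = "[unintelligible]"

def abstention_aware_wer(reference, hypothesis, unclear_cost=0.5):
    """Word error rate where an UNCLEAR token is penalised less than a wrong word.

    With unclear_cost=1.0 this reduces to plain WER, treating UNCLEAR as
    just another wrong word.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    def hyp_cost(token):
        # Cost of a hypothesis token that does not match the reference.
        return unclear_cost if token == UNCLEAR else 1.0

    # dp[i][j] = minimum cost of aligning ref[:i] with hyp[:j]
    dp = [[0.0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        dp[i][0] = float(i)  # every reference word deleted
    for j in range(1, len(hyp) + 1):
        dp[0][j] = dp[0][j - 1] + hyp_cost(hyp[j - 1])  # every hypothesis word inserted

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            h = hyp[j - 1]
            substitution = 0.0 if h == ref[i - 1] else hyp_cost(h)
            dp[i][j] = min(
                dp[i - 1][j - 1] + substitution,  # match or substitution
                dp[i - 1][j] + 1.0,               # deletion
                dp[i][j - 1] + hyp_cost(h),       # insertion
            )
    return dp[-1][-1] / len(ref)

ref = "please transfer fifty dollars"
print(abstention_aware_wer(ref, "please transfer fifteen dollars"))          # confident wrong guess
print(abstention_aware_wer(ref, "please transfer [unintelligible] dollars")) # abstention
```

Under these assumptions the abstaining transcript scores 0.125 against 0.25 for the wrong guess, so a system is no longer rewarded for guessing. Picking `unclear_cost` is the real design question: too low and a system can flag everything as unclear at little cost.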