It's better in terms of WER. It's not better in terms of not making shit up that sounds plausible.
Probably the answer is simply to tweak the metric so it's a bit smarter than WER: allow an "unclear" output that is penalised less than an actually incorrect answer. I'd be surprised if nobody has done that already.
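Roughly what I have in mind, as a minimal sketch (not any existing toolkit's metric). It assumes the model can emit a hypothetical `<unclear>` marker token. It computes the usual Levenshtein-style word alignment, but a reference word replaced by `<unclear>` costs `unclear_cost` (an arbitrary 0.5 here) instead of the full 1.0 charged for a confidently wrong word. With `unclear_cost=1.0` it reduces to plain WER.

```python
UNCLEAR = "<unclear>"  # hypothetical marker token, not part of any standard WER tool


def soft_wer(reference: str, hypothesis: str, unclear_cost: float = 0.5) -> float:
    """WER variant: substituting a reference word with UNCLEAR costs
    `unclear_cost` instead of 1.0. Insertions and deletions still cost 1.0."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dp[i][j] = minimum cost of aligning ref[:i] with hyp[:j]
    dp = [[0.0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        dp[i][0] = float(i)  # i deletions
    for j in range(1, len(hyp) + 1):
        dp[0][j] = float(j)  # j insertions

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                sub = 0.0  # correct word
            elif hyp[j - 1] == UNCLEAR:
                sub = unclear_cost  # honest "don't know": partial penalty
            else:
                sub = 1.0  # plausible-sounding but wrong: full penalty
            dp[i][j] = min(
                dp[i - 1][j] + 1.0,      # deletion
                dp[i][j - 1] + 1.0,      # insertion
                dp[i - 1][j - 1] + sub,  # match or substitution
            )

    # Normalise by reference length, as standard WER does
    return dp[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    print(soft_wer(ref, "the cat sat on the hat"))        # one wrong word: 1/6
    print(soft_wer(ref, "the cat sat on the <unclear>"))  # one unclear word: 0.5/6
```

So a transcript that admits it couldn't make out "mat" scores better than one that confidently says "hat", whereas plain WER treats both the same. The right weight is a judgment call: too low and a model can game it by marking everything unclear.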