wow Mistral really cooked
Do you know anything better than Whisper large-v3 through WhisperX for Polish-language, low-quality audio?
This combo has almost unbeatable accuracy and it rejects noises in the background really well. It can even reject people talking in the background.
The only better thing I've seen is the Ursa model from Speechmatics. Not open weights, unfortunately.
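For anyone who wants to try that combo: a minimal sketch of how I'd invoke the WhisperX CLI on a noisy Polish recording. The flag names (`--model`, `--language`, `--compute_type`) are from the `whisperx` command-line tool; the specific defaults below are my own guesses, not the commenter's settings, so tune them to your hardware.

```python
# Sketch: build a whisperx CLI call for low-quality Polish audio.
# Assumes the whisperx package is installed and on PATH.
import shlex


def whisperx_cmd(audio_path: str,
                 language: str = "pl",
                 model: str = "large-v3",
                 compute_type: str = "float16") -> str:
    """Assemble a shell command for transcribing one audio file."""
    args = [
        "whisperx", audio_path,
        "--model", model,              # Whisper large-v3 checkpoint
        "--language", language,        # skip language detection on noisy audio
        "--compute_type", compute_type,  # drop to int8 on CPU / small GPUs
    ]
    return " ".join(shlex.quote(a) for a in args)


print(whisperx_cmd("interview.wav"))
# whisperx interview.wav --model large-v3 --language pl --compute_type float16
```

Pinning `--language pl` matters here: on garbled audio, automatic language detection is one more thing that can fail before transcription even starts.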
Disappointing that this lacks a clear reference implementation, beyond code mixed into not-yet-released vLLM nightly builds. I'm OK with open weights being a form of OSS in the case of models, because frankly I don't believe it's feasible, for large LLMs, to release the training data, all the orchestration code, and so forth. But it can't just be: here are the weights, we partnered with vLLM for inference. Come on. Open weights must mean you put me in a position to easily write an implementation for any hardware.
p.s. even the demo uses a remote server via websocket.
I'm on voxtral-mini-latest and that's why I started seeing 500s today lol
Pseudo-related -- am I the only one uncomfortable using my voice with AI, out of concern that once it's in the training data it's forever reproducible? As a non-public person it seems like a risk vector (albeit a small one).
Can it translate in real time?