Hacker News

Voxtral Transcribe 2

972 points by meetpateltech, last Wednesday at 3:08 PM, 237 comments

Comments

ewuhic last Wednesday at 5:52 PM

Can it translate in real time?

bytesandbits yesterday at 4:17 AM

wow Mistral really cooked

scotty79 last Wednesday at 8:23 PM

Do you know anything better for Polish-language, low-quality audio than Whisper large-v3 through WhisperX?

This combo has almost unbeatable accuracy and it rejects noises in the background really well. It can even reject people talking in the background.

The only better thing I've seen is the Ursa model from Speechmatics. Not open weights, unfortunately.

antirez last Wednesday at 10:26 PM

Disappointing how this lacks a clear reference implementation, other than what is mixed into almost-unreleased vLLM (nightly version) code. I'm OK with Open Weights being a form of OSS in the case of models, because frankly I don't believe that, for large LLMs, it is feasible to release the training data, all the orchestration stuff, and so forth. But it can't be: "here are the weights, we partnered with vLLM for inference." Come on. Open Weights must mean that you put me in a position to easily write an implementation for any hardware.

P.S. Even the demo uses a remote server via WebSocket.

dumpstate last Wednesday at 5:31 PM

I'm on voxtral-mini-latest and that's why I started seeing 500s today lol

boringg last Wednesday at 4:52 PM

Pseudo-related -- am I the only one uncomfortable using my voice with AI, out of concern that once it is in the training data it is forever reproducible? As a non-public person, it seems like a risk vector (albeit a small one).

varispeed last Wednesday at 4:06 PM

[flagged]
