Pure C, CPU-only inference with Mistral Voxtral Realtime 4B speech-to-text model

278 points by Curiositry today at 1:17 AM | 27 comments

Comments

d4rkp4ttern today at 1:07 PM

I use the open-source Handy [1] app with Parakeet V3 for STT when talking to coding agents, and I’ve yet to see anything that beats this setup in terms of speed/accuracy. I get near-instant transcription, and the slight accuracy drop is immaterial when talking to AIs that can “read between the lines”.

I tried incorporating this Voxtral C implementation into Handy, but got very slow transcriptions on my 64 GB M1 Max MacBook.

[1] https://github.com/cjpais/Handy

I’ll have to try the other implementations mentioned here.

mythz today at 9:52 AM

Big fan of Salvatore's voxtral.c and flux2.c projects - hope they continue to get optimized, as it'd be great to have lean options without external deps. Unfortunately voxtral.c is currently too slow for real-world use (AMD 7800X3D / BLAS), which I ran into when adding Voice Input support to llms-py [1].

In the end, Omarchy's new support for voxtype.io provided the nicest UX, followed by Whisper.cpp; despite being slower, OpenAI's original Whisper is still a solid local transcription option.

Also very impressed with both the performance and price of Mistral's new Voxtral Transcription API [2] - really fast (near-instant) and really cheap ($0.003/min); IMO it's the best option for CPU/disk-constrained environments.

[1] https://llmspy.org/docs/features/voice-input

[2] https://docs.mistral.ai/models/voxtral-mini-transcribe-26-02
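
As a reference point, here's roughly what a call looks like from the shell - a minimal sketch assuming Mistral exposes an OpenAI-style /v1/audio/transcriptions route (the endpoint path and form fields are assumptions; the model name is taken from [2] - verify all of it against the docs):

    # sketch only: endpoint path and -F field names are assumed, not confirmed
    curl https://api.mistral.ai/v1/audio/transcriptions \
      -H "Authorization: Bearer $MISTRAL_API_KEY" \
      -F model=voxtral-mini-transcribe-26-02 \
      -F file=@recording.wav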

Curiositry today at 3:45 AM

This was a breeze to install on Linux. However, I haven't managed to get realtime transcription working yet, à la Whisper.cpp's stream example or Moonshine.

--from-mic only supports macOS. I'm able to capture audio with ffmpeg, but adapting the ffmpeg example to use mic capture hasn't worked yet:

ffmpeg -f pulse -channels 1 -i 1 -f s16le - 2>/dev/null | ./voxtral -d voxtral-model --stdin

It's possible my system is simply under spec for the default model.
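
Before concluding it's hardware, one thing worth ruling out: the pipe above sends PCM at the pulse source's native sample rate, while the model presumably expects 16 kHz mono (an assumption here; the repo's own ffmpeg example should confirm the expected format). Forcing the rate and channel count explicitly would look like:

    # resample mic capture to 16 kHz mono s16le before piping it in
    ffmpeg -f pulse -i 1 -ar 16000 -ac 1 -f s16le - 2>/dev/null | ./voxtral -d voxtral-model --stdin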

I'd like to be able to use this with the voxtral-q4.gguf quantized model from here: https://huggingface.co/TrevorJS/voxtral-mini-realtime-gguf

written-beyond today at 8:18 AM

Funny, this and the Rust runtime implementation are neck and neck on the frontpage right now.

Cool project!

hrpnk today at 9:50 AM

There is also an MLX implementation: https://github.com/awni/voxmlx

sgt today at 8:05 AM

I'm very interested in speech-to-text - especially tricky dialects and specialized terminology - but I'm still confused about the best place to start for training models on a huge database of voice samples I own.

Any ideas from the HN crowd currently involved in speech-to-text models?

ks2048 today at 6:31 PM

Should this work on a 16 GB M3 MacBook Pro? It starts to load, but then hangs or is too slow.

9999_points today at 4:42 PM

It seems so bizarre that we need a nearly 9 GB model to do something you could do over 20 years ago with ~200 MB.
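
The size is at least consistent with the parameter count, assuming the published weights are 16-bit (an assumption about the release format):

    4e9 params × 2 bytes/param ≈ 8 GB of raw weights, before KV cache and runtime buffers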

sylware today at 11:20 AM

Finally, a plain and simple C lib for running open LLM weights?

alextray812 today at 12:29 PM

From a cybersecurity perspective, this project is impressive not just for performance, but for transparency.