Your voxtral.c work was a big motivator for me. I built a macOS menu bar dictation app (https://github.com/T0mSIlver/localvoxtral) around Voxtral Realtime; it currently runs on a voxmlx fork with an OpenAI Realtime WebSocket server I added on top.
The thing that sold me on Voxtral Realtime over Whisper-based models for dictation is the causal encoder. Whisper's encoder attends over a whole audio window, so transcription only begins once a chunk is complete; a causal encoder can emit tokens while audio is still arriving. Text streaming in as you speak rather than appearing after you stop is a fundamentally different UX. On M1 Pro with a 4-bit quant through voxmlx it feels responsive enough for natural dictation, though I haven't done proper latency benchmarks yet.
Integrating voxtral.c as a backend is on my roadmap; a single native binary is much easier to bundle into a macOS app than a Python-based backend.
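For anyone curious what the streaming side looks like: a Realtime-style server sends transcript deltas as JSON events, and the app appends text as each one arrives. Here's a minimal sketch of the accumulation logic. The event names follow the OpenAI Realtime transcription events; the exact schema my server emits may differ, so treat this as illustrative:

```python
import json

def apply_event(transcript: str, raw: str) -> str:
    """Fold one Realtime-style JSON event into the running transcript."""
    event = json.loads(raw)
    # Delta events carry a small text fragment; append it immediately
    # so the UI updates while the user is still speaking.
    if event["type"] == "conversation.item.input_audio_transcription.delta":
        return transcript + event["delta"]
    # The completed event carries the final text for the item; prefer it
    # over the accumulated deltas in case the server revised anything.
    if event["type"] == "conversation.item.input_audio_transcription.completed":
        return event["transcript"]
    return transcript  # ignore unrelated event types

# Simulated stream: text shows up delta by delta, not all at the end.
events = [
    '{"type": "conversation.item.input_audio_transcription.delta", "delta": "hello "}',
    '{"type": "conversation.item.input_audio_transcription.delta", "delta": "world"}',
    '{"type": "conversation.item.input_audio_transcription.completed", "transcript": "hello world"}',
]
transcript = ""
for raw in events:
    transcript = apply_event(transcript, raw)
    print(transcript)
```

The point is that the UI never waits for the `completed` event to show something; each delta is paint-worthy on its own.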
> Text streaming in as you speak rather than appearing after you stop is a fundamentally different UX
100%. I don’t understand how people are able to compromise on this.