Hacker News

antirez, today at 8:20 AM (2 replies)

Related:

https://github.com/antirez/qwen-asr

https://github.com/antirez/voxtral.c

Qwen-asr can easily transcribe live radio (see README) on any random laptop. It looks like we are going to see really cool things in local inference, now that automatic programming makes it a lot simpler to create solid pipelines for new models in C, C++, Rust, ..., in a matter of hours.


Replies

T0mSIlver, today at 10:14 AM

Your voxtral.c work was a big motivator for me. I built a macOS menu bar dictation app (https://github.com/T0mSIlver/localvoxtral) around Voxtral Realtime, currently using a voxmlx fork with an OpenAI Realtime WebSocket server I added on top.

The thing that sold me on Voxtral Realtime over Whisper-based models for dictation is the causal encoder. Text streaming in as you speak rather than appearing after you stop is a fundamentally different UX. On M1 Pro with a 4-bit quant through voxmlx it feels responsive enough for natural dictation, though I haven't done proper latency benchmarks yet.
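To make the causal-encoder point concrete, here is a toy sketch in Python. It is not Voxtral's actual encoder: the `encode` function, the scalar "frames", and the sample values are all made up for illustration. It only shows why a causal encoder can emit output while audio is still arriving: the encoding of earlier frames does not change when later frames show up, whereas with a non-causal (bidirectional) encoder it does.

```python
import math

def encode(frames, causal):
    """Toy single-head self-attention over scalar 'frames' (hypothetical stand-in)."""
    out = []
    for t, q in enumerate(frames):
        # A causal encoder lets frame t see only frames 0..t;
        # a non-causal one lets every frame see the whole utterance.
        visible = frames[: t + 1] if causal else frames
        scores = [q * k for k in visible]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, visible)) / total)
    return out

# Made-up feature values standing in for audio frames.
audio = [0.1, 0.4, -0.2, 0.3, 0.8, -0.5]
partial = audio[:3]  # what has been heard so far

def prefix_is_stable(causal):
    """True if encoding the partial audio matches the same prefix of the full encoding."""
    early = encode(partial, causal)
    late = encode(audio, causal)[: len(partial)]
    return all(math.isclose(a, b) for a, b in zip(early, late))

print("causal prefix stable:    ", prefix_is_stable(True))
print("non-causal prefix stable:", prefix_is_stable(False))
```

In the causal case the early outputs are final as soon as they are computed, so a decoder can start producing text from them right away. In the non-causal case they depend on audio that has not been spoken yet, so the model has to wait for the end of the utterance.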

Integrating voxtral.c as a backend is on my roadmap: compiling to a single native binary makes it much easier to bundle into a macOS app than a Python-based backend.

pjmlp, today at 9:43 AM

Which is why, long term, current programming languages will eventually become less relevant in the whole programming stack: what matters is getting the computer to automate tasks, regardless of how.
