T0mSIlver • today at 9:48 AM

Congrats on the results. The streaming aspect is what I find most exciting here.

I built a macOS dictation app (https://github.com/T0mSIlver/localvoxtral) on top of Voxtral Realtime, and the UX difference between streaming and offline STT is night and day. Words appearing while you're still talking completely changes the feedback loop. You catch errors in real time, you can adjust what you're saying mid-sentence, and the whole thing feels more natural. Going back to "record then wait" feels broken after that.

Curious how Moonshine's streaming latency compares in practice. Do you have numbers on time-to-first-token for the streaming mode? And on the serving side, do any of the integration options expose an OpenAI Realtime-compatible WebSocket endpoint?
