Do we know if this is better than Nvidia Parakeet V3? That has been my go-to model locally and it's hard to imagine there's something even better.
I’m curious about this too. On my M1 Max MacBook I use the Handy app with Parakeet V3 and get near-instant transcription. Accuracy is slightly below the slower Whisper models, but that drop is immaterial when talking to CLI coding agents, which is where I find the most use for this.
I've been using Parakeet V3 locally, and totally anecdotally, this feels more accurate but slightly slower.
I liked Parakeet v3 a lot until it started to drop whole sentences, willy-nilly.
Parakeet is really good imo too, and it's just 0.6B so it can actually run on edge devices. 4B is massive; I don't see Voxtral running in realtime on an Orin or fitting on a Hailo. An Orin Nano probably can't even load it at BF16.
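Rough math (assuming the Orin Nano's ~8 GB of shared CPU/GPU memory): 4B parameters at BF16 is roughly 7.5 GiB of weights alone, before activations or any KV cache.

```python
# Back-of-the-envelope weight memory for a 4B-parameter model at BF16.
# This counts weights only; activations and KV cache come on top.
params = 4e9           # 4 billion parameters
bytes_per_param = 2    # BF16 = 2 bytes per parameter

weights_gib = params * bytes_per_param / 1024**3
print(f"~{weights_gib:.1f} GiB of weights")  # ~7.5 GiB, against ~8 GiB total on an Orin Nano
```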
Came here to ask the same question!
I've been using nemotron ASR with my own ported inference, and I'm happy with it:
https://huggingface.co/nvidia/nemotron-speech-streaming-en-0...
https://github.com/m1el/nemotron-asr.cpp
https://huggingface.co/m1el/nemotron-speech-streaming-0.6B-g...
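If you'd rather try it in Python first, here's a minimal sketch using the standard NeMo ASR API. I'm not certain the streaming checkpoint loads this way (that's part of why I ported the inference), and the model id below is just a placeholder:

```python
# Minimal sketch, assuming the checkpoint loads via the standard NeMo ASR API;
# the streaming Nemotron model may instead require the ported inference linked above.
import nemo.collections.asr as nemo_asr

# Hypothetical model id; substitute the actual Hugging Face repo name.
model = nemo_asr.models.ASRModel.from_pretrained("nvidia/<nemotron-speech-checkpoint>")

# Pass a list of audio file paths; returns one transcript per file.
transcripts = model.transcribe(["sample.wav"])
print(transcripts[0])
```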