refulgentis • yesterday at 6:57 AM

It's not fast enough to be realtime, though with a more advanced UI and a ring buffer you could get the behavior you describe. (e.g., I do this with Whisper in Flutter, and also run inference on GGUFs in llama.cpp via Dart.)
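To make the ring buffer idea concrete, here's a rough sketch in Python (not the Dart/Flutter code mentioned above). The class name `AudioRingBuffer` and its methods are made up for illustration: the audio callback pushes samples in, and the transcriber periodically grabs the most recent window.

```python
class AudioRingBuffer:
    """Hypothetical fixed-capacity buffer that keeps only the newest samples."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._buf = [0.0] * capacity
        self._capacity = capacity
        self._write = 0  # index of the next slot to overwrite
        self._size = 0   # number of valid samples currently stored

    def push(self, samples) -> None:
        """Append samples, overwriting the oldest ones once the buffer is full."""
        for s in samples:
            self._buf[self._write] = s
            self._write = (self._write + 1) % self._capacity
            self._size = min(self._size + 1, self._capacity)

    def latest(self, n: int) -> list:
        """Return up to n of the most recent samples, oldest first."""
        n = min(n, self._size)
        start = (self._write - n) % self._capacity
        return [self._buf[(start + i) % self._capacity] for i in range(n)]
```

The capture side would call `push()` with each incoming chunk, and the inference side would call `latest()` with however many samples the model wants per pass. How big the window is and how often to re-run inference depend on the model, so those are left out here.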

This isn't even close to realtime on an M4 Max. Whisper is ~realtime on any post-2022 device with an ONNX implementation. On consumer hardware the extra inference cost isn't worth the WER decrease, or at least it wouldn't be worth the time to implement.
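For anyone unsure what "realtime" means here: the usual yardstick is the real-time factor, i.e. processing time divided by audio duration, where anything under 1.0 keeps up with the input. A minimal sketch, with hypothetical function names and no measured numbers:

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Ratio of compute time to audio length; below 1.0 keeps up with live input."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def is_realtime(processing_seconds: float, audio_seconds: float) -> bool:
    """True when the audio is transcribed faster than it plays back."""
    return real_time_factor(processing_seconds, audio_seconds) < 1.0
```

You'd time one transcription pass over a clip of known length and feed both values in. The 1.0 cutoff is the bare minimum; a live UI probably wants some headroom below that.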
