Nice one! For Linux folks, I developed https://github.com/goodroot/hyprwhspr.
On Linux, there's access to the latest Cohere Transcribe model, and it works very, very well. It requires a GPU, though. Larger local models generally shouldn't need a subordinate model for cleanup.
Have you compared WhisperKit to faster-whisper or similar? You might be able to run large-v3-turbo successfully and negate the need for cleanup.
Incidentally, waiting for Apple to blow this all up with native STT any day now. :)
I've been running Whisper large-v3 on an M2 Max through a self-hosted endpoint, and honestly the accuracy is good enough that I stopped bothering with cleanup models. The bigger annoyance for me was latency on longer chunks: anything over 30 seconds starts feeling sluggish, even with Metal acceleration. I haven't tried WhisperKit specifically, but I'm curious how it handles longer audio compared to the full model.
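For anyone hitting the same latency wall, here is a minimal sketch of one way to keep each request at or under 30 seconds, which is also the window size Whisper works on internally. It assumes 16 kHz mono samples. The function name `chunk_spans` and the one-second overlap are made up for illustration and don't come from any of the tools mentioned in this thread.

```python
from typing import Iterator, Tuple


def chunk_spans(
    total_samples: int,
    sample_rate: int = 16_000,      # assumption: 16 kHz mono input
    max_seconds: float = 30.0,      # the threshold where latency got annoying
    overlap_seconds: float = 1.0,   # hypothetical overlap so words aren't cut at a boundary
) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) sample indices covering the audio in windows of at most max_seconds."""
    if overlap_seconds >= max_seconds:
        raise ValueError("overlap must be shorter than the window")

    window = int(max_seconds * sample_rate)
    step = window - int(overlap_seconds * sample_rate)

    start = 0
    while start < total_samples:
        end = min(start + window, total_samples)
        yield start, end
        if end == total_samples:
            break  # last window reached the end of the audio
        start += step


# Example: list the spans for a 70-second recording.
for start, end in chunk_spans(16_000 * 70):
    print(f"{start / 16_000:.1f}s to {end / 16_000:.1f}s")
```

Each span could then be sent to the endpoint separately. Stitching the overlapping text back together is left out here, and that is usually the fiddly part.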
Thanks for sharing! I was literally getting ready to build, essentially, this. Now it looks like I don't have to!
Have you ever considered using a foot-pedal for PTT?
Apple incidentally already has native STT, but for some reason they just don't use a decent model yet.
Nice, I've been using hyprwhspr on Omarchy daily for a while now. It's been awesome, thanks very much.
Looks like there's a nearly identically named one for Hyprland.
Also, I wish it were in nixpkgs, where it would at least be almost guaranteed to build forever =)
How does it compare to the more well-established https://github.com/cjpais/handy? Are there any standout features (for either option)? What was the reason for writing your own rather than using or improving existing software?