I built this because I wanted to see how far I could get with a voice-to-text app that uses 100% local models, so no data ever leaves my computer. I've been using it a ton for coding and emails, and I'm experimenting with using it as a voice interface for my other agents too. It's 100% open source under the MIT license; would love feedback, PRs, and ideas on where to take it.
nice to see this running fully local. what model size are you shipping as default, and what's the cold-start time on Apple Silicon? I've been using Whisper locally for meeting transcription and the biggest friction point is always endpoint detection - knowing when you've stopped talking vs pausing to think. curious how you handle that with hold-to-talk.
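for reference, the crude endpoint detector I bolted onto my own Whisper setup is just an RMS energy gate with a silence timeout; the threshold and frame counts below are made-up values you'd have to tune, not anything from this project:

    import numpy as np

    def speech_ended(frames, rms_threshold=0.01, silence_frames=30):
        # frames: iterable of float32 audio chunks, e.g. 20 ms each.
        # A frame counts as "silent" when its RMS energy is below
        # rms_threshold; we call the utterance done once silence_frames
        # silent frames arrive in a row.
        silent_run = 0
        for frame in frames:
            rms = float(np.sqrt(np.mean(np.square(frame))))
            silent_run = silent_run + 1 if rms < rms_threshold else 0
            if silent_run >= silence_frames:
                return True  # long enough pause: treat as end of speech
        return False  # stream ended before the timeout fired

the obvious failure mode is exactly the one I mentioned: a long thinking pause trips the timeout, which is why hold-to-talk sidesteps the problem entirely.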
always mac. when windows? why can't you just make things multi-OS?