Was searching for this this morning and settled on https://handy.computer/
There's also offline software called VoiceInk for macOS. No need for Groq or external AI.
I also vibe coded my own version of this. Funny to see how many people did that.
https://github.com/PawelAdamczuk/blah
Mine was only tested on an Arc GPU (the acceleration works nicely through Vulkan). It hooks into the Win32 API and simulates key presses, so it works in various non-obvious contexts.
To build your own STT (speech-to-text) tool with a local model and modify it, just ask Claude Code to build it for you with this workflow:
F12 -> sox for recording -> temp.wav -> faster-whisper -> pbcopy -> notify-send to know what’s happening
https://github.com/sathish316/soupawhisper
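A minimal Python sketch of that pipeline, assuming sox (`rec`), faster-whisper, pbcopy, and notify-send are installed; the function names and the 5-second recording window are illustrative, not from the repo:

```python
import os
import subprocess
import tempfile

def record_cmd(path, seconds=5):
    # sox's `rec` front-end: -q quiet, trim caps the recording length
    return ["rec", "-q", path, "trim", "0", str(seconds)]

def transcribe(path):
    from faster_whisper import WhisperModel  # pip install faster-whisper
    model = WhisperModel("base", compute_type="int8")  # quantized, CPU-friendly
    segments, _info = model.transcribe(path)
    return " ".join(s.text.strip() for s in segments)

def to_clipboard(text):
    subprocess.run(["pbcopy"], input=text.encode())  # macOS; xclip -sel clip on Linux

def notify(msg):
    subprocess.run(["notify-send", "whisper", msg])  # Linux; osascript on macOS

def run_once():
    wav = os.path.join(tempfile.gettempdir(), "temp.wav")
    subprocess.run(record_cmd(wav), check=True)
    text = transcribe(wav)
    to_clipboard(text)
    notify(text[:80] or "no speech detected")
```

Bind `run_once()` to F12 with your hotkey daemon of choice and the whole loop stays local.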
I found a Linux version with a similar workflow and forked it to build the Mac version. It took less than 15 minutes to have Claude modify it to my needs.
F12 Press → arecord (ALSA) → temp.wav → faster-whisper → xclip + xdotool
https://github.com/ksred/soupawhisper
Thanks to faster-whisper and quantized local models, I use it everywhere I was previously using Superwhisper: Docs, Terminal, etc.
Since many are asking about apps with similar capabilities: I'm very happy with MacWhisper. It has Parakeet and near-instant transcription of my lengthy monologues. All local.
Edit: Ah, but I think Parakeet isn't available for free. Still a very worthwhile single-purchase app nonetheless!
Speech-to-text on its own seems largely solved on pretty much every compute platform. However, I've found a huge gap between independently transcribed words and formatted text that's ready for an editor or further processing.
If you look at how authors dictate their works (which they have done for millennia), just getting the words written down is only the first step, and it's by far the easiest. I've been helping build a tool, https://bookscribe.ai, that not only does the transcription but can then post-process it to make it actually usable for longer-form content.
Do any of these solutions work reliably for non-English languages? I've had a lot of issues trying to transcribe Swedish with all the products I've used so far.
Sounds like there's plenty of interest in these kinds of tools. I'm not a huge fan of API transcription given how good local models are.
I built https://github.com/bwarzecha/Axii to keep EVERYTHING local and fully open source - it can easily be used at any company. No data sent anywhere.
I'm building in the same space, working on https://ottex.ai - it's a free STT app with local models and BYOK support (OpenRouter, Groq, Mistral, and more).
The top feature is per-app custom settings - you can pick different models and instructions for different apps and websites.
- I use the fast Parakeet model when working with Claude Code (VS Code app).
- I use a smarter one when drafting notes in Obsidian, with a prompt to clean up my rambling and format the result as proper Markdown - very convenient.
One more cool thing is that it lets me use LLMs with audio input modalities directly (not as text post-processing), e.g. it sends the audio to Gemini and prompts it to transcribe, format, etc., in one run. I find it a bit slow for working with CC, but it is the absolute best model in terms of accuracy, understanding, and formatting. It's the only model I trust to understand what I meant and produce the correct result, even when I mix multiple languages, tech terms, etc.
i've used macwhisper (paid), superwhisper (paid), and handy (free) but now prefer hex (free):
https://github.com/kitlangton/Hex
for me it strikes the balance of good, fast, and cheap for everyday transcription. macwhisper is overkill, superwhisper too clever, and handy too buggy. hex fits just right for me (so far)
I am a huge fan of OpenSuperWhisper (https://github.com/Starmel/OpenSuperWhisper). Works local and is more than enough for me.
https://github.com/rabfulton/Auriscribe
My take for X11 Linux systems. Small and low dependency except for the model download.
I just vibe coded my own NaturalReader replacement. The subscription was $110/year... and I just canceled it.
Chatterbox TTS (from Resemble AI) does the voice generation, WhisperX gives word-level timestamps so you can click any word to jump, and FastAPI ties it all together with SSE streaming so audio starts playing before the whole thing is done generating.
There's a ~5s buffer up front while the first chunk generates, but after that each chunk streams in faster than realtime. So playback rarely stalls.
It took about 4 hours today... wild.
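The chunked-streaming part of that setup can be sketched roughly like this; `synthesize_chunk` stands in for whatever TTS backend you use (Chatterbox in the comment above), and the chunk size and event shape are assumptions, not the actual implementation:

```python
import base64
import json

def split_into_chunks(text, max_words=40):
    # keep each TTS call short so the first audio arrives quickly
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def sse_events(text, synthesize_chunk):
    # yields SSE-formatted events; a browser client can start playing
    # chunk 0 while later chunks are still being generated
    for i, chunk in enumerate(split_into_chunks(text)):
        audio = synthesize_chunk(chunk)  # returns raw audio bytes
        payload = {"index": i, "audio": base64.b64encode(audio).decode()}
        yield f"data: {json.dumps(payload)}\n\n"
```

In FastAPI you would wrap the generator in a `StreamingResponse` with `media_type="text/event-stream"`, which is what lets playback begin before generation finishes.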
Looks good, although Mistral's Voxtral would be a good choice, wouldn't it?
The moat here is local inference. whisper.cpp + Metal gives you <500ms latency on an M1 with the small model. No API costs, no privacy concerns. Ship that and you've got something the paid tools can't match. The UI is already solid; the edge is in going fully offline.
This thread is a beautiful intro into our near future. Yet more and more custom coded software. Takes me back to the days of late 90s. Loving this!
For those using something like this daily: what key combinations do you use to record and cancel? I'm using my caps lock right now but was curious about others.
Combine this with a nice shortcut keyboard and then you're really flying - my favorites are XP Pen and DOIO 16
For macOS I found https://github.com/rselbach/jabber and have lately been using that, but on iOS I still need a replacement.
Does anyone know of any macOS transcription apps that do speech-to-text live? E.g., the text outputs as you're talking? Older tech like macOS dictation and Dragon does this, but it seems there's nothing available that uses the new, better models.
Does anyone know of an effective alternative for Android?
I created Voibe which takes a slightly different direction and uses gpt-4o-transcribe with a configurable custom prompt to achieve maximum accuracy (much better than Whisper). Requires your own OpenAI API key.
https://github.com/corlinp/voibe
I do see the name has since been taken by a paid service... shame.
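For the curious, the prompt-guided approach can be sketched with the OpenAI Python SDK like this; the glossary idea and prompt wording are illustrative, not Voibe's actual implementation:

```python
def build_prompt(glossary):
    # a custom prompt biases the model toward your names and jargon
    return "Transcribe exactly. Known terms: " + ", ".join(glossary)

def transcribe(path, glossary):
    from openai import OpenAI  # pip install openai
    client = OpenAI()          # reads OPENAI_API_KEY from the environment
    with open(path, "rb") as f:
        result = client.audio.transcriptions.create(
            model="gpt-4o-transcribe",
            file=f,
            prompt=build_prompt(glossary),
        )
    return result.text
```

Something like `transcribe("memo.wav", ["Kubernetes", "Voibe"])` would then keep product names spelled consistently.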
Could you make it use Parakeet? That's an offline model that runs very quickly even without a GPU, so you could get much lower latency than using an API.
Quick question: what's the state of vibe coding with Xcode? I remember there were some issues months ago trying to get a seamless integration working. Has it improved?
Do any of these work as an iOS keyboard to replace the awful voice transcription Apple is currently shipping?
Is there a tool that preserves the audio? I want both the transcript and the audio.
Nice! I vibe coded the same thing this weekend, but for OpenAI and less polished: https://github.com/sonu27/voicebardictate
I don't understand who this is for, honestly. Unless you don't have hands, why would you want to talk to your computer? Maybe I'm just autistic, but I'd always prefer typing text over speaking out loud and having that translated to text.
title lacks: for Mac
Is it possible to customise the key binding? Most of these services let you customise the binding and also support a toggle for push-to-talk mode.
Why do people feel the need to market something as a "free alternative to xyz" when it's a basic utility? I take it as an instant signal that the dev is a copycat, mostly interested in getting stars and eyeballs rather than making a genuinely useful, high-quality product.
Just use handy: https://github.com/cjpais/Handy
Seeing this thread, it sounds like a blog post comparing the offerings would be useful.
Murmure is multiplatform, uses Parakeet, and can connect to your local LLM (using Ollama). https://murmure.al1x-ai.com/
Vowen
Anything similar for iOS?
Saved you a click: Mac only and actually Grok; local inference too slow.
Won't be free when xAI starts charging.
Spokenly?
Utter uses your OpenAI key (~$1/month). https://utter.to/. Has an iPhone app.
I built something similar for Linux (yapyap - push-to-talk with whisper.cpp). The "local is too slow" argument doesn't hold up anymore if you have any GPU at all. whisper large-v3-turbo with CUDA on an RTX card transcribes a full paragraph in under a second. Even on CPU, Parakeet is near-instant for short utterances.

The "deep context" feature is clever, but screenshotting and sending to a cloud LLM feels like massive overkill for fixing name spelling. The accessibility-API approach someone mentioned upthread is the right call - grab the focused field's content, nearby labels, and the window title. That's a tiny text prompt a 3B local model handles in milliseconds. No screenshots, no cloud, no latency.

The real question with Groq-dependent tools: what happens when the free tier goes away? We've seen this movie before. Building on local models is slower today, but it doesn't have a rug-pull failure mode.
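The accessibility-API idea can be sketched as a plain prompt builder; the field names here (`window_title`, `focused_field`, `nearby_labels`) are placeholders for whatever your platform's accessibility API actually returns, not a real API:

```python
def build_context_prompt(transcript, window_title, focused_field, nearby_labels):
    # assemble a tiny text prompt from UI metadata - no screenshot needed
    context = "\n".join(
        [f"Window: {window_title}", f"Field: {focused_field}"]
        + [f"Label: {label}" for label in nearby_labels]
    )
    return (
        "Fix spelling of names and terms in the transcript using this context.\n"
        f"{context}\nTranscript: {transcript}\nCorrected:"
    )
```

A prompt this small is well within what a 3B local model can handle in milliseconds, which is the whole point of skipping the screenshot-to-cloud round trip.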