Hacker News

lxe today at 5:05 AM | 5 replies

I built something similar for Linux (yapyap — push-to-talk with whisper.cpp). The "local is too slow" argument doesn't hold up anymore if you have any GPU at all. whisper large-v3-turbo with CUDA on an RTX card transcribes a full paragraph in under a second. Even on CPU, parakeet is near-instant for short utterances.

The "deep context" feature is clever, but screenshotting and sending to a cloud LLM feels like massive overkill for fixing name spelling. The accessibility API approach someone mentioned upthread is the right call — grab the focused field's content, nearby labels, window title. That's a tiny text prompt a 3B local model handles in milliseconds. No screenshots, no cloud, no latency.

The real question with Groq-dependent tools: what happens when the free tier goes away? We've seen this movie before. Building on local models is slower today but doesn't have a rug-pull failure mode.
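A minimal sketch of that context idea, assuming the focused field's text, nearby labels, and window title have already been pulled from the platform accessibility API (AT-SPI on Linux, the AX API on macOS) and that a llama.cpp server with a small instruct model is listening on localhost:8080; the prompt wording and helper name are illustrative, not any particular tool's implementation:

    # Fix proper-noun spelling in a dictated transcript using only local
    # text context gathered from the accessibility tree (assumed inputs).
    import requests

    def fix_names(transcript: str, field_text: str, labels: list[str], window_title: str) -> str:
        prompt = (
            "Correct only the spelling of names and technical terms in the dictated text, "
            "using the surrounding context. Return the corrected text and nothing else.\n\n"
            f"Window title: {window_title}\n"
            f"Nearby labels: {', '.join(labels)}\n"
            f"Field contents: {field_text}\n\n"
            f"Dictated text: {transcript}\n"
            "Corrected text:"
        )
        # llama.cpp's built-in HTTP server exposes a /completion endpoint.
        resp = requests.post(
            "http://localhost:8080/completion",
            json={"prompt": prompt, "n_predict": 256, "temperature": 0.0},
            timeout=10,
        )
        return resp.json()["content"].strip()

The whole prompt is a few hundred tokens of plain text, which is why a 3B model on local hardware turns it around in well under a second.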


Replies

wolvoleo today at 9:28 AM

Yeah, local works really well. I tried this other tool: https://github.com/KoljaB/RealtimeVoiceChat which lets you live-chat with a (local) LLM. With local Whisper and a local LLM (8B Llama in my case) it works phenomenally and responds so quickly that it feels like it's interrupting me.

Too bad that tool no longer seems to be developed. Looking for something similar. But it's really nice to see what's possible with local models.
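For reference, the basic round-trip that tool does can be sketched in a few lines: local speech-to-text feeding a local LLM. This uses faster-whisper for transcription and llama.cpp's OpenAI-compatible endpoint for the reply; model names and the endpoint URL are assumptions, not the RealtimeVoiceChat code:

    import requests
    from faster_whisper import WhisperModel

    # Local STT; swap device="cpu" if there is no CUDA GPU.
    stt = WhisperModel("large-v3-turbo", device="cuda", compute_type="float16")

    def chat_turn(wav_path: str) -> str:
        segments, _info = stt.transcribe(wav_path)
        user_text = " ".join(seg.text for seg in segments).strip()
        # llama.cpp server running locally with an 8B instruct model.
        resp = requests.post(
            "http://localhost:8080/v1/chat/completions",
            json={"messages": [{"role": "user", "content": user_text}], "max_tokens": 200},
            timeout=30,
        )
        return resp.json()["choices"][0]["message"]["content"]

The real product work in a live-chat tool is the streaming, interruption handling, and TTS on top of this loop, which is where that project shone.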

h3lp today at 7:56 PM

FWIW whisper.cpp with the default model works at 6x realtime transcription speed on my four-core ~2.4GHz laptop, and doesn't really stress CPU or memory. This is for batch transcribing podcasts.

The downside is that I couldn't get it to segment by speaker. The consensus seemed to be to use a separate tool for that.
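A batch workflow like that can be driven with a short script around the whisper.cpp CLI; the binary name and flags below match current whisper.cpp builds (older builds call the binary main), and the paths are placeholders:

    import subprocess
    from pathlib import Path

    MODEL = "models/ggml-base.en.bin"   # the "default"-size whisper.cpp model

    for wav in sorted(Path("podcasts").glob("*.wav")):
        # -otxt writes the transcript next to the input as <name>.wav.txt;
        # -t 4 pins it to four threads, matching a four-core laptop.
        subprocess.run(
            ["./whisper-cli", "-m", MODEL, "-f", str(wav), "-otxt", "-t", "4"],
            check=True,
        )

Speaker segmentation isn't handled here; as the comment says, a separate diarization tool would be layered on top of the transcripts.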

BatteryMountain today at 8:06 PM

I also built one; mine is called whispy. I use it to pump commands to Claude. So far it's a bit hit & miss, still tweaking it.

Wowfunhappy today at 6:31 AM

> The "local is too slow" argument doesn't hold up anymore if you have any GPU at all.

By "any GPU" you mean a physical, dedicated GPU card, right?

That's not a small requirement, especially on Macs.

wazoox today at 10:02 AM

I've installed murmure on my 2013 Mac, and it churns through 1073 words/minute. I don't know about you, but that's plenty faster than I can speak :D