I’m running a local Whisper + Gemma 4 pipeline with a cheap USB mic to extract health related data and potential todos from ambient speech. It doesn’t have to be fast doesn’t have to be 100% correct because if it captures at least a few bits of interesting information that would otherwise go unnoticed it’s still a win.
I run whisper through openwebui to gemma4 moe and use kokoro TTS back to me.
I use a 5060ti 16gb and a minipc.
I tunnel in via Tailscale and access it with my phone or laptop from anywhere. It’s pretty good and will only get better as I optimize.