I run Whisper for speech-to-text through Open WebUI into gemma4 MoE, then Kokoro TTS speaks the reply back to me.
I use a 5060 Ti 16GB and a mini PC.
I tunnel in via Tailscale and access it with my phone or laptop from anywhere. It’s pretty good and will only get better as I optimize.
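
For anyone picturing the flow, here is a minimal Python sketch of one speech-in, speech-out round trip (Whisper, then the LLM, then Kokoro). It is not the actual config: `voice_turn` and the three `fake_*` stages are made-up names for illustration, and in a real setup each stage would presumably wrap a network call to the matching service.

```python
from typing import Callable


def voice_turn(
    audio_in: bytes,
    transcribe: Callable[[bytes], str],   # speech-to-text stage (e.g. Whisper)
    generate: Callable[[str], str],       # LLM stage (e.g. the model behind Open WebUI)
    synthesize: Callable[[str], bytes],   # text-to-speech stage (e.g. Kokoro)
) -> bytes:
    """Run one round trip: audio in, spoken reply out."""
    text = transcribe(audio_in).strip()
    if not text:
        # Nothing was heard, so skip the LLM and TTS stages entirely.
        return b""
    reply = generate(text)
    return synthesize(reply)


# Hypothetical stand-ins so the sketch runs without any servers.
def fake_whisper(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_kokoro(text: str) -> bytes:
    return text.encode("utf-8")


if __name__ == "__main__":
    print(voice_turn(b"hello", fake_whisper, fake_llm, fake_kokoro))
```

Passing the stages in as plain functions keeps the loop itself independent of where each service runs, so swapping a model or moving one stage to another box only changes the function you hand in.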