I have an open GitHub issue for macOS hardware detection. I don't have a Mac myself to do dev on but happy to accept a fork! I did assign a buddy to that issue but she's been slacking - call her out :p.
Latency is dependent on the guardrails firing, effectively. If nothing fires, it's a passthrough, for all intents and purposes, very little overhead. But if a retry nudge fires then that's another LLM call.
As a consumer for a home assistant, a retry nudge firing is something I'd catch, and have my voice model output a pre-baked "one sec, trying again" sort of filler message or something.