I'm working on something similar, and I've noticed that forwarding early user speech to the LLM to reduce latency makes interruptions even more likely. For example, the user might say "Yes", then pause briefly, and the speech model counts that as a complete turn and triggers the LLM call. But then the user adds something more, so I have to cancel the previous request to avoid any irreversible state transitions. And because the speculative call lowered latency, the window in which I can actually cancel the response, or stop the model from speaking, gets even smaller.