Why can't a speech-to-speech model do tool calls? Others, like Gemini Live, do it just fine.
If it is doing a tool call, it has to convert the speech to text (or at least to a JSON object with the parameters the tool needs) and then convert the result back to speech, doesn’t it? Is it truly speech-to-speech then?
OK, I was wrong. I just tested ChatGPT voice, Claude Voice and Gemini Live, and all three are able to do web search. For some reason, when I tested ChatGPT voice a few weeks ago, I remember it sometimes saying it couldn’t open links directly even though it could do web search, which was strange.