Why can't a speech-to-speech model do tool calls? Others, like Gemini Live, do it just fine.

satvikpendem • yesterday at 2:26 PM • 2 replies

Replies

OK, I was wrong. I just tested ChatGPT voice, Claude Voice, and Gemini Live, and all three are able to do web search. For some reason, I remembered that when I tested ChatGPT voice a few weeks ago, it sometimes said it can’t directly open links but can do web search, which was strange.

raw_anon_1111 • yesterday at 2:54 PM

If it is doing a tool call, it has to convert the speech to text, or at least to a JSON object of the necessary parameters for the tool, and then convert the result back to speech, doesn’t it? Is it truly speech-to-speech then?
