Hacker News

satvikpendem yesterday at 2:26 PM

Why can't a speech-to-speech model do tool calls? Others like Gemini Live do it just fine.


Replies

d4rkp4ttern yesterday at 2:47 PM

OK, I was wrong. I just tested ChatGPT Voice, Claude Voice and Gemini Live, and all three are able to do web search. I had the wrong impression because when I tested ChatGPT Voice a few weeks ago, it sometimes said it couldn't directly open links but could do web search, which was strange.

raw_anon_1111 yesterday at 2:54 PM

If it is doing a tool call, it has to convert the speech to text, or at least to a JSON object of the necessary parameters for the tool, and then convert the result back to speech, doesn't it? Is it truly speech-to-speech then?
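For reference, here is a minimal sketch of the text/JSON slice of that round trip, assuming a generic function-calling interface. `ToolCall`, `web_search`, and `TOOLS` are hypothetical names, not any vendor's actual API. The model emits a tool name plus JSON-encoded arguments, the client runs the tool, and the result goes back into the model's context as JSON text. The sketch covers only this step; how a given model handles the surrounding audio is outside its scope.

```python
import json
from dataclasses import dataclass


# Hypothetical tool-call message; not any vendor's real API.
@dataclass
class ToolCall:
    name: str
    arguments: str  # JSON string of parameters emitted by the model


def web_search(query: str) -> dict:
    # Hypothetical stand-in for a real search tool.
    return {"query": query, "top_result": "example.com/result"}


# Registry mapping tool names the model may emit to Python callables.
TOOLS = {"web_search": web_search}


def run_tool_call(call: ToolCall) -> str:
    """Parse the model's JSON arguments, run the tool, and return the
    result serialized as JSON text to feed back into the model."""
    args = json.loads(call.arguments)
    result = TOOLS[call.name](**args)
    return json.dumps(result)


# Instead of replying with speech, the model emitted a structured call.
call = ToolCall(name="web_search", arguments='{"query": "weather in Paris"}')
tool_output = run_tool_call(call)
print(tool_output)
```

Whatever the input and output modality, the tool interface itself is structured text, which is the part the question above is getting at.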
