I think this is very pessimistic. Yes, big models are "smarter" and have more inherent knowledge but I'd bet you a coffee that what 99% of people want to do with Siri isn't "Write me an essay on the history of textiles" or "Vibe code me a SPA", rather it's "Send Mom the pictures I took of the kids yesterday" and "Hey, play that Deadmau5 album that came out a couple years back" which is more about tool calls than having wikipedia-level knowledge built in to the model.
> Hey, play that Deadmau5 album that came out a couple years back
It could work for Deadmau5 because it is probably popular enough to be part of the model. How about "Hey, play that $regional_artist's cover of Deadmau5" and the model needs to know about "regional_artist", the concept of "cover", where those remixes might be (youtube? soundcloud? some other place).
All of a sudden, it all breaks down. So it'll work for "turn off porch lights", but not for "turn off the lights that's in the front of the house"