I think this is missing the point. These are tools that enable the LLM to do things that humans can do easily.
They stop an LLM from being blocked by its inability to do those things. Removing that barrier might let the LLM complete a task that would be considerable work for a human.
For instance, take identifying which files are PNGs containing pictures of birds, regardless of filename or whether a suffix is present. An image-handling LLM can tell whether an image shows a bird much more easily than it can determine that an arbitrary file is a PNG. It could probably still do it, wasting a lot of tokens along the way, but using a few commands to decide which files are even worth looking at as images means the LLM can do what it is good at.
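To make the "few commands" step concrete, here is a rough sketch in Python of the cheap pre-filter. The function names `is_png` and `find_png_files` are made up for illustration. The only fact it relies on is that every PNG file starts with the same fixed 8-byte signature, so it ignores filenames and suffixes entirely. It only checks that signature, not whether the rest of the file is a valid image.

```python
import os

# Every PNG file begins with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def is_png(path):
    """Return True if the file at `path` starts with the PNG signature."""
    try:
        with open(path, "rb") as f:
            return f.read(len(PNG_SIGNATURE)) == PNG_SIGNATURE
    except OSError:
        # Unreadable files are simply treated as "not a PNG".
        return False


def find_png_files(root):
    """Walk `root` and return a sorted list of paths whose content looks like PNG."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if is_png(path):
                matches.append(path)
    return sorted(matches)
```

Only the paths this returns would need to be handed to the model as images for the "is it a bird?" question. On a typical Unix system the `file` command does a similar content-based check from the shell.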