Oh right, you're focused specifically on the strawberry problem. I just gave that as a throwaway example. It's a solution, but not necessarily the solution for something that simple.
My point was much more general: code execution is a key part of these models' ability to do maths and analysis and to give precise answers. It's not the only way, but it's a key one, and very efficient compared to spending more inference on CoT.
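To make that concrete, here is a rough sketch of the kind of snippet a model with a code execution tool might write instead of reasoning the answer out token by token. The function name `count_letter` and the two sample numbers are just illustrative, not taken from any particular model or tool.

```python
# Hypothetical helper: what a model might emit when asked
# "how many r's are in strawberry?"
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())

print(count_letter("strawberry", "r"))

# Exact arithmetic on 20-digit numbers: Python integers are
# arbitrary precision, so the product is exact, not approximate.
a = 12345678901234567890  # illustrative value
b = 98765432109876543210  # illustrative value
product = a * b
print(product)
```

The answer comes from the interpreter rather than from the model's own token-level arithmetic, which is where the precision comes from.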
I agree that tool usage dramatically improves the utility of LLMs. But it is absolutely not needed for the strawberry problem.
It can perform complicated arithmetic without tools: multiplying multiple 20-digit numbers, division, and so on (to an extent).