The problem with tokens is that they have wrong incentive. The quicker model arrives at the solution the less tokens you have to buy.
So I noticed the model is purposefully coming with dumb ideas or running around in circles and only when you tell it that they are trying to defraud you, they suddenly come back with a right solution.