My general understanding of the concenus on most models these days is that people consider google models to be some of the worst at tool calling, so certainly an interesting choice. Did you do any evals on this?