Gemini just doesn’t do even mildly well in agentic stuff and I don’t know why.
OpenAI has mostly caught up with Claude in agentic stuff, but Google needs to get there, and get there quickly.
I suspect a large part of Google's lag is due to being overly focused on integrating Gemini with their existing product and app lines.
The agentic benchmarks for 3.1 indicate Gemini has caught up. The gains from 3.0 to 3.1 are big.
For example, here are the APEX-Agents benchmark results for long-horizon investment banking, consulting, and legal work:
1. Gemini 3.1 Pro - 33.2%
2. Opus 4.6 - 29.8%
3. GPT 5.2 Codex - 27.6%
4. Gemini Flash 3.0 - 24.0%
5. GPT 5.2 - 23.0%
6. Gemini 3.0 Pro - 18.0%
My guess is that the Gemini team didn't focus on large-scale RL training for agentic workloads, and they are trying to catch up with 3.1.
It's like anything Google - they do the cool part and then lose interest with the last 10%. Writing code is easy, building products that print money is hard.
Can you explain what you mean by it's bad at agentic stuff?
Because Search is not agentic.
Most of Gemini's users are Search converts doing extended-Search-like behaviors.
Agentic workflows are a VERY small percentage of all LLM usage at the moment. As that market becomes more important, Google will pour more resources into it.