I can’t really take this too seriously. It seems like a case of asking “can an LLM do X?” The question I’d rather see asked is: “I want to do X — is an LLM the right tool?”
That said, I think the author missed something. LLMs aren’t great at this kind of reasoning/state-tracking task, but they are good at writing programs. Instead of asking the LLM to search with a drone directly, it would be very interesting to see how they’d perform if you asked them to write a program that searches with a drone.
That’s more aligned with the strengths of LLMs, so I could see it having more success.
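To make the idea concrete, here's a hedged sketch of the kind of small, testable program one might ask an LLM to produce instead of having it reason about the search step by step. It generates a boustrophedon ("lawnmower") sweep, a standard coverage-search pattern; the function name and grid model are purely illustrative, not anything from the article.

```python
def lawnmower_path(width, height):
    """Return (x, y) waypoints that visit every cell of a width x height
    grid exactly once, sweeping left-to-right then right-to-left."""
    path = []
    for y in range(height):
        # Alternate sweep direction on each row so the drone never
        # backtracks across the grid.
        xs = range(width) if y % 2 == 0 else range(width - 1, -1, -1)
        for x in xs:
            path.append((x, y))
    return path

if __name__ == "__main__":
    print(lawnmower_path(3, 2))
```

The point is that a program like this is deterministic and verifiable — exactly the properties the step-by-step LLM search lacks.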