This is essentially tool use with a filesystem interface — the LLM decides what to read instead of a retrieval pipeline choosing for it. Clean idea, and it sidesteps the chunking problem entirely.
Curious about the latency though. RAG is one round trip: embed query, fetch chunks, generate. This approach seems like it needs multiple LLM calls to navigate the tree before it can answer. How many hops does it typically take, and did you have to do anything special to keep response times reasonable?
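To make the round-trip concern concrete, here's a runnable sketch of that navigation loop. The model call is a scripted stub (a real deployment would hit an LLM API), and the tool names and file paths are made up; the point is just that every hop is a full model round trip, versus RAG's single generate call:

```python
import json
import os

def call_model(transcript):
    # Stub standing in for a real LLM call: list the directory,
    # open one file, then answer. Paths here are hypothetical.
    turn = sum(1 for m in transcript if m["role"] == "assistant")
    if turn == 0:
        return {"tool": "list_dir", "path": "."}
    if turn == 1:
        return {"tool": "read_file", "path": "docs/setup.md"}
    return {"answer": "Install steps are in docs/setup.md."}

def navigate(question, root, max_hops=5):
    """Each loop iteration is one model round trip -- this is where
    the latency overhead relative to single-shot RAG comes from."""
    transcript = [{"role": "user", "content": question}]
    hops = 0
    while hops < max_hops:
        action = call_model(transcript)
        transcript.append({"role": "assistant", "content": json.dumps(action)})
        if "answer" in action:
            return action["answer"], hops
        if action["tool"] == "list_dir":
            result = os.listdir(os.path.join(root, action["path"]))
        else:  # read_file
            with open(os.path.join(root, action["path"])) as f:
                result = f.read()
        transcript.append({"role": "tool", "content": str(result)})
        hops += 1
    return None, hops
```

Even with this toy two-hop policy, answering takes three model calls (two tool hops plus the final answer), so hop count directly multiplies time-to-first-token.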
In their case, the baseline it was competing with was cloning the entire repo before starting a session, which took tens of seconds.