How are you measuring the accuracy? Are you running this against any benchmarks?
I see this covers a file-based approach. Was a graph-based approach ever considered?
For business context, how do you handle context that evolves over time?
Great questions! We're currently working on a Spider 2 submission and hope to have first results soon. It's true that we took a file-first approach rather than a graph DB. The main reason is that the ktx wiki and semantic layer entities, while written in plain text files (md or yaml), still contain links to each other. This allows an agent to find the right entry point (with the help of lexical and semantic searches merged with RRF) and then traverse these links to collect enough context.
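To make the "merge with RRF, then follow links" idea concrete, here is a minimal sketch. It is not the ktx implementation: the file names, the `links` map, the `max_docs` cap, and the constant `k=60` (a commonly used default for reciprocal rank fusion) are all assumptions made for illustration.

```python
from collections import deque


def rrf_merge(rankings, k=60):
    """Reciprocal rank fusion: score(doc) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by name for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def collect_context(entry, links, max_docs=5):
    """Breadth-first walk over inter-file links, starting at the entry point."""
    seen = [entry]
    queue = deque([entry])
    while queue and len(seen) < max_docs:
        current = queue.popleft()
        for target in links.get(current, []):
            if target not in seen:
                seen.append(target)
                queue.append(target)
                if len(seen) >= max_docs:
                    break
    return seen


# Hypothetical search results (best match first) and link map.
lexical = ["orders.yaml", "revenue.md", "customers.yaml"]
semantic = ["revenue.md", "churn.md", "orders.yaml"]
links = {
    "revenue.md": ["orders.yaml", "pricing.md"],
    "orders.yaml": ["customers.yaml"],
}

merged = rrf_merge([lexical, semantic])
context = collect_context(merged[0], links, max_docs=4)
```

Here `merged[0]` plays the role of the entry point, and `context` is the set of files an agent would read after following links outward from it.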
As for business context evolution, that's exactly why we have ingestion reconciliation and git versioning. The idea is to give the ingestion agent a way to deduplicate and consolidate knowledge during ingestion, and to leave complex conflicts for humans to resolve.
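A rough sketch of what that split could look like, purely as an illustration: the `reconcile` function, the dictionary shape, and the whitespace/case-insensitive duplicate check are hypothetical stand-ins, not how the ingestion agent actually decides.

```python
def reconcile(existing, incoming):
    """Merge incoming entries; return (merged, conflicts left for human review)."""
    merged = dict(existing)
    conflicts = []
    for key, value in incoming.items():
        if key not in merged:
            merged[key] = value  # new knowledge: add it
        elif merged[key].strip().lower() == value.strip().lower():
            continue  # same statement restated: treat as a duplicate
        else:
            # Diverging definitions: keep the existing one and flag for a human.
            conflicts.append((key, merged[key], value))
    return merged, conflicts


# Hypothetical wiki entries keyed by term.
existing = {"active_user": "Logged in within 30 days"}
incoming = {
    "active_user": "Logged in within 7 days",
    "churned_user": "No login for 90 days",
}

merged, conflicts = reconcile(existing, incoming)
```

With git versioning on top, each accepted merge would be an ordinary commit, so a human resolving a flagged conflict can see both versions and the history behind them.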