This is really neat. I’m working on something similar but for data artifacts not just code. It’s very encouraging to see that this kind of tooling helps both humans and models, that was what made me starting to work on that.
There is still no good "data diff" tool that I can run on, say, a big pile of CSV or Parquet. Something with DVC integration would be especially welcome.
Thanks! The data artifacts angle is really interesting. in some ways the problem is even harder there because data pipelines have less explicit structure than code, I guess.