Thanks! The data artifacts angle is really interesting. in some ways the problem is even harder there because data pipelines have less explicit structure than code, I guess.
The artifacts themselves have more structure, but diffing is hard because of size: what exactly do you show in the different? Row-level? Summary statistics? How do you keep it from getting slow on bigger datasets?
Then there are plots saved as images which have basically no structure at all exposed.
The artifacts themselves have more structure, but diffing is hard because of size: what exactly do you show in the different? Row-level? Summary statistics? How do you keep it from getting slow on bigger datasets?
Then there are plots saved as images which have basically no structure at all exposed.