Hacker News

rohanucla today at 3:58 AM

Thanks! The data artifacts angle is really interesting. In some ways the problem is even harder there, I guess, because data pipelines have less explicit structure than code.


Replies

gwerbin today at 4:08 AM

The artifacts themselves have more structure, but diffing is hard because of size: what exactly do you show in the diff? Row-level changes? Summary statistics? How do you keep it from getting slow on bigger datasets?

Then there are plots saved as images, which expose basically no structure at all.
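To make the row-level vs. summary-statistics question concrete, here is a rough sketch of both granularities on a toy table. Everything in it is hypothetical (the function names `row_level_diff` and `summary_diff`, the `id` and `price` columns, the sample rows), and it assumes rows are dicts with a unique key; it isn't taken from any particular tool.

```python
from statistics import mean


def row_level_diff(old, new, key):
    """Return (added, removed, changed) key lists for two lists of dict rows.

    Assumes `key` uniquely identifies a row (a hypothetical primary key).
    """
    old_by_key = {row[key]: row for row in old}
    new_by_key = {row[key]: row for row in new}
    added = sorted(new_by_key.keys() - old_by_key.keys())
    removed = sorted(old_by_key.keys() - new_by_key.keys())
    changed = sorted(
        k for k in old_by_key.keys() & new_by_key.keys()
        if old_by_key[k] != new_by_key[k]
    )
    return added, removed, changed


def summary_diff(old, new, column):
    """Compare row count and mean of one numeric column across two versions."""
    def stats(rows):
        values = [row[column] for row in rows]
        return {"count": len(values), "mean": mean(values) if values else None}

    return {"old": stats(old), "new": stats(new)}


# Made-up sample data, purely for illustration.
old = [{"id": 1, "price": 10.0}, {"id": 2, "price": 20.0}, {"id": 3, "price": 30.0}]
new = [{"id": 1, "price": 10.0}, {"id": 2, "price": 25.0}, {"id": 4, "price": 40.0}]

added, removed, changed = row_level_diff(old, new, "id")
summary = summary_diff(old, new, "price")
```

The row-level version indexes both tables in memory, which is exactly where the "slow on bigger datasets" worry comes from; the summary version is cheap but hides which rows moved.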
