Hacker News

a_t48 yesterday at 10:34 PM

charles, amit, can you go into more detail about the path-based caching? Particularly "shared bytes aren’t guaranteed to be in the exact same container image layer"? I've built something that solves issues around sharing data between layers, and am interested to see if it fits use cases like Modal's.

Edit: "The solution is to disaggregate the container launcher (runc for Docker, runsc for gVisor) from the container image delivery" is exactly what I've done! I've not built a lazy FUSE on top of it (yet! except for cache mounts in BuildKit), but it's on my TODO list. I guess I'm mainly curious what stops bytes from being shared in your case.


Replies

charles_irl yesterday at 10:55 PM

To clarify: we do content-based hashing, and when we say "shared bytes aren’t guaranteed to be in the exact same container image layer", what we mean is that

  FROM some/image
  RUN pip install torch==2.7.1

and

  FROM another/image
  RUN pip install torch==2.7.1

will produce images with very high overlap in contents, which will be shared by a content-based cache, but those images' final layers are disjoint from the perspective of a layerwise cache.
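
To make the distinction concrete, here is a minimal Python sketch. It is a toy model, not a description of how Modal or any container runtime implements caching. The file paths, file contents, and the helper names `file_digest` and `layer_digest` are all hypothetical. A real layer digest is computed over the layer's tar archive; here that is approximated by hashing the sorted paths and contents together.

```python
import hashlib


def file_digest(data: bytes) -> str:
    """Content-based key: depends only on the bytes of one file."""
    return hashlib.sha256(data).hexdigest()


def layer_digest(layer: dict[str, bytes]) -> str:
    """Layerwise key: one hash over every path and file in the layer.

    Hypothetical stand-in for hashing a layer's tar archive.
    """
    h = hashlib.sha256()
    for path in sorted(layer):
        h.update(path.encode())
        h.update(b"\0")
        h.update(layer[path])
        h.update(b"\0")
    return h.hexdigest()


# Hypothetical files that `pip install torch==2.7.1` writes in both images.
torch_files = {
    "site-packages/torch/__init__.py": b"torch 2.7.1 init",
    "site-packages/torch/lib/libtorch.so": b"torch 2.7.1 shared library bytes",
}

# Final layer of each image: the same torch files plus one file that
# differs because the base images differ (made up for illustration).
layer_a = {**torch_files, "var/log/install.log": b"built on some/image"}
layer_b = {**torch_files, "var/log/install.log": b"built on another/image"}

# A layerwise cache compares whole-layer digests, so one differing file
# is enough to make the two layers look unrelated.
layerwise_hit = layer_digest(layer_a) == layer_digest(layer_b)

# A content-based cache compares per-file digests, so the torch files
# are recognized as shared.
blobs_a = {file_digest(data) for data in layer_a.values()}
blobs_b = {file_digest(data) for data in layer_b.values()}
shared = blobs_a & blobs_b

print("layerwise cache hit:", layerwise_hit)
print("files shared by content hash:", len(shared), "of", len(blobs_a))
```

In this toy model the layerwise comparison misses while the per-file comparison still finds the two torch files, which is the overlap the reply describes.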