logoalt Hacker News

o7ri6246iu45last Thursday at 7:43 PM2 repliesview on HN

there generally is no central shared immutable image store because every job is using its own collection of images.

what you're describing might work well for a small team, but when you have a few hundred to thousand researchers sharing the cluster, very few of those layers are actually shared between jobs

even with a handful of users, most of these container images get fat at the python package installation layer, and that layer is one of the most frequently changed layers, and is frequently only used for a single job


Replies

0xbadcafebeelast Thursday at 10:23 PM

Just to review, here are the options:

1. Create an 8gb file on network storage which is loopback-mounted. Accessing the file requires a block store pull over the network for every file access. According to your claim now, these giant blobs are rarely shared between jobs?

2. Create a Docker image in a remote registry. Layers are downloaded as necessary. According to your claim now, most of the containers will have a single layer which is both huge and changed every time python packages are changed, which you're saying is usually done for each job?

Both of these seem bad.

For the giant loopback file, why are there so many of these giant files which (it would seem) are almost identical except for the python differences? Why are they constantly changing? Why are they all so different? Why does every job have a different image?

For the container images, why are they having bloated image layers when python packages change? Python files are not huge. The layers should be between 5-100MB once new packages are installed. If the network is as fast as you say, transferring this once (even at job start) should take what, 2 seconds, if that? Do it before the job starts and it's instantaneous.

The whole thing sounds inefficient. If we can make kubernetes clusters run 10,000 microservices across 5,000 nodes and make it fast enough for the biggest sites in the world, we can make an HPC cluster (which has higher performance hardware) work too. The people setting this up need to optimize.

show 1 reply
robertlagrantlast Friday at 9:54 AM

> container images get fat at the python package installation layer, and that layer is one of the most frequently changed layers

This might be mitigated by having a standard set of packages, which you install in a lower layer, and then changing ones, at a higher layer.

show 1 reply