Lazy-pulling containers: 65x faster pulls, but 20x slower readiness (zmalik.dev)
4 points by zmalik 34 days ago | 2 comments


My take from all this is that OCI, whose layers are essentially .tar.gz files, is a bad image format. A smarter solution would either use a seekable archive format like zip, or profile container startup to build a list of essential files and put those at the beginning of the image, where they would be fetched unconditionally. Also, a fully lazy fetch is bad for reliability: the image should be fetched slowly in its entirety in the background, after which restarts depend only on local data.
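
A minimal sketch of the "profile, then put essential files first" idea, assuming you already have a list of paths the container touched during a profiling run (how you collect it, e.g. by tracing file opens, is out of scope here). prioritized_order and build_layer are hypothetical helpers, not part of any OCI tool; as far as I know this is close in spirit to eStargz's prioritized-files optimization, but this is not its actual layout.

```python
import io
import tarfile
from typing import Dict, Iterable, List


def prioritized_order(files: Iterable[str], accessed_at_startup: List[str]) -> List[str]:
    """Startup-critical files first (in first-access order), then the rest sorted."""
    files = list(files)
    present = set(files)
    seen = set()
    head = []
    for path in accessed_at_startup:
        # Skip paths the profile saw that aren't in this layer, and duplicates.
        if path in present and path not in seen:
            head.append(path)
            seen.add(path)
    tail = sorted(p for p in files if p not in seen)
    return head + tail


def build_layer(contents: Dict[str, bytes], accessed_at_startup: List[str]) -> bytes:
    """Write a .tar.gz layer whose first members are the profiled startup files."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for path in prioritized_order(contents, accessed_at_startup):
            data = contents[path]
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

Ordering alone does not make the archive seekable; it only means a client streaming the layer from byte 0 receives the startup files first and could start the app before the tail arrives, while the rest keeps downloading in the background.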


We’ve all seen the benchmarks: "Lazy-pulling reduces container startup from 5 minutes to 500ms!" It looks great on a chart, but it hides a dangerous trade-off.

I built a benchmark to measure readiness: the time until a container can actually serve an HTTP request, rather than just the pull time. The results were surprising. Lazy-pulling (eStargz/FUSE) made pulls 65x faster, but it made the application's first successful response 20x slower than a full pull from a local registry.
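
A minimal sketch of what measuring readiness (rather than pull time) can look like, assuming the app exposes an HTTP endpoint that returns 200 once it can serve traffic. time_to_ready and its polling parameters are hypothetical, not taken from the author's benchmark. Call it right after starting the container so the number covers pull, start, and the first real response.

```python
import http.client
import time
import urllib.request


def time_to_ready(url: str, timeout_s: float = 300.0, interval_s: float = 0.05) -> float:
    """Poll `url` until it answers HTTP 200; return seconds elapsed since the call."""
    start = time.monotonic()
    deadline = start + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return time.monotonic() - start
        except (OSError, http.client.HTTPException):
            # Connection refused, socket timeout, or an HTTP error status
            # (urllib's HTTPError is an OSError subclass): not ready yet.
            pass
        time.sleep(interval_s)
    raise TimeoutError(f"{url} not ready after {timeout_s}s")
```

The point of polling for a real response instead of waiting for "container started" is that a lazily pulled image can report started long before its imports have finished faulting in from the registry.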

Why? Because lazy-pulling doesn't remove the cost of downloading bytes; it just shifts it to runtime. Your registry becomes a runtime dependency, and every uncached file that import torch reads becomes a network round-trip. In my latest post, I dive deep into:

- The OCI file format limits (DEFLATE chains: a gzip stream can't be decompressed starting from an arbitrary offset) that make this hard.

- Why containerd’s snapshotter is the bottleneck.

- The operational risks of FUSE on your GPU nodes.
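
A toy model of "every uncached read becomes a network round-trip": reads go through a read-through cache, and the first access to each file blocks on a fetch from the registry. LazyLayer and fetch_from_registry are hypothetical names; real snapshotters fetch in chunks via range requests, prefetch, and parallelize, all of which this ignores. If the misses happen serially (as during a single-threaded import torch), the added latency is at least round_trips × RTT.

```python
from typing import Callable, Dict


class LazyLayer:
    """Toy lazily pulled layer: file bytes come from the registry on first read only."""

    def __init__(self, fetch_from_registry: Callable[[str], bytes]):
        # Hypothetical remote fetch, e.g. an HTTP range request in a real system.
        self._fetch = fetch_from_registry
        self._cache: Dict[str, bytes] = {}
        self.round_trips = 0
        self.bytes_fetched = 0

    def read(self, path: str) -> bytes:
        if path not in self._cache:
            # Cache miss: the application blocks on the network right here.
            data = self._fetch(path)
            self._cache[path] = data
            self.round_trips += 1
            self.bytes_fetched += len(data)
        return self._cache[path]

    def stall_lower_bound(self, rtt_s: float) -> float:
        """Minimum added latency if every miss cost one serialized round trip."""
        return self.round_trips * rtt_s
```

With a full pull, the same bytes cross the network before the process starts, so the application's reads hit local disk and the registry is no longer on the request path.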
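
To make the DEFLATE point concrete: a .tar.gz layer is one DEFLATE stream whose back-references can reach up to 32 KiB into earlier output, so without extra index data you generally have to decompress from the start to reach byte N. Below is a minimal sketch of the usual workaround, assuming a fixed chunk size: each chunk becomes its own gzip member, the concatenation is still a valid gzip file, and an index lets a client fetch and inflate one member by itself. Roughly, stargz/eStargz apply this idea per file or chunk and ship a table of contents; compress_seekable and read_chunk are hypothetical names and do not reproduce the eStargz layout.

```python
import bisect
import gzip
from typing import List, Tuple

# Each entry: (uncompressed_start, compressed_start, compressed_end)
Index = List[Tuple[int, int, int]]


def compress_seekable(data: bytes, chunk_size: int) -> Tuple[bytes, Index]:
    """Compress each chunk as an independent gzip member and record where it lives."""
    blob = bytearray()
    index: Index = []
    for off in range(0, len(data), chunk_size):
        member = gzip.compress(data[off:off + chunk_size])
        index.append((off, len(blob), len(blob) + len(member)))
        blob += member
    return bytes(blob), index


def read_chunk(blob: bytes, index: Index, offset: int) -> Tuple[int, bytes]:
    """Return (chunk_start, chunk_bytes) for the chunk holding `offset`.

    Only that member is decompressed; in a lazy pull, blob[c0:c1] would be
    a single range request instead of the whole layer.
    """
    starts = [u for u, _, _ in index]
    i = bisect.bisect_right(starts, offset) - 1
    if offset < 0 or i < 0:
        raise IndexError(offset)
    u, c0, c1 = index[i]
    chunk = gzip.decompress(blob[c0:c1])
    if offset >= u + len(chunk):
        raise IndexError(offset)
    return u, chunk
```

The trade-off is compression ratio: each member starts with an empty window, so smaller chunks mean cheaper random access but a larger blob and a bigger index.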




