
CRFS: Container Registry Filesystem - helper
https://github.com/golang/build/blob/master/crfs/README.md
======
koolba
> Fortunately, we can fix the fact that tar.gz files are unindexed and
> unseekable, while still making the file a valid tar.gz file by taking
> advantage of the fact that two gzip streams can be concatenated and still be
> a valid gzip stream. So you can just make a tar file where each tar entry is
> its own gzip stream.

I'm surprised nobody came up with this idea till now. It's brilliantly simple.
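
For anyone who wants to see it concretely, here's a rough sketch in Go of my own (not the actual stargz writer): each tar entry becomes its own gzip member, plus one final member holding the end-of-archive blocks, and the concatenation still gunzips to a plain tar.

    package main

    import (
        "archive/tar"
        "bytes"
        "compress/gzip"
        "io"
        "log"
        "os"
    )

    // gzipEntry writes one tar entry (header + padded content) as its own gzip member.
    func gzipEntry(out io.Writer, name string, data []byte) error {
        var buf bytes.Buffer
        tw := tar.NewWriter(&buf)
        if err := tw.WriteHeader(&tar.Header{Name: name, Mode: 0644, Size: int64(len(data))}); err != nil {
            return err
        }
        if _, err := tw.Write(data); err != nil {
            return err
        }
        // Flush pads to the 512-byte block boundary without writing the
        // end-of-archive marker (which Close would add).
        if err := tw.Flush(); err != nil {
            return err
        }
        gz := gzip.NewWriter(out)
        if _, err := gz.Write(buf.Bytes()); err != nil {
            return err
        }
        return gz.Close()
    }

    func main() {
        out, err := os.Create("layer.tar.gz")
        if err != nil {
            log.Fatal(err)
        }
        defer out.Close()
        gzipEntry(out, "hello.txt", []byte("hello\n"))
        gzipEntry(out, "world.txt", []byte("world\n"))
        // Final gzip member: two 512-byte zero blocks terminate the tar archive.
        gz := gzip.NewWriter(out)
        gz.Write(make([]byte, 1024))
        gz.Close()
    }

The result still extracts with a plain `tar -xzf`, but a reader that knows each member's offset can decompress any single file independently.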

~~~
xyzzy_plugh
Well, for one, it's not obviously useful in traditional applications:

> This makes images a few percent larger (due to more gzip headers and loss of
> compression context between files), but it's plenty acceptable.

Compressing an entire image is generally great. Compressing each of the
individual files in an image is generally not great.

~~~
bradfitz
Maybe not great, but like I said: acceptable.

About 7.6% bigger:
[https://github.com/golang/build/commit/8a5a4d227f08eb1d889fa...](https://github.com/golang/build/commit/8a5a4d227f08eb1d889fa0c0f67dc71020c222f1)

~~~
koolba
That's what I meant by "brilliantly simple"; namely having something like this
work with existing systems while imposing a marginal overhead.

------
bradfitz
Author here.

I just moved this to
[https://github.com/google/crfs](https://github.com/google/crfs) if people
want to track that repo instead of Go's build system (which is relatively
boring for most people).

------
toomuchtodo
This is very cool! I've been waiting to see someone make tar.gz files seekable
so they could be object bundles stored in remote blob storage systems, which a
client could mount and seek through on demand by byte range (so you could
treat data in a similar fashion to containers, or like a Mac DMG file with an
open standard for remote mounting).
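
To make the byte-range part concrete, here's a minimal client-side sketch in Go (the URL and offsets are made up, and this isn't CRFS code): once an index tells you where a file's bytes live inside the remote blob, reading it is just an HTTP Range request.

    package main

    import (
        "fmt"
        "io"
        "net/http"
    )

    // readRange fetches bytes [off, off+n) of a remote blob via an HTTP Range request.
    func readRange(url string, off, n int64) ([]byte, error) {
        req, err := http.NewRequest("GET", url, nil)
        if err != nil {
            return nil, err
        }
        req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", off, off+n-1))
        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            return nil, err
        }
        defer resp.Body.Close()
        if resp.StatusCode != http.StatusPartialContent {
            return nil, fmt.Errorf("server ignored the range request: %s", resp.Status)
        }
        return io.ReadAll(resp.Body)
    }

    func main() {
        // Hypothetical: the index says some file's data lives at offset 4096, length 1234.
        data, err := readRange("https://example.com/bundle.tar.gz", 4096, 1234)
        if err != nil {
            panic(err)
        }
        fmt.Println(len(data), "bytes fetched without downloading the whole archive")
    }

S3 and GCS both honor Range headers the same way, which is what makes mounting a remote bundle practical.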

------
maxmcd
Conceivably this could be leveraged to allow docker for mac to only push
deltas to the build virtual machine when running docker build, correct?

Currently, docker build compresses everything in the working directory on every
build. This is fine for building images for deploy/upload but is annoying for
a local dev situation where you're frequently rebuilding.

Seems like it wouldn't be too hard to write an alternate docker build that
checks a previously built "Stargz" and just sends the additional files? (There
would be some complexity here reassembling a valid tar within hyperkit).
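
As a purely hypothetical sketch of the host-side delta check (prevDigests and the rest are invented names here, not a docker or stargz API): hash the build context and only send the files whose content changed since the last build.

    package main

    import (
        "crypto/sha256"
        "encoding/hex"
        "fmt"
        "io"
        "io/fs"
        "os"
        "path/filepath"
    )

    // changedFiles walks the build context and returns paths whose sha256 differs
    // from (or is missing in) the previous build's digest map.
    func changedFiles(contextDir string, prevDigests map[string]string) ([]string, error) {
        var delta []string
        err := filepath.WalkDir(contextDir, func(path string, d fs.DirEntry, walkErr error) error {
            if walkErr != nil || d.IsDir() {
                return walkErr
            }
            f, err := os.Open(path)
            if err != nil {
                return err
            }
            defer f.Close()
            h := sha256.New()
            if _, err := io.Copy(h, f); err != nil {
                return err
            }
            if prevDigests[path] != hex.EncodeToString(h.Sum(nil)) {
                delta = append(delta, path)
            }
            return nil
        })
        return delta, err
    }

    func main() {
        // With an empty "previous build" map, every file shows up as needing to be sent.
        delta, err := changedFiles(".", map[string]string{})
        if err != nil {
            panic(err)
        }
        fmt.Println("files to send:", len(delta))
    }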

I might be missing something here, or misplacing the bottleneck during builds,
but every time I'm annoyed by this problem it seems part of the issue is the
single fat tar that needs to be created every time.

edit: this strategy could also work with docker-machine building on remote
machines

------
catern
In the introduction:

>Currently, however, starting a container in many environments requires doing
a pull operation from a container registry to read the entire container image
from the registry and write the entire container image to the local machine's
disk. It's pretty silly (and wasteful) that a read operation becomes a write
operation.

What's silly is to claim that this is the problem. Any read is going to be a
write operation, at multiple levels, thanks to systems of transparent caching:
to a nearby CDN, to local disk, to local memory, to your CPU cache, etc. These
are optimizations; they aren't making your container startup any slower.

The real problem, which this tool indeed helps to solve, is that reading the
entire image must complete before you're able to start further processes which
read specific parts of the image. Not anything to do with "reads causing
writes".

~~~
bradfitz
Hi, author here.

The unnecessary writes I care about are to my cloud VM's small block device,
which is I/O limited. The best way to not wait for those is to not do the
writes in the first place.

~~~
catern
If you don't want those writes to a block device to happen, then you could
store your images in a tmpfs instead.

~~~
ithkuil
But then you have to provision space in memory for the whole image size, or
else you'll need to spill the excess to disk, even for pages that aren't
technically dirty (binaries, libraries, and support files meant only to be
read, i.e. likely most of the image).

Demand-paging the bits you actually use from the network solves both problems:
cold pages never get loaded in the first place, and least-recently-used pages
can be thrown away knowing you can fetch them again when needed.

~~~
catern
>But then you have to provision space in memory for the whole image size

No, you can have your container system load parts of the image on demand into
that tmpfs.

That's my point: this doesn't have anything to do with writing to memory,
writing to block device, etc. etc. This is just a matter of indexing the image
so that it can be lazily loaded. Which is of course good, but let's be clear
that it's orthogonal to whether you stick the images on local disk, stick
them only in memory, or whatever.

------
fulafel
If the bottleneck of pulling was eliminated by this, it means the test runs
didn't need to access most of the image, right? I wonder what this says about
carrying unnecessary stuff or test coverage. Especially since the base distro
layers were probably cached.

Edit: " For isolation and other reasons, we run all our containers in a
single-use fresh VMs." So they had no caching for the base layers unless those
were primed in the vm image?

------
tsurkoprt
Why not directly use www.lucidlink.com? Same result, but read/write.

~~~
helper
Because it doesn't solve the problem the author is trying to solve? The goal
of this is to produce backwards-compatible tar.gz files that can be served
from a docker registry and also streamed on demand instead of pre-downloaded.

If they just wanted an S3/GCS FUSE filesystem, there are plenty of open-source
options out there.

~~~
tsurkoprt
Yeah, it does; one just needs to read instead of reactively commenting.

~~~
helper
What do you want me to read?

------
whalesalad
TaaS – Tar as a Service.

