
Where are Docker images stored? - thoward37
http://blog.thoward37.me/articles/where-are-docker-images-stored/
======
yackob03
Images may also be stored in one of the available private registries.
<shameless plug> As the Co-Founder, I am partial to Quay.io
[https://quay.io](https://quay.io), which in my not-so-humble opinion has
the best features, reliability, and support for businesses and organizations,
but there are other options if for some reason Quay.io doesn't meet your
needs. For those who prefer to self-host, we've also got an enterprise option,
which brings all of the index and registry goodness behind your firewall.
</shameless plug>

That said, we love the Docker ecosystem and way of doing things. A sibling
comment mentioned how complicated Docker is, but once you realize that they
are trying to offer DVCS-like features and paradigms, you can see that it is
complicated for a reason. We all thought git was complicated at first as well.

~~~
thoward37
Regarding the shameless product plug: Quay looks like a very cool product.
Love the history and diff views. Glad to see the pricing mimics the GitHub
model: "pay for private, but public is free and unlimited". Awesome!

Regarding complexity in Docker: So here's the thing: people wanted npm, but
they got git. How can we bridge the gap between an easy-to-use, out-of-your-way
package manager and a fully featured DVCS experience? I love the idea of
merging them, but IMO Docker needs to make the semantic model more accessible.
Specifically, it needs to ensure concepts are properly orthogonal, not
overloaded, and unambiguously defined. It might be too late to scrub this
aspect, though.

Some other general open problems are things like checksums, fingerprints,
image signing, etc. How do you verify the validity of an image?

~~~
yackob03
I will speak to the issues with which I am familiar.

Checksums are currently uploaded by the client and verified by the registry.
Signing is on the roadmap[1]. I'm not sure what you mean by a fingerprint;
would this be analogous to an SSH host key? What function would it serve if
you already had a signature that only you could reproduce?

[1]:
[https://github.com/dotcloud/docker/issues/2700](https://github.com/dotcloud/docker/issues/2700)
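
The client-computes / registry-verifies split can be sketched as a toy
simulation (purely illustrative; the real client uses its own checksum scheme
over the layer payload, not a plain `sha256sum` of a local file):

```shell
# Toy simulation of the split: in reality the checksum travels as
# part of the push protocol; here both sides just hash the same file.
echo "pretend layer bytes" > layer.tar

# Client side: compute a checksum and send it along with the upload.
client_sum="sha256:$(sha256sum layer.tar | cut -d' ' -f1)"

# Registry side: recompute over the received bytes and compare.
registry_sum="sha256:$(sha256sum layer.tar | cut -d' ' -f1)"

if [ "$client_sum" = "$registry_sum" ]; then
  echo "upload accepted"
else
  echo "upload rejected: checksum mismatch" >&2
fi
```

Note this only catches corruption in transit; a malicious client can upload
any bytes with a matching checksum, which is why signing matters.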

~~~
thoward37
A fingerprint is just a small, easy-to-recognize string that identifies the
pub key of a trusted individual. It's helpful for judging the trustworthiness
of a release. More important than the fingerprint, though, is the pub key of
the release engineer, and a web of trust to verify that key.

The process that is the gold standard for this, IMO, is what's used over at
the Apache Software Foundation.

[https://www.apache.org/dev/release-signing.html](https://www.apache.org/dev/release-signing.html)

For those who aren't familiar with the topic, I'll illustrate with a release I
made a few years ago. Here are the release artifacts for Lucene.Net 2.9.2:

[http://www.apache.org/dist/incubator/lucene.net/source/2.9.2...](http://www.apache.org/dist/incubator/lucene.net/source/2.9.2-incubating/)

You'll find a .zip, .asc, .md5, and .sha1 file. The .zip is the release
artifact. The MD5 and SHA1 are just two different hashes to prove that the
package you got is not corrupt and is what it should be, similar to a checksum
(note: these hashes should also be signed, IMO). The .asc is a signature for
the release.
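
Mechanically, checking the hash files is one command each. A sketch with
stand-in filenames (some ASF hash files omit the filename column, in which
case you compare digests by eye):

```shell
# Create a stand-in "release artifact" and its .sha1 file, then
# verify it the same way you would for a real download.
echo "pretend release bytes" > release.zip
sha1sum release.zip > release.zip.sha1

# Prints "release.zip: OK" when the digest matches.
sha1sum -c release.zip.sha1
```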

A signature is made from the release engineer's key pair and the release
artifact. gpg can take the .asc and the .zip as inputs and tell you what pub
key made the signature (and it reports it as a short fingerprint). If you've
imported a trusted key into gpg, it will tell you that it's a verified and
trusted key, and tell you who it was.

My pub key for ASF signing is available here:

[http://people.apache.org/~thoward/F1AADDE6.asc](http://people.apache.org/~thoward/F1AADDE6.asc)

If you pull all these files together and verify them, this should be your
result:

    $ curl -sSL http://people.apache.org/~thoward/F1AADDE6.asc | gpg --import
    gpg: key F1AADDE6: public key "Troy Howard (CODE SIGNING KEY) <thoward@apache.org>" imported
    gpg: Total number processed: 1
    gpg:               imported: 1  (RSA: 1)

    $ gpg --verify ~/Downloads/Apache-Lucene.Net-2.9.2-incubating.src.zip.asc ~/Downloads/Apache-Lucene.Net-2.9.2-incubating.src.zip
    gpg: Signature made Fri Feb 25 09:33:40 2011 PST using RSA key ID F1AADDE6
    gpg: Good signature from "Troy Howard (CODE SIGNING KEY) <thoward@apache.org>"
    gpg: WARNING: This key is not certified with a trusted signature!
    gpg:          There is no indication that the signature belongs to the owner.
    Primary key fingerprint: 062B 4DAF 06F8 61CD 2E71  E40B 8EAA A8A8 F1AA DDE6

Anything else, and you should not use the release.

A good package and release system like the Docker Index/Registry should build
these verifications in automatically. A tool like Quay could host pub keys and
automatically sign images. The Docker Index API could be extended slightly to
support fetching the signature. Docker itself could be extended to support a
"verified" mode, where it refuses to run images that don't have a signature,
or that fail key verification against a trusted set of keys.
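
No such mode exists in Docker today; a rough sketch of what a wrapper could
look like, assuming a detached .asc shipped next to an exported image tarball
(the function name and filenames are hypothetical):

```shell
# Hypothetical "verified mode" wrapper: refuse to load an image
# unless gpg can verify its detached signature against the local
# keyring. The actual docker command is commented out so the
# sketch stands alone.
verified_load() {
  local image_tar=$1 sig=$2
  if gpg --batch --verify "$sig" "$image_tar" 2>/dev/null; then
    echo "signature OK: loading $image_tar"
    # docker load < "$image_tar"
  else
    echo "refusing to load unverified image: $image_tar" >&2
    return 1
  fi
}
```

Keys would still need to be distributed and trusted out of band, which is
where the web of trust comes back in.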

Hmm.. maybe I need to write another blog post. ;)

------
peterwwillis
Holy crap this is complicated. (how docker works, not your write-up)

------
j_s
(1) Who pays the bills for the public registry (docker.io) and why?

(2) Is there a future possibility of something similar to how so many Ruby
projects fall down when GitHub goes down, due to not 'pit-of-success-ing'[1]
a copy of everything locally?

[1]
[http://blogs.msdn.com/b/brada/archive/2003/10/02/50420.aspx](http://blogs.msdn.com/b/brada/archive/2003/10/02/50420.aspx)

~~~
sergiotapia
Why do you '[1]' links?

~~~
gknoy
Part of HN's convention, especially since there is often more than one link,
is to use end-noted links rather than inline ones. This makes the prose easier
to read while still making it easy to identify and annotate the importance of
links.

~~~
derefr
I kind of wonder why HN doesn't support
[Markdown-style](http://daringfireball.net/projects/markdown/syntax#link)
links. It _almost_ makes me miss Reddit.

I suppose it's to make it clear where links go, and discourage trolling... but
browsers failing to show you where links go is a failing of browser chrome,
and people who are concerned about that can install extensions that make link
destinations more obvious.

------
derefr
To put it another way:

1. A Docker _registry_ is like (or maybe just _is_) an S3 bucket: a dumb,
private object-store.

2. A Docker _index_ is a database-backed web service with a REST API, that
clients talk to.

3. The web service can generate temporary tokens that let you GET things
from, and PUT things in, the bucket.

4. The web service's database has a model of an image "project" similar to a
Git repository: version history, branches, and other metadata.

5. The bucket contains the image repository's "object pool." Just like git,
when you pull a branch, the client downloads all the "objects" required to
check out that branch.
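
Steps 2-3 can be sketched as a toy simulation (function names and the token
format are purely illustrative; this is not the real index/registry protocol):

```shell
# Toy version of the index/registry token dance: the index mints a
# temporary token, and the registry honors that token when serving
# objects from the bucket.
issue_token() {
  # A real index would authenticate the user first, then mint a
  # signed, short-lived token scoped to a repository and access level.
  echo "repository=myorg/myimage,access=read,expires=soon"
}

fetch_object() {
  # A real registry would validate the token's signature and expiry
  # before serving bytes; here we just pattern-match the scope.
  local token=$1
  case "$token" in
    *access=read*) echo "object-bytes" ;;
    *) echo "401 Unauthorized" >&2; return 1 ;;
  esac
}

token=$(issue_token)
fetch_object "$token"   # succeeds with the minted token
```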

------
mey
Is there any way to run your own Docker repository?

Edit: Looks like the correct term is registry.

~~~
jimrhoskins
Yes, assuming you mean your own Docker registry. Docker has open sourced
[https://github.com/dotcloud/docker-registry](https://github.com/dotcloud/docker-registry)

Remember the registry deals with the actual data, and delegates auth and other
stuff to an index. The docker-registry has a dummy implementation of an index
that has no notion of authentication or authorization, so anything you push to
your private registry is really public if you don't secure it with some other
method.
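
One common "other method" (my suggestion, not something the docker-registry
docs prescribe) is to put the registry behind an nginx reverse proxy with HTTP
basic auth. A sketch that writes the config to a temp dir so you can inspect
it; the port, paths, and credentials are all illustrative:

```shell
conf_dir=$(mktemp -d)

# Credentials file; in practice generate it with:
#   htpasswd -cb registry.htpasswd alice <password>
printf 'alice:$apr1$examplehashonly\n' > "$conf_dir/registry.htpasswd"

cat > "$conf_dir/registry.conf" <<'EOF'
server {
    listen 443 ssl;
    # ssl_certificate / ssl_certificate_key omitted for brevity

    location / {
        auth_basic           "Docker registry";
        auth_basic_user_file /etc/nginx/registry.htpasswd;
        proxy_pass           http://127.0.0.1:5000;  # docker-registry's default port
    }
}
EOF

echo "wrote $conf_dir/registry.conf"
```

TLS matters here because basic auth sends credentials essentially in the
clear on every push/pull.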

For me, I wanted a real private registry with access control limited to my
team. It turned out not to be too difficult to make our own registry+index
implementation that is private by default. It has a basic web interface too.
I've open sourced it:
[https://github.com/jimrhoskins/stevedore](https://github.com/jimrhoskins/stevedore).
It's still really rough, especially the web interface, but for push/pull
operations with required authentication, it does the job now.

------
avgp
"So why did it say that? I have no idea, but you can ignore it." - Because it
is "uploaded" from the CLI tool to the daemon. See
[http://docs.docker.io/en/latest/use/builder/#usage](http://docs.docker.io/en/latest/use/builder/#usage)

------
bstar77
I've started using quay.io to store my docker images. I have not used the
service heavily (because I'm still building out my Dockerfiles), but what I
have used has been great.

Docker.io is great for storing images, unless you don't want them public. I
have proprietary apps loaded on my images, so making them public is not an
option.

So far, Docker looks set to solve the scalability problem I've been chasing
for the past year. Since VMs are not ideal, I start with a farm of bare Ubuntu
servers and scale out to VMs in the cloud if needed. With Docker I can
configure once and deploy to all of these nodes no matter how they were built.
I stopped using Chef when I realized I could accomplish my goals with a
fraction of the complexity and effort.

------
hoprocker
Great writeup. I always appreciate somebody else doing a down-to-earth
overview of something complicated. This post makes me wonder whether some sort
of flowchart or UML diagram of the Docker system components wouldn't be a
useful thing.

