
Docker update ToS: Image retention limits imposed on free accounts - Benjamin_Dobell
https://www.docker.com/pricing/retentionfaq
======
znpy
For everybody complaining about having to pay actual money for goods and
services: if you're not okay with this you can run a self hosted registry.

The out of the box registry does very little and has a very poor user
experience.

But nowadays there are Harbor from VMware and Quay from Red Hat, both open
source and easily self-hostable.

We run our own Harbor instance at work and I can tell you... Docker images are
NOT light. You think they are, they are not. It's easy to have images
proliferate and burn a lot of disk space. Under some conditions, when layers
are shared among too many images (can't recall the exact details here),
deleting an image may end up deleting a lot more images too (and this is not
the correct/expected/wanted behaviour), which means that under some
circumstances you have to retain a lot more images or layers than you think
you should.

The thing is, I can only wonder how much the bandwidth and disk space (oh, and
disk space must be replicated for fault tolerance) must cost when running a
public registry for everybody.

It hurts the open source ecosystem a bit, I understand... Maybe some middle
ground will be reached, dunno.

Edit: I also run Harbor at home; it's not hard to set up and operate, you
should really check it out.
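
For anyone who wants to try the bare-bones option first, here is a minimal
sketch with the stock registry image (the host path and image names are just
placeholders):

    # start the plain Docker registry, persisting layers to a host directory
    docker run -d --name registry \
      -p 5000:5000 \
      -v /srv/registry:/var/lib/registry \
      registry:2

    # push a local image to it
    docker pull alpine:3.12
    docker tag alpine:3.12 localhost:5000/mirror/alpine:3.12
    docker push localhost:5000/mirror/alpine:3.12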

~~~
m463
I thought they didn't make private registries available because it would
"fragment the ecosystem"

Here's the conversation I saw:

[https://stackoverflow.com/questions/33054369/how-to-
change-t...](https://stackoverflow.com/questions/33054369/how-to-change-the-
default-docker-registry-from-docker-io-to-my-private-registry)

pointing to this:

[https://github.com/moby/moby/issues/7203](https://github.com/moby/moby/issues/7203)

and also there was this comment:

"It turns out this is actually possible, but not using the genuine Docker CE
or EE version.

You can either use Red Hat's fork of docker with the '--add-registry' flag or
you can build docker from source yourself with registry/config.go modified to
use your own hard-coded default registry namespace/index."

~~~
justincormack
No, you have misunderstood the issue. You can use any registry; just write out
the domain for it. This has always worked and is very widely used. Red Hat
changed the default for when you don't specify a FQDN, before they decided not
to ship Docker at all.

~~~
m463
I understand that point, but it makes it harder to not "accidentally" pull
from a public registry with intertwined docker images (which most people use)

------
alexellisuk
Some thoughts / scenarios:

"Fine we will just pay" \- I have a personal account then 4 orgs, that's ~ 500
USD / year to keep older OSS online for users of openfaas/inlets/etc.

"We'll just ping the image very 6 mos" \- you have to iterate and discover
every image and tag in the accounts then pull them, retry if it fails. Oh and
bandwidth isn't free.

"Doesn't affect me" \- doesn't it? If you run a Kubernetes cluster, you'll do
100 pulls in no time from free / OSS components. The Hub will rate-limit you
at 100 per 6 hours (resets every 24?). That means you need to add an image
pull secret and a paid unmanned user to every Kubernetes cluster you run to
prevent an outage.

"You should rebuild images every 6 mo anyway!" \- have you ever worked with an
enterprise company? They do not upgrade like we do.

"It's fair, about time they charged" \- I agree with this, the costs must have
been insane, but why is there no provision for OSS projects? We'll see images
disappear because people can't afford to pay or to justify the costs.

A thread with community responses -
[https://twitter.com/alexellisuk/status/1293937111956099073?s...](https://twitter.com/alexellisuk/status/1293937111956099073?s=20)

~~~
pydry
>"You should rebuild images every 6 mo anyway!" \- have you ever worked with
an enterprise company? They do not upgrade like we do.

No, but they've got cash and are not price sensitive. Wringing money out of
them helps keep it cheap and/or free for everyone else.

Enterprise customers might as well fork over cash to docker rather than
_shudder_ Oracle.

~~~
sebazzz
Companies might base their image on another image in the Docker registry. That
image might be _good_ now, and might be good in two years, but what if I want
to pull a, say, .NET Core 1.1 Docker image in four years?

Now, .NET Core 1.1 might not be the best example, but I'm sure you can think
of some example.

~~~
krferriter
If you anticipate needing that image around in 4 years for a critical business
case, you can either pull it once every 6 months from here on out, download
the image and store it somewhere yourself, or make a fully reproducible
Dockerfile for it so the image can be re-created later if it disappears from
the registry.

------
brutos
This will be quite bad for reproducible science. Publishing bioinformatics
tools as containers was becoming quite popular. Many of these tools have a
tiny niche audience and when a scientist wants to try to reproduce some
results from a paper published years ago with a specific version of a tool
they might be out of luck.

~~~
dijksterhuis
Simplest answer is to release the code with a Dockerfile. Anyone can then
inspect build steps, build the resulting image and run the experiments for
themselves.

Two major issues I can see are old dependencies (pin your versions!) and out
of support/no longer available binaries etc.

In which case, welcome to the world of long term support. It's a PITA.
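
As a rough illustration (the image name and the pinned versions below are made
up for the example), a fully pinned Dockerfile released alongside the code
looks something like this:

    # sketch: write out a Dockerfile with everything pinned, then build it
    cat > Dockerfile <<'EOF'
    # pin the base image to an exact tag instead of "latest"
    FROM python:3.8.5-slim
    # pin dependency versions so a rebuild years later resolves the same packages
    RUN pip install --no-cache-dir numpy==1.19.1 pandas==1.1.0
    COPY . /app
    WORKDIR /app
    CMD ["python", "run_experiments.py"]
    EOF
    docker build -t mylab/paper-tool:1.0 .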

~~~
Nullabillity
That doesn't help against expiring base images though.. :/

~~~
tensor
There needs to be a way to create a combined image of all dependencies to
distribute with a Dockerfile and code. That way people could still modify the
code and dockerfile.

~~~
bananadonkey
[https://docs.docker.com/engine/reference/commandline/save/](https://docs.docker.com/engine/reference/commandline/save/)

You already have the source, now you have a dist.
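
For example, something along these lines (the image names are just
placeholders, and they have to exist locally first):

    # bundle every image the project depends on into one archive,
    # distributed next to the code and the Dockerfile
    docker save -o project-images.tar \
      ubuntu:20.04 postgres:12.3 mylab/analysis:1.0

    # years later, anyone with the tarball can restore the images locally
    docker load -i project-images.tar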

------
stevebmark
Makes sense. I don't get how Docker could offer so much free hosting in the
first place. I know storage is cheap, but not this cheap. Eventually they're
going to need to make these rules more stringent.

~~~
DJHenk
I also don't get why people put all their stuff on free services like this and
expect it to work until eternity.

Come on, if you stop for just half a second and think about it, you know it is
a stupid idea and you know that one day you will have a problem. You really
don't have to be a genius for that. The same goes for all these other kinds of
"services" that are bundled together with things that used to be a one-time
purchase, like cars, etc.

Oh, I now have a TV that can play Netflix and Youtube, but is otherwise not
extendible. But what happens in ten years? The TV still works fine, but Netflix
has gone bust and this new video service won't work. Too bad, gonna buy a new
TV then. I can get really mad about this stupid short-sightedness everybody
has these days.

Spoiler alert: one day Github will be gone too.

~~~
globular-toast
Do you save a copy of every web page you think might be useful later? I have a
small archive of things I consider to be "at risk", but there are many things
I enjoy that exist only on other people's servers now. I can't keep it all on
my own machines forever, so the difficulty is guessing what will disappear and
what won't.

~~~
DJHenk
No, I don't save every page that might be useful. But I do save some content
if I notice that I keep referring to it multiple times.

However, that is just information and that is not what I am talking about. I
am talking about tools and things that stop functioning because they need some
free service on the internet to work. Yes, all my projects and tools can work
without internet access. Sure, they might not get updated anymore, but they
will keep functioning and I could continue living my life no matter what shuts
down.

This even extends to non-free services. For instance, I don't use Spotify,
even though it is a nice product and I love exploring new music. But if there
were a change of service, or Trump decided to block my country economically or
something like that, and I were kicked off the platform, I would suddenly have
no music anymore, even though I would have paid for it for years. So I buy
CDs and vinyl instead and rip them to FLAC.

------
nickjj
The FAQ[0] says pulling an image once every 6 months will prevent it from
being purged by resetting the timer.

It doesn't seem like a big deal really. It just means old public images from
years ago that haven't been pulled or pushed to will get removed.

[0]:
[https://www.docker.com/pricing/retentionfaq](https://www.docker.com/pricing/retentionfaq)

~~~
spiffytech
This may not be a big deal for small-time projects. But does this mean e.g.,
the official Node images for older runtime versions could disappear? I
recently needed to revive an app that specifically required node:8.9.4-wheezy,
pushed 2 years ago. An image that specific and old will quite possibly hit 0
downloads per 6 months in short order, if it hasn't already.

~~~
nickjj
That is a really good point. I wonder if official images will be treated
differently.

~~~
manishyt
(I work for Docker). Inactive official images will not be deleted. We are
updating the FAQ shortly.

------
jacques_chester
Docker is partly to blame for its own predicament by conflating URIs with
URNs. When you give an image reference as `foo/bar`, the implicit actual name
is `index.docker.io/foo/bar`.

That means that "which image" is mixed with "where the image is". You can't
deal with them separately. Because everyone uses the shorthand, Docker gets
absolutely pummeled. Meanwhile in Java-land, private Maven repos are as
ordinary as dirt and a well-smoothed path.

It's time for a v3 of the Registry API, to break this accidental nexus and
allow purely content-addressed references.

~~~
pcthrowaway
> the implicit actual name is `index.docker.io/foo/bar`

`index.docker.io/foo/bar:latest` to be more exact, which is a URL, but not
really a URI if we're being pedantic.

Docker doesn't really provide an interface to address images by URI (which
would be more like the SHA), though in practice, tags other than latest should
function closer to a URI
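
For reference, the shorthand, fully-qualified, and digest forms look like this
(alpine is used purely as an example):

    # shorthand and fully-qualified forms of the same reference
    docker pull alpine:3.12
    docker pull docker.io/library/alpine:3.12

    # content-addressed form: look up the repo digest, then pull by it
    docker image inspect --format '{{index .RepoDigests 0}}' alpine:3.12
    # prints alpine@sha256:..., which can itself be passed to docker pull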

------
mrweasel
That's fantastic, my main issue with Docker Hub is that there's a ton of
unmaintained and out of date images.

Some just pollute my search results; I don't care that "yes, technically
there's an image that does this thing I want, but it's Ubuntu 14.04 and 4
years old".

Even better, it prevents people from using these unmaintained images as a base
for new projects, which they will do, because many developers don't look at the
Dockerfile or actually review the images they use in shipping products.

As a bonus, perhaps this will mean that some of the many images of extremely
low quality will go away.

I think it's fair, now you can either pay or maintain your images.

~~~
swozey
You really shouldn't be pulling someone's random images off of dockerhub. If I
made a POC 4 years ago for some random kubernetes configuration/tutorial that I
was testing, and I decided to use dockerhub to host its images (as one
typically does, and it used to not have private repos), I'm not posting that
for you to come consume 4 years later, out of the blue, in production, because
you found it randomly via the search.

You also tend to have no idea what's in those images or what context people
are creating them under. Sure, a lot of us know to check the dockerfile,
github repo, etc., but I have images with 10k+ downloads from OSS contributions
and, as you've said, a whole lot of developers just grab whatever looks fitting
on there. My biggest dockerhub pull has no dockerfile, no github repo, and is
a core network configuration component I put up randomly just for my own
testing because no docker image for it existed years ago.

~~~
mrweasel
You’re right, but people tend to see Docker Hub as some master registry for
quality and official images, even if it never claimed to be such a thing.
Reading and understanding the Dockerfile is vital before deciding to use it
in any sort of production environment. The new policy will help clean up
Docker Hub.

~~~
swozey
Totally! I'm really worried what'll happen when these docker images I made get
deleted. Dockerhub doesn't give you ANY details beyond the download number
(which stops at 10k+) so I can't tell if they're still getting used or what.

I'm hopeful they'll add statistics/referrers when this goes live.

------
freedomben
If I'm reading this correctly, a single pull every <6 months would avoid this.
This seems like NBD to me.

Still, I keep my images mirrored on quay.io and I would recommend that to
others (disclaimer: I work for Red Hat which acquired quay.io)
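
Mirroring is just a pull/tag/push; roughly something like this (the names are
placeholders, and it assumes the quay.io repository already exists):

    docker pull myuser/mytool:1.2.3
    docker tag myuser/mytool:1.2.3 quay.io/myuser/mytool:1.2.3
    docker login quay.io
    docker push quay.io/myuser/mytool:1.2.3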

------
LockAndLol
If people really think this is a problem, they should contribute a non-abusive
solution. Writing cron jobs to pull periodically in order to artificially
reset the timer is abusive.

Non-abusive solutions include:

- extending docker to introduce reproducible image builds

- extending docker push and pull to allow discovery from different sources
that use different protocols like IPFS, TahoeLAFS, or filesharing hosts

I'm sure you can come up with more solutions that don't abuse the goodwill of
people.

~~~
Legogris
docker pull and push integrated with IPFS is a great idea!

~~~
zoobab
IPFS is only a partial solution: if you are the only one to have a copy and
you pull the plug, the content is gone. You would need a bot that takes care
of maintaining at least 3 or 5 copies always available across the whole file
system.

------
morpheuskafka
This seems like a non-issue: if you need to keep a rarely-used image alive for
some reason, just write a cron job to pull it once every six months. If the
goal is long-term archival, it should be entrusted to something like the
Internet Archive.
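
For example, a crontab entry along these lines (the image name is a
placeholder) would keep the timer reset:

    # pull at 03:00 on the 1st of every 5th month, well inside the 6-month window
    0 3 1 */5 * docker pull myuser/rarely-used-image:1.0 >/dev/null 2>&1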

------
bithavoc
This is fine and completely fair; I bet it's not cheap paying for storage for
Docker images no one cares about.

------
gramakri
Does anyone know what Docker's business model is these days?

~~~
gtirloni
After they sold Docker Enterprise to Mirantis, I don't know anymore.

Probably hold on long enough to get acquired?

~~~
CameronNemo
Why would anyone want to acquire them after they sold off the majority of
their client base?

Did they retain a significant amount of talent?

~~~
fapjacks
Yeah.

------
SEJeff
[https://quay.io/plans/](https://quay.io/plans/)

""" Can I use Quay for free? Yes! We offer unlimited storage and serving of
public repositories. We strongly believe in the open source community and will
do what we can to help! """

------
francislavoie
That includes public images? That'll hurt OSS. That's a bummer.

It wouldn't surprise me if people move to Github's registry for open source
projects.
[https://github.com/features/packages](https://github.com/features/packages)

~~~
NathanKP
Probably not, at 50 cents per GB of data transfer outside of GitHub Actions.
Unfortunately the only place you can viably use the GitHub registry right now
is inside GitHub Actions.

~~~
laurencerowe
That pricing is for private repos. It's free for public repos.

------
nhumrich
I would delete my own images to clear up room on dockerhub, but they don't
have an API to remove images; the only way is to manually click the x in the
UI. So, in a lot of ways, docker forced us to "abuse" their service and store
thousands of images on a free/open source account. I get this change, and it
was inevitable. But it's still ironic that you can't delete your own images.
The best way to delete your image is to just stop using it and let docker
delete it for you in 6 months.

~~~
lightswitch05
I agree, I have a little tool called `php-version-audit` that literally
becomes useless after a few weeks without an update (you can't audit your php
version without the knowledge of the latest CVEs). I have manually cleaned up
old images like you say by clicking through them all, but having a way to
define retention limits is a feature to me.

------
ncrmro
Been wondering when image space would start to be a concern.

Actually just set up my own private registry and pull-through registry.

Pretty easy stuff although no real GUI to browse as of yet.

This is all sitting on my NAS running RancherOS.
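
For anyone curious, the pull-through cache part is roughly this (the host path
and address are placeholders for my setup):

    # run the stock registry as a pull-through cache for Docker Hub
    docker run -d --name hub-mirror \
      -p 5000:5000 \
      -v /srv/registry-cache:/var/lib/registry \
      -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \
      registry:2

    # then point the Docker daemon at it in /etc/docker/daemon.json:
    #   { "registry-mirrors": ["http://<nas-ip>:5000"] }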

~~~
chrisandchris
Take a look at Portus, a project maintained by SUSE, which has a pretty nice
GUI for a private docker registry.

[https://github.com/SUSE/Portus](https://github.com/SUSE/Portus)

~~~
ncrmro
Actually spent some time looking at this today. It's a bit more complex than I
was hoping, as right now I'm the only user, and the only way to connect to my
registry atm is through WireGuard.

It is cool seeing openSUSE. Same with Rancher.

~~~
chrisandchris
It's a bit of an overkill if you're alone, but even then, I'm using it for my
own private registry too because it's the nicest/easiest way IMHO for adding
auth to a docker registry.

------
djsumdog
Seems like companies are relearning what they should have learned in the 2001
dotcom bust.

Keep free stuff free and add paid stuff. If your free stuff isn't sustainable,
you really should have thought that through early on.

This limit seems reasonable, because storage costs are expensive. But it
should have been implemented day one so people have reasonable expectations on
retention. Others have mentioned open source projects and artifacts for
scientific publication being two niche use cases where people still might want
this data years later, but it'd be rare for it to be pulled every six months.

I only have a few things on docker hub, but I'll probably move them to a self-
hosted repo pretty soon. At least if it's self hosted, I know it will stay up
until I die and my credit cards stop working.

~~~
Macha
I think in Docker's case, in their original plan this free unlimited hosting
was probably sustainable in a freemium model where businesses paid for Docker
Enterprise and Docker.com was about marketing and user acquisition, similar to
open source on GitHub.com being marketing and user acquisition for paid
accounts/Github Enterprise.

It's not an unreasonable strategy to provide generous free hosting if you
derive some other business benefit from it (YouTube being another example).

But Docker Inc. found their moat was not that deep; other projects from the
big cloud providers killed the market they saw for Docker Enterprise, and they
sold it off.

So now they just have docker.com and Docker CE, and even that has alternatives
now that other runtimes exist. So they need to make docker.com a profitable
business on its own, or find something else to do, which changes the equation
significantly.

------
anderspitman
If you've never used Singularity containers[0], I highly recommend checking
them out. They make a very different set of tradeoffs compared to Docker,
which can be a nice fit for some uses. Here are a few of my favorites:

* Images are just files. You can copy them around (or archive them) like any other file. Docker's layer system is cool but brings a lot of complexity with it.

* You can build them from Docker images (it'll even pull them directly from Dockerhub); there's a quick sketch of this at the end of the comment.

* Containers are immutable by default.

* No daemon. The runtime is just an executable.

* No elevated permissions needed for running.

* Easy to pipe stdin/stdout through a container like any other executable.

[0]:
[https://github.com/hpcng/singularity](https://github.com/hpcng/singularity)
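
A rough sketch of the "build from Docker images" point (the image names are
just examples):

    # build a single-file Singularity image straight from a Docker Hub image,
    # then run it like a plain executable
    singularity build alpine.sif docker://alpine:3.12
    singularity exec alpine.sif cat /etc/os-release

    # the .sif is an ordinary file you can copy or archive like anything else
    cp alpine.sif /mnt/backup/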

~~~
GordonS
> * Images are just files. You can copy them around (or archive them) like any
> other file

Never heard of Singularity before, and it does look interesting. Wanted to
point out though that you can create tarballs of Docker images, copy them
around, and load them into a Docker instance. This is really common for air-
gapped deployments.

~~~
anderspitman
I've never seen this mentioned in the official Docker docs. Is it a well-
supported workflow?

~~~
jonfw
Docker save and docker load - I use this often for air-gapped installations
and it works exactly as well as docker pull in my experience.

------
Bedon292
If they are doing this, they should add stats on when an image was last
pulled, so you can see what is at risk of being removed. Would be curious
about a graph, like the weekly downloads one NPM has, so you can see how
active something is.

~~~
manishyt
(I work for Docker). We will be updating the UI to show status of each image
(active or inactive). We will be updating the FAQ shortly to clarify this.

------
fgribreau
Docker is the new Heroku. Cron jobs will pull images to simulate image
activity.

------
wilsonfiifi
In case anyone decides to self-host their docker registry, Pluralsight has a
nice course [0] on that subject.

[0] [https://www.pluralsight.com/courses/implementing-self-
hosted...](https://www.pluralsight.com/courses/implementing-self-hosted-
docker-registry)

------
sebazzz
> _What is an “inactive” image?_

> An inactive image is a container image that has not been either pushed or
> pulled from the image repository in 6 or more months.

> _How can I view the status of my images_

> All images in your Docker Hub repository have a _“Last pushed”_ date and can
> easily be accessed in the Repositories view when logged into your account. A
> new dashboard will also be available in Docker Hub that offers the ability
> to view the status of all of your container images.

That still does not tell the whole story, does it? I still don't know if my
image has been pulled in the last six months, only when _I_ pushed it.

------
rmoriz
So this means that open source projects need to pay to keep older images
alive?

~~~
moondev
They say image, not tag, so it wouldn't appear this should impact active
projects.

------
laksjd
I just got an email notification and while I can understand that they're doing
this (all those GB must add up to a significant cost), the relatively short
notice seems unnecessary.

~~~
ownagefool
6 month notice doesn't seem terrible for a free service imo.

~~~
amenod
3 month notice - they will start on Nov 1st. Still not bad.

~~~
ownagefool
Image retention is 6 months though, so it seems slightly unclear if the timer
starts counting from today or from Nov 1st.

Best to assume the worst but still plenty of time to write a cron that pulls
all your images, assuming for some reason you need images you don't pull for >
6 months.

------
GordonS
Hmm, the very nature of layered images presumably means big storage savings; I
wonder if block-level deduplication at the repository backend would be
feasible too?

~~~
brown9-2
Registries already do this

~~~
GordonS
Do you mean at the filesystem level, or higher up? Have you got any sources
for this?

~~~
binman-docker
Hi, I work at Docker. Registry sees each layer as a SHA and does not store
multiple copies of the same SHA for obvious reasons. This is not unique to
Hub, it's part of the registry design spec.

Registry is open source
([https://github.com/docker/distribution](https://github.com/docker/distribution))
and implements the OCI Distribution Specification
([https://github.com/opencontainers/distribution-
spec/blob/mas...](https://github.com/opencontainers/distribution-
spec/blob/master/spec.md)) if you want to dig into it.
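
You can see the content addressing from the client side too; a quick sketch
(the demo image is just a throwaway built for illustration):

    # an image built on top of another re-uses the shared layers, so a registry
    # only needs to store those blobs once
    docker pull alpine:3.12
    printf 'FROM alpine:3.12\nRUN echo hello > /hello.txt\n' | docker build -t demo:1 -
    docker image inspect --format '{{json .RootFS.Layers}}' alpine:3.12
    docker image inspect --format '{{json .RootFS.Layers}}' demo:1
    # demo:1's first layer ID is identical to alpine:3.12's only layer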

~~~
GordonS
Yes, that's what I meant when I mentioned layers; clearly copies of the same
layer are not kept :) My question was about block-level, or other forms of
deduplication.

~~~
binman-docker
Deduplication at the block level would be dependent on the choice of storage
driver ([https://docs.docker.com/registry/storage-
drivers/](https://docs.docker.com/registry/storage-drivers/)). In the case of
Hub, S3 is the storage medium and that's an object store rather than a block
store.

In theory you could modify the spec/application to try to break layers down
into smaller pieces but I have a feeling you would reach the point of
diminishing returns for normal use cases pretty quickly.

~~~
jacques_chester
I found this recent paper interesting:
[https://www.usenix.org/conference/atc20/presentation/zhao](https://www.usenix.org/conference/atc20/presentation/zhao)

> _Containers are increasingly used in a broad spectrum of applications from
> cloud services to storage to supporting emerging edge computing paradigm.
> This has led to an explosive proliferation of container images. The
> associated storage performance and capacity requirements place high pressure
> on the infrastructure of registries, which store and serve images.
> Exploiting the high file redundancy in real-world images is a promising
> approach to drastically reduce the severe storage requirements of the
> growing registries. However, existing deduplication techniques largely
> degrade the performance of registry because of layer restore overhead. In
> this paper, we propose DupHunter, a new Docker registry architecture, which
> not only natively deduplicates layer for space savings but also reduces
> layer restore overhead. DupHunter supports several configurable
> deduplication modes, which provide different levels of storage efficiency,
> durability, and performance, to support a range of uses. To mitigate the
> negative impact of deduplication on the image download times, DupHunter
> introduces a two-tier storage hierarchy with a novel layer
> prefetch/preconstruct cache algorithm based on user access patterns. Under real
> workloads, in the highest data reduction mode, DupHunter reduces storage
> space by up to 6.9x compared to the current implementations. In the highest
> performance mode, DupHunter can reduce the GET layer latency up to 2.8x
> compared to the state-of-the-art._

~~~
GordonS
This is really interesting, thanks for posting this! It's exactly the kind of
thing I was thinking of, even if I expected a comment like yours to come from
someone at Docker ;)

------
Bnshsysjab
Remember that time you were looking for an answer to some obscure question?
You found the perfect Google result - description, page title and URL all
indicated it was going to answer your question - so you clicked it, and...
nothing... the page cannot be found.

You now have that, with Docker.

------
voltagex_
A 2019 paper says there's 47TB of Docker images on the Hub. Get scraping.

------
voltagex_
I wonder what kind of account the Home Assistant images are using. This could
break a whole lot of stuff - and I've seen projects that don't publish a
Dockerfile anywhere.

------
maztaim
Relying on goodwill works until that goodwill stops. Store your images locally,
at least as a backup; it has other advantages too.

------
avian
Is there a way to see when an image was last _pulled_? I can see the last push
date, but not pull.

------
wildpeaks
This will begin November 1, 2020

------
thrownaway954
given their track record with developers over the years, i wouldn't be
surprised if microsoft scrambles to build a competitor to docker's repo service
and integrates it with github.

------
pjmlp
The complaints, as expected, are the usual ones from the free generation;
apparently Mozilla is not enough.

~~~
zoobab
Let's mirror dockerhub on a distributed fault tolerant file system. And IPFS
sucks.

------
PaywallBuster
tl;dr: images hosted on free accounts that go without downloads for 6 months
will be scheduled for removal.

------
oauea
Time for someone to create a new service that will pull your images into
/dev/null once a month.

~~~
mr__y
This could also be solved by one person running a service that would crawl all
public docker images and automatically pull those that are close to expiration
every 6 months. At the moment I'm just curious how many resources would be
needed for that.

~~~
remram
If you just need to send the request, not read the content in full, that can
be done by one free-tier cloud VM.

~~~
csunbird
Reading the content still should not be a problem, since ingress is free for
almost all cloud providers.

~~~
mr__y
> ingress is free for almost all cloud providers.

that would still pose a problem, not cost-wise, but you'd still need to
download image after image. Will a single instance be capable of "downloading"
all existing images every 6 months?

------
dataminded
I'm selling a SaaS service that will pull each of your images once every 6
months... thank you, Docker.

------
jmondi
Add one more coin to the "always self-host" bucket. Just another example of a
service that starts free, then they pull the rug out from under you and hold
you hostage for ransom.

