Docker to rate limit image pulls (docker.com)
355 points by AaronFriel on Aug 24, 2020 | 263 comments



I originally came here to ask how folks use that many docker images in six hours (I'm mostly a Docker n00b, and not being facetious); however, after reading the article, I clicked to see how much unlimited is and it's $5 a month. Now my question has changed to: is $5 a month really a deal breaker for folks to get unlimited? Or what are the use cases where the cost is prohibitive? Open source or community projects?

In most cases it would seem $5/user isn't much as a business or organizational expense, and if it's a personal project 200 images in six hours seems pretty solid?

I'm just sort of shocked it's so cheap. I figured it'd be like $25-$100 a month or something just because of all the bandwidth someone could probably burn building random/broken shit over and over.

I'm mostly curious because I've considered using Docker recently for personal projects and my home server; but I'd rather not invest a bunch of time porting things only to find out it's gonna be way more than $5 (not including my time).

Or is it that hard to just build your own images? To be honest, I've never really understood that part of Docker (using other folks' images)... It always seemed like an enormous security risk[1]. FWIW, I still deploy my personal projects using chef/ansible, shell scripts, and systemd units like some sort of curmudgeon-y monster...

[1] Everything is a security risk (APT, source, etc), I know, I get it. No one need scribe miles of pedantry into the comments explaining it to me. It's what each of us has tolerance for and can accept that matters.


It’s not common to need that many pulls nor is it hard to build your own images.

If you’re deploying to a cluster with 200 machines, you could easily hit this if you use the public registry though. However, if you’re managing that size cluster you can probably afford the fee, but more importantly, you should probably pull once to a local registry and use that to deploy to your cluster anyway.


Do you run a local registry? Any high-quality articles/youtube talks to share? I'm about to set one up for our own little cluster (~5 machines, ~75 containers). I know tons about docker engine, and a fair bit about the registry, but it's always nice to watch a "lessons learned from actually doing this in production" talk to know what mistakes to avoid


I run tens of thousands of docker images in production, or rather, tens of thousands of copies of a few hundred images.

If you do something like this, you absolutely MUST have a local registry.

Harbor [1], JFrog [2], and Quay [3] would be the first ones that I look at.

Harbor is open source, free, and a member of the CNCF. You will need to do a little bit of work to set it up to scale properly. JFrog offers a SaaS registry, but you will pay big $$ based on pull traffic. Their commercial site license is about $3k/year. Quay is older than either of them, stable, and high quality. I'd start with Harbor these days.

[1] https://goharbor.io/ [2] https://www.jfrog.com/confluence/display/JFROG/JFrog+Artifac... [3] https://quay.io/


Just to add: all the major cloud service providers offer registries (ACR/ECR/GCR, etc.). If you run a k8s service with one of them, in my experience it is best to use the corresponding registry.

I have pulled and run a 1GB image 20k times in less than 10-15 minutes without breaking a sweat.

Finally, GitHub Packages offers a registry out of the box. It is great for CI and devs to access. I generally have the production tags mirrored from GitHub to ACR.


Github Docker Registry is a mess and should be avoided at all costs.

1) It is broken and unusable on Kubernetes and Docker Swarm.

2) It is flaky, often returning 500-type errors.

3) It is expensive as the amount of pull bandwidth is very limited.


GitHub Packages works with GitHub CI out of the box, which makes development a lot easier. Like I mentioned, for the best networking in prod you should always use the registry from your k8s provider; mirroring the GitHub registry to ECR/GCR/ACR is fairly straightforward. Bandwidth costs are eliminated, and the network is a lot more reliable intra-DC.


> It is broken and unusable on Kubernetes and Docker Swarm.

Hmm, I've used them on several Kubernetes clusters in the past few months and haven't seen any issues yet.


FYI, using ECR with Docker Swarm is something we did try. It was hellish. We never nailed down the exact problems, but we spent about a month with 2-3 experienced engineers trying to fix the edge-case issues.

The main issue was ECR has a slightly different authentication model than docker swarm. The whole '--with-registry-auth' only partially works when you are using ECR. Unfortunately, it works just enough that you think it's working, until all your tokens time out and a worker can suddenly no longer pull an image.

Our common failure case was a container becoming unhealthy or a node being drained. When the task was rescheduled onto a different worker that did not have the image, it would try to pull it from the registry, and if the tokens were expired it would fail.

The only "fix" we ever found was to setup a cron job that forcibly deployed a new version of a "replicated globally" image every X minutes (where X was based on ECR token expiration). It kind of worked, but we still had occasional failures we could not identify.

I wish it worked better, because it was nice to use ECR. Frankly, token expiration sounds much more secure too, but without direct support for token refresh inside the docker engine it's just hard to get everything to work.


Looking at doing GitHub Packages for direct-to-dev and mirroring into ECR over here. Seems sound. But also considering other options as ECR is a pain to work with.

That said, word of warning for anyone looking at GitHub Packages for docker registry: it's broken with containerd and some other similar tools. They (GitHub) are currently working on a fix: https://github.com/containerd/containerd/issues/3291


I got set up with ECR without any difficulties whatsoever. You do have to authenticate before pulls and pushes, but that can be scripted very easily.


I was a happy user of JFrog's registries via site license at my last 2 places. Seemed to just work as expected. Didn't have visibility into the cost though (other teams set it up) so I had no idea it was $3k/year.


We have not had good luck with Quay. They are not stable, especially as of late. There was a period last month where for two weeks pulling images was a crapshoot.


Thank you very much. This is exactly the type of info I needed.


If you want to run a local registry to stay below the 100 pulls per 6 hours limit please consider GitLab. The Dependency Proxy https://docs.gitlab.com/ee/user/packages/dependency_proxy/ will cache docker images. This way you stay within the limits Docker set and subsequent pulls should be faster as well.


Personally, I wish generic caching proxies were still a thing, and easier to set up. I've tried setting up squid several times in the past and failed miserably every single time. All I want to do is use it as a gateway (i.e., make the proxy invisible to the application) for e.g. apt packages, so I just ended up using apt-cache or whatever other appropriate software. But I'd far rather use something generic that just works on 90% of the software I use at home, whether it's reading webcomics, repeatedly installing the same software in a dozen VMs with slightly different configurations, or just browsing remote filesystems via webdav.


I use nginx to proxy cache the Arch Linux package repository transparently. It's fairly easy to set up, and enables nice features like contacting a secondary mirror if the first one is down, or when multiple requests hit the same resource, all are blocked waiting for a single merged package download, so the proxy will not make the download multiple times if I run pacman -Syu on my 18 machines in parallel. And it's all just 20-30 lines of nginx config.
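Roughly, the core of it looks like this (hostnames, paths, and sizes are placeholders, not my exact config; in practice you'd also avoid long-caching the .db files):

    proxy_cache_path /var/cache/nginx/pacman levels=1:2 keys_zone=pacman:10m
                     max_size=50g inactive=30d use_temp_path=off;

    server {
        listen 80;
        server_name mirror.lan;              # pacman's mirrorlist points here

        location / {
            proxy_cache       pacman;
            proxy_cache_lock  on;            # merge simultaneous downloads of one file
            proxy_cache_valid 200 30d;
            proxy_pass        https://mirror.example.org;
        }
    }

A secondary mirror can be added with an upstream block plus proxy_next_upstream.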

It's not transparent though.



I just use ECR[1] which in many cases costs less and is fully locked down behind my AWS VPC

With ECR you pay for image storage: $0.09 per GB after the first 1 GB which is free

[1] https://aws.amazon.com/ecr/


are you gonna rebuild all the images that you use and push to ECR?


Nope, you don't have to rebuild images to push to different registries.

Pull from Docker Hub once, push to ECR. Then pull from ECR as much as you wish.
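e.g., assuming you've already done the docker login to ECR (account ID, region, and image are placeholders, and the target repository has to exist in ECR first):

    docker pull nginx:1.19
    docker tag nginx:1.19 123456789012.dkr.ecr.us-east-1.amazonaws.com/nginx:1.19
    docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/nginx:1.19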


Just drop a Sonatype Nexus instance on a Docker container somewhere on your network. Alternatively, use Squid if you don't push to the public Docker registry, although you might need to mess around with internal CA for SSL...


Docker supports proxies (they call them “pullthrough repos”) so you don’t have to be so generic as an http proxy.


I would stay away from Nexus. It has problems with latest tags.


Nexus in a container... because storage in containers is such a good idea? Any vps with a disk is probably a better idea


You can still bind mount a directory into a container...


Storage in containers has been a long-solved issue. The defaults are unfortunate but make sense for ease of use. Your container root should be read-only, ephemeral storage lives in a tmpfs or dynamic volumes depending on performance and size needs, and persistent storage lives in volumes.


https://docs.docker.com/registry/

You can set it up in less than 10 minutes, and the only thing required is to add '--insecure-registry' on your clients. It is not an issue if all your machines are on a private network.
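For reference, that boils down to something like this (the hostname is whatever your registry box resolves to):

    # on the registry host
    docker run -d -p 5000:5000 --restart always --name registry registry:2

    # on each client, in /etc/docker/daemon.json (the daemon equivalent of
    # --insecure-registry), then restart dockerd
    {
      "insecure-registries": ["registry.local:5000"]
    }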


Isn't there no authentication on that registry? I guess that's fine if you don't believe in zero-trust architecture.


you are right. That is what you can get in minutes.


If you cannot get a TLS cert for internal infrastructure in a few minutes, I'd recommend you start looking into why.


There's no good documentation on it, and it is not very important for me (I run it on a homelab).

I still wonder how to do it in minutes.


I use this (in a docker image) to generate certificates automatically: https://github.com/adferrand/dnsrobocert

Expect to spend 1-2 hours the first time you try it until you can set up the correct DNS records, API keys and configuration.

Afterwards it's pretty hands-off; every three months you'll receive an email from Let's Encrypt and you'll have to rerun this script to regenerate your certificates. Takes 2-3 minutes max (but of course you still need to distribute your certificates to all relevant services...)


If you run traefik it's even easier: https://docs.traefik.io/https/acme/


If you run on Kubernetes, the image is (or can be) cached at the node level.


Anything is a deal breaker if the user expects to pay nothing. That is why going from free to $1/month will have a much larger user dropoff than, say, $1 to $10.


Weber's Law. Human perception is logarithmic.


It may be general resistance to a concept. More and more services are becoming subscription-based, so it's $5 here, $5 there, and the cost creeps ever higher.


Exactly.

It's hard enough to save money as it is.

News websites blow my mind with this - if I forked over $5 to every news outlet I occasionally like to read, I'd be spending at least $500, maybe more, per year JUST to get access to some random person's biased recounting of what's happening in the world. If there were a news source that did the opposite of this, and basically provided a bullet list of objective, non-biased events boiled down to exactly what I need to know, that might be something I'd pay for. Hell, it would save you time over filtering the opinionated BS out.


Providing objective, non-biased events would be very hard.

Consider for example the current riots going on in the US. How do you objectively report on that? With bias, on one side you have "peaceful protest disrupted and escalated by the police", on the other side you have "police intervening in riots to maintain order and protect property". There's not really an in-between.


"Protesters say their peaceful assembly has been disrupted and escalated by the police. The police argue they've only been intervening in riots to maintain order and protect property."

Done.


If you have to represent "both sides" (in many cases there'll be more than two sides really), you end up having to give a voice to nutjobs, plus you present both sides as equally valid assessments.

Much as we'd all love an "unbiased" news source, the reality is that bias is a very hard problem to solve well.


Indeed, just repeating what people say about an event may be factual, but without any concept of what is actually true, it can’t be considered objective.

If one side is lying, objective reporting would tell you which side it was.


I explored building exactly that but turns out there’s no money in it. Most people want the narrative with the facts, if not more so.


There's a big market for this, it's just not for individuals. For example, Bloomberg distributes factual news on its terminal. The Bloomberg terminal even highlights important words in news stories so you can absorb the information more quickly. So if there was a earthquake somewhere, it might highlight the word "earthquake," the number of people that died, and the economic cost, for example.

Also there are news wire services that do mostly what you're describing. If you just want to be entertained (most people read news for entertainment), then you don't really care about the facts; you want to hear about so-and-so blasting so-and-so or whatever. But if you're trying to make money from information (traders, journalists, etc.), then you really don't want to be reading the kind of stuff the New York Times is publishing.


I ran the math on this once.

Just a subscription to The Information is $399/yr. Add a subscription to the Times, and you've already blown past your $500 budget.


Mind blowing, imagine paying $5 for each newspaper one wants to read.


You’re missing the point. If there’s a news source that you occasionally read then it’s far more cost effective for you to just buy the paper at the stand for 50¢ the few times you want it. Same with magazines. If you read every newspaper then getting lower cost and delivery in return for the paper getting consistent revenue is a good deal for both parties.

If you get your news like most people, via link aggregators like HN, Reddit, Facebook, or Twitter, then you get linked to dozens of publications that all want a $5/mo. commitment, which is untenable.


It would be neat if news sites would start offering 50 cent day passes instead of difficult to cancel subscriptions.


I sort of half thought Apple News might go that way. It might not be cost effective - bundling larger subscription stuff is probably more revenue/profit.

But... in their news app, there's always a couple of interesting articles I might want to read, but I'm not signing up. They have my info, and a Touch ID device I'm holding tied to my payment info. "Read this article for 50c?" I'd certainly give some a read now and then.


My daily newspaper costs 2€ at the newsstand, not cents. With a monthly subscription of 5€. Pretty much worth it.

I have been subscribing to valuable information sources since 1995, so I do get the point.


Except it’s not $5, it’s $5 to sign up, then an email and a phone call and your firstborn dog to unsubscribe. If it were microtransactions, that’d be one thing...


If the service doesn't play ball there is always the consumer protection agency.


That doesn’t work. I find it better to stay anonymous and avoid spam from these services.

The perpetual spam is worse for me than the $5.


$5/month isn't $5/month. It's convincing your boss you need $5/month, because they need to convince their boss, which eventually makes its way up the chain to C-levels, who don't know Docker from yesterday's rotting tuna casserole and view eating either that or the $5/month with the same level of disdain.

It isn't about the money, it's about the Mommy-May-I up and down the chain with emails and meetings and careful explanations to skeptical glares. It's a psychological and institutional barrier.


If your C suite is personally approving a $5/month charge, your organization is likely nowhere near the size where a change like this from Docker impacts you.


> If your C suite is personally approving a $5/month charge, your organization is likely nowhere near the size where a change like this from Docker impacts you.

Or has a toxic, micromanaged structure. I've had friends who have worked at places that would barf over ongoing $60/year software charges, where anything like that would have to go up to C levels and require justification. Luckily I never worked at one myself, dodged that particular bullet.


My purchase-approval flow is the same for $1 purchases as it is for $1k purchases. If we hadn't finagled a minor workaround, it would be the same as that required for $5k purchases.

Handling each purchase and documenting it in case we are ever audited requires easily $25-50 of people's time.


If your C suite requires approving a $5/month charge, the company has serious issues. This always bugs me about the HN attitude toward spending money, but it's really an extremely frugal developer complex. It's $5 a month, you get insane value out of Docker Hub, just pay it. I come from an entrepreneur attitude: my time is my most precious commodity. I don't optimize minor expenses, I optimize the big picture, my time, outcomes, and revenue coming in.


Yeah, when it's $5/mo blocking good utilisation of multiple $150k people, someone doesn't understand priorities.


Or maybe your organization is a public institution, where each penny spent needs to be authorized/accounted for.


> It was revealed on January 22, 2009 that Thain spent $1.22 million of corporate funds in early 2008 to renovate two conference rooms, a reception area, and his office, spending $131,000 for area rugs, $68,000 for an antique credenza, $87,000 for guest chairs, $35,115 for a gold-plated commode on legs, and $1,100 for a wastebasket. Thain subsequently apologized for his lapse in judgment, and reimbursed the company in full for the costs.

> https://en.wikipedia.org/wiki/John_Thain

"Sorry I got caught. I will work harder to hide next time."


Even in a public institution that has never failed an audit, I have trouble believing $5/mo is a hard thing to get. Technically, it's a $60/year fee because it's one of those annual plans. We pay for Vultr with no drama.


$5 a month also means consideration is given as part of a contract, and legal will need to review to make sure you aren't granting them patent immunity, etc.


Worked at a place where everything went through the VP of finance and they were making $15M/yr profit. At one point we brought in a desktop off of the curb to use as our build server. Sure, we got in trouble with ops later, but those above us didn't care since it didn't cost us anything.


It took 3 months at my old client (fortune 100) to get a signature for an Addendum that would deliver us a service for free in addition to the paid stuff that was already signed.


For me it’s not the c-suite, it’s the admins who make purchases. Or the hassle of maintaining a corporate card.

The headaches to spend $5 cost way more than $5.


>It's convincing your boss you need $5/month, because they need to convince their boss, which eventually makes its way up the chain to C-levels, who don't know Docker from yesterday's rotting tuna casserole and view eating either that or the $5/month with the same level of disdain.

this is where the miracle of enterprise sales happens - the $5 subscription can be sold as a $50K+ deal by smooth enterprise sales who will provide the C-exec with an experience that makes him feel like he did something smart and great for the company.


Just promise to give the exec a keynote time slot where he can describe the ROI of the 50k deal. Helpfully, the vendor will provide an Excel doc that lets you calculate the benefits. You don’t even have to realize them, just aspire to. From there, the C-level gets some fawning press coverage for their LinkedIn profile and a set of job offers at the next-size-up organization where they can also level up on comp...


This "up-and-down chain" should stop pretty quickly at the level where eng mgmt knows loaded costs. It's an eng manager and director's job to compute total-cost-to-execute, and opportunity cost (using loaded-costs for engineer time). They should be able to approve once you've convinced them that it's cheaper to buy than to build.

Engineers can head this off by prepping a total-cost-to-execute analysis for mgmt. This doesn't need to be complicated. It's just some estimates of what's needed and why, and what the alternatives cost. My eng VP used to ask me for these when I'd send up a request. He wanted to know that we thought about total-cost-to-execute. He'd usually only read the exec summary and approve. If these requests are really going that far up the mgmt chain either someone isn't doing their job, or higher level mgmt are micromanagers.


If you work for a company that disconnected from reality, you'd best be looking for a new job. For $5, the guy who decides the budget should be angry that you didn't come to him directly for such a tiny amount of money.


There is no way you need C-level approval to get a $5 a month subscription.


Start looking. That place is going to fail.


>Now my question has changed to: is $5 a month really a deal breaker for folks to get unlimited?

I want to speed forward 5 years and see how well this ages. It reminds me of all the other comments about "of course Facebook will never require an account to use your VR headset"...


I think it will age fine. If every ad-supported website switched to this model, I'd be paying for either zero or one websites.

If it was $5 for access that would be a completely different situation, but a big free tier followed by $5 for unlimited is fine.


I don’t think anyone who’s adopted systemd can call themselves a curmudgeon :)


Sometimes, curmudgeonliness is measured in resistance to changing the default:

"Stock CentOS was good enough before and it's good enough now!"


It is pretty cheap, it is just the free beer crowd that is upset.


Yeah, I'm just going to pay for it. I think I probably already am, but if I'm not, I will. Even the enterprise $7/user isn't that bad at all. GitHub Enterprise is like 3x that.


The limit for unauthenticated users is by IP address. I could imagine a smallish business that has a consistent on-ramp IP for their users that could breach the limit not on any single user but in aggregate.

I do sympathize with Docker though: Storage and bandwidth at that scale isn’t cheap and they need to monetize somehow.


You should be more shocked that everything else is so expensive.

The base costs for a lot of tech stuff (like bandwidth) are so cheap. But you would never know between these bullshit "what is it worth to you" pricing models and the number of middlemen trying to stick their hand into the pot. It's disgusting.


Developers are cheap. “I could build that” is your competitor

Note could, not will


Developers are overly optimistic, undervalue their time (they don't know the loaded-cost of their time), and overvalue their solutions (they rarely account for maintenance costs or the long-tail of time to perfect the solution). So developers think they are cheap. For example, I had an engineer express disgust that we might have to pay $4/month-per-user for gitlab just to have mirroring enabled. He said "we can do our own mirroring." When asked how long it would take he said, "between 2 hours to two-weeks if we have to iron out bugs". So when accounting for his time, his price to deliver ranged between $200-ish in the best-case scenario, to $8000-ish in the worst-case scenario (which far exceeds the cost of paying for the licenses). Needless to say, we paid for the licenses. He also wasn't considering the opportunity-cost, i.e., what will he NOT do that's more valuable while he works on this proposed solution that we could just pay for?


Good!

I'd even welcome much more aggressive limits than what they're proposing; the current culture regarding builds and CI in general is horrifyingly inefficient, wasteful, and in the end just plain slow.

I'm looking forward to developers adjusting their workflows (and caches, etc.) to actual, reasonable limits, not just using the service as if it were an unlimited, cost-free cornucopia of software.


I welcome the end of free beer culture, so I am quite alright with their change.


Everything is inefficient. There is a huge class of "developers" that don't understand what O(n) means, and a subset of them are vocally proud of it.


> the current culture regarding builds and CI in general is horrifyingly ineffficient

I used to be in charge of the website for a company you’ve heard of. We once realized some huge proportion of our traffic originated from a hosted CI company requesting the site thousands and thousands of times (guessing one for each build they hosted) every 5 minutes.

I can’t remember what proportion of traffic it was but I’m pretty sure it was a majority, maybe even more than 80%.


I sure hope the "cloud first" advocates are happy now, because they have managed to create masses of developers who have next to no idea that what they're doing is an incredible waste of resources. These are also the same people who are perplexed why their systems intermittently fail, or are surprised that they do when the Internet connection cuts out for a bit.


There is very little reason for a build node to need to pull 200 images in 6 hours, and here is why:

When a machine issues a ``docker build`` command, the program reads the relevant Dockerfile to check for any base images that need to be pulled (via the "FROM" instruction).

These base images are identified based on the image repository, image name, and image tag. The first thing docker does is check its local image cache and try to find a match for the base image the docker build is requesting. If a matching image is located in the local cache, it uses that one in lieu of downloading the image.

This is significant - if your organization only uses a few dozen base images from DockerHub, those images will only be downloaded by each build node _once_, then never again.

Many docker users erroneously believe that if their Dockerfile requests a "latest" tagged image, docker build will always download the newest version of the image. However, the "latest" tag is literally just a tag, it doesn't have any special functionality built in. If the docker build command finds an image tagged "latest" in the local cache, it stops there.

The only way to get docker build to always use the "actual latest" version of the base image is to add the "--pull" parameter to the docker build command. This arg will tell docker build to check the repository remote to see if the SHA hash of the image tagged "latest" has changed, and if so, re-download and use it. In the absolute worst case, this means each build node will pull 1 copy of each base image when the base image is updated. So unless you use 200 different base images that all have updates deployed to Dockerhub each and every day, you are fine.
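Concretely, the difference is a single flag (the image name is just a placeholder):

    # reuses a locally cached base image, even if the tag moved on Docker Hub
    docker build -t myapp .

    # checks the remote manifest and re-downloads the base only if its digest changed
    docker build --pull -t myapp .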


I don't disagree with what you are saying, _but_:

> Docker defines pull rate limits as the number of manifest requests to Docker Hub.

> For example, if you already have the image, the Docker Engine client will issue a manifest request, realize it has all of the referenced layers based on the returned manifest, and stop. ... <excluded> ... So an image pull is actually one or two manifest requests,

This still implies that even if you are appropriately re-using layers on your machine, with a free plan you can do a maximum of 200 builds (since docker still needs to verify it has the image) per 6 hours?

This change also seems to imply that build steps which previously did not handle/require authentication against Docker Hub (it was only pulling public images, and pushing elsewhere) will now be required to auth against Docker Hub in order to double the number of pulls/checks/builds allowed?


This is an excellent point. Trying to find out if docker build --pull without an accompanying blob download will trigger the rate limiter.

If it does, then this will definitely be a reason to riot. It will effectively mean that anyone who wants to do more than 200 builds every 6 hours using the "right" way will have to get a docker pro subscription.


It sounds like it definitely does trigger the rate limiter.

> There is a small tradeoff – if you pull an image you already have, this is still counted even if you don’t download the layers.

I expect we're just going to see a lot more recycling of build nodes once it has "used up its docker credits".


All reasonable orgs should have had their private docker repo a long time ago.

Everybody else is living the pipe dream where they have externalised their risk and probably deserve the Docker treatment.


Yes, but Docker achieved their goal of making it annoying as hell to not use DockerHub. You can run your own private repo just fine, but what you want is a transparent proxy (like apt-cacher) that will let you pretend you're using DH while actually pulling from either the cache or your private repos. All the pieces are there with private repos and "pullthrough" proxies; they're just not well integrated, seemingly on purpose.

RedHat’s patches to Docker make this possible but Docker has refused to upstream it.


> annoying as hell to not use DockerHub

Why? All you need to do is use your domain name when referencing the image.


Agreed. Third-party package repositories have been a weak point in our CI, and we put all of them behind a self-hosted proxy that we can manage in our own HA fashion. Turns out we get faster pulls from it, as well as being a good internet citizen.


If you do 200 builds every 6 hours, you could probably afford to pay $5 a month.


I admittedly have only used Docker very little, but how exactly does someone manage to build images once every 108 seconds continuously for 6 hours? That sounds extreme.


Easily with CI. Every pull request on the GitHub project will build a dozen Docker images whenever a PR is opened, updated or merged.

Granted, there's only a couple base images involved, so CI pipelines will need updating to be more efficient in terms of `docker build --pull` usage.


> The first thing docker does is it checks its local registry and tries to find a match for the base image the docker build is requesting. If a matching image is located in the local registry, it uses that one in lieu of downloading the image.

While I agree that this is the way it's supposed to work, I have unfortunately worked at companies with "stateless" build/CI servers that download the Docker image each build.


Well, this policy change will force them to be more efficient, and it's a net win for everyone


Or just pony up the $5/mo for Pro... not as fun as re-engineering your CI pipeline, once again.


You have to re-engineer it anyway to authenticate your Pro account.


> While I agree that this is the way it's supposed to work, I have unfortunately worked at companies with "stateless" build/CI servers that download the Docker image each build.

Couldn't they remain stateless but be redirected through a caching proxy? Memoization is not contrary to statelessness.


Sure, now they have to build a proxy...


Super easy to run a docker cache proxy:

    docker run -d -p 6000:5000 \
    -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \
    --restart always \
    --name registry registry:2
That's it. Now fetch docker images from the IP that command is running on. Taken from gitlab: https://docs.gitlab.com/runner/install/registry_and_cache_se...


Is that going to actually help with the manifest-based rate limits? It sounds like it only caches the layers, the manifest metadata for a tag is not cached.

https://docs.docker.com/registry/recipes/mirror/#what-if-the...

> When a pull is attempted with a tag, the Registry checks the remote to ensure if it has the latest version of the requested content. Otherwise, it fetches and caches the latest content.


Hm you're right. I wonder if there's a way to cache a tag's metadata for a while...


I think this addition to the daemon config (daemon.json) should do the trick to make it hit the proxy?

https://docs.docker.com/registry/recipes/mirror/#configure-t...
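i.e. something like this in /etc/docker/daemon.json, using the port from the command above (assuming the mirror runs on the same host; otherwise use its address):

    {
      "registry-mirrors": ["http://localhost:6000"]
    }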


Artifactory is less bad than most of the tools I have to use all day.


Artifactory is the very definition of expensive (even at an enterprise scale) when it comes to docker images though.


Can you tell me more? How expensive are we talking?

Working for the same sized companies for a while has apparently dulled my senses. At a certain size, the capital that matters is the political capital it takes to get a vendor agreement in place to begin with. The monthly costs of the system are something you only feel through pushback on how big the repo gets, or the rate of traffic (experiencing the latter now with a browser testing SaaS)


> This is significant - if your organization only uses a few dozen base images from DockerHub, those images will only be downloaded by each build node _once_, then never again.

You're assuming that the set of build nodes is relatively static.

Plenty of architectures set up autoscaling for the underlying nodes, that terminate servers that aren't being used and relatively soon enough (tens of minutes, hours) spin up new servers to replace them as needed.

Rarely do the machine images used to spin up new servers include the base images of the containers that will be spun up to replace them. Much more often, the base machine image is a base OS image, and container images are downloaded on-the-fly as needed. Essentially, the engineering cost of making image-launching more efficient was externalized onto an external provider willing to pay the price.


If you're doing this, you're in a cloud environment that's also proximate to a blob store, and can trivially host your own registry.


> and can trivially host your own registry

That is far from trivial.


And now you have an alternative option - pay $5/month


If you’re using docker for production distribution of images, you should be paying for it. That’s exactly the behavior that creates the need for a limit.


Or don’t use docker in production ;)


You can just build your own machine images and not use docker at all.


CI/CD systems on AWS, Azure, GCP and others might be running on Kubernetes containers (using kaniko, podman, etc) or using Docker-in-Docker, and there isn't a widely supported or in-cloud-platform tool for sharing cached layers.

And as pointed out below, even if you are intelligently caching layers, manifest requests count as a pull. As far as I know, no caching proxies exist for Docker that support limiting manifest pulls.


Surprised to see Docker-in-Docker mentioned so deeply down here. It’s an extremely valid way of doing things, and non-trivial to implement a caching layer for.


Isn't Docker-in-Docker actually using the host's Docker daemon? I am mounting the docker socket in all my Docker-in-Docker containers, thus all the build tasks running on the same host can share the caches.

I guess one could have docker containers that actually run docker, but I don't see a reason to do that...


No Docker-in-Docker would generally refer to running a new dockerd inside of a container.


I was wondering how Docker-in-Docker works, but I couldn't find it dockermented anywhere. If it's using the host's Docker daemon, why do you need to mount the docker socket?


> If it's using the host's Docker daemon, why do you need to mount the docker socket?

There are 2 components for docker: the daemon and the tool used to send commands to the daemon. In order for said tool to be able to send commands to the daemon, it needs a way to communicate with the daemon. Mounting the socket in the container is the easiest method.

I have a "tooling" image that consists of a set of scripts (python code) to do various things ops related. One of the things is to build new images when required. I have a script that given a git commit will detect the images that need to be build and build them. Having my tooling code in a container makes it easier to deploy and use new versions of the tooling code. I don't need anything on the host apart docker itself. No build scripts, no python.

As I said, i could be running the docker daemon inside the container, but that breaks one of my rules related to containers: containers are not virtual machines, they should only run 1 process and the output of that process should be std out.
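So the tooling container is just started with the host's socket mounted, something like (the image name is illustrative):

    # the docker CLI inside the container talks to the host daemon via the socket
    docker run --rm -it \
        -v /var/run/docker.sock:/var/run/docker.sock \
        my-tooling-image:latest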


Very interesting, thanks for sharing! I found a good article about it: https://jpetazzo.github.io/2015/09/03/do-not-use-docker-in-d...

At the end he describes mounting the socket. The tooling image which has all the dependencies needed to build will also have the docker cli installed, which is what I'm assuming you are doing.

I might just use this. Cheers!


Docker-in-Docker (DinD) doesn't piggy back on the host's Docker daemon, but instead runs a stripped-down Docker daemon inside of the container. The major downside is that I/O is quite slow, since you're going through two virtualization layers (the DinD one, plus the host Docker daemon).


This is not true.

There is, effectively, no "virtualization" layer here. There are some things that if needed can cause overhead... such as the bridge networking (really shouldn't be a bottleneck for majority of people), and the CoW filesystem... which docker won't be (or shouldn't be) running on top of since, for example, overlayfs on top of overlayfs is not supported.

There is also nothing stripped down about the daemon inside of the container.


Sure, I was speaking off the cuff based on my experience from a few years ago. Maybe I messed up and somehow had the DinD daemon not use a volume mount, and that's what caused it to build images slowly?


Very well could be since it would have to fallback to the naive graphdriver that just copies stuff around.


Will mounting the socket, as the person I replied to suggested, make it use the host's docker daemon?


Yes, that's the point.


Usually docker outside of docker is used, no? If the image is cached on the host, it would be available to any container having access to the docker daemon socket as well since it's the same daemon.


No that’s only the case if you mount the Docker socket into the container, which is not what Docker-in-Docker is.


> This is significant - if your organization only uses a few dozen base images from DockerHub, those images will only be downloaded by each build node _once_, then never again.

Only if your build nodes have unlimited storage. If the build nodes are spun up on demand or have housecleaning tasks to prevent Tragedy of the Commons disk exhaustion, this is not true.

On the other hand, this is what caching proxies/registries are for.


I'm not sure if this is 100% true any more. I've found that when enabling the DOCKER_BUILDKIT=1 env var that docker will sometimes eagerly re-fetch stale images. I think your argument is generally still true, but was happy to see that some progress is being made on dealing with stale `latest`.


This is significant - if your organization only uses a few dozen base images from DockerHub, those images will only be downloaded by each build node _once_, then never again.

Unless you’re using something like AWS CodeBuild that spins up a Linux/Windows container for your build environment, executes bash commands in a yaml file, and then terminates it when it is done. Nothing is stored locally after the build is finished.

I’m sure there are other similar services. Wouldn’t Azure Devops using hosted builds do basically the same thing? I haven’t used it since they changed the name from Visual Studio Team Services.


What is a solution for the scenarios you have described? Amazon has ECR but it doesn’t support signing and doesn’t function as a proxy so you would miss upstream changes unless someone pushed them. Anything self hosted that supplies that functionality?


Do you really need all your private images to be derived directly from the upstream? Don't you start every image with:

    FROM foo
    RUN apt-get update && apt-get -y upgrade
?

Then why not have a set of base images, derived directly from upstream, that get built every so often, and have your private images be derived from those? This will not only relieve the stress on DockerHub and prevent you from having to pay the $5/month, but also give your security people a hook to run their tests, and make your private images build faster, since all the system updates won't happen every time you change the code.
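A hypothetical sketch of that split (the registry host and tags are placeholders):

    # internal base image, rebuilt on a schedule and pushed to your own registry
    FROM ubuntu:20.04
    RUN apt-get update && apt-get -y upgrade && rm -rf /var/lib/apt/lists/*

    # application images then start from the internal copy instead of Docker Hub:
    # FROM registry.internal/base/ubuntu:20.04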


Pay $5.00 a month. Docker is a business that deserves to get paid if they offer something valuable.


Well that takes all the fun out of it. Looks like Docker itself has a Dockerhub proxy -

https://docs.docker.com/registry/recipes/mirror/

https://hackernoon.com/mirror-cache-dockerhub-locally-for-sp...

https://stackoverflow.com/questions/32531048/docker-pull-thr...

https://docs.docker.com/registry/configuration/

https://www.google.com/amp/s/ops.tips/amp/gists/aws-s3-priva...

If using Alpine, it looks like docker-registry is the needed package and /usr/bin/docker-registry serve /etc/docker-registry/config.yml is the command line. The next-to-last link has information on the config file.


Per your first link:

> What if the content changes on the Hub?

> When a pull is attempted with a tag, the Registry checks the remote to ensure if it has the latest version of the requested content. Otherwise, it fetches and caches the latest content.

If that causes a manifest pull, it counts as a pull and will be rate limited. Yikes! This could lead to wildly nondeterministic behavior.


Yes, it does a pull but caches the response, so subsequent pulls should hit the local cache and not be limited.


I don't think that's what it means.

> When a pull is attempted with a tag, the Registry checks the remote

Checking the remote is a manifest pull.


The difference is a HEAD request or a conditional GET: the server will not send the file if it matches the timestamp and/or ETag of the version you have, so the reply is a few bytes rather than (potentially) dozens or hundreds of megabytes. Same with all CDNs.
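For example, a conditional request returns a 304 and a few headers when nothing changed (the URL and ETag are placeholders):

    curl -s -o /dev/null -w '%{http_code}\n' \
        -H 'If-None-Match: "abc123"' \
        https://registry.example.com/v2/library/alpine/manifests/latest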


This still counts as a manifest pull for rate limiting purposes based on what i'm seeing in this thread.


If everyone in your company (plus your CI system) is behind the same firewall/IP address, that's going to be a lot more than 200 pulls.


Luckily though, companies have cash to pay for registrations, and those that won't probably have engineers who can set up Squid proxies.


Lots of tools built on top of Docker do imbue special meaning in latest, however.


I kinda wonder if Docker as a company is struggling. Red Hat made Podman, which is a compatible replacement. Then there's Swarm, but apparently that's no longer recommended or actively developed, and as far as I know they sold off their enterprise clustering product. Kubernetes seems to be the popular thing now, even if a bit complex to set up. I wonder what the current business model is. Pretty neat idea of using containers, but it seems they put it out in the wild, it got popular, and they sort of lost control with so many competing options being released.


> Redhat made Podman which is a compatible replacement

I would actually prefer if they made an incompatible replacement. Docker's CLI is pretty bad in my opinion.

I want to use Docker the same way I use a headless virtual machine running an SSH server. I want starting/exiting containers to be independent from their 'main process'. I want to attach/detach whenever I need to and execute arbitrary processes.

-- Just use /bin/bash as the main process

This seems to be the workaround, but I always have problems with containers exiting when I don't want them to, and it's just harder than it needs to be. I've spent a total of like 6 hours learning Docker and I still don't know exactly how to achieve this simple workflow without my containers quitting on me or attach/detach issues. With VirtualBox I can do this easily. Am I too stupid to use Docker?

-- Then just use VirtualBox

That's what I do, but I would like not to have the overhead of a vm.


> I want to use Docker the same way I use a headless virtual machine running an SSH server. I want starting/exiting containers to be independent from their 'main process'.

If you’re running systemd anyway, check out systemd-nspawn. Your ssh command becomes `machinectl shell user@container`. It’s a more VM-like way of managing containers, without Docker’s image distribution features or philosophy that containers should be ephemeral.
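A minimal sketch of that workflow (the tarball and machine name are just examples):

    # import a root filesystem as a machine, boot it, and get a shell inside
    machinectl import-tar rootfs.tar.xz mycontainer
    machinectl start mycontainer
    machinectl shell root@mycontainer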


They made podman as a fully compatible replacement so people could easily drop-in replace their use of docker with podman, which worked.

To handle spurious interrupts from /bin/bash you can put a small script as the entrypoint containing a while true loop with a sleep infinity in it.
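Something like:

    # keep PID 1 alive regardless of attached shells; get shells via docker exec
    # (swap "sleep infinity" for "sleep 3600" if the image's sleep doesn't support it)
    ENTRYPOINT ["/bin/sh", "-c", "while true; do sleep infinity; done"]

and then "docker exec -it <container> bash" whenever you need a shell.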


> I want to attach/detach whenever I need to and execute arbitrary processes.

Isn't "docker exec -ti container-id /arbitrary/command" enough for that?


And if you want a shell? Just use “bash” as the command.


Sounds like you might want LXD which starts and leaves running a full "machine" container. You can even SSH to it if you want or just use "lxc shell bionic" to get into it.

https://linuxcontainers.org/lxd/getting-started-cli/


There is a key sequence for detaching from the container... default is ctrl-p+q.

But if you want to not deal with attach/detach, perhaps `docker exec` is what you want. It doesn't affect the main process (unless of course your command you run kills the main process).


Have you looked at Toolbox? It's by Red Hat and works with podman under the hood.


Yes Docker seems to be struggling as a company. But I doubt Podman has anything to do with it. The adoption of Docker open-source tools is massive and 99% of its users have never heard of podman or any other clones, and likely never will. The problem is simply that those tools are free, and Docker has failed to convert the success of their free tools into a successful business.


It’s partly open source and partly freeware. They could just make docker for windows/mac a paid software. I would pay for it if they would listen more to the community when a bug is found. They seem to ignore many bugs Docker Desktop on GitHub. I like about the podman Tools that there is a community effort from Red Hat. It’s not 100% compatible with docker and probably never will. So I will just hope that Microsoft or Canonical buys Docker and make it more open to the Community.


They won the container war but lost the orchestration war. Even if docker compose was successful though, I fail to see how the clouds wouldn’t just replicate everything. So I guess they just failed to monetize the technology.


> So I guess they just failed to monetize the technology.

Yes, it's really that simple. All those "container wars" and "orchestration wars" are a distraction from the core issue, which is that all those container and orchestration tools are open-source, and it's very hard to build a viable business on top of them. Docker tried and failed, like most startups involved.


Anyone who wants to make money on developer tools with the free-beer generation can only focus on enterprise customers, while adopting the traditional sales models.

Even here, when commercial projects are shown, there is always an endless thread of free-beer open source alternatives.


> Podman which is a compatible replacement,

Kinda... It doesn't support caching layers for example which makes it very different in practice.


It’s still amazing that there is an alternative. It must be tedious to copy such a bad CLI design over to podman. LXD CLI is far superior.


After Docker Swarm failed it was clear that they could not survive just on the core Docker tech and CLI, which are all becoming less valuable day by day due to the various open container initiatives. In the absence of a killer product they are still a ripe acquisition target, but not a successful business.


Isn't everybody struggling right now?

Except for Zoom, of course.


Yes, good. After reading the comments on their image retention limits [0] talking about simply pulling the images all the time to keep them fresh, this seems like a reasonable response.

I'll repeat what I wrote there [1]:

If people really think this is a problem, they'd contribute a non-abusive solution. Writing cron jobs to pull periodically in order to artificially reset the timer is abusive.

Non-abusive solutions include:

- extending docker to introduce reproducible image builds

- extending docker push and pull to allow discovery from different sources that use different protocols like IPFS, TahoeLAFS, or filesharing hosts

I'm sure you can come up with more solutions that don't abuse the goodwill of people.

-----------------------------

Additionally, hosting a local network docker repo would mitigate this rate limit completely. Or straight up pay. It's not that difficult. Getting mad about a free, open-source service becoming pay to use... I couldn't imagine the gall and conceitedness.

0: https://news.ycombinator.com/item?id=24143588

1: https://news.ycombinator.com/item?id=24144475


> introduce reproducible image builds

This is a great idea in concept, but in practice very challenging.

RUN curl "https://www.random.org/integers/?num=1&min=1&max=99999"

Docker will cache this after the first invocation. The build is not reproducible. Now what?

Replace "curl random.org" with "nondeterministic and really expensive code build/model training/etc operation".

> extending docker push and pull to allow discovery from different sources that use different protocols like IPFS, TahoeLAFS, or filesharing hosts

This is great, if you can solve the image integrity/trust issues therein, which should be just some signing/merkle tree work.


Ugh, I don't envy their position. There are many ways to reduce the size of a docker image. I'm guilty too. Probably the best thing to do is leverage multi-stage builds. Those have the largest effect on repo size. (Like a 10x reduction often).
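For reference, the multi-stage pattern in its simplest form (a hypothetical Go service; the tags are just examples):

    # build stage: full toolchain, hundreds of MB
    FROM golang:1.15 AS build
    WORKDIR /src
    COPY . .
    RUN CGO_ENABLED=0 go build -o /app .

    # runtime stage: ships only the static binary
    FROM alpine:3.12
    COPY --from=build /app /app
    ENTRYPOINT ["/app"]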

The problem is, Docker, the company behind the repo, has no control over what Open Source Joe and Developer Suzy are committing, or over the other developers pulling down their images. They can send out all these notices and announcements, and I think the typical reach of such things is probably, what, 0.05 percent of the developers it needs to reach?

And of those, are any willing to rewrite the image to be smaller?


Well, they could use different rate limits depending on the size of the image. Say, if the image size (or the size of the added layers) is in line with what we want for Docker images, you could offer different rates: unlimited for images <10MiB, high limits for <100MiB, and low limits for everything else. That way they both push for small images and keep everyone happy.

Or people will just add a proxy/imagestream in between instead of directly pulling from docker hub.


Or you just do it the easy way and limit the speed based on the amount of data downloaded so far, with the history being 'lost' after 6 hours. This way you also prevent someone from doing dumb things to get around it, like using multiple connections (or in this case, multiple layers for a project that doesn't actually need them)


That is possible, but it means distributing the running total of downloaded data to all the nodes. Luckily it doesn't require correctness, but it is more complicated to set up regardless.


This is the future of containers:

https://guix.gnu.org/

https://nixos.org/

You can build a Docker-compatible image from a Guix or Nix package. You never have to use Docker or Docker Hub.

The limitation of Docker is that the nice semi-reproducible sandbox you get exists on top of an operating system that was not designed for it, resulting in giant blobs to get it to work. It's inefficient and a band-aid stopgap until we get to the future where the operating system is a pure function (which can be versioned in a tree, diffed, reverted, etc. just like git). If you used NixOS, you wouldn't need Docker. Sure, it's available and you can use it, but you wouldn't need to.
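For example, with nixpkgs' dockerTools you can build a Docker-compatible image from a Nix expression and just load it into the daemon (docker.nix here is a hypothetical file wrapping dockerTools.buildImage):

    nix-build docker.nix      # ./result is a Docker-loadable image tarball
    docker load < result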


It works fine though, and a lot of companies have valuable time and process built on top of it. However, I'm all for natural selection; if these are better products, may the best container win.


Honest question, but why can't docker use something like bittorrent to download images?

Most of us download our OS via torrents only, so we may as well download the images too if there was support for it.


> Most of us download our OS via torrents only

I mean, I only do it to stick it to the people who claim torrents can only be used for piracy; I think most people prefer the simplicity of direct downloads though…


Docker images already need special handling since you download the layers separately and reassemble them. Going from that to full BitTorrent should be transparent to the users.

In fact, there already exist several implementations of it for Docker![0, 1, 2]

[0]: https://coreos.com/blog/torrent-pulls

[1]: https://d7y.io/en-us/

[2]: https://github.com/uber/kraken


I can usually max out my connection speed with torrents. This is rarely the case with direct downloads.


The start-up time of a torrent is typically so long, though, that by the time it has connected to all the peers and is downloading, I have already downloaded the ISO with a direct download anyway.


Soon, when QUIC is widely supported by the swarm (just needs updating to the current version), IPFS should be better than torrents for start-up delay, and much more importantly, it allows for sharing data across "torrents" when using content-defined chunking via rabin or buzhash. This means that things like common larger binaries get shared between images that include them, which should greatly increase the average amount of seeders for the chunks that make up an image.


There's a feature of linx-server that provides uploaded files with a torrent URL as well as a regular download URL. I believe the torrent client tries fetching data from peers as well as from linx (http).

https://github.com/andreimarcu/linx-server


On symmetric gigabit fiber only a few services, mainly steam and battle.net have ever given me 90mbyte/s download speeds. A surprising amount will limit to 500mbit or less, even if you have the download pipe for it.


Guess I just have slow internet :(


At least for Ubuntu and popular distros like Mint you get super high speeds on torrents, maybe because you have a peer in the same region.

Afaik Windows also uses this to install updates, where it shares the download with others in the region (1) using p2p (though I may be wrong since I don't use Windows anymore).

(1) https://www.itproportal.com/amp/news/how-to-stop-windows-10-...


> Honest question, but why can't docker use something like bittorrent to download images?

Docker will be limiting manifest operations, not actual blob transmissions.


How would bittorrent work in companies? Only HTTP traffic is allowed and often only when going through the company proxy.


You could have a mixed mode pretty easily. Heck, it would even be pretty efficient.

For your own company, you'd host a swarm that is firewalled in. Then when someone says "I want image xyz", the first thing you do is look for seeders in the swarm for that file. If none exists, then you initiate an HTTP download from Docker to get the image.

Now you've got fast distribution with low external network traffic.

Not sure how this would play with Cloud provider pricing, though. I don't believe AWS would be too happy seeing their services turned into BT swarms :)


> How would bittorrent work in companies? Only HTTP traffic is allowed and often only when going through the company proxy.

1. Only some companies work like that.

2. I'd expect it to work like a webtorrent; try to download by p2p, but if that fails then fall back to HTTP.


- It's an alternative download solution

- There is no rate limiting for paid accounts or companies from what I see in the article.


The company proxy would have to be modified to allow torrent traffic, I guess?


The hardest part of that would be verifying image authenticity.

Google Cloud uses an adjacent feature called binary authorization. When turned on, only images that are signed by a given authority (usually your ci/cd instruments) can be run inside your Kubernetes cluster.

Binary authorization may be a good starting point for someone trying to make bittorrent distributed images a usable thing.


> The hardest part of that would be verifying image authenticity.

That's exactly what Bittorrent does with its hash tree. You'd get the root hash (extremely tiny) from Docker Hub, and the rest of the metadata, as well as the data blocks, from the swarm. The authenticity is all handled by the TLS that serves you the root infohash from Docker Hub. It's a Merkle tree: the root hash is for the metadata, which is a list of hashes of the blocks.
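For what it's worth, Docker's content addressing already works the same way: pin an image by its manifest digest and the daemon verifies every layer it downloads against the hashes listed in that manifest. A rough sketch (the image/tag is just an example, and the digest placeholder is whatever the first command prints):

# Look up the digest of an image you already have and trust; prints something like ubuntu@sha256:<digest>
docker image inspect --format '{{index .RepoDigests 0}}' ubuntu:20.04

# Re-pull by that digest; every layer is checked against the manifest's hashes
docker pull ubuntu@sha256:<digest-from-above>

The only piece a swarm would need from Docker Hub is that tiny digest; everything below it is self-verifying.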


If this reads like greek, let me dumb it down.

You get the hash of the final result from the trusted server, and the hash is checked. Because of this you will never get an invalid image.

There are also some clever tricks to make sure no one can force you to start over from scratch by sending wrong data. But that's more of a detail.


Why? The website could still host the hash for the image, which is only a few KB versus hundreds of megabytes or even gigabytes. Just have the bit-docker app check the hash before executing.


What is the issue here? Is the torrent checksum (provided by docker) not enough?


Let's say someone hacks a maintainer for the Ubuntu base image. The hacker publishes a new version of the base image with a backdoor.

When the backdoor is detected, you now need a revocation system so the distribution of the malicious image will die. You can theoretically do this on the tracker level, but people may build other trackers that may not propagate the changes.


You still would have a centralized manifest system though, right? It shouldn't hurt Docker much at all to host a few KBs of data describing the hash of each layer, which is still fetched every time, while the big downloads are done over torrents.



I see a lot of people mentioning the low cost, saying that it's no big deal. It's not the cost that I find annoying; it's dealing with credentials and secrets.


And docker hub account management and security management options have always been _horrible_. For example, I can't create a login token exclusive to one organization or repo on docker hub, meaning anywhere I use those credentials is a potential security risk to ALL my orgs and projects.


Yeah for ephemeral work or CI builds especially, that's the most annoying part (especially for orgs where you might not want to put your personal credentials in repo settings... but you also don't want or have the funding structure to pay the $5/month for unique credentials for that org).


Okay, now can you finally accept patches about the default registry?

https://github.com/moby/moby/issues/1988

https://github.com/moby/moby/issues/4324

Or we're still stonewalling?


I set up a local docker cache using a docker image. The transition to HTTPS everywhere makes caching this sort of thing difficult. One has to install, and often manually configure, trusted certificates on every client to maximize the cache.

APK and APT caches often come hand-in-hand for this sort of thing as the logical next step is to add some OS packages to the pulled image. This also benefits from local caching and also means frustrating cache setup, certificate setup, etc.

To maximize local caching there's a lot of manual work to setup a house-of-cards series of proxies that only work on the network. Setting it up on a laptop then traveling means everything breaks in not-so-obvious ways when you leave the network.
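For the APT piece, the per-client setup is usually just a single proxy line - a sketch assuming an apt-cacher-ng instance at apt-cache.internal (hostname made up) on its default port:

# /etc/apt/apt.conf.d/01proxy
Acquire::http::Proxy "http://apt-cache.internal:3142";

Which is exactly the kind of thing that silently breaks once the laptop leaves the network.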


Does anyone have any good ideas on how the Docker Hub could be monetized in a way that's user friendly and makes sense?

AWS, GCP, Azure, DigitalOcean and even GitHub / GitLab all have private container registry offerings.

If your stack is on X provider, chances are you're going to use their private registry service instead of using the Docker Hub because you've gone all-in with that provider. That means private repos alone isn't enough to get folks to pay for Docker Hub.


I think there was a lot of money to be made in the service/support aspect of containerizing applications - configuring appropriate dependencies, etc. Now probably not so much - most people have painstakingly learnt that knowledge by now. Maybe even some kind of container marketplace.

Otherwise, I don't think it's practical to have docker itself as a commercial product.


- Keep the 100 pulls for Docker anon users, but make it monthly.

- Every user gets 600 pulls per month for free.

- Pre/post pay per pull or buy a rate plan. Something like $100USD === 10,000 pulls on "pay-as-you-go" and prepay could reduce the cost per pull.


Take over supply-chain checks?

E.g. make base images for programming languages, systems, etc and guarantee security.

I think many small companies would like it a lot better if they could externalize the cost of running docker images with all their dependencies.


This what Cloud Native Buildpacks do. There are already three major suppliers of these: Heroku, Paketo (by Cloud Foundry folks), and Google Cloud.

I'd be pretty thrilled if Docker encouraged buildpack use. It would be a huge win for them.


Currently the state of GitHub Packages' Docker UX is terrible, and ironically it doesn't integrate well with GitHub Actions (or at least it didn't when I tried it two months ago).


And the pricing is still horrible. $0.25/GB for storage and $0.50/GB for data transfer is pretty rough, since you end up pushing even Docker Hub image layers to GHP.


This is good, just like pruning old images. What's bad, and what Docker isn't saying, is it was a mistake to ever allow unlimited free plans. Independent of scale. Setting up an expectation of unlimited free hosting and bandwidth at any point of a business is bad. Tuning knobs of paid hosting and services at all tier levels, with a limited free tier, should have been baked in from the beginning, and would have led to a much stronger business.


I don’t think it was a mistake per se, it was part of their growth strategy. Their biggest problem is that they never really managed to capitalize on the market (and kubernetes happened), so now their plan B is to effectively reduce costs of docker hub or get some money out of the people using it.

Completely understandable imho.


Want to see if you're affected?

kubectl get pods --all-namespaces -o json | jq -r '.items[].spec | (.containers + (.initContainers // []))[].image' | sort | uniq | less

... and look for anything that's not from a private registry that you control.

Give people two months to migrate? What a nightmare.


Giving people over a certain limit two months to migrate, set up a cache, or toss Docker a few dollars. What a non-nightmare.


well a shitload of stuff from k8s mostly lies in quay.io or k8s.gcr.io


I guess CI services like Github Actions could be easily hitting these limits (100 pulls per IP per 6 hours).


I am anecdotally aware of large Concourse installations that have caused individual companies to be blocked entirely from Dockerhub (and which caused "face melting" of Github Enterprise instances).

Badly-behaved CI is a serious issue at scale.


I'm thinking the same thing. GitLab has shared CI runners that probably do a lot of image pulling. Hopefully they have plans to implement their own docker registry cache.


GitLab PM here - we have a feature called the Dependency Proxy (https://docs.gitlab.com/ee/user/packages/dependency_proxy/) that allows you to cache images from DockerHub. Currently this only works for public projects, but we are working on adding support for private projects now: https://gitlab.com/gitlab-org/gitlab/-/issues/11582.


I would imagine that GitHub would work out something with Docker to either pay themselves or have some way of caching images themselves.

They could just inject their own TLS certificates into their VMs and then intercept Docker requests for images with their own cache.


This seems incredibly likely to break development use cases at both extremes: CI/CD systems and developers just starting out could end up pulling quite a few images per hour. Imagine if NPM, Ruby Gems, and so on rate limited package downloads until you paid!

I'm not sure if there's a better way to monetize the Docker Hub, but this seems so hostile to adoption.


> Imagine if NPM, Ruby Gems, and so on rate limited package downloads until you paid!

Sounds like an entirely reasonable thing to start imagining.

Reliability, safety, determinism and predictability are not thrust upon someone from the commons.

I frankly find it somewhat atrocious and abusive that downstream systems do not adequately cache these assets. The main archive repositories should be the source of truth, but they also don't need to be the fountain.


Do you think NPM, Ruby Gems, or others would last long if they were so user hostile? Would Node have grown into the genuinely useful development environment it is if free users were limited to 10000 package downloads an hour?

It's so unbelievably user hostile that it seems like the result will be people just stop using Docker. The right solution for Docker is probably to spin off or monetize the Hub in a different way.

Imagine if GitHub started charging users for git cloning too many packfiles per hour.


I think npm and ruby gems are quite a lot smaller than the average and upper bound size of docker images, which can easily be multiple GiB. Also the use case is different, where a docker build may incur a few manifest retrievals from the registry, while an npm build may incur hundreds. I don't know how quickly the average user would run into a 10000 download/hour limit for NPM, but if they were able to arrive at an equivalently high limit like 200/hr for docker, then maybe it would be fine to start charging for rate limits above that, and not impact most people.


All these things cost money to run why would it be surprising if they cost money to utilize their resources beyond a point?


You can either monetize the thing that brings you business directly, adversarially impacting users.

Or you can monetize something else that's correlated to those costs to subsidize the main use case and keep new user acquisition frictionless. Like NPM charging large businesses with special needs with NPM Enterprise, or GitHub with teams and CI/CD features and GitHub Enterprise, and so on.

Docker sold off Docker Enterprise. Now they're trying to extract rents from people who are just trying to "docker build", a command whose ease of use is what drove people to use Docker in the first place.

Foot, meet gun.

Docker is breaking their main selling point. "docker build" and "docker run" should _just work_. By breaking that expectation in subtle ways they risk alienating users. By advertising that they're willing to break their main use case, they're going to alienate businesses and early adopter developers like myself.

Now I'm looking for alternative registries and making sure that devops code I manage doesn't depend on Docker. That's surely not what they wanted, right?


Forgive my unfamiliarity. From the article

>We’ve been getting questions from customers and the community regarding container image layers. We are not counting image layers as part of the pull rate limits. Because we are limiting on manifest requests,

> For example, roughly 30% of all downloads on Hub come from only 1% of our anonymous users.

The limits appear to be 100 pulls per 6 hour time frame per IP address for anon users and twice as much for authenticated users. The least favorable reading of this is to assume a rolling period, so let's roll with that. Pun intended.

According to Docker, which I imagine is in a better position to evaluate the situation, this will impact almost no users. Logically, properly caching downloads would also improve local performance. Do you really need, given the possibility of caching downloads, to pull a new image every 1.8 minutes in environments where paying $5 a month for individuals or $25 a month for a team would be prohibitive?

I'm going to assume that orgs manage a variety of one off and recurring expenses. I don't see how this is any different.

What I think is the most salient point is that docker is not a new endeavor. They already have many users. Acquiring new people who consume resources and pay nothing isn't a valuable proposition for them. Why would it be? Do you wish you had more roommates living with you eating your food and paying nothing towards the rent?


> Do you really need, given the possibility to cache downloads, to pull a new image every 1.8 minutes in environments where paying $5 a month for individuals or $25 a month for a team would be prohibitive?

Given that even a "docker build" does a manifest pull, it's not just "new images", but existing ones as well.

Now expand that to a team that, say, builds 10 images in parallel using Docker Compose. Now a build every ~18 minutes will hit the limit. Larger builds, like say a CI system building every time there's a push to a branch? Yikes.

I've expressed my thoughts in detail here about why I think they should find another avenue to monetize Docker: https://twitter.com/AaronFriel/status/1297988737981247488


$5 per month for when you pull so many images it's basically abuse is user hostile?

No developer starting out is going to hit 200 images in 6 hours, and even if they did, they would go "heh" and then either take a break or pony up the 5$.

5$! It's way too little money for those limits, there should be brackets all the way up to 5000$ per month. Same for Rubygems and NPM. It's ridiculous that those are struggling organizations that can barely afford to have professionals work on them, when they're absolutely essential to whole industries.

I wish they would force us to pay them.


Presumably companies that pay 5-6 figure salaries to devs can probably pay those devs to set up caching or pay docker for the privilege of not bothering. The team pro plan with no limits starts at $25 a month for 5 users.


Yeah, someone should come up with an alternative that mirrors it or something.


Built in torrent and/or ipfs?


The latter has content-dependent chunking support. That would allow for cross-image sharing of common data (large binaries, etc.).


MS should just buy them and put them out of their misery.


We have seen that a thousand times with Google products and co.

First, you have unlimited, then you have reasonably large limits. The rationale is "limits are needed to reduce abuses and should not impact normal users".

Then, it starts to be mandatory to be authenticated. Again, officially for reducing abuses.

Once everyone has an account, and are used to limitations, free limits are reduced again, little by little, and finally to the point where you need to take the "pro" offer to have a normal usage.


I'm very conflicted about this. On the one hand, I recognize that there are potentially significant costs to be borne to serve these repositories. On the other hand, making docker part of your infrastructure requires a certain degree of availability.

At some level this seems to me like using my IDE and after 6 hours it would stop working or finding that my CDNJS references to bootstrap stopped working after 6 hours of my site being up. I think it is exactly as if NPM or PIP were to stop working for the day if you included "too many" packages.

I don't really have a good feeling for how these new limits might impact setting up new dev/test environments so I'll likely switch from using specific images to using generic images to limit my exposure.

For example, at the moment, if I want a new dev container for a project in python, I'll FROM python:latest and supplement with pulls from support service containers like nginx:latest and postgres:latest.

Moving forward, it would seem a safer approach will be to pull a single 18.04 image and run the required installs into it. This super bums me out though, as it seems to bypass some of the nicer aspects of getting up and running with docker.
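Roughly what that looks like, as a sketch (the package list is illustrative):

FROM ubuntu:18.04

# Same base for the nginx and postgres containers too, so the 18.04
# layers are pulled from the Hub exactly once
RUN apt-get update \
    && apt-get install -y --no-install-recommends python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*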


Well, to play devil's advocate: if docker is part of your infrastructure and you need to make more than 200 pulls in 6 hours, you should already host your own registry and maybe mirror a few Docker Hub repos to it


It really sounds like a problem we just have to solve. Putting all the expectation on one central service is not reliable; it should be a distributed network of content-delivery nodes, of which you run a few yourself.


I like this idea. I would think that the “default” hub should not be managed only by Docker, but eg by docker and other registries. I suspect that docker inc still wants full control over their registry though so I don’t see this happening.

We might decide on another, non dockerhub open registry though, and use that instead.


Do pulls for a "latest" tag bypass the cache?


No, latest is just a tag and is not handled differently. There is a CLI flag to ignore cached layers. This however does not affect the FROM line, so if you have the image, there will be no pull.

Expect this to affect CI systems
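For reference, the two stock flags in play here (image name and build context are placeholders): --no-cache throws away cached layers but keeps using the locally stored base image, while --pull is the one that forces a fresh manifest check and therefore counts against the limit.

# rebuild from scratch, but reuse the local FROM image (no registry hit)
docker build --no-cache -t myapp .

# also ask the registry for a newer FROM image (one manifest request)
docker build --pull -t myapp .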


No they don't, but if you do something silly like spin up a bare VM build node, and then pull your environment every time, then obviously you won't get any caching.


Github Actions? Gitlab CI? Many of those don't do any caching because of the chance of poisoning a tag in the local registry, and don't have a good way to do caching per project.

I can imagine that this affects those sorts of operations.


Yeah, that's what I was getting at - in CI I can see it, but in local dev it would seem surprising to need more than the free-with-authentication quota.

That said, if your CI needs more, it's probably time to invest in a paid account if you can, and certainly if you're commercial.


You can use the GitLab Dependency Proxy for caching images from DockerHub, it's pretty straightforward to pull cached images:

https://docs.gitlab.com/ee/user/packages/dependency_proxy/
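If I'm reading the docs right, pulling through it looks something like this (instance host and group name are placeholders):

docker pull gitlab.example.com/mygroup/dependency_proxy/containers/alpine:latest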


Using `docker-machine`, you can set an environment variable so that new bare VMs pull from a local Docker mirror:

`export ENGINE_REGISTRY_MIRROR=https://mirror.mysite.com`
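For a plain Docker Engine (no docker-machine), the equivalent is the registry-mirrors key in /etc/docker/daemon.json (restart dockerd after editing; the mirror URL is the same placeholder):

{
  "registry-mirrors": ["https://mirror.mysite.com"]
}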


Why don't they move to a P2P model? Something like BitTorrent DHT or IPFS? DockerHub itself can just be used if the image is scarce and to penalize free riders.


IPFS is already implemented. https://blog.bonner.is/docker-registry-for-ipfs/

But it's on you to use it. (And it doesn't solve the metadata queries the way I understand it)


Also worth mentioning they added a 6 month retention policy for unpaid accounts. https://www.docker.com/pricing/resource-consumption-updates If you don't want your images disappearing, you should probably move off of Docker Hub. (That said, I don't think there's any reason to continue using Docker Hub, even with a paid service. I suspect such docker services will be discontinued gradually.)

It's not hard to run a registry yourself. https://docs.docker.com/registry/deploying/ Registry supports all sorts of storage backends (files, S3, GCS ...). It's just a stateless HTTP server. It's not hard to configure for production (it comes with decent defaults). If you deploy this to something like Google Cloud Run, you can get ~free hosting (sans the storage costs) for the "registry" itself plus TLS and autoscaling too (I should probably write a tutorial on this).
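A minimal sketch of the pull-through-cache (mirror) mode with the stock registry:2 image - hostname and storage path are placeholders:

# Run the registry as a mirror of Docker Hub, storing blobs on local disk;
# REGISTRY_* environment variables override keys in the registry's YAML config
docker run -d --restart=always --name hub-mirror -p 5000:5000 \
  -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \
  -v /srv/registry:/var/lib/registry \
  registry:2

Then point each daemon at it with "registry-mirrors": ["http://hub-mirror.internal:5000"] in daemon.json.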


The real reason this is a problem is that Docker Hub has never focused on implementing standard and secure features for account management. Even with the recently implemented access tokens, there's no way for me to limit the scope of those tokens to a specific organization or image, so they're no different from a password.

That means if I build something from one organization, it's going to impact the ability to build something for another organization.

This seems to me like a thinly disguised last-ditch effort to stay afloat.

Now that GitHub, GitLab, and other great image repository options exist that do have these security/integration features, anyone impacted can easily switch providers at no cost.

This new rate limiting won't help anything for docker, it's just going to kick off the exodus away from docker. Ironically, it's this account/security/integration stuff that they lacked focus on that lost them so much financial opportunity to begin with.


Can’t say I’m surprised. They must be incurring significant data costs. The limits seem reasonable at least.


The sysadmin in me dies a little bit inside every time I realize there are shops that actually pull images for production from the global registry.


Hrm, based on this: https://docs.docker.com/docker-hub/orgs/#add-a-member-to-a-t...

I see that Docker doesn't actually offer an AWS-style enterprise account that one can use to hand authorization to developers without requiring those developers to make individual accounts.

It feels pretty sassy of docker to give everyone 2 months to shove credentials everywhere when docker themselves haven't done the minimum to make enterprise accounts realistic. Instead, they're adopting the github model of "oh, just ask everyone to make personal accounts and then include their personal accounts in the org team". That has problems.

Firstly, it puts employers in the unpleasant position of attempting to compel employees to make legal agreements with third parties (docker, in this case). The correct way to do this is AWS-style, where the org itself makes /one/ agreement and then delegates that agreement via access keys. This is the minimum I expect from enterprise account systems, hard fail for docker.

Secondly, it's a clusterfuck to manage. You end up with an org filled with random-arse account names that you can't really audit, and you don't know who has access to what. If employees leave the org, it's hard to ensure that their access is revoked because the access takes place entirely outside the standard account domains.

Github has recently improved this a shade by adding ADFS authorization to org accounts, but that involves asking employees to tie their personal (and all github and docker accounts /are/ personal) account to their work ADFS account, which is a shitty half-solution.

All things considered, docker made this problem for themselves. They've spent /years/ working hard to get everyone to make docker accounts and push everything to docker hub instead of fostering an ecosystem of registries by different orgs for different purposes. All of a sudden it's now "too expensive" and they're dropping the hammer on everyone to sign up and push credentials everywhere with very little warning, whilst not doing their half of the work by making a proper delegated authority account system.

Doesn't fill me with confidence for their future as a stable platform on which to base a business.


People are also forgetting how newer docker infrastructures use multi-arch and multi-stage builds, meaning a single build could count for dozens of manifest hits, especially with distributed caching.

All the things docker has been working for to enhance the build tooling will now be more difficult to use, even if the user never stores a single image of their own on docker hub.


Numbers seem reasonable. Does anyone have a quick guide for me to use ECR as a transparent cache or something?


I’m hoping GCP/AWS steps up and creates a limitless docker registry. Considering the size of their infrastructure operations, I suspect this would be a small cost but bring a lot of goodwill.

One thing this is bound to do is to make the process of using docker a bit more complex. Explicit registries will probably start to be used everywhere, which is something I welcome. But it seems like a really poor decision by docker, the company to do this: they’re going to drive people off using docker hub.


They have registries already which work with docker after an auth setup. They are generally private to you.

Yes, docker is a struggling company which sold some lines of business and is now trying to reinvent itself again towards developers. Given other recent moves, I'm not sure the new leadership understands how to do this.


They have registries but at least AWS' one is not free at all...


They all cost money, because it's expensive to ship bytes out of the cloud. Docker is likely going the way of the dodo because they funded our free access with VC money.

They also probably made a bad assumption in that they could define the only container format before standards bodies got involved. Blitzscaling was the mantra of the time, and it seems to have come back to bite those who took a bite of that cake.


yeah docker is both the best and the worst build tool

best because it's hermetic

worst because it's wasteful and isn't great at any kind of graph-based step caching (even with buildkit graphs, that's not going to integrate well with your package management and build tool)


They want you to log in, so they have better data for monetization


I think they want you to log in because it's hard to monetize anything while hemorrhaging cash to egress fees.



