Pull Through Cache Repositories for Amazon Elastic Container Registry (amazon.com)
53 points by dantiberian 58 days ago | 26 comments



We were getting requests to pay Docker a lot of money because of the volume of docker pulls we were doing. We used a pull-through cache on DO as a workaround and it's been a big success.

My colleague wrote up a guide to pull-through caching dockerhub.[1] Docker's pricing is a little bit funky, because a full image pull with all the layers costs the same as a request where you retrieve no layers at all.

If this AWS feature had been out a couple of months ago and had supported Docker Hub, I'm not sure we would have set this up.

[1] https://earthly.dev/blog/pull-through-cache/


>Running our test suite 2-3 times over the span of a couple hours would trigger the rate limit…

Not to single you out, but it's not too hard to imagine why companies start enabling rate limiting. I often wonder about the obscene amount of bandwidth consumed by this kind of developer automation. This page[0] indicates that PyPI serves 900 terabytes daily.

Seems obvious we need more turn-key solutions that enable caching for all of the popular package distributions (PyPI, NPM, Docker, etc). If it's too hard to configure, everyone will default to the path of least resistance (i.e. hammering the origin servers).

[0] https://dustingram.com/articles/2021/04/14/powering-the-pyth...


The fun thing about the docker rate limit is that the layer tarballs are stored on S3 in us-east-1.

If you do a docker pull inside us-east-1, you’ll get a direct S3 link instead of a proxied download, and thus neither you nor docker need to pay AWS anything for bandwidth!


> If you do a docker pull inside us-east-1, you’ll get a direct S3 link instead of a proxied download, and thus neither you nor docker need to pay AWS anything for bandwidth!

I thought cross-account data still cost something; is that not the case?


> You pay for all bandwidth into and out of Amazon S3, except for the following:

> Data transferred from an Amazon S3 bucket to any AWS service(s) within the same AWS Region as the S3 bucket (including to a different account in the same AWS Region).

https://aws.amazon.com/s3/pricing/


Ooooh very neat thanks for sharing:) That has... interesting potential...


Also consider enabling S3 gateway endpoints:

https://docs.aws.amazon.com/vpc/latest/privatelink/vpce-gate...

That avoids paying NAT Gateway charges for traffic to S3, and in my testing it also reduced latency a bit, which came in handy once when I had a one-off small-file data migration.


Thank you:)


If you work in a large company, for security reasons you may already be required to use in-house mirrors of libraries or container images.

There are commercial products for doing this, e.g. https://jfrog.com/artifactory/

All the command line package management tools then need to be configured to point at the custom in-house package mirror, rather than connecting to the default public package server over the internet.


Agreed! I don't mean to complain about docker, they build great products.


Hey all! I work on the container services team at AWS. If you have any questions about this new feature, feel free to ask!


It doesn’t appear there is support for Dockerhub yet. When will that be supported?


For images on Docker Hub there is a slightly different approach. As of today you can also find many of the top official Docker Hub images being mirrored to ECR Public, so you don't even need a pull through cache for those, you can pull from ECR Public directly:

- ECR Public Gallery: https://gallery.ecr.aws/docker

- Launch blog: https://aws.amazon.com/blogs/containers/docker-official-imag...


Unfortunately we can't just add this as a `registry-mirrors` entry in the Docker daemon.json, which means we still need to update all our Dockerfiles/build processes/container image selection to use the new ECR image URL, which is a shame.

Google supports this within GCP: https://cloud.google.com/container-registry/docs/pulling-cac... which makes it easy to avoid going all the way to Docker Hub for images.
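
For what it's worth, the Dockerfile side of that update is fairly mechanical. Here is a minimal sketch in Python, not a tested tool. It assumes official Docker Hub images are reachable under a `public.ecr.aws/docker/library` prefix (check the gallery for the images you actually need), and the `rewrite_from_lines` helper is a made-up name for illustration. It only touches bare official image names and leaves namespaced images, other registries, build-arg references, `scratch`, and earlier build stages alone.

```python
import re

# Assumed mirror prefix for official Docker Hub images; adjust to the
# registry you actually use. This value is an assumption, not a guarantee.
MIRROR_PREFIX = "public.ecr.aws/docker/library"

# FROM [--flag=value ...] image [AS name]
FROM_RE = re.compile(r"^(FROM\s+(?:--\S+\s+)*)(\S+)(.*)$", re.IGNORECASE)
ALIAS_RE = re.compile(r"\s+AS\s+(\S+)", re.IGNORECASE)


def rewrite_from_lines(dockerfile_text, mirror_prefix=MIRROR_PREFIX):
    """Return the Dockerfile text with bare official images prefixed."""
    stages = set()  # names of earlier build stages (FROM ... AS name)
    out = []
    for line in dockerfile_text.splitlines():
        match = FROM_RE.match(line)
        if match:
            head, image, tail = match.groups()
            is_bare = "/" not in image          # no registry host or namespace
            is_special = image == "scratch" or image.startswith("$")
            if is_bare and not is_special and image not in stages:
                image = f"{mirror_prefix}/{image}"
            line = f"{head}{image}{tail}"
            alias = ALIAS_RE.search(tail)
            if alias:
                stages.add(alias.group(1))
        out.append(line)
    # Note: a trailing newline in the input is not preserved.
    return "\n".join(out)


if __name__ == "__main__":
    example = "FROM python:3.9 AS build\nRUN pip install flask\nFROM build\n"
    print(rewrite_from_lines(example))
```

This does nothing for image references in CI configs or Kubernetes manifests, which would need their own pass.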


This feature is exciting, but ultimately a complete non-starter for us because of no Docker Hub support.

Are there plans to add it in the future?


Why is that the case?

That sounds like a political decision and not a technical one.

Looking at the gallery, the images I'm interested in seem to be built/uploaded by a different entity. Which is not great.


Looks like only official Docker images from Dockerhub will be in the ECR gallery. "Other images are under consideration for the roadmap next year, along with authenticated private registries."

Meh.


> As of today you can also find many of the top official Docker Hub images being mirrored to ECR Public, so you don't even need a pull through cache for those, you can pull from ECR Public directly

Is there any benefit to using pull-through caching with ECR Public images? Seems like it would just add extra storage costs.


ECR Public images are technically owned by the third party that uploaded them. Many AWS customers prefer to pull these public images into their own ECR private registry for more secure ownership of the copy.


Thanks!


Wanna kill Dockerhub completely? A mutating admission controller for Kubernetes that would mirror an image the first time it saw it, and then swap out the Dockerhub image URI for the ECR one, would totally cut our reliance on Dockerhub.

If you don't build it, we might build it and open source it ourselves.
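
To make the idea concrete, here is a rough sketch of just the "swap out the image URI" half, in Python. It assumes a hypothetical private ECR prefix (`ECR_PREFIX` below is a placeholder account and region), guesses at a repository layout where official images live under `library/`, and does not do the mirroring step at all, so the images would already need to exist in ECR. The function names are made up for illustration; the response shape follows the Kubernetes `admission.k8s.io/v1` AdmissionReview format with a base64-encoded JSONPatch.

```python
import base64
import json

# Hypothetical ECR registry and repository prefix; replace with your own.
ECR_PREFIX = "123456789012.dkr.ecr.us-east-1.amazonaws.com/dockerhub"
HUB_HOSTS = ("docker.io", "index.docker.io", "registry-1.docker.io")


def is_docker_hub(image):
    """True if the reference resolves to Docker Hub (explicitly or by default)."""
    first = image.split("/", 1)[0]
    has_host = "/" in image and ("." in first or ":" in first or first == "localhost")
    return first in HUB_HOSTS if has_host else True


def to_ecr(image, prefix=ECR_PREFIX):
    """Map a Docker Hub reference to the assumed ECR repository layout."""
    parts = image.split("/")
    if parts[0] in HUB_HOSTS:
        parts = parts[1:]
    if len(parts) == 1:  # official image, e.g. "nginx:1.21"
        parts = ["library"] + parts
    return prefix + "/" + "/".join(parts)


def build_patch(pod):
    """Build JSONPatch operations replacing Docker Hub images in a pod."""
    ops = []
    for field in ("containers", "initContainers"):
        for i, container in enumerate(pod.get("spec", {}).get(field, [])):
            image = container.get("image", "")
            if image and is_docker_hub(image):
                ops.append({
                    "op": "replace",
                    "path": f"/spec/{field}/{i}/image",
                    "value": to_ecr(image),
                })
    return ops


def admission_response(review):
    """Turn an AdmissionReview request dict into a response dict."""
    request = review["request"]
    response = {"uid": request["uid"], "allowed": True}
    ops = build_patch(request["object"])
    if ops:
        response["patchType"] = "JSONPatch"
        response["patch"] = base64.b64encode(json.dumps(ops).encode()).decode()
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```

A real controller would also need an HTTPS endpoint, a MutatingWebhookConfiguration, and the mirroring logic itself, none of which is shown here.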


Off topic.

We have our images saved in a private ECR registry. We are trying to deploy those using AWS Lightsail or Copilot. Haven’t found any material on how to deploy an ECR image to either one of those services. Any references you can provide?


The cool thing about AWS Copilot is that you don't have to manage your own image build and push anymore. Instead Copilot builds the container image from source, pushes it to ECR automatically, and then launches it in Fargate, with one command. You can find more info on that here: https://aws.github.io/copilot-cli/docs/getting-started/first...

For Lightsail you use the command line to run `aws lightsail push-container-image`. This also automatically manages the ECR and container image push for you. You can read more about that here: https://lightsail.aws.amazon.com/ls/docs/en_us/articles/amaz...

Basically these higher level tools don't require you to start from a preexisting container image in ECR. Instead they help you push your container to the cloud automatically, and you don't even have to touch ECR directly.


How bad is the WLB?


I assume by WLB you mean "Work/Life Balance"?

Can't speak for every person at AWS, but I can say that I'm personally happy with my work/life balance working at Amazon. I work greater than 40 hours a week at times, but less than 40 hours a week other times. Overall it balances out between my needs and the needs of my broader team. I have the support of a great manager, who has never pushed me to do any extra overtime work that I didn't already want to, but has encouraged me and the team to take time for ourselves when we have been working really hard.

I also think work life balance has a lot to do with how much you enjoy the work. I've worked remote for AWS for nearly 5 years now, and my particular role at AWS gives me a lot of freedom in deciding what I do day to day, from writing blogs and technical content, to creating sample code, engaging with folks on social media, recording videos, travelling and giving talks at conferences, providing feedback on internal product specs, testing preproduction releases before they go live, etc. So the variety keeps me interested.

Hope this insider perspective is helpful!


Refreshing



