Cells are basically subsets or shards of customer resources, with infrastructure isolated from other cells. The reference deployment pipeline in the blog post [0] deploys to each cell individually, in waves configured by your team, ultimately limiting the blast radius of a deployment to a smaller subset of customers.
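(As a minimal sketch of how waves and cells can be expressed, here is a hypothetical CDK Pipelines setup; the CellStage class, repo name, accounts, and regions are placeholders for illustration, not the reference implementation itself.)

```ts
import { Stack, StackProps, Stage, StageProps } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import { CodePipeline, CodePipelineSource, ShellStep } from 'aws-cdk-lib/pipelines';

// Hypothetical stage wrapping one cell's copy of the application stacks.
class CellStage extends Stage {
  constructor(scope: Construct, id: string, props?: StageProps) {
    super(scope, id, props);
    // new ApplicationStack(this, 'App'); // each cell gets its own copy of the app stacks
  }
}

class PipelineStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    const pipeline = new CodePipeline(this, 'Pipeline', {
      synth: new ShellStep('Synth', {
        input: CodePipelineSource.gitHub('my-org/my-repo', 'main'), // placeholder repo
        commands: ['npm ci', 'npm run build', 'npx cdk synth'],
      }),
    });

    // Wave 1: a single low-traffic cell, so a bad change only reaches a few customers.
    const wave1 = pipeline.addWave('Wave1');
    wave1.addStage(new CellStage(this, 'Cell1', { env: { account: '111111111111', region: 'us-east-1' } }));

    // Wave 2: the remaining cells deploy in parallel once wave 1 has gone out cleanly.
    const wave2 = pipeline.addWave('Wave2');
    wave2.addStage(new CellStage(this, 'Cell2', { env: { account: '222222222222', region: 'us-east-1' } }));
    wave2.addStage(new CellStage(this, 'Cell3', { env: { account: '333333333333', region: 'us-west-2' } }));
  }
}
```

Each wave starts only after the previous one has deployed, so a bad change caught in wave 1 never reaches the cells in later waves.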
I like having groups of customers segmented into copies of your architecture. It means you know how to spin a copy up independently, which is good for a lot of reasons.
This looks similar to the internal pipelines framework that all of Amazon uses to deploy code. Funny to see how much value Amazon has gotten out of developing frameworks for the retail business and then making those frameworks available publicly. The abstractions work because they have already been vetted across Amazon for a decade.
I think the last missing piece is reproducible builds, which I think you can get from buildpacks [0], NixOS [1], or Bazel [2] running in your CodeBuild.
I was thinking someone could probably build a manyrepo solution similar to Brazil on top of an idea like this. I don’t see many tools like Bazel built for manyrepos.
There’s no equivalent OSS implementation of a multi-language lock file like Brazil’s. You have to build your own or go with a monorepo tool like Bazel. Amazonians take reproducibility for granted because of Brazil.
I’d be happy to be wrong about this, but I don’t think you can get a reproducible Node.js or Python build from NixOS out of the box. You need to build something on top of it.
It's nice to see something like this. I maintain a reference implementation of some open-source software on AWS, but that was pieced together over the years from other people's work, now deleted blog posts (God bless the Internet Archive), and unmaintained sample code on GitHub. Hopefully, a fully worked example like this will fill in a lot of the blanks in my understanding, like how to best implement blue/green deployments (the CodeDeploy hook for CloudFormation has a lot of weird limitations), unit test automation, and monitoring.
Edit: It's blog spam, sadly. Here are direct links to the actual blog posts and tools.
Not mentioned in the article, but I’d recommend an end-to-end test pipeline as well. Deploy your tests like they’re your customers: independent and “adversarial”.
Also, add a one-box/one-cell deployment stage before your production stage. Beta/gamma are well and good, but they’ll never perfectly replicate an actual prod deployment.
What level of usage warrants these kinds of setups? Say, beyond a workflow that runs CI on PR/master off of GitHub (Actions or CircleCI, etc.) and deploys to something like Heroku or Vercel.
Do companies run apps that get 1k rps on stuff like cell-based architecture (mentioned in the top comment)?
Kinesis? Services operating at massive, hard-to-understand scales?
FWIW, I work at Amazon, and a lot of things are still solved with "boring" tech that's not "web scale". Even within our own ranks we've got to remind people that engineering is about fitting the architecture to the requirements and no more. It seems to be lost knowledge these days that an RDBMS gets you really, really far. The vast majority of apps -- even those that appear superficially "large" -- will never see usage patterns that justify anything more than a box and a database. You'd be surprised which systems are just humming along unceremoniously with a humble 3-tier architecture.
So, as you scale up and out, you stop being able to depend on higher-level abstractions from elsewhere.
You can't build a Tier-1 service like S3 or EBS on top of RDS. If you need a database solution to help you with building and running tools of that scale, it has to be something else that is a lot less complex and a lot more robust.
It's okay for RDS to be dependent on a Tier-1 service like S3 or EBS, but not the reverse. You don't want to get into priority inversion problems here.
And when your service has to be in every AZ in every region, and they all have to be kept separately running, then things get even more complex.
Then you have to ask yourself what is the bootstrap process for building a new region.
It is quite weird that "Build" and "Unit Test" are part of the Deployment Pipeline.
For me, the Deployment Pipeline starts with the Acceptance Tests that ensure business quality, running over already-created artifacts. There are advantages to this approach, as different deployment pipelines may be required to deploy to different platforms even if all artifacts are generated by one build pipeline.
AWS seems to mix the concepts of Build Pipeline with Deployment Pipeline.
Deployment has historically had a meaning in software development; I do not understand why change it now.
>> AWS seems to mix the concepts of Build Pipeline with Deployment Pipeline.
>> Deployment has historically had a meaning in software development; I do not understand why change it now.
Maybe because traditionally software shops would run a build server on premises and a deployment on AWS. Now Amazon wants to move the build process onto AWS as well, making them a little bit of extra money.
That may be the reason: to create an article that looks like professional best practices but is really just an advertisement. Microsoft used to do a lot of that in its war against Open Source back in the day.
So, at Whole Foods, we made heavy use of CodeBuild and CodePipeline. But I also had experience building and managing most of the Jenkins servers that we used before moving over to the native commercial AWS tooling.
Conceptually, it's easy enough to put these two sets of components together in the same "pipeline" tool. You just have to make sure that all the right steps and components are available and that they are properly used. If so, you can set things up so that all your code goes through multiple online checks when it gets uploaded to the repo, and you prevent it from being merged to "live" (or "head" or "master" or whatever you want to call it) unless all the checks have passed.
Once all the checks have passed, your pipeline can proceed to push that to live (possibly with a human approval required), then deploy it to beta, then gamma, then a canary "onebox" in one AZ in one region, and then finally into your first real multi-server production environment in one AZ in one region. Then you can chain on from there, with potential tests that have to be passed at each stage, baking-in periods that are required, and then on to the next stage with the next onebox in the next AZ in the next region.
Rinse and repeat.
Of course, you also need auto-rollback processes in case your deployments fail. And what happens if you need to roll back the rollbacks? Ad infinitum.
To that level, it's all just logical extensions of the initial concepts.
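(A rough sketch of those gates, assuming the CDK Pipelines style from the cell/wave example earlier in the thread; `pipeline`, `AppStage`, and the test commands are placeholders, and automatic rollback isn't shown.)

```ts
import { ManualApprovalStep, ShellStep } from 'aws-cdk-lib/pipelines';

// Inside a pipeline stack constructor, where `pipeline` is a CodePipeline and
// AppStage wraps the application stacks (as in the wave sketch above).
pipeline.addStage(new AppStage(this, 'Beta'), {
  post: [new ShellStep('BetaIntegTests', { commands: ['npm run integ-test -- beta'] })], // placeholder test command
});

pipeline.addStage(new AppStage(this, 'Gamma'), {
  post: [new ShellStep('GammaIntegTests', { commands: ['npm run integ-test -- gamma'] })],
});

// Canary/one-box in a single AZ/region, gated behind a human approval.
pipeline.addStage(new AppStage(this, 'ProdOneBox'), {
  pre: [new ManualApprovalStep('PromoteToOneBox')],
  post: [new ShellStep('OneBoxBakeCheck', { commands: ['npm run check-alarms -- prod-onebox'] })], // placeholder bake check
});

pipeline.addStage(new AppStage(this, 'ProdFull'));
```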
Is that still a common practice? I figured most companies run builds in their existing pipeline infrastructure, which in my experience is usually GitHub/Travis/GitLab/Azure DevOps/CircleCI/etc., and then push the results up into AWS.
In my experience it is - having an on-premises build server is a lot cheaper than running a very CPU-intensive task in the cloud. It can save thousands of dollars a year.
I tried to like it, but when it'd get stuck for a long time at a step with no good way to get information or control what was going on... I moved on to Terraform and have been relatively happy ever since.
My only nit with their build stages is that I don’t see them mentioning that the pipeline stages they outline are logical. When it comes to how the real pipeline actually runs, you should ignore the stages, specify the minimal dependency graph, and let everything that can run in parallel run. This usually means producing your build artifact first and then running everything else immediately after. If you’re feeling fancy, you can have the steps remember the logical stage they’re in and bail out of later stages on failure.
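(A tiny, tool-agnostic sketch of that idea; the step names and commands are made up. Each step declares only its real dependencies and starts as soon as they finish, rather than waiting for a whole logical stage.)

```ts
// Each step declares only the steps it truly depends on.
type Step = { name: string; deps: string[]; run: () => Promise<void> };

async function runGraph(steps: Step[]): Promise<void> {
  const started = new Map<string, Promise<void>>();
  const start = (step: Step): Promise<void> => {
    if (!started.has(step.name)) {
      // Wait for declared dependencies only, then run; unrelated steps proceed in parallel.
      const deps = step.deps.map((d) => start(steps.find((s) => s.name === d)!));
      started.set(step.name, Promise.all(deps).then(() => step.run()));
    }
    return started.get(step.name)!;
  };
  await Promise.all(steps.map(start));
}

// Example: the build artifact is produced first, then everything that only needs
// the artifact (unit tests, beta deploy) kicks off immediately and in parallel.
runGraph([
  { name: 'build',       deps: [],        run: async () => console.log('build artifact') },
  { name: 'lint',        deps: [],        run: async () => console.log('lint') },
  { name: 'unit-tests',  deps: ['build'], run: async () => console.log('unit tests') },
  { name: 'deploy-beta', deps: ['build'], run: async () => console.log('deploy to beta') },
]).catch(console.error);
```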
Not much. Having a functioning build pipeline takes a bit of work to get set up, but once it's up and running, and projects are built to depend on it, it takes away a fair bit of ops work.
That can help smaller teams become more agile, and being able to walk through a consistent process across multiple products for each step from commit against a dev branch through deployment in production makes it easier to account for team members being away, training new folks, etc. This is especially important for "side projects" or other small services that end up becoming critical infrastructure but are really maintained by one or two team members and only have a couple of internal customers.
Even small companies deploy to staging environments before production.
Your deployment pipeline most likely looks like the following (roughly sketched after the list):
- Build your language artifacts
- Build a container
- Deploy that container somewhere
- Run approvals like integ tests, unit tests, etc., and monitor metrics to decide whether to deploy to your next environment or roll back.
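(A very rough, tool-agnostic sketch of that flow; the deploy/rollback scripts and the metrics check are stand-ins for whatever your platform actually provides.)

```ts
import { execSync } from 'child_process';

const run = (cmd: string) => execSync(cmd, { stdio: 'inherit' }); // throws on non-zero exit

async function checksPass(env: string): Promise<boolean> {
  // Placeholder: run integration tests and poll metrics/alarms for this environment.
  return true;
}

async function release(environments: string[]): Promise<void> {
  run('npm ci && npm test && npm run build');   // build your language artifacts (plus unit tests)
  run('docker build -t myapp:candidate .');     // build a container

  for (const env of environments) {
    run(`./deploy.sh myapp:candidate ${env}`);  // deploy that container somewhere (placeholder script)
    if (!(await checksPass(env))) {
      run(`./rollback.sh ${env}`);              // roll back and stop promoting to later environments
      throw new Error(`release halted in ${env}`);
    }
  }
}

release(['staging', 'production']).catch(() => process.exit(1));
```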
This is just a structured way of building these CD pipelines. You could also use Jenkins, or Skaffold, which Google started building for Google Cloud Deploy: https://skaffold.dev/docs/
As your company gets larger, you'll want to limit the blast radius of new changes so that you can meet your SLAs, and you'll want to use multiple AZs and cloud regions for HA. That's where the idea of waves and cells comes from.
If you deploy to multiple regions or datacenters it doesn't matter how big you are -- you should orchestrate deployments so you don't deploy to all of them at the same time. If something breaks, you'd rather break a subset of regions than all of them.
You should also deploy new code to some kind of non-production environment before going to prod. Even companies with the best possible canarying in prod will still do that.
Some of the technology involved is definitely AWS-scale, but the principles of software rollout are broadly applicable.
The very first line of the article[1] calls it out, "....for enterprise-grade deployment pipelines." So I guess it's relevant for Fortune 500ish companies.
That said, a base version of this pipeline is very useful even for a small startup. I concede that it takes a bit of time (it took me about a week) to set up end to end. By end to end I mean beginning with an empty AWS account and setting up VPC, ECS, CloudFront, Fargate, CodePipeline, and so on. But once set up, you don't have to touch it for months on end. Scaling up/down, if needed, is as simple as a minor config change.
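(For a sense of scale, the core of that infrastructure can be sketched with a few CDK constructs; the sizing, sample image, and CloudFront settings below are placeholders, and the CodePipeline wiring is omitted.)

```ts
import { Stack, StackProps } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as ecs from 'aws-cdk-lib/aws-ecs';
import * as ecsPatterns from 'aws-cdk-lib/aws-ecs-patterns';
import * as cloudfront from 'aws-cdk-lib/aws-cloudfront';
import * as origins from 'aws-cdk-lib/aws-cloudfront-origins';

class WebServiceStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // VPC + ECS cluster.
    const vpc = new ec2.Vpc(this, 'Vpc', { maxAzs: 2 });
    const cluster = new ecs.Cluster(this, 'Cluster', { vpc });

    // Fargate service behind an ALB; the pipeline would swap in the real image.
    const service = new ecsPatterns.ApplicationLoadBalancedFargateService(this, 'Service', {
      cluster,
      cpu: 256,
      memoryLimitMiB: 512,
      taskImageOptions: { image: ecs.ContainerImage.fromRegistry('amazon/amazon-ecs-sample') },
    });

    // CloudFront in front of the load balancer (the ALB here serves plain HTTP).
    new cloudfront.Distribution(this, 'Cdn', {
      defaultBehavior: {
        origin: new origins.LoadBalancerV2Origin(service.loadBalancer, {
          protocolPolicy: cloudfront.OriginProtocolPolicy.HTTP_ONLY,
        }),
      },
    });
  }
}
```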
That's true, even if you're not at Amazon scale. Say you're building a multi-tenant SaaS product: cell architecture provides strong tenant isolation and avoids the noisy-neighbor problem. There are other benefits, like deploying changes to one cell at a time, so a tenant can request delaying an upgrade if they need to do prerequisite work beforehand.
For an ordinary service or web site, you most probably do not need this. It's over-engineering infrastructure when you should be focusing on customers and the problem domain - the things that increase the top or bottom line.
Source: principal SDE who launched several services at Amazon using cell architecture.
I've always seen AWS as a rather expensive answer to the build problem. Any other VPS provider or even a dedicated colo is likely much cheaper and less locked in than the Amazon ecosystem.
That having been said, the "reference architecture" being offered just feels like a slow day in the AWS marketing department during a recession. Nothing really special about it unless I've completely missed something?
At $dayjob, AWS consultants regularly "assist" with architectures where the final solution is invariably draped across every one of the AWS proprietary offerings. Whether on purpose or not, the result is that customers are locked in and the budgets are often a "surprise!".
In my experience when negotiating any discounts the Amazon sales guys are very aware exactly how locked in the customer is. If they use a ton of proprietary AWS features Amazon may offer 5% discount, on the other hand if the customer uses Kubernetes and other stuff that is easy to move to another cloud they may offer 50% off.
I'm just curious, have you actually built and operated a feature-equivalent build/deploy system on VPSes? If so, color me impressed, I'd love to see some source code or other artifacts if you're willing to share.
Very interestingly, Vercel [1] (which uses AWS for its deployment pipeline) does not use CodePipeline, CodeBuild, etc. It uses Fargate instead, for build performance reasons [2].
And a talk on what a cell based architecture is here: https://youtube.com/watch?v=HUwz8uko7HY
[0]: https://aws.amazon.com/blogs/aws/new_deployment_pipelines_re...