Twelve-factor app development on Google Cloud (cloud.google.com)
228 points by 9nGQluzmnq3M 17 days ago | 63 comments

I'm the author of 12factor (although really it is an aggregation of the work and insights from many people at Heroku). It continues to surprise and please me that this piece continues to be relevant eight years later—a virtual eternity in software/internet time.

Fun fact: I debated whether to call it "the Heroku way" or somesuch. Glad I went with a standalone name, feel like that allowed it to take on a life beyond that product. For example I doubt Google would have wanted a page about "Heroku Way app development on GCP" in their documentation. :-)

Just wanted to thank you for this. When I was a lost and confused junior, I had no one more experienced around to ask for advice. I stumbled across this and it answered so many questions that I had, and it still very much applies many years later.

Great work :) When we switched from a monolith to microservices in a previous job, this was mentioned a lot - both internally and to external clients in presentations etc.

I agree, I think the fact that it's not called "the [platform/software] way" is a big factor in its success - it's easier to quote publicly and to external clients when it's seen as a more agnostic set of principles.

Out of interest, in hindsight do you think that any of the 12 factors could be tweaked/improved (since it was originally written)?

I wonder if your Heroku values would be more widely known if they had a more generic name.


Again, like 12factor, it's a collection of themes from around the internet but it also has stood the test of time.

12 factor seems to have stood the test of time really well—I was introduced via Heroku (who I think invented it?) quite a long time ago in tech years, and yet it still seems to be probably the most popular ‘framework’ for devops.

In fact, my startup EnvKey[1] was heavily inspired by the 12 factor approach. While it has always worked well for me, one bit that always felt thorny was using the environment for configuration and secrets. It’s obviously great to get this stuff out of code, but then you face new issues on how to keep it all in sync across many environments and pass potentially highly sensitive data around securely.

EnvKey fills in this gap using end-to-end encryption and a seamless integration that builds on top of environment variables—it’s a drop-in replacement if you already use the environment for config. Check it out if you’re looking for something to smooth out this aspect of 12 factor! We have lots of folks using it with GCP/GKE.

1 - https://www.envkey.com

Indeed it has.

I wanted to put secrets as env vars for my ECS containers. There wasn't an easy way to do that. I wound up writing an entrypoint inside the container that would download secrets from AWS SSM and expose them to application code via env vars.

Then AWS ECS released the feature to specify env vars as SSM values natively from the ECS agent.

It felt good knowing someone else saw this as the "right" way too.
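A wrapper entrypoint of the kind described above might look roughly like this. It is a minimal sketch under assumptions, not the commenter's actual script: `fetch_secrets` is a hypothetical callable standing in for the secret-store lookup, and `fake_fetch` and `DB_PASSWORD` are made-up names used only so the example runs without any cloud account.

```python
import os
import subprocess
import sys


def build_env(base_env, secrets):
    """Return a copy of base_env with the secrets layered on top."""
    env = dict(base_env)
    env.update(secrets)
    return env


def run_with_secrets(command, fetch_secrets, base_env=None):
    """Fetch secrets, then run the app with them exposed as env vars.

    fetch_secrets is a hypothetical callable returning {name: value}; in a
    real entrypoint it would call the secret store's client library.
    """
    base = os.environ if base_env is None else base_env
    env = build_env(base, fetch_secrets())
    # The child process sees the secrets as ordinary environment variables.
    return subprocess.run(command, env=env, capture_output=True, text=True)


def fake_fetch():
    # Stand-in for a real secret-store lookup (assumption for illustration).
    return {"DB_PASSWORD": "s3cr3t"}


result = run_with_secrets(
    [sys.executable, "-c", "import os; print(os.environ['DB_PASSWORD'])"],
    fake_fetch,
)
```

A real container entrypoint would more likely replace itself with the app via `os.execvpe`, so that signals such as SIGTERM reach the app directly; `subprocess.run` is used here only so the result can be inspected.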


I will say that Factor 9 (Disposability) seems rarely followed: in many applications, SIGTERM is not handled gracefully.

Program shutdown is not supposed to be handled "gracefully", simply because there's no guarantee that your program will be able to gracefully shut down. Google itself doesn't handle program shutdown at all. Your program must be written so that it's safe to outright kill it at any moment, because at scale that's what's going to happen from time to time whether you want it or not. It is best to shed this illusion that your program will have the opportunity to shut down in an orderly fashion, because when it runs on tens of thousands of computers, you can pretty much count on graceful shutdown not happening at least every now and then.

This has the added benefit of making the datacenters easier to manage. Say you have a bunch of workloads packed into racks of servers. Say one of those racks needs an electrical upgrade or hardware replacement. If all programs are preemptible by design, you can just tell cluster management software (Borg) to kill them and restart the tasks elsewhere. Or, in fact, you could even just pull the plug on the rack without telling Borg to do anything. Because workloads are spread out between fault domains, only a small fraction of tasks gets restarted elsewhere, and a properly designed system will not corrupt data or even let the external users know that anything happened.

I'm sorry, but this is mostly wrong.

Yes, you need to design your code to withstand disaster shutdown SIGKILL type situations. That doesn't mean you get to ignore SIGTERM.

The vast majority of shutdowns are due to routine maintenance events. If you get SIGTERM-ed and all you did was crash, here's a short list off the top of my head of bad things that can and do result:

* Tail latency goes up, because clients talking to tasks that don't shut down gracefully have to wait for at least one RPC timeout - possibly more if channel timeout is longer - before they retry elsewhere. (This will manifest in multiple ways, because things like load balancers will also be slow to respond.)

* System contention goes up - if your server was holding any kind of distributed lock (e.g. DB write) when it went down, everyone else needs to wait for that lock to time out before someone else can take it. (Hopefully your locks require keep-alives to hold them!)

* Corruption of data in transit - a crashing binary is basically a big ol blob of UB. With enough replication and checksumming you can mitigate this, but it doesn't mean you get to do dangerous things now! Guardrails only work if you don't take a sledgehammer to them.

* I really, really hope you don't have any request affinity in your system, because if you do, now your caches are empty, your dependencies are going to see you doing expensive ops much more frequently, and so on. (And if you're a streaming system, well now you're just all kinds of screwed.)
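To make the SIGTERM side of this concrete, here is a minimal sketch of a worker loop that stops taking new requests once SIGTERM arrives, finishes the one in flight, and then releases what it holds. The callables `next_request`, `handle` and `release_resources` are hypothetical placeholders for application code, and none of this replaces being safe under SIGKILL; it only covers the routine-maintenance case.

```python
import signal
import threading

# Set when SIGTERM arrives; the serving loop checks it between requests.
shutting_down = threading.Event()


def _on_sigterm(signum, frame):
    # Keep the handler tiny: just record that shutdown was requested.
    shutting_down.set()


# Must run in the main thread, which is where signal handlers are installed.
signal.signal(signal.SIGTERM, _on_sigterm)


def serve(next_request, handle, release_resources):
    """Process requests until SIGTERM is seen or the source runs dry.

    All three arguments are hypothetical callables supplied by the app.
    Returns the number of requests handled.
    """
    handled = 0
    try:
        while not shutting_down.is_set():
            request = next_request()
            if request is None:
                break
            handle(request)  # a request that was started is always finished
            handled += 1
    finally:
        # Give up locks, deregister from the load balancer, close channels.
        release_resources()
    return handled
```

The handler only sets a flag, so the actual cleanup happens in normal control flow rather than inside the signal handler.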

To me, this is focusing on minimizing the occurrence of failure scenarios rather than solutions for them. You should not only be asking yourself what you're ok with happening in failure scenarios, but also what happens when rare scenarios suddenly become not so rare. And you should be actively testing these decisions by failing often, to improve your confidence that when unplanned failure occurs your system will be resilient.

That's not to say that you can't minimize rather than solve, as it's (IMO) easier and faster to introduce minimizing mitigations rather than wholesale solutions to failure and, well, time is dimes. But the reality is that these are just short-term mitigations that you'll outgrow as you scale.

You need to be resilient to all of these things to handle normal failures. So "hard shutdown" is just "free extra chaos test."

I hardly ever felt so strongly about anything in software engineering, although I will concede this is an acquired taste, and it took me some time to see the obvious truth in this.

I will reiterate. If your programs are designed to be kill-safe, it's a waste of time to shut them down any other way. It is also harmful to your systems design, because you can't guarantee that your hardware won't give out, especially if you run on e.g. 50-100K cores (as many Google services do). You can basically be certain at that point that individual tasks (and hardware they run on) will die from time to time. Note that this only applies to shut down, not to e.g. draining traffic. That part can and should still be orderly in most systems so as not to disrupt the user experience too much, and Google does have that in their RPC stack (lame duck mode etc). For everything else you end up doing a lot of logging, checkpointing, and 2PC. For distributed systems you end up using consensus mechanisms and eliminating SPOFs.
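As one small illustration of the checkpointing part, here is a sketch of a kill-safe checkpoint write: the new state goes to a temporary file first and is then swapped in with an atomic rename, so a process killed at any point leaves either the old checkpoint or the new one, never a half-written file. The file layout and function names are assumptions for illustration, and a production version would probably also fsync the containing directory.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Atomically replace the checkpoint at path with state (JSON-able)."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".ckpt-")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(state, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # get the bytes onto disk before the swap
        # os.replace is atomic when source and target share a filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        # On an error we can still react to, do not leave the temp file behind.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def load_checkpoint(path, default=None):
    """Return the last complete checkpoint, or default if none exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default
```

A hard kill between the write and the rename can leave a stray `.ckpt-` temp file, which is harmless and can be swept up on the next start.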

If you rely on orderly shutdown for correctness in your distributed system, I'd like to know the name of your product so I can avoid it.

I think you're talking past each other a bit here.

Nothing the parent is discussing is regarding correctness, so I think your last sentence is a bit uncalled for.

I'm actually serious about that last bit. If your distributed system relies on guarantees it _does not have_ in order to operate correctly, one would be well advised to stay away from it.

You can be super serious about it all you want, but it's accusatory and doesn't reflect the post you responded to, which referred to performance and user experience implications with shooting nodes in the head.

Personally I think there was the formation of a really good debate. As you said, if you drain the node's traffic before killing it, you're probably right: any costs associated with maintaining consistency are probably offset by the human time saved by not waiting for a clean shutdown of processes at scale.

But when you throw in the sass, people stop listening and we all get dumber for it.

m0zg 15 days ago [flagged]

But there's no "debate" here to be had. All the high scale companies (Google, FB, Amazon, Microsoft, Netflix, others) do not rely on their distributed system nodes being able to wind down in an orderly fashion. Shit, Netflix and Google (and likely others as well) stage fault tolerance exercises, taking random nodes (or entire datacenters) out of rotation and checking if things still work. There's no way to get to five nines if you expect your program to always behave.

Here's one from Netflix that will give you an ulcer: https://github.com/Netflix/chaosmonkey

Here's what Google does: https://www.usenix.org/conference/lisa15/conference-program/...

>> we all get dumber for it

Not _all_. Only those who feel inclined to reject the obvious.

You didn't read my comment at all. I explicitly said that yes, you need to design and plan for disorderly shutdown, but orderly shutdown is an order of magnitude more common and you therefore need to also account for that. We don't design products based on guarantees that we don't have, we design for the problem space.

FWIW, I work at Google on distributed systems.

> Google itself doesn't handle program shutdown at all.

Not SIGTERM per se, but Google Cloud certainly offers graceful shutdown rather than pulling the plug, to minimize harmful effects.


Cloud, yes. Borg, no.

Twelve factor is one of the few methodologies where, when I drift away from its suggestions, I always stop and ask myself why, because there's a good chance I'm making a mistake.

Love seeing this! After my time at Heroku I briefly spiked out a similar idea, but never managed to get enough traction to justify making it a full-time effort. I'm excited to see someone actually executing on it. Well done.

> I think invented it

I don't think so... the 12-factor approach existed before that, but it came into more common usage during the rise of Rails (and Heroku).

Both https://12factor.net/ and the Wikipedia page https://en.wikipedia.org/wiki/Twelve-Factor_App_methodology credit Heroku developers for formalizing the 12 factor methodology.

If you go to the bottom of the page, you'll notice that https://12factor.net/ is written by Adam Wiggins, a Heroku cofounder.

It was (former Herokai here).

> In non-cloud environments, web apps are often written to run in app containers such as GlassFish, Apache Tomcat, and Apache HTTP Server. In contrast, twelve-factor apps don't rely on external app containers.

This comment stood out to me. Why does this have anything to do with cloud vs non-cloud?

Furthermore, I think there are merits to both approaches, but the article is phrased as if one is clearly superior to the other.

I think we should also not forget that for the longest time, even before there was Google Cloud, there was App Engine that adopted the former approach: you simply write your handlers in Python, and the server provided by Google runs your handlers. Nowadays in the new Python App Engine runtime they added gunicorn by default so it's less clear.

I might be able to elaborate on this a bit. At $JOB, we're coming to the end of migrating a bunch of PHP services to Kubernetes from their various homes. I have seen the following configurations: Apache/mod_php, .htaccess disabled; Apache/mod_php, .htaccess required; Nginx/PHP-FPM colocated; Nginx/PHP-FPM using remote FastCGI.

We now only use one of these, and every other variant webserver configuration required a non-trivial amount of work to convert.

What does this have to do with clouds? In non cloud environments, deployments are pretty sticky - you put something in place on a VM and it stays there for an expected lifespan of perhaps a decade, so the fact that the webserver configuration might be hard to port to a different environment is not especially relevant. Conversely, it's easy enough to chop and change cloud environments (and the ability to do so is in some sense their USP!) that the portability issues of external Web servers come to the fore.

Primarily because instances are disposable. Getting more traffic to your Foo service than you expected? Spin up another Foo, and let the load balancer figure it out. Then, an hour later, when traffic goes back to normal, kill the new instance.

This is easier to do when your service is self-contained. It's certainly possible with an app server, but 12-factor apps tend to view the service as an atomic unit that can be started and stopped wholesale.

These are orthogonal. Have you used Google App Engine prior to Python 3? When you get more traffic, the App Engine spins up more instances, each running both the handlers you wrote and the server code you didn't write.

The Backing Services section overpromises. Changing databases isn't a simple configuration change. Not every database speaks the same SQL, and they don't share the same optimizations or configuration. Also, your organization probably won't be making that change, ever. Even if you could swap, you very likely won't.

This is similar to the claim that by using an ORM you can easily change databases, except that the ORM actually can achieve it because it abstracts away the differences. Yet this valuable feature very rarely gets used.

The backing services factor doesn't mean that you should expect to be able to hot-swap database types.

Rather, it means you should be able to "dependency inject" a compatible database into your app. Say your app is designed to speak Postgres - that Postgres db should be able to run locally, or on a managed cloud service, and the app shouldn't know or care which. You should be able to switch between such services without any code changes.

In practice, this sometimes does mean that multiple database implementations satisfy the interface expected by your app, and you can in fact hot-swap underlying stores. But the spirit of the factor is that the app should have its dependencies ("backing services") provided, rather than hard-coded.
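A minimal sketch of that "dependency injection" idea: the app reads one connection URL from its environment and never hard-codes where the database lives. The variable name `DATABASE_URL`, the default port and the example hosts are assumptions for illustration, not something the factor itself prescribes.

```python
import os
from urllib.parse import urlsplit


def database_settings(environ=os.environ):
    """Read connection details from DATABASE_URL (an assumed variable name)."""
    url = environ.get("DATABASE_URL")
    if not url:
        raise RuntimeError("DATABASE_URL is not set")
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port or 5432,  # assumed default: the usual Postgres port
        "user": parts.username,
        "password": parts.password,
        "dbname": parts.path.lstrip("/"),
    }


# Same code, two deployments: only the injected URL differs.
local = database_settings({"DATABASE_URL": "postgres://app:pw@localhost/appdb"})
managed = database_settings(
    {"DATABASE_URL": "postgres://app:pw@db.example.internal:6543/appdb"}
)
```

Switching from the local instance to the managed one is then a change to the deploy's environment, with no code change.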

Ah, that makes sense. Thanks for clarifying

Many organizations are in fact switching databases. It's usually a fairly complicated process, but for a surprising number of companies it makes sense to, e.g. move away from on-prem Oracle to a cloud DB.

Also worth reading is "Beyond the Twelve-Factor App" (O’Reilly). It goes into the 12 factors and expands them to 15 for cloud-native apps. I believe I got a free e-book via Pivotal's content library.

Beyond the 12 factor app:

(1) One codebase, one application

(2) API first

(3) Dependency management

(4) Design, build, release, and run

(5) Configuration, credentials, and code

(6) Logs

(7) Disposability

(8) Backing services

(9) Environment parity

(10) Administrative processes

(11) Port binding

(12) Stateless processes

(13) Concurrency

(14) Telemetry

(15) Authentication and authorization

More info: https://content.pivotal.io/blog/beyond-the-twelve-factor-app

> A better approach is to store configuration in environment variables.

It's about damn time, Google! I've been working on a number of GCP projects this year and have consistently bumped up against the awkwardness that was Google's (former?) preference for configuration files over environment variables.

This is the weakest or least well-defined aspect of 12factor.

How can you 'store' anything in something as ephemeral as environment variables? Where do these environment variables come from? When are they set?

You should of course store everything in version control; the point of this guideline is that the scope should not be the same as the app or an instance.

There is particular tension between this guideline and X (environments should be as similar as possible) and I (there should be a single codebase for all deployments).

Another option, rather than storing configuration in config files, is to store it in code. After experimenting with different approaches, storing configuration in code and switching on a single environment variable, ENV (or STAGE or whatever you like to call it), is probably my favourite.

This doesn't work for everyone's infrastructure setup, but it's worked well for mine.

A file that stores configuration is a configuration file, even if the encoding format happens to be the same as the main programming language and not toml or yaml :)

That works fine if there's a limited number of environments, but it becomes a pain if you're running the same app in too many, or if someone else needs to deploy it independently of the developers.

> it becomes a pain if you're running the same app in too many,

No it doesn't. It's trivial to generate the full set of configs using a script, if you have too many to manage manually.
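As a rough idea of what such a script could look like, here is a sketch that expands one shared base plus small per-environment overrides into a full `KEY=value` file per environment. The function name, keys and environment names are all made up for illustration.

```python
def render_env_files(base, overrides):
    """Return {environment: text of a KEY=value file} from base + overrides."""
    files = {}
    for env_name, extra in overrides.items():
        merged = {**base, **extra}  # per-environment values win over the base
        lines = [f"{key}={merged[key]}" for key in sorted(merged)]
        files[env_name] = "\n".join(lines) + "\n"
    return files


files = render_env_files(
    {"LOG_LEVEL": "info", "PORT": "8080"},
    {
        "eu-1": {"REGION": "eu"},
        "us-1": {"REGION": "us", "LOG_LEVEL": "debug"},
    },
)
```

Each environment only has to state what differs from the base, so adding another one is a single extra entry.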

> or if someone else needs to deploy it independently of the developers

And how exactly do envvars help with that? The set of envvars still needs to get pushed with each deployment. envvars are just a special config file.

Convenient enough for dynamic languages. Not so much compiled ones.

Also ... it's a bit circuitous.

> requires strict separation of config from code. Config varies substantially across deploys, code does not.

What's the difference between config and code? Config is that which changes between deployments / environments.

Config also benefits from review and version control, and it's inconvenient to use multiple systems for these. It's been working out well for us to store secrets encrypted with public key crypto alongside source, or in a deployment repository using the same source control. The private key is inserted into the infrastructure as a K8s secret.

One of the concepts twelve factor pushes is that you should be able to release your source code right now without compromising any credentials.

Basically, you'd have a git repo for code, and a separate git repo for environment scripts. I've seen organizations use a whole separate infrastructure for config, but that always seemed like overkill to me.

This is true with encrypted credentials; if security is compromised by release of the repo, then the encryption is not secure. I think the spirit of the principle is met, and in cases where you do later release the source code without having planned for it, it's easy enough to move the secrets.

It's definitely more convenient to use the same repo, especially in the declarative containerized world; you can make sweeping changes or roll them back in a single atomic commit.

That said, if open source is the intent, certainly use multiple repos from the beginning so you avoid git gardening when releasing later.

>How can you 'store' anything in something as ephemeral as environment variables? Where do these environment variables come from? When are they set?

Set at build time. They come from a build config file/script as part of the build. They're not perfect, but just about every tool ever made has a way of understanding environment variables - the same can't be said for any given config file.

Here's a real world example: I want one source of truth for my build configuration, but my build itself pulls together libraries and binaries in a number of different languages and build systems. How would you handle that with anything but environment variables and a master shell script?

> They come from a build config file/script as part of the build.

But the script is in VC surely?

> How would you handle that with anything but environment variables and a master shell script?

Again, where is that master shell script? What args do you give it?

My preference is that you have one environment variable, and your config is keyed off that.

If that environment variable is not supplied, fail noisily.
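A minimal sketch of that approach, assuming the variable is called `ENV` and that the per-environment values live in code; the environment names and settings below are hypothetical.

```python
import os

# Hypothetical per-environment settings, kept in code under version control.
CONFIGS = {
    "dev": {"debug": True, "api_base": "http://localhost:8000"},
    "staging": {"debug": False, "api_base": "https://staging.example.com"},
    "prod": {"debug": False, "api_base": "https://api.example.com"},
}


def load_config(environ=os.environ):
    """Pick the config named by ENV, failing noisily if it is missing or unknown."""
    name = environ.get("ENV")
    if name is None:
        raise RuntimeError("ENV is not set; refusing to guess an environment")
    if name not in CONFIGS:
        raise RuntimeError(
            f"unknown ENV {name!r}; expected one of {sorted(CONFIGS)}"
        )
    return CONFIGS[name]
```

Note there is deliberately no default: silently falling back to "dev" or "prod" is exactly the failure mode the noisy error is meant to prevent.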

Kubernetes assists with this aspect of passing environment variables to immutable containers. There you add the variables to the deployment code or edit them live on active deployments.

You can take it one step further and define a single ConfigMap resource per environment, containing all of the required variables across your application, and then have all the various containers reference it ‘blindly’ when they are deployed. Said configuration resource would exist in version control but deploy independently from the functional codebase(s).

Configuration isn't stored in environment variables. Rather, it is exposed as environment variables.

The idea is that the application environment / infrastructure should own the config data, not the application code itself. 12 factor practices require that devs draw a boundary between app and infrastructure.

I actually had this question earlier today.

Given a config file per 'environment', with environment being an arbitrary identifier, what benefit do environment variables have over it? I've read some docs on 12 factor apps looking for an answer but didn't really find anything. Just curious what I'm overlooking.

Env vars are easy and ubiquitous.

1. Bash, Python, C, Java...super easy to access; no libraries needed.

2. They automatically propagate through shell scripts and worker processes.

3. They are also super easy to specify when starting programs, in shell, docker, child process from programming languages, Lambda functions, etc.

4. They can easily exist independently of other env variables on the same system.

5. The model is dead simple. Key-value. Simplicity is a feature.

If you need to specify something complex (e.g. hundreds of tuned timeouts), consider using files or code for the heavy parts and env vars to select parts.
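Points 2 to 4 can be seen in a few lines. This is only a sketch: the child is a throwaway Python one-liner, and `COLOR` and `PORT` are example names. Each child inherits the parent's environment plus its own overrides, and the two children never see each other's values.

```python
import os
import subprocess
import sys

# Throwaway child program: print the two variables it was started with.
CHILD = "import os; print(os.environ.get('COLOR'), os.environ.get('PORT'))"


def run_child(overrides):
    """Run a child process with the parent's environment plus overrides."""
    env = {**os.environ, **overrides}
    done = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return done.stdout.strip()


blue = run_child({"COLOR": "blue", "PORT": "123"})
red = run_child({"COLOR": "red", "PORT": "456"})
```

Nothing is written to disk and the parent's own environment is left untouched, which is most of the appeal.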

Thanks! I had thought of about 3 of those, but they aren't really wins for most of my deployment environments (and some of the downsides still are major downsides for me; one thing I love about config files is I can provide a readme or schema in the same place to define what all the config params are and what they do).

For #4, what do you mean? If running in a container it's not really an issue, but if I'm loading a couple of different dependencies/apps in the same container, or running without containers, name collisions have always been a concern. Set up a chroot, or start renaming things (i.e., "PORT" conflicts really quickly). Or are you saying supply them in the shell, or...?

> For #4, what do you mean?

I mean that one process can have COLOR=blue and another can have COLOR=red with no conflict.

    PORT=123 COLOR=blue ./myapp
    PORT=456 COLOR=red ./myapp

Yes, you can chroot, run in containers, use only relative file paths, etc.

There aren't any irremovable differences between env vars and files; env vars are basically /proc/ files with widespread runtime support.

Env vars are universally supported, globally available, and simple to manage. Files mean yet another deployment artifact that has to be stored somewhere, and now tied to a location.

Sometimes that location is communicated through an env var, and at that point, why not just set all config that way?

If you have something like k8s, it becomes trivial to use files instead of env vars.

How so? You still have to edit the file, save it somewhere, then mount it as a volume and agree on a location.

The configuration file can be inline in the yaml, ConfigMaps take care of shuffling bytes around for you, and for a specific app its config can always be at a fixed path.

As long as your (production) config files are not checked in with the code, then little difference. The challenge then is: where do you store them?

Environment variables had the early upside of being first class in virtually any deployment strategy, from Bash wrappers and supervisord to Docker and Kubernetes, so it's pretty much plug and play. For example, Docker has config files, but they're mounted to a specific place in your container, so that means your application is now married to that location.

Hmm, none of the 12 factors is really specific to the cloud. I mean, OK, the implementation of some of the things might differ slightly. E.g. you should still abstract over storage, but it need not be an external service, as that additional external dependency is not always necessary.

It's kinda funny how the cloud forces people to finally use patterns that have been recommended for years.

I wonder if anyone else does this. I have a private, env-var only repo per project that I very simply import (git clone) while building. I'm sure there are some risks like possible leakage if a team member's github account is compromised but it seems like a very nice, clean way of managing secrets. I even get the benefit of version control.

We definitely avoid this approach. Fine grained access control per secret per environment is impossible. We use explicit secret management software.

"We use explicit secret management software"

This is interesting. Can you go into more detail?


Fairly easy to integrate with Kubernetes too.

It's interesting to see 12-factor popping up today. We used the 12-factor app development methodology at one of my clients, and they had good documentation about it. This was in 2015-2016 and I believe it is still relevant.
