Fun fact: I debated whether to call it "the Heroku way" or somesuch. Glad I went with a standalone name, feel like that allowed it to take on a life beyond that product. For example I doubt Google would have wanted a page about "Heroku Way app development on GCP" in their documentation. :-)
I agree, I think the fact that it's not called "the [platform/software] way" is a big factor in its success - it's easier to quote publicly and to external clients when it's seen as a more agnostic set of principles.
Out of interest, in hindsight do you think that any of the 12 factors could be tweaked/improved (since it was originally written)?
Again, like 12factor, it's a collection of themes from around the internet but it also has stood the test of time.
In fact, my startup EnvKey was heavily inspired by the 12 factor approach. While it has always worked well for me, one bit that always felt thorny was using the environment for configuration and secrets. It’s obviously great to get this stuff out of code, but then you face new issues on how to keep it all in sync across many environments and pass potentially highly sensitive data around securely.
EnvKey fills in this gap using end-to-end encryption and a seamless integration that builds on top of environment variables—it’s a drop-in replacement if you already use the environment for config. Check it out if you’re looking for something to smooth out this aspect of 12 factor! We have lots of folks using it with GCP/GKE.
1 - https://www.envkey.com
I wanted to put secrets as env vars for my ECS containers. There wasn't an easy way to do that. I wound up writing an entrypoint inside the container that would download secrets from AWS SSM and expose them to application code via env vars.
Then AWS ECS released the feature to specify env vars as SSM values natively from the ECS agent.
It felt good knowing someone else saw this as the "right" way too.
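The entrypoint pattern described above can be sketched roughly like this. This is a hypothetical, dependency-free version: a real one would fetch decrypted parameters from AWS SSM (e.g. via boto3), whereas here a JSON file stands in for that call, and the `SECRETS_FILE` variable and path are made up for illustration:

```python
import json
import os
import sys

def load_secrets(path):
    """Stand-in for fetching decrypted parameters from AWS SSM."""
    with open(path) as f:
        return json.load(f)  # e.g. {"DB_PASSWORD": "..."}

def main(argv):
    secrets = load_secrets(os.environ.get("SECRETS_FILE", "/run/secrets.json"))
    env = {**os.environ, **secrets}
    # Replace this process with the real application, secrets injected
    # as env vars -- the app itself never knows where they came from.
    os.execvpe(argv[0], argv, env)

if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

Used as the container entrypoint (`ENTRYPOINT ["python", "entrypoint.py"]`), the application command becomes the arguments, and the app reads its secrets from the environment as usual.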
I will say that Factor 9 (Disposability) seems rarely followed in many applications, SIGTERM is not handled gracefully.
This has the added benefit of making the datacenters easier to manage. Say you have a bunch of workloads packed into racks of servers. Say one of those racks needs an electrical upgrade or hardware replacement. If all programs are preemptible by design, you can just tell cluster management software (Borg) to kill them and restart the tasks elsewhere. Or, in fact, you could even just pull the plug on the rack without telling Borg to do anything. Because workloads are spread out between fault domains, only a small fraction of tasks gets restarted elsewhere, and a properly designed system will not corrupt data or even let the external users know that anything happened.
Yes, you need to design your code to withstand disaster shutdown SIGKILL type situations. That doesn't mean you get to ignore SIGTERM.
The vast majority of shutdowns are due to routine maintenance events. If you get SIGTERM-ed and all you did was crash, here's a short list off the top of my head of bad things that can and do result:
* Tail latency goes up, because clients talking to tasks that don't shut down gracefully have to wait for at least one RPC timeout - possibly more if channel timeout is longer - before they retry elsewhere. (This will manifest in multiple ways, because things like load balancers will also be slow to respond.)
* System contention goes up - if your server was holding any kind of distributed lock (e.g. DB write) when it went down, everyone else needs to wait for that lock to time out before someone else can take it. (Hopefully your locks require keep-alives to hold them!)
* Corruption of data in transit - a crashing binary is basically a big ol blob of UB. With enough replication and checksumming you can mitigate this, but it doesn't mean you get to do dangerous things now! Guardrails only work if you don't take a sledgehammer to them.
* I really, really hope you don't have any request affinity in your system, because if you do, now your caches are empty, your dependencies are going to see you doing expensive ops much more frequently, and so on. (And if you're a streaming system, well now you're just all kinds of screwed.)
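The keep-alive point above can be illustrated with a toy lease (names and the TTL mechanics are my own simplification, not any particular lock service's API):

```python
import time

class Lease:
    """Toy distributed-lock lease: the holder must renew within ttl
    seconds, otherwise the lock becomes free -- so a crashed holder
    only blocks everyone else for at most ttl, not forever."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.holder = None
        self.expires_at = 0.0

    def acquire(self, who, now=None):
        now = time.monotonic() if now is None else now
        if self.holder is None or now >= self.expires_at:
            self.holder = who
            self.expires_at = now + self.ttl
            return True
        return False

    def renew(self, who, now=None):
        """The keep-alive: only the current holder can extend the lease."""
        now = time.monotonic() if now is None else now
        if self.holder == who and now < self.expires_at:
            self.expires_at = now + self.ttl
            return True
        return False
```

If the holder crashes instead of releasing, its renewals stop and the lock self-heals after one TTL, which is exactly the "locks require keep-alives" property the parent is hoping for.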
That's not to say that you can't minimize rather than solve; it's (IMO) easier and faster to introduce minimizing mitigations than wholesale solutions to failure, and time is money. But the reality is that these are just short-term mitigations that you'll outgrow as you scale.
I will reiterate. If your programs are designed to be kill-safe, it's a waste of time to shut them down any other way. It is also harmful to your systems design, because you can't guarantee that your hardware won't give out, especially if you run on e.g. 50-100K cores (as many Google services do). You can basically be certain at that point that individual tasks (and hardware they run on) will die from time to time. Note that this only applies to shut down, not to e.g. draining traffic. That part can and should still be orderly in most systems so as not to disrupt the user experience too much, and Google does have that in their RPC stack (lame duck mode etc). For everything else you end up doing a lot of logging, checkpointing, and 2PC. For distributed systems you end up using consensus mechanisms and eliminating SPOFs.
If you rely on orderly shutdown for correctness in your distributed system, I'd like to know the name of your product so I can avoid it.
Nothing the parent is discussing is regarding correctness, so I think your last sentence is a bit uncalled for.
Personally I think a really good debate was forming here. As you said, if you drain the node's traffic before killing it, you're probably right: any costs associated with maintaining consistency are probably saved by the human aspect of just not waiting for a clean shutdown of processes at scale.
But when you throw in the sass, people stop listening and we all get dumber for it.
Here's one from Netflix that will give you an ulcer: https://github.com/Netflix/chaosmonkey
Here's what Google does: https://www.usenix.org/conference/lisa15/conference-program/...
>> we all get dumber for it
Not _all_. Only those who feel inclined to reject the obvious.
FWIW, I work at Google on distributed systems.
Not SIGTERM per se, but Google Cloud certainly offers graceful shutdown rather than pulling the plug, to minimize harmful effects.
I don't think so... the 12 factor app existed before that, but came into more common usage during the rise of Rails (and Heroku).
This comment stood out to me. Why does this have anything to do with cloud vs non-cloud?
Furthermore, I think there are merits to both approaches. But the article phrased it as if one is clearly superior to the other.
I think we should also not forget that for the longest time, even before there was Google Cloud, there was App Engine that adopted the former approach: you simply write your handlers in Python, and the server provided by Google runs your handlers. Nowadays in the new Python App Engine runtime they added gunicorn by default so it's less clear.
We now only use one of these, and every other variant webserver configuration required a non-trivial amount of work to convert.
What does this have to do with clouds? In non cloud environments, deployments are pretty sticky - you put something in place on a VM and it stays there for an expected lifespan of perhaps a decade, so the fact that the webserver configuration might be hard to port to a different environment is not especially relevant. Conversely, it's easy enough to chop and change cloud environments (and the ability to do so is in some sense their USP!) that the portability issues of external Web servers come to the fore.
This is easier to do when your service is self-contained. It's certainly possible with an app server, but 12-factor apps tend to view the service as an atomic unit that can be started and stopped wholesale.
This is similar to the claim that by using an ORM you can easily change databases, except that the ORM actually can achieve it, because it abstracts away the differences. Yet this valuable feature very rarely gets used.
Rather, it means you should be able to "dependency inject" a compatible database into your app. Say your app is designed to speak Postgres - that Postgres db should be able to run locally, or on a managed cloud service, and the app shouldn't know or care which. You should be able to switch between such services without any code changes.
In practice, this sometimes does mean that multiple database implementations satisfy the interface expected by your app, and you can in fact hot-swap underlying stores. But the spirit of the factor is that the app should have its dependencies ("backing services") provided, rather than hard-coded.
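A small sketch of what "dependency inject a compatible database" can look like in practice. The `DATABASE_URL` convention and the scheme dispatch are common patterns rather than anything mandated by 12factor; SQLite stands in here so the sketch stays dependency-free:

```python
import os
import sqlite3
from urllib.parse import urlparse

def connect(url=None):
    """Attach whatever database the environment provides; the app
    doesn't know or care whether it's local or a managed service."""
    url = url or os.environ["DATABASE_URL"]
    parsed = urlparse(url)
    if parsed.scheme == "sqlite":
        # "sqlite://" -> in-memory, "sqlite:///app.db" -> file
        return sqlite3.connect(parsed.path.lstrip("/") or ":memory:")
    if parsed.scheme == "postgres":
        # e.g. psycopg2.connect(url) -- omitted to keep this dependency-free
        raise NotImplementedError("wire up your Postgres driver here")
    raise ValueError(f"unsupported backing service: {parsed.scheme}")
```

Switching from a local database to a cloud-managed one is then a deploy-time change to one env var, with zero code changes.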
(1) One codebase, one application
(2) API first
(3) Dependency management
(4) Design, build, release, and run
(5) Configuration, credentials, and code
(8) Backing services
(9) Environment parity
(10) Administrative processes
(11) Port binding
(12) Stateless processes
(15) Authentication and authorization
More info: https://content.pivotal.io/blog/beyond-the-twelve-factor-app
It's about damn time, Google! I've been working on a number of GCP projects this year and have consistently bumped up against the awkwardness that was Google's (former?) preference for configuration files over environment variables.
How can you 'store' anything in something as ephemeral as environment variables? Where do these environment variables come from? When are they set?
You should of course store everything in version control; the point of this guideline is that the scope should not be the same as the app or an instance.
There is particular tension between this guideline and X (environments should be as similar as possible) and I (there should be a single codebase for all deployments).
This doesn't work for everyone's infrastructure setup, but it's worked well for mine.
That works fine if there's a limited number of environments, but it becomes a pain if you're running the same app in too many, or if someone else needs to deploy it independently of the developers.
No it doesn't. It's trivial to generate the full set of configs using a script, if you have too many to manage manually.
or if someone else needs to deploy it independently of the developers
And how exactly do envvars help with that? The set of envvars still needs to get pushed with each deployment. envvars are just a special config file.
> requires strict separation of config from code. Config varies substantially across deploys, code does not.
What's the difference between config and code? Config is that which changes between deployments / environments.
Basically, you'd have a git repo for code, and a separate git repo for environment scripts. I've seen organizations use a whole separate infrastructure for config, but that always seemed like overkill to me.
It's definitely more convenient to use the same repo, especially in the declarative containerized world; you can make sweeping changes or roll them back in a single atomic commit.
That said, if open source is the intent, certainly use multiple repos from the beginning so you avoid git gardening when releasing later.
Set at build time. They come from a build config file/script as part of the build. They're not perfect, but just about every tool ever made has a way of understanding environment variables - the same can't be said for any given config file.
Here's a real world example: I want one source of truth for my build configuration, but my build itself pulls together libraries and binaries in a number of different languages and build systems. How would you handle that with anything but environment variables and a master shell script?
But the script is in VC surely?
> How would you handle that with anything but environment variables and a master shell script?
Again, where is that master shell script? What args do you give it?
My preference is that you have one environment variable, and your config is keyed off that.
If that environment variable is not supplied, fail noisily.
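A sketch of that approach, assuming a hypothetical `APP_ENV` variable and made-up per-environment values:

```python
import os

# All per-environment config keyed off a single variable.
CONFIGS = {
    "dev":  {"db_host": "localhost",   "log_level": "DEBUG"},
    "prod": {"db_host": "db.internal", "log_level": "WARNING"},
}

def load_config():
    env = os.environ.get("APP_ENV")
    if env is None:
        # Fail noisily: better to crash at startup than run misconfigured.
        raise RuntimeError("APP_ENV is not set; refusing to guess")
    if env not in CONFIGS:
        raise RuntimeError(f"unknown APP_ENV: {env!r}")
    return CONFIGS[env]
```
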
You can take it one step further and define a single ConfigMap resource per environment, containing all of the required variables across your application, and then have all the various containers reference it ‘blindly’ when they are deployed. Said configuration resource would exist in version control but deploy independently from the functional codebase(s).
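In Kubernetes terms, that setup looks roughly like this (the names and values are placeholders):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-env            # one ConfigMap per environment
data:
  DATABASE_HOST: db.staging.internal
  LOG_LEVEL: debug
---
# In each Deployment's container spec, pull in the whole map "blindly":
#   envFrom:
#     - configMapRef:
#         name: app-env
```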
The idea is that the application environment / infrastructure should own the config data, not the application code itself. 12 factor practices require that devs draw a boundary between app and infrastructure.
Given a config file per 'environment', with environment being an arbitrary identifier, what benefit do environment variables have over it? I've read some docs on 12 factor apps looking for an answer but didn't really find anything. Just curious what I'm overlooking
1. Bash, Python, C, Java...super easy to access; no libraries needed.
2. They automatically propagate through shell scripts and worker processes.
3. They are also super easy to specify when starting programs, in shell, docker, child process from programming languages, Lambda functions, etc.
4. They can easily exist independently of other env variables on the same system.
5. The model is dead simple. Key-value. Simplicity is a feature.
If you need to specify something complex (e.g. hundreds of tuned timeouts), consider using files or code for the heavy parts and env vars to select among them.
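The "files for the heavy parts, env vars to select" idea in miniature (the `TUNING_PROFILE` variable and the `tuning/` directory layout are assumptions for the sake of the example):

```python
import json
import os

def load_tuning(profile=None):
    """Heavy config (e.g. hundreds of timeouts) lives in versioned
    files; a single env var just picks which file applies."""
    profile = profile or os.environ.get("TUNING_PROFILE", "default")
    with open(f"tuning/{profile}.json") as f:
        return json.load(f)
```

The env var keeps its dead-simple key-value role, and the complex data stays in version control where it can be diffed and reviewed.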
For #4, what do you mean? If running in a container it's not really an issue, but if I'm loading a couple of different dependencies/apps in the same container, or running without containers, name collisions have always been a concern. Set up a chroot, or start renaming things (i.e., "PORT" conflicts really quickly). Or are you saying supply them in the shell, or...?
I mean that one process can have COLOR=blue and another can have COLOR=red with no conflict.
PORT=123 COLOR=blue ./myapp
PORT=456 COLOR=red ./myapp
There aren't any irremovable differences between env vars and files; env vars are basically /proc/ files with widespread runtime support.
Sometimes that location is communicated through an env var, and at that point why not just set all config that way?
Environment variables had the early upside of being first class in virtually any deployment strategy - from Bash wrappers and supervisord to Docker and Kubernetes - so it's pretty much plug and play. For example, Docker has config files, but they're mounted to a specific place in your container, so that means your application is now married to that location.
It's kinda funny how the cloud finally forces people to use patterns that have been recommended for years.
This is interesting. Can you go into more detail?
Fairly easy to integrate with Kubernetes too.