
Doing Python Configuration Right (2019) - whalesalad
https://whalesalad.com/blog/doing-python-configuration-right
======
dashwav
I recently ran into a few roadblocks as well when trying to get my python
applications set up in a true 12factor way, and I was surprised by how much I
had begun to take for granted the ease of configuration that the Go library
Viper provided. I went ahead and wrote a library
[Gila](https://gitlab.com/dashwav/gila)
that implements a similar set of features that I think does a very good job of
allowing for 12factor app building in python.

Specific to the article, my way of handling the issue of "turn everything into
an environment variable" is to rely on the cascading feature of Gila/Viper and
store sane defaults in code that are only overridden when needed. This allows
default values to be set, then overridden by any config files provided, and
further overridden either in code or by ENV variables.

Instead of having config written in Python (something I usually try to stay
away from), this lets us write it in something like YAML or TOML and load it
on the fly in each environment, while still allowing for overrides either in
the code or by ENV variables.

I recently (last few months) have switched all of my python projects over to
using Gila and I have been super impressed by how well the language itself
lends to building dockerized 12factor apps when using tools made for that
purpose - which is a sentiment I would not have had last year at this time.

[https://gitlab.com/dashwav/gila](https://gitlab.com/dashwav/gila)

[https://gila.readthedocs.io/en/latest/index.html](https://gila.readthedocs.io/en/latest/index.html)

I'd appreciate any feedback that either the author or HN in general might
have as well.

~~~
notdonspaulding
Just lightly skimmed the docs, so I am probably missing something. Do all the
consumers of gila config use the singleton class instance and just call
gila.get() everywhere?

If I'm just focusing on how to make my code easier to read by my fellow
developers, I would want to see something like this be based off of a
dictionary-style API, that could be imported and used like so:

    
    
        import gila as config
        bucket_name = config["bucket_name"]
    

Since .get() is already so close to this API, I wonder if you considered this
and rejected it for a specific reason?
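For what it's worth, the dictionary-style API being suggested can be bolted
onto any `.get()`-based config object with a thin wrapper. The backend below
is a made-up stand-in, not anything Gila actually ships:

```python
class DictStyleConfig:
    """Wrap any config object exposing .get(key) so callers can write
    config[key] instead."""

    def __init__(self, backend):
        self._backend = backend

    def __getitem__(self, key):
        value = self._backend.get(key)
        if value is None:
            # Preserve dict semantics: a missing key raises KeyError
            # instead of silently returning None.
            raise KeyError(key)
        return value

class FakeBackend:
    """Stand-in for a Gila-like singleton; not Gila's real API."""
    def get(self, key):
        return {"bucket_name": "my-bucket"}.get(key)

config = DictStyleConfig(FakeBackend())
config["bucket_name"]  # -> "my-bucket"
```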

~~~
dashwav
First off, thank you for looking at the project and giving feedback - I really
appreciate it!

> Do all the consumers of gila config use the singleton class instance and
> just call gila.get() everywhere?

You can either use the singleton pattern (which is the recommended way of
utilizing the library) or you can assign the Gila object to a local variable
and ship it around your code (this would allow you to have two separate
configurations at the same time). You could see that here:
[https://gitlab.com/dashwav/gila/-/blob/develop/examples/multi-file-example/example.py#L39](https://gitlab.com/dashwav/gila/-/blob/develop/examples/multi-file-example/example.py#L39)

I also think striving for readability is the way to go when building code in
general, and that is actually one of the reasons I opted to keep the `.get()`
syntax! This may come down to personal opinions but I think that when the
behavior of a library is fundamentally different than that of a base type like
dict it should be explicit that a library is being used.

While dictionaries also have a `.get()` method, I think that seeing
`config['value']` would lead a developer to assume that config is simply a
dictionary, which might lead to erroneous code, whereas `gila.get("value")`
(or even `config.get('value')`) makes it more obvious that this is not a
dictionary but a library that is doing more than a dictionary under the hood.

As a final point this syntax keeps it closer to the Viper go library that I
was inspired by and therefore eases the transition between the two should
anyone have familiarity with either library.

------
zomglings
As someone who has been on call for applications with such complicated
configuration logic, this looks like a treasure chest of future frustration.

One reason that 12 factor apps use environment variables to store
configuration is because then you have a single source of truth on how your
application is configured.

Using the principles of the 12 factor app, someone who had no hand in writing
the code doesn't have to wrap their heads around whatever convoluted logical
path your application takes to figuring out which Redis instance it will
connect to, especially during an outage in which their priority is to fix your
service as quickly as possible. They can in an emergency SSH into whatever
machine is hosting your application or exec into the docker container and
inspect the environment to have an almost complete picture of how your
application is configured.

The "globals().update(vars(module))" line actually increased my blood
pressure. Please show some consideration to your teammates that help you keep
your applications running.

Have had nothing but bad experiences keeping apps running that used dotenv in
node land and viper in go land. The 12 factor app people did not speak
lightly.

~~~
whalesalad
I agree with you 100%. There is nothing stopping you from using environment
variables in these configuration files - I do it all the time, in fact. If
you scroll down to the real world examples I share a few, one of which is for
connecting to Redis (grep the page for os.environ).

This approach is not mutually exclusive with 12-factor. It is complementary to
12-factor.

> They can in an emergency SSH into whatever machine is hosting your
> application or exec into the docker container and inspect the environment to
> have an almost complete picture of how your application is configured.

Totally agree with this too. I build apps with ops in mind and this specific
realtime debug ability.

> The "globals().update(vars(module))" line actually increased my blood
> pressure.

Being a little dramatic here, aren't we? Please elaborate on what is wrong
here. I have been writing Python professionally for approaching 12 years, so I
am very interested. It is certainly not something you would see every day, but
this is not an area of the codebase you visit on a daily basis. Also, as a
sidenote, I would strongly encourage you to look under the hood at some of
your favorite/popular Python libraries. You'll vomit out your nose.

> Please show some consideration to your teammates that help you keep your
> applications running.

I forklifted FarmLogs from Heroku to a homebrew Kube cluster in 2016, led
operations and support with that tool (40 microservices around the time I
left) all while continuing to be an IC and ultimately became the CTO. I have
an extreme amount of consideration for others when building software (almost
to a fault). Clearly I have some work to do on my writing.

> Have had nothing but bad experiences keeping apps running that used dotenv
> in node land and viper in go land.

Yes. IMHO viper and dotenv are strictly hobby tools that should be reserved
for projects that involve a single developer and a single environment. I
agree wholeheartedly that they encourage bad practices and make things tough
to debug and trace.

I cannot tell you how many times I had to matrix my way into a running
container and try to figure out what the fuck was going wrong. Let me tell you
... when you have a microservice that depends on 50 environment variables to
_boot_ you are not going to be happy. If you can shove as much of that INTO
the application as possible (without handicapping yourself or your fellow team
mates in the production environments, losing flexibility in deployment target,
etc..) why would you not want to do it? That is ultimately what I want people
to gain from this.

~~~
zomglings
Sorry if I was abrasive in my response. Let me address two of the points here:

>> The "globals().update(vars(module))" line actually increased my blood
pressure.

> Being a little dramatic here, aren't we? Please elaborate on what is wrong
> here.

and

>> Please show some consideration to your teammates that help you keep your
applications running.

The reason I don't like this is that it first dynamically imports the right
module and then implicitly sets global variables from that module using
vars(). This means that someone can't use simple greps or similar searches to
trace the logic through which a parameter was set (a go-to tool when
debugging code someone else put into a docker container).
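For readers who haven't seen the article, the pattern being criticized looks
roughly like this. The module below is a self-contained stand-in for a real
config file on disk, and the names are illustrative:

```python
import importlib
import sys
import types

# Stand-in for a real config/development.py file on disk, so this
# sketch is self-contained.
dev = types.ModuleType("config.development")
dev.REDIS_HOST = "localhost"
sys.modules["config.development"] = dev

# In the article, this value comes from os.environ["ENV"].
ENV = "development"

module = importlib.import_module(f"config.{ENV}")

# The criticized line: every name the chosen module defines is splatted
# into globals().  No line in this file ever reads "REDIS_HOST = ...",
# which is exactly why grep-based tracing comes up empty.
globals().update(vars(module))

REDIS_HOST  # defined here now, with no greppable assignment
```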

- - -

> Clearly I have some work to do on my writing.

Your writing is good. I disagree with your approach to solving the problem.
Willing to accept that my position is wrong, but am reluctant to do so since
it's built on bitter experience.

------
jbergknoff
> One of my favorite Python features is the way that the files and directories
> your application is made of map one-to-one with how you import and use them
> in code.

Funny to see this stated explicitly in this way. In my opinion, this is one of
Python's biggest flaws (I'm a big fan of everything but the module system).
Paths to files on disk should be treated as string literals, not magic
unquoted strings that look like they're language keywords.

You can easily end up in situations where a directory in your project's tree
has a name conflict with some library, and this causes issues, which is mind-
bogglingly bad design (incidentally: also not a one-to-one map). If you don't
live and breathe Python, and accidentally put a hyphen in a filename, God help
you.

~~~
earthboundkid
There's also the thing where you used to need to have a magic __init__.py
file for a folder to be considered a module, but now it's optional, although
I still have no idea what the formula is for when Python decides something is
a module…

~~~
globular-toast
It's not really optional. There's just this new thing called namespace
packages which is not quite the same. I don't fully understand what it's for
yet.

~~~
greenshackle2
Namespace packages let you ship `foo.bar` and `foo.baz` separately. `bar` and
`baz` are really two separate packages that are both part of the `foo`
namespace. `foo` is not a real package.

They are not new but _implicit_ namespace packages are new-ish. Before 3.3 it
was more complicated to set up, now you just omit __init__.py from `foo`.

I don't see any reason to use this if `foo` is shipped as a single package.
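This can be demonstrated end to end with the implicit namespace packages of
Python 3.3+. The `foo`/`bar`/`baz` layout below is built in temp directories
purely for illustration:

```python
import os
import sys
import tempfile

# Two independent distribution roots, each shipping one half of the
# hypothetical `foo` namespace.  Note: no __init__.py anywhere.
root_a = tempfile.mkdtemp()
root_b = tempfile.mkdtemp()
os.makedirs(os.path.join(root_a, "foo"))
os.makedirs(os.path.join(root_b, "foo"))

with open(os.path.join(root_a, "foo", "bar.py"), "w") as f:
    f.write("WHO = 'bar'\n")
with open(os.path.join(root_b, "foo", "baz.py"), "w") as f:
    f.write("WHO = 'baz'\n")

sys.path[:0] = [root_a, root_b]

import foo.bar
import foo.baz

# `foo` itself is not a real package: its __path__ spans both roots.
print(foo.__path__)
print(foo.bar.WHO, foo.baz.WHO)
```

With an `__init__.py` in either `foo/` directory, that directory would become
a regular package and shadow the other half, which is the behavioral
difference from the pre-3.3 world.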

~~~
earthboundkid
If the implementation is hard to explain, it's a bad idea.

If the implementation is easy to explain, it may be a good idea.

------
agounaris
In a cloud native world, code should not be aware of the environment. I don't
see a reason to have multiple files per env rather than just different
configuration. This kind of creates a mental model that works against true
configurability.

The solution with multiple files actually goes against the 12 factor app:

"Sometimes apps batch config into named groups (often called “environments”)
named after specific deploys, such as the development, test, and production
environments in Rails. This method does not scale cleanly"

~~~
whalesalad
Nothing is black and white. There is a multi-dimensional scale that your
application will land on. The dimensions might be static (yaml) versus dynamic
(py) or all data stored outside the app (ENV vars only or similar, devout
adherence to the 12 factor religion) versus storing config 100% in your app.

If you plotted those axes on a four quadrant grid you'd be able to draw a
little dot somewhere on it for your particular app. Your app is not my app. My
apps are not always the same, either. My personal blog for instance has a far
simpler configuration than the tools I build for my clients.

"In a cloud native world, code should not be aware of the environment."

This is a dogmatic fallacy. There are tons of reasons code should be aware of
its environment. The problem is that most people are unable to agree on what
"aware of its environment" actually means. I do not agree that my code should
not know what the staging or production environment is. This is _literal_
devops! Developing and building your application/system with the logistics of
shipping it in mind.

When the dust has settled from configuration arguments the most important
thing you can do when designing your application is to first replace the word
"application" with the word "system" in your mind and begin to think about it
as a system. The system contains subsystems: config, persistence,
presentation, blob storage, logging, etc... and all of these are self-
contained and isolated units that start and stop for their own unique reasons
during the app/system lifecycle. If you design things to allow for this
decoupling, the right approach for configuration will manifest itself.

------
clawlor
Author claims frustration with Django's settings, only to re-implement it
almost exactly. How is ENV=production any different from
DJANGO_SETTINGS_MODULE=myproject.settings.production?

Sure, by default Django may create a single settings.py, but common practice
is to split that into a settings package containing a base settings module
for common settings, plus other modules for different scenarios, say
"development" and "production". Any or all of them can load secrets etc. from
the environment, so the "production" settings file is probably better thought
of as "deployment" settings when supporting multiple deployed environments,
e.g. staging and production.

~~~
whalesalad
I don't like to build applications that are 100% coupled to a framework. In my
ideal world, Django (if it is being used) is just one puzzle-piece of my
application.

Then again it will depend greatly on the surface area of your application. A
small app will do fine with a single configuration file or the DJANGO_
approach, but a larger application deserves its own config.

------
jcrawfordor
Applications that load their configuration differently depending on the
"dev/staging/prod" environment are one of my greatest enemies. It's a
bizarrely common pattern that creates a whole level of confusion,
miscommunication, and errors in figuring out the configuration of applications
running on developer computers, in testing, and in production. Even better,
since one of the sets of configuration is usually obtained from file or env
vars, there's no clear advantage over not having this switch at all. So you
sometimes end up with one of the environments used in all cases, often "dev"
because that's the one the developers started using.

Your configuration paradigm should be: simple. Environment variables are
probably preferable because they're widely supported by different tools (on
account of being a "12 factor app" thing for whatever that's worth), but I'd
say a file is also just fine, so long as it's easy to figure out where it's
being loaded from (I don't want to see any "configuration is merged from files
found at the following five paths and then overridden with environment
variables..."). The chief concern in operations is being able to quickly and
easily determine the set of configuration in use, and strangely enough,
developers appreciate this too.

The overhead saved by having some "dev/prod" switch or override scheme is
really not overhead at all, if you set one environment variable in your
dev/test/prod infrastructure you can set ten.

~~~
whalesalad
> It's a bizarrely common pattern that creates a whole level of confusion,
> miscommunication, and errors in figuring out the configuration of
> applications running on developer computers, in testing, and in production.

This stems from not having one person or one team design the entire
configuration system that will support the entire application.

All the mayhem you are referring to is certainly possible. I have experienced
it too. But I would posit that this is not due to having 'named environments'
but rather just a poor implementation with no leadership.

~~~
jcrawfordor
I would disagree with you, in that I encounter this exact problem with first-
party software on a team of three. Yes, anyone with total knowledge of the
configuration system can figure out the configuration state, but my point is
that it should be as easy as possible without any complications - because in
practice, even if "everyone knows everything" many mistakes and
inconsistencies are introduced.

~~~
whalesalad
This is one of those things that is tough to discuss without really seeing
code. We might be talking about two different things right now. I can
certainly understand the issue you are discussing, but I feel like I have been
able to avoid it with this approach.

Another thing that I did not put in the post but that I tend to add in to
this concept is a 'dump my env' function. I will occasionally bundle this up
and write it out when certain relevant issues or crashes happen.

    
    
        def get_snapshot():
            """
            Provides a snapshot of the current system configuration.

            `module` here is the config module itself; `filter_dict` and
            `exclude_predicate` are helpers (defined elsewhere) that drop
            unwanted entries such as dunder names.
            """
            return filter_dict(vars(module), exclude_predicate)
    

Very handy. From my ipython shell I can just hit config.get_snapshot() and
understand the state of the world.

------
jperras
I've been using Dynaconf:
[https://github.com/rochacbruno/dynaconf](https://github.com/rochacbruno/dynaconf)
for application configuration, and it's been quite pleasant.

It has various options for cascading configuration sources, such that it makes
it relatively straightforward to build a hybrid approach in the event that
you're attempting to migrate from a file-based config system to a more modern
12-factor approach.

At one point I needed to write a custom config source for Amazon Secrets
Manager, and it proved quite easy to implement. Perhaps I should open source
that…

------
vkgfx
The whole "env is figured out behind the scenes and work is done to autofetch
the correct values to load into the CONSTANT_CASE variables you import all in
one line" thing seems to be a pretty clear violation of "Explicit is better
than implicit."

In my experience the productivity you gain from such abstractions is
negligible compared to the complexity they add to your code.

------
m23khan
If there's one thing I've found about working efficiently with Python
development and execution environments, it's that you should plan for
configuration from day 1 to accommodate items like multiple Python versions
and to have a well defined idea of what you need (e.g. is a python + pip
installation enough, or do you want to utilize anaconda?).

------
saco
I don't get how this can be used anywhere except when you have 2 deployment
targets which seems really limited.

Also checking for environments is done weirdly, why not use

    
    
        if 'ENV' not in os.environ:
    

instead of

    
    
        ENV = os.environ.get("ENV", None)
    
        if ENV is None:
    
~~~
karlicoss
Maybe just a habit, that way you don't have to repeat the literal "ENV" twice.
In this case it doesn't matter much though.

Also it's a bit better from the performance perspective (but not that it also
matters in this case).

~~~
bluntfang
>(but not that it also matters in this case)

Then why mention it?

------
skohan
The python environment story has led me to the conclusion that it's not really
suited for serious production software.

I was recently onboarded into a project which is using Python in a server-less
setting, and when my colleague walked me through the concept of setting up a
virtual environment for each one of my python projects I could not really
believe it. It seems like you practically need to containerize a python code-
base to make it repeatable.

~~~
mumblemumble
I used to think that, and then I realized that this is really just Python
showing its age as a relatively old programming language with Unix roots.
C and C++ have a fairly similar problem with dependency hell due to shared
libraries being installed in a single system-wide location. They have their
own hacks for dealing with it. I don't know which way of doing it is
technically superior; all I know is none of them offer a developer experience
that I'd choose for myself if I were designing a new programming language in
this day and age.

FWIW, I give Python some credit for coming up with a solution that doesn't
rely on containerization, and therefore will work consistently on many OSes,
and not just Linux. Containerization is an option, too. It's really down to
whether you need a system that works for one language on many OSes, or one
that works for many languages on one OS.

Or, if you'd rather keep it really simple, there are zipapps, which are
Python's answer to fat binaries - similar to uberjars in Java, or to good ol'
static compilation in most of the languages I like to use even when someone
isn't paying me to do it.

(That said, there is one thing about environment management in Python that
absolutely drives me up the wall: There are more than a fistful of competing
tools for doing it, and a zillion different ways to work with them, and no
clear preferred way. Which sucks for everyone, and makes learning the ropes
without a good mentor way more of a chore than it needs to be. That's not too
far off from C, either, though.)

~~~
qqssccfftt
> There are more than a fistful of competing tools for doing it, and a zillion
> different ways to work with them, and no clear preferred way.

Generally, newer projects are settling around Poetry.

~~~
newen
There is a good chance it will be something else in a few months.

~~~
qqssccfftt
It's stuck around for the last year. Pyproject.toml is standardised.

------
1337shadow
For a while now I've been relying on environment variables per configuration
key. I haven't defined profiles such as dev or production in a really long
time, with Django projects in particular. I have collected a handful of
little secrets that I can share with you.

I do have if DEBUG: based switches throughout my settings file, and I enforce
DEBUG=True in manage.py at the same time: in manage.py I set
os.environ['DEBUG'] = '1', and in settings.py DEBUG =
bool(os.getenv('DEBUG', False)).
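That switch can be sketched as follows (the function names are mine, purely
for illustration). One hedge worth noting: `bool()` of any non-empty string
is True, so even DEBUG=0 in the environment would count as debug-on under
this exact scheme:

```python
import os

def load_debug_flag():
    """The settings.py side: DEBUG defaults to off unless the env says
    otherwise."""
    # Caveat: bool() of ANY non-empty string is True -- even "0" -- so
    # this really means "is DEBUG set to something at all".
    return bool(os.getenv("DEBUG", False))

def run_manage_py():
    """The manage.py side: force development settings before they load."""
    os.environ["DEBUG"] = "1"
    return load_debug_flag()

os.environ.pop("DEBUG", None)   # simulate a production shell
load_debug_flag()               # False: deployments default to non-debug
run_manage_py()                 # True: local ./manage.py runs get DEBUG
```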

My objective is that when a new developer clones the repo, and runs
./manage.py runserver or shell, it will load with all development settings by
default.

In theory this could cause the problem that running manage.py commands in
production would run them with DEBUG=True. I tried that, and in practice this
turned out to never be a problem. But this could certainly bite you if you're
not careful.

In addition, I spread env vars across two config files:
docker-compose.persistent.yml for persistent deployments, and
docker-compose.ephemeral.yml for ephemeral deployments.

To complete that, because I don't want to store secrets in the git repository,
I do have things like that to load multi-line environment variables in CI:

    
    
        - export $(echo $PROD_EMAIL_SETTINGS | xargs)
        - export $(echo $PROD_ENV | xargs)
        - export $(echo $PROD_SOMEAPI | xargs)
    

So for example, in PROD_EMAIL_SETTINGS I have a multiline value with
EMAIL_SOME_SETTING=something for every setting that settings.py will fetch
from the environment to provision the Python email settings.

And if I want the production email server settings in staging and production,
I can load the set of variables that configures for the production email
server with that single line.

Anyway, for me the best setting profile management is no profile management
at all: a local clone should have development settings by default and just
work with zero config; configuration for deployments lives in compose files
for non-secrets and in CI vars for secrets. If I need any of the settings
that are in production to develop something (ie. to debug a blackbox such as
a partner API), I build a custom profile with environment variables just "for
the time being" - for example, I have dumped the blackbox request/responses
and then built up my unit tests based on that.

I'm extremely happy with this trivial setup, which I have had in a bunch of
projects for the last few years now. I thought that's what everybody who's
not using a profile-based system was doing - I thought that was the essence
of 12factor/config, at least that's what I understood.

~~~
whalesalad
Mitigating human errors is certainly important. The brittle nature of your
onboarding story is exactly why I wrote this post and encourage the practice
in it.

Larger apps that I deploy have a development.py.example file that people can
copy to development.py (or fredsmith.py, madonna.py - it doesn't matter;
these would not be checked in to Git).

Then all I need to do before running anything in my project is prefix with
ENV=fredsmith.

As a team we could settle on a different name for this variable. We could all
have our own environments. We could even decide, as a team, to check them
into Git so that we can all see each other's config in case there is an
issue.

~~~
1337shadow
Copying a file is a manual step, that's the kind of step that I strive to
remove from workflows I provide to others.

Using different names for the same thing across different teams also looks
like something I'd typically strive to avoid.

