
Best Practices for Working with Configuration in Python Applications - tgp
https://tech.preferred.jp/en/blog/working-with-configuration-in-python/
======
fermigier
Quick list of Python libraries that help with application configuration:

- Python application configuration ->
[https://github.com/edaniszewski/bison](https://github.com/edaniszewski/bison)

- Configuration with env variables for Python ->
[https://github.com/hynek/environ_config](https://github.com/hynek/environ_config)

- Configuration library for python projects ->
[https://github.com/willkg/everett](https://github.com/willkg/everett)

- Strict separation of config from code ->
[https://github.com/henriquebastos/python-decouple](https://github.com/henriquebastos/python-decouple)

This is from my personal notes. See also: [https://github.com/vinta/awesome-python#configuration](https://github.com/vinta/awesome-python#configuration)

Anything else we've missed?

~~~
ungawatkt
Marshmallow. You can use its schema validation for any dict/JSON, which makes
it a nice fit for validating JSON config files (and mitigates some of the
JSON concerns from the article). Just run the output of json.loads straight
through a schema validation, and build some classes around it for different
config files.

marshmallow.readthedocs.io/en/

~~~
albi_lander
dataclasses_json is also very useful for schema validation. It combines Python's
native dataclass objects with marshmallow's schemas to provide additional
functionality, simply through a @dataclass_json decorator on your dataclass.

[https://lidatong.github.io/dataclasses-json/](https://lidatong.github.io/dataclasses-json/)

~~~
madman_bob
I made a similar library, dataclasses_serialization. It doesn't require a
special decorator on your classes, and is extensible for custom classes and
custom serialization methods (JSON and BSON provided by default).

[https://github.com/madman-bob/python-dataclasses-serialization](https://github.com/madman-bob/python-dataclasses-serialization)

------
alanfranz
While most points are valid, I feel some pieces are missing from this article.

What should I actually do? How do I put everything together without creating a
hard-to-maintain mess of casting/parsing/configuration? Should I manually cast
strings to integers (or other types) for each and every value I parse? Where do
I keep default values? It's cumbersome to have them embedded in code for every
get().

I usually want a) a default configuration kept in a file b) a way to override
that config with other files, _but only for certain parts_ (I don't want to
rewrite the configuration every time), c) a way to override the config at
launch time (e.g. from cmdline)

Being dynamically typed (with only optional type hints) makes configuration
harder in Python than in statically typed languages, where most configuration
libraries can instantiate an adequate type-conversion function to put a string
somewhere.

I ultimately found that the last solution (parsing from JSON) is good enough
for points a) and b) in most use cases; since JSON is typed, a decent
conversion can happen for 95% of cases (for the rest, just use a string and
parse it manually).
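The a)/b)/c) layering can be sketched with plain dicts; the file names and keys below are invented, and a real version would deep-merge nested sections:

```python
import json

# Sketch of the layering: defaults from a file, a partial override file,
# and command-line overrides, merged in increasing priority.
# In practice the first two layers would come from json.load() on real files.
defaults = json.loads('{"host": "localhost", "port": 8080, "workers": 4}')
file_override = {"port": 9090}   # only overrides what it mentions
cli_override = {"workers": 8}    # e.g. parsed from sys.argv

def merge(*layers):
    """Later layers win; keys absent from a layer are left alone."""
    result = {}
    for layer in layers:
        result.update(layer)
    return result

config = merge(defaults, file_override, cli_override)
print(config)  # {'host': 'localhost', 'port': 9090, 'workers': 8}
```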

A side note to the author, if he/she's reading: datetime.date objects, just
like Python's own naive datetime objects, are dangerous objects that can lead
to unpredictable results when used with actual time-handling code. I wouldn't
use them anywhere in my Python code.

~~~
ggolu2
What’s naive and dangerous about Python’s datetime objects?

~~~
alanfranz
Your question suggests you haven't run into the nuances of the datetime
library :-) (see
[https://docs.python.org/3/library/datetime.html](https://docs.python.org/3/library/datetime.html),
it's in the first paragraph!)

Python datetime objects, by design, can be naive or timezone-aware. Timezone-
aware datetime objects are OK; they identify a certain instant in time.

Naive datetime objects are Python-only abstractions (AFAIK) that don't
identify anything in the real world; they're highly error prone, because
there's no "right" way to use them.

They sort of work properly only if used in a very limited scope (e.g. your
own code only, for small sections), but they're risky because they're not a
different type from tz-aware datetimes, and it's hard to tell what any code
accepting a datetime does if passed a naive object. Some libraries like
java.time DO have a similar concept (e.g. LocalTime, LocalDate), but they keep
it well separated from the "real" concept (e.g. Instant or Date in Java) so you
can't use them accidentally.

Example: you pass a naive datetime object to any library which must translate
it to an instant, like an ISO string with a well defined timezone. What does
the library do? Throw an exception? Associate an arbitrary timezone (e.g.
UTC)? Associate the local, current timezone? There's no "correct" behaviour.
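A small illustration of the ambiguity, using only the standard library:

```python
from datetime import datetime, timezone

naive = datetime(2024, 1, 1, 12, 0)                       # tzinfo is None
aware = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)  # a real instant

print(aware.isoformat())  # 2024-01-01T12:00:00+00:00

# Python itself refuses to relate the two, which is the ambiguity in action:
mix_error = None
try:
    aware - naive
except TypeError as err:
    mix_error = err
print(mix_error)  # can't subtract offset-naive and offset-aware datetimes
```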

~~~
aucontraire
I agree that Python datetime objects are problematic, but for the opposite
reason. It is tzinfo that is the sneaky disaster; the plain datetimes are
fine.

Transparent timezone awareness always fails, unless you are 100% certain that a
tz-aware datetime object will remain unconverted from the very top to the very
bottom of the stack and all the way back up, no matter who is reading it and
what they are doing.

For long-term minimization of pain, bugs, and effort, you convert datetimes to
UTC as early as possible and take them back to some localized version as late
as possible (in the frontend, for a webapp), so that the backend never needs to
know there is such a thing as timezones (except for separate validation and
correction routines, since timezone definitions _always_ end up being
incorrect to some degree when you use them at scale).

If the localization of the datetime is an essential aspect (such as the
departure time of a ship leaving port), you store a UTC value together with a
record of the location. Only at the latest possible moment of processing,
should you do a lookup on the location data to make a local time.
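A sketch of the convert-early, localize-late rule using the standard library's zoneinfo (Python 3.9+); the times are invented:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # stdlib since Python 3.9

# Input arrives localized; convert to UTC at the edge...
local = datetime(2024, 6, 1, 14, 30, tzinfo=ZoneInfo("Europe/Rome"))
utc = local.astimezone(timezone.utc)
print(utc.isoformat())  # 2024-06-01T12:30:00+00:00

# ...store and process in UTC, and localize again only for display:
display = utc.astimezone(ZoneInfo("America/New_York"))
print(display.isoformat())  # 2024-06-01T08:30:00-04:00
```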

Obviously, there will be exceptions to this rule. If you batch-process
billions of timestamps under a tight deadline and _must_ do calculations in
local time, it might make sense to persist the values localized.

~~~
brewmarche
Actually UTC + timezone is exactly the wrong thing for "wall clock times"
(things like meetings or departures where the time at the location is
relevant).

The conversion to UTC loses the original local time, so you cannot retrieve
it once time zone data changes, unless you reconvert every time you detect
such a change in tzdata. And countries change time zones more often than we
think (and on short notice, too).

Thus it is important to distinguish between instants (e.g. for recording when
exactly something happened after the fact) and wall clock time (e.g. for
coordinating people and goods at a certain place, like meetings, concerts,
departure times). For the former use UTC, for the latter use a localised time
zone (e.g. Europe/Rome), not an offset time zone (e.g. not +0200).
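A sketch of that storage scheme with zoneinfo (values invented): persist the wall-clock time and the IANA zone name, and resolve to an offset only at the last moment, so updated tzdata is honoured automatically:

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # stdlib since Python 3.9

# Stored record: the wall-clock time plus the zone name, not a UTC instant.
departure_local = "2030-03-15T09:00:00"
departure_zone = "Europe/Rome"

# Resolve to an instant only when needed, with whatever tzdata is current,
# so later rule changes in that zone are picked up automatically:
resolved = datetime.fromisoformat(departure_local).replace(
    tzinfo=ZoneInfo(departure_zone)
)
print(resolved.isoformat())  # the offset reflects the rules in force that day
```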

For more information Jon Skeet has written about this multiple times.

~~~
thomaslkjeldsen
I believe this "wall clock time" approach is broken by design, as it pushes the
burden of figuring out timezone details onto those who are not located in that
particular timezone.

A fair and therefore safer approach is to decide that by protocol the legally
binding time is defined in UTC.

Your system will translate UTC times to and from any given local time using
the IANA time zone database, which is regularly updated. End users must be
made aware of the UTC time, that it is legally binding, and that the local
time conversions are provided as-is.

This way the time of a meeting or deadline is protected from local governments
messing around with timezone changes.

Additionally, dates are rendered in ISO 8601 standard format with a proper
footnote to help users learn about international standards.

~~~
brewmarche
I think whether UTC or wall clock time is binding is a problem in the legal
and planning (so the business) domain and has to be treated as an external
input to the software engineering problem.

Although you are of course free to advocate for UTC. I remember Swatch trying
to establish something similar and it never took off:
[https://en.wikipedia.org/wiki/Swatch_Internet_Time](https://en.wikipedia.org/wiki/Swatch_Internet_Time)

------
_pastel
As an ML engineer working in Python, I keep running into a problem:

If parameters are defined close to usage, and strongly typed, then it's hard
to cleanly search for good configurations of the parameters. Especially for
fancier search strategies, you want all parameter lookups to go through a
single file.

On the other hand, there's a lot of code churn until an ML pipeline is
finished. And errors from typos and type violations will often only show up
after hours of training. So it's _also_ painful to try to keep a separate,
loosely-typed parameter file in sync.

So far, my compromise is to:

(1) on a first pass, define all parameters as global variables at the top of
the files they are used in

(2) once mostly code-complete, pull them into a separate file that tracks
initial values, current values, and search ranges. Make all usages go through
a lookup where the key is an enum, but the value is untyped:

    
    
    def param(name: ParamName) -> Any:
        return params[name].current_value
    

Which is not ideal. Does anyone else keep running into this problem and have a
better solution?

~~~
silviogutierrez
Alas, MyPy doesn't have the concept of "keyof" and mapped types like
TypeScript.

So in place of that I would:

1. Define a type alias Params = Any.

2. Liberally use Params["foo"] and Params["bar"] anywhere.

3. Once you stabilize, reimplement Params as a TypedDict. You'll get failures
if accessing any invalid key.

You can also use a NamedTuple if you prefer.

If you insist on passing a param _name_, then you'll have to create a big
Literal of every key in your dict: Param = Literal["foo", "bar"], etc.
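Step 3 might look like this once the keys stabilize (parameter names invented):

```python
from typing import TypedDict

# Once keys stabilize, a TypedDict lets mypy reject invalid keys at check time.
class Params(TypedDict):
    learning_rate: float
    batch_size: int

params: Params = {"learning_rate": 0.01, "batch_size": 32}

lr = params["learning_rate"]   # OK
# params["learning_rte"]       # mypy: TypedDict "Params" has no key "learning_rte"
print(lr)
```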

~~~
_pastel
Thank you! It's surprising that mypy can typecheck based on string keys like
that. Cool!

------
zelphirkalt
I think argparse covers most of the points mentioned as desirable in this
article.

* validate at start (using the type keyword argument for add_argument)

* access by name as identifier, not string

What's more, default values are stored with the configuration, and you can add
a help text telling everyone what each argument is for.

One downside is that your command line call gets longer.
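A minimal sketch of both points (the flag names are invented):

```python
import argparse

# Validation-at-start with argparse: types are checked when parsing,
# and defaults live next to the argument definitions.
parser = argparse.ArgumentParser()
parser.add_argument("--port", type=int, default=8080,
                    help="port to listen on")
parser.add_argument("--workers", type=int, default=4,
                    help="number of worker processes")

args = parser.parse_args(["--port", "9090"])
print(args.port, args.workers)  # 9090 4 -- cast to int, default applied

# A non-integer value fails immediately at startup:
# parser.parse_args(["--port", "abc"])  # SystemExit with a usage error
```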

------
xiwenc
Pretty good guidelines. I like the idea of having a configuration close to the
class (perhaps module-level depending on the project size) that uses it. With
dataclass the class definition is fairly clean. In addition to that, I'd
consider using [https://docs.python.org/3/library/dataclasses.html#post-init-processing](https://docs.python.org/3/library/dataclasses.html#post-init-processing) for business-specific validations.
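A sketch of what such a __post_init__ check can look like (fields and limits invented):

```python
from dataclasses import dataclass

# Hypothetical config dataclass; the validation rule is for illustration.
@dataclass
class ServerConfig:
    host: str
    port: int

    def __post_init__(self):
        # Business-specific validation runs right after construction:
        if not (0 < self.port < 65536):
            raise ValueError(f"port out of range: {self.port}")

cfg = ServerConfig(host="localhost", port=8080)  # OK

error = None
try:
    ServerConfig(host="localhost", port=99999)
except ValueError as err:
    error = err
print(error)  # port out of range: 99999
```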

------
jry1234
Kensho has probably the best solution to this problem I've seen so far:

[https://github.com/kensho-technologies/grift](https://github.com/kensho-technologies/grift)

Handles typing really well, as well as config defaults and fallbacks, giving
you the ability to configure your app a few ways, and fall back on other
configs if something isn't specified.

------
anentropic
Shout out here for pydantic BaseSettings: [https://pydantic-docs.helpmanual.io/usage/settings/](https://pydantic-docs.helpmanual.io/usage/settings/)

That provides typed and validated auto-loading from env vars. I have been
quite happy with that in conjunction with an optional .toml file, to do
flexible config cleanly and simply like:

    
    
    import toml

    from myproj.conf.types import Settings  # a pydantic BaseSettings model

    try:
        _config = toml.load('myproj.toml')
    except FileNotFoundError:
        _config = {}

    settings = Settings(
        **{key.upper(): val for key, val in _config.items()}
    )

------
hathym
it's worth looking at Python's own configparser [1] before rolling your own.

[1]
[https://docs.python.org/3/library/configparser.html](https://docs.python.org/3/library/configparser.html)

~~~
kingosticks
Isn't that basically the same end result as using json.loads, except with a
different format (one that has no actual spec)?

~~~
frumiousirc
JSON supports neither comments nor string interpolation. Python's ConfigParser
format does.
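For example, with the standard library's ConfigParser (section and keys invented):

```python
import configparser

# ConfigParser allows comments and %(name)s interpolation, which JSON lacks.
src = """
[paths]
# base directory for the app
home = /opt/myapp
logs = %(home)s/logs
"""

parser = configparser.ConfigParser()  # BasicInterpolation is the default
parser.read_string(src)
print(parser["paths"]["logs"])  # /opt/myapp/logs
```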

~~~
kingosticks
True, you do get string interpolation, but the comment support in ConfigParser
isn't very good. Actually, they may have fixed some of that in Python 3, but
I'm still using workarounds.

To be clear, I am not suggesting using JSON for config, I think that would be
my last choice. My point is that ConfigParser isn't really an alternative to
rolling your own if you want decent validation etc (those spec files are
horrible to use). You very quickly need to start extending ConfigParser to the
point where you've started rolling your own. And at that point you'd be better
off with one of the other (tested) solutions already suggested.

~~~
nomel
What's wrong with the comment support?

You can't have comments at the end of a line, but that's sort of the nature of
supporting arbitrary strings as values. I don't want my users to have to quote
or escape special characters if they happen to want to use them. They're not
programmers.

    
    
        # The note to display
        note = Our #1 customer!

rather than

        note = Our \#1 customer  # The note to display.

or

        note = "Our #1 customer"  # The note to display
~~~
kingosticks
> What's wrong with the comment support?

Comments are simply ignored; you can't read them. You might want to read in a
commented config file, change a setting, and write it back out. You can't do
that. But you can write comments using the 'allow_no_value=True' hack, as long
as you put them in a section.

> You can't have comments at the end of a line

You can. You need to use ';' for inline comments, and you must precede it with
whitespace. Are your users ready for that?

~~~
nomel
> make a change to a setting and then write that out

Very good point.

> You can. You need to use ';' for inline comments

I have some bugs to fix.

------
markus23
As already written by others, the article does not go very deep and is missing
many essentials. What I was mostly missing is more about keeping configuration
parameters as simple as possible. A much more detailed best practices can be
found here:
[https://www.libelektra.org/ftp/elektra/slides/cm/](https://www.libelektra.org/ftp/elektra/slides/cm/)

------
thomk
Unless the end user is non-technical, use a .py file and force them to
subclass your Configuration class, which has an __init_subclass__ method so
you can enforce rules.

When you are ready to move to a more generic solution, your .config or .yml
file can generate these.

The advantage here is both flexibility (it's Python) and control
(allow/disallow whatever you want).

If you need nested items, use nested classes.
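A sketch of the __init_subclass__ idea; the required-field rule here is invented for illustration:

```python
# Base class that enforces rules on every subclass at class-creation time.
class Configuration:
    REQUIRED = ("host", "port")  # hypothetical rule: these must be defined

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        missing = [name for name in cls.REQUIRED if not hasattr(cls, name)]
        if missing:
            raise TypeError(f"config subclass missing: {missing}")

class ProdConfig(Configuration):  # checked the moment the class is defined
    host = "example.com"
    port = 443

creation_error = None
try:
    class BadConfig(Configuration):  # no port: rejected immediately
        host = "example.com"
except TypeError as err:
    creation_error = err
print(creation_error)  # config subclass missing: ['port']
```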

~~~
ralston3
This.

Until the app reaches the level of advanced YAML config files for cloud
deployments, it’s really hard to beat a “config.py” that does a single read of
all your ENV_VARS at startup.
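A minimal sketch of such a config.py (the MYAPP_* variable names are invented):

```python
import os

# Read every env var once at import/startup, with defaults and explicit casts.
DEBUG = os.environ.get("MYAPP_DEBUG", "0") == "1"
PORT = int(os.environ.get("MYAPP_PORT", "8080"))
DATABASE_URL = os.environ.get("MYAPP_DATABASE_URL", "sqlite:///local.db")

print(PORT, DEBUG)
```

Elsewhere in the app you just `from config import PORT`, so every lookup is a plain typed name.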

------
okomestudio
I have been working on a minimalistic application config library for Python,
aiming to consolidate config loading from files, environment variables, and
command-line argument parsing. It's in alpha, so please feel free to provide
feedback if app configuration has been a pain point for you.

[https://github.com/okomestudio/resconfig](https://github.com/okomestudio/resconfig)

~~~
jayjader
FYI, that sounds like exactly the same feature set as 'python-decouple'.

~~~
okomestudio
Indeed, "python-decouple" looks like it serves a similar niche. (I didn't know
of the package. Thanks for letting me know.) I think I'd like to target a
smaller niche, though: someone writing a small application, with a little more
flexibility in things like YAML support and dynamic loading. Unless "decouple"
eventually supports similar features, I want to keep experimenting.

------
mixmastamyk
Along these lines, and unsatisfied with current solutions, I started this
project, "Turtle Config." It is format-agnostic and supports type checking as
well:

[https://github.com/mixmastamyk/tconf](https://github.com/mixmastamyk/tconf)

I'll see if I can add any advice this article gives; feedback would be
helpful.

