
Say “no” to import side-effects in Python - chrismorgan
http://chrismorgan.info/blog/say-no-to-import-side-effects-in-python.html
======
einhverfr
I do most of my work with Perl, rather than Python, but in Perl there are
several kinds of stock import-level side-effects which are actually quite
helpful. These are pretty light-weight though. They boil down to:

1\. Global lexical changes to Perl. Sometimes this is the whole point of an
import (for example Carp::Always, which typically turns any warning or
exception string into a full stack dump). I can't imagine doing something like
this in Python. However, making something like this work properly without
breaking too many things requires a heck of a lot of forethought. Yes, even
Carp::Always may break something.

2\. Manipulation of the importing module's symbol table. This is important for
lexical extensions to Perl that you don't want to be global (for example Moo,
Moose, and PGObject::Util::DBMethod). Among other things this allows MOPs to
be added with greater sophistication than the language typically allows. I am
not a Python guru but I could imagine metaprogramming side effects to be
useful in setting up a consistent and powerful environment.
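
A rough Python sketch of what I mean by that last point (the `install_helpers` and `greet` names are invented for illustration; this uses CPython's frame API and is emphatically not something to do casually):

```python
import sys

def install_helpers():
    # Explicitly inject a name into the calling module's namespace --
    # roughly what a Perl module does to the importer's symbol table,
    # but opt-in via a call rather than an import side-effect.
    caller_globals = sys._getframe(1).f_globals
    caller_globals['greet'] = lambda name: 'hello ' + name

install_helpers()  # the caller asks for the manipulation explicitly
```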

The problem the author describes is different, though: it is not only a
side-effect issue but also a violation of separation of concerns. There are
certain problems you do not want to solve at import time,
and connecting with/configuring external components is almost always one of
them.

Why? Because with integration of external components, you almost always want
the fine-tuning and decision-making to reside with the application
developer. That's very different from setting up a consistent lexical
programming environment for use (which is what the acceptable side effects
do).

~~~
chrismorgan
In Python, side-effects are simply not acceptable at all. One should instead
put the code with the side-effect in a function and call it.

This is the approach taken by gevent, which allows you to replace the entire
I/O stack. But importing it does not effect the change; you must explicitly
call code to do that. This is done thus:

    
    
        from gevent.monkey import patch_all
        patch_all()
    

cf.
[http://www.gevent.org/gevent.monkey.html](http://www.gevent.org/gevent.monkey.html)

~~~
ta0967

        from __future__ import print_function

~~~
makomk
As I understand it, "from __future__ import ..." statements are actually a
special kind of statement that doesn't import a module at all - they just use
similar syntax for compatibility reasons. There is a real __future__ module,
also for compatibility reasons, but importing it has no side-effects.

~~~
chrismorgan
Future statements do import something as well, when run, but their primary
purpose is changing the behaviour of the compiler _for the current file_. They
have no behaviour outside of that file, so that is not a publicly visible
side-effect.

------
lambda
On one machine I tried, help('modules') actually worked successfully with no
substantial delays or apparent side effects.

On another, it apparently tried to set up an MPI cluster:

    
    
      *** The MPI_Init() function was called before MPI_INIT was invoked.
      *** This is disallowed by the MPI standard.
      *** Your MPI job will now abort.
      [hostname:14114] Abort before MPI_INIT completed successfully; not 
      able to guarantee that all other processes were killed!
    

On a third, I get the following and then it just hangs:

    
    
      Python 2.7.6 (default, Mar 22 2014, 15:40:47)
      [GCC 4.8.2] on linux2
      Type "help", "copyright", "credits" or "license" for more information.
      >>> help('modules')
      
      Please wait a moment while I gather a list of all available modules...
    
      /usr/lib/python2.7/dist-packages/gobject/constants.py:24: Warning: g_boxed_type_register_static: assertion 'g_type_from_name (name) == 0' failed
        import gobject._gobject
      /usr/lib/python2.7/dist-packages/gi/module.py:171: Warning: cannot register existing type 'GtkWidget'
        g_type = info.get_g_type()
      /usr/lib/python2.7/dist-packages/gi/module.py:171: Warning: cannot add class private field to invalid type '<invalid>'
        g_type = info.get_g_type()
      /usr/lib/python2.7/dist-packages/gi/module.py:171: Warning: cannot add private field to invalid (non-instantiatable) type '<invalid>'
        g_type = info.get_g_type()
      /usr/lib/python2.7/dist-packages/gi/module.py:171: Warning: g_type_add_interface_static: assertion 'G_TYPE_IS_INSTANTIATABLE (instance_type)' failed
        g_type = info.get_g_type()
      /usr/lib/python2.7/dist-packages/gi/module.py:171: Warning: cannot register existing type 'GtkBuildable'
        g_type = info.get_g_type()
      /usr/lib/python2.7/dist-packages/gi/module.py:171: Warning: g_type_interface_add_prerequisite: assertion 'G_TYPE_IS_INTERFACE (interface_type)' failed
        g_type = info.get_g_type()
      /usr/lib/python2.7/dist-packages/gi/module.py:171: Warning: g_once_init_leave: assertion 'result != 0' failed
        g_type = info.get_g_type()

------
aptwebapps
This in no way invalidates the point of the post, but I can't imagine going
back to installing everything globally instead of using virtualenv. If you do
that, at least you won't have every package you ever looked at in your path.

~~~
marcosdumay
I just can't get some modules to work within virtualenv. They seem to create
more problems than installing them systemwide.

~~~
pekk
Which ones? You can get some help. Learning virtualenv is a smart long-term
move.

~~~
aptwebapps
If you're on a Mac, good luck getting pip to install mysql-python. I use
Macports for that.

~~~
jaegerpicker
This isn't a good solution. I pip install mysql-python several times a week on
my Mac (I use different venvs for different branches, and we have several
different services that I work on in a given week, so I'm installing packages
via pip A LOT), and we always install it in a venv. Sure, it can be a bit of a
pain, but it's well worth it IMO.

Our steps to always get it working: make sure mysql_config is in your PATH
environment variable, make sure the Xcode command line tools are properly
installed, and make sure the mysql command line client is set up and working
correctly locally. Past that, it just works for us on Mac OS X 10.7+. I think
10.6 and earlier also work but I'm not sure.

~~~
aptwebapps
I couldn't ever get it to compile, but I haven't tried in a while. I don't
actually need it much these days as I'm using Postgres for most things.

I'm sure there are other packages that are similar: not so easy to get working
with pip but not impossible.

------
jzwinck
Of course you should do this in any language. I once used a Ruby library that,
when loaded, would try to connect to a database on a remote machine. Programs
which required this library would take several seconds to display their --help
output.

Because of that and similar incidents, I've learned to import argparse up
front but nothing else unless necessary. Once argument parsing is done, then
importing other modules begins.
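
Concretely, the shape is something like this (a sketch; `json` here stands in for a module that is slow or has import side-effects):

```python
import argparse  # cheap and side-effect free, so imported eagerly

def main(argv=None):
    parser = argparse.ArgumentParser(description="deferred-import sketch")
    parser.add_argument("--count", type=int, default=1)
    args = parser.parse_args(argv)

    # Only after --help and argument errors have been handled do we pay
    # for the remaining imports:
    import json
    return json.dumps({"count": args.count})

print(main([]))  # prints {"count": 1}
```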

~~~
rcfox
> I once used a Ruby library that, when loaded, would try to connect to a
> database on a remote machine.

How is that at all acceptable? I can't believe that a library that phones home
would gain any sort of popularity.

I can't say I know much of anything about the Ruby community, but if they've
conditioned you to jump through hoops like importing modules at specific times
to avoid delays, that is a serious problem. Conditional/delayed imports have
their place, but they should be relatively rare.

~~~
liveoneggs
install my module! Also.. trust me because it's totally safe.

curl [http://foo.com/tBrwn](http://foo.com/tBrwn) | bash

[http://rvm.io/rvm/install](http://rvm.io/rvm/install)

~~~
Karunamon
Out of curiosity, what would be better to handle the case of RVM?

It's explicitly user space software, and unprivileged user space software at
that, so having it require admin interaction (i.e. touching the system package
manager) to install seems like a sledgehammer where a flyswatter would do.

They could use a VCS repo of some kind, but that doesn't handle the various
folder and script installs that need to happen - and also doesn't help if
there isn't that specific VCS on the system.

Upstream security is less of a concern since that hotlink just points to raw
code on Github, and over HTTPS no less.

So what are the negatives here? For software like RVM, this seems like the
best, most portable solution that works for the most people.

~~~
liveoneggs
Being on GitHub and using HTTPS doesn't guarantee anything, obviously.

If you look at the actual script it contains a ton of sudo and also additional
curl commands. The chain of security is very lacking.

------
dpwm
It's also worth stressing that even if import side-effects are always going to
be fast and not kill the interpreter, they are still a terrible idea.

Even creating objects at module level can have some rather unpleasant
properties. The __del__ method will likely never get called reliably, and
other behaviours that work great in scripts break in subtle ways, especially
with a KeyboardInterrupt. Threading and multiprocessing will leave processes
running at 100% CPU. Trying to debug these things gets insane, as you can
often only find them as the interpreter is dying.

I think import side-effects can be tempting because Python is often introduced
using a scripting-oriented approach. Combined with Python following the
principle of least surprise, most people doing this won't even realise that
it's wrong. This does seem to be a rather common anti-pattern.

------
ThePhysicist
Putting side effects in the __init__ code seems to become quite fashionable
these days but is a pretty bad idea, since it removes the possibility of
"just" importing the functionality defined by the module without performing any
initialization. Personally, I always try to avoid having a system that relies
on some global configuration (like e.g. Django, Matplotlib or Flask do). In
matplotlib for example this causes a lot of problems, since importing the
pylab module will automatically (among other things) load and initialize a
backend, which is then set in stone for the rest of the session.

IMHO, the way to go here instead is dependency injection:

Inject the configuration into the module through a function or class method
(e.g. Flask.initialize({config state})). Wrapping all module functionality that
depends on configuration in a class is a good idea here since it allows you to
use multiple configurations in parallel and makes your code more modular.
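
A minimal sketch of the idea (the `Store` class and its config keys are invented here, not any real API): keep all configuration-dependent state on instances, so several configurations can coexist in one process:

```python
class Store:
    """Configuration is injected at construction, never read at import."""

    def __init__(self, config):
        self.config = dict(config)

    def url(self):
        return "%s://%s" % (self.config["scheme"], self.config["host"])

# Two independently configured instances, no global state involved:
primary = Store({"scheme": "postgres", "host": "db1"})
replica = Store({"scheme": "postgres", "host": "db2"})
```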

As an example, in BlitzDB (a document-oriented database for Python,
[https://github.com/adewes/blitzdb](https://github.com/adewes/blitzdb)) there
is no global configuration at all, so you can initialize and use multiple
backends in parallel as you please without worrying about side effects.
SQLAlchemy does it in a similar way btw.

~~~
davmre
FYI, it is possible to call Matplotlib in an 'object-oriented' way without
global state; though it's a bit more cumbersome than just using the pylab
interface. See
[http://matplotlib.org/examples/pylab_examples/webapp_demo.ht...](http://matplotlib.org/examples/pylab_examples/webapp_demo.html)
for an example.

------
taejo
Agreed. Twisted's behaviour of installing a reactor on import has caused me
problems which could be worked around by importing conditionally or at a later
time, but in cases like this, where one is importing modules dynamically, one
doesn't know ahead of time which workarounds one needs.

~~~
jzwinck
Twisted's problem is the good old singleton (anti-)pattern. There can only be
one reactor ever, and most galling of all, it can never be restarted once
stopped.

~~~
thristian
It should be said that Twisted has put a lot of effort into making it possible
to deal with multiple reactors (mostly so they can run
their test suite including all their supported reactors), and even to make it
possible to unit-test Twisted code without a reactor at all (by having
standardized mocks for the various things the reactor does).

Of course, that doesn't help the mountains of code written for Twisted that
expect a singleton reactor, so we're stuck with it for the foreseeable future.
Perhaps in the Brave New World of Python 3, where Twisted is just an
implementation detail of the asyncio module, life will be better.

~~~
jzwinck
Why can't we at least restart a stopped reactor? That seems possible without
breaking compatibility.

------
gbog
For those not in the know, avoiding import side effects means never having
top-level code in any Python module except class, function and constant
definitions, other imports, and the occasional `if __name__ == '__main__'`
guard. And constants must be built-in types.
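
That is, a well-behaved module skeleton looks something like this:

```python
# mymodule.py -- importing this executes nothing beyond definitions

MAX_RETRIES = 3  # constants restricted to built-in types

class Worker:
    """Class definitions are fine; instantiating one at top level is not."""

def main():
    print("runs only when executed directly, never on import")

if __name__ == '__main__':
    main()
```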

------
kzrdude
Qt and Gtk seem to be the problem here, and they won't change. They aren't
Python modules so much as application frameworks, and they assume they are in
full control. Gtk calls sys.setdefaultencoding("utf-8") when imported,
changing the way your strings behave in the whole process.

~~~
chrismorgan
With regards to Qt, the problem is not Qt but rather that something else is
actually trying to _use_ Qt at a time when it shouldn't.

pygtk setting the default encoding to UTF-8 is concerning—if true, that is
certainly bad behaviour.

~~~
ghfdaghkj
I haven't used GTK in a long time, but when I was using it, I was under the
impression that pygtk was being deprecated, and everyone should be using
GObject-Introspection instead.

~~~
chrismorgan
I dunno—I was making assumptions, not being very familiar with the GTK scene
in Python.

~~~
kzrdude
Not me either anymore. I tried to port to introspection + Gtk3 but ran into
problems with widget subclasses and no documentation to resolve it with.

~~~
phaylon
I only extensively used it from Perl and Vala, but with GObject-Introspection
I tend to more often look at the original library documentation than something
language specific. This can be a disadvantage at first, but for me it turned
out more convenient, since GIR-inflated bindings can be more complete, as long
as the inflation supports the features the API describes. They also tend to be
more consistent in their differences to the original.

The problem about documentation seems to be the usual dilemma that as soon as
you know enough to implement an API browser reading GIR that outputs the API
in your language, you know enough about how the bindings work themselves to
just use the original documentation.

~~~
kzrdude
Well, it doesn't really explain how to port from pygtk or how Python classes
interact with g-i.

~~~
phaylon
That is true, and unfortunately I have used Python only sparingly until now,
and have no experience using PyGtk at all.

I took a quick look at PyGObject[0] (unsure if that is what one would actually
use for this), and the most helpful part seems to be [1], giving some hints on
extending GObject.Object, which should be translatable to extending other
existing GObject type classes.

As for porting, I agree that can be a pain in that case. When bindings move
from a manual implementation to inflating them from an external description,
you often end up with good tutorials for the first, and a good API reference
for the
second. If I need to understand other kinds of Gtk bindings, I search for
examples on custom TreeModels or other potentially messy things like that to
get a feel for it.

[0]
[https://wiki.gnome.org/action/show/Projects/PyGObject?action...](https://wiki.gnome.org/action/show/Projects/PyGObject?action=show&redirect=PyGObject)
[1] [http://python-
gtk-3-tutorial.readthedocs.org/en/latest/objec...](http://python-
gtk-3-tutorial.readthedocs.org/en/latest/objects.html)

------
Terr_
If there's one thing I've learned from working with old PHP codebases, it's
that side-effects from include/require are a horrible horrible idea.

I'm OK with Java's static-initializers, but that's because they've got a whole
bunch of rules around them preventing common kinds of abuse.

------
IgorPartola
I am the author of a library [1] that does this. I understand what the OP is
talking about and agree with it, but as with anything, for me the rule is "(1)
Don't do dangerous behavior X. (2) If you are an expert, do dangerous behavior
X sparingly and with caution."

Mind you, my library does not connect to a database, or any such crazy thing.
It simply creates a singleton which is then useful throughout your
application. There are very few cases where you would not want that singleton;
if so, you are free to completely ignore it and make your own instance. The
creation process is also idempotent, except the data you put into that
singleton; that data is a special case: it is your explicit responsibility to
namespace it, which is indeed the whole point of this library. I feel pretty
good about this thing.

[1] [https://github.com/ipartola/groper](https://github.com/ipartola/groper)

~~~
halostatue
Thinking about it, I do the same thing with my Ruby library mime-types[1]. The
library is only useful if you load some data into its registry, so it does so
automatically, unless you specify RUBY_MIME_TYPES_LAZY_LOAD in the
environment.

I've just given myself a task to reverse the behaviour so that lazy loading is
the default.

[1] [https://github.com/halostatue/mime-
types](https://github.com/halostatue/mime-types)

------
atilaneves
I have the unfortunate "luck" to be using a Python library at work that
_loves_ to use import side-effects. Importing it like God intended makes it
parse command-line arguments and fail if it doesn't like what was passed in.
And that's just the start. Nearly every __init__.py has code in it, including
class definitions. I have no idea why.

~~~
mercurial
It's like programmers who find out about metaprogramming. Usually they grow
out of it.

~~~
atilaneves
I haven't grown out of metaprogramming at all. I just try and use it only when
it's the best option. Usually because it means reducing duplication.

I liked metaprogramming to an extent in C++. I _love_ it in D.

~~~
gbog
Replace "best option" with "only practical option" and you're good to go.

~~~
einhverfr
That may depend on the language you are using.

------
Chris_Newton
Would anyone like to share their experiences avoiding this sort of problem in
the context of web frameworks and building the back end for larger web
sites/apps?

As an example for discussion, the first time I wrote a Flask-based back-end, I
backed myself into a corner almost immediately in the following way.

Firstly, the WSGI file that the web server uses to start the application
followed the suggestion in the Flask docs by doing this:

    
    
        # webserverseesthis.wsgi
        from yourapplication import app as application
    

That’s not so bad, but then I started doing application configuration and
loading various Flask plug-ins as side effects of that import:

    
    
        # yourapplication/__init__.py
        app = Flask("yourapplication")
        
        # Do some general application configuration.
        app.config.from_pyfile("/path/to/configuration/file")
    
        # Set up some overarching security things that modify application behaviour.
        from flaskext.securityplugin import SecurityPlugin
        sp = SecurityPlugin(app)
    

This seemed at the time like the obvious place to put such things, but of
course, this is really just a variation on the mistake we’re discussing here.

To compound the error, I then used Flask’s decorators to wire up routes from
various URLs to the relevant parts of my code. Those decorators work on the
application object (sticking with ideas common to many Python web frameworks
and avoiding getting into anything more Flask-specific like blueprints) so I
was effectively creating circular dependencies from almost everything to that
top-level package:

    
    
        # yourapplication/pages/home.py
        from yourapplication import app
    
        @app.route('/')
        def home_page():
            # Render home page
    

and then from the top-level package onto almost everything so all those
decorators could take effect:

    
    
        # After setting up the application object in yourapplication/__init__.py
        import yourapplication.pages.home
    

Now, as long as this kind of code only ever runs as a WSGI application behind
a web server, you get away with these dependencies up to a point. In practice,
your WSGI set-up imports the top-level application package, which in turn sets
up the application object everything is going to depend on and only then
imports all the supporting modules/packages, and everything “works”.

However, as soon as you want to write tests or otherwise reuse any of the code
in a different context, the entire system is a big bowl of spaghetti with all
the usual problems. The moment you import any part of the system to run a unit
test on something in it, you get much of the rest of the system as well,
complete with the side effects of any imports therein.

This was of course all horribly naïve on general programming principles, but
the nature of these frameworks tends to push in this direction, and even
Flask’s own documentation features various simple examples that follow a
similar approach, so I’ll forgive myself for falling into the trap the first
time. I’ve since experimented with various techniques to break the cycles and
avoid the side effects on imports, with some success, but frankly I’ve never
found a satisfying, general strategy for organising larger code bases built
around a web framework.

How is everyone else doing this?

~~~
hcarvalhoalves
The example in the Flask tutorial with the app at module level is really only
viable if you cram everything inside one module; it gets old fast. You should
use a factory pattern, like this:

    
    
        def app_factory(config):
            app = Flask("yourapplication")
            app.config.from_pyfile(config)
            # ...
            return app
    

Then whenever you need access to your app object, you use the provided proxy:

    
    
        from flask import current_app as app
    

You can't use it at module level though (because there isn't an application
context setup by that time), so this doesn't work:

    
    
        @app.route('/')
        def home_page():
            # ...
    

Instead, hook up views inside your app factory:

    
    
        def app_factory(config):
            # ...
            app.route('/')(somemodule.home_page)
            # ...
    

For the test suite, you can now instantiate apps with a different
configuration:

    
    
        from flask import current_app as app
        from myfoo import app_factory
        import unittest
    
        class MyFooTest(unittest.TestCase):
            # ...
    
        if __name__ == '__main__':
            test_app = app_factory(test_config)
            # The app proxy will point to that inside test cases
            unittest.main()
    
    

TL;DR: The factory pattern is your friend. Parametrize all the things. Avoid
singletons at module level; this leads to spaghetti. If you need convenience,
create proxies.

~~~
Chris_Newton
_You should use a factory pattern, like this: [...] You can't use it at
module level though (because there isn't an application context setup by that
time), so this doesn't work: [...] Instead, hook up views inside your app
factory: [...]_

That’s basically what I did on my second iteration. It is an improvement in
some respects, particularly breaking the circular dependencies caused by using
the decorators on the global application singleton. On the other hand, now you
need some variation of God Object that not only imports all your modules that
used to have decorators but also knows enough about their internal
implementation to set up the routes and things like pre- and post-request
logic directly on the application object you get back from the factory.

The next logical step after that then seemed to be having each module/package
that contains views or similar logic provide some sort of initialization
function that is declared when you import the module and takes an application
object as a parameter. Then we can use app.add_url_rule and friends to wire up
the various handlers within each package/module but decoupled from any sort of
global application object that needs the circular import. This is the tidiest
style I’ve found so far, and all my Flask projects in recent years have used
something broadly like it. It only requires one import followed by one
initialization call for each package/module, which logically seems to be as
good as we can get, given that our starting point is a desire to avoid
including any initialization implicitly within the import itself and to avoid
depending on global singletons.

Somehow, it still doesn't quite feel right. I think it's
because even with that general design, I’ve still got a recurring pattern in
each of how I create these modules and how I import and then initialize them.
My instinct says we ought not to need that extra boilerplate in a highly
dynamic language like Python, but I’ve yet to find any alternative that is
neater in general. At least in the most simple cases this only adds a couple
of extra lines (converting the decorators to an init function in each
package/module, and then calling that function at the top level after
importing the package/module), which is clearly better than the earlier, more
highly connected designs.
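
In code, the shape I ended up with looks roughly like this (with a made-up stand-in `App` class so the sketch is self-contained; real code would use a `flask.Flask` instance and its actual `add_url_rule`):

```python
class App:
    """Stand-in for flask.Flask, just enough to show the wiring."""

    def __init__(self):
        self.routes = {}

    def add_url_rule(self, rule, endpoint, view_func):
        self.routes[rule] = view_func

# --- what yourapplication/pages/home.py would contain ---
def home_page():
    return "home"

def init_app(app):
    # Declared at import time, executed only when explicitly called.
    app.add_url_rule("/", "home_page", home_page)

# --- top level: one import, then one explicit initialization call ---
app = App()
init_app(app)
```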

------
adamcharnock
Can anyone suggest best practice alternatives to import side-effects?

~~~
deckiedan
Don't run any functions at the root level of your module.

Instead, if you really need long-lasting objects which get initialised once,
then use an object, and put any initialisation stuff in its `__init__`
method. Then the module can be imported whenever, but your initialisation
stuff is only called when the user of your library creates a new instance of
that class.

For bonus points, make your classes usable with the `with ...` syntax, so
their lifetime is kept to a minimum and errors/whatever are dealt with by
default.
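
A sketch of that shape (the `Connection` class is made up for illustration):

```python
class Connection:
    """All initialisation lives in __init__, so nothing runs at import."""

    def __init__(self, host):
        self.host = host
        self.open = True

    def close(self):
        self.open = False

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()  # runs even if the body raises
        return False

with Connection("localhost") as conn:
    pass  # resource is live only inside this block
# conn.open is now False: deterministically closed, no __del__ involved
```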

If you really really need to monkey around and take control of the whole
python interpreter (gevent, twisted, and possibly some GUI frameworks come to
mind...) then don't do that at import time, do it with a `run_forever()` or
`take_control` type function.

But yes, a virtualenv for every project does help a lot with not accumulating
cruft.

~~~
zipfle
I sometimes make calls to collections.namedtuple and other class-building
functions at the same time as the rest of my definitions.

~~~
rcfox
If you pass verbose=True to namedtuple(), you can see that it's essentially
just defining a new class. This is something that people already do at the top
level of a module.
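
For instance, this is exactly equivalent to writing a small class statement at module level:

```python
from collections import namedtuple

# namedtuple() builds and returns an ordinary class at call time:
Point = namedtuple('Point', ['x', 'y'])

p = Point(1, 2)
print(p.x, p.y)                  # 1 2
print(issubclass(Point, tuple))  # True
```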

------
TazeTSchnitzel
This is one reason why I prefer Haskell ;)

~~~
mercurial
I think most statically typed languages don't care for this kind of
shenanigans.

~~~
pcwalton
In addition to the many other languages listed here, Go allows side effects on
import.

~~~
colin_mccabe
That's interesting. I can see the motivation... it's helpful to have modules
come into the world fully initialized, and initialization often involves side
effects.

Is Rust going to allow side effects on initialization?

~~~
pcwalton
No. Having module import perform side effects delays program startup
unnecessarily and makes the semantics of the program depend on the order in
which modules got initialized, which is confusing.

------
Hello71

        $ echo 3 > /proc/sys/vm/drop_caches
        $ time python2 -c 'help("modules")' >/dev/null
            /usr/lib64/python2.7/site-packages/gobject/constants.py:24: Warning: g_boxed_type_register_static: assertion 'g_type_from_name (name) == 0' failed
          import gobject._gobject
        python2 -c 'help("modules")' > /dev/null  1.58s user 0.23s system 69% cpu 2.626 total
        $ time python3 -c 'help("modules")' > /dev/null
        python3 -c 'help("modules")' > /dev/null  2.00s user 0.17s system 74% cpu 2.928 total
        $ time python2 -c 'help("modules")' >/dev/null
        /usr/lib64/python2.7/site-packages/gobject/constants.py:24: Warning: g_boxed_type_register_static: assertion 'g_type_from_name (name) == 0' failed
          import gobject._gobject
        python2 -c 'help("modules")' > /dev/null  1.26s user 0.11s system 99% cpu 1.375 total
        $ time python3 -c 'help("modules")' > /dev/null
        python3 -c 'help("modules")' > /dev/null  1.75s user 0.09s system 99% cpu 1.852 total
    

Perhaps your issue is having too many things installed?

~~~
chrismorgan
The laptop I'm working on was decent when new, six years ago, but is now not
the fastest thing on the block. It has certainly collected quite a lot of
things there (mostly from system packages), but it is probably just one or two
things that are spending most of the time (excluding I/O time).

It's interesting; now that I've tried running it a few more times,
`help('modules')` on my Python 2.7 is getting down to six or so seconds. (On
Python 3 it takes around 0.15 seconds.)

