
Common Python Mistakes - djug
http://www.toptal.com/python/top-10-mistakes-that-python-programmers-make?utm_source=Engineering+Blog+Subscribers&utm_campaign=51aba2b5ff-Blog_Post_Email_Top10PythonMistakes&utm_medium=email&utm_term=0_af8c2cde60-51aba2b5ff-109835873
======
agf
This is a pretty good list of gotchas, but it's important when writing
something targeted at beginners to be as precise and clear as possible. Nearly
every section here either uses terminology poorly, is slightly incorrect, or
has difficult examples.

    
    
      Python supports optional function arguments and allows default values to be 
      specified for any optional argument.
    

No, specifying a default is what _causes_ an argument to be optional.

    
    
      it can lead to some confusion when specifying an expression as the default value
      for an optional function argument.
    

Anything you specify as a default value is an expression. The problem is when
the default is _mutable_.

    
    
      the bar argument is initialized to its default (i.e., an empty list)
      only the first time that foo() is called
    

No, rather it's when the function is defined.
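
A minimal sketch of that point, reusing the article's `foo`/`bar` names (the `__defaults__` inspection is added here purely for illustration): the default list already exists on the function object before any call is made.

```python
def foo(bar=[]):
    # The [] above was evaluated once, when this def statement ran.
    bar.append("baz")
    return bar


# The default object exists before foo() has ever been called.
default_before_any_call = foo.__defaults__[0]
print(default_before_any_call)  # []

first = foo()
second = foo()
print(first is second is default_before_any_call)  # True
print(second)  # ['baz', 'baz']
```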

    
    
      class variables are internally handled as dictionaries
    

As dictionary _keys_, and that's still only roughly correct.

In "Common Mistake #5", he uses both a lambda and array index based looping,
neither of which is particularly Pythonic. A better example of where this is
a problem in otherwise Pythonic code would be good.

In "Common Mistake #6" he uses a lambda in a list comprehension -- for an
article of mistakes mostly made by Python beginners, this is going to make it
tough to follow the example.

In "Common Mistake #7", he describes "recursive imports" where he means
"circular imports".

In "Common Mistake #8" he refers repeatedly to "stdlib" where he means the
Python Standard Library. Someone is going to read that and try to "import
stdlib".

~~~
nine_k
Just to note: this text is _not_ targeted at beginners, in my opinion. These
are upper-intermediate to advanced level gotchas.

~~~
crazypyro
Did they add in "(Note: This article is intended for a more advanced audience
than Common Mistakes of Python Programmers, which is geared more toward those
who are newer to the language.)" later?

Because it states it right near the top of the article.

~~~
hsinger
That note has been in the post since it was first published.

------
Goosey
Slightly off topic, but does anyone know of a resource that has 'most common
mistakes' for different languages all in one place? It's certainly possible to
google for blog posts and stack overflow questions to assemble such a list,
but it would be handy to have them all in one place.

My use case is when interviewing candidates I often ask them to rate
themselves on a scale of 1-5 in the languages they know, and then ask them
increasingly 'tricky' questions in each language to get a feel for how their
"personal" scale aligns to their real knowledge. This works fine if we have an
overlap of several languages, but in the case where I know nothing or very
little of one of the languages they know I lose that data point.

I find it valuable to know what an "I am a 1 at X" vs "I am a 3 at X" vs "I am a 5
at X" means to them, since I've found little correlation between how harshly
someone rates themselves and their true ability. Sometimes self-rated 5s are
really 5s by my book, sometimes self-rated 3s are really 5s by my book, and
sometimes self-rated 5s are really 2s by my book. So I want to know how "my
scale" translates to "their scale". If it was more formalized I'd go as far as
to get a "confidence quotient" for a person as self-critical and self-
confident people can be fantastic engineers or horrible engineers.

Does anyone else do this process when interviewing?

~~~
mark-r
While such a resource would make your job easier, it would make the
interviewee's job easier still. They'd just have to memorize all the points in
the reference.

~~~
timdierks
At which point they will no longer make these mistakes, and thus are more
expert.

Everyone wins!

~~~
mark-r
Except that you're risking hiring people that know all the trivia but still
can't program FizzBuzz properly. The test loses its predictive value.

~~~
neumann
but at least employers will stop testing trivia and focus on problem solving
skills.

Everyone wins!

------
famousactress
This list is an excellent summary. If tasked with a #11 I'd probably add the
slightly more obscure, but still super painful (when you do run into it)
implicit string concatenation:

    
    
        >>> l = ["a",
        ...      "b",
        ...      "c"
        ...      "d"]
        >>> l
        ['a', 'b', 'cd']

~~~
nine_k
Still hugely useful when you want to write long string constants.

Imho, adding a comma after _each_ list element is a good practice. You can
easily swap them, add more, and never run into the issue you describe:

    
    
        foo = [
          "a",
          "bc",
          "def",  # comma here, too
        ]

~~~
bluecalm
You could easily use a + operator then. I find the behavior surprising. I
would expect a syntax error. You get a syntax error if you write two integers
next to each other (separated by a space) or two of any other thing but
somehow "a" "b" got converted to "ab". If I'd discovered it myself I would be
tempted to file a bug report. It goes against the Python mantra:

"Explicit is better than implicit"

It seems someone thought along similar lines and took action:
[http://legacy.python.org/dev/peps/pep-3126/](http://legacy.python.org/dev/peps/pep-3126/)
but the issue got rejected.
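
To make the asymmetry concrete, here is a small sketch that checks which adjacent-literal forms the parser accepts. The helper name `compiles` is made up for illustration.

```python
import ast


def compiles(source):
    """Return True if `source` parses as a Python expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


# Adjacent string literals are merged into a single string by the parser.
print(ast.literal_eval('"a" "b"'))  # ab
print(compiles('"a" "b"'))          # True

# Adjacent integer literals are rejected outright.
print(compiles("1 2"))              # False
```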

~~~
famousactress
Yeah, when I discovered this little "feature" I had a read through that. The
folks that use this for blocks of multi-line text are very defensive about the
practice. I do understand not wanting to break compatibility though,
especially since finding instances of this is hard (which is another reason it
shouldn't exist in the first place!). Oh well :)

~~~
tripzilch
> The folks that use this for blocks of multi-line text are very defensive
> about the practice.

odd, because we have triple-quotes for that, don't we?

~~~
klibertp
No, unfortunately we have not.

With triple quotes you get a string with newlines and indentation in it. While
you can leave the following lines unindented, it looks ugly, and you can't do
anything about the newlines.

Take a look at how F# handles the issue:
[http://stackoverflow.com/a/14599828](http://stackoverflow.com/a/14599828)

And CoffeeScript:
[http://coffeescript.org/#strings](http://coffeescript.org/#strings)
(apparently implemented relatively recently
([https://github.com/jashkenas/coffeescript/issues/3229](https://github.com/jashkenas/coffeescript/issues/3229))
and borrowed from LiveScript).

There are other languages that support multiline strings (here docs) with
indents stripped by means of syntax, like YAML (with |), Racket (which
doesn't do dedenting, but being the language it is, it's very easy to add) and
many shells (with <<-). Python doesn't have this feature, and parse-time string
literal concatenation serves this purpose.

Of course, you can do something like:

    
    
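        import textwrap
    
        # Every line must share the same indentation for dedent() to strip it,
        # so the text starts on the line after the opening quotes.
        foo = """\
              bar
              indented at first
              and after newline
        """
        foo = textwrap.dedent(foo)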
    

(or use list literals with str.join, or use a regex, or many, many other
things), but you can do this in _all_ languages. Languages with syntactic sugar
for this make writing slightly-longer-but-not-too-long strings much easier and
cheaper (only done once during parsing, no need for imports, etc.), and Python
makes up for not having an explicit way of doing this with implicit parse-time
string literal concatenation.
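
For comparison, a short sketch of the two approaches side by side. The variable names and sample text are illustrative only.

```python
import textwrap

# Parse-time concatenation: no newlines or indentation end up in the string.
message = ("This sentence is split across two source lines "
           "but contains no newline.")

# Triple quotes keep newlines; dedent() strips the shared indentation at runtime.
block = textwrap.dedent("""\
    first line
    second line
""")

print(message)
print(repr(block))  # 'first line\nsecond line\n'
```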

------
tomp
Another one:

    
    
        class A():
          def __init__(self):
            self._x = 0
        
          @property
          def x(self):
            return self._x
    
          @x.setter
          def x(self, new_value):
            self._x = new_value
    

Using it:

    
    
        a = A()
        print a._x    # 0
        print a.x     # 0
        a.x = 4
        print a.x     # 4
        print a._x    # 0 wait WTF?!
    

The bug was not having `A` inherit from `object`. With old-style classes,
properties do not work correctly.

~~~
Spittie
For anyone wondering, this works just fine in Python 3 :)

~~~
shotwell
Yes, but just because all classes in python 3 are new-style.
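
For reference, a sketch of the same class written so the setter is honoured: in Python 3 as below, or by inheriting from `object` in Python 2.

```python
class A:  # in Python 2, write `class A(object):` to get a new-style class
    def __init__(self):
        self._x = 0

    @property
    def x(self):
        return self._x

    @x.setter
    def x(self, new_value):
        self._x = new_value


a = A()
a.x = 4
print(a.x)   # 4
print(a._x)  # 4
```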

------
herge
My pet peeve with python is the classic:

    
    
        return x,
    

I have never wanted to declare a tuple without surrounding it with (). Too bad
it's not a syntax error in python 3.

Also, as opposed to one of his examples, if you are using python 2.7, declare
your exception blocks as:

    
    
        except (FooException, BarException) as e:
    

It's forward compatible with python 3, it's easier to read and the syntax
errors are clearer.
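
A small sketch of that form in use. `parse_int` is a made-up helper, and `TypeError`/`ValueError` stand in for the `FooException`/`BarException` placeholders above.

```python
def parse_int(value):
    """Return int(value), or the name of the exception that was caught."""
    try:
        return int(value)
    except (TypeError, ValueError) as e:  # one parenthesised tuple of types
        return type(e).__name__


print(parse_int("3"))   # 3
print(parse_int("x"))   # ValueError
print(parse_int(None))  # TypeError
```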

~~~
icebraining
_I have never wanted to declare a tuple without surrounding it with ()._

No? I do that occasionally, e.g.:

    
    
      x, y = 5, 6

~~~
akx
Doing

    
    
        (x, y) = (5, 6)
    

makes it clearer to "visually grep" that it's not a normal assignment though.

~~~
rnnr
It's the comma that determines an expression as a tuple, not the parentheses.

~~~
herge
Not all the time.

    
    
        x = ,
    

causes a syntax error, while

    
    
        x = (,)
    

creates an empty tuple.

~~~
dalke
Under both Python 2.7 and Python 3.3 I get:

    
    
        >>> x=(,)
          File "<stdin>", line 1
            x=(,)
               ^
        SyntaxError: invalid syntax

~~~
Spittie
From the Python docs
([https://docs.python.org/3/tutorial/datastructures.html#tuple...](https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences))

> Empty tuples are constructed by an empty pair of parentheses;

And it works fine

    
    
        >>> x = ()
        >>> x
        ()
    

I guess herge meant that? After all, his argument was that a tuple is not
always defined by the comma.

~~~
hmsimha
There's also the distinction between x = (3) and x = (3,)
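
The cases discussed in this subthread, collected into one quick sketch:

```python
a = (3)    # parentheses only group: this is the int 3
b = (3,)   # the trailing comma makes a one-element tuple
c = 3,     # same tuple, no parentheses needed
d = ()     # the empty tuple is the one case with no comma

print(type(a).__name__)  # int
print(b == c)            # True
print(d, len(d))         # () 0
```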

------
gejjaxxita
#6 is really confusing. Whenever I encounter something like this my first
reaction is that whenever possible such obscure components of a language
should be avoided and more verbose/clear code used instead.

Programming languages are meant to be read as well as written, and someone
relatively new to Python (and many who have used the language for a long time)
is certain to get confused about the difference between:

    
    
       return [lambda x, i=i : i * x for i in range(5)]
    

and

    
    
       return [lambda x : i * x for i in range(5)]
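
A sketch of what each version produces when the returned functions are called. The wrappers `late_bound` and `early_bound` are names chosen here, not taken from the article.

```python
def late_bound():
    # Each lambda looks up i when it is *called*; by then the loop has finished.
    return [lambda x: i * x for i in range(5)]


def early_bound():
    # i=i stores the current value as a default when each lambda is *created*.
    return [lambda x, i=i: i * x for i in range(5)]


print([f(2) for f in late_bound()])   # [8, 8, 8, 8, 8]
print([f(2) for f in early_bound()])  # [0, 2, 4, 6, 8]
```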

~~~
mguillech
Agreed 100%, this type of construction should be avoided in the first place
in favor of more "readable" ones but this happens in a fair amount of code
that I've seen (and I keep seeing).

~~~
maxerickson
Some of it seems to come from people cargo-culting their knowledge of
anonymous and first class functions, so they end up believing that the only
way to pass a function around is to construct it anonymously.

~~~
nine_k
In the particular case of constructing _several_ similar functions that do
essentially the same thing, lambda is a rather natural choice.

Being more explicit and less hacky can well be combined with staying true to
functional style:

    
    
        from functools import partial
        
        mul = lambda x, y: x*y  # could use int.__mul__, too
        multipliers = [partial(mul, n) for n in range(5)]
    

It does the closure-capturing of n for you.

~~~
maxerickson
A function defined with def would work just the same there.

It would have a big ugly 'return' in it and be a few characters longer, but it
would work the same, so I don't see what lambda brings to it.

~~~
nine_k
Both def and lambda would fail identically in this context.
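
To illustrate, a sketch assuming the context meant here is the original closure-in-a-loop pattern: a `def` captures the loop variable just as late as a `lambda` does, while `partial` binds the value up front whichever way `mul` is written. The function names are illustrative.

```python
from functools import partial


def mul(x, y):
    return x * y


def make_with_def():
    funcs = []
    for i in range(5):
        def times(x):
            return i * x  # i is looked up at call time, as in the lambda
        funcs.append(times)
    return funcs


print([f(2) for f in make_with_def()])                      # [8, 8, 8, 8, 8]
print([f(2) for f in [partial(mul, n) for n in range(5)]])  # [0, 2, 4, 6, 8]
```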

------
outworlder
> "Python is an interpreted, object-oriented, high-level programming language
> with dynamic semantics."

I have an issue with that statement. No languages are inherently "compiled" or
"interpreted", that's a property of the implementation.

If we are talking about CPython here, Python code is compiled to bytecode
which is then interpreted. Not unlike Java - with the difference that the main
implementation has a JIT and afaik, Python's does not.

But that's CPython. What about PyPy? It has a JIT.
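
On CPython specifically, the compile step can be observed with the standard `dis` module. This is only a sketch; the exact instruction names vary between CPython versions, so none are shown as expected output.

```python
import dis


def add(a, b):
    return a + b


# The function body has already been compiled to a code object holding bytecode.
print(type(add.__code__).__name__)  # code
print(type(add.__code__.co_code))   # <class 'bytes'>

# List the bytecode instructions the interpreter loop will execute.
for instruction in dis.get_instructions(add):
    print(instruction.opname)
```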

~~~
FigBug
> No languages are inherently "compiled" or "interpreted", that's a property
> of the implementation.

A language and its implementation are usually designed at the same time.
Compiled or interpreted will affect design choices that go into the language.
While additional implementations may follow, it can be hard/impossible to
design a compiler (machine code, not byte code) for a language that was
designed to be interpreted without dropping features (ie eval).

It may be more correct to say 'Python was designed to be interpreted' than
'Python is interpreted'

~~~
nostrademons
Not really - Javascript was designed to be interpreted, and yet
V8/SpiderMonkey/Nitro all JIT-compile it down to machine code, sometimes very
effectively.

~~~
stefantalpalaru
When people talk about "compiled" vs. "interpreted" they usually mean AOT
compilation, not JIT.

~~~
nostrademons
Then Java and .NET are considered "interpreted"? And Android under Dalvik is
"interpreted", but under ART is "compiled" (using Java as the language, which
I'd always thought of as compiled, yet apparently is interpreted under your
definition)? What if you embed Clang & LLVM in your application to run C++?

I think this just illustrates the fuzziness of these definitions. A compiler
is just a piece of code; you can embed it into another piece of code and run
it whenever necessary. Maybe in the world of shrink-wrapped desktop software
there was a sharp distinction between AOT compiled languages and interpreted
ones, but we haven't lived in that world for a couple decades now.

------
mark-r
I've always thought that #1 is a sign of an incorrect operation altogether. If
you want to _always_ modify the passed parameter, it doesn't make sense to
have a default. If you want to return a modified version of the input, you
should make a copy immediately and then you don't get this problem. Doing both
an in-place modification and returning a modified object at the same time is
just wrong.

~~~
ajanuary
A slightly more realistic example:

    
    
        class Bag(object):
            def __init__(self, items=[]):
                self.items = items
    
            def add_item(self, item):
                # check the item is valid
                self.items.append(item)
    
        bag1 = Bag()
        bag1.add_item('an item')
    
        bag2 = Bag()
        print(bag2.items)  # ['an item'], because both bags share the default list

~~~
mark-r
Again, the problem is not what it appears. You're keeping a reference to an
existing item rather than making a copy. The results would be just as bad if
you passed in an initial list rather than taking the default.

    
    
        initial = ['first item']
        bag3 = Bag(initial)
        bag3.add_item('second item')
        print(initial)
    

I think the surprising thing to most people is that you don't automatically
get a _copy_ when you do the assignment. That's how it works in older
languages like C and C++, and how it appears to behave when you use immutable
objects.
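
A sketch of the copy-on-the-way-in approach applied to the `Bag` example above. Using `None` as the default is an extra choice made here so that the default case is covered too.

```python
class Bag:
    def __init__(self, items=None):
        # Copy whatever was passed so the bag never aliases the caller's list.
        self.items = list(items) if items is not None else []

    def add_item(self, item):
        self.items.append(item)


initial = ['first item']
bag3 = Bag(initial)
bag3.add_item('second item')
print(initial)      # ['first item']
print(bag3.items)   # ['first item', 'second item']

bag1 = Bag()
bag1.add_item('an item')
print(Bag().items)  # []
```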

------
Hovertruck
"Thus, the bar argument is initialized to its default (i.e., an empty list)
only the first time that foo() is called, but then subsequent calls to foo()
(i.e., without a bar argument specified) will continue to use the same list to
which bar was originally initialized."

This actually happens when the function is defined, not when it's called the
first time.

------
codezero
OT but this page hard crashes Safari on iPhone.

~~~
peter_l_downs
Also crashes Safari on my OS 10.5 Mac, and is unusably laggy in Firefox on the
same computer. All sorts of thrashy javascript nonsense seems to be going on.

~~~
rmrfrmrf
I use JavaScript Blocker for Safari, which is sort of like a less paranoid
(and more convenient) version of NoScript. Looks like this site attempts to
load 45 JavaScript files over 12 iframes, 17 of which JS Blocker blocked.

Mildly excessive? /s

------
cefstat
I have been bitten by #6 in a similar situation in the past. My solution was
the analogue of the rather convoluted

    
    
        def create_multipliers():
          def multiplier(i):
            return lambda x: i*x
          return [multiplier(i) for i in range(5)]
    
        for multiplier in create_multipliers():
          print multiplier(2)
    

I would still prefer that Python doesn't do this.

------
vram22
In "Common Mistake #2", I'd say that the mistake is fairly obvious to anyone
who understands even a little bit about OOP and inheritance. Since class C
doesn't define its own variable x, it has to be that it inherits the x in
class A, so there's no reason to be surprised that C.x changes when A.x does.

~~~
radiowave
While I agree that the "problem" case can be seen as obvious when considered
in isolation, really it's the behaviour of the two cases taken together that
can seem inconsistent. Nothing about understanding OOP or inheritance will
prepare a person for that.

~~~
vram22
> really it's the behaviour of the two cases taken together that can seem
> inconsistent.

Why do you think so? I think that both cases seem consistent, or rather,
correct (and therefore this example should not be treated as a common Python
mistake), because x is not assigned a value anywhere in class C, and C
inherits from A, so it should be clear to anyone knowing OOP and inheritance,
that C's x is the same as A's x. (And the same holds true for inherited
methods.) Even the OP says that in the post:

> In other words, C doesn’t have its own x property, independent of A.

~~~
radiowave
What's happening here is that a variable is inheriting its _value_ from the
superclass, except for when it doesn't. And when it doesn't, why is that? Well
presumably it's because something's been overridden - OO tells us that's how
we change the properties that are inherited from the superclass. No wait,
that's not it; nothing's been overridden here. All that's happened is we've
assigned a value to B.x, and doing so seems to have changed the inheritance of
our class.

So this variable is neither completely shared across classes and their
subclasses (per Smalltalk class variables), nor completely independent across
classes and their subclasses (per Smalltalk class _instance_ variable), but
instead its [in]dependence alters based upon whether (and where) you assign
values to it.

While I can understand that in terms of the dictionary mechanism used to
implement it, from my point of view it's just weird behaviour.
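
In terms of that dictionary mechanism, a sketch using the article's `A`, `B` and `C` classes: attribute lookup walks each class's `__dict__` along the inheritance chain, and assigning `B.x` adds an entry to `B`'s own dictionary.

```python
class A(object):
    x = 1


class B(A):
    pass


class C(A):
    pass


print('x' in B.__dict__)  # False: B.x is found on A
B.x = 2                   # creates a new entry in B's own namespace
print('x' in B.__dict__)  # True
A.x = 3
print(A.x, B.x, C.x)      # 3 2 3
print('x' in C.__dict__)  # False: C still reads A's value
```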

~~~
ndeine
This is actually the same thing as regular Python scoping rules; there's not
even any fancy OOP logic behind it. Here's the same thing, but using global
scope and functions instead of classes and inheritance.

    
    
         >>> x = 1
         >>> def a():
        ...:     print(x)
        ...:
         >>> def b():
        ...:     x = 2
        ...:     print(x)
        ...:
         >>> def c():
        ...:     print(x)
        ...:
         >>> a(), b(), c()
        1
        2
        1
         >>> x = 3
         >>> a(), b(), c()
        3
        2
        3
    

I think there is an argument to be made that classes are special and "reaching
upwards" into the superclass scope should not occur - a unique copy should be
made - but I also think that Python's way of doing it makes enough sense that
it is not confusing. The Python devs are at least consistent about having
their own way of doing things.

~~~
radiowave
That's an interesting point, that the behaviour of an inherited class variable
is consistent with a case you show where inheritance plays no part at all.

So from that point of view, it comes down to whether we expect that an
inherited class variable really is just some variable in an outer scope that
we can shadow with a local variable of the same name (per your example), or
whether we expect that inheritance provides some stronger notion of ownership
of the inherited variable.

I dislike the former case, largely because I dislike the idea that the
location at which a variable is stored can appear to change merely by
assigning to it. But then, I dislike Python's implicit declaration of local
variables for exactly the same reason. So you're right, there _IS_ some
consistency there. ;-)

------
evincarofautumn
Here’s a thought experiment for you—think of “common mistakes in language X”
as “design flaws in language X” or “ways in which language X is surprising”
and what could have been done to mitigate that.

~~~
pekk
Whether you choose 0-based indexing or 1-based indexing, somebody is going to
be confused about indexing sometime.

~~~
JadeNB
That's why I always use 7-based indexing, to avoid any possibility of
confusion.

------
udioron
Regarding circular imports and #7: The main problem arises when using the
_from mymodule import mysymbol_ notation.

The example solved this by properly using _import mymodule_, although this
might cause more problems if your design is wrong, as seen in the example.
Calling _f()_ from the module ("library") code itself is a very bad idea.
Instead one should do this:

a.py:

    
    
        import b
    
        def f():
            return b.x
    

b.py:

    
    
        import a
    
        x = 1
    
        def g():
            print a.f()
    

main.py:

    
    
        import a
    
        a.f()
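
A self-contained sketch of this layout, written to a temporary directory so it can be run as a single script. The module names `circ_a` and `circ_b` are stand-ins for `a.py` and `b.py`, chosen here to avoid clashing with real modules.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source of the two mutually importing modules; neither calls the other at import time.
MODULES = {
    "circ_a.py": "import circ_b\n\ndef f():\n    return circ_b.x\n",
    "circ_b.py": "import circ_a\n\nx = 1\n\ndef g():\n    return circ_a.f()\n",
}

with tempfile.TemporaryDirectory() as tmp:
    for name, source in MODULES.items():
        Path(tmp, name).write_text(source)
    sys.path.insert(0, tmp)
    try:
        circ_a = importlib.import_module("circ_a")
        circ_b = importlib.import_module("circ_b")
        result = circ_b.g()  # g -> circ_a.f -> circ_b.x, all resolved at call time
    finally:
        sys.path.remove(tmp)

print(result)  # 1
```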

------
michaelmior
For the first gotcha, using None as a default argument solves the problem, but
checking `if not bar` instead of `if bar is None` can produce different
results if bar evaluates to False in a boolean context.

    
    
        >>> def foo(bar=None):
        ...    if not bar:
        ...        bar = []
        ...    bar.append("baz")
        ...    return bar
        ...
        >>> bar = []
        >>> foo(bar)
        ['baz']
        >>> bar
        []

~~~
icebraining
True, but foo shouldn't be modifying bar anyway; make a copy instead:

    
    
      bar = list(bar) if bar else []
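
The difference can be sketched by putting the two checks side by side (the function names are illustrative):

```python
def foo_truthy(bar=None):
    if not bar:      # also true for an empty list passed by the caller
        bar = []
    bar.append("baz")
    return bar


def foo_is_none(bar=None):
    if bar is None:  # only true when the argument was omitted (or None)
        bar = []
    bar.append("baz")
    return bar


mine = []
foo_truthy(mine)
print(mine)  # []  (the caller's empty list was silently replaced)

mine = []
foo_is_none(mine)
print(mine)  # ['baz']
```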

------
dmritard96
There are more than a few interesting points in here but this is funny to me
coming from someone who is seemingly well versed in python: Mistake #5

numbers = [n for n in range(10)]

this should be: range(10)

~~~
abaschin
not in Python 3, where range() replaces xrange()

~~~
dmritard96
interesting, haven't written in python 3 yet
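
Under Python 3 the difference looks like this (a quick sketch); code that goes on to modify `numbers` in place needs the real list:

```python
numbers = range(10)
print(isinstance(numbers, list))  # False: in Python 3 range() is a lazy sequence
print(list(numbers))              # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```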

------
ygra
And I would have thought that incorrect usage of bytestrings for text and then
asking on Stack Overflow about the UnicodeDecodeErrors would be quite common
as well ...

~~~
gtaylor
Yeah, bigtime. I'm not sure that would make for a quick, easy countdown list
of a blog post, though.

------
cridenour
For #7, now you have a performance problem of importing every time you run
that function. Rather, you can place the import at the bottom of b.py and be
okay.

~~~
mguillech
Python caches imported modules; you can check this in your local shell with
`import sys; sys.modules`. That's why, whenever you make changes to a module
that has already been loaded, you won't see the changes until you load the
module again, either by quitting the shell or by using reload(module) on Python 2.x.
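
A sketch of the caching behaviour, using `json` as an arbitrary standard-library module and the Python 3 spelling of `reload`:

```python
import importlib
import json
import sys


def load():
    import json as cached  # after the first import this is a dictionary lookup
    return cached


print(sys.modules["json"] is json)  # True
print(load() is json)               # True: no second load happens

# Python 3 equivalent of Python 2's reload(module).
reloaded = importlib.reload(json)
print(reloaded is json)             # True: the same module object, re-executed
```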

------
tom_jones
Any reason why you're using a slice here?

        >>> numbers[:] = [n for n in numbers if not odd(n)]

I'm thinking that doing

        >>> numbers = [n for n in numbers if not odd(n)]

wouldn't be a problem, since the assignment is executed after the
computation of the list comprehension.
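
One observable difference, sketched with an assumed `odd` helper like the article's: the slice assignment mutates the existing list, so any other reference to it sees the change, while plain assignment only rebinds the name.

```python
def odd(n):
    return n % 2 == 1


numbers = list(range(10))
alias = numbers  # a second name for the same list

numbers[:] = [n for n in numbers if not odd(n)]
print(alias)     # [0, 2, 4, 6, 8]: the shared list was mutated in place

numbers = list(range(10))
alias = numbers
numbers = [n for n in numbers if not odd(n)]
print(alias)     # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]: the old list is untouched
print(numbers)   # [0, 2, 4, 6, 8]
```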

------
andreif
I have just been asked by a colleague about yet another gotcha when you have
for example:

    
    
        # mypackage/__init__.py
        from .settings import settings
    

and when trying to import settings from mypackage

    
    
        from mypackage import settings
    

you get the module instead of the settings object.

~~~
kyro
from mypackage.settings import settings

------
leephillips
Is the explanation for #1 correct?

"when the default value for a function argument is an expression, the
expression is evaluated only once"

I would explain the behavior he shows as due to the default value being
_mutable_. I don't see an expression there, just an empty list used as a
default.

~~~
herge
The default value is an expression which is executed when the module is
instantiated, even if the expression results in an empty list. If I had:

    
    
        def f(now=datetime.datetime.now()):
            ...
    

now would be the time when the module was loaded, not when f is called the
first time, or when f is called after that, despite datetime objects being
immutable.

~~~
kghose

      import datetime, time
      def foo2():
        def foo(a=datetime.datetime.now()):
          time.sleep(1)
          return a
        return foo()
    
      for n in range(10): print foo2()
    
      2014-05-08 11:23:01.642871
      2014-05-08 11:23:02.644276
      2014-05-08 11:23:03.644579
      2014-05-08 11:23:04.645146
      2014-05-08 11:23:05.646328
      2014-05-08 11:23:06.647572
      2014-05-08 11:23:07.647904
      2014-05-08 11:23:08.648213
      2014-05-08 11:23:09.648973
      2014-05-08 11:23:10.649742
    

So, not quite module load time, but evaluation time.

~~~
marcosdumay
Kilink's comment applies again.

If you define several functions at different times, the default argument will
be evaluated each time you define a new function. You'll have the same
behaviour if you keep reassigning lambdas at the same function name, or if you
keep editing the globals.

You are misunderstanding how dynamic Python is. And, yes, the part about
"module load time" was a simplification.

------
baby
Does anyone know if the first mistake is also present in Ruby? I ran into
this code:

    
    
        def get_analytics_data(options = {})
            options = options.merge({'ids' => GAReadonly.configuration.id })
    

and I wonder if I should fix it.

~~~
ludamad
It's fine, Ruby takes the more sensible semantics here (IMO)

------
ajanuary
The scoping rules in #4, combined with being able to reference variables
before definite assignment, is what leads to the 'variable hoisting' in
Javascript.

Is #6 really called 'late binding'? That seems like the wrong term.

~~~
mguillech
Please check out
[http://en.wikipedia.org/wiki/Late_binding](http://en.wikipedia.org/wiki/Late_binding)
to know more.

~~~
ajanuary
I've always heard it used to refer to method dispatch, which that wikipedia
article also seems to. However, it seems like the Python spec does use it to
refer to when variable values are resolved.

------
worklogin
Even the first argument doesn't make sense to me. The optional argument is
within the scope of the function; why is the temporary optional argument
getting carried over?

------
udioron
Regarding common mistake #5, when using enumerate(mylist), mylist can be
modified:

    
    
        for i, x in enumerate(numbers):
            if odd(x):
               del numbers[i]
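
A caveat worth checking before relying on this: deleting inside the loop shifts later elements left while the `enumerate` index keeps advancing, so the element right after each deletion is skipped. A sketch with an assumed `odd` helper:

```python
def odd(n):
    return n % 2 == 1


numbers = [1, 3, 5, 7]
for i, x in enumerate(numbers):
    if odd(x):
        del numbers[i]
print(numbers)  # [3, 7]: two odd numbers survived

# Building a new list and assigning it back avoids the problem.
numbers = [1, 3, 5, 7]
numbers[:] = [n for n in numbers if not odd(n)]
print(numbers)  # []
```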

------
euske
Some of the mistakes mentioned in OP (#1, #3, and #4) can be automatically
caught by tools like PyLint (and to a lesser extent, Pyflakes), as well as good
unit tests.

------
adidash
Bookmarked it. Might not be the most complete list but appreciate the OP is
taking feedback and updating it. Kudos to hsinger!

------
skywhopper
Hmm, "Common Mistake #1" appears to be an error on the side of the Python
interpreter, not the programmer.

------
ronaldx
I now feel dumb, since Python most often burns me with "Assignment Creates
References, Not Copies".

------
lukasm
For #4 use nonlocal keyword (python 3.x)
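
A minimal sketch of `nonlocal` in use (Python 3 only; the counter example is illustrative rather than taken from the article):

```python
def make_counter():
    count = 0

    def increment():
        nonlocal count  # rebind the enclosing function's variable
        count += 1
        return count

    return increment


counter = make_counter()
print(counter(), counter(), counter())  # 1 2 3
```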

------
yeukhon
Circular import is definitely one of the things I absolutely hate to deal with
again and again.

------
warriar
Common Mistake: create a blog that reliably crashes mobile Safari on
every visit!

------
heyjonboy
Does this link crash anyone else's Safari on iPhone 5s?

------
andrewdon
good stuff

------
picanteverde
Great Article!

------
kghose
Also, for the first example, it is a scope issue:

    
    
      def foo2():
        def foo(a=[]):
          a.append('ba')
          return a
        return foo()
    
      print foo2()
      print foo2()
      print foo2()
      print foo2()
      print foo2()
    

Gives

    
    
      ['ba']
      ['ba']
      ['ba']
      ['ba']
      ['ba']

~~~
kilink
I think the way he described it is pretty accurate. Default keyword arguments
are only evaluated once at function definition, so supplying a mutable default
keyword argument can cause issues.

Your example is pretty contrived and doesn't illustrate what he was pointing
out, as you're creating a new function foo every time foo2 is called, and only
calling it once.
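
A sketch that makes the per-definition evaluation visible. The `make_default` helper and the `evaluations` list are added here only to count how often the default expression runs.

```python
evaluations = []


def make_default():
    evaluations.append("evaluated")
    return []


def foo(a=make_default()):  # make_default() runs once, right here
    a.append('ba')
    return a


def foo2():
    def inner(a=make_default()):  # runs again on every call to foo2()
        a.append('ba')
        return a
    return inner()


print(foo())             # ['ba']
print(foo())             # ['ba', 'ba']
print(foo2())            # ['ba']
print(foo2())            # ['ba']
print(len(evaluations))  # 3: once for foo, once per foo2() call
```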

