
Couldn't agree more! One of my all-time favorite Python interview questions trips up a surprisingly large number of developers.

Given a function like:

    def append_one(l=[]):
        l.append(1)
        return l

What does this return each time?

    >>> append_one()

    >>> append_one()

    >>> append_one()

The l (lowercase L) and the 1 (one) look really similar. Could that be the cause of some confusion? Of course, the function name helps, but most developers have learned not to trust function names to be an accurate description of what the function does, especially in tricky interview questions.

Still, I'd change this to something like:

    def append_five(l=[]):
        l.append(5)
        return l

It tests the same thing (knowledge of how default parameters work), but without the confounding problem of similar-looking characters. Of course, syntax highlighting would help the applicant out.

All of that being said, I still don't doubt that many developers don't know what they should about default parameters.

I'm not a Python expert, but iirc from various blog posts the "l" variable does not get reset between function calls, which will cause undesired behavior. So calling the function 3 times without an argument would produce lists of size 1, 2, and 3 (all the same list, which has 3 elements by the third call) rather than 3 separate lists of size 1. Can any Python gurus confirm?

The expression presented in the parameter list is only evaluated once, and that is when the method is defined. The confusion is that people assume the expression is evaluated every time the method is called.
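One way to see this for yourself, as a minimal Python 3 sketch. The names `make_default` and `calls` are made up for the illustration; the only thing taken from the thread is the shape of `append_one`.

```python
calls = []  # records every evaluation of the default expression

def make_default():
    """Hypothetical helper: note that it ran, then return a fresh list."""
    calls.append("evaluated")
    return []

def append_one(l=make_default()):  # make_default() runs here, at def time
    l.append(1)
    return l

evaluations_after_def = len(calls)    # already 1, before any call

first = append_one()
second = append_one()
third = append_one()

evaluations_after_calls = len(calls)  # still 1: no re-evaluation per call
print(third)                          # [1, 1, 1]
print(first is second is third)       # True
```

All three calls hand back the same list object, which is why the list keeps growing.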

> The confusion is that people assume the expression is evaluated every time the method is called.

Because that's how it works in a lot of other languages, such as Ruby and Javascript.

I doubt that's the only reason. I fell for it myself at first without ever seeing a line of Ruby. It initially feels intuitive, and that's why I think most fall for it.

I think it's because the arguments are bound when the function is called. It's just natural that you'd expect the default values to also be bound at the same time.

Yes that is correct. The default value gets created when the function is interpreted ("compiled").

> The default value gets created when the function is interpreted ("compiled").

No. The default value gets "created" (the expression is evaluated and stored) when the def statement is executed. Take the following example:

  In [1]: def foo():
     ...:     def append_five(l=[]):
     ...:         l.append(5)
     ...:         return l
     ...:     return append_five

  In [2]: a = foo()

  In [3]: b = foo()

  In [4]: a()
  Out[4]: [5]

  In [5]: b()
  Out[5]: [5]

  In [6]: _4 is _5
  Out[6]: False

We only wrote one function definition, but multiple lists are created. (They are created when the "def append_five" definition executes, during the execution of foo.)

I thought that was what he meant. Is there any sharp distinction between "interpreting" and "evaluating" in Python that I am unaware of? I've always used the words more or less interchangeably. But now that I think about it, that might be a little naive, since I have no idea how it works under the hood.

You could say that interpreting is first parsing and second executing/evaluating. The parser tokenizes and does a small amount of optimization such as ignoring unassigned values.

The parent wrote "compiled", which is certainly more incorrect than either "interpreted" or "evaluated."

The key is object mutability. A list type is mutable and a tuple type is immutable.

If the candidate correctly deduces what will happen, I'll ask them to write a bug-free version, which looks like one of the below:

    def append_one(var=None):
        var = var or []
        var.append(1)
        return var

    def append_one(var=None):
        if var is None:
            var = []
        var.append(1)
        return var

Mutability is a very subtle but very important concept to understand in python. Everyone who uses python for non-trivial code should know it well: https://docs.python.org/2/reference/datamodel.html

> The key is object mutability. A list type is mutable and a tuple type is immutable.

I don't think the question has much to do with mutability. It isn't surprising to me, nor I imagine to most programmers, that a list is mutable; that's very common.

The surprising part of this question is that the default value of 'l' continues to exist outside the lexical scope of the function. The expected behavior is that the value of 'l' is initialized at function call time and is garbage collected after each call. As it stands, using default values in Python is sort of like defining a global that only has a named reference inside the function block, which is very strange.

It has something to do with mutability, because if an object is immutable, the behavior of Python matches what the naive developer expects. It's only mutable objects that break those expectations.
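A small Python 3 sketch of that contrast. `bump` and `bump_list` are hypothetical names, not from the thread. Both defaults are created once, but only the mutable one can be changed in place.

```python
def bump(count=0):
    # `+=` on an int rebinds the local name; the default object 0 is untouched.
    count += 1
    return count

def bump_list(counts=[]):
    # append() mutates the one list object stored as the default.
    counts.append(1)
    return len(counts)

int_results = [bump(), bump(), bump()]
list_results = [bump_list(), bump_list(), bump_list()]

print(int_results)   # [1, 1, 1]
print(list_results)  # [1, 2, 3]
```

The evaluation rule is identical in both cases; the int version only looks "reset" because nothing can modify the stored 0.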

Don't even get into unexpected behavior in classes:

    In [1]: class A(object):
       ...:     l = []

    In [2]: a, b = A(), A()

    In [3]: a.l.append("Something")

    In [4]: a.l
    Out[4]: ['Something']

    In [5]: b.l
    Out[5]: ['Something']

    In [6]: class B(object):
       ...:     l = None
       ...:     def __init__(self):
       ...:         self.l = []

    In [7]: c, d = B(), B()

    In [8]: c.l.append("Something")

    In [9]: c.l, d.l
    Out[9]: (['Something'], [])

The other scoping issue in python that always struck me as strange is that loop variables aren't scoped to the loop, they continue to exist after the loop completes. I can see the logic for this feature even if I don't agree with it, but what I really don't get is that the loop variables are not defined if you iterate over something that is empty:

   >>> for item in [1]:
   ...   print item
   1
   >>> item
   1

   >>> for i in []:
   ...   print i
   >>> i
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   NameError: name 'i' is not defined

I would expect i == None. That oddity makes it dangerous to use the feature unless you're really careful (e.g. using a for - else construct).

Then there's for-else:

    In [1]: for i in []:
       ...:     pass
       ...: else:
       ...:     print 'Else!'
    Else!

    In [2]: for i in []:
       ...:     break
       ...: else:
       ...:     print 'Else!'
    Else!

    In [3]: for i in range(2):
       ...:     break
       ...: else:
       ...:     print 'Else!'

    In [4]: for i in range(2):
       ...:     pass
       ...: else:
       ...:     print 'Else!'
    Else!

The syntax could be interpreted as:

  if len(l) == 0:
    print "Else!"
  else:
    for i in l:
      pass

The "catch cases where a `break` is triggered" case isn't common enough for this syntax feature to be encountered very often, leading to confusion when people come across it (though at least it's not a bug where a common use-case has weird behavior for newcomers).
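For what it's worth, here is the kind of search loop where the `else` clause reads naturally, as a rough Python 3 sketch. `find_first_negative` is a hypothetical function invented for the example.

```python
def find_first_negative(numbers):
    """Describe the first negative number in `numbers`, if there is one."""
    for n in numbers:
        if n < 0:
            result = "found %d" % n
            break
    else:
        # Runs only when the loop ended without hitting `break`,
        # which includes the case where `numbers` is empty.
        result = "no negatives"
    return result

print(find_first_negative([1, -2, 3]))  # found -2
print(find_first_negative([1, 2]))      # no negatives
print(find_first_negative([]))          # no negatives
```

So the `else` is really "no break happened", not "the sequence was empty".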

> but what I really don't get is that the loop variables are not defined if you iterate over something that is empty

if you conceptualize how a for-loop has to work as a while-loop using Python's iterator protocol (which is the only way the iterator protocol itself makes sense), it seems pretty intuitive.

That is, this:

  for item in items:
      body
  else:
      else_body

becomes, approximately:

  try:
      __hidden_iter = iter(items)
      while True:
          try:
              item = __hidden_iter.next()
          except StopIteration:
              raise __NormalLoopExit
          body
  except __NormalLoopExit:
      else_body

If you have an empty loop, the first assignment doesn't complete (instead raising StopIteration in evaluating the right side, which raises the notional exception __NormalLoopExit, which invokes the else: clause, if any) so the variable never gets around to being created.
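A runnable Python 3 approximation of the same idea, with the notional exception replaced by a plain `break`. The function name `for_loop_by_hand` and the `log` parameter are assumptions made for the sketch, not how the interpreter actually implements loops.

```python
def for_loop_by_hand(items, log):
    """Hand-written stand-in for:

        for item in items:
            log.append(item)
        else:
            log.append("else")

    Returns True if the loop variable `item` was ever bound.
    """
    hidden_iter = iter(items)
    while True:
        try:
            item = next(hidden_iter)  # the loop variable is bound here
        except StopIteration:
            log.append("else")        # normal exit: run the else clause
            break
        log.append(item)              # loop body
    # `item` is a local name only if next() succeeded at least once.
    return "item" in locals()

log = []
print(for_loop_by_hand([1, 2], log), log)  # True [1, 2, 'else']
log = []
print(for_loop_by_hand([], log), log)      # False ['else']
```

With an empty iterable, the very first `next()` raises before the assignment finishes, so `item` never comes into existence.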

> if an object is immutable, the behavior of Python matches what the naive developer expects

If the object was immutable then append wouldn't work. That's hardly matching expectations.

Most immutable objects don't have methods that would mutate the value, but fail because the object is immutable...

I guess the clarification to what I was saying is that, in the simple case (integers, strings, None) the objects are immutable. It's only getting into cases where the value of the object itself is mutable, that you run into issues. If all objects (or all objects 'allowed' as default values) were immutable, then this behavior would not trigger.

So saying that mutability has nothing to do with it isn't entirely true. It's the immutability of the types of values used in most simple cases that hides this issue from developers until they run into a more complex case.

Read my post that has the "correct answers" which show you how to do it. The key is setting the default to None and then doing something like:

    if val is None: val = []

or the more idiomatic python way:

    val = val or []

I believe the former is more idiomatic, but I don't have a reference.

You want to explicitly check against `None` so that you're not overwriting all falsey values of `val`. Even though you should generally try to enforce argument types, your second example would cause unexpected behavior in some cases, particularly those that have non-falsey 'default' assignments.

I'm well aware of that way to do it, but it doesn't excuse a different way being unintuitive.

Why? You would just get a new list back.

I can see why you might think so, but remember the Zen of Python is to have only one obvious way to do something.

    >>> [1,2,3] + [4,5]
    [1, 2, 3, 4, 5]

Thus appending should do something different than addition.

    >>> x = [1,2,3]
    >>> x.append([4,5])
    >>> x
    [1, 2, 3, [4, 5]]

I'm sorry, but I don't understand why you would think of this as unexpected behaviour. For the class A, the list l is a class-level attribute, hence it can be referred to via either the a or b object, but for class B, after initialisation, l is an object attribute, so it is different for c and d.

It's not the concept that can be confusing, it's the syntax python chose.

In most of the languages I'm familiar with, there are very clear syntax differences when working with class attributes. For example, in many languages class attributes have to be accessed via the class name instead of from an instance of the class making it clear to the programmer they are working with a class attribute, e.g. MyClass.myClassVariable not myInstance.myClassVariable. Additionally, the way you define class attributes in python is the way you define instance attributes in many languages, which just adds to the confusion. e.g. in Java or C# you can define class variables directly in the class body, but an explicit 'static' keyword is needed, undecorated definitions are assumed to be instance variables.

Finally, I think the definition of class B above is a little more nuanced, class B has both a class attribute named l AND an instance attribute named l.

B.l == None and B().l == []
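You can check where each attribute actually lives with `vars()`. A short Python 3 sketch, reusing the classes A and B from the session further up; the result variable names are just for illustration.

```python
class A(object):
    l = []            # one list, stored on the class

class B(object):
    l = None          # class attribute
    def __init__(self):
        self.l = []   # instance attribute that shadows B.l

a, b = A(), A()
c, d = B(), B()

shared = a.l is b.l is A.l   # True: both lookups fall through to the class
a_has_own = "l" in vars(a)   # False: the instance dict is empty
c_has_own = "l" in vars(c)   # True: __init__ stored a list on the instance
separate = c.l is not d.l    # True: each instance got its own list
class_value = B.l            # None: the class attribute is still there
```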

Ah, gotcha!

It's been a while since I've done major OOP coding in any language other than Python, so I'm a little rusty. The issues you raise are perfectly legitimate and would be understandably confusing to newcomers to the language. :)

In Python everything is an object, including a function. The default value isn't a global, it belongs to the function.
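In Python 3 you can look at it directly: the defaults are stored on the function object as the tuple `__defaults__`. A minimal sketch, using the `append_five` shape from earlier in the thread; the snapshot variables are illustrative.

```python
def append_five(l=[]):
    l.append(5)
    return l

# Copy the contents, since __defaults__ holds the live list object.
defaults_before = [list(d) for d in append_five.__defaults__]  # [[]]

result = append_five()
append_five()

defaults_after = [list(d) for d in append_five.__defaults__]   # [[5, 5]]
same_object = result is append_five.__defaults__[0]            # True
```

The list returned to the caller is the very object stored on the function, which is why it persists between calls.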

I wonder if people who weren't exposed to languages which work differently, à la C++, would be as surprised?

I'm not a Python dev, but I've been meaning to learn for a while. So this is really interesting stuff. A few questions, if you don't mind.

I understand mutability and immutability in other languages (and I gave your link a quick read to make sure there weren't any weird Python-specific rules), so I understand how the list can change and still be the same object, but a tuple or string would not. But why does that mean that the default parameter object remains in existence throughout all calls, instead of being recreated each time it is called?

Is there a reason for this being the default behavior? It seems like the majority of the time you would want to use a default parameter, you'd want it to behave like your bug-free examples.

Think of the default parameter values as arguments to the initializer for the function object. If you passed a list into the constructor of a class, you wouldn't be surprised that if you modified the list outside the class that it would modify the same list inside the class.

While that explains how it works, I actually completely agree with you. This is surprising behavior and, in a language that prides itself on not being surprising, seems, well, surprising.

I have to wonder if performance isn't the big reason for it. If your default is [], it isn't a big deal to re-evaluate, but if your default is get_default_cities_from_slow_web_service(), having that re-evaluated on every function call would be catastrophic. Given the choice between two negatives, the choice they made is probably reasonable.

You pretty much nailed it right there.

Before I ever ask this question (I do a lot of tech interviews sadly) I always ask the candidate about object mutability vs immutability. Almost everyone knows the textbook answer, and only a few know the actual implications of it. This tests which they know :)

Default kwargs of a function are defined at function definition. However, they are only in scope, for the scope of said function. It is a weird but important subtle difference.

I think only the second is truly bug-free. The first only does what the user expects if they pass in a non-empty list:

  my_list = []
  append_one(my_list)
  # my_list didn't get anything appended to it

This shows up another subtle trap related to the "truthiness" (or falsiness in this case) of things like the empty list.
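Here are the two variants side by side, as a Python 3 sketch. The suffixed names `append_one_or` and `append_one_is_none` are made up so both can exist in one file.

```python
def append_one_or(var=None):
    var = var or []      # replaces ANY falsy argument, including []
    var.append(1)
    return var

def append_one_is_none(var=None):
    if var is None:      # replaces only a missing (None) argument
        var = []
    var.append(1)
    return var

mine = []
append_one_or(mine)
after_or = list(mine)        # []: the 1 went into a throwaway list

append_one_is_none(mine)
after_is_none = list(mine)   # [1]: the caller's list was modified
```

Both behave the same when called with no argument; they only differ when the caller passes something falsy.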

Since append doesn't return a value, how about:

  def append_one(var=None):
      return (var or []) + [1]

Would this take longer and/or use more storage for long lists as vars?

When you use + on two lists, a new list is created, and elements from both are copied into the new one, whereas the append operation modifies the list in place and simply adds a value. Keep in mind that a Python "list" is really like a C++ vector, so while an append sometimes has to allocate a new array and copy all the values, it is amortized O(1). The add operation is O(n).
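The identity difference is easy to check. A tiny Python 3 sketch with illustrative variable names (it shows which object gets changed, not the timing):

```python
original = [1, 2, 3]

combined = original + [4]                   # builds a new list, copies elements
is_new_object = combined is not original    # True
original_unchanged = original == [1, 2, 3]  # True

alias = original
original.append(4)                          # modifies the existing list in place
same_object = alias is original             # True
alias_sees_change = alias == [1, 2, 3, 4]   # True
```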

And besides all that, there is nothing wrong with doing an append on one line, and returning the variable on the next. It's clear and readable.

I like this, very elegant actually.

I'd argue that the key difference from other languages is (re)assignment rather than (im)mutability.

Yes, that's correct. The default value is only evaluated once, when the `def` statement is executed. After that point, it's completely mutable. You have to see Python functions as objects and default parameter values as object variables.

The problem is not the existence of the object variables, but that they are in such an unfortunate place. The rest of the parameter list is declaring fresh local variables. It's inconsistent that the left side of the equals sign is per-invocation, and the right side is per-def.

Not the op, but I'd accept the confusion response of: [[]] [[],[]] [[],[],[]]

because the behavior is the same, whether or not they misread an 'l' as a 1.

Python doesn't seem to agree with you :^)

  In [1]: a = []

  In [2]: a.append(a)

  In [3]: a
  Out[3]: [[...]]

  In [4]: a[0]
  Out[4]: [[...]]

  In [5]: a[0][0]
  Out[5]: [[...]]

  In [6]: a[0][0][0]
  Out[6]: [[...]]

  In [7]: a[0][0][0][0]
  Out[7]: [[...]]

  In [8]: a.append(a)

  In [9]: a
  Out[9]: [[...], [...]]

  In [10]: a[0][1][0] is a
  Out[10]: True

  In [11]: id(a)
  Out[11]: 4547140064

  In [12]: id(a[0][1][0])
  Out[12]: 4547140064

Yeah, the whole infinite loop thing. Wasn't fully thinking when I wrote my reply. Good catch.

I would caution you not to interview on things you would not be happy to see in your code base.

In my experience you're much better off with people who look at odd syntax and say, "I don't know what that does" than with those who do.

Well, this is pulled from a list of common python errors.

Using the default value in some capacity isn't that uncommon... Though maybe you were speaking to a more general case? For example, decoding an obfuscated C file.

I've seen this trotted out time and time again, and at least in this simplified form it's a red herring. If you're going to mutate the argument, it doesn't make sense to give it a default value. If you're going to return a modified form of the input you need to make a copy of it. Doing both is simply absurd.

Disagree. Would say it's a decent violation of expectations for the same instance to be passed into every invocation. Of course, the counterargument is 'know your tools,' which I'm partial to, but the fact that this pops up is an indication it is counterintuitive.

I actually agree it's counterintuitive. But this particular example makes no sense, nobody should be writing real-world code that looks like this in the first place. Either modify the original or return a copy, don't try to do both.
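Something like this split is what I'd expect in real code, as a Python 3 sketch. Both function names are hypothetical, and the immutable `()` default is just one way to avoid the shared-list problem.

```python
def append_one_in_place(items):
    """Mutates the caller's list and returns None, like list.append."""
    items.append(1)

def with_one_appended(items=()):
    """Leaves the input alone and returns a new list."""
    return list(items) + [1]

data = [0]
append_one_in_place(data)       # data is now [0, 1]
copy = with_one_appended(data)  # [0, 1, 1]; data is unchanged
fresh = [with_one_appended(), with_one_appended()]  # [[1], [1]]
```

The mutating version has no default because there would be nothing for the caller to observe, and the copying version can safely take an immutable default.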

Wow, that is really ugly semantics. Here are some notes of mine on how hard R works to avoid exposing this sort of aliasing/mutability issue to the user: http://www.win-vector.com/blog/2014/04/you-dont-need-to-unde...

Yeah, I really don't understand why this is just assumed to be a common 'gotcha' to be recognized and avoided by every competent Python programmer. What exactly does the Python spec specify as the desired behavior here? If you have this 'broken stair' that everyone should just know to step over, shouldn't somebody actually fix the stair!?

I know python is not unique in having warts like this, but it's pretty b.s. in general that unexpected behavior is just thought to be okay, especially in a language meant to be very accessible, and most especially since it's being used as a perfectly valid metric for disqualifying new python programmers from employment.

It's not broken when you understand that functions are objects and default parameters are just members of those objects. Each time the function is executed you get local vars that point to these object members. If one is a mutable type, any changes you make to it will then obviously persist.

It leaves me wondering, are the parameters also scoped to the class (seeing as they're declared at the same time)? Wouldn't this cause an issue with concurrent access to the function?

In Python there's no such thing, because of the GIL. Maybe in Jython.

At what level would you test an interviewee with this kind of question: Python guru, Python expert, Python ninja, Python rockstar, or merely "is familiar with Python"? Your example is a very common gotcha that has been covered ad nauseam, but IMO it's still the kind of bug that would be caught immediately in code review and is very easily fixed.

I thought we were past using trick questions like this anyways! Since, you know, it would be easy for an experienced yet anxious programmer to get tripped up on this, but someone who just browsed "python interview questions 101" to breeze on through. Also it selects against experienced multi-language developers, since language-specific quirks like this are not generally useful information to keep front-loaded, but are trivial to become re-familiar with in a work environment, or even gasp learn for the first time from a co-worker or helpful article.

If the industry as a whole cared about evidence-based, non-superstitious, non-monoculture-reinforcing hiring practices, we'd realize that tripping people up and judging programming capability based on minutiae is as unfair as it is self-defeating.

I don't use this as a trick question. I ask them to describe mutable vs immutable objects and gotchas. Then I write this function and ask them to describe in excruciating detail what it does, why, and how.

It is simply an easy way to gauge a candidate's proficiency with the language. It also helps if they know that this is a problem. You'd be shocked to know a lot of people on the market for jobs writing Python don't get this question correct, but the smart ones often do when talking through it even if they didn't originally.

I think the idea is to see whether the interviewee is a kind of person who always googles and reads on "gotchas of language X" whenever he/she learns X.
