Hacker News new | past | comments | ask | show | jobs | submit login
A few things to remember while coding in Python (satyajit.ranjeev.in)
270 points by satyajitranjeev on May 19, 2012 | hide | past | favorite | 142 comments



It's worth explaining why mutable defaults are bad.

The problem with mutable defaults is that they are evaluated once only when the function is defined. Each time the function is called you'll be using the same mutable variable that was created during function definition.


I came here looking for an explanation for this, so thanks.

To the curious - SO has an explanation of why Python was designed like this, which I found interesting: http://stackoverflow.com/questions/1132941/least-astonishmen...


I don't buy this explanation.

Actually, this is not a design flaw, and it is not because of internals, or performance. It comes simply from the fact that functions in Python are first-class objects, and not only a piece of code.

Why in Common Lisp defaults behave the way one would expect, then? Functions are also first class, but defaults are evaluated at every call.


I don't know CL, but in python function definitions can be executed multiple times... at the top level this happens at module import, so it ends up being only once. But in nested definitions, e.g.

   def foo():
      def bar():
         ...
      return bar
   foo() == foo() # false
Two different function objects are created. If, in the above examle, bar took a pram thelist=[], each call to foo would produce a bar function with a different list instance for thelist. The default values can be read as expressions passed to the function object constructor, rather than a bit of code to be evaluated each function run.

I don't know how CL works in this regard, nor do I know which is better or worse. I think the explanation linked did a terrible job conflating first class functions with execution and runtime models. Some of the answers below it explain better tho. :)


The fact that definition can be evaluated many times is not really relevant here. What is important is how one specifies language's semantic -- for instance, Common Lisp: The Language, 2nd edition book (I don't own ANSI standard) says:

When the function represented by the lambda expression is applied to arguments, the arguments and parameters are processed in order from left to right. (...) If optional parameters are specified, then each one is processed as follows. If any unprocessed arguments remain, then the parameter variable var is bound to the next remaining arguments, just as for required parameter. If no arguments remain, however, then the initform part of the parameter specifier is evaluated, and the parameter variable is bound to the resulting value (...).

The CLTL2 specifies that the form representing the default value of optional parameter shall be evaluated every time the parameter is not provided.


> It's worth explaining why mutable defaults are bad

They can also be good. Here's an example from the Reddit discussion, showing how a mutable default can be used to very neatly and cleanly add memorization to a function:

   def fib(n, m={}):
      if n not in m:
         m[n] = 1 if n < 2 else fib(n-1) + fib(n-2)
      return m[n]


That's an unpythonic hack. It is possible to have an explicit static variable using function attributes; another way is to use a proper memoization decorator which factors this out.


Cleverness like this makes you feel warm and fuzzy inside right up to the point where someone decides to actually pass that second argument to your function.


Well in Python 3 you can warn them:

    def fib(n, m:"donotusethisparameter"={}):


Is that clean and neat, or weird and inscrutable? Will it be clear to /anyone/ reading that code what it's doing?


I'm a Python newbie, and that code was immediately clear and obvious to me when I read it.

I won't say it would be clear anyone that reads it, because we live in a world where people who claim to be programmers can't do fizz buzz.


I'm a Python veteran, and, if you do it like this, I'll shoot you.

Add a @memoize decorator and do it there, you need to always be as obvious as possible. Compare:

    @memoize
    def fibonacci(n): pass
    
    def fibonacci(n, memory=[]): pass
You don't even need documentation for the first example.


Starting from Python 3.2 there is a builtin decorator: http://docs.python.org/dev/library/functools.html#functools....


That's fantastic, thanks for the link.


I agree with your sceptisism. This is misusing a language feature, a better approach is using a decorator.


I blogged about this, and other uses for mutable default arguments (with tongue somewhat in cheek) a few weeks ago: http://inglesp.github.com/2012/03/24/mutable-default-argumen...


This also explains when mutable defaults are not a problem: when they are not mutated in the function's body. There's nothing wrong with this:

  def f(seq=[]):
    for x in seq:
      # do something


Yes but no. In two month, when the function has grown, the next coder may not notice the issue and start mutating the default in the function body. Then you have a hidden killer bug.

Pass all code under pylint scrutiny, comply to its complains or adjust its rules, do it early. That is the recommendation I wish all devs could read.


That seems like decidedly unexpected behaviour and makes default params far less useful.


It's a mistake. Default values for optional parameters don't act that way in any other language I know of, including Common Lisp, in which functions are also firstclass.


It's most certainly not a mistake; Python 3 would probably have fixed it, if it were. It is an (admittedly, strange) side effect of the way 'def' works.


But 'def' doesn't have to work that way. Consider, in CL:

  > (defvar *fn*
      (let ((x 3))
        (lambda (&optional (y (list nil x)))
          (push 7 (car y))   ; modifies the list
          y)))
  *FN*
  > (funcall *fn*)
  ((7) 3)
  > (funcall *fn*)
  ((7) 3)
From this example you can see two things. First, the binding of 'x' is closed over when the lambda expression is evaluated. And second, the expression that provides the default value of 'y' is evaluated every time the function is called.

There's no fundamental reason it couldn't have worked that way in Python. (I understand that changing the language so it worked that way now would likely break some code.)

EDIT: fixed formatting.


My lisp is a bit rusty, but it looks like what you're doing there is returning a function which gets redefined every time you reuse the function.

The equivalent Python would be something like this:

    def function():
        x = 3
        def internal(x, foo=[]):
            foo.append([7])
            foo.append(x)
            return foo
        return internal(x)
        
    print function()
    print function()
Which does what you would expect:

    [[7], 3]
    [[7], 3]


No. In my example the function is created only once, and called twice.


What's that lambda thingo in the middle then? Pretty sure that's another function, redefined every time your function is called.


Yes, the lambda creates the function. Note that defvar does not. So there is still only one function being defined here.


Ok, I see now.

You've still got that &optional argument though. I don't see a huge amount of difference from a semantic point of view between that and the Python version though (ie. if x == None: ...).


No, the lambda expression creates the function, which is returned as the value of the let block; defvar just binds the function to a name so we can use it multiple times.


This translates into

    fn = lambda y=[y]: y.push(7); return y
if you accept the ; to separate statements, as the lambda in python is syntactically only allowed to contain one statement.

(The introduction of the variable x into the example is not important for the behavior of default arguments, however, it is important for a separate issue. I've stripped it out here.)



Surely not. The implementation already has to check that the number of provided arguments is valid. The decision of whether to evaluate the default expression can be part of that.

The code that evaluates the default expression doesn't need to be in a separate function, either, so the argument that calling that function is too expensive also doesn't hold water.

I just tried a test in SBCL:

  (defun foo1 (x) x)
  (defun test1 (n) (dotimes (i n) (foo1 (cons nil nil))))
  (time (test1 100000000))
  => 4.4 sec, or 44ns / iteration
  (defun foo2 (&optional (x (cons nil nil))) x)
  (defun test2 (n) (dotimes (i n) (foo2)))
  (time (test2 100000000))
  => 4.1 sec, or 41ns / iteration
The version with the optional parameter is actually slightly faster, which completely blows a hole in the performance argument.

Look, no language is perfect -- not even Common Lisp :-) I think users are better served when design flaws in a language are acknowledged without defensiveness than when bogus justifications are offered.


You're assuming that the function-calling overhead is the same in python as in CL. I don't think that's the case, and it definitely wasn't at the start.

I don't agree that this is a design flaw. As I recall it bit me once as a beginner, and never again in over a decade of using python, and as a lisp hacker you know you don't design a language for beginners. :-)


IMO mutable default arguments should be forbidden just as mutable keys are not accepted in dictionaries. All of the examples which claim to have a use-case for mutable default values can be rewritten with more explicit (thus more pythonic) constructs.


The thing is, None is always a possible value for a parameter so it's actually more robust if functions are written to expect None.

If you say "f(x=[])" (assuming that worked without the actual side effects it has), someone could still say "x(None)" instead of "x()", causing the function to die. Since a robust program isn't able to avoid checking for None, it might as well set defaults there too.

There is another case where this is important; you might want the equivalent of "f(x=expensive_function_to_calculate_useful_default())", and you don't want that function called unless it needs to be. Only the x=None approach allows this to be deferred.


I disagree. If you want your programs to be robust like that, you now have to check for every case where someone might pass in something stupid (dict instead of int, maybe?). Much better to catch errors further up the chain and keep your low level code simple (ie. pass me something other than an iterable and I blow up).

In the expensive case, I'd just calculate it once and store it somewhere (possibly as a lookup dictionary if there are multiple inputs) and access that from within the function.


Well that's true, a function is generally written as if it's been given what it wants. I wouldn't check for other types either.

But None is a result that can happen in situations that would otherwise return exactly the expected type. If "nothingness" can be meaningful (especially in a function that accepts an empty list as a parameter, say), it's nicer if the code just deals with None itself instead of requiring checks for None in all the callers.


It's like whitespace. Everybody has this reaction at first, then they get over it.


Ahem, we have repeatedly intricate issues that are related to this kind of behavior, and I have yet to see an annoying white space bug.

Not to say I think python should be changed on this point. It shouldn't, there are code checkers that warn you on the gotcha, let's use them.


Actually, if you understand the way Python is evaluated (dig in, the core is pretty transparent), it's the only behavior that makes sense in this case. It's also documented as such[1], so it's quite expected. Default parameters are still just as useful for constants, such as:

    def f(x=0, y="foo", z=3.14159):
This, however, is a perfectly Pythonic idiom:

    def f(L=None):
        if L is None:
            L = []
[1]: http://docs.python.org/reference/compound_stmts.html#functio...


While it seems logical when you understand what's going on, from a practical point of view I can't see how this would ever be useful. The tradeoff appears to be that the functions are first class objects. I'm not sure what the benefit here is though. Does having them as first class objects allow some useful idioms? (I'm a ruby dev but I'm genuinely curious to know what this allows you to do)


There are a few use cases for default variables on effbot's site: http://effbot.org/zone/default-values.htm

Basically, sometimes you do want to reuse the mutable between function calls, and in those cases it can save a fair bit of code passing it in repeatedly.


Good coverage. I use it quite often for cache dictionary, it's much simpler API and overall code, than creatng a new class for it. Demo snippet from effbot's site:

  def calculate(a, b, c, memo={}):
    try:
      value = memo[a, b, c] # return already calculated value
    except KeyError:
      value = heavy_calculation(a, b, c)
      memo[a, b, c] = value # update the memo dictionary
      return value


This seems a bit leaky. You're exposing the caching mechanism in the method signature(yeah, ok, in practice it's unlikely to be a problem).


The other option is to create your own cache object and pass that in (and around, if it's a recursive function). Of course, in Python pretty much every cache object follows the dictionary interface anyway, so it doesn't really matter. One of the benefits of duck typing :)


Functions (methods) are first class objects also in Ruby: methods are instances of the class Method, while Procs are a lightweight alternative. Default arguments in Ruby are not mutable in any kind of function object, be it a lambda proc, a regular proc, or a method.

You can use a mutable default argument as an ersatz static variable, e.g. for memoization.


> Default arguments in Ruby are not mutable in any kind of function object

Maybe I misunderstood you, but they are perfectly mutable:

    class A
      attr_accessor :a
      def initialize
        @a = []
      end

      def b x = a
         x << 1
      end
    end
     
    obj = A.new
    # => #<A:0x007fd92b2e9850 @a=[]> 
    obj.b
    # => [1] 
    obj.b
    # => [1, 1] 
    obj.a
    # => [1, 1] 
If you mean "inline default arguments are not mutable", that's not true either. What is true is that the default argument is evaluated when the function is called, not when it is defined:

    a = 0
    # => 0 
    x = lambda {|y = (a + 1)| y }
    # => #<Proc:0x007fd92b1dae50@(irb):33 (lambda)> 
    x[]
    # => 1 
    a = 5
    # => 5 
    x[]
    # => 6


And this is expected behavior. What is unexpected is when this behavior is afforded to new list or dictionary arguments in Python, just as it would be unexpected to say

    ruby> class A; end
     => nil 
    ruby> def foo(bar = A.new); return bar; end
     => nil 
    ruby> foo
     => #<A:0x00000101985060> 
    ruby> foo
     => #<A:0x0000010197a6b0>
and get back the same object each time `foo` is called in Ruby.

I've even seen a major Python library with this bug (I'm sorry, I don't recall which off-hand). It's really surprising behavior for new Python devs.


Good catch, I hadn't thought of the case where the default argument is expressed in terms of another variable.


It's not designed to be useful, it just is. Functions are objects, yes, and the def statement brings them into being and assigns them to the given name in the current scope:

    >>> def f(x):
    ...     return x + 5
    >>> type(f)
    <type 'function'>
    >>> dis.dis(f.func_code)
      2           0 LOAD_FAST                0 (x)
                  3 LOAD_CONST               1 (5)
                  6 BINARY_ADD          
                  7 RETURN_VALUE       
    >>> g = f
    >>> g(10)
    15
    >>> g is f
    True
They're just variables in the current scope. If you're quite clever, your brain is already figuring out that has some interesting implications that some libraries use:

    >>> import socket
    >>> socket.gethostbyname('www.google.com')
    '74.125.71.103'
    >>> socket.gethostbyname = lambda i: '10.0.0.1'
    >>> socket.gethostbyname('www.google.com')
    '10.0.0.1'
I will not pass judgement on monkey patching like this, just pointing out it's doable. I know for a fact Ruby can as well.

Functions just being variables has useful properties when you're doing something like fancy switch/case type things (the readability of this is questionable, but it's cool to look at it, like a Duff's device):

    >>> i = 1
    >>> { str: func1, unicode: func2, int: func3 }.get(type(i), func1)(i)
    in func3
Also consider something like this, which is how decorators work (and they're incredibly useful), sort of like a closure:

    >>> def maker(i):
    ...   def ret(x):
    ...     return x + i
    ...   return ret

    >>> f, g = maker(10), maker(100)
    >>> f(5), g(10)
    (15, 110)
I'd be surprised if Ruby couldn't do everything I just did.


It's not designed to be useful

By useful he means mucking up your program in totally unexpected ways.

He's being nice about it being a silly decision to have it behave that way. The entire post reads more like a list of unexpected things that will bite you in the ass.


Thanking you for showing me the 'dis' module. Dis is going to be fun to play with!!


Or just embrace the tao of the tuple:

    def f(L=()):
        ...


I am but an egg, but isn't this the same but shorter?

def f(L=None): L = L or []


It might work the same depending on your use of it but it's not the same. In that instance L will become a blank list if it is equal to None, zero, or a zero length string. There are many cases where this wouldn't affect anything, but there can also be instances where that will cause you to define L as a blank list when you really wanted to keep L's value. I think it's always better to be explicit and test for the value(s) you expect.


Ah.

These are the little assumptions that keep blowing off my feet. Thanks.


It is these cases that brought the ternary operator to Python:

  def f(x=None): x if x is not None else []


I'm a little confused about this as an assignment to a variable. Is that because of the 'return' omission referred to below? If all function f did was return x, I could see how this works with a 'return' prepended.

But if this expression is a line in a larger function, and is intended to reset the value of argument x if no other value is passed in for it, does this really act as an assigment to x? Because I sort of read this expression as evaluating to some value -- the passed value for x, or a [] -- but does this assign that value to the argument x? Or must it be x = [expression]


  def f(x=None):
        x = x if x is not None else []
        return x
Now it will assign that value back to x. Otherwise it would just evaluate the expression.


Thanks for the additional clarity.


Yes, but why? Actually, it's No, because python culture aims to use one way to do thing, the least surprising one. In this case it's:

  def f(x=None): if x is None: x = []


Because that "if" is a statement whereas the ternary expression is, well, an expression. There are places expressions can be used that statements can't (eg lambdas) and that x or [] won't work (eg when x is False).

That said, people still seem to favor your form as the more Pythonic way. Personally, I think that's just because the ternary expression is relatively new.


you forgot 'return'


Yeah, I guess my typing went into "lambda mode" since it was a one liner.


I originally had "don't do that!" in my comment with your exact code, and edited it out for brevity because I've only seen a couple people do it (and they understood the ramifications, which others have told you). If you're interested in brevity, this is as terse as it gets:

    L = [] if L is None else L


Generally most use this:

    freqs = {}
    for c in "abracadabra":
        try:
            freqs[c] += 1
        except:
            freqs[c] = 1
If this is really the common idiom, I'd say this is a sign that professional programming has yet to fully mature as a field.

Some may say a better solution would be:

    freqs = {}
    for c in "abracadabra":
        freqs[c] = freqs.get(c, 0) + 1
Okay, so I understood immediately what was going on with the 2nd bit of code.

Rather go for the collection type defaultdict

    from collections import defaultdict
    freqs = defaultdict(int)
    for c in "abracadabra":
        freqs[c] += 1

As a non-pythonista, the 3rd bit of code, I had to Google "defaultdict" to figure out. It's only a couple of seconds to Google, and a professional should know this tidbit, but it seems like premature optimization to me. This brings to mind this post:

http://news.ycombinator.com/item?id=3995185

As a programmer, one's most valuable resource is brainpower. Supposedly, a programmer's most important goal is writing clear code. Look around at what goes on in our industry. There's a lot of our most valuable resource spent on showing off our cleverness, not directed towards the clearest code. To me this is like spending money to show one can spend money or playing an instrument to show off dexterity instead of producing gorgeous sounds.

(I think this starts in school and other environments where one is motivated to show off one's coding chops.)

Most of the complexity in our field accrues like litter: a bit here and a bit there. I think it says something about the culture of the folks who live there.


Or just say:

    from collections import Counter
    freqs = Counter("abracadabra")
I was surprised to see that missing, given that Counter was mentioned in the next section.


Ah, Guido's time machine strikes again :)

Here's my (now mostly obsolete) version:

    def count(string):
        counts = {}
        for item in set(string):
            counts[item] = string.count(item)
        return counts
        
    print count("abracadabra")
I had a look into the collections library, and it just uses iterable.iteritems(). I suspect that this might be faster for larger strings with multiple repeating characters, since set() and count() will pass the string directly to C.


I agree with the spirit of your argument.

In fact, I almost came here to write a parallel comment: I'm really not sure that reversing the list 'a' with 'a[::-1]' is better than 'reversed(a)', which usually effectively does the same thing, but whose meaning is much more obvious.

But, while I agree with your general point, in the specific case of 'defaultdict', I differ.

I use 'defaultdict' all the time and I'm glad its there. It feels cleaner than 'freqs.get(c,0)'. I define the default value in one place, and then the interface to my datastructure is simpler; hence as I continue writing, I can spend more of my brainpower in the problem domain.

Its a small detail, but its one less thing to think about when writing a complex algorithm.


I define the default value in one place, and then the interface to my datastructure is simpler; hence as I continue writing, I can spend more of my brainpower in the problem domain.

Actually, this speaks to point: optimization should be for reading, not for writing.


reversed(a) and a[::-1] are not equivalent. The former produces an iterator over the given list (with all the mutability dangers that come with it), while the latter produces a copied list. For plain iteration, you're correct, reversed() is better (similar to how xrange vs. range was back in the day); however, for reversing something and keeping it around, the slice syntax is better.

    >>> reversed([1, 2, 3])
    <listreverseiterator object at 0x107265650>
    >>> [1, 2, 3][::-1]
    [3, 2, 1]
To clarify your point, list(reversed(a)) and a[::-1] are equivalent. It's a slightly subtle point, but extremely important if you're keeping the result of reversed() around for any length of time. If you're just iterating at the moment that you use it, yes, they're effectively equivalent.


list(reversed(a))


Well, yes, but I'd also like to think that code should raise the level of the programmers reading it. You shouldn't avoid language features just because some people don't know about them. That's as ridiculous as the folks who say "Don't use the ?: operator because folks who haven't taken Intro CS 101 may not be familiar with it."

So you spent a couple seconds Googling defaultdict. Great, you now know what a defaultdict is and can use it in your own code. It's useful in a lot of places besides this toy example.

You should avoid gratuitous complexity, where you force the reader to learn something that will never, ever be useful to them again. A great example might be writing your own encryption algorithm, which will be complicated, wrong, under-performant, and totally useless on any other project. If you just use bcrypt (or whatever the recommended best practice is now), then your code works well, and all readers of your code now know about bcrypt and can use it themselves.


Well, yes, but I'd also like to think that code should raise the level of the programmers reading it...You should avoid gratuitous complexity

I have a different set of policies than most programmers, which arises from my observation that our field's priorities are out of whack with the actual cost-benefit.

Our greatest costs involve understanding systems, so our first priority should typically be to produce readable and understandable code.

You shouldn't avoid language features just because some people don't know about them.

One should pick language features to optimize for readability, which is entirely contextual. If your shop has a culture of using ?: to the point where it's like a coding standard then you should keep on doing that.

So long as code can be read and understood, programmers will learn. Better yet, if the culture of a shop is that use of language features and other tools are motivated by contextual cost-benefit, then programmers will learn from this example. As it is, programmers generally are more interested in showing off, having fun, and writing things as easily as possible. It's less common to have a culture of prioritizing reading.


I agree strongly. It amazes me that in the ugly vs. beautiful category, the article uses this as the beautiful:

   return [i/2 for i in nums if not i % 2]
"not i % 2"? That's beautiful? We do a division and ask whether the remainder is not true? What does it mean for a remainder of a division to be not true? Is sqrt(3) untrue? Is 17 not yellow? Is this really the clearest way to say even number?

Yes, after years of working in C, I'm well aware of how C does bools, but that's because C's values are small and fast, not beautiful and clear. In C, there's barely any abstraction to leak--you just manipulate bits and don't worry about mixing your metaphors. But Python has different priorities, which is why I prefer it to C when my users (and I) won't be hurt by the performance difference.

If we're showing off clarity instead of cleverness, wouldn't this be a better way to demonstrate it:

   return [i/2 for i in nums if i%2 == 0]
And I prefer the concept of "clarity" to "beauty" when it comes to code. Beauty is, well, whatever in the eyes of the beholder. Clarity, for me, is the question of how fast code can be read and understood correctly by a given programmer who is familiar (not more) with the language, not necessarily familiar with other languages, and unfamiliar with what the code does.

The faster such a person can skim the code and understand it correctly, the easier it will be to modify and keep free of bugs. If we're so smart, why don't we show it by using our brains to write code that is quicker to read and understand correctly than code written by lesser lights?


I agree, I wondered about that same snippet.

And I tested with the `timeit` module, there's not really much of a performance difference either (although the version with `not` is slightly faster, it's just by less than a percent or so).


I have a different set of policies than most programmers, which arises from my observation that our field's priorities are out of whack with the actual cost-benefit.

What metrics do you use to determine if code is readable or not? What metrics do you use to determine "actual cost-benefit"?


What metrics do you use to determine "actual cost-benefit"?

Well, if one were to take into account hard metrics for every 4 line snippet of code, then I'm not sure enough would get done fast enough. In the context of everyday programming and of the 3 examples I quoted, it's enough to ask yourself questions like: what would a newbie understand? What would an average programmer recognize immediately? If there's a quick obvious answer to either of those questions, and no onerous externalities involved, then that's what you write. (The code being 10X longer or too slow or too likely to contain bugs would be an onerous externality.)

This isn't to say that metrics aren't useful here. The question is how to apply them at a low enough cost. In a large company, perhaps one could A/B test variations on coding standards. In a diverse group of small programming shops with internally consistent coding standards or styles, one might gather metrics on bugs per line of code versus features of coding styles.

Something to think about. It's not as if metrics are commonly used to make these decisions now. Either edicts come from on high, or the local alpha-coder declarers what's best in her/his experience.


Overall, this is a nice post. There are two quibbles though.

1). For the most part, "c = collections.Counter()" is almost always better than "c = defaultdict(int)"

* Counter only supplies missing values rather than automatically inserting them upon lookup.

* The Counter version is much clearer about what it is trying to do. The defaultdict version is cryptic to the uninitiated (understanding it entails knowing that it has a __missing__ method to insert values computed by a factory function and that int() with no arguments returns zero).

* The Counter version provides helpful methods such as "most_common(n)".

2). An ellipsis in Python is normally used in a much different way than shown in the article (it's used for an extended slice notation in NumPy).


Unfortunately, Counter is not available before 2.7, so for many people it's a bit too early to require it.


There is a Python2.5 backport of collections.Counter() at http://code.activestate.com/recipes/576611/


This. On reading this post, I ran to my computer to replace defaultdict(int) with Counter in some of my code, only to find that I couldn't use it yet.


Two years old is a long enough.


OSX 10.6.8 still has Python 2.6 as the system Python.


    halve_evens_only = lambda nums: map(lambda i: i/2, filter(lambda i: not i%2, nums))
I still find it rather silly that python doesn't supper a nice list map/filter; it could be so much nicer

    nums.filter(lambda i: i%2 == 0).map(lambda i: i/2)
If they did, even including the annoyingly long-to-type "lambda". List comprehensions are cool and all, but do not really scale visually (i.e. get rather messy) when you have more than one map and filter step.

These arbitrary break-away from OO method style into module+data style (len(L) is another!) are one of the things I hate most about Python. There are some reasons for doing so, but a pure-OO (like Scala) or pure-method+data (like F#) would have saved me many a runtime error.


Python offers filtering expressions in its generator syntax. I find Python's "lambda" hurts readability for most uses, which pains me as a Lisp geek. Your example, as a generator:

    halve_evens_only = (i / 2) for i in nums if (i%2 == 0)
The parens aren't necessary, but they help readability for people who aren't used to the generator order of operations. (Again, Lisp geek, more parens means more readable in my fractured mind.)


The parens are required. What you've written raises a SyntaxError.

You can omit them in the generator expression if it's being passed directly as the only parameter to a function:

  halve_evens_only = list(i/2 for i in nums if i % 2 != 0)


Rust typeclasses solve this problem; you can create a suite of methods with syntax like this:

    impl methods<T> for [T] {
        fn filter(f : fn(T)->bool) -> [T] { ... }
        fn map<U>(f : fn(T)->U) -> [U] { ... }
    }
And then you can call it with syntax like:

    println(#fmt("%?", [ 1, 2, 3 ].map { |x| x + 3 }));
    // prints "[ 4, 5, 6 ]"
The methods are properly scoped, so code that isn't in your module needs to import your methods to use them. That way, you avoid introducing strange action-at-a-distance in your code.


  halve_evens_only = map (/2) . filter even


I think only familiarity with Javascript makes that "nice". One could argue that this

(lambda i: i%2 == 0).filter((lambda i: i/2).map(nums))

makes (marginally) more sense. But I like filter and lambda alright as they are. I agree Python's inconsistency in this is a bit unfortunate, but I haven't had much of a problem with the runtime errors you mention.


the D language supports that kind of syntax

nums.filter!(i => i%2 == 0).map!(i => i/2);


To elaborate, this is because of D's Uniform Function Call Syntax (http://www.drdobbs.com/blogs/cpp/232700394) - any function that takes an object as first argument can be called as though it were a method of that object; "map" and "filter" are actually functions in the std.algorithm module. It's a pretty neat trick and although in principle it could make things harder to reason about I haven't had any problems with it so far.


so does Scala

    stuff.filter(_ % 2 == 0).map(_ / 2)
and C#, demonstrating that this isn't some obscure feature that only language geeks care about:

    Stuff.Where(x => x % 2 == 0).Select(x => x / 2)
My point isn't that this sort of syntax is new and novel, it's just that in Python it's annoyingly inconsistent. There are reasons where you would want to use type-class style modules to structure your code in a certain way, but I do not think python's map() filter() reduce() and len() qualify as these cases


Another handy one I saw recently:

  varname, = [x for x in l if predicate_with_single_truth_value(x)]
The comma after varname is an implicit assert that the list comprehension only contains one element.


Trailing commas are really easy to miss. When reading this line of code, I did not notice it immediately; I originally assumed that varname was being assigned a list.

This sort of code would be very confusing when I'm just quickly reading through a procedure trying to find the potential bug.


Indeed. I prefer the following variant which is more explicit and thus (IMHO) more in the spirit of Python:

   (varname,) = [x for x in l if predicate_with_single_truth_value(x)]


Agreed. It could be written much more clearly in my opinion like this:

    [varname] = [x for x in l if predicate_with_single_truth_value(x)]


I had to check that on the REPL. I'm surprised that even works and I can't think of a good reason why should list syntax be allowed as a lvalue, in addition to tuple syntax.


It's called destructuring assignment. It's been around for a while.

http://dunsmor.com/lisp/onlisp/onlisp_22.html


I mean why would you want to allow both list and tuple syntax for exactly the same semantics, when either of them would be enough.


I just wish we had destructing assignment for other types as well. And maybe a proper pattern matching!


Good idea!


You could also use the ,= operator, of course:

varname ,= [...]


It appears to me that there is actually no such operator in Python; cf. http://docs.python.org/reference/simple_stmts.html#augmented...

Superficially it looks like an operator, but I suspect that's merely because of whitespace freedom; i.e., a, = [0] is equivalent to a,=[0] and a ,= [0].


I assume it was a joke; a pop culture reference to this stack overflow question:

http://stackoverflow.com/questions/1642028/what-is-the-name-...


  Sure it is! And in Python 3 there's the ,_*= operator, similar to lisp's car:

  varname ,_*= [1, 2, 3] # varname == 1


HN isn't the place for facetiousness.


Apparently it isn't the place for humour either.


Trailing commas are a bit subtle and readers of the code may think it was a mistake. (We spend more time reading code later than writing it in the first place.)

Even Python removed one of its most prominent cases of this, the print command. In Python 2.x you could have a trailing comma after a print to omit the new-line but Python 3's print() function requires print('something', end=' ') to be more explicit about it.


A better way: varname = (x for x in l if predicate_with_single_truth_value(x)).next()


That avoids the construction of the list, but doesn't check that the sequence only contains one value, which was the point of the example.


Till now I had never seen Ellipsis. It seems very similar to slice notation [:]. Found an StackOverflow comment [1] that has more details about usage of Ellipsis in slicing higher dimensional array numpy.

[1] http://stackoverflow.com/questions/118370/how-do-you-use-the...


Here's a rare example of obfuscated python, deceptively called "Python's Ellipsis Explained": http://blog.brush.co.nz/2009/05/ellipsis/


The only way in which Ellipsis can be useful is to save one character by using it instead of pass in Python3, like:

    def foo():
        ...
It is really highly unusual and I wouldn't recommend the practice shown in the blog post at all as this is not a common pattern.


There is a builtin function called `reversed`. You'd better remember that than the "useful" [::-1] idiom.

The recommendation on `iteritems` had better be generalized to include `iterkeys`, `itervalues`, and other opportunities for using iterators rather than building lists. A note that the 'iter...' versions are removed in Python 3 (because iterator behaviour becomes the default) would be appropriate here.

In relation to collections, itertools is a great module to get familiarized with. I import * from this module. I consider functions there as if they were builtins.

"Conditional assignment" is a weak and misleading name. "Conditional expressions" is more descriptive. There is no assignment in

  print "yes" if some_condition else "no"
as the article acknowledges later.

Using Ellipsis for getting all items is a violation of the Only One Way To Do It principle. The standard notation is [:].

I commend the good intentions of the writer, but I'm surprised that this article got 144 upvotes in HN.


list(reversed(xs)) to make a copy rather than a view, though


>There is a builtin function called `reversed`. You'd better remember that than the "useful" [::-1] idiom.

Reversed produces an iterator and not a copy with all the dangers of mutable semantics. The [::-1] syntax is not equivalent as it returns a copy of the list as if it were reversed, changing up somewhat what it's doing.

You'd better remember that you should know what you're talking about before you start dictating to other people how they should write their code.


  Generally, most use this:
    freqs = {}
    for c in "abracadabra":
        try:
            freqs[c] += 1
        except:
            freqs[c] = 1
Who does this?!


Not only is it overly-verbose and ultimately unnecessary (as shown by the alternative that follows in the article) but this mechanism is actually broken. For instance if "freqs" were data given to you from somewhere else and "freqs[c]" happened to have a type that cannot legally have 1 added to it, you'd want to see this error. The "except:" however will absorb any and all exception types and respond to all of them by setting the value to 1.


Also, raising exceptions used to be quite slow, which could hurt for a sparse counting set. Don't know whether that's still the case.


Also Exceptions should be used in "exceptional" circumstances and not as part of normal flow.


One exception (teehee!) to the rule: file operations and other things where atomicity matters. Example code:

  if not os.path.exists("foo"):
      os.mkdir("foo")
That introduces a race condition. If foo does not exist on the first line but is created by something else on the second line then this will raise an exception. The proper code is:

  import errno
  try:
      os.mkdir("foo")
  except OSError as exc:
      if exc.errno != errno.EEXIST:
          raise
That doesn't have the race condition.

(Yes, that's wordy. Eventually they plan to add to Python 3 a fancier exception hierarchy described at http://www.python.org/dev/peps/pep-3151/ , which lets you filter at a more fine-grained level, as below.)

  try:
      os.mkdir("foo")
  except FileExistsError:
      pass


It's good to point out the race condition.

To solve this particular problem in future code more compactly however, note that Python 3.2 (finally) adds an "exist_ok" Boolean keyword parameter to the multi-directory variant, os.makedirs(). In other words, calling os.makedirs("mydir", exist_ok=True) will silently ignore existing directories and only raise if other errors occur.


Thank you for pointing that out, I've just changed some code :)


Immature programmers. It's analogous to musicians who play everything as fast and ornately as possible.


Beat me to my equivalent answer: "java programmers."


Not a mature answer.


    freqs = {}
    for c in "abracadabra":
        if c in freqs:
            freqs[c] += 1
        else:
            freqs[c] = 1


My only complaint with this list is conditional assignments. They make it harder for me to understand the code. I am used to the if being at the start and when it isn't it takes more time for me to parse what the real meaning of the line is.


One of the reasons Python stopped being my favorite programming language was the appearance of what seemed to me to be Perl envy. I hadn't seen conditional assignments before, but they don't exactly make me regret that choice.


Probably best not to use iteritems, in python 3, it won't be there, "the dict.iterkeys(), dict.iteritems() and dict.itervalues() methods are no longer supported."


The iter* methods appear in Python 3, just without the iter prefix. Therefore it is good to use them in Python 2 because that makes it possible to translate the code automatically with the 2to3 script. Otherwise possibly superfluous list conversion might get added, e.g., list(d.keys()).


.keys(), .views() and .items() in Python3 return a memoryview which happen to be iterators but do far more.


Because the regularly-named ones got better.


Satyajit Ranjeev:

Since this is posted by the you as the author, I'll comment here: some JavaScript is running on page load that blanks the entire page in Safari and iCab on iPad, making the page turn white except for the bullet symbols, and making the article unreadable.

I was able to read it only by disabling JavaScript or parsing it with Readability. (Both disable my ability to comment about this bug there.)


There is one slight mistake there - saying that [::-1] is a special case. An empty value in a slice implies the beginning or the end, and when the stride is negative, the beginning is the last index, while the end is 0 - making [::-2] for example start from the last element and go down in jumps of two.


You should think of adding the use of `with` context managers.


isn't items more efficient than iteritems when you are going to go over all the items for sure on a short list?


No, never. The items method generates an intermediate list.


PHP: Facebook.

Python: ?

Ruby: ?


Python: Reddit.

Ruby: Twitter.


Python: Youtube


You will find the most critical components of Facebook are not written in PHP: https://github.com/facebook




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: