Python idioms I wish I'd learned earlier (prooffreaderplus.blogspot.com)
418 points by signa11 on Nov 27, 2014 | 169 comments

I think the example in #4 misses the point of using a Counter. He could have done the very same for-loop business if mycounter was a defaultdict(int).

The nice thing about a Counter is that it will take a collection of things and... count them:

    >>> from random import randrange
    >>> from collections import Counter
    >>> mycounter = Counter(randrange(10) for _ in range(100))
    >>> mycounter
    Counter({1: 15, 5: 14, 3: 11, 4: 11, 6: 11, 7: 11, 9: 8, 8: 7, 0: 6, 2: 6})
Docs: https://docs.python.org/2/library/collections.html#counter-o...

I recently had a use case for this where I needed a naively created bag of words along with the frequency of words. Having Counter made this extremely easy as I could simply split the string on whitespaces and pass the result to a Counter object. Another useful feature is that you can take the intersection of two Counter objects. It's a really nice data structure to have!
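A minimal sketch of that bag-of-words use, assuming two made-up sample strings (`text_a` and `text_b` are placeholders, not data from the thread). It shows both the whitespace split and the intersection of two Counter objects:

```python
from collections import Counter

# Hypothetical sample texts, only for illustration.
text_a = "the quick brown fox jumps over the lazy dog the end"
text_b = "the lazy cat and the quick dog"

# str.split() with no argument splits on any run of whitespace.
bag_a = Counter(text_a.split())
bag_b = Counter(text_b.split())

# & keeps, for each word, the smaller of the two counts.
common = bag_a & bag_b

print(bag_a.most_common(1))
print(common)
```

Words that appear in only one of the two bags drop out of the intersection entirely.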

Python noob here. What does the _ mean in "...for _ in range(100))"?

To add to specifics:

_ in python is commonly used as a variable name for values you want to throw away. For example, let's say you have a log file split on newlines, with records like this:

    logline = " localhost.example.com GET /some/url 404 12413"
You want to get all the URLs that are 404s, but you don't care about who requested them, etc. You could do this:

    _, _, _, url, returncode, _ = logline.split(' ')
There's no special behaviour for _ in this case; in fact, normally in the interactive interpreter it's used to store the result of the last evaluated line, like so:

    >>> SomeModel.objects.all()
    [SomeModel(…), SomeModel(…), SomeModel(…), …]
    >>> d = _
    >>> print d
    [SomeModel(…), SomeModel(…), SomeModel(…), …]
Which I think is basically the same behaviour; you run some code, you don't assign it, so the interpreter dumps it into _ and goes on about its day.

Normally you would place a variable there, such as x or y. But in this case, the variable doesn't matter: you aren't using it in the function call. Naming it _ tells the reader that the loop variable is deliberately unused.

It's just a name of a variable. There's nothing special about it (in this context), but it's conventionally used when you don't care about the value assigned to it.

"Because I was so used to statically typed languages (where this idiom would be ambiguous), it never occurred to me to put two operators in the same expression. In many languages, 4 > 3 > 2 would return as False, because (4 > 3) would be evaluated as a boolean, and then True > 2 would be evaluated as False."

The second half of this is correct, but it has nothing to do with whether the language is statically or dynamically typed. It's a tweak to the parser, mostly.

It's not just a tweak to the parser, and it does have to do with the type system, but you're right that it's not about static typing.

The issue is that there are languages (like C) where typing is static but weak, so e.g. booleans are also integers and can have integer operations like '>' applied to them. In other words, the problem is that in C True == 1 and 1 > 2 is a valid expression. In Python, which has strong(er) types, this expression can have an unambiguous meaning if you want to add the feature to your parser.

You can implement it entirely in the parser if you can avoid name capture - it may or may not be implemented entirely as a tweak to the parser in practice, but it's fundamentally a syntactic thing.

Your discussion of types here is all wrong - it's true that C treats booleans as if they were integers, but Python does, too:

    >>> (3 > 4) < 2
    True
    >>> 3 > 4 < 2
    False
    >>> 3 > (4 < 2)
    True
It has nothing to do with types.

I think the comparison is more readable as (here with the >>> of the python shell):

    >>> 0 < x < 1
as opposed to

    >>> 1 > x > 0
Following the number line and placing x there is nicer IMHO. Other than that, very nice trick.

In terms of coding style, I agree with you that following the number line is usually going to be clearest. I think there might be some situations where descending is better than ascending, but certainly both are radically better than what I did above (low > high < lower).

As an example for what I was specifically trying to show here, though, that doesn't let me distinguish things quite as clearly.

I believe that (a < x < b) gives the same value as at least one of (a < x) and (x < b) for any value of x.

    x < a:
        a < x gives false
        a < x < b gives false

    a < x < b:
        everything gives true

    b < x:
        x < b gives false
        a < x < b gives false
So there's no way to get both of the parenthesized versions to disagree with the unparenthesized version in a single example.

Oh fun. That is not a nice associativity weirdness to have to deal with.

In fact, Python just has a non-binary AST with regard to operators, i.e. the expression "a < b < c" is not parsed as CompOp([CompOp([a, b], <), c], <) but instead as CompOp([a, b, c], [<, <]). The same holds by the way for Boolean operators, "a and b and c" is represented as BoolOp([a, b, c], [and, and]). See https://docs.python.org/3/library/ast.html#abstract-grammar for details.
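One way to look at this yourself is the standard `ast` module. The sketch below parses both expressions and dumps the resulting nodes. Note that in current CPython the comparison node is called `Compare` (fields `left`, `ops`, `comparators`) and the Boolean node is `BoolOp` (fields `op`, `values`), so the field layout differs slightly from the shorthand used in the comment above:

```python
import ast

# mode="eval" parses a single expression; .body is the expression node itself.
compare = ast.parse("a < b < c", mode="eval").body
boolop = ast.parse("a and b and c", mode="eval").body

# One Compare node holding two operators, not two nested comparisons.
print(ast.dump(compare))

# One BoolOp node holding three operands.
print(ast.dump(boolop))
```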

Yes, you really can just tweak the parser; just (a) don't have a rule allowing comparisons to appear as children of other comparisons, and (b) add a rule that permits chains of comparisons. Types have zilch to do with this.

It's 100% a tweak to the parser.

Interestingly, you can do it with types and no parser help: https://gist.github.com/dlthomas/d18d475068f23584d473#file-c...

Obviously, that's not what's going on in Python...

It could be smoother and more generic if Haskell permitted specifying defaulting for non-numeric types. But it does support things like:

    *Main> 2 < 3 < 4 < 4 :: Bool

    *Main> 2 < 3 < 4 <= 4 :: Bool

    *Main> 2 <= 3 < 4 <= 4 :: Bool

    *Main> 2 <= 3 > 4 <= 4 :: Bool

    *Main> 2 <= 6 > 4 <= 4 :: Bool

You are absolutely correct that it could be implemented as 100% a tweak to the parser.

Assuming filmor is correct, in practice as it happens to be implemented in the Python code base it is not 100% a tweak to the parser - changing the structure of the produced AST means tweaks down the line. I think the change to the parser is still the most meaningful piece, though, even there.

Booleans are integers in Python too:

    >>> True + 0
    1


It would be more correct to say that the "bool" type implements an "__int__" method for conversion to an integer, but the types are actually distinct:

    >>> type(True)
    <type 'bool'>
    >>> type(1)
    <type 'int'>
Edit: oops, I'm wrong. "bool" also inherits from "int":
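The snippet that originally followed appears to be missing from this copy of the thread. A small check along these lines (written in Python 3 syntax, and not the original poster's code) shows the relationship:

```python
# bool is a subclass of int, so every bool is also an int.
print(issubclass(bool, int))
print(isinstance(True, int))

# The method resolution order makes the inheritance chain explicit.
print(bool.__mro__)

# Arithmetic on a bool falls through to int and yields a plain int.
total = True + 0
print(total, type(total))
```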


If you were underwhelmed by this blog post have a look at:

Transforming code into Beautiful, Idiomatic Python by Raymond Hettinger at PyCon 2013

https://speakerdeck.com/pyconslides/transforming-code-into-b... and https://www.youtube.com/watch?v=OSGv2VnC0go&noredirect=1

Topics include: 'looping' with iterators to avoid creating new lists, dictionaries, named tuples and more

And, if possible, see every video by Raymond Hettinger. He is a great teacher.

Sorry for the offtopic but why did you add `&noredirect=1` to the end of that youtube url?

sorry, was only an accident.

Sincerely, Transforming Code into Beautiful, Idiomatic Python – by Raymond Hettinger... http://youtu.be/OSGv2VnC0go

I was lucky to watch this video while first learning the language. Every beginner (coming from another language) should watch this to understand the idioms of Python.

Ya, me too. It's also a very funny talk. :)

One of my favorites:

    >>> print "* "* 50
to quickly print a separator on my terminal :)

Previous discussion on python idioms from 300 days ago: https://news.ycombinator.com/item?id=7151433

That's cute, but the result of a bad design decision.

Python overloads "+" as concatenate for strings. This also applies to lists. So

    [1,2,3] + [4,5,6]   yields  [1,2,3,4,5,6]
This is cute, but not what you want for numerical work.

Then, viewing multiplication as repeated addition, Python gives us

    [1,2,3]*4  yields [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]
This is rarely what was wanted.

Then there's numpy, which has its own array type, needed because Python's array math is slow. Numpy arrays have different semantics - add and multiply perform the usual numeric operations. You can mix numpy arrays and built-in lists with "+" and "*". Mixed expressions are evaluated using numpy's semantics. Since Python doesn't have type checking, it's possible to pass the wrong kind of numeric array to a function and have it treated with the wrong semantics.

Moral: when designing your language, using "+" for concatenation is tempting, but generalizing that concept comes back to bite you.

You went from "not what you want for numerical work" to "generalizing that concept comes back to bite you". I don't think you can make that step.

I do non-numeric scientific computing. (Meaning, I touch numpy about once a year.) My own code does things like

    [0] * N  # could be replaced with something like numpy.zeros()
    [QUERIES_FPS] * 501
    to_trans = [None]*256  # constructing a 256 byte translation table
        # (I could use zeros(), but used None to force an exception
        # if I missed a cell)
                     [(simple.title, simple.record)]*2)
        # I parse a string containing two records and test I should
        # be able to get the (id, record) for each one
    ["--queries", SIMPLE_FPS, "-k", "3", "--threshold",
       "0.8"] + self.extra_args  # Construct a command-line from 2 lists
These idioms also exist in the standard library, like:

    webbrowser.py:  cmdline = [self.name] + [arg.replace("%s", url)
    sre_compile.py: table = [-1] + ([0]*len(prefix))
    ntpath.py: rel_list = [pardir] * (len(start_list)-i) + path_list[i:]
    traceback.py: list = ['Traceback (most recent call last):\n']
                  list = list + format_tb(tb, limit)
So while I think you are correct, in that "+" causes confusion across multiple domains with different meanings for "+", I think the moral is that operator overloading is intrinsically confusing and should be avoided for all but the clearest of use cases.

There is no best generalization to "+". For example, if you pick the vector math meaning, then regular Python would have that:

    ["A", "B"] + ["C", "D"] == ["AC", "BD"]
which has its own logic, but is likely not what most people who haven't done vector math expect.

You think it's a "bad" design decision because you think that a Python list should represent a vector of numbers (not even an array - a mathematical vector).

But a list is much more than that - conceptually it's any ordered collection of things, and not necessarily even the same type of thing. Overloading `+` to mean concatenation and `*` to mean replication means that the operators can work with any kind of list, not just lists that are mathematical vectors.

If you do want a mathematical vector, you should use a numpy array - not only are you making it clear that you have a vector of numbers, but your operations will be more efficient (because using a numpy array guarantees that the elements in it are all of the same type, so you don't have to do type dispatching on every element).

Then what would you have

    [1,2,'q',[1,('a',2)]] + 4 
yield? The reason why numpy lets you do math operation on each element in an array is because you can safely assume that each element is a number. You can assume absolutely nothing about the types of the elements in a list.

"TypeError: cannot add 'str' and 'int' objects."

Just because you can define semantics for nonsense doesn't mean you should.

I'll modify the example slightly to something which doesn't have a type error:

    [1,2,'q',[1,('a',2)]] * 4
With element-by-element operations, that would be

    [1 * 4,2 * 4,'q' * 4,[1,('a',2)] * 4]

    [4, 8, 'qqqq', [1 * 4, ('a', 2) * 4]]
applying that again, assuming that tuple * scalar is also applied element-wise gives:

    [4, 8, 'qqqq', [4, ('a' * 4, 2 * 4)]]
and ends up with

    [4, 8, 'qqqq', [4, ('aaaa', 8)]]
I can't think of any case where that's meaningful.

Also, what should this do:

    x = [1]
    x.append(x)
    print(x + x)
    print(x * 4)
? Currently these print:

    [1, [1, [...]], 1, [1, [...]]]
    [1, [1, [...]], 1, [1, [...]], 1, [1, [...]], 1, [1, [...]]]
because the print function knows how to handle recursive definitions. Do all of the element-wise operations need to handle cyclical cases like this? I think numpy can get away with not worrying about this precisely because, as wodenokoto pointed out, it can assume a flat structure.

You apparently want lst * intval to be equivalent to map(lambda n: n * intval, lst) or [n * intval for n in lst]. Since Python has a convenient built-in and even syntactic sugar for doing what you want, why not let the operator overloading handle a different case?

(also, your issue is not with "nonsense semantics", it's with "my idea of how this operator should've been overloaded is different from their idea", and perhaps is even a beef with the idea of operator overloading in general, though if you like numpy I think you wouldn't like losing operator overloading)

This is because you're assuming 'array' is supposed to mean 'vector' (as in the linear algebraic vector). It isn't, and it's a list -- it's meant to be a container. In this case, add meaning concatenate and multiplication meaning self-concatenate multiple times makes sense.

Even worse IMHO is the semantics of strings being implicitly iterable. Often it ends up that you're intending to iterate over something

    for item in orders:
        do_something_with(item)
So if `orders` is usually `[Order(...), Order(...), ...]` but, due to a bug elsewhere, is sometimes "some string", you get a mysterious exception somewhere down in `do_something_with` or one of its callees at run time, all because the above snippet calls do_something_with('s'), do_something_with('o'), etc.

In my experience, this behavior is so seldom what is wanted that it should be removable (with a from __future__ style declaration) or just off by default.
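Short of a language-level switch, one common workaround is an explicit guard at the boundary. This is only a sketch: `process_orders` and its `handle` parameter are hypothetical names invented for the example, not anything from the thread or a library:

```python
def process_orders(orders, handle):
    """Call handle(item) for each order, rejecting a bare string up front.

    Both the function and the parameter names are hypothetical.
    """
    # A str (or bytes) is iterable, so without this check the loop below
    # would silently hand single characters to handle().
    if isinstance(orders, (str, bytes)):
        raise TypeError("expected a collection of orders, got %r" % (orders,))
    for item in orders:
        handle(item)


seen = []
process_orders(["order-1", "order-2"], seen.append)
print(seen)
```

The failure then happens where the wrong value enters, rather than somewhere deep in a callee.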

I use "for c in s", to read characters in a string, pretty often. Here's an example from Python3.2's quopri.py:

    def unhex(s):
        """Get the integer value of a hexadecimal number."""
        bits = 0
        for c in s:
            c = bytes((c,))
            if b'0' <= c <= b'9':
                i = ord('0')
            elif b'a' <= c <= b'f':
                i = ord('a')-10
            elif b'A' <= c <= b'F':
                i = ord(b'A')-10
            else:
                assert False, "non-hex digit "+repr(c)
            bits = bits*16 + (ord(c) - i)
        return bits
Here's another example of iterating over characters in a string, from pydoc.py:

    if any((0xD800 <= ord(ch) <= 0xDFFF) for ch in name)
It seems like a pretty heavy-weight prohibition for little gain. After all, you could pass foo = open("/etc/passwd") and still end up with a large gap between making the bug and its consequences.

Shouldn't unhex() just be int(s, 16)?

Not sure what it adds, but I don't quite understand it yet and perhaps there's something magic in the context of MIME quoted printable that I'm missing.

That is an excellent point!

Based on my reading, there's nothing magic. The context is:

    elif i+2 < n and ishex(line[i+1:i+2]) and ishex(line[i+2:i+3]):
        new = new + bytes((unhex(line[i+1:i+3]),)); i = i+3
I tweaked it to

        new = new + bytes((int(line[i+1:i+3], 16),)); i = i+3
and the self-tests still pass. (I also changed the 16 to 15 to double-check that the tests were actually exercising that code.)
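A quick way to convince yourself of the equivalence for the inputs that matter here is to compare the two on every two-digit hex pair. The `unhex` below is a copy of the helper quoted earlier, with one deliberate change: it raises `ValueError` instead of using `assert`. Keep in mind that `int(s, 16)` is more permissive in general (it also accepts a sign or surrounding whitespace, for example), so this only demonstrates agreement on well-formed digits:

```python
def unhex(s):
    """Copy of the quoted quopri helper, raising instead of asserting."""
    bits = 0
    for c in s:
        c = bytes((c,))
        if b'0' <= c <= b'9':
            i = ord('0')
        elif b'a' <= c <= b'f':
            i = ord('a') - 10
        elif b'A' <= c <= b'F':
            i = ord('A') - 10
        else:
            raise ValueError("non-hex digit " + repr(c))
        bits = bits * 16 + (ord(c) - i)
    return bits


HEX_DIGITS = b"0123456789abcdefABCDEF"

# Every two-character combination of valid hex digits, as bytes objects.
pairs = [bytes([hi, lo]) for hi in HEX_DIGITS for lo in HEX_DIGITS]

# Collect any input on which the two implementations disagree.
mismatches = [p for p in pairs if unhex(p) != int(p, 16)]
print(len(pairs), mismatches)
```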

It's not part of the public API, so it looks like it can simply be removed.

Do you want to file the bug report? Or perhaps it's best to update http://bugs.python.org/issue21869 ("Clean up quopri, correct method names encodestring and decodestring")?

> It's not part of the public API, so it looks like it can simply be removed.


So that is actually standard. Maybe I just don't know what you mean by public API though.

"it" == "unhex", not "int"

Oh, right! I had not read the sentence correctly.

* on lists can also mean elementwise multiplication, dot or cross product if you treat them as vectors. There's no way to choose the objectively best meaning. I'd even argue that vector math isn't the most popular use for lists in python, not because of + and * semantics, but because of performance.

So it was a good design decision not to bother with math semantics for a general-use data structure.

And besides Python has nice general syntax for elementwise operations if you don't care about performance:

    [x*y for (x,y) in zip(xs,ys)]
I agree it would be better not to implement + for lists at all.

"This is rarely what was wanted."

I don't know what else you would have expected...


If you want to perform an operation on each item of an iterable, do that :)

    [n * 4 for n in [1, 2, 3]]


map(lambda n: n * 4, [1, 2, 3])

With that logic, they should have expected [1,2,3] + [1,2,3] == [2,4,6]

I'm very aware of what you mentioned but...all I "wanted" in this case is a visual separator in my terminal when I'm working with lots of output. I don't care whether each "* " refers to the same object, I just want a line :)

With that being said, if I want to merge two lists and apply an operation on each, I don't see what's the issue with:

    In [1]: a = [1,2,3]
    In [2]: b = [5,6,7]
    In [3]: c = a+b
    In [4]: c
    Out[4]: [1, 2, 3, 5, 6, 7]
    In [5]: d = [x*4 for x in c]
    In [6]: d
    Out[6]: [4, 8, 12, 20, 24, 28]

I really like haskell's "++" for list concatenation. Makes a lot of sense.

Although `++` is associated with increment for anyone coming to Python from the C family of languages.

It's tricky; if you want to do vectors, use numpy.

Haskell also uses <> for combining any monoid, but of course in Python that was once not-equal... Maybe a dot? It's string concatenation in Perl, and function composition in Haskell. Interestingly, both of those are monoids...

Multiplication _is_ repeated addition.

Wow - that's a really, really great list.

In particular, #7 is something that I didn't even know existed, and I've been hacking around for 2+ years.

Instead of:

   >>> print mdict.get('gordon',0)
   >>> print mdict.get('tim',0)
   >>> print mdict.get('george',0)
I've always done the much more verbose:

   class defaultdict(dict):

       def __init__(self, default=None):
           self.default = default

       def __getitem__(self, key):
           try:
               return dict.__getitem__(self, key)
           except KeyError:
               return self.default

   print mdict['gordon']
   print mdict['tim']
   print mdict['george']
I'll be sure to make great use of the dictionary get method - I'm embarrassed to admit how many thousands of times I could have used that, and didn't know it existed.

Another good/great source of Python tricks/idioms is Raymond Hettinger's "Idiomatic Python". The slides/videos are really great. I highly recommend them.


Do you know there's also collections.defaultdict ?

I do now! I had known (and used) collections.OrderedDict, but had never used defaultdict. Millions of keyerrors later...

I clearly am going to have to spend a few hours today grokking everything that collections.* has to offer. Thanks very much.

Read the manuals! ;)

There's a lot of hidden gems.

Your idea is nice in a syntactic sugar way, also, the default being a part of the dictionary rather than the get function makes it copyable.

I'll give you another gem that could be interesting: else clause in for loops
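For anyone who hasn't seen it, a minimal sketch of the for/else form (`first_negative` is just an illustrative name). The `else` block runs only when the loop ends without hitting `break`:

```python
def first_negative(numbers):
    """Return the first negative number, or None if there is none."""
    for n in numbers:
        if n < 0:
            result = n
            break
    else:
        # Reached only when the loop finished without a break,
        # which includes the case of an empty sequence.
        result = None
    return result


print(first_negative([3, -1, -5]))
print(first_negative([1, 2]))
```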

Also read other people's code :) You'll see this being used countless times in libraries / on GitHub.

Your defaultdict approach and the dict.get with a default specified is not really equivalent. In the defaultdict case when you encounter a non existing key it adds a new entry with that key into the dict. i.e. your dict will start growing.

whereas dict.get with a default value keeps returning the default without touching your dict.

re: "Your defaultdict approach and the dict.get with a default specified is not really equivalent. In the defaultdict case when you encounter a non existing key it adds a new entry with that key into the dict. i.e. your dict will start growing."

europa - The dictionary is only modified if you are using a method to modify it. When you are just passively querying it, it's not impacted.

   class defaultdict(dict):

      def __init__(self, default=None):
          self.default = default

      def __getitem__(self, key):
          try:
              return dict.__getitem__(self, key)
          except KeyError:
              return self.default

   print mdict['gordon']   
   print mdict['tim']
   print mdict['george']
   print mdict

   {'tim': 20, 'gordon': 10}

You are right about your implementation. I was thinking about the Pythons collections.defaultdict

    from collections import defaultdict
    s = 'mississippi'
    d = defaultdict(int)
    for k in s:
        d[k] += 1
    d.items()

    [('i', 4), ('p', 2), ('s', 4), ('m', 1)]

The difference here - is that you are actually assigning a value to the dictionary element. It's the assignment that's growing the dictionary, not the query.
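For what it's worth, the standard library's `collections.defaultdict` behaves differently from the hand-rolled class on this point: a plain `d[key]` lookup on a missing key calls `__missing__`, which stores the default, even with no assignment. `.get()` does not do that. A small check, reusing the names from the earlier example:

```python
from collections import defaultdict

plain = {"tim": 20, "gordon": 10}
dd = defaultdict(int, plain)

# Reading through .get() never adds a key, on either kind of dict.
plain.get("george", 0)
dd.get("george", 0)
len_before = len(dd)

# A bare subscript read on a missing key inserts it with the default value.
dd["george"]
len_after = len(dd)

print(len(plain), len_before, len_after)
print(dict(dd))
```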

This is something I do instead of writing a long if-else:

    opt = {0: do_a,
           1: do_b,
           3: do_b,
           4: do_c}

It's called a jump table or vtable.
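A self-contained sketch of how such a table might be used. The handler bodies and the `dispatch` helper are placeholders made up for the example; only the shape of `opt` comes from the comment above:

```python
def do_a():
    return "a"


def do_b():
    return "b"


def do_c():
    return "c"


# Keys map straight to function objects; note that two keys share do_b.
opt = {0: do_a,
       1: do_b,
       3: do_b,
       4: do_c}


def dispatch(code):
    """Look up the handler for code and call it (hypothetical helper)."""
    handler = opt.get(code)
    if handler is None:
        raise ValueError("no handler for code %r" % (code,))
    return handler()


print(dispatch(0), dispatch(3), dispatch(4))
```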

It's one of the examples in Forth. Plan 9 uses that technique in C a lot too.

This example is ramfs which creates an in memory file system (that you can mount in Unix btw, in 166 LoC)


fsopen, fsread, fswrite, fscreate are C functions declared in the same source file :

    Srv fs = {
        .open=   fsopen,
        .read=   fsread,
        .write=  fswrite,
        .create= fscreate,
    };

fs is then passed to a library which calls them as needed.

Yes, as SixSigma says, it's called a jump table; I've also seen it called a dispatch table (because you dispatch control to the right function based on a key). It's quite an old technique, dating from the earliest programming languages and even used in assembly language, using indirect addressing and suchlike techniques.

Edit: Looked it up, dispatch table is in Wikipedia:


Do you consider that to be idiomatic? I've been out of touch with the Python community for a few years, but back then I wouldn't have considered that remotely idiomatic, and if I was on a team writing software, I would have argued that we shouldn't be writing code like that.

It's not something I see all the time, but using a dictionary instead of a big if/else is something that I'd consider a Python idiom, yes.

And it's something I do on occasion. For example:


Besides you can write code like this

   s = {'square': lambda x: x **2, 'simple': lambda x: x, 'cube': lambda x: x ** 3}

   s.get('square', lambda x: 0)(10)
   s.get('nonexistent', lambda x: 0)(10)
Substitute lambdas for real functions and you have a powerful switch case.

It can be idiomatic in certain circumstances.

If your dict keys are just numbers, then no, probably not. But strings mapping to functions, and in some cases objects and other things, are often used to substitute for numerous if and elif statements.

I've done this before, I think of it as a more powerful form of a switch statement.

I'd love to hear why you think it's not ideal.

"I'd love to hear why you think it's not ideal."

I'm not the poster you posed that question to. But for me, the one big drawback of using that idiom is that the function signatures have to be identical. So you either have to resort to args/kwargs, or you have an additional intermediary method between the actual "guts" of what you're calling, and the "switch" statement.

Or you live with the fact that you're passing unused/unnecessary parameters to your functions.

Good point, and in that case, I would say this idiom is not so great really. For functions with the same signature, it's a fine solution I think.

I don't think it's more powerful in any way than an if-elif-elif-else which, for what it's worth, I consider the Pythonic way.

Having said that, just because I consider something more Pythonic doesn't mean I prefer it. I've worked in a lot of languages over the years and still work in several in addition to Python. I really enjoy Python, but I prefer techniques that are more universal in many cases. For example, I prefer the idiom of looping over a list index to using Python's "enumerate" in most cases, because index looping is a common cross-language idiom and enumerate usually doesn't offer any benefit I value more than universal obviousness.

Other things such as Python's `for item in items` looping style are both VERY Pythonic and much nicer than, say, index looping, so I would almost always prefer such idioms.

The above switch -> function pointerish thing is clear to me from years of C/C++, but it is both less generally applicable across languages than if-elif... and less Pythonic, so I would prefer the if-elif... approach.

Obviously a matter of preferences, but since you asked....

It's strange that you would prefer `for item in items` but not `for i, item in enumerate(items)`. So in that case you'd manually do the `for i in range(len(items))...`?
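For comparison, here are the two styles side by side on a made-up list. Both produce the same index/item pairs; `enumerate` also takes an optional `start` argument:

```python
items = ["spam", "eggs", "ham"]

# Index looping, as in many other languages.
by_index = []
for i in range(len(items)):
    by_index.append((i, items[i]))

# enumerate yields (index, item) pairs directly.
by_enumerate = []
for i, item in enumerate(items):
    by_enumerate.append((i, item))

# Counting from 1 instead of 0.
numbered_from_one = list(enumerate(items, start=1))

print(by_index == by_enumerate)
print(numbered_from_one)
```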

I sometimes see this in code and think to myself, whoever wrote this needs to learn themselves some idiomatic python :) I don't think there's much point trying to force stuff that makes sense in one language into another language. Play to a language's strengths and all that.

I think you're right about if-elif generally being more powerful than the jump table. Though the jump table is useful in that you can define it in one place and use it in another.

It's still nicer than the

   func = getattr(self, 'do_%s' % thing)
I've seen in some codebases :)

Why? It's just a jump-table. In C, you would use function pointers instead of references to functions.

> There is a solution: parentheses without commas. I don't know why this works, but I'm glad it does.

It's worth mentioning that this is a somewhat controversial practice. Guido has even discussed removing C-style string literal concatenation:


You may wish to consult your project's style guide and linter settings before using it.
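For readers who haven't seen the behaviour in question: adjacent string literals are joined into one string at compile time, and the parentheses only let the expression span several lines. The sketch below (with made-up strings) shows that, along with the kind of accident that makes the feature controversial:

```python
# Adjacent literals are joined into one string; no + and no commas.
message = ("this is the first part of a long message, "
           "and this is the second part")

# The same rule makes a forgotten comma easy to miss:
colors = ["red", "green" "blue"]   # two elements, not three

print(message)
print(colors)
```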

I personally don't like this style of using multiple strings. Makes radical changes of the text cumbersome.

I think in most cases it's better to use triple quotes. And if the content of these variables isn't exclusively shown in the shell, you should use translation files anyway.

    $ cat triple.py
    def foo():
        print """this is a triple quoted string
                 this is a continuation of a triple quoted string"""

    if __name__ == '__main__':
        foo()
    $ python triple.py
    this is a triple quoted string
                     this is a continuation of a triple quoted string
This is really warty. In bash you can mostly get around this with <<- for here documents (which removes leading indentation, so even if you're defining a here doc at a place that itself is indented, you don't have to align the text to the left column yourself). The man page in my version suggests it only works for leading tabs, not spaces, though.


    $ function usage() {
            cat <<-END
                    this is a here document that
                    spans multiple lines and is indented
            END
    }
    $ usage
    this is a here document that
    spans multiple lines and is indented

I don't have an opinion positive or negative on it, but since many other design decisions have already been mentioned, here is Haskell's design decision for multi-line string literals. It allows "tidy indenting", but like the first design decision, interacts badly with "reflowing / reformatting / filling".


I'd normally write

    def foo():
        print """\
    this is a triple quoted string
    this is a continuation of a triple quoted string"""

    from __future__ import print_function
    import textwrap

    t = """
            Hello there
            This is aligned
    """

    # Need strip to get rid of extra NLs
    print(textwrap.dedent(t).strip())


Use `textwrap.dedent()`?

Dedent is nice, but then you still have to deal with removing single newlines (e.g. for error messages) and removing leading and trailing spaces. Ultimately nothing more than `re.sub(r'[^\n]\n[^\n]', '', textwrap.dedent(s).strip())` but kind of annoying to have to throw this in your code all over the place.
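A sketch of wrapping that up once in a helper (`tidy` is a hypothetical name). Two things here are assumptions rather than the commenter's code: the pattern uses lookarounds, because `[^\n]\n[^\n]` as written would also consume the character on either side of the newline, and single newlines are replaced with a space rather than an empty string so that words don't run together:

```python
import re
import textwrap


def tidy(s):
    """Dedent, trim, and join lines separated by a single newline."""
    s = textwrap.dedent(s).strip()
    # Match a lone "\n" (not part of a blank-line gap) without consuming
    # the neighbouring characters, and turn it into a space.
    return re.sub(r'(?<!\n)\n(?!\n)', ' ', s)


msg = tidy("""
    this is a triple quoted string
    this is a continuation of it

    second paragraph
""")
print(msg)
```

Blank lines between paragraphs survive, since neither newline of a `\n\n` pair passes both lookarounds.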

Didn't know about textwrap, but this does not strike me as particularly lightweight. You really want built-in syntax for something like this.

I normally just do this for multiline strings:

    s = "\n".join(["one","two","three"])

This happens at run time, whereas IIRC the string literal is crammed pre-joined into the .pyc / .pyo file.

The performance hit of doing this kind of thing really adds up in a larger app.

I would put all messages in variables at the top of the file or even in a separate one and print them in the appropriate places.

Having multi-line prints in functions adds a lot of noise, in my opinion. When I read code, I don't normally care about the content of the messages being printed.

I'm not much of a Python guy, but that chained comparison operator is sweet!

Sure, it's just syntax sugar, but it saves a lot of keystrokes, especially if the variable name is long.

Is Python the only language with this feature?

Common Lisp (and other dialects):

   (< a b c d)   ;; T if a < b < c < d
   (<= a b c d)  ;; T if a <= b <= c <= d

   (lcm a b c d ...) ;; lowest common multiple

   (+) -> 0
   (+ a) -> a
   (+ a b)  -> a + b
   (+ a b c) -> (a + b) + c

   (*) -> 1
   (* a) -> a
   (* a b) -> a * b
   (* a b c) -> (a * b) * c
Is it just syntactic sugar? (< a b c) evaluates a, b and c only once, which matters if they are expensive external function calls or have side effects.

   (and (< (foo) (bar))
        (< (bar) (baz)))
isn't the same as

   (< (foo) (bar) (baz))
By the way, this could be turned into a short-circuiting operator: more semantic variation. Suppose < is allowed to control evaluation. Then an expression like (< a b c d) could avoid evaluating the c and d terms, if the a < b comparison fails.
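Python's chained comparison has both of the properties discussed here: each operand is evaluated at most once, and evaluation stops at the first failing comparison. A small sketch that records calls to make this visible (`traced` and the sample values are made up for the demonstration):

```python
calls = []


def traced(name, value):
    """Record that name was evaluated, then return value."""
    calls.append(name)
    return value


# Chained form: bar is evaluated once.
calls.clear()
chained = traced("foo", 1) < traced("bar", 5) < traced("baz", 3)
chained_calls = list(calls)

# Spelled out with "and": bar is evaluated twice.
calls.clear()
spelled_out = (traced("foo", 1) < traced("bar", 5)) and (traced("bar", 5) < traced("baz", 3))
spelled_out_calls = list(calls)

# Short circuit: the first comparison fails, so baz is never evaluated.
calls.clear()
short = traced("foo", 9) < traced("bar", 5) < traced("baz", 3)
short_calls = list(calls)

print(chained, chained_calls)
print(spelled_out, spelled_out_calls)
print(short, short_calls)
```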

I love clojure for providing

   (<= 1 2 2 5 6 6 9)
and for making = actually useful (works on nested structures properly).

"Is this sorted", and "are these equal" are intuitive and useful concepts in programming and you shouldn't need to reimplement them each time you need them.

"Is this sorted" is useful but "<=" is not a good name for it.

Why? It's generalisation of binary operator <= to many arguments. Works as well with <, >, >=.

But can any lisp dialect do:

    a < b >= c


Yes, for instance in Common Lisp we can make ourselves a rel macro, such that

   (rel a < b >= c)
evaluates a, b, c once, left to right, and then performs the comparisons between the successive evaluated terms.

  $ cat rel.lisp 
  (defmacro rel (&rest args)
    (loop for expr in args by #'cddr
          for g = (gensym)
          collect g into gens
          collect `(,g ,expr) into lets
          finally (return `(let ,lets
                               (and ,(loop for (left op right) on args by #'cddr
                                           for (lgen rgen) on gens
                                           while rgen
                                           collect `(,op ,lgen ,rgen)))))))

  $ clisp -q -i rel.lisp 
  ;; Loading file rel.lisp ...
  ;; Loaded file rel.lisp
  [1]> (macroexpand '(rel))
  [2]> (macroexpand '(rel x))
  (LET ((#:G3219 X)) (AND NIL)) ;
  [3]> (macroexpand '(rel x < y))
  (LET ((#:G3220 X) (#:G3221 Y)) (AND ((< #:G3220 #:G3221)))) ;
  [4]> (macroexpand '(rel x < y >= z))
  (LET ((#:G3222 X) (#:G3223 Y) (#:G3224 Z))
   (AND ((< #:G3222 #:G3223) (>= #:G3223 #:G3224)))) ;
  [5]> (macroexpand '(rel x < y >= z < w))
  (LET ((#:G3225 X) (#:G3226 Y) (#:G3227 Z) (#:G3228 W))
   (AND ((< #:G3225 #:G3226) (>= #:G3226 #:G3227) (< #:G3227 #:G3228)))) ;
Could use some error checking, obviously, to make it a production-quality macro.

Anyone spot the bug? Of course

  (AND ((...) (...) ...)))
should be

  (AND (...) (...) ...)
I haven't run the generated code once, yet I can debug it: such is the power of the HN development environment.

The fix, of course, is to splice the comparison expressions into the AND:

  `(let ,lets
        ,@(loop for ... )))   ; comma splat, not comma

In a Lisp-1 dialect like Scheme, rel could easily and conveniently be a function. The call (rel a < b <= c) simply evaluates its arguments. The arguments < and <= are functions. The rel function then just operates on these values.
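
The same function-based idea carries over to Python, where comparison operators are available as plain functions in the operator module. This is only a sketch: the name rel is borrowed from the macro above, and the error handling is minimal.

```python
import operator

def rel(first, *rest):
    """Chain comparisons: rel(a, op1, b, op2, c) means (a op1 b) and (b op2 c).

    The arguments are ordinary values, so each operand has already been
    evaluated exactly once by the time rel runs.
    """
    if len(rest) % 2 != 0:
        raise TypeError("expected operands separated by comparison functions")
    left = first
    for op, right in zip(rest[0::2], rest[1::2]):
        if not op(left, right):
            return False
        left = right
    return True

print(rel(1, operator.lt, 5, operator.ge, 3))   # 1 < 5 >= 3
print(rel(1, operator.lt, 5, operator.ge, 7))   # 1 < 5 >= 7
```

One difference from a macro: because rel is a function, every operand is evaluated before the first comparison runs, so it cannot skip evaluating later terms when an early comparison fails.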

Given that lisps tend to use prefix notation, the example doesn't even translate meaningfully.

I assume the above was intended to be a Python expression. The answer, to the best of my knowledge, is "not really". There is no built in or reasonably standard function that lets you check simultaneously that a is less than b which in turn is greater than or equal to c. Not that we couldn't define one ad-hoc, although making it reusable in a way that's reasonably idiomatic might be a challenge...

Obviously, you can express it slightly more verbosely:

(and (< a b) (>= b c))

(< a (>= b c)) same number of operators but a few extra parens

This isn't the same thing -- the Python syntax is asking for a < b and b >= c, while only evaluating b once. There are many infix math libraries for lisps (e.g. this one from 1995: https://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/lang/...) that allow things like #I(a < b >= c), but the trick is only evaluating b once. (The linked lib will expand into two evaluations if b is an expression.)

Here (>= b c) will return a boolean value, which you're then comparing to a.

Which means it could be done, but the relational operators (any one of which could be the leftmost constituent) have to all be defined as macros which analyze the rest of the expression with alternative evaluation rules.

I think syntactic sugar is vastly undervalued by programmers. Anything that lets me more naturally read & write code should lead to fewer bugs and more productive software development, as well as making programming more enjoyable.

It's not simply syntax sugar either.

    def sideEffects():
        print "Called."
        return 5

    if 0 < sideEffects() < 10:
        pass  # sideEffects is only called once
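
A small way to check the "only called once" claim, written for Python 3. The names side_effects and calls are illustrative; the function records each call in a list so the chained form can be compared with the spelled-out one.

```python
calls = []

def side_effects():
    calls.append("called")   # record every invocation
    return 5

calls.clear()
chained = 0 < side_effects() < 10
chained_calls = len(calls)

calls.clear()
spelled_out = 0 < side_effects() and side_effects() < 10
spelled_out_calls = len(calls)

print(chained, chained_calls)          # True 1
print(spelled_out, spelled_out_calls)  # True 2
```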

SQL has “x BETWEEN y AND z” as a special case which does “y <= x <= z”, which in practice is quite often what you actually want to use this feature for.

> Is Python the only language with this feature?

No. See, e.g., http://stackoverflow.com/questions/4090845/language-support-...

I'm not sure if other languages have it, but I have to say that pycharm is excellent at suggesting chained comparisons to you. I didn't know this existed before I switched IDEs.

if 2 > 3 and 2 < 7


if 3 < 2 < 7

If this Pycharm really had brains it would tell you that (2>3 and 2<7) reduces to (false and true), to false, at compile time, so the code wrapped in the if is unreachable.

I'm not sure about pycharm, but i know for certain IntelliJ warns you about constant conditions like that. That said, he could have picked that example just to illustrate a point, rather than something it literally suggests.

Perl6 has it too

  > perl6
  > 3 < 4 < 5

3 < 4 < 5 evaluates to true (well, 1) in C as well.

What does 5 > 4 > 3 give?
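
To see why the C result is a coincidence, here is a Python 3 sketch that imitates C's left-to-right evaluation by converting each comparison result to an int, next to Python's own chained comparison. The variable names are made up for illustration.

```python
# C semantics: each comparison yields an int (0 or 1) that feeds the next one.
c_style_up = int(int(3 < 4) < 5)     # (3 < 4) is 1, and 1 < 5 is 1
c_style_down = int(int(5 > 4) > 3)   # (5 > 4) is 1, and 1 > 3 is 0

# Python semantics: a chain is the conjunction of the pairwise comparisons.
py_up = 3 < 4 < 5
py_down = 5 > 4 > 3

print(c_style_up, c_style_down)   # 1 0
print(py_up, py_down)             # True True
```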

I came across this when I was first learning Python and it has always impressed me:

    from random import shuffle
    deck = ['%s of %s' % (number, suit) for number in '2 3 4 5 6 7 8 9 10 Jack Queen King Ace'.split(' ') for suit in 'Hearts Clubs Diamonds Spades'.split(' ')]

I never liked how people in Python use stringWithSpaces.split instead of a list. Just feels wrong somehow.

But I've seen it many times so it's probably pythonic

I do it entirely, exclusively, only, purely because it requires less punctuation typing. It returns a list anyway. The performance hit is virtually unnoticeable in almost every use case (unless this is a function taking in input strings formatted this way many times per second, but in that case you've got way worse to worry about first...).

As DaFranker points out, it's just easier to type than

    ('Hearts', 'Diamonds', 'Spades', 'Clubs')
and has less opportunity for typos and syntax errors. If I was concerned about performance I would replace it with a tuple, but it was Good Enough for a quick example.

Sorry, I realised I sounded negative. Nothing against you, it's just my private opinion.

It's not about performance, but about one more incidental step when reading the code. Small cost, but it's there.

One reason I like it is that you're probably going to be reading in the possible values from a stream or arg or config file anyway.

But you have a good point. YAGNI is YAGNI.

I agree; what I usually do is type the string.split() version in a REPL (I keep one open permanently) and then copy-paste the result to the file.

It avoids having to do a big change if you need to add a new item with spaces in it.

It's like the old Perl idiom, where Python's 'ab cd ef'.split() would be written as qw(ab cd ef), which probably looks nicer. 'qw' stands for 'quote words', I think.

It makes sense when all the values in the list are text.

Avoids lots of ", "

also in the std library, https://docs.python.org/3/library/collections.html#collectio... field_names can take a single space separated string
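
A quick Python 3 illustration of that namedtuple behaviour. The type names Card and CardFromList are only examples.

```python
from collections import namedtuple

# field_names given as one space-separated string...
Card = namedtuple("Card", "rank suit")
# ...is equivalent to passing a sequence of names.
CardFromList = namedtuple("CardFromList", ["rank", "suit"])

card = Card("Queen", "Hearts")
print(card)
print(Card._fields == CardFromList._fields)
```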

I love this idiom. I also use it in Javascript.

Can someone direct me to a comparison of subprocess and os? I keep hearing subprocess is better, but have not really read any explanation as to why or when it is better.

(I'm glad I'm not the only one who was thrilled to discover enumerate()!)

The OS module interacts directly with the OS rather than abstracts it, so a lot of the functions in it have the "may not be available on all platforms" apology.

Subprocess uses OS under the hood but offers an abstraction that mostly works on all platforms, e.g. the way that "communicate" is implemented on Windows differs considerably from how it's implemented on Unix.

The subprocess module is meant to replace a number of functions in the os module. The python documentation lists a number of examples here: https://docs.python.org/2/library/subprocess.html#subprocess...

If you google 'os vs subprocess python' there are a few stackoverflow and quora threads comparing them.

All explained here: https://www.python.org/dev/peps/pep-0324/


* Security (including avoiding holes like shellshock)

* Unification of process handling under one core library.

* Easier detection of errors in processes.

* Easier control of stderr/stdout.

* Universal newline support.

* Elimination of race conditions with .communicate()
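
A few of those points can be seen in a short sketch. Note the assumptions: this uses subprocess.run, which only exists in Python 3.5+ (capture_output and text need 3.7+), so it is not the API from the Python 2 docs linked above. The child command is just the current interpreter printing to both streams.

```python
import subprocess
import sys

# Arguments are passed as a list, so no shell parses the command line.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print('out'); print('err', file=sys.stderr)"],
    capture_output=True,   # collect stdout and stderr separately
    text=True,             # decode bytes and normalise newlines
    check=False,           # set to True to raise CalledProcessError on failure
)

print(result.returncode)
print(result.stdout.strip())
print(result.stderr.strip())
```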

I was grateful for the example of multilined strings, mysterious as it is. The lack of any way to do this has been an annoyance of mine for quite some time.

Some comments:

1. Am I the only one that really loves that `print` is a statement and not a function? Call me lazy, but I don't mind not having to type additional parentheses.

5. Dict comprehensions can be dangerous, as keys that appear twice will be silently overridden:

  elements = [('a', 1), ('b', 2), ('a', 3)]
  {key: value for key, value in elements} == {'a': 3, 'b': 2}
  # same happens with the dict() constructor
  dict(elements) == {'a': 3, 'b': 2}
7. I see

  D.get(key, None)
way too often.

8. Unpacking works in many situations, basically whenever a new variable is introduced.

  for i, el in enumerate(['a', 'b']):
    print i, el

  {key: value for (key, value) in [('a', 1), ('b', 2), ('a', 3)]}

  map(lambda (x, y): x + y, [(1, 2), (5, -1)])
Note: the last example (`lambda`) requires parentheses in `(x, y)`, as `lambda x, y:` would declare a two-argument function, whereas `lambda (x, y):` is a one-argument function, that expects the argument to be a 2-tuple.
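
Regarding point 5, one possible way to notice repeated keys before they are silently overwritten is to count them first. This is a Python 3 sketch using collections.Counter; the variable names are illustrative.

```python
from collections import Counter

elements = [("a", 1), ("b", 2), ("a", 3)]

# Count how often each key occurs before collapsing the pairs into a dict.
key_counts = Counter(key for key, _ in elements)
duplicates = sorted(key for key, count in key_counts.items() if count > 1)

if duplicates:
    print("duplicate keys:", duplicates)

# The last value wins for a repeated key.
print({key: value for key, value in elements})
```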

On the subject of "call me lazy", I really like the % syntax for string interpolation. I'd like perl-style interpolation even more. "".format() is going in completely the wrong direction for me. (I don't think % is being removed, but I think it's discouraged.)

> lambda (x, y): x + y

This syntax is removed in python 3: http://legacy.python.org/dev/peps/pep-3113/
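
For anyone porting such code, here are a few Python 3 spellings that give the same result as the tuple-parameter lambda. These are alternatives I would reach for, not something the PEP prescribes in this exact form.

```python
from itertools import starmap
from operator import add

pairs = [(1, 2), (5, -1)]

# Index into the single tuple argument...
by_index = list(map(lambda pair: pair[0] + pair[1], pairs))
# ...or let starmap unpack each tuple into separate arguments...
by_starmap = list(starmap(add, pairs))
# ...or unpack in a comprehension, where tuple targets are still allowed.
by_comprehension = [x + y for x, y in pairs]

print(by_index, by_starmap, by_comprehension)
```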

I'm still not sure why that was deprecated. It's much cleaner. I still use it… :/


print-as-a-function has a few nice qualities, like the ability to pass it around as an argument to things so you can do (stupid example):

    map(print, range(10))
It can also take some kwargs that allow you to do things that are a bit clunky with the print statement.

Which is the more intuitive of the following?:

    print(errmsg, file=sys.stderr)

    print >>sys.stderr, errmsg
(I didn't even know about the latter until I found it in someone's code and looked it up) Also, suppressing line endings:

    print "cats", 

    print("cats", end='')
My lazy-brain prefers the statement. My sensible/code-review brain prefers the function.
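
A small Python 3 sketch of what the function form allows. An in-memory io.StringIO buffer stands in for sys.stderr so the result can be inspected; the names buffer and log are illustrative.

```python
import io
from functools import partial

buffer = io.StringIO()

# file= redirects the output; end= replaces the trailing newline.
print("cats", end="", file=buffer)
print(" and", "dogs", file=buffer)

# print is an ordinary object, so it can be wrapped and passed around.
log = partial(print, file=buffer)
for n in range(3):
    log(n)

print(repr(buffer.getvalue()))
```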

The first really has nothing to do with `print` being a statement - it could be parsed and passed as a function in "non-statement" positions regardless.

The rest could probably be special-cased in a backwards-compatible way as well. This is currently not valid Python 2.0 syntax:

  print "cats", end='', file=sys.stderr

well, as a statement it's also a reserved keyword, so you can't override or mock it (AFAIK), and I suspect changing the parser to identify context and operate accordingly might be rather painful.

Honestly, I'm reasonably happy with the split; there don't seem to be many compelling reasons to shoehorn all the extra bits back into statement-print other than

a) removing a single pair of parens

b) being backwards compatible (but then the old code wouldn't be using those new features anyway, and would still have to support that nasty bitshift hack).

> 1. Am I the only one that really loves that `print` is a statement and not a function? Call me lazy, but I don't mind not having to type additional parentheses.

If you think the parentheses are bad, why wouldn't you prefer a language like Ruby where you can omit them generally? Leaving them out for just one special construct seems so insufficient as a cure.

In general, Ruby's syntax (or rather, semantics) is ambiguous (you don't know if a statement without parentheses is calling a function or just accessing a value). I prefer unambiguous syntax.

However, Python's `print` statement is (1) well-known, (2) very useful (for prototyping, debugging, in REPL), and (3) shouldn't in general be present in production code (logging should be used instead). Therefore, omitting parentheses would help ease debugging and exploring (REPL prototyping), while not making code more ambiguous in general. Yes, it's a special case, but `print` also has a very special, very specific use case.

Try %autocall in ipython.

    In [9]: %autocall
    Automatic calling is: Smart

    In [10]: def foo(a, b): return a + b
       ....:

    In [11]: foo 3, 4
    -------> foo(3, 4)
    Out[11]: 7


tom, get back to work!

It is thanksgiving, lay off Tom.

"7" May be because getattr (at least) works the other way around, instead raising an exception if not found and no default specified. I'm sure many people can't always remember which works which way.

There would be no purpose in the `get()` function if it raised an exception if the key wasn't there - that's how `[]` works. On the other hand, `getattr` is IMO mostly used for situations where you don't know what properties exist on an object, so you can't just use the dot notation.

with the print as a keyword, you can't have modules defining a print function (for example, APIs where print makes sense as a function name ([he]xchat's py support comes to mind))

additionally, if you want to override print in python2, you need to replace the stdout stream with your own inbetween buffer object, which also has the downside of being global

What's wrong with 7?

The default value (the second argument of the `get` method) defaults to `None` anyways. Therefore,

  D.get(key, None)
is just syntax noise (in the best case - in the worst case, it signifies someone who doesn't know/understand Python).

  D.get(key)
should be used instead, or

  D.get(key, "whatever")
if required.

Of course, sometimes I still type:

  D.get(key, None)
since I forget that dict.get and getattr() have different behavior in the case of missing keys/attributes...
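
The difference between the two is easy to check. This is a Python 3 sketch; the class Thing and the key names are made up for illustration.

```python
d = {"a": 1}

print(d.get("missing"))              # no default given: returns None
print(d.get("missing", "whatever"))  # explicit default

class Thing:
    pass

thing = Thing()
try:
    getattr(thing, "missing")        # no default given: raises
except AttributeError as exc:
    print("AttributeError:", exc)

print(getattr(thing, "missing", None))  # here the explicit None is needed
```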

I'm a fan of Python's conditional expressions.

    foo = bar if qux is None else baz
They're particularly interesting when combined with comprehensions.

    ['a' if i % 2 == 0 else 'b' for i in range(10)]
Though this particular example can be expressed much more concisely.

    ['a', 'b'] * 5

Before the conditional expression was introduced to Python I sometimes wrote things like

    foo = [baz, bar][qux is None]
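
One caveat with the list-indexing trick is that both alternatives are evaluated before one is picked, whereas the conditional expression only evaluates the branch it needs. A Python 3 sketch, with a made-up helper branch that records which alternatives ran:

```python
evaluated = []

def branch(name):
    evaluated.append(name)   # record which branches actually ran
    return name

qux = None

evaluated.clear()
foo_indexed = [branch("baz"), branch("bar")][qux is None]
indexed_evaluated = list(evaluated)

evaluated.clear()
foo_conditional = branch("bar") if qux is None else branch("baz")
conditional_evaluated = list(evaluated)

print(foo_indexed, indexed_evaluated)          # bar ['baz', 'bar']
print(foo_conditional, conditional_evaluated)  # bar ['bar']
```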

That's a Pythonization of C's conditional operator.

    foo = qux == NULL ? bar : baz;
Of course, C does not do list comprehensions.

I didn't realize you could add the if statement in the comprehension where you did

    ['a' if i % 2 == 0 else 'b' for i in range(10)]
very cool!

I was only aware of code like

    ['a' for i in range(10) if i % 2 == 0]

I work with python full time, and the last (#10 string chaining) is one of the few times the syntax has caused me grief, due to missed commas in what were supposed to be tuples of strings. The chaining rules are one of the few sources of apparent ambiguity in the syntax, especially when you include the multiline versions.
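
A minimal reproduction of that missed-comma problem, for anyone who hasn't been bitten yet (Python 3, with made-up values):

```python
intended = ("alpha", "beta", "gamma")

# The comma after "beta" is missing, so the two adjacent literals
# are concatenated into a single string at compile time.
accidental = (
    "alpha",
    "beta"
    "gamma",
)

print(len(intended), intended)
print(len(accidental), accidental)
```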

Most of these idioms actually make me sad.

When I first started using Python around 1999, it didn't even have list comprehensions. Code was extremely consistent across projects and programmers because there really was only one way to do things. It was refreshing, especially compared to Perl. It was radical simplicity.

Over the decade and a half since then, the Python maintainers have lost sight of the language's original elegance, and instead have pursued syntactical performance optimizations and sugar. It turns out that Python has been following the very same trail blazed by C++ and Perl, just a few years behind.

(At this point Python (especially with the 2 vs. 3 debacle) has become so complex, so rife with multiple ways to do even simple things that for a small increase in complexity, I can just use C++ and solve bigger problems faster.)

Are we reading the same article, though?

It's certainly up for debate whether named tuples and enums, various kinds of metaprogramming and decorators might be making the language more complex for fairly little gain... but this article talks about the `enumerate` function, about string formatting and dictionary comprehensions. Simple, straightforward stuff with no downsides.

But syntax matters! When you're writing stuff all day,

    a = [_ * 2 for _ in range(10)]
is a lot more pleasant than:

    a = []
    for _ in range(10): a.append(_ * 2)
It also gives Python a lot more information about your actual intent. Suppose "range(10)" were actually "giant_list". Hypothetically, the list comprehension could pre-allocate len(giant_list) elements instead of calling list.append that many times. That's potentially a huge performance win.

You see Python as getting more complex. I see it as getting less complex by giving concise alternatives to common idioms.

There is a sweet in India called phirni [0]. When you eat it from the outer layer to the inner layer, you feel like you are walking in heaven. Right now you are at the outer layer. I hope you reach the inner layer and still feel the Python. :)

[0] http://www.wikihow.com/Make-Phirni-%28a-Rice-and-Milk-Dish%2...

"Missing from this list are some idioms such as list comprehensions and lambda functions, which are very Pythonesque and very efficient and very cool, but also very difficult to miss because they're mentioned on StackOverflow every other answer!"

Can anyone link to good explanations of list comprehensions and lambda functions?

This seems like an easy intro to list comprehensions:


Nice list, but I was confused by the arguments to the dict .get() example until I looked up the definition.

I wish there was an interval set in Python's builtins.

I also wish that ranges were an actual proper set implementation - so you could, for example, take intersection and union of ranges.

And I wish that Python had an explicit concatenation operator.

You mean like the built in `set` object? https://docs.python.org/2/library/stdtypes.html#set

I think he or she meant a specialized set data structure that stores sets of numbers which can be written as finite unions of interval. Typically besides set operations you'd also want (1) that only endpoints of the intervals are stored, so that the structure is compactly represented in memory, (2) to be able to recover the canonical representation of the set as a sorted union of disjoint intervals.

This is exactly what I meant, thank you.

Set doesn't work for this.

You need to store all possible numbers between the lower and upper bound, which isn't exactly workable for (for example) floats.
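
For what it's worth, the endpoint-only representation described above is not hard to sketch in Python 3. Everything here is an assumption for illustration: intervals are closed (low, high) tuples, the function names are made up, and intersection expects both inputs to already be in canonical form.

```python
def merge_intervals(intervals):
    """Return the canonical form: sorted, disjoint closed intervals."""
    merged = []
    for low, high in sorted(intervals):
        if merged and low <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], high))
        else:
            merged.append((low, high))
    return merged

def union(a, b):
    return merge_intervals(list(a) + list(b))

def intersection(a, b):
    """Intersect two canonical interval lists with a two-pointer sweep."""
    result, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        low = max(a[i][0], b[j][0])
        high = min(a[i][1], b[j][1])
        if low <= high:
            result.append((low, high))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result

a = merge_intervals([(0.0, 1.5), (1.0, 2.0), (5.0, 6.0)])
b = merge_intervals([(1.75, 5.5)])
print(a)                   # [(0.0, 2.0), (5.0, 6.0)]
print(union(a, b))         # [(0.0, 6.0)]
print(intersection(a, b))  # [(1.75, 2.0), (5.0, 5.5)]
```

Only the endpoints are stored, so float ranges cost the same as integer ones.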

Oh, wow. I didn't know the dict comprehensions. Since when do they exist? I always used:

    d = dict((key(x), value(x)) for x in xs)


I'm not entirely sure how to interpret the PEP header. It dates back to 2001 and was updated in 2012. It's probably in python since 2.3 but maybe 2.7(2010)/3.0(2008).

Since 2.7, IIRC.
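
For reference, the two spellings build the same dict. A Python 3 sketch, with len standing in for the value(x) function from the question above and the key being x itself:

```python
xs = ["apple", "kiwi", "fig"]

# Older spelling: dict() over a generator of (key, value) pairs.
old_style = dict((x, len(x)) for x in xs)
# Dict comprehension spelling of the same mapping.
new_style = {x: len(x) for x in xs}

print(old_style == new_style)
```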

Is there any such collection of advanced Python patterns, aimed at Python programmers with more than 2-3 years of experience?

haha.. in #1, the easter egg "not a chance" :) :)
