
Comprehensions in Python the Jedi way - bearfrieze
https://gist.github.com/bearfrieze/a746c6f12d8bada03589
======
undershirt
I don't think the author realizes how appropriate the Jedi tone for this
article is. Comprehensions are the gateway drug to the dark side, away from
imperative programming and toward languages that treat everything as
expressions which snap together more freely. It hints at an idea of making a
`for` loop and an `if` statement return a value (see CoffeeScript). But it
also hints at the idea that useful idioms like the List Comprehension can
merge/simplify existing constructs into a new, easier syntax-- and that there
are languages that allow you to do this freely (see Lisp).

Python showed me the Force, but I'm with the Dark Side now.

~~~
hellofunk
One of the most extraordinary APIs for comprehensions can be found in Clojure,
where the "for" expression really changed how I view data. A single expression
can generate many complex data sequences with such elegant beauty.

------
bpicolo
List comprehensions are awesome. Not only that - Python does them insanely
beautifully. Clojure's and ES6's are, I think, not as readable, though more or
less equally powerful. In Python they don't have any clunkiness. Simple,
expressive. Love them.

Nesting them can get ugly, but it's easy to avoid: Just use generator
expressions and chain them. No real runtime overhead that way.
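A minimal sketch of that chaining (the data here is illustrative): each stage is a lazy generator expression, so no intermediate lists are built.

```python
# Chain generator expressions instead of nesting one big comprehension.
nums = range(10)
evens = (n for n in nums if n % 2 == 0)   # lazy filter
squares = (n * n for n in evens)          # lazy map over the previous stage
result = list(squares)                    # all the work happens here
# result == [0, 4, 16, 36, 64]
```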

~~~
nilliams
Well, with regard to clunkiness and awesomeness, are they not still less
'naturally' composable and (as a result) less readable in composition than
collection pipelines? I don't see a good reason to prefer them over:

    
    
        collection
            .map(x => x * 2)
            ...
            .filter(isOdd)
            .reduce(blargh)
    

... style syntax that most other modern, C-style languages offer now (ES6,
Rust, Ruby, C# ...)

~~~
thomasahle
The comprehension approach, while requiring more syntax, seems to often
produce a more 'natural' order of operations. For example, compare

    
    
      [p for p in range(2,100) if all(p%q!=0 for q in range(2,p))]
    

with

    
    
      range(2,100).filter(p => range(2,p).map(q => p%q!=0).all())
    

It might be a small thing, but in the first one I feel the prime 'p' becomes
the centerpiece, whereas in the second, 'p' is buried somewhat in the
expression.

~~~
nilliams
Both are fairly unintelligible to me (probably partly due to my lack of
interest in prime numbers), so it seems like splitting hairs to draw
comparisons, but if I had to figure out what was going on, I'd rather be
staring at this (~ES6):

    
    
        range(2, 100).filter(p => {
      
            // I'd probably stick a comment here to explain this ...
            // I'd split out this whole section into a named function once I understood it.
            return range(2, p)
                .map(q => p % q != 0)
                .all();
    
        })
    

I don't feel like I could iterate on understanding this problem as well in
Python, because I'd have to reach for one of two seemingly inferior solutions
(list comprehensions or borked lambdas).

------
a_bonobo
One interesting side-thing with list comprehensions in Python 2 vs. Python 3:

Python 2.7:

    
    
      >>> list_of_numbers = [1,2,3]
      >>> [x/2 for x in list_of_numbers]
      [0, 1, 1]
      >>> print(x)
      3
    

Python 3:

    
    
      >>> list_of_numbers = [1,2,3]
      >>> [x/2 for x in list_of_numbers]
      [0.5, 1.0, 1.5]
      >>> print(x)
      NameError: name 'x' is not defined
    

They "leak" their variables in Python 2, if you're someone who reuses
variables this can lead to an enormous headache!

~~~
Cyph0n
Wow, I never noticed that! Who the hell thought that was a good idea?

~~~
raymondh
Barry Warsaw proposed List Comprehensions 16 years ago in PEP 202.

The original concept was that it provided a more concise way to write loops so
that:

    
    
       t = [expr(x, y) for x in s1 for y in s2]
    

was just a short way to write:

    
    
       _t = []
       for x in s1:
       for y in s2:
               _t.append(expr(x, y))
       t = _t
       del _t
    

The original implementation reflected that design. I later added the
LIST_APPEND opcode to give the list comprehensions a speed advantage over the
unrolled code.

The question of whether to expose the loop induction variable didn't get much
discussion until I proposed Generator Expressions in PEP 279 and decided to
give them the behavior of hiding the loop induction variables so that the
behavior would match that of a normal unrolled generator function.

When set and dict comprehensions (displays) came along afterwards, they were
given the latter behavior because there was precedent, because there was a
mechanism to implement that precedent, and to provide a short-cut for the
then-common practice of creating dicts and sets with generator expressions:

    
    
        s = {expr(x) for x in t}
    

was a short-form for:

    
    
        s = set(expr(x) for x in t)
    

The advent of Python 3 gave us an opportunity to make all four forms
(listcomps, genexps, set and dict displays) consistent about hiding the loop
induction variable.

The current state in Python 3 has the advantage of being consistent between
all four variants and matches how mathematicians treat bound and free
variables.

There are some disadvantages as well. List comprehensions can no longer be
cleanly explained as being equivalent to the unrolled version. The disassembly
is harder to explore because you need to drill into the internal code object.
Tracing the execution with PDB is no fun because you go up and down the stack.
It is more difficult to explain scoping -- formerly, all you had was
locals/globals/builtins, but now we have locals/nonlocals/globals/builtins
plus variables bound in list comps, genexps, set/dict displays, plus exception
instances that are only visible inside the except-block.
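That last point about exception instances can be seen in a short sketch (Python 3):

```python
# The name bound by "except ... as exc" is deleted when the block exits,
# much like a comprehension hides its loop variable.
try:
    1 / 0
except ZeroDivisionError as exc:
    message = str(exc)  # exc is visible only inside this block

try:
    exc  # NameError: the binding was removed when the except block ended
except NameError:
    message += " (exc is gone)"
```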

~~~
Cyph0n
Thanks for the detailed reply, Raymond! I never expected a core developer who
worked on implementing list comprehensions to respond haha!

That last point about traceability is something I never thought about, but I
think the trade-off is worth it.

Keep up the awesome work man. For anyone interested, you can follow him here:
[https://twitter.com/raymondh](https://twitter.com/raymondh)

------
cgriswald
Pedantic SW points:

1. Rey is not a Jedi. A better dict key would be "fav_force_user".

2. For the planets, Episode II is missing Coruscant and Episode VI is missing
Dagobah.

Edit: formatting

~~~
jordigh
Hah, as if there were any doubt that she'll become a Jedi. It's basically
Episode IV all over again: Han Solo instead of Obi-Wan, Rey instead of Luke,
the First Order instead of the Empire, Kylo Ren instead of Darth Vader.

We like hearing a story we already know. Rey is well on her way to becoming a
Jedi, probably in two movies' time.

~~~
zo1
And don't forget, old Luke is probably going to act as the Yoda-equivalent in
the next "episode".

------
d0mine
There is an easier way to convert bits to ASCII, filter spaces and reverse the
string:

    
    
      >>> n = int(bbs, 2)
      >>> n.to_bytes((n.bit_length() + 7) // 8, 'big').decode()
      's no  i sn e  h  e r  pm o c'
      >>> _.replace(' ', '')[::-1]
      'comprehensions'
    

[http://stackoverflow.com/questions/7396849/convert-binary-
to...](http://stackoverflow.com/questions/7396849/convert-binary-to-ascii-and-
vice-versa)

~~~
bearfrieze
I like the replace method. It's a great way of doing the same thing.

I considered using the [::-1] syntax to reverse the list, but decided that
there was enough "cute" stuff in the examples already.

~~~
d0mine
[::-1] is an idiomatic way to reverse a string in Python that is _the obvious
way_ for a habitual user of the language to do it e.g.:

    
    
      def palindrome(s):
          return s == s[::-1]
    

It could be discussed whether ''.join(reversed(s)) is more readable for a
novice programmer learning Python. In general, Python prefers words over
punctuation.

Also, there are objects that can be passed to reversed() that are not sequences.

[http://stackoverflow.com/questions/931092/reverse-a-
string-i...](http://stackoverflow.com/questions/931092/reverse-a-string-in-
python)

------
rebootthesystem
List comprehension in Python is great. However, that's Padawan territory. If
you want to be a Jedi, then APL is the only way. It's not "list
comprehension"; it's the way of the Force when you use APL.

------
Negative1
It's fun switching from Python to Scala where for/yield comprehensions are
idiomatic and very natural (nested comprehensions being a good example). Oh,
and type safe and fast. It's especially powerful to build off those constructs
with currying, partial application, pattern matching, etc...

I love Python and write code in it daily but as a (pseudo-)functional language
it feels very awkward to me.

------
ambicapter
> planets_flat = [planet for episode in episodes.values() for planet in
> episode['planets']]

Can somebody explain this one to me (I understand list comprehensions)? I'm
having trouble understanding how the second part uses something defined in the
first part, but the first part can't stand on its own, so

> [planet for episode in episodes.values()]

returns an error.

~~~
thomasahle
It's equivalent to

    
    
      planets_flat = []
      for episode in episodes.values():
        for planet in episode['planets']:
          planets_flat.append(planet)
    

Notice how the for loops in the comprehension go in the same order as in the
imperative code.

~~~
hobarrera
> Notice how the for loops in the comprehension goes in the same order as in
> the imperative code.

Thanks, that's a sane way to explain the order. Up to right now, it was always
"the opposite of what you'd expect", which was a memory rule that always
failed me.

~~~
thomasahle
It also helps with where the ifs should be inserted, for example:

    
    
      ys = []
      for x in xs:
        if P(x):
          for y in Y(x):
            if Q(x,y):
              ys.append(y)
    

Becomes

    
    
      [y
         for x in xs
         if P(x)
         for y in Y(x)
         if Q(x,y)
      ]
    

Of course, combining this many fors and ifs is often a bad idea, just as a
four-deep nested loop isn't always ideal. It does make the order easier to
remember though :)

------
unoti
Comprehensions are great, but they can hurt readability and maintainability
when taken too far. Most of the time you shouldn't be playing code golf,
iteratively seeking to pack more and more work into a single line. That makes
the code harder to understand and harder to re-use.

While many of the examples in this helpful article are good, the first example
with the octets is an excellent example of how not to do it. Look at the octet
parsing code we ended up with in the article:

    
    
      # Snippet 1
      octets = [bbs[i:i+8] for i in range(0, len(bbs), 8)]
      

It's nice and tight. What does it do? I'd need to peer at it a moment and
decode it, executing it in my head. This is subjective, but I think code
should be self-explanatory; it's up to the computer to execute code, not
people in their heads. Is there an off-by-one error in there? Here's another
way to do it that'd be better.

    
    
      octets = chunks(bbs, 8)
    

That function chunks() is something that I keep in an iterutils package which
I end up using all the time. It's intuitive, and it has a doctest that shows
that we definitely don't have an off-by-one error. It's also easier to re-use
than the first one. Maybe it also bears mentioning that chunks() works on an
iterator, while the first solution needs to keep the whole thing in memory at
once. Here's the chunks method I use:

    
    
        def chunks(collection, chunk_size):
            """Divides collection into chunks of up to chunk_size elements each.
                >>> l = range(75)
                >>> list(chunks(l, 10))
                [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
                [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
                [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
                [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
                [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
                [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
                [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
                [70, 71, 72, 73, 74]]
            """
            for i in xrange(0, len(collection), chunk_size):
                yield collection[i : i + chunk_size]
    

When you catch yourself playing code golf and trying to pack more and more
meaning into a single line, look for ways that you can break the problem down
into multiple components that use each other. This kind of functional
decomposition is one of the things that makes functional programming so
wonderful. Lots of times the intermediate steps in a complex expression have
meaning and are useful on their own.
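For what it's worth, a `chunks` built on `len()` and slicing only handles sequences; a sketch of a variant that works on arbitrary iterators, using `itertools.islice` (the name `iter_chunks` is mine):

```python
from itertools import islice

def iter_chunks(iterable, chunk_size):
    """Yield lists of up to chunk_size items from any iterable."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, chunk_size))  # pull the next chunk lazily
        if not chunk:
            return
        yield chunk

# list(iter_chunks(range(7), 3)) == [[0, 1, 2], [3, 4, 5], [6]]
```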

~~~
kilburn
I totally agree with you on the core issue.

However, I would reject your `chunks` function in a code review and tell you
to use `grouper` from the itertools recipes [1].

More generally, any time I've ended up with a long list comprehension, the
answer has been to "check itertools and see how you would describe this ugly
comprehension in those terms".

[1]
[https://docs.python.org/3/library/itertools.html#itertools-r...](https://docs.python.org/3/library/itertools.html#itertools-
recipes)

~~~
unoti
Grouper is a recipe; it's not part of itertools. It probably should be!

~~~
thomasahle
All of the recipes in the itertools documentation should pretty much be
considered folklore imho. If you always use the `zip(*(iterator,)*n)`
approach, people will recognize it and know immediately what you mean.
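The idiom in question, `zip(*(iter(...),)*n)`, sketched on illustrative input (note it silently drops any ragged tail):

```python
# One shared iterator repeated eight times: zip pulls eight consecutive
# characters per output tuple.
bits = "0110001101101111"  # illustrative 16-bit input
octets = ["".join(group) for group in zip(*(iter(bits),) * 8)]
# octets == ['01100011', '01101111']
```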

------
MatthewWilkes
`is` should not be used to check string equality, as it is in the space
filter. This only works because CPython interns short strings automatically.
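A sketch of the pitfall: identity and equality can disagree for strings built at runtime, because interning is an implementation detail.

```python
# A string assembled at runtime equals the literal but is typically a
# distinct object in CPython.
a = "space"
b = "".join(["spa", "ce"])  # same value, usually not the same object

assert a == b   # value equality: always holds here
# `a is b` compares identity and may be False even though a == b,
# which is why `is` must not be used for string comparison.
```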

~~~
bearfrieze
Thanks for pointing this out along with some folks over in the comments on
GitHub. I've updated the Gist.

------
distracteddev90
One thing I find interesting is that everyone loves to hate CoffeeScript, but
its individual features/syntax are consistently lauded in conversations about
other languages.

(Not to mention half of ES6 existed in CoffeeScript first, but that's a gripe
for another day)

~~~
stuartaxelowen
The composition of features is more important than just the individual
features - CoffeeScript is a great example of when feature composition goes
wrong.

------
alexandercrohde
Or, there's a million-times-better way to do this with a good library. In
JavaScript, for example:

    
    
      FA('0...11'.split('')).chunk(8)
        .map(x => parseInt(x.join(''), 2))
        .map(x => String.fromCharCode(x))
        .filter(x => x != ' ')
        .reverse()
    

[https://github.com/anfurny/Fancy](https://github.com/anfurny/Fancy)

------
RubyPinch
I just wish that python could stop being such a butt about not being like the
rest of the languages

list of items --> filter list --> operate on list

becomes

operate on list <-- (list of items --> filter list)

And because this is how it was decided to tackle map/filter problems, we'll
always have a weird gimped anon-function operator instead, to discourage the
map/filter patterns of every other language
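Concretely, the inside-out ordering being complained about (illustrative data):

```python
nums = [1, 2, 3, 4, 5]

# Pipeline order elsewhere: nums -> filter -> map. Python's builtins nest
# inside-out instead, so the code reads right-to-left:
result = list(map(lambda x: x * 2, filter(lambda x: x % 2 == 0, nums)))
# result == [4, 8]

# The comprehension restores left-to-right reading, which is the usual
# answer to this complaint:
same = [x * 2 for x in nums if x % 2 == 0]
```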

------
catnaroek
List comprehensions are just syntactic sugar over some common higher-order
functions. Are we really getting excited over syntactic sugar? How about
making Python's higher-order functions not suck instead?

------
matt_wulfeck
I don't want to be a jedi Python programmer and exploit every neat trick of
the language. I want to be a very good programmer and write code that's easy
for others to read, debug, and maintain.

~~~
santaclaus
I wouldn't call list comprehensions a Jedi feature of Python -- they are
pretty darn idiomatic and common.

------
jonesb6
Yes, let's make the reputation of Python more cryptic and culty. Because
cryptic and culty things are better, right?

Han Solo said it best "Hokey religions and ancient weapons are no match for a
good blaster at your side, kid."

Let's keep it explicit, alright? It's better than implicit.

Edit: read the article; it makes Python neither cryptic nor culty. The Jedi
thing is just a cool SW reference. That said, list comprehensions are a little
cryptic to me since I haven't written Python in a while. Personally, I think
an important attribute of elegance in programming is how little you need to
read the docs to understand something, and the more one-liners we write, the
more often we'll have to look at the documentation before reading the code
(not necessarily a bad thing, but sometimes a time sink, and sometimes people
won't look up the docs when they should!).

------
vadim41
Cute one-liners that are close to unreadable and definitely unmaintainable.

If I see this kind of "cute" code in code reviews, there is some serious
scolding to be done.

~~~
spamizbad
Eh, while comprehensions with multiple for-ins push the limits of good taste,
I don't think:

    
    
        planets_set = {
            planet for episode in episodes.values() 
            for planet in episode['planets']
        }
    

is less maintainable in a Python shop than say...

    
    
        planets = set()
    
        for episode in episodes.values():
            planets.update(episode['planets'])
    

Although the latter will likely make perfect sense to most non-Python
developers, the former is faster, has a smaller memory footprint, and might
be preferred when dealing with larger or more irregular data sets.

The former also has the advantage of not leaking the "episode" variable into
the function/method scope, which could introduce a subtle bug if that variable
gets conditionally reused. So while it's harder to understand for a less-
experienced python developer, the set comprehension solution is inherently
safer due to python's design.
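A sketch of the leak being described (illustrative data):

```python
episodes = {'IV': {'planets': ['Tatooine']}, 'V': {'planets': ['Hoth']}}

for episode in episodes.values():
    pass
# 'episode' is still bound here: a plain for loop leaks its loop variable
# into the enclosing scope, where it could be reused by accident.

planets = {p for ep in episodes.values() for p in ep['planets']}
# 'ep' and 'p' are NOT bound out here in Python 3: the comprehension runs
# in its own scope, so nothing leaks.
```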

~~~
AnkhMorporkian
Additionally, generator expressions can be far more memory-efficient than a
simple for loop that builds a list. If you had a billion star coordinates
being read from some file, you'd never be able to load them all into memory.
So, instead of manually writing a generator function, you could just do

    
    
        coordinates = ((star.x, star.y, star.z) for star in star_map)

