
A look at some of Python's useful itertools - naiquevin
http://naiquevin.github.io/a-look-at-some-of-pythons-useful-itertools.html
======
wting
My intention is not to be snarky, but people post all the time about
discovering the itertools or collections library. I notice it's a common gap
in newer Python programmers.

Save yourself time and effort down the road and read through both libraries'
documentation; it's well worth the effort:

<http://docs.python.org/3.3/library/itertools.html>

<http://docs.python.org/3.3/library/collections.html>

I tend to use defaultdict, deque (thread-safe appends and pops), namedtuple,
imap, izip, and dropwhile/takewhile. In Python 3, the built-in map and zip are
lazy, like imap and izip were in Python 2, and imap and izip have been removed
from itertools.
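
For readers who haven't met these tools yet, here is a minimal Python 3 sketch of a few of them. The sample data and the names `Point`, `groups`, `recent`, and `readings` are made up for illustration and are not from the article.

```python
from collections import defaultdict, deque, namedtuple
from itertools import dropwhile, takewhile

# namedtuple: a lightweight record type with named fields.
Point = namedtuple("Point", ["x", "y"])
origin = Point(x=0, y=0)

# defaultdict: missing keys get a default value (here, an empty list).
groups = defaultdict(list)
for word in ["apple", "avocado", "banana"]:
    groups[word[0]].append(word)

# deque with maxlen: keeps only the most recent N items.
recent = deque(maxlen=3)
for n in range(5):
    recent.append(n)

# takewhile/dropwhile: split a sequence at the first item failing a test.
readings = [1, 2, 5, 1, 7]
head = list(takewhile(lambda n: n < 5, readings))
tail = list(dropwhile(lambda n: n < 5, readings))
```

Here `head` holds the leading items below 5 and `tail` holds everything from the first item that is not below 5 onward, including the later `1`.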

I blame Haskell for all the lazy evaluation influence. :P

~~~
pjmlp
>My intention is not to be snarky, but people post all the time about
discovering the itertools or collections library. I notice it's a common gap
in newer Python programmers.

Not only in Python, but programming languages in general.

I still find people writing Java or .NET code who aren't aware of all the nice
classes that are part of the runtime and end up creating their own half-baked
solutions to their problems.

Nowadays developers seem to code without reading.

~~~
Silhouette
_Nowadays developers seem to code without reading._

When your standard library documentation is so vast that it would take weeks
to read and understand it all, and you'd never remember most of it anyway
without context and experience using it, I don't think "coding without
reading" is really a fair complaint.

We as an industry need to get better at documentation, and in particular about
separating tutorial/overview documentation that presents a map and summary of
what's available from reference documentation, or we're going to keep
reinventing wheels like this.

Python is a particularly unfortunate example, because while its documentation
is vast, it has very little tutorial/overview material beyond the very basics.
For example, given that a substantial proportion of Python's standard library
actually doesn't work very well in practice, it would be helpful to have a
deeper tutorial/map document somewhere that introduced the various areas of
the standard library and that also promoted the good ones and suggested
popular alternatives for the not so good ones where they exist.

~~~
Too
Documentation discoverability is one problem. Willingness to learn and trust
is another one. People simply want to use things that they themselves have
proven to work before.

As an example, an old colleague wanted to dump some data from Python to a CSV
file and did this by for-looping through each row and each item and
concatenating each cell and a semicolon to a string. Even after I pointed out
to him that Python already has a built-in csv writer that handles all the
issues of escaping etc., he didn't want to use it because he didn't know what
it did and didn't want to learn anything new. His version didn't even do
escaping inside the for-loop, and he didn't see the problem with that. To him
the for-loop gave exactly the same result, didn't require any learning, and
was thus better, and why change something that works... My last suggestion was
to at least use ";".join(...), but that was also a bit too magic, so he stuck
to his well-known for-loop.
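
To make the escaping problem concrete, here is a small Python 3 sketch comparing a hand-rolled `";".join(...)` with the standard `csv` module. The sample rows are invented for illustration; the point is only that a cell containing the delimiter breaks the naive version when the data is read back.

```python
import csv
import io

# Hypothetical data: one cell contains the delimiter, another contains quotes.
rows = [["id", "note"], [1, "contains; a semicolon"], [2, 'has "quotes"']]

# Naive approach: join cells with semicolons, no escaping at all.
naive = "\n".join(";".join(str(cell) for cell in row) for row in rows)
naive_parsed = list(csv.reader(io.StringIO(naive), delimiter=";"))

# csv.writer quotes cells that contain the delimiter or the quote character.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=";", lineterminator="\n")
writer.writerows(rows)
escaped = buffer.getvalue()
parsed = list(csv.reader(io.StringIO(escaped), delimiter=";"))
```

Reading the naive output back splits the second row into three fields instead of two, while the `csv.writer` output round-trips with the cell contents intact (as strings).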

Usually standard libraries are quite reliable but in some cases, and
especially if adding third party libraries, bugs and performance issues inside
the library can really give you hell. If the library is supposed to just
perform a simple task maybe you would rather implement it yourself as you then
also have influence to fix those issues yourself later. Experiences like this
can scare you away from even the most reliable libraries in the future.

~~~
Silhouette
_Usually standard libraries are quite reliable but in some cases, and
especially if adding third party libraries, bugs and performance issues inside
the library can really give you hell._

I think part of the problem is that the statement above is maybe not as true
as it used to be.

Let's stick with Python as an example, though it's far from the only culprit
so I hesitate to single it out here. I have a growing list of areas of the
standard library that today I just assume won't work acceptably. I have tried
to use them before, and I have found them to be either bug-ridden or not
robustly portable or so slow as not to be worthwhile or missing enough basic
functionality that you need to add something else anyway or just write
everything from scratch. The everyday stuff in Python is pretty good, the
basic data structures and common supporting functions like itertools, but when
you start getting into the less common areas I have a very low opinion of the
design and quality of the Python standard library, and that opinion is born of
direct personal experience.

On top of the quality and robustness, there's also usability to consider. Even
if some of Python's built-in libraries do work, there might be much neater,
easier ways to achieve the same result that are only a `pip install` away.
Libraries like Kenneth Reitz's Requests come immediately to mind; if I were
teaching a newbie to program Python tomorrow, somehow I doubt urllib[N] would
feature much.

I'm not sure how that hypothetical newbie is supposed to discover these things
today without someone experienced to guide them, though. Whether it's Python
and PyPI or Perl and CPAN or C++ and Boost or whatever other language and
library repository you like, there's a lot of collective wisdom about the
easiest/safest/fastest ways to get things done, but it lives in the combined
experience of veterans rather than in comprehensive tutorials to follow once
you've got the basics down. And that's only when there is already a
recognisable place to look for general use third party libraries, not even
considering all the third party libraries that might be out there but for
whatever reason aren't incorporated into any _de facto_ standard repository to
make discovery (relatively) easy if you at least know what you're looking for.

Is it any wonder that newbies reinvent wheels under these conditions? It seems
almost inevitable to me.

~~~
acjohnson55
I haven't come away with the same impression of the Python standard library.
Besides urllib, what are the biggest offenders in your mind?

~~~
Silhouette
From a few recent projects:

The subprocess system is fairly awful in both usability and portability.

The shutil filesystem tools had bugs and documentation issues the only time I
ever tried to use them.

The various compression libraries had horrible performance problems last time
I tried them; shelling out to various command-line equivalents was around 4-5x
faster.

The command-line parsing tools are OK if you want to write a *nix-style
command line tool, but not quite flexible enough for more advanced/customised
uses.

I have yet to discover any decent GUI library for Python, standard or
otherwise, so I'm not sure whether this one counts.

Logging is flexible but can be awkward to configure, particularly across an
application that wants various logging itself but also uses libraries that
offer to log.
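
One common way to handle the library-versus-application split is for the library to attach only a `NullHandler` and leave all real configuration to the application. The sketch below assumes a hypothetical library logger named `examplelib` and captures output in a `StringIO` purely so the result can be inspected; it is one possible arrangement, not the only one.

```python
import io
import logging

# Library side: get a named logger and attach a do-nothing handler,
# so the library never configures output on its own.
lib_log = logging.getLogger("examplelib")  # hypothetical library name
lib_log.addHandler(logging.NullHandler())

def do_work():
    lib_log.info("working")
    return 42

# Application side: decide where records go and at what level.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(name)s:%(levelname)s:%(message)s"))
root = logging.getLogger()
root.addHandler(handler)
root.setLevel(logging.INFO)

result = do_work()
captured = stream.getvalue()
```

The library's record propagates up to the root logger, where the application's handler formats and writes it.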

~~~
SEJeff
You can build a cli parser exactly like git uses (positional and short/long)
using argparse. What is difficult about that? optparse, perhaps, but if you're
talking about argparse, it seems like you're just whining. The rest of your
comments I (overall) agree with.

~~~
Silhouette
_You can build a cli parser exactly like git uses (positional and short/long)
using argparse._

But what if I want something that _isn't_ like Git? I'm slightly amused that
anyone would suggest Git as some sort of example of a good CLI, but in any
case, not all platforms share the command line conventions of *nix shells.

Suppose I'm running on Windows (where options conventionally start with '/')
and I don't want all the magic that argparse does with initial '-' characters.
If I set prefix_chars to '/', does that also disable the '--' pseudo-argument?
We were originally talking about documentation, and as far as I'm aware, the
documentation for argparse doesn't actually specify this either way.
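
For reference, a minimal Python 3 sketch of the `prefix_chars` setup being described. The program name `tool` and the option and argument names are hypothetical. The sketch only shows that '/'-style options parse; it deliberately does not settle how `--` behaves in this configuration, which is the open question above.

```python
import argparse

# Use '/' instead of '-' as the option prefix (Windows-style switches).
parser = argparse.ArgumentParser(prog="tool", prefix_chars="/")

# dest is given explicitly rather than relying on argparse to infer it
# from the option strings.
parser.add_argument("/v", "/verbose", dest="verbose", action="store_true")
parser.add_argument("path")

args = parser.parse_args(["/v", "report.txt"])
long_args = parser.parse_args(["/verbose", "report.txt"])
```

Note that with this setup any argument starting with '/' is treated as an option, which matters if the program also accepts Unix-style absolute paths.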

Suppose I want to have a set of basic choices, each setting a flag to say it's
there. What if I also want some shortcut choices that represent combinations
of the basic ones and set all of the corresponding flags? As far as I'm aware,
you can't quite do this with any of the standard actions, so you have to start
writing an entire new class to define a custom action instead. At least you
can do that, but what was wrong with accepting a simple function, and where
does anything say how argparse.Action is actually defined and why it's
necessary instead?
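
A rough Python 3 sketch of the kind of custom action this requires. The class name `SetFlags`, its `flags` keyword, and the `--lint`, `--test`, and `--all` options are all hypothetical names chosen for illustration; only `argparse.Action` and its `__call__` protocol come from the standard library.

```python
import argparse

class SetFlags(argparse.Action):
    """Hypothetical shortcut action: sets several boolean flags at once."""

    def __init__(self, option_strings, dest, flags=(), **kwargs):
        # nargs=0 means the option consumes no value from the command line.
        super().__init__(option_strings, dest, nargs=0, **kwargs)
        self.flags = tuple(flags)

    def __call__(self, parser, namespace, values, option_string=None):
        for flag in self.flags:
            setattr(namespace, flag, True)

parser = argparse.ArgumentParser(prog="tool")
parser.add_argument("--lint", action="store_true")
parser.add_argument("--test", action="store_true")
# SUPPRESS keeps a stray 'all' attribute out of the resulting namespace.
parser.add_argument("--all", action=SetFlags, flags=("lint", "test"),
                    default=argparse.SUPPRESS)

combined = parser.parse_args(["--all"])
nothing = parser.parse_args([])
```

It works, but it does take a whole class to express what is conceptually a three-line function, which is the complaint being made.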

Suppose I want to present the same data as the automatic help option, but
reformat it in some completely different way that makes more sense for my
program before it gets printed? There are assorted functions to display or
return formatted help strings, but nothing seems to just give back a neat
bundle of the relevant information for further processing. Collecting the data
and rendering it for output are conflated.
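
To illustrate, here is a small Python 3 sketch of what the public API does offer: `format_help()` returns the already-rendered text, so any further processing starts from a string rather than from structured data. The parser and its single option are hypothetical.

```python
import argparse

parser = argparse.ArgumentParser(prog="tool", description="Example tool.")
parser.add_argument("--verbose", action="store_true", help="print more detail")

# format_help() hands back the rendered help as one string.
help_text = parser.format_help()

# Any custom presentation has to start by picking that string apart again.
help_lines = [line.strip() for line in help_text.splitlines() if line.strip()]
```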

Argparse, like much of the Python standard library, has a lot of power as long
as you want to do things exactly its way, but it's not designed in a way that
is particularly easy to extend. IMHO, a better strategy for designing standard
libraries for languages is to create templates/frameworks/whatever you want to
call them, and then to provide some specific implementations for basic cases.
This way, when inevitably someone needs to go beyond the out-of-the-box
functionality, they can still fit in with established conventions instead of
starting over from scratch, which is generally better both for compatibility
and for minimising the amount of extra logic that must be built on top of the
tried and tested standard library. Of course you do have to be careful not to
go too far and make simple cases look artificially complicated, but no-one
ever said designing good APIs was easy. :-)

------
masklinn

        def flatmap(f, items):
            return itertools.chain(*map(f, items))
    

1\. In Python 2, `map` is eager, which (as with the previous `even` filter)
may lead to unnecessary work if you only need part of the list (or a dead
process if the input is infinite...). `itertools.imap` (or a generator
expression) would be better. This is "fixed" in Python 3, where the `map`
builtin has become lazy and `itertools.imap` has been removed, but the second
problem remains.

2\. The result is eagerly unpacked through `*`. `itertools.chain` also
provides a `from_iterable` class method, introduced in 2.6, which doesn't have
that issue (and can be used to flatten infinite streams).

So `flatmap` would probably be better as:

    
    
        import itertools

        def flatmap(f, items):
            # Python 2: imap and from_iterable are both lazy.
            return itertools.chain.from_iterable(
                itertools.imap(
                    f, items))
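
Since `itertools.imap` no longer exists in Python 3, a rough Python 3 equivalent of the function above could look like the sketch below. The `twice` helper and the use of `count` and `islice` are only there to show that the result stays lazy on an infinite input.

```python
from itertools import chain, count, islice

def flatmap(f, items):
    # In Python 3 the built-in map is lazy, and chain.from_iterable
    # consumes its argument lazily too, so nothing runs until iteration.
    return chain.from_iterable(map(f, items))

def twice(n):
    # Hypothetical mapping function: each item expands to two items.
    return [n, n]

# count(1) is infinite; islice takes just the first six flattened items.
first_six = list(islice(flatmap(twice, count(1)), 6))
```

With the original `chain(*map(f, items))` form, the same call would never return, because `*` has to exhaust the infinite input before `chain` is even called.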

~~~
naiquevin
Thanks for the corrections. I have made an edit (although I'm not sure how
long it will take to clear the github-pages cache).

------
serjeem
I wrote my favorite function ever last semester with itertools! It (roughly)
lazily generates a list of dictionaries that map players to their moves for
all possible moves. It turns out you can do that with a chain of combinations,
two cartesian products, and an imap:
[https://github.com/shargoj/acquire/blob/master/gametree.py#L...](https://github.com/shargoj/acquire/blob/master/gametree.py#L72)

------
davvolun
On the other hand, I suspect some early programmers might get ahold of this
and perform a lot of premature optimizations. A piece of code that runs 20
loops instead of 8 once every couple of hours probably doesn't need to be
optimized. A piece of code that does two checks when one would suffice that
runs 1000 times every second might need optimization. Profile first, then
optimize.

