
The Python Standard Library - Where Modules Go To Die - b14ck
http://www.leancrew.com/all-this/
======
nkoren
I'm not really a Python programmer, but a hacker who occasionally has cause to
pick up Python scripts and do stuff with them. Perhaps I've been unlucky, but
every time I've done this, it's turned into a profoundly frustrating exercise.
There have always been dependencies outside the standard library, and those
have had dependencies -- which, more often than not, are incompatible with
whatever version of Python my environment is set up for. I've frequently run
across scripts with dependencies that somehow only execute in _mutually
incompatible versions of Python_ , which always makes for an exceedingly
aggravating day of programming.

As much as people love to bash PHP -- and I agree that it's pretty awful as a
_language_ -- its standard library is so comprehensive, backwards-compatible,
and superbly-documented that I have _never_ had a comparably aggravating
experience with it. The same is true of Javascript: a language with warts, but
whenever I try something, it Just Works.

Like I say, perhaps I've just been unlucky, but my distinct impression of
Python has been that it's a beautiful language surrounded by a particularly
problematic ecosystem of incompatible libraries and sparse documentation. I
suspect that the Python community would benefit from paying less attention to
the purity of the language, and a lot more attention to the quality of
everything surrounding it.

~~~
heretohelp
I think you've really just been unlucky.

I've had the same experience before, but chiefly with Ruby.

Interestingly, my problems with this in both Python and Ruby have evaporated
once I got in the habit of using virtualenv/rb-env/rvm for my development
environment.

Python is about the cleanest/nicest experience I have in _any_ language, for
the record. Only language that comes close is Clojure.

Leiningen is...legendary.

~~~
xaa
I would argue that Python's approach is FAR better than Clojure/Leiningen's
"no batteries included" approach.

Suppose you want to do a very common task like parse some XML. In Clojure, the
workflow is:

    
    
      1. Go to GitHub or Clojars, find the latest version number of clojure.data.xml
      2. Add this version number to your project.clj
      3. Run lein deps and restart the REPL
      4. Re-acquire whatever REPL data you had
    

In Python, it's:

    
    
      1. import xml.{sax,dom,etree}
    

And, paradoxically, the availability of all these different versions of
libraries in Clojure leads to MORE conflicts between libraries than would
otherwise be the case, not less. In Python, you may not agree that, say, the
"os" or "subprocess" modules are optimal -- but by golly, they're consistent.

~~~
dpritchett
Thanks to pip I often don't even bother with the Python stdlib for crusty
things like one-off web scrapes or XML parsing. Here's a recent example where
I wanted to read some attributes out of some remote XML and did it with
requests and PyQuery rather than urllib and xml:

    
    
        import requests
        import pyquery

        _domains_text = requests.get(API_URL + "/domainlist.xml").content
        _domains_db = pyquery.PyQuery(_domains_text)

        # take the first attribute value of each <domain> element
        DOMAINS = [d.values()[0] for d in _domains_db('domain')]

~~~
RegEx
I like to set

    
    
        jQuery = pyquery.PyQuery(someHTMLDocumentString)
    

So I can use jQuery like I'm used to. At this point, you can do

    
    
        links = jQuery('a')
    

or whatever.

------
atdt
The fact that the standard library is well-maintained, carefully debugged, and
backward-compatible is a far stronger indicator of Python's awesomeness than
the existence of shiny, new libraries. Hackers naturally gravitate toward
high-visibility projects with brave horizons and bold scopes. By contrast, it
is incredibly hard to find the motivation to update, for the umpteenth time, a
warty API -- and that's precisely the reason why contributions of the latter
sort are the truer test of the vitality of a language's ecosystem.

~~~
JoachimSchipper
It _is_ stable, but there are some really nasty warts.

To pick just one that bites Python programmers _all the time_ : by default,
the ssl library does not validate the server certificate at all. Not
validating the certificate makes SSL/TLS almost useless. But this is still the
default (see <http://docs.python.org/dev/library/ssl.html#socket-creation>,
"CERT_NONE"), because the standard library is "stable".

~~~
jordanb
On the contrary, the vast majority of the time, when people use SSL, they're
using it because they want encryption rather than identification.

The certificate system surrounding SSL is a _complete_ mess. It does virtually
nothing other than trigger false positives for people who haven't paid the
appropriate "security partner."

The _very_ rare person who is actually using SSL for identification rather
than just to establish an encrypted TCP connection, and therefore cares about
certificates, can change the default.

PS: I know the standard response to this, that encryption without
identification is useless, because without identification your counter-party
might be Eve. In reality, in the real world, that doesn't happen. MITM attacks
are extremely rare. And the real Eves on the net (phishers) can easily obtain
signed certificates that will fool pretty much any end user.

~~~
jbri
If MitM attacks are so rare, why bother encrypting your traffic in the first
place? Packet-snooping attacks are also "extremely rare" by most metrics, so
why protect against one but not the other?

Either go all the way on security, or be obvious about not having any.
_Appearing_ secure when in actuality you're not is the worst option.

~~~
randallsquared
_Packet-snooping attacks are also "extremely rare" by most metrics [...]_

Really? NSA boxes in AT&T (and presumably other) switching stations suggest
that for US traffic it's extremely common.

------
IgorPartola
Agreed with the OP. The following is a shameless plug:

Python's ConfigParser module is a pain to use. It provides no validation, only
supports a limited number of types of data you can retrieve, etc. Similarly,
getopt vs optparse vs argparse is a mess. getopt is universal: not only is it
going to be in all versions of Python, but it is also the same library
available in virtually every other language. The problem with it is that it is
not declarative, so you will typically see a giant if/elif statement that goes
with it. argparse/optparse are better, but aren't universal even between
versions of Python, though argparse has been backported and is available via
pypi.
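
For contrast, the declarative argparse style looks like this (hypothetical
options, for illustration only):

```python
import argparse

# Declarative option definitions replace getopt's giant if/elif chain.
parser = argparse.ArgumentParser(description="example tool")
parser.add_argument("--host", default="localhost")
parser.add_argument("--port", type=int, default=8080)

args = parser.parse_args(["--port", "9000"])
print(args.host, args.port)  # prints: localhost 9000
```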

To unify all this into one convenient module, I ended up writing
<http://ipartola.github.com/groper/>. groper lets you specify your parameters
declaratively, and if you specify defaults, use them right away without having
to create/modify a config file. It automatically figures out the priority of
arguments: cmd > config > defaults. It also has some niceties such as the
ability to automatically generate usage strings, give the user intelligent
error messages, generate sample config files, etc.
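
The priority rule (cmd > config > defaults) can be sketched with plain dict
merges, later sources winning; the option names here are hypothetical, not
groper's actual API:

```python
# Later sources override earlier ones: defaults < config file < command line.
defaults = {"host": "localhost", "port": "8080"}
config_file = {"port": "9000"}
cmdline = {"host": "example.com"}

options = {**defaults, **config_file, **cmdline}
print(options)  # prints: {'host': 'example.com', 'port': '9000'}
```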

~~~
slurgfest
I don't understand why I shouldn't be using argparse. Just using argparse
means no mess of 'getopt vs optparse vs argparse', because I am not using all
those other libraries. I don't see anything seriously wrong with argparse. How
does it help me to use a third-party module rather than argparse?

~~~
IgorPartola
Using argparse is probably the safest approach. However, argparse does not
work with config files; groper does. So if you have more than a half-dozen
options, you should use groper (or something similar).

------
bunnyhero
The permalink for the article is [http://www.leancrew.com/all-
this/2012/04/where-modules-go-to...](http://www.leancrew.com/all-
this/2012/04/where-modules-go-to-die/) (the posted link is actually the home
page of the blog).

------
agentultra
I've heard core Python developers tell people not to worry about getting a
module into the stdlib. The problem being that once the module is there it
won't be able to change much. APIs have the exact same problem. If you change
it, you're changing other peoples' software. Tight coupling.

Is it a terrible way to write software? Maybe... but perhaps that's a
different discussion.

I think the requests library is amazing. It has a much simpler API than
urllib/urllib2. Does it need to replace those modules in the stdlib? I hope
not!

There are only three reasons I would write a module/package that depended
solely on stdlib:

    
    
      1. The module/package would be distributed primarily through package management systems.
      2. The installation of my module needs to avoid depending on anything else outside of a base Python installation.
      3. The module or package will need to be supported for a long time and will likely not be updated frequently.
    

The first case is because you can't control what versions of third-party
libraries the package manager will make available. Some might run your
setuptools script while others may not. It's just easier to live with the
cruft/warts of the stdlib and be sure that they'll always be there.

The second case covers a very specific situation. Modules and libraries written
with this constraint are typically targeting one of two different kinds of
developers. The first are the beginners who may not know about development
environments and versioning. The other are experienced developers who want a
minimalist script for their little one-off utility. Both should require zero
dependency installation if possible.

The final case is harder to define up front. If you're writing something that
you expect to run for a long time and receive little maintenance (e.g. cron
scripts, tools, etc.), then you don't want to deal with API updates breaking
your code. Fire and forget is what a long-term stable API gets you.

------
chimeracoder
> An overstatement, certainly, but with more than a germ of truth. Once a
> library is enshrined in the standard set, it can’t change radically because
> too many programs rely on it—and its bugs, idiosyncrasies, and
> complications—remaining stable.

That's a problem inherent in the standardization process, though - it's all
but contradictory to have something be both 'standard' and 'continuously
improving'.

Once something enters the standard, can anyone propose a better way of
removing cruft than constantly deprecating everything, which renders the
concept of a 'standard' somewhat meaningless?

~~~
lloeki
Things come and go, and we have seen a number of deprecations in python
already. urllib predates urllib2, while subprocess deprecates a number of
things itself that really came from C. getopt was a port of the eponymous C
library, which optparse meant to replace, which was itself deprecated on favor
of argparse.

I would really not be surprised to see envoy, requests, and so on end up in
the standard lib at some point.

------
norswap
The thing with the Python standard library is that it is crappily documented,
IMHO. I often can't make heads or tails of it, while I have a much easier time
with any other language (you name it: Java, Ruby, PHP, C, Scala, Lisp, ...).

~~~
aslewofmice
I felt that way about some of the standard documentation but was relieved to
find: <http://www.doughellmann.com/PyMOTW/>

I also highly recommend checking out Doug Hellmann's book 'The Python Standard
Library by Example'. He presents every (or almost every) standard library
module with simple explanations and plenty of examples.

~~~
zeeg
I've never seen this before, but that is an AMAZING improvement on the
standard docs.

------
eliben
Yes, for a language as widely deployed and used as Python, retaining backwards
compatibility and stability is more important than adding new and shiny tools
to the stdlib at a faster pace. Users rely on the fact that a module in stdlib
will remain there and will remain stable for a long time. More modules means
more maintainers, and Python is an open-source project developed by
volunteers. It's that simple.

I'm not sure what solution this article proposes. The tradeoff between
"coolness" and "stability" is inherently difficult, and I'm sure Python is not
the only language "suffering" from it.

After all, it's quite easy to install a new Python module, and not much harder
to distribute it with your application (for web apps it's even easier), so
what is the problem?

------
EvilTerran
It's funny, I've had the opposite problem. I was trying to write an IRC bot in
Python, noted there didn't seem to be a standard library module for the IRC
protocol, and so found myself looking at this:

[http://pypi.python.org/pypi?%3Aaction=search&term=IRC](http://pypi.python.org/pypi?%3Aaction=search&term=IRC)

That's 400+ results - at least 20 of which are actually IRC protocol modules.
There's no way of telling how mature each one actually is 'til you download
it. It turned out the first three I tried were undocumented, buggy,
incomplete, or otherwise no good.

So I gave up on PyPI and hacked it as an xchat plugin instead.

----------------

Perhaps the way forward would be styling your package repo after, say,
addons.mozilla.org -- add just enough community functionality (as in
ratings/reviews/"times downloaded" counters/etc) to allow the occasional gems
to rise to the top of the muck. Once one solution for a given problem has been
established as the best (well, most popular), that'll get more eyeballs on its
internals as well, and it'll only increase its lead until it's de facto
standard -- but the possibility is still there for a newcomer to dethrone it
if it's genuinely better. And meanwhile, both can exist side-by-side without
causing ugly compatibility issues.

~~~
slurgfest
I believe that PyPI used to have some kind of popularity contest functionality
that got killed.

I have to say I'm not sure that selecting the package you want to use is
really the problem which PyPI needs to solve. It isn't the app store. That
said, PyPI does provide a 'weight' in searches, which seems to track with
popularity and freshness somehow.

~~~
EvilTerran
The title text on "weight" says "Occurrence of search term weighted by field
(name, summary, keywords, description, author, maintainer)". So, not
popularity/freshness, just a rough metric for how well it matches your search.

Indeed, PyPI might not be the right place for a community rating system --
perhaps a site could be built on top of it to provide that sort of
functionality.

------
zeeg
This happens in every language. I don't think it's that big of a deal.

In all honesty, you could continue to maintain a package outside of stdlib,
and just require a newer version (which gets installed via the standard
packaging tools). This type of behavior isn't well defined in Python, but it's
not unrealistic to think it could happen.

~~~
true_religion
How is it not well defined in Python?

~~~
zeeg
Well it would work just fine, but you'd always end up requiring the external
dependency even if you didn't need to.

For example, let's say there was a new urllib released (it's still called
urllib). It's now version 2.0, but the stdlib version is 1.0.

If your package said "I need urllib==1.0", it would have no way of
understanding that the version was already included within the standard
library.

That said, it _would_ download the correct package (assuming it existed) and
work just fine.

~~~
anthonyb
> it would have no way of understanding that the version was already
> included within the standard library

Other than by introspecting which packages are installed, that is. Most of
them will have a VERSION, __version__ or _version attribute which tells you.
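
A sketch of that kind of introspection, trying package metadata first and
then the conventional attributes (importlib.metadata is a much later
addition; the thread-era tools were pkg_resources and the pkginfo package):

```python
import importlib

def installed_version(name):
    """Best-effort version lookup: package metadata first, then the
    conventional module-level version attributes."""
    try:
        from importlib import metadata  # Python 3.8+
        return metadata.version(name)
    except Exception:
        mod = importlib.import_module(name)
        for attr in ("__version__", "VERSION", "_version"):
            if hasattr(mod, attr):
                return str(getattr(mod, attr))
        return None
```

No expected output is shown because the result depends on what is installed
in the environment.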

~~~
jmoiron
This is available via the PKG-INFO metadata in all installed packages; IIRC a
version is required for distutils, and there is PEP 386 for the version number
format, so it should be possible to determine the version number, as well as
compare versions, for all well-behaved packages. There is even a package
called pkginfo which will find and parse PKG-INFO for an installed package:

<http://pypi.python.org/pypi/pkginfo>

~~~
anthonyb
GP was talking about built-in modules in the standard library. I don't think
they use distutils, but many of them still have some sort of version number.

------
100k
This has kind of happened in Ruby, too.

Fortunately, Ruby gems are super easy to install and the standard library got
some much-needed spring cleaning in 1.9.

Python could use the same. There have been many times where I've wanted to do
some simple task that would be made easier with an external library (like
Requests) but I'm not going to bother dealing with the Python module install
pain for a one-off task.

~~~
atdt
> the Python module install pain for a one-off task.

"pip install requests" ?

~~~
RegEx
Unfortunately, it seems the official documentation on installing Python
modules[0] makes absolutely no mention of pip or even easy_install. Seems like
something that should be there, right?

[0]: <http://docs.python.org/install/>

~~~
anonymoushn
I'm glad that there's no mention of easy_install. I have no idea why someone
would want to use a package manager that can't uninstall things.

~~~
RegEx
Yeah, good point. That puzzled me as well before I learned of pip.

------
makecheck
Every language's standard library needs a "current best practices" concept,
even if it's just a well-maintained document and not something structural like
a special namespace.

I think the Python "decorator" concept goes a long way toward cleaning up
code. Basically you can add a decorator to a routine that you've deprecated so
that it will complain if it's actually used (you can even include advice on
what would be a good replacement call).
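
A minimal sketch of such a decorator (the names are made up; warnings.warn
with DeprecationWarning is the stdlib mechanism being described):

```python
import functools
import warnings

def deprecated(replacement=None):
    """Decorator: emit a DeprecationWarning whenever the function is called."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            msg = "%s is deprecated" % func.__name__
            if replacement:
                msg += "; use %s instead" % replacement
            warnings.warn(msg, DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)
        return wrapper
    return decorator

# Hypothetical usage: old_fetch still works, but complains when called.
@deprecated(replacement="new_fetch")
def old_fetch():
    return 42
```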

As far as cleaning up what's installed as standard, it's not really practical
to remove anything (the fact that it stays is one of the attractive things
about Python in old code bases). What you can do though is define a preferred
namespace, e.g. "preferred"; this would physically contain only those
libraries that are recommended, and perhaps even forked copies of modules that
only contain the _functions_ that should be used. This gives programs the
option to explicitly import from "preferred" and request purity over long-term
stability.

------
daxelrod
Permalink to this post: [http://www.leancrew.com/all-this/2012/04/where-
modules-go-to...](http://www.leancrew.com/all-this/2012/04/where-modules-go-
to-die/)

(The current article link goes to the front page of the author's blog.)

