
Ask HN: Good Python codebases to read? - nekopa
Hi all,

I am currently going through Learn Python the Hard Way, and am on the review section where I have to read through source code.

Can anyone recommend some good open source Python software I can look at? I am specifically looking to see ones that employ idiomatic Python, and maybe see how they approach testing (I am not completely new to programming, just rusty after being out of the field for a long time).

Extra bonus points if the software is something you use regularly.

Cheers!
======
aaronjgreenberg
Jumping on the Kenneth Reitz train, you might check out The Hitchhiker's Guide
to Python: [http://docs.python-guide.org/en/latest/](http://docs.python-guide.org/en/latest/)

He recommends the following Python projects for reading:

* Howdoi ([https://github.com/gleitz/howdoi](https://github.com/gleitz/howdoi))

* Flask ([https://github.com/mitsuhiko/flask](https://github.com/mitsuhiko/flask))

* Werkzeug ([https://github.com/mitsuhiko/werkzeug](https://github.com/mitsuhiko/werkzeug))

* Requests ([https://github.com/kennethreitz/requests](https://github.com/kennethreitz/requests))

* Tablib ([https://github.com/kennethreitz/tablib](https://github.com/kennethreitz/tablib))

Hope that helps---good luck!

~~~
tallerholler
I would add the Django project to that list as it's a very large, mature, and
successful open source Python project -
[https://github.com/django/django](https://github.com/django/django)

~~~
rectangletangle
Django's source is very high quality. Though due to the large scope of the
project, there are necessarily many layers of indirection, which may be a bit
daunting for someone who is just starting out.

However, reading the less abstract parts may help. For instance, the paginator
is pretty self-contained.
[https://github.com/django/django/blob/master/django/core/pag...](https://github.com/django/django/blob/master/django/core/paginator.py)

~~~
rtpg
I really like the API of the framework, but unfortunately some of the core
elements suffer from being extremely stateful code.

The "self.thing = bar" in one function that only gets used in some other
function (or even worse, something only used in a companion class) pattern is
super prevalent.

Might just be me, but I think a lot of the older code suffers from massive
locality problems that make debugging framework bugs super tricky.

~~~
rectangletangle
Honestly, I have to agree with you to an extent.

I think a lot of the issues involving overuse of state are primarily related
to using OO when a pure function would suffice. It's just too tempting to
dynamically assign attributes to mutable instances.
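
A minimal sketch of the pattern being described, using hypothetical names (`StatefulReport`, `summarize`) that are not taken from Django: the first version stashes an intermediate value on `self` in one method and reads it in another, while the second passes everything explicitly.

```python
class StatefulReport:
    """Hypothetical example: state is set in one method and consumed in another."""

    def __init__(self, rows):
        self.rows = rows

    def load(self):
        # The total is stashed on the instance here...
        self.total = sum(self.rows)

    def render(self):
        # ...and silently required here. Calling render() before load()
        # raises AttributeError, far away from the real cause.
        return "total=%d" % self.total


def summarize(rows):
    """Pure-function version: inputs and outputs are explicit."""
    return "total=%d" % sum(rows)


report = StatefulReport([1, 2, 3])
report.load()
print(report.render())
print(summarize([1, 2, 3]))
```

The stateful version only works if the methods are called in the right order, which is the locality problem mentioned above; the pure function has no such hidden ordering.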

To be fair, when Django is used properly it isn't usually an issue. Besides
the queryset/model API is extremely nice, and at this point very polished.

------
jonjacky
Peter Norvig's examples. They are quite short and include much explanation in
addition to code. They also include tests and benchmarking code.

[http://norvig.com/lispy.html](http://norvig.com/lispy.html)
[http://norvig.com/lispy2.html](http://norvig.com/lispy2.html) (Lisp
interpreter)

[http://www.norvig.com/spell-correct.html](http://www.norvig.com/spell-correct.html) (Spelling corrector)

[http://norvig.com/sudoku.html](http://norvig.com/sudoku.html) (Sudoku solver)

Also, his online course Design of Computer Programs includes many short,
well-explained Python examples:

[https://www.udacity.com/wiki/cs212](https://www.udacity.com/wiki/cs212)

~~~
aaronchall
His coding style is not common to most writers of Python.

He's using overly terse, poorly descriptive variable names, not using
multiline strings for docstrings, not indenting where he should, and one-
lining if/elif statements and function definitions. This style does not
contribute to readability.

This is not how I would want someone just learning Python to learn it.

~~~
merlincorey
Maybe it's because I am also something of a lisper (like Norvig), but I don't
see anything wrong with inline ifs (after all, that is how if works in lisp,
with returns for a conditional, like the ternary operator) or lambda
functions. In fact, I find that improves readability dramatically because it
more declaratively says what you are trying to accomplish in many cases.

For example:

    
    
      absolute_path = lambda path: path if path.startswith('/') else '/' + path
    

To me this is perfectly clearly a simple function whose only purpose is to
prepend forward slashes to unix-style paths. Is the following _really_ so much
more readable?

    
    
      def absolute_path(path):
          if not path.startswith('/'):
              path = '/' + path
          return path
    

To my eyes and mind, the second example is not any more readable at the
expense of several lines of code.

~~~
aaronchall
The former is absolutely identical (semantically) to the latter, with two
exceptions: 1) the former function does not know its own name and 2) the
latter function can (and should) be documented with a docstring. I find the
latter eminently more readable, and I work daily in a code base under
development by 3000 Python developers for over 5 years.

Considering that the creator of the Python language considered getting rid of
lambdas because they are essentially limited functions and thus violate
Python's "one obvious way to do it" philosophy, I'd rather those _learning_
Python be shown the latter, rather than the former.

From a development and version control perspective, as soon as the lambda
function requires more than a simple expression, i.e. a compound statement
([https://docs.python.org/2/reference/compound_stmts.html](https://docs.python.org/2/reference/compound_stmts.html)),
you have to trash the whole line, instead of adding perhaps a single extra
line of content.

~~~
merlincorey
> The former is absolutely identical to the latter

Oh, really? Let's compare:

    
    
      >>> absolute_0 = lambda path: path if path.startswith('/') else '/' + path
      >>> def absolute_1(path):
      ...     '''Return the absolute unix path from a given path name'''
      ...     if not path.startswith('/'):
      ...         path = '/' + path
      ...     return path
      >>> import dis
      >>> dis.dis(absolute_0)
        2           0 LOAD_FAST                0 (path)
                    3 LOAD_ATTR                0 (startswith)
                    6 LOAD_CONST               1 ('/')
                    9 CALL_FUNCTION            1
                   12 POP_JUMP_IF_FALSE       19
                   15 LOAD_FAST                0 (path)
                   18 RETURN_VALUE        
              >>   19 LOAD_CONST               1 ('/')
                   22 LOAD_FAST                0 (path)
                   25 BINARY_ADD          
                   26 RETURN_VALUE        
      >>> dis.dis(absolute_1)
        4           0 LOAD_FAST                0 (path)
                    3 LOAD_ATTR                0 (startswith)
                    6 LOAD_CONST               1 ('/')
                    9 CALL_FUNCTION            1
                   12 POP_JUMP_IF_TRUE        28
        5          15 LOAD_CONST               1 ('/')
                   18 LOAD_FAST                0 (path)
                   21 BINARY_ADD          
                   22 STORE_FAST               0 (path)
                   25 JUMP_FORWARD             0 (to 28)
        6     >>   28 LOAD_FAST                0 (path)
                   31 RETURN_VALUE        
    

These functions are actually not identical in their computation - only their
result.

> 1) the former function does not know its own name

If you think that is really important (hint: it's not [from a lisper
perspective, anyway]), Python thankfully allows you to do this:

    
    
      >>> absolute_0.__name__
      '<lambda>'
      >>> absolute_0.__name__ = '<lambda "absolute_0">'
      >>> absolute_0.__name__
      '<lambda "absolute_0">'
    

> 2) the latter function can be documented with a docstring
    
    
      >>> absolute_0.__doc__
      >>> absolute_0.__doc__ = 'Return the absolute unix path from a given path name'
      >>> help(absolute_0)
      Help on function <lambda "absolute_0">:
      <lambda "absolute_0">(path)
        Return the absolute unix path from a given path name
    

Of course the only reason you can't put docstrings on a lambda function in
python is because the forced indentation of code and implicit return with no
indented block available is what Guido went with for Lambda.

> Considering that the creator of the Python language considered getting rid
> of lambdas

Guido is not a proponent of functional programming in general and claims that
map, reduce, and filter are so much harder to understand than list
comprehensions (which implement some common map, reduce, and filter
operations with special optimized syntax) that he tried to get them removed
from the language too. Thankfully for us users of the language, this view did
not win through and we can still use map, reduce, and filter in Python, if we
choose.
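
For readers comparing the two styles, here is a small sketch of the same computation written both ways. It assumes Python 3, where `reduce` lives in `functools` and `map`/`filter` return lazy iterators; the rest of this thread uses Python 2.

```python
from functools import reduce  # a builtin in Python 2, moved here in Python 3

numbers = range(10)

# Functional style: square the even numbers, then sum them.
squares_fn = list(map(lambda x: x * x, filter(lambda x: x % 2 == 0, numbers)))
total_fn = reduce(lambda acc, x: acc + x, squares_fn, 0)

# Comprehension style: the same two steps.
squares_lc = [x * x for x in numbers if x % 2 == 0]
total_lc = sum(squares_lc)

print(squares_fn == squares_lc, total_fn == total_lc)
```

Both versions produce the same list and the same total; which one reads better is exactly what is being argued about here.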

~~~
aaronchall
Spare us the half-baked hackery. List comprehensions and generator expressions
_have_ replaced all need for map, filter, and lambdas, and are far more
readable. For someone _new_ to Python, halfway through LPTHW, they don't need
those things.

Hey, maybe you can help me decipher this, I've always wondered exactly what's
going on here: [https://docs.python.org/2/faq/programming.html#is-it-possibl...](https://docs.python.org/2/faq/programming.html#is-it-possible-to-write-obfuscated-one-liners-in-python)

    
    
      # Mandelbrot set
      print (lambda Ru,Ro,Iu,Io,IM,Sx,Sy:reduce(lambda x,y:x+y,map(lambda y,
      Iu=Iu,Io=Io,Ru=Ru,Ro=Ro,Sy=Sy,L=lambda yc,Iu=Iu,Io=Io,Ru=Ru,Ro=Ro,i=IM,
      Sx=Sx,Sy=Sy:reduce(lambda x,y:x+y,map(lambda x,xc=Ru,yc=yc,Ru=Ru,Ro=Ro,
      i=i,Sx=Sx,F=lambda xc,yc,x,y,k,f=lambda xc,yc,x,y,k,f:(k<=0)or (x*x+y*y
      >=4.0) or 1+f(xc,yc,x*x-y*y+xc,2.0*x*y+yc,k-1,f):f(xc,yc,x,y,k,f):chr(
      64+F(Ru+x*(Ro-Ru)/Sx,yc,0,0,i)),range(Sx))):L(Iu+y*(Io-Iu)/Sy),range(Sy
      ))))(-2.1, 0.7, -1.2, 1.2, 30, 80, 24)
      #    \___ ___/  \___ ___/  |   |   |__ lines on screen
      #        V          V      |   |______ columns on screen
      #        |          |      |__________ maximum of "iterations"
      #        |          |_________________ range on y axis
      #        |____________________________ range on x axis

~~~
merlincorey
> Spare us the half-baked hackery.

Ad hominem? I guess I win.

> List comprehensions and generator expressions have replaced all need for
> map, filter, and lambdas

Please explain how list comprehensions and generators have replaced the need
for lambdas.

    
    
      >>> sorted(((x, -(x**2)) for x in xrange(10) if 0 == x % 2), key=lambda item: item[1])
      [(8, -64), (6, -36), (4, -16), (2, -4), (0, 0)]
    

Your "distaste" of functional programming constructs is right up there with
Guido's.

~~~
aaronchall
Nice lambda. You've defended yourself admirably. Did you know there's an
`operator.itemgetter` function that does that?

So that's:

    
    
      >>> import operator
      >>> sorted(((x, -(x**2)) for x in xrange(10) if 0 == x % 2), key=operator.itemgetter(1))
      [(8, -64), (6, -36), (4, -16), (2, -4), (0, 0)]
    

Do note that good code and code golf are two different things! :D

~~~
merlincorey
> Nice lambda. You've defended yourself admirably.

Thanks!

> Did you know there's an `operator.itemgetter` function that does that?

Yes, I'm quite aware! Are you aware the "useless" functional solution with
lambda is two characters shorter?

    
    
      >>> len('lambda item: item[1]')
      20
      >>> len('operator.itemgetter(1)')
      22
    

Because you're apparently not aware that I was demonstrating a use-case for
lambdas as one-off functions that are passed to other functions (which is an
abstract concept from the particular function used), and you didn't
demonstrate how list comprehensions or generators make them not-needed. Of
course, that's because it was a leading question and the answer is that the
concepts are orthogonal so it cannot be demonstrated.

~~~
ectoplasm
I bet you two would be good friends IRL.

~~~
aaronchall
I'm sure we would. I'll buy the first drink. :D

------
clinth
Requests -
[https://github.com/kennethreitz/requests](https://github.com/kennethreitz/requests).

A lesson in how to make a usable API. The decisions that went into each method
call were fantastic. Great test coverage as well. I use this package in most
of my Python development.

~~~
Myrmornis
requests is very useful, but it always gets mentioned as a good python
codebase and I'm not sure I agree. One example:

The first thing many users will do is

    
    
      requests.get?
    

Which tells them that it takes some kwargs, but doesn't tell them what those
kwargs are. It's easy, especially for a newcomer, to read "optional arguments
that `request` takes" and fail to understand that they should look up the docs
on (not-really-encouraged-as-part-of-public-API) function `request`. That's
pretty bad; those kwargs are important! (The reason is that requests.get is
implemented as a call to request(method, ..., **kwargs), but the user doesn't
care what the implementation-level reason is.)
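
A rough sketch of why the signature ends up uninformative, using hypothetical stand-ins (`request`, `get`, `get_documented`) rather than the real requests source: a wrapper that forwards `**kwargs` hides the accepted keywords from introspection, whereas spelling them out makes them visible.

```python
import inspect


def request(method, url, params=None, timeout=None, headers=None):
    """Hypothetical low-level function that knows the real keyword arguments."""
    return (method, url, params, timeout, headers)


def get(url, **kwargs):
    """Thin wrapper: the accepted keywords are invisible in its signature."""
    return request("GET", url, **kwargs)


def get_documented(url, params=None, timeout=None, headers=None):
    """Same behaviour, but the keywords show up in help() and tab completion."""
    return request("GET", url, params=params, timeout=timeout, headers=headers)


print(inspect.signature(get))             # (url, **kwargs)
print(inspect.signature(get_documented))  # (url, params=None, timeout=None, headers=None)
```

The trade-off is duplication: every wrapper has to repeat the keyword list, so the explicit form costs maintenance effort in exchange for discoverability.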

Beyond that I did look into the codebase once to investigate a possible bug
and there were a few python style things I wanted to fix, but I don't remember
them so this comment probably sounds kind of annoying (it would annoy me if I
were reading it, not writing it...). It didn't strike me as a really clean
codebase. But yes the library is very useful and I'm sure it's a pretty decent
python codebase.

    
    
      >>> requests.get?
      Signature: requests.get(url, params=None, **kwargs)
      Docstring:
      Sends a GET request.
      
      :param url: URL for the new :class:`Request` object.
      :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
      :param \*\*kwargs: Optional arguments that ``request`` takes.
    

Here are all the kwargs the user probably wanted to know but failed to find
out and was forced to either browse online docs or the source code:

    
    
      def request(method, url, **kwargs):
          """Constructs and sends a :class:`Request <Request>`.
          :param method: method for the new :class:`Request` object.
          :param url: URL for the new :class:`Request` object.
          :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
          :param data: (optional) Dictionary, bytes, or file-like object to send in the body of the :class:`Request`.
          :param json: (optional) json data to send in the body of the :class:`Request`.
          :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
          :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
          :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': ('filename', fileobj)}``) for multipart encoding upload.
          :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
          :param timeout: (optional) How long to wait for the server to send data
              before giving up, as a float, or a (`connect timeout, read timeout
              <user/advanced.html#timeouts>`_) tuple.
          :type timeout: float or tuple
          :param allow_redirects: (optional) Boolean. Set to True if POST/PUT/DELETE redirect following is allowed.
          :type allow_redirects: bool
          :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
          :param verify: (optional) if ``True``, the SSL cert will be verified. A CA_BUNDLE path can also be provided.
          :param stream: (optional) if ``False``, the response content will be immediately downloaded.
          :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
          :return: :class:`Response <Response>` object
          :rtype: requests.Response
          """

~~~
jdiez17
Looks like you did the legwork of listing the params already, so you might as
well send a PR with that docstring.

~~~
Myrmornis
OK, maybe I will open an issue. The obvious concern is how to avoid
duplicating the text among the various HTTP verb functions. In theory python
allows the docstring to be manipulated via __doc__. However I don't _think_
there is a precedent for using that mechanism to avoid duplication of
docstring content that is considered good style, but perhaps someone could
correct me if that is wrong.

~~~
merlincorey
You'd probably want to make a metaclass that handled the manipulation of
__doc__ for shared verbs if you didn't want to duplicate the data too much.

~~~
Myrmornis
OK, but in requests they are top-level functions, not methods. Is there
anything wrong with the below? I don't think I've seen it done.

    
    
      __shared_docstring_content = """
      bar
      baz
      """
    
      def f():
          "f docstring"
          pass
    
      f.__doc__ += __shared_docstring_content
    
    
      def g():
          "g docstring"
          pass
    
      g.__doc__ += __shared_docstring_content

~~~
andreasvc
I would do simply """f docstring\n%s""" % _shared_docstring, no need for a
separate concatenation. However, I wonder whether sphinx would handle this.

~~~
shoyer
This sort of thing works fine. We use it for pandas all the time.
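
One way to package this idea is a small decorator. The sketch below uses hypothetical names (`SHARED_DOC`, `append_doc`) and is not taken from requests or pandas. One caveat it illustrates: a docstring has to be a plain string literal, so an expression such as `"""f docstring\n%s""" % _shared_docstring` written as the first statement of a function body is evaluated and thrown away rather than stored in `__doc__`. The formatting has to happen after the function is defined.

```python
SHARED_DOC = """
:param timeout: (optional) seconds to wait before giving up.
:param headers: (optional) dictionary of HTTP headers.
"""


def append_doc(extra):
    """Return a decorator that appends `extra` to the function's docstring."""
    def decorator(func):
        # __doc__ is None when there is no docstring (or under `python -OO`).
        func.__doc__ = (func.__doc__ or "") + extra
        return func
    return decorator


@append_doc(SHARED_DOC)
def get(url, **kwargs):
    """Send a GET request (hypothetical stub)."""


@append_doc(SHARED_DOC)
def post(url, **kwargs):
    """Send a POST request (hypothetical stub)."""


def broken():
    "broken docstring %s" % SHARED_DOC  # an expression, not a docstring


print(get.__doc__)
print(broken.__doc__)  # None
```

The `or ""` guard also covers the case where docstrings are stripped, which a bare `f.__doc__ += ...` would not survive.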

------
jscottmiller
Bottle:
[https://github.com/bottlepy/bottle](https://github.com/bottlepy/bottle)

It's a nice, small, fast web framework. Great for building APIs. Also, it's
a single readable file of ~3k LOC.[1]

[1]
[https://github.com/bottlepy/bottle/blob/master/bottle.py](https://github.com/bottlepy/bottle/blob/master/bottle.py)

------
svieira
Several good ones have already been suggested, but here's a few more:

\- [https://github.com/mahmoud/boltons](https://github.com/mahmoud/boltons) :
utility functions, but well documented

\- [https://github.com/KeepSafe/aiohttp](https://github.com/KeepSafe/aiohttp)
: a Python 3 async HTTP server

\- [https://github.com/telefonicaid/di-py](https://github.com/telefonicaid/di-py)
: a dependency injection framework

\- [https://github.com/moggers87/salmon](https://github.com/moggers87/salmon)
: a fork of Lamson (which was written by Zed)

Python's internals are pretty darn open, so here's a few suggestions that push
the boundaries of meta programming in Python - they're not the idiomatic code
you're looking for right now, but later, when you know the best practices and
you're wondering what is _possible_ they'll be good to look at:

\- [https://github.com/Suor/whatever](https://github.com/Suor/whatever) :
Scala's magic `_` for Python

\-
[https://github.com/ryanhiebert/typeset](https://github.com/ryanhiebert/typeset)
: Types as sets for Python

\-
[https://github.com/AndreaCensi/contracts](https://github.com/AndreaCensi/contracts)
: Gradually typed Python (akin to MyPy)

\- [http://mypy-lang.org](http://mypy-lang.org) : Gradually typed Python - the
future (at least right now)

------
nyddle
Flask -
[https://github.com/mitsuhiko/flask](https://github.com/mitsuhiko/flask). It's
small, awesome and digestible.

~~~
resc1440
Also MarkupSafe -
[https://github.com/mitsuhiko/markupsafe](https://github.com/mitsuhiko/markupsafe)

It's a little bit meta, but I had lots of "wow" moments. Also, it's a nice
example of using C to speed up certain operations.
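
A common layout for optional C accelerators is to try importing the compiled module and fall back to pure Python if it is missing. The sketch below is an illustration of that general pattern under stated assumptions, not code copied from MarkupSafe: the module name `_hypothetical_speedups` is made up, so the fallback branch is the one that runs.

```python
try:
    # Hypothetical compiled extension; it does not exist, so the import fails
    # here and the pure-Python fallback below is used instead.
    from _hypothetical_speedups import escape
except ImportError:
    def escape(text):
        """Pure-Python fallback: escape HTML special characters."""
        return (
            text.replace("&", "&amp;")  # must come first to avoid double escaping
            .replace("<", "&lt;")
            .replace(">", "&gt;")
            .replace('"', "&#34;")
            .replace("'", "&#39;")
        )

print(escape("<b>Tom & Jerry</b>"))
```

Callers import `escape` from one place and never need to know which implementation they got.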

~~~
nekopa
Thanks for this recommendation. I don't know if I will check it out right now,
but it is on my list now because I am hoping to integrate C in Python when
needs be. But I don't want to jump the gun :)

~~~
njbooher
For that you might want to look at Cython.

------
shoyer
I recommend PyToolz, "set of utility functions for iterators, functions, and
dictionaries":
[https://github.com/pytoolz/toolz](https://github.com/pytoolz/toolz)

The functions in PyToolz are short, well tested and idiomatic Python (though
the functional programming paradigm they support is not quite so idiomatic). I
recommend starting with the excellent documentation:
[http://toolz.readthedocs.org/en/latest/](http://toolz.readthedocs.org/en/latest/)

In particular, the API docs have links to the source code for each function:
[http://toolz.readthedocs.org/en/latest/api.html](http://toolz.readthedocs.org/en/latest/api.html)

------
spang
The Nylas Sync Engine is a large Python codebase with a test suite:
[https://github.com/nylas/sync-engine](https://github.com/nylas/sync-engine)

Lots of examples of SQLAlchemy, Flask, gevent, and pytest in action to build a
REST API and sync platform for email/calendar/contacts data!

~~~
aaronchall
I clicked through on this and browsed a few directories, none of which seemed
likely. I do not see any `.py` files aside from an empty `__init__.py` and the
`setup.py`. Are you sure a beginner just learning Python should see this?

~~~
spang
The main codebase is in `inbox/`, with launcher scripts and tools in `bin/`.

(Might not be totally obvious, because the package namespace is called
`inbox/` for legacy reasons.)

If a beginner wants to see real production code, rather than toy examples, I
think it's inevitable that there will be some points of confusion. Part of the
learning process is diving in and exploring and being okay with not totally
understanding _everything_ that's going on. :)

------
travisfischer
A large Python project that I haven't seen mentioned by others but that I find
to be particularly well written and designed is the Pyramid web framework.

* [https://github.com/Pylons/pyramid/](https://github.com/Pylons/pyramid/)

~~~
rachbelaid
I agree with you. Pyramid is one of the best-designed codebases that I
know.

------
mattwritescode
The Django project is a good example of a large open source project which has
aged well. [http://github.com/django/django](http://github.com/django/django)

~~~
nekopa
I know it is a good project, but would you recommend it for someone to read
through the codebase?

(Honestly, looking at the repo, I don't even know where to start if I wanted
to do a read through. Has anyone created a map? :)

~~~
iorlas
No, definitely no. Django aged well in terms of being alive, usable and
comfortable to use in most use cases. But the codebase... Let me explain.
Currently it works as "pay-to-get-a-feature" by crowdfunding. It's not a bad
thing, it gives good features we all need, but it is a bit of a sad fact when
you look at Rails. So, it doesn't mean Django developers don't care about
code; they care. But the codebase is so big and complicated (because some code
needs to be refactored)... it is common to see that some bug cannot be simply
fixed by two lines, because some core member insists "this all should be
reworked".

------
jordigh
Mercurial.

By design, Mercurial has almost no dependencies, so it's very self-contained.
I find this makes it a particularly easy codebase to get into.

If you're interested, I would love to walk you (or anyone else!) through it.

------
thruflo
> Those who don't study Zope are condemned to reinvent it

[http://dirtsimple.org/2007/01/where-zope-leads-python-
follow...](http://dirtsimple.org/2007/01/where-zope-leads-python-follows.html)

[https://github.com/zopefoundation?tab=repositories](https://github.com/zopefoundation?tab=repositories)

------
feathj
Check out boto. It's Amazon's official library for interacting with AWS. It is
written and tested well. I use it every day.

[https://github.com/boto/boto](https://github.com/boto/boto)

~~~
dynamicdispatch
I've had to read the source of boto since there were some boto exceptions I
was seeing in stack traces in the fabric deploy process - and ugh, I found the
code to be not very intuitive and the documentation poor. Anyone else run into
the same issues with boto?

~~~
clebio
I've found the documentation (not the codebase itself) to be very hit-or-miss.
Some components are very well documented, some not at all. Presumably due to
maturity of different parts of the stack and the Boto library, but still
frustrating from an end-user perspective.

------
veddox
1\. The Python standard library (if you're on Linux, /usr/lib/python2.x or
3.x, depending on your version).

2\. The Bazaar VCS is written entirely in Python, is very well documented and
has a large test section. (www.launchpad.net/bzr)

~~~
nekopa
I have been debating whether or not to read the standard library, pro - it's
the standard library, if I learn it I may save myself innumerable hours from
reinventing wheels, con - it may not be the best example of code (thinking
about the 400 line function from vim to see if there is input from the
keyboard. Justified in context, but would be a horrible way to learn some
code)

Bazaar I may look into, as I know it's a very capable piece of software, and
now I know it's well documented with tests makes it very relevant to my
interests :)

Edit: I'm on windows, but I'm using vagrant with Ubuntu 14lts image for my
development work. Specifically the data science vagrant box...

~~~
dochtman
I have some experience with the Mercurial code base, which I thought was
pretty well engineered. It's not PEP 8, though, so that makes it slightly
idiosyncratic. Might make an interesting comparison with Bazaar, though!

~~~
veddox
I haven't used Mercurial yet, let alone glanced at the code base, but it might
be worth a look ;-)

But honestly, the Bazaar code base is great. Great documentation at every
level, and, as far as I can judge, some pretty good code too.

~~~
nekopa
Thanks for that input, that is exactly what I am looking for.

Cheers!

------
rasbt
I wholeheartedly recommend scikit-learn
([https://github.com/scikit-learn/scikit-learn](https://github.com/scikit-learn/scikit-learn)) - the best
organized and cleanest code I've seen so far. It is really organized and well
thought-through.

------
giancarlostoro
I see nobody has recommended CherryPy:
[http://www.cherrypy.org/](http://www.cherrypy.org/)

It is a minimal web framework like Sinatra or Flask. The beautiful thing about
CherryPy is you write code for it the same way you would write general Python
code. I enjoy using it for small projects from time to time.

edit:

Bitbucket Repository:
[https://bitbucket.org/cherrypy/cherrypy/overview](https://bitbucket.org/cherrypy/cherrypy/overview)

------
aaronchall
Here's the link to the Pandas DataFrame source:
[https://github.com/pydata/pandas/blob/master/pandas/core/fra...](https://github.com/pydata/pandas/blob/master/pandas/core/frame.py)

We spent a month of Sundays going through this in the NYC Python office hours.
You learn a lot about this object by reading the source, and the WTF per
minute rate is fairly low.

The style is also fairly non-controversial.

------
patrickk
youtube-dl: [https://github.com/rg3/youtube-dl](https://github.com/rg3/youtube-dl)

I fell in love with this project after discovering I don't need ad-choked,
dodgy sites to download Youtube videos/mp3s. It also acts as a catch-all
downloader for a _huge_ amount of other video hosting sites, despite the name.
If you want to learn how to scrape different videos from many platforms, look
at this:

[https://github.com/rg3/youtube-dl/tree/master/youtube_dl/ext...](https://github.com/rg3/youtube-dl/tree/master/youtube_dl/extractor)

------
d0m
The PEP 8 standard is also an easy read with so many useful explanations:

[https://www.python.org/dev/peps/pep-0008/](https://www.python.org/dev/peps/pep-0008/)

~~~
justizin
Don't let it turn you into a fucking pedant, though.

~~~
compostor42
When it comes to maintaining standards in a code base, one needs to be a
pedant.

~~~
Thrymr
The mentality of someone who thinks that an 89-character line that ends with "
# noqa" is better than the 81-character line without that ending because now
it passes flake8 is one I'll never understand.

~~~
compostor42
Never seen someone do that. That's pretty bad.

I was more defending being a pedant about whatever coding standards your team
agrees upon rather than any specific PEP-8 standard.

It is no problem to increase PyLint's line length limit if your team wants to
use something other than the PEP-8 79 character limit.

------
mkolodny
Guido van Rossum, the creator of Python, co-wrote a web crawler in under 500
lines:
[https://github.com/aosabook/500lines/tree/master/crawler](https://github.com/aosabook/500lines/tree/master/crawler)

It's especially interesting because it takes advantage of new Python features
like asyncio.
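
As a rough illustration of the style asyncio enables (this is not code from the crawler, and `fetch` is a hypothetical stand-in that sleeps instead of doing network I/O), several coroutines can wait concurrently on a single thread. The sketch assumes Python 3.7+ for `asyncio.run`.

```python
import asyncio


async def fetch(url, delay):
    """Hypothetical stand-in for a network request: just waits `delay` seconds."""
    await asyncio.sleep(delay)
    return "fetched " + url


async def crawl(urls):
    # gather() runs the coroutines concurrently and keeps the input order.
    return await asyncio.gather(*(fetch(u, 0.01) for u in urls))


results = asyncio.run(crawl(["a.example", "b.example", "c.example"]))
print(results)
```

Because the waits overlap, the total time is close to one delay rather than the sum of all three.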

------
notatoad
I learned a lot about Python by reading through the Tornado codebase. It's
pretty easy to read, well broken up into functions, and not too big.

~~~
jehiah
ditto.
[https://github.com/tornadoweb/tornado/tree/master/tornado](https://github.com/tornadoweb/tornado/tree/master/tornado)

------
cessor
I would recommend reading about the Tornado web server. It features some nice
stuff such as coroutines and other async machinery.

[https://github.com/tornadoweb/tornado/](https://github.com/tornadoweb/tornado/)

------
ericjang
NetworkX - [https://networkx.github.io/](https://networkx.github.io/) Good
example of object-oriented programming patterns (mixins) in Python and module
organization.
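
For readers who have not met the mixin pattern, a minimal sketch follows. The class names are hypothetical and are not taken from NetworkX.

```python
class ReprMixin:
    """Mixin: adds a generic __repr__ to any class that stores attributes on self."""

    def __repr__(self):
        attrs = ", ".join("%s=%r" % item for item in sorted(vars(self).items()))
        return "%s(%s)" % (type(self).__name__, attrs)


class EqualityMixin:
    """Mixin: compares instances by type and attribute values."""

    def __eq__(self, other):
        return type(self) is type(other) and vars(self) == vars(other)


class Edge(ReprMixin, EqualityMixin):
    """Hypothetical class that gains both behaviours through inheritance."""

    def __init__(self, source, target):
        self.source = source
        self.target = target


print(Edge("a", "b"))                    # Edge(source='a', target='b')
print(Edge("a", "b") == Edge("a", "b"))  # True
```

Each mixin carries one small, reusable behaviour and is not meant to be instantiated on its own.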

~~~
nekopa
Thanks! Not sure if I am ready to take on OOP and mixins in Python yet, but I
do like the idea of module organization so it's definitely on my list now.

Cheers!

------
tzury
1 - (web, network, file system, and more).

    
    
        tornado - 
        tornadoweb.org 
        github.com/facebook/tornado
    

2 - scapy

    
    
        The entire product runs on the python CLI
    
        secdev.org/scapy

------
deepaksurti
NLTK: [https://github.com/nltk/nltk](https://github.com/nltk/nltk) with the
documentation at [http://www.nltk.org](http://www.nltk.org). I found the code
easy to follow through.

I referred to it when adding tests for tokenizers in a common lisp NLP
application: [https://github.com/vseloved/cl-nlp/](https://github.com/vseloved/cl-nlp/).

~~~
rspeer
Agreed. NLTK's value is not that it should be your go-to source for NLP
algorithms -- when there's a specific task you need to accomplish, there'll be
a specific solution by now that works better than NLTK.

NLTK's value is that it shows you _how_ to write NLP algorithms, and gives you
an understandable starting point when you need to do something that nobody has
implemented yet.

------
mapleoin
Also, how about examples of good web applications built on python with
available source code?

Rather than seeing the code of great libraries, I sometimes want to see how
people use them in the real world.

------
edoceo
Gentoo Portage package manager. It's a big project, lots of moving parts,
actively developed. Really helped me with learning Python.

------
rwar
youtube-dl: [https://github.com/rg3/youtube-dl](https://github.com/rg3/youtube-dl)

------
rocho
I think Radon
([https://github.com/rubik/radon/](https://github.com/rubik/radon/)) is a good
example of clean code. It's quite a small project, but it also gets all the
setup chores right (like setup.py, testing, coverage, travis integration, etc.),
which a beginner may find interesting.

Also, I second all the suggestions about Kenneth Reitz's code, it's wonderful!

------
danwakefield
Openstack does a large amount of testing for their code[1] but there is a huge
amount of it. Barbican[2] is one of the newer, less crufty components.

[1]:
[https://github.com/openstack/openstack](https://github.com/openstack/openstack)
[2]:
[https://github.com/openstack/barbican](https://github.com/openstack/barbican)

~~~
agentultra
Openstack is so large it also has a plethora of examples of bad code.
Especially in vendor-contributed drivers and such. I wouldn't pay attention to
the code so much as the incredible infrastructure and process that manages it
and improves it over time.

Seriously, I've seen "synchronization threads" that use no synchronization
primitives that have a single bare exception handler trap the critical
section. The handler just restarts the computation... leading to fun times.
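
To make that concrete, here is a minimal sketch of the safer shape. It is not Openstack code; `Counter` and `run_workers` are hypothetical names made up for illustration. The shared state is guarded by a real primitive (`threading.Lock`), and nothing is swallowed by a bare `except:`.

```python
import threading


class Counter:
    """Shared counter whose critical section is guarded by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def increment(self):
        # The read-modify-write below is the critical section.
        with self._lock:
            self.value += 1


def run_workers(num_threads=4, per_thread=10000):
    """Increment one Counter from several threads and return the total."""
    counter = Counter()

    def work():
        for _ in range(per_thread):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value


if __name__ == "__main__":
    print(run_workers())
```

If a worker has to handle failures, catching a specific exception type and logging it keeps real bugs visible, whereas a bare handler that restarts the computation hides them.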

Cinder also has an interesting problem in the explosion of ABC "mixins" used
for constructing a backend driver. It went from 5 in the previous release to
something like 25 presently. A patch to fix it:
[https://review.openstack.org/#/c/201812/](https://review.openstack.org/#/c/201812/)

But even the process isn't perfect. I've seen patches that change a couple of
log lines get dog-piled with -1 nit-picks about inconsequential wording and
take months to get through.

Openstack is an interesting beast but it's not a good example if you're just
learning, IMO.

~~~
danwakefield
I happen to agree. I've been working on it for the past year and it's sort of
killed my drive to work with something 'big' again.

That being said Barbican is quite good, which I think is due to its proof of
concept being written in Go[1] before being ported.

If you want to learn decent testing, Openstack is a good example; the code as a
whole, not so much. This also only applies if you want to use unittest or their
custom test module, testtools. py.test is a much nicer way to do testing IMO.

[1]:
[https://www.youtube.com/watch?v=245rSZBdm9s](https://www.youtube.com/watch?v=245rSZBdm9s)

~~~
merb
I still don't get why they write almost everything in Python (Horizon also has
Less and JS). A lot of the code would be cleaner and clearer if they didn't use
Python so much, especially in Horizon. They seem to use Django mainly for the
sake of using Django; they need almost nothing from it. Most of what they did
is quite hacky. Horizon is full of magic for doing really simple things, and a
lot of the code just got glued together over time.

------
rjusher
I would also add

* Twisted ([https://github.com/twisted/twisted](https://github.com/twisted/twisted))

For async python.

~~~
rspeer
I disagree. Twisted is a sprawling codebase. It started as a game library that
turned into an async library along the way, you need to read books to get the
full documentation, and some of it doesn't even have docstrings or comments.

~~~
rjusher
You are right, it is not the best example of an open source project, and from
what you say it is not a good Python project example either. But is there any
other place where you can get the hang of working with async Python? It may be
hard, but you would learn a lot.

But maybe there are other async python projects that I don't know of. If you
know of any please post them, I would also like to learn more about the
subject.

------
NumberCruncher
I'm just a user and not a contributor (I don't know the source code), but the
project follows good webdev techniques:

* [https://github.com/web2py/web2py/](https://github.com/web2py/web2py/)

* [https://github.com/web2py/pydal](https://github.com/web2py/pydal)

------
misiti3780
Django, Tornado, Sentry, Newsblur

------
SFjulie1
Really the hard way?

[https://hg.python.org/cpython/file/3.5/Lib/collections/__ini...](https://hg.python.org/cpython/file/3.5/Lib/collections/__init__.py)
Knowing the specialized data structures in a language is always important, and
this module is well coded.

Then look at the two lines I link to below, in one of Python's standard library
modules, and you will understand that even in a good language there will always
be dust under the carpet at some point.
[https://hg.python.org/cpython/file/5c3812412b6f/Lib/email/_h...](https://hg.python.org/cpython/file/5c3812412b6f/Lib/email/_header_value_parser.py#l1329)

~~~
nekopa
Thanks for the recommendations, I think they may be a bit above me at the
moment (I don't have enough knowledge of Python to be able to identify dust
under the carpet :)

But you did teach me one thing, I didn't know that the """ could be used for
commenting. I thought it was only for multi-line print statements.

These are exactly the type of things I hope to learn while code-reading.

Cheers!

~~~
thatmalewoman
Even better: If you put the string right after your function/class definition,
python automatically assigns it to the `__doc__` attribute of the object. This
is how the `help` function works.
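
A minimal sketch of that behaviour (`greet` and `Stack` are made-up names for illustration). Only a string literal that is the first statement in the body becomes the docstring; a string placed later is just an expression that gets thrown away.

```python
def greet(name):
    """Return a friendly greeting for name."""
    "This second string is NOT a docstring; Python simply discards it."
    return "Hello, " + name + "!"


class Stack:
    """A minimal LIFO stack."""


def undocumented():
    return None


if __name__ == "__main__":
    print(greet.__doc__)         # the function's docstring
    print(Stack.__doc__)         # the class's docstring
    print(undocumented.__doc__)  # None, because there is no docstring
    help(greet)                  # help() displays the same text
```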

~~~
nekopa
Thanks!

As ironic(?) as this sounds, where is this doc feature documented?

(Part of my learning is that I am trying hard to learn how to navigate the
various Python documentation. For example, I had a hell of a time trying to
find a list of format strings, as different sources refer to it as different
things)

~~~
denzil
PEP-257 is where you can find documentation on them.

[https://www.python.org/dev/peps/pep-0257/](https://www.python.org/dev/peps/pep-0257/)

~~~
nekopa
Thanks for that, and thank you for introducing me to PEPs as a whole.
Previously I had only heard about PEP 8, and I thought that was just that: an
opinionated piece about how to write Python. Now I have a whole other source of
information to look at.

And for people who will no doubt call me out on this, I have been working on
Zed's LPtHW for about a month. I have done numerous searches regarding Python
problems, and this is literally the first time I have come across a PEP
reference to look at besides PEP 8.

~~~
veddox
PEPs are basically the Python community's RFCs (if you're familiar with
those).

BTW, what you call "an opinionated piece on how to write Python" was
originally written by Guido van Rossum, the guy who invented Python in the
first place. I think that gives him the right to say something on the topic...

(You ought to know that Python has a large following with a strong community
feeling. Making negative remarks about their BDFL - _benevolent dictator for
life_ - van Rossum does not go down well ;-) )

~~~
nekopa
I do know the idea behind RFCs.

And I hope that other people don't think I used the wording 'opinionated' in a
negative way. I do know about the concept of a BDFL, and I do know that Guido
was behind those comments.

But I still stand behind my assessment that the PEP 8 comments are
opinionated. And maybe rightly so. ;)

~~~
pollen23
Doesn't Guido himself preface some of the things in PEP8 as being his pet
peeves?

------
anon3_
SQLAlchemy

- [https://github.com/zzzeek/sqlalchemy](https://github.com/zzzeek/sqlalchemy)

- [http://www.aosabook.org/en/sqlalchemy.html](http://www.aosabook.org/en/sqlalchemy.html)

- [http://docs.sqlalchemy.org/en/rel_1_0/](http://docs.sqlalchemy.org/en/rel_1_0/)

~~~
nekopa
This one could be very interesting as I have decades of experience with DBs
and SQL, so I was always wondering about ORMs, and sqlalchemy seems to be best
in field at the moment.

Thanks for the recommendation.

Cheers!

~~~
nine_k
Large pieces of SQLAlchemy are about metaprogramming and some syntactic magic
required for an easy-to-use ORM. It's mostly nice on the inside (to my cursory
glance), but neither simple nor easy for an inexperienced person.

Still, it's a good example of how to make many relatively complex parts easily
composable and efficient.
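
To give a flavour of the kind of syntactic magic I mean, here is a toy sketch. It is emphatically not SQLAlchemy's implementation; `Column` and `Comparison` are hypothetical classes made up for illustration. The trick is operator overloading: comparing a column object to a value builds an expression object instead of returning a bool.

```python
class Comparison:
    """A deferred 'column OP value' expression."""

    def __init__(self, column_name, operator, value):
        self.column_name = column_name
        self.operator = operator
        self.value = value

    def to_sql(self):
        # Return a parameterized fragment plus its bound values.
        sql = "{} {} ?".format(self.column_name, self.operator)
        return sql, [self.value]


class Column:
    """Toy column whose comparison operators build Comparison objects."""

    def __init__(self, name):
        self.name = name

    # Note: overriding __eq__ without __hash__ makes instances unhashable.
    def __eq__(self, other):
        return Comparison(self.name, "=", other)

    def __gt__(self, other):
        return Comparison(self.name, ">", other)


if __name__ == "__main__":
    age = Column("age")
    print((age > 18).to_sql())
```

A real ORM layers a great deal on top of this (types, joins, sessions, dialects), which is where the complexity comes from.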

------
mahouse
For the web developers out there, what do you think of reddit? Any honest
commentary on it?
[https://github.com/reddit/reddit](https://github.com/reddit/reddit)

------
gcb0
Any good examples with a (hopefully multiplatform) GUI?

