Ask HN: Good Python codebases to read?
278 points by nekopa on July 16, 2015 | 151 comments
Hi all, I am currently going through Learn Python the Hard Way, and am on the review section where I have to read through source code.

Can anyone recommend some good open source Python software I can look at? I am specifically looking to see ones that employ idiomatic Python, and maybe see how they approach testing (I am not completely new to programming, just rusty after being out of the field for a long time)

Extra bonus points if the software is something you use regularly.


Jumping on the Kenneth Reitz train, you might check out The Hitchhiker's Guide to Python: http://docs.python-guide.org/en/latest/

He recommends the following Python projects for reading:

* Howdoi (https://github.com/gleitz/howdoi)

* Flask (https://github.com/mitsuhiko/flask)

* Werkzeug (https://github.com/mitsuhiko/werkzeug)

* Requests (https://github.com/kennethreitz/requests)

* Tablib (https://github.com/kennethreitz/tablib)

Hope that helps---good luck!

+1 to anything written by Kenneth Reitz. He's also a fantastic OSS maintainer and extremely welcoming of newbies and their PRs, if you want to put some of your learning into practice.

Not the most responsive maintainer - still waiting on him to tag a 1.0.0 release for autoenv - https://github.com/kennethreitz/autoenv/issues/82

I would add the Django project to that list as it's a very large, mature, and successful open source python project - https://github.com/django/django

Django's source is very high quality. Though due to the large scope of the project, there are necessarily many layers of indirection, which may be a bit daunting for someone who is just starting out.

However reading the less abstract parts may help. For instance, the paginator is pretty self contained. https://github.com/django/django/blob/master/django/core/pag...

I disagree. A lot of code that does very little.

I prefer sklearn like https://github.com/scikit-learn/scikit-learn/blob/master/skl...

A lot of code that does a lot.

Bottle is nice on the web dev front.


I really like the API of the framework, unfortunately some of the core elements suffer from being extremely stateful code

The "self.thing = bar" in one function that only gets used in some other function ( or even worse something only used in a companion class) pattern is super prevalent.

Might just be me but I think a lot of the older code suffers from massive locality problems that makes debugging framework bugs super tricky

Honestly, I have to agree with you to an extent.

I think a lot of the issues involving overuse of state, are primarily related to using OO when a pure function would suffice. It's just too tempting to dynamically assign attributes to mutable instances.

To be fair, when Django is used properly it isn't usually an issue. Besides the queryset/model API is extremely nice, and at this point very polished.
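To make the locality complaint above concrete, here is a minimal sketch of the pattern being described: an attribute assigned in one method and only read in another, next to a pure-function version. The class and function names are hypothetical and are not taken from Django.

```python
class StatefulReport:
    """Hypothetical example: state is set in one method and read in another."""

    def load(self, rows):
        # Hidden coupling: render() only works after load() has run.
        self.rows = rows

    def render(self):
        return ", ".join(str(r) for r in self.rows)


def render_rows(rows):
    """Pure version: everything it needs arrives as an argument."""
    return ", ".join(str(r) for r in rows)


report = StatefulReport()
try:
    report.render()  # fails because load() was never called
except AttributeError as exc:
    print("stateful version failed:", exc)

report.load([1, 2, 3])
print(report.render())
print(render_rows([1, 2, 3]))
```

With the pure function, a reader debugging `render_rows` never has to find out which other method was supposed to run first.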

Definitely agree. I recently looked at the management command code as a reference when building some non-Django scripts that use Python's argparse.


Is the Django project Pythonic? I mean, on the one hand, Python is the language of bells and whistles builtin. On the other hand, packages are encouraged to be simple and to the point.

Whilst this wasn't a discussion on which framework is best (and I'm not a Web dev by trade either), I must say I turn to Flask, as I find the API more Pythonic.

I guess what I'm trying to say is that the Django source code is probably great, but the less heavyweight packages/frameworks the better. Learn Pythonic API design from somewhere else.

SQLAlchemy is one of my favourite larger python codebases: https://github.com/zzzeek/sqlalchemy

Holy crap! I don't know if this helps or hinders! ;)

Thank you for this resource, much appreciated!

Peter Norvig's examples. They are quite short and include much explanation in addition to code. They also include tests and benchmarking code.

http://norvig.com/lispy.html http://norvig.com/lispy2.html (Lisp interpreter)

http://www.norvig.com/spell-correct.html (Spelling corrector)

http://norvig.com/sudoku.html (Sudoku solver)

Also, his online course Design of Computer Programs includes many short, well-explained Python examples:


His coding style is not common to most writers of Python.

He's using overly terse, poorly descriptive variable names, not using multiline strings for docstrings, not indenting where he should, and one-lining if/elif statements and function definitions. This style does not contribute to readability.

This is not how I would want someone just learning Python to learn it.

Maybe it's because I am also something of a lisper (like Norvig), but I don't see anything wrong with inline ifs (after all, that is how if works in lisp, with returns for a conditional, like the ternary operator) or lambda functions. In fact, I find that improves readability dramatically because it more declaratively says what you are trying to accomplish in many cases.

For example:

  absolute_path = lambda path: path if path.startswith('/') else '/' + path
To me this is perfectly clearly a simple function whose only purpose is to prepend forward slashes to unix-style paths. Is the following really so much more readable?

  def absolute_path(path):
      if not path.startswith('/'):
          path = '/' + path
      return path
To my eyes and mind, the second example is not any more readable at the expense of several lines of code.

I'd write that function like this:

  def absolute_path(path):
      if not path.startswith('/'):
          return '/' + path
      return path
This way you don't have to keep track of mutating the path variable. I do find that more readable - it's clearer that there are two paths through the code. That said, this example is trivial enough that I'd probably do it with an inline if/else statement, although still probably not a lambda:

  def absolute_path(path):
      return path if path.startswith('/') else '/' + path
This way it's more obvious at a glance that you're defining a function. It's easier to follow someone else's code if they generally adhere to standards, and Python is a very convention-oriented language.

The former is absolutely identical (semantically) to the latter, with two exceptions: 1) the former function does not know its own name and 2) the latter function can (and should) be documented with a docstring. I find the latter eminently more readable, and I work daily in a code base under development by 3000 Python developers for over 5 years.

Considering that the creator of the Python language considered getting rid of lambdas because they are essentially limited functions and thus violate Python's "one obvious way to do it" philosophy, I'd rather those learning Python be shown the latter than the former.

From a development and version control perspective, as soon as the lambda function requires more than a simple expression, i.e. a compound statement (https://docs.python.org/2/reference/compound_stmts.html), you have to trash the whole line, instead of adding perhaps a single extra line of content.
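The two differences named above (the function's name and its docstring) can be checked directly. The sketch below is written for Python 3, and the helper `failing_frame_name` is a hypothetical name introduced only for this illustration; it shows how each function is identified in a traceback.

```python
import traceback

absolute_lambda = lambda path: path if path.startswith('/') else '/' + path


def absolute_def(path):
    """Return path with a leading slash."""
    return path if path.startswith('/') else '/' + path


def failing_frame_name(func):
    """Call func with a bad argument and return the name of the frame that raised."""
    try:
        func(None)  # None has no startswith(), so this raises AttributeError
    except AttributeError as exc:
        return traceback.extract_tb(exc.__traceback__)[-1].name


print(absolute_lambda.__name__, failing_frame_name(absolute_lambda))
print(absolute_def.__name__, failing_frame_name(absolute_def))
print(absolute_def.__doc__)
```

Both functions return the same results; the practical difference shows up when something goes wrong and the traceback has to point at one of them.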

> The former is absolutely identical to the latter

Oh, really? Let's compare:

  >>> absolute_0 = lambda path: path if path.startswith('/') else '/' + path
  >>> def absolute_1(path):
  ...     '''Return the absolute unix path from a given path name'''
  ...     if not path.startswith('/'):
  ...         path = '/' + path
  ...     return path
  >>> import dis
  >>> dis.dis(absolute_0)
    2           0 LOAD_FAST                0 (path)
                3 LOAD_ATTR                0 (startswith)
                6 LOAD_CONST               1 ('/')
                9 CALL_FUNCTION            1
               12 POP_JUMP_IF_FALSE       19
               15 LOAD_FAST                0 (path)
               18 RETURN_VALUE        
          >>   19 LOAD_CONST               1 ('/')
               22 LOAD_FAST                0 (path)
               25 BINARY_ADD          
               26 RETURN_VALUE        
  >>> dis.dis(absolute_1)
    4           0 LOAD_FAST                0 (path)
                3 LOAD_ATTR                0 (startswith)
                6 LOAD_CONST               1 ('/')
                9 CALL_FUNCTION            1
               12 POP_JUMP_IF_TRUE        28
    5          15 LOAD_CONST               1 ('/')
               18 LOAD_FAST                0 (path)
               21 BINARY_ADD          
               22 STORE_FAST               0 (path)
               25 JUMP_FORWARD             0 (to 28)
    6     >>   28 LOAD_FAST                0 (path)
               31 RETURN_VALUE        
These functions are actually not identical in their computation - only their result.

> 1) the former function does not know its own name

If you think that is really important (hint: it's not [from a lisper perspective, anyway]), Python thankfully allows you to do this:

  >>> absolute_0.__name__
  '<lambda>'
  >>> absolute_0.__name__ = '<lambda "absolute_0">'
  >>> absolute_0.__name__
  '<lambda "absolute_0">'
> 2) the latter function can be documented with a docstring

  >>> absolute_0.__doc__
  >>> absolute_0.__doc__ = 'Return the absolute unix path from a given path name'
  >>> help(absolute_0)
  Help on function <lambda "absolute_0">:
  <lambda "absolute_0">(path)
    Return the absolute unix path from a given path name
Of course the only reason you can't put docstrings on a lambda function in python is because the forced indentation of code and implicit return with no indented block available is what Guido went with for Lambda.

> Considering that the creator of the Python language considered getting rid of lambdas

Guido is not a proponent of functional programming in general and claims that map, reduce, and filter are so much harder to understand than list comprehensions (which implement some common map, reduce, and filter, operations with special optimized syntax) that he tried to get them removed from the language too. Thankfully for us users of the language, this view did not win through and we can still use map, reduce, and filter in python, if we choose.

Is there any reason to use your hackery instead of the canonical form? (Other than I-am-right-syndrome)

Not sure which part you're curious about. If it wasn't clear from elsewhere in this thread...

Lambda functions are most useful for very small, straightforward functions that return a value - especially in cases where you don't necessarily need to name or document them deeply, such as for passing as arguments to other functions (for example, sort/sorted).

So if you want terse code that only does what it needs to do and nothing more, pepper with lambdas as needed.

If you want to invent names for things just cause you like inventing them or need to document every single function you write (even if the code is simple enough to document itself), feel free to make all one-line returning functions in the fully named and documented format.

Spare us the half-baked hackery. List comprehensions and generator expressions have replaced all need for map, filter, and lambdas, and are far more readable. For someone new to Python, halfway through LPTHW, they don't need those things.

Hey, maybe you can help me decipher this, I've always wondered exactly what's going on here: https://docs.python.org/2/faq/programming.html#is-it-possibl...

  # Mandelbrot set
  print (lambda Ru,Ro,Iu,Io,IM,Sx,Sy:reduce(lambda x,y:x+y,map(lambda y,
  Iu=Iu,Io=Io,Ru=Ru,Ro=Ro,Sy=Sy,L=lambda yc,Iu=Iu,Io=Io,Ru=Ru,Ro=Ro,i=IM,
  Sx=Sx,Sy=Sy:reduce(lambda x,y:x+y,map(lambda x,xc=Ru,yc=yc,Ru=Ru,Ro=Ro,
  i=i,Sx=Sx,F=lambda xc,yc,x,y,k,f=lambda xc,yc,x,y,k,f:(k<=0)or (x*x+y*y
  >=4.0) or 1+f(xc,yc,x*x-y*y+xc,2.0*x*y+yc,k-1,f):f(xc,yc,x,y,k,f):chr(
  64+F(Ru+x*(Ro-Ru)/Sx,yc,0,0,i)),range(Sx))):L(Iu+y*(Io-Iu)/Sy),range(Sy
  ))))(-2.1, 0.7, -1.2, 1.2, 30, 80, 24)
  #    \___ ___/  \___ ___/  |   |   |__ lines on screen
  #        V          V      |   |______ columns on screen
  #        |          |      |__________ maximum of "iterations"
  #        |          |_________________ range on y axis
  #        |____________________________ range on x axis

> List comprehensions and generator expressions have replaced all need for map, filter, and lambdas, and are far more readable.

I would say that is debatable. Coming from an FP background, I use map, filter and lambdas nearly all of the time because I find it more readable and can easily reason about the code. I have seen some two line list comprehensions and they are far harder for me to read and understand.

OK so let's talk about that, which is more readable?:

  iterable = xrange(10)

  ge = (x*x for x in iterable if not x % 2)

  mf = map(lambda x: x*x, filter(lambda x: not x % 2, iterable))
Assume imap and ifilter from itertools or Python 3 (with range) for equivalence. I'm betting if we ask any person new to Python and new to programming, they'd think the former much more readable than the latter. Yes, we left it in the language for you cranks who think map, filter, and lambda are way better, but it's functionally no different.
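For readers on Python 3, where `range`, `map`, and `filter` are all lazy, the comparison above can be run as is. This is a small sketch of the same two pipelines; the names `ge_result` and `mf_result` are only for illustration.

```python
iterable = range(10)

# Generator expression: square the even numbers.
ge = (x * x for x in iterable if not x % 2)

# The same pipeline with map and filter; in Python 3 both return lazy iterators.
mf = map(lambda x: x * x, filter(lambda x: not x % 2, iterable))

# Materialize both so they can be compared and printed.
ge_result = list(ge)
mf_result = list(mf)
print(ge_result)
print(mf_result)
```

Both forms produce the same values, so the choice between them comes down to readability for the people maintaining the code.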

> Spare us the half-baked hackery.

Ad hominem? I guess I win.

> List comprehensions and generator expressions have replaced all need for map, filter, and lambdas

Please explain how list comprehensions and generators have replaced the need for lambdas.

  >>> sorted(((x, -(x**2)) for x in xrange(10) if 0 == x % 2), key=lambda item: item[1])
  [(8, -64), (6, -36), (4, -16), (2, -4), (0, 0)]
Your "distaste" of functional programming constructs is right up there with Guido's.

Nice lambda. You've defended yourself admirably. Did you know there's an `operator.itemgetter` function that does that?

So that's:

  >>> sorted(((x, -(x**2)) for x in xrange(10) if 0 == x % 2), key=operator.itemgetter(1))
  [(8, -64), (6, -36), (4, -16), (2, -4), (0, 0)]
Do note that good code and code golf are two different things! :D
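The two key functions in this exchange can be compared side by side. The sketch below uses Python 3 (`range` instead of `xrange`); the last sort is an invented example of a key that `itemgetter` alone cannot express.

```python
from operator import itemgetter

pairs = [(x, -(x ** 2)) for x in range(10) if x % 2 == 0]

# Sorting by the second element, once with a lambda and once with itemgetter.
by_lambda = sorted(pairs, key=lambda item: item[1])
by_itemgetter = sorted(pairs, key=itemgetter(1))

print(by_lambda)
print(by_lambda == by_itemgetter)

# A lambda (or a named function) is still needed when the key is not a
# plain index lookup, for example sorting by distance from -20.
by_distance = sorted(pairs, key=lambda item: abs(item[1] + 20))
print(by_distance)
```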

> Nice lambda. You've defended yourself admirably.


> Did you know there's an `operator.itemgetter` function that does that?

Yes, I'm quite aware! Are you aware the "useless" functional solution with lambda is two characters shorter?

  >>> len('lambda item: item[1]')
  20
  >>> len('operator.itemgetter(1)')
  22
Cause you're apparently not aware that I was demonstrating a use-case for lambdas as one-off functions that are passed to other functions (which is an abstract concept from the particular function used), and you didn't demonstrate how list comprehensions or generators make them not-needed. Of course, that's because it was a leading question and the answer is that the concepts are orthogonal so it cannot be demonstrated.

I bet you two would be good friends IRL.

I'm sure we would. I'll buy the first drink. :D

A more idiomatic way to handle this is to arrange the items in the tuple by sort precedence, then restructure the data in the tuple after the sort.

>>> [(b, a) for a, b in sorted(((-(x**2), x) for x in xrange(10) if 0 == x % 2))]

[(8, -64), (6, -36), (4, -16), (2, -4), (0, 0)]

However, occasionally you still need a more complex sorting function. So lambdas are still handy, IMO.

>> Spare us the half-baked hackery.

>Ad hominem? I guess I win.

That was not ad hominem, because what you wrote is indeed half-baked hackery.

Nobody said it cannot be done the way you did it. You were just pointed at the shortcomings of your approach and that the Python community generally prefers stupidly simple, easy to understand solutions. Using magic attributes to argue against it just makes it worse - remember this is a thread about idiomatic Python code bases.

Now you're moving goalposts...

> Nobody said it cannot be done the way you did it. You were just pointed at the shortcomings of your approach

I "addressed" the shortcomings of "my approach" by showing that Python (the language, not the community) allows you to access and manipulate the data you claimed was important and missing.

I don't believe it is necessary for most simple functions to know what their name is. I do believe the demonstrated code is self documenting enough to not require a documentation string. You made those "requirements". I never claimed that every function must be a lambda - you seem to be implying I am, so I am explicitly stating that I do not.

Here's a place where you really do need a named function (due to deficiencies in Python's lambda implementation):

  >>> def named_lambda(procedure, name, documentation=''):
  ...     procedure.__name__ = '<lambda {}>'.format(name)
  ...     procedure.__doc__ = documentation
  ...     return procedure
  >>> absolute_path = named_lambda(lambda path: path if path.startswith('/') else '/' + path, 'absolute_path', 'Return the absolute unix path from a given path name')
  >>> absolute_path.__name__
  '<lambda absolute_path>'
  >>> absolute_path.__doc__
  'Return the absolute unix path from a given path name'
Of course, that's completely silly... since the point of a lambda function generally is that the function is generally small enough and short-lived enough that it does not need a name or documentation.

> the Python community generally prefers stupidly simple, easy to understand solutions.

Which, despite your protests, includes using lambdas!

> Using magic attributes to argue against it just makes it worse - remember this is a thread about idiomatic Python code bases.

How else does a function "know its own name" unless it uses the "magic" attribute "__name__"? Oh, you prefer "func_name"? That's cute:

  >>> named_lambda.func_name
  'named_lambda'
  >>> named_lambda.__name__
  'named_lambda'
  >>> named_lambda.__name__ = 'lol'
  >>> named_lambda.func_name
  'lol'
  >>> named_lambda.func_name = 'named_lambda'
  >>> named_lambda.func_name
  'named_lambda'
So in this comment I am replying to, it is a bad thing that I made use of "__name__", but in the comment THAT was replying to, it was a bad thing that I did NOT use "__name__" or its linked "func_name". That's how you move goal posts!

To me the first is far quicker to read and understand, and I am no lisper. There is something immediately gratifying about the first that I find missing in the more laborious, ponderous prose of the latter.

While that's true, he also tends to write fairly idiomatic Python -- using base data types a lot, list comprehensions, etc. -- and his comments and overall insight make them very worthwhile to read, especially once you're familiar enough with his code to recognize what he's doing.

The spellchecker still blows my mind, and I don't yet understand the Sudoku puzzle. We can always take the code and re-format / re-name variables to aid our understanding.

Well, between a run-of-the-mill programmer who happens to indent where he/she should vs. Norvig, I will likely choose the latter if I can.

Reminds me of that blog thread where this design pattern guru hemmed and hawed from his high horse over several posts about how to write constraint-based solvers and still did not get to a piece of code that actually solved the problem, whereas Norvig just posted a simple solution. A few non-idiomatic indents here and there (although his style has never been a problem for me) are nothing really.

>Well, between a run of the mill programmer who happens to indent where he/she should vs Norvig, I will likely choose the latter If I could.


>Reminds me of that blog thread where this design pattern guru hemmed and hawed from his high horse

I read a similar story a while ago. Two programmers are given the task of writing a program for some non-trivial problem.

One of them, Hoity Toity Harry, tries to apply many of the latest and greatest algorithms, techniques, paradigms, etc., to impress people (of course).

The other, Down To Earth Dan, just strives for a good implementation, with reasonably good algorithms, etc. After a while, Dan finishes his program and does a test run. It works well enough for the task. Meanwhile, Harry is not even near to finishing his code, due to struggling with complexities of the techniques he has tried to use.

The boss comes in, sees the results, and congratulates Down To Earth Dan.

Hoity Toity Harry, of course, has to protest, trying to put down Down To Earth Dan's implementation, saying that it uses simple algorithms, etc., while his own code uses sophisticated, state of the art techniques. Dan replies: "Yes, I could also have used those things. But my program runs, and yours doesn't."

Okay, I changed the programmers' names for fun and effect, but I really did read the story, in some good (and pragmatic) software book, a while ago.

Speaking of which, I once saw a user review of Norvig's book PAIP complaining about the unsophisticated code -- no monads, etc. (I forget what other patterns the reviewer wanted to see.)

Ha ha, good one. I wonder if the reviewer knew who Norvig is.

I'm personally a fan of conditional expressions, they're easy to read IMO as long as you're not nesting them.

For instance, I find this unnecessarily difficult to read.

>>> 0 if True else 1 if True else 2

0


But breaking it up like this keeps it reasonably concise, while retaining readability.

    if True:
        a = 0 if True else 1
    else:
        a = 2

They can also be combined with comprehensions, which can be useful.

>>> import random

>>> ['a' if random.choice((True, False)) else 'b' for _ in range(6)]

['a', 'a', 'a', 'a', 'a', 'b']

Norvig's code has several qualities to praise.

Adherence to Python's best practices / PEP 8 is not one of them.

PEP 8 itself has the quote "a foolish consistency is the hobgoblin of small minds." It is not intended as a prescriptive standard that everyone needs to follow, just a recommendation.

Yeah, I know that, you need to tell everybody else

I'd rather have Norvig's code solving some AI problem than some perfectly PEP 8-compliant code solving an issue in a naive way.

Requests - https://github.com/kennethreitz/requests.

It shows how to make a usable API. The decisions that went into each method call were fantastic. Great test coverage as well. I use this package in most of my Python development.

Thanks, I've been hearing a lot about this package, I think it's one I will check out.


This presentation by the author explains the thinking behind it: https://speakerdeck.com/kennethreitz/python-for-humans

requests is very useful, but it always gets mentioned as a good python codebase and I'm not sure I agree. One example:

The first thing many users will do is run `requests.get?` in IPython (output below), which tells them that it takes some kwargs but doesn't tell them what those kwargs are. It's easy, especially for a newcomer, to read "optional arguments that `request` takes" and fail to understand that they should look up the docs for the function `request` (which is not really encouraged as part of the public API). That's pretty bad; those kwargs are important! (The reason is that requests.get is implemented as a call to request(method, ..., kwargs), but the user doesn't care about the implementation-level reason.)

Beyond that, I did look into the codebase once to investigate a possible bug, and there were a few Python style things I wanted to fix, but I don't remember them, so this comment probably sounds kind of annoying (it would annoy me if I were reading it, not writing it...). It didn't strike me as a really clean codebase. But yes, the library is very useful and I'm sure it's a pretty decent Python codebase.

  >>> requests.get?
  Signature: requests.get(url, params=None, **kwargs)
  Sends a GET request.
  :param url: URL for the new :class:`Request` object.
  :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
  :param \*\*kwargs: Optional arguments that ``request`` takes.
Here are all the kwargs the user probably wanted to know but failed to find out and was forced to either browse online docs or the source code:

  def request(method, url, **kwargs):
      """Constructs and sends a :class:`Request <Request>`.
      :param method: method for the new :class:`Request` object.
      :param url: URL for the new :class:`Request` object.
      :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
      :param data: (optional) Dictionary, bytes, or file-like object to send in the body of the :class:`Request`.
      :param json: (optional) json data to send in the body of the :class:`Request`.
      :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
      :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
      :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': ('filename', fileobj)}``) for multipart encoding upload.
      :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
      :param timeout: (optional) How long to wait for the server to send data
          before giving up, as a float, or a (`connect timeout, read timeout
          <user/advanced.html#timeouts>`_) tuple.
      :type timeout: float or tuple
      :param allow_redirects: (optional) Boolean. Set to True if POST/PUT/DELETE redirect following is allowed.
      :type allow_redirects: bool
      :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
      :param verify: (optional) if ``True``, the SSL cert will be verified. A CA_BUNDLE path can also be provided.
      :param stream: (optional) if ``False``, the response content will be immediately downloaded.
      :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
      :return: :class:`Response <Response>` object
      :rtype: requests.Response
      """

Looks like you did the legwork of listing the params already, so you might as well send a PR with that docstring.

OK, maybe I will open an issue. The obvious concern is how to avoid duplicating the text among the various HTTP verb functions. In theory python allows the docstring to be manipulated via __doc__. However I don't think there is a precedent for using that mechanism to avoid duplication of docstring content that is considered good style, but perhaps someone could correct me if that is wrong.

You'd probably want to make a metaclass that handled the manipulation of __doc__ for shared verbs if you didn't want to duplicate the data too much.

OK, but in requests they are top-level functions, not methods. Is there anything wrong with the below? I don't think I've seen it done.

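  # Placeholder text; the real shared parameter documentation would go here.
  __shared_docstring_content = """
  (shared docstring content)
  """

  def f():
      "f docstring"

  # Append the shared text once the function object exists.
  f.__doc__ += __shared_docstring_content

  def g():
      "g docstring"

  g.__doc__ += __shared_docstring_content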

I would do simply """f docstring\n%s""" % _shared_docstring, no need for a separate concatenation. However, I wonder whether sphinx would handle this.

This sort of thing works fine. We use it for pandas all the time.

Pandas does this, and I think it works well, even if it's not entirely transparent in the source.
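One way to package the idea discussed above is a small decorator that appends shared text to each function's docstring. This is only a sketch: `append_doc`, `_SHARED_KWARGS_DOC`, and the stand-in `request`, `get`, and `post` functions are hypothetical names, not the code used by requests or pandas. The formatting has to happen after the function is defined, because only a plain string literal in the docstring position is picked up as `__doc__`.

```python
_SHARED_KWARGS_DOC = """
    :param timeout: (optional) seconds to wait before giving up.
    :param headers: (optional) dictionary of headers to send.
"""


def append_doc(extra):
    """Return a decorator that appends `extra` to a function's docstring."""
    def decorator(func):
        # `or ""` covers functions that were defined without a docstring.
        func.__doc__ = (func.__doc__ or "") + extra
        return func
    return decorator


def request(method, url, **kwargs):
    """Hypothetical stand-in for a generic request function."""
    return method, url, kwargs


@append_doc(_SHARED_KWARGS_DOC)
def get(url, **kwargs):
    """Send a GET request."""
    return request("GET", url, **kwargs)


@append_doc(_SHARED_KWARGS_DOC)
def post(url, **kwargs):
    """Send a POST request."""
    return request("POST", url, **kwargs)


print(get.__doc__)
```

With this in place, `help(get)` lists the shared keyword arguments without the text being duplicated in the source. Whether a documentation generator renders the result well would still need checking.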

> requests.get?

I have been using Python for 8 years and IPython for about 5, if my memory serves me right, but only today did I learn that you can invoke help on an object by appending '?'. I guess I should delve into the IPython documentation sometime.

Thank you for the lengthy and detailed explanation.

IPython is amazing. "requests.get??" will show you the source (works on modules too). "%pdb on" is another fave: it drops you straight into the debugger on uncaught exceptions.
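Outside IPython, the standard library's `inspect` module gives roughly the same information as `?` and `??`. The sketch below uses `json.dumps` purely as a convenient standard-library target and assumes its Python source is available, which is the case for a normal CPython install.

```python
import inspect
import json

# Roughly what `json.dumps?` shows in IPython: signature plus docstring.
print(inspect.signature(json.dumps))
print(inspect.getdoc(json.dumps).splitlines()[0])

# Roughly what `json.dumps??` shows: the source code itself.
source = inspect.getsource(json.dumps)
print(source.splitlines()[0])
```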

Bottle: https://github.com/bottlepy/bottle

It's a nice, small, fast web framework. Great for building APIs. Also, it's a single readable file of ~3k LOC.[1]

[1] https://github.com/bottlepy/bottle/blob/master/bottle.py

Several good ones have already been suggested, but here's a few more:

- https://github.com/mahmoud/boltons : utility functions, but well documented

- https://github.com/KeepSafe/aiohttp : a Python 3 async HTTP server

- https://github.com/telefonicaid/di-py : a dependency injection framework

- https://github.com/moggers87/salmon : a fork of Lamson (which was written by Zed)

Python's internals are pretty darn open, so here's a few suggestions that push the boundaries of meta programming in Python - they're not the idiomatic code you're looking for right now, but later, when you know the best practices and you're wondering what is possible they'll be good to look at:

- https://github.com/Suor/whatever : Scala's magic `_` for Python

- https://github.com/ryanhiebert/typeset : Types as sets for Python

- https://github.com/AndreaCensi/contracts : Gradually typed Python (akin to MyPy)

- http://mypy-lang.org : Gradually typed Python - the future (at least right now)

Flask - https://github.com/mitsuhiko/flask. It's small, awesome and digestible.

Armin Ronacher often writes pretty and compact Python — https://github.com/mitsuhiko

Also MarkupSafe - https://github.com/mitsuhiko/markupsafe

It's a little bit meta, but I had lots of "wow" moments. Also, it's a nice example of using C to speed up certain operations.

Thanks for this recommendation. I don't know if I will check it out right now, but it is on my list now because I am hoping to integrate C in Python when needs be. But I don't want to jump the gun :)

For that you might want to look at Cython.

Flask was the framework that finally made web frameworks "click" for me - the biggest advantage of learning about Flask is you can build up your knowledge of web services piece by piece, allowing you to take time and fully understand each component before moving on to the next. That way you're not trying to comprehend all the "magic" that's going on behind the scenes all at once.

Thanks! I am also looking into using Flask soon too. I dabbled in django for a while, but too much magic was happening and I am trying to learn.

Django is quite transparent when you learn enough of it. But there is a lot to learn.

I was talking about just using Django to make a web app. It is awesome in that regard. But if I'm making a web app just to learn about the issues involved in making web apps, Django is not suitable, because handling those issues is its job :)

I am hoping to move to Django eventually, but first I want to know what is going on behind the curtain.

I agree, Django is likely not the best if you're just starting learning, and writing a webapp from scratch. Working in an existing codebase, with other devs of whom you can ask questions, is a luxury that I was very thankful for.

I use Django at work, and it was my first time with Python. The beauty of Django is, IT'S ALL SOURCE. When you run it locally, you have TONS of learning options:

- Get an IDE, and read the sources if you don't understand how something works. "Find-Definition" all the way down. (I heart PyCharm.)

- If something's broken, you can edit it!

- You can put in `import ipdb; ipdb.set_trace()` calls anywhere and get a debugging prompt! (If you don't have IPython, you can use `pdb` instead of `ipdb`... shudder.) Being able to print What Actually Gets Passed Around can occasionally be very helpful.

- You can put in debugging messages at multiple points in the flow!

- make a `pygrep` alias to answer "where is ..." questions:

  # ignore south migrations, and external libraries.
  # You can alter this if you care about reading libs' codebase
  alias pygrep='grep -rin --include=*.py --exclude=*.pyc --exclude-dir=lib --exclude-dir=migrations'

I recommend PyToolz, "set of utility functions for iterators, functions, and dictionaries": https://github.com/pytoolz/toolz

The functions in PyToolz are short, well tested and idiomatic Python (though the functional programming paradigm they support is not quite so idiomatic). I recommend starting with the excellent documentation: http://toolz.readthedocs.org/en/latest/

In particular, the API docs have links to the source code for each function: http://toolz.readthedocs.org/en/latest/api.html

The Nylas Sync Engine is a large Python codebase with a test suite: https://github.com/nylas/sync-engine

Lots of examples of SQLAlchemy, Flask, gevent, and pytest in action to build a REST API and sync platform for email/calendar/contacts data!

Putting on my "Python teacher" hat for a moment (I also work at Nylas), our codebase is interesting for learning purposes because it's a fairly homogenous, 'live' product managed by a commercial engineering team. Definitely on the complex side of things, though -- let me know if you want any help figuring it out (we also have a community Slack at http://slack-invite.nylas.com/).

On the testing side, I was chatting to a fellow Pyladies attendee about how we do test fixture setup and teardown (especially around databases) -- you might find that interesting to look at too.
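
For anyone who has not seen fixture setup and teardown around a database before, a minimal sketch of the general idea follows. This is not how the sync engine does it: it assumes stdlib `unittest` and an in-memory `sqlite3` database instead of pytest and SQLAlchemy, and the table and test names are invented.

```python
import sqlite3
import unittest


class ContactStoreTest(unittest.TestCase):
    # Hypothetical test case; the schema is made up for illustration.

    def setUp(self):
        # Setup: a fresh in-memory database for every test, so no state leaks between tests.
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE contacts (name TEXT, email TEXT)")

    def tearDown(self):
        # Teardown: runs after each test, whether it passed or failed.
        self.db.close()

    def test_insert_and_read_back(self):
        self.db.execute("INSERT INTO contacts VALUES (?, ?)",
                        ("Ada", "ada@example.com"))
        rows = self.db.execute("SELECT name, email FROM contacts").fetchall()
        self.assertEqual(rows, [("Ada", "ada@example.com")])

    def test_starts_empty(self):
        # Passes only because setUp handed us a brand-new database.
        count = self.db.execute("SELECT COUNT(*) FROM contacts").fetchone()[0]
        self.assertEqual(count, 0)


def run_suite():
    """Load and run the test case, returning the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ContactStoreTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_suite()` runs both tests; in a real project a test runner would discover the class for you.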

I clicked through on this and browsed a few directories, none of which seemed likely. I do not see any `.py` files aside from an empty `__init__.py` and the `setup.py`. Are you sure a beginner just learning Python should see this?

The main codebase is in `inbox/`, with launcher scripts and tools in `bin/`.

(Might not be totally obvious, because the package namespace is called `inbox/` for legacy reasons.)

If a beginner wants to see real production code, rather than toy examples, I think it's inevitable that there will be some points of confusion. Part of the learning process is diving in and exploring and being okay with not totally understanding everything that's going on. :)

A large Python project that I haven't seen mentioned by others but that I find to be particularly well written and designed is the Pyramid web framework.

* https://github.com/Pylons/pyramid/

I agree with you. Pyramid is one of the best-designed codebases that I know.

The Django project is a good example of a large open source project which has aged well. http://github.com/django/django

I don't know what would be of my life right now without Django and I can't praise it enough... but "good example of readable python-code", it is not.

However, I think it's one of the best examples of phenomenal documentation.

I learned a lot from reading the Django codebase. I didn't read it like a book, I read parts of it as I tried to figure out how to do things in Django, but I never felt like the code was bad. This was around 5 years ago so it's possible things have gotten more convoluted since then.

If the OP wants to grok a small, discrete codebase then I agree that Django is not what he's looking for.

I know it is a good project, but would you recommend it for someone to read through the codebase?

(Honestly, looking at the repo, I don't even know where to start if I wanted to do a read through. Has anyone created a map? :)

No, definitely not. Django has aged well in terms of being alive, usable and comfortable to use in most use cases. But the codebase... Let me explain. Currently it works as "pay-to-get-a-feature" via crowdfunding. That is not a bad thing, it gives us good features we all need, but it is a bit sad when you compare it with Rails. That doesn't mean Django developers don't care about the code; they do. But the codebase is so big and complicated (because some code needs to be refactored) that it is common to see a bug that cannot simply be fixed in two lines, because some core member insists "this all should be reworked".

Have you considered building something with Django and figuring out the magic as you go along??

I would start with django.core.handlers.base.BaseHandler (https://github.com/django/django/blob/master/django/core/han...), which is Django's Front Controller.


By design, Mercurial has almost no dependencies, so it's very self-contained. I find this makes it a particularly easy codebase to get into.

If you're interested, I would love to walk you (or anyone else!) through it.

Check out boto. It's Amazon's official library for interacting with AWS. It is written and tested well. I use it every day.


I've had to read the source of boto since there were some boto exceptions I was seeing in stack traces in the fabric deploy process - and ugh, I found the code to be not very intuitive and the documentation poor. Anyone else run into the same issues with boto?

I've found the documentation (not the codebase itself) to be very hit-or-miss. Some components are very well documented, some not at all. Presumably due to maturity of different parts of the stack and the Boto library, but still frustrating from an end-user perspective.

Boto3 might also be worth a read. https://github.com/boto/boto3

Boto3 was made using a much more principled design approach, while Boto grew organically and frankly got a bit out of control. I love boto3, it makes AWS a joy to use.

1. The Python standard library (if you're on Linux, /usr/lib/python2.x or 3.x, depending on your version).

2. The Bazaar VCS is written entirely in Python, is very well documented and has a large test section. (www.launchpad.net/bzr)

I have been debating whether or not to read the standard library, pro - it's the standard library, if I learn it I may save myself innumerable hours from reinventing wheels, con - it may not be the best example of code (thinking about the 400 line function from vim to see if there is input from the keyboard. Justified in context, but would be a horrible way to learn some code)

Bazaar I may look into, as I know it's a very capable piece of software, and now I know it's well documented with tests makes it very relevant to my interests :)

Edit: I'm on windows, but I'm using vagrant with Ubuntu 14lts image for my development work. Specifically the data science vagrant box...

I have some experience with the Mercurial code base, which I thought was pretty well engineered. It's not PEP 8, though, so that makes it slightly idiosyncratic. Might make an interesting comparison with Bazaar, though!

In my opinion Mercurial's codebase is much better than Bazaar's. It's concise and to the point, uses classes when necessary and functions in other cases, following the logic is usually a straightforward task, and it educates well. On the other hand, Bazaar's codebase is a mess of hundreds of classes whose relations are often hard to understand; they are hard to navigate and IMO it's over-engineered. Even writing a plugin for bzr is hard, because there are plenty of docs which usually say nothing substantial and the codebase is so hard to navigate.

Disclaimer: contributed to mercurial (and wrote some plugins), wrote few plugins for bzr (it was used in one company I worked for).

P.S. Didn't notice right away, hey, Dirkjan! :)

I haven't used Mercurial yet, let alone glanced at the code base, but it might be worth a look ;-)

But honestly, the Bazaar code base is great. Great documentation at every level, and, as far as I can judge, some pretty good code too.

Thanks for that input, that is exactly what I am looking for.


That is actually a very cool idea, taking 2 codebases that work on the same domain and seeing the trade offs that each made.

Thank you! (And I had no idea that either bazaar or mercurial were written in Python, because them being 'serious' software I automatically assumed they were written in C. Color me stupid)

The Python standard library is hit-or-miss. Some parts are really excellent code. Other parts are horribly hacky crap from 1990 that can’t be ripped out or improved for backwards-compatibility reasons.

Thanks for that input, that is exactly what I thought it would be. My problem is that I am not able to discern the crap from good at my level.

I wholeheartedly recommend [scikit-learn](https://github.com/scikit-learn/scikit-learn) - the best organized and cleanest code I've seen so far. It is really organized and well thought-through.

I see nobody has recommended CherryPy: http://www.cherrypy.org/

It is a minimal web framework like Sinatra or Flask. The beautiful thing about CherryPy is you write code for it the same way you would write general Python code. I enjoy using it for small projects from time to time.


Bitbucket repository: https://bitbucket.org/cherrypy/cherrypy/overview

Here's the link to the Pandas DataFrame source: https://github.com/pydata/pandas/blob/master/pandas/core/fra...

We spent a month of Sundays going through this in the NYC Python office hours. You learn a lot about this object by reading the source, and the WTF per minute rate is fairly low.

The style is also fairly non-controversial.

youtube-dl: https://github.com/rg3/youtube-dl

I fell in love with this project after discovering I don't need ad-choked, dodgy sites to download Youtube videos/mp3s. It also acts as a catch-all downloader for a huge amount of other video hosting sites, despite the name. If you want to learn how to scrape different videos from many platforms, look at this:


The pep8 standard is also an easy read with so many useful explanations:


Don't let it turn you into a fucking pedant, though.

When it comes to maintaining standards in a code base, one needs to be a pedant.

The mentality of someone who thinks that an 89-character line that ends with " # noqa" is better than the 81-character line without that ending because now it passes flake8 is one I'll never understand.

Never seen someone do that. That's pretty bad.

I was more defending being a pedant about whatever coding standards your team agrees upon rather than any specific PEP-8 standard.

It is no problem to increase PyLint's line length limit if your team wants to use something other than the PEP-8 79 character limit.

I don't disagree, but keep in mind that on a large codebase you want to avoid false negatives in terms of pep8 errors, i.e. if you see 100s of errors every time you build, you start to ignore them.

+1 to that.

Guido van Rossum, the creator of Python, co-wrote a web crawler in under 500 lines: https://github.com/aosabook/500lines/tree/master/crawler

It's especially interesting because it takes advantage of new Python features like asyncio.
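
If you have not met asyncio yet, here is a tiny sketch of the core idea, several "fetches" in flight at once on a single thread. It is not taken from the crawler: `fake_fetch` is a made-up stand-in that sleeps instead of touching the network, and the `async`/`await` syntax plus `asyncio.run` assume Python 3.7 or later.

```python
import asyncio


async def fake_fetch(url, delay):
    # Hypothetical stand-in for a network request.
    # Awaiting asyncio.sleep hands control back to the event loop.
    await asyncio.sleep(delay)
    return f"body of {url}"


async def crawl(urls):
    # Start all the "fetches" together and wait for every one to finish.
    # gather() returns results in the order the awaitables were passed in.
    pending = [fake_fetch(url, 0.01) for url in urls]
    return await asyncio.gather(*pending)


pages = asyncio.run(crawl(["http://example.com/a", "http://example.com/b"]))
```

Both sleeps overlap, so the whole thing takes roughly one delay rather than two.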

I learned a lot about Python by reading through the Tornado codebase. It's pretty easy to read, well broken up into functions, and not too big.

I would recommend reading about the Tornado WebServer. It features some nice stuff such as coroutines, async stuff.


NetworkX - https://networkx.github.io/ Good example of object-oriented programming patterns (mixins) in Python and module organization.

Thanks! Not sure if I am ready to take on OOP and mixins in Python yet, but I do like the idea of module organization so it's definitely on my list now.


1 - (web, network, file system, and more).

    tornado - 
2 - scapy

    The entire product runs on the python CLI


NLTK: https://github.com/nltk/nltk with the documentation at http://www.nltk.org. I found the code easy to follow through.

I referred to it when adding tests for tokenizers in a common lisp NLP application: https://github.com/vseloved/cl-nlp/.

Agreed. NLTK's value is not that it should be your go-to source for NLP algorithms -- when there's a specific task you need to accomplish, there'll be a specific solution by now that works better than NLTK.

NLTK's value is that it shows you how to write NLP algorithms, and gives you an understandable starting point when you need to do something that nobody has implemented yet.

Agreed: I also learned a lot from nltk, esp. freqdist/probdist as well as the classifiers in nltk and sklearn.

Also, how about examples of good web applications built on python with available source code?

Rather than seeing the code of great libraries, I sometimes want to see how people use them in the real world.

Gentoo Portage package manager. It's a big project, lots of moving parts, actively developed. Really helped me with learning Python.

I think Radon (https://github.com/rubik/radon/) is a good example of clean code. It's quite a small project, but it also gets all the setup chores right (like setup.py, testing, coverage, travis integration, etc.), which a beginner may find interesting.

Also, I second all the suggestions about Kenneth Reitz code, it's wonderful!

Openstack does a large amount of testing for their code[1] but there is a huge amount of it. Barbican[2] is one of the newer, less crufty components.

[1]: https://github.com/openstack/openstack

[2]: https://github.com/openstack/barbican

Openstack is so large it also has a plethora of examples of bad code. Especially in vendor-contributed drivers and such. I wouldn't pay attention to the code so much as the incredible infrastructure and process that manages it and improves it over time.

Seriously, I've seen "synchronization threads" that use no synchronization primitives and have a single bare exception handler trapping the critical section. The handler just restarts the computation... leading to fun times.

Cinder also has an interesting problem in the explosion of ABC "mixins" used for constructing a backend driver. It went from 5 in the previous release to something like 25 presently. A patch to fix it: https://review.openstack.org/#/c/201812/

But even the process isn't perfect. I've seen patches that change a couple of log lines get dog-piled with -1 nit-picks about inconsequential wording and take months to get through.

Openstack is an interesting beast but it's not a good example if you're just learning, IMO.

I happen to agree, I've been working on it for the past year and it's sort of killed my drive to work with something 'big' again.

That being said Barbican is quite good, which I think is due to its proof of concept being written in Go[1] before being ported.

If you want to learn decent testing Openstack is a good example, the code as a whole not so much. This also only applies if you want to use unittest or their custom test module, testtools. py.test is a much nicer way to do testing IMO

[1]: https://www.youtube.com/watch?v=245rSZBdm9s

I still don't get why they write only in Python (Horizon also has Less and JS). A lot of the code would be way cleaner and clearer if they didn't use Python that much, especially in Horizon. They use Django mainly for the sake of using Django; they need almost nothing from it. Most of what they did is quite hacky. I mean, Horizon is full of magic to do really simple things, and a lot of code got glued together over time.

Hey, thanks for that type of input. This is exactly why I posed the question on HN. I really have no idea on how to judge Python code, so that's why I put the question to the community. I am looking for good code, but I don't (can't) know it when I see it, so I trust you guys to lead me to it.

Thank you for all the good replies peeps!

Cool, thanks, didn't realize openstack was Python. It may be too much for me to take on at my level, but it is definitely on my 'to look at' list.


I would also add

* Twisted (https://github.com/twisted/twisted)

For async python.

I disagree. Twisted is a sprawling codebase, it started as a game library that turned into an async library along the way, you need to read books to get the full documentation, and some of it doesn't even have docstrings or comments.

You are right, it is not the best example of an open source project, and from what you say it is not a good example of a Python project either. But is there any other place where you can get the hang of working async with Python? It may be hard, but you would learn a lot.

But maybe there are other async python projects that I don't know of. If you know of any please post them, I would also like to learn more about the subject.

I'm just a user and not a contributor (I don't know the source code), but the project is following good webdev techniques:

* https://github.com/web2py/web2py/

* https://github.com/web2py/pydal

Django, Tornado, Sentry, Newsblur

Really the hard way?

https://hg.python.org/cpython/file/3.5/Lib/collections/__ini... Knowing the specialized data structures in a language is always important, and this is well coded.
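
Before reading that source, it may help to see a few of the `collections` types in use. This is just a small sketch with made-up data, not anything from the linked file.

```python
from collections import Counter, defaultdict, deque, namedtuple

words = "the quick brown fox jumps over the lazy dog the end".split()

# Counter: tally hashable items.
counts = Counter(words)
most_common_word, most_common_count = counts.most_common(1)[0]

# defaultdict: group items without first checking whether the key exists.
by_length = defaultdict(list)
for word in words:
    by_length[len(word)].append(word)

# deque with maxlen: keep only the last N items seen.
last_three = deque(words, maxlen=3)

# namedtuple: a lightweight immutable record with named fields.
Point = namedtuple("Point", ["x", "y"])
p = Point(x=1, y=2)
```

Each of these replaces a few lines of hand-rolled dict or list bookkeeping, which is a good reason to know they exist.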

Well, see the 2 lines I pointed to in one of the standard library modules in Python and you will understand that even in a good language there will always be dust under the carpet at some point. https://hg.python.org/cpython/file/5c3812412b6f/Lib/email/_h...

Thanks for the recommendations, I think they may be a bit above me at the moment (I don't have enough knowledge of Python to be able to identify dust under the carpet :)

But you did teach me one thing, I didn't know that the """ could be used for commenting. I thought it was only for multi-line print statements.

These are exactly the type of things I hope to learn while code-reading.


Even better: If you put the string right after your function/class definition, python automatically assigns it to the `__doc__` attribute of the object. This is how the `help` function works.
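
A quick sketch of that behaviour (the function and class here are invented for the example):

```python
def greet(name):
    """Return a friendly greeting for name."""
    return "Hello, " + name + "!"


class Stack:
    """A minimal last-in, first-out container."""

    def __init__(self):
        self._items = []


# A string literal that is the first statement in the body becomes __doc__.
func_doc = greet.__doc__
class_doc = Stack.__doc__


def no_doc():
    x = 1
    """Not a docstring: it is not the first statement, so it is just discarded."""
    return x
```

Running `help(greet)` in the interpreter shows the signature together with that first string, while `no_doc.__doc__` is `None`.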


As ironic(?) as this sounds, where is this doc feature documented?

(Part of my learning is that I am trying hard to learn how to navigate the various Python documentation. For example, I had a hell of a time trying to find a list of format strings, as different sources refer to it as different things)

PEP-257 is where you can find documentation on them.


Thanks for that, and thank you for getting me into the whole PEP thing. Previously I had only heard about PEP 8, and I thought that was just it, an opinionated piece about how to write Python. Now I have a whole other source of information to look at.

And for people who will no doubt call me out on this, I have been working on Zed's LPtHW for about a month. I have done numerous searches regarding Python problems, and this is literally the first time I have come across a PEP reference to look at besides PEP 8.

The features added in PEPs should be incorporated into the documentation -- if they're not, please file a bug -- so people mostly don't refer to PEPs except for historical purposes.

It's not code, but also see Raymond Hettinger's talk "Beyond PEP 8" on YouTube. Actually, pretty much any talk by Hettinger is worth watching:


PEPs are basically the Python community's RFCs (if you're familiar with those).

BTW, what you call "an opinionated piece on how to write Python" was originally written by Guido van Rossum, the guy who invented Python in the first place. I think that gives him the right to say something on the topic...

(You ought to know that Python has a large following with a strong community feeling. Making negative remarks about their BDFL - benevolent dictator for life - van Rossum does not go down well ;-) )

I do know the idea behind RFCs.

And I hope that other people don't think I used the wording 'opinionated' in a negative way. I do know about the concept of a BDFL, and I do know that Guido was behind those comments.

But I still stand behind my assessment that the PEP 8 comments are opinionated. And maybe rightly so. ;)

Doesn't Guido himself preface some of the things in PEP8 as being his pet peeves?

I think you'll find lots of people in the Python community who think that Python 3 was a mistake, that GvR's insistence on artificially weakening lambdas is a mistake, etc. Disagreeing with style is pretty minor.

Doesn't make it any less of a community. Families can disagree with each other and still love each other.

For the benefit of other novices like me, here [0] is the index for all PEPs.


You might actually find Dogpile (also by Mike Bayer) more approachable and useful in this context. Even the separation of the project into logically distinct packages is instructive.

- https://bitbucket.org/zzzeek/dogpile.core

- https://bitbucket.org/zzzeek/dogpile.cache

This one could be very interesting as I have decades of experience with DBs and SQL, so I was always wondering about ORMs, and sqlalchemy seems to be best in field at the moment.

Thanks for the recommendation.


Large pieces of SQLAlchemy are about metaprogramming and some syntactic magic required for an easy-to-use ORM. It's mostly nice on the inside (to my cursory glance), but neither simple nor easy for an inexperienced person.

Still, it's a good example of how to make many relatively complex parts easily composable and efficient.

For the web developers out there, what do you think of reddit? Any honest commentary on it? https://github.com/reddit/reddit

Any good examples with a (hopefully multiplatform) GUI?
