
Ask HN: Good python code for code reading - btw0
Reading good Python code must be an enjoyable learning experience. Any suggestions?
======
jnoller
I would definitely recommend syncing the Python Subversion tree and picking a
few modules and reading through them. Doing so taught me some of the less
obvious things within the language, and also taught me a lot about the various
dunder (__foo__) methods for objects.
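A small illustration of the kind of thing reading the standard library teaches about dunder methods: the interpreter calls these hooks implicitly for `len()`, indexing, iteration, `in`, `repr()`, and `==`. The `Deck` class is a made-up example, not something taken from the stdlib.

```python
class Deck:
    """Hypothetical toy container showing dunder methods Python calls implicitly."""

    def __init__(self, cards):
        self._cards = list(cards)

    def __len__(self):  # called by len(deck)
        return len(self._cards)

    def __getitem__(self, i):  # deck[i], slicing, and iteration as a fallback
        return self._cards[i]

    def __contains__(self, card):  # card in deck
        return card in self._cards

    def __repr__(self):  # repr(deck) and the interactive prompt
        return f"Deck({self._cards!r})"

    def __eq__(self, other):  # deck == other
        if not isinstance(other, Deck):
            return NotImplemented
        return self._cards == other._cards


deck = Deck("abc")
print(len(deck), deck[1], list(deck), "a" in deck, deck)
```

Note that iteration works without an `__iter__` method: `iter()` falls back to calling `__getitem__` with 0, 1, 2, ... until `IndexError` is raised.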

Additionally, I'd recommend reading through the PEP documents
(<http://python.org/dev/peps/>) - there are a lot of great examples and
rationales in them.

Finally, Doug Hellmann has done an excellent job with his Python Module of the
Week series (<http://www.doughellmann.com/PyMOTW/>), and a new project, "The
Hazel Tree" (<http://www.thehazeltree.org/>), is doing a great job of pulling
the various examples, docs, etc. together in one place.

~~~
Jasber
This was a major eye-opener for me as well. I jumped into various Python
modules and was amazed at what I discovered, like the Easter egg hidden away
in "this.py":

<http://pastebin.com/f25f08f20>

(Couldn't figure out how to get code formatting to work properly)

~~~
andyn
Ha. All that code at the bottom of the file could be replaced by "print
s.decode('rot13')" - I suppose it's backwards compatible though...

~~~
ivank
Or forward-compatible, since encode/decode in py3k is between bytes and
unicode only.
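For comparison, a Python 3 sketch of both approaches: building the rot13 table by hand, roughly the way this.py does, versus the one-liner. In Python 3 `str` has no `.decode()`, but the rot13 text transform is still reachable through `codecs.decode`. The function names are illustrative.

```python
import codecs

SECRET = "Gur Mra bs Clguba"  # opening words of the scrambled string in this.py


def rot13_by_hand(s):
    # Map each letter 13 places along the alphabet, wrapping within its case;
    # non-letters pass through unchanged.
    table = {}
    for base in (ord("A"), ord("a")):
        for i in range(26):
            table[chr(base + i)] = chr(base + (i + 13) % 26)
    return "".join(table.get(ch, ch) for ch in s)


def rot13_codec(s):
    # Python 3 equivalent of the old s.decode('rot13')
    return codecs.decode(s, "rot13")


print(rot13_by_hand(SECRET))  # The Zen of Python
print(rot13_codec(SECRET))    # The Zen of Python
```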

------
icey
It's an essay that contains code, but this is one of my all time favorites:

Peter Norvig writes a spelling corrector in 21 lines of Python:
<http://norvig.com/spell-correct.html>
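A condensed sketch of the approach the essay describes: count words in a corpus as the language model, generate every string one or two edits away from the input, and keep the known candidate with the highest count. The tiny corpus string is a stand-in for the large training text the essay uses, so treat this as an illustration rather than a copy of Norvig's code.

```python
import re
from collections import Counter


def words(text):
    return re.findall(r"[a-z]+", text.lower())


# Toy corpus (an assumption); a real corrector needs a large body of text.
WORDS = Counter(words("the quick brown fox jumps over the lazy dog and the cat"))


def P(word, n=sum(WORDS.values())):
    # Language model: relative frequency of the word in the corpus.
    return WORDS[word] / n


def edits1(word):
    # Every string one delete, transpose, replace, or insert away from word.
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def edits2(word):
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}


def known(ws):
    return {w for w in ws if w in WORDS}


def correction(word):
    # Prefer the word itself, then distance-1 candidates, then distance-2.
    candidates = known([word]) or known(edits1(word)) or known(edits2(word)) or {word}
    return max(candidates, key=P)


print(correction("teh"))    # the
print(correction("quikc"))  # quick
```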

~~~
jcl
Norvig's sudoku-solving essay is pretty awesome, too (100 lines of Python).

<http://norvig.com/sudoku.html>

~~~
vlad
Speaking of Python code, are there any cool data structures and such that are
better or easier to use than Java's?

And here's my take on Norvig's sudoku solver and spelling corrector, which
people have posted in this thread.

I wrote a sudoku solver in Java for a class a few weeks ago. It uses depth-first
search plus backtracking, which makes it very memory-efficient: it uses a single
matrix, whereas Norvig's solution can create more and more variations of the
board in memory at the same time. That isn't a big deal for a 9x9 board, but my
solver, which is probably also 100 or 200 lines of code, can solve sudoku
problems on a board of any size, including a 16x16 one I found on a web site,
and even 100x100. When I made up a 100x100 puzzle with maybe 8 values filled
in, assuming there must be a solution, I ended up killing the program after 20
minutes because I had to go to class. :) Also, I'm going to have to read what
he did more carefully later, as it seems he describes many cool approaches.
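A minimal Python sketch of the single-grid depth-first search with backtracking described above. The poster's Java solver isn't shown, so this is an illustration, not his code; `solve` and `fits` are hypothetical names. It handles any N x N board where N is a perfect square, and since it does no constraint propagation, large sparse boards can take a very long time.

```python
import math


def fits(grid, r, c, v, box):
    """True if value v can go at (r, c) without repeating in its row, column, or box."""
    n = len(grid)
    if any(grid[r][j] == v for j in range(n)):
        return False
    if any(grid[i][c] == v for i in range(n)):
        return False
    br, bc = r - r % box, c - c % box  # top-left corner of the box
    return all(grid[i][j] != v
               for i in range(br, br + box) for j in range(bc, bc + box))


def solve(grid):
    """Fill the 0 cells of grid in place; return True if a solution was found.

    Only one grid exists: each guess is written into it and undone on failure,
    so memory use stays proportional to the board plus the recursion depth.
    """
    n = len(grid)
    box = math.isqrt(n)
    for r in range(n):
        for c in range(n):
            if grid[r][c] == 0:
                for v in range(1, n + 1):
                    if fits(grid, r, c, v, box):
                        grid[r][c] = v
                        if solve(grid):
                            return True
                        grid[r][c] = 0  # undo the guess (backtrack)
                return False  # no value fits this empty cell
    return True  # no empty cells left


puzzle = [[1, 0, 0, 0],
          [0, 4, 0, 0],
          [0, 0, 4, 0],
          [0, 0, 0, 1]]
print(solve(puzzle), puzzle)
```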

And I wrote a spell checker a few years ago, when I was maybe 20, based on an
idea I'd read about: drop the vowels, replace the consonants in each word with
their phonetic sound (there are something like 9 possibilities), and compare
the result against the phonetic spelling of each dictionary word. In other
words, you shrink each word down to its phonetic sounds, so words that sound
alike or very close end up looking alike. Find a list of suggestions based on
how close the phonetic spellings of the dictionary words are to the phonetic
spelling of the misspelled word (a word that's not in the dictionary), then
order the suggestions by how close each actual dictionary word is to the
actual misspelled word. It worked very well. I added endings like -ing and
plurals, and the suggestions ended up being incredibly cool. Again, I think
this is more useful than Norvig's example, because my spell checker could
suggest words that aren't spelled even remotely close but could be what the
user meant. Norvig's only suggests corrections for a misspelled word with a few
letters transposed or missing, as long as most or all of the real letters are
in fact there; mine didn't require even a single real letter to match or
appear in the misspelling. It also didn't need a trained model.
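A rough Python sketch of the two-stage phonetic approach described above: shortlist dictionary words whose phonetic key is close to the misspelling's key, then rank the shortlist by plain spelling similarity. The poster's own table of about 9 sound classes isn't given, so the consonant classes here are borrowed from American Soundex (an assumption), and `difflib` similarity stands in for whatever closeness measure he used.

```python
import difflib

# Consonant classes borrowed from American Soundex (assumption; the original
# scheme isn't described). Vowels and h, w, y are dropped entirely.
CLASSES = {}
for letters, code in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                      ("l", "4"), ("mn", "5"), ("r", "6")]:
    for ch in letters:
        CLASSES[ch] = code


def phonetic_key(word):
    """Map each consonant to its class digit, drop everything else,
    and collapse adjacent letters of the same class (so 'll' or 'ck' count once)."""
    key, prev = "", ""
    for ch in word.lower():
        code = CLASSES.get(ch, "")
        if code and code != prev:
            key += code
        prev = code
    return key


def similarity(a, b):
    return difflib.SequenceMatcher(None, a, b).ratio()


def suggest(word, dictionary, key_cutoff=0.75, limit=5):
    # 1. Shortlist words whose phonetic key is close to the misspelling's key.
    wkey = phonetic_key(word)
    shortlist = [d for d in dictionary
                 if similarity(wkey, phonetic_key(d)) >= key_cutoff]
    # 2. Order the shortlist by how close the actual spellings are.
    shortlist.sort(key=lambda d: similarity(word.lower(), d.lower()), reverse=True)
    return shortlist[:limit]


print(phonetic_key("Graham"), phonetic_key("Garahum"))  # 265 265
print(suggest("fone", ["phone", "bone", "cat"]))
```

Because the key ignores vowels and treats letters like `f` and `p` as the same class, "fone" and "phone" get identical keys even though they don't start with the same letter, which is the property the poster describes.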

Finally, Peter says he's amazed that others don't realize how a spell checker
might work, and I'm amazed he didn't consider that Google very likely harvests
search queries to draw conclusions from user behavior, e.g. "a user got 3
results, corrected some words, and now got 20,000 results, so those words are
either related or misspellings of each other." I thought Google might be doing
this back in 2004, if not earlier, so that it could suggest alternative
spellings for queries that might not even be dictionary words, like names of
celebrities. That is way more obvious to me than just a spell checker.

I once googled a theorem, and the #1 result was my math professor's web page
describing it. The next day I searched for it again and noticed that Google was
redirecting the result links to track them, which I'd seen it do from time to
time (the results would take you to what I assumed was a Google counter first
and then to the actual page, instead of directly to the page as usual, so I
figured Google was collecting stats on its users' patterns).

So I clicked on the #2 link a couple of times, waiting 30 seconds or so each
time so that Google would believe it was a good result (i.e. that I didn't hit
the back button right away, implying I hated the result; at least, that's what
I imagined it might be detecting and what might make a difference), and then
refreshed the search results page. Now, my professor's page had swapped places
with the previous #2 result!

So this shows that Google does use user queries and behavior to improve its
results. And right now, you can search for "Pauel Garahum" and it knows who you
mean. It might be using a cool spell checker or phonetic spelling methods, or,
even better and cooler, it might simply track that a previous user searched
for this, got no results, edited the query just slightly, got 20,000 results,
went to one of them, and didn't come back to Google for 2 hours. That user was
happy, so we can suggest to the current user, who is running the same bad or
misspelled search as others have in the past, the query those users changed
theirs to after finding nothing. (Then refine this until you can draw logical
conclusions on a regular basis, live, without needing a page of no results to
trigger the logic, etc.)
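A toy sketch of the query-log idea described above, under loudly stated assumptions: the log format (per-session lists of `(query, result_count)` pairs), the function names, and the sessions are all made up, dwell time is ignored, and this illustrates the poster's hypothesis, not anything Google has documented.

```python
from collections import Counter, defaultdict


def mine_reformulations(sessions):
    """For each zero-result query, count which follow-up query in the same
    session returned results. Each session is a list of (query, result_count)
    pairs in the order the user typed them (hypothetical log format)."""
    fixes = defaultdict(Counter)
    for session in sessions:
        for (q1, n1), (q2, n2) in zip(session, session[1:]):
            if n1 == 0 and n2 > 0 and q1 != q2:
                fixes[q1][q2] += 1
    return fixes


def suggest_from_logs(query, fixes):
    """Return the most common successful rewrite of query, or None."""
    if query not in fixes:
        return None
    return fixes[query].most_common(1)[0][0]


# Made-up toy sessions, for illustration only.
sessions = [
    [("pauel garahum", 0), ("paul graham", 20000)],
    [("pauel garahum", 0), ("paul graham", 20000)],
    [("pauel garahum", 0), ("paul gresham", 50)],
]
fixes = mine_reformulations(sessions)
print(suggest_from_logs("pauel garahum", fixes))  # paul graham
```

As jcl points out in the reply, this only helps for misspellings someone has already typed and fixed; a model that scores candidates directly handles novel ones.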

~~~
jcl
In my mind, Python's most important data structures are its set, tuple,
dictionary, and list. While they are no more powerful than what you can find
or make in Java, they are extremely convenient to use. Note that Norvig solves
both problems exclusively using these structures.
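To make that concrete, here's a sketch of the sudoku bookkeeping built only from strings, lists, tuples, sets, and dicts. It follows the naming Norvig's essay uses (`squares`, `units`, `peers`), but details may differ from his actual code.

```python
rows, cols = "ABCDEFGHI", "123456789"


def cross(a, b):
    # All concatenations, e.g. cross("AB", "12") -> ["A1", "A2", "B1", "B2"]
    return [x + y for x in a for y in b]


squares = cross(rows, cols)  # 81 square names, "A1" .. "I9"
unitlist = ([cross(rows, c) for c in cols] +       # 9 columns
            [cross(r, cols) for r in rows] +       # 9 rows
            [cross(rs, cs)                         # 9 boxes
             for rs in ("ABC", "DEF", "GHI")
             for cs in ("123", "456", "789")])
# dict: square -> the 3 units (row, column, box) that contain it
units = {s: [u for u in unitlist if s in u] for s in squares}
# dict: square -> set of the other squares sharing a unit with it
peers = {s: set().union(*units[s]) - {s} for s in squares}

print(len(squares), len(unitlist), len(units["C2"]), len(peers["C2"]))  # 81 27 3 20
```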

Norvig's sudoku solver _is_ using depth-first search and backtracking,
implemented in the function "search". I'm not sure where you are getting the
idea that it is simultaneously using many more boards than the search depth.

Google could well be using every clever trick you can think of to implement
their spelling corrector, but I think you're overestimating the value of
tracking user variations over multiple searches. It's highly unlikely that
someone searched for "Pauel Garahum" in the past, then corrected it to "Paul
Graham". Likewise, you can search for "brootnenny spars" and it comes up with
a good suggestion. More likely, they are using the search frequency as an
indicator of correctness (P(c) in Norvig's article) and coming up with a
better error model (P(w|c)), probably using phonetics as you proposed earlier.
And once you have this, you don't really need to go through the effort of
correlating search variations.
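For reference, the decomposition jcl is pointing to: Norvig's corrector picks the candidate correction c that is most probable given the typed word w. By Bayes' rule, and because P(w) is the same for every candidate, it drops out, leaving the language model P(c) times the error model P(w|c):

```latex
\operatorname*{argmax}_{c \in \text{candidates}} P(c \mid w)
  = \operatorname*{argmax}_{c \in \text{candidates}} \frac{P(c)\, P(w \mid c)}{P(w)}
  = \operatorname*{argmax}_{c \in \text{candidates}} P(c)\, P(w \mid c)
```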

It is, however, well-known that Google tracks the links that people click on
and uses this information to improve search rankings. They may well be
tracking whether or not the user clicks on results and using this fact to
improve their estimate of the correctness of the search.

...And I'm not sure if you know this, but the reason Norvig specifically
mentions _Google's_ spelling correction is because he is Google's Director of
Research. He didn't "not realize" that Google could be using search results to
improve spelling correction; he intentionally left stuff out because the
article is only supposed to be an introduction to spelling correction.

------
kilowatt
I learned quite a bit from web.py
(<http://github.com/webpy/webpy/tree/master>). It's small enough to be fun to
poke through, but has more than enough "advanced" Python tricks to be worth
your while.

------
pogos
BitTorrent <http://download.bittorrent.com/dl/>

~~~
notdarkyet
<http://www.onlamp.com/pub/a/python/2003/7/17/pythonnews.html>

This article might help you along the way. It links to Bram Cohen's
blog post titled "How to Write Maintainable Code". I have never actually
looked into the BitTorrent code myself, but depending on your skill level,
understanding what's going on in there might be a bumpy ride.

------
mamama
The Cookbook (Amazon it, I'm too sleepy to link) contains examples of
idiomatic code that you should use.

~~~
silentbicycle
<http://oreilly.com/catalog/9780596001674/>

------
pistoriusp
I've always found Django to be a very clean code base.

------
bayareaguy
Python ships with plenty of good python code. Just take your time and read
through the Lib directory of your standard python distribution.

<http://svn.python.org/view/python/trunk/Lib>

------
olefoo
Mailman <http://list.org/>

Things you should be looking at are queues and error handling in an
asynchronous message passing architecture.

Also you'll learn that not every web application needs an SQL database for
persistence.

------
alecco
For performance and algorithms, look at the implementations at
<http://shootout.alioth.debian.org/u32q/python.php> and pay attention to the
different benchmarks.

Also look at all the RPython code coming from PyPy, some of which is shown on
<http://morepypy.blogspot.com/>

They are reimplementing all the C modules and doing a great job. The new
implementations are, of course, closer to current best practices in Python.

Enjoy.

------
llimllib
pyblosxom is a nice project to hack on; you can read and understand the whole
of the code in a day, and it illustrates the "request handler with filters"
design pattern very nicely. <http://pyblosxom.sourceforge.net/>
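A generic sketch of the "request handler with filters" pattern in Python: the request passes through an ordered list of filters, each of which may inspect or rewrite it, before a final renderer produces the response. This is not pyblosxom's plugin API; `handle`, `strip_trailing_slash`, `add_default_format`, and `render` are hypothetical names.

```python
from typing import Callable, Dict, List

Request = Dict[str, str]
Filter = Callable[[Request], Request]


def handle(request: Request, filters: List[Filter],
           render: Callable[[Request], str]) -> str:
    """Run the request through each filter in order, then hand it to the renderer."""
    for f in filters:
        request = f(request)  # each filter returns a (possibly modified) request
    return render(request)


def strip_trailing_slash(req: Request) -> Request:
    return {**req, "path": req["path"].rstrip("/") or "/"}


def add_default_format(req: Request) -> Request:
    # Only fills in "format" if an earlier stage or the client didn't set it.
    return {"format": "html", **req}


def render(req: Request) -> str:
    return f"{req['format']}:{req['path']}"


print(handle({"path": "/blog/2009/"},
             [strip_trailing_slash, add_default_format], render))  # html:/blog/2009
```

Keeping each filter a plain function from request to request makes the stages easy to reorder, test in isolation, and add as plugins.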

------
stuartcw
I enjoyed reading the code from "Hacking RSS and Atom" by Leslie M. Orchard
(ISBN: 978-0-7645-9758-9). There's a lot more to it than just RSS related code
and if you read the book you get the explanation too.

------
nirs
Twisted is very clean and readable.

------
intellectronica
The Zope3 codebase.

