Ask HN: Good python code for code reading
62 points by btw0 on Oct 9, 2008 | hide | past | favorite | 22 comments
Reading good Python code should be an enjoyable learning experience. Any suggestions?


I would definitely recommend syncing the Python Subversion tree and picking a few modules to read through. Doing so taught me some of the less obvious things within the language, and also taught me a lot about the various dunder (__foo__) methods on objects.
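To make that concrete, here's a tiny made-up example of the kind of dunder-method code you'll run into in the standard library (this class is hypothetical, not from any stdlib module):

```python
class Deck:
    """A deck of cards. Defining just __len__ and __getitem__ makes
    the class work with len(), indexing, slicing, iteration, and `in`."""

    def __init__(self):
        # 13 ranks x 4 suits = 52 cards like '2S', 'TH', 'AC'
        self.cards = [r + s for s in 'SHDC' for r in '23456789TJQKA']

    def __len__(self):
        return len(self.cards)

    def __getitem__(self, i):
        return self.cards[i]
```

With only those two methods, `for card in Deck()` and `Deck()[:3]` both work, because Python falls back on __getitem__ for iteration.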

Additionally, I'd recommend reading through the PEP documents (http://python.org/dev/peps/) - there are a lot of great examples and rationales in those.

Finally, Doug Hellmann has done an excellent job with his Python Module of the Week series (http://www.doughellmann.com/PyMOTW/), and a new project, "The Hazel Tree" (http://www.thehazeltree.org/), is doing a great job of compiling the various examples, docs, etc. together in one place.


This was a major eye-opener for me as well. I jumped into various Python modules and was amazed at what I discovered, like the Easter egg hidden away in "this.py":

http://pastebin.com/f25f08f20

(Couldn't figure out how to get code formatting to work properly)


Ha. All that code at the bottom of the file could be replaced by "print s.decode('rot13')" - I suppose it's backwards compatible though...


Or forward-compatible, since encode/decode in py3k is between bytes and unicode only.
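For what it's worth, the `s.decode('rot13')` one-liner only works on Python 2 strings; in Python 3 the same trick goes through the codecs module's text transform (a sketch, using a sample rot13 string rather than this.py's actual contents):

```python
import codecs

# rot13 is its own inverse, so encode and decode do the same thing.
s = "Gur Mra bs Clguba, ol Gvz Crgref"
print(codecs.decode(s, 'rot13'))  # The Zen of Python, by Tim Peters
```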


It's an essay that contains code, but this is one of my all time favorites:

Peter Norvig writes a spelling corrector in 21 lines of Python: http://norvig.com/spell-correct.html
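The heart of the approach is generating every string one edit away from the input and ranking the candidates by corpus frequency. Roughly (a paraphrase of the idea, not Norvig's exact code):

```python
def edits1(word):
    """All strings one edit away: a deletion, an adjacent transposition,
    a single-letter replacement, or a single-letter insertion."""
    letters = 'abcdefghijklmnopqrstuvwxyz'
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)
```

The corrector then picks whichever candidate has the highest frequency in a training corpus, falling back to edits of edits for distance-2 errors.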


Norvig's sudoku-solving essay is pretty awesome, too (100 lines of Python).

http://norvig.com/sudoku.html


Speaking of Python code, are there any cool data structures and such that are better or easier to use than Java's?

And here's my take on Norvig's sudoku solver and spell checker that people have posted in this thread.

I created a sudoku solver in Java for a class a few weeks ago. It uses depth-first search plus backtracking, which makes it very memory-efficient: it works on a single matrix, whereas Norvig's solution can end up with more and more variations of the board in memory at the same time. That isn't a big deal for a 9x9 board, but my solver, which is probably also 100 or 200 lines of code, can solve boards of any size, including a 16x16 I found on a web site, and even 100x100. When I made up a 100x100 puzzle with maybe 8 values filled in, figuring there must be a solution, I ended up killing the program after 20 minutes because I had to go to class. :) I'll have to read what he did more carefully at a later point, as he seems to describe many cool approaches.

I also wrote a spell checker a few years ago, when I was maybe 20, based on an idea I'd read about: strip the vowels and map the consonants of a word to their phonetic sounds (there are something like 9 classes), then compare the result against the phonetic spelling of each dictionary word. In other words, you shrink a word down to its phonetic skeleton, so that words that sound alike or very close collapse together. You build a list of suggestions based on how close the phonetic form of each dictionary word is to the phonetic form of the misspelled word (the word that's not in the dictionary), then order the list by how close the actual dictionary word is to the actual misspelled word. It worked very well. I added handling for endings like -ing and pluralization, and the suggestions ended up being incredibly good. I actually think this is more useful than Norvig's example: my checker could suggest words that aren't spelled even remotely like the input but could still be what the user meant, while Norvig's only suggests corrections for a misspelled word with a few letters transposed or missing, as long as most or all of the real letters are there. Mine didn't require even a single real letter to match or appear in the misspelling, and it didn't need training models.
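That consonant-class idea sounds a lot like Soundex. A rough sketch of a simplified Soundex (my own simplification for illustration, not the commenter's actual code; real Soundex also treats h and w specially):

```python
def soundex(word):
    """Simplified Soundex: keep the first letter, map the remaining
    consonants to digit classes, drop vowels and adjacent duplicates,
    and pad/truncate to 4 characters."""
    codes = {'b': '1', 'f': '1', 'p': '1', 'v': '1',
             'c': '2', 'g': '2', 'j': '2', 'k': '2', 'q': '2',
             's': '2', 'x': '2', 'z': '2',
             'd': '3', 't': '3', 'l': '4', 'm': '5', 'n': '5', 'r': '6'}
    word = word.lower()
    out, prev = [], codes.get(word[0], '')
    for ch in word[1:]:
        code = codes.get(ch, '')
        if code and code != prev:
            out.append(code)
        prev = code
    return (word[0].upper() + ''.join(out) + '000')[:4]
```

Words that share a code (e.g. "Robert" and "Rupert") become suggestion candidates, which you can then rank by edit distance to the literal misspelling, as the comment describes.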

Finally, Peter says he's amazed that others don't realize how a spell checker might work, and I'm amazed he didn't consider that Google very likely harvests search queries and makes logical inferences from user behavior, e.g. "a user got 3 results, corrected some words, and then got 20,000 results, so those words are probably either related or misspellings of each other." I suspected Google was doing this back in 2004, if not earlier, in order to suggest alternative spellings for queries that aren't even dictionary words, like names of celebrities. That seems far more obvious to me than just a spell checker.

I once googled for a theorem, and the #1 result was my math professor's web page describing it. The next day I ran the same search and noticed that Google was redirecting the result links in order to track them, which I'd seen happen from time to time: the results would take you through what I assumed was a Google counter before reaching the actual page, instead of linking there directly, so Google was (I assumed) collecting stats on its users' click patterns.

So I clicked on the #2 link a couple of times, waiting 30 seconds or so each time so that Google would believe it was a good result (i.e., that I hadn't hit the back button right away, which would imply I hated it - at least, that's what I imagined might be happening and what it might be detecting), and then refreshed the results page. Now my professor's page had swapped places with the previous #2 result!

So this shows that Google does use user queries and behavior to improve its results. And right now, you can type in a search for Pauel Garahum and it knows who that is. It might be using a cool spell checker, it might use phonetic spelling methods, or - even better and cooler - it might simply have tracked that a previous user ran this search, got no results, edited the query slightly, resubmitted it successfully with 20,000 results, went to one of them, and didn't come back to Google for 2 hours - so that user was happy. That means that when another user runs the same bad or misspelled search, you can suggest the query that earlier users changed theirs to after finding nothing. (Then refine this until you can draw those conclusions on a regular basis, live, without needing a page of no results to trigger the logic, etc.)


In my mind, Python's most important data structures are its set, tuple, dictionary, and list. While they are no more powerful than what you can find or make in Java, they are extremely convenient to use. Note that Norvig solves both problems exclusively using these structures.
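For instance, a word-frequency count that takes a fair bit of ceremony in Java is a few lines with the built-ins (a generic illustration):

```python
words = "the quick brown fox jumps over the lazy dog the end".split()

counts = {}                        # dict: word -> frequency
for w in words:
    counts[w] = counts.get(w, 0) + 1

unique = set(words)                # set: distinct words
# list of (word, count) tuples, most frequent first
top = sorted(counts.items(), key=lambda kv: -kv[1])
```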

Norvig's sudoku solver is using depth-first search and backtracking, implemented in the function "search". I'm not sure where you are getting the idea that it is simultaneously using many more boards than the search depth.

Google could well be using every clever trick you can think of to implement their spelling corrector, but I think you're overestimating the value of tracking user variations over multiple searches. It's highly unlikely that someone searched for "Pauel Garahum" in the past, then corrected it to "Paul Graham". Likewise, you can search for "brootnenny spars" and it comes up with a good suggestion. More likely, they are using the search frequency as an indicator of correctness (P(c) in Norvig's article) and coming up with a better error model (P(w|c)), probably using phonetics as you proposed earlier. And once you have this, you don't really need to go through the effort of correlating search variations.
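The noisy-channel scoring mentioned above boils down to picking the candidate c that maximizes P(c) * P(w|c). A toy sketch, with completely made-up query frequencies standing in for P(c) and crude string similarity standing in for the error model P(w|c):

```python
from difflib import SequenceMatcher

# Hypothetical query frequencies ~ P(c); not real data.
freq = {'paul graham': 20000, 'paul gauguin': 5000}

def correct(w):
    """Return the candidate maximizing frequency times similarity,
    where SequenceMatcher.ratio() is a stand-in for P(w|c)."""
    return max(freq,
               key=lambda c: freq[c] * SequenceMatcher(None, w, c).ratio())
```

Even this crude version prefers the frequent, similar candidate without ever needing to correlate one user's query revisions with another's.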

It is, however, well-known that Google tracks the links that people click on and uses this information to improve search rankings. They may well be tracking whether or not the user clicks on results and using this fact to improve their estimate of the correctness of the search.

...And I'm not sure if you know this, but the reason Norvig specifically mentions Google's spelling correction is because he is Google's Director of Research. He didn't "not realize" that Google could be using search results to improve spelling correction; he intentionally left stuff out because the article is only supposed to be an introduction to spelling correction.


I learned quite a bit from web.py (http://github.com/webpy/webpy/tree/master). It's small enough to be fun to poke through, but has more than enough "advanced" Python tricks to be worth your while.



http://www.onlamp.com/pub/a/python/2003/7/17/pythonnews.html

This article might help you along the way. Inside it is a link to Bram Cohen's blog post titled "How to Write Maintainable Code". I've never actually looked into the BitTorrent code myself, but depending on your skill level, understanding what's going on in there might be a bumpy ride.


The BitTorrent code looked awful the last time I saw it, complete with one function taking 20 arguments.


The Python Cookbook (Amazon it, I'm too sleepy to link) contains examples of idiomatic code that you should use.



I've always found Django to be a very clean code base.


Python ships with plenty of good Python code. Just take your time and read through the Lib directory of your standard Python distribution.

http://svn.python.org/view/python/trunk/Lib


Mailman http://list.org/

Things you should be looking at are queues and error handling in an asynchronous message passing architecture.

Also you'll learn that not every web application needs an SQL database for persistence.
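The queue pattern there generalizes well. A minimal sketch of the worker-plus-sentinel idiom (generic Python, not Mailman's actual code; msg.upper() stands in for real message handling):

```python
import queue
import threading

def worker(q, results):
    """Drain the queue until a None sentinel arrives."""
    while True:
        msg = q.get()
        if msg is None:                  # sentinel: time to shut down
            break
        results.append(msg.upper())      # stand-in for delivering the message

q, results = queue.Queue(), []
t = threading.Thread(target=worker, args=(q, results))
t.start()
for m in ['hello', 'world']:
    q.put(m)
q.put(None)                              # tell the worker to stop
t.join()
```

Persisting the queue to disk instead of memory is what lets a system like this survive restarts without an SQL database.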


For performance and algorithms, look at the implementations at http://shootout.alioth.debian.org/u32q/python.php and pay attention to the different benchmarks.

Also, look at the RPython code coming out of PyPy, some of which is shown at http://morepypy.blogspot.com/

They are reimplementing all the C modules and doing a great job. The new implementations are, of course, closer to current best practices in Python.

Enjoy.


PyBlosxom is a nice project to hack on; you can read and understand the whole of the code in a day, and it illustrates the "request handler with filters" design pattern very nicely. http://pyblosxom.sourceforge.net/


I enjoyed reading the code from "Hacking RSS and Atom" by Leslie M. Orchard (ISBN: 978-0-7645-9758-9). There's a lot more to it than just RSS related code and if you read the book you get the explanation too.


Twisted is very clean and readable.


The Zope3 codebase.



