

Python Packages for Social Scientists - limist
http://www.drewconway.com/zia/?p=204

======
devinj
Eh, some of the items in this list aren't so great.

simplejson is included in the standard library as json now. Yes, cjson is
faster, but it's also more fragile and, well, less standardized. Everything
but simplejson is effectively dead: simplejson won the json library wars.
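
For anyone unsure what that means in practice, here is a minimal sketch,
assuming you want code that works both with the standard library module and on
older interpreters that only have simplejson installed. The survey record is
invented purely for illustration.

```python
# Prefer the standard-library json module; fall back to simplejson on
# interpreters that predate it. Both expose the same dumps/loads interface.
try:
    import json
except ImportError:
    import simplejson as json

# Hypothetical survey record, made up for this example.
record = {"respondent": 17, "answers": [3, 5, 1], "country": "NL"}

# Serialize to a JSON string with keys in a stable order.
encoded = json.dumps(record, sort_keys=True)

# Parse the string back into Python objects.
decoded = json.loads(encoded)
```

The round trip gives back an equal dictionary, so the same code can be used
for saving and reloading small datasets.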

html5lib is an odd choice for a new HTML parsing kit. lxml.html offers its own
fast and good HTML parser, and can also use html5lib or BeautifulSoup as its
backend while still producing an Element tree with plenty of convenience
methods (like xpath). I am not sure why html5lib would be better here (maybe
it's not as complicated, somehow?).

As for MySQLdb, that thing is awful. There are some replacements out there,
and out of them my personal preference is for oursql. The creator/maintainer
is a regular (an op, actually) on the #python IRC channel, so it's easy to get
support, and it fixes a lot of the nasty quirks of MySQLdb.

His first four suggested libraries are definitely good, though.

<http://docs.python.org/library/json.html>
[http://simplejson.googlecode.com/svn/tags/simplejson-2.1.0/d...](http://simplejson.googlecode.com/svn/tags/simplejson-2.1.0/docs/index.html)

<http://codespeak.net/lxml/lxmlhtml.html>
<http://codespeak.net/lxml/elementsoup.html>
<http://codespeak.net/lxml/html5parser.html>

<https://launchpad.net/oursql>

EDIT: checked the date, the article is from 2008. I guess that explains
things. :)

------
MaxMorlock
I wonder how large the proportion of social scientists is who actually know
that Python is a programming language. (I am a social scientist myself.)

In my own experience, it is already pretty hard to teach them R.

~~~
limist
Many of the students and academics from the older engineering fields (e.g.
mechanical, chemical) don't know of Python either. When I tell them (Python +
NumPy + SciPy + matplotlib) is as potent as MATLAB for most situations, and
far more flexible and generalizable later, most don't believe me. They get
started faster with MATLAB, then struggle later with its constraints, and end
up creating overly-specialized, one-off tools that few others can use.

Ideally, science/engineering schooling (college and up) should not allow
anyone to graduate without knowledge of at least one general purpose
programming language. But in the meanwhile, those who can program while
working in another field - whether sociology or environmental engineering -
have an enormous advantage.

~~~
physcab
I know exactly what you are talking about. When I first started my coursework
in Machine Learning, I was given the freedom to use any programming language.
But I simply defaulted to MATLAB because the rest of the CS department used
it. I would have loved to do all my work in Python, and in fact I tried to
with NumPy and SciPy, but it was just too hard to resist the comfortable
environment MATLAB creates. So now I'm stuck porting over all my old code,
which, to be quite honest, is a good exercise in itself.

------
chasingsparks
For agent-based models -- what the author seems to enjoy -- also see
<http://cs.gmu.edu/~eclab/projects/mason/>

------
cool-RR
I would like to recommend my own project, GarlicSim: <http://garlicsim.org>

It is a Pythonic framework for simulations. It's still in alpha, and I'd be
happy to help people start using it in research.

------
agconway
Glad this list still resonates with researchers.

I have added an addendum to the bottom of the list promoting the Enthought
distribution (<http://www.enthought.com/>) of Python, which includes most
(maybe all) of these packages in a single distribution.

------
lunchbox
I would add Scrapy (<http://scrapy.org/>) for web crawling.

------
jordanmessina
Wow, I've never heard of NetworkX. Are there any other good Python packages
for graph theory?

~~~
evgen
The big three are networkx, igraph, and python-graph. Each has its advantages
and disadvantages, so test each one a bit if you are doing serious graph work
in Python. For toy/casual problems I would recommend python-graph or networkx.

------
mumrah
This seems to be a good list for anyone using Python in their research, not
just social scientists.

I used numpy for all my numerical analysis homework. The professor always
lauded me for having the most concise code in the class.

