

Why Python rocks for research - agconway
http://www.stat.washington.edu/~hoytak/blog/whypython.html

======
miloshasan
Python is awfully close to being a superior (and free) replacement for Matlab,
but there are a few annoyances that keep preventing me from switching forever.
Unfortunately, these are mostly not bugs but bad design that is believed to be
correct by the core developers, so it is unlikely to ever change:

\- Matrices are a pain. The r_[] and c_[] operators could be a reasonable
replacement for Matlab's elegant matrix construction syntax, but they do not
work as expected (as smart hstack and vstack), instead doing something
completely different and inconsistent for vectors and matrices.
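A small NumPy sketch of the inconsistency being described. For 1-D inputs, `np.r_` joins the vectors end to end while `np.c_` stacks them as columns. For 2-D inputs, the two behave like `vstack` and `hstack`. The array values are arbitrary.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# For 1-D vectors, r_ joins end to end (shape (6,)), unlike vstack (shape (2, 3)).
print(np.r_[a, b].shape)        # (6,)
print(np.vstack([a, b]).shape)  # (2, 3)

# ...while c_ treats 1-D inputs as columns (shape (3, 2)), unlike hstack (shape (6,)).
print(np.c_[a, b].shape)        # (3, 2)
print(np.hstack([a, b]).shape)  # (6,)

A = np.ones((2, 2))
B = np.zeros((2, 2))
# For 2-D matrices, r_ and c_ do behave like vstack and hstack.
print(np.r_[A, B].shape)  # (4, 2)
print(np.c_[A, B].shape)  # (2, 4)
```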

\- Tensors are a bigger pain. Matlab has very well-defined semantics for
operations like permute and reshape; in NumPy these operations sometimes
create just a view, and at other times they reshuffle the memory contents. I
know the idea was to "protect" the user from having to know the memory layout
of data, but this idea is bad.
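A minimal sketch of the view-versus-copy behavior in question. NumPy's `transpose` (roughly Matlab's `permute`) always returns a view. `reshape` returns a view when the memory layout allows it and silently copies otherwise. `np.shares_memory` is one way to check which of the two you got.

```python
import numpy as np

x = np.arange(6)
v = x.reshape(2, 3)              # contiguous input: reshape returns a view
print(np.shares_memory(x, v))    # True

t = v.T                          # transpose (NumPy's "permute") is always a view
print(np.shares_memory(v, t))    # True

flat = t.reshape(6)              # non-contiguous input: reshape must copy
print(np.shares_memory(t, flat)) # False
print(flat.tolist())             # [0, 3, 1, 4, 2, 5]
```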

\- IPython is great in every way except when it comes to reloading parts of
your program. After any tiny change to your code, the only safe thing to do is
to quit IPython and start it again. All the other options (run, reset,
reload...) make some secret and wrong assumptions about what you want to
reload. In contrast, this works flawlessly in Matlab.
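Part of the problem is easy to reproduce in plain Python. `importlib.reload` re-executes a module, but any name already bound elsewhere (for example via `from module import f`) keeps pointing at the old object. The sketch below shows this by writing a throwaway module to a temporary directory; the module name `mymod` is hypothetical. IPython's `autoreload` extension (`%load_ext autoreload`, then `%autoreload 2`) exists to paper over this kind of staleness, with its own caveats.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Skip writing .pyc files so the reload below always re-reads the source.
sys.dont_write_bytecode = True

# Write a throwaway module (hypothetical name "mymod") to a temp directory.
tmp = Path(tempfile.mkdtemp())
src = tmp / "mymod.py"
src.write_text("def f():\n    return 1\n")
sys.path.insert(0, str(tmp))

import mymod
from mymod import f as f_old

# "Edit" the module on disk, then reload it in the running session.
src.write_text("def f():\n    return 2  # edited\n")
importlib.reload(mymod)

print(mymod.f())  # 2: lookups through the module object see the new code
print(f_old())    # 1: names bound earlier via "from mymod import f" are stale
```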

------
apl
In the end, it's all about the ecosystem. Perl wins for bioinformatics because
there are boatloads of scientists already using it, with all the neat
libraries and resources that brings. Equally, Python wins for, say,
prototyping in robotics because of libraries, support and so on.

There's nothing intrinsically science-apt about Python/Perl, but Ruby and
friends can't compete when it comes to the programming environment; that's
what counts.

~~~
cageface
As a language I prefer Ruby but the Python ecosystem for this kind of thing is
definitely a huge advantage. You can actually do quite a bit with Ruby + GSL
but it's still not really competitive.

~~~
apl
Yep, same with me. I do prefer Ruby, but it's just not feasible at the moment.
Languages that get behind in a certain area enter a terribly vicious circle:
There's no scientific community backing Ruby, so why would I develop something
that'd make Ruby more competitive?

~~~
djacobs
I'm in the same boat, and it seems several others are as well. I'm fine moving
to Python for the moment, but I wonder how the Ruby community will ever know
whether there's enough demand for scientific tools to merit their development.

~~~
phren0logy
I prefer Ruby generally, but honestly Ruby and Python are sufficiently similar
that I'd rather smart programmers put their time to good use doing something
other than reimplementing Python's science libs.

I think that making some tools available in a totally different language
(maybe something functional) would be much more useful, because it would allow
for a very different approach to the problem if needed.

In a perfect world, we could also have the option of using light wrappers
around OpenCL matrix libraries, and push the linear algebra to GPUs that eat
matrices for breakfast.

~~~
cageface
Agreed. There's only so much skilled labor available for this kind of thing.

This is exactly why I'm focusing on Python lately. Ruby is a great language
but I don't want to be pigeon-holed as a web guy forever. I've already done
over 10 years of web dev and I'd like to try out a couple of new problem
domains before I kick the bucket.

~~~
djacobs
I'll have to agree as well. The only problem for me is: if not on the web, how
are we going to make GUIs that aren't severely limited to our platform? Wasn't
web design supposed to solve the "platform question"?

~~~
xiongchiamiov
There are a number of cross-platform GUI toolkits. For instance, Qt is pretty
nice wherever you put it, and KDE's been putting a lot of work into making
their libraries and such work on Windows.

~~~
djacobs
I'm not a huge fan of Qt simply because I can't get it to feel natural on
Gnome, my desktop of choice. I've looked into Gtk+, but I'm not so sure I want
to commit to it yet.

Are there plans to make Qt feel more natural on Gnome and OS X?

------
RodgerTheGreat
This article makes some fairly convincing arguments that Python is a more
flexible tool than Matlab or Perl, but I can't help but come away with the
sense that the author hasn't tried many other languages.

There are an awful lot of languages that provide iterators, a powerful set of
data structures, extensive libraries and facilities for structuring and
maintaining large codebases. .Net languages (maybe F# would be good for
this?), Java or most of the emerging languages for the JVM stack, Ruby (which
is generally considered to be "different but equivalent" to Python), and so
forth.

~~~
mechanical_fish
I only scanned the article very briefly, but my impression is that the
important comparison is vs. Matlab. The other players aren't really in the
author's game. It's a question of the use case and the community and the
library support.

In theory, .Net could displace Matlab or Python as the canonical platform for
scientific researchers. And in theory Python could displace PHP as the
canonical platform for classic CRUD web apps. In practice neither is likely to
happen, no matter how much we might or might not wish it to.

~~~
hogu
I've been a Matlab user for 10 years and recently switched to Python because
of the increased flexibility. Never going back to Matlab again. Python's math
and science libs can be a little rough around the edges, but the flexibility
more than makes up for it for me.

~~~
samd
I'm a Matlab user myself, but I try to use Ruby whenever possible, if for
nothing else than to get away from all the god damned matrices. I didn't
realize Python had similar capabilities; I look forward to trying it out. But
does Python have "Index exceeds matrix dimensions." errors? I just can't
imagine life without seeing a few of those every day. That said, the workspace
is really handy; is there some equivalent GUI in Python?

~~~
infinite8s
Check out EPD: <http://www.enthought.com/products/>, particularly the ETS
framework. It contains almost all the popular scientific/numeric libraries in
python and is free for academic use (and much of it is open-source).

(Disclaimer: I work for Enthought.)

------
ogrisel
And the corollary: Why do researchers never respect PEP 8 when they write
Python code?

Yes, I am overreacting a bit, since the blog post is very well written and I
actually agree 100% with the content. But please, people: respect PEP 8 [1].
It makes your readers feel at home while reading your code. It is very
important if you want to get new contributors to your project. See [2] for
instance.

[1] <http://www.python.org/dev/peps/pep-0008/>

[2] <http://www.dataists.com/2010/10/whats-the-use-of-sharing-code-nobody-can-read/>

~~~
BrandonM
From PEP 8:

> The preferred place to break around a binary operator is _after_ the
> operator, not before it.

I'd be interested in hearing the justification for this rule. I think that
leading a continuation line with the binary operator makes it super-clear that
it is a continuation line. What is the benefit of the preferred style?
Compare:

    
    
      if (the_result_of_this_function(on_this_arg) == 10
          and this_overly_descriptive_boolean):
          do_stuff()
    
      if (the_result_of_this_function(on_this_arg) == 10 and
          this_overly_descriptive_boolean):
          do_stuff()
    

To me, the first one is quite clearly a continuation line (no statement can
start with "and"). The second requires closer inspection.
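For what it's worth, the layouts in this thread are the same expression to the parser; only readability differs. Later revisions of PEP 8 also changed this recommendation and now suggest breaking before binary operators for new code. In the sketch below, `the_result_of_this_function`, `on_this_arg` and `this_overly_descriptive_boolean` are hypothetical stand-ins given concrete values so the snippet runs.

```python
def the_result_of_this_function(arg):
    """Hypothetical stand-in for the function named in the comment."""
    return arg * 2

on_this_arg = 5
this_overly_descriptive_boolean = True

# Break before the operator (implicit continuation inside parentheses).
before = (the_result_of_this_function(on_this_arg) == 10
          and this_overly_descriptive_boolean)

# Break after the operator (the style quoted from PEP 8 above).
after = (the_result_of_this_function(on_this_arg) == 10 and
         this_overly_descriptive_boolean)

# Backslash continuation, with no parentheses needed.
backslash = the_result_of_this_function(on_this_arg) == 10 \
    and this_overly_descriptive_boolean

print(before, after, backslash)  # True True True
```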

~~~
zephyrfalcon
I would write:

    
    
      if the_result_of_this_function(on_this_arg) == 10 \
      and this_overly_descriptive_boolean:
          do_stuff()
    

Indenting the second line of the if statement would, at first glance, indicate
that it's part of the block instead. Then again, it depends. If it was the
header of a def statement, I would follow the PEP, e.g.

    
    
      def __init__(self, width, height,
                   color='black', emphasis=None, highlight=0):
    

On a side note, I once did the "Art & Logic challenge"
[<http://www.artlogic.com/>] and they use guidelines that apply to several
languages, e.g. you would use the same formatting style for C++ and for
Python, if at all possible. Much of it flies in the face of PEP 8.

------
rdouble
I've wasted most of my professional life tweaking various unix software to
make it work. However, the typical scientific python setup proved to be too
frustrating to install on OSX. The recommended solution is to just buy the
Enthought distro. If I'm paying for software anyway, why is Enthought better
than Matlab?

~~~
hogu
Disclaimer: I work for Enthought.

I did my whole PhD in Matlab.

EPD is much cheaper, and it is free for academics.

Even if it weren't free, I would use it anyway.

But the real question isn't why EPD is better than Matlab; it's why Python is
better than Matlab. Matlab is a domain-specific application with a
domain-specific language. It doesn't work well with things outside of its
domain.

Python is a general-purpose language (and as such has good general-purpose
constructs), but it happens to have excellent scientific and mathematical
libraries. This is useful when you actually have to apply your research and
build an application.

NumPy is also better for large data, because slicing arrays does not create
copies of them (you can make it do so if you want to, but it doesn't by
default). In Matlab, slicing large arrays can cause you to run out of memory.
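A minimal sketch of the slicing behavior described above. Basic slicing returns a view into the same buffer, `.copy()` makes an independent array, and fancy (integer-array) indexing always copies. The array size is arbitrary.

```python
import numpy as np

big = np.zeros(1_000_000)           # ~8 MB of float64
window = big[1000:2000]             # basic slicing: a view, no data copied
print(np.shares_memory(big, window))  # True

window[:] = 1.0                     # writes through the view reach the original
print(big[1000], big[999])          # 1.0 0.0

copied = big[1000:2000].copy()      # explicit copy when independence is needed
print(np.shares_memory(big, copied))  # False

fancy = big[[1000, 1001, 1002]]     # fancy (integer-array) indexing always copies
print(np.shares_memory(big, fancy))   # False
```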

Cython makes it really easy to start out with Python and then optimize your
code down into C.

With Python you can run your calculations over a massive compute grid, use
messaging libraries like PyZMQ to distribute your data and results, and build
real-time GUIs to consume the final results.

\- A Matlab cluster is quite expensive.

\- Chaco, another Enthought Python library that is free and open source, is
great for real-time data visualization; Matlab does not have anything
equivalent.

\- Python has a large number of messaging libraries; with Matlab I think
you're stuck with MPI.

Matlab always made me feel limited. I would work on a problem, and then reach
a point where Matlab could not do what I needed to do.

That rarely happens to me with Python.

~~~
rdouble
Thank you, that is the kind of response I was looking for. I will take a look
at Enthought.

~~~
hogu
You WILL get frustrated by some things: some of the matrix concatenation
operations are less convenient, and some of the libraries are less polished.
But it's been worth it for me. Msg me if you need help.

Use IPython, not just the plain Python shell, for interactivity.

Also check out 3D data visualization with Mayavi; that stuff is really awesome.

~~~
Anon84
Is there a _mayavi_ tutorial somewhere? It looks pretty interesting, but I
could never figure it out.

~~~
hogu
<http://conference.scipy.org/scipy2010/tutorials.html>

there was a mayavi tutorial, and the files are available at the link

------
bwooceli
I learned Python on the fly specifically for research. I used Django to build
out an enterprise reporting/analytics system to support a customer experience
(survey) program for the cost of time (huzzah open source). We had bids on
this project upwards of 80k. We generate ~100k surveys / month and are able to
get targeted, meaningful, automated insights directly to front-line
management. Python FTW.

~~~
shill
Python + Django FTW.

------
agconway
Python rocks, but Python + R + bash rocks way harder for research

~~~
levesque
R is definitely powerful and a good part of any scientific data analysis
toolkit.

I use Python, IPython, matplotlib, NumPy and R. I call my R scripts directly
from Python using rpy.

~~~
wildanimal
Agreed. I use Python for heavy shell scripting and text processing (though R
surprisingly does have respectable facilities for all but the most demanding
of these tasks) and R for the rest of the analysis. I've thought about
switching to NumPy/SciPy to integrate everything in Python, but R's data
frames and factors, and the reshape, plyr, and lattice packages, make you
think very differently about how to approach the data, and it's hard to go
back to lower-level manipulation of arrays. Not to mention all the
stats/graphics packages, which are very easy to install and apply. And the
documentation of its functions is superb.

------
sintesoro
Python is good, but you could also consider Maxima.

A single example:

    f(x):= x^2+3*x+7;

Maxima provides symbolic computation, BLAS and LAPACK integration for
numerical linear algebra, a 500-page manual in several languages, and a
complete library for statistics, differential equations, calculus, and
series, with graphics via matplotlib. Also, the Maxima language is not much
more complicated than Python:

    for i in range(10): print i*i     versus   for i:0 thru 9 do print(i*i);
    [i**2 for i in range(10)]         versus   makelist(i**2,i,0,9)

But Matlab's libraries are larger than those of Python and Maxima.

~~~
kwantam
Maxima rocks for symbolic math. I prefer it to Mathematica, which is saying a
lot. In contrast, Octave always feels like "almost-Matlab" and I still prefer
the latter.

Also see wxMaxima, which will (among many other things) produce LaTeX for you.

~~~
nurbl
If you haven't already, you might also want to check out Sage
(<http://www.sagemath.org/>). It uses Python to "glue" together other free
math tools (e.g. Maxima) into a unified system with a nice interface.

------
maurits
I feel this article is somewhat unbalanced in its single-minded celebration of
a certain tool/environment. So, in the same spirit, here are a couple of
reasons not to switch from Matlab to Python, all stemming from my experience
when I decided to try to switch from Matlab to Python/C:

\- Installing all these packages on (any) system is painful. Different
versions don't play together, or don't work (yet) on some platforms and/or
architectures. This comes from my own experience of getting a version of
Python to work with NumPy, SciPy, matplotlib, OpenCV and PIL on a Windows,
Mac, and Linux machine. No 100 percent success yet on any platform.

\- Central and consistent documentation is lacking. Even for very simple
cases, I got a bit of a headache. I encountered a Python print statement for
the first time, which obviously differs somewhat from its C printf cousin. I
googled "python print syntax" only to find that the first xx hits, including
the official documentation, did not cover the full specification of this
statement. I fear the moment I might actually need detailed information on
something less trivial.
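For readers coming from C, here is a rough sketch of how Python's formatting maps onto `printf`. It is written for Python 3, where `print` is a function rather than the Python 2 statement discussed above. The `%` operator uses printf-style conversion specifiers, while `str.format` and f-strings share the separate format-spec mini-language. The value `x` is arbitrary.

```python
x = 3.14159

# printf-style formatting: the closest analogue of C's printf("%8.3f|", x)
print("%8.3f|" % x)           # prints "   3.142|"

# str.format and f-strings use the format-spec mini-language instead
print("{:8.3f}|".format(x))   # prints "   3.142|"
print(f"{x:8.3f}|")           # prints "   3.142|"

# print itself only joins its arguments; sep and end control the glue
print("a", "b", sep=", ")     # prints "a, b"
```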

\- Numerical integration is more accurate in Matlab.

\- Visualization capabilities of matlab are more powerful. But who knows,
perhaps there is yet another package floating around :-)

\- Matlab may not have advanced data-structures, but it is a rapid prototyping
tool, for testing ideas. If I need to write an actual application, I will use
a tool and language geared for that task.

~~~
hogu
Install is painful. The Enthought Python Distribution does make it pretty
painless, but it's not free for non-academics.

Agreed on documentation.

Actually, I think Python's visualization capabilities are more powerful. Have
you looked at mlab? The 3D capabilities there are insane.

I use Python because I can do rapid prototyping and turn it into a full
application with the same code base.

Did you ultimately go back to Matlab?

~~~
maurits
Well, I only got it all to work (on OS X) an hour or two ago, and am
currently happily learning and exploring.

I wanted to venture beyond Matlab because, for what I am currently doing, the
environment and language are too limited, yet I do not wish to prototype in
C++. Python together with some libraries seemed to be a match made in heaven.
I also thought it would serve as a better stepping stone towards an actual
application.

~~~
tmarthal
Installation of the PIL pre-built packages on OSX is notoriously difficult,
since it relies on system C libraries to do some processing.

Moving your development to a linux machine will clear up all of those issues.

Also 'easy_install' should get you all of the packages you want.

------
woodson
Often languages are chosen based on already existing tools used in a specific
research project. That's why it's good to be able to quickly pick up new
languages. For example scripting languages integrated in some frameworks, like
Scheme in the Festival speech synthesis system. In the end, this often results
in projects involving things like Python, Scheme, bash, R and a bit of tcl
;-).

------
b_emery
> In MatLab everything is flat – all functions are declared in the global
> namespace. However, this discourages code reusability by making the
> programmer do extra work keeping disparate program components separate. In
> other words, without a hierarchical structure to the program, it’s difficult
> to extract and reuse specific functionality.

I completely disagree. Reusable Matlab code has been my holy grail for the
last couple of years. The key is to break out specific functionality as
subfunctions. When these are abstracted and generally useful elsewhere, then
they become new tools for the toolbox. The subfunctions also make great
starting points for repurposing code. This layout results in _much less_ work.

~~~
tel
It's absolutely true that using more functions makes your code more
maintainable. It's also absolutely true that nearly every other language on
the planet does this better than Matlab.

~~~
snth
Yes; in particular, Matlab's restriction of ONE outside-callable function per
file is a huge pain.

------
woodson
For those interested in scientific frameworks and existing packages, see the
Scientific track of EuroSciPy 2010 [1]. Everything from seismology to visual
programming.

[1] <http://www.euroscipy.org/track/870>

------
ansgri
One more powerful combo is R+Java+C. UI, integration and massive data
processing in Java, R for prototyping, plotting and model fitting, and C if
you have numerical simulations.

