Why astronomers (and other scientists) should program in Python

juiceandjuice · on May 27, 2011

Actually most astronomers and physicists I know are already moving toward Python.

Coming from a mixed particle physics and astronomy experiment, almost all our tools are written in Python for researchers, but usually with ROOT (C++, http://root.cern.ch/ ) underneath. ROOT is python friendly.

SciPy/NumPy, PyROOT, PyFITS are all unbeatable tools for anything in Physics or Astronomy as far as I'm concerned. Throw some knowledge of C in with that, and you can scale anything up to supercomputing clusters or back down to your laptop, and that's a very important, powerful thing for a scientist.

foob · on May 27, 2011

In high energy physics we've recently completed a big transition from FORTRAN77 to C++. PAW and CERNLIB were standard in the 90's but now most major experiments use ROOT. You can use ROOT with Python but most people don't bother. Programming comes pretty naturally to most physicists and learning the details of the ROOT libraries and experiment specific libraries takes much more time than picking up C++. I've seen most grad students get comfortable with the basics of C++ in a couple of days while they typically struggle with the usage of the libraries for a number of months. The documentation is autogenerated from the code so in many cases you need to read through the library code to figure out how things work. In these situations it's very helpful to be as comfortable as possible in C++. The benefit of using Python instead of C++ would be fairly minimal in my opinion because so much of analysis work depends on understanding these libraries that are written in a lower level language out of necessity.

What occurred to me while reading this article though is that we should perhaps be using Python for our scripting needs. We have a number of scripts that we use regularly for querying databases and submitting jobs and most of them are written in Perl. I think that Python would be much easier to read and hack; especially because most of the physicists don't know Perl.
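For illustration, here is a minimal sketch of what such a query-and-submit script might look like in Python. Everything specific in it is an assumption: the `runs` table, its columns, and the `submit_job` command are hypothetical stand-ins, and an in-memory SQLite database substitutes for whatever database an experiment actually uses.

```python
import sqlite3


def build_job_commands(conn, min_events):
    """Return one (hypothetical) submit command per run with enough events."""
    rows = conn.execute(
        "SELECT run_id, path FROM runs WHERE n_events >= ? ORDER BY run_id",
        (min_events,),
    )
    # 'submit_job' is a made-up command name; substitute the local batch tool.
    return ["submit_job --run %d --input %s" % (run_id, path) for run_id, path in rows]


# Demo with an in-memory database so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (run_id INTEGER, path TEXT, n_events INTEGER)")
conn.executemany(
    "INSERT INTO runs VALUES (?, ?, ?)",
    [
        (1, "/data/run1.root", 500),
        (2, "/data/run2.root", 20000),
        (3, "/data/run3.root", 15000),
    ],
)

commands = build_job_commands(conn, min_events=10000)
for cmd in commands:
    print(cmd)
```

The query and the command construction sit in one short function, so someone who has never seen the script can tell what it selects and what it would submit.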

masterzora · on May 27, 2011

Background: I've done a touch of IDL astronomy programming with Marc Buie. I haven't touched IDL or astronomy for about a year, though I am still in contact with a couple astronomers.

In my (anecdotal) experience, it seems that a number of astronomers would actually love to move to Python but I've only seen one actually make an effort to make the move. The primary reason cited has been that some particular library modules they use haven't been ported yet. Glancing around the links from the article, it definitely seems that the porting has come a long way, but that's not the same thing as completeness.

Now, obviously the best solution would be if these astronomers each took it upon themselves to translate their own necessary tools and contribute back to the Python community, but that's pretty unrealistic, I think. I write Python for at least a good 8-10 hours/day and my IDL was clean, tight, and well-documented and I'm still terrified of the thought of translating my project.

Of course, the other question I have (and would _love_ an answer to) is how well Python holds up performance-wise to well-crafted IDL astronomy code. We were working with a data set large enough that, even on a cluster, each step of the process took between several days and a couple weeks. If Python could improve that, it would have been amazing, but if it would only hurt then Python would basically be useless for that project.

cop359 · on May 27, 2011

You know what's even easier to learn and more beginner friendly, has great debugging tools and is very powerful? MATLAB.

And most engineers/scientists use it. If you know MATLAB, I don't see why you would even bother going to figure out the dozens of Python libraries available.

splat · on May 27, 2011

Speaking as a grad student in astronomy, I tried using MATLAB for a little while but it occupied an awkward middle ground for me. It was a bit too high-level for the lower-level things that python is useful for (e.g. grabbing images off of a survey's servers or writing a script to align a spectroscopic mask), but it was a bit too low-level to compete with the things Mathematica is good at (e.g. symbolic integration, carrying units through equations).

One major advantage python has over MATLAB that the author mentioned but didn't really emphasize is that python packages exist to interface with IRAF. Pretty much every astronomer who has to touch data uses IRAF to reduce the data and I can tell you from personal experience that IRAF is a horrible piece of software to work with directly. The python interface makes it much more tolerable, but I don't think there exists a comparable MATLAB version. Without a MATLAB interface to IRAF, python beats MATLAB for astronomical purposes hands down.

thesnark · on May 27, 2011

MATLAB is a bad choice for science. It is proprietary, making it difficult to share work among people who do not own a license, therefore making it more difficult to reproduce results.

Not to mention the fact that the MATLAB language is extremely awkward anytime you want to work with something that isn't a matrix.

dexen · on May 27, 2011

GNU Octave [0] is mostly compatible with MATLAB. GPLed and implemented in Fortran and C++, it's pretty much hack-able to your needs.

> MATLAB language is extremely awkward anytime you want to work with something that isn't a matrix.

From my limited experience, that's true but rarely of relevance. You can, and want to, use matrix operations all the way. Fewer bugs (simpler code), faster execution.

[0] http://www.gnu.org/software/octave/

thesnark · on May 27, 2011

In my experience 90% of scientific data analysis work is: getting the data, cleaning it and transforming it into a form suitable for analysis. MATLAB fails miserably at these tasks.
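As a rough sketch of what that get/clean/transform step looks like in Python, using only the standard library: the sample data, the column names, and the millisecond-to-second conversion are all invented for the example.

```python
import csv
import io

# Made-up raw export: stray whitespace, a blank line, and a missing reading.
RAW = """\
star, mag , exposure_ms
 Vega, 0.03, 1500
Sirius, , 1200

Altair, 0.77, 900
"""


def clean_rows(text):
    """Parse messy CSV text into (name, magnitude, exposure_seconds) tuples."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    cleaned = []
    for row in reader:
        fields = [field.strip() for field in row]
        if len(fields) != 3 or not all(fields):
            continue  # drop blank lines and rows with missing values
        name, mag, exposure_ms = fields
        cleaned.append((name, float(mag), int(exposure_ms) / 1000.0))
    return cleaned


rows = clean_rows(RAW)
print(rows)
```

Real pipelines have far more special cases than this, but the shape is usually the same: parse, filter out the junk, convert types and units.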

dexen · on May 27, 2011

> 90% of scientific data analysis work is: (...) and transforming it into a form suitable for analysis.

Is that infinite recursion or an endless loop? Or is there an end to it, caused by the quantum nature of work -- at some point the 90% becomes an indivisible unit? ;-)

((terribly sorry, couldn't help it))

At any rate, that sounds like you want to feed the input through a pipe of simple, programmable textual filters -- cue sed, awk etc. And then pipe into standard input of something -- I know Octave has standard input, MATLAB I wouldn't be so sure.

keypusher · on May 27, 2011

Many, many labs are migrating their MATLAB code to Python these days. This is no accident. At my last job (academic research) I was a full-time MATLAB coder, and I can say that MATLAB has many problems of its own. It has a decent all-in-one IDE/REPL that makes it easy to get started and do off-the-cuff work. It has very well optimized matrix operations, you can cook up some decent scripts, and it has some nice toolboxes, but for anything larger it very quickly becomes a huge mess. Try maintaining and updating a 100k+ LOC MATLAB script with others. Or writing a GUI. It's not pretty. MATLAB might not quite deserve the terrible reputation it has within the larger software community, but I would never go back, and can testify that there is a massive and rapid migration among the scientific community away from MATLAB into Python going on right now.

d0mine · on May 27, 2011

Bye Matlab, hello Python, thanks Sage https://vnoel.wordpress.com/2008/05/03/bye-matlab-hello-pyth...

10 Reasons Python Rocks for Research (And a Few Reasons it Doesn’t) http://www.stat.washington.edu/~hoytak/blog/whypython.html

Should i switch to Python? http://stackoverflow.com/questions/5063037/should-i-switch-t...

zwieback · on May 27, 2011

Matlab and Python address different problem domains, as the article explains and I think anyone in a technical career would be well advised to know a scripting language. Whether Python is better than others is hard to say and the article doesn't really explain why Ruby or Perl couldn't be used.

I've looked at a bunch of students' resumes lately - every single one listed Matlab as one of the languages they know. Not seeing a lot of Python in the resumes I'm reviewing, which are mostly EE students. I think Matlab is becoming the de-facto standard for numerical analysis and there are some nice toolboxes.

Coming from a traditional SW background I find some of the Matlab syntax alien but fundamentally it's a great product.

gammarator · on May 27, 2011

Ruby or Perl are certainly good options, although some of Perl's syntax can be off-putting to the beginner, IMO. Right now, there is much better library support for astronomers in the Python ecosystem, though.

SriniK · on May 27, 2011

There is one more important feature/pain point the author didn't capture. Python is designed with scientific calculation in mind. Floats are represented as pure floats, the C or C++ way.

  python
  >>> 2 * 3.1
  6.2000000000000002

  perl -e "print 2 * 3.1"
  6.2

  ruby -e "print 2 * 3.1"
  6.2

dexen · on May 27, 2011

That's a good point, but it may be a superficial difference, actually. The final output is not necessarily a direct representation of the variable's value; it may be rounded by default for easier use, and rounded in a configurable way.

For example, in MATLAB, Octave and JavaScript, a value is displayed (formatted) as an `integer' if it's very close to an integer (closer than a certain epsilon -- and that's configurable in the case of M & O). At all times, what the variable holds -- and what undergoes computations -- is a purely float value, never an integer.
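The same point can be sketched in Python itself: the stored value is one and the same double, and only the number of digits requested at print time changes what you see. The choice of 17 versus 15 significant digits below is an assumption made for illustration; 17 matches what older Python prompts printed, and 15 is, as far as I know, roughly what Perl and Ruby used by default at the time.

```python
x = 2 * 3.1

# 17 significant digits are enough to round-trip any double;
# older Python versions printed floats this way at the prompt.
full = format(x, ".17g")

# 15 significant digits hides the representation error.
short = format(x, ".15g")

print(full)   # 6.2000000000000002
print(short)  # 6.2

# Parsing either string back gives the same stored double.
print(float(full) == float(short) == x)
```

Newer Python versions print the shortest string that round-trips, so the prompt shows 6.2 there as well.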

duke_sam · on May 27, 2011

  >>> from decimal import Decimal
  >>> 2 * Decimal("3.1")
  Decimal('6.2')

juiceandjuice · on May 27, 2011

Most engineers use MATLAB. Most scientists don't. Chemistry adopts it a bit more, but if a scientist is using it, chances are they are actually solving an engineering problem or just making a plot, and not performing data analysis or simulation.

keypusher · on May 27, 2011

Dunno where you got that idea. I used to work in an fMRI lab at MIT. We would put people in a brain scanner and have them do some tasks. After, we had a timeseries of very high resolution 3-D blood flow images in the brain (these are basically matrices with a number for the value in each voxel). Then we ran data analysis. I'm talking a multi-stage data analysis using different statistical techniques for motion correction, normalization, modeling the haemodynamic response function, slice timing correction and lining all this up with the time data for the tasks, then doing whole group analysis. This generally ran on a 12 node Linux cluster we had in house and typical analysis could take hours or days. Up until recently, MATLAB dominated the research in this field, but now everyone is moving to Python, or wrapping existing programs in Python.

Anyway, it's not only engineers that use MATLAB. It is quite popular in the sciences, and not just for plotting.

juiceandjuice · on May 27, 2011

Well this sort of thing especially is where MATLAB shines (i.e. Matrix processing, Signal Analysis, hence MATrix LABoratory), especially with 3D FFTs like you'd do with fMRIs. So that's fine, but once you start getting into Monte Carlo simulations and massive batch processing, event filtering, instrument calibration, and all your database connections and all sorts of other stuff, it honestly just doesn't scale up or provide a complete solution. So you end up doing some in MATLAB, some in C, some bash scripts, some in whatever else. Python gives the flexibility at all stages to solve all that in one or two languages (C/C++ for optimization, python for everything else). I do know chemists that use MATLAB, but not the computational chemists on the big clusters. Most of the solid state physicists I know use OriginLab, LabView and even Maple or Mathematica all on Windows and proprietary stuff like that, usually because they are hooking up to various instruments. Even R has been more popular with the physicists I know than MATLAB.

Maybe this is all just a big anecdote, but I work in scientific computing as well, and this has been my experience.

orenmazor · on May 27, 2011

As somebody who spent quite a large amount of time turning the algorithms of physical engineers (photonics, specifically) into something software people can use (C# and IronPython), I'd rather they all stuck to Python. I'm tired of having to spend my day translating MATLAB into something useful.

b_emery · on May 27, 2011

Have you ever tried to use the matlab compiler for this sort of thing? Too limiting?

ignifero · on May 27, 2011

You mean I have to install and licence it on every system that I use? No thanks, I prefer Python, which I can use everywhere, even on remote clusters overseas. FYI NumPy + Matplotlib have a very intuitive syntax for MATLAB users. http://mathesaurus.sourceforge.net/matlab-numpy.html

zwieback · on May 27, 2011

I imagine any kind of number crunching would be done in a different language or use optimized libraries. I wonder how many numerical apps are moving from Fortran to modern C++, e.g. using templates for optimized number crunching.

tmarthal · on May 27, 2011

I think that a lot of 'number crunching' scientific programs that evolved in the 1990's were written in Matlab. Most of those from the 1980's and earlier were in FORTRAN77. Where the comparison to python really gets interesting is the addition of the REPL, the (now mature) NumPy/SciPy libraries, tkinter (python Tk libraries), and a bunch of other common things that are causing many established scientists and engineers to ditch their Matlab licenses and move towards python.

This is more geared at the analyst level, which you could compare astronomers to, rather than the scientific programmer level (who have been writing C or C++ in the 1990s/2000s instead of Matlab). Python makes things neat because the same libraries (mentioned in the article) that the scientific programmers are using can now be used directly by the analysts/astronomers.

TheSOB88 · on May 27, 2011

Maybe finding those people and engaging them on a personal level would help more than submitting to HN.