Python+Scipy+Matplotlib vs Matlab?
73 points by schtog on Nov 13, 2008 | 44 comments
I'm learning datamining, machine learning, image processing etc by myself now but will start uni next year probably doing the same.

I tried Octave briefly and wasn't that impressed. Ok some neat functionality and easy matrix manipulation but pretty ugly and language isn't as nice as Python. Not sure about interoperability with other tools.

I already knew Python so I naturally tried numpy+scipy+matplotlib and was literally blown away. So easy to use, really nice plotting capabilities and it is extremely convenient using a real and interactive programming language, especially when that language is Python. Being able to do everything with one tool is awesome.

I haven't tried Matlab yet because it costs money. Big minus right there obv especially since it is not exactly cheap. It is also proprietary which is another problem. From what I have gathered it is an awesome tool though and there are huge amounts of Matlab-code out there.

Have any of you tried both? Which do you prefer and why? Do you think Scipy can take over? Python seems to be used everywhere in science; signal processing, chemistry, bioinformatics, NASA, Google etc.

I have found a library for pretty much everything for Scipy though. I mean having 10 different FFT-libs isn't exactly much of a plus, one great one is enough. So does Matlab beat Scipy on that point or not? And is Matlab much better than Octave? Different feel? Does Matlab allow easy interaction with databases and other tools?

The sage project (sagemath.org) has explicitly stated that its goal is to become a viable open source alternative to Mathematica, Matlab, and Maple. To a reasonable extent, it has already reached that goal and it is progressing and changing rapidly. And it plays well with other mathematical systems (R, Octave, Mathematica, Matlab, Maple, Maxima, Magma, etc.).

sage is built as an extension to Python with both a terminal interface (extended version of ipython) and a web-based interface. It builds on Scipy and many other tools.

Anyway, it's a strong enough system that I've used it to replace both Mathematica and Matlab in my daily activities. Its FFT is similar in speed to that of Matlab (both based on the same open source software). Check it out.

Also, NumPy and SciPy are being integrated with Sage.

Interesting you should say this, as this is exactly what I've been doing this week as well.

If you don't like Octave's language, you won't like MATLAB's, they're almost identical. They were both designed for engineers (I mean engineers, not computer programmers) to explore matrix models interactively, then save their work as scripts - you were never meant to use m-files for general purpose programming. MATLAB was the first GUI programming I did that wasn't just for myself.

What you're paying for with MATLAB is access to the Mathworks Toolboxes. If you need them then it's absolutely worth every penny. If you don't, Octave will do. You also get prettier graphics with MATLAB than gnuplot can do. Matplotlib is good, but it's nowhere near MATLAB, which does a heck of a lot more than plotting 2D graphs.

I think Python will get an increasing market share here because it's free and easy to use and lets you do things that are clunky in MATLAB like parsing log files (in the past I have used C or Perl to munge things into a format MATLAB will like tho' it is possible in m-files). I don't think MATLAB is in any danger tho', it does too much too well and has enormous mindshare and legacy (no-one doing civil or mechanical or aeronautical engineering cares what Google uses to show people ads - SRSLY). Think about how people happily pay for Photoshop when there's GIMP. DSP is another big MATLAB market, but (AFAIK) biologists aren't doing the matrix-heavy computations that it's best suited for.
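For a sense of what the log-file munging looks like on the Python side, here is a minimal sketch. The log format, the field names and the `parse_log` helper are all made up for illustration; a real instrument log would need its own pattern.

```python
import re

# Hypothetical log format (an assumption): "YYYY-MM-DD HH:MM:SS name=value"
LINE_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<name>\w+)=(?P<value>-?\d+(?:\.\d+)?)$"
)


def parse_log(text):
    """Return (date, time, name, value) tuples for every line that matches."""
    readings = []
    for line in text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match is None:
            continue  # silently skip comments and malformed lines
        readings.append(
            (match["date"], match["time"], match["name"], float(match["value"]))
        )
    return readings


sample = """\
2008-11-13 12:00:01 temp=21.5
# a comment line that does not match the pattern
2008-11-13 12:00:02 strain=-0.003
"""

readings = parse_log(sample)
for reading in readings:
    print(reading)
```

The resulting list of tuples can be handed straight to NumPy or written out in whatever format the next tool expects.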

You can get a cheap version of MATLAB when you're a student (or know a student). It used to have a limited matrix size (256*256 IIRC) but I think now it doesn't. That was sufficient for the Mech Eng stuff I was doing, but not in my 3rd year when I got into image recognition. It's really nice, definitely worth trying, your university probably already has it.

> Think about how people happily pay for Photoshop when there's GIMP.

I don’t think this is a legitimate comparison (i.e. the GIMP isn’t a strictly more powerful but less focused superset of Photoshop’s features; instead it’s a strictly less powerful knock-off with an often unintelligible interface and much less polish). Numpy/Scipy are quite nice, and make creating any tools that need a bit of real programming (i.e. not just matrix math) much much nicer than trying to work with MATLAB. If someone else already built the tools using MATLAB and you don’t need to write any code whatsoever yourself, that’s obviously nicest of all. But that doesn’t sound like what the question is really about.

To the OP: if you already know Python, absolutely use Numpy/Scipy. They are fantastic.

Matlab is, to the programmer with experience in almost any other language, a tremendous horror. The language is almost as cobbled together and inconsistent as PHP with the added bonus of being worked on almost exclusively by engineers instead of people who actually want to write programs (this coming from an engineering student here).

That being said, if you have the mathematical chops to rearrange your problem into something solvable via matrix transformations, you can probably write it quickly and elegantly in Matlab without worrying too greatly about execution speed. Better, the built in toolboxes have already solved huge (engineering) problem spaces. Code already written is better than code potentially written...

Unless you want a solution that is repeatable or more general than Matlab affords. At that point you'd be better off doing the math by hand. I feel that the Python et al and C solutions fit into this niche. Prototype the math in Matlab, implement in a language that doesn't suck.

I don't agree Matlab is a horror - and I am a CS person, not an "engineer" as you say. It is an excellent domain-specific language. It might not be a good general-purpose language, but it was never intended to be that. True, some of the "bolt-on" parts of Matlab feel inconsistent, but the matrix/vector/tensor core is very elegant and powerful.

I would consider Matlab to be a DSL grown out of control. If you are working purely with matrix/vector/tensor math, as you say, you can be reasonably comfortable with the language as long as you remember to double-check the docs on nearly every function before you use it just in case there is some strange edge-case it handles awkwardly.

Unfortunately, in any real project I've ever done with Matlab you cannot survive just inside the matrix DSL and once you start touching the bolted-on bits you lose patience very quickly.

So, I suppose I agree, it's a nice DSL, but I'd also wager that the majority of people using it don't realize the limitations implicit there.

> ...some strange edge-case it handles awkwardly.

Or stupidly. I don't know if the following should qualify as an edge case generally, but it does in Matlab.

The controls lab in the aerospace department at the U of W had a wonderful inverted cart-pendulum device that a grad student was doing some seriously freaky non-linear control research on. We were tasked, as part of a lab, to write a controller in Matlab that would balance the pendulum and move the cart to a designated position. After trying for days to come up with a combination of gains to result in stable control, I decided to write an evolutionary algorithm to choose a controller.

It was dead simple. The genome was a list of gains. The code created generations of genomes and then ran them through the simulation. The tricky part (for me at that time) was finding the settling time (walk backwards from the end of the data, checking for values outside of the tolerance). Choose the winners, perturb, preserve best performers, and go around again. Doing all the steps by hand worked wonderfully.
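The settling-time step (walk backwards from the end of the data until a sample falls outside the tolerance) might look roughly like this in Python. This is only a sketch of the idea, not the original Matlab code; the function name, the argument names and the sample data are assumptions.

```python
def settling_time(times, values, target, tolerance):
    """Return the time from which every later sample stays within tolerance.

    Walks backwards from the last sample and stops at the first one that is
    out of tolerance. Returns None if the final sample is itself out of
    tolerance, i.e. the response never settles within the recorded data.
    """
    last_bad = -1  # index of the latest out-of-tolerance sample, -1 if none
    for i in range(len(values) - 1, -1, -1):
        if abs(values[i] - target) > tolerance:
            last_bad = i
            break
    if last_bad == len(values) - 1:
        return None
    return times[last_bad + 1]


# Made-up step response that overshoots and then settles near 1.0.
times = [0, 1, 2, 3, 4, 5]
values = [0.0, 1.5, 0.8, 1.05, 0.98, 1.0]
print(settling_time(times, values, target=1.0, tolerance=0.1))  # prints 3
```

Note that this only looks at the samples it is given, so a simulation that stops early will report whatever happened before the cutoff.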

When I ran the code, I got combinations of gains that resulted in massively unstable systems; systems that overshot off the chart as soon as physically possible given the hardware. What was going on?

It turns out, when I ran the simulation from the command line, Matlab graciously accepted my 5-second simulation time. However, when run from a script, Matlab ignores that parameter and runs the simulation until the output careers off the chart. Because I was looking for a minimum time with no overshoot, my algorithm was finding combinations of gains that overshot steeply enough to convince Matlab to stop simulating ASAP! It was bizarre. My secondary condition was minimum overshoot, so I got solutions that were exactly steep enough to cause the simulation to stop when the output hit the desired value!

A friend of mine stumbled on to a workable combination of gains. I used those.

You're misrepresenting the situation a bit. Matlab was not stupid here. The evolutionary algorithm you implemented found a way to satisfy your constraints that you didn't think of. That would have happened regardless of the language you chose.

I'm not arguing that Matlab is beautiful - it's not. I have worked in Matlab for years and years, and know most of its ugliness. But your example highlights your frustration with the task more than any particular weakness of Matlab.


The point is that the behavior changes based on whether you are running from the command line or from inside a script, that this is completely arbitrary, and that there was no override possible for this behavior. When run from the command line, simulations for combinations of gains that caused the system to career away did indeed career away, they just continued to do so for the full length of time specified in the function call. That kind of abject flakiness renders the program unsuitable for professional use in that you can't plan for those kind of quirks. I had invested hours by the time I worked out what the bug was, and invested hours more trying to fix it. I finally had to give up, after investing my time and effort, because the platform was fundamentally and fatally incapable of doing what it was asked.

It is completely unacceptable for a package to have such wildly different (and undocumented) behavior for the same exact function call. Matlab was indeed stupid. It was a buggy minefield of stupidity when I was using it.

Finally, it looks like your defense of this awful weakness on the part of Matlab is, ironically, a result of your commitment to the platform rather than a reasoned response to the case outlined.

Please accept my apology.

It seems I didn't read the final paragraph in your original post very well.

I was not defending a weakness of Matlab, but instead misunderstood what you said. Indeed, I'm a pretty harsh Matlab critic myself because I know its ugly parts well.

(edit: grammar)

I worked for Enthought (the people behind SciPy, and now NumPy, as Oliphant works for Enthought, as well) for several years when I lived in Austin. Nearly all of the guys that work on SciPy and Matplotlib came from using Matlab in some capacity or another. I've never worked with Matlab, but from the examples I've seen, I much prefer the Python version of things to the Matlab version.

I'll also mention that SciPy is in use at some of the biggest companies in the world, and because of its stronger programming language base, it can be used for much larger problem sets than Matlab. Massive fluid dynamics computing projects, requiring clusters of machines, for example, are feasible (and being done) with SciPy. Likewise for geological data analysis. I don't know anything about parallelizing Matlab, but I'm guessing the possibilities are much greater with Python.

And, of course, Python skills are probably more transferable to other work.

I don't see how you could lose by trying SciPy. It's free, has a great community of incredibly smart people (it's the community with the highest ratio of PhDs to others that I've ever been a part of), and is fun to play with.

If you're working for personal gain, you're probably better off with the Python+ solution because it's a) cheaper, and b) translates into knowledge that can be harnessed in other applications you write in Python. You're probably unlikely to write a real application in Matlab.

If you're going to be doing work at a university, you may find reluctance from the persons you are working with as I did a few years back. The argument you'll likely get is that when you pay for Matlab, you pay for the assurance that the implementation of the tools provided is correct and therefore your research is based on a proven foundation.

I saw the argument, agreed, but disagreed in the logic presented that those contributing to Python+numpy+scipy+matplotlib didn't have a vested interest in those tools also being completely correct. After all, I hear NASA is using some of this stuff...

Space Telescope Science Institute (they run Hubble under a NASA contract) has contributed to development of numarray and numpy.

I'm going to say the unthinkable here:

In my Information Retrieval class, I got numpy/scipy set up and went about implementing homework assignments with it.

However, no matter how much I tried to push as much as possible down into the matrix libraries implemented in C/C++, the surrounding Python code slowed everything down. I was having trouble getting everything to finish in time to hand in my homework by the deadline.

I talked to a classmate who was using Java, and not having any speed problems at all. The night before it was due, I rewrote the whole thing in Java and got it to finish running (I handed in a day late, but at least I had something to hand in.)

I'm sure there are tricks to make things faster in Python. (For example, I later figured out a method I was calling was running all Python code, and if I had called a different method, it would have dropped directly into the fast C code.) But with Java, I didn't have to think about performance. It was just fast.
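To illustrate the kind of difference involved (this is not the homework code, just an assumed stand-in), here are two ways to compute the same dot product. The first runs every multiply-add in the Python interpreter; the second hands the whole array to NumPy's compiled routine in a single call.

```python
import numpy as np


def dot_python_loop(a, b):
    """Dot product with the loop running in the Python interpreter."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total


def dot_numpy(a, b):
    """Same result, but the loop happens inside NumPy's compiled code."""
    return float(np.dot(a, b))


rng = np.random.default_rng(0)
a = rng.random(100_000)
b = rng.random(100_000)

slow = dot_python_loop(a, b)
fast = dot_numpy(a, b)
print(slow, fast)
```

Both give the same answer up to floating-point rounding; only the amount of interpreter work per element changes.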

Java almost certainly has a library for anything you might possibly want to do. "But," I hear you say, "that means I have to write my program in...Java! shudder"

And I empathize with you. Which is why now I'm doing a lot of experimenting with Clojure. Fast as Java, because it compiles to the JVM (as long as you follow a few guidelines.) Access to any Java library with no extra effort on your part. (One of my favorite moments in one of Rich Hickey's Clojure videos is where he shows a macro that makes Java calls requiring FEWER parentheses than Java. He was pretty excited about that.) I found a Java open source matrix library that, while not nearly as pretty as Python, got the job done.

So, that's my totally radical recommendation. Clojure + whatever Java libraries you need to get your work done.

> However, no matter how much I tried to push as much as possible down into the matrix libraries implemented in C/C++, the surrounding Python code slowed everything down. I was having trouble getting everything to finish in time to hand in my homework by the deadline.

Did you profile your code? What was the slow bit?
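For anyone who hasn't done it before, profiling with the standard library's `cProfile` and `pstats` modules can be as short as the sketch below. The two functions are placeholders standing in for the real workload.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Placeholder hot spot: a plain Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_homework():
    """Placeholder for the real entry point being profiled."""
    return slow_sum_of_squares(200_000)


profiler = cProfile.Profile()
profiler.enable()
result = run_homework()
profiler.disable()

# Collect the report in a string, sorted by cumulative time, top 5 entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
stats.print_stats(5)
report = buffer.getvalue()
print(report)
```

The functions at the top of the report are the ones worth pushing down into NumPy or C first.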

Sorry, it's just been too long now to remember the details. I think it was the EM (expectation maximization) algorithm, though, which is hard to reduce entirely to matrix operations, at least for me it was.
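For reference, a toy EM that does stay inside array operations is sketched below, for a two-component 1-D Gaussian mixture. This is an assumed example, not the assignment from that class; the function name, the initialisation and the synthetic data are all invented for illustration, and real models are usually messier.

```python
import numpy as np


def em_gmm_1d(x, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture with plain EM."""
    # Crude initialisation (an assumption): means at the data extremes.
    means = np.array([x.min(), x.max()], dtype=float)
    variances = np.array([x.var(), x.var()], dtype=float)
    weights = np.array([0.5, 0.5])

    for _ in range(n_iter):
        # E-step: responsibilities, one row per sample, one column per component.
        diff = x[:, None] - means[None, :]
        densities = (
            weights
            * np.exp(-0.5 * diff**2 / variances)
            / np.sqrt(2.0 * np.pi * variances)
        )
        resp = densities / densities.sum(axis=1, keepdims=True)

        # M-step: re-estimate weights, means and variances from responsibilities.
        nk = resp.sum(axis=0)
        weights = nk / len(x)
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk

    return weights, means, variances


# Synthetic data: two well-separated clusters around -2 and 3.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 0.5, 500), rng.normal(3.0, 1.0, 500)])

weights, means, variances = em_gmm_1d(x)
print(weights, means, variances)
```

The only Python-level loop left is the outer iteration; everything per sample is broadcast across arrays.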

Any good implementations for EM in Python (plus bridged libraries) out there?

I've been wondering, are there Java libs as good as numpy/scipy? Sure, in general Java has more libraries but is that true when it comes to math?

Oh, and great call on Clojure. I've been having fun with it too.

Here is one take on Python vs Matlab: http://vnoel.wordpress.com/2008/05/03/bye-matlab-hello-pytho...

Also, check out Enthought's distribution: http://www.enthought.com/products/epd.php

That first link was really informative. Thanks.

I've had a lot of luck building scientific / research tools in Python / NumPy with Matplotlib. Integrating Fortran / C / C++ code is extremely easy, and things like Cython / Pyrex ease that quite a bit further.

My biggest complaint about Matlab (besides the licensing) is that it's just a horrendously bad programming language (if you can call it a language at all). Any self-respecting hacker deserves better.

With Matlab you have to buy a toolbox for everything (e.g. SQL database interaction). There is nothing (for most applications) you could want to do that Matlab can do and Python + NumPy can't.

Just thought I'd mention that `ipython -pylab` exists. It's a fantastic python repl (tab completion, all that nice stuff) which integrates nicely with matplotlib/numpy.

Matlab does have one advantage, though: scientific library support. Need neural networks, TV based image processing algorithms or something else similarly specialized? Matlab wins.

Yes, libs are my biggest concern but Python has neural net libs, don't know the quality though. Also very good image processing libs but perhaps not one for TV-stuff.

> My biggest complaint about Matlab (besides the licensing) is that it's just a horrendously bad programming language (if you can call it a language at all). Any self-respecting hacker deserves better.

I second this. I found it extremely hard to work in Matlab during the day after spending hours the night before writing personal stuff in Python.

If you're affiliated with a university, it's very likely that a Matlab license is already available to you. I do research in graphics, and I find it extremely productive to test ideas in Matlab. You can hardly find a better time-saving procedure than testing an idea in Matlab and knowing in 10 minutes that it doesn't work, instead of spending a week implementing it in C++. Once you find what works, re-implementation in C++ is a breeze.

I've used both and I greatly prefer python+scipy+matplotlib. Because it's a real programming language, unlike Matlab's clunky Basic-like thing. So if matplotlib doesn't do what you need, you can extend it or just write something custom in GTK+Cairo. The depth and stability of scipy & matplotlib is very impressive.

Depending on your bent, you may find R (http://www.r-project.org/) with Python (http://www.omegahat.org/RSPython/) a very powerful combination. More statistical than some of the others you mention, and sadly constrained by memory unless you play some games, you will probably find that most cutting edge stats code is available for R. Data mining and AI folks also use R, but as you point out, other matrix and functional languages may fit your specific approach.

Numpy/Scipy syntax is very close to matlab, but python is a lot more powerful. I port matlab code over to python pretty frequently.

A toy example is here along with links to other comparisons: http://www.datawrangling.com/python-montage-code-for-display...
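As a smaller illustration of how close the two are, here are a few common operations in NumPy with the usual Matlab spelling in the comments. The `@` operator needs Python 3.5 or later; on older versions `np.dot(A, A)` does the same job.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])  # MATLAB: A = [1 2; 3 4]
b = np.array([5.0, 6.0])                # MATLAB: b = [5; 6]

transposed = A.T                 # MATLAB: A'
matrix_product = A @ A           # MATLAB: A * A
elementwise = A * A              # MATLAB: A .* A
first_row_second_col = A[0, 1]   # MATLAB: A(1, 2), Python indices start at 0
x = np.linalg.solve(A, b)        # MATLAB: x = A \ b

print(matrix_product)
print(elementwise)
print(x)
```

The two things that bite most often when porting are visible here: `*` is elementwise in NumPy, and indexing starts at 0.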

Most universities have student license servers for Matlab afaik. And, Matlab is really really, annoyingly powerful. You've got almost anything to try your ideas, implement an academic paper. But, it is slow (execution time).

Prototype with Matlab, implement with C++. That's what I usually prefer. I am not a Python expert, just learning it, but I think it will be slower than C, and less powerful than Matlab, from my perspective of working.

I think you'd be surprised at how powerful NumPy is. Try timing multiplying large matrices (I can tell you who wins, and not by a small margin)
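If you want to run that timing yourself, a minimal sketch for the NumPy side is below. The matrix size is an arbitrary assumption, and the number you get depends heavily on which BLAS library your NumPy build links against, so no result is claimed here.

```python
import timeit

import numpy as np

n = 300  # matrix size, chosen arbitrarily for a quick run
rng = np.random.default_rng(0)
a = rng.random((n, n))
b = rng.random((n, n))

# Best of 3 repeats, each averaging 5 matrix products.
calls_per_repeat = 5
best = min(timeit.repeat(lambda: a @ b, number=calls_per_repeat, repeat=3))
seconds = best / calls_per_repeat

print(f"{n}x{n} matrix product: {seconds * 1e3:.2f} ms per call")
```

The equivalent in Matlab would be wrapping `a * b` in `tic` and `toc` for matrices of the same size.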

One downside is you do have to poke around for the right functions, whereas in Matlab you have all the functions sitting in your (always global) namespace.

It seems that NumPy is the equivalent of Perl's PDL. PDL is some inline C code so you get the benefit of both languages.

We use python for pretty much everything in my research group -- high-level scripting of simulations written in C; controlling experiments over serial, GPIB, usb; and data analysis and plotting for publications. I used Matlab for several years prior, but got really annoyed by the limits of the language when you try to do anything beyond matrix manipulation. Python + scipy + pylab is a pretty effective replacement for matlab prototyping and data analysis, with a much better general purpose language and FFI. Anything that needs to be fast you can write in C/C++ and wrap with swig or ctypes so that you can still use a high-level language to run all your simulations, and do the data analysis as well.
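As a rough idea of the ctypes route, the sketch below calls `cos` from the system C math library instead of a custom simulation library. It assumes a Unix-like system where `ctypes.util.find_library("m")` can locate that library, and falls back to `None` where it can't (Windows, for instance). The helper name `load_c_cos` is made up for this example.

```python
import ctypes
import ctypes.util


def load_c_cos():
    """Return the C library's cos() as a Python callable, or None if not found."""
    path = ctypes.util.find_library("m")  # "m" is the C math library on most Unix systems
    if path is None:
        return None
    try:
        libm = ctypes.CDLL(path)
    except OSError:
        return None
    c_cos = libm.cos
    # Declare the C signature: double cos(double)
    c_cos.argtypes = [ctypes.c_double]
    c_cos.restype = ctypes.c_double
    return c_cos


c_cos = load_c_cos()
if c_cos is not None:
    print(c_cos(0.0))
else:
    print("C math library not found on this system")
```

Wrapping your own compiled code works the same way: load the shared library, declare `argtypes` and `restype` for each function, then call it like any Python function.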

In my opinion, "Python+Scipy+Matplotlib" can replace the equivalent libs in MATLAB. I had the same problem with Python when I decided to develop face detection software using neural networks! At that time, there was no complete NN lib for Python, and I don't think there is a comprehensive one now either. Long story short, if you want to stick to some small libs, that mixture is OK. But if you want to push further in your field (like blending NN and DIP) you will run into trouble... As a solution, you can mix Python with libs like OpenCV for computer vision and image processing. Check pyOpencv too :)

The other big problem with Matlab is that because it's licensed, you can't just do what you like with it. I've got several servers that I like to be able to crunch numbers on, and that would be not only expensive but incredibly inconvenient to maintain all those licenses of Matlab.

I don't like the looks of the pylab graphs, I wouldn't put them in a paper. However, there's also PyX (http://pyx.sourceforge.net/index.html) which is a little more verbose in usage but makes very nice "latexy" plots that go well along with a latexed paper. And it can do some really advanced stuff, too, that I wouldn't even know how to begin thinking about doing in pylab.

Here is a benchmark comparing R, Octave, Matlab, S-PLUS, and a few other common mathematical programming languages.


Surprisingly, R is one of the best in terms of speed, comparable to Matlab (Matlab is pretty fast if you vectorize your code). Plus, if you like functional programming, the R language is based on Scheme.

I find that matlab is a great program to try out new ideas and algorithms when you are just starting out. The plotting library is also very easy to use and there are many helpful built in functions for matrix, linear algebra manipulations. However the language itself is a horror to code in; it's based on Fortran. I still really hate the 1 based array indexing after using it for years.

That sounds like me about IDL... ;-)

integer division JUST KILLS Python for scientific computing, it introduces the most disgusting silent errors everywhere. And the visualization functions generally suck compared to the handy plot() of Matlab.

Still, Py handily beats other free alternatives to Matlab

Two comments about this:

1. In Python 3.0, integer division will return a float, e.g., 1/3 will be 0.3333... At Scipy 2006, Guido explicitly stated in his keynote talk that the design choice he made in Python (i.e., that n/m is floor(n/m)) was a mistake.

2. In Sage (http://sagemath.org), which is built on Python, we do some very minimal preparsing of input, so that 1/3 is the exact rational number 1/3 (instead of Python's stupid 1/3 == 0). We also replace, e.g., 2^3 by 2**3. Sage does a lot of exact symbolic and high precision arithmetic, so 1/3 staying the rational 1/3 makes sense as the default (though one can easily change this).

Disclaimer: I started the Sage project.
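To make the division behaviour concrete, here is a short sketch in plain Python 3 (not Sage). It uses the standard library's `fractions.Fraction` as a stand-in for an exact rational, which is only an analogy for what Sage does. In plain Python, `^` is bitwise XOR and `**` is exponentiation.

```python
from fractions import Fraction

true_quotient = 1 / 3     # Python 3: true division, gives a float
floor_quotient = 1 // 3   # floor division, the old Python 2 behaviour of 1/3
exact = Fraction(1, 3)    # exact rational number from the standard library

power = 2 ** 3            # exponentiation
xor = 2 ^ 3               # bitwise XOR, not a power

print(true_quotient)      # a float close to 0.3333
print(floor_quotient)     # 0
print(exact * 3 == 1)     # True, no rounding error
print(power, xor)         # 8 1
```

In Python 2, writing `from __future__ import division` at the top of a module gives the Python 3 behaviour for `/`.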

What do you mean with silent errors? Like float representation?

I find matplotlib very easy to use, and it creates beautiful plots, but I haven't tried Matlab so can't compare obv.

Tastes vary (as the mix of comments here will attest), but having sampled a variety of development and analysis tools, I have settled on MATLAB as my tool of choice. Part of my reasoning can be found in the Nov-08-2006 posting to my Weblog, "Why MATLAB for Data Mining?":


I am a climate scientist and Matlab offers a lot to me : great matrix syntax, very fast algorithms (SVD, matrix inversion, FFT), large mindshare (cf Central File Exchange), and advanced toolboxes for statistics, spectral analysis and the like. Not being a programmer by birth, I couldn't care less about the fact that it's not a general purpose language ; it is very adequate for my own purpose.

BUT... it is proprietary, pretty expensive, and at the moment I am considering problems that require parallel computations on dozens of CPU. Matlab has a Parallel Computation Toolbox that is a joke (<= 8 CPUS at a time, monopolizing the same number of licenses), and that is why I am considering Python.

Does NumPy enable parallel computations ? Is it reasonably easy to translate Matlab code into NumPy ? Is there a good library of linear algebra routines (like SVD, eigendecompositions, inversions, LU, Cholesky decompositions, etc..) ? A large user community with a comprehensive archive ?
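On the linear algebra question, the routines listed mostly live in `numpy.linalg`, as the sketch below shows on a small made-up symmetric positive definite matrix. LU is not in `numpy.linalg`; it is provided by SciPy as `scipy.linalg.lu`, which is left out here to keep the example NumPy-only.

```python
import numpy as np

# Build a small symmetric positive definite test matrix (arbitrary example data).
rng = np.random.default_rng(0)
m = rng.random((4, 4))
spd = m @ m.T + 4.0 * np.eye(4)

u, s, vt = np.linalg.svd(spd)                    # singular value decomposition
eigenvalues, eigenvectors = np.linalg.eigh(spd)  # eigendecomposition (symmetric case)
inverse = np.linalg.inv(spd)                     # matrix inverse
lower = np.linalg.cholesky(spd)                  # Cholesky factor, spd = lower @ lower.T

# Each factorisation should reproduce the original matrix.
print(np.allclose(u @ np.diag(s) @ vt, spd))
print(np.allclose(eigenvectors @ np.diag(eigenvalues) @ eigenvectors.T, spd))
print(np.allclose(spd @ inverse, np.eye(4)))
print(np.allclose(lower @ lower.T, spd))
```

These calls map closely onto Matlab's `svd`, `eig`, `inv` and `chol`, which helps when translating existing code.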

Well... matplotlib (and scipy to a lesser extent) have atrocious design and documentation from a programming perspective. On the other hand, Matlab is worse in that regard and Python is an excellent general purpose programming language. Unless you plan on using one of the more obscure libraries that come with Matlab, I would advise the python combo.
