
Python+Scipy+Matplotlib vs Matlab? - schtog
I'm learning data mining, machine learning, image processing etc. by myself now but will start uni next year, probably doing the same.

I tried Octave briefly and wasn't that impressed. OK, some neat functionality and easy matrix manipulation, but pretty ugly and the language isn't as nice as Python. Not sure about interoperability with other tools.

I already knew Python so I naturally tried numpy+scipy+matplotlib and was literally blown away. So easy to use, really nice plotting capabilities and it is extremely convenient using a real and interactive programming language, especially when that language is Python. Being able to do everything with one tool is awesome.

I haven't tried Matlab yet because it costs money. Big minus right there obv, especially since it is not exactly cheap.
It is also proprietary, which is another problem.
From what I have gathered it is an awesome tool though and there are huge amounts of Matlab code out there.

Have any of you tried both? Which do you prefer and why?
Do you think Scipy can take over?
Python seems to be used everywhere in science: signal processing, chemistry, bioinformatics, NASA, Google etc.

I have found a library for pretty much everything for Scipy though. I mean having 10 different FFT-libs isn't exactly much of a plus, one great one is enough.
So does Matlab beat Scipy on that point or not?
And is Matlab much better than Octave? Different feel?
Does Matlab allow easy interaction with databases and other tools?
======
dhbradshaw
The sage project (sagemath.org) has explicitly stated that its goal is to
become a viable open source alternative to Mathematica, Matlab, and Maple. To
a reasonable extent, it has already reached that goal and it is progressing
and changing rapidly. And it plays well with other mathematical systems (R,
Octave, Mathematica, Matlab, Maple, Maxima, Magma, etc.).

Sage is built as an extension to Python with both a terminal interface
(extended version of ipython) and a web-based interface. It builds on Scipy
and many other tools.

Anyway, it's a strong enough system that I've used it to replace both
Mathematica and Matlab in my daily activities. Its FFT is similar in speed to
that of Matlab (both based on the same open source software). Check it out.

~~~
ninguem2
Also, NumPy and SciPy are being integrated with Sage.

------
gaius
Interesting you should say this, as this is exactly what I've been doing this
week as well.

If you don't like Octave's language, you won't like MATLAB's, they're almost
identical. They were both designed for engineers (I mean _engineers_ , not
computer programmers) to explore matrix models interactively, then save their
work as scripts - you were never meant to use m-files for general purpose
programming. MATLAB was the first GUI programming I did that wasn't just for
myself.

What you're paying for with MATLAB is access to the Mathworks Toolboxes. If
you need them then it's absolutely worth every penny. If you don't, Octave
will do. You also get prettier graphics with MATLAB than gnuplot can do.
Matplotlib is good, but it's nowhere near MATLAB, which does a heck of a lot
more than plotting 2D graphs.

I think Python will get an increasing market share here because it's free and
easy to use and lets you do things that are clunky in MATLAB like parsing log
files (in the past I have used C or Perl to munge things into a format MATLAB
will like tho' it _is_ possible in m-files). I don't think MATLAB is in any
danger tho', it does too much too well and has enormous mindshare and legacy
(no-one doing civil or mechanical or aeronautical engineering cares what
Google uses to show people ads - SRSLY). Think about how people happily pay
for Photoshop when there's GIMP. DSP is another big MATLAB market, but (AFAIK)
biologists aren't doing the matrix-heavy computations that it's best suited
for.
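
To make the log-parsing point concrete, here is a minimal sketch of the kind of munging that is clunky in m-files and short in Python. The log format, the field names and the `parse_log` helper are all made up for illustration; only the standard library is used.

```python
import re

# Hypothetical log format (an assumption): "<timestamp> sensor=<name> value=<float>"
LINE_RE = re.compile(r"^(\S+) sensor=(\w+) value=(-?\d+(?:\.\d+)?)$")

def parse_log(lines):
    """Group numeric readings by sensor name, skipping lines that do not match."""
    readings = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # e.g. comments or malformed lines
        _, sensor, value = match.groups()
        readings.setdefault(sensor, []).append(float(value))
    return readings

# Made-up sample lines, purely illustrative.
sample = [
    "2008-11-01T10:00:00 sensor=strain value=0.12",
    "# operator note: recalibrated",
    "2008-11-01T10:00:01 sensor=strain value=0.15",
    "2008-11-01T10:00:01 sensor=temp value=-3.5",
]
readings = parse_log(sample)
```

Each list in `readings` can then be handed to `numpy.array` for the numerical work.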

You can get a cheap version of MATLAB when you're a student (or know a
student). It used to have a limited matrix size (256*256 IIRC) but I think now
it doesn't. That was sufficient for the Mech Eng stuff I was doing, but not in
my 3rd year when I got into image recognition. It's really nice, definitely
worth trying, your university probably already has it.

~~~
jacobolus
> _Think about how people happily pay for Photoshop when there's GIMP._

I don’t think this is a legitimate comparison (i.e. the GIMP isn’t a strictly
more powerful but less focused superset of Photoshop’s features; instead it’s
a strictly less powerful knock-off with an often unintelligible interface and
much less polish). Numpy/Scipy are quite nice, and make creating any tools
that need a bit of real programming (i.e. not just matrix math) much much
nicer than trying to work with MATLAB. If someone else already built the tools
using MATLAB and you don’t need to write any code whatsoever yourself, that’s
obviously nicest of all. But that doesn’t sound like what the question is
really about.

To the OP: if you already know Python, absolutely use Numpy/Scipy. They are
fantastic.

------
tel
Matlab is, to the programmer with experience in almost any other language, a
tremendous horror. The language is almost as cobbled together and inconsistent
as PHP with the added bonus of being worked on almost exclusively by engineers
instead of people who actually want to write programs (this coming from an
engineering student here).

That being said, if you have the mathematical chops to rearrange your problem
into something solvable via matrix transformations, you can probably write it
quickly and elegantly in Matlab without worrying too greatly about execution
speed. Better, the built in toolboxes have already solved huge (engineering)
problem spaces. Code already written is better than code potentially
written...

Unless you want a solution that is repeatable or more general than Matlab
affords. At that point you'd be better off doing the math by hand. I feel that
the Python _et al_ and C solutions fit into this niche. Prototype the math in
Matlab, implement in a language that doesn't suck.

~~~
miloshh
I don't agree Matlab is a horror - and I am a CS person, not an "engineer" as
you say. It is an excellent domain-specific language. It might not be a good
general-purpose language, but it was never intended to be that. True, some of
the "bolt-on" parts of Matlab feel inconsistent, but the matrix/vector/tensor
core is very elegant and powerful.

~~~
tel
I would consider Matlab to be a DSL grown out of control. If you are working
purely with matrix/vector/tensor math, as you say, you can be reasonably
comfortable with the language as long as you remember to double-check the docs
on nearly every function before you use it just in case there is some strange
edge-case it handles awkwardly.

Unfortunately, in any real project I've ever done with Matlab you cannot
survive just inside the matrix DSL and once you start touching the bolted-on
bits you lose patience very quickly.

So, I suppose I agree, it's a nice DSL, but I'd also wager that the majority
of people using it don't realize the limitations implicit there.

~~~
mkn
_...some strange edge-case it handles awkwardly._

Or stupidly. I don't know if the following should qualify as an edge case
generally, but it does in Matlab.

The controls lab in the aerospace department at the U of W had a wonderful
inverted cart-pendulum device that a grad student was doing some seriously
freaky non-linear control research on. We were tasked, as part of a lab, to
write a controller in Matlab that would balance the pendulum _and_ move the
cart to a designated position. After trying for days to come up with a
combination of gains to result in stable control, I decided to write an
evolutionary algorithm to choose a controller.

It was dead simple. The genome was a list of gains. The code created
generations of genomes and then ran them through the simulation. The tricky
part (for me at that time) was finding the settling time (walk backwards from
the end of the data, checking for values outside of the tolerance). Choose the
winners, perturb, preserve best performers, and go around again. Doing all the
steps by hand worked wonderfully.
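
The settling-time step described above (walk backwards from the end of the data, checking for values outside the tolerance) can be sketched in a few lines of Python. This is an illustration only, not the original Matlab code; the function name `settling_time`, its arguments and the sample data are assumptions.

```python
def settling_time(times, values, target, tolerance):
    """Return the first time after which every sample stays within tolerance.

    Walks backwards from the end of the data. Returns None if even the
    final sample is outside the band (the response never settled).
    """
    settled_index = None
    for i in range(len(values) - 1, -1, -1):
        if abs(values[i] - target) > tolerance:
            break  # first out-of-band sample, counting from the end
        settled_index = i
    return None if settled_index is None else times[settled_index]

# Made-up step response sampled once per second (illustrative data only).
times = [0, 1, 2, 3, 4, 5]
values = [0.0, 1.4, 0.8, 1.05, 0.98, 1.01]
t_settle = settling_time(times, values, target=1.0, tolerance=0.1)
```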

When I ran the code, I got combinations of gains that resulted in massively
unstable systems; systems that overshot off the chart as soon as physically
possible given the hardware. What was going on?

It turns out, when I ran the simulation from the command line, Matlab
graciously accepted my 5-second simulation time. However, when run from a
script, Matlab ignores that parameter and runs the simulation _until the
output careers off the chart._ Because I was looking for a minimum time with
no overshoot, my algorithm was finding combinations of gains that overshot
steeply enough to convince Matlab to stop simulating ASAP! It was bizarre. My
secondary condition was minimum overshoot, so I got solutions that were
exactly steep enough to cause the simulation to stop when the output hit the
desired value!

A friend of mine stumbled on to a workable combination of gains. I used those.

~~~
wjy
You're misrepresenting the situation a bit. Matlab was not stupid here. The
evolutionary algorithm you implemented found a way to satisfy your constraints
that you didn't think of. That would have happened regardless of the language
you chose.

I'm not arguing that Matlab is beautiful - it's not. I have worked in Matlab
for years and years, and know most of its ugliness. But your example
highlights your frustration with the task more than any particular weakness of
Matlab.

~~~
mkn
No.

The point is that the behavior changes based on whether you are running from
the command line or from inside a script, that this is completely arbitrary,
and that there was no override possible for this behavior. When run from the
command line, simulations for combinations of gains that caused the system to
career away did indeed career away, they just continued to do so for the full
length of time specified _in the function call._ That kind of abject flakiness
renders the program unsuitable for professional use in that you can't plan for
those kind of quirks. I had invested hours by the time I worked out what the
bug was, and invested hours more trying to fix it. I finally had to give up,
after investing my time and effort, because the platform was fundamentally and
fatally incapable of doing what it was asked.

It is completely unacceptable for a package to have such wildly different (and
undocumented) behavior for the same exact function call. Matlab was indeed
stupid. It was a buggy minefield of stupidity when I was using it.

Finally, it looks like your defense of this awful weakness on the part of
Matlab is, ironically, a result of your commitment to the platform rather than
a reasoned response to the case outlined.

~~~
wjy
Please accept my apology.

It seems I didn't read the final paragraph in your original post very well.

I was not defending a weakness of Matlab, but instead misunderstood what you
said. Indeed, I'm a pretty harsh Matlab critic myself because I know its ugly
parts well.

(edit: grammar)

------
SwellJoe
I worked for Enthought (the people behind SciPy, and now NumPy, as Oliphant
works for Enthought, as well) for several years when I lived in Austin. Nearly
all of the guys that work on SciPy and Matplotlib came from using Matlab in
some capacity or another. I've never worked with Matlab, but from the examples
I've seen, I much prefer the Python version of things to the Matlab version.

I'll also mention that SciPy is in use at some of the biggest companies in the
world, and because of its stronger programming language base, it can be used
for much larger problem sets than Matlab. Massive fluid dynamics computing
projects, requiring clusters of machines, for example, are feasible (and being
done) with SciPy. Likewise for geological data analysis. I don't know anything
about parallelizing Matlab, but I'm guessing the possibilities are much
greater with Python.

And, of course, Python skills are probably more transferable to other work.

I don't see how you could lose by trying SciPy. It's free, has a great
community of incredibly smart people (it's the community with the highest
ratio of PhDs to others that I've ever been a part of), and is fun to play
with.

------
apgwoz
If you're working for personal gain, you're probably better off with the
Python+ solution because it's a) cheaper, and b) translates into knowledge
that can be harnessed in other applications you write in Python. You're
probably unlikely to write a real application in Matlab.

If you're going to be doing work at a university, you may find reluctance from
the persons you are working with as I did a few years back. The argument
you'll likely get is that when you pay for Matlab, you pay for the assurance
that the implementation of the tools provided is correct and therefore your
research is based on a proven foundation.

I saw the argument and partly agreed, but disagreed with the implied logic that
those contributing to Python+numpy+scipy+matplotlib didn't have a vested interest in
those tools also being completely correct. After all, I hear NASA is using some
of this stuff...

~~~
lutorm
Space Telescope Science Institute (they run Hubble under a NASA contract) has
contributed to development of numarray and numpy.

------
jimbokun
I'm going to say the unthinkable here:

In my Information Retrieval class, I got numpy/scipy set up and went about
implementing homework assignments with it.

However, no matter how much I tried to push as much as possible down into the
matrix libraries implemented in C/C++, the surrounding Python code slowed
everything down. I was having trouble getting everything to finish in time to
hand in my homework by the deadline.

I talked to a classmate who was using Java, and not having any speed problems
at all. The night before it was due, I rewrote the whole thing in Java and got
it to finish running (I handed in a day late, but at least I had something to
hand in.)

I'm sure there are tricks to make things faster in Python. (For example, I
later figured out a method I was calling was running all Python code, and if I
had called a different method, it would have dropped directly into the fast C
code.) But with Java, I didn't have to think about performance. It was just
fast.
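
As an illustration of the difference between Python-level loops and calls that drop into compiled code, here is a minimal sketch using cosine similarity, a typical information retrieval computation. It is not the homework code described above; the function names and the sample vectors are assumptions.

```python
import numpy as np

def cosine_loop(a, b):
    """Cosine similarity with explicit Python loops (slow for long vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

def cosine_vectorized(a, b):
    """Same quantity, with the arithmetic done inside NumPy's compiled routines."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up term-weight vectors for two documents.
doc_a = [1.0, 0.0, 2.0, 3.0]
doc_b = [0.0, 1.0, 2.0, 1.0]
slow = cosine_loop(doc_a, doc_b)
fast = cosine_vectorized(doc_a, doc_b)
```

Both functions return the same number; the gap in running time only becomes noticeable once the vectors are long or the function is called many times.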

Java almost certainly has a library for anything you might possibly want to
do. "But," I hear you say, "that means I have to write my program in...Java!
_shudder_ "

And I empathize with you. Which is why now I'm doing a lot of experimenting
with Clojure. Fast as Java, because it compiles to the JVM (as long as you
follow a few guidelines). Access to any Java library with no extra effort on
your part. (One of my favorite moments in one of Rich Hickey's Clojure videos
is where he shows a macro that makes Java calls requiring FEWER parentheses
than Java. He was pretty excited about that.) I found a Java open source
matrix library that, while not nearly as pretty as Python, got the job done.

So, that's my totally radical recommendation. Clojure + whatever Java
libraries you need to get your work done.

~~~
jacobolus
> _However, no matter how much I tried to push as much as possible down into
> the matrix libraries implemented in C/C++, the surrounding Python code
> slowed everything down. I was having trouble getting everything to finish in
> time to hand in my homework by the deadline._

Did you profile your code? What was the slow bit?

~~~
jimbokun
Sorry, it's just been too long now to remember the details. I think it was the
EM (expectation maximization) algorithm, though, which is hard to reduce
entirely to matrix operations, at least for me it was.

Any good implementations for EM in Python (plus bridged libraries) out there?
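
For reference, EM for a simple mixture model can be written almost entirely as array operations. The following is a minimal sketch for a two-component 1-D Gaussian mixture, not the model from the class mentioned above (which is not specified); the function name `em_gmm_1d`, the initialisation and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def em_gmm_1d(x, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture with EM, using array ops only."""
    x = np.asarray(x, dtype=float)
    # Crude initialisation (an assumption): start the means at the data extremes.
    mu = np.array([x.min(), x.max()])
    var = np.array([x.var(), x.var()])
    weight = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibilities, shape (n_samples, 2), via broadcasting.
        dens = weight * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var)
        dens = dens / np.sqrt(2 * np.pi * var)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted averages, with no Python loop over the samples.
        total = resp.sum(axis=0)
        weight = total / x.size
        mu = (resp * x[:, None]).sum(axis=0) / total
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / total
    return weight, mu, var

# Synthetic data: two well-separated clusters centred at -2 and 3.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2.0, 0.5, 500), rng.normal(3.0, 1.0, 500)])
weight, mu, var = em_gmm_1d(data)
```

The only Python-level loop left is the one over EM iterations; everything per sample happens inside NumPy.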

------
jonmc12
Here is one take on Python vs Matlab:
[http://vnoel.wordpress.com/2008/05/03/bye-matlab-hello-
pytho...](http://vnoel.wordpress.com/2008/05/03/bye-matlab-hello-python-
thanks-sage/)

Also, check out Enthought's distribution:
<http://www.enthought.com/products/epd.php>

~~~
breck
That first link was really informative. Thanks.

------
wesm
I've had a lot of luck building scientific / research tools in Python / NumPy
with Matplotlib. Integrating Fortran / C / C++ code is extremely easy as well;
things like Cython / Pyrex ease that quite a bit too.

My biggest complaint about Matlab (besides the licensing) is that it's just a
horrendously bad programming language (if you can call it a language at all).
Any self-respecting hacker deserves better.

With Matlab you have to buy a toolbox for everything (e.g. SQL database
interaction). There is not much (for most applications) you could want to do
that Matlab can do and Python + NumPy can't.
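
On the database point, a minimal sketch using `sqlite3` from Python's standard library shows what SQL interaction looks like without any add-on. The table and column names are made up for illustration, and an in-memory database stands in for a real one.

```python
import sqlite3

# In-memory database with a made-up table, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (run_id INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [(1, 0.5), (1, 1.5), (2, 4.0)],
)

# Pull one run's values back out as plain Python floats.
rows = conn.execute(
    "SELECT value FROM measurements WHERE run_id = ? ORDER BY value", (1,)
).fetchall()
values = [row[0] for row in rows]
mean_value = sum(values) / len(values)
conn.close()
```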

~~~
yummyfajitas
Just thought I'd mention that `ipython -pylab` exists. It's a fantastic python
repl (tab completion, all that nice stuff) which integrates nicely with
matplotlib/numpy.

Matlab does have one advantage, though: scientific library support. Need
neural networks, TV based image processing algorithms or something else
similarly specialized? Matlab wins.

~~~
schtog
Yes, libs are my biggest concern but Python has neural net libs, don't know
the quality though. Also very good image processing libs but perhaps not one
for TV-stuff.

------
miloshh
If you're affiliated with a university, it's very likely that a Matlab license
is already available to you. I do research in graphics, and I find it
extremely productive to test ideas in Matlab. You can hardly find a better
time-saving procedure than testing an idea in Matlab and knowing in 10 minutes
that it doesn't work, instead of spending a week implementing it in C++. Once
you find what works, re-implementation in C++ is a breeze.

------
tlb
I've used both and I greatly prefer python+scipy+matplotlib, because it's a
real programming language, unlike Matlab's clunky Basic-like thing. So if
matplotlib doesn't do what you need, you can extend it or just write something
custom in GTK+Cairo. The depth and stability of scipy & matplotlib is very
impressive.

------
mwexler
Depending on your bent, you may find R (<http://www.r-project.org/>) with
Python (<http://www.omegahat.org/RSPython/>) a very powerful combination. More
statistical than some of the others you mention, and sadly constrained by
memory unless you play some games, you will probably find that most cutting
edge stats code is available for R. Data mining and AI folks also use R, but
as you point out, other matrix and functional languages may fit your specific
approach.

------
pskomoroch
Numpy/Scipy syntax is very close to matlab, but python is a lot more powerful.
I port matlab code over to python pretty frequently.
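
A small side-by-side gives a feel for how close the syntax is. This is an illustrative sketch, not taken from the linked post; the MATLAB equivalents are given in the comments and the matrices are made up.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])   # MATLAB: A = [1 2; 3 4]
b = np.array([5.0, 6.0])                 # MATLAB: b = [5; 6]

At = A.T                                 # MATLAB: A'  (for a real matrix)
elementwise = A * A                      # MATLAB: A .* A
matrix_product = A @ A                   # MATLAB: A * A
x = np.linalg.solve(A, b)                # MATLAB: x = A \ b
first = A[0, 0]                          # MATLAB: A(1, 1)  (NumPy indexes from 0)
last_row = A[-1, :]                      # MATLAB: A(end, :)
```

The main things to watch when porting are the 0-based indexing and the fact that `*` is elementwise on NumPy arrays.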

A toy example is here, along with links to other comparisons:
[http://www.datawrangling.com/python-montage-code-for-
display...](http://www.datawrangling.com/python-montage-code-for-displaying-
arrays)

------
koraybalci
Most universities have student license servers for Matlab afaik. And, Matlab is
really really, annoyingly powerful. You've got almost anything to try your
ideas, implement an academic paper. But, it is slow (execution time).

Prototype with Matlab, implement with C++. That's what I usually prefer. I am
not a Python expert, just learning it, but I think it will be slower than C
and less powerful than Matlab, from my perspective of working.

~~~
wesm
I think you'd be surprised at how powerful NumPy is. Try timing multiplying
large matrices (I can tell you who wins, and not by a small margin)
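
For anyone who wants to run that comparison themselves, here is a minimal timing sketch for the NumPy side. The helper name `time_matmul` and the matrix size are arbitrary choices, and no result is claimed here; the MATLAB side would be something like `tic; C = A*B; toc`.

```python
import time
import numpy as np

def time_matmul(n, repeats=3):
    """Return the best wall-clock time (seconds) for one n-by-n matrix product."""
    rng = np.random.default_rng(0)
    a = rng.random((n, n))
    b = rng.random((n, n))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        c = a @ b
        best = min(best, time.perf_counter() - start)
    return best, c.shape

elapsed, shape = time_matmul(500)
print(f"500x500 product: {elapsed:.4f} s")
```

The number depends heavily on which BLAS library NumPy was built against, so it is worth measuring on the machine you actually plan to use.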

One downside is you _do_ have to poke around for the right functions, whereas
in Matlab you have all the functions sitting in your (always global)
namespace.

~~~
dangoldin
It seems that NumPy is the equivalent of Perl's PDL. PDL is some inline C code
so you get the benefit of both languages.

------
zentux
In my opinion, "Python+Scipy+Matplotlib" can replace the equivalent libs in
MATLAB. I had the same problem with Python when I decided to develop face
detection software using neural networks! At that time there was no complete
NN lib for Python, and I don't think there is a comprehensive one now either.
Long story short, if you want to stick to some small libs, that mixture is OK.
But if you want to go further in your field (like blending NN and DIP) you
will run into trouble... As a solution, you can mix Python with libs like
OpenCV for computer vision and image processing. Check pyOpencv too :)

------
tlb
The other big problem with Matlab is that because it's licensed, you can't
just do what you like with it. I've got several servers that I like to be able
to crunch numbers on, and that would be not only expensive but incredibly
inconvenient to maintain all those licenses of Matlab.

------
lutorm
I don't like the looks of the pylab graphs, I wouldn't put them in a paper.
However, there's also PyX (<http://pyx.sourceforge.net/index.html>) which is a
little more verbose in usage but makes very nice "latexy" plots that go well
along with a latexed paper. And it can do some really advanced stuff, too,
that I wouldn't even know how to begin thinking about doing in pylab.

------
yters
Here is a benchmark comparing R, Octave, Matlab, S-PLUS, and a few other
common mathematical programming languages.

<http://www.sciviews.org/benchmark/index.html>

Surprisingly, R is one of the best in terms of speed, comparable to Matlab
(Matlab is pretty fast if you vectorize your code). Plus, if you like
functional programming, the R language is based on Scheme.

------
huangnankun
I find that matlab is a great program to try out new ideas and algorithms when
you are just starting out. The plotting library is also very easy to use and
there are many helpful built in functions for matrix, linear algebra
manipulations. However, the language itself is a horror to code in; it's based
on Fortran. I still really hate the 1-based array indexing after using it for
years.

~~~
lutorm
That sounds like me about IDL... ;-)

------
ninjaa
integer division JUST KILLS Python for scientific computing, it introduces the
most disgusting silent errors everywhere. And the visualization functions
generally suck compared to the handy plot() of Matlab.

Still, Py handily beats other free alternatives to Matlab

~~~
wstein
Two comments about this:

1. In Python 3.0, integer division will return a float, e.g., 1/3 will be
0.3333... At Scipy 2006, Guido explicitly stated in his keynote talk that the
design choice he made in Python (i.e., that n/m is floor(n/m)) was a mistake.

2. In Sage (<http://sagemath.org>), which is built on Python, we do some very
minimal preparsing of input, so that 1/3 is the _exact_ rational number 1/3
(instead of Python's stupid 1/3 == 0). We also replace, e.g., 2^3 by 2**3.
Sage does a lot of exact symbolic and high precision arithmetic, so 1/3
staying the rational 1/3 makes sense as the default (though one can easily
change this).
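
The division behaviours discussed above can be seen side by side in plain Python 3. This is a minimal sketch using only the standard library; `fractions.Fraction` stands in here for an exact rational type and is not Sage's own implementation.

```python
from fractions import Fraction

true_quotient = 1 / 3        # Python 3: true division, a float
floor_quotient = 1 // 3      # explicit floor division, the old Python 2 result of 1/3
exact = Fraction(1, 3)       # exact rational, no rounding

power = 2 ** 3               # exponentiation
xor = 2 ^ 3                  # in plain Python, ^ is bitwise XOR, not a power
```

In Python 2, `from __future__ import division` gives the Python 3 behaviour of `/` on a per-module basis.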

Disclaimer: I started the Sage project.

------
Predictor
Tastes vary (as the mix of comments here will attest), but having sampled a
variety of development and analysis tools, I have settled on MATLAB as my tool
of choice. Part of my reasoning can be found in the Nov-08-2006 posting to my
Weblog, "Why MATLAB for Data Mining?":

[http://matlabdatamining.blogspot.com/2006/11/why-matlab-
for-...](http://matlabdatamining.blogspot.com/2006/11/why-matlab-for-data-
mining.html)

~~~
elninyo
I am a climate scientist and Matlab offers a lot to me : great matrix syntax,
very fast algorithms (SVD, matrix inversion, FFT), large mindshare (cf Central
File Exchange), and advanced toolboxes for statistics, spectral analysis and
the like. Not being a programmer by birth, I couldn't care less about the fact
that it's not a general purpose language ; it is very adequate for my own
purpose.

BUT... it is proprietary, pretty expensive, and at the moment I am considering
problems that require parallel computations on dozens of CPUs. Matlab has a
Parallel Computation Toolbox that is a joke (<= 8 CPUs at a time, monopolizing
the same number of licenses), and that is why I am considering Python.

Does NumPy enable parallel computations ? Is it reasonably easy to translate
Matlab code into NumPy ? Is there a good library of linear algebra routines
(like SVD, eigendecompositions, inversions, LU, Cholesky decompositions,
etc..) ? A large user community with a comprehensive archive ?
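
On the linear algebra question, most of the routines listed are available in `numpy.linalg`; the sketch below exercises a few of them on a made-up test matrix. LU decomposition is not in `numpy.linalg` but is provided by `scipy.linalg.lu`. The variable names and the test matrix are assumptions for illustration, and the sketch says nothing about parallel execution.

```python
import numpy as np

rng = np.random.default_rng(0)
m = rng.random((4, 4))
spd = m @ m.T + 4 * np.eye(4)            # symmetric positive definite test matrix

u, s, vt = np.linalg.svd(spd)            # singular value decomposition
eigvals, eigvecs = np.linalg.eigh(spd)   # eigendecomposition (symmetric case)
inv = np.linalg.inv(spd)                 # explicit inverse
chol = np.linalg.cholesky(spd)           # lower-triangular Cholesky factor

# Sanity checks: each factorisation should reconstruct the original matrix.
svd_ok = np.allclose(u @ np.diag(s) @ vt, spd)
eig_ok = np.allclose(eigvecs @ np.diag(eigvals) @ eigvecs.T, spd)
chol_ok = np.allclose(chol @ chol.T, spd)
inv_ok = np.allclose(spd @ inv, np.eye(4))
```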

------
newt0311
Well... matplotlib (and scipy to a lesser extent) have atrocious design and
documentation from a programming perspective. On the other hand, Matlab is
worse in that regard and Python is an excellent general purpose programming
language. Unless you plan on using one of the more obscure libraries that come
with Matlab, I would advise the python combo.

