
Homogenization of scientific computing – Python is eating other languages’ lunch - reactor
http://www.r-bloggers.com/the-homogenization-of-scientific-computing-or-why-python-is-steadily-eating-other-languages-lunch/
======
Xcelerate
I do scientific computing, and Python is one language I never actually got
around to learning for some reason. However, as a long-time hobby, I do have
an interest in programming languages so I like exploring things like Haskell,
Clojure, Lisp, etc.

One language I'm really excited about for scientific computing though is
Julia. From a language-design perspective, it's beautiful. It was actually
thought out rather than kludged together. I've been trying to gradually use it
more and more for my research, but the only problem I've found so far is the
large mental context-switch I make going from my usual languages to Julia.
It's hard to tell what Julia code will be the most performant because there are
many ways of doing the same task. I saw someone in the comments on this page
mention that you can hand-tune the LLVM generated output within the REPL
itself. I imagine this would be very useful if I can get around to learning it
(anyone know a good tutorial?)

~~~
xfax
The big question for me is whether Julia will be able to maintain its "purity"
as it gains adoption.

R probably started out "beautiful" and "thought out" but has lost that edge
with years of community driven development. It's also what makes it so damn
useful -- you can pretty much find anything on CRAN, often multiple
implementations of it.

~~~
mbq
R is actually one of the most pure languages out there; it basically says "I
have vectors; they can have missing values, be nested, and can have other
vectors as attributes. And I have functions with lexical scoping. Now go and
build the rest as you like." So people did this, one better, one worse -- but
the core and beautiful stuff here is that all those approaches will work
together and just do the job.

------
gajomi
>The combination of NumPy/SciPy, MatPlotLib, pandas and statmodels had
effectively replaced R for me, and I hadn’t even noticed

I am surprised that he "hadn't noticed" the switch from plots in R to
MatPlotLib. I am a long time MatPlotLib user and I STILL find myself noticing
all the time just how painful it can be (irregular data model, weird function
names, the insanity that is the documentation). Then I go to the page and feel
guilty because the guy who started the project (which I am using for free)
died and all the finished plots look so beautiful.

------
frik
For scientific computing with some R or Matlab/Octave background, I suggest
the new language Julia: [http://julialang.org/](http://julialang.org/)

The scientific computing Python community commonly uses version 2.x, so the
migration step to version 3.x is still ahead.

~~~
sentenza
For me, Python has everything I need, among which there are many things that R
or Matlab have not. If I should summarize what makes Python so suited for the
things I do (and did when I was still doing research) it's the following:

Python is an easy to use scripting language that can be integrated with
number-crunching C/C++ code and for which a scientific standard library _with
a vibrant community_ exists.

Also, I haven't written a piece of 2.x code in half a year, which is of course
only possible because scipy and matplotlib are 3.x ready.

~~~
dangayle
>> Also, I haven't written a piece of 2.x code in half a year, which is of
course only possible because scipy and matplotlib are 3.x ready.

Whoa. I didn't know people like you existed in the wild. Someone needs to
contact the authorities and let them know that you exist.

------
zwieback
Too bad the first part of the post title was edited out of the HN title. I
think outside of scientific computing the picture is a little more nuanced.

~~~
StefanKarpinski
Even inside of scientific computing, the picture is quite a bit more nuanced.
This post is basically about the author's personal migration to Python as a
user of other people's scientific programming packages. In doing interviews
with people inside of companies, there's fairly little actual use of Python
for scientific computing – lots of Python for data preparation, but R and
Matlab (not to mention Simulink) still dominate for the actual scientific
part. And of course, there's the bizarre blind spot that the SciPy community
has to the fact that they are really doing _scientific computing in C_ –
literally every single package you use that's scalable and performant is
actually written in C. This is true of R and Matlab too, of course.
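
A rough way to see the gap being described, using only the standard library: CPython's built-in `sum` is implemented in C, while an explicit loop runs in the interpreter. This is only a sketch; the function names and the input size are illustrative assumptions, and timings will vary by machine.

```python
import timeit

def python_loop_sum(values):
    """Add up values with an explicit loop executed by the interpreter."""
    total = 0
    for v in values:
        total += v
    return total

def builtin_sum(values):
    """Add up values with the built-in sum, which CPython implements in C."""
    return sum(values)

data = list(range(100_000))

# Both approaches must agree on the answer.
assert python_loop_sum(data) == builtin_sum(data)

# Time 20 repetitions of each; only the relative difference is of interest.
loop_time = timeit.timeit(lambda: python_loop_sum(data), number=20)
c_time = timeit.timeit(lambda: builtin_sum(data), number=20)
print(f"interpreted loop: {loop_time:.4f}s, built-in sum: {c_time:.4f}s")
```

The user-facing code is Python in both cases; the difference is where the inner loop actually executes.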

~~~
hharrison
Yeah, I'm a Python convert like the author, though coming mostly from Matlab
rather than R, and everyone in my field reacts with surprise when I tell them
I prefer Python. They're open-minded, and I'm hoping to convert a few myself,
but I don't think the mass migration has happened yet.

Regarding your second comment- you're correct of course, but what makes this a
"blind spot"? After all, if the user is writing code in Python, they're doing
scientific computing in Python, regardless of what the Python library calls
behind the scenes. In my experience, a lot of people doing scientific
computing--particularly those more interested in the science than the
computing--couldn't care less about what's going on behind the curtain. Any
moment they have to think about implementation is a moment not thinking about
science and therefore a waste of time. So it's actually a _benefit_ for an
ecosystem to hide the underlying mechanics--calling it a "bizarre blind spot"
seems to imply they're doing something wrong.

~~~
StefanKarpinski
I call it a "bizarre blind spot" because it seems like there's a silent
consensus to never talk about this basic fact. It's a bit surreal attending
SciPy and hearing all of these people talking about scientific computing _in
Python_ when almost every single person in the room spends the vast majority
of their time and energy writing C code.

I disagree that the separation between implementation and user-land that's
enforced by two-language designs like C/Python or C/R is socially beneficial:

1\. If your high-level code doesn't perform fast enough (or isn't memory
efficient enough), you're basically stuck. You either live with it or you have
to port your code to a low-level language. Not impossible, but not ideal
either.

2\. When there are problems with some package, most users are not in a
position to identify or fix those problems – because of the language boundary.
If the implementation language and the user language are the same, anyone who
encounters a problem can easily see what's wrong and fix it.

3\. Basically a corollary of 2, but having the implementation language and
user language be the same is great for "suckering" users into becoming
developers. In other words, this isn't just a one-time benefit: as users use
the high-level language, they automatically become more and more qualified to
contribute to the ecosystem itself. It is crucial to understand that this does
not happen in Python. You can use NumPy until the cows come home and you will
be no more qualified to contribute to its internals than you were when you
started.

These benefits aren't just hypothetical – this is what is actively happening
with Julia, where almost all of its high-performance packages are written in
Julia. In fact, I never realized just how important these social effects were
until experiencing it first hand. The author of the article wrote:

> It turns out that the benefits of doing all of your development and analysis
> in one language are quite substantial.

It turns out that it is even more beneficial to not only do development and
analysis, but also build libraries in one language. Of course, Julia has a lot
of catching up to do, but it's hard to not see that the author's own logic
implies that it eventually will catch up and surpass two-language systems for
scientific computing.

~~~
dded
Thanks for this reply. I thought the "bizarre blind spot" comment was some
sort of (absurd) thought that numpy users were unaware that C was being used
under the hood.

> it eventually will catch up and surpass two-language systems for scientific
> computing.

Assuming that, like hardware engineers, scientists have a fair bit of general-
purpose scripting to do, Julia will itself be part of a different kind of two-
language solution unless it is up-to-snuff w.r.t. said general-purpose
scripting. This implies libraries and good interaction with OS utilities. Any
thoughts on whether or not this will be an issue with Julia?

~~~
ihnorton
In addition to what jamesjporter mentioned, Julia has IMHO a _very_ nice,
clean shell interaction paradigm for this very use case ("glue"):

[http://julialang.org/blog/2012/03/shelling-out-
sucks/](http://julialang.org/blog/2012/03/shelling-out-sucks/)

One of the best examples of this is the package manager's concise wrapping of
git CLI commands:

[https://github.com/JuliaLang/julia/blob/master/base/pkg/git....](https://github.com/JuliaLang/julia/blob/master/base/pkg/git.jl)

(aside: there has been some discussion of moving to libgit2 for performance
reasons)

Until recently, the startup time somewhat precluded use for general scripting.
However, on the trunk the system image is statically compiled for fast
startup, so scripting usage is viable.

~~~
pygy_
WRT shell integration, a follow up post details the safe (no code injection)
and straightforward Julia implementation.

[http://julialang.org/blog/2013/04/put-this-in-your-
pipe/](http://julialang.org/blog/2013/04/put-this-in-your-pipe/)
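
For readers coming from Python, the same "no code injection" idea can be sketched with the standard `subprocess` module. This is a Python analogue, not the Julia implementation from the linked posts: the command is passed as a list, so no shell ever parses it. The helper name `echo_argument` and the hostile string are made up for illustration.

```python
import subprocess
import sys

def echo_argument(arg):
    """Run a child Python process that prints its first argument.

    The command is a list, so no shell parses it and `arg` reaches the
    child as one literal argument, whatever characters it contains.
    """
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", arg],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.rstrip("\n")

# A string that would run a second command if it were pasted into a shell line.
hostile = "file.txt; echo injected"
print(echo_argument(hostile))
```

The child just echoes the string back unchanged; the `;` is never interpreted as a command separator.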

------
JonSkeptic
I know that I use python because of how easy it is to code. I can focus
wholly, totally on the logic of my code without ever worrying about if I
misplaced a semi-colon or left out some weird punctuation.

Python frees me to code and not worry about things that get in the way of
coding. That's why it's eating other languages' lunches; the freedom is almost
intoxicating.

~~~
area51org
It's ironic that you say that, given that if you don't get the whitespace
correct, you'll have a syntax error. That's one of the big reasons Python rubs
me the wrong way: white space is semantic.

~~~
pyre
Getting the indentation right should be the least of your worries if you have
a good editor (and don't do something like mix spaces and tabs, which I think
everyone agrees is a bad idea across _all_ languages).

When was the last time that you manually typed out 4 (or 2, or 8, etc) spaces
to indent a line of code vs. just hitting tab and letting the editor handle
inserting those spaces (or the editor automatically indenting when you hit
enter on the previous line)?

As an aside, all of the people that I've met in person that get red in the
face over the idea of white space being semantic are the sort of people that
write code like this:

    
    
      sub function1 { return map { $_[2]->do_something($_) } @{shift->(@_)[0]} }
    

I'm sorry, but I can't get worked up about not being able to write code like
that.

Note: the above code is a reasonable approximation of _actual_ code I
encountered by an actual person that would get visibly upset about Python's
semantic white space.

P.S. The two '$_'s in the map block actually refer to two different variables,
and this is one of the reasons I remember that bit of code. It makes no sense
to mix usage like that because it becomes confusing.

~~~
simias
> Getting the indentation right should be the least of your worries if you
> have a good editor

I never understood that. The whole problem for me is that, with indentation
being the only thing denoting blocks, the editor can't know for sure how things
should be indented, since it's not simply cosmetic.

I haven't written a whole lot of Python but how do you even refactor python
code? In C I can just copy paste a block of code from anywhere to anywhere (no
matter the coding style in the source and destination file and the level of
indentation) and then hit C-M-\ in emacs and have it reindent everything
properly. In Python you have to make sure that everything is at the level of
indentation it belongs to. If you refactor huge chunks of code it's easy to
miss one fubar tab and have code subtly broken and introduce weird
regressions.
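
A minimal sketch of the kind of regression described above, with made-up function names and numbers: the two functions contain the same statements, and only the indentation of one line differs. Both are valid Python, so nothing flags the mistake.

```python
def total_with_tax_inside(prices, rate):
    """Apply tax to every item: the tax line is indented into the loop."""
    total = 0.0
    for price in prices:
        total += price
        total += price * rate
    return total

def total_with_tax_outside(prices, rate):
    """Same statements, but the tax line sits one level out (as after a
    hypothetical sloppy copy-paste), so it runs once, on the last price."""
    total = 0.0
    for price in prices:
        total += price
    total += price * rate
    return total

prices = [10.0, 20.0, 30.0]
print(total_with_tax_inside(prices, 0.5))   # 90.0
print(total_with_tax_outside(prices, 0.5))  # 75.0
```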

Also, regarding the OP and "focusing only on your code", I think we all feel
that way about the language we're the most familiar with. For me that's C and
I can't say I've had a "missing semicolon" compilation error in months of
heavy use. Once you're used to the syntax it becomes automatic.

~~~
bjourne
Indeed that is a problem when you are copy-pasting huge blocks of code. In
deeply nested code it can be difficult to determine whether the nesting should
be say 28 or 32 spaces. In practice, most people shy away from writing such
code because too many levels of nesting are hard to follow. People also prefer
to write atomic 5-15 line functions in which keeping track of the nesting
levels is trivial.

Many C# and Java-heads complain that Python lacks support for auto-completion.
Which is true, the language makes it so you can't have as sophisticated auto-
completion as is available for the aforementioned languages in Visual Studio
and Eclipse. But it's not so bad because Python developers are trained to
prefer shorter names such as "getattrs" over OverlyLongJavaNames such as
"GetAllAttributes".

Btw have you noticed that on this site, the only thing that indicates how the
comment threads are structured is how the individual comments are indented?

~~~
Aloisius
_Many C# and Java-heads complain that Python lacks support for auto-
completion. Which is true_

Isn't that an IDE issue and not a language issue? I have no problem with the
auto-completion in ipython for instance, though even ipython notebook is only
useful for writing simple amounts of code. PyDev though works well too for
larger projects albeit a bit sluggishly.

~~~
bjourne
No, Python's dynamicness means it is in general impossible to find all
available completions:

    
    
        m = type('', (), {})()
        setattr(m, 'foo', 123)
        m.f<tab>
    

Without actually running the code (which is unsafe), no editor could at that
point figure out what the completion should be.
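
To make the point concrete, here is a small sketch of what a tool that does inspect the live object (as an interactive shell can) would see before and after the `setattr` call. The helper `runtime_completions` is a hypothetical name for illustration, not any editor's or IPython's actual API.

```python
def runtime_completions(obj, prefix):
    """Return public attribute names of a live object that start with prefix."""
    return sorted(name for name in dir(obj)
                  if name.startswith(prefix) and not name.startswith("_"))

m = type("", (), {})()          # instance of an anonymous, empty class
before = runtime_completions(m, "f")
setattr(m, "foo", 123)          # the attribute exists only after this line runs
after = runtime_completions(m, "f")
print(before, after)            # [] ['foo']
```

The name `foo` appears only in a string passed to `setattr`, so it is discoverable by looking at the object at run time, not by reading the class definition.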

------
jofer
I do all of my scientific computing in python these days.

However, I think it's interesting to compare popularity using stackoverflow
(which isn't a great metric, as most scientists aren't aware that it exists):

    
    
      Semi-useless Stackoverflow Popularity Metric
      --------------------------------------------
    
      Searching for questions tagged "[r]":
        * 45,119 questions
    
      Searching for questions tagged "[matlab] or [simulink]":
        * 27,044 questions
    
      Searching for questions tagged "[numpy] or [scipy] or [pandas] or [matplotlib]":
        * 18,745 questions
    
      Searching for questions tagged "[julia-lang]":
        * 95 questions
    
      Searching for questions tagged "[python]":
        * 255,603 questions
    

Sure, python isn't as widely used for scientific computing as R or matlab (as
evidenced by the third item above), but there's a lot to be said for using a
very widely-used language for scientific computing. This is doubly true once
you branch out from the "core" scientific code. Building a deployable desktop
application is a lot easier in python than in matlab (Done it, partly through
java. Don't want to again.) or R (Never tried. Might be easier than I think.).

~~~
chillingeffect
Matlab has its own community.

There have been 25,000 answers in the last 30 days. [1].

It has 100,000 users. [2]

That stackoverflow metric is not great. It is semi-useless.

[1]
[http://www.mathworks.com/matlabcentral/answers/activities](http://www.mathworks.com/matlabcentral/answers/activities)
[2]
[http://www.mathworks.com/matlabcentral/about/answers/](http://www.mathworks.com/matlabcentral/about/answers/)

~~~
jofer
And thus the semi-useless title :)

I'm quite aware of MatlabCentral, and it's a _very_ active community.
Similarly, there are other forums for scipy, etc (mostly mailing lists).
Traditionally, this is where the majority of questions were answered in the
scientific python community, though lately stackoverflow has gained
popularity.

I wasn't claiming it was a complete sample. I used it as an unbiased random
sample, but it's obviously not completely unbiased, either.

I do think it's fair to say that usage of python as a whole (of which the
scientific python community is a very small part) is larger than matlab usage
as a whole. That alone is not a good reason to choose a tool, but it does have
some advantages. That's the point I was trying to get at.

------
fit2rule
I find Lua more interesting than Python. It has all the simplicity, all of the
power, none of the indentation, and is quite a nice portable tool.

That said, I do wonder at times what it is about Lua that makes so many people
not-interested in it, when .. from my naive point of view .. it's an almost
perfect language for rapid development. I don't have that feeling about
Python, quite so much ..

~~~
gaius
For me, Tcl does what Lua does, but much better. In fact given Tcl, it's hard
to see why Lua was created.

~~~
dkersten
_given Tcl, it's hard to see why Lua was created_

Because:

    
    
        From 1977 until 1992, Brazil had a policy of strong trade barriers (called a market reserve) for computer hardware and software. In that atmosphere, Tecgraf's clients could not afford, either politically or financially, to buy customized software from abroad. Those reasons led Tecgraf to implement the basic tools it needed from scratch.
    

[https://en.wikipedia.org/wiki/Lua_(programming_language)#His...](https://en.wikipedia.org/wiki/Lua_\(programming_language\)#History)

~~~
gaius
Very interesting, I did not know that - esp. since Tcl was free!

------
Fomite
As much as I like scikit-learn and pandas, Python likely won't be replacing my
R code for quite some time, and I'll continue to hop between the two of them.

R is, first and foremost, a language for statistical analysis, and that's
really where it shines. Python is getting better (it used to be "you want to
do... _what_?"), but it doesn't have nearly the package infrastructure R does
for advanced statistics. It is, for most _statistical computing_ tasks not
even on the radar for a number of my colleagues.

------
julienchastang
I mostly agree with this article, but we are not there yet. I work with
scientists who love the IPython Notebook technology. Some claim the IP[y]:
Notebook to be the best thing since the Mosaic web browser and the most
important development in scientific computing in a decade. I tend to agree, it
is a revolutionary technology and the idea of executable papers is
tantalizing. But there are also big problems. In particular, setting up a
Python environment with all the necessary libraries is a real pain in the neck
even with technologies like pip. For a fee, companies like Enthought are
making good progress at taking the pain away (though what happens when you
have awkward custom dependencies?). Cloud solutions for preconfigured IP[y]:
Notebook servers are another exciting possibility, but not ideal if you work
with big data where you want your data local to your Python environment.

Also, as I understand, taking advantage of multicore parallelism is not
trivial because of the Python Global Interpreter Lock. I have also worked in
JVM environments where parallel computing is becoming significantly easier and
I don't see that happening in Python anytime soon. I would love to be proven
wrong, of course.
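
A rough sketch of the GIL concern, using only the standard library. It runs the same CPU-bound function sequentially and then on four threads. On a standard CPython build with the GIL, the threaded run is typically no faster, though exact timings depend on the machine and interpreter build. The workload (`count_primes`) and its sizes are arbitrary choices for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def count_primes(limit):
    """CPU-bound work: count primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

limits = [20_000] * 4

# Run the four tasks one after another.
start = time.perf_counter()
sequential = [count_primes(n) for n in limits]
sequential_time = time.perf_counter() - start

# Run the same four tasks on four threads.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(count_primes, limits))
threaded_time = time.perf_counter() - start

print(f"sequential: {sequential_time:.3f}s, 4 threads: {threaded_time:.3f}s")
```

Getting real multicore speedup for work like this usually means separate processes or pushing the loop into compiled code, which is the extra effort the comment is pointing at.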

~~~
dkersten
Re: setting up environment, I love Anaconda for that reason. Wget the
installer, run it, all done - you have a fully featured Python environment
ready to go, including Numpy, Scipy, Scikit-learn and much more. Running
IPython and the IPython Notebook is then trivial.

If you need anything else, you can use their own Conda package manager, or you
can just use pip as usual.

~~~
julienchastang
@dkersten What happens in the scenario where you have other Python
distributions on your system? Does Anaconda keep things nicely
compartmentalized like a virtualenv?

~~~
dkersten
Yes, Anaconda is completely self-contained (you can actually install it
anywhere you like, for example, your home directory) and does not interfere
with the rest of your system at all. You can also run virtualenv from Anaconda
too, if you want to have isolated Anaconda environments without installing it
multiple times (which you could also do, if you wanted..)

I like using it to make sure I have a very easy to install consistent
environment between the various computers I use (including EC2 instances).

------
ghaff
Folks may be interested in this piece by Stephen O'Grady of RedMonk:
[http://redmonk.com/sogrady/2013/11/26/python-r/](http://redmonk.com/sogrady/2013/11/26/python-r/)

He looks at the contention that Python is killing R--based on various data
sources--and ultimately concludes:

"While the original argument is certainly defensible, then, I find it
ultimately unpersuasive. The evidence isn’t there, yet at least, to convince
me that R is being replaced by Python on a volume basis. With key packages
like ggplot2 being ported, however, it will be interesting to watch for any
future shift."

------
norswap
Python now has useful libraries, let's all rejoice!

(And the part about it eating other languages' lunch is unsubstantiated.)

------
mrfusion
My only question mark from this is matplotlib. I tried it five or six years
ago and it seemed clunky to use and install. Worst of all, I couldn't seem to
just throw up a plot; I recall there being a lot of settings required. And the
plots didn't look good by default: you had to fool with fonts, font sizes, etc.

Does anyone know if it's improved a lot since then? Otherwise I'm not seeing
how it could hold a candle to R's plotting abilities and ease of use.

~~~
bsg75
R may still have an advantage when it comes to plotting simplicity.

On the Python side, matplotlib is still a bit of a pain, but has improved.

Also look at ggplot.py (alpha-ish?) and Bokeh from ContinuumIO

~~~
plafl
What is simpler than this?

    
    
      from pylab import *  # provides arange, pi, sin and plot
      x = arange(0, 2*pi, 1e-2)
      plot(x, sin(x))

~~~
leephillips
This:

    
    
      set xrange [0:2*pi]
      plot sin(x)
    

EDIT: Yes, this is gnuplot, not R.

~~~
plafl
However, that is gnuplot (I think), not R code. I have not been clear with my
question, since I was talking about this parent comment:

 _R may still have an advantage when it comes to plotting simplicity_

------
cwmma
In GIS at least I'm noticing a divided migration with one camp standardizing
to python and the other to JavaScript.

------
jliechti1
I have a Python background and recently signed up for the Coursera course on R
that just started
([https://www.coursera.org/course/compdata](https://www.coursera.org/course/compdata))
because I wanted to get a small taste of R and see how it differed from
Python's scientific computing stack.

So right now I'm not far enough in the learning curve to see all the benefits
R provides. Is it worth investing time in R now if I'm already pretty familiar
with a good amount of the Python ecosystem? Or, would it make more sense to
continue on in Python?

~~~
micro_cam
If you are serious about data analysis you should probably at least read R
(and maybe Matlab) as lots of algorithms were released and only exist in one
of those languages.

You could get by with a more general statistics course that happened to use R.

------
sprash
Title should have been: "Homogenization of scientific computing – Python 2.x
is eating other languages’ lunch"

Nobody cares about Python 3 in the scientific community.

------
guildR
RCloud might be useful to look at. It implements IPython Notebooks for R, with
added collaboration based on a github backend.

------
sushirain
Please add (2013) to the title, since it's not new.

------
plg
Python so slow!!

~~~
baldfat
What does this have to do with Scientific Computing? I don't think there is
anyone who would say that in the realm of Scientific Computing there is a
problem with Python being slow.

~~~
0x001E84EE
I would argue that many Scientific Computing problems involve large amounts of
data and/or computation.

~~~
baldfat
Have you ever worked with MatLab or other "Scientific Computing Languages"?
They are like 10 times slower than Python.

I am guessing you are against dynamic languages?

