
Scientific computing’s future: Can Haskell, Clojure, or Julia top Fortran? - x43b
http://arstechnica.com/science/2014/05/scientific-computings-future-can-any-coding-language-top-a-1950s-behemoth/1/
======
beloch
A lot of people coming from a CPSC background view Fortran as a dinosaur.
However, it's probably better viewed as a crocodile in the sense that, even if
it is ancient, it has evolved to fill a specific niche very, very well. Fortran
allows complex math (especially linear algebra) to be expressed more compactly
than most low-level languages (e.g. C) are capable of while still offering
excellent control over how hardware is utilized. Just as it's relatively easy
to dance around a crocodile out of the water, it shouldn't be too difficult
for languages like Python to challenge Fortran when ease of use matters but
performance is a lesser concern. However, going into the muddy water to
wrestle with crocodiles is a different matter entirely! I would not be
surprised if people rehash this conversation with an entirely new set of
prospective croc-slayers another twenty years from now.
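To make the compactness point concrete, here is a hedged sketch of a dense matrix-vector product written out in C. The function name `matvec` and the column-major layout are my own choices for illustration (column-major mirrors how Fortran stores arrays); in Fortran the whole loop nest below is the single intrinsic call `y = matmul(A, x)`.

```c
#include <stddef.h>

/* y = A * x for an m-by-n matrix A stored column-major (Fortran's layout),
 * so element A(i, j) lives at a[i + j * m].
 * In Fortran this entire function is one line: y = matmul(A, x). */
void matvec(size_t m, size_t n, const double *a, const double *x, double *y)
{
    for (size_t i = 0; i < m; i++)
        y[i] = 0.0;
    /* Columns in the outer loop so memory is read contiguously. */
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < m; i++)
            y[i] += a[i + j * m] * x[j];
}
```

In C the storage layout, the zeroing, and the loop order all have to be spelled out by hand; Fortran lets you write the math and leaves those details to the compiler.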

~~~
cabinpark
I think people forget that Fortran was designed only to do mathematics. I
don't know of any other languages that are designed just to do number
crunching and nothing else. Even though C is very low-level, you can write
almost anything in it. Even MATLAB and Julia can do more general-purpose
programming, which Fortran doesn't really allow.

The more domain specific you are, the better you are at that one task. This is
where Fortran excels and why it won't be replaced any time soon.

~~~
walshemj
Well of course real programmers can do anything in Fortran :-)

I wrote billing systems for a major telco that were mostly Fortran.

~~~
cabinpark
Absolutely! Has anyone written a server in Fortran?

------
kunstmord
I've dealt with a lot of legacy Fortran (and Pascal) code, and while I
agree with the article that Fortran has to go, Haskell and Clojure seem
VERY weird and pointless choices in the area of scientific computing (see the
comments on the Ars website, a lot of valid points there). But the biggest
problems that I've encountered in the Fortran code I dealt with were not exactly
Fortran-related: 1) terrible variable naming (aaa = eee / ccc + 1. and so on);
2) gotos, huge chunks of code with no structure (or, even worse, structured
with gotos); 3) disregard for numeric accuracy and overflow/underflow.

And this, imo, has more to do with the way CS is taught to scientists - I had
a two-year course in C/C++, and we spent those two years writing all kinds of
trees, lists and stuff like that. Needless to say, that is good and all, but
that didn't exactly help us with writing scientific code later on - a lot of
people wrote terrible code to get their AVL trees working, for example, just
to get a passing grade. No one taught coding style, working with CVS, computer
arithmetic and such. The same goes for the MATLAB course I took.

In my opinion, it would've been a lot wiser to teach people scientific
computing using Python. It has tons of scientific libraries (a lot of people
that I know who are involved in scientific computations often neglect to
reuse code or use publicly available libraries; teaching people how to use
third-party packages/libraries is important), forces programmers to indent (the
amount of unindented C code I've dealt with makes me shudder), and makes them
realize what makes a program fast or slow. Besides, using Numba, Cython,
Theano, or multiprocessing, it is possible to give a more or less painless
introduction to the world of parallel/optimized computing. Only then should
one start teaching C/C++/OpenMP/MPI/Fortran.

Now, I'm judging from my personal experience and from what I've seen at my
university (which is the second-biggest research university in the country),
there's a huge difference between how CS is taught to CS students and science
students (physics, mechanics). The knowledge that science students receive is
subpar but, unfortunately, just enough to start writing computational code.

~~~
cabinpark
I was going to write a long post but you really summarised my thoughts
exactly. Haskell and Clojure don't even make my list of scientific programming
languages.

My old university switched from C++ to Python for teaching the physicists,
which I think is a good move. For many scientists, Python has most of the
tools they need to do their research effectively and there is no need to go
into the more work-horse languages of C/C++/Fortran. If they do need those
languages, there are plenty of resources available.

The supercomputing consortium at my old university was also making a big push
towards Python (over MATLAB) and, which I think is good, towards teaching
scientists how to write and maintain code. Software Carpentry regularly came through and
gave weekend sessions on tools like version control (which I have seen more
and more scientists use) and how to write readable code.

I think people are recognising the need to teach the basics of software
engineering to scientists and it is catching on in Canada based on what I've
seen from Compute Canada (the group in charge of all the academic
supercomputers in Canada). I think as the current generation, who are now
being taught to use these tools early on, becomes professors, we will see even
more of this. Unfortunately it will take time but it is changing.

------
jnbiche
"Scientific computing" is such a broad term that it's not terribly useful.

For numeric computing, Fortran, C, and C++ will likely remain at the top for
years to come.

For statistical and exploratory data analysis, R has long been king (and
closed-source tools before R), but Python is rapidly coming out on top here.
Clojure could challenge here, but it's far from having the popularity of R or
even Python right now.

For machine learning, MATLAB, along with its open-source analog Octave, has
long been de rigueur, but Python is rapidly gaining ground here, too. I think
this is where Julia is hoping to gain ground, at least initially.

So it's a bit odd to lump different areas of scientific computing together,
but even odder to neglect the one language that has a chance of topping more
than one of these areas: Python. And I say that as someone who is moving from
Python to Go and Rust for a lot of my software (but still staying with Python
for data exploration).

Nonetheless, not a bad introduction to the languages in question.

------
gammarator
As the Ars comments make clear, with the exception of Julia none of these
languages has any chance of wide adoption under the broad umbrella of
"scientific computing."

For a defense of the numerical/scientific computing tradition of which FORTRAN
is the ne plus ultra, see this article:
[http://www.evanmiller.org/mathematical-hacker.html](http://www.evanmiller.org/mathematical-hacker.html)

It's telling of the Ars author's lispy blinders that he gives recursive
examples for computing Fibonacci numbers. As the linked article makes clear,
this is ridiculous because there's a closed-form solution:

    long int fib(unsigned long int n) {
        return lround((pow(0.5 + 0.5 * sqrt(5.0), n) -
                       pow(0.5 - 0.5 * sqrt(5.0), n)) / sqrt(5.0));
    }

> No recursion (or looping) is required because an analytic solution has been
> available since the 17th century.

~~~
walshemj
Julia appears to be too slow for a lot of large real-world uses - this is where
the power cost of running the cluster becomes important.

There has been a job posting near me, outstanding for over a year, looking for
someone to port CFD Fortran to C++ - I have never had to stifle giggles when
talking to a recruiter before.

They would be better off training their new staff in Fortran; after all, that's
what I did at BHRA.

I suspect it's ARA out at Twinwoods - one would hope my old employer isn't so
silly.

~~~
KenoFischer
> Julia appears to be too slow for a lot of large real-world uses

I'm curious what kind of application you are referring to and where you got
that impression. We are always looking for examples where we don't get good
performance so that we can optimize them, so I'd love to hear about your experiences.

~~~
walshemj
Well, I was going off Stack Overflow:

[http://stackoverflow.com/questions/20613817/julia-julia-lang...](http://stackoverflow.com/questions/20613817/julia-julia-lang-performance-compared-to-fortran-and-python),
where, even with hand-tuning, the Julia code is a lot slower than Fortran.

I was thinking of large-scale CFD work, i.e. simulating two-phase flow in a
nuclear reactor or airflow over aero systems, which is where I suspect the
job I mentioned is based.

I must have a look at Julia after I have finished teaching myself Java (Spit!)
for Hadoop.

~~~
KenoFischer
OK, looking at the code, it seems like you could probably get another 2x by
switching to views rather than slices, which will be the default sometime in
the Julia 0.4 timeframe. That would probably put Julia's performance within
1.2-1.5x of Fortran, which, while there are of course always more
optimizations to be done, is at least pretty good.
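For readers unfamiliar with the slice/view distinction: indexing like `A[:, j]` in Julia allocates a new array and copies the elements, whereas a view only records where the existing data lives. Below is a hedged C sketch of that memory trade-off, using a hypothetical `DView` type; it is not how Julia implements its `SubArray`, only the underlying idea.

```c
#include <stdlib.h>
#include <string.h>

/* A non-owning, strided window onto existing storage (hypothetical type). */
typedef struct {
    const double *data; /* first element of the window */
    size_t len;         /* number of elements */
    size_t stride;      /* distance between consecutive elements */
} DView;

/* "Slice" semantics: allocate and copy column j of a column-major
 * m-by-n matrix. Caller must free() the result. */
double *column_copy(const double *a, size_t m, size_t j)
{
    double *out = malloc(m * sizeof *out);
    if (out != NULL)
        memcpy(out, a + j * m, m * sizeof *out);
    return out;
}

/* "View" semantics: constant time, no allocation, no copy. */
DView column_view(const double *a, size_t m, size_t j)
{
    DView v = { a + j * m, m, 1 };
    return v;
}

/* Row i of a column-major matrix is a view with stride m. */
DView row_view(const double *a, size_t m, size_t n, size_t i)
{
    DView v = { a + i, n, m };
    return v;
}

/* Any reduction can read through a view without materializing it. */
double view_sum(DView v)
{
    double s = 0.0;
    for (size_t k = 0; k < v.len; k++)
        s += v.data[k * v.stride];
    return s;
}
```

In a hot loop that takes a column on every iteration, the copying version pays for an allocation and a `memcpy` each time while the view pays for neither, which is plausibly where much of a 2x gap like the one estimated above would come from.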

------
haddr
I was missing R and Python in the article. And who says we need a king? Maybe
the rich ecosystem of many coexisting languages is better?

~~~
bayesianhorse
In my experience there is a substantial subset of programmers/computer
scientists who don't even consider any dynamically typed language as worth
their time. In their view, these are at most for beginners or small projects.
This bias is like a huge blind spot...

~~~
reitzensteinm
I think there's also an equal but opposite bias, where programmers disregard
static typing by falsely equating it to what's in Java and C#.

In both cases, I think the Blub paradox is solidly at work.

~~~
bayesianhorse
No, actually, proponents of Python or R don't generally claim that all
statically typed languages are useless in practice. In fact I don't remember
any evidence of that.

Also there doesn't seem to be any evidence that dynamic typing is detrimental
to programming.

And I don't see all non-computer-science-scientist-programmers learning
Haskell any time soon.

------
jey
Julia is definitely going to be the winner. I've been using it for a few weeks
and it is just so _natural_ for scientific/technical computing and effectively
covers a wide range of use cases. The type system and syntax allow for code to
be expressed in terms of the domain's natural objects and notation, without
having to do awkward translations between the math and the code. It has the
mathematical features of MATLAB without compromising on speed (Python) or
expressivity (C and FORTRAN).

~~~
quanticle
Julia will really take off when some of the libraries and toolkits available
on MATLAB get ported to it. From what I've heard, very few people genuinely
like MATLAB. It's just that MATLAB has toolkits with optimized functions for
almost everything under the sun, so if you need to get results quickly, you're
better off sucking it up and using a MATLAB toolkit that does half the work
for you than reimplementing everything from scratch in a less insane language.

AFAIK, this is why Python has caught on so quickly in numerical computing
circles. NumPy isn't up to the level of MATLAB's toolkits in terms of having
functions for specialized applications, but it is a comprehensive numerical
computation library with fast C implementations of a wide range of common
functions.

That's the core lesson that I think the article misses. What matters isn't the
language itself. What matters is the collection of libraries available for
that language. Research and scientific computing is not like typical software
development. In "normal" software development, the maintenance costs of a
particular piece of code will easily swamp the cost of writing the code, so it
makes sense to write the code in a more maintainable language. But for a
research project, once the paper is written, it's fair to say that the code
will never be looked at again. The situation is changing, slowly, as things
like Software Carpentry and more data-driven research projects spread modern
software engineering principles into the research computation community. But,
by and large, research computing is still defined by one-off projects where
the speed of initial implementation (which directly affects the time to
publication) matters a lot more than the long-term maintainability of the
codebase. It's this tradeoff, which is radically different from commercial
software, that explains the persistence of FORTRAN and MATLAB in scientific
computing.

~~~
jey
Julia has a sophisticated and performant mechanism for calling C libraries. It
ships with common packages like SuiteSparse already built into the standard
library's sparse matrix types.

I do agree that Julia isn't ready for prime time "cookbook" style uses, and is
more useful to those writing computational routines.

------
Xcelerate
I do molecular dynamics simulations using LAMMPS on HPC systems. LAMMPS is
written in C++. I'm not normally a fan of object-oriented languages, but this
seems to work well for a system where you have an abstract base class (like an
atomic pairwise potential) that allows users to easily derive their own
potential class from it.

I wouldn't say LAMMPS is super-optimized for a particular application compared
to some other MD codes, but it is very good for a wide variety of situations,
kind of like C++. Just guessing, I'd say LAMMPS is easily within a factor of
2-3 of most hand-tuned assembly codes, but its generality really
outweighs the performance penalty.

Personally, in terms of programming languages, Julia is really growing on me.
I've been using it for performance-intensive, single-threaded programs and it
works great. I'm actually considering experimenting with it for some of my web
application projects (currently using Node.js for those) just because of how
much I like the language design.

------
bayesianhorse
For quite a few cases, the "symbolic" route might be the future. In Python, for
example, there is SymPy, which is mostly a computer algebra toolkit, but
it can translate formulas to Fortran, Theano and JavaScript.

Theano, on the other hand, is also a symbolic toolkit, designed to make linear
algebra super-fast. It takes a symbolic representation of the computations,
and then compiles it into C code or CUDA for GPUs.

Theano has been used extensively in deep learning, but it has other
applications as well.

PyMC is a library implementing Bayesian inference through Monte Carlo methods.
Version 3 implements samplers based on Theano. The advantage here is that
Theano can automatically deduce derivatives, which allows for more
sophisticated algorithms and better performance.
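To illustrate what deducing derivatives from a symbolic representation means, here is a toy expression graph in C with rule-based differentiation. Everything here is hypothetical and far simpler than Theano or SymPy (one variable, only constants, `+` and `*`, no simplification), but it shares the key idea: the derivative is built as a new expression from the graph, not approximated numerically.

```c
#include <stdlib.h>

/* Toy symbolic expressions in one variable x: constants, x, sums, products. */
typedef enum { CONST, VAR, ADD, MUL } Kind;

typedef struct Node {
    Kind kind;
    double value;             /* only used by CONST */
    const struct Node *a, *b; /* operands of ADD and MUL */
} Node;

/* Fixed node pool keeps the sketch free of malloc/free bookkeeping. */
static Node pool[512];
static size_t used = 0;

static const Node *mk(Kind kind, double value, const Node *a, const Node *b)
{
    if (used >= sizeof pool / sizeof pool[0])
        abort(); /* pool exhausted */
    Node *n = &pool[used++];
    n->kind = kind;
    n->value = value;
    n->a = a;
    n->b = b;
    return n;
}

const Node *cnst(double v) { return mk(CONST, v, NULL, NULL); }
const Node *var(void) { return mk(VAR, 0.0, NULL, NULL); }
const Node *add(const Node *a, const Node *b) { return mk(ADD, 0.0, a, b); }
const Node *mul(const Node *a, const Node *b) { return mk(MUL, 0.0, a, b); }

/* Numerically evaluate an expression at a given x. */
double eval(const Node *n, double x)
{
    switch (n->kind) {
    case CONST: return n->value;
    case VAR:   return x;
    case ADD:   return eval(n->a, x) + eval(n->b, x);
    case MUL:   return eval(n->a, x) * eval(n->b, x);
    }
    return 0.0; /* unreachable */
}

/* Build d/dx as a NEW expression using the sum and product rules. */
const Node *diff(const Node *n)
{
    switch (n->kind) {
    case CONST: return cnst(0.0);
    case VAR:   return cnst(1.0);
    case ADD:   return add(diff(n->a), diff(n->b));
    case MUL:   return add(mul(diff(n->a), n->b), mul(n->a, diff(n->b)));
    }
    return cnst(0.0); /* unreachable */
}
```

For f(x) = x*x + 3*x, `diff` yields the unsimplified graph 1*x + x*1 + 0*x + 3*1, which evaluates to 7 at x = 2. A real system would simplify that graph and, in Theano's case, compile it to C or CUDA before evaluating it.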

------
tom_jones
I personally prefer Scala.

It supports functional programming (combining it nicely with object oriented
programming), immutability, tail recursion, lazy evaluation (you have to
specify what will be evaluated lazily), collections with parallel processing
support, actor based processing, pattern matching and of course a REPL.

But none of that is mandatory, for example when needed, you can also use
mutable variables and collections.

It runs on the JVM, and you can mix Java code and libraries with Scala code
and libraries. And the ecosystem of Java libraries is huge.

~~~
warmfuzzykitten
Yes, Scala certainly has all the latest programming language bells and
whistles, but - leaving aside the many highly tuned libraries and 60 years of
compiler experience on every imaginable hardware configuration - it doesn't
have the single attribute that keeps Fortran on top: runs numeric codes fast.

~~~
frowaway001
A trait shared with the other languages mentioned here. It will be hard to
beat Fortran, although Scala might be closer than the alternatives.

~~~
tormeh
It would be fun to see what Scala's performance is like with long-running
simulations. It should be ideal for HotSpot optimization. Isn't Java used just
as often for HPC as C?

