
Why physicists still use Fortran (2015) - dralley
http://moreisdifferent.com/2015/07/16/why-physicsts-still-use-fortran/
======
thicknavyrain
"Professors usually have this legacy code on hand (often code they wrote
themselves decades ago) and pass this code on to their students. This saves
their students time, and also takes uncertainty out of the debugging process."

This is so true. I'm a PhD student in physics using Fortran for pretty much
that reason. At the start of my PhD, in response to my supervisor telling me I
should learn Fortran to modify our current codebase, I asked if I could
rewrite what I'd be working on into C++ first, since I was already familiar
with it and wanted to bring future development into a more "modern" language.

His response was "You could do that and it would probably be enough to earn
your PhD, since it'll take you at least three years. But I suspect you'll want
to work on something else during that time".

He was right. I later learnt that one of our "rival groups" attempted the same
thing, and it took three PhD students working full time for a year to rewrite
their code from Fortran to C++.

~~~
ChuckMcM
Now you need to do a follow-on study to see how much science the 'rival' group
does with a more modern codebase than your 'legacy' group does. Would you know
if there are enough examples of two groups who have diverged like this to get
meaningful (as in statistically significant) results on the cost-benefit of
porting / not porting?

~~~
sampo
> Now you need to do a follow on study to see how much science the 'rival'
> group does with a more modern codebase than your 'legacy' group does.

I would guess that a C++ codebase written by PhD students, not by seasoned C++
experts, is more complicated and much slower to debug, than a corresponding
Fortran codebase.

~~~
adrianN
Well, that Fortran codebase was probably also written in large part by PhD
students, not expert Fortran programmers.

~~~
sampo
Fortran is a much simpler and safer language than C++. And faster to learn.
Fortran is even somewhat simpler than C or Java, whereas C++ is probably the
most complicated language in the known universe.

So especially in the hands of non-experts, Fortran should produce fewer bugs.

~~~
HelloNurse
C++ is complex. Perl is complicated.

------
smcdow
Speaking from a government contracting point of view: Nobody is going to pay
you to rewrite existing code that's already working. Nobody. The customer
doesn't give a flying shit about the implementation. He'd be happy with a box
of diodes as an implementation, as long as it worked and came in on time and
on budget.

When you're writing up your proposal for a contract or a grant, the theme
should always be that you're "adding capabilities" (which should be well-
defined and constrained) to the existing codebase. If you get the money, then
you've got carte blanche to rewrite to your heart's content - just don't tell
the customer that this is what you're doing. Just make sure that those new
capabilities indeed make it into the re-write and that you introduce no
regressions in the new code.

~~~
fnord123
People don't tend to write tests for their Fortran code, so the assumption
that it's already working and that the numbers coming out are correct is a
matter of faith.

But yes, no one sees it this way.

~~~
gmueckl
Writing tests for the kind of numerical code that FORTRAN is usually used for
is hard. Sometimes there is no direct way of testing it, because if you
already knew the results you wouldn't need to run the simulation in the first
place. Quite often, the best you can do is sanity checks like conservation of
energy and momentum.

~~~
fnord123
Yes, it's hard. But without an automated test suite checking the numbers
coming out, any change to the code could introduce numerical instability.

------
quadruplebond
As someone working on an Exascale project for electronic structure
calculations, I have a theory about the longevity of Fortran: many of these
codes were started years ago, and the people who have the credentials and
ability to get funding for supercomputing projects learned on Fortran and
stayed with Fortran because they were scientists first and programmers second.

Modern Fortran has many nice features in 2017, but the people who wanted
these features moved to C/C++ long before the features became available in
Fortran, and those who are left using Fortran are usually scientists, not
programmers, and so don't care so much about these features. I think it is
largely the older generation, in the survey mentioned in the article, that
says they will never stop using Fortran.

Just to suggest where the field is moving, though: NWChem is a large,
successful electronic structure package written in Fortran. Its next-gen
version, NWChemEx, which is being designed for exascale, will be written
exclusively in C++
([https://www.pnnl.gov/science/highlights/highlight.asp?id=441...](https://www.pnnl.gov/science/highlights/highlight.asp?id=4411)).

Also, just from experience, people who work in HPC would mostly rather be
writing C/C++, but use Fortran because they have to, not because they want to.

~~~
sseagull
Yes, the electronic structure community is rapidly moving away from Fortran.

In my opinion, Fortran will fall behind as newer hardware, libraries, etc.,
drop Fortran support. Also, newer grad students are all much more interested
in C/C++/Python. I think this is in part because the newer languages are
widely used outside of science, and therefore there is much more
documentation and many more tutorials/guides. Not to mention that skills in
those languages are transferable to other areas (data science, machine
learning, etc.).

As a side note: wow, someone working on the ECP on HN? I'm tangentially
related to the project (and just visited PNNL last week).

~~~
jabl
> Yes, the electronic structure community is rapidly moving away from Fortran.

Really? Seems to me that with very few exceptions (e.g. GPAW which is python/C
and nwchemEx which I've never heard about until the parent poster mentioned
it), electronic structure is pretty much a Fortran bastion.

(Source: I did a Phd doing mostly electronic structure calculations, graduated
~5 years ago)

~~~
sseagull
Well, it's moving, not moved yet.

New libraries are being written in C/C++, and maybe Python. This includes
libraries that should form the foundation of the QM community (matrix/tensor
and integral libraries). These are meant to take advantage of newer hardware
and libraries which themselves are written in C/C++, and often in a way that
is inaccessible from Fortran.

The old Fortran code will be around for a long time, but I don't know of any
large-scale, serious efforts to develop new packages or major new
functionality that are starting with Fortran.

(I'm not totally against Fortran - I just spent a week developing in it. But
I still much prefer C++ and Python.)

~~~
gnufx
I don't understand why the implementation language of low level libraries
should determine a high level language that uses them. In what way are C
libraries inaccessible from Fortran, given that it defines interoperability
with C?

I'm afraid you need a 10-, or preferably 20-, year perspective, not a week.

~~~
sseagull
I have been developing in Fortran for years. I just mentioned that as an
aside.

About C compatibility: many C libraries use pointers in their interfaces.
Interoperability is indeed defined by the Fortran 2003 standard, which I have
used several times to wrap existing C libraries. However, much of the existing
code is F90 only (some even F77...), and the vast majority of Fortran
developers in the field are not familiar even with modules and other F90
features, let alone iso_c_binding from 2003.

Also, newer libraries tend to be C++ as well, which is more difficult (or at
least more awkward) to wrap.
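
To make that concrete, a minimal sketch of the usual workaround (the names
here are made up, not from any real library): each C++ entry point needs a
flattened extern "C" shim, because iso_c_binding can only see C symbols.

    
    
        #include <cmath>
        #include <numeric>
        #include <vector>
    
        // A stand-in for some existing C++ API (hypothetical):
        double cxx_norm(const std::vector<double>& v) {
            return std::sqrt(std::inner_product(v.begin(), v.end(), v.begin(), 0.0));
        }
    
        // The flattened shim that Fortran's iso_c_binding can actually bind to:
        extern "C" double cxx_norm_c(const double* x, int n) {
            return cxx_norm(std::vector<double>(x, x + n));
        }
    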

~~~
quadruplebond
It’s partly the power of library authoring that is moving things, I think. To
my knowledge, most post-LAPACK tensor algebra is all C/C++ as well. I don’t
know Fortran, but I am not sure it has the same flexibility when it comes to
generic code and things like writing expression-template math libraries.

Finally, the fact that groups like Facebook and Google are writing their
machine learning code in C++ shows that (1) they find it useful and (2) it's
plenty performant. This kinda became a response to the comment above yours,
sorry.

~~~
sseagull
If you don't mind, I'd like to maybe chat with you a bit and get your opinion
on some things (and maybe see if I've actually met you before). My (mainly
throwaway) email is ytterbium35 (at major email service run by google).

If you don't feel like it, feel free to ignore.

------
cubano
I have a somewhat amusing FORTRAN story from my undergrad days at the Florida
Institute of Technology...

So my first programming class ever was a Numerical Analysis class taught at
FIT, and to be honest, this was my first exposure to a "real" editor (vi on a
PDP-11 in this case)...up to then it was all MS-BASIC with that wonky line
editor and, of course, gotos and line numbers.

At the end of the first class (8am, ugh), the instructor announced that anyone
looking to get extra credit and perhaps _skip having to come to early class_
should talk to him after class. Of course that sounded good to me, so I went
to see him and he said "ok...if you can write me a bowling league manager in
10 weeks, you will get an A and not have to ever come to class."

Ok...hell yes, in fact! This sounded a ton more interesting than sitting
around in a silly class _talking_ about programming. He gave me a spec sheet
and away I went to the lab to begin my struggles with vi and FORTRAN.

It wasn't easy, but holy shit did I learn a lot...more than I ever could have
just doing the exercises in floating-point rounding error and non-linear
simulations (I ended up doing those later as well) that were "taught" in
class.

I can still remember FORTRAN (77, I believe) had a very strict formatting
scheme, where keywords had to start in particular columns in order for the
program to compile, or something stringent like that. But mostly, coming from
BASIC, it was a breath of fresh air.

I ended up completing the program with extra bells and whistles...sorting,
multiple leagues and other things...and the instructor was duly impressed.

I got my A and never woke up before 8am again.

~~~
trapperkeeper74
I knocked out a bunch of lower division CS classes at a JC (taking 5 at a
time). I think I went to the first class and a few before midterms and finals.
Just got the labs and handed them in the next day.

Hurray for attendance-optional JCs! :D

Most of it transferred to a UC, and then the fun began:

- caching http/1.0 forking select() proxy server as the third project in a
networking class, circa 2002

- Java subset to MIPS assembly compiler

- Reimplement most of the OpenGL pipeline in C++, quaternions and write a
trapezoid (scanline to scanline) engine (on which a triangle engine could be
built). Oh and then model the interior of the building.

- Pipelined, microcoded, simple branch-predicting processor. Bonus points for
smallest microcode and fewest microcycles. (I Huffman mapped the histogram of
the sample assembly programs’ executed instructions to the user-defined binary
macro ISA (students had to write the assembler too), and then used progressive
decoding in the microcode (43 micro ops long microprogram IIRC). Blew the
doors off the extra credit in that class.)

------
noobermin
Some of the points brought up here are in fact correct, mainly legacy,
testing, and awesome compilers tuned for supercomputers. However, a lot of
these "why Fortran" articles (on both sides) are, I find, written by people
who don't dabble enough on both sides of the fence and are ignorant of what
either side offers. For example, NumPy implements a lot of the stuff from
Fortran the author listed, like broadcasting operations across arbitrarily
shaped arrays, striding and negative indices, etc., not to mention the SciPy
library that contains leagues of the famous Fortran codes... you get all that
with a quick and easy-to-prototype language for the stuff that isn't the
bottleneck.

Another issue is that computational people think C++ is about OOP. FFS, what
a way to sell C++ short and ignore the more significant tool C++ brings to
the table: generic programming. Whenever I talk to my computational
colleagues, they talk about "C++ and OOP" as if they are two peas in a pod;
what if I told you that you don't need to use inheritance to leverage the
best of what C++ offers (what if I told you that you don't even need
inheritance to leverage OOP!?). Templates have the potential to be a
powerhouse for performance in codes, I feel; it's just that no one on the
computational side has leveraged them, because they quite simply don't
understand them.
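
To make that concrete, here's a toy sketch (mine, not from the article): no
inheritance anywhere, just a compile-time size the optimizer can see, so the
loop can be fully unrolled and vectorized.

    
    
        #include <cstddef>
    
        // N is a compile-time constant, so the trip count is known to the optimizer.
        template <typename T, std::size_t N>
        T dot(const T (&a)[N], const T (&b)[N]) {
            T sum = T(0);
            for (std::size_t i = 0; i < N; ++i)
                sum += a[i] * b[i];
            return sum;
        }
    
        // usage: double a[4] = {1,2,3,4}, b[4] = {4,3,2,1}; double s = dot(a, b);
    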

The same sort of thing is true for CS people and their critique of Fortran
usage, but I'll leave my scathing comments for one of those stories when
they're shared here.

~~~
DaiPlusPlus
C++'s generic-programming features still have shortcomings. I think
functional programming has more relevance to scientific computing, but C++'s
functional features are "okay" and still not as capable or proven as, say,
Haskell's or OCaml's. For example, for tail recursion you still depend on the
compiler supporting that optimization; you can't force it or necessarily
assume it will happen, with fun consequences for your stack if it doesn't.

~~~
sidlls
> I think functional-programming has more relevance to scientific computing

What do you base this on?

~~~
beisner
Functional languages force you to be more correct, more often. They eliminate
a bunch of classes of bugs that are anathema to scientific computing, and
they are generally so high level that compilers can optimize extremely
aggressively. Also, scientific programming is usually much more about data
flow and transformation, which is FP's wheelhouse.

~~~
sidlls
Do you have examples where it is the case that FP code yielded a superior
scientific result?

~~~
DaiPlusPlus
Output results would be the same - that's a matter of program correctness,
regardless of whether it's written in a functional, object-oriented, or
procedural paradigm.

The comparison should instead be how long it took to engineer and build the
system or program in a particular paradigm, and the qualitative engineering
aspects of a particular platform. FP may be amazing for certain areas, but
basing a large-scale project or business on it would be hampered by the small
supply of developers who can comfortably program in it.

------
gnufx
The article is wrong or misleading in a number of respects. For instance,
OpenMPI doesn't define the language interfaces -- the MPI standard does. It
talks about "no aliasing of memory" -- the rules actually concern "storage
association" -- and then claims Fortran passes by reference, misunderstanding
the whole thing. The Benchmarks Game is pretty useless generally, but it's
clearly useless to compare supposed language speed by using two different
compilers anyway. I don't mean to knock Fortran.

~~~
PeachPlum
Briefly stated, the Gell-Mann Amnesia effect works as follows ...

[https://calhounpress.net/blogs/blog/78070918-wet-streets-
cau...](https://calhounpress.net/blogs/blog/78070918-wet-streets-cause-rain-
michael-crichton-on-the-gell-mann-amnesia-effect)

------
foob
One of the key points of the article is that there is a lot of legacy code
written in Fortran. As a former high energy physicist, I have an anecdote here
that some people might find interesting.

There was a library written in Fortran called CERNLIB which included a broad
variety of miscellaneous numerical algorithm implementations ( _e.g._
minimization, integration, special functions, random number generation) [1]. I
couldn't tell you exactly when the library was first released, but my best
guess would be the early 80s. It can't possibly be later than 1986, when PAW
was initially released [2]. The field has since transitioned from the
Fortran-based PAW to the C++-based ROOT, but many high energy collaborations
still rely on CERNLIB for their own analysis frameworks (keep in mind that
many of these experiments had been in planning and development stages for
over a decade before they actually turned on).

The thing about this that I find interesting is that compiling CERNLIB has
become a lost art, and that this fact has had far-reaching consequences. The
last available binaries were compiled with GCC 4.3 in 2006, and packages are
only available for Scientific Linux 4 [3]. This crucial dependency has led to
collaborations using extremely outdated Linux distributions and GCC versions
in their computing facilities. The majority of analysis code is written in
C++, but not even C++11 additions can be used because everything is frozen on
GCC 4.3. Nobody can even run the analysis environment on their local machines
without resorting to the use of virtual machines running SL4. It was really a
nightmare to deal with.

[1] -
[https://en.wikipedia.org/wiki/CERN_Program_Library](https://en.wikipedia.org/wiki/CERN_Program_Library)

[2] -
[https://en.wikipedia.org/wiki/Physics_Analysis_Workstation](https://en.wikipedia.org/wiki/Physics_Analysis_Workstation)

[3] -
[http://cernlib.web.cern.ch/cernlib/version.html](http://cernlib.web.cern.ch/cernlib/version.html)

~~~
jjdredd
Huh, funny thing. I had to use a heavy-ion collision simulation program
written in Fortran, which had to be compiled using one particular compiler
implementation (1). After 2-3 weeks of debugging and trying different
compilers in vain, my supervisor put me on the phone with a guy who had been
more successful than me, and he told me which Fortran compiler to use.

(1) Each compiler gave different results: compilation errors, or code that
went into an infinite loop.

~~~
noobermin
For every nth performant, tested Fortran code, there are 2^n goto-ridden
legacy codes.

------
oconnor663
> The benchmarks where Fortran is much slower than C/C++ involve processes
> where most of the time is spent reading and writing data, for which Fortran
> is known to be slow.

Why would IO be slow in any language? What does the language have to do
besides buffering and system calls?

> In Fortran, variables are usually passed by reference, not by value. Under
> the hood the Fortran compiler automatically optimizes the passing so as to
> be most efficient.

Aren't arrays implicitly passed by reference in C also?

~~~
tyingq
>Why would IO be slow in any language?

I believe many Fortran implementations default to unbuffered I/O.

Which is probably easy enough to change.

But I think that's really the core issue. Physicists don't want to learn more
about programming languages. They want whatever mostly works out of the box
and has local documentation and expertise specific to their problem domain.

>Aren't arrays implicitly passed by reference in C also?

He covers that. C passes arrays by reference, but the individual elements
aren't contiguous. He says Fortran passes an optimized reference.

~~~
sbmassey
Unless you're passing arrays of pointers to things around, the individual
elements in a C array should be contiguous.

~~~
tyingq
>Unless you're passing arrays of pointers

That seems to be the context from his example:

    
    
      for(i = 0; i < nrows; i++){
         array[i] = malloc(ncolumns * sizeof(double));
      }
    

I suppose the counterpoint is that he's doing it wrong. But again, maybe
physicists just don't want a tool with that much flexibility.

~~~
sbmassey
Fair enough then. I suppose that in C you have to do something like that if
you want your matrix size to be decidable at runtime and still be able to
index it with m[x][y] rather than through some function.

~~~
iamrecursion
That’s not strictly true. You can malloc an m*n-sized array and then fill
another array with pointers to the head of each row, giving you
standard-style indexing.

~~~
PeterisP
If you do that, every access will actually dereference that pointer; the
compiler won't be able to optimize the standard-style indexing down to a
multiplication and an addition, so this still carries a significant
performance cost.
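
To spell out the two layouts being discussed (an illustrative sketch, not
code from the article):

    
    
        #include <cstddef>
    
        struct Matrix {
            double*     block;  // one contiguous rows*cols allocation
            double**    rows;   // rows[i] points at block + i*cols
            std::size_t ncols;
        };
    
        Matrix make_matrix(std::size_t nrows, std::size_t ncols) {
            Matrix m{new double[nrows * ncols], new double*[nrows], ncols};
            for (std::size_t i = 0; i < nrows; ++i)
                m.rows[i] = m.block + i * ncols;   // point into the block
            return m;  // free with: delete[] m.rows; delete[] m.block;
        }
    
        // m.rows[i][j]           -> loads the row pointer first (the extra dereference)
        // m.block[i*m.ncols + j] -> one multiply and one add, no extra load
    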

------
pletnes
Fortran is much easier to use than its direct competitors if you are writing
array-based number crunching. For all other uses, it’s hopelessly outdated.
If I were to explain its purpose to a non-Fortran programmer, I’d say that
Fortran is useful for number crunching kinda like regular expressions are
great for certain text processing tasks. It’s basically a domain specific
language and should be used accordingly. Wrap it with C++ or f2py and call
the routines from Python/C++, where you do the «software parts»: IO, GUI, ...

That being said, I usually just use python and surf on other people’s hard
work!

Oh, and the author is wrong on many of the specific details. For instance,
MPI is available in many languages, including Python.

~~~
robotresearcher
> [Fortran is] basically a domain specific language and should be used
> accordingly.

I read this whole discussion with interest, and I think this is the most
compact and insightful statement here. Thinking about it this way makes the
situation very clear.

------
rdiddly
The paragraph on "legacy code" is a bit weak and half-hearted because it
underemphasizes one of the most important arguments for using old code: it's
been thoroughly debugged already. The most the author can summon on the topic
is the fact that legacy code "takes uncertainty out of the debugging process."
What? There _is_ no debugging process, because that code has been debugged for
40 years and is damn near bulletproof at this point!

Everybody is used to cringing when they hear "legacy code," and that's
justifiable for several good reasons. Note that "not wanting to learn an
unfamiliar language" isn't one of them. And "not having, or not being willing
to use/cultivate, the skill set of reading someone else's code" isn't one of
them either.

But there is obviously a lot of bad code out there. And that's the thing,
there are only two kinds of code: good code and bad code. And by extension
there is bad legacy code and there is good legacy code. Don't assume legacy
code is always bad code. If something has been used successfully for 40 years,
do yourself a favor and try to have the humility to assume people implemented
it well, found all the bugs, know what they're doing, and/or generally are
rational-thinking adults who make good choices... instead of the usual naïve
assumption that everybody's an idiot but _I'm going to change all that!_ No,
you're going to duplicate a lot of effort, and possibly (depending on the
faithfulness of your reading of the code) reintroduce some of the same bugs
that were dealt with years ago.

~~~
mushishi
I don't know about the mathematical domain, so this might be off-topic, but
in some cases a legacy codebase that works perfectly can be brittle and full
of holes and bugs that never manifested themselves, because the code that
doesn't directly interface with the input data is only ever exercised by a
subset of the data it should support, and many of the possible code paths
have never been evaluated.

But if you try to refactor the code to, say, support another feature or
optimise it, you can get into nastiness that is beyond comprehension, and you
cannot count on the code being coherent or correct.

This is my experience maintaining a legacy codebase that has been in
production for many years. It's just a pile of frozen code that nobody has
properly refactored, probably out of fear of breaking something. This way you
end up with unreadable layers and weird technology-specific hacks that were
carefully made to just work, probably without understanding what the existing
code actually did, but cargo-culting, and ending up with a massive amount of
code that does little.

------
druidcz
Other reasons not mentioned: 1) built-in support for complex numbers; 2)
Fortran compilers usually generate faster code (due to the lack of pointers,
they can make more assumptions). But recently I see more and more physics
code written in C++.

------
dahart
> Interestingly, C/C++ beats Fortran on all but two of the benchmarks,
> although they are fairly close on most.

I think it's fairly recent that C/C++ wins. I don’t know how recent
exactly, but I remember a colloquium not too long ago by a compiler researcher
who said that cross-compiling to Fortran and then optimizing almost always
produced faster code than the C/C++ compiler could. Fortran is apparently
easier to optimize.

~~~
nejenendn
Fortran is easier to optimize than C/C++ if you don’t use restrict on the C
end. If you do use restrict (IIRC), the compilers are competitive.
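
For anyone unfamiliar, "using restrict on the C end" looks like the sketch
below (illustrative; __restrict is the compiler-extension spelling in C++,
plain restrict in C99). The qualifier is the programmer's promise that the
pointers never alias, which is the assumption Fortran compilers get by
default.

    
    
        #include <cstddef>
    
        // Promise: x and y never overlap, so the loop can be vectorized freely.
        void saxpy(std::size_t n, float a,
                   const float* __restrict x,
                   float* __restrict y) {
            for (std::size_t i = 0; i < n; ++i)
                y[i] += a * x[i];
        }
    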

~~~
noobermin
What I don't get is why someone doesn't make a superb template library for
C++ that hides the restricts; it would make writing performant codes in C++
easier, and you'd get the whole host of what C++ has to offer.
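
A minimal sketch of what such a library might look like (RestrictView is a
made-up name, not a real library; note that whether optimizers honor the
qualifier on a class member varies by compiler, which may be part of why no
such library has taken over):

    
    
        #include <cstddef>
    
        template <typename T>
        struct RestrictView {
            T* __restrict data;  // the no-aliasing promise lives here, hidden from users
            std::size_t   size;
            T& operator[](std::size_t i) const { return data[i]; }
        };
    
        // y = a*x + y, written against the views instead of raw restrict pointers
        void axpy(double a, RestrictView<const double> x, RestrictView<double> y) {
            for (std::size_t i = 0; i < x.size; ++i)
                y[i] += a * x[i];
        }
    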

~~~
quadruplebond
Eigen isn’t good enough? You can offload some stuff to MKL with it also.

------
RhysU
> Even if old code is hard to read, poorly documented, and not the most
> efficient, it is often faster to use old validated code than to write new
> code.

Amen. A one-character mistake might take a week to find as it exhibits only
subtly wrong behavior (e.g. wrong grid convergence rate, overly noisy boundary
condition, odd symmetry breaking beyond IEEE floating point). During that week
no science happens.

~~~
trapperkeeper74
Yup. Think of C as a rusty straight razor and Fortran as a barn full of rusty
implements ready to fall on you at any time. C++ is maybe a rusty safety
razor.

Originally, Fortran had manual memory management, as per the times.
Thankfully, the language progressed.

Overall, the evolution of languages from assembly/raw instructions to
procedural ones needed early languages like Fortran, on which other
higher-level languages, tools, and OSes could later be built/bootstrapped.

------
ISL
Our physics group has a core library, first written in 1987, that is in
Fortran (a Microsoft dialect, to be specific).

Why haven't we moved to something else? It works, it is time-tested, and the
original author continues to maintain it.

(P.S. I'd like to compile it with the gfortran tools, in order to preserve the
library for the future. Is there any documentation for simple conversions from
Microsoft's implementation of the language to the more-traditional spec?)

~~~
craftyguy
> and the original author continues to maintain it.

What's the plan when they retire?

~~~
ISL
My hope is to release it in a form that can be compiled by open-source tools.

~~~
noobermin
A problem for computational science is that people care about their
publications more than about others being able to reproduce their work. The
funny thing is that open-sourcing a code is a sure way to attain a legacy.

------
rcarmo
A slight gap in knowledge in the piece: Most Python libraries for numerical
computation are written in C/C++ or... Fortran. Last time I had to compile
scipy from scratch I had to install gfortran.

~~~
nine_k
Some of the NumPy code dates from the 1970s: it uses BLAS, which originated
in 1979.

[https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprogra...](https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms)

An _immense_ amount of effort was put into the performance, correctness,
feature set, and numerical stability of such widely used libraries. Replacing
them without a _very_ good reason is hardly feasible.

~~~
cm2187
I needed a .NET math library. My IT department insisted on my trying a
wrapper around a Fortran library (I believe purely because they already had a
license). All the class and variable names are short hexadecimal strings.
None of the interface is idiomatic to .NET (the library has only
void-returning methods that pass error codes back as byref arguments instead
of using exceptions, and it requires a parameter for the length of any array
passed as an argument instead of reading it from the array), etc...

Basically I refused to touch this thing; using a library which makes the code
unreadable is going to be a bug magnet. I would be surprised if I were the
only one having this reaction to 1960s coding conventions surfacing in modern
code.

So selling new licenses should be a good enough reason.

~~~
ryl00
Sounds like your IT department botched the wrapper. Wouldn't a more
successful candidate likely just have used an abstraction layer between the
.NET conventions you were expecting and whatever the native code was doing?

~~~
cm2187
Nope, the library itself is sold that way. See the documentation:

[https://www.nag.co.uk/numeric/DT/nagdotnet_dtw02/html/conten...](https://www.nag.co.uk/numeric/DT/nagdotnet_dtw02/html/contents.html)

~~~
ryl00
Ah I see, makes more sense. It would be a massive undertaking to abstract that
API into something else.

------
noahdesu
I spent several summers working at Los Alamos National Lab in their HPC group
on I/O, among other things. Supporting legacy codes was a requirement, even if
it meant that we couldn't explore obvious and important optimizations. I
remember an anecdote from the division lead, that the reason a lot of legacy
code is never replaced was because of certification. Codes are designed to
simulate some critical system, and it takes years to trust the result. So any
change that forces a multi-year validation process was a non-starter.

My understanding is that the code modernization and co-design efforts that are
a part of the Exascale initiative are changing this.

------
drngdds
Seeing the author talk about "C/C++" like it's one language (and basically
just C) is frustrating.

~~~
adwn
That was my initial reaction as well. However, I would guess that C++ is
mostly used as a "C with classes" in the domain the author is talking about,
so "C/C++" wouldn't be so incorrect.

------
tictacttoe
I don't think the choice to continue using Fortran is as well thought out as
the author suggests. I work in computational physics and have written a lot
of Fortran, C++, and Python.

The predominant issue is that physics codes are typically written by PhD
candidates, often with little to no programming experience. The projects can
span 3-5 years and continue existing in the physics ecosystem for decades.
Good programming practices are seldom employed, and the codes become these
massive patchwork ships, leaking everywhere with holes plugged by spaghetti
code.

The issue is not that the students aren't smart enough to learn good
programming practices; it's that the advisers are not patient enough to wait
for documentation, unit tests, and so forth. They view good practices as
wasted time; after all, it's "physics, not computer science".

Ironically, the path of least resistance is actually to adopt unit tests,
write code documentation, and generally employ industry programming
practices; it's just that this path has a barrier to entry, and the benefits
are not immediately apparent to older academics who have fallen out of the
loop.

The end result is quite sad: students with advanced programming skills are
chronically underappreciated in the field. Professors will happily bring them
on board as postdocs to develop their simulations, but they balk at giving
them jobs based on their computational skill set. It is thus no surprise that
the computational talent leaves physics and takes careers in the private
sector, where the union of math skills and programming is in high demand.

------
gravypod
A lot of the arguments in this post can simply be rebutted with basic
abstractions. Things like "dynamically allocating and deallocating ... 2D
array" are easy in C++. You could easily have someone define a
MathArray<Type, Rows, Columns> class and turn this messy Fortran code:

    
    
        real, dimension(:,:), allocatable :: name_of_array
        allocate(name_of_array(xdim, ydim))
    

Into something that looks like

    
    
        auto *my_matrix = new MathArray<double, 10, 10>();
    

The code the author showed demonstrates a lack of understanding of C and C++.
Even if you restrict yourself to C, your matrix code should look something
like this:

    
    
        #include <stdlib.h>
        #include <string.h>
    
        typedef struct {
            size_t rows, cols;
            unsigned char values[];  /* flexible array member; "void values[]" is not legal C */
        } mat;
    
        static inline mat *matrix_make(size_t rows, size_t cols, size_t value_size) {
            mat *m = malloc(sizeof(mat) + (rows * cols * value_size));
            m->rows = rows;
            m->cols = cols;
            memset(m->values, 0, rows * cols * value_size);
            return m;
        }
    

While that code may be complicated, the physicists only need to see...

    
    
        mat *matrix = matrix_make(10, 10, sizeof(double));
    

And if you'd really like you can hide the sizeof via a macro...

    
    
        #define MATRIX_MAKE(r, c, type) matrix_make(r, c, sizeof(type))
    

For me "real, dimension(:,:), allocatable :: " is much more complicated than
"matrix_make"

Many of the issues people see in the speed difference between Fortran and C
code are likely based on a misunderstanding of how Fortran actually lays out
their data, and of what they are describing to the hardware via C. The
pointer-to-pointer "double array" the author defined would never be allowed
in production code. The number of times you'd hit the OS for even small
allocations is crazy.

The arguments for Fortran, as far as I'm concerned, are:

    
    
       1. We already know it
       2. We're not going to get grant money to rewrite a library
       3. We've built a bunch of computer clusters and have to justify what we spent (rather than buying 2 GPUs for your workstation)
       4. We've all spent a lot of time learning how to use MPI that we're never getting back.

~~~
sampo

        > auto *my_matrix = new MathArray<double, 10, 10>();
    

In Fortran, one can choose the matrix size at runtime, though.

But otherwise, great. Now just add slicing, so that if we, say,

    
    
        allocate(name_of_array(-1:5, 1:3))
    

and then say fill in the matrix by

    
    
         1  2  3  4  5  6  7
         8  9 10 11 12 13 14
        15 16 17 18 19 20 21
    

then

    
    
        name_of_array(0:4:2, 2:3)
    

should return

    
    
         9 11 13
        16 18 20
    

and then we're talking. (Slicing should of course also work in higher
dimensions than 2.)

~~~
gravypod
If you want to make an implementation that supports runtime-specified matrix
sizes, you change your implementation from...

    
    
        auto *my_matrix = new MathArray<double, 10, 10>();
    

Into

    
    
        auto *my_matrix = new MathArray<double>(10, 10);
    

Most programs don't need dynamically sized arrays (rows and columns), so it
makes sense to also provide template Row and Column widths. By doing this you
can likely implement matrix multiplication and addition as constexpr (with
some effort) and thus get:

    
    
       1. Compile time matrix evaluation
       2. Vectorized multiplication/addition of matrices
       3. Pipeline-efficient code
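
For reference, a minimal sketch of the runtime-sized variant (MathArray here
is still the hypothetical class from this thread, not a real library):

    
    
        #include <cstddef>
        #include <vector>
    
        template <typename T>
        class MathArray {
            std::size_t    rows_, cols_;
            std::vector<T> data_;       // one contiguous block, as in Fortran
        public:
            MathArray(std::size_t rows, std::size_t cols)
                : rows_(rows), cols_(cols), data_(rows * cols) {}
            // indexing reduces to one multiply and one add
            T& operator()(std::size_t r, std::size_t c) { return data_[r * cols_ + c]; }
            std::size_t rows() const { return rows_; }
            std::size_t cols() const { return cols_; }
        };
    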

------
gh02t
Physicists / engineers / mathematicians are the target audience for Fortran.
For heavy number crunching it's still quite good; it's people trying to use
it for other stuff that causes problems.

That said, Fortran really is dying. Scientific code is much larger nowadays
with more functionality and scientists want to do everything in one language.
C++ and Python are taking over.

~~~
azag0
The vast majority of scientists are not able to write idiomatic Fortran, let
alone idiomatic C++. Scientific C++ code that didn't have oversight from a
professional C++ developer will always be horrible. Scientific Fortran code
written without such oversight can sometimes be bearable. This is perhaps the
main advantage of Fortran.

~~~
gh02t
Eh, I'm talking mostly about the large scientific code packages that are being
developed with millions of dollars in funding and large, organized teams. The
people writing these sorts of codes know what they are doing and a lot of
migration to C++ is because they are _more_ familiar with it and it's easier
to hire skilled people.

------
jxcole
So the article basically says:

1) Some stuff is already written in Fortran so they don't want to rewrite
that. I dig it.

2) It's fast (except that C sometimes beats it) but easier to write than C.
Like 100x faster than Python.

I'm not sure about number two. With the GPU processing revolution, wouldn't a
Python/TensorFlow stack be faster than Fortran? Am I missing something?

I remember talking to someone who had worked heavily on atmospheric weather
predictors recently and her description of the program was: we divide the
space into tiny little cubes and then run some differential physics equations
to predict what will happen next. My basic questions to her:

1) From a computational perspective this seems very GPU friendly.

2) Why not use a convolutional neural network? If you use the same data for
training you will probably wind up with a more accurate prediction than a
theoretically based physics system.

Her reaction was basically that she hadn't heard of these things before, so
my impression is that the physics community is just behind, and they will
catch up when they are ready.

Hope I'm not missing something here.

~~~
openasocket
TensorFlow is for machine learning, not general purpose computations. And no,
you will not get better results from a neural network than state of the art
computational fluid dynamics.

As for general-purpose GPU programming, some parts of physics are GPU
friendly, but not all of them.

~~~
jabl
IIRC there was some project doing more or less real-time weather forecasts for
small areas (think airports and such) that used deep learning instead of
traditional CFD style simulations. They were able to do it with several orders
of magnitude less CPU usage than the CFD calculations.

~~~
sidlls
But was it more accurate? Was it for the same purpose? Getting within a few
percent of a CFD model might be good enough for industrial use. For studying
systems (from the perspective of scientific investigation) it's not even
close.

~~~
noobermin
Of course it isn't, which is why ML isn't used by us in the computational
community... yet.

------
typon
Why not create a DSL in C++ that gives you the same syntax as Fortran for
doing array/matrix manipulations? That's really the one main advantage I
gleaned from this article.

~~~
quadruplebond
Things like this exist, but there are two issues: they aren't part of the
standard, and Fortran compilers are better at optimizing built-in Fortran
abstractions than C++ compilers are at optimizing user-made abstractions.

I suspect it's the lack of standardization that's really the issue.

------
theophrastus
I wish there was a clean nomenclature for the context of program use. My world
of programming is academia and industrial research. In that setting the vast
majority of software is fashioned for ad-hoc use. There is no expectation that
a large population will ever make use of what we code except to adapt it to
some new specific related need. Scientific publishing promotes novel
investigations, novel investigations promote one-off programs. And the user
interface doesn't have to be, and is very seldom, "elegant" in any sense. It's
in that context of "rapid cycle until it works then seldom use it again" that
Fortran (or in my hands, python scripts which call Fortran numerical
libraries) still makes sense. Understanding the context will free us from the
head-scratching.

------
angrygoat
On array convenience: Fortran also supports optional runtime bounds checking
of array indexes. I've worked writing signal processing code in Fortran, and
it's really quite nice to have that when doing dev, even if you turn it off in
prod for performance reasons.
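
(For reference, with gfortran this is a single flag; -fcheck=bounds is the
current spelling and -fbounds-check the older one. sim.f90 is a placeholder
filename:)

    
    
        gfortran -fcheck=bounds sim.f90   # dev build: traps bad indexes at runtime
        gfortran -O3 sim.f90              # production build: no bounds checks
    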

~~~
gravypod
GCC/G++ will remove bounds checking on O3 I think.

~~~
lvh
GP is saying the opposite for Fortran: you can have runtime bounds checks
during development and then disable them for production builds.

I'm not sure which feature you're referring to, because while GCC has
-fbounds-check, that's for the GNU Compiler Collection and only for frontends
that support it (to wit, Fortran and Java). I don't know of any runtime
bounds checking that ever made it into vanilla GCC for C or C++. Clang and
GCC both have some limited
array bounds checking, but it's static, and there are plenty of issues it
won't catch. People maintained third party patches for a long time, but these
are obsolete now. Perhaps you're thinking of ASAN/-fstack-protector-*?

------
boznz
Not sure I see the problem; Fortran compilers are quite efficient these days,
by the sound of the comments. If it's not broke, don't fix it.

Just because it's not as "cool" as the language you're currently using is not
a reason to change, and it never will be.

------
justin_vanw
I think this just shows how scientists don't understand programming. Pretty
much every "advantage" listed for Fortran over C++ could be added to C++ in
an afternoon. C++ is an extensible language: you can define your own types
and operators, so all the examples can easily be implemented. For some of the
examples it's just sad that they were brought up. For example, "you have to
write a loop to allocate an array" should say "you have to write a function
one time to allocate arrays and never write this loop again".
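
For instance, a toy sketch of the "define your own types and operators" point
(Vec is a made-up name, nothing from the article): element-wise array
addition, Fortran-style c = a + b.

    
    
        #include <cstddef>
        #include <vector>
    
        struct Vec { std::vector<double> v; };
    
        Vec operator+(const Vec& a, const Vec& b) {
            Vec out;
            out.v.resize(a.v.size());
            for (std::size_t i = 0; i < a.v.size(); ++i)
                out.v[i] = a.v[i] + b.v[i];   // element-wise, like Fortran array ops
            return out;
        }
    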

~~~
4lch3m1st
I don't think that's the case; it seems a little too extreme to think that. I
believe the central point is that someone not directly engaged with
programming as an actual job (or as a hobby taken seriously) picks Fortran
over C/C++ for the features it has out of the box, or at least the ones that
are apparent from a surface-level knowledge of the language. Even if it takes
one afternoon to implement a certain feature, would scientists WANT or NEED
to do that?

Don't get me wrong, this is probably an opinion I would defend if it were
exclusively related to programmers being "too lazy to learn a new language",
but this is a different learning purpose.

~~~
justin_vanw
I can't agree; comparing two languages purely by a superficial "this one lets
me write code like this out of the box" ignores everything that matters.

Let's say I wanted to write a web blog server. One language has "import blog;
blog.run()" and I am up and running instantly. Another language makes me
install a blog library and some other side stuff, and choose a webserver. The
point is, it isn't just built right in. Which language is better for writing
web blog software? The answer is, you have literally no idea from what I just
told you. My analysis is insanely superficial and meaningless. Presumably, if
I am going to spend hundreds or thousands of hours in some coding environment,
what is 'built in with no effort' matters somewhat on hour 0, but virtually
not at all by hour 200. Professional scientists presumably spend thousands of
hours on this stuff, it's really not too much to ask that they become somewhat
competent with the tools they are using.

------
sundarurfriend
> researchers at MIT have decided to tackle this challenge with full force by
> developing a brand new language for HPC called Julia

This and the linked news piece [1] from MIT News sound pretty weird to me. The
OP article probably takes the bit about "researchers at MIT" developing Julia
from the MIT News page, so that's the real source of the issue - the MIT News
page seems to have been written with weird biases, making it sound like a
primarily MIT project that other people have just tacked a few things on to.
And then there's:

> A few years ago, when an HPC startup Edelman was involved in [...] was
> acquired by Microsoft, he launched a new project with three others.

That to me sounds like an implication that Edelman was the one to initiate the
project and take in the others. They seem to be writing from the usual
academic bias of "the senior faculty gets the credit even if the actual work
is done by the PhD/graduate students". Edelman was Bezanson's thesis advisor
and a crucial part of Julia's history, but this article seems to be
downplaying the role of the other core contributors and the open source
community.

I had assumed university news, at least on such technical topics, would be
more reliable and less inherently biased; learned something new today.

[1] [http://news.mit.edu/2014/high-performance-computing-
programm...](http://news.mit.edu/2014/high-performance-computing-programming-
ease)

------
trapperkeeper74
In the late '90s, I helped port a nuclear reactor simulator to Win32. It was
around 20 million lines of Fortran and was actively developed by physicists
and engineers (none were really software engineers). And, at that time, the
codebase was around _40 years_ old. Apart from disabling virtual memory, it
worked nearly flawlessly on Windows on a COTS PC and ran about 50% faster
than the fastest *nix test lab box.

Sticking with Fortran is mostly about historical tradition, and it costs
nontrivial time and money to switch.

------
ivanhoe
The same reason many big systems still use COBOL: it works, it's well tested,
why change it? Usually they just run it as long as they have hardware they can
run it on...

~~~
peterburkimsher
Speaking of hardware, we're entering an era where custom ASICs are becoming
popular (e.g. Apple's A11 chip, Google's TPU).

If we know the software libraries won't change (BLAS, LAPACK), then what
chips can we build to make them run even faster?

------
filereaper
I've always wondered about the following approach for old apex predators like
COBOL and FORTRAN; it follows what .NET does.

The idea is to take current COBOL and FORTRAN code and compile it down to an
IL similar to .NET's CIL; in the .NET world, once code is brought down to
CIL, it can be read back in VB.NET, C#, or C++.NET.

Essentially, bring it down to some sort of lossless IL in order to convert it
to another language. It should be possible to do this, given we have the
source code. In certain cases where the source code doesn't match the binary
(which happens over the years due to monkey-patching the binaries, etc...),
we'll have to take the approach a few folks at IBM are talking about:
recompilation and reoptimization of existing old COBOL binaries. [1]

Don't throw away that debugged and battle-hardened code; change the IL and
the final compile target, and if possible re-interpret the IL into a newer
language if it's not lossy.

[1][http://www.ibm.com/common/ssi/cgi-
bin/ssialias?infotype=AN&s...](http://www.ibm.com/common/ssi/cgi-
bin/ssialias?infotype=AN&subtype=CA&appname=gpateam&supplier=897&letternum=ENUS215-407&pdf=yes)

~~~
mschaef
MicroFocus has done COBOL for .Net for years...

[https://www.microfocus.com/products/visual-
cobol/](https://www.microfocus.com/products/visual-cobol/)

"Compile COBOL applications directly to Microsoft intermediate language for
deployment within Microsoft .NET... Compile COBOL applications to Java byte
code for deployment within the Java Virtual Machine"

I've actually seen this idea suggested, at least as part of a bake-off between
design ideas. There's a certain amount of sense to it.

------
leephillips
If you're interested in these issues you might enjoy my article¹ from a few
years ago about Fortran. It even uses the same epigraph!

¹[https://arstechnica.com/science/2014/05/scientific-
computing...](https://arstechnica.com/science/2014/05/scientific-computings-
future-can-any-coding-language-top-a-1950s-behemoth/)

------
jcadam
I remember when I was a junior engineer working on aerospace simulation
software (defense contracting). I was primarily writing C++, but we had a
large collection of physics algorithms written in Ye Olde FORTRAN that we had
to link in.

I brought up a possible rewrite, and one of the greybeards told me that they
had looked at that in the past, but the govt V&V process on any rewritten
algorithms would have been so onerous that they eventually dropped the idea.

The last guy who actually understood that old code retired about a year after
I started there and then things started to get... interesting.

As an aside, for 'new' code, I'm actually ok with the science folks dumping a
pile of matlab script on me so I can rewrite it in Python, Java, etc. (rather
than letting scientists write production code, which I'll have to rewrite
later anyway).

------
crispinb
Often there are many (more or less reasonable) extrinsic reasons for non-tech
domains staying with the stacks they use.

I've done some programming for cognitive psychology experiments, fMRI analysis
etc, and although I didn't like the often proprietary systems used (E-Prime,
Presentation etc), I could see it would have required hefty investments of
very scarce time to switch to something 'better'. The vast bulk of the
software was written by non-programmer grad students, for whom the tech was a
very 3rd order issue: they just needed their experiments up and running. This
was generally done by finding a close-enough prior experiment, and tweaking it
in a hurry, often with limited understanding of how the system worked. There
was in most cases no possibility of paying programmers to do the work.

------
accurrent
Modern fortran is quite awesome for matrix manipulation. The ease of using it
for math makes me wonder why we don't use it more often. The main downside to
modern fortran is its I/O capabilities are quite stunted. I honestly think we
need an open source fortran95 to gpu compiler.

~~~
jabl
GCC does have an nvptx backend, and libgfortran supports it. Never tried it
myself, though.

------
egl2019
Be careful about performance comparisons:

If you are using the GNU compilers (and the appropriate compiler flags with
gcc), there isn't much difference in performance --- and there shouldn't be.

There are a few FORTRAN-only compilers out there; I'd be curious to see how
well they do.

~~~
sampo
I think the Intel compiler has a reputation of producing the fastest code, and
Intel makes both C++ and Fortran compilers.

------
autarchprinceps
What would you use for a HPC application? You need a very fast language that
doesn't get in your way in data management. Trust me even in the basic
examples we did in our parallel algorithms course minor changes in data layout
could save hours of computational time and that was in C, where no crappy
garbage collection gets in your way. Not to even mention the masses of good
low level performance analysation tools and parallel libraries made for
Fortran and C/C++, but not other languages. Why would you use anything else? I
think some people misunderstand what makes languages good. It is not
generalisable, it depends on the case. Sure to write a script, e.g. to quickly
automate a few things, shell or common scripting languages like ruby or python
may make sense, because it is relatively easy to get going and write something
in them. But that is not an important question for an HPC application. You
need to write code that will definitively give the correct result and that
will run extremely fast on the cluster of machines that make up the
supercomputer. You need the language means to define very detailed how your
memory is to be layed out, etc.. The very thing that makes a language annoying
to use in a scripting context is a feature here. On the other hand you don't
care about the ease of portability, in fact you will want to optimise it for
one specific architecture as much as time allows. That the program will have
to be recompiled is a minor issue in comparison to memory layout, threading
schedules or network communication changes to the algorithm to optimise it for
a new system.
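
A classic instance of the "minor changes in data layout" point (an
illustrative sketch, not from any particular code): a structure-of-arrays
layout keeps each field contiguous, so a sweep over one field streams through
memory instead of striding past the others.

    
    
        #include <vector>
    
        // Array-of-structs: summing all masses touches every cache line.
        struct ParticleAoS { double x, y, z, mass; };
    
        // Struct-of-arrays: summing all masses reads only the mass array.
        struct ParticlesSoA {
            std::vector<double> x, y, z, mass;
        };
    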

No language is truly superior to all others; the question is always context
and the conditions and constraints it puts on the developer.

For physicists, Fortran or C are the best choices. Even Go uses a garbage
collector, which breaks it for large HPC scenarios. VM-based languages are
completely useless: their low speed is already a nuisance for simple common
tasks, never mind problems that take days or weeks to execute even when
properly optimised. If you think Java, Ruby, or any such language could be
used, look at benchmarks. You will find CPU time of 1.5-2.5x and memory of at
least 5-7x the amount needed by the same problem executed by a program
written in C.

------
chmaynard
Fortran and COBOL both originated in the late 1950s and are still in
widespread use in certain domains. It's extraordinarily cheap in these
domains to keep using legacy Fortran and COBOL software. These folks are
passionate about their domains, but they probably couldn't care less about
so-called modern software languages, new development tools, and new hardware.
Those of us who thrive on change find this very hard to accept. We need to
get over it.

------
The_suffocated
I think one problem with Fortran is that it has no standard library. To a
Fortran beginner, it isn't obvious where to look for high quality third-party
libraries.

------
Ericson2314
Rust with [https://github.com/rust-
lang/rfcs/blob/master/text/2000-cons...](https://github.com/rust-
lang/rfcs/blob/master/text/2000-const-generics.md) should be better positioned
to pay physicists real dividends than C++. Fingers crossed!

------
throwaway7645
FORTRAN is still great for fast numerical applications and easier for me to
read than C++.

My industry is rewriting its FORTRAN base in C++ and a lot of physicists are
switching to Python + Numpy for all but the most intensive tasks. I see
FORTRAN being used only in legacy systems within the next 5-10 years.

~~~
baldfat
FORTRAN is still in much of the R Core. Fortran is faster than C++ and I
actually feel that many people are starting to realize why we still have
FORTRAN around.

Though there are many using Python + Numpy, R still has a larger piece of the
user base in scientific and mathematical spaces.

~~~
quickben
"Fortran is faster than C++"

What's your use case?

It seems it is faster for some, but a lot slower in others:

[https://benchmarksgame.alioth.debian.org/u64q/compare.php?la...](https://benchmarksgame.alioth.debian.org/u64q/compare.php?lang=ifc&lang2=gpp)

~~~
coliveira
When people say that FORTRAN is faster than C++ it is obviously for numerical
code, since general systems code is very hard to do in FORTRAN. Many people
will complain that C++ can be faster than FORTRAN, however the problem is that
you need to know a lot of C++ or have access to the right libraries to write
competitive numeric code. In FORTRAN you can do that out of the box.

~~~
eesmith
Moreover, the article mentions this point, references the benchmarks game, and
says "However, the two benchmarks where Fortran wins (n-body simulation and
calculation of spectra) are the most physics-y".

~~~
igouy
>> where Fortran wins <<

spectral-norm

    
    
        1.99s Fortran Intel 
    
        1.99s C++ g++ 
    

URL provided by @quickben

~~~
eesmith
And the full text in the article is:

"However, the two benchmarks where Fortran wins (n-body simulation and
calculation of spectra) are the most physics-y. The results vary somewhat
depending on whether one compares a single core or quad core machine with
Fortran lagging a bit more behind C++ on the quad core."

Though I don't see how the single core version is measured. As far as I can
tell, the spectra calculations are always using 4 CPUs while the n-body is
always using 1 CPU.

~~~
igouy
And the article was "Posted on July 16, 2015".

Back then, a second set of measurements was shown, with the programs forced
to run on one core using set affinity.

~~~
eesmith
Ahh, I missed the 2015 part. Thanks!

------
keithnz
I'd be interested in some stats around this. My father was a physicist, and I
did some programming work for him at a large research institute, and while
there was Fortran around, most of the scientists were reaching for newer
tools where they could.

------
netcraft
I don't have much to add to this conversation, but it surprises me that Rust
hasn't come up in the comments. Would/could Rust be appropriate for these
kinds of use cases today/in the future?

~~~
steveklabnik
In theory, yes. In practice, we need to get SIMD working on stable, which is
underway but not done yet. We also need to get RFC 2000, const integers,
implemented. And I'm sure other things too.

Basically, Rust _could_ be excellent at this in the future, but right now, is
merely okay.

------
cryptonector
I wonder if Haskell wouldn't be a better fit. I realize there's no rewriting
all that legacy code in any language though -- but I'm thinking of new code
here.

------
treyhuffine
Fortran was a requirement for my undergrad. There are oftentimes thousands of
lines of legacy Fortran in national laboratories.

------
yCloser
1) There is a Fortran codebase.

2) No one wants to spend $1 to rewrite it in any other language.

-> Enjoy Fortran

------
hardtke
glmnet, one of the core packages for machine learning, is written in Fortran.
So are many of the commonly used linear algebra packages. It's not just
physicists who still rely on this stuff.

------
lerie82
Get with the times

------
KasianFranks
Same reason why string manipulators still use Tcl.

~~~
throwaway7645
I think that's more of once a language has a hold on a particular industry
(Ex: TCL in semiconductors), it takes awhile for something else to replace it
due to all the in-house code written in it. If the vendor uses only a TCL API,
then double the time to switch. Assuming TCL isn't doing a great job serving
their needs.

