
Rewriting Fortran Software in Rust - fanf2
https://mckeogh.tech/post/shallow-water/
======
codezero
I used to maintain a large FORTRAN codebase - it seemed like every time new
grad students showed up, someone got the wise idea to rebuild everything in a
modern language.

This question is revisited a lot, but the general consensus has been that
FORTRAN is fast, simple, easy to understand and code, and the compilers are
super optimized (Intel's FORTRAN compiler was really a gem - it managed to do
a ton of automatic parallelization on modern hardware).

I'll see if I can find it, but I remember attending a conference talk at AGU
in ~'08-10 called "Do-over or make-do" [1] that analyzed how many people-hours
it would take to update all the climate modeling software (earth and space
sciences are really huge on legacy FORTRAN code), and the conclusion of that
talk was: use modern code/tools for glue, keep your climate models in FORTRAN.

There's a ton relating to precision - we had a huge ordeal converting our
32-bit space weather model to 64-bit because the change in precision changed
our results, so we couldn't publish (in good conscience) without making sure
the results were within a good margin of error. Anyhow.
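
That precision sensitivity is easy to reproduce. Here is a minimal Rust sketch of my own (not from the article or the model in question) showing how 32-bit and 64-bit accumulation of the same increments drift apart:

```rust
fn main() {
    let n = 10_000_000;
    let mut s32: f32 = 0.0;
    let mut s64: f64 = 0.0;
    for _ in 0..n {
        s32 += 0.1; // f32: representation + rounding error compounds quickly
        s64 += 0.1; // f64: the same error exists but is vastly smaller
    }
    // The "same" computation, two noticeably different answers:
    println!("f32 sum: {}", s32);
    println!("f64 sum: {}", s64);
}
```

The f64 sum lands within a fraction of a unit of the ideal 1,000,000, while the f32 sum drifts by tens of thousands - the kind of gap that forces the result-validation work described above.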

This is always a fun thing to revisit now that I'm really far away from having
to maintain any FORTRAN code any more :)

edit found it! [1]: [https://www.easterbrook.ca/steve/2010/12/agu-session-on-
soft...](https://www.easterbrook.ca/steve/2010/12/agu-session-on-software-
engineering-for-climate-modeling/) [2]: slides
[http://www.cs.toronto.edu/~sme/presentations/Easterbrook-
AGU...](http://www.cs.toronto.edu/~sme/presentations/Easterbrook-AGU-
fall2010.pdf) Also randomly found this:
[http://www.moreisdifferent.com/2015/07/16/why-physicsts-
stil...](http://www.moreisdifferent.com/2015/07/16/why-physicsts-still-use-
fortran/)

~~~
xioxox
Although Fortran compilers may produce fast code (they have to - there's no
chance to play with SIMD intrinsics or templates unless you want to use C++),
I don't find them very good. I've personally hit several bugs in gfortran. The
code I maintain now has some extremely ugly hacks to work around unfixed
compiler bugs. I recently tried compiling it with Intel Fortran and got a
compiler crash on one input file. Maybe they work well with Fortran 77 level
code, but the Fortran 2003 level support seemed rather buggy. I suppose this
is the problem with a minority language.

I won't recommend Fortran unless the code has to be written by scientists who
know nothing else. It has horrible string support, no data structure or
algorithm library (like STL), no templates, few non-numeric libraries (e.g.
http, json...) and (as above) buggy compilers. Furthermore, C++ gives you much
more control over numerical speed when you need it.

~~~
TheRealKing
"It has horrible string support" - this is untrue. Learn Fortran 2003, 2008,
2018.

~~~
xioxox
What? Where is this good string support in Fortran 2003, 2008 and 2018? I must
have missed it in the manual. Now you can allocate a string of a non fixed
size - simply amazing technology! Actually, one of my compiler crashes was
trying to use this revolutionary feature.

~~~
pantalaimon
> Now you can allocate a string of a non fixed size - simply amazing
> technology!

That’s more than what C gives you out of the box ;)

~~~
smabie
Many languages forbid mutable strings. They aren't really a very good idea,
imo. Even less of a good idea are extensible strings. More complex data
structures are required for efficient extensible strings, and this complexity
shouldn't be hidden in a simple interface. Better to have a Buffer type or
something that is distinct from the String type so programmers are made aware
of the performance/allocation characteristics.
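
Rust's standard library happens to draw roughly the line argued for here: `&str` is an immutable view with no growth API, while `String` is the growable, allocating buffer, so allocation is visible in the types you write. A small sketch of my own (not from the thread):

```rust
// &str = plain immutable view; String = growable, allocating buffer.
// The split makes allocation show up in function signatures.
fn shout(view: &str) -> String {
    let mut buf = String::with_capacity(view.len()); // explicit buffer
    buf.push_str(&view.to_uppercase());
    buf
}

fn main() {
    println!("{}", shout("fortran")); // prints FORTRAN
}
```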

Though, this is kind of ridiculous to even talk about, since the CPU and
compiler itself are responsible for the greatest sins of hiding something
unimaginably complex behind a simple interface.

Ah, the joys of the 6502 and the Commodore 64!

~~~
benibela
Delphi had mutable-xor-aliased strings. That is really great. Each function
can change the string as fast as a mutable string, but once it is shared with
other functions, they can be certain that it will not be changed unexpectedly.
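
Rust's borrow checker enforces exactly that "mutable xor aliased" discipline at compile time. A minimal sketch (my own illustration, not from the thread):

```rust
fn main() {
    let mut s = String::from("shallow");
    s.push_str(" water");  // mutation is fine while the string is unshared
    let view: &str = &s;   // now the string is aliased...
    // s.push_str("!");    // ...so mutating here would be a compile error
    println!("{}", view);  // prints "shallow water"
}
```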

------
saagarjha
It’s nice to see an article that makes assumptions, then realizes that they
didn’t really work out, and walks them back. It’s often good to try rewriting
things, but often the thing you think is old and crusty (FORTRAN) is actually
quite good, perhaps in ways you weren’t aware of. So it’s important to be able
to say, “yes, I tried some nice new thing, it didn’t really work out how I had
expected, and some of the assumptions I started off with were not really
correct either, so I might not have made a fair comparison”.

------
atrettel
This is an interesting article. I have often wondered if there would be
benefits to porting scientific simulation codes to more modern languages. In
the end I've always reluctantly stayed in C or Fortran for scientific
simulations. The article touched on some of the issues (and the ever-present
issue of "if it ain't broke, don't fix it"). But I do think that the article
ignored what I think is the biggest issue.

To me the biggest issue is that some simulations just need to run on 1000s or
hundreds of 1000s of cores, and the primary technology that lets this happen
is MPI (Message Passing Interface). I've been writing new Fortran code only
because I want the simulation to be able to scale to this kind of size. The
issue is that MPI only has official bindings for C and Fortran, so the only
real choice I have is to keep programming in C or Fortran if I want to use MPI.

Any thoughts on this? Personally I would love to move towards more modern
languages but without the proper libraries it is still a tough decision.

~~~
zozbot234
Rust can bind to C libraries, as can many other languages.

~~~
atrettel
I have looked into this a bit, but the issue appears to be that MPI is not a
library per se but a standard interface to more underlying function calls.
Different implementations of MPI are in effect different libraries using the
same API, and because of that it is difficult to write general bindings rather
than just bindings to a particular version of an MPI library that is
implemented on a particular supercomputer. In short, it's too site specific
and fragile, unfortunately. If you change the system, the program might no
longer compile. I appreciate the comment and feedback, though.
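
To make the fragility concrete: the MPI standard fixes the function names and semantics, but not the ABI, so the same opaque handle has a different representation in different implementations. A hypothetical Rust sketch of the mismatch (the type aliases are my own names, paraphrasing the public headers of the two major implementations; this is not a working binding):

```rust
use std::ffi::c_void;
use std::os::raw::c_int;

// MPICH declares:    typedef int MPI_Comm;
type MpichComm = c_int;
// Open MPI declares: typedef struct ompi_communicator_t *MPI_Comm;
type OpenMpiComm = *mut c_void;

fn main() {
    // A binding compiled against one layout is binary-incompatible with the
    // other, even though the C-level API looks identical in both.
    println!("MPICH handle size:    {} bytes", std::mem::size_of::<MpichComm>());
    println!("Open MPI handle size: {} bytes", std::mem::size_of::<OpenMpiComm>());
}
```

This is why generic Rust MPI bindings end up regenerating their FFI layer per implementation (and often per site), rather than shipping one fixed declaration.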

This Stack Overflow page comments more on this issue:

[https://stackoverflow.com/questions/22949462/rust-on-grid-
co...](https://stackoverflow.com/questions/22949462/rust-on-grid-
computing/24913447)

~~~
TheRealKing
MPI officially supports only C and Fortran. Bindings for any other language
can only be considered adventures of some open-source developers, and they may
not last forever. Take a look at the fate of the C++ bindings for MPI.

------
cs702
I find it remarkable that one person with just over a year of Rust experience
can successfully rewrite ~6000 lines of well-tested, highly tuned Fortran in
Rust and end up with similar performance on benchmarks without having in-depth
expertise in the domain (simulation of fluid dynamics). The benchmarks
([https://mckeogh.tech/post/shallow-
water/84896735-54a97580-b0...](https://mckeogh.tech/post/shallow-
water/84896735-54a97580-b09c-11ea-9e61-6b3fcd37a3cd.jpg)) show similar
performance across all input sizes for both the old Fortran and the new Rust
code, with Fortran outperforming in some cases and Rust outperforming in
others.

~~~
faitswulff
Actually closer to 4 years of experience:

> As for my Rust experience, I’ve been using it for personal projects since
> ~2016 and I worked as a Rust software engineer at a startup in Berlin for a
> year after leaving high school.

Publish date on this blog post: July 14th, 2020

------
gautamcgoel
This is a really cool post, but I wish the author had profiled his code. He
makes several statements like "I think the performance issue is XYZ" but never
bothers to check if his guess is correct. As Walter Bright liked to say, Use
the profiler, Luke!

------
milancurcic
Nice article. I'm curious: which Fortran compiler did you use? One key
difference that I see in the context of this article is that Fortran is
natively parallel (both shared and distributed memory) via coarrays, teams,
and events, whereas Rust (and Rayon), as far as I can tell, is shared-memory
only. For applications that need to run on 100s or 1000s of cores, Rust needs
to bind with C MPI. Is there any other approach to distributed-memory
parallelism in Rust?

Good points on the lack of certain features and ergonomic libraries. We're
working hard on making this much better for Fortran [1] by developing a
community-driven stdlib, package manager, and similar tools.

In fact, we'll have our monthly community call tomorrow [2] where we'll
discuss the best way forward for a nice strings API in Fortran stdlib. I
welcome everybody interested to join the call.

[1]: [https://fortran-lang.org](https://fortran-lang.org) [2]:
[https://fortran-lang.discourse.group/t/fortran-monthly-
call-...](https://fortran-lang.discourse.group/t/fortran-monthly-call-
july-2020/195)

------
schultzer
This is interesting, I did write BLAS in Rust
[https://github.com/schultzer/libblas](https://github.com/schultzer/libblas)
it's only single threaded at the moment.

I feel people forget to ask a more important question before they start a full
rewrite, and that is: where are hardware trends going, and is dedicated GPU,
CUDA, etc. even the future?

I feel it would be wasteful to spend a lot of time writing GPU-specific code.
It's not really maintainable on these scientific projects (where only a select
few people contribute, usually the people doing the research).

------
mindB
The author says he's happy with the performance improvements he made, but--
even at the size where the Rust version performs best--he gets much less than
a 2x speedup. After a couple months of effort, that's not a whole lot to show.
This reads like a case study in why it's probably a bad idea to rewrite even
when you think you have compelling reason for it.

~~~
pjmlp
Also the author has not used state of the art compilers like XL, Intel and
PGI.

~~~
milancurcic
Indeed, and this varies between applications. For the WRF (Weather Research
and Forecasting) model, I get a 3-4x speed-up with the Intel Fortran compiler
over gfortran. I don't see where the author mentions the Fortran compiler
used, though.

~~~
wycy
> I get 3-4x speed up with Intel Fortran compiler over gfortran

Wow, even at similar optimization levels? -O3 for each?

~~~
milancurcic
Not quite, but at optimization levels set by default WRF configuration for
each compiler (definitely not a fair comparison):

gfortran: -O2 -ftree-vectorize -funroll-loops
ifort: -O3

I don't have the timing results anymore. This was in 2018 on Xeon Platinum
8168.

I recently tried replacing -O2 with -Ofast -ffast-math in the gfortran
settings, which gives about an 18% speed-up. So still far from Intel. I
recently proposed it here [1].

[1] [https://github.com/wrf-model/WRF/issues/1254](https://github.com/wrf-
model/WRF/issues/1254)

~~~
RockIslandLine
Have you posted about this on the gfortran mailing lists? They are generally
interested in examples that show where the compiler could be improved.

~~~
mkbosmans
I doubt they are short on inspiration for further improvements. A big part of
the speed advantage of ifort is the aggressive loop unrolling, pipelining,
splitting, and multiversioning.

Coming up with these transformations is not the difficult part. That would be
actually implementing them correctly while keeping the whole compiler
optimization framework maintainable.

And of course there is a cost for the improved runtime, primarily in compile
time and code size. For the Intel compiler it generally makes sense to
sacrifice these in favor of runtime performance, because it is used on these
kinds of scientific codes a lot, but for gfortran the balance might be
different.

------
pjmlp
Already mentioned on the Reddit thread, the author apparently isn't aware of
modern Fortran capabilities, IDE tooling support, NVidia and Intel offerings
for GPGPU programming or HPC.

------
danmg
PGI's Fortran compiler, bought by NVidia a few years ago, has excellent GPU
extensions. You set an attribute on an array, move stuff on there, do your
computation, move stuff back. It was the easiest GPU coding experience I've
had.

------
efxhoy
Nice post!

I have written several versions of a scientific simulation program in Python,
first using MPI (single machine) and then native multiprocessing, using numpy
and pandas for all the heavy data operations. I really valued the ergonomics
of Python and native multiprocessing and found the MPI code to be not very
human-friendly. However, I ran into some very strange bugs relating to MKL and
multiprocessing that just hung the program on large input sizes with
MKL_NUM_THREADS above 1. I also had to pass the input data to the worker
processes via files because multiprocessing couldn't pickle the data when it
got too large.

As soon as Apache Arrow support in Rust gets more feature-complete I'll
probably try to redo it there.

------
estebank
TL;DR: the author wanted to leverage GPUs for computation (but couldn't); the
ergonomics of arrays in Rust for scientific computing without ndarray are not
good; there were memory bandwidth issues keeping cache lines fed; performance
of the two implementations is comparable.
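
For a sense of what "without ndarray" means in practice: a 2-D field typically ends up as a flat Vec with hand-rolled index arithmetic. A minimal sketch (my own illustration - the `Grid` type and its methods are invented, not code from the article):

```rust
// Hand-rolled 2-D indexing over a flat Vec: the boilerplate ndarray removes.
struct Grid {
    w: usize,
    data: Vec<f64>,
}

impl Grid {
    fn new(w: usize, h: usize) -> Self {
        Grid { w, data: vec![0.0; w * h] }
    }
    fn at(&self, x: usize, y: usize) -> f64 {
        self.data[y * self.w + x] // row-major index arithmetic by hand
    }
    fn set(&mut self, x: usize, y: usize, v: f64) {
        self.data[y * self.w + x] = v;
    }
}

fn main() {
    let mut g = Grid::new(4, 3);
    g.set(2, 1, 9.5);
    println!("{}", g.at(2, 1)); // prints 9.5
}
```

Every stencil or slice operation has to repeat that index arithmetic, which is where the ergonomic (and bug-surface) gap with Fortran's native multidimensional arrays shows up.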

One of the things I find interesting is how the Fortran version is faster both
with small and huge inputs, while the Rust version is faster for input sizes
in the middle.

------
morty_s
In my experience... Reinventing (or rewriting) the wheel is a fun exercise.
It’s even more fun to write it in rust (biased). Occasionally, the new wheel
is better. Mostly, I learn how some of my assumptions were wrong and/or how I
didn’t even account for “this, that, and the other.”

That said, I enjoyed reading through and finding “Edit: I was wrong about X.”
Learning experiences all around!

Also, might have been mentioned but is this a typo?

> I’m guessing that FORTRAN is faster on small inputs due to the threading
> overheads, but faster on large inputs

Anyways, thanks! Enjoyed the article!

------
ghostwriter
> Having written parallel software in both C and Rust, the memory safety
> guarantees and easy parallelisation with Rayon offered by Rust contrasts
> quite sharply with my poor experience using OpenMP

While I appreciate the strive for safety, and the effort being put into the
rewrite, I believe in that particular case it would be more beneficial to
rewrite it in ATS with safe formally-proved C rewrites/known to be safe
inlined C calls.

~~~
phkahler
It seemed like this one area was a definite plus for the rewrite in Rust. I
recently added some OpenMP goodness to some C++ code. Making the loops run
parallel was almost trivial. The challenge was going over a couple thousand
lines of code checking for safety. There were data structures being created
and shared in various ways, but the code was written in a very functional style
so it worked in the end. With Rust, all that analysis is done by the compiler.
So for that code migrating to Rust is probably a good idea.
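
The "parallel loop plus compiler-checked sharing" combination can be sketched with nothing but the standard library (rayon's `par_iter_mut` collapses this to roughly one line); this is my own illustration, not code from either project:

```rust
use std::thread;

fn main() {
    let mut data = vec![1.0f64; 1_000];
    let nthreads = 4;
    let chunk = (data.len() + nthreads - 1) / nthreads;

    // Each spawned thread takes exclusive ownership of a disjoint slice,
    // so the compiler can prove there are no data races - the analysis
    // that had to be done by hand for the OpenMP version.
    thread::scope(|s| {
        for part in data.chunks_mut(chunk) {
            s.spawn(move || {
                for x in part {
                    *x *= 2.0;
                }
            });
        }
    });

    let total: f64 = data.iter().sum();
    println!("{}", total); // prints 2000
}
```

If the slices overlapped, or a thread tried to read `data` while another mutated it, the program simply would not compile.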

------
melling
“I’m guessing that FORTRAN is faster on small inputs due to the threading
overheads, but faster on large inputs because of contention between threads
for cache lines and memory bandwidth"

I think the author meant FORTRAN was slower on small inputs?

~~~
estebank
Look at the graph (or the chart on the left; I had to, to understand it): the
green and blue lines cross _twice_: first the FORTRAN version is faster, then
the Rust one, then the FORTRAN version again.

------
adultSwim
New-engineer thinking: wanting to do everything themselves.

------
TheRealKing
Why? Why would you do so? Did you gain any significant improvement in your
final results compared to the original software? Why would you rewrite
software that already works fine, other than not having the will to learn a
bit of a new language? Why???

~~~
aw1621107
All your questions are addressed in the article. It does turn out that some of
the author's beliefs were incorrect (e.g., the state of Fortran GPU compute
libraries), though, so it's possible that they would have come to a different
conclusion had they known that.

