Toward Modern Fortran Tooling and a Thriving Developer Community (arxiv.org)
60 points by milancurcic 37 days ago | 40 comments



I've used Fortran for scientific computing work, mainly because it's ultra-fast and my advisor still used it. If we are going to invest in a high-performance computing ecosystem and community, why would Fortran make more sense than e.g. Rust?

(I have never used Rust for anything but HN seems to love it)


Author here, so I'm biased toward Fortran, though I've been enjoying learning Rust as well. I think there are a few reasons.

First, Rust's multidimensional arrays are limited and/or difficult to use. Fast, flexible, and ergonomic multidimensional arrays and arithmetic are essential for HPC. They are possible in Rust, but the fact that my two favorite Rust books don't mention them suggests to me that they're not a focus of the language. This may or may not change in the future.

Second, Rust may be too complex to learn for scientists who aren't paid to write software but to do research. Fortran is the opposite: multidimensional whole-array arithmetic looks like you would write it as math on a whiteboard. While scientists can certainly learn to program Rust effectively, I think most scientists don't think like Rust, but they do think like Fortran. For somebody not familiar with Fortran but familiar with Python, I'd say Fortran feels very much like NumPy.
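
To make that concrete, here's a minimal sketch of whole-array arithmetic in modern Fortran (my own toy example; the names a, b, c, n are made up):

    program whole_array
      implicit none
      integer, parameter :: n = 1000
      real :: a(n,n), b(n,n), c(n,n)

      call random_number(a)
      call random_number(b)

      ! element-wise arithmetic over whole arrays, no explicit loops
      c = 2.0*a + b**2

      ! built-in reductions
      print *, 'mean =', sum(c)/size(c), 'max =', maxval(c)
    end program whole_array

The NumPy equivalent is nearly a line-for-line transliteration, which is why the transition feels so natural.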

Third, such an ecosystem would be built in Rust from scratch. In Fortran, most of the value is already there, but it needs to be made more accessible with better and more modern tooling. For example, Fortran's fpm (https://github.com/fortran-lang/fpm) is largely modeled after Rust's Cargo, because we recognize the importance of a good user experience when it comes to building and packaging software. In the recent Fortran-lang efforts, we have studied many programming language ecosystems and communities (e.g. Python, Julia, Rust) to find what could work best for modern Fortran tooling.
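
To give a flavor of the Cargo influence: a minimal fpm manifest (fpm.toml) looks roughly like the sketch below. The package name and dependency are placeholders, not from an actual project:

    name = "my_simulation"
    version = "0.1.0"
    license = "MIT"

    [dependencies]
    stdlib = { git = "https://github.com/fortran-lang/stdlib" }

From there, fpm build, fpm run, and fpm test behave much like their Cargo counterparts.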


> Third, such an ecosystem would be built in Rust from scratch.

Rust can actually interoperate quite seamlessly with external libraries, and make its own libraries available to other languages. You're quite right that the support for even simple numerics in Rust isn't quite stable just yet, so it's not ready for productive use. But it can get there, to a far greater extent than C/C++.


Fortran makes sense to the extent there is a lot of battle-tested and highly optimized numerics code written in it. It was mostly replaced by C and (legacy) C++ for new code when I last worked in HPC many years ago. Fortran is not an ergonomic language for most types of software by modern standards and performance parity was achieved a long time ago. I used all three languages.

Rust doesn't solve many problems for HPC, and HPC still often involves weird hardware targets compiling from some dialect of C or C++. A bit like embedded but without the market scale to justify tool investment. Fortran and Julia are more likely targets for a second language for the obvious reasons.

Another issue is that some HPC silicon has unusual low-level concurrency and memory semantics with no analogues in ordinary CPU architectures. At least in some cases, I suspect it would require non-trivial modification of Rust's safety checking to allow it to work correctly on that silicon.


> Another issue is that some HPC silicon has unusual low-level concurrency and memory semantics with no analogues in ordinary CPU architectures

As long as it's properly reflected in the LLVM IR, I don't think this would be an issue.


That does not address the issue I was raising.

In some systems, sophisticated memory ownership and concurrency control are implemented in the hardware. These abstractions are very low-level and are intended to transparently support fine-grained and extremely high concurrency with almost no overhead.

The memory ownership model is oblivious to the programming language's concept of such a thing. As long as the compiler emits the appropriate memory control instructions, there can never be a conflict no matter how many concurrent mutable references there are, and this conflict resolution is nearly free. This scales to thousands of cores and millions of concurrent threads.

The issue with porting Rust to this silicon, without thinking about it too deeply, seems to be two-fold:

First, code organized to satisfy the borrow-checker etc is an anti-optimization on these CPUs because they are specifically designed to encourage you to ignore memory ownership and mutability at the code level. This enables absurdly concurrent code execution on mutable shared memory that would be difficult-to-impossible to write safely on an ordinary CPU in any language.

Second, you would probably want the Rust compiler to ignore many issues of ownership and concurrency, instead emitting the appropriate memory control instructions and delegating the problem to the hardware, since the hardware does it much better. This seems non-trivial, particularly since the hardware model does not map perfectly to Rust's model in the details (which vary by CPU design). It also creates the case where code that compiles safely on this silicon won't compile on e.g. x86, because the safety is provided by the hardware rather than at compile time.

Ironically, a large part of the reason these CPUs have never been commercially successful is that people don't know how to write idiomatic massively concurrent mutable code. The assumption that thread concurrency is expensive and dangerous is so pervasive that everyone writes code that tacitly assumes this even when it isn't true for the given hardware, with significant loss of performance. Idiomatic algorithms and data structures in C++ on these platforms look very different than what you would use on e.g. x86 or ARM.


Whatever system you are talking about doesn't exist, not in the way you described at least.


The prototype for architectures with these properties was the Tera MTA line of supercomputers. These were evolved by Cray, but a few other companies (including Intel) experimented with and produced their own variants of the concept. It was a great model once you grokked it.

The general idea is that every memory address has several semantic bits that annotate the contents on load and store. Each core has multiple independent hardware threads (128 in the case of MTA) which adapt their scheduling to those annotations on a clock cycle by clock cycle basis. You can design massively multithreaded code for these platforms with almost perfect scalability that would have catastrophically high contention and overhead anywhere else, which was the point.

There are quirks to designing software for these systems, but they don’t involve safety.


Barrel processors still need to be programmed with parallelism and cache behavior in mind. And if normal imperative code can't be reused with minimal modification, then such a device holds little appeal over GPUs (as Intel found out with Xeon Phi).

"There are quirks to designing software for these systems, but they don’t involve safety."

Parallel programming is intrinsically racy. Even if hardware could enforce safety like you're saying (which it can't, even in a single thread), it would imply a sequentially consistent memory model, i.e. the antithesis of any parallel machine.


One reason would be that Fortran has proven itself in the HPC area. Another would be that it is a much simpler and easier-to-learn language than Rust (so you can quickly train students, PhD candidates, and so on to improve your simulation code, even when they have no programming background).

In most of the HPC codes I've worked with, there is no complicated memory management: most things are arrays declared and allocated at launch time, and that's it, no more memory management after that. So I'm not sure what Rust would bring to the table, but maybe I'm just missing it.
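
As a sketch of that pattern (my own illustrative example; names and sizes are made up), a typical solver allocates everything up front and then only reuses it:

    program alloc_once
      implicit none
      real, allocatable :: u(:,:), u_new(:,:)
      integer :: nx, ny, step

      nx = 512; ny = 512
      allocate(u(nx,ny), u_new(nx,ny))   ! all allocation happens here
      u = 0.0
      u_new = 0.0
      u(nx/2, ny/2) = 1.0                ! point source

      do step = 1, 100
        ! Jacobi-style sweep; arrays are reused, never reallocated
        u_new(2:nx-1, 2:ny-1) = 0.25 * (u(1:nx-2, 2:ny-1) + u(3:nx, 2:ny-1) &
                                      + u(2:nx-1, 1:ny-2) + u(2:nx-1, 3:ny))
        u = u_new
      end do

      print *, 'max value:', maxval(u)
      deallocate(u, u_new)               ! freed once, at the end
    end program alloc_once

There is no lifetime puzzle here for a borrow checker to solve; ownership is trivially clear for the whole run.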


The languages that have proven to be effective for high-performance computing are Fortran, Julia, C, and C++; but only the first two are convenient media for the scientific programmer. Recent activity in scientific computation is moving to Julia, but the vast amount of existing Fortran code makes this effort around tooling worthwhile.


"Recent activity in scientific computation is moving to Julia"

I'm not saying you are wrong, but I wonder what metric you are using.


That’s a fair question. I can’t claim to have anything that would merit being called a “metric”. The number of papers and projects using Julia increases every year, but that in itself doesn’t quite support my claim. What I mean is that there are regularly new large-scale computational projects adopting Julia, and they are the types of projects that, in most cases, I’m pretty sure would have used Fortran seven years ago.


I guess this project list is a possible metric.

https://juliacomputing.com/case-studies/


The fact that Fortran applications are not collected on a single page by a for-profit company (no such company exists in Fortran's case) does not in any way prove that new applications are moving away from Fortran.


There was actually a very good 'Rust vs Fortran' thread on the Rust forum [0].

My personal belief is that Fortran codebases will be replaced, as new programmers don't want to learn the language due to its reputation. At the moment they are being rewritten in C++ (fast, but so many footguns) and Python (easy, but terrible performance). I am betting on Julia replacing Python in the medium term (faster, not harder to learn, and designed for this particular use case).

Rust is hard to learn, so I think most scientists will not invest in it, but I do hope to be wrong and that it will replace C++ in the HPC world...

[0]: https://users.rust-lang.org/t/rust-vs-fortran/64642


Julia for sure. Rust is not a good fit for scientific HPC.


Rust may be a good option for safety and speed, but putting effort into Julia to speed it up because of its ergonomics seems like a pretty good idea.


Long live Fortran. Learned it for some CFD applications. I don't understand all the devs who think Fortran is some dead language that nobody uses.


Such “devs” are primarily familiar with computers as communications devices for serving advertising over the web. Fortran is not popular in this area.


No.

Fortran is not popular with embedded, nor with distributed computing (in the internet sense, not the HPC one), nor with operating systems, nor with UI/UX, nor with compilers, nor with computer graphics, nor with many other disciplines.

There are lots of areas which do complex, interesting tasks, and where Fortran is some dead language nobody uses.


I agree that your list is of complex, interesting things that Fortran is not the best, or even a good, choice for. But that doesn’t mean it’s “dead”, and it is not. It’s not even sick.

For a new project that I would have chosen Fortran for five years ago, today I would choose Julia. But it’s still not dead. Anyone who thinks so does not read papers in computational physics, for example.


"Fortran is the oldest high-level programming language that remains in use today"

This is like the ship of Theseus, except where the ship has changed from a fishing vessel to a dreadnought.


“I don't know what the language of the year 2000 will look like, but I know it will be called Fortran.”

-- Tony Hoare, winner of the 1980 Turing Award, in 1982


Battleship of Theseus it is, then.


Aircraft Carrier of Theseus it is, then.


We joke, but there actually was an aircraft carrier by the name HMS Theseus. It was under construction towards the end of the Second World War, and deployed to Korea in 1950.

https://en.wikipedia.org/wiki/HMS_Theseus_(R64)


Co-author here. If you have any feedback on our work or any questions, please let us know. I'll be happy to answer.


The ATS programming language offers an excellent platform for scientific computing because it is marketed at academics and offers very high performance.

It’s an obscure language, but it is maintained and used by very smart people. You just need to be a bit of a hacker to be good at it, but it’s at the cutting edge of high-performance programming language research.


I remember when I got my first programming job out of college in 2000. I ended up being responsible for maintaining the Fortran 77 bindings of our software at the company.

"I was born in 77," I thought... this was trippy.


Scientific computing has the reputation, perhaps unfair, of being write-only programming. True?

In the 60+ years since Fortran was invented, enormous strides in computer languages have made source code more readable, and 99.9% of new programmers learn only those languages.

If you ignore all that and continue to write Fortran, you've guaranteed that almost no one except other scientists will ever want to look at your code. And probably not even them.


I think there's a common misconception that numerical codes should be accessible without documentation, and without a background in the subject.

https://github.com/nasa/NASTRAN-95

Take NASTRAN-95, for example. It has good documentation in the form of manuals, books, and papers. If you have the mechanical background, you should be able to understand the docs explaining the implementation approach, and then you should be able to understand the code. It doesn't matter that there are no comments anywhere. Aside from some cosmetic differences, it looks pretty much like what you'd write today with MATLAB. It's perfectly readable and accessible by the target audience.


I don't think anyone would say that all code should be accessible to everyone.

The question is, do people who use [1] and [2] use only FORTRAN for everything, even common programming tasks that any competent person could tackle? And when they have something new to write, do they continue with FORTRAN because it's what they're used to?


I just meant to say that numerical code is not actually write-only.

Fortran is like a DSL: you just write the numerical solver part in it. Everything else is just regular code.

Just like anything else, the decision comes down to whether the ecosystem has the right tools for the job. It's a deliberate decision. It's not like people are relegated to it because they don't know anything else.


“True?”

No. I’ve never heard this before. It’s true that scientists don’t always write the clearest code, but Fortran, and even the style of Fortran typically written by engineers and physicists, is comparatively straightforward to read. It is named for FORmula TRANslator, after all. You will recognize the equations in the code. Fortran is inherently easier to read than C; it’s really not harder to read than Python. You can make it hard to read by doing complicated array-indexing tricks, of course, and there are sometimes good reasons to resort to that.
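
As a toy illustration (my own example, not from any real code base), an explicit update for the 1-D heat equation u_t = alpha * u_xx transcribes almost directly from the whiteboard:

    program heat_1d
      implicit none
      integer, parameter :: n = 100
      real, parameter :: alpha = 1.0e-2, dt = 1.0e-3, dx = 1.0e-1
      real :: u(n)
      integer :: step

      u = 0.0
      u(n/2) = 1.0   ! initial spike in the middle

      do step = 1, 1000
        ! u(i) + alpha*dt/dx**2 * (u(i-1) - 2*u(i) + u(i+1)), as one whole-array statement
        u(2:n-1) = u(2:n-1) + alpha*dt/dx**2 * (u(1:n-2) - 2*u(2:n-1) + u(3:n))
      end do

      print *, 'total heat:', sum(u)
    end program heat_1d

The finite-difference formula reads the same in the code as it does in a textbook.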

And no one except other domain experts will ever have any interest in looking at your source. Why would they? No one except someone designing a website will ever want to try to figure out your CSS either.


All code gets looked at by someone else, sooner or later. And the equations are never the hard part -- it's the control flow.


Ok, now I see where the confusion is coming from. Fortran is used pretty much exclusively for numerical software. A lot of numerical software has very simple control flow using very few and very simple types. Numbers haven't changed much since they were invented. Fortran is used very differently than you're imagining.


...by people who have a reason to look at it.

What is it about Fortran’s control flow that makes you think it’s particularly hard to read? Say, harder than Python? Note that people don’t generally use computed GOTOs any more.


Your concept of Fortran sounds a little out of date to me; have you taken a look at it in the past 31 years?


Deplorably, no. Haven't looked at Object-Oriented COBOL, either.



