High performance scientific computing is very much still a FORTRAN, C, and C++ game. And of those, FORTRAN has some compelling advantages in terms of first-class built-in support for multidimensional arrays, and quite excellent compilers. And, as others have noted, until the `restrict` keyword in C99, there were optimizations in FORTRAN that were not even possible in C.
I mostly used C for my (small-scale) HPC work in grad school because it’s what I knew best, but at several points I wished I had learned Fortran instead.
Probably one of the only “higher level” languages that’s ever been used for serious petascale scientific computing is Julia (first with Celeste on the astro side, possibly soon with CliMA for climate modeling), which not coincidentally follows similar array indexing conventions as FORTRAN. And while that’s what I mostly use now, I don’t see Fortran going away any time soon.
If anything, with cool new projects like LFortran [1] making it possible to use Fortran more interactively, it’s probably quite a good time to learn modern Fortran!
Open question: why are multi-dimensional arrays, and matrices specifically, so neglected in almost every other language?
They map well to practically freakin' everything, for what seems like not that much effort on the language-design side, but an enormous amount of tedious, duplicated effort on the user side.
That is a great question, and while I don’t know for sure, I think a relevant anecdotal observation that stands out to me is that many languages which do have first-class multidimensional arrays also have one-based indexing. And I would speculate this in turn is because most people who wanted matrices badly enough to make them a language feature wanted to do linear algebra with them, where you generally also want one-based notation since that is how all the equations in textbooks and papers are written.
So a language with multidimensional arrays is in a lose-lose position of having to choose to either satisfy the linear algebraists at the cost of alienating general-purpose programmers who want to do pointer arithmetic, or else satisfy the latter while alienating the core demographic for multidimensional numeric arrays.
Personally, I’m fine with (or even slightly prefer) one-based for my own scientific computing, despite starting with C, since it really is more elegant for linear algebra, and I have never found myself needing or wanting to do pointer arithmetic in a language that does have good multidimensional arrays — but clearly it is still a major turn-off to many others.
I don't get why a language can't have both 0- and 1-based indexing; just set it as a compiler flag or something. The two differ by a constant offset, and it's not hard to add or subtract one.
That sounds like it belongs in the top 10% of the worst (programming-related) things I could wish upon another programmer: Having to maintain projects where you have to read the build system's configuration and keep it in mental context just to figure out if an array access is mathematically correct or not.
I would imagine this support would be palatable if it were active at all times but used a visually different syntax for 0-based and 1-based indexing, maybe something like the guillemet for the latter (like this: «1»): wild enough to draw the reader's attention when reading the code, and uncommon enough across keyboard layouts that most users would think twice before using it when writing code unless it made sense for the domain.
It's hard enough to understand a codebase, but if a language was like Haskell for instance (you can enable certain extensions in Haskell with a flag at the top of a file), you could have a line at the top of a file defining that this file is 0 based, while maybe the default is 1 based for the language. It doesn't seem like that much more than you already have to consider in many languages.
Funny you put it like that, because the fragmentation regarding language extensions is frequently discussed as one of the biggest issues affecting real world Haskell.
You can actually switch starting indices fairly easily in Fortran [1], or for that matter Julia, but for some reason the option doesn’t seem to be particularly widely used. People seem to give a lot of weight to the default, even when the effort to change is minimal. Added complexity/ambiguity?
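Not something from the thread, but as a concrete illustration of the point above: in Fortran the lower bound is simply part of the declaration, so 0-based (or any other base) is a per-array choice rather than a language-wide one. A minimal sketch:

    program bounds_demo
      implicit none
      real :: zero_based(0:9)   ! indices 0..9
      real :: offset(-5:5)      ! indices -5..5
      zero_based(0) = 1.0
      offset(-5)    = 2.0
      print *, lbound(zero_based), ubound(zero_based)   ! prints 0 and 9
      print *, lbound(offset), ubound(offset)           ! prints -5 and 5
    end program bounds_demo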
You could declare the array as array[0..100] of something or array[1..100] of something for 0- or 1-based, and just as well do array[2..100] of something, or array[-1..100] of something, for 2- or -1-based.
Or with letters array['a'..'z'] of something,
Or with enums. TEnum = (red, blue, green), and then an array array[red..green] of something
This would all be fine until you start passing and returning these arrays and indexes between independently implemented functions and modules. Then you'd have to start declaring an index type tied to a specific array or a way to look up the index base for a passed in array.
Which would be fine actually for the program itself, if this is all figured out by the compiler/at runtime, though obviously more inefficient "for no good reason".
It would wreak havoc on the programming side though. Have you ever worked on a code base with more than a few people? Have you ever had to deal with different preferences programmers have? Differences in opinion between smart, slightly (or not so slightly) non-social people?
Using 0 or 1 indexed arrays, the new space vs tabs war! Or the new 2 spaces vs. 4 spaces vs. 8 spaces vs. using tabs war!
Ada lets you use any integer subtype as an array range. The first index can start anywhere you wish. Its string library defaults to 1-based but you can easily interoperate with strings using a different subtype. Code can be written that is base agnostic by using the attribute mechanism or by aliasing arrays to use a known range.
I remember some version of BASIC from the early '80s. You could create real multi-dimensional arrays, including sparse ones, with any starting indices you wanted.
OK, thanks. I've not used that. The problem I had was that I learned Fortran before coming across BASIC, so converting Fortran programs was often a problem when features were missing.
My dumb guess is that whereas Fortran was kinda designed with scientific computation as a first-class use case, and C/C++ let you do anything with memory (and make it easy to use inline assembly to access specific instructions), most other languages are designed as general-purpose languages without scientific (as in "high performance") computing as a main design objective.
And thus specialized data types, operators and syntax look like overhead. And thus language designers leave them out.
One interesting wrinkle is that traditionally (dating back to FORTRAN IV at least), Fortran compilers store matrices in RAM address space using column-major order, where consecutive elements are in the same column, not the same row. [1]
Most languages that prioritize Fortran code interop also adopt column-major order, but most other languages that support multidimensional arrays do row-major order. I'm not sure why Fortran went column-major but because it did, a lot of libraries designed for Fortran callers (such as LAPACK and all BLAS implementations) need to be told that input arrays have been transposed when they come from languages like C++.
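To make the layout difference concrete, here's a small sketch (mine, not from the thread): in Fortran, consecutive memory locations run down a column, so the first index should vary fastest in the innermost loop.

    program colmajor_demo
      implicit none
      real :: a(3, 2)
      integer :: i, j
      ! reshape fills in array-element (column-major) order:
      ! a(1,1)=1, a(2,1)=2, a(3,1)=3, a(1,2)=4, ...
      a = reshape([1., 2., 3., 4., 5., 6.], [3, 2])
      do j = 1, 2            ! columns outermost
         do i = 1, 3         ! rows innermost: a stride-1 walk through memory
            print *, i, j, a(i, j)
         end do
      end do
    end program colmajor_demo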
That’s one reason why APL and its descendants are so powerful in the hands of people who have become fluent in viewing computation through the lens of arrays.
Because the only place you really need them is in mathematical/scientific computing, and while many scientists tend to pick up a little programming for their research, the vast majority of computer scientists and programmers, in particular the ones with the interest and ability to work on languages, are not concerned with mathematical modeling/processing. It is a fairly rare interdisciplinary combination.
Plenty of DSA work has more optimal representation and evaluation as matrix arithmetic. Particularly graph traversals and transforms which are heavily used in optimizing compilers.
1. There are thousands of scientific subroutines written in Fortran that are stable and well tested - in fact, there are well established formal libraries of them that go back over 60 years. The researchers know and trust them.
2. Despite the sneers and derision that Fortran has been subjected to from non-Fortran programmers over recent decades, Fortran is an excellent language to do intensive scientific and mathematics work. Compilers are optimized for calculation/math speed and large intensive calculations. From the outset, it could handle complex numbers, double precision, etc., etc. natively without having to resort to calling libraries/special routines as other languages had to do back then.
3. The scientific enterprise, alongside mainframes and supercomputers, has well-established and stable ways of working, including program and data exchange etc. Essentially, a well-established computing ecosystem/infrastructure surrounds scientific computing that researchers know and understand well. There is no need to change it as it works well. Moreover, it's a stable and reliable environment when compared to other computing environments. Fortran was introduced long before the cowboys entered the computing field; back then the programming/computing environment was more formal and structured, and this contributed to that stability.
For the record, Fortran was the first language I learned, and my programming back then was done on an IBM-360 using KP-26 and KP-29 keypunches and 80-column Hollerith cards.
Researcher in Nuclear Engineering chiming in with a similar experience. Much of our code is the same way, and in fact our field more or less invented many computing methods used today. One widely used simulation code called MCNP [1] can trace its origins back to running on the ENIAC at Los Alamos in the 40s(!). At some point, it was considered a major upgrade when it was rewritten in Fortran 90.
> From the outset, it could handle complex numbers, double precision, etc., etc. natively without having to resort to calling libraries/special routines as other languages had to do back then.
At least until sometime in the '80s another advantage of Fortran over C was that it could handle single precision floating point. C always promoted float to double when you did arithmetic with it.
Arithmetic was faster on floats than doubles, and that could make quite a difference in a big simulation that was going to take a long time to run.
The high energy physics group at Caltech back then had a locally modified version of the C compiler that would use single precision when the code did arithmetic on floats. Some of the physicists used that for programs that otherwise would have been in Fortran.
You're testing my memory now. If I recall correctly, at least complex numbers were in the WATFOR (University of Waterloo) FORTRAN-IV compiler that I used in 1968 on the System/360. (That said, I've always understood it was there from at least FORTRAN II.)
I seem to recall needing 'i' for vector stuff, Maxwell's equations etc. (If I'm wrong it must have been on the VAX FORTRAN-IV several years later.)
Mostly the compiler won't catch violations of the storage association rules. Toolpack had a useful tool for strict F77 code, which was a boon for fixing things so they wouldn't break with the Alliant compiler, the first compiler used by the project I inherited that took advantage of the optimization opportunity. Possibly Cray's would have done so too, but it hadn't been used for a while at that point.
The fact that people won't believe the rules exist is a continual source of bug hunting in Fortran, unfortunately. At least restrict is visible.
Fortran didn't originally have pointers, and pointer-based code is relatively rare in Fortran, so making requirements about pointer aliasing doesn't have the same potential to completely break your code that it does in C. The reason C can't actually enforce the restrict keyword is that the standard would break existing code!
The storage association rules have nothing to do with pointers like that. Code will typically break because arguments are passed with copy-in/copy-out semantics, not by reference.
I __think__ Fortran just never gave you this particular footgun. No aliasing of pointers allowed from the get-go. It was never an issue. Somebody let me know if that is wrong.
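For anyone unfamiliar with the rules being discussed, here's a rough sketch (mine, with a made-up subroutine name) of how they usually bite: the compiler may assume dummy arguments don't overlap, and nothing stops a caller from violating that assumption.

    subroutine scale_add(a, b, n)
      implicit none
      integer, intent(in) :: n
      real, intent(in)    :: a(n)
      real, intent(inout) :: b(n)
      ! The compiler is allowed to assume a and b do not overlap,
      ! so it is free to vectorize or reorder this freely.
      b = b + 2.0 * a
    end subroutine scale_add

    ! A call like
    !    call scale_add(x(1:n-1), x(2:n), n-1)
    ! associates overlapping parts of x with both dummy arguments while
    ! one of them is being defined, which violates the rules, and, as
    ! noted above, the compiler will usually not catch it.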
I read that complex numbers came in FORTRAN IV, so mid-'60s.
I never used anything older than FORTRAN 77, and it certainly had complex numbers, and also ways for the “inventive” programmer to make use of pointer like functionality. You could e.g. pass functions to functions, if you were so inclined.
At face value, that statement implies complex numbers came later but before FORTRAN III, which also implies a point release (FORTRAN II was released in 1958). Moreover, I read somewhere that FORTRAN III had it but it wasn't widely released. Likely then that FORTRAN IV was the first widely released version with complex numbers. As it was released in 1961, we should assume then that this was the likely date.
You'll note from one of my other posts that FORTRAN-IV was my intro into programming. However, that was toward the end of the decade not the beginning. By then, many universities were running the WATFOR FORTRAN-IV compiler from U. Waterloo.
At the time it was anything but ancient. We students felt both cocky and privileged in that we knew that we were some of the very few people in the country who had access to a state-of-the-art IBM mainframe.
BTW, the mainframe, a System/360, had just 44kB memory and ran 7 concurrent operations at once. It was a red-letter day when the '360 got upgraded to 77kB memory. The university's rag had the fact as large headlines on its front page.
I like it! Sorry to call older dialects ancient. Being a Fortran programmer is somewhat like being from a red state, I find. I naturally self-deprecate (or stick with C++ as appropriate) in front of others.
I used to work on a system that was older than me, but immaculately refactored (to the author's admittedly rare taste) through the years. It had run on all kinds of hardware in the 80s, but was different enough from its predecessor programs that I am not sure what all it ran on - I recall maybe DEC, maybe PDP this and that, maybe a 360?? being mentioned. The guy who led the team that wrote it passed on two years ago now. Man, he was a trip! He wasn't old. He used to smoke, darn it.
Sounds like you had some fun times with the 360! Makes me want to retro-compute a bit.
Actually my phone makes me want to retro compute as well, every time I “type” on it. ;)
Somehow your mention of cowboys entering the field feels irritating, considering Kazushige Goto and his contributions to BLAS while being in Austin, TX.
Tested in the sense of test suites and formal proofs? That would be valuable to other disciplines using other tools.
But this article isn't about fundamental algorithms being correctly-implemented in an endowment of legacy code, it's about defending a siloed language choice, which seems like an antiquated concern to me.
I applaud anyone using a tool that works for them, but if it's good, then its users have accomplished things which transcend an individual tool.
"But this article isn't about fundamental algorithms being correctly-implemented in an endowment of legacy code, it's about defending a siloed language choice, which seems like an antiquated concern to me."
1. The first comment I would make is that the headline premise is wrong, or at least deliberately misleading. Let's start with a correction. I would very much doubt that climate models are written in programming languages from the 1950s. The Fortran code of the 1950s was not that much like the Fortran code that I learned in the late 1960s, and that late-1960s code bears little resemblance to modern Fortran code of today. Furthermore, the Fortran standard is certainly not dead; it is being continually updated: http://fortranwiki.org/fortran/show/Standards.
2. When I made the comment about libraries going back 60 or so years, this might imply that libraries written in Fortran II ca 1956 or so would pose a problem today. I would suggest that it is not so, because the process of updating libraries is formal and strict; thus an updated Fortran II subroutine works perfectly well with today's modern Fortran. This 'upgrade' process is not like converting code from, say, Pascal to C or whatever, for here we are still within the confines of the Fortran framework, and that conversion process is well understood, straightforward and procedural.
3. This isn't my idea, nor am I defending something that I learned decades ago and don't want to give up. Frankly, I do little Fortran programming these days, so it's essentially irrelevant to me. The point I made, and that I make again, is that there is a sophisticated scientific computing environment in existence and it is used by thousands of current researchers and scientists around the world. Scientists would not use antiquated software on cutting-edge science if it did not work. The fact is modern Fortran is a modern programming language that delivers the goods par excellence, likely much better than any other language, especially so given its long and mature infrastructure. For example, here are the first two references I came across on the net; there are thousands more just like this:
4. Now let's look at the current situation: 'modern' software. To begin, you should read the 27-year-old Scientific American article titled 'Software's Chronic Crisis' from September 1994:
I would contend that this article is just as relevant today as it was 27 years ago, if not more so. In summary, it says:
• Programmers work more like undisciplined artists than professional engineers (this problem remains unresolved).
• Essentially, programming is not true engineering (since the time of the article, computer science has progressed somewhat, but on the ground we still have a multitude of unresolved problems).
• If programming is to be likened to engineering, then it is in a highly unevolved state, somewhat akin to the chemical industry ca 1800. Its practical function or operation is a mismatch with the everyday world, or we wouldn't have the proliferation of problems that we currently have.
When, these days, one examines the situation, with literally hundreds of different computer languages in existence, it is clear that there isn't enough human time and effort to rationalise them all and develop a coherent set of tools; in essence, almost everything around us is only half done. We stand in the midst of an unholy Tower of Babel and it's an unmitigated mess (I could spend hours telling you what's wrong but you'll know that already).
The crux of the problem is that programmers spend much time and resources learning one or more computer languages, and that it's dead easy to poke fun at mature languages such as Fortran as being old fashioned and out of date. The fact is they either do not adequately understand these languages, or the reasons why they are used, or both.
The fact is it is this very maturity of Fortran that makes it so valuable to scientists and engineers. Those who are not conversant with or do not program in Fortran have simply not understood the reasons for its success.
Scientists and engineers have found the most reliable, stable and best fit available, and that is to use a modern version of Fortran—simply because it's reliable and it works.
This article only shows the author's lack of understanding of the problem.
Oh, BTW, let me add that I have no contention with theoretical computer science models. It's just the divide between theoretical computer science and what happens in practice is as wide as it ever was.
I guess the downvotes and this reply indicate I was unclear or wrong, but I can't tell how. I guess suggesting that the rest of us can learn from Fortran is disrespectful to it, which means I'm not interested.
"I guess the downvotes and this reply indicate I was unclear or wrong, but I can't tell how. I guess suggesting that the rest of us can learn from Fortran is disrespectful to it, which means I'm not interested."
Please don't get upset about this, you're not alone. I try to state my opinions forthrightly to stimulate debate and sometimes that sounds abrasive, especially so when there's a clash of cultures (which is partly so here). Incidentally, I didn't downvote your comment.
It is hard to summarize or adequately paraphrase all the technical and cultural issues involved with programming here because there are so many; to do so would require a book of explanations. Moreover, an adequate and satisfactory explanation of the problematic issues facing modern computing would have to begin with a high-level overview, then it would be necessary to drill down and explain the many underpinning parts one by one. For instance, to provide a proper explanation of why programmers tackle or approach their work in a significantly different way than is done in traditional (well proven) engineering, and why this has had a noticeable deleterious effect on programming rigor, one would also have to launch into a study of modern-day ergonomics of programming—and this, in and of itself, would be a massive undertaking.
Treating the matter bluntly, which I admit I've done here in my summary of the September 1994 Scientific American article [Software's Chronic Crisis], will definitely alienate many present-day programmers but what else can one do to raise attention about such an important matter in such a short space? How do we raise and restate the fact that not only SciAm's staff writer, W. Wayt Gibbs but also that many others have put forward strong evidence to the effect that the way we tackle practical programming tasks isn't truly professional and that it is this lack of professionalism that is a major contributor to modern-day computing problems—and do so without actually offending anyone? I reckon one cannot, and the fact that even raising the issue (as here) alienates is also problematic (in that it acts as an additional contributory factor that must be managed).
For reasons already mentioned, Fortran has effectively managed to bypass many of these problems because of its historical role and the fact that it successfully services a specific niche field, albeit a very large one.
Many programmers of other languages do not appreciate the fact that one of the main reasons why Fortran still has such a strong hold upon scientific and engineering programming is its origins. Fortran's very raison d'être arose directly from a very practical need of electrical engineers to overcome the assembly-language bottleneck, whose limitations were that it was not only slow and tedious but also error-prone, especially so when the job was a big one.
It is high tribute indeed that John Backus and his team did such a remarkable job in solving the input bottleneck problem with Fortran; the fact that they did so in such an effective and eloquent way is the reason why Fortran is still alive and well today some 65 years after its initial release. (It's worth taking a minute to read this Wiki on Backus: https://en.wikipedia.org/wiki/John_Backus.)
Thus, from the outset, Fortran was seen as a very useful tool; for the first time, scientists and engineers had a practical and comprehensible way of entering science and engineering data into computers in ways that were essentially analogous to those they had always used to solve practical problems requiring mathematical solutions.
The key point was (and still is) that Fortran allowed for computer input that was easy to understand, as it offered a representational view of what was already within the mindset of programmers whose principal interests were in science and engineering and NOT specifically programming per se. That is, Fortran was and is still so successful in those fields because its paradigm or modus operandi allows it to align easily with or follow accepted methodologies and practices—and this usefulness was immediately obvious to its target audience/users.
Programmers of other languages must get their heads around the key fact that Fortran, scientists and engineers were a dovetail fit from the outset (and they still are). Therefore continue to expect to see climate models and most other [especially large] scientific modelling to be written in Fortran for the foreseeable future!
Nowhere in these posts have I said that Fortran is the best programming language or that it ought to be a kind of universal language—nothing is further from the truth. Essentially, I am just making the point that currently it is the best fit for the job and that it has long demonstrated the fact with its well established lineage of some 65 years.
Initially, I too was underwhelmed by Fortran. When I was first confronted with having to study it I thought that whoever dreamt up this goddamn cryptic system could have made it much more logical and intuitive (back then, there was little to compare it with—not even BASIC) not to mention its seemingly arcane basic input and output statements (format, print and all that horrible Hollerith stuff). To my mind, they were so overly complex as to be essentially impractical (and my errors were numerous).
Of course, the real problem turned out to be me—not the language! Like many programmers who have objected to having to change programming languages and/or having to dispense with bad habits in exchange for better ones, initially I too railed against certain programming improvements made in later versions of Fortran, so I well understand where the reluctance to accept Fortran comes from. (This I believe is the crux of the problem. As we can see from the story/article, the author, Partee, has aptly demonstrated his ignorance in such matters; unfortunately he is not alone.)
For instance, early versions of Fortran made much use of the GO TO statement. I loved it, as there was seemingly no end to the amount of spaghetti code that I could write without doing hardly any planning. Moreover, in the days of punched cards, GO TO statements also made programs easy to modify on-the-fly (as most of us had many pre-punched cards that we could insert into a card stack for a quick fix). When GO TO was rightfully deprecated to force a more structured programming style I, like many, initially resented the fact. [FYI: http://fortranwiki.org/fortran/show/Modernizing%2BOld%2BFort...]
I believe that much of the reason for why Fortran is not appreciated for what it is and why it has little currency outside the scientific and engineering communities is this commonplace reluctance of many to take time to appreciate facts or things that are seemingly outside their realm of interest or their immediate need to know. Right, it takes both time and effort to appreciate what Fortran is about and few are actually prepared to invest in either let alone both. Dismissing Fortran as obsolete and irrelevant is thus a much easier option.
The actual title is, "Why are Climate models written in programming languages from 1950?".
I think the assumption that "old is bad" is the cause of many, many, many foolish decisions. Useless code rewrites, company reorganizations that are not significant improvements, and many other bad ideas hinge on this Worship Of The New. Why are we using an alphabetic system originally developed c. 1800BC? It's old, we should switch to new writing systems every 10 years because they're new, right :-)?
Older is not better. Newer is not better. Better is better. There's no point in switching something if the destination isn't better, and even if it's clearly better, it needs to be so much better that it's worth the switching cost.
And it's just a misunderstanding. I think it was Perlis who said, "We don't know what the programming language of the future will look like, but we know it will be called FORTRAN."
Hoare said "I don’t know what the language of the year 2000 will look like, but I know it will be called Fortran". He also said "ALGOL 60 is alive and well and living in Fortran 90", which is a decent compliment from him.
Engineers are horrified to learn that mathematical modeling is done in a language created in the 50s, but aren't bothered by the fact that the dominant computer interaction model used in the field right now dates back to pre-WW2 teleprinters. Someone is lacking in self- and historic awareness.
What practical problems does Fortran cause when used for numerical computing?
Of course it is, just like all programming languages. If it weren't formal, a computer couldn't interpret it precisely. Formal means mechanical; a language whose calculus can be carried out by a machine.
Hint for commenters: Since Fortran 90, it's spelled Fortran, not FORTRAN. By using the latter you signal that your experience on the topic is from 30 years ago.
Right. Sometimes old habits die hard and I use FORTRAN when I mean Fortran but I don't do it intentionally nor do I do it out of ignorance. (I note that as I type this into Firefox, its speller still wants to capitalize the word! As I've discovered so do many other editors and word-processors.)
In recent years I've adopted the following nomenclature, and you'll note I've done so here in my earlier posts. That is, to treat the name of each specific version as a proper name. As FORTRAN IV was originally called exactly that, Roman numerals and all, I use that form out of respect for those who originally named it, in the same way I'd always use, say, John and not john. Nowadays, when I refer to Fortran in its generic sense, I use its new default name rather than its old acronym form.
C and C++ are definitely the competitors to Fortran; not Chapel or Python. In the life sciences, large amounts of Fortran code have been rewritten in C/C++. But those fields have orders of magnitude more funding than climate science and teams of professional programmers to maintain the code.
Fortran is a domain-specific language for scientists, and excels at array arithmetic (for graph-based problems though, maybe look elsewhere). Even badly-written code can run reasonably fast, which is not the same for C/C++. There is also the decades of concerted hardware and compiler optimizations that make Fortran hard to beat on HPC systems.
It's not as readable as Python, but it's more readable than C/C++ written by a professional programmer.
I work at one of the labs mentioned and get paid for running not only the climate models but mesoscale models as well, which are also written in Fortran.
The premise of the article is that Fortran, 70 years later, is still an appropriate tool to use for crunching numbers, which it absolutely is, but it neglects one major problem.
Like the COBOL issue that was all the rage 20 years ago, it is difficult to hire younger generation programmers that want to and are excited to develop in Fortran.
> ...it is difficult to hire younger generation programmers that want to and are excited to develop in Fortran.
How much are you paying? More often than not when I see this kind of reasoning, digging deeper shows that the salaries are not competitive. There's a large number of us that just want to work on interesting problems for adequate money and don't care what the toolset is. I'm fully on board with the idea of being paid to write Fortran.
Also, COBOL's problem isn't so much that younger generations aren't excited about it, but that the problems in the domain solved by COBOL all require highly specialized domain knowledge about an obtuse set of systems said code runs on (with most of their documentation paywalled, at least until recently). The barriers to entry are much, much higher and few companies are willing to train at the rates the language demands.
My understanding is that they're mostly Fortran programs linked together with Unix scripts which are run on HPC systems - could the models run in a more distributed way, like a high-quality grid computing setup? Lastly, what's the best way to find and learn more about the models?
Switching to any sort of commercial grid or cloud computing setup would be rather complicated by the fact that climate models are critically dependent on the fast, low-latency interconnects (e.g., infiniband) of a proper HPC system to achieve good performance at scale. This is usually coordinated with hand-written message passing via MPI directly in the relevant top-level Fortran (or C/++) program.
There are some other (i.e, “embarrassingly parallel”) scientific computing problems where a higher-latency distributed setup would be fine, but in climate models, as in any finite-element model, each grid cell needs to be able to “talk to” its neighbors at each timestep, leading to quite a lot of inter-process communication.
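To give a flavor of why that communication pattern is so latency-sensitive, here's a heavily simplified sketch (mine, assuming a 1-D domain decomposition and blocking MPI_Sendrecv; real models do this in 2-D/3-D with far more machinery): each rank swaps one ghost cell with each neighbor, and this happens every timestep.

    program halo_demo
      use mpi
      implicit none
      integer, parameter :: nloc = 100          ! local cells per rank
      real(8) :: u(0:nloc+1)                    ! interior cells plus two ghost cells
      integer :: rank, nprocs, left, right, ierr

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
      left  = merge(MPI_PROC_NULL, rank - 1, rank == 0)
      right = merge(MPI_PROC_NULL, rank + 1, rank == nprocs - 1)

      u = real(rank, 8)

      ! Exchange ghost cells with both neighbours; in a real model this
      ! (or something like it) runs every timestep, which is why
      ! interconnect latency dominates at scale.
      call MPI_Sendrecv(u(nloc), 1, MPI_DOUBLE_PRECISION, right, 0, &
                        u(0),    1, MPI_DOUBLE_PRECISION, left,  0, &
                        MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)
      call MPI_Sendrecv(u(1),      1, MPI_DOUBLE_PRECISION, left,  1, &
                        u(nloc+1), 1, MPI_DOUBLE_PRECISION, right, 1, &
                        MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)

      call MPI_Finalize(ierr)
    end program halo_demo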
Yes, they run in the cloud, see e.g. https://cloudrun.co (disclaimer: my side-business), but others have done it as well, for a few years now. On dedicated, shared-memory nodes, it's no different from HPC performance-wise. It can be even better because cloud instances tend to have later generation CPUs, whereas large HPC systems are typically updated every ~5 years or so. But for distributed-memory parallel runs (multi-nodes), latency increases considerably on commodity clouds which kills parallel scaling for models. Fortunately, major providers (AWS, GCP, Azure) have recently started offering low-latency interconnects for some of their VMs, so this problem will soon go away as well.
Indeed, basically, though you may lose from lack of direct access to the hardware. But it's typically expensive.
Do AWS and GCP actually have RDMA fabrics now? The AWS "low latency" one of a year or so ago had a similar latency to what I got with 1GbE at one time.
I was part of a project looking at the feasibility of migrating some of the EPA's air pollutant exposure models from Fortran to R/Python. While Fortran was decisively faster, I think the project lead recommended migrating the model to R since not many people used Fortran anymore. It was also harder to share Fortran code for collaboration as well.
I wrote an addon for the MBS application Simpack [1] in Fortran as part of my master's thesis, and I have to say that except for the stupid line length limit I enjoyed using Fortran (it was my first contact with Fortran). My educational background is Mechatronics, so my CS background is not web/GUI applications, but rather embedded systems.
Not only "climate models"... a large chunk of scipy is just a thin Python layer over decades-old Fortran code. That many physicists chose to use the real thing instead of the fisher-price interface speaks in favor of them.
Actually, as a user of the fisher-price interface, I'm glad that it can bind to C and FORTRAN libraries, so my numerics are based on the highest quality code.
I'm also a very happy user of the dumbed-down interfaces. But I find it strange when my fellow users are surprised or even horrified at the fact that some people still write and maintain Fortran and C code. Hell the Python interpreter they use is written in C! But apparently if you write in C you are some sort of old man yelling at clouds.
Gonum wraps the same code. From what I could tell, Fortran seems to have a neat way to handle more specialised number systems like dual numbers and hyperbolic numbers.
- Ability to operate on arrays (or array sections) as a whole, thus greatly simplifying math and engineering computations.
- whole, partial and masked array assignment statements and array expressions, such as X(1:N)=R(1:N)*COS(A(1:N)) (see the sketch after this list)
Fortran 2003:
- Object-oriented programming support: type extension and inheritance, polymorphism, dynamic type allocation, and type-bound procedures, providing complete support for abstract data types
Fortran 2008:
- Sub-modules—additional structuring facilities for modules; supersedes ISO/IEC TR 19767:2005
- Coarray Fortran—a parallel execution model
- The DO CONCURRENT construct—for loop iterations with no interdependencies
- The BLOCK construct—can contain declarations of objects with construct scope
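Not from the original comment, but a small sketch of my own pulling a few of the features above together (whole-array expressions, DO CONCURRENT, and BLOCK), just to show what modern Fortran actually looks like on the page:

    program features_demo
      implicit none
      integer, parameter :: n = 8
      real :: x(n), r(n), a(n)
      integer :: i

      r = 2.0
      a = [(0.1 * i, i = 1, n)]        ! array constructor with an implied DO

      x(1:n) = r(1:n) * cos(a(1:n))    ! whole-array expression, no explicit loop

      do concurrent (i = 1:n)          ! iterations declared free of interdependencies
         x(i) = x(i)**2
      end do

      block                            ! construct-scoped declarations (F2008)
         real :: total
         total = sum(x)
         print *, 'sum of squares:', total
      end block
    end program features_demo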
> How is Fortran coming along with GPUs? (last I looked it was being done with proprietary compiler language extensions, but that was a while ago)
There is support for CUDA in Fortran. In fact, Nvidia purchased one of the main Fortran compiler vendors (PGI) and is open sourcing their compiler as flang.
CUDA is the predominant GPU programming model in the HPC space. There are open standards, but they are nowhere nearly as widely used.
> Are modern supercomputers faster than a cluster of consumer-grade GPU cards?
Fundamentally, supercomputers use the same processors and GPUs that you find in consumer hardware. The differences tend to lie in A) the sheer quantity of hardware used (think millions of cores for Top 10 systems), B) high bandwidth, low latency interconnects and C) some market segmentation by hardware vendors (e.g. Nvidia deliberately limits the double-precision floating-point performance of consumer hardware)
Another advantage to Fortran in academic settings, at least the older version of Fortran, is that there isn't much to the language. Someone already familiar with programming can pick it up in a day or two.
So if you're a prof with a large code base that you want a stream of grad students, undergrad research assistants, associate profs, etc. to contribute to before they move on, having a language that doesn't require squandering half a semester on learning to code before you can start doing actual science is a big bonus.
Fortran will always be around because there's too much investment in it.
A nuclear reactor simulator I ported from UNIX to Win32 in 1998 was several million lines of code written by nuclear engineers (not software engineers) and physicists. It's over 60 years old now.
TFA would seem more reliable if it didn't have such a needlessly obscurantist Fortran-Python code comparison. Nothing about the two languages calls for different base case logic! That is, in order to prevent confusion, the "Python3" code should have been this:
def fibonacci(n):
    if n < 2:
        return n
    else:
        return fibonacci(n-1) + fibonacci(n-2)
From the other side, I thought the Fortran code had too much syntactic ceremony.
I haven't written Fortran in a while, but I was pretty sure that for illustrative examples like this, you could dispense with the entire MODULE declaration, the use of END FUNCTION Fibonacci instead of just END, and the usually-optional :: separator between the variable's type and name.
Something like this? Again, no recent experience:
recursive function fibonacci(n) result (fib)
  implicit none
  integer n
  integer fib
  if (n < 2) then
    fib = n
  else
    fib = fibonacci(n - 1) + fibonacci(n - 2)
  endif
end
(The IMPLICIT NONE has to stay because of the now-regrettable Fortran convention that without it, the type of a variable is determined by the first character of the name (n would be integer because variables starting with m, n, i, etc. are integer, while fib would be floating point).)
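For anyone who hasn't run into implicit typing, here's a tiny sketch (mine) of the pitfall being described; with no IMPLICIT NONE, both variables below spring into existence with types chosen purely by their first letter:

    program implicit_demo
      ! Deliberately no IMPLICIT NONE, to show the default rules.
      n = 10              ! starts with "n"  -> implicitly INTEGER
      fib = n / 3         ! starts with "f"  -> implicitly REAL
      print *, fib        ! integer division happens first: prints 3.000000
    end program implicit_demo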
I'm not an expert in Fortran, but my understanding is that Fortran 90 is basically the equivalent of C++11--it added a slew of major features (such as free format code and array notation) that makes it a pretty different language from pre-90 code. Even if newer language revisions add some more useful features, it's the core set from Fortran 90 that's worth differentiating, much as I might describe modern C++ code as C++11 even if the project requires C++14 or C++17.
This is absolutely correct - Fortran standards after 90/95 have mostly added extra features, rather than fundamental changes to how people write Fortran. Fortran 2003 added OO support, but I don’t believe that has seen widespread adoption.
I don't think it would look different. There is a big difference between Fortran 77 and Fortran 90, but less between Fortran 90 and Fortran 2018, at least for this example.
I don't entirely agree with the overall assertion of this article. The author has some valid points, but I think it misses the forest for the trees.
TLDR: I think Fortran tooling and HPC clusters are a self-reinforcing local maximum. They are heavily optimized for each other, but at the cost of innovation and extensibility.
For example, we'll never get a fully differentiable climate model in Fortran. The tooling does not exist, and there are not enough Fortran developers to make a serious dent in the tooling progress made outside of the HPC world. The MPI stacks these codes rely on are not great for hardware outside of a supercomputer, and Fortran codes are basically built around a full interconnect. I have many PFLOPs at my disposal that I cannot use because these codes are too brittle without being entirely rewritten.
At the end of the day, everything is a Turing machine, so you can technically do whatever you want in Fortran or any other language (or mix and match), but strategically staying in Fortran leaves a lot of resources on the table.
Well Fortran was, notably, one of the first languages to have proper source-to-source autodiff (TAPENADE) [1-3], so it’s probably not impossible, though my choice for a fully differentiable climate model would personally be Julia, like the CliMA folks at Caltech [4].
For heavens sake, let's stop the discussion of Fortran array index starting at 1. In Fortran the starting index can be anything. A(-10:-5) is valid, A(-10:10) is valid, A(1:10) is valid, A(0:10) is valid. Choose what you want and do not complain about it again, please. No other language has this amazing capability.
These considerations are valid also in other fields, such as computational quantum physics/chemistry. Major software packages are written in Fortran and in C++. I work in the ML community now, after many years of quantum chemistry, and when I say that I know Fortran people usually laugh :).
I'm not familiar at all with the world of high performance scientific computing. Are C++, Rust, Nim, Zig, & co. even remotely considered as potential candidates in the future, or is it really only C and Fortran with no expectation to see much changes? Just curious.
C++ certainly, and I would not be surprised to see Rust in the near future.
I have not seen anyone use Nim or Zig yet. There are also some special-purpose languages like Fortress (apparently now defunct), Coarray-Fortran, and Chapel, though none seems to have achieved too much market-share.
Personally I have almost entirely switched to Julia (from mostly C), which lets me do my everyday plotting / analysis / interpretation and my HPC (via MPI.jl) in the same language. Fortran definitely still has some appeal as well though.
Well, I'd say I'm generally at the shallow end of the HPC pool, and also in a field that's relatively new to HPC in general so there aren't a ton of established community codes to draw upon. Consequently, I'm willing to make a few mild performance tradeoffs if it means my grad students and I can iterate faster.
If you wanted to use Julia at petascale or above (like the Celeste folks), then you'd probably want to be doing that in a case where you see fundamental algorithmic improvements you could readily make over the current SOTA with a higher level, dispatch-oriented language - or else a case where you really need, say, certain types of AD or the DiffEq + ML capabilities of the SciML ecosystem (the latter of which very much depends on the level of composability that follows from the dispatch-oriented programming paradigm).
In general, my two cents on what it takes to get "good enough" performance from Julia for my sort of HPC are that you (1) embrace the dispatch-oriented paradigm and take type-stability seriously, and (2) either disable the GC and manage memory manually or else (my usual approach) allocate everything you need heap-allocated up-front, and subsequently restrict yourself to in-place methods and the stack.
MPI.jl pretty much "just works" in my usage so far (just have to point it towards your cluster's OpenMPI/MPICH) and things like LoopVectorization.jl are great for properly filling up your vector registers with minimal programmer effort.
C++ is widely used for High Energy Physics. The ROOT framework [0], developed at CERN, is mostly C++.
I'd be tempted to say that Rust could be used as well, but the equivalent of MPI and OpenMP for Rust is still not as fast as in C++/C/Fortran.[2] That's easy to understand: there are decades of investment in MPI/OpenMP for C/C++/Fortran, and Rust is not there yet.
Also, in some cases where high throughput is needed, languages with garbage collector are not suited. In this scenario, deterministic execution time and deterministic latency are very important. Not directly related to HPC, but Discord migrated from Go to Rust for this reason[2].
C++ sees a fair amount of use and has some mature libraries written in it, like the Trilinos collection, alongside being able to use existing C libraries. I think for other languages the limiting factor is the maturity of their linear algebra libraries and whether someone has written bindings to use MPI.
I haven't read the article but in my experience FORTRAN is used because a lot of really complex numerical routines were written years ago by really smart people in FORTRAN and nobody really wants to rewrite them. There are exceptions - the basic LAPACK stuff has modern alternatives (e.g. Eigen for C++, nalgebra for Rust).
But there are a ton of more specialist libraries, e.g. ARPACK that people probably aren't going to rewrite.
That said, there's a FORTRAN to C transpiler that works pretty well. I used it when I needed ARPACK and didn't want to deal with FORTRAN.
The reference BLAS in Fortran is indeed slow. I'm not aware of any tuned version in Fortran. It might be possible to re-write the BLIS structure in Fortran and get reasonable performance on, say, Haswell, but not on SKX, if we talk x86.
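For context (my own sketch, not the actual reference source): the reference BLAS is essentially the textbook triple loop below, with sensible loop ordering but none of the cache blocking, register tiling, or hand-written vector kernels that BLIS/OpenBLAS-style implementations add, which is where the big performance gap comes from.

    subroutine naive_dgemm(n, a, b, c)
      implicit none
      integer, intent(in)    :: n
      real(8), intent(in)    :: a(n, n), b(n, n)
      real(8), intent(inout) :: c(n, n)
      integer :: i, j, k
      do j = 1, n
         do k = 1, n
            do i = 1, n          ! stride-1 over the first index of a and c
               c(i, j) = c(i, j) + a(i, k) * b(k, j)
            end do
         end do
      end do
    end subroutine naive_dgemm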
The tooling is stable, so when you link to FORTRAN binaries the ABI tends not to bitrot. IMHO FORTRAN might get a borrow checker before C/C++ to be on par with Rust for memory safety.
[1] https://lfortran.org/