Man, why is inertia considered a bad thing? "new code is bolted on, with a prayer that nothing breaks" is literally all of software development. How many times do things break because some fool decided we need to move to js framework 2022.04.07 when 2022.03.05 worked fine, just because the new one is shiny? "Old" does not mean broken automatically; people seriously need to excise this assumption from software development, because it simply isn't true.
HPC administrator, researcher and scientific software developer here.
Inertia is not a bad thing, but long-lived code evolves in strange ways, because of us, humans. First of all, programming languages and software design have a symbiotic relationship. They feed each other. Support for required patterns and new technology is added to languages, and designs are made within the confines of language capabilities.
Moreover, technology changes with every generation. Hardware changes, capabilities change, requirements change, designs and languages change. More importantly, mindsets change. So you're bolting modern designs onto "vintage" designs. Inevitably, shims get coded (sometimes implicitly), and things get complicated, even with the best-documented designs and well-intentioned developers. The code pushed down solidifies, knowledge fades, and documentation gets lost unless it's bundled with the code repository.
As a result, legacy code becomes something of a monster that developers don't want to see, touch, or work with.
My research code was written in C++11 in the beginning. If I want to modernize it with C++14, or add new parts in C++14, those parts will probably look so different that it'll read like two different languages glued together with black magic. It's the same with long-lived Fortran code, only with a longer legacy.
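To give a rough idea of what I mean, here is a minimal, made-up C++ sketch (the function names and data layout are hypothetical): both functions do the same thing, but one is written in idioms typical of a C++11-era code base and the other in the style C++14 made convenient.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// "Old" part of the code base: C++11-era style with spelled-out iterator types.
double total_energy_old(const std::map<std::string, std::vector<double>>& fields) {
    double total = 0.0;
    for (std::map<std::string, std::vector<double>>::const_iterator it = fields.begin();
         it != fields.end(); ++it) {
        for (std::size_t i = 0; i < it->second.size(); ++i)
            total += it->second[i];
    }
    return total;
}

// "New" part bolted on later: range-for, auto, and a C++14 generic lambda.
double total_energy_new(const std::map<std::string, std::vector<double>>& fields) {
    double total = 0.0;
    for (const auto& kv : fields)
        std::for_each(kv.second.begin(), kv.second.end(),
                      [&total](const auto& e) { total += e; });
    return total;
}
```

Neither half is wrong, but whoever maintains both side by side effectively has to switch dialects mid-file.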
The culture and dynamics around scientific and academic programming are different from both FOSS and commercial software. This needs to be taken into account.
> The culture and dynamics around scientific and academic programming are different from both FOSS and commercial software. This needs to be taken into account.
I'm currently teaching modern scientific Python programming, if that's a thing.
You raise an excellent point: the software development life cycle is quite different in academia. Do you have resources in mind on that topic?
I was thinking of dividing academic code into three "Ps":
- Playground: throw-away scripts for data analysis, fast-moving code where any rigidity slows the process. Usually for internal use by a single researcher.
- Prototype: code that serves as a foundation to the above, or is reused frequently, but is otherwise not shared outside a research group.
- Product: code, often open source, that is shared with the community.
Most tutorials about "good coding practices" do not distinguish between these three stages.
For Python, there are practices that work at any level: auto-formatting code, sorting imports, gradual typing. But things like tests, API documentation, and dependency management only apply to stages 2 and/or 3.
Stage 3 does not always happen. There are a few remarkable counterexamples, but most research software never quite ends up taking the form of a product.
In my experience, what happens is that some prototype works quite well and gets used in the next project. But the next project is about building a new prototype, not improving the previous one. After a few projects, the person who wrote that first prototype has already left the department, but there are people who depend on it to get results. If enough time passes, you end up with a piece of legacy code that very few people can understand.
The problem with academia is that, while the prototype is an excellent topic for an article, the real product, a maintained (and maintainable) software project, is not. And getting funding in academia to do something of no academic interest (i.e., that cannot be turned into a paper) is very difficult. If the software is of no interest to some industrial partner willing to spend money, that code will just keep moving between the USB drives of PhD students, probably branching into slightly different versions, until someone is brave enough to start working on a new prototype and the cycle starts again.
With the increasing expectation that academic research results should be reproducible, there's really no such thing as throwaway "playground" or "prototype" code. Everything is potentially a "product".
Not only are software development life cycles different, but also the attitude to development. Which is not a surprise when in most cases, the output is not the software (or system), but research or learning; the code is merely a side effect.
Maybe you don't realize it, but the hidden assumption in your argument is once again "new/current is better." You say "mindset[s] change"; again, that just comes with developing software in general. Your mindset changes when you make a web app, a system tool (like a driver), or scientific code. I have a different mindset when I knead the 30-some-odd-year-old PIC code I have to manipulate every now and then than the few times I played around in js for personal projects, and a different one again when I write 6502 asm for another. The thing is, you're treating a "different mindset," one that is unfamiliar to someone new to a codebase, as an argument for why "inertia is an issue." There is an obvious alternative to "rewrite the code," and it is "re-learn the old code/design." The only reason I can imagine you opt for "rewrite the code" here is some hidden bias towards a modern mindset.
I understand how code bases change and warp, but it really is a push and pull between when you should abandon something and when you should keep it around. Moreover, the other alternative, learning to actually use old code and understand it in the mindset it was developed in, avoids the frankenstein-ization you refer to, because if people actually understood the old code, they could add to it in a way that meshes well with the existing code rather than it being a bolt-on. That said, I can understand if you inherit something that already has the bolt-ons, such that you're not really responsible for that, and that can be hairy, but I don't really feel that is something unique to computational science in the abstract. Bolt-ons are common across CS, I feel.
The main thing I am railing against, and have been for a long time, is the tendency for developers to place more emphasis on writing code than on reading code that already works. In particular, taking the time to understand so-called legacy code, learning to think in the way it was written, and then modifying said code in a way idiosyncratic to it. Unironically, we focus way too much on creativity in CS. That sounds a bit funny, but the fact that it does already demonstrates the reality of that mindset's (ironic) stranglehold on software in general. It's really funny, because creativity is actually not that important for the majority of users of computers, yet it is very much valued by developers because they develop computers for a living. On the other hand, something that works and is stable is something people don't even know they want; even better (or worse), they rely on it, or at least grow accustomed to it, given that they bitch and moan once the familiar is broken, often to fill some developer's need to chase the new and shiny.
This is a long comment, but there is one last thing I'll touch on: one of the places where I do agree somewhat is new technology. For example, people modeling laser-plasmas (where I hail from) have still not really adopted GPUs, even though that has been all the rage for years already, because the main tools (PIC and MHD) do not map well to GPUs and the algorithms were developed under the assumption of a large memory space accessible across a node. There are efforts being made now, but it's still considered hot shit for the most part. So, there is one place I'll grant you that it does require some willingness to "write more," so to speak, to be able to take advantage of new technologies. That said, "writing more" in this case still requires rather deep knowledge of the old codes, and particularly why they made the choices they did, in order to save yourself a few years of recreating the same mistakes they did (which, btw, is literally what I see whenever people do attempt that sort of thing today).
I strongly suspect there is a preference-oscillation in coding languages similar to the one found in broader culture between serif and sans-serif fonts. Can't quite pin down what it would be, though.
""Old" does not mean broken automatically, people seriously need to excise this assumption in software development because it simply isn't true."
My first language was Fortran. Whilst I have some lingering affection for the language and still use it occasionally, I also use other languages, Lisp for instance. Lisp is significantly different to Fortran, so I reckon my comments below come from a wider experience than Fortran alone.
First, there are things about Fortran that I don't like and have never liked, the main culprit being the obtuse formatting and I/O (I'd have altered it if I could!). I'm not alone: one of the reasons Kemeny and Kurtz invented BASIC was that in BASIC all you have to do to print is use a simple 'Print' statement. I also agree that Fortran hasn't fully kept up with the times, but that's understandable given its large and significant role in science and engineering (I shouldn't need to explain that). At least it's lost some of its more egregious features; for instance, the infamous computed 'GOTO' has long been declared obsolescent (and so on).
However, criticizing Fortran isn't the thrust of my argument here, which is the ongoing need to use existing Fortran programs without changing them. There are many, many thousands of programs (mathematical and scientific subroutines, etc.) that have been developed over the past 70 or so years and have shown themselves to be well debugged and highly reliable by the fact that they have been, and still are, used repeatedly to good effect in critical industries such as nuclear, space and civil engineering, to mention just a few.
Translating and/or rewriting these routines into more modern languages risks introducing errors and bugs, and whilst modern programmers would likely find the result aesthetically pleasing, that ought to be a secondary consideration, the primary one of course being the correct operation of the program they're working on. Even if all that code were translated successfully, we'd likely still have issues with different compilers interpreting the translated code in subtly different ways to the original Fortran ones [as I mention below, authenticating the translated code, and similarly certifying the new compilers to the necessary standards, would alone be a nightmare].
In short, we cannot guarantee the translated routines will do exactly the same thing or behave in exactly the same way as they did under a native Fortran compiler—at least not without one hell of an effort!
As mentioned, the key issue in keeping this library of Fortran routines operational is that it is huge, truly vast (nearly 70 years are tied up in developing Fortran code and libraries across a multitude of disparate industries and endeavors). Moreover, much of it was programmed by scientists and engineers who had a different attitude from today's programmers, in that their primary work involved arranging and rearranging atoms, which is much harder to redo than simply recompiling the source upon discovery of an error.
In short, engineers and scientists were used to designing hardware that worked properly and reliably the first time; bridges falling down and nuclear reactors melting down the moment they are commissioned would have serious consequences that today's programmers will never experience. Often, the first they know of a failure mode in their code is when it comes in from the field after the code was supposedly deemed to work. Thus, this 'work-first-time' exactness spilled over into early Fortran programs and has been the mainstay of much of the code written in the language since (it's a significant reason why so many of these programs have been so reliable). Remember, John Backus, Fortran's chief instigator, was trained as an engineer.
Ideally, it would be nice to see new intelligent compilers that could accurately compile all of Fortran's variants as well as other languages (Algol, COBOL, Ada, PL/I, C, etc.) in one package, as that would allow easy reuse of old 'solid' code. Of course, this will never happen, for a multitude of reasons, the first of which is that it would take years and years to fully standardize and authenticate any such compiler to full ANSI/ISO standards comparable to past Fortran ones (especially when multiple languages are involved). Then there's the problem of who would do the programming (I reckon most of today's programmers would hate such work; they'd see it as much worse than writing 'hated' hardware drivers, not to mention the need to maintain standards and endure rigid discipline across the length of the project, and that could be for quite some years).
That brings us back to where we started. As noobermin said, old software isn't broken, as is so often assumed; moreover, for the reasons stated, I contend that history has demonstrated it's often much more reliable than much of today's new code.
For these reasons, I reckon Fortran is going to be around for much longer than many of us care to imagine. History has shown it has staying power whether we like it or not.
The key advantage of Fortran is that it doesn't allow pointer aliasing, which allows the compiler to vectorize more. Couple that with the typical Fortran use cases, and with compiler vendors competing on performance by vectorizing more and more, and what you end up with is a language with a reputation for being faster for computation.
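As a rough illustration of the aliasing point (not taken from any real code base), consider a simple update loop. In C99 the no-aliasing promise is spelled restrict; in C++ most mainstream compilers accept the non-standard __restrict extension, which this sketch assumes.

```cpp
// Without a no-aliasing guarantee, the compiler must assume a write to y[i]
// could change values later read through x, so it vectorizes cautiously
// (e.g. behind runtime overlap checks) or not at all.
void axpy(double a, const double* x, double* y, int n) {
    for (int i = 0; i < n; ++i)
        y[i] += a * x[i];
}

// With the no-aliasing promise, the compiler may vectorize as if x and y never
// overlap, which is roughly the guarantee a Fortran compiler assumes for array
// dummy arguments by default.
void axpy_noalias(double a, const double* __restrict x, double* __restrict y, int n) {
    for (int i = 0; i < n; ++i)
        y[i] += a * x[i];
}
```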
When C99 introduced the restrict keyword this argument fell apart.
But I can assure you, the vast majority of legacy Fortran is not vectorizable. At least not without a bit of refactoring.
Oh, and did I mention, most of this legacy code was hand-optimized for memory utilization? You see, back in the day 8K of memory was cutting-edge HPC, and about half of that went to the OS and compiler. As we all know, everything is a balancing act between time and space; the old-timers traded time for space just to be able to do the computation at all on the hardware they had.
> When C99 introduced the restrict keyword this argument fell apart.
1) Nobody puts "restrict" everywhere, and in C it can be very dangerous to do so unless you're exceedingly careful.
2) The fact that few codebases use it means restrict support is riddled with compiler bugs even if you are exceedingly careful. Just look at how many times Rust has had to disable noalias due to LLVM bugs and regressions.
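To make point 1 concrete, here is a small hypothetical sketch (again using the __restrict compiler extension in C++): the qualifier is a promise the compiler trusts but never verifies, so one careless call site silently becomes undefined behaviour.

```cpp
#include <vector>

// The __restrict qualifiers promise that 'out' and 'in' never overlap; nothing
// checks that promise at compile time or at run time.
void scale_add(double* __restrict out, const double* __restrict in, double a, int n) {
    for (int i = 0; i < n; ++i)
        out[i] = a * in[i] + out[i];
}

int main() {
    std::vector<double> v(1000, 1.0);

    scale_add(v.data(), v.data() + 500, 2.0, 250);  // fine: the two ranges are disjoint
    scale_add(v.data(), v.data() + 100, 2.0, 250);  // undefined behaviour: the ranges
                                                    // overlap, the promise is broken,
                                                    // and the compiler says nothing
}
```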
Just because Fortran does the equivalent of implicitly using restrict everywhere does not mean the compiler actually statically checks for abuse. You can just as easily write bad pointer-aliasing code in Fortran and, depending on what optimization level you built with, you may or may not experience memory corruption and/or a segmentation fault at run time.
AIUI, the only language in common use that has comprehensive static checks for its equivalent of 'restrict' is Rust. Not coincidentally, there's quite a bit of interest in Rust adoption for numerics-heavy and scientific codes, the traditional preserve of FORTRAN.
You seem to be operating under the assumption that the Fortran compiler will catch pointer aliasing. In fact, major implementations of Fortran do not (for technical reasons).
So the standard just says "don't do it, LOL," but you are still able to write Fortran that violates this, which is how you end up with bugs that go away when you switch optimization levels.
You seem to be operating under the assumption that C developers know how to use restrict, when in practice they don't, so it hasn't made the argument fall apart.
In fact, only C++ is able to comfortably beat Fortran, despite not having official support for restrict, because of the stronger type system and compile-time metaprogramming that offer optimization opportunities out of reach to C compilers.
Usually HPC places like CERN and Fermilab don't migrate their aging Fortran code to C, but rather to C++.
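A minimal sketch of the kind of compile-time information meant here (hypothetical code, not anything CERN or Fermilab actually runs): when sizes live in the type system, the optimizer sees fixed trip counts and distinct objects, which a C compiler handed runtime sizes and raw pointers cannot assume.

```cpp
#include <array>
#include <cstddef>

// Dimensions are template parameters, so every instantiation has compile-time
// trip counts the optimizer can fully unroll and vectorize; std::array values
// are distinct objects, so there is no aliasing question either.
template <std::size_t Rows, std::size_t Cols>
std::array<double, Rows> matvec(const std::array<std::array<double, Cols>, Rows>& A,
                                const std::array<double, Cols>& x) {
    std::array<double, Rows> y{};
    for (std::size_t i = 0; i < Rows; ++i)
        for (std::size_t j = 0; j < Cols; ++j)
            y[i] += A[i][j] * x[j];
    return y;
}
// Usage: auto y = matvec(A, x);  // Rows and Cols are deduced from the argument types
```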
One of the people on my team has a PhD in weather forecast modelling, and only a few hours ago we were talking about the Met Office having a decade(s?)-old Fortran core with lots of stuff bolted on.
Was the conclusion that the decades-old Fortran is a good thing or a bad thing, and likewise the bolt-ons? Ideally, if given the chance, what's the consensus on what you'd all like to do?
It's really not. It is simply very expensive to build up experimental validation of simulation code rewritten in a new language, far more expensive than the aesthetic value it gives programmers who want new things. There's nothing magic about the old things; there's just an existing ecosystem that works, which would take a whole lot of effort to recreate, and most people think they have better things to do.
More likely, a rewrite doesn't look good on a quarterly spreadsheet, and, "shockingly," code designed by Johnny Two-Shoes in the 60s that uses COMMON blocks both for internal storage and as an API is incredibly difficult to test, and thus to incrementally modify. Bonus points for next to no documentation.
College software engineering textbooks that explain why global memory is generally a terrible design choice were written based on the lessons learned from legacy Fortran.
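For readers who never met COMMON blocks, here is a loose C++ analogue (entirely made-up names and physics) of why global storage doubling as the API is hard to test, and therefore hard to modify incrementally.

```cpp
#include <cassert>

// COMMON-block style: shared global state is both the internal storage and the
// "API". To test step() you must set up and inspect hidden globals, and tests
// can interfere with each other through them.
namespace legacy {
    double temperature[1000];
    double dt;

    void step() {
        for (int i = 0; i < 1000; ++i)
            temperature[i] += 0.5 * dt;
    }
}

// Explicit-argument style: the same update, but every input and output passes
// through the parameter list, so it can be unit tested in isolation.
void step(double* temperature, int n, double dt) {
    for (int i = 0; i < n; ++i)
        temperature[i] += 0.5 * dt;
}

int main() {
    double t[3] = {0.0, 1.0, 2.0};
    step(t, 3, 2.0);            // trivial to test: all state is local
    assert(t[0] == 1.0);
    return 0;
}
```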
The way I see it, Fortran exists today because of inertia, not because it still has technical superiority.
These mountains of Fortran code represent decades of investment and undocumented "features" that you could never code around during a rewrite.
And so today, new code is bolted on, with a prayer that nothing breaks, may god have mercy on your soul.