Research software code is likely to remain a tangled mess (coding-guidelines.com)
182 points by hacksilver 14 days ago | 165 comments

I don't really agree with the reasons given, even though my conclusions are the same. The main reason why research code becomes a tangled mess is the intrinsic nature of research. It is highly iterative work where assumptions keep being broken and reformed depending on what you are testing and working on at any given time. Moreover, you have no idea in advance where your experiments are going to take you, which gives you no opportunity to structure the code in advance so it is easy to change.

To give a concrete example, imagine writing an application where requirements changed unpredictably every day, and where the scope of those changes is unbounded.

I think the closest research code can get to "orderly" would be akin to enterprise-style coding, where literally everything is an interface and every implementation detail can be changed in every possible way. We already know how those codebases tend to end up.

As someone who has been on both the research and industry software end, there’s really not that much difference. Requirements change, you build that into your plans. Frankly, a lot of best practice software development that gets totally ignored by academia (e.g. OOP) can handle this exact case, and makes things way more flexible.

If the problem was only unpredictability, then projects with a clear and defined end goal (e.g., a website to host results) would be of substantially higher quality. But they’re not. Well-defined projects tend to end up basically just as crappy as exploratory projects.

The problem is evaluation and incentives. There’s literally no evaluation of software or software development capability in the industry. I know of a researcher that held a multimillion dollar informatics grant for 3 years. In that 3 years they literally did nothing except collect money. Usually there are grant update mechanisms and reports, but he BS’d his way through them knowing there’s a 0.0000000% chance that any granting agency is going to look through his code. The fraud was only found because he got fired for unrelated activities.

I once looked up older web projects on a grant. 4/6 were completely offline less than 2 years after their grants completed. For 2 of those 4, it’s unclear whether the site was ever finished in the first place.

>I know of a researcher that held a multimillion dollar informatics grant for 3 years. In that 3 years they literally did nothing except collect money.

I hate that every HN post about academia ends with an anecdote describing some rare edge-case they've heard about. Intentional academic fraud is a very small percentage of what happens in academia. Partly this is because it's so stupid: academia pays poorly compared to industry, requires years to establish a reputation, and the systems make it hard to extract funds in a way that would be beneficial to the fraudster (hell, I can barely get reimbursed for buying pizza for my students.) So you're going to do a huge amount of work qualifying to receive a grant, write a proposal, and your reward is a relatively mediocre salary for a little while before you shred your reputation. Also, where is your "collected money" going? If you hire a team, then you're paying them to do nothing and collude with you, and your own ability to extract personal wealth is limited.

A much more common situation is that a researcher burns out or just fails to deliver much. That's always a risk in the academic funding world, and it's why grant agencies rarely give out 5-10 year grants (even though sometimes they should) and why the bar for getting a grant is so high. The idea is to let researchers do actual work, rather than having teams manage them and argue about their productivity.

(Also long-term unfunded project maintenance is a big, big problem. It's basically a labor of love slash charitable contribution at that point.)

> I hate that every HN post about academia ends with an anecdote describing some rare edge-case they've heard about

This isn’t a rare edge case, this is very common in software projects. I’ve heard of it because I was part of the team brought in to fix the situation.

Intentional fraud is only rare when it’s recognized as fraud. P-hacking was incredibly widespread (and to some extent still is) because it wasn’t recognized as a form of fraud. Do you really think not delivering on a software project has any consequences? Who is going to go in and say what’s fraud, what’s incompetence, and what’s bad luck?

The problem is that the bar for getting software grants isn’t high, it’s nonsensical. As far as I can tell, the ability to produce or manage software development isn’t factored in at all. As with everything else, it’s judged on papers and the grant application. In some cases, having working software models and preexisting users ends up being detrimental to the process, since it shows less of a “need” for the money. You get “stars” in their field, who end up with massive grants and no idea of how to implement their proposals. Conversely, plenty of scientists who slave away on their own time on personal projects that hundreds of other scientists depend on get no funding whatsoever.

Just curious, what kind of 3-year informatics grant not being completed ends up with a team brought in to fix the situation? Multi-million dollar grants don't sound big enough to be a dependency for any major customer (like defense or pharma), so I imagine if fraud was detected, they would just demand a reimbursement and ban the PI.

But I think you're both right in some sense. Cases of intentional major fraud are probably rare edge cases, and they make the news when they're uncovered. But there's a lot of grey-ish area like p-hacking, as you mentioned, and funding agencies know there needs to be some flexibility in the proposed timeline due to practical realities: you don't necessarily get the perfect student for the project right when the grant starts, since the graduate student cycle is annual, and the research changes over time, so it isn't ideal to have students work to an exact plan as if they were employees.

But I totally agree that maintaining software that people are using should be funded and rewarded by the academic communities. A possible way to do this is to have a supplement so that after a grant is over, people whose software generated from the grant is used by at least 10 external parties without COI would be funded 100K/yr for however many years they are willing to maintain and improve it. The definitions of what this means need to be carefully constructed, of course.

I'll be a bit vague to protect my coworker's privacy, but the scientist was fired for other, unrelated violations, and my boss was brought in to replace him. I think he was leading an arm of a "U" grant, so he wasn't the only senior PI on it. Since they handled it internally, they couldn't just demand a reimbursement. On some level administration knew that the project wasn't moving forward, but once we started asking around, it was clear that there was no effort to start the project at all.

>But I totally agree that maintaining software that people are using should be funded and rewarded by the academic communities. A possible way to do this is to have a supplement so that after a grant is over, people whose software generated from the grant is used by at least 10 external parties without COI would be funded 100K/yr for however many years they are willing to maintain and improve it. The definitions of what this means need to be carefully constructed, of course.

I think that this is a great idea.

I can tell you why the sites went offline: the funding stopped. I don't know what your research background is, but it's painful to even get 5 GBP a month to host a droplet on DigitalOcean in a pretty lucrative department with liberal internal funding.

Agreed, but all these little things are just a sign that the industry just does not give a shit about software. They could develop mechanisms to fund this stuff, pretty easily actually. But they don’t.

A couple of other weird inequities that I’ve found:

1. It’s hard to get permission to spend money on subscription-based software licenses since you won’t “have anything” at the end. However, it’s much easier to get funding for hardware with time-based locks (e.g. after 3 years the system will lock up and you have to pay them to unlock it). The end result is the same, you can’t use the hardware after the time period is up, but for some reason the admin feels much more comfortable about it.

2. It’s hard to get funding to hire someone to set up a service to transfer large amounts of data from different places. It’s much easier to hire someone to drive out to a bunch of places with a stack of hard drives, manually load the data onto them, and drive back, even if it’s 2x more expensive and would take longer. Why? Again, my speculation is that the higher-ups are just more comfortable with the latter strategy. They can picture the work being done in their head, so they know what they’re paying for.

Louisiana state government spent a buttload of money on dedicated high speed fiber optic lines between a bunch of different universities in the state for videoconferencing, telenetworking, "grid computing" etc. 10 years later the only people who remember how to use the system are at LSU, rendering the purpose moot. Everyone else just uses Zoom or Skype.


> The end result is the same, you can’t use the hardware after the time period is up, but for some reason the admin feels much more comfortable about it.

Simple: predictability. With a subscription-based model, admin has to deal with recurring (monthly / yearly) payments, and the possibility is always there that whatever SaaS you choose gets bought up and discontinued. Something you own and host yourself, even if it becomes useless after three years, does not incur any administrative overhead, and there is no risk of the provider vanishing. Also, there are no "surprise auto renewals" or random price hikes.

> 2. It’s hard to get funding to hire someone to set up a service to transfer large amounts of data from different places.

Never underestimate the bandwidth of a 40-ton truck filled with SD cards. Joking aside: off-campus buildings especially have ... less than optimal Internet / fibre connections, and those that do exist are often under enough load to make it unwise to shuffle large amounts of data through them without disrupting ongoing operations.
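To put the truck joke (and the drive-around-with-hard-drives approach) into rough numbers, here is a back-of-the-envelope sketch. The drive count, drive size, travel time, link speed, and usable fraction of the link are all made-up illustrative assumptions, not figures from this thread.

```python
def effective_gbps(total_terabytes: float, hours: float) -> float:
    """Effective throughput in Gbit/s for physically moving `total_terabytes` in `hours`."""
    bits = total_terabytes * 1e12 * 8  # decimal terabytes -> bits
    seconds = hours * 3600
    return bits / seconds / 1e9


def network_hours(total_terabytes: float, link_gbps: float, utilisation: float = 1.0) -> float:
    """Hours to push the same data over a link used at a sustained fraction of its capacity."""
    bits = total_terabytes * 1e12 * 8
    return bits / (link_gbps * 1e9 * utilisation) / 3600


# Hypothetical scenario: 10 drives of 8 TB each and an 8-hour round trip by car,
# versus a 1 Gbit/s campus link that can only be used at 30% without disrupting other traffic.
drive_tb = 10 * 8
print(f"car:     {effective_gbps(drive_tb, 8):.1f} Gbit/s effective")   # car:     22.2 Gbit/s effective
print(f"network: {network_hours(drive_tb, 1.0, 0.3):.0f} hours")        # network: 593 hours
```

The sketch ignores the time to copy data onto and off the drives at each site, which can easily dominate in practice, so treat the car figure as an optimistic upper bound.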

Is N years of opex not part of the budget in grant applications?

In research, no, and it would depend entirely on your institution. For example, I looked at a job building a portal for people to freely examine the research produced by a research team. The project had secured a connection with the British Museum, so that website would live on under that. However, if the project had asked to host it themselves, even for $60 a year for 10 years, the answer would have been no. Funders see small opex that extends beyond the life of the project as open to corruption or just too trivial to fund, rightly or wrongly.

> As someone who has been on both the research and industry software end, there’s really not that much difference. Requirements change, you build that into your plans. Frankly, a lot of best practice software development that gets totally ignored by academia (e.g. OOP) can handle this exact case, and makes things way more flexible.

I've done both, and OOP can also make things worse. Now instead of just doing the calculations in a straightforward procedural fashion anyone who knows the research can understand, you've added a layer of structure to obfuscate it, and that structure may be harder to change if you guessed wrongly about what will be consistent and what won't. Research by its nature needs to be more flexible and will be more unpredictable than industry development. It is far more common to have to go back and reexamine even your most basic assumptions.

Of course a lot of researchers are doing the same things as industry (what should be described as development and not be getting research funding), and are certainly doing a much more amateur job of it.

Grant fraud is penalized severely in the US by the way. You can even get a bounty for reporting someone.

I work on a 12+ year academic (full stack Python) codebase, where there was an initial push for an OOP/DI architecture which was key to adapting to later grant requirements. The codebase is still evolving fine.
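For readers who haven't met the pattern, here is a minimal sketch of constructor-based dependency injection in Python: the analysis depends on an abstract data source, so a new requirement (say, a different input format) means adding a class rather than editing the analysis. All class names are hypothetical; the comment above does not describe that codebase's actual design.

```python
from typing import Protocol, Sequence


class DataSource(Protocol):
    """Anything that can supply a sequence of numeric measurements."""
    def load(self) -> Sequence[float]: ...


class CsvSource:
    """Hypothetical source that parses comma-separated values from a string."""
    def __init__(self, text: str) -> None:
        self.text = text

    def load(self) -> Sequence[float]:
        return [float(x) for x in self.text.split(",") if x.strip()]


class InMemorySource:
    """Hypothetical source backed by a list; handy for tests or a new upstream format."""
    def __init__(self, values: Sequence[float]) -> None:
        self.values = list(values)

    def load(self) -> Sequence[float]:
        return self.values


class MeanAnalysis:
    """The analysis is handed its data source instead of constructing one itself."""
    def __init__(self, source: DataSource) -> None:
        self.source = source

    def run(self) -> float:
        values = self.source.load()
        return sum(values) / len(values)


print(MeanAnalysis(CsvSource("1,2,3,4")).run())          # 2.5
print(MeanAnalysis(InMemorySource([10.0, 20.0])).run())  # 15.0
```

The trade-off raised elsewhere in the thread applies: the extra layer only pays off if the guessed seam (here, "where the data comes from") is the thing that actually changes.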

> I know of a researcher that held a multimillion dollar informatics grant for 3 years. In that 3 years they literally did nothing except collect money.

I wonder if a whistleblower payout similar to the one that SEC is doing for 1M+ fines (10-30%) would help in cases like this. The host organization would potentially be on the hook as well, so there is going to be a significant incentive to not let that happen (especially with all the associated reputational damage).

> The main reason why research code becomes a tangled mess is the intrinsic nature of research. It is highly iterative work where assumptions keep being broken and reformed depending on what you are testing and working on at any given time. Moreover, you have no idea in advance where your experiments are going to take you, which gives you no opportunity to structure the code in advance so it is easy to change.

I'd say you're confirming the author's theory that writing code is a low-status activity. Papers and citations are high-status, so papers are well refined after the research is "done". Code, however, is not. If the code was considered on the same level as the paper, I think people would refine their code more after they finish the iteration process.

Yes... and no. It is true that after a result is obtained, one could clean up the code for publication. And it is true that coding is not seen as a first-class activity at the moment.

At the same time, you need to consider that such a clean up is only realistically helpful for other people to check whether there are bugs in the original results, and not much else. Reproducing results can be done with ugly code, and future research efforts will not benefit from the clean up for the same reasons I outlined in my previous post.

While easing code review for other people is definitely helpful (it can still be done if one really wants to, and clean code does not guarantee that people will look at it anyway), overall the gains are smaller than what "standard" software engineers might assume. And I'm saying this as a researcher that always cleans up and publishes his own code (just because I want to mostly).

> At the same time, you need to consider that such a clean up is only realistically helpful for other people to check whether there are bugs in the original results, and not much else.

I assumed that most code published could be directly useful as an application or a library. Considering what you're saying, this might be only a minority of the code. In that case, I agree with your conclusion about smaller gains.

Most academic code runs once, on one collection of data, on a particular file system.

Academic code can be really bad. But most of the time it doesn't matter, unless they're building libraries, packages, or applications intended for others. That's when it hurts and shows.

I'm a research programmer. I have a master's in CS. I take programming seriously. I think academic programmers could benefit from better practice. But I think software developers make the mistake of thinking that just because academics use code the objective is the same or that best practices should be the same too. Yes, research code should perform tests, though that should mostly look like running code on dummy data and making sure the results look like you expect.
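A minimal sketch of that kind of test, assuming a hypothetical `fit_slope` analysis step: generate dummy data where the right answer is known by construction, run the code on it, and check that the result lands close to that answer.

```python
import random


def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs (hypothetical analysis step)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def test_fit_slope_recovers_known_slope():
    rng = random.Random(0)  # fixed seed so the check is reproducible
    true_slope = 2.5
    xs = [i / 10 for i in range(200)]
    # Dummy data with a known answer: a straight line plus small Gaussian noise.
    ys = [true_slope * x + 1.0 + rng.gauss(0, 0.1) for x in xs]
    estimate = fit_slope(xs, ys)
    assert abs(estimate - true_slope) < 0.05, estimate


test_fit_slope_recovers_known_slope()
print("ok")
```

No test framework is needed for this; the same function can later be collected by a runner such as pytest unchanged if the project grows one.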

I know a lot of "research programmers" (meaning people who write code in research labs but are not themselves the researchers or investigators on a study), and they often have MS degrees in CS - though actually, highly quantitative master's degrees where very elaborate code is used to generate answers are a bit more common than CS per se (math, operations research, branches of engineering, bioinformatics, etc.).

Here's the thing - in industry, this background (quant undergrad + MS, high programming ability, industry experience) is kind of the gold standard for data science jobs. In academic job ladders it's... hmm. And by the latest data, MS grads in these fields from top programs are starting at between 120k and 160k in industry, and there are very good opportunities for growth.

I actually think that universities and research centers can compete for highly in-demand workers in spite of lower salaries, but highly talented people in demand will not turn away an industry job with salary and advancement potential to remain in a dead-end job.

Yeah, my standard line about research code is that it is not the product, so it is OK that it is bad. The results are the product, and those need to be good. Someday someone will take those results (in the form of some data or a paper) and make a software product, and that should be good.

I am under the impression that most authors do not even publish functioning code when publishing ML/DL papers, which I find absurd. The paper is describing software. IMO the code is more important than the written word.

Shouldn't checking for bugs be of primary importance? How many times have impressive research results turned out to be a mirage built upon a pile of buggy code? I get the sense that this is far too common already.

> How many times have impressive research results turned out to be a mirage built upon a pile of buggy code?

You're actually making bugs sound like a feature here. I'm pretty sure that if you've gotten impressive results with ugly code, the last thing you want to do is touch the code. If you find a bug, you have no paper.

I think software quality in research has nothing to do with the problems themselves. It's more like the article suggests: nobody cares about your software. The only goal is to get published and be cited as many times as possible. Your coding mistakes don't matter as long as they can't be found out or hurt your reputation.

How many tests would be written for business software if it had only to run for one meeting and then never be looked at again?

There seems to be an underlying assumption in many of these posts that code has no value once papers are published. This hasn't been my experience working in a research environment at all. The big, complex pieces of code are almost always re-used in some way. For example, theory collaborators send us their code so we can generate predictions from their work without bothering them. Probably 50% or more (and usually the most important parts) of the code written to process experimental data ends up in other experiments. From the perspective of an individual experimentalist, there is tremendous value in creating quality code that can be easily repurposed for future tasks. This core code tends to follow the individual in their career. In some ways it's an extension of commonly used mental tools, and there are diverse incentives to maintain it.

> To give a concrete example, imagine writing an application where requirements changed unpredictably every day, and where the scope of those changes is unbounded.

I don't have to imagine it, I'm employed in the software industry.

Seriously, nothing you describe sounds any different from normal software development.

The only difference is speed IMO. Sure, new requirements appear and they can wildly change the underlying assumptions of the systems - but usually, in such cases we're given months or years to adapt/rewrite the system in a systematic manner. If this amount of rigor were applied to the implementation of every wild idea the researcher wants to explore, I'm guessing the research would slow down immensely. BTW, most research code is written for chasing dead ends (quickly testing some small hypotheses) and will be discarded without being shared with anyone - so investing in writing it properly seems especially wasteful.

Rapid fundamental changes and short-lived code to explore an idea that will most likely be thrown away are very much the everyday development experience in industry too, IME.

The program I wrote for my dissertation is as good as it needs to be for a program that had to run once!

In my world, it does sound different: I work with HIPAA data that takes months to get access to. So sharing your code is borderline unacceptable to some orgs; even if the code itself doesn't contain any private data, there's mass paranoia that you'll accidentally leak patient data, which can lead to fines of 2 million USD.

> The main reason why research code becomes a tangled mess is the intrinsic nature of research. It is highly iterative work where assumptions keep being broken and reformed depending on what you are testing and working on at any given time.

Oh, boy, how many times have I heard this working at a startup. There is some truth to it, it's hard to organise code in the first weeks of a new project. But if you work on something for 3+ months, it becomes a matter of making a conscious effort to clean things up.

> To give a concrete example, imagine writing an application where requirements changed unpredictably every day,

Welcome to working with product managers at any early stage-company. Somehow I managed to apply TDD and good practices most of the time. Moreover, I went back to school after 7+ years developing software full-time. I guarantee that most of the low-quality research code is a result of a lack of discipline and experience in writing maintainable software.

> I guarantee that most of the low-quality research code is a result of a lack of discipline and experience in writing maintainable software.

Bingo! Most research code is written by graduate students who never had a job before, so they do not know how to write maintainable software. You are definitely the exception, as you held a software dev job before going back to school.

Some researchers from top-10 schools still publish python2 code in 2020. I don't have an explanation for that. It's not even a lack of experience, but something on another level.

Mathematics doesn't suddenly stop working because your interpreter is a bit old.

>To give a concrete example, imagine writing an application where requirements changed unpredictably every day, and where the scope of those changes is unbounded.

That sounds like software development, alright. It takes a while for domain experts to learn that if a programmer asks "is X always true/false", they mean that there are no exceptions to that rule.

I would like for researchers to just name variables sensibly. Even that would improve code quality a lot.

Still the key problem is that there are zero incentives for researchers to even make their code readable! It does not improve any of the metrics they are judged by.

Yes, failing to point out the difference between coding some novel technique and a well-defined software project completely misses the reason the code is often not well organized. Suggesting that researchers are bad programmers is just a lazy excuse, somewhat damaging, and by no means the rule. I wrote a large, complex framework for my research, and its very nature causes me to add modules and techniques for parts I didn't know would work, and at times to make hard forks when I wanted to try something new, which would be impossible to merge back cleanly. At times you have a hunch and, like in a fever dream, change who knows what, but you just have to see something through. There is no waterfall method, kanban and agile make no sense here, and even unit tests are ill-defined.

This sounds like my software development methodology when I was in my early teens. I was certainly able to get things done and explore all kinds of things (I was doing game dev of course), but the code was a mess and I didn't even have a mature understanding that it was. I just thought that was how programming was and you just had to be really smart to keep things straight in your head.

Do you think that people doing research at large technical organizations structure their code in the same way as academics? No, although there's always a portion which is active and unstable, they create packages, define interfaces, abstract out pieces which can be reused reliably and depended on. Similarly for other types of researchers in fields where the code is considered an important product. Eg. if you are doing research in compiler design, you're likely to want to create a compiler which can be used by other people. So you make a stable thing with tests, automated builds and so on. And you delimit and instrument the experimental parts.

The real reason is the incentives. Not only are there no incentives to produce good-quality code, there are incentives which make people focus on other outputs. Publish or perish means that people put up with technical debt just to get to the next result for the next paper, then do it again and again.

>The real reason is the incentives. Not only are there no incentives to produce good-quality code, there are incentives which make people focus on other outputs. Publish or perish means that people put up with technical debt just to get to the next result for the next paper, then do it again and again.

I believe this is true and is fueled by a misconception of what software is in research. Software in research is often akin to experimentalist work in the past. It's tacked onto theoretical work projects as an afterthought and not treated as what it really is: forcing the theory to be tested in a computational environment.

If we start treating research software like experimentalism in the past, we might get a bit more rigor out of the development process as well as the respect it really deserves.

I've worked with a lot of research code. I agree with you that tangled code is somewhat intrinsic to the kind of code written for research.

Here's the thing. Sometimes, there's no code - I mean, they'll find something, but nobody can say, with certainty, that it is the code that generated the data or results you're trying to recreate. There's often no data - and by that, I mean, nothing, not even a dummy file so you can tell if it even runs or understand what structure the data needs to be in. No build, no archive history, no tests. And when I say no tests, I'm not talking about red/green bar integration and unit tests, I mean, ok, the code ran... was this what it was supposed to produce?

Many of these projects are far, far more messed up than the intrinsic nature of research would explain - though I will again agree that research code may be unusually likely to descend into entropy.

There certainly is quite a lot to be said about constant requirements drift. However, this is not atypical of fast-paced product work or, even more closely, of R&D efforts within industry.

What then drives the improvement of code quality is the potential need for continuity and knowledge retention - either through iterative paydown of the debt or a rewrite. This depends on the perceived value to the organisation. From this perspective it's more straightforward to arrive at the author's reasons.

> imagine writing an application where requirements changed unpredictably every day


There's only one way to solve this: Simplicity.

Ironically, this is also what Occam's razor would demand from good science, so you'd have a win-win scenario where you create both good software and good research, because you focus on the simplest, most minimal approach that could possibly work.

How do you keep a codebase simple when you need to have things in it like implementations of state-of-the-art algorithms to compare against and the previous iterations of your own method, so that you can test whether you're actually improving? Then, depending on what you're doing, there's also all the extra nontrivial code for tests and sanity checks of all these implementations.

Simplicity is a nice dream. The realities of research are very often stacked against it.
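One common way to keep that manageable, sketched here under assumptions (the `METHODS` registry, the `register` decorator, and the toy estimators are all hypothetical, not anyone's actual codebase): put every baseline and every iteration of your own method behind one shared signature, and run them all through the same evaluation function so comparisons stay like-for-like.

```python
from typing import Callable, Dict, List

# Every method maps a list of numbers to a single estimate: baselines and your own versions alike.
Method = Callable[[List[float]], float]
METHODS: Dict[str, Method] = {}


def register(name: str) -> Callable[[Method], Method]:
    """Decorator that adds a method variant to the shared registry under `name`."""
    def wrap(fn: Method) -> Method:
        METHODS[name] = fn
        return fn
    return wrap


@register("baseline_mean")
def baseline_mean(xs: List[float]) -> float:
    return sum(xs) / len(xs)


@register("ours_v1_median")
def ours_v1(xs: List[float]) -> float:
    s = sorted(xs)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2


@register("ours_v2_trimmed")
def ours_v2(xs: List[float]) -> float:
    s = sorted(xs)[1:-1]  # drop the single smallest and single largest value
    return sum(s) / len(s)


def evaluate(data: List[float], truth: float) -> Dict[str, float]:
    """Absolute error of every registered method on the same data."""
    return {name: abs(fn(data) - truth) for name, fn in METHODS.items()}


# Toy data: true value 5.0 with one gross outlier.
for name, err in sorted(evaluate([4.9, 5.1, 5.0, 4.8, 5.2, 50.0], truth=5.0).items()):
    print(f"{name:16s} {err:.3f}")
```

On this toy data both robust variants tie and both beat the plain mean; a real comparison would feed many datasets and random seeds through `evaluate` rather than one hand-picked list.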

How the heck do you hope to gain any insightful metrics when you've got a cobbled-together mess that you only half understand? For all you know, you might only be benchmarking random code layout fluctuations.
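One cheap guard against mistaking noise for an improvement, sketched with Python's standard `timeit` and `statistics` modules: repeat each measurement and compare spreads rather than single timings. The `looks_faster` rule is a deliberately crude, hypothetical heuristic rather than a proper statistical test, and repetition only exposes run-to-run noise; it does not randomize code layout, which stays fixed for a given build.

```python
import statistics
import timeit


def summarize(stmt: str, setup: str = "pass", repeat: int = 7, number: int = 1000) -> dict:
    """Time `stmt` several times and report per-call min, median and spread."""
    times = timeit.repeat(stmt, setup=setup, repeat=repeat, number=number)
    per_call = [t / number for t in times]
    return {
        "min": min(per_call),
        "median": statistics.median(per_call),
        "stdev": statistics.stdev(per_call),
    }


def looks_faster(a: dict, b: dict) -> bool:
    """Crude check: treat `a` as faster only if the medians differ by more than both spreads."""
    return a["median"] + a["stdev"] < b["median"] - b["stdev"]


list_comp = summarize("[i * i for i in range(1000)]", number=200)
map_lambda = summarize("list(map(lambda i: i * i, range(1000)))", number=200)
print(list_comp["median"], map_lambda["median"], looks_faster(list_comp, map_lambda))
```

If two variants do not clear even this rough bar, a reported speed-up is more likely noise than a real improvement.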

I've seen research groups drown in their legacy code base.

The problem of juggling too many balls that you describe only exists because the state-of-the-art implementations are so shoddy to begin with.

Research suffers as much as everybody else from feature creep. Good experiments keep the number of new variables low.

Research code is not only written to measure runtime. Reducing the argument to only that aspect is not helping the discussion.

And you say it yourself: good experiments change a single variable at a time. So how do you check that a series of potential improvements that you are making is sound?

> good experiments change a single variable at a time

Although this is a tangent from the above conversation, this isn't actually true: well-designed experiments can indeed change multiple variables at the same time. There's an entire field of statistics dedicated to experimental design (google "factorial designs" for more information). One-factor-at-a-time (OFAT) experiments are often the least efficient method of running experiments, although they are conceptually simple.
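To make the contrast with one-factor-at-a-time concrete, here is a small sketch of a two-level full factorial design (2^3 = 8 runs) and the standard main-effect estimate: for each factor, the mean response at its high level minus the mean response at its low level. The factor names and the response function are a made-up toy with known effects so the output can be checked; neither comes from the linked article.

```python
from itertools import product
from statistics import mean

FACTORS = ["temperature", "pressure", "catalyst"]  # hypothetical factor names


def response(levels):
    """Toy process with known effects, using -1/+1 coding: coefficients 3, -2 and 0.5."""
    t, p, c = levels
    return 10 + 3 * t - 2 * p + 0.5 * c


# All 2^3 combinations of low (-1) and high (+1) levels; each run varies several factors at once.
design = list(product([-1, 1], repeat=len(FACTORS)))
results = [response(run) for run in design]


def main_effect(i):
    """Mean response at factor i's high level minus mean response at its low level."""
    high = [y for run, y in zip(design, results) if run[i] == 1]
    low = [y for run, y in zip(design, results) if run[i] == -1]
    return mean(high) - mean(low)


for i, name in enumerate(FACTORS):
    print(f"{name:12s} effect = {main_effect(i):+.2f}")  # +6.00, -4.00 and +1.00 respectively
```

Every one of the 8 runs contributes to every effect estimate (4 runs at each level), which is where the efficiency over varying one factor at a time comes from; the same design can also be used to estimate interactions.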

See the following article for a discussion: http://www.engr.mun.ca/~llye/czitrom.pdf

I can't quite follow what the article is trying to describe because of the heavy use of analogies.

A Google search makes it look like Julia has a mechanism where you can extend the set of overloads of a function outside the original module. The terminology is different (in Julia's terms, functions have methods rather than overloads). I don't see how that feature solves the problem in practice.
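Julia's multiple dispatch has no exact Python equivalent, but `functools.singledispatch` is a rough single-argument analogue of the mechanism described: code outside the module that defined a generic function can register implementations for its own types without editing the original. A minimal sketch; the `describe` function and `Spectrum` class are hypothetical.

```python
from dataclasses import dataclass
from functools import singledispatch


# --- "library" module: a generic function with a fallback and one specialisation ---
@singledispatch
def describe(obj) -> str:
    return f"object of type {type(obj).__name__}"


@describe.register
def _(obj: list) -> str:
    return f"list of {len(obj)} items"


# --- "user" module: a new type, plus an extension of `describe` that never touches the library ---
@dataclass
class Spectrum:
    wavelengths: list


@describe.register
def _(obj: Spectrum) -> str:
    return f"spectrum with {len(obj.wavelengths)} wavelengths"


print(describe(3.0))                        # object of type float
print(describe([1, 2]))                     # list of 2 items
print(describe(Spectrum([400, 500, 600])))  # spectrum with 3 wavelengths
```

Unlike Julia, this dispatches only on the type of the first argument, so it illustrates the mechanism rather than settling whether it solves the problem in practice.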

In my experience, simplicity and generality don't go well with performance. If you want to build something that is simple and can be used for all kinds of problems, it will be slow as hell compared to (dirty) optimised code running hardcoded structures on the GPU.

Simplicity pretty much excludes generality in a lot of cases, you're only able to port code to the GPU if it wasn't a million LOC to begin with, so you're pretty much making the case for it.

Note that Simple != Easy or Naive

Hardcoded structures is potentially exactly the kind of simplicity needed.

What's not simple is a general "this solves everything and beyond" code-base with every imaginable feature and legacy capability.

>It is highly iterative work where assumptions keep being broken and reformed depending on what you are testing and working on at any given time.

This is describing infinitely fast and efficient p-hacking (i.e. research that is likely to produce invalid results).

If your assumptions are broken then that should ideally be reported as part of your research.

When you do research, you ideally start out with fixed assumptions, and then test those assumptions. The code required to do this can be buggy (and can therefore get fixed), and you can re-purpose earlier code, but the assumptions/brief shouldn't change in the middle of coding it up.

If you aren't following the original brief, you've rejected your original research concept and you're now doing a different piece of research from the one you started out with - and this is no longer a sound piece of research.

Research should be highly dissimilar to a web design project in this respect.

The reason these projects often become a tangled mess is that researchers don't have the coding skill to program any other way (in my opinion; nor do institutions invest sufficiently in people who do have this skill).

I'm currently refactoring a fairly large piece of research code myself. It was written with lean startup thinking, in that a little code ought to produce some value in its results. If I was able to eke some usefulness out of this code, then I'd put more energy into it. Otherwise I was perfectly happy to Fail Fast and Fail Cheap.

How did it become such a mess in the first place? Simple - I didn't know my requirements when I started writing it. I built it to do one thing. In running it I learned more things (this is good - why you build stuff like this in the first place). The code changed rapidly to accommodate these lessons.

It wasn't long before I was running into limitations in the design of the underlying libs I was using etc. Of course I could find a way to make it work but it wasn't going to win any Software Design Awards.

I'm happy to report that despite ending up a tangled mess, it actually helped me to come to understand and conquer a very specific kind of problem. In doing so I learned the limitations of commercially available tooling, the limitations of commercially available data, not to mention a great deal about the problem domain itself.

This research software has earned its keep and is now being cleaned up into a more organized, near commercial quality kind of project. I'm glad I threw out "architecture" when I first started with this. It could have gone the other way, where I had a very well built piece of code that didn't in fact perform any useful function.

I believe that of all the lessons to come from contemporary software development, constant refactoring may be the most valuable.

The spaghetti monster looms large when you're in the heat of battle. But we've all got some idle time for whatever reason. I spend some time every week doing a couple of things: 1) Reading about good techniques. 2) Working through old code and cleaning it up.

Because changing your code could always break it, refactoring also reinforces the habit of writing code that can be readily tested -- also a good thing.

That was my exact thought reading this.

I used to write some crazy spaghetti code as an untrained student working in a lab. Coding would go really quickly at first, but as I kept adding on to accommodate new requirements it became a huge kludgy mess.

Recently (after quite a few years of software engineering experience) I helped a researcher friend to build some software. He was following along with my commits and asked why I kept changing the organization and naming of the code, pulling things out into classes, deleting stuff that he thought might be needed later, etc. He spends only a small part of his time writing code, so he's never realized how much time it actually saves to keep things organized and well-factored.

Makes you wonder why most languages don't come with good refactoring tools.

Safe automatic refactoring requires the ability to do static analysis of the code. Many refactorings are harder in loosely-typed languages.

Refactoring tools were invented in Smalltalk and worked just fine.

Smalltalk is dynamically typed, but I wouldn't call it loosely typed. It's closer to Ruby's "duck typing". I was thinking more of Javascript and PHP. Weak and loose.

This has always surprised me, since I learned it.

What are the features of Smalltalk that allowed this to happen? Conversely, what is stopping this from existing in more modern dynamic languages?

Smalltalk has simple and strong reflective features. Moreover, it makes no distinction between the program being developed and the IDE. This means that doing things like that is very natural and well established in Smalltalk culture.

Indeed. Having the whole system in front of you, and knowing how patterns like MVC or Thing-Model-View-Editor encapsulate parts of it, makes it very easy to "reason about" wholesale changes to the system.

I should have added that I am probably abusing the term refactoring if it has a precise definition. What I'm talking about is improving the readability of my code, but also improving its structure. Today, "spaghetti" probably doesn't refer to a tangled mess of code sequences, because we've gotten rid of the GOTO, but rather to tangled interactions between modules, many of which are vestigial.

A lot of my code interacts with hardware configurations that will cease to exist when a project is done, but I mainly look at the stuff that's potentially reusable, and making it worth re-using.

I'm using Python, and there are a lot of tools for enforcing coding styles and flagging potential errors. I try to remove all of the red and yellow before closing any program file. I don't trust myself with too much automation! "Walk before you run."

I don't think refactoring tools are that useful for refactoring: most of the time you are doing non-obvious refactoring that tools can't help with anyway. It depends on what we call "refactoring"; tools are mostly useful for what I would call "housekeeping".

Well, every refactoring can be seen as a series of correctness-preserving housekeeping operations.

At least in the languages I work primarily in (JS and Java), I find my IDE to be pretty good at analyzing a lot of it.

Refactoring is kind of subjective, because there is rarely One Right Way to solve a problem, and you need context, so I could see why it’s not something that languages themselves take strong opinions on.

I like Brooks' "plan to throw one away; you will, anyhow.":

This [first] system acts as a "pilot plan" that reveals techniques that will subsequently cause a complete redesign of the system.

However, in practice I'm not confident enough in my understanding, and fear losing all that hard-won work, so I refactor too.

A rewrite from scratch is probably more viable when the project is small enough to keep in your head at once.

We are talking about research code here. And from many of the comments, it seems like the biggest hurdle is not understanding requirements the first time around. This is incredibly common.

I work in computational biology, and my normal thought process is that by default, you should expect to write the code three times (especially for less experienced developers).

The first time, you don’t know the problem.

The second time you’ve figured out the problem, but don’t know the best way to do it.

The third time, you’ve figured out the problem and a decent strategy to solve the problem.

With more experience, you can narrow that to just two iterations. But really, especially with research, you rarely have a good feel for the problem domain the first time around. And when you have the expectation that you’re going to throw the code away, you don’t get quite as hung up on implementation details for the first two rounds. And because of that, the process is easier. And you don’t have to worry about refactoring bad code. Just accept the first round as an experiment and take what you’ve learned about the problem to write it better the next time.

Brooks has since amended this[1] to say that he really meant it in the context of traditional "waterfall" development, where the first iteration is meticulously planned and designed as a whole system before any code is written at all.

Rapid, iterative prototyping, followed by refactoring, is a perfectly reasonable approach today. No need to create a fresh repository and rewrite all code from scratch.

David Heinemeier Hansson, creator of Rails and a big advocate of building working code as early as possible, wasn't even born in 1975 when The Mythical Man-Month was written. Linus Torvalds was a (presumably) plucky 6-year-old. Brooks wrote that book for an audience that would have known waterfall as the only way.

[1] https://wiki.c2.com/?PlanToThrowOneAway

wikiwiki seems to be having trouble moving to github. Here's wayback https://web.archive.org/web/20190821162524/https://wiki.c2.c... And the bit you referenced:

  "This I now perceived to be wrong, not because it is too radical, but because it is too simplistic. The biggest mistake in the 'Build one to throw away' concept is that it implicitly assumes the classical sequential or waterfall model of software construction."

You don't have to throw everything away. Like reusing libraries, you can reuse your modules, or whatever code you like.

It's the part that is a tangled Gordian Knot that is easier to cut than meticulously unravel.

"Starting with a clean slate" is a common idea in many contexts. One very similar to code is writing. You can hack and edit, but a fresh draft is easier (and safer) for fundamental conceptual and structural changes.

BTW "Waterfall" was a parody of processes, though the conceptual aspects (requirements vs specifications etc) are useful. People then were just as intelligent as today. Maybe more.

I don't see waterfall as a product of limited intelligence but rather limited tools and history. It's natural that early software engineers would generalize processes from mechanical and chemical engineering, especially given the cost of iteration with early software tools.

Good architecture is pretty much just about slicing things up so that rewrites / refactors can happen incrementally, rather than all at once. This can actually go both bottom-up (these functions are easy to re-arrange and don't need a rewrite) and top-down (these functions suck, but I don't have to rearrange anything to replace them).

"Good architecture is the one that allows you to change."

What if you slice it up wrong?

Then you're gonna have a shit time of it, and may need to do a total rewrite, instead of an incremental one.

Well, as somebody who has written research software, I don't agree that research software is a "tangled mess". A couple of points:

1. often when I read software written by professional programmers, I find it very hard to read because it is too abstract: almost every time I try to figure out how something works, it turns out I need to learn a new framework and API. By contrast, research code tends to be very self-contained

2. when I first wrote research software, I applied all the programming best practices and was told these weren't any good. It turns out that using lots of abstraction to increase modularity makes the code much slower, though this is language dependent, of course

3. you will find it much harder to read research code if you don't understand the math+science behind it

> many of those writing software know very little about how to do it

This is just not true. I found in my experience that people writing research software have a very specific skillset that very very few industry programmers are likely to have. They know how to write good numerics code, and they know how to write fast code for super computers. Not to mention, interpreting the numerics theory correctly in the first place is not a trivial matter either.

Quite a few professional programmers evaluate the quality of code by "look": presence of tests, variable name length, function length, etc. However, what makes great code is really the code structure and the logical flow behind it. In my experience, good industrial programmers are as rare as good academic programmers. Many industrial programmers make a fuss about coding styles but are not really good at organizing structured code for a medium-sized project.

As someone who's worked for a large part of my career as a sort of bridge between academia and industry (working with researchers to implement algorithms in production), both you and the original author are right to an extent.

On one hand, academics I've worked with absolutely undervalue good software engineering practices and the value of experience. They tend to come at professional code from the perspective of "I'm smart, and this abstraction confuses me, so the abstraction must be bad", when really there's good reason to it. Meanwhile they look at their thousands of lines of unstructured code, and the individual bits make sense so it seems good, but it's completely untestable and unmaintainable.

On the other side, a lot of the smartest software engineers I've known have a terrible tendency to over-engineer things. Coming up with clever designs is a fun engineering problem, but then you end up with a system that's too difficult to debug when something goes wrong, and that abstracts the wrong things when the requirements slightly change. And when it comes to scientific software, they want to abstract away mathematical details that don't come as easily to them, but then find that they can't rely on their abstractions in practice because the implementation is buried under so many levels of abstraction that they can't streamline the algorithm implementation to an acceptable performance standard.

If you really want to learn about how to properly marry good software engineering practice with performant numerical routines, I've found the 3D gaming industry to be the most inspirational, though I'd never want to work in it myself. They do some really incredible stuff with millions of lines of code, but I can imagine a lot of my former academia colleagues scoffing at the idea that a bunch of gaming nerds could do something better than they can.

> a lot of the smartest software engineers I've known have a terrible tendency to over-engineer things.

Your definition of "smartest software engineers" is the opposite of mine. In my view, over-engineering is the symptom of dumb programmers. The best programmers simplify complex problems; they don't complicate simple problems.

I don't know that our definitions are that different. Most of the over-engineering I've seen in practice was done in the name of simplifying a complex problem, but resulted in a system that was too rigid to adapt. Our definition of "over-engineered" might be different, though.

I work on mathematical modeling, dealing with human physiology. Likewise, the software packages used can be esoteric, and the structure of your "code" can be very different looking, to say the least.

This is certainly a lot of work, and it takes a lot of practice to do efficiently. But regardless, I comment every single line of code, no matter how mundane it is. I also cite my sources in the comments themselves, and I also have a bibliography at the bottom of my code.

I organize my code in general with sections and chapters, like a book. I always give an overview for each section and chapter. I make sure that my comments make sense to a novice reading them line by line.

I do not know why I do this. I guess it makes me feel like my code is more meaningful. Of course it makes it easier to come back to things and to reuse old code. I also want people to follow my thought process. But, ultimately, I guess I want people to learn how to do what I have done.
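A sketch of that "book-like" layout, applied to a deliberately simple example: the standard one-compartment model for an intravenous bolus dose, C(t) = (D/V)·exp(-kt). The parameter values are made up for illustration, and the bibliography entry is a placeholder, not a real citation.

```python
"""Chapter-structured model file (illustrative layout only).

Chapter 1: Model definition
Chapter 2: Simulation

Bibliography
  [1] <placeholder for the source of the equations>
"""
import math

# ============================================================================
# Chapter 1: Model definition
# Overview: after an intravenous bolus dose, drug concentration in a single
# well-mixed compartment decays exponentially:
#     C(t) = (dose / volume) * exp(-k * t)          [1]
# ============================================================================

# --- Section 1.1: Parameters (hypothetical values) ---
DOSE_MG = 100.0   # administered dose, milligrams
VOLUME_L = 50.0   # volume of distribution, litres
K_PER_H = 0.1     # first-order elimination rate constant, per hour


# --- Section 1.2: Concentration and half-life ---
def concentration(t_h: float) -> float:
    """Concentration in mg/L at t_h hours after the dose."""
    c0 = DOSE_MG / VOLUME_L                 # initial concentration at t = 0
    return c0 * math.exp(-K_PER_H * t_h)    # exponential decay, see [1]


def half_life_h() -> float:
    """Time for the concentration to halve: ln(2) / k."""
    return math.log(2) / K_PER_H


# ============================================================================
# Chapter 2: Simulation
# Overview: tabulate the concentration every 6 hours over one day.
# ============================================================================

# --- Section 2.1: Time grid and output table ---
def simulate(step_h: int = 6, end_h: int = 24) -> list[tuple[int, float]]:
    """Return (time in hours, concentration in mg/L) pairs."""
    return [(t, concentration(t)) for t in range(0, end_h + 1, step_h)]


if __name__ == "__main__":
    for t, c in simulate():
        print(f"t = {t:2d} h   C = {c:.3f} mg/L")
```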

"Esoteric software used for mathematical models of physiology" brought back a strong memory of the xpp software we had to use as undergrads. Apparently it was the best tool available for graphing bifurcations in nonlinear systems... but damn that was some old software.

Writing long descriptions in comments works if you're the only one editing the code, or you supervise all contributions... in a fast-changing industrial codebase, those things go out of date very quickly, so comments are used more sparsely. I document the usage of any classes or functions that my package exports, and I'll write little inline comments explaining lines of code whose purpose or effect is not obvious. Mostly I just try to organize things sensibly and choose descriptive names for variables and functions.

Your points apply to industry, too. I heretically push flatter code all the time. I'm not against abstraction, but it is easy to fall into the trap of building a solution machine, but missing the solution you need.

Point 1 is so true. I think it’s why I like Golang without generics: people can’t go crazy with abstractions.

Disclaimer: I am one of the trustees of the mentioned charity, The Society of Research Software Engineering.

You say that you don't see it having much "difference with regard status and salary". The problem here is two-fold. Firstly, salaries at UK universities are set on a band structure and so an RSE will earn a comparable amount to a postdoc or lecturer. These aren't positions that are known for high wages and historically the reason that people work in research is not for a higher salary.

As for status, I can see that the creation of the Research Software Engineer title (since about 2012) has done great good for improving the status of people with those skills. Before they were "just" postdocs with not many papers but now they can focus on doing what they do best and have career paths which recognise their skills.

My role (at the University of Bristol - https://www.bristol.ac.uk/acrc/research-software-engineering...) is focused almost entirely on teaching. I'm not trying to create a new band of specialists who would identify as RSEs but rather provide technical competency for people working in research so that the code they write is better.

There is a spectrum of RSEs, from primarily research-focused postdocs who write code to support their work through to full-time RSEs whose job is to support others with their research (almost a contractor-type model). We need to have impact all the way along that spectrum, from training at one end to careers and status at the other.

For more info on the history of the role, there's a great article at https://www.software.ac.uk/blog/2016-08-17-not-so-brief-hist... written by one of the founding members of the Society of Research Software Engineering.

> writing software is a low status academic activity

Yep, that's the one liner right there.

The incentives simply do not match the complaints. Researchers already work upwards of 60 hrs/wk on most occasions. Alongside writing code, they also have to do actual research, write papers, give talks and write grants.

All of the latter tasks are primary aspects of their jobs and are commensurately rewarded. The only situation where a well-coded tool is rewarded is when a package blows up, which is quite rare.

Like all fields, the high-level answer to such questions is rather straightforward. The individual contributors align their efforts to the incentives. Find a way to incentivize good research code, and we will see changes overnight.

This article seems to cover research software that can even be built. I claim the majority of _code_ written to support research articles is a collection of scripts written to produce figures to put in the paper. Even when the article is about an algorithm, the script that runs this algorithm is just good enough to produce the theoretically expected results; it is never tested, reproduced, or published, never mind being updated after publication.

While others here point out that researchers = bad programmers is a lazy excuse, I think it is important to point out just how steep the learning curve of computer environments can be for the layperson that uses Excel or MATLAB for all their computational work. It can be a huge time investment to get started with tools, such as git or Docker, that we take for granted. I think recognizing this dearth of computer skills is a first step towards training researchers to be computer-competent. Currently, I find the attitude among academics (especially theorists) to be dismissive of the importance of such competencies.

I am a research scientist who publishes analyses done in R, Stata, and Excel. My code documents wouldn't be helpful since the data is all locked up due to HIPAA concerns. We're talking names, health conditions, scrambled SSNs; this isn't reproducible because the data is locked away from anyone without security clearance.

The code itself is a ton of munging and then some basic stat functions. This information can be gleaned from the methods section of the article anyway.

So, really, my field of public health doesn't use GitHub or sharing much, there's simply too little benefit to the researcher to share their code.

There's an unwarranted fear of getting your work poached. In modern science, publications are everything: they determine your career. Enabling your direct competitors, those who want the same grants and students and glories, is not common in science.

In well designed software, data ingestion should be easily separable from the core logic of the application. Which is the point the parent comment is making. Some basic best practices would allow you to share your core code without implicating HIPAA. Even if it’s just basic stats, sharing the code makes it easier to reproduce your results and to check your logic.
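A minimal sketch of that separation, written in Python for illustration (the work described above used R, Stata, and Excel). The column names `age` and `outcome` and all function names are hypothetical. Only `load_records` touches the protected file; `clean` and `summarize` are pure functions that could be published and exercised on synthetic rows.

```python
import csv
import statistics
from typing import Iterable, Mapping


def load_records(path: str) -> list[dict[str, str]]:
    """Ingestion: the only function that touches the protected raw file."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def clean(rows: Iterable[Mapping[str, str]]) -> list[dict[str, float]]:
    """Munging: keep the analysis columns and drop incomplete rows."""
    out = []
    for row in rows:
        age, outcome = row.get("age", ""), row.get("outcome", "")
        if age and outcome:
            out.append({"age": float(age), "outcome": float(outcome)})
    return out


def summarize(rows: list[dict[str, float]]) -> dict[str, float]:
    """Core analysis: depends only on cleaned rows, never on where they came from."""
    return {
        "n": len(rows),
        "mean_age": statistics.mean(r["age"] for r in rows),
        "outcome_rate": statistics.mean(r["outcome"] for r in rows),
    }


if __name__ == "__main__":
    # Synthetic rows stand in for the real file, so this runs without protected data.
    fake = [
        {"age": "40", "outcome": "1"},
        {"age": "60", "outcome": "0"},
        {"age": "", "outcome": "1"},  # incomplete, dropped by clean()
    ]
    print(summarize(clean(fake)))  # {'n': 2, 'mean_age': 50.0, 'outcome_rate': 0.5}
```

Shared this way, reviewers can check the logic and rerun it on their own data, while the only code that knows where the real file lives stays private.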

Although I agree with your analysis that enabling competitors in science is not common, it really, really should be. That’s kinda the point of publication, at least in theory. Sharing knowledge and methods.

> enabling competitors in science ... really, really should be.

Said someone whose livelihood doesn't depend on said competition.

I don't disagree with you on any points. I have some academic friends who mostly do "a ton of munging and then some basic stat functions", as you say (but with less sensitive data). The problem is that their workflow is prone to human error. Even though the stat functions are simple, the proper labeling of inputs and outputs is less reliable.

I have some research published for which I wrote MATLAB code years ago. I trust the fundamental results but not the values displayed in the tables. I would have personally benefited from rudimentary version control and unit testing.
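As an illustration of the kind of rudimentary unit test meant here, a hedged Python sketch (the function and group names are hypothetical). The analysis returns results keyed by their labels rather than by position, and tiny hand-checked inputs confirm that each label ends up attached to the right number.

```python
import statistics


def group_means(values: list[float], groups: list[str]) -> dict[str, float]:
    """Mean of `values` per group label, returned keyed by label (not position)."""
    if len(values) != len(groups):
        raise ValueError("values and groups must have the same length")
    buckets: dict[str, list[float]] = {}
    for value, group in zip(values, groups):
        buckets.setdefault(group, []).append(value)
    return {group: statistics.mean(vs) for group, vs in buckets.items()}


# Tests use pytest's naming style; they are also plain functions you can call.
def test_labels_stay_attached_to_values():
    result = group_means([1.0, 3.0, 10.0], ["control", "control", "treated"])
    assert result == {"control": 2.0, "treated": 10.0}


def test_order_of_rows_does_not_matter():
    result = group_means([10.0, 1.0, 3.0], ["treated", "control", "control"])
    assert result == {"control": 2.0, "treated": 10.0}


def test_length_mismatch_is_an_error():
    try:
        group_means([1.0, 2.0], ["control"])
    except ValueError:
        return
    raise AssertionError("expected ValueError for mismatched lengths")
```

Running these with pytest is an assumption about tooling; calling the three test functions directly works just as well. Combined with version control, a failing test then points to the commit that broke the labeling.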

As a public health statistician, I am very grateful to all researchers who publish code. More so for those who publish packages that make their techniques easy to use. I am not an expert in the field of statistics, just a grunt applying what you guys devise. It takes a while for me to do enough research and testing to be sure I'm correctly implementing new techniques. Even a basic pseudo-code walkthrough would immensely help.

>My code documents wouldn't be helpful since the data is all locked up due to HIPAA concerns. We're talking names, health conditions, scrambled SSN, this isn't reproducible because the data is locked to those without security clearance.

Is there a standard format for this kind of data? If so, consider using it. That way, others can easily create artificial datasets to test it. Even if you have no control over your data source, you can convert the raw data to the standard as a "pre-munging" step.
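A sketch of both ideas under stated assumptions. `PatientRecord` stands in for whatever standard layout applies, and its fields, the raw column names (`MRN`, `AGE_YRS`, `DX`, `EVENT_FLAG`), and the condition codes are all hypothetical. `synthetic_records` produces fake data in that layout for testing, and `to_standard` is the "pre-munging" step that maps one raw export row onto it.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientRecord:
    """Hypothetical stand-in for a standardized record layout."""
    patient_id: str
    age: int
    condition_code: str
    outcome: int  # 1 = event occurred, 0 = no event


def synthetic_records(n: int, seed: int = 0) -> list[PatientRecord]:
    """Generate fake records in the standard layout; no real identifiers involved."""
    rng = random.Random(seed)
    codes = ["A01", "B02", "C03"]  # placeholder codes, not a real coding system
    return [
        PatientRecord(
            patient_id=f"SYN{i:05d}",
            age=rng.randint(18, 90),
            condition_code=rng.choice(codes),
            outcome=rng.randint(0, 1),
        )
        for i in range(n)
    ]


def to_standard(raw: dict[str, str]) -> PatientRecord:
    """'Pre-munging': map one raw export row (hypothetical columns) to the standard."""
    return PatientRecord(
        patient_id=raw["MRN"],
        age=int(raw["AGE_YRS"]),
        condition_code=raw["DX"].strip().upper(),
        outcome=1 if raw["EVENT_FLAG"] in {"Y", "1"} else 0,
    )
```

Everything downstream of `to_standard` can then be developed and shared against `synthetic_records`, with the real data only ever passing through the converter.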

>So, really, my field of public health doesn't use GitHub or sharing much, there's simply too little benefit to the researcher to share their code.

Sad but true.

Reproducibility is a major principle of the scientific method.

Yet computer scientists consistently fail to achieve reproducibility with a tool that is the most consistent at following instructions - the computer.

Even private businesses have embraced the DevOps movement, because they see the positive effects of reproducibility.

If the academic world is truly about science, then there is no more excuse: the tools are out there, and researchers need to use them.

This is really an artifact of unreasonable expectations and modern software ecosystems. By unreasonable expectations, I mean that people assume they can use the latest, greatest trendy library and get reproducible results. Good luck getting the level of determinism you're looking for.

You need to step back and look at more mature, simple codebases and what you can do in those sorts of environments when you want reproducibility. You can't cobble together a bunch of async services in the cloud and hope your Frankenstein tool gives you perfect results. It will give you good enough results for certain aspects if you focus on those specific aspects. Banking does a good job of this with transactional processing, making sure values are consistent because that's their entire business; maybe your account or their web interface is screwy, but that's fine, that can fail.
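One small, concrete habit in this direction, sketched in Python: make randomness explicit. The example (a percentile bootstrap for a mean; the function name and default seed are arbitrary choices for illustration) draws only from a private `random.Random` seeded by the caller, so two runs on the same data give identical intervals even if other code touches the global generator. Per the `random` module's documentation, this repeatability is only guaranteed on the same Python version, which fits the point above about sticking to a stable, pinned environment.

```python
import random
import statistics


def bootstrap_mean_ci(data: list[float], n_resamples: int = 1000,
                      alpha: float = 0.05, seed: int = 12345) -> tuple[float, float]:
    """Percentile bootstrap confidence interval for the mean.

    Randomness comes only from a private, explicitly seeded Random instance,
    so other code using the global `random` state cannot change the result.
    """
    rng = random.Random(seed)
    n = len(data)
    means = sorted(statistics.fmean(rng.choices(data, k=n))
                   for _ in range(n_resamples))
    lo_idx = int(alpha / 2 * n_resamples)
    hi_idx = n_resamples - 1 - lo_idx
    return means[lo_idx], means[hi_idx]


if __name__ == "__main__":
    sample = [2.1, 2.5, 1.9, 3.2, 2.8, 2.2, 2.6]
    first = bootstrap_mean_ci(sample)
    random.seed(999)          # disturbing the global RNG has no effect
    random.random()
    second = bootstrap_mean_ci(sample)
    print(first == second)    # True: same seed, same data, same interval
```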

> it is never tested, reproduced, or published

This never ceases to amaze me. I regularly read recent papers on shortest-path algorithms. Each one is religiously benchmarked down to the level of saying what C++ compiler was used. But the code itself is almost never published.

These are some concepts that I believe in for research code.

Research code shouldn't be a monolith. Each hypothesis should be a script that follows a data pipeline pattern. If you have a big research question, think about what the most modular progression of steps would be along the path from raw data to final output, and write small scripts that perform each step (input is the output from the previous step). Glue them all together with the data pipeline, which itself is a standalone, disposable script. If step N has already been run, then running the pipeline script once again shouldn't resubmit step N (as long as the input hasn't changed since the last run).

This "intermediate data" approach is useful because we can check for errors at each step along the way, and we don't need to redo calculations if a particular step is shared by multiple research questions.
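A minimal sketch of this pattern, with two assumptions: the steps are Python functions in one file rather than separate scripts, and "the input hasn't changed" is judged by a SHA-256 hash of the input files' contents, recorded in a small stamp file next to each output. The names (`run_step`, `clean_step`, `total_step`, the `.stamp.json` files) are hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable

Step = Callable[[list[Path], Path], None]


def inputs_digest(paths: list[Path]) -> str:
    """SHA-256 over the contents of all input files, in order."""
    h = hashlib.sha256()
    for p in paths:
        h.update(p.read_bytes())
    return h.hexdigest()


def run_step(name: str, inputs: list[Path], output: Path, func: Step) -> bool:
    """Run one step unless its output already reflects the current inputs.

    A JSON "stamp" next to the output records the input hash from the last
    run; if it matches and the output exists, the step is skipped.
    Returns True if the step actually ran.
    """
    stamp = output.with_name(output.name + ".stamp.json")
    digest = inputs_digest(inputs)
    if output.exists() and stamp.exists():
        if json.loads(stamp.read_text()).get("inputs") == digest:
            print(f"[skip] {name}")
            return False
    print(f"[run]  {name}")
    func(inputs, output)
    stamp.write_text(json.dumps({"step": name, "inputs": digest}))
    return True


def clean_step(inputs: list[Path], output: Path) -> None:
    """Step 1: drop blank lines from the raw data file."""
    lines = inputs[0].read_text().splitlines()
    output.write_text("".join(line + "\n" for line in lines if line.strip()))


def total_step(inputs: list[Path], output: Path) -> None:
    """Step 2: sum the cleaned numbers (one per line)."""
    nums = [float(line) for line in inputs[0].read_text().splitlines()]
    output.write_text(f"{sum(nums)}\n")


def pipeline(workdir: Path) -> list[bool]:
    """Disposable glue: each step reads the previous step's output file."""
    raw = workdir / "raw.txt"
    clean = workdir / "clean.txt"
    total = workdir / "total.txt"
    return [
        run_step("clean", [raw], clean, clean_step),
        run_step("total", [clean], total, total_step),
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        work = Path(d)
        (work / "raw.txt").write_text("1\n\n2\n3\n")
        print(pipeline(work))  # first run: both steps run
        print(pipeline(work))  # nothing changed: both steps skipped
```

Because the check is on file contents, a rerun of step 1 that produces the same output as last time leaves step 2 skipped. A make-style timestamp comparison would be simpler, but it would rerun step 2 in that case.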

I was taught this by a good mentor and I've been using this approach for many years for various ML projects and couldn't recommend it more highly.

This, absolutely this.

I looked over a friend's PhD code because the results were unstable. I knew nothing about the domain, which was a large disadvantage, but on the code front it was a monolith following a vague data pipeline approach. Unfortunately, components wouldn't run separately, and there was only a single end-to-end test that took hours to run. Had each section had its own tests, diagnosing which algorithm(s) were malfunctioning would have been easier. We never did find out.
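A hedged sketch of what per-section tests could look like, using two hypothetical pipeline stages (`normalize`, `detect_peaks`) rather than anything from that project. Each stage gets a fast test on a tiny hand-checked input, plus one small end-to-end check, so a failure points at the stage responsible instead of at a multi-hour run.

```python
def normalize(xs: list[float]) -> list[float]:
    """Stage A: rescale values to the range [0, 1]."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]


def detect_peaks(xs: list[float], threshold: float) -> list[int]:
    """Stage B: indices of interior local maxima above a threshold."""
    return [
        i for i in range(1, len(xs) - 1)
        if xs[i] > threshold and xs[i] >= xs[i - 1] and xs[i] > xs[i + 1]
    ]


# One fast test per stage (pytest naming style; also callable directly).
def test_normalize():
    assert normalize([2.0, 4.0, 6.0]) == [0.0, 0.5, 1.0]
    assert normalize([3.0, 3.0]) == [0.0, 0.0]


def test_detect_peaks():
    assert detect_peaks([0.0, 1.0, 0.0, 0.2, 0.1], threshold=0.5) == [1]


# A small end-to-end check that runs in milliseconds, not hours.
def test_end_to_end():
    assert detect_peaks(normalize([1.0, 5.0, 1.0, 2.0, 1.0]), threshold=0.5) == [1]
```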

There is no real incentive to organize and clean up the code, even if the scientists involved have the skills to write well-organized software. And organizing this kind of code that often starts in a more exploratory way is a pretty large amount of additional effort. This kind of effort is simply not appreciated, and if spending time on it means you publish fewer papers it's a net negative for your career.

I'd settle for just publishing the code at all, even if it is a tangled mess. This is still not all that common in the natural sciences, though I have a bit of hope this will change.

Yeah, I mean, if your study is implying the apocalypse (or even if not, but more so if it is), you'd better put the code out there, because that's what the scientific method requires. How should I believe your conclusions and cute graphs if I can't see how you arrived at them? Maybe they were drawn in Narnia for all I know, maybe the code has significant errors, or it's so tailored to produce those results that it's irrelevant.

And if the tools and methods you used to arrive at them are so messy that you dare not publish them, what does that tell me about:
- your process;
- the organisation of your ideas;
- the conclusions or points made in the paper?

I don't mean it has to be idiomatic well written code, but it should be readable enough to be followed.

I don't think research code in aggregate is any worse than any other source for code. If we had the same kind of visibility into all the commercially written code, it would be the same pattern of some well structured, and some a complete mess, without any correlation with the companies concerned, but with a lot of correlation to the ability of the author.

The recent example of Citibank's loan payment interface comes immediately to mind. So does Imperial's Covid model (the one that had timing issues when run on different computers).

Exactly. You can imagine most engineering software to be in a similar state as research code. It's just that people get to see research code.

If you want to write good research software, a good way is to have professional developers implement it.

I worked closely with an NLP researcher for a while on a project that had received a hefty state grant. She knew more or less what her team needed, but she needed someone to implement it cleanly and in a way that would not make users step on each other's toes.

The chances of that project being a buggy mess would have been pretty high if it had been written by people who don't write software for a living. And maybe that's OK.

Here's the problem with hiring a pro.

The workhorse NIH grant[0] is an R01 with a $250,000/year x 5 years "modular" budget. Most labs have, at most, one. Some have two, and a very few have more than that. This covers everything involved in the research: salaries (including the prof's), supplies, publication fees, etc. Suppose you find a programmer for $75k. With benefits/fringe (~31% for us, all-in), that's nearly $100k/year. If the principal investigator (prof, usually) takes a similar amount out of the grant, there's very little money left to do the (often very expensive) work. In contrast, you can get a student or postdoc for far less--and they might even be eligible for a training grant slot, TAship, or their own fellowship, making their net cost to the lab ~$0.

This would be easy to fix: the NIH already has a program for staff scientists, the R50. However, they fund like two dozen per year; that number should be way higher.

[0] Other mechanisms exist at the NIH--and elsewhere--but NSF (etc) grants are often much smaller.

> In contrast, you can get a student or postdoc for far less

Yeah I totally agree on this part. The academic system relies not on monetary compensation for its labor, rather it provides them reputation by getting their names on a paper.

I worked essentially for free for a lab in my spare time for 4 years. They get to the result they want, even if it's built on a shaky foundation, and for basically free (it doesn't cost anything to put a name on a paper). At the end of the 4 years the dream of getting my name on a paper didn't even pan out (the lab was ramping down and was essentially a teaching research lab by the time I showed up).

What doesn't really get mentioned in the article is that a lot of academic software was written by a single developer. Bigger software projects, academic or not, that were built and maintained by only one person tend to become messier and messier with time. Perhaps most software suffers from this, becoming a mess over time, but having more developers look at the code (along with enough time and many other factors) can certainly help keep things in better shape.

Plus it becomes impossible to get multiple developers to work on the code if they can’t understand it because of its messiness, so there’s a bit of survivor’s bias and stronger motivation to clean the code up to be comprehensible to others when you have multiple people working on it.

Also, I feel personally attacked by the headline. :)

It is for this reason I try to keep my code and models pretty simple, only two or three pages of code (or ideally a single page), and I don’t try to do too many things with one program, and I choose implementations and algorithms that are simpler to implement to make concise code feasible (sometimes at the expense of speed or generality).

I think there's not enough researchers that publish code.

For example, discrete optimization research (nurse rostering, travelling salesman, vehicle routing problem, etc.) is filled with papers where people evaluate their methods on public benchmarks, but the code never sees the light of day. There are a lot of state-of-the-art methods whose code is never released.

I'm pretty sure it's like that elsewhere. Machine learning and deep learning, for some reason, have a lot of code in the open, but that's not the norm.

I'd prefer the code to be open first. Once that's abundant then I might prefer the code to also be well designed.

> I think there's not enough researchers that publish code.

I agree, although lately there's been some effort by academia to make authors publish their code, or at least disclose it to the reviewers.

Several conferences have an artifact evaluation committee, which tries to reproduce the experimental part of submitted papers. Some conferences actually require a successful artifact evaluation to be accepted (see, for instance, the tool tracks at CAV [1] and TACAS [2]).

Others, while not requiring an artifact evaluation, may encourage it by other means. The ACM, for instance, marks accepted papers with special badges [3] reflecting how well the alleged findings can be reproduced and whether the code is publicly available.

[1] http://i-cav.org/2021/artifact-evaluation/

[2] https://etaps.org/2021/call-for-papers

[3] https://www.acm.org/publications/policies/artifact-review-an...

This feels like the right approach. If peer review were to include artifact evaluation, including some kind of code review, and require certain standards be met for acceptance, things would change. As others have noted here, the mechanisms of grant-funded work strongly discourage attention to code quality, and that would have to change as well.

I'm not in academia now, but I started out my career doing sysops and programming in a lab at a medical school and have worked with academics a bit since. I don't do it much because it's basically volunteer work, and it's almost impossible to contribute meaningfully unless you are also well-versed in the field.

Math and physics are a tangled mess, so it's not surprising that mathematicians and physicists write code which looks like a tangled mess. Mathematicians and physicists are trained to handle ambiguous concepts and they can work with weird abstractions which are far detached from reality. Unlike programming languages, the language of math is full of gaps; this requires the reader to make assumptions using past knowledge and conventions. Computers, on the other hand, cannot make assumptions, so the code must be extremely precise and unambiguous.

Writing good code requires a different mindset; firstly, it requires acknowledging that communication is extremely ambiguous and that it takes a great deal of effort to communicate clearly and to choose the right abstractions.

A lot of the best coders I've met struggle with math and a lot of the best mathematicians I've met struggle with writing good code.

Thank you. Having trained as a physicist, this matches my experience.

So here is a simple fact.

It does not make sense to judge every piece of code that does not meet the "highest standard" to be a tangled mess.

There are valid reasons for code quality to vary, and the idea of quality itself may change from problem to problem and project to project.

The quality of code that governs your car's ECU should be different from the quality of code that some research team threw together to demonstrate an idea.

A coding project should achieve some kind of goal or set of goals as efficiently as possible and in many valid cases quality is just not high on the list and for a good reason.

Right now I am working on a PoC to verify an idea that will take a long time to implement. We do this because we don't want to spend weeks on development just to find out it doesn't work or needs to change. So spending 2-3 days to remove a significant part of the risk for the rest of the project is fine. It goes without saying that the code is going to be incomplete, messy and maybe buggy.

There is also something to be said for research people actually focusing on something else.

Professional developers focus their careers on a single problem: how to write software well (or at least they should).

But not all people do. Some people actually focus on something else (physics maybe?) and writing code is just a tool to achieve some other goals.

If you think about people working on UIs and why UI code tends to be so messy, this is also probably why. Because these guys focus on something else entirely and the code is there just to animate their graphical design.

Check out the openmmlab project [1], where messy research code in computer vision is rewritten and reorganized into a coherent whole. If more researchers join this, research will not only be fully reproducible and much more accessible to everyone, but methods can also be compared fairly. (I'm the maintainer of mmpose [2], mmaction2 [3], and mmediting [4].)

[1] https://github.com/open-mmlab

[2] https://github.com/open-mmlab/mmpose

[3] https://github.com/open-mmlab/mmaction2

[4] https://github.com/open-mmlab/mmediting

One of the things which has helped derail my own research career [1] is the tendency to not write tangled-mess code, and to publish and maintain much of my research code after I was supposed to be done with it.

Annoyingly, more people now know of me due to those pieces of software than for my research agenda. :-(

[1] : Not the only thing mind you.

C'mon, no good deed goes unpunished. Everyone knows that.

Structure is great for a well understood problem space, but this is not usually the case when working on something novel. As a researcher your focus should be on learning and problem solving, not creating a beautiful code base. Imposing too many constraints early on can negatively impact your project later on. In the worst case, your code starts to limit the way you think about your research. I agree that there are some general best practices that should be applied to nearly all forms of coding, but beyond that it's a balance.

The same thinking should be used when adding regulation to an industry. Heavy regulation on a rapidly developing industry can stifle innovation. Regulation (if needed) should be applied as our understanding of the industry increases.

Results need to be refined so that the way they were first formulated doesn't get in the way of their replication. At scale, this too becomes a cost to industry.

In the small, this isn't different from taking a lab notebook and making it clearer and better summarized so that it can be passed on to the poor sucker who has to do what you did after you move on to another project.

Furthermore, software projects that are put under the same iterative stress you imply for R&D inevitably go through a refactoring phase so that performance isn't affected in the long run.

Agreed that there should be a minimum bar for "completed" research code such as reproducibility and a clear summary, but engineers shouldn't expect the first version of a new algorithm to be easy to understand without additional material or ready for production without a complete rewrite.

There's a huge digital divide forming as well, between the hardware a junior software engineer at a well-funded research institution such as DeepMind has access to and what a postdoc in Theoretical Physics at Princeton gets. The postdoc is expected not only to write software but also to maintain hardware for a proprietary "supercomputer" that was probably cast off ages ago from a government lab or Wall Street.

We don't expect Aerospace / Mechanical engineering students to learn metalworking. They typically have access to shop technicians for that work. Why not persuade university administrators to similarly invest in in-house software engineering talent. Generalists who can provide services to any problem domain: from digital humanities to deep reinforcement learning?

> We don't expect Aerospace / Mechanical engineering students to learn metalworking. They typically have access to shop technicians for that work.

You'd be surprised, but that is often not the case. Lack of sufficient funding, or technicians being dicks, or mis-management by PIs, often result in graduate students having to do the technical work of metalwork, welding, lab equipment calibration, and a bunch of other tasks. Sometimes they even have to operate heavier machinery, or lasers etc without the minimum reasonable technical staff support.

I know this from my time on the executive committee of my old university's Grad Student Organization.

> We don't expect Aerospace / Mechanical engineering students to learn metalworking.

Umm...we sorta do.

As a neuroscience postdoc, I have done virtually everything from analysis to zookeeping, including some (light) fabrication. We outsource really difficult or mass-production stuff to pros, and there's a single, very overworked machinist who can sometimes help you, but most of the time it's DIY.

I am actually quite surprised at the figure of 73% of research-related code packages not being updated after publication; I was expecting it to be higher.

Same. But it could be an issue with the sample. 213 in a span of 14 years is not a lot.

Also, a question. If you publish a paper with a repo, what would be the best way to handle the version in the paper matching the repo in the future?

An opinion: there is such a thing as software being ‘done’ and ‘as is’. Software solves a need. After that’s met, that’s it.

There’s also this part that strikes me,

>Given a tangled mess of source code, I think I could reproduce the results in the associated paper (assuming the author was shipping the code associated with the paper; I have encountered cases where this was not true).

And it strikes me as weird. The main obstacle to reproducing results is usually the data. And depending on the dataset, it’s very hard to get. To be able to reproduce the code, I just need the paper.

The code may have bugs, may stop working, may be in a different language/framework. The source of truth is the paper. This is why the paper was published.

>The source of truth is the paper. This is why the paper was published.

Speaking as someone who's not the best at math, I find it easier to understand what a paper is saying after I run the code and see all the intermediate results.

When the code doesn't work, it takes me 20 times longer to digest a paper. They could do with only uploading code -- to me it's the shortest and most effective way to express the ideas in the paper.

>Speaking as someone who's not the best at math, I find it easier to understand what a paper is saying after I run the code and see all the intermediate results.

As long as you understand the paper after, that's okay.

> When the code doesn't work, it takes me 20 times longer to digest a paper.

What if the data isn't available? That's another issue. I see where you're coming from, but that's why the paper itself is the source of truth. Not the implementation.

Another case, what if the implementation makes assumptions on the data? Or on the OS it's being run on?[0][1]
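A minimal Python sketch of the "assumptions on the OS" point, assuming the common case of a script that finds its input files by listing a directory. Python's `glob.glob()` (like `os.listdir()`) returns entries in whatever order the filesystem provides, with no sorting guarantee, so if results depend on processing order they can silently differ between machines. The function name `list_input_files` and the `*.out` pattern are hypothetical, chosen only for illustration.

```python
from __future__ import annotations

import glob
import os
import tempfile


def list_input_files(directory: str, pattern: str = "*.out") -> list[str]:
    """Return matching files in a deterministic (lexicographically sorted) order.

    glob.glob() makes no ordering guarantee: it reflects whatever order the
    OS/filesystem returns directory entries in, which can vary by platform.
    Sorting explicitly removes that hidden dependence on the environment.
    """
    return sorted(glob.glob(os.path.join(directory, pattern)))


if __name__ == "__main__":
    # Create a few (hypothetical) output files in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ["c.out", "a.out", "b.out", "notes.txt"]:
            open(os.path.join(tmp, name), "w").close()
        files = list_input_files(tmp)
        print([os.path.basename(f) for f in files])  # ['a.out', 'b.out', 'c.out']
```

The sort costs almost nothing and turns an environment-dependent order into one that is the same on every machine.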

> They could do with only uploading code -- to me it's the shortest and most effective way to express the ideas in the paper.

In my opinion, no. The math and algorithm behind it is more important than an implementation and better for longevity.

[0] https://science.slashdot.org/story/19/10/12/1926252/python-c...

[1] https://arstechnica.com/information-technology/2019/10/chemi...

> Also, a question. If you publish a paper with a repo, what would be the best way to handle the version in the paper matching the repo in the future?

You can include the hash of the commit used for your paper.

oh, that's good. Or even a tag
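A small Python sketch of the commit-hash suggestion, assuming the analysis is run from a git checkout and that `git` is on the PATH. It asks git for the checked-out commit (`git rev-parse HEAD`) so the exact version can be written into result logs or the paper. The helper names `current_commit` and `provenance_line` are hypothetical.

```python
from __future__ import annotations

import subprocess


def current_commit(repo_dir: str = ".") -> str | None:
    """Return the full hash of the commit checked out in repo_dir, or None.

    Runs `git rev-parse HEAD`. Returns None if git is not installed, or if
    repo_dir is not inside a git repository (or the repo has no commits yet).
    """
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def provenance_line(commit: str | None) -> str:
    """Format a line suitable for a results log or a paper's methods section."""
    if commit is None:
        return "Code version: unknown (not run from a git checkout)"
    return f"Code version: git commit {commit}"


if __name__ == "__main__":
    print(provenance_line(current_commit()))
```

A tag gives that same commit a human-readable name, e.g. `git tag -a paper-2021 -m "Version used in the paper"` (the tag name is only an example); citing both the tag and the hash guards against the tag being moved later.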

> The source of truth is the paper.

Yes, although truth of the flimsiest kind. A lowly but wise code monkey once said "Talk is cheap. Show me the code."

Here's some code. Data is proprietary. There's no paper explaining the data, prep, steps to gather, caveats, assumptions, etc.

What now?

No need for the reductionist strawman. Some experiments cannot be reproduced for proprietary data. Those that can should be.

It turns out that maintaining a package is a lot of work, and the career benefit post-publishing said package and accompanying paper is really low.

- writing general purpose software that works on multiple platforms and is bug free is really really hard. So you're just going to be inundated with complaints that it doesn't work on X

- maintaining software is lots of work. Dependencies change, etc.

- supporting and helping an endless number of noobs use your software is a major pita. "I don't know why it wouldn't compile on your system. Leave me alone."

- "oh that was just my grad work"

- it's hard to get money to pay for developing it further. Great when that happens, though.

I see people here saying research is like writing software with fast-changing requirements. I can see how that could seem like an adequate analogy to a software engineer, but it's not.

Researchers use code as a tool of thought to make progress on very ambiguous, high-level problems that lack pre-existing methodology. Like, how could I detect this theoretical astrophysical phenomenon in this dataset? What would it take to predict disease transmission dynamics in a complex environment like a city? Could a neural network leveraging this bag of tricks in some way improve on the state-of-the-art?

If you have JIRA tickets like that in your queue, maybe you can compare your job to that of a researcher.

I think that incentives play a big role here. Software has near-zero value in academic evaluation, and its updating and maintenance even less. The only way to make research software survive is to offer packages that other researchers can also use. Maybe.

This is changing drastically. The issue is that more and more science relies heavily on computation. Analytic platforms, computational science, modeling/simulation, etc. There's less "bench" science and more of the scientific process is being embedded in software.

There's a certain degree of naivety in this process: SMEs think translating their research into software is a trivial step. It's not, not if you demand the rigor science should be operating at. As such, many budgets are astronomically lower than they should be. This has worked in the past, but as more science moves into software and software becomes more critical to the process, you must invest in it, and it's not going to be cheap. The shortcuts taken in the past won't cut it.

There's a bigger issue in that, as a society, we don't want to invest in basic research, so it's already cash-strapped. Combine research scientists who already have to cut corners with the massive cost that quality software will take, and you're creating a storm where science will either produce garbage or we'll need to reevaluate how we invest in software systems for science.

I worked at a productive computing research institute for a number of years. I cannot count the number of times I found research teams duplicating critical algorithms. Research scientists not only pay the price of the spaghetti nature of the code, they pay it over and over again by not sharing and improving on what has already been built by previous research groups.

The software industry has its own share of problems, but from what I've seen the research community is still largely operating on an outdated software model that shuns open collaboration out of fear of being "scooped".

Not all research software is a tangled mess. I have extensively worked as a "quant" (before the term was popular) for math, medical, network, media, and physics researchers as my side gig for decades. I'd say about 1/3 of the home-brewed research software is constructed with fairly reasonable assumptions (the authors are scientists, after all), and I am able to grow their basic setup into a framework they intimately understand and prefer to use. More than once I've found brilliantly engineered software not unlike what I'd find at a pro software development firm.

>>> writing software is a low status academic activity; it is a low status activity in some companies, but those involved don’t commonly have other higher status tasks available to work on.

If measured by compensation, then research is a low status activity. Perhaps more precisely, researchers have low bargaining power. But I don't think that academics actually analyze activities in such detail. The PI might not even know how much programming is being done.

The researcher is programming, not because they see it as a way to raise (or lower) their status, but because it's a force multiplier for making themselves more productive overall. Though I work in industry, I'm a "research" programmer for all intents and purposes. I program because I need stuff right away, and I do the kind of work that the engineers hate. Reacting to rapidly changing requirements on a moment's notice disrupts their long term planning. Communicating requirements to an engineer who doesn't possess domain knowledge or math skills is painful. Often, a working piece of spaghetti code that demonstrates a process is the best way to communicate what I need. They can translate it into fully developed software if it threatens to go into a shipping product. That's a good use of their time and not of mine.

>>> Why would a researcher want to invest in becoming proficient in a low status activity?

To get a better job. I sometimes suspect that anybody who is good enough at programming to get paid for it, is already doing so.

>>> Why would the principal investigator spend lots of their grant money hiring a proficient developer to work on a low status activity?

Because they don't know how to manage a developer. Software development is costly in terms of both time and effort, and nobody knows how to manage it. Entire books have been written on this topic, and it has been discussed at length on HN. A software project that becomes an end unto itself or goes entirely off the rails can eat you alive. Finding a developer who can do quantitative engineering is hard, and they're already in high demand. It may be that the PI has a better chance managing a researcher who happens to know how to translate their own needs into "good enough" code than managing a software project.

Keep in mind that there are different kinds of research software. Take Seurat[1] as an example. There's CI, issue tracking, etc. It might not be the prettiest code you've ever seen, but it absolutely has to be maintainable as it's being actively developed. Such projects are rare, but low quality is often an indication of software that isn't used by anyone.

1. https://github.com/satijalab/seurat

Also things like EISPACK, BLAS, LINPACK and so on, for FORTRAN. Back in the 70s my dad worked a bit with them when he was employed by UTHERCC: The University of Texas Health, Education, and Research Computer Center. You can find references to UTHERCC in papers from that era.

Come to think of it, something like UTHERCC might be exactly what is needed to help the current situation.

I’m an engineer from outside research and science, but I'm interested in the topic. Recently I learned about great work done by Grigori Fursin and an entire community of research engineers, with the goal of making research software more applicable to industry by embedding it in some kind of common framework. I want to leave some links here, in case you’d like to watch it. The talk is called “Reproducing 150 Research Papers and Testing Them in the Real World”: ACM page with webcast https://event.on24.com/wcc/r/2942043/9C904C7AE045B5C92AAB2CF...

Also, source docs available here: https://zenodo.org/record/4005773?fbclid=IwAR1JGaAj4lwCJDrkJ...

And, their solution product https://cknowledge.io/ and source code https://github.com/ctuning/ck

I guess it should be helpful to the research community.

One of the biggest eye-openers for me as an undergrad was when, upon getting to the point where I'd have to decide whether to pursue graduate education or exit academia and join the workforce, I began to look at the process for publishing novel computer science.

To be clear, novel computer science is valuable and the lifeblood of the software engineering industries. But the actual product? I discovered that I like quality code more than I like novel discovery, and the output of the academic world ain't it. Examples I saw were damn near pessimized: not just a lack of comments, but single-letter variables (attempting to represent the Greek letters in the underlying mathematical formulae) and five-letter abbreviated function names.

I walked away and never looked back.

If there's one thing I wish I could have told freshman-year me, it's that software as a discipline is extremely wide. If you find yourself hating it and you're surprised you're hating it, you may just be doing the kind that doesn't mesh with your interests.

This topic is near and dear to my heart and at a quick glance, I pretty much agree with all/most of this post.

I gained multiple years of industry software engineering experience before joining academia (non-CS, graduate-level). And I was flabbergasted at the way software and programming are treated in a research setting where the "domain" is not CS or software itself. It took me a few years just to get a hint of what on earth these people (my collaborators who program side-by-side with me) are thinking, and what kind of mindset they come from.

Then I took a short break and went to the industry. Software engineering, hardcore CS; no domain, no BS. I was expecting that it would feel like an oasis. It didn't. Apart from a handful of process improvements, like use of version control, issue tracking, deadline-management, the quality of the tangled mess of the code was only slightly better.

Initially I took away the lesson that it's the same in academia and industry. But on further reflection there are two big differences:

- The codebase I worked on in the industry was at least 10x bigger. Despite that, the quality was noticeably better.

- More importantly, I could connect with my coworkers in the industry. If I raised a point about some SwE terminology like test-driven dev, agile, git, whatever, I could have a meaningful discussion. Whereas in academia, not only did most domain experts know jack about 90% of software-engineering concepts and terminology, they were expert at hiding their ignorance, and would steer the conversation in a way that you wouldn't know if they really didn't know or knew too much. I never got over that deceitful ignorance mixed with elitist arrogance.

In the end, I do think that, despite enormous flaws, the industry is doing way better than academia when it comes to writing and collaborating on software and programming, and that the side-by-side comparison of actual codebases is a very small aspect of it.

as people are saying, the typical software engineering advice simply wouldn't work in a research context.

one exception is the most basic stuff - people should use version control, do light unit testing, and explicitly track dependencies. These weren't really done in the past but are becoming more and more common, fortunately.
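To make "light unit testing" concrete, here is a minimal sketch in Python, assuming a hypothetical research routine (`trapezoid`, a composite trapezoid-rule integrator standing in for whatever the project actually computes). The tests pin the routine against results known analytically; they are written as plain `test_*` functions with bare asserts, so pytest can discover them, but they also run without any test framework.

```python
import math


def trapezoid(f, a: float, b: float, n: int = 1000) -> float:
    """Approximate the integral of f over [a, b] with the composite trapezoid rule."""
    if n < 1:
        raise ValueError("n must be >= 1")
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def test_matches_known_integral():
    # The integral of sin over [0, pi] is exactly 2.
    assert math.isclose(trapezoid(math.sin, 0.0, math.pi), 2.0, rel_tol=1e-5)


def test_linear_function_is_exact():
    # The trapezoid rule integrates straight lines exactly: 1.5*x**2 + x on [0, 2] = 8.
    assert math.isclose(trapezoid(lambda x: 3 * x + 1, 0.0, 2.0, n=4), 8.0)


def test_rejects_bad_n():
    try:
        trapezoid(math.sin, 0.0, 1.0, n=0)
    except ValueError:
        return
    raise AssertionError("expected ValueError for n=0")
```

Checks like these take minutes to write, and they catch the most common research-code failure: a later "small change" quietly altering numbers that an earlier result depended on.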

I think if software engineering experts actually sat down, looked at how researchers work with computers, and figured out a set of practices to follow that would work well in the research context, they could do a lot of good. This is really needed. But the standard software engineering advice won't work as it is, it has to be adapted somehow.

Another issue is that the standard software engineering advice doesn't guarantee clean code either.

I agree the post makes valid points, but is there anything new in that? It had been discussed several times here and on other forums as well. "RSE" is just another made-up position with a very average pay structure -- even this is not new.

However, RSEs (or just general software training) may help research groups establish a structure on how to format code, put some standards in place, and at least have some basic tests. This way, more people can read/modify the code efficiently (more = not necessarily general public, but it at least helps incoming grad students/postdocs to pick up the project easily).

I was reminded that there are research packages like LINPACK, BLAS, and EISPACK for FORTRAN (and some other languages) that have been maintained since the 70s and are still in use.

Back in the 70s my dad was working for an organization called UTHERCC, the University of Texas Health, Education, and Research Computer Center, and these libraries were some of the code he worked with.

You can find references to UTHERCC in papers from the time, although I don't think it exists under that name. Maybe institutions need something like UTHERCC as an ongoing department now.

I once interviewed for a programming job that was a bit of bait and switch. The hiring manager showed me a foot-high stack of green bar paper that was Fortran code written by optical scientists that I was expected to convert to C. He was somewhat surprised when I declined and ended the interview. I pity whoever got stuck with that task.

A nice counterexample of research software code that adheres to general software engineering best practices and is easy to pick up and use is the OSMnx project: https://github.com/gboeing/osmnx

Props to Geoff for setting a nice standard.

What I find really surprising about research software, is that even people in Computer Science write poorly designed code as part of their research. I would have imagined that they would be better qualified to create good code. Just goes to show that Software Engineering != Computer Science.

I did some research projects, but the problem is that they are a mix of regular projects and experiments.

Things like Nix worked out great, but other stuff I saw is a tangled mess of Java grown over the last 10 years, written by 30 different students who didn't talk to, let alone know, each other.

This is actually the opportunity for our startup. I think there is generally a great opportunity to be the Databricks of a lot of academic software. We're starting in a big research area in biology :)

Why is this surprising? Has anyone been inside a Chemistry or Bio lab? You think that what happens in those labs to get research done is industrial grade?

jupyterlab github issue advocating for documenting research: https://github.com/jupyterlab/team-compass/issues/121

Good. Please, keep on writing more spaghetti code. Dumb people like myself need it to make a living "cleaning" it up. Circle of life.

I agree that academia produces its fair share of spaghetti code, but I don't think all of his arguments are correct.

> writing software is a low status academic activity

This is just not true. People like Stallman, Knuth, Ritchie, Kernighan, Norvig or Torvalds are not considered people of low status in the academic world.

Writing horrible spaghetti code in academia may be considered "low status"; but that's another story.

He should compare apples to apples, i.e. do people who work in academia write better or worse code there than when they work for a business? They should be compared to themselves in different situations, not to some imaginary high coding standard that I've never seen anywhere.

In my own experience in academia, at least, I'd say that the lack of deadlines, the freedom to do whatever I want, and the lack of management result in much higher quality software. When you work commercially, you churn out embarrassing stuff just to make something work before a deadline.

>> writing software is a low status academic activity

> This is just not true. People like: Stallman, Knuth, Ritchie, Kernighan, Norvig or Torvalds are not considered as people of low status in the academic world.

I understand the meaning of 'academic software developers' to mean 'software developers that assist in building software for other, non-CS, fields of research', but you only mention people famous within CS. I don't think this article is meant to apply to CS.

You're probably correct; I just skimmed through the first section when I saw his bullet lists.

I guess that's true for academia in general, i.e. they consider anything but their own field as a joke.

I think that rather than being a low status academic activity, it is just not valued in academia. The code is usually just a by-product of the paper, and higher quality code does not translate into a higher quality paper.

When your code works, you probably already developed the main part of your paper, and you would have no incentives to improve on your program if what you want is just publication... At least this is what I think.

I've seen many research papers where all they disclose about the software is some pseudo code + some tables with timing results to prove their "performance gains".

For those types of papers I agree with your statement. But in many academic scenarios others will want to inspect the source, and the quality of that code is certainly something that will add or subtract from your "status" in the academic world so to speak :-)

OK, perhaps I should read more papers :)

Agreed. Based on my lab experiences, it might have been more accurate for the author to write that often code is not being written by individuals who are the intellectual drivers of the lab. In many labs the 'thinkers' get far more credit and are more valued than the 'doers'.

> In many labs the 'thinkers' get far more credit and are more valued than the 'doers'

In terms of software this never made much sense to me. I would understand if we were talking about chemistry or some other discipline where a "new idea" has to be investigated/verified by some "lab-rat" doing mundane tasks for 2 years. In that case the lab-rat would probably get less credit than the person with the actual idea, but this just does not apply to software. Developers are not doing mundane tasks on behalf of some great thinker.

Rarely are the developers writing grants, generating hypotheses, planning experiments, composing manuscripts, presenting the lab's work, teaching at the university, etc. Perhaps my choice of words was poor, but this should be more clear.

I guess this differs from place to place, but at my university (Oslo), we did all that..
