To give a concrete example, imagine writing an application where requirements change unpredictably every day, and where the scope of those changes is unbounded.
The closest to "orderly" I think research code can get would be something akin to Enterprise-style coding, where literally everything is an interface and every implementation detail can be changed in every possible way. We already know how those codebases tend to end.
If the problem were only unpredictability, then projects with a clear and defined end goal (e.g., a website to host results) would be of substantially higher quality. But they're not. Well-defined projects tend to end up basically just as crappy as exploratory projects.
The problem is evaluation and incentives. There's literally no evaluation of software or software development capability in the industry. I know of a researcher who held a multimillion-dollar informatics grant for 3 years. In those 3 years they did literally nothing except collect money. Usually there are grant updating mechanisms and reports, but he BSed his way through those, knowing there's a 0.0000000% chance that any granting agency is going to look through his code. The fraud was only found because he got fired for unrelated activities.
I once looked up older web projects funded by a grant. 4/6 were completely offline less than 2 years after their grants completed. For 2 of those 4, it's unclear whether the site was ever completed in the first place.
I hate that every HN post about academia ends with an anecdote describing some rare edge-case they've heard about. Intentional academic fraud is a very small percentage of what happens in academia. Partly this is because it's so stupid: academia pays poorly compared to industry, requires years to establish a reputation, and the systems make it hard to extract funds in a way that would be beneficial to the fraudster (hell, I can barely get reimbursed for buying pizza for my students.) So you're going to do a huge amount of work qualifying to receive a grant, write a proposal, and your reward is a relatively mediocre salary for a little while before you shred your reputation. Also, where is your "collected money" going? If you hire a team, then you're paying them to do nothing and collude with you, and your own ability to extract personal wealth is limited.
A much more common situation is that a researcher burns out or just fails to deliver much. That's always a risk in the academic funding world, and it's why grant agencies rarely give out 5-10 year grants (even though sometimes they should) and why the bar for getting a grant is so high. The idea is to let researchers do actual work, rather than having teams manage them and argue about their productivity.
(Also long-term unfunded project maintenance is a big, big problem. It's basically a labor of love slash charitable contribution at that point.)
This isn’t a rare edge case, this is very common in software projects. I’ve heard of it because I was part of the team brought in to fix the situation.
Intentional fraud is only rare when it's recognized as fraud. P-hacking was incredibly widespread (and to some extent still is) because it wasn't recognized as a form of fraud. Do you really think not delivering on a software project has any consequences? Who is going to go in and say what's fraud, what's incompetence, and what's bad luck?
The problem is that the bar for getting software grants isn't high, it's nonsensical. As far as I can tell, the ability to produce or manage software development isn't factored in at all. As with everything else, it's judged on papers and the grant application. In some cases, having working software and preexisting users ends up being detrimental to the process, since it shows less of a "need" for the money. You get "stars" in their field who end up with massive grants and no idea how to implement their proposals. Conversely, plenty of scientists who slave away on their own time on personal projects that hundreds of other scientists depend on get no funding whatsoever.
But I think you're both right in some sense. Cases of intentional major fraud are probably rare edge cases, and they make the news when they're uncovered. But there's a lot of grey-ish area like the p-hacking you mentioned, plus funding agencies know there needs to be some flexibility in the proposed timeline due to realities. Realities like: you don't necessarily get the perfect student for the project right when the grant starts, as the graduate student cycle is annual; plus the research changes over time, and it isn't ideal to have students work to an exact plan as if they were employees.
But I totally agree that maintaining software that people are using should be funded and rewarded by the academic communities. A possible way to do this is to have a supplement so that after a grant is over, people who have software generated from the grant that is used by at least 10 external parties without COI would be funded $100K/yr for however many years they are willing to maintain and improve it. Definitions of what this means need to be carefully constructed, of course.
>But I totally agree that maintaining software that people are using should be funded and rewarded by the academic communities. A possible way to do this is to have a supplement so that after a grant is over, people who have software generated from the grant that is used by at least 10 external parties without COI would be funded $100K/yr for however many years they are willing to maintain and improve it. Definitions of what this means need to be carefully constructed, of course.
I think that this is a great idea.
A couple of other weird inequities that I’ve found are:
1. It's hard to get permission to spend money on subscription-based software licenses since you won't "have anything" at the end. However, it's much easier to get funding for hardware with time-based locks (e.g., after 3 years the system will lock up and you have to pay them to unlock it). The end result is the same, you can't use the hardware after the time period is up, but for some reason the admin feels much more comfortable about it.
2. It's hard to get funding to hire someone to set up a service to transfer large amounts of data between different places. It's much easier to hire someone to drive out to a bunch of places with a stack of hard drives, manually load the data onto them, and drive back. Even if it's 2x more expensive and would take longer. Why? Again, my speculation is that the higher-ups are just more comfortable with the latter strategy. They can picture the work being done in their head, so they know what they're paying for.
Simple: predictability. With a subscription-based model, admin has to deal with recurring (monthly/yearly) payments, and there is always the possibility that whatever SaaS you choose gets bought up and discontinued. Something you own and host yourself, even if it becomes useless after three years, does not incur any administrative overhead, and there is no risk of the provider vanishing. Also, there are no "surprise auto renewals" or random price hikes.
> 2. It's hard to get funding to hire someone to set up a service to transfer large amounts of data between different places.
Never underestimate the bandwidth of a 40-ton truck filled with SD cards. Joke aside: off-campus buildings especially have ... less than optimal Internet / fibre connections, and those that do exist are often under enough load that it would be unwise to shuffle large amounts of data through them without disrupting ongoing operations.
I've done both, and OOP can also make things worse. Now instead of just doing the calculations in a straightforward procedural fashion anyone who knows the research can understand, you've added a layer of structure to obfuscate it, and that structure may be harder to change if you guessed wrongly about what will be consistent and what won't. Research by its nature needs to be more flexible and will be more unpredictable than industry development. It is far more common to have to go back and reexamine even your most basic assumptions.
Of course a lot of researchers are doing the same things as industry (what should be described as development, and shouldn't be getting research funding), and are certainly doing a much more amateur job of it.
Grant fraud is penalized severely in the US by the way. You can even get a bounty for reporting someone.
I wonder if a whistleblower payout similar to the one that SEC is doing for 1M+ fines (10-30%) would help in cases like this. The host organization would potentially be on the hook as well, so there is going to be a significant incentive to not let that happen (especially with all the associated reputational damage).
I'd say you're confirming the author's theory that writing code is a low-status activity. Papers and citations are high-status, so papers are well refined after the research is "done". Code, however, is not. If the code was considered on the same level as the paper, I think people would refine their code more after they finish the iteration process.
At the same time, you need to consider that such a clean up is only realistically helpful for other people to check whether there are bugs in the original results, and not much else. Reproducing results can be done with ugly code, and future research efforts will not benefit from the clean up for the same reasons I outlined in my previous post.
While easing code review for other people is definitely helpful (it can still be done if one really wants to, and clean code does not guarantee that people will look at it anyway), overall the gains are smaller than what "standard" software engineers might assume. And I'm saying this as a researcher who always cleans up and publishes his own code (mostly just because I want to).
I assumed that most code published could be directly useful as an application or a library. Considering what you're saying, this might be only a minority of the code. In that case, I agree with your conclusion about smaller gains.
Academic code can be really bad. But most of the time it doesn't matter, unless they're building libraries, packages, or applications intended for others. That's when it hurts and shows.
I'm a research programmer. I have a master's in CS. I take programming seriously. I think academic programmers could benefit from better practice. But I think software developers make the mistake of thinking that just because academics use code the objective is the same or that best practices should be the same too. Yes, research code should perform tests, though that should mostly look like running code on dummy data and making sure the results look like you expect.
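To make that concrete, here's a made-up sketch of what I mean by testing against dummy data. The `estimate_slope` function is hypothetical, standing in for whatever analysis step a project actually has; the point is the check at the bottom, not the function itself.

```python
def estimate_slope(xs, ys):
    # Hypothetical analysis step: least-squares slope of y against x.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# The kind of test that fits research code: run it on dummy data with
# a known answer and check the result looks like what you expect.
# y = 2x must recover a slope of 2.
xs = list(range(100))
ys = [2.0 * x for x in xs]
assert abs(estimate_slope(xs, ys) - 2.0) < 1e-9
```

That's not a red/green unit-test suite, but it catches the bugs that actually matter in this context.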
Here's the thing - in industry, this background (quant undergrad + MS, high programming ability, industry experience) is kind of the gold standard for data science jobs. In academic job ladders it's... hmm. By the latest data, MS grads in these fields from top programs are starting at between $120k-160k in industry, and there are very good opportunities for growth.
I actually think that universities and research centers can compete for highly in-demand workers in spite of lower salaries, but highly talented people will not turn down an industry job with salary and advancement potential to remain in a dead-end job.
You're actually making bugs sound like a feature here. I'm pretty sure that if you've gotten impressive results with ugly code, the last thing you want to do is touch the code. If you find a bug, you have no paper.
How many tests would be written for business software if it had only to run for one meeting and then never be looked at again?
I don't have to imagine it, I'm employed in the software industry.
Seriously, nothing you describe sounds any different from normal software development.
Oh, boy, how many times have I heard this working at a startup. There is some truth to it, it's hard to organise code in the first weeks of a new project. But if you work on something for 3+ months, it becomes a matter of making a conscious effort to clean things up.
> To make a concrete example, imagine writing an application where requirements changed unpredictably every day,
Welcome to working with product managers at any early-stage company. Somehow I managed to apply TDD and good practices most of the time. Moreover, I went back to school after 7+ years of developing software full-time. I guarantee that most low-quality research code is the result of a lack of discipline and experience in writing maintainable software.
Bingo! Most research code is written by graduate students who never had a job before, so they do not know how to write maintainable software. You are definitely the exception, as you held a software dev job before going back to school.
That sounds like software development, alright. It takes a while for domain experts to learn that when programmers ask "is X always true/false", they mean that there are no exceptions to that rule.
I would like for researchers to just name variables sensibly. Even that would improve code quality a lot.
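To illustrate with an invented example of the kind of renaming I mean (the data and names here are made up, not from any real project):

```python
# Before (typical throwaway research style): names only the author
# can decode, e.g.
#   a = [r for r in d if r[2] > t]

threshold = 0.5
measurements = [
    (1.0, 2.0, 0.9),   # (value_a, value_b, quality_score)
    (3.0, 4.0, 0.1),
]

# After: the exact same filter, readable without the author in the room.
reliable_measurements = [row for row in measurements
                         if row[2] > threshold]
```

Zero extra effort at write time, and the code documents itself a year later.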
Still, the key problem is that there are zero incentives for researchers to even make their code readable! It does not improve any of the metrics they are judged by.
The real reason is the incentives. Not just are there no incentives to produce good quality code, there are incentives which make people focus on other outputs. Publish or perish means that people put up with technical debt just to get to the next result for the next paper, then do it again and again.
I believe this is true and is fueled by a misconception of what software is in research. Software in research is often akin to experimentalist work in the past. It's tacked onto theoretical work projects as an afterthought and not treated as what it really is: forcing the theory to be tested in a computational environment.
If we start treating research software like experimentalism in the past, we might get a bit more rigor out of the development process as well as the respect it really deserves.
Here's the thing. Sometimes, there's no code - I mean, they'll find something, but nobody can say, with certainty, that it is the code that generated the data or results you're trying to recreate. There's often no data - and by that, I mean, nothing, not even a dummy file so you can tell if it even runs or understand what structure the data needs to be in. No build, no archive history, no tests. And when I say no tests, I'm not talking about red/green bar integration and unit tests, I mean, ok, the code ran... was this what it was supposed to produce?
Many of these projects are far, far more messed up than the intrinsic nature of research would explain - though I will again agree that research code may be unusually likely to descend into entropy.
What then drives the improvement of the code quality is the potential need for continuity and knowledge retention - either in the form of iterative cleaning of the debt or the re-write. This is reliant on the perceived value for the organisation. From this perspective it's more straightforward to get to author's reasons.
Ironically this is also what Occam's razor would demand from good science, so you'd have a win-win scenario, where you both create good software and good research, because you focus on the simplest, most minimal approach that could possibly work.
Simplicity is a nice dream. The realities of research are very often stacked against it.
I've seen research groups drown in their legacy code base.
The issue of juggling too many balls that you describe is one you only face in the first place because the state-of-the-art implementations are so shoddy to begin with.
Research suffers as much as everybody else from feature creep. Good experiments keep the number of new variables low.
And you say it yourself: good experiments change a single variable at a time. So how do you check that a series of potential improvements that you are making is sound?
Although this is a tangent from the above conversation, this isn't actually true: well-designed experiments can indeed change multiple variables at the same time. There's an entire field of statistics dedicated to experimental design (google "factorial designs" for more information). One-factor-at-a-time (OFAT) experiments are often the least efficient method of running experiments, although they are conceptually simple.
See the following article for a discussion:
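As a toy sketch of the idea (the factors and levels here are invented for illustration):

```python
from itertools import product

# Full 2^3 factorial design: three factors, each at a low/high level.
# Every combination is run once, so main effects and interactions can
# be estimated from a single batch of 8 runs, instead of varying one
# factor at a time and needing many more runs for the same information.
factors = {
    "temperature": [20, 80],
    "pressure": [1, 5],
    "catalyst": ["A", "B"],
}

runs = [dict(zip(factors, levels)) for levels in product(*factors.values())]
print(len(runs))  # 8 runs cover all combinations
```

Real designs (fractional factorials, response surfaces) get more sophisticated, but this is the basic shape of "change multiple variables at the same time, on purpose".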
A Google search makes it look like Julia has a mechanism where you can extend the set of overloads of a function or method outside the original module. The terminology is different (functions have methods instead of overloads in their speak). I don't see how that feature solves the problem in practice.
Note that Simple != Easy or Naive
Hardcoded structures are potentially exactly the kind of simplicity needed.
What's not simple is a general "this solves everything and beyond" code-base with every imaginable feature and legacy capability.
This is describing infinitely fast and efficient p-hacking (i.e. research that is likely to produce invalid results).
If your assumptions are broken then that should ideally be reported as part of your research.
When you do research, you ideally start out with fixed assumptions, and then test those assumptions. The code required to do this can be buggy (and can therefore get fixed), and you can re-purpose earlier code, but the assumptions/brief shouldn't change in the middle of the coding it up.
If you aren't following the original brief, you've rejected your original research concept and you're now doing a different piece of research than you started out - and this is no longer a sound piece of research.
Research should be highly dissimilar to a web design project in this respect.
The reason these projects often become a tangled mess is that researchers don't have the coding skill to program any other way (in my opinion), nor do institutions invest sufficiently in people who do have this skill.
How did it become such a mess in the first place? Simple - I didn't know my requirements when I started writing it. I built it to do one thing. In running it I learned more things (this is good; it's why you build stuff like this in the first place). The code changed rapidly to accommodate these lessons.
It wasn't long before I was running into limitations in the design of the underlying libs I was using etc. Of course I could find a way to make it work but it wasn't going to win any Software Design Awards.
I'm happy to report that despite ending up a tangled mess, it actually helped me come to understand and conquer a very specific kind of problem. In doing so I learned the limitations of commercially available tooling, the limitations of commercially available data, not to mention a great deal about the problem domain itself.
This research software has earned its keep and is now being cleaned up into a more organized, near-commercial-quality kind of project. I'm glad I threw out "architecture" when I first started with this. It could have gone the other way, where I had a very well-built piece of code that didn't in fact perform any useful function.
The spaghetti monster looms large when you're in the heat of battle. But we've all got some idle time for whatever reason. I spend some time every week doing a couple of things: 1) Reading about good techniques. 2) Working through old code and cleaning it up.
Because changing your code could always break it, refactoring also reinforces the habit of writing code that can be readily tested -- also a good thing.
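A small made-up example of what "readily tested" looks like in practice (the function is hypothetical, extracted from an imagined longer script):

```python
def normalize(values):
    # Pulled out of a longer script during cleanup: small, pure
    # functions like this can be checked before and after a refactor.
    lo, hi = min(values), max(values)
    if lo == hi:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# The safety net that makes refactoring cheap: if a later rewrite
# changes behavior, these checks fail immediately.
assert normalize([0, 5, 10]) == [0.0, 0.5, 1.0]
assert normalize([3, 3, 3]) == [0.0, 0.0, 0.0]
```

The habit loop feeds itself: code that's easy to test tends to be structured into small pieces, which in turn are easier to clean up next time.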
I used to write some crazy spaghetti code as an untrained student working in a lab. Coding would go really quickly at first, but as I kept adding on to accommodate new requirements it became a huge kludgy mess.
Recently (after quite a few years of software engineering experience) I helped a researcher friend to build some software. He was following along with my commits and asked why I kept changing the organization and naming of the code, pulling things out into classes, deleting stuff that he thought might be needed later, etc. He spends only a small part of his time writing code, so he's never realized how much time it actually saves to keep things organized and well-factored.
What are the features of Smalltalk that allowed this to happen? Conversely, what is stopping this from existing in more modern dynamic languages?
A lot of my code interacts with hardware configurations that will cease to exist when a project is done, but I mainly look at the stuff that's potentially reusable, and making it worth re-using.
I'm using Python, and there are a lot of tools for enforcing coding styles and flagging potential errors. I try to remove all of the red and yellow before closing any program file. I don't trust myself with too much automation! "Walk before you run."
Refactoring is kind of subjective, because there is rarely One Right Way to solve a problem, and you need context, so I could see why it’s not something that languages themselves take strong opinions on.
This [first] system acts as a "pilot plant" that reveals techniques that will subsequently cause a complete redesign of the system.
However, in practice I'm not confident enough in my understanding, and fear losing all that hard-won work, so I refactor too.
A rewrite from scratch is probably more viable when the project is small enough to keep in your head at once.
I work in computational biology, and my normal thought process is that by default, you should expect to write the code three times (especially for less experienced developers).
The first time, you don’t know the problem.
The second time you’ve figured out the problem, but don’t know the best way to do it.
The third time, you’ve figured out the problem and a decent strategy to solve the problem.
With more experience, you can narrow that to just two iterations. But really, especially with research, you rarely have a good feel for the problem domain the first time around. And when you have the expectation that you’re going to throw the code away, you don’t get quite as hung up on implementation details for the first two rounds. And because of that, the process is easier. And you don’t have to worry about refactoring bad code. Just accept the first round as an experiment and take what you’ve learned about the problem to write better the next time.
Rapid, iterative prototyping, followed by refactoring, is a perfectly reasonable approach today. No need to create a fresh repository and rewrite all code from scratch.
David Heinemeier Hansson, creator of Rails and a big advocate of building working code as early as possible, wasn't even born in 1975 when The Mythical Man-Month was written. Linus Torvalds was a (presumably) plucky 6-year-old. Brooks wrote that book for an audience that would have known waterfall as the only way.
"This I now perceived to be wrong, not because it is too radical, but because it is too simplistic. The biggest mistake in the 'Build one to throw away' concept is that it implicitly assumes the classical sequential or waterfall model of software construction."
It's the part that is a tangled Gordian Knot that is easier to cut than meticulously unravel.
"Starting with a clean slate" is a common idea in many contexts. One very similar to code is writing. You can hack and edit, but a fresh draft is easier (and safer) for fundamental conceptual and structural changes.
BTW "Waterfall" was a parody of processes, though the conceptual aspects (requirements vs specifications etc) are useful. People then were just as intelligent as today. Maybe more.
"Good architecture is the one that allows you to change."
1. Often when I read software written by professional programmers I find it very hard to read because it is too abstract; almost every time I try to figure out how something works, it turns out I need to learn a new framework and API. By contrast, research code tends to be very self-contained.
2. When I first wrote research software I applied all the programming best practices and was told these weren't any good; it turns out using lots of abstraction to increase modularity makes the code much slower (this is language-dependent, of course).
3. You will find it much harder to read research code if you don't understand the math and science behind it.
> many of those writing software know very little about how to do it
This is just not true. I've found in my experience that people writing research software have a very specific skillset that very, very few industry programmers are likely to have. They know how to write good numerics code, and they know how to write fast code for supercomputers. Not to mention, correctly interpreting the numerics theory in the first place is not a trivial matter either.
On one hand, academics I've worked with absolutely undervalue good software engineering practices and the value of experience. They tend to come at professional code from the perspective of "I'm smart, and this abstraction confuses me, so the abstraction must be bad", when really there's good reason to it. Meanwhile they look at their thousands of lines of unstructured code, and the individual bits make sense so it seems good, but it's completely untestable and unmaintainable.
On the other side, a lot of the smartest software engineers I've known have a terrible tendency to over-engineer things. Coming up with clever designs is a fun engineering problem, but then you end up with a system that's too difficult to debug when something goes wrong, and that abstracts the wrong things when the requirements slightly change. And when it comes to scientific software, they want to abstract away mathematical details that don't come as easily to them, but then find that they can't rely on their abstractions in practice because the implementation is buried under so many levels of abstraction that they can't streamline the algorithm implementation to an acceptable performance standard.
If you really want to learn about how to properly marry good software engineering practice with performant numerical routines, I've found the 3D gaming industry to be the most inspirational, though I'd never want to work in it myself. They do some really incredible stuff with millions of lines of code, but I can imagine a lot of my former academia colleagues scoffing at the idea that a bunch of gaming nerds could do something better than they can.
Your definition of "smartest software engineers" is the opposite of mine. In my view, over-engineering is the symptom of dumb programmers. The best programmers simplify complex problems; they don't complicate simple problems.
This is certainly a lot of work, and it takes a lot of practice to do efficiently: but no matter what, I comment every single line of code, no matter how mundane it is. I also cite my sources in the comments themselves, and I have a bibliography at the bottom of my code.
I organize my code in general with sections and chapters, like a book. I always give an overview for each section and chapter. I make sure that my commenting makes sense for a novice reading them, from line-to-line.
I do not know why I do this. I guess it makes me feel like my code is more meaningful. Of course it makes it easier to come back to things and to reuse old code. I also want people to follow my thought process. But, ultimately, I guess I want people to learn how to do what I have done.
Writing long descriptions in comments works if you're the only one editing the code, or you supervise all contributions... in a fast-changing industrial codebase, those things go out of date very quickly, so comments are used more sparsely. I document the usage of any classes or functions that my package exports, and I'll write little inline comments explaining lines of code whose purpose or effect is not obvious. Mostly I just try to organize things sensibly and choose descriptive names for variables and functions.
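A made-up sketch of that split between documented exports and sparse inline comments (the function itself is hypothetical):

```python
def rolling_mean(values, window):
    """Return the rolling mean of `values` over `window` samples.

    Exported helpers get a docstring describing usage and edge cases;
    the body only gets comments where the intent isn't obvious.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    # Non-obvious detail worth an inline comment: the first window-1
    # positions have no full window, so the result is shorter than
    # the input by window-1 elements.
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]
```

The docstring survives refactors because it describes the contract, not the implementation; the inline comment sits right next to the line it explains, so it's more likely to be updated with it.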
You say that you don't see it having much "difference with regard status and salary". The problem here is two-fold. Firstly, salaries at UK universities are set on a band structure and so an RSE will earn a comparable amount to a postdoc or lecturer. These aren't positions that are known for high wages and historically the reason that people work in research is not for a higher salary.
As for status, I can see that the creation of the Research Software Engineer title (since about 2012) has done great good for improving the status of people with those skills. Before they were "just" postdocs with not many papers but now they can focus on doing what they do best and have career paths which recognise their skills.
My role (at the University of Bristol - https://www.bristol.ac.uk/acrc/research-software-engineering...) is focused almost entirely on teaching. I'm not trying to create a new band of specialists who would identify as RSEs but rather provide technical competency for people working in research so that the code they write is better.
There is a spectrum of RSEs, from primarily research-focused postdocs who write code to support their own work, to full-time RSEs whose job is to support others with their research (almost a contractor-type model). We need to have impact all the way along that spectrum, from training at one end to careers and status at the other.
For more info on the history of the role, there's a great article at https://www.software.ac.uk/blog/2016-08-17-not-so-brief-hist... written by one of the founding members of the Society of Research Software Engineering.
Yep, that's the one liner right there.
The incentives simply do not match the complaints. Researchers already work upwards of 60 hrs/wk on most occasions. Alongside writing code, they also have to do actual research, write papers, give talks and write grants.
All of the latter tasks are primary aspects of their jobs and are commensurately rewarded. The only situation where a well coded tool is rewarded, is when a package blows up, which is quite rare.
Like all fields, the high-level answer to such questions is rather straightforward. The individual contributors align their efforts to the incentives. Find a way to incentivize good research code, and we will see changes overnight.
While others here point out that "researchers = bad programmers" is a lazy excuse, I think it is important to point out just how steep the learning curve of computer environments can be for the layperson who uses Excel or MATLAB for all their computational work. It can be a huge time investment to get started with tools, such as git or Docker, that we take for granted. I think recognizing this dearth of computer skills is a first step toward training researchers to be computer-competent. Currently, I find the attitude among academics (especially theorists) to be dismissive of the importance of such competencies.
The code itself is a ton of munging and then some basic stat functions. This information can be gleaned from the methods section of the article anyway.
So, really, my field of public health doesn't use GitHub or sharing much, there's simply too little benefit to the researcher to share their code.
There's an unwarranted fear of getting your work poached. In modern science, publications are everything, they determine your career. Enabling your direct competitors, those who want the same grants and students and glories, is not common in science.
Although I agree with your analysis that enabling competitors in science is not common, it really, really should be. That’s kinda the point of publication, at least in theory. Sharing knowledge and methods.
Said someone whose livelihood doesn't depend on said competition.
I have some research published for which I wrote MATLAB code years ago. I trust the fundamental results but not the values displayed in the tables. I would have personally benefited from rudimentary version control and unit testing.
>My code documents wouldn't be helpful since the data is all locked up due to HIPAA concerns. We're talking names, health conditions, scrambled SSN, this isn't reproducible because the data is locked to those without security clearance.
Is there a standard format for this kind of data? If so, consider using it. That way, others can easily create artificial datasets to test it. Even if you have no control over your data source, you can convert the raw data to the standard as a "pre-munging" step.
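One way to picture that "pre-munging" step, as a rough sketch: a single function that maps raw, source-specific records onto a fixed standard layout before any analysis touches them. The raw keys and the `StandardRecord` fields below are invented for illustration; they are not a real public-health standard.

```python
# Hypothetical "pre-munging" step: convert raw records to a fixed standard
# layout so others can generate artificial data against the same schema.
from dataclasses import dataclass

@dataclass
class StandardRecord:
    subject_id: str   # de-identified study ID, never a real SSN
    condition: str
    age_years: int

def to_standard(raw: dict) -> StandardRecord:
    """Map one raw, source-specific record onto the standard layout."""
    return StandardRecord(
        subject_id=raw["scrambled_id"],
        condition=raw["dx"].strip().lower(),
        age_years=int(raw["age"]),
    )

# With the target schema fixed, anyone can fabricate test data:
fake = {"scrambled_id": "S-0001", "dx": " Asthma ", "age": "42"}
print(to_standard(fake))
```

Everything downstream then consumes only `StandardRecord`s, so the HIPAA-locked source never needs to leave the secure environment for the code to be testable.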
>So, really, my field of public health doesn't use GitHub or sharing much, there's simply too little benefit to the researcher to share their code.
Sad but true.
Yet computer scientists consistently fail to achieve reproducibility with a tool that is the most consistent at following instructions - the computer.
Even private business has embraced the DevOps movement, because they see the positive effects of reproducibility.
If the academic world is truly about science, then there is no more excuse: the tools are out there, and they need to use them.
You need to step back and look at more mature, simple codebases and what you can do in those sorts of environments when you want reproducibility. You can't cobble together a bunch of async services in the cloud and hope your Frankenstein tool gives you perfect results. It will give you good-enough results for certain aspects if you focus on those specific aspects (banking does a good job of this with transactional processing and making sure values are consistent, because it's their entire business; maybe your account or their web interface is screwy, but that's fine, that can fail).
This never ceases to amaze me. I regularly read recent papers on shortest-path algorithms. Each one is religiously benchmarked down to the level of saying what C++ compiler was used. But the code itself is almost never published.
Research code shouldn't be a monolith. Each hypothesis should be a script that follows a data pipeline pattern. If you have a big research question, think about what the most modular progression of steps would be along the path from raw data to final output, and write small scripts that perform each step (input is the output from the previous step). Glue them all together with the data pipeline, which itself is a standalone, disposable script. If step N has already been run, then running the pipeline script once again shouldn't resubmit step N (as long as the input hasn't changed since the last run).
This "intermediate data" approach is useful because we can check for errors each step on the way and we don't need to redo calculations if a particular step is shared by multiple research questions.
I was taught this by a good mentor and I've been using this approach for many years for various ML projects and couldn't recommend it more highly.
I once looked over a friend's PhD code because the results were unstable. I knew nothing about the domain, which was a large disadvantage, but on the code front it was a monolith following a vague data-pipeline approach. Unfortunately, the components wouldn't run separately, and there was only a single end-to-end test taking hours to run. Had each section had its own tests, diagnosing which algorithm(s) were malfunctioning would have been easier. We never did diagnose it.
I'd settle for just publishing the code at all, even if it is a tangled mess. This is still not all that common in the natural sciences, though I have a bit of hope this will change.
And if the tools and methods you used for arriving at them are so messy that you dare not publish them what does that tell me about:
- your process;
- the organisation of your ideas;
- the conclusions or points made in the paper?
I don't mean it has to be idiomatic well written code, but it should be readable enough to be followed.
The recent example of Citibank's loan payment interface comes immediately to mind. So does Imperial's Covid model (the one that had timing issues when run on different computers).
I worked closely with an NLP researcher for a while on a project that had received a hefty state grant. She knew more or less what her team needed, but she needed someone to implement it cleanly and in a way that would not make users step on each other's toes.
The chances of that project being a buggy mess would have been pretty high if it had been written by people who don't write software for a living. And maybe that's OK.
The workhorse NIH grant is an R01 with a $250,000/year x 5 years "modular" budget. Most labs have, at most, one. Some have two, and a very few have more than that. This covers everything involved in the research: salaries (including the prof's), supplies, publication fees, etc. Suppose you find a programmer for $75k. With benefits/fringe (~31% for us, all-in), that's nearly $100k/year. If the principal investigator (prof, usually) takes a similar amount out of the grant, there's very little money left to do the (often very expensive) work. In contrast, you can get a student or postdoc for far less--and they might even be eligible for a training grant slot, TAship, or their own fellowship, making their net cost to the lab ~$0.
This would be easy to fix: the NIH already has a program for staff scientists, the R50. However, they fund like two dozen per year; that number should be way higher.
Other mechanisms exist at the NIH--and elsewhere--but NSF (etc.) grants are often much smaller.
Yeah, I totally agree on this part. The academic system does not rely on monetary compensation for its labor; rather, it compensates people with reputation, by getting their names on a paper.
I worked essentially for free for a lab in my spare time for 4 years. They get to the result they want, even if it's built on a shaky foundation, and for basically free (it doesn't cost anything to put a name on a paper). At the end of the 4 years, the dream of getting my name on a paper didn't even pan out (the lab was ramping down and was essentially a teaching research lab by the time I showed up).
Also, I feel personally attacked by the headline. :)
It is for this reason I try to keep my code and models pretty simple, only two or three pages of code (or ideally a single page), and I don’t try to do too many things with one program, and I choose implementations and algorithms that are simpler to implement to make concise code feasible (sometimes at the expense of speed or generality).
For example, discrete optimization research (nurse rostering, travelling salesman, vehicle routing, etc.) is filled with papers where people evaluate their methods on public benchmarks but the code never sees the light of day. There are a lot of state-of-the-art methods that never have their code released.
I'm pretty sure it's like that elsewhere. Machine learning and deep learning for some reason has a lot of code in the open but that's not the norm.
I'd prefer the code to be open first. Once that's abundant then I might prefer the code to also be well designed.
I agree, although lately there's been some effort by academia to make authors publish their code, or at least disclose it to the reviewers.
Several conferences have an artifact evaluation committee, which tries to reproduce the experimental part of submitted papers. Some conferences actually require a successful artifact evaluation to be accepted (see, for instance, the tool tracks at CAV  and TACAS ).
Others, while not requiring an artifact evaluation, may encourage it by other means.
The ACM, for instance, marks accepted papers with special badges  reflecting how well the alleged findings can be reproduced and whether the code is publicly available.
I'm not in academia now, but I started out my career doing sysops and programming in a lab at a medical school and have worked with academics a bit since. I don't do it much because it's basically volunteer work, and it's almost impossible to contribute meaningfully unless you are also well-versed in the field.
Writing good code requires a different mindset; firstly, it requires acknowledging that communication is extremely ambiguous and that it takes a great deal of effort to communicate clearly and to choose the right abstractions.
A lot of the best coders I've met struggle with math and a lot of the best mathematicians I've met struggle with writing good code.
It does not make sense to judge every piece of code that falls short of the highest standard as a tangled mess.
There are valid reasons for varying code quality, and the notion of quality itself changes from problem to problem and project to project.
The quality of the code that governs your car's ECU should be different from the quality of code that some research team threw together to demonstrate an idea.
A coding project should achieve some goal or set of goals as efficiently as possible, and in many valid cases quality is just not high on the list, for good reason.
Right now I am working on a PoC to verify an idea that will take much longer to implement fully. We do this because we don't want to spend weeks on development just to see that it doesn't work or that we want to change something. So spending 2-3 days to eliminate a significant part of the risk of the rest of the project is fine. It does not need to be spelled out that the code is going to be incomplete, messy, and maybe buggy.
There is also something to be said for research people to be actually focusing on something else.
Professional developers focus their careers on a single problem -- how to write well (or at least they should).
But not all people do. Some people actually focus on something else (physics maybe?) and writing code is just a tool to achieve some other goals.
If you think about people working on UIs and why UI code tends to be so messy, this is also probably why. Because these guys focus on something else entirely and the code is there just to animate their graphical design.
Annoyingly, more people now know of me due to those pieces of software than for my research agenda. :-(
(Not the only thing, mind you.)
The same thinking should be used when adding regulation to an industry. Heavy regulation on a rapid developing industry can stifle innovation. Regulation (if needed), should be applied as our understanding of the industry increases.
In the small, this isn't different from taking a lab notebook and making it clearer and better summarized so that it can be passed on to the poor sucker who has to do what you did after you move on to another project.
Furthermore, software projects that are put under the same iterative stress you imply for R&D inevitably go through a refactoring phase so that performance isn't affected in the long run.
We don't expect aerospace/mechanical engineering students to learn metalworking; they typically have access to shop technicians for that work. Why not persuade university administrators to similarly invest in in-house software engineering talent: generalists who can provide services to any problem domain, from digital humanities to deep reinforcement learning?
You'd be surprised, but that is often not the case. Lack of sufficient funding, or technicians being dicks, or mis-management by PIs, often result in graduate students having to do the technical work of metalwork, welding, lab equipment calibration, and a bunch of other tasks. Sometimes they even have to operate heavier machinery, or lasers etc without the minimum reasonable technical staff support.
I know this from my time on the executive committee of my old university's Grad Student Organization.
Umm...we sorta do.
As a neuroscience postdoc, I have done virtually everything from analysis to zookeeping, including some (light) fabrication. We outsource really difficult or mass-production stuff to pros, and there's a single, very overworked machinist who can sometimes help you, but most of the time it's DIY.
Also, a question. If you publish a paper with a repo, what would be the best way to handle the version in the paper matching the repo in the future?
An opinion: there is such a thing as software being 'done' and 'as is'. Software solves a need. After that's met, that's it.
There’s also this part that strikes me,
>Given a tangled mess of source code, I think I could reproduce the results in the associated paper (assuming the author was shipping the code associated with the paper; I have encountered cases where this was not true).
And it strikes me as weird. The main issue to reproduce results is usually data. And depending on the dataset, it’s very hard to get. To be able to reproduce the code, I just need the paper.
The code may have bugs, may stop working, may be in a different language/framework. The source of truth is the paper. This is why the paper was published.
Speaking as someone who's not the best at math, I find it easier to understand what a paper is saying after I run the code and see all the intermediate results.
When the code doesn't work, it takes me 20 times longer to digest a paper. They could do with only uploading code -- to me it's the shortest and most effective way to express the ideas in the paper.
As long as you understand the paper after, that's okay.
> When the code doesn't work, it takes me 20 times longer to digest a paper.
What if the data isn't available? That's another issue. I see where you're coming from, but that's why the paper itself is the source of truth. Not the implementation.
Another case, what if the implementation makes assumptions on the data? Or on the OS it's being run on?
> They could do with only uploading code -- to me it's the shortest and most effective way to express the ideas in the paper.
In my opinion, no. The math and algorithm behind it is more important than an implementation and better for longevity.
You can include the hash of the commit used for your paper.
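A couple of git commands cover it. The tag name below is just an example, and this sketch sets up a throwaway repository so it runs anywhere; in practice you would run the last two commands inside your actual analysis repo.

```shell
# Demo in a throwaway repository.
cd "$(mktemp -d)" && git init -q .
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "analysis as submitted"

# Cite this full 40-character hash in the paper:
git rev-parse HEAD

# Pin it with an annotated tag so readers can check out that exact state later:
git tag -a paper-2021 -m "code state used for the published results"
```

An annotated tag is worth the extra step over a bare hash in the text: it survives branch churn and gives readers a `git checkout paper-2021` target by name.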
Yes, although truth of the flimsiest kind. A lowly but wise code monkey once said, "Talk is cheap. Show me the code."
- writing general purpose software that works on multiple platforms and is bug free is really really hard. So you're just going to be inundated with complaints that it doesn't work on X
- maintaining software is lots of work. Dependencies change, etc.
- supporting and helping an endless number of noobs use your software is a major pita. "I don't know why it wouldn't compile on your system. Leave me alone."
- "oh that was just my grad work"
- it's hard to get money to pay for developing it further. Great when that happens, though.
Researchers use code as a tool of thought to make progress on very ambiguous, high-level problems that lack pre-existing methodology. Like, how could I detect this theoretical astrophysical phenomenon in this dataset? What would it take to predict disease transmission dynamics in a complex environment like a city? Could a neural network leveraging this bag of tricks in some way improve on the state-of-the-art?
If you have JIRA tickets like that in your queue, maybe you can compare your job to that of a researcher.
There's a certain naivety in this process: SMEs think that translating their research into software is a trivial step. It's not, not if you demand the rigor science should be operating at. As such, many budgets are astronomically lower than they should be. This has worked in the past, but as more science moves into software and software becomes more critical to the process, you must invest in it, and it's not going to be cheap. The shortcuts taken in the past won't cut it.
There's a bigger issue in that, as a society, we don't want to invest in basic research, so it's already cash-strapped. Combine that with research scientists who already have to cut corners, plus the massive cost that quality software entails, and you're creating a storm where science will either produce garbage or we'll need to reevaluate how we invest in software systems for science.
The software industry has its own share of problems, but from what I've seen the research community is still largely operating on an outdated software model that shuns open collaboration out of fear of being "scooped".
If measured by compensation, then research is a low status activity. Perhaps more precisely, researchers have low bargaining power. But I don't think that academics actually analyze activities in such detail. The PI might not even know how much programming is being done.
The researcher is programming, not because they see it as a way to raise (or lower) their status, but because it's a force multiplier for making themselves more productive overall. Though I work in industry, I'm a "research" programmer for all intents and purposes. I program because I need stuff right away, and I do the kind of work that the engineers hate. Reacting to rapidly changing requirements on a moment's notice disrupts their long term planning. Communicating requirements to an engineer who doesn't possess domain knowledge or math skills is painful. Often, a working piece of spaghetti code that demonstrates a process is the best way to communicate what I need. They can translate it into fully developed software if it threatens to go into a shipping product. That's a good use of their time and not of mine.
>>> Why would a researcher want to invest in becoming proficient in a low status activity?
To get a better job. I sometimes suspect that anybody who is good enough at programming to get paid for it is already doing so.
>>> Why would the principal investigator spend lots of their grant money hiring a proficient developer to work on a low status activity?
Because they don't know how to manage a developer. Software development is costly in terms of both time and effort, and nobody knows how to manage it. Entire books have been written on this topic, and it has been discussed at length on HN. A software project that becomes an end unto itself or goes entirely off the rails can eat you alive. Finding a developer who can do quantitative engineering is hard, and they're already in high demand. It may be that the PI has a better chance managing a researcher who happens to know how to translate their own needs into "good enough" code than managing a software project.
Come to think of it, something like UTHERCC might be exactly what is needed to help the current situation.
Also, source docs available here: https://zenodo.org/record/4005773?fbclid=IwAR1JGaAj4lwCJDrkJ...
And, their solution product https://cknowledge.io/ and source code https://github.com/ctuning/ck
I guess it should be helpful to the researchers community.
To be clear, novel computer science is valuable and the lifeblood of the software engineering industries. But the actual product? I discovered about myself that I like quality code more than I like novel discovery, and the output of the academic world ain't it. Examples I saw were damn near pessimized... not just a lack of comments, but single-letter variables (attempting to represent the Greek letters in the underlying mathematical formulae) and five-letter abbreviated function names.
I walked away and never looked back.
If there's one thing I wish I could have told freshman-year me, it's that software as a discipline is extremely wide. If you find yourself hating it and you're surprised you're hating it, you may just be doing the kind that doesn't mesh with your interests.
I gained multiple years of industry software engineering experience before joining academia (non-CS, graduate-level). And I was flabbergasted at the way software and programming are treated in a research setting where the "domain" is not CS or software itself. It took me a few years just to get a hint of what on earth these people (my collaborators who program side-by-side with me) are thinking, and what kind of mindset they come from.
Then I took a short break and went to the industry. Software engineering, hardcore CS; no domain, no BS. I was expecting that it would feel like an oasis. It didn't. Apart from a handful of process improvements, like use of version control, issue tracking, deadline-management, the quality of the tangled mess of the code was only slightly better.
Initially I took away the lesson that it's the same in academia and industry. But on further reflection there are two big differences:
- The codebase I worked on in the industry was at least 10x bigger. Despite that, the quality was noticeably better.
- More importantly, I could connect with my coworkers in industry. If I raised a point about some SwE terminology like test-driven development, agile, git, whatever, I could have a meaningful discussion. Whereas in academia, not only did most domain experts know jack about 90% of software-engineering concepts and terminology, they were experts at hiding their ignorance, and would steer the conversation so that you couldn't tell whether they really didn't know or knew too much. I never got over that deceitful ignorance mixed with elitist arrogance.
In the end, I do think that, despite enormous flaws, the industry is doing way better than academia when it comes to writing and collaborating on software and programming, and that the side-by-side comparison of actual codebases is a very small aspect of it.
One exception is the most basic stuff: people should use version control, do light unit testing, and explicitly track dependencies. These weren't really done in the past but are becoming more and more common, fortunately.
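For a sense of scale, "light unit testing" can be as small as a few assertions on a helper function, runnable with the standard library alone. The function here is purely illustrative, not from any particular project.

```python
def zscore(xs):
    """Standardize numbers to mean 0 and (population) standard deviation 1."""
    n = len(xs)
    mean = sum(xs) / n
    sd = (sum((x - mean) ** 2 for x in xs) / n) ** 0.5
    return [(x - mean) / sd for x in xs]

def test_zscore():
    zs = zscore([1.0, 2.0, 3.0])
    assert abs(sum(zs)) < 1e-9             # standardized values sum to ~0
    assert abs(zs[1]) < 1e-9               # the mean maps to 0
    assert abs(zs[2] - 1.5 ** 0.5) < 1e-9  # (3 - 2) / sqrt(2/3) == sqrt(1.5)

test_zscore()
```

Even this much would have caught the kind of table-value bugs mentioned elsewhere in the thread, and it costs minutes, not days.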
I think if software engineering experts actually sat down, looked at how researchers work with computers, and figured out a set of practices to follow that would work well in the research context, they could do a lot of good. This is really needed. But the standard software engineering advice won't work as it is, it has to be adapted somehow.
However, RSEs (or just general software training) may help research groups establish a structure on how to format code, put some standards in place, and at least have some basic tests. This way, more people can read/modify the code efficiently (more = not necessarily general public, but it at least helps incoming grad students/postdocs to pick up the project easily).
Back in the 70s my dad was working for an organization called UTHERCC, the University of Texas Health, Education, and Research Computer Center, and these libraries were some of the code he worked with.
You can find references to UTHERCC in papers from the time, although I don't think it exists under that name. Maybe institutions need something like UTHERCC as an ongoing department now.
Props to Geoff for setting a nice standard.
Things like Nix worked out great, but other stuff I saw was a tangled mess of Java grown over the last 10 years, written by 30 different students who didn't talk to, let alone know, each other.
> writing software is a low status academic activity
This is just not true. People like: Stallman, Knuth, Ritchie, Kernighan, Norvig or Torvalds are not considered as people of low status in the academic world.
Writing horrible spaghetti code in academia may be considered "low status"; but that's another story.
He should compare apples to apples, i.e., do people who work in academia write better or worse code there than when they work for a business? They should be compared to themselves in different situations, not to some imaginary high coding standard that I've never seen anywhere.
In my own experience from academia, at least, I'd say that the lack of deadlines, the freedom to do whatever I want, and the lack of management create much higher quality software. When you work commercially, you will churn out embarrassing stuff just to make something work before a deadline.
I understand the meaning of 'academic software developers' to mean 'software developers that assist in building software for other, non-CS, fields of research', but you only mention people famous within CS. I don't think this article is meant to apply to CS.
I guess that's true for academia in general, i.e. they consider anything but their own field as a joke.
When your code works, you have probably already developed the main part of your paper, and you would have no incentive to improve your program if what you want is just the publication... At least this is what I think.
For those types of papers I agree with your statement. But in many academic scenarios others will want to inspect the source, and the quality of that code is certainly something that will add or subtract from your "status" in the academic world so to speak :-)
In terms of software this never made much sense to me. I would understand if we were talking about chemistry or some other discipline where a "new idea" has to be investigated/verified by some "lab-rat" doing mundane tasks for 2 years. In that case the lab-rat would probably get less credit than the person with the actual idea, but this just does not apply to software. Developers are not doing mundane tasks on behalf of some great thinker.