The Recomputation Manifesto (software.ac.uk)
110 points by ajdecon on July 16, 2013 | 59 comments

Going to bang a small drum here and proclaim:

When doing science, don't use a tech stack that will be obsolete next year or in 3 years. Use something that will be AROUND; something that is well defined, something that other people can use down the road on a different system.

Virtual machines are "okay". Running them requires having the hypervisor working correctly and - hopefully - the original architecture supported.

Personally, I used Common Lisp for my Master's work. Other examples of tech stacks that are very stable are C, C++, and Fortran.

For certain research problems, different and often obscure tool sets are required. That, to me, is a great argument behind recreating everything your environment needs in a VM - unlike Chemists who can't bundle up their lab into a little transportable box, Computer Scientists can.

In making the argument for recomputation to aid its adoption, it's important not to add too many constraints on the researcher, e.g. "Your results should be reproducible and written in C because it should still be around ten years from now."

Even an obsolete language in a working VM sandbox can be edited and inspected to make changes, verify correctness etc. If you want to extend the results of that research you may have a porting project to update the code to a more current language, but that's leaps and bounds ahead of how things are now: read and re-read the paper until it makes sense and implement the algorithms from scratch.

Agree with your hypervisor/architecture point but I suppose there has to be some broad architectural choice made.

(The author of this article was my PhD supervisor. I've always found him to be an inspiring, forward thinking researcher so as much as I love the recomputation idea on merit, I'm also a little biased.)

unlike Chemists who can't bundle up their lab into a little transportable box, Computer Scientists can

This is often a good thing for scientific progress in chemistry, though. Independent reproduction in independent labs gives a lot more confidence in results than shipping the exact same lab around would. A lot of interesting results come up due to issues in replication, which you wouldn't find if you just literally reran the identical experiment on the identical equipment.

That said, being able to look at and experiment with someone's original apparatus is better than nothing.

> read and re-read the paper until it makes sense and implement the algorithms from scratch.

More often than not to realize that some nasty detail got omitted because it was nasty.

To me, this is one of the reasons why it's good for people to do independent re-implementations. Academia doesn't value these highly enough.

Recomputation and reproduction are both valuable

I agree completely. From my experience, because of things like the nasty omitted details mentioned in the parent comment, techniques are rarely reimplemented unless they claim pretty substantial improvements.

I think having a starting point of reproducibility would lead to more reimplementations.

How are all these VMs going to react if people switch, say, to desktops with ARMv8, or whatever comes next?

I can emulate a PDP11, a Nintendo Gameboy, or any other old system.

There are perfectly functional x86 emulators for ARM already. Many people used to run Windows on PPC Macs via emulation too.

Emulating a Gameboy is fine, but emulating a large-scale scientific computation may be just a little bit more demanding, I'm afraid. :/

It will be slower, but I don't think unusable. Actually, comparing with the Game Boy (and other consoles) is not really a good idea, as they are systems where getting all the processors exactly in sync is important and difficult. Emulating a PC is much easier, as no one expects to get exactly equal speeds out of processors, since every PC model is slightly different.

As I say, I (and many other people) emulated x86 windows on PPCs for years. I ran visual studio and it dragged a little, but was fairly usable. Certainly it won't be as fast, but then again by the time x86 has died and people have moved, hopefully systems will have got fast enough to make up the difference!

In the gaming world it seems like it's always hard to emulate the last generation, and quite easy to do the ones before that. (E.g. Xbox consoles not emulating their predecessors.)

It's not my area of expertise but I expect to see something similar here. If you need to emulate hardware you might take a 10x hit, which might be unacceptable for very large scale computations. But in the longer run that might not be a problem as more resources become available.

>Agree with your hypervisor/architecture point but I suppose there has to be some broad architectural choice made.

There's a risk that the VM platform decision could date and, ultimately, render useless an experiment sandbox.

However, two points. Firstly, any attempt to wholly encapsulate an experiment will be susceptible to problems concerning the technology choices made. By choosing the abstraction at the VM level, you're likely to get more years out of your choice than by picking specific programming languages (which I contend would fail to gain adoption, as researchers wouldn't change their obscure toolkit if they felt it was the right one for their research).

Secondly, even a VM for an obsolete architecture is better than what we have today - the original paper and perhaps an email address for the lead author.

Just use Qemu. It can run a number of CPU and architectures and is cross-platform. Link: http://wiki.qemu.org/Main_Page
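As a rough illustration of the QEMU route, the sketch below only assembles a command line for booting an archived experiment image; it does not start QEMU. The image name `experiment.qcow2`, the 2048 MiB of memory and the choice of `qemu-system-x86_64` are assumptions, and `-nographic` only gives a usable console if the guest is configured to use a serial console.

```python
import shlex
import shutil


def qemu_command(image_path: str, memory_mb: int = 2048, arch: str = "x86_64") -> list:
    """Build (but do not run) a QEMU command line for an archived disk image.

    The image path, memory size and architecture are illustrative assumptions.
    """
    return [
        f"qemu-system-{arch}",                        # one QEMU binary per emulated architecture
        "-m", str(memory_mb),                         # guest RAM in MiB
        "-drive", f"file={image_path},format=qcow2",  # the archived experiment disk
        "-nographic",                                 # serial console instead of a display window
    ]


if __name__ == "__main__":
    cmd = qemu_command("experiment.qcow2")
    print(shlex.join(cmd))
    if shutil.which(cmd[0]) is None:
        print(f"{cmd[0]} is not installed on this machine")
```

Keeping the command in a small script next to the image means the boot recipe is archived with the experiment, not just the disk.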

Pulling the long-time-portability lever one mark further, I suggest Bochs. It's slow because it doesn't use Qemu's dynamic recompilation, but on the other hand, that simplicity makes the job of future archaeologists easier.

I fundamentally agree--yet, at the same time, it'd be so nice if a paper could just point to a URL for a Docker container, and you could pull it and immediately execute the program.

However, I fear that a Docker container in 2023 will be as useful as a Plex86 VM image today...

The issue isn't really software then, it's hardware obsolescence.

That example is still completely in the realm of software obsolescence. The LXC API may not be around in 2023's Linux kernels any more, just like Plex86 has been abandoned (and I doubt that Plex86 runs unchanged on a modern x86_64 machine with a 3.x kernel).

But hardware obsolescence is a valid issue, too. That's why I suggested Bochs as a lowest common denominator in another comment.

Why are you limited to using the 2023 version of the kernel? Are they going to erase the 2013 version of the kernel in 2018?

No, but maybe your Apple GeForce 5500 graphics drivers and the drivers for the ansible network haven't been backported, so you'd have to run headless and offline. Or in another layer of VM, or two if the VM's virtual hardware is still too new for the old kernel. At some point, you'd just start to believe unseen that the paper's authors were probably right.

The point is not for everyone to recompute everything, but to ensure that it remains possible, and with known, well defined hardware platforms capable of running a specific bundle of software, it will likely remain possible for much longer than anyone cares about the specific papers.

The point is not that everyone will recompute everything, but to ensure that it remains possible.

I am working on a startup addressing this exact problem. Docker sounds really exciting although I haven't played much with it yet. Limmeau's concern below is a valid one. With the popularity of cloud computing centered around VMs, it seems like a wise choice to at least offer a VM as a worst-case option for recomputing an experiment.

As mentioned in the article, performance should be secondary when thinking about recomputation. A reasonable solution to the large downloads necessary is to spin up the VM on the server side and allow access to it either through the browser or through a terminal login.

You should definitely give Docker a try. Drop by the IRC channel and you'll find a great community ready to give you some advice.

> Other examples of tech stacks that are very stable are C, C++, and Fortran.

On the scale of a few years, yes. Assuming you don't depend on external libraries at all.

But I've worked on recompiling more than enough C code in the 15-20 year range or older that contained sufficient assumptions - some valid then, some that just happened to work then - that are no longer valid on most common architectures.

If you care about recomputability for <5 years, you can probably get away with "just" shipping a bundle with the code and any library dependencies. If you care about anything more than that, you're on shaky ground very quickly.
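One cheap guard against the kind of silent C assumptions described above is to write them down and check them on the machine doing the recompiling. This hypothetical Python sketch uses `ctypes` to compare the sizes a legacy program was assumed to rely on (here, a 32-bit ILP32 target) with the current platform; it covers only type sizes, not endianness, alignment, or undefined behaviour that merely happened to work.

```python
import ctypes

# Hypothetical: sizes (in bytes) a legacy C program silently assumed,
# e.g. code written for a 32-bit target where int, long and pointers are all 4 bytes.
ASSUMED_SIZES = {"int": 4, "long": 4, "void*": 4}


def current_sizes() -> dict:
    """C type sizes on the platform this Python interpreter was built for."""
    return {
        "int": ctypes.sizeof(ctypes.c_int),
        "long": ctypes.sizeof(ctypes.c_long),
        "void*": ctypes.sizeof(ctypes.c_void_p),
    }


def broken_assumptions(assumed: dict) -> dict:
    """Return {type_name: (assumed_size, actual_size)} for every mismatch."""
    actual = current_sizes()
    return {name: (size, actual[name]) for name, size in assumed.items() if actual[name] != size}


if __name__ == "__main__":
    for name, (want, got) in broken_assumptions(ASSUMED_SIZES).items():
        print(f"{name}: code assumed {want} bytes, this platform has {got}")
```

On a typical 64-bit Linux build, both `long` and `void*` would be reported as mismatches.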

On the other hand, we have javascript emulators for a bunch of 30+ year old architectures. As it happens, it is becoming increasingly clear that targeting hardware stacks is far more stable than trying to re-assemble a software stack.

It's certainly ideal if everyone can manage to future proof their work, but I don't really know how realistic that is for many considering the demands. Learning Fortran is a formidable proposition to anyone who can do their work well in Python. At least we can be optimistic that as the tech gets better and better it will be easier to emulate dated environments with greater resources, but of course it's important that there be some interested party. If the research sphere embraces these sorts of action the pool of interested parties should only grow, and the methods for sustaining the practice should get better. Whether that means some sort of gentleman's agreement or standardization on how to create or maintain the machines I couldn't say, but we could probably do much better.

>Other examples of tech stacks that are very stable are C, C++, and Fortran.

You can probably add Python to that list now too, given how embedded it is in *nix ecosystem and its excellent scientific/numerical libs.

ANSI C is standard, as are its standard libraries, but I guess little C code avoids the use of nonstandard libraries, and some popular libraries are tricky to build even today. And much C code uses compiler-specific extensions.

Hi, this is Ian Gent, author of the Recomputation Manifesto.

Many thanks for all the comments.

If anybody wants to get in touch to work in any way on recomputation, please do! You can find me very easily on google.

Special thanks to @lifebeyondfife, I worked out who you are and you were a pleasure to supervise too. Hope all is going well.

Nice article, and thanks for coming by!

What are your thoughts on how to treat performance benchmarks, or really any claims that one algorithm is "better" than another? Since these are often extremely hardware specific, I've been wondering if instead of a VM it makes sense to offer a full bootable image.

Are there virtual image formats that can be either run within a VM or copied to a USB stick and booted?

I've thought about this quite a lot because people often ask me.

My main comment is that for one algorithm to be judged against another, the more different environments it is tested in the better, and we can get a deeper understanding of its performance profile. If it's always better then that's clear. If it is sometimes better and sometimes worse, either it's not really better, or there's some dependency on the hardware or other environment: but that is an interesting result in itself.

However there is always a place for very hardware specific claims. E.g. "For this chip/motherboard combination this flag is better", and for that we might always have a problem.
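A minimal sketch of the cross-environment comparison described above, assuming a single runtime per algorithm per environment (a real study would want repeated runs and some treatment of noise). The environment names and timings are made up for illustration.

```python
from enum import Enum


class Verdict(Enum):
    ALWAYS_BETTER = "A was faster in every environment"
    ALWAYS_WORSE = "A was slower in every environment"
    MIXED = "the outcome depends on the environment"


def compare(runtimes_a: dict, runtimes_b: dict) -> Verdict:
    """Compare algorithm A against B using one runtime (seconds, lower is better) per environment."""
    shared = runtimes_a.keys() & runtimes_b.keys()
    if not shared:
        raise ValueError("the two algorithms were never run in the same environment")
    wins = sum(runtimes_a[env] < runtimes_b[env] for env in shared)
    losses = sum(runtimes_a[env] > runtimes_b[env] for env in shared)
    if wins == len(shared):
        return Verdict.ALWAYS_BETTER
    if losses == len(shared):
        return Verdict.ALWAYS_WORSE
    return Verdict.MIXED  # includes ties: worth investigating, not a headline result


# Hypothetical timings for illustration only.
a = {"laptop": 12.0, "cluster-node": 3.1}
b = {"laptop": 15.5, "cluster-node": 2.7}
print(compare(a, b).value)
```

A `MIXED` verdict is exactly the "interesting result in itself": it points at some dependency on hardware or environment worth explaining.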

Interesting thought about an image that can either run in a VM or boot directly. Somebody suggested to me making live images (as in live DVDs), which would serve this purpose.

I don't know if you're aware of this initiative regarding reproducible research in networking: https://reproducingnetworkresearch.wordpress.com/about/

It seems to be about providing full code to reproduce experiments using MiniNet.

IIUC, there's a project aiming to solve some of the problems with code dependencies, packaging the data with the code etc:


(The author occasionally hangs out here on HN, but I thought I'd plug this anyway, in case he misses this post.)

Yes CDE looks very good though I haven't investigated in depth. The main limitation I think is that you need to have a recentish linux kernel installed. But if somebody distributes a CDE package then it will probably be easy to run it.

If I may suggest, you should contact the author: he has written that he takes pride in having real-world users, and in adapting the tool to their needs.

Hi Ian,

Without being too cheeky - what's happened to TAILOR?

The Minion project you work on is a classic example of this problem - an important tool to actually encode problems for the solver has disappeared.

In this case it seems it is literally a case of a disappearing post doc and her site disappearing...

This is not being cheeky at all. A very fair point. In general terms one reason I am keen on recomputation is because I can see how bad we have been about things in the past, and I want us to do better.

To answer your concrete question: Tailor has been replaced by SavileRow which serves the same purpose and which is being maintained.


Huzzah - cracking name as well.

I really am going to have to get round to digging out my dissertation work and having a play.

TL;DR: any computer science paper that presents practical work without disclosing source code should not be accepted to any scientific conference or journal.

Agree on all. I spent a few years in computer security research before quitting for industry. I must report that papers presenting some kind of algorithm (I would say the majority) very rarely also provide the source code of the implementation. I have always thought, and argued, that any computer science paper that presents practical work without disclosing source code should not be accepted to any scientific conference or journal.

I know (because I have done it many times) that opening the source takes an incredible amount of time, but it is mandatory for being able to 'stand on the shoulders of giants'. Writing code and keeping it private in research is just nonsense.

Your TL;DR should read "...source code and a virtual environment that allows the process to be repeated..."

From the article:

"There has been significant pressure for scientists to make their code open, but this is not enough. Even if I hired the only postdoc who can get the code to work, she might have forgotten the exact details of how an experiment was run."


"The only way to ensure recomputability is to provide virtual machines"

To that end, the site http://recomputation.org/ is mentioned as a repository for recomputable experiments.

Point being: source code alone does not specify the process or workflow in which it was used.

Sorry... the TL;DR was intended for those who were not interested in reading my whole comment.

Actually, the idea of providing VMs preconfigured to run the test is very good, since it saves time both for those who write the code and for those who want to test it.

> Point being: source code alone does not specify the process or workflow in which it was used.

I completely agree! When we released the code, we spent hours clearly defining the environment where the code had to run.
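Part of that environment description can be captured automatically at run time. The Python sketch below reflects an assumption about what is worth recording rather than any standard format: it stores the exact command line, interpreter, OS and installed package versions as JSON. It says nothing about system libraries or compilers, which is one reason the thread keeps returning to whole VMs.

```python
import json
import platform
import sys
from importlib import metadata


def environment_manifest(argv: list) -> dict:
    """Capture enough about the current run to help someone re-create it later."""
    return {
        "command": list(argv),  # exact invocation, including parameters
        "python": sys.version,
        "implementation": platform.python_implementation(),
        "os": platform.platform(),
        "machine": platform.machine(),
        "packages": sorted(
            f"{dist.metadata['Name']}=={dist.version}" for dist in metadata.distributions()
        ),
    }


if __name__ == "__main__":
    # Print rather than write a file; redirect to e.g. environment.json next to the results.
    print(json.dumps(environment_manifest(sys.argv), indent=2))
```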

For those who may skim the article without reading the actual manifesto, the closing paragraph is rather keen:

> A manifesto is a call that people reading it should vote for your point of view. Don’t vote with a signature or a petition. Vote by making your computational experiments recomputable. Do it at http://recomputation.org, or at your own web site, or at another repository. But make your experiments recomputable.

Full manifesto linked from the article: http://arxiv.org/pdf/1304.3674v1.pdf

Before even reading the article I was thinking to myself "gee, this might actually be one of the best use cases I've heard for vagrant/etc". Turns out that's exactly what this is :)

This is great. I see that the first reference made in the paper is to my joint paper in Nature arguing for release of source code. Even that seems like a radical step too far to some scientists, goodness knows what they'd think about this, but it's a great idea.

Since John seems too modest to link to his own paper, here it is for those who were as curious as I:


Thanks. I've had enough problems getting my own code working a few months later, never mind anyone else's, which is partly where this comes from. But the overall motivation is the same: how can we judge scientific work if we can't examine it statically (open source) and dynamically (recomputation)?

I hope "Recomputability" emerges as a distinct term.

At least in the biological sciences, I'm seeing the term "reproducibility" used a lot where the meaning is much closer to "recomputability", i.e. "you can repeat the exact computational steps we performed" -- without necessarily saying much about either the lab-work and/or sample-collection parts of the project, or the possibility of performing similar analyses using different tools/platforms.

(I'd also like to see a bit more recognition of the importance of full reproduction -- i.e. someone starts with the same hypothesis or idea and does their own experiment -- in modern science).

I do agree ... I'd love "recomputation" and variants to catch on.

I also agree completely about distinguishing between recomputation and a proper reproduction. It's not what I emphasise in the manifesto but it is true that a recomputation doesn't really say anything about the generality of a result.

That's where I see recomputation as not quite pushing the same goals as reproducibility, even though its advocates often couch them as the same goal. Recomputation can be useful, but re-running the exact same code in the same virtual machine isn't really an independent reproduction of the claimed scientific result. That often benefits from not using the original source code; two independently written implementations claiming to implement the same approach and achieving the same results is a much better reproduction.

In the natural sciences, independent reproduction often finds subtle dependencies on the original apparatus that change the interpretation: when lab B tries to reproduce lab A's results on slightly different equipment and can't, it can highlight an unnoticed dependency on some specific feature of the original equipment, and may throw into question the original paper's conclusions. You would never have found that if, instead of lab B independently reproducing the result, lab A just packed up their equipment into a shipping container and shipped it to lab B, who unpacked and ran it unchanged. That's what the VM approach is arguing for, and that's not really reproducibility.

I completely agree. A recomputation of an experiment is not ensuring reproducibility of the scientific result. It's ensuring reproducibility of the individual experiment.

The analogy I have given is with cold fusion. If we could reproduce their exact lab setup then we could find out if the results were real - i.e. were not misread or anything, and assuming they were, have a chance of explaining the anomaly.

But no, it's not the same as reproducibility.

This is fantastic - and a serious challenge.

Recomputability is, to all intents and purposes, the goal of devops and testing. And we are stumbling around at the edges of proving one environment is the same as another.

This is one to watch - hell one to join in

Yes, that is a good point. There is one advantage to recomputability, which is that, at least in the first instance, all that matters is being able to recompute the specific experiment for a paper. So testing in the sense of "it works on other examples" is less critical. But indeed, as you say, there are close links with testing.
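For the narrow goal of recomputing one paper's experiment, the check can be as simple as comparing a digest of the recomputed output with one published alongside the paper. This sketch assumes the output is deterministic and byte-for-byte comparable; floating-point results can legitimately differ across hardware and compilers, in which case a tolerance-based comparison would be needed instead.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 of a file, read in chunks so large outputs don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path, published_hex: str) -> bool:
    """True if the recomputed output is byte-identical to the published one."""
    return sha256_of(path) == published_hex.strip().lower()
```

Usage would look like `matches_published(Path("results.csv"), "<digest from the paper>")`, where the file name is hypothetical.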

imho, he is missing the most important point, and walking in the wrong direction instead.

He is right that science requires recomputation, the ability to verify or falsify results. But recomputation in science is more than just the ability to run the black magic box again. A black magic box makes it worse, because the box might change and fail over time, and it's a black magic VM. Recomputation requires source code that is human readable.

So my suggestion is to use a combination of Gentoo and Linux Containers instead. Gentoo ensures that everything on the machine was built from source code that went through the compiler, and Linux Containers encapsulate the project in a way that a simple backup can preserve it.

Well, I normally prefer Debian because of its lower maintenance cost. But Gentoo could play to its strengths in this edge case.

I think both are important. My real hope, in the long term, is that people will package source in the recomputable VM, and recompile it as part of the recomputation.

However, particularly when academics are gluing together multiple pieces of software, often which are themselves quite fragile, just trying to reassemble working software can be almost impossible.

I have a number of things I wrote myself from when C++11 was first coming out (yes, I possibly shouldn't write software with compilers for unfinished languages, but I like to live on the state of the art). Now C++11 support in gcc is finalised and some corners have been cleaned up, these programs don't compile any more. I know how to fix them, and have. I wouldn't want someone else to have to do that.

Languages change, and so do their compilers. A Gentoo backup would also include the sources of the compiler you used to compile your C++11-pre-beta project.

A VM is fine but at least it should be minimal so you can see what of the 400MB matters. A minimal environment (boot to TeX? Would be good).

For the sample chess problem experiment, we do also provide a tarfile or zip of the experiment directory, which is just a few MB. So if that works in your environment, you're good to go. If not there's the 400MB to fall back on.

Obviously it would be nice to know exactly which bits of that are unnecessary to save space, but for now I'm happy enough to be able to give you something that works.

Sure, just thinking longer term. Dependencies are important to understand for replication. Eg your result might be only due to a dodgy random number stream (say). What do you need to rebuild? What should it be robust to?
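On the random number point, a minimal Python sketch, assuming all of an experiment's randomness flows through one explicitly seeded generator. Recording the seed makes the run recomputable; rerunning over several seeds is a cheap check that the result is not an artifact of one particular stream. Python's `random` documentation also states that `random()` keeps producing the same sequence for the same seed across versions, which is the kind of guarantee worth checking for whatever generator an experiment uses. The experiment itself is hypothetical.

```python
import random
from statistics import mean


def run_experiment(seed: int, n: int = 1000) -> list:
    """Hypothetical stochastic experiment whose only randomness comes from `seed`."""
    rng = random.Random(seed)  # private generator: no hidden dependence on global state
    return [rng.random() for _ in range(n)]


SEED = 20130716  # illustrative; publish it alongside the results

# Recomputation: the same seed gives the identical stream, hence the identical result.
assert run_experiment(SEED) == run_experiment(SEED)

# Robustness: does the conclusion survive other streams, or was it one lucky seed?
means = [mean(run_experiment(s)) for s in range(10)]
print(min(means), max(means))
```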

If you have a working VM, and non-working tarball, you can "binary search" for the right environment.
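The "binary search" idea can be sketched as a bisection over the differences between the working VM and the failing tarball environment. `works_with` is a hypothetical callback that rebuilds the tarball environment plus a given prefix of those differences and reruns the experiment. The sketch assumes a single culprit and monotonic behaviour (once a prefix works, any longer prefix also works), and needs roughly log2(n) reruns for n differences.

```python
from typing import Callable, Optional, Sequence


def first_needed_component(
    diffs: Sequence[str],
    works_with: Callable[[Sequence[str]], bool],
) -> Optional[str]:
    """Find the component from the working VM that makes the experiment run.

    `diffs` lists what the VM has and the tarball environment lacks, in a fixed order.
    Returns None if the tarball environment already works on its own.
    """
    if works_with(diffs[:0]):
        return None  # nothing from the VM is needed
    if not works_with(diffs):
        raise ValueError("even the full set of VM components does not fix the run")
    lo, hi = 0, len(diffs)  # invariant: prefix of length lo fails, prefix of length hi works
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if works_with(diffs[:mid]):
            hi = mid
        else:
            lo = mid
    return diffs[hi - 1]  # adding this component is what turns failure into success


# Hypothetical component names; in practice works_with would rebuild and rerun.
components = ["libgmp", "gcc-4.7", "locale-settings", "python2.7"]
print(first_needed_component(components, lambda prefix: "locale-settings" in prefix))
```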

This should apply equally to papers in Economics as well. The R&R Excel debacle was embarrassing.
