When doing science, don't use a tech stack that will be obsolete next year or in three years. Use something that will be AROUND; something that is well defined, something that other people can use down the road on a different system.
Virtual machines are "okay". Running them requires a working hypervisor and, ideally, continued support for the original architecture.
Personally, I used Common Lisp for my Master's work. Other examples of tech stacks that are very stable are C, C++, and Fortran.
In making the argument for recomputation to aid its adoption, it's important not to put too many constraints on the researcher, e.g. "Your results should be reproducible and written in C because it should still be around ten years from now."
Even an obsolete language in a working VM sandbox can be edited and inspected to make changes, verify correctness, etc. If you want to extend the results of that research you may have a porting project to update the code to a more current language, but that's leaps and bounds ahead of how things are now: read and re-read the paper until it makes sense, then implement the algorithms from scratch.
Agree with your hypervisor/architecture point but I suppose there has to be some broad architectural choice made.
(The author of this article was my PhD supervisor. I've always found him to be an inspiring, forward thinking researcher so as much as I love the recomputation idea on merit, I'm also a little biased.)
This is often a good thing for scientific progress in chemistry, though. Independent reproduction in independent labs gives a lot more confidence in results than shipping the exact same lab around would. A lot of interesting results come up due to issues in replication, which you wouldn't find if you just literally reran the identical experiment on the identical equipment.
That said, being able to look at and experiment with someone's original apparatus is better than nothing.
More often than not to realize that some nasty detail got omitted because it was nasty.
Recomputation and reproduction are both valuable
I think having a starting point of reproducibility would lead to more reimplementations.
There are perfectly functional x86 emulators for ARM already. Many people used to run Windows on PPC macs via emulation too.
As I say, I (and many other people) emulated x86 Windows on PPCs for years. I ran Visual Studio and it dragged a little, but was fairly usable. Certainly it won't be as fast, but then again, by the time x86 has died and people have moved on, hopefully systems will have got fast enough to make up the difference!
It's not my area of expertise but I expect to see something similar here. If you need to emulate hardware you might take a 10x hit, which might be unacceptable for very large scale computations. But in the longer run that might not be a problem as more resources become available.
There's a risk that the VM platform decision could date and, ultimately, render useless an experiment sandbox.
However, two points. Firstly, any attempt to wholly encapsulate an experiment will be susceptible to problems with the technology choices made. By choosing the abstraction at the VM level, you're likely to get more years out of your choice than by picking specific programming languages (which I contend would fail to gain adoption, as researchers wouldn't change their obscure toolkit if they felt it was the right one for their research).
Secondly, even a VM for an obsolete architecture is better than what we have today - the original paper and perhaps an email address for the lead author.
But hardware obsolescence is a valid issue, too. That's why I suggested Bochs as a lowest common denominator in another comment.
As mentioned in the article, performance should be secondary when thinking about recomputation. A reasonable solution to the large downloads necessary is to spin up the VM on the server side and allow access to it either through the browser or through a terminal login.
On the scale of a few years, yes. Assuming you don't depend on external libraries at all.
But I've worked on recompiling more than enough C code in the 15-20 year range or older that contained enough assumptions - some valid at the time, some that just happened to work then - that no longer hold on most common architectures.
If you care about recomputability for <5 years, you can probably get away with "just" shipping a bundle with the code and any library dependencies. If you care about anything more than that, you're on shaky ground very quickly.
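A rough sketch of what such a short-term bundle might look like on Linux, using `ldd` to collect shared-library dependencies (the file names and layout here are illustrative, not from the comment - and this is exactly the kind of approach that gets shaky beyond a few years):

```shell
# Illustrative, Linux-only sketch: bundle a binary together with the
# shared libraries it links against, as reported by ldd.
set -e
BIN="${1:-/bin/ls}"          # binary to bundle; /bin/ls as a stand-in
OUT=bundle
mkdir -p "$OUT/libs"
cp "$BIN" "$OUT/"
# ldd lines look like "libc.so.6 => /lib/.../libc.so.6 (0x...)";
# keep the resolved path in field 3.
ldd "$BIN" | awk '/=> \// {print $3}' | while read -r lib; do
  cp "$lib" "$OUT/libs/"
done
tar czf bundle.tar.gz "$OUT"
```

Even then, the bundled libraries still assume a compatible kernel and dynamic loader, which is precisely where the 15-20 year cases fall over.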
You can probably add Python to that list now too, given how embedded it is in the *nix ecosystem and its excellent scientific/numerical libraries.
Many thanks for all the comments.
If anybody wants to get in touch to work in any way on recomputation, please do! You can find me very easily on google.
Special thanks to @lifebeyondfife, I worked out who you are and you were a pleasure to supervise too. Hope all is going well.
What are your thoughts on how to treat performance benchmarks, or really any claims that one algorithm is "better" than another? Since these are often extremely hardware-specific, I've been wondering if instead of a VM it makes sense to offer a full bootable image.
Are there virtual image formats that can be either run within a VM or copied to a USB stick and booted?
My main comment is that for one algorithm to be judged against another, the more different environments it is tested in the better, as we get a deeper understanding of its performance profile. If it's always better, then that's clear. If it is sometimes better and sometimes worse, either it's not really better, or there's some dependency on the hardware or other environment: but that is an interesting result in itself.
However there is always a place for very hardware specific claims. E.g. "For this chip/motherboard combination this flag is better", and for that we might always have a problem.
Interesting thought about the either VM or booting. Somebody suggested to me making live images (as in live dvds) which would serve this purpose.
It seems to be about providing full code to reproduce experiments using MiniNet.
(The author hangs out here on HN occasionally, but I thought I'd plug this anyway, in case he misses this post.)
Without being too cheeky - what's happened to TAILOR?
The Minion project you work on is a classic example of this problem - an important tool to actually encode problems for the solver has disappeared.
In this case it seems to be literally a case of a disappearing postdoc: she left, and her site disappeared with her...
To answer your concrete question: Tailor has been replaced by SavileRow which serves the same purpose and which is being maintained.
I really am going to have to get round to digging out my dissertation work and having a play.
Agree on all. I spent a few years in computer security research before quitting for industry. I must report that so many papers that present some kind of algorithm (I would say the majority) fail to provide the source code of the implementation. I have always thought, and advocated, that any computer science paper that presents practical work without disclosing source code should not be accepted to any scientific conference or journal.
I know (because I did it many times) that opening up the source takes an incredible amount of time, but it is mandatory for being capable of 'standing on the shoulders of giants'. Writing code and keeping it private in research is just nonsense.
From the article:
"There has been significant pressure for scientists to make their code open, but this is not enough. Even if I hired the only postdoc who can get the code to work, she might have forgotten the exact details of how an experiment was run."
"The only way to ensure recomputability is to provide virtual machines"
To that end, the site http://recomputation.org/ is mentioned as a repository for recomputable experiments.
Point being: source code alone does not specify the process or workflow in which it was used.
Actually, the idea of providing VMs preconfigured to run the tests is very good, since it saves time both for those who write the code and for those who want to test it.
> Point being: source code alone does not specify the process or workflow in which it was used.
I completely agree! When we released the code, we spent hours clearly defining the environment the code had to run in.
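A minimal sketch of what recording that environment can look like in practice - capture OS and toolchain versions in a manifest shipped alongside the code (the file name and fields are illustrative assumptions, not from the original comment):

```shell
# Hypothetical sketch: write an environment manifest next to released code.
{
  echo "# Environment manifest"
  uname -srm                             # kernel name, release, architecture
  cc --version 2>/dev/null | head -n 1   # compiler version, if one is installed
} > ENVIRONMENT.txt
```

A file like this doesn't make the code recomputable by itself, but it tells the next person what they need to rebuild before they start guessing.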
> A manifesto is a call that people reading it should vote for your point of view. Don’t vote with a signature or a petition. Vote by making your computational experiments recomputable. Do it at http://recomputation.org, or at your own web site, or at another repository. But make your experiments recomputable.
Full manifesto linked from the article: http://arxiv.org/pdf/1304.3674v1.pdf
Before even reading the article I was thinking to myself "gee, this might actually be one of the best use cases I've heard for vagrant/etc". Turns out that's exactly what this is :)
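For anyone unfamiliar, a minimal sketch of the Vagrant approach - the box name, pinned version, and provisioning-script path below are illustrative assumptions, not from the article:

```shell
# Generate a minimal Vagrantfile that pins the base box an experiment ran on,
# so others can recreate the same VM. All names here are illustrative.
cat > Vagrantfile <<'EOF'
Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/trusty64"          # illustrative base box
  config.vm.box_version = "20140416.0.0"     # pin a version (illustrative)
  # Install the experiment's dependencies and data in one reproducible step
  config.vm.provision "shell", path: "setup-experiment.sh"
end
EOF
```

`vagrant up` then recreates the VM from these few lines, which is far smaller to share than a full VM image - though it does depend on the pinned box remaining downloadable.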
At least in the biological sciences, I'm seeing the term "reproducibility" used a lot where the meaning is much closer to "recomputability", i.e. "you can repeat the exact computational steps we performed" -- without necessarily saying much about either the lab-work and/or sample-collection parts of the project, or the possibility of performing similar analyses using different tools/platforms.
(I'd also like to see a bit more recognition of the importance of full reproduction -- i.e. someone starts with the same hypothesis or idea and does their own experiment -- in modern science).
I also agree completely about distinguishing between recomputation and a proper reproduction. It's not what I emphasise in the manifesto but it is true that a recomputation doesn't really say anything about the generality of a result.
In the natural sciences, independent reproduction often finds subtle dependencies on the original apparatus that change the interpretation: when lab B tries to reproduce lab A's results on slightly different equipment and can't, it can highlight an unnoticed dependency on some specific feature of the original equipment, and may throw into question the original paper's conclusions. You would never have found that if, instead of lab B independently reproducing the result, lab A just packed up their equipment into a shipping container and shipped it to lab B, who unpacked and ran it unchanged. That's what the VM approach is arguing for, and that's not really reproducibility.
The analogy I have given is with cold fusion. If we could reproduce their exact lab setup then we could find out if the results were real - i.e. were not misread or anything, and assuming they were, have a chance of explaining the anomaly.
But no, it's not the same as reproducibility.
Recomputability is, to all intents and purposes, the goal of devops and testing. And we are stumbling around at the edges of proving one environment is the same as another.
This is one to watch - hell, one to join in on.
He is right that science requires recomputation, the ability to verify or falsify results. But recomputation in science is more than just the ability to run the black magic box again. A black magic box makes it worse, because the box might change and fail over time, and it's a black magic VM. Recomputation requires source code that is human-readable.
So my suggestion is to use a combination of Gentoo and Linux Containers instead. Gentoo ensures that everything on the machine has source code that went through the compiler, and Linux Containers encapsulate the project in a way that a simple backup can preserve it.
Well, I normally prefer Debian because of its lower maintenance cost. But Gentoo could play to its strengths in this edge case.
However, particularly when academics are gluing together multiple pieces of software, which are often themselves quite fragile, just trying to reassemble working software can be almost impossible.
I have a number of things I wrote myself from when C++11 was first coming out (yes, I possibly shouldn't write software with compilers for unfinished languages, but I like to live on the state of the art). Now that C++11 support in gcc is finalised and some corners have been cleaned up, these programs don't compile any more. I know how to fix them, and have. I wouldn't want someone else to have to do that.
Obviously it would be nice to know exactly which bits of that are unnecessary to save space, but for now I'm happy enough to be able to give you something that works.