
Don't Enforce R as a Standard - doctoboggan
http://timotheepoisot.fr/2015/04/02/do-not-enforce-R/
======
jdietrich
To give the other side of the argument:

Most scientists have little or no quality training in software development,
but scientific research is increasingly reliant on software. At present,
software is a glaring black box in a great deal of research, because very few
reviewers have the skills to thoroughly scrutinise it.

A software monoculture always has negative consequences, but fragmentation can
be equally problematic in many cases. A reviewer saying "This would be better
in R" usually means "I have no chance of understanding your code, because it's
not in R". For better or worse, R is currently the statistical computing
lingua franca in most fields.

I believe that the scientific method is in real trouble, due largely to the
immense complexity of much modern-day research. Scientists have access to
immensely powerful analytical and statistical tools, but most lack the
training and infrastructural support to use them in a rigorous manner. Bad
practices in software and statistics are the norm, rather than the exception;
I'm sure most of this is just an honest shortcoming, but I'm equally sure that
the lack of CS and stats experience amongst reviewers is a gift to would-be
Bogdanovs and Obokatas.

Science has always been a collaborative effort, but I think most fields are in
desperate need of greater support from computer scientists and statisticians.
Ultimately I would like to see those professions become deeply integrated into
all scientific fields, with the expectation that all papers should credit a
statistician and (where applicable) a computer scientist. Likewise, editors
and reviewers need far closer ties to CS and statistics professionals. Of
course, these issues are intertwined with many other problems in funding and
peer review.

Until then, the hegemony of R may simply be a price we have to pay for better
research in the short term. A software monoculture at least gives reviewers a
fighting chance of spotting issues with software that might affect the
validity of results.

~~~
rjurney
I think it's more like, "I have no chance of understanding your code, BECAUSE
IT'S IN R." The vast majority of R users, and I bet 95% of ecologists, can't
read and understand R modules. It's the PHP of Data Science.

The point, then, is to supply code with simple instructions so that anyone can
run it. Secondary goal: write it in a language a human can actually read.

If you're doing science and you want people to understand your code, you use
Python, not R.

~~~
hadley
If you want people to understand your code, you need to treat it like writing:
rewrite, rewrite, rewrite. The language itself is almost irrelevant: you can
write horrible code in Python and beautiful code in R.

~~~
rjurney
True, but if you write decent or so-so Python code, it is readable to anyone
who knows a C-like language. So-so R code is still impenetrable.

The language matters. One is optimized for... honestly, I have no idea what R
is optimized for. I don't think R works that way. It just is. Python, though,
is optimized for readability.

~~~
hadley
Badly written R is impenetrable. So is badly written Python. I suspect your
standards for so-so R code are vastly different to mine.

R and Python really are very similar as languages. R is more functional, but
that shouldn't impede your ability to understand code (once you master some of
the basic idioms of FP).
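
For example, here is a minimal sketch in Python (with made-up data) of the
same computation written with an explicit loop and then with the
map-a-function-over-data idiom that functional R code leans on:

    # Hypothetical example: convert measurements to z-scores.
    import statistics

    values = [4.0, 8.0, 15.0, 16.0, 23.0, 42.0]
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)

    # Imperative style: build the result with an explicit loop.
    zscores = []
    for v in values:
        zscores.append((v - mu) / sigma)

    # Functional style: map a function over the data, the idiom
    # behind R's lapply/Map family.
    zscores_fp = list(map(lambda v: (v - mu) / sigma, values))

    assert zscores == zscores_fp

Once the mapping idiom is familiar, the functional version reads as a single
statement of intent rather than as loop bookkeeping.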

~~~
rjurney
I can't actually comment on what is good/bad R code, because I've never seen
any R code beyond a few lines that I could follow.

------
davidmr
I'm torn on this. I've spent nearly the last 20 years in some form or another
in academic/research computing.

On the one hand, attitudes like the third reviewer's are a primary reason for
the state of HPC today, where new advances in research are just as often
finding a way to run 30 year old code on a modern supercomputer as they are
writing new software to take advantage of the fantastic array of new hardware.
The number of times you hear "sorry, we can't change that. Livermore wrote
that 15 years ago, and nobody knows how to change it anymore" is enough to
drive a rational person over the edge.

On the other hand, the state of academic peer review being what it is, I don't
fault the reviewer for suggesting the paper be resubmitted using more common
methodology. A conscientious reviewer has a lot of papers to review, and
spends a good amount of time on the ones that aren't written by crackpots.
While the author was of course fully within their rights (and may even have
advanced the field) to use their own software written in a relatively uncommon
language, expecting a reviewer to understand the results and methodology under
those conditions is asking a lot.

I'm genuinely unsure as to how I would have responded to the review request. I
have much sympathy for both people involved.

~~~
Blahah
The correct response from the reviewer would have been "I am not capable of
reviewing code written in Julia, therefore I must decline this review request"
rather than "I am not capable of reviewing code written in Julia, therefore I
must recommend rejecting the paper".

~~~
Obi_Juan_Kenobi
Have you submitted to academic journals before?

Reviewer comments are not a list of absolute requirements; they're thoughts
and suggestions that help both the authors and editors improve the manuscript.
They will include a recommendation for publishing, and they can make that
recommendation contingent on particular issues, but that's somewhat unusual.

It's not completely clear what's happening in this case, but I'm strongly
inclined to think this was a 'minor comment' from a reviewer.

    The three reviews were helpful and constructive, but these two
    comments infuriated me.

I think the author is simply taking exception to these comments as examples of
prevalent attitudes, not claiming that they significantly contributed to the
editorial decision.

It's very common to address reviewer comments without actually changing
anything. Basically, you say you disagree for reasons X, Y, and Z. The editor
can disagree, ask for further clarification from the reviewer, or simply
accept the argument. Nothing is set in stone.

As for the implication that there are other reviewers waiting in the wings
with suitable experience, good luck.

~~~
Blahah
Yes, I have some publications, and am familiar with the process.

I was attacking a hypothetical review to make a broader point. The blog post
doesn't say the reviewer recommended rejection on the basis of the
implementation being in Julia, or that the editor's decision cited the
language choice. My point was that if those things were true (which is not
clear from the post), that would be bad. I make that point because, in my
experience, it's not uncommon for reviewers to take similarly unreasonable
positions.

------
zzleeper
These were the same types of people who said to me "don't use R, no one uses
it" several years ago.

I know there must be a balance between bleeding-edge and stability, but if in
_research_ you cannot use the new tools, then there will not be any progress
at all.

~~~
gerty
That's true. In my field, there's plenty of Matlab, and it's arguably not even
the best tool for the job: someone started using it, people improved on the
work, and now no one wants to rewrite it.

------
apalmer
Why did R take over the world of statistics anyways? I remember about 8 years
ago I had an interest in statistics. Everything was SAS, SPSS, etc., which I
just didn't have the budget for... I happened to find R as it was freely
available for Linux. The impression I got was that no one used it: well, some
limited use in academia, but no professional usage.

Fast forward, and it seems like the most widely used tool in data analysis.
Did something fundamentally change?

~~~
rm999
The biggest boon for R was the rise of "data science". Data scientists have
actually existed for decades, but they had different titles (quants,
actuaries, quantitative researchers, etc) and came from fields that preferred
enterprise software for various reasons.

Data scientists, on the other hand, largely rose from tech companies that
hired software engineers and preferred open-source software. When I started in
the field ~10 years ago there were very few reasonable open-source options
outside of R. Matlab and SAS were prohibitively expensive, Octave was too
immature, Python didn't have basic functionality like data frames. In short,
it was our only viable option, so we used it. A lot of people started using it
for the same reasons, so R's library support became the best. This is what
keeps me semi-locked into R - I'll probably eventually move to Python or
Julia, but R hit the critical mass first and in a huge way.

~~~
grayclhn
R's library support predates your timeline.

~~~
rm999
The most impactful, important packages I use were all developed after 2005:
ggplot2, plyr, glmnet, gbm, reshape, knitr. Out of curiosity, which ones are
you referring to?

edit: I may be using the word library imprecisely here, which may be causing
miscommunication. I mean 'packages', to be clear.

~~~
Blahah
Prior to those packages, R was the language in which most new statistical
methods would be published. So it has had very good library support for
statistics for a long time. Only more recently has it also gained excellent
data manipulation support.

~~~
disgruntledphd2
Which kinda violates the principles expressed by the writers of said open
source project, but I'm all for it, on the basis that if I never again have to
use base reshape, stack, or gsub, it will be a good thing.

I believe An Introduction to R specifically notes that one should use other
software to provide R with appropriate data.

------
tmalsburg2
The author writes that the software wasn't the focus of the manuscript and
that the reviewers should not have commented on the software at all, but that
is actually not true. If you look at the preprint manuscript, you'll find that
the software is prominently advertised in the abstract of the paper. In this
situation, it is perfectly reasonable for the reviewer to comment on the
implementation of the software. And as much as I like Julia, I think he is
right when he says that an R package would be more useful.

------
weinzierl
"...it can be written in lisp or cobold for all I care"

The typo is too good. Cobold is the sprite that plays its tricks on us even 56
years after its first appearance.

~~~
rcthompson
I assumed it was intentional.

~~~
weinzierl
That makes it even better.

------
minimaxir
The only reason I still use R is the hadley-verse packages (namely dplyr and
ggplot2). R's base packages are incredibly terrible and counterintuitive,
especially compared to Python's pandas and scipy.

------
guelo
My non-academic opinion is that the biggest problem in science is that there
is little incentive for scientists to reproduce each other's results. Anything
that makes it easier for other labs to rerun an experiment is a good thing for
your field overall. The goal of a paper should be to communicate as plainly as
possible how to reproduce the experiment. That probably means a good-enough
standard programming language that everyone learns in school and uses in their
work.

~~~
Gimpei
I agree, unless it's something that can't be done well in the current
language. Or if, like me, you're an economist and the default language used by
academics is horrible, horrible Stata, which needs to be replaced by R.

~~~
zzleeper
For me, both are horrible. I use Stata a lot and have thought a lot about
switching to R (there is even a nice tutorial by Matthieu Gomez at
[http://www.princeton.edu/~mattg/statar/](http://www.princeton.edu/~mattg/statar/)).

However, I believe that moving to R is not the answer, as you will still be
riddled with a lot of the problems that you faced in Stata, such as a quirky
syntax (except backticks. I hate those backticks). Maybe Python will catch up
(I doubt it; what we need is a DSL, not something so general that it requires
long commands to do a simple regress). My best bet would be Julia, but it
still has a long way to go regarding things like missing values.
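
To be fair, a formula-style DSL can live inside a general language. Here's a
minimal sketch using Python's statsmodels formula interface, with made-up
data:

    # Hypothetical data: regress y on x with a Stata-like one-liner.
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.DataFrame({
        "x": [1.0, 2.0, 3.0, 4.0, 5.0],
        "y": [2.1, 3.9, 6.2, 8.1, 9.8],
    })

    # The formula "y ~ x" plays the role of Stata's `regress y x`.
    fit = smf.ols("y ~ x", data=df).fit()
    print(fit.summary())

Whether that is close enough to `regress y x` is a matter of taste.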

------
scottfr
This is really just an example of bike shedding.

Reviewing research papers is really, really hard work to do well. It takes a
massive time commitment to truly digest and understand novel work to the level
where you can provide a quality critique and review.

It's much easier to throw out facile critiques like the ones these reviewers
provided. It shows that you know your stuff and did your job as a reviewer,
without requiring you to actually understand the authors' innovations, or lack
thereof.

~~~
tmalsburg2
You are implying that the rest of the review, which we haven't seen, was
useless and shallow. However, the author himself said that the reviews were
actually "helpful and constructive". I agree that cases of bike shedding
exist, but there is no evidence that this was the case here.

------
Pinatubo
I think the author is being overly sensitive here, and the reviewers are
actually giving good advice.

Most methods papers in most fields are ignored by the vast majority of
researchers. Just like the author doesn't have the time to rewrite his code in
R, his prospective audience doesn't have the time to learn a new software
package just to try out his proposed method.

If the author actually wants to change the way research is conducted in his
field, he needs to make it easy for others to try his method out and possibly
change the way they do research. As of today, that means an R package.

Or, the author can dig in his heels, refuse to write an R package, and be
ignored. Maybe that isn't fair, but as I've learned through my own
experiences, that's the way it is.

------
Mikeb85
I think it's just that R has become the de-facto standard since the previous
tools used were either terrible, or closed source and expensive.

And it's probably taken them so long to adjust to R that they don't want to
change again any time soon.

And you're right, the tool shouldn't matter, and I definitely agree with your
stance on open source, but keep in mind those attitudes do exist. Speaking of
attitudes concerning open source, here's one of my favourite blog posts about
the subject:
[http://www.catuhe.com/post/Le-syndrome-du-Puppy.aspx](http://www.catuhe.com/post/Le-syndrome-du-Puppy.aspx)

------
moron4hire
Even the language we use to talk about arithmetic, algebra, and calculus had
to--at one time--be agreed to and standardized, so that mathematicians could
have a common ground on which to speak to each other about their ideas and how
they reached them.

The program is not just "a tool". It's your proof. You may have written a
proof in the paper, but that was just a practice run. It's the one you wrote
in the code that is the real proof, because there is little guaranteeing that
seasoned software developers write code that matches the spec, to say nothing
of "amateur" programmers in the sciences.

Now, that does not preclude innovation. People still invent new mathematical
notation systems, usually with the express purpose of solving problems that
can't easily, or at all, be solved in current systems. It is the bleeding edge
of the science of math. But physicists are still expected to use the math
their colleagues will understand, or spend a lot of time explaining their new
system (QM, anyone?).

There is, of course, the issue that math is significantly less rigorous and
specific than code. Math doesn't compile and doesn't run. Or rather, it gets
compiled and run by people. Part of avoiding new, arbitrary notations is so
that your work is accessible for verification. That gives running code,
especially code with a comprehensive set of unit tests, a distinct advantage
over math. But that doesn't obviate the need for verification.
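
For instance, here is a minimal sketch (hypothetical function, hand-computed
expected value) of the kind of unit test that makes code verifiable by anyone
who can run it:

    # Hypothetical check: verify an implemented statistic against a
    # value computed by hand, so others can re-run the verification.
    import math
    import unittest

    def sample_variance(xs):
        """Unbiased sample variance of a list of numbers."""
        n = len(xs)
        mean = sum(xs) / n
        return sum((x - mean) ** 2 for x in xs) / (n - 1)

    class TestSampleVariance(unittest.TestCase):
        def test_known_value(self):
            # Variance of [1, 2, 3, 4] is 5/3, worked out on paper.
            self.assertTrue(math.isclose(sample_variance([1, 2, 3, 4]), 5 / 3))

    if __name__ == "__main__":
        unittest.main()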

If you are using any nontrivial piece of custom software in your science, it
needs to be scrutinized just as much as the rest of the math in your paper--if
not more so, since it does the actual work.

The first reviewer is almost certainly right. If they can't understand your
code, then they aren't the equivalent of lay users complaining about open
source software. First of all, it is still an issue of much debate whether or
not releasing source code to users of software products is necessary. If you
are writing code for your science, you have an absolute duty to open source
that code (even if in a non-extendable way; we still need to see the code).
You wouldn't make a claim without providing the math. And you wouldn't provide
math other people couldn't understand, arguing "some math is better than no
math".

I don't think that standard should be R, or that it necessarily has to always
be the same thing. It can adapt over time. But until you manage to release a
few papers on how Julia improves over R, in explicit detail... "when in Rome,
do as the Romans do."

------
bachmeier
Does this story tell us more about R as a standard, or more about the
refereeing process? I've dealt with referees for a long time and I'd have to
go with the latter. This is to me yet another case of a referee viewing
his/her preferences as the only correct way to do things.

------
mattexx
Peer review is a huge part of our culture here at Climate. It's often
unreasonable to expect a single reviewer to review a complex paper, so we
sometimes address this by assigning each reviewer a domain.

Something like:

Reviewer 1: Review scientific theory (domain expert)

Reviewer 2: Review scientific methods and conclusions (senior scientist)

Reviewer 3: Review code for scientific accuracy (senior programmer)

