
Handful of Biologists Went Rogue and Published Directly to Internet - srikar
http://www.nytimes.com/2016/03/16/science/asap-bio-biologists-published-to-the-internet.html
======
dekhn
When I first found out about the web in the early 90s, it was "obvious" to me
that the role of the web was to expand scientific publishing. I expected that
everybody would publish LaTeX files (raw source, not PDFs) in computationally
accessible formats, with raw data in easily parseable, semantic forms.

That didn't really happen as expected. In my own chosen field (biology) it
happened much more slowly than I hoped- physics (with arxiv) was far better.
However, just getting PDFs on bioRxiv is only a small part of the long game. I
did not appreciate the huge effect that publishing in high-profile journals
has on one's career trajectory, or how large a role that would play in slowing
the transition to free publication and post-publication review.

The long game is to enable the vast existing resources, and the new resources,
to be parsed semantically by artificial intelligence algorithms. We've already
passed the point where individuals can understand the full literature in their
chosen field and so eventually we will need AI just to make progress.

~~~
logicrook
> I expected that everybody would publish latex files (raw source, not PDFs)
> in computationally accessible formats, with raw data in easily parseable,
> semantic forms.

Do you realize that many researchers would be ashamed if others saw their
LaTeX code?

A friend who worked for a journal kept a folder of the worst code he had ever
seen, and could rant for hours on some gems, including papers that were
beautifully typeset, about elegant combinatorics/type theory, and were coded
in the LaTeX equivalent of a spaceship cobbled together from recycled plastic
bottles, scraps, and duct tape.

Can you imagine going through a thousand-line macro TeX file that grew over
the years, when LaTeX is one of the worst messes ever, with conflicting
packages, absurd syntactic 'features', and crippling legacy flaws that make
debugging a horrible task?

Please, continue to compile your .tex.

~~~
soperj
It would be much easier to get better if you could view others' LaTeX code.

~~~
anigbrowl
I will never understand why latex devotees think everyone else should be
really into typesetting as a hobby. It's like sneering at photographers who
don't want to mix darkroom chemicals.

~~~
p4wnc6
That's not a good comparison, because mixing your own chemicals for photo
development adds _more_ time, effort, and defects to the process.

Becoming more proficient with TeX _saves time_ and makes for more re-usable
typesetting structures, which allow you to accurately do what you want faster
and faster.

It's similar to using emacs or vim. You could say, "Why should programming
devotees expect others to adopt those editors?"

It's about productivity. I'm a _productivity devotee_ and when I hear people
use short-term thinking to discount how valuable the up-front costs for
learning TeX (or emacs/vim, etc) are, it motivates me to help them see why
it's not true.

Of course, I don't want to be dogmatic either. If something works for you and
your personal utility function discounts the value of things like TeX or
emacs, that's fine.

~~~
amelius
> It's about productivity.

No it is about aesthetics, which are mostly irrelevant for 99% of all research
papers.

~~~
Spivak
It's not just aesthetics. Proper typesetting makes papers more accessible and
easier to read and parse; the standard formatting makes understanding the
layout of papers easier; there is a huge library of packages that cleanly
abstract the task of rendering obscure symbols and complex mathematical
formulas; and it automates the job of tracking figure, equation, and reference
citations, to name just a few of the features. Fine, TeX is an ugly language
that only a programmer could love, but in over twenty years nobody has been
able to replace its expressive power and flexibility: doc generators just
compile to LaTeX, and more user-friendly GUIs like LyX and TeXmaker have been
written to ease the process of writing for end users, but the underlying
format remains. Like Markdown, TeX rides the line between being writable by
hand for greater flexibility and automatically generated for newbies.

Unless you think that reading red text on a blue background in Papyrus would
be an equally enjoyable reading experience, it's more than simple aesthetics.
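
The automated cross-referencing mentioned above is LaTeX's \label/\ref
mechanism; a minimal sketch in plain LaTeX (the label name is arbitrary):

```latex
\documentclass{article}
\begin{document}

\begin{equation}
  E = mc^2 \label{eq:energy}
\end{equation}

% Numbers are assigned automatically: if another equation is inserted
% earlier, every \ref below stays correct on the next compile.
As shown in Equation~\ref{eq:energy}, ...

\end{document}
```

(Two compiler passes may be needed for the references to resolve.)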

~~~
smsm42
You sound as if aesthetics is a bad thing you need to disavow for some reason.

> Fine, TeX is an ugly language that only a programmer could love, but in over
> twenty years nobody has yet been able to replace its expressive power and
> flexibility

Which doesn't make it less ugly. If we can't cure the common cold yet, that is
no reason to praise it and make it sound as if it's a great thing. It is a
problem, and so is the need to use an ugly language to produce aesthetically
pleasing papers. The fact that we don't have a solution for this problem yet
does not turn it into a non-problem. It just means nobody has yet been good
enough to solve it.

------
blakesterz
I was working in academia back in 2002, and I remember talking about this
crazy open access thing, and blogs, and wikis, with the folks on the tenure
committee then. I remember thinking how fast this tenure/publishing thing
would change in the next few years. And here it is 13 years later and there's
a headline about a HANDFUL of biologists going rogue and daring to publish a
PREPRINT!? I know there's been quite a bit of progress, but I'm still
surprised at just how little things have changed.

~~~
michaelhoffman
3000 manuscripts posted to bioRxiv since 2013. More than a handful.

~~~
yitchelle
The HANDFUL is referring to the number of biologists, not the number of
manuscripts.

However, compared to the number of potential biologists or manuscripts that
could be published, it is just a HANDFUL.

------
nickbauman
Uh ... isn't this exactly what Tim Berners-Lee intended scientists to do when
he created the world wide web? It's like he handed a machine gun to us cavemen
scientists 25 years ago and we've been collectively clubbing him in the head
with it ever since.

------
reporter
I just uploaded three articles in the last two weeks to bioRxiv. The papers
were previously just sitting in review. I have already received several emails
thanking me and informing me that my work is influencing the manuscripts the
senders are writing - more citations. Overall the experience has been
extremely positive for me. I don't really see any downsides. So excited for
the revolution.

~~~
dnautics
The amazing thing is that arXiv has always had a "quantitative biology"
section, and I guess it took a sea change for this to become a thing in bio.

~~~
michaelhoffman
It's a social problem, not a technical one. The main new features bioRxiv has
are comments and digital object identifiers (DOIs).

------
slizard
While this is nice in that it sets an example (and provides publicity for
bioRxiv), it's not a couple of Nobel laureates posting one out of the dozen(s)
of papers they publish per year that will make the real difference.

The system is aged and inefficient (some would even argue it's rotten), and
IMO comprehensive changes are needed. Just as racial or gender discrimination
can't be addressed without changing the social rules people live by, the
current academic system, which is rather elitist, non-inclusive,
discriminatory, and often more biased and less fair than many think, needs to
change substantially.

Such change will be aided by important people setting examples (even if they
often go back to their old ways). However, more substantial change is needed
on multiple levels, most importantly: academic leaders and funding agencies
(run by the former) need to stop looking at who's who and how many
Nature/Science/insert-your-fancy-journal papers a person has. For instance,
the culture of applying for grant money with work that's half done to maximize
one's chances needs to stop, and so should the over-emphasis on impressive and
positive results.

Additionally, publishers that exploit everyone need to die out, and as long as
these researchers "go rogue" with a single paper (rather than, for instance,
committing to publish 100% preprints and >75% open access), not much will
change.

------
jackcosgrove
I know the "wisdom of crowds" is passé, but the continued success (all things
considered) of Wikipedia and open source software really makes me question the
value of quality gatekeepers. I know I'm biased because I work in software and
the costs of mediocrity in this industry are lower than in others, but I think
we could speed up innovation and discovery if we opened up science and made it
more publicly accessible and collaborative. At some point the gatekeepers are
just protecting their turf and holding back progress.

~~~
d0mine
The crucial difference is that an amateur can provide a meaningful
contribution to an open-source software project while the same is unlikely in
(modern) science.

You need years of intense studying even to understand the current state of the
art in a chosen scientific field.

Experiments require more money too.

~~~
nl
As someone who has fallen into a transition from software engineering to
science I think I can say this is mostly wrong.

It seems to me that any given field of "science" isn't any harder than
dropping into a new area of software engineering. There's a lot to learn,
sure, but there's a lot to do as well, and there are areas where a complete
rank amateur can make significant contributions.

It's still true that every day I discover something I know nothing at all
about. But it seems that this is fairly normal in science: people know a lot
about a single specialized area and not much outside it.

~~~
d0mine
> I think I can say this is mostly wrong.

There are 3 sentences in my comment. What specific claim do you consider to be
wrong and what is your evidence?

~~~
nl
All 3

 _The crucial difference is that an amateur can provide a meaningful
contribution to an open-source software project while the same is unlikely in
(modern) science._

[http://motherboard.vice.com/read/meet-the-amateur-comet-hunter-who-out-gazes-the-big-telescopes](http://motherboard.vice.com/read/meet-the-amateur-comet-hunter-who-out-gazes-the-big-telescopes)

[http://io9.gizmodo.com/5841287/the-story-of-the-woman-who-discovered-new-species-in-her-garden](http://io9.gizmodo.com/5841287/the-story-of-the-woman-who-discovered-new-species-in-her-garden)

 _You need years of intense studying even to understand the current state of
the art in a chosen scientific field._

From personal experience, this isn't the case. But you'll complain about
citations, so: [https://www.quantamagazine.org/20160313-mathematicians-discover-prime-conspiracy/](https://www.quantamagazine.org/20160313-mathematicians-discover-prime-conspiracy/)

 _Experiments require more money too._

See the examples provided above.

------
hwang89
What's the best way to support and reward these researchers? Something we can
do in the next five minutes while they have the reader's attention.

~~~
adrianN
Write a paper that cites them and get it published in Nature. Papers and
citations in high-impact journals are what gets scientists a job, grants, and
tenure.

~~~
shawn-furyan
It's hard to tell whether this is a joke with a lot of truth, a genuine
sideways takedown of the effort being reported in the article, a cynic's
lament, or just a straightforward answer to the question of what would be the
most helpful (if not most accessible) action for these authors. In any case I
find it a tremendously compelling comment. It evokes a lot in a very small
footprint.

edit: clarification

------
timrpeterson
Nobel-prize-winning scientists can go rogue. Until these same rogues hire
incoming professors based on their bioRxiv papers, this is a small advance.

This whole thing needs to start at the level of the funding agency, namely the
NIH. Publishing in a good journal is a prerequisite to getting a grant. Try
getting an R01 on a bioRxiv paper. Not gonna happen.

~~~
michaelhoffman
Most of the pre-prints posted on bioRxiv are submitted and later accepted to a
traditional journal. The authors get the best of both worlds—early
dissemination of their results and still get the stamp of approval from
traditional journals that many other institutions value.

------
cs702
When a mainstream publication like the New York Times has a _positive_ article
about Nobel-prize-winning scientists bypassing the choke-hold of established
journals by directly publishing preprints online, you know _it's the beginning
of the end_ for the old, bureaucratic way of publishing scientific research.

Awesome.

~~~
excalibur
But when the same article is exclusively citing Twitter as a source, it's far
closer to the end of the end of their publication's integrity.

------
chrisamiller
This article is a little bit breathless. In the academic circles I run in
(genomics, computational biology, cancer), posting to bioRxiv is not "going
rogue". It's becoming pretty common, and will continue to increase in
popularity as the FUD surrounding preprints and high-impact journals begins to
dissipate, e.g. "Nature won't accept my paper if it's on bioRxiv!" (Yes, they
will.)

~~~
nycticorax
It's certainly fear, uncertainty, and doubt, but usually when people use the
term "FUD" they mean to imply that the fear, uncertainty, and doubt are
unfounded. But in this case, all three are justified. The article mentions
that Nature and Science are open to papers that have been pre-published. But I
believe that many biology journals still have a blanket policy of not
considering papers that have been pre-published. If that's the case, the
working scientist is highly motivated to _not_ pre-publish, since it shuts the
door to later (peer-reviewed) publication. Unless, of course, the scientist is
100% sure they can get it published in Science or Nature. And in practice one
is almost never sure of this.

I'm always curious to know how physics, as a field, got over this and related
humps. Was it just easier to get everyone on board because it's a smaller
community? Were the journals not as savvy to the fact that pre-prints are not
really in their interest?

~~~
KKKKkkkk1
The journals don't really care about preprints. Most of their revenue comes
from university subscriptions, and so long as they can claim that their
paywalled versions of the papers are the final and official ones and the
preprints are merely unedited drafts, there is no threat to that revenue
stream. (Placating the journals is part of the reason they're called
_preprints_ rather than just papers. On the arXiv, authors will often update
their preprint even after the paper is accepted by a journal.)

~~~
nycticorax
How do you know this? One reason they're called "preprints" is that they
haven't (necessarily) been peer-reviewed...

------
pnathan
I talk with a doctoral candidate in chemistry regularly about the different
paper cultures. It's amazing how different disciplines are... her account of
chemistry (as I synthesize it) is that researchers are extremely locked down,
and research is very much aimed at securing patents. Knowledge sharing with
the broader chemistry community does not appear to be a key goal.

It was a huge shock to me, coming from CS: knowledge sharing has such a high
value in our community.

~~~
analog31
A grad student is really exposed to being "scooped," i.e., to having their
project taken up, finished, and published by somebody else. There are even
professors who are notorious for this.

~~~
pnathan
> A grad student is really exposed to being "scooped," i.e., to having their
> project taken up, finished, and published by somebody else. There are even
> professors who are notorious for this.

I have no words for my disgust at the lack of honesty that scooping would
demonstrate.

------
mirimir
Even cooler, I think, are "working papers". In my arguably limited experience,
this seems to be popular mostly in economics. As I understand it, authors are
soliciting comment from peers, and thinking becomes long-term collaborative.
It's a conversation, not a paper. Maybe scientific research can become even
more open and collaborative, using the GitHub model or whatever.

------
ylem
I'm a physicist and we routinely publish on the arxiv. I hope the chemists are
next (a number of chemistry journals ban preprints)!!!

------
devy
Is it just me who thinks this NYT title reflects a negative view of releasing
research results as preprints by using the phrase "went rogue"?

Speeding up knowledge sharing to solve more problems more quickly is a good
thing IMO. As the same article pointed out, physicists have been releasing
research results as preprints since the 1990s.

~~~
valleyer
I agree it's a terrible title, but I think the "went rogue" reflects more the
NYT's attempt to conjure up a BuzzFeedy title than anything else.

------
larakerns
This needs to happen in more scientific fields, like a field-specific
consensus publishing platform. Everyone agrees to publish their research to
benefit everyone else.

------
yeukhon
Totally off topic, but here is an anecdote I heard from a professor when he
was explaining the review process. One reviewer argued to change

    Figure 5. The statistics ....

to

    Figure (5) The statistics ....

because the reviewer liked the () format better, although (IMO) the new format
is quite ugly.

Another reviewer saw the comment and a really nasty debate ensued. The paper
was published with the original format nonetheless.
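
For what it's worth, this kind of label-format dispute needn't involve
retyping every caption; in LaTeX it is a one-line global setting. A minimal
sketch using the caption package (assuming the journal's document class
permits it):

```latex
% Preamble: switch figure labels from "Figure 5." to "Figure (5)" globally.
\usepackage{caption}
\DeclareCaptionLabelFormat{parens}{#1 (#2)}
\captionsetup[figure]{labelformat=parens,labelsep=space}

% Captions themselves are written as usual and restyled on recompile:
% \caption{The statistics ....}
```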

------
return0
That is great (or, to be honest, it's way past time for this; it shouldn't be
news). But we need to go beyond that. The PDF format is a relic. We need a
platform where scientists can directly edit their articles. Figures should be
replaced by interactive visualizations where possible. This would solve the
problem of data availability and allow other researchers direct access to the
data shown in a plot.

~~~
CroCroCro
HTML?

~~~
return0
Of course. I mean an easy way for researchers to write them.

------
batbomb
Why not just use arxiv.org?

~~~
jessriedel
I heard from Paul Ginsparg that bioRxiv's original raison d'être was to give
biologists more assistance in producing professionally typeset documents,
since far fewer of them know LaTeX than physicists do. This was going to be
funded by a submission fee of ~$50. That idea was apparently nixed, so the
differences from the arXiv are now mostly cosmetic (with the notable exception
that bioRxiv has commentary).

Fractionalization is generally undesirable, but it's plausible that bioRxiv
can make tweaks that accelerate adoption among biologists compared to arXiv.

~~~
cossatot
Discoverability is a big part of it as well, at least at the user end. I find
out about new papers from email alerts on subject matter and from the tables
of contents of the new issues of the journals I keep up with. (Obviously, when
I need to look into a particular issue, I use Google Scholar.)

I have never once found a paper of particular relevance to my research
(geology/geophysics) on arXiv. I haven't looked very many times, but after
striking out several times, why keep trying?

If there was a geoRxiv then I would probably browse it more regularly because
the chances of me finding something relevant would be much higher.

For that reason, and another, I kind of disagree that fragmentation (or
fractionalization) is undesirable. The second reason, which is not unrelated,
is that–at least with peer-reviewed journals–the quality of the work in field-
or subfield-specific venues is often far higher than in the sort of pan-
scientific journals like _Science_ or _Nature_. I think a lot of it has to do
with the quality of the reviewing/editing, but a lot of it is that the papers
have to be written for _and justified to_ a wider audience that wants to be
wowed and doesn't know the background well enough to evaluate the science for
its own sake.

If I write a paper and send it to _Tectonophysics_, I know that the readership
will understand what I'm doing and why, and I will write the paper
accordingly. If I write the paper for _Nature_, then I have to describe and
justify the how and why to a wide range of people, from my peers to
journalists for phys.org and the NYT. Sometimes that's fine: if I find out
that the
Seattle fault is loaded and ready to pop, the press, policy makers and
citizens need to know. But if I find out that the stress field on the Seattle
fault is largely determined by the topography near the fault and that has some
persistent influence on how an earthquake rupture propagates on the fault (but
doesn't necessarily change the seismic hazard) then I don't need to go through
the rigmarole of explaining and justifying any of that to anyone who isn't
intrinsically interested, and maybe more importantly, I don't have to explain
(i.e. gloss over) the subtleties, ambiguities and caveats of the work to an
audience that lacks the relevant background. This simply allows me to write a
more clear and more honest paper.

This brings up a tangent that is relevant to the broader topic of self-
publication: You _always_ need to write to a specific audience, and with a
journal you know who that audience is. With a blog or a website, you don't
necessarily. That may be fine but it can trip a lot of people up, and make the
writing much worse.

~~~
jessriedel
The arXiv serves to unbundle the _dissemination_ part of journal publishing
from the _filtering_ and _certifying_ parts. The arXiv is only trying to
disseminate, and it is happy (for now) to let traditional journals and other
sources do the filtering and certifying.

Let me reply to your points in particular:

> Discoverability is a big part of it as well, at least at the user end...If
> there was a geoRxiv then I would probably browse it more regularly because
> the chances of me finding something relevant would be much higher.

Are you just talking about the individual subject areas (ecology, genetics,
etc.)? The arXiv has those as well, which you can subscribe to. Few physicists
subscribe to the entire thing. (Incidentally, folks may find
[https://scirate.com/](https://scirate.com/) filters a bit better.)

Of course, the arXiv doesn't have a dedicated biology section, much less sub-
divisions, but this is because there hasn't been enough interest.

> The second reason, which is not unrelated, is that–at least with peer-
> reviewed journals–the quality of the work in field- or subfield-specific
> venues is often far higher than in the sort of pan-scientific journals like
> Science or Nature.

The main point of the arxiv is to put everything in one place which is
permanent, searchable, sortable, freely available etc. Filters generally come
from elsewhere, such as the aforementioned sections or SciRate, or by simply
looking at arXiv papers published in certain journals (without needing journal
access).

Obviously, in the absence of additional filters, bioRxiv won't be useful as a
filter either.

> If I write a paper and send it to Tectonophysics,...

You'll find that there are plenty of popular-level papers on the arXiv sharing
space with highly technical ones; this is generally noted in the abstract.
While the arXiv is not meant for public consumption, there are plenty of
filters that try to pluck out accessible papers, e.g., the Physics arXiv Blog
(which isn't as official as it sounds):
[https://medium.com/the-physics-arxiv-blog](https://medium.com/the-physics-arxiv-blog)

------
kusmi
So it was put online without peer review? Papers can always be submitted by
the author to PMC using NIHMS if the journal doesn't do it. However, the paper
must go through a journal, because journals arbitrate the peer review.

------
octatoan
There's a massive project going on in math using GitHub to write an open-
source algebraic geometry textbook.

[http://stacks.math.columbia.edu](http://stacks.math.columbia.edu)

------
zem
can someone explain this bit:

> If university libraries drop their costly journal subscriptions in favor of
> free preprints, journals may well withdraw permission to use them

withdraw permission to do what exactly, and enforced how?

~~~
jhbadger
Withdraw permission to publish an article in their journal that has already
been distributed as a preprint. Many journals in biology do that now: they
consider it publishing an article twice, which is a big no-no ("self
plagiarism"). This is beginning to change, and hopefully preprints will be
accepted in biology as they are in other fields.

~~~
michaelhoffman
Times have already changed. Many biology journals and publishers allow pre-
prints.

[https://news.ycombinator.com/item?id=11293619](https://news.ycombinator.com/item?id=11293619)

------
p4wnc6
This isn't _that_ innovative. Creationists have been doing this for years.

~~~
hutzlibu
But for different reasons ... nobody (serious) wants them. If they were in
high demand, they would paywall themselves, for sure.

------
tevlon
Does somebody know why we are still using PDFs for papers? I know a lot of
people who are trying to parse PDF files, and it is an awful process.

If somebody is looking for an idea for a new venture, this is a problem yet to
be solved!

~~~
jessriedel
We are using PDFs because they have universal adoption and, importantly, they
reliably produce the same document everywhere. Most alternatives you might
think of will give variable results on different machines.

There definitely needs to be something that makes it easier to parse and
otherwise interact with a PDF. But, for network-effect reasons, it's probably
easier to introduce a parseable overlay for PDFs than to replace the format
wholesale.

~~~
jandrese
There is a tug of war taking place here. TeX is nice because the documents are
formatted for whatever your reading situation happens to be. PDF is formatted
for exactly one situation, the A4 sheet you targeted.

But on the flipside, the TeX document will often be ugly no matter how you are
reading it, and the author can't easily apply tweaks for aesthetics or
readability--many try, hence the TeX markup horrorshow.

~~~
jessriedel
Honestly, most of these issues are just a product of .tex's long and storied
history. If some foundation plunked down $1-$10 million, it could definitely
produce an open source successor to .tex (with extensive, maintained, and
documented libraries like Mathematica) that avoids most of the badness.

------
danieltillett
If I were an evil journal editor, I would use the metrics on bioRxiv to accept
or reject papers. This would make it easy to predict a paper's future impact
and help game the impact factor of your journal.

~~~
stuxnet79
This is actually very clever. With this we will just be moving the goal posts
but at least it will prove to scientists that any notoriety acquired in the
pre-print stage can lead to later career success.

~~~
danieltillett
It is actually quite hard to predict which papers will become hits and which
will be misses. If I look back over my own papers, there were a few I knew
were going to be (minor) hits, but quite a few of the papers I thought would
be of interest were ignored, while others I thought were pretty minor
attracted a lot of interest.

One thing I have never understood (apart from editor laziness) is letting the
authors write the article title and abstract. So many great papers are
overlooked because the authors wrote a boring or misleading title or abstract.

------
the_watcher
The concerns over "peer review" seem ridiculous to me. The peers who would
review it would still have access, and it would open it up to exponentially
more people.

------
cmurf
What about ODF? The Open Document Foundation has been separate from the
content creation applications for a couple of years now.

------
z3t4
What's stopping them from writing in HTML with some standard CSS?

~~~
pc2g4d
MathML's lack of support?

------
dfraser992
Information wants to be free, BITCH... But seriously, given the increasing
"media savviness" of successive generations (from Baby Boomers who grew up
with TV, to Gen Y for whom the Internet is a given), the general ability
across the spectrum of humanity to synthesize disparate information sources
and filter them, compare and contrast, and decide what is 'truthy' vs actually
true... is increasing. Given all the information scientists have to process,
what if machine learning were applied to this problem? The role of traditional
gatekeepers is breaking down. I see this in the publishing industry: lots of
content, most of the self-published books are awful, but books like "Wool" are
able to rise to the top.

At least I hope humanity is getting more sophisticated. What is the median age
of Trump supporters, and the one-sigma standard deviation? That would be an
interesting statistic.

