
Reputation Metrics Startups Aim To Disrupt The Scientific Journal Industry - cwan
http://techcrunch.com/2013/02/03/the-future-of-the-scientific-journal-industry/
======
impendia
> Increasingly it will be seen as perverse to submit a paper to a journal and
> wait 12 months for comments from two scientists, instead of sharing it on a
> platform like Academia.edu and getting comments from hundreds of scientists
> in two weeks.

This seems absurdly far-fetched to me. It is easy and common for
mathematicians to share their work (e.g. on the arXiv) and I consider myself
fortunate to get two or three good comments, let alone hundreds. (Of course,
this is a claim made by TC, not by academia.edu.) Whatever the virtues of
academia.edu or any other technological solution, I do not expect it to
multiply the audience for my work by an order of magnitude.

By way of comparison, check out Terry Tao's math blog. His reputation, his
expositional ability, and the size and enthusiasm of his audience are
surpassed by no one. And of course, the blog (and the ability to comment) are
freely available to everyone with a web browser.

<http://terrytao.wordpress.com/>

Look at the comment counts. For many posts, they are in the single digits.
And they stay this low even though Tao personally responds to many of the
comments and questions.

I suggest that this is likely to be an upper bound for responses to math
writing posted anywhere, in any format, with or without restrictions on who
can read, write, or respond to what's posted.

I don't think peer review is broken, at least not in math. It can be annoying
to do (which is why it takes so long), but we regard it as an obligation and
that's why we do it.

The issue, in my mind, is to separate peer review from "publishing", or more
to the point, from dissemination. The reason this problem is so thorny and
obnoxious (again, in my mind) is that in principle it is trivial.

~~~
RichardPrice
I believe that the two problems you mention (peer review taking so long, and
mathematicians not generally commenting much online) can be solved with
reputation metrics.

The reason scientists don't blog a lot, or comment on blogs, is that there
aren't reputation metrics associated with that kind of contribution to the
scientific canon. It is the same reason that scientists don't share their
data-sets. It is work to blog, and to prepare data-sets for public
consumption, and there is no reward. If we reward scientists for sharing their
comments on each other's papers, and for sharing data-sets, more scientists
will be incentivized to make these kinds of contributions to the scientific
canon.

The same goes for speed. When the journal editor asks a scientist to peer
review a paper, the scientist accepts the request out of a sense of social
obligation, but immediately that job goes to the bottom of her list of
priorities. She will get no credit for doing the peer review, and so there is
no reason to do it quickly.

This de-prioritization of peer review slows down the whole scientific
communication system. To speed it up, we need to incentivize scientists to
share their comments quickly. If someone sees something wrong in the description
of an experimental set-up, the system needs to be set up in such a way that
they are incentivized to share that insight quickly.

~~~
visarga
I've been saying this for many years: if scientists can't break past the issue
of copyright, nobody will. Academia concentrates probably the smartest minds on
earth. They should do better than cowering under the tyranny of copyright.
They should lead the way.

~~~
Bellone
Few realize that signing over copyright to the publisher relieves authors of
having to defend their work against infringement themselves. Few also
realize the marketing and indexing power of the publisher. Break away from
copyright? "Be careful what you wish for", because you're on your own from
then on.

------
mjn
I'm missing the leap here:

> _The taxpayer ends up paying twice for the same research: once to fund it
> and a second time to read it. The heart of the problem lies in the
> reputation system, which encourages scientists to put their work behind
> paywalls. The way out of this mess is to build new reputation metrics._

The heart of the problem simply lies in closed-access journals, and a simpler
solution is to move to open-access journals, which is already happening.
Reputation is not an insurmountable barrier to this. In my field (artificial
intelligence) the top two journals are now open-access. No fancy new metrics
or removal of the concept of journal needed; problem solved.

I view the solutions proposed here as a bit more problematic. Download counts
in particular are likely to just reward linkbaity research, rather than
quality research, and exacerbate the already problematic race to put out
misleading, hyped-up press releases. Citation counts are liable to gaming as
well, and are commonly gamed both by citation rings and by people who
consciously choose the subjects they publish on in an ADD-ish, citation-maximizing
way, jumping in and out of hot areas to drop off a paper that'll turn up in
searches. When it comes to judging the quality of AI research, I have a lot
more trust in the editorial process of the open-access journals like _JMLR_ or
_JAIR_ than I do in gimmicky new metrics that try to reconceptualize
scientific publishing as gamification, with all the baggage gamification
brings (most notably that it is not merely _open to_, but positively
_encourages_, treating it as a points-grabbing system to be gamed).

I don't have much trust in the supposedly "open" motives of this new batch of
para-academic for-profit companies, either. Notice how academia.edu won't even
let you download the PDFs of articles without registering for an account. I
would guess the real purpose of these metrics is to set themselves up as new
scientific gatekeepers in one form or another: to transition from a journal-
oriented publication system to an "academic marketplace" oriented system where
they own the marketplace. I have a lot more trust in the by-scientists, for-
scientists model of _JMLR_.

~~~
RichardPrice
Open access journals, based on authors paying $1,000-$3,000 to publish a
paper, are going to be a temporary stepping stone on the way to the decline of
the journal.

We need new reputation metrics to encourage the sharing of data-sets, videos,
code, and generally the full range of a scientist's output. 75% of the world's
scientific data doesn't get shared because the rewards aren't there for the
scientist to share it. Journals aren't going to start publishing data-sets (or
code, or videos, or blog posts), and so we need new reputation metrics, and
reward systems, to encourage scientists to share this kind of information.

The other problem with the journal system is that the peer review system is
slow and surfaces the opinions of only two or three scientists about a given
paper. If you have a community of 20,000 people in the DNA sequencing research
community, asking two people what they think about a paper is not going to
deliver a statistically significant result. Furthermore, the typical time-lag
for the system to surface those opinions is 12 months.

We need a more robust peer review system which surfaces opinions from the
entire scientific community in real time. Reputation metrics are the key to
building that system. We need to provide the rewards for scientists to share
their comments on papers, so they can build their reputations off the insights
they share on each other's papers.

~~~
mjn
Yes, I get that's the sales pitch, but you didn't address gamification at all,
which is what you're promoting and which will positively harm science, not
improve it.

In addition, peer-review at well-run open access journals, which are the model
I think we should be moving towards, does not take anywhere near 12 months,
and in CS there are no publication fees (I gather this differs in biology). I
typically get very well-thought-out reviews which help to improve my papers in
about 6-8 weeks, and if accepted, publication is completely free of charge.
And, it's becoming increasingly common to circulate preprints ahead of
submission, on places such as the arXiv or at conferences or workshops, to get
the feedback of a larger proportion of the community before publishing the
final version of the paper.

I agree peer review does not settle the correctness of a paper; it's a system
for ensuring high-quality discourse, not for publishing only papers that are
the last word on a subject. That's what reply papers, replication studies,
etc. are for. I am skeptical that attaching a bunch of points and badges to
the discourse is going to improve the quality of that discourse, and suspect
it will harm it. I'd rather look at how to improve the discourse by improving
the journals. But that won't make money for VCs, so I'm not surprised that
we're only seeing it from nonprofit organizations like JAIR and PLoS.

~~~
RichardPrice
With the time-lag, you have to take into account the rejection rate. Most
articles don't get into their first choice journal, and perhaps get into their
second or third choice. You can't parallel submit, meaning that the time-lag
between finishing the paper, and it getting published in a journal, often
ranges between 6 and 18 months.

I understand that Computer Science does have a slightly faster peer review
process than other fields. Here is a piece where a leading neuroscientist
writes "Today the lag [in my field] between submission and acceptance is often
more than a year".
[http://www.pnas.org/content/early/2013/01/25/201300924.short...](http://www.pnas.org/content/early/2013/01/25/201300924.short?rss=1)

Gamification is an important issue. You should note that the journal's peer
review system, based on two people reading a paper, gets gamed. There is the
widespread practice of defensive citation, where you cite anyone
who might conceivably peer review your paper, in order to maximize the chances
of its getting accepted. Some people know how to play that game better than
others.

One way of handling gamification is to look at the quality of the people
interacting with a piece of content, and not just the quantity. This is how
Google's PageRank algorithm deals with the issue.
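
For concreteness, here is a toy sketch of quality-weighted interaction counts,
in Python. The names and reputation numbers are made up, and this illustrates
the general idea rather than anything academia.edu actually runs:

    # Weight each upvote on a comment by the voter's own reputation, so
    # one endorsement from a trusted expert outweighs many from throwaway
    # accounts. All names and numbers here are hypothetical.

    voter_reputation = {"expert": 9.0, "grad_student": 2.0, "sock_puppet": 0.1}

    def weighted_score(upvoters):
        """Sum the reputations of everyone who upvoted a comment."""
        return sum(voter_reputation.get(voter, 0.0) for voter in upvoters)

    print(weighted_score(["expert"]))            # 9.0
    print(weighted_score(["sock_puppet"] * 50))  # 5.0: mass gaming gains little

PageRank closes the loop by computing the reputations themselves from the same
graph of interactions, iterating until the scores converge.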

There is also the approach, taken by Twitter and Facebook, of picking out the
people whose judgements you trust, and then letting them act as quality
filters. Those systems can be gamed too, but aren't typically gamed to a large
extent.

~~~
jacquesm
I think you are confusing gamification and gaming the system; those are not
the same thing.

GP was referring to badges, points, etc. (gamification), and you are referring
to gaming the system (i.e. citing defensively).

~~~
jessriedel
If so, then mjn needs to clearly distinguish between (bad) gamification and a
(good) reputation system, which will necessarily involve _something_ like
points or certifications.

------
PommeDeTerre
The article mentions Stack Overflow reputation. I've always seen it as an
extremely bad measure of reputation. In practice, it only measures a few
things, none of which are very relevant:

\- The length of time somebody has been using SO.

\- That person's ability (which usually amounts to just having lots of free
time) to repeatedly answer the very basic jQuery or .NET questions that are
asked time and time again by new users.

\- The number of other people who will blindly upvote an answer after merely
seeing a picture of, say, Jon Skeet's face.

None of those are indicative of true knowledge, experience, or natural talent.
They're more a measure of popularity than they are of reputation.

~~~
minopret
Service is widely considered a key measure of excellence for academics, along
with research and teaching.

I value my SO rep because it accords approximately with how much (whether in
quality or quantity) I have served the site's readership. The site is robustly
healthy. I think its success is evidence that SO rep is an effective incentive
for SO participants.

I agree that unless we examine the specific activities, SO rep validates only
a minimal degree of technical ability. I just never thought that was its
primary function.

------
cantastoria
_The business models that will emerge in science will be as diverse as the
ones on the web at large. There will be advertising businesses; freemium
models; and enterprise sales models._

I'm not sure who the author thinks the customers for these services will be.
Perhaps universities will pay for access but I guarantee you faculty and
students will not. I also can't imagine advertisers paying much to target
academics, a small and not particularly unique demographic. In short, I just
don't see how these sites hope to generate any revenue. Academia is an
incredibly difficult market to get money out of and even if you're going to
sell at the university level you're talking about a massive sales effort that
is going to take years to yield results (think Blackboard). As others have said
here, journals are going to end up being open-access and free. Many of them
will probably end up being run by well-endowed university presses (e.g. MIT
Press).

~~~
ivan_ah
> I just don't see how these sites hope to generate any revenue.

Good point. The academic publishing "sector" is definitely a very interesting
domain and there is a lot of POWER in knowledge, but it is not clear what the
role of a startup company would be in there. We could bring cool new
technology but without wide adoption tech is useless.

Mr. Price is trying the generic (and by now annoying as hell) "social network
for science" approach but it is not clear that having "likes" for papers
inside the academia.edu walled garden does anything for science. What is cool is
the tremendous opportunity for better scientific discovery applications and
the organization of knowledge. Hopefully at some point Mr. Price will _do_
something interesting on that front instead of spending his time in PR mode
writing guest posts for tech news sites.

@Richard: Sorry bro, but the "sign up to see the PDF" really pissed me off the
other day so you are in the bad books until you change that. Not cool.

My guess is that the future of dissemination looks a lot like the arXiv (self-
service, cheap) and that a peer review system can be bolted on top of it.
Prof. Gowers recently mentioned that he had heard of an initiative to build "arXiv
overlay journals" [1].

[1] [http://gowers.wordpress.com/2013/01/16/why-ive-also-joined-t...](http://gowers.wordpress.com/2013/01/16/why-ive-also-joined-the-good-guys/)

------
kriro
The scientific publication business model is a work of "genius". In a
nutshell:

\- Outsource content production, get content for free

\- Outsource the review process, get quality assurance for free

\- Slap a brand name on it and charge (with many customers opted into autopay
plans)

I'm looking forward to one of the disruptors succeeding. I think it is
reasonable to demand that any tax-funded research be openly available for
society to benefit from.

------
tokenadult
I see the article kindly submitted here is a guest post by Richard Price,
founder and CEO of the Academia.edu website. I have signed up for Academia.edu
(and the confusingly similar site ResearchGate, run by a different group of
founders), but so far haven't seen a lot of activity on these new sites.

The point is well taken that there needs to be reform in how scientists
develop their reputation as researchers. Right now, reviewing submissions to
scientific journals is anonymous, and not well rewarded. Jelte Wicherts,
writing in Frontiers in Computational Neuroscience (an open-access journal),

Jelte M. Wicherts, Rogier A. Kievit, Marjan Bakker and Denny Borsboom. Letting
the daylight in: reviewing the reviewers and other ways to maximize
transparency in science. Front. Comput. Neurosci., 03 April 2012 doi:
10.3389/fncom.2012.00020

[http://www.frontiersin.org/Computational_Neuroscience/10.338...](http://www.frontiersin.org/Computational_Neuroscience/10.3389/fncom.2012.00020/full)

suggests new procedures for making the peer-review process in scientific
publishing more rewarding and more reliable too. Wicherts does a lot of
research on this issue to try to reduce the number of dubious publications in
his main discipline, the psychology of human intelligence.

"With the emergence of online publishing, opportunities to maximize
transparency of scientific research have grown considerably. However, these
possibilities are still only marginally used. We argue for the implementation
of (1) peer-reviewed peer review, (2) transparent editorial hierarchies, and
(3) online data publication. First, peer-reviewed peer review entails a
community-wide review system in which reviews are published online and rated
by peers. This ensures accountability of reviewers, thereby increasing
academic quality of reviews. Second, reviewers who write many highly regarded
reviews may move to higher editorial positions. Third, online publication of
data ensures the possibility of independent verification of inferential claims
in published papers. This counters statistical errors and overly positive
reporting of statistical results. We illustrate the benefits of these
strategies by discussing an example in which the classical publication system
has gone awry, namely controversial IQ research. We argue that this case would
have likely been avoided using more transparent publication practices. We
argue that the proposed system leads to better reviews, meritocratic editorial
hierarchies, and a higher degree of replicability of statistical analyses."

------
academicnumber
There are a number of aspects of publishing, and perhaps of reputation, that
academics would like to reboot. We are frustrated with e.g. Elsevier, and
there is a big
boycott on. Presumably this boycott, and the reasons for it, will extend to
Mendeley after it becomes part of Elsevier. Given that all these startups are
probably heading for similar exits (if successful), this is just more of the
same. When we reboot publishing, it will be in a way we control, not with
Elsevier 2.0.

------
minopret
"A few years ago, Google Scholar started displaying inbound citation counts
for papers... . Scientists have started to see these inbound citation counts
as a way to demonstrate the impact of their work..."

Let's note that article citation counts have been commonly available at
research libraries for about fifty years in the Science Citation Index.
(Disclaimer/disclosure: I do not speak for the business that produces the
Science Citation Index. That business is my current employer.)

~~~
RichardPrice
Yes, that's true. But authors only started including their inbound citation
counts on their job applications in the last few years.

I think the main reason for this is that Google Scholar's citation metrics
were free and openly accessible, and the Science Citation Index was, and still
is, behind expensive paywalls. Scientists were happier to include citation
metrics in their resumes when they knew that the committee could easily check
them, and wouldn't have to pay to get access to them.

~~~
gammarator
Supporting your point, astrophysicists have been using citation counts in this
way for quite a few years, thanks to the citation data provided by the NASA
Astrophysics Data System.

Citation counts are biased, but at present they are the best we have. Their
main attraction is that they are _transparent_ and hence trustworthy, and a
proxy for what we really care about: aggregate expert judgement of importance.
Any replacement will have to improve on one or both of those dimensions; I
don't think timeliness alone is enough.

------
jnazario
i disagree with the author of this piece about a major premise underlying his
thesis: that journals ultimately make no real contribution to the author or
community that can't be done away with.

top tier journals are top tier journals not because of marketing but because
of their standards. their standards are high because they enforce rigorous
quality control (reviews) and solicit the best, most groundbreaking work in
the field. they bring on the best editorial staff and set directions rather
than follow them.

arxiv, for example, is not the same as PNAS. anyone can put anything in arxiv,
there is no quality control, no editorial board, no selection process.

i think that if you tried to demolish that, you'd wind up with a replacement,
not a whole new model.

to outsiders this all seems capricious, fickle, arbitrary and anachronistic.
instead, it's important because science - the forward progression of human
knowledge and understanding - relies on ensuring that valid, repeatable
results get published, not falsehoods (e.g. "vaccines cause autism" and such
tripe) or outright plagiarism. journals work because they provide this. top
tier journals are recognized as such because they publish the best quality
work, which attracts more top quality work.

~~~
DennisP
That's the whole reason for adding a reputation system, which is what replaces
quality control and selection.

------
rflrob
There's one other function of journals that few people seem to be bringing up
or addressing: that of filtering the flood of research. Hundreds of papers are
published each week in "biology" across dozens of journals, but in most weeks
only zero or one of those will be interesting to me. By scanning the titles in
Science, Nature, and Cell, I'm much likelier to find the really cool stuff
than if I were just given a list of all papers with a given tag.

I'm not saying that this recommendation problem is intrinsically hard, just
that it doesn't seem to be getting as much play as the other aspects of the
open science revolution.

------
cantos
Richard's proposition is basically that metrics should be created and used
because it's always better to have more data. This is only true if the data is
good. The biggest problem I can see is that metrics are easy to fake. That's
fine on sites like StackOverflow, where the stakes are very low, but when
metrics are accorded extremely high value, as is being proposed here, you
immediately create a huge incentive to pollute the data.

The only way out of this is to create some kind of quality assurance of the
metrics themselves. To me that seems like a monster problem that no startup
can possibly handle.

If they restrict their scope to something narrower than liberating all of
academia from the publishers, there are probably ways to add value.

------
jpdoctor
> _and the primary reputation metric in science is being published in
> prestigious journals, such as Nature, Science, and The Lancet._

Absolutely not. The primary metric is citations.

Where do people get these ideas?

~~~
mjn
Speaking of metrics, using a very rough one (ye olde Google Book Ngrams), we
can guess that citation-counting became prominent in the late 1970s:
[http://books.google.com/ngrams/graph?content=citation+count%...](http://books.google.com/ngrams/graph?content=citation+count%2Chighly+cited&year_start=1800&year_end=2008&corpus=15&smoothing=3&share=)

~~~
jpdoctor
I remember tenure committees would have "vertical inches" measured from the
Science Citation Index.

In the world of big data, not all citations are created equal, and I look
forward to new metrics based on quality of citations.

~~~
mjn
I've long wanted some kind of citation-sorting for research, though I'm more
skeptical of the whole business for assessment. For research, I think it could
improve discoverability and ability to trace chains of influence. If I find a
paper that was cited 190 times and I want to see whether anyone's really
followed up on it (critiqued it, extended it, revised it, refuted it, etc.),
the reverse-citation list is too noisy. It'd be great if I could filter the
citations, for example to exclude "related work exists [2,5,12,44]" type
throwaway citations.
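
As a sketch of how such a filter might work, here is a crude, hypothetical
heuristic over citing sentences. The keyword list and the regex for grouped
bracket citations are invented for illustration; a real tool would also need
citation-context data that publishers rarely expose:

    import re

    # Words that suggest a citation actually engages with the cited paper.
    # This list is invented for illustration, not taken from any real tool.
    SUBSTANTIVE_MARKERS = ("extend", "refute", "replicat", "critiqu", "improve")

    def is_substantive(citing_sentence):
        """Guess whether a citing sentence is a real follow-up or a throwaway."""
        sentence = citing_sentence.lower()
        # Grouped citations like [2,5,12,44] are usually throwaway mentions.
        if re.search(r"\[\s*\d+(\s*,\s*\d+){2,}\s*\]", sentence):
            return False
        return any(marker in sentence for marker in SUBSTANTIVE_MARKERS)

    contexts = [
        "Related work exists [2,5,12,44].",
        "We extend the model of Smith et al. to the noisy setting.",
    ]
    print([s for s in contexts if is_substantive(s)])  # keeps only the second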

------
revorad
Richard, do you have any plans to build tools that make writing scientific
articles easier? Right now, the tools and the output are too rooted in the old
ways of deadtree publishing - after all, articles are still called "papers".
The whole format is based on printing.

Since you're trying to get scientists to publish new types of media, don't you
think you need to provide software that makes creating such media easier?

You can't move from books to blogs to tweets, if all you have is a typewriter.

------
eschulte
A more straightforward solution would be to pass this bill [1], which would
require that taxpayer-funded research be made publicly available. Once this
becomes mandatory the publication/reputation system will be forced to adjust
accordingly (and many journals may not survive the transition).

[1] <http://www.govtrack.us/congress/bills/112/hr4004>

------
pseut
"Disrupt[ing] the scientific journal industry" is a massively ambitious goal.
Good luck! I'd really like to know more about startups that are trying a less
ambitious first step; something like "wordpress for marine biologists who want
to start an open-access journal" or "reddit for geneticists." The issues in
different disciplines seem different enough that _intimate_ domain knowledge
is almost mandatory and something that starts with the idea "I want to talk to
these people about this subject in this way" could work out better than "kill
the journals." Since it's fun to quote PG:

> The way to get startup ideas is not to try to think of startup ideas. It's
> to look for problems, preferably problems you have yourself. [1]

I guess, "I want to read other people's research for free" counts as a
problem, but I'd actually be surprised if it were literally a problem that the
founders of this startup (and the others mentioned in the article) were
facing.

I mention this in a comment below [2], but Economics is _extremely_ open by a
lot of standards (maybe because of the policy-relevance of a lot of the
research, but that's speculation on my part). There is a large blogging
community that includes very accomplished researchers, most working papers are
available online well before publication (and will typically reflect at least
one round of referee comments), there are comprehensive citation-based
rankings, etc. Anyone interested in this stuff should check out
<http://repec.org>; the html is dated, but it seems like an established
unwalled-garden version of what this article, and others, describe (one that
uses email and rss feeds rather than direct-messages and social-network-type
friends; decide for yourself which you prefer).

Economists still publish any important research in peer-reviewed publications,
and most of them are closed-access. So I find it unlikely that just the right
infrastructure and "metrics" are going to kill peer-reviewed publication. I
think it's _possible_ that some set of tools could kill the crappy scientific
journals in one or two fields and certainly could kill all of the crappy
closed-access journals in those fields, and then iterations and incremental
improvements could lead to something that worked across fields and worked for
higher-quality publications.

[1] <http://paulgraham.com/startupideas.html> [2]
<https://news.ycombinator.com/item?id=5160633>

------
DennisP
Seems like Pagerank would be an ideal reputation metric, following citations
in papers instead of links in web pages. Except it's patented until 2018 or
so.
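
For what it's worth, the core recurrence is simple enough to sketch. Here is a
toy power-iteration version over a hypothetical citation graph (an
illustration of the textbook idea, not Google's production algorithm):

    DAMPING = 0.85  # standard damping factor from the PageRank literature

    def citation_rank(citations, iterations=50):
        """Power iteration over a paper-citation graph; toy data only.
        Papers that cite nothing let their rank leak out; real
        implementations redistribute that dangling mass."""
        nodes = set(citations) | {p for cited in citations.values() for p in cited}
        n = len(nodes)
        rank = {p: 1.0 / n for p in nodes}
        for _ in range(iterations):
            nxt = {p: (1.0 - DAMPING) / n for p in nodes}
            for paper, cited in citations.items():
                for target in cited:
                    nxt[target] += DAMPING * rank[paper] / len(cited)
            rank = nxt
        return rank

    # Hypothetical graph: each key cites the papers in its list.
    toy = {
        "new_method": ["seminal", "survey"],
        "follow_up": ["seminal"],
        "application": ["seminal", "new_method"],
    }
    for paper, score in sorted(citation_rank(toy).items(), key=lambda kv: -kv[1]):
        print(paper, round(score, 3))  # "seminal" comes out on top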

~~~
jacquesm
Pagerank is a citation-based ranking system; it existed in the academic world
_long_ before it was applied to the web.

That patent should not apply to anything, but it certainly doesn't apply to
citation ranking of academic papers.

~~~
DennisP
Do you mean the general idea of citation ranking, or the specific math used by
Pagerank?

~~~
jacquesm
Citations in scientific papers are by their very nature uni-directional (you
can't cite a future paper); the web does not have that restriction.

~~~
pseut
Not quite true; papers in many fields are available as working papers that get
updated, and I've frequently seen papers that cite each other.

