
A Reboot of the Legendary Physics Site ArXiv Could Shape Open Science - tonybeltramelli
http://www.wired.com/2016/05/legendary-sites-reboot-shape-future-open-science/
======
Aardwolf
I currently really like the site as-is. Useful links to useful PDFs :) The
article says it's a physics site, but I use it more for Mathematics and
Computer Science.

The HTML is small and loads fast and automatically works on mobile because
it's just simple HTML.

I hope the reboot won't mean huge bloated HTML+JS, and then a "mobile" version
that has less features than what it has right now.

P.S. the wired article talks about arxiv.org but doesn't have a single
clickable link that goes to it??!!

~~~
mohsinr
Exactly, not a single link to the site. Had to google and find: Here it is
just in case: [https://arxiv.org/](https://arxiv.org/)

~~~
vanderZwan
Welcome to modern web articles, where every link points back to more articles
of the same website. For example:

> _As one of the first open access science sites, this redesign may influence
> the paths of its younger siblings, like biologists’ bioRxiv whether they
> follow the same route or rebel against it._

There's a link in the article.. which links to a Wired article on bioRxiv[0],
but not to bioRxiv[1] itself. And before you ask: no, the article on bioRxiv
does not link to bioRxiv.

[0] [http://www.wired.com/2016/02/the-rainbow-unicorn-trying-
real...](http://www.wired.com/2016/02/the-rainbow-unicorn-trying-real-hard-to-
transform-biology-publishing/)

[1] [http://biorxiv.org/](http://biorxiv.org/)

~~~
PascalsMugger
Can't let that precious link juice spill out to other websites! Don't you
remember link juice? Bullshit SEO tactics are killing the web.

~~~
jaytaylor
Agreed, the juice stinginess is lame.

They could've at least added a useful outbound link with rel="nofollow" to be
helpful while avoiding leaking any of their precious juice.

------
charles_dickens
Interesting to see their approach and the reasons why they are not building
more features on top of ArXiv. Although comments on papers might be a
dangerous area to venture into, there are definitely places on the web where
the ability to annotate and comment on papers is helping science move forward.
A good example is the Polymath project and Terry Tao's blog. Tao's recent
solution to the Erdős discrepancy problem, an 80-year-old number theory
problem, was actually triggered by a comment on his blog. Another example is
www.fermatslibrary.com. Although the papers on that platform are more
historical/foundational, they have consistently gotten good, constructive
comments that help people understand the papers better.

~~~
egjerlow
Would you elaborate on why you think comments on papers might be dangerous?
For me it seems like the most natural step forward, given where we are today.

As I see it, it would solve many issues. Ideally, we would move away from a
publication-count metric, and more onto a reputation-based metric. It would
lower the bar for participation in scientific discussion, make reproducibility
more important, and generally be a healthy thing for science as a whole, I
think. But I want to hear opposing viewpoints!

~~~
xioxox
I don't think comments would be useful. Making a proper comment on a paper
requires a significant amount of work (as does refereeing). Lowering the
barrier to entry would mostly produce first impressions, the usual complaints
about missing citations, and misunderstandings. If a paper deserves a serious
comment, I think the proper format for that is another paper.

Although citations are horrible metrics, I think some sort of reputation
ratings based on quick likes would be even worse and even easier to game.

~~~
Aeolun
The only thing you have to do to make comments worthwhile is restrict them to
actual academics. Even if they went on to post shit, it would only destroy
their own reputation.

~~~
adrianratnapala
This destroys one of the potential benefits of comments -- which is to open up
what might be closed communities. Academic disciplines can at times turn into
echo chambers.

But there is the rub: if you make comments open, then you will probably be
flooded with noise. If you close them up, then you fail to get the benefits of
openness.

------
jessriedel
One argument for conservatism: if physicists want a commenting system, why
hasn't SciRate taken off? [https://scirate.com/](https://scirate.com/) The
interface is clean, it has a lot of nice features, etc. It's gotten some
traction in quantum info (my field), but the comments are rarely used. You can
chalk this up to network effects (chicken-and-egg), which direct arXiv hosting
could solve, but that problem can probably be eliminated by taking the minimal
step of allowing the authors to link to the SciRate page from the arXiv page.

No, I think the major effect of having the arXiv host comments directly is
that it would _force_ physicists to engage with folks who comment on their
paper, because otherwise there would be unanswered criticism attached
("stapled") directly to their public-facing papers. Maybe that's what we want,
but that's a huge step and there are a lot of ways for it to go badly.

Here is a mockup that Paul Ginsparg made for author-curated links on arXiv
article pages. See the "Author suggestions" column on the right:
[http://www.cs.cornell.edu/~ginsparg/arxiv/1212.3061-mockup.h...](http://www.cs.cornell.edu/~ginsparg/arxiv/1212.3061-mockup.html)
Allowing links to places like SciRate plausibly solves most of the chicken-or-
egg problem by allowing the author to designate "the" place for public
discussion without picking a particular site to win or lose, or bundling
_release_ of a paper with forced public _discussion_.

Shameless plug: folks here may be interested in my blog post about the future
of academic papers and how the arXiv influences that:
[http://blog.jessriedel.com/2015/04/16/beyond-papers-
gitwikxi...](http://blog.jessriedel.com/2015/04/16/beyond-papers-gitwikxiv/)
There was good discussion on HN last year when it was posted:
[https://news.ycombinator.com/item?id=9415985](https://news.ycombinator.com/item?id=9415985)

~~~
ap22213
I am a non-academic who reads a lot of academic papers. I love arXiv, and I
probably spend too much time on there.

Here's my 10-minute take on scirate (from viewing it on a mobile browser).
Overall, I don't immediately see any added value. The layout and design isn't
great - not on mobile at least. The main view isn't optimized for what I want
to see. There's a huge navigation tree that takes up all the view space. Then,
it shows me a list of ranked papers that are not of interest to me. It forces
me to register to access any features. I don't see comments on any of the
papers, only a Reddit-style rating. Do I have to register to see the comments?
Overall, I don't think I'd ever go back. But, maybe it wasn't designed with my
type of use cases in mind.

~~~
jessriedel
Thanks for the thoughts. (Hope it's clear I have no connection to SciRate.)
The link to comments is directly under the Reddit-style rating, although it
only appears if there actually are comments. Most papers don't have comments,
and you have to be logged in to comment.

Personally, I don't expect it to be nice on mobile and I don't want it to be
effortless to comment. The amount of work it takes to make a useful academic
comment dwarfs those tiny frictions.

------
thomasahle
I hope they take inspiration from the Arxiv Sanity Preserver:
[http://www.arxiv-sanity.com/](http://www.arxiv-sanity.com/)

Also, ArXiv has a lot more than physics. Here's a map over all the papers:
[http://paperscape.org/](http://paperscape.org/)

~~~
eduren
Paperscape looks like a really useful tool for exploring a subject. Thanks for
mentioning it.

------
nemoniac
So arXiv served about 105,000 different articles last year. I'm curious how
much that cost and how it compares with what the big publishing houses tell us
their version of "open" publishing should cost.

~~~
capnrefsmmat
The arXiv 2016 budget is online:
[https://confluence.cornell.edu/display/culpublic/arXiv+Susta...](https://confluence.cornell.edu/display/culpublic/arXiv+Sustainability+Initiative?preview=/127116484/329816057/CY16arXivBudget.pdf)

They estimate spending just over a million dollars this year, funded via
members (university libraries contributing a few thousand dollars each) and
grants.
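
A quick back-of-envelope comparison, taking the "just over a million dollars"
budget above together with the ~105,000 articles mentioned upthread, and the
"over $1000" lower bound on the PLOS ONE fee mentioned below (all figures are
rough numbers from this discussion, not official ones):

```python
# Rough per-article cost for arXiv vs. a non-profit publisher's per-article fee.
# All inputs are approximations taken from this thread, not official figures.
arxiv_budget = 1_050_000      # ~ arXiv's estimated 2016 budget, USD ("just over a million")
articles_per_year = 105_000   # ~ articles served/submitted per year, per upthread
plos_one_fee = 1_000          # lower bound on PLOS ONE's article charge, USD

cost_per_article = arxiv_budget / articles_per_year
print(f"arXiv: ~${cost_per_article:.0f} per article")   # ~ $10
print(f"PLOS ONE charges at least {plos_one_fee / cost_per_article:.0f}x that")  # ~ 100x
```

So even against a non-profit publisher, the bare-distribution model is roughly
two orders of magnitude cheaper per article, which is the gap the extra
services below have to justify.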

However, as I always point out, we must remember that the costs of a published
journal are higher for reasons other than profit: journal articles are
typically converted to HTML and XML via semi-automated processes with manual
proofing and tweaking, production staff and some editors (for the biggest
journals) are paid salaries, peer review involves chasing down deadline-
abusing academics over weeks or months and building some kind of review
tracking system, articles have to be reliably archived (e.g. lockss.org), and
so on.

Then add that most journals are still printed and distributed by mail for some
reason, and you have extra full-time staff and equipment.

I wouldn't want to throw _all_ of that away and just keep the arXiv. Ditching
print makes sense, some things can be better automated (we need better tools
than LaTeX or Word to prepare papers, so they can easily be produced in PDF,
HTML, or semantic XML form), and some services are unnecessary (I've had
journals copy-edit my articles and make no useful changes), but I don't want
to just read crappily-formatted base LaTeX from the arXiv with no frills
either.

(Incidentally, PLOS is non-profit and still has to charge over $1000 per
article in PLOS ONE, to pay for the professional staff and all the manual
manuscript conversion gunk. Biologists love Word, so converting must be hell.)

~~~
AlephGarden
What's wrong with LaTeX (or extensions like XeTeX or LuaTeX)? Cursory
searching pops up tools such as LaTeXML
([http://dlmf.nist.gov/LaTeXML](http://dlmf.nist.gov/LaTeXML)) for converting
from LaTeX to XML; not that I've used it, but it seems to have about a 60%
success rate at converting arXiv papers [1]. That's still about 20,000 papers
that don't build successfully, but it's a tool that could be improved.

What would you suggest as an alternative to LaTeX or Word?

I'm just curious; I use LaTeX regularly and think it works just fine.

[1] [http://arxmliv.kwarc.info/](http://arxmliv.kwarc.info/)

~~~
capnrefsmmat
I'm a big LaTeX fan, but the core typesetting engine is pretty heavily focused
on print layout, so you'd have to restrict your macro subset to produce clean
XML reliably. (You'd also have to see if you can produce JATS, the standard
XML dialect for journal articles.)

I don't know of any good alternatives. LaTeX succeeded because of its
excellent extensibility, so the core can be used to produce almost anything.
I'd like to find similarly extensible tools that can cleanly produce XML as
well as PDF. Maybe Pollen ([http://docs.racket-
lang.org/pollen/](http://docs.racket-lang.org/pollen/)) will be one.

There's also Scholarly Markdown
([http://scholarlymarkdown.com/](http://scholarlymarkdown.com/)), but I
suspect Markdown's lack of easy extensibility will doom it, since you'd have
to write documents that fit narrowly into the features they provide. (What if
you want a "remark" environment and they don't provide it?) reStructuredText
is also an option, since it's built for extension, but it's not very well-
known outside of Python circles.

------
drole
I find that these discussions of "to comment and like" or "not to comment and
like" always miss the mark. Both sides of the discussion have solid
arguments, and I find it hard to imagine the discussion ever resolving itself,
thereby preserving a status quo no one is really satisfied with.

The problem is that most people's frame of reference when talking about
comments and "likes" is the Facebook, Reddit, or [insert some publisher] model.
That is all fine and dandy as a starting point, but let's not forget that all
of those models are specifically engineered to keep people posting and liking
as much as possible rather than focusing on like and comment quality. There is
no reason, or at least I don't see one, why a different model focusing on like
and comment quality could not be engineered. Furthermore, if such a model
could be engineered well enough, I don't see why a comment and like section on
arXiv could not contribute to furthering science as a whole. With that in mind,
a much more fruitful and positive discussion could be had about "how do we
create a comment and like section where everyone's interests are protected?"
rather than the old back-and-forth of lowering the barrier to entry vs.
quality.

As others have noted, much better starting points for the above discussion
could be the Stack Overflow model or something similar.

------
conjectures
If this happens, there's a lot to be learned from Stack Exchange on how to do
this well. The stats.stackexchange.com site is actually pretty high quality
IMHO.

Arxiv could use some of that reputation & moderation structure to regulate
content. Cranks and time-wasters could be sidelined. It would certainly work
much better than a typical newspaper comments section.

It would work by establishing expectations of how much substance should be in
a response to a paper. Then getting the community to pitch in on the quality
of responses. You still get echo-chamber effects with this, but the quality
filter is generally worth it.
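
As a sketch of what such a quality filter could look like mechanically: one
standard technique for ranking voted-on items (used, for example, by Reddit's
"best" sort) is the lower bound of the Wilson score confidence interval, which
rewards a response only when it has both a good approval ratio _and_ enough
votes to make that ratio trustworthy. This is purely an illustration of the
idea, not anything arXiv or Stack Exchange actually specifies:

```python
import math

def wilson_lower_bound(upvotes: int, downvotes: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the true approval rate.

    A response is ranked by how confident we are that its underlying
    approval rate is high, so a lightly-voted 5-0 comment does not
    outrank a heavily-voted 200-10 one.
    """
    n = upvotes + downvotes
    if n == 0:
        return 0.0
    phat = upvotes / n                      # observed approval fraction
    denom = 1 + z * z / n
    centre = phat + z * z / (2 * n)
    margin = z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)
    return (centre - margin) / denom

# A well-voted response beats a perfect-ratio response with few votes:
print(wilson_lower_bound(200, 10) > wilson_lower_bound(5, 0))  # True
```

The point is that vote-based ranking does not have to reduce to raw like
counts; the scoring function itself can encode a conservatism bias.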

~~~
freddref
I _really_ like the idea of using something similar to the stack overflow
model to help with peer review. It could open up the peer review process to
more people for a longer period of time, giving a higher chance that more
people will contribute if there is interest and a higher overall quality of
discussion. All contributions give you a clear increase in reputation points.

Groups of academics could band together to form online-only journals on a
specific topic and then allocate medals to papers that are considered
accepted. So while there may be many loose papers out there with few or no
comments (as on arXiv currently), only a few will get the seal of approval.

~~~
HelloNurse
I would fear fabricated likes and positive comments about "our" papers and
fabricated un-likes and negative comments about "their" papers; academic
cliques and individual professors can leverage their students to pollute any
StackExchange-like mechanism, and if they have too few slaves there are plenty
of PR firms to do the job.

------
westurner
I've written up a few ideas about PDFs, edges, and reproducibility (in
particular), under the hashtags #LinkedReproducibility (and #MetaResearch):

[https://twitter.com/search?q=%23LinkedReproducibility](https://twitter.com/search?q=%23LinkedReproducibility)

[https://twitter.com/search?q=%23MetaResearch](https://twitter.com/search?q=%23MetaResearch)

\- schema.org/MedicalTrialDesign enumerations could/should be extended to all
of science (and then added to all of these PDFs, which currently lack
structured edge types like e.g. {intendedToReproduce, seemsToReproduce}, which
would then have specific ensuing discussions)

\- [http://health-lifesci.schema.org/MedicalTrialDesign](http://health-
lifesci.schema.org/MedicalTrialDesign)

\- there should be a way to evaluate controls in a structured, blinded, meta-
analytic way

\- PDF is pretty, but does not support RDFa (because this _is_ a graph)

... notes here: [https://wrdrd.com/docs/consulting/data-science#linked-
reprod...](https://wrdrd.com/docs/consulting/data-science#linked-
reproducibility)

(edit) please feel free to implement any of these ideas (e.g. CC0)

~~~
westurner
> a way to evaluate controls

a way to evaluate premises (assumptions, controls, data, data transformations)
and conclusions, presented as e.g. JSON-LD or RDFa in a standard form
(potentially like IPython/Jupyter .ipynb, but with an OrderedMap of I/O
sequences with fixed #urifragment IDs).

------
joelthelion
I just hope they don't wreck ArXiv in the process.

------
hyperion2010
Aggregation and curation should remain separate. Perhaps the most useful thing
would be for arXiv (or some other entity) to collect links from known curation
sources to aid in the evaluation of papers.

------
PepeGomez
Weird, but the article seems to be paywalled.

~~~
crispyambulance
No, they just detect if you're using adblocker and put up a block screen if
you are.

Sorry Wired, but too many newspaper websites have abused their advertising
privilege for me to ever read a newspaper without an adblocker again.

~~~
a_imho
How is trying to detect what kind of software I'm running not a violation of
the CFAA
[https://ilt.eff.org/index.php/Computer_Fraud_and_Abuse_Act_%...](https://ilt.eff.org/index.php/Computer_Fraud_and_Abuse_Act_%28CFAA%29),
especially the "exceeding authorization" part? IANAL and not from the US, so I
guess it doesn't really apply to me. Still, I won't be sad the day Wired and
co. go down the drain.

------
wodenokoto
I never browse arxiv, but access it a lot through other services.

I kind of like that model. Have the site be good at what it is best at, and
let other services provide further features, such as search (Google Scholar)
or stricter moderation, commenting, and discussion.

------
auvi
I am not sure why it is called "Legendary". What's the legend here?

~~~
dalke
Like many words, "legendary" has multiple meanings, and the correct one needs
to be inferred from context. A dictionary can help if something, like this,
doesn't make sense. For example, [http://www.merriam-
webster.com/dictionary/legendary](http://www.merriam-
webster.com/dictionary/legendary) says:

> 1: of, relating to, or characteristic of legend or a legend

> 2: well-known, famous

It seems you only knew of the first definition, while the article uses the
second.

------
erubin
posted behind a paywall on wired...

~~~
paride5745
Clean Firefox (no addons) from Italy, no paywall or ad-wall.

I have no-tracking enabled in the FF preferences, and I'm on Ubuntu MATE
16.04 LTS.

