
GitWikXiv: Toward a successor to the academic paper - jessriedel
http://blog.jessriedel.com/2015/04/16/beyond-papers-gitwikxiv/
======
nartz
I don't think 'collaboration' is the problem; I think the main problem is the
presentation.

Academics want the credit for their work, which I think they should retain.
However, dissemination of work is lacking - there are tons of supersmart ideas
locked in academic papers; they act as a barrier for many.

Instead of a dense academic paper in LaTeX or whichever, what if the standard
were to provide the simplest explanation possible, and to require a graphic
that demonstrates the idea?

It's sort of a tragedy, though, when revolutionary papers present ideas simply
and are overlooked, often because of the misguided notion that 'simple to
understand' equates to a 'trivial insight'. In other words, if you read it and
it makes sense immediately, sometimes you think that it mustn't be
revolutionary. This doesn't always happen, but definitely to an extent.

All of the 'correctness' and 'proofs' are useful for a separate crowd, and
should also be included, but in a separate section because they are for a
different user, namely, for other academics who are well versed in the domain.
Also, a place for code/data/materials, as well as a checklist or script or
otherwise to strictly reproduce results.

~~~
robotresearcher
(edit - tldr: go buy the textbook.)

I don't understand your complaint. Papers are written by researchers to each
other. The 'correctness' and 'proofs' are the point of the paper. The standard
_is_ the simplest possible explanation that _includes the justification for
all claims made_. You are asking researchers to write another, different,
article for you. Maybe they would be better spending their time on more
research and leave the popular writing to others who might be better at it.

If you are asking for some explanation that lies in between the extremes of
original paper and a popular article, you are in luck: this is what textbooks
do.

The supersmart ideas usually spread out before the textbook gets written -
grad students bring them along as they go into industry, etc. Most - almost
all in fact - papers are pretty boring by themselves. The ordinary case is
that papers gradually build and refine ideas for a few years until we look
back and say 'wow, we made some improvements on a decade ago. Cool. Now,
pushing on....'

This idea:

"revolutionary papers present ideas simply and are overlooked, often because
of the misguided notion that 'simple to understand' equates to a 'trivial
insight'"

is popular in the imagination but I don't think it happens very often. More
common is when a great new idea is expressed wonderfully simply and people
skilled in the art read it and say 'holy shit that's neat'. For example,
Einstein in 1905 was a near-nobody who presented powerful ideas very simply,
and his papers were not passed over.

~~~
nartz
1\. Many papers are never put into textbooks (or if they are, it's much later).

2\. A ton of research papers are extremely dense, while the actual 'newness'
could be explained with a simpler definition and a really good diagram or
code.

3\. Without making too blanket of a statement, often, formal proofs are much
more useful after one understands the intuition.

In other words, I'd like to see a format that stresses expressing the 'new idea'
simply. Proving that the new idea is supported by mathematics is definitely
important (as you point out, it's the 'meat'), but I believe understanding it
follows intuition in many cases.

~~~
robotresearcher
Most papers don't deserve to be in textbooks, as they are of very minor
interest by themselves. Textbooks distill the fundamental bits.

I agree with your other points, and many authors do work hard to give an
intuitive description. Intuitive, that is, to a reader skilled in the art.

But don't think that just because you don't understand a paper that means it
is overly complicated or obtuse. It might be perfectly compact and
straightforward if you are familiar with the field and its conventions and
notation. And this is the most efficient way to communicate.

Of course, some researchers just suck at writing. A few might even be trying
to sound fancy and make things sound complicated or high-falutin'. Grad
students start out with this tendency, but we try to beat it out of them ASAP.
Some are never redeemed. But I like to think my group produces readable
papers.

------
jsweojtj
I've always hoped that the PaperBricks,
[http://arxiv.org/abs/1102.3523](http://arxiv.org/abs/1102.3523) model would
catch on.

My summary of the idea:

There is an awful lot of redundancy and wasted effort that goes into most
papers, starting with introductions that need to be rewritten every time (when
linking to a solid introduction would be both better and less time-consuming).
Each piece of a full paper (intro, data, analysis, ...) could be peer-reviewed
and published individually. A full paper could then be built from these
paper-bricks. Anyway, I recommend reading the paper, as it's well written
and clear.

There's also a YouTube video by the author explaining it:
[http://www.youtube.com/watch?v=4sorEcLjN04](http://www.youtube.com/watch?v=4sorEcLjN04).

~~~
akavel
TL;DR for PaperBricks:

 _" Formalize the structure of papers, such that each paper is composed of one
or many (clearly marked) of the following "sections" ("bricks"):_

    
    
        symbol     description             my description
        "I"     Introduction             ("domain intro")
        "PS"    Problem Statement        ("specific problem")
        "HLSI"  High-Level Solution Idea ("solution vision")
        "D"     Details                  ("solution implementation")
        "PE"    Performance Evaluation   ("benchmarks")
    

_and each "brick" must reference one or many bricks of the same or "earlier"
level, forming a global graph."_

\-------- >8 ---------- >8 ----------

Some of the advantages:

* no need to rephrase the same "intro" in every domain paper, just reference an existing "I" brick;

* a benchmark (PE) can just reference many "D"s;

* one can easily work "backwards" -- e.g. start with a benchmark (PE) of existing implementations _and already publish it_, then propose a new implementation (D);

* if someone publishes a similar paper before you, with similar "vision/idea" (HLSI), this doesn't totally destroy your publication, as you can still publish the part with an alternative implementation (D);

* _" I+PS+HLSI or I+HLSI: This is what some communities call a "vision paper" [...]"_

* & many more listed in the linked arxiv paper [http://arxiv.org/abs/1102.3523](http://arxiv.org/abs/1102.3523). Very nice, short and readable one, this.
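
The brick levels and the "same or earlier level" reference rule summarized above can be sketched as a small graph check. This is only an illustration in Python, not anything taken from the PaperBricks paper: the `Brick` class, the `invalid_refs` function, and the brick ids are hypothetical, and it assumes the level order is the order of the table (I, PS, HLSI, D, PE).

```python
from dataclasses import dataclass, field

# Assumed ordering of brick levels, taken from the table above.
LEVELS = ["I", "PS", "HLSI", "D", "PE"]


@dataclass
class Brick:
    brick_id: str                              # hypothetical identifier
    kind: str                                  # one of LEVELS
    refs: list = field(default_factory=list)   # ids of referenced bricks


def invalid_refs(bricks):
    """Return (brick_id, ref_id) pairs that point at an unknown brick
    or at a brick of a *later* level than the referencing brick."""
    by_id = {b.brick_id: b for b in bricks}
    rank = {kind: i for i, kind in enumerate(LEVELS)}
    bad = []
    for b in bricks:
        for ref in b.refs:
            target = by_id.get(ref)
            if target is None or rank[target.kind] > rank[b.kind]:
                bad.append((b.brick_id, ref))
    return bad


bricks = [
    Brick("intro-1", "I"),
    Brick("ps-1", "PS", ["intro-1"]),
    Brick("d-1", "D", ["ps-1"]),
    Brick("pe-1", "PE", ["d-1"]),       # a benchmark referencing a "D"
    Brick("bad-1", "PS", ["pe-1"]),     # PS may not reference a later PE
]
print(invalid_refs(bricks))  # [('bad-1', 'pe-1')]
```

The sketch only checks the ordering of references; it does not enforce that every brick references at least one other brick.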

~~~
jsweojtj
Thank you for this more detailed comment. It does a nice job capturing the
essentials of the paper.

------
panax
It would be very useful if we had access to not only the papers but the actual
data behind what is presented in the paper. Like have everybody store their
research in the cloud and allow everyone else access to it. Right now much
data sits on hard drives in labs at universities and is never seen by anyone
but the researchers who write the papers. Eventually the data gets deleted and
all that data is lost and all you are left with is the paper. That is a lot of
money/effort/information lost. See:
[http://dx.doi.org/10.1016/j.cub.2013.11.014](http://dx.doi.org/10.1016/j.cub.2013.11.014)

‘We are nonchalantly throwing all of our data into what could become an
information black hole.’ -Google's Vint Cerf
[http://www.theguardian.com/technology/2015/feb/13/google-
bos...](http://www.theguardian.com/technology/2015/feb/13/google-boss-warns-
forgotten-century-email-photos-vint-cerf)

Open access to the data would allow others to corroborate it, determine the
correctness of the analysis in the paper, perform meta-analysis, and aggregate
it with other similar data. There are bound to be privacy concerns, though,
especially with medical research.

Science itself has still not undergone the digital revolution and it
desperately needs it.

------
hyperion2010
There are lots of people working in this space. See for example
[https://www.force11.org/](https://www.force11.org/). Some of the issues that
the article does not touch on involve the difference between annotating an
article and editing an article. The idea of forking an article doesn't work
very well because what you want is annotation not editing. Changing someone's
words isn't really very useful unless you are editing and collaborating.
Collaboration isn't really an issue; Google Docs basically solves it
(or git+tex or whatever). Wiki provides a reasonable way to deal with this as
well but doesn't really solve the annotation problem. In a practical sense we
are still looking for something that can work like
[http://www.xanadu.com/](http://www.xanadu.com/) (amazingly there is now a
demo). Nothing is going to replace the paper, but we can make papers better and
make it easier to annotate and talk about them. One major issue is that to date the
citation graph for all papers and works is still trapped behind paywalls.
There is no magical solution for this, it requires us to build lots of
infrastructure and work to bring the publishers along.

One thing the article does not mention is the need for better ways of
documenting provenance for data.

~~~
jessriedel
Thanks for the link to force11. Any other suggestions?

I specifically don't like the idea of annotations rather than editing, because
annotations _expand_ and _diffuse_ the literature rather than distilling it to
excellent concise articles. In particular, annotations only address two of my
six motivations (#4 and #6).

Within the hard sciences, what do you think annotation accomplishes that can't
be accomplished through good editing?

> Nothing is going to replace the paper,

Ahh, a defeatist. You may say I'm a dreamer...

But seriously, I think this is a much more realistic goal than Wikipedia and
ArXiv looked like when they were launched.

~~~
coliveira
The problem is that wikipedia editors are not creating new knowledge, they are
just organizing it. A sort of wikipedia for research would be very valuable as
a separate project, but papers are necessary because they enable a technical
conversation. One or more authors present their particular view of a topic,
and other researchers join the discussion by writing their own papers. On the
other hand, a shared page is an amorphous piece of information that is very
difficult to use as a discussion medium. In this sense, wiki articles are much
more primitive than traditional research papers.

~~~
jessriedel
You make a good and important distinction, and I'll back off my flippant
suggestion that annotations are useless. However, _for fixing the problems
listed in my post_ , annotations are not a good solution for the reason I said
above: they don't condense the literature. (The importance of this relies on
the non-obvious empirical fact that the dispersed literature, even when nicely
linked, presents huge barriers to learning anything outside your sub-sub-
specialty; not sure how much this is disputed.)

Based on your comment, I'm now convinced that a wiki-only model is unsuitable.

------
abecedarius
"Hypertext Publishing and the Evolution of Knowledge" had ideas in this
general area back before the web existed. An excerpt: "One result of all this
activity is what amounts to a review article, developed incrementally,
thoroughly critiqued, and regularly updated. It takes the form of a hierarchy
of topics bottoming out in a hierarchy of result-summaries; disagreements
appear as argumentation structures [17]. When new results are accepted, their
authors propose modifications to the summary-document; they become visible (to
a typical reader) to the extent that they become accepted."

[http://e-drexler.com/d/06/00/Hypertext/HPEK3.html#anchor3165...](http://e-drexler.com/d/06/00/Hypertext/HPEK3.html#anchor316503)
(to link to a relatively concrete scenario; the paper as a whole is
interesting but some of it had to be just to address the very desirability of
something like the web.)

Much more recent and also good: _Reinventing Discovery_ by Michael Nielsen.

------
sjtrny
About arXiv licensing: most people choose the minimal licensing because they
are submitting the paper to a conference or journal. If you put the paper in
the public domain or under a CC license then you risk the journal or
conference rejecting your paper.

If journals didn't care about having the exclusive publication rights then I
suspect a lot more academics would select more flexible licenses.

------
pickle27
Great post.

I've also spent a lot of time thinking about this problem and would like to
eventually put some work towards it. A couple of additional ideas that I've
had:

* A paper could rely on a critical reference to build upon and the referenced paper could be disproven down the line but this is not immediately obvious from the paper that used it.

* Currently it doesn't seem like any merit is given to researchers who are very good at reviewing papers. Compare this to software, where a good code review is celebrated. Editing and cleaning up the state of science should be valued when scientists are looking for work, so I think that something along the lines of a Github CV for scientists would be valuable.
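
The first point above, a disproven reference silently undermining the papers built on it, can be pictured as a reachability question on the citation graph. The following is a minimal sketch under stated assumptions, not a description of any existing service: the function name `affected_papers`, the `relies_on` mapping, and the paper ids are all hypothetical, and it assumes someone has already recorded which references each paper critically relies on.

```python
from collections import deque


def affected_papers(relies_on, disproven):
    """Return every paper that directly or transitively relies on `disproven`.

    relies_on maps a paper id to the list of paper ids it critically relies on.
    """
    # Invert the edges: for each paper, who relies on it?
    dependents = {}
    for paper, refs in relies_on.items():
        for ref in refs:
            dependents.setdefault(ref, set()).add(paper)

    # Breadth-first walk from the disproven paper through its dependents.
    seen = set()
    queue = deque([disproven])
    while queue:
        current = queue.popleft()
        for dep in dependents.get(current, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)

    seen.discard(disproven)  # only relevant if the graph contains a cycle
    return seen


relies_on = {
    "B": ["A"],   # B builds on A
    "C": ["B"],   # C builds on B, so indirectly on A
    "D": ["X"],   # D is unrelated to A
}
print(sorted(affected_papers(relies_on, "A")))  # ['B', 'C']
```

A platform could run a check like this whenever a paper is flagged and attach a warning to each affected paper, which is the part that is not visible from the citing paper today.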

~~~
ISL
Good journals honor great referees by listing them on an annual accounting of
Outstanding Referees:

[http://journals.aps.org/OutstandingReferees](http://journals.aps.org/OutstandingReferees)

------
Jace_Harker
Very cool article. I like that it lays out the key features that would be
needed on any academic paper platform.

This actually sounds similar to Authorea
([https://www.authorea.com](https://www.authorea.com)). It's an online
collaborative academic word processor and publishing platform. Uses LaTeX, but
more powerful and efficient.

I added a comment on the post with some more details. (Disclaimer: I work at
Authorea, so I'm biased.)

~~~
jessriedel
Like I say in my reply, I think Authorea is a nice example of a useful tool to
learn from like StackExchange, but even if successful wouldn't answer the
central question I brought up.

[http://blog.jessriedel.com/2015/04/16/beyond-papers-
gitwikxi...](http://blog.jessriedel.com/2015/04/16/beyond-papers-
gitwikxiv/#comment-1980409937)

------
jedbrown
Nice proposal for living documents, but how do you suggest measuring
importance? Journals are a low bar -- it's too easy to publish crap and yet
many of the most important papers were rejected by the prestigious journals
they were first submitted to. In my opinion, decoupling dissemination (what
arXiv provides) from endorsement is desirable. Endorsements should be
revocable and non-exclusive.

~~~
coliveira
I think you're mistaken about the quality of journals. They work most of the
time, although the anomalies you mention do occur. If it were as easy to
publish crap in journals as you suggest, these same journals would not be so
prestigious and used as an evaluation tool by university departments
worldwide.

------
xvilka
I'd suggest integrating more semantics into this system, e.g. something like
Google's Knowledge Graph or SemanticMediaWiki. This can help with automated
processing years later. Also, adding more interactivity with IPython/WebPPL
notebooks embedded directly in the article and/or comments would be a huge step
up for the community. Or even integration with systems like Coq, Agda, Why/Alt-Ergo.

~~~
jessriedel
I agree. However, I think that the _medium_ of the work is mostly (although
obviously not completely) separable from whether papers are collaborative and
continuously evolving.

------
bkcooper
I'm with you on the lack of review articles, and I think they are probably the
aspect of conventional publishing that this could most likely replace.

I think you'd need a very radical overhaul of how science works to replace the
scorekeeping aspect. I don't think a Github CV is a good analogy to how this
could work, because academic science is interested in hiring leaders (i.e.
people who can get funded), not contributors. I think realistically you would
need to change how science rewards/emphasizes certain activities first, and
then publishing would follow. That would be a good outcome for science, but I
think it'll be awfully hard to get out of this equilibrium.

edit: I'm curious what you think of the PubPeer model. That's obviously
different from what you're envisioning, but thematically I think there are
some similarities.

~~~
jessriedel
> I think you'd need a very radical overhaul of how science works to replace
> the scorekeeping aspect. I don't think a Github CV is a good analogy to how
> this could work, because academic science is interested in hiring leaders
> (i.e. people who can get funded), not contributors. I think realistically
> you would need to change how science rewards/emphasizes certain activities
> first, and then publishing would follow.

I have a lot of criticisms of academic incentives, and I agree there is
something of a chicken-or-the-egg problem, but at least in my field there are
plenty of people who don't command big grants but have large citation counts. The
problem comes more because people are hired based on metrics that are
unusually bad at tracking what we want.

> I'm curious what you think of the PubPeer model.

It could be useful. Only seems to differ strongly from blogs in its
centrality, but (surprisingly) I'm not sure this is actually a big issue. My
immediate concern with PubPeer is that it _expands_ and _diffuses_ the
literature rather than distilling it to excellent concise articles.

------
amelius
I would already consider it an improvement if we could just comment publicly
on papers. I'm surprised Google Scholar isn't offering something like this
already.

~~~
jessriedel
Another commenter suggested PubPeer.

[https://pubpeer.com/](https://pubpeer.com/)

I think this might take off in physics if the ArXiv interfaced with, or
copied, PubPeer. Doesn't address most of my concerns, though.

~~~
jspaulding
We're also working on this as part of ThinkLab
[http://thinklab.com/publications](http://thinklab.com/publications)

------
whistlerbrk
I've been working on the same thing for quite some time, and I hope to launch
this year now that I have time again. Love to see this theme continuing to
appear on HN.

------
jessriedel
(I expanded the title to give a bit more context for HN.)

~~~
ikeboy
Is this the idea hinted at in
[http://hpmor.com/notes/119](http://hpmor.com/notes/119)?

>First, I’ve designed an attempted successor to Wikipedia and/or Tumblr and/or
peer review, and a friend of mine is working full-time on implementing it.

~~~
jessriedel
Nope. I'm certainly not working full time on this. (And I've only briefly met
Eliezer once.) Would be very interested to know what that friend is up to,
though...

------
christudor
I think about this a lot. How would you deal with peer review in all this? We
don't want a system where academic papers haven't been peer reviewed, do we?

