GitWikXiv: Toward a successor to the academic paper (jessriedel.com)
112 points by jessriedel on April 21, 2015 | 49 comments

I don't think 'collaboration' is the problem; I think the main problem is the presentation.

Academics want the credit for their work, which I think they should retain. However, dissemination of work is lacking - there are tons of supersmart ideas locked in academic papers; they act as a barrier for many.

Instead of a dense academic paper in LaTeX or whichever, what if the standard were to provide the simplest explanation possible, along with a graphic that demonstrates the idea?

It's sort of a tragedy, though, when revolutionary papers present ideas simply and are overlooked, often because of the misguided notion that 'simple to understand' equates to a 'trivial insight'. In other words, if you read it and it makes sense immediately, sometimes you think that it mustn't be revolutionary. This doesn't always happen, but it definitely does to an extent.

All of the 'correctness' and 'proofs' are useful for a separate crowd, and should also be included, but in a separate section because they are for a different user, namely, for other academics who are well versed in the domain. Also, a place for code/data/materials, as well as a checklist or script or otherwise to strictly reproduce results.

(edit - tldr: go buy the textbook.)

I don't understand your complaint. Papers are written by researchers to each other. The 'correctness' and 'proofs' are the point of the paper. The standard is the simplest possible explanation that includes the justification for all claims made. You are asking researchers to write another, different, article for you. Maybe they would be better spending their time on more research and leave the popular writing to others who might be better at it.

If you are asking for some explanation that lies in between the extremes of original paper and a popular article, you are in luck: this is what textbooks do.

The supersmart ideas usually spread out before the textbook gets written - grad students bring them along as they go into industry, etc. Most - almost all in fact - papers are pretty boring by themselves. The ordinary case is that papers gradually build and refine ideas for a few years until we look back and say 'wow, we made some improvements on a decade ago. Cool. Now, pushing on....'

This idea:

"revolutionary papers present ideas simply and are overlooked, often because of the misguided notion that 'simple to understand' equates to a 'trivial insight'"

is popular in the imagination but I don't think it happens very often. More common is when a great new idea is expressed wonderfully simply and people skilled in the art read it and say 'holy shit that's neat'. For example, Einstein in 1905 was a near-nobody who presented powerful ideas very simply, and his papers were not passed over.

1. Many papers are never put into textbooks (or if they are, it's much later)

2. A ton of research papers are extremely dense, while the actual 'newness' could be explained with a simpler definition and a really good diagram or code.

3. Without making too blanket of a statement, often, formal proofs are much more useful after one understands the intuition.

In other words, I'd like to see a format that stresses expressing the 'new idea' simply. Proving that the new idea is supported by mathematics is definitely important (as you point out, it's the 'meat'), but I believe understanding it follows intuition in many cases.

Most papers don't deserve to be in textbooks, as they are of very minor interest by themselves. Textbooks distill the fundamental bits.

I agree with your other points, and many authors do work hard to give an intuitive description. Intuitive, that is, to a reader skilled in the art.

But don't think that just because you don't understand a paper that means it is overly complicated or obtuse. It might be perfectly compact and straightforward if you are familiar with the field and its conventions and notation. And this is the most efficient way to communicate.

Of course, some researchers just suck at writing. A few might even be trying to sound fancy and make things sound complicated or high-falutin'. Grad students start out with this tendency, but we try to beat it out of them ASAP. Some are never redeemed. But I like to think my group produces readable papers.

> Proving that the new idea is supported by mathematics is definitely important

No, it is all that matters. Having a neat and intuitive understanding of a new idea that is wrong has at best no value and at worst is actively harmful.

> All of the 'correctness' and 'proofs' are useful for a separate crowd, and should also be included, but in a separate section because they are for a different user, namely, for other academics who are well versed in the domain.

Isn't the audience of an academic paper other academics? Placing proofs and details in a separate section is convenient for the layman but bothersome for the intended audience. Often, it is precisely these details that are important. In mathematics for example, using new methods to give simpler proofs of old theorems is very useful; here, other researchers would care about how particular details are resolved.

One of the things that motivate people to work on academic research is the possibility of doing what you suggest should not be done. Academic researchers love reading and writing precise papers, the same way that classical musicians love to read and write complex music. Do you really think that they should measure their work by other people's standards? If you don't like what academics do, just write your own layman versions intended for public consumption. Also, in many cases that is not even possible: most papers are just making a small improvement in a very complex issue, so it is very difficult to provide a concise version that is intelligible to non-professionals.

I think that might be in danger of becoming like infographics: things that people share and coo over without really understanding what they mean and what the implications are.

I could draw you a picture of my latest papers if you wanted, and it may look pretty, but I think you'd be cheating yourself if you thought you were getting anything out of it.

I think you made a valid point, but it is not an argument against doing things another way.

You're talking about a different issue. Your suggestions don't solve the problems I used to motivate my post.

Abstract, Introduction, Conclusion (5-10 minutes max) is usually the first readthrough for an academic paper (I typically read them in 3 readthroughs).

If I can't get the key value of the paper from that first readthrough, the paper is usually not very good. Writing a good paper is actually hard, and the typical formats exist for a reason. If you review, say, 20 papers/day during your initial research phase, it's more valuable to have clear structure and an abstract than to have "easy to understand" language with lofty examples.

So basically...good papers do provide the simplest explanation possible. In fact it's something you very actively try to do when writing a paper. Or in other words: I think you just want more papers to be good (there's a lot of unreadable crap that seems pseudosmart but ask most academics and they'll tell you they strive for easy to understand).

To be fair, most papers include abstracts that try to give that "simplest explanation possible". Though, as others have pointed out, these papers are meant for communication between researchers and not general consumption.

I do think there's a problem with how research gets shared more generally though. I wish there were more scientifically literate writers at that boundary, because unfortunately, there aren't enough hours in the day for researchers themselves to fill that role.

Try lolmythesis.com/

I've always hoped that the PaperBricks model (http://arxiv.org/abs/1102.3523) would catch on.

My summary of the idea:

There is an awful lot of redundancy and wasted effort that goes into most papers, starting with introductions that need to be rewritten every time (when linking to a solid introduction would be both better and less time-consuming). Each piece of a full paper (intro, data, analysis, ...) could be peer-reviewed and published individually. A full paper could then be built from these paper-bricks. Anyway, I recommend reading the paper, as it's well written and clear.

There's also a YouTube video by the author explaining it: http://www.youtube.com/watch?v=4sorEcLjN04.

TL;DR for PaperBricks:

"Formalize the structure of papers, such that each paper is composed of one or many (clearly marked) of the following "sections" ("bricks"):

    symbol  description              my description
    "I"     Introduction             ("domain intro")
    "PS"    Problem Statement        ("specific problem")
    "HLSI"  High-Level Solution Idea ("solution vision")
    "D"     Details                  ("solution implementation")
    "PE"    Performance Evaluation   ("benchmarks")
and each "brick" must reference one or many bricks of the same or "earlier" level, forming a global graph."
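
To make the "global graph" rule concrete, here is a minimal sketch in Python of how the reference constraint could be checked. Everything in it (the `Brick` class, the `invalid_refs` helper, the brick IDs) is hypothetical and made up for illustration; none of it comes from the PaperBricks paper. It assumes the five levels are ordered as in the table above, and it only checks the "same or earlier level" rule, not whether a brick has at least one reference.

```python
from dataclasses import dataclass, field

# Brick levels in the order of the table above; a lower index means "earlier".
# Treating the table order as the level order is an assumption of this sketch.
LEVELS = ["I", "PS", "HLSI", "D", "PE"]


@dataclass
class Brick:
    brick_id: str                              # hypothetical identifier
    level: str                                 # one of LEVELS
    refs: list = field(default_factory=list)   # IDs of referenced bricks


def invalid_refs(brick, registry):
    """Return the IDs in brick.refs that are unknown or point to a later level."""
    rank = LEVELS.index(brick.level)
    bad = []
    for ref_id in brick.refs:
        target = registry.get(ref_id)
        if target is None or LEVELS.index(target.level) > rank:
            bad.append(ref_id)
    return bad


# A tiny registry: a benchmark (PE) built on a D, which builds on a PS and an I.
bricks = [
    Brick("intro-1", "I"),
    Brick("ps-1", "PS", ["intro-1"]),
    Brick("d-1", "D", ["ps-1"]),
    Brick("pe-1", "PE", ["d-1"]),
]
registry = {b.brick_id: b for b in bricks}
```

With this registry, `invalid_refs(registry["pe-1"], registry)` returns an empty list, while a problem statement that tried to cite an implementation, e.g. `Brick("ps-2", "PS", ["d-1"])`, would be flagged.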

-------- >8 ---------- >8 ----------

Some of the advantages:

* no need to rephrase the same "intro" in every domain paper, just reference an existing "I" brick;

* a benchmark (PE) can just reference many "D"s;

* one can easily work "backwards" -- e.g. start with a benchmark (PE) of existing implementations and already publish it, then propose a new implementation (D);

* if someone publishes a similar paper before you, with similar "vision/idea" (HLSI), this doesn't totally destroy your publication, as you can still publish the part with an alternative implementation (D);

* "I+PS+HLSI or I+HLSI: This is what some communities call a "vision paper" [...]"

* & many more listed in the linked arxiv paper http://arxiv.org/abs/1102.3523. Very nice, short and readable one, this.

Thank you for this more detailed comment. It does a nice job capturing the essentials of the paper.

I think a positive side effect of the PaperBricks idea would be full standardization of notation. It's tiring to look up the meaning of variables, as they differ with each author.

Thanks for this pointer. I added it to the bottom of the post.

It would be very useful if we had access to not only the papers but the actual data behind what is presented in the paper. For instance, everybody could store their research in the cloud and allow everyone else access to it. Right now much data sits on hard drives in labs at universities and is never seen by anyone but the researchers who write the papers. Eventually the data gets deleted, and all you are left with is the paper. That is a lot of money/effort/information lost. See: http://dx.doi.org/10.1016/j.cub.2013.11.014

‘We are nonchalantly throwing all of our data into what could become an information black hole.’ -Google's Vint Cerf http://www.theguardian.com/technology/2015/feb/13/google-bos...

Open access to the data would allow others to corroborate it, determine the correctness of the analysis in the paper, perform meta analysis, aggregate it with other similar data. Though there are bound to be privacy concerns, especially with medical research.

Science itself has still not undergone the digital revolution and it desperately needs it.

There are lots of people working in this space. See for example https://www.force11.org/. Some of the issues that the article does not touch on involve the difference between annotating an article and editing an article. The idea of forking an article doesn't work very well because what you want is annotation, not editing. Changing someone's words isn't really very useful unless you are editing and collaborating. Collaboration isn't really an issue; Google Docs basically solves it (or git+tex or whatever). A wiki provides a reasonable way to deal with this as well but doesn't really solve the annotation problem. In a practical sense we are still looking for something that can work like http://www.xanadu.com/ (amazingly there is now a demo). Nothing is going to replace the paper, but we can make papers better and make it easier to annotate and talk about them. One major issue is that to date the citation graph for all papers and works is still trapped behind paywalls. There is no magical solution for this; it requires us to build lots of infrastructure and work to bring the publishers along.

One thing the article does not mention is the need for better ways of documenting provenance for data.

Thanks for the link to force11. Any other suggestions?

I specifically don't like the idea of annotations rather than editing, because annotations expand and diffuse the literature rather than distilling it to excellent concise articles. In particular, annotations only address two of my six motivations (#4 and #6).

Within the hard sciences, what do you think annotation accomplishes that can't be accomplished through good editing?

> Nothing is going to replace the paper,

Ahh, a defeatist. You may say I'm a dreamer...

But seriously, I think this is a much more realistic goal than Wikipedia and ArXiv looked like when they were launched.

The problem is that Wikipedia editors are not creating new knowledge, they are just organizing it. A sort of Wikipedia for research would be very valuable as a separate project, but papers are necessary because they enable a technical conversation. One or more authors present their particular view of a topic, and other researchers join the discussion by writing their own papers. On the other hand, a shared page is an amorphous piece of information that is very difficult to use as a discussion medium. In this sense, wiki articles are much more primitive than traditional research papers.

You make a good and important distinction, and I'll back off my flippant suggestion that annotations are useless. However, for fixing the problems listed in my post, annotations are not a good solution for the reason I said above: they don't condense the literature. (The importance of this relies on the non-obvious empirical fact that the dispersed literature, even when nicely linked, presents huge barriers to learning anything outside your sub-sub-specialty; not sure how much this is disputed.)

Based on your comment, I'm now convinced that a wiki-only model is unsuitable.

> One or more authors present their particular view of a topic, and other researchers join the discussion by writing their own papers.

I am interested in solutions in this space, but for a completely different practical effect: democratic discourse.

See: https://news.ycombinator.com/item?id=9245772

Annotation is one way to make citations explicit within the literature. So if there is a follow-up paper that says "we couldn't reproduce this part of the experiment", it shows up in the original source. Similarly, if there is an update to a method in the paper, then the original method can be annotated. While this second case does look like editing, you don't want to get rid of the original, because often you need to know what the original method was to interpret the results of that paper. Similarly, community commentary on a work is not well supported by a purely editing model. Consider the Talmud, for example: you have centuries of commentary and annotations on the Torah, as a rather absurd end of the spectrum where editing the original work would be outright forbidden. I actually think that most of the difference comes down to what we name and what gets its own URI, but editing seems to imply that the changes become embedded in the original text, whereas annotation is something that sits beside or on top of the text and is its own discrete document: a link rather than a commit, so to speak.

"Hypertext Publishing and the Evolution of Knowledge" had ideas in this general area back before the web existed. An excerpt: "One result of all this activity is what amounts to a review article, developed incrementally, thoroughly critiqued, and regularly updated. It takes the form of a hierarchy of topics bottoming out in a hierarchy of result-summaries; disagreements appear as argumentation structures [17]. When new results are accepted, their authors propose modifications to the summary-document; they become visible (to a typical reader) to the extent that they become accepted."

http://e-drexler.com/d/06/00/Hypertext/HPEK3.html#anchor3165... (to link to a relatively concrete scenario; the paper as a whole is interesting but some of it had to be just to address the very desirability of something like the web.)

Much more recent and also good: Reinventing Discovery by Michael Nielsen.

About arXiv licensing: most people choose the minimal licensing because they are submitting the paper to a conference or journal. If you put the paper in the public domain or under a CC license, then you risk the journal or conference rejecting your paper.

If journals didn't care about having the exclusive publication rights then I suspect a lot more academics would select more flexible licenses.

great post.

I've also spent a lot of time thinking about this problem and would like to eventually put some work towards it. A couple of additional ideas that I've had:

* A paper could rely on a critical reference to build upon and the referenced paper could be disproven down the line but this is not immediately obvious from the paper that used it.

* Currently it doesn't seem like any merit is given to researchers who are very good at reviewing papers. Compare this to software where a good code review is celebrated. Editing and cleaning up the state of science should be valued when scientists are looking for work so I think that something along the line of a Github CV for scientists would be valuable.
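
The first point above (a paper building on a reference that is later disproven) amounts to a reachability question on the citation graph. Here is a rough sketch in Python, assuming the citation data is available as a plain mapping from each paper to the papers it builds on. The function name and paper IDs are hypothetical, and the sketch treats every citation as a critical dependency, which real citations often are not.

```python
from collections import deque


def affected_papers(cites, disproven):
    """Return every paper that directly or transitively builds on a disproven one.

    cites maps a paper ID to the list of paper IDs it builds on.
    """
    # Invert the graph: for each paper, who builds on it?
    cited_by = {}
    for paper, refs in cites.items():
        for ref in refs:
            cited_by.setdefault(ref, []).append(paper)

    # Breadth-first walk outward from the disproven papers.
    affected = set()
    queue = deque(disproven)
    while queue:
        current = queue.popleft()
        for dependent in cited_by.get(current, []):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)

    # Report only the downstream papers, not the disproven ones themselves.
    return affected - set(disproven)


# Hypothetical example: C builds on B, B builds on A, D is unrelated.
cites = {"B": ["A"], "C": ["B"], "D": ["X"]}
flagged = affected_papers(cites, ["A"])
```

Here `flagged` contains B and C but not D. A real system would need some way to mark which references are actually load-bearing, which is the hard part.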

Good journals honor great referees by listing them on an annual accounting of Outstanding Referees:


Thanks for this. I now quote those points at the bottom of the post.

Very cool article. I like that it lays out the key features that would be needed on any academic paper platform.

This actually sounds similar to Authorea (https://www.authorea.com). It's an online collaborative academic word processor and publishing platform. Uses LaTeX, but more powerful and efficient.

I added a comment on the post with some more details. (Disclaimer: I work at Authorea, so I'm biased.)

Like I say in my reply, I think Authorea is a nice example of a useful tool to learn from like StackExchange, but even if successful wouldn't answer the central question I brought up.


Nice proposal for living documents, but how do you suggest measuring importance? Journals are a low bar: it's too easy to publish crap, and yet many of the most important papers were rejected by the prestigious journals they were first submitted to. In my opinion, decoupling dissemination (what arXiv provides) from endorsement is desirable. Endorsements should be revocable and non-exclusive.

I think you're mistaken about the quality of journals. They work most of the time, although the anomalies you mention do occur. If it were as easy to publish crap in journals as you suggest, these same journals would not be so prestigious, nor would they be used as an evaluation tool by university departments worldwide.

Agreed. I think collaborative documents will ease the transition away from journals as endorsements.

I'd suggest integrating more semantics into this system, e.g. like Google's Knowledge Graph or Semantic MediaWiki. This can help with automated processing years later. Also, adding more interactivity with IPython/WebPPL notebooks embedded directly in the article and/or comments would be a huge step up for the community. Or even integration with systems like Coq, Agda, Why/Alt-Ergo.

I agree. However, I think that the medium of the work is mostly (although obviously not completely) separable from whether papers are collaborative and continuously evolving.

I'm with you on the lack of review articles, and I think they are probably the aspect of conventional publishing that this could most likely replace.

I think you'd need a very radical overhaul of how science works to replace the scorekeeping aspect. I don't think a Github CV is a good analogy to how this could work, because academic science is interested in hiring leaders (i.e. people who can get funded), not contributors. I think realistically you would need to change how science rewards/emphasizes certain activities first, and then publishing would follow. That would be a good outcome for science, but I think it'll be awfully hard to get out of this equilibrium.

edit: I'm curious what you think of the PubPeer model. That's obviously different from what you're envisioning, but thematically I think there are some similarities.

> I think you'd need a very radical overhaul of how science works to replace the scorekeeping aspect. I don't think a Github CV is a good analogy to how this could work, because academic science is interested in hiring leaders (i.e. people who can get funded), not contributors. I think realistically you would need to change how science rewards/emphasizes certain activities first, and then publishing would follow.

I have a lot of criticisms of academic incentives, and I agree there is something of a chicken-or-the-egg problem, but at least in my field there are plenty of people who don't command big grants but have large citation counts. The problem comes more because people are hired based on metrics that are unusually bad at tracking what we want.

> I'm curious what you think of the PubPeer model.

It could be useful. Only seems to differ strongly from blogs in its centrality, but (surprisingly) I'm not sure this is actually a big issue. My immediate concern with PubPeer is that it expands and diffuses the literature rather than distilling it to excellent concise articles.

I would already consider it an improvement if we could just comment publicly on papers. I'm surprised Google Scholar isn't offering something like this already.

Another commenter suggested PubPeer.


I think this might take off in physics if the ArXiv interfaced with, or copied, PubPeer. Doesn't address most of my concerns, though.

We're also working on this as part of ThinkLab http://thinklab.com/publications

There's also PubMed Commons, for publications in journals indexed by PubMed:


Wouldn't be helpful for physics and math, though.

I've been working on the same thing for quite some time, and I hope to launch this year now that I have time again. Love to see this theme continuing to appear on HN.

(I expanded the title to give a bit more context for HN.)

I've thought a lot about this as well and think what you presented is good.

I'd like to add the one problem that has usually motivated me to think about this the way you have: academic works can contain ideas or data that become outdated or are found to be incorrect.

If you're not an expert in a niche, it's hard to sift this out when you come across it. It seems intellectually wasteful to have works that are, for example, 90% accurate and relevant, but have an idea that needs to be updated or tossed. One field where I believe this happens too often is economics.

Alone, I don't have the time to keep up and fact check everything I read, but collaborative editing could help a lot in this area.

Is this the idea hinted at in http://hpmor.com/notes/119?

>First, I’ve designed an attempted successor to Wikipedia and/or Tumblr and/or peer review, and a friend of mine is working full-time on implementing it.

Nope. I'm certainly not working full time on this. (And I've only briefly met Eliezer once.) Would be very interested to know what that friend is up to, though...

I think about this a lot. How would you deal with peer review in all this? We don't want a system where academic papers haven't been peer reviewed, do we?
