How do you overcome that barrier? It's not just network effects, it's extreme network effects, because the tail is so long in science. I guess the real barrier is between being a venue for high-level discussion of only the latest big-name science news in certain fields, and being a central hub for all kinds of relatively short comments on scientific papers. It seems like a difficult problem to solve.
Have you thought about maybe pre-populating your database with all papers, ready for discussion, rather than having it as a Reddit-like discussion of only recently submitted things? Or is there some other kind of grand vision for where this will go?
> When I read papers, I often write short notes to myself about
> them, as I imagine most people do.
* re-draw diagrams in different styles
* make tables for data that wasn't presented as a table
* categorize which claims are associated with which citations
* make lists, lots of lists! You can never have enough lists. Gunkel said so: http://ideonomy.mit.edu/gunkel.html
* change the way information is displayed
* demonstrate equations, or at least attempt to work through them
* make up summaries of content
* paste direct excerpts... which deteriorate in usefulness over time.
So far, every "annotation solution" I have tried just impedes my work, so I always resort to using some HTTP file server to host my files, or write HTML when I want to add markup to my notes.
I do have around 450k papers from JSTOR (released legally via archive.org). I have these papers indexed and can expose them via openjournal.
Also, as I mentioned in my previous post, a few guys from Berkeley (Tony Chen et al) are working on Peer Library, which will be a more complete non-profit academic search engine.
Good points about the 'barrier'. I'm hoping people will contribute their own papers (even if they are not published).
My ultimate vision would be for people to write their papers within a git repo and then upload/submit their .tex source (along with unit tests). I'm in the process of building some of these features.
Also, I'm experimenting with some OCR and PDF analysis to scrape as much useful contextual information as I can from the contributed papers (for the great benefit of our users).
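As one small piece of that kind of PDF scraping, DOIs can usually be pulled straight out of extracted text with a regex. A minimal sketch (assumes the PDF text has already been extracted by some other tool; the regex is a simplified version of commonly used DOI patterns, not OpenJournal's actual code):

```python
import re

# Simplified DOI pattern: "10." prefix, registrant code, then a suffix.
DOI_RE = re.compile(r'\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+')

def extract_dois(text):
    """Return unique DOIs found in extracted PDF text, in order of appearance."""
    seen, out = set(), []
    for match in DOI_RE.finditer(text):
        doi = match.group(0).rstrip('.,;')  # drop trailing punctuation
        if doi not in seen:
            seen.add(doi)
            out.append(doi)
    return out
```

Real-world DOIs are messier than any single regex (line breaks inside the suffix, weird unicode), so this is only a first-pass heuristic.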
Thanks for taking the time to respond, streptomycin!
PubMed would be another obvious datasource, if you were going to go that route.
Do you use a library management tool like Zotero or Mendeley for notetaking, or writing on printouts, or something else?
http://www.researchblogging.org/ is a great website that aggregates more long-form posts about papers, but it has a really shitty UI and it doesn't do anything for short comments on random papers by random non-bloggers.
One challenge I've found is that it's often difficult for discussion sites to gain sufficient traction to build a critical mass of discussion - http://plasmyd.com, http://papercritic.com, http://scicombinator.com, http://chemfeeds.com.
I wonder if focusing on supporting existing small-group interactions (real-life journal clubs) would help?
I took a slightly different approach when I wrote http://www.papernautapp.com and chose instead to aggregate existing discussions about academic papers (mostly blogs, a few news sites, HN, and r/science, with a goal to cover more sites and mailing lists). It's also freely licensed, and there are some interesting things I discovered that might be useful to OpenJournal (looking at your TODO list and GH issues):
* CrossRef.org runs a ton of cool lookup/crossref/deref services at http://labs.crossref.org/
* They also have some great libraries at https://github.com/crossref/ - who wouldn't geek out at this: http://labs.crossref.org/pdfextract/
* If you want to do some auto-identification on webpages, the https://github.com/zotero/translators project is great and actively maintained by the Zotero community.
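To give a concrete sense of the CrossRef services: their REST API (api.crossref.org/works/{doi}) returns JSON whose "message" object carries the paper's metadata. A minimal sketch of pulling title and authors out of such a response (the sample payload below is hand-trimmed for illustration, not a live API call):

```python
import json

def parse_crossref_work(payload):
    """Extract title and author names from a CrossRef /works/{doi} response."""
    msg = payload["message"]
    title = msg.get("title", [""])[0]          # CrossRef returns title as a list
    authors = [f'{a.get("given", "")} {a.get("family", "")}'.strip()
               for a in msg.get("author", [])]
    return title, authors

# Hand-trimmed example of the JSON shape api.crossref.org returns:
sample = json.loads('''{"message": {"title": ["An Example Paper"],
  "author": [{"given": "Ada", "family": "Lovelace"}]}}''')
```

Fields like "author" are optional in real responses, hence the defensive `.get()` calls.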
(Some notes on how Papernaut is put together, if you're interested: http://jayunit.net/2013/01/06/papernaut-exploring-online-dis... )
Lastly, if you're fostering discussion and feedback on papers, there's overlapping interest with the http://altmetrics.org and http://altmetric.com folks.
I think an improvement to the services you linked would be to add a few new articles each week from highly selective journals/conferences in each field. Existing measures like a journal's impact factor or a conference's attendance would be a good start, and tracking blog posts (as you're already doing) could be a good supplement.
This might help pull older or less visible publications out of obscurity; if something published in a domain-specific journal is germane to a discussion, a commenter might point this out while discussing a more highly visible article.
Do you have any interest in being kept informed? Also, do you know anyone who might be interested in helping curate or contribute?
One thing that always bothers me with a purely Reddit-style, point-based system for surfacing academic discussions across domains, though, is that it's unclear what kind of papers are being surfaced: a very good paper in a very niche space may not get the attention that a mediocre paper written for a mass audience (for some definition of "mass") would. Is that an acceptable drawback for openjournal? Or should there be some way for niche papers to gain exposure? Forking openjournal and making your own "sub-openjournal" for your research domain? Weighted voting mechanisms?
Also, like reddit, it might be useful to have a mechanism to demonstrate, emphasize, and/or sort by specific commenters' backgrounds, training, and credentials. For many domains, peer review and commentary from people in the same field might be more useful than general commentary.
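One way such a mechanism might look: weight in-field votes more heavily than general votes when scoring a paper. A hypothetical sketch (the vote representation and the weight of 3.0 are assumptions for illustration, not anything openjournal actually does):

```python
def weighted_score(votes, field_weight=3.0):
    """
    Score a paper from a list of (value, in_field) votes, where value is
    +1 or -1 and in_field marks a voter credentialed in the paper's field.
    In-field votes count field_weight times as much as general votes.
    """
    return sum(v * (field_weight if in_field else 1.0) for v, in_field in votes)
```

With a scheme like this, a niche paper with a handful of enthusiastic in-field upvotes can outrank a mass-audience paper with more, but shallower, general votes.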
As a minor wish, I've always wanted to see a mechanism for encouraging sharing of implementations, test code, and other raw experimental results along with the actual papers. 'Cause really, for most cases, I'm not going to implement a multi-page algorithm just to verify a conclusion or make use of an insight. But if I can fork and compile a github repo associated with the paper...
I'd love a great solution to this problem and I'd even consider trying to build one, but I am not sure there is any money in it.
One thing I'd love in an academic paper reader is something that allowed comments/annotations inline with the paper. For example, if a future paper contradicts something that is stated, you could add a comment linking to the contradiction. Or you could ask for and provide clarifications, or comment on simpler alternatives to a given part of the paper.
Also, I'd like to be able to rate papers for say, readability or difficulty, tag them as theoretical or empirical, etc.
I'd also like if cited papers were automatically dereferenced so I didn't have to hunt down the references myself.
Personalization would also be a nice feature. E.g., recommend other papers by the same author, other highly cited papers that cite or are cited by a given paper, or frequent co-authors of an author I like.
I'd love to be able to download a bunch of papers easily for offline viewing.
It seems like the rest of your issues could be solved by Mendeley- WDYT?
You guys should get in touch with the peer library guys, send me an email if you'd like an intro. I'd love to see more collaboration in the space.
Internet Archive (archive.org) is also interested in contributing to the space and has been super helpful in aiding our efforts at OpenJournal.
I think the three biggest problems in the space are (1) discovery + accessibility (including open-access), (2) collaboration (sharing, commenting, contributing), and (3) quality assurance (maintainability, scm-backed, repeatable research).
There are many solutions targeting discovery and accessibility, but (as an academic) I'm personally dissatisfied with the level of sharing/collaboration/openness, the lack of community, and the lack of standards in academic research. I think the world needs for academia and research what GitHub did for social programming.
I've loved the GitHub-for-science analogy ever since it first surfaced a couple of years ago; I couldn't agree more.
There is a subreddit for academic papers, but I figured it would make sense to have something ultra-targeted at computer science papers, and I didn't feel like the right community was using the subreddit (the quality of results wasn't great). It did teach me that a lot of people like requesting papers, so that's something I'm considering.
Also, this was a good opportunity for me to attack a problem I am really passionate about while testing out a web framework I've been writing (waltz).
Finally, I did the project for sentimental reasons. I was talking to Aaron Swartz about OpenJournal a while back over Skype and was looking forward to working with him on it, so I thought it would be nice to finish it in his honor :o)
You have some great ideas (mass download is something I've seen requested by some friends from my PhD program). There's a team called Peer Library that's attacking many of these problems, and I'll be giving them as much support as I can.
Thanks for your kind remarks and great feedback!
To this end, it would be great if this were written in such a way that it implicitly considers papers from all the "standard" academic sources as part of the system, ideally with duplicate removal.
That is, automatically add articles from arXiv and major currently existing journals and conferences, try to automatically detect duplicate papers (perhaps add a concept of versions of papers).
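A simple starting point for that duplicate detection is fuzzy matching on normalized titles, which catches most arXiv-preprint-vs-published-version pairs. A heuristic sketch (the 0.9 threshold is an assumption; a real system would also compare authors, venues, and DOIs):

```python
import re
from difflib import SequenceMatcher

def normalize_title(title):
    """Lowercase, strip punctuation, and collapse whitespace."""
    cleaned = re.sub(r'[^a-z0-9 ]', ' ', title.lower())
    return re.sub(r'\s+', ' ', cleaned).strip()

def likely_duplicates(title_a, title_b, threshold=0.9):
    """Flag two records as likely versions of the same paper when their
    normalized titles are nearly identical."""
    ratio = SequenceMatcher(None, normalize_title(title_a),
                            normalize_title(title_b)).ratio()
    return ratio >= threshold
```

Records flagged this way could then be merged into one entry with multiple "versions", as suggested above.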
In addition, such a site could really benefit by having "virtual journals", where users collect topical collections of must-read papers.
The only thing that would keep such a site from growing is the relative reservation of less technical crowds (at least that has been my observation). HN, proggit, SO: they are all useful and fun for the technically minded. Similar sites for other segments (excluding cats, cats always win) are much less active and sometimes fail to attract a critical mass.
Please feel welcome to submit issues on github and I will try to deal with them in realtime: https://github.com/mekarpeles/openjournal/issues