Show HN: A HN/Reddit-style site for scientific pre-prints and publications (upvote.pub)
275 points by danielecook on Feb 1, 2018 | 107 comments

As the title suggests, the site is centered around discussions regarding scientific pre-prints and publications. Current sites supported are:

* arXiv
* bioRxiv
* PubMed
* PubMed Central
* DOI

Threads are submitted using the publication ID, DOI, or URL of a pre-print or publication. PDFs are fetched when available and used to generate a thumbnail.
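
A minimal sketch of how a submission string could be sorted into one of those types before fetching anything. This is not taken from the site's code: the function name `classify_submission` and the regular expressions are assumptions for illustration, and they only cover new-style arXiv IDs, DOIs starting with `10.`, `PMC`-prefixed IDs, and short all-digit PubMed IDs.

```python
import re
from urllib.parse import urlparse

# Hypothetical patterns for illustration; not upvote.pub's actual rules.
ARXIV_NEW = re.compile(r"^(?:arxiv:)?(\d{4}\.\d{4,5})(v\d+)?$", re.IGNORECASE)
DOI = re.compile(r"^(?:doi:)?(10\.\d{4,9}/\S+)$", re.IGNORECASE)
PMCID = re.compile(r"^PMC\d+$", re.IGNORECASE)
PMID = re.compile(r"^\d{1,8}$")  # assumption: PubMed IDs of up to 8 digits


def classify_submission(text):
    """Return a (kind, normalized_value) pair for a submitted string."""
    s = text.strip()

    # Anything with an http(s) scheme and a host is treated as a URL.
    parsed = urlparse(s)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return ("url", s)

    m = ARXIV_NEW.match(s)
    if m:
        return ("arxiv", m.group(1))  # version suffix is dropped

    m = DOI.match(s)
    if m:
        return ("doi", m.group(1))

    if PMCID.match(s):
        return ("pmc", s.upper())

    if PMID.match(s):
        return ("pubmed", s)

    return ("unknown", s)
```

For example, `classify_submission("arXiv:1706.03762v5")` yields `("arxiv", "1706.03762")`. The order of the checks matters: a bare number is only treated as a PubMed ID after the more specific patterns have failed.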

Currently, the site only supports academic email addresses. I am trying to figure out the best way to open the site to all without having to manage spam. Let me know if you have any thoughts/ideas. If you are interested in accessing the site and do not have an academic email address please PM me with an email address to whitelist.

I completely understand the issue of not wanting spam of various sorts broadly defined, but I do think restricting it to academic addresses is a huge mistake. As an established researcher on the verge of leaving an academic institution, but who is still very much interested in research, I'm becoming acutely aware of how cliquey academia is, and how problematic it is for the field and society as a whole.

The assumption that anyone who might have anything important to say would be at an academic institution is extremely dangerous (just as is the problem of access to journals outside of academic institutions, etc.).

Seconded. I'm not an academic but I deal with academic publishing and I might want to participate.

My suggestion: use ORCiD.org for authentication with OAuth. This is a researcher profile. Most academics have one (or should!). They're free, the service is open. I think it strikes the right balance between barrier-to-entry for spam and accessibility.

I've implemented sign-in using ORCID, and although I support the idea, its UX is far from great, its documentation isn't either, and it also isn't terribly stable (it was down on my launch day...).

How does ORCiD.org itself deal with spam? Seems like they just use a captcha, no?

More about ORCiD for the skeptical: https://en.wikipedia.org/wiki/ORCID

If anyone has any questions I can ask someone from ORCiD to come over here and answer them. They're very open!

I somewhat disagree.

Academic journal papers are meant to be digested by experts in that topic, not the general population. It seems necessary to have a filter on registration to ensure that the platform is useful to these experts. You could imagine the restriction could be relaxed to research institution emails -or- a referral.

Even with a restrictive registration, the platform could still be useful to a non-expert who is interested in the topic, who could read the exchanges between experts.

>Academic journal papers are meant to be digested by experts in that topic, not the general population.

TL;DR: Studies like these need scrutiny from everyone. Not just people that declare themselves experts.

I disagree. The latest E-Cig study says they found evidence of tumors in mice who were exposed to cigarette smoke... What they don't tell you in the abstract is that they used 10mg/ml nicotine E-juice and exposed the mice for 3h/day, for 12 weeks. The highest ratio most people vape at is 9mg/ml. No one smokes them for 3 hours straight. The study failed to include how exactly they produced the vapor: no mention of wicking material or coil metallurgy, temperature, or voltage of the coil. It could very well be that the tumors were caused by combustion of either the coil being heated above its melting point, the wicking material burning, or the juice being combusted and not vaporized. Even though I'm not considered an expert in tumors or cancer, I have learned that this test is flawed, is missing information, and doesn't really show anything that we as people don't already know: "Heating things beyond the combustion point produces byproducts that could be harmful to our health." I can rest assured instead of listening to headlines from people who didn't bother to read or scrutinize the study. I think studies like these need scrutiny from everyone. Not just people that declare themselves experts.

Let's also be fair... just because I have an academic email address doesn't mean I'm an expert. I could be a janitor at JHU for all intents and purposes.

It would be helpful to link the study. Without seeing the paper, I would be cautious of separating the paper and conclusions it draws from a possible news article that has extrapolated or expanded the conclusion actually made in the paper.

> I do think restricting it to academic addresses is a huge mistake

I have to agree. I haven't been an academic for almost 10 years now, yet I've spent the last three years working with academics in my free time, and it's really frustrating how hard it is to work without a “proper” affiliation. I've had to create my own “lab” (it's just a name, and it's only me) to have my name on posters, but I'm still unable to get access to some services because, well… because I'm not paid for the work I do, I guess.

The academic world seems collectively addicted to credentialism and status. A friend of mine recently went from being a grad student in academic geology to private-sector geology. She's experiencing some pretty severe culture shock.

She's getting paid a reasonable amount. She gets paid overtime. Her opinions are given weight. Her physical needs are taken seriously. She gets credit for her work. She is, in general, treated as an actual human being capable of having expertise and bringing value.

The total result is that she's spent months going "WHY DID NOBODY TELL ME THE PRIVATE SECTOR WAS AWESOME?!"

Perhaps they can add support for alternate emails - sign up with your primary academic email, add a secondary. When a user leaves an institution, they can promote the secondary.

Can you please allow .mil and .gov addresses? Those "government scientists" add up to a lot of researchers.

Totally agree. I did government research for a while and published papers. Don't exclude them - anyone at the intersection of government work and research is likely to have meaningful contributions to this website.

Great job! I had thought of building something like this before. Some ideas I'd toyed with were:

- A tool for annotation so that people could comment on specific portions of papers. This could help new researchers in a field to more quickly understand the different broader contexts in which a result can be interpreted, or the technical limitations of a given method and how that should be considered when say using the results to motivate other work.

- An option for some form of anonymised commentary, so that people could voice their views on a particular work without fear of reprisal. I think this would be amazing for non-tenured academics to exert some form of power in the community, if a platform like this were to take off. Of course, it would make for potentially very messy and difficult to moderate discussions.

The first thing you mentioned, annotations, or maybe even inline Q&A/discussions, was my dream in grad school (math). I imagined something like a separate MathOverflow for each paper.

The basic problem is that I was constantly finding myself struggling with a tricky part of some paper from the 70's ("Why is this sentence true??"), and I knew many students before me had struggled with the same section, but there was no way for one really great expositor to clarify it, get upvoted, and be done.

I'm sorry, but what is an academic email address?

I guess, any email address provably bestowed by a university (like .edu emails in the US).

Speaking of which; https://github.com/leereilly/swot

Swot is a community-driven or crowdsourced library for verifying that domain names and email addresses are tied to a legitimate university or college - more specifically, an academic institution providing higher education in tertiary, quaternary or any other kind of post-secondary education in any country in the world.

I actually used the swot database for the site. If anyone is interested this gist will download the swot repo and generate a dictionary of the email --> school/university in python:
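
The gist itself is not reproduced here. As a rough stand-in, here is a minimal sketch that assumes a local checkout of the swot repo and assumes its domain files are laid out as `<tld>/<name>.txt` (nested for multi-part suffixes), with the institution name on the first line of each file. The function names `build_domain_map` and `school_for_email` are hypothetical and not from the site's code.

```python
from pathlib import Path


def build_domain_map(domains_root):
    """Map 'mit.edu' -> institution name from a swot-style directory tree.

    Assumed layout (hypothetical, check against the repo you clone):
    <domains_root>/edu/mit.txt, <domains_root>/uk/ac/ox.txt, ...
    """
    root = Path(domains_root)
    mapping = {}
    for txt in root.rglob("*.txt"):
        # 'uk/ac/ox.txt' -> ('uk', 'ac', 'ox') -> 'ox.ac.uk'
        parts = txt.relative_to(root).with_suffix("").parts
        domain = ".".join(reversed(parts)).lower()
        lines = txt.read_text(encoding="utf-8").splitlines()
        mapping[domain] = lines[0].strip() if lines else ""
    return mapping


def school_for_email(email, mapping):
    """Return the institution for an email address, or None if unknown."""
    domain = email.rsplit("@", 1)[-1].lower()
    labels = domain.split(".")
    # Try the full domain first, then each parent domain, so that
    # 'cs.mit.edu' still resolves to 'mit.edu'. Bare TLDs are skipped.
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in mapping:
            return mapping[candidate]
    return None
```

With a map built this way, `school_for_email("jorge@mit.edu", mapping)` would return whatever name the `edu/mit.txt` file contains, and an address at a domain missing from the tree returns `None`.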


Usually when you study/work at a university, you get an email address with the university domain (e.g. jorge@mit.edu).

Those are used in many places as a simple way to validate that you are a student or a researcher.

Oops, I meant to ask how they were validating academic addresses. I'm actually a postdoc, but my university domain is @upv.es, and I've found that sometimes it doesn't get recognized on American websites which expect a .edu for claiming academic discounts or things like that.

@bringtheaction's comment mentioning SWOT is an interesting solution. Although the name of my institution is outdated! I'll send a PR.

The important part here is "usually". It should be stressed that using lists (such as SWOT mentioned in another comment here) can add to the bias towards US and other well-known universities.

I have been involved in academia for 10 years now, and I never held an email address that would match against the SWOT list.

How about doing a scraping of e.g. 10k Google Scholar profiles with more than ~10 publications to get the domains of their verified email addresses? Then you have a much broader whitelist than just *.edu.

Or you can crowdsource this, say I want to register as john.doe@instituteoptique.fr; in the registration form simply require me to give a link to someone's Google Scholar profile with an email address from the same domain.

It should also be noted that e.g. on Figshare, anyone can upload any PDF and get a DOI for it.

Only 39% of the submissions to the latest ICLR (machine learning conference) were from .edu emails [1]. There's a lot of research going on outside of academia, and there's also a large portion of academia that doesn't use the .edu TLD. That said, I don't have a solution to your problem of fighting spam.

[1]: http://webia.lip6.fr/~pajot/dataviz.html

Do you plan to add PsyArXiv? (https://psyarxiv.com/)

> Let me know if you have any thoughts/ideas

Many papers list authors' email addresses.

Scrape PDFs on the services you support for email addresses. If a domain name occurs in enough author email addresses and isn't on a "known generic" list (gmail et al), consider it "safe".
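
A minimal sketch of that counting step, assuming the text of each paper has already been extracted from its PDF by some other tool. The names `safe_domains` and `GENERIC_DOMAINS`, the threshold of 3, and the short generic list are all placeholders for illustration. One deliberate choice here: a domain is counted once per paper rather than once per address, so a single paper with many co-authors cannot whitelist a domain on its own.

```python
import re
from collections import Counter

# Simple email matcher; captures the domain part. Not a full RFC 5322 parser.
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)"
)

# Hypothetical "known generic" list; a real one would be much longer.
GENERIC_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com", "outlook.com"}


def safe_domains(paper_texts, min_papers=3, generic=GENERIC_DOMAINS):
    """Return domains seen in at least `min_papers` papers, minus generic ones."""
    counts = Counter()
    for text in paper_texts:
        # A set, so each domain counts at most once per paper.
        domains = {m.group(1).lower() for m in EMAIL_RE.finditer(text)}
        counts.update(domains)
    return {d for d, n in counts.items() if n >= min_papers and d not in generic}
```

The threshold trades coverage for safety: a low value admits small institutes sooner, a high value makes it harder to plant a domain by uploading a few PDFs.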

This looks really cool. I've emailed admin@upvote.pub about tracking your use of DOIs.

Did you advertise it in any academic circles, then?

I have sent it out to people I know and will be posting it elsewhere as well.

Have you actually had problems with spam? Or is this a purely hypothetical problem?

swot seems to be restricted to places of higher education, leaving out research institutes, such as the MPG association in Germany or the national labs in the US.

Really like the site! Not an academic, but I was wondering: why the thumbnail? They all look pretty much the same, so why would I want to see a thumbnail of a lot of text?

I agree about the thumbnail -- I'd be a fan of compact icons that show what field(s) of study the linked paper covers, something like the tags https://www.reddit.com/r/askscience/ has.

I like the idea of integrating the sub/field similar to what r/askscience has.

Thank you - There are a few reasons.

(1) the thumbnails indicate that the PDF is available and can be downloaded. This is to promote submissions from pre-print servers and open access journals that make their research readily available. If the PDF is behind a paywall there will not be a thumbnail.

(2) the behavior of the site differs slightly from HN and Reddit. Clicking the title link of a publication takes you to the comment page, whereas the thumbnail takes you to the PDF. Open to hearing what people think about this, but the thumbnail image replaces the title link.

(3) the thumbnails do tend to look similar - but you can often tell what journal it is in and you can recognize familiar papers from it. I like it when authors put figures up front too, which some of the ML papers do and which adds some variety.

I think the sort of people who are looking up scientific articles would almost entirely be the same sort of people who appreciate information density and hate useless whitespace.

Your first and second points could be satisfied by showing a small PDF icon where the thumbnails currently are, at whatever size allows the listings themselves to be as close together as possible. The PDF icon could also be a different colour depending on whether it has a paywall, or have a money symbol over it.

The really big thumbnails that mostly all look the same make the site look very amateurish on first sight.

Thanks for the feedback. The thumbnails could definitely be made smaller and/or substituted. I'll be making some sort of change but not sure exactly what yet.

Compact mode, which Reddit also offers, is an easy way to satisfy both camps. Set the default to whatever your users prefer (determine this with A/B testing) and allow the other choice in Settings.

I don't know if I agree with the thumbnail at all, but if you want to keep it, try to make it fit the content better, i.e. make it the same height as the right column and put the votes and the vote button closer to the content as well.

Here's an example where, on the top row, I changed the size of the thumbnail to 6em height (with width in proportion to the page size) and also aligned the votes and upvote button closer to the content. The 2nd and following entries are left with your style.


(1) I think you should just color paywalled PDFs differently

(2) I don't think this is good enough justification for the huge amount of space wasted by the thumbnail

(3) Recognizing familiar papers isn't all that useful. Making a specific thumbnail for each sub would be better IMO.

I agree with this speaker, I would rather see an excerpt from the abstract than a thumbnail. It's much better to distinguish paywalled content in another way; as a user I didn't get that association at all. Rather, add some paywall icon (maybe a warning sign or a cross) in a colour associated with warning or danger, i.e. red or yellow (think traffic lights, paywall is a stop).

1) and 3) would be better solved, I think, by appropriate text tags. And tags are also great when integrated with the site search, or even the native search/find in the browser. On some subs of reddit for example, paywalled articles are announced by a simple flair.

Perhaps there could be an option to give a page number for the thumbnail when making a submission? The poster could thus optionally choose their "money plot" from the paper to draw attention.

Maybe you could extract pages that have figures and show one of those? Or a larger preview on hover?

Unrelated to the main discussion, I’ve been wondering how to approach this UI in a document processing product I’m working on. What is the best way to “preview” a document with a thumbnail?

This is a really good question! Maybe they should ditch them and give greater visual importance to subs?

Cool, but personally when I hear something described as being HN/reddit style, I think of the listings being pretty compact (submittable, votable listings are done in many other places).

On my phone, I see about 14 listings per screen on HN, 9ish on Reddit (mobile website, I don't use the app). This? Only 4. So while I would personally prefer something more dense, good job nonetheless!

I wonder, why not list them as just text titles? Do people really get value from seeing a preview of the actual paper?

I'm working on making them more compact. Thanks.

I’m a researcher and I also think reddit and upvotes/karma is trash.

Why should I use your service?

Why do you use this service? My dream for the site would be that the most value would come from the comments section where people could ask questions, discuss ideas, expand on the background of papers, or debate whether the results justify the conclusions. I'm also hopeful that much like HN, the site wouldn't track the newest and shiniest thing but also would highlight older work that is worth revisiting.

You probably shouldn't use the service, simple as that.

This is how ML research works these days, right?

You cherry-pick results and polish them to look super flashy so everyone hears about your work over Twitter, Facebook, and r/MachineLearning cause that's how everyone learns about new papers these days.

Do you think HN is trash or that upvotes are a bad system for this site? Seems like the simplest way to have a community self-filter content

Reddit/HN are popularity contests. My guess is GP would prefer something more like Rotten Tomatoes, curated by "experts".

That is an interesting idea. I would worry that it would have the "cliquey" problem that another commenter mentioned.

I do like the idea of having a public forum to comment on work, regardless of where it is published.

If you add comments, then you get to the problem of maintaining comment quality which will likely lead to some form of moderation. IMO that leads back to the "cliquey" issue just on another level.

There’s Faculty of 1000 for this, though my impression is that it has gotten less active and opinionated lately.

I do want to ask why you are building this? If just for fun, then it's a nice project.

Building the site won't be the hard part, though I'm sure you'll get plenty of feedback here.

Attracting a community of scientists that fosters real discussion will be. Any ideas on what will set you apart and properly entice them?

I built this for fun and I can say that it was fun to build! It's an experiment too. But prior to building it, it was something I could see myself using.

To entice users, I have thought about adding features that users find directly useful to their work (e.g. the ability to take notes on papers within the site and export them), or the ability to export saved articles as bibtex... but I'm still thinking this through. Open to ideas.
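
As a rough illustration of the BibTeX export idea, here is a minimal sketch that formats one saved article as an `@article` entry. The function name `bibtex_entry` and its parameters are hypothetical, and the sketch assumes the metadata is already clean: it does no escaping of special LaTeX characters and does not pick the entry type from the source.

```python
def bibtex_entry(key, title, authors, year, doi=None):
    """Format a single saved article as a BibTeX @article entry.

    `authors` is a list of names; BibTeX separates authors with ' and '.
    No escaping of special characters is attempted in this sketch.
    """
    fields = [
        ("title", title),
        ("author", " and ".join(authors)),
        ("year", str(year)),
    ]
    if doi:
        fields.append(("doi", doi))
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields)
    return f"@article{{{key},\n{body}\n}}"
```

Exporting a whole list of saved articles would then be a matter of joining the entries with blank lines and serving the result as a `.bib` file.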

What if you allowed users to curate lists of articles (maybe with options for RSS or notifications)? I think we would see lists like "Seminal papers in X area". Maybe anything that helps people find stuff that is a) good b) what they are interested in would get them engaged.

Can you open source this on GitHub? Or share it directly?

A site that fosters real, thoughtful discussion requires moderation - if you compare the amount of clickbait on Facebook vs the general helpfulness of articles on Hacker News, one is moderated, one is designed for maximum interaction.

Some ideas for other subs: Computer Science/Maths/Physics should be the most common I think

Also https://eprint.iacr.org/ is used a lot for publishing crypto preprints

A sub or filter for each of the arXiv subcategories would make sense and would be easy to implement for papers submitted from the arXiv. As someone in physics, I would most likely want to contribute comments in my subfield and just view top papers (or most discussed) from a combined physics super category.

I like the idea, I think eventually peer review should be done on a site like this. It might be helpful to think about why this site would not currently be suitable for peer review, and work towards making it suitable in the future.

Peer review as it currently works is not perfect, but it is the status quo, so needs to be taken into account. Obviously many things can be considered to be changed, for example I would enforce that anyone who comments is displayed with his/her real name.

Why do you think the real name is important? I think that there are so many people in academia whose position is precarious and subject to the whims of people in powerful positions - many of whom have ego issues. It would be great to have an anonymous forum that the community used to discuss the current state of their field where people could be more open about their (potentially controversial) opinions. This could work out better for science.

A statement about scientific research should stand on its own merit, regardless of who it came from.

Who it comes from is part of that merit - there's a big difference between some untrained person shouting "correlation isn't causation" when it was never implied in the first place (ie every thread on reddit that references any sort of scientific article) and someone who has a deep background in the specific niche that the paper addresses.

Peer review is to be done by one's peers, not some random anonymous commenter.

I think for an open discussion it is important to know who is part of the discussion. As an author you are putting a paper out there, and you do your best to make sure the paper is original, relevant and correct. As a commenter, you should also try to do your best, and the best way to assure that in my opinion is to attach prestige not only to papers, but also to comments. This includes negative prestige for making bad comments. That's why I would say a real name policy is important.

See also SciRate, which is mildly popular in the quantum info community.


Thank you - I have never seen this.

I highly recommend messaging the developers. I was interested in their project (from an academic perspective, not a developer one) and got some good feedback from them. In particular, SciRate has sort of stagnated and you'd probably want to try something they haven't to increase the chance that you get a foothold.

I have thought about a site like this before (as I'm sure many have)! One thing I thought about is how do you keep noise out of the comments while maintaining engagement with the community?

Let's say this takes off and is the answer to the publisher fees and everyone posts their papers here and peer reviews them -- I assume that's the dream end goal.

As the user base grows and engagement with posts grows, comment sections can become overwhelmed by well-meaning, ill-informed people or even bots with an agenda.

Have you thought about registering people in the space and having a separate section for them to discuss? In your mind, is the site primarily geared towards researchers publishing and their peers, regular people looking for more access to papers and the process, or a mix of both?

Very nice start and cool to see someone actually moving on this problem!

I hope the target audience is not researchers. I think it doesn't make sense for researchers because they are usually very focused on a topic and don't care about (or don't have time for) papers on other topics.

I see this as a tool to increase the hype about Machine Learning. Only people starting in the topic will use it.

It would make sense to filter the whole list with keywords or topics or anything that filters the papers and shows only the ones related to your research. The subs still have too many papers.

For non-mobile, increase the size of all of the text. You have a lot of room to work with, there's no reason for text to be so tiny. That includes the upvote arrow & text, and all of the article text. It's ok for titles to wrap. Also add better spacing between each text segment that belongs to each article (title, authors, date submitted, comments).

Mobile is obviously the extreme majority target now, however it should take less than 20-30 minutes to do a good job adjusting the styling to make it a much better experience on non-mobile.

Small issue: when I click an arrow, it properly gives me a basic alert() that I need to be logged in to do that, but it still changes the arrow to blue. I suspect it's an MVP, and you may already be aware of that.

You might want to investigate having a mode where you can see some of the figures from each paper. This works very well for astronomical arxiv papers: http://arxiver.moonhats.com/

I'm getting server error 502. But I'm looking forward to testing this. I can certainly see this as being useful. In my research area some discussion is taking place on twitter but it's too scattered and too superficial to be useful.

Working on this; it should be back in a little bit!

If there are any subs you would like to have added please let me know. Currently, I know the site is restricted to biology and a small stats section but I can easily add chemistry, physics, math, etc.

I'd be interested in having subs for "cognitive science" and "linguistics". Perhaps allow users to create new subs? If this is taking off, you will never be able to keep up with requests for new subs. Like reddit, your platform should rely on self-organization to the extent possible.

Yes, please, physics and astronomy. Maybe the same subdivisions as arXiv?

Promising site!

Thanks - I'd love to add these but I was unsure whether they would be appropriate. Going off of the biology section on arXiv - I think there are better ways to subdivide a field. Are there any improvements/consolidation that would be appropriate for those subdivisions on arXiv for physics and astronomy?

Seconded. I'd like to see if the astro-ph sections get any traction here.

Ocean sciences would be good (oceanography, marine biology, remote sensing), but I'm not sure to what level of detail you're willing to split subjects.

This is awesome! One suggestion I should make: I think it would be nice to have the abstract underneath the title of the paper so I can skim to see what the doc is about before opening it.

Also I was wondering how you went about grabbing contents from bioRxiv? Are you using their RSS feed? I built a web scraper myself which will grab the PDFs and relevant info from (https://www.biorxiv.org/content/early/recent) and store it on my computer (to run some ML algorithms) and it was kinda a pain to do.

I'd guess that it uses the API: https://share.osf.io/

It seems really skewed towards biology publications, is it on purpose?

That’s my background (:

I’d love to add subs for other fields but I wasn’t sure which subfields would be best.

Great work! I’d love to see a sub added for graphics, image processing, and computer vision!

So one question: who is your target audience? Is it the layman/citizen researcher who might want to read academic publications? Or is it more meant to apply peer review to preprints?

One thing that I strongly dislike is the name, upvote.pub. Science is not a popularity contest where the candidate with the most upvotes wins. To me the name suggests that this site is trying too hard to force science into the categories of social media and that it may not be a place for serious discussion. I write this not because I want to be negative but because I know that many of my colleagues will immediately be put off by the name. ResearchGate is also trying to conceptualize academic exchange in terms of the categories of social media and most of my colleagues dislike it. The platform certainly looks like it could be useful but, yes, it's sending mixed signals about the audience and purpose.

Edit: Downvotes? Please explain how I violated the etiquette?

I have the same exact feelings. A standard non-expiring, ordered by recentness, threaded, forum would just work better.

> Science is not a popularity contest where the candidate with the most upvotes wins.

Correct, it's the most citations.

It's easy to be cynical about this, but even citation count is a more useful signal than upvotes.

Thanks for the feedback.

The site is very much an experiment and can definitely be improved. It's not designed to be a popularity contest. I am hoping old and new research is submitted and that the commenting function of the site can counterbalance articles that are overhyped.

As I said, I think the page itself could be useful. It's mainly a marketing issue. Although, a point could be made that such a page should not have voting at all. Some simple one-click user feedback can be useful but perhaps don't call it "vote" which implies that the community is trying to reach a verdict on someone's research. Researchers are very sensitive about their work and may not like it.

The target audience is anyone interested in discussing academic preprints/publications. Currently, it is restricted to only academic emails but I am hoping to open it up to anyone. Clearly, that will likely draw in scientists but I'd like anyone that wants to participate to be able to.

Really, really, really cool. Site looks slick. I would definitely use this if there were more users (the old chicken and egg problem ...).

My suggestion is not to restrict it based on email but to have a very very strong voting policy (i.e. HN). If you say something dumb on HN you get downvoted into oblivion, and that is OK.

How did you implement the academic email filter? Where did you get the domain list? I have just registered my account with a non-.edu email address.

What CMS is used for upvote.pub? Is this CMS open source?

I forked this repo on github: https://github.com/codelucas/flask_reddit

Updated it to Python 3 and added a number of new features.

Nice. Could you tell which fork is yours on GitHub [0]?

[0] https://github.com/codelucas/flask_reddit/network/members

Hug of death?

...And we're back! Hopefully it should stay up now.

> upvote.pub collects information based on what pages you browse and what items you download on this site.

Please disable this sh*t function [0]! No need to collect such data on our unsafe web...

[0] https://upvote.pub/h/privacy-policy
