
I developed and maintain Arxiv Sanity Preserver (http://www.arxiv-sanity.com/), one of the arXiv overlays the article mentions. I built it to try to address some of the pains that the "raw" arXiv introduces, such as being flooded by paper submissions without any support or tools for sifting through them.

I'm torn on how arXiv should proceed in becoming more complex. I support what seems to be the cited poll consensus ("The message was more or less ‘stay focused on the basic dissemination task, and don’t get distracted by getting overextended or going commercial’") and I think the simplicity/rawness of arXiv was partly what made it succeed, but there is also a clear value proposition offered by more advanced search/filter/recommendation tools like Arxiv Sanity Preserver. It's not clear to me to what extent arXiv should strive to develop these kinds of features internally.

Whether they go a simple or more complex route, I really hope that they keep their API open and allow 3rd party developers such as myself to explore new ways of making the arXiv repository useful to researchers. Disappointingly, the poll they ran did not mention their API at all, which in my opinion is critical, overlooked and undervalued. For example, today their rate limits are very aggressive and make it tricky to pull down publication metadata for Arxiv Sanity Preserver, even though this undoubtedly costs minimal bandwidth. In the future, I'm concerned they will discontinue the API altogether and prevent similar 3rd party overlays from being built.




I can see many potential problems arising if arXiv itself implements recommender systems or in some other way tries to be a gigantic journal, the primary social media for scientists, or something like that. Tools that are strictly about making the repository easier to access, like better word search, are fine.

I'd prefer that arXiv stay a repository, and that the filtering, discussion and rating be left to other parties, for example overlay journals such as Discrete Analysis (discreteanalysisjournal.com), which ranked highly on HN when it launched. Many separate journals can compete. A future in which arXiv is as neutral a ground as possible, disseminating papers freely on the internet (in a standardized format, and maybe even with the API you mentioned) and allowing a beneficial ecosystem of journals and other communities to develop, sounds much healthier than one in which arXiv tries to become the science community that encompasses everything from hosting papers to discussing them and judging their quality.

Assuming they adopt an objective like that, I can easily imagine arXiv becoming a FB-of-science of sorts, and it might be very convenient ("I can just log in to arXiv and everything will be there!") and maybe people would love it. (Everybody except lunatics and luddites and generally people who like to be inconvenient out of sheer malice to their fellow human beings is on FB, right? At least in my social circles. And that gives FB overwhelming ability to dictate terms for using FB, given that the ordinary user isn't their customer.) It could hand too much power to just one institution to define how the scientific community communicates.

Suppose they introduce comments. Sooner or later that would go the same way as any other internet community that has commenting and isn't 4chan, and they'd need actual comment moderation, maybe even to ban people. If they are the major science-on-the-internet platform, that could be a bit too much. I understand that currently they only do some basic sanity checks that submitted papers are not utter garbage?


Hi Andrej, first of all let me thank you for creating my preferred interface to arXiv.

Do you select papers to be displayed on arXiv-sanity by hand or automatically? Does manual selection explain why there are sometimes 2-3 days with no publications, and then suddenly a batch of papers?


I don't select anything by hand; it's listed by date, as it comes from the arXiv API. The 2-day lags are due to weekends, when arXiv does not update.



