A problem: I think one of the most necessary things that are missing from arXiv.org is comments. People just come, read, and then take their discussions somewhere else, fragmented all around the net. Arxiv-Sanity already filters just the ML articles and does personalized feeds, maybe it could also be a place of discussion. I know it potentially leads to other complications (like moderation), but I really think readers would benefit from reviews, questions and answers.
The current ML related discussion sites (blogs, /r/machinelearning, G+, Twitter, StackExchange and YC) are often mixed with lots of noise. I'd like to read what researchers think.
Another suggestion: add links to code repositories, where they are available. Maybe some of your trusted users could be empowered with the right to add such links, if it's too much work for a single person. If interesting discussions are reported on other pages on the internet, they could also be added to the article, to make them easier to find.
As to discussions about papers there are plans (semi-related to arxiv-sanity) in motion to do that well and correctly, not just from me alone. I think we'll see a big delta here over the coming months.
It's much easier to tell when a paper is relevant for me if it happens to cite 3 of the commonly used datasets for my particular task.
btw I use arxiv-sanity, it's pretty great, thanks a lot!
Another feature I'd like to add is an ability to follow people, but I'm worried about the exact implementation since the current assumed contract is that your library is private.
One more feature I of course hear about often is comments, but I'm afraid of the site disintegrating into YouTube comments. I think comments have to be done very carefully and would require significantly higher code complexity to incorporate moderation tools, etc. Tricky and non-trivial not just implementation-wise but design-wise, incentive-wise, etc.
I feed in a .bib file with papers I like and use a Naive Bayes classifier to find papers I might like in news feeds (science, nature, PNAS, etc).
It works pretty well. As a bonus you can post high-ranked papers to Slack or use papers sent to me by other people to repopulate the bib file.
Always welcoming suggestions: https://github.com/pfdamasceno/shakespeare
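For anyone curious what the Naive Bayes idea above looks like in practice, here is a minimal sketch. It is not the code from the linked repo; the class name `NaiveBayesRanker`, the two labels "liked" and "other", and the smoothing choices are all assumptions made for illustration. Titles or abstracts from the .bib file would be the "liked" examples, and anything you skipped would be "other".

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a title/abstract and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesRanker:
    """Hypothetical two-class multinomial Naive Bayes with add-one smoothing."""

    LABELS = ("liked", "other")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = {label: 0 for label in self.LABELS}

    def train(self, text, label):
        """Add one labelled example (label must be 'liked' or 'other')."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def _log_prob(self, tokens, label):
        counts = self.word_counts[label]
        vocab_size = len(set(self.word_counts["liked"]) | set(self.word_counts["other"]))
        total = sum(counts.values())
        # Smoothed class prior, so an empty class never yields log(0).
        logp = math.log((self.doc_counts[label] + 1) / (sum(self.doc_counts.values()) + 2))
        for tok in tokens:
            # Add-one smoothing, with one extra slot reserved for unseen words.
            logp += math.log((counts[tok] + 1) / (total + vocab_size + 1))
        return logp

    def score(self, text):
        """Log-odds that a paper is 'liked'; higher means more relevant."""
        tokens = tokenize(text)
        return self._log_prob(tokens, "liked") - self._log_prob(tokens, "other")

    def rank(self, texts):
        """Return the texts sorted from most to least likely to be liked."""
        return sorted(texts, key=self.score, reverse=True)
```

Feeding `rank()` the titles from a news feed gives a reading order; anything scoring above zero leans towards "liked" under these assumptions.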
The publishing culture from the life sciences is toxic and will be avoided by the AI community.
(2) I check my field's arXiv every other day or so.
(3) Google Scholar alerts me of papers that it thinks will interest me, based on my own papers, and it's very useful. Most of what it shows me is in fact interesting for me, and it sometimes catches papers from obscure venues that I wouldn't see otherwise. The problem is that you need to have papers published for this to work, and also, it's only good for stuff close to your own work, not that much for expanding horizons - (1), (2) and Google Scholar search are better for that.
None of them are a substitute for a proper related work search when I'm writing up a paper though, this is just to keep current on what the trends and interests of the community are.
For example, I usually log in to the ACM site and go to my SIGs and see what's new there. I've never thought about visiting arXiv.
Just to give a concrete example, this paper (which was a relevant read for me) was published in TACL in July this year but was available in arXiv since February: https://arxiv.org/abs/1602.01595
The one place where one could actually use a "Follow" button for other people...there isn't one. Classic.
Ok, it doesn't need to be your paper. Just find a paper that was so influential that others working on the same problem probably will cite it, and monitor the new citations.
Particularly something that's generally open.
Best tool I've got ready access to is Google Scholar. There are citation indices I can get access to by going on-site to a specific facility, but that's pretty limiting when the rest of my work can be done in my office, which is also where the bulk of my materials are.
(And yes, I'm aware that having to go to where the indices are is how it Used to Be Done, and in fact, I Did That. Technology has moved on.)
So, if you want to see a reddit for research, better news feeds, etc., it is the SHARE dataset that can provide that data. SHARE won't build all those things--we want to facilitate others in doing so. You can contribute at
The tooling is all free open source, and we're just finishing up work on v2. You can see an example search page at http://osf.io/share, currently using v1. Some more info on the problem and our approach....
What is SHARE doing?
SHARE is harvesting, (legally) scraping, and accepting data to aggregate into a free, open dataset. This is metadata about activity across the research lifecycle: publications and citations, funding information, data, materials, etc. We are using both automatic and manual, crowd-sourced curation interfaces to clean and enhance what is usually highly variable and inconsistent data. This dataset will facilitate metascience (science of science) and innovation in technology that currently can't take place because the data does not exist. To help foster the use of this data, SHARE is creating example interfaces (e.g., search, curation, dashboards) to demonstrate how this data can be used.
Why is SHARE doing it?
The metadata that SHARE is interested in is typically locked behind paywalls, licensing fees, restrictive terms of service and licenses, or a lack of APIs. This is the metadata that powers sites like Google Scholar, Web of Science, and Scopus--literature search and discovery tools that are critical to the research process but that are incredibly closed (and often incredibly expensive to access). This means that innovation is exclusive to major publishers or groups like Google but is otherwise stifled for everyone else. We don't see theses, dissertations, or startups proposing novel algorithms or interfaces for search and discovery because the barrier of entry in acquiring the data is too high.
I've also read the front page, the about page, and your post several times, and I'm not exactly clear what you provide. I thought I'd do some searches to see if the product made sense. A search for a field I'm interested in, arthritis, yielded zero results. Okay, so... no medical research? A search for "reddit" yielded results, and mentions of "providers". I'm not clear what providers are... is reddit a provider, or the research papers, or the publishers, or the researchers...?
I'll read more later when I'm not on mobile, maybe it will be clearer.
I'm starting a project related to analysing published research, so this is a field I'm very interested in. I hope SHARE can help in some way, and I'll definitely be keeping tabs on your work. Thanks for posting.
What should I be reading? I'm a computer science student, I want to go into a "Software Engineering" line of work. Are there any places to read up on related topics? I have yet to find something that interests my direct field of choice. Is there anyone in academia writing about software?
I also like NLP and other interesting parts. Basically all practical software and their applications are things that interest me.
Also, they generally have industry or "in practice" tracks that have postmortems from the big software companies in case you want something more applied.
Instead, read more review papers and seminal papers in your field.
Finding out how to identify the relevant older work in your field, finding it, reading it, and seeing for yourself how it's aged, been correctly -- or quite often incorrectly -- presented and interpreted, and what stray gems are hidden within it can be highly interesting.
I've been focusing on economics as well as several other related fields. Classic story is that Pareto optimisation lay buried for most of three decades before being rediscovered in the 1920s (I think I've got dates and timespans roughly right). The irony of economics itself having an inefficient and lossy information propagation system, and a notoriously poor grip on its own history, is not minor.
The Internet Archive, Sci-Hub, and various archives across the Web (some quite highly ideological in their foundation, though the content included is often quite good) are among my most utilised tools.
Libraries as well -- ILL can deliver virtually anything to you in a few days, weeks at the outside. It's quite possible to scan 500+ page books in an hour for transfer to a tablet -- either I'm getting stronger or technology's improving, as I can carry 1,500 books with one hand.
You're welcome to try it (not sure if the signup workflow still works; let me know). I'll be happy to hear your feedback.
Edit: you can upvote papers, and they'll float to the top just like on HN.
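The comment above doesn't say how the ranking is computed, so treat this as a rough sketch only. It uses the commonly quoted approximation of HN-style ranking, where votes push an item up and age pulls it down. The function names, the tuple layout, and the gravity value of 1.8 are assumptions for illustration, not anything confirmed for this site.

```python
def hot_score(votes, age_hours, gravity=1.8):
    """Time-decayed score: more votes raise it, more age lowers it."""
    return (votes - 1) / (age_hours + 2) ** gravity


def rank_papers(papers, gravity=1.8):
    """Sort (title, votes, age_hours) tuples by descending hot score."""
    return sorted(
        papers,
        key=lambda paper: hot_score(paper[1], paper[2], gravity),
        reverse=True,
    )
```

With a decay like this, a fresh paper with a handful of votes can outrank a much older paper with many more, which is the "float to the top" behaviour people expect from HN.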
If I restarted from scratch I would do the server-side in Python because there are just a lot more good libraries available.
In my case, I really enjoy Go, but certainly not all the time. It has its place. You may find either that it's the best thing ever, or that you cannot stand how it does X, and Python does it so much better. Some comparisons are objective, but the things that make or break it for you may be subjective.
I also have a few ScienceDirect search alerts set up, that come in once every few weeks typically with 1-5 papers.
And Google Scholar, if you use it and you are logged in with an account, learns from your search history and suggests new papers for you to read. It's relatively good.
If there's a fundamental new result in basic CS or something like that, I figure I'll hear about it on HN or another news site.
I can imagine it's different for people actively working on new research, though.
These days, accepted papers in specialized conferences are actually on mixed topics.. like you'll see security and file systems in SOSP
Yes, this is precisely the sort of application RSS is excellent for.
I hope it catches on.
Others have tried and they don't get enough traffic to get it to take off but since low levels of hosting are free, I could just keep it out there for a long time.
It would be nice if someone solved the problem and managed to create a working one, though.
With the right weighting this could really boost the size and quality of your dataset.
Right now we are working on helping the community surface information and working with verified researchers to build their "curated" lists for different topics.
That would create an echo chamber. You need to know about research that challenges your assumptions.
Is there something like that for papers on the arXiv?
Reddit doesn't allow subreddits to limit who can moderate posts or comments except by taking the subreddit private and limiting the membership.
It's actually a bit of a major pain, particularly for smaller public subreddits.
Reddit's moderation system in general is just hugely problematic. It kind of works, but it really doesn't, and has received very little love.
The first question for any such system should be "what is your goal?". Reddit serves popularity relatively well. Accuracy, relevance, information: rather less so.
Some non-brief thoughts on that from a few years back:
The only thing I can suggest w/o understanding your needs on more than a superficial level would be to create bots that have admin access, that attach "flair" that denotes rank that the bot uses to move stories around, etc. Network effects and availability still make sites such as reddit very attractive.
I have always wanted a multidimensional discussion, so that joke posts and memes automatically diverge from the current hyperplane of the discussion.
What Reddit does offer, and works fairly well, is moderation tools and teams sufficient to scale out pretty well.
A bigger problem is that conversation simply doesn't scale well, something old-timers have been realising for a while. I've got a Dave Winer quote somewhere to that effect, and was rereading Shirky's "A Group is its Own Worst Enemy" which suggests what I'm increasingly concluding: with the right people, from 2-3 through maybe 50-100 people can actually discuss something. More than that and it's broadcast or a large number of commingled side conversations.
I'm coming to appreciate Wordpress and blogging platforms' capabilities, and sheer size. There's a ton of blogged content out there, it's mostly that finding and commenting on it is challenging.
Another element that's lacking is filtering tools, for which I think randomness and/or community ought to play a larger role -- filtering content up through smaller groups.
Also both implicit measures and known trusted quality "roots" (vetters / editors).
As you mention, it is broadcast vs discussion.
I'm trying an experiment (and am way behind schedule) at /r/MKaTS and /r/MKaTH along these lines. There's a private and a public subreddit, one for more closed discussion, one for more open. The idea is to build these out.
Using flair, you can get something like the related-subtopic discussion. See /r/dredmorbius (a solo bloggy effort) or any of the big subs with flaired discussion (/r/AskHistorians or /r/AskScience) for examples -- you can look at the full sub, or dive into a specific flair's topics.
A significant problem with Reddit is that establishing these structures is difficult. Setting up post flair -- the names, the styles, the sidebar search, etc. -- is a major PITA. FSM help you should you want to revise the scheme later.
And you're still stuck with the problem that it's not possible to filter out a flair to report only posts above some arbitrary cutoff (you can sort by "best" or "top"), not that the moderation system gives you any particularly good mechanism for doing that in the first place.
Reddit (as with many discussion systems) is a bit too focused on the now and not sufficiently on the good. I'm particularly annoyed that it's not possible to revisit old posts for discussion (the six month comment freeze), a feature of G+ which actually turned out to be really useful.
There's also the whole Notifications dynamic which ... simply doesn't work well. Yes, you see if someone's mentioned your name, specifically, but you can't get a general notification of discussion on a post (unless you've specifically subscribed to it, and that only for 48 hours). That's utterly unworkable for larger discussions, but works well for small ones.
subscribe to archive email lists
Semantic Scholar (no notifications) is good for manually finding things
Google Scholar notifies you when your papers get citations... Unfortunately they don't have a way for you to get notified if the paper is not yours.. so I made a few fake accounts that add papers to the library as if they are the author and then I set up forwarding to my email. (really wish they would just expand the citation notification feature to your library and not just your papers but whatever)
There are some obvious exceptions on the cutting edge of technology (VR etc) but developers in my position care more about reliably making reliable software that earns (or saves) money. To this end, it's usually better to apply techniques and technologies that are already somewhat mature. I think this is more typical.
This doesn't mean I'm stuck on Java 2, but it means I don't read the papers on Paxos and Dynamo and such (instead I read the Hacker News article on the release of Apache Cassandra and build distributed software on top of relatively-early beta versions - and occasionally the business deals with costs from migrating from Thrift to CQL but the risk was worth it).
They index a whole bunch of sites and repos to provide a recommendation engine tailored to you and your field.
I scan the emails during weekly meeting.
I help build a product called BrowZine. It's focused on researchers at an institution - academic, private, and medical especially - who want to easily track the latest research papers in their favorite journals.
If you have login credentials at one of our institutions, please login and try it out! We think it's a great way to discover what journals your school/hospital/organization subscribes to, and My Bookshelf lets you save favorite journals for later, and keeps track of new articles as they are published.
If you don't have login credentials at a supported school, you can try out the Open Access library with just OA content.
Give it a try - we have a great team trying our best to make it easy to stay up to date with your journal reading! Love to hear your thoughts.
Furthermore, I follow other people interested in this field on twitter/google +/facebook, some of whom are researchers in this field.
Moreover when a major conference's program is released I try to look into the proceedings.
You can also use the rss feeds with a service like IFTTT or Zapier to set up an alert system.
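If you'd rather not depend on a hosted service, the same kind of alert can be sketched in a few lines with the standard library. This is only an illustration: `matching_items` and `SAMPLE_FEED` are made-up names, and the code assumes a plain RSS 2.0 feed without XML namespaces. Feeds in RSS 1.0/RDF or Atom format use namespaced elements and would need the tag names adjusted. Fetching the feed and sending the notification are left out.

```python
import xml.etree.ElementTree as ET

# Tiny made-up feed used only to demonstrate the function below.
SAMPLE_FEED = """<rss version="2.0"><channel>
<item><title>Attention for parsing</title><link>http://example.org/1</link>
<description>A neural parser.</description></item>
<item><title>Soil erosion</title><link>http://example.org/2</link>
<description>River basins.</description></item>
</channel></rss>"""


def matching_items(rss_xml, keywords):
    """Return (title, link) pairs for RSS 2.0 items mentioning any keyword."""
    root = ET.fromstring(rss_xml)
    wanted = [keyword.lower() for keyword in keywords]
    hits = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        summary = item.findtext("description", default="")
        # Case-insensitive substring match over title and description.
        haystack = f"{title} {summary}".lower()
        if any(keyword in haystack for keyword in wanted):
            hits.append((title, link))
    return hits
```

Calling `matching_items(SAMPLE_FEED, ["neural"])` picks out only the first item; run something like this on a schedule and mail yourself the hits, and you have a crude keyword alert.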
I basically just check my twitter account daily (also follow many great researchers who have twitter accounts :))
Also, the more experienced researchers all seemed to have many connections to other researchers through which news propagated.
That's a link to the various sites, blogs, updates that I subscribe to. Phoronix and Ars are both a bit noisy, but other than them the rest I take good care to keep up with.
I personally think it's fantastic that RSS has made such a comeback (some would say it never actually went away), it's such a simple, useful tool that's easy to integrate with just about anything.
Another interesting discussion I enjoy having is finding out how people read / digest / discover feeds:
tldr; I use Feedly to manage my rss subscriptions and keep all my devices in sync, but instead of using Feedly's own client, I use an app called Reeder as the client / reader itself.
I can see myself dropping back to a single app / service, which would likely be Feedly but for me Reeder is just a lot cleaner and faster, having said that I could be a bit stuck in my comfort zone with it so I'm open to change if it ever causes me an issue (which it hasn't).
I use a combo of two tools:
1. Feedly - https://feedly.com
RSS feed subscription management.
- Keyword alerts
- Browser plugins to subscribe to (current) url
- Notation and highlighting support (a bit like Evernote)
- Search and filtering across large numbers of feeds / content
- IFTTT, Zapier, Buffer and Hootsuite integration
- Built in save / share functionality (that I only use when I'm on the website)
- Backup feeds to Dropbox
- Very fast, regardless of the fact that I'm in Australia - which often impacts the performance of apps / sites that tend to be hosted on AWS in the US as the latency is so high.
- Article de-duplication is currently being developed I believe, so I'm looking forward to that!
- Easy manual import, export and backup (no vendor lock-in is important to me)
- Public sharing of your Feedly feeds (we're getting very meta here!)
2. Reeder - http://reederapp.com
A (really) beautiful and fast iOS / macOS client.
- The client apps aren't cheap but damn they're good quality, I much prefer them over the standard Feedly apps
- Obviously supports Feedly as a backend but there are many other source services you can use alongside each other
- I save articles using Reeder's clip to Evernote functionality... a lot
- Sensible default keyboard shortcuts (or at least for me they felt natural YMMV of course)
- Good customisable 'share with' options
- Looks pleasant to me
- Easy manual import and export just like Feedly
- Now can someone come up with a good bookmarking addon / workflow for me? :)
Edit: Formatting - god I wish HN just used markdown
> - Now can someone come up with a good bookmarking addon / workflow for me? :)
Unless I've missed something I'm puzzled why social bookmarking has never taken off or achieved critical mass. Once upon a time there was deli.cio.us (or however they punctuated it!) but when that went through a bunch of churn I think it felt like it got semi-abandoned - I stopped using it _ages_ ago anyway.
What is more incredible to me is that Linked Data is based on URLs so you'd think that social bookmarking would have evolved out of something in that space at some point but to the best of my knowledge it hasn't.
Perhaps it's the organisational, classification, taxonomy/folksonomy, tagging conundrum that is holding this space back, I really don't know.