I created this as part of Udacity's Data Science nanodegree. It uses metadata provided by arXiv's API to build a categorization model for the scientific disciplines: each paper's title is used to build a "bag of words" vectorized model, and each paper is then allocated to a category based on the contents of its summary abstract. Each category grows out of a "parent category", sorted by a simple difference in the words of the category labels.
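To make the bag-of-words idea concrete, here is a minimal sketch in plain Python. It is not the project's actual code: the function names (`tokenize`, `build_category_bags`, `categorize`), the overlap-based scoring, and the toy titles and labels are all assumptions made up for illustration. It builds one word-count bag per category from paper titles, then assigns an abstract to the category whose bag it overlaps with most.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_category_bags(titled_papers):
    """Build one bag of words (word -> count) per category from paper titles."""
    bags = {}
    for title, category in titled_papers:
        bags.setdefault(category, Counter()).update(tokenize(title))
    return bags


def categorize(abstract, bags):
    """Return the category whose title vocabulary overlaps most with the abstract."""
    words = Counter(tokenize(abstract))
    scores = {
        # Counter returns 0 for words missing from a bag, so unseen words add nothing.
        category: sum(count * bag[word] for word, count in words.items())
        for category, bag in bags.items()
    }
    return max(scores, key=scores.get)


# Hypothetical toy data: (title, category label) pairs invented for this example.
papers = [
    ("Deep neural networks for image recognition", "cs.CV"),
    ("Convolutional networks for object detection", "cs.CV"),
    ("Dark matter halos in galaxy clusters", "astro-ph"),
    ("Galaxy formation and dark energy", "astro-ph"),
]

bags = build_category_bags(papers)
label = categorize("We study convolutional networks for image recognition tasks.", bags)
print(label)
```

A real model would likely drop stop words and weight terms (for example with TF-IDF) so that common words such as "for" do not dominate the score; the raw counts here just keep the sketch short.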
I plan on writing up a blog post about the development of this, mostly for myself, to reflect on the personal growth and the (sometimes frustrating!) experience, though hopefully others will find it interesting or informative too. For now, I need to focus on finishing up the nanodegree (A/B testing next!).