Full-Stack Engineer / Data Scientist | The Open Syllabus Project (http://explorer.opensyllabusproject.org) | NYC / SF | Full-time | NYC or Remote

The Open Syllabus Project is an academic data-mining project at Columbia and Stanford that’s extracting structured information from a corpus of 1M+ college course syllabi. What’s actually being taught in college classrooms? How has this changed over time? What can we learn about the organization of the modern university from large-scale trends in the texts that are being assigned? How can insights from these data be applied to curriculum development, education policy, and lifelong learning?

We launched a beta version of the platform with an op-ed in the New York Times in January, and since then the project has appeared in The Washington Post, Time, The Chronicle of Higher Education, MarketWatch, Der Spiegel, Business Insider, Lifehacker, FiveThirtyEight, WNYC, QZ, and elsewhere. It's also been picked up by major news outlets in Europe, Russia, China, Japan, South Korea, Ukraine, Egypt, and Mexico.

We're looking for someone with experience in large-scale data analysis, natural language processing, web archiving, and web application development to help us grow OSP into a comprehensive, feature-rich authority on teaching trends in higher education. Some of the things we'll be working on in the coming months:

* Build scalable infrastructure for crawling university websites for syllabi, with the goal of growing the corpus to 4-5M documents in the next 6 months.

* Expand the universe of books and articles that we search for in syllabi by identifying new bibliographic databases (Citeseer, arXiv) and integrating them into OSP’s data extraction pipeline.

* Write classifiers to improve the accuracy of the citation and metadata extraction jobs.

* Expand the public-facing web application to surface new types of information: visualize changes in assignment trends over time, add profile pages for authors and publishers, and build richer ways to explore the citation graph.

* Help develop a research program around the data. We’re interested in applications to information science, literary studies, education policy, history of science, and canon / university studies.

If these kinds of projects sound interesting, we'd love to hear from you! We use Python for the data extraction rig and the public-facing website (Flask), Elasticsearch for citation extraction, React+Redux on the front end, and Ansible to manage infrastructure on AWS. Beyond specific technologies, though, we're first and foremost looking for a collaborator and partner who can help us build on what we have and push the project in new directions.

Drop us a line at syllabusopen@gmail.com.

Links:

* http://www.nytimes.com/2016/01/24/opinion/sunday/what-a-mill...

* https://www.washingtonpost.com/news/wonk/wp/2016/02/03/what-...

* http://time.com/4234719/college-textbooks-female-writers

* http://www.spiegel.de/unispiegel/studium/aristoteles-bis-mar...

* http://www.businessinsider.com/the-most-popular-required-rea...

* http://lifehacker.com/open-syllabus-project-shows-the-books-...

* http://fivethirtyeight.com/features/to-kill-a-mockingbird-au...



