

A Data Analysis Curriculum - gautambay
http://www.mysliderule.com/data-analysis-intro

======
daemonk
I don't find khan academy's videos to be that great to get at the intuition
behind probability and stats. It's good for reference and surface
explanations.

I recommend Harvard stats 110 youtube videos:
[https://www.youtube.com/playlist?list=PL2SOU6wwxB0uwwH80KTQ6...](https://www.youtube.com/playlist?list=PL2SOU6wwxB0uwwH80KTQ6ht66KWxbzTIo)

These videos are more focused on probability, but they contain a lot of great
intuitions.

~~~
gautambay
Thanks, we'll be sure to take a look!

------
gautambay
We (SlideRule) launched our first “Learning Path” on Web Development on HN a
few weeks ago, to very encouraging feedback.
[https://news.ycombinator.com/item?id=7501516](https://news.ycombinator.com/item?id=7501516)

This is our second Learning Path, on Data Analysis, built by the awesome
Claudia Gold (MIT alum, self-taught data scientist, early at Airbnb). The aim
is to list helpful resources in a sequence that a beginner can follow.

Once again, we realize this is _a_ curriculum, not _the best_ curriculum. We'd
love your feedback on what we should change or add.

\------

Edit: Since we have your attention, here are some other ways in which you can
help us:

1\. Tell us which new Learning Paths you'd like us to build.

2\. Collaborate with us to build a Learning Path on a subject where you're an
expert.

3\. Request features that will help you take better advantage of Learning
Paths.

We’re at founders@mysliderule.com

~~~
pskittle
It would be nice to have a learning path on Computer Engineering and Computer
Science which can guide a beginner all the way up. Thanks for this one, will check
it out.

~~~
gautambay
Thanks! We'll try to add a full CS&E curriculum.

The closest thing we have is a beginners' web development path (not the same
thing at all, I realize, but sharing if helpful)
[http://www.mysliderule.com/courses/learning-paths/web-
develo...](http://www.mysliderule.com/courses/learning-paths/web-development-
python-django/)

~~~
pskittle
thanks, I did sign up

------
ths291
Love the idea of expert-curated learning paths.

With so many "free" learning resources online, we end up "paying" through the
mental churn and frustration of trying to separate the wheat from the chaff.
This is a great step in truly making free resources more accessible and
meaningful.

~~~
gautambay
Thanks for the feedback! That's exactly what we're hoping to do: create the
"glue" around all the great content that's out there!

One question: How important are the credentials of the "expert" to you?

~~~
andrewguenther
To me, the credentials of the "expert" are very important. They are the only
indication I have going into an online course that the person who built it has
any idea what they're talking about.

~~~
gautambay
Thanks! What are the ideal credentials that will give you confidence in the
curriculum?

~~~
andrewguenther
Extensive industry experience or a PhD in the subject, the same I would expect
out of someone teaching in a traditional university or community college.

------
Denzel
I like the curation of free educational content in a specific area because it
eliminates the guesswork, and duplicated effort, of filtering for high-quality
resources. Thanks to Claudia Gold for the amazing amount of work she put into
this. My main gripe comes with the majority of these data science
courses/tracks.

It appears that no comprehensive treatment of applied data science exists. For
the past few months, I've been searching high and low. I understand
collaborative filtering; I've heard about the Netflix recommendation challenge
ad nauseam; I grasp machine learning, Bayesian statistics (prior, posterior,
conjugate prior distributions, etc.) on a superficial level. Conversationally,
I can hold my own with practitioners, albeit on a beginner level.

But what I, and others, want to learn is how to apply these techniques in a
scalable way on a real production system. Right now, it's easy to conjecture
about what could/should be done, but there's a lack of confidence in how to
achieve the goals. I'm experimenting with a collaborative filtering problem
using Cassandra as the data store for thumbs up/down ratings on products, and
Hadoop for the MR pipeline; it'd be great to have more visible examples
available. Is there any place I could find detailed information on real,
online machine learning/statistical inference systems?

~~~
Claud334
Thanks for your comments! I completely agree about the lack of hands-on
courses. I found the same thing when I was putting this together. The capstone
project is our attempt at including something more practical, but it's self-
directed, so that's not exactly what you are after. (Creating individual
courses was outside the scope of this project.) However, I'm confident it will
exist someday, given the current popularity of both data science and online
courses. I assume you've also done some Kaggle challenges?

I agree with the suggestion that you should attend meetups and tech talks (or
watch them online if there are none in your area). You'll hear more about real
life examples and have a chance to ask questions.

The other main way to learn what you're asking is to get a job doing it! You
have more than enough background (assuming you also have knowledge of tools)
and you will learn more from others and as you need the information.

~~~
Denzel
You know, I haven't had time to try any Kaggle challenges yet. I'll have to
sit down and attempt one this weekend. I appreciate the advice from both you
and Brenden. I'm going to look for more data science meetups and keep my ear
to the grapevine for any exciting positions. Keep up the great work, Claudia.

------
blutoot
I love the idea of expert-curated learning paths - this is so much needed with
the proliferation of all the competing MOOCs. Thank you for putting this
together.

I've noticed that there's a growing demand for performance and reliability
engineering types of roles in tech. Can that become a learning path? The
courses for that could be:

1\. OS

2\. Computer Networks

3\. Distributed Systems

4\. Intro to Algorithms

5\. Intro to Statistics

6\. <Some course on best practices of general systems-level troubleshooting?>

7\. <Some course on best practices of software debugging?>

I know it sounds almost like a full-fledged MS program in CS. But this could
be a great opportunity for those who are not enrolled in those programs but
love systems in general and would like to make a career out of it. Apologies
if this type of "learning path" makes no sense to most of the industry
insiders.

------
krrishd
Interesting. I wonder how this compares to Coursera's Data Science
specialization[0]. From what it looks like, they both have very similar
curricula.

[0]:
[https://www.coursera.org/specialization/jhudatascience/1?utm...](https://www.coursera.org/specialization/jhudatascience/1?utm_medium=listingPage)

~~~
gautambay
We got asked this question before, and here's our analysis of the differences.

1\. Coursera focuses solely on R for Data Science. SlideRule covers additional
tools (e.g. Python, SQL) which a practicing data analyst will find handy. It
seems there's a bit of an R vs Python debate in the data world, so we think
it's useful for people to know both.

2\. SlideRule's path has an (optional) "intro to programming" section for
beginners. Coursera assumes some prior programming experience.

3\. Most of the courses in the SlideRule path are "self-paced", so in theory
someone studying this full-time could cover it in 4-6 weeks. Coursera has
fixed start and end dates, so the fastest one could complete the track
(accounting for interdependencies of courses) is ~24 weeks.

~~~
krrishd
Thanks for the response, that definitely makes sense. I guess it really
depends on the specific technology you want to learn and the type of learner
you are.

------
findjashua
Login page keeps redirecting me to the sign up page. There I'm told I'm about
to log in to the django server (why the django server bit, just say I'm about
to log in), but when I enter my email address, it says a user with that email
already exists and I should try logging in instead. The cycle continues.

~~~
gautambay
Sorry, we're working to fix it! For now, you can access the Learning Path by
dismissing the sign-up modal (X in the top-right corner).

------
orky56
It would be great to add an elective course for Growth Hacking where you can
assume the knowledge of data analysis and provide a survey/use cases of
effective examples of using analysis and other methods to inform product
development and/or design.

~~~
gautambay
Absolutely! A Growth Hacking path is on our wish list. Know anyone awesome who
could help us build it?

~~~
2mur
Ian Landsman

------
Claud334
Hi! I'm Claudia Gold, the author. Happy to answer any questions you might
have. :)

~~~
cjf4
Hi Claudia, thanks for putting this together, I've already found it very
useful.

I didn't see any linear algebra anywhere here, and from my (probably naïve)
understanding of data science, it seems to be core to a lot of the main ideas.
Do you know of any good resources in this same vein as the rest of the track?
I've been watching Coursera and edX and it seems linear algebra offerings are
somewhat sporadic.

~~~
Claud334
Hi, glad you're finding it helpful! The reason I didn't include linear algebra
is that it is possible to do the day-to-day work of most entry level data
science jobs without it. That said, it is great to know for a deeper
understanding and if you are writing your own machine learning algorithms.
This MIT OCW class provides a good introduction, with video lectures and
problem sets: [http://ocw.mit.edu/courses/mathematics/18-06-linear-
algebra-...](http://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-
spring-2010/)

There is more discussion on this topic here: [http://www.quora.com/Big-
Data/What-concepts-of-linear-algebr...](http://www.quora.com/Big-Data/What-
concepts-of-linear-algebra-should-one-master-to-be-a-good-data-scientist)

------
chiachun
I find that your "Apply to YC" path is also very interesting.
[http://www.mysliderule.com/apply-to-
Ycombinator](http://www.mysliderule.com/apply-to-Ycombinator)

~~~
gautambay
Haha thanks, that was just a fun side project when we were applying to YC S14.
:-)

------
dang
We took "Show HN" out of the title because this site had a Show HN recently:
[https://news.ycombinator.com/item?id=7501516](https://news.ycombinator.com/item?id=7501516).

~~~
gautambay
Okay.

Could we please reinstate the "built by a former Airbnb Data Scientist",
though? That's material information, in that this is not just _any_
curriculum, but one that's expert-curated. As people on this thread have
indicated [1], the credentials of the person building a Learning Path are
important.

[1]
[https://news.ycombinator.com/item?id=7816100](https://news.ycombinator.com/item?id=7816100)

~~~
dang
HN tends to take authorial information out of titles, especially when it's
promotional, as part of a general aversion to linkbait.

Anyone who clicks on the post can easily see the credentials as you've
highlighted them.

------
theop
and the Data Analysis hype continues..

------
cornholio
"Data science" my ass. It's called statistics, econometrics and programming.

~~~
rjtavares
One thing has multiple names. Welcome to the wonderful world of human
language!

