
The Open Source Data Science Masters - nns
http://datasciencemasters.org/
======
neilsharma
Good collection, but as someone who has been slowly learning data science over
the past few years, I think it needs far fewer lectures and waaaay more
projects.

The biggest difficulty I have with learning data science is not how the
algorithm or tools work, but the problem setup. Where is the data? How do I
clean it? What insights can I draw from this? Which algorithms to use? What
can I do with the algorithm assuming it works?

Most MOOC projects decide all this for you by giving you a set of tasks to do
in order and skeleton code to work off of. Your job is simply to implement a
small part of whatever algorithm you learned that week and press run. This
approach leaves out the creative development, exploration, trial and error,
and critical thinking you need once you go out in the real world.

Also, I think there should be more emphasis on publishing, even if your
attempts are inaccurate. Push a Jupyter notebook to GitHub showing how you
tested out a rudimentary Monte Carlo simulation on stock data. Or write a blog
post with your attempt at determining how much Silicon Valley home prices would
drop if 10K more family units magically existed in SF. Or try to code a random
forest algorithm from scratch in a language of your choice. You don't have to
be right, but publishing forces you to at least take a critical look at your
work and think about the material deeply. MOOCs, at least from my experience,
just encourage you to move on to the next topic the moment your code works,
without diving too deeply.
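
To make the "rudimentary Monte Carlo simulation" idea concrete, here is a minimal sketch of the kind of thing such a notebook might start from. It assumes prices follow geometric Brownian motion, which is a textbook simplification and not a claim about real markets. The function name `simulate_final_prices` and every parameter value (starting price, drift, volatility) are made up for illustration and are not taken from any real stock data.

```python
import math
import random
import statistics


def simulate_final_prices(start_price, annual_drift, annual_volatility,
                          trading_days=252, n_paths=2000, seed=0):
    """Simulate end-of-period prices under geometric Brownian motion.

    All inputs are hypothetical; in a real project you would estimate
    drift and volatility from historical data you collected yourself.
    """
    rng = random.Random(seed)       # seeded so the run is reproducible
    dt = 1.0 / trading_days         # one trading day as a fraction of a year
    finals = []
    for _ in range(n_paths):
        price = start_price
        for _ in range(trading_days):
            shock = rng.gauss(0.0, 1.0)  # standard normal daily shock
            price *= math.exp(
                (annual_drift - 0.5 * annual_volatility ** 2) * dt
                + annual_volatility * math.sqrt(dt) * shock
            )
        finals.append(price)
    return finals


if __name__ == "__main__":
    # Made-up inputs: $100 start, 5% drift, 20% volatility.
    finals = sorted(simulate_final_prices(100.0, 0.05, 0.20))
    print("mean final price:", round(statistics.mean(finals), 2))
    print("5th percentile:  ", round(finals[len(finals) // 20], 2))
```

The interesting part of publishing something like this is not the loop itself but defending the choices around it: where the drift and volatility come from, and whether the model is even reasonable for the data.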

~~~
shanusmagnus
Agree. I'll add that while it's valuable to compile a giant list of topic
areas, and references and resources for those topic areas, working through all
the stuff on this list would take years to do in a non-trivial way.

What I really prize are curricula that err in the other direction; something
like: here are the handful of foundational topic areas you _have_ to know
about, and the pieces of those areas that will give you the absolute minimum.
Taken collectively, those pieces let you make a beginning and engage with
other, more advanced resources as the need arises and as the sophistication of
your projects requires.

It's hard (at least for me) to know which subset of knowledge is required to
make a beginning. That's where I need help.

~~~
neilsharma
That's true -- there's definitely a bit of knowledge needed to get started,
but most of that can probably be taught in 2-3 classes. I think the first
problem in learning data science is coming up with a "Foundations" curriculum:

- Learn a relevant programming language (R, Python) + tools (IPython,
Anaconda, etc.)

- Basic linear algebra (nothing more complex than multiplying matrices) and
calculus (what derivatives and integrals are)

- Intro to statistics (just to know the vocabulary -- covariance,
correlation, standard error, etc.)

- Rough overview of Machine Learning / AI as a whole and where it's used in
the world today
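
As a rough illustration of how small that foundation is, the vocabulary in the list above (matrix multiplication, covariance, correlation, standard error) fits in a few lines of plain Python. This is only a sketch: the helper names are hypothetical, the sample-statistics formulas use the usual n - 1 denominator, and in practice you would reach for a library instead of writing these by hand.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def sample_covariance(xs, ys):
    """Sample covariance with the n - 1 denominator."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return sample_covariance(xs, ys) / math.sqrt(
        sample_covariance(xs, xs) * sample_covariance(ys, ys)
    )


def standard_error(xs):
    """Standard error of the mean: sample std deviation / sqrt(n)."""
    return math.sqrt(sample_covariance(xs, xs) / len(xs))


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


if __name__ == "__main__":
    xs, ys = [1, 2, 3, 4], [2, 4, 6, 8]
    print("covariance:    ", sample_covariance(xs, ys))
    print("correlation:   ", correlation(xs, ys))
    print("standard error:", standard_error(xs))
    print("matrix product:", matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```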

After that comes the second problem: "What do I do next?"

- coming up with several interesting _but manageably small_ projects

- getting data for these projects

- access to quality advanced resources that can be consumed as needed while
working on the project. A 3-month-long MOOC on Neural Networks is an
impractical resource. A well-written blog post (with code) or YouTube video is
far better.

MOOCs + textbooks seem to do better for the first problem if you can sift
through all the noise, but fail at the second.

------
randcraw
That's a nice overview of autodidact resources for DS.

But I suggest that you tweak the name a little, like "The Open DS Masters
Program" or "Toward OSDS Mastery".

"OSDS Masters" sounds like a plural noun, like you're trying to say, "at this
website you can find the great open source masters of data science" -- like
Richard Stallman or the authors of Weka. It's a bit confusing.

~~~
cholantesh
I presumed the 'Masters' was akin to the usage in "master's degree".

~~~
j_s
If so then they should add the word 'Degree' everywhere.

~~~
dragonwriter
Or, just eliminate the subtitle and merge the _one_ important word in the
subtitle that isn't in the title, and call it "The Open Source Data Science
Master's Curriculum", which has the advantage that while it _invokes_ the idea
of a curriculum at the level of a Master's Degree, it doesn't falsely present
itself as offering an actual _degree_, which it is not.

~~~
cholantesh
I don't think that's the intent, but I can see how it would give such an
impression. That alone should be reason to change the title, though.

------
jmde
This seems like a nice compilation for introductory material in one place.

I still can't get over the term "data science", though. Not only is it
ridiculously meaningless - what sort of science doesn't involve data, and how
often would data be useful to something that isn't scientific at some level -
its meaninglessness derives from the hyped buzzword trendiness that drove its
upswing.

I say this as someone whose expertise is really sitting at the nexus of what
would be considered data science. I feel as if I have been doing what might be
considered data science for a long time, before there was a label for it, but
watching its ascendance in demand and popularity has been troubling. I should
be happy, but I feel like it's being driven by fashion rather than
fundamentals, which makes me worried about the trajectory going forward, and
disturbed by some communities being thrown under the bus.

~~~
ende
>I still can't get over the term "data science", though. Not only is it
ridiculously meaningless - what sort of science doesn't involve data, and how
often would data be useful to something that isn't scientific at some level -
its meaninglessness derives from the hyped buzzword trendiness that drove its
upswing.

Out of curiosity, how do you feel about the term 'computer science'?

~~~
jmde
That's an interesting question - I agree it's an interesting parallel and one
I hadn't thought of before.

I have always been puzzled by the term "computer science" a bit also, because
so much of it isn't really science per se (more math or theory along with
engineering). When I've thought about it, I usually come to some peace with it
because there _is_ a scientific aspect to the field via the hardware side of
things, which is really the foundation, at least historically, and there is a
historical emphasis on demonstrating results empirically. It's sort of a crude
awkward label but I accept it. But then again I went to a school where/when
comp sci and EE were the same department.

"Data science" has bothered me more, though, because it's so vague, "data" and
"science" are so inextricably defined relative to one another, and because
it's arguably misleading - it's not really the science of data, whatever that
means, and to the extent it's science, it's just science, but it's not, it's
really just statistics.

More appropriate terms to me would be "computational statistics" or
"statistical computing", "informatics", or "quantitative computation" or
something. Anything but "data science." It's like some stereotypically
ignorant but buzzword-compliant management committee, being unfamiliar with
data or science, somewhere commanded HR to "find us some of those... you
know... data science people!"

... and now venerable universities have whole departments with that title.

~~~
cwyers
> it's not really the science of data

How isn't it?

------
Notre1
Clare Corthell, the creator of the Open Source Data Science Masters project,
is interviewed in the 2016-07-30 episode of This Week in Machine Learning & AI
(TWiML):

https://twimlai.com/twiml-talk-1-clare-corthell-open-source-data-science-masters-hybrid-ai-algorithmic-ethics/

------
Rogerh91
I really like this collection of resources--it's perfect for people really
trying to get into the basics of data science.

