
How to Become a Data Scientist – On Your Own (2015) - rbanffy
http://www.datasciencecentral.com/profiles/blogs/how-to-become-a-data-scientist-for-free
======
classybull
Becoming a data scientist on your own is exceedingly difficult because,
despite their purported adherence to objective data above all else, the
practice of data science is full of people who consistently appeal to
authority via educational credentials. You can see it in this thread. They
regularly make the mistake of thinking that because the skills necessary to be
successful in the field correlate highly with advanced degrees that means that
only people with advanced degrees should be able to participate in it. They
generally make it very difficult to objectively evaluate an individual's
skills because of the bias they inject into the candidate evaluation process.

It's regressive and completely out of step with the supposed meritocracy we
like to think we follow in tech. It's also the path towards cartels. I get the
feeling a large portion of data scientists would like to create the American
Data Scientist Association, with credentials and bar tests.

~~~
spangry
Yeah I kinda get that feel as well. The thing that makes me suspicious is the
amount of unnecessary and obfuscating jargon that gets thrown about. They've
even invented new jargon to replace perfectly confusing old jargon (e.g. your
model's "error residual" is now your "function cost"). I've spent the past
couple of weeks doing a bit of ML vision stuff. Most of the terminology was
lost on me, at least until today when I discovered "Machine Learning is Fun":
[https://medium.com/@ageitgey/machine-learning-is-fun-80ea3ec...](https://medium.com/@ageitgey/machine-learning-is-fun-80ea3ec3c471)

I think I learned more in an hour than I did in the past week, thanks to this
series. The author actually bothers to explain concepts (that turn out to be
fairly simple btw) like 'gradient descent'. Highly recommended read if you
have the time and interest. Just to whet your appetite:

 _...current machine learning algorithms aren’t that good yet — they only work
when focused a very specific, limited problem. Maybe a better definition for
“learning” in this case is “figuring out an equation to solve a specific
problem based on some example data”._

 _Unfortunately “Machine Figuring out an equation to solve a specific problem
based on some example data” isn’t really a great name. So we ended up with
“Machine Learning” instead._
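
To make the quoted idea concrete, here is a minimal sketch of gradient descent "figuring out an equation" from example data. It is not taken from the linked series: the function name `fit_line`, the data, the learning rate, and the step count are all made up for illustration.

```python
# Minimal sketch: fit y = w*x + b to example data with gradient descent.
# fit_line, the data, and the hyperparameters are hypothetical choices.

def fit_line(xs, ys, learning_rate=0.01, steps=5000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step downhill: move each parameter against its gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]  # generated from y = 2x + 1

w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))
```

With this data the fitted values should land very close to w = 2 and b = 1, which is the equation the examples were generated from.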

~~~
nerdponx
First of all, errors and residuals are different things.
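
A small sketch of the distinction, assuming the usual statistics definitions: an error is the deviation from the true (normally unobservable) value, while a residual is the deviation from the fitted estimate. The simulated data and variable names are illustrative only.

```python
import random

random.seed(0)

true_mean = 10.0  # known here only because the data is simulated
sample = [random.gauss(true_mean, 2.0) for _ in range(5)]
sample_mean = sum(sample) / len(sample)  # the estimate of true_mean

# Errors: deviations from the true value (unobservable in practice).
errors = [x - true_mean for x in sample]
# Residuals: deviations from the estimated value (always computable).
residuals = [x - sample_mean for x in sample]

print("sum of errors:   ", sum(errors))
print("sum of residuals:", sum(residuals))
```

The residuals sum to zero (up to floating-point rounding) by construction, while the errors generally do not. Each error differs from its residual by the same amount, `sample_mean - true_mean`.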

Second, "cost" is not new jargon. What is relatively new is thinking about
probabilistic modeling in terms of abstracted cost functions, but only
relatively.
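
One common way to see the link between probabilistic modeling and cost functions, sketched under an assumption the comment itself does not state (additive, independent Gaussian noise with fixed variance; the symbols $f_\theta$, $\sigma$, and $n$ are introduced here for illustration):

```latex
% Assumed model (not from the thread): y_i = f_\theta(x_i) + \varepsilon_i,
% with \varepsilon_i \sim \mathcal{N}(0, \sigma^2) independent.
\begin{align*}
-\log L(\theta)
  &= -\sum_{i=1}^{n} \log\!\left[\frac{1}{\sqrt{2\pi\sigma^2}}
     \exp\!\left(-\frac{(y_i - f_\theta(x_i))^2}{2\sigma^2}\right)\right] \\
  &= \frac{n}{2}\log(2\pi\sigma^2)
     + \frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl(y_i - f_\theta(x_i)\bigr)^2
\end{align*}
```

The first term does not depend on $\theta$, so under this assumption maximizing the likelihood is the same as minimizing the sum-of-squares cost.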

There are dozens of tutorials, courses, etc that are clear and don't introduce
unnecessary jargon. Nobody is trying to keep you out of data science.

As for the term "machine learning," it's because what we today call ML grew
out of actual AI research. It so happened that a lot of progress was made very
quickly by the ML researchers, so the ML-oriented terms became popular as some
older statistics terms were subsumed.

------
dhawalhs
We (Class Central) have been working on a Wirecutter-style guide on Data
Science. Instead of presenting a list of resources, we try to recommend the
best resource (usually a MOOC).

It's a six-part series, and so far only the first two parts have been published:

Part 1: The Best Intro to Programming Courses for Data Science [1]

Part 2: The Best Statistics & Probability Courses for Data Science [2]

Any feedback would be appreciated.

[1] [https://www.class-central.com/report/best-programming-course...](https://www.class-central.com/report/best-programming-courses-data-science/)

[2] [https://www.class-central.com/report/best-statistics-probabi...](https://www.class-central.com/report/best-statistics-probability-courses-data-science/)

~~~
jxm262
Welp.. you just implemented one of my 100's of startup ideas. And by the looks
of it, you did a pretty good job too :) So kudos, the site looks awesome and I
think it will definitely fill a need people have.

------
iaw
I think the author of this post is unintentionally misleading. Becoming a data
scientist is not a passive activity that can be taught solely through
coursework. The only time the author mentions real-world applications is in
the point about competitions.

Every data scientist that's worth anything has either done a PhD or would be
capable of doing a PhD. What distinguishes a PhD from standard coursework is
the additional effort spent navigating uncertainty.

In the end, Data Science entails a great deal of uncertainty that makes most
people uncomfortable.

------
__strisk
Disclaimer: you may need to have a master's in CS or Statistics to be taken
seriously. For every success story you hear of someone "doing it on their
own", scrutinize it enough and you'll see that they had either a decent
educational background or support from a career facilitator (bootcamps).

~~~
iaw
Any hard Engineering branch, Mathematics, and some of the more rigorous
Biology stuff will do as well.

I honestly do not understand why there appears to be so much desire to get
into Data Science when becoming a Programmer is equally lucrative and
substantially easier to bootstrap into.

Edit: Seriously, programmers make as much if not more than Data Scientists for
what ends up being substantially less stressful work (all things being equal).
I suspect that if the people pursuing DS actually end up doing the work and
living with the responsibilities, they'll regret their time investment.

~~~
autokad
there is no 'programmer' role in the c-suite, but there are chief data
scientists. data scientists have the ear of top management, and have direct
interaction where that is rarely true with programmers, which leads to...

there isn't a silicon ceiling on ds pay like there is with programmers, and
I disagree they are equally lucrative. I have never seen ds roles that were
not paid substantially more than programmers; although with the explosion of
the ds role, there are plenty of sub-par ds positions out there. (according to
Glassdoor, the average programmer makes 70k, the average ds makes 128k in San
Francisco). That disparity even holds for large tech like Facebook.

as far as 'less stress', I believe that is subjective. some people would like
to program, others more ds stuff, and often ds and programmers get to do a
little of both.

~~~
dsacco
_> there is no 'programmer' role in the c-suite, but there are chief data
scientists. data scientists have the ear of top management, and have direct
interaction where that is rarely true with programmers, which leads to..._

After working closely with dozens of tech companies, I have to say I've never
seen a single "Chief Data Scientist." I also can't say I've even heard of a
single company that has one (I'm sure _some_ exist though). I _have_ seen a
Chief Technology Officer in virtually every tech company, which is essentially
"programmer role in the c-suite" for the purposes of this discussion.

Furthermore, in the companies I've worked with that had in-house data
scientists, they always treated them less well than the software engineers
developing products.

I guess what I'm trying to say is that your statements don't match my
experience, or the experience of anyone I personally know in this industry,
and I'd be interested to see where your experience is coming from.

~~~
battlebot
I've never seen a CTO write any code. That's not their job.

~~~
dsacco
I anticipated a comment like this one, which is why I explicitly said "for the
purposes of this discussion."

You're right, a CTO doesn't usually write software, a CTO manages programmers
who write software (or VPs managing teams of programmers, etc). But a CTO
generally comes from a coding background, and how much data science do you
think a "chief data scientist" is really doing, as opposed to managing other
data scientists? People in the C-Suite typically don't really do anything
other than manage people managing others in the same background they came
from.

I think the spirit of my point still stands, pedantry aside. There clearly
exists a commonly used and recognized c-suite role for programmers, whether
they use their programming ability hands on or in managing others. It's not at
all clear to me that there is a commonly used or well recognized c-suite role
for data scientists that would be distinct from _CTO._

As a category of employee and work division, data scientists have not yet
become distinct enough from cross-pollinated disciplines to have that sort of
representation.

~~~
tjl
I know that at least one former Amazon CTO was an excellent coder, although he
didn't really do much coding (if any) while CTO.

I don't know of many companies that have a "Chief Data Scientist" that reports
directly to the CEO. In all honesty, they're more likely to report to a CTO.

Also, there's a reason why the C-suite people have the word "Officer" in their
title: they're officers of the corporation, and that implies additional legal
responsibilities. It's not necessary that it be in their official title, but
it typically is.

------
0xfaded
And ... where does it say learn maths?

------
autokad
this article has a great list of people to follow on twitter, if anyone has
ones to add be sure to post below =)

Olson is a great one, especially in GIS; here is my contribution:
[https://twitter.com/randal_olson](https://twitter.com/randal_olson)

------
battlebot
I'm going to give you all a small dose of reality regarding Data Science. Are
you ready? It's being hyped to the max to sell courses, books, seminars, and
what have you. So far, the number of data science jobs available in most
cities is, at least for now, not in line with the MASSIVE hype taking place.

I would love to have a job doing Data Science: I have a PhD in a relevant
field so I recently pushed down hard on this area. I took the Coursera course,
I'm learning all the various Python libraries, I learned R, and do anything I
can every day to pick up a skill here or there. I even have a "Kaggle"
account. What I don't have are job leads because there aren't actually that
many jobs and the ones that do exist say "data science" but really mean other
things.

------
kapauldo
To illustrate how ridiculous and self serving this is, replace the word "data"
with "cancer." You cannot become a "scientist" by watching TED videos. At a
minimum, it requires a credentialed degree. Otherwise, I'm a data scientist
too.

~~~
tma-1
Do you need a degree to become a computer scientist / programmer?

~~~
brogrammernot
Hotly debated subject these days.

I personally learned programming on my own, and after about two years of doing
it, I went back and started taking some computer science courses in data
structures, discrete mathematics, algorithms as well as some other topics. I
took some coursework through the University I got my undergrad from but most
through local community colleges because they were 1/10th of the cost.

In my experience, I do not think you need a degree to be a programmer. You
need to have extreme grit and motivation to learn it on your own.

I took the coursework after doing it because trying to learn advanced computer
science topics on top of work in my own time simply wasn't working. It's not
incredibly fun to learn, dissect and implement algorithms. At least for me it
wasn't. Having no one to ask about advanced mathematics also sucked honestly.
For those reasons, a quality education or professor is worth their weight in
gold.

~~~
battlebot
As someone who came up through universities with the full traditional CS
background, and as someone who has hired and been a tech lead over many
developers, I can count only one person I know who didn't get a degree who is
a great developer. The people with degrees all had to learn a lot after
school, as did I, but the one who is self-taught is some kind of savant, I kid
you not. And as great a developer as he is, he had some holes in his knowledge
that I ran across from time to time.

