

Data Scientist Interviews - hrb1979
http://www.datascienceweekly.org/blog/18-data-scientist-interviews-volume-1-april-2014

======
agibsonccc
Hi, I'm an adjunct instructor at the data science bootcamp Zipfian Academy[1].

It seems there's still a lot of confusion as to what a data scientist is.

Data scientists are typically analysts who know some combination of
MATLAB/Python/R and who come up with predictive models to achieve some sort of
business objective.

This is usually related to a business's profit center. The work typically
goes beyond A/B testing and ranges from using basic classifiers for things
like churn prediction, through handling data quality, all the way to doing
object recognition.
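
To make "basic classifiers for churn prediction" concrete, here is a minimal
sketch in plain Python. The data, the single "inactivity" feature, and all
function names are made up for illustration; a real project would more likely
use a library and many more features.

```python
import math

# Made-up toy data: one feature per customer (fraction of the last
# 30 days with no login) and whether that customer churned (1) or not (0).
INACTIVITY = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
CHURNED = [0, 0, 0, 1, 1, 1]


def sigmoid(z):
    """Squash a real number into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, learning_rate=1.0, epochs=5000):
    """Fit a weight and a bias by batch gradient descent on the log loss."""
    weight, bias = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every customer under the current parameters.
        errors = [sigmoid(weight * x + bias) - y for x, y in zip(xs, ys)]
        weight -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        bias -= learning_rate * sum(errors) / n
    return weight, bias


WEIGHT, BIAS = train_logistic(INACTIVITY, CHURNED)


def churn_probability(inactivity):
    """Estimated probability that a customer with this inactivity churns."""
    return sigmoid(WEIGHT * inactivity + BIAS)
```

Calling `churn_probability(0.9)` gives a score above 0.5 and
`churn_probability(0.1)` a score below it, which is all this toy model is
meant to show.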

Data Engineers typically work on the JVM/distributed systems to handle data at
scale and implement models for data scientists in production.

With respect to deep learning, I'm also the author of a Java-based distributed
deep learning solution called deeplearning4j[2].

I think people throw around deep learning because it's the new thing, but it
really can be more accurate.

This is due to not needing to do manual feature engineering[3] and feature
extraction[4].

Also, if you're interested, I will be giving talks on distributed deep
learning at both Hadoop Summit[5] and OSCON[6] this year, if you have any
specific questions as to what all of this stuff is. Data science is an amazing
field to be in right now.

For those of you who are interested in neural networks in general, Ersatz
holds a great meetup[7] (videos/tech talks recorded!).

Happy to answer questions!

[1]: [http://www.zipfianacademy.com/](http://www.zipfianacademy.com/)

[2]: [http://deeplearning4j.org/](http://deeplearning4j.org/)

[3]:
[http://www.cs.princeton.edu/courses/archive/spring10/cos424/...](http://www.cs.princeton.edu/courses/archive/spring10/cos424/slides/18-feat.pdf)

[4]:
[http://en.wikipedia.org/wiki/Feature_extraction](http://en.wikipedia.org/wiki/Feature_extraction)

[5]: [http://hadoopsummit.org/san-jose/speakers/](http://hadoopsummit.org/san-jose/speakers/)

[6]:
[http://www.oscon.com/oscon2014/public/schedule/detail/33709](http://www.oscon.com/oscon2014/public/schedule/detail/33709)

[7]:
[http://www.meetup.com/SF-Neural-Network-Afficianados-Discuss...](http://www.meetup.com/SF-Neural-Network-Afficianados-Discussion-Group/)

~~~
datasci-fi
How can you possibly teach data science in 12 weeks?

~~~
agibsonccc
A little late on the reply. It's nonstop, 9 to 6pm every day for 3 months.
This would be you quitting your job to learn the whole data science stack
(SQL all the way through machine learning).

Since the focus is on practical, hands-on work with only the needed theory,
concepts are retained well enough on a practical level for students to become
productive junior data scientists.

Note that our acceptance rate is also very low, though. We are bringing in
people who already have PhDs or some sort of software engineering
background. The job placement is worth it, though. We are seeing average
starting salaries of 115-120k. The problems being solved are also really
interesting.

------
onislandtime
Thinking people (as opposed to those dedicated to promotional activities)
should stop using the term "data scientist". All scientists are data
scientists, otherwise we would call them philosophers. Data for the sake of
data is not a science. While you are at it, please also stop using the term
"big data" (often people just mean "do something with the data"). If you need
to use a computer cluster and MapReduce because the data doesn't fit on your
Mac, then refer to distributed data stores and computing systems. Also, please
drop the term "deep learning" when you refer to using more compute power to
run more complex models. Thanks.

~~~
michaelochurch
"Deep learning" has an actual meaning, which is the use of neural networks
with multiple hidden layers. (Networks with one hidden layer can theoretically
approximate any mathematical function, but it's the investigation of deeper
networks, with more, that has reinvigorated neural net research over the past
few years). I'm sure it is being misused, but there is a legitimate, technical
meaning to it.
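
As a rough illustration of "multiple hidden layers", here is a minimal sketch
of a forward pass in plain Python. The weights are hand-picked placeholders
rather than trained values, the ReLU activation is an assumption, and the
helper names are invented for this example.

```python
def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


def dense(weights, biases, inputs):
    """One fully connected layer: each weight row produces one output unit."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def forward(layers, inputs):
    """Run inputs through every layer; only hidden layers get the ReLU."""
    activations = inputs
    for index, (weights, biases) in enumerate(layers):
        activations = dense(weights, biases, activations)
        if index < len(layers) - 1:  # the last layer is the output layer
            activations = relu(activations)
    return activations


def num_hidden_layers(layers):
    """Every layer except the final output layer counts as hidden."""
    return len(layers) - 1


# A "deep" network in the sense above: two hidden layers, then an output.
DEEP_NET = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),  # hidden layer 1
    ([[1.0, 1.0], [-1.0, 1.0]], [0.0, 0.0]),  # hidden layer 2
    ([[2.0, -1.0]], [0.5]),                   # output layer
]

# A "shallow" network for comparison: a single hidden layer.
SHALLOW_NET = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),  # hidden layer
    ([[2.0, -1.0]], [0.5]),                   # output layer
]
```

With these placeholder weights, `forward(DEEP_NET, [2.0, 1.0])` returns
`[5.0]`. The only structural difference between the two networks is the
number of hidden layers stacked before the output.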

"Data scientist" seems to be a way for mathematically literate programmers to
separate themselves from the teeming masses of commoditized ScrumDrones. It
seems to mean, "this person is smart enough to deserve dibs on the most
interesting work". Perhaps it's an attempt to get back to the R&D culture that
existed before biztards commoditized us and our work.

Most of the fuss around "data science" makes me think of the Fundamental
Theorem of Employment. If you're hired for a job, it's typically either (1) to
do a job the person hiring you can't do for himself or (2) to do a job he
doesn't want to do. Type-1 workers are respected and have autonomy. Type-2
workers are generally ill-regarded (because the boss thinks he can do the
worker's job). "Data Scientist" seems to be a way for a programmer to say,
"Only hire me for Type-1 work".

I can't say I'm a huge fan of the title's existence, because most companies
use "data scientist" as Biztard for "person who does watered-down machine
learning", but I suppose the current climate is an improvement over the AI
winter.

~~~
onislandtime
Yes, the latest work on NN is a breakthrough for sure. So are the advancements
in distributed computing and storage that make low-cost scalability possible.
However, we should resist getting sucked into marketing terms and buzz words.
Terms like "self-driving car" are good because they are descriptive, accurate,
and imply a paradigm shift. On the other hand, I may be wrong. For example,
the term "microprocessor" seems to have transcended relative size and is used
to refer to a type of computer processor on an integrated circuit. Language
evolves but perhaps we can influence by choosing good meaningful names when we
can.

------
darkhorn
What is the difference between a data scientist and a statistician? Have you
ever seen a data scientist with a PhD? Have you ever seen a statistician with
a PhD, like
[https://statistics.wharton.upenn.edu/programs/phd/](https://statistics.wharton.upenn.edu/programs/phd/)
?

~~~
agibsonccc
I would like to add that where I teach[1], our data scientists get hired at
places like Tesla. Many of our students tend to have bachelor's degrees or
PhDs. You don't need a PhD to do data science. It is more about hands-on
skills.

[1]: [http://www.zipfianacademy.com/](http://www.zipfianacademy.com/)

Edit: Cabinpark is right and I should clarify. Statisticians typically don't
have the proper computer science fundamentals to be able to deal with the
programming required to run the right experiments. They may have an
understanding of the distributions, t-tests, ... but may not be able to use
the tools out there that the broad field of data science requires.

~~~
cabinpark
That doesn't answer the question.

When I hear the term data scientist, I assume that the person has extensive
training and background in statistics and mathematical modeling. I work with
large data sets all the time as a scientist and can run all the standard
statistical tests but I would hardly consider myself a data scientist.

~~~
agibsonccc
Realistically, most "data scientists" that people hire aren't going to have
that full background.

I think the problem with a lot of data science teams out there today is the
hiring. Not a lot of people understand what the role of a data scientist
should be, and they expect people who can do Hadoop end to end, have extensive
CS skills, know all of the latest advanced machine learning algorithms, and
know stats like the back of their hand.

Many employers will not need or use that full pipeline, and if they do, they
are probably capable of hiring Hadoop engineers as well as more traditional
analysts who go deeper into the math behind the models.

That analyst is someone with a specialized background and some form of
training in scientific computing. This does not need to be a full-blown
master's/PhD.

Realistically, you can get away with having a decent programming background, a
clue about the landscape of the machine learning algorithms, and enough
statistics to know when you're going down a rabbit hole with respect to
research.

I think it more comes down to having the right mindset with problem solving.
Much of this is also going to be domain specific.

The term data science is ambiguous at best, as it is a new field. I think over
the next few years we'll come to see more specialized roles over time that
will help clarify the sandbox that is data science vs data engineering among
other disciplines.

------
michaelkohen
For easy reading:
[http://wayfinder.co/pathways/535d479d8760ec110089a874/15-in-...](http://wayfinder.co/pathways/535d479d8760ec110089a874/15-in-depth-interviews-with-data-scientists)

