

Is Data Science Your Next Career? - ohjeez
http://spectrum.ieee.org/podcast/at-work/tech-careers/is-data-science-your-next-career

======
christopheraden
I, for one, am glad that IEEE is making an effort to get people excited about
a formal education in data science. My education was a disjoint combination of
CS and Statistics (my degree is formally in statistics), with no union between
the two except what I made of it. In neither CS nor Statistics did my
education formally cover problems associated with having too much data to fit
in memory or store on one hard disk.
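
To make the "too much data to fit in memory" problem concrete, here is a minimal sketch of one standard workaround: computing summary statistics in a single pass with Welford's online algorithm, so only a few numbers are held in memory at once. The function names and the one-number-per-line file layout are assumptions made for illustration, not something from the article.

```python
import math
from typing import Iterable, Iterator, Tuple


def streaming_mean_std(values: Iterable[float]) -> Tuple[int, float, float]:
    """One-pass mean and sample standard deviation (Welford's algorithm).

    Uses O(1) memory regardless of how many values are consumed.
    """
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    std = math.sqrt(m2 / (count - 1)) if count > 1 else 0.0
    return count, mean, std


def read_numbers(path: str) -> Iterator[float]:
    """Hypothetical reader: yields one float per non-empty line, lazily."""
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield float(line)
```

Something like `streaming_mean_std(read_numbers("huge_file.txt"))` would then work on a file far larger than RAM, since the file is never loaded as a whole.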

My biggest issue with the teams of statisticians I've worked with before is
that they lack a basic understanding of computer science. My biggest complaint
dealing with the software developers on analytics projects is they don't
understand statistics. I heard a great quote for which I don't remember the
source (I paraphrase here): "A data scientist is someone who knows more
computer science than a statistician, and more statistics than a computer
scientist." The nature of the analytics world right now suggests that this
type of specialty is sorely needed in many places.

~~~
numlocked
Here's the source for the paraphrased quote:
<https://twitter.com/josh_wills/status/198093512149958656>

"Data Scientist (n.): Person who is better at statistics than any software
engineer and better at software engineering than any statistician."

~~~
reinhardt
A more cynical definition would add "... and who is worse at statistics than
any statistician and worse at software engineering than any software
engineer." ;)

~~~
dbecker
I don't think you have to be very cynical to take that view.

I'm a data scientist, and I'll readily admit that your definition describes me
well.

------
geebee
I like the new Data Science term, because it suggests people might be more
open to paying money for math. Coming from an operations research background,
I know it can be a challenge to convince people that this is worthwhile.

But really, the term to me more or less means "mathematically literate." I
know, there are _some_ techniques that seem to be specifically associated with
data science, like the analysis of large scale datasets, but many engineering
and mathematical disciplines do deal with this already.

There's a reason these jobs want someone who has a degree in... math,
statistics, computer science, operations research, physics, engineering, hell,
let's just say "or related field" and be done with it.

It's partly because these fields contribute to the intersection called "data
science", but my real guess is that a degree in any of these fields means that
you're probably mathematically literate. You've done one of those majors that
requires you to take calculus of several variables, linear algebra, some
differential equations, some kind of probability and statistics, probably
write a computer program or two, and then focus on some more specific branch
in depth where you learn to model things mathematically.

A good humanities curriculum will impart knowledge, sure, but it also trains
you to read dense material, make sense of it, and express some kind of insight
about it. A good "stem" curriculum does the same thing, except with numbers
and data.

There was a time when someone could get a job by being highly literate. I see
this as a similar situation - if you're mathematically literate at a
reasonably high level, you're probably employable.

------
josh2600
I don't know.

When I think of big data, the first thing that pops into my head is Insurance
Actuarial tables, and that's not interesting to me. Are statistics suddenly
the hottest and most interesting thing in the world because we can run
experiments over larger datasets? Maybe, but I think that most engineers
capable of doing the kinds of analysis these firms want would be better suited
to harder problems.

Don't get me wrong, data analysis is important, I just wonder if the IEEE has
a duty to encourage organizations like this or if they should be trying to
influence kids back towards the "hard" engineering practices.

To be honest, as long as people are doing something that makes them happy, I'm
not one to judge, but I do think there's something to be said for attacking
things that are harder than statistics.

~~~
anonymousleaf
The impression I got from the "cs/se/stats" remark was it was the fusion of
all three disciplines. A statistician can't sit down and pull data from a
MySQL table, and certainly can't write a library to gather data on users on
the site. As someone that's basically in this exact confluence point, I can
tell you that there is very much interplay between the statistics and the
software engineering and computer science disciplines.

~~~
csirac2
Hmm. What kind of statistician these days can't pull data from a MySQL table?
Even the almost-retired people I know have to interact with data sources in
some SQL product or another.

~~~
darkxanthos
HN is not "average". False consensus effect:
<http://en.m.wikipedia.org/wiki/False-consensus_effect>

~~~
csirac2
I hardly think I'm special for posting on HN. Have you read job adverts with
"statistician" in the title? In order to do your day-job as a statistician you
need to work with data products and tools which require more than MS Office
type computer skills.

------
Sealy
To me, this looks like a fad. Let me explain.

Five years ago, talk of Business Intelligence was all the rage. It was the
'hot' new thing that companies were pouring millions into. You needed the
analytical and statistical skills required to interpret large datasets
efficiently whilst having enough vision to cut through the noise and
deliver meaningful metrics. Technical knowledge of manipulating data using
multi-dimensional cubes and datasets was also required.

Now it seems that 'Data Science' is set to pick up where BI left off. The
fields appear very similar.

To avoid it being an oxymoron, I would clearly define the boundaries and goals
relative to similar fields... BI / Data Warehousing / Data Analyst / Database
Architecture

Disclosure: I've made a VERY good living since graduating working for
Investment Banks in BI/Data analytics. I know from experience that money in
these fields is more down to the industry you apply it to. Number crunching
payroll or scientific data, low salary. Number crunching bank regulatory or
trading data, massive money (regardless of what you call yourself).

~~~
mipmap
As someone else working in the BI space, my coworkers and I have been
saying similar things. I feel like "Big Data" has the same feel that BI had
years ago.

Also, we're all pretty sure that the title "Data Scientist" will be applied
far too liberally. I have friends at other BI firms who are already calling
themselves data scientists because they attended a convention where the words
"Hadoop" and "Cloudera" were spoken.

~~~
Sealy
Exactly, I find there is not enough definition and way too much overlap. Even
on Kaggle's front page, where they show 3 examples of 'the world's top data
scientists', it says:

Alexander Larko: -Experienced Computer Scientist & Data Miner with wide
ranging skillset

No disrespect to this man's skills but I'm sure there are hundreds of us on
here that could easily fall under that category!?!

------
avichal
I don't think a lot of people know about the Insight Fellows program yet, but
it's highly relevant here: <http://insightdatascience.com/>

They're taking people in STEM fields who are overqualified and underpaid,
and helping them transition into new careers as data scientists at top
technology companies (Google, Facebook, Square, LinkedIn, etc.). It's a really
interesting model because they're filling a big hole that universities have
right now in that there's no degree for data science. Close to 100% of their
Fellows make the transition successfully and I think the idea is something
that others are going to try to copy in the near future because it's clear
there's a supply-demand mismatch right now.

Fwiw, the company is a YC alumnus (a hard pivot from their original idea).

------
kyllo
So, universities are going to start trying to teach a person to be a
programmer (in several languages), a sys/ops admin, a statistician, a
business/systems analyst, and a DBA all at once? Good luck with that...

------
tjbiddle
I feel obligated to post, hopefully it's not unwelcome. While not in
recruiting, we're always looking for more great engineers at
Inflection (Inflection.com). We're a big-data company and are crunching
billions of records. If you're at all interested feel free to shoot me an
email (tjbiddle at the-website-i-mentioned-above).

------
mc-lovin
My take on data science, and what makes it distinct from other fields, is that
it combines a knowledge of the business logic (i.e. software engineering) with
knowledge of statistics.

For example, a statistician might wonder exactly what a particular ID referred
to. Does it mean a person, an IP address, or a single "session"? They could, of
course, find this out, but the data scientist would already know this.

Similarly, a software engineer might wonder what information they need to be
collecting from the user. The data scientist knows what analysis will
ultimately be done, and so knows what information must be collected.

So data science combines statistics and software engineering, and this is
useful because it allows a holistic view of the data analysis process, from
the collection of data, to the statistical analysis of the processed data.

------
RuggeroAltair
I think that the word "Science" deserves some attention. I completely respect
the data structure side of a data scientist, but more often than I'd like I
find people who call themselves data scientists and are very good
software engineers, but not as good at statistics or model building.

I may be wrong, but I disagree with those who say that the difference between
a data analyst and a data scientist is that the data scientist is a software
engineer.

I would say instead that the difference between a software engineer and a data
scientist is that the data scientist is a scientist that has a strong CV in
data structures and algorithms, as well as in (maybe pure) science, with
experience in statistics, math, or physics, and that knows very well how to
work with models, test hypotheses, spot patterns, anomalies etc...

------
michaelochurch
Data science seems, for now, to be what software engineering was supposed to
be: a career where you choose your own tools and problems (with some
constraints) and that gives you the freedom to move about in different
industries instead of being stuck to one in the way that most programmers now
are.

In many organizations, data scientists are full-time programmers who get
dibs on the most interesting projects. I identify as a data scientist as a
code word for "no-hire if the work's not interesting". There's plenty of hard
engineering (in addition to traditional data science, where statistical
intuition is more important) in data science. There are plenty of data
scientists working on OS hacks, compilers, and other "hard engineering"
topics. The difference and advantage for a data scientist is that your boss
doesn't think he could do your job if he wanted to. If your title shows that
you actually know math, you're not "just a code monkey".

~~~
kamaal
>>There are plenty of data scientists working on OS hacks, compilers, and
other "hard engineering" topics.

Can you give some examples on this? Seems very interesting.

~~~
michaelochurch
Nothing comes off the top of my head, and it's not like they get to specialize
in compilers. They mostly end up doing one-off hacks to make an existing
algorithm more performant.

What's fun about machine learning is that it touches so many other parts of
computer science. You could be at a high level writing DSLs in Clojure to make
it possible for statisticians to specify their models directly, or you could
go to the low level and write GPU code.

The general rule is that if your boss thinks he can do your job, you lose. If
he doesn't think that, you win. When you're a data scientist, your odds are
much higher of coming out in the second category.

~~~
christopheraden
To your last point, it depends on what role you currently serve to your
company. If you're among statisticians, you make yourself valuable by knowing
more about computer science and programming than the rest of the group. If
you're amongst engineers (probably more common than the former--at least on
HN), knowing probability and statistics gives you that edge.

------
southphillyman
What major differences are there between the new data science careers and what
developers have been doing in university research departments for years now?
Is it simply a matter of scope? Transitioning from relatively small clinical
trial sets to marketing data. Is credentialing needed because there is greater
responsibility to formulate mathematical models in this path? If so how is
that different than creating domain specific algorithms... Basically I'm
trying to understand why this is seen as a unique career path as opposed to
just another pivot developers may have to adapt to if they want to stay
relevant or be on the cutting edge.

------
kyllo
"I worry that the Data Scientist role is like the mythical 'webmaster' of the
90s: master of all trades" \- Aaron Kimball, CTO at Wibidata

[http://blogs.msdn.com/b/microsoftenterpriseinsight/archive/2...](http://blogs.msdn.com/b/microsoftenterpriseinsight/archive/2013/01/31/what-is-a-data-scientist.aspx)

------
joshwd
What really differentiates data science from econometrics, other than (I
guess) the type of data being analysed?

~~~
jjsz
I would like to know this as well.

~~~
neel8986
1) Size of data: Most econometricians work on small data sets (mostly in MBs)
which they can keep in RAM, and use R and Excel to analyze the data. But
modern-day data scientists have to deal with GBs (sometimes TBs or even PBs)
of data. For data that large you need multiple machines, or even hundreds of
machines, so you need to be good at distributed computing and frameworks like
Hadoop, Hive, etc.

2) Visualization: Such large datasets cannot always be expressed in bar
charts or pie charts, so standard charting tools like Excel and R don't
work. You need good knowledge of charting libraries like d3 or OpenGL
(for 3D visualization) to analyze and express your findings.

3) Type of data: Econometricians are never comfortable with unstructured data
sets consisting of Twitter feeds and Apache logs. Good knowledge of machine
learning and graph algorithms is becoming essential. Apache Mahout, a
machine learning framework built on top of Hadoop, is looking extremely
promising.
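
For readers who haven't seen the programming model behind frameworks like Hadoop, here is a toy, single-process sketch of the map / shuffle / reduce pattern, counting tokens in log-like lines. It is only an illustration of the idea: it does not use Hadoop's actual API, and all function names here are made up for the example. A real framework would run the map and reduce steps on many machines and do the shuffle over the network.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (token, 1) pair for every whitespace-separated token."""
    for token in line.lower().split():
        yield token, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence on one machine."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

The point of the split is that `map_phase` and `reduce_phase` only ever see one line or one key at a time, which is what lets a framework spread them over hundreds of machines.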

~~~
mc-lovin
I would also add that econometricians are highly focused, almost exclusively
focused in fact, on finding causal relationships.

This means that descriptive work such as clustering, dimension reduction, is
often either ignored, or considered as a kind of pre-processing before the
real work starts.

~~~
Fomite
I think this is a big one, and one of the reasons I would be uncomfortable
calling myself a "data scientist" despite meeting some of the more
tool-oriented definitions - my work has a much larger focus on attempting to
infer causality.

------
drieddust
Can anyone suggest a structured approach (books, tutorials, free online
courses) to learn data science?

~~~
mc-lovin
This is a free version of the book "Mining Massive Datasets".

Basically statistical methods that work with big datasets, which is the core
of data science.

<http://infolab.stanford.edu/~ullman/mmds.html>

~~~
drieddust
Thanks. I was hoping for maybe a series of books, lectures, or videos that can
take me from novice to intermediate level in data science.

I am willing to spend 10 hours a week on this.

~~~
yahelc
This seems to fit the bill: <https://www.coursera.org/course/datasci>

~~~
muraiki
I did the first two weeks of this course and found it quite accessible,
although the second week question of implementing matrix algebra in SQL didn't
seem to have much preparatory material in the lectures. Unfortunately I've had
to drop out due to a concussion, but I think that most HN'ers would be able to
take this course.
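
For anyone curious what "matrix algebra in SQL" can look like, here is a rough sketch of sparse matrix multiplication done as a join plus a grouped sum, run through Python's built-in `sqlite3` module. The table layout (`row_num`, `col_num`, `value`) and the function name are my own assumptions for illustration and are not taken from the course materials.

```python
import sqlite3
from typing import List, Tuple

Triple = Tuple[int, int, float]


def sparse_matmul(a: List[Triple], b: List[Triple]) -> List[Triple]:
    """Multiply two sparse matrices given as (row, col, value) triples.

    Only nonzero entries are stored; the product entry C[i][j] is the sum
    of A[i][k] * B[k][j] over every k where both entries exist.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (row_num INTEGER, col_num INTEGER, value REAL)")
    conn.execute("CREATE TABLE b (row_num INTEGER, col_num INTEGER, value REAL)")
    conn.executemany("INSERT INTO a VALUES (?, ?, ?)", a)
    conn.executemany("INSERT INTO b VALUES (?, ?, ?)", b)
    rows = conn.execute(
        """
        SELECT a.row_num, b.col_num, SUM(a.value * b.value)
        FROM a JOIN b ON a.col_num = b.row_num
        GROUP BY a.row_num, b.col_num
        ORDER BY a.row_num, b.col_num
        """
    ).fetchall()
    conn.close()
    return rows
```

The join pairs up every A entry in column k with every B entry in row k, and the `GROUP BY` adds the products that land in the same output cell.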

