
Data Scientist: The Sexiest Job of the 21st Century - rmorrison
http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/
======
dude_abides
Interestingly, just yesterday, I found out that LinkedIn Friend Suggest uses,
among other things, co-logins from the same IP address as a signal. On my test
account that I created at work, it eerily showed me all my co-workers in the
Friend Suggest list. Later, as soon as I logged in from home, it added my wife
to my Friend Suggest list.

I wonder if one of the goals of a good Data Scientist is also to be not too
accurate, lest the product create an eerie feeling among users! (remember the
Target pregnant girl incident?!)
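
A minimal sketch of what a co-login signal like this might look like, assuming login events are available as (account, IP address) pairs. The function name, the data shape, and the sample accounts are all hypothetical; nothing here reflects how LinkedIn actually implements Friend Suggest.

```python
from collections import defaultdict
from itertools import combinations


def suggest_by_shared_ip(logins):
    """Map each account to the other accounts seen on the same IP address.

    `logins` is an iterable of (account, ip_address) pairs. This shape is
    an assumption made purely for illustration.
    """
    # Group accounts by the IP address they logged in from.
    accounts_by_ip = defaultdict(set)
    for account, ip in logins:
        accounts_by_ip[ip].add(account)

    # Every pair of accounts sharing an IP becomes a mutual suggestion.
    suggestions = defaultdict(set)
    for accounts in accounts_by_ip.values():
        for a, b in combinations(sorted(accounts), 2):
            suggestions[a].add(b)
            suggestions[b].add(a)
    return dict(suggestions)


logins = [
    ("me", "203.0.113.7"),        # office network
    ("coworker", "203.0.113.7"),
    ("me", "198.51.100.23"),      # home network
    ("wife", "198.51.100.23"),
]
result = suggest_by_shared_ip(logins)
```

In this toy data, "me" picks up both "coworker" and "wife" as suggestions, which mirrors the work-then-home behaviour described above. A real system would presumably weigh this against other signals rather than use it alone.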

~~~
makmanalp
Great point! Rather, it is to gather and analyse data as accurately as
possible, and then apply it as inaccurately as required :)

~~~
001sky
Definitely creepy, when travelling. I've seen it, too.

------
gaius
Heh, I wonder if back in the 60s, HBR said Business Analyst or Statistician
were the sexy jobs of the 20th century.

Because that's all a "data scientist" is... but without the experience to
realize there's already a job title for what they do.

~~~
3pt14159
gaius, I've been appreciating your comments for a long, long time now. But you
are wrong here.

There is a difference. It isn't a difference in fundamentals so much as it is
a difference in focus.

Business Analysts give reports to CEOs about customer segments or the
projected amounts of signups. They aren't even close to DS or statisticians.

Statisticians tell you about how a drug reacted with a control group or how
likely it is that a population feels a certain way given the results of a
survey or trial.

Data scientists harness data. They impact every user on a site. "Watch this
video" "Follow this user" (recommendations) or "Silently ignore this user's
impact on the algorithms that manage where this piece of content should go"
(graph analysis) or "What exactly is in this photo" (object recognition) or
"What combination of widgets leads to the maximal amount of engagement"
(optimization) or "I have this paper that I really like, show me more that are
just like it" (recommendations, document classifications, NLP).

It _is_ different. The focus is on users and what they _will_ do or _should_
do or _should_ see. To call them statisticians leads to much less
understanding of the value that DS bring. Put me in a room with an actuary
from an insurance company. Neither of us could possibly do each other's jobs.
Neither of us has the other's skill set.

Now, both of us could learn and get up to speed on how the other works, but a
sys admin and a web developer could swap roles more easily than an actuary and
a DS. Yet nobody is complaining that we call devs and sys admins different
titles.

~~~
timr
That's not a counterargument to his point. You're parsing job titles down to
the atom, and concluding that _"data scientist"_ is different than
_"scientist"_ is different than _"statistician"_, is different than
_"analyst"_. Gaius is saying that this _job responsibility_ has been around
for a long time, but that people are reaching to find reasons to give it a new
name -- exactly what you're doing.

If you ask me, the phrase "data scientist" is recruiter-speak. I have all of
the skills required of a "data scientist". I've done the job of a "data
scientist". And other than object recognition, I've developed all of the
different product features you mention in your comment. You know how I got the
skills necessary to _do_ those things? I was trained as a _scientist_, and
there's no such thing as a scientist without data. A person properly trained
to analyze data should be able to effectively and fluidly transfer those
skills between domains -- otherwise, they're not actually good at it. There's
nothing special about internet products that precludes competent people from
doing effective data mining on their logs.

I suspect that the real problem here is that "data science" is Internet
Hipster for: _"someone who has already worked at an internet company, and
knows some statistics"._ Because when it comes right down to it, your average
statistician, chemist or physicist is more skilled at data analysis than 99.9%
of the "data scientist" types you meet, but they don't easily press the
comfort button for hiring managers at consumer internet companies. Why hire
the "risky" ex-scientist, when you can hire the guy who claims to be a
designer, a software engineer _and_ a statistician?

~~~
gruseom
Now that you mention it, "data scientist" is almost a tautology.

~~~
gammarator
Tell that to the string theorists.

------
carlsednaoui
For those that prefer to read the article in one single page:
<http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/ar/pr>

~~~
javert
IMHO HN stories should always be posted in this format.

~~~
tzs
In general, that is objectively bad, although for this particular site it is
not as bad as it could be.

Here are the three general problems with submitting print views:

1. For most sites, the print view results in a small font and lines that
extend all the way across the page. This makes them hard to read. Sometimes,
on a desktop, with a bit of fiddling they can actually be made legible to
those of us who are older than 40. On mobile, they are often simply not
possible for many of us to read.

This particular site is OK in this regard, as they appear to have actually set
the line width and the font size so that it comes out reasonable on the
screen. In fact, their print view is quite pleasant to read.

2. The print view often omits comments, sidebar links to related stories,
links for sharing, and so on. Some people actually might want to use those.

3. There is often no evident link from the print view back to the normal
view. Sometimes you can figure it out by playing with the URL, but sometimes
the relationship between the print URL and the normal URL is hard to figure
out if all you have is the print URL to work with. Note that the normal page,
on the other hand, does generally have a link to the print page, so those who
prefer the print page can easily go to it.

For these reasons, in almost all cases the submission should be to the normal
page, not the print page. Ideally, the submitter can add a comment that gives
the print URL to save time for those who do prefer it.

Note that some sites have an "all on one page" option, that puts the whole
thing on one page, but leaves comments, social links, and such. That's the
best to use if available.

------
tryitnow
There's a recommendation I give to people who are writing their online
personals ad: If you're sexy, there's no reason to say that you're sexy.

I think the same applies here.

"Data scientist" is a fancy way of saying "a statistician who can code (should
be required in stats programs now anyhow) and who can communicate effectively."

~~~
gaius
I'd be surprised if it was even possible to graduate in stats these days and
not know R at least, and probably NumPy too.

~~~
majormajor
This was ISyE, not stats, and it was 5 years ago, but I was _amazed_ by how
much extra work some people would do to avoid having to learn anything but
Excel (meanwhile, I was messing around with R and whipping programs up to get
better results in less time). This was at a highly ranked engineering program,
too.

Based on a few people I've kept in touch with, it seems like it hasn't changed
all that much at the undergrad level. The grad level was where the problem
sizes and difficulty really forced you to use better tools.

~~~
crntaylor
In case anyone else is wondering, ISyE == Industrial and Systems Engineering.

------
jwoah12
I love the fact that this was posted a half hour after this:
<http://d.gould.in/blog/2012/09/18/your-job-is-not-sexy/>

~~~
donretag
It was actually posted much sooner than that, but gained no traction:
<http://news.ycombinator.com/item?id=4542383>

------
bearmf
I remember reading at least 10 articles with nearly the same content during
the year. Why are authors so eager to convince everyone of big data's
sexiness? Results should speak for themselves. So far LinkedIn's Friend
Suggest is one of the biggest success stories.

~~~
rm999
As a "data scientist" I found this article had much more meaningful content
than most articles I've read in the past year. It's not just repeating how
data science will be big in the next decade, it discusses who data scientists
are and how to hire them.

>So far LinkedIn's Friend Suggest is one of the biggest success stories.

I don't agree with this. Google is basically a big data sciences company.
'Data science' may be a new term, but it describes something companies have
been doing for decades.

~~~
nostrademons
Yeah, big data crunching pervades basically everything Google does. I joined
Google as a UI SWE (basically a webdev), and find that most of my daily work
nowadays involves processing large amounts of data to come up with new
features. I suppose I made a conscious effort to move back in the stack to
more algorithmic back-end work, but even if you stick with UI work, the launch
process is so data-driven that you almost need to have a basic familiarity
with statistics & data processing.

------
elchief
I teach data mining at a top grad school, and am a data scientist at a
startup.

I got one call from a recruiter who thought I was in a different city. Ain't
so sexy from where I'm sitting.

~~~
binarysolo
You probably just need better buzzwords (and ideally the background to back it
up) -- NoSQL, big data, MongoDB, Hadoop, etc.

I consulted for a client that used those technologies, updated my LinkedIn
profile afterwards, and the amount of incoming requests from recruiters and
principals has been nothing short of phenomenal. (Anecdotally, 20 InMails in
10 days, of which 14 converted into a phone interview with the
principal.)

~~~
baltcode
> You probably just need better buzzwords (and ideally the background to back
> it up) -- NoSQL, big data, MongoDB, Hadoop, etc.

Are there as many data scientists who don't work on Big Data?

~~~
binarysolo
To be pretty honest, prior to my life as a data scientist (and grad school) I
was a business analyst. We mined data and threw 10M-100M entries into a MySQL
database w/ Rails dashboard and for our non-RT analysis purposes it was
tolerable.

There are plenty of data problems out there already warehoused by small-cap
and mid-cap firms; I honestly don't see a need to go Web-Scale and all that
jazz for its own sake if your use case doesn't need it. There are also shortcuts
like sampling to kick the can down the road, but that's another discussion in
and of itself.
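
One way the sampling shortcut could look in practice is reservoir sampling, which draws a fixed-size uniform sample from a stream of rows without loading everything into memory. This is only a sketch under assumptions: the function name, the `seed` parameter, and the stand-in data are hypothetical, and it is not necessarily the kind of sampling meant above.

```python
import random


def reservoir_sample(rows, k, seed=None):
    """Return up to k rows chosen uniformly at random from an iterable.

    Classic reservoir sampling (Algorithm R): one pass over the data,
    holding only k rows in memory at any time.
    """
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Keep the new row with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row
    return sample


# Stand-in for a large table: 100,000 integer "rows".
rows = range(100_000)
sample = reservoir_sample(rows, k=1_000, seed=42)
estimated_mean = sum(sample) / len(sample)
```

The estimate from 1,000 rows will be close to, but not exactly, the true mean of the full data; whether that error is acceptable depends on the analysis.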

~~~
_delirium
I think the keyword "big data" ends up being used in even a lot of smaller
cases, because _everyone_ thinks what they have is "big data", I'm guessing
because they do all genuinely have much more data than they might have a
decade ago. But that still varies widely in size; what some companies think is
"big data" is still perfectly analyzable, for non-realtime purposes, on one
beefy workstation. Yet, because they'd never seen data with _tens of millions
of rows!_ before, and it breaks whatever system they were previously using to
analyze stuff (SPSS, etc.), what they want to hire is a "big data" person.

~~~
baltcode
I agree. I wonder if it is possible to get hired as a data scientist as easily
if you haven't worked on big data before.

Or, I guess programmers and engineers could start using the big data tools
even though they are not needed. Has anyone run Hadoop on a single
(multi-core) machine for this purpose?

~~~
keefe
I doubt it, simply because it's so easy to find big datasets to work on. It
doesn't have to be professional, that's the nice thing about a data driven
profession.

Check out <http://www.kdnuggets.com/> for links to large data sets to work on
and there are also some on amazon.

Also, yes, you can certainly run Hadoop on a single instance, but once you get
into "real big" sizes you'll need a cluster to demonstrate expertise, be it on
your local machines at your house or on a set of VPS or EC2 or whatever.

------
hcarvalhoalves
Let's keep reinventing job titles to pretend they are new and sexy.

* Business Analyst: Data Scientist

* Systems Analyst: Growth Hacker

* Public Relations: Social Media Evangelist

What else?

------
suyash
ASK HN? : I'm a little confused, can someone please shed some light on this so
we can all get a clearer picture. What is the difference between Data
Scientist vs Big Data Expert vs Analytics Engineer (Statistics, metrics etc)
vs Hadoop Architect vs Machine Learning Expert ? Thanks a lot HN people!

~~~
rm999
Every data scientist has to:

* be very good at working with large datasets with computational tools (Hadoop is an example)

* be a decent programmer, scripter, and hacker

* have a decent background in statistics

A good data scientist:

* has a good intuition and business sense

* can explain insights to non-technical people (usually through visualization and plotting)

* knows machine learning and predictive analytics

It's a vague term, but purposefully so. There's tons of stuff you can do with
data; a data scientist knows what to do and how to do it.

~~~
suyash
Thanks rm999 :)

------
mmcdan
The Insight Data Science fellows program looks awesome, but it is
disappointing that only PhD candidates and post-docs can apply. There is some
irony in the fact that the cover of their brochure uses the Facebook
friendship visualization done by Paul Butler, who was an undergraduate intern
at Facebook when he made it.

------
jboggan
I've found that a lot of companies are looking for data scientists but many of
them have very different ideas of what that means. This makes for some
interesting interviews.

I recently moved to SF and am currently interviewing for data science
positions - particularly ones involving social networks and applied graph
theory - so drop me a line if you know anyone who is dealing with that problem
space.

~~~
binarysolo
Just checked out your LI profile (fellow data science guy here) -- I think you
basically need a bit more work experience or some github code to show yourself
off. The big data guys like Google who have best practices, brand, and provide
great onboarding should be your focus IMHO.

~~~
tejaswiy
Quick question: What do you classify as work ex? I do mostly iOS programming,
but I've been playing with Hadoop + the commoncrawl.org crawl data. Basically,
I guess, what level of stats do you need to be comfortable with to call
yourself a data scientist?

~~~
ahuibers
Following Gladwell's 10000 hour rule, I would say you could probably call
yourself a data scientist after 1000+ hours experience working with datasets
successfully. As far as the math goes you should be able to do regression
analysis; you don't need to know tons of stats but you do need to know stats
and probability essentials (first few classes at a good school) deeply. I like
this Wikipedia entry on "mathematical maturity":
<http://en.wikipedia.org/wiki/Mathematical_maturity>; apart from writing
proofs, it is very relevant.

------
bitwize
Sorry, I don't think data science is going to topple the quadfecta of
sexiness: porn star, rock star, sports star, and movie star.

------
philip1209
Question for HN: I'm graduating this Spring with majors in Systems Engineering
and Physics, and I want to work as a data scientist, preferably at a startup.
What can I do to position myself for such a job? If any of you work in the
field and are willing to provide some 1-on-1 advice, please shoot me an email
- mail@philipithomas.com

~~~
3pt14159
1. Know programming.

2. Be smart with an eye for economics (there is way more overlap than people
give it credit for).

3. Start by talking to people and telling them what you want to do. Most
founders want to help people reach their dream.

If you have a github account, email me and maybe you can start with us over at
500px here in Toronto.

~~~
jvm
What's the Toronto scene like? I'm finishing up a PhD at NYU this year and was
planning on breaking in to the field after graduating. NYC is obviously a
great place to be but for relationship reasons I was thinking of moving to
Toronto (which is honestly a nicer city anyway much as I love NY), but a
cursory inspection suggests a lot less demand for data-loving jobs. I would
love to be mistaken though; am I?

------
nachteilig
I know I should be excited for the positive innovations data science will
bring us, but am I alone in mostly still finding it creepy?

