

The Forgotten Job of a Data Scientist: Editing - rouli
http://www.john-foreman.com/1/post/2014/05/the-forgotten-job-of-a-data-scientist-editing.html

======
novum
Data Scientist (noun): A statistician who lives in San Francisco.

(only half joking)

~~~
michaelochurch
I feel like "data scientist" is a title that grew out of the Fundamental
Theorem of Employment, which states that you're usually hired to do a job that
either (1) the boss man can't do for himself, or (2) the boss _doesn't want_
to do. Type 1 work gets you respect and autonomy. Type 2 work will have you
commoditized.

Software companies are satisfied with the job they've done at commoditizing
programming talent but, at least for now, having a half-decent grasp of any
specialty (e.g. machine learning, information retrieval) requiring
mathematical firepower puts one solidly into Type-1 employment, which is where
one wants to be.

"Data scientist" seems to be a way of saying, "yes, I code but I also know
math, so use me for Type-1 work only".

~~~
eshvk
> "Data scientist" seems to be a way of saying, "yes, I code but I also know
> math, so use me for Type-1 work only".

You make it sound like a bad thing? Despite the rah-rah I hear from
programmers about how they are unique snowflakes, being only a programmer is
like being a janitor. A prime way to get discarded at the age of 40. If I can
make sure that I am valuable because I bring other things to the table (Math,
Product vision, people skills), why on earth wouldn't I rebrand myself to
better reflect that?

~~~
michaelochurch
_You make it sound like a bad thing?_

Not at all. That's my attitude as well. I don't want to waste my life on
Type-2 work.

"I only want to do interesting work" _sounds_ entitled after being conditioned
by corporate mediocrity, but I think it's a reasonable attitude. Companies
frown on self-assertion, preferring agreeable mediocrity, and I hate that. I
tend to be honest about things.

You can't say, "I leave bosses and companies that assign me crappy work" on a
job interview. I wish people _could_ be honest about such things, but it's
just not socially acceptable to speak the truth about anything that matters
(e.g. politics, religion, sex, money, power, careers). On HN, I try to be as
honest as I can be. Sorry if it comes off as obnoxious.

 _Despite the rah-rah I hear from programmers about how they are unique
snowflakes, being only a programmer is like being a janitor. A prime way to
get discarded at the age of 40._

Agree.

 _If I can make sure that I am valuable because I bring other things to the
table (Math, Product vision, people skills), why on earth wouldn't I rebrand
myself to better reflect that?_

That's absolutely what you should be doing. If it's not obvious, I'm on the
same side with people who say "I know math, so use me for Type-1 work only". I
am one of them.

The reason the job distinction is toxic in many companies, however, is that
software engineering should _also_ be respected rather than commoditized. To
me, the rush of people like you and me to get "superior" titles on our resumes
is a sign that the business world doesn't respect "regular old" software
engineering. That sucks, because the skills of a truly good software engineer
are also quite important.

~~~
eshvk
> software engineering should also be respected rather than commoditized.

I am not so sure anymore. When anyone who has done six weeks in a boot camp
can call themselves a software engineer, the semantics of the word are lost.

> That sucks, because the skills of a truly good software engineer are also
> quite important.

The best people? They are valued and known to be valuable. For example, there
is a guy in my company who works remotely from the Midwest. He is truly amazing.
When he interviewed me, I quickly got the feeling that his machine learning
skills were top notch. BUT, when I work with him, I realize that he is truly
phenomenal. He can write code up and down the abstraction ladder. Good, solid
fucking code. Hell, he can double up as an SRE and fix shit when he wants to.
Sure, he is a "machine learning engineer". But he is much much more than just
that title.

------
michaelochurch
"Data scientist" is a mess of a job title. It seems to be as much of a
reaction against the commoditization of software engineering (which leaves the
smartest, and by correlation, usually the most mathematically literate, 10% of
programmers ill-suited for the average software job) as it is a real
distinction.

There are plenty of "data scientists" who use canned tools and play around
with parameters because that's all "the business" thinks it needs.

You want to trim complexity for a reason that any data scientist worth his
salt (and there are plenty of celebrity engineers in SF making $500k who
aren't worth their salt and don't know this) should already know: bias-
variance tradeoff (see also: underfitting and overfitting). If your model is
too flexible/complex, it will begin absorbing noise. That leads to a model
that performs extremely well on training data but fails miserably on unseen
data. There are well-studied techniques for preventing this, but I'd guess
that fewer than 20% of self-described or titled "data scientists" are familiar
with them.
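
To make the overfitting point concrete, here is a minimal sketch, not anything from the linked article. It assumes a made-up target (a noisy sine curve) and uses k-nearest-neighbour averaging as a stand-in for "model flexibility": k=1 is the most flexible setting and simply memorizes the training points, noise included. The names `make_data`, `knn_predict` and `mse`, the sample sizes, and the noise level are all illustrative choices. Scoring on a held-out sample is one of the standard ways to notice the problem.

```python
import math
import random


def make_data(n, rng, noise=0.3):
    """Sample n noisy points from the assumed target y = sin(2*pi*x) on [0, 1]."""
    xs = [rng.random() for _ in range(n)]
    ys = [math.sin(2 * math.pi * x) + rng.gauss(0, noise) for x in xs]
    return xs, ys


def knn_predict(train_x, train_y, x, k):
    """Predict at x by averaging the targets of the k nearest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def mse(train_x, train_y, xs, ys, k):
    """Mean squared error of the k-NN predictor on the points (xs, ys)."""
    errs = [(knn_predict(train_x, train_y, x, k) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errs) / len(errs)


rng = random.Random(0)                   # fixed seed so runs are repeatable
train_x, train_y = make_data(40, rng)    # data the model is fitted on
test_x, test_y = make_data(200, rng)     # unseen data from the same source

results = {}
for k in (1, 5, 15):                     # smaller k = more flexible model
    results[k] = (
        mse(train_x, train_y, train_x, train_y, k),  # training error
        mse(train_x, train_y, test_x, test_y, k),    # held-out error
    )
    print(f"k={k:2d}  train MSE={results[k][0]:.3f}  test MSE={results[k][1]:.3f}")
```

With k=1 the training error is exactly zero, because every training point is its own nearest neighbour, yet the error on the unseen sample is not. That gap between the two columns is the symptom described above; picking k by the held-out column rather than the training column is the simplest guard against it.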

~~~
eshvk
> There are plenty of "data scientists" who use canned tools and play around
> with parameters because that's all "the business" thinks it needs.

As with a software engineer, it is a role that is different in every place.
Every place has its own definition of the role. This is not bad. It is a mere
reflection of market conditions in which a lot of people are simultaneously
bad at Linear Algebra, Probability and Statistics and dangerous enough to
write production code fast (your standard C.S. grad SWE).

