
"the sexy job in the next 10 years will be statisticians" - rbxbx
http://www.nytimes.com/2009/08/06/technology/06stats.html?_r=1
======
kazuya
Rather, domain experts armed with statistics.

------
falsestprophet
There is a difference between in demand and _sexy_.

~~~
gaius
Money is always sexy.

~~~
dtby
This mercenary, puerile response is why I've given up on HN as a source of
interesting tech news.

Or, rather, why I'll give it up as soon as Gabe's site (or something similar)
takes off. Even if I have to write it myself.

~~~
gaius
I think you misunderstand the point I am making. Which is that if it is in
demand, then all else will follow. As it hasn't, then it's probably not.

~~~
dtby
I admit that your point was very well hidden.

It probably helps that I am rich while (as one might expect) most HN users are
not.

------
lunchbox
Does this jibe with what HNers are seeing in their fields? What other
quantitative skills do you perceive to have booming demand?

~~~
patio11
Analytics, web and otherwise.

~~~
illumin8
Right on. I work in healthcare IT and our most revered department is clinical
analytics. These people are not technically savvy, but tools like Cognos can
turn anyone with some basic math skills into a data analyst. You can literally
drag and drop fields from any SQL database and make pivot tables out of them
and generate reports. Of course there is a huge difference between someone
that can drag and drop fields and someone that knows enough about the
underlying data to actually generate good analysis.

~~~
stevenbedrick
+1, especially to your last sentence. That right there is my biggest medical
informatics pet peeve- a lot of hospitals and clinics have "analysts" who are
good at using the tools, and maybe even have a good grasp of the data schema
of their repositories... but have very limited clinical knowledge, and even
more limited knowledge of the workflows that generated their data in the first
place.

So, what happens is that they generate some numbers without fully
understanding the "story" behind them. For example, they might get a request
for data concerning the frequency with which patients with condition "X" are
treated at their hospital. In most EHR systems, the way to answer questions
like this is by using ICD codes... however, there is rarely a 1:1 relationship
between what we might think of as a "diagnosis" and a code. Depending on how
it's defined, even something seemingly simple such as "Asthma" might be
represented in an EHR by (for example) a dozen different codes, and which
codes are used can depend heavily on a wide variety of factors: how the EHR's
designers implemented the diagnosis system and user interface, how the
clinicians were trained to use the system, specifics of how the patient's
symptoms presented themselves, how the billing department coded the
clinicians' diagnoses, the phase of the moon, etc. etc. etc. As a result,
instead of a simple query ("find all patients with ICD code A"), the query
ends up looking like "find all patients with codes A, B, C, D, E .... or J; or
code K, but only if it co-occurs with L, M, or N; or code O, if the patient
was seen in clinic number 4 or 5 between such-and-such dates; etc. etc. etc."
And that's for a simple and straightforward clinical question. Imagine if it
was something more complex, like "how many patients with condition X also
develop condition Y after having treatment Z".
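
A minimal sketch of what a cohort rule like that might look like in code, assuming the placeholder codes A through O from the example above. The Encounter record, the date window, and the function names are all hypothetical, not any real EHR's schema:

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Encounter:
    """Hypothetical record of one visit; real EHR schemas differ widely."""
    patient_id: str
    codes: frozenset   # placeholder diagnosis codes recorded for the visit
    clinic: int        # clinic number where the patient was seen
    seen_on: date


# Placeholder code sets mirroring the prose ("A, B, C ... or J", etc.).
DIRECT_CODES = frozenset("ABCDEFGHIJ")
K_COMPANIONS = frozenset("LMN")
O_CLINICS = frozenset({4, 5})
# Assumed date window standing in for "such-and-such dates".
O_START, O_END = date(2008, 1, 1), date(2008, 12, 31)


def matches_condition(enc: Encounter) -> bool:
    """Return True if a single encounter satisfies the cohort rule."""
    # Rule 1: any of the codes A..J counts on its own.
    if enc.codes & DIRECT_CODES:
        return True
    # Rule 2: K counts only when it co-occurs with L, M, or N.
    if "K" in enc.codes and enc.codes & K_COMPANIONS:
        return True
    # Rule 3: O counts only in clinic 4 or 5 within the date window.
    if ("O" in enc.codes and enc.clinic in O_CLINICS
            and O_START <= enc.seen_on <= O_END):
        return True
    return False


def cohort(encounters) -> set:
    """Collect the distinct patients with at least one matching encounter."""
    return {e.patient_id for e in encounters if matches_condition(e)}
```

Every constant in that sketch is a decision somebody has to make with the clinicians, which is where the real work is.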

Coming up with a query like that takes significant clinical knowledge, but,
more importantly, it requires intimate knowledge of the organization that
created the data in the first place. It also requires some pretty serious
"people skills"- the clinicians that the analyst will be working with to
formulate the question will know virtually nothing about computers or
databases, and so it will fall on the analyst to work with the clinicians to
elucidate the implications and edge cases of the original question. It's kind
of like being a detective. This, by the way, is a big part of why it's so hard
to get good--- as in, reliable, valid, and comparable--- quality measures from
large health care organizations. The data's often way more complex and
ambiguous than novices realize, and (speaking from personal experience, here)
it often takes people who come from non-clinical backgrounds and are used to
more straightforward analytical questions quite a while to realize just how
far down the rabbit hole they've gone. What might seem like

Of course the fun doesn't stop once our analyst has finally generated some
numbers. Whoever wanted the numbers in the first place usually doesn't think
much about where they came from (cf. "automation bias"), and as such can go on
to make ill-informed decisions as a result of some subtle mistake in the data
(e.g., unbeknownst to anybody, the analyst's query missed a whole block of
patients coming from a particular clinic, thereby underestimating the
prevalence of asthma). This is doubly true when the consumers of the
data are generic statisticians (as in, not specialist biostatisticians who are
experienced in clinical data analysis). The first commandment of statistics is
"Know thy data", and medical data is one of those areas where that's a trickier
problem than usual.
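
To make the size of that kind of mistake concrete, here is a small sketch with entirely made-up numbers (the clinic labels, case counts, and patient total are hypothetical). It compares the prevalence computed from all clinics against the one you get when a query silently drops one of them:

```python
def prevalence(cases_by_clinic, total_patients, included=None):
    """Cases found divided by the whole patient population.

    `included` lists the clinics the query actually covered; None means all.
    """
    clinics = cases_by_clinic.keys() if included is None else included
    found = sum(cases_by_clinic[c] for c in clinics)
    return found / total_patients


# Hypothetical asthma case counts per clinic and total patient population.
cases = {"clinic_1": 120, "clinic_2": 80, "clinic_3": 100}
population = 3000

full = prevalence(cases, population)
# The query missed clinic_3, but the denominator is still everybody.
partial = prevalence(cases, population, included=["clinic_1", "clinic_2"])

print(f"all clinics:      {full:.3f}")
print(f"missing clinic_3: {partial:.3f}")
```

Nothing in the second number tells the person reading the report that a third of the cases are missing.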

~~~
illumin8
You raise some excellent points. Our software is mainly showing providers and
payers how to minimize waste, fraud, and abuse of the system so it requires a
huge amount of customization because of each organization's use of coding. The
clinical analytics comes in where they can analyze historical data and tell
them "you could have saved X amount of money by coding this procedure
differently" or "there is no medical reason to do procedure X if you've
already done procedures Y and Z, thereby saving XX money." To analyze this
data not only requires a statistician's grasp of math, but it requires medical
knowledge and organizational knowledge as well.

If you were smart enough to be a data/stats geek and also had an MD, plus
years of experience working as a doctor in a hospital, I'm sure you would be
worth your weight in gold, as this skill set is very rare.

~~~
stevenbedrick
It sounds like you guys make really useful software!

As you say, MDs who have the skills and are inclined to do this sort of stuff
are few and far between. My grad program in medical informatics has a master's
track whose graduates are mostly MDs, and they would be quite well qualified
for this sort of thing... except that most of them go on to either be CIOs or
implementation consultants, and typically make far more than analysts do.

I think it's something that they're going to have to start teaching in medical
schools, however. As more and more places start taking quality improvement
seriously, being able to think systematically about clinical data is going to
become a very important skill for doctors to possess. Of course, our
experience thus far with trying to get it into the curriculum has not been
very encouraging. It's amazing- doctors love trying out new gadgets or drugs,
so they're clearly not inherently afraid of technology or of change... but try
and get them to modify their curricula, and they look at you like you're
crazy.

------
geebee
I like this line: "The rising stature of statisticians, who can earn $125,000
at top companies in their first year after getting a doctorate, is a byproduct
of the recent explosion of digital data."

So... someone who majored in math/physics/hard sciences, got the grades and
test scores to gain admission to a top university, and goes through a program
with a high attrition rate that you're doing well to complete in 6 years can
earn a bit less than a JD (half the time) or MBA (1/3 the time).

Like Right Said Fred Said: I'm too sexy for this field, too sexy for this
field, holds no ap-peal...

(ok, I'm a data geek, and this sort of thing actually sounds like far more fun
than corporate law, and $125k starting is decent... but let's still
recognize that the reward to effort ratio is not quite comparable with
the professions, and the journey is quite a bit harder).

------
dilipd
Here is a question from a layman's (Stat 101) perspective: Why don't we see
many Statistics PhDs, armed only with a computer, access to free public
data and some programming skills, CONSISTENTLY make a killing on Wall St?

Is it because Statistics fails once the number of variables & complexities
increases to reflect real life?

~~~
3pt14159
They are making a killing, they just don't talk about it. I know several
people who are doing this. Currency trading, predicting dog & horse races,
options trading - all one man shows with a tiny little grasp that nobody else
has. Each making a couple million a year trying to figure out the next little
trick (all the straightforward tricks are owned by Goldman's super fast
computers). More than once I have seen 1 second of lag cost them big, but
overall they are maxing out the opportunities their little (< 20k loc, usually
in vb of all things, sometimes in python or ruby) stats programs have found.

Furthermore, I work as a business intelligence quant for the online tech
space. I've DRASTICALLY increased ROI rates for online ads, as well as
conversion metrics, with my formulas and clustering models. There is so much
low hanging fruit out there it is crazy.

If anyone wants to get into this field I'd be more than happy to point you in
the right direction.

~~~
impeachgod
I would too. I think it's best you make a public post about this, I think a
lot of people will be interested in this.

------
martyhu
I agree with this article that over the next 10 years, the internet may have a
lot more data. But how much of that data will be privately owned?

Platform apps dominate an increasingly large fraction of the internet. It's
hard to see them making their data public anytime soon.

~~~
c1sc0
"But how much of that data will be privately owned?" ... and how much of that
data will leak out anyway? Or maybe people will stop caring about privacy
altogether? Even data acquired in dubious ways will require statisticians for
analysis, maybe even more so.

------
nazgulnarsil
it's not the raw skills. those have been in demand for a long time. the next
big thing is going to be the marriage of analytics with intuitive interfaces.
the standard model currently is pretty much power point presentations. PP
sucks and yet thousands of businesses use it as the main way of communicating
important statistics.

------
bbuffone
Completely agree - Currently my company is doing a lot in analytics and data
analysis. Having a background in statistics is something highly desired but I
find many of the computer engineers lack a lot of the math skills to take on the
analytics problems alone.

At least the ones that have applied.

~~~
Elite
What types of problems does your company need to solve?

~~~
bbuffone
We need most of the things defined in this web book -
<http://www.statsoft.com/textbook>.

------
davemabe
I agree that statisticians will be more and more in demand as time goes on.

I need one right now for a project - if interested email me at my username at
gmail.

------
callahad
Can anyone suggest good resources for gaining a cursory understanding of
statistics?

------
houseabsolute
I think I read an article to this effect about five years ago. Still not true
. . .

~~~
obiterdictum
I don't think it should be taken literally, because statisticians come in
different guises. Nobody ever says that being a statistician is hot, yet there
is a lot of interest in quantitative analysis and HFT (even here on HN) which
is all about statistics.

------
markstansbury
Foucault would be proud.

------
HilbertSpace
Looks like the article might provide some 'luster' to IBM; they are going to
put 4000 people on this. Hmm ....

What the article predicts would, could, and should happen but won't. Here's
the problem:

Let's start with the 'status' of statistics:

Academic Teaching: In academics, the courses available rarely go beyond just
some Stat 101, experimental design, or applied regression analysis. The
teachers rarely have much expertise in statistics, e.g., rarely understand the
strong law of large numbers, the Radon-Nikodym theorem and its connection with
sufficient statistics, or the Lindeberg-Feller version of the central limit
theorem. Net, the teaching sucks.

Academic Research: The quantity of good academic research in statistics is
meager. The applied statistics research such as in the article would not be
regarded as solid research. The grant support is far behind that for physics
(theory, particle, applied), biomedical, computer science, engineering, or
pure math. Net, the research sucks.

Ph.D. Programs. One can count with shoes on all the good Ph.D. programs in
statistics. So, over the past 40 years might count Berkeley, Stanford,
Chicago, Cornell, Yale, Hopkins, and UNC.

Computer Science. Yup, to do much in statistics, need computing. So, much of
the public and academic computer science swallows the idea that computer
science has expertise in statistics. No it doesn't, not while they can't state
the strong law of large numbers, and nearly no one in computer science can;
for that they just didn't take the right courses in grad school. About all CS
can do is pull equations they don't really understand from cookbook statistics
and try intuitive heuristics, and that is similar to medicine in the days of
snake oil cooked up on wood stoves. Suckage.

Professionalism. Law, medicine, and parts of engineering are 'professions'
with certifications, licensing, liability, and strong professional societies.
Statistics isn't a profession in this sense. Uh, such 'professionalism' is
from important up to crucial 'branding' and credibility for customers outside
the profession. Medicine has it; statistics doesn't. Indeed, in academics, a
suggestion that statistics should be 'professional' is anathema. Students
who want to get their fellowships renewed will keep their mouths SHUT and
never say such things. Suckage.

So, net, the status of the field sucks.

Okay, now we can move on to why the field won't catch on in business:

We have to notice that nearly no one high in business now or on the way to
being high in business knows more than just some elementary applied
statistics, from long ago, that they never understood very well, never really
used, and was likely poorly taught. Also they have not seen much of
significance in business from anything at all serious in statistics. They know
about the importance of computing, the Internet, and maybe some of assembly
line robots, supply chain optimization, comparisons among planes, trains,
trucks, biomedical research, even efforts in applied nuclear fusion, but they
nearly never attribute significant importance to statistics.

So, suppose there is a good statistician, in a business, with some good data
and with some powerful techniques in statistics that can convert that data
into new information valuable for the business. Suppose this statistician
writes an internal memo to his supervisor and proposes that the company fund
the statistician to work on delivering the value to the business.

Here's what happens: The memo goes up the management chain of the statistician
to the first manager who doesn't have much respect for statistics. Given the
status of statistics, don't expect the memo to go up very far.

Then this manager sees two cases:

(1) The project fails. Then the manager will have a black mark on his record
for sponsoring some contemptible, risky, wasteful, 'blue sky, far out, ivory
tower, intellectual self-abuse, academic research project'. Bummer.

(2) The project is successful. Quickly everyone in the management chain who
does not understand statistics will feel threatened. There is a rumor that a
woman in the office complained that once from 100 feet away the statistician
looked at her in a way that made her feel "uncomfortable", and the
statistician is GONE.

So, the manager sees only disaster whether the project is successful or not,
and the project doesn't get funded. If the statistician proposes a second such
project, then he's a 'loose cannon on the deck', out of control,
insubordinate, not a 'team player', and gone.

Or a big organization middle manager can fund big projects in computing,
supply chain optimization, assembly line robots, etc. he doesn't understand,
but, due to the status of the field of statistics he can't fund a project in
statistics.

There is really only one way for statistics to come forward in business now:

The guy with the valuable work in statistics starts his own business and sells
just the results. The customers like the value of the results for their
businesses and don't have to address anything else.

But, for this business the statistician is totally on his own: There isn't an
'information technology' venture partner anywhere in the US who would touch
his project with a 10 foot pole, again, for much the same reason as the
business manager.

The statistician MIGHT get some seed funding if he shows a good user interface
or Series A funding if he shows good ComScore or revenue numbers, but he can be
advised to keep quiet about the role of 'statistics'.

Or, the venture partners believe in Markov processes: The future of the
business given ComScore numbers is conditionally independent of the statistics
in the 'secret sauce'! So, look at the ComScore numbers and f'get about any
'statistics' in the 'secret sauce'. This Markov assumption is not fully
justified, and likely not a single venture partner in the country could give a
solid definition of conditional independence, but this is still the situation.
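
To spell out what that assumption says, here is a sketch in notation introduced only for this purpose: let F be the future of the business, C the ComScore numbers, and S the statistics in the 'secret sauce'. Conditional independence of F and S given C means that, for any events A and B,

```latex
\[
  P(F \in A,\; S \in B \mid C) \;=\; P(F \in A \mid C)\, P(S \in B \mid C),
\]
or equivalently (almost surely)
\[
  P(F \in A \mid C,\, S) \;=\; P(F \in A \mid C).
\]
```

That is, once C is known, S adds nothing to the forecast of F, which is why the venture partner feels free to ignore it.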

And that's the way it is.

So, it's tough to make statistics applied; call this situation a 'problem':
Then, for someone with some new, powerful, difficult to duplicate or equal
work in statistics that can take some of the oceans of data available now and
deliver valuable results and sees their way clear with just a bootstrapped
company to high profit margins and rapid organic growth, the flip side of this
'problem' is an opportunity.

~~~
ced
A lot of "applied statistics" is now performed under the headings of "machine
learning" and "data mining". Both fields are thriving.

Furthermore, Bayesian methods have come back to the fore in the past ~15
years. They are quite likely the future of statistics, especially in academia.

~~~
xtho
You can't solve all problems with machine learning and data mining. How would
you apply those methods, e.g., to psychological experiments or test planning
when every single test costs a considerable amount of money?

~~~
moultano
Situations like that are not why statistics is becoming the new in-demand
skill. Expensive trials have existed for a long time. Petabytes of barely-
structured data haven't.

~~~
xtho
I didn't comment on the article but on the statement above about applied
statistics being dominated by ml and about Bayesian statistics being the
future.

