
The Big Data Brain Drain - barry-cotter
https://jakevdp.github.io/blog/2013/10/26/big-data-brain-drain/
======
rprospero
> Those with the skills mentioned in this article could easily ask for several
> times that compensation in a first-year industry job, and would find
> themselves working on interesting problems in a setting where their
> computational skills are utilized and valued.

I have to admit that this one always baffled me from the reverse perspective.
I always hear these discussions of the great talent shortage and how desperate
people are for software engineers. I hear tales of people taking six week
crash courses on coding and getting well paying industry jobs. But I just
can't believe them.

I don't consider myself a great coder, but I'll forgo humility for a moment to
say that I'm decent. Over the course of my PhD, I needed to:

* Write cluster based numeric code for the university supercomputer

* Reverse engineer an undocumented binary network protocol with a packet sniffer

* Work in six languages in a single day (C, Tcl, Labtalk, Scheme, Labview, Python)

* Write code so low level that I had to debug it with an oscilloscope

* Design and implement a data acquisition system that accepted concurrent input from hardware, network, and user sources

* Port a Monte-Carlo simulation to the GPU

* Write user facing visualization systems

I guess that what I'm trying to say is that I can easily FizzBuzz.
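(For anyone unfamiliar, FizzBuzz is the stock hiring screen: for 1..n, print "Fizz" for multiples of 3, "Buzz" for multiples of 5, "FizzBuzz" for both, and the number otherwise. A minimal Python sketch, just to show what the bar is:)

```python
def fizzbuzz(n):
    """Return the FizzBuzz sequence for 1..n as a list of strings."""
    out = []
    for i in range(1, n + 1):
        if i % 15 == 0:        # divisible by both 3 and 5
            out.append("FizzBuzz")
        elif i % 3 == 0:
            out.append("Fizz")
        elif i % 5 == 0:
            out.append("Buzz")
        else:
            out.append(str(i))
    return out
```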

After my PhD, I looked into going into industry, but I couldn't even get a no-
thank-you. People complain about dealing with recruiters bugging them about
job possibilities, but I couldn't even get one to take my CV. I relocated
across an ocean to an employer who didn't even pay for said relocation, partly because they were the only people who even got back to me.

Don't get me wrong - I love my current position, but I've known plenty of academics who were miserable: individuals more skilled than myself, sometimes making only as much as Zappos pays its warehouse staff.

If the talent shortage is that bad, why aren't more of these people being
poached? Why aren't we seeing a larger brain drain?

~~~
tom_b
All I have is anecdotes, but I'll hazard some guesses.

The talent shortage has a strong geographic component. In SV or NYC, job
search sites (a poor way to gauge actual demand, but whatever) seem to list
1,000s of available Java/C# gigs, with substantial numbers of listings that include JavaScript. But I would probably lump those three languages together as placeholder flags to indicate "standard enterprise developer" jobs. Most of these job listings seem to fall at the entry level, with a steep fall-off in
mid-level offerings, and a few (mythical?) mega-$$ listings. These types of
jobs feel more "credential sensitive" to me - if your resume (not CV) doesn't
say Comp Sci or maybe MIS, you probably get binned immediately.

Leave those regions and job listings seem to drop. If we can trust these listings
to reflect some level of region-comparable demand, we might guess that in
Seattle, WA or Chicago, IL, there is about 1/2 the demand of SV and NYC. Cut
it down even more in places like Austin, TX or RTP, NC, maybe 1/6 or 1/8 the
demand of SV and NYC? I'm sure we could include a bunch of other cities in
those tiers, but maybe those are loosely representative.

I often wonder how much job demand looks inflated on these job boards though.
A quick glance at Java in NYC, NY on indeed.com shows a few big staffing
companies with hundreds of listings. I usually assume there are a bunch of
duplicate listings from staffing companies for a single position.

I think a lot of lower-level gigs out there are filled by kind-of-passable
candidates not making anything like the SV salary numbers that get thrown out
on HN frequently. Among strong engineers I personally know in my region,
several are frequently contacted by recruiters for jobs that are, at best,
sideways moves for them (e.g. doing the same gig at some different place for roughly the same money). A PhD (and maybe even an MS) _might_ scare employers
at this level?

The other part of this, especially regarding the original article, is that
deep, strong statistical/machine learning/magic fairy dust hacking gigs are
much less numerous than the standard enterprise dev jobs and that those gigs
are even more restrictive in recruiting nature.

Meaning, if you are really recruiting for a top-notch "data science" person,
you are probably(?) looking for intensive credentials or portfolios. So MS or
PhD in Applied Math or Statistics with easily demonstrated programming skills
or a set of completed projects cast in the "data science" space. And if you're
not recruiting at that level, the "data science" label means "query some data
sources to generate a tabular report, maybe with some rollups."

~~~
drzaiusapelord
A recruiter here in Chicago tells me he's dying for entry level ASP.NET and
back-end Java coders. I think a lot of these companies just churn these guys through at largely unimpressive wages, run a bunch of candidates through as temp-to-perm, and hope to god one guy out of 20 isn't a complete moron. I've had the displeasure of working with some of these guys. They have some paper certs and some basic understanding of what they're doing, but they have no big-picture view and no love of what they are doing. Don't get me started on their
work from a security or stability perspective. These guys are just in the
wrong field. They heard the siren's song of being a coder, did some coding
bootcamp or took a community college class or two, and are shoved into big
departments churning out junk code.

I think these jobs echo how a lot of basic IT support staff in the late 90s
and early 2000s got work. Companies were scrambling for someone to sit down
with end users and explain how Outlook worked, install software, and maybe
reboot the occasional server. These guys all ran to braindump sites, got a
MCSE or CCNA, and now are lifers at the company they fooled into hiring them.

There's something somewhat sad about all this. These people probably would
have been better off in a different field but are now stuck in unpromotable
positions because they just don't have what it takes to move up. Our own
support guy is in his 40s and is barely competent at providing basic level 1
PC desktop support. He's totally in the wrong field. I'm not saying this stuff
is a calling, but in some ways it is. Like a lot of technical jobs, you kinda
have to make it your religion and invest a bit of your personal time into it
because it's so fast-moving that if you treat it like a 9-5 desk job, you'll
fall behind very quickly.

~~~
Amezarak
> I think a lot of these companies just run these guys through at largely
> unimpressive wages,

This.

When I graduated with an MS in CS 3 years ago, I interviewed at over a dozen
companies. I got offers at all of them. The highest offer I got was for 45k.
The lowest was 32k. Outside of SV and NYC and the big tech hubs, wages are
terrible. For comparison, waiters at the restaurant I worked at (by no means
high end) made up to 40k.

Companies want to pay dirt-cheap wages and get tons of highly qualified
applicants and _shockingly_ they just can't find anyone!

That said, three years in and my salary has almost doubled (after job-hopping
- which did make me sad, as that first job was a lot of fun even if it was a
lot of work), so at least there's that, but that's just about the cap for
private sector work in the region.

------
tom_b
A more recent article addressing this essay from the same author:

[https://jakevdp.github.io/blog/2014/08/22/hacking-academia/](https://jakevdp.github.io/blog/2014/08/22/hacking-academia/)

The tl;dr summary from this second article referencing the first:

    
    
      a quick summary is this: scientific research in many 
      disciplines is becoming more and more dependent on the
      careful analysis of large datasets. This analysis requires
      a skill-set as broad as it is deep: scientists must be 
      experts not only in their own domain, but in statistics, 
      computing, algorithm building, and software design as 
      well. Many researchers are working hard to attain these 
      skills; the problem is that academia's reward structure is 
      not well-poised to reward the value of this type of work. 
      In short, time spent developing high-quality reusable 
      software tools translates to less time writing and 
      publishing, which under the current system translates to 
      little hope for academic career advancement.
    

I think the HN title is somewhat misleading; the central thesis from the
original article is stated as:

    
    
      the skills required to be a successful scientific
      researcher are increasingly indistinguishable from the 
      skills required to be successful in industry
    

This is probably a stunningly awesome outcome - it means that industry has a
place for advanced degree holders who will not find a classic academic
position. In the linked essay above, from the same author:

    
    
      the number of PhDs granted each year far exceeds the 
      number of academic positions available, so it is simply 
      impossible for every graduate to remain in academia.
    

I wish people would not use 'big data' as a label in these discussions. I
think the essential truth is that being able to apply a scientific,
quantitative thought process to problems combined with the ability to write
software to provide others with solutions to those problems is valuable across
academia and industry. That doesn't really have much to do with the 'big data'
meme flaming across the skies these days.

------
ISL
Academia can easily retain us. Pay us more and support our work.

I love my field very much; only three things have me considering leaving academia:

* Poor salary outlook: PhD + 10 years of domain expertise? $50k.

* Poor job liquidity: zero geographic choice.

* Professors rarely appear to be happy: terrible work-life balance due to administrative overhead and the need to secure future funds.

All of these concerns can be resolved with money, through both higher salaries
and better funding. If they were resolved, I wouldn't be looking outside at
all.

If you have great management and need someone who's good with precision
hardware, sensing, and data analysis in the Seattle area, please get in touch.

------
gajomi
An interesting read (along with the sister article pointed out by tom_b,
linked in the original). It is relevant for me as within about a year I will
leave my current position as a postdoctoral scientist. I consider myself more
heavily weighted towards the "domain specialist" and "probability/statistics"
foundation, but would still be considered a data scientist by many. The
article is concerned with making academia more attractive so as to stem the loss of data scientists to industry, but I am also wondering whether there might be some fit within industry that has perks like those in academia. So I will try to ask for some advice, as someone with an open mind towards both academic and industrial positions, but with no experience in the latter...

In applying for positions in industry, what sort of strategies are there to maximize time allocated to discretionary research, independent projects, etc.? Let's say I am willing to work a maximum of 60 hours a week, and that I have
some idea in mind for a minimum salary (I dare not say the actual number). How
can I negotiate a contract where 10-15 of those hours go towards personal
projects that might be only loosely aligned with the firm's objective? Is this
even possible for someone just starting out? For example, could I take a pay
deduction? Or should I just think about reducing the total number of hours I
work for the company?

Does anyone have experience applying for research grants within industry (of
small to medium size)?

------
_yosefk
The "unreasonable effectiveness of data" in finding publishable results and
creating commercially viable products ought to be blinding us (or rather
"them" \- AI researchers of whom I'm not one) to better ways to learn. A child
learns to speak and recognize objects from much less data than contemporary
machine learning/data mining/AI/whatever you call it needs.

Of course it's much better to keep your program small and your data sets huge
than the other way around - we'd all do it if we could. But it ought to be a
lot like keeping a (2^64 - N)-entry look-up table to implement 32-bit integer multiplication; it's ultimately larger and costlier than figuring out how
multiplication works, and you give wrong results for the N entries missing in
your table.
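(The analogy can be made concrete with a toy Python version, scaled down to 4-bit operands since a real 2^64-entry table wouldn't fit anywhere: memorizing every product versus knowing the rule.)

```python
BITS = 4  # toy size; at 32 bits the full table would need 2**64 entries

# "Memorized" multiplication: precompute every possible product.
table = {}
for a in range(2**BITS):
    for b in range(2**BITS):
        table[(a, b)] = a * b

def mul_lookup(a, b):
    # Raises KeyError for any pair missing from the table --
    # the "wrong results for the N missing entries" problem.
    return table[(a, b)]

def mul_computed(a, b):
    # Understanding how multiplication works: tiny code, no giant table.
    return a * b
```

Even at 4 bits the table already holds 256 entries, while the computed version is one line and covers every input.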

Or something along those lines. (Of course natural languages are
"scruffier"/not as "neat" as multiplication - but demonstrably not as scruffy
as to be impossible to reasonably learn in any way except reading everything
ever written and translated into another language.)

~~~
incision
_> "A child learns to speak and recognize objects from much less data than
contemporary machine learning/data mining/AI/whatever you call it needs."_

You think?

When a kid starts talking (at 1.5, 2, 3 years, or whenever), they're doing so after being exposed to many thousands of hours of input, much of it effectively guided.

Also, as far as I know, we don't have nearly enough understanding of the way
the brain works to make a useful comparison between "records" fed to a model
and all the analog input that goes into making a person.

------
barbudorojo
I think the key factor here is that with the new tools and computational power, big data is everywhere. Intelligent and creative people can use these tools to create models and new hypotheses that they can refine and sharpen with the new data. This is like Lisp and its meta capabilities. We can design programs that create general hypotheses (macros), and then those are macro-expanded to create functions that specialize and refine the model and hypothesis. The power of computation and the availability of rich sources of data (sensors, NLP, Twitter, and so on) allow us to think in ways that were a dead end before the big data era.
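(A loose sketch of that macro-expansion analogy, in Python rather than Lisp: a hypothetical general "hypothesis template" that expands into specialized model functions. The names here are illustrative, not from any real library.)

```python
def hypothesis_template(exponent):
    """The 'macro': a general hypothesis family y = c * x**exponent."""
    def model(x, c=1.0):
        # Each call to hypothesis_template 'expands' into a
        # specialized function, fixed to one exponent, whose
        # coefficient c can then be refined against data.
        return c * x**exponent
    return model

linear = hypothesis_template(1)     # specialized expansion: y = c * x
quadratic = hypothesis_template(2)  # specialized expansion: y = c * x**2
```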

Is the new tools and powerful computational capabilities what rewards those
with the inspiration and creativity required to get the most of it. Industry
or Academia is a false dichotomy, what is needed here is a new way of thinking
to explore and create hypothesis in a way that was not possible before the big
data era.

------
pessimizer
I'm also interested in the unreasonable effectiveness of data in both
reinforcing an illusion of control and rationalizing our preexisting
prejudices - a commonality between quants in finance, data science in
business, and mainstream science (as Ioannidis is teaching us.)

Anything that improves and grounds statistics in some sort of concrete, more
falsifiable context is a good thing. Maybe new people attacking from a
different perspective can help.

------
CmonDev
Whenever I hear about "brain drain" it always means "someone provides better
conditions".

