
Data Science Teams Need Generalists, Not Specialists - bryanrasmussen
https://hbr.org/2019/03/why-data-science-teams-need-generalists-not-specialists
======
wokwokwok
I work in this field and I flatly reject that the purpose of a ‘business
intelligence’ team is to “develop profound new business capabilities” or
“organise the data scientists such that they are optimized to learn”.

What planet are you on?

Such a team is _absolutely not_ a pure research team, devoted to investigating
the data and driving new capability and insights.

The purpose of such a team is to give the _business_ operational insight
into the state of the business, and to serve the business as _the business
requires_ in order to deliver business value.

What the article proposes is like having a development team with 100% of its
time to pursue its own interests and build whatever it likes to show to the
business, in order to develop new business capability.

There is a place for such a team, and for a data science team as described,
but like Google X, it's a very specialised thing for companies that can
afford to dream.

...not for everyday engineering teams.

Generalists vs specialists is a straw man; obviously you have both.

~~~
spongepoc
>I work in this field and I flatly reject that the purpose of a ‘business
intelligence’ team

Good job the article is about data science and not business intelligence then.
Cool the feigned indignant outrage please.

~~~
nerdponx
It doesn't say "business intelligence" anywhere in the article.

------
kodz4
Sounds like the author wants to work at a university.

And that is probably where the best data science results are going to come
from: places where inter-disciplinary teams and cross-talk are the norm.

Science takes time to make sense of data.

We have figured out how to produce gigantic quantities of data, but that
doesn't mean science gets faster.

Whether it is CERN or Wall Street or the NSA or Facebook, processing the data
takes its own sweet time.

And when they don't find anything or use it in misguided ways it takes time to
work that out too. Because everyone is conditioned to hide that.

It took 20-30 years before anyone seriously took the experiments (data) of
Michael Faraday and turned them into an accurate mathematical model of
electromagnetism. There were a whole lot of famous mathematicians around, and
all of them had access to the data. So why did it take so long?

Orgs with data don't have that kind of time. And the truth is these mythical
generalists don't exist. They really can't be quickly mass produced like
vegetables. And on top of it all orgs and execs are conditioned to not share
their data.

This combo of factors is why we see so many bad consequences and erosion of
trust in every single org dealing with big data.

We are all living under the delusion that Data Science is like working on
crude oil at a refinery. It's more like working at a landfill with arbitrary
deadlines to find diamonds skewing incentives for the data to be misused.

------
uptownfunk
I work in this field.

I am expected to serve clients in a lot of different domains.

When we were doing data science work for an insurance client, I stayed up late
for weeks reading about actuarial models and how things currently work as well
as learning all the jargon there is in the industry. I could probably pass the
highest level actuarial exams at this point (or at least not fail too
embarrassingly) [edit: OK I probably couldn't do this, but I could probably
pass any/all exams related to the quant side] and also innovate superior
premium pricing models using "Machine Learning". Not because I want to become
an actuary but that's what's required to do good work.

When I worked in pharma, I learnt from other PhDs/postdocs on my team about
oncology (a very specific type of oncology, in fact), about everything that
goes into pharma companies and their marketing efforts, and about how doctors
operate and behave. Not only that, but I learnt from an industry expert all
the nuances and subtleties involved in analyzing various types of medical
claims data. (Hint: it's a total CF)

I could go on and on about the different domains I've had to work in. But the
whole point is, being a generalist, in the sense of a dabbler, is utter
nonsense. If you want to be a data scientist, you have to be flexible enough
to work in any domain, and also have the gumption to become a specialist in
the field, do something new in that domain with your shiny "machine learning"
knowledge, while making sure your models aren't GIGO from spurious
statistical assumptions, and making sure you know how to code decently enough
that your algorithms/code don't shit themselves. This probably aligns closer
to what the article was actually talking about.

(Edit: Sorry for the confusion, in my world the word generalist has a totally
different meaning..)

~~~
amirathi
> this generalist thing is utter nonsense

By your own description, you sound like a generalist with ability to dive deep
into the areas needed. You seem to refute the core premise of the article but
your description says otherwise.

~~~
uptownfunk
Honestly, I'm just put off by the word generalist. It makes it sound like just
because someone can call a few APIs and create a deep learning model, then, as
long as they have a clean data set, all of a sudden the world revolves around
them. Frankly, a lot of it is because of the hype: all of a sudden data
scientists are these magical leprechauns who, by virtue of their fancy
algorithms, can make money appear out of thin air.

It really isn't like that; you really have to be able to go deep, as you said.
Yes, there is a part of it where you have to be flexible (I wouldn't call it
general, because I think that sweeps too much under the rug), so that you can
go deep enough into a topic to pass a kind of "expert-level Turing test":
were you and a domain expert put in the same room, a reasonable person
wouldn't be able to tell you apart. The weaker version is "another expert
wouldn't be able to tell you apart", or something like that..

~~~
wefarrell
It sounds like you equate 'generalist' with BSer, but the article's definition
of generalist matches what you're advocating.

> Specialists’ work is coordinated by a product manager, with hand-offs
> between the functions in a manner resembling the pin factory: “one person
> sources the data, another models it, a third implements it, a fourth
> measures it” and on and on.

If you're taking the time to learn the domain, source the data and clean it
then you fit their definition of a generalist.

------
dagw
They need both. Having a domain expert in the domain you are modelling will
save you a lot of time and let you avoid a lot of dumb mistakes.

~~~
0815test
Also, you need domain expertise to do good feature engineering and to develop
good tailored architectures. These are quite critical to obtaining SOTA
results, so far as I can tell. People like to pretend that deep learning and
"AI" have made feature/model engineering unnecessary, and it's just not true.

~~~
hikkigaya
Not really, you don't need to be an expert in go to beat the best go player.

~~~
currymj
there was a strong model of the world in AlphaGo, which allowed it to
accurately simulate games that followed the rules.

that is quite a lot of domain expertise, much more than in most ML systems.

~~~
natalyarostova
Another way of saying this is that you don't really need the same level of
domain expertise if the problem space is entirely defined by a small set of
clearly articulated rules.

------
kyllo
In my experience you basically have to give data scientists direct
responsibility for a business process in order to ensure that their models are
relevant to and actually get utilized in the process.

Forecasting is a great example of something that every retail company needs,
and that data science is supposed to help with. But if you make planners
responsible for planning and ordering (minimizing surpluses and stockouts),
and make the data scientists responsible for developing forecasting models,
the planners won't trust or use the forecasting models--they'll just continue
making their own personal models in Excel.

If you want to actually solve a business problem with machine learning, then
you have to actually give the data scientist decisionmaking authority in the
business process and responsibility for the business result.

------
thanatropism
In this thread: people who have never heard of industrial labs.

[https://en.wikipedia.org/wiki/Bell_Labs](https://en.wikipedia.org/wiki/Bell_Labs)

[https://en.wikipedia.org/wiki/PARC_(company)](https://en.wikipedia.org/wiki/PARC_\(company\))

[https://en.wikipedia.org/wiki/Research_and_development#Busin...](https://en.wikipedia.org/wiki/Research_and_development#Business_R&D)

------
michaelcampbell
Generalists are also often in danger of RIFs and layoffs, because for any task
X a company values at time T, there's always an employee Q who can do X better
than generalist G.

But on the flip side, they seem to be able to bounce back better than
specialists.

~~~
felixgallo
It's the other way around; if you have specialist X and Y, and generalist Z,
companies tend to get rid of X and Y and consolidate functions with Z.

~~~
michaelcampbell
Not in my experience; but it may depend on how "special" the specialist is,
and what the specialization is. Sometimes Z just can't do it.

------
raverbashing
I wonder how many "data scientists" out there don't know what a normal
distribution is, or what the chance is of throwing a 6 on a die right after
throwing a 6 the first time.
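
For a fair die (the textbook assumption behind the question), the answer stays 1/6 no matter what came before. A throwaway simulation sketch, assuming a fair die:

```python
import random

random.seed(0)  # deterministic for reproducibility
rolls = [random.randint(1, 6) for _ in range(600_000)]

# Of all rolls that immediately follow a 6, what fraction are also a 6?
after_six = [nxt for prev, nxt in zip(rolls, rolls[1:]) if prev == 6]
p_six_after_six = sum(1 for r in after_six if r == 6) / len(after_six)

# p_six_after_six comes out close to 1/6 ~ 0.167: the previous roll
# tells you nothing about the next one (independence).
```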

~~~
isolli
I would always expect a higher chance of throwing a 6 the second time, because
I updated my Bayesian prior (the die is fair) and I now believe there is a
slight chance that the die is biased in favor of landing 6s.

~~~
Scarblac
That's amusing. Your prior is that the die is fair. And even if it is,
regardless of what the result of the first throw is, the chance that it is
fair is necessarily lowered after the first throw.

That's an odd effect of Bayes' rule that I never considered.

~~~
Tenoke
>regardless of what the result of the first throw is, the chance that it is
fair is necessarily lowered after the first throw.

This isn't true. You probably have a higher prior for it being unfair towards
6 or 1 (how much higher depends on the scenario).

However, if for example you have equal priors for it being unfair towards
every number, one throw only changes the posterior probability of which
number it is biased towards, not whether it is unfair at all. In the case of
unequal priors for which numbers it might be biased towards (e.g. a 0.01
chance of bias towards 6 or 1, 0.001 for each of 2, 3, 4, 5, and 0.976 for
unbiased), getting a number does change all those probabilities, yes. But if,
for example, you get a 3, the probability of bias towards 3 gets higher
mostly at the expense of the probabilities of bias towards the other numbers,
which get lower (I am too lazy to calculate the posteriors).

All in all, you should know beforehand what your posteriors are going to be
for every possible outcome, and if you know that some probability (like that
of the die being fair) is definitely going to be lower no matter the outcome,
then that lower value should already be your current prior.
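
A quick sketch makes this concrete. The bias model below (a loaded die rolls its favored face with probability 1/2 and each other face with 1/10) is an illustrative assumption, not something from the thread; the priors are the ones from the example above:

```python
from fractions import Fraction

def likelihood(outcome, hypothesis):
    """P(roll = outcome | hypothesis); hypothesis is 'fair' or the favored face."""
    if hypothesis == "fair":
        return Fraction(1, 6)
    # Assumed bias model: favored face 1/2, every other face 1/10.
    return Fraction(1, 2) if outcome == hypothesis else Fraction(1, 10)

# Priors from the example: 0.01 for bias towards 1 or 6,
# 0.001 each for 2-5, and 0.976 for a fair die.
priors = {"fair": Fraction(976, 1000)}
priors.update({face: Fraction(1, 100) for face in (1, 6)})
priors.update({face: Fraction(1, 1000) for face in (2, 3, 4, 5)})

def update(outcome, priors):
    """Bayes' rule: posterior proportional to prior * likelihood, normalized."""
    unnorm = {h: p * likelihood(outcome, h) for h, p in priors.items()}
    total = sum(unnorm.values())
    return {h: p / total for h, p in unnorm.items()}

post = update(3, priors)  # observe a single 3
# Bias-towards-3 rises, bias towards 1 and 6 falls - and with these
# priors the probability of "fair" rises too.
```

With this particular model, observing a single 3 actually raises the posterior that the die is fair, because the priors were weighted towards a bias for 1 or 6; which way that probability moves depends entirely on the assumed bias model.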

------
rq1
Isn't it a generalist's ability to specialize Just In Time?

------
thekhatribharat
Earlier HN discussion:
[https://news.ycombinator.com/item?id=19361208](https://news.ycombinator.com/item?id=19361208)

------
dalbasal
I think Smith's early take on "division of labour" unfortunately conflated two
different things, either of which could be called division of labour.

One is what he described in the pin factory, a basic precursor to a
factory/assembly line. One person draws wire, another cuts it, another
sharpens..

This is not about specialized skills, or even labour. It's industrial
engineering. Break a process down into components and optimize each
individually. In simple cases like this, just breaking down the process is
90% of the way. Give 3 people the job of sharpening an endless pile of pins
and the improved tools/methods will follow.

This bleeds into industrial labour (Smith's example for this is pin
_packaging_ ) in a few ways. Once you have small, efficient, tooled component
processes... you don't need skilled labour. Pinmaking might be a specialized
craft, but anyone can be taught to sharpen.

This is the _opposite_ of the other kind of division of labour.

When historians (especially British ones from the same period) theorized about
early civilisation, specialized division of labour featured often. Early
cities had enough people that not everyone had to farm. You could have
specialist priests, soldiers, artisans, stoneworkers, smiths, boatbuilders..

This is where specialized skills and depth of knowledge comes in.

So... data science & pins... The first kind of division of labour is the
"organization as a machine" kind. People do a consistent, repetitive process.
This works very well, but only if you need a consistent, repetitive result.

I think what this article is mostly arguing is that data science (like
programming, engineering and a lot of "information economy" jobs) is, but
shouldn't be, organized this way. You don't need a consistent, repetitive
result. If you do, that's what computer programs are for.

I agree, I think. Like with a lot of software domains, there's a stark
difference between small projects, where requirements, design, architecture &
implementation can be done in one head, and big projects, which can get bogged
down in bureaucracy, misunderstandings and the inability to move back and
forth between elements. I think data science has the extra problem of
data security and other things that require controls and rigidity.

I expect a lot of these problems will lessen with time. The field is still in
growth phase, and both tools and skill levels will improve.

Circa 2005, a problem I came across all the time was impractical designs for
web apps. You'd have a designer who had been designing posters and liveries.
They'd make a picture. Then you had an HTML guy, who would try very hard to
make it happen in HTML. Then it'd go to a JS or server-side specialist, who
discovers that an arbitrary amount of text needs to fit neatly into a box
that fits exactly 416 characters of lorem ipsum.

~~~
throw22032019
To "break a process down into components" _is_ what the division of labour is,
and the division of labour is nothing else but that. Smith didn't conflate
shit. Specialization is a posterior effect of the division of labour.
There's no _opposite_ division of labour; labour is divided in the same sense
in the two cases: one worker straightens the pin, another sharpens it; a
machine straightens it, another sharpens it.

"Labour" does not just mean what a labourer does, it can mean what is to be
done, "a labour", such as the labour of making a pin. If you had attempted
instead to infer what Smith meant through the example, you'd not have written
any of that.

------
potatofarmer45
"Mastery in that they know the business capability from end-to-end". That's
the point. These generalists are not really data scientists. Often it's some
account manager/salesperson/ops worker rebranding themselves to get ahead. A
real data scientist who understands the business side is rare and really good,
but most "generalists" are just bad. So bad, in fact, that they exacerbate
every single problem the author thinks they solve: having BS people called
data scientists devalues and annoys the real data scientists, and they often
create confusing buzzword-driven plans and white elephants for show rather
than insight that will have a real impact.

I worked in a media agency once where an account manager who could barely do a
simple calculation in Excel rebranded himself as a "lover of data". All he did
was play office politics. We nicknamed him unicorn. It was a nice way to
describe a creature with a thin rod sticking out of its head.

He was such a leech: full of incorrectly used buzzwords, overpromising, and
overselling (he had a habit of presenting non-significant analyses as facts).
The very idea of statistical confidence was a mystery to him.

Because he played the politics game well, he did reasonably well and ran a few
projects. Every single one of those turned out to be a giant white elephant to
the clients.

~~~
itronitron
What you're describing is not a generalist. Generalists can actually do
things; it sounds like your colleague was just pretending. I have worked with
one or two people similar to what you describe, and they can pose a real risk
to their employer if they are not managed appropriately.

