
Cargo cult data science - riri-au
http://blog.richardweiss.org/2017/07/25/data-science-in-organizations.html
======
gipp
When the author says:

> However, that assumes that someone presenting an analytical presentation
> will be viewed more favourably than someone presenting something softer.
> Basically, I had assumed a data-driven culture exists, when in reality
> businesses are struggling to create that culture in the first place.

I think this understanding of the situation is in itself part of the problem.
It assumes that someone coming in with an analytical presentation necessarily
_should_ be viewed more favorably than someone presenting something softer.

Speaking as someone who's been working as a data scientist for several years:
data-driven decision making has its limits. One very important one is a
strong, strong bias towards myopic metrics (e.g. "engagement" over "lifetime
value", "traffic volume" over "reputation in the market"), on the basis that
they:

* Are more easily measurable

* Provide more data to work with

* Provide a stronger signal/noise ratio

* Provide much faster feedback

An organization which _always_ values data-driven decision making over
expertise-driven decision making is always going to fall prey to this myopia.
Fighting cargo-cult data science and building a sustainable analytical culture
also means understanding the limits of data-driven decision making and that it
does not replace, but supplements, "softer" expertise-driven culture.

~~~
Bartweiss
> An organization which _always_ values data-driven decision making over
> expertise-driven decision making is always going to fall prey to this
> myopia.

This is a huge and under-appreciated concern. It's disturbing how often
success is measured by optimizing a single metric, and how resistant people
can be to recognizing issues with this approach.

A goal like "improve clickthrough rates" is easy to measure, but without some
human insight it's all too easy to achieve it at the cost of overall success.
Did you decrease time-on-page? Maybe your visitors feel misled. Did you
decrease conversion rate? Maybe your new visitors don't actually want your
product. And so on, indefinitely, including lots of side effects you might not
have convenient statistics for.

I have a depressing sensation that at least half of corporate data science
consists of abusing Goodhart's Law - finding a useful metric and then naively
optimizing for it until it's no longer representative of business success.
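As a toy sketch of that failure mode (every number and functional form below
is invented; the shape of the problem is the point, not the data):

    # Goodhart's Law in miniature: tune one knob to maximize a proxy metric
    # (CTR) while the quantity the business actually cares about degrades.
    # All numbers here are made up for illustration.
    import numpy as np

    clickbait = np.linspace(0, 1, 101)           # how aggressive the headlines are
    ctr = 0.02 + 0.08 * clickbait                # proxy: click-through rate rises
    retention = 0.40 * (1 - clickbait) ** 2      # hidden cost: misled visitors churn
    lifetime_value = ctr * retention             # what the business actually earns

    print(clickbait[np.argmax(ctr)])             # 1.0 -- max clickbait "wins"
    print(clickbait[np.argmax(lifetime_value)])  # 0.17 -- the real optimum

A dashboard that only shows CTR declares the clickbait strategy a success.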

~~~
riri-au
You preempted my followup article. I'm not sure where the balance between
these two is, but I'm sure that many places get it wrong.

~~~
Bartweiss
I'm definitely excited for that followup, then. I think this is an incredibly
common issue in both directions, and a surprising number of companies seem to
make both errors at once.

------
paraschopra
One key fact I wish every product manager would internalize is that data
science is a technology. And like all technologies, it may or may not be
applicable to a particular problem, and it may or may not be the best use of
an organization's time to invest in that technology versus other options.

On the marketing side, just like a marketer would never market a database
upgrade to users, she shouldn't consider marketing data science / ML directly
to users. Users DO NOT care about using a data-science-enabled feature. They
want value and progress in their lives, and sometimes you may create more
value by removing a form field than by investing in and delivering a data
science project.

So use good business sense to balance investment versus reward when
evaluating data science projects. I wrote about this here:
[https://growth.wingify.com/what-
you-need-to-know-before-you-...](https://growth.wingify.com/what-you-need-to-
know-before-you-board-the-machine-learning-train-a81c513098fe)

------
thisisit
Wow, this article is exactly what is happening in my project, including the
"data lake" part. The even more infuriating thing is when everything turns
into a bizarro world. Our boss made us spend an hour talking about the
difference between a "data lake" and a "data ocean". The only thing one could
do is _facepalm_.

~~~
heisenbit
You read a well-reasoned article and have first-hand experience, but frankly
these are just two data points. You lack the data to justify the effort of a
facepalm ;-)

IMHO, a lot of these unbounded data projects are the result not just of cargo
cult but of satisfying a deeper need, i.e. management avoiding decision
making. It is much easier to go for broad data collection than to make a
directional decision, build a targeted model, and make real-world changes
that lead to meaningful fact finding. Dreaming of data oceans is less risky
than navigating a puddle, but the latter actually moves you forward.

~~~
Bartweiss
> management avoiding decision making

A solid point, but I'd also add _justifying_ decision-making.

The stated reason for projects like this is usually to guide better decision
making, which is only possible if the project succeeds. But as long as the
project produces some kind of comprehensible output, it can be used as an
excuse for making new decisions or changing old ones.

_Dilbert_ used to have a lot of strips about managers using re-orgs to bury
their bad decisions. From some of the horror stories I've heard lately, data
science and analytics have taken over that role at many companies, helping to
cover up power grabs and backtracking under the guise of "listening to the
data".

------
Will_Parker
This article seems to speak unfavourably of building a basic BI
infrastructure, and I don't know why. There is immense value in having a
single trusted source of truth where the most basic business questions can be
answered ad hoc, with a suite of simple visualizations and KPIs that cover
the most important facts of how the business is doing. Data like this serves
as a crucial element of communication across different departments in the
company.

One under-appreciated use of a strong BI infrastructure is as a sort of test
suite for the business. If an important metric changes, it's extremely costly
not to discover that very quickly. This can also lower the cost of taking
business risks. In other words, there can be much value in visible data that
points to nothing new. BI isn't just a method for finding business
improvements, though it certainly can be that as well.
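A minimal sketch of what I mean, in Python; the z-score rule, the threshold,
and all the numbers are placeholder assumptions, not any particular tool's
API:

    # "Test suite for the business": flag a KPI whose latest value drifts
    # far from its recent history. Threshold and data are illustrative only.
    import statistics

    def kpi_alert(history, latest, z_threshold=3.0):
        """True if `latest` is more than z_threshold sigmas from history."""
        mean = statistics.mean(history)
        sd = statistics.stdev(history)
        if sd == 0:
            return latest != mean
        return abs(latest - mean) / sd > z_threshold

    daily_signups = [312, 298, 305, 290, 301, 295, 310]  # made-up last week
    if kpi_alert(daily_signups, latest=180):
        print("ALERT: signups look broken; cheap to check now, costly later")

Even when it never fires, the green light is information.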

BI and data warehousing infrastructure doesn't replace more targeted and
specific data science projects, it complements them.

~~~
riri-au
I guess I was a bit harsh, fair call.

I'll respond this way: basic BI infrastructure is the bedrock of an
organization's data capabilities. The best way for an organization to build
its initial BI/DS capability is to address some concrete business problem by
building a warehouse, in contrast to building a warehouse and then looking
for problems to solve with it.

So you're right, thanks for taking time to comment :)

------
peacetreefrog
In my experience, there're (at least) two ways of using data in an
organization:

1. Top-down, where you start with a problem/decision and use data to inform
it. "What phone plan should we offer our customers?" per the article is
top-down, and data science can help inform the answer.

2. Bottom-up, where you start with a bunch of data and try to brainstorm,
"OK, what cool things can we do with this?" I worked for an IoT company that
collected a bunch of sensor data, and we'd run into this all the time. We'd
take our best shot and report back to clients, who'd say, "Cool, but what do
I do with this?" Not saying you can NEVER come up with something useful, but
it's a lot harder.

------
gaius
_However, that assumes that someone presenting an analytical presentation will
be viewed more favourably_

Well, it certainly isn't helped by data scientists claiming to be better than
ANY programmer and ANY statistician. Who could possibly live up to their own
hype?

A DS and ML winter will follow just as it did for AI.

~~~
thanatropism
Yes. The blogpost is about the organizational difficulties in unlocking the
value of _technically sound_ "data science" projects, but these in turn are
the tip of an iceberg of "omg watson" on the executive side and "machine
learning does well on $archetypal_dataset, it can do anything!" on the techie
side.

A while ago there was a Kaggle project to solve certain conjectures on prime
number theory. _Seriously?_

~~~
soVeryTired
Got a link to that Kaggle competition?

~~~
walterreade
Here's the link. It was a playground competition (i.e., no rewards) - "This
competition challenges you create a machine learning algorithm capable of
guessing the next number in an integer sequence. While this sounds like
pattern recognition in its most basic form, a quick look at the data will
convince you this is anything but basic!"

[https://www.kaggle.com/c/integer-sequence-
learning](https://www.kaggle.com/c/integer-sequence-learning)

------
padthai
"Experiements in data science"

Is the blog's title misspelled?

~~~
hkon
no?

~~~
soVeryTired
"Experiments" doesn't have four Es. Unless it's meant to be some sort of pun
on "experience".

~~~
hkon
Yes, I was just having a bit of fun. People cannot even point out spelling
errors without hedging their statements. (right?)

~~~
padthai
I was unsure because the blog is two years old. Two years with a typo in the
title is quite a bit.

Maybe it was some kind of subtle joke that I do not get.

------
quirkot
I’d say this piece applies outside of data science, too. It’s a nice reminder
that technology can lead to culture change, but cannot drive it.

~~~
corporateslave2
Technology is the defining change of our lifetimes.

~~~
gaius
You might be surprised at how normal it is to just use technology to do the
same thing faster. I'll wager most business people think of computers as
glorified typewriters that can also send "memos".

------
aj7
An organization executive is not a stakeholder. At best she is a leader, a
formulation, an innovator. At worst she is a parasite. But not a stakeholder.

~~~
sgt101
In the sense that the executive has the power to enable your project for six
months, and risks losing $500k of stock options if she is wrong, I think she
is a stakeholder.

I wish I had some steak like that!

------
graycat
This is all very old stuff.

One earlier version was for AI expert systems.

Then there was object request broker architecture.

Such considerations were ubiquitous in the heyday of operations research
(OR), with optimization, simulation, etc. OR was so big that it was required
in B-school programs.

Similarly for management science.

The lessons for how to make applications, as in this OP, were all there in the
past. Indeed, operations research (OR) and management science (MS) merged to
become OR/MS with a journal _Interfaces_ that talked a lot about the points in
the OP.

I went through a lot of that history and discovered lessons much like those in
the OP.

> Fundamentally, to be a data driven company, data needs to be part of the
> internal dialogue spoken by all members.

Okay, let's stop right there! Who the heck, why, where, when did anyone ever
say, argue, justify that any company should be "a data driven company"? Maybe
a "market driven company", but data driven?

Really, on what kind of company one should have, there is very wide
agreement, from a home-based business to Wall Street, and that is a
money-making company!

What turns on the CEO and the BoD is making money!

But not nearly all projects, data science, ..., Taylor's time and motion
studies, are directly connected with making money. E.g., when I wrote software
to schedule the fleet at FedEx, the main goal was just a schedule, printed
out, on paper, with departure times, flight times, arrival times, etc., that
would pass expert review as "flyable". Actually, saving money, i.e.,
_optimization_, was of much less interest.

> So, to avoid a cargo cult of data, organizations should stop chasing
> technology and start working with experienced technologists who can apply
> technology to solve organizational problems.

Yup.

> Executives, to understand how their project relates to company goals, and
> how success would be reported.

Really, reasonably experienced problem-sponsor executives will ask "Why
should I do that?" and need a good answer, or they won't do it. Sure, one
reason to do the project may be just to play with the latest buzzwords, but
most organizations have highly sensitive BS detectors that will be triggered
by buzzwords.

> With their bosses demanding analytical results, managers will demand
> analytical results from their peers, and so on, down throughout the
> subgroup.

Why would bosses be "demanding analytical results"? How many bosses understand
good analytical results versus a lot of BS, have an accurate view of the
potential of analytical results, could explain why it might be good for
results to be analytical, know how to do projects that yield solid analytical
results, or see how analytical results could help their careers or the goals
of the company? Answer: Only a small fraction. E.g., only recently has Wall
Street taken analytical results seriously for trading instead of intuitive,
judgment _stock picking_.

> My reasoning was simple: anyone with data science on their side would be
> able to prove that their efforts worked better than their peers.

Then? How about the peers feel threatened and mount a gossip and sabotage
campaign against the data scientist and their work? The management chain can
also feel threatened.

> Basically, I had assumed a data-driven culture exists, when in reality
> businesses are struggling to create that culture in the first place.

They are not even "struggling to create that culture". It takes a fertile,
gullible imagination to believe that many organizations even want "a
data-driven culture".

> Data science is best viewed as a form of company culture, rather than a set
> of technologies.

No. Data science is best viewed as a technique, a box of tools, that
sometimes can, likely in combination with other tools and techniques, yield
some valuable results.

> I argue that it’s best to spread a data-driven culture from the top of an
> organization down, by requiring that reports be analytical.

Neither the spreading nor the requiring will work. Only a tiny fraction of
the people in most organizations have significant ability with data science,
and the rest will NOT make the spreading or requiring of something they don't
understand possible.

> Solutions that help measure and improve the performance of a part of the
> company (“we’ll help you measure marketing ROI”, or “we will introduce
> predictive maintenance”), will spread and become enduring organizational
> strengths.

Not really. For "enduring organizational strengths" look to, say, high quality
reasoning, writing, and presentations, powerful innovation, high
determination, careful attention to the markets and the customers.

For "Solutions that help measure and improve the performance of a part of the
company", that will be down somewhere near a good company Web site, good
telephone courtesy, keeping lunch breaks under an hour, stopping pilfering,
having good computer network management, having good computer security.

Sometimes _data science_, or just call it applied mathematics, and the rest
of math, can mean super big bucks for a company:

Supposedly a big example is the trading software of James Simons's Renaissance
Technologies.

IIRC once the CEO of American Airlines said that their subsidiary Sabre for
reservations and scheduling was so important he'd sell off all the planes and
just keep Sabre.

Likely the old linear programming application, the diet problem, is still
used effectively (i.e., saves big bucks) in feed mixing for livestock, cat
food, dog food, etc.
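For concreteness, here is a minimal sketch of that feed-mix LP with scipy;
every cost and nutrient figure below is invented, only the structure is the
point:

    # Classic diet/feed-mix problem as a linear program: minimize cost
    # subject to nutrition floors. All figures invented for illustration.
    from scipy.optimize import linprog

    cost = [0.30, 0.50, 0.20]            # $/kg: corn, soymeal, filler
    protein = [0.09, 0.44, 0.02]         # protein fraction per kg
    energy = [3.3, 2.5, 1.0]             # energy units per kg

    # linprog minimizes c @ x with A_ub @ x <= b_ub, so negate the floors
    A_ub = [[-p for p in protein], [-e for e in energy]]
    b_ub = [-0.18, -2.8]                 # >= 18% protein, >= 2.8 energy units
    A_eq, b_eq = [[1, 1, 1]], [1]        # the mix must total 1 kg

    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq)
    print(res.x, res.fun)                # optimal mix and its cost per kg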

Linear and non-linear programming are likely still pillars of, worth big bucks
for, operating an oil refinery.

There may be some big bucks from applying math to ad targeting on Web sites.

For large projects, there is the old linear programming application of the
"program (or project) evaluation and review technique, commonly abbreviated
PERT. ... PERT was developed primarily to simplify the planning and
scheduling of large and complex projects. It was developed for the U.S. Navy
..." Closely related is the "critical path method (CPM)".
[https://en.wikipedia.org/wiki/Program_evaluation_and_review_...](https://en.wikipedia.org/wiki/Program_evaluation_and_review_technique)
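And the heart of CPM is tiny: the longest path through a DAG of task
durations. A toy sketch, with invented task names and durations:

    # Critical-path length for a toy project: earliest finish of each task
    # is the max of its dependencies' finishes plus its own duration.
    duration = {"spec": 3, "build": 7, "test": 4, "docs": 2, "ship": 1}
    depends = {"build": ["spec"], "test": ["build"],
               "docs": ["spec"], "ship": ["test", "docs"]}

    finish = {}  # memoized earliest finish time per task

    def earliest_finish(task):
        if task not in finish:
            start = max((earliest_finish(d) for d in depends.get(task, [])),
                        default=0)
            finish[task] = start + duration[task]
        return finish[task]

    print(max(earliest_finish(t) for t in duration))  # 15: spec-build-test-ship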

