
The joyless world of data-driven startups - spxdcz
https://medium.com/@zambonini/the-joyless-world-of-data-driven-startups-b6f475f11f5f
======
baddox
> Sometimes it works. Sometimes it’s critical. But sometimes it fails, or
> results in unintended consequences that we may not notice for years.

> Data-driven journalism gave us Buzzfeed

> Data-driven music gave us X-Factor and Pop Idol

> Data-driven movies gave us 25 Hollywood sequels planned for this year

> Data-driven education gave us Key Performance Indicators and Teaching to the
> test

These first three examples are awful examples. A ton of people love all of
those things. The only failure of data-drivenness here is the failure to
generate content that _the author_ wants. Now, the author can attempt to make
some argument about how websites, TV shows, and movies have a moral obligation
to strive for whatever objectives the author prefers, but that's a separate
issue to settle.

The fourth example is a little different, because we're talking about
mandatory education programs for children, rather than products that people
choose to pay for or consume. Also, I don't think that data-drivenness itself
is a significant contributor to those problems in education.

~~~
api
The failure is the failure to generate novelty or adequately explore
alternatives.

Enough people might reliably go see 25 sequels, but none of those films will
be memorable. None will advance the art of film-making. None will change
anyone's life or mentality or affect the culture in any meaningful way.

Being data driven means chasing the biggest, loudest signal in your data set.
It means pandering to that signal, because it swamps all others. A data-driven
approach is not going to lead you anywhere new.

In machine learning / AI we refer to such algorithms as "greedy." A classic
example would be simple hill climbing (gradient descent is the same idea run
downhill). These algorithms are known to be very good at optimizing within the
bounds of a simple, well-behaved and regular fitness landscape, but they
readily become stuck at local maxima when presented with any kind of complex
structure.
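
To make the local-maximum point concrete, here is a minimal sketch in Python. The `landscape` function and its two peaks are made up for illustration, and `hill_climb` is a textbook greedy climber, not anything taken from the article.

```python
def hill_climb(fitness, start, lo=0, hi=100):
    """Greedy hill climbing on integers: step to a better neighbour until none exists."""
    x = start
    while True:
        neighbours = [n for n in (x - 1, x + 1) if lo <= n <= hi]
        best = max(neighbours, key=fitness)
        if fitness(best) <= fitness(x):
            return x  # no neighbour is strictly better: a (possibly local) maximum
        x = best


def landscape(x):
    """Hypothetical fitness: a small peak at x=20 (height 10), a tall one at x=80 (height 50)."""
    small = max(0, 10 - abs(x - 20))
    tall = max(0, 50 - abs(x - 80))
    return max(small, tall)


print(hill_climb(landscape, 15))  # starts on the small hill, stops at 20
print(hill_climb(landscape, 60))  # starts on the tall hill, reaches 80
```

Started at 15, the climber settles on the small peak and never sees the much taller one; it only reaches 80 if it happens to start on the right slope.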

We're living in what I'm tempted to call the dark age of the local maximum,
the age of gradient descent.

~~~
baddox
> The failure is the failure to generate novelty or adequately explore
> alternatives.

Perhaps you value novelty more than the average person. What might not be
"adequate" for you might very well be adequate for a huge portion of people.

Throughout this comment you mention several potential goals of content
producers (being memorable, advancing the art, changing someone's life, etc.),
but you make no argument for why those goals ought to be prioritized over
other goals, other than the implication that you personally prefer those
goals.

~~~
dxbydt
> Perhaps you value novelty...

> What might not be "adequate" for you...

It isn't about what he values versus what you value. What the author complains
about are well known problems with recommendation engines. Take the naive reco
algo - "You just bought a 50 inch Philips TV. You might also like - 50 inch
Sony TV, 50 inch LG TV, 50 inch Samsung TV, 50 inch ..." - See the problem?
I already bought my fucking TV, you can't expect me to buy more & more of the
same or similar goods.

Then there's the CF algorithm & its variants, with well known problems -
namely, they don't actually match content to one's preferences. Typically, one
POV dominates across the board due to the sparsity of the matrix, & getting
the diversity required for the matrix to fill out takes a long long time and a
very large number of people with diverse opinions. You mistakenly give a five
star rating to Godfather & you are bombarded with mafia movies for a long
time. You attempt to confuse the system by giving Pretty Woman five stars as
well. Then the system tries to gamely proceed by suggesting "Those who watched
Godfather AND Pretty Woman are more likely to watch - So I married an Axe
murderer."
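
A rough sketch of the "one rating dominates" effect, in Python. This is a deliberately naive co-occurrence recommender, not any real product's algorithm; the users, the ratings, and every title other than The Godfather and Pretty Woman are invented for illustration.

```python
from collections import Counter


def also_liked(ratings, user, min_stars=5):
    """Naive 'people who liked what you liked also liked...' recommender."""
    mine = {m for m, s in ratings[user].items() if s >= min_stars}
    votes = Counter()
    for other, theirs in ratings.items():
        if other == user:
            continue
        liked = {m for m, s in theirs.items() if s >= min_stars}
        overlap = len(mine & liked)
        if overlap == 0:
            continue  # this user shares none of our favourites, so they get no say
        for movie in liked - set(ratings[user]):
            votes[movie] += overlap
    return [movie for movie, _ in votes.most_common()]


# Hypothetical, very sparse rating data.
ratings = {
    "me": {"The Godfather": 5},
    "u1": {"The Godfather": 5, "Goodfellas": 5, "Casino": 5},
    "u2": {"The Godfather": 5, "Goodfellas": 5},
    "u3": {"Pretty Woman": 5, "Notting Hill": 5},
}

print(also_liked(ratings, "me"))
```

With a single five-star rating on file, every recommendation comes from people who share that one rating, so the list is mafia movies and nothing else.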

Can't win.

There is auto-complete screenplay software that basically makes a composite of
the top 100 best selling screenplays & does what in the industry is called a
flip. Namely, change male to female, winner to loser, comedy to tragedy etc.
Such data-driven screenplay software might suggest that if you take the
ladies from Thelma & Louise & replace them with grizzled old men, you get
Unforgiven.

There is lyric generation software with the same flavor, umpteen loop
generators & infinite jukeboxes, content recommendation systems along the same
lines - since you just starred this code sample on angular, you will enjoy
this github repo on react,...

Hopefully you see the downside.

~~~
baddox
I understand the general argument, and I don't disagree _in general_. My
intuition is that showing 50 inch TVs to someone who just bought one is not
ideal [0]. My point is that the specific examples provided (Hollywood
blockbusters, etc.) are not accompanied by any evidence or reasoning to
convince me that these industries are not doing a good job of satisfying the
market.

[0] That said, I've seen lots of counterintuitive but very real phenomena
regarding user behavior, so I won't claim to be _that_ confident about this
being ineffective. Perhaps people return TVs a lot and buy other ones. I don't
have the data.

~~~
shakethemonkey
The data might just show that offering 50 inch TVs to someone who just bought
one is actually a rich opportunity. Perhaps both televisions were stolen in a
burglary. Or someone is finally upgrading all their televisions from CRT to
flat screen -- maybe they moved from a house to a small apartment. It would
not surprise me in the least if people were 10 times more likely to purchase a
television having just bought one, compared to individuals randomly selected.

------
noelwelsh
I think the article goes a bit too far against data. Hits like Bohemian
Rhapsody are by their nature freak events and not easily reproducible. Nobody
is suggesting (I hope) that you can achieve that kind of success without a
healthy dose of luck.

However I agree that for early stage startups data driven decision making can
be difficult. My experience is it's expensive and you often have very little
data.

The other issue is this kinda uncanny valley of false rigor. On the one
extreme you have very informal analysis. For example, we tweaked our blog post
template and increased newsletter signup rates but I can't tell you exact %s
because at this stage we don't track it. We seem to be getting a lot more
signups, but perhaps it's illusory. That's ok. At our level of traffic it
really isn't important. At the other extreme you try to model non-stationary
processes and all that and have rigorous control over sources of error. In the
middle is where I see many companies with, say, A/B testing, believing they
have a high level of statistical rigor but not actually achieving that rigor
in practice due to many uncontrolled sources of error. This middle spot, where
you have too much faith in faulty reasoning, is where I _believe_ bad data
driven decision making resides.
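
One way to see the "false rigor" problem is to run the numbers on a small A/B test. The sketch below uses a standard two-proportion z-test; the visitor and signup counts are hypothetical, and sample size is only one of the uncontrolled sources of error mentioned above (the function also assumes at least one conversion and one non-conversion overall).

```python
import math


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a difference in conversion rates (pooled z-test)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # 2 * (1 - Phi(|z|)) written with the complementary error function
    return math.erfc(abs(z) / math.sqrt(2))


# Same observed rates (5% vs 6%), very different amounts of traffic.
small = two_proportion_p_value(50, 1_000, 60, 1_000)
large = two_proportion_p_value(1_000, 20_000, 1_200, 20_000)

print(f"1,000 visitors per arm:  p = {small:.3f}")
print(f"20,000 visitors per arm: p = {large:.6f}")
```

At 1,000 visitors per arm the "20% relative lift" is nowhere near significant at the usual 0.05 level; it only becomes convincing with far more traffic than an early-stage site usually has.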

Oh, and on turd polishing:
http://www.dorodango.com/create.html

~~~
frandroid
> Hits like Bohemian Rhapsody are by their nature freak events and not easily
> reproducible.

Just like startups...

------
raincom
The so-called data-driven science has not understood the notion of science.
In a minimal sense, science is to produce knowledge. There are two things to
it: hypothesis generation; testing the hypothesis. As the history and
philosophy of sciences have shown, there is no algorithmic way of generating
hypotheses. Or if you generate hypotheses algorithmically, you are still left
to figure out whether these hypotheses are ad hoc or not. After all, the
history of the sciences has given powerful heuristics to reduce the solution
space to generate hypotheses to solve or explain problems or facts. Here,
whether one picks 'solve' or 'explain' depends on which philosophy of science
one picks up.

Whenever I see statistics and data-sciences, I see tons of ad hoc bullshit
masquerading as sciences/knowledge. It is always easy to come up with a
hypothesis to explain a set of chosen facts; in order for that hypothesis to
be non ad hoc, it has to predict surprising facts.

As the fad continues, we may hear of robots replacing scientists to produce
knowledge about various phenomena. For the best critique of AI, check the book
by UC Berkeley philosopher Hubert Dreyfus: What Computers Can't Do: A Critique
of Artificial Reason.

------
mwsherman
“Epistemology”, he whispered.

Using data is good, but “based on” offers a lot of wiggle room. A 10% increase
in CTR is nothing to sneeze at, but it does not answer the question of the
best use of your engineers’ (or designers’, or marketers’) time. Should they
have instead been working on the thing that has a 50% chance of a 20%
improvement? How do we account for all the data we didn’t bring to bear?
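
The arithmetic behind that question, as a quick Python sketch. It assumes (this is not stated above) that the 50% gamble yields no improvement at all when it fails.

```python
def expected_lift(outcomes):
    """Expected improvement from (probability, lift) pairs that cover every case."""
    total_p = sum(p for p, _ in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * lift for p, lift in outcomes)


shipped = expected_lift([(1.0, 0.10)])             # the 10% CTR gain already in hand
gamble = expected_lift([(0.5, 0.20), (0.5, 0.0)])  # 50% chance of 20%, else nothing (assumed)

print(f"shipped: {shipped:.2%}, gamble: {gamble:.2%}")
```

Under that assumption both options have the same expected lift, so the measured 10% alone cannot say which was the better use of time; that depends on risk appetite and on everything the data did not capture.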

The data is small, the interpretation is big.

There is also the problem that philosophers call “regress”, which is that
every rational decision has to trace back to a premise that one simply assumes.
Should we be in the business we are in, compared to all the other potential
uses of talent? We can’t know that empirically, at root.

~~~
jes
+1 for epistemology.

------
slowmovintarget
Premature optimization: still the root of all evil.

One of the better takeaways from the article was the notion that being data-
driven means you're aiming for average, and you might not even hit it. Aim for
the moon, you might only achieve orbit instead.

I watched a CEO make arbitrary layoff decisions based on what the numbers said
should be the size of a development organization and the ratio of
developers to QA. The actual software being built was irrelevant to his
figures. He used numbers to justify grinding the dev organization into the
ground.

------
peterburkimsher
I'm not afraid of computers acting like people (AI). I'm very worried about
people acting like computers.

Every circuit, every program is based on a principle: comparators (analogue) =
NAND (digital) = if statements (software). Machines choose their answer by
taking a huge amount of information, and sorting it. By design, this leads to
some monstrous conclusions. For example, eugenics might be logically
efficient, but it is morally abhorrent.

Taking risks, making mistakes: these are not flaws, they are the very essence
of being human.

Test yourself! I guess that everyone on here is very rational (as I am). I
only discovered this problem in my character after a conversation with an
artist, a good friend from high school. She makes all her decisions based on
the heart, rather than the mind. Try to do something totally random! When
things make no logical sense, the emotions wake up again. You'll "feel" again.
It doesn't matter if that's a good or bad feeling - acting like a machine
makes you feel nothing at all. A machine can defend every action it takes,
because it's never wrong. But machines can't apologise.

There will be data-driven businesses. They're not actually run by humans
(whatever the management says), they're run by machines. Those companies could
ultimately be fully automated away. It's far better, as a human, to be
creative (even if the most creative thing you can do, like me, is teaching
machines how to talk to other machines).

~~~
greggman
> She makes all her decisions based on the heart, rather than the mind

Maybe I'm reading the wrong thing into that. I make decisions with my heart and
mind.

Heart = I care about my kids and want them to be healthy

Mind = To care for them I vaccinate them and don't use homeopathy

Maybe I'm misinterpreting this but my general experience is someone who "makes
all decisions based on the heart, rather than the mind" generally makes some
very poor decisions that actually don't lead to the results they want.

~~~
somberi
@Greggman - I like your Heart/Mind example: a more approachable variant of
Bertrand Russell's "The way forward for humanity is compassion guided by
knowledge".

------
dude_abides
The most important skill, in order to be data-driven, is to ask the right
questions. If you're looking to get to product market fit, the questions you
should be asking are very different from the ones you should be asking if
you're looking to grow a "good" product. In both cases, data can help you
reach your goal, but only if you ask the right questions.

If an early-stage startup tries growth hacking before it reaches product
market fit, it will likely end in disaster.

------
upquark
Lots of ignorance in the article and some of the comments regarding data-
driven decision making. Does the author realize how much of the civilization
around us is built and guided by data-driven decisions? Also, people/companies
not being able to effectively use their measurements to their advantage is not
really evidence against the idea itself. When your model doesn't work, it's
not modeling that's broken, it's just your model.

Gut reactions can take us only so far: they break down as we move away from
single human-scale familiar problems (ones that the brain has some built-in,
evolved capacity of handling, such as reading other people's facial
expressions).

~~~
crimsonalucard
Do you realize how much of civilization was built through intuition?

As great as data is, it is also limited. Why? Because we can never gather
enough data. All our data is just a simplification of what's actually going
on.

Essentially we're just grabbing data that's generated by black-box tests on
systems that are astronomically more complex than we can comprehend. In many
cases the data tells only a fraction of the story. It's akin to some alien
race trying to understand a computer desktop by measuring the electrical
inputs on a USB port and seeing how that affects the voltage output of the
HDMI port.

--

Here's a telling example of the power of intuition: a quote from Steve Jobs
responding to Marc Andreessen inquiring about the "critical" problem the
iPhone had of "not having a physical keyboard":

‘They’ll get used to it.’

Any datapoint you gathered on keyboards back in the day would have told you
otherwise!

~~~
jmtulloss
I pretty much agree with you 100%, but I hate it when people use Steve Jobs
quotes. He was better at this than almost anybody else; the rest of us need
some help and some luck.

------
pbreit
> Everyone tells early stage startups to use data for big strategic
> decisions.

This is not even remotely true in my experience working at and advising
startups. Sure, data is important, but more so for tactical matters like ad
performance and A/B testing. Big strategic decisions typically employ far less
data relatively, pretty much definitionally since no data exists for "big
strategic decisions".

------
hyperion2010
There is a looming disaster of a similar nature coming in healthcare too.

Data shouldn't be used to set goals; it should be used to achieve them. It may
also tell you when it is not currently possible to achieve your goal. That
doesn't mean you should throw up your hands and set goals that the data seems
to indicate are achievable because (among other things) that is tantamount to
believing we can actually predict the future.

~~~
davak
I would love to buy you a beer sometime and discuss that with you. If you have
any sources for that epiphany... please let me know.

------
evanwarfel
There is one, and only one reason to be 'data driven'. Or to test one's
hypotheses, for that matter. And that is to make sure you aren't fooling
yourself; that you haven't fallen prey to the myriad cognitive biases; to
prevent your preconceived notions from clouding your judgement of reality, aka
what is actually going on.

While data always provides more information, the less strong your prior
beliefs, the less informative your experiment will be -- If you believe
something and it turns out to be majorly false, you get a nice shift in
expectations. If you believe in something and it turns out to be very true,
you gain lots of information in terms of quantifying the effect you are
looking into.

If you are Google, looking to eke out every last 1/1000th of a penny on ads,
yeah, maybe a/b testing the shade of blue of a button can be justified.

The more other companies are "Data Driven" [like the somewhat unfortunate
examples the author chose], as opposed to "Hypothesis Driven", the more there
is room for somebody else to fry bigger fish.

In other words, it's not the "data's" fault, it is ours.

------
dlu
Oh thank goodness it isn't just me.

------
ArekDymalski
This article omits one important aspect: the data isn't used only to create.
It's also a guide to what to delete/abandon. Sure, it might be influenced by
chaotic fluctuations, but it can still help make a decision. Sometimes making any
decision is better than wandering around with ambivalent gut feelings.

------
zenogais
Personally I enjoyed this article.

I'm working within several different businesses right now, and the consistent
theme I'm trying to relate to the folks I work with is that data-driven
techniques can take you right up to the edge of what is known to be possible.
It's the people who work with the ambiguity there and take leaps into the
unknown that ultimately change things. It's fine to want to be part of the
pack, but for the really ambitious folks being at the front-edge of the pack
is still being part of the pack. Learning to make the move out in front is the
hard part.

------
frik
> Sometimes it works. Sometimes it’s critical. But sometimes it fails, or
> results in unintended consequences that we may not notice for years.

A bit off-topic, but it explains why we got Windows 8.x and the upcoming
Windows 10 - data driven metrics.

When will Microsoft learn that developers and advanced users turned off the
"phone-home" metrics gathering functions in Windows XP, Vista, 7 and Office?

People want Windows 10 to be Windows 7.5. It would be nice to get some lost
Windows XP functionality back and shell bugs fixed that have been in since Vista.

------
lukethomas
Being data-driven is detrimental when it replaces common sense (talking to
users, collecting feedback, improving based on feedback).

Before the internet (and being able to track every single action), successful
companies were built. It can be done. Using data to drive decisions has some
value, but it's not the end-all solution, it's merely a piece of the puzzle.

------
keithwhor
Anyone read the Crunchbase profile of the company the author is the CTO for?

"Bipsync provides a research automation platform to maximize the productivity
of professional investors. Founded in Silicon Valley in 2012 by experienced
investors and software developers at Stanford University, the company uses
modern technologies and user-centered design to speed up data capture,
automate research maintenance and identify insights that drive better
decisions for investors and funds." [1]

I mean, maybe I shouldn't be looking for patterns, because, y'know, data. But
it seems oddly conflicting to be pitching a product that encourages the use of
data to drive decisions and then publicly condemning... the use of data to
drive decisions.

Aside from that contradiction, the company just got seed funding four months
ago. It's probably far too early to make decisions about the efficacy of being
"data-driven." From personal experience, trying to manage people by telling
them, "I'm right, let's do it my way," is terribly demotivating (and very
prone to error). Conversely, trying to weigh everyone's input equally and sift
out good ideas is an organizational nightmare that creates a ton of
complexity. Complexity slows down execution. And who decides on the best
ideas?

Creating a mental framework for hypothesis testing and building a product
based on optimizing for specific metrics is, in my mind, what being data-
driven actually means. There are no inconsistencies or personal biases. It's
scalable. You can teach the entire team how to approach the design of a
feature as a problem with a testable hypothesis. Politics go out the window as
execution strategy is determined by return on investment of engineering
resources. Being data-driven doesn't discourage creativity, it just allows you
to reframe problems.

Buzzfeed clickbait titles are but a small (and, well, effective) subset of a
vast array of largely positive things that come from being "data-driven."
Attempting to demonize patterns of logical, rational decision-making because
you (personally) don't like one outcome is... well, an anti-pattern. (It
happens all of the time. See: The history of the scientific method. ;))

Sure, it's not sexy. But it doesn't need to be. It just needs to work.

[1] https://www.crunchbase.com/organization/bipsync

------
crimsonalucard
In God we trust; all others must bring data.

------
adnam
The problem with the data-driven approach is you might be optimising towards a
local maximum.

~~~
roel_v
Only with naive optimization algorithms, of course. There's a whole academic
field of study devoted to optimization, and avoiding local maxima is a big
issue of course. Hundreds if not thousands of papers are published on it,
every year.
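
As a small illustration of a less naive approach, here is a multi-start variant of greedy hill climbing in Python. It uses evenly spaced starting points as a deterministic stand-in for random restarts; the `landscape` function is a made-up two-peak example, and real optimization methods are far more sophisticated than this sketch.

```python
def hill_climb(fitness, start, lo, hi):
    """Greedy hill climbing on integers: step to a better neighbour until none exists."""
    x = start
    while True:
        neighbours = [n for n in (x - 1, x + 1) if lo <= n <= hi]
        best = max(neighbours, key=fitness)
        if fitness(best) <= fitness(x):
            return x
        x = best


def multi_start(fitness, lo, hi, starts):
    """Run hill_climb from `starts` (>= 2) evenly spaced points and keep the best result."""
    step = (hi - lo) // (starts - 1)
    candidates = [hill_climb(fitness, lo + i * step, lo, hi) for i in range(starts)]
    return max(candidates, key=fitness)


def landscape(x):
    """Hypothetical fitness: a small peak at x=20 (height 10), a tall one at x=80 (height 50)."""
    return max(max(0, 10 - abs(x - 20)), max(0, 50 - abs(x - 80)))


print(hill_climb(landscape, 25, 0, 100))   # single greedy run: stuck on the small peak
print(multi_start(landscape, 0, 100, 5))   # five starts: finds the tall peak
```

The single run from 25 stops at the local maximum, while a handful of spread-out starts is enough to find the global one on this toy landscape. The catch is that each extra start costs another full run, which in a business setting means another experiment.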

------
prostoalex
I struggle to reconcile the data-driven approach to running products and
companies with innovator's dilemma.

It seems like micro-decisions are best made after looking at the data, but
macro decisions are not.

------
SnacksOnAPlane
Data-driven decisions will get you into local maxima, but you'll get stuck
there when there's a good chance that a more radical change would help much,
much more.

------
jacques_chester
Vision is a fancier way of saying "dumb luck".

~~~
Toine
Yes, luck is there, but I think it's only a part of it

------
curiously
This reminds me of Warren Buffett's approach to investing. It's better to be
approximately right than precisely wrong. I fear that all we are doing with
data is figuring out a precise way to fail instead of focusing on being right.

"But fail often and fast," the cliche goes. But what about the opposite? It
should be true by simple negation of logic, right?

"Win seldomly and slow", then suddenly collecting data on every useless piece
of data becomes futile. You are not focused on winning and without the burden
of speed and pressure to screw things up. You are absolutely calm and able to
think things through.

~~~
bgilroy26
Business may proceed as usual: fearful managers might use data to make their
jobs more secure, savvy managers might use data to move fast and break things.

