
Groundhog Day - or the Problem with A/B Testing - fogus
http://www.codinghorror.com/blog/2010/07/groundhog-day-or-the-problem-with-ab-testing.html
======
patio11
I wonder if the conclusion about A/B testing would have been the same if the
tenuously related lead-in anecdote had been about medicine ("Although
randomized drug trials might have succeeded in saving billions of lives, it
didn't ring true to patients...")

You can ask users about their satisfaction. Do it. If you do A/B testing well,
after testing _it goes up_. I'm sorry if it doesn't provide spiritual
fulfillment to Jeff Atwood or product visionaries who hate the idea that
something so simple so often trumps their vision, but surely you can work
through that after you see the benefits available. A/B testing let me help
over 100,000 extra kids learn to read this year -- that is pretty freaking
spiritually compelling.

~~~
tezza
Hi Patrick...

I just wonder whether it would be safe to say that A/B testing is a good tool,
but one that should be used in conjunction with other tools such as knowing your
customer, intuition, expertise, etc.

I'm not in a position where I can A/B test anything, but from the outside it
seems as if A/B testing can be misused as a replacement for traditional methods,
and can become an obsession precisely because it IS measurable.

------
perplexes
A/B testing is a manual hill-climbing algorithm, and we can't see the hill.
Imagine a vast 2D plane that represents all the possible combinations of your
website: its buttons, colors, copy, and layout. There are mountains and valleys
that represent spots of high signup and low signup. You are at a certain
point, and an A/B test will, say, move you North, at which point you can see
what your altitude is. Is it higher? Lower?

Unfortunately, moving like this will trap you on a particular mountain until
someone comes along in a helicopter and says, "Hey! I think I see a bigger
mountain range over there!" Then you try out a completely different design,
which you could certainly A/B test to see if the spot you landed on is
actually higher than where you were (which, if you're at the base of a huge
mountain, may not be the case!).

Of course, the combinations are multi-dimensional, and so are the fitness
functions (signup, retention, word-of-mouth, etc.). But I've found this useful
for explaining why A/B testing has problems; that is to say, it has specific
uses, and so do UX designers.
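
Here is a minimal Python sketch of the analogy. It assumes a made-up two-peak landscape: a small hill at (2, 2) and a taller mountain at (8, 8). The functions `signup_rate`, `ab_hill_climb`, and `helicopter` are hypothetical illustrations, not anyone's real testing setup, and real design spaces are neither two-dimensional nor visible like this. In each round, `ab_hill_climb` compares the current design against its four neighbors and moves only when a neighbor strictly wins. That is why it stalls on whichever peak is nearest.

```python
import math

GRID = 11  # hypothetical design space: x and y each range over 0..10

def signup_rate(x, y):
    """Hypothetical 'altitude' of the design at (x, y): a small hill at (2, 2)
    and a taller mountain at (8, 8)."""
    small = 1.0 * math.exp(-((x - 2) ** 2 + (y - 2) ** 2) / 4.0)
    big = 3.0 * math.exp(-((x - 8) ** 2 + (y - 8) ** 2) / 4.0)
    return small + big

def ab_hill_climb(start):
    """Greedy local search: each round tests the four neighboring variants
    (N, S, E, W) and moves to the best one only if it strictly beats the
    current design. Stops at a local maximum."""
    x, y = start
    while True:
        best = (x, y)
        for dx, dy in ((0, 1), (0, -1), (1, 0), (-1, 0)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < GRID and 0 <= ny < GRID and signup_rate(nx, ny) > signup_rate(*best):
                best = (nx, ny)
        if best == (x, y):
            return best  # no neighbor wins: we are on top of *some* hill
        x, y = best

def helicopter(starts):
    """Climb from several very different starting designs and keep the highest
    peak reached (a deterministic stand-in for random-restart hill climbing)."""
    peaks = [ab_hill_climb(s) for s in starts]
    return max(peaks, key=lambda p: signup_rate(*p))

print(ab_hill_climb((1, 1)))           # (2, 2): stuck on the small hill
print(helicopter([(1, 1), (7, 9)]))    # (8, 8): a different drop finds the mountain
```

Starting from (1, 1), greedy climbing stops on the small hill. Only a "helicopter" drop somewhere else reaches the taller mountain, and once both peaks are in hand, a plain A/B comparison is enough to pick the higher one.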

------
michael_dorfman
I think Atwood misses the point a bit here: A/B testing is not destined to
find shallow local maxima; one can also test larger, more significant
differences.

And, ultimately, Phil (in Groundhog Day) discovers that.

~~~
mechanical_fish
The problem, of course, is that you can only afford a certain number of big
experiments. Life is finite.

And big experiments can be hard to A/B test against your mature product. This
is what that whole "chasm" thing is about: the audience for new things is
different from the audience for old things; present a radical new thing to
someone who isn't an early adopter and it will get low marks. The automobile
had really bad A/B results against the horse for most audiences in the late
1800s. It is hard work to attract a new audience when you have the
alternative of incrementally improving the experience of the old audience.

And, of course, the more time you've invested in building your radically
different idea, the more crushed you will be by bad A/B data. A/B testing of
tiny differences that can be toggled in five minutes is the least emotionally
painful form of A/B, so no wonder it is so much more popular, to the extent
that Atwood thinks it represents the entire field.

Atwood's metaphor is still perfect, however. The movie _Groundhog Day_ covers
all of this. It's an astonishing work of art.

------
noelwelsh
I've heard that life is like a box of chocolates. I've also heard that
stringing together aphorisms, shaky analogies, and vacuous statements doesn't
make a coherent argument, but the size of Jeff Atwood's audience suggests I'm
wrong. More A/B testing for me?

Look, A/B testing is a tool that relies on statistical significance tests.
These tests rest on a bunch of assumptions, the most important of which is
(usually) that the data are independent and identically distributed (i.i.d.).
If your data aren't i.i.d. (like, say, the changes in opinions and attitudes
that occurred over the many years during which the automobile was developed),
the conclusions reached by the test don't hold. Still, not investing in
automobiles was probably a good decision unless you got lucky.
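
The two-proportion z-test is one common example of the kind of test described above. Below is a Python sketch of it. This is an illustration, not necessarily what any particular A/B tool uses, and the signup counts in the usage lines are invented. The i.i.d. assumption shows up directly in the formula: within each arm, every visitor is treated as an independent coin flip with the same conversion probability.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    Assumes each visitor in each arm is an independent Bernoulli trial with a
    fixed conversion probability (the i.i.d. assumption). Returns (z, p_value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both arms convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("all-or-nothing conversions: the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Invented counts: 200/1000 signups for A, 250/1000 for B.
z, p = two_proportion_z_test(200, 1000, 250, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")  # roughly z = 2.68, p = 0.0074
```

Nothing in the arithmetic checks the i.i.d. assumption itself. Feed it counts from an audience that drifted mid-test and it will still print a confident-looking p-value.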

------
nat
I disagree with that quote he throws out at the end there about A/B testing
not being able to construct things. Isn't that essentially what genetic
algorithms do? Granted, we don't use that sort of process to make a website
because it would be slightly insane, but who knows?

And even starting from one sub-optimal site, isn't arriving at some better,
more refined experience a form of construction? If you're not discovering
something unexpected or seeing something new form from A/B testing, I think
that speaks more to your lack of imagination or unwillingness to try than it
does some inherent limitation of testing.

~~~
rmc
Genetic algorithms do create things. It's the randomness seed at the start
that creates the new thing.
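
A toy genetic algorithm in Python can make this exchange concrete. It is a sketch under stated assumptions. Each design is a hypothetical 20-bit genome. `fitness` simply counts the 1 bits (the classic OneMax toy problem) and stands in for a real metric. The names `GENOME_LEN`, `fitness`, and `evolve` are illustrative only. The random initial population plays the role of the "randomness seed." Crossover and mutation keep producing designs that were never in that population, while selection does the A/B-style comparing.

```python
import random

GENOME_LEN = 20  # hypothetical: each bit switches one design choice on or off

def fitness(genome):
    """Toy stand-in for a real metric (OneMax): count the bits that are on."""
    return sum(genome)

def evolve(generations=50, pop_size=20, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    # The random initial population: the "randomness seed" at the start.
    pop = [[rng.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(pop_size)]
    for _ in range(generations):
        # Selection (the A/B-style comparison): keep the better half unchanged.
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, GENOME_LEN)
            child = a[:cut] + b[cut:]  # one-point crossover recombines parents
            # Mutation flips each bit with small probability: new variants appear.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
print(fitness(best), "of", GENOME_LEN, "bits on")
```

Because the better half survives each generation unchanged, the best fitness can never go down. That property is also how the search "constructs" designs that no one wrote down at the start.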

------
lionhearted
Interesting article. I was fascinated by his point on Groundhog Day, then I
disagreed with his point on A/B Testing (because I used to feel the same way),
and then finally I loved this quote at the end - "A/B testing is like
sandpaper. You can use it to smooth out details, but you can't actually create
anything with it."

I can't think of another article that short, on a mundane topic, that provoked
three distinctly different, relatively strong emotions in me.

------
edanm
For anyone that doesn't know it (like me), he mentioned the Ouroboros. I found
it pretty interesting. From Wikipedia:

"The Ouroboros or Uroborus is an ancient symbol depicting a serpent or
dragon swallowing its own tail and forming a circle."

<http://en.wikipedia.org/wiki/Ouroboros>

~~~
nopassrecover
It's (probably) a Red Dwarf reference.

------
dennisgorelik
I wonder why anyone would want to give up reliving the same day after a measly
30-40 years. Would it be better to age in the meantime?

------
Charuru
It's a movie. In the real world after having the perfect date the woman ends
up loving you forever.

