

In Defense Of A/B Testing - paraschopra
http://www.smashingmagazine.com/2010/08/26/in-defense-of-a-b-testing/

======
_delirium
This seems to be a perennial (50-year-long) debate in design / HCI / user-
interface circles. The hardcore extremes are a branch that comes out of
traditional design and architecture, which is radically against anything that
reeks of "focus groups"; and the branch that instead comes out of industrial
psychology, which won't do anything unless it has a p-value. In practice most
designers actually building things seem to do something in between. You can't
make _every_ design decision using an A/B test unless you're prepared to run
hundreds of thousands of them, because designing something non-trivial (say, a
new IDE) requires quite a lot of design decisions. But you can use A/B testing
(or quantitative user studies more generally) to get data on specific groups
of alternatives, or to home in on a particular subset of the design space. A
lot of the argument is over the relevant proportions, and which other methods
to use to supplement quantitative A/B-type testing.

One school of thought, for example, holds that you should generally aim to understand your
design and audience, via things like ethnographic interviews, examination of
existing designs, qualitative user studies, thought about why stuff
worked/didn't, theories about user interaction and design in certain areas,
etc., and then use quantitative tests as more of a sanity-checking tool (are
users reacting in line with how we were expecting them to?). That's somewhat
more in line with a traditional scientific view of theory formation followed
by testing the predictions that theory makes. Other approaches have the
quantitative tests taking a much more active, central role in the design
process, more akin to the machine-learning view of induction from data, where
you're agnostic about "why" something works, and just follow where the data
takes you. I think the latter gets towards the approach some designers find
stifling to creativity, particularly in how it's applied in e.g. metrics-based
game design.

~~~
paraschopra
All great points. My only argument in the article was that it is just a tool
which should be used if you have a specific case for using it. It isn't
inherently bad; you just need to see where it is a good fit. Like you said,
design can be more than the sum of its parts, so doing thousands of tests to
design a new IDE (for instance) is impossible.

Though if you have a specific question to answer, such as whether to use a
toolbar instead of a menu bar, then of course you can do A/B testing.
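
For a single question like that, the analysis can be as small as a
two-proportion z-test on the conversion counts of the two variants. A minimal
sketch, assuming each visitor sees exactly one variant and either converts or
doesn't; the function name and the counts in the usage line are made up for
illustration and are not from the article:

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    conv_a / conv_b are conversion counts, n_a / n_b are visitor counts.
    Hypothetical helper for illustration; uses the normal approximation,
    so it is only reasonable when both samples are fairly large.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in the data; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Illustrative numbers only: menu bar (A) vs. toolbar (B).
z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
```

A small p-value only says the two variants probably differ on that one
metric; it doesn't say why, which is the gap the qualitative methods
mentioned upthread are meant to fill.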

------
paraschopra
This is my response to the recent anti-A/B-testing sentiment. Relevant threads on
HN discussing the articles that I have responded to:

Groundhog Day - or the Problem with A/B Testing
<http://news.ycombinator.com/item?id=1531573>

Out of the cesspool and into the sewer: A/B testing trap
<http://news.ycombinator.com/item?id=1448858>

~~~
michael_dorfman
It's a great response; the strange thing is that it is necessary at all.

An aside: am I the only one who thinks Atwood's sandpaper analogy is a bit
like Creationism? _"Random mutations and natural selection can only smooth out
small details, they can't create a whole new species...."_

~~~
derefr
Not to say your analogy isn't correct, but really, not much _can_ be done with
a pure mutation+selection mechanism: most such searches really do just get
stuck at local optima. This is why most "successful" organisms on Earth
(depending on your definition of success) don't just use that mechanism.
Rather, most biological life has discovered a secondary form of selection
(evolution), which is powered by _sexual recombination_, where groups of
large-scale, successful changes are introduced to other such groups, all
simultaneously. This is what makes genetic algorithms different from generic
hill-climbing algorithms (which the system of an A/B test plus a designer
simulates).
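
To make the "stuck" part concrete, here is a toy hill-climber on a made-up
one-dimensional landscape. It's only a sketch: the landscape values, the
function names, and the idea of treating a design as a single index are all
assumptions for illustration, and it shows the hill-climbing half only, not
recombination.

```python
def hill_climb(fitness, start, neighbors):
    """Greedy hill-climbing: move to the best neighbor until none is better."""
    current = start
    while True:
        best = max(neighbors(current), key=fitness)
        if fitness(best) <= fitness(current):
            # No neighbor improves on the current point: a local optimum.
            return current
        current = best


# Hypothetical fitness landscape: a small peak at index 2 (value 5)
# and a taller peak at index 7 (value 12).
LANDSCAPE = [1, 3, 5, 4, 2, 6, 9, 12, 8]


def fitness(i):
    return LANDSCAPE[i]


def neighbors(i):
    # Only one step left or right, i.e. only "small tweaks" are allowed.
    return [j for j in (i - 1, i + 1) if 0 <= j < len(LANDSCAPE)]


stuck_at = hill_climb(fitness, 0, neighbors)   # climbs to index 2 and stops
best_at = hill_climb(fitness, 4, neighbors)    # a different start reaches index 7
```

Starting from index 0 the climber stops on the small peak because every
single step away from it looks worse, even though a better design exists
further along. Where it ends up depends entirely on where it started.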

On a larger economic scale, however, all those much-decried people saying "I
want to build something that is Facebook + Google News + Basecamp + a toaster"
are sort of performing the sexual recombination step, if they actually manage
to get it built instead of just making incessant Craigslist job postings ;)

