

Why designers don’t like A/B testing - jacobr
http://www.ghostinthepixel.com/?p=549

======
lukestevens
As a designer who has written a yet-to-be-published book on A/B testing...
(And as posted on the site, but awaiting moderation...)

 _> And, quite frankly, the first victims of A/B testing are beauty, elegance,
charm, and grace._

Right, and the first casualty of a designer's-opinion-only approach is
"beautiful" designs that users hate. (Of course, neither premise is
necessarily true; this is just a baseless appeal to emotion. Who could be
against "beauty, elegance, charm, and grace"?) Can the author demonstrate
from, say, ABtests.com or Whichtestwon.com that this is the case? Or did he
simply have a bad experience at highly data-driven companies? Is Netflix, for
example, guilty of sacrificing "beauty, elegance, charm, and grace"?

 _> Instead we get a unsightly pastiche of uneven incrementalism lacking any
kind of holistic cohesiveness or suggestive of a bold, vivid, nuanced vision
that inspires users._

Such as? Is the author suggesting we go with poor-performing designs because
in one or a few people's opinion, they "inspire" users? How do you know users
are inspired if their performance suffers?

 _> It is the implicit charter of a high-quality design team (armed with user
researchers and content strategists!) to propose something a user may not be
able to imagine, that is significantly better, since they are so conditioned
by mediocre design in the mainstream. _

IMO, it's implicit that a high quality design team could come up with 2 or
more such designs, or design variations, to test. Not only that, that they
would come up with such design variations to test that _are_ cohesive,
elegant, and beautiful.

 _> A/B testing may only be as effective as the designs being tested, which
may or may not be high quality solutions. Users are not always the best judge
of high quality design. That’s why you hire expert designers of seasoned
skills, experience, judgment, and yes the conviction to make a call as to
what’s better overall._

Make a call based on what? Gut feel? The quality of designs being tested is
not a problem of A/B testing -- if you only want users to choose from "high
quality design", only test "high quality design". I don't understand the
objection here. If a designer is truly an expert with seasoned skills, they
should be the first to push for A/B testing of their best ideas, not
retreating into vague notions of what they feel is best. There is no "One True
Solution". Designers should own and drive the A/B testing process, not feel
like they are victims of it.

 _> As is true with any usability test, you gotta question the motives behind
the participants’ answers/reactions. Instead, biz/tech folks look at A/B test
results as “the truth” rather than a data point to be debated. Healthy
skepticism is always warranted in any testing. Uncovering the rationale for a
metric is vital._

People buy or they don't. They click or they go. You can't "question the
motives" of hundreds or thousands of users; that doesn't make sense. You can't
wish away the data.

 _> A/B testing is typically used for tightly focused comparisons of granular
elements of an interface, resulting in poor pastiches with results drawn from
different tests._

Poor A/B testing practice may be a problem, fine, but let's not throw out the
baby with the bathwater.

 _> How do you A/B test novel interaction models, conceptual paradigms, visual
styles (by the way, visuals & interactions have a two-way rapport, they inform
each other, can’t separate them–see Mike Kruzeniski’s talks) which may vary
wildly from before? _

Uh, you run an A/B test and measure what you're doing.

 _> Would you A/B test the Wii or Dyson or Prius or iPhone? Against what???_

Category error. We're talking about graphic/interaction/web design. All those
companies absolutely should A/B test their web sites, for example.

 _> A/B testing locks you into just two comparative options, an exclusively
binary (and thus limited) way of thinking. What about C or D or Z or some
other alternatives? What if there are elements of A & B that could blend
together to form another option? Avenues for generative design options are
shut down by looking at only A and only B._

Seriously? Even basic free tools like Google Website Optimizer let you A/B/n
or multivariate test.
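For the record, the mechanics are simple enough to sketch. Here's a minimal A/B/n assignment in Python (a hypothetical illustration, not tied to Google Website Optimizer or any particular tool): bucket each user deterministically by hashing their ID, then tally conversions per variant.

```python
import hashlib

VARIANTS = ["A", "B", "C", "D"]  # A/B/n: any number of designs, not just two

def assign_variant(user_id: str) -> str:
    """Deterministically bucket a user so they always see the same variant."""
    digest = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
    return VARIANTS[digest % len(VARIANTS)]

# Tally visits and conversions per variant as traffic comes in
stats = {v: {"visits": 0, "conversions": 0} for v in VARIANTS}

def record(user_id: str, converted: bool) -> None:
    v = assign_variant(user_id)
    stats[v]["visits"] += 1
    stats[v]["conversions"] += int(converted)

def conversion_rates() -> dict:
    return {v: s["conversions"] / s["visits"] if s["visits"] else 0.0
            for v, s in stats.items()}
```

Hashing rather than random choice matters: a returning visitor keeps seeing the same variant, which real tools also guarantee.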

 _> Finally A/B testing can undermine a strong, unified, cohesive design
vision by just “picking what the user says”. A designer (and team) should have
an opinion at the table and be willing to defend it, not simply cave into a
simplistic math test for interfaces._

Sure, and not A/B testing can undermine a business by costing them thousands
of dollars by not "picking what the user says". Pitting design teams against a
"simplistic math test" is a silly way to polarize the issue. The designer or
design team should be thrilled they have such a wealth of data at their
disposal to come up with the best design possible. It's not about fighting,
defending, or caving, it's about using the data to know -- and not blindly
guess about -- what's going on.

 _> A/B test results perpetuate a falsely comforting myth that designs can be
graded like a math test, in which there’s a single right answer._

The author asserts this several times throughout the piece but never provides
any evidence. Will they shed some light on real case studies where this
happens?

Indeed, if there is any "falsely comforting myth" doing the rounds, it's that
designers can guess and know what thousands of disparate users will prefer
within a single percentage point of accuracy. If you can guess that well, I'd
like you to pick my stocks.

 _> And you risk dissuading top quality design talent from joining the team’s
cause for good, beautiful, useful designs that improve the human condition._

Whose "human condition" are you improving by potentially harming clients'
businesses by blindly rolling out designs which may in fact cost them
thousands (or hundreds of thousands, or millions) of dollars, and therefore
jobs, and so on...? This is another silly, polarizing argument.

tl;dr: If you want "good, beautiful, useful designs", then A/B test "good,
beautiful, useful designs". It's not us vs them. We can have our cake and eat
it too!

~~~
gaius
A couple of companies ago, I worked with a designer who believed "users like a
challenge" (his words) and every interface he designed was like a little
puzzle. For example you couldn't tell what was clickable or where without
studying the page for subtle clues.

Except we weren't making games... We were making corporate intranets.

~~~
bestes
Isn't "Discoverable" another way of saying this? It can and does work. It's
fun to find a feature or learn how to do something new.

Examples: iPhone. No manual (probably there is now). It's unbelievable,
really. But, you have to admit it works!

Anti-example: fire extinguisher. No tricky interface, please.

One reason I didn't see mentioned is user level. Beginners want/need simple.
Later, they want more power. How do you keep the advanced features out of the
way of the new users? Hide them! Advanced users will find them when they are
ready.

~~~
gaius
You have fundamentally misunderstood the motivations of the average corporate
user. The system is a tool to them, nothing more, they use it when they must,
then get on with their real jobs.

------
patio11
Why don't (some) developers like, e.g., the Lean Startup? Because you can
execute your job perfectly, and drive the business right off an effing cliff.
It strikes at some of our core narratives, like "successful startups are made
on the backs of technical competence, largely by upstart engineers who are a)
ridiculously competent and b) scorned by people in positions of power."

You can't tell someone who really _feels_ that narrative that, by the way,
technical competence is not sufficient for success. (To say nothing of saying
it isn't necessary.)

So why don't designers like A/B testing? Well, what's their narrative? User-
loving highly-skilled artist scorned by people in positions of power. (Wow, we
have a trend here.) They don't want to be told that a) users provably _don't
care_ that this design looks better or is more intuitive or b) that
beautiful/responsive/intuitive/etc design is not always worth actual money.

~~~
ahoyhere
You're right. Nobody wants to be told that they're wrong about their most
preciously held beliefs about the value of their skills.

But for everyone who's championing split-testing, they should remember a
little bit about where it first became popular: Direct Mail. Direct Mail
copywriters/designers were the first people to truly use split-testing as a
major part of their professional arsenal.

Split-testing gave us long-form sales letters with typewriter fonts, blue
backgrounds, yellow highlighter effects, and pictures of Ferraris and
McMansions and screenshots of commission checks.

Why? Because they all work better.

~~~
kolektiv
That's interesting though, because you seem to be implying (forgive me if I'm
wrong) that we shouldn't have typewriter fonts, yellow highlights, etc. Or at
least that it somehow isn't worth the trade-off. But as you say, they do work.

Don't get me wrong, I find those design elements you're talking about to be
far from my tastes, but provably I'm not "normal" within the markets under
test. At what point do you say to a business "you might make more money like
this, but don't do it anyway because it's awful"?

Apologies if I'm reading more into your statement than was intended. I've
seen enough of your work to know that you get both design and business, hence
I find that statement interesting!

~~~
ahoyhere
I'm not saying the yellow-highlighter crowd is wrong, not at all. But would
YOU buy from them?

Your question exposes a really important fact about split-testing:

People are complex. Different stuff works for different people.

The yellow highlighter thing tends to make people like us wanna hurl… it
triggers immediate distrust. Meanwhile, information marketers still find that
it works.

You can't know who a person is when you split-test against them.

More importantly, you need a LOT of traffic, and even minor temporary changes
in traffic composition can affect the quality of your data.

You will see that traffic from HN responds to & converts very differently than
traffic from Lifehacker, and again different from Small Biz Trends. Initially
you may find certain changes increase your conversion rates… and two months
later you find you have lost almost all those customers, or that a
disproportionate amount of them have asked for refunds, or been shitty to you
in the support channels. Net result: a loss, which you only found out about
months after you were so confident that your testing showed a gain. Would the
other half of the split test do better? You can never know, because you cannot
replicate the exact scenario ever again.

Traffic is not created equal; traffic is made up of different individuals. You
can only ever get a feeling of what works for some of them some of the time,
depending on several factors.

Take the yellow-highlighter example again: Why does it work for infomarketers
when it makes people like us flee in the other direction? They sell to
different people. Why would the Hacker News crowd be worse customers than the
Lifehacker crowd? Because they are different types of people.

Unless you run a months-long test covering all parts of your business
(refunds, avg turnover rate, length of customer stay, # of support
inquiries/nastiness in support inquiries, resource usage, server bills, etc),
tho, split-testing can't tell you which types of people are better for you.

And if you DO do that, when exactly will you have time left to expand and
market your product?

I'm not saying split-testing is wrong or useless, but people use it to try to
create scientific certainty when it's absolutely NOT possible to do so.

~~~
kolektiv
Thanks for the clarification - I think to some extent we're arguing on the
same side. Points about traffic I would certainly agree with, and to some
extent it's as much about attraction as about increasing conversion metrics
once obtained.

I'm not sure I'd agree with the last point about time - I think that balancing
business generation, expansion and refinement is possible - but perhaps not so
much amongst startup size businesses, which is more HN territory. In that case
it should certainly be looked at as "is it better to try and get 0.1% more
people to click buy or 20% more people to arrive" - or whatever you think your
likely ratios are, hence cost/benefit call.

------
cousin_it
Reading the article, and realizing that these are probably the _best_
arguments against A/B tests that designers can offer, has actually made me
change my opinion in favor of A/B testing. I'll try to go for a point-by-point
reply:

    
    
        > A/B testing may only be as effective as the designs being tested
    

True, but taken at face value, these words mean that A/B testing becomes more
valuable when you have great designs made by great designers. Presumably not
what the author wanted to say.

    
    
        > As is true with any usability test, you gotta question
          the motives behind the participants’ answers/reactions.
    

True, but how does the designer know in advance that "questioning the motives"
and "applying healthy skepticism" will yield conclusions favorable to the
designer?

    
    
        > A/B testing is typically used for tightly focused 
          comparisons of granular elements of an interface, 
          resulting in poor pastiches with results drawn from 
          different tests.
    

This point is correct, as far as I can see. It's actually the strongest
argument against A/B tests that I know.

    
    
        > Would you A/B test the Wii or Dyson or Prius or iPhone? Against what?
    

This point doesn't have any bearing on whether you should A/B test websites.

    
    
        > A/B testing locks you into just two comparative 
         options, an exclusively binary (and thus limited) 
         way of thinking.
    

This point is wrong. A couple months ago we ran an A/B test with four
alternatives.

    
    
        > A designer (and team) should have an opinion at the 
          table and be willing to defend it, not simply cave 
          into a simplistic math test for interfaces.
    

This point only asserts what the author is trying to prove.

~~~
ahoyhere
"Reading the article, and realizing that these are probably the best arguments
against A/B tests that designers can offer..."

What makes you say that? Did you split-test them? (Serious question.)

~~~
cousin_it
Serious answer: because I haven't seen any designer offer better arguments
against A/B testing, though they often write about disliking it.

If someone holds an opinion but doesn't articulate the reasons for holding it,
I'm willing to give them the benefit of the doubt. But if they try to
articulate the reasons and the result is emotional and full of fallacies, I
start to think that it's just rationalization and the actual reason is
something else, perhaps some opaque emotional reaction that they're trying to
justify instead of examining it.

~~~
ahoyhere
If you read a lot, and you probably do, you know that the vast majority of
people can't express their way out of a paper box. (Especially if you read
stuff written by "amateurs," such as almost all of the internet.)

That doesn't mean that what they are saying is incorrect, just that they are
very bad at explaining it.

Ex: Most people "believe" in seasons, and evolution. But you ever try to get
them to explain what makes those things WORK?

------
benjash
As a senior graphic designer / web developer:

A/B testing is massively useful, but often misused in business. Commercially
it's become a bit of a buzzword, which has led to it being badly implemented.

I think designers often get frustrated when A/B testing is applied to their
designs post-brief.

Designing a website for A/B testing purposes is a completely different brief.
You are designing a toolkit for testing purposes. So that the next version
will be informed.

A/B testing is quite a lot more complicated than some people appreciate. I've
often seen people create two completely different designs and then sort of let
them fight it out, yet learn nothing from these experiments. There could have
been elements that worked well for certain personas in each design!

Fundamentally, designers can be quite precious. They don't like criticism like
this. After all, it's impossible to measure all the factors, and how does one
measure 'creativity'? Plus some will feel like this lowers their input: why
not just throw all the elements in a bucket and see which comes up trumps?

As a designer I love using this sort of testing. Design by data is a growing
trend that I think will get wider recognition. Sadly the A/B testing phase
can create some slightly odd combinations at first, but when you go back and
redesign the page, you know it's going to convert better.

------
ender7
I agree with lukestevens; there's a ton of value in A/B testing. A willingness
to destroy your attractive but unusable mock is what separates the graphic
designers from the interface designers.

That said... A/B testing can be _extremely dangerous_:

\- It's VERY EASY to design your test incorrectly. The most common result is
that your test really ends up measuring some other factor of the design, or is
chock-full of confounding factors. Any conclusions you draw from it will
almost certainly be incorrect.

\- It's EVEN EASIER to over-generalize from your data. People clicked on the
green button more than the red button...people must like green buttons! Quick,
redo the design with a green palette! These conclusions are always terribly
tempting to make, since it feels like you've uncovered some deep truth about
your users. You probably haven't, and it probably can't be generalized beyond
_the exact thing that you tested_.

\- It's easy to substitute A/B testing for critical thinking and self-editing.
Two options? Don't worry about deciding! A/B test them! Another decision needs
to be made? We'll let the users decide! The problem here is twofold: a). When
designing something interactive, there are simply too many decisions to be
made. You better get comfortable with making them, because there isn't enough
time in the world to A/B test them all. b). (related to below) Your product
will end up looking like it was designed-by-committee. Which it was.

\- Related to the previous one, you can get trapped into doing "design by A/B
testing". It's a simple algorithm. Get something that looks okay, then do an
A/B test on some small changes. Pick the best one. Repeat. Unfortunately, the
nature of A/B testing is that it is inherently _isolated_. If you're testing
correctly, you've eliminated all your confounding factors. You're just looking
at _one_ aspect of your system. The problem is that interfaces are inherently
tightly coupled. Everything tends to depend on everything else, and then you
start throwing in abstract concepts like "feel" and "friction" and general
"look" of the site (how well does it fit together? does it seem to have
personality? Does it respect the metaphors it creates? Is it consistent?). A/B
testing can't measure that kind of stuff. You also get into problems of local
maxima. This is essentially a hill-climbing algorithm. Unless you're willing
to occasionally start descending a hill, you're going to get stuck in an okay-
but-could-be-better hump forever.

In short: A/B testing is powerful. Don't abuse it. And for godsakes, design
your damn tests correctly.
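One concrete piece of "designing your test correctly" is not calling a winner on noise. A minimal sanity check, sketched with made-up numbers, is the standard two-proportion z-test:

```python
import math

def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """z-score for the difference between two conversion rates.
    |z| > 1.96 roughly corresponds to p < 0.05, two-tailed."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# 2.0% vs 2.5% conversion on 10k visitors each: z ≈ 2.38, significant.
# The same 2.0% vs 2.5% on only 1k visitors each is not (z ≈ 0.75).
```

Real tools (and libraries like statsmodels) do this and more, but even this much guards against the most common error: shipping whichever variant happened to be ahead when someone got impatient.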

------
jdietrich
This designer doesn't understand A/B testing.

In Ye Olden World of Direct Mail, everyone worked to beat the control. The
control, your most successful mailing piece to date, was the benchmark against
which everything was measured. You could modify the control, you could take a
completely novel approach, but your work only counted for anything if it beat
the control. Clients learned to ignore what they thought about a piece and
care only about the results.

What the industry learned was that the only sane way to develop a brand new
mailing piece was to imitate established controls. Books like Who's Mailing
What and Major Mailers provided an index of successful mailings. Copywriters
and designers came to understand the Darwinian nature of mailing and gave up
on trying to predict what works. They learned that artistry was at best
useless, at worst harmful. Their intuitions were completely wrong - the best-
performing mailings were almost universally ugly, stupid and reliant on cheap
tricks.

Outside of DM, designers almost never think in this manner. Throughout their
education, they are taught to produce work that is aesthetically pleasing to
the design world, with perhaps some concessions to accessibility. Aesthetics
don't pay the bills.

The purpose of A/B testing is to measure the effects of a decision on the
bottom line, but also to act as a driver for decisionmaking. A good DM
designer has no hesitation in making a mailing piece look like a payslip or a
bill, because he knows that such pieces work. His favourite font is Courier,
his favourite colour is fluorescent yellow, his favourite adjectives are
"splashy", "bold" and "eye-grabbing". He is blind to aesthetics and seeks only
to drive the recipient to the next stage - to open the envelope, to read the
headline, to the body, to the insert, to the coupon.

If you're looking to build a brand based on aesthetics, a designer is probably
the right choice. If you just want to sell some product, 99% of them are a
liability.

------
noelwelsh
The author is right about incremental A/B testing, though for the wrong
reasons. If I do tests:

    
    
      A vs B -> B
      B vs C -> C
      C vs D -> D
    

there is no guarantee that, say, A plus the changes in C wouldn't be better
than D. This is the danger of local optima vs global optima.
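A toy illustration of that danger (hypothetical conversion rates, assumed known exactly to keep the sketch short): one-change-at-a-time testing rejects each individual change, yet the combination beats everything.

```python
# Toy model: a page has two elements, each with two versions (0 or 1).
# Hypothetical true conversion rates for each combination:
rates = {
    (0, 0): 0.020,  # baseline
    (1, 0): 0.018,  # each change alone hurts...
    (0, 1): 0.019,
    (1, 1): 0.030,  # ...but together they win (the global optimum)
}

def greedy_sequential(start=(0, 0)):
    """Flip one element at a time; keep a flip only if it wins its A/B test."""
    current = start
    improved = True
    while improved:
        improved = False
        for i in range(2):
            candidate = list(current)
            candidate[i] ^= 1
            candidate = tuple(candidate)
            if rates[candidate] > rates[current]:
                current, improved = candidate, True
    return current
```

Here `greedy_sequential()` stays stuck at the baseline `(0, 0)` even though `(1, 1)` converts 50% better: exactly the local-vs-global optimum problem.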

I don't buy the rest of the arguments.

~~~
raganwald
I agree. The article isn't really arguing with A/B _testing_, it's arguing
with hill-climbing. In other words, it's not the questions that are the
problem, it's what you do with the answers.

HN is (at this time) programmer-centric. It ought to be obvious that you
cannot design the best program on a large scale by optimizing functions. You
have to start with a good design and then relentlessly optimize the bits. From
time to time you have to consider larger redesigns that would throw all your
previous optimizations right out.

Web/UX design is roughly analogous. Customer conversion, for example, is a
minefield. You might have two archetypical personae, Alice and Bob. If you
start with a design that appeals to the Bobs and the Alices equally, a small
change (sexist image?) might alienate the Alices but increase the appeal to
the Bobs. If the _immediate_ gain outweighs the loss, it looks good.

But further gains run out of steam when you have all the Bobs but no Alices:
You have reached a local maximum, as you state so economically. Had you tried a
different optimization that marginally increased appeal to both the Alices and
the Bobs, you might be able to continue to acquire both the Alices and the
Bobs over time, far outstripping the Bob-centric improvements that alienate
Alice.

But you'll never know if you went with the biggest immediate gain.

------
wpietri
It seems like this person's answer boils down to "because sometimes other
people use it wrongly," which strikes me as weak. Sure, it's true; he catalogs
a few ways the clueless can misuse A/B tests. But you can over-weight any
evaluative result. Including (cough, cough) a designer's own opinion of their
work.

He's wrong on one point, though: some designers definitely hate A/B testing
because they don't like being proved wrong.

A fair number of designers apply a genius-beyond-fathoming approach to
presenting designs. It works especially well for design firms; like many
consulting companies they sell to people who can't really evaluate their
output. As long as they act with confidence, clients think they're brilliant.

Those designers (whose inflexibility, let's be fair, we encouraged by putting
them through waterfall hell) have a hard time adjusting to iterative
processes. They have to stop pretending confidence in individual designs, and
instead develop confidence in their ability to fail skillfully. They are
forced to confront the fact that they aren't artists, but instead are
professionals who are there to accomplish particular business goals. And they
have to learn to compromise gracefully with people of different backgrounds.

Hating all that forced learning is perfectly natural.

------
ThomPete
As a designer myself I know where he is coming from. But it's also obvious
that he is building up a strawman.

First of all, when you do A/B testing you don't ask the user. You test which
composition gets the most clicks.

Second, A/B does not mean beautiful/ugly but, for instance, high contrast on
the call to action / low contrast on the call to action.

Third, you can't claim that there is no right design and then go on to claim
that A/B testing is wrong because it forces you to choose between two
alternatives. That basically contradicts his entire claim.

Fourth, there are cases where design is important and cases where it's not.
It's not always important, and good design is not always beautiful in the way
he seems to be talking about it.

At the end of the day better conversion must be the aim for any design. If you
want to do art you should become an artist.

Design matters a lot, aesthetics less so. In fact aesthetics are a rather
contextually relative thing and should be judged as such.

Only designers who think about A/B testing the way the OP does dislike it.
There are plenty who understand that it can actually lead to better design.

------
kelnos
_Users are not always the best judge of high quality design. That’s why you
hire expert designers of seasoned skills, experience, judgment, and yes the
conviction to make a call as to what’s better overall._

In the end, aren't the users the only judge who matter? You hire expert
designers to create an interface that your users judge to be pleasing, but
also intuitive and easy to use. The user's judgement is king. If your users
don't like your interface, it's game over. It doesn't matter how "cohesive"
your designer thinks it is.

 _How do you A/B test novel interaction models, conceptual paradigms, visual
styles (by the way, visuals & interactions have a two-way rapport, they inform
each other, can’t separate them–see Mike Kruzeniski’s talks) which may vary
wildly from before? Would you A/B test the Wii or Dyson or Prius or iPhone?
Against what???_

Rephrased: "I don't like A/B testing because you can't test everything using
A/B testing." Huh?

 _A/B testing locks you into just two comparative options, an exclusively
binary (and thus limited) way of thinking. What about C or D or Z or some
other alternatives?_

C'mon. The "A/B" in "A/B testing" isn't meant to limit choices. It's just a
convenient way of expressing the idea. There's nothing stopping you from doing
"A/B/C/D testing", or even starting out with 10 different ideas, and then
using other methods to narrow them down to a few choices that you can then A/B
test.

 _What if there are elements of A & B that could blend together to form
another option? Avenues for generative design options are shut down by looking
at only A and only B._

Sure, if you think of it in a ridiculously restrictive way. So do an A/B/C
test between A, B, and some A+B blend that you think might work. Or make it an
iterative process: do an A/B test, which let's say shows you B is better. Then
do another A/B test, with the original B pitted against a B with some elements
of A blended in.

If you look at A/B testing as the be-all-end-all and consider it the only tool
in your arsenal, of course you're going to get bad results. But it has its
uses in some places, and any designer who refuses to do A/B testing _at all_
just seems uninformed and unwilling to examine hard data about how an
interface might work.

------
mgkimsal
It boils down to "because they don't get to be in control." Or perhaps "they
have to answer to someone else." Or lastly "they don't like being told they're
wrong."

I'm not sure any developers like any of those situations either.

"My data structure is _killer_!. What's that? Oh, we now need to track more
info at a different level? But... that'll KILL all my ORM work! Users should
just deal with what's there - it already works!"

The 'ugly pastiche' issue - I'm not sure why it'd be an ugly pastiche. The
author seems to be suggesting that sections that 'won' from multiple disparate
tests would be thrown together into one interface without testing _that_
interface. The little a/b testing I've seen done tended to be more iterative
and sequential - design a vs b, then b vs c, then b vs d, then d vs e, etc.

------
bemmu
I used to think that the only possibly important metric of a page is how many
sales it leads to, and with an A/B test you can prove which design is better
and that is the end of discussion. I still mostly believe this.

Lately however I realized there is another metric which most A/B tests don't
measure (but which perhaps could be measured). That metric is, if a person
sees this variation of the page, how likely are they to tell people about the
service.

For example if I include some sort of "human interest" content on my landing
page such as my personal backstory on why I started the business, that might
make it more likely to get mentioned in blogs. But such a backstory really
doesn't mean much to people who are contemplating making a purchase and could
hurt conversions. Not including it could deprive the landing page of thousands
of leads. Including it could raise the conversion rate a bit, but there would
be fewer potential customers to convert.

When a customer looks at the page, you are trying to make a sale. When a
member of the media looks at the page, perhaps they would value different
things, maybe in that case those site aesthetics would be more important.

------
jvandenbroeck
Of course you shouldn't judge the character or quality of a professional with
an A/B test, that would be ridiculous (although I can imagine people doing
it).

BUT you should do A/B tests, if you do an A/B test with 1000k+ people and one
design converts 20% more than the other, why on earth would you keep the other
design? Just because the "professional" designer of the other design wouldn't
feel bad?

------
redthrowaway
It seems that the author isn't disaffected with A/B testing, but rather with
improper implementations thereof. A/B testing is intended to be scientific, so
it must therefore adhere to rigorous scientific principles of validity. Sure,
there are those who implement it incorrectly, but a properly implemented A/B
test should always have a (large) control group, against whom you can measure
the purported improvements. Such complaints as A/B testing inherently
fostering incrementalism that leads away from proper design can be countered
by implementing a "proper" control at each stage. Similarly, his complaint of
A/B testing being a binary proposition ignores the 0/1/many axiom of good
engineering. An A/B test that forces the user to choose between two
unattractive options is a stupid test. An A/B test that shows many options to
be unattractive is a good test with a dissatisfying result.

Rather than question the underlying validity of the methodology, the author
would be better served by questioning the validity of its implementation in
the instances he's encountered.

------
Isofarro
"Instead we get a unsightly pastiche of uneven incrementalism lacking any kind
of holistic cohesiveness or suggestive of a bold, vivid, nuanced vision that
inspires users."

Uh, you lost me at that point. Try an A/B test by varying "holistic
cohesiveness or suggestive of a bold, vivid, nuanced vision that inspires
users" with "consistent design integrity that inspires users".

------
mtogo
> _Users are not always the best judge of high quality design._

Yes they are, actually. If your users don't like the design, the design is bad
because the users are all that matter.

------
tworats
The author misses the goal of A/B testing: it's to improve a measurable
business objective (signups, retention, etc), not to pick a more aesthetic
design.

A/B testing may tell you to pick a "worse" design that results in better
business. That's perfectly ok. There are plenty of design showcases one can
browse for purely aesthetic appreciation of design.

------
rweba
Surprised no one has mentioned Doug Bowman's reasons for leaving Google, which
seem to have been in part because of extensive use of A/B testing. I don't
really know enough to comment on it, but it seems relevant:
<http://stopdesign.com/archive/2009/03/20/goodbye-google.html>

------
matthiasl
Is A/B testing a greedy algorithm?

I don't know much about A/B testing, nor have I done any. My first thought is:
how do you avoid getting stuck on local maxima?

(The wikipedia article on A/B testing mentions A/B..Z testing and
'multivariant testing', which may be an answer to my question, but they don't
seem to get much mention.)

~~~
kolektiv
That's actually a valid point with A/B and MV testing - you can get stuck on
local maxima. It's why the process still requires manual intervention and
design skills. Most flaws with A/B implementations are people/process or
misuse.

------
duopixel
I found that developers edge out designers at predicting the results of A|B
tests:

<http://method.ac/blog/design/devepers-designers-results.html>

------
lwhi
I think A/B testing in relation to design is a bit like living by the dice (à
la The Dice Man).

You only pick alternatives you'd be happy to see play out.

