

Why you only need 5 users for usability testing - henning
http://www.useit.com/alertbox/20000319.html

======
aasarava
Even if you disagree on the exact number of people you need to test with, the
key point here is that you don't need to have access to an expensive usability
lab to get good feedback on your product. Just looking over someone's shoulder
as they fumble their way through the interface you designed can be a huge eye
opener.

My partners and I have been building a family intranet service (Kinverge.com).
As soon as we had a working prototype together, we each sat down with various
members of our family and asked them to perform several tasks on the site
(register, invite others, add photos, post a message, etc.)

After the initial instructions, we kept our mouths shut and simply took notes.
And this is important -- don't jump in and try to help when the tester gets
confused or lost. Seeing the actions users make when they're lost is just as
important as seeing how they got lost in the first place.

The notes we gathered during these sessions later helped us decide what
changes to make to the site and in what priority.

~~~
lux
We've found that a brief interview afterwards, where you can ask what they
thought of certain things, is very helpful too.

We also do usability on the cheap, using ScreenFlow (varasoftware.com) which
is only $99 compared to $1500 or so for Morae, which seemed to be the industry
leader in usability testing. I think the analytics side, which was the big
difference, is overrated and unnecessary.

We're planning on writing to the ScreenFlow folks to show them the results of
our tests, since they didn't seem to know of any users using it for this yet
and it could be a good secondary market for them to get into...

I'm also curious about Silverback (silverbackapp.com) but wasn't able to get
into an early beta spot in time for our testing. It actually looks really
similar to ScreenFlow in any case.

~~~
metajack
I'd wanted to write an app for usability testing for a while, and then
ScreenFlow came out and my first thought was "This is exactly what I wanted,
and even better to boot." I have a hard time believing no one else is using it
for this.

I'm glad it's working out well for you. We'll be starting to use it for this
in the near future.

------
timcederman
Ah, this old chestnut.

First of all, I love the fact that Nielsen thinks you can quantify "usability
problems found" as being on a measurable scale. To think you can identify
"100%" of usability problems with 15 users has always cracked me up.

I completely agree with his suggestions of small groups of users and iterative
design. However, the problem with his approach is that it's like a shotgun: as
he admits, you need several users just to make sure you cover the
non-overlapping problem areas.

What I've found is that while it's good to run sessions with groups of 'naive'
users, a few targeted, _repeated_ sessions with individual users who match the
personas being designed for are incredibly useful, and produce results that
are surprisingly universal. Not only does the usability engineer discover new
things about users each time someone comes in; each time the user comes in,
they find something new too. Repeated tests with individuals are a method that
is rarely covered, except perhaps in user-centered and participatory design
approaches.

~~~
olavk
Why can't you quantify usability problems found? Isn't it just a matter of
counting? More users yields diminishing returns in number of problems found,
and with 15 users you basically find all problems that you will find using any
larger number of users. That sounds sensible to me. What am I missing?
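The math behind the claim is easy to check. The linked article models the
share of problems found with n users as 1 - (1 - L)^n, where L = 31% is the
average proportion of problems a single test user uncovers in Nielsen and
Landauer's data. A quick sketch of that curve:

```python
# Nielsen & Landauer's diminishing-returns model from the linked article:
# with L = 31% of problems found per user (their reported average), the
# expected share of discoverable problems found by n users is 1 - (1 - L)^n.

def proportion_found(n, L=0.31):
    """Expected share of discoverable usability problems found by n test users."""
    return 1 - (1 - L) ** n

for n in (1, 5, 15):
    print(f"{n:2d} users: {proportion_found(n):.1%}")
# 1 user ~31%, 5 users ~84%, 15 users ~99.6% -- returns diminish fast,
# but the curve only approaches 100%, it never reaches it.
```

Note the caveat: the "100%" here is relative to the problems this kind of
testing could ever surface, not to every problem that exists in the product.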

~~~
donal
The issue is with false negatives (not finding real problems). It is hard to
say that 100% of issues were found, because how can you quantify the number of
problems that weren't found?

The claim of this article that you can somehow quantify the percentage of
usability problems is rather absurd. How can you quantify the total number of
usability issues in a program a priori?

Also, I don't think that graph ever reaches 100%; I think it only approaches
100% asymptotically (but then I'm not a math whiz, so I will gladly accept
correction).

They've successfully performed statistical sleight of hand. The assumption
that you can quantify the total number of defects is untenable, but they leave
that part out of the article and just show the nice charts and math to lend
strength to their broken hypothesis.

This is unfortunate, because there are probably some great insights in this
article, and the intent is good.

Usability and Human-Centered Computing/Human-Computer Interaction have a
tendency to suffer from this type of "pseudo-science": using fuzzy statistics
to present great-sounding findings. I was made aware of this trend by the
following article:

Wayne D. Gray and Marilyn C. Salzman, "Damaged merchandise? A review of
experiments that compare usability evaluation methods", Human-Computer
Interaction, vol. 13, pp. 203-261, 1998.

~~~
olavk
Well, the idea of a usability problem that is never found sounds pretty
metaphysical to me. I mean, if no one actually experiences the problem, then
how is it a problem? Also, how can a problem experienced by a user be a false
positive?

Nielsen just claims that if you test with 15 users, you can be pretty sure you
have found all problems you will ever find - you most likely won't find any
new problems by using an additional 15 or 50 tests. (But obviously you cannot
be 100% sure.)

But whether his numbers are based on sound research, I won't judge.

~~~
donal
The problem isn't with Nielsen's recommendations, it is his methods. He
presents his findings as being based on scientific research, but unfortunately
his methods are deceitful.

Have you ever seen the infomercial for "Dual Action Cleanse" with Klee Irwin?
Irwin and Nielsen use very similar methods. They make claims that pass the
common-sense test, then they use something resembling science to "prove" that
they are spouting fact. They are both trying to sell you something; Nielsen
has just found a much more profitable customer.

I imagine selling super-laxatives to homebodies and the sleep-deprived isn't
anywhere near as lucrative as selling consulting services to businessmen.

The problem is that Nielsen's "research" is dangerous to the field of HCI and
usability. Anybody who uses his "findings" to support their own research is
building a house on a broken foundation.

For his business clients this "academic navel gazing" doesn't matter, they
probably see the results they were looking for (since he already told them
exactly what to look for). Irwin's customers probably get the results they
were after too, and whether or not the FDA monitors the claims made by "herbal
supplements" probably doesn't bother them much either.

------
ahsonwardak
In our experience, this has been correct. It may be better said that a lot of
time spent with one user walking through and testing the UX of a site is worth
more than several shorter tests with more people. If you repeat that deep dive
with a few users, you'll quickly see the major UX fixes that are needed.

Additionally, the most obvious UX fixes are the ones that will greatly improve
your site anyway. As more UX fixes are suggested, you'll find that they are
more a product of differing user preferences than of genuinely needed UX
fixes. That's not a UX fix; it's a case for making certain parts of your site
customizable in look and feel.

------
DaniFong
I think this is incorrect for social software, where the dominant
interactions are between users, mediated by the culture. Traditional usability
testing bears mainly on user interaction with the software. There is usually a
limited set of goals for users, so a sample of five will test many of them.
That's not true for social software.

~~~
timcederman
Google uses screeners to target people who have existing social networks, and
brings them in to trial new applications that make use of those networks.

------
serhei
It probably depends on how "focused" your software is. If you have a very
specific user in mind, you can test against as few as 3 users that fit the
description, just to see if your specific user actually works the way you
thought. If your software is trying to appeal to a wide range of users you
might have to go up to 15 or even more.

~~~
henning
First let me say I've never done usability testing.

But from what I've read, the right way to do it would be multiple distinct
tests, each with a group of 3-5 users, with one group for each major user
category. Lots of tests with small user groups are much better than fewer
tests with more users.
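A toy calculation shows why splitting the budget this way can pay off.
Assuming, purely for illustration, three user categories with equally sized,
non-overlapping problem sets, and Nielsen's reported L = 31% per-user
discovery rate within a category:

```python
def found(n, L=0.31):
    """Nielsen-style estimate: share of one category's problems found by n users."""
    return 1 - (1 - L) ** n

CATEGORIES = 3  # hypothetical: three equal, disjoint problem sets

# 15 users drawn from a single category only ever see 1/3 of all problems:
one_big_group = found(15) / CATEGORIES    # ~33% of all problems

# 5 users from each of the three categories cover every problem set:
three_small_groups = found(5)             # ~84% of all problems

print(f"one group of 15:   {one_big_group:.1%}")
print(f"three groups of 5: {three_small_groups:.1%}")
```

The disjoint-categories assumption is the extreme case; real categories
overlap, which shrinks (but doesn't erase) the advantage of multiple groups.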

------
vaksel
I don't think 5 users is the right sample size. Let's face it: if you are a
techie, chances are you'll get other techies to test your stuff. So what's
going to happen when an 80-year-old man logs onto your site and does something
to crash it?

You know, the combination of inputs that you would never suspect a sane person
would ever want to try?

~~~
timcederman
Isn't that more of a QA problem rather than a usability problem?

Also, the business side of usability is "are we designing for an 80-year-old
demographic?" If not, it's hard to justify his concerns.

~~~
eru
If it works for him - it works for everyone.

~~~
timcederman
That's a simplistic conclusion to make - how do you back that up?

~~~
eru
If you design it so that he can use it, but provide shortcuts for more
advanced users, you can reach quite a lot of people.

~~~
timcederman
Now you're confusing an 80 year old man with all 'basic' users.

