
You Only Need to Test with 5 Users (2000) - azhenley
https://www.nngroup.com/articles/why-you-only-need-to-test-with-5-users/
======
an_opabinia
This is an excellent point, and it has some even more fascinating corollaries:

- a product designer/manager of something with 1,000,000 users won’t learn
more about usability than a product designer/manager of something with 15
users. All those flow measurements and secret at-scale analytics data are
mostly worthless for the purposes of usability.

- people with 15 users’ worth of learning about usability instead of 0 users
are way undervalued, while people with 1,000,000 are way overvalued

- “we don’t know until we test it,” the #1 refrain of big company design
nowadays, is intellectually bankrupt for most free software, since if it
looked bad for 15 users it’s probably going to still be bad for 1,000,000

- after you’ve shown something to 15 people and they don’t like it due to
usability problems, you’re extremely unlikely to find 1,000,000 more who will
like it.

This also appeals to my instinct that there is something learnable about
design, that great design is achieved by people and not by massive datasets.

~~~
noir_lord
It might even be worse still, since this model seems (I may be wrong) to
assume that the probability of discovering any given usability bug is
constant; it might be that the share of bugs discovered is skewed towards the
first few users, such that the first user finds more than the formula would
predict.
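
A quick Monte Carlo sketch of that conjecture (all numbers here are
illustrative assumptions, not measurements): give the problems heterogeneous
discovery probabilities with the same 31% average that Nielsen's formula
uses, and the curve flattens sooner, so the first user accounts for a larger
share of everything the study will ever find.

```python
import random

# Illustrative sketch: constant per-problem discovery probability (Nielsen's
# assumption) vs. a skewed mix with the same ~31% average. Numbers are made up.
N_PROBLEMS = 20
TRIALS = 5_000

def problems_found(per_problem_p, n_users):
    """Distinct problems found when each user independently hits
    problem i with probability per_problem_p[i]."""
    found = set()
    for _ in range(n_users):
        for i, p in enumerate(per_problem_p):
            if random.random() < p:
                found.add(i)
    return len(found)

constant = [0.31] * N_PROBLEMS             # the classic assumption
skewed = [0.90] * 5 + [0.11] * 15          # same ~31% mean, but skewed

for label, probs in (("constant", constant), ("skewed", skewed)):
    for n in (1, 5, 15):
        avg = sum(problems_found(probs, n) for _ in range(TRIALS)) / TRIALS
        print(f"{label:8s} n={n:2d}: {avg / N_PROBLEMS:.0%} of problems found")
```

Under the skewed mix the first user still finds about 31% of problems, but 5
users find roughly 58% rather than the formula's 84%, so user #1 contributes
a noticeably bigger slice of the study's total haul.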

It's certainly been my observation that cynical developers who test things as
they go, by deliberately feeding silly inputs into the stuff _they_ just
wrote, seem to get hung up less in testing.

Take the system I inherited at work: the first thing I did when I got an
instance spun up was put a negative value in the quote line quantity (which
immediately broke... well, almost everything), then decimal values in quantity
fields where only integers made sense, then text in number fields, and so on,
each time breaking something in a new and interesting way.

Sometimes I think it's hard not to be cynical about enterprise systems.

My old lecturer put it somewhat pithily: "Almost all the testing in the world
means nothing compared to 15 minutes in the hands of the 17-year-old office
junior."

~~~
HeWhoLurksLate
One of my friends was _literally hired as an intern to try to break software_
last year. He loved it, and found a ton of bugs, which was really helpful to
the company; they eventually paid him a $1,000 bonus for his help over the
summer.

~~~
noir_lord
I can believe it; the best person to test software for low-hanging fruit is
the one who asks, "What happens if I do this thing that no sane person who
knows anything about what the software is supposed to do would do?"

It's one of the reasons why I don't trust myself to test things fully: we
write the software with all sorts of assumptions in our heads and
subconsciously steer away from doing silly things - in that context it's
really difficult to aim at a point a zero-knowledge user would hit.

~~~
QuinnWilton
I've been building an API Security Scanner, and one of the things it does is
just fuzz every endpoint with garbage in each parameter to look for
stack traces, errors, etc.

More so than any of the security tests I've written, that fuzzing has broken
every enterprise API our customers have thrown at it.
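
As a sketch of the idea (the endpoint list, garbage values, and error markers
below are my illustrative assumptions, not the scanner's actual
implementation):

```python
import requests

# Minimal dumb fuzzer: throw garbage at every parameter of every endpoint and
# flag responses that return 5xx or leak a stack trace. Assumes a service is
# running on the base URL; everything here is illustrative.
GARBAGE = ["-1", "0.5", "''; DROP TABLE--", "<script>", "\x00", "9" * 10_000, "🦑"]
TRACE_MARKERS = ("Traceback (most recent call last)", "java.lang.", "at System.")

def fuzz(base_url, endpoints):
    findings = []
    for path, params in endpoints:  # e.g. ("/quotes", ["quantity", "price"])
        for param in params:
            for value in GARBAGE:
                resp = requests.get(f"{base_url}{path}", params={param: value}, timeout=10)
                if resp.status_code >= 500 or any(m in resp.text for m in TRACE_MARKERS):
                    findings.append((path, param, repr(value), resp.status_code))
    return findings

if __name__ == "__main__":
    for finding in fuzz("http://localhost:8000", [("/quotes", ["quantity"])]):
        print(*finding)
```

Even this naive version tends to surface the negative-quantity and
type-confusion failures described upthread.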

------
miki123211
I don't have much to do with UX, but I will add one insight: try finding users
with esoteric ways of working, and ensure the site works for them. By esoteric
I mean blind users of screen readers, users of old, non-JS browsers, people on
corp or school networks where things may be blocked, people on extremely small
screens or extremely large ones (TVs with remotes), people who don't own or
don't want to use a credit card (there's a widespread credit card phobia in
e.g. Poland when it comes to online purchases), people using a different
language and keyboard layout (particularly right-to-left scripts, significant
when it comes to desktop apps), etc. I'm a screen reader user myself, and I
constantly find websites that might be beautiful but are utterly inaccessible.
I've either encountered, or witnessed, usability difficulties in all of the
categories I outlined. For each one, I could provide an example of a website
or app that I or someone else had to abandon for just that reason, and this is
just me. I'm sure there are more niches I haven't thought about.

~~~
iamaelephant
This is going to depend heavily on your target market. In many of the SaaS
applications I have been involved in we really don't care about users with
non-JS browsers, or extremely small screens, or TVs, or people without credit
cards. Some of the applications I have worked on will never be translated.

~~~
wwweston
Two observations:

1) A site that isn't ready for a screen reader probably isn't ready for a
voice browser or other non-visual user agent. Say, an AI/digital assistant. Or
perhaps even a search engine indexing bot (though for the last 10-15 years the
incentives behind this often mean people will invest heavily, even non-
cooperatively, in this specifically).

2) Projects whose engineering takes into account, from the get-go, the
possibility of serving content to non-JS or odd-screen user agents are often
better technically. This isn't a statistically investigated theory, but my
hunch is that if you're really doing the REST/resource-oriented thinking
necessary to make a non-trivial app display responses as minimal markup,
you're also doing the thinking necessary to make a good API to be consumed by
a dynamic or even SPA front-end. And vice versa: if you've got a good API for
a dynamic-heavy or SPA front-end, it's not difficult to represent the given
resource with the media type HTML instead of JSON. Which means if it _is_
difficult to represent a resource as HTML instead of JSON, something's
probably not right with how your app is put together.

I wouldn't go so far as to say _every_ app needs to be plain-HTML +
accessibility focused, but I think there are benefits that go beyond the
margins of users with direct accessibility issues.

~~~
derp_dee_derp
UX engineering manager here for a Fortune 100 company.

We don't care about any of the use cases in your comment. If our JavaScript
doesn't work in your browser, you are a security risk and we don't want our
site to work on your browser.

Please call our 1-800 number to talk to a representative.

~~~
saagarjha
> If our JavaScript doesn't work in your browser, you are a security risk and
> we don't want our site to work on your browser.

Sorry, what? Why am I a security risk for not wanting to run the arbitrary
code that your website sends me? (I browse with JavaScript turned on, FWIW, so
this is a hypothetical question for me.)

~~~
adrianN
Some JavaScript is used to detect bots.

~~~
saagarjha
Client-side JavaScript to detect bots seems doomed to fail, unless it's some
sort of proof-of-work kind of thing.
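
For what that might look like, here's a toy proof-of-work handshake (the
scheme and difficulty below are my illustrative assumptions, not any real
site's bot defense):

```python
import hashlib
import itertools
import secrets

# Toy proof-of-work: the server issues a random challenge; the client must
# find a nonce whose SHA-256 digest falls below a target. Difficulty is
# illustrative: ~65k hashes on average, cheap once, costly at scraper scale.
DIFFICULTY_BITS = 16

def make_challenge() -> str:
    return secrets.token_hex(16)

def solve(challenge: str) -> int:
    target = 1 << (256 - DIFFICULTY_BITS)
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce

def verify(challenge: str, nonce: int) -> bool:
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - DIFFICULTY_BITS))

challenge = make_challenge()
nonce = solve(challenge)  # the per-request cost a bot must pay
assert verify(challenge, nonce)
```

A browser pays this cost once per challenge; a bot hitting thousands of pages
pays it thousands of times, which is the "make it more expensive" economics of
the reply below.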

~~~
adrianN
All bot detection is doomed to fail, but you can make it more expensive for
the bot authors.

------
achow
This (the 5-user theory) has been questioned.

From a 2003 research paper (Faulkner, BRMIC vol. 35):
[http://www.simplifyinginterfaces.com/wp-content/uploads/2008...](http://www.simplifyinginterfaces.com/wp-content/uploads/2008/07/faulkner_brmic_vol35.pdf)

Excerpt:

Historic reason: Both Nielsen (1993) and Virzi (1992) were writing in a
climate in which the concepts of usability were still being introduced... They
were striving to lighten the requirements of usability testing in order to
make usability practices more attractive to those working with strained
budgets.

Conclusion: It is advisable to run the maximum number of participants that
schedules, budgets, and availability allow. The mathematical benefits of
adding test users should be cited. More test users means greater confidence
that the problems that need to be fixed will be found; as is shown in the
analysis for this study, increasing the number from 5 to 10 can result in a
dramatic improvement in data confidence. Increasing the number tested to 20
can allow the practitioner to approach increasing levels of certainty that
high percentages of existing usability problems have been found.

~~~
chakintosh
A few weeks ago, I tested an app with 50 users from different backgrounds,
affinities, abilities, etc.

Towards the end, we realized the last 20 or so tests had been a waste of
time. Issues and improvements that arose from the first 20 or so tests kept
repeating themselves throughout every remaining session.

This sure increases confidence in your data, but when you're in an MVP stage
or you don't have much funding, you're really better off testing with around
15 to 20 people and fixing the issues that they find, because most likely
those issues are in fact very problematic and deserve more priority. More
users will just yield more granular bugs and issues that you can schedule for
later.

------
tedivm
The author ends by saying you need five users from each distinct type of user,
with the example of parents and children.

Unfortunately the author neglected to mention that the vast majority of
projects already have multiple distinct groups: abled people, blind people,
and deaf people are just a start. In many places these are legally required
considerations.

------
linuxdude314
The 10x engineer uses no users for the test, but instead microdoses during
testing for an altered subjective experience.

~~~
swagasaurus-rex
I think I discovered a way to become a 420x engineer.

------
awillen
This is one of those things that takes a very abstract concept and tries to
boil it down a bit too far with a mathematical model. Also, the 5 number in
the headline is just misleading, since he clearly points out that the real
number is 15.

The reality is that the number of people you need to test with to get the
right number of insights (along with the depth of testing for any given user)
is going to vary drastically across products of varying purposes and level of
complexity. 5/15 users may be a reasonable average, but this is a case where
an average of many different things isn't a particularly useful measure for
any one of those things.

That doesn't even take into account the quality of the people you're testing
with. Five experienced testers are different from five people with domain
expertise but no testing experience, who are in turn different from five
people off the street.

~~~
zild3d
> Also, the 5 number in the headline is just misleading, since he clearly
> points out that the real number is 15.

The 5 number isn't misleading at all; the author shows that it's the 80/20
point. The first 5 users in your usability test give you 85% of the value of
testing with 15. The takeaway is not to run the same exact test with 15 users,
but to do an iteration with 5, fix what you find, and rinse and repeat.

"Let us say that you do have the funding to recruit 15 representative
customers and have them test your design. Great. Spend this budget on 3
studies with 5 users each!"
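
The curve behind that claim is the article's own formula,
problems_found(n) = N(1 - (1 - L)^n) with L = 31%, which a few lines of
Python reproduce:

```python
# The article's curve (Nielsen & Landauer): the fraction of usability problems
# found by n users, where L = 31% is the average share one user uncovers.
L = 0.31

def fraction_found(n_users: int) -> float:
    return 1 - (1 - L) ** n_users

for n in (1, 3, 5, 10, 15):
    print(f"{n:2d} users: {fraction_found(n):5.1%} of problems found")
# 5 users: ~84.4%; 15 users: ~99.6% -- hence "3 studies with 5 users each".
```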

------
zhoujianfu
A friend who worked at Google told me they would invariably get better insight
on usability by just following a few individual users through a task than by
analyzing their zillions and zillions of site visits.

------
dalbasal
The caveat here is that it depends on what you mean by "test" and what
"insights" you are interested in.

If you are interested in (for example) determining the optimal price and only
1 in 20 users buys something... you still probably need about 1,000 × (the
number of price points you want to test).

In that "test" you are basically trying to uncover the demand curve (or points
on it). It's a statistical question.
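
To put a rough number on that (my illustrative rates, not the commenter's),
the standard two-proportion power calculation shows how fast the required
sample grows for small conversion differences:

```python
from math import ceil

# Sample size per variant to detect a purchase-rate lift from p1 to p2 at
# alpha = 0.05 (z = 1.96) and 80% power (z = 0.84), normal approximation.
def n_per_variant(p1: float, p2: float, z_alpha: float = 1.96, z_beta: float = 0.84) -> int:
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

print(n_per_variant(0.05, 0.06))  # ~8,150 users per price point tested
```

Detecting a one-point lift on a ~5% purchase rate already takes thousands of
users per price point; the comment's ~1,000 figure corresponds to detecting
much larger differences.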

Say you have a dating app, and you are trying a new matching algorithm. It
will also take thousands of matches before you have the data to make
determinations.

All that said, I totally agree with the author. I would just frame it
differently.

The question you need to ask is _"do I need statistics?"_ Statistics have
become habitual, but much (most?) of the time, we don't need them.

If you want to learn whether a user can write and publish a blog using your
software, or install a water filter under their sink... you don't need
statistics. You need to know where most people get into trouble, and n=5 will
work fine for that.

This is intuitive if we just think outside of the "testing" vocabulary. You
write a CV/essay/article. You ask 1-3 friends to read it and advise. You don't
produce statistics.

~~~
arathore
The article talks specifically about usability testing; it isn't in the title,
but it is in the first line of the article. I don't think pricing strategies
or matching algorithms and such would fall under this domain.

------
anbop
Also... when you have 15 users (and growth is slow) it's because you
satisfy a unique need. These users are actually willing to talk to you for
HOURS because they need your product, they know it’s niche, and that their
feedback can actually affect the product development. Speaking from
experience, I had a customer fly to ME to give me feedback.

~~~
piyush_soni
Wow. Skype/Email didn't suffice? :)

------
calmchaos
Some key questions:

1. How extensively do those 5 people test the software? Do they test all
features or just part of the software?

2. What is the background of those 5 people testing the software? Do they
understand UX/good UI design, and how well?

3. Are these 5 people just random users or professional test engineers?

4. How passionate are these 5 people about the product/service they are
testing? How meaningful is it for them that the product actually works _really
well_?

5. What is the quality level of feedback these 5 people can provide? Is it
"meh, this is ugly," or is it detailed, concrete, and full of practical
improvement ideas that can be easily implemented?

------
Kluny
Ha, as a web designer I long for the ability to test with even one user before
launch day. Rarely do I get room in the budget for that :(

~~~
bravura
You don't have $10? Come on.

~~~
Etheryte
This might sound condescending, but it is a good point. Anyone can be a test
user; you don't need to formally hire people with a contract, etc. (unless
your work is contractually classified, but that's a whole other bag of beans).
You can ask colleagues from other departments, people at a coffee shop,
friends, etc.

~~~
tobbebex
It depends on the userbase. I write medical imaging software intended to be
used by MDs, and can’t just ask a random person at a coffee shop to try to
establish a certain cancer diagnosis using our software. Finding candidates in
that target group who have time for usability testing can be quite a
challenge.

------
nitwit005
The fundamental assumption here seems to be that users are basically
interchangeable:

> There is no real need to keep observing the same thing multiple times

That may be true for some simpler products, but I helped out with some user
research on an analytics tool, and there was quite a diversity of feedback
from the first two batches of users.

------
gdcohen
There is a trend towards not testing at all! Instead, builds are deployed
straight to production or to a canary (a mini-subset of production) and then
very carefully and closely monitored. If a problem is uncovered, a rollback is
performed. If canary is done well, the problem can be caught before it has
widespread impact.
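
A minimal sketch of that monitor-and-rollback loop (the metric source is
simulated and the deploy actions are just prints; in a real system these would
be a metrics query and a deploy-tool call):

```python
import random
import time

# Illustrative canary watcher: check the canary's error rate periodically,
# roll back on a breach, promote after a clean bake period.
ERROR_RATE_THRESHOLD = 0.02   # roll back if canary 5xx rate exceeds 2%
CHECK_INTERVAL_S = 60
BAKE_TIME_CHECKS = 10         # promote to full production after 10 clean checks

def canary_error_rate() -> float:
    return random.uniform(0.0, 0.03)  # stand-in for querying your metrics store

def watch_canary() -> str:
    for _ in range(BAKE_TIME_CHECKS):
        rate = canary_error_rate()
        if rate > ERROR_RATE_THRESHOLD:
            print(f"error rate {rate:.1%} over threshold: rolling back")
            return "rolled back"
        time.sleep(CHECK_INTERVAL_S)
    print("canary clean: promoting to production")
    return "promoted"
```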

~~~
imhoguy
By "not testing" you mean automated and no manual testing. Still somebody
needs to manually test some cases to write them down next as a code to cover
constantly changing product, but even the best automation won't resolve
unknown unknowns. Canary is only a way to reduce one deployment impact.

------
CodiePetersen
Wow, great article and perfect timing for me. I'm about to go into the testing
phase myself and this is something I've never considered. I was worried
because we're an indie group and I wasn't sure how we were going to get a lot
of people, but it looks like smaller is good enough and even optimal.

Thanks.

------
EGreg
I totally disagree that this is true for most sites today.

Usability these days is not just about what a single user will do. We build
multi-user apps, so the following things are emergent phenomena of actions
MANY people take:

- Engagement
- Retention
- Viral spread
- Collaboration
- Notifications
- Real-time updates
- Chicken-and-egg problems

To test these things, you often need tons of real or fake accounts. You get
some people trying interest X, and others interest Y. Sometimes things with
the exact same interface usability take off in one country and not another.
Like Orkut in Brazil!

Sites sometimes arrive at breakthroughs by A/B testing many things
automatically across millions of sessions. That’s far more efficient but
requires a large enough sample.
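
For scale, here's the arithmetic behind one such automated comparison (the
conversion counts are made up): a two-proportion z-test between variants.

```python
from math import erf, sqrt

# Two-proportion z-test on conversion counts, two-sided p-value via the
# normal approximation. Counts below are illustrative.
def two_proportion_z(conv_a, n_a, conv_b, n_b):
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

z, p = two_proportion_z(4_800, 100_000, 5_150, 100_000)
print(f"z = {z:.2f}, p = {p:.4f}")  # tiny lifts need sessions in the millions
```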

In fact, most famous sites are famous because they successfully got a lot of
people to keep coming back and doing something. They probably got them to
invite friends. And so on.

------
dbg31415
I did some work for a major telco and we weren't allowed to do user testing
outside of the company courtyard.

It consistently blew my mind how "out there" the feedback was after about half
a day's worth of testing.

So I don't know about 5 users, but I can say having about 25-50 is good for
getting a broad sample.

If you get the wrong 5 users... you're going to get some really skewed
results. For example, if you grab a group of 5 people and all of them are in
tech your results are going to be dramatically different than if they are 5
people from accounting, or 5 people from janitorial services.

------
z3t4
I think the key here is to find five representative customers, not just five
random people. These users can be hard to find unless you have good market
fit. But if you have very good market fit, my experience is that users will
overcome just about every hurdle. It's like digging for gold and other
minerals: people will do a lot of work if the payoff is valuable. But if it's
not as valuable, you might need to hand it to them on a silver platter.

------
TrialError
I have loved this rule ever since I came across it as a student in 2002, and I
have been using it successfully for user acceptance testing strategies in
large projects.

I interpret it as a sort of fixed Pareto principle for projects where you
limit effort and team size for maximum gain. N=5 also happens to be close to
the ideal team size in agile frameworks. This rule is smart in a lot of ways
and was ahead of its time.

------
andy_ppp
I find it very irritating where I am that the design team doesn’t engage with
the programmers’ concerns about a design’s difficulties and instead just
points to user research. A lot of design issues can be fixed by talking to not
just your users but any human walking by. Hallway usability testing works
well, and just about anyone, even a programmer, will make your design better
with a fresh pair of eyes.

------
buro9
I always worked on the premise: "Find five people who care"

It's nice to get the "five" verified, but I still think it's important to make
sure they care, to have them be an essential part of making the product
better. The trick, though, is not to be drawn into making anything for any one
of these users (it still has to be generically usable for all users).

------
hammerbrostime
Even 3 users has a surprisingly decent return. Test _something_, for crying
out loud; a minimal investment provides a great deal of insight.

~~~
soup10
Most of what you need to know to fix usability issues will be discovered
pretty quickly just by watching a single "regular" person use it.

------
Jeff_Brown
This takes into consideration the benefits of more users, but not the cost. If
the cost of adding one more test user is sufficiently small, it can be optimal
to test with lots of them.

(Roughly, the economic rule here is that you keep doing something until the
marginal cost is greater than the marginal benefit.)

------
blue_devil
Usability testing is more like brainstorming. Those 3-5 users may tell you
what _some_ usability issues are - but they won't tell you how big those
issues are, for your purposes. With imperfect a priori information on who the
users of your thing are, you have to go back to assumptions.

------
yannisc
[https://www.userfeel.com](https://www.userfeel.com) has a calculator on its
homepage that shows the percentage of problems found based on the number of
test users.

------
baxtr
Reminds me of the “small sample size” misconception

[https://www.dailykos.com/story/2008/11/9/656465/-](https://www.dailykos.com/story/2008/11/9/656465/-)

~~~
dredmorbius
Very true, though this can cut several different ways.

I've done several measurements of various aspects of use and engagement on
Google+, as an independent outsider.

From a sample of fewer than 100 randomly selected profiles (of 2.2 billion
total), it was clear that the active fraction was about 10%, and the _highly_
active fraction a minuscule portion of that.

I ended up checking 50k profiles, and another (fully independent) analysis by
Stone Temple Consulting of 500k profiles largely re-demonstrated the initial
9% finding. But, with more profiles, it was possible to dial in on the very
few (about 0.16%) highly active users. Which is another sampling problem --
you're looking for the 1 in 1,000 users who are active, and need to get a
sufficient sample (typically 30-300) of those. Let's call it 100, for round
numbers.

Which means you're looking for a sample of a one-in-a-thousand subpopulation,
meaning that 100 rare high-use users requires sampling 100 * 1000 = 100k of
the total population.

(This presumes no other way of subsetting the population.)
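
The arithmetic generalizes to a one-liner (figures taken from the comment
above):

```python
from math import ceil

# To observe an expected k members of a subpopulation with prevalence q,
# you need roughly k / q total samples.
def total_sample_needed(k_target: int, prevalence: float) -> int:
    return ceil(k_target / prevalence)

print(total_sample_needed(100, 1 / 1000))  # 100,000 profiles, as above
print(total_sample_needed(100, 0.0016))    # ~62,500 using the measured 0.16%
```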

I ran into this looking at G+ Communities -- 8 million in total, with again a
very small fraction of highly-active ones. Initial samples of 12k and 36k
subsets were useful (and tractable on a residential DSL connection), but it
was being gifted with a full 8-million record population summary of community
activity that allowed full and detailed statistics to be calculated.

[https://social.antefriguserat.de/index.php/Migrating_Google%...](https://social.antefriguserat.de/index.php/Migrating_Google%2B_Communities#Google.2B_Community_Characteristics_and_Membership)

------
itsmurat
The 80/20 rule applies here: you will find 80-90% of usability problems when
you test with 5 people. In today’s world, though, we need predictive analytics
to increase this coverage, and to start creating customized journeys.

------
casper345
But doesn't the Law of Large Numbers disagree with this? Yes, the first 5
users (from a random sample in the target market) could agree, but only when
you get to larger numbers do the real observations come out.
------
amelius
Ok, but how are you going to test for accessibility?

~~~
RubenSandwich
By bringing in people with accessibility needs. A few ways:

1. Large cities have accessibility meetups; go to one, strike up a
conversation, and offer to pay someone to user-test your software while you
watch.

2. Hire a company/contractor that specializes in accessibility audits.

3. Hire someone with an accessibility need. They will be unable to do their
job till you fix your accessibility problems.

------
piyush_soni
Compare that to Gmail, which was in beta for more than 5 years. How many
'test' users did they need? :)

------
julienreszka
Anybody who has studied a bit of probability theory knows that this is wrong:
statistically significant sampling can't be done with only five users.
Confidence will only be reasonably sufficient for a population of 5 (at most),
which is as if your product were addressed to only 5 people (unlikely; you
probably have way more than 5 users).

See this page for help computing sample size as a function of population and
confidence:
[https://www.surveymonkey.com/mp/sample-size-calculator/](https://www.surveymonkey.com/mp/sample-size-calculator/)

This article probably keeps being reposted because some people try to save
money on user testing

...and that's probably why we keep having sh*tty products out in the wild.

~~~
ivalm
Yes and no; people have relatively convergent views on usability.
Statistically, you can think of a proposition A: "a user from set U affirms
that object O has property Q." You then sample opinions from U on whether this
proposition is true. Each sampling is a Bernoulli trial parametrized by

p = Prob(A = True)

The standard error of the mean is then

sqrt(p*(1-p)/N)

where N is how many users you sampled. Suppose people are convergent in their
opinion (either p=0.99 or p=0.01); then even with N=5 the uncertainty in the
mean is less than 5%!

To make a concrete example, you only need to ask very few users if a
particular object is white to be fairly confident whether the majority of
people would consider a particular object to be white.
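
Checking that arithmetic with a two-line sketch of the formula above:

```python
from math import sqrt

# Standard error of a Bernoulli mean: sqrt(p * (1 - p) / N).
def sem(p: float, n: int) -> float:
    return sqrt(p * (1 - p) / n)

print(f"{sem(0.99, 5):.3f}")  # ~0.044: under 5% at N=5, as stated
print(f"{sem(0.50, 5):.3f}")  # ~0.224: divided opinions need far more users
```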

~~~
rosser
That is to say, if all five of the users with whom you've tested your
application say it's confusing, or it sucks somehow, it is diminishingly
likely for that population to be the outlier [0], and if only you had tested
with a few tens or dozens — let alone _thousands_ — more, you'd see the _true_
pattern...

Yes, statistically, it's _possible_ for outliers to bunch like that. It's
also, statistically, far less likely.

[0] Assuming, for sake of argument, a nominally representative test group.

