You Only Need to Test with 5 Users (2000) (nngroup.com)
523 points by azhenley 10 months ago | 107 comments

This is an excellent point, and the corollaries are even more fascinating:

- a product designer/manager of something with 1,000,000 users won’t learn more about usability than a product designer/manager of something with 15 users. All those measurements of flows and secret at-scale analytics data are largely worthless for the purposes of usability.

- people with 15 users’ worth of usability learning instead of 0 are way undervalued, while people with 1,000,000 are way overvalued

- “we don’t know until we test it,” the #1 refrain of big-company design nowadays, is intellectually bankrupt for most free software, since if it looked bad for 15 users it’s probably still going to be bad for 1,000,000

- after you’ve shown something to 15 people and they don’t like it due to usability problems, you’re extremely unlikely to find 1,000,000 more who will like it.

This also appeals to my instinct that there is something learnable about design, that great design is achieved by people and not by massive datasets.

It might even be worse still, since this model seems (I may be wrong) to assume that the probability of hitting a usability bug is constant. It might be that discovery is skewed toward the first few users, such that the first user finds more bugs than the formula would predict.
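For context, the formula in question is Nielsen's: each test user independently finds a given problem with probability L (0.31 in his data), so n users find a 1 - (1 - L)^n share of them. A quick sketch of that constant-probability model, which is exactly the assumption being doubted here:

```python
# Nielsen's model: each test user independently hits a given usability
# problem with probability L (~0.31 in the 2000 article), so the expected
# share of problems found after n users is 1 - (1 - L)**n.
def share_found(n, L=0.31):
    """Expected fraction of usability problems found after n test users."""
    return 1 - (1 - L) ** n

for n in (1, 5, 15):
    print(f"{n:2d} users -> {share_found(n):.0%} of problems")
```

If the per-user probability actually decays after the first few users, the curve would rise faster early and plateau lower, as the comment suspects.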

It's certainly been my observation that cynical developers who test things as they go, by deliberately feeding silly inputs into the stuff they just wrote, seem to get hung up less in testing.

I mean, with the system I inherited at work, the first thing I did when I got an instance spun up was put a negative value in the quote-line quantity (which immediately broke... well, almost everything), then decimal values in quantity fields where only integers made sense, then text in number fields, and so on, each time breaking something in a new and interesting way.

Sometimes I think it's hard not to be cynical about enterprise systems.

My old lecturer put it (somewhat pithily): "Almost all the testing in the world means nothing compared to 15 minutes in the hands of the 17-year-old office junior."

One of my friends was literally hired as an intern to try to break software last year. He loved it, and found a ton of bugs, which was really helpful to the company; they eventually paid him a $1,000 bonus for his help over the summer.

It sounds like this guy's employer has taken the first step toward inventing QA.

There are whole classes of highly paid engineers whose job is to do this. But they work for old-fashioned, boring companies.

The problem is QA tends to get too scripted. You need to exercise each corner of the product, thus you test A, B, C in that order - so you never find cases where testing C, A, B breaks, or any other permutation. (To be fair, with any complexity it is impossible to test all permutations.)

That's one part of it.

Another part is hiring a breed of test engineers who like breaking stuff and have a knack for it.

We call it exploratory testing.

I can believe it; the best person to test software for the low-hanging fruit of "what happens if I do this thing that no sane person who knows anything about what the software is supposed to do would do?" is exactly that kind of zero-knowledge user.

It's one of the reasons I don't trust myself to test things fully: we write the software with all sorts of assumptions in our heads and subconsciously steer away from doing silly things. In that context, it's really difficult to aim at the point a zero-knowledge user would hit.

I've been building an API security scanner, and one of the things it does is just fuzz every endpoint with garbage in each parameter to look for stack traces, errors, etc.

More so than any of the security tests I've written, that fuzzing has broken every enterprise API our customers have thrown at it.
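I don't know the scanner's internals, but a toy sketch of that style of parameter fuzzing might look like this (the parameter names and error markers are made up for illustration):

```python
import itertools

# Garbage values of the kind that routinely break enterprise APIs:
# negatives and decimals where integers are expected, text in number
# fields, empty and oversized strings, a classic injection string.
FUZZ_VALUES = [-1, 0.5, "abc", "", "x" * 100_000, None, "' OR 1=1 --"]

# Strings suggesting an unhandled error leaked into the response body.
ERROR_MARKERS = ("Traceback (most recent call last)",
                 "NullPointerException", "ORA-00933", "stack trace")

def fuzz_cases(param_names):
    """Every (parameter, garbage value) combination for one endpoint."""
    return list(itertools.product(param_names, FUZZ_VALUES))

def looks_broken(response_body):
    """Heuristic: did the endpoint leak a stack trace or raw DB error?"""
    return any(marker in response_body for marker in ERROR_MARKERS)

# Fuzzing a hypothetical quote endpoint's parameters:
cases = fuzz_cases(["quantity", "price", "customer_id"])
print(len(cases))  # 3 parameters x 7 values = 21 requests to send
```

The real value is in the response check: a 400 with a clean error message is fine, a 500 with a stack trace in the body is a finding.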

Even a sane person can find a lot if they never touched the implementation of a specific feature.

On one of my teams, we had a hard rule that all features must be tested by 2 other team members (and if multiple people worked on a feature, none of them could be the testers). Something like every other case found something questionable, if not an outright broken edge case, that the developer(s) had completely missed.

I remember I implemented 2FA for a previous company, and our QA person managed to lock himself out of the account while enabling it. I asked him how that's possible, and he said "well I went to enable 2FA, it gave me the recovery codes, then it asked for a 2FA code so I entered one of the recovery ones, but now I ran out".

I thought this was so dumb it was brilliant QA.

I have a friend who does this for free. Every single thing I've ever shown him (both my software and that of others) breaks as soon as he gets his hands on it.

He's the sort to dig as far into the internals of things as he can and then start messing with it (he implemented partial function application for C, for example [1]).

My poor prototypes never stood a chance.

[1] https://github.com/zwimer/C-bind

That's so enjoyable to do. :D

I bombed a job interview by writing a piece of demo software that failed all of your "throw it garbage" tests. I couldn't have hired a professor to teach me a more useful lesson!

Mine was brilliant. Not a famous academic or anything, and it was a college, not a university, but he'd written production systems in the '80s for telecoms and massive supermarkets, so as an instructor for the real world he was really hard to beat. He taught me all sorts of things that stuck, some of which I didn't understand at the time, but 20 years later they make much more sense :).

That seems like a brutally unfair interview practice unless you were told in advance they’d be doing that.

They gave me a small project to do, and told me to do it as though I were building it for a customer. I think that was warning enough that it ought to gracefully handle bad input — any real-world program needs to do the same.

To be honest, unless they explicitly discussed this with you before, or went through it with you afterwards, I'm with empath75 on this one. We used to do something like that at $former_workplace and every once in a while, a candidate would come up with a program that didn't validate (most of) the input or failed in similar trivial ways.

It turned out that some of them, indeed, simply didn't care -- and didn't know, either. We'd explain what the problem was and they'd shrug or say they'd seen <big name software> breaking like that, too, you fix it if it turns out someone actually breaks it.

Others, however, would skip it so that they could focus on stuff that was more complicated or more relevant. They'd validate one set of inputs, just to show that they know it needs to be done and can do it, but not everything. Or they'd throw in a comment like //TODO: Validate this by <insert validation methods here>. Most of the time we'd just ask them to talk us through some of the validations -- and most of the time they could write them on the spot.

You could argue that this is very relevant in real life, and that even if it weren't, what's relevant is the interviewer's choice, not the candidate's (although tbh the latter is one of the reasons why tech interviews suck so much).

But at the end of the day it is an interview setting, not a real-life setting, no matter how much you try to make it seem otherwise. At the end of the day, the people doing it are still young candidates trying to impress their interviewers, not company employees working on a company project under the supervision of tech leads. You don't end up with much useful data unless you allow for some flexibility in this sort of experiment.

+1 but several of your points are incorrect or need caveats.

Your 1st point isn't correct, in that you will learn interesting things from 1m users that you won't from 15. The thing is that the 15 will tell you why, whereas the 1m won't unless you ask precisely the right question (which is a problem in itself). It basically takes experience here... (And this is something an AI may eventually become very good at.)

Your 3rd and 4th points aren't correct, in that given some sampling error you may very well find that what 15 people don't like, 1m will.

I'm in full agreement with you in principle, in the sense that I firmly hold that far too many start-ups do quantitative without qualitative and end up with the wrong conclusions; but you can't just wave away quantitative data like that.

Re 1st point: OP was referring specifically to usability learnings

I'd argue that usability must also include accessibility, and I highly doubt that your 15-user sample set will cover the deaf, the (color) blind, the dyslexic, the paralyzed, the epileptic, and various other groups of people with special accessibility requirements.

Or, for an American developer, what are the odds that your 15-user sample will find usability problems having to do with your icons only making sense to someone in the US? There's more to localization than just translation. Or if you have a product that does voice recognition, what range of accents are you covering with those 15 users?

That's OK, we nowadays use icons that don't make sense to anyone at all. I mean how would you know that three horizontal lines was a "Hamburger menu", and if you do know that, why would you want a Hamburger?

Actually, the icon looks like three menu items. It is a pretty good icon for signaling that it will open a menu. The hamburger nomenclature seems to be an ex post facto name for something that rightfully should be called the menu icon.

> the icon looks like three menu items

It's a nondescript icon made of three horizontal bars; it looks like literally anything that comes in threes. My mom calls it the pancake button, and my fourteen-year-old nephew used to call it the button with something that looks like a fork but without a handle, until he switched to the "meh" button once he became a nihilist (teenagers do that stuff sometimes). A menu with three items is very likely to be among the last things that cross a non-techie's mind.

At this point it's been used enough that anyone with enough exposure to electronics knows what it does, but it's hardly a better choice than the "File" menu. The point of making something intuitive, as opposed to explicit (i.e. using a symbol as opposed to spelling it out), is kind of missed if you need to "well actually" it and explain why it means whatever it means.

I disagree. It took around two years of seeing hamburger menus before it clicked that it was a common symbol for a menu. It's getting worse, too: it seems to be getting replaced by 3 vertical dots now.

I just think of the 3 vertical dots as vertical ellipsis. I often use vertical ellipsis when making a list that continues. e.g.

— History of Empire and Colonialism

— History of Religion and Society

— History of Race, Gender, and Power

— History of War and Society

So it naturally means "there is more stuff here" to me. YMMV

In your context it does make sense. As a menu button it doesn't (to me at least).

I can only speak for myself, but for me who grew up with C64, Windows 3.1, 95, 98, Linux, Windows XP more Linux etc that icon still didn't make sense until I got it explained.

Then again, the magnifying glass icon for search didn't make sense either until I read the docs, but at least back then people made docs and kept the UI stable enough that it made sense to learn it.

I still remember fondly being good with OLE (Object Linking and Embedding) in Microsoft Works.

The old days seemed so much more intuitive when there was a crappy low res icon with text underneath.

Not sure if you are being sarcastic, and in what way, but if I understand you correctly you think we are better off now.

I think we would be better off if we kept the best from both: the predictability and discoverability of the past with the niceness of today.

Why should we have to choose between nice and usable?

No, I think that having a bit of descriptive text below an icon was far more useful than the far prettier world of nicely rounded corners on hamburger menus that we have today.

Usability should include accessibility when possible, but there is no must here. There are lots of tools that need usability testing that have a sufficiently small team and user-base to make accessibility a waste of time and resources.

Indeed, on teams and products of any size there is always a decision to be made about which accessibility requirements will be met and to what degree.

I would make the separate argument that what you need with accessibility requirements is a separate testing experiment for each one. You don't need 100 testers to make sure you cover some special accessibility requirement; e.g. you need 15 normal-vision testers and 15 impaired-vision testers.

> but you can't just wave away quantitative data like that

quantity has a quality of its own

This may be true for software used when working alone, but for software used by groups, you are not going to figure out group effects without testing with different group sizes. Consider the differences testing a game with 1, 2, 5, 100, or 1000 users.

There's usability testing and there's scale testing.

Basic UI/UX will be significantly revealed by small-group testing.

Don't let the perfect be the enemy of the good.

So fifteen groups, then?

Agree, as the massive datasets are telling you more about what people want instead of how the product works.

That’s a big distinction, as when you look at a million people, 80% are either the same or don’t really understand what they are doing at all.

Modern designers usually assume that people are dolts and need simplicity. Optimal in my opinion is providing quality on-boarding and understanding behavior of someone who knows what they are doing as well.

I definitely think there is a learnable aspect to design. Throwing data at the problem is expensive and does not create many deep insights. But focusing on communication and how the product communicates a coherent philosophy through features and UX yields a more systematic, standalone framework to test hypotheses against, something which data can further shape. It's more important to find "must" and "cannot" boundary conditions than preferences, and it's also easier to see if you are satisfying those conditions.

My formula is therefore: Pick a few principles to apply to the solution you're making, make sure as best you can that they agree with each other. Test to see if potential users agree with those principles. Select features based on the resulting design framework. Then develop the product as a way of discovering more about this framework.

Lots of companies are doing this implicitly through their founding teams and hiring practices: they just end up with culture that values certain principles, so they get upheld at every meeting and the results end up in the product. But it can also be communicated and imposed more explicitly, and I think that's where design becomes more visible in the process.

It’s not either/or; usability is a duality, at the micro and macro levels. At the micro level you want to watch a couple of individual users and see whether every little thing added to the UI makes sense to them, and whether the product solves their problem.

If micro fails, macro will most likely fail.

At the macro level you’re tracking higher-level metrics: how many users engage with the different parts of the UI, where the drop-off is, what the retention of users is, what power users do differently from new users, etc.

This kind of holistic view of the product at micro and macro levels gives you a much better understanding of the bottlenecks in the product.

If you have 1,000,000 users you need to ask very different questions, so you better learn different things.

At 15 users, the cost of teaching everybody how to use your mess of a UI is lower than the cost of fixing the mess of a UI. At 1,000,000 users your cost of training is higher, but you also have more users to spread the cost of preparing the training among.

My impression is that most people instinctively assume that to get accurate results from sampling, you need a sample size proportional to the population, and that isn't generally the case.

Maybe for many purposes, 5 won't cut it and you need 100 or 1,000, but you don't need 1%.
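This checks out mathematically: the margin of error for a sample proportion depends almost entirely on the sample size n, not the population size N. A sketch using the standard finite-population correction:

```python
import math

def margin_of_error(n, N, p=0.5, z=1.96):
    """95% margin of error for a sample proportion of size n drawn from a
    population of size N, with the finite-population correction
    sqrt((N - n) / (N - 1))."""
    fpc = math.sqrt((N - n) / (N - 1))
    return z * math.sqrt(p * (1 - p) / n) * fpc

# The same 1,000-person sample is almost exactly as precise whether the
# population is ten thousand or ten million:
for N in (10_000, 10_000_000):
    print(f"N={N:>10,}: +/- {margin_of_error(1000, N):.4f}")
```

The correction factor only matters when the sample is a sizable fraction of the population, which is why "you don't need 1%" is right.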

I don't have much to do with UX, but I will add one insight: try finding users with esoteric ways of working, and ensure the site works for them. By esoteric I mean blind users of screen readers, users of old non-JS browsers, people on corp or school networks where things may be blocked, people on extremely small screens or extremely large ones (TVs with remotes), people who don't own or don't want to use a credit card (there's a widespread credit card phobia in e.g. Poland when it comes to online purchases), people using a different language and keyboard layout, particularly RTL ones, which matters especially for desktop apps, and so on. I'm a screen reader user myself, and I constantly find websites that might be beautiful but are utterly inaccessible. I've either encountered, or witnessed, usability difficulties in all of the categories I outlined. For each one, I could provide an example of a website or app that I/someone had to abandon for just that reason, and this is just me. I'm sure there are more niches I haven't thought about.

This is going to depend heavily on your target market. In many of the SaaS applications I have been involved in we really don't care about users with non-JS browsers, or extremely small screens, or TVs, or people without credit cards. Some of the applications I have worked on will never be translated.

Two observations:

1) A site that isn't ready for a screen reader probably isn't ready for a voice browser or other non-visual user agent. Say, an AI/digital assistant. Or perhaps even a search engine indexing bot (though for the last 10-15 years the incentives behind this often mean people will invest heavily, even non-cooperatively, in this specifically).

2) Projects for which the engineering takes into account the possibility of serving content to non-JS or weird screens UAs from the get-go are often better technically. Not a statistically investigated theory, but my theory is that it's because if you're really doing the REST/resource-oriented thinking necessary to make a non-trivial app display responses as minimal markup, you're also doing the thinking necessary to make a good API to be consumed by a dynamic or even SPA front-end. And vice versa: if you've got a good API for a dynamic-heavy or SPA front-end, it's not difficult to represent the given resource with the media type HTML instead of JSON. Which means if it is difficult to represent a resource as HTML instead of JSON, something's probably not right with how your app is put together.

I wouldn't go so far as to say every app needs to be plain-HTML + accessibility focused, but I think there are benefits that go beyond the margins of users with direct accessibility issues.

Ux Engineering manager here for a fortune 100 company.

We don't care about any of the use cases in your comment. If our JavaScript doesn't work in your browser, you are a security risk and we don't want our site to work on your browser.

Please call our 1-800 number to talk to a representative.

> If our JavaScript doesn't work in your browser, you are a security risk and we don't want our site to work on your browser.

Sorry, what? Why am I security risk for not wanting to run the arbitrary code that your website sends me? (I browse with JavaScript turned on, FWIW, so this is a hypothetical question for me.)

Some javascript is used to detect bots.

Client-side JavaScript to detect bots seems doomed to fail, unless it's some sort of proof-of-work kind of thing.

All bot detection is doomed to fail, but you can make it more expensive for the bot authors.

> Ux Engineering manager here for a fortune 100 company.

That doesn't mean much these days when it comes to ux :-) Edit: so until we know what company you work for we don’t know if you are part of the problem or fighting against the problems.

If anything modern ux is an exercise in embracing mystery meat navigation.

I recently got an iPad and while I like it a lot it is an exercise in frustration and DDG-ing to find out if something is missing (like select all in mail) or if someone has just come up with another "intuitive" way to hide it: slide in? slide in from the right? pull down on a list that is already on top? Touch the top bar? Long touch? Turn device sideways?

I'm almost not joking when I say usability was better in the '90s:

Ever present menus, hover to get tooltips, context menus and documentation.

> We don't care about any of the use cases in your comment

Hey, everybody has to make tradeoffs, and it's certainly conceivable that the margins of disability don't matter to you, or that your phone support is your non-visual UA, or even that you don't care about search.

That's only half the contents of my comment, though, the rest is about engineering benefits. Perhaps you don't care about those either?

> If our JavaScript doesn't work in your browser, you are a security risk

What? JavaScript is fantastically useful, but it's a vulnerability surface, not a security feature.

Maybe you meant to say HTTPS/TLS? Because if you meant to use JS as a proxy for TLS support, oh, man, I have some bad news for you. Both about security and about the relative utility of engineering around proxy tests vs engineering around tests for specific feature support.

> Ux Engineering manager here for a fortune 100 company.

I'd love to know the name of the company. Don't worry, I won't try to get you in trouble or anything, it's just clear that there's opportunities to rise well above one's competence there.

> it's just clear that there's opportunities to rise well above one's competence there.

Isn't that super hostile? I can't read between the lines there.

Your criticism is at least as apt as mine.

OTOH, if someone's going to do a drive-by comment in a discussion where they lean heavily on the authority of a management position at a high-profile company, and then proceed to speak as if they don't understand how the area they interact with intersects with security and haven't thoughtfully engaged with the comment they're responding to, then I'm not really sure it's out of line to suggest that whatever authority they've been given has been questionably allocated.

And this is exactly my problem with Fortune 100 companies: shoving your code down our throats till we choke, without the possibility to use it the way we want.

This (the 5-user theory) has been questioned.

From 2008 research paper: http://www.simplifyinginterfaces.com/wp-content/uploads/2008...


Historic reason: Both Nielsen (1993) and Virzi (1992) were writing in a climate in which the concepts of usability were still being introduced. They were striving to lighten the requirements of usability testing in order to make usability practices more attractive to those working with strained budgets.

Conclusion: It is advisable to run the maximum number of participants that schedules, budgets, and availability allow. The mathematical benefits of adding test users should be cited. More test users means greater confidence that the problems that need to be fixed will be found; as is shown in the analysis for this study, increasing the number from 5 to 10 can result in a dramatic improvement in data confidence. Increasing the number tested to 20 can allow the practitioner to approach increasing levels of confidence.

A few weeks ago, I tested an app with 50 users from different backgrounds, affinities, abilities, etc.

Towards the end, we realized the past 20 or so tests had been a waste of time. Issues and improvements that arose from the first 20 or so tests kept repeating themselves throughout every remaining session.

This sure increases confidence in your data, but when you're at the MVP stage or don't have much funding, you're better off testing with around 15 to 20 people and fixing the issues they find, because most likely those issues are in fact very problematic and deserve higher priority. More users will just yield more granular bugs and issues that you can schedule for later.

The author ends by saying you need five users from each distinct type of user, with the example of parent and child.

Unfortunately the author neglected to mention that the vast majority of projects already have multiple distinct groups: abled people, blind people, and deaf people are just a start. In many places these are legally required considerations.

The 10x engineer uses no users for the test, but instead microdoses during testing for an altered subjective experience.

I think I discovered a way to become a 420x engineer.

This is one of those things that takes a very abstract concept and tries to boil it down a bit too far with a mathematical model. Also, the 5 number in the headline is just misleading, since he clearly points out that the real number is 15.

The reality is that the number of people you need to test with to get the right number of insights (along with the depth of testing for any given user) is going to vary drastically across products of varying purposes and level of complexity. 5/15 users may be a reasonable average, but this is a case where an average of many different things isn't a particularly useful measure for any one of those things.

That doesn't even take into account the quality of people that you're testing with. Five experienced testers is different than five people with domain expertise but not testing experience is different than five people off the street.

> Also, the 5 number in the headline is just misleading, since he clearly points out that the real number is 15.

The 5 number isn't misleading at all, the author shows that it's the 80/20 point. The first 5 users in your usability test give you 85% of the value of testing with 15. The takeaway is to not do the same exact test with 15 users, do an iteration with 5, and rinse and repeat.

"Let us say that you do have the funding to recruit 15 representative customers and have them test your design. Great. Spend this budget on 3 studies with 5 users each!"

A friend who worked at google told me once they would invariably get better insight on usability by just following a few individual users around a task vs. analyzing their zillions and zillions of site visits.

The caveat here is that it depends on what you mean by "test" and what "insights" you are interested in.

If you are interested in (for example) determining the optimal price and only 1/20 users buy something... you still probably need about 1,000 X (price points you want to test).

In that "test" you are basically trying to uncover the demand curve (or points on it). It's a statistical question.

Say you have a dating app, and you are trying a new matching algorithm. It will also take thousands of matches before you have the data to make determinations.
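A back-of-the-envelope power calculation shows the scale involved; this uses the standard two-proportion sample-size approximation, with made-up conversion rates:

```python
import math

def n_per_variant(p1, p2, z_alpha=1.96, z_beta=0.84):
    """Approximate users needed per variant (e.g. per price point) to
    detect a conversion change from p1 to p2, at 80% power and 5%
    two-sided significance."""
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * var / (p1 - p2) ** 2)

# Detecting a drop from 5% to 4% of users buying at a higher price point
# takes thousands of users per price point, not five:
print(n_per_variant(0.05, 0.04))
```

Which is exactly why "about 1,000 x (price points)" is the right order of magnitude for the statistical question, while n=5 is fine for the "where do people get stuck" question.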

All that said, I totally agree with the author. I would just frame it differently.

The question you need to ask is "do I need statistics?" Statistics have become habitual, but much (most?) of the time, we don't need statistics.

If you want to learn whether a user can write and publish a blog using your software, or install a water filter under their sink... you don't need statistics. You need to know where most people get into trouble, and n=5 will work fine for that.

This is intuitive if we just think outside the "testing" vocabulary. You write a CV/essay/article. You ask 1-3 friends to read it and advise. You don't produce statistics.

The article talks specifically about usability testing - though it isn't in the title but it is in the first line of the article. I don't think pricing strategies or matching algorithms and such would fall under this domain.

you probably don't but you probably should

Also... when you have 15 users (and it’s slow growing) it’s because you satisfy a unique need. These users are actually willing to talk to you for HOURS because they need your product, they know it’s niche, and that their feedback can actually affect the product development. Speaking from experience, I had a customer fly to ME to give me feedback.

Wow. Skype/Email didn't suffice? :)

Some key questions:

1. How extensively do those 5 people test the software? Do they test all features or just part of the software?

2. What is the background of those 5 people testing the software? Do they understand UX/good UI design and how well?

3. Are these 5 people just random users or professional test engineers?

4. How passionate are these 5 people about the product/service they are testing? How meaningful is it for them that the product actually works _really well_?

5. What is the quality level of feedback these 5 people can provide? Is it like "meh, this is ugly" or is it detailed, concrete and contains practical improvement ideas that can be easily implemented?

Ha, as a web designer I long for the ability to test with even one user before launch day. Rarely do I get room in the budget for that :(

Sit in Starbucks and offer $5 gift cards...

You don't have $10? Come on.

This might sound condescending, but it is a good point. Anyone can be a test user, you don't need to formally hire people with a contract etc (unless your work is contractually classified, but that's a whole other bag of beans). You can ask colleagues from other departments, people at a coffee shop, friends etc.

It depends on the user base. I write medical imaging software intended to be used by MDs, and can’t just ask a random person at a coffee shop to try to establish a certain cancer diagnosis using our software. Finding candidates in that target group who have time for usability testing can be quite a challenge.

I think it's rather about their own time. If I waste a day that could have been customer work, my company misses out on well over a thousand bucks.

Yup, this one. If I spend three hours doing ux testing the boss will be like, "Is that necessary? How much time did you spend doing that?" Lol someday when I'm the VP of engineering it'll be different, but not today.

The fundamental assumption here seems to be that users are basically interchangeable:

> There is no real need to keep observing the same thing multiple times

That may be true for some simpler products, but I helped out with some user research on an analytics tool, and there was quite a diversity of feedback from the first two batches of users.

There is a trend towards not testing at all! Instead builds are deployed straight to production or canary (mini-subset of production) and then very carefully and closely monitored. If a problem is uncovered, a rollback is performed. If canary is done well, then the problem can be caught before it has widespread impact.

By "not testing" you mean automated but no manual testing. Somebody still needs to manually test some cases in order to write them down as code covering a constantly changing product, and even the best automation won't resolve unknown unknowns. Canary is only a way to reduce the impact of one deployment.

Wow, great article, and perfect timing for me. I'm about to go into the testing phase myself and this is something I've never considered. I was worried because we're an indie group and I wasn't sure how we were going to get a lot of people, but it looks like smaller is good enough and even optimal.


I totally disagree that this is true for most sites today.

Usability these days is not just about what a single user will do. We build multi-user apps, so the following things are an emergent phenomenon of actions MANY people take:

- Viral spread

- Real-time updates

- Chicken-and-egg problems

To test these things, you often need tons of real or fake accounts. You get some people trying interest X, and others interest Y. Sometimes things with the exact same interface usability take off in one country and not another. Like Orkut in Brazil!

Sites sometimes arrive at breakthroughs by A/B testing many things automatically across millions of sessions. That’s far more efficient but requires a large enough sample.
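For a sense of the scale involved, here is a rough two-proportion sample-size estimate using the standard normal approximation. The baseline conversion rate and detectable lift below are assumptions chosen for illustration:

```python
from statistics import NormalDist

def ab_sample_size(p, delta, alpha=0.05, power=0.8):
    """Approximate users needed per arm to detect an absolute
    lift `delta` over baseline rate `p` (two-sided test)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # significance
    z_b = NormalDist().inv_cdf(power)          # power
    variance = 2 * p * (1 - p)                 # rough pooled variance
    return int((z_a + z_b) ** 2 * variance / delta ** 2) + 1

# Detecting a 0.5% absolute lift on a 10% baseline takes
# tens of thousands of users per arm -- hence "millions of
# sessions" for subtle effects.
```

The inverse relationship with delta squared is the key point: halving the effect size you want to detect quadruples the required sample.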

In fact, the main reason famous sites are famous is that they successfully got a lot of people to keep coming back and doing something. They probably got them to invite friends. And so on.

I did some work for a major telco and we weren't allowed to do user testing outside of the company courtyard.

It consistently blew my mind how "out there" the feedback was from about half a day's worth of testing.

So I don't know about 5 users, but I can say having about 25-50 is good for getting a broad sample.

If you get the wrong 5 users... you're going to get some really skewed results. For example, if you grab a group of 5 people and all of them are in tech your results are going to be dramatically different than if they are 5 people from accounting, or 5 people from janitorial services.

I think the key here is to find five representative customers, not just five random people. These users can be hard to find unless you have good market fit. But if you have very good market fit, my experience is that users will overcome just about any hurdle. It's like digging for gold and other minerals: people will do a lot of work if the payoff is valuable. If it's less valuable, you might need to hand it to them on a silver platter.

I have loved this rule ever since I came across it as a student in 2002 and been using it successfully for user acceptance testing strategies in large projects.

I interpret it as a sort of fixed Pareto principle for projects where you limit effort and team size for maximum gain. N=5 also happens to be close to the ideal team size in agile frameworks. This rule is smart in a lot of ways and was ahead of its time.

I find it very irritating where I am that the design team doesn't engage with the programmers' concerns about difficulties in the designs, and instead just points to user research. A lot of design issues can be fixed by talking not just to your users but to any human walking by. Hallway usability testing works well, and just about anyone, even a programmer, will make your design better with a fresh pair of eyes.

I always worked on the premise: "Find five people who care"

It's nice to get the "five" verified, but I still think it's important to make sure they care, to have them be an essential part of making the product better. The trick though is not to be drawn into making anything for any one of these users (it's still got to be a generic usability for all users).

3 users has a surprisingly decent return too. Test something, for crying out loud; even a minimal investment provides a great deal of insight.

Most of what you need to know to fix usability issues will be discovered pretty quickly just by watching a single "regular" person use it.

This takes into consideration the benefits of more users, but not the cost. If the cost of adding one more test user is sufficiently small, it can be optimal to test with lots of them.

(Roughly, the economic rule here is that you keep doing something until the marginal cost is greater than the marginal benefit.)
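That stopping rule can be sketched against Nielsen's own model (each user independently hits a given problem with probability ~31%). The dollar figures for problem value and session cost below are made-up numbers for illustration:

```python
def optimal_test_users(value_per_problem, cost_per_user,
                       total_problems=10, p=0.31):
    """Add test users while the expected value of newly found
    problems still exceeds the cost of one more session."""
    n = 0
    found = 0.0
    while True:
        # Expected problems found after one more user, per
        # the 1 - (1 - p)^n discovery model.
        found_next = total_problems * (1 - (1 - p) ** (n + 1))
        marginal_benefit = (found_next - found) * value_per_problem
        if marginal_benefit <= cost_per_user:
            return n
        found = found_next
        n += 1

# Cheap sessions justify more users; expensive ones cut the
# optimum down toward the famous handful.
```

Because discovery returns decay geometrically, the optimum is fairly insensitive to the exact dollar figures, which is one reason a small fixed number like 5 works so often in practice.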

Usability testing is more like brainstorming. Those 3-5 users may tell you what some usability issues are, but they won't tell you how big those issues are for your purposes. With imperfect a priori information on who the users of your thing are, you have to fall back on assumptions.

https://www.userfeel.com has a calculator on its homepage that shows the percentage of problems found based on the number of test users.
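A calculator like that presumably implements some variant of Nielsen's classic model, in which each test user independently uncovers a given problem with probability λ ≈ 0.31:

```python
def problems_found(n, lam=0.31):
    """Fraction of usability problems found by n test users,
    per Nielsen's model: 1 - (1 - lambda)^n."""
    return 1 - (1 - lam) ** n

# Five users already uncover roughly 85% of the problems;
# each additional user adds a shrinking increment.
```

The geometric decay is the whole argument: the first user finds 31% of problems, the fifth pushes the total to ~84%, and the fifteenth adds almost nothing, which is why iterating in small batches beats one big test.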

Reminds me of the "small sample size" misconception.


Very true, though this can cut several different ways.

I've done several measurements of various aspects of use and engagement on Google+, as an independent outsider.

From a random sample of fewer than 100 profiles (of 2.2 billion total), it was clear that the active fraction was about 10%, and the highly active fraction a minuscule portion of that.

I ended up checking 50k profiles, and another (fully independent) analysis by Stone Temple Consulting of 500k profiles largely re-demonstrated the initial 9% finding. But, with more profiles, it was possible to dial in on the very few (about 0.16%) highly active users. Which is another sampling problem -- you're looking for the 1 in 1,000 users who are active, and need to get a sufficient sample (typically 30-300) of those. Let's call it 100, for round numbers.

Which means you're looking for a sample of a one-in-a-thousand subpopulation, meaning that 100 rare high-use users requires sampling 100 * 1000 = 100k of the total population.

(This presumes no other way of subsetting the population.)
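The arithmetic above generalizes: to expect k members of a 1-in-N subpopulation, you need roughly k × N samples, and the binomial standard deviation tells you how tight that expectation is. A quick sketch (rates taken from the comment above):

```python
import math

def sample_size_for_rare(k, one_in):
    """Sample size whose expected number of 1-in-`one_in`
    members is k."""
    return k * one_in

def expected_and_sd(n, one_in):
    """Expected count and standard deviation of rare members
    in a sample of n (binomial model)."""
    p = 1 / one_in
    return n * p, math.sqrt(n * p * (1 - p))

# 100 highly active users at ~1-in-1,000 means sampling
# about 100,000 profiles; the count then varies by about +/-10.
```

The square-root relationship is why the relative noise shrinks as the sample grows: 100 expected hits come with ~10% relative spread, while 10 expected hits come with ~32%.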

I ran into this looking at G+ Communities -- 8 million in total, with again a very small fraction of highly-active ones. Initial samples of 12k and 36k subsets were useful (and tractable on a residential DSL connection), but it was being gifted with a full 8-million record population summary of community activity that allowed full and detailed statistics to be calculated.


This article presents standard error as the quantity representing "by how much our prediction could be off" without mentioning that this is only so for a single standard deviation.

For a normally distributed value, one standard deviation only covers around 68% of the cases so you'd have to at least double that error value if you want to be pretty sure of your conclusion (which would then cover around 95% of the cases).
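Those coverage figures can be checked directly: for a normal distribution, the fraction within k standard deviations of the mean is erf(k/√2).

```python
import math

def coverage(k):
    """Fraction of a normal distribution lying within k
    standard deviations of the mean: erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

# coverage(1) ~ 0.683  -> one SD covers only ~68% of cases
# coverage(2) ~ 0.954  -> two SDs cover ~95% of cases
```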

The 80/20 rule applies here: you'll find 80-90% of usability problems by testing with 5 people. In today's world, though, we need predictive analytics to increase that coverage, and also to start creating customized journeys.

But doesn't the Law of Large Numbers disagree with this? Yes, the first 5 users (from a random sample of the target market) could agree, but when you get to larger numbers the real observations come out.

Ok, but how are you going to test for accessibility?

By bringing in people with accessibility needs. A few ways:

1. Large cities have accessibility Meetups, go to one and strike up a conversation and offer to pay someone to user test your software while you watch.

2. Hire a company/contractor that specializes in accessibility audits.

3. Hire someone with an accessibility need. They will be unable to do their job until you fix your accessibility problems.

Compare that to Gmail that was in Beta for more than 5 years. How many 'test' users did they need? :)

Anybody who studied a bit of probability theory knows that this is both false and wrong. Statistically significant sampling can't be done with only five users; the confidence would only be reasonable for a population of about 5. That's as if your product were aimed at 5 people total, which is unlikely (you almost certainly have far more than 5 users).

See this page for help computing sample size as a function of population size and confidence level: https://www.surveymonkey.com/mp/sample-size-calculator/

This article probably keeps being reposted because some people try to save money on user testing

...and that's probably why we keep having sh*tty products out in the wild.

Yes and no: people have relatively convergent views on usability. Statistically, you can think of a proposition A, "user from set U affirms that object O has property Q." You then sample opinions from U on whether this proposition is true. Each sample is a Bernoulli trial parametrized by

p = Prob(A = True)

The standard error of the mean is then

    SE = sqrt(p(1 - p) / N)

where N is how many users you sampled. Suppose people are convergent in their opinion (either p = 0.99 or p = 0.01); then even with N = 5 the uncertainty in the mean is less than 5%!

To make a concrete example, you only need to ask very few users if a particular object is white to be fairly confident whether the majority of people would consider a particular object to be white.
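Plugging the comment's numbers into the standard-error formula above, as a quick sanity check of the claim:

```python
import math

def sem_bernoulli(p, n):
    """Standard error of the sample mean of n Bernoulli(p)
    trials: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

# Near-unanimous opinions (p = 0.99) with N = 5 give a
# standard error of about 0.0445 -- under the claimed 5%.
# Maximally split opinions (p = 0.5) are far noisier.
```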

That is to say, if all five of the users with whom you've tested your application say it's confusing, or that it sucks somehow, it is diminishingly likely that that group is the outlier [0], and that if only you had tested with a few dozen (let alone thousands) more, you'd see the true pattern...

Yes, statistically, it's possible for outliers to bunch like that. It's also, statistically, far less likely.

[0] Assuming, for sake of argument, a nominally representative test group.

Anybody who studied a bit of probability theory knows that all this depends on the underlying probability distributions. If one draws five samples from a Gaussian distribution with an unknown mean and a standard deviation of 10, and none of the samples is below 50, then indeed it is very unlikely that the mean is 20.

What you're missing is covered in the article: the purpose of testing is to improve the product, not to accurately document all the problems with one iteration of it. Better to test with 3 sessions of 5 users than 1 session of 15.

Seems like for large effects, this depends on how often the effect occurs. If it's something that happens for everyone, you only need one user.


If the same exact thing happens to every user, you don't need sampling. You can't possibly have a relevant bias in choosing them.
