- a product designer/manager of something with 1,000,000 users won’t learn more about usability than a product designer/manager of something with 15 users. All those measurements of flows and at-scale analytics data are largely worthless for the purposes of usability.
- people with 15 users’ worth of learning about usability, instead of 0 users, are way undervalued, while people with 1,000,000 are way overvalued
- “we don’t know until we test it,” the #1 refrain of big-company design nowadays, is intellectually bankrupt for most free software, since if it looked bad for 15 users it’s probably still going to be bad for 1,000,000
- after you’ve shown something to 15 people and they don’t like it due to usability problems, you’re extremely unlikely to find 1,000,000 more who will like it.
This also appeals to my instinct that there is something learnable about design, that great design is achieved by people and not by massive datasets.
It's certainly been my observation that cynical developers who test things as they go, by deliberately feeding silly inputs into stuff they just wrote, seem to get hung up less in testing.
I mean, with the system I inherited at work, the first thing I did when I got an instance spun up was put a negative value in the quote line quantity (which immediately broke... well, almost everything), then decimal values in quantity fields where only integers made sense, then text in number fields, and so on, each time breaking something in a new and interesting way.
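A minimal sketch of what automating that ritual might look like (Python/pytest; quoting's create_quote_line and ValidationError are hypothetical stand-ins for whatever the real system exposes):

    import pytest

    # Hypothetical API, standing in for the real quote system under test.
    from quoting import ValidationError, create_quote_line

    # The same silly inputs: negatives, decimals where only integers
    # make sense, and text or junk in number fields.
    SILLY_QUANTITIES = [-1, -0.5, 2.5, "ten", "", None]

    @pytest.mark.parametrize("qty", SILLY_QUANTITIES)
    def test_quote_line_rejects_silly_quantities(qty):
        # A sane system rejects these cleanly rather than
        # crashing or silently accepting them.
        with pytest.raises(ValidationError):
            create_quote_line(sku="TEST-001", quantity=qty)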
Sometimes I think it's hard not to be cynical about enterprise systems.
As my old lecturer put it (somewhat pithily): "Almost all the testing in the world means nothing compared to 15 minutes in the hands of the 17-year-old office junior."
There are whole classes of highly paid engineers whose job is to do this. But they work for old-fashioned, boring companies.
Another part is hiring a breed of test engineers who like breaking stuff and have a knack for it.
It's one of the reasons why I don't trust myself to test things fully, we write the software with all sorts of assumptions in our heads and subconsciously steer away from doing silly things - in that context it's really difficult to aim at a point a zero-knowledge user would hit.
More than any of the security tests I've written, that fuzzing has broken every enterprise API our customers have thrown at it.
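In that spirit, even a dumb fuzz loop gets you surprisingly far (a sketch only: the endpoint and payload shape here are made up, and a real setup would use a proper fuzzing tool):

    import random
    import string
    import requests

    BASE_URL = "https://example.internal/api"  # placeholder, not a real API

    def junk():
        """Return one of several classically troublesome values."""
        return random.choice([
            -1, 0, 2**31, 0.1, "", None, "' OR 1=1 --",
            "".join(random.choices(string.printable, k=50)),
            [1, 2, 3], {"unexpected": "object"},
        ])

    for _ in range(1000):
        payload = {"sku": junk(), "quantity": junk()}
        r = requests.post(BASE_URL + "/quotes", json=payload, timeout=5)
        # 4xx means the input was rejected; 5xx means the server
        # itself fell over, which is the bug we're hunting.
        assert r.status_code < 500, payload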
On one of my teams, we had a hard rule that every feature had to be tested by 2 other team members (and if multiple people worked on a feature, none of them could be the testers). Something like every other case found something questionable, if not an outright broken edge case, that the developer(s) had completely missed.
I thought this was so dumb it was brilliant QA.
He's the sort to dig as far into the internals of things as he can and then start messing with them (he implemented partial function application for C, for example).
My poor prototypes never stood a chance.
It turned out that some of them, indeed, simply didn't care -- and didn't know, either. We'd explain what the problem was and they'd shrug or say they'd seen <big name software> breaking like that, too, you fix it if it turns out someone actually breaks it.
Others, however, would skip it so that they could focus on stuff that was more complicated or more relevant. They'd validate one set of inputs, just to show that they know it needs to be done and can do it, but not everything. Or they'd throw in a comment like //TODO: Validate this by <insert validation methods here>. Most of the time we'd just ask them to talk us through some of the validations -- and most of the time they could write them on the spot.
You could argue that this is very relevant in real life, and that even if it weren't, what's relevant is the interviewer's choice, not the candidate's (although tbh the latter is one of the reasons why tech interviews suck so much).
But at the end of the day it is an interview setting, not a real-life setting, no matter how much you try to make it seem otherwise. The people doing it are still young candidates trying to impress their interviewers, not company employees working on a company project under the supervision of tech leads. You don't end up with much useful data unless you allow for some flexibility in this sort of experiment.
Your 1st point isn't correct, in that you will learn interesting things from 1m users that you won't from 15. The thing is that the 15 will tell you why, whereas the 1m won't unless you ask precisely the right question (which is a problem in itself). It basically takes experience here... (And this is something an AI may eventually become very good at.)
Your 3rd and 4th point aren't correct, in that given some sampling error you may very well find that what 15 people don't like 1m will.
I'm in full agreement with you in principle, in the sense that I firmly hold that far too many start-ups do quantitative without qualitative and end up with the wrong conclusions; but you can't just wave away quantitative data like that.
It's a nondescript icon made of three horizontal bars; it looks like literally anything that comes in threes. My mom calls it the pancake button, and my fourteen-year-old nephew used to call it the button with something that looks like a fork but without a handle, until he switched to the "meh" button once he became a nihilist (teenagers do that stuff sometimes). A menu with three items is very likely to be among the last things that crosses a non-techie's mind.
At this point it's been used enough that anyone with enough exposure to electronics knows what it does, but it's hardly a better choice than the "File" menu. The point of making something intuitive, as opposed to explicit (i.e., using a symbol instead of spelling it out), is kinda missed if you need to "well actually" it and explain why it means whatever it means.
— History of Empire and Colonialism
— History of Religion and Society
— History of Race, Gender, and Power
So it naturally means "there is more stuff here" to me. YMMV
Then again, the magnifying glass icon for search didn't make sense either until I read the docs, but at least back then people made docs and kept the UI stable enough that it made sense to learn it.
I still remember fondly being good with OLE (Object Linking and Embedding) in Microsoft Works.
I think we would be better off if we kept the best from both: the predictability and discoverability of the past with the niceness of today.
Why should we have to choose between nice and usable?
Indeed, on teams and products of any size there is always a decision to be made about which accessibility requirements will be met and to what degree.
I would make the separate argument that what you need with accessibility requirements is a separate testing experiment for each one. You don't need 100 testers to make sure you meet some special accessibility requirement; e.g., you need 15 normal-vision testers and 15 impaired-vision testers.
quantity has a quality of its own
Basic UI/UX will be significantly revealed by small-group testing.
Don't let the perfect be the enemy of the good.
That’s a big distinction, as when you look at a million people, 80% are either the same, or don’t really understand what they are doing at all.
Modern designers usually assume that people are dolts and need simplicity. The optimum, in my opinion, is providing quality onboarding and also understanding the behavior of someone who knows what they are doing.
My formula is therefore: Pick a few principles to apply to the solution you're making, make sure as best you can that they agree with each other. Test to see if potential users agree with those principles. Select features based on the resulting design framework. Then develop the product as a way of discovering more about this framework.
Lots of companies are doing this implicitly through their founding teams and hiring practices: they just end up with culture that values certain principles, so they get upheld at every meeting and the results end up in the product. But it can also be communicated and imposed more explicitly, and I think that's where design becomes more visible in the process.
If micro fails, macro will most likely fail.
At the macro level you’re tracking higher-level metrics: how many users engage with the different parts of the UI, where the drop-off is, what user retention looks like, what power users do differently from new users, etc.
This kind of holistic view of the product at micro and macro levels gives you a much better understanding of the bottlenecks in the product.
At 15 users, the budget for teaching everybody how to use your mess of a UI is smaller than the cost of fixing the mess of a UI. At 1,000,000 users your cost of training is higher, but you also have more users to spread the cost of preparing the training among.
Maybe for many purposes, 5 won't cut it and you need 100 or 1,000, but you don't need 1%.
1) A site that isn't ready for a screen reader probably isn't ready for a voice browser or other non-visual user agent. Say, an AI/digital assistant. Or perhaps even a search engine indexing bot (though for the last 10-15 years the incentives behind this often mean people will invest heavily, even non-cooperatively, in this specifically).
2) Projects where the engineering takes into account, from the get-go, the possibility of serving content to non-JS UAs or UAs with weird screens are often better technically. This isn't a statistically investigated theory, but my theory is that it's because if you're really doing the REST/resource-oriented thinking necessary to make a non-trivial app display responses as minimal markup, you're also doing the thinking necessary to make a good API to be consumed by a dynamic or even SPA front-end. And vice versa: if you've got a good API for a dynamic-heavy or SPA front-end, it's not difficult to represent a given resource with the media type HTML instead of JSON. Which means that if it is difficult to represent a resource as HTML instead of JSON, something's probably not right with how your app is put together.
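As a toy illustration of that last point (Python/Flask, with all the names invented for this sketch), serving the same resource as HTML or JSON is just content negotiation:

    from flask import Flask, jsonify, render_template_string, request

    app = Flask(__name__)

    # Toy in-memory resource; a real app would hit storage.
    ARTICLES = {1: {"id": 1, "title": "Hello", "body": "First post."}}

    PAGE = "<h1>{{ a.title }}</h1><p>{{ a.body }}</p>"

    @app.route("/articles/<int:article_id>")
    def article(article_id):
        a = ARTICLES[article_id]
        # Same resource, two representations, chosen from the Accept
        # header rather than baked into the app's structure.
        best = request.accept_mimetypes.best_match(
            ["text/html", "application/json"])
        if best == "application/json":
            return jsonify(a)
        return render_template_string(PAGE, a=a)

If producing the HTML branch is painful, that's the smell described above.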
I wouldn't go so far as to say every app needs to be plain-HTML and accessibility-focused, but I think there are benefits that go beyond the margins of users with direct accessibility issues.
Please call our 1-800 number to talk to a representative.
That doesn't mean much these days when it comes to UX :-) Edit: so until we know what company you work for, we don’t know if you are part of the problem or fighting against it.
If anything, modern UX is an exercise in embracing mystery-meat navigation.
I recently got an iPad and while I like it a lot it is an exercise in frustration and DDG-ing to find out if something is missing (like select all in mail) or if someone has just come up with another "intuitive" way to hide it: slide in? slide in from the right? pull down on a list that is already on top? Touch the top bar? Long touch? Turn device sideways?
I'm almost not joking when I say usability was better in the '90s:
Ever-present menus, hover for tooltips, context menus, and documentation.
Hey, everybody has to make tradeoffs, and it's certainly conceivable that the margins of disability don't matter to you, or that your phone support is your non-visual UA, or even that you don't care about search.
That's only half the contents of my comment, though; the rest is about engineering benefits. Perhaps you don't care about those either?
Maybe you meant to say HTTPS/TLS? Because if you meant to use JS as a proxy for TLS support, oh, man, I have some bad news for you. Both about security and about the relative utility of engineering around proxy tests vs engineering around tests for specific feature support.
> Ux Engineering manager here for a fortune 100 company.
I'd love to know the name of the company. Don't worry, I won't try to get you in trouble or anything, it's just clear that there's opportunities to rise well above one's competence there.
Isn't that super hostile? I can't read between the lines there.
OTOH, if someone's going to do a drive-by comment in a discussion where they lean heavily on the authority of a management position at a high-profile company, and then proceed to speak as if they don't understand how the area they interact with intersects with security and haven't thoughtfully engaged with the comment they're responding to, then I'm not really sure it's out of line to suggest that whatever authority they've been given has been questionably allocated.
From a 2008 research paper: http://www.simplifyinginterfaces.com/wp-content/uploads/2008...
Both Nielsen (1993) and Virzi (1992) were writing in a climate in which the concepts of usability were still being introduced... They were striving to lighten the requirements of usability testing in order to make usability practices more attractive to those working with strained budgets.
It is advisable to run the maximum number of participants that schedules, budgets, and availability allow. The mathematical benefits of adding test users should be cited. More test users means greater confidence that the problems that need to be fixed will be found; as is shown in the analysis for this study, increasing the number from 5 to 10 can result in a dramatic improvement in data confidence. Increasing the number tested to 20 can allow the practitioner to approach increasing levels of confidence.
Towards the end, we realized the past 20 or so tests had been a waste of time. Issues and improvements that arose from the first 20 or so tests kept repeating themselves throughout every remaining session.
This sure increases confidence in your data, but when you're in an MVP stage or don't have much funding, you're really better off testing with around 15 to 20 people and fixing the issues they find, because most likely those issues are in fact very problematic and deserve higher priority. More users will just yield more granular bugs and issues that you can schedule for later.
Unfortunately the author neglected to mention that the vast majority of projects already have multiple distinct groups: abled people, blind people, and deaf people are just a start. In many places these are legally required considerations.
The reality is that the number of people you need to test with to get the right number of insights (along with the depth of testing for any given user) is going to vary drastically across products of varying purposes and level of complexity. 5/15 users may be a reasonable average, but this is a case where an average of many different things isn't a particularly useful measure for any one of those things.
That doesn't even take into account the quality of people that you're testing with. Five experienced testers is different than five people with domain expertise but not testing experience is different than five people off the street.
The 5 number isn't misleading at all, the author shows that it's the 80/20 point. The first 5 users in your usability test give you 85% of the value of testing with 15. The takeaway is to not do the same exact test with 15 users, do an iteration with 5, and rinse and repeat.
"Let us say that you do have the funding to recruit 15 representative customers and have them test your design. Great. Spend this budget on 3 studies with 5 users each!"
If you are interested in (for example) determining the optimal price and only 1/20 users buy something... you still probably need about 1,000 X (price points you want to test).
In that "test" you are basically trying to uncover the demand curve (or points on it). It's a statistical question.
Say you have a dating app, and you are trying a new matching algorithm. It will also take thousands of matches before you have the data to make determinations.
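To put rough numbers on that, here's the standard two-proportion sample-size approximation (the conversion figures are purely illustrative):

    from scipy.stats import norm

    def n_per_arm(p1, p2, alpha=0.05, power=0.80):
        """Approximate users needed per arm for a two-proportion z-test."""
        z_a = norm.ppf(1 - alpha / 2)
        z_b = norm.ppf(power)
        p_bar = (p1 + p2) / 2
        return ((z_a + z_b) ** 2 * 2 * p_bar * (1 - p_bar)) / (p1 - p2) ** 2

    # Baseline 5% conversion (the 1-in-20 above), hoping a price
    # change lifts it to 7%:
    print(round(n_per_arm(0.05, 0.07)))  # roughly 2,200 users per price point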
All that said, I totally agree with the author. I would just frame it differently.
The question you need to ask is "do I need statistics?" Statistics have become habitual, but much (most?) of the time, we don't need statistics.
If you want to learn whether a user can write and publish a blog using your software, or install a water filter under the sink... you don't need statistics. You need to know where most people get into trouble, and n=5 will work fine for that.
This is intuitive if we just think outside of the "testing" vocabulary. You write a CV/essay/article. You ask 1-3 friends to read it and advise. You don't produce statistics.
1. How extensively do those 5 people test the software? Do they test all features or just part of the software?
2. What is the background of those 5 people testing the software? Do they understand UX/good UI design and how well?
3. Are these 5 people just random users or professional test engineers?
4. How passionate are these 5 people about the product/service they are testing? How meaningful is it for them that the product actually works _really well_?
5. What is the quality level of feedback these 5 people can provide? Is it like "meh, this is ugly", or is it detailed, concrete, and full of practical improvement ideas that can be easily implemented?
> There is no real need to keep observing the same thing multiple times
That may be true for some simpler products, but I helped out with some user research on an analytics tool, and there was quite a diversity of feedback from the first two batches of users.
Usability these days is not just about what a single user will do. We build multi-user apps. So the following things are an emergent phenomenon of actions MANY people take:
Real-time updates
Chicken-and-egg problems
To test these things, you often need tons of real or fake accounts. You get some people trying interest X, and others interest Y. Sometimes things with the exact same interface usability take off in one country and not another. Like Orkut in Brazil!
Sites sometimes arrive at breakthroughs by A/B testing many things automatically across millions of sessions. That’s far more efficient but requires a large enough sample.
In fact, the main reason famous sites are famous is that they successfully got a lot of people to keep coming back and doing something. They probably got them to invite friends. And so on.
It consistently blew my mind how "out there" the feedback was with about half a day's worth of testing.
So I don't know about 5-users, but I can say having about 25-50 is good for getting a broad sample.
If you get the wrong 5 users... you're going to get some really skewed results. For example, if you grab a group of 5 people and all of them are in tech your results are going to be dramatically different than if they are 5 people from accounting, or 5 people from janitorial services.
I interpret it as a sort of fixed Pareto principle for projects where you limit effort and team size for maximum gain. N=5 also happens to be close to the ideal team size in agile frameworks. This rule is smart in a lot of ways and was ahead of its time.
It's nice to get the "five" verified, but I still think it's important to make sure they care, to have them be an essential part of making the product better. The trick, though, is not to be drawn into making anything for any one of these users (it's still got to be generic usability for all users).
(Roughly, the economic rule here is that you keep doing something until the marginal cost is greater than the marginal benefit.)
I've done several measurements of various aspects of use and engagement on Google+, as an independent outsider.
From a random sample of fewer than 100 profiles (of 2.2 billion total), it was clear that the active fraction was about 10%, and the highly active fraction a minuscule portion of that.
I ended up checking 50k profiles, and another (fully independent) analysis by Stone Temple Consulting of 500k profiles largely re-demonstrated the initial 9% finding. But, with more profiles, it was possible to dial in on the very few (about 0.16%) highly active users. Which is another sampling problem -- you're looking for the 1 in 1,000 users who are active, and need to get a sufficient sample (typically 30-300) of those. Let's call it 100, for round numbers.
Which means you're looking for a sample of a one-in-a-thousand subpopulation, meaning that 100 rare high-use users requires sampling 100 * 1000 = 100k of the total population.
(This presumes no other way of subsetting the population.)
I ran into this looking at G+ Communities -- 8 million in total, with again a very small fraction of highly-active ones. Initial samples of 12k and 36k subsets were useful (and tractable on a residential DSL connection), but it was being gifted with a full 8-million record population summary of community activity that allowed full and detailed statistics to be calculated.
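The arithmetic behind that sampling problem is simple enough to sketch:

    from math import sqrt

    rate = 1 / 1000   # highly active fraction of the population
    target = 100      # rare users we want to end up with

    n = target / rate                 # total profiles to sample: 100,000
    sd = sqrt(n * rate * (1 - rate))  # spread of the rare-user count: ~10

    print(f"sample {n:,.0f} profiles, expect {target} +/- {sd:.0f} rare users")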
For a normally distributed value, one standard deviation only covers around 68% of the cases so you'd have to at least double that error value if you want to be pretty sure of your conclusion (which would then cover around 95% of the cases).
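A quick check of those coverage figures:

    from scipy.stats import norm

    # Fraction of a normal distribution within +/- k standard deviations.
    for k in (1, 2):
        print(k, round(norm.cdf(k) - norm.cdf(-k), 4))
    # 1 -> 0.6827, 2 -> 0.9545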
1. Large cities have accessibility Meetups, go to one and strike up a conversation and offer to pay someone to user test your software while you watch.
2. Hire a company/contractor that specializes in accessibility audits.
3. Hire someone with an accessibility need. They will be unable to do their job until you fix your accessibility problems.
See this page for assistance with computing sample size as a function of population and confidence: https://www.surveymonkey.com/mp/sample-size-calculator/
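Such calculators presumably implement something like Cochran's formula with a finite-population correction; a sketch:

    from math import ceil
    from scipy.stats import norm

    def sample_size(population, margin=0.05, confidence=0.95, p=0.5):
        """Cochran's formula with finite-population correction."""
        z = norm.ppf(1 - (1 - confidence) / 2)
        n0 = z ** 2 * p * (1 - p) / margin ** 2  # infinite-population size
        return ceil(n0 / (1 + (n0 - 1) / population))

    print(sample_size(1_000_000))  # ~385; barely grows with population size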
This article probably keeps being reposted because some people try to save money on user testing
...and that's probably why we keep having sh*tty products out in the wild.
p = Prob(A = True)
The standard error of the mean is then
SE = sqrt(p(1 - p) / N)
where N is how many users you sampled. Suppose people are convergent in their opinion (either p = 0.99 or p = 0.01); then even with N = 5 the uncertainty in the mean is less than 5%!
To make a concrete example, you only need to ask very few users if a particular object is white to be fairly confident whether the majority of people would consider a particular object to be white.
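Plugging in the numbers to check:

    from math import sqrt

    p, N = 0.99, 5  # convergent opinion, five users
    se = sqrt(p * (1 - p) / N)
    print(round(se, 3))  # 0.044 -- under 5%, as claimed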
Yes, statistically, it's possible for outliers to bunch like that. It's also, statistically, far less likely.
Assuming, for the sake of argument, a nominally representative test group.