Question: How many users do you need to test with for a usability test?
Answer 1: 5 users (Jakob Nielsen and Thomas Landauer, 1993).
Answer 2: 15 users (Laurie Faulkner, 2004).
So, which is it, 5 or 15? And why argue about an extra 10 users? Doesn't one need to test with 100 or more users for statistical significance, accuracy and validity?
Statistical Validity in Usability Testing
Usability research is largely qualitative, or driven by insight (why users don't understand or why they are confused). Qualitative research follows different rules from quantitative research, and sample sizes are typically low (e.g. 15 or 20 participants).
The end result of usability testing is not statistical validity per se (the outcome of quantitative research) but verification of insights and assumptions based on behavioral observation (the outcome of qualitative research).
Why don't we do large numbers in usability testing?
- We are looking for behavior-based insight (what users do).
- Statistics tell half the story and are often devoid of context (e.g. why did users fail?). This is also one of the major problems with gaining insight from web analytics (website traffic statistics).
- Our objective is to apply findings to fix design problems in a corporate setting (not academic analysis).
- Research shows that even with low numbers, you can gain valid data.
- Usability testing is used industry-wide and has been for the past 25 years. Experts, authors and academics put their reputations and credentials behind the methodology.
Behavior vs. Opinion
Usability research is behavior-driven: You observe what people do, not what they say.
In contrast, market research is largely opinion-driven: You ask people what they think and what they think they think. You need big samples for market research because of this (though focus groups bend this rule because they are somewhat qualitative). This is why phone or web surveys require hundreds or thousands of responses. Behavior-driven research is more predictable. Basically, if 10 out of 15 users are confused, you can assume that many more will be confused as well.
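To put a rough number on the "10 out of 15" claim, here is a minimal sketch that computes a 95% confidence interval for the share of users who would be confused. The Wilson score interval is my choice for illustration (it behaves reasonably with small samples); it is an assumption, not a method prescribed by the studies cited in this article, and the function name is hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives about 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# 10 of 15 test participants were confused by the design.
low, high = wilson_interval(10, 15)
print(f"Plausible share of confused users: {low:.0%} to {high:.0%}")
```

Under these assumptions the interval runs from roughly 42% to 85%. It is wide, but even the low end says about four in ten users are likely to hit the problem, which is more than enough to justify a design fix.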
Example: If you ask someone "what do you think of this homepage?", you will need several hundred responses to gain statistical validity, because the data is opinion-driven. Asking someone's opinion does not produce usability requirements, since usability testing is about isolating "how they will actually use" the design, not just "what they think" of the design.
If you give a small set of users a scenario that forces them to interact with homepage elements, observe their behavior and listen to their unsolicited reactions, you will get a better idea of what they think and need. The driver here is expectation (governed by cognitive factors) rather than opinion, which can be driven solely by emotional, social or personal factors.
Suggested Sample Sizes for Research
Corporate Usability Research:
- Surveys (phone and web) = ~240 to 1,000+ responses
- Focus Groups = 15-20 (depends on audience segments involved and goals of study)
- Usability Testing = 10-15 participants
- Field Studies = 15-40 participants
- Card Sorting = 15-30 (higher is better since card sorting uses the statistical method of cluster analysis)
Academic Usability Research:
Samples are usually larger, depending on the size, scope and objectives of the research (e.g. 15 users per segment or 40-100 users in a usability test).
Jakob Nielsen's "test with 5 users" assumption
I think it is important to understand that Jakob Nielsen was trying to promote usability testing as a regular usability research activity in corporate environments. I believe he conducted this research (using a call center software application in the early 1990s, rumor has it) in order to demystify the perceived complexity of setting up and running a usability test.
Remember that in the early 1990s, only the hard-core research and development labs at Apple, Bell Labs, Microsoft, IBM and Sun were doing usability testing. In Nielsen's much respected and equally criticized article "Why You Only Need to Test With 5 Users" (written in 2000), he recommends (based on the early 1990s analysis) that instead of opting for higher accuracy, you go for the "fast and dirty" approach of conducting three tests instead of one "elaborate" study.
Later on in the article Nielsen says that the rule only applies if your users are comparable. If you have other segments or user types, you will need to test more users.
Translation: 5 users per audience segment or target user group. For a website with 3 diverse segments, you will need 15 users for the one test.
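The arithmetic behind the 5-user rule can be sketched in a few lines. As I understand it, Nielsen's article models the share of problems found with n users as 1 - (1 - L)^n, where L is the chance that a single user exposes a given problem, and reports an average L of about 31% across the projects studied. Treat that 0.31 as an assumption: your own product and users may have a very different detection rate. The function name is hypothetical.

```python
def share_of_problems_found(n_users: int, detect_rate: float = 0.31) -> float:
    """Expected share of problems seen at least once by n_users.

    detect_rate is the assumed chance that one user exposes a given problem.
    """
    if n_users < 0:
        raise ValueError("n_users must be non-negative")
    if not 0.0 <= detect_rate <= 1.0:
        raise ValueError("detect_rate must be between 0 and 1")
    return 1.0 - (1.0 - detect_rate) ** n_users


# Compare a single 5-user round with larger samples for one user segment.
for n in (1, 3, 5, 10, 15):
    print(f"{n:>2} users: {share_of_problems_found(n):.1%} of problems")
```

With the assumed 31% rate, five users uncover roughly 84% of the problems and fifteen uncover more than 99%. Lower the rate (harder-to-spot problems, more varied users) and five users cover far less, which is why the rule only holds per comparable user group.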
Magic Number 15 for Usability Testing...or Why 5 Users is Not Enough
Laurie Faulkner (2004) conducted empirical research showing the benefits of increased sample size. In her study, "Beyond the five-user assumption: Benefits of increased sample sizes in usability testing", she wrote:
It is widely assumed that 5 participants suffice for usability testing. In this study, 60 users were tested and random sets of 5 or more were sampled from the whole, to demonstrate the risks of using only 5 participants and the benefits of using more. Some of the randomly selected sets of 5 participants found 99% of the problems; other sets found only 55%. With 10 users, the lowest percentage of problems revealed by any one set was increased to 80%, and with 20 users, to 95%.
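The resampling idea described in that passage can be illustrated with a small simulation. This is only a sketch on synthetic data: the number of problems, the per-problem visibility range and every name below are hypothetical assumptions of mine, and the output will not reproduce Faulkner's figures. It simply draws random groups from a pool of 60 simulated users and reports the worst and average share of problems each group size uncovers.

```python
import random


def simulate_subsets(n_users=60, n_problems=40, subset_sizes=(5, 10, 20),
                     n_draws=100, seed=0):
    """Return {subset_size: (worst_share, mean_share)} on synthetic data."""
    rng = random.Random(seed)
    # Assumption: each problem has its own chance of being hit by any one user.
    visibility = [rng.uniform(0.05, 0.6) for _ in range(n_problems)]
    # Each simulated user ends up with the set of problems they ran into.
    found_by_user = [
        {p for p, chance in enumerate(visibility) if rng.random() < chance}
        for _ in range(n_users)
    ]
    all_found = set().union(*found_by_user)
    if not all_found:
        raise ValueError("no problems were found by any simulated user")

    results = {}
    for size in subset_sizes:
        shares = []
        for _ in range(n_draws):
            group = rng.sample(found_by_user, size)  # random set of users
            found = set().union(*group)
            shares.append(len(found) / len(all_found))
        results[size] = (min(shares), sum(shares) / len(shares))
    return results


for size, (worst, mean) in simulate_subsets().items():
    print(f"{size:>2} users: worst group {worst:.0%}, average group {mean:.0%}")
```

The point to watch is the "worst group" column rather than the average: an unlucky small group can miss a large share of the problems, and that spread is the risk the quoted study highlights.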
At Experience Dynamics (a usability consultancy), we have found that the cost savings from using fewer users are negligible. In other words, after you spend the time and money to set up, facilitate and report on the test, adding a few more users does not add "that much" time and money to the overall project.
The benefit you get from adding a few more users to the total (or in the case of 5 users, doubling the number) far outweighs what you get from a small test that gives you "quick and dirty" results. If you are running a series of usability tests or iterating your testing process (recommended for refinements based on evolving design decisions), you may want to choose a smaller number of users: I recommend no fewer than 8 users.
Frank Spillers, MS