General Outline of a Usability Test

According to Lauesen [89, Chaps. 1 and 13], the following checklist should be considered:

Users:
Description of the users and their assumed properties.
Domain knowledge:
What knowledge of the application domain must be presupposed on the part of the user?
Technical knowledge:
What special technical knowledge must be assumed on the part of the user?
Appropriate users:
How to find test subjects (5-10 users are enough to discover about 80-90% of the problems).
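The "5-10 users" rule of thumb is commonly justified with the problem-discovery model of Nielsen and Landauer, in which each test user independently finds a fraction L of all usability problems (L = 0.31 is the often-cited average). A minimal sketch of that model:

```python
# Problem-discovery model (Nielsen & Landauer): expected share of all
# usability problems found by n independent test users, each of whom
# detects a fraction L of the problems. L = 0.31 is the commonly
# cited average, not a value from this checklist.
def problems_found(n, L=0.31):
    return 1 - (1 - L) ** n

for n in (1, 5, 10):
    print(n, round(problems_found(n), 2))
```

With L = 0.31, five users already find about 84% of the problems and ten users about 98%, which matches the 80-90% figure in the checklist.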
Test area:
Where the testing should be carried out; what equipment is necessary.
Facilitator:
Which person should take the lead in the test?
Interface:
What will the user's interface be? (In the early stages of experimentation and prototype development, the interface that the user will eventually see is usually not yet available; there may only be dummies or mock-ups, simplified prototypes, or simulators.) Some test questions can only be answered in very realistic scenarios (maneuvers, driving, operating complex systems, ...).
Minute-takers:
Who takes the minutes during the test? (What should be recorded as data?)
Test tasks:
Definition of the test tasks the subjects are to work through (these tasks should come from the range of tasks the system is intended to support anyway). Selecting test tasks is not easy: Which tasks are important? Which sequences and dependencies have to be considered? What time requirement counts as 'normal'? Are the instructions clear?
Presentation of task:
The task statement must be clear and must not contain any hints about the nature of the solution.
Beginning of test:
Clearly defined start configuration that can be reproduced again and again.
On-line help:
The interface may include assistance for the user, which is then part of the application and can be tested as such.
User Conduct:
Only one user at a time. This enables the user to 'think aloud'. Although thinking aloud can affect behavior somewhat, it can provide valuable clues to the nature of a difficulty (in voice interfaces this is, of course, not possible).
Data collection:
At least the minutes should be stored. In addition, any kind of data that can improve the assessment is helpful, e.g. video recordings (possibly from multiple perspectives), keystroke logs in the case of computer tests, etc.
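As an illustration of simple data collection, a minimal timestamped event log could be kept alongside the minutes; this is a sketch only, and the event kinds, record format, and file name are assumptions, not part of the checklist:

```python
import time

# Minimal timestamped event log for a test session: each observed
# event (a keystroke, a click, a facilitator note) is appended with
# the elapsed time since the session started. The record layout and
# file name are illustrative assumptions.
class EventLog:
    def __init__(self):
        self.start = time.monotonic()
        self.events = []

    def record(self, kind, detail):
        # store (seconds since start, event kind, free-text detail)
        self.events.append((time.monotonic() - self.start, kind, detail))

    def save(self, path="session.log"):
        # write one tab-separated line per event
        with open(path, "w") as f:
            for t, kind, detail in self.events:
                f.write(f"{t:8.2f}\t{kind}\t{detail}\n")

log = EventLog()
log.record("key", "Ctrl+S")
log.record("note", "user hesitates at save dialog")
```

Because every entry carries its elapsed time, such a log can later be synchronized with a video recording of the same session.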
Debriefing:
At the end of the test the investigator asks specifically about both good and bad features of the system. This phase can additionally be supported by suitable questionnaires.
Test schedule:
Example: welcome and introduction (10 min), test tasks (15 min), follow-up questions (15 min), problem identification (X min).
Processing of data:
Some time has to be scheduled for this. The pre-established criteria must be clear enough to allow unambiguous classification of the data.
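A sketch of what an unambiguous classification can look like: each observation is mapped to exactly one pre-established category by explicit rules. The categories, field names, and the 60-second threshold below are illustrative assumptions, not values from the checklist:

```python
# Classify a logged observation into exactly one pre-established
# category. Categories, field names, and the 60-second 'normal'
# time budget are illustrative assumptions; the point is that the
# rules leave no room for ambiguity.
def classify(observation):
    if observation["gave_up"]:
        return "task failure"
    if observation["needed_help"]:
        return "assisted success"
    if observation["seconds"] > 60:  # exceeds the 'normal' time budget
        return "success, but too slow"
    return "success"

obs = {"gave_up": False, "needed_help": False, "seconds": 45}
print(classify(obs))  # prints "success"
```

Because the rules are checked in a fixed order, two evaluators applying them to the same observation always arrive at the same category.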
Analysis of data:
Conclusions about good and bad features of the system interface (see next section).
Impact on development:
Recommendations for the overall system.

Gerd Doeben-Henisch 2012-12-14