Chapter 4


  • Determinants of an evaluation plan:
    • Stage of design (early, middle, late)
    • Novelty of the project (well defined versus exploratory)
    • Number of expected users
    • Criticality of the interface (life-critical medical system vs. museum-exhibit support system)
    • Costs of the product and finances allocated for testing
    • Time available
    • Experience of the design and evaluation team

Expert Reviews

  • A natural starting point for evaluating new or revised interfaces is to present them to colleagues or customers and ask for their opinions.
  • These methods depend on access to experts, whose expertise may be in the application domain or the user-interface domain.
  • Expert reviews can be conducted on short notice and completed rapidly.
  • Expert reviews can occur early or late in the design phase.
  • The outcome may be a formal report with problems identified or recommendations for changes
  • Expert reviews usually take from half a day to one week, although a lengthy training period may be required to explain the task domain or operational procedures.

There are a variety of expert-review methods from which to choose:

  • Heuristic evaluation
    • The expert reviewers critique an interface to determine conformance with a short list of design heuristics, such as the Eight Golden Rules.
  • Guidelines review
    • The interface is checked for conformance with the organizational or other guidelines document.
  • Consistency inspection
    • The experts verify consistency across a family of interfaces.
  • Cognitive walk-through
    • The experts simulate users walking through the interface to carry out typical tasks.
  • Metaphors of human thinking (MOT)
    • The experts conduct an inspection that focuses on how users think when interacting with an interface.
  • Formal usability inspection
    • The experts hold a courtroom-style meeting, with a moderator or judge, to present the interface and to discuss its merits and weaknesses.
  • Different experts tend to find different problems in an interface, so three to five expert reviewers can be highly productive, as can complementary usability testing.

Usability Testing and Laboratories

  • The usability-test report provides supportive confirmation of progress and specific recommendations for changes.
  • Usability testing not only sped up many projects, but also produced dramatic cost savings.
  • Usability tests are designed to find flaws in user interfaces.

Usability Labs

  • A typical modest usability laboratory would have two 10-by-10-foot areas, divided by a half-silvered mirror.

Step-by-Step Usability Guide

  • Plan
  • Analyze
  • Design
  • Test and Refine

Testing Considerations

  • A detailed test plan is needed
  • Pilot test

Handling participants and the Institutional Review Board (IRB)

  • Representative samples of relevant populations
  • Controls: physical, time, place, etc.
  • IRB and Informed Consent
    • The IRB governs any research performed with human subjects.
  • Record and annotate observations (Keystrokes, menu selections, eye-tracking)
  • Participant encouragement


  • “Thinking aloud”: participants enunciate what they are doing as they do it, evaluating verbally on the fly.
    • Often leads to many spontaneous suggestions for improvements.
    • The think-aloud procedure may alter the true task time, since users may pause the task activity as they vocalize their thoughts.
  • Retrospective think aloud
    • With this technique, after completing a task users are asked what they were thinking as they performed the task.
    • The drawback is that the users may not be able to wholly and accurately recall their thoughts after completing the task.

The spectrum of usability testing

Usability testing comes in many different flavors and formats. Most of the current research demonstrates the importance of testing often and at varied times during the design cycle. The purpose of the test and the type of data that is needed are important considerations. Testing can be performed using combinations of these methods as well.

  • Paper mock-ups and prototyping.
    • A test administrator plays the role of the computer by flipping the pages while asking a participant to carry out typical tasks.
    • Inexpensive, rapid, and usually productive.
  • Discount usability testing.
    • Quick and dirty approach to task analysis, prototype development, and testing
    • Widely influential because it lowers the barriers to newcomers.
    • Advocates point out that most serious problems are found with only a few participants, enabling prompt revision and repeated testing.
  • Competitive usability testing.
    • Competitive testing compares a new interface to previous versions or to similar products from competitors.
  • Universal usability testing.
    • This approach tests interfaces with highly diverse users, hardware and software platforms, and networks.
    • The result is products that can be used by a wider variety of users.
  • Field tests and portable labs.
    • This testing method puts new interfaces to work in realistic, naturalistic environments in the field for a fixed trial period.
  • Remote usability testing.
    • Since web-based applications are available internationally, it is tempting to conduct usability tests online, avoiding the complexity and cost of bringing participants to a lab.
    • This makes it possible to have larger numbers of participants with more diverse backgrounds, and it may add to the realism, since participants do their tests in their own environments and use their own equipment.
  • Can-you-break-this tests.
    • Pioneered by game designers
    • Users try to find fatal flaws in the system or otherwise destroy it.
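The discount-testing claim that a few participants find most serious problems can be quantified with Nielsen and Landauer's problem-discovery model. A minimal sketch, assuming their often-cited average per-participant detection rate of roughly 0.31 (both the formula's applicability and the rate are rough empirical estimates, not values from these notes):

```python
# Nielsen-Landauer model: the proportion of usability problems found
# by n participants is 1 - (1 - L)^n, where L is the probability that
# a single participant exposes any given problem (~0.31 on average in
# their data).

def problems_found(n: int, l: float = 0.31) -> float:
    """Expected proportion of usability problems found by n testers."""
    return 1 - (1 - l) ** n

for n in (1, 3, 5, 10):
    print(f"{n:2d} participants -> {problems_found(n):.0%} of problems")
```

Under these assumptions, five participants already uncover roughly 84% of problems, which is why iterating several small tests beats one large one.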

For all its success, usability testing does have at least two serious limitations: it emphasizes first-time usage and provides limited coverage of the interface features.

Usability test reports

The U.S. National Institute of Standards and Technology (NIST) took a major step towards standardizing usability-test reports in 1997. The Common Industry Format describes the testing environment, tasks, participants, and results in a standard way so as to enable consumers to make comparisons. The group's work is ongoing.

Survey Instruments

User surveys are a familiar, inexpensive, and generally acceptable companion for usability tests and expert reviews. The keys to successful surveys are clear goals in advance and development of focused items that help to attain those goals.

Preparing and designing survey questions

  • A survey form should be prepared, reviewed by colleagues, and tested with a small sample of users before a large-scale survey is conducted.
  • Pre-test or pilot-test any survey instrument prior to actual use.
  • Ascertain characteristics about the users (Background demographics, Experience, Job responsibilities, Personality style…)

Things to look for in a survey:

  • Task domain objects and actions
  • Interface domain metaphors and action handles
  • Syntax of inputs and design of displays

Types of surveys:

  • Likert scale
    • e.g., (strongly agree) - (agree) - (neutral) - (disagree) - (strongly disagree)
  • Bipolar
    • Rank from 1 to 10 between two extremes
    • e.g., ( Hostile 1-2-3-4-5-6-7-8-9-10 Friendly )

Sample Questionnaires

  • Questionnaire for User Interaction Satisfaction (QUIS)
  • System Usability Scale (SUS)
    • the "quick and dirty" scale

Acceptance Tests

  • A “once over” upon implementation
  • Generally, a set of tests to establish that requirements are met
  • Measurable criteria for the user interface can be established for the following:
    • Time for users to learn specific functions
    • Speed of task performance
    • Rate of errors by users
    • User retention of commands over time
    • Subjective user satisfaction
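The measurable criteria listed above lend themselves to automatic checking during acceptance testing. A minimal sketch, in which the metric names and thresholds are illustrative assumptions, not values from the text:

```python
# Hypothetical acceptance criteria: each metric has a maximum allowed
# value; the interface is accepted only if every metric passes.

ACCEPTANCE_CRITERIA = {                  # metric -> (limit, unit)
    "learn_time_s": (300, "s"),          # time to learn specific functions
    "task_time_s":  (120, "s"),          # speed of task performance
    "error_rate":   (0.05, "errors/task"),
}

def accept(measured: dict[str, float]) -> bool:
    """Return True only if every measured metric meets its threshold."""
    ok = True
    for metric, (limit, unit) in ACCEPTANCE_CRITERIA.items():
        value = measured[metric]
        passed = value <= limit
        ok &= passed
        print(f"{metric}: {value} {unit} (limit {limit}) "
              f"{'PASS' if passed else 'FAIL'}")
    return ok

print(accept({"learn_time_s": 240, "task_time_s": 90, "error_rate": 0.08}))
```

Subjective satisfaction and retention over time resist this kind of automation and are usually assessed with surveys and follow-up sessions instead.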

The goal of early expert reviews, usability testing, surveys, acceptance testing, and field testing is to force as much as possible of the evolutionary development into the pre-release phase, when change is relatively easy and inexpensive to accomplish.

Evaluation During Active Use

  • Active use “evolves” the system
    • Addressing all users
  • Perfection is elusive, but improvement is attainable
  • How to:
    • Interviews
    • Focus groups
    • Analysis of use via data logging (The software architecture should make it easy for system managers to collect data about the patterns of interface usage, speed of user performance, rate of errors, and/or frequency of requests for online assistance.)
  • Follow up:
    • Telephone, e-mail, suggestion boxes, etc., provide extremely effective and personal assistance to users who are experiencing difficulties.
    • Communities
    • Forums, wikis, newsgroups
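The data-logging idea above can be sketched very simply: timestamped interface events (commands used, errors, help requests) are appended to a log that a system manager can later aggregate into usage patterns. The event names and fields here are illustrative assumptions:

```python
# Minimal in-memory event log for interface usage data.
import json
import time
from collections import Counter

LOG: list[dict] = []

def log_event(kind: str, **fields) -> None:
    """Record one interface event with a timestamp."""
    LOG.append({"t": time.time(), "kind": kind, **fields})

# Simulated session:
log_event("command", name="open_file")
log_event("error", name="bad_path")
log_event("help_request", topic="file dialogs")
log_event("command", name="open_file")

usage = Counter(e["kind"] for e in LOG)   # frequency of each event type
print(json.dumps(usage))
```

A production version would write to persistent storage and, given the IRB concerns above, anonymize or aggregate the data before anyone inspects it.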

Automated Evaluation

  • Display Analysis Program (Tullis, 1988)
  • Markup Validation Service (W3C)
  • NIST Web Metrics (NIST)
  • WebTango (UC-Berkeley)
  • Run-time logging
    • Key strokes, referrals (following links), screen changes, time to complete tasks, etc.

Controlled Psychologically Oriented Experiments

  • Human performance measures
    • Especially related to perception and cognition
    • As relates to system usability
  • Scientific method and experimental approaches:
    • Match theoretical framework to practical problem
    • Testable Hypothesis/es
    • Small number of Independent Variables (the thing that is being manipulated)
    • Carefully selected Dependent Variables (something that happens as a result of the experiment and is usually measured)
    • Careful selection of participants
    • Control for bias
    • Apply statistical methods to analysis
  • Resolve the problem, advance or refine the theory, and give advice for further research
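As a worked sketch of the analysis step: suppose the independent variable is the interface design (A vs. B) and the dependent variable is task-completion time. Welch's t statistic, computed from scratch with the standard library, compares the two samples; the data below are invented for illustration.

```python
# Welch's t statistic for two independent samples (unequal variances):
# t = (mean(x) - mean(y)) / sqrt(var(x)/n_x + var(y)/n_y)
import math
from statistics import mean, variance

design_a = [38.1, 42.5, 40.2, 45.0, 39.4, 41.8]   # seconds per task
design_b = [33.0, 35.6, 34.2, 36.9, 32.4, 34.8]

def welch_t(x: list[float], y: list[float]) -> float:
    """Welch's t statistic for two independent samples."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    return (mean(x) - mean(y)) / math.sqrt(vx + vy)

t = welch_t(design_a, design_b)
print(f"t = {t:.2f}")   # compare against a t-table for significance
```

The resulting statistic would then be compared against the t distribution (or fed to a statistics package) to decide whether the difference between designs is significant, closing the loop from hypothesis to statistical analysis described in the bullets above.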