Evaluation techniques for interactive systems

Ovindu Archana
9 min read · Dec 27, 2020

What is evaluation?

The role of evaluation is to assess designs and test systems to ensure that they actually behave as we expect and meet user requirements.
Ideally, evaluation should occur throughout the design life cycle, with the results of the evaluation feeding back into modifications to the design.

Goals of evaluation

Evaluation has three main goals.

  1. Assess the extent of system functionality

The system’s functionality is important in that it must accord with the user’s requirements. Evaluation at this level may measure the user’s performance with the system to assess how effectively the system supports the task.

For example, if a filing clerk is used to retrieving a customer’s file by postal address, the computerized filing system should provide (at least) the same capability.

2. Assess the effect of the interface on the user

It is important to assess the user’s experience of the interaction and its impact on them. This includes considering how easy the system is to learn, its usability, and the user’s satisfaction with it. It may also include the user’s enjoyment and emotional response, particularly in the case of systems aimed at leisure or entertainment.

3. Identify specific problems

These may be aspects of the design that, when used in their intended context, cause unexpected results or confusion among users. This goal relates to both the functionality and the usability of the design.

Evaluation through expert analysis

Evaluation of a system should ideally begin before any implementation work has started. If the design itself can be evaluated, expensive mistakes can be avoided, since the design can be altered prior to any major resource commitments. A number of methods have been proposed to evaluate interactive systems through expert analysis. These methods can be used at any stage in the development process, from a design specification, through storyboards and prototypes, to full implementations, making them flexible evaluation approaches.

Some expert-based evaluation techniques:

  • Cognitive Walkthrough
  • Heuristic Evaluation
  • Model-based evaluation

Cognitive Walkthrough

The cognitive walkthrough was originally proposed by Polson and colleagues as an attempt to introduce psychological theory into the informal and subjective walkthrough technique. Its main focus is to establish how easy a system is to learn through hands-on exploration, rather than through training or a user’s manual.

To do a walkthrough, you need four things (a sketch of how a walkthrough step might be recorded follows the list):

  1. A specification or prototype of the system.
  2. A description of the task the user is to perform on the system.
  3. A complete, written list of the actions needed to complete the task with the proposed system.
  4. An indication of who the users are and what kind of experience and knowledge the evaluators can assume about them.
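
Given these four inputs, the evaluators step through each action on the list, asking at every step whether users will know what to do, whether they will see how to do it, and whether they will understand the feedback they receive. Below is a minimal sketch of how one step’s findings might be recorded; the task and answers are hypothetical.

```python
# Minimal sketch of recording one cognitive walkthrough step.
# The task and answers below are hypothetical; a real walkthrough
# records one such entry for every action in the action list.
from dataclasses import dataclass

@dataclass
class WalkthroughStep:
    action: str                      # the action the user must take
    will_know_what_to_do: bool       # will users form the right goal?
    will_see_how_to_do_it: bool      # is the correct control obvious?
    will_understand_feedback: bool   # does feedback show progress?
    notes: str = ""

step = WalkthroughStep(
    action="Press the 'timed record' button",
    will_know_what_to_do=True,
    will_see_how_to_do_it=False,
    will_understand_feedback=True,
    notes="Button icon is ambiguous; users may not connect it to timing.",
)
print(step)
```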

Heuristic Evaluation

A heuristic, as proposed by Nielsen and Molich, is a guideline, general principle, or rule of thumb that can guide a design decision or be used to critique a decision that has already been made. In heuristic evaluation, a small group of evaluators (three to five is usually sufficient) independently critiques the design against the heuristics. A sketch of aggregating their severity ratings follows the list of heuristics.

Nielsen’s ten heuristics are:

  1. Visibility of system status - Always keep users informed about what is going on, through appropriate feedback within a reasonable time. For example, if a system operation will take some time, give an indication of how long and how much is complete.
  2. Match between system and the real world - The system should speak the user’s language, with words, phrases and concepts familiar to the user, rather than system-oriented terms. Follow real-world conventions, making information appear in a natural and logical order.
  3. User control and freedom - Users often choose system functions by mistake and need a clearly marked ‘emergency exit’ to leave the unwanted state without having to go through an extended dialog. Support undo and redo.
  4. Consistency and standards - Users should not have to wonder whether words, situations or actions mean the same thing in different contexts. Follow platform conventions and accepted standards.
  5. Error prevention - Make it difficult to make errors. Even better than good error messages is a careful design that prevents a problem from occurring in the first place.
  6. Recognition rather than recall - Make objects, actions and options visible. The user should not have to remember information from one part of the dialog to another. Instructions for use of the system should be visible or easily retrievable whenever appropriate.
  7. Flexibility and efficiency of use - Allow users to tailor frequent actions. Accelerators, unseen by the novice user, may often speed up the interaction for the expert user to such an extent that the system can cater to both inexperienced and experienced users.
  8. Aesthetic and minimalist design - Dialogs should not contain information that is irrelevant or rarely needed. Every extra unit of information in a dialog competes with the relevant units of information and diminishes their relative visibility.
  9. Help users recognize, diagnose and recover from errors - Error messages should be expressed in plain language (no codes), precisely indicate the problem, and constructively suggest a solution.
  10. Help and documentation - Any such information should be easy to search, focused on the user’s task, list concrete steps to be carried out, and not be too large.
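
To make the output of a heuristic evaluation concrete, here is a minimal sketch of aggregating the evaluators’ severity ratings using Nielsen’s 0-4 severity scale; the issues and ratings below are hypothetical.

```python
# Minimal sketch of aggregating heuristic-evaluation findings.
# Severity uses Nielsen's 0-4 scale (0 = not a usability problem,
# 4 = usability catastrophe); the issues below are hypothetical.
from statistics import mean

# (heuristic violated, description, severity ratings from 3 evaluators)
issues = [
    ("Visibility of system status", "No progress bar during upload", [3, 4, 3]),
    ("Error prevention", "Delete has no confirmation step", [4, 3, 4]),
    ("Consistency and standards", "Two different icons for 'save'", [2, 1, 2]),
]

# Rank problems by mean severity so the worst get fixed first.
for heuristic, description, ratings in sorted(
        issues, key=lambda issue: mean(issue[2]), reverse=True):
    print(f"{mean(ratings):.1f}  {heuristic}: {description}")
```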

Model-based evaluation

Model-based evaluation uses a model of how a human would use a proposed system to obtain predicted usability measures by calculation or simulation. These predictions can replace or supplement empirical measurements obtained by user testing. Model-based evaluation combines cognitive and design models in the evaluation process.

Models used for model-based evaluation (a keystroke-level model sketch follows the list):

  • GOMS model
  • Keystroke-level model
  • Design rationale
  • Dialog models
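
As an example, the keystroke-level model predicts expert task-completion time by summing standard operator times. Here is a minimal sketch; the operator values are the commonly cited Card, Moran and Newell estimates, and the example task encoding is hypothetical.

```python
# Minimal Keystroke-Level Model (KLM) sketch. Operator times are the
# commonly cited Card, Moran and Newell estimates (in seconds); real
# analyses calibrate them to the user group.
KLM_OPERATORS = {
    "K": 0.2,   # keystroke or button press (average skilled typist)
    "P": 1.1,   # point with a mouse at a target on the display
    "H": 0.4,   # home hands between keyboard and mouse
    "M": 1.35,  # mental preparation for an action
}

def predict_time(sequence: str) -> float:
    """Predict expert task time for a string of KLM operators."""
    return sum(KLM_OPERATORS[op] for op in sequence)

# Hypothetical task: think, move hand to mouse, point at a menu item,
# click it, then type two characters.
print(predict_time("MHPKKK"))  # -> 3.45 seconds
```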

Evaluation through user participation

User participation in evaluation tends to occur in the later stages of development when there is at least a working prototype of the system in place.

Styles of evaluation

Among the techniques available for evaluation with users, we can distinguish two distinct styles: those performed under laboratory conditions and those conducted in the work environment, or ‘in the field’.

Laboratory studies -

Users are taken out of their normal work environment to take part in controlled tests, often in a specialist usability laboratory.

Advantages -

  • Specialist equipment available - laboratories contain sophisticated audio/visual recording and analysis facilities, two-way mirrors, instrumented computers and the like, which cannot be replicated in the work environment.
  • Uninterrupted environment - the participant operates in an interruption-free environment.

Disadvantages -

  • Lack of context - the unnatural situation may mean that one accurately records a situation that never arises in the real world.
  • Difficult to observe several users cooperating

Appropriate - if the system’s intended location is dangerous or impractical to work in, or for constrained single-user systems, where controlled manipulation of use is needed.

Field studies -

This type of evaluation takes the designer or evaluator out into the user’s work environment in order to observe the system in action.

Advantages -

  • Natural environment - the evaluator can observe interactions between systems and between individuals that would be missed in a laboratory study.
  • Context retained (though observation may alter it) - the user is seen in his or her ‘natural’ environment.
  • Longitudinal studies are possible.

Disadvantages -

  • Distractions and noise - high levels of ambient noise, greater levels of movement and constant interruptions, such as phone calls, all make field observation difficult.

Appropriate - where context is crucial, and for longitudinal studies.

Empirical methods: experimental evaluation

Experimental evaluation provides empirical evidence to support a particular claim or hypothesis. The evaluator chooses a hypothesis to test and manipulates the conditions under which users work; any changes in the behavioural measures are then attributed to the different conditions.

There are a number of factors that are important to the overall reliability of the experiment.

  1. Participants

Participants represent the set of people who take part in the experiment. Since the choice of participants is vital to the experiment’s success, they should be chosen to match the expected user population as closely as possible.

2. Variables

Variables are the things that are modified and measured in the evaluation. There are two types: independent and dependent variables.

  • Independent variables - characteristics changed to produce different conditions, e.g. interface style, number of menu items.
  • Dependent variables - characteristics measured in the experiment, e.g. time taken, number of errors.

3. Hypothesis

A hypothesis is a prediction of the outcome of an experiment, framed in terms of the independent and dependent variables. The aim of the experiment is to show that this prediction is correct; in practice, this is done by disproving the null hypothesis, which states that there is no difference between the conditions.

4. Experimental design

Experimental design describes how the evaluation is carried out, in particular how participants are allocated to conditions. There are two main methods: between-subjects and within-subjects.

  • Between-subjects (or randomized) design - each participant is assigned to only one condition; more participants are required, and individual variation can bias the results.
  • Within-subjects (or repeated measures) design - each participant performs under every condition; this is less costly and less likely to suffer from individual variation, though the order of conditions must be controlled to avoid learning effects.

Once the data is gathered, it must be analyzed. First identify the type of data, discrete or continuous, then choose appropriate statistical methods accordingly; a simple example of such an analysis follows.
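
For instance, here is a minimal sketch of analyzing a between-subjects experiment in which task-completion time (the dependent variable) is compared across two interface conditions; the timing data is hypothetical.

```python
# Minimal sketch of analyzing a between-subjects experiment.
# Two groups of participants performed the same task, one with
# interface A and one with interface B; times are hypothetical.
from scipy import stats

times_a = [34.1, 29.8, 41.2, 36.5, 30.9, 38.0]  # seconds, interface A
times_b = [27.4, 25.1, 33.0, 28.8, 26.2, 30.5]  # seconds, interface B

# Independent-samples t-test suits a between-subjects design; for a
# within-subjects design, where each participant uses both interfaces,
# use stats.ttest_rel instead.
t_stat, p_value = stats.ttest_ind(times_a, times_b)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")

# If p < 0.05 we reject the null hypothesis that both interfaces
# yield the same mean completion time.
```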

Observational techniques

  1. Think Aloud

In this method, a user is asked to describe aloud what he or she is doing and why: what they believe is happening, why they take an action, and what they are trying to do.

Advantages

  • Simplicity - requires little expertise
  • Can provide useful insight into problems with an interface
  • Can show how the system is actually used

Disadvantages

  • Subjective
  • Selective - depending on the tasks provided
  • Act of describing may alter task performance - The process of observation can alter the way that people perform tasks and so provide a biased view

2. Cooperative Evaluation

In this method, a variation of think aloud, the user and the evaluator collaborate, asking each other questions throughout. It is less constrained and easier to use, and the user is encouraged to criticize the system.

Advantages

  • Less constrained and easier to use
  • User is encouraged to criticize the system
  • Clarification possible

3. Protocol Analysis

Methods for recording user actions in protocol analysis:

  • Paper and pencil - cheap, but limited to writing speed.
  • Audio - good for think aloud, but difficult to match with other protocols.
  • Video - accurate and realistic, but needs special equipment and is obtrusive.
  • Computer logging - automatic and unobtrusive, but large amounts of data are difficult to analyze.
  • User notebooks - coarse and subjective, but offer useful insights and are good for longitudinal studies.

In practice a mixture of these methods is used. Audio/video transcription is difficult and requires skill, though some automatic support tools are available. A sketch of the computer-logging approach is shown below.
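
Here is a minimal sketch of the computer-logging approach; the event names and log format are hypothetical, and a real study would log whatever actions matter for the tasks being observed.

```python
# Minimal sketch of computer logging for protocol analysis.
# Event names and the log format are hypothetical.
import json
import time

LOG_PATH = "session.log"  # hypothetical log file

def log_event(action: str, **details) -> None:
    """Append one timestamped user action as a JSON line."""
    record = {"t": time.time(), "action": action, **details}
    with open(LOG_PATH, "a") as f:
        f.write(json.dumps(record) + "\n")

# Example events from a hypothetical session:
log_event("menu_open", menu="File")
log_event("menu_select", item="Save As")
log_event("error", message="Permission denied")
```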

4. Automated Analysis

Analyzing protocols, whether video, audio or system logs, is time-consuming and tedious by hand. Automated analysis provides tools such as EVA (Experimental Video Annotator), a system that runs on a multimedia workstation with a direct link to a video recorder, to support this task.

Advantages

  • The analyst has time to focus on relevant incidents
  • Avoid excessive interruption of the task

Disadvantages

  • Lack of freshness
  • May rely on post-hoc interpretation of events

5. Post-task walkthrough

In this method, the user reflects on the actions after the event, often while reviewing a recording or transcript of the session. This gives the analyst time to focus on relevant incidents and avoids excessive interruption of the task, but the account lacks freshness and may be a post-hoc interpretation of events.

Query techniques

  1. Interviews

In this method, the analyst questions the user on a one-to-one basis, usually with prepared questions about his or her experience with the design. Interviews are informal, subjective, and relatively cheap compared to other methods, but more time-consuming.

Advantages

  • Can be varied to suit the context
  • Issues can be explored more fully
  • Can elicit user views and identify unanticipated problems

Disadvantages

  • Very subjective
  • Time-consuming

2. Questionnaires

In this method, users are given a set of fixed questions about what they prefer and what they think about the design. Questionnaires can reach a large group of people in less time, but they are less flexible and less probing than interviews. That fixed format, however, is what allows responses to be analyzed more rigorously, as sketched after the lists below.

Advantages

  • Quick and reaches large user group
  • Can be analyzed more rigorously

Disadvantages

  • Less flexible
  • Less probing
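
As a minimal sketch of such analysis, with hypothetical ratings, Likert-scale answers can be summarized numerically:

```python
# Minimal sketch of summarizing Likert-scale questionnaire responses.
# Ratings are hypothetical: 1 = strongly disagree ... 5 = strongly agree.
from collections import Counter
from statistics import mean, median

# Responses to "The system was easy to use."
ratings = [4, 5, 3, 4, 2, 5, 4, 3, 4, 5]

print("mean:", mean(ratings))          # central tendency
print("median:", median(ratings))      # robust to extreme answers
print("distribution:", Counter(ratings))
```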

Evaluation through monitoring physiological responses

Eye Tracking

In eye tracking, the position of the eye is tracked using head-mounted or desk-mounted equipment. The following measurements are taken, and the evaluation is based on analyzing them (a small computation sketch follows the list).

Fixations: the eye maintains a stable position.

  • Number of fixations - the more fixations, the less efficient the search strategy.
  • Fixation duration - longer fixations indicate greater difficulty extracting information from the display.

Saccades: rapid eye movements from one point of interest to another.
Scan paths: the route the eye takes between fixations; moving straight to a target with a short fixation at the target is optimal.
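
As a minimal sketch of how these measurements feed into an evaluation (the data format and values are hypothetical), fixation counts and durations can be computed directly from the tracker’s output:

```python
# Minimal sketch of computing fixation metrics from eye-tracker output.
# Each fixation is a (x, y, duration_ms) tuple, a format many trackers
# can export; the values below are hypothetical.
fixations = [
    (120, 340, 180),
    (125, 338, 220),
    (400, 90, 450),   # unusually long fixation: possible difficulty
    (640, 210, 200),
]

count = len(fixations)
mean_duration = sum(d for _, _, d in fixations) / count

print(f"{count} fixations, mean duration {mean_duration:.0f} ms")
# More fixations suggest a less efficient search strategy; longer
# fixations suggest difficulty extracting information from the display.
```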

Physiological Measurements

In this method, the user’s emotional and physical changes while using the interface are measured, and the evaluation is based on that data.

The following are the kinds of responses observed in the process:

  • Heart activity, including blood pressure, volume and pulse.
  • The activity of sweat glands: Galvanic Skin Response (GSR)
  • Electrical activity in muscle: electromyogram (EMG)
  • Electrical activity in the brain: electroencephalogram (EEG)

Read more

Design rules for interactive systems

Universal Design Principles

Multi-modal interaction

Designing For Diversity
