CS 3205 Final Exam

Scalar questions

Asks the user to judge a specific statement on a numeric scale. The scale usually corresponds to agreement or disagreement with the statement

Open-ended questions

Asks for unprompted opinions, good for general subjective information. Difficult to analyze rigorously

Discount usability evaluation (qualitative)

Observe user interactions. Gather user explanations and opinions. Produces a description, usually in non-numeric terms. Anecdotes, transcripts, problem areas, critical incidents

"Do you like my interface?"

A leading question; invites please-the-experimenter bias

Naturalistic approach

Observation occurs in realistic setting. Provides useful, realistic data. More likely to generalize. Hard to arrange and do, time consuming. Good for external validity

Usability engineering approach

Observe people using systems in simulated/ artificial settings. Given specific tasks to do. Observations/ measures made as people do their tasks. Look for problem areas/ successes. Good for uncovering 'big effects'. Non-typical users tested, non-typical tasks, different physical environment/ social context

Acceptance testing

Verify that system meets expected user performance criteria

Quantity vs. Quality

Bayles and Orland (and their pots). Quantity produces a better final product. It is important to make things instead of just sitting and thinking/ theorizing (functional fixation)

Sharing multiple prototypes

Better than sharing best or working as a group on one. More individual exploration, more feature sharing, more conversational turns, better consensus, increase in group rapport

Double-blind studies

Neither the user nor the facilitator knows which experimental group the user is in

Internal validity

Can reproduce the experiment multiple times yourself. Same prototypes, different users, same experimental setup, same conditions

Video recording

Can see and hear what a user is doing. One camera for screen, rear view mirror useful. Initially intrusive

Why do quantitative analysis?

Can't just ask people (preference is not performance). Observations alone won't work (effects may be too small to see but important, variability of people will mask differences). Need to understand differences between users. Good for small details

Case/field studies

Careful study of "system usage" at the site, good for seeing "real life" use, external observer monitors behavior, site visits

Ordinal scale

Classification into named or numbered ordered categories; no information on magnitude of differences between categories (preference, social status, gold/ silver/ bronze medals). Can do everything you can with nominal scale, plus merge adjacent classes, is also transitive. Can find median, percentiles.

Nominal scale

Classification into named or numbered unordered categories (country of birth, user groups, gender, etc.). Can tell whether an item belongs in a category, can count items in a category. Can't find means, medians, etc.

Interval scale

Classification into ordered categories with equal differences between categories; zero only by convention (temperature C or F, time of day). Can add, subtract, cannot multiply as this needs an absolute zero. Can find mean, standard deviation, range, variance. Can have problems with instrument calibration, reproducibility, readability, human error

Qualitative analysis

Collect non-numerical data. Conversation transcripts, general observations. Analyze for broad consistent patterns. Naturalistic vs. experimental

Unpaired T-test

Comparing two sets of independent observations. Usually different subjects in each group. Groups may be different sizes

Causal inference

Control as many external variables as possible; randomize confounding variables. Any outside variable that could affect the result of the study should be equivalent for ALL test subjects.

Discount usability evaluation (quantitative)

Count, log, measure something of interest in user interactions. Speed, error rate, counts of activities, etc.

Collecting user performance data

Data collected on system use. Exploratory vs. targeted

T-test assumptions

Data points of each sample are normally distributed, population variances are equal, individual observations of data points in sample are independent (a person's data is included no more than once)

Usability engineering lifecycle

Design -> implementation -> evaluation (and repeat)

Inspection

Designer tries the system or prototype. Can catch major problems in early versions. Not reliable, as it is completely subjective; not valid, as the introspector is a non-typical user; intuitions and introspection are often wrong. Helped by task-centered walkthroughs and heuristic evaluation

Initial design stages

Develop and evaluate initial design ideas with the user

Best iPhone study

A hypothetical study (did not actually happen). Select participants for each phone group at random, train them for a while, then test for speed and error rate

Correlation

Do X and Y co-vary? Requires measuring X and Y. Probably need two prototypes or two different versions of a prototype (each with different X)

Causation

Does X cause Y? Requires measuring X and Y (establishing correlation). Requires establishing time precedence. Requires controlling for all confounding variables

Iterative design

Does system behavior match user's task requirements? Are there specific problems with the design? What solutions work?

Ways to get around please the experimenter bias

Double-blind studies; don't let the user know what you are measuring/ what you care about (until the study is over); ask questions that cancel each other out. The evaluation measure should ALWAYS have a baserate if possible

Direct observations

Evaluator observes users interacting with system. Excellent at identifying gross design/ interface problems. Validity depends on how controlled/ contrived situation is. Simple observation, think aloud, constructive interaction

Evaluation

Experiment (or set of experiments) meant to provide answers to at least one design question. MUST have a research question, usually related to usability requirements. Heuristic, quantitative, qualitative

External validity

Experiment applies generally to other outside settings. Different users selected from a different 'pool', different prototypes with the same general IV and DV, different designers running experiments. Results apply generally to experiments with the same abstract characteristics

Self selection

Experimental groups are chosen by participants in some manner

Experimental approach

Experimenter controls all environmental factors. Good for internal validity

Heuristic evaluation

Experts look at a system and analyze carefully, produce report of usability problems. Can be difficult/ expensive to find/ hire an expert

Chi-square test

Good for categorical data with samples too large for Fisher's exact test (FET). No category's expected count should be less than 5. X^2 = (sum i = 1 -> n) [(O_i - E_i)^2] / E_i, where O_i is the observed count for category i and E_i is the expected count, E_i = (rowTotal*colTotal)/n. Use a chi-square table to get the p-value
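The chi-square statistic above can be sketched in a few lines of pure Python. The 2x2 success/failure counts for two prototypes below are made-up illustration data, not from the course:

```python
def chi_square(table):
    """Compute X^2 = sum over cells of (O - E)^2 / E,
    with E = rowTotal * colTotal / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            x2 += (observed - expected) ** 2 / expected
    return x2

# Hypothetical data: task success/failure counts for two prototypes
observed = [[30, 10],   # prototype A: 30 successes, 10 failures
            [18, 22]]   # prototype B: 18 successes, 22 failures
x2 = chi_square(observed)
# For a 2x2 table, df = (2-1)*(2-1) = 1; the critical value at
# p = .05 is 3.84, so x2 > 3.84 means a significant difference.
print(x2 > 3.84)  # prints True (x2 = 7.5 here)
```

In practice you would look the p-value up in a chi-square table (or use a library routine) rather than hard-coding one critical value.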

Interviews

Good for pursuing specific issues. Vary questions to suit context, probe more deeply on interesting issues as they arise (let user lead conversation), often leads to specific constructive suggestions. Accounts are suggestive, time consuming, evaluator can easily bias the interview, prone to rationalization of events/ thoughts

Audio recording

Good for recording think aloud talk. Hard to tie into on-screen user actions

Exploratory data collection

Hope something interesting shows up (like a pattern), can be difficult to analyze

Formative conceptual model

How a person perceives a screen after it has been used for a while

Initial conceptual model

How a person perceives a screen the very first time it is viewed

Baserate

How often does Y occur in the current setting (if one exists)? Might make sense to look at competing product

Independent vs dependent variables

Independent: the variable that is manipulated to study an effect via a change. Dependent: the variable that is measured for change after the IV is altered

Fair comparison

Insert new approach into an actual production setting, recreate the production approach in your new setting, scale things down so you're looking at a piece of a larger system (most relevant), when expertise is relevant train people before running study

Methods for qualitative discount usability evaluation

Inspection, extracting the conceptual model, direct observation (think-aloud, constructive interaction), query techniques (interviews, questionnaires), continuous evaluation (user feedback, field studies)

Ratio scale

Interval scale with absolute, non-arbitrary zero (temperature K, length, weight, time periods). Can multiply, divide

Pre-design

Investing in new expensive system requires proof of viability

Targeted data collection

Look for specific information, but may miss something

Discount usability evaluation

Low cost methods to gather usability problems. Approximate: capture most large and many minor problems

Process of controlled experiments

Lucid and testable hypothesis (includes both independent and dependent variable(s)). Judiciously select and assign subjects to groups. Control for bias (in instructions, experimental protocols, subject selection). Apply statistical methods to data analysis. Interpret your results

Parallel prototyping

Make multiple prototypes in parallel. Separates ego from artifact (criticism of one design is not a criticism of designer). Supports transfer of positive attributes across designs

Continuous evaluation

Monitor systems in actual use (usually late stages of development like beta releases, delivered system; fix problems in next release). User feedback via gripe lines (users can provide feedback to designers while using the system through help desks, bulletin boards, email, built-in gripe facility) best combined with trouble-shooting facility

Degrees of freedom T-test

N1 + N2 - 2

Null hypothesis of T-test

No difference exists between the means of two sets of collected data

Scales of measurements

Nominal, ordinal, interval, ratio

First iPhone study

Numeric keypad and QWERTY users. Measured WPM. Internal validity, but no external validity. Both groups performed the same (and poorly compared to their own phones)

How many users should you observe?

Observing many users is expensive. Individual differences matter. Shoot for 5 - 10. Reasonable number of users tested, reasonable range of users, big problems usually detected with handful of users, small problems/ fine measures need many users

Directional T-test

Only interested if the mean of a given condition is greater (OR less) than the other

Styles of questions

Open-ended questions, closed questions (scalar, multi-choice, ranked). Can combine to get specific response while allowing for user's opinion (with a comment section)

Recording observations

Paper and pencil, audio recording, video recording

Critical incident interviews

People talk about incidents that stood out. Usually discuss extremely annoying problems with fervor, not representative but important to the user, often raises issues not seen in lab tests

Please the experimenter bias

People want to make you feel good about your work (they assume you worked hard)

How do we compare prototypes?

Perform an evaluation

Quantitative evaluation/ analysis

Perform an experiment that involves the collection of quantitative data (numeric data or data that can be translated into numeric data). Run statistical tests to evaluate differences across prototypes

How to interview

Plan a set of central questions, could be based on results of user observations, focuses the interview. Avoid leading questions. Let user responses lead follow-up questions

Retrospective testing interviews

Post-observation interview. Perform observational test, create video record of it, have users view video and comment on it. Clarify events that occurred during system use, avoids erroneous reconstructions, users often offer concrete suggestions

Quantitative analysis

Precise measurement, numerical values. User performance data collection, controlled experiments

Natural vs. experimental

Precision and direct control over experimental design vs. desire for maximum generalizability in real life situations

Questionnaires/ surveys

Preparation is expensive but administration is cheap. Does not require presence of evaluator. Results can be quantified. Only as good as the questions asked. Only ask questions that will have answers you care about. Determine the audience you want to reach. Determine how to deliver/ collect questionnaire (on-line, web site, surface mail)

Paper and pencil

Primitive but cheap. Record events, comments, interpretations. Hard to get detail (writing is slow). Should probably have two people doing this

Leading question

Question that suggests the answer the examiner is looking for or contains the information the examiner is looking to have confirmed. Don't ask these!

Ways of controlling subject variability

Reasonable number of subjects, random assignment, make different user groups an independent variable, screen for anomalies in subject group

Type 1 error

Reject null hypothesis when it is, in fact, true. Considered worse because null hypothesis is meant to reflect the incumbent theory

Multi-choice questions

Respondent offered a choice of explicit responses

Ranked questions

Respondent places an ordering on items in a list. Useful to indicate user's preferences. Forced choice

Closed questions

Restrict respondent's responses by supplying alternative answers. Makes questionnaires a chore for respondent to fill in. Can be easily analyzed. Watch out for hard to interpret responses (alternative answers should be very specific)!

Conceptual model extraction

Show user static images of prototype or screens during use, have user explain function of each screen element/ how they would perform a particular task (and why they think that). Initial vs. formative. Good for eliciting people's understanding before and after use. Poor for examining system exploration and learning

T-test

Simple statistical test, allows one to say something about differences between means at a certain confidence level. Unpaired vs. paired, non-directional vs. directional. FORMULAS. Look up critical value in table
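The unpaired, pooled-variance t formula (consistent with the card's equal-variance assumption and the df = N1 + N2 - 2 card) can be sketched in pure Python. The task-completion times below are hypothetical:

```python
import math
from statistics import mean, variance

def unpaired_t(a, b):
    """Unpaired t-test with pooled variance.
    s_p^2 = ((n1-1)s1^2 + (n2-1)s2^2) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / sqrt(s_p^2 * (1/n1 + 1/n2))"""
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    df = n1 + n2 - 2   # degrees of freedom, as on the card
    return t, df

# Hypothetical task-completion times (seconds) for two prototypes
group_a = [12.1, 11.5, 13.0, 12.4, 11.9]
group_b = [14.2, 13.8, 14.9, 13.5, 14.4]
t, df = unpaired_t(group_a, group_b)
# For a non-directional (two-tailed) test at alpha = .05 with df = 8,
# the critical value from a t table is about 2.306; reject the null
# hypothesis if |t| exceeds it.
print(df)  # prints 8
```

Here `variance` is the sample variance (n - 1 denominator), matching the pooled-variance formula.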

Lucid and testable hypothesis

State a lucid, testable hypothesis. This is a precise problem statement

Statistical analysis

Tells us mathematical attributes about data sets (mean, variance, etc.), how data sets relate to each other, the probability that claims are correct (statistical significance -> 5%)

Confidence limits

The confidence that your conclusion is correct

Null hypothesis

There is no difference

Constructive interaction method

Two people work together on a task. Monitor normal conversations, removes awkwardness of think-aloud. Co-discovery learning -> use semi-knowledgeable 'coach' and novice, only novice uses the interface, gives insights into two user groups

Non-directional T-test

Two-tailed. No expectation that the direction of difference matters

Fisher's exact test

Uses a contingency table. p = [(a+b)!(c+d)!(a+c)!(b+d)!] / (a!b!c!d!n!). Good for simple comparisons between distributions of data and small sample sizes; very robust (p is exact). Bad for complicated multi-dimensional data and large sample sizes (because of the factorials)
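The card's formula gives the probability of one particular 2x2 table (the full test sums this over all tables at least as extreme). A direct pure-Python sketch, with hypothetical counts:

```python
from math import factorial

def table_probability(a, b, c, d):
    """p = (a+b)! (c+d)! (a+c)! (b+d)! / (a! b! c! d! n!)
    for the 2x2 contingency table [[a, b], [c, d]]."""
    n = a + b + c + d
    num = (factorial(a + b) * factorial(c + d)
           * factorial(a + c) * factorial(b + d))
    den = (factorial(a) * factorial(b) * factorial(c)
           * factorial(d) * factorial(n))
    return num / den

# Hypothetical 2x2 table:   success  failure
#   prototype A                8        2
#   prototype B                3        7
p = table_probability(8, 2, 3, 7)
print(p)
```

The factorials grow quickly, which is why the card recommends the chi-square test instead for large samples.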

Partial solution to usability engineering approach

Use real users. Use tasks from task-centered system design. Make the environment similar to the real situation

Controlled experiments

Use traditional scientific method. Reductionist (clear convincing result on specific issues). Insights into cognitive process, human performance limitations, etc. Allows system comparison, fine-tuning of details, etc.

Direct observations in lab

User asked to complete set of pre-determined tasks

Direct observations in field

User goes through normal duties

Simple observation method

User is given task, evaluator just watches user. Does not give insight into the user's decision process or attitude

Think aloud method

Users speak their thoughts while doing the task (what they are trying to do, why they took action, how they interpret what the system did, etc.). Gives insight into what the user is thinking. Most widely used evaluation method in industry. May alter the way users do the task, unnatural, hard to talk if concentrating

Paired T-test

Usually a single group studied under both experimental conditions. Data points of one subject are treated as a pair. Both conditions will have same number of data points

Confounding variables

Variables that affect both X and Y. If not controlled, cannot say that X causes Y

3 questions to establish purpose of questionnaire

What information is sought? How would you analyze the results? What would you do with your analysis?

Interpret your results

What you believe results really mean, their implications on your research, their implications to practitioners, how generalizable they are, limitations and critique

Statistical vs. practical significance

When n is large, even a trivial difference may show up as a statistically significant result. Statistical significance does not imply that the difference is important (matter of interpretation)

Problem with visual inspection of data

Will almost always see variation in collected data. Is it normal variation or a real difference between data?

Better iPhone study

iPhone (after >= one month of use), numeric keypad, and QWERTY users. Measured speed and error rate. iPhone and QWERTY were the same speed (numeric much slower); iPhone users made many more errors. Still has the problem of self selection

