IAT 432 Midterm
Formative vs. Summative Methods
seeking possibilities for improvement early on in dev. vs. evaluating overall performance of final design
p-value
the probability, assuming the null hypothesis is true, of observing a difference at least as large as the one you measured, i.e. the chance of seeing an apparent effect when there isn't one. The usual significance threshold is p < 0.05
interview recommendations
- FLOW: nodding and engaging ("yes", "I see", etc.) - NON-DIRECTION: continuing the conversation without giving direction (repeat the last few words to imply you want to hear more, or ask neutral questions such as "how do you feel about this?") - TRANSITION: moving on to a different topic (link back to an old topic, or say outright that you are moving on) - SPECIFICITY: getting more details (ask them to show you how they do something, "tell me more")
Benefits and drawbacks of usability study
- benefits: finds usability issues, improves the end users' experience - drawbacks: neglects emotional responses, issues with validity, appropriate audience is hard to find
ways to ensure external validity
- carefully select participants - carefully setup testing environment - select useful, real tasks
ways to ensure internal validity
- complete training tasks - let the user have the product for a bit so the "new toy" feeling fades - reduce study fatigue with fewer tasks
pros of cued recall debrief
- more immersive than post-task interview - easier to implement, less equipment - can be used in different systems - evaluating affect may be better for design than evaluating emotion
cons of cued recall debrief
- participant may be unwilling to speak about feelings - can take time - participant may not accurately describe their feelings
How to run a controlled experiment?
1. State a hypothesis 2. Plan tasks 3. Manipulate the independent variable 4. Measure the dependent variables 5. Analyze statistics 6. Derive conclusions
steps for cued recall debrief
1. record first person view 2. play it back to be re-immersed 3. interview during play back and record third-person view, ask to describe feelings 4. analyze third-person videos and find comments related to affective experience
how to affinity diagram?
1. take notes of users anecdotes 2. group notes by themes 3. label these groups 4. draw relationships between these groups 5. report on the themes, using a story
how to do coding?
1. transcribe notes from the interview 2. list questions/focal points 3. go through the notes and answer the questions by coding the data 4. write memos to take note of your thoughts (heading, date, time) 5. go through the data and count labels to get a sense of the importance of different data
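Step 5 (counting labels) can be sketched in a few lines of Python. This is only an illustration: the excerpts and label names below are made up, and `count_labels` is a hypothetical helper, not part of any coding tool.

```python
from collections import Counter

# Hypothetical coded interview excerpts: each excerpt can carry several labels.
coded_excerpts = [
    ("I could not find the save button", ["navigation", "frustration"]),
    ("The icons made sense right away", ["learnability"]),
    ("I kept opening the wrong menu", ["navigation", "errors"]),
    ("It felt slow to get anything done", ["efficiency", "frustration"]),
]


def count_labels(excerpts):
    """Tally how often each label was applied across all excerpts."""
    counts = Counter()
    for _text, labels in excerpts:
        counts.update(labels)
    return counts


label_counts = count_labels(coded_excerpts)

# Most frequent labels first, as a rough indicator of importance.
for label, n in label_counts.most_common():
    print(f"{label}: {n}")
```

Frequency is only a rough signal; a label that appears once can still point to a serious issue.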
What is a hypothesis and what are the four elements required?
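a testable prediction about how the independent variable affects the dependent variable. Example: Adult users (population) type (activity/task) more words per minute (DV) using a QWERTY keyboard than when using a DVORAK keyboard (keyboard layout is the IV; QWERTY and DVORAK are its two conditions). 1. Population 2. Activity 3. Dependent Variable 4. Independent Variable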
Evaluation ethics
Always treat participants with respect - Don't waste their time: don't ask unnecessary questions - Make users feel comfortable: speak to them normally - Maintain privacy: no names in data, images, etc. - Inform users: let them know they will be recorded, what they are doing and why, how long it will take
What is a usability study?
An evaluation method used to help find usability issues within a design.
Why null hypothesis?
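the claim that IT IS NOT THE CASE that the independent variable has an effect: no difference, no impact. Statistical tests work by trying to reject it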
Behavioural vs. Attitudinal Data
how users behave vs. how users feel
hawthorne effect
a change in the participants' behaviour because they know they are being watched
controlled variable
a factor in an experiment that remains the same
confounding variable
a factor other than the factor being studied that may affect the results of a study
affinity diagramming
a spatial clustering method to categorize data and show relationships amongst it, in order to uncover the main themes
t-test
a statistical test used to evaluate the size and significance of the difference between two means. Assumes roughly normally distributed data; almost always use two-tailed
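A minimal sketch of the arithmetic behind an unpaired t-test, using only the Python standard library. The WPM numbers are invented, `unpaired_t` is a hypothetical helper, and the pooled-variance (Student's) form is an assumption, since the notes do not say which variant the course uses.

```python
import math
import statistics


def unpaired_t(group_a, group_b):
    """Student's t statistic for two independent groups (pooled variance)."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (divides by n - 1)
    var_b = statistics.variance(group_b)
    df = n_a + n_b - 2  # degrees of freedom
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / std_err
    return t, df


# Made-up words-per-minute scores from two different groups of participants.
qwerty_wpm = [52, 48, 55, 60, 50]
dvorak_wpm = [45, 47, 42, 50, 46]

t, df = unpaired_t(qwerty_wpm, dvorak_wpm)
print(f"t = {t:.3f}, df = {df}")
```

The t value is then compared against a t-table for the given degrees of freedom. For df = 8, the two-tailed critical value at 0.05 is about 2.306, so a larger |t| would count as significant.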
cognitive/experiential response
a subjective feeling, measured in interviews/questionnaires. Will only get emotion, not affect
Controlled Experiment
a test to compare two products along a single dimension, aiming to isolate that single variable to see if there is a causal relationship between the independent variable and the dependent variable
external validity
how well do the results of the study generalize to the public?
emotion vs affect
affect is the quick, immediate feeling, while emotion is the overall feeling you have about something after reflecting on it. Affect influences emotion
what is a heuristic evaluation?
an examination of the interface, keeping in mind usability heuristics, to find usability issues
SCALE OF MEASUREMENT: ratio scale
an interval scale with a true 0 (zero means none of the quantity is present) ex. weight
USABILITY TECHNIQUE: Conceptual Model Extraction
ask the user to explain the function of each element and how they would perform a task. Gives a sense of what a user thinks your design is for. Can be used at first glance (initial) OR after it has been used for some time (formative). Bad for examining design exploration and learning
USABILITY TECHNIQUE: Think Aloud
ask users to speak their thoughts aloud. For when you want insight into what your user is thinking. Bad: the user may think it's weird and awkward, so they may behave differently
cued recall debrief
asking your participant to watch back and review their test (used for finding affect without interrupting your tester during testing)
mean
the average: the sum of all measurements divided by the number of measurements
reliability
consistency or repeatability of the experiment
SCALE OF MEASUREMENT: interval scale
consistent difference between each item, but no true 0 ex. temperature in Celsius
descriptive statistics vs inferential statistics
describe the data: graphs, variability, central tendency vs. make inferences (conclusions) about the data
face validity
does it seem to be measuring what you want to measure?
cons of heuristic evaluation
does not involve real users, you are bound to miss some things, does not get specific enough
construct validity
does what you are measuring represent the thing you want to determine? ex. does a higher WPM mean a better typist?
between subjects
each participant only tests with one of the two conditions of the independent variable
when and why would you use a heuristic eval?
early on in development, since it can be done without working digital prototypes, can stop a problem from growing to the entire interface.
within subjects
every participant tests with both conditions of the independent variable
novelty effect
participants are excited about the tasks because they are trying a new thing. fix: let them get used to it
pros of heuristic evaluation
faster and cheaper than user testing
learning effects
for within subjects, a tester should get better at the task because they have already tried it with one of the IV conditions. fix: counterbalance!
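Counterbalancing can be sketched as alternating which condition each participant tries first, so that any learning effect is spread evenly across both conditions. The participant IDs and the `counterbalance` helper below are hypothetical.

```python
def counterbalance(participants, conditions=("A", "B")):
    """Alternate which of two conditions each participant tries first."""
    first, second = conditions
    orders = {}
    for i, participant in enumerate(participants):
        # Even-indexed participants start with the first condition,
        # odd-indexed participants start with the second.
        orders[participant] = (first, second) if i % 2 == 0 else (second, first)
    return orders


orders = counterbalance(["P1", "P2", "P3", "P4"], ("QWERTY", "DVORAK"))
for participant, order in orders.items():
    print(participant, "->", " then ".join(order))
```

With an even number of participants, each condition comes first equally often.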
range
highest - lowest
internal validity
how certain are we that the change in IV causes the change in DV?
inter-rater reliability
how similarly a group of observers code an event. Ranges from -1 to 1 (completely opposite to completely the same); if > 0.9, the coding is considered reliably the same
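The notes do not name the statistic, so the sketch below assumes Cohen's kappa for two raters, which is one common agreement measure on a -1 to 1 scale. The labels are invented and `cohens_kappa` is a hypothetical helper.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance.

    Note: undefined (division by zero) if both raters use one identical label
    for every item, because expected agreement is then 1.
    """
    n = len(rater_a)
    # Proportion of items where both raters chose the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    # Agreement expected if both raters labelled at random with their own rates.
    expected = sum(counts_a[l] / n * counts_b[l] / n for l in labels)
    return (observed - expected) / (1 - expected)


# Two observers coding the same ten comments; they disagree on one.
rater_a = ["pos", "pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg"]
rater_b = ["pos", "pos", "pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg"]

kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f}")
```

Here the raters agree on 9 of 10 items, but kappa comes out lower than 0.9 because part of that agreement would be expected by chance.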
p-value results
if p < 0.05, reject the null hypothesis: "if there were really no difference, there would be less than a 5% chance of seeing a result like this". If p >= 0.05, you fail to reject the null hypothesis (this does not prove the null is true): "if there were really no difference, there would be at least a 5% chance of seeing a result like this"
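One way to see what a p-value means is a permutation test. This is a different technique from the t-test and is used here only as an illustration. If the null hypothesis is true, the group labels are arbitrary, so we shuffle them many times and count how often a difference at least as large as the observed one shows up by chance. The data and the `permutation_p_value` helper are hypothetical.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Estimate a two-tailed p-value by shuffling group labels."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        # Difference in means for this random relabelling of the data.
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_shuffles


# Made-up words-per-minute scores from two different groups of participants.
qwerty_wpm = [52, 48, 55, 60, 50]
dvorak_wpm = [45, 47, 42, 50, 46]

p = permutation_p_value(qwerty_wpm, dvorak_wpm)
print("reject null" if p < 0.05 else "fail to reject null", f"(p = {p:.3f})")
```

For this toy data the estimate falls below 0.05, so the null hypothesis would be rejected.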
critical incident interviews
interviews that aim to find out what stood out most to participants in testing
retrospective testing interviews
interviews that ask users to reflect about what just happened in testing
USABILITY TECHNIQUE: Query techniques
interviews/questionnaires. pre-test: gain background information, check if they are the correct audience for the test. post-test: assess the user's thoughts about the system, ask about the 5 aspects of usability
USABILITY TECHNIQUE: Simple Observation
just watch the user. For getting an unintrusive view into how a user completes a task. bad: get no insight into the user's decision process or attitude
SCALE OF MEASUREMENT: ordinal scale
labelled in sequential order, but the size of the difference between items is not meaningful ex. education level (elementary, secondary, university...)
SCALE OF MEASUREMENT: nominal scale
labelled variables, does not imply a quantitative value ex. numbered off items
5 aspects of usability
learnability, efficiency, memorability, errors, satisfaction
intrinsic motivation inventory (IMI)
measures enjoyment (used for emotion) - interest/enjoyment - perceived competence - perceived choice - pressure/tension
measures of variability
measures of the spread of data points: range, variance, standard deviation
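A quick sketch of the three measures using Python's standard `statistics` module. The task times are made up, and the sample (n - 1) versions of variance and standard deviation are assumed.

```python
import statistics

# Hypothetical task completion times in seconds.
task_times = [12, 15, 11, 18, 14]

data_range = max(task_times) - min(task_times)  # highest - lowest
variance = statistics.variance(task_times)      # sample variance (n - 1)
std_dev = statistics.stdev(task_times)          # square root of the variance

print(f"range = {data_range}")
print(f"variance = {variance}")
print(f"standard deviation = {std_dev:.2f}")
```

Use `statistics.pvariance` and `statistics.pstdev` instead if the data covers the whole population rather than a sample.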
unstructured interview
more like a conversation, no planned questions. pros: rich detail that you may not have planned for. cons: hard to replicate, hard to analyze
Quantitative vs Qualitative Data
numerical data vs descriptive data based in language
study fatigue
participants get tired of doing your study tasks. fix: fewer tasks
How do you conduct a usability study?
plan it, pilot test, recruit participants, pre-test questionnaire, participants complete tasks while you observe, post-test interview/questionnaire
structured interview
pre-determined questions that are short and clearly worded; confirmatory rather than exploratory. pros: easy to replicate, get specific answers, easy to analyze. cons: restrictive answers, details can be lost
semistructured interview
pre-select topic areas and potential questions pros: can guide interview but not enforce what is discussed cons: can get off topic
importance of neutral comments
they provide additional context that cannot be derived from positive and negative comments alone
analytic vs. participative methods
studying the users vs. involving the users in the study
coding
tagging data with labels and memos; multiple tags can go on one piece of data. inductive: you build a theory while doing this
standard deviation
roughly the typical distance of a data point from the mean (the square root of the variance)
dependent variable
the experimental factor that is being measured
independent variable
the experimental factor that is manipulated, the factor that is being studied
median
the middle measurement when the data is sorted
mode
the most frequent measurement
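Mean, median, and mode can be compared side by side with Python's standard `statistics` module. The completion times below are invented and include one slow outlier to show how it pulls the mean but not the median.

```python
import statistics

# Hypothetical task completion times in seconds, with one slow outlier (90).
completion_times = [30, 42, 35, 42, 28, 42, 90]

mean = statistics.mean(completion_times)      # sum divided by count
median = statistics.median(completion_times)  # middle value once sorted
mode = statistics.mode(completion_times)      # most frequent value

print(f"mean = {mean:.2f}, median = {median}, mode = {mode}")
```

When a data set has outliers like this, the median is often the safer summary of a typical participant.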
USABILITY TECHNIQUE: Constructive Interaction/Co-Discovery Learning
two people work together on tasks: one semi-knowledgeable "coach" and one novice. Only the novice uses the interface. Monitor their normal conversation; this removes the awkwardness of think aloud
unpaired vs paired t-test
unpaired (between): different subjects in each group; paired (within): same subjects in each group
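A paired t-test works on the per-participant differences rather than on the two groups separately. The sketch below uses only the standard library; the scores are invented and `paired_t` is a hypothetical helper.

```python
import math
import statistics


def paired_t(condition_a, condition_b):
    """t statistic for paired data: same participants measured in both conditions."""
    # One difference per participant; order of the two lists must match.
    diffs = [a - b for a, b in zip(condition_a, condition_b)]
    n = len(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / std_err, n - 1  # (t, degrees of freedom)


# Made-up WPM scores: index i in both lists is the same participant.
qwerty_wpm = [52, 48, 55, 60, 50]
dvorak_wpm = [47, 44, 49, 56, 44]

t_paired, df_paired = paired_t(qwerty_wpm, dvorak_wpm)
print(f"t = {t_paired:.2f}, df = {df_paired}")
```

Because each participant acts as their own baseline, differences between people drop out, which is why a within-subjects design can detect an effect with fewer participants.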
who would do a heuristic eval?
usability experts, subject matter experts, double experts
behavioural response
your body's external response, measured with facial expressions, changes in tone, gestures. Could measure affect, but the data needs interpreting
physiological response
your body's internal response, measured with a heartbeat/breathing monitor or sweat. Could measure affect, but the data needs interpreting