What is a good test
previously validated test
- Concurrent validity - compare a test to a criterion (the "best" available test)
-- Skinfolds to underwater weighing, 12-min run to VO2max from a lab test
test-retest method (most common)
- 2 administrations of the same test to the same group of individuals
- Same day or over a short period of time (e.g. one week)
Statistical options:
- Correlations - Pearson (r), Spearman (rho)
- Coefficient of determination (r²)
- Standard Error of Measurement (SEM)
- t-test - compare means (paired samples)
- ANOVA - repeated measures - tests for differences, but can use a special statistic called an Intra-class Correlation (ICC), sometimes called Cronbach's Alpha (a worked example of these statistics follows below)
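A minimal sketch of these test-retest statistics in Python. All data, variable names, and the choice of the ICC(3,1) consistency form are illustrative assumptions, not values from the notes:

```python
import numpy as np
from scipy import stats

# Hypothetical vertical-jump scores (cm) for 8 subjects tested twice, one week apart
trial1 = np.array([41, 35, 50, 38, 44, 47, 33, 40], dtype=float)
trial2 = np.array([43, 34, 49, 40, 45, 46, 35, 41], dtype=float)

# Pearson r between the two administrations; r^2 is the coefficient of determination
r, _ = stats.pearsonr(trial1, trial2)

# Paired t-test: is there a systematic shift in the group mean (e.g. a learning effect)?
t, p = stats.ttest_rel(trial1, trial2)

# ICC(3,1), consistency form, computed from the two-way ANOVA mean squares
scores = np.column_stack([trial1, trial2])                  # rows = subjects, cols = trials
n, k = scores.shape
grand = scores.mean()
ss_rows = k * np.sum((scores.mean(axis=1) - grand) ** 2)    # between-subjects sum of squares
ss_cols = n * np.sum((scores.mean(axis=0) - grand) ** 2)    # between-trials sum of squares
ss_err = np.sum((scores - grand) ** 2) - ss_rows - ss_cols  # residual
ms_rows = ss_rows / (n - 1)
ms_err = ss_err / ((n - 1) * (k - 1))
icc = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)

print(f"Pearson r = {r:.3f}, r^2 = {r**2:.3f}")
print(f"paired t = {t:.2f} (p = {p:.3f}), ICC(3,1) = {icc:.3f}")
```

The paired t-test flags a systematic change between administrations, while r and the ICC index how consistently subjects keep their relative standing.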
validity, reliability & objectivity
- 2 important qualities of a test are accuracy and consistency
- Want to know not only how to use the tool but also the quality of the data it generates
measurement error
- A measured score = an observed score (X)
- What we really want to know is the true score (T)
- There is always some error in measurement (e)
- Observed score = true score ± measurement error (X = T ± e) (see the SEM sketch below)
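A rough sketch of how the Standard Error of Measurement quantifies that error term. The SD, reliability value, and observed score are hypothetical:

```python
import math

# Hypothetical values: test SD = 5.0 cm, test-retest reliability r_xx = 0.90
sd, r_xx = 5.0, 0.90

# SEM = SD * sqrt(1 - reliability): the typical size of the error term (e)
sem = sd * math.sqrt(1 - r_xx)

# Approximate 95% band around an observed score of X = 42 cm
x = 42.0
low, high = x - 1.96 * sem, x + 1.96 * sem
print(f"SEM = {sem:.2f} cm; true score roughly between {low:.1f} and {high:.1f} cm")
```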
tournament play/sport skills
- Are the more skilled individuals more successful?
- Correlate sport-skills test scores to standings or rankings in competition (see the sketch below)
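A small illustration of this approach, using made-up skills-test scores and tournament standings and Spearman's rho (a rank-order correlation):

```python
import numpy as np
from scipy import stats

# Hypothetical badminton skills-test scores and final tournament standings (1 = best)
skills_score = np.array([78, 65, 90, 55, 82, 70, 60, 88])
standing     = np.array([ 3,  6,  1,  8,  4,  5,  7,  2])

# Rank-order agreement between test performance and competitive success
rho, p = stats.spearmanr(skills_score, standing)
print(f"Spearman rho = {rho:.2f}")  # strongly negative: higher scores go with better (lower) standings
```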
reliability
- Consistency of test scores (same results over many tests)
- Indicates the true ability of the individual
- Tests with high reliability: physical performance tests - throwing, jumping, strength
- Tests with lower reliability: skills tests, agility, subjectively assessed tests
validity
- Degree to which a test measures what it claims to measure
- The most important test quality - allows the data to be interpreted correctly
sources of concern
- Fatigue, motivation, environmental factors
- Raw scores versus ranked scores - rankings may stay the same, but it is the raw scores that are used
- Knowledge tests - if similar questions are used, subjects begin to remember the questions (scores go up due to familiarity)
- Young subjects - changes can occur due to maturation (increases in size, strength, cognitive ability)
construct validity
- How well a test measures a trait or construct (anxiety, motivation, fair play)
- Cardiorespiratory fitness is a construct; how do we assess it?
-- VO2max, heart rate, distance, run time - measures that relate to the construct
- Ability to play a game or sport is a construct
-- We could identify people with high or low ability (by rating)
-- Do skills tests show the difference in ability?
face validity
- It appears the test measures what it purports to measure
- You can see how the test relates to the activity/information
-- 40 yd sprint = running speed (running backs in football)
-- Badminton underhand clear test = ability to perform in a game
-- Agility "T" test = volleyball movements
objectivity
- A specific form of reliability
- Also called inter-observer or inter-rater reliability
- Can 2 or more people administer the same test and obtain similar scores?
- Test-retest with two test administrators - correlation between their scores (determined the same way as reliability; see the sketch below)
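Objectivity can be estimated just like test-retest reliability, except the two score sets come from two testers rather than two days. A quick sketch with made-up stopwatch times:

```python
import numpy as np
from scipy import stats

# Hypothetical agility-test times (s) recorded independently by two testers for the same trials
rater_a = np.array([11.2, 12.5, 10.8, 13.1, 11.9, 12.2])
rater_b = np.array([11.3, 12.4, 10.9, 13.3, 11.8, 12.1])

# Inter-rater (objectivity) coefficient: correlation between the two testers' scores
r, p = stats.pearsonr(rater_a, rater_b)
print(f"inter-rater r = {r:.2f}")
```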
SEMO agility test
- Test objective: to measure agility while moving the body forward, backward, and sideward
- Validity = 0.63 (against the AAHPERD shuttle run test)
- Reliability = 0.88
- Objectivity = 0.97
validity coefficient
- Comparison of a test to another test that purports to measure the same construct
- Correlation coefficient = 0.80 or higher (ideally; see the example below)
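For example, a concurrent validity coefficient for the 12-min run against laboratory-measured VO2max could be checked against the 0.80 guideline. The data here are hypothetical:

```python
import numpy as np
from scipy import stats

# Hypothetical data: 12-min run distance (m) and lab-measured VO2max (ml/kg/min)
run_distance = np.array([2400, 2800, 2100, 3000, 2600, 2300, 2900, 2500])
vo2max_lab   = np.array([42.0, 50.5, 37.8, 55.2, 47.1, 40.3, 52.0, 45.5])

# Validity coefficient: correlation between the field test and the criterion
r, p = stats.pearsonr(run_distance, vo2max_lab)
print(f"validity coefficient r = {r:.2f} -> {'acceptable' if r >= 0.80 else 'questionable'}")
```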
parallel forms method
- Compare 2 different forms of the same test (applies more to knowledge tests)
- Pearson product-moment correlation or ICC (ANOVA)
factors affecting validity
1. Characteristics of the test takers - should be similar to the group the test was validated on (sex, age, skill, strength, experience)
2. Criterion measure selected (its accuracy)
3. Reliability - if a test is not reliable (repeatable), it is difficult to determine whether it is valid
4. Administrative procedures - clear directions, test done in the same way, similar environmental factors (temperature, humidity), motivation
factors affecting reliability and objectivity (7)
1. Clear instructions are given for administering/scoring the test
2. Trained testers (test given the same way)
3. Measurement procedures are followed (avoid mistakes)
4. Appropriate tools of measurement are used (inter-observer - same tools)
5. Heterogeneity of the sample = higher reliability
-- Homogeneous samples - the correlation between two sets of scores may be lower
6. Length of the test - generally, the longer the test, the greater its reliability
7. Procedures:
-- Subjects follow the directions given
-- Subjects are prepared, ready, and motivated
-- Environmental factors are favourable for subjects (noise level, lighting, temperature, etc.)
things to consider (7)
1. Valid and reliable
2. Cost - does it require expensive equipment?
3. Time - how long does it take?
4. Ease of administration - more than one person needed? Training required? Instructions easy to follow? Are the demands placed on the subjects reasonable?
5. Scoring - qualitative or quantitative? Easy or hard?
6. Norms - are normative or comparison data available?
7. Sport skills - should resemble what is required in the game situation
Expert Ratings/Rankings
Have a group of experts rate the subjects and correlate these ratings to other measures (e.g. skills tests)
criterion-based validity
How well a test correlates with a criterion measure
a) Predictive validity - predicts future performance
-- SAT exams for college/university entrance in the USA relate to success in university (see the prediction sketch below)
b) Concurrent validity - results of one test are compared to the results of a "gold standard"
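A sketch of predictive validity with hypothetical admission-test scores and later first-year grades: a regression lets the test score predict the future criterion, and r serves as the validity coefficient:

```python
import numpy as np
from scipy import stats

# Hypothetical data: admission-test score and first-year university GPA
test_score     = np.array([1050, 1200, 980, 1400, 1150, 1300, 1100, 1250], dtype=float)
first_year_gpa = np.array([2.6, 3.1, 2.4, 3.8, 2.9, 3.4, 2.8, 3.2])

# Regress the future criterion on the test; rvalue is the (predictive) validity coefficient
fit = stats.linregress(test_score, first_year_gpa)
predicted = fit.intercept + fit.slope * 1225   # predicted GPA for a new applicant scoring 1225
print(f"validity coefficient r = {fit.rvalue:.2f}, predicted GPA = {predicted:.2f}")
```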
content validity
How well a test measures the skills, abilities, or information that has been presented to the test takers