PSYC 368 - Tests and Measurements Exam 1
What is the range of a correlation coefficient (i.e. what would the number look like)? What else is this called?
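-1.0 to +1.0. The closer the number is to -1.0 or +1.0, the stronger the relationship between the two sets of scores; values near 0 mean little or no relationship. When it is used to describe a test's consistency, it is also called a reliability coefficient.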
Charles Darwin
-Author of On the Origin of Species; stressed the importance of what he called "descent with modification" (evolution). -Theorized that, through the process of variation, certain traits and attributes are selected and passed down from generation to generation as organisms adapt.
Francis Galton
-First person to devise a set of tools for *assessing individual differences in his lab* -Thought that male pattern baldness was the result of furnace-like brain activity that singed off the hair. -"Wherever you can, count"
Alfred Binet
-Worked at the request of the Minister of Public Instruction in Paris -Applied new tools to the assessment of Parisian schoolchildren who were not performing well -(With Theodore Simon) used tests of intelligence in a variety of settings and for purposes beyond just evaluating schoolchildren's abilities -1916: Lewis Terman at Stanford extended Binet and Simon's work (the Stanford-Binet).
Robert Yerkes
-WW1 -Psychologist from Harvard -Convinced the government and Army that 1.75 million recruits should be tested to classify and assign them
James Cattell
-Worked on the first "mental test" -Founder of Psychological Corporation (1920s)
Be able to interpret a reliability coefficient for an IQ test and a personality test.
Reliability coefficients range from 0.0 to 1.0. Rough guide: .5 or less = not reliable; .6 = weak; .7 = reasonably reliable (acceptable for personality research, since mood and attitudes change); .8 = good, strong reliability; .9 = excellent, very reliable (the level usually expected of IQ tests); above .9 = excellent, though potentially overly reliable.
Draw a normal distribution. Know what percentage typically falls within 1, 2, and 3 standard deviations from the mean.
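Percentage of scores in each band on one side of the mean: mean to 1 SD = 34.13%; 1 to 2 SD = 13.59%; 2 to 3 SD = 2.14%. Adding both sides together: about 68% of scores fall within ±1 SD, about 95% within ±2 SD, and about 99.7% within ±3 SD.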
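To check these percentages, the sketch below computes areas under a standard normal curve using only Python's standard-library `math.erf`; the function names `normal_cdf` and `band_percent` are made up for this example.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Proportion of a standard normal distribution falling below z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def band_percent(lower: float, upper: float) -> float:
    """Percentage of scores falling between two z values."""
    return 100.0 * (normal_cdf(upper) - normal_cdf(lower))


# Percentage in each band on ONE side of the mean
for k in (1, 2, 3):
    print(f"{k - 1} to {k} SD: {band_percent(k - 1, k):.2f}%")
# 0 to 1 SD: 34.13%
# 1 to 2 SD: 13.59%
# 2 to 3 SD: 2.14%

# Percentage WITHIN +/- k SD of the mean (both sides combined)
for k in (1, 2, 3):
    print(f"within +/-{k} SD: {band_percent(-k, k):.2f}%")
# within +/-1 SD: 68.27%
# within +/-2 SD: 95.45%
# within +/-3 SD: 99.73%
```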
What is a standardization sample? Why is it important?
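A large sample of test takers who represent the population for which the test is intended (also known as a norm group). It is important because an individual's score is interpreted by comparing it with the scores of this group; if the sample does not represent the intended population, those comparisons are misleading.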
What is the relationship of reliability and validity?
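A test can be reliable but not valid, but it cannot be valid unless it is reliable. Reliability is necessary, but not sufficient, for validity.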
what kind of test is the SAT
Aptitude Test
what kind of test is the Strong Interest Inventory
Vocational or career interest test (it measures interests, not aptitude)
What is the Spearman-Brown Correlation Formula? Why would you use this?
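Used with split-half reliability. Correlating the two halves of a test underestimates the reliability of the full-length test, because each half is only half as long. The formula corrects this lowered estimate by predicting what the reliability would be if the test were not split in half: r = 2r_half / (1 + r_half).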
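A minimal sketch of the general Spearman-Brown formula, r_new = k*r / (1 + (k - 1)*r), where k is how many times longer the new test is than the one that produced r; k = 2 corresponds to the split-half case. The function name and the half-test correlation of .60 are hypothetical.

```python
def spearman_brown(r: float, k: float = 2.0) -> float:
    """Predicted reliability when a test is lengthened by a factor of k.

    With k = 2 this is the split-half correction: 2r / (1 + r).
    """
    return (k * r) / (1.0 + (k - 1.0) * r)


half_r = 0.60  # hypothetical correlation between the two halves
print(round(spearman_brown(half_r), 2))       # 0.75 for the full-length test
print(round(spearman_brown(half_r, k=3), 2))  # 0.82 if the test were three times longer
```

Plugging in larger values of k shows the same idea as adding items to a test: predicted reliability rises.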
How would you explain the process of testing construct validity? What is the time frame usually like?
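Correlate scores on the test with scores on other measures: they should correlate with tests of the same or related constructs (convergent evidence) and not with tests of unrelated constructs (discriminant evidence). Because this evidence accumulates across many studies, establishing construct validity usually takes a long time.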
How do you compute test-retest reliability?
Correlate scores from one test taken at two times. The interval should be long enough to reduce memory effects, but not so long that real changes could have occurred.
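Test-retest reliability is a Pearson correlation between the two administrations. A minimal Python sketch, with made-up scores for five hypothetical test takers:

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between paired scores x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations, and each variable's sum of squares
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical scores for the same five people tested twice
time_1 = [12, 15, 11, 18, 20]
time_2 = [13, 14, 12, 17, 21]
print(round(pearson_r(time_1, time_2), 2))  # 0.96
```

The same function would serve for parallel-forms reliability by passing scores from Form A and Form B instead of Time 1 and Time 2.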
Inferential Statistics
Deals with the estimation of population characteristics using a sample (generalization) ex: election night results
what kind of test is the Woodworth Personal Data Sheet
First personality test, developed to screen Army recruits for shell shock
How can you usually increase reliability of a test?
Increase the number of items on the test.
what kind of test are the Wechsler Scales
Intelligence
What type of reliability would you typically use with a Likert scale?
Internal Consistency
what is an Achievement Test
Measures level of knowledge in a particular domain. ex - Norris Educational Achievement Tests
what is an Aptitude Test
Measures potential to succeed ex - Scholastic aptitude scale, evaluation aptitude test
what is an Intelligence Test
Measures skill or competence ex - Cognitive abilities test, Stanford-Binet Intelligence Test
what is a Personality Test
Measures unique and stable set of characteristics, traits, or attitudes ex - Aggression questionnaire, achievement/motivation profile.
what is the first letter in NOIR?
Nominal: Variables are labels; numbers are assigned to observations only to classify them. Always qualitative (differences in quality, not quantity) and mutually exclusive (gender, eye color, handedness). From the Latin nomen ("name"): naming variables.
What is the 2nd letter in NOIR?
Ordinal: Implies intensity or severity; assigns numbers to observations in a sequence from lesser to greater, so variables can be ordered/ranked (the distances between ranks are not necessarily equal). Ex: college football top 20, letter grades, Likert items, IQ test scores
what kind of test are the Rorschach Inkblots
Personality Test
what kind of test is the MMPI
Personality Test
What is the 4th letter in NOIR?
Ratio: Assigns numbers to observations that reflect quantity with reference to *absolute zero*. Ex: height, weight, speed of travel, number of yards gained by a running back
Descriptive Statistics
Reduces a mass of data to one or two easily understood values (measures of central tendency, correlation, regression).
What are the four main types of reliability discussed in your text?
Test-retest, parallel forms, internal consistency, and interrater reliability
Army Beta Tests
Tested illiterate and non-English speaking recruits in the Army
Army Alpha Tests
Tested literate recruits in the Army (a written, verbal test)
What is a raw score and how is it different from a percentile or standard score?
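The *original, untransformed score* (e.g., the number of items answered correctly), before any operation is performed on it. Raw scores form the basis for other scores such as percentiles and standard scores; those transformed scores are what give a raw score meaning by placing it relative to other test takers.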
Median
The midpoint in a set of scores: 50% of the scores fall below it and 50% above it. It is not much affected by extreme scores, so use it when extreme scores would distort the mean.
What is a z-score and why is it so cool?
The most frequently used standard score. It expresses how many standard deviations a raw score lies above or below the mean: z = (X - mean) / SD. z-scores are easy to compute and understand, and they make scores from different distributions comparable, regardless of each distribution's mean, standard deviation, and raw-score units.
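A minimal sketch of z = (raw score - mean) / standard deviation, comparing two hypothetical scores from tests with different means and SDs (the function name is also hypothetical):

```python
def z_score(raw: float, mean: float, sd: float) -> float:
    """Number of standard deviations `raw` lies above (+) or below (-) the mean."""
    return (raw - mean) / sd


# Hypothetical: 85 on a test with mean 70, SD 10 vs. 60 on a test with mean 50, SD 4
print(z_score(85, 70, 10))  # 1.5
print(z_score(60, 50, 4))   # 2.5
```

Even though 85 is the higher raw score, the 60 sits further above its own mean (z = 2.5 vs. 1.5), which is exactly the kind of comparison z-scores make possible.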
What is a percentile rank?
The point in a distribution of scores below which a given percentage of scores fall. It is the most common score for reporting test results. It is a location along a continuum from 0 to 99; it is NOT the percentage of items answered correctly.
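A minimal sketch that follows the definition above: count the scores falling below a given score and express that count as a percentage of all scores. The class scores and function name are hypothetical, and some textbooks use a slightly different formula that also counts half of any tied scores.

```python
def percentile_rank(raw: float, scores: list[float]) -> float:
    """Percentage of scores in the distribution that fall below `raw`."""
    below = sum(1 for s in scores if s < raw)
    return 100.0 * below / len(scores)


scores = [55, 60, 62, 70, 71, 75, 80, 84, 90, 95]  # hypothetical class scores
print(percentile_rank(80, scores))  # 60.0 -> 6 of the 10 scores fall below 80
```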
Mean
The sum of all of the values in a group, divided by the number of values in that group. It is sensitive to extreme scores, which can make it less representative of the set of scores.
Mode
The value that occurs most frequently. Use when data is categorical and values can only fit into one class. (gender, eye color, etc.)
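Python's standard `statistics` module computes all three measures of central tendency. The hypothetical scores below include one extreme low score to show why the median is preferred when extremes are present.

```python
from statistics import mean, median, mode

scores = [10, 70, 72, 75, 75, 78, 80]  # hypothetical; 10 is an extreme score
print(round(mean(scores), 1))  # 65.7 -> pulled down by the extreme 10
print(median(scores))          # 75   -> the middle score, unaffected by the 10
print(mode(scores))            # 75   -> the most frequent score

# The mode also works for categorical (nominal) data
eye_colors = ["brown", "blue", "brown", "green", "brown"]
print(mode(eye_colors))  # brown
```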
Be prepared to reflect on the idea that: "our understanding of behavior is only as good as the tools we use to measure it."
There are all kinds of ways that we try to measure outcomes. Sometimes we use the very best instruments available, and at other times we may just use what's convenient. The best takes more time, work, and money, but it gives us accurate and reliable results. Anything short of the best forces us to compromise, and what you see may, indeed, not be what you get.
Why are norms useful?
They allow us to compare outcomes with others in the same test-taker group.
Qualitative Data
Variables that represent an attribute and can be assigned to a unique category (political affiliation or class standing) - usually nominal
Quantitative Data
Variables whose values are determined by counting or numerical measurements (number of children in a family, weight)
Test-Retest Reliability
What: Looks at the stability of scores *over time.* How: *Correlate scores* from one test taken at two times (interval of time should be long enough to reduce memory effects, but not so long that real changes could have occurred).
What is face validity? Is it a "valid" type of validity?
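When a test appears to be valid, i.e., it makes intuitive or common sense that it measures what it claims to measure. It is not considered a true, technical form of validity, because it rests on appearance rather than evidence.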
What is reliability?
Whether a test or measurement tool measures something consistently. If we weigh the same object repeatedly, a reliable scale should give the same result every time; likewise, when measuring physiological traits, we want the measurements to be consistent.
What is the difference between a z-score and a t-score?
Z-Score: mean of 0 and standard deviation of 1; T-Score: mean of 50 and standard deviation of 10
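Because a T-score is a rescaled z-score, the conversion is T = 50 + 10z. A minimal sketch (the function name is hypothetical):

```python
def t_score(z: float) -> float:
    """Convert a z-score (mean 0, SD 1) to a T-score (mean 50, SD 10)."""
    return 50.0 + 10.0 * z


print(t_score(0.0))   # 50.0 -> exactly at the mean
print(t_score(1.5))   # 65.0 -> 1.5 SDs above the mean
print(t_score(-2.0))  # 30.0 -> 2 SDs below the mean
```

T-scores avoid the negative numbers and decimals that z-scores often produce, while carrying the same information.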
Construct Validity
The extent to which a test measures a construct: a *theoretical, intangible trait in which individuals differ* (hostility, depression). ex - GPA is related to intelligence, but does not explain it.
What is the primary danger in using a test that is not reliable?
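Scores can change from one administration to the next for reasons unrelated to the trait being measured, so they may reflect error rather than true differences, and decisions based on them cannot be trusted.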
Content Validity
degree to which *items on a test are representative of the universe of behavior the test was designed to sample* Q: does a collection of items on the test fairly represent all the possible questions that could be asked?
Criterion-Related Validity
demonstrated when a test is shown to be effective in *estimating an examinee's performance on some outcome measure* ex - chapel attendance predicting religiosity
What is the 3rd letter in NOIR?
Interval: Assigns numbers to observations that reflect a constant unit length between units of measurement (equal intervals), but there is no absolute zero. Ex: temperature in degrees Fahrenheit or Celsius
Concurrent Validity
criterion measures are obtained at approximately the same time as the test scores ex - diagnostic clinical tests, achievement tests
Predictive Validity
criterion measures are obtained in the future; most often used with entrance exams ex - GRE scores predicting later performance in graduate school
According to your text, what are five main purposes of testing?
selection, placement, diagnosis, hypothesis testing, classification
What is validity?
the extent to which a test measures what it claims to measure (if so, scores have meaning)
Parallel Form Reliability
when: used when there is more than one form of test how: correlation between two test scores why: reduces the possibility of cheating and memory/practice effects
Internal Consistency Reliability
when: you want to know if the items on a test assess one, and only one, dimension how: correlate each individual item score with the total score
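A minimal sketch of the item-total approach described above, using hypothetical responses from five people on three Likert items. This simple version includes each item in its own total, which slightly inflates the correlations; a "corrected" item-total correlation leaves the item out of the total. The function names are hypothetical.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def item_total_correlations(responses: list[list[float]]) -> list[float]:
    """Correlate each item (column) with every respondent's total score."""
    totals = [sum(row) for row in responses]
    n_items = len(responses[0])
    return [pearson_r([row[i] for row in responses], totals) for i in range(n_items)]


# Hypothetical Likert responses (1-5): one row per person, one column per item
responses = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [1, 2, 1],
    [3, 3, 2],
]

for item, r in enumerate(item_total_correlations(responses), start=1):
    print(f"item {item}: r = {r:.2f}")
```

High item-total correlations for every item suggest the items are measuring the same single dimension; an item with a low or negative value would be a candidate for revision.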
Interrater Reliability
when: you want to know whether there is consistency in the rating of some outcome how: examine agreement between raters (judges)
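One simple way to examine agreement between two judges is percent agreement. A minimal sketch with hypothetical yes/no ratings and a hypothetical function name; it does not correct for agreement expected by chance (indexes such as Cohen's kappa do).

```python
def percent_agreement(rater_a: list[str], rater_b: list[str]) -> float:
    """Percentage of cases on which two raters gave the same rating."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must rate the same non-empty set of cases")
    agreements = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return 100.0 * agreements / len(rater_a)


# Hypothetical ratings of the same 10 cases by two judges
judge_1 = ["yes", "no", "yes", "yes", "no", "yes", "no", "yes", "yes", "no"]
judge_2 = ["yes", "no", "no", "yes", "no", "yes", "no", "yes", "yes", "yes"]
print(percent_agreement(judge_1, judge_2))  # 80.0 -> agree on 8 of 10 cases
```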
What two things impact error in classical test score theory?
*Trait error*: focused on the individual. *Method error*: focused on the situation. We assume that error is randomly distributed.