Psych 440 Midterm 1
Culture
"The socially transmitted behavior patterns, beliefs, and products of work of a particular population, community, or group of people" History suggests cultural bias in testing can have an adverse impact: - immigration restrictions - forced sterilization
Psychometrics
'Measuring the mind'. The fundamental goal of psychological measurement is to *predict behavior*.
Origins of testing
- Chinese civil service exams were initiated in the Chang Dynasty, over 3,000 years ago - 1859: Darwin's On the Origin of Species raised the issue of individual differences + provides theoretical basis for animal models in medical and psychological testing - Wilhelm Wundt was a German medical doctor who studied how individuals were similar instead of different (Leipzig School) + described human abilities with respect to reaction time, perception and attention span
Professional ethical standards
- APA 1895 - formed committee on mental measurement - APA 1954 - published Technical Recommendations for Psychological Tests and Diagnostic Techniques - Collaboration of APA and other organizations (AERA) has led to publication of sound practices in the field of testing and assessment
Age and grade norms
- Average performance of test-takers at various age/grades - Scores do not present equal units of measurement - Scores often used as evaluative standards - Not effective with very young or adult test-takers
Culture and testing
- Many early tests had NO minority individuals in standardization samples - Items culturally grounded in the dominant American culture: + "who was the first person to discover America?" - Translation problems: no corresponding object/word, changes in meaning
Comparison of NRT and CRT
- NRT= covers a *broad* content domain - CRT= focuses on a more *specific* content domain - NRT= emphasizes *discrimination* among individuals - CRT= emphasizes *description* of what individuals can and cannot do - NRT= interpretation requires a clearly defined comparison group - CRT= interpretation requires a clearly defined standard or criterion level of performance - NRT= scores are usually reported in terms of standard scores or percentile ranks - CRT= scores are usually reported in absolute terms
Nominal scale
- Nominal (or naming) level - Lowest level of measurement - Ordering is not important, only the label attached to designate a mutually exclusive and exhaustive category Examples: - medical diagnoses, gender, political party affiliation
Error and prediction
- Standard Error of the Estimate (SE) - Indicates magnitude of errors in estimation - Higher correlations produce smaller SE - Lower correlations produce larger SE
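The link between correlation size and SE can be sketched numerically. The card gives no formula, so the block below assumes one common textbook form, SE = SD_y × √(1 - r²); the function name and the criterion SD of 10 are hypothetical.

```python
import math

def standard_error_of_estimate(sd_y: float, r: float) -> float:
    """SE of the estimate under the assumed formula SD_y * sqrt(1 - r^2)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and 1")
    return sd_y * math.sqrt(1.0 - r ** 2)

# Hypothetical criterion (Y) with an SD of 10 points
se_strong = standard_error_of_estimate(10.0, 0.8)  # higher correlation
se_weak = standard_error_of_estimate(10.0, 0.3)    # lower correlation

print(f"r = .8 -> SE = {se_strong:.2f}")
print(f"r = .3 -> SE = {se_weak:.2f}")
```

Under this assumed formula the stronger correlation yields the smaller SE, matching the last two bullets of the card.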
Who's involved in assessment?
- Test developers - Test users - Test taker - Society at large
Techniques of psychological assessment
- Tests - Interviews - Case history data - Behavioral observation - Role-playing - Computer-based instruments - Other techniques
Testing in the U.S.
- U.S. military developed Army Alpha & Beta during WWI (Yerkes & Brigham) + used to identify intellectual abilities of recruits and personality risk factors for "shell shock" - In 1939 the Wechsler-Bellevue Intelligence Scale (now WAIS) developed for adults + later other versions were developed for use with preschool (WPPSI) and school age children (WISC)
Rights a test-taker has
-Informed consent -Informed of test findings -Privacy (confidentiality) -Least stigmatizing label
Assessment assumptions
1) Psychological states or traits exist, and can be quantified and measured. 2) Different approaches to measuring aspects of the same thing can be useful. 3) Various sources of error are part of the assessment process. 4) Test-related behavior can predict behavior in other settings. 5) Present-day behaviors can predict future behaviors.
Z-scores
A standard score where the mean of the scores is set at zero (0) and standard deviations are set at intervals of one (1).
Normative sample
A group of people whose performance on a particular test is analyzed for reference in evaluating the performance of individual test-takers. Sample must be representative or typical of the intended population of interest. Inadequate norms make it difficult to make proper interpretations.
Interpreting percentiles
A percentile difference of 10 near the middle of the group often represents a smaller difference in performance than a difference of 10 near the tails. In terms of skills, a difference of a few percentile points near the tails means more change has taken place than the same size difference near the middle of the group.
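One way to see this, assuming scores are normally distributed (an assumption the card does not state), is to convert percentiles back to z-scores with Python's standard-library `statistics.NormalDist` and compare the gaps. The helper name `z_gap` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1; normality is assumed here

def z_gap(lower_pct: float, upper_pct: float) -> float:
    """Distance in SD units between two percentiles of a normal curve."""
    return std_normal.inv_cdf(upper_pct / 100) - std_normal.inv_cdf(lower_pct / 100)

middle_gap = z_gap(50, 60)  # 10 percentile points near the middle
tail_gap = z_gap(89, 99)    # 10 percentile points near the upper tail

print(f"50th to 60th percentile: {middle_gap:.2f} SD")
print(f"89th to 99th percentile: {tail_gap:.2f} SD")
```

Under the normality assumption, the same 10-point percentile gap covers roughly four times as many SD units near the tail as near the middle.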
Standard scores
A raw score that has been converted from one scale to a new (standardized) scale with a prescribed mean and SD. Typically expressed in terms of number of SDs from the mean - all standard scores have equal unit sizes across the distribution
Percentiles
A raw score that has been converted into the percentage of a distribution that falls below that particular raw score. - widely used in test manuals as well as other literature on commercially published standardized tests
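The conversion described above can be sketched directly from the definition. This version counts only scores strictly below the raw score; some texts also count half of the tied scores, so treat the tie handling as an assumption. The function name and score list are hypothetical.

```python
def percentile_rank(scores: list[float], raw: float) -> float:
    """Percentage of scores in the distribution that fall below `raw`."""
    if not scores:
        raise ValueError("scores must not be empty")
    below = sum(1 for s in scores if s < raw)
    return 100.0 * below / len(scores)

# Hypothetical distribution of 11 test scores
scores = [2, 5, 7, 7, 8, 8, 10, 12, 15, 17, 20]
rank_of_10 = percentile_rank(scores, 10)  # 6 of the 11 scores are below 10

print(f"A raw score of 10 is at about the {rank_of_10:.0f}th percentile")
```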
Convenience samples
A sample that is convenient or available for use.
Scale
A set of numbers whose properties model empirical properties of the variables to which the numbers are assigned. + discrete + continuous
Correlation
A statistical technique which allows us to make inferences about how two (or more) variables are related (co-relate) to each other (linearly). Expressed using a correlation coefficient: - statement about the direction of a relation - statement about the strength of the relation
The normal distribution
A symmetrical, mathematically defined frequency distribution curve. - highest at the center (most frequent scores are at the mean) and tapering on both sides Asymptotic towards the abscissa + there is never a zero point or a 100 point, measures must be between 1-99 Mean, median, and mode are equal Area under the curve is divided in terms of SD units and can aid in the interpretation of test scores.
Coefficient of determination
Accurate interpretation of correlation coefficients requires another statistic, the coefficient of determination. - calculated by squaring r (giving r²) The coefficient of determination tells how much variance in one variable is accounted for by the variance in the other.
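As a small worked example, the squaring step looks like this; the function name is hypothetical and the r values are illustrative, not data from the course.

```python
def coefficient_of_determination(r: float) -> float:
    """Proportion of variance shared by two variables, given Pearson's r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and 1")
    return r ** 2

r2_positive = coefficient_of_determination(0.7)
r2_negative = coefficient_of_determination(-0.7)

print(f"r =  .7 -> r^2 = {r2_positive:.2f}")
print(f"r = -.7 -> r^2 = {r2_negative:.2f}")
```

Because the sign disappears when squaring, a correlation of .7 and one of -.7 account for the same share of variance (about 49%).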
Percentile pros and cons
Advantages: - can be used to interpret performance in terms of various groups and are easily understood Disadvantages: - percentiles are an ordinal scale - differences between individuals near the middle are magnified and differences at the extremes are compressed
Z-scores: pros and cons
Advantages: - indicates each person's standing as compared to the group mean - can easily be converted to percentiles Disadvantages: - negative z values can be difficult to work with and explain - dealing with fractional z values can be a hassle
Purposive samples
Arbitrarily selecting a sample because it is believed to represent some population.
Interval scale statistics
Because of equal intervals between values, some mathematical operations are meaningful: - addition and subtraction - multiplication and division are not appropriate because there is no true zero - statistical tests based on mean scores and/or variance
For every one unit increase in IV there's a one unit increase in..
Beta
Classification
Define when objects fall into the same or different categories with regards to an attribute. Examples: • Types of objects, college majors, sex, personality types
Discrete scale
Categorical labels or integers, no meaningful middle grounds between categories.
Variable
Characteristics or attributes of objects (people, places, things, animals, etc.) in a population *that are not constant*.
Alfred Binet
Commissioned by France's education system to help identify 'subnormal' children. - *developed first intelligence test* in 1905 with Theodore Simon - mental age proposed as criterion for evaluation - test revised by Lewis Terman at Stanford, current revisions still widely used
What is important to remember about correlation and causation?
Correlation ≠ causation!!
Test users
Counselors, other therapists, teachers, human resources, researchers.
Asymptotic
The curve of the normal distribution approaches the x-axis but never touches it.
Scales and descriptive statistics
Data must be measured on an interval or a ratio scale for the computation of means and other parametric statistics to be valid. Therefore, if data are measured on an ordinal scale, the median but not the mean can serve as a measure of central tendency.
Kurtosis
Describes the steepness of a distribution in its center. *Platykurtic*: flat *Leptokurtic*: peaked *Mesokurtic*: somewhere in between
Error
Deviation of some measurement from the true standing of an individual on some characteristic. Many sources of error: - effects of the environment - precision of the measurement device - confounding variables Error influences estimates of both central tendency and variability.
Psychological assessment assumption #2
Different approaches to measuring aspects of the same thing can be useful. - because we are making inferences it is better to have convergent evidence
Skewness
Distributions can be characterized by the extent to which they are asymmetrical or "skewed". *Positive skew*: only a few extremely high scores and many low scores *Negative skew*: only a few extremely low scores and many high scores
Quartiles
Dividing points between the four quarters of a distribution of test scores. *Interquartile range* is equal to the difference between Q3 and Q1 + the relative distance of Q1 and Q3 from the median (Q2) gives an indication of skewness of the distribution *Semi-interquartile range* equals the interquartile range divided by 2.
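A minimal sketch of these quantities using the standard-library `statistics.quantiles`. Quartile conventions differ between textbooks and software, so the exact cut points depend on the method (Python's default "exclusive" method is used here); the score list is illustrative.

```python
from statistics import median, quantiles

# Illustrative distribution of 11 test scores
scores = [2, 5, 7, 7, 8, 8, 10, 12, 15, 17, 20]

# n=4 returns the three dividing points Q1, Q2 (the median), and Q3
q1, q2, q3 = quantiles(scores, n=4)

iqr = q3 - q1        # interquartile range
semi_iqr = iqr / 2   # semi-interquartile range

# Comparing each quartile's distance from the median hints at skewness
upper_distance = q3 - q2
lower_distance = q2 - q1

print(f"Q1={q1}, Q2={q2}, Q3={q3}, IQR={iqr}, semi-IQR={semi_iqr}")
print(f"Q3 - Q2 = {upper_distance}, Q2 - Q1 = {lower_distance}")
```

Here Q3 sits much farther from the median than Q1 does, which suggests a positive skew in this small example.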
Validity
Does the test measure effectively what it purports to measure?
Reliability
Does the test produce consistent measurement results?
Random samples
Each individual from the population has an equal chance of being included in the sample.
James McKeen Cattell
First American to systematically study assessment of *individual differences*. - a student of Wundt, but more influenced by Galton's methods - studied differences in reaction time - *coined the term 'mental test'* - named his daughter "Psyche"
Psyche
Greek word for 'the mind'.
Nominal scale statistics
If numbers are assigned, they cannot be meaningfully manipulated mathematically. Appropriate arithmetic operations: - counting - proportions - percentages - chi-square tests
Z-scores and the normal distribution
If we have a normal distribution, we can make the following assumptions: - approx 68% of the scores are between a z-score of 1 and -1 - approx 95% of the scores will be between a z-score of 2 and -2 - approx 99.7% of the scores will be between a z-score of 3 and -3
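These approximate percentages can be checked with the standard-library `statistics.NormalDist`; the helper name `area_within` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)  # the standard normal (z) distribution

def area_within(k: float) -> float:
    """Proportion of a normal distribution lying within +/- k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

within_1 = area_within(1)
within_2 = area_within(2)
within_3 = area_within(3)

for k, area in ((1, within_1), (2, within_2), (3, within_3)):
    print(f"between z = -{k} and z = {k}: {area:.1%}")
```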
Ratio scale
Includes ordering, equal intervals AND an absolute zero. - all mathematical operations can be meaningfully performed Examples: - length and weight, Kelvin scale
Ordinal scale
Individuals or things are ranked or ordered on the basis of some criteria; intervals between ranks are not consistent. Examples: - grade level, ranking from shortest to tallest, movie sequels
Best portrayer of data if there is an outlier
Median
Norm-referenced (NRT) test interpretation
Interpretation is based on an individual's *relative standing* in some known group.
Criterion-referenced (CRT) test interpretation
Interpretation is based on measuring an individual's skill level in relation to a clearly specified standard (i.e., criterion).
Limitations
It is important to understand that normed scores do not represent standards or goals to be achieved by students. + norms simply describe typical or normal performance Criterion-referenced scores may have little or no application at the upper end of the knowledge/skill continuum. + more difficult to make proper comparisons between test takers
Metric
Latin word for 'measurement'.
Deviation scores
Measure of how far the raw score is from the mean of its distribution (X - μ).
Variability
Measures are used to describe how much fluctuation in scores there is in a sample of observations. - needed to interpret a person's score
Central tendency
Measures are used to describe the typical response seen in a sample of observations. - needed to interpret a person's score
Tarasoff vs. Univ. of California
Mental health professionals have a duty to warn people who may be in danger because of their patient.
Inferential statistics
Methods for making inferences about a population of objects based on information from a sample from that population. Examples: - chi-square test of association, t-test and ANOVA, correlation and regression
Measures of central tendency
Mode (most frequently observed) Mean (average score) Median (50th percentile score)
Mode
Most frequently observed score. - only measure of central tendency that can be used with Nominal data. Examples: 3,4,4,5,5,5,6,8 = 5 3,4,4,4,5,5,5,8 = 4 and 5
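The two examples above can be reproduced with the standard-library `statistics.multimode`, which returns every value tied for most frequent. The party labels in the last call are hypothetical nominal data.

```python
from statistics import multimode

modes_single = multimode([3, 4, 4, 5, 5, 5, 6, 8])  # one mode
modes_tied = multimode([3, 4, 4, 4, 5, 5, 5, 8])    # two values tie for most frequent

# The mode also works on nominal labels, where a mean or median is meaningless
modes_nominal = multimode(["Dem", "Rep", "Ind", "Rep"])

print(modes_single, modes_tied, modes_nominal)
```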
Advantages of standardized measurements
• Objectivity • Quantification • Communication • Economy • Scientific generalizability
Variance and SD
Reflect the variability of scores about the mean of the group. Variance is the average of the squared deviations of each score from the mean. The SD is the square root of the variance. + is expressed in the same units of measurement as the original scores
National and anchor norms
National norms are derived from 'representative' samples of a country. + often developed using stratified sampling methods Anchor norms indicate how test scores for a measure compare to the norms for other measures of the same construct. + calculated using percentile scores
Interval scale
Numbering includes order, and intervals between successive levels represent equal differences; *no absolute zero point in the scale*. Examples: - Fahrenheit scale, intelligence test scores
Continuous scale
Numbers do not represent categories, middle ground between units possible.
Correlation coefficient
Pearson's r
Types of norms
Percentiles Age Grade National Anchor Subgroup
Median
Point which divides the group in half so that 50% of scores fall above it and 50% fall below it. - better measure of central tendency than the mean when the data are skewed because it is unaffected by extreme scores Examples: 2,3,4,5,6 = median is 4 2,3,4,5,6,7,8 = median is 5
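A short sketch of why the median resists extreme scores, using the standard-library `statistics` module. The first list is the card's own example; the outlier version and the even-length list are hypothetical additions.

```python
from statistics import mean, median

scores = [2, 3, 4, 5, 6]         # first example from the card
with_outlier = [2, 3, 4, 5, 60]  # hypothetical: top score replaced by an outlier

median_plain = median(scores)
median_outlier = median(with_outlier)  # unchanged by the extreme score
mean_plain = mean(scores)
mean_outlier = mean(with_outlier)      # pulled upward by the extreme score

# With an even number of scores, the median is the midpoint of the two middle values
even_median = median([2, 3, 4, 5])

print(median_plain, median_outlier, mean_plain, mean_outlier, even_median)
```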
Predictors of risk taking behavior
Positive predictors: - confidence - risk propensity - sensation seeking - gender (M>F) - extraversion Negative predictors: - age - social desirability - neuroticism - risk assessment
Prediction
Predicting values of one variable based on knowledge of scores on other variables is a practical use of correlation. Examples: - predicting job performance from aptitude test scores However, prediction technique must take into account both the scales of measurement and the correlation between the two variables.
Psychological assessment assumption #5
Present-day behaviors can predict future behaviors. - or there would be no point to testing
Descriptive statistics
Procedures for organizing, summarizing, and describing quantitative information. + academic performance can be described using descriptive statistics Examples: - batting average, census data, horsepower
Psychological assessment assumption #1
Psychological states or traits exist, and can be quantified and measured. - if these things are not real, then what's the point of trying to measure them? - if we can't quantify and measure them then psychologists are screwed
Test developers
Psychologists required to adhere to ethical standards (APA, AERA).
Federal testing legislation
Public interest in educational testing sparked by Sputnik (1957). - National Defense Education Act (1958) provided money for aptitude testing in attempt to identify gifted children - Increased use of tests led to concerns about value and effect of psychological testing on students
T-scores
Represent one transformation of z which overcomes the disadvantage of working with negative scores. T-score = (z-score × 10) + 50 - T-score mean = 50 - T-score SD = 10
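The transformation is simple enough to sketch in a few lines; the function name and the example z values are illustrative.

```python
def t_score(z: float) -> float:
    """Convert a z-score to a T-score (mean 50, SD 10)."""
    return z * 10 + 50

t_at_mean = t_score(0.0)     # a score at the mean
t_below_mean = t_score(-1.5) # a negative z becomes a positive T-score
t_above_mean = t_score(2.0)

print(t_at_mean, t_below_mean, t_above_mean)
```

A z of -1.5 becomes a T-score of 35, so the person's standing is preserved without a negative number.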
Scaling
Represent quantity of an attribute numerically. • Also used to measure psychological characteristics such as IQ test scores Examples: • Physical attributes and other quantities such as weight, height, age, and the cost of buying products or services
Stratified samples
Sampling individuals from subgroups in the population in the same proportion as the population they are part of. - best when population includes subgroups that differ on some potentially meaningful characteristic Helps prevent sampling bias
Scales of measurement
Scales, or levels, of measurement help determine what statistical analyses are appropriate; enable test users to make accurate score interpretations. Four levels: - nominal - ordinal - interval - ratio
What are the two types of measurement?
Scaling and Classification.
Standard deviation (SD) formula
Standard deviation -- the average deviation of each score from the mean. Population: σ = √( Σ(X - μ)² / N ) Sample: s = √( Σ(X - X̄)² / (n - 1) )
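The population and sample formulas can be worked by hand and checked against the standard-library `statistics.pstdev` and `statistics.stdev`. The score list is an illustrative example, and treating it as both a population and a sample is only for comparison.

```python
import math
import statistics

scores = [3, 4, 4, 5, 5, 5, 6, 8]  # illustrative scores with a mean of 5

n = len(scores)
mean_score = sum(scores) / n
sum_squared_dev = sum((x - mean_score) ** 2 for x in scores)

sigma = math.sqrt(sum_squared_dev / n)    # population SD: divide by N
s = math.sqrt(sum_squared_dev / (n - 1))  # sample SD: divide by n - 1

print(f"sum of squared deviations = {sum_squared_dev}")
print(f"population SD = {sigma:.3f}, sample SD = {s:.3f}")
print(f"library check: {statistics.pstdev(scores):.3f}, {statistics.stdev(scores):.3f}")
```

Dividing by n - 1 instead of N makes the sample SD slightly larger than the population SD for the same scores.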
Types of normative samples
Stratified Random Purposive Convenience
Negative relation
Strong negative (r= -.7) + political affiliation and willingness to vote for another party's candidate Moderate/weak negative relation (r= -.4) + brushing teeth and cavities
Positive relation
Strong relation (r= .7 or higher) + height and weight + age and job experience Moderate/weak relation (r= .4 or lower) + chemotherapy and cancer remission + GRE scores and grad student success r² = proportion of variance shared by variables
Sir Francis Galton
Studied genetic influence using pedigree charts. - attempted to quantify individual differences by classifying people - *developed first correlation coefficient* later refined by Karl Pearson - created Anthropometric Laboratory in London in 1884 - major proponent of the Eugenics Movement
What if a person scores below the mean?
For a negative z-score, look up the area for the corresponding positive z-score and subtract it from 1: 1 - #
Measures of variability
Synonyms for variability are "spread" and "dispersion". Each term refers to differences among scores within a sample or population. Three common types are: - Range - Deviation scores - Variance and SD
Sampling and norms
Test administered to members of the sample under the same conditions. - environment - instructions - time restrictions - etc Developers calculate descriptive statistics and provide precise description of sample.
Psychological assessment assumption #4
Test-related behavior can predict behavior in other settings. - otherwise the test would be useless
Mean
The "average" of a set of scores. - found by summing all values and then dividing that sum by the total number of observed values - requires interval or ratio data - sensitive to every score in the sample, and may be inappropriate with skewed data Example: 3,4,4,5,5,5,6,8 = 40/8 = 5 3,3,4,4,6 = 20/5 = 4
Range
The difference between the highest and lowest scores; it is sensitive to outliers. Examples: 2,5,7,7,8,8,10,12,15,17,20 + Range = 20-2= 18 2,5,6,6,7,8,11,14,15,15,23 + Range = 23-2= 21
Population z-score formula
The population z-score is calculated by subtracting the population mean from the individual raw score and then dividing by the population SD. Where: - X is the individual score - μ is the population mean - σ is the population SD Z= (X-μ)/σ
Measurement
The process of assigning numbers or symbols to a characteristic or attribute according to a set of rules.
Assessment
The process of gathering and integrating data for the purpose of making an evaluation. - most effective when info is obtained using multiple techniques
Testing
The process of measuring variables by means of devices or procedures designed to obtain a sample of behavior.
Sample z-score formula
The sample z-score is calculated by subtracting the sample mean from the individual raw score and then dividing by the sample SD. Where: - X is the individual score - X̄ is the sample mean - s is the sample SD Z= (X-X̄)/s
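Both z-score formulas have the same shape, so one small function covers them; pass the population mean and SD or the sample mean and SD as appropriate. The function name and the test with mean 100 and SD 15 are hypothetical.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of SDs that raw score x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (x - mean) / sd

# Hypothetical test with a mean of 100 and an SD of 15
z_above = z_score(115, 100, 15)  # one SD above the mean
z_below = z_score(85, 100, 15)   # one SD below the mean
z_at_mean = z_score(100, 100, 15)

print(z_above, z_below, z_at_mean)
```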
Examples of psychological scales
They often measure individual differences. • Interests • Personality • The GRE • Risk taking behavior
Describing data
Three methods: - pictorially - measures of central tendency - measures of variability (or dispersion)
Multiple regression
Used when multiple predictors are used. - can be used when more than one predictor variable is available Takes into account the correlation between each of the predictor scores and what is being predicted. Also taken into account are the correlations among the predictors. Y = a + b₁X₁ + b₂X₂
Simple linear regression
Used when one variable is used to predict values. - describes the relationship between one independent variable (x) and one dependent variable (y)
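A minimal sketch of fitting Y = a + bX, assuming the usual least-squares method (the card does not name a fitting method). The function name, the aptitude scores, and the performance ratings are all hypothetical.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (intercept a, slope b) of the least-squares line y = a + b*x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of cross-products of deviations / sum of squared x deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: aptitude test score (x) and job performance rating (y)
aptitude = [1, 2, 3, 4, 5]
performance = [2, 4, 5, 4, 5]

a, b = fit_line(aptitude, performance)
predicted = a + b * 6  # predicted rating for a new aptitude score of 6

print(f"Y = {a:.1f} + {b:.1f}X; predicted Y at X = 6: {predicted:.1f}")
```

The slope b is the predicted change in the dependent variable for each one-unit increase in the independent variable.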
Logistic regression
Used when the variable being predicted is dichotomous (e.g., gender).
Ordinal scale statistics
Values imply nothing about magnitude of differences between one level and the next. - numbers are not units of measurement - statistical operations are limited to non-parametric tests
Psychological assessment assumption #3
Various sources of error are part of the assessment process. - our goal is to manage error sources instead of ignoring the problem
Adequate norms
Was the test developed using samples similar to the people taking the test?
Z-scores and percentile ranks
Z-scores can be used to calculate percentiles when raw scores have a normal distribution. - when used in conjunction with a Z-table, the z-score reveals the area of the normal distribution below the score in question
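In place of a printed Z-table, the standard-library `statistics.NormalDist` gives the area below a z-score directly. This sketch assumes the raw scores are normally distributed, as the card requires; the function name is hypothetical.

```python
from statistics import NormalDist

def percentile_from_z(z: float) -> float:
    """Percentage of a normal distribution that falls below the given z-score."""
    return 100 * NormalDist().cdf(z)

pct_above_mean = percentile_from_z(1.0)   # one SD above the mean
pct_below_mean = percentile_from_z(-1.0)  # one SD below the mean
pct_at_mean = percentile_from_z(0.0)

print(f"z =  1.0 -> {pct_above_mean:.2f}th percentile")
print(f"z = -1.0 -> {pct_below_mean:.2f}th percentile")
print(f"z =  0.0 -> {pct_at_mean:.0f}th percentile")
```

Because the curve is symmetrical, the percentile for a negative z equals 100 minus the percentile for the same positive z.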
Characteristics of an effective test
• Reliability • Validity • Adequate norms
No relation (pearson's r)
r = 0.00
.80 and -.80
Same strength of relation; only the direction differs.
Tests are defined by what (and how) they measure
• Content • Format • Administration procedures • Scoring and interpretation procedures • Psychometric quality - what makes an effective test?