Psychological Testing


Discriminant Validity Technique

a form of construct validity in which evidence is obtained to demonstrate that a test measures something unique and different from what other available tests measure (measures something unique) (want to know that a test specifically measures depression and does not correlate with tests of other constructs [anxiety, stress])

Convergent Validity Technique

a form of construct validation in which evidence is obtained to demonstrate that a test measures the same attribute as other measures that purport to measure the same thing

Predictive Validation

a procedure to determine the extent to which scores on a test are predictive of performance on some criterion measure assessed at a later time; usually expressed as a correlation between the test (predictor variable) and the criterion variable

Z-scores

basically SDs; tells you how many SDs a score is from the mean. Calculated as z = (x - xbar) / SD
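A quick sketch in Python (the scores are made up; using the sample SD here is an assumption, the population SD works the same way):

```python
import statistics

def z_score(x, scores):
    """How many SDs the raw score x lies from the mean: z = (x - xbar) / SD."""
    xbar = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample SD (assumption; use pstdev for population SD)
    return (x - xbar) / sd

scores = [80, 85, 90, 95, 100]
print(round(z_score(100, scores), 2))  # 1.26 -> about one-and-a-quarter SDs above the mean
```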

The sad case of Facilitative Learning

bogus technique: children with autism typed on a computer while a parent/facilitator held their wrist; the output came from the facilitator, not the child

As reliability increases...

confidence interval gets smaller

Platykurtic

curve is very flat with lots of variation (like a plate); wide

Demand characteristics Blind and Double Blind Studies

Demand characteristics: cues from the researcher about what response is expected. DB: not even the RA knows what you're looking for; the researcher does not have direct contact with subjects. Meant to create distance

Basic Statistics: Statistic

description of the sample

Leptokurtic

the distribution of scores is very narrow; peaked

An indication of how close the obtained score on a test is to a true score on the test if it were possible to obtain a true score. The standard error of measure is a...

function of the test's reliability. If reliability of the test is low, there will be a lot of error in the measurement.

Reliability

if your test is internally consistent, it is reliable. (Can assume test is not measuring other things, extraneous variables/confounds)

Factor Analysis

looks at the intercorrelations of the items of a test; looking for which items in the test are most correlated with each other, and not as correlated with the other items of the test (ex: Child Behavior Checklist, two major factors: Internalizing and Externalizing)

Attrition

people who drop out or do not participate

Basic Statistics: Sample

portion of the population

Confidence Interval

the range of scores between which we can have a certain level of confidence that the true score lies

Reliability =

the consistence of your test, how dependable it is

Construct Validity

the extent to which scores on a psychometric instrument designed to measure a certain characteristic are related to measures of behavior in situations in which the characteristic is supposed to be an important determinant of behavior. *The construct validity of a test is the extent to which the test may be said to measure a theoretical construct or trait.* (How well does your test correlate with the construct?)

Alpha (.05)

there is a 5% chance that the result happened due to chance

Basic Statistics: Parameter

description of the population (contrast with the everyday sense: two ends, extremes, boundaries; e.g., height: tallest and shortest)

Tests of Internal Consistency

want to make sure test is *only* measuring this construct

Basic Statistics: Population

who you want to measure

IQ scores

xbar = 100, SD = 15

If true score variance equals observed score variance, that means there is

zero (0) error

*What is Reliability?

• "Reliability refers to the consistency of scores obtained by the same persons when reexamined with the same test on different occasions, or with different sets of equivalent items, or under other variable examining conditions... *Test reliability indicates the extent to which individual differences in test scores are attributable to "true" differences in the characteristics under consideration and the extent to which they are attributable to chance errors.* To put it in more technical terms, measures of test reliability make it possible to estimate what proportion of the total variance of test scores is error variance." - Allows us to make an estimate of how much of our result is due to error

What is Reliability?

• "The attribute of consistency in measurement" • "The degree to which a test or experiment measures consistently what it is designed to measure" • "The more reliable a measuring procedure is, the greater the agreement between scores obtained when the procedure is applied twice" • "The extent to which a psychological assessment instrument measures anything consistently. A reliable instrument is relatively free from errors of measurement, so the scores examinees obtain on the instrument are close in numerical value to their true scores" • Observed/Obtained Score = True Score + Error

Validity: What is validity? (Content vs. Criterion vs. Construct)

• "The extent to which a test measures the quality it purports to measure. Types of validity include content validity, criterion validity, and construct validity" Kaplan & Saccuzzo • "The extent to which an assessment instrument measures what it was designed to measure. Validity can be assessed in several ways: by analyzing the instrument's content (content validity), by relating scores on the instrument to a criterion (predictive and concurrent validity), and by a more thorough analysis of the extent to which the instrument is a measure of a certain psychological construct (construct validity)" LR Aiken

r = correlation coefficient

• "r" can vary from -1 to +1. The strength of the correlation is the absolute value of "r" • "r" measures the linear relationship between 2 variables (straight line relationship)
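A minimal Pearson r in Python (the data points are hypothetical, chosen so the relationships are perfectly linear):

```python
import statistics

def pearson_r(xs, ys):
    """Pearson r: covariance of x and y divided by the product of their spreads."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs) ** 0.5
    ssy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (ssx * ssy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # perfect positive straight line -> 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))   # perfect negative straight line -> -1.0
```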

Reliability: Sources of Error Variance (What are they?)

• *Test Construction* (Ex: MMPI has personal/awkward questions. That can create error because not everyone responds the same to personal/awkward questions.) • *Examiner Variables* (Some examiners may not give test in standard/right/specific way → create error.) • Examinee Variables (Lack of sleep, physical discomfort, anxiety.)

Ratio Scale

• 0 Kelvin = absolute absence of temperature (all particles stop moving) • 0 distance, 0 weight

What is correlation?

• A relationship between variables • "co-relation" • def - the extent to which 2 variables are related to one another. The strength of the relationship. Statistically, it is the extent to which 2 variables covary.

How does correlation help us to account for variance?

• Accounting for Variance: - The Coefficient of Determination = r^2 (variance accounted for by your two variables) (1 - r^2 is the variance NOT accounted for, what's left over) - Ex: if r = .33, r^2 ≈ 11% accounted for; 89% is variance not accounted for. 1 - r^2 = error
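The 11% / 89% split above corresponds to a correlation of about r = .33; a one-liner sketch:

```python
r = 0.33                 # example correlation (hypothetical)
r2 = r ** 2              # coefficient of determination: variance accounted for
error = 1 - r2           # variance NOT accounted for (what's left over)
print(f"{r2:.0%} accounted for, {error:.0%} not accounted for")  # 11% / 89%
```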

What are standard scores? - Normal distributions/curves are easy to compare because they share a common mean - IQ = normally distributed - Psychopathology is not normally distributed

• Allow you to compare different tests to each other (eg: educational testing) • Note: You can only compare scales that are normally distributed (Linear Transformation). If the distribution is not normal, you must "stretch" the skewed distribution so it looks normal (Nonlinear Transformation). "Normalizing" a distribution should only be done when you have: - Good reason to believe that the test sample was large and representative enough - Failure to obtain normally distributed scores was due to the measuring instrument, not because the population itself is skewed

Interval Scales (most psych testing are)

• Allow you to know how much better something is than another • Limit: no absolute zero (0 is not the absence of the quality you're measuring) • 0°C = freezing point of water = 32°F; 0°F is not the absence of temperature

Test-Retest Reliability

• An estimate of reliability obtained by correlating pairs of scores obtained from the same person on two different administrations of the same test. Test and then retest. • What are the advantages and disadvantages? • Advantage: This is a relatively easy way to estimate the ability of a test to give consistent scores. • Disadvantages: Practice Effect (familiar because tested before), Fatigue Effect (get tired), Maturation (grow, get smarter; a 3-year-old a month later; things that can happen in between)

Skewed Distributions

• At times, distributions are not normally distributed → "skewed" • Not symmetrical (positively skewed: R tail, negatively skewed: L tail) • Positively skewed (^...) tail in positive direction • Negatively skewed (...^) tail in negative direction • Outlier: score that is extreme, way outside typical • Bill Gates is an outlier • Median score is best measure of Central Tendency in skewed distribution • Median score is between mode and mean in skewed distribution • The mean is affected by the "tail" (ie, outliers) and will move toward the end. The mode will be at the peak of the distribution. The median will fall in between the mean and the mode, and is considered the best measure of central tendency in a skewed distribution.

Speed Tests

• Because the items are very similar in difficulty, tests of Internal Consistency will produce spuriously high estimates of reliability. Alternate or parallel form reliability is most appropriate. As a rule, the test should be administered on two different occasions

Nominal Scales

• Categories • Data that is categorical is typically placed on a nominal scale • Often represented as a bar graph

Construct Validity: How to establish Construct Validity: (want all of these...)

• Convergent Validity Technique and • Discriminant Validity Technique and • Factor Analysis or Tests of Internal Consistency

What do statistics allow us to do?

• Describe - Descriptive Statistics (most psych testing) • Infer - Inferential Statistics (statement based on what you know)

Standardized Testing Characteristics

• Has Normative data • Has Standardized procedures for administration (give test same way every time trying to reduce effects of administrator on subject - negative proctor vs. positive proctor)

What are Norms? • What is Normative Data?

• In developing any test, we need to gather normative data. That is, we need to gather info on the "defined" population to which we would like to make inferences. • How do we get normative data?

What type of Reliability is more appropriate?

• In large part, the type of reliability that is most appropriate depends on qualities of your test • Ex: Speed Tests vs. Power Tests

Sampling Distributions of Means and Central Limit Theorem

• Sampling Distribution of Means: a distribution or curve created from mean scores; the data points that create the distribution are not individual scores • A Sampling Distribution is a distribution in which means of samples of size N from a population are used to create the distribution

Correlation is not causation (just because they covary doesn't mean one causes the other) ex: ice cream sales and violent crime are correlated, but ice cream sales do not cause violent crime What does a correlation tell us?

• It tells us how closely the dots of a scatter plot follow a line (Line of Best Fit). That is, it tells you the variance of the dots around any given point on the line. • It tells us the direction of the correlation (positive or negative)

Kurtosis

• Leptokurtic • Platykurtic

1. Where's the middle?

• Measures of Central Tendency (mean, median, mode) - Mode: most frequently occurring score - Median: arranged in order, the one directly in the middle - Mean (x bar): average; sum of the scores divided by the # of scores

2. How much does it spread out?

• Measures of Variation/Variance (standard deviation, variance, range) - Range: distance from the tallest person to the shortest person - Variance: take each score, subtract the mean, square it; sum the squares and divide by the # of scores = variance (average squared distance from the mean); take the square root = SD - Standard Deviation: average distance from the mean
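Python's statistics module answers both questions; a sketch (scores are made up; pvariance/pstdev treat the list as the whole population, which is an assumption):

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]

mode = statistics.mode(scores)           # most frequent score -> 4
median = statistics.median(scores)       # middle score of the ordered list -> 4.5
mean = statistics.mean(scores)           # sum / # of scores -> 5
variance = statistics.pvariance(scores)  # average squared distance from the mean -> 4
sd = statistics.pstdev(scores)           # square root of the variance -> 2.0
value_range = max(scores) - min(scores)  # highest minus lowest -> 7
```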

Ordinal Scales

• Medals and ribbons (1: gold, 2: silver, 3: bronze) (blue, red, white, green) • Don't know the distance between them (gold → silver → bronze) • Know order, but don't know how much better

Scaling

• Nominal Scales • Ordinal Scales • Interval Scales • Ratio Scale

The Normal Curve

• Normal Distribution (often called the "Bell-Shaped Curve") • Qualities: • The distribution of scores is symmetrical • Half of the scores fall above and half fall below the mean (arithmetic average) • A large number of scores fall very close to the mean, with progressively fewer cases occurring as scores get further above or below the mean • Fewer scores as you approach the ends of the curve • Mode, Median, and Mean are all aligned in the middle • Assumption: curves must be normally distributed • The mean is the best measure of central tendency only for a normal curve

Parallel-Forms and Alternate-Forms Reliability

• Parallel-Forms Reliability: In this type of reliability, there are two different forms of the test, and the means and variance of the observed scores are equal to each other (eg: Army Alpha and Army Beta) • Alternate Forms Reliability is simply the comparison of two different versions of the test that were constructed as to be parallel (eg: elementary timed math tests) - Same: trying to measure same thing, construct, ideas - Difference: May be different in format

Regression

• Regression allows us to make a prediction. However, there is error in making these predictions. • What is error? • Error is the variance that is due to extraneous variables (confounds). That is, it is the variance not accounted for.

Reliability vs. Validity

• Reliability is a necessary, but not sufficient, condition for a test to be valid. (Must have a reliable test for it to be valid) • A test can only be as valid as it is reliable; it cannot be more valid than it is reliable. • A test will always be most related to itself. In other words, a test cannot be more related to something else than it is to itself. • Theoretically, a test should not correlate more highly with any other variable than it correlates with itself.

Why? Why can't a test be more valid than it is reliable?

• Reliability is the relationship of a test to itself (its own score). • Validity is the relationship of a test to a construct or criterion

What things affect Correlation?

• Restricted Range or homogeneity of sample (one without much variation, narrow sample) - When you restrict the range of a sample (don't use everybody) you create inaccurately low correlations. Better correlations = fuller range in your samples. - Scatter of the dots around the line of best fit (ie: error)

The Special Case of Stratified Samples

• Samples are stratified across various demographics (gender, age, ethnicity, level of education, geographic region, urban/rural, etc.) and according to known population data - Could stratify on age, ethnicity, gender (51% of population is F, 49% is M) • The sample is not random but is created based on population numbers so it has the right proportions.

• Representative Sample (looks like our population, represents it) • Random Sample/Random Sampling (process by which we do that ^)

• To obtain normative data, we take a representative sample of the population. The best procedure is to get a Random Sample of the population. • A true random sample consists of a group of subjects that were randomly selected from the population. Each member of the population must have an equal probability of being selected.

Standard Scores • What are raw scores?

• Unmanipulated scores, haven't done anything to it yet • Can be converted to standard scores

What are the DISadvantages of Parallel-Forms and Alternate-Forms Reliability?

Disadvantage: Fatigue Effect and Maturation continue to be a problem. There must be evidence that the two versions of the test are in fact measuring the same thing and are statistically parallel

Stanine

Divides the curve into nine equal parts ("Standard Nine")

Standard Scores & Percentile Ranks

Areas under the normal curve: 2% / 14% / 34% / 34% / 14% / 2% = 100%. At +1 Standard Deviation, 84% of the population is at or below your score; at -1 Standard Deviation, 16%.

If comparing results of two different tests (math & IQ), what would be the appropriate way to address this?

Standard Error of Difference

Standard Error of Difference

• There are times that we want to compare scores (eg: test scores of two different applicants for a job or two different test scores of one individual). In this case, we now have two sources of error: - Error from Person 1 and Error from Person 2 or - Error from Test 1 and Error from Test 2 *We must account for error twice
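The "account for error twice" idea can be sketched as combining the two SEMs in squared form (the SDs and reliabilities below are hypothetical):

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

def se_difference(sem1, sem2):
    """Two scores -> two sources of error; combine them in squared form."""
    return math.sqrt(sem1 ** 2 + sem2 ** 2)

s1 = sem(15, 0.91)   # ≈ 4.5 for test/person 1
s2 = sem(15, 0.91)   # ≈ 4.5 for test/person 2
print(round(se_difference(s1, s2), 2))  # larger than either SEM alone
```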

How are Confidence Intervals used in psychological testing?

• To determine the confidence we have that the true score falls within a specified range of scores (if you want to be more confident, need to throw a bigger/wider net) • To make a statement about future performance • To make a comparison between subscale scores on a particular test

If T score of 40, what is equivalent A score?

= 400. What is the percentile rank? = 16 (% of the curve at or below the score).
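The conversion above goes through z; a sketch using the scale means/SDs from the cards (T = 50/10, A = 500/100, IQ = 100/15):

```python
from statistics import NormalDist

def convert(score, mean_from, sd_from, mean_to, sd_to):
    """Convert between standard-score scales via the shared z-score."""
    z = (score - mean_from) / sd_from
    return mean_to + z * sd_to

print(convert(40, 50, 10, 500, 100))        # T of 40 (z = -1) -> A score of 400.0
print(convert(40, 50, 10, 100, 15))         # ...and an IQ-scale score of 85.0
print(round(NormalDist().cdf(-1.0) * 100))  # percentile rank at z = -1 -> 16
```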

How do we get normative data? - Standard Scores allow us to make interpretations.

Best procedure is random sampling. Everyone has an equal opportunity to be included in the sample.

*Features* of Sampling Distribution samples:*

*• All of the samples must have the same size N • Each population member must have the same probability of being included (or re-included, over and over again) in the sample* • Note: Since it is impossible to have access to the total population, this is mostly theoretical. • The sampling distribution of means gives us a better estimate of the population mean because it reduces the effects of outliers

What is Reliability? (own slide)

- Dating: dependable, stable, consistent, predictable, you can count on them - Unreliable: not dependable, not stable, can't count on them - Qualities are the same for testing

Split-Half Reliability

- Fundamental problem: restricting the range of possible scores - Cut the test in half (originally 0-100, now only 0-50) = range restricted by half, which lowers the correlation (not going to be as correlated) • Split-Half Reliability compares the two halves • By reducing the length of the test, the range of possible scores has been reduced. In a way, Split-Half Reliability restricts the range of possible scores. Therefore, the correlation of the two halves of the test gives a spuriously low estimate of the test's reliability

-"under-represented groups" → minority/majority

- Men selected at 50%: women's selection rate must be at least 4/5 (80%) of 50%, which is 40%. At least 40% of women must be selected.
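The 4/5ths (80%) rule check sketched in Python (the selection rates are hypothetical):

```python
def passes_four_fifths(rate_group, rate_highest):
    """Adverse impact check: the group's selection rate must be at least
    80% of the highest group's selection rate."""
    return rate_group / rate_highest >= 0.8

print(passes_four_fifths(0.40, 0.50))  # True: exactly at the 4/5 line
print(passes_four_fifths(0.35, 0.50))  # False: evidence of adverse impact
```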

Basic Statistics

- Population - Sample - Parameter - Statistic

Regression vs. Correlation

- Regression allows you to predict. Correlation tells you about a relationship; it does not allow you to make a prediction. • How are they similar? • They both rely on the line of best fit; the closer the dots to the line, the stronger the relationship. • What makes them different? • Y-intercept (at an x of 0, you can know where the line crosses the Y-axis) • Regression: y = mx + b - b: y-intercept - m: slope, rise over run (how much the line rises as you run across the X-axis)
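A least-squares sketch for y = mx + b (the data points are made up so the fit is exact):

```python
import statistics

def fit_line(xs, ys):
    """Least-squares regression line y = m*x + b (m: slope, b: y-intercept)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    m = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - m * mx
    return m, b

m, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])  # points lie exactly on y = 2x + 1
print(m, b)        # 2.0 1.0
print(m * 5 + b)   # regression lets us predict: x = 5 -> 11.0
```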

Impact of skewedness (outliers) on statistical analysis?

- The problem is that outliers pull the mean score, making the data difficult to interpret; the mean is no longer the best estimate of central tendency in a skewed distribution.

Sampling Problems in Research

- We want representative samples (if the sample is not representative, inferences may not be appropriate) • Overuse of incidental samples or samples of convenience - Incidental sample = convenience sample (samples that are around you: college students) - College populations are not representative of most of the population (too educated, young, female, smart, rich, healthy, white/Asian). Results from this group would not generalize well for a psych test; that's why we like to use stratified samples

Standard Error of Measure [one Standard Deviation unit of error in your measurement (+/-)] (own notes)

- We would not need statistics if there were no error. - If you have a perfectly reliable instrument (ex: thermometer), there would be no error - If your instrument is less reliable (ex: psychological test), you'll have more error in your estimates

Homoscedasticity (scatter of dots)

- Regardless of where you are on the X-axis, the scatter of dots around the line of best fit is even. - Correlation is the average scatter/distance of dots around the line of best fit (an average line) - Restricted Range problem: when the sample is so narrow that it doesn't have the full range of people (as opposed to when you look at the whole population). - The Restricted Range problem makes the correlation lower than what is true

Descriptive Statistics: What are the two ways we describe our samples?

1. Where's the middle? 2. How much does it spread out?

Why does reliability establish the ceiling on how valid a test can be? (Utility Chapter)

A test cannot be more correlated with another variable than it will be with its own score.

What are the advantages of Parallel-Forms and Alternate-Forms Reliability?

Advantage: Parallel-forms and Alternate Form Reliability help to reduce Practice Effect

What is the utility of knowing the standard error of measure?

Ans: From Standard Error of Measure, we can calculate the Confidence Interval
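Putting the last few cards together as a sketch (the SD and reliability are hypothetical; 1.96 is the z for 95% confidence):

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

def confidence_interval(score, sd, reliability, z=1.96):
    """Range in which the true score lies with ~95% confidence (z = 1.96).
    A wider net (larger z) means more confidence."""
    margin = z * sem(sd, reliability)
    return score - margin, score + margin

low, high = confidence_interval(110, 15, 0.91)  # IQ-style score of 110
print(round(low, 2), round(high, 2))            # ~101.18 to ~118.82
```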

A scores (SAT)

Mean for A score is: xbar = 500, SD = 100

T scores (personality)

Mean for T score is: xbar = 50, SD = 10

Why is a test of Internal Consistency (or Factor Analysis) important in establishing Construct validity?

Want to know you're measuring only one construct. All other variance is error.

Content Validity (content valid instrument = DSM)

• The type of validity that is determined by the degree to which the questions, tasks, or items on a test are representative of the universe of behavior the test was designed to sample (if we have some quality of construct trying to measure, it covers all the different dimensions/qualities we talk about in order to fully measure that quality)

Summary of Factors Influencing Reliability

• Short tests are less reliable than longer tests because they are more influenced by chance variables and error will increase • If test items are too difficult, there will be more guessing, hence more error and reduced reliability • If test items are too easy, the variability of scores will be reduced and reliability will be reduced - Too easy = restricted range; too hard = people guess • As the group upon which the reliability test was performed becomes more homogeneous, the variability of the scores will decrease, and therefore the reliability will also decrease • For particular kinds of reliability (inter-item consistency as measured by Kuder-Richardson or Cronbach's Alpha), reliability is increased as the items become more homogeneous • In behavioral assessment, inter-rater reliability is reduced if the raters are not well trained and do not adequately understand the criteria. Reliability is also reduced if they do not all rate the same subjects. Conversely, inter-rater reliability increases if raters are well trained and if they all rate all the subjects - You want high inter-rater reliability. All raters must be well trained and rate all the same subjects - The operational definition of the behavior being rated must be well defined and specified

Norm-Referenced Tests vs. Criterion-Referenced Tests

• Some standardized tests are norm-referenced and others are criterion-referenced - Norm-Referenced: compared to a norm/average and how much you vary from it (Ex: SAT, GRE, MMPI, IQ) - Criterion-Referenced: compared to where you stand relative to some standard/cutoff score (Ex: driver's license test, Psychology licensing test, CBEST [above that line = qualified to drive/pass the exam]) • Why is this information important? • Adverse Impact Laws or the 4/5ths (80%) Rule (you don't want Adverse Impact; it's illegal) • As a result, employment tests must be criterion-referenced

Speed vs. Power Tests (picture)

• Speed Tests • Alternate Forms Reliability - good • Parallel-Forms Reliability - good • Internal Consistency - bad • Power Tests • Split-Half Reliability - bad • Alternate Forms Reliability - okay • Test-Retest Reliability - okay • Internal Consistency - good

Speed vs. Power Tests

• Speed Tests contain items of uniform difficulty, but there is a premium placed on speed. The Ceiling is created by time-limits • Power Tests have items of varying difficulty, some of which may be very difficult. People are not intended to pass all items. The Ceiling is created by difficulty

Power Tests

• Split-Half reliability may not be appropriate because it is difficult to split the test evenly. Test-retest reliability and Alternate-forms reliability are acceptable, but Practice Effect is a problem with Test-retest reliability, and Maturation and Fatigue Effect are potential problems with giving a test twice. Tests of Internal Consistency are often favored

Standard Error of Estimate

• Standard Error of Estimate is the amount of error in our estimates (ie: predictions) • Stronger correlation = less error

Types of Reliability

• Test-retest Reliability: simplest form of reliability • Parallel-Forms Reliability/Alternate-Forms Reliability • Split-Half Reliability • Tests of Internal Consistency

Internal Consistency (why likely to be reliable?) *TEST essay question

• Tests of Internal Consistency look at the correlation of test items with each other • Why is a test of Internal Consistency (eg: Cronbach's Alpha [continuous data] or Kuder-Richardson Formula 20 [dichotomous data]) a good estimate of a test's reliability? • That is, measures of internal consistency are likely to be very close to the test's reliability had we used test-retest or alternate-forms reliability. Why? - A test with internal consistency is most likely reliable because it is measuring only one construct, not extraneous variables • *Tests attempt to measure a discrete content area (ie: one variable) • For a test, the major source of measurement error is the sampling of different contents. That is, if the items of your test are not consistent with each other, then you are measuring multiple contents rather than just one variable • Those other contents that are being measured are error (ie: extraneous variables)*
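Cronbach's Alpha sketched in Python (the three items and four respondents are invented; population variance is assumed):

```python
import statistics

def cronbach_alpha(items):
    """items: one list of scores per test item (same respondents, same order).
    alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(items)
    item_var = sum(statistics.pvariance(item) for item in items)
    totals = [sum(person) for person in zip(*items)]  # each respondent's total
    return (k / (k - 1)) * (1 - item_var / statistics.pvariance(totals))

items = [[3, 4, 3, 5], [2, 4, 3, 5], [3, 5, 4, 5]]  # hypothetical, highly intercorrelated
print(round(cronbach_alpha(items), 2))  # high alpha -> internally consistent
```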

What do we do to compensate for this Restricted Range problem?

• The Spearman-Brown Formula/Estimate • The Spearman-Brown Formula/Estimate is used in Split-Half Reliability and gives us an estimate of what the test's reliability would be had we not split the test in half • Any time you see the Spearman-Brown Formula used, it means that Split-Half Reliability was used to estimate the test's reliability
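The Spearman-Brown correction as a one-liner (the half-test correlation of .60 is hypothetical):

```python
def spearman_brown(r_half, n=2):
    """Project reliability for a test n times as long (n = 2 undoes the split-half)."""
    return n * r_half / (1 + (n - 1) * r_half)

print(spearman_brown(0.60))  # ≈ 0.75: the full test is estimated as more reliable than either half
```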

Reliability

• The amount of variance we see in test scores (obtained/observed scores) depends on two factors: • True score variance • Error variance • σ²(observed/obtained) = σ²(true) + σ²(error)

Criterion-Related Validity

• The extent to which a test or other assessment instrument measures what it allegedly measures, as indicated by the correlation of test scores with some criterion measure of behavior (correlation of your test with a criterion measure of behavior, an external measure of behavior) • Two methods for establishing Criterion-Related Validity - Concurrent Validation: a procedure to determine the extent to which scores obtained by a group of individuals on a particular psychometric instrument are related to their simultaneously determined scores on another measure (criterion) of the same characteristic that the instrument is supposed to measure - The correlation of your test with a criterion measure of behavior now/today - Two measures occurring simultaneously (ex: finger exercise test and typing speed): give both tests today/same time and see if they're correlated; has to be a group of people who don't know how to type

Assumptions in Correlation

• The relationship between the variables must be linear • The scatter (variability) of the dots around any given X is consistent with any other given X. This is called Homoscedasticity. Homoscedasticity is okay for your correlation. Heteroscedasticity is a problem: it makes your correlation uninterpretable, not accurate.

Three important features of CLT:

• The sampling distribution is approximately normal. It is not surprising that the sampling distribution is normal if the population distribution is normal. What is surprising, however, is that the CLT assures that if you have sufficiently large sample size (N>=25), the sampling distribution will be normal regardless of the shape of the population • The mean of the sampling distribution of means is equal to the mean of the population • As the sample N increases, the standard deviation of the sampling distribution of means decreases
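The surprising part of the CLT can be checked by simulation (the population, sample size, and counts below are made up; a skewed exponential population still yields a normal-looking sampling distribution of means):

```python
import random
import statistics

random.seed(0)                                                   # reproducible sketch
population = [random.expovariate(1.0) for _ in range(100_000)]   # heavily skewed population

# build the sampling distribution of means: 2,000 samples of size N = 25
sample_means = [statistics.mean(random.choices(population, k=25))
                for _ in range(2_000)]

# its mean matches the population mean, and its SD (the standard error)
# shrinks as N grows: roughly population SD / sqrt(N)
print(round(statistics.mean(population), 2))
print(round(statistics.mean(sample_means), 2))
print(round(statistics.stdev(sample_means), 2))
```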

