Psy 370 Exam 2


Heterotrait heteromethod

Different trait, different method: evidence of discriminant validity; these correlations should be low.

What will decrease the SEM?

Higher reliability and a smaller standard deviation

Multitrait-multimethod correlation matrix:

presents all the correlations between a group of different traits, attributes, or constructs, each measured by two or more different measurement methods or tests.

Effect sizes r and r^2

r: .10 = small, .30 = medium, .50 = large. r^2: 1% = small, 9% = medium, 25% = large.

When do we use the predictive method?

-Used when it is important to show a relationship between test scores and a future behavior, i.e., whether the test scores and the criterion scores have a strong relationship -Appropriate for validating employment tests

Heterotrait-monomethod

(different trait, same method) Method variance: variance due to the method can be detected by seeing whether the different-trait, same-method correlations are stronger than the different-trait, different-method correlations.

monotrait-heteromethod

(same trait, different method) Convergent validity: measures the same trait in different ways. The validity diagonal (monotrait-heteromethod correlations) contains the correlations of the same traits measured by different methods.

In what situations, do you use evidence based on content v evidence based on relations with external criteria v evidence based on relations with other constructs?

-Evidence based on content: for tests that measure concrete attributes (observable and measurable behaviors) -Evidence based on relations with external criteria: for tests that predict outcomes -Evidence based on relations with other constructs: for tests that measure abstract constructs

Test-retest and formula

-Extent to which an individual obtains a similar score on different administrations -Process: administer the test to the same group on two different occasions -Objective: assess the stability of the test over time -Analysis: scores from the first and second administrations are correlated -Assumes test takers have not changed between the 1st and 2nd administrations in terms of the skill/quality being measured -Formula: Pearson product-moment correlation
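A minimal sketch of the test-retest analysis: the Pearson product-moment correlation between the scores from the two administrations. The score lists are hypothetical, made up for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (n >= 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations, and sums of squared deviations
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores for the same five people on two occasions
time1 = [10, 12, 15, 18, 20]
time2 = [11, 13, 14, 19, 21]
print(round(pearson_r(time1, time2), 3))
```

A value close to 1 would suggest the scores are stable over time.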

Coefficient Alpha

-For tests that have more than two possible answers (rating scales) -May be used for scales made up of questions with only one right answer
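A minimal sketch of coefficient alpha, assuming the usual formula alpha = k/(k - 1) x (1 - sum of item variances / variance of total scores). The ratings are hypothetical 1-5 rating-scale answers.

```python
from statistics import pvariance

def coefficient_alpha(responses):
    """Coefficient (Cronbach's) alpha.

    responses: one row per test taker, one column per item.
    """
    k = len(responses[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*responses))  # transpose rows into item columns
    item_var_sum = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Hypothetical ratings: 4 test takers x 3 items
ratings = [
    [4, 5, 4],
    [2, 3, 2],
    [3, 3, 4],
    [5, 4, 5],
]
print(round(coefficient_alpha(ratings), 3))
```

Population variances are used throughout; using sample variances for both parts gives the same ratio.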

Coefficient of determination

-Helps us determine the amount of variance shared by the test and criterion -Obtained by squaring the validity coefficient, so r = .3 --> r^2 = .09 -Larger coefficients represent a stronger relationship, or greater overlap, between test and criterion -"What amount of variance do the test and criterion share?"
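A small sketch that squares a validity coefficient and labels its size with the r cutoffs from the effect-size card (.1, .3, .5). The "below small" label for values under .1 is an assumption added for completeness.

```python
def coefficient_of_determination(r):
    """Proportion of variance shared by test and criterion (r squared)."""
    return r ** 2

def effect_size_label(r):
    """Label |r| with the cutoffs .1 small, .3 medium, .5 large."""
    r = abs(r)
    if r >= 0.5:
        return "large"
    if r >= 0.3:
        return "medium"
    if r >= 0.1:
        return "small"
    return "below small"  # assumed label; not one of the three cutoffs

r = 0.3
print(coefficient_of_determination(r), effect_size_label(r))
```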

Face validity

-How test takers perceive the attractiveness and appropriateness of a test -Tells us nothing about what a test really measures -Might influence how test takers approach the test -Though non-statistical, it might be what motivates a respondent to take the survey and take it seriously

Construct validity

-Involves accumulating evidence that scores relate to observable behaviors in the ways predicted by the underlying theory -Because all tests measure one or more constructs, the traditional term "construct validity" is synonymous with "validity"

Demonstrating evidence of validity after test development

-Involves examining the extent to which experts agree on the relevance of the content of test items -Experts review and rate how essential test items are to the attribute measured -We calculate Lawshe's (1975) Content Validity Ratio (CVR) to determine agreement among experts. The more experts on the panel, the lower the minimum CVR value an item needs in order to be retained
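A minimal sketch of Lawshe's CVR for a single item, assuming the standard form CVR = (ne - N/2) / (N/2), where ne is the number of experts rating the item "essential" and N is the panel size. Comparing the result with the minimum value for a given panel size is not shown, since that table is not reproduced in these notes.

```python
def content_validity_ratio(n_essential, n_experts):
    """Lawshe's CVR for one item.

    n_essential: number of experts who rated the item "essential"
    n_experts:   total number of experts on the panel
    Returns a value from -1 (no one says essential) to +1 (everyone does).
    """
    if n_experts <= 0 or not 0 <= n_essential <= n_experts:
        raise ValueError("need n_experts > 0 and 0 <= n_essential <= n_experts")
    half = n_experts / 2
    return (n_essential - half) / half

# Hypothetical panel: 8 of 10 experts rate the item essential
print(content_validity_ratio(8, 10))
```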

Classical Test Theory

-No instrument is perfectly reliable or consistent -All test scores contain some error: X = T + E, where X = observed score, T = true score, E = random error

Establishing a test format

-Written test: a paper-and-pencil test in which a test taker must answer a series of questions. -Practical test: requires a test taker to actively demonstrate skills in specific situations. -Instructional objectives: guides students' learning of the terms and concepts of reliability/precision and validity.

Test users must

-be aware of test bias and how it affects test validity -ensure tests are valid for minority populations -ensure tests are free of questions requiring a specific cultural background -use appropriate norm groups for minorities

Test takers

-fatigue -illness -exposure to test questions before the test -not providing truthful and honest answers

Standard error of measurement? What does it allow us to do?

-An index of the amount of uncertainty or error expected in an individual's observed test score, i.e., how much the individual's observed test score might differ from the individual's true score -It allows us to quantify the amount of variation in a person's observed score that measurement error would most likely cause -Formula: SEM = σ x sqrt(1 - rxx), where σ = standard deviation of one administration of the test scores and rxx = reliability coefficient
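A minimal sketch of the SEM formula. The SD and reliability values are hypothetical; the calls show that a higher reliability or a smaller standard deviation both shrink the SEM.

```python
import math

def standard_error_of_measurement(sd, rxx):
    """SEM = sd * sqrt(1 - rxx)."""
    if not 0 <= rxx <= 1:
        raise ValueError("reliability coefficient should be between 0 and 1")
    return sd * math.sqrt(1 - rxx)

print(standard_error_of_measurement(20, 0.80))  # baseline
print(standard_error_of_measurement(20, 0.90))  # higher reliability -> smaller SEM
print(standard_error_of_measurement(10, 0.80))  # smaller SD -> smaller SEM
```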

What are the 5 sources of validity evidence?

1) Evidence based on test content: the extent to which items are representative of the construct the test measures (generally non-statistical; based on the judgment of subject matter experts, SMEs)
2) Evidence based on response processes: involves observing test takers as they respond to the test, or interviewing them when they complete it, to understand the mental processes they use to respond
3) Evidence based on internal structure (previously construct validity): involves examining whether the conceptual framework used in test development can be demonstrated using appropriate analytical techniques
4) Evidence based on relations with other variables (previously criterion-related validity/construct validity): correlating test scores with other measures to see if they are related; the extent to which test scores are systematically related to a relevant criterion. Involves accumulating evidence that a test is based on sound psychological theory. (A construct is any concept or characteristic that a test is designed to measure.)
5) Evidence based on the consequences of testing: involves determining whether the correct inferences are being made based on the interpretation of the test scores

Test administration

-not following administration instructions -disturbances during the test period -answering test takers' questions inappropriately -extremely cold or hot room temperature

Test scoring

-not scoring according to instructions -inaccurate scoring -errors in judgment -errors calculating test scores

Test itself

-poorly designed -trick questions -ambiguous questions -poorly written questions -reading level higher than the reading level of target population

Test publishers should

-prevent test misuse by making test manuals and validity information available / accessible before purchase -refuse to provide test materials to persons who do not have testing credentials or who are likely to misuse the tests

Linear regression

-Statistical procedure for predicting performance on a criterion using one set of test scores: Y′ = a + bX, where Y′ = the predicted score on the criterion, a = the intercept, b = the slope, X = the score the individual made on the predictor test

Validity coefficient

-The correlation coefficient that results when two sets of scores are correlated, usually test scores and criterion scores -A statistic used to infer the strength of the evidence of validity that the test scores might demonstrate in predicting job performance -Symbol: rxy

Gathering Theoretical Evidence

1) List the associations the construct has with other constructs (establish a nomological network) 2) Propose one or more experimental hypotheses using the test as an instrument for measuring the construct

What are the four sources of error?

1) test itself 2) test administration 3) test scoring 4) test takers

Factors that Influence Reliability

1) test length 2) homogeneity of questions 3) test-retest interval 4) test administration 5) scoring 6) cooperation of test takers

Concurrent method process

1) Administer the test and criterion measure at the same time to the same group of people 2) Correlate the test scores with the criterion

The predictive method process

1. A group of people take the test (the predictor) 2. Scores are held for a pre-established time interval 3. When the time has elapsed, researchers collect a measure of some behavior (the criterion) 4. Correlate the test scores with the criterion

Questions w high internal consistency v low internal consistency?

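High internal consistency: 7 + 8 is similar to 8 + 3 (both are simple addition items). Low internal consistency: 3 + 8 is not the same kind of item as 150 x 300.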

For example: rxx = .8 and σ = 20 and test score = 76

95% CI = 76 +/- [1.96 x [20 x sqrt(1 - .8)]] = 76 +/- (1.96 x 8.94) = [58.47, 93.53]
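A minimal sketch of this worked example: the SEM is sd x sqrt(1 - rxx), and the 95% interval is the observed score plus or minus 1.96 SEMs. Applying the square root gives SEM of about 8.94 and an interval of about [58.47, 93.53].

```python
import math

def confidence_interval(observed, sd, rxx, z=1.96):
    """Interval X +/- z * SEM, with SEM = sd * sqrt(1 - rxx)."""
    sem = sd * math.sqrt(1 - rxx)
    margin = z * sem
    return observed - margin, observed + margin

low, high = confidence_interval(76, 20, 0.8)
print(round(low, 2), round(high, 2))
```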

Internal consistency and formula

A measure of how related items or groups of items on a test are to each other -Process: a homogeneous test (or each homogeneous subset of a heterogeneous test) is split in half, and scores on the first half are compared with scores on the second half -Objective: assess how related items or groups of items are to one another -Analysis: scores on both halves are correlated -The test is administered once to a single group to see if the test items all share something in common -Appropriate for homogeneous tests -Formula: Pearson product-moment correlation corrected for length by the Spearman-Brown formula; also coefficient alpha or KR-20

Confidence interval

A range of scores that we feel confident will include the test taker's true score -95% CI = X +/- 1.96(SEM) -1.96: the two points on the normal curve (+/-1.96) between which 95% of the scores fall

Alternate-Forms and formula

Alternate forms: the test developer creates two different forms of the test. Two ways to administer them: (1) have everyone take form 1, then form 2; or (2) have half of the class take form 1 while the other half takes form 2 at the same time, then switch -Formula: Pearson product-moment correlation

What is alternative to norm and criterion referenced test?

The alternative is authentic assessment: it assesses a student's ability to perform real-world tasks by applying the knowledge and skills he or she has learned

Scorer reliability aka interrater reliability and formula

The amount of consistency among scorers' judgments -Process: two or more people score the same tests -Objective: assess consistency among scorers' judgments -Analysis: scores from both scorers are correlated, or intraclass correlations are used if there are more than two raters -Formula: Pearson product-moment correlation

Factor analysis

An advanced procedure that helps investigators explain why items within a test are correlated or why two different tests are correlated

Cohen's Kappa

An index for calculating scorer reliability/interrater agreement when scorers make judgments that result in nominal or ordinal data (e.g., pass/fail essay questions, rating scales on personality inventories) -Ranges from -1 to +1; 1 = complete agreement -Formula: κ = (observed agreements - expected agreements) / (N - expected agreements), where N = total count and expected agreements = the sum of the chance-expected counts in the agreement cells
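A minimal sketch of Cohen's kappa for two raters, using counts rather than proportions so it matches the count-based formula. The pass/fail judgments are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning nominal/ordinal categories to the same N cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of cases")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b))  # observed agreements
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance-expected agreements: for each category, row total * column total / N
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n
    if expected == n:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (observed - expected) / (n - expected)

# Hypothetical pass (P) / fail (F) judgments on 10 essays
rater_1 = ["P", "P", "P", "P", "P", "P", "F", "F", "F", "F"]
rater_2 = ["P", "P", "P", "P", "P", "F", "F", "F", "F", "P"]
print(round(cohens_kappa(rater_1, rater_2), 3))
```

Here 8 of 10 judgments agree, but 5.2 agreements would be expected by chance, so kappa is well below the raw 80% agreement.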

Reliability coefficient

An index of the strength of the relationship between two sets of test scores. r = correlation coefficient; rxx = reliability coefficient

What is generalizability theory?

Another approach for estimating reliability -Concerned with how well and under what conditions we can generalize an estimation of reliability of test scores from one test administration to another -Proposes separating sources of systematic error from random error in order to eliminate systematic error -breaks down variation and error into multiple sources (raters, occasions, items)

Construct

An attribute, trait, or characteristic that cannot be observed directly but is inferred from behaviors (actions that are observable and measurable).

Norm referenced test v criterion referenced test

Norm-referenced: compare a test taker's score with the scores of a group of test takers who took the test previously (or the current cohort), e.g., SAT, ACT, GRE -Used to determine how well an individual's achievement compares with others' -Test score compared to others; reported as percentiles
Criterion-referenced: compare a test taker's score with an objectively stated standard of achievement (e.g., learning objectives for the chapters in this text) -Used to learn whether an individual learned specific knowledge or can demonstrate a skill -Test score compared to the highest possible score; reported as percentages

Concrete v Abstract attributes

Concrete attributes: can be clearly described in terms of observable and measurable behaviors. Ex: baking a cake, playing the piano, math knowledge. Abstract attributes: more difficult to describe in terms of behaviors because people may disagree on what the behaviors represent. Ex: intelligence, personality, creativity

What are the traditional views of validity? V the current one?

Traditional view: content validity = evidence based on content; criterion-related validity = evidence based on relations with other variables; construct validity = evidence based on internal structure. -This view suggested that there were different "types" of validity. -The current view is that validity is a single concept with multiple sources of evidence to demonstrate it.

Evidence of Construct validity? (2 types)

Convergent validity: evidence that test scores correlate with scores on other tests that measure the same construct. Discriminant validity: evidence that test scores are not correlated with scores on tests of unrelated constructs.

Correlation coefficient v validity coefficients

Correlation coefficient: a quantitative estimate of the relationship between two variables. Validity coefficients: correlation coefficients for predictive evidence and concurrent evidence

Correlation coefficient v reliability coefficient

Correlation coefficient: -1 to +1. Reliability coefficient: 0 to +1; a negative reliability coefficient would probably indicate a computational problem

Multitrait-multimethod (MTMM) design:

Donald Campbell and Donald Fiske cleverly combined the need to collect evidence of reliability/precision, convergent evidence of validity, and discriminant evidence of validity into one study. Its correlations fall into monotrait-heteromethod, heterotrait-monomethod, and heterotrait-heteromethod cells, plus the monotrait-monomethod reliability coefficients.

What are the two methods for obtaining evidence of validity based on test content?

During Test Development After Test Development

Effect sizes for Eta squared

.01 = small, .059 = medium, .138 = large

Ways to establish quantitative evidence

Experimental interventions, evidence of validity based on content, evidence of validity based on relations with external criteria, multiple studies

Gathering evidence of construct validity (triangles in the MTMM matrix)

Solid triangles: heterotrait-monomethod correlations; want low correlations. If they aren't low, method variance is present. Dashed triangles: heterotrait-heteromethod correlations, discriminant validity; want low correlations. Not in a triangle (the validity diagonal): monotrait-heteromethod correlations, convergent validity; want high correlations.
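A small sketch that classifies one correlation in a multitrait-multimethod matrix by comparing its two traits and two methods, and returns the pattern these notes say to expect for that cell. The trait and method names in the example are hypothetical.

```python
def mtmm_cell(trait_1, method_1, trait_2, method_2):
    """Classify one MTMM correlation and return (cell type, expected pattern)."""
    same_trait = trait_1 == trait_2
    same_method = method_1 == method_2
    if same_trait and same_method:
        return "monotrait-monomethod", "reliability coefficient; should be high"
    if same_trait:
        return "monotrait-heteromethod", "convergent validity; should be high"
    if same_method:
        return "heterotrait-monomethod", "should be low; if not, suspect method variance"
    return "heterotrait-heteromethod", "discriminant validity; should be low"

# Hypothetical traits and methods
print(mtmm_cell("anxiety", "self-report", "anxiety", "peer rating"))
print(mtmm_cell("anxiety", "self-report", "sociability", "self-report"))
```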

Test of significance

Helps us determine whether there is a relationship between the test and the criterion, and the likelihood that the relationship would have been found by chance alone: "How likely is it that the correlation between the test and the criterion resulted from chance or sampling error?" If p < .05, we conclude that the test and criterion are related and that the relationship is unlikely to be due to chance or sampling error alone. Significant means there is evidence of a true relationship; not significant means we did not find evidence of a relationship, which is not proof that none exists.

Calculating internal reliability for heterogeneous tests

Heterogeneous tests have multiple subtests or factors (e.g., accounting skills, calculation skills, use of spreadsheets); internal consistency is calculated separately for each homogeneous subtest

What is the difference between interrater reliability and interrater agreement?

Interrater reliability: give the test once and have it scored (interval- or ratio-level data) by two or more scorers or by two methods. Interrater agreement: create a rating instrument and have it completed by two judges (nominal- or ordinal-level data)

Interrater agreement v intrarater agreement and formula

Interrater agreement: an index of how consistently the scorers rate or make decisions on the same test -Formula: Cohen's kappa. Intrarater agreement: when one scorer makes judgments, the researcher also wants assurance that the scorer makes consistent judgments across all tests -Formula: intraclass correlation coefficient

Interscorer v intrascorer

Interscorer aka scorer reliability: the amount of consistency among scorer judgements Intrascorer reliability: whether each scorer was consistent in the way he or she assigned scores from test to test

During Test Development

Involves performing a series of systematic steps: 1. Defining the testing universe 2. Developing the testing specifications 3. Establishing a test format 4. Constructing test questions

Objective criterion v subjective criterion

Objective criterion: observable and measurable, e.g., number of accidents on the job, number of days absent, number of disciplinary problems in a month, behavioral observation, GPA, withdrawal or dismissal. Subjective criterion: based on a person's judgment, e.g., supervisor and peer ratings, teachers' recommendations, diagnosis

For the test-retest graph on the slide. Do you think self-efficacy changes over time?

No. The test-retest correlation r = .752 is high and statistically significant (Sig. shown as .000, i.e., p < .001), so self-efficacy scores are stable over time.

Job analysis

Job analysis: a process that identifies the knowledge, skills, abilities, and other characteristics required to perform a job.

Concurrent method

Large group takes the test (the predictor) Same group takes another measure (the criterion) with evidence of reliability and validity. Tests are taken at same point in time Scores are correlated

Predictive method

A large group takes the test (the predictor), and their scores are held (e.g., for 6 months). In the future, the same group is administered a second measure with evidence of reliability and validity (the criterion). Scores are correlated.

In general, what kinds of tests are more reliable?

Longer tests

How to create y= mx + b formula from statistical output?

Look at the Coefficients table. The B value in the row for the predictor (hours studied) is the slope for x, and the B value in the (Constant) row is the intercept, so y' = 9.732 + .622x

Where do you look in the statistical output to see how adding a 2nd predictor affected incremental validity?

Look at r square change

Low incremental validity v high incremental validity

Low: large overlap between test 1 and test 2. High: small overlap between test 1 and test 2. The overlap (correlation) between the two tests is denoted r1,2.
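A sketch of why overlap matters, assuming the standard two-predictor formula R^2 = (ry1^2 + ry2^2 - 2 x ry1 x ry2 x r1,2) / (1 - r1,2^2). The incremental validity of test 2 is the R^2 change over test 1 alone. The correlations used are hypothetical.

```python
def r_squared_two_predictors(r_y1, r_y2, r_12):
    """R^2 for predicting Y from tests 1 and 2, from the three pairwise correlations."""
    if abs(r_12) >= 1:
        raise ValueError("predictors must not be perfectly correlated")
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)

def incremental_validity(r_y1, r_y2, r_12):
    """R^2 change from adding test 2 to a model that already contains test 1."""
    return r_squared_two_predictors(r_y1, r_y2, r_12) - r_y1**2

# Same hypothetical validities, different overlap between the two tests
print(incremental_validity(0.5, 0.4, 0.0))  # no overlap: high incremental validity
print(incremental_validity(0.5, 0.4, 0.7))  # large overlap: low incremental validity
```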

What is reliability?

Measures consistency -A reliable test is one we can trust to measure each person in approximately the same way every time it is used -Other definitions: reliability refers to the degree to which test scores are free from errors of measurement; the same result on repeated trials

Homogeneous tests v heterogeneous tests

Measuring only one trait/characteristic v measuring more than one trait/characteristic

multicollinearity

Multicollinearity is a statistical concept where several independent variables in a model are correlated. Multicollinearity among independent variables results in less reliable statistical inferences. Class example: motor cortex and alcohol intake were highly correlated with each other (r = .797). When two predictors are highly correlated, it is harder to tease out each predictor's unique contribution; we can't tell whether each one really predicts the criterion or whether they simply overlap.

What is the challenge of test-retest reliability?

Practice effects and fatigue -Make the interval long enough to reduce practice effects, and make sure test takers are not permanently changed by taking the test

Evidence based on relations with other variables: previously criterion-related validity: What are the two types of evidence?

Predictive validity: extent to which scores predict future behavior Concurrent validity: extent to which scores correlate with current performance

Random error v Systematic error

Random error: increases or decreases a person's score unpredictably; over an infinite number of testings the increases and decreases balance out, so it cancels itself out. It lowers the reliability of a test. Systematic error: occurs when a source of error always increases or decreases a true score; it does not lower the reliability of a test, since the test is reliably inaccurate by the same amount each time

What does reliability depend on and what does validity depend on?

Reliability depends on characteristics of the test itself Validity depends on the inferences that are going to be made from test scores

Reliability (statistically speaking)

Reliability is the ratio of true score variance to observed score variance, rxx = var(T) / var(X), i.e., the percentage of observed score variance attributable to true score variance
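A small simulation sketch of classical test theory (X = T + E) showing that var(T) / var(X) recovers the reliability. The true-score and error standard deviations are hypothetical; with SD 10 and 5, the theoretical reliability is 100 / 125 = .80.

```python
import random
from statistics import pvariance

def simulate_reliability(n_people=10_000, true_sd=10.0, error_sd=5.0, seed=1):
    """Simulate X = T + E; return (empirical var(T)/var(X), theoretical value)."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(50, true_sd) for _ in range(n_people)]
    # Random error with mean 0, independent of the true score
    observed = [t + rng.gauss(0, error_sd) for t in true_scores]
    empirical = pvariance(true_scores) / pvariance(observed)
    theoretical = true_sd**2 / (true_sd**2 + error_sd**2)
    return empirical, theoretical

empirical, theoretical = simulate_reliability()
print(round(empirical, 3), theoretical)
```

In practice true scores are never observed, so reliability is estimated indirectly (test-retest, alternate forms, internal consistency, scorer).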

Reliability v Precision per text

The text uses "reliability/precision" for the general concept; for statistical evaluation, the term "reliability coefficient" is preferred

Exploratory v Confirmatory Analysis

Exploratory: researchers do not propose a formal hypothesis about the factors that underlie a set of test scores, but instead use the procedure broadly to help identify underlying components (not based on theory/hypothesis). Confirmatory: researchers specify in advance what they believe the factor structure of the data should look like and then statistically test how well that model actually fits the data; a statistical procedure to confirm whether the factors in a theory actually exist.

What programs calculate coefficient alpha and KR-20?

SAS and SPSS

Even though it is underreported, suicide is what number cause of death among adolescents?

Second or third

Random error

Unexplained difference between a person's actual score on a test (the obtained score) and that person's true score; it lowers reliability

What can we assume about SEM if an individual took a test an infinite number of times:

Standard deviation is the measure of spread in your sample. Standard error is an estimate for the population; it allows us to compute a confidence interval to estimate the location of the true population mean. If you computed a 95% confidence interval (mean +/- 1.96 x SE) for an infinite number of samples, about 95% of those intervals would contain the true population mean. Applied to the SEM and one person's repeated testing: -Approximately 68% of the observed test scores (X) would occur within +/-1 SEM of the true score (T) -Approximately 95% of the observed test scores (X) would occur within +/-2 SEM of the true score (T) -Approximately 99.7% of the observed test scores (X) would occur within +/-3 SEM of the true score (T) -We use this information to create confidence intervals: a range of scores that we feel confident will include the test taker's true score

Spearman and Brown formula

Statistically adjusts the reliability coefficient when test length is reduced to estimate what the reliability would have been if the test were longer -Used for split half reliability: which correlates first half of measure with second half of measure

"If a student scores 65 on the ASE test, what course grade would we expect the student to receive?" PROCESS

Step 1: Calculate the means and standard deviations of X and Y. Step 2: Calculate the correlation coefficient (rxy) for X and Y. Step 3: Calculate the slope and intercept. Step 4: Calculate Y′ when X = 65. Step 5: Translate the number calculated for Y′ back into a letter grade.
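A minimal sketch of Steps 1 through 4, using b = r(Sy/Sx) and a = mean(Y) - b x mean(X). The ASE scores and course grade points are hypothetical. Step 5 is omitted because it depends on the course's own grading scale.

```python
import math
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson product-moment correlation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fit_line(x, y):
    """Return intercept a and slope b of the regression of y on x."""
    mx, my = mean(x), mean(y)          # Step 1: means
    sx, sy = stdev(x), stdev(y)        # Step 1: standard deviations
    r = pearson_r(x, y)                # Step 2: correlation coefficient
    b = r * (sy / sx)                  # Step 3: slope
    a = my - b * mx                    # Step 3: intercept
    return a, b

ase = [50, 55, 60, 70, 75]             # hypothetical predictor scores (X)
grade_points = [70, 72, 78, 85, 90]    # hypothetical course grades (Y)
a, b = fit_line(ase, grade_points)
predicted = a + b * 65                 # Step 4: Y' when X = 65
print(round(a, 3), round(b, 3), round(predicted, 2))
```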

How is split halves done?

Half of the questions in the original test are randomly assigned to split half 1 and the other half to split half 2. This procedure results in two tests, each half as long as the original test --> the more questions on a test, the higher the reliability --> therefore we must adjust the reliability coefficient with the Spearman-Brown formula
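A minimal sketch of the split-half procedure: randomly assign items to two halves, correlate the half scores, then apply Spearman-Brown with n = 2 because each half is half the length of the full test. The 0/1 item data are hypothetical, and the seed only makes the random split repeatable.

```python
import math
import random

def pearson_r(x, y):
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def split_half_reliability(item_scores, seed=0):
    """Return (half-test r, Spearman-Brown corrected r); rows = people, columns = items."""
    k = len(item_scores[0])
    order = list(range(k))
    random.Random(seed).shuffle(order)   # random assignment of items to halves
    half_1, half_2 = order[: k // 2], order[k // 2 :]
    scores_1 = [sum(row[i] for i in half_1) for row in item_scores]
    scores_2 = [sum(row[i] for i in half_2) for row in item_scores]
    r_half = pearson_r(scores_1, scores_2)
    corrected = (2 * r_half) / (1 + r_half)  # Spearman-Brown, n = 2
    return r_half, corrected

# Hypothetical right (1) / wrong (0) answers: 6 people x 6 items
answers = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
r_half, corrected = split_half_reliability(answers)
print(round(r_half, 3), round(corrected, 3))
```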

What are the four types of reliability coefficients?

Test-retest, alternate forms, internal consistency, scorer

What are the two methods for evaluating validity coefficients?

Tests of significance Coefficient of determination

KR-20

Used when questions can be scored as either right or wrong, e.g., true-false and multiple-choice
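A minimal sketch of KR-20, assuming the usual formula KR-20 = k/(k - 1) x (1 - sum of p x q / variance of total scores), where p is the proportion answering an item correctly and q = 1 - p. The answer matrix is hypothetical.

```python
from statistics import pvariance

def kr20(item_scores):
    """KR-20 for dichotomous items (1 = right, 0 = wrong); rows = people, columns = items."""
    n_people = len(item_scores)
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("KR-20 needs at least two items")
    pq_sum = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_scores) / n_people  # proportion correct
        pq_sum += p * (1 - p)
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - pq_sum / total_var)

# Hypothetical answers: 4 people x 3 items
answers = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(round(kr20(answers), 3))
```

For right/wrong items, KR-20 gives the same value as coefficient alpha computed with population variances.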

Criterion

The measure of performance (independent behaviors, attitudes, events) that we correlate with test scores

True score

The score that would be obtained if an individual took a test an infinite number of times and then the average scores of all testings were computed

monotrait monomethod

These are the correlations between the same (mono) trait measured with the same (mono) method. This is equivalent to correlating a test with itself, so it is really a reliability coefficient

What is the mathematical relationship of validity and reliability?

A validity coefficient cannot be greater than the square root of the reliability coefficient: if rxx = .64, then rxy cannot exceed .80 -A test needs to be reliable to also be valid

Gathering Psychometric Evidence

Ways to establish quantitative evidence: reliability, convergent evidence of validity, discriminant evidence of validity, multitrait-multimethod design

What does cross loading mean

When an item correlates (loads) about equally on two components

The concurrent method

When test administration and criterion measurement happen at the same time -Appropriate for validating clinical tests that diagnose behavioral, emotional, or mental disorders and selection tests -Often used for selection tests because employers do not want to hire applicants with low test scores or wait a long time to get criteria data

Evidence of validity based on relationships with external criteria

When test scores correlate with independent behaviors, attitudes, or events

Multiple regression

Y′ = a + b1X1 + b2X2 + ... Y′ = the predicted score on the criterion a = the intercept b = the slope of the regression line and amount of variance the predictor contributes to the equation, also known as beta (β) X = the predictor

Developing the testing specifications

A documented plan containing details about a test's content, similar to a blueprint -Includes content areas: the subject matter that the test will measure

Nomological network

a method for defining a construct by illustrating its relation to as many other constructs and behaviors as possible

The one that looks like a box plot but isn't is called

a scatter bar

Coefficient of multiple determination

a statistic for interpreting the results of a multiple regression.

Validity (traditional v current view)

Accuracy. Traditional: Does the test measure what it was designed to measure? Current: Does the test have evidence of validity for its intended use? A test can measure what it was designed to measure but not be valid for a particular purpose (in-class example: a scale accurately measures weight in kg, but it is not valid as a measure of intelligence)

An alpha of .70 may be fine for exploratory studies but not for...

basing critical, individual decisions (hire/fire, medicate/don't medicate)

Order effects

changes in test scores resulting from the order in which the tests were taken differences in participant responses as a result of the order in which treatments are presented to them.

Construct explication

defining or explaining a psychological construct 1) Identify the behaviors that relate to the construct. 2) Identify other constructs that may be related to the construct being explained. 3) Identify behaviors related to similar constructs, and determine whether these behaviors are related to the original construct.

Parallel forms

describes different forms of the same test

Goodness-of-fit test

evidence that the factors obtained empirically are similar to those proposed theoretically.

Criterion contamination

if the criterion measures more dimensions than those measured by the test

Convergent evidence of validity:

if the test is measuring a particular construct, we expect the scores on the test to correlate strongly with scores on other tests that measure the same or similar constructs.

Competency modeling

is a procedure that identifies the knowledge, skills, abilities, and other characteristics most critical for success on some or all of the jobs in an organization

Linear regression v multiple regression

linear uses one predictor/test to predict a criterion multiple regression uses more than one predictor/test to predict criterion

Correction for attenuation

rxtyt = rxoyo / sqrt(rxx x ryy), where rxoyo = the observed correlation and rxtyt = the estimated correlation between true scores. Correction for attenuation (CA) is a method that allows researchers to estimate the relationship between two constructs as if they were measured perfectly reliably and free from the random errors that occur in all observed measures.
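A minimal sketch of the correction for attenuation, rxtyt = rxoyo / sqrt(rxx x ryy), alongside its reverse (attenuation), rxoyo = rxtyt x sqrt(rxx x ryy). The correlation and reliability values are hypothetical.

```python
import math

def correct_for_attenuation(r_observed, rxx, ryy):
    """Estimated correlation between true scores: r_observed / sqrt(rxx * ryy)."""
    return r_observed / math.sqrt(rxx * ryy)

def attenuate(r_true, rxx, ryy):
    """Observed correlation expected from a true-score correlation: r_true * sqrt(rxx * ryy)."""
    return r_true * math.sqrt(rxx * ryy)

# Hypothetical: observed r = .40, test reliability .80, criterion reliability .50
r_true = correct_for_attenuation(0.40, 0.80, 0.50)
print(round(r_true, 3))
```

The two functions undo each other, so attenuating the corrected value returns the observed .40.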

Spearman brown formula

rxx = nr / [1 + (n - 1)r], where rxx = estimated reliability of the test, n = number of questions in the revised version divided by the number of questions in the original version of the test, r = the calculated correlation coefficient for the two short forms of the test
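A minimal sketch of the Spearman-Brown formula for any length ratio n. The reliability values are hypothetical: n = 2 corrects a split-half correlation, and n below 1 estimates the effect of shortening a test.

```python
def spearman_brown(r, n):
    """Estimated reliability when test length is multiplied by n (new length / original length)."""
    if n <= 0:
        raise ValueError("length ratio n must be positive")
    return (n * r) / (1 + (n - 1) * r)

print(spearman_brown(0.6, 2))    # half-test r = .60, full-length estimate
print(spearman_brown(0.8, 0.5))  # cutting a test with r = .80 to half its length
```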

Content validity is heavily reliant on the assumption

that the test constructor has truly tapped the domain of interest

Defining the testing universe

The body of knowledge or behaviors that a test represents. To define it: review other instruments, interview experts, research the construct

slope (b) of the regression line

The expected change in Y for every one-unit change in X: b = r(Sy/Sx), where r = the correlation coefficient, Sx = the standard deviation of the distribution of X, Sy = the standard deviation of the distribution of Y

Intercept

The place where the regression line crosses the y-axis, i.e., the predicted Y when the predictor is 0 -Sometimes it doesn't mean anything, if no one scores a zero or if that score is not possible -a = Ȳ - bX̄, where Ȳ = the mean of the distribution of Y, b = the slope, X̄ = the mean of the distribution of X

Attenuation

The reduction in the validity coefficient due to unreliability: rxoyo = rxtyt x sqrt(rxx x ryy)

Construct validity process involves gathering two types of data?

Theoretical evidence and psychometric evidence

When a relationship can be established between test scores and a criterion, we can...

use test scores from other individuals to predict how well those individuals will perform on the criterion measure.

Although the general incidence of suicide has decreased during the past two decades, the rate for people between 15 and 24 years of age has

tripled

Measurement error

variations in measurement using a reliable instrument

Restriction of range

When a range of test scores or criterion scores is reduced or restricted; restricting the range usually lowers r
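A small simulation sketch of restriction of range: generate a hypothetical predictor and criterion with a population correlation of about .60, then keep only people above a cutoff on the predictor (as if only high scorers were selected). The seed, sample size, correlation, and cutoff are all hypothetical choices.

```python
import math
import random

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(42)   # fixed seed so the run is repeatable
n = 5000
rho = 0.6                 # hypothetical population correlation
x = [rng.gauss(0, 1) for _ in range(n)]                               # predictor (test)
y = [rho * xi + math.sqrt(1 - rho**2) * rng.gauss(0, 1) for xi in x]  # criterion

full_r = pearson_r(x, y)

# Restrict the range: keep only people who scored above the cutoff on the test
cutoff = 0.5
kept = [(xi, yi) for xi, yi in zip(x, y) if xi > cutoff]
restricted_r = pearson_r([xi for xi, _ in kept], [yi for _, yi in kept])

print(f"full range r = {full_r:.2f}, restricted range r = {restricted_r:.2f}")
```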

Discriminant evidence of validity:

when the test scores do not correlate with unrelated constructs.

