PSY 606 Exam 2

What types of validity are emphasized by achievement vs. aptitude tests?

Achievement tests tend to stress content validity, whereas aptitude tests tend to stress criterion-related and construct validity

What is σ²X?

variance

What is σ?

standard deviation (SD)

Reliability coefficient definition

"ratio of the variance of true scores over the variance of total scores" [test score x and another version of itself x (two different versions of the same test)]

Maximal performance test: 2 measures of interest 1) Discrimination Index (D)

D = pT - pB: the proportion of the top group answering a given item correctly minus the proportion of the bottom group. D must be 0.30 or higher to be considered acceptable discrimination

Behavioral problems/disorders

(e.g., ADD, ADHD, hyperactivity) = require behavioral assessment (presence and absence of key behaviors)

Source of error (reliability): Statistical factors

Specifically restricted range. The main concern is split-half reliability, because by definition each half covers only half of the test

What are the 5 factors of the Stanford Binet 5 (SB5)?

1) Fluid Reasoning, 2) Knowledge, 3) Quantitative Reasoning, 4) Visual-Spatial Processing, 5) Working Memory

Identify the 4 index scores of the WAIS-IV (2008)

1) Verbal Comprehension Index (VCI), 2) Perceptual Reasoning Index (PRI), 3) Working Memory Index (WMI), 4) Processing Speed Index (PSI)

What is σ²T?

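The variance of true scores, i.e., the part of the total score variance (σ²X) that is not error. For the ratio σ²T / σ²X (the reliability coefficient), 0.80 or higher is considered good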

Problems with culture-fair tests?

1) Assumption of universal understanding of non-verbal items: removing language from a test does not guarantee universal understanding, because culture can change how non-verbal items (even pictures) are interpreted; 2) Restricted range of abilities tested: when only fluid abilities are measured, the verbal abilities are missed and the range is restricted (reliability and validity coefficients come down); 3) Lack of comparability with mainstream IQ tests: comparing them with the WISC or Stanford-Binet is like comparing apples and oranges, which limits what you can do with these tests

Role of SE in computing confidence intervals (68% and 95% confidence intervals)?

68% confidence interval = X + or - 1 SEM (standard error of measurement); 95% confidence interval = X + or - 2 SEMs

How is the SEM used to set up 68% and 95% confidence intervals?

68% confidence interval = X + or - 1 SEM; 95% confidence interval = X + or - 2 SEMs. E.g., with a WAIS IQ of 110 and SEM = 3, the 95% interval is 110 + or - 6 (2 SEMs of 3 points each), so you are 95% confident that the person's true score lies between 104 (110 - 6) and 116 (110 + 6)
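
The interval arithmetic on this card can be sketched in a few lines of Python. This is an illustration rather than part of the original notes: the function name `sem_confidence_interval` is hypothetical, and it assumes the notes' rule of thumb (1 SEM for 68%, 2 SEMs for 95%) rather than exact normal-curve multipliers.

```python
def sem_confidence_interval(score, sem, level=95):
    """Return (low, high) around an observed score.

    Assumes the rule of thumb from the notes: 68% -> 1 SEM, 95% -> 2 SEMs.
    """
    multipliers = {68: 1, 95: 2}
    if level not in multipliers:
        raise ValueError("this sketch only covers the 68% and 95% levels")
    margin = multipliers[level] * sem
    return score - margin, score + margin


# WAIS IQ of 110 with SEM = 3, as in the example on this card
print(sem_confidence_interval(110, 3, level=95))  # (104, 116)
```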

Typical performance test measures: 2) Item-total correlation (rpb)

= correlation between the item score (dichotomous: either yes or no) and the total score (a continuous number that depends on how many points are possible on the test); rpb is a special case of correlation called the point-biserial correlation, which is the correlation between a dichotomous X variable and a continuous Y variable, exactly the situation here. Interpretation: an acceptable rpb is one that is statistically significant for the sample size
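
As a rough sketch of the item-total correlation described above, the point-biserial r can be computed as an ordinary Pearson correlation between a 0/1 item and the total scores. The function name and the data below are hypothetical, chosen only for illustration; judging whether the result is acceptable still requires comparing it with the critical value of r for the sample size, which this sketch does not do.

```python
from math import sqrt


def point_biserial(item_scores, total_scores):
    """Pearson r between a dichotomous (0/1) item and continuous total scores."""
    n = len(item_scores)
    mean_item = sum(item_scores) / n
    mean_total = sum(total_scores) / n
    cov = sum((i - mean_item) * (t - mean_total)
              for i, t in zip(item_scores, total_scores))
    ss_item = sum((i - mean_item) ** 2 for i in item_scores)
    ss_total = sum((t - mean_total) ** 2 for t in total_scores)
    return cov / sqrt(ss_item * ss_total)


# Hypothetical data: 1 = endorsed the item, 0 = did not
item = [1, 1, 0, 0]
total = [10, 8, 4, 2]
print(round(point_biserial(item, total), 3))
```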

Typical performance test measures: 1) Discrimination index (D) [use of percent endorsement]

= percent endorsement of Group A minus percent endorsement of Group B (D = % endorsement Group A - % endorsement Group B). E.g., if on item 1, 60% of Group A (depressed) said yes (an endorsement of 0.60) and 30% of Group B (non-depressed) said yes (0.30), then D = 0.60 - 0.30 = 0.30, which is acceptable discrimination

What is acceptable and unacceptable discrimination?

Acceptable discrimination is when it meets the criterion of 0.30 or higher but unacceptable if it is less than 0.30.

Testing in Educational Settings Aptitude vs achievement:

Aptitude = learning ability/potential vs. Achievement = crystallized learning (actual learned)

What pattern of correlations do congruent validity, convergent validity, and discriminant validity produce?

Congruent validity produces the highest correlation, convergent validity an intermediate correlation, and discriminant validity the lowest correlation. If this rank-order pattern is met, the results are consistent with good construct validity, because it is one of the theoretical patterns you are looking for

Approach: Method of contrasted groups

Contrasted groups of interests: those who are inside the domain vs. those who are outside the domain; If we have a depression scale, we want to contrast those who are inside the domain who are depressed versus those who are outside the domain who are not so the items on the depression scale should all discriminate those groups

Sample Problem # 1 Top Group: 8 correct and 2 incorrect Bottom Group: 5 correct and 5 incorrect Find p & D & interpretation.

pT = 8/10 = 0.80; pB = 5/10 = 0.50; p = (0.80 + 0.50)/2 = 0.65; D = 0.80 - 0.50 = 0.30. Interpretation: easy question, acceptable discrimination, extrinsic ambiguity

Unacceptable discrimination

D<0.30 it is less than 0.30 so does not meet criterion

Acceptable discrimination

D>=0.30 meets the criterion of 0.30 or higher

Interpretation of Difficulty index

Difficulty index (p): optimum = .50; easy = greater than .50; difficult = less than .50

Short Answer Question If you were constructing a depression scale, describe how you would go about dealing with each of the following types of validity: a) face; b) content; c) criterion-related; d) construct (Note: make your response specific to depression!)

Face validity is whether the scale appears, at face value, to be measuring depression. I would address it with an item such as "I have recently thought of killing myself," which taps suicidal ideation, a recognized symptom of depression. I would also manipulate face validity by mixing obvious and less obvious items (e.g., items about stress, eating habits, and sleeping routines) to control for social desirability. I would address content validity by including items on all the different symptoms of depression, for example: depressed mood ("I felt sad/depressed"); loss of interest or pleasure ("I lost interest in my usual activities/nothing makes me happy"); loss of appetite or weight loss ("I have a poor appetite/I lost a lot of weight without trying to"); and sleep disturbance ("I sleep much more than I used to/I have a lot of trouble getting to sleep"). Criterion-related validity is the correlation between test X and some criterion Y, so I would use another depression index, such as the Beck Depression Inventory, as a reference and compare the two. For construct validity, I would need to make sure that the scale is really measuring depression by collecting data about the individual's symptoms and indicators, such as low self-confidence, social withdrawal, fatigue, and low energy levels. Construct validity includes 3 types (congruent, convergent, and discriminant), and for each I would correlate the depression scale with other measures. For congruent validity, I would compare the depression scale with the Beck Depression Inventory, since both measure the same construct. For convergent validity, I would compare it with the generalized contentment scale, and for discriminant validity I would compare it with the sexual attitude scale.

Validity Individual: X = R + I + E Population: σ²X = σ²R + σ²I + σ²E

For the sake of validity, we partition T into two components: relevant and irrelevant. At the individual level, each score consists of a relevant component plus an irrelevant component plus random error. To estimate the validity coefficient we need population notation: the variance of X equals the variance of relevant plus the variance of irrelevant plus the variance of error

Short Answer Question Group # correct # incorrect T ## ## ____ ____ B ## ## Find p & D. (Data to be provided on test): Based on the above data, calculate the p and D indices and place them in the blanks provided. Also: a) what can you say about the level of difficulty of the above item?; b)...the discrimination level? c) is this a good or bad item? Explain (mention the type of ambiguity, if appropriate).

Formulas:
pT = # top group correct / total in that group
pB = # bottom group correct / total in that group
p = (pT + pB)/2
D = pT - pB
Interpretation:
a) Difficulty index (p): optimum = .50; easy = greater than .50; difficult = less than .50
b) Discrimination index (D): acceptable = D > or equal to .30; unacceptable = D < .30
c) Ambiguity: intrinsic (bad kind) = D < .30; extrinsic (good kind) = D > or equal to .30
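
The formulas and cutoffs on this card can be sketched as a small Python function. The function name `item_analysis` and the returned dictionary keys are hypothetical; the cutoffs (.50 for difficulty, .30 for discrimination) are the ones given in these notes, and the rounding step is an added assumption to avoid floating-point noise at the cutoffs.

```python
def item_analysis(top_correct, top_total, bottom_correct, bottom_total):
    """Compute p and D for one item and label it using the notes' cutoffs."""
    p_top = top_correct / top_total
    p_bottom = bottom_correct / bottom_total
    # Round to sidestep floating-point noise right at the 0.30 / 0.50 cutoffs
    p = round((p_top + p_bottom) / 2, 4)
    d = round(p_top - p_bottom, 4)

    if p > 0.50:
        difficulty = "easy"
    elif p < 0.50:
        difficulty = "difficult"
    else:
        difficulty = "optimum"

    acceptable = d >= 0.30
    return {
        "p": p,
        "D": d,
        "difficulty": difficulty,
        "discrimination": "acceptable" if acceptable else "unacceptable",
        "ambiguity": "extrinsic" if acceptable else "intrinsic",
    }


# Top group 8/10 correct, bottom group 5/10 correct
print(item_analysis(8, 10, 5, 10))
```

Calling it with 4/10 and 3/10 correct gives the "difficult, unacceptable, intrinsic" pattern instead.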

Special Needs-- relevant public laws: PL 95-561

The Gifted and Talented Children Act offers incentives for gifted programs: funding is given to certain districts/schools for these programs, and students can get a geographic exception to attend any school that provides those services

What does higher correlation coefficient for a given item indicate?

Higher correlation coefficient means a significant correlation and acceptable discrimination (good item); perfect correlation = 1.00

Mental retardation/Intellectual disability (MR)

An IQ score of 70 or lower, together with psychosocial maturity level, may qualify a person for this classification

What does "intrinsic vs extrinsic ambiguity" refer to?

Intrinsic ambiguity ("bad kind"): the item is ambiguous to those inside the domain, the ones who have the higher scores and know the material. Extrinsic ambiguity ("good kind"): the kind you want; the item should be ambiguous only to those outside the domain, the ones who have the lower scores.

Definition of Validity

Whether the scale is measuring what it is supposed to measure

How is content validity addressed on achievement tests?

Matter of finding "best fit" between curriculum of school or school district with the emphasis of a given test [e.g., if strength is math and weakness is language arts, then Stanford Achievement Test fits best (e.g., Hawaii)]

Sample Problem # 2 Top Group: 4 correct and 6 incorrect Bottom Group: 3 correct and 7 incorrect Find p & D & interpretation.

pT = 4/10 = 0.40; pB = 3/10 = 0.30; p = (0.40 + 0.30)/2 = 0.35; D = 0.40 - 0.30 = 0.10. Interpretation: difficult question, unacceptable discrimination, intrinsic ambiguity

Types of Reliability: Internal consistency measures Inter-item correlation

defined as the average correlation among all possible pairs of items

Intelligence tests--theory: ratio vs deviation IQ--why is the latter preferred?

Ratio IQ (early form of IQ) = mental age over chronological age, times 100; e.g., 6/5 * 100 = IQ of 120. Deviation IQ (current form) = z score * 15 + 100; e.g., 1(15) + 100 = 115. The deviation IQ is preferred since it is intended to be more accurate.
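
The two IQ formulas on this card can be written out as a short Python sketch. The function names are hypothetical; the default mean of 100 and SD of 15 follow the example on the card.

```python
def ratio_iq(mental_age, chronological_age):
    """Early ratio IQ: mental age over chronological age, times 100."""
    return mental_age * 100 / chronological_age


def deviation_iq(z_score, mean=100, sd=15):
    """Deviation IQ: z score rescaled to a mean of 100 and an SD of 15."""
    return z_score * sd + mean


print(ratio_iq(6, 5))     # 120.0
print(deviation_iq(1.0))  # 115.0
```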

Typical performance test measures: Two different forms of discrimination 1) Discrimination index (D) & 2) Item-total correlation (rpb)

Because there are no right or wrong answers, p drops out: it does not make sense to talk about item difficulty when there are no right or wrong answers

"Dimension of specificity of background presupposed"—Where do different kinds of tests fall on this continuum?

Specific background <--> General background:
Classroom exams -> Standardized achievement tests -> SAT -> WAIS-Verbal -> WAIS-Performance -> "Culture-fair" tests
Classroom exams presuppose the most specific background; culture-fair tests presuppose the most general background

Types of Reliability: Internal consistency measures Split-half

Split-half compares one half of the test with the other half, typically odd vs. even items. Because each half is only half the test, you have to correct for the restriction of range using the Spearman-Brown prophecy formula; the end result is called the corrected split-half
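
A minimal sketch of the corrected split-half follows. The card does not state the Spearman-Brown formula; the version used here is the standard textbook form for doubling test length, r_corrected = 2r / (1 + r). The function names and the odd/even split by column position are illustrative assumptions.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


def spearman_brown(r_half):
    """Standard Spearman-Brown correction for a half-length correlation."""
    return 2 * r_half / (1 + r_half)


def corrected_split_half(item_scores):
    """item_scores: one list of item scores per examinee."""
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd_totals, even_totals))


# A half-test correlation of 0.70 is stepped up by the correction
print(round(spearman_brown(0.70), 3))
```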

Intelligence tests--theory: Individual tests: Major areas tested on Stanford-Binet test

The Stanford-Binet measures 5 main factors of aptitude: pure reasoning, knowledge of the meaning of words, ability to do word problems (like math) in your head, ability to manipulate shapes, and ability to keep things in memory and process them.

Short Answer Question Answer both parts: a) What do "pluralistic" vs "mainstream" views of cultural influences in aptitude testing refer to? b) Identify 3 problems with so-called "culture-fair" tests.

The pluralistic approach compares a group with the same group ("like with like"); in aptitude testing it is exemplified by the System of Multicultural Pluralistic Assessment, which compares against different norms. The mainstream approach uses one set of norms for everybody; in aptitude testing it is exemplified by tests like the KBIT and the WRAT. One problem with "culture-fair" tests is the assumption of universal understanding of non-verbal items. Another is the restricted range of abilities tested. The last is the lack of comparability with mainstream IQ tests.

Short Answer Question Describe what each of the following types of reliability is conceptually, and what type of error it represents: a) test-retest; b) Cronbach's alpha; c) alternate forms reliability; d) standard error of measurement (SEM)

Test-retest reliability looks at stability over time; the end result is a reliability coefficient, which you hope is 0.80 or higher. Its type of error is true change, which arises over the span of time between test and retest. Cronbach's alpha gives an estimate for variably scored tests (tests with variable levels of scoring, such as Likert-type scales); its type of error is item sampling error, meaning the items sampled were not equivalent. Alternate forms reliability applies when test developers have gone to the trouble of creating equivalent alternate forms, like the tan and blue forms of the WRAT; its type of error is also item sampling error. The standard error of measurement (SEM) is the standard deviation of error around the true score, and it can be used to form confidence intervals around the true score; its type of error is random factors and chance differences, which can affect any of these measures. Split-half reliability compares one half of the test with the other half, typically odd vs. even items. Its type of error is statistical factors: because of the truncated range, you need to correct for the restriction of range with the Spearman-Brown prophecy formula, which yields the corrected split-half.

What is the purpose of item analysis in general?

To increase reliability and validity by detecting and removing items that fail to discriminate between groups of interests

Intelligence tests--theory: Individual tests: Major areas tested on Wechsler tests

The Wechsler tests measure verbal vs. performance IQ and represent intelligence throughout the life span (WAIS: adults, age 15 years and up; WISC: grade school children, ages 6-15 years; WPPSI: preschool children, early education). Subtests in the WAIS-IV (2008) include vocabulary, matrix reasoning, visual puzzles, arithmetic, and symbol search.

Reliability Individual: X = T + E Population: σ²X = σ²T + σ²E

When estimating reliability, the individual level is converted to population notation: the variance of x = variance of true score + variance of error.

Which of the following falls in the class of a behavioral disorder? a) Attention Deficit Hyperactivity disorder (ADHD) b) Mental retardation c) Learning Disorder (LD) d) All of the above

a) Attention Deficit Hyperactivity disorder (ADHD)

Learning Disorder (LD)

a) exclusion factor; b) discrepancy factor (aptitude vs. achievement), e.g., a discrepancy between the KBIT composite and a WRAT score, indicating a specific deficit

For clinical scales (e.g., anxiety, depression) it's a good idea to manipulate what type of validity? a) face b) content c) criterion-related d) construct

a) face

The primary source of error for split-half reliability is: a) statistical factors (restricted range) b) random or chance differences c) true change d) item sampling

a) statistical factors (restricted range)

The primary source of error for test retest is: a) true change b) item sampling c) random or chance differences d) statistical factors (restricted range)

a) true change

Types of validity: Construct Validity

Asks whether the pattern of correlations between the test you are validating and other measures makes theoretical sense. The theoretical patterns you look at include congruent validity, convergent validity, and discriminant validity

If someone's WAIS-IQ was 103, and the published SEM is 3 pts, then the 95% confidence interval for that score would be: a) 103 +- 3 b) 103 +- 6 c) 95 +- 3 d) 95 +- 6

b) 103 +- 6

The reliability coefficient (rxx) is essentially correlating the test score with a: a) different test measuring something similar b) another version of the same test c) some criterion measure d) all of the above

b) another version of the same test

If a given test takes into account *all* aspects of a given construct, then that test shows acceptable ______ validity: a) face b) content c) criterion-related d) construct

b) content

A discrepancy of 1.5 SDs or greater between aptitude and achievement is consistent with: a) mental retardation b) learning disorder c) ADHD d) All of the above

b) learning disorder

If a given item on an aptitude test predicts school achievement better for ethnic group A than group B, then this would be construed as evidence for ____ bias. a) measurement b) prediction c) content d) SEM

b) prediction

For Charles Spearman, the unique contribution of a given test was reflected as: a) g-factor b) s-factor c) fluid IQ d) crystallized IQ

b) s-factor

The Spearman-Brown prophecy formula is used to correct which of the following estimates of reliability: a) test retest b) split half c) alternate forms d) all of the above

b) split half

Why is the item-total correlation (rpb) better than the discrimination index?

With D, 0.30 is used as the cutoff, but D is blind to sample size: if you hit 0.30, you don't know whether you have 10 people or 100, so sampling error is not taken into account. The acceptable rpb value, by contrast, is based on significance: if rpb meets or beats the critical value of r for that sample size, it is significant. Because this takes sample size (and therefore sampling error) into account, rpb is a more reliable and more sensitive measure of discrimination.

Given: y' = 5X + 10 and Se = 3, which of the following is the 68% confidence interval for X = 5? a) 35 plus or minus 6 b) 25 plus or minus 6 c) 35 plus or minus 3 d) 25 plus or minus 3

c) 35 plus or minus 3

Who gave us the concept of "mental age?" a) David Wechsler b) Francis Galton c) Alfred Binet d) Raymond Cattell

c) Alfred Binet

Which of the following would be more specific on the "scale of specificity presupposed?" a) culture-fair tests b) SAT c) Classroom tests d) Achievement tests

c) Classroom tests

If you wish to show stability of a test, you would calculate: a) Cronbach's alpha b) Split-half reliability c) Test retest reliability d) SEM

c) Test retest reliability

If the bottom scoring group gets an item correct more often than the top scoring group, this would imply: a) easy item b) extrinsic ambiguity c) intrinsic ambiguity d) difficult item

c) intrinsic ambiguity

If you want to validate an employee screening test and compare scores to supervisor's ratings one year later, this would constitute what type of validity? a) content b) construct c) predictive d) concurrent

c) predictive

Approaches to cross-cultural testing: Pluralistic

Compare a group with the same group, i.e., "like with like" (e.g., SOMPA, the System of Multicultural Pluralistic Assessment, compares against different norms by stratifying them, as in California; for example, Asian children are compared with other Asian children)

Problem of test bias: Content bias

content of items is biased toward the dominant group (biased in favor of one group over another; systematic errors such as culture)

Types of validity: Construct Validity Convergent validity

Correlations with measures of similar constructs. "Convergent" means to come together: you use constructs that are similar but not necessarily identical, e.g., comparing your index of self-esteem with generalized contentment (Hudson GCS), which is theoretically similar to self-esteem

Types of validity: Construct Validity Discriminant validity

Correlations with measures of unrelated constructs. The idea is to discriminate, or separate: you compare against a construct that is very different, e.g., an index of self-esteem against a child's attitude toward mother (Hudson CAF/CAM)

Types of validity: Construct Validity Congruent validity

Correlations with other measures of the same construct. "Congruent" means the same as: e.g., to validate the Hudson Index of Self-Esteem, correlate it with someone else's self-esteem test, such as Coopersmith's, so the test is compared against another test of the same thing

Source of error (reliability): Chance or Random differences

could affect any of the types of reliability we discussed

Types of validity: Criterion-related Validity

Criterion-related and construct validity are both quantitatively assessed using correlation. For criterion-related validity you pick a criterion that is a more direct measure of the construct, in other words not a paper-and-pencil test (that would be indirect measurement); the criterion is typically a rating by others. If the criterion comes in the future, as with the SAT and college GPA, it is predictive validity; if the criterion is collected at the same time as the test, as with the MMPI and clinicians' ratings, it is concurrent validity. The validity coefficient is rxy, and adequate validity takes the form of a significant correlation: we are not trying to hit .80 or higher, merely to achieve significance. Once you have a significant rxy, you can use it to predict future criterion values from present test scores with linear regression: y' = bX + a, where y' is the estimated value of y, b is the slope of the line, X is any X value, and a is the y-intercept of the regression line. b is defined as the validity coefficient r times the standard deviation of y divided by the standard deviation of x (b = r*sy/sx), and a is defined as the mean of Y minus b times the mean of X (a = mean Y - b*mean X)
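
The slope and intercept definitions above (b = r*sy/sx, a = mean Y - b*mean X) can be sketched in Python. The function names are hypothetical, and the SAT/GPA-style numbers are invented purely for illustration, not taken from any real data set.

```python
def regression_line(r, sd_x, sd_y, mean_x, mean_y):
    """Return (b, a) for y' = bX + a from the validity coefficient r."""
    b = r * sd_y / sd_x       # slope: r times SD of y over SD of x
    a = mean_y - b * mean_x   # intercept: mean of Y minus b times mean of X
    return b, a


def predict(x, b, a):
    """Predicted criterion score y' for a test score x."""
    return b * x + a


# Hypothetical values: test mean 500, SD 100; criterion mean 3.0, SD 0.5; r = .50
b, a = regression_line(r=0.5, sd_x=100, sd_y=0.5, mean_x=500, mean_y=3.0)
print(b, a, predict(600, b, a))
```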

A way of estimating the average correlation among all pairs of items for variably-scored items (e.g., Likert scales) is: a) KR-20 b) Split-half c) Alternate forms d) Cronbach's alpha

d) Cronbach's alpha

Which of the following is a short-cut method of calculating the inter-item correlation for tests with dichotomous items? a) Cronbach's alpha b) Alternate forms c) Test retest d) KR-21

d) KR-21

The major advantage of item-total (rpb) over discrimination index (D) is that the item-total correlation: a) uses significance as a criterion of acceptable discrimination thereby taking sample size into account b) is a more sensitive measure of discrimination c) always produces a larger value d) a and b above

d) a and b above

Which of the following is a provision of the Individuals with Disabilities Education Act of 1990? a) education in least restrictive environment b) individual education plans (IEPs) c) operational definitions of each disorder d) all of the above

d) all of the above

Correlating the results of the Hudson Index of Self Esteem with the Coopersmith Self Esteem Scale would be an example of: a) convergent validity b) discriminant validity c) content validity d) congruent validity

d) congruent validity

If p = .65 and D = .20, we could characterize this item as: a) difficult and extrinsic ambiguity b) easy and extrinsic ambiguity c) difficult and intrinsic ambiguity d) easy and intrinsic ambiguity

d) easy and intrinsic ambiguity

Types of Reliability: Standard error of measurement (SEM)

define as standard deviation of error around the true score; it can be used to form confidence intervals around the true score

Problem of test bias: Measurement bias

differential difficulty across groups (idea that you have different difficulty levels for different groups and more difficulty for one group than the other)

Problem of test bias: Prediction bias

differential prediction of scholastic success across groups (the test predicts better for one group than for another)

Intelligence tests--theory: g- and s-factors (Spearman)

No matter which tests are correlated, all scores correlate to some extent and share a common feature; the overlapping part is known as the g-factor (general factor), and the parts that do not correlate or overlap are called s-factors (specific factors)

Intelligence tests--theory: fluid vs crystallized IQ (Cattell)

Fluid: more genetically based, independent of experience (nature); like physical strength, it peaks at around age 25-30 and then starts dropping. Crystallized: comes with experience (nurture); verbal ability gets better as you get older

Source of error (reliability): Real score change over time

If you are measuring, for example, self-esteem in test-retest mode, it can change from time A to time B, which will bring rxx down (so real score change is mostly a concern for test-retest)

Extrinsic Ambiguity ("good")

it should be ambiguous to those outside the domain the ones that have the lower scores; D is 0.30 or higher (D>=0.30)

Types of Reliability: Test-Retest

Looks at stability over time. The end result of test-retest is a reliability coefficient, which you hope is at least 0.80 (rxx >= .80); the highest possible value is 1

Approaches to cross-cultural testing: Mainstream approach

one set of norms for all [one set of norms for everybody like the KBIT & WRAT]; mostly using this today

Special Needs-- relevant public laws: IDEA (1990) & PL 94-142 (1975)

Provide for students with special needs: a) all students should be educated in the least restrictive environment (mainstream environment); b) all children with disabilities covered under these laws should have annual individual education plans (IEPs); and c) for the first time, the disorders were clearly defined (e.g., the definition for diagnosing a learning disorder)

Types of validity: Face validity

Purely subjective. Face and content validity are qualitatively assessed; for face validity you ask: do these items, at face value, appear to be related to the construct? For clinical measures such as depression, you would want to manipulate face validity and include some items that are less obvious than others

Source of error (reliability): Item sampling

Refers to how the items are sampled from the universe of possible items; it affects any measure of internal consistency, such as split-half, Cronbach's alpha, KR-20, KR-21, and alternate forms

Definition of Reliability

same as the idea of consistency in measurement

Define Reliability coefficient: rxx = σ²T / σ²X

The correlation of the test score X with itself (rxx) = the variance of true scores over the variance of total scores (σ²T / σ²X); as rxx approaches 1, the test approaches perfect reliability
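
The variance ratio can be made concrete with a short Python sketch. The function names are hypothetical. The second function uses the standard textbook relation SEM = SD * sqrt(1 - rxx), which is not stated in these notes and is included only as a labeled assumption to show how reliability and the SEM connect.

```python
from math import sqrt


def reliability(var_true, var_error):
    """rxx = variance of true scores / variance of total scores."""
    var_total = var_true + var_error  # population model: var_X = var_T + var_E
    return var_true / var_total


def sem_from_reliability(sd, rxx):
    """Assumed textbook relation (not in the notes): SEM = SD * sqrt(1 - rxx)."""
    return sd * sqrt(1 - rxx)


# Hypothetical variances: 80 units of true variance, 20 of error variance
print(reliability(80, 20))
# With SD = 15 and rxx = .96, the SEM comes out at about 3 points
print(round(sem_from_reliability(15, 0.96), 2))
```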

Maximal performance test: 2 measures of interest 2) Difficulty indices (p)

The difficulty index refers to how difficult the item is, based on how many people in your sample got it right. It is the average of pT and pB: add the proportion correct in the top group to the proportion correct in the bottom group and divide by two; p = (pT + pB)/2

Define Validity coefficient r²xy = σ²R / σ²X

the validity coefficient r²xy is the variance of the relevant component divided by the variance of total score X

Types of Reliability: Internal consistency measures Equivalent Alternate Forms

this is only appropriate when tests have gone through the trouble of developing equivalent alternate forms like the WRAT there is the tan form and blue form; this is the most desirable form of internal consistency because it involves two full versions of a test

Approaches to cross-cultural testing: "Culture-fair" testing

Use of non-verbal items (e.g., Cattell's Culture Fair Test of Intelligence), removing the language and using nonverbal norms [if language is the problem, use matrices and performance-based tests, perhaps with an interpreter for the verbal parts]

Intrinsic Ambiguity ("bad kind")

when it is ambiguous to those that are inside the domain which are the ones that have the higher scores and know the material so that's bad; D is less than 0.30 (D<0.30)

Types of Reliability: Internal consistency measures Inter-item correlation Cronbach's alpha

Gives that estimate for variably scored tests, in other words tests with variable levels of scoring, such as Likert-type scales (1, 2, 3, 4, 5) or an essay worth up to 10 points

Types of Reliability: Internal consistency measures Inter-item correlation Kuder-Richardson (KR) 20 and 21

Produce an inter-item correlation estimate for dichotomously scored tests (true/false or correct/incorrect); KR-20 is the full version of the formula and KR-21 is the shorthand version
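
These notes do not give the formulas, so the sketch below uses the standard textbook form of Cronbach's alpha, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). With 0/1 items the same calculation matches KR-20, since each item's variance is then p*q. The function name and the data are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """scores: one row per examinee, one column per item."""
    k = len(scores[0])                                   # number of items
    items = list(zip(*scores))                           # one tuple per item
    sum_item_var = sum(pvariance(col) for col in items)  # sum of item variances
    total_var = pvariance([sum(row) for row in scores])  # variance of total scores
    return (k / (k - 1)) * (1 - sum_item_var / total_var)


# Hypothetical Likert-type responses (3 examinees, 3 items)
likert = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
print(cronbach_alpha(likert))
```

Population variances are used throughout; using sample variances consistently for both the items and the totals gives the same alpha.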

Use of linear regression to predict criterion scores (y') from test scores (X); formula: y' = bX + a Sample Problem Given: y' = 3X+5 and SE = 2, so b=3 and a=5 in this case What is 68% confidence interval for predicting y if X = 5?

y' = 3(5) + 5 = 15 + 5 = 20; a 68% interval is + or - 1 SE, so the margin is 1 * 2 = 2. Answer: y' = 20 + or - 2

Use of linear regression to predict criterion scores (y') from test scores (X); formula: y' = bX + a Sample Problem Given: y' = 5X+10 and SE = 3, so b=5 and a=10 in this case What is 95% confidence interval for predicting y if X=2?

y' = 5(2) + 10 = 10 + 10 = 20; a 95% interval is + or - 2 SEs, so the margin is 2 * 3 = 6. Answer: y' = 20 + or - 6
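
Both regression sample problems follow the same two steps (predict y', then add and subtract 1 or 2 standard errors), which can be sketched as one Python function. The function name is hypothetical, and the 1-SE / 2-SE multipliers are the rule of thumb used in these notes.

```python
def predict_with_interval(x, b, a, se, level=68):
    """Return (y_hat, low, high) for y' = bX + a with a rule-of-thumb interval."""
    multipliers = {68: 1, 95: 2}  # notes' rule: 68% -> 1 SE, 95% -> 2 SEs
    if level not in multipliers:
        raise ValueError("this sketch only covers the 68% and 95% levels")
    y_hat = b * x + a
    margin = multipliers[level] * se
    return y_hat, y_hat - margin, y_hat + margin


# y' = 3X + 5, SE = 2, X = 5, 68% interval
print(predict_with_interval(5, b=3, a=5, se=2, level=68))   # (20, 18, 22)
# y' = 5X + 10, SE = 3, X = 2, 95% interval
print(predict_with_interval(2, b=5, a=10, se=3, level=95))  # (20, 14, 26)
```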

Types of validity: Content Validity

You want proportionate sampling of the domain (the universe of possible items), covering all aspects or dimensions of the construct. For the content validity of a maximal performance test such as a classroom exam, you want a balance of topic and process items, and in fact more process items (topic refers to the knowledge level; process is everything from comprehension up to evaluation), so it is desirable to have mostly process items on a test of achievement

Types of Reliability: Internal consistency measures

You want rxx >= .80. With the SEM you want a value as low as possible, but test-retest and internal consistency measures produce a correlation, which you want to be at least .80

