test 2 multiple choice

87. A Kuder-Richardson (KR) or split-half estimate of reliability for a speed test would provide an estimate that is A. spuriously low. B. spuriously high. C. insignificant. D. equal to a test-retest method.

spuriously high.

154. Most reliability coefficients, regardless of the specific type of reliability they are measuring, range in value from A. -1 to +1. B. 0 to 100. C. 0 to 1. D. negative infinity to positive infinity.

0 to 1.

7. Makel et al. (2012) observed that only about ____ of the published literature replicated previous work. A. 1% B. 3% C. 5% D. 7%

A. 1%

6. What has been called a "replicability crisis" in psychology emerged as a result of a number of factors. Which is not one of those factors? A. a general lack of published attempts to replicate research B. editorial preferences for papers with positive findings C. questionable research practices on the part of study authors D. unwillingness or inability of original study authors to share data

D. unwillingness or inability of original study authors to share data

145. Which of the following is TRUE about systematic and unsystematic error in the assessment of physical and psychological abuse? A. Few sources of unsystematic error exist, due to the nature of what is being assessed. B. Few sources of systematic error exist. C. Gender represents a source of systematic error. D. None of these

Gender represents a source of systematic error

78. Which BEST conveys the meaning of an inter-scorer reliability estimate of .90? A. Ninety percent of the scores obtained are reliable. B. Ninety percent of the variance in the scores assigned by the scorers was attributed to true differences and 10% to error. C. Ten percent of the variance in the scores assigned by the scorers was attributed to true differences and 90% to error. D. Ten percent of the test's items are in need of revision according to the majority of the test's users.

Ninety percent of the variance in the scores assigned by the scorers was attributed to true differences and 10% to error.

102. "Sixty-eight percent of the scores for a particular test fall between 58 and 61" is a statement regarding A. the utility of a test. B. the reliability of a test. C. the validity of a test. D. None of these

None of these

33. Which would NOT be useful in estimating a test's inter-item consistency? A. Cronbach's alpha B. the Kuder-Richardson formulas C. the average proportional distance D. a coefficient of equivalence

a coefficient of equivalence

175. IRT is a term used to refer to A. a model that has many parameters. B. a parameter that has many models. C. a family of models for data analysis. D. a dysfunctional family of models.

a family of models for data analysis.

84. Typically, speed tests A. contain items of a uniform difficulty level. B. are completed by fewer than 1% of all test-takers. C. have low validity coefficients. D. yield high rates of false positives.

contain items of a uniform difficulty level.

89. The Spearman-Brown formula can be used for which types of tests? A. speed and multiple-choice B. true-false and multiple-choice C. speed, true-false, and multiple-choice D. road or driving tests

speed, true-false, and multiple-choice

158. A test of infant development contains three scales: (1) Cognitive Ability, (2) Motor Development, and (3) Behavior Rating. Because these three scales are designed to measure different characteristics (that is, they are not homogeneous), it would be inappropriate to combine the three scales in calculating estimates of the test's A. alternate-forms reliability. B. internal-consistency reliability. C. test-retest reliability. D. interrater reliability.

internal-consistency reliability.

15. With critical variables in a research study held constant, different methods used to estimate reliability will typically yield A. virtually no differences in the magnitude of the estimate. B. sizable differences in the magnitude of the estimate. C. skewed estimates of reliability. D. identical estimates of reliability.

sizable differences in the magnitude of the estimate.

103. The standard error of measurement of a particular test of anxiety is 8. A student earns a score of 60. What is the confidence interval for this test score at the 95% level? A. 52-68 B. 40-68 C. 44-76 D. 36-84

44-76
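
The arithmetic behind this answer can be sketched in a few lines of Python. The function name `confidence_interval` and the rounded multipliers (1 SEM for roughly 68% of cases, 2 SEM for roughly 95%) are illustrative assumptions that follow the convention these items appear to use, not the exact z value of 1.96.

```python
def confidence_interval(observed_score, sem, n_sems=2):
    """Return the (low, high) band observed_score +/- n_sems * SEM.

    n_sems=1 covers roughly 68% of cases and n_sems=2 roughly 95%
    (rounded multipliers, assumed here for illustration).
    """
    margin = n_sems * sem
    return observed_score - margin, observed_score + margin


# Item 103: observed score of 60, SEM of 8, 95% level
print(confidence_interval(60, 8))  # (44, 76)
```

Calling the same helper with `n_sems=1` gives the narrower 68% band.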

132. A test of attention span has a reliability coefficient of .84. The average score on the test is 10, with a standard deviation of 5. Lawrence received a score of 64 on the test. We can be 95% sure that Lawrence's "true" attention span score falls between A. 63 and 65. B. 62 and 66. C. 60 and 68. D. 54 and 74.

60 and 68.

10. Which is an example of what is referred to as "QRP" in your textbook? A. collecting additional data to reach statistical significance B. over-reporting of data with excessive detail C. telling subjects in a control group that they need not participate D. requesting detailed data from the original study author

A. collecting additional data to reach statistical significance

61. The Spearman-Brown formula is used for A. correcting for one half of the test by estimating the reliability of the whole test. B. determining how many additional items are needed to increase reliability up to a certain level. C. determining how many items can be eliminated without reducing reliability below a predetermined level. D. All of these

All of these
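
The three uses listed in this item all come from one formula, r_SB = n * r / (1 + (n - 1) * r), where n is the factor by which test length changes. The sketch below is a minimal illustration; the function names and the reliability values are hypothetical and do not come from the question bank.

```python
def spearman_brown(r, n):
    """Predicted reliability when test length is multiplied by n."""
    return n * r / (1 + (n - 1) * r)


def length_factor(r, r_target):
    """Factor by which test length must change to reach r_target."""
    return r_target * (1 - r) / (r * (1 - r_target))


# Hypothetical values, chosen only for illustration.
half_test_r = 0.60

# Use A: step a half-test correlation up to the whole test (n = 2).
whole_test_r = spearman_brown(half_test_r, 2)
print(round(whole_test_r, 2))  # about 0.75

# Use B: how much longer must the test be to reach .90?
print(round(length_factor(whole_test_r, 0.90), 2))  # about 3 times as long

# Use C: predicted reliability if half of the items are dropped (n = 0.5).
print(round(spearman_brown(whole_test_r, 0.5), 2))  # about 0.6
```

The formula assumes the added or removed items are equivalent to the existing ones in content and difficulty.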

68. KR-20 is the statistic of choice for tests with which types of items? A. multiple-choice B. true-false C. All of these D. None of these

All of these

77. A synonym for inter-scorer reliability is A. inter-judge reliability B. observer reliability C. inter-rater reliability D. All of these

All of these

81. If a test is homogeneous A. it is functionally uniform throughout. B. it will likely yield a high internal-consistency reliability estimate compared with a test-retest reliability estimate. C. it would be reasonable to expect a high degree of internal consistency. D. All of these

All of these

43. Which of the following is true of systematic error? A. It significantly lowers the reliability of a measure. B. It insignificantly lowers the reliability of a measure. C. It increases the reliability of a measure. D. It has no effect on the reliability of a measure.

It has no effect on the reliability of a measure

76. Which of the following statements is TRUE about coefficient alpha? A. Kuder thought it to be the single best measure of reliability. B. It was first conceived by Alfalfa Alpha. C. It is a characteristic of a particular set of scores, not of the test itself. D. None of these

It is a characteristic of a particular set of scores, not of the test itself.

16. Generally, diagnostic reliability is necessary. However, which of the following is NOT a reason that diagnostic reliability is necessary? A. It is necessary for accurate diagnosis. B. It is necessary for any double-blind study. C. It is necessary to determine the effectiveness of treatments. D. It is necessary to track changes in a disorder over time.

It is necessary for any double-blind study.

139. A test containing 100 items is revised by deleting 20 items. What might be expected to happen to the magnitude of the reliability estimate for that test? A. It will be expected to increase. B. It will be expected to decrease. C. It will be expected to stay the same. D. It cannot be determined based on the information provided.

It will be expected to decrease.

115. If a device to measure blood pressure consistently overestimated every assessee's actual blood pressure by 10 units, which of the following could reasonably be expected to be TRUE of the reliability of this measuring device as the years passed? A. It would increase. B. It would decrease. C. It would not be affected. D. It would alternately decrease and increase.

It would not be affected.

116. In general, which of the following is TRUE of the relationship between the magnitude of the test-retest reliability estimate and the length of the interval between test administrations? A. The longer the interval, the lower the reliability coefficient. B. The longer the interval, the higher the reliability coefficient. C. The magnitude of the reliability coefficient is typically not affected by the length of the interval between test administrations. D. The magnitude of the reliability coefficient is always affected by the length of the interval between test administrations, but one cannot predict how it is affected.

The longer the interval, the lower the reliability coefficient.

53. Which of the following is TRUE for parallel forms of a test? A. The means of the observed scores are equal for the two forms. B. The variances of the estimated scores are equal for the two forms. C. The means and variances of the observed scores are equal for the two forms. D. The means and variances of the estimated scores are equal for the two forms.

The means and variances of the observed scores are equal for the two forms.

71. Which of the following is, generally speaking, the preferred statistic for obtaining a measure of internal-consistency reliability? A. KR-20 B. KR-21 C. Kendall's Tau D. coefficient alpha

coefficient alpha
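
As a rough illustration of how coefficient alpha relates to KR-20, the sketch below computes both from a small score matrix. The data are invented for the example, the function names are hypothetical, and population variances are assumed throughout so that the two statistics agree on dichotomous items.

```python
from statistics import pvariance


def coefficient_alpha(scores):
    """Coefficient alpha for rows of item scores (one row per testtaker)."""
    k = len(scores[0])
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def kr20(scores):
    """KR-20 for dichotomous (0/1) items; uses p * q as each item variance."""
    k = len(scores[0])
    n = len(scores)
    p = [sum(row[i] for row in scores) / n for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(pi * (1 - pi) for pi in p) / total_var)


# Invented data: 5 testtakers, 4 right/wrong items.
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(coefficient_alpha(data), 2))  # about 0.8
print(round(kr20(data), 2))               # same value for 0/1 items
```

Alpha also accepts items scored with partial credit, which is one reason it is the more general of the two.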

174. In the term latent trait theory, "latent" is a synonym for A. invisible. B. state. C. undeveloped. D. dormant.

invisible.

181. A dichotomous test item is one that A. presents the testtaker with a dichotomy. B. is frequently used in Common Core reading subtests. C. is exemplified by a True/False item. D. All of these

is exemplified by a True/False item.

62. For a heterogeneous test, measures of internal-consistency reliability will tend to be ________ compared with other methods of estimating reliability. A. higher B. lower C. very similar or higher D. more robust

lower

32. The more homogeneous a test is, the A. less inter-item consistency it can be expected to have. B. more utility the test has for measuring multifaceted variables. C. more inter-item consistency it can be expected to have. D. None of these

more inter-item consistency it can be expected to have.

151. The items of a personality test are characterized as heterogeneous in nature. This tells us that the test measures A. aspects of family history. B. ability to relate to the opposite sex. C. unconscious motivation. D. more than one trait.

more than one trait.

57. Test-retest estimates of reliability are referred to as measures of ________, and split-half reliability estimates are referred to as measures of ________. A. true scores; error scores B. internal consistency; stability C. inter-scorer reliability; consistency D. stability; internal consistency

stability; internal consistency

107. Which statistic can help the test user determine how large a difference must exist for scores yielded from two different tests to be considered statistically different? A. standard error of measurement between two scores B. standard error of the difference between two scores C. observed variance minus error variance D. standard error of the difference between two means

standard error of the difference between two scores

75. A coefficient alpha over .9 may indicate that A. the items in the test are too dissimilar. B. the test is not reliable. C. the items in the test are redundant. D. the test is biased against low-ability individuals.

the items in the test are redundant.

129. The index that allows a test user to compare two people's scores on a specific test to determine if the true scores are likely to be different is A. the standard error of the mean. B. the standard error of the difference. C. the standard deviation. D. the correlation coefficient.

the standard error of the difference.
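
A minimal sketch of the calculation, assuming the classical test theory formula in which the standard error of the difference is the square root of the sum of the two squared standard errors of measurement. The function name and the numbers are hypothetical.

```python
import math


def se_difference(sem_1, sem_2):
    """Standard error of the difference between two scores."""
    return math.sqrt(sem_1 ** 2 + sem_2 ** 2)


# Hypothetical SEMs of 3 and 4 points for the two scores being compared.
sed = se_difference(3, 4)
print(sed)  # 5.0

# With the rounded 95% rule of 2 standard errors (an assumption here),
# two scores must differ by more than 2 * sed to be treated as different.
observed_gap = 12
print(observed_gap > 2 * sed)  # True
```

Because it combines two sources of measurement error, this value is always larger than either SEM alone.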

168. Which is the BEST example of a dynamic characteristic? A. the stress level of a trapeze flyer at a circus B. the intelligence of a college student during Spring Break C. the anti-authority attitude of an inmate serving a life term D. None of these

the stress level of a trapeze flyer at a circus

133. By definition, estimates of reliability can range from _______. A. -3.00 to +3.00 B. 1 to 10 C. 0 to 1 D. -1 to 1

0 to 1

50. An estimate of test-retest reliability is often referred to as a coefficient of stability when the time interval between the test and retest is more than A. 30 days. B. 60 days. C. 3 months. D. 6 months.

6 months.

147. Manuel earns a 90 on a standardized math test. The standard error of measurement for this test is 5. Approximately 95% of the scores fall between ______________. A. 85 and 95. B. 80 and 100. C. 80 and 100. D. Cannot determine based on the information provided.

80 and 100.

146. In general, approximately what percentage of scores would be expected to fall within two standard deviations above or below the standard error of measurement of the "true score" on a test? A. 85% B. 90% C. 95% D. 99%

95%

13. Which of the following is the best remedy for QRPs? A. pre-registration B. registration C. post-registration D. self-correction

A. pre-registration

143. Which is NOT a possible source of error variance? A. test administration B. test scoring C. test interpretation D. All are possible sources of error variance.

All are possible sources of error variance.

170. Test items with little discriminative ability prompt the test developer to consider the possibility that A. the content of the item does not match the construct measured by the other items in the scale. B. the item is poorly worded and needs to be rewritten. C. the item is too complex for the educational level of the population. D. All of these

All of these

19. Prior to DSM-5, a problem with the primary method used to estimate reliability of the DSM was that the method A. did not allow for truly independent judgments. B. resulted in overestimates of reliability. C. artificially constrained information provided to clinicians. D. All of these

All of these

29. Error in the reporting of spousal abuse may result from A. one partner simply forgets all of the details of the abuse. B. one partner misunderstands the instructions for reporting. C. one partner is ashamed to report the abuse. D. All of these

All of these

40. The standard error of measurement is A. used to infer how far an observed score is from the true score. B. also known as the standard error of a score. C. is used in the context of classical test theory. D. All of these

All of these

42. A reliability coefficient is A. an index. B. a proportion of the total variance attributed to true variance. C. unaffected by a systematic source of error. D. All of these

All of these

46. A source of error variance may take the form of A. item sampling. B. testtakers' reactions to environment-related variables such as room temperature and lighting. C. testtaker variables such as amount of sleep the night before a test, amount of anxiety, or drug effects. D. All of these

All of these

51. Which of the following might lead to a decrease in test-retest reliability? A. the passage of time between the two administrations of the test. B. coaching designed to increase test scores between the two administrations of the test. C. practice with similar test materials between the two administrations of the test. D. All of these

All of these

52. Which of the following is TRUE for estimates of alternate- and parallel-forms reliability? A. Two test administrations with the same group are required. B. Test scores may be affected by factors such as motivation, fatigue, or intervening events like practice, learning, or therapy. C. Item sampling is a source of error variance. D. All of these

All of these

59. Which of the following factors may influence a split-half reliability estimate? A. fatigue B. anxiety C. item difficulty D. All of these

All of these

85. Which type(s) of reliability estimates would be appropriate for a speed test? A. test-retest B. alternate-form C. split-half from two independent testing sessions D. All of these

All of these

117. What is the difference between alternate forms and parallel forms of a test? A. Alternate forms do not necessarily yield test scores with equal means and variances. B. Alternate forms are designed to be equivalent only with regard to level of difficulty. C. Alternate forms are different only with respect to how they are administered. D. There are no differences between alternate and parallel forms of a test.

Alternate forms do not necessarily yield test scores with equal means and variances.

66. Which of the following is NOT an acceptable way to divide a test when using the split-half reliability method? A. Randomly assign items to each half of the test. B. Assign odd-numbered items to one half and even-numbered items to the other half of the test. C. Assign the first-half of the items to one half of the test and the second half of the items to the other half of the test. D. Assign easy items to one half of the test and difficult items to the other half of the test.

Assign easy items to one half of the test and difficult items to the other half of the test.

9. According to Chin, as cited in your textbook, a lack of replicability in psychology affects the work of A. the police. B. judges. C. court clerks. D. All of these

B. judges

11. Unreliable findings that reach general acceptance in the academic community A. tend to self-correct. B. tend to linger too long. C. become exposed through social media. D. are not admissible in a court of law.

B. tend to linger too long

1. According to Gil et al. (2016), which of the following is a source of error in scores on psychological tests? A. whether or not the examiner has a beard B. whether the testtaker's country is at war or peace C. the body weight of the testtaker two weeks prior to the test D. None of these

B. whether the testtaker's country is at war or peace

45. Why might ability test scores among testtakers most typically vary? A. because of the true ability of the testtaker B. because of irrelevant, unwanted influences C. Both because of the true ability of the testtaker and because of irrelevant, unwanted influences D. None of these

Both because of the true ability of the testtaker and because of irrelevant, unwanted influences

125. Which of the following is TRUE of both the standard error of measurement and the standard error of difference? A. Both provide confidence levels. B. Both can be used to compute confidence intervals for short answer tests. C. Both can be used to compare performance between two different tests. D. Both are abbreviated by SEM.

Both can be used to compare performance between two different tests.

177. The Rasch model A. was developed by a Danish mathematician named Rasch. B. is an IRT model with specific assumptions about the underlying distribution. C. was devised from generalizability theory. D. Both was developed by a Danish mathematician named Rasch and is an IRT model with specific assumptions about the underlying distribution.

Both was developed by a Danish mathematician named Rasch and is an IRT model with specific assumptions about the underlying distribution.

5. In 2015, a report on attempted replications published in Science noted that, depending on the criteria used, ___ of the replications found the same results as the original study. A. 0% B. 20 to 40% C. 40 to 60% D. 100%

C. 40 to 60%

4. In 2015, a group of researchers attempted to replicate 100 peer-reviewed, published psychological studies. This group of researchers was called the A. Society for the Replication of Psychological Studies. B. Scientists for the Abolition of Irreproducible Results. C. Open Science Collaboration. D. Coalition for Responsible Science.

C. Open Science Collaboration

3. Berman et al. (2015) observed that one source of error in evaluations of the suicide risk of patients is A. whether or not the patient has previously attempted suicide. B. whether or not the clinician previously had a patient attempt suicide. C. how religious the evaluating clinician is. D. how religious the evaluated patient is.

C. how religious the evaluating clinician is.

8. Replication of research by independent parties provides for A. confidence in study findings. B. confirmation that the study findings were not an anomaly. C. confidence that the study findings were not the result of the original experimenter's biases. D. All of these

D. All of these

2. Hawkins et al. (2016) found that subjects with ____ fasting glucose levels made nearly ____ times as many errors as subjects with fasting glucose levels in the normal range. A. low; one-quarter B. high; one-quarter C. high; four times D. high; twice

D. high; twice

14. As compared to what was business as usual in the past, more researchers are coming to the realization that replication is A. really not as necessary as what researchers once thought. B. not something that can ever completely "right" past wrongs. C. mandatory given the influence of social media. D. needed if published findings are to be relied on.

D. needed if published findings are to be relied on.

123. If a student received a score of 50 on a math test with a standard error of measurement of 3, which of the following statements would be TRUE of the "true score"? A. In 68% of the cases, the "true score" would be expected to be between 44 and 56. B. In 68% of the cases, the "true score" would be expected to be between 47 and 53. C. In 95% of the cases, the "true score" would be expected to be between 47 and 53. D. In 95% of the cases, the "true score" would be expected to be between 44 and 56.

In 68% of the cases, the "true score" would be expected to be between 47 and 53.

128. Which of the following statements is TRUE regarding the differences between a power test and a speed test? A. Power tests involve physical strength; speed tests do not. B. In a power test, the testtaker has time to complete all items; in a speed test, a specific time limit is imposed. C. In a power test, a broad range of knowledge is assessed; in a speed test, a narrower range of knowledge is assessed. D. Both in a power test, the testtaker has time to complete all items; in a speed test, a specific time limit is imposed and in a power test, a broad range of knowledge is assessed; in a speed test, a narrower range of knowledge is assessed.

In a power test, the testtaker has time to complete all items; in a speed test, a specific time limit is imposed.

70. Many assumptions must be met when using KR-21 to estimate reliability. Which is NOT such an assumption? A. Items should be dichotomous. B. Items should be of equal difficulty. C. Items should be homogeneous. D. Items should be scorable by computer.

Items should be scorable by computer

25. Which is TRUE of measurement error? A. Like error in general, measurement error may be random or systematic. B. Unlike error in general, measurement error may be random or systematic. C. Measurement error is always random. D. Measurement error is always systematic.

Like error in general, measurement error may be random or systematic.

121. For criterion-referenced tests, which of the following reliability estimates is recommended? A. test-retest reliability estimates B. alternate-form reliability estimates C. split-half reliability estimates D. None of these

None of these

153. With regard to a value found for coefficient alpha, A. "bigger is always better." B. "smaller is always better." C. "negative is best." D. None of these

None of these

171. The fact that cultural factors may be operating to weaken an item's ability to discriminate between groups is evident from A. a general lack of reliability in culture-specific tests. B. latent trait theory. C. Georg Rasch's unauthorized biography, You Can Never Be Too Rich or Too "Rasch." D. None of these

None of these

63. Typically, adding items to a test will have what effect on the test's reliability? A. Reliability will decrease. B. Reliability will increase. C. Reliability will stay the same. D. Reliability will first increase and then decrease.

Reliability will increase.

22. In their study of the diagnostic reliability of DSM-IV diagnoses, Chmielewski et al. (2015) used the "gold standard" in diagnostic instruments. The tool they used was the A. MAST-2. B. SCID I/P. C. SCI-5. D. Semi-Structured Diagnostic Interview (SSDI).

SCID I/P.

137. Which of the following is TRUE of the standard error of measurement? A. The larger the standard error of measurement, the better. B. The standard error of measurement is inversely related to the standard deviation (that is, when one goes up, the other goes down). C. The standard error of measurement is inversely related to reliability (that is, when one goes up, the other goes down). D. A low standard error of measurement is indicative of low validity.

The standard error of measurement is inversely related to reliability (that is, when one goes up, the other goes down).
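
The inverse relationship follows from the classical test theory formula SEM = SD * sqrt(1 - r). The sketch below holds the standard deviation fixed and varies the reliability; the function name and the values are hypothetical, chosen only to make the pattern visible.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM under classical test theory: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)


# Hypothetical test with SD = 10 at three reliability levels.
for r in (0.64, 0.84, 0.96):
    print(r, round(standard_error_of_measurement(10, r), 2))
# The SEM shrinks (about 6, 4, 2) as reliability rises.
```

With the standard deviation held constant, a higher reliability coefficient always yields a smaller SEM.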

161. Because of the unique problems in assessing very young children, which of the following would be the BEST practice when attempting to estimate the reliability of tests designed to measure cognitive and motor abilities in infants? A. Use relatively short test-retest intervals. B. Use relatively long test-retest intervals. C. Do not use the test-retest method for estimating reliability of the test. D. Use only inter-scorer reliability estimates.

Use relatively short test-retest intervals.

172. Which is TRUE about reliability in the psychometric sense? A. reliability is an all-or-none measurement B. a test may be reliable in one context and unreliable in another C. a reliability coefficient may not be derived for personality tests D. alternate forms reliability may not be derived for personality tests

a test may be reliable in one context and unreliable in another

134. Using estimates of internal consistency, which of the following tests would likely yield the highest reliability coefficients? A. a test of general intelligence B. a test of achievement in a basic skill such as mathematics C. a test of reading comprehension D. a test of vocational interest

a test of reading comprehension

48. Which type of reliability estimate is obtained by correlating pairs of scores from the same person (or people) on two different administrations of the same test? A. a parallel-forms estimate B. a split-half estimate C. a test-retest estimate D. an au-paire estimate

a test-retest estimate

152. "Coefficient alpha 20" is a reference to A. a variant of the Kuder-Richardson KR-20 formula. B. the 20th in a series of formulas developed by Cronbach. C. a 20th-century revision of a Galtonian expression. D. None of these

a variant of the Kuder-Richardson KR-20 formula.

12. According to Chin, it is common for research findings to be A. rejected for not having met the general acceptance standard. B. accepted as having met the general acceptance standard. C. accepted by a judge but then rejected by an expert witness. D. accepted by an expert witness but then rejected by a judge.

accepted as having met the general acceptance standard.

148. In Chapter 5 of your textbook, you read of the writing surface on a school desk that was "riddled with heart carvings, the legacy of past years' students who felt compelled to express their eternal devotion to someone now long forgotten." This imagery was designed to graphically illustrate sources of error variance during test A. development. B. administration. C. scoring. D. interpretation.

administration.

135. What type of reliability estimate is appropriate for use in a comparison of "Form A" to "Form B" of a picture vocabulary test? A. test-retest B. alternate-forms C. inter-rater D. internal-consistency

alternate-forms

178. Why isn't IRT used more by "mom-and-pop" test developers such as classroom teachers? A. most classroom teachers were trained in generalizability theory B. IRT has no application in classroom tests C. applying IRT requires statistical sophistication D. All of these

applying IRT requires statistical sophistication

18. Prior to research on inter-rater reliability for DSM-5, DSM inter-rater reliability estimates were obtained using the ____ method. A. test-retest B. paired-paragraph C. audio-recording D. one-way mirror

audio-recording

95. The fact that the length of a test influences the size of the reliability coefficient is based on which theory of measurement? A. classical test theory (CTT) B. generalizability theory C. domain sampling theory D. item response theory (IRT)

classical test theory (CTT)

79. When more than two scorers are used to determine inter-scorer reliability, the statistic of choice is A. Pearson r. B. Spearman's rho. C. KR-20. D. coefficient alpha.

coefficient alpha.

179. Who are the primary users of IRT? A. classroom teachers B. commercial test developers C. instructors at universities in Departments of Education D. life insurance actuaries

commercial test developers

41. Reliability, in a broad statistical sense, is synonymous with A. consistently good. B. consistently bad. C. consistency. D. validity.

consistency.

105. As the reliability of a test increases, the standard error of measurement A. increases. B. decreases. C. remains the same. D. alternately increases, then decreases.

decreases

34. Cronbach's alpha is to similarity of scores on test items as average proportional distance is to A. difference in scores on test items. B. inter-item consistency. C. test-retest reliability. D. parallel forms reliability.

difference in scores on test items.

31. The term test heterogeneity BEST refers to the extent to which test items measure A. different factors. B. the same factor. C. a unifactorial trait. D. a nonhomogeneous trait.

different factors.

21. Which of the following terms is used in your textbook to describe the test-retest method of estimating diagnostic reliability? A. methodologically sound B. artificially constrained C. psychometrically balanced D. ecologically valid

ecologically valid

72. Coefficient alpha is appropriate to use with all of the following test formats EXCEPT A. multiple-choice. B. true-false. C. short-answer for which partial credit is awarded. D. essay exam with no partial credit awarded.

essay exam with no partial credit awarded.

91. Use of the Spearman-Brown formula would be inappropriate to A. estimate the effect on reliability of shortening a test. B. determine the number of items needed in a test to obtain the desired level of reliability. C. estimate the internal consistency of a speed test. D. All of these

estimate the internal consistency of a speed test

28. A research study entails behavioral observation and rating of front desk clerks in the hospitality industry to determine whether or not they greet guests with a smile. Which type of error is this test most susceptible to? A. test administration error B. test construction error C. examiner-related error D. polling error

examiner-related error

17. Field trials of DSM-5 demonstrated a mean kappa that was indicative of a ______ level of agreement among raters. A. poor B. fair C. good D. "kinder and gentler"

fair

155. All indices of reliability provide an index that is a characteristic of a particular A. test. B. group of test scores. C. trait. D. approach to measurement.

group of test scores.

88. A measure of clerical speed is obtained by a test that has respondents alphabetize index cards. The manual for this test cites a split-half reliability coefficient for a single administration of the test of .95. What might you conclude? A. The test is highly reliable. B. The published reliability estimate is spuriously low and would have been higher had another estimate been used. C. The split-half estimate should not have been used in this instance. D. Clerical speed is too vague a construct to measure.

The split-half estimate should not have been used in this instance.

64. Error variance for measures of inter-item consistency comes from A. fatigue. B. motivation. C. a testtaker practice effect. D. heterogeneity of the content.

heterogeneity of the content.

106. If the standard deviations of two tests are identical but the reliability is lower for Test A as compared to Test B, then the standard error of measurement will be ________ for Test A as compared with Test B. A. higher B. lower C. the same D. hard to tell

higher
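
The reasoning can be checked against the usual formula for the standard error of measurement, SEM = SD × sqrt(1 − reliability). The sketch below uses hypothetical values (a shared SD of 15, reliabilities of .75 and .91) chosen only for illustration.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - r), where r is the reliability coefficient."""
    return sd * math.sqrt(1 - reliability)


# Same standard deviation, different reliabilities (hypothetical numbers)
sem_test_a = standard_error_of_measurement(15, 0.75)  # less reliable test
sem_test_b = standard_error_of_measurement(15, 0.91)  # more reliable test
print(sem_test_a > sem_test_b)
```

With the SD held constant, lower reliability always produces the larger SEM.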

163. If the variance of either variable in a correlational analysis is inflated by the sampling procedure used, then the resulting correlation coefficient tends to be A. higher. B. lower. C. unaffected. D. unstable.

higher

67. If items on a test are measuring very different traits, estimates of reliability yielded from split-half methods will typically be ________ as compared with estimates from KR-20. A. higher B. lower C. similar D. approximately the same

higher

182. Latent trait models differ from classical test theory (CTT) in many key ways including the fact that A. in CTT, no assumptions are made about the frequency distribution of test scores. B. latent trait models do not presume that test items can carry with them different "weight." C. latent trait models typically provide for follow-up studies to support the existence of the presumed trait. D. All of these

in CTT, no assumptions are made about the frequency distribution of test scores.

104. As the confidence interval increases, the range of scores into which a single test score falls is likely to A. decrease. B. increase. C. remain the same. D. alternately decrease and increase.

increase.

169. As used in Chapter 5 of your text, the term inflation of variance is synonymous with A. restriction of variance. B. restriction of range. C. inflation of range. D. None of these

inflation of range.

56. What term refers to the degree of correlation between all the items on a scale? A. inter-item homogeneity B. inter-item consistency C. inter-item heterogeneity D. parallel-form reliability

inter-item consistency

164. The directions for scoring a particular motor ability test instruct the examiner to "Give credit if the child holds his hands open most of the time." Because what constitutes "most of the time" is not specifically defined, directions such as these could result in lowered reliability estimates for A. test-retest reliability. B. alternate-form reliability. C. inter-rater reliability. D. parallel forms reliability.

inter-rater reliability

127. A police officer administers a breathalyzer test to a suspected drunk driver, does not put on his glasses to read the meter, and as a result, mistakenly records the blood alcohol level. This is the kind of mistake that is BEST associated with which type of reliability estimates? A. test-retest B. inter-scorer C. internal-consistency D. situational

inter-scorer

130. Which type of reliability is directly affected by the heterogeneity of a test? A. test-retest B. inter-rater C. internal-consistency D. alternate-forms or parallel-forms

internal-consistency

39. A confidence interval is a range or band of test scores that A. has proven test-retest reliability. B. is calculated using the standard error of the difference. C. is likely to contain the true score. D. None of these

is likely to contain the true score.

93. Traditional measures of reliability are inappropriate for criterion-referenced tests because variability A. is maximized with criterion-referenced tests. B. is minimized with criterion-referenced tests. C. is variable with criterion-referenced tests. D. cannot be determined with criterion-referenced tests.

is minimized with criterion-referenced tests.

54. Which source of error variance affects parallel- or alternate-form reliability estimates but does not affect test-retest estimates? A. fatigue B. learning C. practice D. item sampling

item sampling

35. One of the problems associated with classical test theory has to do with A. the notion that there is a "true score" on a test has great intuitive appeal. B. the fact that CTT assumptions are often characterized as "weak." C. its assumptions concerning the equivalence of all items on a test. D. its assumptions allow for its application in most situations.

its assumptions concerning the equivalence of all items on a test.

165. A vice president (VP) of personnel employs a "Corporate Screening Test" in the hiring process. For future testing purposes, the VP maintains records of scores achieved by __________ as opposed to ___________ in order to avoid restriction-of-range effects. A. job applicants; hired employees B. hired employees; job applicants C. successful employees; hired employees D. successful employees; other corporate officers

job applicants; hired employees

65. If items from a test are measuring the same trait, estimates of reliability yielded from split-half methods will typically be ________ as compared to estimates from KR-20. A. higher B. lower C. similar D. approximately the same

lower

162. If the variance of either variable in a correlational analysis is restricted by the sampling procedure used, then the resulting correlation coefficient tends to be A. higher. B. lower. C. unaffected. D. unstable.

lower.
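
A small simulation can make the restriction-of-range effect concrete. Everything here is an assumption made for illustration: the simulated data, the fixed seed, the cutoff of 0.5, and the variable names. It sketches the idea and does not reproduce any study.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation computed from deviation sums."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical applicant pool: screening score and later job performance
test_scores = [rng.gauss(0, 1) for _ in range(5000)]
performance = [x + rng.gauss(0, 1) for x in test_scores]
r_applicants = pearson_r(test_scores, performance)

# Keep only "hired" cases, i.e. those above a cutoff on the screening test
hired = [(x, y) for x, y in zip(test_scores, performance) if x > 0.5]
r_hired = pearson_r([x for x, _ in hired], [y for _, y in hired])

print(r_hired < r_applicants)
```

Restricting the sample to high scorers shrinks the variance of the screening score, and the correlation computed on that subgroup comes out lower than the one computed on all applicants.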

120. If the variance of either variable is restricted by the sampling procedures used, then the magnitude of the coefficient of reliability will be A. lowered. B. raised. C. unaffected. D. affected only in tests with a true-false format.

lowered.

144. A goal of a test developer is to A. maximize error variance. B. minimize true variance. C. maximize true variance. D. minimize stress for testtakers.

maximize true variance.

157. Different types of reliability coefficients A. all reflect the same sources of error variance. B. may reflect different sources of error variance. C. never reflect the same source of error variance. D. reflect on error variance during leisure activities.

may reflect different sources of error variance

55. Which of the following types of reliability estimates is the most expensive due to the costs involved in test development? A. test-retest B. parallel-form C. internal-consistency D. Spearman's rho

parallel-form

38. The multiple-choice test items on this examination (yes, the one that you're taking right at this moment) are all examples of A. dichotomous test items. B. latent trait test items. C. polytomous test items. D. None of these

polytomous test items

83. If a time limit is long enough to allow test-takers to attempt all items, and if some items are so difficult that no test-taker is able to obtain a perfect score, then the test is referred to as a ________ test. A. speed B. power C. reliable D. valid

power

26. This variety of error has also been referred to as "noise." It is A. systematic error. B. random error. C. measurement error. D. background error.

random error.

27. A Wall Street Securities firm that is actually located on Wall Street is testing a group of candidates for their aptitude in finance and business. As the testing begins, an unexpected "Occupy Wall Street" sit-in takes place. From a psychometric perspective in the context of this testing, the sit-in is viewed as A. systematic error. B. random error. C. test administration error. D. background error.

random error.

126. The meaning of reliability in the psychometric sense differs from the meaning of reliability in the "every day" use of that word in that A. reliability in the "every day sense" is usually "a good thing." B. reliability in the psychometric sense is usually "a good thing." C. reliability in the psychometric sense has greater implications. D. None of these

reliability in the "every day sense" is usually "a good thing."

141. The greater the proportion of the total variance attributed to true variance, the more ____________ the test. A. scientific B. variable C. reliable D. expensive

reliable

36. Which of the following is NOT an alternative to classical test theory cited in your text? A. generalizability theory B. representational theory C. domain sampling theory D. latent trait theory

representational theory

47. Computer-scorable items have tended to eliminate error variance due to A. item sampling. B. scorer differences. C. content sampling. D. testtakers' reactions to environmental variables.

scorer differences.

86. Which of the following would result in the LEAST appropriate estimate of reliability for a speed test? A. test-retest B. alternate-form C. split-half from a single administration of the test D. split-half from two independent testing sessions

split-half from a single administration of the test

94. If traditional measures of reliability are applied to a criterion-referenced test, the reliability estimate will likely be A. spuriously low. B. spuriously high. C. exactly zero. D. None of these

spuriously low.

119. In which type(s) of reliability estimates would test construction NOT be a significant source of error variance? A. test-retest B. alternate-form C. split-half D. Kuder-Richardson

test-retest

138. What type of reliability estimate is obtained by correlating pairs of scores from the same person on two different administrations of the same test? A. parallel-forms B. split-half C. interrater D. test-retest

test-retest

49. Which type of reliability estimate would be appropriate only when evaluating the reliability of a test that measures a trait that is presumed to be relatively stable over time? A. parallel-forms B. alternate-forms C. test-retest D. split-half

test-retest

82. Which type(s) of reliability estimates would be most appropriate for a measure of heart rate? A. test-retest B. alternate-form C. parallel form D. internist consistency

test-retest

159. The fact that young children develop rapidly and in "growth spurts" is a problem when it comes to the estimation of which type of reliability for an infant development scale? A. internal-consistency reliability B. alternate-forms reliability C. test-retest reliability D. interrater reliability

test-retest reliability

118. Coefficient alpha is the reliability estimate of choice for tests A. with dichotomous items and binary scoring. B. with homogeneous items. C. that can be scored along a continuum of values. D. that contain heterogeneous item content and binary scoring.

that can be scored along a continuum of values.

176. A polytomous test item is a test item A. that has multiple tomous's attached. B. that may have more than one choice keyed correct. C. that is exemplified by this test question. D. None of these

that is exemplified by this test question

136. What index of reliability would be BEST used to compare two evaluators' assessments of a group of job applicants? A. KR-20 B. coefficient alpha C. the Kappa statistic D. the Spearman-Brown correction

the Kappa statistic
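
For two evaluators assigning categorical ratings, Cohen's kappa compares observed agreement with the agreement expected by chance. The sketch below is one way to compute it by hand; the function name and the four made-up applicant ratings are assumptions for illustration only.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who rated the same cases."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both raters must rate the same non-empty set of cases")
    n = len(ratings_a)
    # Proportion of cases on which the two raters agree
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement from each rater's marginal category proportions
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    # Note: undefined (division by zero) if chance agreement is exactly 1
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of four job applicants by two evaluators
evaluator_1 = ["hire", "hire", "reject", "reject"]
evaluator_2 = ["hire", "reject", "reject", "reject"]
kappa = cohens_kappa(evaluator_1, evaluator_2)
print(kappa)  # observed agreement .75, chance agreement .50
```

Here the evaluators agree on three of four applicants, but half of that agreement would be expected by chance, which is why kappa is lower than the raw agreement rate.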

80. For determining the reliability of tests scored using nominal scales of measurement, the statistic of choice is A. Kendall's Tau. B. the Kappa statistic. C. KR-20. D. coefficient alpha.

the Kappa statistic.

24. In an illustrative scenario described in Chapter 5 of your text, a group of 12th grade "whiz kids" in math, newly arrived to the United States from China, perform poorly on a test of 12th grade math. According to the text, what probably accounted for this? A. lower standards in China as compared to the US for measuring math ability. B. higher standards in the US as compared to China for earning high grades. C. the ability of the Chinese students to read what was required in English. D. the reliability of the instrument used to test 12th grade math skills.

the ability of the Chinese students to read what was required in English.

30. Stanley (1971) wrote that in classical test theory, a so-called "true score" is "not the ultimate fact in the book of the recording angel." By this, Stanley meant that A. it would be imprudent to trust in Divine influence when estimating variance. B. the amount of test variance that is true relative to error may never be known. C. it is near impossible to separate fact from fiction with regard to "true scores." D. All of these

the amount of test variance that is true relative to error may never be known.

90. An estimate of the reliability of a speed test is a measure of A. the stability of the test. B. the consistency of the response speed. C. the homogeneity of the test items. D. All of these

the consistency of the response speed.

180. In the context of IRT, the term discrimination best refers to A. the degree to which an item differentiates people with respect to some cultural variable. B. the utility of a test used in personnel selection. C. the degree to which internal consistency compares to selection precision. D. the degree to which an item differentiates people with respect to a trait being measured.

the degree to which an item differentiates people with respect to a trait being measured.

92. Interpretations of criterion-referenced tests are typically made with respect to A. the total number of items the examinee responded to. B. the material that the examinee evidenced mastery of. C. a comparison of the examinee's performance with that of others who took the test. D. a formula that takes into account the total number of items for which no response was scorable.

the material that the examinee evidenced mastery of.

74. Coefficient alpha is an expression of A. the mean of split-half correlations between odd- and even-numbered items. B. the mean of split-half correlations between first- and second-half items. C. the mean of all possible split-half correlations. D. the mean of the best or "alpha" level split-half correlations.

the mean of all possible split-half correlations.
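
In practice alpha is usually computed from item and total-score variances rather than by averaging split-half correlations. The sketch below uses that standard computing formula, alpha = k/(k − 1) × (1 − sum of item variances / total-score variance); the function name and the partial-credit scores are hypothetical.

```python
from statistics import variance


def coefficient_alpha(item_scores):
    """Coefficient alpha from a list of per-person item score lists."""
    k = len(item_scores[0])  # number of items
    # Variance of each item across test-takers
    item_variances = [variance([person[i] for person in item_scores]) for i in range(k)]
    # Variance of the total scores across test-takers
    total_variance = variance([sum(person) for person in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: four test-takers, three items scored 0 to 5
scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [3, 3, 4],
]
print(round(coefficient_alpha(scores), 3))
```

Because the formula works on variances, it handles items scored along a continuum as well as items scored right or wrong.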

108. The standard error of the difference between two scores is larger than the standard error of measurement for either score because the standard error of the difference between the two scores is affected by A. the true score variance of each score. B. the standard deviation of each score summed. C. the measurement error inherent in both scores. D. All of these

the measurement error inherent in both scores.
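
The point follows from the formula, which combines the two standard errors of measurement: SE_diff = sqrt(SEM1² + SEM2²). The values below (SEMs of 3 and 4) are invented to keep the arithmetic simple.

```python
import math


def standard_error_of_difference(sem_1: float, sem_2: float) -> float:
    """Standard error of the difference between two scores."""
    return math.sqrt(sem_1 ** 2 + sem_2 ** 2)


# Hypothetical SEMs for two tests reported on the same scale
se_diff = standard_error_of_difference(3.0, 4.0)
print(se_diff)  # 5.0, larger than either SEM alone
```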

73. The "20" and "21" in KR-20 and KR-21 represent A. numbers held constant in the denominator. B. numbers held constant in the numerator. C. the order in which the formulas were created. D. the age of Fred Kuder's sons when the formulas were developed.

the order in which the formulas were created.

160. In the language of psychological testing and assessment, reliability BEST refers to A. how well a test measures what it was originally designed to measure. B. the complete lack of any systematic error. C. the proportion of total variance that can be attributed to true variance. D. whether or not a test publisher consistently publishes high quality instruments.

the proportion of total variance that can be attributed to true variance.
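
Stated as arithmetic, this definition is a simple ratio. The sketch below assumes the classical model in which total variance is the sum of true variance and error variance; the numbers are hypothetical.

```python
def reliability(true_variance: float, error_variance: float) -> float:
    """Proportion of total variance attributable to true variance."""
    total_variance = true_variance + error_variance
    return true_variance / total_variance


# Hypothetical test: true variance 9, error variance 1
print(reliability(9.0, 1.0))  # 0.9
```

True and error variance are never observed directly, so in practice this ratio is estimated with coefficients such as test-retest or alpha.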

156. The precise amount of error inherent in the reliability estimate published in a test manual will vary with A. the purchase price of the test (the more expensive, the less the error). B. the sample of test-takers from which the data were drawn. C. the population of test user actually using a published test. D. All of these

the sample of test-takers from which the data were drawn.

124. A psychologist administers a test and the test-taker scores a 52. If the cut-off score for eligibility for a particular program is 50, what index will best help the psychologist determine how much confidence to place in the test-taker's obtained score of 52? A. the standard error of difference B. the standard error of measurement C. measures of central tendency: mean, median, or mode D. measures of variability such as the standard deviation

the standard error of measurement
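
To see why the SEM is the relevant index, it helps to build a confidence interval around the obtained score of 52. The SD of 10 and reliability of .84 below are assumptions chosen for illustration; the question itself gives neither.

```python
import math


def confidence_interval(observed: float, sd: float, reliability: float, z: float = 1.96):
    """Band around an observed score: observed +/- z * SEM (z = 1.96 for 95%)."""
    sem = sd * math.sqrt(1 - reliability)
    return observed - z * sem, observed + z * sem


# Hypothetical test statistics: SD = 10, reliability = .84, so SEM = 4
low, high = confidence_interval(52, sd=10, reliability=0.84)
print(round(low, 2), round(high, 2))  # 44.16 59.84
```

Under these assumed values the 95% band extends below the cut-off of 50, so the obtained score alone would not settle the eligibility decision.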

100. The standard deviation of a theoretically normal distribution of test scores obtained by one person on equivalent tests is A. the standard error of the difference between means. B. the standard error of measurement. C. the standard deviation of the reliability coefficient. D. the variance.

the standard error of measurement.

109. A guidance counselor wishes to determine if a student scored higher on a mathematics test than on a reading test. What statistic(s) would be MOST useful? A. the standard error of measurement for each test score B. the standard error of the difference between two scores C. the raw score on each test as well as the mean of each distribution D. the mean of each distribution and index of test difficulty for each test

the standard error of the difference between two scores

23. In classical test theory, an observed score on an ability test is presumed to represent the testtaker's A. true score. B. true score less the variance. C. true score combined with extraneous factors. D. the testtaker's true score and error.

the testtaker's true score and error

20. In the test-retest method to estimate reliability A. the time frame between interviews must be relatively short. B. separate interviews are conducted by certified raters. C. a minimum of two re-tests are required. D. All of these

the time frame between interviews must be relatively short.

142. A score earned by a testtaker on a psychological test may BEST be viewed as equal to A. the raw score plus the observed score. B. the error score. C. the true score. D. the true score plus error.

the true score plus error.

140. In the formula X = T + E, T refers to A. the true score. B. the time factor. C. the average test score. D. test-retest reliability.

the true score.

101. Which of the following is NOT a part of the formula for the standard error of measurement for a particular test? A. the validity of the test B. the reliability of the test C. the standard deviation of the group of test scores D. Both the reliability of the test and the standard deviation of the group of test scores

the validity of the test

58. Which of the following is usually minimized when using split-half estimates of reliability as compared with test-retest or parallel/alternate-form estimates of reliability? A. time and expense B. reliability and validity C. reliability only D. time spent in scoring and interpretation

time and expense

69. The KR-21 reliability estimate was developed A. to yield greater consistency in reliability coefficients. B. to facilitate computation by hand. C. for use with less homogeneous items. D. because Kuder wanted to "one-up" Richardson's 20.

to facilitate computation by hand.
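
The hand-computation advantage shows in the formula, which needs only the number of items, the mean, and the variance of total scores: KR-21 = k/(k − 1) × (1 − M(k − M) / (k × variance)). It assumes items of roughly equal difficulty. The function name and example numbers below are hypothetical.

```python
def kr21(n_items: int, mean_score: float, score_variance: float) -> float:
    """KR-21 reliability estimate from summary statistics of total scores."""
    k = n_items
    return (k / (k - 1)) * (1 - (mean_score * (k - mean_score)) / (k * score_variance))


# Hypothetical 10-item test with mean 5 and total-score variance 5
print(round(kr21(10, 5.0, 5.0), 3))
```

No item-level data are required, unlike KR-20, which needs the proportion passing each item.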

44. As the degree of reliability increases, the proportion of A. total variance attributed to true variance decreases. B. total variance attributed to true variance increases. C. total variance attributed to error variance increases. D. None of these

total variance attributed to true variance increases.

60. Internal-consistency estimates of reliability are inappropriate for A. reading achievement tests. B. scholastic aptitude/intelligence tests. C. word processing tests based on speed. D. tests purporting to measure a single personality trait.

word processing tests based on speed.

