Test

A reliability coefficient of .60 indicates that ___ of variability in test scores is true score variability. Select one: A. 60% B. 40% C. 36% D. 16%

Answer A is correct. A reliability coefficient is interpreted directly as a measure of true score variability. A reliability coefficient of .60 indicates that 60% of the variability in scores is true score variability, while the remaining 40% of the variability is due to measurement (random) error.

Which of the following item difficulty (p) levels maximizes the differentiation of examinees into high- and low-performing groups? Select one: A. 0.5 B. 0.9 C. 1.5 D. 0

Answer A is correct. An item difficulty level (p) ranges in value from 0 to +1.0 with a value of 0 indicating a very difficult item and a value of +1.0 indicating a very easy item. A difficulty index of .50 indicates that 50% of examinees in the try-out sample answered the item correctly. When p equals .50, this means that the item provides maximum differentiation between the upper- and lower-scoring examinees - i.e., a large proportion of examinees in the upper group answered the item correctly, while a small proportion of examinees in the lower group answered it correctly.

Stella S. obtains a score of 50 on a test that has a standard deviation of 10 and a standard error of measurement of 5. The 95% confidence interval for Stella's score is approximately: Select one: A. 45 to 55. B. 40 to 60. C. 35 to 65. D. 30 to 70.

Answer B is correct. The 95% confidence interval for an obtained test score is constructed by multiplying the standard error of measurement by 1.96 and adding and subtracting the result to and from the examinee's obtained score. An interval of 40 to 60 is closest to the 95% confidence interval and was obtained by multiplying the standard error by 2.0 (instead of 1.96) and then adding and subtracting the result (10) to and from Stella's score of 50. Additional information on calculating confidence intervals is provided in the Test Construction chapter of the written study materials.
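The interval arithmetic above can be checked with a short sketch. The function name `confidence_interval` and its parameters are illustrative, not part of the study materials; the default multiplier of 1.96 corresponds to the 95% level described in the answer.

```python
def confidence_interval(obtained_score, sem, z=1.96):
    """Return (lower, upper) bounds around an obtained score.

    sem is the standard error of measurement; z is the multiplier
    for the desired confidence level (1.96 for 95%).
    """
    margin = z * sem
    return obtained_score - margin, obtained_score + margin


# Stella's obtained score of 50 with a standard error of measurement of 5
lower, upper = confidence_interval(50, 5)
print(round(lower, 1), round(upper, 1))  # 40.2 59.8
```

Rounding the multiplier to 2.0 gives the 40 to 60 interval named in answer B.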

Which of the following is NOT an example of a standard score? Select one: A. WAIS IQ score B. percentage score C. z score D. T score

When using principal component analysis: Select one: A. the first principal component represents the largest share of the total variance. B. the first principal component represents the smallest share of the total variance. C. each component represents an equal share of the total variance. D. the order of the components is not related to the share of total variance they represent.

a. CORRECT A characteristic of principal components analysis is that the components (factors) are extracted so that the first component reflects the greatest amount of variability, the second component the second greatest amount of variability, etc.

All other things being equal, which of the following tests is likely to have the largest reliability coefficient? Select one: A. a multiple-choice test that consists of items that each have five answer options B. a multiple-choice test that consists of items that each have four answer options C. a multiple-choice test that consists of items that each have three answer options D. a true-false test

a. CORRECT All other things being equal, tests containing items that have a low probability of being answered correctly by guessing alone are more reliable than tests containing items that have a high probability of being answered correctly by guessing alone. Of the types of items listed, multiple-choice items with five answer options have the lowest probability of being answered correctly by guessing alone.

Assuming no constraints in terms of time, money, or other resources, the best (most thorough) way to demonstrate that a test has adequate reliability is by using which of the following techniques? Select one: A. equivalent (alternate) forms B. test-retest C. Cronbach's alpha D. Cohen's kappa

a. CORRECT Because equivalent forms reliability takes into account error due to both time and content sampling, it is the most thorough method for establishing reliability and, consequently, is considered by some experts to be the best method.

Cronbach's alpha is an appropriate method for evaluating reliability when: Select one: A. all test items are designed to measure the same underlying characteristic. B. test items are subjectively scored. C. the test will be administered to examinees at regular intervals over time. D. there is a restriction in the range of scores.

a. CORRECT Cronbach's alpha is an appropriate method for evaluating reliability when the test is expected to be internally consistent - i.e., when all test items measure the same or related characteristics.

In the context of test construction, cross-validation is associated with which of the following? Select one: A. shrinkage B. criterion deficiency C. criterion contamination D. banding

a. CORRECT Cross-validation refers to re-assessing a test's criterion-related validity with a new sample. Because the chance factors operating in the original sample are not all present in the cross-validation sample, the validity coefficient usually "shrinks" (is smaller) for the new sample.

In factor analysis, a factor loading indicates the correlation between: Select one: A. a test and an identified factor. B. two different tests. C. two factors measured by the same test. D. two factors measured by different tests

a. CORRECT In factor analysis, a factor loading is a correlation coefficient that indicates the correlation between a test and an identified factor.

Incremental validity is a measure of: Select one: A. decision-making accuracy. B. shrinkage C. the generalizability of research results. D. the costs involved in using a predictor.

a. CORRECT Incremental validity refers to the increase ("increment") in decision-making accuracy that results from the use of a new predictor (e.g., the increase in accurate hiring decisions).

A test developer would use the Kuder-Richardson Formula (KR-20) in order to: Select one: A. evaluate a test's internal consistency reliability. B. evaluate a test's test-retest reliability. C. determine the impact of increasing a test's reliability on its validity. D. determine the impact of lengthening a test on its reliability.

a. CORRECT KR-20 is used to determine a test's internal consistency reliability when test items are scored dichotomously.
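As a rough illustration of how KR-20 works with dichotomously scored items, the sketch below uses the commonly cited form KR-20 = (k / (k - 1)) * (1 - sum(p*q) / total-score variance). The function name `kr20` and the tiny response matrix are hypothetical, and the sketch assumes the population variance of total scores (some texts use the sample variance instead).

```python
def kr20(responses):
    """KR-20 for a list of examinee rows, each a list of 0/1 item scores."""
    n = len(responses)          # number of examinees
    k = len(responses[0])       # number of items
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    # Population variance of total scores (an assumption of this sketch)
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    # Sum of p * q over items, where p is the proportion answering correctly
    pq_sum = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n
        pq_sum += p * (1 - p)
    return (k / (k - 1)) * (1 - pq_sum / var_total)


# Made-up data: 4 examinees, 3 items scored right (1) or wrong (0)
data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(round(kr20(data), 2))  # 0.75
```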

When using criterion-referenced interpretation of scores obtained on a job knowledge test, you would most likely be interested in which of the following? Select one: A. the total number of test items answered correctly by an examinee B. an examinee's performance relative to that of other examinees C. an examinee's standing on two or more measures designed to assess the same characteristic D. ensuring that test items are based on a systematic job evaluation

a. CORRECT One criterion that is used to interpret a person's test score is the total number of correct items. This criterion is probably most associated with "mastery testing." A person is believed to have mastered a content area when he/she obtains a predetermined minimum score on the test that is designed to assess knowledge of that area. There are other types of criteria that are external to the test itself that are used in criterion-referenced interpretation but none of the other responses addresses those types of interpretation; and, consequently, this answer is the best one.

The assumption underlying convergent validity is that: Select one: A. a measure of a characteristic should correlate highly with a different type of measure that is already known to assess the same characteristic. B. a valid measure of a variable should have as much (or more) factorial validity as does an existing measure of the same variable. C. a measure of a construct should correlate more highly with itself than with another measure of the same construct. D. to be valid, a measure of a characteristic should correlate highly with the measure of the behavior it is designed to predict.

a. CORRECT One way to establish a test's construct validity is to determine that it correlates highly with other measures that are already known to assess the same trait. When it does, the measure is said to have convergent validity.

When the kappa statistic for a measure is .90, this indicates that the measure: Select one: A. has adequate inter-rater reliability. B. has adequate internal consistency reliability. C. has low criterion-related validity. D. has low incremental validity.

a. CORRECT Reliability coefficients range from 0 to +1.0, so a coefficient of .90 indicates good reliability.

Which of the following best describes the relationship between validity and reliability? Select one: A. A valid test is also a reliable test. B. A valid test may or may not be a reliable test. C. A reliable test is also a valid test. D. An invalid test is not a reliable test.

a. CORRECT Reliability sets an upper limit on validity, which means that a valid test must also be a reliable test. However, high reliability does not guarantee validity - i.e., a test can be free from the effects of measurement error but not measure the attribute it was designed to measure.

When a test has been constructed on the basis of item response theory, an examinee's total test score provides information about his/her: Select one: A. status on a latent trait or ability. B. predicted performance on an external criterion. C. performance relative to other examinees included in the standardization sample. D. current developmental level.

a. CORRECT Scores on tests developed on the basis of item response theory are reported in terms of the examinee's level on the trait or ability measured by the test rather than in terms of a total score. An advantage of this method of score reporting is that it makes it possible to compare scores from different sets of items and from different tests.

The correction for attenuation formula is used to measure the impact of increasing: Select one: A. a test's reliability on its validity. B. a test's validity on its reliability. C. the number of test items on the test's validity. D. the number of test items on the test's reliability.

a. CORRECT The correction for attenuation formula is used to determine the impact of increasing the reliability of the predictor (test) and/or the criterion on the predictor's validity.
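The answer above does not spell out the formula. A minimal sketch, assuming the standard form in which the observed validity coefficient is divided by the square root of the product of the predictor and criterion reliabilities, might look like this. The function name and the input values are hypothetical.

```python
from math import sqrt


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimate the validity coefficient if predictor and criterion were perfectly reliable.

    r_xy: observed predictor-criterion correlation
    r_xx: predictor reliability; r_yy: criterion reliability
    """
    return r_xy / sqrt(r_xx * r_yy)


# Hypothetical values: observed validity .30, reliabilities .64 and .81
print(round(correct_for_attenuation(0.30, 0.64, 0.81), 3))  # 0.417
```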

In a distribution of percentile ranks, the number of examinees receiving percentile ranks between 20 and 30 is: Select one: A. equal to the number of examinees receiving percentile ranks between 50 and 60. B. greater than the number of examinees receiving percentile ranks between 50 and 60. C. about equal to one-half the number of examinees receiving percentile ranks between 50 and 60. D. about equal to one-fourth the number of examinees receiving percentile ranks between 50 and 60.

a. CORRECT The flatness of a percentile rank distribution indicates that scores are evenly distributed throughout the full range of the distribution. In other words, at least theoretically, the same number of examinees fall at each percentile rank. Consequently, the same number of examinees obtain percentile ranks between the ranks of 20 and 30, 30 and 40, etc.

In factor analysis, the original factor matrix is usually rotated in order to: Select one: A. facilitate interpretation of the identified factors. B. determine how many factors to extract. C. cross-validate the factor analysis. D. verify the causal relationships among the identified factors.

a. CORRECT The rotation of factors provides a clearer pattern of factor loadings - i.e., in the rotated matrix, some tests correlate most highly with one factor, while other tests correlate more highly with a different factor. This makes it easier to identify the factors (dimensions) that account for the intercorrelations between the tests.

To maximize the inter-rater reliability of a behavioral observation scale, you should make sure that coding categories: Select one: A. are mutually exclusive. B. are measured on an interval or ratio scale. C. produce criterion-referenced scores. D. produce scores that are normally distributed.

a. CORRECT To maximize the reliability of a behavior observation scale, coding categories must be discrete and mutually exclusive. For example, if the behavioral categories for aggressiveness were "aggressive acts" and "emotional displays," the same behavior might be recorded twice, and an unreliable picture of a child's behavior would be obtained.

To obtain a "coefficient of stability," you would: Select one: A. administer the same test twice to the same group of examinees on two separate occasions and correlate the two sets of scores. B. administer a test to a group of examinees and determine the average inter-item correlation. C. administer a test to two different random samples of examinees on two occasions and correlate the two sets of scores. D. administer parallel forms of a test to the same group of examinees and correlate the two sets of scores.

a. CORRECT To obtain a coefficient of stability, the same measure is administered to the same group of examinees on two separate occasions and the scores obtained by the examinees are correlated. The result indicates the consistency (stability) of scores over time.

A reliability coefficient is best defined as a measure of: Select one: A. relevance. B. consistency. C. interpretability. D. generalizability.

b. CORRECT A reliability coefficient indicates the proportion of variance in test scores that is consistent (i.e., is due to true score variability rather than to measurement error).

You ask a group of experienced salespeople to review the test items included in a test you have developed to help select new sales applicants. You are apparently interested in determining the test's ______ validity. Select one: A. incremental B. content C. concurrent D. differential

b. CORRECT A test's content validity refers to the extent to which test items represent the domain of knowledge, skills, and/or abilities the test was designed to measure. Content validity is established primarily by having subject matter experts evaluate items in terms of their representativeness.

A test developer would construct an expectancy table to: Select one: A. facilitate norm-referenced interpretation of test scores. B. facilitate criterion-referenced interpretation of test scores. C. correct obtained scores for the effects of guessing. D. correct obtained test scores for the effects of measurement error.

b. CORRECT An expectancy table provides the information needed to interpret an examinee's score in terms of expected performance on an external criterion and, consequently, is a method of criterion-referenced interpretation.

The primary advantage in using a percentile rank, z-score, or T-score is that these scores: Select one: A. are easy to interpret because they reference an individual's test performance to an absolute standard of performance. B. are easy to interpret because they reference an individual's test performance to the performance of other examinees. C. are easy to interpret because they make it possible to predict which criterion group an examinee is likely to belong to. D. normalize the raw score distribution so that parametric tests can be used to analyze test scores.

b. CORRECT Because it is usually difficult to "make sense" of raw scores, they are often transformed into scores that are easier to interpret. The advantage of norm-referenced scores (which are a type of transformed score) is that they make it possible to determine how well an examinee did in comparison to other examinees.

A 200-item test that has been administered to 100 college students has a normal distribution, a mean of 145, and a standard deviation of 12. When the students' raw scores have been converted to percentile ranks, Alex obtains a percentile rank of 49, while his twin sister Alicia obtains a percentile rank of 90. The teacher realizes that she made a mistake in scoring Alex's and Alicia's tests: Both should have received a raw score that was five points higher. In terms of their percentile ranks, when the teacher adds the five points to Alex's and Alicia's scores, she can expect that: Select one: A. Alicia's percentile rank will increase more than Alex's. B. Alex's percentile rank will increase more than Alicia's. C. Alicia's and Alex's percentile ranks will increase by the same amount. D. Alicia's and Alex's percentile ranks will not change.

b. CORRECT Alex's percentile rank will increase more than Alicia's. This makes sense if you think about the normal distribution: Since most of the scores are "piled up" near the center of the distribution, the 5-point increase in Alex's score will position him above a larger number of examinees than the 5-point increase in Alicia's score. This difference will be reflected in their percentile ranks.
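One way to see the size of the difference is to model the scores as a continuous normal distribution with the stated mean of 145 and standard deviation of 12. This is a simplification of the actual 100-student sample, and the helper name `percentile_gain` is illustrative.

```python
from statistics import NormalDist

# Assumption: scores follow a continuous normal curve with the stated mean and SD
dist = NormalDist(mu=145, sigma=12)


def percentile_gain(old_percentile, points):
    """Change in percentile rank after adding raw-score points."""
    old_score = dist.inv_cdf(old_percentile / 100)
    new_percentile = 100 * dist.cdf(old_score + points)
    return new_percentile - old_percentile


alex_gain = percentile_gain(49, 5)     # starts near the center of the distribution
alicia_gain = percentile_gain(90, 5)   # starts in the upper tail
print(round(alex_gain, 1), round(alicia_gain, 1))
```

Under this model Alex's gain comes out several times larger than Alicia's, which matches answer B.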

After reviewing the data collected on a new selection test during the course of a criterion-related validity study, a psychologist decides to lower the selection test cutoff score. Apparently the psychologist is hoping to do which of the following? Select one: A. reduce the number of false negatives B. increase the number of true positives C. reduce the number of false positives D. increase the number of false negatives

b. CORRECT By lowering the selection test (predictor) cutoff score, the psychologist will increase the number of people who are accepted on the basis of their selection test score -- i.e., doing so will increase the number of positives, including the number of true positives, who are individuals who will be selected on the basis of their test scores and will be successful on the criterion.

When the heterotrait-monomethod coefficient is large, this indicates: Select one: A. a lack of differential validity. B. a lack of discriminant validity. C. adequate convergent validity. D. adequate concurrent validity.

b. CORRECT If you are validating a test, you want the heterotrait-monomethod coefficient to be low so that you have evidence of discriminant (divergent) validity. When this coefficient is large, this indicates a lack of discriminant validity. Additional information on the heterotrait-monomethod coefficient and other coefficients included in a multitrait-multimethod matrix is provided in the Test Construction chapter of the written study materials.

In factor analysis, when two factors are "orthogonal," this means that: Select one: A. the factors are correlated. B. the factors are uncorrelated. C. the factors explain a statistically significant amount of variability in test scores. D. the factors do not explain a statistically significant amount of variability in test scores

b. CORRECT In factor analysis, orthogonal factors are uncorrelated (independent) and oblique factors are correlated (dependent).

A psychologist develops a diagnostic test to identify people who have injection phobia. In this situation, the test's ________ refers to how good the test is at identifying people who actually have injection phobia from the pool of people who have injection phobia. Select one: A. specificity B. sensitivity C. positive predictive value D. negative predictive value

b. CORRECT Sensitivity refers to the probability that a test will correctly identify people with the disease from the pool of people with the disease. It is calculated using the following formula: true positives/(true positives + false negatives).
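The formula in the answer translates directly into code. The counts below are made up for illustration.

```python
def sensitivity(true_positives, false_negatives):
    """Proportion of people with the condition whom the test correctly identifies."""
    return true_positives / (true_positives + false_negatives)


# Hypothetical counts: 50 people have injection phobia; the test flags 40 of them
print(sensitivity(true_positives=40, false_negatives=10))  # 0.8
```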

An advantage of using the kappa statistic rather than percent agreement when assessing a test's inter-rater reliability is that the former: Select one: A. is easier to calculate. B. corrects for chance agreement. C. corrects for small sample size. D. takes into account the effects of multicollinearity.

b. CORRECT The kappa statistic (which is also known as Cohen's kappa and the kappa coefficient) provides a more accurate estimate of reliability than percent agreement because its calculation includes removing the effects of chance agreement.
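To make the chance correction concrete, here is a minimal sketch of Cohen's kappa for two raters, using the usual definition kappa = (observed agreement - chance agreement) / (1 - chance agreement). The function name and the ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two equal-length lists of category labels."""
    n = len(rater1)
    # Observed proportion of agreement (this is plain percent agreement)
    p_observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Agreement expected by chance, from each rater's marginal proportions
    counts1, counts2 = Counter(rater1), Counter(rater2)
    categories = set(rater1) | set(rater2)
    p_chance = sum(counts1[c] * counts2[c] for c in categories) / (n * n)
    return (p_observed - p_chance) / (1 - p_chance)


# Made-up ratings: the raters agree on 3 of 4 cases (75% agreement)
r1 = ["yes", "yes", "no", "no"]
r2 = ["yes", "no", "no", "no"]
print(cohens_kappa(r1, r2))  # 0.5
```

In this example percent agreement is .75, but half of that agreement is expected by chance, so kappa is .50.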

In terms of item response theory, the slope (steepness) of the item characteristic curve indicates the item's: Select one: A. level of difficulty. B. ability to discriminate between examinees. C. internal consistency reliability. D. criterion-related validity.

b. CORRECT The steeper the slope of the item characteristic curve, the better its ability to discriminate between examinees who are high and low on the characteristic being measured.

To evaluate the validity of a newly developed selection test for clerical workers, a test developer will correlate scores obtained on the test by newly hired clerical workers with the job performance ratings they receive after being on-the-job for six months. The resulting correlation coefficient will provide information on the test's: Select one: A. discriminant validity. B. predictive validity. C. construct validity. D. concurrent validity.

b. CORRECT There are two types of criterion-related validity -- predictive and concurrent. As its name implies, predictive validity involves correlating predictor scores with criterion scores that are obtained at a later time to determine how well the predictor predicts future performance on the criterion.

According to classical test theory, total variability in test scores is due to: Select one: A. true score variability plus systematic error. B. true score variability plus random error. C. relevant variability plus irrelevant variability. D. relevant variability plus confounding variability.

b. CORRECT This answer accurately describes how total variability is conceptualized in classical test theory. This conceptualization is represented by the formula: X = T + E, where X is the total variability in test scores; T is true score variability; and E is variability due to measurement (random) error.

You would use a "multitrait-multimethod matrix" in order to: Select one: A. compare a test's predictive and concurrent validity. B. determine if a test has adequate convergent and discriminant validity. C. identify the common factors underlying a set of related constructs. D. test hypotheses about the causal relationships among variables.

b. CORRECT When a measure correlates highly with other measures of the same trait, the measure has convergent validity; when it has low correlations with measures of different traits, it has discriminant (divergent) validity. Convergent and discriminant validity are used as evidence of construct validity, and the multitrait-multimethod matrix contains correlation coefficients that provide information about a measure's convergent and discriminant validity.

The optimal item difficulty level (p) for a true/false test is: Select one: A. +1.0. B. .75. C. .25. D. -1.0.

b. CORRECT When considering the probability that an examinee can select the correct answer by chance alone, the optimal difficulty level is halfway between 100% of examinees answering the item correctly and the probability of answering the item correctly by chance alone. For a true/false item, the latter is 50%, so the optimal item difficulty is 75% (.75), which is halfway between 100% and 50%.
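The halfway rule described above can be written as a one-line calculation. The function name `optimal_difficulty` is illustrative.

```python
def optimal_difficulty(chance_probability):
    """Midpoint between 1.0 (everyone correct) and the chance probability."""
    return (1.0 + chance_probability) / 2


print(optimal_difficulty(0.5))   # true/false item: 0.75
print(optimal_difficulty(0.25))  # four-option multiple-choice item: 0.625
```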

To maximize the ability of a test to discriminate among test takers, a test developer will want to include test items that vary in terms of difficulty. If the test developer wants to add more difficult items to her test, she will include items that have an item difficulty index of: Select one: A. .90. B. .50. C. .10. D. 0

c. CORRECT An item difficulty level of .10 indicates a difficult item (only 10% of examinees in the sample answered it correctly) and is the best answer of those given.

In a normal distribution, which of the following represents the lowest score? Select one: A. percentile rank of 20 B. z-score of -1.0 C. T score of 25 D. Wechsler IQ score of 70

c. CORRECT A T score is a standardized score with a mean of 50 and a standard deviation of 10. Therefore, a T-score of 25 is two and one-half standard deviations below the mean and represents the lowest score of those given in the answers.
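Putting all four answer options on the z-score scale makes the comparison explicit. The sketch assumes a normal distribution and the conventional Wechsler IQ scaling (mean 100, standard deviation 15), which the answer above does not state. The helper names are illustrative.

```python
from statistics import NormalDist


def z_from_percentile(percentile_rank):
    """z-score at a given percentile rank of a normal distribution."""
    return NormalDist().inv_cdf(percentile_rank / 100)


def z_from_scaled(score, mean, sd):
    """z-score for a standard score with a known mean and SD."""
    return (score - mean) / sd


z_scores = {
    "percentile rank 20": z_from_percentile(20),    # about -0.84
    "z-score -1.0": -1.0,
    "T score 25": z_from_scaled(25, 50, 10),        # -2.5
    "Wechsler IQ 70": z_from_scaled(70, 100, 15),   # -2.0
}
lowest = min(z_scores, key=z_scores.get)
print(lowest)  # T score 25
```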

The distribution of percentile ranks is always: Select one: A. the same as the shape of the distribution of raw scores. B. normal regardless of the shape of the distribution of raw scores. C. rectangular (flat) regardless of the shape of the distribution of raw scores. D. bimodal regardless of the shape of the distribution of raw scores.

c. CORRECT A distinguishing characteristic of percentile ranks is that their distribution is always rectangular (flat) regardless of the shape of the distribution of raw scores.

A test designed to measure knowledge of clinical psychology is likely to have the highest reliability coefficient when: Select one: A. the test consists of 30 items and the tryout sample consisted of individuals who are heterogeneous in terms of knowledge of clinical psychology. B. the test consists of 30 items and the tryout sample consisted of individuals who are homogeneous in terms of knowledge of clinical psychology. C. the test consists of 80 items and the tryout sample consisted of individuals who are heterogeneous in terms of knowledge of clinical psychology. D. the test consists of 80 items and the tryout sample consisted of individuals who are homogeneous in terms of knowledge of clinical psychology.

c. CORRECT All other things being equal, longer tests are more reliable than shorter tests. In addition, the reliability coefficient (like any other correlation coefficient) is larger when there is an unrestricted range of scores - i.e., when the tryout sample contains examinees who are heterogeneous with regard to the attribute(s) measured by the test.

Which type of reliability would be most appropriate for estimating the reliability of a multiple-choice speeded test? Select one: A. split-half B. coefficient of concordance C. alternate forms D. coefficient alpha

c. CORRECT Alternate-forms reliability is an appropriate method for establishing the reliability of speeded tests.

Criterion contamination has which of the following effects? Select one: A. It artificially increases scores on the criterion. B. It artificially reduces the criterion's reliability coefficient. C. It artificially increases the predictor's criterion-related validity coefficient. D. It artificially attenuates scores on the predictor and the criterion.

c. CORRECT Criterion contamination has the effect of artificially inflating the correlation between the predictor and the criterion.

A personnel director uses a mechanical aptitude test to hire machine shop workers. Several of the people hired using the test turn out to be less than adequate performers. These individuals are: Select one: A. true positives. B. true negatives. C. false positives. D. false negatives.

c. CORRECT False positives are individuals who are predicted to perform satisfactorily by the predictor but, in fact, perform poorly on the criterion. In other words, these individuals have been "falsely identified as positives."

It would be most important to assess the test-retest reliability of a measure that: Select one: A. is subjectively scored. B. assesses examinees' speed of responding. C. measures a stable trait. D. measures a characteristic that fluctuates over time.

c. CORRECT If a test is designed to measure a stable trait, you would want to make sure that scores are stable over time. Therefore, test-retest reliability would be important for this kind of test.

A final exam is developed to evaluate students' comprehension of information presented in a high school history class. When the exam is administered to three classes of students at the end of the semester, all students obtain failing scores. This suggests that the exam may have poor ________ validity. Select one: A. concurrent B. incremental C. content D. divergent

c. CORRECT If all students do poorly on a test designed to assess their mastery of the course content, one possible reason is that the test questions do not represent that content; i.e., the test does not have adequate content validity. (There are, of course, other possible reasons for the students' low scores but, of the answers given, this is the best one.)

In factor analysis, communality refers to: Select one: A. the proportion of variance accounted for in a single variable by a single factor. B. the proportion of variance accounted for in multiple variables by a single factor. C. the proportion of variance accounted for in a single variable by all of the identified factors. D. the total proportion of variance in all of the variables included in the analysis that is not due to error.

c. CORRECT In factor analysis, a communality is calculated for each test (variable) included in the analysis. The communality indicates the total amount of variability accounted for in the test by all of the identified factors. Additional information on the communality and other aspects of factor analysis that you want to be familiar with for the exam is provided in the Test Construction chapter of the written study materials.

A college freshman obtains a score of 150 on his English final exam, a score of 100 on his math exam, a score of 55 on his chemistry exam, and a score of 30 on his history exam. The means and standard deviations for these tests are, respectively, 125 and 20 for the English exam, 90 and 10 for the math exam, 45 and 5 for the chemistry exam, and 30 and 5 for the history exam. Based on this information, you can conclude that the young man's test performance was best on which exam? Select one: A. English B. math C. chemistry D. history

c. CORRECT In this case, the student's English score is equivalent to a z-score of +1.25, his math score is equivalent to a z-score of +1.0, his chemistry score is equivalent to a z-score of +2.0, and his history score is equivalent to a z-score of 0. Therefore, the student obtained the highest score on the chemistry test.
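The four z-scores in the answer can be reproduced with the usual formula z = (score - mean) / standard deviation. The dictionary layout below is just one convenient way to organize the numbers from the question.

```python
# (obtained score, mean, standard deviation) for each exam, from the question
exams = {
    "English": (150, 125, 20),
    "math": (100, 90, 10),
    "chemistry": (55, 45, 5),
    "history": (30, 30, 5),
}

z_scores = {name: (score - mean) / sd for name, (score, mean, sd) in exams.items()}
best = max(z_scores, key=z_scores.get)

print(z_scores)  # {'English': 1.25, 'math': 1.0, 'chemistry': 2.0, 'history': 0.0}
print(best)      # chemistry
```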

When a test user uses a correction for guessing formula that involves subtracting points from each examinee's scores, the resulting distribution of scores will have a ____________________ than the original (non-corrected) distribution. Select one: A. higher mean and larger standard deviation B. higher mean and smaller standard deviation C. lower mean and larger standard deviation D. lower mean and smaller standard deviation

c. CORRECT The effect of this type of correction for guessing formula on a distribution's mean is fairly easy to understand - i.e., use of the formula will result in reducing the scores of some examinees and, thereby, reduce the size of the mean. To understand its effect on the standard deviation, assume that the lowest possible score on a test is 0 and that the highest score is 100, which is obtained by at least one examinee. In this situation, as a result of the correction for guessing, some examinees will obtain scores lower than 0, while the highest scorer will still receive a score of 100. When this occurs, the range of scores will increase, and this will be reflected in the distribution's standard deviation.
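The answer does not give the correction formula itself. The sketch below assumes the commonly used version, corrected score = right - wrong / (options - 1), and applies it to three made-up examinees on a 100-item, four-option test to show the mean dropping and the standard deviation growing.

```python
from statistics import mean, pstdev


def corrected_score(right, wrong, options):
    """Assumed correction for guessing: subtract wrong / (options - 1)."""
    return right - wrong / (options - 1)


# Hypothetical (right, wrong) counts; the top scorer has no wrong answers
examinees = [(100, 0), (70, 30), (40, 60)]

raw = [right for right, _ in examinees]
corrected = [corrected_score(r, w, options=4) for r, w in examinees]

print(corrected)                       # [100.0, 60.0, 20.0]
print(mean(raw), mean(corrected))      # the corrected mean is lower
print(pstdev(raw), pstdev(corrected))  # the corrected standard deviation is larger
```

The top score is unchanged while lower scores drop, so the spread of the distribution widens.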

The item discrimination index (D) ranges in value from: Select one: A. 0 to 10. B. 0 to 50. C. -1.0 to +1.0. D. -50 to +50.

c. CORRECT The item discrimination index is calculated by subtracting the percent of examinees in the lower scoring group from the percent of examinees in the upper scoring group who answered the item correctly and ranges in value from -1.0 to +1.0.
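A small sketch of the calculation, working in proportions so the result falls between -1.0 and +1.0. The function name and the group counts are made up for illustration.

```python
def discrimination_index(upper_correct, upper_n, lower_correct, lower_n):
    """D = proportion correct in the upper group minus proportion correct in the lower group."""
    return upper_correct / upper_n - lower_correct / lower_n


# Hypothetical item: 16 of 20 upper-group and 6 of 20 lower-group examinees answer correctly
d = discrimination_index(16, 20, 6, 20)
print(round(d, 2))  # 0.5

# Extremes: everyone in one group correct and no one in the other
print(discrimination_index(20, 20, 0, 20))  # 1.0
print(discrimination_index(0, 20, 20, 20))  # -1.0
```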

c. CORRECT The maximum value for the standard error of measurement is the value of the standard deviation of the test scores. The standard error is equal to the standard deviation when the reliability coefficient is zero.
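The relationship can be illustrated with the usual formula for the standard error of measurement, SD times the square root of one minus the reliability coefficient. The function name and the example values are illustrative assumptions.

```python
from math import sqrt

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    return sd * sqrt(1 - reliability)

sd = 10.0
sem_no_reliability = standard_error_of_measurement(sd, 0.0)   # equals the SD
sem_perfect = standard_error_of_measurement(sd, 1.0)          # equals 0
sem_moderate = standard_error_of_measurement(sd, 0.75)        # between the two

print(sem_no_reliability, sem_perfect, sem_moderate)
```

With a reliability of 0 the standard error equals the standard deviation, and it shrinks toward 0 as reliability approaches 1.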

In the multitrait-multimethod matrix, which of the following coefficients provides information about a test's convergent validity? Select one: A. heterotrait-heteromethod B. heterotrait-monomethod C. monotrait-heteromethod D. monotrait-monomethod

c. CORRECT The monotrait-heteromethod coefficient is a measure of convergent validity. It indicates the correlation between the test that is being validated and another measure of the same trait (monotrait) that uses a different method of measurement (heteromethod).

The standard error of measurement is used to: Select one: A. estimate a test's "true" reliability coefficient. B. estimate a test's "true" criterion-related validity coefficient. C. calculate the range within which an examinee's true test score is likely to fall given her obtained score. D. calculate the range within which an examinee's true criterion score is likely to fall given her predicted criterion score.

c. CORRECT The standard error of measurement is an index of error and is used to construct an interval in which an examinee's true test score is likely to fall given his or her obtained test score.
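A short sketch of the interval calculation, using the values from the earlier question about Stella (obtained score of 50, standard error of measurement of 5). The function name is hypothetical, and 1.96 is the multiplier for a 95% interval.

```python
def confidence_interval(obtained: float, sem: float, z: float = 1.96):
    """Return (lower, upper) bounds: obtained score +/- z * SEM."""
    margin = z * sem
    return obtained - margin, obtained + margin

low, high = confidence_interval(50, 5)
print(round(low, 1), round(high, 1))  # 40.2 59.8
```

Rounding the multiplier to 2.0 gives the approximate interval of 40 to 60.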

The point at which an item characteristic curve intercepts the vertical (Y) axis provides information on which of the following? Select one: A. the item's difficulty level B. the item's ability to discriminate between low and high scorers C. the probability of answering the item correctly by guessing D. the item's ability to

c. CORRECT The vertical axis indicates the probability of choosing a correct response as a function of an examinee's ability level. The point at which the item characteristic curve intercepts the vertical axis indicates the probability of choosing the correct response by chance alone.

To evaluate the concurrent validity of a new selection test for clerical workers, you would: Select one: A. conduct a factor analysis to confirm that the test measures the attributes it was designed to measure. B. have supervisors and others familiar with the job rate test items for relevance to success as a clerical worker. C. administer the test to a sample of current clerical workers and correlate their scores on the test with their recently assigned performance ratings. D. administer the test to clerical workers when they are initially hired and six months after they are hired and then correlate the two sets of scores.

c. CORRECT To evaluate a test's criterion-related validity, scores on the predictor (in this case, the selection test) are correlated with scores on a criterion (measure of job performance). When scores on both measures are obtained at about the same time, they provide information on the test's concurrent validity.

Which of the following is used to estimate the effects of shortening or lengthening a test on the test's reliability coefficient? Select one: A. Cohen's kappa statistic B. Kuder-Richardson Formula 20 C. Cronbach's coefficient alpha D. Spearman-Brown formula

d. CORRECT Although the Spearman-Brown formula is probably most often used in conjunction with split-half reliability, it can actually be used whenever a test developer wants to estimate the effects of increasing or decreasing the number of test items on the test's reliability coefficient.
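For reference, the Spearman-Brown prophecy formula can be sketched as follows, where n is the factor by which the test length changes (2 for doubling, 0.5 for halving). The starting reliability of .60 is a hypothetical value.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Estimated reliability after changing test length by length_factor."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

doubled = spearman_brown(0.60, 2)    # test made twice as long
halved = spearman_brown(0.60, 0.5)   # test cut to half its length

print(round(doubled, 2), round(halved, 2))  # 0.75 0.43
```

Lengthening the test raises the estimated reliability and shortening it lowers the estimate, assuming the added or removed items are comparable to the existing ones.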

A personnel director hires all job applicants who obtain a high score on a job selection test but, after using the test for six months, realizes that many of the new employees are obtaining low performance ratings. Assuming that the selection test has adequate criterion-related validity, the personnel director can reduce the number of unsatisfactory workers that she hires using the test by: Select one: A. lowering the selection test cutoff score and the job performance rating cutoff score. B. raising the selection test cutoff score and the job performance rating cutoff score. C. lowering the selection test cutoff score. D. raising the selection test cutoff score.

d. CORRECT Applicants who are hired on the basis of their selection test scores but who perform poorly on the job are false positives. Raising the cutoff score on the selection test (predictor) should reduce the number of individuals who do poorly on the job - i.e., it will reduce the number of positives, including the number of false positives. Note that lowering the job performance rating (criterion) cutoff score would also reduce the number of false positives but that, in many work situations, an employer would not want to do this.
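The effect of moving the predictor cutoff can be seen in a small, entirely hypothetical data set. Each pair below is (selection test score, later performance rating), and a rating of 3 or higher is assumed to count as satisfactory.

```python
# Hypothetical applicants: (selection test score, job performance rating).
applicants = [(45, 2), (55, 2), (60, 4), (65, 2),
              (72, 4), (78, 3), (85, 5), (90, 2)]

def count_false_positives(data, test_cutoff, rating_cutoff=3):
    """Count people hired (score >= test_cutoff) who perform below rating_cutoff."""
    return sum(1 for score, rating in data
               if score >= test_cutoff and rating < rating_cutoff)

fp_low_cutoff = count_false_positives(applicants, test_cutoff=50)
fp_high_cutoff = count_false_positives(applicants, test_cutoff=70)

print(fp_low_cutoff, fp_high_cutoff)  # 3 1
```

Raising the test cutoff from 50 to 70 cuts the false positives from three to one in this sample, at the cost of rejecting one applicant (score 60, rating 4) who would have performed well.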

Content sampling is not a potential source of measurement error for which of the following methods for evaluating a test's reliability? Select one: A. coefficient alpha and alternate forms B. alternate forms and test-retest C. split-half only D. test-retest only

d. CORRECT Because test-retest reliability involves administering the same test (i.e., the same content) twice, content sampling is not a source of error.

The applicants for sales positions at the Acme Company complain that the selection test they are required to take is unfair because it doesn't "look like" it measures the knowledge and skills that are important for successful job performance. Their complaint suggests that the selection test is lacking which of the following? Select one: A. incremental validity B. differential validity C. construct validity D. face validity

d. CORRECT Face validity refers to the extent that a test appears to be valid to test-takers - i.e., to the extent that the test "looks like" it is measuring what it is supposed to be measuring.

In a normal distribution, a T score of ___ is equivalent to a percentile rank of 16. Select one: A. 10 B. 20 C. 30 D. 40

d. CORRECT In a normal distribution, a percentile rank of 16 and a T score of 40 are both one standard deviation below the mean.
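The equivalence can be verified with Python's standard library, using the fact that T scores have a mean of 50 and a standard deviation of 10. The function name is hypothetical.

```python
from statistics import NormalDist

def percentile_rank_from_t(t_score: float) -> float:
    """Convert a T score (mean 50, SD 10) to a percentile rank in a normal distribution."""
    z = (t_score - 50) / 10
    return 100 * NormalDist().cdf(z)

print(round(percentile_rank_from_t(40)))  # 16
print(round(percentile_rank_from_t(50)))  # 50
```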

The best way to control consensual observer drift is to: Select one: A. use the correction for attenuation formula. B. use a true experimental research design. C. videotape the observers. D. alternate raters.

d. CORRECT Of the actions described in the answers to this question, this one is the best way to alleviate consensual observer drift, which occurs when raters who are working together influence each other's ratings so that they assign ratings in increasingly similar (and idiosyncratic) ways.

In a multitrait-multimethod matrix, the coefficient that indicates a test's reliability is the _____________ coefficient. Select one: A. heterotrait-heteromethod B. heterotrait-monomethod C. monotrait-heteromethod D. monotrait-monomethod

d. CORRECT The monotrait-monomethod coefficient indicates the correlation of the test with itself and is a measure of the test's reliability.

Which of the following scores does not "belong with" the other three? Select one: A. stanine scores B. z-scores C. percentile ranks D. percentage scores

d. CORRECT The scores listed in answers a, b, and c are norm-referenced scores that permit an examinee's score to be compared to the scores of others who are taking or have taken the same test. In contrast, percentage scores are a type of criterion-referenced score that reference an examinee's score to the content of the exam and indicate how much of the content an examinee has mastered.

The minimum and maximum values of the standard error of estimate are: Select one: A. -1 and +1. B. 0 and 1. C. 0 and the standard deviation of the predictor. D. 0 and the standard deviation of the criterion.

d. CORRECT The standard error of estimate equals the standard deviation of the criterion scores times the square root of one minus the validity coefficient squared. This formula indicates that the standard error of estimate ranges from 0 (which occurs when the validity coefficient is 1.0) to the standard deviation of the criterion scores (which occurs when the validity coefficient is 0).
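The formula described in this answer can be sketched directly. The function name and the criterion standard deviation of 10 are assumptions chosen for illustration.

```python
from math import sqrt

def standard_error_of_estimate(sd_criterion: float, validity: float) -> float:
    """SEE = SD of criterion scores * sqrt(1 - validity coefficient squared)."""
    return sd_criterion * sqrt(1 - validity ** 2)

sd_y = 10.0
see_perfect = standard_error_of_estimate(sd_y, 1.0)   # minimum: 0
see_none = standard_error_of_estimate(sd_y, 0.0)      # maximum: SD of the criterion
see_moderate = standard_error_of_estimate(sd_y, 0.6)  # in between

print(see_perfect, see_none, round(see_moderate, 2))
```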

__________ refers to the extent to which individual test items contribute to the overall purpose of the test. Select one: A. Validity B. Reliability C. Discrimination D. Relevance

d. CORRECT This question describes relevance, which is determined by judging the extent to which each item assesses the target content or behavior domain and does so at the appropriate ability level.

Which of the following types of validity would you be most interested in when designing a selection test that will be used to predict the future job performance ratings of job applicants? Select one: A. discriminant B. content C. construct D. criterion-related

d. CORRECT When a test is being used to predict performance on a criterion, you would be most interested in the test's criterion-related validity (e.g., in its correlation with the criterion measure).

