Urbina ch 1-6 test questions
the magnitude of a reliability coefficient is more likely to be affected by the ___ than by the ____ of the sample for which it is computed
heterogeneity/ size
if a group of people today were to take the original Wechsler Adult Intelligence Scale (WAIS) and its latest revision, the WAIS-IV, according to the Flynn effect, chances are that their average IQ scores on the two tests would be
higher on the WAIS than on the WAIS-IV
a true score is
hypothetical entity
the essential characteristic of item response theory models is that they place ____ on a common scale
items and persons
for a pure speed test, the customary indexes of item difficulty and discrimination are ____ they are for a pure power test
less appropriate than
stanines are a type of standard score originally devised in order to
reduce the time and effort needed to process scores
if the range of values of either one of two variables that are correlated using the Pearson product moment coefficient of correlation (Pearson r) is restricted, the size of the obtained coefficient will be
reduced
with regard to the samples used to establish within-group norms, the single most important requirement is that they should be
representative of the group for which they will be used
forced choice items belong to the category of
selected response items
one of the most useful aspects of factor analysis, as applied to test validation research, is that the results of applying this technique can
simplify the interpretation and reporting of test scores
for testing and many other purposes, the quintessential index of the variability in a distribution of scores is the
square root of the variance
the single most important source of criteria for evaluating tests, testing practices, and the effects of test use can be found in the
standards for educational and psychological testing
criterion contamination is most likely to occur when
supervisors who rate job performance have access to employees' test scores
if the correlation between predictor and criterion is +1.00 or -1.00, the standard error of estimate is equal to
zero
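A minimal sketch of why the answer is zero, assuming the usual formula SE_est = SD_y * sqrt(1 - r^2). The function name and the criterion SD of 10 are illustrative choices, not values from the question.

```python
import math

def standard_error_of_estimate(sd_criterion: float, r: float) -> float:
    """Standard error of estimate: SD_y * sqrt(1 - r^2)."""
    return sd_criterion * math.sqrt(1.0 - r ** 2)

# sd_criterion = 10 is an arbitrary illustrative value.
for r in (0.0, 0.6, 1.0, -1.0):
    print(r, standard_error_of_estimate(10.0, r))
# With r = +1.00 or -1.00 the term (1 - r^2) is 0, so prediction error vanishes.
```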
evaluating psychological tests is least problematic a)prior to their being placed into use b)once they have been placed into use
A) prior to their being placed into use
nomothetic span refers to
a network of relationships between measures
one of the distinct advantages of generalizability theory over traditional approaches to score reliability is that generalizability theory
allows for the evaluation of interaction effects
which of the following score transformation procedures is the only one that qualifies as a linear transformation a) normalized standard scores b) z scores to T scores c) raw scores to percentile scores d) percentiles to stanines
b) z scores to t scores
compared to the other areas listed, the development of criteria or bases for decision making has been substantially slower in the context of
clinical assessment
one of the main advantages of IRT methodology is that it is ideally suited for use in ____ testing
computer adaptive
Peter read and studied only chapter 1 of his textbook the night before he took an exam that covered chapters 1-4. Peter answered 90% of the items on the exam correctly. his success is probably best explained by the concept of
content sampling error
which of the following is NOT one of the advantages of selected response over constructed response items? Selected response items a)are less prone to scoring errors b)make more efficient use of testing time c)are easier to quantify d)are easier to prepare
d) are easier to prepare
which of the following sources of error in test scores is not assessed by traditional estimates of reliability a)interscorer differences b)time sampling c) content sampling d) deviations from standardized procedures
d) deviations from standardized procedures
which of the following statements about the normal curve model is not true a) it is bilaterally symmetrical b) its limits extend to infinity c) its mean, median, and mode coincide d) it is multimodal
d) it is multimodal
the standard error of measurement of test A is 5 and the standard error of measurement of test B is 8. the standard error of the difference for comparing scores from the two tests will be a) less than 8 b) less than 5 c) between 5 and 8 d) greater than 8
d)greater than 8
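A quick check of this answer, assuming the standard formula SE_diff = sqrt(SEM_A^2 + SEM_B^2). The function name is illustrative.

```python
import math

def se_difference(sem_a: float, sem_b: float) -> float:
    """Standard error of the difference between scores on two tests."""
    return math.sqrt(sem_a ** 2 + sem_b ** 2)

se_diff = se_difference(5.0, 8.0)
print(round(se_diff, 2))  # sqrt(25 + 64) = sqrt(89), about 9.43
```

Because the two squared SEMs are added, the result is always larger than the larger of the two SEMs.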
item discrimination indexes are statistics primarily used to assess item
validity
which of the following is not an essential element of psychological testing a)systematic procedures b)the use of empirically derived standards c) preestablished rules for scoring d)sampling behavior from affective domains
D) sampling behavior from affective domains
_____ constitute the most widely used frame of reference for test score interpretation
norms
if one wished to produce a test that would result in maximum differentiation among test takers, one would aim for an average difficulty level (p value) of
.50
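One way to see why .50 is the target, assuming dichotomously scored items whose variance is p * (1 - p). The grid of p values is an illustrative choice.

```python
def item_variance(p: float) -> float:
    """Variance of a right/wrong item with proportion passing p."""
    return p * (1.0 - p)

# Illustrative grid of difficulty levels from .10 to .90.
p_values = [i / 10 for i in range(1, 10)]
best_p = max(p_values, key=item_variance)
print(best_p, item_variance(best_p))  # 0.5 0.25
```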
suppose that a student obtains a score of 110 on a test with M=100, SD=20, and an estimated reliability of .96. chances are 68 out of 100 that the student's true score falls somewhere between
106 and 114
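The arithmetic behind this answer, sketched under the usual assumptions: SEM = SD * sqrt(1 - reliability), and a 68% band of plus or minus 1 SEM around the obtained score. Function and variable names are illustrative.

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - r_xx)."""
    return sd * math.sqrt(1.0 - reliability)

obtained = 110
sem = standard_error_of_measurement(sd=20, reliability=0.96)  # 20 * 0.2 = 4
lower, upper = obtained - sem, obtained + sem
print(round(sem, 2), round(lower, 2), round(upper, 2))  # 4.0 106.0 114.0
```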
when transformed into the Wechsler scale type of deviation IQs, a z score of -1.00 would become a Wechsler IQ of
85
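The conversion can be sketched as a linear transformation, assuming the Wechsler deviation IQ scale has a mean of 100 and an SD of 15. The function name is illustrative.

```python
def wechsler_iq(z: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Linear transformation of a z score to a deviation IQ."""
    return mean + sd * z

print(wechsler_iq(-1.0))  # 85.0
```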
if test X has a mean of 500 and SD=100, assuming a normal distribution and N=1,000, about how many individuals would have scored between 300 and 700?
950
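A check of this answer using the standard library's `statistics.NormalDist` (Python 3.8+). Scores of 300 and 700 lie 2 SDs either side of the mean; the rule of thumb puts about 95% of cases there, and the exact normal-curve area is slightly higher.

```python
from statistics import NormalDist

dist = NormalDist(mu=500, sigma=100)
proportion = dist.cdf(700) - dist.cdf(300)  # area within +/- 2 SD
expected_count = round(1000 * proportion)
print(round(proportion, 4), expected_count)  # 0.9545 954
```

The answer of 950 reflects the rounded 95% figure.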
if the distribution of scores on a test fits the standard normal distribution model, the smallest difference in the performance of test takers would be between the scores that rank between the ____ percentiles a) 5th and 10th b) 20th and 25th c) 50th and 55th d) 90th and 95th
C) 50th and 55th
which of the following statements about criterion measures is NOT true a) criterion measures can differ in terms of their reliability and validity b) different criterion measures do not always correlate with each other c) the best criterion measures are usually available at the time of testing d) criterion measures may or may not generalize across different groups
C) the best criterion measures are usually available at the time of testing
which of the following is not one of the basic objectives toward which item response theory (IRT) is geared a) to provide max info about the trait levels of examinees b) to give examinees items that are tailored to their test traits c) to increase the number of items included in a test d) to minimize measurement error
C) to increase the number of items included in a test
the earliest antecedents of modern testing for personnel selection date back to
China, BCE
which of the following coefficients represents the strongest degree of correlation between two variables a) -.90 b)-.20 C)+.20 d)+.60
a) -.90
standard errors of estimate are used in order to gauge the
accuracy with which criteria are predicted
qualitative item analysis typically takes place
after the item pool is generated
credit for devising the first successful psychological test in the modern era is usually given to
Alfred Binet
of all the following developmental norms, which ones are the most universally applicable a) theory based ordinal scales b)mental age norms c) natural sequences d)grade based norms
c) natural sequences
from the standpoint of validation procedures, which of the following types of decisions is the most complex a) selection b) placement c) classification
c)classification
the primary purpose for which psychological tests are currently used is
decision making
the procedures involved in item analysis pertain primarily to test
developers
in order to gather discriminant validity evidence, one would correlate the scores of tests that purport to assess ____ constructs
different
the true ratio IQ or intelligence quotient was derived by
dividing the MA by the CA and multiplying the result by 100
evidence of validity that is based on test content and response processes is particularly applicable to
educational tests
validity is best described as the degree to which
evidence supports inferences from test scores
if the reliability of a test is well established, test users can assume that the scores obtained from that test will be reliable. true or false?
false
temperature scales, such as the Fahrenheit scale, are an excellent example of ____ scales
interval
a high raw score that is not accompanied by interpretive data is
meaningless
all other things being equal, scores obtained from longer tests are ____ those obtained from comparable tests that are shorter
more reliable than
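A rough illustration using the Spearman-Brown prophecy formula, which the card itself does not name: r_new = n * r / (1 + (n - 1) * r), where n is the factor by which the test is lengthened with comparable items. The starting reliability of .80 is an arbitrary example value.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Projected reliability when test length changes by length_factor."""
    n = length_factor
    return n * reliability / (1.0 + (n - 1.0) * reliability)

# Illustrative case: doubling a test whose reliability is .80.
doubled = spearman_brown(0.80, 2)
print(round(doubled, 3))  # 0.889
```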
compared to psychological testing, psychological assessment is generally
more varied in its methods
of all the people involved in the testing process, the ultimate responsibility for appropriate test use and interpretation resides in the test
user
the concepts of test ceiling and test floor are most closely related to the issue of
test difficulty
____ reliability coefficients are used to estimate time sampling error in test scores
test-retest
on a test of general cognitive ability, a 5 year old child obtains a mental age score of 4 years and a 10 year old child obtains a mental age score of 9 years. if one were to compute their IQs according to the original ratio IQ formula, the result would be
the 10 year old would obtain a higher IQ
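Worked out with the ratio IQ formula given earlier in this set (MA divided by CA, multiplied by 100). The function name is illustrative.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Original ratio IQ: 100 * MA / CA."""
    return 100 * mental_age / chronological_age

iq_five_year_old = ratio_iq(4, 5)
iq_ten_year_old = ratio_iq(9, 10)
print(iq_five_year_old, iq_ten_year_old)  # 80.0 90.0
```

Both children are one year behind, but the same one-year lag is a smaller fraction of the older child's age.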
Jim and Tim both took the same verbal ability test. on that test, Jim obtained a score that ranked at the 70th percentile. Tim got a T score of 70. if we just look at the position of these scores and assume that the score distribution of the reference group against which they are both being compared was normal, what can we correctly conclude
tim scored higher than jim
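A sketch of the comparison, assuming T scores have a mean of 50 and an SD of 10 and that the reference distribution is normal, as the question states. It uses `statistics.NormalDist` from the standard library (Python 3.8+); variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

# Tim: T score of 70 -> z = (70 - 50) / 10 = 2.0 -> percentile rank
tim_z = (70 - 50) / 10
tim_percentile = 100 * std_normal.cdf(tim_z)

# Jim: 70th percentile -> z -> equivalent T score
jim_z = std_normal.inv_cdf(0.70)
jim_t = 50 + 10 * jim_z

print(round(tim_percentile, 1))  # 97.7
print(round(jim_t, 1))           # 55.2
```

On either scale, Tim's score sits well above Jim's.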
if the shape of the distribution of scores obtained from a test is significantly and positively skewed, it means that the test was probably ____ for the test takers in question
too hard