Psychological Testing 1
Consider the case where nine persons earn $10,000 and a tenth person earns $910,000. What is their average income?
$100,000
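As a quick arithmetic check: for ten incomes to average $100,000 the total must be $1,000,000, so with nine incomes of $10,000 the tenth has to be $910,000. A minimal Python sketch (variable names are illustrative, not from the text) that also shows how far the mean sits from the median in such a skewed group:

```python
from statistics import mean, median

nine_incomes = [10_000] * 9
target_mean = 100_000

# Total needed for ten people to average the target, minus what nine already earn.
tenth_income = target_mean * 10 - sum(nine_incomes)

incomes = nine_incomes + [tenth_income]
group_mean = mean(incomes)
group_median = median(incomes)

print(tenth_income, group_mean, group_median)
```

The mean is pulled far above what any of the nine typical earners makes, which is why the median is usually the better summary for skewed data.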
Factor loadings can vary between
-1.0 and +1.0
In general, the optimal level of item difficulty is
.5
Many authors suggest that reliability should be at least ___ for decisions about individuals.
.90
A group of standard scores always possesses a mean of ___ and a standard deviation of ___.
0.0, 1.0
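The mean of 0 and standard deviation of 1 follow directly from how standard (z) scores are computed. A small sketch, assuming hypothetical raw scores and the population standard deviation:

```python
from statistics import mean, pstdev

def to_z_scores(raw_scores):
    """Convert raw scores to standard scores: z = (x - mean) / SD."""
    m = mean(raw_scores)
    sd = pstdev(raw_scores)  # population SD, so the z-scores have SD exactly 1
    return [(x - m) / sd for x in raw_scores]

raw = [12, 15, 9, 20, 14]  # hypothetical raw scores
z = to_z_scores(raw)

print(round(mean(z), 6), round(pstdev(z), 6))
```

Whatever raw scores go in, the resulting z-scores are centered on 0 with a spread of 1.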
The optimal value for the item-discrimination index is
0.5
Currently, ethnic minorities constitute about __________ of the U.S. population.
1/3
A C scale consists of ____ units.
11
Idiocy and Its Treatment by the Physiological Method was first published in ______ by _________.
1866, Seguin
The first translation of the Binet-Simon scales was completed in _____ by _________.
1910, Goddard
The latest revision of the Stanford-Binet was completed in
2003
In a normal distribution, approximately ____ percent of the scores will fall within one standard deviation of the mean in either direction.
68
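The 68 percent figure can be checked from the cumulative normal distribution. A minimal sketch using Python's standard library:

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Proportion of scores between one SD below and one SD above the mean.
within_one_sd = std_normal.cdf(1) - std_normal.cdf(-1)

print(round(within_one_sd * 100, 1))
```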
Full Scale IQ is ____ percent certain to be accurate within plus or minus 5 IQ points.
95
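A band of roughly plus or minus 5 points is what the standard error of measurement gives under plausible values. The sketch below assumes an IQ standard deviation of 15 and a reliability of .97; both numbers are illustrative assumptions, not figures from these notes.

```python
import math

sd = 15             # assumed IQ standard deviation
reliability = 0.97  # assumed Full Scale IQ reliability (hypothetical)

# Standard error of measurement: SEM = SD * sqrt(1 - r)
sem = sd * math.sqrt(1 - reliability)

# Half-width of a 95% confidence band around the obtained score.
half_width = 1.96 * sem

print(round(sem, 2), round(half_width, 2))
```

With lower reliability the band widens quickly, which is one reason high reliability is expected for decisions about individuals.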
Brass instruments tests recorded
All of the above
Leta Stetter Hollingworth is known for which of the following accomplishments?
All of the above
Which of the following is a desirable feature of an individually administered test?
All of the above
Regarding the true score, which statement is correct?
All of the above: we can never know the true score with certainty, we can derive a probability that the true score resides within a certain interval, and we can derive a best estimate of the true score
The mean of a standardized score is typically set at
All of the above?
Regarding a test manual, ethical guidelines indicate that test publishers
are required to publish a manual
_______________ share a common assumption that behavior is best understood in terms of clearly defined characteristics such as frequency, duration, antecedents, and consequences.
Behavioral procedures
In testing children, Binet warned scientists to be on the lookout for
Both suggestibility and failure of attention
The theoretical upper limit of the validity coefficient is constrained by the reliability of
Both the test and the criterion
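In classical test theory this ceiling is usually written as the square root of the product of the two reliabilities. A short sketch with hypothetical reliability values:

```python
import math

def max_validity(test_reliability, criterion_reliability):
    """Theoretical upper limit of the validity coefficient: sqrt(r_xx * r_yy)."""
    return math.sqrt(test_reliability * criterion_reliability)

# Hypothetical values: a fairly reliable test and a noisier criterion.
ceiling = max_validity(0.81, 0.64)

print(round(ceiling, 2))
```

Even a well-built test cannot correlate highly with a criterion that is itself measured unreliably.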
Individual tests of intelligence, projective personality tests, and neuropsychological test batteries are examples of Level ___ tests
C
What did Goddard report as the primary culprit of the low intelligence scores of immigrants?
Environmental deprivation
The first form of numerical rating scales can be traced to
Galen in the 2nd century
In developing his test, Rorschach was most heavily influenced by
Jung
On a __________ scale, respondents who endorse one statement also agree with milder statements pertinent to the same underlying continuum.
Guttman
The validity of a screening test is bolstered to the extent it possesses _________ sensitivity and _______ specificity.
High; high
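Sensitivity and specificity are simple proportions from a two-by-two table of screening results. A minimal sketch with hypothetical counts:

```python
def sensitivity(true_pos, false_neg):
    """Proportion of people with the condition whom the test correctly flags."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Proportion of people without the condition whom the test correctly clears."""
    return true_neg / (true_neg + false_pos)

# Hypothetical screening results: 50 people with the condition, 100 without.
sens = sensitivity(true_pos=45, false_neg=5)
spec = specificity(true_neg=90, false_pos=10)

print(sens, spec)
```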
When test takers are made aware, in language they can understand, of the reasons for testing, etc., this is called
Informed consent
In order to investigate potential sex bias in a test item, we would examine a(n)
Item-characteristic curve
__________________________ is a statistical index of how efficiently an item discriminates between persons who obtain high and low scores on the entire test.
Item-discrimination index
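One common formulation is the upper-lower index, d = (U - L) / N, where U and L are the numbers of high and low scorers answering correctly and N is the size of each group. A sketch with hypothetical counts:

```python
def discrimination_index(upper_correct, lower_correct, group_size):
    """Upper-lower item-discrimination index: d = (U - L) / N."""
    return (upper_correct - lower_correct) / group_size

# Hypothetical item: 40 of 50 high scorers and 20 of 50 low scorers answer correctly.
d = discrimination_index(upper_correct=40, lower_correct=20, group_size=50)

print(d)
```

The index runs from -1.0 to +1.0; a negative value means low scorers outperform high scorers on the item, which usually signals a flawed item.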
Which test is considered to have better norming?
MMPI-2
A form of ranking is found in ________ scales.
Ordinal
In _________ validity, test scores are used to estimate outcome measures
Predictive
________ validity is particularly relevant for entrance examinations and employment tests.
Predictive?
A(n) ________ scale has a conceptually meaningful zero point.
Ratio
For what kind of distribution would the highest number of persons score in the superior range?
Rectangular
For what kind of reliability is the Spearman-Brown formula relevant?
Split-half
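The Spearman-Brown formula steps up the correlation between two half-tests to estimate the reliability of the full-length test: r_SB = 2 * r_hh / (1 + r_hh). A minimal sketch with a hypothetical half-test correlation:

```python
def spearman_brown(half_test_correlation):
    """Estimate full-test reliability from the correlation between two halves."""
    r_hh = half_test_correlation
    return (2 * r_hh) / (1 + r_hh)

# Hypothetical correlation between odd-item and even-item half scores.
full_reliability = spearman_brown(0.70)

print(round(full_reliability, 3))
```

The correction is needed because each half is only half as long as the real test, and shorter tests are less reliable.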
Which of the following is an index of the internal consistency of a test or scale?
Split-half
The mean of _________ scores is always 5, and the standard deviation is approximately 2.
Stanine
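The mean of 5 and standard deviation of about 2 can be verified from the percentage of cases conventionally assigned to each stanine (4, 7, 12, 17, 20, 17, 12, 7, 4). A short sketch:

```python
import math

# Conventional percentage of cases assigned to stanines 1 through 9.
percentages = [4, 7, 12, 17, 20, 17, 12, 7, 4]
stanines = range(1, 10)

total = sum(percentages)
stanine_mean = sum(s * p for s, p in zip(stanines, percentages)) / total
variance = sum(p * (s - stanine_mean) ** 2 for s, p in zip(stanines, percentages)) / total
stanine_sd = math.sqrt(variance)

print(stanine_mean, round(stanine_sd, 2))
```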
_________ may have the long-term effect of pressuring African-American students to "protectively disidentify" with achievement in school and related intellectual domains
Stereotype threat
The classical theory of reliability is also known as the theory of
True and error scores
The "thought meter" was developed by
Wundt
A construct is
a theoretical, intangible quality or trait in which individuals differ
Under what circumstances is it considered ethical to ask a client to take a test such as the MMPI-2 home for completion?
almost never
A test is valid when the inferences made from it are
All of the above: appropriate, meaningful, and useful
Classical theory assumes that measurement errors
are not correlated with true scores
Which is the most comprehensive term?
assessing
Renorming of tests should
be the rule, not the exception
In Moore's adoption study, which group scored higher on IQ tests?
black children adopted into white families
A(n) ____________ effect is observed when significant numbers of examinees obtain perfect or near-perfect scores.
ceiling
The degree to which items on a test are representative of the universe of behavior the test was designed to sample is an index of ____________ validity.
content
A factor loading is actually a(n)
correlation
In a(n) _______________ test, the objective is to determine where the examinee stands with respect to very tightly defined educational objectives.
criterion-referenced
The text mentions that the following element(s) can be used to define informed consent from a legal standpoint.
All of the above: disclosure, competency, and voluntariness
Putting forth a variety of answers to a complex or fuzzy problem is an example of _________ thinking.
divergent
The major challenge with split-half reliability is
dividing the test into nearly equivalent halves
Perfect correlation
does not imply identical pre- and posttest scores for each examinee
It is common practice in test development that the prepublication version of a new instrument might contain _____________ the number of items desired on the final draft.
double
Why were group tests generally slow to catch on?
early versions had to be laboriously scored by hand
The Glasgow Coma Scale was developed by the method of
expert rankings
The only reason for building in ________ validity involves public relations.
face
In a ____________, the frequency of the class intervals is represented by single points rather than columns.
frequency polygon
The difference in underlying raw score points between percentiles of 90 and 99 is _________ that between percentiles of 50 and 59.
greater than
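Assuming scores are normally distributed, the gap can be expressed in standard deviation units by converting each percentile to a z-score. A minimal sketch:

```python
from statistics import NormalDist

std_normal = NormalDist()

def z_at_percentile(percentile):
    """z-score below which the given percentage of a normal distribution falls."""
    return std_normal.inv_cdf(percentile / 100)

gap_high = z_at_percentile(99) - z_at_percentile(90)  # tail of the distribution
gap_mid = z_at_percentile(59) - z_at_percentile(50)   # middle of the distribution

print(round(gap_high, 3), round(gap_mid, 3))
```

The same nine percentile ranks span several times more raw-score distance in the tail than near the middle, because scores are densely packed around the mean.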
According to the text, which kind of test generally requires the greatest vigilance from the examiner?
group and individual tests require equal vigilance
Regarding the publication of new or revised instruments, the most important guideline is to
guard against premature release of a test
The quantification or "common sense" approach to content validity advocated by the text
helps cull out existing items that are deemed inappropriate by expert raters
The standard error of the estimate is an index of the error of measurement caused by the ______________ of a test.
imperfect validity
Access to psychological tests is restricted because:
All of the above: in the hands of unqualified persons, psychological tests can cause harm; the selection process is rendered invalid for persons who preview test questions; and leakage of item content to the general public completely destroys the efficacy of a test
An important advantage of _________ tests is that the examiner can gauge the level of motivation of the examinee.
individual
When test takers are made aware, in language that they can understand, of the reasons for testing, etc., this is called
informed consent
A homogeneous scale is also referred to as
internally consistent
Which type of reliability method is best for a test that involves subjectivity of scoring?
interscorer
The more powerful and useful statistics should only be used with ___________ levels of measurement.
interval and ratio
A construct possesses the following characteristic(s):
Both: it cannot be operationally defined, and a network of predictions can be derived from theory about the construct
Another name for latent trait theory is
item response theory
A graphical display of the relationship between the probability of a correct response and the examinee's position on the underlying trait measured by the test is called
item-characteristic curve
If an examinee obtains a verbal score higher than his/her performance score, then the underlying true scores for verbal and performance abilities
may or may not show the same pattern
Which is the correct order for levels of measurement?
nominal, ordinal, interval, ratio
If all other factors are held constant, what effect does a strong practice effect have upon a test's reliability?
none
In a(n) _____________ test, the performance of each examinee is interpreted in reference to a relevant standardization sample.
norm-referenced
The ____________ is simply the normal distribution graphed in cumulative form.
normal ogive
Suppose we obtain the following response patterns on a multiple-choice question with correct answer "c":
                a    b    c    d    e
high-scorers    5    6   80    5    4
low-scorers    15   14   40   16   15
What needs to be done to improve this test item?
nothing, this is a good test item
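The judgment that this item works can be checked mechanically: the keyed answer should attract more high scorers than low scorers, and every distractor should attract more low scorers than high scorers. A sketch using the counts from the question, assuming each group contains the 100 examinees the counts add up to:

```python
# Response counts from the question; "c" is the keyed (correct) answer.
high = {"a": 5, "b": 6, "c": 80, "d": 5, "e": 4}
low = {"a": 15, "b": 14, "c": 40, "d": 16, "e": 15}
correct = "c"

group_size = sum(high.values())  # assumes both groups are the same size

# The keyed answer should favor high scorers.
d_index = (high[correct] - low[correct]) / group_size

# Each distractor should be more attractive to low scorers than to high scorers.
distractors_ok = all(low[opt] > high[opt] for opt in high if opt != correct)

print(d_index, distractors_ok)
```

Both checks pass here, which is consistent with leaving the item unchanged.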
What is the relationship between the variance and the standard deviation?
one can be computed from the other
What effect does an excessive emphasis on nationally normed achievement tests for selection and evaluation appear to promote?
outright fraud and cheating
When instructions for a task are neutral or nonthreatening, test-anxious subjects
perform just as well as low-anxious subjects
T score scales are especially common for __________ tests.
personality
The sources of measurement error are
potentially knowable for individual cases
A ______ test allows enough time for test takers to attempt all items, but is constructed so that no test taker is able to obtain a perfect score.
power
As a group, Native Americans may tend to emphasize ___________ more than European Americans.
present time
The "brass instruments" era was a dead end because
psychologists mistook simple sensory processes for intelligence
Which of the following is NOT true about the concept of variance?
psychometricians prefer variance to standard deviation as an index of variability
The necessary prerequisite(s) to administering a new test are:
All of the above: reading the manual, memorizing key elements of instructions, and rehearsing the test
What is the relationship between the reliability and the validity of a psychological test?
reliability is necessary but not sufficient for validity
When initial research indicates that an instrument produces skewed results in the standardization sample, test developers typically
revamp the test at the item level
A ______ test typically contains items of uniform and generally simple level of difficulty.
speed
The most commonly used statistical index of variability in a group of scores is the
standard deviation
In a _______ scale, all raw scores are converted to a single-digit system of scores ranging from 1 to 9.
stanine
The restriction imposed by Hollerith cards was a main impetus for the development of ________ scales.
stanine
A Likert scale is also referred to as a __________________.
summative scale
Methods for computing the reliability coefficient for a test involve
Both temporal stability and internal consistency
The essential objective of _________________ is to determine the distribution of raw scores in the norm group so that the test developer can publish derived scores known as norms.
test standardization
What concept is best summed up by the question, "Does use of this test result in better patient outcomes or more efficient delivery of services?"
test utility
What was the catalyst for the development of Binet and Simon's test?
the call for an instrument to identify cognitively impaired school children needing special instruction
The use of normalized standard scores is appropriate when
Both: the normative sample is large and representative, and the raw score distribution is only mildly non-normal
According to the functionalist perspective on test validity, a test is valid if
the test serves the purpose for which it is used
What approach do most test developers use in choosing a norm group?
they strive to make a good faith effort to select a representative sample
Which of the following was used as a situational test by the Office of Strategic Services during WWII?
All of the above: transporting equipment across a raging brook, scaling a ten-foot-high wall, and surviving a realistic interrogation
The expression ____________ refers to the phenomenon in which a test predicts a criterion less well when used on a new sample of subjects.
validity shrinkage
Regarding the true score, which statement is correct?
All of the above: we can never know the true score with certainty, we can derive a probability that the true score resides within a certain interval, and we can derive a best estimate of the true score
When are expert judges needed to determine the content validity of a test?
when the trait being measured is ill-defined
The test item writer's aim is to make all or nearly all considered guesses _________ guesses.
wrong