PSYC Chapter 8
A close friend, who is now a beauty school dropout, is heard to complain: "I spent all night studying 'Shampoo' for the final examination and there was not a single question on that subject!" As a budding expert in testing and assessment, you hear that complaint as A) "I have a problem with that test's content validity!" B) "There was excessive error variance in the test administration procedures!" C) "The instructor should have paid more attention to the test's construct validity!" D) "Now I am going to have to reconsider a career as a tanning technician!"
A) "I have a problem with that test's content validity!"
An item-difficulty index can range from A) 0 to 1. B) .10 to .99. C) .25 to .75. D) 0 to 100.
A) 0 to 1.
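To make the arithmetic behind the item-difficulty index concrete, here is a minimal sketch in Python; the response data are hypothetical, and p is simply the proportion of test takers answering the item correctly, which is why it is bounded by 0 and 1.

```python
# Item-difficulty index (p) for a dichotomously scored item
# (1 = correct, 0 = incorrect). The responses below are illustrative.
responses = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
p = sum(responses) / len(responses)
print(p)  # 0.7 -> a relatively easy item (p = 0 means no one correct, p = 1 means everyone correct)
```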
Estimates suggest that approximately _____ percent of the population might be asexual. A) 1 B) 2 C) 3 D) 4
A) 1
The test of asexuality developed by Yule et al. (2015) contains _____ items. A) 12 B) 18 C) 36 D) 48
A) 12
It is an online community of asexual individuals that has become a source for recruiting subjects for asexuality research. It is called the A) Asexual Visibility and Education Network. B) Friends of Asexuality. C) League of Asexual and Non-Sexual Individuals. D) American Society of Affiliated Individuals for Asexuality.
A) Asexual Visibility and Education Network.
Test item writers must keep many considerations in mind. Which of the following is not typically one of those considerations? A) Will the test be administered by an instructor or a teaching assistant? B) Which item format or formats should be employed? C) How many items should be written in total? D) What range of content should the items cover?
A) Will the test be administered by an instructor or a teaching assistant?
As described in the text, all of the following are elements of a matching item except A) a column listing propositions. B) a column listing responses. C) a column listing premises. D) a place to insert the correct number or letter choice.
A) a column listing propositions.
Which is an example of the selected-response item format? A) a multiple-choice item B) a fill-in-the-blank item C) Both a multiple-choice item and a fill-in-the-blank item are correct. D) None of the answers is correct.
A) a multiple-choice item
Ideally, the first draft of a test should include at least how many items as compared with the final version of the test? A) about twice the number of the final version B) about half the number of the final version C) about three times the number of the final version D) roughly the same number as the final version
A) about twice the number of the final version
Item branching refers to A) administering certain test items on a test depending on the test takers' responses to previous test items. B) the creation of alternate and parallel forms of tests based on a group of test takers' responses to the original test. C) statistical efforts to ensure that items translated into foreign languages are of the same difficulty. D) reusing items in an original test that were originally developed for use in a parallel test.
A) administering certain test items on a test depending on the test takers' responses to previous test items.
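As a rough illustration of item branching (the branching rule and item numbers below are invented for the example, not a specification of any particular adaptive testing system), the item presented next depends on the response to the previous item:

```python
# Hypothetical item-branching rule: branch to a harder item after a correct
# response and to an easier item after an incorrect one.
def next_item(current_item, answered_correctly):
    return current_item + 1 if answered_correctly else current_item - 1

item = 10                      # start at a mid-difficulty item
item = next_item(item, True)   # correct answer -> item 11
item = next_item(item, False)  # incorrect answer -> back to item 10
```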
One of the questions that the developer of a new test must answer is, "Should more than one form of the test be developed?" In answering this question, a primary consideration is A) development costs. B) test content. C) test reliability. D) item discrimination.
A) development costs.
Brotto and Yule expressed their belief that their new measure of asexuality A) does not depend on one's self-identification as asexual. B) is not capable of identifying the individual who exhibits characteristics of a lifelong lack of sexual attraction in the absence of personal distress. C) should be used with caution as a tool of recruitment with members of the asexuality population. D) All of the answers are correct.
A) does not depend on one's self-identification as asexual.
The higher the item-difficulty index, the _____ the item. A) easier B) harder C) more robust D) less robust
A) easier
The method of paired comparisons is used to A) force test takers to choose between two stimuli for each test item and analyze the response against that of a group of judges. B) maximize the opportunity of selecting a socially desirable response. C) obtain data that are presumed to be interval in nature, by transforming the test taker's responses into a direct paired scale. D) provide test takers with a limited number of pairs of choices in order to minimize testing time.
A) force test takers to choose between two stimuli for each test item and analyze the response against that of a group of judges.
The best type of item yields an item-characteristic curve that A) has a positive slope. B) has a negative slope. C) is leptokurtic. D) has few, if any, outliers.
A) has a positive slope.
An item-discrimination index typically compares performance of A) high scorers with low scorers on a particular item. B) medium scorers with low and high scorers on a particular item. C) all the lower scorers on a particular item. D) a group of test takers on one test item with their performance on another item.
A) high scorers with low scorers on a particular item.
A sensitivity review typically focuses on which of the following? A) individual test items B) the standardization sample C) statistics used as part of validity and reliability studies D) the extent to which latent traits are latent
A) individual test items
Criterion-referenced testing and assessment is most typically employed in A) licensing for occupations and professions. B) the diagnosis of reading difficulties. C) competition for scholarships. D) situations where the criteria required for success are vague.
A) licensing for occupations and professions.
Items for an item bank A) may be taken from existing tests. B) are always written especially for the item bank. C) have never before been administered. D) earn interest at prime minus one percent.
A) may be taken from existing tests.
Instruments that contain items that function differentially A) may have reduced validity. B) may have inflated reliability. C) are last to be banked in an item bank. D) are informally referred to as "DIFFED."
A) may have reduced validity.
When analyzing a particular item's discriminative abilities for an ability test, a test developer typically compares the item's responses A) of the highest and the lowest scorers on the test. B) of the highest and middle scorers on the test. C) to the performance on the test of minority groups to rule out any possible bias. D) of test takers from predefined age groups to rule out any possible age discrimination.
A) of the highest and the lowest scorers on the test.
A professor who asks a colleague to regrade a set of essay questions is most likely trying to address or prevent concerns about A) rater error. B) validity shrinkage. C) criterion-related validity. D) test-retest reliability.
A) rater error.
Multiple-choice items draw primarily on which test taker ability? A) recognition B) organization C) planning D) perceptual-motor skills
A) recognition
Computer adaptive testing has been found to A) reduce by as much as half the number of test items administered. B) increase the number of test items administered by as much as double. C) increase measurement error but within tolerable limits. D) increase inter-item consistency by as much as 50 percent.
A) reduce by as much as half the number of test items administered.
As part of the process of test development, the term test revision best refers to the A) rewording, deletion, or development of new items. B) development of a completely new test. C) reprinting of a test after a previous edition has sold out. D) Both rewording, deletion, or development of new items and development of a completely new test are correct.
A) rewording, deletion, or development of new items.
The process of differential item functioning (DIF) analysis entails A) scrutinizing item response curves for DIF items. B) interviewing people from different cultures. C) administering tests in different ways. D) Both interviewing people from different cultures and administering tests in different ways are correct.
A) scrutinizing item response curves for DIF items.
With regard to the test tryout phase of test development, A) test conditions should be as similar to the actual administration as possible. B) at least 500 subjects should be included to ensure accurate results. C) the sample used must be nationally representative. D) All of the answers are correct.
A) test conditions should be as similar to the actual administration as possible.
When an item-characteristic curve of an ability test item has an inverted U shape, it usually indicates that A) test takers of moderate ability have the highest probability of answering the item correctly. B) test takers of low ability have the highest probability of answering the item correctly. C) test takers of high ability have the highest probability of answering the item correctly. D) the item is working as well as any item on this test could be expected to work.
A) test takers of moderate ability have the highest probability of answering the item correctly.
As mentioned in the text, CAT is available on a wide array of platforms including A) the Internet. B) Xbox. C) PlayStation. D) All of the answers are correct.
A) the Internet.
The higher the item-reliability index, A) the higher the internal consistency of the test. B) the lower the internal consistency of the test. C) the more likely the test taker is to miss the item. D) the more likely the test developer is to eliminate the item.
A) the higher the internal consistency of the test.
In a cumulative model of scoring applied to an ability test A) the higher the total score, the higher the test taker is on the ability measured by the test. B) the pattern of responses is critically important when judging the ability of the test taker. C) comparisons of the test taker's performance on tests tapping similar abilities may easily be made. D) All of the answers are correct.
A) the higher the total score, the higher the test taker is on the ability measured by the test.
What is the value of the item-discrimination index for an item that all the students in the higher-scoring group answered correctly but that no one in the lower-scoring group answered correctly? A) -1 B) +1 C) .50 D) .25
B) +1
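A minimal sketch of how d is computed, assuming upper- and lower-scoring groups of equal size (the counts below are hypothetical): d is the difference between the number of correct responses in the two groups, divided by the number of test takers in each group.

```python
# Item-discrimination index d = (U - L) / n, where U and L are the numbers of
# test takers answering the item correctly in the upper and lower groups and
# n is the number of test takers in each group.
def discrimination_index(upper_correct, lower_correct, group_size):
    return (upper_correct - lower_correct) / group_size

print(discrimination_index(32, 0, 32))   # +1.0: all of the upper group, none of the lower
print(discrimination_index(16, 16, 32))  # 0.0: equal numbers in both groups
print(discrimination_index(5, 20, 32))   # negative d: low scorers outperform high scorers
```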
Item-discrimination indices can range from A) .001 to 1.00. B) -1 to +1. C) 0 percent to 100 percent. D) 1 to 100.
B) -1 to +1.
If 100 people take a test and 20 of those test takers answer a particular item correctly, then the p value of the item is A) .25. B) .20. C) .40. D) .04.
B) .20.
A test developer is designing a standardized test using a multiple-choice format. The final form of the test will contain 50 items. It would be advisable for the first draft of this test to contain, at least, how many items? A) 50 B) 100 C) 150 D) 25
B) 100
In the course of developing their asexuality measure, Brotto and Yule were able to identify about _____ percent of self-identified asexual individuals. A) 88 B) 93 C) 94 D) 97
B) 93
Which statement is true regarding an item-discrimination index? A) It provides a measure of the percent of people who said yes to or agreed with an item. B) A negative index value on a particular item calls for revising or eliminating the item. C) Tetrachoric correlation is most frequently used in estimating the item-discrimination index of each item. D) All of the answers are correct.
B) A negative index value on a particular item calls for revising or eliminating the item.
The concept of asexuality was first introduced by A) William Masters. B) Alfred Kinsey. C) Virginia Johnson. D) Both William Masters and Virginia Johnson are correct.
B) Alfred Kinsey.
In response to the need for an instrument to help identify individuals who have experienced a lifelong lack of sexual attraction, but who have never heard the term "asexual," Yule et al. (2015) developed a test called the A) Asexuality Evaluation Schedule. B) Asexuality Identification Scale. C) Asexual Research Subject Selector. D) None of the answers is correct.
B) Asexuality Identification Scale.
Which is true of item-characteristic curves (ICCs)? A) For items that are fair to different groups of test takers, the ICCs for these groups should be significantly different. B) Biased items exhibit different shapes of ICCs for different groups when the two groups do not differ in total test score. C) A steep slope of ICC tells us that test takers of moderate ability have the highest probability of answering the item correctly. D) They are used as an aid in determining the kurtosis of a distribution of test scores.
B) Biased items exhibit different shapes of ICCs for different groups when the two groups do not differ in total test score.
The following item appears on an end-of-semester course evaluation in a test and measurements course: The most interesting class I am taking this semester is "Tests and Measurements." The possible responses are 1. strongly agree, 2. agree, 3. unsure, 4. disagree, 5. strongly disagree. This item illustrates what approach to scaling? A) nomothetic B) Likert C) Guttman D) ipsative
B) Likert
Test items that contain alternatives with five points ranging from "strongly agree" to "strongly disagree" are characterized as using A) Guttman scaling. B) Likert scaling. C) Nielson scaling. D) Opinion scaling.
B) Likert scaling.
If all raw scores on a test are to be converted to scores that range only from 1 to 9, the resulting scale is referred to as A) a unidimensional scale. B) a stanine scale. C) a multidimensional scale. D) None of the answers is correct.
B) a stanine scale.
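One common way raw scores are mapped onto the nine-point stanine scale is through z-scores (stanine is approximately 2z + 5, clipped to the range 1 to 9); this is offered as an illustration of the idea, not necessarily the exact procedure described in the text.

```python
# Approximate z-score-to-stanine conversion: stanines have a mean of about 5
# and a standard deviation of about 2, and every converted score falls between 1 and 9.
def to_stanine(z):
    return max(1, min(9, round(2 * z + 5)))

print(to_stanine(-2.3), to_stanine(0.0), to_stanine(1.8))  # 1 5 9
```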
With regard to item-discrimination indices, a d equal to -1 is A) a test developer's dream. B) a test developer's nightmare. C) a test taker's dream. D) an insomniac's nightmare.
B) a test developer's nightmare.
Which scaling method entails a process by which measures of item difficulty are obtained from samples of test takers who vary in ability? A) difficulty scaling B) absolute scaling C) content scaling D) sample-contingent scaling
B) absolute scaling
A math test developer is interested in deriving an index of the difficulty of the average item for his math test. As his consultant on test development, you advise him that this index could be obtained by A) identifying the item deemed to be the average in difficulty and then deriving an item-difficulty index for that item. B) adding together the item-difficulty indices for all test items and then dividing by the total number of items on the test. C) dividing the total number of items on the test by the average item-difficulty index. D) finding the difference between the greatest and the least item-difficulty indices and then dividing by two.
B) adding together the item-difficulty indices for all test items and then dividing by the total number of items on the test.
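A brief sketch of that advice, using hypothetical item-difficulty indices:

```python
# Average item difficulty = mean of the individual item-difficulty indices.
p_values = [0.45, 0.60, 0.75, 0.30, 0.90]   # illustrative p values
average_difficulty = sum(p_values) / len(p_values)
print(round(average_difficulty, 2))  # 0.6
```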
A disadvantage of applying classical test theory (CTT) in test development is that A) the number of test takers in the sample must be very large. B) all CTT-based statistics are sample-dependent. C) assumptions underlying CTT use are weak. D) All of the answers are correct.
B) all CTT-based statistics are sample-dependent.
An item-difficulty index of 1 occurs when A) all examinees answer the item incorrectly. B) all examinees answer the item correctly. C) examinees are evenly divided between correct and incorrect responses. D) None of the answers is correct.
B) all examinees answer the item correctly.
Consider the following sample True-False item. "I am going to ace this course in psychological testing and assessment." Circle TRUE or FALSE according to your own belief. This item is an example of an item that A) is referred to in psychometric parlance as trinitarian in nature. B) can be used only when a dichotomous choice can be made without qualification. C) Both is referred to in psychometric parlance as trinitarian in nature and can be used only when a dichotomous choice can be made without qualification are correct. D) None of the answers is correct.
B) can be used only when a dichotomous choice can be made without qualification.
The item-validity index is key in determining A) construct validity. B) criterion-related validity. C) content validity. D) All of the answers are correct.
B) criterion-related validity.
Brotto and Yule reported that their measure of asexuality was developed in four stages. Which best characterizes Stage 1? A) literature search for definitions of asexuality B) development of open-ended questions C) literature search for correlates of asexuality D) writing and submission of a research grant request
B) development of open-ended questions
Scoring drift refers to A) the tendency of scorers to give higher scores to test takers with certain characteristics (such as age and gender) that is similar to themselves. B) differences between the typical scoring of an item during standardization and subsequent, more authoritative scoring of an item. C) a gradual decline in inter-scorer reliability after 95 percent of the examinations have been scored due to scorer fatigue. D) a flexible method of scoring test items for populations other than that of the standardization sample.
B) differences between the typical scoring of an item during standardization and subsequent, more authoritative scoring of an item.
A test item functions differently in one group of test takers as compared to another group of test takers known to have the same level of an underlying trait. This phenomenon is known as A) dysfunctional item syndrome. B) differential item functioning. C) differential item difference. D) differential item incongruity.
B) differential item functioning.
One of the advantages of computerized adaptive testing (CAT) is that A) all test items are administered to all test takers. B) floor effects are reduced. C) the ceiling has been removed. D) the basement has been finished.
B) floor effects are reduced.
A well-written true-false item A) includes multiple ideas. B) has a correct response that is either true or false, and not subject to debate. C) typically contains irrelevant information as a distracter. D) Both includes multiple ideas and has a correct response that is either true or false, and not subject to debate are correct.
B) has a correct response that is either true or false, and not subject to debate.
A good item on a norm-referenced achievement test is an item that A) demonstrates that the test taker has met certain pre-specified criteria. B) high scorers respond to correctly while low scorers respond to incorrectly. C) both high and low scorers respond to correctly. D) low scorers seek clarification regarding the meaning of the question.
B) high scorers respond to correctly while low scorers respond to incorrectly.
In ipsative scoring, a test taker's scores are compared to A) the scores of other test takers from the same geographic area who are similar with regard to key demographic variables. B) his or her scores on another scale on the same test. C) the scores of other test takers from past years who have taken the same test under the same or similar conditions. D) his or her other scores on a parallel form of the same test.
B) his or her scores on another scale on the same test.
Possible applications of IRT were discussed in your textbook. Which of the following is not one of those possible applications? A) determining measurement equivalence across test taker populations B) identifying a common metric among several tests measuring the same construct C) evaluating existing tests for the purpose of mapping test revisions D) developing item banks
B) identifying a common metric among several tests measuring the same construct
The greater the value of the item-discrimination index, the more test takers answered the item correctly in the higher-scoring group as compared to test takers A) who served as the nontesttaking control group. B) in the lower-scoring group. C) who participated in the test standardization. D) None of the answers is correct.
B) in the lower-scoring group.
With regard to the test revision process, it typically A) takes about one year to complete. B) includes all of the steps that the initial test development included. C) is much less expensive than the original development of a test. D) All of the answers are correct.
B) includes all of the steps that the initial test development included.
In item analysis, the term item-endorsement index refers to the percent of test takers who A) responded correctly to a particular item. B) indicated that they agreed with a particular item. C) passed the item on a pass/fail test of ability. D) consented to answer an optional item.
B) indicated that they agreed with a particular item.
An item-characteristic curve includes all of the following except A) information that can be used to judge item bias. B) information that can be used to correct item guessing. C) item-discrimination information. D) item-difficulty information.
B) information that can be used to correct item guessing.
An item-reliability index provides a measure of a test's A) test-retest reliability. B) internal consistency. C) stability. D) All of the answers are correct.
B) internal consistency.
The two columns of a matching item may contain a different number of items because this makes A) the odds of cheating successfully on this type of item significantly less. B) it more difficult to achieve a perfect score by guessing. C) the role of chance a much greater factor than it would be otherwise. D) it possible for test takers to decline to respond to certain items.
B) it more difficult to achieve a perfect score by guessing.
A test developer designs a test for the sole purpose of identifying the most highly skilled individuals among those tested. During the test revision stage of test development, the test developer will be particularly interested in A) item bias. B) item discrimination. C) item reliability. D) item validity.
B) item discrimination.
The inspiration to create a new test may come from many varied sources. Thinking of the illustrative descriptions of inspiration cited in your text, which of the following is not a possible source of inspiration for the creation of a new test? A) an emerging social phenomenon suggests the need for a psychological test B) legislation has been passed ordering the creation of a new psychological test C) a review of the literature on an existing test suggests a need for a new psychological test D) a test developer thinks "there is a need for this sort of test"
B) legislation has been passed ordering the creation of a new psychological test
If an item-discrimination index is negative, A) high scorers are more likely to answer the item correctly than low scorers. B) low scorers are more likely to answer the item correctly than high scorers. C) the alternate form of the test is probably not equivalent. D) the computer scoring is in error because this index is not supposed to be negative.
B) low scorers are more likely to answer the item correctly than high scorers.
A negative item-discrimination index results for a particular item when A) more high scorers than low scorers on a test get the item correct. B) more low scorers than high scorers on a test get the item correct. C) an item is found to be biased and unfair. D) most test takers do not enter the response keyed correct for the particular item.
B) more low scorers than high scorers on a test get the item correct.
Sorting techniques can be employed to develop A) nominal scales. B) ordinal scales. C) interval scales. D) All of the answers are correct.
B) ordinal scales.
As with the use of other rating scales, the use of Likert scales typically yields _____ data. A) nominal-level B) ordinal-level C) interval-level D) ratio-level
B) ordinal-level
Using the method of paired comparisons yields A) nominal-level data. B) ordinal-level data. C) interval-level data. D) ratio-level data.
B) ordinal-level data.
An item-characteristic curve A) is the single best index of guessing that a test user can use. B) plots the difficulty and discrimination of an item. C) Both is the single best index of guessing that a test user can use and plots the difficulty and discrimination of an item are correct. D) None of the answers is correct.
B) plots the difficulty and discrimination of an item.
In his article entitled "A Method of Scaling Psychological and Educational Tests," L. L. Thurstone introduced absolute scaling, which was a A) procedure for obtaining a measure of item validity. B) procedure for obtaining a measure of item difficulty. C) procedure for deriving equal-appearing intervals. D) procedure for divining item reliability.
B) procedure for obtaining a measure of item difficulty.
A student raises concern that a professor has given different grades to two essay answers that are very similar. From a psychometric perspective, the student is expressing concerns about A) criterion-related validity. B) rater error. C) test-retest reliability. D) parallel forms reliability.
B) rater error.
A "good" test item on an ability test is one A) to which almost all test takers respond correctly. B) that distinguishes high scorers from low scorers. C) to which almost all test takers respond incorrectly. D) in which it is absolutely impossible to guess the correct answer.
B) that distinguishes high scorers from low scorers.
When considering the effect of guessing, the optimal level of item difficulty is most typically A) .5. B) the midpoint between 1.0 and the chance of success by random guessing. C) .25. D) the midpoint between 0 and the chance of success by random guessing.
B) the midpoint between 1.0 and the chance of success by random guessing.
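The midpoint rule is easy to show with a short sketch (the function name is made up for the illustration): the chance of success by random guessing is 1 divided by the number of response options, and the optimal difficulty sits halfway between that value and 1.0.

```python
# Optimal item difficulty once guessing is considered: the midpoint between
# the probability of success by random guessing and 1.0.
def optimal_difficulty(n_options):
    chance = 1.0 / n_options
    return (1.0 + chance) / 2.0

print(optimal_difficulty(4))  # 0.625 for a four-option multiple-choice item
print(optimal_difficulty(2))  # 0.75 for a true-false item
```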
As a distribution of scores gets flatter, what happens to the optimal boundary line for determining higher- and lower-scoring groups for item-discrimination indices? A) the optimal boundary line gets smaller B) the optimal boundary line gets larger C) the optimal boundary line does not change D) the optimal boundary line ceases to be optimal
B) the optimal boundary line gets larger
You are interested in developing a test for social adjustment in a college fraternity or sorority. You begin by interviewing persons who had graduated from college after having been a member of a fraternity or sorority for at least two years. Which stage of the test development process best describes the stage that you are in? A) the test-tryout stage B) the pilot work stage C) the test construction stage D) None of the answers is correct.
B) the pilot work stage
An item-discrimination index is used on an ability test A) to determine whether items are measuring what they are designed to measure. B) to measure the difference between how many high scorers and how many low scorers answered the item correctly. C) to estimate how predictive the item is of the test taker's future performance. D) to measure the difference between how many median scorers and how many low scorers answered the item correctly.
B) to measure the difference between how many high scorers and how many low scorers answered the item correctly.
An advantage of using a true-false item format over a multiple-choice item format in a teacher-made test designed for classroom use is A) true-false items are applicable to a wider range of subject areas. B) true-false items are easier to write. C) true-false items reduce the odds of a correct answer as the result of guessing. D) true-false items will never become dated.
B) true-false items are easier to write.
Guttman scales A) are typically used with nominal categories. B) typically are constructed so that agreement with one statement may predict agreement with another statement. C) typically are constructed so that agreement with one statement should not be correlated with agreement with any other statement. D) were originally developed by a Peace Corps task force.
B) typically are constructed so that agreement with one statement may predict agreement with another statement.
Ideally, psychological or educational tests are revised A) every decade. B) when the test is no longer useful. C) as a function of annual test sales. D) None of the answers is correct.
B) when the test is no longer useful.
A decision is made to use only a few subjects per item during the test tryout phase of a test's construction. This decision is most likely to lead to A) "phantom factors" during test construction. B) "phantom factors" during the test administration. C) "phantom factors" during factor analysis. D) "phantom deposits" in the test author's royalty account.
C) "phantom factors" during factor analysis.
When the effect of guessing is taken into account, what is the optimal item-difficulty level for a true-false item? A) .50 B) .60 C) .75 D) 1.00
C) .75
If 50 students were administered a classroom test, how many would be included in each upper and lower group for the purpose of calculating d, the item-discrimination index? A) 25 B) 10 C) 13 D) 17
C) 13
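This answer follows the commonly cited rule of using the upper and lower 27 percent of scorers to form the extreme groups; a small sketch of that arithmetic (with the fractional result simply truncated, which is how the 13 in the keyed answer is obtained):

```python
# Upper- and lower-group size under the 27 percent rule.
n_examinees = 50
group_size = int(0.27 * n_examinees)  # 13 students in each of the upper and lower groups
print(group_size)
```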
A test developer has created a pool of 30 items and is ready for a test tryout. At a minimum, how many subjects should the test be administered to? A) 60 B) 120 C) 150 D) 180
C) 150
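The 150 follows directly from the five-subjects-per-item guideline cited in the following question:

```python
# Minimum tryout sample under the five-subjects-per-item rule of thumb.
n_items = 30
minimum_subjects = 5 * n_items
print(minimum_subjects)  # 150
```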
According to your textbook, for a test tryout, the minimum sample for each item on the test is A) one-half of the number of test takers in the standardization sample. B) 25 test takers. C) 5 test takers. D) 500 test takers.
C) 5 test takers.
Which statement best describes the relationship between item difficulty and a "good" item? A) The difficulty level is not a factor in determining a "good" item. B) An item with a high difficulty level is likely to be "good." C) An item with a mid-range difficulty level is likely to be "good." D) An item with a low difficulty level is likely to be "good."
C) An item with a mid-range difficulty level is likely to be "good."
A strategy for cheating on an examination entails one test taker memorizing items and later recalling and reciting them for the benefit of a future test taker. This cheating strategy may be countered by A) a computer-tailored test administration to each test taker. B) a computer-randomized presentation of test items. C) Both a computer-tailored test administration to each test taker and a computer-randomized presentation of test items are correct. D) None of the answers is correct.
C) Both a computer-tailored test administration to each test taker and a computer-randomized presentation of test items are correct.
A test developer of multiple-choice ability tests reviews data from a recent test administration. She discovers that all test takers who scored very high on the test as a whole responded to item 13 with the same incorrect choice. Accordingly, the test developer A) assumes that members of the high-scoring group are making some sort of unintended interpretation of item 13. B) plans to interview members of the high-scoring group to understand the basis for their choice. C) Both assumes that members of the high-scoring group are making some sort of unintended interpretation of item 13 and plans to interview members of the high-scoring group to understand the basis for their choice are correct. D) should remove item 13 from the test and place in its stead a note that reads: "Go to Item 14."
C) Both assumes that members of the high-scoring group are making some sort of unintended interpretation of item 13 and plans to interview members of the high-scoring group to understand the basis for their choice are correct.
The development of a criterion-referenced test usually entails A) exploratory work with a group of test takers who have mastered the material. B) exploratory work with a group of test takers who have not mastered the material. C) Both exploratory work with a group of test takers who have mastered the material and exploratory work with a group of test takers who have not mastered the material are correct. D) None of the answers is correct.
C) Both exploratory work with a group of test takers who have mastered the material and exploratory work with a group of test takers who have not mastered the material are correct.
To increase the precision of a test, test developers may have to A) increase the number of test items. B) increase the number of response options. C) Both increase the number of test items and increase the number of response options are correct. D) None of the answers is correct.
C) Both increase the number of test items and increase the number of response options are correct.
One of the questions that the developer of a new test must answer is, "How will the test be administered?" The answer to this question may be A) the test will be individually administered. B) the test will be group administered. C) Both the test will be individually administered and the test will be group administered are correct. D) None of the answers is correct.
C) Both the test will be individually administered and the test will be group administered are correct.
Factor analysis can help the test developer A) to eliminate or revise items that do not load on the predicted factor. B) to identify whether test items appear to be measuring the same construct. C) Both to eliminate or revise items that do not load on the predicted factor and to identify whether test items appear to be measuring the same construct are correct. D) None of the answers is correct.
C) Both to eliminate or revise items that do not load on the predicted factor and to identify whether test items appear to be measuring the same construct are correct.
Which is true of cross-validation of a test after standardization has occurred? A) Cross-validation creates confusion regarding the meaning of the original standardization data. B) The cross-validation sample is composed of the same test takers that participated in the original test standardization. C) Cross-validation often results in validity shrinkage. D) All of the answers are correct.
C) Cross-validation often results in validity shrinkage.
Which is not a typical question that is raised and answered during the test conceptualization stage of test development? A) What is the objective of the test? B) Is there a need for the test? C) How valid are the items on the test? D) What types of responses will be required of the test taker?
C) How valid are the items on the test?
Which is true of Thurstone's equal-appearing intervals method of scaling? A) It is relatively simple to construct. B) It demands that the test taker sort item responses into stacks of similar content. C) It uses judges' ratings to assign values to items. D) It is typically devised using proprietary software developed by Louis Thurstone's grandchildren.
C) It uses judges' ratings to assign values to items.
Which is true of item analysis on speed tests? A) Results of the item analysis are relatively easy to interpret and are clear. B) Item-difficulty levels are lower toward the end of the test. C) Item-discrimination levels are higher toward the end of the test. D) Later items tend to have low item-total correlations.
C) Item-discrimination levels are higher toward the end of the test.
An item bank is A) a computerized system whereby test items "pay dividends" only when used. B) the optimum combination of reliability and validity in an item. C) a set of test items from which a test can be constructed. D) a statistical item-discrimination index for data relating to high and low scorers on a test.
C) a set of test items from which a test can be constructed.
Having a large item pool available during test revision is A) a disadvantage due to the great expense of item development. B) often a waste of time because many of the items are eventually deleted. C) an advantage because poor items can be deleted in favor of the good items. D) a great perk for test developers who are swimming enthusiasts.
C) an advantage because poor items can be deleted in favor of the good items.
Jana takes a personality test administered by the "True Compatibility Dating Service." According to the personalized, computerized personality profile that results, Jana learns that her need for exhibitionism is much greater than her need for stability. Since the test analyzes data only with regard to Jana, and no other client of the dating service, it may be assumed that the test was scored using A) a diagnostic model. B) a cumulative model. C) an ipsative model of scoring. D) truly compatible models.
C) an ipsative model of scoring.
On a particular test, men and women tend to have the same total score. Men and women do, however, tend to exhibit different response patterns to specific items. A reasonable conclusion is that the test is A) unreliable. B) invalid. C) biased. D) scaled.
C) biased.
The Rokeach values measure involves presenting the subject with index cards, on each of which a single value is listed. Test takers are asked to place the cards in order of their own concern about each of the values. This procedure best exemplifies A) multidimensional scaling. B) Likert scaling. C) comparative scaling. D) Murray scaling.
C) comparative scaling.
A student complains that a midterm examination did not include items from a particular in-class lecture. From a psychometric perspective, the student is expressing concern about the midterm's A) test-retest reliability. B) internal consistency reliability. C) content validity. D) cross-validation.
C) content validity.
These tests are often used for the purpose of licensing persons in professions. The tests referred to here are A) pilot tests. B) norm-referenced tests. C) criterion-referenced tests. D) Guttman scales.
C) criterion-referenced tests.
The higher an item-validity index, the greater the _____ validity. A) construct B) content C) criterion-related D) face
C) criterion-related
In creating a test designed to measure personality constructs, the test developer's first step would best be to A) determine which items would lead to socially desirable responses. B) create a large pool of potential items. C) define the construct or constructs being measured. D) select a representative sample of test takers for test tryout.
C) define the construct or constructs being measured.
Test developers have at their disposal a number of statistical tools that may be applied when selecting items for use on a test. In Chapter 8's Meet an Assessment Professional, Dr. Scott Birkeland made reference to two such techniques. One was a measure of item discrimination, and the other was a measure of item A) reliability. B) utility. C) difficulty. D) variance.
C) difficulty.
Most classroom tests developed by instructors for use in their own classroom are A) subjected to formal procedures of psychometric evaluation. B) only evaluated formally for content validity. C) evaluated informally for their psychometric properties. D) used without modification, year after year, until retirement or death.
C) evaluated informally for their psychometric properties.
A sensitivity review panel would most likely be formed of A) only experts from the majority group. B) only experts from a particular minority group. C) experts representing both minority and majority groups. D) measurement specialists from all continents known for their sensitivity.
C) experts representing both minority and majority groups.
Brotto and Yule reported that their measure of asexuality was developed in four stages. Which best characterizes what they did during Stages 2 and 3? A) analysis of variance B) regression analysis C) factor analysis D) meta-analysis
C) factor analysis
An individually administered test designed for use with elementary-school-age students is in the test tryout stage of test development. For the purposes of the tryout, this test should be administered A) as a group test to as many classes as possible in an elementary school. B) individually to high school students for exploratory purposes. C) individually to elementary-school-age students in an environment that simulates the way that the final version of the test will be administered. D) to experts in elementary school education to ensure that the items are appropriate for elementary school-aged children.
C) individually to elementary-school-age students in an environment that simulates the way that the final version of the test will be administered.
Asexuality A) is a sexual orientation. B) is not a sexual orientation. C) is considered by some to be a sexual orientation and not by others. D) was delisted as a sexual orientation in the Diagnostic and Statistical Manual of Mental Disorders-V.
C) is considered by some to be a sexual orientation and not by others.
On a true-false inventory, a respondent selects true for an item that reads, "I summer in Tehran." The individual scoring the test would best interpret this response as indicative of the fact that this respondent A) is extremely eccentric with respect to choice of time shares. B) requires more sensation seeking than Cape Cod has to offer. C) is responding randomly to test items. D) None of the answers is correct.
C) is responding randomly to test items.
An analysis of a test's items may take many forms. Thinking of the descriptions cited in your text, which is not one of those forms? A) item validity analysis B) item discrimination analysis C) item tryout analysis D) item reliability analysis
C) item tryout analysis
A test item written in a multiple-choice format has three elements. Which of the following is not one of those elements? A) foil B) stem C) leaf D) correct option
C) leaf
Test developers calculate an item-validity index to A) understand why an item is difficult or easy. B) reduce the likelihood of an examinee's guessing. C) maximize the test's criterion-related validity. D) determine the internal consistency of the test.
C) maximize the test's criterion-related validity.
Expert panels may be used in the process of test development to A) provide judgments concerning each item's reliability. B) serve as expert witnesses in any future litigation. C) screen test items for possible bias. D) All of the answers are correct.
C) screen test items for possible bias.
Which is an example of the use of a completion item format on a test? A) true-false items B) matching items C) short-answer items D) multiple-choice item
C) short-answer items
The Likert scale is an example of which type of rating scale? A) categorical B) paired methods C) summative D) content
C) summative
In contrast to scaling methods that employ indirect estimation, scaling methods that employ direct estimation do not require A) writing two sets of items for parallel forms. B) the use of the method of equal-appearing intervals. C) transforming test taker responses into some other scale. D) indirect methods to interpret test taker responses.
C) transforming test taker responses into some other scale.
The term used to describe the decrease in item validities that typically occurs during cross-validation is A) validity detriment. B) validity decrement. C) validity shrinkage. D) cross-validation devaluation.
C) validity shrinkage.
In Guttman scaling, A) test takers are presented with a forced-choice format. B) each item is completely independent of every other item and nothing can be concluded as a result of the endorsement of an item. C) when one item is endorsed by a test taker, the less extreme aspects of that item are also endorsed. D) when more than one item tapping a particular content area is endorsed, the less extreme aspects of those items are eliminated.
C) when one item is endorsed by a test taker, the less extreme aspects of that item are also endorsed.
What is the value of the item-discrimination index for an item answered correctly by an equal number of students in the higher- and lower-scoring groups? A) -1 B) +1 C) .50 D) 0
D) 0
A disadvantage of recruiting asexual research subjects from a single online community is that A) the persons belonging to the online community may constitute a unique group within the asexual population. B) the persons belonging to the online community have already acknowledged their asexuality as an identity. C) asexual individuals who do not belong to the community will be systematically omitted. D) All of the answers are correct.
D) All of the answers are correct.
A test manual for a commercially prepared test should ideally include A) a description of the test development procedures used. B) test-retest reliability data. C) internal-consistency reliability data. D) All of the answers are correct.
D) All of the answers are correct.
Ability tests are typically standardized on a sample that is representative of the general population and selected on the basis of variables such as A) age. B) gender. C) geographic region. D) All of the answers are correct.
D) All of the answers are correct.
According to Brotto and Yule, their new measure of asexuality performed satisfactorily on A) a measure of incremental validity. B) a measure of convergent validity. C) a measure of discriminant validity. D) All of the answers are correct.
D) All of the answers are correct.
An analysis of item alternatives for a multiple-choice test can yield information about A) the effectiveness of distractor choices. B) which items are in need of revision. C) test taker response patterns. D) All of the answers are correct.
D) All of the answers are correct.
As a result of a sensitivity review, items containing _____ may be eliminated from a test. A) offensive language B) stereotypes C) unfair reference to situations D) All of the answers are correct.
D) All of the answers are correct.
Brotto and Yule established the discriminant validity of their measure of asexuality by comparing scores on it with scores on A) the Childhood Trauma Questionnaire. B) the Short-Form Inventory of Interpersonal Problems-Circumplex scales. C) the Big-Five Inventory. D) All of the answers are correct.
D) All of the answers are correct.
In general, what can be said about an item analysis of a speed test? A) Results are often misleading and difficult to interpret. B) Item-difficulty levels are higher toward the end of the test. C) Item-discrimination levels are higher for later items. D) All of the answers are correct.
D) All of the answers are correct.
It is a term that is used to refer to the preliminary research surrounding the creation of a prototype of a test. Which of the following best describes that term? A) pilot work B) pilot study C) pilot research D) All of the answers are correct.
D) All of the answers are correct.
Item analysis is conducted to evaluate A) item reliability. B) item validity. C) item difficulty. D) All of the answers are correct.
D) All of the answers are correct.
Likert scales measure attitudes using continuums. A continuum of items measuring _____ could be used for a Likert scale. A) like it to do not like it B) agree to disagree C) approve to do not approve D) All of the answers are correct.
D) All of the answers are correct.
The "think aloud" test administration format A) has examinees literally thinking aloud as they respond to each item on a test. B) is a qualitative technique. C) can help test developers understand how an examinee interprets particular items. D) All of the answers are correct.
D) All of the answers are correct.
The elements of a multiple-choice item include A) a stem. B) distractors. C) foils. D) All of the answers are correct.
D) All of the answers are correct.
The idea for a new test may come from A) social need. B) review of the available literature. C) common sense appeal. D) All of the answers are correct.
D) All of the answers are correct.
The so-called "smiley face" scales may be used with A) young children. B) adolescents who have limited language skills. C) adults who have limited language skills. D) All of the answers are correct.
D) All of the answers are correct.
To calculate an item-reliability index, one must have previously calculated A) the correlation between the item score and the criterion. B) the correlation between the item score and the total score. C) the item-score standard deviation. D) All of the answers are correct.
D) All of the answers are correct.
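A minimal sketch of the computation, with illustrative numbers: the item-reliability index is the product of the item-score standard deviation and the correlation between the item score and the total test score.

```python
# Item-reliability index = item-score standard deviation x item-total correlation.
item_sd = 0.48         # illustrative item-score standard deviation
item_total_r = 0.35    # illustrative correlation between item score and total test score
item_reliability_index = item_sd * item_total_r
print(round(item_reliability_index, 3))  # 0.168
```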
When a test is translated from one language in one culture to another language in another culture, _____ can help ensure that the original test and the translated test are reasonably equivalent and tap the same construct. A) a translator B) item response theory C) bilingual people who are experts on the two cultures D) All of the answers are correct.
D) All of the answers are correct.
When testing is conducted by means of a computer within a CAT context, it means that A) a test taker's response to one item may automatically trigger what item will be presented next. B) testing may be terminated based on some preset number of consecutive item failures. C) testing may be terminated based on some preset, maximum number of items being administered. D) All of the answers are correct.
D) All of the answers are correct.
Which of the following conditions may lead to the decision to revise a psychological or educational test? A) item content, including the vocabulary used in instructions and pictures, has become dated B) test norms no longer represent the population for which the test is designed C) reliability and validity of a test can be improved by a revision D) All of the answers are correct.
D) All of the answers are correct.
Which statement is true regarding test development and test taker guessing? A) Methods have been designed to detect guessing. B) Methods have been designed to statistically correct for guessing. C) Methods have been designed to minimize the effects of guessing. D) All of the answers are correct.
D) All of the answers are correct.
An example of a selected-response type of item is A) a multiple-choice item. B) an essay item. C) a matching item. D) Both a multiple-choice item and a matching item are correct.
D) Both a multiple-choice item and a matching item are correct.
Co-validation is A) highly recommended for evaluating item validity shrinkage. B) also referred to as co-norming. C) a strategy that can save time and money for the test publisher. D) Both also referred to as co-norming and a strategy that can save time and money for the test publisher are correct.
D) Both also referred to as co-norming and a strategy that can save time and money for the test publisher are correct.
To ensure consistency in scoring, test developers have employed A) anchor protocols. B) resolvers. C) revolvers. D) Both anchor protocols and resolvers are correct.
D) Both anchor protocols and resolvers are correct.
As part of the test development process, a test revision may entail A) rewording, deletion, or development of new items. B) development of a new edition of a test. C) the reprinting of a test. D) Both rewording, deletion, or development of new items and development of a new edition of a test are correct.
D) Both rewording, deletion, or development of new items and development of a new edition of a test are correct.
It is needed to calculate the item-validity index. It is A) the correlation between the item score and the criterion score. B) the mean of the item-score distribution. C) the item-score standard deviation. D) Both the correlation between the item score and the criterion score and the item-score standard deviation are correct.
D) Both the correlation between the item score and the criterion score and the item-score standard deviation are correct.
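The computation parallels the item-reliability index sketched earlier, except that the item-criterion correlation replaces the item-total correlation (values again illustrative):

```python
# Item-validity index = item-score standard deviation x item-criterion correlation.
item_sd = 0.48            # illustrative item-score standard deviation
item_criterion_r = 0.25   # illustrative correlation between item score and criterion score
item_validity_index = item_sd * item_criterion_r
print(round(item_validity_index, 3))  # 0.12
```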
Which is a major difference between comparative scaling and categorical scaling? A) Comparative scaling involves sorting stimuli; categorical scaling does not. B) Comparative scaling involves making quantitative judgments; categorical scaling does not. C) Comparative scaling involves putting stimulus cards in a set number of different piles assigned a certain meaning; categorical scaling does not. D) Comparative scaling involves rank-ordering each stimulus individually against every other stimulus; categorical scaling does not.
D) Comparative scaling involves rank-ordering each stimulus individually against every other stimulus; categorical scaling does not.
Which is a major difference between multiple-choice questions and essay questions? A) Essay questions involve primarily recognition, while multiple-choice questions involve logical reasoning. B) Essay questions are scored more objectively because the examiner is provided with more information by an examinee. C) Essay questions can test a wider range of material than multiple-choice questions. D) Essay questions allow for more creativity to be expressed by an examinee.
D) Essay questions allow for more creativity to be expressed by an examinee.
Which statement is true of guessing? A) It occurs more often on achievement tests than on personality tests. B) It poses methodological problems for the test taker. C) Most test takers guess based on little knowledge of the subject matter. D) It poses methodological problems for the test developer.
D) It poses methodological problems for the test developer.
In order to determine whether their new measure of asexuality was useful over and above already-available measures of sexual orientation, Brotto and Yule compared it to a previously established measure of sexual orientation called the A) Sexual Desire Inventory. B) Solitary Desire subscale of the Sexual Desire Inventory. C) Abernathy Measure of Sexual Orientation. D) Klein Scale.
D) Klein Scale.
A student makes the following complaint after taking an exam: "I spent all night studying Chapter 7 and there wasn't even one test question from that chapter!" From a psychometric perspective, this student is concerned about the exam's A) error variance. B) test-retest reliability. C) rater error. D) None of the answers is correct.
D) None of the answers is correct.
According to the text, which statement is true of scaling? A) There is only one best approach to scaling and only one best type of scale. B) Ratio scaling leads to the least scoring drift. C) Ratio scaling was first developed in the Republic of Samoa. D) None of the answers is correct.
D) None of the answers is correct.
During the norming of a new intelligence test, a test publisher administers to all of the test takers not only the new intelligence test but also a vision test using an eye chart. The publisher has engaged in A) test conceptualization. B) cross-validation. C) shared validation. D) None of the answers is correct.
D) None of the answers is correct.
Item banks A) were once a profit center for the Wells Fargo Company. B) originated as a result of investments made by Morgan-Stanley. C) originated as a result of investments made by Morgan Freeman. D) None of the answers is correct.
D) None of the answers is correct.
On the item characteristic curves for a test of ability, a large number of items biased in favor of male test takers is found to coexist with the exact same number of items biased in favor of female test takers. Based on these findings, it would be reasonable for the test developer to claim that the test A) measures the same ability in the two groups. B) is a fair test as any observed bias balances out. C) demonstrates gender equality for the ability measured. D) None of the answers is correct.
D) None of the answers is correct.
Who is best associated with the development of the scaling methodology? A) Galton B) Cohen C) Spearman D) Thurstone
D) Thurstone
All of the following are components of a multiple-choice item except A) a foil. B) a correct alternative. C) a stem. D) a branch.
D) a branch.
An anchor protocol is A) a previously developed test with known validity that can be used as a comparison for newly developed tests. B) a statistical procedure in which weights are assigned to each item of a model test to maximize predictive validity. C) a list of guidelines for a standardized test used to ensure that all test takers are similar in key ways to the population of the original standardization sample. D) a model for scoring and a mechanism for resolving scoring discrepancies.
D) a model for scoring and a mechanism for resolving scoring discrepancies.
An item-endorsement index is most likely to be used in which type of test? A) a cognitive test B) an achievement test C) a vocational aptitude test D) a personality test
D) a personality test
An advantage of applying item response theory (IRT) in test development is that A) the principles underlying IRT make its application easy and appealing. B) sample sizes used to test the utility of test items can be relatively small. C) assumptions underlying IRT usage are weak. D) item statistics are independent of the samples that have been administered the test.
D) item statistics are independent of the samples that have been administered the test.
All of the following are methods of evaluating item bias except A) noting differences between the item-characteristic curves. B) noting differences in the item-difficulty levels. C) noting differences in the item-discrimination indices. D) noting differences in validity shrinkage.
D) noting differences in validity shrinkage.
In the field of psychometrics, pilot work refers to the A) job of someone whose responsibility is to fly an airplane, jet, or space vehicle. B) preliminary research entailed in finalizing the form of a test. C) efforts of the lead researcher on a test development team. D) preliminary research conducted prior to the stage of test construction.
D) preliminary research conducted prior to the stage of test construction.
As illustrated in the sample item-characteristic curve published in your textbook, the vertical axis on the graph lists the A) values of the score on the test ranging from 0 to 100. B) values of the characteristic of the items on a scale of 1 to 10. C) heteroscedasticity of the item curve in values ranging from 0 to infinity. D) probability of correct response in values ranging from 0 to 1.
D) probability of correct response in values ranging from 0 to 1.
Human asexuality is generally defined as A) the absence of sexual attraction to anything at all. B) a sexual attraction only to other asexual people. C) an unwillingness or inability to experience sexual arousal. D) the absence of sexual attraction to anyone at all.
D) the absence of sexual attraction to anyone at all.
On an item characteristic curve, the steeper the curve, A) the more latent the trait is presumed to be. B) the greater the item reliability. C) the less the item discrimination. D) the greater the item discrimination.
D) the greater the item discrimination.
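To see why a steeper curve means greater discrimination, here is a hedged illustration using the two-parameter logistic (2PL) model from item response theory, one common way an item-characteristic curve is expressed (the parameter values are hypothetical, and this is not necessarily the specific model used in the text): the a parameter controls the slope, and a larger a separates test takers of slightly different ability more sharply.

```python
import math

# 2PL item-characteristic curve: P(theta) is the probability of a correct
# response; a = discrimination (slope), b = difficulty (location).
def icc(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Near theta = b, the steeper (higher-a) item changes probability faster,
# so it distinguishes test takers around that ability level more sharply.
print(round(icc(0.5, a=0.8, b=0.0), 3))  # shallow item: ~0.599
print(round(icc(0.5, a=2.5, b=0.0), 3))  # steep item:   ~0.777
```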