PSYC Chapter 8


A close friend, who is now a beauty school dropout, is heard to complain: "I spent all night studying 'Shampoo' for the final examination and there was not a single question on that subject!" As a budding expert in testing and assessment, you hear that complaint as A) "I have a problem with that test's content validity!" B) "There was excessive error variance in the test administration procedures!" C) "The instructor should have paid more attention to the test's construct validity!" D) "Now I am going to have to reconsider a career as a tanning technician!"

A) "I have a problem with that test's content validity!"

An item-difficulty index can range from A) 0 to 1. B) .10 to .99. C) .25 to .75. D) 0 to 100.

A) 0 to 1.

Estimates suggest that approximately _____ percent of the population might be asexual. A) 1 B) 2 C) 3 D) 4

A) 1

The test of asexuality developed by Yule et al. (2015) contains _____ items. A) 12 B) 18 C) 36 D) 48

A) 12

It is an online community of asexual individuals that has become a source of recruitment of subjects for asexuality research. It is called the A) Asexuality and Visibility Education Network. B) Friends of Asexuality. C) League of Asexual and Non-Sexual Individuals. D) American Society of Affiliated Individuals for Asexuality.

A) Asexuality and Visibility Education Network.

Test item writers must keep many considerations in mind. Which of the following is not typically one of those considerations? A) Will the test be administered by an instructor or a teaching assistant? B) Which item format or formats should be employed? C) How many items should be written in total? D) What range of content should the items cover?

A) Will the test be administered by an instructor or a teaching assistant?

As described in the text, all of the following are elements of a matching item except A) a column listing propositions. B) a column listing responses. C) a column listing premises. D) a place to insert the correct number or letter choice.

A) a column listing propositions.

Which is an example of the selected-response item format? A) a multiple-choice item B) a fill-in-the-blank item C) Both a multiple-choice item and a fill-in-the-blank item are correct. D) None of the answers is correct.

A) a multiple-choice item

Ideally, the first draft of a test should include at least how many items as compared with the final version of the test? A) about twice the number of the final version B) about half the number of the final version C) about three times the number of the final version D) roughly the same number as the final version

A) about twice the number of the final version

Item branching refers to A) administering certain test items on a test depending on the test takers' responses to previous test items. B) the creation of alternate and parallel forms of tests based on a group of test takers' responses to the original test. C) statistical efforts to ensure that items translated into foreign languages are of the same difficulty. D) reusing items in an original test that were originally developed for use in a parallel test.

A) administering certain test items on a test depending on the test takers' responses to previous test items.
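Item branching can be illustrated with a deliberately simplified rule: a correct answer routes the test taker to a harder item and an incorrect answer to an easier one. This is a minimal, hypothetical sketch. The item pool (indexed from easiest to hardest), the `answer` callback, and the stop rule are assumptions made for illustration. Operational computer adaptive tests usually choose items from item response theory estimates rather than a fixed one-step staircase.

```python
from typing import Callable, List


def branching_sequence(
    pool_size: int, answer: Callable[[int], bool], max_items: int
) -> List[int]:
    """Return the indices of administered items, in order (hypothetical rule).

    Items are assumed to be indexed from easiest (0) to hardest (pool_size - 1).
    Testing starts with an item of middling difficulty; each correct response
    moves one item harder, each incorrect response one item easier.
    """
    administered: List[int] = []
    current = pool_size // 2  # start in the middle of the difficulty range
    while len(administered) < max_items:
        administered.append(current)
        nxt = current + 1 if answer(current) else current - 1
        # Stop if we would leave the pool or repeat an item: at that point the
        # test taker's level has been bracketed between two adjacent items.
        if nxt < 0 or nxt >= pool_size or nxt in administered:
            break
        current = nxt
    return administered


# A hypothetical test taker who can answer items 0 through 5 but not 6 and up.
print(branching_sequence(9, lambda i: i < 6, 10))  # [4, 5, 6]
```

Only three of nine items are administered here, because each response decides which item comes next.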

One of the questions that the developer of a new test must answer is, "Should more than one form of the test be developed?" In answering this question, a primary consideration is A) development costs. B) test content. C) test reliability. D) item discrimination.

A) development costs.

Brotto and Yule expressed their belief that their new measure of asexuality A) does not depend on one's self-identification as asexual. B) is not capable of identifying the individual who exhibits characteristics of a lifelong lack of sexual attraction in the absence of personal distress. C) should be used with caution as a tool of recruitment with members of the asexuality population. D) All of the answers are correct.

A) does not depend on one's self-identification as asexual.

The higher the item-difficulty index, the _____ the item. A) easier B) harder C) more robust D) less robust

A) easier

The method of paired comparisons is used to A) force test takers to choose between two stimuli for each test item and analyze the response against that of a group of judges. B) maximize the opportunity of selecting a socially desirable response. C) obtain data that are presumed to be interval in nature, by transforming the test taker's responses into a direct paired scale. D) provide test takers with a limited number of pairs of choices in order to minimize testing time.

A) force test takers to choose between two stimuli for each test item and analyze the response against that of a group of judges.

The best type of item yields an item-characteristic curve that A) has a positive slope. B) has a negative slope. C) is leptokurtic. D) has few, if any, outliers.

A) has a positive slope.

An item-discrimination index typically compares performance of A) high scorers with low scorers on a particular item. B) medium scorers with low and high scorers on a particular item. C) all the lower scorers on a particular item. D) a group of test takers on one test item with their performance on another item.

A) high scorers with low scorers on a particular item.

A sensitivity review typically focuses on which of the following? A) individual test items B) the standardization sample C) statistics used as part of validity and reliability studies D) the extent to which latent traits are latent

A) individual test items

Criterion-referenced testing and assessment is most typically employed in A) licensing for occupations and professions. B) the diagnosis of reading difficulties. C) competition for scholarships. D) situations where the criteria required for success are vague.

A) licensing for occupations and professions.

Items for an item bank A) may be taken from existing tests. B) are always written especially for the item bank. C) have never before been administered. D) earn interest at prime minus one percent.

A) may be taken from existing tests.

Instruments that contain items that function differentially A) may have reduced validity. B) may have inflated reliability. C) are last to be banked in an item bank. D) are informally referred to as "DIFFED."

A) may have reduced validity.

When analyzing a particular item's discriminative abilities for an ability test, a test developer typically compares the item's responses A) of the highest and the lowest scorers on the test. B) of the highest and middle scorers on the test. C) to the performance on the test of minority groups to rule out any possible bias. D) of test takers from predefined age groups to rule out any possible age discrimination.

A) of the highest and the lowest scorers on the test.

A professor who asks a colleague to regrade a set of essay questions is most likely trying to address or prevent concerns about A) rater error. B) validity shrinkage. C) criterion-related validity. D) test-retest reliability.

A) rater error.

Multiple-choice items draw primarily on which test taker ability? A) recognition B) organization C) planning D) perceptual-motor skills

A) recognition

Computer adaptive testing has been found to A) reduce by as much as half the number of test items administered. B) increase the number of test items administered by as much as double. C) increase measurement error but within tolerable limits. D) increase inter-item consistency by as much as 50 percent.

A) reduce by as much as half the number of test items administered.

As part of the process of test development, the term test revision best refers to the A) rewording, deletion, or development of new items. B) development of a completely new test. C) reprinting of a test after a previous edition has sold out. D) Both rewording, deletion, or development of new items and development of a completely new test are correct.

A) rewording, deletion, or development of new items.

The process of differential item functioning (DIF) analysis entails A) scrutinizing item response curves for DIF items. B) interviewing people from different cultures. C) administering tests in different ways. D) Both interviewing people from different cultures and administering tests in different ways are correct.

A) scrutinizing item response curves for DIF items.

With regard to the test tryout phase of test development, A) test conditions should be as similar to the actual administration as possible. B) at least 500 subjects should be included to ensure accurate results. C) the sample used must be nationally representative. D) All of the answers are correct.

A) test conditions should be as similar to the actual administration as possible.

When an item-characteristic curve of an ability test item has an inverted U shape, it usually indicates that A) test takers of moderate ability have the highest probability of answering the item correctly. B) test takers of low ability have the highest probability of answering the item correctly. C) test takers of high ability have the highest probability of answering the item correctly. D) the item is working as well as any item on this test could be expected to work.

A) test takers of moderate ability have the highest probability of answering the item correctly.

As mentioned in the text, CAT is available on a wide array of platforms including A) the Internet. B) Xbox. C) PlayStation. D) All of the answers are correct.

A) the Internet.

The higher the item-reliability index, A) the higher the internal consistency of the test. B) the lower the internal consistency of the test. C) the more likely the test taker is to miss the item. D) the more likely the test developer is to eliminate the item.

A) the higher the internal consistency of the test.

In a cumulative model of scoring applied to an ability test A) the higher the total score, the higher the test taker is on the ability measured by the test. B) the pattern of responses is critically important when judging the ability of the test taker. C) comparisons of the test taker's performance on tests tapping similar abilities may easily be made. D) All of the answers are correct.

A) the higher the total score, the higher the test taker is on the ability measured by the test.

What is the value of the item-discrimination index for an item that all the students in the higher-scoring group answered correctly but that no one in the lower-scoring group answered correctly? A) -1 B) +1 C) .50 D) .25

B) +1
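The item-discrimination index d compares the upper- and lower-scoring groups on a single item: d = (U - L) / n, where U and L are the numbers of test takers in the upper and lower groups who answered the item correctly and n is the number of test takers in each group. The sketch below assumes equal-sized groups. It reproduces the +1 case above, the -1 "nightmare" case, and the -1 to +1 range.

```python
def discrimination_index(upper_correct: int, lower_correct: int, group_size: int) -> float:
    """Item-discrimination index d = (U - L) / n, assuming equal-sized groups."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    if not (0 <= upper_correct <= group_size and 0 <= lower_correct <= group_size):
        raise ValueError("correct counts must lie between 0 and group_size")
    return (upper_correct - lower_correct) / group_size


print(discrimination_index(13, 0, 13))  # 1.0: every high scorer right, every low scorer wrong
print(discrimination_index(0, 13, 13))  # -1.0: low scorers outperform high scorers
print(discrimination_index(10, 4, 13))  # about 0.46: the item discriminates in the intended direction
```

A negative value means more low scorers than high scorers answered correctly, which flags the item for revision or elimination.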

Item-discrimination indices can range from A) .001 to 1.00. B) -1 to +1. C) 0 percent to 100 percent. D) 1 to 100.

B) -1 to +1.

If 100 people take a test and 20 of those test takers answer a particular item correctly, then the p value of the item is A) .25. B) .20. C) .40. D) .04.

B) .20.
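The item-difficulty index p is the proportion of test takers who answered the item correctly, so it runs from 0 (no one correct) to 1 (everyone correct). Higher values therefore mean easier items. A minimal sketch:

```python
def item_difficulty(num_correct: int, num_takers: int) -> float:
    """Item-difficulty index p: proportion of test takers answering correctly."""
    if num_takers <= 0:
        raise ValueError("num_takers must be positive")
    if not 0 <= num_correct <= num_takers:
        raise ValueError("num_correct must lie between 0 and num_takers")
    return num_correct / num_takers


print(item_difficulty(20, 100))   # 0.2
print(item_difficulty(100, 100))  # 1.0: every examinee answered correctly
```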

A test developer is designing a standardized test using a multiple-choice format. The final form of the test will contain 50 items. It would be advisable for the first draft of this test to contain, at least, how many items? A) 50 B) 100 C) 150 D) 25

B) 100

In the course of developing their asexuality measure, Brotto and Yule were able to identify about _____ percent of self-identified asexual individuals. A) 88 B) 93 C) 94 D) 97

B) 93

Which statement is true regarding an item-discrimination index? A) It provides a measure of the percent of people who said yes to or agreed with an item. B) A negative index value on a particular item calls for revising or eliminating the item. C) Tetrachoric correlation is most frequently used in estimating the item-discrimination index of each item. D) All of the answers are correct.

B) A negative index value on a particular item calls for revising or eliminating the item.

The concept of asexuality was first introduced by A) William Masters. B) Alfred Kinsey. C) Virginia Johnson. D) Both William Masters and Virginia Johnson are correct.

B) Alfred Kinsey.

In response to the need for an instrument to help identify individuals who have experienced a lifelong lack of sexual attraction, but who have never heard the term "asexual," Yule et al. (2015) developed a test called the A) Asexuality Evaluation Schedule. B) Asexuality Identification Scale. C) Asexual Research Subject Selector. D) None of the answers is correct.

B) Asexuality Identification Scale.

Which is true of item-characteristic curves (ICCs)? A) For items that are fair to different groups of test takers, the ICCs for these groups should be significantly different. B) Biased items exhibit different shapes of ICCs for different groups when the two groups do not differ in total test score. C) A steep slope of ICC tells us that test takers of moderate ability have the highest probability of answering the item correctly. D) They are used as an aid in determining the kurtosis of a distribution of test scores.

B) Biased items exhibit different shapes of ICCs for different groups when the two groups do not differ in total test score.

The following item appears on an end-of-semester course evaluation in a tests and measurements course: "The most interesting class I am taking this semester is 'Tests and Measurements.'" The possible responses are 1. strongly agree, 2. agree, 3. unsure, 4. disagree, 5. strongly disagree. This item illustrates what approach to scaling? A) nomothetic B) Likert C) Guttman D) ipsative

B) Likert

Test items that contain alternatives with five points ranging from "strongly agree" to "strongly disagree" are characterized as using A) Guttman scaling. B) Likert scaling. C) Nielson scaling. D) Opinion scaling.

B) Likert scaling.

If all raw scores on a test are to be converted to scores that range only from 1 to 9, the resulting scale is referred to as A) a unidimensional scale. B) a stanine scale. C) a multidimensional scale. D) None of the answers is correct.

B) a stanine scale.
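A stanine ("standard nine") scale has a mean of 5 and a standard deviation of about 2, and it is limited to the whole numbers 1 through 9. The sketch below uses one common approximation, stanine = round(2z + 5) clipped to the range 1 to 9. Computing z from the population standard deviation of the raw scores is an assumption. Another common approach assigns stanines from fixed percentile bands instead.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence


def to_stanines(raw_scores: Sequence[float]) -> List[int]:
    """Convert raw scores to stanines via round(2z + 5), limited to 1..9."""
    m = mean(raw_scores)
    sd = pstdev(raw_scores)
    if sd == 0:
        raise ValueError("raw scores have no variability")
    stanines = []
    for x in raw_scores:
        z = (x - m) / sd
        s = math.floor(2 * z + 5 + 0.5)  # round half up to a whole stanine
        stanines.append(min(9, max(1, s)))  # clip to the 1-9 range
    return stanines


print(to_stanines([10, 20, 30, 40, 50]))  # [2, 4, 5, 6, 8]
```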

With regard to item-discrimination indices, a d equal to -1 is A) a test developer's dream. B) a test developer's nightmare. C) a test taker's dream. D) an insomniac's nightmare.

B) a test developer's nightmare.

Which scaling method entails a process by which measures of item difficulty are obtained from samples of test takers who vary in ability? A) difficulty scaling B) absolute scaling C) content scaling D) sample-contingent scaling

B) absolute scaling

A math test developer is interested in deriving an index of the difficulty of the average item for his math test. As his consultant on test development, you advise him that this index could be obtained by A) identifying the item deemed to be the average in difficulty and then deriving an item-difficulty index for that item. B) adding together the item-difficulty indices for all test items and then dividing by the total number of items on the test. C) dividing the total number of items on the test by the average item-difficulty index. D) finding the difference between the greatest and the least item-difficulty indices and then dividing by two.

B) adding together the item-difficulty indices for all test items and then dividing by the total number of items on the test.
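Averaging the item-difficulty indices of all items gives the difficulty of the "average item." A minimal sketch (the three p values are made-up illustration data):

```python
from statistics import mean
from typing import Sequence


def average_item_difficulty(p_values: Sequence[float]) -> float:
    """Sum of the items' p values divided by the number of items."""
    if not p_values:
        raise ValueError("need at least one item")
    return mean(p_values)


# Hypothetical p values for a three-item test.
print(average_item_difficulty([0.8, 0.6, 0.4]))
```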

A disadvantage of applying classical test theory (CTT) in test development is that A) the number of test takers in the sample must be very large. B) all CTT-based statistics are sample-dependent. C) assumptions underlying CTT use are weak. D) All of the answers are correct.

B) all CTT-based statistics are sample-dependent.

An item-difficulty index of 1 occurs when A) all examinees answer the item incorrectly. B) all examinees answer the item correctly. C) examinees are evenly divided between correct and incorrect responses. D) None of the answers is correct.

B) all examinees answer the item correctly.

Consider the following sample True-False item. "I am going to ace this course in psychological testing and assessment." Circle TRUE or FALSE according to your own belief. This item is an example of an item that A) is referred to in psychometric parlance as trinitarian in nature. B) can be used only when a dichotomous choice can be made without qualification. C) Both is referred to in psychometric parlance as trinitarian in nature and can be used only when a dichotomous choice can be made without qualification are correct. D) None of the answers is correct.

B) can be used only when a dichotomous choice can be made without qualification.

The item-validity index is key in determining A) construct validity. B) criterion-related validity. C) content validity. D) All of the answers are correct.

B) criterion-related validity.

Brotto and Yule reported that their measure of asexuality was developed in four stages. Which best characterizes Stage 1? A) literature search for definitions of asexuality B) development of open-ended questions C) literature search for correlates of asexuality D) writing and submission of a research grant request

B) development of open-ended questions

Scoring drift refers to A) the tendency of scorers to give higher scores to test takers whose characteristics (such as age and gender) are similar to their own. B) differences between the typical scoring of an item during standardization and subsequent, more authoritative scoring of an item. C) a gradual decline in inter-scorer reliability after 95 percent of the examinations have been scored due to scorer fatigue. D) a flexible method of scoring test items for populations other than that of the standardization sample.

B) differences between the typical scoring of an item during standardization and subsequent, more authoritative scoring of an item.

A test item functions differently in one group of test takers as compared to another group of test takers known to have the same level of an underlying trait. This phenomenon is known as A) dysfunctional item syndrome. B) differential item functioning. C) differential item difference. D) differential item incongruity.

B) differential item functioning.

One of the advantages of computerized adaptive testing (CAT) is that A) all test items are administered to all test takers. B) floor effects are reduced. C) the ceiling has been removed. D) the basement has been finished.

B) floor effects are reduced.

A well-written true-false item A) includes multiple ideas. B) has a correct response that is either true or false, and not subject to debate. C) typically contains irrelevant information as a distracter. D) Both includes multiple ideas and has a correct response that is either true or false, and not subject to debate are correct.

B) has a correct response that is either true or false, and not subject to debate.

A good item on a norm-referenced achievement test is an item that A) demonstrates that the test taker has met certain pre-specified criteria. B) high scorers answer correctly while low scorers answer incorrectly. C) both high and low scorers answer correctly. D) low scorers seek clarification regarding the meaning of the question.

B) high scorers answer correctly while low scorers answer incorrectly.

In ipsative scoring, a test taker's scores are compared to A) the scores of other test takers from the same geographic area who are similar with regard to key demographic variables. B) his or her scores on another scale on the same test. C) the scores of other test takers from past years who have taken the same test under the same or similar conditions. D) his or her other scores on a parallel form of the same test.

B) his or her scores on another scale on the same test.

Possible applications of IRT were discussed in your textbook. Which of the following is not one of those possible applications? A) determining measurement equivalence across test taker populations B) identifying a common metric among several tests measuring the same construct C) evaluating existing tests for the purpose of mapping test revisions D) developing item banks

B) identifying a common metric among several tests measuring the same construct

The greater the value of the item-discrimination index, the more test takers answered the item correctly in the higher-scoring group as compared to test takers A) who served as the nontesttaking control group. B) in the lower-scoring group. C) who participated in the test standardization. D) None of the answers is correct.

B) in the lower-scoring group.

With regard to the test revision process, it typically A) takes about one year to complete. B) includes all of the steps that the initial test development included. C) is much less expensive than the original development of a test. D) All of the answers are correct.

B) includes all of the steps that the initial test development included.

In item analysis, the term item-endorsement index refers to the percent of test takers who A) responded correctly to a particular item. B) indicated that they agreed with a particular item. C) passed the item on a pass/fail test of ability. D) consented to answer an optional item.

B) indicated that they agreed with a particular item.

An item-characteristic curve includes all of the following except A) information that can be used to judge item bias. B) information that can be used to correct item guessing. C) item-discrimination information. D) item-difficulty information.

B) information that can be used to correct item guessing.

An item-reliability index provides a measure of a test's A) test-retest reliability. B) internal consistency. C) stability. D) All of the answers are correct.

B) internal consistency.

The two columns of a matching item may contain different numbers of items because this makes A) the odds of cheating successfully on this type of item significantly less. B) it more difficult to achieve a perfect score by guessing. C) the role of chance a much greater factor than it would be otherwise. D) it possible for test takers to decline to respond to certain items.

B) it more difficult to achieve a perfect score by guessing.

A test developer designs a test for the sole purpose of identifying the most highly skilled individuals among those tested. During the test revision stage of test development, the test developer will be particularly interested in A) item bias. B) item discrimination. C) item reliability. D) item validity.

B) item discrimination.

The inspiration to create a new test may come from many varied sources. Thinking of the illustrative descriptions of inspiration cited in your text, which of the following is not a possible source of inspiration for the creation of a new test? A) an emerging social phenomenon suggests the need for a psychological test B) legislation has been passed ordering the creation of a new psychological test C) a review of the literature on an existing test suggests a need for a new psychological test D) a test developer thinks "there is a need for this sort of test"

B) legislation has been passed ordering the creation of a new psychological test

If an item-discrimination index is negative, A) high scorers are more likely to answer the item correctly than low scorers. B) low scorers are more likely to answer the item correctly than high scorers. C) the alternate form of the test is probably not equivalent. D) the computer scoring is in error because this index is not supposed to be negative.

B) low scorers are more likely to answer the item correctly than high scorers.

A negative item-discrimination index results for a particular item when A) more high scorers than low scorers on a test get the item correct. B) more low scorers than high scorers on a test get the item correct. C) an item is found to be biased and unfair. D) most test takers do not enter the response keyed correct for the particular item.

B) more low scorers than high scorers on a test get the item correct.

Sorting techniques can be employed to develop A) nominal scales. B) ordinal scales. C) interval scales. D) All of the answers are correct.

B) ordinal scales.

As with the use of other rating scales, the use of Likert scales typically yields _____ data. A) nominal-level B) ordinal-level C) interval-level D) ratio-level

B) ordinal-level

Using the method of paired comparisons yields A) nominal-level data. B) ordinal-level data. C) interval-level data. D) ratio-level data.

B) ordinal-level data.

An item-characteristic curve A) is the single best index of guessing that a test user can use. B) plots the difficulty and discrimination of an item. C) Both is the single best index of guessing that a test user can use and plots the difficulty and discrimination of an item are correct. D) None of the answers is correct.

B) plots the difficulty and discrimination of an item.

In his article entitled "A Method of Scaling Psychological and Educational Tests," L. L. Thurstone introduced absolute scaling, which was a A) procedure for obtaining a measure of item validity. B) procedure for obtaining a measure of item difficulty. C) procedure for deriving equal-appearing intervals. D) procedure for divining item reliability.

B) procedure for obtaining a measure of item difficulty.

A student raises concern that a professor has given different grades to two essay answers that are very similar. From a psychometric perspective, the student is expressing concerns about A) criterion-related validity. B) rater error. C) test-retest reliability. D) parallel forms reliability.

B) rater error.

A "good" test item on an ability test is one A) to which almost all test takers respond correctly. B) that distinguishes high scorers from low scorers. C) to which almost all test takers respond incorrectly. D) in which it is absolutely impossible to guess the correct answer.

B) that distinguishes high scorers from low scorers.

When considering the effect of guessing, the optimal level of item difficulty is most typically A) .5. B) the midpoint between 1.0 and the chance of success by random guessing. C) .25. D) the midpoint between 0 and the chance of success by random guessing.

B) the midpoint between 1.0 and the chance of success by random guessing.
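The midpoint rule can be written as optimal p = (1.0 + c) / 2, where c is the probability of answering correctly by random guessing. The sketch assumes every response option is equally likely to be guessed, so c = 1 / number of options. For a true-false item (c = .50) this gives .75.

```python
def optimal_difficulty(num_options: int) -> float:
    """Midpoint between 1.0 and the chance of a correct random guess."""
    if num_options < 2:
        raise ValueError("an item needs at least two response options")
    chance = 1 / num_options  # assumes each option is equally likely to be guessed
    return (1.0 + chance) / 2


print(optimal_difficulty(2))  # 0.75: true-false item
print(optimal_difficulty(4))  # 0.625: four-option multiple-choice item
```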

As a distribution of scores gets flatter, what happens to the optimal boundary line for determining higher- and lower-scoring groups for item-discrimination indices? A) the optimal boundary line gets smaller B) the optimal boundary line gets larger C) the optimal boundary line does not change D) the optimal boundary line ceases to be optimal

B) the optimal boundary line gets larger

You are interested in developing a test for social adjustment in a college fraternity or sorority. You begin by interviewing persons who had graduated from college after having been a member of a fraternity or sorority for at least two years. Which stage of the test development process best describes the stage that you are in? A) the test-tryout stage B) the pilot work stage C) the test construction stage D) None of the answers is correct.

B) the pilot work stage

An item-discrimination index is used on an ability test A) to determine whether items are measuring what they are designed to measure. B) to measure the difference between how many high scorers and how many low scorers answered the item correctly. C) to estimate how predictive the item is of the test taker's future performance. D) to measure the difference between how many median scorers and how many low scorers answered the item correctly.

B) to measure the difference between how many high scorers and how many low scorers answered the item correctly.

An advantage of using a true-false item format over a multiple-choice item format in a teacher-made test designed for classroom use is A) true-false items are applicable to a wider range of subject areas. B) true-false items are easier to write. C) true-false items reduce the odds of a correct answer as the result of guessing. D) true-false items will never become dated.

B) true-false items are easier to write.

Guttman scales A) are typically used with nominal categories. B) typically are constructed so that agreement with one statement may predict agreement with another statement. C) typically are constructed so that agreement with one statement should not be correlated with agreement with any other statement. D) were originally developed by a Peace Corps task force.

B) typically are constructed so that agreement with one statement may predict agreement with another statement.
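In a Guttman scale, statements are ordered from mildest to most extreme, so agreeing with a stronger statement implies agreeing with every milder one. The sketch below checks whether one respondent's pattern is perfectly cumulative in that sense. It assumes the responses are already ordered from mildest to most extreme and coded 1 for agree and 0 for disagree.

```python
from typing import Sequence


def fits_guttman_pattern(responses: Sequence[int]) -> bool:
    """True if a 0/1 pattern is a run of 1s followed only by 0s."""
    seen_disagree = False
    for r in responses:
        if r not in (0, 1):
            raise ValueError("responses must be coded 0 or 1")
        if r == 0:
            seen_disagree = True
        elif seen_disagree:
            return False  # endorsed a stronger statement after rejecting a milder one
    return True


print(fits_guttman_pattern([1, 1, 1, 0, 0]))  # True
print(fits_guttman_pattern([1, 0, 1, 0, 0]))  # False
```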

Ideally, psychological or educational tests are revised A) every decade. B) when the test is no longer useful. C) as a function of annual test sales. D) None of the answers is correct.

B) when the test is no longer useful.

A decision is made to use only a few subjects per item during the test tryout phase of a test's construction. This decision is most likely to lead to A) "phantom factors" during test construction. B) "phantom factors" during the test administration. C) "phantom factors" during factor analysis. D) "phantom deposits" in the test author's royalty account.

C) "phantom factors" during factor analysis.

When the effect of guessing is taken into account, what is the optimal item-difficulty level for a true-false item? A) .50 B) .60 C) .75 D) 1.00

C) .75

If 50 students were administered a classroom test, how many would be included in each of the upper and lower groups for the purpose of calculating d, the item-discrimination index? A) 25 B) 10 C) 13 D) 17

C) 13
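The answer of 13 corresponds to taking the conventional 27 percent of test takers for each of the upper and lower groups (0.27 × 50 = 13.5). Truncating to a whole person is an assumption that matches this answer. Because the optimal boundary grows as the score distribution gets flatter, the fraction is left as a parameter.

```python
import math


def extreme_group_size(num_takers: int, fraction: float = 0.27) -> int:
    """Size of each of the upper and lower groups, truncated to a whole person."""
    if num_takers <= 0:
        raise ValueError("num_takers must be positive")
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    return math.floor(num_takers * fraction)


print(extreme_group_size(50))  # 13
```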

A test developer has created a pool of 30 items and is ready for a test tryout. At a minimum, how many subjects should the test be administered to? A) 60 B) 120 C) 150 D) 180

C) 150

According to your textbook, for a test tryout, the minimum sample for each item on the test is A) one-half of the number of test takers in the standardization sample. B) 25 test takers. C) 5 test takers. D) 500 test takers.

C) 5 test takers.
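The minimum of 5 test takers per item determines the minimum tryout sample: a pool of 30 items calls for at least 30 × 5 = 150 subjects. A minimal sketch:

```python
def min_tryout_sample(num_items: int, per_item: int = 5) -> int:
    """Minimum tryout sample: at least `per_item` test takers for each item."""
    if num_items <= 0 or per_item <= 0:
        raise ValueError("num_items and per_item must be positive")
    return num_items * per_item


print(min_tryout_sample(30))  # 150
```

Using fewer subjects per item risks the "phantom factors" in factor analysis described above.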

Which statement best describes the relationship between item difficulty and a "good" item? A) The difficulty level is not a factor in determining a "good" item. B) An item with a high difficulty level is likely to be "good." C) An item with a mid-range difficulty level is likely to be "good." D) An item with a low difficulty level is likely to be "good."

C) An item with a mid-range difficulty level is likely to be "good."

A strategy for cheating on an examination entails one test taker memorizing items and later recalling and reciting them for the benefit of a future test taker. This cheating strategy may be countered by A) a computer-tailored test administration to each test taker. B) a computer-randomized presentation of test items. C) Both a computer-tailored test administration to each test taker and a computer-randomized presentation of test items are correct. D) None of the answers is correct.

C) Both a computer-tailored test administration to each test taker and a computer-randomized presentation of test items are correct.

A test developer of multiple-choice ability tests reviews data from a recent test administration. She discovers that all test takers who scored very high on the test as a whole responded to item 13 with the same incorrect choice. Accordingly, the test developer A) assumes that members of the high-scoring group are making some sort of unintended interpretation of item 13. B) plans to interview members of the high-scoring group to understand the basis for their choice. C) Both assumes that members of the high-scoring group are making some sort of unintended interpretation of item 13 and plans to interview members of the high-scoring group to understand the basis for their choice are correct. D) should remove item 13 from the test and place in its stead a note that reads: "Go to Item 14."

C) Both assumes that members of the high-scoring group are making some sort of unintended interpretation of item 13 and plans to interview members of the high-scoring group to understand the basis for their choice are correct.

The development of a criterion-referenced test usually entails A) exploratory work with a group of test takers who have mastered the material. B) exploratory work with a group of test takers who have not mastered the material. C) Both exploratory work with a group of test takers who have mastered the material and exploratory work with a group of test takers who have not mastered the material are correct. D) None of the answers is correct.

C) Both exploratory work with a group of test takers who have mastered the material and exploratory work with a group of test takers who have not mastered the material are correct.

To increase the precision of a test, test developers may have to A) increase the number of test items. B) increase the number of response options. C) Both increase the number of test items and increase the number of response options are correct. D) None of the answers is correct.

C) Both increase the number of test items and increase the number of response options are correct.

One of the questions that the developer of a new test must answer is, "How will the test be administered?" The answer to this question may be A) the test will be individually administered. B) the test will be group administered. C) Both the test will be individually administered and the test will be group administered are correct. D) None of the answers is correct.

C) Both the test will be individually administered and the test will be group administered are correct.

Factor analysis can help the test developer A) to eliminate or revise items that do not load on the predicted factor. B) to identify whether test items appear to be measuring the same construct. C) Both to eliminate or revise items that do not load on the predicted factor and to identify whether test items appear to be measuring the same construct are correct. D) None of the answers is correct.

C) Both to eliminate or revise items that do not load on the predicted factor and to identify whether test items appear to be measuring the same construct are correct.

Which is true of cross-validation of a test after standardization has occurred? A) Cross-validation creates confusion regarding the meaning of the original standardization data. B) The cross-validation sample is composed of the same test takers that participated in the original test standardization. C) Cross-validation often results in validity shrinkage. D) All of the answers are correct.

C) Cross-validation often results in validity shrinkage.

Which is not a typical question that is raised and answered during the test conceptualization stage of test development? A) What is the objective of the test? B) Is there a need for the test? C) How valid are the items on the test? D) What types of responses will be required of the test taker?

C) How valid are the items on the test?

Which is true of Thurstone's equal-appearing intervals method of scaling? A) It is relatively simple to construct. B) It demands that the test taker sort item responses into stacks of similar content. C) It uses judges' ratings to assign values to items. D) It is typically devised using proprietary software developed by Louis Thurstone's grandchildren.

C) It uses judges' ratings to assign values to items.
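The judge-based logic behind this answer can be sketched in Python. This is a minimal illustration under stated assumptions, not the textbook's procedure verbatim. Judges rate each item (assumed here to be on an 11-point favorability scale). An item's scale value is taken as the median of those ratings. Items whose judges disagree too much are dropped; the disagreement measure (interquartile range) and the cutoff `max_iqr` are illustrative choices. A respondent's score is then the mean scale value of the retained items he or she endorses. All function and item names are hypothetical.

```python
from statistics import median, quantiles

def thurstone_scale_values(judge_ratings, max_iqr=2.0):
    """Return {item: scale value} for items whose judges agree closely enough.

    judge_ratings maps each item to a list of at least two judges' ratings.
    Scale value = median rating; items with an interquartile range above
    max_iqr (a hypothetical cutoff) are discarded as ambiguous.
    """
    kept = {}
    for item, ratings in judge_ratings.items():
        q1, _, q3 = quantiles(ratings, n=4)  # quartile cut points
        if q3 - q1 <= max_iqr:
            kept[item] = median(ratings)
    return kept

def respondent_score(endorsed_items, scale_values):
    # Mean scale value of the retained items the respondent endorsed
    # (assumes at least one endorsed item was retained).
    values = [scale_values[i] for i in endorsed_items if i in scale_values]
    return sum(values) / len(values)

ratings = {"a": [9, 10, 10, 11, 10], "b": [2, 3, 2, 3, 2], "c": [1, 6, 11, 3, 9]}
scale = thurstone_scale_values(ratings)    # item "c" is dropped as ambiguous
print(scale)                               # {'a': 10, 'b': 2}
print(respondent_score(["a", "b"], scale))  # 6.0
```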

Which is true of item analysis on speed tests? A) Results of the item analysis are relatively easy to interpret and are clear. B) Item-difficulty levels are lower toward the end of the test. C) Item-discrimination levels are higher toward the end of the test. D) Later items tend to have low item-total correlations.

C) Item-discrimination levels are higher toward the end of the test.

An item bank is A) a computerized system whereby test items "pay dividends" only when used. B) the optimum combination of reliability and validity in an item. C) a set of test items from which a test can be constructed. D) a statistical item-discrimination index for data relating to high and low scorers on a test.

C) a set of test items from which a test can be constructed.

Having a large item pool available during test revision is A) a disadvantage due to the great expense of item development. B) often a waste of time because many of the items are eventually deleted. C) an advantage because poor items can be deleted in favor of the good items. D) a great perk for test developers who are swimming enthusiasts.

C) an advantage because poor items can be deleted in favor of the good items.

Jana takes a personality test administered by the "True Compatibility Dating Service." According to the personalized, computerized personality profile that results, Jana learns that her need for exhibitionism is much greater than her need for stability. Since the test analyzes data only with regard to Jana, and no other client of the dating service, it may be assumed that the test was scored using A) a diagnostic model. B) a cumulative model. C) an ipsative model of scoring. D) truly compatible models.

C) an ipsative model of scoring.
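Why an ipsative score says nothing about other clients can be illustrated with a short sketch. It assumes a hypothetical forced-choice format in which each item pits two needs against each other and awards one point to the need chosen. Because the points always sum to the number of items, a high score on one need forces lower scores on others, so the profile only ranks needs within the one person. The need labels and answers below are invented for illustration.

```python
from collections import Counter

def ipsative_profile(choices):
    # Each forced-choice item awards one point to whichever need was picked,
    # so the counts always sum to the number of items answered.
    return Counter(choices)

def rank_needs(profile):
    # Rank needs within this one person, strongest first; no norm group is used.
    return [need for need, _ in profile.most_common()]

# Hypothetical forced-choice answers for Jana: each entry is the need she endorsed.
jana = ["exhibition", "exhibition", "stability", "exhibition", "order", "exhibition"]
profile = ipsative_profile(jana)
print(rank_needs(profile))  # ['exhibition', 'stability', 'order']
```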

On a particular test, men and women tend to have the same total score. Men and women do, however, tend to exhibit different response patterns to specific items. A reasonable conclusion is that the test is A) unreliable. B) invalid. C) biased. D) scaled.

C) biased.

The Rokeach values measure involves presenting the subject with index cards, on each of which a single value is listed. Test takers are asked to place the cards in order of their own concern about each of the values. This procedure best exemplifies A) multidimensional scaling. B) Likert scaling. C) comparative scaling. D) Murray scaling.

C) comparative scaling.

A student complains that a midterm examination did not include items from a particular in-class lecture. From a psychometric perspective, the student is expressing concern about the midterm's A) test-retest reliability. B) internal consistency reliability. C) content validity. D) cross-validation.

C) content validity.

These tests are often used for the purpose of licensing persons in professions. The tests referred to here are A) pilot tests. B) norm-referenced tests. C) criterion-referenced tests. D) Guttman scales.

C) criterion-referenced tests.

The higher an item-validity index, the greater the _____ validity. A) construct B) content C) criterion-related D) face

C) criterion-related
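A worked version of the index may help. The formula commonly given in test-construction texts multiplies the item-score standard deviation by the correlation between item score and criterion score: item-validity index = s_i × r_iC, where for a right/wrong item s_i = √(p_i(1 − p_i)) and p_i is the item-difficulty index. The sketch below assumes dichotomous (0/1) item scores and a numeric criterion; the data are made up.

```python
from math import sqrt

def pearson_r(x, y):
    # Ordinary Pearson correlation (equals the point-biserial r when x is 0/1).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def item_validity_index(item_scores, criterion_scores):
    """item_scores: 0/1 per test taker; criterion_scores: criterion value per test taker."""
    p = sum(item_scores) / len(item_scores)  # item-difficulty index
    s_item = sqrt(p * (1 - p))               # item-score standard deviation
    return s_item * pearson_r(item_scores, criterion_scores)

item = [1, 1, 0, 0]       # hypothetical right/wrong responses
criterion = [4, 3, 2, 1]  # hypothetical criterion scores
print(round(item_validity_index(item, criterion), 3))  # 0.447
```

A larger index means the item contributes more to the correlation between total test score and the criterion, which is why it bears on criterion-related validity.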

In creating a test designed to measure personality constructs, the test developer's first step would best be to A) determine which items would lead to socially desirable responses. B) create a large pool of potential items. C) define the construct or constructs being measured. D) select a representative sample of test takers for test tryout.

C) define the construct or constructs being measured.

Test developers have at their disposal a number of statistical tools that may be applied when selecting items for use on a test. In Chapter 8's Meet an Assessment Professional, Dr. Scott Birkeland made reference to two such techniques. One was a measure of item discrimination, and the other was a measure of item A) reliability. B) utility. C) difficulty. D) variance.

C) difficulty.

Most classroom tests developed by instructors for use in their own classroom are A) subjected to formal procedures of psychometric evaluation. B) only evaluated formally for content validity. C) evaluated informally for their psychometric properties. D) used without modification, year after year, until retirement or death.

C) evaluated informally for their psychometric properties.

A sensitivity review panel would most likely be formed of A) only experts from the majority group. B) only experts from a particular minority group. C) experts representing both minority and majority groups. D) measurement specialists from all continents known for their sensitivity.

C) experts representing both minority and majority groups.

Brotto and Yule reported that their measure of asexuality was developed in four stages. Which best characterizes what they did during Stages 2 and 3? A) analysis of variance B) regression analysis C) factor analysis D) meta-analysis

C) factor analysis

An individually administered test designed for use with elementary-school-age students is in the test tryout stage of test development. For the purposes of the tryout, this test should be administered A) as a group test to as many classes as possible in an elementary school. B) individually to high school students for exploratory purposes. C) individually to elementary-school-age students in an environment that simulates the way that the final version of the test will be administered. D) to experts in elementary school education to ensure that the items are appropriate for elementary-school-age children.

C) individually to elementary-school-age students in an environment that simulates the way that the final version of the test will be administered.

Asexuality A) is a sexual orientation. B) is not a sexual orientation. C) is considered by some to be a sexual orientation and not by others. D) was delisted as a sexual orientation in the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5).

C) is considered by some to be a sexual orientation and not by others.

On a true-false inventory, a respondent selects true for an item that reads, "I summer in Tehran." The individual scoring the test would best interpret this response as indicative of the fact that this respondent A) is extremely eccentric with respect to choice of time shares. B) requires more sensation seeking than Cape Cod has to offer. C) is responding randomly to test items. D) None of the answers is correct.

C) is responding randomly to test items.

An analysis of a test's items may take many forms. Considering the forms described in your text, which is not one of them? A) item validity analysis B) item discrimination analysis C) item tryout analysis D) item reliability analysis

C) item tryout analysis

A test item written in a multiple-choice format has three elements. Which of the following is not one of those elements? A) foil B) stem C) leaf D) correct option

C) leaf

Test developers calculate an item-validity index to A) understand why an item is difficult or easy. B) reduce the likelihood of an examinee's guessing. C) maximize the test's criterion-related validity. D) determine the internal consistency of the test.

C) maximize the test's criterion-related validity.

Expert panels may be used in the process of test development to A) provide judgments concerning each item's reliability. B) serve as expert witnesses in any future litigation. C) screen test items for possible bias. D) All of the answers are correct.

C) screen test items for possible bias.

Which is an example of the use of a completion item format on a test? A) true-false items B) matching items C) short-answer items D) multiple-choice items

C) short-answer items

The Likert scale is an example of which type of rating scale? A) categorical B) paired methods C) summative D) content

C) summative
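"Summative" means the final score is the sum of the ratings on the individual items. A minimal sketch, assuming a 5-point agree/disagree format and that some items are worded in the opposite direction (reverse-keyed) and must be flipped before summing; the function name and example data are hypothetical.

```python
def likert_total(responses, reverse_keyed=frozenset(), points=5):
    """Sum item ratings into one summative score.

    responses: one rating per item, each from 1 to `points`.
    reverse_keyed: indexes of items worded in the opposite direction;
    their ratings are flipped (1 <-> points) before being added.
    """
    total = 0
    for i, rating in enumerate(responses):
        if not 1 <= rating <= points:
            raise ValueError(f"item {i}: rating {rating} outside 1..{points}")
        total += (points + 1 - rating) if i in reverse_keyed else rating
    return total

print(likert_total([5, 4, 1], reverse_keyed={2}))  # 14
```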

In contrast to scaling methods that employ indirect estimation, scaling methods that employ direct estimation do not require A) writing two sets of items for parallel forms. B) the use of the method of equal-appearing intervals. C) transforming test taker responses into some other scale. D) indirect methods to interpret test taker responses.

C) transforming test taker responses into some other scale.

The term used to describe the decrease in item validities that typically occurs during cross-validation is A) validity detriment. B) validity decrement. C) validity shrinkage. D) cross-validation devaluation.

C) validity shrinkage.
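Validity shrinkage arises because items chosen for their validity in one sample partly owe that validity to chance. The simulation below is a deliberately extreme sketch, assuming items and criterion are pure random noise: the items that happen to correlate best with the criterion in a derivation sample are kept, and the resulting composite is then correlated with the criterion in a fresh cross-validation sample. Sample sizes, item counts, and the seed are arbitrary assumptions; with real items that carry some true validity, the drop is usually smaller than it is here.

```python
import random
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def simulate_shrinkage(n=100, n_items=50, keep=5, seed=1):
    rng = random.Random(seed)

    def draw_sample():
        # Pure noise: no item is truly related to the criterion.
        items = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_items)]
        criterion = [rng.gauss(0, 1) for _ in range(n)]
        return items, criterion

    items_a, crit_a = draw_sample()  # derivation (original) sample
    items_b, crit_b = draw_sample()  # cross-validation sample

    # Keep the items that happened to correlate best with the criterion in sample A.
    ranked = sorted(range(n_items),
                    key=lambda j: pearson_r(items_a[j], crit_a),
                    reverse=True)
    chosen = ranked[:keep]

    def composite(items):
        # Total score on the selected items for each test taker.
        return [sum(items[j][i] for j in chosen) for i in range(n)]

    return pearson_r(composite(items_a), crit_a), pearson_r(composite(items_b), crit_b)

r_derivation, r_cross = simulate_shrinkage()
print(f"derivation r = {r_derivation:.2f}, cross-validation r = {r_cross:.2f}")
```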

In Guttman scaling, A) test takers are presented with a forced-choice format. B) each item is completely independent of every other item and nothing can be concluded as a result of the endorsement of an item. C) when one item is endorsed by a test taker, the less extreme aspects of that item are also endorsed. D) when more than one item tapping a particular content area is endorsed, the less extreme aspects of those items are eliminated.

C) when one item is endorsed by a test taker, the less extreme aspects of that item are also endorsed.
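The cumulative property of a Guttman scale can be checked mechanically. A minimal sketch, assuming items are ordered from least to most extreme and responses are coded 1 (endorsed) or 0: a perfect pattern endorses a run of the milder items and nothing beyond it. Errors are counted here as mismatches against the ideal pattern with the same total score, which is one common convention among several, and the coefficient of reproducibility is 1 minus total errors divided by total responses. Function names and data are hypothetical.

```python
def guttman_errors(pattern):
    """pattern: 0/1 endorsements, items ordered from least to most extreme."""
    k = sum(pattern)
    ideal = [1] * k + [0] * (len(pattern) - k)  # perfect cumulative pattern
    return sum(a != b for a, b in zip(pattern, ideal))

def reproducibility(patterns):
    # 1 - (total errors / total responses) across all respondents.
    total_errors = sum(guttman_errors(p) for p in patterns)
    total_responses = sum(len(p) for p in patterns)
    return 1 - total_errors / total_responses

print(guttman_errors([1, 1, 1, 0, 0]))  # 0: fits the cumulative ordering
print(guttman_errors([1, 0, 1, 0, 0]))  # 2: skips a milder item
print(f"{reproducibility([[1, 1, 1, 0, 0], [1, 0, 1, 0, 0]]):.2f}")  # 0.80
```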

What is the value of the item-discrimination index for an item answered correctly by an equal number of students in the higher- and lower-scoring groups? A) -1 B) +1 C) .50 D) 0

D) 0
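The answer follows from the usual formula d = (U − L) / n, where U and L are the numbers of test takers in the upper- and lower-scoring groups who answered the item correctly and n is the size of each group; equal counts give U − L = 0. The sketch below assumes the common convention of forming the groups from the top and bottom 27% of total scores (ties are broken arbitrarily); function names and data are hypothetical.

```python
def d_from_counts(upper_correct, lower_correct, group_size):
    # d ranges from -1 (only low scorers correct) to +1 (only high scorers correct).
    return (upper_correct - lower_correct) / group_size

def discrimination_index(records, fraction=0.27):
    """records: (total_score, item_correct) per test taker, item_correct 0 or 1."""
    ranked = sorted(records, key=lambda r: r[0], reverse=True)
    n = max(1, round(len(ranked) * fraction))  # size of each extreme group
    upper = sum(correct for _, correct in ranked[:n])
    lower = sum(correct for _, correct in ranked[-n:])
    return d_from_counts(upper, lower, n)

print(d_from_counts(12, 12, 20))  # 0.0: equal numbers correct in both groups
records = [(10, 1), (9, 1), (8, 0), (7, 1), (6, 0),
           (5, 1), (4, 1), (3, 0), (2, 1), (1, 1)]
print(discrimination_index(records))  # 0.0: 2 of 3 correct in each group
```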

A disadvantage of recruiting asexual research subjects from a single online community is that A) the persons belonging to the online community may constitute a unique group within the asexual population. B) the persons belonging to the online community have already acknowledged their asexuality as an identity. C) asexual individuals who do not belong to the community will be systematically omitted. D) All of the answers are correct.