Chapter 2 Test Psychological Assessments

Which of the following can help test developers evaluate how well a test or an individual item is working to measure different levels of a construct? a. item response theory (IRT) information curves b. classical test theory (CTT) information curves c. the item-validity index d. the item-discrimination index

item response theory (IRT) information curves

In the context of methods for setting cut scores, the method that entails the collection of data on the predictor of interest from groups known to possess, and not to possess, a trait, attribute, or ability of interest is known as the _____ method. a. bookmark b. predictive yield c. known groups d. item-mapping

known groups

Identify the formula for the standard error of the difference between two scores when the squared standard error of measurement for test 1 ($\sigma_{\text{meas}_1}^2$) and the squared standard error of measurement for test 2 ($\sigma_{\text{meas}_2}^2$) are known. Page 172 in Edition 9 Psychological Assessments

$\sigma_{\text{diff}} = \sqrt{\sigma_{\text{meas}_1}^2 + \sigma_{\text{meas}_2}^2}$
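
As a rough illustration, the standard error of the difference is the square root of the sum of the two squared standard errors of measurement. The Python sketch below computes it; the function name and the sample values (3.0 and 4.0) are assumptions chosen for illustration and do not come from the textbook.

```python
import math


def standard_error_of_difference(sem_1: float, sem_2: float) -> float:
    """Return sqrt(sem_1**2 + sem_2**2) for two standard errors of measurement."""
    return math.sqrt(sem_1 ** 2 + sem_2 ** 2)


# Hypothetical standard errors of measurement for test 1 and test 2.
print(standard_error_of_difference(3.0, 4.0))  # 5.0
```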

Identify the types of item formats. (Check all that apply.) a. selected-response format b. constructed-response format c. activity format d. experiment format

selected-response format constructed-response format

A numerical value that reflects the relationship between the number of people to be hired and the number of people available to be hired is known as the _____. a. selection ratio b. productivity gain c. base rate d. utility gain

selection ratio

Identify an estimate of reliability that is obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once. a. parallel forms reliability b. split-half reliability c. test-retest reliability d. alternate forms reliability

split-half reliability
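
A minimal sketch of the idea, assuming an odd-even split as the way to form the two equivalent halves (other splits are possible): each testtaker gets one score per half, and the two sets of half scores are correlated. The response data and function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_half_correlation(responses):
    """Correlate odd-item half scores with even-item half scores.

    `responses` holds one row per testtaker and one column per item,
    all from a single administration of the test.
    """
    odd_half = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    return pearson(odd_half, even_half)


# Hypothetical dichotomous (0/1) item scores for five testtakers.
responses = [
    [1, 1, 1, 1],
    [1, 0, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]
print(split_half_correlation(responses))
```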

The standard deviation of a theoretically normal distribution of test scores obtained by one person on equivalent tests is known as the _____. a. standard error of the difference b. coefficient of stability c. standard error of measurement d. test-retest reliability

standard error of measurement

A statistical measure that can aid a test user in determining how large a difference should be before it is considered statistically significant is called the _____. a. coefficient of stability b. coefficient of generalizability c. standard error of the difference d. reliability difference

standard error of the difference

Unlike items in a selected-response format, items in a constructed-response format require testtakers to _____. a. conduct an experiment b. perform a skilled activity unrelated to the test c. select a response from a set of alternative responses d. supply or create the correct answer

supply or create the correct answer

_____ relates more to what a test appears to measure to the person being tested than to what the test actually measures. a. Concurrent validity b. Face validity c. Predictive validity d. Content validity

Face validity

The _____ approach is used by some diagnostic systems wherein individuals must exhibit a certain number of symptoms to qualify for a specific diagnosis. a. class scoring b. cumulative scoring c. ipsative scoring d. reliability scoring

class scoring

The _____ is also referred to as the true score model of measurement. a. generalizability theory b. item response theory c. domain sampling theory d. classical test theory

classical test theory

The degree of the relationship between various forms of a test can be evaluated by means of an alternate-forms or parallel-forms coefficient of reliability known as the _____. a. test-retest reliability b. split-half reliability c. coefficient of equivalence d. coefficient of stability

coefficient of equivalence

The simplest way to determine the degree of consistency among scorers in the scoring of a test is by calculating a coefficient of correlation known as _____. a. coefficient of equivalence b. coefficient of inter-scorer reliability c. coefficient of generalizability d. coefficient of split-half reliability

coefficient of inter-scorer reliability

Face validity is a judgment _____. a. of how a test score can be used to infer a test user's standing on some measure b. about how a test measures the intelligence of the test user c. of how consistently a test samples behavior d. concerning how relevant test items appear to be

concerning how relevant test items appear to be

If test scores are obtained at about the same time as the criterion measures are obtained, measures of the relationship between the test scores and the criterion provide evidence of _____. a. nonsimultaneous validity b. concurrent validity c. predictive validity d. asynchronous validity

concurrent validity

David uses an intelligence test that measures individuals on a certain set of characteristics. However, the high scorers and low scorers on the test do not behave as predicted by the theory on which the test was based. Here, David needs to investigate the _____ of the test. a. ipsative validity b. construct validity c. summative validity d. anchor validity

construct validity

Evidence that test scores change as a result of some experience between a pretest and a posttest can be evidence of _____. a. construct validity b. base validity c. face validity d. concurrent validity

construct validity

Testtakers must supply or create the correct answer.

constructed-response format

If scores on a test undergoing construct validation tend to correlate highly in the predicted direction with scores on older, more established, and already validated tests designed to measure the same (or a similar) construct, it would be an example of _____. a. severity error b. convergent evidence c. discriminant evidence d. leniency error

convergent evidence

The rationale of the method of contrasted groups is that if a test is a valid measure of a particular construct, then test scores from groups of people who would be presumed to differ with respect to that construct should have _____. a. little variation between high scorers and low scorers in a group b. correspondingly different test scores c. extreme variation across high scorers in all groups d. similar test scores across all groups

correspondingly different test scores

In the context of concurrent and predictive validity, a(n) _____ is defined as the standard against which a test or a test score is evaluated. a. slope bias b. construct c. criterion d. intercept bias

criterion

Identify an experience that is most likely to result in maximum change in test scores between a pretest and a posttest. a. a random walk b. commute to the workplace c. watching a sitcom d. formal education

formal education

The _____ is based on the idea that a person's test scores vary from testing to testing because of variables in the testing situation. a. content sampling theory b. classical test theory c. item response theory d. generalizability theory

generalizability theory

A company wants to determine the practical value of a personality development training module prior to its inclusion in an HR training program. In this scenario, the company is trying to understand the _____ of the module. a. authenticity b. capacity c. utility d. validity

utility

Rank the five stages in the process of developing a test in order, placing the first step in the top position and the last step in the bottom position.

1. Test conceptualization 2. Test construction 3. Test tryout 4. Item analysis 5. Test revision

Construct

Informed scientific idea developed or hypothesized to describe or explain behavior.

What are five examples of a construct?

Intelligence, Anxiety, Job Satisfaction, Personality, Bigotry, Clerical Aptitude, Depression, Motivation, Self-esteem, Emotional Adjustment, Potential Dangerousness, Executive Potential, Creativity, Mechanical Comprehension.

A testtaker's score on one scale within a test is compared to that testtaker's score on another scale within the same test.

Ipsative Scoring

Identify an accurate statement about the concept of "bias." a. In the context of psychometrics, it always implies prejudice and preferential treatment. b. It has the same meaning in every context that it is used. c. It can be detected using a variety of statistical procedures. d. It is not helped by active prevention during test development.

It can be detected using a variety of statistical procedures.

The potential problems of the Taylor-Russell tables were avoided by an alternative set of tables known as the _____. a. Brogden-Cronbach-Gleser tables b. Naylor-Shine tables c. protocol tables d. predictive yield tables

Naylor-Shine tables

If X is used to represent an observed score, T to represent a true score, and E to represent error, then X=_____. a. T/E b. T-E c. T(E) d. T+E

T+E

Identify an accurate statement about the concept of fairness. a. The issues of fairness are mostly rooted in values. b. It can be answered with mathematical precision and finality. c. It can be perfectly measured and evaluated through statistical procedures. d. The issue of fairness mostly crops up in technically complex, statistical problems.

The issues of fairness are mostly rooted in values.

Identify a limitation of the Taylor-Russell tables. a. They cannot interpret the variables of selection ratio and base rate. b. They cannot estimate the extent to which inclusion of a particular test in a selection system will improve selection. c. The relationship between the predictor and the criterion must be linear. d. The relationship between the test and the rating of performance on the job must be cyclic.

The relationship between the predictor and the criterion must be linear.

Identify a condition that deems a test to be due for revision. a. The test contains vocabulary and stimuli that are easily understood by current testtakers. b. The stimulus materials and verbal content look dated, and current testtakers cannot relate to them. c. The size of the population of potential testtakers increases. d. Current testtakers are able to score high in the test.

The stimulus materials and verbal content look dated, and current testtakers cannot relate to them.

Identify an accurate fact about the concept of criterion. a. A criterion is similar to a hit rate. b. There are no specific rules as to what constitutes a criterion. c. Time can never be used as a criterion. d. A criterion can be contaminated in nature.

There are no specific rules as to what constitutes a criterion.

True or false: In the context of the Angoff method, a strategy for setting cut scores that is driven more by data and less by subjective judgments is required in situations where there is major disagreement regarding how certain populations of testtakers should respond to items. a. True b. False

True

True or false: Item analysis tends to be regarded as a quantitative endeavor, even though it may also be qualitative. a. True b. False

True

True or false: Utility estimates based on the assumption that all people selected will actually accept offers of employment tend to overestimate the utility of the measurement tool. a. True b. False

True

True or false: Writing large item pools is helpful because approximately half of the items in the item pool will be eliminated from a test's final version after revision. a. True b. False

True

Identify an accurate statement about predictive validity of a test. a. Usually some intervening event takes place before the criterion measure is obtained. b. It is another term for concurrent validity. c. The test scores are obtained after the criterion measures have been obtained. d. The intervening event is always a training period.

Usually some intervening event takes place before the criterion measure is obtained.

Concerns about content validity of classroom tests are _____. a. addressed by test developers by conducting a test analysis b. routinely addressed, usually informally, by professors in the test development process c. avoided by test developers as the process of addressing them is complicated d. considered difficult by professors to address in the test development process

routinely addressed, usually informally, by professors in the test development process

The process by which a measuring device is designed and calibrated and by which scale values are assigned to different amounts of the trait, attribute, or characteristic being measured is known as _________.

scaling

Testtakers must choose a response from a set of alternative responses.

selected-response format

An estimate of reliability obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once. It is a useful measure of reliability when it is impractical or undesirable to assess reliability with two tests or to administer a test twice (because of factors such as time or expense). This is known as _____. a. test-retest reliability b. split-half reliability c. reliability coefficient d. utility

split-half reliability

Which of the following is a useful measure of reliability when it is impractical or undesirable to assess reliability with two tests or to administer a test twice? a. split-half reliability b. test-retest reliability c. alternate forms reliability d. parallel forms reliability

split-half reliability

The _____ is the tool used to estimate or infer the extent to which an observed score deviates from a true score. a. average proportional distance b. standard error of measurement c. coefficient alpha d. error variance

standard error of measurement
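
The card above does not give a formula. The sketch below assumes the standard textbook relationship, in which the standard error of measurement equals the standard deviation of the test scores multiplied by the square root of one minus the reliability coefficient. The function name and the example values are hypothetical.

```python
import math


def standard_error_of_measurement(score_sd: float, reliability: float) -> float:
    """Return score_sd * sqrt(1 - reliability).

    `reliability` is a reliability coefficient between 0 and 1.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return score_sd * math.sqrt(1.0 - reliability)


# Hypothetical test: standard deviation of 10, reliability coefficient of .75.
print(standard_error_of_measurement(10.0, 0.75))  # 5.0
```

Under this formula, a perfectly reliable test (coefficient of 1) has a standard error of measurement of 0, and the error grows as reliability falls.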

An organization has 75 job positions available and 100 applicants. In this scenario, the selection ratio is _____. a. 0.075 b. 13.33 c. 0.75 d. 1.33

0.75
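
The arithmetic behind this answer can be sketched as follows: the selection ratio is the number of people to be hired divided by the number of people available to be hired. The function name is an assumption for illustration.

```python
def selection_ratio(positions_available: int, applicants: int) -> float:
    """Number of people to be hired divided by number available to be hired."""
    if applicants <= 0:
        raise ValueError("there must be at least one applicant")
    return positions_available / applicants


# Values from the card: 75 positions, 100 applicants.
print(selection_ratio(75, 100))  # 0.75
```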

A testtaker earns placement in a particular group with other testtakers whose pattern of responses is similar in some way.

Class scoring

Identify steps that should be taken in the test revision stage. (Check all that apply.) a. All items in the test must be rewritten if one item is not valid. b. The testing methodology must be changed if the items are weak. c. Items that are too easy or difficult should be revised. d. Items with many weaknesses should be deleted.

Items that are too easy or difficult should be revised. Items with many weaknesses should be deleted.

For psychometricians, a factor inherent in a test that systematically prevents accurate, impartial measurement is known as ________

bias

Evidence that test scores change as a result of some experience between a pretest and a posttest can be evidence of _____. a. base validity b. concurrent validity c. face validity d. construct validity

construct validity

Statements of concurrent validity indicate the extent to which test scores may be used to _____. a. determine the efficiency level of the test users b. estimate an individual's perception of the validity of the test c. estimate an individual's present standing on a criterion d. determine the consistency of the test

estimate an individual's present standing on a criterion

A cut score is set based on a test that best discriminates the test performance of two groups in the _____. a. bookmark method b. item-mapping method c. predictive yield method d. known groups method

known groups method

When trying to determine whether the benefits of using a test outweigh the costs, a test developer must conduct a _____. a. validity analysis b. productivity gain analysis c. reliability analysis d. utility analysis

utility analysis

A _____ is a statistic that quantifies reliability, ranging from 0 (not at all reliable) to 1 (perfectly reliable). a. Utility Coefficient b. Reliability Factorial c. Validity Coefficient d. Reliability Coefficient

Reliability Coefficient

Identify a true statement about the item response theory-based methods for setting cut scores in a test. a. Cut scores are typically set based on testtakers' performance across selected items on the test. b. The setting of cut scores is independent of test items c. Testtakers must answer items that are deemed to be above some minimum level of difficulty. d. The setting of cut scores is independent of any expert opinion.

Testtakers must answer items that are deemed to be above some minimum level of difficulty.

In the language of psychometrics, reliability refers to consistency in measurement. T/F:

True

What are the three elements of a multiple-choice format? (Check all that apply.) a. a correct alternative or option b. binary-choice items c. several incorrect alternatives or options d. a stem e. premises and responses

(1) a stem, (2) a correct alternative or option, and (3) several incorrect alternatives or options

Which informal rule of thumb is followed regarding the number of people on whom a test should be tried out? a. There should be a maximum of 20 subjects for each item on the test. b. There should be a minimum of 2 subjects for each item on the test. c. There should be a maximum of 50 subjects for each item on the test. d. There should be no fewer than 5 subjects for each item on the test.

An informal rule of thumb is that there should be no fewer than 5 subjects and preferably as many as 10 for each item on the test.
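
Applied to a concrete test length, the rule of thumb gives a range of tryout sample sizes. The sketch below is only an arithmetic illustration; the function name and the 30-item example are hypothetical.

```python
def tryout_sample_range(n_items: int) -> tuple[int, int]:
    """Return (minimum, preferred) tryout sample sizes for a test.

    Uses the informal rule of thumb: no fewer than 5 subjects per item,
    and preferably as many as 10 per item.
    """
    if n_items <= 0:
        raise ValueError("a test needs at least one item")
    return 5 * n_items, 10 * n_items


# Hypothetical 30-item test.
print(tryout_sample_range(30))  # (150, 300)
```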

An organization is experimenting with a new personnel test for its employees. The organization is trying to gauge if the test is working as it should and if it should be instituted on a permanent basis. Which of the following is most likely to aid the organization in making this decision? a. an achievement test b. a locator test c. a discriminant analysis d. an expectancy table

an expectancy table

Identify a true statement about pilot work. a. A test developer employs pilot work to determine how to conceive the idea of a test. b. Pilot study only includes physiological monitoring of the research subjects. c. Pilot study only includes open-ended interviews with research subjects. d. Test items are piloted to evaluate whether they should be included in the instrument.

Test items are piloted to evaluate whether they should be included in the instrument.

Which of the following must be done after a test has been constructed? (Check all that apply.) a. The item-difficulty index must be calculated using the item-score standard deviation. b. The test must be tried out on people who are similar in critical respects to the people for whom the test was designed. c. The item-discrimination index must be calculated using the correlation between the item score and the criterion score. d. The test must be tried out under conditions identical to those under which the standardized test will be administered.

The test must be tried out on people who are similar in critical respects to the people for whom the test was designed. The test must be tried out under conditions identical to those under which the standardized test will be administered.

In the context of testing, identify the disadvantages of using the Taylor-Russell tables in a utility analysis. (Check all that apply.) a. They do not indicate the likely average increase in performance with the use of a particular test. b. They unrealistically dichotomize performance into successful versus unsuccessful. c. They require the relationship between the predictor and the criterion to be nonlinear. d. They do not show the relationship between selection ratio and existing base rate.

They do not indicate the likely average increase in performance with the use of a particular test. They unrealistically dichotomize performance into successful versus unsuccessful.

Constructs are unobservable, presupposed (underlying) traits that a test developer may invoke to describe test behavior or criterion performance. T/F:

True

The alternate-forms and parallel-forms reliability of a test are an indication of whether two forms of a test are really equivalent. T/F:

True

In which of the following approaches should a learner demonstrate mastery of particular material before the learner moves on to advanced material that conceptually builds on the existing base of knowledge? a. prototype testing b. criterion-referenced testing c. pilot testing d. norm-referenced testing

criterion-referenced testing

A professor attempts to judge the validity of a test that tests the time management skill of a testtaker by using the score of the test to measure the individual's ability to handle time. The professor is judging the _____ of the test. a. criterion-related validity b. face validity c. content validity ratio d. central tendency error

criterion-related validity

In contrast to techniques and principles applicable to the development of norm-referenced tests, the development of criterion-referenced instruments _____. a. entails exploratory work with at least five groups of testtakers b. depends on the test scores of low scorers c. is independent of the objective of the test d. derives from a conceptualization of the knowledge or skills to be mastered

derives from a conceptualization of the knowledge or skills to be mastered

An _____ is the reservoir or well from which items will or will not be drawn for the final version of a test. a. item pool b. item index c. item branch d. item format

item pool

Tests are due for revision when:
1. The stimulus materials look dated and current testtakers cannot relate to them.
2. The verbal content of the test, including the administration instructions and the test items, contains dated vocabulary that is not readily understood by current testtakers.
3. As popular culture changes and words take on new meanings, certain words or expressions in the test items or directions may be perceived as inappropriate or even offensive to a particular group and must therefore be changed.
4. The test norms are no longer adequate as a result of group membership changes in the population of potential testtakers.
5. The test norms are no longer adequate as a result of age-related shifts in the abilities measured over time, and so an age extension of the norms (upward, downward, or in both directions) is necessary.
6. The reliability or the validity of the test, as well as the effectiveness of individual test items, can be significantly improved by a revision.
7. The theory on which the test was originally based has been improved significantly, and these changes should be reflected in the design and content of the test.

Anxiety is a construct that may be invoked to describe why a psychiatric patient paces the floor. What does this best describe? a. Construct b. Construct validity c. Utility d. Concurrent validity

Construct

Intelligence is a construct that may be invoked to describe why a student performs well in school. What does this best describe? a. Construct b. Construct validity c. Utility d. Concurrent validity

Construct

The theoretical, intangible way in which people vary best describes which of the following? a. Construct b. Construct validity c. Utility d. Concurrent validity

Construct

_____ is a judgment of how adequately a test score can be used to infer an individual's most probable standing on some measure of interest. a. Intercept bias b. Face validity c. Validity coefficient d. Criterion-related validity

Criterion-related validity

_____ is defined as the process of setting rules for assigning numbers in measurement. a. Piloting b. Anchoring c. Scaling d. Scoring

Scaling

An index of _____ gives information on the practical value of the information derived from scores on the test. a. utility b. dependability c. reliability d. validity

utility

In the context of testing and assessment, the term _____ refers to the usefulness or practical value of testing to improve efficiency. a. authenticity b. rigidity c. reliability d. utility

utility

Test scores are said to have _____ if their use in a particular situation helps a person make better decisions—better, that is, in the sense of being more cost-effective. a. validity b. authenticity c. reliability d. utility

utility

A _____ can be defined as a family of techniques that entail a cost-benefit analysis designed to yield information relevant to a decision about the practical value of a tool of assessment. a. validity analysis b. psychometric soundness analysis c. reliability analysis d. utility analysis

utility analysis

As applied to a test, _____ is a judgment or estimate of how well a test measures what it purports to measure in a particular context. a. probability b. validity c. universality d. intentionality

validity

As applied to a test, _____ is a judgment or estimate of how well a test measures what it purports to measure in a particular context. a. validity b. universality c. probability d. intentionality

validity

An IT solutions company hires 30 software testers and 22 of them are considered successful. In this case, the base rate is _____. a. 1.36 b. 0.0073 c. 0.73 d. 0.0136

0.73
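
This answer follows from dividing the number of hires considered successful by the total number of people hired. A small sketch of that arithmetic, with an assumed function name:

```python
def base_rate(successful_hires: int, total_hires: int) -> float:
    """Proportion of people hired under the existing system who are successful."""
    if total_hires <= 0:
        raise ValueError("there must be at least one hire")
    return successful_hires / total_hires


# Values from the card: 22 successful testers out of 30 hired.
print(round(base_rate(22, 30), 2))  # 0.73
```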

Identify a true statement about an index of inter-item consistency. a. A measure of inter-item consistency is calculated from a single administration of a single form of a test. b. It is a sufficient tool to measure multifaceted psychological variables such as intelligence or personality. c. A measure of inter-item consistency is calculated from multiple administrations of a single form of a test. d. It is directly related to test items that measure more than one personality trait.

A measure of inter-item consistency is calculated from a single administration of a single form of a test.

_____ is a judgment about the appropriateness of inferences drawn from test scores regarding individual standings on a variable such as a trait or a state. a. Incremental validity b. Face validity c. Base validity d. Construct validity

Construct validity

_____ describes a judgment of how adequately a test samples behavior representative of the universe of behavior that the test was designed to sample. a. Face validity b. Content validity c. Predictive validity d. Incremental validity

Content validity

_____ describes a judgment of how adequately a test samples behavior representative of the universe of behavior that the test was designed to sample. a. Face validity b. Predictive validity c. Incremental validity d. Content validity

Content validity

_____ refers to how uniform a test is in measuring a single concept. a. Homogeneity b. Face validity c. Halo effect d. Inference

Homogeneity

____________ contain items that are functionally uniform throughout, while ________ contain items that are functionally different.

Homogeneous tests, Heterogeneous tests.

If a test has a higher degree of internal consistency it is _________ in items. If a test has a lower degree of internal consistency it is _______ in items.

Homogeneous, Heterogeneous

Which of the following is a limitation of the Taylor-Russell tables? a. The tables cannot interpret the variables of selection ratio and base rate. b. Identification of a criterion score that separates successful from unsuccessful employees can potentially be difficult. c. It is impossible to estimate the extent to which inclusion of a particular test in a selection system will improve selection. d. The relationship between the predictor (the test) and the criterion (rating of performance on the job) must be nonlinear.

Identification of a criterion score that separates successful from unsuccessful employees can potentially be difficult.

Identify an accurate statement about convergent evidence for validity of a test. a. It only comes from correlations with tests measuring related constructs. b. It comes from correlations with tests measuring identical or related constructs. c. It only comes from correlations with tests measuring identical constructs. d. It comes from correlations with tests measuring unrelated constructs.

It comes from correlations with tests measuring identical or related constructs.

Identify a true statement about validity as applied to a test. a. It is a judgment of how dependable a test is to measure the same construct consistently. b. It is a judgment of how consistently a given test measures a particular construct. c. It is a judgment that serves as a stimulus to the creation of a new standardized test. d. It is a judgment based on evidence about the appropriateness of inferences drawn from test scores.

It is a judgment based on evidence about the appropriateness of inferences drawn from test scores.

In the context of setting cut scores, identify a true statement about the method of predictive yield. a. It is a norm-referenced method for setting cut scores. b. It employs a family of statistical techniques and is also called discriminant analysis. c. It uses a bookmark to separate test items and set a cut score. d. It sheds light on the relationship between identified variables and two naturally occurring groups.

It is a norm-referenced method for setting cut scores.

In the context of utility analysis, which of the following is true of the Taylor-Russell tables? a. They entail obtaining the difference between the means of selected and unselected groups to know what a test is adding to established procedures. b. They assist in judging the utility of a particular test by determining the increase in average score on some criterion measure. c. They provide an estimate of the extent to which inclusion of a particular test in the selection system will improve selection. d. They provide an indication of the difference in average criterion scores for a selected group as compared with an original group.

They provide an estimate of the extent to which inclusion of a particular test in the selection system will improve selection.

Identify a true statement about test-retest reliability estimates. a. This measure is appropriate when evaluating the reliability of a test that purports to measure something that is relatively stable over time. b. This measure is used when it is impractical or undesirable to assess reliability with two tests or to administer a test twice. c. This estimate is obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once. d. This estimate allows a test user to estimate internal consistency reliability from a correlation of two halves of a test.

This measure is appropriate when evaluating the reliability of a test that purports to measure something that is relatively stable over time.

A source of error attributable to variations in the testtaker's feelings, moods, or mental state over time is known as _____. a. Transient Error b. Odd-even Reliability c. Inter-item Consistency d. Measurement Error

Transient Error

While eliminating or rewriting items in a test during the test revision stage, a test developer must _____. a. balance the strengths and weaknesses across items b. retain the easy items and delete the difficult ones c. make sure that the items being eliminated are invalid d. write a small item pool in the domain in which the test should be sampled

balance the strengths and weaknesses across items

The percentage of people hired under an existing system for a particular position is known as the _____. a. base rate b. selection ratio c. productivity ratio d. utility rate

base rate

In the context of testing, a disadvantage of using the Taylor-Russell tables in a utility analysis is that they _____. a. require the relationship between predictor and criterion to be nonlinear b. do not consider the cost of testing in comparison to benefits c. do not dichotomize criterion performance d. overestimate utility unless top-down selection is used

do not consider the cost of testing in comparison to benefits

In the context of a test, a feature of the item response theory (IRT) framework is that _____. a. cut scores are set based on testtakers' performance across all the items on a test b. each item is associated with a particular level of difficulty c. the setting of cut scores is independent of expert opinion d. the difficulty level of an item is independent of its cut score

each item is associated with a particular level of difficulty

One way a test developer can improve the homogeneity of a test containing items that are scored dichotomously is by _____. a. eliminating items that do not show significant correlation coefficients with total test scores b. using multiple samples in testing to increase validity shrinkage c. assuring testtakers that improving homogeneity is the end-all of proving construct validity d. using a single sample in testing to increase validity shrinkage

eliminating items that do not show significant correlation coefficients with total test scores
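
The screening step named in this answer can be sketched in code. The following is a minimal illustration, not a prescribed procedure: the function names and the response matrix are hypothetical, each item is correlated with the total of the remaining items (a "corrected" item-total correlation, chosen here so an item is not correlated with itself), and no significance test is performed.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    # Undefined (division by zero) if either list has no variance.
    return cov / sqrt(ss_x * ss_y)


def item_total_correlations(responses):
    """responses: one row per testtaker, each row a list of 0/1 item scores."""
    n_items = len(responses[0])
    correlations = []
    for j in range(n_items):
        item_scores = [row[j] for row in responses]
        # Total score on the remaining items, excluding item j itself.
        rest_scores = [sum(row) - row[j] for row in responses]
        correlations.append(pearson_r(item_scores, rest_scores))
    return correlations


# Hypothetical data: 6 testtakers, 4 dichotomously scored items.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
correlations = item_total_correlations(responses)
for number, r in enumerate(correlations, start=1):
    print(f"item {number}: r = {r:.2f}")
```

In this made-up sample the last item correlates negatively with the rest of the test, so it would be a candidate for elimination or rewriting.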

Identify the component of an observed test score that has nothing to do with a testtaker's ability. a. efficiency b. error c. true variance d. true score

error

Which of the following tables can provide an indication of the likelihood that a testtaker will score within some interval of scores on a criterion measure? a. Naylor-Shine tables b. expectancy tables c. protocol tables d. Taylor-Russell tables

expectancy tables
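
As a rough illustration of what an expectancy table reports, the sketch below groups testtakers into test-score bands and computes, for each band, the proportion falling into each criterion category. The function name, band width, category labels, and records are all hypothetical, and integer test scores are assumed.

```python
from collections import Counter, defaultdict


def expectancy_table(records, band_width):
    """records: iterable of (test_score, criterion_category) pairs.

    Returns {(band_low, band_high): {category: proportion}}.
    """
    counts = defaultdict(Counter)
    for score, category in records:
        band_low = (score // band_width) * band_width
        counts[band_low][category] += 1

    table = {}
    for band_low in sorted(counts):
        total = sum(counts[band_low].values())
        band = (band_low, band_low + band_width - 1)
        table[band] = {cat: n / total for cat, n in counts[band_low].items()}
    return table


# Hypothetical (test score, later job rating) pairs.
records = [
    (12, "unsatisfactory"), (15, "satisfactory"), (18, "unsatisfactory"),
    (22, "satisfactory"), (25, "satisfactory"), (27, "unsatisfactory"),
    (31, "satisfactory"), (38, "satisfactory"),
]
table = expectancy_table(records, band_width=10)
for band, proportions in table.items():
    print(band, proportions)
```

Each row of the result can be read as the likelihood, within this sample, that a testtaker scoring in that band ends up in a given criterion category.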

A particular test situation or universe is described in terms of its _____, which include things like the number of items in the test, the amount of training the test scorers have had, and the purpose of the test administration.

facets

In a psychometric context, _____ refers to the extent to which a test is used in an impartial, just, and equitable way. a. halo effect b. fairness c. bias d. central tendency

fairness

A(n) _____ is a logical result or deduction. a. construct b. hit rate c. inference d. criterion

inference

Michael encountered a difficult question in his LSAT exam. By using his knowledge of law terminology, he was able to logically arrive at the answer. In this scenario, Michael used the _____ method. a. convergent evidence b. hit rate c. predictive validity d. inference

inference

The degree of correlation among all the items on a scale is known as _____. a. inter-scorer reliability b. inter-item consistency c. coefficient of equivalence d. coefficient of stability

inter-item consistency

The degree of agreement or consistency between two or more raters with regard to a particular measure is known as _____. a. homogeneity b. heterogeneity c. split-half reliability d. inter-scorer reliability

inter-scorer reliability

The different types of statistical scrutiny that test data can potentially undergo are referred to collectively as _____. a. item-pool scoring b. item branching c. item analysis d. item-characteristic bias

item analysis

This stage involves statistical procedures that assist in making judgments about test items. a. test conceptualization b. test construction c. test tryout d. item analysis e. test revision

item analysis

Variables such as the form, plan, structure, arrangement, and layout of individual test items are collectively referred to as _____. a. index b. item bank c. scale d. item format

item format

From the perspective of a test creator, a challenge in test development is to _____. a. maximize the proportion of the total variance that is error variance and to minimize the proportion of the total variance that is true variance b. minimize the true variance and maximize the error variance c. maximize the proportion of the total variance that is true variance and to minimize the proportion of the total variance that is error variance d. neutralize the true variance and error variance

maximize the proportion of the total variance that is true variance and to minimize the proportion of the total variance that is error variance
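
The idea behind this answer can be written compactly. In classical test theory, total score variance is the sum of true variance and error variance, and reliability is the proportion of total variance that is true variance. The symbols below are notation chosen for this note (total variance, true variance, error variance, and the reliability coefficient), not necessarily the textbook's.

```latex
\sigma^2 = \sigma^2_{\text{tr}} + \sigma^2_{e},
\qquad
r_{xx} = \frac{\sigma^2_{\text{tr}}}{\sigma^2}
       = 1 - \frac{\sigma^2_{e}}{\sigma^2}
```

Increasing the share of true variance therefore raises reliability toward 1, while a larger share of error variance pushes it toward 0.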

A university administered a geometry test to a new batch of exchange students. All of the students received failing grades because they were unfamiliar with the language, English, that was used to administer the test. The test was designed to evaluate the students' knowledge of geometry but it reflected their knowledge and proficiency in English instead. This scenario is an example of _____. a. transient error b. odd-even reliability c. inter-item consistency d. measurement error

measurement error

The term _____ refers to the inherent uncertainty associated with any measurement, even after care has been taken to minimize preventable mistakes. a. utility b. reliability c. true variance d. measurement error

measurement error

The _____ provides evidence for the validity of a test by demonstrating that scores on the test vary in a predictable way as a function of membership in some group. a. method of paired comparisons b. method of contrasted groups c. membership-characteristic curve d. item-characteristic curve

method of contrasted groups

Identify an example of a selected-response item format. a. essay format b. completion item format c. short answer format d. multiple-choice format

multiple-choice format

Criterion-related validity is difficult to establish on many classroom tests by professors because _____. a. every criterion reflects the level of the students' knowledge of a particular material b. all classroom tests are informal tests c. all classroom tests are multiple-choice tests d. no obvious criterion reflects the level of the students' knowledge of a specific material

no obvious criterion reflects the level of the students' knowledge of a specific material

Many utility models are constructed on the assumption that the _____. a. people who do not score well on a personnel test are unsuitable for the job profile b. people selected by a personnel test will accept the position that they are offered c. top scorers of a personnel test will easily acclimate to the work environment d. top scorers of a personnel test will be a perfect fit for the job profile

people selected by a personnel test will accept the position that they are offered

The term _____ refers to the preliminary research surrounding the creation of a prototype of a test. a. scaling b. planning c. pilot work d. priority work

pilot work

Measures of the relationship between test scores and a criterion measure obtained at a future time provide an indication of the _____ of a test. a. concurrent validity b. face validity c. predictive validity d. content validity

predictive validity

In the context of a test, identify the uses of item response theory (IRT) information curves. (Check all that apply.) a. tailoring an instrument to provide high information or precision b. presenting test items on the basis of responses to previous items c. recognizing nonpurposive or inconsistent responding d. weeding out uninformative questions or eliminating redundant items

tailoring an instrument to provide high information or precision weeding out uninformative questions or eliminating redundant items

This stage involves conceiving the idea for a test. a. test conceptualization b. test construction c. test tryout d. item analysis e. test revision

test conceptualization

Item sampling and content sampling are sources of variance during _____. a. test construction b. test interpretation c. test scoring d. test administration

test construction

This stage involves writing and formatting test items and setting scoring rules. a. test conceptualization b. test construction c. test tryout d. item analysis e. test revision

test construction

In the interest of ensuring content validity of a test, _____. a. incremental validity must be identified b. irrelevant content should be used to further understanding of the construct c. test developers have a fuzzy vision of the construct being measured in the test d. test developers include key components of the construct being measured

test developers include key components of the construct being measured

This stage involves action taken to modify a test's content or format for the purpose of improving its effectiveness. a. test conceptualization b. test construction c. test tryout d. item analysis e. test revision

test revision

This stage involves administering a preliminary form of a test to a representative sample of testtakers. a. test conceptualization b. test construction c. test tryout d. item analysis e. test revision

test tryout

An estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test is known as _____. a. parallel-forms reliability b. split-half reliability c. test-retest reliability d. alternate-forms reliability

test-retest reliability
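
A test-retest estimate is a correlation between two sets of scores from the same people, which can be sketched as below. The scores are invented for illustration, the variable names are hypothetical, and the Pearson coefficient is assumed as the correlation statistic.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between paired scores x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical scores for the same five testtakers on two administrations
# of the same test; position i in each list is the same person.
first_administration = [10, 12, 14, 16, 18]
second_administration = [11, 12, 15, 15, 19]

test_retest_r = pearson_r(first_administration, second_administration)
print(f"test-retest reliability estimate: {test_retest_r:.2f}")
```

A value near 1 indicates that testtakers kept roughly the same relative standing across the two administrations.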

Which of the following can be used to set fixed cut scores that can be applied to personnel selection tasks as well as to questions regarding the presence or absence of a particular trait, attribute, or ability? a. the Brogden-Cronbach-Gleser formula b. the problem-solving model c. the response-to-intervention model d. the Angoff method

the Angoff method

Identify the tables that are used to obtain the difference between the means of the selected and unselected groups in order to derive an index of what a test is adding to already established procedures. a. the Taylor-Russell tables b. protocol tables c. the Naylor-Shine tables d. expectancy tables

the Naylor-Shine tables

Which of the following is a technique for setting cut scores that is most likely to take into account the number of positions to be filled, projections regarding the likelihood of offer acceptance, and the distribution of applicant scores? a. the method of predictive yield b. the item-mapping method c. the method of contrasting groups d. the known groups method

the method of predictive yield

The Taylor-Russell tables provide an estimate of the percentage of employees hired by the use of a particular test who will be successful at their jobs. Identify the variables that the tables use to provide the estimate. (Check all that apply.) a. the average score on some criterion measure b. the selection ratio used c. the cut score d. the test's validity

the selection ratio used the test's validity

A test is due for revision when the _____. a. reliability as well as the effectiveness of individual test items has remained constant b. test developer has come up with new and improved items to be tested c. test tryout has been successful d. theory on which the test was originally based has been improved significantly

theory on which the test was originally based has been improved significantly

According to classical test theory, _____ is a value that genuinely reflects an individual's ability (or trait) level as measured by a particular test. a. true variance b. true score c. coefficient alpha d. error variance

true score

According to generalizability theory, given the exact same conditions of all the facets in the universe, the exact same test score should be obtained; this score is known as the _____. a. true score b. universe score c. true variance d. error variance

universe score

In generalizability theory, the concept of a(n) _____ replaces that of a true score. a. error variance b. transient score c. true variance d. universe score

universe score

