Test and Measurements Study Guide


Samuel has participants answer the items on his measure using the following scale: 1: Not at all true 2: A little true 3: Somewhat true 4: Very true Which of the following is true about this response scale? It is continuous It is unbalanced It is unipolar It is bipolar

It is unipolar

Which of the following is not true about construct validity? It requires making predictions or hypotheses about associations between constructs It involves assembling evidence about what a test really measures It requires expert judges or raters It involves looking at multiple different variables and correlations

It requires expert judges or raters

What does a negative discrimination index indicate to test developers? More people in the high-scoring group (compared to the low-scoring group) got the item correct More people in the low-scoring group (compared to the high-scoring group) got the item correct The item is valid The difficulty index is close to 0

More people in the low-scoring group (compared to the high-scoring group) got the item correct

When an assessment compares the performance of a child against other children of the same age it is called a(n) ________________________. Achievement test Standardized test Norm-referenced test Criterion-referenced test

Norm-referenced test

The score that you actually record is which of the following? Error score False score True score Observed score

Observed score

Tests can apply to a wide variety of content areas. Which of the following is not a general content area that psychological tests cover? Ability level Personality attributes Vocational interests Physical features

Physical features

Dr. Love creates a scale measuring how well couples communicate. He finds that couples scoring low on his scale are much more likely to get divorced within the next five years. This is an example of what type of validity for Dr. Love's scale? Content validity Predictive validity Concurrent validity Construct validity

Predictive validity

In order to determine the potential for performance in a future setting, an aptitude test must have which of the following? Concurrent validity Discriminant validity Predictive validity High reliability

Predictive validity

If a scale is internally consistent, then there is evidence that Individuals perform similarly on the scale in the future and the past All of the items on the scale measure the same construct The scale can be used to measure an individual's true score on a construct All of the above

All of the items on the scale measure the same construct

Why are z scores so important to the field of measurement? Carries a positive connotation regarding performance Allows scores from different scales/distributions to be compared Easily understood by non-testing professionals Allows for positive and negative scores

Allows scores from different scales/distributions to be compared
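
The comparison across scales can be illustrated with a minimal sketch. The two score lists are hypothetical, and using the population standard deviation is an assumption of this example (a course may use the sample SD instead).

```python
from statistics import mean, pstdev

def z_scores(scores):
    """Return each score's distance from the mean in standard-deviation units."""
    m = mean(scores)
    sd = pstdev(scores)  # population SD; an assumption for this sketch
    return [(x - m) / sd for x in scores]

# Hypothetical raw scores on two tests that use different scales
test_a = [40, 50, 60]   # scored out of 100
test_b = [10, 15, 20]   # scored out of 25

z_a = z_scores(test_a)
z_b = z_scores(test_b)

# The top scorer on each test sits the same number of SDs above the mean,
# even though the raw scores (60 vs. 20) are not directly comparable.
print(round(z_a[2], 3), round(z_b[2], 3))
```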

The first step in scale construction is to _____. Develop an item pool Determine the structure and format of the response scale Determine the extent of the construct/content you want to measure Decide upon the number of items to include

Determine the extent of the construct/content you want to measure

What is the first step you should take when designing an aptitude test? Determine the content & distribution of items Pilot items in a representative sample Determine the skills that are important for success in the role Run item analysis to determine whether the test shows good item discrimination

Determine the skills that are important for success in the role

Poor test instructions are an example of which of the following? Trait error Testing error Method error Instrument error

Method error

The same variable (e.g., happiness) can be assessed using different levels of measurement False True

True

If you add together the squared bivariate correlations between each predictor variable (X1 and X2) and the outcome (Y), the result should be ______ the squared multiple correlation between both predictor variables and the outcome. In other words, r²_Y1 + r²_Y2 is _____ R²_Y.12 equal to greater than no way to tell less than

greater than
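
A short sketch of why the sum of the squared bivariate correlations tends to exceed the squared multiple correlation: when the two predictors are correlated with each other, they share some of the variance they explain, and that overlap is counted only once in R². The correlation values below are hypothetical, and the function uses the standard two-predictor formula for R² from bivariate correlations. Note that the two quantities come out equal when the predictors are uncorrelated.

```python
def r2_two_predictors(r_y1, r_y2, r_12):
    """Squared multiple correlation for two predictors, from bivariate correlations."""
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)

# Hypothetical correlations of each predictor with the outcome
r_y1, r_y2 = 0.5, 0.5
sum_of_squares = r_y1**2 + r_y2**2

# Predictors that overlap (r_12 = .5) versus predictors that do not (r_12 = 0)
overlapping = r2_two_predictors(r_y1, r_y2, r_12=0.5)
independent = r2_two_predictors(r_y1, r_y2, r_12=0.0)

print(sum_of_squares, round(overlapping, 3), independent)
```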

The variance of a variable is a measure of: how much variability exists among scores on the variable how well the variable measures what it is supposed to be measuring how consistently the variable is measured over time how internally consistent the items are measuring a variable

how much variability exists among scores on the variable

The squared correlation between two variables tells you what the value of one variable is given the score on another variable how much variance they share whether the variables are positively or negatively related whether two variables are on the same scale

how much variance they share

What is the biggest problem with this question? True or False: I do not vote Republican. It is not concrete It involves a negation/double-negative It is a biased/loaded question It is double-barreled

It involves a negation/double-negative

When writing true-false items, what does the test developer most need to consider? Need for precision Number of items Probability of guessing Plausible distractors

Need for precision

A factor's eigenvalue equals the PERCENT of total variance in the data accounted for by a factor the AMOUNT of total variance in the data accounted for by a factor the MEAN of all the factor loadings 1

the AMOUNT of total variance in the data accounted for by a factor

A scree plot is a graph of _______ the factor eigenvalues the factor correlations the item loadings the item correlations

the factor eigenvalues
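
The difference between the AMOUNT and the PERCENT of variance can be sketched with a few lines of arithmetic. The eigenvalues below are hypothetical. The sketch assumes standardized items (a correlation matrix), where each item contributes one unit of variance, so the eigenvalues sum to the number of items.

```python
# Hypothetical eigenvalues from a factor analysis of 4 standardized items
eigenvalues = [2.5, 1.0, 0.3, 0.2]

# Each eigenvalue is the AMOUNT of variance its factor accounts for;
# together they add up to the total variance (here, the number of items).
total_variance = sum(eigenvalues)

# Dividing by the total converts each amount into a proportion (percent).
proportions = [ev / total_variance for ev in eigenvalues]

# A scree plot graphs these eigenvalues against factor number;
# printing them in order is the text equivalent.
for i, (ev, p) in enumerate(zip(eigenvalues, proportions), start=1):
    print(f"Factor {i}: eigenvalue = {ev}, share of variance = {p:.1%}")
```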

If a scale is completely unreliable, the correlation coefficient in any test of its reliability will be close to ___. -1.00 0.00 1.00 0.50

0.00

Validity is typically described in which of the following ways? A Continuum of weak to strong A Cronbach's Alpha A P value A correlation coefficient

A Continuum of weak to strong

What type of test measures current knowledge of a specific topic? Intelligence test Aptitude test Achievement test Projective test

Achievement test

A psychological construct is A behavioral tendency or complex pattern of behavior Not directly observable Something you must infer from a variety of questions or behaviors All of the above

All of the above

When administering an intelligence test, what is the term for the lowest point on the test where the test taker can pass two consecutive items of equal difficulty? Lower Limit Ceiling Age Upper Limit Basal Age

Basal Age

Which type of item best minimizes the effects of guessing on test scores? True/False Completion Multiple Choice Matching

Completion

Dr. Baylor is concerned that the items on his "psychological health" scale really do adequately represent all aspects of psychological health. Dr. Baylor's concern is one of ______________. Content validity Construct validity Concurrent validity Predictive validity

Content validity

What is the fundamental logic behind the multitrait-multimethod technique? If a measure has construct validity, then... Correlations should be highest among different traits (e.g., aggression, intelligence) all measured using the same method (e.g., self-report) Correlations should be lowest among different traits (e.g., aggression, intelligence) all measured using the same method (e.g., self-report) Correlations should be highest among different methods (e.g., observation, self-report) all measuring the same trait (e.g., aggression) Correlations should be lowest among different methods (e.g., observation, self-report) all measuring the same trait (e.g., aggression)

Correlations should be highest among different methods (e.g., observation, self-report) all measuring the same trait (e.g., aggression)

If you correlate scores from your test with a real-world indicator of the construct you're trying to test, what type of validity evidence are you collecting? Criterion validity Content validity Face validity Construct validity

Criterion validity

If you are evaluating students' scores based on a predefined level of performance, what type of scores are you using? Stanines Standard scores Criterion-referenced scores Percentile scores

Criterion-referenced scores

What type of test is used when you want to determine if a student has achieved a specific level of mastery in a particular content area? Researcher-made test Criterion-referenced test Norm-referenced test Standardized test

Criterion-referenced test

As part of a study, Sandra answers nine questions asking about depressive symptoms (e.g. "I cry a lot," "I often feel like things are hopeless"). She answers each of these questions on a 1 (not at all true) to 5 (very true) rating scale. Each one of these nine questions could be considered a(n) A) item B) scale C) variable D) both A and C

D) both A and C

The purpose of the scale development project is to Test a hypothesized relationship between scores on a self-report measure and other constructs Design a self-report measure and evaluate its psychometric properties (e.g., reliability, validity) Create a scale measuring a construct no one has ever examined before Describe and distinguish between various forms of assessment

Design a self-report measure and evaluate its psychometric properties (e.g., reliability, validity)

What is calculated when you examine the proportion of test takers who get an item correct? Item analysis Discrimination index Difficulty index Correct alternatives

Difficulty index

What is the computed number for how well an item distinguishes between people high and low in a construct? Item analysis Discrimination index Difficulty index Correct alternatives

Discrimination index
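
The difficulty and discrimination indices can be sketched as follows. The response data are hypothetical (1 = correct, 0 = incorrect), and the discrimination index is computed here as the difference in proportion correct between the high-scoring and low-scoring groups, which is one common way to calculate it.

```python
def difficulty_index(responses):
    """Proportion of test takers who answered the item correctly."""
    return sum(responses) / len(responses)

def discrimination_index(high_group, low_group):
    """Proportion correct in the high group minus proportion correct in the low group."""
    return difficulty_index(high_group) - difficulty_index(low_group)

# Hypothetical responses to one item (1 = correct, 0 = incorrect)
high_scorers = [1, 1, 1, 1, 0]
low_scorers = [1, 0, 0, 0, 0]

p = difficulty_index(high_scorers + low_scorers)
d = discrimination_index(high_scorers, low_scorers)

# A positive d means more high scorers than low scorers got the item right;
# a negative d would mean the reverse.
print(p, round(d, 2))
```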

Dr. McStuffins's patients think that her measure of health is valid because it involves the sorts of things they think of when they conceptualize being healthy (e.g., "not having boo-boos," "eating your vegetables"). This is an example of what type of validity? Content validity Construct validity Criterion validity Face validity

Face validity

When ordering items within a scale/survey, you should go from the most _____ to the most ______. General, Specific positive, negative objectionable, benign Specific, General

General, Specific

The optimal difficulty for a question is Close to 1 Close to 0 Halfway between 100% correct and chance level correct Halfway between 100% correct and 0% correct

Halfway between 100% correct and chance level correct
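
As a worked illustration of "halfway between 100% correct and chance level," the sketch below assumes chance level is 1 divided by the number of answer options (that is, blind guessing among equally attractive options). The function name is hypothetical.

```python
def optimal_difficulty(n_options):
    """Midpoint between perfect performance (1.0) and chance-level performance."""
    chance = 1 / n_options  # assumes pure guessing among n_options choices
    return (1.0 + chance) / 2

four_option_mc = optimal_difficulty(4)   # chance = .25
true_false = optimal_difficulty(2)       # chance = .50

print(four_option_mc, true_false)
```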

What is the key to establishing criterion validity? Having a criterion that can be measured simultaneously with the test Having a criterion that can be measured after the test takes place Assessing multiple criteria Having a criterion that is meaningfully related to the purpose of the test

Having a criterion that is meaningfully related to the purpose of the test

Which of the following is not ALWAYS recommended when writing items for a scale? Make items specific and concrete Include negatively keyed items Keep items simple, clear, & short Consider the knowledge and experiences of your audience

Include negatively keyed items

Which of the following often underlies individuals' achievement and aptitude in specific content areas? Interpersonal functioning Creativity Intelligence Mechanical ability

Intelligence

Two trained professionals observe the behavior of children in a classroom. They each rate observed behaviors using the same form, and the percent of items that were rated the same is calculated. This is an example of which type of reliability? Test-retest reliability Internal consistency Interrater reliability Parallel reliability

Interrater reliability
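
The percent-agreement calculation described in this item can be sketched in a few lines. The two raters' codes are hypothetical.

```python
def percent_agreement(rater_a, rater_b):
    """Percent of items on which two raters gave the same rating."""
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return 100 * matches / len(rater_a)

# Hypothetical ratings of the same five observed behaviors
rater_a = ["on-task", "off-task", "on-task", "on-task", "off-task"]
rater_b = ["on-task", "off-task", "off-task", "on-task", "off-task"]

agreement = percent_agreement(rater_a, rater_b)
print(agreement)
```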

When we want to evaluate a person's performance relative to that of others within a particular group, which of the following would we want to use? Norm-referenced score Raw score Criterion-referenced score Performance indicator

Norm-referenced score

What is a common pitfall with completion items? The potential for more than one correct response Including statements that are only partly true Making the answer too easy to guess Being overly detailed

The potential for more than one correct response

Factor loadings are: always greater than 1 a sign of construct validity an indication of reliability correlations between the original items and factors

correlations between the original items and factors

An item whose highest factor loading and next highest factor loading are relatively similar (<.2 different) is considered a(n) _____________. extraction loading crossloader communality eigenvalue

crossloader

Item discrimination can be calculated by a) comparing the number of correct responses in the "high" versus "low" groups b) correlating scores on an item with scores on the total scale c) examining which items correlate most multiple subscales d) both a and b

d) both a and b

When designing a scale, factor analysis should be used to: measure the validity of a scale measure the reliability of a scale determine whether the scale measures one construct or multiple constructs identify the best and worst items on the scale

determine whether the scale measures one construct or multiple constructs

The goal of a factor analysis is to retain a small number of factors that capture very little of the variance in the variables. retain a small number of factors that capture a lot of the variance in the variables. retain a large number of factors that capture very little of the variance in the variables retain a large number of factors that capture a lot of the variance in the variables

retain a small number of factors that capture a lot of the variance in the variables.

A measurement device or technique used to quantify behavior is termed a test construct observation self-report

test

The maximum number of factors resulting from a factor analysis is______ the number of items in the factor analysis. greater than less than equal to inversely related to

equal to

Which kind of question is best for assessing higher order thinking and complex understanding? essay multiple choice matching completion

essay

If a test is being used to determine how scores on a construct relate to something else, then it is being used for hypothesis testing/prediction classification selection diagnosis

hypothesis testing/prediction

When sample size is large and the number of items in the analysis is small, chance results occur _________. never less often more often always

less often

If you want to know the meaning of a factor, you should look at the scree plot look at the factor's eigenvalue look at the communalities look at which items load highly on it

look at which items load highly on it

Factor rotation: makes some items more correlated and some items less correlated with each factor (than they initially were) is only necessary when you have reverse scored variables in the data set flips the meaning of the factors from positive to negative (i.e. reverse scores the factors) a and b only

makes some items more correlated and some items less correlated with each factor (than they initially were)

If a variable was measured in a way that provides magnitude but not equal intervals or an absolute zero, it was measured at the _______ level nominal ordinal ratio interval

ordinal

Dr. Chocula tests his students' understanding of the material in his math class by having them complete math problems. The format of this test is observation performance self-report inference

performance

If a test is being used to determine who should be admitted into graduate school, then it is being used for hypothesis testing/prediction classification selection diagnosis

selection

A difficulty index close to 1 indicates that individuals low in the construct are more likely to answer the question correctly that individuals high in the construct are more likely to answer the question correctly that the question is too hard that the question is too easy

that the question is too easy

Why is studying tests and measurement important within psychology? Tests may be used unfairly or inaccurately if the attributes of the test (e.g., purpose, populations it was designed for, reliability/validity) are not well understood Tests are used ubiquitously throughout psychological research and applied psychology settings (e.g., school psychology, counseling) Our understanding of human behavior is only as good as the tools we use to measure it All of the above

All of the above

Measures of anxiety and depression have a strong, negative correlation (e.g., r = -.70). Knowing this, which of the following is true? All of the above are true You will generally be able to predict someone's score on a depression measure well if you know their score on a measure of anxiety. You will generally be able to predict someone's score on an anxiety measure well if you know their score on a measure of depression. A scatterplot of individuals' scores on an anxiety measure and a depression measure will group relatively close to a line of best fit

All of the above are true

Why are correlation coefficients key to measuring reliability? Because they can test whether... Individuals who get high scores relative to the mean on one scale (or items within that scale) consistently get high scores relative to the mean on another scale (or items within another scale) An individual's scores on reverse-scored items are positively related to their scores on the other items on the scale. An individual's scores on a scale (or items on that scale) are consistently linked with outcomes An individual's scores on the same scale (or items within that scale) are consistently high or low relative to the mean of that scale

An individual's scores on the same scale (or items within that scale) are consistently high or low relative to the mean of that scale

According to Spearman's General Factor Theory, intelligence ("g") is The combination of componential, experiential, and contextual intelligence An underlying factor that explains individual differences in intellect Specific, independent abilities that vary among individuals The first "primary" mental ability

An underlying factor that explains individual differences in intellect

According to the lecture, which of the following is a problem with the response scale below? (Very untrue, Untrue, Slightly untrue, Neutral, Slightly true, True, Very true) Has too many options Contains a neutral option Response scale is not balanced Answers are not all mutually exclusive

Answers are not all mutually exclusive

Sonya's new measure of volunteerism did not explain a significant amount of additional variance in life satisfaction (change in R2) beyond that explained by an existing volunteerism scale. This means that: The new measure of volunteerism IS significantly correlated with life satisfaction (alone) Any of the above could be true; we cannot tell from the information provided The new measure of volunteerism is NOT significantly correlated with life satisfaction (alone) The existing volunteerism scale is significantly correlated with life satisfaction (alone)

Any of the above could be true; we cannot tell from the information provided

Assignment revisions in the Scale Development Project... Can earn you back all of the points you lost on the assignment Can be turned in any time after the assignment has been graded Are possible for the introduction and existing measures sections Are possible for all SDP assignments

Are possible for the introduction and existing measures sections

Item characteristic curves are helpful because they are easier to understand than the discrimination index are easier to understand than the difficulty index show the graphical relationship of difficulty and discrimination Can help identify who the item best discriminates among (e.g., low vs. moderately performing individuals)

Can help identify who the item best discriminates among (e.g., low vs. moderately performing individuals)

A Q-sort format is helpful because It assesses complex thinking skills (like synthesis and application) It forces people to discriminate among options when they normally would all select the highest or lowest values It allows you to easily score many items quickly It promotes free expression and creativity

It forces people to discriminate among options when they normally would all select the highest or lowest values

Dr. Testopherson creates a new scale measuring anxiety. Unfortunately, his scale has very poor reliability: items do not seem to all measure the same thing and individuals get different scores each time they take it. How should Dr. Testopherson go about establishing his scale's validity? By asking experts to verify that the scale items are good examples of various aspects of anxiety By correlating scores on his scale with related (e.g., depression) and unrelated (e.g., GPA) constructs By determining whether people who score high on his scale tend to have a diagnosed anxiety disorder according to the DSM He really can't. Until his scale is reliable, it cannot be a valid measure of anything.

He really can't. Until his scale is reliable, it cannot be a valid measure of anything.

Bryce has a 5-item scale and decides to run an item analysis. He correlates participants' scores on Item 1 with the mean score of all five items on the scale (including Item 1). He finds an item-total correlation of .42. This correlation... shows that the scale is reliable shows that Item 1 is reliable shows that Item 1 is valid Is an OVERestimate; Bryce actually needs to run a corrected item-total correlation (excluding Item 1 from the scale mean)

Is an OVERestimate; Bryce actually needs to run a corrected item-total correlation (excluding Item 1 from the scale mean)
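
The overestimate can be seen in a small sketch that computes both versions of the item-total correlation. The response data are hypothetical, and totals are used in place of means (the correlation is the same either way, because dividing by the number of items only rescales the scores).

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ssx * ssy)

# Hypothetical responses: one row per participant, one column per item (5 items)
responses = [
    [1, 2, 1, 2, 1],
    [2, 1, 2, 2, 3],
    [3, 3, 2, 3, 3],
    [4, 3, 4, 3, 3],
    [5, 4, 3, 4, 3],
    [2, 4, 3, 3, 4],
]

item1 = [row[0] for row in responses]
total_with_item = [sum(row) for row in responses]         # includes Item 1
total_without_item = [sum(row[1:]) for row in responses]  # excludes Item 1

uncorrected = pearson_r(item1, total_with_item)
corrected = pearson_r(item1, total_without_item)

# Item 1 correlates with itself inside the full total, which inflates the estimate.
print(round(uncorrected, 3), round(corrected, 3))
```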

An advantage of the continuous rating scale (compared to a category or Likert scale) is: It is easier to score It only requires that the endpoints be rated It allows fine-grained distinctions without overly taxing participants It requires fine motor control

It allows fine-grained distinctions without overly taxing participants

How is guessing reduced with matching items? Repeating answer options Providing a great number of premises Providing more response options Using two columns when formatting

Providing more response options

Which of the following levels of measurement provides the most information about a variable Nominal Ratio Ordinal Interval

Ratio

Jennifer is creating a scale to measure introversion. In her experience, people who are introverted prefer to be alone, so she creates an item on her scale that reads: "I prefer to be alone" (agree-disagree). This is an example of what type of strategy for choosing scale content? Criterion-group strategy Factor analysis strategy Rational-content strategy Theory-based strategy

Rational-content strategy

Andrew answers 5 questions about self-esteem (each rated 1-5) and earns a total score of 19 on this self-esteem measure. 19 is an example of what kind of score? Raw score Norm-referenced score Criterion-referenced score Z-score

Raw score

If a measure is said to be consistent, you might conclude that the measure is _______________________. Standard Reliable Concurrent Valid

Reliable

What does a percentile (rank) score of 85 tell us about an individual? Scored higher than 85% of others taking the test Has mastered 85% of the material Tells us nothing about the individual Scored higher than 15% of others taking the test

Scored higher than 85% of others taking the test
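
A percentile rank can be sketched as the percent of scores in the group that fall below a given score. The score list is hypothetical, and this "strictly below" definition is one common convention (some textbooks also count half of the tied scores).

```python
def percentile_rank(scores, score):
    """Percent of scores in the group that fall strictly below the given score."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)

# Hypothetical group of 20 test takers with scores 1 through 20
group_scores = list(range(1, 21))

# A score of 18 is higher than 17 of the 20 scores in the group
rank = percentile_rank(group_scores, 18)
print(rank)
```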

Which of the following is required of the test taker when answering multiple choice, true-false, and matching questions? Selecting information Supplying information Applying knowledge Synthesizing material

Selecting information

Which of the following is required of the test taker when answering essay, short answer, and completion items? Supplying information Selecting information Applying knowledge Synthesizing material

Supplying information

The grid that served as a guide when constructing an achievement test is called _______. Table of Outcomes Objectives Table Table of Specifications Specifications Grid

Table of Specifications

A measure of how stable scores on a test are over time is an example of which of the following? Internal consistency Parallel forms reliability Test-retest reliability Interrater reliability

Test-retest reliability

Sarah now has her regression output. To determine whether the number of Disney videos watched within the last month (DisNum) accounts for additional unique variance in children's enjoyment (Enjoy) of a Disney World trip, over and above the child's age (Age), Sarah should look at The change in R squared in Model 2 The overall R squared value for Model 1 The overall R squared value for Model 2 The change in R squared in Model 1

The change in R squared in Model 2

Incremental validity assesses the extent to which a scale _______. measures what it is designed to measure is correlated with an expected outcome measures a construct consistently accounts for variance in an outcome, beyond that which can be explained by other measure(s)

accounts for variance in an outcome, beyond that which can be explained by other measure(s)

When a psychological test takes the form of self-reported answers to questions, it may also be referred to as a scale survey questionnaire all of the above

all of the above

Sally finds a corrected item-total correlation of .42 for an item on her scale. Sally should keep the item - it certainly measures the same construct as the other items keep the item - it shows that the scale is valid (measures what it says it's measuring) think about revising or discarding the item - it doesn't correlate that well with the rest of the construct automatically discard the item -- it's just too poor of a fit with the rest of the scale

think about revising or discarding the item - it doesn't correlate that well with the rest of the construct

The change in the value of the multiple correlation when a new variable is added to a regression model can be used to provide evidence about The validity of a new scale The reliability of a new scale The novelty or usefulness of a new scale The creativity of a new scale

The novelty or usefulness of a new scale

To determine how much variance the combination of number of Disney videos watched within the last month (DisNum) and child's age (Age) accounts for in children's enjoyment (Enjoy) of a Disney World trip, Sarah should look at The overall R squared value for Model 1 The overall R squared value for Model 2 The change in R squared in Model 1 The change in R squared in Model 2

The overall R squared value for Model 2

When dividing scores into quartiles, each quarter should have The same NUMBER of scores The same STANDARD DEVIATION The same RANGE of scores The same NORM

The same NUMBER of scores

Cronbach's alpha essentially averages all possible split half reliability coefficients to estimate reliability. This is an improvement over a single split-half reliability estimate because It corrects for items within a measure being on different scales The number of items being correlated is smaller when you split your test in half None of the above The split-half reliability estimate depends on how you split your scale (e.g. odd-even vs. 1st half - 2nd half)

The split-half reliability estimate depends on how you split your scale (e.g. odd-even vs. 1st half - 2nd half)
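
For reference, Cronbach's alpha is usually computed from item and total-score variances rather than by literally averaging split-half coefficients. The sketch below uses that standard variance-based formula. The function name and the response data are hypothetical.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])                       # number of items
    items = list(zip(*rows))               # transpose: one tuple of scores per item
    sum_item_vars = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum_item_vars / total_var)

# Hypothetical responses: 5 respondents answering 3 items
responses = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 3, 3],
    [4, 3, 4],
    [5, 4, 4],
]
alpha = cronbach_alpha(responses)

# Because every item enters the formula, the result does not depend on
# how the scale might be split in half (odd-even, first half vs. second half).
print(round(alpha, 3))
```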

The difficulty index and the discrimination index are Inversely related to one another The two main components making up item analysis Only calculable when there is a single right answer Both rated on a 0 to 1 scale

The two main components making up item analysis

Jennifer next reads that the psychological definition of introversion involves individuals who gain energy from reflection and lose energy during social interaction. So she also includes the item "spending time interacting with others tends to drain my energy" (agree-disagree). This is an example of what type of strategy for choosing scale content? Theory-based strategy Rational-content strategy Criterion-group strategy Factor analysis strategy

Theory-based strategy

What is NOT a reason that multiple choice questions are often preferred for assessing learning/achievement? They allow for creative and unique responses They are easy to score They can be used to measure learning outcomes at almost any level They lend themselves to item analysis

They allow for creative and unique responses

A pitfall with matching items is that _____ They are only practical when you can generate a large number of options They are hard to administer to a large number of people The questions are not independent They deemphasize writing ability

They are only practical when you can generate a large number of options

What benefit do True/False, Multiple Choice, Matching, and Completion items all share? They are easy to write They mainly assess basic knowledge and memorization skills You can fit many on a test which can increase content validity They are objective and easy to score

They mainly assess basic knowledge and memorization skills

When we calculate reliability, we know the observed score. What are the two unknown components of the reliability equation that we can only estimate (not directly measure)? Method and error scores Test-retest and interrater scores True and error scores Means and standard deviations

True and error scores

Sarah wants to know whether the number of Disney videos watched within the last month (DisNum) accounts for additional unique variance in children's enjoyment (Enjoy) of a Disney World trip, over and above the child's age (Age). When running a multiple regression to examine this question, Sarah should: *DV = Dependent Variable *IV = Independent Variable Use Enjoy as the DV; enter DisNum as the first IV (block1) and Age as the second IV (block 2) Use Enjoy as the DV; enter Age as the first IV (block1) and DisNum as the second IV (block 2) Use DisNum as the DV; enter Enjoy as the first IV (block 1) and Age as the second IV (block 2) Use Enjoy as the DV; enter Age and DisNum together in a single block (block 1)

Use Enjoy as the DV; enter Age as the first IV (block1) and DisNum as the second IV (block 2)
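
The block-entry logic can be sketched without statistical software. The data below are hypothetical, and R² for Model 2 is computed with the standard two-predictor formula from bivariate correlations rather than a full regression routine. Model 1 contains Age only. Model 2 adds DisNum. The change in R² at Model 2 is the unique variance DisNum explains over and above Age.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ssx * ssy)

# Hypothetical data for six children
age = [4, 5, 6, 7, 8, 9]
dis_num = [2, 5, 1, 6, 3, 7]
enjoy = [3, 6, 4, 8, 6, 9]

r_y1 = pearson_r(enjoy, age)       # Enjoy with Age
r_y2 = pearson_r(enjoy, dis_num)   # Enjoy with DisNum
r_12 = pearson_r(age, dis_num)     # Age with DisNum

# Model 1 (block 1): Age only
r2_model1 = r_y1 ** 2

# Model 2 (block 2): Age plus DisNum
r2_model2 = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)

# Change in R squared at Model 2: variance explained by DisNum beyond Age
delta_r2 = r2_model2 - r2_model1
print(round(r2_model1, 3), round(r2_model2, 3), round(delta_r2, 3))
```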

Because "intelligence" is often inferred from behavior on a test, its measurement must be Well-grounded in theory Quantitative Assessed using factor analysis Group administered

Well-grounded in theory

