Test and Measurements: Exam 1

Which of the following correlations would indicate that two tests were measuring unrelated skills? A. -.23 B. .00 C. .50 D. -1.00 E. .85

.00

History of Mental Testing:

1. Early Period. 2. Boom Period. 3. First Period of Criticism. 4. Battery Period. 5. Second Period of Criticism. 6. Age of Accountability.

Steps in Measurement Process:

1. Identifying and Defining the Attribute. 2. Determining Operations to Isolate and Display the Attribute. 3. Quantifying the Attribute.

Measures of Central Tendency:

1. Mode 2. Median 3. Percentiles 4. Arithmetic mean

General Ethical Principles

1. Professional Training and Competence. 2. Professional and Scientific Responsibility. 3. Respect for the Rights and Dignity of Others. 4. Social Responsibility

Levels of Measurement:

1. Nominal 2. Ordinal 3. Interval 4. Ratio

Cumulative Frequency Curve (Ogive):

Graphic representation of the cumulative frequency distribution.

Arithmetic mean (M):

The sum of a set of scores divided by the total number of scores.
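As a minimal sketch of this definition in Python (the scores below are made-up illustration data, not taken from the text):

```python
from statistics import mean

scores = [70, 80, 85, 90, 100]  # hypothetical test scores

# Sum of the scores divided by the total number of scores.
manual_mean = sum(scores) / len(scores)

# The standard library's mean() agrees with the manual calculation.
library_mean = mean(scores)

print(manual_mean, library_mean)
```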

Operational Definition:

An attribute defined by how it is measured.

The types of tests that are most suspect when used with minority or other special groups in society are those that: A. Appraise progress in school. B. Will be used for personal decisions. C. Describe the individual at the present time. D. Are used to predict later performance. E. Are non-verbal in content.

Are used to predict later performance.

For making predictions, a test that yields a large negative correlation with a criterion could be characterized by which of the following: A. As useful as one with the same sized positive correlation. B. Preferable to any other. C. Worse than useless. D. No better than one with a zero correlation. E. Any of the above, depending on the test's empirical validity.

As useful as one with the same sized positive correlation.

True or False: Testing professionals should NOT inform a test taker about the consequences of not taking a test, should they choose not to take the test.

False

What do we call statistics that use samples to provide information about a population? A. Inferential B. Population-based C. Non-parametric D. Theoretical E. Descriptive

Inferential.

Types of Decisions:

Instructional, Curricular, Selection, Placement or classification, Personal.

Which of the following would be considered to be a produced-response type item? A. True-False. B. Multiple choice. C. Matching. D. Short answer. E. None of these; they are all selected-response.

Short answer.

Which of the following is NOT a measure of variability? A. Variance B. Semi-Interquartile Range C. Standard Variation D. Range E. Standard Deviation

Standard Variation

Factor Analysis:

Study the pattern of intercorrelations to see which tests are measuring some common dimension or factor.

Median:

The value on the score scale that separates the top half of the group from the bottom half. -Although scores go by jumps (or discrete increments) of one unit, we consider the underlying ability to have continuous distribution. -If the median falls between two individuals with different scores, use the halfway point.
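The halfway-point rule can be sketched in Python as follows. This is a simplified illustration with made-up scores; it assumes the middle of the group falls between two different scores and does not handle interpolation within a run of tied scores.

```python
def median(scores):
    """Return the point that splits the ordered scores into two halves."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of scores: the middle score is the median.
        return ordered[mid]
    # Even number of scores: use the halfway point between the two middle scores.
    return (ordered[mid - 1] + ordered[mid]) / 2

even_group = median([12, 15, 18, 20])  # halfway between 15 and 18
odd_group = median([3, 1, 2])
print(even_group, odd_group)
```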

True or False: Because test takers have the right to be informed of their rights and responsibilities as test takers, it is normally the responsibility of the individual who administers a test (or the organization that prepared the test) to inform test takers of these rights and responsibilities.

True

What are two social benefits of testing?

Unified validity: worthiness of a test cannot be separated from its use and the inferences that result. Understanding of how people develop over time.

In preparing a histogram score intervals are shown along the: A. Tangential plane. B. Y-axis. C. Abscissa. D. Polygon function. E. Ordinate.

Abscissa.

Privacy

Degree of access others have to one's body or behavior.

According to your text, a score of 25 should be thought of as meaning: A. From 25 to just not quite 26. B. Exactly 25. C. More than 24, but not more than 25. D. From 24.5 to 25.5.

From 24.5 to 25.5.

Cumulative Frequency Distribution:

Lists each score or interval and the number of scores falling in or below the score or interval.

APA General Principles

Principle A: Beneficence and nonmaleficence. Principle B: Fidelity and responsibility. Principle C: Integrity. Principle D: Justice. Principle E: Respect for people's rights and dignity.

Grouped Frequency Distribution:

Scores grouped into broader categories to improve clarity of presentation.

Convergent validity:

The extent that different ways of measuring the same trait yield high correlations. -Traits need not have near-zero correlations to demonstrate evidence of convergent and discriminant validity

Face Validity:

What a test "looks like", or appearance of reasonableness. -Never sufficient evidence of validity for use of a test. -Important when voluntary cooperation of examinees is important

The typical procedure for establishing units in educational and psychological measurement relies on: A. A definition of a unit expressed in physical terms. B. A direct comparison of one unit amount with another. C. The intuitive judgment of the teacher or clinician. D. A rank ordering of people on the trait of interest. E. A definition that states that any one item on a test is equivalent to any other item.

A definition that states that any one item on a test is equivalent to any other item.

Ethics

A set of moral principles or rules of conduct that defines and synthesizes socially valued elements of right and wrong.

Frequency Distribution:

A table that shows how often each score has occurred

Your text says the primary function of testing is to help people: A. Assess personality. B. Learn more efficiently. C. Understand one another. D. Measure basic abilities. E. Aid decision making.

Aid decision-making.

Psychologists obtain informed consent for assessments, evaluations, or diagnostic services EXCEPT when: A. Testing is mandated by law or governmental regulations. B. Informed consent is implied because testing is conducted as a routine educational, institutional, or organizational activity (e.g., when participants voluntarily agree to assessment when applying for a job). C. One purpose of the testing is to evaluate decisional capacity (aka the ability of health care subjects to make their own health care decisions) D. All of the above

All of the above.

Which of the following would be used for construct validation? A. Predictions concerning group differences. B. Expert judgment. C. Predictions about the effects of experimental treatments. D. Correlations. E. All of the above.

All of the above.

The most serious limitation of the multiple-choice type of item is that it: A. Encourages guessing. B. Is difficult to write. C. Requires a high level of reading skill. D. Is limited to the appraisal of recall of knowledge. E. Cannot appraise originality.

Cannot appraise originality.

The College Board Scholastic Assessment Test is required for admission by both College P and College Q. College P is quite selective and has room for only about 25 percent of its applicants, while College Q admits about 75 percent. With which college is the test likely to be more useful? A. Same in both. B. College Q. C. College P. D. The selection rate is too low to be effective in either.

College P.

Restricting the items on a standardized achievement test to the course contents and learning outcomes typically found in the majority of schools is necessary to ensure: A. Criterion-related Validity. B. Construct related Validity. C. Test Reliability. D. Face Validity. E. Content-related Validity.

Content-related Validity.

The number 3.12 could not be a: A. Standard Deviation. B. Median. C. Correlation coefficient. D. Mean. E. Variance.

Correlation Coefficient.
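The reason is that a correlation coefficient always lies between -1.00 and +1.00. A rough Python sketch of the Pearson formula illustrates the two extremes; the helper name `pearson_r` and the data are illustrative assumptions, not from the text.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y scaled by their spreads."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dev_x = [a - mean_x for a in x]
    dev_y = [b - mean_y for b in y]
    sum_xy = sum(a * b for a, b in zip(dev_x, dev_y))
    sum_xx = sum(a * a for a in dev_x)
    sum_yy = sum(b * b for b in dev_y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

perfect_positive = pearson_r([1, 2, 3], [2, 4, 6])  # scores rise together
perfect_negative = pearson_r([1, 2, 3], [6, 4, 2])  # one rises as the other falls
print(perfect_positive, perfect_negative)
```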

Which of the following would NOT be likely to be used in appraising the content validity of a high school level standardized achievement test in English? A. Examination of recommendations by the Modern Language Association. B. Pooled judgment of a group of experts. C. Analysis of the content of high-school English textbooks. D. Results from a teachers' focus group. E. Correlations with college marks.

Correlations with college marks.

Cumulative Percent:

Cumulative frequency divided by the total number of cases.
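A small Python sketch, using made-up scores, shows how frequency, cumulative frequency, and cumulative percent fit together. Here cumulative percent is the cumulative proportion multiplied by 100.

```python
from collections import Counter

scores = [3, 5, 5, 7, 7, 7, 9, 9]  # hypothetical scores
freq = Counter(scores)             # how often each score occurred
total = len(scores)

table = []
running = 0
for score in sorted(freq):
    running += freq[score]                # scores at or below this score
    cum_percent = 100 * running / total   # cumulative frequency / number of cases
    table.append((score, freq[score], running, cum_percent))

for row in table:
    print(row)  # (score, frequency, cumulative frequency, cumulative percent)
```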

Confidentiality

Degree of access others have to information given voluntarily from one person to another

Empirical Validity (Statistical Validity):

Evaluation of a test as a predictor. -Give the test to an entering group, then follow up by obtaining a specified measure of success, referred to as the criterion. -Compute the correlation between test and criterion; the more effective the test is as a predictor, the higher its criterion-related validity.

True or False: The rights and responsibilities outlined in the APA publication Rights and Responsibilities of Test Takers are inalienable rights such as those listed in the US Bill of Rights, and as such are legally binding.

False

Measurement often fails in psychology or education because we are unable to define clearly the trait that we wish to measure. This would most likely be a problem in attempts to appraise: A. Scholastic Aptitude. B. Reading Comprehension. C. Good Citizenship. D. Athletic Skill. E. Mechanical Interest.

Good Citizenship.

The practice of underlining the keyword in true-false items is considered: A. Poor practice because it encourages the use of specific determiners. B. Poor practice because it limits the complexity of the statements that can be used. C. Good practice because it reduces the effects of guessing. D. Good practice because it reduces item ambiguity. E. Poor practice because it gives examinees clues to the right answer.

Good practice because it reduces item ambiguity.

All of the following were considered to be causes of increased interest in individual differences during the second half of the 19th century except: A. Replacement of patronage system in government with civil service testing. B. Growing interest in the study of neurological functioning. C. Refinements in the way the concept of abnormality was defined in the medical community. D. The large number of immigrants coming to the United States. E. Growing demand for accountability in school-related decision-making.

Growing interest in the study of neurological functioning.

A carefully constructed objective test will: A. Have items measuring each content area dispersed throughout the test. B. Be structured in such a way that the questions can be answered on the same page as the question. C. Have the items arranged with the most difficult items first. D. Have items with the same content grouped together. E. Have different types of items dispersed throughout the test.

Have items with the same content grouped together.

In the construction of objective test items, the use of complex sentence structures and sophisticated vocabulary is generally considered A. Inappropriate, because the items are harder to write well. B. Appropriate if the examinees are of above average intelligence. C. Appropriate, because this tests higher level cognitive skills. D. Inappropriate, because this makes the test a measure of reading ability.

Inappropriate, because this makes the test a measure of reading ability.

The following true-false item was written for an examination in measurement: "Multiple-choice questions are preferred over essay-type questions." Which of the following is the most important fault with this item? A. It contains irrelevant cues to the desired answer. B. It cannot be classified as absolutely true or absolutely false. C. It is double-barreled. D. It does not require the application of a student's knowledge. E. None of these; the item is acceptable as written.

It cannot be classified as absolutely true or absolutely false.

The following true-false item was written for an examination in measurement: "It is never appropriate not to use ambiguous test items." Which of the following is the most important fault with this item? A. It is double-barreled. B. It cannot be classified as absolutely true or absolutely false. C. It contains irrelevant cues to the desired answer. D. It contains a double negative. E. None of these; the item is acceptable as written.

It contains a double negative.

The person who developed the first uniform written spelling tests was: A. E.L. Thorndike B. Joseph Rice C. Francis Galton D. Hermann Ebbinghaus E. Alfred Binet

Joseph Rice

A personality test with four different scales was correlated with success in a job situation, with results as shown below. Which scale would permit the most accurate prediction of job success? A. Neurotic tendency r=-.50 B. Self-Sufficiency r=+.40 C. Ascendance r=+.35 D. Introversion r=-.20

Neurotic tendency r=-.50
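Predictive accuracy depends on the size of the correlation, not its sign. A short Python sketch using the four values from the question picks the scale with the largest absolute correlation:

```python
# Correlations with job success, as given in the question.
scales = {
    "Neurotic tendency": -0.50,
    "Self-Sufficiency": 0.40,
    "Ascendance": 0.35,
    "Introversion": -0.20,
}

# Rank by magnitude: a correlation of -.50 predicts better than one of +.40.
best_scale = max(scales, key=lambda name: abs(scales[name]))
print(best_scale, scales[best_scale])
```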

The use of trick questions is considered appropriate under which conditions? A. It is desirable to get extra spread in the test scores. B. The examinees are of above-average ability. C. The correct answer to the question is controversial. D. The material being tested is obvious and well known by the examinees. E. None of these conditions; trick questions are never appropriate.

None of these conditions; trick questions are never appropriate.

Which of the following is usually the biggest problem in establishing the predictive validity of a test? A. Obtaining a good criterion measure. B. Writing a sufficiently large sample of items. C. Devising a test with good items. D. Cost. E. Administering the test under uniform conditions.

Obtaining a good criterion measure.

Which of the following would NOT be considered a problem in the quantification of behaviors? A. Difficulties in selecting attributes. B. Concern about units of measurement. C. Opposition to the study of covert traits. D. Difficulties in identifying the best procedure to elicit traits. E. Difficulty in reaching general agreement about the meaning of a trait.

Opposition to the study of covert traits.

In general, which question is likely to have a negative discrimination index? The question that: A. Nearly everyone gets wrong. B. Overall, the most knowledgeable examinees are getting the item wrong and the least knowledgeable examinees are getting the item right. C. The best students get right. D. About half of the students get right. E. Nearly everyone gets right.

Overall, the most knowledgeable examinees are getting the item wrong and the least knowledgeable examinees are getting the item right.
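One common way to express this, assumed here because the text does not give a formula, is the upper-lower discrimination index: the proportion of the top-scoring group answering the item correctly minus the proportion of the bottom-scoring group answering it correctly. The function name and response data below are hypothetical.

```python
def discrimination_index(upper_correct, lower_correct):
    """Upper-lower index: p(correct | upper group) - p(correct | lower group)."""
    p_upper = sum(upper_correct) / len(upper_correct)
    p_lower = sum(lower_correct) / len(lower_correct)
    return p_upper - p_lower

# 1 = answered the item correctly, 0 = answered it incorrectly.
upper_group = [0, 0, 1, 0]  # most knowledgeable examinees mostly miss the item
lower_group = [1, 1, 0, 1]  # least knowledgeable examinees mostly get it right

d = discrimination_index(upper_group, lower_group)
print(d)  # negative: the item ranks examinees in the wrong direction
```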

The question we should ask before trying to define an attribute of a person is whether: A. People will agree that the attribute is important. B. People can agree about what the attribute means. C. Procedures for appraising the attribute are sufficiently reliable. D. This attribute is relevant to the decisions we must make. E. We know of any behaviors that exemplify the attribute.

People can agree about what the attribute means.

Richard, a third-grader, is having considerable difficulty with his schoolwork, particularly reading. You consult the school records and find a notation that Richard's score on a cognitive ability test was 85. On the basis of this information, the most appropriate course of action would be to: A. Recommend sending Richard back to second grade. B. Seek additional information. C. Recommend putting Richard in a slow reading group. D. Recommend to Richard's parents that they hire a tutor for him. E. Seek counseling for Richard.

Seek additional information.

To say that a test has discriminant validity is to imply that it: A. Is biased. B. Must be highly reliable. C. Shows low correlations with measures of other traits. D. Discriminates between people who have different amounts of the trait. E. Shows high correlations with other measures of the same trait.

Shows low correlations with measures of other traits.

The mean and the median will be identical for what kind of distributions: A. Leptokurtic distributions. B. Symmetrical distributions. C. All distributions. D. Bimodal distributions. E. Skewed distributions.

Symmetrical distributions.

Criterion-Related Validity:

Tests used to make a decision that implies predicting some future outcome. -Degree to which test correlates with some chosen criterion measure of job or academic success. -The higher the correlation, the better the test

In evaluating the content validity of a test, it is of primary importance to examine: A. Research that has been done with the test. B. A description of the content of the test. C. The actual items on the test. D. Validity Coefficients. E. Reliability.

The actual items on the test.

Validity:

The degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests.

The standard deviation is based on: A. The deviation of each score from the group mean. B. The number of scores above the mean. C. The difference between the highest and lowest scores. D. The deviation of each individual's performance from the lowest score. E. The range of the middle 50% of scores.

The deviation of each score from the group mean.
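A brief Python sketch with made-up scores shows how the deviations from the mean feed into the standard deviation. It divides by N (the population form); a sample estimate would divide by N - 1 instead.

```python
import math

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores

group_mean = sum(scores) / len(scores)

# Deviation of each score from the group mean.
deviations = [x - group_mean for x in scores]

# Variance: the mean of the squared deviations (dividing by N).
variance = sum(d ** 2 for d in deviations) / len(scores)

# Standard deviation: the square root of the variance.
sd = math.sqrt(variance)
print(group_mean, variance, sd)
```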

Discriminant Validity:

The extent to which measures are free of method variance and are pure measures of discrete traits. -Traits need not have near-zero correlations to demonstrate evidence of convergent and discriminant validity.

Which of the following represents an operational definition of curiosity? A. The quality of a child's exploratory behavior. B. The number of questions a child asks during a week at school. C. The length of time that a child will spend voluntarily on a puzzle. D. The range of different topics in which a child is interested. E. The child's ability to design experiments.

The number of questions a child asks during a week at school.

The use of "none of these" as an option in a multiple-choice item is only appropriate when: A. The options provide absolutely correct or incorrect answers. B. More than five correct answers can be provided. C. Guessing is apt to be a serious problem. D. The item stem presents an ambiguous problem to the examinee. E. The number of possible answer choices is limited to two or three.

The options provide absolutely correct or incorrect answers.

Percentile:

The score below which a given percentage of the group falls. -Median = 50th percentile, or the point below which 50% of individuals fall. -E.g., if you score in the 80th percentile, your score was better than 80% of all other scores.
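As a rough Python illustration of the 80th-percentile example, the sketch below counts the share of scores falling strictly below a given score. The function name and data are hypothetical, and this simple count ignores the interpolation within score intervals that textbooks often use.

```python
def percentile_rank(scores, score):
    """Percentage of scores in the group that fall below the given score."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)

group = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]  # hypothetical scores

rank_of_95 = percentile_rank(group, 95)  # 8 of the 10 scores are below 95
print(rank_of_95)
```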

Mode:

The score that occurs most frequently. -Highest point on a histogram. -Midpoint of the modal interval is the mode.
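For ungrouped scores, the mode can be found by counting occurrences. A minimal Python sketch with made-up data (it reports only one mode, even if several scores tie for most frequent):

```python
from collections import Counter

scores = [3, 5, 5, 7, 7, 7, 9]  # hypothetical scores

# most_common(1) returns the single most frequent score with its count.
mode, count = Counter(scores).most_common(1)[0]
print(mode, count)
```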

In order to establish construct validity, it is necessary to show that: A. Test scores are stable over time. B. The test as a whole has a high correlation with a criterion variable. C. The test measures what its author intended for it to measure. D. The construct has important psychological or educational meaning. E. The items on the test are heterogeneous.

The test measures what its author intended for it to measure.

True or False: Psychologists who develop tests and other assessment techniques use appropriate psychometric procedures and current scientific or professional knowledge for test design, standardization, validation, reduction or elimination of bias, and recommendations for use.

True

Validity focuses on the: A. Test items. B. Uses of the test scores. C. Domain of Observables. D. Qualifications and training of test users. E. Construct.

Uses of the test scores.

Which characteristic of a test is most important? A. Validity B. Stability C. Reliability D. Consistency E. Practicality

Validity

