Reliability Test

There are two versions of an achievement test. The publishing company gives both versions to a large group of students and obtains scores on both versions. The company then correlates the scores. What type of reliability is being assessed?

Alternate form

Two different versions of a self-report instrument have been developed and are being used to assess attitude before and after instruction. What method of assessing reliability would be important in this situation to determine whether the two versions are consistent with one another?

Alternate form

Which of the following is a source of error?

An administrator recording the wrong score for a test

When would a screening assessment be administered?

At the beginning (before instruction)

When do benchmark assessments occur?

At the end of a significant amount of time

When does summative assessment occur?

At the end of the unit or year

You are assigned to create an end-of-year test to measure whether students have learned what they were supposed to have learned for the year. How would you do that to ensure content validity?

Base your test upon the objectives for the course.

Convergent validity is most like which other type of validity?

Concurrent

What is the primary purpose of benchmark assessment?

Determine general progress toward broader educational goals

What would a teacher do to create a test which has content validity?

Determine what the objectives are for the unit and create the test so that it measures the objectives.

A researcher correlates the Bolton Depression Scale (BDS) with the Mood Inventory (MI). The purpose of doing this is to show that the BDS is not measuring the same thing as the MI. What type of validity is being assessed by doing so?

Discriminant

The following is a description of a fifth-grade norming group for a norm-referenced summative test. The norming group consists of equal numbers of male and female elementary students from different regions of the country, proportional to the number of students in each region. The norming group consisted almost exclusively of white students. The sample included students with special needs from all the schools in the sample. In what area is the norming group lacking? In the representation of:

Ethnic groups

Two teachers are grading essays written by students. How would you assess the content validity of the rubric used by the teachers?

Examine the rubric, comparing it against the objectives for the unit to see if the criteria are appropriate. If the criteria match the objectives for the unit, then the rubric would have content validity.

A researcher has recorded six adults as they converse in a group. She wants to have two assistants view the recordings and use a rubric to assess the level of participation of each of the members of the group. How would the researcher assess the interrater reliability of the assistants?

Have both assistants evaluate all members' participation independently and compare their evaluations to see if they are consistent.

Two teachers are evaluating science projects at a science fair. How would the principal assess the interrater reliability of the teachers in grading the projects?

Have both teachers grade the projects and correlate their scores (one teacher to the other) to see if they are consistent.
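
For illustration, here is a minimal Python sketch of that correlation step. The score lists are hypothetical, and `statistics.correlation` requires Python 3.10 or later:

```python
# Minimal sketch: assessing interrater reliability by correlating two
# raters' scores for the same set of projects with Pearson's r.
# The score lists below are hypothetical.
from statistics import correlation  # available in Python 3.10+

teacher_a = [88, 92, 75, 81, 95, 70, 84]  # first teacher's project scores
teacher_b = [85, 94, 78, 80, 96, 72, 83]  # second teacher, same projects

r = correlation(teacher_a, teacher_b)
print(f"Interrater reliability (Pearson r): {r:.2f}")
```

A coefficient near 1 would indicate the two teachers are grading consistently.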

What effect will using standardization (the process of training raters to use a rubric correctly) have upon the reliability of a measurement? It will:

Increase the reliability of the scoring

What form of assessing reliability would be used if a teacher wants to assess reliability for a written test but cannot administer it again?

Internal consistency

In what area could the questions need revising, since the questions might be ambiguous?

Interpretation of Historical Documents

If a teacher doesn't do a good job of teaching and the students don't learn, this will reduce the validity of the test.

It doesn't, since the test can still do its job of measuring what students have learned, even if they haven't learned.

Which method of assessing reliability requires only one test administration of one test?

KR-20

If you want to see if an achievement test is consistent with itself and you can only administer the test once, what form of reliability would you use?

KR-20 is a form of internal consistency. Internal consistency determines if a test is consistent with itself when a test is only administered once.
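
As an illustration of how KR-20 is computed from a single administration, here is a minimal Python sketch using a hypothetical set of dichotomously scored (right/wrong) item responses:

```python
# Minimal sketch of the KR-20 formula for a dichotomously scored test:
#   KR20 = (k / (k - 1)) * (1 - sum(p_i * q_i) / var_total)
# where k = number of items, p_i = proportion answering item i correctly,
# q_i = 1 - p_i, and var_total = variance of students' total scores.
from statistics import pvariance

# Hypothetical responses: rows = students, columns = items (1 = correct).
responses = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0],
    [1, 1, 1, 0, 1],
]
k = len(responses[0])
totals = [sum(student) for student in responses]
p = [sum(item) / len(responses) for item in zip(*responses)]
sum_pq = sum(p_i * (1 - p_i) for p_i in p)
kr20 = (k / (k - 1)) * (1 - sum_pq / pvariance(totals))
print(f"KR-20: {kr20:.2f}")
```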

In which area is this student's raw score the lowest?

Key Ideas and Details- Informational Text

Which of the following is a potential source of error for multiple-choice tests?

Knowing the answer and recognizing the answer both mean that the student knows the material and, with regard to recognizing the answer, can identify it on the test. Guessing the answer implies that the student doesn't know the answer but happens to choose it. That is error, since the test result is different from what the student deserves.

If a teacher does not base her test upon the objectives of the unit taught, will improving the test questions by making the wording clearer increase the content validity of the test?

No

If a teacher doesn't do a good job of teaching and the students don't learn, this will reduce the validity of the test.

No, it will not.

What impact will any personal problems which cause a student to be distracted during the test have upon the validity of the score?

Personal problems are a form of error which reduces the reliability and therefore reduces the validity. The score will not reflect what the student actually knows.

Stockbrokers try to determine how stocks will perform in the future. A person is considering choosing a stockbroker. In order to assess whether she should use the stockbroker, she looks at the stocks the stockbroker says will do well and compares those picks with the stocks' actual performance. Which of the forms of validity is being assessed?

Predictive

What is the primary purpose of summative assessment?

Providing an overall evaluation regarding what students have learned

What happens to reliability when error decreases?

Reliability and error are inversely related. As error decreases reliability will increase.
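
In classical test theory terms, an observed score is the true score plus error, and reliability is the proportion of observed-score variance that is true-score variance. A minimal sketch with hypothetical variances shows the inverse relationship:

```python
# Classical test theory relationship:
#   observed score = true score + error
#   reliability = var_true / (var_true + var_error)
# As error variance shrinks, reliability rises toward 1.
def reliability(var_true: float, var_error: float) -> float:
    return var_true / (var_true + var_error)

var_true = 100.0  # hypothetical true-score variance
for var_error in (100.0, 50.0, 10.0, 0.0):
    r = reliability(var_true, var_error)
    print(f"error variance {var_error:>5}: reliability = {r:.2f}")
```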

A teacher wants to see if students' knowledge about the subject matter she is teaching has changed by giving the same test before and after instruction. What type of reliability would be important for the test to have to do that?

Test-retest

Which of the following is a summative assessment?

The Keystone Examination

Here is information about a test from a review: Internal consistency reliability coefficients ranged from .70 to .91 for subscales, and .96 to .97 for the full scale score. Which score would you expect to be more reliable, and why?

The full scale score will be more reliable because there are more questions. The items on the subscales are not more difficult since the total score is a combination of the items in the individual subscales. The quality would not be different either for the same reason.

During a test, a student starts acting out, screaming and disrupting the class. What impact would this have upon the reliability of the test?

The reliability would decrease.

The following are scores on an English test. Which of the scores would have the highest reliability coefficient?

The total score will always be more reliable since it is longer (has more questions). A longer test will be more reliable.
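
The standard way to quantify this effect is the Spearman-Brown prophecy formula, which predicts the reliability of a test lengthened by a factor k from the reliability r of the original test. A short sketch with a hypothetical starting reliability:

```python
# Spearman-Brown prophecy formula: predicted reliability when a test
# is lengthened by a factor of k:
#   r_new = (k * r) / (1 + (k - 1) * r)
def spearman_brown(r: float, k: float) -> float:
    return (k * r) / (1 + (k - 1) * r)

r_subtest = 0.70  # hypothetical reliability of a single subtest
print(f"doubled length: {spearman_brown(r_subtest, 2):.2f}")  # ~0.82
print(f"tripled length: {spearman_brown(r_subtest, 3):.2f}")  # ~0.88
```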

The purpose of the SAT is to predict how well students will do in college. How would you assess the predictive validity of the SAT? Compare the SAT scores against:

Their college GPA.

Which of the following is a potential source of error when the teacher is scoring a test?

There are distractions occurring while the teacher is scoring the test

Content validity would be important for a benchmark assessment.

True

Correlation is used to compare scores when calculating a reliability coefficient.

True

If a test has little error, you would expect the student's raw score to be close to the true score.

True

If done well, standardization should increase intrarater reliability.

True

Which of the following is a source of error?

When an administrator records a wrong score, that means the score does not reflect what the student knows. That is the definition of error. If a teacher doesn't teach the material, the test results will reflect that. And if the student didn't study for the test, that will be reflected on the test as well.

Which of the following would have the lowest reliability coefficient?

Writing Composition

Would you be comfortable using the Historical Reasoning score to make everyday decisions?

Yes

What form of reliability would be important to assess if you are going to use two different versions of the same measuring instrument before and after students are taught to see if their knowledge has increased?

Alternate form

A student cheats by looking at another student's test and obtains a few answers to questions which the student did not know the answer to. What impact would this have upon the reliability of the test?

The reliability would decrease.

Which method of assessing reliability correlates the scores from a repeated administration of the same test?

This is test-retest reliability since it involves administering the same test twice (with a period of time in between).

When discriminating between two constructs, you expect to find how strong a correlation between two measures of these constructs?

Low to moderate

All things being equal, including the number of questions, which of the following will be less reliable?

True-false test

Which of the following is a source of error?

A student struggles with understanding the wording of a question because she has a reading learning disability. As a result of this struggle, the student gets the question wrong.

Which of the following is a potential source of error when the teacher is scoring a test?

A teacher makes a mistake when calculating a student's score

Which of the following is a potential source of error?

A test question is ambiguously worded

A new test for cholesterol was developed by a company. To determine if the test is accurately measuring cholesterol levels, the company conducts a study. It administers the test to a group of people, has a doctor test the same people, and compares the results to see if the new test is accurately measuring cholesterol. Which of the forms of validity is being assessed?

Concurrent

A teacher has developed a measure of aptitude toward science. If the teacher wants to establish immediately if the test has validity, what method would the teacher use? Note: The teacher is not interested in knowing what the student knows about science, just whether the student has aptitude.

Concurrent

Reliability, in assessment, has primarily to do with:

Consistency

A test is supposed to be measuring math anxiety. Which of the following methods of assessing validity would be important for this test?

Construct

A school district has created a set of examinations which students have to take to determine if the students have learned the material for each school year. The district hires a consulting firm to come and examine the test to make sure it is measuring the district's learning outcomes. What form of validity is being assessed?

Content

A school district has developed an end-of-year test to determine if students have learned what they were taught. What type of validity would be most important for this test to have?

Content

A teacher wants to determine if the standardized test that a school district is using really measures the skills required by state standards. What type of validity study should the teacher conduct?

Content

What type of validity would be the most important for homework?

Content

Which type of validity would be important for a summative achievement test?

Content

A researcher correlates the Bolton Depression Scale (BDS) with the Hamilton Depression Rating Scale (HDRS). The purpose of doing this is to show that the BDS is measuring the same thing as the HDRS. What type of validity is being assessed by doing so?

Convergent

An IQ test is supposed to have two dimensions, quantitative and verbal. A researcher administers the IQ test and uses factor analysis to test whether there is one dimension or two. The correlation among the items is assessing what type of validity?

Convergent

On the printout below, the black line represents an earlier administration of the DRC and the white line represents a later administration. For which of the tests/scores below would you conclude that the student's true score has improved?

Craft/Structure & Integration of Knowledge - Informational Text

After you defined your construct, what is the second step in assessing construct validity?

Creating a measurement instrument

To determine if stress and anxiety are two different constructs, which method should be used?

Discriminant validity

Which of the following is a potential source of error when the teacher is scoring a test?

Distractions, by their nature, have the potential to take the grader's attention away from the grading process, so that the grader may give the wrong grade.

A student not having studied is a source of error.

False

Reading through the rubric prior to grading is unnecessary if you have created the rubric.

False

Two instruments measuring two supposedly different constructs have a correlation coefficient of .72. You would conclude that the two constructs are actually just one construct.

False

Two teachers are grading the same science projects at a science fair. Afterwards, the scores were correlated. The correlation was .53. This would be an acceptable reliability coefficient.

False

Which of the following is a source of error?

Getting the question wrong because of a reading learning disability would be an error since the score would not reflect what the student knows.

In what area could the questions need revising, since the questions might be ambiguous?

Interpretation of Historical Documents has a reliability coefficient which falls below .70, the acceptable standard. That could mean that the questions need revising, since low reliability means the test questions could be impacted by error, resulting in inconsistency.

If you want to see if two people are grading consistently with one another, what method of assessing reliability would you use?

Interrater

Two persons are observing students on the playground to evaluate each student's level of aggression. What type of reliability would be important in this situation?

Interrater

Two teachers are evaluating science projects at a science fair. How would the principal assess the interrater reliability of the teachers in grading the projects?

Interrater means "between raters." So, interrater reliability is determined by comparing the ratings from two teachers to see if they are consistent. The principal would then correlate the ratings.

When grading essays, a teacher wants to be sure that his grading is reliable. So, after grading the essays he waits three weeks and grades the same essays again. He then correlates the scores. Which form of reliability is being described?

Intrarater

If you want to see whether a grader is consistent with him- or herself, what form of reliability would be important?

Intrarater reliability ("within rater") would be important to determine if a grader is consistent with him/herself since it focuses upon the consistency of the person grading over time.

A reliability coefficient is calculated and found to be .83. For which purpose(s) is the reliability coefficient acceptable?

It is acceptable for everyday purposes, but not for critical purposes

A reliability coefficient is calculated and found to be .51. For which purpose(s) is the reliability coefficient acceptable?

It is not acceptable for any purposes

A teacher administers a test to his students. However, during the administration, a student throws up. Assuming that this significantly distracts the students, what will happen to the validity of the test?

It will decrease.

A reliability coefficient is calculated and found to be .48. The teacher identifies some ambiguous questions on the test and makes the wording clearer. What will happen to the reliability coefficient?

It will now be higher

A job counselor wants to know if an interest inventory is valid for determining how interested high school students will be in a job once they are working. What type of validity would be most important for this type of instrument?

Predictive

A researcher creates a test which is supposed to determine who will be more successful in their profession. The researcher gives the test to college students from the same major. She then correlates the test with a measure of professional success, given after ten years in the profession. Which form of validity is being assessed?

Predictive

A researcher has created an instrument to assess four-year-olds' aptitude for kindergarten. Which of the following types of validity would be important for such an instrument?

Predictive

A teacher has created another version of an already existing test. The new version of the test is administered to a large group of students in addition to the first version. The scores are compared to see if both forms give similar scores. What is being assessed?

Reliability

During a test, a student starts acting out, screaming and disrupting the class. What impact would this have upon the reliability of the test?

Reliability and error are inversely related. As error increases reliability decreases. When error is introduced by the student acting out, the reliability of the test decreases.

What happens to reliability when error increases?

Reliability decreases.

What happens to reliability when error decreases?

Reliability increases.

If you want to see if a written test is reliable over time, what form of reliability would be important for the test to have?

Test-retest

Which method of assessing reliability correlates the scores from a repeated administration of the same test?

Test-retest

A professor has created a written, self-report test to measure anxiety toward the subject matter. How might one assess the test-retest reliability of the instrument?

Test-retest reliability looks at consistency across time. It involves administering the instrument, waiting a period of time, and administering it again. Then the scores are compared.

A reliability coefficient is calculated and found to be .96. For which purpose(s) is the reliability coefficient acceptable?

Tests with reliability coefficients of .90 or greater can be used for making decisions for all purposes, critical or everyday. Tests with reliability coefficients of .70 or greater can be used for everyday decisions, but not critical decisions. Tests with reliability coefficients less than .70 should not be used for making any decisions, since they fall below acceptable standards.
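
Those cutoffs can be summarized as a simple decision rule; here is a minimal sketch encoding the standards used throughout this set:

```python
# Decision rule from this set: .90 or above for critical decisions,
# .70 to .89 for everyday decisions, below .70 for neither.
def acceptable_uses(r: float) -> str:
    if r >= 0.90:
        return "critical and everyday decisions"
    if r >= 0.70:
        return "everyday decisions only"
    return "not acceptable for any decisions"

for r in (0.96, 0.83, 0.51):
    print(f"r = {r}: {acceptable_uses(r)}")
```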

If two test scores are highly negatively correlated, -.96, that indicates that the two test scores are measuring the same thing.

True

In the printout below, the black line represents an earlier administration while the white lines represent a later administration of the DRC. For all of the tests/scores in the printout below, you would conclude that the true score has not changed.

True

Maintaining control over the testing environment is important if you want your test to be reliable.

True

The purpose of giving a test is to measure the students' true scores.

True

Theoretically, if a test has no error, you would expect the student's raw score to be the same as the true score.

True

Validity has to do with the true score while reliability has to do with error.

True

Choose the best answer: A reliable assessment is one which

Validity asks the question, does the test do what it is supposed to do? This is another way of saying, does the test fulfill its purpose? Reliability, on the other hand, has to do with whether the test has consistency. "Replicable" means whether the results can be replicated, or repeated. If something can be replicated, it is consistent. The response "measures the content taught" has to do with content validity, not reliability.

For a science fair, two judges judge the projects. After the science fair, the principal looks at the scores given by both judges for all projects and compares the scores to see if the two judges gave similar scores. Which form of reliability is being described?

When human beings are doing the judging, there are two possibilities: inter- and intrarater reliability. The prefix, inter, means "between." The prefix, intra, means "within." So, if there are two people doing the judging, it would be interrater reliability since you are assessing the reliability between two judges.

Which method of assessing reliability uses two people to evaluate something?

When human beings are doing the judging, there are two possibilities: inter- and intrarater reliability. The prefix, inter, means "between." The prefix, intra, means "within." So, if there are two people doing the judging, it would be interrater reliability since you are assessing the reliability between two judges.

Content validity would be important for a diagnostic test.

True

The more error a test has, the more the raw scores will vary around the true score.

True
