Evaluation and Research Test 2
The maximum value of item discrimination in norm-referenced test items occurs when the item difficulty is
.50
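One way to see why .50 is the key value: a rough sketch, assuming the discrimination index is computed as the proportion correct in the upper half of scorers minus the proportion correct in the lower half (the function name and the half-split are illustrative assumptions, not taken from the test). Under that assumption, the largest index an item can possibly reach depends on its difficulty.

```python
def max_discrimination(p: float) -> float:
    """Largest possible upper-minus-lower discrimination index for an item
    answered correctly by proportion p of all examinees, assuming equal-sized
    upper and lower halves (an illustrative assumption)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    # Best case: every correct answer comes from the upper half first.
    return 2 * min(p, 1 - p)

# Scan a grid of difficulty values and find where the ceiling is highest.
grid = [i / 10 for i in range(1, 10)]
best = max(grid, key=max_discrimination)
print(best, max_discrimination(best))
```

Only at a difficulty of .50 can every upper-half student be right and every lower-half student be wrong, so the ceiling reaches 1.0 there and falls off on either side.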
Which hypothesis is stated in the null form?
There is no difference between the vocabulary scores of average and high-ability students
Which of the following is a technique for providing information about the reliability of an instrument?
Coefficient alpha
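As a minimal sketch of how coefficient alpha can be computed, the snippet below uses the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name and the sample scores are hypothetical and only for illustration.

```python
from statistics import variance

def coefficient_alpha(scores):
    """scores: list of rows, one row per respondent, one column per item."""
    k = len(scores[0])                                   # number of items
    item_variances = [variance(col) for col in zip(*scores)]
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical data: 4 respondents answering 3 items.
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
noisy = [[2, 3, 3], [4, 4, 5], [1, 2, 2], [5, 4, 4]]
print(coefficient_alpha(consistent))
print(coefficient_alpha(noisy))
```

Items that rank respondents identically give an alpha of 1; the less the items agree, the lower the value.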
If a thermometer measured the temperature in an oven as 400 degrees 5 days in a row when the temperature was actually 337 degrees, this measuring instrument would be considered
Reliable but not valid
A major difference in the purpose of a meta-analysis and a review of literature for a research paper is that the meta-analysis uses the literature
For empirical and theoretical conclusions
A threat to the validity of rating scales is that the rater may tend to let her previous knowledge of the student in another capacity (e.g., another course) influence the rating. The threat is called the
Halo Effect
Scaled responses to questionnaire items such as strongly agree, agree, no opinion, and so on are referred to as ______ scale items
Likert-type
A researcher tests the knowledge that 10,000 school children from the southwestern United States have about nutrition, and the scores are transformed into percentiles by grade level. This type of research is a ____ survey study
Normative
If a researcher finds a small difference in average test scores between a large sample (over 700) of experimental participants and a large sample (same size) of control participants, it is very likely that the difference is
Statistically significant but does not have a high degree of meaningfulness
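The point can be illustrated with a rough sketch that assumes a two-group z test with equal group sizes and a common standard deviation. The means, standard deviation, and function name are hypothetical numbers chosen for illustration, not values from the test.

```python
from math import sqrt
from statistics import NormalDist

def two_group_z_test(mean_exp, mean_ctl, sd, n_per_group):
    """Return (z, two-tailed p, effect size) for two equal-sized groups
    that share a common standard deviation (a simplifying assumption)."""
    se = sd * sqrt(2 / n_per_group)            # standard error of the difference
    z = (mean_exp - mean_ctl) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    es = (mean_exp - mean_ctl) / sd            # difference in SD units
    return z, p, es

# Hypothetical: a 1.5-point difference, SD of 10, 700 participants per group.
z, p, es = two_group_z_test(51.5, 50.0, 10.0, 700)
print(f"z = {z:.2f}, p = {p:.4f}, ES = {es:.2f}")
```

With 700 per group the small difference clears the .05 level, yet the effect size stays well under the .2 often treated as small, which is the gap between significance and meaningfulness.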
The fact that a meta-analysis quantifies findings from many studies is a frequently mentioned advantage. Another strong advantage of the meta-analysis over the normal review paper is
The description of the coding of study characteristics and the criteria for excluding studies
The use of intraclass correlation to estimate reliability in motor performance instead of Pearson r has the advantage(s) of
(a and b) allowing analysis of trial-to-trial change and estimating reliability for more than two trials
A null hypothesis is
A statistical hypothesis that assumes that there is no difference among the effects of treatments
Validity is determined by finding the correlation between scores on
A test and some independent criterion
The author of a test of anxiety bases the test on the premise, drawn from the literature, that the performance of people with high anxiety suffers when they are under stress. In an experiment, the author finds that people who scored as highly anxious on the test performed more poorly on a stressful task. The test author maintained that this finding was evidence of what kind of validity?
Construct
When a tester divides the number of students who answered a test item correctly by the number of students who attempted the item, the result is called
Index of difficulty
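The calculation described above can be sketched in a few lines. The function name and the counts are hypothetical.

```python
def index_of_difficulty(num_correct: int, num_attempted: int) -> float:
    """Proportion of students attempting the item who answered it correctly."""
    if num_attempted <= 0:
        raise ValueError("at least one student must attempt the item")
    return num_correct / num_attempted

# Hypothetical item: 18 of the 24 students who attempted it were correct.
print(index_of_difficulty(18, 24))  # 0.75
```

Note that a higher index means an easier item, since it is the proportion answering correctly.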
When a national poll reports that its survey has a margin of error of 3%, it is referring to error due to
Sampling
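As a rough sketch of where a figure like 3% comes from, the snippet below assumes a simple random sample and the usual normal approximation for a proportion, margin = z * sqrt(p(1 - p) / n). The function name, the 95% confidence level, and the sample size are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(n: int, p: float = 0.5, confidence: float = 0.95) -> float:
    """Approximate sampling margin of error for a proportion,
    assuming simple random sampling (an assumption for this sketch)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p * (1 - p) / n)

# Hypothetical poll of 1,067 respondents, worst case p = .5.
print(f"{margin_of_error(1067):.3f}")
```

The margin depends only on the sample size and the proportion, which is why it describes sampling error and not errors in measurement or question wording.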
The type of scale that uses items such as "Worthless-Valuable" is called a
Semantic differential scale
In a meta-analysis, the ES for the experimental versus control groups was .68, which represents a percentile equivalent of 75. This means that
The average score of the experimental group was better than 75% of the control group's scores
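The link between an ES of .68 and the 75th percentile can be checked with a short sketch, assuming the control group's scores are normally distributed (the function name is hypothetical).

```python
from statistics import NormalDist

def es_to_percentile(effect_size: float) -> float:
    """Percentile of the control distribution at which the average
    experimental score falls, assuming normally distributed scores."""
    return 100 * NormalDist().cdf(effect_size)

print(round(es_to_percentile(0.68)))  # 75
```

An ES of 0 would put the experimental average at the 50th percentile, meaning no advantage over the control group.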
For scores from a test to have good content validity, the following statement(s) must be true:
(b and c) the test adequately samples what was covered in the course and the percentage of points for each topic area reflects the amount of emphasis given that topic
In calculating ES for a treatment effect between a pretest and a posttest, the text recommends the use of which standard deviation?
The pretest standard deviation
The most frequent and severe criticism of meta-analysis is that
The studies represent wide differences in methodologies, designs, and measurements
If the researcher fails to reject the null hypothesis when there is really a difference, this is an example of a
Type II error
The first step in conducting a questionnaire survey is to
List specific objectives to be achieved
Which of the following are most difficult to analyze in a questionnaire study?
Open-ended items
Reliability of a measure may be established by
(a and b) giving the test to the same people on two different occasions and correlating the two sets of scores, and correlating scores from alternative forms of the same test
To help ensure an acceptable return rate in a questionnaire study, the researcher should
(b and c) Use follow-up letters or e-mails and enclose or attach another copy of the survey
A predictive validity technique involving the construction of a two-way grid of percentages of students at different levels of an aptitude test who achieved different grades is called
An expectancy table
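A minimal sketch of building such a grid is shown below. The function name, the aptitude levels, and the student records are all hypothetical and only illustrate the layout: one row per aptitude level, with the percentage of students at that level who earned each grade.

```python
from collections import Counter

def expectancy_table(records):
    """records: list of (aptitude_level, grade) pairs.
    Returns {aptitude_level: {grade: percent of that level earning the grade}}."""
    pair_counts = Counter(records)
    level_totals = Counter(level for level, _ in records)
    table = {}
    for (level, grade), n in pair_counts.items():
        table.setdefault(level, {})[grade] = 100 * n / level_totals[level]
    return table

# Hypothetical records for 20 students.
records = ([("high", "A")] * 6 + [("high", "B")] * 4
           + [("low", "B")] * 3 + [("low", "C")] * 7)
table = expectancy_table(records)
for level, row in table.items():
    print(level, row)
```

Each row sums to 100%, so the table reads as the chance of each grade given a student's aptitude level.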
The accuracy with which a 12-minute run estimates maximal oxygen consumption in a group of male high school seniors represents
Concurrent validity
Comparing test items with the course objectives (course topics) checks which type of validity?
Content
A physical education teacher develops a skill test in volleyball. After administering the test to 50 students, she asks the volleyball coach to rate the students on volleyball skills. She then correlates the students' test scores with the coach's rating. This is an example of what type of validity?
Criterion validity
The inclusion criteria in a meta-analysis are very much like
Delimitations
A survey technique that asks a jury of experts to respond to a series of questionnaires in order to reach a consensus about important issues is the
Delphi survey method
The practice of selecting a small random sample of a survey's nonrespondents is primarily intended to
Determine whether the nonrespondents are a different population from the respondents
Choose the correct sequence for developing a questionnaire
Determining objectives, delimiting the sample, writing items, pilot testing
Consider the following questionnaire item (which uses a yes/no format): "Do you use objective test items or essay items?" What rule regarding item construction does it violate?
Do not use items that have two or more separate ideas in the same item
A measure of meaningfulness that expresses the difference between the experimental and control group in standard deviation units is the
Effect size
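The definition above can be sketched as a mean difference divided by a standard deviation. Which standard deviation to use varies by source; this sketch assumes the control group's, and the function name and scores are hypothetical.

```python
from statistics import mean, stdev

def effect_size(experimental, control):
    """Mean difference in standard deviation units.
    Uses the control group's SD, which is an assumption of this sketch."""
    return (mean(experimental) - mean(control)) / stdev(control)

# Hypothetical scores for two small groups.
experimental = [14, 16, 18, 20, 22]
control = [10, 12, 14, 16, 18]
print(f"{effect_size(experimental, control):.2f}")
```

Because the result is in standard deviation units, it can be compared across studies that used different measurement scales.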
A researcher decides to use an alpha of .01 and a power of .80. To determine the needed sample size the researcher must ascertain the expected
Effect size
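Why the expected effect size is the missing piece can be seen in a rough sketch that assumes a two-tailed comparison of two equal-sized groups under the normal approximation, n per group = 2 * ((z_alpha/2 + z_power) / ES)^2. The function name and the example effect sizes are illustrative assumptions.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size: float, alpha: float = 0.01, power: float = 0.80) -> int:
    """Approximate participants needed per group for a two-tailed,
    two-group comparison (normal approximation, an assumption of this sketch)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

# With alpha and power fixed, the answer still depends on the expected ES.
for es in (0.2, 0.5, 0.8):
    print(es, n_per_group(es))
```

With alpha and power already chosen, the required sample size changes sharply with the expected effect size: the smaller the effect, the more participants are needed.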
In a meta-analysis, the results of various studies on a topic are quantified using a standard measure (ES) that is called
Effect size
When a researcher states that a result is significant, this means that
The result is unlikely to be a chance occurrence
When a researcher claims that there is a difference between treatments (i.e., rejects the null) when there really is no difference, what type of error is this?
Type I error
When an experimenter states that the level of significance is the .05 level, he is setting the probability of committing which type of error?
Type I error