Statistics Exam 1 Definitions (Ch 1, 2, 3)
The cumulative relative frequency for the last class must always be 1. Why?
All the observations are less than or equal to the upper class limit of the last class.
Which allows the researcher to claim causation between an explanatory variable and a response variable?
A designed experiment allows the researcher to claim causation between an explanatory variable and a response variable.
Define placebo. Choose the correct answer below.
An innocuous medication, such as a sugar tablet, that looks, tastes, and smells like the experimental medication
Define treatment. Choose the correct answer below.
Any combination of the values of the factors (explanatory variables)
Why shouldn't classes overlap when summarizing continuous data in a frequency or relative frequency distribution?
Classes shouldn't overlap so there is no confusion as to which class an observation belongs.
What is a cross-sectional study? Choose the correct answer below.
Cross-sectional studies are observational studies that collect information about individuals at a specific point in time or over a very short period of time.
What does it mean if a statistic is resistant?
Extreme values (very large or small) relative to the data do not affect its value substantially.
Identify the given statement as either true or false. The standard deviation can be negative.
False
Explain the difference between a single-blind and a double-blind experiment.
In a single-blind experiment, the subject does not know which treatment is received. In a double-blind experiment, neither the subject nor the researcher in contact with the subject knows which treatment is received.
Why is it rare for frames to be completely accurate?
It is rare for frames to be accurate because frames are obtained periodically, whereas populations are constantly changing.
Which is the superior observational study? Why? Choose the correct answer below.
Neither study is always superior to the other. Both have advantages and disadvantages that depend on the situation.
Distinguish between nonsampling error and sampling error.
Nonsampling error is the error that results from undercoverage, nonresponse bias, response bias, or data-entry errors. Sampling error is the error that results because a sample is being used to estimate information about a population.
What does it mean when sampling is done without replacement?
Once an individual is selected, the individual cannot be selected again.
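A minimal Python sketch of the distinction, assuming a hypothetical six-person frame and a fixed seed (neither comes from the course material). The standard library's `random.sample` draws without replacement, while `random.choices` draws with replacement.

```python
import random

population = ["Ana", "Ben", "Cruz", "Dee", "Eli", "Fay"]  # hypothetical frame
rng = random.Random(1)  # fixed seed so the run is repeatable

# Without replacement: once selected, an individual cannot be selected again.
without_replacement = rng.sample(population, k=4)

# With replacement: the same individual may appear more than once.
with_replacement = rng.choices(population, k=4)

print(without_replacement)
print(with_replacement)
```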
What is replication in an experiment?
Replication is applying each treatment to more than one experimental unit.
Define statistics.
Statistics is the science of collecting, organizing, summarizing, and analyzing information to draw a conclusion and answer questions. In addition, statistics is about providing a measure of confidence in any conclusions.
Which sampling method does not require a frame?
Systematic
Define confounding. Choose the correct answer below.
The effects of two factors (explanatory variables) on the response variable cannot be distinguished.
Explain the circumstances for which the interquartile range is the preferred measure of dispersion. What is an advantage that the standard deviation has over the interquartile range?
The interquartile range is preferred when the data are skewed or have outliers. An advantage of the standard deviation is that it uses all the observations in its computation.
A histogram of a set of data indicates that the distribution of the data is skewed right. Which measure of central tendency will likely be larger, the mean or the median? Why?
The mean will likely be larger because the extreme values in the right tail tend to pull the mean in the direction of the tail.
Define response variable. Choose the correct answer below.
The quantitative or qualitative variable for which the experimenter wishes to determine how its value is affected by the explanatory variable
What makes the range less desirable than the standard deviation as a measure of dispersion?
The range does not use all the observations.
In a relative frequency distribution, what should the relative frequencies add up to?
The relative frequencies add up to 1.
What are the advantages of having a presurvey with open questions to assist in constructing a questionnaire that has closed questions?
The researcher can learn common answers.
Is the following statement true or false? When plotting an ogive, the plotted points have x-coordinates that are equal to the upper limits of each class.
True
Determine whether the following statement is true or false. Explain. Inferences based on voluntary response samples are generally not reliable.
True, because it is often the case that the individuals who volunteer do not accurately represent the population.
_________ are the characteristics of the individuals of the population being studied
Variables
The _____________ of a variable is computed by adding all the values of the variable in the data set and dividing by the number of observations.
arithmetic mean
The _________________ is the difference between consecutive lower class limits.
class width
__________ are the categories by which data are grouped.
classes
A(n) ________ is obtained by dividing the population into groups and selecting all individuals from within a random sample of the groups.
cluster sample
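As a rough illustration of cluster sampling, the sketch below assumes a hypothetical population of households grouped into four city blocks. It randomly selects two blocks and then keeps every household in the selected blocks.

```python
import random

# Hypothetical population: households grouped into city blocks (the clusters).
clusters = {
    "block A": ["A1", "A2", "A3"],
    "block B": ["B1", "B2"],
    "block C": ["C1", "C2", "C3", "C4"],
    "block D": ["D1", "D2", "D3"],
}

rng = random.Random(7)  # fixed seed so the run is repeatable

# Step 1: take a random sample of the clusters themselves.
chosen_blocks = rng.sample(sorted(clusters), k=2)

# Step 2: include ALL individuals from each chosen cluster.
cluster_sample = [h for block in chosen_blocks for h in clusters[block]]

print(chosen_blocks, cluster_sample)
```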
A ________________ is one in which each experimental unit is randomly assigned to a treatment.
completely randomized design
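One way to picture a completely randomized design is to shuffle the experimental units and deal them out to the treatments. The units, treatment names, and seed below are hypothetical, and equal group sizes are a choice made for the sketch rather than a requirement of the design.

```python
import random

units = [f"plant {i}" for i in range(1, 13)]  # hypothetical experimental units
treatments = ["fertilizer A", "fertilizer B", "no fertilizer"]

rng = random.Random(3)  # fixed seed so the run is repeatable
shuffled = units[:]
rng.shuffle(shuffled)  # chance alone decides the order

# Deal the shuffled units out in turn so each treatment gets an equal share.
assignment = {t: shuffled[i::len(treatments)] for i, t in enumerate(treatments)}

for treatment, group in assignment.items():
    print(treatment, group)
```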
A _______________ distribution displays the aggregate frequency of the category. In other words, it displays the total number of observations less than or equal to the upper class limit of the class.
cumulative frequency
A _______________ distribution displays the proportion (or percentage) of observations less than or equal to the upper class limit of the class.
cumulative relative frequency
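The frequency, relative frequency, cumulative frequency, and cumulative relative frequency columns can be built from one another. The sketch below uses made-up classes and counts. It also shows why the relative frequencies sum to 1 and why the last cumulative relative frequency is 1.

```python
from itertools import accumulate

# Hypothetical classes (lower class limit, upper class limit) and frequencies.
classes = [(0, 9), (10, 19), (20, 29), (30, 39)]
frequencies = [4, 7, 6, 3]

n = sum(frequencies)                                # total observations
relative = [f / n for f in frequencies]             # proportion in each class
cumulative = list(accumulate(frequencies))          # running total of counts
cumulative_relative = [c / n for c in cumulative]   # running proportion

# Class width: difference between consecutive lower class limits.
class_width = classes[1][0] - classes[0][0]

for row in zip(classes, frequencies, relative, cumulative, cumulative_relative):
    print(row)
```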
For a distribution that is skewed left, the left whisker is ___________ the right whisker.
longer than
The _______________ is the smallest value within the class and the _______________ is the largest value within the class.
lower class limit; upper class limit
A ________________ is an experimental design in which the experimental units are paired up. The pairs are selected so that they are related in some way (that is, the same person before and after a treatment, twins, husband and wife, same geographical location, and so on). There are only two levels of treatment in a matched-pairs design.
matched-pairs design
For a distribution that is skewed left, which of the following is true?
mean<median
A(n) _________ is a numerical summary of a sample.
statistic
The sum of the deviations about the mean always equals
zero
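A quick numerical check on a small made-up data set: each deviation is the value minus the mean, and the positive and negative deviations cancel.

```python
data = [3, 7, 8, 12, 15]  # hypothetical observations

mean = sum(data) / len(data)           # 45 / 5
deviations = [x - mean for x in data]  # distance of each value from the mean
total = sum(deviations)

print(deviations, total)
```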
Define experimental unit. Choose the correct answer below.
A person, object, or some other well-defined item upon which a treatment is applied
What is a case-control study? Choose the correct answer below.
Case-control studies are observational studies that are retrospective, meaning that they require individuals to look back in time or require the researcher to look at existing records.
Discuss the advantages and disadvantages of each type of question.
Closed questions are easier to analyze, but limit the responses. Open questions allow respondents to state exactly how they feel, but are harder to analyze due to the variety of answers and possible misinterpretation of answers.
What is meant by confounding? Choose the correct answer below.
Confounding in a study occurs when the effects of two or more explanatory variables are not separated. Therefore, any relation that may exist between an explanatory variable and the response variable may be due to some other variable or variables not accounted for in the study.
Determine whether the following statements are true or false. (a) When a factor is controlled by setting it to three levels, the particular factor is of no interest to the researcher. (b) Randomization is used so that those factors not controlled in the experiment "average out" their effect on the response variable.
(a) False, because a factor that is controlled and set at various levels is a factor of interest to the researcher. (b) True.
The _____________, IQR, is the range of the middle 50% of the observations in a data set. That is, the IQR is the difference between the first and third quartiles.
interquartile range
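A sketch of the IQR calculation on hypothetical data. It assumes the convention that Q1 and Q3 are the medians of the lower and upper halves of the sorted data, leaving out the middle value when n is odd. Textbooks and software use several quartile conventions, so other tools may report slightly different values.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves convention.

    Assumes at least two observations.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]    # bottom half of the data
    upper = ordered[-half:]   # top half (skips the middle value when n is odd)
    return median(lower), median(ordered), median(upper)

data = [2, 4, 5, 7, 8, 10, 12, 15]  # hypothetical observations
q1, m, q3 = quartiles(data)
iqr = q3 - q1

print(q1, m, q3, iqr)
```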
For a distribution that is skewed right, the median is ____________ of the box.
left of center
The standard deviation is used in conjunction with the ______ to numerically describe distributions that are bell shaped. The ______ measures the center of the distribution, while the standard deviation measures the ______ of the distribution.
mean; mean; spread
For a distribution that is symmetric, which of the following is true?
mean=median
For a distribution that is skewed right, which of the following is true?
mean>median
The ____________ of a variable is the value that lies in the middle of the data when arranged in ascending order. We use M to represent the median.
median
The ____________ of a variable is the observation of the variable that occurs most frequently in the data set.
mode
A frequency distribution lists the _________ of occurrences of each category of data, while a relative frequency distribution lists the ________ of occurrences of each category of data.
number; proportion
What are some solutions to nonresponse?
Offer rewards and incentives; attempt callbacks.
A(n) _________ is a numerical summary of a population.
parameter
A _______________ is a circle divided into sectors. Each sector represents a category of data. The area of each sector is proportional to the frequency of the category.
pie chart
The _____________, μ (pronounced "mew"), is a parameter that is computed using data from all the individuals in a population.
population arithmetic mean
The ______________ of a variable is the square root of the sum of squared deviations about the population mean divided by the number of observations in the population, N.
population standard deviation
The __________, R, of a variable is the difference between the largest and smallest data value.
range
A numerical summary of data is said to be _________ if values that are extreme (very large or small) relative to the data do not affect its value substantially.
resistant
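To see resistance in action, the sketch below adds one extreme value to a small made-up data set and compares how far the mean and the median move. The median barely changes, while the mean is pulled toward the extreme value.

```python
from statistics import mean, median

data = [10, 12, 13, 15, 16]  # hypothetical observations
with_outlier = data + [90]   # add one value far larger than the rest

mean_shift = mean(with_outlier) - mean(data)        # the mean is not resistant
median_shift = median(with_outlier) - median(data)  # the median is resistant

print(mean_shift, median_shift)
```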
In a boxplot, if the median is to the left of the center of the box and the right whisker is substantially longer than the left whisker, the distribution is skewed
right
For a distribution that is symmetric, the left whisker is the ____________ as the right whisker.
same length
The ____________, x̄ (pronounced "x-bar"), is a statistic that is computed using data from individuals in a sample.
sample arithmetic mean
The _______________, s, of a variable is the square root of the sum of squared deviations about the sample mean divided by n−1, where n is the sample size.
sample standard deviation
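The population and sample formulas differ only in the divisor (N versus n−1). The sketch below treats the same hypothetical data once as a whole population and once as a sample, using Python's `statistics` module. The variance functions are included because each variance is the square of the corresponding standard deviation.

```python
from statistics import pstdev, pvariance, stdev, variance

# Hypothetical data: the mean is 5 and the squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

sigma = pstdev(data)          # population standard deviation: sqrt(32 / 8)
sigma_sq = pvariance(data)    # population variance: 32 / 8
s = stdev(data)               # sample standard deviation: sqrt(32 / 7)
s_sq = variance(data)         # sample variance: 32 / 7

print(sigma, sigma_sq, s, s_sq)
```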
A(n) ________ is obtained by dividing the population into homogeneous groups and randomly selecting individuals from each group.
stratified sample
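A sketch of stratified sampling with proportional allocation, assuming hypothetical strata of student IDs grouped by class year and a 10% sampling rate. Each stratum is sampled separately, so larger strata contribute more individuals.

```python
import random

# Hypothetical strata: student IDs grouped by class year.
strata = {
    "freshman": [f"F{i}" for i in range(40)],
    "sophomore": [f"S{i}" for i in range(30)],
    "junior": [f"J{i}" for i in range(20)],
    "senior": [f"R{i}" for i in range(10)],
}
sample_percent = 10  # assumed sampling rate, applied to every stratum

rng = random.Random(11)  # fixed seed so the run is repeatable
stratified_sample = {
    name: rng.sample(members, k=len(members) * sample_percent // 100)
    for name, members in strata.items()
}

for name, chosen in stratified_sample.items():
    print(name, chosen)
```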
A ___________ is obtained by selecting every kth individual from the population. The first individual selected corresponds to a number between 1 and k.
systematic sample
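A minimal sketch of systematic sampling, assuming a hypothetical population of 20 numbered individuals and k = 5. The helper name `systematic_sample` is made up for this example.

```python
import random

def systematic_sample(population, k, rng):
    """Select every kth individual, starting at a random position from 1 to k."""
    start = rng.randint(1, k)          # random start, inclusive of both 1 and k
    return population[start - 1::k]    # convert to a 0-based index, then step by k

population = list(range(1, 21))  # hypothetical individuals numbered 1 to 20
rng = random.Random(5)           # fixed seed so the run is repeatable
sample = systematic_sample(population, 5, rng)

print(sample)
```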
A _______________ is obtained by plotting the time in which a variable is measured on the horizontal axis and the corresponding value of the variable on the vertical axis. Line segments are then drawn connecting the points.
time-series plot
The _____________ of a variable is the square of the standard deviation. The ____________ is σ^2, and the ______________ is s^2.
variance; population variance; sample variance
The _____________ represents the distance that a data value is from the mean in terms of the number of standard deviations. We find it by subtracting the mean from the data value and dividing this result by the standard deviation.
z-score
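The computation described above, written as a small function. The numbers (a data value of 85, a mean of 70, a standard deviation of 10) are invented for illustration.

```python
def z_score(x, mean, standard_deviation):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / standard_deviation

z = z_score(85, 70, 10)  # 85 is 15 units above the mean; 15 / 10 standard deviations
print(z)
```

A positive z-score means the value is above the mean, and a negative z-score means it is below the mean.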
What is a Pareto chart?
A Pareto chart is a bar graph whose bars are drawn in decreasing order of frequency or relative frequency.
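The ordering that defines a Pareto chart can be sketched with a sort. The complaint categories and counts below are hypothetical, and the bars are drawn as text so that no plotting library is needed.

```python
# Hypothetical category frequencies.
counts = {
    "late delivery": 12,
    "damaged item": 30,
    "wrong item": 7,
    "billing error": 18,
}

# Sort the categories in decreasing order of frequency.
pareto_order = sorted(counts.items(), key=lambda item: item[1], reverse=True)

for category, frequency in pareto_order:
    print(f"{category:<14} {'#' * frequency} ({frequency})")
```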
What is a bar graph?
A bar graph is a horizontal or vertical representation of the frequency or relative frequency of the categories. The height of each rectangle represents the category's frequency or relative frequency.
What is a closed question? What is an open question?
A closed question has fixed choices for answers, whereas an open question is a free-response question.
What is a designed experiment?
In a designed experiment, a researcher assigns individuals to a certain group, intentionally changes the value of an explanatory variable, and then records the value of the response variable for each group.
What is a frame?
A frame is a list of the individuals in the population being studied.
What is an ogive?
A graph that represents the cumulative frequency or cumulative relative frequency for the class
What is a lurking variable? Choose the correct answer below.
A lurking variable is an explanatory variable that was not considered in a study, but that affects the value of the response variable in the study. In addition, lurking variables are typically related to explanatory variables in the study.
What does it mean when a part of the population is under-represented?
A part of the population is under-represented when it is proportionally smaller in a sample than in its population.
What does it mean when an observational study is prospective?
A prospective study collects the data over time.
What does it mean when an observational study is retrospective?
A retrospective study requires that individuals look back in time or requires the researcher to look at existing records.
Define factor. Choose the correct answer below.
A variable whose effect on the response variable is to be assessed by the experimenter
What can be said about a set of data with a standard deviation of 0?
All the observations are the same value.
What is an observational study?
An observational study measures the value of the response variable without attempting to influence the value of either the response or explanatory variables.
Determine if the statement is true or false. When an observation that is much larger than the rest of the data is added to a data set, the value of the median will increase substantially.
False
Identify the given statement as either true or false. The standard deviation is a resistant measure of spread.
False
True or False: A data set will always have exactly one mode.
False
Determine whether the following statement is true or false. Explain. When taking a systematic random sample of size n, every group of size n from the population has the same chance of being selected.
False, because certain groups would never be selected.
Determine whether the following statement is true or false. Explain. A simple random sample is always preferred because it obtains the same information as other sampling plans but requires a smaller sample size.
False, because other sampling techniques may provide more information for less cost than a simple random sample.
Determine whether the following statement is true or false. Explain. When obtaining a stratified sample, the number of individuals included within each stratum must be equal.
False. Within stratified samples, the number of individuals sampled from each stratum should be proportional to the size of that stratum in the population.
Determine whether the following statement is true or false. Generally, the goal of an experiment is to determine the effect that the treatment will have on the response variable.
True
Determine whether the following statement is true or false. Explain. When conducting a cluster sample, it is better to have fewer clusters with more individuals when the clusters are heterogeneous.
True, because when the clusters are heterogeneous, they are scaled down versions of the population.
A _______________ is a graph that uses points, connected by line segments, to represent the frequencies for the classes. It is constructed by plotting a point above each _______________ (the sum of consecutive lower class limits divided by 2) on a horizontal axis at a height equal to the frequency of the class.
frequency polygon; class midpoint
A _______________ is constructed by drawing rectangles for each class of data. The height of each rectangle is the frequency or relative frequency of the class. The width of each rectangle is the same, and the rectangles touch each other.
histogram
When an observation that is much larger than the rest of the data is added to a data set, the value of the mean will __________________.
increase
A(n) _________ is a person or object that is a member of the population being studied.
individual