Statistics 2001, Chapters 1, 2, 3
A pie chart is a segmented circle whose segments add up to ______ degrees.
360
The p-value approach to hypothesis testing has --- steps.
4
A one-way ANOVA test is based on which distribution?
F(df1, df2)
These days, it has become easy to access data by simply using a search engine like -------.
There are several guidelines to follow when constructing graphs that summarize statistical data. Which of the following statements is LEAST accurate?
Graphs should have a lot of adornments.
The two-way ANOVA test can be extended to capture the ______ between the factors.
interaction
SSB/(r − 1) = ______
MSB
When constructing a histogram, what values/labels go on the horizontal (x) axis and the vertical (y) axes?
Quantitative class limits on the horizontal axis; frequency or relative frequency on the vertical axis.
In a one-way ANOVA table, ______ = SSTR + SSE.
SST
True or false: The alternative hypothesis always states the opposite of the null hypothesis.
True
True or false: The mean is the most widely used measure of central location for quantitative data.
True
True or false: The two-way ANOVA test can be conducted with or without examining the interaction of the two factors.
True
The significance level is the probability of making
a Type I error.
In order to calculate the arithmetic mean, one
adds all of the data points, then divides by the number of data points.
One method of graphical presentation for qualitative data is a _____.
bar chart
An important final conclusion to a statistical test is to...
clearly interpret the results in terms of the initial claim.
Relative frequency distributions are generally more useful than frequency distributions when
comparing data sets of different sizes.
Consider the following variable: a runner's time in a 100-meter race. This variable is best categorized as a ______ variable.
continuous
The R ______ function generates the correlation coefficient as well as the value of the test statistic and the p-value.
cor.test
The ______ ______ approach to hypothesis testing is attractive when a computer is unavailable and all calculations must be done by hand.
critical value
The branch of statistics that summarizes important aspects of a data set is often referred to as ______ statistics.
descriptive
The ANOVA test assumes the samples are selected ______.
independently
If the value of the test statistic falls in the rejection region, then the p-value must be
less than α.
The test statistic when the population standard deviation is known is z = (x̄ − μ0)/(σ/√n). This formula is valid only if X̄ follows a ______ distribution.
normal
In most applications, we require some form of the equality sign in the ______ hypothesis.
null
The two competing hypotheses used in hypothesis testing are called the ______ hypothesis and the ______ hypothesis.
null, alternative
A cumulative frequency distribution identifies the number of ______ that fall below the upper limit of a particular interval.
observations
When performing a hypothesis test on μ, the p-value is defined as the
observed probability of making a Type I error.
The average of the sum of squared differences from the mean is the
population variance
The ______ is not considered a good measure of dispersion because it focuses solely on the extreme values and ignores every other observation in the sample or the population.
range
We always use ______ evidence and the chosen significance level α to conduct hypothesis tests.
sample
Sampling, rather than surveying an entire population, can offer some substantial benefits. Some of those benefits include
saving money and time.
A one-way ANOVA test is better than using a series of two-sample t tests because conducting a
series of two-sample t tests inflates the risk of committing a Type I error.
A polygon gives a general idea of the ______ of a distribution.
shape
Histograms can be used to determine the ______ of the data.
shape
In an ANOVA test, we compute the grand mean by calculating
the sum of all the observations and then dividing by the total number of observations.
The critical value of a hypothesis test is
the value that separates the rejection region from the non-rejection region.
Data that are collected by recording a characteristic of a subject over several time periods are referred to as ______ data.
time series
Since ANOVA techniques were originally developed in connection with agricultural experiments, the term ______ is often used to identify the populations being examined for an ANOVA analysis.
treatment
The formula for the sample mean is
x̄ = (∑ xi)/n, where the sum runs over i = 1, ..., n
Match these terms with their meanings: α β
α: The probability of a Type I error. β: The probability of a Type II error.
In which of the following data sets would the arithmetic mean NOT be a good measure of central location?
7, 8, 8, 9, 25
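A quick check of why the mean is a poor summary for this data set, written as a minimal Python sketch using only the standard `statistics` module. The variable names are illustrative; the data are the five values from the card above.

```python
from statistics import mean, median

# Data set from the card above; 25 is the outlier.
data = [7, 8, 8, 9, 25]

data_mean = mean(data)      # pulled upward by the single large value
data_median = median(data)  # middle value of the sorted data

print(f"mean = {data_mean}, median = {data_median}")
```

The mean lands above four of the five observations, while the median stays with the bulk of the data.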
What is the most widely-used measure of central location?
Mean
Place the sums of squares from a one-way ANOVA table in the correct order.
SSTR SSE SST
The branch of statistics that draws conclusions about a large set of data based on a smaller set of data is often referred to as ______ statistics.
inferential
Hypothesis testing is analogous to a criminal court of law where someone is ______ until proven ______.
innocent, guilty
It is not sufficient to end the analysis with a conclusion that you reject the null hypothesis or you do not reject the null hypothesis. You must ______ the results.
interpret
Generally, the ______ is the best measure of central location when outliers are present.
median
A(n) ______ is a segmented circle whose segments portray the relative frequencies of the categories of some qualitative variable.
pie chart
One method of graphical presentation for qualitative data is a(n) ______.
pie chart or bar chart
The first step to determine the median is to
place the data in numerical order
When performing a hypothesis test on μ when the value of σ is unknown, the test statistic is computed as (x̄ − μ0)/(s/√n) and it follows the
t distribution with n − 1 degrees of freedom.
This is the symbol for the population mean.
μ
This is the symbol for the sample mean.
x̄
Suppose you are performing a hypothesis test on μ and the value of σ is known. At the 5% significance level, the critical value(s) for a two-tailed test is (are):
-z0.025 and z0.025
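The two critical values can be looked up numerically instead of in a z table. This is a small sketch using `statistics.NormalDist` from the Python standard library; the variable names are chosen here for illustration.

```python
from statistics import NormalDist

alpha = 0.05  # significance level from the card above

# A two-tailed test puts alpha/2 in each tail of the standard normal.
z_upper = NormalDist().inv_cdf(1 - alpha / 2)
z_lower = -z_upper

print(round(z_lower, 2), round(z_upper, 2))
```

Rounded to two decimals, these are the familiar table values of about -1.96 and 1.96.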
True or false: The alternative hypothesis HA in one-way ANOVA requires that all means differ from one another.
False
The alternative hypothesis for a two-sided test for a population mean would be denoted as
HA: μ ≠ μ0
Which of the following BEST describes a frequency distribution for qualitative data?
It groups data into categories, and records the number of observations in each category.
In a neighborhood there are five houses listed for sale for the following amounts: $250,000; $275,000; $280,000; $295,000; and $515,000. What is the BEST measure of central location for the price of a house in the neighborhood?
Median
Which of the following graphical depictions displays cumulative data?
Ogive
Which of the following is an example of cross-sectional data?
Results of market research testing current consumer preferences for soda drinks
In one-way ANOVA, the mean square for treatments (MSTR) is calculated how?
SSTR/(c-1)
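The one-way ANOVA quantities on these cards can be computed by hand for a tiny example. The sketch below uses made-up data for c = 3 treatments; the numbers and variable names are hypothetical, and MSE = SSE/(nT − c) is the standard one-way definition rather than something stated on the cards.

```python
from statistics import mean, variance

# Hypothetical samples for c = 3 treatments (illustrative numbers only).
samples = [
    [4, 5, 6],
    [6, 7, 8],
    [8, 9, 10],
]

c = len(samples)                          # number of treatments
all_obs = [x for s in samples for x in s]
n_total = len(all_obs)                    # total number of observations
grand_mean = sum(all_obs) / n_total       # sum of all observations / count

# Between-treatments sum of squares: sample means vs. the grand mean.
sstr = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
# Error sum of squares: weighted sum of the sample variances.
sse = sum((len(s) - 1) * variance(s) for s in samples)
# Total sum of squares: every observation vs. the grand mean.
sst = sum((x - grand_mean) ** 2 for x in all_obs)

mstr = sstr / (c - 1)        # mean square for treatments
mse = sse / (n_total - c)    # mean square error (assumed standard formula)
f_stat = mstr / mse          # compared against the right tail of F(c - 1, nT - c)

print(sstr, sse, sst, f_stat)
```

With these numbers SSTR and SSE add up to SST, which is the identity SST = SSTR + SSE from the ANOVA table.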
Match these shape and association measures with their Excel function names.
Skewness: =SKEW(array); Kurtosis: =KURT(array); Sample covariance: =COVARIANCE.S(array1, array2); Correlation: =CORREL(array1, array2)
______ is the science that deals with the collection, preparation, analysis, interpretation, and presentation of data.
Statistics
A sales invoice is what type of data?
Structured
Which of the following is an example of inferential statistics?
Test the longevity of all light bulbs based on a sample of 100 light bulbs.
When there are an odd number of observations, and the observations are in order from smallest to largest, the median is...
The middle observation
All of the following are examples of continuous variables EXCEPT:
The number of children in a family
A(n) ______ depicts the frequency or the relative frequency for each category of a qualitative variable as a series of horizontal or vertical bars, the lengths of which are proportional to the values that are depicted.
bar chart
In one-way ANOVA, two independent estimates of the common population variance σ² are computed. These estimates are commonly referred to as ______.
between-treatments variability and within-treatments variability
The ______ is a weighted sum of the sample variances of each treatment.
error sum of squares
The p-value is the likelihood of obtaining a sample mean that is at least as ______ as the one derived from the given sample, under the assumption that the null hypothesis is true as an equality.
extreme
In general, data are compilations of ______, ______, or other ______.
facts, figures, contents
An owner of a grocery store wants to determine the brands of soda that customers purchase at the store. When summarizing the data about soda brand purchases, the meaningful measure of central location is the ______.
mode
The ______ is a measure of central location that is the most frequently occurring value in the data set.
mode
When summarizing a qualitative data set, the ______ is the best measure of central location.
mode
A quantitative variable is also known as a ______ variable.
numerical
The mean is usually greater than the median when the data are ______ skewed.
positively
Performing a one-way ANOVA test, instead of performing a series of two-sample t tests, ______ the risk of incorrectly rejecting the null hypothesis.
reduces
A ______ is a (measured) subset of a population.
sample
Histograms can be used to observe the ______ of the data.
spread or variability
In one-way ANOVA, the error sum of squares (SSE) is the
sum of the weighted sample variances of each treatment.
A ______ distribution is one that is a mirror image of itself on both sides of its center.
symmetric
The term ______ is often used to identify the c populations being examined.
treatments
In two-way ANOVA with interaction, we partition the total sum of squares SST into the following components:
SSA, SSB, SSAB, and SSE
If one variable decreases as the other variable decreases, the two variables have what type of relationship?
Positive
If the interaction between two factors is not significant, what are the next ANOVA tests to be done?
Tests about the population means of factor A and/or factor B
When constructing classes for a frequency distribution for quantitative data, which of the following statements is LEAST accurate?
The number of classes should equal the number of observations.
Which of the following is NOT an assumption for performing a one-way ANOVA?
The population correlation coefficients indicate a strong linear relationship.
Which of these is a NULL hypothesis applicable for a two-way ANOVA test with interaction?
There is no interaction between factors A and B.
In descriptive statistics, a polygon is best described as a
graph that connects the midpoints of each class and its associated frequency or relative frequency.
In a two-way ANOVA test, the sum of squares for factor B is based on the sum of the squared differences between the mean for each level of factor B and the ------ ------.
grand mean
In two-way ANOVA without interaction, the error sum of squares (SSE) is calculated as ______.
SST - (SSA + SSB)
_____ data often consist of numerical information that is objective and is not open to interpretation.
Structured
The ANOVA test is a ______-tailed test.
right
If the value of the sample covariance between the two random variables X and Y equals -150, then we can conclude that X and Y have a (an) ______.
negative linear relationship
A one-way analysis of variance (ANOVA) test compares population ______ based on one categorical variable or factor.
means
The mode is defined as
the most frequently occurring value in the data set
If the population standard deviation is unknown, it can be estimated by using ______.
s
An auditor for a small business wants to determine whether the mean value of all accounts receivable is less than $550. She takes a sample of 40 and computes the sample mean and the sample standard deviation. The null and alternative hypotheses for this test are
H0: μ ≥ 550 and HA: μ < 550
The only way we can reduce both Type I and Type II errors is by increasing ______.
n or sample size
In ANOVA testing, if the ratio of the between-treatment variability to within-treatment variability is significantly greater than one, then we
reject the null hypothesis of equal population means.
The critical value approach specifies a region of values, called the ______. If the test statistic falls into this region, we reject the ______.
rejection region, null hypothesis
This value of rxy represents a perfect negative linear relationship.
-1
Which of the following are correctly configured two-tailed tests?
H0: p = p0, HA: p ≠ p0; H0: μ = μ0, HA: μ ≠ μ0
When decomposing total variation in a two-way ANOVA test with interaction, SST = ______ + ______ + ______ + ______.
SST = SSA + SSB + SSAB + SSE
Not rejecting the null hypothesis when the null hypothesis is false.
Type II error
When creating a bar chart or a histogram, each bar/rectangle should be of the ______ width.
same
The correlation coefficient measures the strength of the linear relationship between ______ variables.
two
For a hypothesis test of μ when σ is known, the value of the test statistic is calculated as
z = (x̄ − μ0)/(σ/√n)
In general, we follow three steps when formulating the competing hypotheses. Place these steps in the correct sequence.
1.) Identify the relevant population parameter of interest. 2.) Determine whether it is a one- or two-tailed test. 3.) Include some form of the equality sign in the null hypothesis and use the alternative hypothesis to establish a claim.
Place the steps to perform an ANOVA difference of means test in their proper sequence.
1.) Specify the null and the alternative hypotheses. 2.) Specify the significance level. 3.) Calculate the value of the test statistic and the p-value. 4.) State the conclusion and interpret the results.
______ ≤ rxy ≤ ______
-1, 1
In hypothesis testing, two correct decisions are possible:
-Reject the null hypothesis when it is false. -Do not reject the null hypothesis when it is true.
Put the following steps in the p-value approach to hypothesis testing in the correct order.
1.) Specify the null and alternative hypotheses. 2.) Specify the significance level. 3.) Calculate the value of the test statistic and its p-value 4.) State the conclusion and interpret results.
In a two-way ANOVA test, what is the maximum number of different hypotheses that can be tested?
3
Many experts believe that _____ of the data in the world today were created in the last two years alone.
90%
With respect to a bar chart, which of the following statements is MOST accurate?
A bar chart is a useful graphical tool for qualitative data.
Sampling is necessary when it is either impractical or impossible to survey the entire population. In which situation does surveying the entire population INSTEAD OF sampling just a part of the population make the most sense?
A teacher who has 30 students in her class wants to determine the average of the most recent test scores.
Which of the following statements is NOT correct concerning the p-value and critical value approaches to hypothesis testing?
Both approaches use the same decision rule concerning when to reject H0.
Which of the following is an example of descriptive statistics?
Calculate the percent of 2500 U.S. voters in an opinion poll who approve of the President's performance.
Suppose the competing hypotheses for a test are H0: μ ≤ 10 versus HA: μ > 10. If the value of the test statistic is 1.90 and the critical value at the 1% level of significance is z0.01 = 2.33, then the correct conclusion is:
Do not reject H0 and conclude that the population mean does not appear to be greater than 10 at the 1% significance level.
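The conclusion on this card can be reproduced with both the critical value approach and the p-value approach. The sketch below uses `statistics.NormalDist` from the Python standard library and assumes a z test (σ known), which is what the card's z0.01 notation suggests; variable names are illustrative.

```python
from statistics import NormalDist

z = 1.90      # test statistic from the card above
alpha = 0.01  # 1% significance level
std_normal = NormalDist()

# Critical value approach: right-tailed test, so reject H0 if z > z_alpha.
critical_value = std_normal.inv_cdf(1 - alpha)
reject_by_critical_value = z > critical_value

# p-value approach: area to the right of z, reject H0 if p-value < alpha.
p_value = 1 - std_normal.cdf(z)
reject_by_p_value = p_value < alpha

print(round(critical_value, 2), round(p_value, 4))
print(reject_by_critical_value, reject_by_p_value)
```

Both approaches give the same decision here: the test statistic is below the critical value and the p-value is above α, so H0 is not rejected.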
For an ANOVA test, the p-value is found using the ______ table.
F
True or false: A Type I error occurs if we do NOT reject the null hypothesis when it is actually false.
False
Match these symbols with their meanings. H0 HA
H0- Null Hypothesis HA- Alternative hypothesis
A researcher for a store chain wants to determine whether the proportion of customers who try out the samples being offered is more than 0.15. The null and alternative hypotheses for this test are
H0: p ≤ 0.15 and HA: p > 0.15
The null hypothesis for a two-sided test for a population mean would be denoted as
H0: μ = μ0
Which of the following is a right-tailed test for the correlation coefficient?
H0: ρxy ≤ 0, HA: ρxy > 0
In order to approximate the class width for a frequency distribution of quantitative data, we calculate:
(Largest value − Smallest value) / Number of classes
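As a small illustration of the class width formula, the Python sketch below applies it to a made-up data set. The function name `approx_class_width` and the data values are hypothetical.

```python
def approx_class_width(values, num_classes):
    """Approximate class width: (largest - smallest) / number of classes."""
    return (max(values) - min(values)) / num_classes

# Hypothetical quantitative data (illustrative numbers only).
values = [12, 15, 21, 27, 33, 38, 44, 52]

width = approx_class_width(values, num_classes=4)
print(width)
```

The result is only a starting point; in practice the width is usually adjusted to a convenient round number.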
SSA/(c − 1) = ______
MSA
In a two-way ANOVA, the F(df1, df2) statistic that determines whether significant differences exist between the factor A means is calculated as ______/______.
MSA/MSE
______ is the within-treatments variance.
MSE
SSE/(rc(w − 1)) = ______
MSE
SSE/(nT − c − r + 1) = ______
MSE
______ is the between-treatments variance.
MSTR
Match these location measures with their R function names.
Mean: mean(df$var) Median: median(df$var) Multiple measures: summary(df) Minimum: min(df$var) Maximum: max(df$var) Percentile: quantile(df$var,p)
Match these location measures with their Excel function names.
Mean: =AVERAGE(array); Median: =MEDIAN(array); Mode: =MODE(array); Minimum: =MIN(array); Maximum: =MAX(array); Percentile: =PERCENTILE.INC(array, p)
When there are an even number of observations, and the observations are in order from smallest to largest, the median is...
The average of the two middle observations
What is the relationship between the variance and the standard deviation?
The standard deviation is the positive square root of the variance.
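The relationship can be verified numerically. This sketch uses the standard `statistics` module on a made-up data set; the data values are hypothetical.

```python
from math import isclose, sqrt
from statistics import pstdev, pvariance, stdev, variance

# Hypothetical data (illustrative numbers only).
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = pvariance(data)   # population variance: divide by N
pop_sd = pstdev(data)       # population standard deviation
samp_var = variance(data)   # sample variance: divide by n - 1
samp_sd = stdev(data)       # sample standard deviation

# In both cases the standard deviation is the positive square root of the variance.
print(isclose(pop_sd, sqrt(pop_var)), isclose(samp_sd, sqrt(samp_var)))
```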
Which of the following statements is true about the test of H0: ρxy = 0?
The test statistic is assumed to follow the tdf distribution with n - 2 degrees of freedom.
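For reference, the test statistic for H0: ρxy = 0 is commonly written as t = r√(n − 2)/√(1 − r²), which is a standard result rather than something stated on the cards. The Python sketch below computes r and t for a small made-up data set; the function names and data are hypothetical.

```python
from math import sqrt

def sample_correlation(x, y):
    """Sample correlation coefficient r = s_xy / (s_x * s_y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
    s_x = sqrt(sum((a - x_bar) ** 2 for a in x) / (n - 1))
    s_y = sqrt(sum((b - y_bar) ** 2 for b in y) / (n - 1))
    return s_xy / (s_x * s_y)

def correlation_t_stat(r, n):
    """t statistic with n - 2 degrees of freedom for testing rho = 0."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

# Hypothetical paired observations (illustrative numbers only).
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r = sample_correlation(x, y)
t = correlation_t_stat(r, len(x))
df = len(x) - 2
print(round(r, 2), round(t, 4), df)
```

The p-value would then come from the t distribution with n − 2 degrees of freedom, for example via a t table or R's cor.test.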
A company wants to estimate the mean price of oil over the past 10 years. What type of data does the company need?
Time series data
Which of the following is not a measure of central location?
Variance
Which characteristic of big data does the following describe? Data come in all types, forms, and granularity, both structured and unstructured.
Variety
Which characteristic of big data does the following describe? The credibility and quality of data.
Veracity
The conclusions of a hypothesis test that are drawn from the p-value approach versus the critical value approach are
always the same.
A qualitative variable is also known as a ______ variable.
categorical
The term ______ ______ relates to the way data tend to cluster around some middle or central value.
central location
If the two independent estimates of σ² are relatively close together, then it is likely that the variability of the sample means can be explained by
chance
Data that are collected about many subjects at the same point in time or without regard to differences in time are known as ______ data.
cross-sectional
In two-way ANOVA with interaction, we partition the total sum of squares into ______ distinct components.
four
A ______ is a way to organize qualitative data into categories and record the number of observations in each category.
frequency distribution
In a two-way ANOVA test, the sum of squares for factor A is based on the sum of the squared differences between the mean for each level of factor A and the ______ ______.
grand mean
The range is the difference between
largest and smallest values
We reject H0 if the p-value is ______ ______ α.
less than
In one-way ANOVA, between-treatments variability is based on the variability between sample ______.
means
The one-way analysis of variance (ANOVA) test is used to determine if differences exist between the ______ of three or more populations.
means
The measure of central location where half the values of the data set lie above this measure and half the values of the data set lie below this measure is known as the ______.
median
A negative value of the covariance implies that x and y have a ______ linear relationship.
negative
What type of relationship exists between two variables if, as one increases, the other decreases?
negative
The mean is usually less than the median when the data are ______ skewed.
negatively
If we reject the null hypothesis when it is actually false we have committed...
no error.
In order to implement a hypothesis test, it is essential that X̄ is ______ distributed.
normally
Extremely small or large values are also referred to as ______.
outliers
Most researchers and practitioners favor the ______-value approach.
p
The notation σ^2 represents the
population variance.
A relative frequency distribution for quantitative data identifies the
proportion of observations that occur in each class.
If the chosen significance level is α = 0.05, then there is a 5% chance of
rejecting a true null hypothesis.
The standard error of the estimate is the standard deviation of the ______.
residuals or errors
When testing μ and σ is known, H0 can never be rejected if z ≤ 0 for a
right-tailed test.
Generally, for a frequency distribution, the width of each interval is the ______ for each interval.
same
The ______ level of a hypothesis test is defined as 100α%.
significance
If a distribution is not symmetric, then it is either positively skewed or negatively ______.
skewed
An ogive is a graph that plots the cumulative frequency, or cumulative relative frequency, against the
upper limit of the corresponding class.
When performing a hypothesis test on μ when σ is known, H0 can never be rejected if
z ≥ 0 for a left-tailed test.
The ______ frequency for a particular interval indicates the proportion of the observations that falls below the upper limit of that particular interval.
Cumulative relative
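The frequency, relative frequency, cumulative frequency, and cumulative relative frequency columns are related by simple arithmetic. The Python sketch below builds them for a made-up frequency distribution; the class limits and counts are hypothetical.

```python
from itertools import accumulate

# Hypothetical frequency distribution (illustrative numbers only).
upper_limits = [10, 20, 30, 40]  # upper limit of each class
frequencies = [4, 10, 5, 1]      # observations in each class

n = sum(frequencies)

# Proportion of observations in each class.
relative = [f / n for f in frequencies]
# Number of observations below each upper limit.
cumulative = list(accumulate(frequencies))
# Proportion of observations below each upper limit.
cumulative_relative = [cf / n for cf in cumulative]

for limit, cf, crf in zip(upper_limits, cumulative, cumulative_relative):
    print(f"below {limit}: {cf} observations, proportion {crf:.2f}")
```

Plotting the cumulative columns against the upper limits gives the ogive described on the earlier card.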
We would conduct a hypothesis test to determine whether or not
sample evidence contradicts H0.
How many means can you test for differences using ANOVA?
3 or more
The notation x̄ represents the ______.
sample mean
In order to determine if significant differences exist between some of the population means, we develop two independent estimates of the common population ______.
variance
Two widely used measures of dispersion are...
variance and standard deviation
Northern University wants to determine the average starting salary for last year's graduates of its College of Business. What is the population from which the survey is taken?
All of last year's graduates from Northern's College of Business who started working
True or false: ANOVA is a statistical technique used to determine if there is a difference in three or more population standard deviations.
False
When testing whether the correlation coefficient differs from zero, the value of the test statistic is t20 = 1.95 with a corresponding p-value of 0.0653. At the 5% significance level, can you conclude that the correlation coefficient differs from zero?
No, since the p-value exceeds 0.05.
For quantitative data, a ______ groups data into classes and records the number of observations that falls into each class.
frequency distribution
The notation s² represents the
sample variance
Which one of the following is NOT a step we use when formulating the null and alternative hypotheses?
Calculate the value of the sample statistic.
The competing hypotheses for a one-way ANOVA test that compares the means of three populations are defined as
H0: μ1 = μ2 = μ3 HA: Not all population means are equal
In order to summarize qualitative data, a useful tool is a(n) ______.
frequency distribution
In inferential statistics, we use ______ information to make inferences about an unknown ______ parameter.
sample, population
For a hypothesis test on μ when the value of σ is unknown, the value of the test statistic is calculated as ______, provided that we sample from a normal population.
tdf = (x̄ − μ0)/(s/√n)
A ______ ______ distribution shows the number of observations that fall below the upper limit of a particular interval.
cumulative frequency
In a frequency distribution for a categorical variable, intervals are ______.
mutually exclusive
A ______ is a series of rectangles where the width and height of each rectangle represent the interval width and frequency of the respective interval.
histogram