Final Exam
Pareto Charts
A Pareto chart is a bar graph for categorical data, with the added stipulation that the bars are arranged in descending order according to frequencies, so the bars decrease in height from left to right.
What is a scatterplot and how does it help us?
A scatterplot is a graph of paired (x, y) quantitative data. It provides a visual image of the data plotted as points, which helps show any patterns in the data.
Which word is associated with multiplication when computing probabilities?
AND
Which of the following is NOT a principle of probability?
All events are equally likely in any probability procedure.
In horse racing, a trifecta is a bet that the first three finishers in a race are selected, and they are selected in the correct order. Does a trifecta involve combinations or permutations? Explain.
Because the order of the first three finishers does make a difference, the trifecta involves permutations.
Heights of adult males are normally distributed. If a large sample of heights of adult males is randomly selected and the heights are illustrated in a histogram, what is the shape of that histogram?
Bell-shaped
A _______ helps us understand the nature of the distribution of a data set.
Frequency distribution
Which of the following is always true?
In a symmetric and bell-shaped distribution, the mean, median, and mode are the same
A data set consists of 100 amounts of annual income (dollars) and there are two outliers that are exceptionally high. Which measure of center appears to be best?
Median
A data set consists of a list of eye colors from 250 randomly selected statistics students. Which measure of center appears to be best?
Mode.
Is the median always equal to the midrange?
No
A combination lock uses three numbers, each between 1 and 94, with repetition allowed, and they must be entered in the correct sequence. Is the name "combination lock" appropriate? Why or why not?
No. The numbers must be entered in the correct order, so order matters and the possible settings are not combinations. Because repetition is allowed, the multiplication counting rule gives the total number of possible sequences.
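As a quick check of the count, here is a minimal Python sketch. It assumes each of the three positions can independently be any whole number from 1 to 94. It applies the multiplication counting rule and, for contrast only, prints the combination count 94C3 using the standard-library `math.comb`.

```python
from math import comb

# Multiplication counting rule: 3 positions, each with 94 choices
# (1 through 94), repetition allowed, order matters.
choices_per_position = 94
positions = 3

sequences = choices_per_position ** positions  # 94 * 94 * 94
print(sequences)  # 830584

# For contrast: choosing 3 different numbers where order does NOT matter
# would be the combination 94C3, which is a much smaller count.
print(comb(94, 3))  # 134044
```

The two counts differ because a true combination ignores order and forbids repeats, which is not how the lock works.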
If we find that there is a linear correlation between the concentration of carbon dioxide in our atmosphere and the global temperature, does that indicate that changes in the concentration of carbon dioxide cause changes in the global temperature?
No. The presence of a linear correlation between two variables does not imply that one of the variables is the cause of the other variable.
Suppose that you need to create a list of n values that have a specific known mean. Some of the n values can be freely selected. How many of the n values can be freely assigned before the remaining values are determined? (The result is referred to as the number of degrees of freedom.)
Of the n values, n − 1 can be freely selected, because the last remaining value is then determined by the assigned values and the known mean.
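A small sketch of why only n − 1 values are free: once the mean is fixed, the total of all n values must be n times the mean, so the last value is forced. The function name `last_value` and the sample numbers are hypothetical, chosen only for illustration.

```python
def last_value(free_values, target_mean):
    """Return the one value that gives the full list the target mean.

    free_values holds the n - 1 freely chosen values (hypothetical helper).
    """
    n = len(free_values) + 1
    # The total of all n values must equal n * mean.
    return n * target_mean - sum(free_values)

free = [2, 7, 9]                # n - 1 = 3 values chosen freely
x_last = last_value(free, 5)    # n = 4 and mean 5, so the total must be 20
values = free + [x_last]
print(x_last)                       # 2
print(sum(values) / len(values))    # 5.0
```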
_______ are sample values that lie very far away from the majority of the other sample values.
Outliers
Multiplication Rule of Probability
P(A and B) = P(A) * P(B|A); when A and B are independent, P(B|A) = P(B), so this simplifies to P(A) * P(B).
Addition Rule of Probability
P(A or B) = P(A) + P(B) − P(A and B)
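Both rules can be checked by brute-force counting. This sketch uses exact fractions from Python's standard `fractions` module. The events (a fair die roll that is even or greater than 3, and two aces drawn without replacement from a standard 52-card deck) are illustrative choices, not from the original cards.

```python
from fractions import Fraction
from itertools import permutations

# Addition rule on one roll of a fair die.
outcomes = range(1, 7)
A = {x for x in outcomes if x % 2 == 0}   # even: {2, 4, 6}
B = {x for x in outcomes if x > 3}        # greater than 3: {4, 5, 6}

def p(event):
    return Fraction(len(event), 6)

p_or_formula = p(A) + p(B) - p(A & B)   # subtract the overlap so it is counted once
p_or_direct = p(A | B)                  # count the union directly
print(p_or_formula, p_or_direct)        # 2/3 2/3

# Multiplication rule for dependent events: two cards drawn without
# replacement, both aces. P(A and B) = P(A) * P(B|A).
p_both_aces = Fraction(4, 52) * Fraction(3, 51)

# Brute-force check over all ordered pairs of two different cards.
deck = ["A"] * 4 + ["x"] * 48
pairs = list(permutations(range(52), 2))
hits = sum(1 for i, j in pairs if deck[i] == "A" and deck[j] == "A")
print(p_both_aces, Fraction(hits, len(pairs)))  # 1/221 1/221
```

Using P(B) = 4/52 for the second draw instead of P(B|A) = 3/51 would give the wrong answer here, because the draws are dependent.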
IQR (interquartile range)
IQR = Q3 − Q1
Which measure of variation is most sensitive to extreme values?
Range
Why is it important to learn about bad graphs?
So that we can critically analyze a graph to determine whether it is misleading.
Determine whether the given value is a statistic or a parameter. A sample of employees is selected, and it is found that 55% own a computer.
Statistic because the value is a numerical measurement describing a characteristic of a sample.
State whether the data described below are discrete or continuous, and explain why. The numbers of checked bags on flights between San Francisco and Atlanta.
The data are discrete because the data can only take on specific values.
State whether the data described below are discrete or continuous, and explain why. The numbers of people that drive by a certain billboard each day.
The data are discrete because the data can only take on specific values.
Determine whether the data described below are qualitative or quantitative and explain why. The blood groups of A, B, AB, and O
The data are qualitative because they don't measure or count anything.
Determine whether the given value is a statistic or a parameter. A homeowner measured the voltage supplied to his home on 25 days of a given month, and the average (mean) value is 123.9 volts.
The given value is a statistic for the month because the data collected represent a sample.
Which of the following is NOT a characteristic of the mean?
The mean is called the average by statisticians.
How many different ways can the letters of "embarrass" be arranged? If the letters of "embarrass" are arranged in a random order, what is the probability that the result will be "embarrass"?
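The letters "a", "r", and "s" each appear twice, so the number of different ways that the letters of "embarrass" can be arranged is 9!/(2! * 2! * 2!) = 45,360. Only one of those arrangements spells "embarrass", so the probability is 1/45,360.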
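The count follows the rule for permutations when some items are identical: n! divided by the product of the factorials of each repeated letter's count. A minimal Python sketch using only the standard library (`math.prod` needs Python 3.8 or later):

```python
from collections import Counter
from fractions import Fraction
from math import factorial, prod

word = "embarrass"
counts = Counter(word)  # a, r, and s each appear twice

# Permutations with identical items: n! / (n1! * n2! * ... * nk!)
arrangements = factorial(len(word)) // prod(factorial(c) for c in counts.values())
print(arrangements)  # 45360

# Exactly one arrangement spells "embarrass".
print(Fraction(1, arrangements))  # 1/45360
```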
If we collect a large sample of blood platelet counts and if our sample includes a single outlier, how will that outlier appear in a histogram?
The outlier will appear as a bar far from all of the other bars with a height that corresponds to a frequency of 1.
What does P(B|A) represent?
The probability of event B occurring after it is assumed that event A has already occurred
We use r to denote the value of the linear correlation coefficient. Why do we refer to this correlation coefficient as being linear?
The term linear refers to a straight line, and r measures how well a scatterplot fits a straight-line pattern.
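To see that r measures only straight-line fit, this sketch computes r from its definition (the sum of cross-products of deviations, divided by the square root of the product of the sums of squared deviations). The two small data sets are hypothetical: one lies exactly on a line, the other follows a perfect parabola.

```python
from math import sqrt

def pearson_r(x, y):
    """Linear correlation coefficient r for paired data (x, y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Points exactly on the line y = 2x + 1: r is 1.
print(pearson_r([1, 2, 3, 4], [3, 5, 7, 9]))

# A perfect curved (parabolic) pattern: r is 0, even though y is
# completely determined by x, because the pattern is not a straight line.
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))
```

A value of r near 0 therefore means "no linear correlation", not "no relationship".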
Which of the following is not a requirement of the binomial probability distribution?
The trials must be dependent
In a study of a sample of babies born at hospitals in one state, it was found that the average (mean) weight at birth was 3185.9 grams. Identify whether this value is a statistic or a parameter.
The value is a statistic because it describes some characteristic of a sample.
If your score on your next statistics test is converted to a z score, which of these z scores would you prefer: −2.00, −1.00, 0, 1.00, 2.00? Why?
The z score of 2.00 is most preferable because it is 2.00 standard deviations above the mean and would correspond to the highest of the five different possible test scores.
Which characteristic of data is a measure of the amount that the data values vary?
Variation
Which of the following is NOT a property of the standard deviation?
When comparing variation in samples with very different means, it is good practice to compare the two sample standard deviations.
The population of ages at inauguration of all U.S. Presidents who had professions in the military is 62, 46, 68, 64, 57. Why does it not make sense to construct a histogram for this data set?
With a data set that is so small, the true nature of the distribution cannot be seen with a histogram.
Parameter
a numerical measurement describing some characteristic of a population
Statistic
a numerical measurement describing some characteristic of a sample
The ____ event A occurring are the ratio P(A)/P(Ā).
actual odds in favor of
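A one-line sketch of the ratio, using exact fractions. The example event (rolling a 6 with a fair die) and the function name `odds_in_favor` are illustrative assumptions.

```python
from fractions import Fraction

def odds_in_favor(p_a):
    """Actual odds in favor of A: P(A) / P(not A) = P(A) / (1 - P(A))."""
    return p_a / (1 - p_a)

# Rolling a 6 with a fair die: P(A) = 1/6 and P(not A) = 5/6.
print(odds_in_favor(Fraction(1, 6)))  # 1/5, usually written as 1:5
```

The actual odds against A are the reciprocal ratio, P(Ā)/P(A), which is 5:1 here.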
When using the _______ always be careful to avoid double-counting outcomes.
addition rule
The conditional probability of B given A can be found by _______.
assuming that event A has occurred, and then calculating the probability that event B will occur
Which of the following is NOT a measure of center?
census
A _______ probability of an event is a probability obtained with knowledge that some other event has already occurred.
conditional
Categorical (or qualitative or attribute) data
consist of names or labels (not numbers that represent counts or measurements).
Quantitative (or numerical) data
consist of numbers representing counts or measurements.
A _______ random variable has infinitely many values associated with measurements.
continuous
Methods used that summarize or describe characteristics of data are called _______ statistics.
descriptive
A _______ random variable has either a finite or a countable number of values.
discrete
Events that are _______ cannot occur at the same time.
disjoint
A ______ is a graph of each data value plotted as a point.
dotplot
The classical approach to probability requires that the outcomes are ___
equally likely
Find the mean of data summarized in a frequency distribution by
multiplying each class midpoint by its class frequency, adding those products, and dividing that sum by the sum of the frequencies: mean = Σ(f * x) / Σf, where x is the class midpoint.
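A minimal sketch of Σ(f * x) / Σf. The class limits and frequencies below are hypothetical, and the function name `freq_dist_mean` is only for illustration.

```python
# Hypothetical frequency distribution: (lower limit, upper limit, frequency)
classes = [(0, 9, 2), (10, 19, 5), (20, 29, 3)]

def freq_dist_mean(classes):
    total_fx = 0.0   # running sum of f * x
    total_f = 0      # running sum of f
    for lower, upper, f in classes:
        midpoint = (lower + upper) / 2   # x is the class midpoint
        total_fx += f * midpoint
        total_f += f
    return total_fx / total_f

print(freq_dist_mean(classes))  # 15.5
```

Because every value in a class is replaced by the class midpoint, the result only approximates the mean of the original, unsummarized data.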
The heights of the bars of a histogram correspond to _____ values.
frequency
A(n) ______ uses line segments to connect points located directly above class midpoint values.
frequency polygon
We utilize statistical _______ to look for features that reveal some useful or interesting characteristics of the data set.
graphs
skewed left (negatively skewed)
have a longer left tail
skewed right (positively skewed)
have a longer right tail
Selections made with replacement are considered to be _______.
independent
The midrange is
(maximum value + minimum value)/2
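A short sketch of the midrange, with the parentheses in the right place, compared with the median from Python's standard `statistics` module. The data are hypothetical and include one high outlier to show that the midrange and the median generally differ.

```python
from statistics import median

def midrange(data):
    # Add the maximum and minimum first, then divide by 2.
    return (max(data) + min(data)) / 2

data = [1, 2, 3, 4, 50]   # hypothetical values with one high outlier
print(midrange(data))     # 25.5
print(median(data))       # 3
```

Writing max(data) + min(data) / 2 without parentheses would divide only the minimum by 2 and give 50.5 here.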
A value at the center or middle of a data set is a(n) _______.
measure of center
The measure of center that is the value that occurs with the greatest frequency is the _____.
mode.
The complement of "at least one" is _______.
none
A(n) _____ distribution has a "bell" shape.
normal
In modified boxplots, a data value is a(n) _______ if it is above Q3 + 1.5(IQR) or below Q1 − 1.5(IQR).
outlier
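A sketch of the modified-boxplot rule. It computes Q1 and Q3 with `statistics.quantiles` from Python's standard library (default method "exclusive"). Textbooks and software use slightly different quartile conventions, so quartiles from other tools may not match exactly. For the hypothetical sample below, the textbook percentile locator gives the same quartiles (Q1 = 4, Q3 = 9).

```python
from statistics import quantiles

def modified_boxplot_outliers(data):
    q1, _, q3 = quantiles(data, n=4)   # quartile convention: method="exclusive"
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # A value is an outlier if it lies beyond either fence.
    return [x for x in data if x < low_fence or x > high_fence]

data = [2, 3, 4, 5, 5, 6, 7, 8, 9, 10, 40]   # hypothetical sample
print(modified_boxplot_outliers(data))  # [40]
```

Here IQR = 9 − 4 = 5, so the fences are −3.5 and 16.5, and only 40 lies outside them.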
If the order of the items selected matters, then we have a _______.
permutation problem
A _______ variable is a variable that has a single numerical value, determined by chance, for each outcome of a procedure.
random
A _______ histogram has the same shape and horizontal scale as a histogram, but the vertical scale is marked with relative frequencies instead of actual frequencies.
relative frequency
In a _______ distribution, the frequency of a class is replaced with a proportion or percent.
relative frequency
Continuous (numerical) Data
result from infinitely many possible quantitative values, where the collection of values is not countable.
Discrete Data
result when the data values are quantitative and the number of values is finite, or "countable"
The Range Rule of Thumb roughly estimates the standard deviation of a data set as ____.
s ≈ range/4
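The rule of thumb only gives a rough estimate. This sketch compares it with the sample standard deviation and variance from Python's `statistics` module for a hypothetical data set.

```python
from statistics import stdev, variance

def range_rule_estimate(data):
    """Range rule of thumb: s is roughly (max - min) / 4."""
    return (max(data) - min(data)) / 4

data = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical sample
print(range_rule_estimate(data))   # 1.75
print(round(stdev(data), 2))       # 2.14, the sample standard deviation s
print(round(variance(data), 2))    # 4.57, the variance s squared
```

The estimate (1.75) is in the right neighborhood of the actual s (about 2.14), which is all the rule of thumb promises.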
The _______ for a procedure consists of all possible simple events or all outcomes that cannot be broken down any further.
sample space
A ______ is a plot of paired data (x,y) and is helpful in determining whether there is a relationship between the two variables
scatterplot
A histogram aids in analyzing the _______ of the data.
shape of the distribution
A data value is considered _____ if its z-score is less than or equal to −2 or greater than or equal to 2.
significantly low or significantly high
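A sketch that converts a value to a z-score, z = (x − mean)/s, and classifies it. It uses the common textbook cutoffs z ≤ −2 (significantly low) and z ≥ 2 (significantly high). The mean of 70, standard deviation of 8, and the scores are hypothetical.

```python
def z_score(x, mean, sd):
    # Number of standard deviations that x lies from the mean.
    return (x - mean) / sd

def classify(z):
    if z <= -2:
        return "significantly low"
    if z >= 2:
        return "significantly high"
    return "not significant"

# Hypothetical test scores with mean 70 and standard deviation 8.
for score in (50, 66, 90):
    z = z_score(score, 70, 8)
    print(score, z, classify(z))
# 50 -2.5 significantly low
# 66 -0.5 not significant
# 90 2.5 significantly high
```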
Class width is found by ______.
subtracting a lower class limit from the next consecutive lower class limit
For a data set having a distribution that is approximately bell-shaped, ____ states that about 68% of all data values fall within one standard deviation of the mean.
the Empirical Rule
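The Empirical Rule can be checked by simulation. This sketch draws values from a normal distribution with Python's `random.Random.gauss`. The population mean of 100, standard deviation of 15, and seed are hypothetical choices. The printed shares should come out close to 68%, 95%, and 99.7%, but the exact digits depend on the random draws.

```python
import random

rng = random.Random(1)   # fixed seed so the run is repeatable
mu, sigma = 100, 15      # hypothetical bell-shaped population
values = [rng.gauss(mu, sigma) for _ in range(100_000)]

def share_within(k):
    # Fraction of values within k standard deviations of the mean.
    return sum(abs(v - mu) <= k * sigma for v in values) / len(values)

for k in (1, 2, 3):
    print(k, round(share_within(k), 3))   # roughly 0.68, 0.95, 0.997
```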
The computed mean is not close to the actual mean when
the difference between the means is more than 5% of the actual mean
As a procedure is repeated again and again, the relative frequency of an event tends to approach the actual probability. This is known as _______.
the law of large numbers
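A coin-tossing simulation illustrates the law of large numbers. It assumes a fair coin (P(heads) = 0.5) simulated with Python's `random` module and a fixed, arbitrary seed. Exact printed values vary with the seed, but the relative frequency should settle closer to 0.5 as the number of tosses grows.

```python
import random

rng = random.Random(42)   # fixed seed for repeatability

def relative_frequency_of_heads(n_tosses):
    # Each toss is heads with probability 0.5.
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

for n in (10, 100, 10_000, 1_000_000):
    print(n, relative_frequency_of_heads(n))
```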
Finding the value of a percentile
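Sort the data, then compute the locator L = (k/100) * n, where k is the percentile and n is the sample size. If L is a whole number, the kth percentile is the mean of the Lth and (L+1)th values; otherwise, round L up to the next whole number and the kth percentile is the Lth value.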
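A sketch of the locator method for the kth percentile. It assumes k is a whole number from 1 to 99 and uses integer arithmetic to test whether L is a whole number, which avoids floating-point rounding. The function name and the data are hypothetical.

```python
def percentile_value(data, k):
    """kth percentile P_k (integer k, 0 < k < 100) via the locator L = (k/100) * n."""
    if not 0 < k < 100:
        raise ValueError("k must be between 1 and 99")
    values = sorted(data)       # the locator refers to sorted data
    n = len(values)
    if (k * n) % 100 == 0:
        L = k * n // 100
        # L is whole: average the Lth and (L+1)th values (counting from 1).
        return (values[L - 1] + values[L]) / 2
    # L is not whole: round up and take the Lth value (counting from 1).
    L = k * n // 100 + 1
    return values[L - 1]

data = [9, 2, 7, 4, 5, 3, 8, 6]    # hypothetical sample, n = 8
print(percentile_value(data, 25))  # L = 2, so (3 + 4) / 2 = 3.5
print(percentile_value(data, 40))  # L = 3.2, rounded up to 4, giving 5
```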
The bars in a histogram ______.
touch (without gaps)
The square of the standard deviation is called the
variance
When a data value is converted to a standardized scale representing the number of standard deviations the data value lies from the mean, we call the new value a _______.
z-score
If the selections are dependent, can they be treated as independent for the purposes of calculations?
Yes, provided the sample size is no more than 5% of the population size (the 5% guideline).