First 45
On the test, he will ask for the five-number summary in the following order: xmin, QL, M, QU, xmax
...
The intercept of a regression line tells a person the predicted mean y-value when the x-value is _______.
0
The complement of "at least one" is ___.
"none."
Value of P(Ā)
1-P(A)
If E represents any event and Ec represents the complement of E, then P(Ec)=__________.
1-P(E)
Volume of water in a swimming pool..
Continuous because it is not countable
RATIO
Determine which of the four levels of measurement (nominal, ordinal, interval, ratio) is most appropriate. Ages of Children: 4, 5, 6, 7 and 8
Class
Each raw data value is placed into a quantitative or qualitative category called a class
THEORETICAL
IT IS BASED ON A PREDICTABLE OUTCOME
Define the Mode.
Is the value that occurs with the greatest frequency.
A small p-value does what?
It discredits the null hypothesis.
A ____________________ is a bar graph in which the bars are drawn in decreasing order of frequency or relative frequency.
Pareto chart
n
Sample size
For data sets having a distribution that is approximately bell-shaped, _______ states that about 68% of all data values fall within one standard deviation from the mean.
The empirical rule
Listed below are the jersey numbers of 11 players randomly selected from the roster of a championship sports team. What do the results tell us?
The jersey numbers are nominal data and they do not measure or count anything, so the resulting statistics are meaningless.
Upper Class Limits
The largest numbers that can belong to the different classes.
Under what conditions is the median preferred?
The median is preferred when the data is strongly skewed or has outliers.
Standardized Score
The number of standard deviations that a piece of data lies above or below the mean. Z = (X - μ) / σ
Is the length of a newborn baby discrete or continuous?
The random variable is continuous.
Is the length of a song discrete or continuous?
The random variable is continuous.
Σ is called ___ and means ___
Uppercase sigma, and means the "sum of terms [xi]"
1. The sample of paired (x,y) data is a random sample. 2. Visual examination of the scatterplot shows that the points approximate a straight-line pattern. 3. Remove outliers.
What are the requirements for a regression line?
In a symmetric and bell-shaped distribution, are the mean, median and mode the same?
Yes.
measure of center
a value at the center or middle of a data set; there are several ways to determine the center, with different definitions such as mean, median, mode, and midrange
The heights of the bars of a histogram correspond to
frequency values
σ²
population variance
The symbol for sample standard deviation is
s
When a data value is converted to a standardized scale representing the number of standard deviations the data value lies from the mean
z-score.
Census
A census is the collection of data from every member of the population. It is not a measure of center.
Fill in the blank. A _______ histogram has the same shape and horizontal scale as a histogram, but the vertical scale is marked with relative frequencies instead of actual frequencies.
A relative frequency histogram has the same shape and horizontal scale as a histogram but the vertical scale is marked with relative frequencies instead of actual frequencies.
POPULATION
A PARAMETER IS A NUMBER THAT DESCRIBES A
Class Midpoints
Add the lower class limit and the upper class limit and divide by 2 (ex. 60 + 69 = 129; 129 / 2 = 64.5)
Changing the width of bins in a histogram _______.
Changes the shape of the histogram
Determine whether the given value is from a discrete or continuous set "The height of 2-year-old maple tree is 28.3 ft."
Continuous
Statistic
Describes characteristics of a sample
Determine whether the value is from a discrete or continuous data: Number of cars owned is 7
Discrete
A(n) _______ is any collection of outcomes from a probability experiment.
Event
The histogram to the right represents the weights (in pounds) of members of a certain high-school math team. How many team members are included in the histogram?
In the histogram above, the y-axis is ticked for every 1-person increase in frequency, so the number of team members in each class is given by the height of the bar in ticks.
x
Is the variable usually used to represent the individual data values.
DESCRIPTIVE STATISTICS
Methods used that summarize or describe characteristics of data are called?
What is the term for a group of objects or people to be studied? Estimator Sample Census Population
Population
A(n) __________ is a numerical measure of the outcome of a probability experiment.
Random variable
Which statement is NOT true regarding the mean?
The mean is always the best measure of center.
mode
The mode of a variable is the most frequent observation of the variable that occurs in the data set. *If no observation occurs more than once, then there is NO MODE.
Data for two variables.
What is bivariate data?
When is a point influential?
When omitting the observation would result in a very different regression equation
sample arithmetic mean, x̄ (pronounced x-bar)
computed using sample data; the sample mean is a statistic
Quartiles
measures of location, denoted Q1, Q2, Q3, which divide a set of data into four groups with about 25% of the values in each group.
The measure of center that is the value that occurs with the greatest frequency is the
mode
A(n) - distribution has a "bell" shape.
normal
Frequency values
...
For a particular regression analysis, it is found that SST = 900.0 and SSE = 400.00. Calculate the coefficient of determination
0.556 (regression identity: SST = SSR + SSE, so SSR = 900 - 400 = 500 and r² = SSR/SST = 500/900)
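The arithmetic on this card can be checked with a short sketch. It assumes the identity SST = SSR + SSE and the usual definition r² = SSR / SST; the variable names are illustrative only.

```python
# Coefficient of determination from the sums of squares on the card.
sst = 900.0  # total sum of squares
sse = 400.0  # error (residual) sum of squares

ssr = sst - sse          # regression sum of squares, from SST = SSR + SSE
r_squared = ssr / sst    # proportion of variation in y explained by the line

print(ssr)                  # 500.0
print(round(r_squared, 3))  # 0.556
```

The same result comes from 1 - SSE/SST, which avoids computing SSR at all.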
Median
50% of the data lie above it and 50% lie below; it is resistant (extreme values change it little)
Coefficient of variation
CV = (s / x̄) × 100%
variance
THE SQUARE OF THE STANDARD DEVIATION.
Continuous
Many possible values
Which of the statements below is true concerning bar graphs?
The height of each bar represents the category's frequency or relative frequency.
Which measure of variation is very sensitive to extreme values?
range
3 important measures of variation
range, standard deviation, and variance.
Given a collection of paired sample data, the _________ y=b0 + b1x algebraically describes the relationship between the two variables, x and y.
regression equation
In a _______ distribution, the frequency of a class is replaced with a proportion or percent.
relative frequency
n means
sample size
You are given information about a straight line. Determine whether the line slopes upward, slopes downward, or is horizontal. The equation of the line is y = 10 - 12x.
slopes downward (the slope, -12, is negative)
z-score transformation
statistical technique that uses the mean and standard deviation to transform each raw score into a standard score
Fill in the blank. Class width is found by _______.
subtracting a lower class limit from the next consecutive lower class limit.
Class width is found by _____________
subtracting the lower class limit from the next consecutive lower class limit
(Σxi)/N means
the population mean: the sum of all x values divided by the population size N
The mean measures..
the center of distribution
the higher the standard deviation
the more spaced out and dispersed the bell shape.
In a symmetric and bell-shaped distribution,
the mean, median, and mode are the same
Bars in a histogram_________?
touch
bimodal
two data values occur with the same greatest frequency
When a data is converted to a standardized scale representing the number of standard deviations the data value lies from the mean, we call the new value a ______.
z-score
"At least one" is equivalent to ____.
"one or more"
The average score on an aptitude test is 80 with a standard deviation of 10. One person scored a 65. What is that person's z-score?
-1.5
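A minimal sketch of the calculation behind this card, using the formula z = (x - mean) / standard deviation. The function name `z_score` is just an illustrative choice.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

z = z_score(65, 80, 10)
print(z)  # -1.5
```

The negative sign says the score of 65 is below the mean; the size says it is one and a half standard deviations away.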
The empirical rule is also known as the
68%-95%-99.7% rule
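The three percentages can be checked numerically. This sketch assumes a perfectly normal distribution and uses `statistics.NormalDist` from the Python standard library (3.8 or later); `share_within` is a hypothetical helper name.

```python
from statistics import NormalDist

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    standard_normal = NormalDist(mu=0, sigma=1)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    # Prints the share as a percentage for 1, 2, and 3 standard deviations.
    print(k, f"{share_within(k) * 100:.1f}%")
```

The printed shares come out at roughly 68%, 95%, and 99.7%, which is where the rule gets its name. Real data that are only approximately bell-shaped will match these figures only approximately.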
Fill in the blank. A(n) _______ uses line segments to connect points located directly above class midpoint values.
A(n) frequency polygon uses line segments to connect points located directly above class midpoint values.
When a data value is converted to a standardized scale representing the number of standard deviations the data value lies from the mean, we call the new value a _______.
Z-score
A data value is considered ___ if its z-score is less than -2 or greater than 2.
unusual
relative frequency polygon
A variation of the basic frequency polygon that uses relative frequencies (proportions or percentages) for the vertical scale. To compare two data sets, graph two relative frequency polygons on the same axes.
frequency distribution
helps us understand the nature of the distribution of a data set.
You roll a six-sided die 56 times and land on an ace (a one) 7 times. You want to test the hypothesis that the die does not come up with an ace one-sixth of the time. Determine the null hypothesis.
H0: p=1/6
Exists between two variables when the values of one variable are somehow associated with the values of the other variable.
Define correlation.
A _______ helps us understand the nature of the distribution of a data set.
Frequency Distribution
Determine which of the 4 levels of measure is the most appropriate: Years of elections: 1988, 1990, 1992, 1994, and 1996
Interval
Look at the #42 charts and answer the questions: Is there strong evidence suggesting that the data are not from a population having a normal distribution?
No, the distribution is not dramatically far from being a normal distribution with a "bell" shape, so there is not strong evidence against a normal distribution.
Is it OK to say "average" instead of mean?
No.
Determine whether the given description corresponds to an experiment or an observational study: A stock analyst selects a stock from a group of twenty for investment by choosing the stock with the greatest earnings per share reported for the last quarter.
Observational study
Identify the study as an observational study or a designed experiment. An educational researcher used school records to determine that, in one school district, 84% of children living in two-parent homes graduated high school while 75% of children living in single-parent homes graduated high school.
Observational study
Which of the following is always true? -For skewed data, the mode is farther out in the longer tail than the median. -The mean and median should be used to identify the shape of the distribution. -Data skewed to the right have a longer left tail than right tail. -In a symmetric and bell-shaped distribution, the mean, median, and mode are the same.
In a symmetric and bell-shaped distribution, the mean, median, and mode are the same
Identify which type of sampling is used: The name of each contestant is written on a separate card, the cards are placed in a bag, and three names are picked from the bag Simple Random Cluster Convenience Stratified Systematic
Simple Random
To determine customer opinion of their musical variety, Sony randomly selected 110 concerts during a certain week and surveyed all concertgoers. What type of sampling is this?
Cluster
Find the median for the given sample data. The salaries of ten randomly selected doctors are shown below. 150,000 143,000 165,000 238,000 215,000 129,000 139,000 723,000 217,000 166,000
$165,500
Z score rules
...
If the sample is collected without replacement, which of the following conditions regarding the population must be met to apply the Central Limit Theorem for Sample Proportions? 50 times bigger 5 times bigger 10 times bigger 100 times bigger
10 times bigger
Look at #23 chart and answer the question: The histogram represents - debate team members.
11
On a test, 74% of the questions are answered correctly. If 111 questions are correct, how many questions are on the test? 37 questions 67 questions 150 questions 82 questions
150 questions
How to know significant difference in coefficient of variation
5%
which percent of observations are expected to lie within 1 standard deviation of the mean?
68% (about 34% on each side of the mean)
Suppose we have 10 exam scores for an introductory statistics course. The scores are 78, 98, 94, 89, 86, 77, 76, 80, 75, 68. The median score is
79
Determine whether the given value is from a discrete or continuous data set. When a car is randomly selected, it is found to have an engine with 6 cylinders.
A discrete data set because there are a finite number of possible values.
Define measure of center.
A value at the center or middle of a data set.
Which of the following is NOT a principle of probability
All events are equally likely in any probability procedure
No, because the frequencies are roughly equal across the voltage classes.
Does the result appear to have a normal distribution? Why or why not?
A(n) _______ uses line segments to connect points located directly above class midpoint values.
Frequency Polygon
You are receiving a large shipment of batteries and want to test their lifetimes. Explain why you would want to test a sample of batteries rather than the entire population. If you test all the batteries you cannot form any conclusions about the population. If you test all the batteries to failure you would have no batteries to sell. The percentage of defective batteries can change in the time it takes you to test all the batteries.
If you test all the batteries to failure you would have no batteries to sell.
Descriptive Statistics
Methods & tools that summarize or describe relevant characteristics of data.
descriptive statistics
Methods used that summarize or describe characteristics of data
Fill in the blank. _______ are sample values that lie very far away from the majority of the other sample values.
Outliers are sample values that lie very far away from the majority of the other sample values.
"Relative frequency" is the same as which of the following?
Proportion
Which of the following corresponds to the case when every sample of size n has the same chance of being chosen?
Simple Random Sample
Experiments used to produce empirical probabilities are called what?
Simulations
Why is it important to learn about bad graphs?
So that we can critically analyze a graph to determine whether it is misleading
kth percentile
The kth percentile, denoted Pk, of a set of data is a value such that k percent of the observations are less than or equal to the value.
Census
The collection of data from every member of the population.
Computing GPA
The grading system assigns quality points to letter grades as follows: A = 4; B = 3; C = 2; D = 1; F = 0. Use the formula for the weighted mean, with w = number of credits and x = the quality points that replace each letter grade.
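As a rough illustration of the weighted-mean formula, Σ(w·x) / Σw, the sketch below computes a GPA. The transcript (courses, credits, and grades) is entirely hypothetical and is there only to give the formula something to work on.

```python
# Quality points from the card: A=4, B=3, C=2, D=1, F=0.
QUALITY_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Hypothetical transcript: (letter grade, credits).
transcript = [("A", 3), ("B", 4), ("C", 3), ("A", 1)]

points = [QUALITY_POINTS[grade] for grade, _ in transcript]   # x values
credits = [credit for _, credit in transcript]                # w values

gpa = weighted_mean(points, credits)
print(round(gpa, 2))  # 3.09
```

Here Σ(w·x) = 12 + 12 + 6 + 4 = 34 and Σw = 11, so the GPA is 34/11. A plain unweighted mean of the four grades would give 3.25, which ignores that the B carries the most credits.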
In a relative frequency distribution, what should the relative frequencies add up to?
The relative frequencies should add up to 1
Lower Class Limits
The smallest numbers that can belong to the different classes.
properties of standard deviation
The units of the standard deviation are the same as the units of the original data, the standard deviation is a measure of variation of all data values from the mean, the value of the standard deviation is never negative
Look at #10 charts and answer the two questions: Based on the distribution, do the weights appear to be reported or actually measured? What can be said about the accuracy of the results?
The weights appear to be reported because there are disproportionately more 0s and 5s. They are likely not very accurate because they appear to be reported.
Close to 0.
What is the value of Σ(Zx*Zy) if the points follow no linear pattern?
NOT a property of the standard deviation
When comparing variation in samples with very different means, it is good practice to compare the two sample standard deviations.
What is the formula to determine the x-value from z-score?
x = μ + zσ (the mean plus z multiplied by the standard deviation)
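A small sketch of this conversion, run in both directions so the two formulas can be seen to undo each other. The numbers (mean 80, standard deviation 10) are borrowed from the aptitude-test card in this set; the function names are illustrative.

```python
def x_from_z(z, mean, std_dev):
    """Recover the raw value: x = mean + z * standard deviation."""
    return mean + z * std_dev

def z_from_x(x, mean, std_dev):
    """Standardize the raw value: z = (x - mean) / standard deviation."""
    return (x - mean) / std_dev

x = x_from_z(-1.5, 80, 10)
print(x)                    # 65.0
print(z_from_x(x, 80, 10))  # -1.5
```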
Look at #49 chart and answer the question: Does the graph distort the data? Why or why not?
Yes, because the graph incorrectly uses objects of volume to represent the data.
A value at the center or middle of a data set is a(n)
measure of center
OUTLIER
In modified boxplots, a data value is a(n) _______ if it is above Q3 + (1.5)(IQR) or below Q1 - (1.5)(IQR)
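The 1.5 × IQR rule can be sketched as follows, using the ten doctor salaries that appear on another card in this set. One assumption to flag: quartiles are computed here as the medians of the lower and upper halves of the sorted data, and other textbooks or calculators may use slightly different quartile conventions.

```python
from statistics import median

def quartiles(values):
    """Q1, median, Q3 using the median-of-halves convention (an assumption)."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    # For an odd count, the middle value is left out of both halves.
    upper = data[half + 1:] if len(data) % 2 else data[half:]
    return median(lower), median(data), median(upper)

salaries = [150_000, 143_000, 165_000, 238_000, 215_000,
            129_000, 139_000, 723_000, 217_000, 166_000]

q1, q2, q3 = quartiles(salaries)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in salaries if x < lower_fence or x > upper_fence]

print(q1, q3, iqr)  # 143000 217000 74000
print(outliers)     # [723000]
```

The fences land at 32,000 and 328,000, so only the 723,000 salary is flagged.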
A(n) ____ distribution has a "bell" shape.
normal
If the data points fall in a ____, the correlation is equal to zero.
random pattern
A ___ is a plot of paired data (x,y) and is helpful in determining whether there is a relationship between the two variables.
scatterplot
A histogram aids in analyzing the - of the data.
shape of the distribution
LOOK AT REAL LINE WITH LOF, LIF, UIF, UOF NUMBERS, AND OUTLIERS IN SLIDE 87!!!
study for midterm
Class width is found by
subtracting a lower class limit from the next consecutive lower class limit
Class width
the difference between consecutive lower class limits; for the classes 60-69, 70-79, 80-89 the width would be 10
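A short sketch of the two class calculations used in this set, applied to the classes 60-69, 70-79, 80-89. Representing each class as a (lower limit, upper limit) pair is just one convenient choice.

```python
# Each class is a (lower class limit, upper class limit) pair.
classes = [(60, 69), (70, 79), (80, 89)]

# Class width: a lower class limit subtracted from the next lower class limit.
class_width = classes[1][0] - classes[0][0]

# Class midpoint: (lower limit + upper limit) / 2.
midpoints = [(lower + upper) / 2 for lower, upper in classes]

print(class_width)  # 10
print(midpoints)    # [64.5, 74.5, 84.5]
```

Note that the width is 10, not 69 - 60 = 9; subtracting the limits of a single class is a common slip.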
It can be slope.
What else can the intercept (from the excel) be?
A range of values used to estimate a variable.
What is a prediction interval?
When the original data values are arranged in order of increasing (or decreasing) magnitude, the middle value is called the _________
median
The ___ for a procedure consists of all possible simple events or all outcomes that cannot be broken down further.
sample space
When determining whether there is a correlation between two variables, one should use a ______ to explore the data visually.
scatter-plot
Class width is found by
subtracting a lower class limit from the next consecutive lower class limit
Quartile
Q1= 25th percentile; Q2= 50th percentile; Q3= 75th percentile
Inter-quartile range
Q3 minus Q1
1. Should not have any obvious pattern. 2. Should not become wider (or thinner) when viewed from left to right.
What are the criteria for a residual plot?
If the sum of the squares of the residuals is the smallest sum possible.
What is the least-squares property?
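The least-squares property can be seen numerically with a small sketch. The five (x, y) pairs below are hypothetical, and the slope and intercept use the standard formulas b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and b0 = ȳ - b1·x̄; the function names are illustrative.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def sum_squared_residuals(xs, ys, b0, b1):
    """Sum of (observed y - predicted y) squared for the line y = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical paired data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1 = fit_line(xs, ys)
best = sum_squared_residuals(xs, ys, b0, b1)
nudged_slope = sum_squared_residuals(xs, ys, b0, b1 + 0.1)
nudged_intercept = sum_squared_residuals(xs, ys, b0 + 0.5, b1)

print(round(b0, 2), round(b1, 2))                       # 2.2 0.6
print(best < nudged_slope and best < nudged_intercept)  # True
```

Any other line through the same data, such as the two nudged ones here, gives a larger sum of squared residuals than the fitted line.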
5 number summary
on a box plot, minimum, Q1, median, Q3, maximum
A magician claims he can cause a coin to come up heads more than 50% of the time. A coin is flipped 50 times, and 44 heads come up. Determine the null hypothesis.
H0: p = 0.50
A _______ is a plot of paired data (x,y) and is helpful in determining whether there is a relationship between the two variables.
scatterplot
The symbol for population variance
sigma squared, σ²
The symbol for population standard deviation is
sigma- σ
A data value is considered _______ if its z-score is less than -2 or greater than 2.
significantly low or significantly high
the symbol for population standard deviation is
σ (lowercase sigma)
the symbol of population variance is
σ² (lowercase sigma, squared)
Standard deviation of a population
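Uses a slightly different formula from the sample version: the deviations are taken from μ instead of x̄, and the sum of squares is divided by N instead of n - 1.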
In the binomial probability formula, the variable x represents the ___.
the number of successes
We utilize statistical graphs
to look for features that reveal some useful or interesting characteristics of the data set
Which of the following is a common distortion that occurs in graphs? a. Using bars to represent the frequency of data values. b. Using points above the class midpoints at the heights of the class frequencies. c. Labeling both axes d. Using a two-dimensional object to represent data that are one-dimensional in nature.
d.
A - is a graph of each data value plotted as a point.
dotplot
When given the mean and SD...how to find if data is unusual?
find usual min and max and compare
Box-and-Whiskers Plot
graph representing information about the five-number summary and outliers for a given data set
dotplot
is a graph of each data value plotted as a point
Cluster sample
is obtained by dividing the population into groups and selecting all individuals from within a random sample of the groups
Side by Side Bar Graphs
is used when comparing data from two or more different data sets
For a distribution that is skewed right, the median is _______ of the box.
left of the center
If the data set is skewed (left or right), and/or there are outliers, then
the best measure of center is the median, and the best measure of dispersion is IQR/2 = (Q3 - Q1)/2
census
the collection of data from every member of the population. It is not a measure of center.
The standard deviation is used in conjunction with the ______ to numerically describe distributions that are bell shaped
mean
Which of the following is NOT a value in the 5-number summary?
mean
The standard deviation is used in conjunction with the ______ to numerically describe distributions that are bell shaped. The ______ measures the center of the distribution, while the standard deviation measures the ______ of the distribution.
mean, mean, spread
Percentiles
measures of location, denoted by P_...which divide a set of data into 100 groups with about 1% of the values in each group. One type of quantiles or fractiles which partition data into groups with roughly the same number of values in each group. Measure of location.
The measure of center that is the value that occurs with the greatest frequency is the _______.
mode
the measure of center that is the value that occurs with the greatest frequency is the _______?
mode
For a scatterplot, when the slope of the line in the plot is negative, the correlation is ____.
negative
What does w denote?
Denotes weights, which are assigned to different data values.
There is a positive linear correlation
If the scatterplot shows a distinct straight-line, or linear, pattern, what can we say? As the x-values increase, the corresponding y-values also increase. ex; r = .851
Look at #38 chart and question and answer the questions: Construct a scatterplot on the calculator. Does there appear to be a correlation between the president's height and his opponent's height?
No, there does not appear to be a correlation because there is no general pattern to the data.
Refer to the table summarizing service times (seconds) of dinners at a fast food restaurant. How many individuals are included in the summary? Is it possible to identify the exact values of all of the original service times?
No. The data values in each class could take on any value between the class limits, inclusive.
Determine which of the four levels of measurement (nominal, ordinal, interval, ratio) is most appropriate. Favorite films
Nominal
μ
Represent the mean of all values in a population.
P(A) + P(Ā) = 1 is one way to express the ____.
Rule of complementary events.
Statistics
The science of collecting, organizing, summarizing, and analyzing information to draw a conclusion and answer question. In addition, statistics is about providing a measure of confidence in any conclusions
VARIANCE
The square of the standard deviation is called the __
What does Σx represent?
The sum of all data values (all the x values added together).
influential points.
What are outliers and special points called?
Weighted Mean
When different (x) data values are assigned different weights (w).
Methods used that summarize or describe characteristics of data are called______ statistics.
descriptive
A -- helps us understand the nature of the distribution of a data set.
frequency distribution
A ____ indicates the shape and nature of the distribution of a data set.
frequency distribution
A ________ helps us understand the nature of the distribution of a data set
frequency distribution
A(n) -- uses line segments to connect points located directly above class midpoint values.
frequency polygon
Coefficient of variation, or CV
For a set of nonnegative sample or population data, the CV is expressed as a percent, describes the standard deviation relative to the mean, and is given by the following: CV = (s / x̄) × 100%. Used when means are substantially different, or if the samples use different scales or measurement units. Round CV to one decimal place.
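A minimal sketch of the sample CV, using the five values 23, 11, 5, 9, 10 from another card in this set. It relies on `statistics.stdev` (the sample standard deviation, dividing by n - 1) and `statistics.mean`; the function name is illustrative.

```python
from statistics import mean, stdev

def coefficient_of_variation(sample):
    """Sample standard deviation relative to the sample mean, as a percent."""
    return stdev(sample) / mean(sample) * 100

sample = [23, 11, 5, 9, 10]
cv = coefficient_of_variation(sample)
print(f"{cv:.1f}%")  # 58.3%
```

Because the CV has no units, it can be compared across samples measured on different scales, which a raw standard deviation cannot.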
A ____ correlation means that if one variable gets bigger, the other variable tends to get smaller.
negative
Fill in the blank. A(n) _______ distribution has a "bell" shape.
normal
Outliers
numbers that lie far from all the other numbers(CVDOT)
A ___ histogram has the same shape and horizontal scale as a histogram, but the vertical scale is marked with relative frequencies instead of actual frequencies.
relative frequency
Unbiased estimator
The sample variance s² is an unbiased estimator of the population variance σ². Values of s² tend to target the value of σ² instead of systematically tending to overestimate or underestimate σ².
What does the z-score number represent?
the number of standard deviations from the mean. Aka standardized scores.
no mode
when no data value is repeated
When a data value is converted to a standardized scale representing the number of standard deviations the data value lies from the mean, we call the new value a _________
z-score
The sum of the deviations about the mean always equals
zero
Qualitative
Categorical data
What is the most common trick to mislead readers of bar graphs?
Change the scale of the vertical axis so that it does not start at 0.
Identify the variable as either continuous or discrete: The number of limbs on a randomly selected oak tree.
Discrete
The mean represents the typical value in a set of data for what type of distribution?
For distributions that are roughly symmetric
Round-off rules
For mean, median and midrange, carry one more decimal place than is present in the original set of values. For mode leave as is without rounding. Example: mean of 2, 3 and 5 is 3.3333333. Round to 3.3.
When events A and B are said to be independent, what does that mean?
Knowledge that event B occurred does not change the probability of event A occurring.
When analyzing two quantitative variables, what is the first thing that should be done?
Make a scatterplot.
Suppose a fair die is rolled ten times and the result is recorded each time. Does this constitute a binomial experiment? Why or why not?
No, because there are more than two outcomes for each trial.
Is this a property of the standard deviation? When comparing variation in samples with very different means, it is good practice to compare the two sample standard deviations.
No, it is not a good practice to compare the two sample standard dev. in samples with very different means.
relationship between median, mean, and distribution shape... 1) Skewed left 2) Symmetric 3) Skewed right
1) mean < median 2) mean = median 3) mean > median
The computed mean and the actual mean are considered close if the difference is less than ____of the actual mean. Otherwise the means are said to be __________ different.
1. 5% 2. substantially.
How to Calculate Quartiles
1. Arrange data in ascending order 2. Determine Median (M)=Q2 3. Divide data set into halves: the observations below M and the observations above M The first quartile (Q1) is the median of the bottom half, and the third quartiles (Q3) is the median of the top half
A management survey for a company surveyed 235 employees. 44.7% of the employees surveyed were females. The number of males would be: 130 105 13 Unable to determine
130
The following frequency distribution analyzes the scores on a math test. scores: number of students 40-59: 2 60-75: 4 76-82: 6 83-94: 15 95-99: 5 Find the midpoint of the class interval 40-59.
49.5
The following are speeds (mi/h) of cars measured with a radar gun. Determine the 5-number summary and boxplot for the data given below. 70, 70, 71, 72, 72, 73, 73, 75, 76, 76, 77, 78, 78, 79, 80 The 5-number summary is - - - - - Create a boxplot on the calculator
70, 72, 75, 78, 80
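The answer above can be reproduced with a short sketch. It assumes the median-of-halves method for quartiles (for an odd number of values, the median is left out of both halves); a calculator or another textbook may use a slightly different convention. The function name is illustrative.

```python
from statistics import median

def five_number_summary(values):
    """Minimum, Q1, median, Q3, maximum using the median-of-halves method."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    # With an odd count, skip the middle value when forming the upper half.
    upper = data[half + 1:] if len(data) % 2 else data[half:]
    return data[0], median(lower), median(data), median(upper), data[-1]

speeds = [70, 70, 71, 72, 72, 73, 73, 75, 76, 76, 77, 78, 78, 79, 80]
print(five_number_summary(speeds))  # (70, 72, 75, 78, 80)
```

With 15 values, the median is the 8th value (75), Q1 is the middle of the first seven (72), and Q3 is the middle of the last seven (78).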
Below are 36 sorted ages of an acting award winner Find P90 using the method presented in the textbook. 18 19 20 22 25 27 27 33 38 41 42 43 46 51 53 54 55 56 57 58 62 63 65 69 70 71 72 72 74 74 74 76 80 80 80
76
UNUSUAL
A data value is considered __________ if its z-score is less than minus 2 or greater than 2
Since, in general, the longer a car is owned the more miles it travels one can say there is a _______ between age of a car and mileage.
A positive association
Z-Score
A z score (or standardized value) is the number of standard deviations that a given value x is above or below the mean. A negative z score corresponds to an x value less than the mean. A positive z score corresponds to an x value greater than the mean. The more negative the z score, the further the x value is below the mean. The more positive the z score, the further the x value is above the mean.
relative frequency
A ________________ __________________ histogram has the same shape and horizontal scale as a histogram, but the vertical scale is marked with relative frequencies instead of actual frequencies.
True hypothesis Which of the following describes the value of the z-test statistic that is likely to result? Explain your choice. (i) The z-test statistic will be close to 0. (ii) The z-test statistic will be far from 0.
Statement I is more accurate. The null hypothesis is true, so the test statistic is likely to be close to what the null hypothesis predicts.
An instructor at the College of Lake County is interested in the average number of days that CLC math students are absent from class during a semester. She selects a random sample of students from the college and for each measures the number of days that the student was absent. Her sample produces an average number of days absent of 3.5 days. This value is an example of a:
Statistic
Determine whether the given value is a statistic or a parameter. "A health and fitness club surveys 40 randomly selected members and found that the average weight of those questioned is "
Statistic
The normal quantile plot shown to the right represents duration times (in seconds) of eruptions of a certain geyser from the accompanying data set. Examine the normal quantile plot and determine whether it depicts sample data from a population with a normal distribution.
The distribution is normal. The points are reasonably close to a straight line and do not show a systematic pattern that is not a straight-line pattern.
Look at #51 chart and answer the questions: Compare the results.
The distribution of pulse rates for men is concentrated, centered around 60, whereas the distribution of pulse rates for women is more spread out, centered around 70.
If all the data values in a population are converted to z-scores, the distribution of z-scores will have what mean?
The mean of the z-scores will be zero.
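This can be checked on any data set. The sketch below treats the ten exam scores from another card in this set as a whole population (an assumption made only for the demonstration) and uses `statistics.pstdev`, the population standard deviation.

```python
from statistics import mean, pstdev

# Ten exam scores, treated here as the entire population.
scores = [78, 98, 94, 89, 86, 77, 76, 80, 75, 68]

mu = mean(scores)
sigma = pstdev(scores)
z_scores = [(x - mu) / sigma for x in scores]

# Floating-point arithmetic leaves tiny rounding error, so compare with a tolerance.
print(abs(mean(z_scores)) < 1e-9)        # True
print(abs(pstdev(z_scores) - 1) < 1e-9)  # True
```

The second check shows the companion fact: the z-scores also have a standard deviation of 1.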
Is the time required to download a file from the Internet discrete or continuous?
The random variable is continuous.
Is the number of bald eagles in the country discrete or continuous?
The random variable is discrete.
In this section we use r to denote the value of the linear correlation coefficient. Why do we refer to this correlation coefficient as being linear?
The term linear refers to a straight line, and r measures how well a scatterplot fits a straight-line pattern.
A scatterplot of the (x, y) values after each y-coordinate value has been replaced by the residual value y - ŷ (where ŷ denotes the predicted value of y). That is, a residual plot is a graph of the points (x, y - ŷ).
What is a residual plot?
Grouped Frequency Distribution
When the range of the data is large, the data must be grouped into classes that are more than one unit in width, in what is called a grouped frequency distribution.
dot plot
consists of a graph in which each data value is plotted as a point along a scale of values. dots representing equal values are stacked
A ___ random variable has infinitely many values associated with measurements.
continuous
A __________ random variable has infinitely many values which can be plotted on a number line in an uninterrupted fashion.
continuous
Find the sample variance and standard deviation. 23, 11, 5, 9, 10
do on calc
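For anyone without a calculator handy, the same result can be sketched in Python. `statistics.variance` and `statistics.stdev` are the sample versions (they divide by n - 1), which is what this card asks for.

```python
from statistics import mean, stdev, variance

data = [23, 11, 5, 9, 10]

x_bar = mean(data)          # sample mean
s_squared = variance(data)  # sample variance, divides by n - 1
s = stdev(data)             # sample standard deviation

print(x_bar)                # 11.6
print(round(s_squared, 1))  # 45.8
print(round(s, 1))          # 6.8
```

By hand: the squared deviations from 11.6 add up to 183.2, and 183.2 / 4 = 45.8.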
A national consumer magazine reported the following correlations. The correlation between car weight and car reliability is -0.30. The correlation between car weight and annual maintenance cost is 0.20. Which of the following statements are true? I. Heavier cars tend to be less reliable. II. Heavier cars tend to cost more to maintain. III. Car weight is related more strongly to reliability than to maintenance cost. a. I only b. II only c. III only d. I and II only e. I, II, and III
e.
In a television advertisement, a company called "Waist Away" claimed the workout program on their set of DVDs would help people lose weight more than any other DVD workout program. To test this claim, an independent company, called "Slim Down," selected one other DVD program. They then randomly assigned half the volunteers to the Waist Away program and the other half to the Slim Down program. Each participant was weighed before they started the program and then regularly participated in their assigned program for one month. After one month, each participant was weighed again. The percent of weight lost was recorded for each person, where negative values indicated a weight gain. What type of study was performed?
experiment
An insurance company crashed four cars of the same model at 5 miles per hour. The costs of repair for each of the four crashes were $437, $411, $487, and $248 . Compute the mean, median, and mode cost of repair.
mean: 395.75 median: 424 mode: none (no value repeats)
A value at the center or middle of a data set is a(n) ---
measure of center
A data value is considered ___ if its z-score is greater than or equal to -2 and less than or equal to 2.
ordinary
In a scatter-plot, a(n) _________ is a point lying far away from the other data points.
outlier
Assume that the proportion of people who live after suffering an aneurysm is 0.79. Suppose there is a new medicine that is used to increase the survival rate. Use the parameter p to represent the population proportion of people who survive after an aneurysm. For a hypothesis test of the medicine's effectiveness, researchers use a null hypothesis of p=0.79. What is the correct alternative hypothesis?
p>0.79
A ___ variable is a variable that has a single numerical value, determined by chance, for each outcome of a procedure.
random
_______ is used when subjects are assigned to different groups through a process of random selection
randomization
What measure of variation is sensitive to extreme values?
range
kth percentile
denoted Pk, of a set of data is a value such that k percent of the observations are less than or equal to the value. Percentiles divide a set of data that is written in ascending order into 100 parts; thus 99 percentiles can be determined. Ex. P1 divides the bottom 1% of the observations from the top 99%, P2 divides the bottom 2% of the observations from the top 98% and so on.
Methods used that summarize or describe characteristics of data are called ______ statistics.
descriptive
Methods used that summarize or describe characteristics of data are called - statistics.
descriptive
A _________ experiment allows the researcher to claim causation between an explanatory variable and a response variable
designed
Range is the
difference between the largest data value and the smallest
A ___ random variable has either a finite or a countable number of values.
discrete
A __________ random variable has either a finite or countable number of values.
discrete
________ result when the number of possible values is either a finite number or a 'countable' number
discrete data
Events that are ____ cannot occur at the same time.
disjoint (Disjoint events are mutually exclusive and cannot occur at the same time.)
Stratified sample
is obtained by dividing the population into homogeneous groups and randomly selecting individuals from each group
Resistant means
a measure of central tendency is resistant if extreme values do not alter it significantly
The four levels of measurement that are commonly used for classifying data are ratio, _________, ________, and _________. (a) interval, normal, ordinary (b) nominal, ordinal, interval (c) nominal, ordinal, categorical (d) normal, ordinal, interval
nominal, ordinal, interval
The sample mean is a
statistic
Class width is found by _______.
subtracting a lower class limit from the next consecutive lower class limit
x̄ = Σx / n
(sum of all data values) / (number of data values)
Whenever a data value is less than the mean, _______.
the corresponding z-score is negative.
For data sets having a distribution that is approximately bell-shaped, _________ states that about 68% of all data values fall within one standard deviation from the mean.
the empirical rule
The Empirical Rule
the empirical rule can be used to determine the percentage of data that lie within k standard deviations of the mean. To help organize the empirical rule and make the analysis easier, draw a bell-shaped curve, as shown to the right. The line in the center of the curve represents the mean. The other lines are each 1, 2, and 3 standard deviations away from the mean.
percentiles
the percent of data below a point. It is better to be in the 99th percentile because 99 percent of people are below you. Formula: P = B (# of data points below) / T (total # of data points) × 100. Round normally.
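The percentile formula on this card can be sketched in Python. The function name and the list of scores are hypothetical, and "round normally" is read here as rounding halves up, which is an assumption.

```python
import math

def percentile_of(value, data):
    """Percent of data points strictly below `value`, rounded half up."""
    below = sum(1 for x in data if x < value)   # B: number of points below
    percent = below / len(data) * 100           # B / T * 100
    # math.floor(p + 0.5) rounds halves up; Python's round() rounds halves to even.
    return math.floor(percent + 0.5)

scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]  # hypothetical data
print(percentile_of(85, scores))  # 60
```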
mean formula
the sum of all data values/ the number of data values
The bars in a histogram ___.
touch (without gaps)
Ogive
useful for determining the number of values below some particular value. it's a line graph that depicts cumulative frequencies. uses class boundaries along the horizontal scale (X) and cumulative frequencies along vertical scale (Y)
The square of the standard deviation is called the --
variance
population z-score
z = (x - µ) / σ
What is the formula for the z-score?
z = (x - µ) / σ, that is, the x value minus the mean (mu), divided by the standard deviation (sigma). The numerator x - µ is a *deviation score*. The denominator expresses the deviation in standard deviation units.
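As a quick illustration of the z-score formula, here is a small Python sketch. The function name and the example numbers (a score of 136 with mean 100 and standard deviation 15) are made up for illustration.

```python
def z_score(x, mean, sd):
    """Return how many standard deviations x lies from the mean."""
    return (x - mean) / sd

# Hypothetical example values, not taken from the cards.
z = z_score(136, 100, 15)
print(round(z, 2))  # 2.4

# Rule of thumb used in these cards: unusual if z < -2 or z > 2.
is_unusual = z < -2 or z > 2
print(is_unusual)   # True
```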
The _______ represents the number of standard deviations an observation is from the mean.
z-score
When a data value is converted to a standardized scale representing the number of standard deviations the data value lies from the mean, we call the new value a_____.
z-score
When a data value is converted to a standardized scale representing the number of st. dev. the data value lies from the mean, we call the new value a __.
z-score.
Frequency Polygon
A(n) __________________ ______________ uses line segments to connect points located directly above class midpoint values.
Which of the following is always true?
A. In a symmetric and bell-shaped distribution, the mean, median, and mode are the same. <-- Correct B. The mean and median should be used to identify the shape of the distribution. C. For skewed data, the mode is farther out in the longer tail than the median. D. Data skewed to the right have a longer left tail than right tail.
Identify the symbols used for each of the following: (a) sample standard deviation; (b) population standard deviation; (c) sample variance; (d) population variance.
a. The symbol for sample standard deviation is s. b. The symbol for population standard deviation is σ. c. The symbol for sample variance is s^2. d. The symbol for population variance is σ^2.
What is an influential point?
An influential point is a point that changes the regression equation by a large amount.
Which of the following is NOT a value in the 5-number summary?
Mean
Formula to Find the Mean
Mean= Sum of all data values/number of data values.
Mean
Most often called average. The measure of center found by adding the data values and dividing the total by the number of data values. Means drawn from the same population tend to vary less than other measures of center. Uses every data value. Disadvantage: just one outlier can change the value of the mean substantially. So it is NOT a RESISTANT measure of center.
Researchers collect data by interviewing athletes who have won Olympic gold medals from 1992 to 2016. Identify the type of study. Retrospective Cross-sectional Prospective None of these
Retrospective
Distribution Shape and Boxplot
Right Skewed: If the median is to the left of the center of the box, the right whisker is longer than the left one Symmetric: If the median is at or near the center of the box, the whiskers are of equal lengths Left Skewed: If the median is to the right of the center of the box, the left whisker is longer than the right one.
Rounding rule:
Round z-scores to 2 decimal places
The Range Rule of Thumb roughly estimates the standard deviation of a data set as _______.
s ≈ range/4
The _______ of a probability experiment is the collection of all possible outcomes.
Sample space
Days before a presidential election, an article based on a nationwide random sample of registered voters reported the following statistic, "52% (±3%) of registered voters will vote for Robert Smith." What is the "±3%" called?
The "±3%" is called the margin of error.
Look at the #43 charts and answer the question: Which graph is more effective in showing the relative importance of the causes of work-related deaths?
The Pareto chart is better because it more clearly draws attention to the main cause of work-related death.
Histogram
The histogram is a graph that displays the data by using contiguous vertical bars (unless the frequency of a class is 0) of various heights to represent the frequencies of the classes.
Relative Frequency Graphs
The histogram, the frequency polygon, and the ogive shown previously were constructed by using frequencies in terms of the raw data. These distributions can be converted to distributions using proportions instead of raw data as frequencies. These types of graphs are called relative frequency graphs.
Determine which of the 4 levels of measurement is the most appropriate for the data below: Years in which a war was started
The interval level of measurement is the most appropriate because the data can be ordered and differences can be found and are meaningful, but there is no natural starting point.
The accompanying data were collected from a statistics class. The column heads give the variable, and each of the rows represents a student in the class. Suppose you decided to code eye color using 1 for Blue and 0 for Not Blue. What would be the label at the top of the column?
The label would be blue.
Mode
The measure of center that is the value that occurs with the greatest frequency is the _______. The most frequently occurring score(s) in a distribution.
Which measure of center (mean or median) is resistant? Explain what it means for that measure to be resistant.
The median is resistant because it is not sensitive to extreme values in the data set. If the largest observation was doubled, for example, the median would not change since that largest value does not factor into its computation.
The median
The median of a variable is the value that lies in the middle of the data when arranged in ascending order. We use M to represent the median.
When you need to compute a raw score that represents the minimum or maximum score needed to answer a question, look for the percentage in the question, e.g. "What raw scores form the boundaries of the middle 60% of the distribution?"
The middle 60% straddles the mean and can be divided into 2 equal percentages: 30% and 30%. Look for the value closest to .3000 in the *mean to z* column and locate the z-score in that row. Then use that z-score in the formula for computing a raw score: X = µ + zσ
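The table lookup described here can be mirrored in Python with `statistics.NormalDist`. This is only a sketch: the mean of 100 and standard deviation of 15 are assumed values, since the card gives no specific distribution.

```python
from statistics import NormalDist

mu, sigma = 100, 15  # hypothetical distribution, not from the card

# 30% of the area lies between the mean and z, so the area below z is 0.50 + 0.30.
z = NormalDist().inv_cdf(0.80)

# Raw-score formula: X = mu + z * sigma (use -z for the lower boundary).
lower = mu - z * sigma
upper = mu + z * sigma

print(round(z, 2))                       # 0.84
print(round(lower, 1), round(upper, 1))  # 87.4 112.6
```

A printed z-table gives the same z ≈ 0.84 when you look for the entry closest to .3000 in the mean-to-z column.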
Ring sizes typically range from about 3 to about 14. Based on what you know about gender differences, if we randomly select a person, are the event that ring size is smaller than size 5 and that the person is a male independent or associated? Explain.
The two events are associated because men on average have larger hands than women and this affects the probability of being smaller than size 5.
Cumulative Frequency Distribution
A cumulative frequency distribution is a distribution that shows the number of data values less than or equal to a specific value (usually an upper boundary).
Fill in the blank. A _______ helps us understand the nature of the distribution of a data set.
A frequency distribution helps us understand the nature of the distribution of a data set.
Identify the type of observational study used: A town obtains current employment data by polling 10,000 of its citizens this month. Prospective Retrospective Cross-sectional None of these
Cross-sectional
Interval
Determine which of the four levels of measurement (nominal, ordinal, interval, ratio) is most appropriate. Monthly temperatures: 65° F, 70° F, 75° F, 80° F, and 85° F Choose the correct answer below: Ratio Nominal Interval Ordinal
Determine whether the given value is from a discrete or continuous set " The total number of phone calls a sales representative makes in a month is 425."
Discrete
True or False: A data set will always have exactly one mode.
False. The mode of a variable is the most frequent observation of the variable that occurs in the data set. To compute the mode, tally the number of observations that occur for each data value. The data value that occurs most often is the mode. A set of data can have no mode, one mode, or more than one mode. If no observation occurs more than once, the data have no mode.
A researcher hypothesizes more than 85% of Americans own a cell phone. Which of the following would be an example of researchers making a Type II Error?
From a study conducted, researchers failed to reject their null hypothesis. In fact 90% of Americans own cell phones.
Frequency
The number of times a data value falls into a category
Researchers wondered if brain size has an effect on a person's IQ. From a sample of 20 individuals, the equation of the least-squares regression line is y = 71.8 + 0.0286x, where x represents the size of a brain in cubic centimeters and y represents IQ. What is the interpretation of the slope?
IQ is predicted to increase by 0.0286 for every 1 cubic centimeter increase in brain size.
The length of the box in a boxplot is proportional to which of the following?
IQR
UNION
IS THE SET OF ALL OUTCOMES IN EITHER EVENT, INCLUDING THE OUTCOMES THEY HAVE IN COMMON.
SAMPLE SPACE
IS THE SET OF ALL THE POSSIBLE OUTCOMES.
PROBABILITY
IT IS A MEASURE OF HOW LIKELY A CERTAIN OUTCOME IS, FROM 0 TO 1
A bar chart and a Pareto chart both use bars to show frequencies of categories of categorical data. What characteristic distinguishes a Pareto chart from a bar chart, and how does that characteristic help us in understanding the data?
In a Pareto chart, the bars are always arranged in descending order according to frequencies. The Pareto chart helps us understand data by drawing attention to the more important categories, which have the highest frequencies.
When a person stands trial for murder, the jury is instructed to assume that the defendant is innocent. Is this claim of innocence an example of a null hypothesis, or is it an example of an alternative hypothesis?
It is a null hypothesis, since it is assumed to be true until evidence can prove otherwise.
Upper Class limits
The largest numbers that can belong to each class
For what types of associations are regression models useful?
Linear
Which measure of variation is very sensitive to extreme values?
Range
Which measure of variation is very sensitive to extreme values
Range
which measure of variation is very sensitive to extreme values?
Range
The _______ is a tool for making predictions about future observed values and is a useful way of summarizing a linear relationship.
Regression equation
In a _______ distribution, the frequency of a class is replaced with a proportion or percent.
Relative Frequency Distribution
z-score (often called the standardized value)
Represents the distance that a data value is from the mean in terms of the number of standard deviations. (It is obtained by subtracting the mean from the data value and dividing this result by the standard deviation) The z-score is unitless. It has a mean 0 and standard deviation 1. The z-score is often called the standardized value.
Cumulative Frequency
The cumulative frequency for a class is the sum of the frequencies for that class and all previous classes (example pg.49).
Cumulative Frequency
The cumulative frequency is the sum of the frequencies accumulated up to the upper boundary of a class in the distribution.
Identify which of the following statements is not a requirement for a probability density function or state that they all are.
The curve must be symmetric and centered at zero
Look at #25 chart and answer the questions: Construct a histogram on the calculator. Are the data reported or measured?
The data appears to be measured. The heights occur with roughly the same frequency.
State whether the data described below are discrete or continuous and explain why: The exact ages in hours of different cockroaches found in a certain city
The data are continuous because the data can take any value in an interval
Look at #48 chart and answer the questions: What impression does the graph create? Does the graph depict the data fairly?
The graph creates the impression that men have salaries that are more than twice the salaries of women. No, because the vertical scale does not start at zero.
No, there does not appear to be a correlation because there is no general pattern to the data.
The heights of a certain country's presidents and their main opponents in the election campaign have been constructed into a scatterplot (above). Does there appear to be a correlation?
One common system for computing a grade point average (GPA) assigns 4 points to an A, 3 points to a B, 2 points to a C, 1 point to a D, and 0 points to an F. What is the GPA of a student who gets an A in a 3-credit course, a B in each of two 2-credit courses, a C in a 3-credit course, and a D in a 2-credit course?
The mean grade point average is a 2.7
Under what conditions is the mean preferred?
The mean is preferred when the data is relatively symmetric.
In determining the mean age of all students at your school, you survey 30 students and find the mean of their ages. Is this mean x̄ or μ?
The mean is x̄, because it is computed from a sample.
If each monthly cell phone bill in the country were doubled, how would the mean of the cell phone bills be affected?
The mean of the cell phone bills would double.
Suppose, on the warmest day of the month, the daily high temperature in a city is accidentally recorded as 700 instead of 70 degrees Fahrenheit. Compare the effect this mistake will have on the mean monthly high temperature to the effect on the median monthly high temperature.
The mean will increase significantly, but the median will not change as a result of the mistake.
Definition of Mean (or Arithmetic Mean)?
The measure of center found by adding the data values and dividing the total by the number of data values.
What is mean of a set of data?
The measure of center found by adding the data values and dividing the total by the number of data values.
Definition of Midrange
The measure of center that is the value midway between the maximum and minimum values in the original data set. It is found by adding the maximum data value to the minimum data value and then dividing the sum by 2.
Definition of Median
The measure of center that is the middle value when the original data values are arranged in order of increasing (or decreasing) magnitude.
What is the Median?
The measure of center that is the middle value when the original data values are arranged in order of increasing (or decreasing) magnitude.
With a data set that is so small, the true nature of the distribution cannot be seen with a histogram.
The population of ages at inauguration of all U.S. Presidents who had professions in the military is 62, 46, 68, 64, 57. Why does it not make sense to construct a histogram for this data set?
Suppose a researcher is testing someone to see if he or she can tell Soda X from Soda Y, and the researcher is using 20 trials, half with Soda X and half with Soda Y. The null hypothesis is that the person is guessing. The alternative is one-sided, Ha: p > 0.5. The person gets 13 right out of 20. The p-value comes out to be 0.090. Explain the meaning of the p-value.
The probability that a person will get 13 or more right, if the person is truly guessing, is about 9%.
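The card does not say how the 0.090 was computed. It appears consistent with a one-proportion z-test (normal approximation), so the sketch below assumes that method; an exact binomial calculation would give a somewhat different value.

```python
from math import sqrt
from statistics import NormalDist

n, correct, p0 = 20, 13, 0.5   # trials, number right, null-hypothesis proportion

p_hat = correct / n                    # sample proportion
std_error = sqrt(p0 * (1 - p0) / n)    # standard error under the null hypothesis
z = (p_hat - p0) / std_error           # test statistic

# One-sided (right-tail) p-value: P(Z >= z) if the person is truly guessing.
p_value = 1 - NormalDist().cdf(z)

print(round(z, 2), round(p_value, 3))  # roughly 1.34 and 0.09
```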
Is the time it takes for a light bulb to burn out discrete or continuous?
The random variable is continuous.
Is the number of socks in a drawer discrete or continuous?
The random variable is discrete.
What makes the range less desirable than the standard deviation as a measure of dispersion?
The range does not use all the observations.
Interquartile range (IQR)
The range of the middle 50% of the observations in a data set. The difference between the upper quartile and the lower quartile. IQR = Q3 - Q1 Interpretation of the interquartile range is similar to that of the range and standard deviation. That is, the more spread a set of data has, the higher the interquartile range will be.
Determine which of the levels of measurement is most appropriate for the data below: Brain volumes measured in cubic cm
The ratio level of measurement is the most appropriate because the data can be ordered, differences can be found and are meaningful, and there is a natural starting point
What does a correlation coefficient of 0 indicate?
There is no linear relationship between the two quantitative variables.
A person was trying to figure out the probability of getting two heads when flipping two coins. She flipped two coins 10 times, and in 6 of these 10 times, both coins landed heads. On the basis of this outcome, she claims that the probability of two heads is 6/10, or 60%. Is this an example of an empirical probability or a theoretical probability? Explain.
This is an example of empirical probability because it is based on an experiment.
Fill in the blank. We utilize statistical _______ to look for features that reveal some useful or interesting characteristics of the data set.
We utilize statistical graphs to look for features that reveal some useful or interesting characteristics of the data set.
If R^2 is 1 or very near 1, it's a good fit. If it's close to 0, it's a poor fit.
What is a good fit for R^2? What isn't a good fit?
A prediction interval is used for an estimate of a value of a variable. A confidence interval is used for an estimate of a value of a population parameter.
What is the difference between the confidence interval and the prediction interval?
Look at #7 charts and answer the question: Do cigarette filters appear to be effective?
Yes, because the relative frequency of the higher tar classes is greater for nonfiltered cigarettes.
To describe the exact position of a score within a distribution, z-score must transform each x-value into a signed number; positive or negative.
all z-scores above the mean are positive and all z-scores below the mean are negative. The number tells the distance between the score and the mean in terms of the number of standard deviations.
Typically, the direction (>, <, or ≠) used in the _______ hypothesis is determined from the question of interest.
alternative
pictographs
drawings of objects, are often misleading because they can create false impressions that distort differences
A(n) _______ uses line segments to connect points located directly above class midpoint values.
frequency polygon
quantitative data
measures how much, such as weights of high school students. Graphs used for quantitative data are dot plots, histograms, and stem plots.
What measure of central tendency best describes the "center" of the distribution when the graph is skewed
median
________ are sample values that lie very far away from the majority of the other sample values
outliers
A -- histogram has the same shape and horizontal scale as a histogram but the vertical scale is marked with relative frequencies instead of actual frequencies.
relative frequency
The ____ and the ____ of a correlation coefficient describe the direction and the magnitude of the relationship between two variables
sign; absolute value
Class width is found by -------.
subtracting a lower class limit from the next consecutive lower class limit
midrange
the value exactly midway between the highest and lowest values
P (A or B) indicates ____.
the probability that in a single trial, event A occurs, event B occurs, or they both occur.
P(A or B) indicates
the probability that in a single trial, event A occurs, event B occurs, or they both occur
s^2
the sample variance symbol is
A data value is considered - if its z-score is less than -2 or greater than 2.
unusual
A data value is considered ______ if its z-score is less than -2 or greater than 2.
unusual
A data value is considered _________ if the z-score is less than -2 or greater than 2.
unusual
Inferential statistics
uses methods that generalize results obtained from a sample to the population and measure the reliability of the results
Weighted mean
when different x values are assigned different weights, w. Multiply each weight w by the corresponding value x, then add the products, and finally divide that total by the sum of the weights, Σw. Weighted mean = Σ(w·x) / Σw
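The weighted-mean steps can be sketched in Python. The function name is illustrative, and the grade points and credit hours below are example values of the kind used in a GPA calculation.

```python
def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    total = sum(w * x for x, w in zip(values, weights))
    return total / sum(weights)

# Example: grade points (A=4, B=3, B=3, C=2, D=1) weighted by course credits.
grade_points = [4, 3, 3, 2, 1]
credits = [3, 2, 2, 3, 2]

gpa = weighted_mean(grade_points, credits)
print(round(gpa, 1))  # 2.7
```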
When a data value is converted to a standardized scale representing the number of standard deviations the data value lies from the mean, we call the new value a --
z-score
Find the population mean or sample mean as indicated. Population: 2, 1, 11, 15, 6
µ= 7
frequency polygon
A(n) __________________ ______________ uses line segments to connect points located directly above class midpoint values.
In the data table below, the x-values are the weights (in pounds) of cars and the y-values are the corresponding highway fuel consumption amounts (in mi/gal). Weight (lb): 4088 3358 4133 3650 3545 Highway Fuel Consumption (mi/gal): 26 31 29 29 30 Comment on the source of the data if you are told that car manufacturers supplied the values. Is there an incentive for car manufacturers to report values that are not accurate?
Yes, because consumers, in general, would prefer to buy a car with a higher level of fuel efficiency. In this case, the source of the data would be suspect with a potential for bias.
If the standard deviation of a variable is 10, what is the variance?
100
Number of notes in a song...
Discrete b/c it's countable
what values can't be probabilities?
Any value less than 0 or greater than 1. A probability must satisfy 0 ≤ P(A) ≤ 1.
Listed below are the top 10 annual salaries (in millions of dollars) of TV personalities. Find the range, variance, and standard deviation for the sample data. Given that these are the top 10 salaries, do we know anything about the variation of salaries of TV personalities in general? 40 38 36 29 17 15 13 9 8.6 8.0 The range of the sample data is $-- million. The variance of the sample data is -----. The standard deviation of the sample data is $----- million. Is the standard deviation of the sample a good estimate of the variation of salaries of TV personalities in general?
Range: 32. Variance: 168.94. Standard deviation: 13.00. No, because the sample is not representative of the whole population.
Median
"Middle value." The measure of center that is the middle value when the original data values are arranged in order of increasing (or decreasing) magnitude. Denoted by x-tilde. Sort values. If the number of data values is odd, the median is the number located in the exact middle of the list. Even? Mean of the two middle numbers. Properties: does not change by large amounts when we include just a few extreme values. It is a RESISTANT measure of center. Does not use every data value.
Variance is
(standard deviation)^2
Important Properties of the Median
- The median does not change by large amounts when we include just a few extreme values (so the median is a resistant measure of center). - The median does not use every data value.
IQR (Interquartile Range)
-MEASURE OF DISPERSION (VARIABILITY). Remember, if data is symmetric, the best measure of central tendency is the mean and the best measure of dispersion is the standard deviation. HOWEVER, if data is skewed or if it contains outliers, the best measure of central tendency is the median and the best measure of dispersion is the IQR. -DEFINITION: the range of the middle 50% of the observations in a data set. IQR = Q3 - Q1. For a data set that is skewed and/or has outliers, these notes also list IQR/2 = (Q3 - Q1)/2 as a measure of dispersion.
Check recording 12 minutes for a step by step process on how to approach a problem!!!!
...
The heights of the bars of a histogram correspond to ______ values?
...
If the probability that it will rain tomorrow is 0.30, the probability that it will not rain tomorrow is what?
0.70
How to check outliers with Quartiles Rule
1. Determine Lower (Q1 = QL) and Upper (Q3 = QU) Quartiles 2. Compute IQR = QU - QL 3. Determine Lower and Upper Fences a. LIF = QL - 1.5(IQR) b. UIF = QU + 1.5(IQR) c. LOF = QL - 3(IQR) d. UOF = QU + 3(IQR)
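The fence calculation can be sketched in Python. The quartiles are passed in directly because different textbooks and calculators compute quartiles slightly differently. The function names, the example quartiles, and the "mild"/"extreme" labels are assumptions for illustration, not part of the card.

```python
def quartile_fences(q_lower, q_upper):
    """Return inner and outer fences from the lower and upper quartiles."""
    iqr = q_upper - q_lower
    return {
        "LIF": q_lower - 1.5 * iqr,  # lower inner fence
        "UIF": q_upper + 1.5 * iqr,  # upper inner fence
        "LOF": q_lower - 3 * iqr,    # lower outer fence
        "UOF": q_upper + 3 * iqr,    # upper outer fence
    }

def classify(x, fences):
    """Label a value by which fences it falls outside (labels are assumed)."""
    if x < fences["LOF"] or x > fences["UOF"]:
        return "extreme outlier"
    if x < fences["LIF"] or x > fences["UIF"]:
        return "mild outlier"
    return "not an outlier"

fences = quartile_fences(10, 20)   # hypothetical quartiles QL = 10, QU = 20
print(fences)                      # LIF -5.0, UIF 35.0, LOF -20, UOF 50
print(classify(40, fences))        # mild outlier
```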
1. If the computed linear correlation coefficient r lies in the left/right tail beyond the leftmost/rightmost critical value, or if |r| exceeds the value in Table A-6. 2. If the computed linear correlation coefficient lies between the two critical values.
1. How do we know that there is a correlation (using r)? 2. How do we know if there is no correlation?
What are two important properties of x̃?
1. The median does not change by large amounts when we include just a few outliers. 2. The median does not use every data value.
Heights of adult males are known to have a normal distribution. A researcher claims to have randomly selected adult males and measured their heights with the resulting relative frequency distribution as shown here. Identify two major flaws with these results.
1. The sum of the relative frequencies is 125%, but it should be 100%, with a small possible round-off error. 2. All of the relative frequencies appear to be roughly the same. If they are from a normal distribution, they should start low, reach a maximum, and then decrease.
median rules
1. if the number of data values is odd, the median is in the exact middle of the sorted list 2. if the number of data values is even, the median is found by computing the mean of the two middle numbers
A professor has recorded exam grades for 10 students in his class, but one of the grades is no longer readable. If the mean score on the exam was 82 and the mean of the 9 readable scores is 86, what is the value of the unreadable score?
10 × 82 = 820 (total of all 10 scores); 9 × 86 = 774 (total of the 9 readable scores); 820 - 774 = 46. Answer: 46
Look at #24 chart and answer the questions: What is the class width? What are the approximate lower and upper class limits of the first class?
20 Lower class: 105 Upper class: 125
Refer to the data set of times, in minutes, required for an airplane to taxi out for takeoff, listed below. Find the mean and median. How is it helpful to find the mean? 13 19 12 39 13 36 36 47 12 19 18 26 13 45 28 15 14 47 42 17 38 16 41 13 18 43 50 28 33 17 17 38 12 17 48 34 41 25 42 10 28 39 28 48 46 36 18 17 Find the mean and median of the data set using a calculator or similar data analysis technology. The mean of the data set is --- minutes. The median of the data set is -- minutes. How is it helpful to find the mean?
Mean: 28.2 minutes. Median: 28 minutes. The mean taxi out time is important for calculating and scheduling the arrival time.
Look at #2 chart and answer the questions: What is the class width? What are the class midpoints? What are the class boundaries?
Class width: 3. Class midpoints: 51, 54, 57, 60, 63, 66, 69. Class boundaries: 49.5, 52.5, 55.5, 58.5, 61.5, 64.5, 67.5, 70.5
Statistics are sometimes used to compare or identify authors of different works. The lengths of the first 10 words in a book by Terry are listed with the first 10 words in a book by David. Find the mean and median for each of the two samples, then compare the two sets of results. Terry: 2 2 11 5 2 3 3 2 9 4 David: 3 4 3 3 3 2 3 2 1 1 The mean number of letters per word in Terry's book is --- The median number of letters per word in Terry's book is - The mean number of letters per word in David's book is -- The median number of letters per word in David's book is - Compare the two sets of results. Does there appear to be a difference?
Terry: mean 4.3, median 3. David: mean 2.5, median 3. Yes. Based on the results, words in Terry's book are longer than the words in David's book.
empirical rule
68% of all data is within one standard deviation of the mean 95% of all data is within 2 standard deviations of the mean 99.7% of all data is within 3 standard deviations
Waiting times (in minutes) of customers at a bank where all customers enter a single waiting line and a bank where customers wait in individual lines at three different teller windows are listed below. Find the coefficient of variation for each of the two sets of data, then compare the variation. Bank A (single line): 6.5 6.5 6.6 6.8 7.1 7.3 7.3 7.7 7.7 7.8 Bank B (individual lines): 4.3 5.3 5.8 6.2 6.7 7.7 7.7 8.6 9.3 9.9 Is there a difference in variation between the two data sets?
Bank A: 7.2%. Bank B: 25.3%. The waiting times at Bank A have considerably less variation than the waiting times at Bank B.
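A short Python sketch for checking the two coefficients of variation, assuming the sample standard deviation (dividing by n − 1) is intended. The helper name `coefficient_of_variation` is an illustrative choice.

```python
import statistics

def coefficient_of_variation(values):
    """Sample CV as a percentage: (s / x-bar) * 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100

bank_a = [6.5, 6.5, 6.6, 6.8, 7.1, 7.3, 7.3, 7.7, 7.7, 7.8]
bank_b = [4.3, 5.3, 5.8, 6.2, 6.7, 7.7, 7.7, 8.6, 9.3, 9.9]

cv_a = coefficient_of_variation(bank_a)
cv_b = coefficient_of_variation(bank_b)
print(round(cv_a, 1), round(cv_b, 1))  # 7.2 25.3
```

Both banks have nearly the same mean wait, so the much smaller CV for Bank A reflects a smaller standard deviation.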
The following are the duration times (in minutes) of all missions flown by a space shuttle. Use the given data to construct a boxplot and identify the 5-number summary. 9 7459 8861 10,024 10,100 10,118 11,453 11,523 11,841 The five number summary is - - - - - Make a boxplot on the calculator
9, 8682, 10062, 11453, 11841
A Type II Error is made... A Type II Error is made when there's not enough evidence to reject the null hypothesis and the null hypothesis is true. A Type II Error is made when there's evidence to reject the null hypothesis, but the null hypothesis is true. A Type II Error is made when there's not enough evidence to reject the null hypothesis, but the null hypothesis is not true. A Type II Error is made anytime we do not reject the null hypothesis.
A Type II Error is made when there's not enough evidence to reject the null hypothesis, but the null hypothesis is not true.
What type of effect can outliers have on a regression line?
A big effect: a single outlier can substantially change the slope and intercept of the line.
Fill in the blank. A histogram aids in analyzing the _______ of the data.
A histogram aids in analyzing the shape of the distribution of the data.
What is a value at the center or middle of a data set?
A measure of center.
Whenever a data value is less than the mean, _______
A negative z-score indicates a data value is less than the mean.
MEASURE OF CENTER
A value at the center or middle of a data set is a(n) _________.
How do you find the midrange?
Add the Max and min data value and then divide the sum by 2.
Which word is associated with multiplication when computing probabilities?
And
Correlation does not imply: Linearity Bias Causation Significance
Causation
Center
Center value of data(CVDOT)
Which of the following is NOT a procedure for determining whether it is reasonable to assume that sample data are from a normally distributed population? a. Visual inspection of a Histogram to determine if its roughly "bell shaped" b. Constructing a probability plot (QQ) c. Identifying the outliers. d. Checking that the probability of an event is 0.05 or less.
Checking that the probability of an event is 0.05 or less.
Identify which of these types of sampling is used: random, stratified, systematic, cluster, convenience. "An education researcher randomly selects 48 middle schools and interviews all the teachers at each school."
Cluster
Identify which type of sampling is being used: An education researcher randomly selects 48 middle schools and interviews all the teachers at each school.
Cluster
Standardized Distribution
Composed of scores that have been transformed to create predetermined values for mean standard deviation. They are used to make dissimilar distributions comparable.
Of the following, which is the only method of data collection suitable for making conclusions about causal relationships?
Controlled experiments
Identify which type of sampling is being used: A researcher interviews 19 work colleagues who work in his building.
Convenience
Discrete
Countable number
Determine whether the given value is a discrete or continuous variable: People are asked to state how many times in the last month they visited their family doctor Continuous Discrete
Discrete
The distribution appears to be skewed to the left (or negatively skewed).
Does the graph suggest that the distribution is skewed? If so, how?
What must be true for a sample to be considered a simple random sample?
Every possible sample of that size must have the same chance of being selected.
What does it mean if a statistic is resistant?
Extreme values (very large or small) relative to the data do not affect its value substantially
Identify the given statement as either true or false. The standard deviation is a resistant measure of spread.
False
Which of the following is true for a normal probability density curve?
For a normal probability density curve, as x gets larger and larger, the graph approaches but never reaches the horizontal axis.
It has been noted that people who go to church frequently tend to have lower blood pressure than people who don't go to church. Does this mean you can lower your blood pressure by going to church? Why or why not? Explain.
Going to church may not cause lower blood pressure. Just because two variables are related does not show that one caused the other.
A student wondered if more than 10% of students enrolled in an introductory Chemistry class dropped before the midterm. He noticed that 2 out of 15 of his friends in the class dropped before the midterm. Based on his sample, he performs a hypothesis test. Which of the following statements is true?
He should not make a conclusion about all students in the introductory Chemistry class since he took a convenience sample.
Difference in Pareto and Bar charts
In a Pareto chart, the bars are always arranged in descending order according to frequencies. The Pareto chart helps us understand data by drawing attention to the more important categories, which have the highest frequencies.
Suppose every student in a class is surveyed and it is found that 75% of the class plans to take another math class. It is reported that 75% of all students at the school plan to take another math class. Is this an example of descriptive or inferential statistics? Explain.
Inferential statistics; the results of the class sample are extended to make a generalization about the population of all students at the school.
When computing the correlation coefficient, what is the effect of changing the order of the variables on r?
It has no effect on r.
What is the first step in almost every investigation of data?
Make an appropriate graph.
Which of the following is NOT a value in the 5-number summary? -Median -Mean -Minimum -Q1
Mean
Which of the following is NOT needed to construct a boxplot?
Mean
What is the formula to calculate mean?
Mean = Σx / n
What are the measures of center?
Mean, median, mode and midrange.
The interquartile range tells us how much space the _____ of the data occupy.
Middle 50%
Class boundaries
The numbers that separate classes, found at the midpoint of the gap between consecutive classes. For the classes 60-69 and 70-79, the class boundaries are 69.5 and 79.5.
When summarizing graphs of categorical data, report the _______ and describe the _______.
Mode(s), variability
If events A and B are independent, what must be done to find the probability of event A AND B?
Multiply the probability of A and the probability of B.
When two events have no outcomes in common, they are called what?
Mutually exclusive
Under what conditions can extrapolation be used to make predictions beyond the range of the data?
Never
When can a correlation coefficient based on an observational study be used to support a claim of cause and effect?
Never
Are any of the measures of dispersion among the range, the variance, and the standard deviation, resistant? Explain.
No, all of these measures of dispersion are affected by extreme values.
Days before a presidential election, a nationwide random sample of registered voters was taken. Based on this random sample, it was reported that "52% of registered voters plan on voting for Robert Smith with a margin of error of ±3%." The margin of error was based on a 95% confidence level. Can we say with 95% confidence that Robert Smith will win the election if he needs a simple majority of votes to win?
No, because 50% is within the bounds of the confidence interval.
Three cards are drawn without replacement from a standard deck, and the number of kings is noted. Does this constitute a binomial experiment? Why or why not?
No, because the probability of getting a king is not the same for each of the three draws.
When two dice are rolled, the sum is between 2 and 12 inclusive. A student simulates the rolling of two dice and finding the sum by randomly generating integers between 2 and 12. Does this simulation behave in a way that is similar to actual dice? Why or why not?
No; The student's simulation will generate the sums with equal probability when in fact the sums are not equally likely.
A frequency distribution lists the ______ of occurrences of each category of data, while a relative frequency distribution lists the __________ of occurrences of each category of data.
Number; Proportion
What are two basic types of variables in statistics?
Numerical and categorical
Quantitative
Numerical data
Listed below are blood groups of O, A, B, and AB of randomly selected blood donors. Construct the relative frequency distribution. A O O O O A A O A O O A A O O AB O A B A AB O A O B O O AB A A AB A O A B O AB O O O Find the relative frequency for O, A, B, AB
O: 47.5% A: 32.5% B: 7.5% AB:12.5%
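A small Python sketch that tallies the blood groups and converts each count to a percentage. The variable names are illustrative only.

```python
from collections import Counter

donors = ("A O O O O A A O A O O A A O O AB O A B A "
          "AB O A O B O O AB A A AB A O A B O AB O O O").split()

counts = Counter(donors)   # frequency of each blood group
n = len(donors)            # 40 donors in total

# Relative frequency as a percentage: frequency / sum of all frequencies * 100
rel_freq = {group: counts[group] * 100 / n for group in ("O", "A", "B", "AB")}
print(rel_freq)  # {'O': 47.5, 'A': 32.5, 'B': 7.5, 'AB': 12.5}
```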
ADDITION
OR REFERS TO ______ RULE.
A study is conducted to measure children's growth rates without any treatment applied to the children. What best classifies this study?
Observational
_______ are sample values that lie very far away from the majority of the other sample values.
Outlier
____ are sample values that lie very far away from the majority of the other sample values.
Outliers
Determine whether the given value is a statistic or a parameter. "After inspecting all 45,000kg of meat stores at the Wurst Sausage Company, it was found that 20,000kg of the meat was spoiled."
Parameter
Determine whether the given value is a statistic or a parameter. "After taking the first exam, 15 of the students dropped the class."
Parameter
What is a numerical value that characterizes some aspect of a population? Statistic Census Parameter Estimator
Parameter
Suppose a researcher is testing someone to see whether she or he can tell Soda X from Soda Y, and the researcher is using 22 trials, half with Soda X and half with Soda Y. The null hypothesis is that the person is guessing. Suppose person A gets 19 right out of 22, and person B gets 15 right out of 22. Which will have a smaller p-value, and why?
Person A will have a smaller p-value because that person's number of successes is further from the hypothesized number of successes.
Which of the following is a reason we can never draw cause-and-effect conclusions from observational studies?
Potential confounding variables may explain the differences between groups rather than the treatment variable.
A(n) ______can be used to compute probabilities of continuous random variables.
Probability density function
Favorite rock group is qualitative or quantitative?
Qualitative because it is an attribute classification
Classify the data as either qualitative or quantitative. The following table gives the top five movies at the box office this week. Rank-last week-movie title-studio-sales (millions$) What kind of data is provided by the information in the first column?
Quantitative
Determine whether the data are qualitative or quantitative. "the number of seats in a movie theater"
Quantitative
AND
REFERS TO MULTIPLICATION. PROBABILITY OF EVENTS( A AND B) FOR INDEPENDENT EVENTS P(A AND B) = P(A)*P(B)
Relative Frequency =
RF = Frequency / Sum of all Frequencies
Identify the type of sampling used (random, systematic, convenience, stratified, or cluster sampling) in the situation described below. A man experienced a tax audit. The tax department claimed that the man was audited because he was randomly selected from all the taxpayers.
Random
Yes. The frequencies start low, reach a maximum, then become low again, and are roughly symmetric about the maximum frequency. The Histogram would be bell-shaped, and NOT skewed.
Refer to the frequency distribution (above) of 25 home voltage measurements below, with a lower class limit of 127.7 volts, and a class width of 0.2 volt. Does the result appear to have a normal distribution? Why or why not?
No. The data values in each class could take on any value between the class limits, inclusive.
Refer to the table summarizing service times (seconds) of dinners at a fast food restaurant. How many individuals are included in the summary? Is it possible to identify the exact values of all of the original service times?
A medical study was investigating if getting a flu shot actually reduced the risk of developing the flu. A hypothesis test is performed. Which of the following will result in a Type I error?
Researchers said the flu shot reduced the risk of developing the flu when it actually didn't.
In a poll of 50,000 randomly selected college students, 74% answered "yes" when asked "Do you have a television in your dorm room?" Identify the sample and population.
Sample: the 50,000 selected college students; population: all college students
A _______ is a plot of paired data (x,y) and is helpful in determining whether there is a relationship between the two variables.
Scatterplot
A histogram aids in analyzing the ______ of the data.
Shape of the distribution
Jan performed a study and obtained a p-value of 1.24. What conclusion should Jan make?
She made an error since it is not possible to get a p-value of 1.24.
Why is it important to learn about bad graphs?
So that we can critically analyze a graph to determine whether it is misleading.
Standard deviation measures the _____ of the distribution
Spread
To compute the variance, what should one do?
Square the standard deviation.
A z-score represents how many ______________ a data value is above or below the ______________.
Standard deviations, mean
What is the standard deviation of the sampling distribution called?
Standard error
A health and fitness club surveys 40 randomly selected members and found that the average weight of those questioned is 157 lb. Is this value a statistic or a parameter?
Statistic
What is an important difference between statistics and parameters?
Statistics are knowable, but parameters are typically unknown.
Checking for Outliers by Using Quartiles
Step 1 Determine the first and third quartiles of the data. Step 2 Compute the interquartile range. Step 3 Determine the fences. Fences serve as cutoff points for determining outliers. Lower Fence = Q1 - 1.5(IQR) Upper Fence = Q3 + 1.5(IQR) Step 4 If a data value is less than the lower fence or greater than the upper fence, it is considered an outlier.
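The four steps above can be sketched in Python as follows. Two assumptions: the data set is made up for illustration, and quartiles come from `statistics.quantiles` with its default "exclusive" method, which may differ slightly from the quartile rule in a given textbook or calculator.

```python
import statistics

def outlier_fences(values):
    """Return (lower fence, upper fence, outliers) using the 1.5 * IQR rule."""
    q1, _median, q3 = statistics.quantiles(values, n=4)   # Step 1
    iqr = q3 - q1                                          # Step 2
    lower = q1 - 1.5 * iqr                                 # Step 3
    upper = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower or v > upper]  # Step 4
    return lower, upper, outliers

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50]  # hypothetical sample
lower, upper, outliers = outlier_fences(data)
print(lower, upper, outliers)  # -6.0 18.0 [50]
```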
Σ
Sum of all data values
IQ scores are measured with a test designed so that the mean is 108 and the standard deviation is 17. Consider the group of IQ scores that are unusual. What are the z scores that separate the unusual IQ scores from those that are usual? What are the IQ scores that separate the unusual IQ scores from those that are usual? (Consider a value to be unusual if its z score is less than −2 or greater than 2.) What are the z scores that separate the unusual IQ scores from those that are usual? What are the IQ scores that separate the unusual IQ scores from those that are usual?
The z scores that separate unusual from usual values are −2 and 2. Converting with x = μ + zσ: lower IQ boundary = 108 + (−2)(17) = 74; upper IQ boundary = 108 + (2)(17) = 142.
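A brief Python sketch of the z-score to raw-score conversion used here. The function name `raw_score` is illustrative.

```python
def raw_score(mu, sigma, z):
    """Convert a z-score back to a data value: x = mu + z * sigma."""
    return mu + z * sigma

low_iq = raw_score(108, 17, -2)
high_iq = raw_score(108, 17, 2)
print(low_iq, high_iq)  # 74 142
```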
S=RANGE/4
The Range Rule of Thumb roughly estimates the standard deviation of a data set as
Fill in the blank. The bars in a histogram _______.
The bars in a histogram touch.
Touch
The bars in a histogram __________.
Class Midpoint
The class midpoint Xm is obtained by adding the lower and upper boundaries and dividing by 2, or adding the lower and upper limits and dividing by 2: Xm = (lower limit + upper limit) / 2
State whether the data described below are discrete or continuous: The number of programs installed on various computers
The data are discrete because they can take on only specific, countable values.
Ordinal
The level of measurement of: Positions of runners in a race is ______________. Interval Ordinal Ratio Nominal
Which of the following is NOT a property of the linear correlation coefficient r? -The value of r is always between -1 and 1 inclusive. -The value of r is not affected by the choice of x or y. -The value of r measure the strength of a linear relationship. -The linear correlation r is robust. This is, a single outlier will not affect the value of r.
The linear correlation coefficient is robust. That is, a single outlier will not affect the value of r.
A highly selective boarding school will only admit students who place at least 2 standard deviations above the mean on a standardized test that has a mean of 200 and a standard deviation of 24. What is the minimum score that an applicant must make on the test to be accepted?
The minimum score that an applicant must make on the test to be accepted is 248
If we collect a large sample of blood platelet counts and if our sample includes a single outlier, how will that outlier appear in a histogram?
The outlier will appear as a bar far from all of the other bars with a height that corresponds to a frequency of 1.
Class Midpoints
The values in the middle of the classes. Each class midpoint is found by adding the lower class limit to the upper class limit and dividing it by 2 (example pg. 47)
Look at #40 and answer the questions: Construct a time-series graph (line graph) on the calculator. What is the trend? How does this trend compare to the trend for drive-in movie theaters?
There appears to be an upward trend, unlike drive-in movie theatres, which have a downward trend.
A friend flips a coin 10 times and says that the probability of getting a head is 40% because he got four heads. Is the friend referring to an empirical probability or a theoretical probability? Explain.
This is an example of empirical probability because it is based on an experiment.
True or False: Chebyshev's inequality applies to all distributions regardless of shape, but the empirical rule holds only for distributions that are bell shaped
True, Chebyshev's inequality is less precise than the empirical rule, but will work for any distribution, while the empirical rule only works for bell-shaped distributions
A categorical variable is only called bimodal under what circumstances?
Two categories are nearly tied for most frequent outcomes.
The existence of multiple mounds in a distribution is sometimes a sign of which of the following?
Two very different groups have been combined into a single collection
Construct a scatter diagram using the data table to the right. This data is from a study comparing the amount of tar and carbon monoxide (CO) in cigarettes. Use tar for the horizontal scale and use carbon monoxide (CO) for the vertical scale. Determine whether there appears to be a relationship between cigarette tar and CO.
Use excel - highlight all the values - insert "scattergram"
Which of the following is a common distortion that occurs in graphs?
Using a two-dimensional object to represent data that are one-dimensional in nature
Which of the following is not something that one looks for when studying scatterplots?
Variation
The study of statistics rests on what two major concepts?
Variation and data
Which characteristic of data is a measure of the amount that the data values vary?
Variation
1. The sample of paired (x,y) data is a simple random sample of quantitative data. 2. Visual examination of the scatter plot must confirm that the points approximate a straight-line pattern. 3. Outliers must be removed if they are known to be errors.
What are the requirements that should be satisfied before finding r?
To make predictions for the value of one of the variables given some specific value of the other variable.
What can we use the regression equation for?
Negative correlation
What can we say about r = -.965? As the x values increase, the y values decrease.
No correlation
What can we say about r = 0? No distinct pattern between x and y.
To measure how points are configured among four quadrants.
What can we use Σ(Zx*Zy) for?
Three significant digits.
What do we round b1 and b0 to?
What is a No Mode Data Set?
When no data value is repeated, we say that there is no mode.
What is the difference between a random sample and a simple random sample?
With a random sample, each individual has the same chance of being selected. With a simple random sample, all samples of the same size have the same chance of being selected.
Look at #39 chart and answer the questions? Construct a scatterplot on the calculator. Is there a relationship between cigarette tar and CO?
Yes, as the amount of tar increases the amount of carbon monoxide also increases.
When using the addition rule
always be careful to avoid double-counting outcomes.
A political pollster reports that her candidate has a 5% lead in the polls. This is an example of
an Observational Study
Outliers
are sample values that lie very far away from the majority of the other sample values.
Variables
are the characteristics of the individuals within the population
In a probability histogram, there is a correspondence between ___.
area and probability.
Why is range not a good measure?
Because it only tells you how wide the data are overall, not whether the values are bunched together or dispersed within that width, and it ignores how many values (n or N) there are.
The U.S. Department of Housing and Urban Development(HUD) uses the median to report the average price of a home in the United States. Why do you think HUD uses the median?
because the data are skewed right
If the data set is symmetric or approximately symmetric, and no outliers, then
best measure of center: mean best measure of dispersion: standard deviation
round-off rule
carry one more decimal place than is present in original set of values; because values of the mode are the same as some of the original data values, they can be left as is without any rounding
A ________ is the collection of data from every member of the population. sample census placebo statistic
census
Which of the following is NOT a measure of center?
census
Height is a
continuous variable
Methods used that summarize or describe characteristics of data are called ___ statistics.
descriptive
sample space
for a procedure consists of all possible simple events or all outcomes that cannot be broken down any further
midrange
halfway between the highest and lowest values; formula: (max + min) / 2
relative frequency histogram
histogram has the same shape and horizontal scale as a histogram but the vertical scale is marked with relative frequencies instead of actual frequencies
A scatterplot
is a plot of paired data (x,y) and is helpful in determining whether there is a relationship between the two variables.
How to know if a certain data is "unusual"?
it is more than two standard deviations away from the mean, either above or below (z-score less than −2 or greater than 2)
Which measures of center and spread are not resistant?
mean, range and standard deviation
What are the four levels of measurement?
nominal ordinal interval ratio
p-values
only a small P-value, such as .05 or less (5% chance or less) suggests that the sample results are not likely to occur by chance when there is no linear correlation, so a small P-value supports a conclusion that there is a linear correlation between the two variables.
Raw score
original, unchanged scores that are the direct result of measurement. A test score that has not been transformed or converted in any way.
Population mean is a
parameter
Below are 36 sorted ages of an acting award winner. Find the percentile corresponding to age 59 using the method presented in the textbook. 16,17,17,21,22,27,30,33,37,37,40,42,43,48,54,56,57,59,59,60,60,62,62,64,65,65,68,70,70,72,72,73,74,77,78,80
Percentile of value x = (number of values less than x) / (total number of values) × 100. For this problem x = 59. Number of values less than 59: 17. Total number of values: 36. Percentile of 59 = 17/36 × 100 ≈ 47, so age 59 is at about the 47th percentile.
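A short Python sketch of the "number of values less than x" percentile method described in this card. The function name `percentile_of` is illustrative, and other percentile conventions give slightly different results.

```python
ages = [16, 17, 17, 21, 22, 27, 30, 33, 37, 37, 40, 42, 43, 48, 54, 56, 57, 59,
        59, 60, 60, 62, 62, 64, 65, 65, 68, 70, 70, 72, 72, 73, 74, 77, 78, 80]

def percentile_of(values, x):
    """Percentile of x = (number of values less than x) / (total number) * 100."""
    below = sum(1 for v in values if v < x)
    return below / len(values) * 100

p = percentile_of(ages, 59)
print(round(p))  # 47
```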
When drawings of objects are used to depict data, false impressions can be made. These drawings are called -.
pictographs
the 68-95-99.7% rule applies for
roughly all bell-shaped curves
What is the symbol for sample standard deviation?
s
nonzero axis
some graphs are misleading because one or both of the axes begin at a value other than zero, so differences are exaggerated
The ____ linear relationship is indicated by a correlation coefficient of -1 or 1.
strongest
The larger the standard deviation means...
that observations are more distant from the typical value, and therefore more dispersed
Sampling bias means
that the technique used to obtain the sample's individuals tend to favor one part of the population over another
Midrange
the value midway between the maximum and minimum values in the original data set. (Max data value + minimum data value)/2. Properties: it is very sensitive to extremes.
Mode
the value that occurs with the greatest frequency. Bimodal, multimodal, no mode. Only measure of center that can be used with data at the nominal level of measurement.
What is the square of the standard deviation called?
the variance (s²)
What is the purpose of z-scores?
to describe the exact location of each score in a distribution; -always refers to population (must use a different formula for samples).
The bars in a histogram
touch
The bars in a histogram -.
touch
The bars in a histogram _______.
touch
A data value is considered _______ if its z-score is less than −2 or greater than 2.
unusual
The square of the standard deviation is called the _______.
variance
the square of a standard deviation is called the
variance
The square of the standard deviation is called the _______.
variance v=Standard dev^2
how to tell which histogram has the highest standard deviation
which ever graph is more spread out
How do you calculate Mean from a frequency distribution?
x̄ = Σ (f * x) / Σf
What is the formula to find a weighted mean?
x̄ = Σ(w*x) / Σw
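A minimal Python sketch of the weighted mean formula. The scores and weights are made-up values for illustration. The same function computes the mean from a frequency distribution when the class frequencies f are used as the weights.

```python
def weighted_mean(values, weights):
    """x-bar = sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

scores = [90, 80, 70]   # hypothetical scores
weights = [5, 3, 2]     # hypothetical weights
wm = weighted_mean(scores, weights)
print(wm)  # 83.0
```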
What is the formula to find the mean of a set of sample values?
x̄ = Σx / n
Determine the regression equation for the data. Round the final values to three significant digits, if necessary. x= 0, 3, 4, 5, 12 Y= 8,2,6,9,12
y hat= 4.88 + 0.525x
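A Python sketch of the least-squares formulas that reproduce this answer: slope b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept b0 = ȳ − b1·x̄. The function name `least_squares` is illustrative; a calculator or spreadsheet regression should give the same values.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the regression line y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx              # slope
    b0 = y_bar - b1 * x_bar     # y-intercept
    return b0, b1

x = [0, 3, 4, 5, 12]
y = [8, 2, 6, 9, 12]
b0, b1 = least_squares(x, y)
print(round(b0, 2), round(b1, 3))  # 4.88 0.525
```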
symmetric data
you could fold graph down the middle and it would be the same on both sides; bell curve
sample z-score
z = (x - x̄) / s
What is the symbol for population variance?
σ2
A bar chart and a Pareto chart both use bars to show frequencies of categories of categorical data. What characteristic distinguishes a Pareto chart from a bar chart, and how does that characteristic help us in understanding the data?
In a Pareto chart, the bars are always arranged in descending order according to frequencies. The Pareto chart helps us understand data by drawing attention to the more important categories, which have the highest frequencies.
Listed below are the playing times (in seconds) of songs that were popular at the time of this writing. Find the (a) mean, (b) median, (c) mode, and (d) midrange for the given sample data. Is there one time that is very different from the others? 444 237 232 246 246 297 277 223 239 211 262 254 258 a. The mean is -- seconds. b. The median is ---- seconds. c. The mode is ---- seconds. d. The midrange is ------ seconds. Is there one time that is very different from the others?
a. 263.5 b. 246 c. 246 d. 327.5 Yes; the time of 444 seconds is very different from the others.
Listed below are head injury measurements from small cars that were tested in crashes. The measurements are in "hic," which is a measurement of standard "head injury criterion," (lower "hic" values correspond to safer cars). The listed values correspond to cars A, B, C, D, E, F, and G, respectively. Find the a. mean, b. median, c. midrange, and d. mode for the data. Also complete parts e. and f. 393 365 489 327 510 539 355 a. Find the mean . b. Find the median. c. Find the midrange. d. Find the mode. e. Which car appears to be the safest? f. Based on these limited results, do small cars appear to have about the same risk of head injury in a crash?
a. 425.4 b. 393 c. 433 d. There is no mode e. car D f. No, because the data values differ substantially.
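A Python sketch for checking parts a through e of the head injury problem. Assumption: when no value repeats, the data set is treated as having no mode, as in this deck.

```python
import statistics
from collections import Counter

hic = [393, 365, 489, 327, 510, 539, 355]   # cars A through G, in order

mean = statistics.mean(hic)
median = statistics.median(hic)
midrange = (max(hic) + min(hic)) / 2

# A value counts as a mode only if it occurs more than once.
counts = Counter(hic)
top = max(counts.values())
modes = [v for v, c in counts.items() if c == top] if top > 1 else []

# The lowest hic value corresponds to the safest car.
safest = "ABCDEFG"[hic.index(min(hic))]

print(round(mean, 1), median, midrange, modes, safest)  # 425.4 393 433.0 [] D
```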
Frequency Distribution
A frequency distribution is the organization of raw data in table form, using classes and frequencies.
Measure of Center
A value at the center or middle of a data set. There are several different ways to determine the center, so there are different definitions of measures of center, including the mean, median, mode, and midrange.
____ measure the strength of association between two variables
Correlation coefficients
Fill in the blank. The heights of the bars of a histogram correspond to _______ values
Frequency
Relative Frequency Distribution
Lists each category of data together with the relative frequency
We utilize statistical _______ to look for features that reveal some useful or interesting characteristics of the data set.
graphs
Generally, the correlation coefficient of a ____ is denoted by r, and the correlation coefficient of a ____ is denoted by ρ or R.
sample; population
Frequency Distribution (or frequency table)
shows how a data set is partitioned among all of several categories (or classes) by listing all of the categories along with the number of data values in each of the categories
Descriptive statistics
summarize or describe relevant characteristics of data.
Distribution
the nature or shape of the spread of the data over the range of values (such as bell-shaped, uniform, or skewed) (CVDOT)
Inferential statistics
used to make inferences, or generalizations, about a population
frequency polygon
uses line segments connected to points located directly above class midpoint values
Which characteristic of data is a measure of the amount that the data values vary? a. variation b. distribution c. time d. center
variation
Parameter
Describes characteristics of a population
The heights of the bars of a histogram correspond to ________ values
frequency
dotplot
A _______ is a graph of each data value plotted as a point.
Describe sampling without replacement.
Draw a notecard, note the name, do not replace the notecard and draw again. It is not possible the same student could be picked twice.
Arithmetic mean
of a variable is computed by adding all the values of the variable in the data set and dividing by the number of observations
When finding the median of a set of data, what should you always do first?
Put the data in order. The median will be wrong otherwise.
Methods used that summarize or describe characteristics of data are called _______ statistics.
descriptive
mode can be used for both
quantitative and qualitative
Bar Graph requires
title, axis labels, and clear scale
Which of the following is the probability that something in the sample space will occur?
1
Rejecting the null hypothesis when the null hypothesis is true is called _____________.
A type 1 error
Methods used that summarize or describe characteristics of data are called ______ statistics
Descriptive
After constructing any relative frequency distribution, what should be the sum of the relative frequencies?
1 or 100%
The variance for a sample was found to be 49. What is the sample standard deviation?
7 (square root of 49)
What does Ā (A with an overbar) denote?
Ā denotes the complement of event A, meaning that Ā consists of all outcomes in which event A does not occur.
Substitute x into the regression equation: multiply x by the slope b1 and add the y-intercept b0 (ŷ = b0 + b1x).
How do you find ŷ?
Determining outliers
Standardized values (z-scores) can be used to identify outliers. It is recommended to treat any data value with a z-score less than -3 or greater than +3 as an outlier. Such data values can then be reviewed for accuracy and to determine whether they belong in the data set.
A _____ exists between two variables when the values of one variable are somehow associated with the values of the other variable
correlation
The value of a ____ ranges between -1 and 1.
correlation coefficient
midrange
the measure of center that is the value midway between the maximum and minimum values in the original data set; found by (maximum data value + minimum data value) / 2
A _______ is a graph of each data value plotted as a point.
dotplot
How to find the median when N or n is odd
sort the data, then find the middle value
For a scatterplot, the strongest correlations (r = 1.0 and r = -1.0 ) occur when data points fall exactly on a ____.
straight line
The greater the absolute value of a correlation coefficient, the ____ the linear relationship.
stronger
Class width is found by ___.
subtracting a lower class limit from the next consecutive lower class limit
How to find the median when N or n is even
sort the data, then take the mean of the middle 2 values
relative frequency distribution
the frequency of a class is replaced with a proportion or percent.
mode
the most frequently occurring data value and is the appropriate measure of center for nominal data.
Does the frequency distribution appear to have a normal distribution? Explain.
Yes, because the frequencies start low, proceed to one or two high frequencies, then decrease to a low frequency, and the distribution is approximately symmetric.
The Empirical Rule applies to distributions that are ________.
Symmetric and unimodal
Suppose we have 10 exam scores for an introductory statistics course. The scores are 68, 88, 84, 99, 96, 77, 76, 80, 75, 68. One of the intervals of interest is the interval 70 to 90 (where a score of 70 is included in this interval, but 90 is not). Based on the given information, what is the relative frequency for the interval 70 to 90 in this particular class?
.60
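A minimal sketch of the count behind the .60 answer above, using the ten exam scores from the card. The variable names are illustrative only.

```python
scores = [68, 88, 84, 99, 96, 77, 76, 80, 75, 68]

# The interval includes 70 but excludes 90, so test 70 <= s < 90.
in_interval = [s for s in scores if 70 <= s < 90]

# Relative frequency = class frequency / total number of values.
relative_frequency = len(in_interval) / len(scores)

print(len(in_interval))      # 6
print(relative_frequency)    # 0.6
```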
Look at #3 chart and answer the questions: What is the class width? What are the class midpoints? What are the class boundaries?
Class width: 3. Class midpoints: 65.45, 68.45, 71.45, 74.45, 77.45, 80.45, 83.45, 86.45, 89.45, 92.45. Class boundaries: 63.95, 66.95, 69.95, 72.95, 75.95, 78.95, 81.95, 84.95, 87.95, 90.95, 93.95.
Fill in the blank. A _______ is a graph of each data value plotted as a point.
A dotplot is a graph of each data value plotted as a point.
SUBSET
ALL THE MEMBERS OF ONE SET BELONG TO ANOTHER.
In statistics, variables are ______.
Characteristics of people or things
Class Boundaries
Class boundaries are the numbers used to separate the classes so that there are no gaps in the frequency distribution.
Which sampling method divides the population up into sections, randomly selects some of those sections, then chooses all the members from the selected sections to study?
Cluster
Determine whether the given value is from a discrete or continuous data set. The time it takes a computer to complete a task. Continuous Discrete
Continuous
Determine whether the given variable is discrete or continuous: The weight of a randomly selected suitcase at O'Hare airport.
Continuous
Identify the variable as either continuous or discrete: The height of a randomly selected maple tree.
Continuous
Identify the variable as either discrete or continuous. The temperature of a randomly selected cup of coffee.
Continuous
Height of a child...
Continuous because it is not countable
Identify which type of sampling is used: To avoid working late, a quality control analyst simply inspects the first 100 items produced in a day Systematic Stratified Convenience Cluster Simple Random
Convenience
A study of an association between which ear is used for cell phone calls and whether the subject is left-handed or right-handed began with a survey e-mailed to 5000 people belonging to an otology online group, and 717 surveys were returned. (Otology relates to the ear and hearing.) What percentage of the 5000 surveys were returned? Does that response rate appear to be low? In general, what is a problem with a very low response rate?
717/5000 ≈ 14%. Yes, that response rate appears to be low. A very low response rate creates a serious potential for getting a biased sample that consists of those with a special interest in the topic.
z-score
Describes the exact location of a score in a distribution relative to the mean. Aka Standard Score; how many standard deviations you are away from the norm. Used to make different distributions, or metric scales, comparable.
Suppose every student in a class is surveyed and it is reported that 75% of the class plans to take another math class. Is this an example of descriptive or inferential statistics? Explain.
Descriptive statistics; The results of the class sample are described without making any generalizations about the population of all students at the school.
What does it mean if a statistic is resistant?
Extreme values (very large or small) relative to the data do NOT affect its value substantially
outliers
Extreme values that don't appear to belong with the rest of the data.
Which measure of variation is very sensitive to extreme values?
The range; extreme values directly affect its value.
Here are 3 boxplots of weekly gas prices at a service station in the United States (price in $ per gallon). Compare the distribution of prices over the three years.
Gas prices have been increasing on average over the 3-year period, and the variation overall has been increasing as well. The distribution has been right-skewed, and there were 3 potential outliers in 2005.
We utilize statistical_____ to look for features that reveal some useful or interesting characteristics of the data set
Graphs
Piechart
Has sectors, each proportional to the frequency of a category. Requires: key (legend), title, colors
A student wondered if more than 10% of students enrolled in an introductory Chemistry class dropped before the midterm. Suppose he performed a hypothesis test to test his claim. In the context of the problem, what would happen if the student made a Type I Error?
He claims that more than 10% of students in the introductory Chemistry class dropped before the midterm when, in fact, 10% (or less) actually dropped.
Refer to the accompanying data set and use the 30 screw lengths to construct a frequency distribution. Begin with a lower class limit of 0.720 in., and use a class width of 0.010 in. The screws were labeled as having a length of 3/4 in.
Length (in.): frequency 0.720-0.729: 2 0.730-0.739: 3 0.740-0.749: 11 0.750-0.759: 11 0.760-0.769: 3
Frequency Distribution
Lists all categories of data and number of occurrences for each data category
Use the given qualitative data to construct the relative frequency distribution. The 2445 people aboard a ship that sank include 325 male survivors, 1661 males who died, 322 female survivors, and 137 females who died. Find the relative frequency for male survivors, males who died, female survivors, and females who died.
Male Survivors: 13.3% Males who died: 67.9% Female survivors: 13.2% Females who died: 5.6%
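The percentages above can be reproduced with a short sketch. The dictionary keys are just labels chosen for illustration; the counts come from the card.

```python
counts = {
    "Male survivors": 325,
    "Males who died": 1661,
    "Female survivors": 322,
    "Females who died": 137,
}

total = sum(counts.values())  # 2445 people aboard

# Relative frequency (as a percent) = category count / total * 100.
relative = {name: round(100 * n / total, 1) for name, n in counts.items()}

for name, pct in relative.items():
    print(f"{name}: {pct}%")
```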
The following data represent the weights (in grams) of a simple random sample of a candy. 0.90 0.87 0.83 0.92 0.90 0.86 0.86 0.87 0.81 0.84 Determine the shape of the distribution of weights of the candies by drawing a frequency histogram and computing the mean and the median. Which measure of central tendency best describes the weight of the candy?
Mean: 0.866. Median: 0.865. The mean and median are nearly equal, so the distribution is roughly symmetric and the mean best describes the weight of the candy.
The null hypothesis is always a statement about what?
Population parameter
EVENT
SUBSET OF SAMPLE SPACE.
The data are continuous because the data can take on any value in an interval.
State whether the data described below are discrete or continuous, and explain why. The exact ages in hours of different cockroaches found in a certain city.
The data are continuous because the data can take on any value in an interval (no set distance between chairs).
State whether the data described below are discrete or continuous, and explain why. The exact distances (in centimeters) between the chairs in a college classroom.
μ
THE SYMBOL FOR THE POPULATION MEAN IS
TOUCH
The bars in a histogram __________.
Frequency
The frequency of a class then is the number of data values contained in a specific class.
Which of the following is NOT a characteristic of the mean?
The mean is called the average by statisticians.
What does P(B|A) represent?
The probability of event B occurring after it is assumed that event A has already occurred.
How to Find the Median
To find the median, first sort the values (arrange them in order), and then follow one of these two procedures: 1.) If the number of data values is odd, the median is the number located in the exact middle of the sorted list. 2.) If the number of data values is even, the median is found by computing the mean of the two middle numbers in the sorted list.
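The two-case procedure above can be sketched as a small function. The function name and the sample lists are hypothetical and only illustrate the steps.

```python
def find_median(values):
    """Sort the values, then pick the middle one (odd n) or average the middle two (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the exact middle of the sorted list.
        return ordered[mid]
    # Even count: mean of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(find_median([7, 1, 5]))      # 5
print(find_median([7, 1, 5, 3]))   # 4.0
```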
What is a designed experiment?
When a researcher assigns individuals to a certain group, intentionally changes the value of an explanatory variable, and then records the value of the response variable for each group
True or False: When comparing two populations, the larger the standard deviation, the more dispersion the distribution has, provided that the variable of interest from the two populations has the same unit of measure.
True, because the standard deviation describes how far, on average, each observation is from the typical value. A larger standard deviation means that observations are more distant from the typical value, and therefore, more dispersed.
Three cards are drawn with replacement from a standard deck, and the number of kings is noted. Does this constitute a binomial experiment? Why or why not?
Yes, because there are three independent draws. For each draw there are two outcomes (king and not king) and a constant probability of getting a king.
Look at #26 chart and answer the questions: Construct a histogram on the calculator. Do the data appear to have a distribution that is approximately normal?
Yes. It is approximately normal.
Events that are disjoint
cannot occur at the same time
The heights of the bars of a histogram correspond to _____________ values.
frequency
The heights of the bars of a histogram correspond to - values.
frequency
A correlation of 0 does not mean zero relationship between two variables; rather, it means zero ____.
linear relationship
midrange formula
(maximum data value + minimum data value) / 2
Variance
measure of variation equal to the square of the standard deviation.
Variation
measure of the amount that the data values vary (the V in CVDOT)
mode
most common number in a set of data
A magician claims he can cause a coin to come up heads more than 50% of the time. A coin is flipped 50 times, and 44 heads come up. Determine the alternative hypothesis.
p>0.50
variance
a measure of the spread of the data in squared units, so it is hard to interpret directly; 2 types: population variance (σ², sigma squared) and sample variance (s²)
A large amount of scatter in a scatterplot is an indication that the association between the two variables is _______.
weak
mean
what we expect to happen; the average. Sample mean: x̄ (x-bar); population mean: μ (mu)
when to use mode for best measure of central tendency
when data is nominal or ordinal
multimodal
when more than two data values occur with the same greatest frequency; each one is a mode
Relative frequency formula (as a percent)
(class frequency / sum of all frequencies) × 100
Ordinal
Determine which of the four levels of measurement (nominal, ordinal, interval, ratio) is most appropriate. Ranks of scores in a tournament. Nominal Interval Ratio Ordinal
1. When you need to find a proportion between a negative (-) & positive (+) z-score:
Go to the *mean-to-z column* for each z-score, find the two proportions, and add them together.
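A sketch of the same lookup done in code instead of a printed table. It assumes a standard normal distribution and uses Python's `statistics.NormalDist`; the helper names `mean_to_z` and `proportion_between` are made up for this example.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def mean_to_z(z):
    """Area between the mean and z, i.e. what the mean-to-z column reports."""
    return abs(std_normal.cdf(z) - 0.5)

def proportion_between(z_low, z_high):
    """Proportion between a negative z_low and a positive z_high: add the two mean-to-z areas."""
    return mean_to_z(z_low) + mean_to_z(z_high)

print(round(proportion_between(-1.0, 1.0), 4))  # 0.6827
```

Adding the two areas only works when the z-scores are on opposite sides of the mean, which is the case this card describes.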
What type of data values are quantitative and the number of values is finite or countable? Interval Discrete Categorical Continuous
Discrete
The empirical rule
For data sets having a distribution that is approximately bell-shaped, _______ states that about 68% of all data values fall within one standard deviation from the mean.
Range rule of thumb
For many data sets, the vast majority of sample values lie within 2 standard deviations of the mean.
The coefficient of determination. It is the proportion of the variation in y that is explained by the linear relationship between x and y. r^2 = explained variation / total variation
What is r^2?
The regression line; regression equation.
What is the best fitting line (10.3) that fits a scatterplot of sample data? And its equation?
The frequency distribution refers to a data set of 30 screw lengths. The screws had been labeled as having a length of 3-3/4 in. It begins with a lower class limit of 3.720 inches and uses a class width of 0.010 inches. If displayed in a histogram, the data would have a left tail, so the distribution would be "skewed to the left" or "negatively skewed".
What is the class width? Would a Histogram be normal (bell shaped) or described as another term? If so, please define.
In a graph, if one or both axes begin at some value other than zero, the differences are exaggerated. This bad graphing method is known as
a nonzero axis.
A conditional probability of an event is
a probability obtained with knowledge that some other event has already occurred
Z-scores are turned into
a standard score. The purpose of z-scores is to identify and describe the exact location of each score in a distribution & to standardize an entire distribution to understand & compare scores from different tests.
Measure of center
a value at the center or middle of a data set
measure of center
a value at the center or middle of a data set. There are several different ways to determine the center, so there are different definitions of measures of center, including the mean, median, mode, and midrange
numerical summary of data is said to be resistant if...
extreme values (very large or small) relative to the data do not affect its value substantially
IQ scores that separate the unusual IQ scores from those that are usual
find the minimum and maximum usual values from the mean and SD: mean − 2(SD) and mean + 2(SD)
The heights of the bars of a histogram correspond to ___ values.
frequency
We utilize statistical - to look for the features that reveal some useful or interesting characteristics of the data set.
graphs
The margin of error is _____________ the width of the confidence interval.
half
The table shows the magnitudes of the earthquakes that have occurred in the past 10 years. Use the frequency distribution to construct a histogram. Does the histogram appear to be skewed? If so, identify the type of skewness.
The histogram has a longer right tail, so it is skewed to the right.
right skew
high outliers; most data is on the left. Ex: salaries in the United States; many people make similar amounts of money, but a few people make millions or billions
measures of spread
how far off will we be? includes range, variance, and standard deviation; all are nonresistant to outliers; use with symmetric data with no outliers
Two events A and B are ___ if the occurrence of one does not affect the probability of the occurrence of the other.
independent
Biased samples
internet polls, in which people online can decide whether to respond; mail-in polls, in which subjects can decide whether to reply; telephone call-in polls, in which newspaper, radio, or television announcements ask that you voluntarily call a special number to register your opinion
standard deviation
is a measure of how much data values deviate from the mean. A measure of variability that describes an average distance of every score from the mean.
A parameter
is a numerical summary of a population
A statistic
is a numerical summary of a sample
Population arithmetic mean, μ (pronounced "mew")
is computed using all the individuals in a population. The population mean is a parameter
The Pearson product-moment correlation coefficient only measures ____ relationships.
linear
When performing a linear regression analysis, it is important that the relationship between the two quantitative variables be _______.
linear
The ________ measures the strength of the linear correlation between the paired quantitative x- and y-values in a sample.
linear correlation coefficient r
The __________ measures the strength of the linear correlation between the paired quantitative x- and y- values in a sample.
linear correlation coefficient r
A z score (or standard score or standardized value) is the number of standard deviations, s or σ, that a given value x is above or below the mean, x̄ or μ. The z score is calculated by using one of the equations shown below.
Sample: z = (x − x̄) / s. Population: z = (x − μ) / σ.
left skew
low outliers; most data is on the right side. Ex: GPAs; most people are around the same GPA, and a few are very low
range
maximum-minimum
What measure of central tendency best describes the "center" of the distribution when the graph is symmetrical
mean
Population arithmetic mean, and its symbol
mean computed by using all individuals in a population; the symbol is μ ("mew")
Sample arithmetic mean
mean using sample data, symbol is "x-bar"
measures of center
mean, median, mode, midrange
A concrete mix is designed to withstand 3000 pounds per square inch (psi) of pressure. The following data represent the strength of nine randomly selected casts (in psi). 3970, 4100, 3200, 3100, 2950, 3840, 4100, 4030, 3650 Compute the mean, median and mode strength of the concrete (in psi).
mean: 3660 median: 3840 mode: 4100
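The three values above can be checked with Python's standard `statistics` module. This is only a quick verification sketch using the nine strengths from the card.

```python
from statistics import mean, median, mode

strengths = [3970, 4100, 3200, 3100, 2950, 3840, 4100, 4030, 3650]

# mean: sum of the values divided by the number of observations
# median: middle value of the sorted list (n = 9 is odd)
# mode: the value that occurs most often
print(mean(strengths))    # 3660
print(median(strengths))  # 3840
print(mode(strengths))    # 4100
```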
A value at the center or middle of a data set is a(n) _____.
measure of center
A value at the center or middle of a data set is a
measure of center
When an odd number of data values are arranged in order, the _________ is the middle value.
median
What are two measures of the center of a distribution?
median and mean
Which measures of central tendencies are resistant
median and mode
Descriptive statistics
methods and tools that summarize or describe relevant characteristics of data.
inter-quartile range contains the
middle 50% of all observations
Formula for Midrange
midrange= (maximum data value + minimum data value) / 2
A certain group of test subjects had pulse rates with a mean of 84.1 beats per minute and a standard deviation of 14.0 beats per minute. Would it be unusual for one of the test subjects to have a pulse rate of 92.1 beats per minute? Recall that if the standard deviation is known, it can be used to find rough estimates of the minimum and maximum "usual" sample values by using the following equations.
minimum "usual" value = (mean)-2(standard deviation) maximum "usual" value = (mean)+2(standard deviation)
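Applying the two equations above to the pulse-rate question gives a usual range of about 56.1 to 112.1 beats per minute, so 92.1 would not be unusual. A minimal sketch, with a hypothetical helper name `usual_range`:

```python
def usual_range(mean, sd):
    """Range rule of thumb: usual values lie within 2 standard deviations of the mean."""
    return mean - 2 * sd, mean + 2 * sd

low, high = usual_range(84.1, 14.0)
pulse = 92.1

# A value is "unusual" if it falls outside [low, high].
is_unusual = not (low <= pulse <= high)

print(round(low, 1), round(high, 1))  # 56.1 112.1
print(is_unusual)                     # False
```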
The measure of center that is the value that occurs with greatest frequency is the _________
mode
The measure of center that is the value that occurs with the greatest frequency is the -
mode
The measure of center that is the value that occurs with the greatest frequency is the ____.
mode
Sample Means of the same populations are more what?
more consistent (they vary less from sample to sample) than other measures of center
A distribution of data is symmetric if
the left half of its histogram is roughly a mirror image of its right half. In this case, the mean, median, and mode are the same.
Two events A and B are independent if
the occurrence of one does not affect the probability of the occurrence of the other.
Standard deviation allows you
to see how spread out or concentrated the data in a bell curve is; you should be able to pick which graphs go with which μ, x̄ ("x-bar"), and σ
The bars in a histogram _______.
touch. A histogram is a graph consisting of bars of equal width drawn adjacent to each other (without gaps). Therefore, the bars touch.
A data value is considered _______ if its z-score is less than −2 or greater than 2.
unusual
When calculating standard deviation
use the calculator and the handouts for ch. 3
Median is, symbol is
value that lies in the middle of the data when arranged in ascending order. M is the symbol
Mode
the most frequent observation of the variable; a data set can have no mode, a single mode, two modes (bimodal), or more than two (multimodal)
The square of the standard deviation is called the ____________
variance
The square of the st. dev. is called the ___.
variance.
The square of the standard deviation is called the
variance.
Which characteristic of data is a measure of the amount that the data values vary?
variation
The ____ linear relationship is indicated by a correlation coefficient equal to 0.
weakest
When a data value is converted to a standardized scale representing the number of standard deviations the data value lies from the mean, we call the new value a _______.
z-score.
Solve the problem. A variable x has a mean, μ, of 10 and a standard deviation, σ, of 7. Determine the standardized version of x.
z= (x-10)/7
Which z-score has the smallest p-value? z=0.51 z=−1.58 z=−2.37 z=−3.49
z=−3.49
Σxi
{sum of}{all x values}
What is the symbol used to represent the population mean?
μ
What is the formula to find the mean of all values in a population?
μ = Σx / N
What is the symbol for population standard deviation?
σ
Relative Frequency
Proportion of observations within a category
Is the number of hits to a website in a day discrete or continuous?
The random variable is discrete.
Is the number of people in line at a box office to purchase theater tickets discrete or continuous?
The random variable is discrete.
Is the number of people with blood type A in a random sample of 45 people discrete or continuous?
The random variable is discrete.
Determine which of the four levels of measurement (nominal, ordinal, interval, ratio) is most appropriate for the data below. Length of the side of a square in cm
The ratio level of measurement is most appropriate because the data can be ordered, differences (obtained by subtraction) can be found and are meaningful, and there is a natural starting zero point.
explanatory variable; response variable.
The regression equation expresses a relationship between x (called the __________) and y (_______________).
A particular country has 60 total states. If the areas of all 60 states are added and then the sum is divided by 60, the result is 193,950 square kilometers. Determine whether this result is a statistic or a parameter.
The result is a parameter because it describes a characteristic of a population.
Deviation score
score minus the mean = how much the score deviates from the mean.
A histogram aids in analyzing the ___ of the data.
shape of the distribution
Z-scores that separate usual from unusual
-2 to 2
Identify the level of measurement of the data, and explain what's wrong with the calculation: In a survey, the respondents are identified as 100 for "yes", 200 for "no", 300 for "maybe", and 400 for anything else. The average is calculated for 652 respondents and the result is 256.1
-The data are at the nominal level of measurement -Such data are not counts or measures of anything, so it makes no sense to compute their average
Section 2.2 Homework
...
Identify the lower class limits, upper class limits, class width, class midpoints, and class boundaries for the given frequency distribution. Also identify the number of individuals included in the summary. 1. Identify the lower class limits. 2. Identify the upper class limits. 3. Identify the class width. 4. Identify the class midpoints. 5. Identify the class boundaries. 6. Identify the number of individuals included in the summary.
1. 100, 200, 300, 400, 500 2. 199, 299, 399, 499, 599 3. 100 4. 149.5, 249.5, 349.5, 449.5, 549.5 5. 99.5, 199.5, 299.5, 399.5, 499.5, 599.5 6. 140
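A sketch of how answers 3 through 5 above follow from the class limits. The boundary rule used here (halfway across the gap between one class's upper limit and the next class's lower limit) is the usual textbook convention; the variable names are illustrative.

```python
lower_limits = [100, 200, 300, 400, 500]
upper_limits = [199, 299, 399, 499, 599]

# Class width: difference between consecutive lower class limits.
class_width = lower_limits[1] - lower_limits[0]

# Class midpoint: (lower limit + upper limit) / 2 for each class.
midpoints = [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]

# Class boundaries: split the gap between adjacent classes in half,
# and extend by the same half-gap at both ends.
gap = lower_limits[1] - upper_limits[0]
boundaries = [lower_limits[0] - gap / 2] + [hi + gap / 2 for hi in upper_limits]

print(class_width)  # 100
print(midpoints)    # [149.5, 249.5, 349.5, 449.5, 549.5]
print(boundaries)   # [99.5, 199.5, 299.5, 399.5, 499.5, 599.5]
```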
Binomial probability distribution
1. 2 outcomes (yes or no answer) 2. fixed number of trials 3. same probability of success on each trial 4. independent trials (the outcome of one doesn't affect another)
TV viewing example: Compute Quartiles
1. Put the data in ascending order. 2. Find the quartiles. a. Median = Q2: n = 20 data values, so M = (sum of the middle two data values)/2, so Q2 = M = 30.5. b. Bottom half (n = 10): the median of that half is Q1 = (sum of its middle two data values)/2, so Q1 = 23. c. Upper half (n = 10): the median of that half is Q3 = (sum of its middle two data values)/2, so Q3 = 36.5.
How do the five numbers describe data set:
1. The median describes the middle of the data set. 2. Info about the spread: Q3 and Q1 give the IQR, and dividing the IQR by 2 gives a measure of dispersion (variation). 3. xmin and xmax give info about the distribution, including whether or not there are outliers.
5 Number summary
1. Minimum 2. First quartile, Q1 3. Second quartile, Q2 (same as the median) 4. Third quartile, Q3 5. Maximum
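A sketch of computing the five-number summary with the median-of-halves method. The data list is hypothetical, and for an odd number of values this sketch leaves the median out of both halves, which is one common convention (calculators and software may use slightly different quartile rules).

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For odd n, skip the middle value so it belongs to neither half (assumed convention).
    upper = ordered[half + (n % 2):]
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

data = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical data, not from the TV viewing example
print(five_number_summary(data))  # (1, 2.5, 4.5, 6.5, 8)
```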
Cans of regular soda have volumes with a mean of 12.31 oz. and a standard deviation of 0.11 oz. Is it unusual for a can to contain 12.41 oz of soda? Minimum "usual" value= -- oz Maximum "usual" value= --- oz Is 12.41 oz an "unusual" volume?
Minimum: 12.09 oz. Maximum: 12.53 oz. No, because it is between the minimum and maximum "usual" value.
Find the population mean or sample mean as indicated. Sample: 22, 18, 6, 13, 6
13
95% of values in a normal distribution fall within
2 standard deviations of the mean [(95 − 68)/2 = 13.5, so the bands are 13.5% | 34% | 34% | 13.5%]
The following frequency distribution shows the number of years of service for employees of the Alpha Corporation: Class Limits (years of service): frequency (# of employees) 1-5: 5 6-10: 20 11-15: 25 16-20: 10 21-25: 5 26-30: 3 What is the class width?
5
Which of the following is used to summarize two potentially related categorical variables?
A two-way table
Relative frequency. A histogram is a graph consisting of bars of equal width drawn adjacent to each other (without gaps). A relative frequency histogram has the same shape and horizontal scale as a histogram, but the vertical scale is marked with relative frequencies (as percentages or proportions) instead of actual frequencies.
A _______ histogram has the same shape and horizontal scale as a histogram, but the vertical scale is marked with relative frequencies instead of actual frequencies.
scatterplot
A _______ is a plot of paired data (x,y) and is helpful in determining whether there is a relationship between the two variables.
scatterplot
A _____________________ is a plot of paired data (x,y) and is helpful in determining whether there is a relationship between the two variables.
A mutual fund rating agency ranks a fund's performance by using one to five stars. A one-star mutual fund is in the bottom 20% of its investment class; a five-star mutual fund is in the top 20% of its investment class. Interpret the meaning of a four-star mutual fund.
A four-star fund is in the 4th quintile of the funds. That is, it is above the bottom 60%, but below the top 20% of the ranked funds.
The manufacturer of a certain vehicle recovery system claims that the probability that a stolen vehicle using its product will be recovered is 87%. What is the probability that exactly 9 out of 10 independently stolen vehicles with this product will be recovered?
b(n, p, x) = b(10, 0.87, 9) = C(10, 9)(0.87)^9(0.13)^1 ≈ 0.371.
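A sketch that evaluates the binomial probability for n = 10, p = 0.87, x = 9 directly from the binomial formula. The function name `binomial_pmf` is hypothetical; `math.comb` is the standard-library binomial coefficient.

```python
from math import comb

def binomial_pmf(n, p, x):
    """P(exactly x successes in n independent trials, each with success probability p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

prob = binomial_pmf(10, 0.87, 9)
print(round(prob, 4))  # 0.3712
```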
What are Pareto charts?
Bar charts that are sorted from most frequent to least frequent
x̃
Denotes the Median.
Fill in the blank. In a graph, if one or both axes begin at some value other than zero, the differences are exaggerated. This bad graphing method is known as _______.
In a graph, if one or both axes begin at some value other than zero, the differences are exaggerated. This bad graphing method is known as a nonzero axis.
Suppose babies born after a gestation period of 32 to 35 weeks have a mean weight of 2500 grams and a standard deviation of 600 grams while babies born after a gestation period of 40 weeks have a mean weight of 2900 grams and a standard deviation of 390 grams. If a 35-week gestation period baby weighs 2750 grams and a 41-week gestation period baby weighs 3150 grams, find the corresponding z-scores. Which baby weighs more relative to the gestation period?
The baby born in week 41 weighs relatively more since its z-score, 0.64, is larger than the z-score of 0.42 for the baby born in week 35.
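The two z-scores in this answer come from z = (x − μ) / σ applied to each baby's own gestation group. A minimal sketch, with illustrative variable names:

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z_week35 = z_score(2750, mean=2500, sd=600)
z_week41 = z_score(3150, mean=2900, sd=390)

print(round(z_week35, 2))   # 0.42
print(round(z_week41, 2))   # 0.64
print(z_week41 > z_week35)  # True
```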
Categorical Frequency Distribution
The categorical frequency distribution is used for data that can be placed in specific categories, such as nominal- or ordinal-level data. For example, data such as political affiliation, religious affiliation, or major field of study would use categorical frequency distributions.
Which of the following statements correctly describes the complement of event E?
The complement of event E is the set of outcomes which are in the sample space but not in event E.
Find the mean of the data summarized in the given frequency distribution. Compare the computed mean to the actual mean of 51.1 miles per hour.
The computed mean is not close to the actual mean because the difference between the means is more than 5%.
Whenever a data value is less than the mean,_____.
The corresponding z-score is negative.
Are the data reported or measured?
The data appears to be measured. The heights occur with roughly the same frequency or The data appears to be reported. Certain heights occur a disproportionate number of times.
State whether the data described below are discrete or continuous, and explain why: The temperatures (in degrees Fahrenheit) of pizzas fresh from the oven
The data are continuous because the data can take on any value in an interval
If the standard deviation for a data set is zero, what can you conclude about the data?
The data values must all be equal.
Look at #31 chart and answer the questions: Construct a histogram on the calculator. Which part of the histogram depicts flights that arrived early, and which part depicts flights that arrived late?
The two leftmost bars depict flights that have arrived early, and the other bars to the right depict flights that arrived late.
A magazine advertisement claims that wearing a magnetized bracelet will reduce arthritis pain in those who suffer from arthritis. A medical researcher tests this claim with 233 arthritis sufferers randomly assigned to wear either a magnetized bracelet or a placebo bracelet. The researcher records the proportion of each group who report relief from arthritis pain after 6 weeks. After analyzing the data, he fails to reject the null hypothesis. What are valid interpretations of his findings?
There were no statistically significant differences between the magnetized bracelets and the placebos in reducing arthritis pain. There's insufficient evidence that the magnetized bracelets are effective at reducing arthritis pain.
Why are percentages or rates often better than counts for making comparisons?
They take into account possible differences among the sizes of the groups.
Which of the following is NOT true about statistical graph. a. Similar graphs can be constructed in order to compare data sets. b. They utilize areas or volumes for data that are one-dimensional in nature. c. They can be used to consider the overall shape of the distribution. d. They can be used to identify extreme data values.
They utilize areas or volumes for data that are one-dimensional in nature. (Utilizing 2- or 3-dimensional pictures to represent 1-dimensional data is poor practice and distorts the data.)
The interquartile range (IQR) is the difference between the _______ quartile and the _______ quartile.
Third, first
With a height of 70 in, Roger was the shortest president of a particular club in the past century. The club presidents of the past century have a mean height of 75.1 in and a standard deviation of 2.4 in. a. What is the positive difference between Roger's height and the mean? b. How many standard deviations is that [the difference found in part (a)]? c. Convert Roger's height to a z score. d. If we consider "usual" heights to be those that convert to z scores between −2 and 2, is Roger's height usual or unusual?
a. To find the positive difference between Roger's height and the mean, subtract the mean from Roger's height and take the absolute value: |70 in − 75.1 in| = 5.1 in. b. To determine how many standard deviations that is, divide the difference by the standard deviation: 5.1 / 2.4 ≈ 2.13 standard deviations. c. A z score is the number of standard deviations that a given value x is above or below the mean. For a sample, z = (x − x̄) / s; for a population, z = (x − μ) / σ. The club presidents are a population, so z = (70 − 75.1) / 2.4 ≈ −2.13. d. Roger's height is unusual because its z score is less than −2.
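The z score arithmetic in this card can be checked with a few lines of Python. This is a minimal sketch: the names z_score and is_unusual are illustrative, and the cutoff of 2 is the "usual" convention stated in part d of the question.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def is_unusual(z, cutoff=2):
    """Apply the convention that usual values have z scores between -cutoff and +cutoff."""
    return abs(z) > cutoff


# Population values from the card: mu = 75.1 in, sigma = 2.4 in, Roger = 70 in.
roger_z = z_score(70, 75.1, 2.4)
print(f"z = {roger_z:.3f}, unusual: {is_unusual(roger_z)}")
```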
Why is random assignment used to assign people to treatment groups and control groups in a controlled experiment?
To make the groups as similar as possible, minimizing bias.
True or false? A histogram and a relative frequency histogram, constructed from the same data, always have the same basic shape.
True. A relative frequency histogram will have a different scale on the y-axis but the same shape as a regular histogram.
Find the (a) mean, (b) median, (c) mode, and (d) midrange for the given sample data. An experiment was conducted to determine whether a deficiency of carbon dioxide in the soil affects the phenotype of peas. Listed below are the phenotype codes where 1=smooth-yellow, 2=smooth-green, 3=wrinkled-yellow, and 4=wrinkled-green. Do the results make sense? 1 1 4 4 1 4 1 1 3 2 2 2 1 2 (a) The mean phenotype code is -- (b) The median phenotype code is - (c) The mode phenotype code is - (d) The midrange of the phenotype codes is --. Do the measures of center make sense?
a. 2.1 b. 2 c. 1 d. 2.5 Only the mode makes sense since the data is nominal.
Listed below are the top 10 annual salaries (in millions of dollars) of TV personalities. Find the (a) mean, (b) median, (c) mode, and (d) midrange for the given sample data in millions of dollars. Given that these are the top 10 salaries, do we know anything about the salaries of TV personalities in general? Are such top 10 lists valuable for gaining insight into the larger population? 39, 35.5, 34.3, 27.5, 15, 12.4, 11.1, 9.8, 10.4, 7.6 a. The mean is --- b. The median is -- c. Select the correct choice below and fill in any answer boxes in your choice. d. The midrange is -- Given that these are the top 10 salaries, do we know anything about the salaries of TV personalities in general? Are such top 10 lists valuable for gaining insight into the larger population.
a. 20.26 b. 13.7 c. there is no mode d. 23.3 Since the sample values are the 10 highest, they give almost no information about the salaries of TV personalities in general. No, because such top 10 lists represent an extreme subset of the population rather than the larger population.
Listed below are the durations (in hours) of a simple random sample of all flights of a space shuttle. Find the (a) mean, (b) median, (c) mode, and (d) midrange for the given sample data. Is there a duration time that is very unusual? How might that duration time be explained? 69 98 240 197 167 258 187 379 262 240 382 328 226 244 0 a. The mean is -- hours. b. The median is -- hours. c. The mode is --- hours. d. The midrange is --- hours. Is there a duration time that is very unusual? How might that duration time be explained?
a. 218.5 b. 240 c. 240 d. 191 Yes, the time of 0 hours is very unusual. It could represent a flight that was aborted.
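The four measures of center for the shuttle durations can be reproduced with Python's standard statistics module. This is only a sketch for checking the card; the midrange helper is not part of the standard library and is defined here for illustration.

```python
from statistics import mean, median, multimode


def midrange(values):
    """Midpoint between the smallest and largest values."""
    return (min(values) + max(values)) / 2


# Flight durations (hours) from the card above.
durations = [69, 98, 240, 197, 167, 258, 187, 379, 262, 240, 382, 328, 226, 244, 0]

center = {
    "mean": round(mean(durations), 1),
    "median": median(durations),
    "mode": multimode(durations),   # list of all most-frequent values
    "midrange": midrange(durations),
}
print(center)
```

Because the midrange uses only the two extremes, the 0-hour flight pulls it well below the mean and median.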
The graph to the right compares teaching salaries of women and men at private colleges and universities. What impression does the graph create? Does the graph depict the data fairly? If not, construct a graph that depicts the data fairly. a.What impression does the graph create? b.Does the graph depict the data fairly?
a. The graph creates the impression that men have salaries that are more than twice the salaries of women. b. No, because the vertical scale does not start at zero.
Identify the symbols used for each of the following: (a) sample standard deviation; (b) population standard deviation; (c) sample variance; (d) population variance. a. The symbol for sample standard deviation is - b. The symbol for population standard deviation is - c. The symbol for sample variance is - d. The symbol for population variance is -
a. s b. σ (sigma) c. s^2 d. σ^2
Arithmetic mean
adding all of the data values and dividing by the number of values
mean
an average; arithmetic mean of a set of data = the measure of center found by adding the data values and dividing the total by the number of data values
the symbol for sample standard deviation is
s
The table below shows the frequency distribution of the rainfall on 52 consecutive Saturdays in a certain city. Use the frequency distribution to construct a histogram. Do the data appear to have a distribution that is approximately normal?
No, it is not symmetric.
What measure of variation is very sensitive to extreme values?
The Range.
frequency
The _______ for a particular class is the number of original values that fall into that class.
Determine whether the data described below are qualitative or quantitative and explain why: The types of climates for different regions (tropical, arid, temperate, etc.)
The data are qualitative because they don't measure or count anything
Listed below are body temperatures (°F) of healthy adults. Why is it that a graph of these data would not be very effective in helping us understand the data? 98.6 98.6 98.0 98.0 99.0 98.4 98.4 98.4 98.4 98.6
The data set is too small for a graph to reveal important characteristics of the data.
Range
The difference between the maximum data value and the minimum data value. Very sensitive to extreme values and isn't as useful as other measures of variation.
Class Width
The difference between two consecutive lower class limits or two consecutive lower class boundaries in a frequency distribution (example pg. 47).
Below are the range and standard deviation for a set of data. Use the range rule of thumb and compare it to the standard deviation listed below. Does the range rule of thumb produce an acceptable approximation? Suppose a researcher deems the approximation as acceptable if it has an error less than 15%. Range = 38 Standard deviation = 11.045
The estimated standard deviation is 9.5 (the range divided by 4: 38 / 4). The error is |9.5 − 11.045| / 11.045 ≈ 14%, which is less than 15%, so the approximation is acceptable.
A community college faculty is negotiating a new contract with the school board. The distribution of faculty salaries is skewed right by several faculty members who make over $100,000 per year. If the faculty want to give the community the impression that they deserve higher salaries, should they advertise the mean or median of their current salaries?
The faculty should use the median to make their argument. The median will be lower than the mean since the mean is influenced by the few extremely high salaries.
Suppose you are testing someone to see whether he or she can tell butter from margarine when it is spread on toast. You use many bite-sized pieces selected randomly, half from buttered toast and half from toast with margarine. The taster is blindfolded. The null hypothesis is that the taster is just guessing and should get about half right. When you reject the null hypothesis when it is actually true, that is often called the first kind of error. The second kind of error is when the null is false and you fail to reject. Report the first kind of error and the second kind of error.
The first kind of error is saying the person can tell butter from margarine when in fact he or she cannot. The second kind of error is saying the person cannot tell butter from margarine when in fact he or she can.
Yes, it is approximately normal. The bars in a Histogram are always touching, and an (approximately) normal Histogram is bell-shaped.
The frequency distribution (above) represents frequencies of actual low temperatures recorded during the course of a 31-day month. Use the frequency distribution histogram to determine if the distribution is approximately normal?
Look at the #46 charts and answer the questions: Applying a loose interpretation of the requirements for a normal distribution, does the data appear to be normally distributed? Why or why not?
The frequency polygon appears to roughly approximate a normal distribution because the frequencies increase to a maximum, then decrease, and the graph is roughly symmetric.
What is an ogive?
A graph that represents the cumulative frequency or cumulative relative frequency for the class
The ___ of a discrete random variable represents the mean value of the outcomes.
expected value
in a variable, is the amount that it changes when the other variable changes by exactly one unit.
What is marginal change?
The value that measures how much variation in the response variable is explained by the explanatory variable is called the _______.
Coefficient of determination
5-number summary
1. Minimum. 2. First quartile, Q1. 3. Second quartile, Q2 (same as the median). 4. Third quartile, Q3. 5. Maximum.
Standardizing a distribution has two steps:
1. The original raw scores are transformed to z-scores. 2. The z-scores are transformed to new X values so that the specified mean (μ) and standard deviation (σ) are attained.
Find the mean for the given sample data. Unless otherwise specified, round your answer to one more decimal place than that used for the observations. The grocery expenses for six families were 55.72, 55.08, 76.11, 54.18, 63.56, 85.72 Compute the mean grocery bill. Round your answer to the nearest cent.
$65.06
Scores of an IQ test have a bell-shaped distribution with a mean of 100 and a standard deviation of 15. Use the empirical rule to determine the following. (a) What percentage of people has an IQ score between 85 and 115? (b) What percentage of people has an IQ score less than 55 or greater than 145? (c) What percentage of people has an IQ score greater than 130?
(a) 68% (b) 0.3% (c) 2.5%
What are three important properties of the Mean?
1. Sample means drawn from the same population tend to vary less than other measures of center. 2. The mean of a data set uses every data value. 3. A disadvantage of the mean is that just one outlier can change the value of the mean substantially.
Explain the meaning of the accompanying percentiles. (a) The 5th percentile of the head circumference of males 3 to 5 months of age in a certain city is 41.5 cm. (b) The 90th percentile of the waist circumference of females 2 years of age in a certain city is 49.8 cm. (c) Anthropometry involves the measurement of the human body. One goal of these measurements is to assess how body measurements may be changing over time. The following table represents the standing height of males aged 20 years or older for various age groups in a certain city in 2015. Based on the percentile measurements of the different age groups, what might you conclude?
(a) 5% of 3- to 5-month-old males have a head circumference that is 41.5 cm or less. (b) 90% of 2-year-old females have a waist circumference that is 49.8 cm or less. (c) At each percentile, the heights generally decrease as the age increases. Assuming that an adult male does not grow after age 20, the percentiles imply that adults born in 1990 are generally taller than adults at the same age who were born in 1950.
sample size
(n); the number of data values
z score
(or standardized value) the number of standard deviations that a given value x is above or below the mean. It converts a value to a standardized scale. Round-off to two decimal places.
3. When you need to find the P that is *greater* than a positive Z or a negative Z you will go to the:
*tail column*. Easy way to remember is it's the only one that doesn't include the mean.
Important Properties of the Midrange
- Because the midrange uses only the maximum and minimum values, it is very sensitive to those extremes. - In practice, the midrange is rarely used, but it has three redeeming features: 1.) The midrange is very easy to compute. 2.) The midrange helps reinforce the very important point that there are several different ways to define the center of a data set. 3.) The value of the midrange is sometimes used incorrectly for the median, so confusion can be reduced by clearly defining the midrange along with the median.
Important Properties of the Mean
- Sample means drawn from the same population tend to vary less than other measures of center. - The mean of a data set uses every data value. - A disadvantage of the mean is that just one extreme value (outlier) can change the value of the mean substantially. (Since the mean cannot resist substantial changes caused by extreme values, we say that the mean is not a resistant measure of center.)
IQ scores are measured with a test designed so that the mean is 93 and the standard deviation is 19. Consider the group of IQ scores that are unusual. What are the z scores that separate the unusual IQ scores from those that are usual? What are the IQ scores that separate the unusual IQ scores from those that are usual? (Consider a value to be unusual if its z score is less than -2 or greater than 2.) The lower z score boundary is? The higher z score boundary is? The lower bound IQ score is? The higher bound IQ score is?
-2 2 55 131
Five-Number Summary
Five numbers used to summarize the data set: 1. SDV (smallest data value) = minimum = xmin 2. Lower quartile = QL = Q1 = P25 3. Middle quartile = median = M = Q2 = P50 4. Upper quartile = QU = Q3 = P75 5. LDV (largest data value) = maximum = xmax
Section 2.3 Homework
...
Section 2.4 Homework
...
z-scores
... Represents the distance that a data value is from the mean in terms of the number of standard deviations. We find it by subtracting the mean from the data value and dividing this result by the standard deviation
A frequency table of grades has five classes (A, B, C, D, F) with frequencies of 3, 10, 14, 8, and 2 respectively. What are the relative frequencies of the five classes?
.08 .27 .38 .22 .05
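A relative frequency is each class frequency divided by the total of all frequencies. The sketch below recomputes the grade example; the dictionary name grades is illustrative.

```python
# Class frequencies from the card: grades A through F.
grades = {"A": 3, "B": 10, "C": 14, "D": 8, "F": 2}

total = sum(grades.values())  # 37 grades in all

# Relative frequency = class frequency / total, rounded to two decimals.
relative = {grade: round(freq / total, 2) for grade, freq in grades.items()}
print(relative)
```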
In a recent year the magnitudes (Richter scale) of 10,594 earthquakes were recorded. The mean is 1.315 and the standard deviation is 0.589. Consider the magnitudes that are unusual. What are the magnitudes that separate the unusual magnitudes from those that are usual? (Consider a value to be unusual if its z score is less than -2 or greater than 2.) What are the magnitudes that separate the unusual magnitudes from those that are usual? The lower bound earthquake magnitude is The higher bound earthquake magnitude is?
.137 2.493
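Both "unusual value" cards (IQ scores and earthquake magnitudes) use the same rule: the boundaries sit two standard deviations on either side of the mean. A minimal sketch, with usual_bounds as an illustrative name:

```python
def usual_bounds(mean, sd, k=2):
    """Values whose z scores are exactly -k and +k; anything outside is 'unusual'."""
    return mean - k * sd, mean + k * sd


# IQ test with mean 93 and standard deviation 19.
iq_low, iq_high = usual_bounds(93, 19)

# Earthquake magnitudes with mean 1.315 and standard deviation 0.589.
eq_low, eq_high = usual_bounds(1.315, 0.589)

print(iq_low, iq_high)
print(round(eq_low, 3), round(eq_high, 3))
```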
Characteristics of mean
1. The mean is relatively reliable. 2. The mean takes every data value into account. 3. The mean is sensitive to outliers.
3 Properties of Standard Scores
1. The mean of a set of z-scores is always 0. 2. The standard deviation of a set of standardized scores is always 1. 3. The distribution of a set of standardized scores has the same shape as the original scores; only the scaling is different.
Listed below are the amounts of mercury (in parts per million, or ppm) found in tuna sushi sampled at different stores. Find the range, variance, and standard deviation for the set of data. What would be the values of the measures of variation if the tuna sushi contained no mercury? 0.93 0.38 0.87 0.59 0.68 0.15 0.41 The range of the sample data is -- ppm. Sample variance = --- ppm^2 Sample standard deviation = ---- ppm What would be the values of the measures of variation if the tuna sushi contained no mercury?
.78 .078 .280 The measures of variation would all be 0.
The sum of the deviations about the mean always equals
0, because the positive deviations of the observations above the mean exactly offset the negative deviations of the observations below it (any small nonzero result comes from rounding)
The data represents the daily rainfall (in inches) for one month. Construct a frequency distribution beginning with a lower class limit of 0.00 and use a class width of 0.20. Does the frequency distribution appear to be roughly a normal distribution? 0.39 0 0 0.28 0 0.56 0 0.18 0 0 1.36 0 0.16 0 0.01 0 0.16 0 0.11 0.42 0 0.01 0 0.27 0 0.11 0 0 0.15 0 Find the frequencies for daily rainfall in ranges: 0.00-0.19 0.20-0.39 0.40-0.59 0.60-0.79 0.80-0.99 1.00-1.19 1.20-1.39 Does the frequency distribution appear to be roughly a normal distribution?
0.00-0.19----------- 24 0.20-0.39------------ 3 0.40-0.59------------ 2 0.60-0.79------------ 0 0.80-0.99------------ 0 1.00-1.19------------ 0 1.20-1.39----------- 1 No, the distribution is not symmetric, the frequencies do not start off low.
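The class counts above can be reproduced by assigning each rainfall value to a class index. This is a sketch under one assumption that is not in the card: values are converted to integer hundredths first, so floating-point rounding cannot push a value across a class limit. The function name is illustrative.

```python
from collections import Counter


def frequency_distribution(values, lower, width, n_classes):
    """Count values per class, with classes starting at `lower` and `width` apart."""
    # Work in integer hundredths to avoid floating-point edge cases at class limits.
    lo, w = round(lower * 100), round(width * 100)
    counts = Counter((round(v * 100) - lo) // w for v in values)
    return [counts.get(i, 0) for i in range(n_classes)]


# Daily rainfall (inches) for one month, from the card above.
rainfall = [0.39, 0, 0, 0.28, 0, 0.56, 0, 0.18, 0, 0, 1.36, 0, 0.16, 0, 0.01,
            0, 0.16, 0, 0.11, 0.42, 0, 0.01, 0, 0.27, 0, 0.11, 0, 0, 0.15, 0]

freqs = frequency_distribution(rainfall, lower=0.00, width=0.20, n_classes=7)
print(freqs)
```

The counts start at their maximum in the first class and drop off immediately, which is why the distribution does not look normal.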
John sets up a one sample z-test for proportions with a significance level of 0.05. He then performs the test and rejects the null hypothesis. The probability he correctly rejected the null hypothesis is 0.80. What is the probability of a Type I Error occurring? 0.05 0.80 0.20 A Type I Error cannot occur when the null hypothesis is rejected.
0.05
Which of the following values of the correlation coefficient indicates the weakest linear relationship between two variables?
0.1 (0 indicates no linear correlation; 1 indicates perfect positive linear correlation)
Compute the coefficient of determination. Round your answer to four decimal places. A regression equation is obtained for a set of data points. It is found that the total sum of squares is 26.961, the regression sum of squares is 15.044, and the error sum of squares is 11.917.
0.5580 (r^2: coefficient of determination = SSR/SST) (Total sum of squares, SST: the total variation in the observed values of the response variable) (error sum of squares, SSE: the variation in the observed values of the response variable not explained by the regression) (Regression sum of squares, SSR: the variation in the observed values of the response variable explained by the regression)
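The relationship among the three sums of squares can be checked directly. In this sketch the variable names sst, ssr, and sse are illustrative; the values come from the card.

```python
sst = 26.961  # total sum of squares
ssr = 15.044  # regression sum of squares (explained variation)
sse = 11.917  # error sum of squares (unexplained variation)

# The explained and unexplained parts should add up to the total.
parts_add_up = abs((ssr + sse) - sst) < 1e-9

# Coefficient of determination: the share of total variation explained.
r_squared = ssr / sst
print(parts_add_up, round(r_squared, 4))
```

Equivalently, r^2 = 1 − SSE/SST, since SSR = SST − SSE.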
Eric randomly surveyed 150 adults from a certain city and asked which team in a contest they were rooting for, either North High School or South High School. From the results of his survey, Eric obtained a 95% confidence interval of (0.52,0.68) for the proportion of all adults in the city rooting for North High. What proportion of the 150 adults in the survey said they were rooting for North High School?
0.60
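The answer follows because the sample proportion sits at the center of the interval. The sketch below assumes the interval has the usual symmetric form, sample proportion plus or minus a margin of error; the function name is illustrative.

```python
def interval_center_and_margin(lower, upper):
    """Recover the point estimate and margin of error from a symmetric interval."""
    center = (lower + upper) / 2
    margin = (upper - lower) / 2
    return center, margin


p_hat, margin = interval_center_and_margin(0.52, 0.68)

# Number of the 150 surveyed adults who said they were rooting for North High.
supporters = round(p_hat * 150)
print(round(p_hat, 2), round(margin, 2), supporters)
```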
Finding quartiles
1) Arrange the data in ascending order 2) Determine the median, M, or second quartile, Q2. 3) Determine the first and third quartiles, Q1 and Q3, by dividing the data set into two halves; the bottom half will be the observations below (to the left of) the location of the median. The first quartile is the median of the bottom half and the third quartile is the median of the top half.
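The steps above can be sketched in Python. The function name is illustrative, and the data are the grooming times from the 5-number summary card elsewhere in this set. Other textbooks and software use slightly different quartile conventions, so results may differ for some data sets.

```python
from statistics import median


def five_number_summary(values):
    """Minimum, Q1, median, Q3, maximum using the median-of-halves method."""
    data = sorted(values)                    # step 1: ascending order
    half = len(data) // 2
    lower = data[:half]                      # observations below the median's location
    upper = data[half + len(data) % 2:]      # observations above it (middle value skipped when n is odd)
    return min(data), median(lower), median(data), median(upper), max(data)


# Minutes spent on hygiene and grooming, from the 5-number summary card.
grooming = [5, 6, 7, 9, 10, 10, 11, 15, 19, 19, 21, 23, 35, 39, 43, 46, 57, 64]
summary = five_number_summary(grooming)
print(summary)
```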
Steps for determining a box plot
1) Determine the lower and upper fences: Lower fence = Q1 - 1.5 (IQR), Upper fence = Q3 + 1.5 (IQR) 2) Draw vertical lines at Q1, M, and Q3. Enclose these lines in a box. 3) Label the lower and upper fences 4) Draw a line from Q1 to the smallest data value that is larger than the lower fence. Draw a line from Q3 to the largest data value that is smaller than the upper fence. 5) Any data values that are outliers (less than the lower fence or greater than the upper fence) get marked with an asterisk (*)
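The fence arithmetic can be sketched as follows, using the space shuttle durations from an earlier card. The function name and the returned dictionary keys are illustrative. Quartiles here come from statistics.quantiles with its default method, which is an assumption: it agrees with the median-of-halves method for this data set but need not for every data set.

```python
from statistics import quantiles


def boxplot_fences(values):
    """Fences, whisker endpoints, and outliers for a boxplot."""
    data = sorted(values)
    q1, _, q3 = quantiles(data, n=4)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if lower <= v <= upper]
    outliers = [v for v in data if v < lower or v > upper]
    return {
        "fences": (lower, upper),
        "whiskers": (inside[0], inside[-1]),  # most extreme values that are not outliers
        "outliers": outliers,
    }


durations = [69, 98, 240, 197, 167, 258, 187, 379, 262, 240, 382, 328, 226, 244, 0]
plot = boxplot_fences(durations)
print(plot)
```

Here Q1 = 167 and Q3 = 262, so the IQR is 95 and the 0-hour flight falls below the lower fence.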
Normal Distribution Characteristics
1) The frequencies start low, then increase to one or two high frequencies, then decrease to a low frequency. 2) The distribution is approximately symmetric, with frequencies preceding the maximum being roughly a mirror image of those that follow the maximum (example pg. 50).
Name procedures you could follow to obtain a simple random sample of 5 students?
1)List each name on a separate piece of paper; place them all in a hat and pick five 2) Number the names from 1 to 427 and use a random number table to produce 5 different three digit numbers corresponding to the names selected
How to draw a B&W Plot
1. Determine the five-number summary (xmin, QL, M, QU, xmax). 2. Determine the outliers using the quartiles method. 3. Determine the adjacent values: S = smallest data value that is larger than the LIF; L = largest data value that is smaller than the UIF. (S will be less than QL, and L will be larger than QU.) 4. Draw a horizontal number line and mark QL, M, QU, S, and L. 5. Draw vertical lines at QL, M, and QU, and enclose these lines in a box. 6. Connect QL to S and QU to L with whiskers. 7. Plot outliers: MO with * and EO with o. If the data set does not have outliers (simple B&W plot): S = xmin (smallest data value) and L = xmax (largest data value).
Heights of statistics students were obtained by a teacher as part of an experiment conducted for the class. The last digit of those heights are listed below. Construct a frequency distribution with 10 classes. Based on the distribution, do the heights appear to be reported or actually measured? 1. What can be said about the accuracy of the results? 0 0 0 0 0 0 0 0 0 1 2 2 3 4 4 4 5 5 5 5 5 5 5 5 5 6 6 8 8 8 9 2. Based on the distribution, do the heights appear to be reported or actually measured? 3. What can be said about the accuracy of the results?
1. Frequency: 9 1 2 1 3 9 2 0 3 1 2. The heights appear to be reported because there are disproportionately more 0s and 5s. 3. They are likely not very accurate because they appear to be reported.
Properties of z Scores
1. The number of standard deviations that a given value x is above or below the mean. 2. expressed as numbers with no units of measurement. 3. a data value is unusual if its z score is less than -2 or greater than +2. 4. If an individual data value is less than the mean, its z score is a negative number.
In a boxplot, potential outliers are points that are more than ___ IQRs from the edges of the box.
1.5
Use the magnitude (Richter Scale) of the earthquakes listed in the data set below. Find the mean and median of this data set. Is the magnitude of an earthquake measuring 7.0 on the Richter scale an outlier (data value that is very far away from the others) when considered in the context of the sample data given in this data set? Explain. 0.73 2.49 1.03 0.36 2.34 2.32 2.97 1.34 1.12 2.16 1.69 2.87 0.02 1.17 0.23 0.87 2.31 0.89 2.39 2.58 2.99 1.36 2.31 2.12 1.11 1.96 0.11 0.17 2.15 2.42 0.89 0.43 1.54 2.37 0.13 0.66 2.86 1.77 0.55 2.32 1.38 2.05 1.53 0.55 1.89 0.55 2.85 2.98 2.19 0.15 Find the mean and median of the data set using a calculator or similar data analysis technology. The mean of the data set is ----- The median of the data set is -------- Is the magnitude of an earthquake measuring 7.0 on the Richter scale an outlier when considered in the context of the sample data given?
1.564 1.615 Yes, because this value is very far away from all of the other data values.
Find the sample standard deviation for the given data. Round your final answer to one more decimal place than that used for the observations. The manager of a small dry cleaner employs six people. As part of their personnel file, she asked each one to record to the nearest one-tenth of a mile the distance they travel one way from home to work. The six distances are listed below. 24.6 14.1 39.9 48.0 18.5 17.1
13.78 mi
Six different second-year medical students at Bellevue Hospital measured the blood pressure of the same person. The systolic readings (in mmHg) are listed below. Find the range, variance, and standard deviation for the given sample data. If the subject's blood pressure remains constant and the medical students correctly apply the same measurement technique, what should be the value of the standard deviation? Range= - mmHg Sample variance= --- mmHg^2 Sample standard deviation= --- mmHg What should be the value of the standard deviation?
15 34.4 5.9 Ideally, the standard deviation would be zero because all the measurements should be the same.
The data represents the body mass index (BMI) values for 20 females. Construct a frequency distribution beginning with a lower class limit of 15.0 and use a class width of 6.0. Does the frequency distribution appear to be roughly a normal distribution? 17.7 33.5 26.9 22.5 24.9 28.9 22.8 18.3 27.8 22.6 19.2 22.4 21.2 37.7 40.4 27.7 44.9 30.3 29.1 21.7 Find the frequency for body mass indexes between: 15.0-20.9 21.0-26.9 27.0-32.9 33.0-38.9 39.0-44.9 Does the frequency distribution appear to be roughly a normal distribution?
15.0-20.9 ---------- 3 21.0--26.9 ---------- 8 27.0-32.9 -------------- 5 33.0-38.9 ---------- 2 39.0-44.9 -----------2 No, although the frequencies start low, increase to some maximum, then decrease, the distribution is not symmetric.
Twenty percent of adults in a particular community have at least a bachelor's degree. Suppose x is a binomial random variable that counts the number of adults with at least a bachelor's degree in a random sample of 100 adults from the community. If you are using the binomial probability formula, which of the following is the most efficient way to calculate the probability that fewer than 98 adults have a bachelor's degree, P(x<98)?
1 − P(x=98) − P(x=99) − P(x=100)
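The complement needs only three terms of the binomial formula, while the direct sum needs 98. A short sketch confirms that the two routes agree; binom_pmf is an illustrative helper built on math.comb.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 100, 0.20

# Complement route: three terms.
upper_tail = sum(binom_pmf(k, n, p) for k in (98, 99, 100))
p_fewer_than_98 = 1 - upper_tail

# Direct route: 98 terms, x = 0 through 97.
direct = sum(binom_pmf(k, n, p) for k in range(98))

print(p_fewer_than_98, direct)
```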
A nurse measured the blood pressure of each person who visited her clinic. Following is a relative-frequency histogram for the systolic blood pressure readings for those people aged between 25 and 40. Use the histogram to answer the question. The blood pressure readings were given to the nearest whole number. Given that 300 people were aged between 25 and 40, approximately how many had a systolic blood pressure reading between 140 and 149 inclusive?
24 (.08*300)
Fuel consumption is commonly measured in miles per gallon (mi/gal). An agency designed new fuel consumption tests to be used starting with 2008 car models. Listed below are randomly selected amounts by which the measured MPG ratings decreased because of the new 2008 standards. Find the range, variance, and standard deviation for the sample data. Is the decrease of 4 mi/gal unusual? Why or why not? 2 2 3 1 4 2 4 1 2 2 2 2 1 2 2 2 2 2 2 2 The range of the sample data is - mi/gal. The variance of the sample data is -- The standard deviation of the sample data is --- mi/gal. Is the largest decrease, 4 mi/gal, unusual? Why or why not?
3 .6 .8 The decrease of 4 mi/gal is unusual because it is more than two standard deviations from the mean.
Look at #4 chart and answer the questions: What is the class width? What are the class midpoints? What are the class boundaries?
3 6.45, 9.45, 12.45, 15.45, 18.45 4.95, 7.95, 10.95, 13.95, 16.95, 19.95
Using the information in the table on home sale prices in the city of Summerhill for the month of June, determine the width of each class. Class limits(sale price in thousands of $): Frequency(# homes sold) 80.0-110.9: 2 111.0-141.9: 5 142.0-172.9: 7 173.0-203.9: 10 204.0-234.9: 3 235.0-265.9: 1
31
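Class width is the difference between consecutive lower class limits, not the difference between the upper and lower limits of a single class. A small sketch with the Summerhill limits (variable names are illustrative):

```python
# Lower class limits (sale price in thousands of dollars) from the card.
lower_limits = [80.0, 111.0, 142.0, 173.0, 204.0, 235.0]

# Width = difference between each pair of consecutive lower class limits.
widths = [b - a for a, b in zip(lower_limits, lower_limits[1:])]
print(widths)

# A common mistake is upper limit minus lower limit of one class,
# which falls short of the true class width.
mistaken_width = 110.9 - 80.0
print(mistaken_width)
```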
Given the following frequency distribution, how many data values were more than 28.5? Class Boundaries Frequency -------------------------------------------- 13.5-18.5 4 18.5-23.5 9 23.5-28.5 12 28.5-33.5 15 33.5-38.5 17
32
Listed below are the numbers of manatee deaths caused each year by collisions with watercraft. The data are listed in order for each year of the past decade. Find the range, variance, and standard deviation of the data set. What important feature of the data is not revealed through the different measures of variation? 90 73 83 96 73 87 75 66 98 66 The range of the sample data is -- deaths. The variance of the sample data is --- deaths^2. The standard deviation of the sample data is --- deaths. What important feature of the data is not revealed through the different measures of variation?
32 138.7 11.8 The measures of variation reveal nothing about the pattern over time.
Listed below are the arrival delay times (in minutes) of randomly selected airplane flights from one airport to another. Negative values correspond to flights that arrived early before the scheduled arrival time, and positive value represent lengths of delays. Find the range, variance, and standard deviation for the set of data. Some of the sample values are negative, but can the standard deviation ever be negative? -14 -10 5 4 -32 -11 -5 The range of the sample data is -- minutes. The variance of the sample data is ---- minutes^2 The standard deviation of the sample data is --- minutes. Some of the sample values are negative, but can the standard deviation ever be negative?
37 156.7 12.5 No, because the squared value in the standard deviation formula cannot be negative.
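The range, sample variance, and sample standard deviation for the arrival delays can be checked with the standard statistics module, whose variance and stdev functions use the sample (n − 1) formulas. The wrapper name variation_summary is illustrative.

```python
from statistics import stdev, variance


def variation_summary(sample):
    """Range, sample variance, and sample standard deviation, rounded to one decimal."""
    return {
        "range": max(sample) - min(sample),
        "variance": round(variance(sample), 1),
        "stdev": round(stdev(sample), 1),
    }


# Arrival delays in minutes; negative values are early arrivals.
delays = [-14, -10, 5, 4, -32, -11, -5]
summary = variation_summary(delays)
print(summary)
```

The negative data values do not matter: every deviation is squared before it is summed, so the variance and its square root cannot be negative.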
Listed below are the durations (in hours) of a simple random sample of all flights of a space shuttle program. Find the range, variance, and standard deviation for the sample data. Is the lowest duration time unusual? Why or why not? 80 96 234 198 164 270 199 370 259 230 380 335 225 247 0 The range of the sample data is -- hours. The variance of the sample data is ----. The standard deviation of the sample data is ---- hours. Is the lowest duration time unusual? Why or why not?
380 10987.3 104.8 Yes, because it is more than two standard deviations below the mean.
Find the mean of the data summarized in the given frequency distribution. Compare the computed mean to the actual mean of 50.8 miles per hour. Speed: 42-45 46-49 50-53 54-57 58-61 Frequency: 29 12 7 4 2 The mean of the frequency distribution is --- miles per hour. Which of the following best describes the relationship between the computed mean and the actual mean?
46.9 The computed mean is not close to the actual mean because the difference between the means is more than 5%.
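The mean of a frequency distribution treats every value in a class as if it sat at the class midpoint. The sketch below recomputes this card and the low-temperature card that follows; the function names and the 5% threshold parameter are illustrative, with the threshold taken from the answers.

```python
def grouped_mean(classes, freqs):
    """Mean estimated from class limits and frequencies using class midpoints."""
    midpoints = [(low + high) / 2 for low, high in classes]
    return sum(m * f for m, f in zip(midpoints, freqs)) / sum(freqs)


def is_close(computed, actual, threshold=0.05):
    """True when the computed mean differs from the actual mean by less than 5%."""
    return abs(computed - actual) / actual < threshold


speed_mean = grouped_mean([(42, 45), (46, 49), (50, 53), (54, 57), (58, 61)],
                          [29, 12, 7, 4, 2])
temp_mean = grouped_mean([(40, 44), (45, 49), (50, 54), (55, 59), (60, 64)],
                         [1, 6, 13, 4, 1])

print(round(speed_mean, 1), is_close(speed_mean, 50.8))
print(round(temp_mean, 1), is_close(temp_mean, 51.4))
```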
A company advertises a mean lifespan of 1000 hours for a particular type of light bulb. If you were in charge of quality control at the factory, would you prefer that the standard deviation of the lifespans for the light bulbs be 5 hours or 50 hours? Why?
5 hours would be preferable since a smaller standard deviation indicates more consistency.
The following are amounts of time (minutes) spent on hygiene and grooming in the morning by survey respondents. Determine the 5-number summary and construct a boxplot for the data given below. 5 6 7 9 10 10 11 15 19 19 21 23 35 39 43 46 57 64 The 5-number summary is - - - - - Make a boxplot on the calculator
5, 10, 19, 39, 64
Find the mean of the data summarized in the given frequency distribution. Compare the computed mean to the actual mean of 51.4 degrees. Low Temp: 40-44 45-49 50-54 55-59 60-64 Frequency: 1 6 13 4 1 The mean of the frequency distribution is --- degrees. Which of the following best describes the relationship between the computed mean and the actual mean?
51.6 The computed mean is close to the actual mean because the difference between the means is less than 5%.
Find the third quartile Q3 of the list of 24 sorted values shown below. 30 32 35 36 37 38 43 47 47 47 48 49 52 55 55 59 59 59 61 64 69 71 75 78 The third quartile Q3 is
60
Use the regression equation to predict the y-value corresponding to the given x-value. Round your answer to the nearest tenth. The regression equation relating attitude rating (x) and job performance rating (y) for ten randomly selected employees of a company is y hat = 11.7+1.02x. Predict the job performance rating for an employee whose attitude rating is 67.
80.0
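Predicting from a regression equation is a direct substitution. In this sketch the function name predict is illustrative; the intercept and slope are the ones given in the card.

```python
def predict(x, intercept=11.7, slope=1.02):
    """Predicted job performance rating (y hat) for attitude rating x."""
    return intercept + slope * x


rating = round(predict(67), 1)
print(rating)

# At x = 0 the prediction is just the intercept.
print(predict(0))
```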
Below are the range and standard deviation for a set of data. Use the range rule of thumb and compare it to the standard deviation listed below. Does the range rule of thumb produce an acceptable approximation? Suppose a researcher deems the approximation as acceptable if it has an error less than 15%. Range= 39 Standard Deviation= 10.949 The estimated standard deviation is ----- Is this an acceptable approximation?
9.750 Yes, because the error of the range rule of thumb's approximation is less than 15%.
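The range rule of thumb and the 15% acceptability check can be written out as below. The function names are illustrative; the 15% limit is the researcher's criterion from the question.

```python
def range_rule_estimate(data_range):
    """Range rule of thumb: standard deviation is roughly range / 4."""
    return data_range / 4


def relative_error(estimate, actual):
    """Size of the error as a fraction of the actual value."""
    return abs(estimate - actual) / actual


estimate = range_rule_estimate(39)
error = relative_error(estimate, 10.949)
acceptable = error < 0.15

print(estimate, round(error, 3), acceptable)
```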
Heights of men on a basketball team have a bell-shaped distribution with a mean of 176 cm and a standard deviation of 5 cm. Using the empirical rule, what is the approximate percentage of the men between the following values? a. 161 cm and 191 cm b. 171 cm and 181 cm a. ---% of the men are between 161 and 191 cm. b. ----% of the men are between 171 cm and 181 cm.
99.7% 68%
Explain the difference between a bar graph and a Pareto chart.
A Pareto chart is a particular type of bar graph in which the bars are drawn in decreasing order of height.
A manufacturer of bolts has a quality-control policy that requires it to destroy any bolts that are more than 2 standard deviations from the mean. The quality-control engineer knows that the bolts coming off the assembly line have mean length of 8 cm with a standard deviation of 0.05 cm. For what lengths will a bolt be destroyed?
A bolt will be destroyed if the length is less than 7.9 cm or greater than 8.1 cm.
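The cutoffs come from mean ± 2 standard deviations. A minimal sketch, with hypothetical helper names `k_sigma_limits` and `is_destroyed`:

```python
def k_sigma_limits(mean, sd, k=2):
    """Return the (low, high) values that lie k standard deviations from the mean."""
    return (mean - k * sd, mean + k * sd)

def is_destroyed(length, mean=8, sd=0.05, k=2):
    """A bolt is destroyed when it falls outside the k-sigma limits."""
    low, high = k_sigma_limits(mean, sd, k)
    return length < low or length > high

low, high = k_sigma_limits(8, 0.05)
print(round(low, 2), round(high, 2))
print(is_destroyed(7.85), is_destroyed(8.02))
```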
A difference between two groups in an observational study that can explain why the outcomes were very different between the groups is called what?
A confounding variable
An indication of no linear relationship between two variables would be:
A correlation coefficient equal to 0
Yes, as the weight increases the highway mileage decreases.
A given data table lists weights (pounds) and highway mileage amounts (mpg) for seven automobiles, and has been formatted into a scatterplot (above). Is there a linear relationship between weight and highway mileage?
Histogram
A histogram is a bar graph in which the horizontal scale represents classes of data values and the vertical scale represents frequencies. The heights of the bars correspond to the frequency values, and the bars are drawn adjacent to each other (without gaps). It shows the shape of the distribution, the location of the center, and the spread of the data, and it helps identify outliers. Use class midpoints to label each bar.
B. The sample is a voluntary response sample, so there is a good chance that the results do not reflect the population.
A magazine ran a survey about a web site for downloading music. Readers could register their responses on the magazine's web site. Identify what is wrong. Choose the correct answer below: A. The magazine has an interest in the survey results, so the source of the survey is questionable. B. The sample is a voluntary response sample, so there is a good chance that the results do not reflect the population. C. The sample is a census, so there is a good chance that the results do not reflect the population. D. It is likely that the survey used a loaded question, so the results of the survey are not reliable.
A. The sample is a voluntary response sample, so there is a good chance that the results do not reflect the population.
A magazine ran a survey about a web site for downloading music. Readers could register their responses on the magazine's web site. Choose the correct answer below. A. The sample is a voluntary response sample, so there is a good chance that the results do not reflect the population. B. It is likely that the survey used a loaded question, so the results of the survey are not reliable. C. The magazine has an interest in the survey results, so the source of the survey is questionable. D. The sample is a census, so there is a good chance that the results do not reflect the population.
How do a parameter and a statistic differ?
A parameter is a numerical measurement of a population; a statistic is a numerical measurement of a sample
What is a placebo and what purpose does it serve in an experiment?
A placebo is a fake treatment that looks like the treatment being tested in the experiment. Placebos blind subjects so they do not know whether or not they are receiving the treatment.
If foreign investment fell by 100%, it would be totally eliminated. It is not possible for it to fall by more than 100%.
A report about the decline of Western investment in third world countries included this: "After years of daily flights, several European airlines halted passenger service. Foreign investment fell 300 percent during the 1990s." What is wrong with this statement?
Systematic Sampling
A researcher selects every 732nd social security number and surveys the corresponding person. Which type of sampling did the researcher use?
What is a voluntary response sample?
A sample in which the subjects themselves decide whether to be included in the study
Fill in the blank. A _______ is a plot of paired data (x,y) and is helpful in determining whether there is a relationship between the two variables.
A scatterplot is a plot of paired data (x,y) and is helpful in determining whether there is a relationship between the two variables.
What is a scatterplot and how does it help us?
A scatterplot is a graph of paired (x, y) quantitative data. It provides a visual image of the data plotted as points, which helps show any patterns in the data.
What is a scatterplot? What type of data is required for a scatterplot? What characteristic of the data can be better understood by looking at a scatterplot?
A scatterplot is a plot of paired quantitative data, and each pair of data is plotted as a single point. A scatterplot requires paired quantitative data. The configuration of the plotted points can help us determine whether there is some relationship between the two variables.
The Pareto chart is more effective because it displays the information in descending order.
A study was conducted to determine how people get jobs. The table below lists data from 400 randomly selected subjects. Compare the pie chart to the Pareto chart given on the left. Can you determine which graph is more effective in showing the relative importance of job sources?
pie chart
a graph that depicts qualitative data as slices of a circle in which the size of each slice is proportional to the frequency count for the category
Measure of center
A value at the center or middle of a data set is a(n) _______
Relative Frequency Distribution
A variation of the basic frequency distribution. In a relative frequency distribution, the frequency of a class is replaced with a relative frequency (a proportion) or a percentage frequency (a percent). The sum of the relative frequencies in a relative frequency distribution must be close to 1 (or 100%). *NOTE: when percentage frequencies are used, the relative frequency distribution is sometimes called a percentage frequency distribution. (example pg.49)
A bar chart and a Pareto chart both use bars to show frequencies of categories of categorical data. What characteristic distinguishes a Pareto chart from a bar chart, and how does that characteristic help us in understanding the data? A bar chart uses bars of equal width to show frequencies of categorical data. The vertical scale represents frequencies or relative frequencies. The horizontal scale identifies the different categories of qualitative data. When one wants a bar chart to draw attention to the more important categories, one can use a Pareto chart, which is a bar chart for categorical data, with the added stipulation that the bars are arranged in descending order according to frequencies. The bars decrease in height from left to right.
A. In a Pareto chart, the bars are always arranged in descending order according to frequencies. The Pareto chart helps us understand data by drawing attention to the more important categories, which have the highest frequencies.
The table below shows the frequency distribution of red blood cell counts in 81 males. Red_blood_cell_count Frequency 3.00-3.49 1 3.50-3.99 6 4.00-4.49 11 4.50-4.99 19 5.00-5.49 20 5.50-5.99 15 6.00-6.49 9 6.50-6.99 3 Use the frequency distribution to construct a histogram. Using a loose interpretation of the requirements for a normal distribution, does the histogram appear to depict data that have a normal distribution? Why or why not?
A. The histogram appears to roughly approximate a normal distribution. The frequencies generally increase to a maximum and then decrease, and the histogram is symmetric.
Which of the following is NOT a characteristic of the mean?
A. The mean is sensitive to outliers. B. The mean is relatively reliable. C. The mean takes every data value into account. D. The mean is called the average by statisticians.<--correct answer
Which of the following is NOT true about statistical graphs?
A. They utilize areas or volumes for data that are one-dimensional in nature.<-- Correct answer B. They can be used to identify extreme data values. C. Similar graphs can be constructed in order to compare data sets. D. They can be used to consider the overall shape of the distribution.
Which of the following is NOT a principle of probability? a. All events are equally likely in any probability procedure. b. The probability of any event is between 0 and 1 inclusive. c. The probability of an impossible event is 0. d. The probability of an event that is certain to occur is 1.
All events are equally likely in any probability procedure.
Frequency Distribution
A table that lists each category (class) of data along with the number of data values (frequency) in that category.
What can be said about a set of data with a standard deviation of 0?
All the observations are the same value.
According to the Empirical Rule, ________ will be within two standard deviations of the mean.
Approximately 95% of the observations
In the 2008 presidential election, 55% of the voters voted for a certain candidate. What is the probability that 75 out of 100 independently chosen voters voted for this candidate?
B(n,p,x) = b(100, .55, 75)
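The notation b(100, .55, 75) stands for the binomial probability of exactly 75 successes in 100 independent trials with success probability 0.55. A minimal sketch of that formula, using only the standard library (the function name `binomial_pmf` is illustrative):

```python
from math import comb

def binomial_pmf(n, p, x):
    """P(X = x) for a binomial distribution: C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

prob = binomial_pmf(100, 0.55, 75)
print(prob)
```

Because 75 is about four standard deviations above the expected count of 55, the printed probability is very small.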
INTERSECTION
WHAT BOTH SETS HAVE IN COMMON IS THE _____
What are two commonly used graphs to display the distribution of a sample of categorical data?
Bar graph and pie chart
Why, in a frequency distribution, do we use the class midpoint when calculating mean?
Because we don't know the exact values that fall into a particular class. So we just pretend that all values are equal to the class midpoint.
Which of the accompanying boxplots likely has the data with the larger standard deviation? Why?
Boxplot II likely has the data with the larger standard deviation because the boxplot appears to have a greater spread, which likely results in a larger standard deviation.
Look at #50 chart and answer the questions: In what way might the graph be deceptive? How much greater is the braking distance of Car A than the braking distance of Car C?
By starting the horizontal axis at 100, the graph cut off portions of the bars. The braking distance of Car A is about 30% greater than the braking distance of Car C.
Identify which type of sampling is used: random, systematic, convenience, stratified, or cluster. To determine customer opinion of their check-in service, American Airlines randomly selects 60 flights during a certain week and surveys all passengers on the flight.
Cluster
Which car would a customer buy based on standard deviation, range, mean, and median?
Car 2, because it has a lower sample standard deviation, hence more predictable gas mileage
The histogram to the right represents the weights (in pounds) of members of a certain high-school math team. What is the class width? What are the approximate lower and upper class limits of the first class? Class width = (max value - min value) / # of classes
Class width is the difference between two consecutive lower class limits (or two consecutive lower class boundaries) in a frequency distribution. The lower (and upper) class limits are the smallest (and largest) numbers that can belong to the different classes. The first lower class limit is approximately 90, and the second lower class limit is approximately 120. Determine the distance between them. 120−90=30 Therefore, the class width is 30. The approximate lower class limit of the first class is the first approximate lower class limit found above (approximately 90). The upper class limit of the first class is approximately equal to the second lower class limit, 120. Therefore, the approximate lower and upper class limits of the first class are 90 and 120, respectively.
Identify the variable as either continuous or discrete: The number of freshmen entering a randomly selected college in a certain year.
Discrete
Identify which type of sampling is being used: To avoid working late, a quality control analyst simply inspects the first 100 items produced in a day.
Convenience
Identify which of these types of sampling is used: random, stratified, systematic, cluster, convenience. "A researcher interviews 19 work colleagues who work in his building."
Convenience
Which of the following is NOT one of three common errors involving correlation? -Correlation does not imply causality. -The conclusion that correlation implies causality. -The use of data based on averages. -Mistaking no linear correlation with no correlation.
Correlation does not imply causality
The probability of event B occurring, given that event A has already occurred.
DESCRIBE WHAT P(B|A) MEANS.
Standard Deviation
Denoted by s, is a measure of how much data values deviate away from the mean. Most common measure of variation in statistics. Usually positive, zero only when all the data values are the same number. Never negative. Larger values of s indicate greater amounts of variation. Can increase dramatically with the inclusion of one or more outliers.
Suppose every student in a class is surveyed and it is reported that 75% of the class plans to take another math class. Is this an example of descriptive or inferential statistics? Explain.
Descriptive statistics; the results of the class sample are described without making any generalizations about the population of all students at the school.
B. The data are qualitative because they don't measure or count anything.
Determine whether the data described below are qualitative or quantitative and explain why. The types of food served by restaurants (Italian, Chinese, fast, etc.) Choose the correct answer below. A. The data are quantitative because they don't measure or count anything. B. The data are qualitative because they don't measure or count anything. C. The data are quantitative because they consist of counts or measurements. D. The data are qualitative because they consist of counts or measurements.
A. The data are qualitative because they don't measure or count anything.
Determine whether the data described below are qualitative or quantitative and explain why. The types of movies (drama, comedy, etc.) Choose the correct answer below. A. The data are qualitative because they don't measure or count anything. B. The data are quantitative because they consist of counts or measurements. C. The data are qualitative because they consist of counts or measurements. D. The data are quantitative because they don't measure or count anything.
The given description corresponds to an observational study.
Determine whether the given description corresponds to an observational study or an experiment. In a study of 413 women with a particular disease, the subjects were photographed daily.
The given value is a PARAMETER for the month because the data collected represent a POPULATION.
Determine whether the given value is a statistic or a parameter. A homeowner measured the voltage supplied to his home on all 30 days of a given month, and the average (mean) value is 113.3 volts.
The given value is a parameter for the month because the data collected represent a population.
Determine whether the given value is a statistic or a parameter. A homeowner measured the voltage supplied to his home on all 30 days of a given month, and the average (mean) value is 139.8 volts.
Statistic because the value is a numerical measurement describing a characteristic of a sample.
Determine whether the given value is a statistic or a parameter. A sample of seniors is selected and it is found that 25% own a computer.
The value is a PARAMETER because the value is a numerical measurement describing a characteristic of a POPULATION (refers to "all").
Determine whether the given value is a statistic or a parameter. In a study of all 3473 professors at a college, it found that 50 % own a television.
The ordinal level of measurement is most appropriate because the data can be ordered, but differences cannot be found or are meaningless.
Determine which of the four levels of measurement (nominal, ordinal, interval, ratio) is most appropriate for the data below. Explain why. Ratings of hotels on a scale from 0 stars to 4 stars.
D. The interval level of measurement is most appropriate because the data can be ordered, differences can be found and are meaningful, and there is no natural starting point.
Determine which of the four levels of measurement (nominal, ordinal, interval, ratio) is most appropriate for the data below. Please explain why. Body temperature in degrees Fahrenheit. Choose the correct answer below. A. The ordinal level of measurement is most appropriate because the data can be ordered, but differences (obtained by subtraction) cannot be found or are meaningless. B. The ratio level of measurement is most appropriate because the data can be ordered, differences (obtained by subtraction) can be found and are meaningful, and there is a natural starting point C. The nominal level of measurement is most appropriate because the data cannot be ordered. D. The interval level of measurement is most appropriate because the data can be ordered, differences can be found and are meaningful, and there is no natural starting point.
Quartiles (most common percentiles) --> resistant to extreme values
Divide data sets into fourths, or four equal parts. The first quartile, denoted Q1, divides the bottom 25% of the data from the top 75%. The second quartile divides the bottom 50% of the data from the top 50%, so the second quartile is equivalent to the 50th percentile, which is equivalent to the median. Finally, the third quartile divides the bottom 75% of the data from the top 25%, so the third quartile is equivalent to the 75th percentile.
Yes, it appears that births occur on the days of the week with frequencies that are about the same.
Does it appear that births occur on the days of the week with equal frequency in the cumulative frequency (above)? Let the frequencies be substantially different if any frequency is at least twice any other frequency.
The histogram has a longer right tail, so the distribution of the data is skewed to the right.
Does the histogram appear to be skewed?
Describe sampling with replacement.
Draw a notecard, note the name, replace the notecard and draw again. It is possible the same student could be picked twice.
For data sets having a distribution that is approximately bell-shaped, _______ states that about 68% of all data values fall within one standard deviation from the mean.
Empirical Rule
In regression, what is predicting outside the range of the x-values from the sample data called?
Extrapolation
Class Width
Finally, the class width for a class in a frequency distribution is found by subtracting the lower (or upper) class limit of one class from the lower (or upper) class limit of the next class
Formula for Weighted Mean
First multiply each weight (w) by the corresponding value (x), then add the products, and finally divide that total by the sum of the weights.
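The three steps above translate directly into code. This is a minimal sketch; the grade points and credit hours in the example are made-up numbers used only to show the arithmetic.

```python
def weighted_mean(values, weights):
    """Weighted mean = sum(w * x) / sum(w)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    products = [w * x for x, w in zip(values, weights)]
    return sum(products) / sum(weights)

# Hypothetical example: grade points weighted by credit hours.
grade_points = [4, 3, 2]
credit_hours = [3, 4, 1]
print(weighted_mean(grade_points, credit_hours))  # (12 + 12 + 2) / 8
```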
Empirical rule
For data sets having a distribution that is approximately bell-shaped these properties apply: About 68% of all values fall within 1 standard deviation of the mean. About 95% of all values fall within 2 standard deviations of the mean. About 99.7% of all values fall within 3 standard deviations of the mean.
EMPIRICAL RULE
For data sets having a distribution that is approximately bell-shaped, _______ states that about 68% of all data values fall within one standard deviation from the mean
Attempting to use the regression equation to make predictions beyond the range of the data is called _______.
extrapolation
The U.S. Department of Housing and Urban Development (HUD) uses the median to report the average price of a home in the United States. Why do you think HUD uses the median?
HUD uses the median because the data are skewed right
Look at the #44 charts and answer the questions: If someone would like to get a job, what seems to be the most effective approach?
Help-wanted ads (H)
1. If the computed linear correlation coefficient r lies in the left tail beyond the leftmost critical value or in the right tail beyond the rightmost critical value (|r| > critical value), reject Ho and conclude that there is sufficient evidence to support the claim of a linear correlation. 2. If r lies between the two critical values (|r| ≤ critical value), fail to reject Ho.
How do we know there is a correlation and when to reject Ho?
3
How many decimals do we round r to?
When examining the shape of a distribution of numerical data, which of the following is not one of the three basic characteristics of a distribution's shape?
How many numbers are in the data set.
VENN DIAGRAM
INTERSECTION, UNION, COMPLEMENT
Researchers wondered if brain size has an effect on a person's IQ. From a sample of 20 individuals, the equation of the least-squares regression line is y = 71.8 + 0.0286x, where x represents the size of a brain in cubic centimeters and y represents IQ. What is the interpretation of the y-intercept?
IQ is predicted to be 71.8 for a brain size of 0 cubic centimeters.
In a typical boxplot, the length of the box indicates which measure of spread?
IQR
COMPLEMENT
IS ALL THE NUMBERS THAT DON'T BELONG TO THE SET.
Class Width: 6 Class Midpoints: 6.95, 12.95, 18.95, 24.95, 30.95 Class Boundaries: 3.95, 9.95, 15.95, 21.95, 27.95, 33.95
Identify the class width, class midpoints, and class boundaries for the given frequency distribution (above).
Lower Class Limits: 25, 30, 35, 40, 45, 50, 55 Upper Class Limits: 29, 34, 39, 44, 49, 54, 59 Class Width: 5 Class Midpoints: 27, 32, 37, 42, 47, 52, 57 Class Boundaries: 24.5, 29.5, 34.5, 39.5, 44.5, 49.5, 54.5, 59.5 Number of individuals included in the summary: 93
Identify the lower class limits, upper class limits, class width, class midpoints, and class boundaries for the given frequency distribution (above). Also identify the number of individuals included in the summary.
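The width, midpoints, and boundaries in this answer all follow from the class limits. A minimal sketch, assuming equal-width classes and using the limits from the answer (the function name `class_summary` is illustrative):

```python
lower_limits = [25, 30, 35, 40, 45, 50, 55]
upper_limits = [29, 34, 39, 44, 49, 54, 59]

def class_summary(lower, upper):
    """Return (class width, class midpoints, class boundaries)."""
    # Width: difference between two consecutive lower class limits.
    width = lower[1] - lower[0]
    # Midpoint: average of a class's lower and upper limits.
    midpoints = [(lo + hi) / 2 for lo, hi in zip(lower, upper)]
    # Boundaries sit halfway across the gap between adjacent classes.
    gap = lower[1] - upper[0]
    boundaries = [lower[0] - gap / 2] + [hi + gap / 2 for hi in upper]
    return width, midpoints, boundaries

width, midpoints, boundaries = class_summary(lower_limits, upper_limits)
print(width)
print(midpoints)
print(boundaries)
```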
Systematic Sampling
Identify the type of sampling used (random, systematic, convenience, stratified, or cluster sampling) in the situation described below. A researcher selects every 762nd social security number and surveys the corresponding person. What type of sampling did the researcher use? Random Convenience Systematic Stratified Cluster
It is questionable that the sponsor is a candy company because this sponsor can be greatly affected by the conclusion.
Identify what is wrong: Several studies showed that after eating chocolate, subjects had increased blood levels of antioxidants. Antioxidants have been associated with decreased risk of heart disease. A candy company financed this research.
Cluster
Identify which type of sampling is used: random, systematic, convenience, stratified, or cluster. To determine customer opinion of their check-in service, American Airlines randomly selects 30 flights during a certain week and surveys all passengers on the flight.
In a graph, if one or both axes begin at some value other than zero, the differences are exaggerated. This bad graphing method is known as _______.
a non-zero axis
Cluster
Identify which type of sampling is used: random, systematic, convenience, stratified, or cluster. To determine customer opinion of their check-in service, American Airlines randomly selects 60 flights during a certain week and surveys all passengers on the flights. Which type of sampling is used? Cluster Stratified Systematic Random Convenience
Determining z-score
If a data value is larger than the mean, the z-score will be positive. If a data value is smaller than the mean, the z-score will be negative. If the data value equals the mean, the z-score will be zero. Z-scores measure the number of standard deviations an observation is above or below the mean. Ex. A z-score of 1.24 is interpreted as "the data value is 1.24 standard deviations above the mean," or GREATER than the mean. Ex. A z-score of -0.5 (or -1/2) means the data value is half a standard deviation LESS than the mean. Ex. A z-score of 0 indicates that the value of the observation is EQUAL to the mean.
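The sign rules above can be checked with a short sketch. The mean of 176 cm and standard deviation of 5 cm are borrowed from the basketball-team heights question earlier in this set; the function name `z_score` is illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

mean_height, sd_height = 176, 5
print(z_score(181, mean_height, sd_height))  # above the mean: positive
print(z_score(171, mean_height, sd_height))  # below the mean: negative
print(z_score(176, mean_height, sd_height))  # equal to the mean: zero
```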
After constructing a relative frequency distribution summarizing IQ scores of college students, what should be the sum of the relative frequencies?
If percentages are used, the sum should be 100%. If proportions are used, the sum should be 1.
Which of the following is NOT a requirement in determining whether there is a linear correlation between two variables? -Any outliers must be removed if they are known to be errors. -If r>1, then there is a positive linear correlation. -The sample of paired data is a simple random sample of quantitative data. -A scatterplot should visually show a straight-line pattern.
If r>1, then there is a positive linear correlation
That the products Zx*Zy tend to be positive. If it's downhill, it's the opposite: the products tend to be negative.
If the z-scores used in Σ(ZxZy) approximate an uphill line, what does this tell us?
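The link between Σ(ZxZy) and the correlation coefficient can be sketched in code, assuming the standard formula r = Σ(Zx·Zy) / (n − 1) with sample standard deviations. The data sets below are made up purely to show an uphill and a downhill pattern.

```python
from statistics import mean, stdev

def correlation_from_z(xs, ys):
    """Linear correlation r computed as sum(zx * zy) / (n - 1)."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)  # sample standard deviations
    products = [((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)

# Hypothetical data: one perfectly uphill set and one perfectly downhill set.
xs = [1, 2, 3, 4, 5]
uphill = [2, 4, 6, 8, 10]
downhill = [10, 8, 6, 4, 2]
print(correlation_from_z(xs, uphill))
print(correlation_from_z(xs, downhill))
```

When the points run uphill, most z-score products are positive and r comes out positive; when they run downhill, the products are mostly negative and so is r.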
No, a graph cannot help to overcome the deficiency. If the sample is a bad sample, there are no graphs or other techniques that can be used to salvage the data.
If we have a large voluntary response sample consisting of weights of subjects who chose to respond to a survey posted on the Internet, can a graph help to overcome the deficiency of having a voluntary response sample?
C. No, a graph cannot help to overcome the deficiency. If the sample is a bad sample, there are no graphs or other techniques that can be used to salvage the data.
If we have a large voluntary response sample consisting of weights of subjects who chose to respond to a survey posted on the Internet, can a graph help to overcome the deficiency of having a voluntary response sample? Choose the correct answer below. A. No, a graph cannot help to overcome the deficiency. Before graphing, all inaccurate values and outliers must be removed from the data set. B. Yes, a graph can help to overcome the deficiency. Certain graphs that hide any specific values in the data, such as pie charts, can be used to hide deficiencies in the sampling technique. C. No, a graph cannot help to overcome the deficiency. If the sample is a bad sample, there are no graphs or other techniques that can be used to salvage the data. D. Yes, a graph can help to overcome the deficiency. Any graph that is given with a sufficiently accurate description of any deficiencies in the sampling technique is no longer considered biased.
A bar chart and a Pareto chart both use bars to show frequencies of categorical data. What characteristic distinguishes a Pareto chart from a bar chart, and how does that characteristic help us in understanding the data?
In a Pareto chart, the bars are always arranged in descending order according to frequencies. The Pareto chart helps us understand data by drawing attention to the more important categories, which have the highest frequencies.
Random Sampling
In a poll conducted by a certain research center, 1175 adults were called after their telephone numbers were randomly generated by a computer, and 34% were able to correctly identify the president. Which type of sampling did the research center use? Cluster sampling Stratified sampling Convenience sampling Systematic sampling Random sampling
If we do not reject the null hypothesis, is it valid to say that we accept the null hypothesis? Why or why not?
No, we have only shown that we do not have enough evidence to reject it.
Random Sampling
In a poll conducted by a certain research center, 1288 adults were called after their telephone numbers were randomly generated by a computer, and 36% were able to correctly identify the secretary of state. Which type of sampling did the research center use? Random Cluster Stratified Systematic Convenience
Fill in the blank. In a _______ distribution, the frequency of a class is replaced with a proportion or percent.
In a relative frequency distribution, the frequency of a class is replaced with a proportion or percent.
A. The given description corresponds to an experiment.
In a study of 442 children with a particular disease, the subjects were given certain drugs to determine if the drugs have an effect on the disease. Does the given description correspond to an observational study or an experiment? A. The given description corresponds to an experiment. B. The given description corresponds to an observational study. C. The given description does not provide enough information to answer this question.
Yes, misconduct appears to be a major factor because the majority of retractions were due to misconduct.
In a study of retractions in biomedical journals: 405 were due to error, 194 were due to plagiarism, 888 were due to fraud, 291 were due to duplications of publications, and 273 had other causes. Does the Pareto chart (above) showing such retractions, appear to show misconduct (fraud, duplication, plagiarism) as a major factor? Please explain.
Which of the following is always true? a. For skewed data, the mode is farther out in the longer tail than the median. b. Data skewed to the right have a longer left tail than right tail. c. The mean and median should be used to identify the shape of the distribution. d. In a symmetric and bell-shaped distribution, the mean, median, and mode are the same.
In a symmetric and bell-shaped distribution, the mean, median, and mode are the same.
Which of the following is always true?
In a symmetric and bell-shaped distribution, the mean, median, and mode are the same
a nonzero axis
In a graph, if one or both axes begin at some value other than zero, the differences are exaggerated. This bad graphing method is known as __ ______________ ________.
Explain the difference between a single-blind and a double-blind experiment.
In a single-blind experiment, the subject does not know which treatment is received. In a double-blind experiment, neither the subject nor the researcher in contact with the subject knows which treatment is received.
A(an) ______ is a person or object that is a member of the population being studied
Individual
IQR
Interquartile range: the spread of the middle 50% of the data. Formula: Q3 − Q1. A measure of spread that is resistant to outliers.
What is a lurking variable?
Is an explanatory variable that was not considered in the study, but that affects the value of the response variable in the study. In addition, lurking variables are typically related to explanatory variables in the study
No. The data values in each class could take on any value between the class limits, inclusive.
Is it possible to identify the exact values of all of the original service times?
Suppose the equation of a least-squares regression line is y = −3.17 −2.4x. What can be said about the correlation coefficient?
It is negative, but its exact value cannot be determined from the given information.
A student randomly sampled 15 senior male students and 15 senior female students and found their grade point average through their junior year. She obtained the accompanying scatterplot. The correlation coefficient between sex and grade point average is approximately -0.254. What does this mean?
It means nothing, as the correlation coefficient should not be interpreted when one or both of the variables are categorical.
In an editorial, the Poughkeepsie Journal printed this statement: "The median price- the price exactly in between the highest and lowest -...." Does this statement correctly describe the median? Why or why not?
No. It describes the midrange, not the median.
Refer to the accompanying data set and use the 30 screw lengths to construct a frequency distribution. Begin with a lower class limit of 2.470 in., and use a class width of 0.010 in. The screws were labeled as having a length of 2 1/2 in.
Length (in.): Frequency. 2.470 - 2.479: 1; 2.480 - 2.489: 7; 2.490 - 2.499: 9; 2.500 - 2.509: 10; 2.510 - 2.519: 3
How to solve for a relative frequency table.
Look at both frequency tables and put 0 where data is missing. Find the total of each table. Divide each class frequency by its table's total and multiply by 100.
Below are 36 sorted ages of an acting award winner. Find Upper P10 using the method presented in the textbook.
Look at notes for print out of help to work out this problem.
Find the third quartile Q3 of the list of 24 sorted values shown below. 27 31 35 35 36 38 39 45 46 48 49 51 52 52 54 56 57 64 68 71 78 79 80 82
Look at notes for print out of help to work out this problem.
population z-score
z = (x − μ) / σ, where μ = population mean and σ = population standard deviation
looking for B in percentile formula
MUST ROUND UP: if the locator is a decimal such as 7.2, use the 8th value in the sorted data. If you land on a whole number, not a decimal, use the average of that value and the one above it.
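The round-up rule on this card can be sketched in a few lines of Python. This is a minimal sketch, assuming the locator is computed as L = (k / 100) * n, which the card does not state explicitly; the function name `percentile_value` is made up for illustration. It is applied to the 24 sorted values from the Q3 card above.

```python
import math

def percentile_value(sorted_values, k):
    """Return the k-th percentile (1 <= k <= 99) of already-sorted data.

    Assumes the locator L = (k / 100) * n, counted in 1-based positions.
    """
    n = len(sorted_values)
    locator = k * n / 100
    if locator == int(locator):
        # Whole number: average the L-th value and the one above it.
        i = int(locator)
        return (sorted_values[i - 1] + sorted_values[i]) / 2
    # Decimal: round up and take the value at that position.
    return sorted_values[math.ceil(locator) - 1]

values = [27, 31, 35, 35, 36, 38, 39, 45, 46, 48, 49, 51,
          52, 52, 54, 56, 57, 64, 68, 71, 78, 79, 80, 82]

q3 = percentile_value(values, 75)  # locator = 18, so average the 18th and 19th values
print(q3)
```

Here the locator is the whole number 18, so Q3 is the average of the 18th and 19th values (64 and 68).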
All methods used for visualizing distributions are based on which of the following?
Make a mark that indicates how many times each value occurred in the data set.
Identify which of these designs is most appropriate for the given experiment: completely randomized design, randomized block design, or matched pairs design. A drug is designed to treat insomnia. In a clinical trial of the drug, amounts of sleep each night are measured before and after subjects have been treated with the drug.
Matched pairs design
Formula for the Mean From a Frequency Distribution
Mean from a frequency distribution: (1) Find the class midpoint of each class. (2) Multiply each frequency by its class midpoint. (3) Add the products; this sum goes on the top of the equation. (4) Divide by the sum of the frequencies.
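As a rough illustration of these steps, the sketch below applies them to the screw-length frequency distribution from the earlier card (classes 2.470 - 2.479 through 2.510 - 2.519 with frequencies 1, 7, 9, 10, 3). The variable names are illustrative only.

```python
# (lower limit, upper limit, frequency) for each class
classes = [
    (2.470, 2.479, 1),
    (2.480, 2.489, 7),
    (2.490, 2.499, 9),
    (2.500, 2.509, 10),
    (2.510, 2.519, 3),
]

# Step 1: class midpoint = (lower limit + upper limit) / 2
midpoints = [(low + high) / 2 for low, high, _ in classes]
frequencies = [freq for _, _, freq in classes]

# Steps 2 and 3: multiply each frequency by its midpoint and add the products
numerator = sum(f * x for f, x in zip(frequencies, midpoints))

# Step 4: divide by the sum of the frequencies
grouped_mean = numerator / sum(frequencies)
print(round(grouped_mean, 4))
```

The result is only an approximation of the true mean, because every value in a class is treated as if it sat at the class midpoint.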
An insurance company crashed four cars of the same model at 5 miles per hour. The costs of repair for each of the four crashes were $433, $440, $495, and $207 . Compute the mean, median, and mode cost of repair.
Mean: $393.75 Median: $436.50 Mode: None
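These three values can be checked with Python's standard `statistics` module. This is a small sketch; the "no mode" rule used here (every value occurs exactly once) is the textbook convention assumed by the card, not something the library enforces.

```python
from collections import Counter
from statistics import mean, median

costs = [433, 440, 495, 207]

mean_cost = mean(costs)
median_cost = median(costs)  # average of the two middle values, 433 and 440

# A data set in which no value repeats is treated as having no mode.
counts = Counter(costs)
highest = max(counts.values())
modes = [] if highest == 1 else [v for v, c in counts.items() if c == highest]

print(mean_cost, median_cost, modes or "None")
```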
A value at the center or middle of a data set is an _________?
Measure of center
There are many potential pitfalls that can cause problems when analyzing data. Which of these choices are not classified as a potential pitfall? Order of survey questions Nonresponse Self-reported data Measured data
Measured data
What is an observational study?
Measures the value of the response variable without attempting to influence the value of either the response or explanatory variables
The value that would be right in the middle if you were to sort the data from smallest to largest is called the ______.
Median.
descriptive
Methods used that summarize or describe characteristics of data are called _______ statistics
A linear regression line was constructed relating two variables, x, and y where X is the independent variable and y is the response (dependent variable). The slope was found to be 20 and the intercept was found to be -4. Based on this information, predict the value of y when x = 2
NOT 12 ANSWER: C. 36
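The prediction follows from the regression equation ŷ = intercept + slope · x. A minimal sketch, with a hypothetical helper name `predict`:

```python
def predict(x, slope=20, intercept=-4):
    """Predicted y from a least-squares line: y-hat = intercept + slope * x."""
    return intercept + slope * x

y_hat = predict(2)  # -4 + 20 * 2
print(y_hat)
```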
In publishing the results of some research work, the following values of the correlation coefficient were listed. Which one would appear to be incorrect? A. 1.2 B. 0.90 C. -0.8 D. 0
NOT C. (-0.8) ANSWER: A. 1.2
Look at #29 chart and answer the questions: Construct a histogram on the calculator. Do the data appear to have a distribution that is approximately normal?
No, it is not symmetric
Is it possible to identify the exact values of all of the original service times?
No, the data values in each class could take on any value between the class limits, inclusive.
Look at #5 chart and answer the question: Does the frequency distribution appear to have a normal distribution?
No, the distribution does not appear to be normal.
A magazine published a list consisting of the state tax on each gallon of gas. If we add the 50 state tax amounts and then divide by 50, we get 27.3 cents. Is the value of 27.3 cents the mean amount of state sales tax paid by all U.S. drivers? Why or why not?
No, the value of 27.3 cents is not the mean because the 50 amounts are all weighted equally in the calculation, but some states consume more gas than others, so the mean amount of state sales tax should be calculated using a weighted mean.
If we find that there is a linear correlation between the concentration of carbon dioxide in our atmosphere and the global temperature, does that indicate that changes in the concentration of carbon dioxide cause changes in the global temperature?
No. The presence of a linear correlation between two variables does not imply that one of the variables is the cause of the other variable.
A psychology student wishes to investigate differences in political opinions between business majors and political science majors at her college. She randomly selects 100 students from the 260 business majors and 100 students from the 180 political science majors. Does this sampling plan result in a random sample? Simple random sample? Explain.
No; no. The sample is not random because political science majors have a greater chance of being selected than business majors. It is not a simple random sample because some samples are not possible, such as a sample consisting of 50 business majors and 150 political science majors.
Can the variance of a data set ever be negative? Explain.
No; since the variance is based on the squared deviations from the mean and N, it cannot be negative.
A marketing firm does a survey to find out how many people use a product. To accomplish this, they select a random sample of one hundred consumers and record how many use the product. Is this an observational study or an experiment?
Observational study
Suppose that you need to create a list of n values that have a specific known mean. Some of the n values can be freely selected. How many of the n values can be freely assigned before the remaining values are determined? (The result is referred to as the number of degrees of freedom.)
Of the n values, n−1 can be freely selected because the remaining value(s) can be expressed in terms of the assigned values and the known mean.
Comparing deviations
Only compare two sample standard deviations when the sample means are approximately the same.
Determine which of the four levels of measurement is most appropriate: Students' grades, A, B, or C, on a test. Interval Nominal Ordinal Ratio
Ordinal
In your own words, describe to someone who knows only a little statistics how to recognize when an observation is an outlier. What action(s) should be taken with an outlier?
Outliers are observed values far from the main group of data. In a histogram they are separated from the others by space. Outliers must be looked at in closer context to know how to treat them. If they are mistakes, they might be removed or corrected. If they are not mistakes, you might do the analysis twice, once with and once without the outliers.
Which measure of the center (mean or median) is more resistant to outliers, and what does "resistant to outliers" mean?
The median is more resistant to outliers than the mean. "Resistant to outliers" means that extreme values have little effect on the measure: a few very large or very small observations pull the mean toward them but barely change the median.
After inspecting all of 55,000 kg of meat stored at the Wurst Sausage Company, it was found that 45,000 kg of the meat was spoiled. Is this value a statistic or a parameter?
Parameter
Determine whether the given value is a statistic or a parameter. Thirty percent of all dog owners poop scoop after their dog. Statistic Parameter
Parameter
Determine whether the given value is a statistic or a parameter: In a study of all 3153 seniors at a college, it is found that 50% own a computer
Parameter because the value is a numerical measurement describing a characteristic of a population
The most common correlation coefficient, called the ____ correlation coefficient, measures the strength of the linear association between variables.
Pearson product-moment
Among fatal plane crashes that occurred during the past 70 years, 269 were due to pilot error, 54 were due to other human error, 665 were due to weather, 85 were due to mechanical problems, and 479 were due to sabotage. Construct the relative frequency distribution. What is the relative frequency for pilot error, other human error, weather, mechanical problems, and sabotage? Round to one decimal place. What is the most serious threat to aviation safety, and can anything be done about it?
Pilot Error: 17.3% Other Human Error: 3.5% Weather: 42.8% Mechanical problems: 5.5% Sabotage: 30.9% Weather is the most serious threat to aviation safety. Weather monitoring systems could be improved.
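The relative frequencies above come from dividing each count by the total number of crashes (1552) and multiplying by 100. A short sketch of that arithmetic, using the counts from the question:

```python
crash_counts = {
    "Pilot error": 269,
    "Other human error": 54,
    "Weather": 665,
    "Mechanical problems": 85,
    "Sabotage": 479,
}

total = sum(crash_counts.values())

# Relative frequency (%) = class frequency / total * 100, rounded to one decimal place
relative = {cause: round(count / total * 100, 1)
            for cause, count in crash_counts.items()}

for cause, pct in relative.items():
    print(f"{cause}: {pct}%")
```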
When we used the z-score method, we found that 77 was the only outlier, and it was an extreme one. But, what if we use the quartiles method?
Q1=23 Q2= Q3=36.5 IQR=13.5. Inner fences: LIF=2.75, UIF=56.75. Outer fences: LOF=−17.5, UOF=77. Observations between LIF and UIF are usual. Observations between LOF and LIF, or between UIF and UOF, are mild outliers (MO). Observations below LOF or above UOF are extreme outliers (EO). 77 is on the border between MO and EO, so we can consider it either a mild or an extreme outlier. This example shows that the z-score method is better than the quartiles method because it is more specific, whereas the quartiles method leaves you the choice of designating 77 as one or the other.
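The fence values can be reproduced from the quartiles. The sketch below assumes Q1 = 23 and IQR = 13.5 as listed on the card, which implies Q3 = Q1 + IQR = 36.5, and uses the usual conventions of 1.5 · IQR for the inner fences and 3 · IQR for the outer fences. The function names are illustrative.

```python
def fences(q1, q3):
    """Return (LIF, UIF, LOF, UOF) from the first and third quartiles."""
    iqr = q3 - q1
    return (q1 - 1.5 * iqr,  # lower inner fence
            q3 + 1.5 * iqr,  # upper inner fence
            q1 - 3 * iqr,    # lower outer fence
            q3 + 3 * iqr)    # upper outer fence

def classify(value, q1, q3):
    """Label a value as usual, a mild outlier, or an extreme outlier."""
    lif, uif, lof, uof = fences(q1, q3)
    if value < lof or value > uof:
        return "extreme outlier"
    if value < lif or value > uif:
        return "mild outlier"
    return "usual"

q1, q3 = 23, 36.5  # Q3 inferred from Q1 + IQR (assumption)
print(fences(q1, q3))
print(classify(77, q1, q3))
```

Because 77 equals the upper outer fence exactly, the strict inequality used here labels it a mild outlier; using `>=` instead would label it extreme, which is the border case the card describes.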
Population of country of origin is qualitative or quantitative?
Quantitative because it is a numerical measure
_______ divide data sets in fourths.
Quartiles
Identify which of these types of sampling is used: random, systematic, convenience, stratified, or cluster. A large company wants to administer a satisfaction survey to its current customers. Using their customer database, the company randomly selects 60 customers and asks them about their level of satisfaction with the company.
Random
Identify which type of sampling is being used: A pollster uses a computer to generate 500 random numbers, then interviews the voters corresponding to those numbers.
Random
Identify the type of sampling used (random, systematic, convenience, stratified, or cluster sampling) in the situation described below. A woman is selected by a marketing company to participate in a paid focus group. The company says that the woman was selected because she was randomly chosen from all adults.
Random Sampling
Identify the type of sampling used (random, systematic, convenience, stratified, or cluster sampling) in the situation described below. In a poll conducted by a certain research center, 718 adults were called after their telephone numbers were randomly generated by a computer, and 89% were able to correctly identify the attorney general.
Random sampling
______ is used when subjects are assigned to different groups through a process of random selection.
Randomization
What purpose does randomization serve in an experiment?
Randomization ensures that the effect of factors whose levels cannot be controlled is minimized.
In statistics, what is true of randomness?
Randomness is hard to achieve without help from a computer or some other randomizing device.
Below are the jersey numbers of 11 players randomly selected from a football team. Find the range, variance, and standard deviation for the given sample data. What do the results tell us? 26, 49, 12, 77, 55, 59, 40, 92, 70, 99, 27
Range = 87 Sample standard deviation = 27.9 Sample variance = 776.5 Jersey numbers are nominal data that are just replacements for names, so the resulting statistics are meaningless.
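The three measures can be checked with Python's standard `statistics` module, which uses the sample (n − 1) formulas for `variance` and `stdev`. One caution worth noting: squaring the already-rounded standard deviation (27.9² ≈ 778.4) gives a slightly different number than rounding the unrounded variance, so it is safer to compute the variance directly.

```python
from statistics import stdev, variance

jerseys = [26, 49, 12, 77, 55, 59, 40, 92, 70, 99, 27]

data_range = max(jerseys) - min(jerseys)
sample_variance = variance(jerseys)  # divides by n - 1
sample_sd = stdev(jerseys)           # square root of the sample variance

print(data_range, round(sample_sd, 1), round(sample_variance, 1))
```

The arithmetic works, but as the card says, the results are meaningless because jersey numbers are nominal labels.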
The data are discrete because the data can only take on specific values.
State whether the data described below are discrete or continuous, and explain why. The numbers of employees working at different companies.
x̄
Represents the mean of a set of sample values.
N
Represents the number of data values in a population.
A distribution of a variable in which most of the values are relatively small but that also has a few very large values is called ________.
Right-skewed
B. It is questionable that the sponsor is a fitness equipment company because this sponsor can be greatly affected by the conclusion.
Several studies showed that after regular exercise on a treadmill, subjects had lowered blood pressure. High blood pressure has been associated with increased risk of heart disease and stroke. A fitness equipment company financed this research. Choose the correct answer below. A. It is not possible to take accurate measurements. B. It is questionable that the sponsor is a fitness equipment company because this sponsor can be greatly affected by the conclusion. C. The data used in the studies is not reliable because it was not measured by the administrator. D. Since the research is composed of voluntary response samples, there may be key data points missing.
When is a box-and-whisker plot simple or extended?
Simple: the data set does not contain outliers. Extended: the data set contains outliers (mild outliers, MO, or extreme outliers, EO).
The x-values in the table to the right are the nicotine amounts (in mg) in different 100 mm filtered, non-"light" menthol cigarettes. The y-values are the nicotine amounts (in mg) in different king-size nonfiltered, nonmenthol, and non-"light" cigarettes. x: 1.1 0.8 0.9 1.0 1.1; y: 1.1 1.3 1.2 1.1 1.6. If suitable methods of statistics are used, it can be concluded that the average (mean) nicotine amount of the 100 mm filtered, non-"light" menthol cigarettes is less than the average (mean) nicotine amount of the king-size nonfiltered, nonmenthol, and non-"light" cigarettes. Can it be concluded that the first type of cigarette is safe? Why or why not?
Since the first type of cigarette contains less nicotine than the second type of cigarette, the first type is safer. However, it cannot be concluded that it is safe.
Lower Class Limits
The smallest numbers that can belong to each of the different classes.
The data are discrete because the data can only take on specific values.
State whether the data described below are discrete or continuous, and explain why. The numbers of children in families.
A sample of 120 employees of a company is selected, and the average age is found to be 37 years. Is this value a statistic or a parameter?
Statistic
Determine whether the given value is a statistic or a parameter. "A sample of 120 employees of a company is selected, and the average age is found to be 37 years"
Statistic
Finding Quartiles
Step 1 Arrange the data in ascending order. Step 2 Determine the median, M, or second quartile, Q2 . Step 3 Divide the data set into halves: the observations below (to the left of) M and the observations above M. The first quartile, Q1 , is the median of the bottom half, and the third quartile, Q3 , is the median of the top half.
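The three steps above can be sketched as follows. This is a minimal sketch of the median-of-halves method exactly as the card describes it; when the number of observations is odd, the median itself is left out of both halves. The function name is made up for illustration, and the sample data are the 24 sorted values from the Q3 card earlier in the set.

```python
from statistics import median

def quartiles_by_halves(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(data)                 # Step 1: ascending order
    n = len(ordered)
    q2 = median(ordered)                   # Step 2: the median, M
    half = n // 2
    lower = ordered[:half]                 # Step 3: observations below M
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return median(lower), q2, median(upper)

values = [27, 31, 35, 35, 36, 38, 39, 45, 46, 48, 49, 51,
          52, 52, 54, 56, 57, 64, 68, 71, 78, 79, 80, 82]

print(quartiles_by_halves(values))
```

Other textbooks and software use slightly different quartile rules, so results can differ a little for small data sets.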
Identify which of these types of sampling is used: random, stratified, systematic, cluster, convenience. "49,34, and 48 students are selected from the Sophomore, Junior, and Senior classes with 496,348, and 481 students respectively"
Stratified
Identify which of these types of sampling is used: random, systematic, convenience, stratified, or cluster. To determine her breathing rate, Carrie divides up her day into three parts: morning, afternoon, and evening. She then measures her breathing rate at 4 randomly selected times during each part of the day.
Stratified
Identify which type of sampling is being used: 49, 34, and 48 students are selected from the Sophomore, Junior, and Senior classes with 496, 348, and 481 students respectively.
Stratified
To determine her air quality, Carrie divides up her day into three parts, morning, afternoon, and evening. She then measures her air quality at 4 randomly selected times during each part of the day. What type of sampling is this?
Stratified
Which sampling method subdivides the population into categories sharing similar characteristics and then selects a sample from each subdivision?
Stratified
What is meant by confounding?
Confounding in a study occurs when the effects of TWO or MORE explanatory variables are not separated. Therefore, any relation that may exist between an explanatory variable and the response variable may be due to some other variable or variables not accounted for in the study.
A graphical display of a data set is given. State whether the distribution is (roughly) symmetric, right skewed, or left skewed. Two dice were rolled and the sum of the two numbers was recorded. This procedure was repeated 400 times. The results are shown in the relative frequency histogram below.
Symmetric
A tax auditor selects every 1000th income tax return that is received. Identify which of these types of sampling is used Stratified Systematic Simple Random Cluster Convenience
Systematic
Identify the type of sampling used (random, systematic, convenience, stratified, or cluster sampling) in the situation described below. A researcher selects every 221th social security number and surveys the corresponding person.
Systematic
Identify the type of sampling used: random, systematic, convenience, stratified, or cluster. To estimate the percentage of defects in a recent manufacturing batch, a quality control manager at Toshiba selects every 20th laptop that comes off the assembly line starting with the second until she obtains a sample of 100 laptops.
Systematic
Identify which of these types of sampling is used: random, stratified, systematic, cluster, convenience. "A sample consists of every 49th student from a group of 496 students."
Systematic
Identify which of these types of sampling is used: random, stratified, systematic, cluster, convenience. "A tax auditor selects every 1000th income tax return that is received."
Systematic
DISJOINT
THE EVENTS HAVE NO OUTCOMES IN COMMON. WHEN THE QUESTION STATES (A OR B): P(A OR B) = P(A) + P(B)
Open-ended Distribution
An open-ended class has no specific beginning value or no specific ending value. A frequency distribution with an open-ended class is called an open-ended distribution.
5. When you need to find the z-score that forms the boundary between 2 areas under the bell curve i.e. between top 20% & bottom 80% use:
The *Tail column* & find the proportion closest to the percentage e.g. the proportion closest to .2000; the z-score in that row is the z-score that forms that boundary.
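The table lookup described here can be cross-checked in software. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) to find the z-score with 80% of the area below it and 20% in the upper tail. A printed table would give roughly 0.84 for the same boundary.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

tail_proportion = 0.20  # area in the upper tail (top 20%)

# The boundary z-score has (1 - tail) of the area below it.
z_boundary = standard_normal.inv_cdf(1 - tail_proportion)
print(round(z_boundary, 2))
```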
Interquartile range
The ... IQR, is the range of the middle 50% of the observations in a data set. That is, the IQR is the difference between the third and first quartiles and is found using the formula IQR = Q3 − Q1.
Q1 Q2 Q3
The 1st quartile, denoted Q1, divides the bottom 25% of the data from the top 75%. Therefore, the 1st quartile is equivalent to the 25th percentile. The 2nd quartile divides the bottom 50% of the data from the top 50% of the data, so that the 2nd quartile is equivalent to the 50th percentile, which is equivalent to the median. The 3rd quartile divides the bottom 75% of the data from the top 25% of the data, so that the 3rd quartile is equivalent to the 75th percentile.
Look at the #45 charts and answer the questions: Compare the pie chart found above to the Pareto chart given on the left. Can you determine which graph is more effective in showing the relative importance of job sources?
The Pareto chart is more effective.
S=Range/4
The Range Rule of Thumb roughly estimates the standard deviation of a data set as _______
Look at #47 charts and answer the question: Applying a strict interpretation of the requirements for a normal distribution, do the depths appear to be normally distributed? Why or why not?
The frequency polygon does not appear to approximate a normal distribution because the frequencies do not increase to a maximum and then decrease, and the graph is not symmetric.
Frequency Polygon
The frequency polygon is a graph that displays the data by using lines that connect points plotted for the frequencies at the midpoints of the classes. The frequencies are represented by the heights of the points.
The (frequency) distribution appears to be SKEWED TO THE RIGHT (or positively skewed).
The given data represent the number of people from a town, aged 25-64, who subscribe to a certain print magazine. The frequency polygon graph (above) suggests the distribution is ____________ ____ ______ __________?
Determine whether the given value is a statistic or a parameter: A homeowner measured the voltage supplied to his home on all 30 days of a given month, and the average (mean) value is 131.6 volts
The given value is a parameter for the month because the data collected represented a population
Suppose you construct a graph to compare the student populations of the five largest high schools in your city and choose to depict the populations with school buildings of various sizes. If the school buildings are drawn so that the length and the width are each in proportion to the population of the corresponding schools, is the resulting graph misleading? Why or why not?
The graph will be misleading since the student populations are one-dimensional data, but the graph uses a two-dimensional school building to represent it.
In a study designed to test the effectiveness of a medication as a treatment for lower back pain, 1643 patients were randomly assigned to one of three groups: (1) the 547 subjects in the placebo group were given pills containing no medication; (2) 550 subjects were in a group given pills with the medication taken at regular intervals; (3) 546 subjects were in a group given pills with the medication to be taken when needed for pain relief. In what specific way was replication applied in the study?
The group sample sizes are all large so the researchers could see the effects of the treatment.
Heights of statistics students were obtained by a teacher as part of an experiment conducted for the class. The last digit of those heights are listed below. Construct a frequency distribution with 10 classes. Based on the distribution, do the heights appear to be reported or actually measured? What can be said about the accuracy of the results?
The heights appear to be reported because there are disproportionately more 0s and 5s. They are likely not very accurate because they appear to be reported.
Fill in the blank. The heights of the bars of a histogram correspond to _______ values.
The heights of the bars of a histogram correspond to frequency values.
The histogram has a LONGER RIGHT TAIL, so the distribution of the data is SKEWED TO THE RIGHT.
The histogram has a ____________ __________ ________, so the distribution of the data is ____________ ____ ______ __________.
The table shows the magnitudes of the earthquakes that have occurred in the past 10 years. Use the frequency distribution to construct a histogram. Does the histogram appear to be skewed? If so, identify the type of skewness.
The histogram has a longer right tail, so the distribution of the data is skewed to the right.
The histogram represents 17 debate team members.
The histogram (above) represents the weights (in pounds) of members of a certain high-school debate team. How many team members are included in the histogram (above)?
Look at #27 chart and answer the questions: Construct a histogram on the calculator. Does the histogram appear to depict data that have a normal distribution?
The histogram appears to depict a normal distribution. The frequencies generally increase to a maximum and then decrease, and the histogram is roughly symmetric.
Look at #30 charts and answer the questions: Construct a histogram on the calculator. Does the histogram appear to depict data that have a normal distribution?
The histogram appears to roughly approximate a normal distribution. The frequencies generally increase to a maximum and then decrease and the histogram is symmetric.
Look at #28 chart and answer the questions: Construct a histogram on the calculator. Does the histogram appear to depict data that have a normal distribution?
The histogram appears to roughly approximate a normal distribution. The frequencies generally increase to a maximum and then decrease, and the histogram is symmetric.
The histogram represents 27 debate team members.
The histogram below represents the weights (in pounds) of members of a certain high-school debate team. How many team members are included in the histogram?
The histogram to the right represents the weights (in pounds) of members of a certain high-school programming team. How many team members are included in the histogram?
The histogram represents 18 programming team members. (the x-axis shows weight in pounds, but the y-axis is frequency)(you would count the frequency from each bar in the histogram to get total members).
Explain the circumstances for which the interquartile range is the preferred measure of dispersion. What is an advantage that the standard deviation has over the interquartile range?
The interquartile range is preferred when the data are skewed or have outliers. An advantage of the standard deviation is that it uses all the observations in its computation.
Name two measures of the variation of a distribution, and state the conditions under which each measure is preferred for measuring the variability of a single data set.
The interquartile range is preferred when the data is strongly skewed or has outliers. The standard deviation is preferred when the data is relatively symmetric.
Lower Class Limit
The lower class limit represents the smallest data value that can be included in the class.
Which of the following is NOT a characteristic of the mean? -The mean is relatively reliable. -The mean is called the average by statisticians. -The mean is sensitive to outliers. -The mean takes every data value into account.
The mean is called average by statisticians.
A histogram of a set of data indicates that the distribution of the data is skewed right. Which measure of central tendency will likely be larger, the mean or the median? Why?
The mean will be likely larger BECAUSE the extreme values in the right tail tend to pull up the mean in the direction of the tail
What is the Midrange of a data set?
The measure of center that is the value midway between the max and min values in the original data set.
Which statement is NOT true regarding the median?
The median is always one of the values in the data set.
How can you tell from a boxplot if the distribution is symmetric?
The median is in the center of the box, and the left and right whiskers are approximately the same length.
How can you tell from a boxplot if the distribution is symmetric?
The median is in the center of the box, and the left and right whiskers are approximately the same length.
Definition of Mode
The most frequently occurring data value; it is the appropriate measure of center for nominal data. (A data set can have one mode, more than one mode, or no mode.)
If you flip a fair coin repeatedly and the first four results are tails, are you more likely to get heads on the next flip, more likely to get tails again, or equally likely to get heads or tails?
The next flip is equally likely to be heads or tails because each flip is independent of the others and the coin does not "keep track" of the past results.
Determine which of the four levels of measurement (nominal, ordinal, interval, ratio) is most appropriate for the data below. Social security numbers
The nominal level of measurement is most appropriate because the data cannot be ordered.
A teacher wants to find out whether the chance of drawing a Queen is 7.7%. In the last 5 minutes of class, he has all the students draw cards replacing the previous card and shuffling between each draw until the end of class and then report their results to him. Which condition(s) for use of the binomial model is/are not met?
The number of trials is fixed.
Class Boundaries
The numbers used to separate the classes, but without the gaps created by class limits (example pg. 47).
If an observation has a z-score of 0, this means which of the following?
The observation is equal to the mean.
Ogive
The ogive is a graph that represents the cumulative frequencies for the classes in a frequency distribution.
The accompanying data represent the percentage of recent high school graduates (graduated within 12 months before the given year-end) who enrolled in college in the fall. Construct a time-series plot and comment on any trends. Comment on any trends. Choose the correct comment below.
The percentage of high school graduates who enrolled in college has generally increased, though there have been some down years.
Suppose a researcher is testing someone to see whether she or he can tell Soda X from Soda Y, and the researcher is using 22 trials, half with Soda X and half with Soda Y. The null hypothesis is that the person is guessing. About how many should the researcher expect the person to get right under the null hypothesis that the person is guessing?
The person should get 11 right.
B. With a data set that is so small, the true nature of the distribution cannot be seen with a histogram.
The population of ages at inauguration of all U.S. Presidents who had professions in the military is 62, 46, 68, 64, 57. Why does it not make sense to construct a histogram for this data set? Choose the correct answer below. A. Adequate class boundaries for a histogram cannot be found with this data set. B. With a data set that is so small, the true nature of the distribution cannot be seen with a histogram. C. There must be an even number of data values in the data set to create a histogram. D. This data set would yield a histogram that is not bell-shaped.
Gaps
The presence of _______ can show that we have data from two or more different populations (example pg. 51).
A researcher is testing someone who claims to have ESP by having that person predict whether a coin will come up heads or tails. The null hypothesis is that the person is guessing and does not have ESP, and the population proportion of success is 0.50. The researcher tests the claim with a hypothesis test, using a significance level of 0.05. Fill in the blanks below with an accurate statement about the potential conclusion of this test.
The probability of concluding that the person has ESP when in fact she or he does not have ESP is 0.05.
What is the significance level of a test?
The probability of rejecting the null hypothesis when, in fact, the null hypothesis is true
Chebyshev's theorem.
The proportion of any set of data lying within K standard deviations of the mean is always at least 1-1/K^2, where K is any positive number greater than 1. Theorem applies to ANY data set. Results are only approximate. Results are lower limits ("at least"), so it has limited usefulness.
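The bound 1 − 1/K² is easy to tabulate. A minimal sketch, with a hypothetical function name, showing the guaranteed minimum proportion within K standard deviations of the mean for a few values of K:

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of data within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2

for k in (1.5, 2, 3):
    print(k, round(chebyshev_lower_bound(k), 4))
```

For K = 2 the bound is 75%, and for K = 3 it is about 88.9%. Compare this with the empirical rule's 95% and 99.7%, which apply only to bell-shaped data.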
Which of the following is the best explanation to what should happen to the proportion of heads as the number of coin flips increases?
The proportion should get closer to 0.5 as the number of flips increases.
Which is relatively better: a score of 90 on a psychology test or a score of 47 on an economics test? Scores on the psychology test have a mean of 93 and a standard deviation of 12. Scores on the economics test have a mean of 52 and a standard deviation of 4.
The psychology test score is relatively better because its z score is greater than the z score for the economics test score.
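The comparison in this card can be checked with a short sketch. The helper name `z_score` is an assumption for illustration; the numbers come from the question above.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


psych_z = z_score(90, mean=93, sd=12)  # (90 - 93) / 12 = -0.25
econ_z = z_score(47, mean=52, sd=4)    # (47 - 52) / 4  = -1.25

# Both scores are below their means, but the psychology score is closer to
# its mean in standard-deviation units, so it is relatively better.
better = "psychology" if psych_z > econ_z else "economics"
print(psych_z, econ_z, better)
```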
Which is relatively better: a score of 58 on a psychology test or a score of 49 on an economics test? Scores on the psychology test have a mean of 85 and a standard deviation of 10. Scores on the economics test have a mean of 58 and a standard deviation of 3.
The psychology test score is relatively better because its z score is greater than the z score for the economics test score.
Determine whether the description below corresponds to an observational study or an experiment. In a study sponsored by a company, 11,079 people were asked what contributes most to their anxiety, and 37% of the respondents said that it was their health.
The study is an observational study because the survey subjects were not given any treatment.
Which of the following does the confidence level measure?
The success rate of the method of finding confidence intervals
Determine whether the sample described below is a simple random sample. In the last year, 123,423 adults got married in a county. A researcher plans to conduct a survey of 800 of those newlyweds. After obtaining a list of those who got married, he numbers the list from 1 to 123,423, and then he uses a computer to randomly generate 800 numbers between 1 and 123,423. His sample consists of the newlyweds corresponding to the selected numbers.
The sample is a simple random sample because every sample of size 800 has the same chance of being selected.
Determine whether the sample described below is a simple random sample. In order to test for a difference in the way that workers and non-workers purchase magazines, a research institution polls exactly 638 adult workers and 638 adult non-workers randomly selected from adults in the United States.
The sample is not a simple random sample because not every sample of size 1276 has the same chance of being selected.
Determine whether the sample described below is a simple random sample. A quality control engineer selects every 5000th hairdryer that is produced.
The sample is not a simple random sample because every sample of the same size does not have the same chance of being selected.
The histogram to the right represents the weights (in pounds) of members of a certain high-school programming team. How many team members are included in the histogram?
The sample size can be found by adding the heights of all the bars in the histogram.
Which of the following conditions regarding sample size must be met to apply the Central Limit Theorem for Sample Proportions? (a) The sample size is large enough that the sample expects at least 10 successes and 10 failures. (b) The sample size must be at least ½ the population size. (c) The sample size must be at least 1/10 the population size. (d) The sample size is large enough that the sample expects at least 50 successes and 50 failures.
The sample size is large enough that the sample expects at least 10 successes and 10 failures.
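A minimal sketch of this check, assuming the expected counts are computed as n·p successes and n·(1 − p) failures. The function name and the example values are hypothetical.

```python
def meets_large_sample_condition(n: int, p: float, minimum: int = 10) -> bool:
    """True when the sample expects at least `minimum` successes and failures."""
    expected_successes = n * p
    expected_failures = n * (1 - p)
    return expected_successes >= minimum and expected_failures >= minimum


print(meets_large_sample_condition(100, 0.5))  # 50 successes, 50 failures expected
print(meets_large_sample_condition(50, 0.1))   # only 5 successes expected
```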
What does n denote?
The sample size, which is the number of data values.
Describe the sample standard deviation in words rather than with a formula.
The sample standard deviation is the square root of the quotient of the sum of the squared deviations from the mean and (n - 1).
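The verbal description above maps onto code step by step. This is a sketch with made-up data; `sample_std_dev` is an illustrative name, and the standard library's `statistics.stdev` is used only as a cross-check.

```python
import math
import statistics


def sample_std_dev(values):
    """Square root of (sum of squared deviations from the mean) / (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))


data = [2, 4, 4, 5, 10]  # hypothetical sample; mean is 5
# Squared deviations: 9, 1, 1, 0, 25 -> sum 36; 36 / (5 - 1) = 9; sqrt(9) = 3
print(sample_std_dev(data))
print(statistics.stdev(data))
```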
A community college school board is negotiating a new contract with the college faculty. The distribution of faculty salaries is skewed right by several faculty members who make over $100,000 per year. If the school board wants to give the community the impression that the faculty are already overpaid, should they advertise the mean or median of the faculty salaries?
The school board should use the mean to make their argument. The mean will be higher than the median since it will be influenced by the few high salaries.
A histogram aids in analyzing the _______ of the data.
The shape of the distribution
The standard deviation is used in conjunction with the _____ to numerically describe distributions that are bell shaped. The ____ measures the center of the distribution, while the standard deviation measures the ____ of the distribution.
The standard deviation is used in conjunction with the MEAN to numerically describe distributions that are bell shaped. The MEAN measures the center of the distribution, while the standard deviation measures the SPREAD of the distribution.
If all the data values in a set are identical, what can you conclude about the standard deviation?
The standard deviation is zero.
Allie calculated a correlation coefficient of -0.5. She made a mistake in her calculation since the correlation coefficient cannot be negative.
The statement is false.
Alex calculated a correlation coefficient of -1.5. He made a mistake in his calculation since the correlation coefficient has to be between -1 and 1.
The statement is true.
The lengths of the rows are similar to the heights of bars in a histogram; longer rows of data correspond to higher frequencies. In general, a stem-and-leaf plot looks like a histogram rotated 90 degrees, with the row lengths playing the role of the bar heights.
The stem-and-leaf plot (above) shows the test scores 67, 73, 85, 75, 89, 89, 88, 90, 98, 100. How does the stem-and-leaf plot show the distribution of these data?
The average on an exam is 72 with a standard deviation of 6. A student scores a 66 on the exam. Which of the following is correct?
The student's score is 1 standard deviation below the exam average.
Indicate whether the study is an observational study or a controlled experiment. A group of boys is randomly divided into two groups. One group watches violent cartoons for one hour, and the other group watches cartoons without violence for one hour. The boys are then observed to see how many violent actions they take in the next two hours, and the two groups are compared.
The study is a controlled experiment.
Exists between two variables when there is a correlation and the plotted points of paired data result in a pattern that can be approximated by a straight line.
What is linear correlation?
Identify the requirements for a discrete probability distribution.
The sum of the probabilities must equal one. Each probability must be between zero and one inclusive.
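Both requirements can be tested directly. The sketch below assumes the distribution is given as a plain list of probabilities; the function name is hypothetical.

```python
import math


def is_valid_distribution(probabilities) -> bool:
    """Check the two requirements for a discrete probability distribution."""
    each_in_range = all(0 <= p <= 1 for p in probabilities)
    # isclose guards against floating-point rounding in the sum
    sums_to_one = math.isclose(sum(probabilities), 1.0)
    return each_in_range and sums_to_one


print(is_valid_distribution([0.2, 0.5, 0.3]))   # meets both requirements
print(is_valid_distribution([0.6, 0.6, -0.2]))  # sums to 1 but has a negative value
print(is_valid_distribution([0.5, 0.4]))        # sums to 0.9
```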
Two symbols are used for the mean: μ and x̄. Which represents a parameter and which a statistic?
The symbol μ represents a parameter and x̄ represents a statistic.
The distribution appears to be skewed to the right (or positively skewed).
The frequency polygon (above) represents data from the frequency distribution of the number of people from a town aged 25-64 who subscribe to a certain print magazine. Does the graph (above) suggest that the distribution is skewed? If so, how?
Which of the following is not a requirement of the binomial probability distribution? a. Each trial must have all outcomes classified into two categories. b. The trials must be dependent. c. The procedure has a fixed number of trials. d. The probability of a success remains the same in all trials.
The trials must be dependent. (For a binomial distribution, the trials must be independent.)
Which of the following is not a criterion for the binomial distribution?
The trials must be dependent.
When two dice are rolled, is the event "the first die shows a number greater than 2 on top" independent of the event "the second die shows a number greater than 2 on top?"
The two events are independent because the result of the first die does not affect the result of the second die.
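Independence can also be confirmed by enumerating all 36 equally likely outcomes and checking that P(A and B) = P(A) · P(B). This is a small sketch; the `probability` helper is an assumption for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes
outcomes = list(product(range(1, 7), repeat=2))


def probability(event) -> Fraction:
    """Exact probability of an event over the 36 outcomes."""
    favorable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favorable, len(outcomes))


p_a = probability(lambda o: o[0] > 2)                 # first die greater than 2
p_b = probability(lambda o: o[1] > 2)                 # second die greater than 2
p_both = probability(lambda o: o[0] > 2 and o[1] > 2)

independent = p_both == p_a * p_b
print(p_a, p_b, p_both, independent)
```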
The tallest living man has a height of 243 cm. The tallest living woman is 234 cm tall. Heights of men have a mean of 173 cm and a standard deviation of 7 cm. Heights of women have a mean of 162 cm and a standard deviation of 5 cm. Relative to the population of the same gender, who is taller? Explain.
The two heights are from very different populations, so a comparison requires that the heights be standardized by converting them to z scores. The variables z, x, x̄, s, μ, and σ correspond to the z score, data value in question, sample mean, sample standard deviation, population mean, and population standard deviation, respectively. Use the population z score formula, z = (x - μ) / σ. Men: z = (243 - 173) / 7 = 10. Women: z = (234 - 162) / 5 = 14.4. The greater z score corresponds to the greater relative height, so the woman is relatively taller because her z score is greater.
Upper Class Limit
The upper class limit represents the largest data value that can be included in the class.
When applying the Central Limit Theorem for Sample Proportions, which of the following can be substituted for p when calculating the standard error if the value of p is unknown? The value of the sample proportion The value of the sample standard deviation The value of the sample mean None of these. The standard error cannot be computed if the value for p is unknown.
The value of the sample proportion
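A sketch of that substitution, assuming the usual standard error formula for a proportion, sqrt(p(1 − p)/n), with the sample proportion p̂ standing in for the unknown p. The function name and the example values are hypothetical.

```python
import math


def estimated_standard_error(p_hat: float, n: int) -> float:
    """Standard error of a sample proportion, using p_hat in place of unknown p."""
    return math.sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical sample: 50 successes out of 100, so p_hat = 0.5
se = estimated_standard_error(0.5, 100)
print(se)
```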
What determines the exact shape of a Normal distribution?
The values of the mean and the standard deviation
One of the tallest living men has a height of 240 cm. One of the tallest living women is 227 cm tall. Heights of men have a mean of 177 cm and a standard deviation of 6 cm. Heights of women have a mean of 163 cm and a standard deviation of 5 cm. Relative to the population of the same gender, who is taller? Explain.
The woman is relatively taller because the z score for her height is greater than the z score for the man's height.
If your score on your next statistics test is converted to a z score, which of these z scores would you prefer. -2.00, -1.00, 0, 1.00, 2.00? Why?
The z score of 2.00 is most preferable because it is 2.00 standard deviations above the mean and would correspond to the highest of the five different possible test scores.
If your score on your next statistics test is converted to a z score, which of these z scores would you prefer: −2.00, −1.00, 0, 1.00, 2.00? Why?
The z score of 2.00 is most preferable because it is 2.00 standard deviations above the mean and would correspond to the highest of the five different possible test scores.
If your score on your next statistics test is converted to a z score, which of these z scores would you prefer: −2.00, −1.00, 0, 1.00, 2.00? Why?
The z score of 2.00 is most preferable because it is 2.00 standard deviations above the mean and would correspond to the highest of the five different possible test scores.
unitless
The z-score is ... It has mean 0 and standard deviation 1.
If you calculate the z-score for your height in inches, what unit is used on the z-score?
The z-score will have no units.
If someone's gross annual income has a z-score of positive 2, what can be concluded?
Their income is 2 standard deviations above the mean income.
Given below are the numbers of indoor movie theaters, listed in order by row for each year. Use the given data to construct a time-series graph. What is the trend? How does this trend compare to the trend for drive-in movie theaters?
There appears to be an upward trend, unlike drive-in movie theaters, which have a downward trend.
Gina calculated a correlation coefficient between hours studied and grade point average as +0.75. Which of the following is a correct statement based on this correlation coefficient?
There is a fairly strong positive relationship between hours studied and grade point average, indicating that grade point averages tend to be higher for students who study more.
Which of the following is NOT true about statistical graphs?
They utilize areas or volumes for data that are one-dimensional in nature.
For a data set of weights (pounds) and highway fuel consumption amounts (mpg) of six types of automobile, the linear correlation coefficient is found and the P-value is 0.025. Write a statement that interprets the P-value and includes a conclusion about linear correlation.
The P-value indicates that the probability of a linear correlation coefficient that is at least as extreme is 2.5%, which is low, so there is sufficient evidence to conclude that there is a linear correlation between weight and highway fuel consumption in automobiles.
For a data set of brain volumes (cm^3) and IQ scores of four males, the linear correlation coefficient is found and the P-value is 0.336. Write a statement that interprets the P-value and includes a conclusion about linear correlation.
The P-value indicates that the probability of a linear correlation coefficient that is at least as extreme is 33.6%, which is high, so there is not sufficient evidence to conclude that there is a linear correlation between brain volume and IQ score in males.
Quartiles
This divides data sets into fourths, or four equal parts.
A magician claims that he has a fair coin—"fair" because both sides, heads and tails, are equally likely to land face up when the coin is flipped. He tells you that if you flip the coin eight times, the probability of getting eight heads is 1/256. Is this an example of a theoretical probability or an empirical probability? Explain.
This is an example of theoretical probability because it is not based on an experiment.
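The 1/256 figure follows from multiplying the probability of heads for eight independent flips. A minimal sketch using exact fractions:

```python
from fractions import Fraction

p_heads = Fraction(1, 2)      # fair coin, as the magician claims
p_eight_heads = p_heads ** 8  # independent flips multiply: (1/2)^8

print(p_eight_heads)
```

No coin is actually flipped here, which is why the result counts as a theoretical rather than an empirical probability.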
Indicate whether the following study is an observational study or a controlled experiment. Records of patients who have had broken ankles are examined to see whether those who had physical therapy achieved more ankle mobility than those who did not.
This is an observational study. Since the researchers did not assign subjects to the control or treatment group beforehand, they did not satisfy a key feature of controlled experiments.
A company was conducting a survey to investigate people's spending habits and how they may have changed in recent years. One question on the survey was, "Did you spend more/less/the same amount of money this year as you did in 2007, the year the recession began in earnest in this country?" Is this question biased? If so, what answer does it favor?
This question is biased toward "spend less," since it mentions the recent recession. Many people would feel that they should answer that they spent less, since the country is in a recession.
Indicate whether the study is an observational study or a controlled experiment. A researcher was interested in the effects of exercise on academic performance in elementary school children. She went to the recess area of an elementary school and identified some students who were exercising vigorously and some who were not. The researcher then compared the grades of the exercisers with the grades of those who did not exercise.
This study is an observational study.
Stratified
To determine her air quality, Samantha divides up her day into three parts: morning, afternoon, and evening. She then measures her air quality at 33 randomly selected times during each part of the day. What type of sampling is used? Stratified Random Convenience Systematic Cluster
Stratified
To determine her heart rate, a subject divides their day into three parts: morning, afternoon, and evening. They then measure their heart rate at 22 randomly selected times during each part of the day. What type of sampling was used? Random Stratified Cluster Convenience Systematic
Decide if the following statement is true or false and explain your answer. P(Z<2.50) = P(Z ≤ 2.50)
True; these two probabilities are equal because there is no area under the standard normal curve associated with a single value.
When making predictions based on regression lines, which of the following is not listed as a consideration? -Use the regression equation for predictions only if the graph of the regression line on the scatter-plot confirms that the regression line fits the points reasonably well. -Use the regression equation for predictions only if the linear correlation coefficient r indicates that there is a linear correlation between the two variables. -Use the regression line for predictions only if the data go far beyond the scope of the available sample data. -If the regression equation does not appear to be useful for making predictions, the best predicted value of a variable is its point estimate.
Use the regression line for predictions only if the data go far beyond the scope of the available sample data.
An instructor at the College of Lake County is interested in the average number of days that CLC math students are absent from class during a semester. Let X = number of days that a CLC math student is absent. Then X is an example of a:
Variable
The correlation coefficient makes sense only if the trend is linear and the _______.
Variables are numerical
Which characteristic of data is a measure of the amount that the data values vary?
Variation
COMPLEMENT RULE
TO FIND THE PROBABILITY THAT EVENT A DOES NOT OCCUR, USE P(NOT A) = 1 - P(A)
CONTINUOUS DATA
WOULD BE ON A THERMOMETER.
Days before a presidential election, a nationwide random sample of registered voters was taken. Based on this random sample, it was reported that "52% of registered voters plan on voting for Robert Smith with a margin of error of ±3%." The margin of error was based on a 95% confidence level. Fill in the blanks to obtain a correct interpretation of this confidence interval. We are ______ confident that the _______ of registered voters _______ planning on voting for Robert Smith is between _______ and _______.
We are 95% confident that the percentage of registered voters in the nation planning on voting for Robert Smith is between 49% and 55%.
Eric randomly surveyed 150 adults from a certain city and asked which team in a contest they were rooting for, either North High School or South High School. From the results of his survey, Eric obtained the following 95% confidence interval for the proportion of all adults in the city rooting for North High, (0.52,0.68). Interpret this confidence interval.
We are 95% sure that between 52% and 68% of all adults in this city will root for North High School.
Which of the following statements about correlation is true? -We say that there is a positive correlation between x and y if the x-values increase as the corresponding y-values increase. -We say that there is a positive correlation between x and y if there is no distinct pattern in the scatter-plot. -We say that there is a negative correlation between x and y if the x-values increase as the corresponding y-values increase. -We say that there is a positive correlation between x and y if the x-values increase as the corresponding y-values decrease.
We say that there is a positive correlation between x and y if the x-values increase as the corresponding y-values increase.
Construct the cumulative frequency distribution that corresponds to the given frequency distribution. Weight (oz): Number of stones 1.2-1.6: 5 1.7-2.1: 2 2.2-2.6: 5 2.7-3.1: 5 3.2- 3.6: 13
Weight (oz): Cumulative Frequency 1.2-1.6: 5 1.7-2.1: 7 2.2-2.6: 12 2.7-3.1: 17 3.2- 3.6: 30
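The cumulative column is a running total of the class frequencies. A short sketch using the frequencies from the card above:

```python
from itertools import accumulate

classes = ["1.2-1.6", "1.7-2.1", "2.2-2.6", "2.7-3.1", "3.2-3.6"]
frequencies = [5, 2, 5, 5, 13]

# Each cumulative frequency is the sum of this class and all classes before it
cumulative = list(accumulate(frequencies))

for weight_class, total in zip(classes, cumulative):
    print(f"{weight_class}: {total}")
```

The last cumulative frequency, 30, equals the total number of stones.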
1. The value of r is always between -1 and 1 inclusive: -1 ≤ r ≤ 1. 2. If all values of either variable are converted to a different scale, the value of r does not change. 3. The value of r is not affected by the choice of x or y. Interchange all x- and y-values and the value of r will not change. 4. r measures the strength of a linear relationship. It is not designed to measure the strength of a relationship that is not linear. 5. r is very sensitive to outliers in the sense that a single outlier can dramatically affect its value.
What are the properties of r?
1. Assuming that correlation implies causality. 2. Using data based on averages: averages suppress individual variation and may inflate the correlation coefficient. 3. Misreading the property of linearity: if there is no linear correlation, there might still be some other correlation that is not linear.
What are the three most common errors made in interpreting correlation results?
Nonlinear relationship
What can we say about r = -.087
For each 1 cm increase in x (the length), the predicted height of the person increases by 3.22 cm.
What does ŷ = 80.9 + 3.22x tell us?
Expresses a linear relationship between a response variable y and two or more predictor variables (x1, x2, ..., xk). The general form of a multiple regression equation obtained from sample data is: ŷ = b0 + b1x1 + b2x2 + ... + bkxk
What is a multiple regression equation?
For a pair of sample x, y values, the residual is the difference between the observed sample value of y and the y value that is predicted by using the regression equation: residual = observed y - predicted y = y - ŷ
What is a residual?
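A small sketch of a residual calculation, reusing the regression equation ŷ = 80.9 + 3.22x from the earlier card. The observed point (x = 30, y = 180) is made up for illustration.

```python
def predict(x: float) -> float:
    """Predicted y from the regression equation y-hat = 80.9 + 3.22x."""
    return 80.9 + 3.22 * x


def residual(x: float, observed_y: float) -> float:
    """Observed y minus predicted y."""
    return observed_y - predict(x)


# Hypothetical observation: x = 30, observed y = 180
# Predicted y = 80.9 + 3.22 * 30 = 177.5, so the residual is about 2.5
print(residual(30, 180))
```

A positive residual means the observed point lies above the regression line; a negative one means it lies below.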
The number that measures how well paired sample data fit a straight-line pattern when graphed.
What is the linear correlation coefficient r?
[St.Dev. 1] + [St.Dev. 2] = [34.7 + 13.5] = 48.2% probability
What is the probability that a randomly selected time falls between 40 and 42 seconds?
C. If the device eliminated all bike thefts, it would reduce odds of bike theft by 100%, so the 300% figure is misleading.
What is wrong with this statement: An ad for a device used to discourage bike thefts stated: "This device reduces your odds of bike theft by 300 percent." Choose the correct answer below. A. If bike thefts fell by 100%, it would be cut in half. Thus, a decrease of 200% means that it would be totally eliminated, and a decrease of more than 200% is impossible. B. The actual amount of the decrease in bike thefts is less than 100%. C. If the device eliminated all bike thefts, it would reduce odds of bike theft by 100%, so the 300% figure is misleading. D. The statement does not mention the initial amount of bike thefts.
Ideally, the standard deviation would be zero because all the measurements should be the same.
What should be the value of the standard deviation?
Z-SCORE
When a data value is converted to a standardized scale representing the number of standard deviations the data value lies from the mean, we call the new value a
Z-score
When a data value is converted to a standardized scale representing the number of standard deviations the data value lies from the mean, we call the new value a _______.
Which of the following is NOT a property of the standard deviation? -The value of the standard deviation is never negative. -The standard deviation is a measure of variation of all data values from the mean. -When comparing variation in samples with different means, it is good practice to compare the two sample standard deviations. -The units of the standard deviation are the same as the units of the original data.
When comparing variation in samples with different means, it is good practice to compare the two sample standard deviations.
Which of the following is NOT a property of the standard deviation? a. When comparing variation in samples with very different means, it is good practice to compare the two sample standard deviations. b. The value of the standard deviation is never negative. c. The st. dev. is a measure of variation of all data values from the mean. d. The units of the st. dev. are the same as the units of the original data.
When comparing variation in samples with very different means, it is good practice to compare the two sample standard deviations.
Which of the following is NOT a property of the standard deviation?
When comparing variation in samples with very different means, it is good practice to compare the two sample standard deviations.
A histogram of a set of data indicates that the distribution of the data is skewed right. Which measure of central tendency will likely be larger, the mean or the median? Why?
When data are either skewed left or skewed right, there are extreme values in the tail, which tend to pull the mean in the direction of the tail. If the distribution of the data is skewed right, there are large observations in the right tail. These observations tend to increase the value of the mean, while having little effect on the median.
Fill in the blank. When drawings of objects are used to depict data, false impressions can be made. These drawings are called _______.
When drawings of objects are used to depict data, false impressions can be made. These drawings are called pictographs.
When is a Data Set Multimodal?
When more than two data values occur with the same greatest frequency, each one is a mode and the data set is said to be multimodal.
In a graph, if one or both axes begin at some value other than zero, the differences are exaggerated. This bad graphing method is known as ---.
a nonzero axis
When it refers to a normal distribution does the term "normal" have the same meaning as in ordinary language? What criterion can be used to determine whether the data depicted in a histogram have a distribution that is approximately a normal distribution? Is this criterion totally objective, or does it involve subjective judgement?
When referring to a normal distribution, the term normal has a meaning that is different from its meaning in ordinary language. A normal distribution is characterized by a histogram that is approximately bell-shaped. Determination of whether a histogram is approximately bell-shaped does require some subjective judgment.
Round-Off Rule for Measures of Variation
When rounding the value of a measure of variation, carry one more decimal place than is present in the original set of data.
Raw data
When the data are in original form, they are called raw data
Identify when the interquartile range is better than the standard deviation as a measure of dispersion and explain its advantage.
When the distribution is skewed left or right or contains some extreme observations, then the interquartile range is preferred since it is resistant.
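The resistance of the interquartile range can be seen by swapping one value for an extreme one. This sketch uses made-up data and the standard library's `statistics.quantiles` (default method); other quartile conventions give slightly different quartile values but the same qualitative result.

```python
import statistics


def interquartile_range(values) -> float:
    """Q3 - Q1 using the default method of statistics.quantiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


clean = [1, 2, 3, 4, 5, 6, 7, 8, 9]            # hypothetical data
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 900]   # same data, one extreme value

# The IQR ignores the extreme value; the standard deviation does not.
print(interquartile_range(clean), interquartile_range(with_outlier))
print(statistics.stdev(clean), statistics.stdev(with_outlier))
```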
Ungrouped Frequency Distribution
When the range of the data values is relatively small, a frequency distribution can be constructed using single data values for each class. This type of distribution is called an ungrouped frequency distribution
When is a Data Set Bimodal?
When two data values occur with the same greatest frequency, each one is a mode and the data set is bimodal.
A negative z-score indicates a data value is less than the mean.
Whenever a data value is less than the mean,
RANGE
Which measure of variation is very sensitive to extreme values?
Range
Which measure of variation is very sensitive to extreme values?
The mean is called the average by statisticians
Which of the following is NOT a characteristic of the mean?
Quantitative
Which of the following is NOT a level of measurement? Ordinal Nominal Ratio Quantitative
Quantitative
Which of the following is NOT a level of measurement? Quantitative Nominal Ordinal Ratio
C. Utilizing valid statistical methods and correct sampling techniques
Which of the following is NOT a misuse of statistics? A. Concluding that a variable causes another variable because they have some correlation B. Misleading graphs C. Utilizing valid statistical methods and correct sampling techniques D. Making conclusions about a population based on a voluntary response sample
D. Utilizing valid statistical methods and correct sampling techniques
Which of the following is NOT a misuse of statistics? A. Misleading graphs B. Making conclusions about a population based on a voluntary response sample C. Concluding that a variable causes another variable because they have some correlation D. Utilizing valid statistical methods and correct sampling techniques
When comparing variation in samples with very different means, it is good practice to compare the two sample standard deviations.
Which of the following is NOT a property of the standard deviation?
MEAN
Which of the following is NOT a value in the 5-number summary?
B. Quiz scores from a college level statistics course are analyzed to determine student progress. Not voluntary (and no bias).
Which of the following is NOT a voluntary response sample? A. A radio station asks for call-in responses to a question concerning city recycling. B. Quiz scores from a college level statistics course are analyzed to determine student progress. C. A local dentist asks her patients to fill out a questionnaire and mail it back to determine the quality of the care received during an office visit. D. A survey is taken at a mall by asking passersby if they will fill out the survey.
A. Quiz scores from a college level statistics course are analyzed to determine student progress.
Which of the following is NOT a voluntary response sample? A. Quiz scores from a college level statistics course are analyzed to determine student progress. B. A radio station asks for call-in responses to a question concerning city recycling. C. A survey is taken at a mall by asking passersby if they will fill out the survey. D. A local dentist asks her patients to fill out a questionnaire and mail it back to determine the quality of the care received during an office visit.
When thinking about the variability of a categorical distribution, it is sometimes useful to think of the word_______.
diversity
B. In a symmetric and bell-shaped distribution, the mean, median, and mode are the same.
Which of the following is always true? A. For skewed data, the mode is farther out in the longer tail than the median. B. In a symmetric and bell-shaped distribution, the mean, median, and mode are the same. C. The mean and median should be used to identify the shape of the distribution. D. Data skewed to the right have a longer left tail than right tail.
The frequency distribution below shows arrival delays for airplane flights. Arrival delay (min): Frequency (-60)-(-31): 11 (-30)-(-1): 28 0-29: 11 30-59: 0 60-89: 2 Use the frequency distribution to construct a histogram. Which part of the histogram depicts flights that arrived early, and which part depicts flights that arrived late?
The two leftmost bars depict flights that arrived early, and the other bars to the right depict flights that arrived late.
The population of ages at inauguration of all U.S. Presidents who had professions in the military is 62, 46, 68, 64, 57. Why does it not make sense to construct a histogram for this data set?
With a data set that is so small, the true nature of the distribution cannot be seen with a histogram.
Does the result appear to have a normal distribution? Why or why not?
Yes, because the frequencies start low, reach a maximum, then become low again, and are roughly symmetric about the maximum frequency.
Look at #6 chart and answer the question: Does the frequency distribution appear to have a normal distribution? Explain.
Yes, because the frequencies start low, proceed to one or two high frequencies, then decrease to a low frequency, and the distribution is approximately symmetric.
Look at #11 at the charts and answer the question: Does the result appear to have a normal distribution? Why or why not?
Yes, because the frequencies start low, reach a maximum, then become low again, and are roughly symmetric about the maximum frequency.
Does the frequency distribution appear to have a normal distribution? Explain. Temperature (°F): Frequency 35-39: 1 40-44: 3 45-49: 10 50-54: 12 55-59: 10 60-64: 2 65-69: 1
Yes, because the frequencies start low, proceed to one or two high frequencies, then decrease to a low frequency, and the distribution is approximately symmetric.
The graph to the right uses cylinders to represent barrels of oil consumed by two countries. Does the graph distort the data or does it depict the data fairly? Why or why not? If the graph distorts the data, construct a graph that depicts the data fairly.
Yes, because the graph incorrectly uses objects of volume to represent the data.
Look at the #41 charts and answer the questions: Does the configuration of the points appear to suggest that the volumes are from a population with a normal distribution? Are there any outliers?
Yes, the population appears to have a normal distribution because the dotplot resembles a "bell" shape. Yes, the volume of 50 oz appears to be an outlier because it is far away from the other volumes.
An education expert is researching teaching methods and wishes to interview teachers from a particular school district. She randomly selects ten schools from the district and interviews all of the teachers at the selected schools. Does this sampling plan result in a random sample? Simple random sample? Explain
Yes; no. The sample is random because all teachers have the same chance of being selected. It is not a simple random sample because some samples are not possible, such as a sample that includes teachers from schools that were not selected.
Suppose a student earns a 75 on his statistics exam, and his grade has a z-score of 1.5. Since the class did not perform well on the exam, the professor announces that she will adjust the grades by adding 10 points to each score. How will this adjustment change the student's z-score?
The student's z-score will not change, since the adjustment shifts the entire distribution of scores (the mean rises by 10 and the standard deviation is unchanged) but does not change the relative position of the score in the class.
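A quick numerical check of this card, as a minimal sketch. The five exam scores are hypothetical (the original problem gives no class data); the point is only that adding the same constant to every score leaves each z-score where it was.

```python
from statistics import mean, pstdev

def z_score(x, data):
    """Population z-score of x relative to the values in data."""
    return (x - mean(data)) / pstdev(data)

# Hypothetical exam scores (an assumption, not from the original problem).
scores = [55, 60, 65, 70, 75]
curved = [s + 10 for s in scores]    # add 10 points to every score

z_before = z_score(75, scores)       # the student's original score
z_after = z_score(85, curved)        # the same student's curved score
print(round(z_before, 4), round(z_after, 4))
```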
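99.7% within 3 standard deviations
(99.7% - 95%) / 2 = 4.7% / 2 = 2.35%, so about 2.35% of the data lie between 2 and 3 standard deviations from the mean on each side: [2.35% |..|.. () ..|..| 2.35%]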
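The same subtraction can be repeated for every band of the empirical rule. This is a small sketch; the variable names are illustrative, and the 68 / 95 / 99.7 figures are the usual empirical-rule approximations.

```python
# Percent of data within 1, 2, and 3 standard deviations of the mean.
within = {1: 68.0, 2: 95.0, 3: 99.7}

# Each band is split evenly between the left and right side of the mean.
band_0_to_1 = within[1] / 2                  # mean to 1 SD, each side
band_1_to_2 = (within[2] - within[1]) / 2    # 1 SD to 2 SD, each side
band_2_to_3 = (within[3] - within[2]) / 2    # 2 SD to 3 SD, each side
beyond_3 = (100 - within[3]) / 2             # past 3 SD, each side

print(round(band_0_to_1, 2), round(band_1_to_2, 2),
      round(band_2_to_3, 2), round(beyond_3, 2))
```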
Mean from frequency distribution
Σ(f·x) / Σf. First multiply each frequency by its class midpoint, then add the products. Divide that sum by the sum of the frequencies.
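A minimal sketch of that calculation, using the daily low temperature classes (40-42 through 58-60) and frequencies that appear in another card of this set. The result is only an estimate of the mean, because every value in a class is treated as if it sat at the class midpoint.

```python
# Class limits and frequencies from the daily low temperature card.
classes = [(40, 42), (43, 45), (46, 48), (49, 51), (52, 54), (55, 57), (58, 60)]
freqs = [1, 3, 5, 11, 7, 7, 1]

# Class midpoint = (lower limit + upper limit) / 2.
midpoints = [(lo + hi) / 2 for lo, hi in classes]

sum_fx = sum(f * x for f, x in zip(freqs, midpoints))   # sum of f * x
sum_f = sum(freqs)                                      # sum of f
mean_estimate = sum_fx / sum_f

print(sum_fx, sum_f, round(mean_estimate, 1))
```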
median
For a data set, the measure of center (MOC) that is the middle value when the original data values are arranged in order of increasing (or decreasing) magnitude.
Identify the class width, class midpoints, and class boundaries for the given frequency distribution. Daily Low Temperature (°F): 40-42, 43-45, 46-48, 49-51, 52-54, 55-57, 58-60. Frequency: 1, 3, 5, 11, 7, 7, 1. a) What is the class width? b) What are the class midpoints? c) What are the class boundaries?
a) 3 b) 41, 44, 47, 50, 53, 56, 59 c) 39.5, 42.5, 45.5, 48.5, 51.5, 54.5, 57.5, 60.5
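The three answers can be reproduced from the class limits alone. A minimal sketch, assuming equal-width classes with the same gap between each upper limit and the next lower limit:

```python
lower = [40, 43, 46, 49, 52, 55, 58]   # lower class limits
upper = [42, 45, 48, 51, 54, 57, 60]   # upper class limits

# a) Class width: difference between consecutive lower class limits.
width = lower[1] - lower[0]

# b) Class midpoints: average of each pair of limits.
midpoints = [(lo + hi) / 2 for lo, hi in zip(lower, upper)]

# c) Class boundaries: split the gap between adjacent classes in half.
gap = lower[1] - upper[0]
boundaries = [lo - gap / 2 for lo in lower] + [upper[-1] + gap / 2]

print(width)
print(midpoints)
print(boundaries)
```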
Identify the class width, class midpoints, and class boundaries for the given frequency distribution. Height (inches) 65.0-68.9 69.0-72.9 73.0-76.9 77.0-80.9 81.0-84.9 85.0-88.9 89.0-92.9 93.0-96.9 97.0-100.9 101.0-104.9 Frequency 4 25 9 1 0 0 0 0 0 1 a) What is the class width? b) What are the class midpoints? (Use ascending order. Round to two decimal places as needed.) c) What are the class boundaries? (Use ascending order. Round to two decimal places as needed.)
a) 4 b) 66.95, 70.95, 74.95, 78.95, 82.95, 86.95, 90.95, 94.95, 98.95, 102.95 c) 64.95, 68.95, 72.95, 76.95, 80.95, 84.95, 88.95, 92.95, 96.95, 100.95, 104.95
Identify the class width and class midpoints. Height (inches) Frequency 59.0-61.9 4 62.0-64.9 25 65.0-67.9 9 68.0-70.9 1 71.0-73.9 0 74.0-76.9 0 77.0-79.9 0 80.0-82.9 0 83.0-85.9 0 86.0-88.9 1
a.) What is the class width? 3 b.) What are the class midpoints? 60.45, 63.45, 66.45, 69.45, 72.45, 75.45, 78.45, 81.45, 84.45, 87.45
Which of the following is NOT true about statistical graphs? a. They utilize areas or volumes for data that are one-dimensional in nature. b. Similar graphs can be constructed in order to compare data sets. c. They can be used to identify extreme data values. d. They can be used to consider the overall shape of the distribution.
a.
A _______ is a graph of each data value plotted as a point.
dot plot
A ________ is a graph of each data value plotted as a point
dotplot
The classical approach to probability requires that the outcomes are
equally likely
Listed below are the measured radiation emissions (in W/kg) corresponding to cell phones: A, B, C, D, E, F, G, H, I, J, and K respectively. The media often present reports about the dangers of cell phone radiation as a cause of cancer. Cell phone radiation must be 1.6 W/kg or less. Find the a. mean, b. median, c. midrange, d. mode for the data. Also complete part e. 1.47 1.46 1.38 0.26 0.57 0.92 0.44 0.67 0.55 0.36 1.56 a. find the mean b. Find the median. c. Find the midrange. d. Find the mode. e. If you are planning to purchase a cell phone, are any of the measures of center the most important statistic? Is there another statistic that is most relevant? If so, which one?
a. .876 b. .67 c. .91 d. There is no mode. e. The maximum data value is the most relevant statistic, because it is closest to the limit of 1.6 W/kg and that cell phone should be avoided.
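Parts a through d can be checked with Python's standard library. This is a sketch; the helper `modes` is a hypothetical name introduced here, written so that a data set with no repeated value reports no mode.

```python
from collections import Counter
from statistics import mean, median

# Radiation emissions (W/kg) for phones A through K, from the card above.
data = [1.47, 1.46, 1.38, 0.26, 0.57, 0.92, 0.44, 0.67, 0.55, 0.36, 1.56]

def modes(values):
    """Return the most frequent value(s); empty list if nothing repeats."""
    counts = Counter(values)
    top = max(counts.values())
    return [] if top == 1 else [v for v, c in counts.items() if c == top]

midrange = (min(data) + max(data)) / 2   # (lowest + highest) / 2

print(round(mean(data), 3), median(data), round(midrange, 2), modes(data))
print(max(data))   # the value closest to the 1.6 W/kg limit
```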
A particular group of men have heights with a mean of 173 cm and a standard deviation of 7 cm. Richard had a height of 199 cm. a. What is the positive difference between Richard's height and the mean? b. How many standard deviations is that [the difference found in part (a)]? c. Convert Richard's height to a z score. d. If we consider "usual" heights to be those that convert to z scores between -2 and 2, is Richard's height usual or unusual?
a. 26 cm b. 3.71 c. 3.71 d. Unusual
With a height of 61 in., George was the shortest president of a particular club in the past century. The club presidents of the past century have a mean height of 65.6 in. and a standard deviation of 1.7 in. a. What is the positive difference between George's height and the mean? b. How many standard deviations is that [the difference found in part (a)]? c. Convert George's height to a z score. d. If we consider "usual" heights to be those that convert to z scores between -2 and 2, is George's height usual or unusual?
a. 4.6 in. b. 2.71 c. -2.71 d. Unusual
An insurance institute conducted tests with crashes of new cars traveling at 6 mi/h. The total cost of the damages was found for a simple random sample of the tested cars and listed below. Find the (a) mean, (b) median, (c) mode, and (d) midrange for the given sample data. Do the different measures of center differ very much? $7,526 $4,949 $9,127 $6,403 $4,287 a. The mean is ---. b. The median is ---. c. Find the mode. d. The midrange is ----. Do the different measures of center differ very much?
a. $6,458.40 b. $6,403 c. There is no mode. d. $6,707 The different measures of center do not differ by very large amounts.
The graph to the right shows the braking distances for different cars measured under the same conditions. Describe the ways in which this graph might be deceptive. How much greater is the braking distance of Car A than the braking distance of Car C? Draw the graph in a way that depicts the data more fairly. a. In what way might the graph be deceptive? b. How much greater is the braking distance of Car A than the braking distance of Car C?
a. By starting the horizontal axis at 100, the graph cuts off portions of the bars. b. The braking distance of Car A is about 40% greater than the braking distance of Car C.
Response bias
exists when the answers on a survey do not reflect the true feelings of the respondent
Nonresponse bias
exists when individuals selected to be in the sample who do not respond to the survey have different opinions from those who do
a. A statistics class with 36 students is arranged so that there are 6 rows with 6 students in each row, and the rows are numbered from 1 through 6. A die is rolled and a sample consists of all students in the row corresponding to the outcome of the die. b. For the same class described in part (a), the 36 student names are written on 36 individual index cards. The cards are shuffled and six names are drawn from the top. c. For the same class described in part (a), the six youngest students are selected.
a. This sample is not a simple random sample. It is a random sample. b. This sample is a simple random sample. It is a random sample. c. This sample is not a simple random sample. It is not a random sample.
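The difference between plans (a) and (b) can be seen by counting the samples each plan is able to produce. A minimal sketch, assuming hypothetical student IDs 1 through 36 seated six to a row; the fixed seed only makes the run repeatable.

```python
import math
import random

rng = random.Random(0)                  # fixed seed so the run is repeatable
students = list(range(1, 37))           # hypothetical student IDs 1..36
rows = [students[6 * r: 6 * r + 6] for r in range(6)]   # 6 rows of 6

# (a) Roll a die and take the whole row: random, but not simple random.
die = rng.randint(1, 6)
row_sample = rows[die - 1]

# (b) Shuffle the index cards and draw six: a simple random sample.
card_sample = rng.sample(students, 6)

# Under (a) only 6 different samples can ever occur;
# under (b) every group of 6 students is a possible sample.
possible_a = len(rows)
possible_b = math.comb(36, 6)
print(row_sample, card_sample, possible_a, possible_b)
```

Every student has a 1-in-6 chance of selection under both plans, which is why both are random samples; only plan (b) gives every group of six the same chance.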
With a height of 70 in., Roger was the shortest president of a particular club in the past century. The club presidents of the past century have a mean height of 75.1 in. and a standard deviation of 2.4 in. a. What is the positive difference between Roger's height and the mean? b. How many standard deviations is that [the difference found in part (a)]? c. Convert Roger's height to a z score. d. If we consider "usual" heights to be those that convert to z scores between −2 and 2, is Roger's height usual or unusual? **When you enter the formulas, change the negative signs on all answers. They should all be positive. Instead of -9.1, the answer would be 9.1
a. To find the positive difference between Roger's height and the mean, subtract the mean from Roger's height and find the absolute value of the difference. |70 in. − 75.1 in.| = 5.1 in. b. To determine how many standard deviations the difference is, compare the difference, 5.1, to the standard deviation, 2.4. 5.1/2.4 = 2.13 c. A z score is the number of standard deviations that a given value x is above or below the mean. It is found using z = (x − μ)/σ (see notes for the formula); since the club is a population, use the population formula. (70 − 75.1)/2.4 = −2.13 d. The z score is −2.13. Since "usual" heights are considered to be those that convert to z scores between −2 and 2, Roger's height is unusual.
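The four steps can be sketched in code. `Decimal` is used here as a design choice (an assumption of this sketch, not part of the original solution) because 5.1/2.4 is exactly 2.125, and ordinary binary floats can round that tie down to 2.12 instead of the 2.13 shown above.

```python
from decimal import Decimal, ROUND_HALF_UP

x = Decimal("70")        # Roger's height (in.)
mu = Decimal("75.1")     # population mean (in.)
sigma = Decimal("2.4")   # population standard deviation (in.)

def round2(value):
    """Round to two decimal places, with ties going away from zero."""
    return value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

difference = abs(x - mu)          # a. positive difference
num_sd = difference / sigma       # b. how many standard deviations
z = (x - mu) / sigma              # c. z score (negative: below the mean)
is_usual = -2 <= z <= 2           # d. usual means -2 <= z <= 2

print(difference, round2(num_sd), round2(z), is_usual)
```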
Use the pulse rates (beats per minute) of males in the accompanying data set to construct a frequency distribution. Begin with a lower class limit of 40 and use a class width of 10. a.) Construct the frequency distribution. b.) Do the pulse rates of males appear to have a normal distribution?
a.) Pulse Rate Frequency 40-49 2 50-59 21 60-69 55 70-79 41 80-89 27 90-99 5 100-109 2 b.)The pulse rates of males appear to have a normal distribution because the frequencies start low, increase, and then decrease; and are roughly symmetric.
The table below shows the magnitudes of the earthquakes that have occurred in the past 10 years. magnitude Frequency 5.0-5.9 6 6.0-6.9 6 7.0-7.9 13 8.0-8.9 4 9.0-9.9 2 Use the frequency distribution to construct a histogram. Using a loose interpretation of the requirements for a normal distribution, does the histogram appear to depict data that have a normal distribution? Why or why not?
a.) Does the histogram appear to depict data that have a normal distribution? The histogram appears to roughly approximate a normal distribution. The frequencies generally increase to a maximum and then decrease, and the histogram is symmetric.
xi means
all x values (xi is the i-th individual data value)
outliers
are sample values that lie very far away from the majority of the other sample values.
Standard Deviation
a measure of how far data values typically lie from the mean; the square root of the variance; used much more often. Population SD: σ (sigma). Sample SD: s.
Which of the following is always true? a. For skewed data, the mode is farther out in the longer tail than the median. b. In a symmetric and bell-shaped distribution, the mean, median, and mode are the same. c. Data skewed to the right have a longer left tail than right tail. d. The mean and median should be used to identify the shape of the distribution.
b.
A graphical display of a data set is given. Identify the overall shape of the distribution as (roughly) bell-shaped, triangular, uniform, reverse J-shaped, J-shaped, right skewed, left skewed, bimodal, or multimodal. A relative frequency histogram for the heights of a sample of adult women is shown below.
bell shaped
Before using the normal model to represent a data set, first check that the shape of the data's distribution is what shape?
both symmetric and unimodal
Which of the following is NOT a measure of center? a. mode b. mean c. census d. median
c.
Which of the following is NOT a measure of center? -census -mean -median -mode
census
Which of the following is NOT a measure of center?
census
Time
changing characteristics of the data over time (CVDOT)
____ is the difference between two consecutive lower class limits or two consecutive lower class boundaries in a frequency distribution.
class width
The process of representing categorical variables with numbers (such as letting a 1 represent "smoker" and a 0 represent "non-smoker") is called _______.
coding
The ___________ for a set of nonnegative sample or population data, expressed as a percent, describes the standard deviation relative to the mean.
coefficient of variation
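As a worked illustration, the coefficient of variation is CV = (s / x̄) · 100% for a sample. The sketch below applies it to the five crash repair costs from another card in this set; treating them as a sample (so `stdev` rather than `pstdev`) follows that card's wording.

```python
from statistics import mean, stdev

# Crash repair costs in dollars, from the insurance institute card.
costs = [7526, 4949, 9127, 6403, 4287]

s = stdev(costs)        # sample standard deviation
x_bar = mean(costs)     # sample mean
cv = s / x_bar * 100    # coefficient of variation, as a percent

print(round(s, 1), x_bar, round(cv, 1))
```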
A ___ probability of an event is a probability obtained with knowledge that some other event has already occurred.
conditional (knowledge)
Descriptive statistics
consists of organizing and summarizing information
2. When you need to find a proportion between 2 positive OR 2 negative z-scores, you:
consult the *mean to z column* for both. Find proportions & subtract the smaller from the larger.
Data are more than just numbers, because data have ______.
context
Which of the following is NOT a characteristic of the mean? a. The mean is sensitive to outliers. b. The mean is relatively reliable. c. The mean takes every data value into account. d. The mean is called the average by statisticians.
d.
Which of the following is NOT a characteristic of the mean? a.) The mean is sensitive to outliers. b.) The mean is relatively reliable. c.) The mean takes every data value into account. d.) The mean is called the average by statisticians.
d.) The mean is called the average by statisticians.
mode
data set = the value that occurs with the greatest frequency
Methods used that summarize or describe characteristics of data are called _______ statistics.
descriptive
If every x value is transformed into a z-score, then the distribution of z-scores will have what following properties regarding shape, mean, and standard deviation?
The distribution of z-scores will have exactly the same shape as the original distribution of scores; the z-scores will always have a mean of 0 and a standard deviation of 1.
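A short check of the mean and standard deviation claims, as a sketch on hypothetical data (eight made-up values with mean 5 and population standard deviation 2):

```python
from statistics import mean, pstdev

# Hypothetical data set (an assumption for illustration only).
data = [2, 4, 4, 4, 5, 5, 7, 9]

mu, sigma = mean(data), pstdev(data)
z_scores = [(x - mu) / sigma for x in data]

z_mean = mean(z_scores)     # expected to be 0
z_sd = pstdev(z_scores)     # expected to be 1
print(z_scores, z_mean, z_sd)
```

Each value is shifted and rescaled by the same amounts, so the spacing pattern of the data, and therefore the shape of the distribution, is preserved.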
Below are 36 sorted ages of an acting award winner. Find P30 using the method presented in the textbook. 18, 18, 19, 21, 22, 25, 26, 26, 29, 31, 32, 34, 37, 41, 42, 42, 43, 45, 47, 49, 51, 51, 51, 52, 55, 58, 58, 59, 62, 63, 64, 65, 67, 74, 74, 76
Compute the locator L = (k/100) × n, where n is the total number of values in the data set and k is the percentile being used. With n = 36 and k = 30, L = (30/100) × 36 = 10.8. Since L is not a whole number, round up to L = 11. The 11th value in the sorted list is 32, so P30 = 32.
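The locator method can be sketched as a small function. Two assumptions are labeled in the comments: the 22nd age is read as 51 (the list is sorted and its neighbors are both 51), and when L comes out whole the function averages the L-th and next values, a common textbook convention that this card does not itself state.

```python
def percentile(sorted_data, k):
    """Value of the k-th percentile (0 < k < 100) by the locator method."""
    n = len(sorted_data)
    # L = k * n / 100, split into whole part and remainder to avoid floats.
    whole, remainder = divmod(k * n, 100)
    if remainder:
        # L is not whole: round up, so use the (whole + 1)-th value.
        return sorted_data[whole]
    # L is whole: average the L-th and (L + 1)-th values (assumed convention).
    return (sorted_data[whole - 1] + sorted_data[whole]) / 2

# Sorted ages from the card; the 22nd value is assumed to be 51.
ages = [18, 18, 19, 21, 22, 25, 26, 26, 29, 31, 32, 34,
        37, 41, 42, 42, 43, 45, 47, 49, 51, 51, 51, 52,
        55, 58, 58, 59, 62, 63, 64, 65, 67, 74, 74, 76]

p30 = percentile(ages, 30)   # L = 10.8, rounded up to 11
print(p30)
```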
In modified boxplots, a data value is a(n) - if it is above Q3 + (1.5)(IQR) or below Q1 - (1.5)(IQR)
outlier
In modified boxplots, a data value is a(n) _______ if it is above Q3+(1.5)(IQR) or below Q1-(1.5)(IQR).
outlier
In modified box plots, a data value is a(n) _______ if it is above Q3+(1.5)(IQR) or below Q1−(1.5)(IQR).
outlier
In modified boxplots, a data value is a(n) _______ if it is above Q3+(1.5)(IQR) or below Q1−(1.5)(IQR).
outlier
In modified boxplots, a data value is a(n) _______ if it is above Q3+(1.5)(IQR) or below Q1−(1.5)(IQR).
outlier
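The outlier rule from these cards can be sketched as follows. The quartiles are passed in rather than computed, since textbooks and software use slightly different quartile methods; the data values and the quartiles Q1 = 13 and Q3 = 15.5 are hypothetical.

```python
def outlier_fences(q1, q3):
    """Return (lower fence, upper fence) for a modified boxplot."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(data, q1, q3):
    """Values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    low, high = outlier_fences(q1, q3)
    return [x for x in data if x < low or x > high]

# Hypothetical data and quartiles (assumptions for illustration only).
data = [12, 13, 13, 14, 15, 15, 16, 30]
fences = outlier_fences(13, 15.5)
flagged = find_outliers(data, 13, 15.5)
print(fences, flagged)
```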
- are sample values that lie very far away from the majority of the other sample values.
outliers
Correlation is affected by ____.
outliers
________are sample values that lie very far away from the majority of the other sample values.
outliers
unusual
outside 2 standard deviations
A health and fitness club reviews the weights of all of their members, and found that the average weight was 148 lb. Is this value a statistic or a parameter?
parameter
When drawings of objects are used to depict data, false impressions can be made. These drawings are called _______.
pictographs
When drawings of objects are used to depict data, false impressions can be made. These drawings are called
pictographs.
The most appropriate graphical display of categorical data is
pie chart
In statistics, the data we work with is just one part of a bigger picture called the ____.
population
N means
population size
ρ = [1 / N] Σ { [ (xi - μx) / σx ] [ (yi - μy) / σy ] } is the equation for what?
population correlation coefficient
What do each of the symbols mean in the equation ρ = [1 / N] Σ { [ (xi - μx) / σx ] [ (yi - μy) / σy ] }? ρ = ____ N = ____ ∑ = ____ xi = ____ μx = ____ σx = ____ yi = ____ μy = ____
population correlation coefficient; population observation number; sum of; observation x; x population mean; population standard deviation; observation y; y population mean
"Mu" [µ] means
population mean
σ
population standard deviation
A ___________ is the complete collection of all measurements or data collected, whereas, a __________ is a subcollection of members selected from the complete collection. population; sample sample; population sample; census population; parameter
population; sample
A ____ correlation means that if one variable gets bigger, the other variable tends to get bigger.
positive
r = Σ(xy) / √[ ( Σ x² )( Σ y² ) ] is the formula for what?
product-moment correlation coefficient
What do each of the symbols mean in the formula r = Σ(xy) / √[ ( Σx² )( Σy² ) ] r = ____ ∑ = ____ x = ____ (formula) y = ____ (formula)
product-moment correlation coefficient; sum of; = (xi - x̄); = (yi - ȳ)
percentile
provides information about how the data are spread over the interval from the smallest value to the largest value. (Recall that the median divides the lower 50% of a set of data from the upper 50%. The median is a special case of a general concept called the percentile.)
Mode is primarily a measure of
qualitative central tendency
Obtain the linear correlation coefficient for the data. Round your answer to three decimal places. Managers rate employees according to job performance (x) and attitude (y). The results for several randomly selected employees are given below. X: 59, 63, 65, 69, 58, 77, 76, 69, 70, 64; Y: 72, 67, 78, 82, 75, 87, 92, 83, 87, 78
r = 0.863 (calculator: edit list, then STAT, LinReg(ax+b))
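The calculator result can be cross-checked by hand-coding the deviation form of the formula, r = Σ(dx·dy) / √[ (Σdx²)(Σdy²) ]. In this sketch the y list is taken as 72, 67, 78, ...; the second value, 67, is an assumption: it is the reading that reproduces the stated r = 0.863, whereas 76 in that position gives about 0.943 with the same formula.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Linear correlation coefficient from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    return sum_dxdy / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

x = [59, 63, 65, 69, 58, 77, 76, 69, 70, 64]   # job performance
y = [72, 67, 78, 82, 75, 87, 92, 83, 87, 78]   # attitude (2nd value assumed 67)

r = pearson_r(x, y)
print(round(r, 3))
```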
What measure of variation is very sensitive to extreme values?
range
range rule of thumb to find SD
range/4
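A sketch comparing the rule-of-thumb estimate with the computed sample standard deviation, using the eleven cell phone radiation values from another card in this set. The estimate is rough by design, so the two numbers are not expected to match closely.

```python
from statistics import stdev

# Cell phone radiation emissions (W/kg) from the measures-of-center card.
data = [1.47, 1.46, 1.38, 0.26, 0.57, 0.92, 0.44, 0.67, 0.55, 0.36, 1.56]

data_range = max(data) - min(data)   # highest value minus lowest value
estimate = data_range / 4            # range rule of thumb: s is about range / 4
actual = stdev(data)                 # sample standard deviation for comparison

print(round(estimate, 3), round(actual, 3))
```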
Modified Boxplots
regular boxplot constructed with these modifications: 1. A special symbol used to identify outliers. 2. The solid horizontal line extends only as far as the minimum data value that is not an outlier and the maximum data value that is not an outlier.
A _______ histogram has the same shape and horizontal scale as a histogram, but the vertical scale is marked with relative frequencies instead of actual frequencies.
relative frequency
In a -- distribution, the frequency of a class is replaced with a proportion or percent.
relative frequency
In a ___ distribution, the frequency of a class is replaced with a proportion or percent.
relative frequency
A _______ histogram has the same shape and horizontal scale as a histogram, but the vertical scale is marked with relative frequencies instead of actual frequencies.
relative frequency histogram A relative frequency histogram has the same shape and horizontal scale as a histogram, but the vertical scale is marked with relative frequencies (as percentages or proportions) instead of actual frequencies.
stem plots
represent quantitative data by separating each value into two parts: the stem (the leftmost digit) and the leaf (the rightmost digit)
Because the median is not affected by the size of an outlier and does not change even if a particular outlier is replaced by an even more extreme value, we say the median is _____ to outliers.
resistant
How to solve for a cumulative frequency distribution.
Rewrite each class as "less than" the lower class limit of the next class. Example: for classes 20-29 and 30-39, the first cumulative class is "less than 30." The cumulative frequency for each row is the sum of the frequencies of all classes below the number next to the phrase "less than."
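The procedure can be sketched with a running total. The classes and frequencies are the 40-44 through 70-74 temperature distribution that appears in another card of this set; the "less than" cutoffs assume integer class limits, so each cutoff is the upper limit plus 1.

```python
from itertools import accumulate

classes = [(40, 44), (45, 49), (50, 54), (55, 59), (60, 64), (65, 69), (70, 74)]
freqs = [1, 3, 10, 12, 10, 2, 1]

# "Less than" cutoff for each class: the lower limit of the next class.
cutoffs = [hi + 1 for lo, hi in classes]

# Running total of the frequencies up to and including each class.
cumulative = list(accumulate(freqs))

for cutoff, total in zip(cutoffs, cumulative):
    print(f"Less than {cutoff}: {total}")
```

The last cumulative frequency should equal the total number of data values, which is a quick way to check the table.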
In a boxplot, if the median is to the left of the center of the box and the right whisker is substantially longer than the left whisker, the distribution is skewed_______
right
The Range Rule of Thumb roughly estimates the standard deviation of a data set as _______.
s = range / 4
The Range Rule of Thumb roughly estimates the st. dev. of a data set as ___.
s = range/4
the symbol for sample variance is
s squared (s²)
The Range Rule of Thumb roughly estimates the standard deviation of a data set as _______.
s= range/4
The Range Rule of Thumb is a rough estimate of the standard deviation of a data set as _______
s=range/4
the symbol for sample variance is
s^2
The _______ is/are a subset of the population that is being studied.
sample
r = [ 1 / (n - 1) ] Σ { [ (xi - x̄) / sx ] [ (yi - ȳ) / sy ] } is the equation for what?
sample correlation coefficient
What do each of the variables mean in the equation r = [ 1 / (n - 1) ] Σ { [ (xi - x̄) / sx ] [ (yi - ȳ) / sy ] }? r = ____ n = ____ ∑ = ____ xi = ____ x̄ = ____ sx = ____ yi = ____ ȳ = ____ sy = ____
sample correlation coefficient; sample observation number; sum of; x observation; x mean; x standard deviation; y observation; y mean; y standard deviation
"x-bar" means
sample mean
s
sample standard deviation
Biased estimator
The sample standard deviation s is a biased estimator of the population standard deviation σ (little sigma). Values of s do NOT target the value of σ; they generally tend to underestimate it.
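The underestimation can be seen by listing every possible sample from a tiny population. This is a sketch under stated assumptions: the population {4, 5, 9} is hypothetical, and samples of size 2 are drawn with replacement so that all nine are equally likely.

```python
from itertools import product
from statistics import mean, pstdev, pvariance, stdev, variance

population = [4.0, 5.0, 9.0]          # hypothetical population
sigma = pstdev(population)            # population standard deviation
sigma_sq = pvariance(population)      # population variance

# All 9 equally likely samples of size 2, drawn with replacement.
samples = list(product(population, repeat=2))

mean_of_s = mean(stdev(s) for s in samples)        # average sample SD
mean_of_s_sq = mean(variance(s) for s in samples)  # average sample variance

print(round(sigma, 3), round(mean_of_s, 3))        # s averages below sigma
print(round(sigma_sq, 3), round(mean_of_s_sq, 3))  # s squared averages to sigma squared
```

In this enumeration the sample variance averages out to the population variance, while the sample standard deviation averages out to less than σ.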
What is s2 the symbol for?
sample variance
The correlation becomes weaker as the data points become more s____.
scattered
A - is a plot of paired data (x,y) and is helpful in determining whether there is a relationship between the two variables.
scatterplot
A _______ is a plot of paired data (x,y) and is helpful in determining whether there is a relationship between the two variables.
scatterplot
Fill in the blank. A histogram aids in analyzing the _______ of the data.
shape of the distribution A histogram is a visual tool used to represent and analyze data. It is basically a graphic version of a frequency distribution, and it can show the center, variation, and the shape of the distribution of the data.
z-score
Standardized score; compares a data point to its peers. Formula: z = (data value − mean) / standard deviation. A positive z-score means above average; a negative z-score means below average. The units of z-scores are standard deviations, so values from different data sets can be compared. Anything above 2 or below −2 is unusual.
4. When you need to find the P for an area *greater than* a negative Z or *Less than* a positive Z use:
the *Body column*. Because the body column includes the mean & the tail.
For data sets having a distribution that is approximately bell-shaped, ______ states that about 68% of all data values fall within one standard deviation from the mean.
the Empirical Rule
For data sets having a distribution that is approx. bell-shaped, ___________ states that about 68% of all data values fall within one standard deviation from the mean.
the Empirical Rule
For data sets having a distribution that is approximately bell-shaped, --- states that about 68% of all data values fall within one standard deviation from the mean.
the Empirical Rule
Percentile
The kth percentile, denoted Pk, of a set of data is a value such that k percent of the observations are less than or equal to it. It is like class rank, except that percentiles count from low to high: the 5th percentile is the value that 5% of the population is at or below, and the 95th percentile is the value that 95% of individuals are at or below.
Whenever a data value is less than the mean, ------------ (Hint: pertaining to z-score)
the corresponding z-score is negative
Whenever a data value is less than the mean, ______.
the corresponding z-score is negative
Whenever a data value is less than the mean
the corresponding z-score is negative
Normal. When graphed, a normal distribution has a "bell" shape. Characteristics of the bell shape are (1) the frequencies increase to a maximum, and then decrease, and (2) symmetry, with the left half of the graph roughly a mirror image of the right half.
A(n) _______ distribution has a "bell" shape.
Fill in the blank. A(n) _______ distribution has a "bell" shape.
A(n) normal distribution has a "bell" shape.
Suppose the list below shows how many text messages Elyse sent each day for the last 10 days. If Elyse wants to know how many text messages she typically sends each day, which measure of central tendency better describes the typical number of text messages per day?
Median; The median of 27.5 is a better representative of the center since it is resistant to the one extreme value. The mean of 33.3 is not representative of the typical number of texts since only one number is larger than the mean.
If we have a large voluntary response sample consisting of weights of subjects who chose to respond to a survey posted on the Internet, can a graph help to overcome the deficiency of having a voluntary response sample?
No, a graph cannot help to overcome the deficiency. If the sample is a bad sample, there are no graphs or other techniques that can be used to salvage the data.
The table provided below shows paired data for the heights of a certain country's presidents and their main opponents in the election campaign. Construct a scatterplot. Does there appear to be a correlation?
No, there does not appear to be a correlation because there is no general pattern to the data.
Construct a scatter diagram using the data table to the right. This data is from a study comparing the amount of tar and carbon monoxide (CO) in cigarettes. Use tar for the horizontal scale and use carbon monoxide (CO) for the vertical scale. Determine whether there appears to be a relationship between cigarette tar and CO.
Yes, as the amount of tar increases the amount of carbon monoxide also increases.
Does the frequency distribution appear to have a normal distribution? Explain. Temperature (°F): 40-44, 45-49, 50-54, 55-59, 60-64, 65-69, 70-74. Frequency: 1, 3, 10, 12, 10, 2, 1.
Yes, because the frequencies start low, proceed to one or two high frequencies, then decrease to a low frequency, and the distribution is approximately symmetric. Bell-shaped.
The graph to the right uses cylinders to represent barrels of oil consumed by two countries. Does the graph distort the data or does it depict the data fairly? Why or why not? If the graph distorts the data, construct a graph that depicts the data fairly. Does the graph distort the data? Why or why not?
Yes, because the graph incorrectly uses objects of volume to represent the data.