Math-11 Final
Whiskers
(1.5 * IQR) away from Q1 and Q3 Lower whisker: Q1 - (1.5 * IQR) Upper whisker: Q3 + (1.5 * IQR)
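A minimal Python sketch of the whisker limits, as a quick check. The quartiles 58 and 67 are borrowed from the boxplot card in this set (five-number summary 43, 58, 61, 67, 74); the function name `whisker_limits` is made up for illustration.

```python
def whisker_limits(q1, q3):
    """Return (lower, upper) whisker limits at 1.5 * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

lower, upper = whisker_limits(58, 67)
print(lower, upper)  # 44.5 80.5
```

Any data value outside these limits would usually be flagged as a possible outlier.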
Correlation cannot reveal
causation
Mean of a Probability Distribution
the long-term average of many trials of a statistical experiment
Sampling Frame
Universe you will be picking from
histogram symmetry
When the left and right halves of a histogram look similar/the same
Bayes' Theorem
P(A|B) = [P(B|A)*P(A)]/(P(B))
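The formula can be tried out with a small Python sketch. The probabilities below are hypothetical numbers chosen only for illustration, and the function name `bayes` is an assumption, not something from the course.

```python
def bayes(p_b_given_a, p_a, p_b):
    """P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Hypothetical inputs: P(B|A) = 0.9, P(A) = 0.2, P(B) = 0.3
p_a_given_b = bayes(0.9, 0.2, 0.3)
print(round(p_a_given_b, 2))  # 0.6
```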
range
(max value)-(min value)
Twelve teachers attended a seminar on mathematical problem solving. Their attitudes were measured before and after the seminar. A positive number change attitude indicates that a teacher's attitude toward math became more positive. The twelve change scores are as follows:3; 8; -1; 2; 0; 5; -3; 1; -1; 6; 5; -2 What is the median change score?
1.5 First order the numbers. Since there are an even number of numbers, the median is the average of the middle two numbers. The median is 1.5.
For the set of scores 0, 2, 6, 8, the population variance is ______ .
10
Find the population standard deviation of the scores 0,2,2,2,2,3,5,8.
2.291288
Find the population variance of the scores 0,2,2,2,2,3,5,8.
5.25
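The population variance and standard deviation for these scores can be reproduced with Python's standard `statistics` module. This is only a sketch for checking the arithmetic; `pvariance` and `pstdev` divide by N (population formulas), not N - 1.

```python
import statistics

scores = [0, 2, 2, 2, 2, 3, 5, 8]
variance = statistics.pvariance(scores)   # mean of squared deviations (divide by N)
std_dev = statistics.pstdev(scores)       # square root of the population variance
print(variance, round(std_dev, 6))        # 5.25 2.291288
```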
Common Confidence intervals and critical values
90% = 1.645 95% = 1.960 99% = 2.576
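The critical values for a two-sided confidence interval can be recomputed from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_critical` is hypothetical.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value z* for a confidence level such as 0.95."""
    tail = (1 - confidence) / 2          # area left in each tail
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 3))
```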
When r is close to 0, the correlation is
weak (little to no correlation)
Spread of distribution
where does most of the data lie?
How to find line of best fit
Choose the line that minimizes the model error, where model error = sum of the absolute value of residuals OR sum of (residuals)^2
Sample
A (hopefully) representative subset of your population e.g. A spoonful of soup from the top of the pot
Hypothesis
A claim that may or may not be true
Expected Value of a Coin
A coin toss where 0 represents landing heads and 1 represents landing tails has expected value 0.5. If I flip a coin many many times then the average outcome is likely to be 0.5 (half heads and half tails).
Discrete
A random variable is discrete if it has a finite number of outcomes or a countable number of outcomes.
Probability Distribution
A table, graph, or formula that shows all the possible outcomes and their probabilities.
As the size of a sample grows, the sampling distribution tends to look
more and more normal
Describe non-sampling error.
An issue that affects the reliability of sampling data other than natural variation; it includes a variety of human errors including poor study design, biased sampling methods, inaccurate information provided by study participants, data entry errors, and poor analysis.
Standard Deviation
The standard deviation measures an average distance from the mean if the experiment is run many many times.
For a binomial experiment with 20 trials P(r < 4) = P(r > 16).
False
Expected Value of a density function
E(X) = The integral from -∞ to ∞ of x*f(x)dx
Expected value for a discrete random variable
E(X) = sum over all x of [x * P(x)] (the sum of each outcome multiplied by its probability)
Adding random variables (no constants)
E(X±Y) = E(X) ± E(Y). If X and Y are independent variables: Var(X±Y) = Var(X)+Var(Y) (always +, never -!) and SD(X±Y) = sqrt(Var(X)+Var(Y)) (always +, never -!)
How does intensity of skew affect the difference between mean and median?
Lower skew = lower difference between mean and median. Greater skew = greater difference.
Know the symbols for both Statistics and Parameter:
Proportion, mean, SD, correlation, regression coefficient
Consider the boxplot below. boxplot with five point summary: 43,58,61,67,74 What quarter has the smallest spread of data? What is that spread? What quarter has the largest spread of data? What is that spread? Find the Inter Quartile Range (IQR): Which interval has the most data in it? What value could represent the 52nd percentile?
Second 3 First 15 9 57-61 63
Methods for Marginal and Joint Probabilities
Venn diagrams, contingency tables, P(A or B) rule, P(A and B) rule.
Confidence statement format
We are (C%) confident that the (population parameter) is in the (confidence interval)
Standard deviation
how far each value is from the mean
Exclusive "or"
often used in real life. A or B means: 1. A, but not B 2. B, but not A
don't assume your data are all part of
one homogeneous population. think about possible subgroups to make analysis better.
sample space
the set of ALL possible outcomes
95% of all point estimates are
within (+/-) (2)*(SE) of p
Predictor variable
x-axis variable that predicts y.
Regression line equation
ŷ = b0 + b1 * x
The probability of any particular outcome happening is
0 for a continuous random variable. This is because the integral from a to a of f(x) dx = 0
For the set of scores 0, 0, 1, 1, 1, 2, 3, 5, 5, 5, 6, the mode is ______ .
1 and 5
For the set of scores 0, 0, 1, 1, 1, 2, 3, 5, 5, 5, 6, the median is ______ .
2
multimodal histogram
3+ peaks
Suppose A and B are independent events, then P(A and B)
=P(A) * P(B)
Normal Distribution (aka the Bell Curve)
The density function is hard to evaluate by hand, so probabilities are usually given or computed with technology. The mean is in the middle of the bell curve, and values are distributed symmetrically on either side of the mean, resulting in a bell shape.
Scaling random variables
E(aX) = aE(X) Var(aX) = a^2 * Var(X) SD(aX) = |a| * SD(X)
Midterm 2 material
Midterm 2 material
The sampling distribution is a normal curve with model
N( μ, σ/sqrt(n) )
Determine the level of measurement for the following variable: Number of desks in a classroom
Ratio
Marginal Probabilities
The probability of one value of a categorical variable occurring (A)
The symbol x̄ represents the mean of the sample.
True
Uniform Distribution
a continuous random variable (RV) that has equally likely outcomes over the domain.
A random variable should always be a
numeric outcome, NEVER a probability
unimodal histogram
one peak
Scatterplot
one variable on x-axis (predictor), other on y-axis (response).
the median is resistant to
outliers and skew
bimodal histogram
two peaks
Interpolation
using your model to predict a new y value for an x value that is within the span of x data in your model
The average salary for American college graduates is $46,000. You suspect that the average is less for graduates from your college. The 43 randomly selected graduates from your college had an average salary of $44,519 and a standard deviation of $14,692. What can be concluded at the 0.05 level of significance? H0: μ = 46000 Ha: μ ___ 46000
<
Answer the following True or False: For a binomial experiment with 20 trials P(r < 4) = P(r > 16).
False
If the correlation is 0, then there is no relationship between the two variables.
False
Since the area under the normal curve within two standard deviations of the mean is 0.95, the area under the normal curve that corresponds to values greater than 2 standard deviations above the mean is 0.05.
False
The symbol σ_x̄ represents the standard deviation of a sample of size n.
False
The symbol μ_x̄ represents the mean of a sample of size n.
False
interquartile range (IQR)
(upper quartile)-(lower quartile)
Americans receive an average of 17 Christmas cards each year. Suppose the number of Christmas cards is normally distributed with a standard deviation of 6. Let X be the number of Christmas cards received by a randomly selected American. Round all answers to two decimal places. A. X ~ N( , ) B. If an American is randomly chosen, find the probability that this American will receive no more than 11 Christmas cards this year. C. If an American is randomly chosen, find the probability that this American will receive between 10 and 20 Christmas cards this year. D. 63% of all Americans receive at most how many Christmas cards?
17 6 .16 .57 19
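The Christmas card answers can be checked with `statistics.NormalDist` from the Python standard library. This is a sketch of the calculation (the variable names are illustrative), not the calculator procedure the course may expect.

```python
from statistics import NormalDist

cards = NormalDist(mu=17, sigma=6)             # X ~ N(17, 6)
p_at_most_11 = cards.cdf(11)                   # P(X <= 11)
p_10_to_20 = cards.cdf(20) - cards.cdf(10)     # P(10 < X < 20)
pct_63 = cards.inv_cdf(0.63)                   # 63rd percentile
print(round(p_at_most_11, 2), round(p_10_to_20, 2), round(pct_63, 2))
# 0.16 0.57 18.99
```

The 63rd percentile of 18.99 rounds to about 19 cards.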
Forty randomly selected students were asked the number of pairs of sneakers they owned. Let X = the number of pairs of sneakers owned. The results are as follows: Sneakers Frequency 1 2 2 5 3 8 4 12 5 12 6 0 7 1 Round your answers to two decimal places. The mean is: The median is: The sample standard deviation is: The first quartile is: The third quartile is: What percent of the respondents have had fewer than 4 pairs of sneakers? % 67.5% of all respondents have had at most how many pairs of sneakers?
3.775 4 1.29 3 5 37.5 4 The first five answers should come directly from the calculator. Please watch the video if you have troubles with this. The sixth question just involves adding up those below 4 and dividing by the total. The last one involves multiplying the percent by the total then see what number corresponds to that ranking.
X ~ N(30,10). Suppose that you form random samples with sample size 4 from this distribution. Let xBar be the random variable of averages. Let ΣX be the random variable of sums. Round all answers to two decimal places. A. xBar~ N( , ) B. P(xBar<30) = C. Find the 95th percentile for the xBar distribution. D. P(xBar > 36)= E. Q3 for the xBar distribution =
30 5 .5 38.22 .12 33.37
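A short Python sketch of the sampling distribution of the mean for this card, assuming the model N(μ, σ/sqrt(n)) stated elsewhere in this set. Only the standard library is used.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 30, 10, 4
xbar = NormalDist(mu, sigma / sqrt(n))    # standard error = 10 / sqrt(4) = 5

print(xbar.stdev)                         # 5.0
print(round(xbar.cdf(30), 2))             # 0.5   P(xBar < 30)
print(round(xbar.inv_cdf(0.95), 2))       # 38.22 95th percentile
print(round(1 - xbar.cdf(36), 2))         # 0.12  P(xBar > 36)
print(round(xbar.inv_cdf(0.75), 2))       # 33.37 third quartile
```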
Expected Value
If the experiment is run many many times, then it is very likely that the average value of x will be very close to the expected value. It is the expected arithmetic average when an experiment is repeated many times; also called the mean. Notation: μ. For a discrete random variable (RV) with probability distribution function P(x), the definition can also be written in the form μ = ∑xP(x).
Understanding p-value
If the p-value is below 0.05, there is less than a 5% chance of observing a result at least as extreme as the sample statistic, given that the null hypothesis is true (i.e., the distribution is centered at the mean stated in the null hypothesis).
When multiplying data by a value Y, how are the statistics affected?
Minimum value = Y*min Maximum value = Y*max (for Y > 0; the two swap when Y < 0) Mean = Y*mean Median = Y*median SD = |Y|*SD IQR = |Y|*IQR
When multiplying data by a value Y and adding a value X, how are the statistics affected?
Minimum value = Y*min + X Maximum value = Y*max + X (for Y > 0; the two swap when Y < 0) Mean = Y*mean + X Median = Y*median + X SD = |Y|*SD IQR = |Y|*IQR
When adding a value X to data, how are the statistics affected?
Minimum value = min + X Maximum value = max + X Mean = mean + X Median = median + X SD = SD (unaffected) IQR = IQR (unaffected)
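The shift-and-scale rules on the last few cards can be checked numerically. The data set and the constants `a` (multiplier) and `b` (added value) below are hypothetical, picked only to make the arithmetic easy to follow.

```python
import statistics

data = [2, 4, 4, 7, 13]          # hypothetical data
a, b = 3, 10                     # multiply by a, then add b
new = [a * x + b for x in data]

# Measures of center follow the full transformation a*x + b
print(statistics.mean(new), a * statistics.mean(data) + b)       # 28 28
print(statistics.median(new), a * statistics.median(data) + b)   # 22 22

# Measures of spread are scaled by |a| and ignore the shift b
sd_new = statistics.pstdev(new)
sd_expected = abs(a) * statistics.pstdev(data)
print(round(sd_new, 6) == round(sd_expected, 6))                 # True
```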
For the set of scores 0, 2, 6, 8, the mode is ______ .
There is no mode
Determine the level of measurement for the following variable: Nationality
Nominal
Determine the level of measurement for the following variable: Phone company
Nominal
When using the central limit theorem with n = 100, it is not necessary to assume the distribution of the population data is normally distributed.
True
Difference in means (x̄1 and x̄2): Hypothesis test for two independent samples
Use t_df = [(x̄1 - x̄2) - 0] / SE, with df = min(n1-1, n2-1); no pooling necessary
Difference in proportions (p̂1 and p̂2): Hypothesis test for two independent samples
Use z = [(p̂1 - p̂2) - 0] / SEpooled, where SEpooled = sqrt((p̂pooled * q̂pooled)/n1 + (p̂pooled * q̂pooled)/n2)
Variance (σ^2)
Var(X) = Sum of all x: (x-μ)^2 * P(x)
Alternate hypothesis (HA)
What you expect might be true. Opposite to the null hypothesis. e.g. The drug produces more treatment than the placebo: p(drug) > p(placebo)
One-sided alternative hypothesis
When you are interested in only one side: you are hoping your % is on a certain side of the comparison %. P(a) > P(b) or P(a) < P(b)
There are many curves in the t-distribution family
With n data points in sample, you use t-distribution with df (degrees of freedom) = n-1
E(X±Y) = E(X) ± E(Y) is true even if
X and Y are dependent
The choose symbol (X nCr Y)
X nCr Y: Helps you calculate how many ways there are to arrange Y successes among X attempts. Formula (with n attempts and k successes): n!/[k! * (n-k)!]
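Python's `math.comb` computes the same count as the factorial formula. The values n = 5 and k = 2 below are a hypothetical example.

```python
from math import comb, factorial

n, k = 5, 2   # hypothetical: ways to place 2 successes among 5 attempts
by_formula = factorial(n) // (factorial(k) * factorial(n - k))
print(comb(n, k), by_formula)  # 10 10
```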
If X ~ Exp(λ):
X represents how long we will have to wait before an event with rate λ occurs. E(X) = 1/λ (the average time we have to wait before the event occurs) Var(X) = 1/(λ^2) SD(X) = 1/λ
Formula Review
X ~ B(n, p) means that the discrete random variable X has a binomial probability distribution with n trials and probability of success p. X = the number of successes in n independent trials; n = the number of independent trials; X takes on the values x = 0, 1, 2, 3, ..., n; p = the probability of a success for any trial; q = the probability of a failure for any trial; p + q = 1, so q = 1 - p. The mean of X is μ = np. The standard deviation of X is σ = √(npq).
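The binomial formulas can be sketched in Python as follows. The choice p = 0.3 is hypothetical, and `binom_pmf` and `tails` are illustrative helper names. The tail comparison also shows why the True/False card about P(r < 4) = P(r > 16) with 20 trials is False in general: the two tails match only in the symmetric case p = 0.5.

```python
from math import comb, sqrt

def binom_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def tails(n, p):
    """Return (P(r < 4), P(r > 16)) for X ~ B(n, p)."""
    left = sum(binom_pmf(x, n, p) for x in range(0, 4))
    right = sum(binom_pmf(x, n, p) for x in range(17, n + 1))
    return left, right

n, p = 20, 0.3                                   # p = 0.3 is a hypothetical choice
mean = n * p                                     # mu = np
sd = sqrt(n * p * (1 - p))                       # sigma = sqrt(npq)
print(round(mean, 2), round(sd, 2))
print(tails(20, 0.3))                            # unequal tails when p != 0.5
print(tails(20, 0.5))                            # equal tails when p = 0.5
```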
Statistical Inference
The attempt to say something about the population parameter given a particular sample statistic (i.e. point estimate)
Chapter Review
The central limit theorem can be used to illustrate the law of large numbers. The law of large numbers states that the larger the sample size you take from a population, the closer the sample mean x̄ gets to μ.
decay parameter
The decay parameter describes the rate at which probabilities decay to zero for increasing values of x. It is the value m in the probability density function f(x) = m * e^(-mx) of an exponential random variable. It is also equal to m = 1/μ, where μ is the mean of the random variable.
Describe what is meant by a binomial (or Bernoulli) distribution.
The distribution of the result of an experiment with -A fixed number of trials -The trials are independent -Each trial results in success or failure -The probability of success, p, is the same for each trial.
The histogram below shows the number of times that students in a statistics class have been to London. Check each of the following that are true statements.
The mean is greater than the median. This histogram is skewed right. A relative frequency histogram would also have the same shape, just a different scale on the vertical axis.
The histogram below shows the distribution of a recent exam. Check each of the following that are true statements.
The median is greater than the mean. A relative frequency histogram would also have the same shape, just a different scale on the vertical axis. This histogram is skewed left.
From the SEpi equation: what does se^2/n + se^2 tell us?
The more spread that exists around our line (i.e., the bigger se) the less confident we are in our prediction. Having more data helps reduce SEPI but this can only help so much.
Describe Sampling Error.
The natural variation that results from selecting a sample to represent a larger population; this variation decreases as the sample size increases, so selecting larger samples reduces sampling error.
Uniform Distribution Continuous Variable
The number of seconds after the exact minute that classes end follows a uniform distribution.
Power of a test
The power of any test of statistical significance is defined as the probability that it will reject a false null hypothesis. Power = P(reject H0 | H0 is false) = 1 - β, where β = P(fail to reject H0 | HA is true)
Joint probabilities
The probability of two things joining forces and happening simultaneously (A and B)
p-value
The area under the curve beyond the observed test statistic z0 (recall that z = (x̄ - μ0) / SE). For a one-tailed test: p-value = P(z > z0) if HA is on the right tail, or p-value = P(z < z0) if HA is on the left tail. For a two-tailed test (most common): p-value = 2*(1 - P(z < |z0|)) = 2*P(z > |z0|) = 2*P(z < -|z0|)
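A minimal Python sketch of a z-test p-value, assuming the standard normal model. The function name `p_value` and its `tail` argument are hypothetical; the two-tailed case doubles the area beyond |z0|.

```python
from statistics import NormalDist

def p_value(z0, tail):
    """p-value for a z statistic; tail is 'left', 'right', or 'two'."""
    std = NormalDist()                       # standard normal, mean 0 and SD 1
    if tail == "left":
        return std.cdf(z0)                   # P(z < z0)
    if tail == "right":
        return 1 - std.cdf(z0)               # P(z > z0)
    return 2 * (1 - std.cdf(abs(z0)))        # both tails beyond |z0|

print(round(p_value(1.96, "two"), 3))        # 0.05
print(round(p_value(-1.645, "left"), 3))     # 0.05
```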
The statistics below describe the data collected by a business person who researched the purchases of 65 customers. mu = 27.52, Median=24.96, Sigma=6.34, Q1=18.34, Q3=35.72, n=65 What is the IQR? Check all that apply.
The range that contains the middle half of the data The range between Q1 and Q3 18.34 to 35.72
Outliers
data that stands apart from the distribution
median
data value in the middle of the list of data
Reasons for using the complement rule
it is often easier to calculate the complement of something.
Given a value xf (e.g., a height of 6 feet), the prediction interval of yf for a person with the same height xf is:
ŷf +/- t*(df = n-2) * SE(PI), where SE(PI) = sqrt([SE(b1)]^2 * (xf - x̄)^2 + se^2/n + se^2)
b0
ȳ - b1 * x̄; this is the y-intercept (value of ŷ when x = 0)
Favorite Baseball Team
qualitative
Time in Line to Buy Groceries
quantitative - continuous
b1
r(stdev y / stdev x) this is the slope ( value that y increases by for every unit that x is increased by )
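The slope and intercept formulas can be sketched in Python as below. The five data points are hypothetical, and the intercept is computed as ȳ - b1 * x̄, which relies on the least-squares line passing through the point (x̄, ȳ).

```python
from math import sqrt

xs = [1, 2, 3, 4, 5]            # hypothetical predictor values
ys = [2, 4, 5, 4, 5]            # hypothetical response values
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

r = sxy / sqrt(sxx * syy)       # correlation coefficient
b1 = r * sqrt(syy / sxx)        # slope = r * (SD of y / SD of x)
b0 = y_bar - b1 * x_bar         # intercept = y_bar - b1 * x_bar
print(round(b1, 2), round(b0, 2))  # 0.6 2.2
```

With these numbers the fitted line is ŷ = 2.2 + 0.6 * x.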
One of the best ways to avoid bias is by introducing
random elements into the sampling process e.g. Stir the pot before tasting the soup
Continuous random variable
random quantity that can take on any value on a continuous scale ("a smooth interval of possibilities") e.g. The amount of water you drink in a day, how long you wait for a bus, how far you live from the nearest grocery store.
Three ideas for measuring spread
range, interquartile range, the five number summary
Median on a histogram
same amount of area on both sides
bias
sample is not representative of the population in some way -good sampling is about reducing as much bias as possible
if x and y are 2 independent random variables with normal distributions, then
x-y is also normal. also, since x and y are independent, var(x-y) = var(x) + var(y) and thus SD(x-y) =sqrt(var(x-y)) = sqrt(var(x)+var(y)) = sqrt(SD(x)^2 + SD(y)^2)
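Python's `statistics.NormalDist` supports subtracting two normal distributions and treats them as independent, so it can illustrate this rule. The means and standard deviations below are hypothetical.

```python
from statistics import NormalDist

x = NormalDist(mu=10, sigma=3)   # hypothetical X
y = NormalDist(mu=4, sigma=4)    # hypothetical Y, independent of X
diff = x - y                     # distribution of X - Y

# Mean subtracts; variances add: sqrt(3^2 + 4^2) = 5
print(diff.mean, diff.stdev)     # 6.0 5.0
```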
The standard deviation measures the average distance of all values from the mean.
True
The symbol σ_x̄ represents the population standard deviation of all possible sample means from samples of size n.
True
The symbol σ_p̂ represents the standard deviation of all possible sample proportions from samples of size n.
True
The symbol μ_x̄ represents the population mean of all possible sample means from samples of size n.
True
The symbol μ_p̂ represents the mean of all possible sample proportions from samples of size n.
True
Confidence Interval formula for t-distribution
x̄ +/- t*(df = n-1) * SE(x̄), where SE(x̄) = σ/sqrt(n), estimated by s_x/sqrt(n)
Response variable
y-axis variable that is predicted by x.
The mean amount of time it takes a kidney stone to pass is 14 days and the standard deviation is 6 days. Suppose that one individual is randomly chosen. Let X=time to pass the kidney stone. Round all answers to two decimal places. A. X ~ N( , ) B. Find the probability that a randomly selected person with a kidney stone will take longer than 21 days to pass it. C. Find the minimum number for the upper quarter of the time to pass a kidney stone. days.
14 6 .12 18.05
The histogram below shows the distribution of a recent exam. What might be the standard deviation for these exam scores?
15 Since the bulk of the data is between 70 and 100, this range covers around two standard deviations: 30/2 = 15.
The amount of coffee that people drink per day is normally distributed with a mean of 16 ounces and a standard deviation of 5 ounces. 23 randomly selected people are surveyed. Round all answers to two decimal places. A. xBar~ N( , ) B. For the 23 people, find the probability that the average coffee consumption is between 15 and 18 ounces per day. C. What is the probability that one randomly selected person drinks between 15 and 18 ounces of coffee per day? D. Find the IQR for the average of 23 coffee drinkers. Q1 = Q3 = IQR:
16 1.04 .80 .23 15.30 16.70 1.4
The following are weights in pounds of a college sports team: 165, 171, 174, 180, 182, 188, 189, 192, 198, 202, 202, 225, 228, 235, 240 Find the first quartile.
180. The first quartile is the number such that 25% of the values are at or below that number; here it is the median of the seven values below the overall median of 192.
The statistics below describe the data collected that represents the number of calories in a single serving of cereal for 15 types of cereals. mu=182, median=186, sigma=15, 1st quartile=172, 3rd quartile = 197, n=18 What is the IQR? Check all that apply.
172 to 197 The range that contains the middle half of the data The range between Q1 and Q3
The mean height of an adult giraffe is 18 feet. Suppose that the distribution is normally distributed with standard deviation 0.8 feet. Let X be the height of a randomly selected adult giraffe. Round all answers to two decimal places. A. X ~ N( , ) B. What is the median giraffe height? ft. C. What is the Z-score for a 20 foot giraffe? D. What is the probability that a randomly selected giraffe will be shorter than 17 feet tall? E. What is the probability that a randomly selected giraffe will be between 16 and 19 feet tall? F. The 90th percentile for the height of giraffes is ft.
18 .8 18 2.5 .11 .89 19.03
Suppose that the duration of a particular type of criminal trial is known to be normally distributed with a mean of 18 days and a standard deviation of 5 days. Let X be the number of days for a randomly selected trial. Round all answers to two decimal places. A. X ~ N( , ) B. If one of the trials is randomly chosen, find the probability that it lasted at least 21 days. C. If one of the trials is randomly chosen, find the probability that it lasted between 15 and 20 days. D. 85% of all of these types of trials are completed within how many days?
18 5 .27 .38 23
The lengths of adult males' hands are normally distributed with mean 189 mm and standard deviation is 7.2 mm. Suppose that 30 individuals are randomly chosen. Round all answers to two decimal places. A. xBar~ N( , ) B. For the group of 30, find the probability that the average hand length is less than 190. C. Find the third quartile for the average adult male hand length for this sample size.
189 1.31 .78 189.88
In the 1992 presidential election, Alaska's 40 election districts averaged 1956.8 votes per district for President Clinton. The standard deviation was 572.3. (There are only 40 election districts in Alaska.) The distribution of the votes per district for President Clinton was bell-shaped. Let X= number of votes for President Clinton for an election district. (Source: The World Almanac and Book of Facts) Round all answers to two decimal places. A. X ~ N( , ) B. Is 1956.8 a population mean or a sample mean? C. Find the probability that a randomly selected district had fewer than 1700 votes for President Clinton. D. Find the probability that a randomly selected district had between 2000 and 2700 votes for President Clinton. E. Find the first quartile for votes for President Clinton.
1957 572 Population mean .33 .37 1571
The histogram below shows the lengths of many spiders found on the forest floor. Histogram with tallest bar in the middle at 7, twice as tall as the many bars symmetrically dropping on the left and right. Without calculating, what might be the standard deviation for this data?
2 Since the bulk of the data is between 5 and 9, this range covers around two standard deviations: 4/2 = 2.
Test for Independence
2 or more populations split across a categorical variable. You should have a 2-dimensional table of counts.
The round off error when measuring the distance that a long jumper has jumped is uniformly distributed between 0 and 5 mm. Round all answers to two decimal places. A. The mean of this distribution is B. The standard deviation is C. The probability that the round off error for a jumper's distance is exactly 2.5 mm is P(x = 2.5) = D. The probability that the round off error for the distance that a long jumper has jumped is between 2 and 4 mm is P(2 < x < 4) = E. The probability that the jump's round off error is greater than 1 is P(x > 1) = F. P(x > 4 | x > 2) = G. Find the 60th percentile. H. Find the minimum for the upper quartile.
2.5 1.44 0 .4 .8 .33 3 3.75
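The uniform distribution answers follow from simple length ratios, which the Python sketch below reproduces. The helper name `prob` is illustrative; it returns P(lo < X < hi) for X uniform on [a, b].

```python
from math import sqrt

a, b = 0, 5                          # round-off error is uniform on [0, 5] mm
mean = (a + b) / 2
sd = (b - a) / sqrt(12)

def prob(lo, hi):
    """P(lo < X < hi) for X uniform on [a, b]."""
    lo, hi = max(lo, a), min(hi, b)
    return max(hi - lo, 0) / (b - a)

print(mean, round(sd, 2))                        # 2.5 1.44
print(prob(2.5, 2.5))                            # 0.0  a single point has no area
print(prob(2, 4), prob(1, 5))                    # 0.4 0.8
print(round(prob(4, 5) / prob(2, 5), 2))         # 0.33 P(x > 4 | x > 2)
print(round(a + 0.60 * (b - a), 2))              # 3.0  60th percentile
print(round(a + 0.75 * (b - a), 2))              # 3.75 start of the upper quartile
```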
For the set of scores 0, 0, 1, 1, 1, 2, 3, 5, 5, 5, 6, the mean is ______ .
2.636363636
Suppose that you are offered the following "deal". You roll a die. If you roll a 1, you win $20. If you roll a 2 or 3, you win $5. If you roll a 4, 5, or 6, you pay $15. A. Complete the PDF Table. List the x values from largest to smallest. x P(x) _ _ _ _ _ _ B. Find the expected value. C. Interpret the expected value. D. Based on the expected value, should you play this game?
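x = 20, P(x) = .17; x = 5, P(x) = .33; x = -15, P(x) = .5. Expected value: -2.50 (exactly 20(1/6) + 5(2/6) - 15(3/6); the rounded table probabilities give -2.45). If you play many games, you will likely lose about $2.50 per game on average. No, since the expected value is negative, you would be very likely to come home with less money if you played many games.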
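The expected value of the dice game can be computed exactly with fractions, which avoids the small error introduced by rounding the probabilities to two decimals. This is a sketch; the dictionary name `payoffs` is illustrative.

```python
from fractions import Fraction

# payoff in dollars -> probability of that payoff on one roll
payoffs = {20: Fraction(1, 6), 5: Fraction(2, 6), -15: Fraction(3, 6)}

expected = sum(x * p for x, p in payoffs.items())
print(expected, float(expected))  # -5/2 -2.5
```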
The following are weights in pounds of a college sports team: 165, 171, 174, 180, 182, 188, 189, 192, 198, 202, 202, 225, 228, 235, 240 Find the standard deviation. Round your answer to the nearest pound.
24.0
Suppose that the distance of fly balls hit to the outfield (in baseball) is normally distributed with a mean of 246 feet and a standard deviation of 39 feet. Let X be the distance in feet for a fly ball. Round all answers to two decimal places. A. X ~ N( , ) B. Find the probability that a randomly hit fly ball travels less than 200 feet. C.Find the 70th percentile for the distribution of fly balls.
246 39 .12 266
Suppose you want to construct a 90% confidence interval for the proportion of cars that are recalled at least once. You want a margin of error of no more than plus or minus 5 percentage points. How many cars must you study?
271
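The sample size for a proportion comes from solving z* * sqrt(p(1-p)/n) <= margin for n and rounding up. The sketch below assumes the conservative guess p = 0.5 (no prior estimate of the proportion) and z* = 1.645 for 90% confidence; the function name is hypothetical.

```python
from math import ceil

def n_for_proportion(z_star, margin, p=0.5):
    """Smallest n with z* * sqrt(p(1-p)/n) <= margin; p = 0.5 is the worst case."""
    return ceil(z_star**2 * p * (1 - p) / margin**2)

print(n_for_proportion(1.645, 0.05))  # 271
```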
The amount of calories consumed by customers at the Chinese buffet is normally distributed with mean 2885 and standard deviation 651. One randomly selected customer is observed to see how many calories X that customer consumes. Round all answers to two decimal places. A. X ~ N( , ) B. Find the probability that the customer consumes less than 2000 calories. C. What proportion of the customers consume over 5,000 calories? D. The Piggy award will given out to the 1% of customers who consume the most calories. What is the least amount of calories a person must consume to receive the Piggy award?
2885 651 .09 0 4399
Los Angeles workers have an average commute of 29 minutes. Suppose the LA commute time is normally distributed with a standard deviation of 13 minutes. Let X represent the commute time for a randomly selected LA worker. Round all answers to two decimal places. A. X ~ N( , ) B. Find the probability that a randomly selected LA worker has a commute that is longer than 40 minutes. C.Find the 90th percentile for the commute time of LA workers.
29 13 .2 46
Suppose that the weight of an newborn fawn is uniformly distributed between 2 and 4 kg. Suppose that a newborn fawn is randomly selected. Round all answers to two decimal places. A. The mean of this distribution is B. The standard deviation is C. The probability that the fawn will weigh more than 2.8 kg. D. Suppose that it is known that the fawn weighs less than 3.5 kg. Find the probability that the fawn weights more than 3 kg.= E. Find the 90th percentile for the weight of fawns.
3 .58 .6 .33 3.8
In China, 4-year-olds average 3 hours a day unsupervised. Most of the unsupervised children live in rural areas, considered safe. Suppose that the standard deviation is 1.5 hours and the amount of time spent alone is normally distributed. We randomly survey one Chinese 4-year-old living in a rural area. We are interested in the amount of time X the child spends alone per day. (Source: San Jose Mercury News) Round all answers to two decimal places. A. X ~ N( , ) B. Find the probability that the child spends less than 2 hours per day unsupervised. C. What percent of the children spend over 12 hours per day unsupervised? percent. D. 60 percent of all children spend at least how many hours per day unsupervised? hours.
3 1.5 .25 0 2.6
The percent of fat calories that a person in America consumes each day is normally distributed with a mean of about 36 and a standard deviation of 10. Suppose that one individual is randomly chosen. Let X=percent of fat calories. Round all answers to two decimal places. A. X ~ N( , ) B. Find the probability that a randomly selected fat calorie percent is more than 28. C. Find the minimum number for the upper quarter of percent of fat calories.
36 10 .79 42.74
The percent of fat calories that a person in America consumes each day is normally distributed with a mean of about 36 and a standard deviation of about 10. Suppose that 16 individuals are randomly chosen. Round all answers to two decimal places. A. xBar~ N( , ) B. For the group of 16, find the probability that the average percent of fat calories consumed is more than 20. C. Find the third quartile for the average percent of fat calories.
36 2.5 1 37.69
Each sweat shop worker at a computer factory can put together 4 computers per hour on average with a standard deviation of 0.8 computers. 20 workers are randomly selected to work the next shift at the factory. Round all answers to two decimal places and assume a normal distribution. A. xBar~ N( , ) B. For the 20 workers, find the probability that their average number of computers put together per hour is between 3 and 5. C. If one randomly selected worker is observed, find the probability that this worker will put together between 3 and 5 computers per hour.
4 .18 1 .79
The statistics below shows the full time equivalent student (FTES) count for the history of Lake Tahoe Community College. mu = 1000, median = 1014, sigma = 474, Q1 = 528.5, Q3 = 1447.5, n = 29 A sample of 8 years are taken. What is the best prediction for the numbers of these years that have fewer than 1014 FTES? 25% of all years had at least how many FTES? 25% of all years had at most how many FTES? What percent of all the years had between 1014 and 1447.5 FTES? percent. What is the population standard deviation? How many standard deviations above the mean is the third quartile? Round your answer to three decimal places.
4. 1447.5 528.5 25 474 0.944 Since 1014 is the median, half of the sample is predicted to be less than this number. Asking about 25% more than a number is the same as 75% less, meaning the third quartile. Asking 25% less is the first quartile. Between the first and third quartile is the IQR or 50% of the data. The standard deviation is sigma. The last question asks for the z-score which is z = (x - mu)/sigma.
The patient recovery time from a particular surgical procedure is normally distributed with a mean of 4.2 days and a standard deviation of 1.7 days. Let X be the recovery time for a randomly selected patient. Round all answers to two decimal places. A. X ~ N( , ) B. What is the median recovery time? days. C. What is the Z-score for a patient that took 2 days to recover? D. What is the probability of spending more than 6 days in recovery? E. What is the probability of spending between 4 and 5 days in recovery? days. F. The 80th percentile for recovery times is? days.
4.2 1.7 4.2 -1.29 .14 .23 5.63
Suppose you want to construct a 99% confidence interval for the proportion of first dates that lead to a second date. You want a margin of error of no more than plus or minus 2 percentage points. How many couples must you observe?
4161
The number of ants per acre in the forest is normally distributed with mean 45,289 and standard deviation 12,340. Let X= number of ants in a randomly selected acre of the forest. Round all answers to two decimal places. A. X ~ N( , ) B. Find the probability that a randomly acre in the forest has fewer than 40,000 ants. C. Find the probability that a randomly selected acre has between 35,000 and 50,000 ants. D. Find the first quartile.
45289 12340 .33 .45 36966
The statistics below describe the data collected by a business person who researched the amount of money 65 customers spent. mu = 27.52, Median=24.96, Sigma=6.34, Q1=18.34, Q3=35.72, n=65 A sample of 10 receipts is taken. What is the best prediction for the number of these receipts that were less than $24.96? 25% of all receipts were more than how much money? 25% of all receipts were less than how much money? What percent of all the receipts were between $18.34 and $35.72? percent. What is the population standard deviation? How many standard deviations below the mean is the first quartile? Round your answer to three decimal places.
5 35.72 18.34 50 6.34 1.448 Since 24.96 is the median, half of the sample is predicted to be less than this number. Asking about 25% more than a number is the same as 75% less, meaning the third quartile. Asking 25% less is the first quartile. Between the first and third quartile is the IQR or 50% of the data. The standard deviation is sigma. The last question asks for the z-score which is z = (x - mu)/sigma.
The age of the children in kindergarten on the first day of school is uniformly distributed between 4.8 and 5.8 years old. A first time kindergarten child is selected at random. Round all answers to two decimal places. A. The mean of the distribution is B. The standard deviation is C. The probability that the child will be older than 5 years old is D. The probability that the child will be between 5.2 and 5.7 years old is E. If such a child is at the 45th percentile, how old is that child? years old.
5.3 .29 .8 .5 5.25
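A short sketch of the uniform-distribution formulas behind these answers, assuming the standard results mean = (a + b)/2 and standard deviation = (b - a)/sqrt(12). Variable names are illustrative.

```python
import math

a, b = 4.8, 5.8  # endpoints of the uniform distribution from the question

mean = (a + b) / 2
std_dev = (b - a) / math.sqrt(12)

# Probabilities are rectangle areas: width of the interval times height 1/(b - a).
p_older_than_5 = (b - 5) / (b - a)
p_between_5_2_and_5_7 = (5.7 - 5.2) / (b - a)

# The 45th percentile sits 45% of the way from a to b.
percentile_45 = a + 0.45 * (b - a)

print(round(mean, 2), round(std_dev, 2), round(percentile_45, 2))
```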
Suppose you want to construct a 90% confidence interval for the average speed that cars travel on the highway. You want a margin of error of no more than plus or minus 0.5 mph and know that the standard deviation is 7 mph. At least how many cars must you clock?
531
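The sample-size calculation can be sketched as follows, assuming the usual formula n = (z* × sigma / margin)^2 rounded up. The helper name `sample_size_for_mean` is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(confidence, sigma, margin):
    """Smallest n for which z* * sigma / sqrt(n) is at most the margin of error."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Always round up: rounding down would give a margin slightly too large.
    return math.ceil((z_star * sigma / margin) ** 2)

cars_needed = sample_size_for_mean(0.90, sigma=7, margin=0.5)
print(cars_needed)
```

The same helper applies to the tooth-brushing card further down (99% confidence, sigma 21, margin 2).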
The amount of syrup that people put on their pancakes is normally distributed with mean 60 mL and standard deviation 10 mL. Suppose that 25 randomly selected people are observed pouring syrup on their pancakes. Round all answers to two decimal places. A. xBar~ N( , ) B. For the group of 25 pancake eaters, find the probability that the average amount of syrup is between 55 mL and 65 mL. C. If a single randomly selected individual is observed, find the probability that the this person consumes between 55 mL and 65 mL of syrup.
60 2 .99 .38
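A minimal sketch contrasting the sample mean with a single observation, assuming the sample mean of n observations is N(mu, sigma/sqrt(n)). Names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 60, 10, 25  # values from the question

# Distribution of the average of 25 people: xBar ~ N(60, 10 / sqrt(25)) = N(60, 2).
sample_mean = NormalDist(mu, sigma / math.sqrt(n))
# Distribution of one person: X ~ N(60, 10).
individual = NormalDist(mu, sigma)

p_mean_between = sample_mean.cdf(65) - sample_mean.cdf(55)
p_individual_between = individual.cdf(65) - individual.cdf(55)

print(round(p_mean_between, 2), round(p_individual_between, 2))
```

The average is far more likely to land in the interval because its standard deviation is five times smaller.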
The average number of words in a romance novel is 64,182 and the standard deviation is 17,154. Assume the distribution is normal. Let X be the number of words in a randomly selected romance novel. Round all answers to two decimal places. A. X ~ N( , ) B. Find the proportion of all novels that are between 50,000 and 60,000 words. C. The 90th percentile for novels is words. D. The middle 40% of romance novels have from words to
64182 17154 .2 86166 55186 73178
Suppose that the speed at which cars go on the freeway is normally distributed with mean 68 mph and standard deviation 5 miles per hour. Let X be the speed for a randomly selected car. Round all answers to two decimal places. A. X ~ N( , ) B. If one car is randomly chosen, find the probability that it is traveling more than 70 mph. C. If one of the cars is randomly chosen, find the probability that it is traveling between 65 and 75 mph. D. 90% of all cars travel at least how fast on the freeway? mph.
68 5 .34 .64 61.59
The average number of miles (in thousands) that a car's tire will function before needing replacement is 68 and the standard deviation is 15. Suppose that 9 randomly selected tires are tested. A. xBar~ N( , ) B. For the 9 tires tested, find the probability that the average miles (in thousands) before need of replacement is between 65 and 75 . C. If a randomly selected individual tire is tested, find the probability that the number of miles (in thousands) before it will need replacement is between 65 and 75 .
68 5 .64 .26
The following show the results of a survey asking women how many pairs of shoes they own : 2, 4, 4, 5, 7, 8, 8, 9,12,15,17, 28 Find the standard deviation. Round your answer to one decimal place.
7.3
The statistics below describe the data collected by a psychologist who surveyed single people asking how many times they went on date last year. mu = 14, median = 11, sigma = 6.2, Q1 = 7.5, Q3 = 18, n = 200 What is the IQR? Check all that apply.
7.5 to 18 The range that contains the middle half of the data The range between Q1 and Q3
Suppose you want to construct a 99% confidence interval for the mean number of seconds that people spend brushing their teeth. You want a margin of error of no more than plus or minus 2 seconds and know that the standard deviation is 21 seconds. At least how many people must you observe?
732
The average American man consumes 9.8 grams of sodium each day. Suppose that the sodium consumption of American men is normally distributed with a standard deviation of 0.9 grams. Suppose an American man is randomly chosen. Let X = the amount of sodium consumed. Round all numeric answers to 2 decimal places. A. X ~ N( , ) B. Find the probability that this American man consumes between 9 and 10 grams of sodium per day. C. The middle 20% of American men consume between what two weights of sodium? Low: High:
9.8 .9 .4 9.57 10.03
The following show the results of a survey asking women how many pairs of shoes they own: 2, 4, 4, 5, 7, 8, 8, 9, 12, 15, 17, 28 The mean is: (Round to two decimal places). The sample standard deviation is: (Round to two decimal places). The first quartile is: The median is: The third quartile is: Find the number of pairs of shoes that is 2 standard deviations above the mean. (Round to the nearest whole number). You just met a woman who has 6 pairs of shoes. How many standard deviations below the mean is this woman's shoe ownership? (Round to two decimal places).
9.92 7.27 4.5 8 13.5 24 0.54 These questions are all meant to be done using the TI 84 or comparable calculator. Please watch the video that is linked from the question for instructions. Only the last two questions need a specific formula. For the second to last, just take the mean and add twice the standard deviation. The last asks for the z-score = (x - mean)/(standard deviation).
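A sketch of the same summary statistics in Python, for readers without a TI-84. It assumes the calculator's quartile convention (Q1 and Q3 are the medians of the lower and upper halves of the sorted data); other software may use a different quartile rule and give slightly different values.

```python
from statistics import mean, median, stdev

shoes = sorted([2, 4, 4, 5, 7, 8, 8, 9, 12, 15, 17, 28])

avg = mean(shoes)
sample_sd = stdev(shoes)          # divides by n - 1

# Quartiles as medians of the two halves (assumed TI-84 convention).
half = len(shoes) // 2
q1 = median(shoes[:half])
med = median(shoes)
q3 = median(shoes[-half:])

two_sd_above = avg + 2 * sample_sd
sds_below_for_6 = (avg - 6) / sample_sd   # size of the z-score for 6 pairs

print(round(avg, 2), round(sample_sd, 2), q1, med, q3)
print(round(two_sd_above), round(sds_below_for_6, 2))
```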
Last year, you did a study and found out that 67% of the students who eat in the cafeteria have a salad. This year you want to construct a 95% confidence interval for the proportion of students who have a salad in the cafeteria. You want a margin of error of no more than plus or minus 3 percentage points. How many students must you observe?
944
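The proportion version of the sample-size calculation can be sketched like this, assuming n = z*^2 × p × (1 - p) / margin^2 rounded up, with last year's 67% used as the planning value for p. The helper name is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(confidence, p_guess, margin):
    """Smallest n for which z* * sqrt(p(1-p)/n) is at most the margin of error."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z_star ** 2 * p_guess * (1 - p_guess) / margin ** 2)

students_needed = sample_size_for_proportion(0.95, p_guess=0.67, margin=0.03)
print(students_needed)
```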
Spread
Another parameter we might care about. Big spread is exciting for people in Vegas because they focus on the bigger wins
Handout Lectures
Continuous Random Variables
The histogram below shows the lengths of many spiders found on the forest floor. Histogram with tallest bar in the middle at 7, twice as tall as the many bars symmetrically dropping on the left and right. Check each of the following that are true statements.
If there had been only 40 spiders of length 7 mm instead of 80, then the new standard deviation would be larger than the original standard deviation. A relative frequency histogram would also have the same shape, just a different scale on the vertical axis. This histogram is approximately normal.
Poisson distribution
If there is a known average of λ events occurring per unit time, and these events are independent of each other, then the number of events X occurring in one unit of time has the Poisson distribution. The probability of k events occurring in one unit of time is P(X=k) = λ^k * e^(-λ) / k!
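The formula on this card translates directly into code. This is a minimal sketch; the rate of 3 events per unit time is a made-up value for illustration.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) when X counts events in one unit of time at average rate lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 3.0  # hypothetical average number of events per unit time
for k in range(5):
    print(k, round(poisson_pmf(k, lam), 4))
```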
"95% confident" technically means
If you drew many, many samples, and for each one, you find p̂ and built a confidence interval by reaching out +/- 2 standard deviations, then the true population parameter would be in about 95% of these intervals. -It is not "A 95% chance that your value is in the interval" -It is actually "95% of the intervals you find will contain the parameter"
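The "95% of the intervals contain the parameter" reading can be illustrated with a small simulation. This is only a sketch: the true proportion 0.3, the sample size 500, and the 2000 repetitions are arbitrary choices, and the exact coverage varies a little from run to run and with the seed.

```python
import random
from statistics import NormalDist

random.seed(11)
true_p, n, trials = 0.3, 500, 2000       # made-up population and study sizes
z_star = NormalDist().inv_cdf(0.975)     # about 1.96 for 95% confidence

hits = 0
for _ in range(trials):
    # Draw one sample and build its confidence interval around p-hat.
    successes = sum(random.random() < true_p for _ in range(n))
    p_hat = successes / n
    se = (p_hat * (1 - p_hat) / n) ** 0.5
    if p_hat - z_star * se <= true_p <= p_hat + z_star * se:
        hits += 1

coverage = hits / trials
print(coverage)  # fraction of intervals that captured the true proportion
```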
Simple Random Sample (SRS)
Imagine each point in a box as a person. We just pick a certain number of random points.
Numeric/Quantitative Data
Numerical data with units
The average temperatures and standard deviations for the towns of Springfield, Oakridge, and Riverton are shown in the table below. Also shown are high temperatures that these three towns had on July 4. (Town: July 4 Temp., Mean Temp., Standard Dev.) Springfield: 98, 89, 6. Oakridge: 83, 77, 3. Riverton: 94, 94, 8. Relative to the town's typical weather, which town had the most extreme temperature on July 4?
Oakridge Springfield's July 4 temperature was 1.5 standard deviations above its mean, Oakridge's July 4 temperature was 2 standard deviations above its mean, and Riverton's July 4 temperature was at its mean.
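The comparison comes down to one z-score per town. A minimal sketch using the numbers from the table in the question:

```python
# (July 4 temperature, mean temperature, standard deviation) for each town
towns = {
    "Springfield": (98, 89, 6),
    "Oakridge": (83, 77, 3),
    "Riverton": (94, 94, 8),
}

# z = (x - mean) / standard deviation
z_scores = {name: (temp - avg) / sd for name, (temp, avg, sd) in towns.items()}

# The most extreme day is the one with the largest z-score in absolute value.
most_extreme = max(z_scores, key=lambda name: abs(z_scores[name]))
print(z_scores, most_extreme)
```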
The expected value is often referred to as the "long-term" average or mean.
This means that over the long term of doing an experiment over and over, you would expect this average.
Point Estimate
a single number computed from a sample and used to estimate a population parameter
Random Variable
a variable that takes on possible numeric values that result from a random event
If you break up the sample space into disjoint sets, the probabilities of these events must
add up to 1
Inferential Statistics
also called statistical inference or inductive statistics; this facet of statistics deals with estimating a population parameter based on a sample statistic. For example, if four out of the 100 calculators sampled are defective we might infer that four percent of the production is defective.
Mean on a histogram
balance point of the histogram, the torque is the same on both sides of the mean
In general, the size of the SD gives a sense for how
closely your experience playing the game will hug the mean
The X^2 distribution is used to compare
counts in a table (to a list of expected values).
CI for mean difference of paired samples
dhat +/- t*_df × SE(dhat) -d stands for differences -SE = s/sqrt(n) -df = n-1
Difference in Means: Confidence Interval for paired, dependent samples
dhat +/- t*_df × SE(dhat) SE = (standard deviation of d) / sqrt(n)
Residual
e=y-ŷ (how off the model is at the value of x) -y = value observed from actual data point -ŷ = value predicted from regression line
Pros of range
easy to calculate, gives sense of span of data
Attributes of a scatterplot
form, direction, strength, outliers
high influence point
gives a significantly different slope for the regression line if it is included, versus excluded, from an analysis
When a histogram is skewed right, the mean is
greater than the median. (Small amount of higher values push the mean forward but don't affect the median)
marginal probabilities are the sums of
joint probabilities
Two random variables are independent if
knowing the outcome of one has no effect on the outcome of the other
As the graph skews to the right, the mean becomes
larger than the median. The mean is pulled right by the large values in the data set.
Tails of distribution
left and right sides of a graph
When the relationship is curved, the correlation is
less meaningful
The density graph predicts
the relative likelihood of a value occurring, not its probability of occurring
Correlation only works with
linear relationships
correlation is unaffected by
linear scale changes (cor(x,y) = cor (x,2.5y) = cor(x,y+14) = cor(2x-17, 99999y+1))
Probability Model
lists the different outcomes for a random variable and gives the probability of each outcome
E(x) is also known as the
long-run average, denoted μ
Skewed left
longer tail on the left
Skewed right
longer tail on the right
When a histogram is skewed left, the mean is
lower than the median (Small amount of lower values push the mean back but don't affect the median)
To establish a causation, eliminate
lurking variables
Sampling distribution
making a histogram of all the means from all our different samples
center used for symmetric distributions without outliers
mean
high leverage point
outlier where x is far from the mean of x values
For a two sample proportion test testing p1 and p2, you would think about
p1 - p2. e.g. p1 - p2 > 0
Correlation matrix
shows the correlation of every variable with every other variable
For any x-value (or z-score, if you convert to a standard normal model, N(0,1)) the percentile is
simply the area to the left of this value
Conditions for creating a regression model
since correlations are involved, we need our three conditions from before plus a fourth: 1) quantitative variables 2) straight enough 3) no outliers 4) residual noise
the line of best fit is determined by
slope and y-intercept
The sample size does not need to be
some percentage of the population size. Larger samples are better irrespective of the population size e.g. Tasting a small pot of soup gives same amount of info as tasting a large pot of soup -However, tasting 3 spoons of soup is better than tasting 1 spoon
event
some set of outcomes you might care about
subgroups can be identified in original data or residuals.
split your data into different parts and do several linear regressions instead of one clunky regression.
Standard Error
sqrt(p̂*q̂ / n) The same as the standard deviation, sqrt(pq/n), but built upon p̂ (the sample proportion) instead of p. You are trying to estimate the population using the sample, so you use p̂ values to estimate p.
Standard deviation of a density function
sqrt(var(x))
p̂ pooled
(successes1 + successes2) / (n1 + n2) Used when you are doing a hypothesis test. If we assume H0 is true (p1-p2 = 0) then the populations are the same and pooling p̂1 and p̂2 will give better approximations than using both of them separately.
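A small sketch of the pooled proportion, with made-up counts (30 of 100 and 45 of 120) used purely for illustration. The helper name is hypothetical. The point of the example is that the two sums are formed first and divided afterwards.

```python
def pooled_proportion(successes1, n1, successes2, n2):
    """Combine both samples into one: total successes over total sample size."""
    return (successes1 + successes2) / (n1 + n2)

# Hypothetical samples: 30 successes out of 100, and 45 successes out of 120.
p_pooled = pooled_proportion(30, 100, 45, 120)

# The pooled value is a weighted average, so it lies between the two sample proportions.
p_hat1, p_hat2 = 30 / 100, 45 / 120
print(p_hat1, p_hat2, round(p_pooled, 4))
```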
cons of range
summarizes the data using only 2 data points, not resistant to outliers
Difference in Means: Hypothesis Test for paired, dependent samples
t_df = (dhat - 0) / SE(dhat)
outcome
the data created from a trial
z-score
the linear transformation of the form z = (x - μ)/σ ; if this transformation is applied to any normal distribution X ~ N(μ, σ) the result is the standard normal distribution Z ~ N(0,1). If this transformation is applied to any specific value x of the RV with mean μ and standard deviation σ, the result is called the z-score of x. The z-score allows us to compare data that are normally distributed but scaled differently.
In any given situation, the higher the risk of Type I error,
the lower the risk of Type II error.
Error Bound for a Population Proportion (EBP)
the margin of error; depends on the confidence level, the sample size, and the estimated (from the sample) proportion of successes.
For positive test results to be useful, you need
the orders of magnitude of "test accuracy" and "disease prevalence" to be better matched.
You can ask questions about __________ since the probability of any individual outcome is always 0
the probability of some interval of values occurring
Conditional probability P(A|B)
the probability that event A occurs given the information that event B occurred. Pronounced P(A given B)
Using noise to determine whether a regression model is appropriate
the residual plot should show "noise", or no observable patterns in the plot. -if a pattern is seen, regression is not appropriate
The units on variance will always be
the square of the units in the problem. This can make variance difficult to interpret
Standard Error of the Mean
the standard deviation of the distribution of the sample means
Reexpressing data
to make data more visually appealing, to create more commonly-shaped histograms, to get lens of analysis correct
Methods for conditional probabilities
tree diagrams, P(A|B), and Bayes' Theorem
The Law of Averages
(Gambler's fallacy) Incorrect use of LLN. False way of thinking that says if the current situation is out of whack, then it must correct itself in the short term.
Confidence interval for 2 sample proportions
(p̂1-p̂2)+/-z*SE(p̂1-p̂2) -samples must be independent from each other, at least 10 success/fails condition must be met also
CI for mean difference in two samples
(x̄1-x̄2) +/- t*_df × SE(x̄1-x̄2) SE = sqrt(s1^2/n1 + s2^2/n2) df = min(n1-1, n2-1) (min means pick the lower number of the two)
If 60% of people run and 20% of runners wear long socks, what percent of people run and wear long socks? (What is the joint probability?) -What is the probability that someone doesn't wear long socks given that they run?
-Joint probability: 0.2*0.6 = 0.12, so 12% of all people run and wear long socks. -0.6 - 0.12 = 0.48 is the share of people who run and don't wear long socks, so 0.48 / 0.6 = 0.8: there is an 80% chance that someone doesn't wear long socks given that they run.
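The two steps can be written out as a short calculation. This is a sketch of the multiplication rule P(A and B) = P(B|A) × P(A) followed by the definition of conditional probability; variable names are illustrative.

```python
p_run = 0.6                    # P(runs)
p_long_given_run = 0.2         # P(long socks | runs)

# Joint probability via the multiplication rule.
p_run_and_long = p_long_given_run * p_run

# P(no long socks | runs) = P(runs and no long socks) / P(runs).
p_run_and_not_long = p_run - p_run_and_long
p_not_long_given_run = p_run_and_not_long / p_run

print(round(p_run_and_long, 2), round(p_not_long_given_run, 2))
```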
The student council is hosting a drawing to raise money for scholarships. They are selling tickets for $1 each and will sell 3000 tickets. There is one $500 grand prize, four $100 second prizes, and thirty $10 third prizes. You just bought a ticket. Find the expected value for your profit.
-0.6 (with margin: 0.02)
The student council is hosting a drawing to raise money for scholarships. They are selling tickets for $5 each and will sell 800 tickets. There is one $2,000 grand prize, three $300 second prizes, and fifteen $20 third prizes. You just bought a ticket. Find the expected value for your profit.
-1.0 (with margin: 0.02)
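Both raffle answers follow from the same expected-value calculation. A minimal sketch, assuming each ticket can win at most one prize and every ticket is equally likely to win; the helper name is hypothetical.

```python
def expected_profit(ticket_price, tickets_sold, prizes):
    """prizes is a list of (prize amount, number of such prizes) pairs."""
    # Expected winnings per ticket = total prize money / number of tickets.
    total_prize_money = sum(amount * count for amount, count in prizes)
    return total_prize_money / tickets_sold - ticket_price

first_raffle = expected_profit(1, 3000, [(500, 1), (100, 4), (10, 30)])
second_raffle = expected_profit(5, 800, [(2000, 1), (300, 3), (20, 15)])
print(round(first_raffle, 2), round(second_raffle, 2))
```

A negative expected profit is what lets the drawing raise money for the scholarships.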
Twelve teachers attended a seminar on mathematical problem solving. Their attitudes were measured before and after the seminar. A positive number change attitude indicates that a teacher's attitude toward math became more positive. The twelve change scores are as follows:3; 8; -1; 2; 0; 5; -3; 1; -1; 6; 5; -2 Find the change score that would be 2.4 standard deviations below the mean. Round your answer to two decimal places.
-6.48 The standard deviation is about 3.50 and the mean is about 1.92. Calculate: 1.92 - 2.4 x 3.50 = -6.48
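The same calculation in code, as a sketch using the sample standard deviation. With unrounded intermediate values the result comes out near -6.49; the card reaches -6.48 because it rounds the mean and standard deviation to two decimals first.

```python
from statistics import mean, median, stdev

changes = [3, 8, -1, 2, 0, 5, -3, 1, -1, 6, 5, -2]

avg = mean(changes)
sd = stdev(changes)            # sample standard deviation (divides by n - 1)
middle = median(changes)       # average of the two middle values

value_unrounded = avg - 2.4 * sd
value_card = round(avg, 2) - 2.4 * round(sd, 2)   # the card's rounded arithmetic

print(round(avg, 2), round(sd, 2), middle)
print(round(value_unrounded, 2), round(value_card, 2))
```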
Summary of Sampling Statistics: To estimate a population parameter p, we can
-Draw a random sample of size n. -This sample will have a statistic p̂ ≈ p. -If we drew many samples, each would have its own statistic p̂ and we could make a histogram of these values -The histogram, the sampling distribution, is approximately: N( μ, σ/sqrt(n) )
When to use Z vs. T
-If you know sigma (almost never true): use z-distribution -In all other cases: Use t-distribution
Why doesn't X+X = 2X?
-In the X+X scenario, we often add winning and losing situations which diminish the influence of one another. (e.g win + win, loss + loss, win + loss, loss + win are all possible) -In the 2X scenario, you either win twice or you lose twice.
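The difference shows up in the variances. The sketch below uses a hypothetical fair game (win 1 or lose 1, each with probability 0.5) and enumerates the outcomes exactly: two independent plays give a variance of 2, while doubling a single play gives a variance of 4.

```python
from itertools import product

game = {1: 0.5, -1: 0.5}   # hypothetical game: outcome -> probability

def variance(dist):
    """Variance of a discrete distribution given as {value: probability}."""
    mu = sum(value * prob for value, prob in dist.items())
    return sum((value - mu) ** 2 * prob for value, prob in dist.items())

# X + X: two independent plays, so wins and losses can cancel out.
two_plays = {}
for (a, pa), (b, pb) in product(game.items(), repeat=2):
    two_plays[a + b] = two_plays.get(a + b, 0) + pa * pb

# 2X: one play with the stakes doubled, so you win twice or lose twice.
doubled = {2 * value: prob for value, prob in game.items()}

print(two_plays, variance(two_plays))
print(doubled, variance(doubled))
```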
Cluster Sampling
-Sampling in which elements are selected in two or more stages, with the first stage being the random selection of naturally occurring clusters and the last stage being the random selection of elements within clusters -e.g. asking people as they walk into various gyms on campus what their average GPAs are. 3 different gyms can each have both grads and undergrads. -The population is split into pieces just because it's more convenient -Pieces are heterogeneous in relation to the parameter you're measuring (gyms all have the same mix of undergrads and grads)
Continuous Distributions
-The equation for a continuous probability distribution is called the probability density function (pdf). -The cumulative distribution function (cdf) is the area to the left of the value. -P(a < x < b) is the area under the curve. -The probability density function is never negative. -The total area bounded by the pdf and the x-axis is 1. -The pdf can reach values greater than 1.
In inference about regression, you use the histogram for all b1 values because b0 doesn't really tell us anything.
-The histogram of all possible b1's is centered at the true population parameter, β1 -SE = se / (sx*sqrt(n-1)), where se = standard deviation of residuals and sx = standard deviation of x values -The histogram is best approximated by a t-distribution with n-2 degrees of freedom (t_(n-2)) Conditions: 1. the conditions for inference from a regression line must be met (straight enough, quantitative, no outliers, residual noise) 2. independence condition (random, and <10% rule) 3. histogram of residuals is nearly normal
Stratified Random Sampling
-What is the average GPA of UCSD students? -Since grads and undergrads have much different average GPAs, you split the sample into 2 groups, do SRSs on each, then combine the results. -Pieces are homogeneous in relation to parameter you are measuring (undergrads have lower GPAs, grads have higher GPAs)
Common Geometric Model questions
-What is the probability that it takes exactly k <Bernoulli trials> to get the first <success>? -On average, how many <Bernoulli Trials> will it take to get the first <success>?
Common Binomial Model questions
-What's the probability of getting exactly k <successes> in n <Bernoulli trials>? -On average, how many <successes> will I get if I do n <Bernoulli trials>?
Margin of error is increased by
-smaller samples -higher level of confidence
Answer the following questions and round your answers to 2 decimal places. 66% of Americans are home owners. If 45 Americans are selected at random, find the probability that A. Exactly 30 of them are home owners. B. At most 25 of them are home owners. C. More than 33 of them are home owners. D. Between 27 and 32 (including 27 and 32) of them are home owners.
.12 .09 .11 .65
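These four questions map onto the calculator's binompdf and binomcdf functions. The sketch below rebuilds them from the binomial formula with the standard library; the helper names `binom_pmf` and `binom_cdf` are hypothetical stand-ins for the calculator functions.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k): exactly k successes in n Bernoulli trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    """P(X <= k): at most k successes in n Bernoulli trials."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

n, p = 45, 0.66  # values from the question

exactly_30 = binom_pmf(30, n, p)
at_most_25 = binom_cdf(25, n, p)
more_than_33 = 1 - binom_cdf(33, n, p)                         # complement of "at most 33"
between_27_and_32 = binom_cdf(32, n, p) - binom_cdf(26, n, p)  # inclusive on both ends

print(round(exactly_30, 2), round(at_most_25, 2),
      round(more_than_33, 2), round(between_27_and_32, 2))
```

The wording decides the cutoff: "more than 33" subtracts the cdf at 33, while "at least 35" would subtract the cdf at 34.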
Complete the following probability distribution function table. x P(x) -4 [response1] 0 0.14 5 0.32 20 0.41
.13 The probabilities add to 1. 1 - 0.14 - 0.32 - 0.41 = 0.13
Answer the following questions and round your answers to 2 decimal places. 75% of owned dogs in the United States are spayed or neutered. If 47 dogs are randomly selected, find the probability that A. Exactly 36 of them have been spayed or neutered. B. At most 30 of them have been spayed or neutered. C. At least 35 of them have been spayed or neutered. D. Between 30 and 40 (including 30 and 40) of them have been spayed or neutered.
.13 .06 .61 .94
Answer the following questions and round your answers to 2 decimal places. 31% of all college students major in STEM (Science, Technology, Engineering, and Math). If 36 students are randomly selected, find the probability that A. Exactly 11 of them major in STEM. B. Fewer than 10 of them major in STEM. C. More than 13 of them major in STEM. D. Between 10 and 15 (including 10 and 15) of them major in STEM.
.14 .28 .20 .66
Answer the following questions and round your answers to 2 decimal places. 84% of all Americans live in cities with population greater than 50,000 people. If 50 Americans are selected at random, find the probability that A. Exactly 42 of them live in cities with population greater than 50,000 people. B. At most 45 of them live in cities with population greater than 50,000 people. C. More than 40 of them live in cities with population greater than 50,000 people. D. Between 35 and 40 (including 35 and 40) of them live in cities with population greater than 50,000 people.
.15 .92 .73 .27
Answer the following questions and round your answers to 2 decimal places. 13% of all Americans live in poverty. If 45 Americans are randomly selected, find the probability that A. Exactly 5 of them live in poverty. B. At most 5 of them live in poverty. C. At least 5 of them live in poverty. D. Between 3 and 6 (including 3 and 6) of them live in poverty.
.17 .46 .71 .58
Answer the following questions and round your answers to 2 decimal places. 70% of bald eagles survive their first year of life. If 25 bald eagles are selected at random, find the probability that A. Exactly 18 of them survive their first year of life. B. At most 19 of them survive their first year of life. C. More than 16 of them survive their first year of life. D. Between 15 and 22 (including 15 and 22) of them survive their first year of life.
.17 .81 .68 .89
A smart phone manufacturer is interested in constructing a 99% confidence interval for the proportion of smart phones that break before the warranty expires. 97 of the 1750 randomly selected smart phones broke before the warranty expired. Round your answers to three decimal places. A. With 99% confidence the proportion of all smart phones that break before the warranty expires is between and . B. If many groups of 1750 randomly selected smart phones are selected, then a different confidence interval would be produced for each group. About percent of these confidence intervals will contain the true population proportion of all smart phones that break before the warranty expires and about percent will not contain the true population proportion.
0.041 0.070 99 1
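A sketch of the one-proportion z-interval that produces these bounds, assuming the formula p̂ ± z* × sqrt(p̂(1 - p̂)/n). The helper name `proportion_ci` is hypothetical.

```python
from statistics import NormalDist

def proportion_ci(successes, n, confidence):
    """Confidence interval for a population proportion (one-proportion z-interval)."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - margin, p_hat + margin

low, high = proportion_ci(97, 1750, 0.99)   # the smart phone question
print(round(low, 3), round(high, 3))
```

The other confidence-interval cards in this set use the same function with different counts and confidence levels, for example `proportion_ci(42, 400, 0.95)` for the caterpillars.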
A psychologist is interested in constructing a 95% confidence interval for the proportion of people who accept the theory that a person's spirit is no more than the complicated network of neurons in the brain. 68 of the 945 randomly selected people who were surveyed agreed with this theory. A. With 95% confidence the proportion of all people who accept the theory that a person's spirit is no more than the complicated network of neurons in the brain is between and . B. If many groups of 945 randomly selected people are surveyed, then a different confidence interval would be produced from each group. About percent of these confidence intervals will contain the true population proportion of all people who accept the theory that a person's spirit is no more than the complicated network of neurons in the brain and about percent will not contain the true population proportion.
0.055 0.088 95 5
You are interested in constructing a 95% confidence interval for the proportion of all caterpillars that eventually become butterflies. Of the 400 randomly selected caterpillars observed, 42 lived to become butterflies. A. With 95% confidence the proportion of all caterpillars that lived to become a butterfly is between and . B. If many groups of 400 randomly selected caterpillars were observed, then a different confidence interval would be produced from each group. About percent of these confidence intervals will contain the true population proportion of caterpillars that become butterflies and about percent will not contain the true population proportion.
0.075 0.135 95 5
A businessperson is interested in constructing a 90% confidence interval for the proportion of coupons that get redeemed. 97 of the 900 randomly selected coupons sent out were redeemed. A. With 90% confidence the proportion of all coupons that get redeemed is between and . B. If many groups of 900 randomly selected coupons were sent out, then a different confidence interval would be produced from each group. About percent of these confidence intervals will contain the true population proportion of coupons that get redeemed and about percent will not contain the true population proportion.
0.091 0.125 90 10
A biologist is interested in constructing a 90% confidence interval for the proportion of coyotes that survive at least one year after straying from the pack. 42 of the 350 randomly observed coyotes that strayed from the pack were still alive one year later. A. With 90% confidence the proportion of all coyotes that survive at least one year after straying from the pack is between and . B. If many groups of 350 randomly selected coyotes that strayed from the pack are observed, then a different confidence interval would be produced for each group. About percent of these confidence intervals will contain the true population proportion of all coyotes that survive at least one year after straying from the pack and about percent will not contain the true population proportion.
0.091 0.149 90 10
Fifty part-time students were asked how many courses they were taking this term. The (incomplete) results are shown below: # of Courses Frequency Relative Frequency Cumulative Relative Frequency 1 30 0.6 2 15 3 Find the relative frequency for students taking 3 courses.
0.1
Complete the following probability distribution function table. x P(x) 1 0.3 3 [response1] 7 0.2 12 0.4
0.1 (with margin: 0.01)
You are interested in constructing a 95% confidence interval for the proportion of college students who have seen at least one Shakespeare play. Of the 700 randomly selected college students surveyed, 120 had seen a Shakespeare play. A. With 95% confidence the proportion of all college students who have seen a Shakespeare play is between and . B. If many groups of 700 randomly selected college students were surveyed, then a different confidence interval would be produced from each group. About percent of these confidence intervals will contain the true population proportion of college students who have seen a Shakespeare play and about percent will not contain the true population proportion.
0.144 0.199 95 5
Complete the following probability distribution function table. x P(x) -4 0.31 -2 0.14 0 0.07 2 [response1] 4 0.28
0.2 (with margin: 0.01)
You are interested in constructing a 95% confidence interval for the proportion of all statistics students who receive tutoring. Of the 500 randomly selected statistics students, 128 received tutoring. A. With 95% confidence the proportion of all statistics students who receive tutoring is between and . B. If many groups of 500 randomly selected statistics students were surveyed, then a different confidence interval would be produced from each group. About percent of these confidence intervals will contain the true population proportion of statistics students who receive tutoring and about percent will not contain the true population proportion.
0.218 0.294 95 5
You are interested in constructing a 90% confidence interval for the proportion of people who will buy a new cell phone this year. Of the 800 randomly selected people surveyed, 382 will buy a new cell phone this year. A. With 90% confidence the proportion of all people who will buy a cell phone this year is between and . B. If many groups of 800 randomly selected people were surveyed, then a different confidence interval would be produced from each group. About percent of these confidence intervals will contain the true population proportion of people who will buy a new cell phone this year and about percent will not contain the true population proportion.
0.448 0.507 90 10
Do men score higher than women on average on the final exam in their statistics class? Final exam scores of twelve randomly selected male statistics students and thirteen randomly selected female statistics students are shown below. Male: 95 86 58 77 81 90 88 97 72 80 82 97 Female: 78 86 79 67 83 84 99 90 87 76 85 72 94 Assume that both populations follow a normal distribution. What can be concluded at the 0.05 level of significance? H0: μMen = μWomen Ha: μMen [response1] μWomen Test statistic: [response2] p-Value = Round your answer to three decimal places. Conclusion: There is evidence to make the conclusion that men score higher than women on average on the final exam in their statistics class.
0.451 Fail to reject the null hypothesis insufficient
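A partial sketch of the calculation behind this test, assuming the unpooled standard error sqrt(s1^2/n1 + s2^2/n2) and the conservative df = min(n1-1, n2-1) from the two-sample cards in this set. It stops at the test statistic, because turning it into a p-value needs a t-distribution table or calculator, which the Python standard library does not provide.

```python
import math
from statistics import mean, variance

male = [95, 86, 58, 77, 81, 90, 88, 97, 72, 80, 82, 97]
female = [78, 86, 79, 67, 83, 84, 99, 90, 87, 76, 85, 72, 94]

# Standard error of the difference in sample means (unpooled).
se = math.sqrt(variance(male) / len(male) + variance(female) / len(female))

# Test statistic for H0: the two population means are equal.
t_stat = (mean(male) - mean(female) - 0) / se

# Conservative degrees of freedom; calculators typically use a larger value.
df_conservative = min(len(male), len(female)) - 1

print(round(mean(male), 2), round(mean(female), 2), round(t_stat, 3), df_conservative)
```

The two sample means differ by only about half a point, so the test statistic is small and the large p-value on the card is what one would expect.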
A politician is interested in constructing a 95% confidence interval for the proportion of Americans who are in favor of legalizing marijuana. 532 of the 1008 randomly selected Americans who were surveyed were in support of legalizing marijuana. A. With 95% confidence the proportion of all Americans who are in favor of legalizing marijuana is between and . B. If many groups of 1008 randomly selected Americans are surveyed, then a different confidence interval would be produced from each group. About percent of these confidence intervals will contain the true population proportion of Americans who are in favor of legalizing marijuana and about percent will not contain the true population proportion.
0.497 0.559 95 5
You are interested in constructing a 95% confidence interval for the proportion of Americans who believe that evolution of species is false. Of the 1200 randomly selected Americans surveyed, 671 believe that evolution of species is false. A. With 95% confidence the proportion of all Americans who believe that evolution of species is false is between and . B. If many groups of 1200 randomly selected Americans were surveyed, then a different confidence interval would be produced from each group. About percent of these confidence intervals will contain the true population proportion of Americans that believe evolution of species is false and about percent will not contain the true population proportion.
0.531 0.587 95 5
A researcher is interested in constructing a 99% confidence interval for the proportion of Alzheimer patients living in nursing homes who exhibit temporary memory improvement while being visited by their loved ones. 920 of the 1356 randomly observed Alzheimer patients did show temporary memory improvement. A. With 99% confidence the proportion of all Alzheimer patients living in nursing homes who exhibit temporary memory improvement while being visited by their loved ones is between and . B. If many groups of 1356 randomly selected Alzheimer patients living in nursing homes are observed while being visited by their loved ones, then a different confidence interval would be produced for each group. About percent of these confidence intervals will contain the true population proportion of all Alzheimer patients living in nursing homes that exhibit temporary memory improvement while being visited by their loved ones and about percent will not contain the true population proportion.
0.646 0.711 99 1
Fifty part-time students were asked how many courses they were taking this term. The (incomplete) results are shown below: # of Courses Frequency Relative Frequency Cumulative Relative Frequency 1 30 0.6 2 15 3 Determine the Cumulative Relative Frequency for taking 2 classes.
0.9 There are 50 students in total, and 45 of them are taking 2 or fewer classes, so 45/50 = 0.9.
Sixty adults with gum disease were asked the number of times per week they used to floss before their diagnoses. The (incomplete) results are shown below: # Flossing per Week Frequency Relative Frequency Cumulative Relative Frequency 0 27 0.4500 1 18 3 0.9333 6 3 0.0500 7 1 0.0167 What is the cumulative relative frequency for flossing 6 times per week? Round to 4 decimal places.
0.9833 The cumulative relative frequency for 3 times per week is 0.9333. Adding the 0.0500 for 6 times per week gives a cumulative relative frequency of 0.9833.
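As a rough sketch, the whole flossing table can be rebuilt from the frequencies. The count of 11 for "3 times per week" is not printed in the table; it is filled in here as 60 - (27 + 18 + 3 + 1).

```python
from itertools import accumulate

times_per_week = [0, 1, 3, 6, 7]
frequency = [27, 18, 11, 3, 1]   # 11 is the missing count: 60 - (27 + 18 + 3 + 1)

total = sum(frequency)
relative = [f / total for f in frequency]
cumulative = list(accumulate(relative))   # running sum of relative frequencies

for x, rel, cum in zip(times_per_week, relative, cumulative):
    print(x, round(rel, 4), round(cum, 4))
```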
For a continuous random variable X which takes on any real number, we need to model it through a density function f(x) which has 2 properties:
1) f(x) ≥ 0 for all x 2) The integral from -∞ to ∞ of f(x) equals 1
How do we decide on the null hypothesis and the alternative hypothesis?
1. Adopt some belief for the moment (null hypothesis) 2. Operating under the assumption that this belief is true, you collect some data. -If the data supports the belief, you continue to operate with this mindset (fail to reject null hypothesis) -If the data supports an alternative belief, discard old belief in favor of new belief (reject null hypothesis in favor of alternative hypothesis)
Steps to testing a hypothesis
1. Create null hypothesis H0 2. Create alternative hypothesis HA 3. Draw a sample and consider it assuming the null hypothesis H0 is true. Find the mean and SD of this data and make a plot. (You use p and q instead of the hats because, if you are assuming that H0 is true, you are assuming you know the values of p and q.) -Calculate the p-value: the probability of seeing our result or something more extreme if our universe is the one described by H0 (e.g. "H0: The drug works as well as the placebo") 4. If p-value ≤ 0.05, reject the null hypothesis. If p-value > 0.05, fail to reject the null hypothesis.
two ways to use t-distribution:
1. Estimate p1-p2 using a confidence interval about p̂1 - p̂2 2. Run a hypothesis test with H0: p1-p2 = 0
Steps for Test for Independence
1. Find the expected count for each cell. This is equal to (row total)(column total)/(table total) 2. Find X^2 (same as before), X^2 = sum (Oi-Ei)^2/Ei 3. Find the P-value: look up the X^2 value on the X^2 distribution with df = (r-1)(c-1), where r = number of rows and c = number of columns (exclude the total row/column) 4. Use the P-value to draw a conclusion about the null hypothesis.
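Steps 1 and 2 above can be sketched as follows. The 2x2 table of counts is hypothetical, made up only for illustration; the P-value lookup in step 3 is left to a table or calculator.

```python
# Hypothetical observed counts (rows and columns exclude totals)
observed = [[20, 30],
            [30, 20]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
table_total = sum(row_totals)

# Step 1: expected count = (row total)(column total)/(table total)
expected = [[r * c / table_total for c in col_totals] for r in row_totals]

# Step 2: X^2 = sum over all cells of (O - E)^2 / E
chi_sq = sum((o - e) ** 2 / e
             for o_row, e_row in zip(observed, expected)
             for o, e in zip(o_row, e_row))

# Degrees of freedom for step 3: (r - 1)(c - 1)
df = (len(observed) - 1) * (len(observed[0]) - 1)
print(expected, chi_sq, df)
```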
Two cases for the X^2 distribution
1. Goodness-of-fit 2. Test of homogeneity/independence
Examples of goodness-of-fit questions
1. If we look at the birth months of National Hockey League (NHL) players, do they resemble what we might see in the larger US population? 2. If we breed a bunch of peas, do we really get the results expected from Mendel's theory of genetics?
Steps of the Goodness-of-fit test
1. You wish to compare a collection of counts to those predicted by some theory 2. Calculate the expected counts from your theory (Expected = Total population * Percentage expected for that category) 3. Calculate X^2 = sum (Oi-Ei)^2/Ei Oi = observed counts Ei = expected counts 4. Find the P-value. Look up the X^2 value on the curve X^2k-1, where k is the number of categories 5. Use the P-value to decide about H0: The observed and expected values are the same.
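A minimal sketch of steps 2 and 3, using the pea example. The 9:3:3:1 ratio is the classic Mendelian prediction; the observed counts are hypothetical, chosen only to show the arithmetic.

```python
# Hypothetical observed counts for four pea phenotypes (total 160)
observed = [95, 25, 32, 8]
# Proportions predicted by the theory (9:3:3:1)
predicted = [9 / 16, 3 / 16, 3 / 16, 1 / 16]

total = sum(observed)
expected = [total * p for p in predicted]          # step 2: expected counts
chi_sq = sum((o - e) ** 2 / e                      # step 3: X^2 statistic
             for o, e in zip(observed, expected))
df = len(observed) - 1                             # k - 1 for step 4
print(expected, round(chi_sq, 4), df)
```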
Incorrect uses of linear regression
1. failing to look at the residuals to make sure the model is reasonable 2. extrapolating without caution 3. not considering outliers carefully enough 4. building a linear model of data that isn't straight enough
p-value main points
1. P-values can indicate how incompatible the data are with a specified statistical model 2. P-values do not measure the probability that the studied hypothesis is true 3. A P-value (statistical significance) does not measure the size of an effect or the importance of a result (practical significance) 4. Scientific conclusions and business or policy decisions should not be based only on whether a P-value passes a specific threshold
P overload
1. p: the proportion of some trait in a population. It is a parameter. 2. p̂: the proportion of some trait in a sample. It is a statistic. 3. P(A): the probability of some event A happening. 4. P-value: a conditional probability. It is the probability of getting the value p̂ (or something more extreme) in a universe where the hypothesized value of p is true.
Properties of the Normal Density Curve
1. symmetric about the mean 2. empirical rule applies 3. mean=median=mode 4. area under curve =1 5. as x → ∞ or x → -∞, the curve approaches but never reaches zero
Twenty-five randomly selected students were asked the number of movies they watched the previous week. The results are as follows (# of Movies: Frequency): 0: 5, 1: 9, 2: 6, 3: 4, 4: 1. Round your answers to two decimal places. The mean is: The median is: The sample standard deviation is: The first quartile is: The third quartile is: What percent of the respondents watched at least 3 movies the previous week? % 56% of all respondents watched fewer than how many movies the previous week?
1.48 1 1.122 1 2 20 2 The first five answers should come directly from the calculator. Please watch the video if you have troubles with this. The sixth question just involves adding up those 3 and above and dividing by the total. The last one involves multiplying the percent by the total then see what number corresponds to that ranking.
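For readers without the calculator, a small sketch of the same computation: expand the frequency table into a raw list and use Python's `statistics` module. Only the mean, median, sample standard deviation and the "at least 3" percentage are computed here.

```python
from statistics import mean, median, stdev

# Frequency table from the card: value -> number of students
freq = {0: 5, 1: 9, 2: 6, 3: 4, 4: 1}

# Expand to one entry per student, e.g. five 0s, nine 1s, ...
data = [x for x, f in freq.items() for _ in range(f)]

data_mean = mean(data)
data_median = median(data)
data_sd = stdev(data)   # sample standard deviation (divides by n - 1)

# Percent of respondents who watched at least 3 movies
pct_at_least_3 = 100 * sum(f for x, f in freq.items() if x >= 3) / len(data)
print(round(data_mean, 2), data_median, round(data_sd, 3), pct_at_least_3)
```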
The histogram below shows the distribution of the number of times students from a statistics class have been to London. Without calculating, what might be the standard deviation for this data?
1.5 Since the bulk of the data is between 0 and 3, this range covers around two standard deviations: 3/2 = 1.5.
Fifty randomly selected students were asked the number of speeding tickets they have had. The results are as follows (Tickets: Frequency): 0: 8, 1: 15, 2: 21, 3: 3, 4: 2, 5: 0, 6: 1. Round your answers to two decimal places. The mean is: The median is: The sample standard deviation is: The first quartile is: The third quartile is: What percent of the respondents have had at most 2 speeding tickets? % 6% of all respondents have had at least how many speeding tickets?
1.6 2 1.16 1 2 88 4 The first five answers should come directly from the calculator. Please watch the video if you have troubles with this. The sixth question just involves adding up those 2 and below and dividing by the total. The last one involves multiplying the percent by the total then see what number corresponds to that ranking.
Sixty adults with gum disease were asked the number of times per week they used to floss before their diagnoses. The (incomplete) results are shown below: # Flossing per Week Frequency Relative Frequency Cumulative Relative Frequency 0 27 0.4500 1 18 3 0.9333 6 3 0.0500 7 1 0.0167 What percent of adults flossed 7 times per week? Round to one decimal place.
1.7 Since the relative frequency for 7 times per week is 0.0167, we convert this to a percent by multiplying by 100% to get 1.67%, which rounds to 1.7%.
Twelve teachers attended a seminar on mathematical problem solving. Their attitudes were measured before and after the seminar. A positive number change attitude indicates that a teacher's attitude toward math became more positive. The twelve change scores are as follows:3; 8; -1; 2; 0; 5; -3; 1; -1; 6; 5; -2 What is the average change score? Round your answer to two decimal places.
1.92
Twelve teachers attended a seminar on mathematical problem solving. Their attitudes were measured before and after the seminar. A positive number change attitude indicates that a teacher's attitude toward math became more positive. The twelve change scores are as follows: 3, 8, -1, 2, 0, 5, -3, 1, -1, 6, 5, -2 The mean is: (Round to two decimal places). The sample standard deviation is: (Round to two decimal places). The first quartile is: The median is: The third quartile is: Find the change score that corresponds to 3 standard deviations above the mean. (Round to the nearest whole number). If a teacher experiences a change score of 4, how many standard deviations above the mean is this score? (Round to two decimal places).
1.92 3.5 -1 1.5 5 12 0.59 These questions are all meant to be done using the TI 84 or comparable calculator. Please watch the video that is linked from the question for instructions. Only the last two questions need a specific formula. For the second to last, just take the mean and add three times the standard deviation. The last asks for the z-score = (x - mean)/(standard deviation).
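A sketch of the same answers without a calculator. The `quartiles` helper is a hypothetical name; it assumes the TI-84 convention of taking Q1 and Q3 as the medians of the lower and upper halves of the sorted data (excluding the middle value when n is odd), which is not the default of `statistics.quantiles`.

```python
from statistics import mean, median, stdev

def quartiles(values):
    """Q1, median, Q3 as medians of the lower half, all data, upper half."""
    s = sorted(values)
    half = len(s) // 2
    return median(s[:half]), median(s), median(s[-half:])

scores = [3, 8, -1, 2, 0, 5, -3, 1, -1, 6, 5, -2]

m = mean(scores)
sd = stdev(scores)                 # sample standard deviation
q1, med, q3 = quartiles(scores)

three_sd_above = m + 3 * sd        # score 3 standard deviations above the mean
z_for_4 = (4 - m) / sd             # z-score = (x - mean)/(standard deviation)
print(round(m, 2), round(sd, 2), q1, med, q3, round(three_sd_above), round(z_for_4, 2))
```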
The statistics below describe the data collected by a psychologist who surveyed single people asking how many times they went on a date last year. mu = 14, median = 11, sigma = 6.2, Q1 = 7.5, Q3 = 18, n = 200 A sample of 20 single people is taken. What is the best prediction for the number of these single people that went on fewer than 11 dates last year? 25% of all single people had more than how many dates? 25% of all single people had fewer than how many dates? What percent of all the single people had between 7.5 and 18 dates? percent. What is the population standard deviation? How many standard deviations below the mean is the first quartile? Round your answer to three decimal places.
10 18 7.5 50 6.2 1.048 Since 11 is the median, half of the sample is predicted to be less than this number. Asking about 25% more than a number is the same as 75% less, meaning the third quartile. Asking 25% less is the first quartile. Between the first and third quartile is the IQR or 50% of the data. The standard deviation is sigma. The last question asks for the z-score which is z = (x - mu)/sigma.
Suppose that you are offered the following "deal." You roll a die. If you roll a 6, you win $10. If you roll a 4 or 5, you win $5. If you roll a 1, 2, or 3, you pay $8. A. Complete the PDF Table. List the x values from largest to smallest. x P(x) _ _ _ _ _ _ B. Find the expected value. C. Interpret the expected value. D. Based on the expected value, should you play this game?
x = 10: .17, x = 5: .33, x = -8: .5. Expected value: -.65 using the rounded probabilities (the exact value is -2/3 ≈ -.67). If you play many games, your average result per game will likely be very close to this amount. No, since the expected value is negative, you would be very likely to come home with less money if you played many games.
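The expected value is the sum of each payoff times its probability. A short sketch with exact fractions, which avoids the rounding in the PDF table:

```python
from fractions import Fraction

# Payoff in dollars -> probability on a fair six-sided die
payoffs = {
    10: Fraction(1, 6),   # roll a 6
    5: Fraction(2, 6),    # roll a 4 or 5
    -8: Fraction(3, 6),   # roll a 1, 2, or 3
}

expected_value = sum(x * p for x, p in payoffs.items())
print(expected_value, float(expected_value))
```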
IQ is normally distributed with a mean of 100 and a standard deviation of 15. Suppose one individual is randomly chosen. Let X=IQ of an individual. Round all answers to two decimal places. A. X ~ N( , ) B. Find the probability that a randomly selected person's IQ is over 105. C. A school offers special services for all children in the bottom 3% for IQ scores. What is the highest IQ score a child can have and still receive special services? D. Find the Inter Quartile Range (IQR) for IQ scores. Q1: Q3: IQR:
100 15 .37 71.79 89.88 110.12 20.23
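These normal-distribution answers can be reproduced, without a calculator, using `statistics.NormalDist` from the Python standard library. This is only a sketch; the variable names are illustrative.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

p_over_105 = 1 - iq.cdf(105)    # B: area to the right of 105
cutoff = iq.inv_cdf(0.03)       # C: score with 3% of the area below it
q1 = iq.inv_cdf(0.25)           # D: first quartile
q3 = iq.inv_cdf(0.75)           #    third quartile
iqr = q3 - q1
print(round(p_over_105, 2), round(cutoff, 2), round(q1, 2), round(q3, 2), round(iqr, 2))
```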
Sixty adults with gum disease were asked the number of times per week they used to floss before their diagnoses. The (incomplete) results are shown below: # Flossing per Week Frequency Relative Frequency Cumulative Relative Frequency 0 27 0.4500 1 18 3 0.9333 6 3 0.0500 7 1 0.0167 How many adults flossed exactly 3 times per week?
11.0 Since there was a total of 60 adults in the study and 49 did not floss exactly 3 times per week, 60 - 49 = 11 flossed exactly 3 times per week.
Based on a preliminary study of 50 laboratory cats, you have found that 82% of them do not fall on their feet when dropped while intoxicated. You want to construct a 90% confidence interval for the proportion of all laboratory cats that land on their feet when dropped while intoxicated. You want a margin of error of no more than plus or minus 5 percentage points. How many additional cats must you treat?
110
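A sketch of where 110 comes from, assuming the standard sample-size formula n = z*^2 · p̂(1 - p̂) / E^2, rounded up, with the 50 cats already studied subtracted at the end. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(p_estimate, margin, confidence):
    """Smallest n whose margin of error for a proportion is at most `margin`."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z_star ** 2 * p_estimate * (1 - p_estimate) / margin ** 2)

# p̂ = 0.82 from the preliminary study (0.18 gives the same product p̂(1 - p̂))
n_needed = required_sample_size(0.82, 0.05, 0.90)
additional = n_needed - 50      # cats already in the preliminary study
print(n_needed, additional)
```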
The following are weights in pounds of a college sports team: 165, 171, 174, 180, 182, 188, 189, 192, 198, 202, 202, 225, 228, 235, 240 The mean is: (Round to the nearest whole number). The sample standard deviation is: (Round to the nearest whole number). The first quartile is: (Round to the nearest whole number). The median is: (Round to the nearest whole number). The third quartile is: (Round to the nearest whole number). Find the weight that is 2 standard deviations below the mean. (Round to the nearest whole number). A new player who is 215 pounds wants to join the team. How many standard deviations from the mean is this new player? (Round to two decimal places).
198 24 180 192 225 150 0.71 These questions are all meant to be done using the TI 84 or comparable calculator. Please watch the video that is linked from the question for instructions. Only the last question needs a specific formula. It asks for the z-score = (x - mean)/(standard deviation).
Find the sample standard deviation of the scores 0,2,2,2,2,3,5,8.
2.44949
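The population and sample versions differ only in the divisor (n versus n - 1). A quick check of the cards for the scores 0,2,2,2,2,3,5,8, using the standard-library `statistics` module:

```python
from statistics import pstdev, pvariance, stdev, variance

scores = [0, 2, 2, 2, 2, 3, 5, 8]

pop_var = pvariance(scores)    # divides the squared deviations by n
pop_sd = pstdev(scores)
samp_var = variance(scores)    # divides the squared deviations by n - 1
samp_sd = stdev(scores)
print(pop_var, round(pop_sd, 6), samp_var, round(samp_sd, 5))
```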
The number of seconds X after the minute that class ends is uniformly distributed between 0 and 60. Round all answers to two decimal places. A. X ~ U( , ) Suppose that 50 classes are clocked then the sampling distribution is B. xBar~ N( , ) C. What is the probability that the average of 50 classes will end with the second hand between 25 and 35 seconds?
30 17.32 30 2.45 .96
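A sketch of the computation behind these numbers. It uses the uniform-distribution formulas mean = (a+b)/2 and SD = (b-a)/sqrt(12), then treats the average of 50 classes as approximately normal with standard deviation SD/sqrt(n).

```python
from math import sqrt
from statistics import NormalDist

a, b, n = 0, 60, 50

mu = (a + b) / 2               # mean of Uniform(a, b)
sigma = (b - a) / sqrt(12)     # standard deviation of Uniform(a, b)
se = sigma / sqrt(n)           # standard deviation of the sample mean

x_bar = NormalDist(mu, se)
prob = x_bar.cdf(35) - x_bar.cdf(25)   # P(25 < xBar < 35)
print(mu, round(sigma, 2), round(se, 2), round(prob, 2))
```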
For the set of scores 0, 2, 6, 8, the median is ______ .
4
Find the sample variance of the scores 0,2,2,2,2,3,5,8.
6
The average amount of money that people spend at Don Mcalds fast food place is $6.50 with a standard deviation of $1.75. 45 customers are randomly selected. Round all answers to two decimal places and assume a normal distribution. A. xBar~ N( , ) B. For the 45 customers, find the probability that their average spent is less than $5.00. C. What is the probability that one randomly selected customer will spend less than $5.00?
6.5 .26 0 .2
The amount of pollutants that are found in waterways near large cities is normally distributed with mean 8.5 ppm and standard deviation 1.4 ppm. 18 randomly selected large cities are studied. Round all answers to two decimal places. A. xBar~ N( , ) B. For the 18 cities, find the probability that the average amount of pollutants is more than 9 ppm. C. What is the probability that one randomly selected city's waterway will have more than 9 ppm pollutants? D. Find the IQR for the average of 18 cities. Q1 = Q3 = IQR:
8.5 .33 .06 .36 8.28 8.72 .45
American college students have an average of 4.6 credit cards per student. Is the average less for 20-year-olds who are not in college? The data for the 18 randomly selected 20-year-olds who are not in college is shown below: 8, 4, 3, 0, 6, 2, 4, 1, 5, 5, 4, 2, 3, 4, 2, 7, 4, 0 H0: μ = 4.6 Ha: μ ___ 4.6
<
On average, Americans have lived in 3 places by the time they are 18 years old. Is this average less for college students? The 67 randomly selected college students who answered the survey question had lived in an average of 2.6 places by the time they were 18 years old. The standard deviation for the survey group was 1.4. H0: u = 3
<
Do shoppers at the mall spend less money on average the day after Thanksgiving compared to the day after Christmas? The 37 randomly surveyed shoppers on the day after Thanksgiving spent an average of $117. Their standard deviation was $29. The 35 randomly surveyed shoppers on the day after Christmas spent an average of $138. Their standard deviation was $34. What can be concluded at the 0.05 level of significance? H0: μThanksgiving = μChristmas Ha: μThanksgiving ___ μChristmas Test statistic: ___ p-Value = ___ (Round your answer to three decimal places.) Conclusion: ___ There is ___ evidence to make the conclusion that shoppers at the mall spend less money on average the day after Thanksgiving compared to the day after Christmas.
< t 0.003 Reject the null hypothesis sufficient
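A sketch of the test statistic for this card, computed from the summary statistics. It assumes the unpooled (Welch) two-sample t statistic and its approximate degrees of freedom; whether the course's calculator procedure pools the variances is not stated. The p-value itself still comes from a t table or calculator.

```python
from math import sqrt

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unpooled two-sample t statistic and Welch's approximate df."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Thanksgiving: mean 117, SD 29, n 37; Christmas: mean 138, SD 34, n 35
t_stat, df = welch_t(117, 29, 37, 138, 34, 35)
print(round(t_stat, 2), round(df, 1))
```

A large negative t is what we would expect if Thanksgiving shoppers spend less, which agrees with the small p-value on the card.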
Do left handed starting pitchers pitch fewer innings per game on average than right handed starting pitchers? Thirteen randomly selected left handed starting pitchers' games and fourteen randomly selected right handed pitchers' games were looked at. The table below shows the results. Left: 7 4 5 6 6 8 2 5 7 5 8 4 2 Right: 8 6 8 9 7 9 9 8 4 5 7 9 3 6 Assume that both populations follow a normal distribution. What can be concluded at the 0.05 level of significance? H0: μleft = μright Ha: μleft ___ μright Test statistic: ___ p-Value = ___ (Round your answer to three decimal places.) Conclusion: ___ There is ___ evidence to make the conclusion that left handed starting pitchers pitch fewer innings per game on average than right handed starting pitchers.
< t 0.017 reject the null hypothesis sufficient
Is there a difference between the average amount of money each shopper at the mall spends the day after Thanksgiving vs. the day after Christmas? The 40 randomly surveyed shoppers on the day after Thanksgiving spent an average of $123. Their standard deviation was $32. The 33 randomly surveyed shoppers on the day after Christmas spent an average of $141. Their standard deviation was $39. What can be concluded at the 0.01 level of significance? H0: μThanksgiving = μChristmas Ha: μThanksgiving ___ μChristmas Test statistic: ___ p-Value = ___ (Round your answer to three decimal places.) Conclusion: ___ There is ___ evidence to make the conclusion that there is a difference in the average amount of money each shopper at the mall spends the day after Thanksgiving vs. the day after Christmas.
≠ t 0.038 Fail to reject the null hypothesis insufficient
If an event can never occur, then P(A)
=0
If an event must occur, then P(A)
=1
American college students have an average of 4.6 credit cards per student. Is the average more for 20-year-olds who are not in college? The data for the 19 randomly selected 20-year-olds who are not in college is shown below: 8, 4, 3, 0, 6, 2, 4, 11, 5, 5, 4, 2, 3 ,4, 2, 7, 4, 9, 7 H0: u = 4.6
>
The average amount of time it takes for couples to further communicate with each other after their first date has ended is 1.5 days. The standard deviation is 0.6 days. Is this average longer for blind dates? A researcher interviewed 34 couples who had recently been on blind dates and found that they averaged 1.65 days to communicate with each other after the date was over. H0: u = 1.5 Ha: u ___ 1.5
>
Do shoppers at the mall spend more money on average the day after Thanksgiving compared to the day after Christmas? The 38 randomly surveyed shoppers on the day after Thanksgiving spent an average of $133. Their standard deviation was $36. The 43 randomly surveyed shoppers on the day after Christmas spent an average of $126. Their standard deviation was $17. What can be concluded at the 0.05 level of significance? H0: μThanksgiving = μChristmas Ha: μThanksgiving ___ μChristmas Test statistic: ___ p-Value = ___ (Round your answer to three decimal places.) Conclusion: ___ There is ___ evidence to make the conclusion that shoppers at the mall spend more money on average the day after Thanksgiving compared to the day after Christmas.
> t 0.139 Fail to reject the null hypothesis insufficient
Do left handed starting pitchers pitch more innings per game on average than right handed starting pitchers? Fourteen randomly selected left handed starting pitchers' games and fourteen randomly selected right handed pitchers' games were looked at. The table below shows the results. Left: 7 8 5 6 6 8 9 5 7 5 8 4 8 9 Right: 1 6 8 4 7 3 9 8 4 5 7 2 3 6 Assume that both populations follow a normal distribution. What can be concluded at the 0.05 level of significance? H0: μleft = μright Ha: μleft ___ μright Test statistic: ___ p-Value = ___ (Round your answer to three decimal places.) Conclusion: ___ There is ___ evidence to make the conclusion that left handed starting pitchers pitch more innings per game on average than right handed starting pitchers.
> t 0.029 Reject the null hypothesis sufficient
Is the average time to complete an obstacle course faster when a patch is placed over the right eye than when a patch is placed over the left eye? Thirteen randomly selected volunteers first completed an obstacle course with a patch over one eye and then completed an equally difficult obstacle course with a patch over the other eye. The completion times are shown below. "Left" means the patch was placed over the left eye and "Right" means the patch was placed over the right eye. Left: 49 45 51 37 40 38 47 45 58 41 49 48 39 Right: 41 42 45 43 41 40 47 46 55 42 44 43 34 Assume the distribution of the differences is normal. What can be concluded at the 0.10 level of significance? (d = speed right - speed left) H0: μd = 0 Ha: μd ___ 0 Test statistic: ___ p-Value = ___ (Round your answer to three decimal places.) Conclusion: ___ There is ___ evidence to make the conclusion that the population mean time to complete the obstacle course with a patch over the right eye is greater than the population mean time to complete the obstacle course with a patch over the left eye.
> t 0.061 Reject the null hypothesis sufficient
Is memory ability before a meal worse than after a meal? Twelve people were given memory tests before their meal and then again after their meal. The data is shown below. A higher score indicates a better memory ability. Before: 74 68 82 97 76 81 80 75 88 84 79 91 After: 76 68 85 94 79 88 83 72 90 87 79 90 Assume the distribution of the differences is normal. What can be concluded at the 0.05 level of significance? (d = score before - score after) H0: μd = 0 Ha: μd ___ 0 Test statistic: ___ p-Value = ___ (Round your answer to three decimal places.) Conclusion: ___ There is ___ evidence to make the conclusion that the population mean memory ability before a meal is worse than the population mean memory ability after a meal.
< t 0.068 Fail to reject the null hypothesis insufficient
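For a paired test like the memory card, the work is done on the differences. A minimal sketch, with d = score before - score after as defined in the question; the p-value for the resulting t (df = n - 1 = 11) still comes from a table or calculator.

```python
from math import sqrt
from statistics import mean, stdev

before = [74, 68, 82, 97, 76, 81, 80, 75, 88, 84, 79, 91]
after = [76, 68, 85, 94, 79, 88, 83, 72, 90, 87, 79, 90]

# One difference per person: d = before - after
d = [b - a for b, a in zip(before, after)]

d_bar = mean(d)                             # mean difference
s_d = stdev(d)                              # sample SD of the differences
t_stat = d_bar / (s_d / sqrt(len(d)))       # t = d_bar / (s_d / sqrt(n))
print(round(d_bar, 3), round(s_d, 3), round(t_stat, 2))
```

The mean difference is negative, meaning the "before" scores in this sample are lower on average than the "after" scores.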
Does the average Presbyterian donate more than the average Catholic in church on Sundays? The 35 randomly selected members of the Presbyterian church donated an average of $28 with a standard deviation of $14. The 44 randomly selected members of the Catholic church donated an average of $24 with a standard deviation of $12. What can be concluded at the 0.10 level of significance? H0: μPres. = μCatholic Ha: μPres. ___ μCatholic Test statistic: ___ p-Value = ___ (Round your answer to three decimal places.) Conclusion: ___ There is ___ evidence to make the conclusion that the average Presbyterian donates more than the average Catholic in church on Sundays.
> t 0.092 Reject the null hypothesis sufficient
Members of fraternities and sororities are required to volunteer for community service. Do fraternity brothers work more volunteer hours on average than sorority sisters? The data below show the number of volunteer hours worked for 13 randomly selected fraternity brothers and 13 randomly selected sorority sisters. Frat: 16 12 5 24 32 9 17 11 5 8 14 5 10 Sor: 8 11 7 19 7 3 6 18 8 6 10 24 16 Assume that both populations follow a normal distribution. What can be concluded at the 0.01 level of significance? H0: μFrat = μSor Ha: μFrat ___ μSor Test statistic: ___ p-Value = ___ (Round your answer to three decimal places.) Conclusion: ___ There is ___ evidence to make the conclusion that fraternity brothers work more volunteer hours on average than sorority sisters.
> t 0.250 Fail to Reject the null hypothesis insufficient
On average is the younger sibling's IQ higher than the older sibling's IQ? Eleven sibling pairs were given IQ tests. The data is shown below. Younger: 104 96 102 125 86 100 90 117 102 110 81 Older: 107 87 99 121 90 96 94 117 108 114 72 Assume the distribution of the differences is normal. What can be concluded at the 0.05 level of significance? (d = Younger Sibling IQ - Older Sibling IQ) H0: μd = 0 Ha: μd ___ 0 Test statistic: ___ p-Value = ___ (Round your answer to three decimal places.) Conclusion: ___ There is ___ evidence to make the conclusion that the population mean IQ score for younger siblings is higher than the population mean IQ score for older siblings.
> t 0.332 Fail to reject the null hypothesis insufficient
You are interested in constructing a 95% confidence interval for the proportion of all pregnant women who regularly drink caffeinated beverages. Of the 1000 randomly selected pregnant women, 271 regularly drank caffeinated beverages. A. With 95% confidence the proportion of all pregnant women who regularly drink caffeinated beverages is between and . B. If many groups of 1000 randomly selected pregnant women were surveyed, then a different confidence interval would be produced from each group. About percent of these confidence intervals will contain the true population proportion of pregnant women who regularly drink caffeinated beverages and about percent will not contain the true population proportion.
0.243 0.299 95 5
Convenience Sample Bias
A form of bad sampling frame: the easiest sample to take is not representative of the population. e.g. You work at Facebook and survey 5000 people on whether they love FB. This is convenience sample bias because you are likely friends with your coworkers, who also work at Facebook and are more likely to either love it or hate it (depending on how working there affects them).
Correlation (R or r)
A statistic that measures strength and direction of a linear association between two quantitative variables where no outliers are present.
Continuous
A variable is continuous if it is not discrete.
Lurking variable
A variable not x or y that causes a change in either x or y.
Random Variable
A variable that has a single numerical value that is determined by the chance of an outcome of an experiment.
What does a z-score measure?
A z-score is the number of standard deviations from the mean a data point is.
Consider the three histograms below that have the same range. Order them by standard deviation with lowest standard deviation first, then the middle standard deviation, and finally the highest standard deviation.
A, C, B
Visualize the probability table on a graph
An outcome is more likely if there is more area in the bar for that value on the graph. We also know that the sum of the areas of the bars must be 1. Heights must be at least 0 (no negative bars).
Probability density function (pdf)
Area under graph =1
The Law of Large Numbers
As the number of trials in a probability experiment increases, the difference between the theoretical probability of an event and the relative frequency probability approaches zero.
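A small simulation sketch of this idea for a fair coin, where the theoretical probability of heads is 0.5. The seed and the numbers of flips are arbitrary choices, and a simulation only illustrates the law; it does not prove it.

```python
import random

rng = random.Random(11)   # fixed seed so the run is repeatable

def heads_frequency(n_flips):
    """Relative frequency of heads in n_flips simulated fair coin flips."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# The gap between relative frequency and 0.5 tends to shrink as n grows
for n in (10, 1_000, 100_000):
    freq = heads_frequency(n)
    print(n, freq, abs(freq - 0.5))

large_run = heads_frequency(100_000)
```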
Exponential Distribution
Asks about a continuous idea, usually related to time: f(x) = λe^(-λx) when x ≥ 0, and f(x) = 0 otherwise
Hypothesis Testing
Based on sample evidence, a procedure for determining whether the hypothesis stated is a reasonable statement and should not be rejected, or is unreasonable and should be rejected.
Calculating area under a distribution using a z-score table
Calculate the z-score, then find the % value on the table corresponding to your calculated z-score value.
Subjective probability
Consider a number of factors important to the situation, personally decide how important they are, and use these to come up with an answer. e.g. I have a 60% chance of getting an A because I do all readings and HW, come to classes, and am an A/B student in other math classes.
Correlation vs causation thinking
Correlation thinking: weight and height are correlated, so heavier people tend to be taller Causation thinking: (wrong) weighing more causes you to become taller
Categorical/Qualitative Data
Data that falls into categories or labels; often text ideas; tend not to have units
Adding constants to random variables
E(X±c) = E(X) ± c Var(X±c) = Var(X) SD(X±c) = SD(X)
If x = Uniform(a,b):
E(x) = (a+b)/2 Var(x) = [(b-a)^2]/12 SD(x) = sqrt(Var(x)) = (b-a)/sqrt(12)
A discrete probability distribution function has two characteristics:
Each probability is between zero and one, inclusive. The sum of the probabilities is one.
Disjoint events
Events A and B are disjoint if they share no common outcomes ex: A: rolling an even number on a die B: rolling a 5 on a die
Independence
Events A and B are independent if event A occurring has no effect on the probability of B occurring, and vice versa.
Population
Everything you want to study e.g. A huge pot of soup
Type II Error
Failing to reject the null hypothesis when it is false. β
A larger sample size will result in a smaller sample standard deviation.
False
Convenience sampling was used to ask 20 students how much money they spent on books this quarter. The standard deviation was found to be $35. If a larger sample with a more scientific sampling technique is used then the standard deviation of the new sample will go down.
False
For any sample of size 100, the population mean, the mean of the sampling distribution, and the sample mean will always be equal to each other.
False
If a distribution is normal, then it is not possible to randomly select a value that is more than 4 standard deviations from the mean.
False
If a researcher wants to have a distribution of a sample that is approximately normal then that researcher should collect a sample with sample size greater than 30.
False
If the data are quantitative and there are more than 30 numbers in the data set, then the distribution of this data will always be approximately normally distributed.
False
If the distribution of the population has a nonzero standard deviation and mean 10 then the probability that an individual data value will be greater than 11 is always less than the probability that the mean of 25 randomly selected data values will be greater than 11.
False
If the profit on a raffle ticket has an expected value of -5 dollars, then the most likely outcome of purchasing a raffle ticket is a net loss of $5.
False
If two six sided dice are rolled, then the sum of the dice is an example of a continuous random variable.
False
If x is a random variable with a general normal distribution and if a is a positive number and if P(x > a) = 0.24, then P(x < -a) is also 0.24.
False
If x represents a random variable coming from a normal distribution and P(x > 15.7) = 0.04, then P(x < -15.7) = 0.04.
False
Ten cards are selected out of a 52 card deck without replacement and the number of Jacks is observed. This is an example of a Binomial Experiment.
False
The symbol U^p represents the proportion of a sample of size n.
False
Variance is the square root of standard deviation.
False
If the expected value for a five dollar raffle ticket is 0.85, then there is an 85% chance that the ticket will win.
False, we can only say that if many raffle tickets are purchased then the average return is likely to be $0.85. Notice that this is a dollar amount, not a probability.
Claim 2: 65% of UCSD students are FB users
False. Population parameter may not match the sample statistic
Percent variance explained (R^2 or r^2)
For a given linear model, r^2 (the correlation coefficient squared) is the proportion of the variation in the y-variable that is accounted for (or explained) by the variation in the x-variable
Central Limit Theorem
Given a random variable (RV) with known mean μ and known standard deviation σ, we are sampling with size n, and we are interested in two new RVs: the sample mean and the sample sum. If the size (n) of the sample is sufficiently large, then the distribution of the sample means and the distribution of the sample sums will approximate a normal distribution regardless of the shape of the population. The mean of the sample means will equal the population mean, and the mean of the sample sums will equal n times the population mean. The standard deviation of the distribution of the sample means, σ/√n, is called the standard error of the mean.
Sampling Distribution
Given simple random samples of size n from a given population with a measured characteristic such as mean, proportion, or standard deviation for each sample, the probability distribution of all the measured characteristics is called a sampling distribution.
Goodness-of-fit
Goodness of Fit Test: 1 population (NHL players, peas) split across a categorical variable (birth month, phenotype). You should have a 1-dimensional table of counts.
A venture capitalist, willing to invest $1,000,000, has three investments to choose from. The first investment, a software company, has a 20% chance of returning $5,000,000 profit, a 30% chance of returning $1,000,000 profit, and a 50% chance of losing the million dollars. The second company, a hardware company, has a 10% chance of returning $3,000,000 profit, a 30% chance of returning $1,000,000 profit, and a 60% chance of losing the million dollars. The third company, a biotech firm, has a 10% chance of returning $6,000,000 profit, a 80% of no profit or loss, and a 10% chance of losing the million dollars. Order the expected values from smallest to largest.
Hardware, Biotech, Software Software Company: 5,000,000 x 0.2 + 1,000,000 x 0.3 + -1,000,000 x 0.5 = 800,000 Hardware Company: 3,000,000 x 0.1 + 1,000,000 x 0.3 + -1,000,000 x 0.6 = 0 Biotech Firm: 6,000,000 x 0.1 + 0 x 0.8 + -1,000,000 x 0.1 = 500,000
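The arithmetic above can be checked with a short script. This is a minimal sketch; the helper name expected_value is made up for illustration, and the dollar amounts and probabilities are the ones stated in the question.

```python
def expected_value(outcomes):
    """Sum of value * probability over (value, probability) pairs."""
    total_p = sum(p for _, p in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(v * p for v, p in outcomes)

investments = {
    "Software": [(5_000_000, 0.2), (1_000_000, 0.3), (-1_000_000, 0.5)],
    "Hardware": [(3_000_000, 0.1), (1_000_000, 0.3), (-1_000_000, 0.6)],
    "Biotech": [(6_000_000, 0.1), (0, 0.8), (-1_000_000, 0.1)],
}

expected = {name: expected_value(o) for name, o in investments.items()}
ranking = sorted(expected, key=expected.get)  # smallest expected value first
print(ranking)
```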
Discrete random variable
Has only a) finitely-many outcomes (e.g. X is time of DMV service) or b) space between the values (e.g. Y is the number of meteors that have hit a planet)
From the SEpi equation: what does (xf - xhat)^2 tell us?
How far the individual is from the center of all the individuals we used to build our model. As we move far away from the core of our data, we should be more worried.
The expected value of the geometric model answers
How many trials are needed to get the first success, on average
From the SEpi equation: what does [SE(b1)]^2 tell us?
How unsure we are about the real slope of the regression line.
If you have two quantitative variables, you can measure the strength of an association using a correlation coefficient.
If you have two qualitative (categorical) variables, you can use a chi-squared test for the significance of an association.
Approximation rule
In ANY data set that is normally distributed: -About 68% of the data values are within 1 SD of the mean -About 95% of the data values are within 2 SDs of the mean -About 99.7% of the data values are within 3 SDs of the mean
Chapter Review
In a population whose distribution may be known or unknown, if the size (n) of samples is sufficiently large, the distribution of the sample means will be approximately normal. The mean of the sample means will equal the population mean. The standard deviation of the distribution of the sample means, called the standard error of the mean, is equal to the population standard deviation divided by the square root of the sample size (n).
Common question for the Poisson distribution
In general, <some behavior> is average. How likely am I to see <some specific behavior>? e.g. You have 12.5 emails per day and X% are spam. How likely are you to see 5 spam emails in a day? e.g. There is an average of 2.5 goals scored in each soccer game. How likely is it for a game to have 9 goals?
Assumptions made for statistical inference when using the t-distribution
Independence of data: Randomization condition, <10% condition. Population distribution must be nearly normal: -look for near-normality in histogram of your sample -More skew is OK as n gets larger
Which is more accurate: interpolation or extrapolation?
Interpolation is more accurate because the pattern you built applies to the data within range
Determine the level of measurement for the following variable: Longitude
Interval
Determine the level of measurement for the following variable: Temperature
Interval
The density graph is NOT P(X)
It is a function that helps you figure out probabilities by examining the area under it. Its shape suggests what values are more likely (relatively) but the probability of any particular outcome occurring is still 0
For smaller sample sizes (n<30) or populations where you don't know σ (and must approximate using sx), there is a better approximation of the sampling distribution than the normal model
It is called the t-distribution
How do you increase the power of a test?
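Raise the significance level (α), which makes H0 easier to reject but increases the risk of a Type I error. Increasing the sample size also raises power.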
Margin of Error
ME = z*(sqrt(p̂*q̂ / n)) ME = z*(SE)
The average weight of a newborn German Shepherd is 485 g with a standard deviation of 57 g. The average weight of a Labrador Retriever is 282 g with a standard deviation of 21 g. The average weight of a Poodle is 176 grams with a standard deviation of 8 g. Rover the German Shepherd was born at 400 g, Max the Labrador Retriever was born at 240 g, and Fluffy the Poodle was born at 184 g. Which of these three puppies is the smallest relative to its breed?
Max the Labrador Retriever Rover the German Shepherd is 1.5 standard deviations below the mean for its breed, Max the Labrador Retriever is 2 standard deviations below the mean for its breed, and Fluffy the Poodle is 1 standard deviation above its mean. Hence Max the Labrador Retriever is the smallest relative to his breed.
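The comparison comes down to one z-score per puppy. A minimal sketch, using the weights, means, and standard deviations from the question (the function name z_score is just an illustrative choice):

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

# (birth weight, breed mean, breed standard deviation), all in grams
puppies = {
    "Rover": (400, 485, 57),
    "Max": (240, 282, 21),
    "Fluffy": (184, 176, 8),
}

scores = {name: z_score(*values) for name, values in puppies.items()}
smallest = min(scores, key=scores.get)  # most negative z-score
print(scores, smallest)
```

Rover's z-score is about -1.49, which the answer rounds to 1.5 standard deviations below the mean.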
Determine whether the following is an example of a sampling error or a non sampling error. A sociologist surveyed 300 people about their level of anxiety on a scale of 1 to 100. Unfortunately, the person inputting the data into the computer accidentally transposed six of the numbers causing the statistics to have errors.
Non Sampling Error
You are interested in finding a 95% confidence interval for the mean number of units students take at your college. The standard deviation for all US college students' units taken is 3.2. Suppose you survey 64 students at your college and find that they averaged 14.3 units with a standard deviation of 2.9 units. Round your answers to two decimal places. A. The sampling distribution follows a distribution. B. With 95% confidence the mean number of units taken by all students at your college is between and . C. If many groups of 64 randomly selected students were surveyed, then a different confidence interval would be produced from each group. About percent of these confidence intervals will contain the true population mean number of units taken by students at your college and about percent will not contain the true population mean number of units taken by students at your college.
Normal 13.52 15.08 95 5
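Because the population standard deviation is known, the interval uses the normal model: sample mean ± z* × σ/√n. The sketch below reproduces the answer for this question; z_interval is a hypothetical helper name, and the same function can be reused for the similar questions that follow.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence):
    """Confidence interval for a mean when the population sigma is known."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value
    margin = z_star * sigma / sqrt(n)                        # margin of error
    return xbar - margin, xbar + margin

low, high = z_interval(xbar=14.3, sigma=3.2, n=64, confidence=0.95)
print(round(low, 2), round(high, 2))
```

Note that the sample standard deviation (2.9) is not used, since σ = 3.2 is given.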
You are interested in finding a 90% confidence interval for the average number of miles that Americans drove last year. Based on historical records, you know that the standard deviation for annual miles driven is 3518 miles. Suppose you survey 92 randomly selected Americans and find that they averaged 14,289 miles with a standard deviation of 3096 miles. Round your answers to the nearest whole number. A. The sampling distribution follows a distribution. B. With 90% confidence the mean number of miles driven by all Americans last year is between and . C. If many groups of 92 randomly selected Americans were surveyed, then a different confidence interval would be produced from each group. About percent of these confidence intervals will contain the true population mean number of miles driven by Americans and about percent will not contain the true population mean number of miles driven by Americans.
Normal 13686 14892 90 10
You are interested in finding a 90% confidence interval for the average number of names that people can correctly recall after being introduced to 40 people at a party. Based on past research, you know that the standard deviation for this number is 6.34 names. Suppose you surveyed 55 randomly selected people and found that they correctly remembered an average of 18.16 names. The standard deviation for this group was 4.28 names. Round your answers to two decimal places. A. The sampling distribution follows a distribution. B. With 90% confidence the mean number of names that all people can remember when introduced to 40 people at a party is between and . C. If many groups of 55 randomly selected people were surveyed, then a different confidence interval would be produced for each group. About percent of these confidence intervals will contain the true population mean number of names remembered and about percent will not contain the true population mean number of names remembered.
Normal 16.75 19.57 90 10
Based on hospital records the standard deviation for the recovery time after ACL surgery is 8 days. You are testing out a new ACL surgery and want to find a 99% confidence interval for the mean recovery time. You perform ACL surgery with this new technique on 38 randomly selected patients and find that they averaged 36 days for recovery and had a standard deviation of 7 days. Round your answers to two decimal places. A. The sampling distribution follows a distribution. B. With 99% confidence the mean recovery time for all patients who will receive this surgery is between and . C. If many groups of 38 randomly selected Americans were surveyed, then a different confidence interval would be produced from each group. About percent of these confidence intervals will contain the true population mean number of days for recovery and about percent will not contain the true population mean number of days for recovery.
Normal 32.66 39.34 99 1
The standard deviation for time to graduate from the university is 0.7 years. You are interested in finding a 95% confidence interval for the average time business majors take to graduate. You survey 45 recent graduates from the business department and find that they averaged 4.8 years and had a standard deviation of 0.9 years. Round your answers to two decimal places. A. The sampling distribution follows a distribution. B. With 95% confidence the mean time for business majors to graduate is between and . C. If many groups of 45 randomly selected recent graduates from the business department are surveyed, then a different confidence interval would be produced from each group. About percent of these confidence intervals will contain the true population mean time to graduate and about percent will not contain the true population mean time to graduate.
Normal 4.60 5.00 95 5
The standard deviation for car battery lifetime is known to be 1.2 years. You are interested in finding a 95% confidence interval for the mean lifetime of batteries when the car is driven primarily near the ocean. You test 40 cars and find that their batteries' average lifetime is 5.3 years and the standard deviation is 1.7 years. Round your answers to two decimal places. A. The sampling distribution follows a distribution. B. With 95% confidence the mean battery life for all cars that are driven primarily near the beach is between and . C. If many groups of 40 randomly selected cars that are primarily driven near the beach are tested, then a different confidence interval would be produced from each group. About percent of these confidence intervals will contain the true population mean battery lifetime and about percent will not contain the true population mean battery lifetime.
Normal 4.93 5.67 95 5
You are interested in finding a 95% confidence interval for the average number of steps that people take walking each day. Based on past research, you know that the standard deviation for this number is 1239 steps. Suppose you asked 67 randomly selected people to wear a pedometer for one day and found that they averaged 5293 steps. The standard deviation for this group was 1431 steps. Round your answers to the nearest whole number. A. The sampling distribution follows a distribution. B. With 95% confidence the mean number of steps that people take each day is between and . C. If many groups of 67 randomly selected people were observed, then a different confidence interval would be produced for each group. About percent of these confidence intervals will contain the true population mean number of steps per day and about percent will not contain the true population mean number of steps taken per day.
Normal 4996 5590 95 5
The standard deviation for the amount of money people spend at the mall is known to be $23. You are interested in finding a 90% confidence interval for the average amount of money that people spend at the mall on Valentines Day. You survey 67 shoppers on Valentines Day as they leave the mall and find that they spent an average of $58 and had a standard deviation of $25. Round your answers to two decimal places. A. The sampling distribution follows a distribution. B. With 90% confidence the mean amount of money shoppers spend on Valentines Day is between and . C. If many groups of 67 randomly selected Valentines Day shoppers are surveyed, then a different confidence interval would be produced from each group. About percent of these confidence intervals will contain the true population mean amount spent and about percent will not contain the true population mean amount spent.
Normal 53.38 62.62 90 10
Formula Review
Normal Distribution: X ~ N(µ, σ) where µ is the mean and σ is the standard deviation. Standard Normal Distribution: Z ~ N(0, 1). Calculator function for probability: normalcdf (lower x value of the area, upper x value of the area, mean, standard deviation) Calculator function for the kth percentile: k = invNorm (area to the left of k, mean, standard deviation)
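For anyone without the calculator at hand, rough equivalents of normalcdf and invNorm can be written with Python's statistics.NormalDist. The wrapper names below simply mirror the calculator functions and are not part of any library.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Area under the normal curve between lower and upper."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)

def inv_norm(area_left, mean=0.0, sd=1.0):
    """Value k such that the area to the left of k equals area_left."""
    return NormalDist(mean, sd).inv_cdf(area_left)

middle_area = normalcdf(-1.96, 1.96)  # close to 0.95
k = inv_norm(0.975)                   # close to 1.96
print(middle_area, k)
```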
Assessing Normality
Normal probability plots If sample data is taken from a population that is normally distributed, a normal probability plot of the actual values versus the expected Z-scores will be approximately linear.
Currently patrons at the library speak at an average of 63.2 decibels and the standard deviation is 4 decibels. Will this average increase after a "keep your voices down" sign is removed from the front entrance? After the sign was removed, the librarian randomly recorded 41 patrons speaking at the library. Their average decibel level was 64.1. H0: u = 63.2 Ha: u < 64
Not Equal
The average final exam score for the statistics course is 77% and the standard deviation is 8%. A professor wants to see if there will be a difference in the average final exam score for students who are given colored pens on the first day of class. The final exam scores for the 18 randomly selected students who were given the colored pens are shown below. Assume that the distribution of the population is normal. 75, 88, 84, 68, 96, 72, 81, 97, 77, 79, 85, 81, 52, 80, 98, 83, 78, 90 What can be concluded at the 0.05 level of significance? H0: u = 77
Not Equal
Density function
Only area under the graph is linked to probability.
Determine the level of measurement for the following variable: Survey responses to questions about the customer service rated as "excellent," "good," "satisfactory," or "unsatisfactory."
Ordinal
P(A|B)
P(A and B)/P(B)
Losing Disjointness: P(A or B) =
P(A) + P(B) - P(A and B)
If all the outcomes in a sample space are equally likely, we define the probability of an event A to be
P(A) = (# of outcomes in event A)/(# of outcomes in the sample space) where 0 <= P(A) <= 1
Complement rule
P(A) = 1 - P(A^c)
In general, P(A and B) =
P(A|B) * P(B)
Advanced Bayes' Theorem (for when P(B) is not known)
P(A|B) = [P(B|A)*P(A)] / [P(B|A)*P(A) + P(B|A^c)*P(A^c)]
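When P(B) is not known, the denominator is P(B) expanded by the law of total probability: P(B|A)P(A) + P(B|A^c)P(A^c). The sketch below applies this with made-up numbers (a 1% prior, a 90% true-positive rate, and a 5% false-positive rate), chosen only for illustration.

```python
def bayes_posterior(p_a, p_b_given_a, p_b_given_not_a):
    """P(A|B), with P(B) expanded by the law of total probability."""
    p_not_a = 1 - p_a
    p_b = p_b_given_a * p_a + p_b_given_not_a * p_not_a
    return p_b_given_a * p_a / p_b

# Hypothetical inputs: P(A) = 0.01, P(B|A) = 0.90, P(B|A^c) = 0.05
posterior = bayes_posterior(0.01, 0.90, 0.05)
print(posterior)
```

With these assumed inputs the posterior is about 0.154, far below P(B|A) = 0.90, because A is rare to begin with.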
P(making a Type I error) =
P(reject H0|H0 is true) = alpha
The Poisson Distribution
P(x) = (λ^x)(e^-λ) / (x!) λ = average value x = value whose probability you are trying to predict E(x) = λ SD(x) = sqrt(λ)
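As a quick check of the formula, the sketch below evaluates it for an average of 2.5 goals per game and asks how likely a 9-goal game is. The function name poisson_pmf is an illustrative choice.

```python
from math import exp, factorial, sqrt

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with average lam."""
    return lam**x * exp(-lam) / factorial(x)

lam = 2.5                       # average goals per game
p_nine = poisson_pmf(9, lam)    # probability of exactly 9 goals
sd = sqrt(lam)                  # SD(x) = sqrt(lambda)
print(p_nine, sd)
```

The probability comes out to a little under 0.001, so a 9-goal game would be very unusual under this model.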
continuous probability distributions
PROBABILITY = AREA
A New York student, Alma, scored 135 points on her standardized state test that had a mean score of 115 and a standard deviation of 10. A Kansas student, Peter, scored 112 points on his standardized state test that had a mean score of 94 and a standard deviation of 6. A New Mexico student, Grace, scored 173 points on her standardized state test that had a mean score of 181 and a standard deviation of 8 points. Which student did the best relative to the rest of his or her state?
Peter. Alma's score is 2 standard deviations above the mean for her state, Peter's score is 3 standard deviations above the mean for his state, and Grace's score is 1 standard deviation below the mean for her state. Peter's score is best relative to his state.
Political pollsters may be interested in the proportion of people that will vote for a particular cause. Match the vocabulary word with its corresponding example. All the voters in the district The 750 voters who participated in the survey The proportion of all voters from the district who will vote for the cause The proportion of the 750 survey participants who will vote for the cause The answer "Yes" or "No" to the survey question The list of 750 "Yes" or "No" answers to the survey question
Population Sample Parameter Statistic Variable Data
A rancher is interested in the average age that a cow begins producing milk. Match the vocabulary word with its corresponding example. All milk cows The 62 milk cows that were observed by the rancher The average age that all milk cows are when they first produce milk The average age for the 62 observed milk cows as they first produced milk The age when a milk cow first produced milk The list of the 62 ages
Population Sample Parameter Statistic Variable Data
prediction interval vs confidence interval
Prediction interval: the range in which a future observation for [a single person] is expected to fall. Confidence interval: the range in which the mean response for [the average of all people like that person] is expected to fall.
Formula Review
Probability density function (pdf) f(x): f(x) ≥ 0 The total area under the curve f(x) is one. Cumulative distribution function (cdf): P(X ≤ x)
The Central Limit Theorem (CLT)
States that the sampling distribution for a proportion statistic or mean statistic will be approximately normal, regardless of the population distribution (assuming we have met the 2 conditions: Independence and Nearly Normal)
Q1, Q2, and Q3
Q1: median in the first (lower) half of the data Q2 (median): median of the whole distribution Q3: median in the second (upper) half of the data
Bernoulli trial
A random trial with exactly 2 mutually exclusive outcomes. P(x) = p if x = success, or 1 - p = q if x = failure
Confidence Interval
Range of values around a point estimate that convey our uncertainty about the population parameter (as well as a range of plausible values for it)
Type I Error
Rejecting the null hypothesis when it is actually true
Standard Deviation (σ)
SD(X) = sqrt(Var(X))
Systematic Sampling
Sample elements are selected from a list or from sequential files e.g. Asking every 10th person you see
Bad Sample Frame Bias
Sample is not representative of population. e.g. Want to determine if people in US like facebook. Study facebook users in US. You completely underrepresent people who don't use facebook. Maybe they don't use facebook because they hate it!
Determine whether the following is an example of a sampling error or a non sampling error. 12% of all people are left handed. A researcher randomly selected 200 people and found that 16% of them were left handed. No mistakes were made in the data collection or data recording. The 4% difference is due to ...
Sampling Error
Consider the boxplot below. Box Plot with five Point Summary: 3,8,10,20,38 What quarter has the smallest spread of data? What is that spread? What quarter has the largest spread of data? What is that spread? Find the Inter Quartile Range (IQR): Which interval has the most data in it? What value could represent the 53rd percentile?
Second 2 Fourth 18 12 3-10 11
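The spread of each quarter is the gap between neighbouring values in the five-number summary, and the IQR is Q3 - Q1. A minimal sketch using the summary from this question (quarter_spreads is a hypothetical helper name):

```python
def quarter_spreads(five_number_summary):
    """Width of each quarter: differences between consecutive summary values."""
    s = five_number_summary
    return [b - a for a, b in zip(s, s[1:])]

summary = [3, 8, 10, 20, 38]          # min, Q1, median, Q3, max
spreads = quarter_spreads(summary)    # first, second, third, fourth quarter
iqr = summary[3] - summary[1]         # Q3 - Q1

smallest_quarter = spreads.index(min(spreads)) + 1
largest_quarter = spreads.index(max(spreads)) + 1
print(spreads, iqr, smallest_quarter, largest_quarter)
```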
Consider the boxplot below. boxplot with five point summary: 24,27,29,36,42 What quarter has the smallest spread of data? What is that spread? What quarter has the largest spread of data? What is that spread? Find the Inter Quartile Range (IQR): Which interval has the most data in it? What value could represent the 55th percentile?
Second 2 Third 7 9 26-29 31
A venture capitalist, willing to invest $1,000,000, has three investments to choose from. The first investment, a software company, has a 10% chance of returning $5,000,000 profit, a 30% chance of returning $1,000,000 profit, and a 60% chance of losing the million dollars. The second company, a hardware company, has a 20% chance of returning $3,000,000 profit, a 40% chance of returning $1,000,000 profit, and a 40% chance of losing the million dollars. The third company, a biotech firm, has a 10% chance of returning $6,000,000 profit, a 70% of no profit or loss, and a 20% chance of losing the million dollars. Order the expected values from smallest to largest.
Software, Biotech, Hardware Software Company: 5,000,000 x 0.1 + 1,000,000 x 0.3 + -1,000,000 x 0.6 = 200,000 Hardware Company: 3,000,000 x 0.2 + 1,000,000 x 0.4 + -1,000,000 x 0.4 = 600,000 Biotech Firm: 6,000,000 x 0.1 + 0 x 0.7 + -1,000,000 x 0.2 = 400,000
Chapter Review
Some statistical measures, like many survey questions, measure qualitative rather than quantitative data. In this case, the population parameter being estimated is a proportion. It is possible to create a confidence interval for the true population proportion following procedures similar to those used in creating confidence intervals for population means. The formulas are slightly different, but they follow the same reasoning. The "plus four" method for calculating confidence intervals is an attempt to balance the error introduced by using estimates of the population proportion when calculating the standard deviation of the sampling distribution. Simply imagine four additional trials in the study; two are successes and two are failures. Calculate p′ = (x + 2)/(n + 4), and proceed to find the confidence interval. When sample sizes are small, this method has been demonstrated to provide more accurate confidence intervals than the standard formula used for larger samples.
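A minimal sketch of the plus-four method follows. It assumes the adjusted sample size n + 4 is also used in the standard error, and the inputs (6 successes in 20 trials) are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist

def plus_four_interval(x, n, confidence=0.95):
    """Plus-four confidence interval for a population proportion."""
    p_prime = (x + 2) / (n + 4)  # add two successes and two failures
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_prime * (1 - p_prime) / (n + 4))
    return p_prime, p_prime - margin, p_prime + margin

# Hypothetical small sample: 6 successes out of 20 trials
p_prime, low, high = plus_four_interval(x=6, n=20)
print(p_prime, low, high)
```

Here p′ = 8/24, slightly closer to 0.5 than the raw proportion 6/20, which is the intended effect of the adjustment.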
Parameter
Some value summarizing the population
Statistic
Some value summarizing the sample
Standard deviation equation
sqrt( (sum of (y_i - mean)^2) / (n - 1) )
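The sample standard deviation squares each deviation from the mean, sums them, divides by n - 1, and takes the square root. The sketch below spells this out and compares it with statistics.stdev; the data values are made up.

```python
import statistics
from math import sqrt

def sample_sd(values):
    """Sample standard deviation with the n - 1 denominator."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((y - mean) ** 2 for y in values)
    return sqrt(squared_deviations / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores
by_hand = sample_sd(data)
from_library = statistics.stdev(data)
print(by_hand, from_library)
```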
Null hypothesis (H0)
Statement that says nothing interesting is happening (the opposite of what you're looking for) e.g. if you're trying to prove that a drug produces more treatment than a placebo, your null hypothesis would be that the drug produces the same amount of treatment as the placebo: p(drug) = p(placebo)
If you were required to survey Fresno City College students regarding their employment status, which sampling technique would you use? Explain.
Stratified sample, it is a method for selecting a random sample used to ensure that subgroups of the population are represented adequately; divide the population into groups (strata). Use simple random sampling to identify a proportionate number of individuals from each stratum.
A fitness center is interested in finding a 90% confidence interval for the mean number of days per week that Americans who are members of a fitness club go to their fitness center. Records of 220 members were looked at and their mean number of visits per week was 2.4 and the standard deviation was 2.1. Round your answers to two decimal places. A. The sampling distribution follows a distribution. B. With 90% confidence the population mean number of visits per week is between and visits. C. If many groups of 220 randomly selected members are studied, then a different confidence interval would be produced from each group. About percent of these confidence intervals will contain the true population mean number of visits per week and about percent will not contain the true population mean number of visits per week.
T 2.17 2.63 90 10
A researcher is interested in finding a 95% confidence interval for the mean number of times per day that college students text. The study included 210 students who averaged 28 texts per day. The standard deviation was 21 texts. Round your answers to two decimal places. A. The sampling distribution follows a distribution. B. With 95% confidence the population mean number of texts per day is between and texts. C. If many groups of 210 randomly selected students are studied, then a different confidence interval would be produced from each group. About percent of these confidence intervals will contain the true population mean number of texts per day and about percent will not contain the true population mean number of texts per day.
T 25.14 30.86 95 5
The mayor is interested in finding a 95% confidence interval for the mean number of pounds of trash per person per week that is generated in city. The study included 120 residents whose mean number of pounds of trash generated per person per week was 31.5 pounds and the standard deviation was 7.8 pounds. Round your answers to two decimal places. A. The sampling distribution follows a distribution. B. With 95% confidence the population mean number of pounds per person per week is between and pounds. C. If many groups of 120 randomly selected people in the city are studied, then a different confidence interval would be produced from each group. About percent of these confidence intervals will contain the true population mean number of pounds of trash generated per person per week and about percent will not contain the true population mean number of pounds of trash generated per person per week.
T 30.09 32.91 95 5
A psychologist wants to use a 95% confidence interval to estimate the mean number of days that American teens take to go on their first date after breaking up with their boyfriend or girlfriend. He surveyed 68 teens who averaged 44 days and had a standard deviation of 19 days. Round your answers to two decimal places. A. The sampling distribution follows a distribution. B. With 95% confidence the population mean number of days that American teens take to go on their first date after breaking up with their boyfriend or girlfriend is between and . C. If many groups of 68 randomly selected teens are surveyed, then a different confidence interval will be produced from each group. About percent of these confidence intervals will contain the true population mean number of days that American teens take to go on their first date after breaking up with their boyfriend or girlfriend and about percent will not contain the true population mean number of days that American teen take to go on their first date after breaking up with their boyfriend or girlfriend.
T 39.4 48.6 95 5
A medical researcher is testing the effectiveness of a new pain medication for mothers during child birth. She is interested in finding a 99% confidence interval for the mean amount of pain on a scale of 1 to 10,with 1 meaning no pain and 10 severe pain, that medicated mothers experience during child birth. The study included 57 mothers who took the experimental medication. The mean amount of pain these mothers experienced was 4.8 and the standard deviation was 1.9. Round your answers to two decimal places. A. The sampling distribution follows a distribution. B. With 99% confidence the population mean amount of pain for all birthing mothers after taking the medication is between and . C. If many groups of 57 randomly selected birthing mothers are given the medication, then a different confidence interval would be produced from each group. About percent of these confidence intervals will contain the true population mean pain level and about percent will not contain the true population mean pain level.
T 4.13 5.47 99 1
A researcher is interested in finding a 95% confidence interval for the mean number of minutes students are concentrating on their professor during a one hour statistics lecture. The study included 150 students who averaged 42 minutes concentrating on their professor during the hour lecture. The standard deviation was 12 minutes. Round your answers to two decimal places. A. The sampling distribution follows a distribution. B. With 95% confidence the population mean minutes of concentration is between and minutes. C. If many groups of 150 randomly selected students are studied, then a different confidence interval would be produced from each group. About percent of these confidence intervals will contain the true population mean number of minutes of concentration and about percent will not contain the true population mean number of minutes of concentration.
T 40.06 43.94 95 5
A CEO of a large company is interested in finding a 95% confidence interval for the mean number of days per year that employees call in sick. The study included 135 employees who averaged 7 sick days per year. The standard deviation was 4 sick days. Round your answers to two decimal places. A. The sampling distribution follows a distribution. B. With 95% confidence the population mean number of sick days per year is between and days. C. If many groups of 135 randomly selected employees are studied, then a different confidence interval would be produced from each group. About percent of these confidence intervals will contain the true population mean number of sick days per year and about percent will not contain the true population mean number of sick days per year.
T 6.32 7.68 95 5
An environmental scientist wants to use a 95% confidence interval to estimate the mean number of hours per day her solar panel receives direct sunlight. She observed the panel for 48 randomly selected days and found that the solar panel received an average of 7.1 hours of sunlight per day and the standard deviation was 2.3 hours. Round your answers to two decimal places. A. The sampling distribution follows a distribution. B. With 95% confidence the population mean number of hours of sunlight the solar panel receives per day for all days is between and . C. If many groups of 48 randomly selected days were observed, then a different confidence interval would be produced from each group. About percent of these confidence intervals will contain the true population mean number of hours of sunlight per day that the solar panel receives and about percent will not contain the true population mean number of hours of sunlight that the solar panel receives.
T 6.43 7.77 95 5
Binomial Distribution
The distribution of the result of an experiment with -A fixed number of trials, n -The trials are independent -Each trial results in success or failure -The probability of success, p, is the same for each trial.
Memoryless
The exponential distribution is memoryless. The probability of a new washing machine lasting for 3 years is the same as the probability of a washing machine lasting for 3 more years if it has already lasted 30 years. (This might not be true in real life, but it is true in probability)
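The memoryless property can be checked numerically from the exponential survival function P(T > t) = e^(-rate × t). In the sketch below the failure rate of 0.1 per year is an assumed value chosen only for illustration.

```python
from math import exp, isclose

def survival(t, rate):
    """P(T > t) for an exponential lifetime with the given rate."""
    return exp(-rate * t)

rate = 0.1               # assumed failure rate per year
already, extra = 30, 3   # years already survived, additional years

# P(T > 33 | T > 30) = P(T > 33) / P(T > 30)
conditional = survival(already + extra, rate) / survival(already, rate)
unconditional = survival(extra, rate)  # P(T > 3) for a new machine
print(conditional, unconditional)
```

The two printed probabilities agree up to floating-point rounding, whatever rate is chosen.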
If f(x) is a density function for the continuous random variable x, then P(a < x < b) equals
The integral from a to b of f(x)
Requirements for both types of X^2 tests
The requirements are the same: 1) You start with a one-dimensional (k x 1) table of observed counts. You wish to compare these counts to those predicted by some theory. 2) The counts in the cells of the table must be independent of one another. Randomly sampling the people that comprise these counts usually gives us this. 3) The expected count for each cell must be at least 5. (Note: We don't require that the observed counts be at least 5, just the expected counts.)
What are the effects on error of increasing alpha (α)
The risk of a Type I error is increased and the risk of a Type II error is decreased.
Which of the following are reasons that a sampling technique may not be scientific. Choose all that apply.
The wording of survey question influences the response. The sample size is too small. The sample is not representative of the population. Two factors cannot be separated to determine which is the one that is responsible for the outcome. The graphs are drawn in a way to mislead the reader. The funders of the project are partial to the results. People who were asked refused to answer. Trying to conclude that there is a cause-and-effect relationship when something else causes both. Self-Selected Sample.
Raw Scores & Z-Scores
The z-score represents the number of standard deviations away from the mean to the value (x).
A statistical experiment can be classified as a binomial experiment if the following conditions are met:
There are a fixed number of trials, n. There are only two possible outcomes, called "success" and, "failure" for each trial. The letter p denotes the probability of a success on one trial and q denotes the probability of a failure on one trial. The n trials are independent and are repeated using identical conditions.
Consider the boxplot below. Boxplot with five point summary: 6,10,19,21,26 What quarter has the smallest spread of data? What is that spread? What quarter has the largest spread of data? What is that spread? Find the Inter Quartile Range (IQR): Which interval has the most data in it? What value could represent the 53rd percentile?
Third 2 Second 9 11 19-22 20
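The quarter spreads and the IQR follow directly from the five-number summary. A small sketch, with illustrative names, that recomputes them from the values on the card:

```python
five = [6, 10, 19, 21, 26]   # min, Q1, median, Q3, max from the card
names = ["First", "Second", "Third", "Fourth"]

# The spread of each quarter is the gap between consecutive summary values
spreads = {name: hi - lo for name, lo, hi in zip(names, five, five[1:])}
iqr = five[3] - five[1]      # Q3 - Q1

smallest = min(spreads, key=spreads.get)
largest = max(spreads, key=spreads.get)
print(spreads, smallest, largest, iqr)
```

This reproduces the card's answers: the third quarter has the smallest spread (2), the second has the largest (9), and the IQR is 11.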
Volunteer bias
Those who are willing to take their own time to voluntarily complete something like a survey usually look different from those who don't. The sample does not represent those who opt out of volunteering.
Hypothesis testing on Slopes of Regression Lines
t = (b1 - 0) / SE(b1), with n-2 degrees of freedom. SE(b1) will be given.
neutral way of inflating r^2
Tossing outliers and doing the analysis without them (good or bad depending on the situation) -if outliers are trolls, tossing them is fine -if outliers are valid, observed data, you cannot toss them
15 cards are selected out of a 52 card deck such that after each card is selected, it is placed back into the deck and the deck is reshuffled. Then the total number of hearts selected follows a binomial distribution.
True
200 randomly selected Americans are asked if they smoke cigarettes. Then the results of this procedure can be treated as a binomial distribution.
True
A survey is taken of 35 randomly selected LTCC students asking them, "Do you plan to transfer to a university next year?" The distribution of possible responses of the 35 students is an example of a binomial distribution.
True
A survey is taken of 90 randomly selected Americans asking them, "Do you think congress should vote to change the constitution?" The distribution of possible responses of the 90 Americans is an example of a binomial distribution.
True
Exactly 50% of the area under the normal curve lies to the right of the mean.
True
Given a survey conducted of randomly selected people, the number of siblings people have is a discrete random variable and the distance from their bellybuttons to their knees is a continuous random variable.
True
If Z is a random variable from a standard normal distribution and if P(Z<a)=0.42, then P(Z<-a)=0.58.
True
If a business owner, who is only interested in the bottom line, computes the expected value for the profit made in bidding on a project to be -3,000, then this owner should not bid on this project.
True
If a distribution is normal with mean 8 and standard deviation 2, then the median is also 8.
True
If a distribution is skewed right, then the median for this population is smaller than the median for the sampling distribution with sample size 100.
True
If the sample size is 100 and the population standard deviation is 20, then the standard deviation of the sampling distribution is 2.
True
If x represents a random variable coming from a normal distribution and P(x < 10.4) = 0.78, then P(x > 10.4) = 0.22.
True
If x represents a random variable coming from a normal distribution with mean 3 and if P(x > 4.8) = 0.15, then P(3 < x < 4.8) = 0.35.
True
If z is a random variable with a standard normal distribution and if a is a positive number and if P(z > a) = 0.15, then P(-a < z < a) = 0.7.
True
The population mean will always be the same as the mean of all possible x-bars that can be computed from samples of size 200.
True
Draw SRS of 200 UCSD students and ask if they have a FB account. 130 say they do. Claim 1: 65% of our sample are FB Users
True. p̂ = 130/200 = 65%
Claim 3: About 65% of UCSD students use FB
Vague. Need to learn how to do better. "About" is not precise enough in statistics
Do we always get a normal model for the sampling distribution of a mean
We do if 2 conditions are met: 1. Independence Assumption: The items in each sample must be independent of one another. Typically, better to check two conditions (which effectively mean independence) 1.A. Randomness Condition: The items in your sample must be randomly chosen 1.B. <10% Condition: Your sample size needs to be <10% of the population size. 2. Nearly Normal Condition (Sample size condition): The population histogram should look nearly normal. If this histogram shows skew, the sample size needs to be large for the sampling distribution to be normal. e.g. n>30 for moderate skew, n>60 for large skew.
Uniform Distribution
When a finite interval of possibilities are all equally likely: f(x) = 1/(b-a) when a </= x </= b, and f(x) = 0 otherwise. Height = 1/(b-a)
Tower of power
When original data or the residuals convince you that the data are not straight enough, apply a mathematical function to the values
Proportions and means
When populations are big, we must draw a random sample and estimate these parameters using statistics
Two-sided alternative hypothesis
When you are excited about results on both sides. You are wondering if your percentage is different from the comparison %. P(a) =/= P(b)
Geometric model
X = Geom(p), where p is the probability of success and X is the number of trials needed to get the first success. -Assume we are doing a Bernoulli trial with success probability p (and failure probability q=1-p) over and over until we get a success. The probability that the first success comes on trial x is: P(x)=[q^(x-1)]*p E(X) = 1/p SD(X) = sqrt(q/p^2) = [sqrt(q)]/(p)
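A short sketch of the geometric formulas. The example of waiting for a six on a fair die (p = 1/6) is an assumption chosen for illustration, and `geom_pmf` is a hypothetical helper name.

```python
import math

def geom_pmf(x, p):
    """P(first success occurs on trial x) = q^(x-1) * p."""
    return (1 - p) ** (x - 1) * p

p = 1 / 6                     # hypothetical: rolling a six on a fair die
mean = 1 / p                  # E(X) = 1/p
sd = math.sqrt(1 - p) / p     # SD(X) = sqrt(q)/p
print(geom_pmf(3, p), mean, sd)
```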
Formula Review
X ∼ N(μ, σ) μ = the mean σ = the standard deviation
Binomial Model
X=Binom(n,p), where n is the number of trials, p is the probability of success, and X is the number of successes in n trials. -Probability of getting k successes in n Bernoulli trials is: P(k) = (n nCr k) * (q^(n-k)) * (p^(k)) E(X) = np SD(X) = sqrt(npq)
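The binomial formulas can be tried on the card-drawing setup from an earlier card (15 draws with replacement, success = heart, so p = 13/52). This is an illustrative sketch and `binom_pmf` is a hypothetical helper name.

```python
import math

def binom_pmf(k, n, p):
    """P(exactly k successes in n independent trials)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 15, 13 / 52                  # 15 draws with replacement, P(heart) = 1/4
mean = n * p                        # E(X) = np
sd = math.sqrt(n * p * (1 - p))     # SD(X) = sqrt(npq)
print(binom_pmf(4, n, p), mean, sd)
```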
Does empirical probability make sense?
Yes, you tend to get what you expect
Theoretical probability
You build a mathematical model to describe a situation and use the axioms of probability to determine the likelihood of some events e.g. Determine that the chance of rolling an even number on a die is 1/2 because 3/6 of the possible outcomes are even numbers
Empirical probability
You determine how likely something is by trying it over and over and looking at tons of data. eg. determining if a coin is fair by flipping it 100,000 times and recording number of heads and tails
Multistage Sampling
You focus on undergrads today and ask every 4th one you see. You do grads the next day and ask every 4th one you see. -Uses 2 or more of the previous methods (excluding SRS)
By assuming H0, build a universe where p is in accordance with H0.
You must first make sure that the sampling distribution is approximately normal: Make sure <10% of total population Make sure np >= 10 success and nq >= 10 fails
Finding Z using pooling
Z = [(p̂1-p̂2)-0] / SEpooled, where SEpooled = sqrt[(p̂pooled * q̂pooled)/n1 + (p̂pooled * q̂pooled)/n2]
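A sketch of the pooled two-proportion z statistic, using SEpooled = sqrt(p̂pooled*q̂pooled/n1 + p̂pooled*q̂pooled/n2). The counts below are made up for illustration, and `pooled_z` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def pooled_z(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)      # pooled success proportion
    q_pool = 1 - p_pool
    se = math.sqrt(p_pool * q_pool / n1 + p_pool * q_pool / n2)
    return (p1 - p2 - 0) / se

z = pooled_z(130, 200, 110, 200)                 # hypothetical counts
p_value = 2 * (1 - NormalDist().cdf(abs(z)))     # two-sided p-value
print(round(z, 3), round(p_value, 3))
```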
The five number summary
[lower whisker {Minimum value, Q1, Median, Q3, Maximum Value} upper whisker]
All confidence intervals work the same way, with slight changes
a
Random Variable (RV)
a characteristic of interest in a population being studied; common notation for variables are upper case Latin letters X, Y, Z,...; common notation for a specific value from the domain (set of all possible values of a variable) are lower case Latin letters x, y, and z. For example, if X is the number of children in a family, then x represents a specific integer 0, 1, 2, 3,.... Variables in statistics differ from variables in intermediate algebra in the two following ways. -The domain of the random variable (RV) is not necessarily a numerical set; the domain may be expressed in words; for example, if X = hair color then the domain is {black, blond, gray, green, orange}. -We can tell what specific value x the random variable X takes only after performing the experiment.
Standard Normal Distribution
a continuous random variable (RV) X ~ N(0, 1); when X follows the standard normal distribution, it is often noted as Z ~ N(0, 1).
Exponential Distribution
a continuous random variable (RV) that appears when we are interested in the intervals of time between some random events, for example, the length of time between emergency arrivals at a hospital
Normal Distribution
a continuous random variable where μ is the mean of the distribution and σ is the standard deviation;
Binomial Probability Distribution
a discrete random variable (RV) that arises from Bernoulli trials; there are a fixed number, n, of independent trials. "Independent" means that the result of any trial (for example, trial one) does not affect the results of the following trials, and all trials are conducted under the same conditions. Under these circumstances the binomial RV X is defined as the number of successes in n trials. The notation is: X ~ B(n, p). The mean is μ = np and the standard deviation is σ = √(npq).
Binomial Distribution
a discrete random variable (RV) which arises from Bernoulli trials; there are a fixed number, n, of independent trials. "Independent" means that the result of any trial (for example, trial 1) does not affect the results of the following trials, and all trials are conducted under the same conditions. Under these circumstances the binomial RV X is defined as the number of successes in n trials.
Do not create a regression when what type of outlier is present?
a high influence outlier is present
Probability Distribution Function (PDF)
a mathematical description of a discrete random variable (RV), given either in the form of an equation (formula) or in the form of a table listing all the possible outcomes of an experiment and the probability associated with each outcome.
Average
a number that describes the central tendency of the data; there are a number of specialized averages, including the arithmetic mean, weighted mean, median, mode, and geometric mean.
Standard Deviation of a Probability Distribution
a number that measures how far the outcomes of a statistical experiment are from the mean of the distribution
Mean
a number that measures the central tendency; a common name for mean is 'average.' The term 'mean' is a shortened form of 'arithmetic mean.'
Parameter
a numerical characteristic of a population
if the correlation is 1 or -1, the scatterplot must make
a perfect line
Binomial Experiment
a statistical experiment that satisfies the following three conditions: There are a fixed number of trials, n. There are only two possible outcomes, called "success" and, "failure," for each trial. The letter p denotes the probability of a success on one trial, and q denotes the probability of a failure on one trial. The n trials are independent and are repeated using identical conditions.
Give an example of a variable for each type of level of measurement: a) Nominal b) Ordinal c) Interval d) Ratio
a) Nominal - Nation of origin b) Ordinal - Highest degree conferred c) Interval - Movie rating d) Ratio - Volume of water used by a household in a day
If x = Uniform(1,4) What is probability of getting a rational number? What is the probability of getting an irrational number
a. 0 b. 1, because nearly every value from 1 to 4 is irrational
Trial
an action that creates data
Bernoulli Trials
an experiment with the following characteristics: There are only two possible outcomes called "success" and "failure" for each trial. The probability p of a success is the same for any trial (so the probability q = 1 − p of a failure is the same for any trial).
Confidence Interval (CI)
an interval estimate for an unknown population parameter. This depends on: the desired confidence level, information that is known about the distribution (for example, known standard deviation), the sample and its size.
Quadrant 1 or 2 curve
apply a function higher on the tower of power than is currently used
Quadrant 3 or 4 curve
apply a function lower on the tower of power than is currently used
Law of Large Numbers (LLN)
as a random process is repeated more and more, the proportion of times an event occurs converges to a number (the probability of that event)
Good way of inflating r^2
dividing data into subgroups that are more homogenous
Extrapolation is dangerous because
it assumes the relationship holds beyond the data range you have seen and used for a model
The center of the sampling distribution is at
mean, μ
Is mean or the median more resistant to outliers?
median
Which center of distribution is resistant to outliers
median
center used for asymmetric distributions (skewed)
median
Outliers can occur for many different reasons:
mistakes, atypical, scientifically important
Bigger padding in a confidence interval leads to
more confidence, but less relevance. (100% confidence that a value is in the interval 0-1000 is obvious. Where is the value, 2? 200? 547?)
In general, the bigger X^2 is, the
more evidence we have against H0
As df becomes larger, the t-distribution becomes
more standard/normal. The center does not change. The spread becomes narrower.
when you average things, you are eliminating
most variation that happens
The Poisson model is a good approximation of the Binomial model when
n >/= 20 and p < 0.05 or n >/= 100 and p < 0.1 This is helpful because the Binomial model becomes unusable when n gets really big or p gets really small
finding n given margin of error
n = [(z*)²(p̂)(q̂)] / (ME)²
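A sketch of the sample-size formula n = (z*)²(p̂)(q̂)/(ME)². The inputs (p̂ = 0.5, a 3% margin, 95% confidence) are assumed values for illustration, and rounding up to the next whole person is a common convention rather than something the card states.

```python
import math
from statistics import NormalDist

def sample_size(p_hat, margin, confidence):
    """Smallest n whose margin of error is at most `margin`."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # critical value
    n = z_star**2 * p_hat * (1 - p_hat) / margin**2
    return math.ceil(n)   # round up so the margin is not exceeded

n_needed = sample_size(0.5, 0.03, 0.95)   # hypothetical inputs
print(n_needed)
```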
Uniform model histogram
no peaks
As long as the conditions are met, it does not matter what distribution you start with. If you keep taking samples, you'll eventually get a
normal distribution
A high r^2 value is
not an indicator that a linear model is appropriate
If an outcome is common to both events, the events are
not disjoint
54% of students entering four-year colleges receive a degree within six years. Is this percent different for students who play intramural sports? 458 of the 800 students who played intramural sports received a degree within six years. H0: p = 0.54 Ha: p < 0.54
not equal
Before the furniture store began its ad campaign, it averaged 166 customers per day. The manager is hoping that the average has changed since the ad came out. The data for the 10 randomly selected days since the ad campaign began is shown below: 179, 182, 150, 203, 145, 199, 182, 234, 200, 177 Assuming that the distribution is normal, what can be concluded at the 0.05 level of significance? H0: μ = 166
not equal
Members of fraternities and sororities are required to volunteer for community service. Do fraternity brothers work fewer volunteer hours on average than sorority sisters? The data below show the number of volunteer hours worked for 13 randomly selected fraternity brothers and 13 randomly selected sorority sisters. Frat 6 12 5 4 6 9 8 11 5 8 4 5 10 Sor 8 28 12 19 20 3 16 18 8 6 10 24 16 Assume that both populations follow a normal distribution. What can be concluded at the 0.05 level of significance? H0: μFrat = μSor Ha: μFrat μSor Test statistic: p-Value = Round your answer to three decimal places. Conclusion: There is evidence to make the conclusion that fraternity brothers work fewer volunteer hours on average than sorority sisters.
not equal to t 0.002 reject the null hypothesis sufficient
Is there a difference in the amount of writing political science classes and history classes require? The 56 randomly selected political science classes assigned an average of 17 pages of essay writing for the course. The standard deviation for these 56 classes was 3.5 pages. The 39 randomly selected history classes assigned an average of 15 pages of essay writing for the course. The standard deviation for these 39 classes was 2.8 pages. What can be concluded at the 0.05 level of significance? H0: μPolySci = μHistory Ha: μPolySci μHistory Test statistic: p-Value = Round your answer to three decimal places. Conclusion: There is evidence to make the conclusion that there is a difference in the average amount of writing political science classes and history classes require.
not equal to t 0.003 reject the null hypothesis sufficient
Is there a difference based on gender in average final exam scores in statistics classes? Final exam scores of twelve randomly selected male statistics students and thirteen randomly selected female statistics students are shown below. Male 85 76 58 77 81 90 88 97 72 70 82 64 Female 78 96 79 67 93 84 99 90 87 76 85 92 94 Assume that both populations follow a normal distribution. What can be concluded at the 0.10 level of significance? H0: μMen = μWomen Ha: μMen μWomen Test statistic: p-Value = Round your answer to three decimal places. Conclusion: There is evidence to make the conclusion that there is a difference based on gender in population mean statistics final exam scores.
not equal to t 0.071 reject the null hypothesis sufficient
Members of fraternities and sororities are required to volunteer for community service. Do they average the same number of volunteer hours per month or is there a difference? The data below show the number of volunteer hours worked for 13 randomly selected fraternity brothers and 13 randomly selected sorority sisters. Frat 6 12 5 4 2 9 7 11 5 8 14 5 10 Sor 8 11 7 19 7 3 6 18 8 6 10 24 16 Assume that both populations follow a normal distribution. What can be concluded at the 0.05 level of significance? H0: μFrat = μSor Ha: μFrat μSor Test statistic: p-Value = Round your answer to three decimal places. Conclusion: There is evidence to make the conclusion that there is a difference in population mean number of volunteer hours that fraternity and sorority members work each month.
not equal to t 0.099 Fail to reject the null hypothesis insufficient
Is there a difference in the average donation given in Presbyterian vs Catholic church on Sundays? The 41 randomly selected members of the Presbyterian church donated an average of $28 with a standard deviation of $12. The 38 randomly selected members of the Catholic church donated an average of $31 with a standard deviation of $14. What can be concluded at the 0.05 level of significance? H0: μPres. = μCatholic Ha: μPres. μCatholic Test statistic: p-Value = Round your answer to three decimal places. Conclusion: There is evidence to make the conclusion that there is a difference between Presbyterians and Catholics in population mean donation on Sundays.
not equal to t 0.312 Fail to reject the null hypothesis insufficient
Is there a difference between the average number of innings that left handed starting pitchers pitch per game and the number of innings that right handed starting pitchers pitch per game? Twelve randomly selected left handed starting pitchers' games and fourteen randomly selected right handed pitchers' games were looked at. The table below shows the results. Left 7 4 9 6 6 8 2 5 7 5 8 4 Right 8 6 8 9 7 3 9 8 4 5 7 9 3 6 Assume that both populations follow a normal distribution. What can be concluded at the 0.10 level of significance? H0: μleft = μright Ha: μleft μright Test statistic: p-Value = Round your answer to three decimal places. Conclusion: There is evidence to make the conclusion that there is a difference in population mean number of innings that left handed starting pitchers and right handed starting pitchers pitch per game.
not equal to t 0.431 Fail to reject the null hypothesis insufficient
Regression to the mean
people far from the mean are pulled towards it in subsequent trials because it is easier to score near the mean than far from it.
T-distribution gives more
precise results
General confidence interval formula
p̂ +/- [(z*)(SE(p̂))] = (p̂ - [(z*)(SE(p̂))],p̂ + [(z*)(SE(p̂))]) where SE(p̂) = sqrt(p̂*q̂ / n) z* is the critical value
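As an illustration of the formula, here is a sketch that builds a 95% interval for the earlier survey in which 130 of 200 sampled UCSD students had an FB account. The 95% level is an assumption, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

successes, n = 130, 200
confidence = 0.95                      # assumed confidence level

p_hat = successes / n
q_hat = 1 - p_hat
se = math.sqrt(p_hat * q_hat / n)      # SE(p-hat) = sqrt(p-hat * q-hat / n)
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # critical value

lower, upper = p_hat - z_star * se, p_hat + z_star * se
print(round(lower, 3), round(upper, 3))
```

The result is roughly 0.584 to 0.716, a more precise statement than "about 65%".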
so if p̂1~N(p1, sqrt(p1q1/n1)) and p̂2~N(p2, sqrt(p2q2/n2)), then
p̂1-p̂2~N(p1-p2, sqrt((p1q1/n1)+(p2q2/n2)))
A Person's Ethnicity
qualitative
mean
the center of a distribution must take into account the data values themselves, not just the order they're in. It is the calculated average value (sum of all terms / amount of terms)
To get a normal sampling distribution from samples of a population: The greater the skew in the population,
the higher n must be to get a normal sampling distribution
The time that it takes for the next train to come follows a distribution with f(x) = 0.05 where x goes between 15 and 35 minutes. Round all numerical answers to two decimal places. A. This is a distribution. B. It is a distribution. C. The mean of this distribution is D. The standard deviation is E. Find the probability that the time will be at most 30 minutes. F. Find the probability that the time will be between 25 and 30 minutes = G. Find the 80th percentile.
uniform continuous 25 5.77 .75 .25 31
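The answers on this card can be reproduced with a few lines. This sketch uses the standard uniform results mean = (a+b)/2 and SD = (b-a)/sqrt(12), which are not stated on the card itself; `prob_between` is a hypothetical helper name.

```python
import math

a, b = 15, 35                  # waiting time ranges from 15 to 35 minutes
height = 1 / (b - a)           # density f(x) = 0.05 on [a, b]
mean = (a + b) / 2
sd = (b - a) / math.sqrt(12)

def prob_between(lo, hi):
    """P(lo < X < hi): rectangle area, clipped to [a, b]."""
    lo, hi = max(lo, a), min(hi, b)
    return max(hi - lo, 0) * height

p_at_most_30 = prob_between(a, 30)
p_25_to_30 = prob_between(25, 30)
pct_80 = a + 0.80 * (b - a)    # 80% of the area lies to the left of this value
print(mean, round(sd, 2), p_at_most_30, p_25_to_30, pct_80)
```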
Inclusive "or"
used in probability. A or B means: 1. A, but not B 2. B, but not A 3. Both A and B
bad way of inflating r^2
using summarized data rather than unsummarized data
Extrapolation
using your model to predict a new y value for an x value that is outside the span of x data in your model
mode
value that occurs the most often in a set of data
Because of randomness, there is
variation in this statistic
When r is close to -1, the correlation is
very strong and negative
When r is close to 1, the correlation is
very strong and positive
CI for the mean of one sample
x̄ +/- (t*)(SE(x̄)), where t* is the critical value with df = n-1 and SE(x̄) = s/sqrt(n)
Z-score
z = (y-ȳ)/(SDy), where ȳ = mean. Unitless idea that tells you how many standard deviations above the mean some piece of data is. z=0 is the mean (0 standard deviations from the mean). z=1 means 1 standard deviation above the mean. z=x means x standard deviations above the mean (below the mean when x is negative)
Confidence interval formula from regression
ŷnew +/- (t*)(SEci), where t* is the critical value with n-2 degrees of freedom and SEci = sqrt([SE(b1)]^2 * (xnew - x̄)^2 + se^2/n)
Subgroups may not be visible unless
you think about them
If the residuals show any type of pattern
your current linear model is not appropriate
Formula Review
z = a standardized value (z-score); mean = 0; standard deviation = 1. To find the kth percentile of X when the z-score is known: k = μ + (z)σ. z-score: z = (x - μ)/σ. Z = the random variable for z-scores, Z ~ N(0, 1)
Critical value
z* If you want a confidence interval of 80%, then 10% would be to the left and 10% to the right. Therefore, the critical value of the z-score (z*) would be at the 90th percentile (80+10) because 10% is to the right of the 90th percentile.
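The percentile reasoning above can be sketched with the standard library's normal distribution. The helper name `z_critical` is hypothetical.

```python
from statistics import NormalDist

def z_critical(confidence):
    """z* that leaves (1 - confidence)/2 in each tail of N(0, 1)."""
    tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail)   # e.g. the 90th percentile for 80%

for level in (0.80, 0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 3))
```

For an 80% interval this returns about 1.282, the 90th percentile of the standard normal distribution.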
Recall the regression line equation
ŷ = b0 + b1 * x b0 is the intercept b1 is the slope
If you have a situation modelled by Binom(n,p) in which n is large and p is small, then use a Poisson model instead where
λ = np where: [n >/= 20 and p </= 0.05 or n >/= 100 and p </= 0.10] and [np </= 20]
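To see how close the approximation is, the two models can be compared side by side. The values n = 100 and p = 0.02 are assumed for illustration (they satisfy the conditions above), and both helper names are hypothetical.

```python
import math

def binom_pmf(k, n, p):
    """Exact binomial probability of k successes."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Poisson probability of k events with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

n, p = 100, 0.02     # hypothetical: n large, p small
lam = n * p          # Poisson mean, lambda = np

for k in range(6):
    print(k, round(binom_pmf(k, n, p), 4), round(poisson_pmf(k, lam), 4))
```

For these inputs the two columns differ by less than 0.01 at every k.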
Parameter, μ (or E(x))
μ (or E(x)): A value that helps summarize a probability model
Variance of a density function
μ (mu) = mean. Variance = integral from -∞ to ∞ of (x-μ)^2 * f(x) dx. Easier version: E[X^2] - μ^2 = [integral from -∞ to ∞ of x^2 * f(x) dx] - μ^2
The standard deviation of the sampling distribution is:
σ = sqrt(pq/n) (square root of [probability of success]*[probability of failure] divided by the [sample size])
The spread of the sampling distribution is:
σ/sqrt(n) (Standard deviation over the square root of the sample size)
Is Race a factor in the number of times a month a person eats out? The table below shows data that was collected. White Black Hispanic Asian 6 4 7 8 3 1 3 10 2 5 5 5 4 2 4 14 6 6 7 Assume that all distributions are normal, the four population standard deviations are all the same, and the data was collected independently and randomly. Use a level of significance of 0.05. H0: μWhite = μBlack = μHispanic = μAsian Ha: At least two of the means are different from each other. Calculate the p-Value. p-Value = Conclusion:
0.009 There is sufficient evidence to support the claim that at least two of the mean times eating out per month differ from each other.
You are interested in investigating whether the type of computer a person primarily uses and the type of car they drive are dependent. The table below shows the results of a survey. What can be concluded at the 0.05 level of significance? Sedan SUV Truck iPad 74 51 29 Notebook 152 98 70 Desktop 41 30 40 The p-value is: p-Value = Conclusion:
0.016 There is sufficient evidence to support the claim that computer type and car type are dependent.
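The p-value on this card can be reproduced with a chi-square test of independence. This sketch uses only the standard library: the closed-form upper-tail formula in `chi2_sf_even_df` (a hypothetical helper name) is valid only when the degrees of freedom are even, which holds here because a 3 x 3 table has df = 4.

```python
import math

observed = [
    [74, 51, 29],     # iPad: Sedan, SUV, Truck
    [152, 98, 70],    # Notebook
    [41, 30, 40],     # Desktop
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

chi_sq = 0.0
min_expected = float("inf")
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / total   # count expected under independence
        min_expected = min(min_expected, expected)
        chi_sq += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)   # (rows - 1)(columns - 1)

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; valid only for even df."""
    half = x / 2
    return math.exp(-half) * sum(half**k / math.factorial(k) for k in range(df // 2))

p_value = chi2_sf_even_df(chi_sq, df)
print(round(chi_sq, 2), df, round(p_value, 3))
```

Every expected count is above 5, so the expected-count requirement is met, and the p-value rounds to the card's 0.016.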
A researcher is interested in investigating whether living situation and type of pet ownership are dependent. The table below shows the results of a survey. What can be concluded at the 0.05 level of significance? Single Family Couple Dog 52 78 43 Cat 58 80 37 Various 34 51 18 None 67 53 28 The p-value is: p-Value = Conclusion:
0.083 There is insufficient evidence to make a conclusion on whether living situation and pet ownership type are independent or dependent.
Order the following from smallest correlation, r, to largest correlation, r.
C, B, A, D
1-way and 2-way ANOVA (Analysis of Variance) are used to test whether two or more population variances differ.
False
If the correlation r = 0.92, then the points of the scatter plot do not all lie on a line.
True
Do men and women select different breakfasts? The breakfast ordered by randomly selected men and women at a popular breakfast place is shown below. French Toast Pancakes Waffles Omelettes Men 49 31 28 53 Women 65 72 55 60 Perform the hypothesis test at a 5% level of significance and find the p-Value. p-Value = Conclusion:
0.044 There is sufficient evidence to state that the distribution of breakfast ordered at the restaurant is different for men and women.
A study was done to look at the relationship between number of vacation days employees take each year and the number of sick days they take each year. The results of the survey are shown below. Vacation 7 12 9 3 8 10 5 8 10 15 12 7 0 9 13 Sick 4 0 12 2 0 9 3 5 1 0 7 8 3 17 0 A. r = B. r^2 = C. Find the equation of the least squares regression line: y = + x D. What does the regression line predict for the number of sick days an employee took if that person took 11 vacation days that year? (Note that this may not be a reliable prediction. Why?) E. Test the hypothesis that there is a linear correlation between the number of vacation days and the number of sick days. Use a level of significance of 0.05. H0: r = 0 Ha: r Not Equal to 0 p-Value =
-0.0578 0.0033 5.373 -0.0749 4.549 0.8377 There is insufficient evidence to support the claim that there is a correlation between the number of vacation days taken and the number of sick days taken.
Determine the value that the correlation, r, could be for the following scatter plot.
-0.54
A study was done to look at the relationship between number of movies people watch at the theater each year and the number of books that they read each year. The results of the survey are shown below. Movies 3 6 0 9 14 5 7 12 2 1 7 0 5 11 4 Books 7 0 18 0 1 4 5 0 6 2 10 9 2 2 0 A. r = Round to three decimal places. B. r2 = Round to three decimal places. C. Find the equation of the least squares regression line: y = + x Round to three decimal places. D. What does the regression line predict for the number of books read by a person who watches 8 movies per year? Round to nearest whole number. E. Test the hypothesis that there is a linear correlation between number of movies watched at the theater and number of books read. Use a level of significance of 0.05. H0: r = 0 Ha: r Not Equal to 0 p-Value = Round your answer to three decimal places.
-0.5654 0.3197 8.1592 -0.6557 2.8336 0.0281 There is statistically significant evidence to support the claim that there is a correlation between the number of movies watched at the theater and the number of books read.
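Parts A to C can be checked from the raw data with the usual least-squares sums. This is a sketch with illustrative variable names; it does not attempt the p-value, which needs a t-distribution that the standard library lacks.

```python
import math

movies = [3, 6, 0, 9, 14, 5, 7, 12, 2, 1, 7, 0, 5, 11, 4]
books = [7, 0, 18, 0, 1, 4, 5, 0, 6, 2, 10, 9, 2, 2, 0]

n = len(movies)
x_bar = sum(movies) / n
y_bar = sum(books) / n

# Sums of squared deviations and cross-products
sxx = sum((x - x_bar) ** 2 for x in movies)
syy = sum((y - y_bar) ** 2 for y in books)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(movies, books))

r = sxy / math.sqrt(sxx * syy)   # correlation
b1 = sxy / sxx                   # slope
b0 = y_bar - b1 * x_bar          # intercept
print(round(r, 4), round(r**2, 4), round(b0, 4), round(b1, 4))
```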
A study was done to look at the relationship between number of lovers college students have had in their lifetimes and their GPAs. The results of the survey are shown below. Lovers 3 2 0 5 3 1 1 8 2 0 1 3 1 4 6 GPA 2.4 3.0 3.8 2.0 2.7 3.3 3.7 0.5 3.6 4.0 2.8 3.6 3.5 1.8 2.2 A. r = Round to three decimal places. B. r^2 = Round to three decimal places. C. Find the equation of the least squares regression line: y = + x Round to three decimal places. D. What does the regression line predict for the GPA of a student who has had 4 lovers? Round to three decimal places. E. Test the hypothesis that there is a linear correlation between number of lovers and GPA. Use a level of significance of 0.05. H0: r = 0 Ha: r Not Equal to 0 p-Value = Round your answer to three decimal places.
-0.8973 0.8051 3.856 -0.3736 2.3616 0 There is statistically significant evidence to support the claim that there is a correlation between the number of lovers and GPA.
A Psychologist is interested in testing whether there is a difference in the distribution of personality types for business majors and social science majors. The results of the study are shown below. Open Conscientious Extrovert Agreeable Neurotic Business 41 74 46 61 38 Social Science 72 75 63 80 95 Perform the hypothesis test at a 5% level of significance and find the p-Value. p-Value = Round your answer to three decimal places. Conclusion:
0.006 There is sufficient evidence to state that the distribution of personality types is different for business and social science majors.
Is the type of area that a person lives in a factor in the age that a person experiences their first passionate kiss? The table below shows data that was collected. City Suburbs Rural 13 15 11 14 16 12 13 12 13 15 15 12 12 17 11 13 Assume that all distributions are normal, the three population standard deviations are all the same, and the data was collected independently and randomly. Use a level of significance of 0.05. H0: μCity = μSuburbs = μRural Ha: At least two of the means are different from each other. Calculate the p-Value. p-Value = Conclusion:
0.0068 There is statistically significant evidence to support the claim that at least two of the mean first kiss ages differ from each other.
Do college students enjoy playing sports less than watching sports? Eleven randomly selected college students were asked to rate playing sports and watching sports on a scale from 1 to 10 with 1 meaning they have no interest and 10 meaning they absolutely love it. The results of the study are shown below. Play 1 4 2 7 5 1 9 6 2 5 2 Watch 3 9 6 7 3 4 7 8 6 10 7 Assume the distribution of the differences is normal. What can be concluded at the 0.01 level of significance? (d = score before - score after) H0: μd = 0 Ha: μd [response1] 0 Test statistic: [response2] p-Value = Round your answer to three decimal places. Conclusion: There is evidence to make the conclusion that, on average, college students enjoy playing sports less than they enjoy watching sports.
< t 0.007 reject the null hypothesis sufficient
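A rough sketch of the paired t-test behind this card, using only the standard library. It assumes d = Play - Watch and a left-tailed alternative. The helper `t_sf` is a hypothetical name, and integrating the t density with Simpson's rule is just one convenient way to get the tail area without a statistics package, not the method the course prescribes.

```python
import math
from statistics import mean, stdev

def t_sf(t, df, steps=2000):
    """Upper-tail probability P(T > t) for Student's t, by Simpson's rule on the density."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(u):
        return const * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # integral of the density from 0 to |t|
    return 0.5 - area if t >= 0 else 0.5 + area

play = [1, 4, 2, 7, 5, 1, 9, 6, 2, 5, 2]
watch = [3, 9, 6, 7, 3, 4, 7, 8, 6, 10, 7]

# Differences within each pair (assumed here to be Play minus Watch).
diffs = [p - w for p, w in zip(play, watch)]
n = len(diffs)
t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))
df = n - 1

# Left-tailed test: P(T < t_stat) equals P(T > -t_stat) by symmetry.
p_value = t_sf(-t_stat, df)
print(f"t = {t_stat:.3f}, df = {df}, p = {p_value:.3f}")
```

Because the p-value falls below 0.01, the sketch agrees with the card's decision to reject the null hypothesis.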
Do the needs of patients in the emergency room differ for those with health insurance vs. those without health insurance? Of the randomly selected emergency room patients with health insurance 42 had an injury, 35 were sick, 44 had heart problems and 39 had other needs. Of the randomly selected emergency room patients without health insurance 36 had an injury, 44 were sick, 18 had heart problems and 30 had other needs. Conduct the appropriate hypothesis test using a 0.05 level of significance. p-Value = Conclusion:
0.0175 There is sufficient evidence to state that the distribution of emergency room needs is not the same for those with health insurance and those without.
Are hair color and body type independent? The table below shows the results of a researcher's observations of randomly selected people. What can be concluded at the 0.05 level of significance? Columns: Blonde, Brunette, Red Head. Short and Slender: 53, 177, 31. Short and Pudgy: 62, 134, 36. Tall and Slender: 71, 118, 40. Tall and Heavy: 46, 100, 32. The p-value is: p-Value = Conclusion:
0.021 There is sufficient evidence to support the claim that hair color and body type are dependent.
A researcher is interested in investigating whether the military branch a person signs up for and the person's blood type are dependent. The table below shows the results of a survey. What can be concluded at the 0.05 level of significance? Columns: O, A, B, AB. Army: 97, 67, 43, 12. Navy: 108, 90, 50, 17. Air Force: 78, 50, 67, 8. Marines: 62, 61, 45, 11. The p-value is: p-Value = Conclusion:
0.021 There is sufficient evidence to support the claim that military branch and blood type are dependent.
A restaurant manager is interested in investigating whether the main course ordered and the dessert ordered are dependent. The table below shows the results of a survey. What can be concluded at the 0.05 level of significance? Columns: Fish, Meat, Vegetarian. Cake: 41, 68, 19. Ice Cream: 62, 99, 54. Pie: 59, 104, 25. The p-value is: p-Value = Conclusion:
0.027 There is sufficient evidence to support the claim that main course and dessert ordered are dependent.
Is there a difference between community college statistics students and university statistics students in what technology they use on their homework? Of the randomly selected community college students 43 used a computer, 102 used a calculator with built in statistics functions, and 65 used a table from the textbook. Of the randomly selected university students 28 used a computer, 33 used a calculator with built in statistics functions, and 40 used a table from the textbook. Conduct the appropriate hypothesis test using a 0.05 level of significance. p-Value = Conclusion:
0.029 There is sufficient evidence to state that the distribution of technology use for statistics homework is not the same for statistics students at community colleges and at universities.
A hospital wants to determine if the type of treatment for pneumonia is a factor in recovery time. The table below shows the number of days to recovery for several randomly selected pneumonia patients that had various types of treatment. Overnight Hospital Stay A Few Hours in the Hospital Sent Home with Medicine 14 15 21 18 16 17 12 6 9 24 10 28 6 15 24 8 18 11 26 Assume that all distributions are normal, the three population standard deviations are all the same, and the data was collected independently and randomly. Use a level of significance of 0.05. H0: μ_Overnight = μ_FewHours = μ_Home Ha: At least two of the means are different from each other. Calculate the p-Value. p-Value = Round to three decimal places Conclusion:
0.0319 There is sufficient evidence to support the claim that at least two of the mean recovery times differ from each other.
Is a statistics class's delivery type a factor in how well students do on the final exam? The table below shows the average percent on final exams from several randomly selected classes that used the different delivery types. Online Hybrid Face-To-Face 72 83 80 71 73 78 70 84 84 80 81 87 81 86 79 82 Assume that all distributions are normal, the three population standard deviations are all the same, and the data was collected independently and randomly. Use a level of significance of 0.05. H0: μ_Online = μ_Hybrid = μ_Face2Face Ha: At least two of the means are different from each other. Calculate the p-Value. p-Value = Conclusion:
0.0394 There is sufficient evidence to support the claim that at least two of the mean exam scores differ from each other.
Determine the value that the correlation, r, could be for the following scatter plot.
0.04
A researcher wants to know if the clothes a woman wears is a factor in her GPA. The table below shows data that was collected from a survey. Dress Jeans Skirt Shorts 3.2 3.9 3.3 4.0 2.7 2.6 2.0 3.4 3.6 3.7 2.5 3.6 3.0 3.5 3.4 3.7 3.1 1.5 Assume that all distributions are normal, the four population standard deviations are all the same, and the data was collected independently and randomly. Use a level of significance of 0.05. H0: μ_Dress = μ_Jeans = μ_Skirt = μ_Shorts Ha: At least two of the means are different from each other. Calculate the p-Value. p-Value = Conclusion:
0.0480 There is sufficient evidence to support the claim that at least two of the mean GPAs differ from each other.
You are interested in investigating whether the type of computer a person primarily uses and the type of car they drive are dependent. The table below shows the results of a survey. What can be concluded at the 0.05 level of significance? Columns: Sedan, SUV, Truck. iPad: 74, 51, 29. Notebook: 142, 98, 70. Desktop: 41, 30, 37. The p-value is: p-Value = Conclusion:
0.063 There is insufficient evidence to make a conclusion on whether computer type and car type are independent or dependent.
Are the snow conditions a factor in the number of visitors to a ski resort? The table below shows data that was collected. Powder Machine Made Hard Packed 1210 2107 2846 1080 1149 1638 1537 862 2019 941 1870 1178 1528 2233 1382 Assume that all distributions are normal, the three population standard deviations are all the same, and the data was collected independently and randomly. Use a level of significance of 0.10. H0: μ_Powder = μ_MachineMade = μ_HardPacked Ha: At least two of the means are different from each other. Calculate the p-Value. p-Value = Conclusion:
0.0807 There is sufficient evidence to support the claim that at least two of the mean attendance numbers differ from each other.
A researcher surveyed randomly selected Democrats and Republicans asking them what the number one concern should be for the president of the United States. The results of the survey are shown below. Is there evidence to conclude that there is a difference in what Democrats and Republicans think is the most important? Columns: Economy, Foreign Affairs, Family Values, Environment, Other. Democrats: 77, 68, 45, 17, 60. Republicans: 65, 72, 60, 10, 45. Perform the hypothesis test at a 5% level of significance and find the p-Value. p-Value = Conclusion:
0.147 There is insufficient evidence to state whether the distribution of the presidential priorities is different for Democrats and Republicans.
Is the racial distribution for students on work study different from the racial distribution for students not on work study? The results of a recent study are shown below. Columns: White, Black, Hispanic, Asian, Other. Work Study: 71, 12, 38, 46, 17. Not Work Study: 132, 47, 99, 84, 40. Perform the hypothesis test at a 5% level of significance and find the p-Value. p-Value = Conclusion:
0.169 There is insufficient evidence to state whether the racial distribution for students on work study differs from the racial distribution for students who are not on work study.
Are hair color and body type independent? The table below shows the results of a researcher's observations of randomly selected people. What can be concluded at the 0.05 level of significance? Columns: Blonde, Brunette, Red Head. Short and Slender: 53, 127, 31. Short and Pudgy: 62, 134, 36. Tall and Slender: 71, 118, 40. Tall and Heavy: 46, 100, 42. The p-value is: p-Value = Conclusion:
0.24 There is insufficient evidence to make a conclusion on whether hair color and body type are independent or dependent.
A researcher wants to know if there is a difference between the mean amount of sleep that people get for various types of employment status. The table below shows data that was collected from a survey. Unemployed Part Time Worker Full Time Worker 9 8 7 7 7 9 8 7 7 9 8 5 Assume that all distributions are normal, the three population standard deviations are all the same, and the data was collected independently and randomly. Use a level of significance of 0.05. H0: μ_Unemployed = μ_PartTime = μ_FullTime Ha: At least two of the means are different from each other. Calculate the p-Value. p-Value = Conclusion:
0.2458 There is insufficient evidence to support the claim that at least two of the mean sleep times differ from each other.
A researcher wants to know if the clothes a woman wears is a factor in her GPA. The table below shows data that was collected from a survey. Dress Jeans Skirt Shorts 3.2 3.9 3.6 4.0 2.7 2.6 2.0 3.4 3.6 3.7 2.5 3.3 3.0 3.5 3.4 3.7 3.1 3.1 Assume that all distributions are normal, the four population standard deviations are all the same, and the data was collected independently and randomly. Use a level of significance of 0.05. H0: μ_Dress = μ_Jeans = μ_Skirt = μ_Shorts Ha: At least two of the means are different from each other. Calculate the p-Value. p-Value = Conclusion:
0.2529 There is insufficient evidence to support the claim that at least two of the mean GPAs differ from each other.
A fitness trainer is interested in investigating whether ethnicity and the first exercise activity that members engage in are dependent. The table below shows the results of the trainer's observation of randomly selected members. What can be concluded at the 0.05 level of significance? Columns: Cardio, Weight Machines, Free Weights. White: 87, 52, 55. Hispanic: 41, 20, 17. Black: 36, 19, 11. Other: 28, 16, 7. The p-value is: p-Value = Conclusion:
0.271 There is insufficient evidence to make a conclusion on whether ethnicity and initial exercise type are independent or dependent.
A fisherman is interested in whether the distribution of fish caught in Green Valley Lake is the same as the distribution of fish caught in Echo Lake. Of the 191 randomly selected fish caught in Green Valley Lake, 105 were rainbow trout, 27 were other trout, 35 were bass, and 24 were catfish. Of the 293 randomly selected fish caught in Echo Lake, 135 were rainbow trout, 48 were other trout, 67 were bass, and 43 were catfish. Perform the hypothesis test at a 5% level of significance and find the p-Value. p-Value = Round your answer to three decimal places. Conclusion:
0.293 There is insufficient evidence to state that the distribution of fish in Green Valley Lake is not the same as the distribution of fish in Echo Lake.
You are interested in investigating whether gender and major are independent at your college. The table below shows the results of a survey. What can be concluded at the 0.05 level of significance? Columns: Math/Science, Arts/Humanities, Business/Econ, Other. Male: 45, 51, 38, 22. Female: 53, 89, 60, 40. The p-value is: p-Value = Conclusion:
0.445 There is insufficient evidence to make a conclusion on whether gender and major are independent or dependent.
Is race a factor in the number of times a month a person eats out? The table below shows data that was collected. White Black Hispanic Asian 6 4 7 8 8 1 3 3 2 5 5 5 4 2 4 1 6 6 7 Assume that all distributions are normal, the four population standard deviations are all the same, and the data was collected independently and randomly. Use a level of significance of 0.05. H0: μ_White = μ_Black = μ_Hispanic = μ_Asian Ha: At least two of the means are different from each other. Calculate the p-Value. p-Value = Conclusion:
0.4711 There is insufficient evidence to support the claim that at least two of the mean times eating out per month differ from each other.
What is the relationship between the attendance at a major league ball game and the runs scored by the home team? Attendance figures (in thousands) and the runs scored by the home team for 12 randomly selected games are shown below. Attendance: 28, 45, 32, 56, 19, 27, 41, 33, 58, 16, 30, 43. Runs: 5, 7, 6, 11, 0, 2, 8, 1, 4, 3, 3, 0. A. r = Round to three decimal places. B. r² = Round to three decimal places. C. Find the equation of the least squares regression line: y = ___ + ___ x Round to three decimal places. D. What does the regression line predict for the number of runs scored by the home team if there are 35,000 in attendance? Round to the nearest whole number. E. Test the hypothesis that there is a linear correlation between attendance and home team's runs scored. Use a level of significance of 0.05. H0: r = 0 Ha: r ≠ 0 p-Value = Round your answer to three decimal places.
0.537 0.2884 -0.7177 0.1369 4 0.0718 There is insufficient evidence to support the claim that there is a correlation between attendance and home team's runs scored.
A psychologist is interested in testing whether there is a difference in the distribution of personality types for business majors and social science majors. The results of the study are shown below. Columns: Open, Conscientious, Extrovert, Agreeable, Neurotic. Business: 41, 52, 46, 61, 58. Social Science: 72, 75, 63, 80, 65. Perform the hypothesis test at a 5% level of significance and find the p-Value. p-Value = Conclusion:
0.557 There is insufficient evidence to state whether the distribution of personality types is different for business and social science majors.
Are the snow conditions a factor in the number of visitors to a ski resort? The table below shows data that was collected. Powder Machine Made Hard Packed 1210 2107 846 1080 1149 1638 1537 862 2019 941 1870 1178 1528 1233 1382 Assume that all distributions are normal, the three population standard deviations are all the same, and the data was collected independently and randomly. Use a level of significance of 0.05. H0: μ_Powder = μ_MachineMade = μ_HardPacked Ha: At least two of the means are different from each other. Calculate the p-Value. p-Value = Conclusion:
0.5682 There is insufficient evidence to support the claim that at least two of the mean attendance numbers differ from each other.
What is the relationship between the number of minutes per day a woman spends talking on the phone and the woman's weight? The time on the phone and weight for 9 women are shown in the table below. Time: 45, 21, 72, 53, 36, 12, 39, 81, 57. Weight: 120, 115, 188, 130, 125, 141, 135, 150, 149. A. r = Round to three decimal places. B. r² = Round to three decimal places. C. Find the equation of the least squares regression line: y = ___ + ___ x Round to three decimal places. D. What does the regression line predict for the weight of a woman who spends 50 minutes per day talking on the phone? ___ pounds. Round to the nearest whole number. E. Test the hypothesis that there is a linear correlation between the amount of time women talk on the phone and women's weight. Use a level of significance of 0.05. H0: r = 0 Ha: r ≠ 0 p-Value = Round your answer to three decimal places.
0.617 0.381 111.216 0.6059 141.516 0.0766 There is insufficient evidence to support the claim that there is a correlation between the amount of time women talk on the phone and women's weight.
Is there a difference in the car company for Midwesterners and people from the west coast? Of the 184 randomly selected Midwesterners surveyed, 87 had an American car, 64 had a car from Asia, and 33 had a car from another country. Of the 240 randomly selected people from the west coast who were surveyed, 107 had an American car, 93 had a car from Asia, and 40 had a car from outside of the US and Asia. Perform the hypothesis test at a 5% level of significance and find the p-Value. p-Value = Conclusion:
0.703 There is insufficient evidence to state that the distribution of countries where cars are from is different for Midwesterners and people from the west coast.
What is the relationship between the amount of time statistics students study per week and their final exam scores? The results of the survey are shown below. Study: 9, 7, 4, 16, 8, 0, 10, 8, 14, 3, 10. Final: 82, 78, 65, 93, 90, 37, 79, 70, 90, 81, 84. A. r = B. r² = C. Find the equation of the least squares regression line: y = ___ + ___ x D. What does the regression line predict for the final exam score for a student who studied 11 hours per week? E. Test the hypothesis that there is a linear correlation between the amount of time statistics students study per week and their final exam scores. Use a level of significance of 0.05. H0: r = 0 Ha: r ≠ 0 p-Value =
0.7885 0.6217 55.5033 2.6793 84.9756 0.0039 There is sufficient evidence to support the claim that there is a correlation between study time and final exam score.
American college students have an average of 4.6 credit cards per student. Is the average less for 20-year-olds who are not in college? The data for the 18 randomly selected 20-year-olds who are not in college is shown below: 8, 4, 3, 0, 6, 2, 4, 1, 5, 5, 4, 2, 3, 4, 2, 7, 4, 0 Assuming that the distribution is normal, what can be concluded at the 0.05 level of significance? H0: μ = 4.6 Ha: μ ___ 4.6 Test statistic: p-Value = Round your answer to three decimal places. Conclusion: There is ___ evidence to make the conclusion that the population mean number of credit cards held by 20-year-olds who are not in college is less than 4.6.
< t 0.030 reject the null hypothesis sufficient
A grocery store manager did a study to look at the relationship between the amount of time (in minutes) customers spend in the store and the amount of money (in dollars) they spend. The results of the survey are shown below. Time: 18, 24, 9, 32, 12, 17, 22, 28, 15, 21, 19. Money: 31, 30, 26, 79, 24, 37, 45, 88, 22, 35, 25. A. r = Round to three decimal places. B. r² = Round to three decimal places. C. Find the equation of the least squares regression line: y = ___ + ___ x Round to three decimal places. D. What does the regression line predict for the amount of money a person will spend who shops for 20 minutes? Round to the nearest cent (2 decimal places). E. Test the hypothesis that there is a linear correlation between the time spent in the market and the money spent at the market. Use a level of significance of 0.05. H0: r = 0 Ha: r ≠ 0 p-Value = Round your answer to three decimal places.
0.8137 0.6621 -13.5466 2.7236 40.9254 0.0023 There is statistically significant evidence to support the claim that there is a correlation between the time spent and the money spent at the market.
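A small sketch of parts A through D for the grocery store card, computed from the usual sums of squares. The function name `least_squares` is illustrative. The p-value in part E is left out here because it needs a t-distribution tail area.

```python
from math import sqrt

def least_squares(xs, ys):
    """Return (r, intercept, slope) for the least squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return r, intercept, slope

time_in_store = [18, 24, 9, 32, 12, 17, 22, 28, 15, 21, 19]   # minutes
money_spent = [31, 30, 26, 79, 24, 37, 45, 88, 22, 35, 25]    # dollars

r, intercept, slope = least_squares(time_in_store, money_spent)
predicted = intercept + slope * 20  # predicted spending for a 20-minute visit

print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
print(f"y = {intercept:.4f} + {slope:.4f} x")
print(f"prediction at x = 20: {predicted:.2f}")
```

The prediction differs from the card's 40.9254 only in the third decimal place, because the card appears to plug in coefficients that were already rounded.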
A researcher wants to know if the news station a person watches is a factor in the amount of time (in minutes) that they watch. The table below shows data that was collected from a survey. CNN Fox Local 45 15 18 62 43 37 28 68 26 38 50 60 23 31 51 55 22 Assume that all distributions are normal, the three population standard deviations are all the same, and the data was collected independently and randomly. Use a level of significance of 0.05. H0: μ_CNN = μ_Fox = μ_Local Ha: At least two of the means are different from each other. Calculate the p-Value. p-Value = Conclusion:
0.9223 There is insufficient evidence to support the claim that at least two of the mean times differ from each other.
Determine the value that the correlation, r, could be for the following scatter plot. ScatterPlot with all points on the line y = 3/5 x + 1
1
The average final exam score for the statistics course is 77% and the standard deviation is 8%. A professor wants to see if the average exam score will be lower for students who are given colored pens on the first day of class. The final exam scores for the 18 randomly selected students who were given the colored pens are shown below. Assume that the distribution of the population is normal. 75, 68, 74, 69, 76, 72, 81, 87, 77, 79, 75, 81, 52, 80, 98, 72, 78, 70 What can be concluded at the 0.05 level of significance? H0: μ = 77 Ha: μ ___ 77 Test statistic: p-Value = Round your answer to three decimal places. Conclusion: There is ___ evidence to make the conclusion that the population mean final exam score for students who are given colored pens at the beginning of class is less than 77%.
< Z 0.258 Fail to reject the null hypothesis insufficient
The average house has 12 paintings on its walls. The standard deviation is 3.7 paintings. Is the mean smaller for houses owned by teachers? The data show the results of a survey of 14 teachers who were asked how many paintings they have in their houses. Assume that the distribution of the population is normal. 11, 15, 7, 6, 9, 12, 16, 13, 8, 14, 3, 10, 8, 9 What can be concluded at the 0.05 level of significance? H0: μ = 12 Ha: μ ___ 12 Test statistic: p-Value = Round your answer to three decimal places. Conclusion: There is ___ evidence to make the conclusion that the population mean number of paintings that are in teachers' houses is less than 12.
< Z 0.026 Reject the null hypothesis sufficient
It takes an average of 10 minutes for blood to begin clotting after an injury. The standard deviation is 3 minutes. An EMT wants to see if the average will decrease if the patient is spoken softly to. The EMT randomly selected 38 injured patients to speak softly to and noticed that they averaged 9.1 minutes for their blood to begin clotting after their injury. What can be concluded at the 0.05 level of significance? H0: μ = 10 Ha: μ ___ 10 Test statistic: p-Value = Round your answer to three decimal places. Conclusion: There is ___ evidence to make the conclusion that the population mean time for blood to begin to clot after an injury is less than 10 minutes for patients who are spoken to softly.
< Z 0.0322 Reject the null hypothesis statistically significant
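A brief sketch of the one-sample z-test for the blood clotting card, where the population standard deviation is given. The function name `one_sample_z_left` is an illustrative choice, and the test is assumed to be left-tailed because the question asks whether the mean decreases.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_left(sample_mean, mu0, sigma, n):
    """Left-tailed one-sample z-test with known population standard deviation."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    p_value = NormalDist().cdf(z)  # area to the left of z under the standard normal curve
    return z, p_value

z_stat, p_value = one_sample_z_left(sample_mean=9.1, mu0=10, sigma=3, n=38)
alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"z = {z_stat:.3f}, p = {p_value:.4f}, {decision}")
```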
Are couples that live together before they get married less likely to end up divorced within five years of marriage compared to couples that live apart before they get married? 38 of the 370 couples from the study who lived together before they got married were divorced within five years of marriage. 56 of the 430 couples from the study who lived apart before they got married were divorced within five years of marriage. What can be concluded at the 0.10 level of significance? H0: p_LiveTogether = p_LiveApart Ha: p_LiveTogether ___ p_LiveApart Test statistic: p-Value = Round your answer to three decimal places. Conclusion: There is ___ evidence to make the conclusion that couples that live together before they get married are less likely to end up divorced within five years of marriage compared to couples that live apart before they get married.
< Z 0.114 Fail to reject the null hypothesis insufficient
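A minimal sketch of the two-proportion z-test for the divorce card. It assumes the pooled proportion is used for the standard error, which is the usual textbook choice under H0, and that the test is left-tailed. The function name `two_proportion_z_left` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_left(x1, n1, x2, n2):
    """Left-tailed z-test of H0: p1 = p2 against Ha: p1 < p2, with a pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, NormalDist().cdf(z)

# Group 1: lived together (38 of 370 divorced). Group 2: lived apart (56 of 430 divorced).
z_stat, p_value = two_proportion_z_left(38, 370, 56, 430)
alpha = 0.10
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"z = {z_stat:.3f}, p = {p_value:.3f}, {decision}")
```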
The average fruit fly will lay 400 eggs into rotting fruit. A biologist wants to see if flies that have a certain gene modified will lay fewer eggs on average. The data below shows the number of eggs that were laid into rotting fruit by several fruit flies that had this gene modified. Assume that the distribution of the population is normal. 323, 408, 391, 452, 387, 365, 378, 411, 426, 367, 333, 298, 362 What can be concluded at the 0.01 level of significance? H0: μ = 400 Ha: μ ___ 400 Test statistic: p-Value = Round your answer to three decimal places. Conclusion: There is ___ evidence to make the conclusion that the population mean number of eggs that fruit flies with this gene modified will lay in rotting fruit is less than 400.
< t 0.038 Fail to reject the null hypothesis insufficient
On average is the younger sibling's IQ lower than the older sibling's IQ? Eleven sibling pairs were given IQ tests. The data is shown below. Younger: 104, 96, 102, 125, 86, 100, 90, 117, 102, 110, 81. Older: 107, 97, 99, 131, 90, 96, 94, 117, 108, 114, 82. Assume the distribution of the differences is normal. What can be concluded at the 0.05 level of significance? (d = Younger Sibling IQ - Older Sibling IQ) H0: μ_d = 0 Ha: μ_d ___ 0 Test statistic: p-Value = Round your answer to three decimal places. Conclusion: There is ___ evidence to make the conclusion that the population mean IQ score for younger siblings is less than the population mean IQ score for older siblings.
< t 0.038 reject the null hypothesis sufficient
Nationally, patients who go to the emergency room wait an average of 6 hours to be admitted into the hospital. Do patients at rural hospitals have a shorter waiting time? The 43 randomly selected patients who went to the emergency room at rural hospitals waited an average of 5.5 hours to be admitted into the hospital. The standard deviation for these 43 patients was 1.8 hours. What can be concluded at the 0.05 level of significance? H0: μ = 6 Ha: μ ___ 6 Test statistic: p-Value = Round your answer to three decimal places. Conclusion: There is ___ evidence to make the conclusion that the population mean waiting time to be admitted into the hospital from the emergency room for patients at rural hospitals is less than 6 hours.
< t 0.038 reject the null hypothesis sufficient
Does 10K running time increase when the runner listens to music? Ten runners were timed as they ran a 10K with and without listening to music. The running times in minutes are shown below. No Music: 52.4, 43.9, 52.6, 44.2, 53.3, 51.4, 44.2, 41.1, 47.9, 46.2. Music: 55.7, 41.3, 52.8, 47.5, 53.9, 54.1, 49.6, 37.8, 51.5, 49.6. Assume the distribution of the differences is normal. What can be concluded at the 0.01 level of significance? (d = Time Without music - Time With music) H0: μ_d = 0 Ha: μ_d ___ 0 Test statistic: p-Value = Round your answer to three decimal places. Conclusion: There is ___ evidence to make the conclusion that the population mean running time for a 10K increases when the runners listen to music.
< t 0.049 Fail to reject the null hypothesis insufficient
It says on the new Meat Man Barbecue's box that it takes 10 minutes for assembly. The manager of the retail store where the barbecues are sold thinks that it takes less time to assemble. The manager surveyed 55 randomly selected people who purchased the Meat Man Barbecue and found that their average time was 9.3 minutes. The standard deviation for this survey group was 4.1 minutes. What can be concluded at the 0.05 level of significance? H0: μ = 10 Ha: μ ___ 10 Test statistic: p-Value = Round your answer to three decimal places. Conclusion: There is ___ evidence to make the conclusion that the population mean amount of time to assemble the Meat Man barbecue is less than 10 minutes.
< t 0.105 Fail to reject the null hypothesis insufficient
Is the average time to complete an obstacle course faster when a patch is placed over the left eye than when a patch is placed over the right eye? Thirteen randomly selected volunteers first completed an obstacle course with a patch over one eye and then completed an equally difficult obstacle course with a patch over the other eye. The completion times are shown below. "Left" means the patch was placed over the left eye and "Right" means the patch was placed over the right eye. Left: 49, 43, 51, 37, 46, 38, 47, 45, 53, 41, 49, 48, 39. Right: 51, 42, 49, 43, 41, 40, 47, 46, 55, 42, 50, 51, 39. Assume the distribution of the differences is normal. What can be concluded at the 0.10 level of significance? (d = score before - score after) H0: μ_d = 0 Ha: μ_d ___ 0 Test statistic: p-Value = Round your answer to three decimal places. Conclusion: There is ___ evidence to make the conclusion that the population mean time to complete the obstacle course with a patch over the left eye is less than the population mean time to complete the obstacle course with a patch over the right eye.
< t 0.155 Fail to reject the null hypothesis insufficient
The average American consumes 8.7 liters of alcohol per year. Does the average college student consume less alcohol per year? A researcher surveyed 48 randomly selected college students and found that they averaged 8.3 liters of alcohol consumed per year with a standard deviation of 2.9 liters. What can be concluded at the 0.10 level of significance? H0: μ = 8.7 Ha: μ ___ 8.7 Test statistic: p-Value = Round your answer to three decimal places. Conclusion: There is ___ evidence to make the conclusion that the population mean amount of alcohol consumed by college students is less than 8.7 liters per year.
< t 0.172 Fail to reject the null hypothesis insufficient
Women are recommended to consume 1800 calories per day. You suspect that women at your college consume fewer calories each day on average. The data for the 13 women who participated in the study is shown below: 1778, 1809, 1653, 1793, 1882, 1700, 1648, 2112, 1539, 1740, 1734, 1831, 1782 Assuming that the distribution is normal, what can be concluded at the 0.05 level of significance? H0: μ = 1800 Ha: μ ___ 1800 Test statistic: p-Value = Round your answer to three decimal places. Conclusion: There is ___ evidence to make the conclusion that the population mean calorie intake for women at your college is less than 1800.
< t 0.217 Fail to reject the null hypothesis insufficient
Currently patrons at the library speak at an average of 63.2 decibels and the standard deviation is 4 decibels. Will this average increase after a "keep your voices down" sign is removed from the front entrance? After the sign was removed, the librarian randomly recorded 41 patrons speaking at the library. Their average decibel level was 64.1. H0: μ = 63.2 Ha: μ ___ 63.2
>
The average amount of time it takes for couples to further communicate with each other after their first date has ended is 1.5 days. The standard deviation is 0.6 days. Is this average longer for blind dates? A researcher interviewed 34 couples who had recently been on blind dates and found that they averaged 1.65 days to communicate with each other after the date was over. H0: μ = 1.5 Ha: μ ___ 1.5
>
The average final exam score for the statistics course is 77% and the standard deviation is 8%. A professor wants to see if there will be a difference in the average final exam score for students who are given colored pens on the first day of class. The final exam scores for the 18 randomly selected students who were given the colored pens are shown below. Assume that the distribution of the population is normal. 75, 88, 84, 68, 96, 72, 81, 97, 77, 79, 85, 81, 52, 80, 98, 83, 78, 90. What can be concluded at the 0.05 level of significance? H0: μ = 77 Ha: μ ___ 77
>
The average salary for American college graduates is $46,000. You suspect that the average is more for graduates from your college. The 47 randomly selected graduates from your college had an average salary of $53,115 and a standard deviation of $24,197. What can be concluded at the 0.1 level of significance? H0: μ = 46000 Ha: μ ___ 46000
>
Are job applicants with easy to pronounce last names more likely to get called for an interview than applicants with difficult to pronounce last names? 500 job applications were sent out with last names that are easy to pronounce and 500 identical job applications were sent out with names that were difficult to pronounce. 138 of the "applicants" with easy to pronounce names were called for an interview while 104 of the "applicants" with difficult to pronounce names were called for an interview. What can be concluded at the 0.01 level of significance? H0: P_EasyToPronounce = P_DifficultToPronounce Ha: P_EasyToPronounce ___ P_DifficultToPronounce Test statistic: ___ p-value = ___ (round your answer to three decimal places). Conclusion: ___. There is ___ evidence to conclude that people with easy to pronounce last names are more likely to get called for an interview compared to people with difficult to pronounce last names.
> Z 0.006 reject the null hypothesis sufficient
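The two-proportion cards in this set all follow the same recipe: pool the two samples to estimate the common proportion under H0, compute a z statistic, and read the p-value from the standard normal distribution. A minimal sketch for the card above, using only the Python standard library (the function name `two_proportion_z` is an illustrative choice):

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic for H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)            # common proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

alpha = 0.01
z = two_proportion_z(138, 500, 104, 500)      # easy vs. difficult names
p_value = 1 - NormalDist().cdf(z)             # right-tailed: Ha is p1 > p2
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(round(z, 3), round(p_value, 3), decision)
```

For a left-tailed alternative the p-value would be `NormalDist().cdf(z)` instead, and for a two-tailed alternative it would be twice the smaller tail.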
Are blonde female college students more likely to have boyfriends than brunette female college students? 219 of the 350 blondes surveyed had boyfriends and 220 of the 400 brunettes surveyed had boyfriends. What can be concluded at the 0.05 level of significance? H0: P_Blonde = P_Brunette Ha: P_Blonde ___ P_Brunette Test statistic: ___ p-value = ___ (round your answer to three decimal places). Conclusion: ___. There is ___ evidence to conclude that blonde female college students are more likely to have boyfriends than brunette female college students.
> Z 0.018 Reject the null hypothesis sufficient
Are freshmen psychology majors more likely to change their major before they graduate than freshmen business majors? 484 of the 975 freshmen psychology majors from a recent study changed their major before they graduated and 314 of the 697 freshmen business majors changed their major before they graduated. What can be concluded at the 0.01 level of significance? H0: P_Psychology = P_Business Ha: P_Psychology ___ P_Business Test statistic: ___ p-value = ___ (round your answer to three decimal places). Conclusion: ___. There is ___ evidence to conclude that freshmen psychology majors are more likely than freshmen business majors to change their majors.
> Z 0.032 Fail to reject the null hypothesis insufficient
Are couples that live together before they get married more likely to end up divorced within five years of marriage compared to couples that live apart before they get married? 54 of the 380 couples from the study who lived together before they got married were divorced within five years of marriage. 51 of the 490 couples from the study who lived apart before they got married were divorced within five years of marriage. What can be concluded at the 0.05 level of significance? H0: P_LiveTogether = P_LiveApart Ha: P_LiveTogether ___ P_LiveApart Test statistic: ___ p-value = ___ (round your answer to three decimal places). Conclusion: ___. There is ___ evidence to conclude that couples that live together before they get married are more likely to end up divorced within five years of marriage compared to couples that live apart before they get married.
> Z 0.044 reject the null hypothesis sufficient
Is the proportion of wildfires caused by humans in the south greater than the proportion of wildfires caused by humans in the west? 403 of the 481 randomly selected wildfires looked at in the south were caused by humans while 514 of the 640 randomly selected wildfires looked at in the west were caused by humans. What can be concluded at the 0.10 level of significance? H0: P_south = P_west Ha: P_south ___ P_west Test statistic: ___ p-value = ___ (round your answer to three decimal places). Conclusion: ___. There is ___ evidence to conclude that the proportion of wildfires caused by humans in the south is greater than the proportion of wildfires caused by humans in the west.
> Z 0.068 Reject the null hypothesis sufficient
The average final exam score for the statistics course is 77% and the standard deviation is 8%. A professor wants to see if the average exam score will be higher for students who are given colored pens on the first day of class. The final exam scores for the 17 randomly selected students who were given the colored pens are shown below. Assume that the distribution of the population is normal. 75, 78, 74, 89, 76, 92, 81, 87, 77, 79, 75, 81, 52, 80, 98, 72, 78. What can be concluded at the 0.05 level of significance? H0: μ = 77 Ha: μ ___ 77 Test statistic: ___ p-value = ___ (round your answer to three decimal places). Conclusion: ___. There is ___ evidence to conclude that the population mean final exam score for students who are given colored pens at the beginning of class is greater than 77%.
> Z 0.144 Fail to reject the null hypothesis insufficient
The average number of cavities that 30-year-old Americans have had in their lifetimes is 7.0. The standard deviation is 2.7 cavities. Do 20-year-olds have more cavities? The data show the results of a survey of 16 twenty-year-olds who were asked how many cavities they have had. Assume that the distribution of the population is normal. 6, 7, 7, 8, 7, 8, 9, 6, 5, 6, 7, 8, 7, 6, 9, 8. What can be concluded at the 0.05 level of significance? H0: μ = 7 Ha: μ ___ 7 Test statistic: ___ p-value = ___ (round your answer to three decimal places). Conclusion: ___. There is ___ evidence to conclude that the population mean number of cavities for 20-year-olds is more than 7.0.
> Z 0.427 Fail to reject the null hypothesis insufficient
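The card above uses a Z statistic rather than t because the population standard deviation (2.7) is given and the population is assumed normal. A minimal sketch of that calculation with the Python standard library, under the assumption that the sample mean is compared against the known sigma:

```python
import math
from statistics import NormalDist, mean

cavities = [6, 7, 7, 8, 7, 8, 9, 6, 5, 6, 7, 8, 7, 6, 9, 8]
mu0, sigma, alpha = 7.0, 2.7, 0.05

# z statistic with the known population standard deviation
z = (mean(cavities) - mu0) / (sigma / math.sqrt(len(cavities)))
p_value = 1 - NormalDist().cdf(z)             # right-tailed: Ha is mu > 7
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(round(z, 3), round(p_value, 3), decision)
```

The sample mean is 7.125, only slightly above 7, which is why the p-value is large and the null hypothesis is not rejected.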
The average American consumes 8.7 liters of alcohol per year. Does the average college student consume more alcohol per year? A researcher surveyed 51 randomly selected college students and found that they averaged 9.8 liters of alcohol consumed per year with a standard deviation of 3.9 liters. What can be concluded at the 0.05 level of significance? H0: μ = 8.7 Ha: μ ___ 8.7 Test statistic: ___ p-value = ___ (round your answer to three decimal places). Conclusion: ___. There is ___ evidence to conclude that the population mean amount of alcohol consumed by college students is greater than 8.7 liters per year.
> t 0.025 reject the null hypothesis sufficient
Does 10K running time decrease when the runner listens to music? Eleven runners were timed as they ran a 10K with and without listening to music. The running times in minutes are shown below. No Music: 58.4 43.9 52.6 49.2 53.3 57.4 44.2 41.1 47.9 50.2 44.8 Music: 55.7 41.3 52.8 47.5 53.9 54.1 39.6 37.8 51.5 49.6 42.3 Assume the distribution of the differences is normal. What can be concluded at the 0.05 level of significance? (d = time without music - time with music) H0: μd = 0 Ha: μd ___ 0 Test statistic: ___ p-value = ___ (round your answer to three decimal places). Conclusion: ___. There is ___ evidence to conclude that the population mean running time for a 10K decreases when the runners listen to music.
> t 0.026 reject the null hypothesis sufficient
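A paired test like the one on the card above reduces to a one-sample t-test on the differences d. The sketch below is one way to check it with the Python standard library; the helper `t_cdf` (Simpson's rule on the t density) and its step count are illustrative assumptions, not something the card specifies.

```python
import math
from statistics import mean, stdev

def t_cdf(t, df, steps=2000):
    """Approximate P(T <= t) for Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    # Simpson's rule on [0, |t|]; steps must be even.
    a = abs(t)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

no_music = [58.4, 43.9, 52.6, 49.2, 53.3, 57.4, 44.2, 41.1, 47.9, 50.2, 44.8]
music = [55.7, 41.3, 52.8, 47.5, 53.9, 54.1, 39.6, 37.8, 51.5, 49.6, 42.3]

# d = time without music - time with music, one value per runner
d = [a - b for a, b in zip(no_music, music)]
n, alpha = len(d), 0.05
t_stat = mean(d) / (stdev(d) / math.sqrt(n))  # H0: mean difference is 0
p_value = 1 - t_cdf(t_stat, n - 1)            # right-tailed: Ha is mu_d > 0
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(round(t_stat, 3), round(p_value, 3), decision)
```

With d defined as "without minus with", a decrease in running time with music corresponds to a positive mean difference, which is why the alternative is right-tailed.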
Women are recommended to consume 1800 calories per day. You suspect that women at your college consume more calories each day on average. The data for the 13 women who participated in the study is shown below: 1878, 1809, 1753, 1793, 1882, 1900, 1948, 2112, 1639, 1840, 2034, 1831, 1882. Assuming that the distribution is normal, what can be concluded at the 0.05 level of significance? H0: μ = 1800 Ha: μ ___ 1800 Test statistic: ___ p-value = ___ (round your answer to three decimal places). Conclusion: ___. There is ___ evidence to conclude that the population mean calorie intake for women at your college is more than 1800.
> t 0.029 Reject the null hypothesis sufficient
The commercial for the new Meat Man Barbecue claims that it takes 10 minutes for assembly. A consumer advocate thinks that the assembly time is longer than 10 minutes. The advocate surveyed 46 randomly selected people who purchased the Meat Man Barbecue and found that their average time was 10.9 minutes. The standard deviation for this survey group was 4.5 minutes. What can be concluded at the 0.10 level of significance? H0: μ = 10 Ha: μ ___ 10 Test statistic: ___ p-value = ___ (round your answer to three decimal places). Conclusion: ___. There is ___ evidence to conclude that the population mean amount of time to assemble the Meat Man Barbecue is more than 10 minutes.
> t 0.091 Reject the null hypothesis sufficient
Is memory ability before a meal different compared to after a meal? Twelve people were given memory tests before their meal and then again after their meal. The data is shown below. Before: 74 68 82 97 76 81 80 75 88 84 79 91 After: 76 68 85 94 79 88 83 72 90 87 79 90 Assume the distribution of the differences is normal. What can be concluded at the 0.05 level of significance? (d = score before - score after) H0: μd = 0 Ha: μd ___ 0 Test statistic: ___ p-value = ___ (round your answer to three decimal places). Conclusion: ___. There is ___ evidence to conclude that the population mean memory ability before a meal differs from the population mean memory ability after a meal.
not equal to t 0.136 Fail to reject the null hypothesis insufficient
Is memory ability before a meal better than after a meal? Twelve people were given memory tests before their meal and then again after their meal. The data is shown below. A higher score indicates a better memory ability. Before: 74 68 82 97 76 81 80 75 88 84 79 91 After: 76 68 85 94 79 88 83 72 80 87 79 80 Assume the distribution of the differences is normal. What can be concluded at the 0.05 level of significance? (d = score before - score after) H0: μd = 0 Ha: μd ___ 0 Test statistic: ___ p-value = ___ (round your answer to three decimal places). Conclusion: ___. There is ___ evidence to conclude that the population mean memory ability before a meal is better than the population mean memory ability after a meal.
> t 0.413 Fail to reject the null hypothesis insufficient
A researcher found the correlation between age of death and number of cigarettes smoked per day to be -0.95. Based just on this information, the researcher can justly conclude that smoking causes early death.
False
A researcher is studying hours of sleep based on job type and also based on national origin to see if hours of sleep is independent of each of these. The researcher should use a 1-way ANOVA for this study.
False
A study was done asking people how much money they spend per month on their natural gas bill and how much money per month they spend on their electric bill. The correlation R was found to be 0.94 and the P-value for correlation was 0.0003. Then a person with a high natural gas bill will also have a high electric bill.
False
Data was collected on the distance from runners' bellybuttons to the ground and the time it takes them to run 100 meters. R-squared = 0.87. Then 87% of runners with bellybuttons far from the ground will run faster than average.
False
Data was collected on the size of towns and the high school dropout rate. A left-tailed hypothesis test was performed for rho. If the P-value was 0.02 and alpha = 0.1, then it can be concluded that larger towns tend to have higher dropout rates.
False
If the equation of the regression line that relates income in dollars of students' parents, x, with cost in dollars of tuition, y, is y = 8000 + 0.02x, then the slope tells us that for every dollar increase in tuition, the parents' income tends to be 2 cents higher.
False
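The statement on the card above is false because it reverses the roles of x and y: the slope describes the change in predicted tuition (y) per one-dollar increase in parents' income (x), not the other way around. A small sketch makes this concrete; the income value of 50,000 is a hypothetical input chosen only for illustration.

```python
import math

def predicted_tuition(income):
    """Regression line from the card: y = 8000 + 0.02x."""
    return 8000 + 0.02 * income

# Hypothetical income; any value gives the same per-dollar change.
income = 50_000
change = predicted_tuition(income + 1) - predicted_tuition(income)

# One extra dollar of income (x) raises predicted tuition (y) by about 2 cents.
print(math.isclose(change, 0.02, abs_tol=1e-6))
```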
The average number of cavities that 30-year-old Americans have had in their lifetimes is 7.0. The standard deviation is 2.7 cavities. Is the mean different for 20-year-olds? The data show the results of a survey of 15 twenty-year-olds who were asked how many cavities they have had. Assume that the distribution of the population is normal. 6, 7, 5, 3, 7, 8, 4, 6, 5, 6, 4, 6, 7, 6, 9. What can be concluded at the 0.05 level of significance? H0: μ = 7 Ha: μ ___ 7 Test statistic: ___ p-value = ___ (round your answer to three decimal places). Conclusion: ___. There is ___ evidence to conclude that the population mean number of cavities for 20-year-olds differs from 7.0.
Not equal to Z 0.126 Fail to reject the null hypothesis insufficient
A researcher wants to determine if there is a difference among the mean running speeds of four breeds of dogs. If the distributions all follow a normal distribution, and the standard deviations are all the same, then the 1-Way ANOVA is the appropriate test to use.
True
Data was collected on the number of miles people drive per year and the amount they spend eating out per year. A right tailed hypothesis test was performed for rho. If the P-value was 0.02 and alpha = 0.1, then it can be concluded that people who drive a lot per year tend to spend a lot of money eating out per year.
True
If a 1-way ANOVA produces a p-value of 0.007 with a level of significance of 0.05, then there is evidence to conclude that the distributions are not all the same.
True
If the equation of the least squares regression line was computed to be y = 45.7+3.1x, then the correlation cannot be less than 0.
True
If the equation of the regression line for the day's high temperature x and the mean number of robberies committed that day y is y = 0.3x + 28, then on average, days with higher temperatures also tend to be days with a greater number of robberies.
True
If the equation of the regression line that relates percent blood alcohol, x, to reaction time in milliseconds, y, is y = 36 - 1.3x, then the slope tells us that for every percent increase in blood alcohol, we can predict reaction time to go down by 1.3 milliseconds.
True
If the populations all have a uniform distribution, then 1-way and 2-way ANOVA cannot be used.
True
It is possible for data to have a strong correlation while having only weak evidence to suggest that the population correlation differs from 0.
True
Is the proportion of wildfires caused by humans in the south less than the proportion of wildfires caused by humans in the west? 326 of the 450 randomly selected wildfires looked at in the south were caused by humans while 412 of the 520 randomly selected wildfires looked at in the west were caused by humans. What can be concluded at the 0.05 level of significance? H0: P_south = P_west Ha: P_south ___ P_west Test statistic: ___ p-value = ___ (round your answer to three decimal places). Conclusion: ___. There is ___ evidence to conclude that the proportion of wildfires caused by humans in the south is less than the proportion of wildfires caused by humans in the west.
< Z 0.007 reject the null hypothesis sufficient
Is there a difference between the proportion of blonde and brunette female college students who have boyfriends? 219 of the 350 blondes surveyed had boyfriends and 220 of the 400 brunettes surveyed had boyfriends. What can be concluded at the 0.05 level of significance? H0: P_Blonde = P_Brunette Ha: P_Blonde ___ P_Brunette Test statistic: ___ p-value = ___ (round your answer to three decimal places). Conclusion: ___. There is ___ evidence to conclude that there is a difference between the proportion of blonde and brunette female college students who have boyfriends.
not equal to Z 0.036 Reject the null hypothesis sufficient
Is there a difference in the proportion of wildfires caused by humans in the south and in the west? 495 of the 650 randomly selected wildfires looked at in the south were caused by humans while 331 of the 414 randomly selected wildfires looked at in the west were caused by humans. What can be concluded at the 0.05 level of significance? H0: P_south = P_west Ha: P_south ___ P_west Test statistic: ___ p-value = ___ (round your answer to three decimal places). Conclusion: ___. There is ___ evidence to conclude that there is a difference between the proportion of wildfires caused by humans in the south and the proportion of wildfires caused by humans in the west.
not equal to Z 0.147 Fail to reject the null hypothesis insufficient
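When the alternative is "not equal to", as on the card above, the p-value counts both tails: it is twice the area beyond |z|. A minimal sketch with the Python standard library, assuming the usual pooled standard error for the two-proportion test:

```python
import math
from statistics import NormalDist

x_south, n_south = 495, 650
x_west, n_west = 331, 414
alpha = 0.05

p_south, p_west = x_south / n_south, x_west / n_west
pooled = (x_south + x_west) / (n_south + n_west)   # common proportion under H0
se = math.sqrt(pooled * (1 - pooled) * (1 / n_south + 1 / n_west))
z = (p_south - p_west) / se

# Two-tailed p-value: double the area beyond |z|.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(round(z, 3), round(p_value, 3), decision)
```

The same data tested with a one-tailed alternative would give half this p-value, which is the relationship between the one-tailed and two-tailed versions of the blonde/brunette card (0.018 and 0.036).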
The average house has 12 paintings on its walls. The standard deviation is 4.7 paintings. Is the mean different for houses owned by teachers? The data show the results of a survey of 14 teachers who were asked how many paintings they have in their houses. Assume that the distribution of the population is normal. 11, 15, 7, 14, 9, 12, 16, 13, 8, 14, 3, 10, 8, 9. What can be concluded at the 0.05 level of significance? H0: μ = 12 Ha: μ ___ 12 Test statistic: ___ p-value = ___ (round your answer to three decimal places). Conclusion: ___. There is ___ evidence to conclude that the population mean number of paintings in teachers' houses is different from 12.
not equal to Z 0.280 Fail to reject the null hypothesis insufficient
American college students have an average of 4.6 credit cards per student. Is the average different for 20-year-olds who are not in college? The data for the 18 randomly selected 20-year-olds who are not in college is shown below: 3, 4, 3, 0, 6, 2, 4, 1, 5, 5, 0, 2, 3, 4, 2, 7, 4, 0. Assuming that the distribution is normal, what can be concluded at the 0.01 level of significance? H0: μ = 4.6 Ha: μ ___ 4.6 Test statistic: ___ p-value = ___ (round your answer to three decimal places). Conclusion: ___. There is ___ evidence to conclude that the population mean number of credit cards held by 20-year-olds who are not in college is not equal to 4.6.
not equal to t 0.005 reject the null hypothesis sufficient
The commercial for the new Meat Man Barbecue claims that it takes 10 minutes for assembly. A consumer advocate thinks that the claim is false. The advocate surveyed 50 randomly selected people who purchased the Meat Man Barbecue and found that their average time was 11.2 minutes. The standard deviation for this survey group was 3.1 minutes. What can be concluded at the 0.05 level of significance? H0: μ = 10 Ha: μ ___ 10 Test statistic: ___ p-value = ___ (round your answer to three decimal places). Conclusion: ___. There is ___ evidence to conclude that the population mean amount of time to assemble the Meat Man Barbecue is not equal to 10 minutes.
not equal to t 0.009 reject the null hypothesis sufficient
Nationally, patients who go to the emergency room wait an average of 6 hours to be admitted into the hospital. Is this average different for rural hospitals? The 37 randomly selected patients who went to the emergency room at rural hospitals waited an average of 5.4 hours to be admitted into the hospital. The standard deviation for these 37 patients was 1.6 hours. What can be concluded at the 0.01 level of significance? H0: μ = 6 Ha: μ ___ 6 Test statistic: ___ p-value = ___ (round your answer to three decimal places). Conclusion: ___. There is ___ evidence to conclude that the population mean waiting time to be admitted into the hospital from the emergency room for patients at rural hospitals is not equal to 6 hours.
not equal to t 0.029 Fail to reject the null hypothesis insufficient
Before the furniture store began its ad campaign, it averaged 166 customers per day. The manager is hoping that the average has changed since the ad came out. The data for the 10 randomly selected days since the ad campaign began is shown below: 179, 182, 150, 203, 145, 199, 182, 234, 200, 177. Assuming that the distribution is normal, what can be concluded at the 0.05 level of significance? H0: μ = 166 Ha: μ ___ 166 Test statistic: ___ p-value = ___ (round your answer to three decimal places). Conclusion: ___. There is ___ evidence to conclude that the population mean number of customers since the ad campaign began is not equal to 166.
not equal to t 0.045 reject the null hypothesis sufficient
Women are recommended to consume 1800 calories per day. You suspect that the average calorie intake is different for women at your college. The data for the 13 women who participated in the study is shown below: 1778, 1809, 1653, 1793, 1882, 1700, 1648, 2112, 1539, 1740, 1734, 1831, 1782. Assuming that the distribution is normal, what can be concluded at the 0.05 level of significance? H0: μ = 1800 Ha: μ ___ 1800 Test statistic: ___ p-value = ___ (round your answer to three decimal places). Conclusion: ___. There is ___ evidence to conclude that the population mean calorie intake for women at your college is not equal to 1800.
not equal to t 0.434 Fail to reject the null hypothesis insufficient