Stats Final Review
A(n) ___ is obtained by dividing the population into homogeneous groups and randomly selecting individuals from each group.
stratified sample
sample variance and standard deviation
s² and s. StatCrunch: 1. Stat 2. Summary Stats -> Columns 3. select Variance and Std. dev.
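To cross-check the StatCrunch output by hand, note that the sample variance s² divides the sum of squared deviations by n − 1, and s is its square root. This is a minimal Python sketch; the data list is a hypothetical example, not data from this review.

```python
import statistics

# Hypothetical sample data, for illustration only
data = [4, 8, 6, 5, 3, 7]

n = len(data)
mean = sum(data) / n
# Sample variance: sum of squared deviations divided by n - 1
s2 = sum((x - mean) ** 2 for x in data) / (n - 1)
# Sample standard deviation: square root of the sample variance
s = s2 ** 0.5

print(f"s^2 = {s2:.4f}, s = {s:.4f}")
```

The standard-library functions `statistics.variance` and `statistics.stdev` use the same n − 1 definition, so they should agree with the hand computation.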
If r = _______, then a perfect negative linear relation exists between the two quantitative variables.
-1
Construct a relative frequency distribution of the data
1. Add all the frequencies to get the total. 2. Divide each frequency by the total.
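The two steps above can be sketched in Python. The frequency table is hypothetical and only illustrates the arithmetic.

```python
# Hypothetical frequency table (category -> count), for illustration only
freq = {"A": 12, "B": 20, "C": 8}

# Step 1: add all the frequencies
total = sum(freq.values())
# Step 2: divide each frequency by the total
rel_freq = {cat: count / total for cat, count in freq.items()}

for cat, rf in rel_freq.items():
    print(cat, round(rf, 3))
```

The relative frequencies should always add up to 1, which is a quick check on the work.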
A polling organization conducts a study to estimate the percentage of households that have more than one computer. It mails a questionnaire to 1391 randomly selected households across the country and asks the head of each household if he or she has more than one computer. Of the 1391 households selected, 34 responded.
A) Nonresponse bias B) The polling organization should try contacting households that do not respond by phone or face-to-face.
Suppose you are interested in comparing brand A interior latex paint to brand B interior latex paint. Design an experiment to determine which paint is better for painting kitchens. Choose the best design for this experiment.
Matched-pairs design because experimental units are paired and there are only two levels of treatment
A binomial probability experiment is conducted with the given parameters. Compute the probability of x successes in the n independent trials of the experiment. n = 7, p = 0.45, x = 4
P(4) = 0.2388. For exactly x successes, use binompdf(n, p, x) [2nd VARS].
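As a check on binompdf, the binomial formula P(x) = C(n, x) p^x (1 − p)^(n − x) can be evaluated directly. This is a minimal Python sketch; `binom_pmf` is a hypothetical helper name, not a calculator or library function.

```python
from math import comb

def binom_pmf(n: int, p: float, x: int) -> float:
    """P(X = x) for a binomial random variable: C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# n = 7 trials, p = 0.45, exactly x = 4 successes
print(round(binom_pmf(7, 0.45, 4), 4))  # 0.2388
```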
Let the sample space be S={1, 2, 3, 4, 5, 6, 7, 8, 9, 10}. Suppose the outcomes are equally likely. Compute the probability of the event E={2, 4, 7, 8}.
P(E) = 4/10 = 0.4
If P(E) = 0.55, P(E or F) = 0.70, and P(E and F) = 0.20, find P(F).
P(F) = 0.35. P(E or F) = P(E) + P(F) − P(E and F) -> 0.70 = 0.55 + P(F) − 0.20 -> P(F) = 0.70 − 0.55 + 0.20 = 0.35
Assume the random variable X is normally distributed with mean μ = 50 and standard deviation σ = 7. Find the 77th percentile.
The 77th percentile is 55.18. 1. Find the area closest to 0.77 in the z-table (z = 0.74). 2. Plug the numbers into x = μ + (z-score)(σ) -> 50 + 0.74(7) = 55.18.
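The same percentile can be found without a z-table by using the inverse normal CDF (the Python counterpart of invNorm). This is a minimal sketch using `statistics.NormalDist` from the standard library (Python 3.8+).

```python
from statistics import NormalDist

mu, sigma = 50, 7
# z-score with 77% of the area to its left (the z-table gives about 0.74)
z = NormalDist().inv_cdf(0.77)
# Convert back to the X scale: x = mu + z * sigma
x = mu + z * sigma
print(round(z, 2), round(x, 2))
```

Because the exact z (about 0.739) is not rounded to 0.74 first, this gives about 55.17 rather than the table-based 55.18.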
In a poll, 37% of the people polled answered yes to the question "Are you in favor of the death penalty for a person convicted of murder?" The margin of error in the poll was 5%, and the estimate was made with 95% confidence. At least how many people were surveyed?
The minimum number of surveyed people was 359. n = p̂(1 − p̂)(z/E)² -> 0.37(1 − 0.37)(1.96/0.05)² = 358.19, then round up to 359.
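The sample-size formula can be sketched in Python. The key step is that n is always rounded up, never to the nearest whole number. z is computed exactly here instead of using the table value 1.96.

```python
from math import ceil
from statistics import NormalDist

p_hat = 0.37   # estimate of the proportion from the poll
E = 0.05       # margin of error (5%)
conf = 0.95    # confidence level

# z_{alpha/2} for 95% confidence (about 1.96)
z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
n = p_hat * (1 - p_hat) * (z / E) ** 2
# Always round the required sample size UP
print(round(z, 2), round(n, 2), ceil(n))
```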
For the fiscal year 2007, a tax authority audited 1.49% of individual tax returns with income of $100,000 or more. Suppose this percentage stays the same for the current tax year. What is the probability that two randomly selected returns with income of $100,000 or more will be audited?
The probability is 0.000222. For independent events, P(E and F) = P(E) × P(F) -> 0.0149 × 0.0149 = 2.2201E−4 ≈ 0.000222
Find the probability P(E^c) if P(E) = 0.25.
P(E^c) = 1 − P(E) = 1 − 0.25 = 0.75
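The three probability rules used in the cards above (addition rule, multiplication rule for independent events, and complement rule) reduce to one line of arithmetic each. This is a minimal Python sketch using the numbers from those cards.

```python
# Addition rule: P(E or F) = P(E) + P(F) - P(E and F), solved for P(F)
p_e, p_e_or_f, p_e_and_f = 0.55, 0.70, 0.20
p_f = p_e_or_f - p_e + p_e_and_f

# Multiplication rule for independent events: P(E and F) = P(E) * P(F)
p_audit = 0.0149
p_both_audited = p_audit * p_audit

# Complement rule: P(E^c) = 1 - P(E)
p_e_complement = 1 - 0.25

print(round(p_f, 2), round(p_both_audited, 6), p_e_complement)
```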
According to a certain country's department of education, 44.7% of 3-year-olds are enrolled in day care. What is the probability that a randomly selected 3-year-old is enrolled in day care?
The probability that a randomly selected 3-year-old is enrolled in day care is 0.447 (convert the percent to a decimal: 44.7% = 0.447).
Find the value of z_α for α = 0.09.
z_α = 1.34. z_α is the z-value with area α to its right, so find the area closest to 1 − 0.09 = 0.91 in the z-table (equivalently, find 0.09 in the table and drop the negative sign).
A _______ is a numerical summary of a sample.
statistic
Test the hypothesis using the P-value approach. Be sure to verify the requirements of the test. H0: p=0.3 versus H1: p>0.3 n=100; x=35; α=0.05
z0 = 1.09, P-value = 0.138. Requirement check: np0(1 − p0) = 100(0.3)(0.7) = 21 ≥ 10. Do not reject the null hypothesis, because the P-value is greater than α. Calculator: STAT -> TESTS -> 5:1-PropZTest, plug in the values to get the answer.
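The calculator's 1-PropZTest for this right-tailed test can be reproduced step by step. This is a minimal Python sketch (standard library only). The requirement check, test statistic, and P-value follow the usual one-proportion z-test.

```python
from math import sqrt
from statistics import NormalDist

p0, n, x, alpha = 0.3, 100, 35, 0.05

# Requirement: n * p0 * (1 - p0) >= 10 (here it is 21)
assert n * p0 * (1 - p0) >= 10

p_hat = x / n
z0 = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
# Right-tailed test (H1: p > p0): P-value is the area to the right of z0
p_value = 1 - NormalDist().cdf(z0)

print(round(z0, 2), round(p_value, 3))
print("reject H0" if p_value < alpha else "do not reject H0")
```

A P-value is an area, so it is always between 0 and 1. A negative P-value signals an arithmetic or transcription error.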
Compute the critical value Zα/2 that corresponds to a 91% level of confidence.
z_α/2 = 1.70. α = 1 − 0.91 = 0.09, so α/2 = 0.045 -> find the area closest to 0.045 in the z-table (z = −1.70) and drop the negative sign.
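Both critical-value cards (z_α for α = 0.09 and z_α/2 for 91% confidence) use the same idea: find the z with a given area to its right. This is a minimal Python sketch; `z_alpha` is a hypothetical helper name.

```python
from statistics import NormalDist

def z_alpha(alpha: float) -> float:
    """z-value with area alpha to its RIGHT under the standard normal curve."""
    return NormalDist().inv_cdf(1 - alpha)

# z_alpha for alpha = 0.09
print(round(z_alpha(0.09), 2))            # 1.34
# z_{alpha/2} for a 91% confidence level: alpha = 0.09, so alpha/2 = 0.045
print(round(z_alpha((1 - 0.91) / 2), 2))  # 1.7
```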
Determine the area under the standard normal curve that lies between (a) z = -0.45 and z = 0.45 (b) z = -1.61 and z = 0 (c) z = -1.67 and z = 2.16
a) 0.3473 b) 0.4463 c) 0.9372 Use calc: 2nd VARS -> normalcdf -> plug in the lower and upper z-values, keeping μ = 0 and σ = 1.
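The normalcdf steps amount to subtracting two cumulative areas. This is a minimal Python sketch; `area_between` is a hypothetical helper name.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal curve: mean 0, standard deviation 1

def area_between(lo: float, hi: float) -> float:
    """Area under the standard normal curve between z = lo and z = hi."""
    return Z.cdf(hi) - Z.cdf(lo)

for lo, hi in [(-0.45, 0.45), (-1.61, 0), (-1.67, 2.16)]:
    print(lo, hi, round(area_between(lo, hi), 4))
```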
Side-by-side boxplots of variables x and y are shown. (a) What is the median of variable x? (b) What is the third quartile of variable y? (c) Which variable has more dispersion? Why? (d) Describe the shape of variable x. Support your position. Choose the correct answer below. (e) Describe the shape of variable y. Support your position. Choose the correct answer below.
a) 50 [middle line of box x] b) 53 [third line of box y] c) Variable y, because the interquartile range of variable y is larger than that of variable x [IQR = Q3 − Q1]. d) Symmetric: the median is in the center of the box, and the left and right whiskers are about the same length. e) Skewed right: the median is left of center in the box, and the left whisker is shorter than the right whisker.
Assume the random variable X is normally distributed with mean μ = 50 and standard deviation σ = 7. Compute the probability. Be sure to draw a normal curve with the area corresponding to the probability shaded. (a) Which of the following normal curves corresponds to P(X > 35)? (b) P(X > 35) = ?
a) The curve with the area to the right of 35 shaded. b) P(X > 35) = P(Z > −2.14) = 0.9838 ^ First compute z = (x − μ)/σ = (35 − 50)/7 = −2.14, then use the z-table: P(Z > −2.14) = 1 − 0.0162 = 0.9838.
A college entrance exam company determined that a score of 25 on the mathematics portion of the exam suggests that a student is ready for college-level mathematics. To achieve this goal, the company recommends that students take a core curriculum of math courses in high school. Suppose a random sample of 250 students who completed this core set of courses results in a mean math score of 25.2 on the college entrance exam with a standard deviation of 3.4. Do these results suggest that students who complete the core curriculum are ready for college-level mathematics? That is, are they scoring above 25 on the math portion of the exam? Complete parts (a) through (d) below. (a) State the appropriate null and alternative hypotheses. Fill in the correct answers below. (b) Verify that the requirements to perform the test using the t-distribution are satisfied. Check all that apply. (c) Identify the test statistic and P-value. (d) Write a conclusion based on the results. Choose the correct answer below.
a) H0: μ = 25 versus H1: μ > 25 b) The students were randomly sampled, the sample size is larger than 30, and the students' test scores were independent of one another. c) t0 = (25.2 − 25)/(3.4/√250) ≈ 0.93, P-value ≈ 0.18 [calc: STAT -> TESTS -> 2:T-Test] d) Do not reject the null hypothesis; there is not sufficient evidence to conclude that the population mean is greater than 25.
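The test statistic for the college entrance exam card can be checked from the summary statistics in the question (μ0 = 25, x̄ = 25.2, s = 3.4, n = 250). Python's standard library has no t-distribution, so this sketch approximates the P-value with the standard normal curve. That approximation is an assumption of the sketch and is reasonable here because df = 249 is large. The calculator's T-Test gives the exact t-based value.

```python
from math import sqrt
from statistics import NormalDist

mu0, x_bar, s, n = 25, 25.2, 3.4, 250

# Test statistic for H0: mu = 25 versus H1: mu > 25
t0 = (x_bar - mu0) / (s / sqrt(n))

# ASSUMPTION: normal approximation to the t-distribution (df = n - 1 = 249);
# use the calculator's T-Test or StatCrunch for the exact P-value.
p_approx = 1 - NormalDist().cdf(t0)

print(round(t0, 2), round(p_approx, 3))
```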
A simple random sample of size n = 81 is obtained from a population with μ = 75 and σ = 9. (a) Describe the sampling distribution of x̄. (b) What is P(x̄ > 76.7)? (c) What is P(x̄ ≤ 73)? (d) What is P(74.45 < x̄ < 76.8)?
a) The distribution is approximately normal, with μ_x̄ = 75 and σ_x̄ = 1 [μ_x̄ = μ and σ_x̄ = σ/√n = 9/√81 = 1] b) 0.0446 c) 0.0228 d) 0.6729 ^ For (b), (c), and (d), first compute z = (x̄ − μ_x̄)/σ_x̄. For (b), find the z-score (1.70) in the z-table, then subtract the table value from 1. For (c), do the same (z = −2.00), but do not subtract from 1; the answer is the table value. For (d), compute both z-scores (−0.55 and 1.80) and plug them into normalcdf [2nd VARS].
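Because x̄ is approximately normal with mean μ and standard deviation σ/√n, all three probabilities come from one normal model. This is a minimal Python sketch using `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 75, 9, 81
# Sampling distribution of x-bar: mean mu, standard deviation sigma / sqrt(n)
xbar_dist = NormalDist(mu, sigma / sqrt(n))       # N(75, 1)

p_b = 1 - xbar_dist.cdf(76.7)                     # P(x-bar > 76.7)
p_c = xbar_dist.cdf(73)                           # P(x-bar <= 73)
p_d = xbar_dist.cdf(76.8) - xbar_dist.cdf(74.45)  # P(74.45 < x-bar < 76.8)

print(round(p_b, 4), round(p_c, 4), round(p_d, 4))
```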
The data in the table to the right are based on the results of a survey comparing the commute time of adults to their score on a well-being test. Complete parts (a) through (d) below. (a) Which variable is likely the explanatory variable and which is the response variable? (b) Draw a scatter diagram of the data. Which of the following represents the data? (c) Determine the linear correlation coefficient between commute time and well-being score. (d) Does a linear relation exist between the commute time and well-being index score?
a) The explanatory variable is commute time and the response variable is the well-being score, because commute time affects the well-being score. b) Use StatCrunch to find the correct graph. c) StatCrunch: Stat -> Regression -> Simple Linear -> plug in x and y and compute; the value after "r =" is the answer. d) Yes, there appears to be a negative linear association because r is negative and is less than the opposite of the critical value.
Determine the point estimate of the population mean and margin of error for the confidence interval. Lower bound is 20, upper bound is 30.
a) The point estimate of the population mean is 25. b) The margin of error for the confidence interval is 5. [Point estimate = (lower bound + upper bound)/2; margin of error = upper bound − point estimate]
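The two formulas can be written as a short Python sketch. The point estimate is the midpoint of the interval, and the margin of error is half its width.

```python
lower, upper = 20, 30
point_estimate = (lower + upper) / 2    # midpoint of the interval
margin_of_error = (upper - lower) / 2   # same as upper - point_estimate
print(point_estimate, margin_of_error)  # 25.0 5.0
```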
The accompanying data represent the miles per gallon of a random sample of cars with a three-cylinder, 1.0 liter engine. (a) Compute the z-score corresponding to the individual who obtained 39.7 miles per gallon. Interpret this result. (b) Determine the quartiles. (c) Compute and interpret the interquartile range, IQR. (d) Determine the lower and upper fences. Are there any outliers?
a) The z-score corresponding to the individual is 0.23, which indicates that the data value is 0.23 standard deviation(s) above the mean. [z = (x − x̄)/s] b) Q1 = 36.9, Q2 = 38.55, Q3 = 40.95 [find in StatCrunch (sort the numbers in ascending order first): Stat -> Summary Stats -> Columns, select Q1, Q3, and IQR; for Q2, add the 12th and 13th values and divide by 2] c) The interquartile range is 4.05 mpg. It is the range of the middle 50% of the observations in the data set. [IQR = Q3 − Q1] d) Lower fence = 30.825; upper fence = 47.025; the outlier(s) is/are 48.8. [Lower fence = Q1 − 1.5(IQR), upper fence = Q3 + 1.5(IQR); an outlier is any data value below the lower fence or above the upper fence.]
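The fence calculation can be sketched in Python using the quartiles from the card. The raw mpg data are not reproduced here, so only the value 48.8 is checked against the fences. `is_outlier` is a hypothetical helper name.

```python
q1, q3 = 36.9, 40.95
iqr = q3 - q1                     # 4.05
lower_fence = q1 - 1.5 * iqr      # Q1 - 1.5(IQR)
upper_fence = q3 + 1.5 * iqr      # Q3 + 1.5(IQR)
print(round(iqr, 2), round(lower_fence, 3), round(upper_fence, 3))

def is_outlier(x: float) -> bool:
    """A value is an outlier if it lies below the lower fence or above the upper fence."""
    return x < lower_fence or x > upper_fence

print(is_outlier(48.8))
```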
The following data represent the number of games played in each series of an annual tournament from 1932 to 2003. Complete parts (a) through (d) below. (a) Construct a discrete probability distribution for the random variable x. (b) Graph the discrete probability distribution. Choose the correct graph below. (c) Compute and interpret the mean of the random variable x (d) Compute the standard deviation of the random variable x.
a) Add all the frequencies, then divide each frequency by the total. b) Graph based on the distribution from (a). c) StatCrunch: Stat -> Calculators -> plug in the values of x and their probabilities (discrete probability) -> the mean is shown at the bottom; the series, if played many times, would be expected to last about ... games, on average. d) Same steps as above; the standard deviation is shown to the right of the mean.
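The steps for a discrete probability distribution can be sketched in Python. First turn frequencies into probabilities, then use μ = Σ x·P(x) and σ = √(Σ (x − μ)²·P(x)). The tournament frequencies are not reproduced in this review, so the example uses a fair six-sided die purely as an illustration. The helper names are hypothetical.

```python
from math import sqrt

def from_frequencies(freq):
    """Discrete probability distribution {x: P(x)} from a frequency table {x: count}."""
    total = sum(freq.values())
    return {x: f / total for x, f in freq.items()}

def discrete_mean_sd(dist):
    """Mean and standard deviation of a discrete random variable given {x: P(x)}."""
    assert abs(sum(dist.values()) - 1) < 1e-9, "probabilities must sum to 1"
    mean = sum(x * p for x, p in dist.items())
    var = sum((x - mean) ** 2 * p for x, p in dist.items())
    return mean, sqrt(var)

# Illustration only: a fair die (each face equally frequent), not the tournament data
die = from_frequencies({x: 1 for x in range(1, 7)})
mu, sigma = discrete_mean_sd(die)
print(round(mu, 2), round(sigma, 4))
```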
Suppose a simple random sample of size n = 75 is obtained from a population whose size is N = 10,000 and whose population proportion with a specified characteristic is p = 0.2. (a) Describe the sampling distribution of p̂. Determine the mean of the sampling distribution of p̂. Determine the standard deviation of the sampling distribution of p̂. (b) What is the probability of obtaining x = 18 or more individuals with the characteristic? That is, what is P(p̂ ≥ 0.24)? (c) What is the probability of obtaining x = 9 or fewer individuals with the characteristic? That is, what is P(p̂ ≤ 0.12)?
a) Approximately normal because n ≤ 0.05N and np(1 − p) ≥ 10 [75 ≤ 500 and 75(0.2)(0.8) = 12]; μ_p̂ = p = 0.2; σ_p̂ = 0.046188 [√(p(1 − p)/n)] b) 0.1932 [z = (p̂ − μ_p̂)/σ_p̂ = 0.866 -> normalcdf(0.866026, 4, 0, 1)] c) 0.0418 [same as above, but normalcdf(−4, −1.73, 0, 1)]
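The p̂ card can be reproduced in Python. The sketch checks both normality conditions, builds the normal model for p̂, and computes both tail areas.

```python
from math import sqrt
from statistics import NormalDist

N, n, p = 10_000, 75, 0.2

# Conditions for an approximately normal sampling distribution of p-hat
assert n <= 0.05 * N            # 75 <= 500
assert n * p * (1 - p) >= 10    # 12 >= 10

mu_phat = p
sd_phat = sqrt(p * (1 - p) / n)  # about 0.046188
phat_dist = NormalDist(mu_phat, sd_phat)

print(round(sd_phat, 6))
print(round(1 - phat_dist.cdf(0.24), 4))  # P(p-hat >= 0.24)
print(round(phat_dist.cdf(0.12), 4))      # P(p-hat <= 0.12)
```

Part (c) comes out near 0.0416 here instead of 0.0418 because the exact z (about −1.732) is used rather than the rounded −1.73.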
Assume that the random variable X is normally distributed, with mean μ=42 and standard deviation σ=10. Compute the probability. Be sure to draw a normal curve with the area corresponding to the probability shaded. (a) Which of the following shaded regions corresponds to P(X≤39)? (b) P(X≤39) =?
a) The curve with the area to the left of 39 shaded. b) P(X ≤ 39) = P(Z ≤ −0.30) = 0.3821 [z = (39 − 42)/10 = −0.30]
Assume the random variable X is normally distributed with mean μ = 50 and standard deviation σ = 7. Compute the probability. Be sure to draw a normal curve with the area corresponding to the probability shaded. (a) Which of the following normal curves corresponds to P(35 < X < 66)? (b) P(35 < X < 66) = ?
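a) The curve with the area between 35 and 66 shaded. b) Lower z = (35 − 50)/7 = −2.14 and upper z = (66 − 50)/7 = 2.29 -> P(35 < X < 66) = 0.9728 ^ Solve the same way as P(X > 35), but compute two z-scores, then plug them into normalcdf [2nd VARS] as the lower and upper bounds.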
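The three normal-probability cards (P(X > 35) and P(35 < X < 66) with μ = 50, σ = 7, and P(X ≤ 39) with μ = 42, σ = 10) can all be checked with `statistics.NormalDist`. This is a minimal sketch. It works on the X scale directly, so no z-rounding is involved.

```python
from statistics import NormalDist

X = NormalDist(mu=50, sigma=7)
p_gt_35 = 1 - X.cdf(35)            # P(X > 35): area to the right of 35
p_between = X.cdf(66) - X.cdf(35)  # P(35 < X < 66): area between 35 and 66

Y = NormalDist(mu=42, sigma=10)
p_le_39 = Y.cdf(39)                # P(X <= 39) when mu = 42, sigma = 10

print(round(p_gt_35, 4), round(p_between, 4), round(p_le_39, 4))
```

Because z is not rounded to two decimals first, P(X > 35) comes out near 0.9839 rather than the table value 0.9838.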
Match the linear correlation coefficient to the scatter diagram. The scales on the x- and y-axis are the same for each scatter diagram. (a) r=−0.992, (b) r=−1, (c) r=−0.049
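a) Linear; the points lie very close to a line with negative slope. b) Linear; all points lie exactly on one straight line with negative slope. c) No linear pattern; the points are scattered everywhere.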
(a) Identify the shape of the distribution, and (b) determine the five-number summary. Assume that each number in the five-number summary is an integer.
a) the distribution is roughly symmetric b) 0, 5, 10, 15, 20
A data set is given below. (a) Draw a scatter diagram. Comment on the type of relation that appears to exist between x and y. (b) Given that x̄ = 3.6667, s_x = 2.5820, ȳ = 4.1333, s_y = 1.8228, and r = −0.9533, determine the least-squares regression line. (c) Graph the least-squares regression line on the scatter diagram drawn in part (a).
a) Use StatCrunch to find the correct graph; there appears to be a linear, negative relationship. b) StatCrunch: Stat -> Regression -> Simple Linear -> plug in x and y and compute; y = ... + ...x is the answer. c) Same graph as (a); can confirm on page 2 of the output from the last step.
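When only the summary statistics are given, the least-squares line follows from b1 = r(s_y/s_x) and b0 = ȳ − b1·x̄. This is a minimal Python sketch using the values from the card. StatCrunch fits the line from the raw data, so its last digits may differ slightly.

```python
x_bar, s_x = 3.6667, 2.5820
y_bar, s_y = 4.1333, 1.8228
r = -0.9533

b1 = r * s_y / s_x        # slope: r times the ratio of standard deviations
b0 = y_bar - b1 * x_bar   # intercept: the line passes through (x_bar, y_bar)
print(f"y-hat = {b1:.4f}x + {b0:.4f}")
```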
Test the hypothesis using the P-value approach. Be sure to verify the requirements of the test. H0: p = 0.75 versus H1: p ≠ 0.75, n = 500, x = 360, α = 0.1 (a) Is np0(1 − p0) ≥ 10? Select the correct choice below and fill in the answer box to complete your choice. (b) Now find p̂. (c) Find the test statistic z0. (d) Find the P-value and state the conclusion of the hypothesis test.
a) Yes, because np0(1 − p0) = 93.75 [n(p0)(1 − p0)] b) p̂ = 0.72 c) z0 = −1.55 d) P-value = 0.121; do not reject the null hypothesis, because the P-value is greater than α. ^ For (a) through (d), get the answers from STAT -> TESTS -> 5:1-PropZTest.
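For a two-tailed test (H1: p ≠ p0), the P-value doubles the tail area beyond |z0|. This is a minimal Python sketch reproducing parts (a) through (d) with the standard library.

```python
from math import sqrt
from statistics import NormalDist

p0, n, x, alpha = 0.75, 500, 360, 0.1

check = n * p0 * (1 - p0)   # 93.75 >= 10, so the requirement holds
p_hat = x / n               # 0.72
z0 = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
# Two-tailed test: double the area in the tail beyond |z0|
p_value = 2 * (1 - NormalDist().cdf(abs(z0)))

print(check, p_hat, round(z0, 2), round(p_value, 3))
print("reject H0" if p_value < alpha else "do not reject H0")
```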
Determine the required value of the missing probability to make the distribution a discrete probability distribution.
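Add all the given probabilities, then subtract the sum from 1 (the probabilities in a discrete probability distribution must sum to 1).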
What measure of central tendency best describes the "center" of the distribution?
Bell-shaped distribution: the mean
A(n) _____ is obtained by dividing the population into groups and selecting all individuals from within a random sample of the groups.
cluster sample
In statistical studies, researchers want to determine how varying one or more _______ variables may impact the value of a(n) _______ variable.
explanatory variable, response variable
Suppose a person claims that, "0.9% of all people in the nation always eat out." Is this a descriptive or inferential statement?
inferential
The standard deviation is used in conjunction with the ____ to numerically describe distributions that are bell shaped. The ___ measures the center of the distribution, while the standard deviation measures the ___ of the distribution.
mean, mean, spread
A frequency distribution lists the ___ of occurrences of each category of data, while a relative frequency distribution lists the ___ of occurrences of each category of data.
number, proportion
A __________ is a numerical summary of a population.
parameter
sample mean
x bar
One year Thomas had the lowest ERA (earned-run average, mean number of runs yielded per nine innings pitched) of any male pitcher at his school, with an ERA of 3.27. Also, Karen had the lowest ERA of any female pitcher at the school with an ERA of 3.01. For the males, the mean ERA was 4.652 and the standard deviation was 0.834. For the females, the mean ERA was 4.844 and the standard deviation was 0.712. Find their respective z-scores. Which player had the better year relative to their peers, Thomas or Karen? (Note: In general, the lower the ERA, the better the pitcher.)
z = (x − μ)/σ. Thomas had an ERA with a z-score of −1.66. Karen had an ERA with a z-score of −2.58. Karen had the better year relative to her peers, because her z-score is lower (her ERA is farther below her group's mean).
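The comparison works because z-scores put both pitchers on the same standardized scale. This is a minimal Python sketch; `z_score` is a hypothetical helper name.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations x lies from the mean."""
    return (x - mean) / sd

thomas = z_score(3.27, 4.652, 0.834)
karen = z_score(3.01, 4.844, 0.712)
print(round(thomas, 2), round(karen, 2))

# A lower ERA is better, so the more negative z-score marks the better year
print("Karen" if karen < thomas else "Thomas")
```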
The ___ represents the number of standard deviations an observation is from the mean.
z-score
The sum of the deviations about the mean always equals ____
zero
population mean
μ
In a poll, a random sample of 2163 adults (aged 18 and over) was asked, "When you see an ad emphasizing that a product is made in your country, are you more likely to buy it, less likely to buy it, or neither more nor less likely to buy it?" The results of the survey are presented in the side-by-side graph. Complete parts (a) through (d) below.
(a) What proportion of 18- to 34-year-old respondents are more likely to buy when made in their country? What proportion of 35- to 44-year-old respondents are more likely to buy when made in their country? The proportion of 18- to 34-year-old respondents is 0.67. The proportion of 35- to 44-year-old respondents is 0.58. (b) What age group has the greatest proportion who are more likely to buy when made in their country? 18-34 yrs (c) Which age group has a majority of respondents who are less likely to buy when made in their country? 55+ yrs (d) What is the apparent association between age and likelihood to buy when made in their country? As age decreases, likelihood to buy homegrown increases