STAT 301 Exam 2

Continuous random variables that appear to have equally likely outcomes (evenly distributed) over their range of possible values should be modeled using which distribution?

Uniform continuous distribution

when is the confidence level approximately C?

when the sample size is large and the population distribution is not normal

when do we use the z test statistic?

when we know the population standard deviation

what is the level c confidence interval for mu?

x bar +/- m

when is the confidence level exactly C?

when the population distribution is normal

are test results/conclusions about the samples or the populations?

populations

do side-by-side boxplots and mean plots give us information about the samples or the populations?

samples

The natural tendency of randomly drawn samples to differ from one another is known as

sampling variability or sampling error.

what is the fourth step of a significance test?

state a conclusion regarding evidence against the null hypothesis

what is the first condition for a matched pair analysis?

subjects are matched in pairs or there are two measurements on each subject

what does the Bonferroni Multiple Comparisons test tell us?

that there is a significant difference between two groups if the p-value is less than or equal to alpha or if the CI for that difference does not contain 0

what does the coefficient of determination represent?

the % of variation in the data that is explained by the ANOVA model

on what two numbers does the shape of the F distribution depend?

the degrees of freedom for the numerator and the degrees of freedom for the denominator

The central limit theorem says that when a random sample of size n is drawn from any population with mean μ and standard deviation σ (read as sigma), then when n is sufficiently large (n ≥ 30)

the distribution of the sample mean is approximately normal.
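The claim above can be checked with a small simulation. This is only a sketch in Python (the course material itself quotes R); the exponential population, the sample size of 40, the number of repetitions, and the seed are all assumptions chosen for illustration.

```python
import random
import statistics

random.seed(301)  # arbitrary seed so the run is repeatable

n = 40       # sample size (assumed; any reasonably large n works)
reps = 5000  # number of simulated samples (assumed)

# Exponential population with rate 1: mean 1, standard deviation 1, right-skewed.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

center = statistics.fmean(sample_means)  # should land near mu = 1
spread = statistics.stdev(sample_means)  # should land near sigma / sqrt(n)
print(round(center, 3), round(spread, 3), round(1 / n ** 0.5, 3))
```

Even though the population is skewed, the simulated sample means cluster around μ with spread close to σ/√n, which is what the theorem predicts.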

what is the pooled standard deviation?

the estimate of the population standard deviation

what do we not know when doing ANOVA?

the population standard deviation

what does a confidence level state?

the probability that the method will give a correct answer (90%, 95%, 99%, etc.)

what is a p-value?

the probability, assuming H0 is true, that the test statistic would take a value as extreme or more extreme than the value actually observed
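As a rough illustration of this definition for a z test statistic, the sketch below computes one-sided and two-sided P-values with Python's standard library. The function names and the example value z = 2.0 are hypothetical, not taken from the course.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def p_value_greater(z):
    """P(Z >= z): used when the alternative is mu > mu0."""
    return 1 - STD_NORMAL.cdf(z)

def p_value_less(z):
    """P(Z <= z): used when the alternative is mu < mu0."""
    return STD_NORMAL.cdf(z)

def p_value_two_sided(z):
    """P(|Z| >= |z|): used when the alternative is mu != mu0."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

z = 2.0  # hypothetical observed test statistic
print(p_value_greater(z), p_value_less(z), p_value_two_sided(z))
```

The same z gives three different P-values, one for each form of the alternative hypothesis.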

what does the coefficient of determination estimate?

the standard error of the model

what do we use to estimate the population mean when we know the population standard deviation?

the standard normal hypothesis tests and confidence intervals

what do means plots do?

they compare the means for different groups

what do normal quantile plots do?

they detect outliers and extreme deviations from normality for each group

how many sets of hypotheses will there always be for two-way ANOVA?

three

why do we standardize?

to assess how far the estimate is from the hypothesized value

what is a significance test designed to do?

to assess the strength of evidence against the null hypothesis

T/F: Both the null and alternative hypothesis are about the population parameter.

true

T/F: The P-value we calculate depends on the alternative hypothesis.

true

how can we estimate the parameter mean with one value of x bar calculated from one SRS?

we apply the central limit theorem, under which x bar is approximately normally distributed

if the distribution is normal, what can we assume?

we are under the central limit theorem

why do we use the sample standard deviation instead of the population standard deviation when doing ANOVA?

we do not know the population standard deviation, so we use the sample standard deviation to build a pooled estimator of the standard deviation

how do we carry out a one-way ANOVA?

we draw a SRS from each population and use the data to test the null hypothesis that the population means are all equal
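A minimal sketch of the arithmetic behind a one-way ANOVA, written in Python with the standard library. The three groups are made-up numbers, and the variable names (SSG, SSE, MSG, MSE) follow common textbook notation, which is assumed here rather than taken from these cards.

```python
from statistics import fmean, variance

# Hypothetical data: three groups of four observations each.
groups = [
    [4.0, 5.0, 6.0, 5.0],
    [6.0, 7.0, 8.0, 7.0],
    [9.0, 8.0, 10.0, 9.0],
]

I = len(groups)                   # number of groups (means being compared)
N = sum(len(g) for g in groups)   # total number of observations
grand_mean = fmean([x for g in groups for x in g])

# Between-group and within-group sums of squares.
SSG = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
SSE = sum((len(g) - 1) * variance(g) for g in groups)

MSG = SSG / (I - 1)        # numerator mean square, I - 1 degrees of freedom
MSE = SSE / (N - I)        # denominator mean square, N - I degrees of freedom
F = MSG / MSE              # F test statistic
s_pooled = MSE ** 0.5      # pooled estimate of the common standard deviation
R2 = SSG / (SSG + SSE)     # coefficient of determination

print(F, s_pooled, R2)
```

A large F relative to the F(I - 1, N - I) distribution is evidence against the null hypothesis that all population means are equal. Finding the P-value needs an F table or statistical software, which this sketch leaves out.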

what else do we need to do before running ANOVA?

we need to run a summary of statistics with information about each group

how is two-way ANOVA more efficient than one-way ANOVA?

we reduce the amount of collected data by studying two factors simultaneously

what is the third condition for a matched pair analysis?

we use the one-sample CI and HT procedures that we learned before

what are the advantages of two-way ANOVA?

-more efficient -reduced residual variation (within groups) -ability to study the interaction between the factors

what are the two parts of any confidence interval?

1) an interval computed from the data 2) a confidence level

what are the two types of inference?

1) estimate the parameter mean 2) test a statement about the parameter

what three plots can be used before running ANOVA?

1) means plots 2) side by side boxplots 3) normal quantile plots

what assumptions can we make for ANOVA?

1) need independent SRSs from each population 2) populations assumed to be normal with the same standard deviation (same width/spread)

what do we know/don't know about the population distribution of x?

1) the distribution could have any overall shape 2) we don't know the population mean 3) we assume we know the standard deviation

what are the three possible conclusions from two-way ANOVA?

1) there is/is no evidence that factor A has an effect 2) there is/is no evidence that factor B has an effect 3) there is/is no evidence that the interaction between factors A and B has an effect

A U.S. Web Usage Snapshot indicated a monthly average of 36 Internet visits per user from home. A random sample of 24 Internet users yielded a sample mean of 42.1 visits with a standard deviation of 5.3. At the 0.01 level of significance, can it be concluded that this differs from the national average? You need to find the test statistic ONLY, using a formula. Report your answer to three decimal places. Do NOT round in the intermediate calculations. Note: The first statistic in the second and third equation is the sample mean X bar.

Ans: 5.638. Null hypothesis H0: μ = 36. Alternative hypothesis Ha: μ ≠ 36. Test statistic: t = (sample mean - claimed mean) / (s / √n) = (42.1 - 36) / (5.3 / √24) = 5.638
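The calculation above can be reproduced with a few lines of Python. This is a sketch only; the function name `one_sample_t` is hypothetical.

```python
import math

def one_sample_t(sample_mean, mu0, s, n):
    """One-sample t statistic: (x bar - mu0) / (s / sqrt(n))."""
    return (sample_mean - mu0) / (s / math.sqrt(n))

# Values from the question: x bar = 42.1, mu0 = 36, s = 5.3, n = 24.
t = one_sample_t(42.1, 36, 5.3, 24)
print(round(t, 3))  # 5.638
```

Rounding happens only at the end, matching the instruction not to round intermediate results.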

What does it mean for an estimator to be consistent?

As the sample size gets larger and larger the estimates ultimately get closer and closer (i.e. converge) to the true value of the parameter.

In most situations, the true mean and true standard deviation of a population are unknown (unobserved) quantities that have to be estimated from sample data.

True

when can we run ANOVA?

when the largest sample std < 2(smallest sample std)

Indiana University administration reported that 56% of all faculty and staff members donated to the United Way campaign. A survey of a random sample of 100 faculty and staff members found that 60% have donated to United Way campaign. In this setting,

56% is a parameter value and 60% is a statistic value.

A random sample of 16 measurements selected from a population that is approximately normally distributed produced a sample mean of 97.94 and a sample standard deviation of 12.64. If we construct 80% and 95% confidence intervals (CI) for the population mean from the data, which statement is true?

80% CI will be narrower than the 95% CI for population mean.

A water hydrant dispenses water at a rate described by a uniform continuous distribution over the interval 50 to 70 gallons per minute. Find the probability that at most 65 gallons are dispensed during a randomly selected minute. Report your answer to 2 decimal places. Do NOT report your answer as a fraction. Such as if you get ¼, report it 0.25

Ans: 0.75. a = 50, b = 70, c = 50, d = 65. P(c < x < d) = (d - c) / (b - a), so P(50 < x < 65) = (65 - 50) / (70 - 50) = 0.75
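The same interval-length ratio, sketched as a small Python helper. The function name `uniform_prob` is hypothetical, and the clipping step is an added convenience for intervals that extend past [a, b].

```python
def uniform_prob(a, b, c, d):
    """P(c <= X <= d) for X ~ Uniform(a, b); [c, d] is clipped to [a, b]."""
    lo, hi = max(a, c), min(b, d)
    return max(hi - lo, 0) / (b - a)

# Hydrant example: Uniform(50, 70), probability of at most 65 gallons.
p = uniform_prob(50, 70, 50, 65)
print(p)  # 0.75
```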

To estimate the proportion of traffic deaths in California last year that were alcohol related, determine the necessary sample size for the estimate to be accurate to within 0.06 with 90% confidence. Based on results of a previous study, we expect the proportion of traffic deaths in California to be about 0.30. > qnorm(0.950) [1] 1.644854 Note: Use z value = 1.645 Sample size must be reported as a whole number. Such as if you get n = 22.1, report it 23, if you get n = 22, report it 22, if you get n = 22.9, report it 23, etc.

Ans: 158. n = (0.30)(1 - 0.30)(1.645)(1.645) / ((0.06)(0.06)) = 157.8515, which rounds up to 158
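A short Python sketch of this sample-size formula, n = p(1 - p)(z*/m)², always rounded up. The function and parameter names are hypothetical.

```python
import math

def sample_size_for_proportion(p_guess, margin, z_star):
    """Smallest whole n with margin of error at most `margin` for a proportion."""
    return math.ceil(p_guess * (1 - p_guess) * (z_star / margin) ** 2)

# Values from the question: p about 0.30, margin 0.06, z* = 1.645 (90% confidence).
n = sample_size_for_proportion(0.30, 0.06, 1.645)
print(n)  # 158
```

Rounding up rather than to the nearest integer guarantees the margin of error is no larger than requested.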

A confidence interval estimate is desired for the gain in a circuit on a semiconductor device. Assume that the gain is normally distributed with standard deviation σ = 20. How large must n be if the width of the 95% confidence interval is to be 40? > qnorm(0.975) [1] 1.959964 Note: Use z value = 1.96 Sample size must be reported as a whole number. Such as if you get n = 22.1, report it 23, if you get n = 22, report it 22, if you get n = 22.9, report it 23, etc.

Ans: 4. Margin of error E = width of CI / 2 = 40 / 2 = 20. Sample size n = (z* × σ / E)^2 = (1.96 × 20 / 20)^2 = 3.84, which rounds up to n = 4
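The same steps as a Python sketch, assuming the standard deviation of 20 is the known population value. The function name is hypothetical.

```python
import math

def sample_size_for_mean(sigma, margin, z_star):
    """Smallest whole n with z* * sigma / sqrt(n) <= margin."""
    return math.ceil((z_star * sigma / margin) ** 2)

width = 40
margin = width / 2  # the margin of error is half the interval width
n = sample_size_for_mean(sigma=20, margin=margin, z_star=1.96)
print(n)  # 4
```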

Four students are randomly sampled at a certain school so that the proportion of students who like tofu can be estimated. Three of the four students say they like it. Why is it not appropriate to use a z (large-sample) confidence interval for p?

Because there are fewer than 15 successes or fewer than 15 failures in the sample

Which distribution should be used to model the waiting time until the first event occurs?

Exponential distribution

Of all possible unbiased estimators for a population parameter, the estimator that is the minimum variance unbiased estimator (MVUE) has the largest variance.

False

A weapons manufacturer uses a liquid propellant to produce gun cartridges. During the manufacturing process, the propellant can get mixed with another liquid to produce a contaminated cartridge. A statistician found that 23% of the cartridges in a particular lot were contaminated. Suppose you randomly sample (without replacement) gun cartridges from this lot until you find the first contaminated one. Let X be the number of cartridges sampled until the first contaminated one is found. Which distribution best describes the context?

Geometric distribution
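A sketch of the geometric probabilities for this setting, in Python. It treats each draw as an independent trial with success probability 0.23, which is only an approximation when sampling without replacement and assumes the lot is large. The function name is hypothetical.

```python
def geometric_pmf(k, p):
    """P(X = k): the first success occurs on trial k, success probability p."""
    return (1 - p) ** (k - 1) * p

p = 0.23  # proportion of contaminated cartridges, from the question
probs = [geometric_pmf(k, p) for k in range(1, 6)]  # P(X = 1), ..., P(X = 5)
expected_draws = 1 / p  # mean of a geometric distribution

print(probs, expected_draws)
```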

Which of the following is a possible form for a null hypothesis?

H0: population characteristic = hypothesized value

If all else remain the same, which of the following will make a confidence interval for the population mean narrower? I. Decrease the confidence level II. Decrease the sample size III. Decrease the margin of error

I and III

If all else remains the same, which of the following will make a confidence interval for the population mean wider? I. Increase the confidence level II. Increase the sample size III. Decrease the margin of error

I only

In general, which of the following statements is true about the sampling distribution of the sample mean? Formula: Standard error of sample mean = σ / sqrt(n)

Increasing the sample size decreases the standard error.

Choose an incorrect relationship among mean, median, and mode for a normal distribution.

Mean < Median

what is the formula for the central limit theorem?

x bar is approximately N(μ, σ/√n), that is, normal with mean μ and standard deviation σ/√n

Which distribution should be used to model the number of events that occur over a given interval of time?

Poisson distribution

A sample of 36 commuters in Chicago showed the sample average of the commuting times was 33.2 minutes and the sample standard deviation was 8.3 minutes. A researcher is interested in finding a 99% confidence interval of true average commuting times in Chicago. What is an appropriate population parameter in this setting?

Population mean

To qualify for a police academy, candidates must score in the top 10% on a general abilities test. The test has a mean of 200 and a standard deviation of 20. Find and interpret the lowest possible score to qualify for a police academy. Assume the test scores are normally distributed. Two R codes and outputs are given below, but only one of them is correct in the context of the question. Your job is to first identify the right R code and then interpret the result. You do NOT need to run R code on your machine. R code 1: > qnorm(0.90,mean=200,sd=20) [1] 225.63 R code 2: > qnorm(0.10,mean=200,sd=20) [1] 174.369

The lowest possible score to qualify for a police academy for candidates is 225.63 on a general abilities test.
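For readers without R at hand, the two quantiles can be cross-checked with Python's standard library. This is a sketch; `NormalDist.inv_cdf` plays the role of R's `qnorm` here.

```python
from statistics import NormalDist

scores = NormalDist(mu=200, sigma=20)

# Top 10% means 90% of scores fall below the cutoff, so use the 0.90 quantile.
cutoff = scores.inv_cdf(0.90)
# The 0.10 quantile would instead mark the bottom 10% of scores.
bottom_decile = scores.inv_cdf(0.10)

print(round(cutoff, 2), round(bottom_decile, 3))
```

The first value agrees with R code 1, which is why that output answers the question.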

Assume each newborn baby has a probability of approximately 0.54 of being female and 0.46 of being male. For a family of four children, let X = number of children who are female. Which of the following statements correctly describes the required conditions of a binomial distribution?

The n trials are independent, each trial has the same probability of a success, and each trial has two possible outcomes.
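Under those conditions, the distribution of X for the four-child family can be tabulated directly. A Python sketch, with n = 4 and p = 0.54 taken from the question as stated and a hypothetical function name:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 4, 0.54  # four children, probability 0.54 of a female birth
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]  # P(X = 0), ..., P(X = 4)

for k, prob in enumerate(pmf):
    print(k, round(prob, 4))
```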

The Central Limit Theorem states that the sampling distribution of the sample mean is approximately normal under certain conditions. Which of the following conditions is necessary for the Central Limit Theorem to be used?

The sampling must be done randomly and the sample size must be large (e.g., at least 30).

Choose the best statement that is true about the standardized z-score of a value of a normal random variable X, which has mean µ and standard deviation σ.

The z-score has a mean equal to 0, the z-score has a standard deviation equal to 1, the distribution of z-scores is a normal distribution, and the z-score tells us by how many multiples of σ the original X observation falls away from the mean.

What is the total area or probability under any probability density function curve for a continuous random variable such as normal distribution curve?

Total area or probability = 1

A random sample of 250 students at Indiana University finds that these students take an average of 15.6 credit hours per semester with a standard deviation of 2.1 credit hours. The 98% confidence interval for the true mean is 15.6 ± 0.309 (i.e. sample mean ± margin of error). Interpret the confidence interval.

We are 98% confident that the true average number of credit hours per semester taken by Indiana University students falls in the interval 15.291 to 15.909 hours.
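The stated margin of error can be reproduced with a large-sample z interval. The Python sketch below assumes z* is used as an approximation because n = 250 is large; a t critical value would give a slightly wider interval. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def z_interval(xbar, s, n, level):
    """Return (lower, upper, margin) for x bar +/- z* * s / sqrt(n)."""
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    m = z_star * s / math.sqrt(n)
    return xbar - m, xbar + m, m

# Values from the question: x bar = 15.6, s = 2.1, n = 250, 98% confidence.
low, high, m = z_interval(15.6, 2.1, 250, 0.98)
print(round(low, 3), round(high, 3), round(m, 3))  # 15.291 15.909 0.309
```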

what is a null hypothesis?

a claim that we will try to find evidence against

what is a significance level alpha?

a decisive value that announces, in advance, how much evidence against the null hypothesis (Ho) we will require to reject it

what is one-way ANOVA?

a method for comparing several population means

when do we use a two-sided hypothesis?

when the parameter differs from its null hypothesis value in either direction (u does not equal u0)

when do we use a one-sided hypothesis?

when the parameter differs from its null hypothesis value in a specific direction (u<u0 or u>u0)

what is a coefficient of determination?

an additional parameter that is given as output by some statistical packages

what is an alternative hypothesis?

an alternative statement that we suspect is true

what is a level C confidence interval for a parameter?

an interval computed from sample data by a method that has probability C of producing an interval containing the true value of the parameter

what is the relationship between confidence level and margin of error?

as confidence level increases, margin of error increases

what is the relationship between population standard deviation and margin of error?

as population standard deviation increases, margin of error increases

what is the relationship between sample size and margin of error?

as sample size increases, margin of error decreases
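The three relationships above all follow from the margin-of-error formula m = z* σ / √n. A Python sketch that varies one input at a time; the baseline values (95% confidence, σ = 10, n = 25) are arbitrary assumptions.

```python
import math
from statistics import NormalDist

def margin_of_error(level, sigma, n):
    """m = z* * sigma / sqrt(n) for a level-C z interval."""
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z_star * sigma / math.sqrt(n)

base = margin_of_error(0.95, 10, 25)           # baseline
higher_conf = margin_of_error(0.99, 10, 25)    # raise the confidence level
bigger_sigma = margin_of_error(0.95, 20, 25)   # raise the standard deviation
bigger_n = margin_of_error(0.95, 10, 100)      # quadruple the sample size

print(base, higher_conf, bigger_sigma, bigger_n)
```

Quadrupling n only halves the margin of error, because n enters the formula through its square root.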

why do we still have to estimate the population standard deviation although we're mainly interested in finding the population mean?

because a confidence interval or testing a hypothesis for the population mean requires estimating the population standard deviation

what do we do if we estimate the parameter mean?

build a confidence interval (CI)

what is the third step of a significance test?

calculate the probability (p-value) of the estimate under the null hypothesis

what is the second step of a significance test?

calculate the test statistic to measure the compatibility between the null hypothesis and the data

The sampling distribution of a statistic is a probability distribution

calculated from all possible random samples of a specific size (n) taken from a population; it describes all values the statistic can take across all possible samples of size n.

what is the first step in a significance test?

create a null and alternative hypothesis

what distribution is for large(r) samples?

distribution for the sample mean

what do we do if we test a statement about the parameter?

do a hypothesis test (HT)

what is the form of an interval computed from the data?

estimate +/- margin of error

what is the second condition for a matched pair analysis?

for each pair of individuals, we use the difference between the two measurements as the data for our analysis

what type of plots are appropriate to see/measure skewness?

histograms, stemplots, or boxplots

what do side by side boxplots indicate?

if there is a heavy overlap of boxes, that indicates that there probably won't be a difference in the means of the groups

what is the Bonferroni Multiple Comparisons test?

it is a test that consists of simultaneous two-sample t tests comparing each pair of means

what does it mean when a P-value is large?

if H0 is true, it is not rare to see a difference this large between the estimate (x bar) and the hypothesized value (u0); there is not enough evidence against the null hypothesis (Ho)

what does it mean when a P-value is small?

if H0 is true, it is rare to measure an estimate (x bar) this far from the hypothesized value (u0); the data provide strong evidence against the null hypothesis (Ho)

what is the significance of I when doing ANOVA?

it is the number of groups or populations and it is the number of means we want to compare

what is an f test statistic?

it is the test statistic that is used for one-way ANOVA that has a non-symmetric distribution

The Central Limit Theorem is considered powerful in statistics because

it works for any population distribution provided the sample size from a random sample is sufficiently large.

what type of boxplot is appropriate to see outliers?

modified boxplot

what type of plot is appropriate to check for normality?

normal quantile plot

what do you need to do if you reject the null hypothesis with ANOVA?

perform the Bonferroni Multiple Comparisons test
