STAT101 - Final Exam


State whether the alternative hypothesis below is a two-sided, right-sided or left-sided z-test: 𝐻𝑎: 𝑝 < 𝑝0

left-sided z-test

The sampling distribution of sample means is centered on the population mean. True False

True

Two Proportions: how do you calculate the confidence interval when considering two sample proportions?

Subtract one sample proportion from the other to get the point estimate, then add and subtract the critical value (1.96 for 95% confidence) times the Standard Error: (p-hat1 - p-hat2) ± 1.96 x SE.

Two Proportions: How is the standard error calculation different when finding the z-interval for the difference of two proportions?

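The standard error has to combine both samples: add the variances of the two sample proportions and take the square root. SE = square root of [p1(1-p1)/n1 + p2(1-p2)/n2]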
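A minimal Python sketch of the two-proportion z-interval may help here. The function name and the sample values (0.60 and 0.50, each from 200 cases) are hypothetical and chosen only for illustration; the 1.96 multiplier assumes 95% confidence.

```python
import math

def two_proportion_interval(p1, n1, p2, n2, z_star=1.96):
    """z-interval for the difference p1 - p2 (unpooled standard error)."""
    # Add the variances of the two sample proportions, then take the square root.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2  # point estimate: difference between the two proportions
    return diff - z_star * se, diff + z_star * se

# Hypothetical data, not taken from the course material.
low, high = two_proportion_interval(p1=0.60, n1=200, p2=0.50, n2=200)
print(round(low, 3), round(high, 3))
```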

Note that the larger the sample size 𝑛, the _______ the standard error of the sample mean

smaller

𝜇 (pronounced "mu")

the average from a population

𝑥̅ (pronounced "x-bar")

the average from a sample of a population

sampling frame

the list of elements in a population (for example, a list of all households in a city)

How do you express the Standard Normal Distribution Model?

𝑍~𝑁(0,1) For the standard normal distribution the mean is 0 and the variance (and equivalently the standard deviation) is 1

You are conducting a one-sample z-test for a proportion. Your null hypothesis is that the population proportion equals 0.60 while the alternative is that it is not equal to 0.60. If the sample size is 100 and the sample proportion is 0.50, what is the z test statistic?

z = -2.04 (about -2). To calculate: p0 = 0.60, Ha: p not equal to 0.60, n = 100, p-hat = 0.50. Because we assume the null is true, the SE uses p0: SE = square root of 0.60(0.40) / 100 = 0.049. z = (0.50 - 0.60) / 0.049 = -2.04
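The calculation in this card can be checked with a short Python sketch. The function name is made up for illustration; the standard error is computed from the null value p0, which is the usual convention for this test.

```python
import math

def one_sample_z(p_hat, p0, n):
    """z statistic for a one-sample proportion test."""
    # Standard error under the null hypothesis (uses p0, not p-hat).
    se = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / se

z = one_sample_z(p_hat=0.50, p0=0.60, n=100)
print(round(z, 2))  # -2.04
```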

When testing the Success-Failure Condition, we calculate the proportion as a mean. From the following samples, what is the value? p1 = 1, p2 = 0, p3 = 1, p4 = 1 What does this number mean?

0.75 (1 + 0 + 1 + 1) / 4 = 0.75. For each observation, 1 = success and 0 = failure. In our sample, we have 3 successes and 1 failure, giving us an average success rate of 75%. However, because we do not have at least 10 successes and 10 failures, this would not be an adequate sample size.

Simple Random Sampling (SRS)

A probability sampling technique in which each element in the population has a known and equal probability of selection. Every element is selected independently of every other element, and the sample is drawn by a random procedure from a sampling frame.

Tom has a Z-score of -0.5 on an exam while Ann has a Z-score of 1.5. Who scored higher on the exam? Tom Ann Not enough information is given

Ann

A sample statistic is a numerical summary of a population. True False

False A population parameter is a numerical summary of a population, while a sample statistic is a numerical summary of a sample.

True or False: The probability of a Z-score lying within 2 standard deviations of zero on the standard normal distribution is approximately 0.68.

False. No, the probability is approximately 0.95.
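As a quick check of the 68% and 95% figures, here is a small sketch using Python's standard-library `statistics.NormalDist`. The helper name `prob_within` is an assumption made for this example.

```python
from statistics import NormalDist

def prob_within(k):
    """Probability that a standard normal Z lies within k standard deviations of 0."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(k) - std_normal.cdf(-k)

print(round(prob_within(1), 2))  # 0.68
print(round(prob_within(2), 2))  # 0.95
```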

What does the height of the Normal Distribution Model tell us?

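How likely certain sets of values are to occur: the curve is tallest where values are most likely (near the mean). Strictly, the height is a density, and probabilities come from the area under the curve.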

Which of the following is true? 1. Inferential statistics is about making conclusions about a population using data from a random sample 2. Inferential statistics is organizing or presenting data from a population only 3. Descriptive statistics is about making conclusions about a population using data from a random sample

Inferential statistics is about making conclusions about a population using data from a random sample

What is a Point Estimate?

Our best guess from our sample of a population parameter.

Assuming we know the standard error (SE), how can you express a confidence interval (Point Estimate)?

Point Estimate ± 2 x SE (standard error), for approximately 95% confidence

X~N(μ, σ²)

The Normal Distribution Model

What does the height in the Normal Distribution Model tell us?

The relative likelihood (density) of the values: the taller the curve, the more likely values near that point are to occur.

Success/Failure Condition

The sample size must be big enough so that both the number of "successes," np, and the number of "failures," nq, are expected to be at least 10.

standard deviation

a computed measure of how much scores vary around the mean score.

In statistical inferences, what does ≈ denote?

approximately equal to: very close, but not exact

What inferences can we make using the Central Limit Theorem?

about the world around us from relatively small, random samples of target populations.

In a Normal Distribution Model, 𝑁 denotes: 𝑋~𝑁(𝜇, 𝜎²)

normal distribution

elements

the units (for example, individuals) that make up the population and from which the sample is drawn

In a Normal Distribution Model, 𝜇 denotes: 𝑋~𝑁(𝜇, 𝜎²)

"mu" is the population mean (average)

one-stage cluster sampling

A set of clusters is randomly selected, and all the cases in the selected clusters are included in the sample. Clusters are groups, typically those naturally occurring (for example, neighborhoods, cities, countries). Each element can be a member of only one cluster. Clusters are shown as 𝑐

Stratified Sampling

A type of probability sampling in which the population is divided into groups with a common attribute (for example, racial or ethnic groups, teams, organizations) and a random sample is chosen within each group. Each element can be a member of only one of the strata. Strata are shown as 𝑠

sampling distribution

The distribution of sample means of the same size 𝑛 from the same population • The sampling distribution of sample means is centered on the population mean 𝜇 • Although the mean from any particular sample is likely to be off, on average the sample mean 𝑥̅ is our best guess for 𝜇

What is the mean of the standard normal distribution model (that is, where is it centered)? The mean of the standard normal distribution model is always 0.5 The mean of the standard normal distribution model is always zero The mean of the standard normal distribution model is always 1

The mean of the standard normal distribution model is always zero

True or False: The sampling distribution of sample means is symmetric

True

True or False: The standard normal distribution is always centered on 0 and always has a variance of 1.

True

The ____________ is a point estimate for the population proportion. • sample variance • sample proportion • sample size

sample proportion

How do we find a standard (or unit) normal distribution when comparing two types of distributions/scores?

We convert X to Z-scores

In a Normal Distribution Model, ~ denotes: 𝑋~𝑁(𝜇, 𝜎²)

is distributed as

Inferential statistics

making conclusions about a population based only on data from a sample

Parameter

numerical summary of a population

Descriptive statistics

organizing and presenting data from a sample or population

How large a sample is enough for the CLT to give an approximately normal sampling distribution?

𝑛 ≥ 30 in general. For fairly symmetric distributions, 𝑛 ≥ 15. If the actual distribution in the population is normal, then the sampling distribution of the mean is normal for any sample size.

What two parameters govern the Normal Distribution Model?

𝜇 "mu" - the population average; 𝜎² "sigma squared" - the variance

If we know that SAT scores in the population are normally distributed with 𝜇 = 1500 and 𝜎 = 300, what is the probability someone would score at least 1630 on the SAT? HINT: You must first calculate the z-score for scoring 1630: 𝑧 = (𝑥 − 𝜇) / 𝜎

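z = (1630 - 1500) / 300 = 130 / 300 = 0.43. This is the z-score, not yet the probability. The area under the standard normal curve to the left of z = 0.43 is about 0.67, so the probability of scoring less than 1630 is about 0.67. By the complement rule, the probability of scoring at least 1630 is 1.0 - 0.67 = 0.33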
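The two steps in this card (z-score first, then the area under the curve) can be sketched in Python with the standard-library `statistics.NormalDist`. Variable names are illustrative only.

```python
from statistics import NormalDist

mu, sigma, x = 1500, 300, 1630

# Step 1: z-score, the number of standard deviations x lies above the mean.
z = (x - mu) / sigma

# Step 2: area to the left of z, then the complement for "at least 1630".
p_below = NormalDist().cdf(z)
p_at_least = 1 - p_below

print(round(z, 2), round(p_below, 2), round(p_at_least, 2))  # 0.43 0.67 0.33
```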

What is the variance of the standard normal distribution model (that is, what is the spread)? The variance of the standard normal distribution model is always 0.5 The variance of the standard normal distribution model is always zero The variance of the standard normal distribution model is always 1

The variance of the standard normal distribution model is always 1

How do you convert a value from a Normal Distribution Model into a Z-Score?

Subtract the mean and divide by the standard deviation: z = (x − 𝜇) / 𝜎. The z-score of a value is the number of standard deviations it falls above or below the mean.

How is the Normal Distribution Model expressed?

X~N(μ, σ²)

standard error of the mean

gives a measure of how far off our estimate of the average is from the true population average

What does a test statistic (like a z-test statistic) tell us?

how far our sample proportion (p-hat) is from the claimed proportion (p-subscript zero) of the null hypothesis

You are conducting a one-sample z-test for a proportion. Your null hypothesis is that the population proportion equals 0.50 and your alternative hypothesis is that the population proportion is less than 0.50. What kind of z-test are you conducting?

left-sided z-test

From the data below, if we take a sample of size 3, check the Success-Failure Condition. [Remember, np has to be greater than or equal to 10 AND n(1-p) has to be greater than or equal to 10.]
min  Q1  median  Q3  max  mean    sd     n
0    0   0       1   1    0.3289  0.469  11471
Will the success-failure condition hold?

np = 0.99, n(1-p) = 2.01. To calculate: since n = 3, np = 3 x 0.33 = 0.99 and n(1-p) = 3 x 0.67 = 2.01. Since both of these are smaller than 10, the success-failure condition will not hold.

standard error of the mean

the standard deviation of a sampling distribution

sampling variability

the variability in our samples

In a Normal Distribution Model, 𝜎² (sigma squared) denotes: 𝑋~𝑁(𝜇, 𝜎²)

the variance

Using the data below and a confidence level of 95%, calculate the z-interval estimate. Sample Size (n) = 343
             No    Yes   Total
Proportion   0.12  0.88  1.0
What does the z-interval estimate mean? Hint: First determine the Standard Error (SE).

(0.0857, 0.1543) The point estimate (p-hat) = 0.12 (people who do not have health insurance). To find the z-interval estimate, we must first find the Standard Error. SE = the square root of p(1-p) / n = the square root of 0.12(1 - 0.12) / 343 = 0.0175. For a 95% confidence level, we add and subtract 1.96 x 0.0175 = 0.0343 from 0.12 (which is our center): lower limit 0.0857, center 0.12, upper limit 0.1543. This means we are 95% confident that the proportion of people in the population who do not have health insurance is between 8.6% and 15.4%.
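A short Python sketch of this z-interval, assuming the usual formula p-hat ± z* x SE. The function name is hypothetical; the critical value is looked up with `NormalDist.inv_cdf` instead of being hard-coded as 1.96, which moves the limits only in the fourth decimal place.

```python
import math
from statistics import NormalDist

def proportion_z_interval(p_hat, n, confidence=0.95):
    """z-interval for a single proportion, using p-hat in the standard error."""
    # Critical value: about 1.96 for 95% confidence.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z_star * se, p_hat + z_star * se

low, high = proportion_z_interval(p_hat=0.12, n=343)
print(round(low, 3), round(high, 3))  # 0.086 0.154
```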

Calculate the Standard Error (SE) for a Proportion with a sample size of 1,000, using the data below:
min  Q1  median  Q3  max  mean    sd     n
0    0   0       1   1    0.3289  0.469  11471
Hint: the SE is simply the standard deviation in the population divided by the square root of the sample size.

0.015 To calculate: n = 1,000. SE = sd / square root of 1,000 = 0.47 / 31.623 = 0.015

Suppose you have a sample proportion of 0.80 and a sample size of 100. What is the standard error for the sample proportion?

0.04 To find the standard error of a sample proportion: multiply the sample proportion (0.80) by 1 minus the sample proportion (1 - 0.80), divide by the sample size, and take the square root. Square root of 0.80(1 - 0.80) / 100 = square root of 0.0016 = 0.04

Using the data below, what is the Point Estimate for people without health insurance?
             No    Yes   Total
Proportion   0.12  0.88  1.0

0.12 The point estimate is simply the proportion of "successes" in our sample (here, people without health insurance).

What are three commonly used sampling methods?

1. Simple Random Sampling 2. Stratified Sampling 3. Cluster Sampling (one stage and multi-stage)

What are the five steps for Hypothesis Testing: z-test for a Sample Proportion

1. State the null and alternative hypotheses, 𝐻0 and 𝐻𝑎
2. Select significance level 𝛼
3. Compute the test statistic 𝑧
4. Assume the null is true and make a decision by either: a. Comparing the test statistic 𝑧 with the critical value or values, or b. Comparing the p-value with the significance level 𝛼
5. State your conclusion
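The five steps can be sketched end to end in Python. This example assumes a two-sided test with 𝛼 = 0.05 and reuses the numbers from the earlier card (null value 0.60, sample proportion 0.50, n = 100); the function name and return layout are made up for illustration.

```python
import math
from statistics import NormalDist

def z_test_proportion(p_hat, p0, n, alpha=0.05):
    """Two-sided one-sample z-test for a proportion (steps 3 to 5)."""
    std_normal = NormalDist()
    # Step 3: test statistic, with the SE computed under the null value p0.
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    # Step 4a: critical value for a two-sided test at level alpha.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Step 4b: two-sided p-value.
    p_value = 2 * std_normal.cdf(-abs(z))
    # Step 5: reject the null when the p-value is below alpha.
    return z, z_crit, p_value, p_value < alpha

# Steps 1 and 2: H0: p = 0.60, Ha: p != 0.60, alpha = 0.05.
z, z_crit, p_value, reject = z_test_proportion(p_hat=0.50, p0=0.60, n=100)
print(round(z, 2), round(z_crit, 2), round(p_value, 3), reject)
```

Both decision rules in step 4 agree here: |z| exceeds the critical value exactly when the p-value falls below 𝛼.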

The sampling distribution for a sample proportion is approximately normal if there are at least ________ "successes" and _________ "failures" in the sample.

10, 10

A student has a Z-score of 0 on an exam. If you know the distribution of exam scores is closely approximated by a normal distribution, what is the student's percentile? 50th percentile 95th percentile 68th percentile

50th percentile

Suppose we know that for the SAT 𝜇 = 1500 and 𝜎 = 300 in the American population while for the ACT 𝜇 = 21 and 𝜎 = 5. The distributions of SAT and ACT scores are both nearly normal. If we're told Ann scored 1800 on her SAT and Tom scored 24 on his ACT, who performed better? Z = (X − 𝜇) / 𝜎

Ann: Z = (1800 - 1500) / 300 = 1. Tom: Z = (24 - 21) / 5 = 0.6. Ann scored 1 standard deviation above the mean while Tom scored 0.6 standard deviations above the mean, so Ann performed better.
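The comparison works because z-scores put both exams on the same scale. A minimal Python sketch, with a helper name chosen for this example:

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

ann = z_score(1800, mu=1500, sigma=300)  # SAT
tom = z_score(24, mu=21, sigma=5)        # ACT

print(ann, tom)  # 1.0 0.6
print("Ann" if ann > tom else "Tom")  # Ann
```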

If the area under the curve to the LEFT of the vertical line equals 0.841, then what is the area under the curve to the RIGHT of the vertical line?

It is 0.159

What is our sampling distribution of a sample mean centered on?

It is centered on 𝜇 and has a standard deviation of σ/√n

Two Proportions: How is the Point Estimate determined when considering two proportions?

It is the difference between the two proportions.

How do you express the variance of the sampling distribution?

It's the population variance divided by the sample size: 𝜎²/𝑛

How do you express the standard deviation of the sampling distribution?

It's the population standard deviation divided by the square root of the sample size: 𝜎/√𝑛

Which of the following is NOT a feature of the sampling distribution of sample means? The sampling distribution of sample means is symmetric The sampling distribution of sample means has a variance equal to the population variance The sampling distribution of sample means is "mound-shaped"

NOT: The sampling distribution of sample means has a variance equal to the population variance

Suppose we gather a random sample of runners and we calculate a sample average running time of 94 minutes. If the SE of our estimate is 1, what is the 95% confidence interval?

Point Estimate ± 2 × SE = 94 ± 2 × 1 = (92, 96) • We are 95% confident that the true population mean lies in the interval of 92 to 96 • That is, about 95% of such intervals would contain the actual mean if we repeatedly took random samples of the same size from the population

An economist randomly sampled 50 students from the population of students currently attending Harvard. What kind of sampling procedure did the economist conduct? Simple random sampling Multistage clustering sampling Stratified sampling

Simple random sampling

Since we conducted a census, we know the population mean and the population standard deviation:
min  Q1  median  Q3  max  mean    sd        n
0.5  17  30      41  102  30.137  17.86153  11471
If we collected samples of 10 (n = 10), what is the standard error? Hint: sd / square root of the sample size

Standard Error = SD / square root of sample size = 17.86 / square root of 10 = 5.65
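A small Python sketch of this standard error, also showing that a larger sample size gives a smaller standard error. The function name is an assumption for this example; the n = 100 case is added only for comparison.

```python
import math

def standard_error(sd, n):
    """Standard error of the sample mean: sd divided by the square root of n."""
    return sd / math.sqrt(n)

print(round(standard_error(17.86153, 10), 2))   # 5.65
print(round(standard_error(17.86153, 100), 2))  # 1.79
```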

An educational psychologist split the population of Ivy League students into eight (8) groups based on the school they attended in the past year (Brown University, Columbia University, Cornell University, Dartmouth College, Harvard University, the University of Pennsylvania, Princeton University, or Yale University). She then randomly sampled 10 students within each of these groups. What kind of sampling method did she conduct? One-stage cluster sampling Simple random sampling Stratified sampling

Stratified sampling

Stratified sampling works best when elements within strata are ____________________ while cluster sampling works best when elements within clusters are ______________.

Stratified sampling works best when elements within strata are homogeneous, while cluster sampling works best when elements within clusters are heterogeneous

Which of the following is true of the normal distribution model? The area under the curve is always 1 The area under the curve is always 0.5 The distribution is always centered on zero

The area under the curve is always 1

Two Proportions: When conducting Hypothesis Testing, step 3 requires you to compute a test statistic. When considering two proportions, how do you achieve this task?

The z-test statistic is the difference between the two sample proportions minus the value under the null hypothesis (which is zero), divided by the standard error: z = ((p-hat1 - p-hat2) - 0) / SE
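A hedged Python sketch of this test statistic follows. The card does not say which standard error to use; this example assumes the pooled proportion, a common textbook convention when the null hypothesis is that the two proportions are equal. The counts (120 of 200 and 100 of 200) are hypothetical.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 - p2 = 0, using a pooled standard error (assumed convention)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # overall success rate across both samples
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return ((p1 - p2) - 0) / se     # subtract the null value, which is zero

# Hypothetical counts, not taken from the course material.
z = two_proportion_z(x1=120, n1=200, x2=100, n2=200)
print(round(z, 2))
```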

In a Normal Distribution Model, what happens if we set 𝜇 = 0 and 𝜎= 1?

Then we have what is called the standard (or unit) normal distribution model • For short we call this: 𝑍~𝑁(0,1)

Two Proportions: When considering two proportions, how many successes and failures must be achieved before the sample can be trusted to reflect a "normal distribution"?

There must be at least 5 successes and at least 5 failures in each sample. If this is true, you can use the z-test.

How can you define our sample mean (𝑥̅) in terms of a z-score?

z = (𝑥̅ − 𝜇) / (𝜎/√𝑛). This z-score tells us how many standard deviations our sample mean (𝑥̅) is from the population mean on the sampling distribution. We can create a confidence interval by asking how many z's we want to capture with our interval

Using Central Limit Theorem (CLT) and Z-Scores

This z-score tells us how many standard deviations our sample mean (𝑥̅) is from the population mean 𝜇 on the sampling distribution • We can create a confidence interval by asking how many z's we want to capture with our interval.

True or False: The sampling distribution of sample means is "mound-shaped"

True

normal distribution model

We can read this as "𝑋 is distributed normally with a mean of 𝜇 and variance of 𝜎²": X (our variable) ~ (is distributed as) N (a normal distribution) centered on the population mean with a variance of sigma squared.

If we do not know the standard error (SE), how can we express a confidence interval (Point Estimate)?

We do not typically know the standard deviation(𝜎) , so we often use the sample standard deviation (𝑠).

When calculating a confidence interval for a proportion, we need to know the standard error which is given by SE = √(p(1−p)/n). Since we typically do not know the population proportion, how do we calculate the standard error of the sample proportion?

When calculating confidence intervals we use the sample proportion as an estimate for the population proportion in the formula for the SE

What is the probability of a Z-score lying between the two vertical lines, approximately? left = 0.382 center = x right = 0.113

The total area under the curve is 1 (or 100%), so take 1 and subtract the two outer areas: 1 - 0.382 - 0.113 = 0.505

From the data below, if we take a sample of size 50, check the Success-Failure Condition. [Remember, np has to be greater than or equal to 10 AND n(1-p) has to be greater than or equal to 10.]
min  Q1  median  Q3  max  mean    sd     n
0    0   0       1   1    0.3289  0.469  11471
Will the success-failure condition hold?

np = 16.4, n(1-p) = 33.6. To calculate: since n = 50, np = 50 x 0.3289 = 16.4 and n(1-p) = 50 x 0.6711 = 33.6. Since both are equal to or greater than 10, the success-failure condition holds.
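The same check can be written as a small Python function and applied to both sample sizes used in this set (n = 3 and n = 50). The function name and the default threshold of 10 are assumptions that follow the condition as stated in these cards.

```python
def success_failure(p, n, threshold=10):
    """Return expected successes, expected failures, and whether the condition holds."""
    successes = n * p
    failures = n * (1 - p)
    holds = successes >= threshold and failures >= threshold
    return successes, failures, holds

for n in (3, 50):
    successes, failures, holds = success_failure(p=0.3289, n=n)
    print(n, round(successes, 2), round(failures, 2), holds)
```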

For any given range of values in the Normal Distribution Model, the area under the curve is the __________________ that values in that range will occur

probability

sampling method

procedure for selecting sample elements from a population

State whether the alternative hypothesis below is a two-sided, right-sided or left-sided z-test: 𝐻𝑎 : 𝑝 > 𝑝0

right-sided z-test

point estimate

sample mean 𝑥̅ as our best guess for the population mean 𝜇

Standard Error

standard deviation divided by the square root of the sample size

Central Limit Theorem (CLT)

states that if random samples of size n are drawn repeatedly from any population with a finite mean and variance, then when n is relatively large, the distribution of these sample means will be approximately normal.
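A small simulation can make the theorem concrete. This sketch assumes a skewed population (exponential with mean 1 and standard deviation 1), samples of size n = 30, and 5,000 repetitions; all of these choices, the function name, and the fixed seed are illustrative assumptions. The sample means should center near 1 with a spread near 1/√30 ≈ 0.18.

```python
import random
import statistics

def simulate_sample_means(n, reps, seed=0):
    """Draw `reps` samples of size n from a skewed population and return their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]  # population mean 1, sd 1
        means.append(statistics.fmean(sample))
    return means

means = simulate_sample_means(n=30, reps=5000)
print(round(statistics.fmean(means), 2))  # close to the population mean, 1
print(round(statistics.stdev(means), 2))  # close to sigma / sqrt(n), about 0.18
```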

confidence interval

statistical range, with a given probability, that takes random error into account

What do the upper and lower confidence limits mean in relation to a Point Estimate?

they are based on the standard error (SE), which tells us how far off we are on average from the population mean

State whether the alternative hypothesis below is a two-sided, right-sided or left-sided z-test: 𝐻𝑎: 𝑝 ≠ 𝑝0

two-sided z-test

You are conducting a one-sample z-test for a proportion. Your null hypothesis is that the population proportion equals 0.25 and your alternative hypothesis is that the population proportion is NOT equal to 0.25. What kind of z-test are you conducting?

two-sided z-test

What do we mean when we say we have 95% confidence that our interval contains the true population mean?

we are 95% confident that the true parameter lies in our interval

What are two ways strata and clusters are different?

• All strata are represented in the sample, while only a subset of clusters are in the sample • Stratified sampling works best when elements within strata are homogeneous, while cluster sampling works best when elements within clusters are heterogeneous

Simple Random Sampling - CONS

• Need to have a list of all the elements in the population • May be quite expensive or difficult to obtain an appropriate sampling frame

Stratified Sampling: CONS

• Need to have a list of all the elements in the population as well as their membership in the strata • May be quite expensive or difficult to obtain an appropriate sampling frame • May require more administrative work when selecting a random sample

Simple Random Sampling - PROS

• Simple sampling procedure (get it?) • Requires relatively little knowledge about the population

One-Stage Cluster Sampling: CONS

• Typically results in greater variation between successive samples • Thus may require larger samples to say something about the population

Multi-Stage Cluster Sampling: CONS

• Typically results in greater variation between successive samples • Thus may require larger samples to say something about the population • Adds another layer of complexity to the sampling procedure (compare with one-stage cluster sampling)

Stratified Sampling: PROS

• We can ensure that we analyze groups with small proportions in the population • Successive samples tend to be less different from one another

