Exam 2


Statistical Inference

Drawing conclusions on the basis of observing only a smaller sample. Conclusions apply to a larger group of individuals. Also backed by a statement of our confidence in them.

Binomial Mean and Variance

If we have a binomial distribution with n trials and probability of success p, then: mean = μ = np; variance = σ^2 = np(1 - p); standard deviation = σ = root(np(1 - p)).
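
As a quick check of these formulas, here is a minimal Python sketch (the values n = 20 and p = 0.3 are just an illustrative example, not from the notes):

```python
import math

n, p = 20, 0.3                 # hypothetical example: 20 trials, success probability 0.3

mean = n * p                   # mu = np
variance = n * p * (1 - p)     # sigma^2 = np(1 - p)
std_dev = math.sqrt(variance)  # sigma = root(np(1 - p))

print(mean, variance, round(std_dev, 4))   # 6.0, ~4.2, ~2.0494
```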

Population Distribution

Population Distribution of a variable is the distribution of all values among all the individuals in that population.

P(Z > 2.09)

Procedure: Find the probability associated with 2.09 in the table and subtract that probability from 1. P(Z > 2.09) = 1 - P(Z < 2.09) = 1 - 0.9817 = 0.0183
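
If SciPy is available, the same table lookup can be reproduced in code; this is just a sketch of the complement rule above:

```python
from scipy.stats import norm

p_right = 1 - norm.cdf(2.09)   # P(Z > 2.09) = 1 - P(Z < 2.09)
print(round(p_right, 4))       # ~0.0183
```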

P(Z < 1.25)

Since the table gives the area under the standard normal curve to the left of z, read the value directly from the table: look down the first column to find 1.2, then go across the top row to .05. P(Z < 1.25) = 0.8944.

Shape of Binomial Distribution

Exactly symmetric if p = 0.5; approximately symmetric when n is large, even if p is close to 0 or 1.

Accurate

The method measures what it is intended to measure; it correctly estimates the population parameter.

Sample Space of Binomial Distribution

The sample space for the binomial setting consists of 2^n possible outcomes. Example: n = 5 trials gives 2^5 = 32 possible outcomes.

Sampling Variability

The value of a statistic varies in repeated random sampling. Main idea: to see how trustworthy a procedure is, ask what would happen if we repeated it many times.

Analogy for accuracy and precision

To be a good golfer, we need to get the golf ball in the cup. Needs to be both accurate (tends to hit the ball near the cup) and precise (even when missing, does not miss by much). Think of the cup as the population parameter and the golf balls as estimates.

Measuring Accuracy and Precision

We will measure an estimate's accuracy by considering bias (which focuses on center). We will measure an estimate's precision with a statistic called the standard error (which focuses on spread).

Hypothesis Testing

assesses evidence for a claim about a parameter by comparing it with observed data; used to judge between two competing claims about the parameter.

Important Questions to Ask when Sampling

What percentage of people who were asked to participate actually did so? Did researchers choose people to participate in the survey or did the people themselves choose to participate? If a large percentage of those chosen to participate refused to do so or if people themselves chose to participate, the conclusions of the survey are suspect.

Properties of Normal Distribution

When drawing the normal curve, the mean μ and the standard deviation σ have specific roles. The mean μ is the center of the curve. The values (μ - σ) and (μ + σ) are the inflection points of the curve.

Continuous or Discrete Random Variable?

a) Take a 30-question MC test; RV = # of questions answered correctly; X = 0, 1, 2, ..., 20; discrete
b) Observe cars arriving at a tollbooth for 1 hour; RV = # of cars that paid the toll; X = 0, 1, 2, ...; discrete
c) Observe an employee's work hours (8-hour day); RV = # of non-productive hours; 0 < X < 8; continuous
d) Weight of a shipment; RV = # of pounds; X > 0; continuous

Can we use a sample proportion to make inferences about a population proportion?

Yes, we can. p = population proportion; p(hat) = sample proportion (varies from sample to sample).

Parameter

a measure that describes a population. This value is fixed, yet (most likely) unknown

Statistics

a measure that describes a sample. This value can vary from sample to sample but is known when we have the sample.

Hypothesis

an assumption made about the population from which the sample was taken. Two types: the null hypothesis, H0 (read "H-naught"), and the alternative hypothesis, Ha (read "H-A").

Independent Event

Two events are independent if knowing that one occurs does not change the probability that the other occurs.

Estimated Standard Error in CI

root(p(hat)(1 - p(hat))/n)

Margin of Error

shows how precise we believe our estimate is, based on the variability of the estimate. It is the quantity we add to and subtract from our estimate when constructing a confidence interval. Margin of error = z*root(p(hat)(1 - p(hat))/n)

Confidence Intervals

supplements an estimate of a parameter with an indication of its variability; answers the question "what is the value of the parameter?"

significance level

the probability of making the mistake of rejecting the null hypothesis when, in fact, the null hypothesis is true. Typical values for α are 0.10, 0.05, and 0.01, corresponding to a 10%, 5%, or 1% probability of a Type I error when the null hypothesis is rejected. With α = 0.05, there is a 5% chance of rejecting a true null hypothesis.

Unbiased estimate:

when the mean of the sampling distribution of the statistic is equal to the true value of the parameter being estimated.

Critical Value in CI

z*

Standardization?

z = (x - μ)/σ, where μ = mean and σ = standard deviation.

Precision (CLT)

reflected in the spread of the distribution. The means are nearly the same, but the standard error for the larger sample size is much smaller. Smaller standard error = more precise.

As the confidence level increases...

the margin of error of the confidence interval also increases.

Random Variable

variable whose value is a numerical outcome of a probability experiment

95% Confidence Interval Example

(0.3758, 0.4642) We are roughly 95% sure that this interval captures the unknown population approval rating proportion.

Requirements of a Discrete Probability Distribution

0 ≤ P(xi) ≤ 1 for each value xi of X. The sum of all the probabilities equals 1. P(X = x) is read "the probability that the random variable X equals the value x." pi = P(xi) = P(X = xi)

Density Curve

A density curve is a mathematical model of a distribution; it is a smoothed mathematical model of the histogram. The total area under the density curve, by definition, is equal to 1, or 100%. The histogram is built from the sample; the smoothed curve describes the population. "Area under the curve" = probability.

Sampling Distribution

A graph that represents all possible values the sample proportion can take is defined as a sampling distribution. The sampling distribution of a statistic is the distribution of all possible values taken by the statistic when all possible samples of a fixed size n are taken from the population. It is a theoretical idea; in reality, we often do not actually build it (though today we will simulate it). The sampling distribution of a statistic is the probability distribution of that statistic.

Standard Normal Distribution

A normal distribution with mean = 0 and standard deviation = 1. Any normal distribution with mean μ and standard deviation σ can be standardized...

Discrete Random Variable

A random variable that can take on a countable number of observations An example is flipping a coin 10 times where X is the number of heads.

Continuous Random Variable

A random variable whose values are uncountable. An example is the amount of time it takes to complete a task: our exam is 70 minutes, and you can finish at any time between 0 and 70 minutes.

Confidence intervals provide us with:

A range of plausible values for a population parameter. A confidence level, which expresses our level of confidence that the interval contains the population parameter.

Simple Random Sampling (SRS)

Each individual in the population has the same chance of being chosen for the sample. In addition, every possible sample of size n out of a population of N individuals has an equally likely chance of being selected. Subjects are selected without replacement (subjects cannot be selected twice).

Example of Finding Critical Value

Ex: Find the critical value for a 92% confidence interval (see board). A = 0.92; 1 - A = 0.08; (1 - A)/2 = 0.04. Look inside Table 2 for 0.04 to find the negative critical value (remember it is a plus/minus). To find the positive z critical value, look inside the table for 1 - 0.04 = 0.96, which gives z* ≈ 1.75.
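
If SciPy is available, the inverse normal function gives the same critical value without a table; norm.ppf takes the area to the left of the desired z:

```python
from scipy.stats import norm

A = 0.92                     # confidence level
tail = (1 - A) / 2           # area in each tail = 0.04
z_star = norm.ppf(1 - tail)  # positive critical value (area 0.96 to its left)
print(round(z_star, 2))      # ~1.75
```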

Binomial Probability Function Example

Flip a coin five times. First, find how many ways we can obtain 2 heads in the 5 flips, where a head is a success. Using the first part of the function: 5!/(2!(5-2)!) = 10.
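
In Python, the same count comes directly from math.comb; a quick sketch:

```python
import math

ways = math.comb(5, 2)   # 5!/(2!(5-2)!) = number of ways to get 2 heads in 5 flips
print(ways)              # 10
```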

Conditions for SE to be correct

For this standard error to be correct: (1) the sample is randomly selected from the population of interest; (2) if the sampling is without replacement, the population must be at least 10 times larger than the sample size.

Right-sided Hypotheses

H0: p = p0 Ha: p > p0

Left-sided Hypotheses

H0: p = p0 Ha: p < p0

Two-sided Hypotheses

H0: p = p0 (a given value we are testing) Ha: p ≠ p0

Precise

If the method is repeated, the estimates are very consistent.

Multiplication Rule

If two events A and B are independent, then P(A and B) = P(A) x P(B)

What we say if we do not reject H0

If we do not reject the null hypothesis, we conclude that there is not enough statistical evidence to infer that Ha is true. Remember, we never accept H0 as true; we either reject it or fail to reject it.

Binomial Distribution

One of several specific discrete probability distributions. Handles probability problems with two outcomes (or problems that can be reduced to two outcomes). Result of a binomial experiment.

Error

P(Type I error) = α; this is called the significance level. P(Type II error) = β. α and β are inversely related: any attempt to reduce one will increase the other.

Population VS Samples

Populations: we have access to all of the individuals in a group of interest; descriptive measures of populations are called parameters. Samples: we can only access a portion of the individuals in a group of interest; descriptive measures of samples are called statistics.

Precision

Precision is reflected in the spread of distribution. It is measured by using the standard deviation of the column of sample proportions.

Probability Sample

Probability sampling uses chance to select a sample, based on known selection probabilities. Any bias is accommodated using knowledge of the selection probabilities.

P(1.01 < Z < 2.02)

Procedure: Find the area to the left of 2.02 and subtract the area to the left of 1.01. P(Z < 2.02) = 0.9783 P(Z < 1.01) = 0.8438 P(1.01 < Z < 2.02) = 0.9783 - 0.8438 = 0.1345 A little off because of rounding

Conditions to Check

1. Random sample. 2. Large sample size: np0 ≥ 10 and n(1 - p0) ≥ 10. 3. Large population: the population is at least 10 times bigger than the sample size if the sample is collected without replacement. 4. Independence*: each observation has no influence on any others. (*Note: the book separates conditions 1 and 4, but we do not have to.)

p(hat) characteristics

The mean of the p(hat)'s equals the population proportion p. This will always happen. This is a theoretical property; in practice we never observe all possible samples.

H0 Information

The null hypothesis (H0) is assumed to be true throughout the statistical analysis. Only if the sample observations are in extreme contradiction to H0 do we reject H0 in favor of Ha. If H0 cannot be rejected, we do not conclude that H0 is true but merely that we have no evidence to reject it.

Identify the population and sample. What was the parameter of interest? What is the statistic? In January 2015 the Pew Research Center published a report stating that 37% of Americans believed that genetically modified foods (GMOs) were safe to eat. This was based on a survey of 2002 American adults.

The population is all American adults. The sample was the 2002 American adults who were surveyed. The parameter of interest is the percentage of all American adults who believe that GMOs are safe to eat. The statistic is 37% (the percentage of the sample who felt this way).

What if you were asked to find that z-value where 10% of the data fell above this value?

The probabilities are found inside the table. Remember, they are probabilities to the left of a specific z-value. We notice 0.10 is to the right of the z-value in question. If 0.10 is the area to the right, then 0.90 is the area to the left. Then, find the probability closest to 0.9000 inside the table. I found 0.8997. This probability is associated with the z-value 1.28.
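
The same reverse lookup can be done with SciPy's inverse normal function (assuming SciPy is available); since norm.ppf takes the area to the left, we pass 0.90:

```python
from scipy.stats import norm

z = norm.ppf(0.90)   # z-value with 10% of the data above it (90% below)
print(round(z, 2))   # ~1.28
```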

Discrete Probability Distribution

The probability distribution of a discrete random variable X lists the values and their corresponding probabilities.

Probability Distribution

The probability distribution of a random variable X tells us what values X can take and how to assign probabilities to those values. Also called probability model

P-Value

The probability, computed assuming H0 is true, that the test statistic would take a value as extreme or more extreme than that actually observed. The smaller the p-value, the stronger the evidence against H0. (The easier it will be to reject H0).

Binomial Distribution Variables

The random variable representing the count X of successes in the binomial setting has the binomial distribution with parameters n and p. The parameter n is the total number of observations. The parameter p is the probability of success on each observation. The count of successes X can be any whole number between 0 and n.

Alternative Hypothesis, Ha

The research hypothesis; the statement about a population parameter we intend to demonstrate is true. Claims that the effect we are looking for does exist.

Central Limit Theorem

The simulations helped us understand what happens all the time. Thus, we do not need to simulate over and over again. The central limit theorem gives information about the shape of the sampling distribution of the sample proportion when certain conditions hold.

Test Statistic

The test statistic has the structure: z = (observed value - null value)/SE = (p(hat) - p0)/root(p0(1 - p0)/n), where p(hat) is the sample proportion and p0 is the proportion believed to be true under the null hypothesis.
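
A minimal sketch of this calculation in Python; the counts (230 successes out of 400, testing H0: p = 0.5) are hypothetical, not from the notes, and SciPy is assumed:

```python
import math
from scipy.stats import norm

x, n, p0 = 230, 400, 0.5                # hypothetical data and null value
p_hat = x / n

se_null = math.sqrt(p0 * (1 - p0) / n)  # SE computed under H0
z = (p_hat - p0) / se_null              # (observed - null) / SE

p_value = 2 * (1 - norm.cdf(abs(z)))    # two-sided p-value
print(round(z, 2), round(p_value, 4))   # 3.0 0.0027
```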

Binomial Experiment Requirements

There are a fixed number of trials, n. Each observation falls into one of just two categories (called success and failure). The probability of a success is the same for each trial and is labeled p. The n trials are all independent.

Second Piece of the function

To obtain the probability of any one specific outcome we use P("outcome") = p^(#S) (1 - p)^(#F), where p = probability of success, 1 - p = probability of failure, #S = number of successes (k), and #F = number of failures (n - k).

Cautions about Sampling

Undercoverage: some individuals or groups in the population are left out of the process of choosing the sample. Nonresponse: individuals chosen for the sample cannot be contacted or refuse to cooperate/respond. Response bias: behavior of the respondent or interviewer may lead to inaccurate answers or measurements. Wording of questions: confusing or leading (biased) questions; words with different meanings. Unfortunately, random sampling cannot handle all types of possible bias.

Procedure for finding Critical Value

Using standard normal table: Once we know the area in the middle (call it A), we can use it to find the critical value. If area A is found in the middle of our standard normal distribution, then the area 1 - A represents the total area in both tails. Thus, the area in one tail would be (1-A)/2

Suppose we have IQ test results that we know are normally distributed with mean 100 and standard deviation 15. (We write N(100, 15)). Find the probability that a randomly selected person scores below 112.

We are looking for P(X < 112). We must standardize: z = (x - μ)/σ = (112 - 100)/15 = 0.80. So P(Z < 0.80) lets us use the table: P(Z < 0.80) = 0.7881. Thus there is a 78.81% chance that a randomly selected individual scored below 112 on the IQ test.
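
The same answer in Python (assuming SciPy), either by standardizing first or by handing μ and σ to norm.cdf directly:

```python
from scipy.stats import norm

mu, sigma = 100, 15
z = (112 - mu) / sigma                               # z = 0.80
print(round(norm.cdf(z), 4))                         # P(Z < 0.80) ~ 0.7881
print(round(norm.cdf(112, loc=mu, scale=sigma), 4))  # same result without standardizing
```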

Making a Decision using the P-value

We compare the p-value with the significance level α, which is decided before conducting the test. If the p-value is less than or equal to α (p ≤ α), we reject H0. If the p-value is greater than α (p > α), we fail to reject H0. If the p-value is as small as or smaller than α, we say that the data are statistically significant at level α.

What happens when we construct many confidence intervals?

We will simulate the process of constructing many, many confidence intervals. This will help us clearly understand the interpretation of a confidence interval.
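
A small simulation sketch of this idea, with hypothetical settings (true p = 0.4, n = 100, 1000 repetitions) and NumPy/SciPy assumed to be available:

```python
import math
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
p_true, n, reps = 0.4, 100, 1000   # hypothetical settings
z_star = norm.ppf(0.975)           # 95% critical value, ~1.96

hits = 0
for _ in range(reps):
    x = rng.binomial(n, p_true)                       # simulate one sample
    p_hat = x / n
    me = z_star * math.sqrt(p_hat * (1 - p_hat) / n)  # margin of error
    if p_hat - me <= p_true <= p_hat + me:            # did the interval capture p?
        hits += 1

print(hits / reps)   # should be close to 0.95
```

Roughly 95% of the simulated intervals capture the true proportion, which is exactly what the confidence level claims about the method.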

Empirical Rule

Approximately 68% of the data will lie within 1 standard deviation of the mean. Approximately 95% of the data will lie within 2 standard deviations of the mean. Approximately 99.7% of the data will lie within 3 standard deviations of the mean.
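
These percentages can be checked against the standard normal curve (a sketch, assuming SciPy):

```python
from scipy.stats import norm

for k in (1, 2, 3):
    area = norm.cdf(k) - norm.cdf(-k)   # area within k standard deviations of the mean
    print(k, round(area, 4))            # ~0.6827, 0.9545, 0.9973
```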

Things that can go wrong in choosing a sample for a population

BIAS: a survey method is biased if it has a tendency to produce an untrue value. Three types: measurement bias, sampling bias, and use of an estimator that is biased (we will not do this).

Bias

Bias is measured as the distance between the mean value of the estimator (the center of the distribution) and the population parameter.

Categorical Variables

Calculating a mean is impossible, BUT we can count the "successes": p(hat) = # who approve (x) / total # in sample (n).

Example: Aspirin claims that the proportion of headache sufferers who get relief with two pills is 53%. What is the probability that in a random sample of 400 headache sufferers, less than 50% get relief? We are given p = 0.53 and n = 400. We are asked to find P(p(hat) < 0.50).

Check that p(hat) is approximately normal. Find the mean and SE: μ = 0.53, SE ≈ 0.025. Standardize: z = (0.50 - 0.53)/0.025 = -1.20. P(Z < -1.20) = 0.1151.
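
The same steps in Python (assuming SciPy); the small difference from 0.1151 comes from not rounding the SE to 0.025:

```python
import math
from scipy.stats import norm

p, n = 0.53, 400
se = math.sqrt(p * (1 - p) / n)   # ~0.0250
z = (0.50 - p) / se               # ~ -1.20
print(round(se, 4), round(z, 2), round(norm.cdf(z), 4))   # probability ~ 0.115
```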

Null Hypothesis, H0

Claims that the effect we are looking for does not exist. It is the "no change" or "no difference" hypothesis. A skeptical statement about a population parameter. The hypothesis test is designed to assess the strength of the evidence against the null hypothesis.

Continued...

Next, we want to calculate the probability of one of these sequences happening. What is the probability of getting, say, SSFFF in five coin flips? Said another way, what is the probability of getting a head on the first flip and a head on the second and a tail on the third and a tail on the fourth and a tail on the fifth? We know the probability of getting a head on a coin flip is 0.5 (getting a tail is also 0.5). In addition, getting a head on the first flip has no effect on getting a head or a tail on the second or any of the other flips, so coin flips are independent. Using the Multiplication Rule for independent events (remember a head is a success): P(SSFFF) = P(S) x P(S) x P(F) x P(F) x P(F) = P(S)^2 x P(F)^3 = 0.5^2 x 0.5^3 = 0.5 x 0.5 x 0.5 x 0.5 x 0.5 = 0.03125

Sampling Bias

Occurs when a sample is used that is not representative of the population Example: Internet polls - people who answer these polls tend to be those who have strong feelings about the results and are not necessarily representative of the population

Measurement Bias

Results from asking questions that do not produce a true answer; occurs when measurements tend to record values larger (or smaller) than the true value. Example: asking people "How much do you earn?" It is likely that people will report a value higher than their actual salary, resulting in an estimate that tends to be too large. Common sources: self-reporting of personal data, the use of confusing wording in survey questions, and the use of non-neutral language in questions.

Standard Error

SE = sigma_p(hat) = root(p(1 - p)/n). In reality we don't know p, so the estimated SE = root(p(hat)(1 - p(hat))/n).

Example of Sample: Consider a simple random sample of size n = 2 from a population of N = 4. Population: {A, B, C, D}. How many different samples of 2 can be taken (assuming no replacement)?

Samples: {AB, AC, AD, BC, BD, CD}. Each sample is equally likely.

Sampling Distribution

Sampling Distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population.

Conditions for CLT for Sample Proportions

Sampling is random and independent. May sample with or without replacement. Large sample: the sample size, n, is large enough that the sample expects at least 10 successes and 10 failures; np ≥ 10 and n(1 - p) ≥ 10. Big population: if sampling is done without replacement, the population must be at least 10 times larger than the sample size.

Confidence Level

Tells us how often the estimation is successful. Measures the success rate of the method, not of any one particular interval.

Continuous Probability Distribution

The continuous random variable, X takes all values in an interval of numbers (often measurements). The continuous probability distribution assigns probabilities as areas under a density curve.

Properties continued

The curve is symmetric about the mean (i.e. area under the curve to the left of the mean is equal to the area under the curve to the right of the mean). The mean = median = mode. So, the highest point of the curve is at x = μ. The curve has inflection points at (μ - σ) and (μ + σ). The total area under the curve is equal to 1. As x gets larger and larger (in either the positive or negative directions), the graph approaches but never reaches the horizontal axis.

Binomial Probability Function

The first part of the function gives us the number of ways of arranging exactly k successes among n trials. This is also called the number of possible combinations. This piece of the function is also known as the binomial coefficient. n!/(k!(n-k)!)

Variable

defined to be a characteristic of an individual

Sampling Design

describes exactly how to choose a sample from the population.

The probability that on entering college, a student will graduate is 0.77. An academic advisor selected a random sample of 12 students and followed that group for 6 years. What is the probability of the following events. Exactly 10 of the 12 graduate? Less than half graduate? 8 or more graduate? Between 7 and 9 inclusive graduate?

n = 12, p = 0.77. Exactly 10: P(X = 10). Less than half: P(X ≤ 5). 8 or more: P(X ≥ 8). Between 7 and 9 inclusive: P(7 ≤ X ≤ 9).
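
These four probabilities can be computed with scipy.stats.binom (a sketch, assuming SciPy is installed):

```python
from scipy.stats import binom

n, p = 12, 0.77

print(binom.pmf(10, n, p))                        # P(X = 10): exactly 10 graduate
print(binom.cdf(5, n, p))                         # P(X <= 5): less than half graduate
print(1 - binom.cdf(7, n, p))                     # P(X >= 8): 8 or more graduate
print(binom.cdf(9, n, p) - binom.cdf(6, n, p))    # P(7 <= X <= 9): between 7 and 9 inclusive
```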

Type II error

occurs if one does not reject H0 when it is false.

Type I error

occurs if one rejects a true H0

Confidence interval equation

p(hat) - z*root(p(hat)(1 - p(hat))/n) < p < p(hat) + z*root(p(hat)(1 - p(hat))/n)
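
Putting the pieces together, a minimal Python sketch of this interval; the sample counts (430 approvals out of 1024 respondents) are hypothetical, not from the notes:

```python
import math
from scipy.stats import norm

x, n = 430, 1024   # hypothetical sample: approvals and sample size
conf = 0.95

p_hat = x / n
z_star = norm.ppf(1 - (1 - conf) / 2)              # critical value, ~1.96
me = z_star * math.sqrt(p_hat * (1 - p_hat) / n)   # margin of error

print(round(p_hat - me, 4), round(p_hat + me, 4))  # lower and upper limits of the CI
```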

