STAT chapter 9 notes

confidence interval

a confidence interval is an interval of values that the researcher is fairly sure will cover the true, unknown value of the population parameter
-in other words, we use a confidence interval to estimate the value of a population parameter
-we cannot obtain the parameter as one specific number, but we can give an interval that is likely to contain it

estimate the population proportion from a single sample proportion

-we can find how far apart the sample proportion and the true population proportion are likely to be; that information is contained in the standard deviation of p-hat
-standard deviation of p-hat: s.d.(p-hat) = √(p(1-p)/n), where p = the population proportion and p-hat = a sample proportion
-standard error of p-hat: s.e.(p-hat) = √(p-hat(1-p-hat)/n)
-ex. if p-hat = .39 and n = 2400, the standard error is √(.39(1-.39)/2400) = .01
-the standard error estimates the theoretical standard deviation of the sampling distribution for all possible sample proportions from this population
-because the observed value p-hat is almost surely within 3 standard deviations of the mean of its sampling distribution, which is the true proportion p, we know that p is almost surely within the range p-hat +/- 3(standard errors) = .39 +/- 3(.01) = .39 +/- .03
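The arithmetic in this card can be checked with a short Python sketch. The function name `proportion_interval` and its `multiplier` argument are illustrative choices, not notation from the notes; the inputs .39 and 2400 are the ones used in the example.

```python
import math

def proportion_interval(p_hat, n, multiplier=3):
    """Return (standard error, lower, upper) for p-hat +/- multiplier * s.e."""
    # Standard error of p-hat: sqrt(p-hat * (1 - p-hat) / n)
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return se, p_hat - multiplier * se, p_hat + multiplier * se

# Values from the example: p-hat = .39, n = 2400
se, low, high = proportion_interval(0.39, 2400)
print(round(se, 4), round(low, 2), round(high, 2))  # 0.01 0.36 0.42
```

The unrounded standard error is slightly below .01, so the interval endpoints agree with .39 +/- .03 once rounded to two decimals.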

from curiosity to questions about parameters

curiosity about something -> question about a parameter ->collect data appropriate for that parameter -> make inferences about the value of the parameter

increasing the sample size

if the sample size increases:
-the mean of the sampling distribution of possible sample means would remain the same (8 is still 8, which in the example was μ)
-the standard deviation of x-bar would decrease: σ/√n = 5/√100 = .5 instead of 5/√25 = 1 (the range based on the empirical rule gets smaller as n increases)
-the larger the sample, the more accurate the estimates of the population values tend to be
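A minimal Python sketch of how σ/√n shrinks as n grows, using σ = 5 from the example. The helper name `sd_of_sample_mean` and the extra sample size 400 are illustrative assumptions.

```python
import math

def sd_of_sample_mean(sigma, n):
    """Standard deviation of x-bar for samples of size n: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

# sigma = 5 as in the example; quadrupling n halves the standard deviation
for n in (25, 100, 400):
    print(n, sd_of_sample_mean(5, n))
# 25 1.0
# 100 0.5
# 400 0.25
```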

parameter

is a number that is a summary characteristic of a population, a random situation, or a comparison of populations
-also known as a population parameter
-we assume that the parameter is a fixed, unchanging value
-usually the parameter is not known to us and will never be known, because we cannot measure every unit of the population
-even so, we will be able to use statistical methods to make an informed guess

notation for the mean of paired differences

-the sample consists of n matched pairs of observations
-the response variable is quantitative
-di = the difference in the two measurements for individual i, where i = 1, 2, ..., n
-μd = the mean for the population of differences, if all possible pairs were to be measured
-σd = the standard deviation for the population of differences
-d-bar = the mean for the sample of differences
-sd (the d is a subscript) = the standard deviation for the sample of differences

statistical inference

the procedures we use for drawing conclusions about a population from sample data
-the two most common procedures are to find confidence intervals and to conduct hypothesis tests

standard deviation of measurements versus standard deviation of sample means

-the standard deviation of the population (individual measurements) is σ
-the standard deviation of the sample mean is σ/√n

paired difference or independent sample

two phrases to distinguish: "population mean for paired differences" and "difference in two population means"
>paired differences - data that are formed by taking the differences in matched pairs are called paired differences
>population mean for paired differences - once we have taken differences, the parameter of interest is the population mean for paired differences, which is the mean of the differences of matched pairs for the entire population
>dependent samples - samples taken as matched pairs are sometimes called dependent samples because the two values for a pair are not statistically independent
>independent samples - with two independent samples, the individuals in one sample are not coupled in any way with the individuals in the other sample
-ex. measuring the heights of brothers and sisters would not produce independent samples, but random measurements of girls and boys in the city would
-samples can also be independent when one sample is divided into two groups based on a categorical grouping variable, such as smokers and nonsmokers or girls and boys
>difference in two population means - the parameter of interest with quantitative data from independent samples

conditions for which the approximate normality of the sampling distribution for a sample proportion applies

-for each situation, 3 conditions must be met for the approximate normality of the sampling distribution to apply
1. the physical situation -> either there is an actual population with a fixed proportion who have a trait or opinion of interest, or there is a repeatable situation for which an outcome of interest occurs with a fixed relative-frequency probability
2. data collection -> either a random sample is selected from an actual population, or the situation is repeated numerous times, with the outcome each time independent of the outcomes all other times
3. sample size or number of trials -> the size of the sample or number of binomial trials must be large enough that we expect to see at least ten of each of the two possible responses or outcomes; that is, np and n(1-p) must each be at least 10
ex. 1 in 5 people win with a lottery ticket, and you buy 100 tickets
-np = 20
-n(1-p) = 80
-mean = p = .20
-standard deviation = √(p(1-p)/n) = √((.2)(.8)/100) = .04
-there is about a 68% chance that the proportion of the 100 tickets that will give a prize will be between .16 and .24
-about a 95% chance that the proportion of winning tickets will be between .12 and .28
-about a 99.7% chance that the proportion of winning tickets will be between .08 and .32
different ex. the probability was looked up in the center part of the A1 table, giving P(p-hat ≥ .53) = .09
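The lottery example can be reproduced with a short Python sketch that checks condition 3 and then applies the empirical rule. The function names and the `minimum` argument are illustrative assumptions; conditions 1 and 2 concern how the data arise and cannot be checked in code.

```python
import math

def normal_approx_ok(n, p, minimum=10):
    """Condition 3: expect at least `minimum` of each of the two outcomes."""
    return n * p >= minimum and n * (1 - p) >= minimum

def empirical_rule_ranges(n, p):
    """Ranges p +/- k * s.d.(p-hat) for k = 1, 2, 3 (about 68%, 95%, 99.7%)."""
    sd = math.sqrt(p * (1 - p) / n)
    return {k: (p - k * sd, p + k * sd) for k in (1, 2, 3)}

# Lottery example: p = .20, n = 100 tickets
n, p = 100, 0.20
print(normal_approx_ok(n, p))  # True
for k, (low, high) in empirical_rule_ranges(n, p).items():
    print(k, round(low, 2), round(high, 2))
# 1 0.16 0.24
# 2 0.12 0.28
# 3 0.08 0.32
```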

conditions for the sampling distribution of the mean to be approximately normal

-for the sampling distribution of the sample mean to be approximately normal, unlike the sampling distribution for proportions, it is not always necessary to have a large sample
-the sampling distribution of the sample mean is approximately normal in both of the following types of situations
situation 1: the population of the measurements of interest is bell-shaped, and a random sample of any size is measured
situation 2: the population of measurements of interest is not bell-shaped, but a large random sample is measured. 30 is usually used as an arbitrary demarcation of large, but if there are extreme outliers it is better to have an even larger sample size
when a population of observations is bell-shaped and/or a large random sample is to be taken, the sampling distribution for a sample mean can be defined as follows
let μ = mean for the population of interest
let σ = standard deviation for the population of interest
let x-bar = sample mean (the mean of the sample)
mean = μ (does not change with the sample size because it represents the population)
standard deviation = σ/√n -> the standard deviation of x-bar

hypothesis testing

-hypothesis testing, or significance testing, uses sample data to attempt to reject a hypothesis about the population
-usually researchers want to reject the notion that chance alone can explain the sample results
-with two categorical variables, the hypothesis that researchers set up to reject is that the variables are unrelated to each other in the population, so that any observed relationship in the sample is simply due to chance
-in most research settings the desired conclusion is that the variables under scrutiny are related
-hypothesis testing is applied to population parameters by specifying a null value for a parameter: a value that would indicate that nothing of interest is happening
-ex. the null hypothesis is that the average weight loss is 0 pounds for clinic patrons, so the null value would be 0 pounds
-the clinic's goal, of course, would be to show that something more interesting is happening and that in fact the average weight loss is greater than 0 pounds
-in most cases the researcher hopes to show that the null value is not correct

a common format for the five sampling distributions

-in each case, as long as certain conditions are met, the sampling distribution is approximately normal
-the mean of the sampling distribution is the population parameter that corresponds to the sample statistic (the parameter of interest estimated by the statistic)
-the standard deviation of a sampling distribution measures how the values of the sample statistic might vary across different samples from the same population
-the standard deviation and mean are vital to specifying the sampling distribution for any sample statistic
-in each of the five cases the mean of the sampling distribution for the statistic is the population parameter that the statistic estimates
-the population proportion p is estimated by p-hat (the sample proportion)
-the population mean μ is estimated by x-bar (the sample mean)

a statistic or a sample statistic

-is a number that is computed from a sample of values taken from a larger population
-it is a numerical summary of the sample data
-the sample data may be collected in a survey, an observational study, or an experiment

sd module: sampling distribution for the difference in two sample means

-now we are looking at questions that can be answered by looking at the difference in two means for independent samples
-we need to collect independent samples from the two groups or populations
-in independent samples the individuals are not coupled in any way
-you can use a categorical variable to form two groups

sampling distribution module 4 sampling distribution for the mean of paired differences

-paired differences for a sample of pairs can be used to estimate the mean difference for the population of pairs
-data come either from independent samples or from matched pairs, in which observations are taken on the two halves of a pair (matched pairs are sometimes called dependent samples)

chapter 9 random

-sample statistics are variables: their possible values differ from sample to sample
-if 2 different samples are taken from the same population, it is likely that the sample statistic will be different for those two samples
-ex. if we take 2 different random samples of 1000 people (asked who was left handed), the proportions would probably differ (you will not get the same exact numbers)
-each of these sample proportions is an example of a sample statistic
-the corresponding population parameter is the proportion of the entire population that is left handed
-note that the population parameter remains fixed although we do not know its numerical value
-the value of a sample statistic may change from sample to sample, and we will know the value once we have measured a sample
general idea -> one population parameter (fixed) and many possible values of a sample statistic
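A small simulation can illustrate the general idea of one fixed parameter and many possible statistic values. This is only a sketch: the population proportion of .10, the seed, and the name `sample_proportion` are hypothetical choices for illustration, not values from the notes.

```python
import random

def sample_proportion(p, n, rng):
    """Proportion of 'successes' in one random sample of size n."""
    return sum(rng.random() < p for _ in range(n)) / n

rng = random.Random(1)   # fixed seed so the run is repeatable
p = 0.10                 # hypothetical fixed population proportion
# Five separate samples of 1000 people each from the same population
estimates = [sample_proportion(p, 1000, rng) for _ in range(5)]
print(estimates)
```

The parameter `p` never changes inside the loop, while the printed sample proportions will typically differ from one another and from .10 by small amounts.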

sampling distribution module 2 sampling distribution for the difference in 2 sample proportions

-sometimes we want to compare a proportion or probability for two populations
-another way to make a comparison in this situation is to examine the difference in two population proportions
-this is appropriate when we have two populations
-there are usually 2 questions of interest about a difference in 2 population proportions
-we want to estimate the value of the difference
-often we want to test the hypothesis that the difference is 0, which would indicate that the two proportions are equal
-we need to collect independent samples
a summary of three ways to create samples that can be considered to be independent in this situation:
-take separate representative samples from each of two populations, such as men and women, and find the sample proportion of interest for each sample
-take one representative sample from a population and use a measured categorical variable, such as male/female or smoker/nonsmoker, to divide it into two groups; find the sample proportion of interest for each of the two groups
-randomly assign participants in a randomized experiment to two treatments, such as aspirin or placebo, and find the sample proportion of interest in each condition

standard deviation and standard error of statistics

-the formula for the standard deviation of the sampling distribution differs for each of the five cases
-in each case it depends on the sample size, and the standard deviation of the sampling distribution gets smaller for larger samples (as the sample size gets larger, the variability among possible values of the statistic gets smaller)
-the standard deviation of the sampling distribution of the sample mean is called the standard deviation of x-bar
-the standard deviation of the sampling distribution of a sample proportion is called the standard deviation of p-hat
-we use the term standard error to describe the estimated standard deviation for a sampling distribution, ex. the standard error of x-bar, the standard error of p-hat

sampling distribution module 3: sampling distribution for one sample mean

-the information of interest involves the mean or means of quantitative variables
-how close will sample means be to the population mean?
-ex. mean = 8 pounds, s.d. = 5 pounds -> by the empirical rule, about 95% fall within 2 standard deviations: 8 - 10 = -2 to 8 + 10 = 18
-when we are interested in using the sample mean to estimate the population mean, we will need to use the sample standard deviation to help find a standard error for this estimate

mean, standard deviation, and standard error for the sampling distribution of d-bar

-the mean of the sampling distribution of d-bar is μd, the population mean of the differences
-the standard deviation of d-bar is s.d.(d-bar) = σd/√n
-the standard error of d-bar is s.e.(d-bar) = sd/√n, where sd is the sample standard deviation of the differences
-the standard error is used to estimate the standard deviation when σd is not known
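As a sketch of how d-bar and its standard error are computed from raw paired data, the Python below uses six made-up before/after weights. The data and variable names are hypothetical and serve only to show the steps.

```python
import math
import statistics

# Hypothetical paired measurements (same six individuals, two occasions)
before = [150, 162, 138, 171, 145, 158]
after = [153, 165, 137, 176, 149, 160]

# One difference per pair: d_i = after_i - before_i
diffs = [a - b for a, b in zip(after, before)]
d_bar = statistics.mean(diffs)           # sample mean of the differences
s_d = statistics.stdev(diffs)            # sample s.d. of the differences (n - 1 divisor)
se_d_bar = s_d / math.sqrt(len(diffs))   # standard error of d-bar = s_d / sqrt(n)
print(diffs, d_bar, round(se_d_bar, 3))
```

Once the differences are taken, the two columns are no longer needed: every later step works on the single list `diffs`, exactly as for a one-sample mean.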

conditions for the sampling distribution of d-bar to be approximately normal

-the same conditions that are required for the sampling distribution of x-bar to be approximately normal also hold for the sampling distribution of d-bar; either of the following two situations will work
situation 1: the population of differences is bell-shaped, and a random sample of any size is measured
situation 2: the population of differences is not bell-shaped, but a large random sample is measured. 30 is usually used as an arbitrary demarcation of large, but if there are extreme outliers it is better to have an even larger sample size

random chapter 9

-the sampling distribution of the statistic that estimates the parameter
-a confidence interval procedure to estimate the parameter to within an interval of possibilities
-a hypothesis test procedure to test whether the parameter equals a specific value

sampling distribution (SD) 0: an overview of sampling distributions

-the relationship between sample statistics and parameters allows us to determine how accurately sample statistics represent the population and to create a confidence interval
-it also allows us to determine the extent to which sample results are surprising or unusual
statistics as random variables:
-not every sample will have the same results
>random variable - assigns a number to the outcome of a random circumstance
-sample statistics are like random variables
-a sample statistic has a probability distribution, which is called a sampling distribution
-the sampling distribution for a statistic describes the possible values the statistic might have when random samples are taken from a population
-the sampling distribution tells us how likely a sample statistic is to be within a specified distance of the unknown population parameter
-that tells us how confident we can be that an interval extending that specified distance above and below the sample statistic will reach the population parameter
-a sampling distribution is the probability distribution for a sample statistic, not for individual values in the population
-sampling distributions give us information about sample statistics
-when we have the sampling distribution we can use it to make inferences about the population parameter

the big five parameters

-two basic types of variables -> categorical and quantitative
-the best way to summarize categorical data is to find the proportion or percentage that falls into each category
-one of the most useful summaries for quantitative data is the mean
-the five parameters all involve either proportions (for categorical data) or means (for quantitative data)
-parameter -> summarizes a population
-statistic -> summarizes a sample
the five parameters, each with the symbol for the population parameter and the symbol for the sample statistic:
for categorical data
1. one population proportion (or probability): parameter p, statistic p-hat
2. difference in 2 population proportions: parameter p1-p2, statistic p1-p2 (hat), i.e. p-hat1 - p-hat2
for quantitative data
3. one population mean: parameter μ, statistic x-bar
4. population mean of paired differences (dependent samples): parameter μd, statistic d-bar
5. difference in 2 population means (independent samples): parameter μ1-μ2, statistic x1-x2 (bar), i.e. x-bar1 - x-bar2

sampling distribution (SD): sampling distribution for one sample proportion

-we conduct an experiment with n trials and a success probability p, and get successes on x trials
-the parameter of interest is the population proportion p
-in each case we can compute the statistic p-hat = the sample proportion = x/n (the proportion of trials resulting in success)
-if we repeated the experiment we would probably get a different number for p-hat
-x is a binomial random variable
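Because x is binomial, the distribution of p-hat = x/n can be listed exactly for a small case. The sketch below does this in Python and compares the resulting mean and standard deviation with p and √(p(1-p)/n). The values n = 50 and p = .3 and the function name are hypothetical, and `math.comb` assumes Python 3.8 or later.

```python
import math

def p_hat_distribution(n, p):
    """Exact distribution of p-hat = x/n when x is binomial(n, p)."""
    return {x / n: math.comb(n, x) * p**x * (1 - p) ** (n - x)
            for x in range(n + 1)}

n, p = 50, 0.3  # hypothetical number of trials and success probability
dist = p_hat_distribution(n, p)

# Mean and standard deviation of p-hat computed directly from its distribution
mean = sum(v * prob for v, prob in dist.items())
sd = math.sqrt(sum((v - mean) ** 2 * prob for v, prob in dist.items()))
print(round(mean, 4), round(sd, 4), round(math.sqrt(p * (1 - p) / n), 4))
```

The first printed value matches p, and the second and third values agree, which is the mean and standard deviation used for the sampling distribution of p-hat.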

reading and pronouncing the notation for parameters and statistics

-μ is pronounced "mew" and is used for population means
-x-bar and d-bar are notations for sample means
-p is used for a population proportion
-p-hat is used for a sample proportion
-the smaller numbers attached to the symbols are subscripts; p-hat1 - p-hat2 is read as "p hat one minus p hat two"
-μd is read as "mu-d" or "mu-sub-d"
-standard deviation (population) -> σ
-standard deviation (sample) -> s

familiar examples translated into questions about parameters

estimating the proportion falling into a category of a categorical variable:
-what proportion of American adults believe there is extraterrestrial life?
-in what proportion of opposite-sex British marriages is the wife taller than her husband?
population parameter: p = proportion of the population falling into that category
sample estimate: p-hat = proportion of the sample falling into that category
estimating the difference between two population proportions falling into a category of a categorical variable:
-how much difference is there between the proportions who would quit smoking if smokers were to take the antidepressant (Zyban) versus if they were to wear a nicotine patch?
-how much difference is there between the proportion of men who don't snore?
population parameter: p1-p2, where p1 and p2 represent the proportions falling into the category of interest in populations 1 and 2 respectively
sample estimate: p1-p2 (hat), the difference between the two sample proportions
estimating the mean of a quantitative variable:
-what is the mean time that college students watch tv per day?
-what is the mean pulse rate of women?
population parameter: μ = population mean for the variable
sample estimate: x-bar = sample mean for the variable
estimating the mean of paired differences for quantitative variables:
-what is the mean difference in weights for freshmen at the beginning and at the end of the first semester?
-what is the mean difference in age between husbands and wives in Britain?
population parameter: μd = population mean of the differences in the two measurements
sample estimate: d-bar = mean of the differences for a paired sample of the two measurements
estimating the difference between two population means for a quantitative variable:
-how much difference is there in mean weight loss for those who diet compared to those who exercise to lose weight?
-how much difference is there between the mean foot lengths of men and women?
population parameter: μ1-μ2, where μ1 and μ2 represent the means in populations 1 and 2 respectively
sample estimate: x1-x2 (bar), the difference between the two sample means

notation for the difference in two proportions

in the notation scheme for two proportions we will continue to use p to denote a population proportion and p-hat to denote a sample proportion, and we will use subscripts 1 and 2 to represent the groups
for the populations:
p1 = population proportion for the first population
p2 = population proportion for the second population
the parameter of interest is p1-p2, the difference in population proportions
for the samples:
p-hat 1 = sample proportion for the sample from the first population
p-hat 2 = sample proportion for the sample from the second population

sample estimate or estimate

is sometimes used for a sample statistic when the statistic is used to estimate the unknown value of a population parameter

the sampling distribution for a sample proportion

let p = population proportion of interest or binomial probability of success
let p-hat = corresponding sample proportion or proportion of successes
-if numerous samples or repetitions of the same size n are taken, the distribution of possible values of p-hat is approximately a normal curve distribution with
mean = p
standard deviation = s.d.(p-hat) = √(p(1-p)/n), "the standard deviation of p-hat"
-this approximate normal distribution is called the sampling distribution of p-hat
note -> the n individuals in the sample or the repetitions must be independent, equivalent to condition 3 in a binomial experiment

many possible samples

normally research is only done once but statisticians have calculated what to expect for the vast majority of possible samples and how much variability to expect among them

example

p-hat 1 = 385/1017 = .379 (sample proportion of women opposing the death penalty)
p-hat 2 = 254/885 = .287 (sample proportion of men opposing the death penalty)
difference: p1-p2 (hat) = .379 - .287 = .092
-if we were to repeat this study with the same number of men and women, we most likely would not get .092 again as the difference
-√(.379(1-.379)/1017 + .287(1-.287)/885) ≈ .0215 = the standard error, which estimates the standard deviation of the sampling distribution
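The calculation in this example can be sketched in Python from the raw counts. The function name `se_diff_proportions` is an illustrative choice. The code uses the unrounded sample proportions, so the third decimal can differ slightly from a hand calculation with rounded inputs.

```python
import math

def se_diff_proportions(x1, n1, x2, n2):
    """Return (p-hat1 - p-hat2, standard error of the difference)."""
    p1, p2 = x1 / n1, x2 / n2
    # s.e. = sqrt(p1(1-p1)/n1 + p2(1-p2)/n2), with sample proportions plugged in
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return p1 - p2, se

# Counts from the example: 385 of 1017 women, 254 of 885 men
diff, se = se_diff_proportions(385, 1017, 254, 885)
print(round(diff, 3), round(se, 4))  # 0.092 0.0215
```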

the sampling distribution for a sample proportion

p-hat = X/n
-dividing each possible value of X by the sample size n does not change the shape of the distribution of possible values
-in other words, for large n, the sampling distribution for a sample proportion is approximately a normal distribution (bell-shaped curve)
-this can be applied in two ways
-situation 1: a random sample is taken from a relatively large actual population and a categorical variable is measured
-situation 2: a binomial experiment is repeated numerous times and the proportion of successes is measured

notation for the difference in two means

the data consist of n1 observations from the first population and n2 observations from the second population; the response variable measured in both samples is quantitative
μ1 = population mean for the first population
μ2 = population mean for the second population
the parameter of interest is μ1-μ2 = the difference in population means
x-bar1 = sample mean for the sample from the first population
x-bar2 = sample mean for the sample from the second population

mean, standard deviation, and standard error for sampling distribution of p1-p2(hat )

-the mean of the sampling distribution of p1-p2 (hat) is p1-p2 = the difference in population proportions
-the standard deviation of the sampling distribution is the standard deviation of p1-p2 (hat), which is s.d.(p1-p2 hat) = √(p1(1-p1)/n1 + p2(1-p2)/n2)
-when we do not know the population proportions we use the sample proportions instead, which gives the standard error of p1-p2 (hat): s.e.(p1-p2 hat) = √(p1(1-p1)/n1 + p2(1-p2)/n2), where all p's are p-hats
-the variance of any random variable is the standard deviation squared: if variance = x, then standard deviation = √x

statistical significance

the phrase statistical significance is used when the observed sample statistic would be unlikely if the null value were correct
-this is considered to be evidence that the null value is not correct
-ex. if the weight loss clinic observes that a sample of patrons had an average weight loss of 10 pounds and determines that this would be unlikely if the true population average weight loss were 0 pounds, it has statistically significant evidence for the claim that the average population weight loss is greater than 0 pounds

standard error of the mean

-σ = the population standard deviation
-s = the sample standard deviation, used in place of σ when determining the standard deviation for the sampling distribution of sample means
-when making this substitution we call the result the standard error of the mean
-the standard error measures roughly how much, on average, the sample mean x-bar is in error as an estimate of the population mean
-the standard error of the mean = s/√n, where s is the standard deviation of the observations in the sample

sampling distribution

the sampling distribution for a statistic is the probability distribution of possible values of the statistic for repeated samples of the same size taken from the same population

conditions for the sampling distribution of p1-p2 (hat) to be approximately normal

the sampling distribution of the difference in two independent sample proportions is approximately normal when both of these conditions hold
condition 1: sample proportions are available for two independent samples, randomly selected from the two populations of interest
condition 2: all of the quantities n1p1, n1(1-p1), n2p2, and n2(1-p2) are at least 10; these represent the expected numbers of successes and failures in each of the 2 samples
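Condition 2 is easy to check in code. The Python sketch below is a minimal illustration; the function name and the `minimum` argument are assumptions, and because the population proportions are usually unknown, the values passed in would in practice be sample proportions or hypothesized values. Condition 1 depends on how the samples were collected and cannot be checked this way.

```python
def two_sample_counts_ok(n1, p1, n2, p2, minimum=10):
    """Condition 2: all expected successes and failures are at least `minimum`."""
    expected = [n1 * p1, n1 * (1 - p1), n2 * p2, n2 * (1 - p2)]
    return all(count >= minimum for count in expected)

# Large samples with moderate proportions satisfy the condition
print(two_sample_counts_ok(1017, 0.379, 885, 0.287))  # True
# A small sample with a rare outcome does not (40 * 0.05 = 2 expected successes)
print(two_sample_counts_ok(40, 0.05, 40, 0.50))       # False
```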

