intro to social statistics: ch 4 – part 2
solving sampling distribution problems
1. check the assumptions
2. describe the distribution:
shape: normal
outliers: n/a
center: mean 𝜇
spread: standard deviation 𝜎/√n
notation: X-bar ~ N(𝜇, 𝜎/√n)
assumptions for the sampling distribution of a sample proportion
1. the data being used to make inferences must be from a simple random sample (SRS)
2. the sample size must be "large enough" for the central limit theorem to apply (for describing the shape of the distribution). the sample size is considered large enough when both the expected number of successes in the sample, n𝜋, and the expected number of failures in the sample, n(1-𝜋), are greater than or equal to 10
the 0-1 random variable itself is not normally distributed (it takes only the values 0 and 1, and the count of successes in the sample has a binomial distribution), which is why this large-sample condition is needed
convert 𝜋 to a decimal (for example, 40% becomes 0.40) before computing n𝜋 and n(1-𝜋)
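A minimal sketch of the large-sample check, assuming 𝜋 has already been converted to a decimal. The function name `proportion_conditions_met` is hypothetical; the SRS assumption cannot be checked in code because it depends on how the data were collected.

```python
def proportion_conditions_met(n: int, pi: float) -> bool:
    """Return True if n*pi >= 10 and n*(1 - pi) >= 10.

    pi must be a decimal between 0 and 1 (e.g. 0.40, not 40).
    The SRS assumption is not checked here; it comes from the study design.
    """
    if not 0 <= pi <= 1:
        raise ValueError("convert pi to a decimal between 0 and 1 first")
    expected_successes = n * pi          # n*pi
    expected_failures = n * (1 - pi)     # n*(1 - pi)
    return expected_successes >= 10 and expected_failures >= 10


print(proportion_conditions_met(50, 0.40))  # 20 successes, 30 failures expected -> True
print(proportion_conditions_met(20, 0.10))  # only 2 successes expected -> False
```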
assumptions of central limit theorem (CLT)
1. the data being used to make inferences must be from a simple random sample (SRS) selected from the population (this must be satisfied for the properties for the center and spread to hold)
2. either the population associated with the variable is normal, or the sample size is "large enough" for the CLT to apply (for describing the shape of the distribution)
how large should the sample size n be?
if the original distribution is not normal or is unknown, the distribution of X-bar is approximately normal by the CLT if n>=15
if the original distribution is not normal and is skewed or heavily skewed, the CLT applies if n>=30 (for slightly skewed, n>=15)
if the original distribution is not normal or is unknown and n<15, the shape of the distribution of X-bar cannot be determined
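The sample-size rules of thumb above can be sketched as a small lookup. This is a hypothetical helper, not a standard function; the shape labels are assumptions, "skewed" also covers heavily skewed populations, and a skewed population with n below 30 is reported as "cannot be determined" because the notes only say the CLT needs n>=30 in that case.

```python
def xbar_shape(population_shape: str, n: int) -> str:
    """Shape of the sampling distribution of X-bar, following the course rules of thumb.

    population_shape is one of: "normal", "unknown", "slightly skewed", "skewed".
    """
    if population_shape == "normal":
        return "normal"  # a normal population gives a normal X-bar for any n
    if population_shape == "skewed":
        # skewed or heavily skewed: the CLT needs n >= 30
        return "approximately normal" if n >= 30 else "cannot be determined"
    if population_shape in ("unknown", "slightly skewed"):
        # unknown or slightly skewed: the CLT applies once n >= 15
        return "approximately normal" if n >= 15 else "cannot be determined"
    raise ValueError(f"unrecognized population shape: {population_shape!r}")


print(xbar_shape("unknown", 20))  # approximately normal
print(xbar_shape("skewed", 20))   # cannot be determined
```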
sampling distribution of X-bar: finding probabilities
In-between
1. Hit 2nd VARS, option 2 - normalcdf
2. normalcdf(lower number, upper number, 𝜇, 𝜎/√n)
round to four places after the decimal
Greater than
1. Hit 2nd VARS, option 2 - normalcdf
2. normalcdf(lower number, 1E99, 𝜇, 𝜎/√n)
round to four places after the decimal
Less than
1. Hit 2nd VARS, option 2 - normalcdf
2. normalcdf(-1E99, upper number, 𝜇, 𝜎/√n)
round to four places after the decimal
for a probability about one randomly selected individual (rather than a sample mean), use 𝜎 in place of 𝜎/√n
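For checking calculator work, the same areas can be computed in Python. This is a sketch that mimics the TI-84 `normalcdf(lower, upper, mean, sd)` using the standard library's `statistics.NormalDist`; the function name `normalcdf` and the numbers (𝜇 = 100, 𝜎 = 15, n = 25) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def normalcdf(lower: float, upper: float, mean: float, sd: float) -> float:
    """Area under a normal curve between lower and upper (like the TI-84 normalcdf)."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(upper) - dist.cdf(lower)


# Hypothetical example: mu = 100, sigma = 15, n = 25, so SE(X-bar) = 15/sqrt(25) = 3
mu, sigma, n = 100, 15, 25
se = sigma / sqrt(n)

less_than = normalcdf(-1e99, 97, mu, se)          # P(X-bar < 97)
greater_than = normalcdf(103, 1e99, mu, se)       # P(X-bar > 103)
in_between = normalcdf(97, 103, mu, se)           # P(97 < X-bar < 103)
one_individual = normalcdf(-1e99, 97, mu, sigma)  # P(X < 97) for ONE person: use sigma

print(round(less_than, 4), round(greater_than, 4),
      round(in_between, 4), round(one_individual, 4))
# prints 0.1587 0.1587 0.6827 0.4207
```

The last line shows why the distinction matters: the same cutoff of 97 gives a much larger probability for one individual (0.4207) than for a sample mean (0.1587), because 𝜎 is larger than 𝜎/√n.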
properties for center and spread
center: the mean of X-bar is E(X-bar) = 𝜇, where E is the expectation of X-bar
spread: the standard deviation of X-bar is SE(X-bar) = 𝜎/√n, where SE is the standard error of X-bar
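A minimal sketch of the two properties, with hypothetical numbers (𝜇 = 100, 𝜎 = 15). The helper name `xbar_center_spread` is an illustration, not a standard function. It also shows a consequence of the √n in the denominator: four times the sample size gives half the standard error.

```python
from math import sqrt


def xbar_center_spread(mu: float, sigma: float, n: int) -> tuple[float, float]:
    """Return (E(X-bar), SE(X-bar)) = (mu, sigma / sqrt(n))."""
    return mu, sigma / sqrt(n)


print(xbar_center_spread(100, 15, 25))   # (100, 3.0)
print(xbar_center_spread(100, 15, 100))  # (100, 1.5): 4x the sample size halves the SE
```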
describing sampling distribution of sample proportion p
first, check the assumptions
shape: approximately normal by the CLT (when n𝜋 >= 10 and n(1-𝜋) >= 10)
outliers: n/a
center: the mean of p is E(p) = 𝜋, where E is the expectation of p
spread: the standard deviation of p is SE(p) = √(𝜋(1-𝜋)/n), where SE is the standard error of p
hence the sampling distribution of p is p ~ N(𝜋, √(𝜋(1-𝜋)/n))
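The description steps above can be sketched as one function, assuming 𝜋 is a known decimal. The name `describe_p` and the example values (n = 100, 𝜋 = 0.40) are hypothetical; the SE is formatted to four decimals to match the rounding convention in these notes.

```python
from math import sqrt


def describe_p(n: int, pi: float) -> str:
    """Describe the sampling distribution of the sample proportion p."""
    # check the large-sample assumption first
    if n * pi < 10 or n * (1 - pi) < 10:
        return "conditions not met: normal approximation not justified"
    se = sqrt(pi * (1 - pi) / n)  # SE(p) = sqrt(pi(1 - pi)/n)
    return f"p ~ N({pi}, {se:.4f})"


print(describe_p(100, 0.40))  # p ~ N(0.4, 0.0490)
```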
finding probabilities for sample proportion using the calculator
first, describe the sampling distribution; next, find the probability that the sample proportion p is less than, greater than, or in-between the given values
LESS THAN
1. Hit 2nd VARS, option 2 - normalcdf
2. normalcdf(-1E99, upper number, 𝜋, √(𝜋(1-𝜋)/n))
round to four places after the decimal
GREATER THAN
1. Hit 2nd VARS, option 2 - normalcdf
2. normalcdf(lower number, 1E99, 𝜋, √(𝜋(1-𝜋)/n))
round to four places after the decimal
IN-BETWEEN
1. Hit 2nd VARS, option 2 - normalcdf
2. normalcdf(lower number, upper number, 𝜋, √(𝜋(1-𝜋)/n))
round to four places after the decimal
notation: write the answer as normalcdf(...) = probability
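The same three cases for a sample proportion, sketched with the standard library's `statistics.NormalDist` standing in for the calculator. The helper `normalcdf` and the values (𝜋 = 0.40, n = 100, cutoffs 0.35 and 0.45) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def normalcdf(lower: float, upper: float, mean: float, sd: float) -> float:
    """Area under a normal curve between lower and upper (like the TI-84 normalcdf)."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(upper) - dist.cdf(lower)


pi, n = 0.40, 100
se = sqrt(pi * (1 - pi) / n)  # SE(p) = sqrt(pi(1 - pi)/n)

p_less = normalcdf(-1e99, 0.35, pi, se)    # P(p < 0.35)
p_greater = normalcdf(0.45, 1e99, pi, se)  # P(p > 0.45)
p_between = normalcdf(0.35, 0.45, pi, se)  # P(0.35 < p < 0.45)

print(round(p_less, 4), round(p_greater, 4), round(p_between, 4))
```

Because the cutoffs sit the same distance on either side of 𝜋, the "less than" and "greater than" answers match, and the three probabilities add up to 1.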
statistical inference
includes using statistics computed from a sample to make statements about unknown population parameters
sample mean: X-bar is used to estimate the population mean, 𝜇
sample proportion: p is used to estimate the population proportion, 𝜋
central limit theorem (CLT)
is a mathematical property that states that regardless of the shape of the original population, if the sample size is "large enough," the shape of the sampling distribution of the sample mean (and likewise of the sample proportion) will be approximately normal
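A rough simulation of the CLT, under stated assumptions: the population is a hypothetical right-skewed exponential distribution with mean 10, samples have size n = 30, and skewness (about 0 for a symmetric shape) is used as a crude stand-in for "how close to normal." Individual values stay strongly skewed, while the sample means are much closer to symmetric.

```python
import random
from statistics import fmean, pstdev


def skewness(values):
    """Sample skewness: about 0 for a symmetric shape, positive for a right skew."""
    m = fmean(values)
    s = pstdev(values)
    return fmean([((v - m) / s) ** 3 for v in values])


random.seed(1)  # fixed seed so the run is repeatable


def draw():
    # Hypothetical right-skewed population: exponential with mean 10
    return random.expovariate(1 / 10)


n = 30         # sample size ("large enough" for a skewed population per these notes)
reps = 20_000  # number of simulated samples

individuals = [draw() for _ in range(reps)]
sample_means = [fmean([draw() for _ in range(n)]) for _ in range(reps)]

print("skewness of individual values:", round(skewness(individuals), 2))
print("skewness of sample means:     ", round(skewness(sample_means), 2))
```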
sampling distribution of sample mean
of a statistic is the distribution of values taken by the statistic in a large number of simple random samples of the same size n from the same population; here it is important to look at the distribution of X-bar
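The definition can be made concrete by simulation. This sketch assumes a hypothetical population of 100,000 whole-number scores from 0 to 100, draws 5,000 simple random samples of size n = 25 with `random.sample` (without replacement), and compares the mean and SD of the resulting X-bar values with 𝜇 and 𝜎/√n. Because the population is large relative to n, sampling without replacement changes the SD of X-bar only negligibly.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

random.seed(2)  # fixed seed so the run is repeatable

# Hypothetical population of 100,000 whole-number scores from 0 to 100
population = [random.randint(0, 100) for _ in range(100_000)]
mu = fmean(population)      # population mean
sigma = pstdev(population)  # population standard deviation

n = 25        # sample size, the same for every sample
reps = 5_000  # number of simple random samples

# Each SRS is drawn without replacement; record its sample mean X-bar
xbars = [fmean(random.sample(population, n)) for _ in range(reps)]

mean_of_xbars = fmean(xbars)
sd_of_xbars = pstdev(xbars)

print(f"mu = {mu:.2f}, mean of X-bars = {mean_of_xbars:.2f}")
print(f"sigma/sqrt(n) = {sigma / sqrt(n):.2f}, SD of X-bars = {sd_of_xbars:.2f}")
```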
describing the sampling distribution of X-bar
shape: normal
outliers: n/a
center: mean 𝜇
spread: standard deviation 𝜎/√n
*if the shape is normal, we do not have to concern ourselves with outliers
to do this, we must assume the values of the population parameters are known, so for the current application we are assuming that the population mean 𝜇 (and the population standard deviation 𝜎 used in the spread) are known
sampling distribution of sample proportion p
suppose we have a qualitative (or categorical) variable. This variable can have two or more categories, and no order or ranking is placed on the categories. From these categories, we identify one or more outcomes to be a "success" and the remaining outcomes to be a "failure," depending on the parameter that we are interested in. this creates a 0-1 random variable
in statistical notation, we have a qualitative variable X that can take only two possible values, what we call a "success" and what we call a "failure". Numerically, we code the successes as 1s and the failures as 0s, and we assume the proportion of "successes" in the population is 𝜋; hence the proportion of "failures" in the population is 1-𝜋
p = number of successes in the sample / sample size n
p is the point estimate of 𝜋
for the current application, we are assuming that the value of the population proportion 𝜋 is known
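A small sketch of the 0-1 coding, using hypothetical survey responses with three categories where only "yes" counts as a success. Coding successes as 1 and failures as 0 means the sample proportion p is just the average of the coded values.

```python
# Hypothetical responses to a three-category question
responses = ["yes", "no", "unsure", "yes", "no", "yes", "no", "no"]
successes = {"yes"}  # the outcome(s) we call a "success"; everything else is a failure

coded = [1 if r in successes else 0 for r in responses]  # the 0-1 random variable
p = sum(coded) / len(coded)  # number of successes / sample size n

print(coded, p)  # [1, 0, 0, 1, 0, 1, 0, 0] 0.375
```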
point estimate
we estimate the unknown 𝜇 with the sample mean X-bar; hence X-bar is known as the point estimate of 𝜇 (likewise, p is the point estimate of 𝜋)
