Exam 3 Soc 381L
68% lies between what 2 standard deviations?
+1 -1
95% lies between what 2 standard deviations?
+2 -2 (approximately; exactly 95% lies between ±1.96)
Normal Curve
1) mean, median, mode equal to one another 2) distribution is symmetrical 3) unimodal 4) based on theory
Step 3: Calculate the Confidence Interval
Calculate the lower and upper limits of the interval. Lower limit = 7.5 - 1.96(0.07) = 7.36; upper limit = 7.5 + 1.96(0.07) = 7.64
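The three steps for the commute example (N = 500, sample mean 7.5 miles, sample standard deviation 1.5 miles) can be sketched in Python. This is a minimal illustration, not course-provided code. The function name `confidence_interval` is hypothetical, and it uses the z value 1.96 as the cards do. Because it does not round the standard error to .07 first, it gives limits of about 7.37 and 7.63 rather than the hand-rounded 7.36 and 7.64.

```python
import math

def confidence_interval(mean, sd, n, z=1.96):
    """Hypothetical helper: z-based confidence interval for a mean.

    Uses the estimated standard error sd / sqrt(n), then mean -/+ z * SE.
    """
    se = sd / math.sqrt(n)               # Step 1: estimated standard error of the mean
    margin = z * se                      # Step 2 supplies z; margin of error = z * SE
    return mean - margin, mean + margin  # Step 3: lower and upper limits

lower, upper = confidence_interval(7.5, 1.5, 500)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

Rounding only at the end is usually preferable; rounding the standard error early is what shifts the hand-calculated limits by .01.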
Our Test: 2-tail, alpha of .05
For the purposes of this class and lab, we will always use a two-tailed test and an alpha level of .05. In a two-tailed test with degrees of freedom above 120, an alpha level of .05 corresponds to a critical region of ±1.96 for a t test.
95% CI vs. 99% CI
95% CI: more precise, less confident. 99% CI: more confident, less precise.
Calculate the Standard Error of the Mean
Standard error of the mean - the standard deviation of the sampling distribution of the mean
T-Statistic (Obtained)
The statistic computed to test the null hypothesis about a population mean when the population standard deviation is unknown and is estimated using the sample standard deviation. It represents the number of standard error units (standard deviations of the sampling distribution) the sample mean is from the hypothesized population mean, assuming the null hypothesis is true.
About 99.7% lies between what 2 standard deviations?
+3 -3 (exactly 99% lies between ±2.58)
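The percentages on these cards can be checked directly: the area under a normal curve between -k and +k standard deviations equals erf(k/√2). The sketch below uses only Python's standard-library `math.erf`; the helper name `area_within` is hypothetical. It shows that ±1, ±2, and ±3 cover about 68.27%, 95.45%, and 99.73%, while ±1.96 and ±2.58 cover about 95% and 99%.

```python
import math

def area_within(k):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3, 1.96, 2.58):
    print(f"within ±{k} SD: {area_within(k):.2%}")
```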
Confidence interval estimates
- a range of values, defined by the confidence level, within which the population parameter is likely to fall. E.g., the average level of education for adult Americans is somewhere between 12 and 14 years.
Why are z scores important?
- many variables come close to being normally distributed - many statistical tests rely on the assumption that data are normally distributed - they allow us to test whether an outcome observed in the data likely occurred by chance (random sampling error) or reflects a real finding, which tells us whether we can reject the null.
What is the standard error of the mean?
-indicator of sampling error -tells us how far sample means typically are from the population mean -as the standard error becomes smaller, sample means come closer to the population mean
Research hypothesis(H1):
-reflects the substantive hypothesis and contains a statement of strict inequality >,<,≠
What are the characteristics of the normal distribution?
-Theoretical distribution: the distribution is based on theory rather than on real data. -Mean, median, and mode are equal to one another. -Unimodal: one single, most common value. -Bell-shaped/symmetrical.
Opioid Epidemic-Drug Dealer Profiles
1- Convicted drug dealers in Pennsylvania have on average 2 prior violent convictions. Claim: µ = 2 prior violent convictions. Complement: µ ≠ 2 prior violent convictions. 2- Population mean µ = 2; on average the sample had .04 prior violent convictions (sample mean Ȳ = .04); N = 2000. 3- Population of convicted drug dealers: µ = 2, σ = ? (unknown). Sample of convicted drug dealers: Ȳ = .04, sY = .225. 4- Assumptions: random sample, normal distribution or sample large enough, and interval-ratio variable. 5- Select the sampling distribution. Null hypothesis: Convicted drug dealers have an average of two prior violent convictions. Research hypothesis: Convicted drug dealers do not have on average 2 prior violent convictions.
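Plugging the drug dealer numbers (hypothesized µ = 2, Ȳ = .04, sY = .225, N = 2000) into the obtained t formula, t = (Ȳ - µ)/(sY/√N), gives about -389.6, far beyond ±1.96. This is only an illustrative sketch: in this course the t statistic comes from SPSS output rather than hand calculation, and the function name `t_obtained` is hypothetical.

```python
import math

def t_obtained(sample_mean, hypothesized_mean, sample_sd, n):
    """One-sample t: how many estimated standard errors the sample mean is from the hypothesized mean."""
    estimated_se = sample_sd / math.sqrt(n)   # sY / sqrt(N)
    return (sample_mean - hypothesized_mean) / estimated_se

t = t_obtained(0.04, 2, 0.225, 2000)
print(f"t obtained = {t:.1f}")
```

The sign is negative because the sample mean falls below the hypothesized population mean.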
Three types of hypothesis
1) whether 2 variables are related to one another 2) whether a sample statistic differs from a population parameter 3) whether 2 groups/categories differ in characteristics (Do urban/rural areas differ in opioid ODs?)
Assumptions of One Sample Hypothesis Tests
1- the sample is randomly selected 2- hypothesis tests about means (averages) assume the variable being measured is at the interval-ratio level of measurement 3- the variable is normally distributed or the sample size is large enough
Six Steps to One Sample Hypothesis Testing
1-Meet required assumptions of one sample hypothesis tests 2-State the research and null hypothesis 3-Select the sampling distribution and specify the test statistic 4-Choose the strictness of your test 5-Compute the test statistic 6-Make a decision and interpret the results
Steps to determine confidence interval for mean
1. Calculate the standard error of the mean 2. Decide on the level of confidence and find the corresponding Z value 3. Calculate the confidence interval 4. Interpret the results
A difference between the sample and the population could mean one of two things.....
1.Real difference 2. Sampling error
Opioid Epidemic Claims-Epidemiology
Example 1: 38% of adult Americans were prescribed opioids in 2015. Claim: µ = 38% (null hypothesis, a statement of equality). Complement: µ ≠ 38% (research hypothesis). Example 2: Opioid overdoses kill at least 4 people every hour. Claim: µ ≥ 4 (null). Complement: µ < 4 (research).
The 2006 General Social Survey contains information on the number of hours worked by a respondent each week. The mean number of hours worked per week is 39.04, with a sample standard deviation of 11.51. The sample size is 83. Calculate a 95 percent confidence interval for this sample estimate and provide an interpretation of the interval.
Estimated standard error = 11.51/√83 = 1.26. 39.04 ± 1.96(1.26) = 39.04 ± 2.47. CI = (36.57, 41.51). We are 95% confident that the actual mean number of hours worked per week falls somewhere between 36.57 and 41.51 hours.
What is estimation?
A process where we select a random sample from a population and use a sample statistic to estimate a population parameter.
Two Tailed Test With An Alpha of .05
Assuming that your sample size is large enough (degrees of freedom above 120) and using a two-tailed test with an alpha of .05: if your obtained t is > 1.96 or < -1.96, reject the null hypothesis (Ho). Critical region: the two tails beyond ±1.96; if the obtained t falls there, reject the null (Ho).
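The decision rule on this card can be written as a small function. A minimal sketch, assuming a large sample (df above 120) so the two-tailed critical values at alpha = .05 are ±1.96; the name `reject_null` and the example t values are hypothetical.

```python
def reject_null(t_obtained, critical_value=1.96):
    """Two-tailed test: reject Ho if t falls in either tail of the critical region."""
    return t_obtained > critical_value or t_obtained < -critical_value

for t in (2.40, -2.10, 1.50, -0.80):   # hypothetical obtained t values
    decision = "reject Ho" if reject_null(t) else "fail to reject Ho"
    print(f"t = {t:+.2f}: {decision}")
```

Checking both tails is equivalent to asking whether the absolute value of t exceeds 1.96.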
A consumer watch company announces that the mean ounces in Big Brew, a delicious beer, is less than the advertised 32 ounces.
Claim: µ < 32 oz (research). Complement: µ ≥ 32 oz (null).
Substantive Hypothesis: Americans have on average 12 years of education.
Claim: µ = 12 (null). Complement: µ ≠ 12 (research).
A Law school reports on its website that 90% of its graduates pass the bar on the first try. What is null and research hypothesis?
Claim: µ = 90% (null). Complement: µ ≠ 90% (research).
Confidence Interval Width
Confidence Level - Increasing our confidence level from 95% to 99% means we are less willing to draw the wrong conclusion: we take a 1% risk (rather than a 5% risk) that the specified interval does not contain the true population mean. If we reduce our risk of being wrong, we need a wider range of values, so the interval becomes less precise.
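To see the trade-off in numbers, the sketch below reuses the 2006 GSS hours-worked figures (mean 39.04, sY = 11.51, N = 83) and compares a 95% interval (z = 1.96) with a 99% interval (z = 2.58, the usual table value, which is not given on these cards). It is an illustration only; without rounding the standard error, the 95% interval comes out near (36.56, 41.52) with width about 4.95, and the 99% interval near (35.78, 42.30) with width about 6.52.

```python
import math

mean, sd, n = 39.04, 11.51, 83   # 2006 GSS hours worked per week
se = sd / math.sqrt(n)           # estimated standard error, about 1.26

widths = {}
for level, z in (("95%", 1.96), ("99%", 2.58)):
    margin = z * se                  # margin of error
    widths[level] = 2 * margin       # full width of the interval
    print(f"{level} CI: ({mean - margin:.2f}, {mean + margin:.2f}), width = {widths[level]:.2f}")
```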
S4: Choose the strictness of your test
Critical Region - the area under the sampling distribution that includes unlikely sample outcomes. -If your test statistic falls in this area, you reject the null hypothesis. -The size of the critical region is reported as alpha. Alpha - the level of probability at which the null hypothesis is rejected. It is also the proportion of the total area under the sampling distribution that falls in the critical region.
Confidence intervals give us an interval in which we can be confident the true population mean lies
Hypothetical confidence interval for the true population mean, estimated from the sample mean and the estimated standard error (based on the sample standard deviation). The sample mean is known; the population mean is unknown.
Hypothesis Testing of Sample Statistics
In hypothesis testing, there is always a pair of hypotheses (the null and the research). One represents the claim; the other is its complement. Example: Americans on average have 12 years of education. Claim: µ = 12 years (null). Complement: µ ≠ 12 years (research). Either the research hypothesis or the null hypothesis can be the original claim.
Researcher Bob measures the daily caloric intake of players for his favorite NFL football team. For the following set of caloric intakes, fill in the missing cells using the information provided. The average caloric intake for the entire team is 4100 with a standard deviation of 120. John's intake: raw score = 3,800; Z score = ? Dan's intake: raw score = ?; Z score = 2.6
John: Z = (3800 - 4100)/120 = -2.5. Dan: Y = 4100 + (2.6 × 120) = 4412.
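Both directions of the conversion, Z = (Y - Ȳ)/sY and Y = Ȳ + Z·sY, can be sketched as two small Python functions; the names `z_score` and `raw_score` are hypothetical. Applied to John and Dan, they reproduce -2.5 and 4412.

```python
def z_score(raw, mean, sd):
    """Number of standard deviations a raw score lies above (+) or below (-) the mean."""
    return (raw - mean) / sd

def raw_score(z, mean, sd):
    """Convert a z score back to a raw score."""
    return mean + z * sd

team_mean, team_sd = 4100, 120
print(z_score(3800, team_mean, team_sd))    # John
print(raw_score(2.6, team_mean, team_sd))   # Dan
```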
Step 1: Calculate the Standard Error
N = 500, sample mean = 7.5, standard deviation = 1.5. The larger the sample size, the smaller the standard error will be (because we are dividing by the square root of the sample size). The smaller the standard error, the narrower our confidence interval will be, which means we can estimate the true population mean more precisely (a smaller interval is better). sȲ = sY/√N = 1.5/√500 = .07
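The claim that a larger sample gives a smaller standard error can be checked by holding the sample standard deviation at 1.5, as in the commute example, and varying N. Only N = 500 comes from the example; 50 and 5000 are hypothetical sizes chosen for illustration, and `standard_error` is a hypothetical helper name.

```python
import math

def standard_error(sd, n):
    """Estimated standard error of the mean: sample standard deviation / sqrt(sample size)."""
    return sd / math.sqrt(n)

for n in (50, 500, 5000):   # 500 is from the example; 50 and 5000 are hypothetical
    print(f"N = {n:>5}: standard error = {standard_error(1.5, n):.3f}")
```

Because N sits under a square root, multiplying the sample size by 10 divides the standard error by √10 (about 3.16), not by 10.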
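How do we calculate the standard error of the mean?
Standard deviation divided by the square root of the sample size (σY/√N with the population standard deviation, or sY/√N with the sample standard deviation)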
One Sample Hypothesis Testing Examples: Educational attainment among those sampled in the GSS (random sample of 1972) compared to education levels for the population as a whole (population)
Sample of Americans: mean = 13.5 years of education (sample statistic); sample standard deviation = 3.127. Claim: the American population has on average 12 years of education, µ = 12 years; complement: µ ≠ 12 years. Population mean µ = ? (unknown); sample mean Ȳ = 13.5. Is the difference between 13.5 and 12 statistically significant? Real difference: Americans on average have more than 12 years of education (13.5 reflects the population parameter). Sampling error: the average level of education for Americans is the same as the claimed 12 years; this particular sample just happens to show a higher mean.
Practice: Calculating a 95% CI
Say we conduct a survey to determine the distance students commute to campus. We survey 500 students. The sample mean is 7.5 miles and the standard deviation is 1.5 miles. We want to calculate a confidence interval around our sample mean (7.5) so that we can be 95% confident it contains the true mean commute distance for the full population of students.
Estimating Standard Errors
Since the standard deviation of the population is generally not known, we usually work with the estimated standard error, which uses the standard deviation of the sample (sY): sȲ = sY/√N
S2: State research & null hypothesis
Substantive hypothesis: Americans on average have 12 years of education.
Two Sample Tests
Tests whether a difference on a characteristic between two groups in our sample is large enough to conclude that the populations these samples represent differ on the same characteristic
We want to construct an estimate of where the population mean falls based on our sample statistics
The actual population parameter likely falls somewhere within this range of values. This range is our confidence interval.
σȲ: standard error of the mean (use when you have the population standard deviation); in that case, use it in the confidence interval formula instead of the estimated standard error.
σȲ = σY/√N
population standard deviation
σY
Interpretations
We are 95% confident that the actual mean distance traveled to campus is somewhere between 7.36 miles and 7.64 miles. There is a 5% chance that the interval does not contain the true mean.
Researcher Bob is interested in whether there is a difference between people who earned a bachelor's degree or higher (response represented by BAORHIGHER) and people who do not have a bachelor's degree (NOBA) in the age at which they had their first child. Use the output below, a two-tailed test, and an alpha level of .05 to answer the following questions. Please write answers in complete sentences; do not just use notation. What is the null hypothesis? What is your research hypothesis? Using an alpha level of .05, can we reject the null hypothesis? Why or why not?
The null hypothesis is that those without a bachelor's degree and those with a bachelor's degree or higher have their first child at similar ages (µ1 = µ2). The research hypothesis is that those without a bachelor's degree and those with a bachelor's degree or higher have their first child at different ages (µ1 ≠ µ2). Yes. Our p-value (reported by SPSS as .000, meaning p < .001) is less than the alpha of .05, so we can reject the null hypothesis that those without a bachelor's degree and those with a bachelor's degree or higher have their first child at similar ages. Those without a bachelor's degree (NOBA) have their first child at a younger age than those with a bachelor's degree or higher (BAORHIGHER).
What are standard Z scores?
The number of standard deviations that a given raw score is above or below the mean. + = score higher than the mean; - = score lower than the mean; larger absolute z score = larger difference between the score and the mean
At a 95% confidence level
There are 95 chances out of 100 that the interval contains the population mean. There are 5 chances out of 100 that the interval DOES NOT contain the population mean.
S3: Selecting sampling distribution and specify the test statistic
We have been using the normal distribution and the Z statistic. We use the Z statistic when we know the population standard deviation. However, because we rarely know the population standard deviation, we will be using the t statistic.
Step 2: Decide on Level of Confidence and Find Corresponding Z Value
We want a 95% confidence interval. So what z value do we use? 1.96
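Instead of a table lookup, the z value for a confidence level can be found from the inverse of the standard normal CDF at 1 - alpha/2. This sketch uses Python's standard-library `statistics.NormalDist` (Python 3.8 or later); the helper name `z_for_confidence` is hypothetical. It returns about 1.64, 1.96, and 2.58 for 90%, 95%, and 99%.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Two-tailed critical z value for a confidence level such as 0.95."""
    alpha = 1 - level
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_for_confidence(level):.2f}")
```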
S5: Computing the Test Statistic
We will not be calculating the t statistic by hand or using the t-distribution table to decide whether to reject the null hypothesis. Rather, we will use SPSS statistical output and p-values to determine whether to reject or fail to reject the null hypothesis.
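SPSS reports the p-value for us. For intuition only: with a large sample (df above 120), the two-tailed p-value for an obtained t is close to the standard normal tail area beyond |t|, which equals erfc(|t|/√2). This is a normal-approximation sketch, not how SPSS computes it (SPSS uses the t distribution); the function names and example t values are hypothetical.

```python
import math

def two_tailed_p_large_sample(t_obtained):
    """Approximate two-tailed p-value from the standard normal (reasonable when df > 120)."""
    return math.erfc(abs(t_obtained) / math.sqrt(2))

def decide(p_value, alpha=0.05):
    """Reject Ho when the p-value is smaller than alpha."""
    return "reject Ho" if p_value < alpha else "fail to reject Ho"

for t in (2.50, 1.20, 3.10):   # hypothetical obtained t values
    p = two_tailed_p_large_sample(t)
    print(f"t = {t:.2f}: p ≈ {p:.3f} -> {decide(p)}")
```

At t = ±1.96 the approximate p-value is .05, which is why the p-value rule and the ±1.96 critical-region rule lead to the same decision for large samples.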
Z
Z score that corresponds with confidence level
What do we find in all normal curves?
A constant percentage of the area under the curve lies between the mean and any given distance from the mean when measured in standard deviation units.
Null-(Ho)
a hypothesis that contains a statement of equality (=, ≤, ≥); most often a strict statement of "no difference"
What is sampling error?
discrepancy between the sample estimate of the population parameter and the real population parameter.
sȲ
estimated standard error of the mean (use when you have the sample standard deviation): sȲ = sY/√N
Point estimates
A single-value estimate of the population parameter, e.g., the average level of education for adult Americans is 13.43 years
How do we reduce sampling error?
larger sample size
What is under a normal curve?
mean, standard deviation
What are the two types of estimation?
point and interval
margin of error
radius (half-width) of the confidence interval
Ȳ
sample mean (estimate of the population mean)
N
sample size
SY
sample standard deviation
σ_Y
standard deviation of the population
σ_Y ̅
standard error of the mean (use when you have population standard deviation)
sample standard deviation
sY
confidence levels
the likelihood, expressed as a percentage or a probability, that a specified interval will contain the population parameter.
As long as the distribution is normal and you know the mean and standard deviation, what can we determine?
the percentage of cases that fall between any score and the mean. Ex: Ȳ = 500, sY = 100: place 500 in the middle, then mark standard deviations in steps of 100 from the center (400 and 600, 300 and 700, and so on)
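With Ȳ = 500 and sY = 100, the percentage of cases between the mean and any score can be read off the normal CDF. A sketch using `statistics.NormalDist` from Python's standard library; the scores 600 and 350 are chosen only for illustration, and the helper name `percent_between_mean_and` is hypothetical. It gives about 34.13% for 600 (one standard deviation above) and about 43.32% for 350 (1.5 standard deviations below).

```python
from statistics import NormalDist

def percent_between_mean_and(score, mean=500, sd=100):
    """Percentage of a normal distribution lying between the mean and the given score."""
    dist = NormalDist(mean, sd)
    return abs(dist.cdf(score) - 0.5) * 100   # the CDF equals 0.5 at the mean

for score in (600, 350):
    print(f"Between 500 and {score}: {percent_between_mean_and(score):.2f}%")
```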
What can the area under a normal curve be conceptualized as?
the percentage of the total number of cases in the sample. Example: for SAT scores, 50% fall to the left of the median and 50% to the right.
Why is sampling the basis for inferential statistics?
through sampling we are attempting to generalize characteristics of the population based on the characteristics of the sample
what is the goal of most research?
to find the population parameter based on sample statistics
population mean
µY
sample mean
Ȳ