Stats test chapters 4-5
how to compute sample variance?
sum of (X-Xbar)^2 / N: square each score's deviation from the mean (X-Xbar), add up the squared deviations, and divide by the number of cases
Xbar=100 s=15 X=115 compute z score
Zx=(115-100)/15 = +1, so the score is 1 standard deviation above the mean
Statistic test scores show a mean of Xbar=79 and a standard deviation of s=9. How many standard deviations fall between the mean and a test score of 90? What score falls 1.5 standard deviations below the mean?
1.22 standard deviations fall between a score of 90 and the mean of 79. the score 1.5 standard deviations below the mean is 65.5
what is the max number of cases we usually go to?
1000
what are 4 things we can do with the standard normal distribution?
1. estimate % and # of cases scoring at and below any given score 2. estimate % above any given score 3. estimate % between any 2 scores 4. estimate cut off scores
what do we do with z scores?
1. locate scores 2. common metric: compare scores on instruments that produce very different scores 3. score translation
what are the 3 reasons that the normal distribution is purely theoretical?
1. no minimum and no maximum scores 2. scores are continuous -between every 2 scores there is another one 3. N=infinitely large number of cases/scores
what are the 3 parts to the central limit theorem?
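1. the mean of the sampling distribution of the mean equals the population mean (uXbar = u) 2. the standard error of the mean (Ox = O/square root of N) describes how variable the sample means are 3. the shape of the sampling distribution will be approximately normal regardless of the shape of the population distribution, provided the samples are large enough (N>/= 50 in these notes)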
What 3 things are true about Z scores?
1. will always have a mean of 0 2. variance and standard deviation =1 3. distribution will show no change in shape
Approximately 20% of the population will suffer from clinically significant depression. In a group of 136 people, how many would you expect to develop depression?
27.2 people; that is 20% of the group of 136 (.20 x 136 = 27.2)
what is the min number of cases we should be using?
50
in a normally distributed sampling distribution of the mean, how many Ox units must go on each side of the center point to capture 80% of the sample means? What percentage of the samples drawn from a population will have means between 2.10 Ox above and 2.10 Ox below the center point?
80% of the sample means in a normally distributed sampling distribution are found within +/- 1.28 Ox of the center point. And 96.42% of the sample means in a normally distributed sampling distribution are found within +/- 2.10 Ox of the center point
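The table lookups in this card can be checked with Python's standard library. This is only a sketch: the function names are made up for illustration, and because the normal curve is computed directly rather than read from a printed table, the last decimal can differ slightly from the table values.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)

def z_for_central_area(proportion):
    """Return z so that `proportion` of cases lies within +/- z of the center."""
    # Whatever is left over is split evenly between the two tails.
    upper_tail = (1.0 - proportion) / 2.0
    return STANDARD_NORMAL.inv_cdf(1.0 - upper_tail)

def central_area_for_z(z):
    """Return the proportion of cases lying within +/- z of the center."""
    return STANDARD_NORMAL.cdf(z) - STANDARD_NORMAL.cdf(-z)

print(round(z_for_central_area(0.80), 2))   # Ox units needed for 80%
print(round(central_area_for_z(2.10), 4))   # proportion within +/- 2.10 Ox
```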
The confidence interval defined by Xbar +/- 1.28 Ox will capture u, in what percentage of the samples drawn from a population? in what percentage of the samples will this confidence interval fail to capture u? what about the confidence interval defined by Xbar +/- 2.10 Ox?
Provided that the sample sizes are N>/= 50, about 80% of the samples drawn from a population will have means that fall within +/- 1.28 Ox of u. All of these means are within 1.28 Ox units of u, so surrounding any one of them by +/- 1.28 Ox will capture u. About 20% of the samples will produce confidence intervals that fail to capture u. By the same reasoning, Xbar +/- 2.10 Ox will capture u in about 96.42% of the samples and fail to capture it in about 3.58%
what does converting scores into Z scores tell us?
It tells us where a score falls in the distribution, with a precision that is not available from the raw score alone
Compute an 80% confidence interval for the population proportion based on the data p=Xbar=.8 N=100 O=.42
Xbar +/- 1.28 Ox, where Ox = O/square root of N = .42/square root of 100 = .042. So .8 +/- 1.28(.042) = .8 +/- .054
construct 90%, 95% and 99% confidence intervals to estimate the population mean based on the following data: Xbar= 100 N=50 O=15
Ox = O/square root of N = 15/square root of 50 = 2.12. 90%: Xbar +/- 1.65 Ox = 100 +/- 1.65(2.12) = 100 +/- 3.50. 95%: Xbar +/- 1.96 Ox = 100 +/- 1.96(2.12) = 100 +/- 4.16. 99%: Xbar +/- 2.58 Ox = 100 +/- 2.58(2.12) = 100 +/- 5.47
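The same three intervals can be reproduced in a few lines of Python. This is a sketch rather than a prescribed method: `confidence_interval` is an illustrative name, and the z values 1.65, 1.96 and 2.58 are the ones used in these notes.

```python
import math

def confidence_interval(sample_mean, sd, n, z):
    """Return (low, high) for sample_mean +/- z standard errors."""
    standard_error = sd / math.sqrt(n)   # Ox = O / square root of N
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin

# Data from the card: Xbar=100, N=50, O=15.
for level, z in [("90%", 1.65), ("95%", 1.96), ("99%", 2.58)]:
    low, high = confidence_interval(100, 15, 50, z)
    print(level, round(low, 2), "to", round(high, 2))
```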
compute a 75% confidence interval to estimate the population mean based on the data Xbar=100 N=50 O=15
Xbar +/- 1.15 Ox, where Ox = O/square root of N = 15/square root of 50 = 2.12. So 100 +/- 1.15(2.12) = 100 +/- 2.44
what is the confidence interval for 90%?
Xbar +/- 1.65 Ox (where Ox is standard error of the mean)
compute 90%, 95% and 99% confidence intervals for the population proportion based on the following data: p=Xbar=.8 N=100 O=.42
Ox = .42/square root of 100 = .042. 90%: Xbar +/- 1.65 Ox = .8 +/- 1.65(.042) = .8 +/- .069. 95%: Xbar +/- 1.96 Ox = .8 +/- 1.96(.042) = .8 +/- .082. 99%: Xbar +/- 2.58 Ox = .8 +/- 2.58(.042) = .8 +/- .108
estimate 90% U( mean of population) of IQ for WF N=374 Xbar=101.2 s=17
Xbar +/- 1.65 Ox, where Ox = s/square root of N = 17/square root of 374 = .88. So 101.2 +/- 1.65(.88) = 101.2 +/- 1.45
what is the confidence interval for 95%?
Xbar +/- 1.96 Ox (where Ox is standard error of the mean)
est. proportion of MSU students who favor a 4 day week with a 95% confidence interval N=146 no=0 yes=1 prop or Xbar=.62 s=.49
Xbar +/- 1.96 Ox, where Ox = s/square root of N = .49/square root of 146 = .041. So .62 +/- 1.96(.041) = .62 +/- .08, an interval from .54 to .70 (54% to 70%)
compare the widths of the 95% confidence intervals to estimate the population proportion when p=.5 O=.25 and a) N=50 b) N=500
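a) N=50: Ox = .25/square root of 50 = .035, so .5 +/- 1.96(.035) = .5 +/- .069. b) N=500: Ox = .25/square root of 500 = .011, so .5 +/- 1.96(.011) = .5 +/- .022. The interval is much narrower with the larger sample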
compare the widths of the 95% confidence intervals to estimate the population mean when N=50, Xbar=100 and a) O=10 b) O=100
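a) O=10: Ox = 10/square root of 50 = 1.41, so Xbar +/- 1.96(1.41) = Xbar +/- 2.76. b) O=100: Ox = 100/square root of 50 = 14.14, so Xbar +/- 1.96(14.14) = Xbar +/- 27.71. The interval is ten times wider when O is ten times larger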
suppose you wanted to construct a 95% confidence interval for the population mean. this confidence interval is to have a total width of 6 points, and the population standard deviation is estimated to be O=12. how large a sample do you need?
A total width of 6 points means Xbar +/- 3, so 1.96 Ox = 3. 1.96(O/square root of N) = 3; 1.96(12/square root of N) = 3; 12/square root of N = 1.53; 1.53(square root of N) = 12; square root of N = 7.84; N = 61.51, so about 62 cases are needed
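Solving z(O/square root of N) = margin for N gives N = (z x O / margin)^2. A minimal Python sketch of that rearrangement follows; the function name is illustrative only. Because it does not round intermediate steps, its exact value can differ a little from the hand calculation, which rounds along the way.

```python
import math

def required_sample_size(sd, margin, z):
    """Return (exact N, N rounded up) so that z * sd / sqrt(N) equals margin."""
    exact_n = (z * sd / margin) ** 2
    # A sample cannot contain a fraction of a case, so round up.
    return exact_n, math.ceil(exact_n)

# Card data: 95% confidence (z=1.96), O=12, total width 6 (margin of 3).
exact_n, whole_n = required_sample_size(sd=12, margin=3, z=1.96)
print(round(exact_n, 2), whole_n)
```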
what is the confidence interval for 99%?
Xbar +/- 2.58 Ox (where Ox is standard error of the mean)
suppose you want to construct a 99% confidence interval to estimate the proportion of people who favor stricter gun control. you want this confidence interval to be 6 percentage points wide (that is +/- .03) and have estimated that the population standard deviation may be as high as O=.25. How large a sample will you need in order to construct this confidence interval?
Xbar +/- 2.58 Ox = Xbar +/- .03, so 2.58 Ox = .03. By definition of Ox, 2.58(O/square root of N) = .03; 2.58(.25/square root of N) = .03; .25/square root of N = .012; .012(square root of N) = .25; square root of N = 20.83; N = 433.89
suppose you wanted to construct a 99% confidence interval for the population proportion. this confidence interval is to have a width of +/- .03 and the population standard deviation is estimated to be O=.50. How large a sample do you need?
Xbar +/- 2.58 Ox = Xbar +/- .03, so 2.58 Ox = .03. 2.58(O/square root of N) = .03; 2.58(.50/square root of N) = .03; .50/square root of N = .01 (.03/2.58 = .0116, rounded here to .01); .01(square root of N) = .50; square root of N = 50; N = 2500
using the sample ages below, construct a 90% confidence interval for the population mean. 21,18,19,25,21,20,24,27,29
Xbar +/- t(df=N-1; .10) Ox, where Ox = s/square root of N = 3.77/square root of 9 = 1.26. With df=8, t = 1.860, so 22.67 +/- 1.860(1.26) = 22.67 +/- 2.34
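The small-sample interval can be checked with the standard library. This sketch assumes the critical value t = 1.860 (df = 8) is taken from a t table, as in the card, since the standard library has no t distribution. `statistics.stdev` divides by N-1, which matches the s = 3.77 used above.

```python
import math
import statistics

ages = [21, 18, 19, 25, 21, 20, 24, 27, 29]
T_CRITICAL = 1.860  # looked up in a t table for df = 8, 90% confidence

def t_interval(scores, t_critical):
    """Return (sample mean, margin) for mean +/- t * standard error."""
    n = len(scores)
    sample_mean = statistics.mean(scores)
    corrected_sd = statistics.stdev(scores)      # divides by N - 1
    standard_error = corrected_sd / math.sqrt(n)
    return sample_mean, t_critical * standard_error

sample_mean, margin = t_interval(ages, T_CRITICAL)
print(round(sample_mean, 2), "+/-", round(margin, 2))
```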
How large a sample is needed to estimate u (mean of population) height of MSU males at the 99% confidence level to within +/- 1 inch, given s=3?
Xbar +/- 2.58 Ox = Xbar +/- 1, so 2.58(s/square root of N) = 1; 2.58(3/square root of N) = 1; square root of N = 2.58(3) = 7.74; N = 59.91
construct the 90%, 95%, and 99% confidence intervals for the population proportions when N=50 O=.25 and p=.5
Ox = .25/square root of 50 = .035. 90%: Xbar +/- 1.65 Ox = .5 +/- 1.65(.035) = .5 +/- .058. 95%: Xbar +/- 1.96 Ox = .5 +/- 1.96(.035) = .5 +/- .069. 99%: Xbar +/- 2.58 Ox = .5 +/- 2.58(.035) = .5 +/- .090
IQ scores have a mean of 100 and a standard deviation of 15. SAT scores have a mean of 500 and a standard deviation of 100. Which score is higher, an SAT score of 580 or an IQ of 108?
Z SAT = (580-500)/100 = +.80; Z IQ = (108-100)/15 = +.53. The SAT score is higher
how to compute Z scores?
Zx = (X-Xbar)/s
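As a quick check on the formula, here is a short Python sketch (the function name is illustrative). The parentheses matter: the mean is subtracted from the raw score first, and the difference is then divided by the standard deviation.

```python
def z_score(raw, mean, sd):
    """Number of standard deviations that a raw score lies from the mean."""
    return (raw - mean) / sd

# Comparing scores from tests with very different scales.
print(z_score(580, 500, 100))            # SAT score of 580
print(round(z_score(108, 100, 15), 2))   # IQ of 108
```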
X=100 Xbar=100 s=15 compute z score
Zx=(100-100)/15 = 0, so the score is exactly at the mean
ACT: Xbar=25, s=3, Joe=27. SAT: Xbar=1000, s=20, Sally=1100. Whose score is better?
Z Joe=(27-25)/3=.67; Z Sally=(1100-1000)/20=5.0. Sally's score is better
ACT: Xbar=25, s=3, Joe=29. SAT: Xbar=900, s=180. How would Joe have done on the SAT?
Zx=(29-25)/3=1.33, so he would be equally far above average on the SAT: 900+1.33(180) = an estimated 1139
TMA: Xbar=50, s=10, Sam=45. MAT: Xbar=150, s=30, Bill=140. Whose score is better?
Z Sam=(45-50)/10= -.5; Z Bill=(140-150)/30= -.33. Bill scored better, because his score is fewer standard deviations below the mean
Is WF hotter or uglier? Temp: Xbar=55, s=20, WF=65. Ugliness (1-10 scale): Xbar=5, s=2.5, WF=8
Z temp=(65-55)/20=.5; Z ugliness=(8-5)/2.5=1.2. WF is uglier than it is hot
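Depression: Xbar=5, s=1.5, Bill=8. Well-being: Xbar=120, s=8. How would Bill have scored on the well-being test?
Zx=(8-5)/1.5=2, so Bill is 2 standard deviations above the mean on depression. Because higher depression goes with lower well-being, he would fall 2 standard deviations below the mean on well-being: 120-2(8) = an estimated 104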
X=85 Xbar=100 s=15 compute z score
Zx=(85-100)/15 = -1, so the score is 1 standard deviation below the mean
IOWA: Xbar=80, s=15, Jo=92. IQ: Xbar=100, s=15. How would Jo have done on the IQ test?
Zx=(92-80)/15=.8, which means Jo is above average. So 100+.8(15) = an estimated 112 on the IQ test
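Score translation is a two-step process: convert the raw score to a z score on the first test, then place that z score on the second test's scale. A small sketch, with a made-up function name, assuming the person holds the same relative position on both tests:

```python
def translate_score(raw, from_mean, from_sd, to_mean, to_sd):
    """Estimate the equivalent score on a second test via the z score."""
    z = (raw - from_mean) / from_sd   # position on the first test
    return to_mean + z * to_sd        # same position on the second test

# Jo scored 92 on the IOWA (Xbar=80, s=15); estimate the IQ (Xbar=100, s=15).
print(round(translate_score(92, 80, 15, 100, 15)))
```

When the two measures run in opposite directions, as with depression and well-being, the z score would need its sign flipped before the second step.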
formula for z score
Zx=(X-Xbar)/s, where Zx=the standard score or z score corresponding to raw score X; X=a raw score; Xbar (X with a line over it)=the mean of the distribution of raw scores; s=the standard deviation of the distribution of raw scores
confidence interval
a range of values computed from sample data that has a known probability of capturing some population parameter of interest
confidence interval for the population proportion
a range of values computed from sample data that has a known probability of capturing the population proportion
sampling distribution of the proportion
a theoretical distribution showing the frequency of occurrence of values of the proportion computed for all possible samples of size N sampled with replacement from a population
sampling distribution
a theoretical distribution that depicts the frequency of occurrence of values of some statistic computed for all possible samples of size N drawn from some population
use the table of areas under the normal curve to find the proportion of cases: a) between the mean and z= 1.07 b) between the mean and z= -1.07 c) between z= -1.0 and z= 1.0 d) between z= -1.30 and z=1.50 e) beyond z= 1.50 f) beyond z= -1.75 g) beyond z= 0 h) between z= -1 and z= -2 i) between z= 1.3 and z= 1.8
a) .3577 b) .3577 c) .6826 d) .8364 e) .0668 f) .0401 g) .5000 h) .1359 i) .0609
in a normal distribution of IQ scores having a mean of Xbar= 100 and a standard deviation of s=15, what proportion of the cases score: a) between 100 and 120? b) between 100 and 90? c) higher than 108? d) lower than 96? e) between 105 and 110? f) between 90 and 95? g) lower than 110? h) higher than 90?
a) .4082 b) .2486 c) .2981 d) .3936 e) .1193 f) .1193 g) .7486 h) .7486
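These areas can also be computed directly instead of being read from the table. The sketch below uses `statistics.NormalDist` from the standard library; the helper names are illustrative. Since it does not round z to two decimals before the lookup, its answers can differ from the table answers in the third or fourth decimal place.

```python
from statistics import NormalDist

# IQ scores: mean 100, standard deviation 15.
iq = NormalDist(mu=100, sigma=15)

def proportion_between(dist, low, high):
    """Proportion of cases scoring between low and high."""
    return dist.cdf(high) - dist.cdf(low)

def proportion_above(dist, score):
    """Proportion of cases scoring higher than score."""
    return 1.0 - dist.cdf(score)

print(round(proportion_between(iq, 100, 120), 4))  # part a)
print(round(proportion_above(iq, 108), 4))         # part c)
print(round(iq.cdf(110), 4))                       # part g), lower than 110
```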
if IQs of 286 cases are normally distributed with Xbar=100 and s=15 how many cases would you expect to find with scores: a) at 115 or higher? b) at 85 or lower? c) between 100 and 115?
a) 45.39 b) 45.39 c) 97.61
in a distribution of IQ scores Xbar=100 and s=15. compute z scores for IQs of a)100, b)73, c)107
a) Z100=(100-100)/15=0 b) Z73=(73-100)/15= -1.8 c) Z107=(107-100)/15=.47
in a distribution of test scores showing a mean of Xbar=75 and a standard deviation of s=10, compute z scores corresponding to raw scores of 95, 60, and 90
a) Z95=2.00 b)Z60= -1.50 c)Z90= 1.50
in a normal distribution of IQ scores, with Xbar=100 and s=15: a) what proportion of the area under the curve falls between scores of 115 and 130? b) what percentage of the cases have scores between 115 and 130? c) what is the probability that a case drawn at random will score between 115 and 130?
a) the proportion of area under the curve between 115 and 130 is .1359 b) the percentage of cases scoring between 115 and 130 is 13.59% c) the probability of drawing a case at random that scores between 115 and 130 is .1359
Scores on a statistics test are approximately normally distributed with mean=75 and s=10. Which score falls: a) one standard deviation above the mean? b) two standard deviations below the mean? How many standard deviations separate scores of: c) 70 and 80? d) 65 and 85?
a. 75+1(10)=85 b. 75-2(10)=55 c. 70 vs. 80=10 points = 1 standard deviation d. 65 vs. 85= 20 points= 2 standard deviation
theoretical distribution
an ideal distribution, one that we can imagine and that may be approximated by empirical distributions, but that does not actually exist
interval estimation
an inferential statistical procedure that uses sample data to compute a range of values having a known probability (or confidence) of capturing some population parameter of interest, usually the population mean or proportion.
explain how sample size affects the width of the confidence interval
as sample size increases, the width of the confidence interval decreases. this is because as N increases, the size of the standard error of the mean (Ox) that surrounds each side of the sample mean or proportion also decreases
how do the widths of the confidence intervals computed compare?
as the confidence level increases, the width of the confidence interval increases. the increase in width from the 95% to the 99% confidence interval is substantially greater than the increase from the 90% to the 95% confidence interval
explain how the variability affects the width of the confidence interval
as the data variability increases, the width of the confidence interval also increases. this is because as O increases, so does the size of the standard error of the mean (Ox) that surrounds each side of the sample mean or proportion
explain how the level of confidence affects the width of the confidence interval
as the level of confidence increases, the width of the confidence interval also increases. this is because as the level of confidence increases, so does the number of standard errors of the mean (Ox) that one must include on each side of the sample mean or proportion
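The effects of sample size, variability, and confidence level on interval width all follow from one expression: total width = 2 x z x O/square root of N. The sketch below varies one ingredient at a time (the function name is illustrative; the numbers are borrowed from earlier cards).

```python
import math

def interval_width(sd, n, z):
    """Total width of the interval Xbar +/- z * (sd / sqrt(n))."""
    return 2 * z * sd / math.sqrt(n)

# Larger N -> narrower interval.
print(interval_width(0.25, 50, 1.96), interval_width(0.25, 500, 1.96))
# Larger O -> wider interval.
print(interval_width(10, 50, 1.96), interval_width(100, 50, 1.96))
# Higher confidence (larger z) -> wider interval.
print(interval_width(15, 50, 1.65), interval_width(15, 50, 2.58))
```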
what does a percentile rank tell you that a z score does not? what does a z score provide that a percentile rank does not?
both z scores and percentile ranks locate scores in the distribution, but they convey different information. Percentile ranks tell one directly the percentage of cases falling at and below the specified score, but they provide only an ordinal scale of measurement. z scores do not locate scores in as understandable a manner as percentile ranks, but they do maintain an interval scale of measurement
sampling distribution of the mean
constructed by drawing all possible samples of size N from population, computing the mean for each sample, and plotting the frequency of occurrence of the various values that are obtained from one sample mean to the next
central limit theorem
describes three key features of the sampling distribution of the mean; its shape, its mean, and its standard deviation
empirical distribution
distribution of real, observed scores
several distributions are described below. which of these are sampling distributions? which are not? explain why. a. a distribution showing the frequency of occurrence of IQ scores for 100 college freshman b. a distribution showing the frequency of occurrence of mean IQ scores computed for all possible samples of size N=75 drawn from the entire freshman class c. a distribution depicting the proportions of people in a sample of 25 who favor and who do not favor a school bond issue d. a distribution depicting the frequency of occurrence of various sample proportions for all possible samples of size N=30 drawn from an entire population
distributions a) and c) are not sampling distributions. they are sample distributions that describe the distribution of scores in single samples. Distributions b) and d) are sampling distributions. They describe the frequency of occurrence of a sample statistic [the mean in b) and the proportion in d)] for all possible samples drawn from some population
sampling without replacement
each case, once drawn for inclusion in a sample, is not replaced into the population and therefore cannot be included more than once in any given sample
in each of the following data sets, cases are scored 1 if they favor a school bond issue and 0 if they do not. For each data set, compute: the proportion who favor the school bond issue; the mean; and the variance. compare the proportions and means. interpret the variances a) 1,1,1,1,1,0 b) 1,0,1,0,1,0 c) 1,1,1,1,1,1
for data set a): p=.83, Xbar=.83, s2 (s squared)=.14 for data set b): p=.50, Xbar=.50, s2 (s squared)=.25 for data set c): p=1.0, Xbar=1.0, s2 (s squared)=0. In each data set the proportion equals the mean. The variance is largest when cases are split evenly and is 0 when every case gives the same answer
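For data scored 0 and 1, the mean is the proportion of 1s, and the variance (dividing by N, as in these notes) works out to p(1-p). A short sketch with an illustrative function name:

```python
def describe_binary(scores):
    """Return (proportion of 1s, mean, variance with N in the denominator)."""
    n = len(scores)
    proportion = scores.count(1) / n
    mean = sum(scores) / n
    variance = sum((x - mean) ** 2 for x in scores) / n
    return proportion, mean, variance

for data in ([1, 1, 1, 1, 1, 0], [1, 0, 1, 0, 1, 0], [1, 1, 1, 1, 1, 1]):
    p, mean, variance = describe_binary(data)
    print(round(p, 2), round(mean, 2), round(variance, 2))
```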
what does Xbar mean?
mean
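compute the mean of the population from the data below. also compute the mean of the sampling distribution of the mean. compare these values. ages=10, 30, 50, 70, 90
the population mean is u=50, and the mean of the sampling distribution of the mean is also uXbar=50; the two values are equal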
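With a population this small, the sampling distribution of the mean can be built by brute force. The sketch below lists every possible sample and averages the sample means. The sample size of 2 and sampling with replacement are assumptions made here for illustration; the card does not state them.

```python
from itertools import product
from statistics import mean

population = [10, 30, 50, 70, 90]
SAMPLE_SIZE = 2  # assumed for illustration; not given in the card

def all_sample_means(values, sample_size):
    """Mean of every possible sample drawn with replacement."""
    return [mean(sample) for sample in product(values, repeat=sample_size)]

sample_means = all_sample_means(population, SAMPLE_SIZE)
print(len(sample_means))    # number of possible samples
print(mean(population))     # population mean, u
print(mean(sample_means))   # mean of the sampling distribution, uXbar
```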
in the sample data shown below, political preference (democrat or republican) is shown for each of 10 cases. construct a 95% confidence interval to estimate the proportion of republicans in the population Republican Republican Democrat Democrat Democrat Republican Republican Republican Democrat Republican
p=Xbar=.60 s=.52 N=10
confidence interval for the population mean
range of values having a known probability of capturing the population mean
sampling error
refers to the discrepancies that exist between the characteristics of a population and characteristics of samples drawn from the population
standard scores or z scores
represented by z, a standard score or z score measures the difference between a raw score and the mean of the distribution using the standard deviation of the distribution as the unit of measure
formula for standard deviation
s=square root of s2: the sample standard deviation = the square root of the sample variance
formula for corrected sample standard deviation
s^=square root of s^2: the corrected sample standard deviation = the square root of the corrected sample variance
sampling with replacement
sampling procedure in which each case, after being drawn for inclusion in a sample, is replaced into the population and becomes eligible for inclusion again in that same sample
Give an original example of empirical distribution and explain why it is not a theoretical distribution
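scores on a statistics test, heights of 23 men, ages of the students at a university, etc. Each is a distribution of real, observed scores with a finite number of cases and definite minimum and maximum values, so it is not a theoretical distribution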
Why should we care about the normal distribution if it doesn't exist in the real world?
to the degree that our data approximate the normal distribution, everything we know about the normal distribution also applies to our data
suppose you have drawn a sample from some large population and have computed the sample's mean. under which of the following circumstances would the sample mean be most likely to fall close to the population mean? under which circumstances would the sample mean be most likely to show a large deviation from the population mean? a) small sample size; low data variability b) large sample size; low data variability c) small sample size; high data variability d) large sample size; high data variability
the least variability of sample means (standard error of the mean) occurs when sample size is large and data variability is low. it is under these circumstances b) that any given sample mean would be most likely to fall close to the population mean. the greatest variability of sample means occurs when sample size is low and data variability is high. it is under these circumstances c) that any given sample mean would be most likely to deviate substantially from the population mean
In an approximately normal distribution of statistics test scores with mean=75 and s=10, what percentage of students score between 75 and 85? What is the probability of a student scoring between 75 and 85? What percentage falls between 55 and 65? What is the probability of finding a score in this range?
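the score of 75 falls at the mean of the distribution and 85 is 1 standard deviation above the mean. in a normal distribution, 34.13% of the cases fall between the mean and the score 1 standard deviation above the mean. Expressed as a proportion, this gives us the probability of finding a score in this range: .3413. The scores of 55 and 65 fall 2 and 1 standard deviations below the mean; 13.59% of the cases fall between them, so the probability of finding a score in this range is .1359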
compute the standard deviation of the sample means forming the sampling distribution of the mean of u=50 also use equation 5.1 to compute the standard error of the mean for this sampling distribution. Why do these values differ?
the standard deviation of the sample means forming the sampling distribution of the mean is s=21.60. this is close to, but not exactly the same as, the standard error of the mean computed according to the equation, Ox=20.00. the central limit theorem tells us that the equation is only accurate when sample sizes are N>/= 50
standard error of the mean
the standard deviation of the sampling distribution of the mean. approximately equal to the average absolute difference between the sample means and the mean of the population from which the samples were drawn
on one measure of scholastic aptitude, having a mean of Xbar=20 and a standard deviation of s=5, student A has a score of 27. on a different test, having a mean of Xbar=600 and a standard deviation of s=150, student B has a score of 675. which student has a higher scholastic aptitude?
these scores are compared by converting each to a z score: Za=(27-20)/5=1.4; Zb=(675-600)/150=.5. student A has the higher scholastic aptitude
True or false: as scores on depression get higher, scores on well-being get lower
true
true or false: the larger the sample size, the narrower the confidence interval
true
true or false: z scores can be used to scan for outliers
true
true or false-confidence intervals are not always useful especially when they are large
true
in a distribution of test scores showing a mean of Xbar=75 and a standard deviation of s=10, what raw scores correspond to z scores of -1.2, 1.6, and 0?
"work backwards" from z scores to raw scores using X = Xbar + Zx(s): a) 75+(-1.2)(10)=63 b) 75+1.6(10)=91 c) 75+0(10)=75
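Going from a z score back to a raw score is the z score formula solved for X. A minimal sketch (the function name is illustrative):

```python
def raw_score(z, mean, sd):
    """Raw score lying z standard deviations from the mean."""
    return mean + z * sd

# Card data: Xbar=75, s=10.
for z in (-1.2, 1.6, 0):
    print(z, round(raw_score(z, 75, 10), 2))
```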
why must we be cautious when drawing conclusions about populations on the basis of the sample data?
we must be cautious when drawing conclusions about populations based on sample data because no one sample drawn from a population is likely to provide a perfect reflection of the population
Compute the 90%, 95%, and 99% confidence intervals for the population mean when N=50 and O= 15 and Xbar=100
when N=50 and O=15, Ox=15/square root of 50=2.12. 90%: Xbar +/- 1.65 Ox = 100 +/- 1.65(2.12) = 100 +/- 3.50. 95%: Xbar +/- 1.96 Ox = 100 +/- 1.96(2.12) = 100 +/- 4.16. 99%: Xbar +/- 2.58 Ox = 100 +/- 2.58(2.12) = 100 +/- 5.47