Data analysis - Ch 6
z statistic formula
z = (M - μM) / σM, where M is the sample mean and σM = σ / √N is the standard error (compute σM first)
what does the notation N(0,1) mean?
- 0 = mean, median & mode - 1 = standard deviation & variance (because 1 squared = 1) - N = normal distribution
distribution of means: same average
- Consider a z score for the distribution of scores: • Suppose z=0, for a raw score of 73. - What is z for the mean of the distribution of means (denoted μM)? • Also z = 0, which would correspond to a raw mean of 73. - Whatever the average of a population of scores is, that's also the average of a distribution of means computed from that population.
converting to the standard normal distribution simplifies interpretations
- someone with a z score of 1 is one standard deviation above the mean, always, by definition - you get 35 questions right on the first exam. What does that mean? On its own, that statement is not very interpretable - you got a z score of +0.5. What does that mean? It means you scored half a standard deviation above the average grade
computing a z statistic
- we can also compute a z statistic for a single mean in a distribution of means - this is very similar to computing a z score for a distribution of scores - we now need to change the equation slightly and use standard error (rather than the standard deviation)
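A minimal Python sketch of the z statistic for one sample mean. It assumes the population mean μ and standard deviation σ are known and uses z = (M - μM) / σM with σM = σ / √N. The function name `z_statistic` and the numbers are hypothetical illustrations.

```python
import math

def z_statistic(sample_mean, mu, sigma, n):
    """z for one sample mean within a distribution of means (hypothetical helper)."""
    # Standard error: the standard deviation of the distribution of means
    standard_error = sigma / math.sqrt(n)
    # mu is also the mean of the distribution of means (mu_M)
    return (sample_mean - mu) / standard_error

# Hypothetical numbers: population mu = 100, sigma = 15; a sample of N = 25 averages 106
z = z_statistic(106, 100, 15, 25)
print(z)  # 2.0
```

The only change from a z score for a single raw score is the denominator. It is the standard error (here 15 / √25 = 3), not σ itself.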
a score of z = -1.96 is ___
1.96 standard deviations below the mean - 1.96 is an important number for hypothesis testing
Distributions of Means: More Normality
Consider a distribution of scores: • If the distribution of scores is normal, then the distribution of means will also be normal. • If the distribution of scores is not normal, then the distribution of means will become increasingly normal as sample size increases. - We can construct a distribution of means for any sample size N. That distribution represents the set of all sample means we could observe when we sample from a specific population of scores using sample size N.
normal distribution
a distribution of values having a specific shape that is symmetric, unimodal, and bell-shaped. Also defined mathematically.
distribution of means
a distribution of values, where each value is the mean (average) of values from another distribution
when you have a bigger sample size you have ____
a more normal-looking distribution of means, with less variability
standard normal distribution
a normal distribution with a mean of 0 and a standard deviation of 1. - notation: N(0,1)
the central limit theorem will often allow us to work with approximately normal distributions even when ___
the population distribution is not normal
if we have a positive Z score we are going to be ___ the mean
above the mean
how do we get a percentile?
add up the percentages that come at or below the number you are looking for
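A minimal Python sketch of "add up the percentages at or below" for a list of raw scores. Each score contributes an equal share (100 / number of scores), and the sketch sums the shares of every score at or below the target value. The scores and the helper name `percentile_rank` are hypothetical. Other percentile conventions exist, such as counting only half of the tied scores.

```python
def percentile_rank(scores, value):
    """Percent of scores at or below `value` (hypothetical helper)."""
    at_or_below = sum(1 for s in scores if s <= value)
    return 100 * at_or_below / len(scores)

# Hypothetical exam scores (number of questions correct)
scores = [28, 30, 31, 33, 35, 35, 36, 38, 40, 42]
print(percentile_rank(scores, 35))  # 60.0
```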
the standard error gets smaller as the sample gets larger
true
a z score table tells us the corresponding percentile for ____
any value of z
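In place of a printed z table, Python's standard library (3.8+) provides `statistics.NormalDist`. Its `cdf(z)` method gives the proportion of the curve at or below z. This sketch assumes the standard normal N(0, 1). The helper name `z_to_percentile` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)  # N(0, 1)

def z_to_percentile(z):
    # cdf(z) = proportion of the standard normal curve at or below z
    return 100 * standard_normal.cdf(z)

for z in (-1.96, 0, 0.5, 1):
    print(z, round(z_to_percentile(z), 1))
```

A z of 0 lands at the 50th percentile, and positive z values land above it.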
a score of z = 0 is ___
at the mean
if we have a negative Z score we are going to be ___ the mean
below the mean
what does z = ±1.96 mean?
95% of a normal distribution lies between these values - this is going to be a very common cutoff
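A quick check of the ±1.96 cutoff, assuming a standard normal distribution and using Python's `statistics.NormalDist`. The share of the curve between the two cutoffs is cdf(1.96) - cdf(-1.96). `inv_cdf` works in the reverse direction.

```python
from statistics import NormalDist

std_normal = NormalDist()  # defaults: mu=0.0, sigma=1.0

# Share of the curve between z = -1.96 and z = +1.96
share_between = std_normal.cdf(1.96) - std_normal.cdf(-1.96)
print(round(share_between, 3))  # 0.95

# Reverse direction: the z that cuts off the top 2.5%
print(round(std_normal.inv_cdf(0.975), 2))  # 1.96

# Same idea for z = +/-1
within_one = std_normal.cdf(1) - std_normal.cdf(-1)
print(round(within_one, 3))  # 0.683
```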
Benford's law
certain digits are more common than others in the leading place of a number - leading place: the leftmost (first nonzero) digit - applies to many naturally occurring data sets - 1 occurs more frequently than 2, 2 occurs more frequently than 3, etc. - tends to hold for financial data
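The card does not give a formula. The standard statement of Benford's law, used here as an outside assumption, is that leading digit d appears with probability log10(1 + 1/d). This Python sketch computes those expected shares. It also includes a hypothetical `leading_digit` helper, which assumes nonzero whole numbers.

```python
import math

# Benford's law (standard formulation): expected share of numbers whose leading digit is d
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
for d, p in benford.items():
    print(d, f"{p:.1%}")

def leading_digit(n):
    """Leftmost digit of a nonzero whole number (hypothetical helper)."""
    return int(str(abs(n))[0])
```

The expected shares fall from about 30.1% for a leading 1 to about 4.6% for a leading 9. Comparing these shares with the leading-digit counts of a real data set is one way to check whether the data follow the pattern.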
conceptual idea: deviations from a central tendency are ___
more often small than large, and the deviations occur randomly in either direction - they are equally likely to be above or below the mean
how to convert a z score back to a raw score
multiply the z score by the standard deviation and add the mean
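A minimal Python sketch of the reverse conversion, raw score = mean + z × standard deviation. The exam mean (33) and SD (4) are hypothetical values. With them, a z of +0.5 corresponds to 35 questions correct.

```python
def z_to_raw(z, mean, sd):
    # Multiply z by the standard deviation, then add the mean
    return mean + z * sd

# Hypothetical exam: mean 33 questions correct, SD 4
print(z_to_raw(0.5, 33, 4))  # 35.0
print(z_to_raw(-1.96, 33, 4))
```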
what does z = ±1 mean?
one standard deviation above/below the mean. - between these 2 values lie 68% of the population
central limit theorem
refers to how a distribution of sample means is closer to normal than the distribution of scores, even when the population distribution is not normally distributed
when we compute the average of a ____ we are effectively drawing one observation (a mean) from a distribution of means
sample
we have a distribution of scores, and a distribution of means for a _____
specific number of scores (N)
we are always talking about the distance from the mean in ___
standard deviation units
what does z = 0 mean?
the mean of a distribution
the standard normal distribution is a specific version of ____
the normal distribution - it is a normal distribution that is defined to have a mean of 0 and a standard deviation of 1
standard error (σM).
the standard error is the standard deviation of a distribution of means for a specific sample size, N.
standard normal distribution is also called ____
the standard normal curve
why compute z scores?
translating questions to put them on the same scale - "how likeable are you, on a 1-7 scale?" - "how warm would your friends rate you on a feeling thermometer (0-100)?" - averaging questions with different scales produces fairly meaningless values (at least with the arithmetic mean). Standardizing (computing z scores for each question) equalizes the scales - it's a legitimate way to compare apples to oranges - alternatively, we can think of it as converting everything into mangos - we are putting everything on the same metric of measurement
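A minimal Python sketch of standardizing before averaging. Each question is converted to z scores separately, and then the z scores are averaged, so neither the 1-7 scale nor the 0-100 scale dominates. The answers are hypothetical. The sketch uses the population SD (`pstdev`) as an assumption; the sample SD (`stdev`) would work the same way.

```python
from statistics import mean, pstdev

def z_scores(values):
    # Standardize one question: subtract its mean, divide by its SD
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]

# Hypothetical answers from four people
likeable_1_to_7 = [3, 5, 6, 7]
warmth_0_to_100 = [40, 55, 80, 90]

likeable_z = z_scores(likeable_1_to_7)
warmth_z = z_scores(warmth_0_to_100)

# Both questions are now on the same metric, so averaging them is meaningful
combined = [(a + b) / 2 for a, b in zip(likeable_z, warmth_z)]
print([round(c, 2) for c in combined])
```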
the distribution of means and the distribution of scores have the same mean
true
the distribution of means has less variability
true
distribution of means: less variability
Consider a distribution of scores: • Suppose σ = 5 for the distribution of scores x. • What is σ for the corresponding distribution of means? • It's smaller than 5 (as long as your sample size is greater than 1). We will call the standard deviation of means the standard error and label it σM. - specifically, we can compute the standard error as σM = σ / √N
to convert to a standard normal distribution, we will need to take our raw scores and ____ them
standardize them - these standardized scores are referred to as z scores
the distribution of means becomes more normal as N increases (even if the population distribution is not normally distributed)
true - even if we start with a skewed distribution, the sampling distribution of the mean becomes approximately normal as N increases
as we decrease our sample size we will increase ____
variability (in the distribution of means)
what can we do with the distribution of means?
we can create a distribution of means and compute z statistics for these means - these distributions are often referred to as sampling distributions of the mean - the set of averages that can be computed for all possible combinations of N scores from a particular population of scores - every sample has the same specific sample size, N
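A tiny Python sketch that builds a complete distribution of means by listing every possible sample. It assumes samples are drawn with replacement and that order matters (`itertools.product`). Under that assumption, the variance of the means equals the population variance divided by N exactly. The population of four scores is hypothetical.

```python
from itertools import product
from statistics import mean, pvariance

population = [2, 4, 6, 8]  # hypothetical population of scores
N = 2                      # sample size

# Every ordered sample of N scores, drawn with replacement (assumption)
sample_means = [mean(s) for s in product(population, repeat=N)]

print(len(sample_means))                              # 16 possible samples
print(mean(population), mean(sample_means))           # same mean
print(pvariance(population), pvariance(sample_means)) # variance shrinks by a factor of N
```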
to allow for more direct comparisons we should convert ___ scores into___
we should convert raw scores into standardized scores (z scores)
as the sample size increases ____
our samples will better represent the population distribution
do the 2 halves of the distribution have the same properties?
yes
are distributions other than the normal curve possible?
yes, other distributions are possible, even though the normal curve does occur often
if you continue to decrease interval width ____
you would get a more normal looking distribution
because the standard normal distribution (set of z scores) has a specific shape, any normal set of data can be converted into ___ to learn even more about the individual scores
z scores
standardized values are called ____
z scores
why do we convert to z scores?
1. converting to the standard normal distribution simplifies equations 2. converting to the standard normal distribution simplifies interpretations
how to convert a raw score to a z score
1. subtract the mean from each data point 2. then divide each resulting difference by the standard deviation - we will use both these operations to convert to a standard normal curve
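The two steps can be sketched in Python as follows. Step 1 moves the mean to 0, and step 2 scales the spread to 1. The raw scores are hypothetical, and the sketch assumes the population SD (`pstdev`).

```python
from statistics import mean, pstdev

scores = [28, 31, 33, 35, 38]  # hypothetical raw scores
m = mean(scores)     # 33
sd = pstdev(scores)  # population SD

# Step 1: subtract the mean -> the deviations now average 0
deviations = [x - m for x in scores]
# Step 2: divide by the SD -> the spread is now 1
z = [d / sd for d in deviations]

print(mean(deviations))       # 0
print(round(pstdev(z), 10))   # 1.0
```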
what can we construct from the central limit theorem?
- Through the Central Limit Theorem, we can construct a distribution of means from all combinations of N scores from an original distribution of scores. That distribution of means will be more normal than the original distribution (assuming that the original distribution is not perfectly normal), and any distribution of means for a higher N will be more normal than a distribution of means for lower N. - With sufficiently high N (e.g., N > 30), the distribution of means approximates a normal distribution, even when the population distribution of scores is not normally distributed.
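A small Python simulation of the central limit theorem. Because listing every combination is impractical for realistic N, this sketch draws many random samples instead. It assumes a right-skewed exponential population (mean 1, SD 1), N = 30, 5,000 samples, and a fixed seed. All of these are hypothetical choices. The `skewness` helper is also hypothetical.

```python
import random
import statistics

def skewness(values):
    # Average cubed z score: about 0 for a symmetric distribution
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - m) / sd) ** 3 for v in values])

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical skewed population of scores: exponential, mean 1, SD 1
scores = [rng.expovariate(1.0) for _ in range(20_000)]

N = 30
reps = 5_000
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(N)) for _ in range(reps)
]

print("skew of scores:", round(skewness(scores), 2))
print("skew of means: ", round(skewness(sample_means), 2))
print("mean of means: ", round(statistics.fmean(sample_means), 3))
print("SD of means:   ", round(statistics.pstdev(sample_means), 3))
```

The individual scores remain strongly skewed, but the sample means are much closer to symmetric. They center near the population mean of 1, and their SD comes out close to 1 / √30 (about 0.18).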
what is the difference between a standard deviation & a standard error?
- a standard deviation is a measure of variability for a distribution of scores in a single sample or in a population of scores. - a standard error is the standard deviation of a distribution of means (of all possible samples of a given size from a particular population of individual scores) - they are similar but describe different distributions
what do we know about normal distributions?
- a z score of 0 cuts the lower half of scores from the upper half of scores - roughly 34% of scores fall between z = 0 & z = 1 - roughly 98% of scores fall below z = 2
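These rough figures can be checked with Python's `statistics.NormalDist`, assuming a standard normal distribution.

```python
from statistics import NormalDist

std_normal = NormalDist()  # N(0, 1)

below_0 = std_normal.cdf(0)                               # z = 0 splits the curve in half
between_0_and_1 = std_normal.cdf(1) - std_normal.cdf(0)   # "roughly 34%"
below_2 = std_normal.cdf(2)                               # "roughly 98%"

print(below_0, round(between_0_and_1, 3), round(below_2, 3))
```

The exact share below z = 2 is about 97.7%, which rounds to the "roughly 98%" above.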
standard error formula
σM = σ / √N - as N increases, the standard error decreases: variability in the distribution of means decreases as N increases
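A minimal Python sketch of σM = σ / √N. It reuses σ = 5 from the distribution-of-means example and tries a few hypothetical sample sizes. The helper name `standard_error` is hypothetical.

```python
import math

def standard_error(sigma, n):
    # sigma_M = sigma / sqrt(N)
    return sigma / math.sqrt(n)

sigma = 5  # SD of the distribution of scores
for n in (1, 4, 25, 100):
    print(n, standard_error(sigma, n))
# 1 5.0
# 4 2.5
# 25 1.0
# 100 0.5
```

With N = 1 the standard error equals σ. Because of the square root, quadrupling N only halves the standard error.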
converting to the standard normal distribution simplifies equations:
- mean = 0 - variance = 1 - standard deviation = 1
a score of z = 1 is ___
1 standard deviation above the mean
what are the 3 properties for a distribution of means?
1. the distribution of means has the same mean as the population distribution of scores 2. the distribution of means has less variability than the distribution of scores; this also reduces the range of observed values 3. assuming a sufficient sample size, the distribution of means becomes more normally distributed (assuming that the parent population is not normal already) --> this last point is REALLY important because it is going to let us know which type of distribution we are working with
what are the characteristics of a standard normal distribution / standard normal curve
1. unimodal 2. bell-shaped 3. symmetric - also defined mathematically (we can talk about different regions)
subtracting the mean from each data point :
changes the distribution's mean to 0
standard deviation (scores) symbol for mean
μ
standard error (means) symbol for mean
μM
standard deviation (scores) symbol for spread
σ
standard error (means) symbol for spread
σM