Stats exam 2
Sample mean
Sum of x / n = EX/n ex. 5+4+3+2 = 14, 14/4 = 3.5
Variance
Average of the squared deviations of the scores around the mean -use computational formula
Modality
-number of meaningful peaks -Unimodal -Bimodal -etc
Population mean
"mu"
Ex
- add all the x values together
What do we mean when we say that distribution of scores is negatively skewed? What is the relationship of the 3 measures of CT when the distribution is negatively skewed?
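-The tail points to the left, toward the low scores -Typically mean < median < mode: the mean is pulled furthest toward the tail, the mode sits at the peak, and the median falls between them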
What do we mean when we say that distribution of scores is positively skewed? What is the relationship of the 3 measures of CT when the distribution is positively skewed?
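-The tail points to the right, toward the high scores -Typically mode < median < mean: the mean is pulled furthest toward the tail, the mode sits at the peak, and the median falls between them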
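A small Python check of the two skew cards, using made-up score sets (the data are hypothetical, not from the course). It only illustrates the typical ordering of the three measures of CT; real data do not always follow it.

```python
from statistics import mean, median, mode

# Hypothetical scores with the tail toward the low end (negative skew)
neg = [2, 6, 7, 8, 8, 9, 9, 9]
# Hypothetical scores with the tail toward the high end (positive skew)
pos = [1, 1, 1, 2, 2, 3, 4, 8]

# Each tuple is (mean, median, mode)
neg_ct = (mean(neg), median(neg), mode(neg))
pos_ct = (mean(pos), median(pos), mode(pos))

print(neg_ct)  # here: mean < median < mode
print(pos_ct)  # here: mode < median < mean
```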
Cumulative frequency
-Frequency of all the scores at or below a particular score -Running total of frequencies
Histogram
-Interval/ Ratio -Nice graph for discrete variable (whole numbers) -series of vertical bars centered on x-axis -height = frequency -adjacent bars touch as long as there is a frequency
Frequency polygon
-Interval/ratio data (measurement) -Make a dot over score on x-axis -Height = frequency -Connect dots with straight lines -Always bring lines to x-axis
Skew distributions
-Lack symmetry
Bar graph
-Nominal or ordinal data (categories) -series of vertical bars centered over score on x-axis -height = frequency -adjacent bars do NOT touch
Cumulative percentage = percentile
-Percentage of cases at or below a particular score -the score at the nth percentile is always the upper real limit of the class interval -Running total of relative percentages
Relative percentage
-Percentage of the time the raw score occurs in the sample or population: (f/n) x 100
What do we mean when we say that a distribution of scores is symmetric? What is the relationship of the 3 measures of CT when the distribution is symmetric?
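-The distribution can be folded in half so that the two sides match -The mean, median, and mode are all the same value (for a distribution with one peak)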
Grouped frequency distribution
-Simple frequency table wouldn't be appropriate here because the numbers are so spread out so we construct this -With this we can group scores into class intervals *see example
Ex2
-Square each x value then sum them ex. 5^2 + 4^2 + 3^2 = 25 + 16 + 9 = 50
E (sigma)
-Summation sign -to add or to sum ex. x = 5, 4, 3, 2 so Ex = 14
Sigma
-Summation sign=to add or to sum
Normal distribution
-Symmetrical
Statistical pictures
-Things to include in a good visual display 1.) the data should stand out clearly from the background 2.) there should be a title and purpose of the picture 3.) everything should be clearly labeled (pie segments, axes, whether the axis starts at zero or not) 4.) there should be a source given for the data 5.) there should be as little "chart junk" as possible
Factors of CT
-Type of data (N,O,I,R) -Shape of distribution (symmetrical, skewed)
Notation
-Use x or y: E, Ex, Ex2, (Ex)2, Exy, (Ex)(Ey), N, n
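A minimal Python sketch of what each piece of notation asks you to compute. The x values are the ones used on the summation card; the y values are hypothetical and only there so the two-variable terms have something to work on.

```python
x = [5, 4, 3, 2]  # x values from the summation card
y = [1, 2, 3, 4]  # hypothetical y values, one paired with each x

sum_x = sum(x)                               # Ex
sum_x_sq = sum(v ** 2 for v in x)            # Ex2: square each x, then sum
sq_sum_x = sum(x) ** 2                       # (Ex)2: sum first, then square
sum_xy = sum(a * b for a, b in zip(x, y))    # Exy: multiply pairs, then sum
prod_sums = sum(x) * sum(y)                  # (Ex)(Ey): sum each, then multiply
sum_diff = sum(a - b for a, b in zip(x, y))  # E(x-y)
n = len(x)                                   # n: sample size

print(sum_x, sum_x_sq, sq_sum_x, sum_xy, prod_sums, sum_diff, n)
```

Note that Ex2 and (Ex)2 differ only in the order of squaring and summing, and that E(x-y) comes out equal to Ex - Ey.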
Measures of variability
-a score that indicates how spread out the scores are in a distribution A: 0 2 6 10 12 (mean = 6) B: 4 5 6 7 8 (mean = 6) C: 6 6 6 6 6 (mean = 6) -The means are the same but the distributions are NOT identical
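The three distributions A, B, and C can be compared in a few lines of Python. This is only a sketch: it assumes the sample variance (n - 1 in the denominator), which is what `statistics.variance` computes.

```python
from statistics import mean, variance

groups = {
    "A": [0, 2, 6, 10, 12],
    "B": [4, 5, 6, 7, 8],
    "C": [6, 6, 6, 6, 6],
}

summary = {}
for name, scores in groups.items():
    summary[name] = {
        "mean": mean(scores),
        "range": max(scores) - min(scores),  # highest - lowest
        "variance": variance(scores),        # sample variance (n - 1)
    }

for name, stats in summary.items():
    print(name, stats)
```

All three means are 6, but the range and variance shrink from A to B to C, which is exactly what a measure of variability is meant to show.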
Class intervals
-groups numerically defined in such a way that any given raw score can belong to one and only one group -Typically use between 5 and 20 class intervals -Let's say we want 10 class intervals 1.) find the highest and lowest scores and subtract them ex. 58-12 = 46 2.) divide the range by the number of intervals ex. 46/10 = 4.6, which we round to 5 -This gives us the class interval size, meaning each class interval holds 5 values 3.) start with the lowest score and count 5 spots for the first class interval (ex. 12...16) 4.) tally the corresponding frequency for each of the class intervals
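The steps above can be sketched in Python. The function names are hypothetical, and rounding the interval size up with `math.ceil` is an assumption (the card just says 4.6 is rounded to 5); the scores passed to `tally` are made up.

```python
import math

def class_intervals(lowest, highest, k):
    """Return (interval size, list of (low, high) apparent limits)."""
    width = math.ceil((highest - lowest) / k)  # e.g. 46 / 10 = 4.6 -> 5
    intervals = []
    low = lowest
    while low <= highest:                      # keep going until the top score is covered
        intervals.append((low, low + width - 1))
        low += width
    return width, intervals

def tally(scores, intervals):
    """Count how many scores fall inside each class interval."""
    return [sum(lo <= s <= hi for s in scores) for lo, hi in intervals]

width, intervals = class_intervals(12, 58, 10)
freqs = tally([12, 15, 17, 58], intervals)     # hypothetical raw scores

print(width)      # 5
print(intervals)  # starts at (12, 16), ends at (57, 61)
print(freqs)
```

Every raw score lands in exactly one interval because the intervals do not overlap and together cover everything from the lowest to the highest score.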
what is a z score? what does the sign of z indicate? what does the value of z indicate?
-represents the number of standard deviation units an observation is above or below the mean -the sign indicates direction (positive = above the mean, negative = below) -the value indicates how far away from the mean the observation is
Midpoints
-the halfway point of class interval -The average of the lower and upper real limits
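A short Python sketch of the midpoint rule, assuming whole-number scores (a measurement unit of 1) so that real limits sit half a unit beyond the listed limits. The function names and the 12-16 interval are illustrative only.

```python
def real_limits(low, high, unit=1):
    """Lower and upper real limits: half a measurement unit beyond each end."""
    return low - unit / 2, high + unit / 2

def midpoint(low, high, unit=1):
    """Average of the lower and upper real limits."""
    lower, upper = real_limits(low, high, unit)
    return (lower + upper) / 2

print(real_limits(10, 10))  # (9.5, 10.5) for the single score 10
print(real_limits(12, 16))  # (11.5, 16.5) for the class interval 12-16
print(midpoint(12, 16))     # 14.0
```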
Pictures of categorical data -Pie charts
-useful when only one categorical variable is measured -Shows what percent of the whole falls into each category
Measures of central tendency
1. Mean 2. Median 3. Mode -score that summarizes the location of a distribution on a variable
characteristics of a good statistical picture?
1. data should stand out clearly from background 2. should be a title/purpose of the picture 3. everything clearly labeled (segments, axes, etc) 4. should be a source given for the data 5. little chart junk as possible
what are the key features of normal distribution
1. most of the scores cluster around the middle of the distribution. as the distance from the middle increases in either direction there are fewer and fewer scores 2. symmetrical 3. 3 measures of CT will fall at center of distribution 4. normal distribution is asymptotic to the x-axis - the tails never touch the x-axis because the curve extends infinitely in both directions 5. constant relationship with the standard deviation (sigma)
Key features of normal distribution
1.) Most of the scores cluster around the middle of the distribution. As the distance from the middle increases, in either direction, there are fewer and fewer scores. 2.) Normal distribution is symmetrical 3.) The 3 measures of central tendency fall at precisely the center of the distribution 4.) normal distribution is asymptotic to the x-axis. the tails will never join the x-axis. 5.) The normal distribution has a constant relationship with standard deviation
Three types of measures of variability
1.) Range 2.) Interquartile range 3.) Variance
Characteristics of z-distributions
1.) always has the same shape as the raw score distribution. for example, if the raw score distribution is bimodal then the z-distribution is bimodal. 2.) mean of any z-distribution is 0 3.) standard deviation is equal to 1
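These characteristics can be checked on a small made-up data set. The sketch below assumes z = (x - mean) / sd and uses the population standard deviation (`statistics.pstdev`); the cards do not say which standard deviation the course uses, so treat that choice as an assumption.

```python
from statistics import mean, pstdev

raw = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical raw scores
m = mean(raw)
sd = pstdev(raw)                # assumption: population SD (divide by N)

# Transform every raw score into a z-score
z = [(x - m) / sd for x in raw]

z_mean = mean(z)
z_sd = pstdev(z)

print(z)             # same ordering and spacing pattern as the raw scores
print(z_mean, z_sd)  # mean of the z-distribution and its standard deviation
```

Because every score is shifted and rescaled by the same two numbers, the shape of the distribution is unchanged; only the mean and standard deviation move to 0 and 1.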
Simple distribution
1.) go through raw data to find the lowest and highest score 2.) list each score between lowest and highest including them (under x) 3.) Tally the corresponding frequency *see examples*
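The steps above can be sketched in Python, with the cumulative frequency, relative percentage, and cumulative percentage columns added alongside. The raw scores are hypothetical, and the table is built from lowest to highest score so the running totals read as "at or below".

```python
from collections import Counter

scores = [3, 3, 3, 4, 5, 5, 6, 6, 6, 6]  # hypothetical raw data
n = len(scores)
counts = Counter(scores)                 # tally of each score

table = []  # rows of (x, f, relative %, cumulative f, cumulative %)
cum_f = 0
for x in range(min(scores), max(scores) + 1):  # every score from lowest to highest
    f = counts[x]                              # 0 if the score never occurs
    cum_f += f                                 # running total of frequencies
    table.append((x, f, 100 * f / n, cum_f, 100 * cum_f / n))

for row in table:
    print(row)
```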
Median
50th percentile -Ordinal, interval, or ratio data -Best measure of CT when the distribution is skewed -Halfway point/exact midpoint of any distribution steps: arrange scores according to magnitude, find the very middle (with an even number of scores, average the two middle scores)
Displaying data
8 7 10 12 9 n= 5
What is the difference between a bar graph and a histogram
Bar- nominal or ordinal (categories) -bars do not touch Histogram- interval or ratio -graph good for discrete or whole numbers -Bars do touch
What is a measure of CT, and why are there 3 measures?
CT= mean, median, mode -Scores that summarize the location of a distribution on a variable Factors- type of data (NOIR), shape of distribution
Z-score
If we need to standardize a score, such as to make comparisons, we compute a z-score. -It is a standard score that represents the number of standard deviation units an observation is above or below the mean.
Standard normal distribution
If we transform the normal distribution into a standard form we get this -has a mean equal to 0 and a standard deviation equal to 1. The proportion of area under the standard normal curve is the same as the relative frequency (proportion of the time a raw score occurs)
Mean
Interval or ratio data -Most widely used -Best measure of CT when distribution is symmetrical -Arithmetic average of all of the scores in the data set
Pictures of categorical data: Pictogram
Like a bar graph except that it uses pictures related to the topic of the graph
Normal distribution
Many distributions of measures conform to the normal curve. The normal distributions are a family of distributions. The normal distribution is actually a theoretical distribution based on a population of an infinite number of cases. The mean is "mu" and the standard deviation is "sigma"
what are the 3 measures
Mean- I or R data -widely used -Best measure of CT when distribution is symmetrical -Arithmetic average of all the scores in the data set Median- 50th percentile -O, I, R data -Best measure of CT for skewed distributions -Exact midpoint Mode- NOIR -Most frequently occurring score in the data set -you can have more than one mode
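All three measures are available in Python's standard `statistics` module. The scores below are hypothetical and chosen so that there are two modes.

```python
from statistics import mean, median, multimode

scores = [2, 3, 3, 5, 7, 7, 8]  # hypothetical data set

avg = mean(scores)         # arithmetic average
mid = median(scores)       # exact midpoint once the scores are ordered
modes = multimode(scores)  # every most frequently occurring score

print(avg, mid, modes)
```

`multimode` returns a list because, as the card says, a data set can have more than one mode.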
Range
Measurement of the width of the entire distribution Highest score - lowest score = range
Exy
Multiply x with corresponding y then sum
Mode
NOIR -Most frequently occurring score or scores in the data set -You can have more than one mode
N
Population size # of cases in the population
n
Sample size # of cases in the sample 1,2,3,4 n=4
Frequency distribution
Table that organizes data based on how often the scores occur
Variance
The average of the squared deviations of the scores around the mean. Computational formula: s2 = [EX2 - (EX)2/n] / (n - 1)
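A worked check in Python, assuming the usual computational formula for the sample variance, s2 = [EX2 - (EX)2/n] / (n - 1). It uses the scores 5, 4, 3, 2 from the sample mean card and compares the result with the definitional formula and with `statistics.variance`.

```python
from statistics import variance

x = [5, 4, 3, 2]
n = len(x)

sum_x = sum(x)                     # EX = 14
sum_x_sq = sum(v ** 2 for v in x)  # EX2 = 54

# Computational formula
s2 = (sum_x_sq - sum_x ** 2 / n) / (n - 1)

# Definitional formula: squared deviations around the mean, over n - 1
m = sum_x / n
s2_def = sum((v - m) ** 2 for v in x) / (n - 1)

print(s2, s2_def, variance(x))  # all three agree (5/3)
```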
Interquartile range
The range of the middle 50% of the distribution 1st: arrange the scores in order Steps: 1.) find the median 2.) find the median of the upper half and the median of the lower half 3.) subtract the lower-half median from the upper-half median
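The steps can be sketched in Python as below. The function name is hypothetical, and leaving the middle score out of both halves when n is odd is an assumption (textbooks differ on this); the scores are made up. Sorting lowest to highest gives the same answer as highest to lowest.

```python
from statistics import median

def iqr(scores):
    """Interquartile range via the median-of-halves method (needs 2+ scores)."""
    ordered = sorted(scores)   # arrange the scores in order
    half = len(ordered) // 2
    lower = ordered[:half]     # lower half
    upper = ordered[-half:]    # upper half (middle score skipped when n is odd)
    return median(upper) - median(lower)

print(iqr([1, 3, 5, 7, 9, 11, 13, 15]))  # 12 - 4
print(iqr([7, 1, 5, 3, 2, 6, 4]))        # 6 - 2
```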
Note: about z-score
The sign of the z-score indicates direction and the value of the z-score indicates how far away from the mean the observation is. Thus, a z-score of 1.5 is one and one-half standard deviations above the mean. A z-score of -2.0 is two standard deviations below the mean. Z-scores are especially useful for comparing the performance of different individuals on different measures
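A small Python illustration of comparing performance on two different measures, assuming z = (x - mean) / sd. The function name, exam scores, means, and standard deviations are all made up for the example.

```python
def z_score(x, mean, sd):
    """Number of standard deviation units x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical: 85 on exam A (mean 80, sd 5) vs. 70 on exam B (mean 60, sd 20)
z_a = z_score(85, 80, 5)
z_b = z_score(70, 60, 20)

print(z_a, z_b)  # 1.0 0.5
```

The exam B score is 10 points above its mean and the exam A score only 5, yet the exam A score is the stronger performance relative to its own distribution.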
real limits
Those points falling one half a measurement unit above and below a number ex. 10: 9.5 (lower real limit) to 10.5 (upper real limit)
Pictures of measurement variables: Line graphs
Useful for displaying how a measurement changes over time
Z-distribution
a distribution of z-scores produced by transforming all raw scores in a distribution into z-scores. For example, I could transform our exam one grades into z-scores. Then we would have a z-distribution.
What is a measure of variability?
a score that indicates how spread out the scores are in a distribution -means can be the same but that does not mean the distributions are identical -AKA find the mean
(Ex)(Ey)
add all x values, add all y values, multiply
why do we need standard scores or z-scores
if we need to standardize a score, such as make a comparison, we compute a z-score
what is the mean and sd of any z-distribution?
mean = 0 sd = 1
what is the mean and standard deviation of the standard normal distribution?
mean= 0 sd = 1
range
measurement of the width of the entire distribution -Highest - lowest score = range
Raw data
no organization, just collected data
what is modality of a distribution
number of meaningful peaks
standard deviation
sigma (σ)
E(x-y) = Ex-Ey
see examples
Negative skew distributions
see the tail, it's going to the left (toward the low scores)
Positive skew distributions
see the tail, it's going to the right (toward the high scores)
IQR
the range of the middle 50% of the distribution 1- arrange scores in order 2- find median 3- find median of upper and lower half 4- subtract the lower-half median from the upper-half median
Pictures of measurement variables: Scatterplot
useful for displaying the relationship between two measurement variables. Each dot = one case (its pair of values on the two variables)
Pictures of categorical data: Bar graphs
useful when there are two or three categorical variables
Graphing frequency distributions
x = scores or midpoints y = frequency
