Statistics
Comparing several groups of quantitative data (for the relationship between a categorical explanatory variable and a quantitative response variable). Only one of the two variables is categorical.
Side-by-side boxplots
Boxplots
are most useful when presented side by side to compare and contrast distributions from two or more groups. Also called box-and-whisker plots.
box plot
cannot be used to determine which city has more households; it does not indicate how many data values are in the dataset
midpoint
center of distribution
center
mean and median
IQR
measures the variability of a distribution; gives the range covered by the middle 50% of the data
left skewed
median > mean (the median is the larger value); most of the data is on the right side of the graph and the tail is on the left. Ex.: age at death
center and spread
median and IQR
skewed right
most of the data is on the left side and the tail is on the right; the skew is named for the side the tail is on. mean > median (the median is the smaller value). Ex.: salary
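The mean > median pattern for right-skewed data can be checked with a small sketch. The salary figures below are hypothetical (in thousands of dollars) and were chosen only so that one large value creates a right tail.

```python
from statistics import mean, median

# Hypothetical salaries in thousands of dollars; the single large value
# creates a long tail on the right.
salaries = [30, 32, 35, 38, 40, 42, 45, 250]

salary_mean = mean(salaries)      # pulled toward the tail
salary_median = median(salaries)  # stays near the bulk of the data

print(f"mean = {salary_mean}, median = {salary_median}")
print("mean > median:", salary_mean > salary_median)
```

Reversing the picture (one unusually small value and the rest large) would pull the mean below the median, which is the left-skewed case.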
uniform
no mode
95% is how many standard deviations away
2. For a normal shape, about 95% of the data falls within 2 standard deviations of the mean.
In order to study whether IQ level is related to gender, data were collected from a sample of 540.
side-by-side boxplots
Only one of the two variables is categorical.
side-by-side boxplots
histogram
shows the shape of a distribution: symmetry/skewness and peakedness (modality, the number of modes)
lower quartile
the median is not included when finding the lower quartile. If there are 8 data values, the lower quartile is the median of the bottom 4 and the upper quartile is the median of the top 4
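A minimal sketch of this split-halves method, assuming a list with at least two values. The function name `quartiles` and the sample data are illustrative only. Note that software packages often use other quartile conventions and may give slightly different answers.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the split-halves method.

    When the number of values is odd, the median itself is left out
    of both the lower and the upper half.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # bottom half
    upper = data[half + (n % 2):]  # top half, skipping the median if n is odd
    return median(lower), median(data), median(upper)

# Hypothetical dataset with 8 values: bottom 4 and top 4.
data = [2, 4, 5, 7, 9, 10, 12, 15]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1  # range covered by the middle 50% of the data

print(f"Q1 = {q1}, median = {q2}, Q3 = {q3}, IQR = {iqr}")
```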
conditional percents: whether the time of day at which students take a statistics class affects smoking status
total on the bottom row
conditional percents: whether age affects the type of movie one likes best
total of the percents for each movie type, in the right column
bimodal
two modes
A survey was conducted to study the relationship between the zip code of the family home and whether they buy or rent the home. Data were collected from a random sample of 280 families from a certain metropolitan area.
two-way table
comparing two categorical variables
two-way table or contingency table
greater variability
the boxplot that appears longer: a larger IQR (longer box) and longer whiskers, indicating a larger range
conditional percents: whether the region where one lives affects whether or not one has insurance
100% is entered in the total column on the right side of the table
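As a rough sketch of how such conditional percents are computed when the explanatory variable (region) is in the rows: each count is divided by its row total, so every row adds up to 100% in the total column on the right. The counts and the name `row_percents` are made up for illustration.

```python
# Hypothetical counts: rows are regions (explanatory variable),
# columns are insurance status (response variable).
counts = {
    "North": {"insured": 80, "uninsured": 20},
    "South": {"insured": 45, "uninsured": 30},
}

def row_percents(table):
    """Convert each row of counts to percents of that row's total."""
    result = {}
    for group, row in table.items():
        row_total = sum(row.values())
        percents = {cat: 100 * n / row_total for cat, n in row.items()}
        percents["total"] = sum(percents.values())  # should come to 100
        result[group] = percents
    return result

for region, percents in row_percents(counts).items():
    cells = ", ".join(f"{cat}: {p:.1f}%" for cat, p in percents.items())
    print(f"{region}: {cells}")
```

If the explanatory variable were placed in the columns instead, the same idea would be applied column by column and the 100% totals would sit in the bottom row.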
99.7% is how many standard deviations away
3. Ex.: with mean = $235 and standard deviation = $20, 99.7% of the data falls within 3 × $20 = $60 of the mean: $235 - $60 = $175 to $235 + $60 = $295
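The same arithmetic can be sketched for all three levels of the standard deviation rule for normal-shaped data (68% within 1, 95% within 2, 99.7% within 3 standard deviations). The function name `empirical_rule_intervals` is an illustrative choice.

```python
def empirical_rule_intervals(mean, sd):
    """Return the interval mean ± k*sd for k = 1, 2, 3.

    Assumes the distribution is approximately normal; the percentages
    are the usual approximate values.
    """
    coverage = {1: "68%", 2: "95%", 3: "99.7%"}
    return {label: (mean - k * sd, mean + k * sd) for k, label in coverage.items()}

# Values from the card above: mean = $235, standard deviation = $20.
intervals = empirical_rule_intervals(235, 20)
for label, (low, high) in intervals.items():
    print(f"about {label} of the data falls between ${low} and ${high}")
```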
A store surveyed 250 of its customers to study the relationship between the amount spent on groceries and income.
scatterplot
Both variables are quantitative
scatterplot
What determines which numerical measures of center and spread are appropriate for describing a given distribution of a quantitative variable?
The shape of the distribution determines which numerical measures of center and spread are appropriate. If the distribution is normal or otherwise symmetric with no outliers, the mean and standard deviation are used. When the center is the mean, spread is measured with the standard deviation because it reflects how far the data points typically fall from the mean. If the distribution is skewed to the right or left, or has outliers, the median should be used as the center, and the IQR is the best measure of spread when the median is the center. In short, if the distribution is not symmetric, the median and IQR are used.
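One way to see why the median and IQR are preferred when outliers are present is to compare both pairs of measures on two small datasets that differ only in their largest value. This is a sketch with made-up numbers; the helper names `iqr` and `summarize` are illustrative, and the IQR uses the split-halves quartile method (median excluded from both halves).

```python
from statistics import mean, median, stdev

def iqr(values):
    """IQR with the split-halves method (median excluded from both halves)."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]
    upper = data[n // 2 + n % 2 :]
    return median(upper) - median(lower)

def summarize(values):
    """Return both pairs of center/spread measures for comparison."""
    return {
        "mean": mean(values),
        "sd": stdev(values),
        "median": median(values),
        "iqr": iqr(values),
    }

# Hypothetical data: identical except that the largest value becomes an outlier.
symmetric = [10, 12, 13, 14, 15, 16, 18]
with_outlier = [10, 12, 13, 14, 15, 16, 90]

for name, data in [("symmetric", symmetric), ("with outlier", with_outlier)]:
    s = summarize(data)
    print(f"{name}: mean={s['mean']:.2f}, sd={s['sd']:.2f}, "
          f"median={s['median']}, IQR={s['iqr']}")
```

In this example the outlier changes the mean and standard deviation substantially, while the median and IQR stay the same.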
display of the relationship between two variables which are both categorical.
Two-way table
unimodal
one mode
stem-and-leaf
preserves the original data; sorts the data; easy and quick to construct for small, simple datasets
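A small sketch of how a stem-and-leaf plot can be built, assuming non-negative whole numbers where the stem is the tens part and the leaf is the ones digit. The scores and function names are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group non-negative integers by stem (tens) with sorted leaves (ones)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)

def format_stemplot(values):
    """Return the plot as text, one line per stem, including empty stems."""
    plot = stem_and_leaf(values)
    lines = []
    for stem in range(min(plot), max(plot) + 1):
        leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return "\n".join(lines)

# Hypothetical exam scores.
scores = [72, 85, 91, 68, 77, 85, 93, 70, 88]
print(format_stemplot(scores))
```

Every original value can be read back from the plot (stem 7 with leaf 2 is 72), which is what it means for the display to preserve the data.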