Graphs, Variables, and Association
What is range?
Maximum minus minimum. Usually the weakest measure of spread, since it depends only on the two most extreme values
P(A^c)
1-P(A)
Variable
A characteristic of an individual
How do you calculate outliers?
A data point is an outlier if it falls more than 1.5 x IQR below Q1 or above Q3
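The 1.5 x IQR rule above can be sketched in Python. This is a minimal illustration, not a definitive method: the helper name `iqr_outliers` and the sample values are made up, and quartile conventions differ between textbooks and software, so the `method="inclusive"` setting is an assumption.

```python
import statistics

def iqr_outliers(data):
    """Return (lower_fence, upper_fence, outliers) using the 1.5 x IQR rule."""
    # Quartile conventions vary; "inclusive" is one common choice (assumed here).
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr   # anything below this is flagged
    upper_fence = q3 + 1.5 * iqr   # anything above this is flagged
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, outliers

scores = [2, 4, 5, 6, 7, 8, 9, 10, 30]  # made-up sample
low, high, flagged = iqr_outliers(scores)
print(low, high, flagged)
```

With this convention Q1 = 5 and Q3 = 9, so the fences sit at -1 and 15 and only the value 30 is flagged.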
What is an outlier?
A data point that falls far outside the normal range of your data.
What does an outlier look like on a box plot?
A dot plotted beyond the end of a whisker
What type of association does r measure?
A linear association ONLY. Two variables can still be associated (for example, in a curved pattern) even if r is low
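A small sketch may help show why r captures linear association only. The function `pearson_r` is a hand-written helper (not a library call) and the data are made up: one set of points lies exactly on a line, the other exactly on a parabola.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient r computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-2, -1, 0, 1, 2]
line = [2 * x + 1 for x in xs]   # perfectly linear relationship
curve = [x ** 2 for x in xs]     # perfectly related, but not linear

r_line = pearson_r(xs, line)     # r is 1 for the line
r_curve = pearson_r(xs, curve)   # r is 0 for the symmetric parabola
print(r_line, r_curve)
```

The parabola is completely determined by x, yet r is 0, which is why a scatterplot should be checked alongside r.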
Parameter
A numerical summary of the entire population (usually unknown)
How does r describe the direction of association?
A positive r means a positive association, and a negative r means a negative association
What is a confounding variable?
A variable associated with both the response and explanatory variable
Inferential Statistics
Methods for making inferences about the population based on the sample data
What is a lurking variable?
An unobserved variable that influences the relationship between the variables of interest
What are the advantages of a bar chart?
Can be easier to read than pie charts, and can show results where a participant is allowed to choose more than one category
Explanatory variable
Defines the groups to be compared with respect to values for the response variable (explains the response variable)
Two sub-types of numerical variables
Discrete and Continuous
What types of graphs can be used for Quantitative data?
Dot plots, histograms, and box plots
Quantitative Variable
Each variable value is a number for which a mean makes sense
When do you use IQR and median?
If outliers or skewness are present
What is IQR?
Interquartile range. It is the spread of the middle 50% of the data. IQR = Q3 - Q1
How do you determine association between two variables in a stacked bar chart?
Large differences between conditional percentages in different bars
How do you determine association between two variables in a side-by-side bar chart?
Large height differences between bars in the same section
What are the two measures of center?
Mean and median
Does a box plot show multiple modes?
No
Is the mean resistant to outliers?
No
Is the range resistant to outliers?
No
Is the standard deviation resistant to outliers?
No
Independent variables
Variables between which no association exists
Descriptive Statistics
Numerical summaries of only the sample (ex: mean or proportion)
Discrete Variable
Numerical variable that can only take certain fixed values (no intermediate values possible)
Continuous variable
Numerical variable that can take on any real numerical value over an interval
What types of graphs can be used for qualitative data?
Pie and bar charts
Qualitative Variable
Places an individual into one of several groups
Percentage
Proportion x 100
What is the first quartile?
Q1, the 25th percentile (25% of the data fall below it)
What is the third quartile?
Q3, the 75th percentile (75% of the data fall below it)
What are the measures of spread?
Range, standard deviation, and IQR
How can we visually compare two quantitative variables?
Scatterplots
Characteristics of side-by-side bar charts?
Sections of bars are determined by the explanatory variable. The bars within each section represent the sizes of the response categories within that section
How can we visually compare a quantitative response variable across the groups of a categorical explanatory variable?
Side-by-side box plots
What types of data are dot plots used for?
Small, discrete data sets
What is variance?
Standard deviation squared
What are the different distributions that data can have?
Symmetric and normal, right skewed, left skewed, unimodal, and bimodal
What is the mean?
The average of all the data (mu for a population, x-bar for a sample)
When comparing side-by-side box plots, how can you tell if there's an association?
The box plots do not overlap
How is the strength of an association measured?
The correlation coefficient (r), a number between -1 and 1
What is a conditional distribution?
The distribution of one variable, conditional on a level of the other variable. One variable [conditional on, given, if, when, for] the other variable
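As a rough illustration of a conditional distribution, the sketch below turns one row of a two-way table into proportions. The table, its labels ("group A", "group B", "yes", "no"), and the helper `conditional_distribution` are all hypothetical.

```python
# Made-up counts: outer keys are levels of the explanatory variable,
# inner keys are levels of the response variable.
counts = {
    "group A": {"yes": 30, "no": 70},
    "group B": {"yes": 10, "no": 90},
}

def conditional_distribution(table, given):
    """Distribution of the response, conditional on one explanatory level."""
    row = table[given]
    total = sum(row.values())
    return {level: count / total for level, count in row.items()}

dist_a = conditional_distribution(counts, "group A")
dist_b = conditional_distribution(counts, "group B")
print(dist_a)
print(dist_b)
```

Here the proportion of "yes" is 0.3 given group A but 0.1 given group B. A large gap like this between conditional proportions is what suggests an association in a stacked bar chart.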
Population
The entire group of people the researcher is interested in
Proportion
The frequency of a response/total number of individuals in the study. Is a number between 0 and 1
Sample
The group of people you collect data on
What is the second quartile?
The median
What is the median?
The middle point of the data (after it has been ordered) (M or M-hat)
Response variable
The outcome variable on which comparisons are made (the variable of interest)
What is standard deviation?
The typical distance between an observation and the mean of the data (sigma for a population, s for a sample)
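A short sketch using Python's standard `statistics` module shows both versions of the standard deviation and how each relates to variance. The data values are made up; `pstdev` divides by n (population, sigma) while `stdev` divides by n - 1 (sample, s).

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values with mean 5

sigma = statistics.pstdev(data)   # population standard deviation
s = statistics.stdev(data)        # sample standard deviation

pop_variance = statistics.pvariance(data)    # equals sigma squared
sample_variance = statistics.variance(data)  # equals s squared

print(sigma, pop_variance)
print(s, sample_variance)
```

For these values the squared deviations from the mean sum to 32, so the population variance is 32 / 8 = 4 and sigma is 2, while the sample variance is 32 / 7.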
What is the pth percentile?
The value such that p percent of the observations are less than that value
Characteristics of a stacked bar chart?
There is one bar for each category of the explanatory variable, and the sections within each bar represent the proportion of each response category within that explanatory category
Association
When a particular value for one variable is more likely to occur with certain values of the other variable
When do you use standard deviation and mean?
When the distribution is approximately symmetric and has no outliers
Does a box plot show skewness and outliers?
Yes
Is IQR resistant to outliers?
Yes
Is the median resistant to outliers?
Yes
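Resistance to outliers can be checked with a quick experiment. The sketch below uses made-up values and replaces one of them with an extreme value, then compares how much the mean and the median move.

```python
import statistics

clean = [10, 12, 13, 14, 16]          # made-up data, no outlier
with_outlier = [10, 12, 13, 14, 160]  # same data, largest value made extreme

mean_shift = statistics.mean(with_outlier) - statistics.mean(clean)
median_shift = statistics.median(with_outlier) - statistics.median(clean)

print(mean_shift)    # the mean is pulled strongly toward the outlier
print(median_shift)  # the median does not move
```

The mean jumps from 13 to 41.8 while the median stays at 13, which matches the cards: the median is resistant, the mean is not.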
What is the benefit of a pie chart?
You can quickly compare the size of a category to the total
What is a weak r value?
|r| < 0.3
What is a strong r value?
|r| > 0.7
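The two cutoffs above can be wrapped in a small helper. This is only a sketch: the function name `describe_strength` is hypothetical, and the label "moderate" for values between the cutoffs is an assumption, since the cards define only weak and strong.

```python
def describe_strength(r):
    """Label the strength of a linear association from r."""
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and 1")
    size = abs(r)  # strength ignores the sign; the sign gives direction
    if size < 0.3:
        return "weak"
    if size > 0.7:
        return "strong"
    return "moderate"  # assumed label for 0.3 <= |r| <= 0.7

print(describe_strength(-0.85))
print(describe_strength(0.1))
print(describe_strength(0.5))
```

Because strength depends on |r|, a value of -0.85 is just as strong as 0.85; only the direction differs.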