Chapter 6
Simpson's paradox
• As with quantitative variables, lurking variables can affect or even reverse the relationship between two categorical variables.
• An association that holds for all of several groups can reverse direction when the data are combined to form a single group. This reversal is called Simpson's paradox.
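A minimal sketch of the reversal, using invented counts for two hypothetical hospitals ("A" and "B") with patients split by condition on arrival ("good" or "poor"). The numbers, the names, and the helper functions are assumptions made for illustration only. They are chosen so that hospital A has the lower death rate within each condition group, yet the higher death rate once the groups are combined.

```python
# Hypothetical counts, invented for illustration: (deaths, patients)
counts = {
    "A": {"good": (10, 1000), "poor": (120, 3000)},
    "B": {"good": (45, 3000), "poor": (50, 1000)},
}


def group_rates(counts):
    """Death rate for each hospital within each condition group."""
    return {
        hospital: {group: deaths / patients
                   for group, (deaths, patients) in groups.items()}
        for hospital, groups in counts.items()
    }


def combined_rates(counts):
    """Death rate for each hospital after pooling the condition groups."""
    rates = {}
    for hospital, groups in counts.items():
        total_deaths = sum(deaths for deaths, _ in groups.values())
        total_patients = sum(patients for _, patients in groups.values())
        rates[hospital] = total_deaths / total_patients
    return rates


within = group_rates(counts)
overall = combined_rates(counts)

for hospital in counts:
    print(hospital, within[hospital], overall[hospital])
```

In these made-up data, hospital A treats far more poor-condition patients, so patient condition acts as the lurking variable that drives the reversal in the combined rates.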
Marginal distribution
The distribution of the row variable alone or of the column variable alone. The marginal distribution of one of the categorical variables in a two-way table of counts is the distribution of values of that variable among all individuals described by the table.
Steps
1. Calculate the row/column totals if this is not already done.
2. Calculate the distribution of the row or column variable alone by dividing each row/column total by the total number of observations.
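The two steps above can be sketched as follows. The two-way table here is hypothetical (invented row and column labels and counts), stored as a dictionary of rows, and `marginal_distributions` is an illustrative helper name rather than a library function.

```python
# Hypothetical two-way table of counts: table[row][column]
table = {
    "row1": {"colA": 20, "colB": 30},
    "row2": {"colA": 10, "colB": 40},
}


def marginal_distributions(table):
    """Return (row marginal, column marginal) as proportions of the grand total."""
    # Step 1: row and column totals
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for col, count in cells.items():
            col_totals[col] = col_totals.get(col, 0) + count

    # Step 2: divide each total by the total number of observations
    grand_total = sum(row_totals.values())
    row_marginal = {row: t / grand_total for row, t in row_totals.items()}
    col_marginal = {col: t / grand_total for col, t in col_totals.items()}
    return row_marginal, col_marginal


row_marginal, col_marginal = marginal_distributions(table)
print(row_marginal)
print(col_marginal)
```

Each marginal distribution sums to 1 and describes one variable at a time, with no reference to the other variable.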
Conditional distribution
The distribution of one variable (row or column) conditional on a particular category of the other variable. It is calculated from the counts of the first variable restricted to that category of the other variable.
Steps
1. Select the row(s)/column(s) of interest.
2. Calculate the conditional distribution by dividing each cell count in the selected row/column by that row's/column's total.
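A short sketch of these steps, again on a hypothetical two-way table with invented labels and counts. `conditional_on_row` and `conditional_on_column` are illustrative helper names, not library functions.

```python
# Hypothetical two-way table of counts: table[row][column]
table = {
    "row1": {"colA": 20, "colB": 30},
    "row2": {"colA": 10, "colB": 40},
}


def conditional_on_row(table, row):
    """Distribution of the column variable within one row category."""
    cells = table[row]                      # Step 1: select the row
    row_total = sum(cells.values())
    # Step 2: divide each cell count by the row total
    return {col: count / row_total for col, count in cells.items()}


def conditional_on_column(table, col):
    """Distribution of the row variable within one column category."""
    col_total = sum(cells[col] for cells in table.values())
    return {row: cells[col] / col_total for row, cells in table.items()}


given_row1 = conditional_on_row(table, "row1")
given_row2 = conditional_on_row(table, "row2")
given_colA = conditional_on_column(table, "colA")
print(given_row1, given_row2, given_colA)
```

Comparing `given_row1` with `given_row2` shows whether the column variable is distributed differently across the row categories, which is how a relationship between the two variables becomes visible.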
If you want to examine the relationship between two categorical variables, it is best to look at the: a) Marginal distributions b) Conditional distributions
b) Conditional distributions. Marginal distributions tell us nothing about the relationship between two categorical variables: each describes a single variable on its own, not in relation to the other. The conditional distributions show how one variable behaves for different values of the second, and comparing them is what lets us see relationships between categorical variables.
Two-way tables
describe relationships between two categorical variables
Conditional distribution distinctions
o The conditional distribution of one variable is the distribution of that variable conditional on a particular value of the other variable.
o By comparing the conditional distributions of one variable for several categories of the other variable, we can see whether, and how, changing the category of the second variable changes the distribution of the first variable.
Marginal distribution distinctions
o Univariate summaries (only a single variable).
o Marginal distributions do NOT say anything about the relationship between the two variables.
o The overall distribution of the row/column variable, regardless of the other variable.