Statistics and Research Methods: Lecture 8
Column Percentages
(Cell Frequency / Column Total) X 100. The contingency table may be converted to column percentages by dividing each cell frequency by its column total and multiplying by 100. Because the categories of nominal variables cannot be ranked, we cannot say anything about the direction of an association between them. We can talk about the existence and the strength of the association, but not its direction. Slide 12.
Row Percentages
(Cell Frequency / Row Total) X 100. The contingency table can also be converted to row percentages by dividing each cell frequency by its row total and multiplying by 100.
Table Percentages
(Cell Frequency / Table Total) X 100. The contingency table may also be converted to percentages of the table total by dividing each cell frequency by the table total and multiplying by 100.
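The three conversions (column, row, and table percentages) can be sketched in a few lines of Python. This is only an illustration: the 2X2 table of counts below is made up, and the function names are hypothetical.

```python
# Made-up 2X2 table of counts (not from the lecture).
# Rows = categories of one variable, columns = categories of the other.
table = [
    [30, 10],
    [20, 40],
]

def column_percentages(t):
    """(Cell Frequency / Column Total) X 100 for every cell."""
    col_totals = [sum(col) for col in zip(*t)]
    return [[100 * cell / col_totals[j] for j, cell in enumerate(row)] for row in t]

def row_percentages(t):
    """(Cell Frequency / Row Total) X 100 for every cell."""
    return [[100 * cell / sum(row) for cell in row] for row in t]

def table_percentages(t):
    """(Cell Frequency / Table Total) X 100 for every cell."""
    total = sum(sum(row) for row in t)
    return [[100 * cell / total for cell in row] for row in t]

print(column_percentages(table))  # [[60.0, 20.0], [40.0, 80.0]]
print(row_percentages(table))
print(table_percentages(table))   # [[30.0, 10.0], [20.0, 40.0]]
```

Column percentages sum to 100 down each column, row percentages sum to 100 across each row, and table percentages sum to 100 over the whole table.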
Crosstabulation and Bivariate Association
A table that presents the distribution (frequencies and/or percentages) of one variable (usually the dependent variable) across the categories of one or more additional variables (usually the independent variable or variables). It is like a frequency table for at least two (nominal or ordinal) variables: it shows the counts and percentages for each combination of categories for at least two variables or two sets of measurements.
Measures of Association for Two Nominal Variables
Assess only the existence and the strength of the association between the two nominal variables. It is not possible to talk about the direction of the association in the case of nominal variables, as it is not possible to rank-order the categories of nominal variables. The magnitude of the association is between 0.00 and 1.00. 0.00 means no relationship, while 1.00 indicates perfect association.
Spearman's Rho Coefficient Assumptions
Assumptions: 1. A straight-line correlation 2. Ordinal continuous data 3. Random sampling
Contingency Coefficient
Based on the test statistic chi-square, it evaluates the strength of association between the two nominal variables. It is a symmetric and non-directional measure. The values for CC range from 0 to 1. This is suitable for square tables (3X3, 4X4). It may not reach 1.00 even when there is a perfect association between the two nominal variables, especially in non-square tables. In use, the main difference from Phi is that it applies to square crosstabs larger than 2X2. The assumptions are: 1. Square crosstabs 2. Random sampling. See slide 31 for the formula and slides 32-35 for an example.
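As a rough sketch, assuming the usual textbook form C = sqrt(chi-square / (chi-square + N)) (the slide formula is not reproduced in these notes), the coefficient can be computed as below. The 3X3 table is made up to show a perfect association, and the function names are hypothetical.

```python
import math

def chi_square(table):
    """Pearson chi-square and total N for a table of counts.
    Assumes every row total and column total is greater than zero."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

def contingency_coefficient(table):
    """C = sqrt(chi2 / (chi2 + N)), assumed standard form."""
    chi2, n = chi_square(table)
    return math.sqrt(chi2 / (chi2 + n))

# Made-up 3X3 table with a perfect association (all cases on the diagonal).
perfect_3x3 = [
    [10, 0, 0],
    [0, 10, 0],
    [0, 0, 10],
]

print(round(contingency_coefficient(perfect_3x3), 3))  # 0.816
```

Even with a perfect association, C here is about 0.816 rather than 1.00, which illustrates why the coefficient may fall short of its nominal upper limit.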
Cramer's V
Based on the test statistic chi-square, it evaluates the strength of association between the two nominal variables. This is a symmetric and non-directional measure. It can reach 1.00 only when the two variables have equal marginals. The more unequal the marginals, the further V falls below 1.00. The values of Cramer's V range from 0.00 to 1.00, and it is suitable for rectangular tables (2X3, 4X5, etc.). The general assumptions are: 1. Nominal variables 2. Rectangular crosstabs. Example on slides 38-41.
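A minimal sketch of the calculation, assuming the common form V = sqrt(chi-square / (N x min(rows - 1, columns - 1))) since the slide formula is not reproduced here. The 2X3 table and the function names are made up for illustration.

```python
import math

def chi_square(table):
    """Pearson chi-square and total N for a table of counts.
    Assumes every row total and column total is greater than zero."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

def cramers_v(table):
    """V = sqrt(chi2 / (N * min(rows - 1, cols - 1))), assumed standard form."""
    chi2, n = chi_square(table)
    k = min(len(table) - 1, len(table[0]) - 1)
    return math.sqrt(chi2 / (n * k))

# Made-up 2X3 rectangular table of counts.
table_2x3 = [
    [20, 10, 10],
    [10, 20, 30],
]

print(round(cramers_v(table_2x3), 3))  # 0.363
```

For a 2X2 table, min(rows - 1, columns - 1) equals 1, so this expression reduces to sqrt(chi-square / N).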
Types of Ordinal Variables
Collapsed Ordinal Variables: Have only a few discrete categories that can be ranked. Ex. Approve, neutral, disapprove, strongly disapprove. Continuous Ordinal Variables: Have many possible scores, but differ from interval-ratio variables in that the values are still discrete and the distance between the categories is not equal. Ex. What do you think about Stephen Harper's economic policy? 1-5, 1 being strongly disapprove, 5 being strongly approve.
Measures of Association
Describe the nature, strength and direction of the relationship between two variables in general.
Association Between Two Variables
If variables are associated, then the score on one variable can be predicted from the score of the other variable. The stronger the association, the more accurate the predictions. Bivariate association can be investigated by finding answers to three questions: 1. Is there an association? 2. How strong is the association? 3. What is the pattern or direction of the association?
Lambda
Lambda is a measure of association between two categorical variables (nominal variables with few categories). Lambda is an asymmetrical or directional measure of association: its value depends on which variable is chosen as the independent (or column) variable. The Lambda value shows the existence and strength of association between two variables. It ranges between 0.00 and 1.00. Formula on page 43, example on pages 44-45.
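To illustrate the asymmetry, here is a sketch assuming the usual PRE form Lambda = (E1 - E2) / E1, where E1 is the number of prediction errors made when ignoring the independent variable and E2 is the number made when using it. The counts and the function name are made up, and the formula on the referenced page may be written differently.

```python
def lambda_coefficient(table):
    """Lambda with rows = dependent variable, columns = independent variable.
    E1: errors when always predicting the modal row category.
    E2: errors when predicting the modal row category within each column."""
    row_totals = [sum(row) for row in table]
    n = sum(row_totals)
    e1 = n - max(row_totals)
    e2 = sum(sum(col) - max(col) for col in zip(*table))
    if e1 == 0:
        raise ValueError("Lambda is undefined when the dependent variable has no variation")
    return (e1 - e2) / e1

# Made-up 2X2 table of counts.
table = [
    [30, 10],
    [20, 40],
]
# Swapping rows and columns swaps the roles of the two variables.
transposed = [list(col) for col in zip(*table)]

print(lambda_coefficient(table))       # 0.25
print(lambda_coefficient(transposed))  # 0.4
```

The same counts give 0.25 in one direction and 0.4 in the other, which is what "directional" means for this measure.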
Lambda Weaknesses
Lambda tends to *underestimate* the strength of a relationship, especially when one of the variables has low variation. In such cases, it is a good idea to use Phi, Contingency Coefficient or Cramer's V, depending on size of crosstab.
Negative Association
Low on X is associated with high on Y. High on X is associated with low on Y. As X increases, Y decreases. This relationship is negative.
Positive Association
Low on X is associated with low on Y. High on X is associated with high on Y. As X increases, Y increases. This relationship is positive.
Cross-tabulation and Usage of Percentages
Percentages are meaningful to use with nominal and ordinal variables. By convention, the independent variable is the column variable and the dependent variable is the row variable, but percentages should be calculated within the categories of the independent variable wherever it is placed: 1. If the independent variable is on the rows, use row percentages. 2. If the independent variable is on the columns, use column percentages. 3. If there is no clear-cut independent variable, use total, row, or column percentages, whichever is most meaningful for the particular focus. Example in lecture 8, slides 9-10.
Measures of Association for Nominal, Ordinal and Interval-Ratio Variables
Nominal Variables: Phi, Cramer's V, Contingency Coefficient and Lambda. Ordinal Variables: Gamma, Tau-b, Tau-c, Somers' d, and Spearman's Rho. Interval-Ratio Variables: Pearson's r.
Strength of Association
One way to measure strength is to find the "maximum difference": the largest difference in column percentages for any row of the table. If the difference is between 1 and 10 percentage points, the association is weak. If it is between 10 and 30, it is moderate. If it is greater than 30, it is strong. See the chart on slide 18 for more details.
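The maximum-difference rule can be sketched as follows. The table is made up and the function names are hypothetical. How the cut-offs at exactly 10 and 30 are treated is an assumption, since the notes do not say which side they fall on.

```python
def max_difference(table):
    """Largest difference in column percentages within any single row.
    Columns are assumed to hold the independent variable."""
    col_totals = [sum(col) for col in zip(*table)]
    col_pct = [
        [100 * cell / col_totals[j] for j, cell in enumerate(row)]
        for row in table
    ]
    return max(max(row) - min(row) for row in col_pct)

def strength_label(diff):
    """Assumed boundaries: below 10 weak, 10 to 30 moderate, above 30 strong."""
    if diff < 10:
        return "weak"
    if diff <= 30:
        return "moderate"
    return "strong"

# Made-up 2X2 table of counts.
table = [
    [30, 10],
    [20, 40],
]

diff = max_difference(table)
print(diff, strength_label(diff))  # 40.0 strong
```

Here the column percentages are 60 and 20 in the first row and 40 and 80 in the second, so the maximum difference is 40 percentage points.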
Gamma
Operates on the PRE logic. It is used with collapsed ordinal variables. It is a symmetric, non-directional measure of association and can be used with square or rectangular crosstabs. Its value ranges from -1.00 to +1.00. 0.00 suggests no association, whereas -1.00 or +1.00 indicates a perfect association. The assumptions are: 1. Ordinal collapsed variables 2. Random sampling. Formula on slide 53, example on slides 54-64.
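A sketch of the calculation, assuming the usual form Gamma = (Ns - Nd) / (Ns + Nd), where Ns is the number of pairs ranked in the same order on both variables and Nd the number ranked in opposite order. The table is made up, the function names are hypothetical, and rows and columns are both assumed to be ordered from low to high.

```python
def concordant_discordant(table):
    """Count same-order (Ns) and opposite-order (Nd) pairs.
    Rows and columns must both be ordered from low to high."""
    n_rows, n_cols = len(table), len(table[0])
    ns = nd = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):
                for l in range(n_cols):
                    if l > j:    # below and to the right: same order
                        ns += table[i][j] * table[k][l]
                    elif l < j:  # below and to the left: opposite order
                        nd += table[i][j] * table[k][l]
    return ns, nd

def gamma(table):
    ns, nd = concordant_discordant(table)
    return (ns - nd) / (ns + nd)

# Made-up 2X2 table of counts (low/high on both variables).
table = [
    [30, 10],
    [20, 40],
]

print(concordant_discordant(table))  # (1200, 200)
print(round(gamma(table), 3))        # 0.714
```

Tied pairs are ignored entirely, which is why Gamma tends to be larger in magnitude than measures that count ties.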
Tau-b
Operates on the PRE logic. Used with ordinal-collapsed variables. A symmetric or non-directional measure of association. Tau-b is appropriate for square tables (Number of Columns = Number of Rows, e.g. 2 X 2; 3 X 3). It considers tied pairs in its calculation. Its value ranges from -1.00 to +1.00. 0.00 suggests no association, whereas -1.00 or +1.00 indicates a perfect association. Formula on slides 66-75.
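A sketch assuming the common form Tau-b = (Ns - Nd) / sqrt((Ns + Nd + Tr)(Ns + Nd + Tc)), where Tr counts pairs tied on the row variable only and Tc pairs tied on the column variable only. The symbols Tr and Tc, the table, and the function names are illustrative choices, not taken from the slides.

```python
import math

def pair_counts(table):
    """Return (Ns, Nd, tied_on_rows_only, tied_on_columns_only).
    Rows and columns must both be ordered from low to high."""
    n_rows, n_cols = len(table), len(table[0])
    ns = nd = tied_rows = tied_cols = 0
    for i in range(n_rows):
        for j in range(n_cols):
            cell = table[i][j]
            for l in range(j + 1, n_cols):   # same row, different column
                tied_rows += cell * table[i][l]
            for k in range(i + 1, n_rows):   # same column, different row
                tied_cols += cell * table[k][j]
            for k in range(i + 1, n_rows):
                for l in range(n_cols):
                    if l > j:
                        ns += cell * table[k][l]
                    elif l < j:
                        nd += cell * table[k][l]
    return ns, nd, tied_rows, tied_cols

def tau_b(table):
    ns, nd, tied_rows, tied_cols = pair_counts(table)
    return (ns - nd) / math.sqrt((ns + nd + tied_rows) * (ns + nd + tied_cols))

# Made-up 2X2 table of counts.
table = [
    [30, 10],
    [20, 40],
]

print(pair_counts(table))      # (1200, 200, 1100, 1000)
print(round(tau_b(table), 3))  # 0.408
```

Because the ties enter the denominator, Tau-b for this table (0.408) is smaller than Gamma would be for the same counts.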
Phi Coefficient
Phi is also a chi-square based measure of association. It is used only for 2X2 tables formed by true dichotomies. Its value ranges from 0.00 to +1.00: 0.00 indicates no relationship, while 1.00 shows a perfect relationship. There are three assumptions: 1. Nominal data 2. A 2x2 table 3. Random sampling. Note that the formula uses an upper case N (total number of cases), not the lower case n depicted in the picture. Example on slides 25-29.
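As an illustration, assuming the usual form Phi = sqrt(chi-square / N) with N the total number of cases, the calculation can be sketched as below. The 2X2 counts and the function names are made up.

```python
import math

def chi_square(table):
    """Pearson chi-square and total N for a table of counts.
    Assumes every row total and column total is greater than zero."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

def phi(table):
    """Phi = sqrt(chi2 / N) for a 2X2 table, assumed standard form."""
    if len(table) != 2 or any(len(row) != 2 for row in table):
        raise ValueError("Phi is only used with 2X2 tables")
    chi2, n = chi_square(table)
    return math.sqrt(chi2 / n)

# Made-up 2X2 table of counts.
table = [
    [30, 10],
    [20, 40],
]

print(round(phi(table), 3))  # 0.408
```

For this table chi-square is about 16.67 with N = 100, so Phi is the square root of about 0.167.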
Measures of Association for Discrete Ordinal Variables
Provide information about the strength as well as the direction of association between two ordinal variables. Their values range from -1.00 to +1.00. 0.00 shows no association, whereas -1.00 or +1.00 indicates a perfect association.
Spearman's Rho Coefficient
Suitable for "ordinal continuous" variables. An index of association between the two variables. Used when participants are ranked on both variables. Calculated using the ranks on each variable rather than the actual data values, unlike the Pearson correlation coefficient, which uses the actual values. A symmetric and PRE-based (proportional reduction in error) measure of association. Its value ranges from -1.00 to +1.00. 0.00 denotes no association, whereas -1.00 or +1.00 indicates a perfect association. Example on slides 86-89.
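A sketch of the rank-based calculation, assuming the common shortcut formula rho = 1 - 6 x (sum of squared rank differences) / (n(n^2 - 1)), which only holds when there are no tied scores. The scores and the function names are made up.

```python
def ranks(values):
    """Rank from 1 (lowest) to n (highest). Assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(x, y):
    """rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), valid only without ties."""
    n = len(x)
    rank_x, rank_y = ranks(x), ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Made-up scores for five participants on two variables.
x = [12, 15, 19, 23, 30]
y = [41, 38, 55, 50, 62]

print(ranks(y))                      # [2, 1, 4, 3, 5]
print(round(spearman_rho(x, y), 3))  # 0.8
```

Only the ordering of the scores matters: replacing 62 with 620 would leave the ranks, and therefore rho, unchanged.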
Values for Measures of Association for Nominal Variables
The magnitude is between 0.00 and 1.00. 0.00 means no relationship, while 1.00 indicates perfect relationship.
Values for Measures of Association for Ordinal and Interval/Ratio Variables
The magnitude ranges between -1.00 and +1.00. (+) means the two variables vary in the same direction (increase or decrease simultaneously). (-) indicates that the two variables vary in opposite directions: while one increases, the other decreases, or vice versa. Whereas -1.00 or +1.00 indicates perfect relationship, 0.00 reveals no relationship.
No Association
There is no change in Y regardless of the categories of X. This indicates no association.
Bivariate Data
Three combinations of variable types: 1. Both variables are qualitative (attribute). 2. One variable is qualitative (attribute) and the other is quantitative (numerical). 3. Both variables are quantitative (both numerical). These are graphically presented as cluster bar graphs with two nominal or ordinal variables, or as line graphs, frequency polygons, or scatter-plots with two interval-ratio variables.
Selecting Appropriate Measure of Association
Two criteria: 1. The level of measurement of the variables 2. The technical limitations of association statistics. Rule: ALWAYS select the measure of association appropriate for the lowest level of measurement in the bivariate relationship (for example, if one variable is nominal and the other is ordinal, use a measure for nominal variables).
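The "lowest level of measurement" rule can be written as a small lookup. This is only a sketch: the function and dictionary names are hypothetical, the lists of measures are the ones given in these notes, and the second criterion (technical limitations such as table shape) is not handled.

```python
# Levels of measurement ordered from lowest to highest.
LEVELS = ["nominal", "ordinal", "interval-ratio"]

# Measures listed in these notes for each level.
MEASURES = {
    "nominal": ["Phi", "Cramer's V", "Contingency Coefficient", "Lambda"],
    "ordinal": ["Gamma", "Tau-b", "Tau-c", "Somers' d", "Spearman's Rho"],
    "interval-ratio": ["Pearson's r"],
}

def candidate_measures(level_x, level_y):
    """Return the measures suited to the lower of the two levels."""
    lowest = min(level_x, level_y, key=LEVELS.index)
    return MEASURES[lowest]

print(candidate_measures("nominal", "ordinal"))
print(candidate_measures("ordinal", "interval-ratio"))
```

The first call returns the nominal measures and the second the ordinal ones, because in each pair the lower level decides.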
Somers' d
Used with collapsed ordinal variables. Used with crosstabs with an equal or unequal number of rows and columns (3X3, 2x3). An asymmetric or directional extension of Gamma. Based on the PRE principle. Used when the researcher knows which variable is independent and which is dependent. Its values range from -1.00 to +1.00. 0.00 indicates no association, whereas -1.00 or +1.00 denotes perfect association. Example and formula on slides 81-82.
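A sketch assuming the usual form d = (Ns - Nd) / (Ns + Nd + Ty), where Ty is the number of pairs tied on the dependent variable but not on the independent variable. The table and function names are made up, and the dependent variable is assumed to be on the rows, with rows and columns ordered from low to high.

```python
def somers_d(table):
    """Somers' d with rows = dependent variable, columns = independent variable.
    Rows and columns must both be ordered from low to high."""
    n_rows, n_cols = len(table), len(table[0])
    ns = nd = tied_dependent = 0
    for i in range(n_rows):
        for j in range(n_cols):
            cell = table[i][j]
            # Same row, different column: tied on the dependent variable only.
            for l in range(j + 1, n_cols):
                tied_dependent += cell * table[i][l]
            for k in range(i + 1, n_rows):
                for l in range(n_cols):
                    if l > j:    # same order on both variables
                        ns += cell * table[k][l]
                    elif l < j:  # opposite order
                        nd += cell * table[k][l]
    return (ns - nd) / (ns + nd + tied_dependent)

# Made-up 2X2 table of counts.
table = [
    [30, 10],
    [20, 40],
]

print(round(somers_d(table), 3))  # 0.4
```

For a 2X2 table this equals the difference between the two column proportions (here 0.8 - 0.4 for the second row), which is one way to sanity-check the result.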
Tau-c
Used with collapsed ordinal variables. Used with non-square or rectangular crosstabs (2x3, 4x2, etc.). Based on the PRE principle. A symmetric or non-directional measure of association. Its values range from -1.00 to +1.00. 0.00 indicates no association, whereas -1.00 or +1.00 denotes perfect association. Formula on slide 77, example on slides 78-79.
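A sketch assuming the common form Tau-c = 2m(Ns - Nd) / (N^2 (m - 1)), where m is the smaller of the number of rows and the number of columns. The 2X3 table and the function name are made up, and the formula on the slide may be laid out differently.

```python
def tau_c(table):
    """Tau-c for a table with rows and columns ordered from low to high."""
    n_rows, n_cols = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    m = min(n_rows, n_cols)
    ns = nd = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):
                for l in range(n_cols):
                    if l > j:    # same order on both variables
                        ns += table[i][j] * table[k][l]
                    elif l < j:  # opposite order
                        nd += table[i][j] * table[k][l]
    return 2 * m * (ns - nd) / (n ** 2 * (m - 1))

# Made-up 2X3 rectangular table of counts.
table_2x3 = [
    [20, 10, 10],
    [10, 20, 30],
]

print(tau_c(table_2x3))  # 0.36
```

Here Ns = 1300, Nd = 400, N = 100 and m = 2, so Tau-c = 4 x 900 / 10000.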