Chapter 12, "Inference on Categorical Data"


- Contingency table: where we record the results of data collection
- Observed frequencies: the data collected
- Expected frequencies: the data that would be collected if there were no relationship between the variables
- Categorical data: classification items such as political affiliation and employment status

...

Data collected in a survey is organized in a contingency table.

...

Different types of data may require different analysis techniques. You would not use the same technique when studying the relationship between age and reaction time as you would when studying the relationship between gender and car model preference.

...

Chi-Square Test of Independence

- Used to determine whether there is an association between a row variable and a column variable in a contingency table constructed from sample data.
- The null hypothesis is that the variables are not associated (independent).
- The alternative hypothesis is that the variables are associated (dependent).

This lesson focuses specifically on the chi-square test of independence (X^2), a non-parametric test that allows us to draw conclusions about the relationship between two variables. Using hypothesis testing, the chi-square test of independence determines whether two variables are associated in a way that is unlikely to be explained by chance alone. The null hypothesis states that there is no relationship between the two variables; the research hypothesis states that there is a relationship between the two variables that is not due to chance.

If we are able to reject the null hypothesis, we have a significant chi-square, which provides evidence that a relationship exists between the two variables. However, please note: a significant chi-square does not imply causation! A significant relationship does not mean that a change in one variable causes the other variable to change. With the chi-square test of independence, the question we attempt to answer is whether two variables are associated, not why they are associated.

Further Comments About the Chi-Square Test of Independence

There are some limitations to the use of the chi-square test of independence.

First, X^2 is not reliable when the number of observations and the expected frequencies in the cells are too small. As a rule, a statistical correction is needed when the expected frequency of any cell is less than 5. There are two ways to avoid needing a correction procedure:
1) Increase the sample size so that no expected frequency is less than 5.
2) Combine cells (when this approach is justifiable) to increase the expected frequency within each cell.

Second, 2X2 contingency tables call for Yates's correction for continuity. This involves reducing the absolute difference between each observed and expected frequency by 0.5 before squaring it.

Finally, the chi-square test of independence only indicates whether a relationship exists between two variables. Chi-square gives no indication of the strength of the relationship; further testing is required to determine whether the relationship between two variables is a strong one.
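A minimal Python sketch of the two checks described above, assuming the table is given as a list of rows of counts (Total row and column excluded). The helper names `expected_frequencies`, `small_expected_cells`, and `yates_chi_square` are hypothetical, and the 2X2 counts in the usage lines are made up for illustration. Clamping the reduced difference at zero is an extra assumption for cells where |fo - fe| is already below 0.5.

```python
from typing import List, Tuple

Table = List[List[float]]


def expected_frequencies(observed: Table) -> Table:
    """Expected count per cell under independence: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def small_expected_cells(observed: Table, threshold: float = 5.0) -> List[Tuple[int, int]]:
    """(row, column) positions whose expected frequency is below the threshold."""
    expected = expected_frequencies(observed)
    return [
        (i, j)
        for i, row in enumerate(expected)
        for j, fe in enumerate(row)
        if fe < threshold
    ]


def yates_chi_square(observed: Table) -> float:
    """Chi-square for a 2x2 table with Yates's continuity correction."""
    if len(observed) != 2 or any(len(row) != 2 for row in observed):
        raise ValueError("Yates's correction applies to 2x2 tables")
    expected = expected_frequencies(observed)
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for fo, fe in zip(obs_row, exp_row):
            # Shrink |fo - fe| by 0.5 (not below zero), then square and divide by fe.
            reduced = max(0.0, abs(fo - fe) - 0.5)
            total += reduced ** 2 / fe
    return total


table = [[30, 10], [20, 40]]  # hypothetical 2x2 counts
print(small_expected_cells(table))  # → [] (every expected frequency is at least 5)
print(yates_chi_square(table))      # about 15.04
```

If `small_expected_cells` returns any positions, that is the signal to enlarge the sample or combine categories before running the test.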

...

If a contingency table has 5 columns and 7 rows (not including the total row and column), the degrees of freedom for the Chi-square is 24.
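The degrees-of-freedom rule behind this card can be checked with a tiny Python helper. The name `degrees_of_freedom` is hypothetical, and the counts passed to it exclude the Total row and Total column.

```python
def degrees_of_freedom(n_rows: int, n_cols: int) -> int:
    """df = (#rows - 1)(#columns - 1), counting category rows/columns only (no totals)."""
    if n_rows < 2 or n_cols < 2:
        raise ValueError("a contingency table needs at least 2 rows and 2 columns")
    return (n_rows - 1) * (n_cols - 1)


print(degrees_of_freedom(7, 5))  # → 24
print(degrees_of_freedom(2, 3))  # → 2, as for a 2X3 table
```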

...

Non-parametric tests can be used when dealing with categorical variables.

...

Some data are categorical in nature. Instead of data that can be measured on an interval scale, data may be gathered from categories, such as gender, eye color, and marital status. Or an online poll may simply gather information on whether users agree or disagree with a particular statement. Categorical data require different analysis techniques, ones that do not rely on determining the mean or standard deviation. You will still organize the data, and you will also continue to use hypothesis testing to draw conclusions about the relationship between two variables.

...

Statistical analysis of categorical data requires different approaches than statistical analysis of ratio or interval scaled data. The chi-square test of independence, for example, is used to determine whether two variables are related. When testing the relationship between two variables:
- The null hypothesis states that the two variables are not related.
- The research hypothesis states that the two variables are related and that the relationship is not due to chance.

A significant chi-square is evidence that a relationship exists between the two variables, and therefore you can reject the null hypothesis.

...

The chi-square formula relies on observed frequencies (the data you collect) and expected frequencies (the frequencies you would expect to find if no relationship existed between the variables). By constructing a contingency table, you can record and organize the collected data by category and calculate totals for the various categories. You can also add intermediate calculations to the cells of the table until you are ready to evaluate the chi-square formula. If each category of one variable is split across the categories of the other variable in roughly the same proportions, there's probably little or no relationship between the two variables. If the values are concentrated in a few cells, a relationship between the two variables likely exists.
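To make this concrete, here is a small Python sketch comparing two hypothetical 2X2 tables (made-up counts) that share the same row and column totals. In the first, both rows are split across the columns in the same proportions, so every observed frequency equals its expected frequency and chi-square is 0; in the second, the counts are piled into two cells and chi-square is large. The function name `chi_square_from_observed` is hypothetical.

```python
from typing import List


def chi_square_from_observed(observed: List[List[float]]) -> float:
    """Sum of (fo - fe)^2 / fe, with fe = row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n
            total += (fo - fe) ** 2 / fe
    return total


proportional = [[10, 20], [30, 60]]  # both rows split 1:2 across the columns
concentrated = [[25, 5], [15, 75]]   # same totals, counts concentrated in two cells

print(chi_square_from_observed(proportional))  # → 0.0
print(chi_square_from_observed(concentrated))  # → 45.0
```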

...

The chi-square test of independence is used with categorical data to determine whether two variables are related. Record and organize observed frequencies in a contingency table. Then calculate the expected frequencies using the row and column totals. You can use this information in the chi-square formula. Once you have your obtained chi-square value, compare it to the critical value for chi-square to determine if the relationship is significant.

...

To calculate chi-square, you need the observed frequencies and the expected frequencies. That is why constructing a contingency table is so important: it helps you organize all the pertinent data. A contingency table is also essential for finding the degrees of freedom so that you can determine the critical value for chi-square:

df = (#rows - 1)(#columns - 1)

Remember, do not include the total row or the total column when counting rows and columns for the degrees of freedom. Once you calculate the degrees of freedom, use the Critical Values for Chi-Square table to find the critical value for X^2.
- If X^2 obt is greater than or equal to X^2 crit, then chi-square is significant, and you can reject the null hypothesis.
- If X^2 obt is less than X^2 crit, then chi-square is not significant, and you fail to reject the null hypothesis; any apparent relationship between the variables could be due to chance.

The chi-square test of independence measures the overall difference between the observed frequencies and the expected frequencies. The larger the difference, the larger the chi-square value, and the more likely it is to reach the critical value needed to establish a significant relationship between two variables.

...

When applying the chi-square test of independence to categorical data, the first step is to create a contingency table to record the observed frequencies and the totals. Next, calculate the expected frequencies and use them, together with the observed frequencies, to determine the value of X^2. You can then determine whether X^2 is significant, that is, whether the relationship between the two variables is significant.

...

When working with contingency tables, some statistical corrections may be needed. If the expected frequency of a cell is less than 5, you can do one of the following:
- Increase the sample size.
- Combine cells (when appropriate) to increase the expected frequency in each cell.

If the contingency table is 2 × 2, it is advised that you use Yates's correction for continuity. This involves reducing the absolute difference between each observed and expected frequency by 0.5 before squaring it. Remember, a significant chi-square does not indicate the strength of the relationship between the two variables; it only indicates that a relationship exists.

...

We then ask the question: Is X^2 obt greater than or equal to X^2 crit?

If we can answer yes to this question, then chi-square is significant. A significant chi-square indicates that our groups have a significant association that is beyond chance. If we answer no, then chi-square is not significant: the data do not provide evidence of a relationship between the two variables, and any association we see could plausibly be due to chance.

Understanding the Chi-Square Formula

In order to calculate chi-square (X^2), we need the observed frequencies and the expected frequencies. Once those are calculated, we can use the chi-square formula:

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

Let's dissect the formula:
- X^2 (χ²): the symbol for chi-square
- Σ (sigma): indicates that we are to sum whatever follows, over every cell of the table
- f_o: the symbol that represents observed frequencies
- f_e: the symbol that represents expected frequencies

Chi-Square Values

The chi-square test of independence is essentially a measure of the overall difference between observation and expectation.

Larger differences between the observed frequencies and the expected frequencies result in larger chi-square values. The larger the obtained chi-square value, the more likely it is to equal or exceed the critical value, which is what is needed to establish a significant relationship between two variables.

Categorical Data

Not all research situations involve data that are on the interval or ratio measurement scale. Parametric inferential statistical tests require that we have at least interval data in order to use them. This should make sense to you: we certainly can't calculate the mean or standard deviation of a data set if the data are not quantitative in nature. But what happens when we collect data that are categorical rather than quantitative? When a research situation uses categorical data, special analysis techniques must be used.

Observed and Expected Frequencies

Observed Frequencies: The numbers listed in the contingency table cells (not including the Total row and column) are referred to as the observed frequencies. For example, the observed frequency for Democrats who voted Yes is 40. Each cell of the contingency table has an observed frequency.

Expected Frequencies: Expected frequencies are the frequencies we would expect to find in each cell if no relationship existed between the variables. Just as with observed frequencies, each cell has an expected frequency. We calculate the expected frequency for each cell as:

$$f_e = \frac{(\text{row total})(\text{column total})}{n}$$

where n is the grand total of all responses.
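A minimal Python sketch of the expected-frequency formula, assuming the observed counts are stored as a list of rows (Total row and column excluded). The function name and the counts in the example are hypothetical.

```python
from typing import List


def expected_frequencies(observed: List[List[float]]) -> List[List[float]]:
    """Expected frequency of each cell = row total * column total / n."""
    row_totals = [sum(row) for row in observed]        # one total per row
    col_totals = [sum(col) for col in zip(*observed)]  # one total per column
    n = sum(row_totals)                                # grand total
    return [[r * c / n for c in col_totals] for r in row_totals]


observed = [[30, 10], [20, 40]]  # hypothetical counts
print(expected_frequencies(observed))  # → [[20.0, 20.0], [30.0, 30.0]]
```

Note that the expected frequencies keep the same row and column totals as the observed table; only the way counts are spread inside the table changes.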

The Chi-Square Table

Since the chi-square test of independence is also a test of the null hypothesis, we have to determine the critical value for X^2. To use the Critical Values for Chi-Square table, we need to calculate the degrees of freedom for chi-square. The degrees of freedom are calculated by multiplying one less than the number of rows by one less than the number of columns, not including the total row or the total column:

df = (#rows - 1)(#columns - 1)

Once we calculate the degrees of freedom, we use the table to find the critical value for X^2.

(Critical Values for Chi-Square table)
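If a printed table is not at hand, the critical value can also be computed numerically. The Python sketch below is an assumption rather than part of this lesson: it evaluates the chi-square upper-tail probability with the standard power series for the regularized lower incomplete gamma function, then uses bisection to find the value whose upper-tail area equals the significance level. The names `chi2_upper_tail` and `chi2_critical` are hypothetical; the results should agree with the table to about three decimal places.

```python
import math


def chi2_upper_tail(x: float, df: int) -> float:
    """P(X >= x) for a chi-square distribution with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series: lower gamma(a, z) = z^a e^-z * sum_{n>=0} z^n / (a (a+1) ... (a+n))
    term = 1.0 / a
    series = term
    n = 0
    while term > 1e-16 * series:
        n += 1
        term *= z / (a + n)
        series += term
    lower = series * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def chi2_critical(df: int, alpha: float = 0.05) -> float:
    """Chi-square value whose upper-tail area equals alpha (found by bisection)."""
    lo, hi = 0.0, 1.0
    while chi2_upper_tail(hi, df) > alpha:  # widen the bracket until it contains the answer
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_upper_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(chi2_critical(2, 0.05), 3))   # df = 2, .05 level → 5.991
print(round(chi2_critical(14, 0.01), 3))  # df = 14, .01 level → 29.141
```

The bisection works because the upper-tail probability falls steadily as the chi-square value grows, so there is exactly one crossing point for each significance level.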

Why is the Chi-Square Test Useful?

Some examples of the types of questions we can answer using the chi-square test of independence include:
- A question about selection factors: Is gender associated with the selection of a presidential candidate?
- A question about who uses services: Do male and female students differ in their use of the college counseling center?
- A question about relationship between choice and affiliation: Is political affiliation associated with voters' vote on the increase of the school tax levy?

Steps for Using a Chi-Square

Let's look at the steps for using a chi-square for our example.

Step 1: Create a contingency table to record the observations that were collected during the data-gathering process.

Step 2: Calculate the degrees of freedom and use the Critical Values for Chi-Square table to locate the critical value for chi-square at the .05 significance level. Remember that the level of significance to be used in the research study must be determined prior to data collection!
df = (#rows - 1)(#columns - 1)
df = (2 - 1)(3 - 1)
df = 2
X^2 crit = 5.991

Step 3: Calculate the expected frequencies for each cell. Each expected frequency is listed in parentheses within each cell.

Step 4: Use the observed frequencies and the expected frequencies to calculate X^2:

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

Step 5: Determine whether chi-square is significant, and interpret the results. Is X^2 obt greater than or equal to X^2 crit? Is 20.58 greater than or equal to 5.991? Yes! Therefore, X^2 is significant. The relationship between political affiliation and view on the proposed school tax levy is significant. The differences that we see in responses to this survey question are unlikely to be due to chance alone; they reflect the relationship that exists between the two variables.
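The five steps can be strung together in a short Python sketch. The lesson's own survey table is not reproduced here, so the 2X3 counts below (rows: Democrat, Republican; columns: Yes, No, Undecided) are hypothetical and do not give the lesson's 20.58. The critical value 5.991 is the table value from Step 2 (df = 2, .05 level); the function name `chi_square_test` is also hypothetical.

```python
from typing import List, Tuple


def chi_square_test(observed: List[List[float]], critical_value: float) -> Tuple[float, int, bool]:
    """Return (X^2 obt, df, significant?) for a contingency table without totals."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # Step 2: degrees of freedom (critical_value should come from the table for this df)
    df = (len(observed) - 1) * (len(col_totals) - 1)

    # Steps 3-4: expected frequency for each cell, then sum (fo - fe)^2 / fe
    chi2_obt = 0.0
    for i, row in enumerate(observed):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n
            chi2_obt += (fo - fe) ** 2 / fe

    # Step 5: significant when X^2 obt >= X^2 crit
    return chi2_obt, df, chi2_obt >= critical_value


# Step 1: hypothetical observed frequencies (Total row and column excluded)
observed = [
    [60, 30, 10],  # Democrat:   Yes, No, Undecided
    [40, 50, 10],  # Republican: Yes, No, Undecided
]
print(chi_square_test(observed, critical_value=5.991))  # → (9.0, 2, True)
```

Passing the critical value in, rather than computing it, mirrors the lesson's workflow of looking it up in the table once the degrees of freedom are known.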

Calculating Chi-Square

Step 1: The process for calculating X^2 begins with finding the difference between the observed frequency and the expected frequency for each cell.
Step 2: Then, the difference is squared and divided by the expected frequency for that cell.
Step 3: Finally, once that has been completed for each cell, the values for all of the cells are added together to find X^2.
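The three steps map directly onto a loop. A minimal Python sketch, assuming the observed and expected frequencies are already available as lists of rows of the same shape; the function name and the numbers are hypothetical (the expected values are the ones implied by the row and column totals of the observed table).

```python
from typing import List


def chi_square(observed: List[List[float]], expected: List[List[float]]) -> float:
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for fo, fe in zip(obs_row, exp_row):
            difference = fo - fe           # Step 1: observed minus expected
            total += difference ** 2 / fe  # Step 2: square, divide by fe; Step 3: running sum
    return total


observed = [[30, 10], [20, 40]]
expected = [[20, 20], [30, 30]]
print(chi_square(observed, expected))  # about 16.67
```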

Contingency Table

There are some terms we need to learn before we can begin using the chi-square test of independence.

Contingency Table: A contingency table is a table in which we record the results of the data collection. The results are categorized into the appropriate cell, and row and column totals are calculated. Totals are available for each row and each column, and the bottom right cell is the grand total of responses. There's probably little or no association between two variables when each row's cases are split across the columns in roughly the same proportions. When there's a larger concentration of cases in a few of the cells, there's a greater chance that some association exists between the two variables.

A possible contingency table organized to answer the following question is shown here: Is political affiliation associated with voters' vote on the increase of the school tax levy? The size of the contingency table is determined by the number of rows and the number of columns, not including the Total column and row. The table shown here is a 2X3 contingency table, since there are two rows and three columns once we remove the Total column and row.
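A minimal Python sketch of building such a table from raw survey responses, assuming each response is recorded as a (row category, column category) pair. The responses below are hypothetical, and `build_contingency_table` is an illustrative name. The last column and last row hold the totals, with the grand total in the bottom right cell.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def build_contingency_table(
    responses: Sequence[Tuple[str, str]],
    row_levels: List[str],
    col_levels: List[str],
) -> List[List[int]]:
    """Count responses per cell, then append a Total column and a Total row."""
    counts = Counter(responses)
    table = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    for row in table:
        row.append(sum(row))                         # row totals
    table.append([sum(col) for col in zip(*table)])  # column totals + grand total
    return table


responses = [
    ("Democrat", "Yes"), ("Democrat", "Yes"), ("Democrat", "No"),
    ("Republican", "No"), ("Republican", "Undecided"), ("Republican", "No"),
]
rows = ["Democrat", "Republican"]
cols = ["Yes", "No", "Undecided"]
for line in build_contingency_table(responses, rows, cols):
    print(line)
# [2, 1, 0, 3]
# [0, 2, 1, 3]
# [2, 3, 1, 6]
```

Listing the category levels explicitly keeps empty cells (such as Democrat/Undecided here) in the table as zeros instead of silently dropping them.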

An Example Using Chi-Square

We've seen each element of the chi-square test of independence. Now let's put together all of the individual pieces and walk through an example. We surveyed 300 people to determine their political affiliation and their views on the proposed school tax levy. Their responses are listed below in the contingency table. We want to determine if there is a significant relationship between political affiliation and whether they will vote for the school tax levy.

These techniques are referred to as non-parametric tests. Examples of categorical data include the following:

- Yes, no, or undecided answers on a survey
- A scale that includes the response options strongly agree, agree, disagree, or strongly disagree
- Classification items such as gender, political affiliation, employment status, or religious affiliation

Check Your Understanding

You wish to find the critical value for chi-square at the .01 significance level. Your table of data is 8 rows by 3 columns. What is the critical value? Refer to the chi-square table.

Answer: df = (8 - 1)(3 - 1) = 14, so X^2 crit = 29.1412.

