Chi-Square
In what situations might you convert numerical scores into categories so that a nonparametric test can be used instead of a parametric test?
1. It is sometimes simpler to obtain category measurements than numerical scores.
2. The original scores may violate some of the basic assumptions that underlie certain statistical procedures.
3. The original scores may have unusually high variance. Converting the scores to categories eliminates the variance.
4. The experiment produces an undetermined or infinite score.
What are the two categories that the null hypothesis for the goodness-of-fit test can fall into?
1. No Preference, Equal Proportions
- The null hypothesis states that there is no preference among the categories; the population is divided equally (e.g., with 3 categories, each holds 1/3 of the population).
- Used when a researcher wants to determine whether there is a preference among the categories, or whether the proportions differ from one category to another.
- The alternative hypothesis states that the population is not divided equally among the categories.
2. No Difference from a Known Population
- The null hypothesis states that the proportions for one population are not different from the proportions that are known to exist in another population.
- This hypothesis is used only when a specific population distribution is already known.
- E.g., you may know a population distribution from an earlier time and want to see whether it still holds (e.g., the distribution of grades in 1995).
- OR you have a known population distribution and want to see whether a second population has the same distribution.
- The alternative hypothesis states that the population proportions are not equal to the values specified in the null hypothesis.
What are the characteristics of the chi-square distribution?
1. The formula for chi-square involves adding squared values, so you can never obtain a negative value. Thus, all chi-square values are zero or larger.
- In addition, the number of categories plays a role: more categories means more values are added together, which tends to produce a larger chi-square value.
2. When H0 is true, you expect the data (O values) to be close to the hypothesis (E values). Thus, we expect chi-square values to be small when H0 is true.
What is the relationship between the chi-square and the Pearson correlation?
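Both tests are intended to evaluate the relationship between two variables. The Pearson correlation requires numerical scores, whereas the chi-square test for independence uses frequencies for individuals classified into categories.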
What are nonparametric tests?
Example: the chi-square tests.
- Use sample data to evaluate hypotheses about the proportions or relationships that exist within populations.
- Do not state hypotheses in terms of a specific parameter, and make few (if any) assumptions about the population distribution; sometimes called distribution-free tests.
- Participants are usually classified into categories (e.g., Democrat or Republican).
- Involve measurement on nominal or ordinal scales; the data are frequencies.
What is the expected frequency?
The expected frequency for each category is the frequency value that is predicted from the proportions in the null hypothesis and the sample size (n). The expected frequencies define an ideal, hypothetical sample distribution that would be obtained if the sample proportions were in perfect agreement with the proportions specified in the null hypothesis.
- Expected frequencies can be decimals or fractions, unlike observed frequencies, which are always whole numbers.
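
As a rough illustration of the definition above, the sketch below computes each expected frequency as the null-hypothesis proportion times the sample size (fe = p × n). The function name, the proportions, and n = 40 are hypothetical values chosen for the example, not taken from the cards.

```python
def expected_frequencies(proportions, n):
    """Return fe = p * n for each category under the null hypothesis."""
    # The null-hypothesis proportions describe the whole population,
    # so they are assumed to sum to 1.
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("null-hypothesis proportions must sum to 1")
    return [p * n for p in proportions]


# Hypothetical sample of n = 40 people sorted into 3 categories.
# "No preference" null: each category gets 1/3 of the sample.
no_preference = expected_frequencies([1 / 3, 1 / 3, 1 / 3], n=40)

# "Known population" null: assumed known proportions of 25%, 50%, 25%.
known_population = expected_frequencies([0.25, 0.50, 0.25], n=40)

print(no_preference)      # fractional values are allowed for fe
print(known_population)
```

Note that the no-preference case yields non-integer expected frequencies, which is fine because they describe a hypothetical ideal sample rather than actual counts.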
What does a large chi-square statistic value mean? How can you ensure larger chi-square values?
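A large value indicates large discrepancies between the observed (O) and expected (E) frequencies, so there is not a good fit between the data and the null hypothesis. When the value exceeds the critical value, the result is significant and we reject the null hypothesis. Chi-square values tend to get larger as the number of categories, and therefore the degrees of freedom, increases, although the critical value also increases with the degrees of freedom.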
Define the chi-square statistic.
Measures how well the data (fo) fit the hypothesis (fe).
- The numerator measures the difference between the data and the hypothesis for each category; the difference is squared.
- The denominator is the expected frequency: the same discrepancy between O and E is viewed as relatively large or relatively small depending on the size of the expected frequency.
- The resulting values are added across categories to obtain the total discrepancy between the data and the hypothesis.
The numerical value of chi-square is a measure of the discrepancy between the observed (O) and expected (E) frequencies.
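
Written as a formula, with fo the observed frequency and fe the expected frequency for each category (the same symbols used in this card), the statistic is commonly given in the following form. The sum runs over all categories.

```latex
\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}
```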
How is the chi-square statistic similar to the single-sample t-test? When should each be used?
Both tests use the data from a single sample to test hypotheses about a single population.
- The t test is appropriate when the data consist of numerical values (interval or ratio scale).
- The chi-square test is appropriate when the individuals in the sample are placed in nonnumerical categories (nominal or ordinal scale).
What is the null hypothesis for the chi-square test for independence?
States that the two variables being measured are independent; that is, for each individual, the value obtained for one variable is not related to the value obtained for the second variable.
VERSION 1
- The data are viewed as a single sample with each individual measured on two variables.
- The null hypothesis states that there is no relationship between the two variables (e.g., personality and color preference).
- Similar to a correlation.
VERSION 2
- The data are viewed as two (or more) separate samples representing two (or more) populations or treatment conditions.
- The null hypothesis states that there is no difference between the two (or more) populations: they have the same proportions.
- Similar to the independent-measures t test or ANOVA.
EQUIVALENCE
- If the proportions are the same (Version 2 null), then there is no relationship (Version 1 null).
- If the proportions are different, then there is a relationship.
What is the chi-square test for independence?
Tests whether there is a relationship between two variables (e.g., personality: introvert, extrovert; color preference: red, yellow, green). Each individual in the sample is classified on both of the two variables, creating a two-dimensional frequency-distribution matrix. The frequency distribution for the sample is then used to test hypotheses about the corresponding frequency distribution for the population.
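
The two-dimensional frequency matrix described above can be sketched in code as follows. This is a minimal illustration, not a full implementation: it assumes the standard textbook formulas fe = (row total × column total) / n for each cell and df = (R − 1)(C − 1), neither of which is stated in these cards, and the counts are made up for the personality and color-preference example.

```python
def chi_square_independence(observed):
    """Compute chi-square, df, and expected frequencies for a table.

    observed: list of rows, each a list of counts (one row per category
    of the first variable, one column per category of the second).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # Expected frequency for each cell if the variables are independent:
    # (row total * column total) / n  (standard formula, assumed here).
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Sum (fo - fe)^2 / fe over every cell of the matrix.
    chi2 = sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(observed, expected)
        for fo, fe in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


# Hypothetical counts: rows = introvert, extrovert;
# columns = red, yellow, green.
observed = [
    [10, 20, 20],
    [30, 10, 10],
]
chi2, df, expected = chi_square_independence(observed)
print(expected)
print(round(chi2, 2), df)
```

The expected frequencies have the same proportions in every row, which is what independence between the two variables would look like in the sample.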
What does a small chi-square statistic value mean?
There are small discrepancies between the O and the E. We conclude that there is a good fit between the data and the hypothesis (null). We fail to reject the null hypothesis.
What does it mean for two variables to be independent?
Two variables are independent when there is no consistent, predictable relationship between them. In this case, the frequency distribution for one variable is not related to (or dependent on) the categories of the second variable. As a result, when two variables are independent, the frequency distribution for one variable has the same shape (same proportions) for all categories of the second variable.
What conditions must be satisfied before using the chi-square test?
Violating these assumptions and restrictions casts doubt on the results and increases the probability of a Type I error.
1. INDEPENDENCE OF OBSERVATIONS: One consequence of independent observations is that each observed frequency is generated by a different individual. A chi-square test would be inappropriate if a person could produce responses that can be classified in more than one category or contribute more than one frequency count to a single category.
2. SIZE OF EXPECTED FREQUENCIES: A chi-square test should not be performed when the expected frequency of any cell is less than 5. When E is very small, even a small discrepancy between O and E can produce a large value for the chi-square statistic, distorting the result. It is best to have a large sample size.
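
To see why small expected frequencies are a problem, the sketch below compares the contribution of a single cell, (fo − fe)² / fe, for the same discrepancy of 4 at two different expected frequencies. It also includes a small helper that flags cells below the threshold of 5. The function names and numbers are hypothetical, chosen only for illustration.

```python
def cell_contribution(fo, fe):
    """Contribution of one cell to chi-square: (fo - fe)^2 / fe."""
    return (fo - fe) ** 2 / fe


def small_expected_cells(expected, minimum=5):
    """Return the expected frequencies that fall below the minimum."""
    return [fe for fe in expected if fe < minimum]


# Same discrepancy (fo - fe = 4), very different contributions:
print(cell_contribution(5, 1))    # fe = 1: the small denominator inflates the value
print(cell_contribution(14, 10))  # fe = 10: the same gap contributes much less

# Hypothetical expected frequencies; any value below 5 is flagged.
print(small_expected_cells([12.5, 4.0, 7.5]))
```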
What is the relationship between the chi-square and the independent-measures t test and ANOVA?
All three tests evaluate the significance of a relationship by determining whether the relationship observed in the sample provides enough evidence to conclude that there is a corresponding relationship in the population.
What are parametric tests?
Examples: the t test and ANOVA.
- Tests that concern population parameters and require assumptions about those parameters.
- Test hypotheses about population parameters.
- Require assumptions such as a normal distribution and homogeneity of variance.
- Require a numerical score for each individual, measured on an interval or ratio scale.
What is the observed frequency?
The number of individuals from the sample who are classified in a particular category. Each individual is counted in one and only one category.
- Observed frequencies add up to the total sample size.
- Obtaining them simply involves measuring individuals to determine the category to which each belongs.
What is the chi-square test for goodness of fit? Give an example of a research question.
Uses sample data to test hypotheses about the shape or proportions of a population distribution. The test determines how well the obtained sample proportions fit the population proportions specified by the null hypothesis. Example research question: "Of the two leading brands of cola, which is preferred by most Americans?"
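
A goodness-of-fit test for the cola question might be sketched as follows. The observed counts are invented for illustration, the null hypothesis is assumed to be "no preference" (50% each), and the degrees of freedom (df = C − 1) and the critical value of 3.84 (df = 1, alpha = .05) come from standard chi-square tables rather than from these cards.

```python
def goodness_of_fit(observed, null_proportions):
    """Return (chi-square, df) for observed counts vs. null proportions."""
    n = sum(observed)
    # Expected frequency for each category: fe = p * n.
    expected = [p * n for p in null_proportions]
    chi2 = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
    df = len(observed) - 1  # number of categories minus 1 (assumed formula)
    return chi2, df


# Hypothetical sample: 100 people, 60 prefer brand A and 40 prefer brand B.
observed = [60, 40]
chi2, df = goodness_of_fit(observed, [0.5, 0.5])

# Critical value from a standard chi-square table for df = 1, alpha = .05.
CRITICAL_DF1_ALPHA_05 = 3.84
reject_null = chi2 > CRITICAL_DF1_ALPHA_05

print(chi2, df, reject_null)
```

With these made-up counts the statistic exceeds the critical value, so the no-preference null hypothesis would be rejected; with counts closer to 50/50 it would not be.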
