Statistics Chapter 21
This example described a model of dominant epistasis with a 12 : 3 : 1 phenotypic ratio in the second generation (F2). A cross of white and green summer squash plants gives the following numbers of squash in F2: 131 white squash, 34 yellow squash, and 10 green squash. What is, approximately, the expected count of green squash?
10.9 Formula: expected count = n × p_i Solution: n = 131 + 34 + 10 = 175, so 175 × 1/16 = 175 × 0.0625 = 10.94 ≈ 10.9
Are trees distributed randomly in a forest? A sample of 100 trees in the Wade tract was taken, and the tract divided into four equal parts. We expect to find _____ trees in each quadrant.
25 (If the trees are randomly distributed, then we'd expect 1/4 of our sample in each quadrant that represents 1/4 of the land. That means we'll expect (1/4)(100) = 25 trees per quadrant.)
A very large pediatrics office is open for business Monday through Friday. Management wants to know if antibiotics are prescribed uniformly over the workweek (Monday through Friday). A random sample of 200 antibiotic prescriptions written over the past 12 months is taken. What is the null hypothesis for the chi-square goodness-of-fit test?
H0: The distribution is uniform: pMon = pTue = pWed = pThu = pFri = 1/5
This example described a model of dominant epistasis with a 12 : 3 : 1 phenotypic ratio in the second generation (F2). A cross of white and green summer squash plants gives the following numbers of squash in F2: 131 white squash, 34 yellow squash, and 10 green squash. State the null and alternative hypotheses.
H0: pwhite = 12/16 and pyellow = 3/16 and pgreen = 1/16; Ha: H0 is not true
Finding a non-significant P-value is
NOT a validation of the null hypothesis and does NOT suggest that the data do follow the hypothesized model. It only shows that the data are not inconsistent with the model.
What does the chi-square statistic for goodness of fit with k proportions measure?
The chi-square statistic for goodness of fit with k proportions measures how much observed counts differ from expected counts.
When is the chi-square test for goodness of fit used?
The chi-square test for goodness of fit is used when we have a single SRS from a population and the variable is categorical with k mutually exclusive levels.
The null hypothesis of goodness of fit can be
The null hypothesis can be that all population proportions are equal (uniform hypothesis) or that they are equal to some specific values, as long as the sum of all the population proportions in H0 equals 1.
This example described a model of dominant epistasis with a 12 : 3 : 1 phenotypic ratio in the second generation (F2). A cross of white and green summer squash plants gives the following numbers of squash in F2: 131 white squash, 34 yellow squash, and 10 green squash. What is the value of the chi-square statistic for this test?
X² = 0.124
Formula: X² = Σ (observed count − expected count)² / expected count
Solution:
1) Convert the expected proportions to decimals: 12/16 = 0.75; 3/16 = 0.1875; 1/16 = 0.0625
2) Calculate the expected counts (n = 175): 175 × 0.75 = 131.25; 175 × 0.1875 = 32.8125; 175 × 0.0625 = 10.9375
3) Subtract each expected count from the observed count: 131 − 131.25 = −0.25; 34 − 32.8125 = 1.1875; 10 − 10.9375 = −0.9375
4) Square each difference: (−0.25)² = 0.0625; (1.1875)² = 1.41015625; (−0.9375)² = 0.87890625
5) Divide each squared difference by its expected count: 0.0625 / 131.25 = 0.0004762; 1.41015625 / 32.8125 = 0.0429762; 0.87890625 / 10.9375 = 0.0803571
6) Add the components: 0.0004762 + 0.0429762 + 0.0803571 = 0.1238 ≈ 0.124
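The six steps above can be reproduced in a few lines of Python. This is a minimal sketch, not code from the text: the helper name `chi_square_statistic` is hypothetical, and it keeps the expected counts unrounded so the total comes out at 0.1238.

```python
def chi_square_statistic(observed, expected):
    """Return X^2 and its per-category components (hypothetical helper)."""
    components = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    return sum(components), components

# Squash F2 data (white, yellow, green) and the 12:3:1 model stated in H0
observed = [131, 34, 10]
p0 = [12 / 16, 3 / 16, 1 / 16]
n = sum(observed)                    # 175
expected = [n * p for p in p0]       # unrounded expected counts

x2, parts = chi_square_statistic(observed, expected)
print([round(c, 4) for c in parts])  # one contribution per category
print(round(x2, 4))                  # 0.1238
```

The green category contributes most of the total even though its raw difference is small, because it is divided by the smallest expected count.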
Chi-Square statistic formula
X² = Σ (observed count − expected count)² / expected count
The χ² distributions are
a family of distributions that take only positive values, are skewed to the right, and are each described by a specific number of degrees of freedom.
The chi-square statistic is
a sum of components computed separately for each of the k possible outcomes in the distribution
This example described a model of dominant epistasis with a 12 : 3 : 1 phenotypic ratio in the second generation (F2). A cross of white and green summer squash plants gives the following numbers of squash in F2: 131 white squash, 34 yellow squash, and 10 green squash. You can trust the validity of this test because a) all expected counts are larger than 5. b) all observed counts are larger than 5. c) the sample size is larger than 40.
a) all expected counts are larger than 5.
When performing a chi-square goodness of fit test, which of the following must always be a whole number? a) the observed counts b) the expected counts c) both the observed and expected counts d) neither the observed nor expected counts
a) the observed counts (These numbers indicate how many times a given outcome was observed. Outcomes can only be counted in whole numbers, so the observed counts are always whole numbers. Expected counts, such as 175 × 1/16 = 10.94 green squash, need not be.)
We can safely use the chi-square test when:
all expected counts are ≥ 1.0, and no more than 20% of the k expected counts are < 5.0.
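The two rules of thumb above can be checked mechanically. This is a minimal sketch; the function name `chi_square_conditions_ok` is hypothetical, and the second set of expected counts is made up purely to show a failing case.

```python
def chi_square_conditions_ok(expected):
    """Check the rules of thumb: every expected count >= 1,
    and no more than 20% of the k expected counts below 5."""
    if any(e < 1.0 for e in expected):
        return False
    small = sum(1 for e in expected if e < 5.0)
    return small <= 0.20 * len(expected)

# Squash example: expected counts under the 12:3:1 model (n = 175)
print(chi_square_conditions_ok([131.25, 32.8125, 10.9375]))     # True
# Hypothetical counts: 2 of 5 categories (40%) fall below 5
print(chi_square_conditions_ok([20.0, 15.0, 10.0, 4.0, 3.0]))   # False
```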
The chi-square distribution is
an approximation to the distribution of the statistic X².
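One way to see that the chi-square distribution only approximates the distribution of X² is to simulate many samples generated under H0 and count how often X² lands beyond a chi-square cutoff. This sketch assumes the squash setting (n = 175, 12:3:1 model, so df = 2) and uses 5.991, the 5% upper-tail point of the chi-square distribution with 2 degrees of freedom; the simulated fraction should be near 0.05 but not exactly equal to it.

```python
import random

def chi_square_statistic(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

random.seed(1)                       # fixed seed so the run is repeatable
p0 = [12 / 16, 3 / 16, 1 / 16]       # squash 12:3:1 model from H0
n = 175
expected = [n * p for p in p0]
reps = 10_000
critical = 5.991                     # 5% point of chi-square with df = 2

exceed = 0
for _ in range(reps):
    # One multinomial sample of n plants generated under H0
    draws = random.choices(range(3), weights=p0, k=n)
    observed = [draws.count(i) for i in range(3)]
    if chi_square_statistic(observed, expected) >= critical:
        exceed += 1

print(exceed / reps)  # should land close to 0.05
```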
This example examined the distribution of amaranth seed color. Another genetic model that could explain the existence of exactly three phenotypes is based on a single-gene mode of inheritance with two co-dominant alleles. Under this model, crossing pure breeds of the dominant and recessive traits would give rise to the two original traits as well as an intermediate phenotype in F2, with an expected 1:2:1 ratio of black, brown, and pale seeds, respectively. A genetic crossing resulted in 321 black seeds, 77 brown seeds, and 31 pale seeds. What is the null hypothesis for the corresponding chi-square goodness-of-fit test? a) H0: pblack = pbrown = ppale = 1/3 b) H0: pblack = 0.25 and pbrown = 0.5 and ppale = 0.25 c) H0: pblack = 0.75 and pbrown = 0.18 and ppale = 0.07
b) H0: pblack = 0.25 and pbrown = 0.5 and ppale = 0.25
A chi-square test is sometimes used to confirm the suspicion that a variable has a certain distribution, described by the null hypothesis. In that type of situation, does a lack of significance mean that the distribution specified by the null hypothesis is correct? a) Yes, that is what lack of significance means. b) No, there was simply not enough difference to challenge the null hypothesis. c) Yes, since the observations were different than predicted by the null hypothesis. d) No, that is what lack of significance means.
b) No, there was simply not enough difference to challenge the null hypothesis. (Lack of significance means that there was not enough evidence to reject the null hypothesis; it does not show that the null hypothesis is correct.)
A researcher wonders if exposure to a certain chemical might change the proportion of obese mice in a population. In a typical population about 10% of mice are considered obese. He collects data from 54 mice exposed to the chemical and records whether or not each mouse is obese. He would like to know if, among mice exposed to the chemical, the proportion of obese mice is greater than 10%. Which test, the chi-square test or the one sample z-test, would be better suited to test the researcher's hypothesis? a) The researcher could use either test, the conclusions will always be the same. b) The one-sample z test, because the alternative hypothesis is one-sided. c) The one-sample z test, because the sample size is relatively small. d) The chi-square test, because the researcher wants to test the distribution of a categorical variable.
b) The one-sample z test, because the alternative hypothesis is one-sided. (A chi-square test cannot be used to test a one-sided alternative hypothesis. In a chi-square test, the alternative hypothesis is always two-sided.)
An advantage of using a one-sample z test, instead of a chi-square test with one degree of freedom, is that: a) The one-sample z test is related to the confidence interval for the population proportion. b) The one-sample z test can be used as a one-sided test. c) Both A and B are correct. d) None of the above. There is no advantage of using a one-sample z test instead of a chi-square test with one degree of freedom.
c) Both A and B are correct.
This example examined the distribution of amaranth seed color. Another genetic model that could explain the existence of exactly three phenotypes is based on a single-gene mode of inheritance with two co-dominant alleles. Under this model, crossing pure breeds of the dominant and recessive traits would give rise to the two original traits as well as an intermediate phenotype in F2, with an expected 1:2:1 ratio of black, brown, and pale seeds, respectively. A genetic crossing resulted in 321 black seeds, 77 brown seeds, and 31 pale seeds. The R output for the appropriate chi-square goodness-of-fit test is X-squared = 568.3566, df = 2, p-value < 2.2e-16. What can you conclude about amaranth seed color? a) The findings prove that amaranth seed color is determined by one gene with two co-dominant alleles. b) The findings are consistent with a color ratio for amaranth seeds determined by one gene with two co-dominant alleles. c) The findings provide very strong evidence that amaranth seed color isn't determined by one gene with two co-dominant alleles.
c) The findings provide very strong evidence that amaranth seed color isn't determined by one gene with two co-dominant alleles.
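The statistic in the R output can be checked by hand. This sketch recomputes X² for the 1:2:1 model and uses the fact that, for exactly 2 degrees of freedom, the upper-tail area of the chi-square distribution is exp(−x/2). R prints "p-value < 2.2e-16" when the P-value is below machine precision, which is the case here.

```python
import math

# Amaranth seeds (black, brown, pale); H0 is the 1:2:1 co-dominance model
observed = [321, 77, 31]
p0 = [0.25, 0.50, 0.25]
n = sum(observed)                        # 429
expected = [n * p for p in p0]           # [107.25, 214.5, 107.25]

x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                   # k - 1 = 2

# For df = 2 the area to the right of x2 has the closed form exp(-x2/2)
p_value = math.exp(-x2 / 2)

print(round(x2, 4))          # 568.3566
print(p_value < 2.2e-16)     # True
```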
In order to use a chi-square test for goodness of fit the data must satisfy a multinomial setting. Which of the conditions below is NOT a condition of the multinomial setting? a) For each observation, the probability of a given outcome is the same. b) The observations are independent. Having information about the outcome of one observation does not change the probabilities assigned to other observations. c) There are an unlimited number of categories, the categories cover all possible outcomes and do not overlap. d) There is a fixed number of observations.
c) There are an unlimited number of categories, the categories cover all possible outcomes and do not overlap. (A multinomial setting requires a *fixed* number of categories that cover all possible outcomes and do not overlap, so each observation falls into exactly one category. If the data are divided into an unlimited number of categories, a chi-square test cannot be used.)
What does the chi-square statistic measure? a) how big the observed counts are b) how big the expected counts are c) how different the observed counts are from the expected counts d) how different the observed counts are from each other
c) how different the observed counts are from the expected counts (The chi-square statistic is a measure of how different the observed counts are from the expected counts.)
A very large pediatrics office is open for business Monday through Friday. Management wants to know if antibiotics are prescribed uniformly over the workweek (Monday through Friday). A random sample of 200 antibiotic prescriptions written over the past 12 months is taken. Software gives P < 0.01. Your conclusion based on this test is that: a) the data are consistent with a uniform distribution. b) the data provide strong evidence that all five proportions are different. c) the data provide strong evidence that the distribution is not uniform.
c) the data provide strong evidence that the distribution is not uniform.
A researcher wonders if children born in the summer might be at greater risk for a certain birth defect due to higher levels of some environmental toxins. The researcher takes a random sample of birth records for children with this condition and records the month of the birth. To test her hypotheses she should use a chi-square test with _____ levels. a) four b) two c) twelve d) six
c) twelve (There are 12 months in the year, so month of birth is a categorical variable with 12 levels.)
The chi-square (χ²) test is used when the data are
categorical
The chi-square (χ²) statistic
compares observed and expected counts.
Before performing the test, the researcher should verify that the __________ count for each category is greater than or equal to one.
expected (In order to use a chi-square test the expected count for each category must be greater than or equal to one.)
With n total observations, the expected count for any given outcome i is
expected count_i = n × p_i0, where p_i0 is the proportion of outcome i specified by H0
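The formula above can be applied to the trees, prescriptions, and squash examples in this set. This is a minimal sketch with a hypothetical helper `expected_counts`; it uses exact fractions so that the requirement that the H0 proportions sum to 1 is checked without rounding error.

```python
from fractions import Fraction

def expected_counts(n, proportions):
    """Expected count for each outcome i: n * p_i0 (hypothetical helper)."""
    # H0 must describe a complete distribution: the proportions sum to exactly 1
    if sum(proportions) != 1:
        raise ValueError("H0 proportions must sum to 1")
    return [float(n * p) for p in proportions]

print(expected_counts(100, [Fraction(1, 4)] * 4))   # trees: [25.0, 25.0, 25.0, 25.0]
print(expected_counts(200, [Fraction(1, 5)] * 5))   # prescriptions: [40.0, 40.0, 40.0, 40.0, 40.0]
print(expected_counts(175, [Fraction(12, 16), Fraction(3, 16), Fraction(1, 16)]))
# squash: [131.25, 32.8125, 10.9375]
```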
The chi-square (χ²) test measures
how different the observed data are from what we would expect if H0 was true.
In order to use a chi-square test for goodness of fit the data must satisfy a multinomial setting. One condition for a multinomial setting is that knowledge about one observation does not change the probabilities assigned to other observations, in other words, the observations are all __________.
independent (A multinomial setting requires that all observations are independent.)
What symbol denotes the number of levels of a categorical variable?
k
When there is strong evidence against the null hypothesis the chi-square statistic will have a __________ value.
large (Large values of the chi-square statistic indicate that the observed counts are very different from the expected counts, providing evidence against the null hypothesis.)
For a chi-square distribution, when there are fewer degrees of freedom the distribution is
narrow and tall: its values are concentrated closer to zero, with a higher peak and a stronger right skew.
A non-significant P-value is
not conclusive: H0 could be true, or not.
If we fail to reject the null hypothesis in a chi-square test then differences between the chi-square components represent only the kind of __________ variations we would expect to see when the null hypothesis is true.
random (Even when the null hypothesis is true, random variation causes the observed data to differ from what was expected under the null, so the chi-square components differ from each other.)
To find the P-value associated with a chi-square statistic (X²), calculate the area to the __________ of X², under the density curve of the distribution.
right (The P-value is the area to the right of X2 under the density curve of the chi-square distribution)
Chi-square distributions generally have a _________ _________ shape.
right-skewed (These distributions are generally right-skewed, with shapes that depend on the degrees of freedom.)
The chi-square statistic will be _____ if the observed counts are close to the counts expected under the null hypothesis.
small
Large values for χ² represent
strong deviations from the expected distribution under H0, and will tend to be statistically significant.
Observed counts are
the actual number of observations of each type.
The P-value of the chi-square test for goodness of fit is
the area to the right of the test statistic X², under the chi-square distribution with k−1 degrees of freedom.
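The right-tail area can be computed without a statistics library when the degrees of freedom are a whole number. This sketch assumes integer df and uses the standard closed forms for the chi-square upper tail (a finite exponential series for even df; erfc plus a finite correction series for odd df) rather than any particular package; the function name `chi2_upper_tail` is hypothetical.

```python
import math

def chi2_upper_tail(x, df):
    """Area to the right of x under the chi-square density with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over j = 0 .. df/2 - 1 of (x/2)**j / j!
        term = total = 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series
    p = math.erfc(math.sqrt(half))
    term = math.sqrt(2 * x / math.pi) * math.exp(-half)
    for r in range(1, (df - 1) // 2 + 1):
        p += term
        term *= x / (2 * r + 1)
    return p

# Squash test: X^2 = 0.1238 with k = 3 categories, so df = k - 1 = 2
k = 3
p_squash = chi2_upper_tail(0.1238, k - 1)
print(round(p_squash, 2))   # 0.94
```

A P-value of about 0.94 means the squash counts are not inconsistent with the 12:3:1 model.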
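The two-sided one-sample z test for a population proportion will always have the same conclusion as
the chi-square goodness-of-fit test with k = 2 categories (one degree of freedom), because X² = z² for the same data.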
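The link between the two tests can be checked numerically. The counts below are hypothetical (9 obese mice out of 54, tested against p0 = 0.10) and are used only for illustration. The sketch shows that z² equals X² with two categories, that the two-sided z P-value equals the chi-square (df = 1) P-value, and that only the z test offers a one-sided P-value.

```python
import math

# Hypothetical data for illustration only: 9 obese mice out of n = 54, H0: p = 0.10
x, n, p0 = 9, 54, 0.10

# One-sample z statistic for a proportion
p_hat = x / n
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# Chi-square goodness of fit with k = 2 categories (obese, not obese), df = 1
observed = [x, n - x]
expected = [n * p0, n * (1 - p0)]
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Two-sided z P-value and chi-square (df = 1) upper-tail P-value
p_two_sided = math.erfc(abs(z) / math.sqrt(2))
p_chi2 = math.erfc(math.sqrt(x2 / 2))
# One-sided z P-value for Ha: p > p0 (no chi-square counterpart)
p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))

print(math.isclose(z ** 2, x2))            # True
print(math.isclose(p_two_sided, p_chi2))   # True
```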
A significant P-value suggests that
the data do not follow the model specified by the null hypothesis.
Expected counts are
the number of observations that we would expect to see of each type if the null hypothesis was true.
The chi-square test compares
the observed counts of observations with the counts that would be expected if H0 was true.
The chi-square goodness of fit test assesses
whether the observed counts of a categorical variable fit the distribution described by the null hypothesis.
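For a chi-square distribution, when there are more degrees of freedom the distribution is
wide and short: it is more spread out (mean = df, variance = 2 × df), with a lower peak located farther from zero and less skew.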
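The shape change described in the two cards on degrees of freedom can be seen by evaluating the chi-square density at its peak. This sketch uses the standard density formula with `math.gamma`; the helper `chi2_density` is hypothetical, and the df values are arbitrary choices greater than 2 (where the peak sits at x = df − 2).

```python
import math

def chi2_density(x, df):
    """Chi-square density f(x; df) = x**(df/2 - 1) * exp(-x/2) / (2**(df/2) * Gamma(df/2)), x > 0."""
    k = df / 2
    return x ** (k - 1) * math.exp(-x / 2) / (2 ** k * math.gamma(k))

dfs = [3, 5, 10, 20]
peaks = []
for df in dfs:
    mode = df - 2                    # the density peaks at x = df - 2 when df > 2
    peaks.append(chi2_density(mode, df))
    print(f"df={df:2d}  mean={df:2d}  sd={math.sqrt(2 * df):5.2f}  peak height={peaks[-1]:.3f}")
```

As df grows, the mean and standard deviation increase while the peak height drops, which is the "wide and short" shape.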
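The chi-square statistic is like a measure of distance: it must always be greater than or equal to _____.
zero (Each component is a squared difference divided by a positive expected count, so no component can be negative; X² = 0 only when every observed count equals its expected count.)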
The individual values summed in the χ² statistic are the
χ² components (or contributions)