STA 2023: Ch 9/10
10.3 List five properties of the F-distribution.
1. The F-distribution is a family of curves, each determined by two types of degrees of freedom: the degrees of freedom for the numerator (d.f.N) and the degrees of freedom for the denominator (d.f.D). 2. The F-distribution is positively skewed and therefore not symmetric. 3. The total area under each F-distribution curve is equal to 1. 4. All values of F are greater than or equal to 0. 5. For all F-distributions, the mean value of F is approximately equal to 1.
10.3 List the three conditions that must be met in order to use a two-sample F-test.
1. The samples must be randomly selected. 2. The samples must be independent. 3. Each population must have a normal distribution.
10.4 What conditions are necessary in order to use a one-way ANOVA test?
1. There must be at least 3 samples. 2. Each population must have the same variance. 3. The samples must be randomly selected from a normal, or approximately normal, population. 4. The samples must be independent of each other.
9.2 What is a residual? Explain when a residual is positive, negative, and zero.
A residual is the difference between the observed y-value of a data point and the predicted y-value on a regression line for the x-coordinate of the data point. A residual is positive when the point is above the line, negative when it is below the line, and zero when the observed y-value equals the predicted y-value.
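The sign rule above can be checked with a short sketch. The line ŷ = 2x + 1 and the three points are hypothetical values chosen only for illustration.

```python
def residual(observed_y, slope, intercept, x):
    """Return the observed y-value minus the y-value predicted by y-hat = slope*x + intercept."""
    predicted_y = slope * x + intercept
    return observed_y - predicted_y

# Hypothetical regression line y-hat = 2x + 1, evaluated at x = 3 (predicted y = 7)
above = residual(8, 2, 1, 3)    # point above the line: positive residual
below = residual(5, 2, 1, 3)    # point below the line: negative residual
on_line = residual(7, 2, 1, 3)  # point on the line: zero residual
print(above, below, on_line)
```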
9.1 Explain how to determine whether a sample correlation coefficient indicates that the population correlation coefficient is significant.
A table can be used to compare the absolute value of r with a critical value, or a hypothesis test can be performed using a t-test.
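For the t-test route, the usual standardized test statistic is t = r / √((1 − r²)/(n − 2)) with n − 2 degrees of freedom. The sketch below assumes that formula (it is not written out in these notes) and uses a made-up sample of n = 18 pairs with r = 0.6.

```python
import math

def t_statistic_for_r(r, n):
    """Standardized test statistic for a sample correlation coefficient r from n data pairs."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))

# Hypothetical sample: r = 0.6 from n = 18 pairs
t = t_statistic_for_r(0.6, 18)
degrees_of_freedom = 18 - 2
print(t, degrees_of_freedom)  # t is about 3.0 with 16 degrees of freedom
```

The resulting t is then compared with a critical value from a t-table (or technology) at the chosen level of significance.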
9.1 Value of home and life span are two variables that have been shown to have a positive correlation but no cause-and-effect relationship. Describe at least one possible reason for the correlation.
A. Greater wealth allows people to afford more valuable homes and to spend more money on health care, and greater health care spending generally enables people to live longer. B. Exercise tends to increase life spans, people who live within walking distance of amenities tend to walk more than those who do not, and homes that are within walking distance of amenities tend to be more valuable than homes that are not.
10.3 Test the claim about the differences between two population variances
A. If the alternative hypothesis (Ha) contains the less-than (<) or greater-than (>) inequality, a right-tailed test is needed, because the larger sample variance is placed in the numerator. B. If the alternative hypothesis (Ha) contains the not-equal-to symbol (≠), a two-tailed test is needed. In both cases, use d.f.N = n₁ − 1 in the numerator and d.f.D = n₂ − 1 in the denominator.
9.3 A. The coefficient of determination r² is the ratio of which two types of variations? B. What does r² measure? C. What does (1-r²) measure?
A. The coefficient of determination is the ratio of the explained variation to the total variation. B. The coefficient of determination is the percent of variation of y that is explained by the relationship between x and y. C. The value (1-r²) is the percent of the variation that is unexplained.
Explain how to determine the values of d.f.N and d.f.D when performing a two-sample F-test.
A. The variable d.f.N represents the degrees of freedom of the numerator, and the variable d.f.D represents the degrees of freedom of the denominator. B. The value of d.f.N is equal to (n₁ - 1), and the value of d.f.D is equal to (n₂ - 1), where n₁ and n₂ represent the sample sizes of the numerator and denominator (respectively).
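A minimal sketch of the bookkeeping, assuming the common textbook convention that the sample with the larger variance is treated as sample 1 (the numerator). The function name and the data are hypothetical.

```python
import statistics

def two_sample_f(sample_a, sample_b):
    """Return (F, d.f.N, d.f.D) with the larger sample variance in the numerator."""
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    # Assumed convention: the sample with the larger variance becomes sample 1.
    if var_a >= var_b:
        larger, smaller, n1, n2 = var_a, var_b, len(sample_a), len(sample_b)
    else:
        larger, smaller, n1, n2 = var_b, var_a, len(sample_b), len(sample_a)
    return larger / smaller, n1 - 1, n2 - 1

# Hypothetical samples: variances are 2.5 and 14, so the second sample goes in the numerator
f_stat, dfn, dfd = two_sample_f([1, 2, 3, 4, 5], [2, 4, 6, 8, 10, 12])
print(f_stat, dfn, dfd)
```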
9.3 A. Calculate the coefficient of determination given r B. What does this tell you about the explained variation of the data about the regression line? C. About the unexplained variation?
A. Coefficient of determination = r². EX: r = 0.033; squaring r gives 0.001 after rounding. B. Convert 0.001 to a percentage: about 0.1% of the variation is explained. C. Subtract 0.1% from 100% to get 99.9%, the percent of the variation that is unexplained.
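The same arithmetic as a short sketch, using the example value r = 0.033 from the card; the variable names are illustrative only.

```python
r = 0.033
r_squared = r ** 2                          # 0.001089 before rounding
explained_pct = round(r_squared, 3) * 100   # about 0.1 (percent explained)
unexplained_pct = 100 - explained_pct       # about 99.9 (percent unexplained)
print(explained_pct, unexplained_pct)
```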
10.4 Which statement below describes the hypotheses for a two-way ANOVA test?
A two-way ANOVA test has three null hypotheses, one for each main effect and one for the interaction effect.
10.2 Expected frequency
Expected frequency: (row total * column total) / grand total. Step 1. Find horizontal row total Step 2. Find vertical column total Step 3. Multiply these and divide by the grand total.
10.2 Explain how to find the expected frequency for a cell in a contingency table.
Find the sum of the row and sum of the column in which the cell is located. Find the product of these two sums. Divide the product by the sample size.
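The three steps can be sketched for a whole table at once. The 2×2 table of observed counts is made up for illustration.

```python
def expected_frequencies(table):
    """Expected frequency for each cell: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Hypothetical observed counts: row totals 30 and 70, column totals 40 and 60
observed = [[10, 20],
            [30, 40]]
expected = expected_frequencies(observed)
print(expected)  # [[12.0, 18.0], [28.0, 42.0]]
```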
10.1 Find the expected frequency for the given values of n and pᵢ.
Formula: Eᵢ = npᵢ, where n is the sample size and pᵢ is the assumed probability of the ith category.
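A small sketch of the formula, paired with the goodness-of-fit requirement that every expected frequency be at least 5. The function names, sample sizes, and probabilities are hypothetical.

```python
def goodness_of_fit_expected(n, probabilities):
    """Expected frequency for each category: E_i = n * p_i."""
    return [n * p for p in probabilities]

def meets_expected_frequency_condition(expected, minimum=5):
    """True when every expected frequency is at least the minimum (5 for the chi-square test)."""
    return all(e >= minimum for e in expected)

# Hypothetical claim: three categories with probabilities 0.5, 0.3, 0.2 and n = 200
expected = goodness_of_fit_expected(200, [0.5, 0.3, 0.2])
ok = meets_expected_frequency_condition(expected)

# A sample that is too small: n = 20 gives an expected frequency of about 2 in the second category
too_small = meets_expected_frequency_condition(goodness_of_fit_expected(20, [0.9, 0.1]))
print(expected, ok, too_small)
```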
10.4 State the null and alternative hypotheses for a one-way ANOVA test.
H₀: All population means are equal. Ha: At least one population mean is different from the others.
10.2 Calculate marginal frequencies and sample size.
Marginal relative frequency: calculated by dividing a row total or a column total by the sample size. Sample size: calculated by adding all numbers in the table.
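As a sketch of these two calculations, assuming a contingency table stored as a list of rows (the counts are made up):

```python
def sample_size(table):
    """Sample size: the sum of every count in the table."""
    return sum(sum(row) for row in table)

def marginal_relative_frequencies(table):
    """Return (row marginals, column marginals): each total divided by the sample size."""
    n = sample_size(table)
    row_marginals = [sum(row) / n for row in table]
    col_marginals = [sum(col) / n for col in zip(*table)]
    return row_marginals, col_marginals

# Hypothetical table of counts
table = [[10, 20],
         [30, 40]]
n = sample_size(table)
rows, cols = marginal_relative_frequencies(table)
print(n, rows, cols)
```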
10.2 Decide whether to reject or fail to reject H0.
Reject H0: the test statistic (χ²) is in the rejection region. Fail to reject H0: the test statistic (χ²) is NOT in the rejection region.
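The decision rule can be sketched with the standard chi-square statistic, χ² = Σ(O − E)²/E, which is assumed here rather than quoted from the notes. The observed and expected counts are hypothetical; the critical value 3.841 is the table value for α = 0.05 with 1 degree of freedom.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all cells, given flat lists of counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def decide(test_statistic, critical_value):
    """Right-tailed test: the rejection region is everything above the critical value."""
    if test_statistic > critical_value:
        return "reject H0"
    return "fail to reject H0"

# Hypothetical 2x2 table flattened row by row
observed = [10, 20, 30, 40]
expected = [12, 18, 28, 42]
chi_sq = chi_square_statistic(observed, expected)   # about 0.794
decision = decide(chi_sq, 3.841)                    # alpha = 0.05, d.f. = 1
print(chi_sq, decision)
```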
10.3 Explain how to find the critical value for an F-test.
Specify the level of significance, α. Determine the degrees of freedom for the numerator, d.f.N, and denominator, d.f.D. Find the critical value of F using technology or the F-distribution table.
10.2 Relative Frequency (%)
Step 1. Count the total number of items, or sum all the frequencies. Step 2. Divide each single frequency by the total sum of all frequencies. EX: Cats: 5, Dogs: 5, Total: 10. Divide 5/10 to get 0.5, or 50%.
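The two steps as a short sketch, using the cats and dogs counts from the example; the function name is illustrative.

```python
def relative_frequencies(counts):
    """Divide each frequency by the sum of all frequencies."""
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}

pets = relative_frequencies({"Cats": 5, "Dogs": 5})
print(pets)  # {'Cats': 0.5, 'Dogs': 0.5}
```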
10.2 Conditional Relative Frequency (%)
Step 1. Find row (horizontal) total Step 2. Find single row entry and divide by row total.
10.4 Describe the difference between the variance between samples MSB and the variance within samples MSW.
The MSB measures the differences related to the treatment given to each sample. The MSW measures the differences related to entries within the same sample.
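A rough sketch of how the two mean squares are computed, assuming the standard one-way ANOVA formulas: MSB is the between-sample sum of squares divided by k − 1, and MSW is the within-sample sum of squares divided by N − k. The three samples are made up.

```python
def one_way_anova(samples):
    """Return (MSB, MSW, F) for a list of samples."""
    k = len(samples)                               # number of samples
    n_total = sum(len(s) for s in samples)         # N, total number of entries
    grand_mean = sum(sum(s) for s in samples) / n_total
    means = [sum(s) / len(s) for s in samples]
    # Between samples: how far each sample mean is from the grand mean
    ss_between = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    # Within samples: how far each entry is from its own sample mean
    ss_within = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between, ms_within, ms_between / ms_within

# Hypothetical data: sample means 2, 3, and 7
msb, msw, f_stat = one_way_anova([[1, 2, 3], [2, 3, 4], [6, 7, 8]])
print(msb, msw, f_stat)  # 21.0 1.0 21.0
```

A large F (MSB much larger than MSW) suggests that at least one population mean differs from the others.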
9.3 How can the coefficient of determination be interpreted?
The coefficient of determination is the fraction of the variation in money spent that can be explained by the variation in money raised. The remaining fraction of the variation is unexplained and is due to other factors or to sampling error.
9.1 Two variables have a positive linear correlation. Does the dependent variable increase or decrease as the independent variable increases?
The dependent variable increases.
9.3 Describe the explained variation about a regression line in words and in symbols.
The explained variation is the sum of the squares of the differences between the predicted y-values and the mean of the y-values of the ordered pairs, or Σ(ŷᵢ − ȳ)².
9.1 "Correlation does not imply causation"
The fact that two variables are strongly correlated does not in itself imply a cause-and-effect relationship between the variables.
10.1 What conditions are necessary to use the chi-square goodness-of-fit test?
The observed frequencies must be obtained randomly and each expected frequency must be greater than or equal to 5.
9.2 Determine if the point is influential. The change in slope or intercept is significant if it is larger than 10%.
The point is not an influential point because the slopes with the point included and without the point included are not significantly different, and the intercepts are not significantly different.
9.1 Describe the range of values for the correlation coefficient.
The range of values for the correlation coefficient is -1 to 1, inclusive.
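To see the range in practice, here is a sketch of the Pearson correlation coefficient computed from deviations about the means. The formula is the standard one and the data sets are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r_typical = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])   # about 0.775
r_perfect_pos = pearson_r([1, 2, 3], [2, 4, 6])           # points on a rising line
r_perfect_neg = pearson_r([1, 2, 3], [6, 4, 2])           # points on a falling line
print(r_typical, r_perfect_pos, r_perfect_neg)
```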
9.2 Two variables have a positive linear correlation. Is the slope of the regression line for the variables positive or negative?
The slope is positive. As the independent variable increases the dependent variable also tends to increase.
9.3 Standard error of estimate
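The standard error of estimate sₑ is the standard deviation of the observed y-values about the predicted ŷ-values: the square root of the unexplained variation divided by its degrees of freedom, sₑ = √(Σ(yᵢ − ŷᵢ)² / (n − 2)). d.f. = n − 2, where n is the number of ordered pairs (sample size).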
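A minimal sketch of the calculation, assuming the standard definition sₑ = √(Σ(yᵢ − ŷᵢ)² / (n − 2)). The observed values are hypothetical, and the predicted values come from the line ŷ = 0.6x + 2.2 at x = 1, ..., 5.

```python
import math

def standard_error_of_estimate(observed, predicted):
    """Square root of the unexplained variation divided by n - 2."""
    n = len(observed)
    unexplained = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    return math.sqrt(unexplained / (n - 2))

observed = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]   # from the hypothetical line y-hat = 0.6x + 2.2
s_e = standard_error_of_estimate(observed, predicted)
print(s_e)  # about 0.894
```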
9.3 Describe the total variation about a regression line in words and symbols.
The total variation is the sum of the squares of the differences between the y-values of each ordered pair and the mean of the y-values of the ordered pairs, or Σ(yᵢ − ȳ)².
9.3 Describe the unexplained variation about a regression line in words and in symbols.
The unexplained variation is the sum of the squares of the differences between the observed y-values and the predicted y-values, or Σ(yᵢ − ŷᵢ)².
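The three variations can be computed side by side. This sketch fits a least-squares line first (the slope and intercept formulas are the standard ones, assumed here) and uses made-up data; for a least-squares line, explained plus unexplained equals total.

```python
def variations(xs, ys):
    """Return (explained, unexplained, total) variation about the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    predicted = [slope * x + intercept for x in xs]
    explained = sum((y_hat - mean_y) ** 2 for y_hat in predicted)
    unexplained = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, predicted))
    total = sum((y - mean_y) ** 2 for y in ys)
    return explained, unexplained, total

# Hypothetical data
explained, unexplained, total = variations([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
r_squared = explained / total   # coefficient of determination, about 0.6
print(explained, unexplained, total, r_squared)
```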
10.2 T/F: If the test statistic for the chi-square independence test is large, you will, in most cases, reject the null hypothesis.
True
9.3 Explain what it means for two variables to have a bivariate normal distribution.
Two variables have a bivariate normal distribution when for any fixed values of x the corresponding values of y are normally distributed, and for any fixed values of y the corresponding values of x are normally distributed.
9.1 Give examples of two variables that have a perfect positive linear correlation and two variables that have a perfect negative linear correlation.
Two variables that have perfect positive linear correlation are the price per gallon of gasoline and the total cost of gasoline. Two variables that have perfect negative linear correlation are the distance from a door and the height of a wheelchair ramp.
9.3 What is the coefficient of determination for two variables that have perfect positive linear correlation or perfect negative linear correlation? Interpret your answer.
Two variables that have perfect positive or perfect negative linear correlation have a correlation coefficient of 1 or −1, respectively. In either case the coefficient of determination is 1, which means 100% of the variation in the response variable is explained by the variation in the explanatory variable.
9.1 A farmer wants to determine if the amount of sunlight received by similar crops can be used to predict the harvest of the crop. What is the explanatory variable? What is the response variable?
Explanatory variable: amount of sunlight. Response variable: harvest of the crop.
10.2 Explain how the chi-square independence test and the chi-square goodness-of-fit tests are similar. How are they different?
Chi-square independence test: A. d.f. = (r − 1)(c − 1). B. Expected frequency: E(r,c) = (row total × column total) / sample size. C. Tests whether two variables are independent. Chi-square goodness-of-fit test: A. d.f. = k − 1. B. Expected frequency: Eᵢ = npᵢ. C. Tests whether a frequency distribution fits an expected distribution. Both: A. The data are obtained from a random sample. B. Each expected frequency is at least 5. C. Both test a claim about data that are in categories.
10.2 Degrees of freedom for chi-square contingency table
d.f = (r-1)(c-1) where r is the number of rows and c is the number of columns. (only counting rows and columns with data values)
Degrees of freedom for ANOVA
d.f. between (d.f.N) = k − 1, the number of groups minus 1. d.f. within (d.f.D) = N − k, the total number of participants minus the number of groups.
9.3 Find the coefficient of determination (r²) using x and y table values
https://exploringfinance.com/coefficient-of-determination-r-squared-calculator/
10.3 Find the critical F-value for a two-tailed test using the indicated level of significance α and degrees of freedom.
https://mathcracker.com/f-critical-values For a two-tailed test, always use the second (right-tail) critical value.
9.3 Constructing Prediction Interval
https://mathcracker.com/prediction-interval-calculator-regression-prediction
10.3 Find the critical F-value for a right-tailed test using the indicated level of significance α and degrees of freedom.
https://www.danielsoper.com/statcalc/calculator.aspx?id=4
10.1 Determine the critical value, and the rejection region. DF: (k-1) where k is the number of categories in the table (Chi-squared dist.)
https://www.omnicalculator.com/statistics/critical-value
10.2 Chi-squared independent critical value/rejection region
https://www.omnicalculator.com/statistics/critical-value
10.2 Chi-squared independent test statistic
https://www.socscistatistics.com/tests/chisquare2/default2.aspx
9.2 Finding Correlation Coefficient : r =
https://www.socscistatistics.com/tests/pearson/default2.aspx
9.2 Finding the line of regression and estimates: ŷ = ____x + (____)
https://www.socscistatistics.com/tests/regression/default.aspx
9.1 What does (1 - r²) measure?
The percent of the variation that is unexplained.
9.1 Discuss the difference between r and ρ.
r: represents the sample correlation coefficient. ρ: represents the population correlation coefficient.
9.1 What does r² measure?
The percent of variation of y that is explained by the relationship between x and y.