MIS 301 ch 12, 13


what does a test of independence determine

A test of independence determines whether two factors are independent or not.

Facts about the F distribution

- The curve is not symmetrical but skewed to the right.
- There is a different curve for each set of degrees of freedom.
- The F statistic is greater than or equal to zero.
- As the degrees of freedom for the numerator and for the denominator get larger, the curve more closely approximates the normal distribution; but remember that F can never be less than zero, so the distribution does not have a tail that goes to infinity on the left as the normal distribution does.
- Other uses for the F distribution include comparing two variances and two-way analysis of variance.

5 assumptions to be fulfilled for one way anova test

1. Each population from which a sample is taken is assumed to be normal.
2. All samples are randomly selected and independent.
3. The populations are assumed to have equal standard deviations (or variances).
4. The factor is a categorical variable.
5. The response is a numerical variable.

r-correlation coefficient

A number between −1 and 1 that represents the strength and direction of the relationship between "X" and "Y." The value for "r" will equal 1 or −1 only if all the plotted points form a perfectly straight line.

Y - the dependent variable

Also, the letter "y" represents actual values, while ŷ ("y-hat") represents predicted or estimated values. Predicted values come from plugging observed "x" values into a linear model.

chi square and df

An important parameter in a chi-square distribution is the degrees of freedom df in a given problem. The random variable in the chi-square distribution is the sum of squares of df standard normal variables, which must be independent. The key characteristics of the chi-square distribution also depend directly on the degrees of freedom.

if the null is false in ANOVA

If the null hypothesis is false, then the variance of the combined data is larger, which is caused by the different means.

goodness of fit variables in chi square

O = observed values (data)
E = expected values (from theory)
k = the number of different data cells or categories
The observed values are the data values, and the expected values are the values you would expect to get if the null hypothesis were true. There are k terms of the form (O − E)²/E. The number of degrees of freedom is df = (number of categories − 1).
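As a minimal sketch with hypothetical counts (four equally likely categories assumed under H0), the statistic can be computed directly:

```python
# Hypothetical observed counts in k = 4 categories; H0 claims all
# categories are equally likely, so each expected count is 25.
observed = [22, 18, 28, 32]
expected = [25, 25, 25, 25]

# Goodness-of-fit statistic: the sum of (O - E)^2 / E over all categories
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1  # df = number of categories - 1
```

The resulting chi_sq would then be compared against the right-tail chi-square critical value with df degrees of freedom.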

a is the symbol for the Y-Intercept

Sometimes written as b0, because when writing the theoretical linear model, β0 is used to represent the coefficient for a population.

requirements for test of homogeneity

Test Statistic: Use a χ² test statistic. It is computed in the same way as the test for independence.
Degrees of Freedom (df): df = number of columns − 1
Requirements: All expected values in the table must be greater than or equal to five.
Common Uses: Comparing two populations, for example men vs. women, before vs. after, east vs. west. The variable is categorical with more than two possible response values.

when to use chi square

The chi-square distribution is a useful tool for assessment in a series of problem categories. These problem categories include primarily (i) whether a data set fits a particular distribution, (ii) whether the distributions of two populations are the same, (iii) whether two events might be independent, and (iv) whether there is a different variability than expected within a population.

null vs alternative hypothesis

The null hypothesis says that all groups are samples from populations having the same normal distribution. The alternate hypothesis says that at least two of the sample groups come from populations with different normal distributions. If the null hypothesis is true, MSbetween and MSwithin should both estimate the same value.

which tail is the one way anova test

The one-way ANOVA hypothesis test is always right-tailed because larger F-values are way out in the right tail of the F-distribution curve and tend to make us reject H0.

what does the one way anova test depend on

The one-way ANOVA test depends on the fact that MSbetween can be influenced by population differences among means of the several groups. Since MSwithin compares values of each group to its own group mean, the fact that group means might be different does not affect MSwithin.

R² - Coefficient of Determination

This is a number between 0 and 1 that represents the percentage of variation in the dependent variable that can be explained by variation in the independent variable. Sometimes calculated by the equation R² = SSR/SST, where SSR is the "Sum of Squares Regression" and SST is the "Sum of Squares Total." The appropriate coefficient of determination to report should always be adjusted for degrees of freedom first.
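As a sketch with hypothetical observed and predicted values, R² can be computed from the sums of squares, using SSR = SST − SSE:

```python
# Hypothetical observed values and model predictions (invented for illustration)
y = [3.0, 5.0, 7.0, 9.0]
y_hat = [3.2, 4.8, 7.1, 8.9]

mean_y = sum(y) / len(y)
sst = sum((yi - mean_y) ** 2 for yi in y)              # Sum of Squares Total
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # Sum of Squared Errors
ssr = sst - sse                                        # Sum of Squares Regression
r_sq = ssr / sst                                       # coefficient of determination
```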

X - the independent variable

This will sometimes be referred to as the "predictor" variable, because these values were measured in order to determine what possible outcomes could be predicted.

test of independence chi square

To assess whether two factors are independent or not, you can apply the test of independence that uses the chi-square distribution. The null hypothesis for this test states that the two factors are independent. The test compares observed values to expected values. The test is right-tailed. Each observation or cell category must have an expected value of at least 5.
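To make the expected values concrete, here is a sketch with a hypothetical 2×3 contingency table; each expected cell count is (row total × column total) / grand total:

```python
# Hypothetical 2x3 contingency table of observed counts
observed = [[20, 30, 50],
            [30, 45, 25]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell under H0 (the two factors are independent)
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_sq = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
             for i in range(len(observed))
             for j in range(len(observed[0])))
df = (len(observed) - 1) * (len(observed[0]) - 1)  # (rows - 1)(columns - 1)
```

Because the test is right-tailed, chi_sq would be compared to the right-tail critical value with df degrees of freedom.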

sum of squares

To find a "sum of squares" means to add together squared quantities that, in some cases, may be weighted. We use sums of squares to calculate the sample variance and the sample standard deviation.

testing variability with chi square

To test variability, use the chi-square test of a single variance. The test may be left-, right-, or two-tailed, and its hypotheses are always expressed in terms of the variance (or standard deviation). The degrees of freedom is the sample size minus one (df = n − 1). The test statistic is (n − 1)s²/σ₀², where n = sample size, s² = sample variance, and σ₀² = the hypothesized population variance.
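A small sketch with a hypothetical sample, computing the test statistic (n − 1)s²/σ₀²:

```python
import statistics

# Hypothetical sample; H0 claims the population variance is sigma0_sq = 4.0
sample = [10.1, 9.8, 10.5, 9.6, 10.3, 10.2, 9.9, 10.4]
n = len(sample)
s_sq = statistics.variance(sample)   # sample variance (n - 1 in the denominator)
sigma0_sq = 4.0

chi_sq = (n - 1) * s_sq / sigma0_sq  # test statistic
df = n - 1                           # degrees of freedom
```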

summary: goodness of fit test

Use the goodness-of-fit test to decide whether a population with an unknown distribution "fits" a known distribution. In this case there will be a single qualitative survey question or a single outcome of an experiment from a single population. Goodness-of-Fit is typically used to see if the population is uniform (all outcomes occur with equal frequency), the population is normal, or the population is the same as another population with a known distribution. The null and alternative hypotheses are: H0: The population fits the given distribution. Ha: The population does not fit the given distribution.

summary: independence test

Use the test of independence to decide whether two variables (factors) are independent or dependent. In this case there will be two qualitative survey questions or experiments and a contingency table will be constructed. The goal is to see if the two variables are unrelated (independent) or related (dependent). The null and alternative hypotheses are: H0: The two variables (factors) are independent. Ha: The two variables (factors) are dependent.

Summary: homogeneity test

Use the test for homogeneity to decide if two populations with unknown distributions have the same distribution as each other. In this case there will be a single qualitative survey question or experiment given to two different populations. The null and alternative hypotheses are: H0: The two populations follow the same distribution. Ha: The two populations have different distributions.

F ratio, two estimates of the variance

Variance between samples: An estimate of σ2 that is the variance of the sample means multiplied by n (when the sample sizes are the same.). If the samples are different sizes, the variance between samples is weighted to account for the different sample sizes. The variance is also called variation due to treatment or explained variation. Variance within samples: An estimate of σ2 that is the average of the sample variances (also known as a pooled variance). When the sample sizes are different, the variance within samples is weighted. The variance is also called the variation due to error or unexplained variation.

one way anova

a method of testing whether or not the means of three or more populations are equal; the method is applicable if: all populations of interest are normally distributed. the populations have equal standard deviations. samples (not necessarily of the same size) are randomly and independently selected from each population. The test statistic for analysis of variance is the F-ratio.

analysis of variance

also referred to as ANOVA, is a method of testing whether or not the means of three or more populations are equal. The method is applicable if: all populations of interest are normally distributed. the populations have equal standard deviations. samples (not necessarily of the same size) are randomly and independently selected from each population. there is one independent variable and one dependent variable. The test statistic for analysis of variance is the F-ratio.

calculation of sum of squares and mean of squares

k = the number of different groups
nj = the size of the jth group
sj = the sum of the values in the jth group
n = total number of all the values combined (total sample size: ∑nj)
x = one value: ∑x = ∑sj
Sum of squares of all values from every group combined: ∑x²
Total sum of squares: SStotal = ∑x² − (∑x)²/n
Explained variation (sum of squares representing variation among the different samples): SSbetween = ∑[(sj)²/nj] − (∑sj)²/n
Unexplained variation (sum of squares representing variation within samples due to chance): SSwithin = SStotal − SSbetween
df for the different groups (df for the numerator): dfbetween = k − 1
df for errors within samples (df for the denominator): dfwithin = n − k
Mean square (variance estimate) explained by the different groups: MSbetween = SSbetween/dfbetween
Mean square (variance estimate) due to chance (unexplained): MSwithin = SSwithin/dfwithin
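These formulas can be checked numerically; this sketch uses three small hypothetical groups:

```python
# Hypothetical data: k = 3 groups
groups = [[4, 5, 6], [7, 8, 9], [1, 2, 3]]

k = len(groups)
n = sum(len(g) for g in groups)            # total sample size
all_values = [x for g in groups for x in g]

# Total sum of squares: sum(x^2) - (sum(x))^2 / n
ss_total = sum(x ** 2 for x in all_values) - sum(all_values) ** 2 / n
# Explained variation: sum over groups of (group sum)^2 / group size,
# minus (grand sum)^2 / n
ss_between = (sum(sum(g) ** 2 / len(g) for g in groups)
              - sum(all_values) ** 2 / n)
# Unexplained variation
ss_within = ss_total - ss_between

ms_between = ss_between / (k - 1)          # df numerator = k - 1
ms_within = ss_within / (n - k)            # df denominator = n - k
f_ratio = ms_between / ms_within
```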

linear equation

y = a + bx, where x is the independent variable and y is the dependent variable.
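A minimal sketch, with a hypothetical intercept a and slope b:

```python
# Hypothetical fitted line: y = a + b * x
a = 2.0   # y-intercept
b = 0.5   # slope

def predict(x):
    """Predicted (y-hat) value of the dependent variable for a given x."""
    return a + b * x
```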

hypothesis test with correlation r

ρ = population correlation coefficient (unknown)
r = sample correlation coefficient (known; calculated from sample data)
The hypothesis test lets us decide whether the value of the population correlation coefficient ρ is "close to zero" or "significantly different from zero." We decide this based on the sample correlation coefficient r and the sample size n.
If the test concludes that the correlation coefficient is significantly different from zero, we say that the correlation coefficient is "significant." Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between X1 and X2 because the correlation coefficient is significantly different from zero. What the conclusion means: There is a significant linear relationship between X1 and X2.
If the test concludes that the correlation coefficient is not significantly different from zero (it is close to zero), we say that the correlation coefficient is "not significant."
Performing the Hypothesis Test
Null Hypothesis: H0: ρ = 0
Alternate Hypothesis: Ha: ρ ≠ 0
What the Hypotheses Mean in Words
Null Hypothesis H0: The population correlation coefficient IS NOT significantly different from zero. There IS NOT a significant linear relationship (correlation) between X1 and X2 in the population.
Alternate Hypothesis Ha: The population correlation coefficient IS significantly different from zero. There IS a significant linear relationship (correlation) between X1 and X2 in the population.
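The test statistic itself is not stated above; the usual choice is t = r·√((n − 2)/(1 − r²)) with df = n − 2. A sketch with hypothetical paired data:

```python
import math

# Hypothetical paired observations (X1, X2)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)

r = sxy / math.sqrt(sxx * syy)              # sample correlation coefficient
t = r * math.sqrt((n - 2) / (1 - r ** 2))   # test statistic for H0: rho = 0, df = n - 2
```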

F-Ratio Formula when the groups are the same size

F = n · s_x̄² / s²pooled, where:
n = the common sample size of each group
dfnumerator = k − 1
dfdenominator = N − k, where N is the total number of values from all groups combined
s²pooled = the mean of the sample variances (pooled variance)
s_x̄² = the variance of the sample means
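A sketch with three equal-size hypothetical groups, applying this shortcut formula:

```python
import statistics

# Hypothetical k = 3 groups, each of common size n = 4
groups = [[3, 4, 5, 6], [5, 6, 7, 8], [6, 7, 8, 9]]
n = len(groups[0])

means = [statistics.mean(g) for g in groups]
var_of_means = statistics.variance(means)     # variance of the sample means
s2_pooled = statistics.mean(statistics.variance(g) for g in groups)

# F ratio when all groups have the same size n
f_ratio = n * var_of_means / s2_pooled
```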

hypotheses for test of homogeneity

H0: The distributions of the two populations are the same. Ha: The distributions of the two populations are not the same.

What does MS mean

MS means "mean square." MSbetween is the variance between groups, and MSwithin is the variance within groups.

variables in test of independence

O = observed values E = expected values i = the number of rows in the table j = the number of columns in the table

tail for goodness of fit test

The goodness-of-fit test is almost always right-tailed.

hypotheses for chi square

The null and the alternative hypotheses for this test may be written in sentences or may be stated as equations or inequalities.

purpose of one way ANOVA test

The purpose of a one-way ANOVA test is to determine the existence of a statistically significant difference among several group means. The test actually uses variances to help determine if the means are equal or not.

If null is true in ANOVA

The variance of the combined data is approximately the same as the variance of each of the populations.

b is the symbol for Slope

The word coefficient will be used regularly for the slope, because it is a number that will always be next to the letter "x." It will be written as b1 when a sample is used, and β1 will be used with a population or when writing the theoretical linear model.

how many degrees of freedom in F ratio

There are two sets of degrees of freedom; one for the numerator and one for the denominator.

linear

a model that takes data and regresses it into a straight line equation.

multivariate

a system or model where more than one independent variable is being used to predict an outcome. There can only ever be one dependent variable, but there is no limit to the number of independent variables.

what is a test for homogeneity

a test used to draw a conclusion about whether two populations have the same distribution. The degrees of freedom used equals the (number of columns - 1).

what does The correlation coefficient, r, tells us

about the strength and direction of the linear relationship between X1 and X2

what does the anova test determine

if several population means are equal.

variance

mean of the squared deviations from the mean; the square of the standard deviation. The sample variance is equal to the sum of the squares of the deviations divided by the difference of the sample size and one.
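Computed directly from the definition, with hypothetical data:

```python
# Hypothetical sample
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n

# Sample variance: sum of squared deviations divided by (n - 1)
variance = sum((x - mean) ** 2 for x in data) / (n - 1)
std_dev = variance ** 0.5   # the standard deviation is the square root of the variance
```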

Sum of Squared Errors (SSE)

the calculated value from adding up all the squared residual terms. The hope is that this value is very small when creating a model.

SSbetween

the sum of squares that represents the variation among the different samples

SSwithin

the sum of squares that represents the variation within samples that is due to chance.

Residual or "error"

the value calculated from subtracting: y0 − ŷ0 = e0. The absolute value of a residual measures the vertical distance between the actual value of y and the estimated value of y that appears on the best-fit line.
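A sketch with one hypothetical observation and a hypothetical best-fit line:

```python
# Hypothetical best-fit line y-hat = 2.0 + 0.5 * x
a, b = 2.0, 0.5
x0, y0 = 6.0, 5.8        # one observed data point

y_hat0 = a + b * x0      # estimated value on the best-fit line
e0 = y0 - y_hat0         # residual: observed minus predicted
distance = abs(e0)       # vertical distance from the point to the line
```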

bivariate

two variables are present in the model, where one is the "cause" or independent variable and the other is the "effect" or dependent variable.

