Stats
A two way ANOVA can also be called?
A factorial ANOVA
How is ANOVA different from T-test?
A t-test is a hypothesis test used to compare the means of two samples. Analysis of Variance (ANOVA) is a statistical technique used to compare the means of more than two samples.
What is ANOVA short for?
ANalysis Of VAriance.
What's another simple way to see if the variances are equal?
Make a box plot for each group and compare their spreads.
What's the equation for F?
F = variation between sample means / variation within the samples
ANCOVA contains at least one...
At least one continuous independent variable and at least one nominal variable.
What's a covariate?
A covariate is a continuous variable that is not part of the experimental manipulation but still has an effect on the dependent variable.
Give an example of 1) one sample t-test
1) Suppose you are interested in determining whether an assembly line produces laptop computers that weigh five pounds. To test this hypothesis, you could collect a sample of laptop computers from the assembly line, measure their weights, and compare the sample with a value of five using a one-sample t-test.
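A minimal sketch of this laptop-weight example, assuming scipy is available; the sample weights below are made up for illustration:

```python
# Hedged sketch of a one-sample t-test; the laptop weights (lb) are made up.
from scipy import stats

weights = [5.02, 4.97, 5.10, 4.88, 5.05, 4.93, 5.01, 4.99]

# H0: the true mean weight equals 5.0 lb
t_stat, p_value = stats.ttest_1samp(weights, popmean=5.0)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
# A large p-value means no evidence that the mean differs from 5 lb.
```

Here the sample mean is very close to 5, so the test should fail to reject the null hypothesis.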
What does an ANCOVA examine?
ANCOVA examines the influence of an independent variable on a dependent variable while removing the effect of the covariate factor.
Why does ANOVA use F-TEST?
ANOVA uses the F-test to determine whether the variability between group means is larger than the variability of the observations within the groups. If that ratio is sufficiently large, you can conclude that not all the means are equal.
difference between regression analysis and ANOVA
A very simple explanation is that regression is the statistical model that you use to predict a continuous outcome on the basis of one or more continuous predictor variables. In contrast, ANOVA is the statistical model that you use to predict a continuous outcome on the basis of one or more categorical predictor variables.
An ANCOVA is a mix of two statistical tests! What are they? and how? GRAPH!
An ANCOVA's independent variables can be both NOMINAL and CONTINUOUS, so it is a mix of an ANOVA and a REGRESSION!
The covariate can also be called... two different names
Confounding factor, or concomitant variable
Compare continuous to discrete data
Continuous data can take on any value within a range (income, height, weight, etc.). The opposite of continuous data is discrete data, which can only take on distinct, countable values (Low, Medium, High, etc.).
What can you do if normality is not present?
Remove outliers, or transform the data (e.g., a log transformation).
The ANCOVA is most useful in that it.... 1) 2)
The ANCOVA is most useful in that it (1) explains an ANOVA's within-group variance, and (2) controls confounding factors.
What does the model of the "independent samples t-test" assume?
The model assumes that a difference in the mean score of the dependent variable is found because of the influence of the independent variable.
How to obtain best fit line (Value of m and b)?
This task is easily accomplished with the Least Squares Method, the most common method for fitting a regression line. It calculates the best-fit line for the observed data by minimizing the sum of the squares of the vertical deviations from each data point to the line. Because the deviations are squared before being added, positive and negative values do not cancel out.
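The least-squares formulas can be sketched directly, assuming numpy is available; the x/y data are made up:

```python
# Sketch of the least-squares formulas on made-up data: the slope m and
# intercept b minimize the sum of squared vertical deviations from the line.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

x_bar, y_bar = x.mean(), y.mean()
m = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)  # slope
b = y_bar - m * x_bar                                             # intercept
print(f"best-fit line: y = {m:.2f}x + {b:.2f}")
```

The same result can be checked against `np.polyfit(x, y, 1)`.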
How do you do a box plot?
You find the 25th and 75th percentiles; those are the edges of your box. The median is the line inside the box. Then draw whiskers from the box out to the lowest and highest values. Ta-da!
What's the rule of thumb for equal variances?
The largest SD should be smaller than 2 times the smallest SD.
What's kurtosis?
the sharpness of the peak of a frequency-distribution curve. Kurtosis is a measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution. That is, data sets with high kurtosis tend to have heavy tails, or outliers. Data sets with low kurtosis tend to have light tails, or lack of outliers.
What is the independent t-test also called?
unpaired t-test
How can you test for normality, homogeneity of variance, and independence in ANOVA?
These assumptions can be tested using statistical software. The assumption of homogeneity of variance can be tested using tests such as Levene's test or the Brown-Forsythe Test. Normality of the distribution of the scores can be tested using histograms, the values of skewness and kurtosis, or using tests such as Shapiro-Wilk or Kolmogorov-Smirnov. The assumption of independence can be determined from the design of the study.
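A sketch of these checks with scipy, on made-up group data (Shapiro-Wilk for normality, Levene's test for homogeneity of variance):

```python
# Sketch of ANOVA assumption checks on made-up group data.
from scipy import stats

group_a = [11.2, 10.8, 11.5, 10.9, 11.1, 11.3]
group_b = [9.8, 10.1, 9.9, 10.3, 10.0, 9.7]

for name, g in [("A", group_a), ("B", group_b)]:
    w, p = stats.shapiro(g)          # H0: sample comes from a normal distribution
    print(f"group {name}: Shapiro-Wilk W = {w:.3f}, p = {p:.3f}")

stat, p_lev = stats.levene(group_a, group_b)  # H0: equal variances
print(f"Levene: stat = {stat:.3f}, p = {p_lev:.3f}")
```

In both tests, a p-value above 0.05 means no evidence against the assumption. Independence is judged from the study design, not from a test.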
What do you need in order to find R square? What does that tell you?
You need R, correlation coefficient. R tells you how strong of a linear relationship there is between the 2 variables.
What's the F-statistic?
It is simply a ratio of two variances. The F-test can assess the equality of variances.
How many t-tests are there? Explain!
3 t-tests. 1) A one-sample t-test compares a sample mean to a known population mean. 2) An independent samples t-test compares two sample means from different populations regarding the same variable. 3) A paired samples t-test compares two sample means from the same population regarding the same variable at two different times, such as during a pre-test and post-test, or it compares two sample means from different populations whose members have been matched.
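All three can be sketched with scipy, assuming it is available; every dataset below is made up:

```python
# Sketch of the three t-tests in scipy, on made-up data.
from scipy import stats

sample = [4.9, 5.1, 5.0, 4.8, 5.2, 5.0]   # compared to a known value
group1 = [12.1, 11.8, 12.5, 12.0, 11.9]   # two different populations
group2 = [10.2, 10.5, 9.9, 10.1, 10.4]
pre = [60, 65, 58, 70, 62]                # same subjects, before/after
post = [66, 70, 63, 74, 68]

t1, p1 = stats.ttest_1samp(sample, popmean=5.0)  # 1) one-sample
t2, p2 = stats.ttest_ind(group1, group2)         # 2) independent samples
t3, p3 = stats.ttest_rel(pre, post)              # 3) paired samples
print(f"one-sample p={p1:.3f}, independent p={p2:.4f}, paired p={p3:.4f}")
```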
Explain one way ANOVA and give an example
A one-way ANOVA has just one independent variable. For example, difference in IQ can be assessed by Country, and Country can have 2, 20, or more different categories to compare.
What's the difference between a parametric and non parametric test? 7
1) A statistical test that makes specific assumptions about the population parameters is known as a parametric test; a statistical test used in the case of non-metric independent variables is called a nonparametric test. 2) In the parametric test, the test statistic is based on a distribution; in the nonparametric test, the test statistic is arbitrary (distribution-free). 3) The parametric test assumes the variables of interest are measured on an interval or ratio level; in the nonparametric test, the variables of interest are measured on a nominal or ordinal scale. 4) The measure of central tendency in the parametric test is the mean, while in the nonparametric test it is the median. 5) In the parametric test, there is complete information about the population; in the nonparametric test, there is no information about the population. 6) Parametric tests apply to variables only, whereas nonparametric tests apply to both variables and attributes. 7) For measuring the degree of association between two quantitative variables, Pearson's coefficient of correlation is used in the parametric test, while Spearman's rank correlation is used in the nonparametric test.
Explain a two way ANOVA and give an example
A two-way ANOVA refers to an ANOVA using two independent variables. Expanding the example above, a 2-way ANOVA can examine differences in IQ scores (the dependent variable) by Country (independent variable 1) and Gender (independent variable 2). Two-way ANOVA can be used to examine the interaction between the two independent variables. Interactions indicate that differences are not uniform across all categories of the independent variables. For example, females may have higher IQ scores overall compared to males, but this difference could be greater (or less) in European countries compared to North American countries. Two-way ANOVAs are also called factorial ANOVAs.
What are the steps that an ANCOVA follows?
ANCOVA first conducts a regression of the independent variable (i.e., the covariate) on the dependent variable. The residuals (the unexplained variance in the regression model) are then subject to an ANOVA. Thus the ANCOVA tests whether the independent variable still influences the dependent variable after the influence of the covariate(s) has been removed.
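The two steps above can be sketched conceptually, assuming scipy/numpy are available; the data and groups are made up, and real ANCOVA software fits one combined linear model rather than literally running two separate analyses:

```python
# Conceptual sketch of the ANCOVA steps: 1) regress the dependent variable
# on the covariate, 2) run an ANOVA on the residuals. Made-up data.
import numpy as np
from scipy import stats

covariate = np.array([1.0, 2.0, 3.0, 4.0, 1.5, 2.5, 3.5, 4.5])
y = np.array([2.2, 3.1, 4.0, 5.2, 3.0, 4.1, 5.0, 6.1])
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Step 1: regression of y on the covariate; keep the unexplained variance
slope, intercept, r, p_reg, se = stats.linregress(covariate, y)
residuals = y - (intercept + slope * covariate)

# Step 2: ANOVA on the residuals (two groups here, so one-way F-test)
f_stat, p_anova = stats.f_oneway(residuals[group == 0], residuals[group == 1])
print(f"F = {f_stat:.3f}, p = {p_anova:.4f}")
```

Because the covariate's influence has been removed first, the F-test now asks whether the groups still differ.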
Lord's paradox, and what statistical analysis does it describe?
ANCOVA: Lord posits two statisticians who use different, but respected statistical methods, to reach opposite conclusions about the effects of the diet provided in the university dining halls on students' weights. One statistician does not adjust for initial weight and finds no significant difference between dining halls. "...as far as these data are concerned, there is no evidence of any interesting effect of diet (or of anything else) on student weights. In particular, there is no evidence of any differential effect on the two sexes, since neither group shows any systematic change." (Lord 1967, p. 305) The second statistician adjusts for initial weight and finds a significant difference between the two dining halls.
How does ANCOVA increase the power of the ANOVA?
ANOVA, the analysis of variance splits the total variance of the dependent variable into: 1. Variance explained by the independent variable (also called between groups variance) 2. Unexplained variance (also called within group variance) The ANCOVA looks at the unexplained variance and tries to explain some of it with the covariate(s). Thus it increases the power of the ANOVA by explaining more variability in the model.
Why do we have covariates?
Any factor that we include to pull out variation, allowing us to BETTER see our treatment/mean effect. It's a pre-existing condition that may affect the dependent variable (y), so we want to factor it out in order to actually see the treatment effect! For example, we can factor out the effect of size by putting it in the model, allowing us to see the treatment effect.
What are some graph tests, analysis and statistical tests used to test for normality?
Graphs: histograms, Q-Q scatterplots. Analysis: skewness and kurtosis. Statistical tests: chi-square, Kolmogorov-Smirnov, Shapiro-Wilk.
What are the statistical hypotheses of a regression model?
Ho: There is not a significant linear relationship between the independent (X) and the dependent (Y) variables; the slope will be equal to 0. Ha: There is a significant linear relationship between the independent (X) and the dependent (Y) variables; the slope will not be equal to 0.
To increase the power of the test, should you use 1tail or 2 tails? t-test
If the direction of the difference does not matter, a two-tailed hypothesis is used. Otherwise, an upper-tailed or lower-tailed hypothesis can be used to increase the power of the test.
What factors do you have in regression?
In regression analysis, those factors are called variables. You have your dependent variable — the main factor that you're trying to understand or predict. And then you have your independent variables — the factors you suspect have an impact on your dependent variable.
What are two benefits of working with regressions?
It indicates the significant relationships between the dependent variable and the independent variables. It indicates the strength of the impact of multiple independent variables on a dependent variable.
What if you violate the ANOVA assumptions? when is it a big deal? and when is not?
It is important to note that ANOVA is robust to violations of the homogeneity and normality assumptions: even if you violate them, you can conduct the test and basically trust the findings. However, ANOVA is NOT robust to violations of independence; the results of the ANOVA are invalid if the independence assumption is violated. In general, with violations of homogeneity the analysis is considered robust if you have equal-sized groups. With violations of normality, continuing with the ANOVA is generally OK if you have a large sample size.
What does R square or the coefficient of determination tell you?
It tells you the proportion of variance in the dependent variable that is explained by the model, i.e., how well the data points fit the regression line.
What's skewness?
It's a measure of the asymmetry of the frequency-distribution curve.
What's kurtosis?
Kurtosis is a measure of the sharpness of the peak of the frequency-distribution curve, i.e., whether the data are heavy-tailed or light-tailed. If the data are heavy-tailed, they tend to have outliers; if they are light-tailed, they tend not to.
What are the assumptions in ANOVA?
Like the t-test, ANOVA is also a parametric test and has some assumptions. 1) ANOVA assumes that the data is normally distributed. 2) The ANOVA also assumes homogeneity of variance, which means that the variance among the groups should be approximately equal. 3) ANOVA also assumes that the observations are independent of each other.
Linear regression
Linear regression establishes a relationship between a dependent variable (Y) and one or more independent variables (X) using a best-fit straight line (also known as the regression line). In this technique, the dependent variable is continuous, the independent variable(s) can be continuous or discrete, and the regression line is linear.
Give an example for an ANOVA
Medicine - Does a drug work? Does the average life expectancy significantly differ between the three groups that received the drug versus the established product versus the control?
Give an example of 2) independent samples t-test
Medicine - Has the quality of life improved for patients who took drug A as opposed to patients who took drug B? Sociology - Are men more satisfied with their jobs than women? Do they earn more? Biology - Are foxes in one specific habitat larger than in another?
What happens as more covariates are introduced into an ANCOVA? What if the covariates are weak?
Note that just like in regression analysis and all linear models, over-fitting might occur. That is, the more covariates you enter into the ANCOVA, the more variance it will explain, but the fewer degrees of freedom the model has. Thus entering a weak covariate into the ANCOVA decreases the statistical power of the analysis instead of increasing it.
What's the Pearson's correlation analysis?
R = the Pearson product-moment correlation coefficient, a measure of the strength of the linear relationship between two variables. It is referred to as Pearson's correlation or simply as the correlation coefficient. Example: it can be used to measure the correlation between fish size and heavy metal concentration.
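A sketch of that fish-size example with scipy; the size and concentration values below are hypothetical:

```python
# Sketch of Pearson's r on made-up fish-size vs. metal-concentration data.
# For a simple linear regression, R² is just r squared.
from scipy import stats

fish_size = [10.0, 12.5, 15.0, 17.5, 20.0, 22.5]  # hypothetical sizes (cm)
metal_ppm = [1.1, 1.4, 1.9, 2.1, 2.6, 2.8]        # hypothetical concentrations

r, p = stats.pearsonr(fish_size, metal_ppm)
print(f"r = {r:.3f}, R^2 = {r**2:.3f}")
```

An r close to +1 indicates a strong positive linear relationship, which is what this made-up data shows.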
We can evaluate the model performance using the metric....... also called.....
R squared, also called coefficient of determination
How is regression helpful?
Regression is helpful because it helps you in sorting out which variables do indeed have an impact. It answers the questions: Which factors matter most? Which can we ignore? How do those factors interact with each other? And, perhaps most importantly, how certain are we about all of these factors?
What are the 6 assumptions of ANCOVA?
Regular ANOVA: 1) Independent observations 2) Normally distributed error 3) Homogeneity of variances. ADDITIONAL: 4) Linearity 5) Homogeneity of regression slopes 6) The covariate is measured without error.
Give an example of 3) paired samples t-test
Suppose you are interested in evaluating the effectiveness of a company training program. One approach you might consider would be to measure the performance of a sample of employees before and after completing the program, and analyze the differences using a paired sample t-test.
Why is the ANOVA test a popular test in experiments?
The ANOVA is a popular test; it is often the test of choice when conducting experiments because it only requires a nominal scale for the independent variables, whereas other multivariate tests (e.g., regression analysis) require a continuous-level scale.
What's a levene test used for?
The Levene test can be used to verify the assumption that variances are equal across groups or samples.
How are the Levene and the bartlett test different?
The Levene test is less sensitive than the Bartlett test to departures from normality. If you have strong evidence that your data do in fact come from a normal, or nearly normal, distribution, then Bartlett's test has better performance.
What's the difference between simple linear regression and multiple linear regression?
The difference between simple linear regression and multiple linear regression is that multiple linear regression has more than one independent variable, whereas simple linear regression has only one independent variable.
Explain variation between sample means (the numerator to find F)
The group means are: 11.203, 8.938, 10.683, and 8.838. These group means are distributed around the overall mean for all 40 observations, which is 9.915. If the group means are clustered close to the overall mean, their variance is low. However, if the group means are spread out further from the overall mean, their variance is higher. Clearly, if we want to show that the group means are different, it helps if the means are further apart from each other. In other words, we want higher variability among the means.
One way ANCOVA has to have at least three variables, which ones are they?
1) The independent variable, which groups the cases into two or more groups; it has to be at least of nominal scale. 2) The dependent variable, which is influenced by the independent variable; it has to be of continuous-level scale (interval or ratio data), and it needs to be homoscedastic and multivariate normal. 3) The covariate, a variable that moderates the impact of the independent variable on the dependent variable; it needs to be a continuous-level variable (interval or ratio data).
Do we need high or low F value to reject the null hypothesis? explain with a graph
The low F-value graph shows a case where the group means are close together (low variability) relative to the variability within each group. The high F-value graph shows a case where the variability of group means is large relative to the within group variability. In order to reject the null hypothesis that the group means are equal, we need a high F-value.
What's the main purpose of ANOVA?
The main purpose of an ANOVA is to test if two or more groups differ from each other significantly in one or more characteristics.
What's the procedure of an ANOVA?
The null hypothesis for an ANOVA is that there is no significant difference among the groups. The alternative hypothesis assumes that there is at least one significant difference among the groups. After cleaning the data, the researcher must test the assumptions of ANOVA. They must then calculate the F-ratio and the associated probability value (p-value). In general, if the p-value associated with the F is smaller than .05, then the null hypothesis is rejected and the alternative hypothesis is supported. If the null hypothesis is rejected, one concludes that the means of all the groups are not equal. Post-hoc tests tell the researcher which groups are different from each other.
What does the p-value mean in a t-test?
The p-value gives the probability of observing the test results under the null hypothesis. The lower the p-value, the lower the probability of obtaining a result like the one that was observed if the null hypothesis was true. Thus, a low p-value indicates decreased support for the null hypothesis.
What does alpha = 0.05 mean?
It corresponds to a 5% (or less) chance of obtaining a result like the one that was observed if the null hypothesis were true.
What statistical test allows us to do post hoc tests in ANCOVA?
In statistical software, the one-way ANCOVA dialog also allows us to add post hoc procedures: we can choose between Bonferroni, LSD, and Sidak adjustments for multiple comparisons of the covariates.
Explain variation within the samples (the denominator to find F)
To calculate this variance, we need to calculate how far each observation is from its group mean for all 40 observations. Technically, it is the sum of the squared deviations of each observation from its group mean divided by the error DF. If the observations for each group are close to the group mean, the variance within the samples is low. However, if the observations for each group are further from the group mean, the variance within the samples is higher. If we're hoping to show that the means are different, it's good when the within-group variance is low. You can think of the within-group variance as the background noise that can obscure a difference between means.
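The numerator and denominator of F can be computed by hand and checked against scipy's one-way ANOVA; the three groups below are made up (these are not the 40 observations from the card's example):

```python
# Sketch: F = between-group variance / within-group variance, by hand,
# then checked against scipy's one-way ANOVA. Made-up groups.
import numpy as np
from scipy import stats

groups = [np.array([11.0, 11.4, 10.8, 11.2]),
          np.array([9.0, 8.8, 9.3, 8.9]),
          np.array([10.6, 10.9, 10.5, 10.8])]

n = sum(len(g) for g in groups)   # total observations
k = len(groups)                   # number of groups
grand_mean = np.concatenate(groups).mean()

ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)  # variation between sample means
ms_within = ss_within / (n - k)    # variation within the samples (error DF)
f_manual = ms_between / ms_within

f_scipy, p = stats.f_oneway(*groups)
print(f"manual F = {f_manual:.3f}, scipy F = {f_scipy:.3f}, p = {p:.4f}")
```

Here the group means are far apart and the within-group noise is small, so F is large and p is small.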
What if you find a difference in ANOVA? what do you do?
When you conduct an ANOVA, you are attempting to determine if there is a statistically significant difference among the groups. If you find that there is a difference, you will then need to examine where the group differences lay. At this point you could run post-hoc tests which are t tests examining mean differences between the groups. There are several multiple comparison tests that can be conducted that will control for Type I error rate, including the Bonferroni, Scheffe, Dunnet, and Tukey tests.
What if you violate the assumption of independence, homogeneity of variances, or normality in ANOVA? What can you do? What makes it less bad?
You can't do anything if you violate the independence assumption; this is CRITICAL. Violating homogeneity of variances is not critical if you have EQUAL-SIZED GROUPS! If you fail normality, it's not that bad if you have a LARGE sample size, so just add more organisms to your sample :)
What's another name for the paired sample t-test?
dependent sample t-test
T-test: Are extremes always outliers? What's a good technique to check for outliers?
just because a value is extreme does not make it an outlier. Let's suppose that our laptop assembly machine occasionally produces laptops which weigh significantly more or less than five pounds, our target value. In this case, these extreme values are absolutely essential to the question we are asking and should not be removed. Box-plots are useful for visualizing the variability in a sample, as well as locating any outliers.
What are the types of ANOVAs?
one-way ANOVA, two-way ANOVA, and N-way ANOVA.
Why are standard deviations easier for humans to understand than variances?
Standard deviations are easier to understand than variances because they're in the same units as the data rather than squared units.
The Levene test is an alternative test to....
the Bartlett test.
In ANOVA, the dependent variable must be....? and the independent variable must be....??
The dependent variable must be at a continuous (interval or ratio) level of measurement, and the independent variables must be categorical (nominal or ordinal).
What graph test can be tested for normality on a t-test?
the simplest is to inspect the data visually using a histogram or a Q-Q scatterplot. Real-world data are almost never perfectly normal, so this assumption can be considered reasonably met if the shape looks approximately symmetric and bell-shaped.
What's COVARIANCE?
It is the sum of the products of the deviations of X from its mean and the deviations of Y from its mean; it measures how two variables vary together.
F test and F-statistics are useful for multiple things, describe:
you can use F-statistics and F-tests to test the overall significance for a regression model, to compare the fits of different models, to test specific regression terms, and to test the equality of means.
What are the assumptions of a t- test?
• The dependent variable must be continuous or ordinal (interval/ratio). • The observations are independent of one another (independence of observations is usually not testable, but can be reasonably assumed if the data collection process was random). • The third assumption is that the data, when plotted, result in a normal, bell-shaped distribution curve. • The final assumption is homogeneity of variance. Homogeneous, or equal, variance exists when the standard deviations of the samples are approximately equal.
Explain each of the 6 assumptions of ANCOVA
1) Independent samples: no relationship between individual measurements. 2) Normally distributed error: the errors follow a normal distribution; the F distribution is based on the normal distribution, so if the scores don't follow it, the F-test is not appropriate (P box can resolve violations of this assumption). 3) Homogeneity of variances: the variance and standard deviation are equal for each group; you can test for this using Levene's test or the rule of thumb. 4) Linearity: the relationship between the covariate and the dependent variable should be linear; you can check this assumption with a scatterplot. Violations can happen when there is no randomization. 5) Homogeneity of regression slopes: the covariate's slope is the same in every group. 6) The covariate is measured without error.
What are two methods to determine whether you reject the null hypothesis at alpha = 0.05?
1) Look at the significance value: if it's less than 0.05, reject the null hypothesis. 2) If the test statistic is higher than the critical value, we also reject the null hypothesis.
What are uses of ANCOVA?
1) Using a continuous variable as our covariate to pull out variation, allowing us to better see our treatment/mean effect. 2) Looking at differential slopes of response (a size-by-treatment interaction in my case).
Explain an N-ANOVA and give an example
A researcher can also use more than two independent variables, and this is an n-way ANOVA (with n being the number of independent variables you have). For example, potential differences in IQ scores can be examined by Country, Gender, Age group, Ethnicity, etc, simultaneously.
What does the independent samples t-test tell us?
It tells us whether the difference we see between the two independent samples is a true difference or whether it is just a random effect (statistical artifact) caused by skewed sampling.
What's variance?
Variances are a measure of dispersion, or HOW FAR THE DATA ARE SCATTERED FROM THE MEAN. Larger values represent greater dispersion. Variance is the square of the standard deviation.
What are the 4 hypotheses that can be stated for a one sample t test? and are they one tail or two tails?
• The null hypothesis (\(H_0\)) assumes that the difference between the true mean (\(\mu\)) and the comparison value (\(m_0\)) is equal to zero. • The two-tailed alternative hypothesis (\(H_1\)) assumes that the difference between the true mean (\(\mu\)) and the comparison value (\(m_0\)) is not equal to zero. • The upper-tailed alternative hypothesis (\(H_1\)) assumes that the true mean (\(\mu\)) of the sample is greater than the comparison value (\(m_0\)). • The lower-tailed alternative hypothesis (\(H_1\)) assumes that the true mean (\(\mu\)) of the sample is less than the comparison value (\(m_0\)).
What are the hypotheses for a paired sample t-test? Are they one tail or 2 tails?
• The null hypothesis (\(H_0\)) assumes that the true mean difference (\(\mu_d\)) is equal to zero. • The two-tailed alternative hypothesis (\(H_1\)) assumes that \(\mu_d\) is not equal to zero. • The upper-tailed alternative hypothesis (\(H_1\)) assumes that \(\mu_d\) is greater than zero. • The lower-tailed alternative hypothesis (\(H_1\)) assumes that \(\mu_d\) is less than zero.