Statistics


R is sensitive to...

outliers

Wilcoxon matched pairs

Nonparametric counterpart of the paired samples t-test: tests for a significant difference between two paired samples.

An interaction effect is present when the effect of an independent variable depends (in part) on a second independent variable.

Main effects and interaction effects may, but do not have to, be present at the same time.
- If no main effect is present, interaction effects are generally easy to observe, because the values of the dependent variable can only be explained by the combination of the independent variables.
- If a main effect is present together with an interaction effect, this means that the impact of independent variable A on dependent variable B differs for the different values of a second independent variable.

ANOVA is used when

you compare 3 or more groups

R might be influenced by...

"restricted range"

Formula for deducing SS from SD

SS = SD² × (n − 1)
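A quick check of this identity with invented numbers (numpy; `scores` is hypothetical data):

```python
import numpy as np

scores = np.array([4.0, 7.0, 6.0, 5.0, 8.0])      # invented data
n = len(scores)
sd = scores.std(ddof=1)                            # sample SD (divides by n - 1)

ss_from_sd = sd**2 * (n - 1)                       # the flashcard formula
ss_direct = ((scores - scores.mean())**2).sum()    # SS computed directly

print(ss_from_sd, ss_direct)                       # both print the same value
```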

More specific conclusions about which groups differ from each other can be based on

subsequent tests

ANOVA tests for differences between at least

two of the groups in the set you are testing

Regression line

A line of best fit for a scatterplot

Wilcoxon-Wilcox mulitple comparison

Nonparametric counterpart of one-way ANOVA and the Tukey HSD test: tests for significant differences among all possible pairs of independent samples.

R does not tell anything about...

causation; if A and B are correlated, it does not automatically mean that A causes B or vice versa.

Chi-square is used to test

distributions of categorical variables
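A minimal sketch of a chi-square test on an invented 2x2 contingency table, using scipy (the counts are hypothetical); scipy.stats.chisquare would be used instead for a one-variable goodness-of-fit test:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Invented 2x2 contingency table: counts for two categorical variables
observed = np.array([[30, 10],
                     [20, 40]])

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}, dof = {dof}")
```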

If a regression line is perfectly horizontal

it means that the variables have zero correlation (the line is horizontal when b = 0, and b = 0 if r = 0; the formula for the slope is b = r · (s_Y / s_X))

Correlation is a measure of...

linear relationship between two variables, indicated by r

When performing ANOVA, the F value is a...

ratio of the size of the differences between the groups to the size of random (within-group) differences. An F value well above 1 thus indicates that the group differences are larger than would be expected on the basis of chance/randomness.
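A minimal sketch, using scipy's one-way ANOVA on three invented groups, showing the F value as this ratio:

```python
from scipy.stats import f_oneway

# Invented scores for three independent groups
group_a = [4, 5, 6, 5, 4]
group_b = [6, 7, 8, 7, 6]
group_c = [9, 8, 10, 9, 8]

f_value, p_value = f_oneway(group_a, group_b, group_c)
print(f"F = {f_value:.2f}, p = {p_value:.4f}")
# F well above 1: between-group differences exceed chance variation
```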

With factorial ANOVA, when there is more than one independent variable, main effects refer to the overall impact of an independent variable on a dependent variable.

A main effect is thus the overall impact of one independent variable on the dependent variable, not taking into account any other independent variable. To identify it, answer the question: what is the mean impact of variable A on B, not taking into account (combinations with) any other variable?

Negative correlation

Increases in one variable are accompanied by decreases in the other variable. The regression line runs from the upper left to the lower right corner.

Positive correlation

Between two variables: high measurements on one variable tend to be associated with high measurements on the other variable, and low measurements on one variable with low measurements on the other.

Univariate Distribution

Frequency distribution of one variable

Scatterplot

Graph of scores of a bivariate frequency distribution

In multiple correlation and regression, one variable is...

In statistics, the coefficient of multiple correlation is a measure of how well a given variable can be predicted using a linear function of a set of other variables. The sample multiple correlation coefficient, R, is a measure of the strength of the association between the independent (explanatory) variables and the one dependent (predicted) variable. Multiple correlation/regression separates the unique predictive value of each of the predictor variables by discounting the variance the predictor variables have in common with each other.

Mann Whitney U

Nonparametric counterpart of the independent samples t-test: tests for a significant difference between two independent samples. It is a nonparametric test of the null hypothesis that two samples come from the same population, against the alternative hypothesis that one population tends to have larger values than the other.
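A minimal sketch of the test with scipy on two invented samples:

```python
from scipy.stats import mannwhitneyu

# Invented scores for two independent samples
sample_1 = [12, 15, 11, 18, 14]
sample_2 = [22, 19, 25, 21, 20]

u_stat, p_value = mannwhitneyu(sample_1, sample_2, alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.4f}")
```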

How to interpret R squared for multiple regression?

It is the total amount of variance explained by the entire regression model. In other words, it is the total amount of variance shared by the predicted variable and all the predictor variables together.

Regarding differences between simple, and multiple linear regression Why is multiple regression useful?

It is useful because it makes it possible to base predictions on multiple variables, which means that more data or knowledge can be included for predicting or explaining; this should lead to better predictions and explanations. Another important aspect of multiple regression is that it is possible to test the explanatory value of different variables in the regression model. By increasing the complexity of the model step by step, it can be seen how much additional explained variance (R²) can be attributed to the added variables, and thus we can learn about the relative importance of different variables/explanations, as in the sketch below.
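A minimal sketch of this step-by-step comparison with numpy, assuming invented data; `r_squared` is a small helper defined here, not a library function:

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least squares fit of y on the columns of X."""
    X1 = np.column_stack([np.ones(len(y)), X])   # add intercept column
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

rng = np.random.default_rng(0)                   # invented data
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
y = 2 * x1 + 1 * x2 + rng.normal(size=100)

print(r_squared(x1[:, None], y))                 # model with x1 only
print(r_squared(np.column_stack([x1, x2]), y))   # x1 + x2: R^2 increases
```

The increase in R² between the two calls is the additional explained variance attributable to x2.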

Regarding differences between simple, and multiple linear regression What does the multiple linear regression equation represent?

It represents the best prediction of a variable based on the combination of a number of linear relationships with other variables. It thus represents the combined prediction or explanation of one variable on the basis of a number of other variables. As this is linear regression, it is based on correlations. Graphically it does not look like a line, as a line is based on a single predictor (x) and is thus one-dimensional. In the case of 2 predictor variables, the multiple regression can be visually presented as a plane (flat surface) in 3D space. Regressions with more than 2 predictor variables cannot be visualized, but are hyperplanes in more-than-three-dimensional space.

Bivariate Distribution

Joint distribution of two variables, scores are paired. Two variables have values that are paired for some logical reason.

4 Non-Parametric Tests

Mann Whitney U - independent samples t-test: tests for a significant difference between two independent samples.
Wilcoxon matched pairs - paired samples t-test: tests for a significant difference between two paired samples.
Wilcoxon-Wilcox multiple comparison - one-way ANOVA and Tukey HSD tests: tests for significant differences among all possible pairs of independent samples.
Spearman r(S) - Pearson product-moment correlation coefficient, r: describes the degree of correlation between two variables.

R

Measures the degree of linear relationship between two variables of a bivariate distribution. The coefficient of determination is r²; it tells the proportion of variance shared between the two variables.

Multiple Regression

Multiple regression analysis is a powerful technique used for predicting the unknown value of a variable from the known values of two or more other variables, also called the predictors. More precisely, multiple regression analysis helps us to predict the value of Y for given values of X1, X2, ..., Xk via the equation Y = b0 + b1·X1 + b2·X2 + ... + bk·Xk. Here b0 is the intercept and b1, b2, b3, ..., bk are analogous to the slope in the linear regression equation and are also called regression coefficients. They can be interpreted the same way as a slope. Thus if bi = 2.5, it indicates that Y will increase by 2.5 units if Xi is increased by 1 unit, holding the other predictors constant.
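A minimal sketch fitting such an equation by least squares with numpy; the data and true coefficients are invented:

```python
import numpy as np

rng = np.random.default_rng(1)                   # invented data
X1 = rng.uniform(0, 10, 50)
X2 = rng.uniform(0, 10, 50)
Y = 3.0 + 2.5 * X1 - 1.0 * X2 + rng.normal(0, 1, 50)

# Design matrix with a leading column of ones for the intercept b0
A = np.column_stack([np.ones(50), X1, X2])
(b0, b1, b2), *_ = np.linalg.lstsq(A, Y, rcond=None)

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, b2 = {b2:.2f}")
# b1 comes out close to 2.5: Y increases by about 2.5 units per unit
# increase in X1, holding X2 constant
```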

Zero correlation

r = 0. No linear relationship between the two variables. Horizontal line through the graph.

Spearman r(S)

Nonparametric counterpart of the Pearson product-moment correlation coefficient, r: describes the degree of correlation between two variables, based on ranks.
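A small illustration of the difference with scipy: on invented data that are perfectly monotonic but non-linear, Spearman's r(S) is 1 while Pearson's r is not:

```python
from scipy.stats import pearsonr, spearmanr

x = [1, 2, 3, 4, 5, 6]
y = [1, 4, 9, 16, 25, 36]   # perfectly monotonic but non-linear (y = x^2)

print(pearsonr(x, y)[0])    # < 1: Pearson r measures only linear association
print(spearmanr(x, y)[0])   # = 1.0: Spearman rS works on ranks
```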

Perfect correlation

r = 1 (or r = -1). You can have a perfect correlation even if the paired numbers are not the same. The requirement is that all the points fall exactly on a straight line; for example, that the differences between the pairs of scores are all the same.

R² indicates...

R-squared is a statistical measure of how close the data are to the fitted regression line. It is also known as the coefficient of determination, or the coefficient of multiple determination for multiple regression. R-squared is always between 0 and 100%: 0% indicates that the model explains none of the variability of the response data around its mean; 100% indicates that the model explains all of that variability. The definition of R-squared is fairly straightforward: it is the percentage of the response variable variation that is explained by a linear model. It is the amount of variance in one variable that can be explained or predicted by knowing the other (be mindful, though, that this does not indicate causation).
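As a sanity check, for simple linear regression R² equals r² exactly. A minimal numpy sketch with invented data:

```python
import numpy as np

rng = np.random.default_rng(2)                   # invented data
x = rng.normal(size=200)
y = 0.8 * x + rng.normal(size=200)

r = np.corrcoef(x, y)[0, 1]                      # Pearson r

b, a = np.polyfit(x, y, 1)                       # simple linear regression
resid = y - (b * x + a)
r2 = 1 - resid.var() / y.var()                   # R^2 = 1 - SSE/SST

print(r**2, r2)                                  # the two values agree
```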

Comparison of Nonparametric to Parametric Tests

Similarities:
- The hypothesis-testing logic is the same for both nonparametric and parametric tests. Both kinds of tests yield the probability of the observed data when the null hypothesis is true (the null hypotheses of the two kinds of tests differ, however).
- Both nonparametric and parametric tests require you to assign participants randomly to subgroups (or to sample randomly from the population).
Differences:
- Nonparametric tests do not require the assumptions about the populations that parametric tests require. For example, parametric tests such as t-tests and ANOVA produce accurate probabilities when the populations are normally distributed and have equal variances. Nonparametric tests do not assume the populations have these characteristics.
- The null hypothesis for nonparametric tests is that the population distributions are the same. For parametric tests the null hypothesis is usually that the population means are the same. Because distributions can differ in form, variability, central tendency, or all three, the interpretation of a rejection of the null hypothesis may not be quite so clear-cut after a nonparametric test.

Correlation

Statistical technique that describes the degree of relationship between two variables

Non-Parametric Techniques

Statistical techniques that do not require assumptions about the sampled populations. They provide correct values for the probability of a Type I error regardless of the nature of the populations the samples come from. Ranks can be used. (A Type I error is the incorrect rejection of a true null hypothesis.)

Regarding differences between simple, and multiple linear regression What is the main difference?

The main difference is that more than one variable is used for predicting.

Quantification

Translating phenomena into numbers; it promotes better understanding.

When doing multiple group comparisons after ANOVA there is a difference between

a-priori specified comparisons and post-hoc comparisons. A-priori specified tests are based on specific hypotheses formulated before running the ANOVA and can be done with a t-test.

Post-hoc tests typically

compare each group to each other group, looking for which differences turn out to be significant. There is a danger of inflating the chance of a Type I error when doing many comparisons, so with post-hoc testing, measures have to be taken to keep the chance of a Type I error under control. This can be done by using a smaller α-level (Bonferroni correction), as in the sketch below, or by using a special test that controls the chance of a Type I error, like Tukey's HSD.
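A minimal sketch of Bonferroni-corrected pairwise t-tests with scipy; the groups and scores are invented:

```python
from itertools import combinations
from scipy.stats import ttest_ind

groups = {                                  # invented data, three groups
    "A": [4, 5, 6, 5, 4, 6],
    "B": [6, 7, 8, 7, 6, 7],
    "C": [9, 8, 10, 9, 8, 9],
}

pairs = list(combinations(groups, 2))       # all pairwise comparisons
alpha = 0.05 / len(pairs)                   # Bonferroni-corrected alpha

for g1, g2 in pairs:
    t, p = ttest_ind(groups[g1], groups[g2])
    print(f"{g1} vs {g2}: p = {p:.4f}  significant: {p < alpha}")
```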

Non-parametric alternatives to parametric tests

make use of ranked scores.

Non-parametric tests are used

- If the basic assumptions of a test are violated (for instance, if groups have unequal variances).
- If the data are of an ordinal nature.
- If testing the distribution of a nominal/categorical variable.

Non-parametric tests have less

power than parametric tests if the assumptions of the latter have not been violated, so they should not be used if a parametric test (t-test, ANOVA) can safely be used.

When you do any kind of test, be sure to

state the hypotheses, state the critical value (connected to α), compare your obtained value (of t, χ², etc.) to the critical value, and then write the conclusion. People often forget one or more of those steps because the procedure is so repetitious. You have to be complete, however, so on exams it is important to write it all down.

The reason we have so many different statistical tests is that each is most (or only) appropriate for dealing with a specific kind of data: t, ANOVA, correlation, regression, chi-square

- t and ANOVA test how one categorical variable (group) impacts a continuous variable.
- Correlation and regression deal with relationships between two (or more) continuous variables.
- Chi-square tests for relationships between categorical variables.
- Other non-parametric tests are used to draw conclusions about ordinal variables.

In a regression equation Y = b·X + a, b indicates...

the slope (steepness) of the line; a is the intercept (the value of Y when X = 0)

The regression line is such that it minimizes

the squared deviations of the data points from the line (= minimizes the squared differences between predicted and actual values; it is thus the best overall prediction possible for the given dataset), as the sketch below illustrates
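A minimal numpy sketch with invented data: the fitted line's sum of squared errors (SSE) is smaller than that of any other line, here checked against one perturbed slope:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])     # invented data
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b, a = np.polyfit(x, y, 1)                  # least-squares slope b, intercept a

def sse(slope, intercept):
    """Sum of squared deviations of the data from the line."""
    return ((y - (slope * x + intercept))**2).sum()

print(sse(b, a))            # minimal SSE, achieved by the fitted line
print(sse(b + 0.3, a))      # any other line yields a larger SSE
```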

ANOVA is based on the assumptions that

the sampled populations are normally distributed and that the variances are equal in those populations. ANOVA is robust, however, to mild violations of these assumptions, meaning that it will still give reliable results.

A low r does not mean that...

there is no relationship. r indicates only linear relationships; a non-linear relationship may still exist.

Non-parametric means that

these tests don't test for differences in parameters, such as the mean and standard deviation or variance. Instead they test for differences in distribution between populations. When a nonparametric test is performed as an alternative to a parametric test, the original data were on an interval or ratio scale. Any differences in distribution are then most likely to be related to differences in the means of these populations, so nonparametric tests can usually provide indirect information about differences in means, though they do not directly test for it.

Regression lines are used

to make predictions

