Parametric & Non-Parametric Statistics: T-Test, Chi-Square


Parametric Assumptions

- Parametric tests involve estimating parameters such as the mean.
- The observations must be independent.
- The observations must be drawn from normally distributed populations.
- These populations must have the same variances.
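As a quick illustration, the normality and equal-variance assumptions can be checked with the Shapiro-Wilk and Levene tests in scipy.stats. This is a minimal sketch on simulated data; the group values are made up for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
group_a = rng.normal(loc=50, scale=10, size=30)  # simulated scores, group A
group_b = rng.normal(loc=55, scale=10, size=30)  # simulated scores, group B

# Shapiro-Wilk: null hypothesis is that the sample comes from a normal population
for name, g in [("A", group_a), ("B", group_b)]:
    stat, p = stats.shapiro(g)
    print(f"Shapiro-Wilk group {name}: W={stat:.3f}, p={p:.3f}")

# Levene's test: null hypothesis is that the groups have equal variances
stat, p = stats.levene(group_a, group_b)
print(f"Levene: W={stat:.3f}, p={p:.3f}")
```

If both p-values stay above your significance level (commonly 0.05), normality and equal variances are not rejected; independence cannot be tested this way and must be ensured by the study design.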

Paired and Not Paired Comparisons

- If you have the same sample measured on two separate occasions, this is a paired comparison.
- Two independent samples are not a paired comparison.
- Different samples that are "matched" (for example, by age and gender) are treated as paired.

Pearson Correlation Coefficient

- The most widely used correlation coefficient.
- The correlation coefficient measures the extent to which the values of two variables are "proportional" to each other.
- Correlation between sets of data is a measure of how well they are related. The most common measure of correlation in statistics is the Pearson correlation; its full name is the Pearson Product Moment Correlation (PPMC). It measures the linear relationship between two sets of data. In simple terms, it answers the question: can I draw a straight line to represent the data? Two symbols are used for the Pearson correlation: the Greek letter rho (ρ) for a population and the letter "r" for a sample.
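For example, here is a minimal sketch computing r with scipy.stats.pearsonr; the study-hours and exam-score values are hypothetical.

```python
from scipy import stats

hours_studied = [1, 2, 3, 4, 5, 6, 7, 8]          # hypothetical predictor
exam_score = [52, 55, 61, 60, 68, 70, 75, 80]     # hypothetical outcome

# pearsonr returns the sample correlation r and a two-sided p-value
r, p = stats.pearsonr(hours_studied, exam_score)
print(f"r = {r:.3f}, p = {p:.4f}")  # r near +1 means a strong positive linear relationship
```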

T-test

- Used to test whether the mean of a sample is significantly different from a hypothesized mean (one-sample t-test).
- Used to test whether there is a difference between two sample/population means (two-sample t-test).
- The t-test relies on the sample being drawn from a normally distributed population.
- If the sample is not normal, use the Wilcoxon signed rank test as an alternative.
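A sketch of both forms with scipy.stats, using made-up measurements:

```python
from scipy import stats

# One-sample t-test: is the mean significantly different from a hypothesized mean of 100?
sample = [101, 98, 105, 99, 102, 97, 103, 100]
t, p = stats.ttest_1samp(sample, popmean=100)
print(f"one-sample: t = {t:.3f}, p = {p:.3f}")

# Two-sample t-test: do two independent groups have different means?
group_1 = [12.1, 11.8, 12.5, 12.0, 11.9]
group_2 = [12.9, 13.1, 12.7, 13.3, 12.8]
t, p = stats.ttest_ind(group_1, group_2)
print(f"two-sample: t = {t:.3f}, p = {p:.3f}")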

When to use a non-parametric test

Non-parametric tests are used when your data aren't normally distributed, so the key is to figure out whether you have normally distributed data; for example, you could look at the distribution of your data. If your data are approximately normal, you can use parametric statistical tests. Does your data allow for a parametric test, or do you have to use a non-parametric test like chi-square? The rule of thumb is: for nominal or ordinal scales, use non-parametric statistics; for interval or ratio scales, use parametric statistics. A skewed distribution is one reason to run a non-parametric test. Other reasons to run non-parametric tests:
- One or more assumptions of a parametric test have been violated.
- Your sample size is too small to run a parametric test.
- Your data have outliers that cannot be removed.
- You want to test the median rather than the mean (you might want this if you have a very skewed distribution).
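One way to make the decision concrete is a formal normality check. The sketch below simulates a deliberately right-skewed sample and runs Shapiro-Wilk on it; the data are made up for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
skewed = rng.exponential(scale=2.0, size=40)  # right-skewed, clearly non-normal

stat, p = stats.shapiro(skewed)
print(f"Shapiro-Wilk: W = {stat:.3f}, p = {p:.4f}")
if p < 0.05:
    print("Normality rejected -> prefer a non-parametric test")
else:
    print("No evidence against normality -> a parametric test is reasonable")
```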

Reasons for non-parametric tests

- Non-parametric (NP) tests were developed for situations where fewer basic statistical assumptions have to be met (particularly around normality).
- NP tests still have assumptions, but they are less stringent.
- NP tests can be applied to normal data, but parametric tests have greater power if their assumptions are met.
- NP tests use the ranks of values rather than the actual values, for example:
  actual: 1, 2, 3, 4, 5, 7, 13, 22, 38, 45
  ranks:  1, 2, 3, 4, 5, 6, 7, 8, 9, 10
- They do not require the population from which the statistic is drawn to fit a particular distribution; however, they often have less power, and a greater sample size is needed.
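The ranking step shown above can be reproduced with scipy.stats.rankdata:

```python
from scipy.stats import rankdata

values = [1, 2, 3, 4, 5, 7, 13, 22, 38, 45]
# rankdata replaces each value with its rank 1..10; ties get average ranks by default
print(rankdata(values))
```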

Alternative Correlation Coefficients

- Spearman rho: for ranked/ordinal data.
- Cramer's V: both variables are categorical (nominal), including tables larger than 2×2.
- Point-biserial r: one variable is a true dichotomy and the other is continuous.
- Phi coefficient: both variables are dichotomous.
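As an illustration of the last case, the phi coefficient can be computed directly from the cell counts of a 2×2 table; the table below is hypothetical.

```python
import math

# Hypothetical 2x2 contingency table:
#             outcome=yes  outcome=no
# exposed         a=20        b=10
# unexposed       c=5         d=25
a, b, c, d = 20, 10, 5, 25

# phi = (ad - bc) / sqrt((a+b)(c+d)(a+c)(b+d)); ranges from -1 to +1 like a correlation
phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
print(f"phi = {phi:.3f}")
```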

Mann-Whitney test

The Mann-Whitney U test is the non-parametric equivalent of the two-sample t-test. While the t-test makes an assumption about the distribution of the population (i.e., that the samples come from normally distributed populations), the Mann-Whitney U test makes no such assumption. It is used when we want to compare two unrelated or independent groups; for parametric data you would use the unpaired (independent samples) t-test instead.
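A minimal sketch with scipy.stats.mannwhitneyu; the pain scores for the two independent groups are made up.

```python
from scipy import stats

# Hypothetical pain scores from two independent treatment groups
treatment = [3, 4, 2, 5, 3, 4, 2]
control = [6, 5, 7, 6, 8, 5, 7]

u, p = stats.mannwhitneyu(treatment, control, alternative="two-sided")
print(f"U = {u}, p = {p:.4f}")
```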

Spearman Rank Correlation

The Spearman rank correlation coefficient, rs, is the non-parametric version of the Pearson correlation coefficient. Your data must be ordinal, interval, or ratio. Spearman's returns a value from -1 to 1, where:
+1 = a perfect positive correlation between ranks
-1 = a perfect negative correlation between ranks
0 = no correlation between ranks
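For example, a sketch with scipy.stats.spearmanr on hypothetical rankings from two judges:

```python
from scipy import stats

# Hypothetical rankings of the same six items by two judges
judge_1 = [1, 2, 3, 4, 5, 6]
judge_2 = [2, 1, 4, 3, 6, 5]

rho, p = stats.spearmanr(judge_1, judge_2)
print(f"rho = {rho:.3f}, p = {p:.4f}")  # rho near +1 means the judges rank items similarly
```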

Non-parametric version of parametric tests

- Wilcoxon signed rank test for the one-sample t-test
- Paired Wilcoxon signed rank test for the paired-sample t-test
- Mann-Whitney U test for the two independent samples t-test
- Kruskal-Wallis for one-way analysis of variance
- Spearman rank correlation for Pearson's correlation
- Friedman test for repeated measures
- Cohen's kappa is an NP measure of inter-rater agreement
- Kolmogorov-Smirnov is an NP test for normality
- Chi-square is an NP test for contingencies

Chi-Square (χ²)

A non-parametric, inferential statistic that tests whether the frequencies of responses in our sample represent certain frequencies in the population; used with nominal data. A chi-square test for independence compares two variables in a contingency table to see if they are related. In a more general sense, it tests whether distributions of categorical variables differ from each other. A very small chi-square test statistic means that your observed data fit your expected data extremely well; in other words, there is no evidence of a relationship. A very large chi-square test statistic means that the observed data do not fit the expected data well; in other words, there is a relationship.
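A sketch of the independence test with scipy.stats.chi2_contingency; the contingency table counts are hypothetical.

```python
from scipy import stats

# Hypothetical contingency table: rows = group, columns = preference
observed = [[30, 10],
            [20, 40]]

# chi2_contingency computes the expected counts under independence for us
chi2, p, dof, expected = stats.chi2_contingency(observed)
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p:.4f}")
```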

Chi Square ctd.

- An NP test for statistical significance.
- Tests differences between frequencies: it compares observed and expected frequencies.
- Uses nominal data: frequencies obtained by tallying the number of cases in each category.
- Null hypothesis: no difference between groups in the population.
A chi-square test will give you a p-value, which tells you whether your test results are significant or not. In order to perform a chi-square test and get the p-value, you need two pieces of information: the test statistic and the degrees of freedom. For a goodness-of-fit test the degrees of freedom are just the number of categories minus 1; for a contingency table they are (rows − 1) × (columns − 1). Tip: the chi-square statistic can only be used on counts; it can't be used on percentages, proportions, means, or similar statistical values. For example, if you have 10 percent of 200 people, you would need to convert that to a count (20) before you can run the test statistic.
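A sketch of the goodness-of-fit form with scipy.stats.chisquare, using made-up counts (note they are counts, not percentages):

```python
from scipy import stats

# Hypothetical observed counts across 4 categories (n = 100)
observed = [25, 30, 20, 25]
expected = [25, 25, 25, 25]  # uniform expectation

chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")  # degrees of freedom = 4 categories - 1 = 3
```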

What are non-parametric tests?

A non-parametric test (sometimes called a distribution-free test) does not assume anything about the underlying distribution (for example, that the data come from a normal distribution). That's in contrast to a parametric test, which makes assumptions about a population's parameters (for example, the mean or standard deviation). When the word "non-parametric" is used in stats, it doesn't quite mean that you know nothing about the population; it usually means that you know the population data do not have a normal distribution. For example, one assumption of the one-way ANOVA is that the data come from a normal distribution. If your data aren't normally distributed, you can't run an ANOVA, but you can run its non-parametric alternative, the Kruskal-Wallis test. If at all possible, you should use parametric tests, as they tend to be more accurate: parametric tests have greater statistical power, which means they are more likely to detect a true effect. Use non-parametric tests only if you have to (i.e., you know that assumptions like normality are being violated). Non-parametric tests can perform well with non-normal continuous data if you have a sufficiently large sample size (generally 15-20 items in each group).
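A sketch of the Kruskal-Wallis alternative to one-way ANOVA with scipy.stats.kruskal; the three groups are hypothetical.

```python
from scipy import stats

# Three hypothetical independent groups with non-normal data
g1 = [1.2, 3.4, 2.2, 8.9, 2.5]
g2 = [4.5, 6.7, 5.1, 9.9, 7.2]
g3 = [2.1, 2.8, 3.3, 2.6, 3.0]

# Kruskal-Wallis compares the groups using ranks, not raw values
h, p = stats.kruskal(g1, g2, g3)
print(f"H = {h:.3f}, p = {p:.4f}")
```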

Paired t-test

A paired t-test (also called a correlated pairs t-test, a paired samples t-test, or a dependent samples t-test) is where you run a t-test on dependent samples. Dependent samples are essentially connected: they are tests on the same person or thing. For example: knee MRI costs at two different hospitals, two tests on the same person before and after training, or two blood pressure measurements on the same person using different equipment. Choose the paired t-test if you have two measurements on the same item, person, or thing. You should also choose this test if you have two different items that are being measured under the same condition. For example, you might be measuring car safety performance in vehicle research and testing by subjecting the cars to a series of crash tests: although the manufacturers are different, you are subjecting them to the same conditions. With a "regular" two-sample t-test, you're comparing the means of two different samples. For example, you might test two different groups of customer service associates on a business-related test, or test students from two universities on their English skills. If you take a random sample from each group separately and they have different conditions, your samples are independent and you should run an independent samples t-test (also called a between-samples or unpaired-samples t-test).
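A sketch of the paired form with scipy.stats.ttest_rel; the before/after blood pressure values are made up.

```python
from scipy import stats

# Hypothetical blood pressure for the same six subjects before and after training
before = [140, 135, 150, 144, 138, 149]
after = [132, 130, 145, 140, 134, 141]

# ttest_rel pairs the samples element-by-element (same subject in both lists)
t, p = stats.ttest_rel(before, after)
print(f"t = {t:.3f}, p = {p:.4f}")
```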

Advantages and disadvantages of the paired samples t-test

- Disadvantage: assumes the data are a random sample from a normally distributed population.
- Advantage: uses all the detail of the available data, and if the data are normally distributed it is the most powerful test.

The Wilcoxon Signed Rank Test for Paired Comparisons

The Wilcoxon signed rank test (also called the Wilcoxon signed rank sum test) is a non-parametric test: it does not assume that the population data have a normal distribution. It should be used if the differences between pairs of data are non-normally distributed. Two slightly different versions of the test exist:
- The Wilcoxon signed rank test compares your sample median against a hypothetical median.
- The Wilcoxon matched-pairs signed rank test computes the difference between each set of matched pairs, then follows the same procedure as the signed rank test to compare the sample against some median.
The term "Wilcoxon" is often used for either test; this usually isn't confusing, as it should be obvious whether the data are matched or not. The null hypothesis for this test is that the medians of the two samples are equal. It is generally used:
- As a non-parametric alternative to the one-sample t-test or paired t-test.
- For ordered (ranked) categorical variables without a numerical scale.
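A sketch of the matched-pairs form with scipy.stats.wilcoxon; the paired measurements are hypothetical.

```python
from scipy import stats

# Hypothetical paired measurements whose differences are not normally distributed
before = [125, 115, 130, 140, 140, 115, 140, 125]
after = [110, 122, 125, 120, 140, 124, 123, 137]

# Passing both samples tests the median of the pairwise differences against zero;
# pairs with a zero difference are dropped by default
w, p = stats.wilcoxon(before, after)
print(f"W = {w}, p = {p:.4f}")
```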

Uses of Chi-Square

The chi-squared distribution has many uses in statistics, including:
- Confidence interval estimation for a population standard deviation of a normal distribution, from a sample standard deviation.
- Independence of two criteria of classification of qualitative variables.
- Relationships between categorical variables (contingency tables).
- Sample variance study when the underlying distribution is normal.
- Tests of deviations of differences between expected and observed frequencies (one-way tables).
- The chi-square test (a goodness-of-fit test).
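As an example of the first use, a confidence interval for a population standard deviation can be built from chi-squared quantiles. A minimal sketch on made-up measurements, assuming the underlying population is normal:

```python
import numpy as np
from scipy import stats

data = [9.8, 10.2, 10.4, 9.9, 10.1, 10.3, 9.7, 10.0]  # hypothetical measurements
n = len(data)
s2 = np.var(data, ddof=1)  # sample variance
alpha = 0.05

# CI for the variance: (n-1)s^2 / chi2 quantiles; take square roots for sigma
lower = (n - 1) * s2 / stats.chi2.ppf(1 - alpha / 2, df=n - 1)
upper = (n - 1) * s2 / stats.chi2.ppf(alpha / 2, df=n - 1)
print(f"95% CI for sigma: ({np.sqrt(lower):.3f}, {np.sqrt(upper):.3f})")
```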

Degrees of Freedom

The number of individual scores that can vary without changing the sample mean, statistically written as n − 1, where n represents the number of subjects. The degrees of freedom of an estimate is the number of independent pieces of information that went into calculating the estimate. It's not quite the same as the number of items in the sample: to get the df for the estimate, you subtract 1 from the number of items. Say you were finding the mean weight loss for a low-carb diet. You could use 4 people, giving 3 degrees of freedom (4 − 1 = 3), or you could use one hundred people with df = 99. In math terms (where n is the number of items in your set): degrees of freedom = n − 1. Why subtract 1 from the number of items? Another way to look at degrees of freedom is that they are the number of values that are free to vary in a data set. What does "free to vary" mean? Here's an example using the mean (average). Q: Pick a set of numbers that has a mean of 10. A: Some sets you might pick: 9, 10, 11 or 8, 10, 12 or 5, 10, 15. Once you have chosen the first two numbers in the set, the third is fixed; you can't choose it freely. The only numbers that are free to vary are the first two. You can pick 9 + 10 or 5 + 15, but once you've made that decision you must choose the particular number that gives the mean you are looking for. So the degrees of freedom for a set of three numbers is two.

T-Score

The t-score is a ratio of the difference between two groups to the difference within the groups. The larger the t-score, the more difference there is between groups; the smaller the t-score, the more similarity there is between groups. A t-score of 3 means that the groups are three times as different from each other as they are within themselves. When you run a t-test, the bigger the t-value, the more likely it is that the results are repeatable. A large t-score tells you that the groups are different; a small t-score tells you that the groups are similar. How big is "big enough"? Every t-value has a p-value to go with it. A p-value is the probability of obtaining results at least as extreme as yours by chance alone, i.e., if there were really no difference. P-values range from 0 to 1 (0% to 100%) and are usually written as a decimal; for example, a p-value of 5% is 0.05. Low p-values are good: they indicate your results are unlikely to have occurred by chance. For example, a p-value of .01 means there is only a 1% probability of getting results this extreme by chance. In most cases, a p-value of 0.05 (5%) is accepted as the threshold for statistical significance.
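To make the "difference between groups over difference within groups" ratio concrete, here is a sketch computing a two-sample t-score by hand from summary statistics (Welch's form; all numbers are made up):

```python
import math

# Hypothetical summary statistics for two groups
mean_1, sd_1, n_1 = 20.0, 4.0, 30
mean_2, sd_2, n_2 = 23.0, 5.0, 30

# t = (difference between group means) / (standard error of that difference)
se = math.sqrt(sd_1**2 / n_1 + sd_2**2 / n_2)
t = (mean_1 - mean_2) / se
print(f"t = {t:.3f}")  # larger |t| means more between-group difference relative to within-group spread
```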

Types of T-Tests

There are three main types of t-test:
- An independent samples t-test compares the means of two groups.
- A paired sample t-test compares means from the same group at different times (say, one year apart).
- A one-sample t-test tests the mean of a single group against a known mean.

