Statistics Exam 4


Know the notation for population and sample mean

Sample mean: x̄ ("x-bar"). Population mean: μ (the Greek letter mu).

Know the notation for population and sample standard deviation

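Population standard deviation: σ (sigma). Sample standard deviation: s, where s = sqrt( Σ(x - x̄)² / (n - 1) ). In words: subtract the mean from each data value, square each difference, add the squares, divide by (the number of data values minus one), and take the square root.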
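
The formula for s can be checked by hand with a short sketch. The data values below are made up for illustration, and `sample_sd` is a hypothetical helper name, not something from the textbook.

```python
import math
import statistics

def sample_sd(values):
    """Sample standard deviation: s = sqrt( sum((x - xbar)^2) / (n - 1) )."""
    n = len(values)
    xbar = sum(values) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in values) / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up values for illustration
s = sample_sd(data)               # sample formula, divides by n - 1
sigma = statistics.pstdev(data)   # population formula, divides by n
print(s, sigma)
```

The sample value s comes out slightly larger than the population value because it divides by n - 1 instead of n.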

Know the advantages and disadvantages of a nonparametric test

Advantages:
1. Because nonparametric tests have less rigid requirements than parametric tests, they can be applied to a wider variety of situations.
2. Nonparametric tests can be applied to more data types than parametric tests. For example, nonparametric tests can be used with data consisting of ranks, and they can be used with categorical data, such as genders of survey respondents.
Disadvantages:
1. Nonparametric tests tend to waste information because exact numerical data are often reduced to a qualitative form.
2. Nonparametric tests are not as efficient as parametric tests, so a nonparametric test generally needs stronger evidence (such as a larger sample or greater differences) in order to reject a null hypothesis.

Be able to state statistically when we declare there to be a significantly high or low number of successes (i.e. The Rare Event Rule)

- Significantly high number of successes: x successes among n trials is a significantly high number of successes if the probability of x or more successes is 0.05 or less. - Significantly low number of successes: x successes among n trials is a significantly low number of successes if the probability of x or fewer successes is 0.05 or less. If such a rare result follows a "treatment", the treatment appears to be effective. If it occurs under an assumption, the assumption is probably not correct.

Know what range of numbers r can have

- The value of r is always between -1 and 1

Know the advantages and disadvantages of rank correlation versus linear correlation

Advantages:
- Rank correlation can be used with paired data that are ranks or can be converted to ranks, so it also works for ordinal data.
- Unlike the parametric methods of Chapter 10, the method of rank correlation does not require a normal distribution for any population.
- Rank correlation can be used to detect some (not all) relationships that are not linear.
Disadvantage: rank correlation has an efficiency rating of 0.91. With all else equal, it needs 100 pairs of sample data to achieve the same results as 91 pairs analyzed with the parametric approach (assuming the parametric requirements are met).

Understand what it means to reject the null hypothesis in a One-Way ANOVA.

- If the p-value is 0.05 or less, reject the null hypothesis.
Null hypothesis: H0: μ1 = μ2 = μ3
Alternative hypothesis: H1: at least one of the means is different from the others
When we conclude that there is sufficient evidence to reject the claim of equal population means, we cannot conclude from ANOVA that any particular mean is different from the others. (There are several other methods that can be used to identify the specific means that are different, and some of them are discussed in Part 2 of this section.)

Be able to interpret the findings from a Normal Distribution

The standard normal distribution is a normal distribution with these properties: - It is bell-shaped. - μ = 0: the mean is equal to 0. - σ = 1: the standard deviation is equal to 1.

Be able to interpret a percentile and a quartile

- percentile: measures of location, denoted P1, P2... which divide a set of data into 100 groups with about 1% of the values in each group. quartile: measures of location, denoted Q1, Q2, and Q3, which divide a set of data into four groups with about 25% of the values in each group.

Know the shape of an F distribution and that the values cannot be negative.

- the F distribution is not symmetric. It is skewed right. - Values of the F distribution cannot be negative - The exact shape of the F distribution depends on the two different degrees of freedom

Understand how to identify significant results with probabilities as described on p 215 and on p 230

- Significantly high number of successes: x successes among n trials is a significantly high number of successes if the probability of x or more successes is 0.05 or less. - Significantly low number of successes: x successes among n trials is a significantly low number of successes if the probability of x or fewer successes is 0.05 or less.
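
The 0.05 rule above can be sketched for a binomial setting. The function names and the 10-trial example are assumptions chosen for illustration; the textbook pages may use different examples.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(exactly k successes in n trials) for a binomial distribution."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def prob_at_least(x, n, p):
    """P(x or more successes)."""
    return sum(binom_pmf(k, n, p) for k in range(x, n + 1))

def prob_at_most(x, n, p):
    """P(x or fewer successes)."""
    return sum(binom_pmf(k, n, p) for k in range(0, x + 1))

def classify(x, n, p, cutoff=0.05):
    """Apply the 'x or more' / 'x or fewer' probability rule with a 0.05 cutoff."""
    if prob_at_least(x, n, p) <= cutoff:
        return "significantly high"
    if prob_at_most(x, n, p) <= cutoff:
        return "significantly low"
    return "not significant"

# Hypothetical example: 10 trials with success probability 0.5
print(classify(9, 10, 0.5))
print(classify(8, 10, 0.5))
print(classify(1, 10, 0.5))
```

Note that the rule uses the probability of x or more (or x or fewer) successes, not the probability of exactly x.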

Know what the z score and what the area stand for (and where they are found) in a standard normal distribution (Table A - 2)

- z score: distance along the horizontal scale of the standard normal distribution (corresponding to the number of standard deviations above or below the mean); refer to the leftmost column and top row - Area: region under the curve; refer to the values in the body of Table A-2.

Given the "Guidelines for Finding the Best Multiple Regression Equation", be able to use the guidelines and critical thinking to choose the best multiple regression equation.

1. Use common sense and practical considerations to include or exclude variables.
2. Consider the P-value (it should be low).
3. Consider equations with high values of adjusted R², and try to include only a few variables.

Know what the following measure of dispersion means: standard deviation

The standard deviation of a set of sample values, denoted by s, is a measure of how much data values deviate away from the mean. Significantly low values are (μ - 2σ) or lower. Significantly high values are (μ + 2σ) or higher.

Know what resistant means

A statistic is resistant if the presence of extreme values (outliers) does not cause it to change very much.

Given the instructions, be able to conduct a One-Way ANOVA in SPSS

Analyze > Compare Means > One-Way ANOVA. Put the independent variable in the "Factor" box. Put the dependent variable in the "Dependent List" box. Click Post Hoc, check the "LSD" box, and click "Continue".

Given the instructions, be able to conduct a Rank Correlation Test in SPSS

Analyze > Correlate > Bivariate. Place the variables in the variable box. Check the Spearman box. Check one-tailed or two-tailed.

Be able to construct and interpret a normal quantile plot in order to evaluate normality

Analyze > Descriptive Statistics > Q-Q Plots. Make sure the Test Distribution box indicates Normal. Click OK.

Given the instructions, be able to conduct a Kruskal-Wallis Test in SPSS

Analyze > Nonparametric Tests > Legacy Dialogs > K Independent Samples

Wilcoxon Rank Sum Test SPSS

Analyze > Nonparametric Tests > Legacy Dialogs > Two Independent Samples. Under Define Groups, specify which group is coded 0 and which is coded 1.

Given the instructions, be able to conduct a Wilcoxon Signed-Ranks Test in SPSS

Analyze > Nonparametric Tests > Legacy Dialogs > 2 Related Samples. Place the variables in each box. Check the Wilcoxon box. Click OK. Here you are putting the data side by side.

Given the instructions, be able to conduct a One-Way ANOVA in Excel. Be able to interpret the results of an ANOVA given the Excel summary output.

Data > Data Analysis > Anova: Single Factor. For Input Range, enter the range of cells containing the sample data.

Know what a rank is and how to determine a rank in the event of a tie.

Data are sorted when they are arranged according to some criterion, such as smallest to largest or best to worst. A rank is a number assigned to an individual sample item according to its order in the sorted list. The first item is assigned a rank of 1, the second item is assigned a rank of 2, and so on. In the event of a tie, take the mean of the ranks involved and assign that mean rank to each of the tied items.
For signed ranks (as in the Wilcoxon signed-ranks test for matched pairs): 1. Subtract to find the difference for each pair. 2. Discard any differences of zero; they do NOT get a rank. 3. Rank the absolute values of the differences from lowest to highest. 4. Attach to each rank the sign (negative or positive) of the original difference.
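
The ranking steps can be sketched as follows. `mean_ranks` and `signed_ranks` are hypothetical helper names, and the numbers are made up for illustration.

```python
def mean_ranks(values):
    """Rank from smallest (rank 1) to largest; tied values share the mean of their ranks."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(ordered):
        first = ordered.index(v) + 1          # first 1-based position held by v
        count = ordered.count(v)              # how many items are tied at v
        rank_of[v] = first + (count - 1) / 2  # mean of positions first..first+count-1
    return [rank_of[v] for v in values]

def signed_ranks(first, second):
    """Signed ranks of the paired differences (first - second), dropping zero differences."""
    diffs = [a - b for a, b in zip(first, second) if a - b != 0]
    ranks = mean_ranks([abs(d) for d in diffs])
    return [r if d > 0 else -r for d, r in zip(diffs, ranks)]

print(mean_ranks([4, 5, 5, 5, 10, 11, 12, 12]))
print(signed_ranks([10, 12, 9, 8], [8, 12, 10, 11]))
```

In the first call the three 5s occupy positions 2, 3, and 4, so each gets the mean rank 3; the two 12s occupy positions 7 and 8, so each gets 7.5.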

Understand the definitions in the Definitions Block on page 4 of Triola

Data: collections of observations, such as measurements, genders, or survey responses.
Statistics: the science of planning studies and experiments; obtaining data; and organizing, summarizing, presenting, analyzing, and interpreting those data and then drawing conclusions based on them.
Population: the complete collection of all measurements or data that are being considered.
Census: the collection of data from every member of the population.
Sample: a subcollection of members selected from a population.

Know the difference between discrete and continuous data

Discrete: results when the data values are quantitative and the number of values is finite or "countable". Continuous: results from infinitely many possible quantitative values, where the collection of values is not countable (for example, lengths of distances from 0 cm to 12 cm).

Understand the Central Limit Theorem as described on p. 288

For all samples of the same size n with n > 30, the sampling distribution of x̄ can be approximated by a normal distribution with mean μ and standard deviation σ / (the square root of n).

Understand the Central Limit Theorem as described

For all samples of the same size n with n > 30, the sampling distribution of x̄ can be approximated by a normal distribution with mean μ and standard deviation σ / (the square root of n). Even when the original population is not normal (for example, a uniform distribution), the means of the samples will have an approximately normal, bell-shaped distribution.
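
A small simulation can illustrate this statement. This is only a sketch under assumed settings: a uniform population on [0, 1], samples of size n = 36, 5000 repetitions, and a fixed random seed; none of these choices come from the textbook.

```python
import random
import statistics

random.seed(1)        # fixed seed so the sketch is repeatable

n = 36                # sample size (n > 30)
num_samples = 5000    # how many samples to draw

# Assumed population: uniform on [0, 1], which is flat rather than bell-shaped.
mu = 0.5
sigma = (1 / 12) ** 0.5

sample_means = [
    statistics.mean([random.random() for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)
predicted_sd = sigma / n ** 0.5   # sigma / sqrt(n)

print(mean_of_means, mu)
print(sd_of_means, predicted_sd)
```

The mean of the sample means should land close to μ, and their standard deviation close to σ / sqrt(n); a histogram of `sample_means` would look roughly bell-shaped even though the population is flat.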

Understand the paragraph on the bottom of p 300

If the requirement of a normal distribution is not too strict, simply look at a histogram and find the number of outliers. If the histogram is roughly bell-shaped and the number of outliers is 0 or 1, treat the population as if it has a normal distribution.

Understand what the Rare Event Rule tells us

If, under a given assumption, the probability of a particular outcome is very small and the outcome occurs significantly less than or significantly greater than we expect with that assumption, we conclude that the assumption is probably not correct.

ANOVA fact

In a one-way ANOVA the samples are categorized in one way.

Understand why we do one ANOVA instead of multiple t-tests

In general, as we increase the number of individual tests of significance, we increase the risk of finding a difference by chance alone (instead of a real difference in means). With multiple t-tests, the overall risk of a Type I error becomes far too high.

Be able to determine whether two samples are independent or dependent (matched pairs)

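Independent: two samples are independent if the sample values from one population are not related to, or naturally paired or matched with, the sample values from the other population. Dependent (matched pairs): the sample values are somehow matched, and the matching is based on some inherent relationship (for example, before and after measurements on the same subjects).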

Realize how to use the range rule of thumb

Significantly low values are (mean minus 2 standard deviations) or lower; significantly high values are (mean plus 2 standard deviations) or higher. According to the range rule of thumb, the vast majority of values should lie within 2 standard deviations of the mean, so we can consider a value to be significant if it is at least 2 standard deviations away from the mean.
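
The range rule of thumb is easy to sketch in code. The mean of 100 and standard deviation of 15 below are assumed values for illustration only.

```python
def range_rule_limits(mean, sd):
    """Return (low limit, high limit) = (mean - 2*sd, mean + 2*sd)."""
    return mean - 2 * sd, mean + 2 * sd

def classify_value(x, mean, sd):
    """Label x using the range rule of thumb."""
    low, high = range_rule_limits(mean, sd)
    if x <= low:
        return "significantly low"
    if x >= high:
        return "significantly high"
    return "not significant"

# Hypothetical data set with mean 100 and standard deviation 15
print(range_rule_limits(100, 15))
print(classify_value(135, 100, 15))
print(classify_value(70, 100, 15))
print(classify_value(100, 100, 15))
```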

Given the instructions, know what claims a Wilcoxon Signed-Ranks Test can test

It tests the claim that a population of matched pairs (dependent samples) has differences with a median equal to zero. The data need to be a simple random sample. The population of differences needs to be approximately symmetric. There is NO requirement for the data to have a normal distribution. Excel cannot do this test; use SPSS. The test statistic is T when there are 30 or fewer nonzero differences and z when there are more than 30.

Wilcoxon Rank Sum Test

Also known as the Mann-Whitney U test. It is used with two independent samples (memory aid: IRS = Independent Rank Sum).
Null hypothesis: The two samples come from populations with equal medians.
Alternative hypothesis, one of the following:
- The two populations have different medians.
- The first population has a median greater than the median of the second population.
- The first population has a median less than the median of the second population.

Be able to differentiate between nominal, ordinal, interval, and ratio data

Nominal: names, labels, or categories only. The data cannot be arranged in some order (yes, no, undecided).
Ordinal: the data can be arranged in some order, but differences found by subtraction mean nothing.
Interval: the data can be arranged in order, and differences can be found and are meaningful. There is no natural zero starting point.
Ratio: the data can be arranged in order, differences can be found and are meaningful, and there is a natural zero starting point.

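Be able to describe the difference between a parametric and nonparametric test

Parametric tests have requirements about the distribution of the populations involved (often a normal distribution). Nonparametric (or distribution-free) tests do not require that samples come from populations with normal distributions or any other particular distributions.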

Know what ANOVA stands for

One-way analysis of variance (ANOVA) is a method of testing the equality of three or more population means by analyzing sample variances. One-way analysis of variance is used with data categorized with one factor (or treatment), so there is one characteristic used to separate the sample data into different categories.

Understand the Law of Large Numbers

As a procedure is repeated again and again, the relative frequency probability of an event tends to approach the actual probability.

Know whether a positive, negative, no, or nonlinear correlation exists between two variables

These descriptions apply to scatterplots.
Positive: distinct straight-line (linear) pattern. As x values increase, the corresponding y values also increase.
Negative: distinct straight-line (linear) pattern. As x values increase, the corresponding y values decrease.
No relationship (also known as no correlation): no distinct pattern.
Nonlinear / curvilinear: distinct pattern suggesting a correlation between x and y, but the pattern is not that of a straight line.

Know the difference between qualitative (categorical) and quantitative data

Quantitative: consists of numbers representing counts or measurements. Categorical: consists of names or labels.

Know what Adjusted R2 is and why we use it instead of R2

R²: denotes the multiple coefficient of determination, a measure of how well the multiple regression equation fits the sample data. Perfect fit = 1. Poor fit = close to 0. Flaw: as more variables are included, R² increases. It is better to use adjusted R², the adjusted coefficient of determination, which is modified to account for the number of variables and the sample size.
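
The adjustment can be sketched with the standard formula, adjusted R² = 1 - [(n - 1) / (n - (k + 1))] (1 - R²), where n is the sample size and k is the number of predictor variables. The numbers below are invented to show the effect; they are not from any data set in the course.

```python
def adjusted_r2(r2, n, k):
    """Adjusted coefficient of determination.

    r2: multiple coefficient of determination R^2
    n:  sample size
    k:  number of predictor (x) variables
    """
    return 1 - (n - 1) / (n - (k + 1)) * (1 - r2)

# Hypothetical comparison with n = 21 observations:
two_vars = adjusted_r2(0.80, 21, 2)    # 2 predictors, R^2 = 0.80
five_vars = adjusted_r2(0.81, 21, 5)   # 5 predictors, R^2 = 0.81
print(two_vars, five_vars)
```

In this made-up comparison the five-variable equation has the higher R² but the lower adjusted R², which is the situation where the guidelines favor the equation with fewer variables.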

Given the instructions, be able to conduct a Rank Correlation Test in Excel

Replace each of the original sample values by its corresponding rank. Select an empty cell and click Insert Function > Statistical > CORREL. Array 1: enter the cell range for the rank data of the first variable. Array 2: enter the cell range for the rank data of the second variable. Click OK.

Wilcoxon Rank Sum Test Requirements

Requirements:
1. There are two independent random samples.
2. Each of the two samples has more than 10 values.
3. The test must be done in SPSS.
4. The two samples do not have to contain an equal number of values.

Given a formula be able to calculate a z-score

SAMPLE: z = (x - x̄) / s, where x̄ is the sample mean and s is the sample standard deviation.
POPULATION: z = (x - μ) / σ, where μ is the population mean and σ is the population standard deviation.
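
A minimal sketch of the calculation, using two invented test scores to show how z scores put values from different data sets on a common scale:

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x is above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical scores on two different tests
z_a = z_score(85, mean=75, sd=5)    # test A
z_b = z_score(90, mean=80, sd=10)   # test B
print(z_a, z_b)
```

Both scores are 10 points above their means, but the score on test A is relatively higher because it is more standard deviations above its mean.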

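Understand how we determine whether values are significantly high or low

Find the cutoff values μ - 2σ and μ + 2σ (or convert the value to a z score). If the value is at or above the upper cutoff (z of 2 or more) it is significantly high; if it is at or below the lower cutoff (z of -2 or less) it is significantly low. This matters because significantly high or low values are unlikely to occur by chance.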

Understand the difference between Practical and Statistical Significance

Statistical significance: achieved when we get a result that is very unlikely to occur by chance (a common criterion is a probability of 5% or less). Practical significance: some treatment or finding may be statistically effective, but common sense might suggest that it does not make enough of a difference to justify its use or to be practical.

Understand what a Bonferroni (or other post hoc) test will do for you

After ANOVA leads to rejecting the claim of equal means, the Bonferroni test makes it possible to identify which particular mean is different from the others.

Be able to interpret a Kruskal-Wallis Test printout from SPSS

The Kruskal-Wallis test is a nonparametric test that uses ranks of simple random samples from three or more independent populations to test the null hypothesis that the populations all have the same median. It is also called the H test.
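
To see where the H value on a printout comes from, here is a sketch of the usual formula H = [12 / (N(N + 1))] Σ(R_i² / n_i) - 3(N + 1), where R_i is the rank sum of sample i after ranking all N values together. The sketch assumes no correction for ties, so software that applies a tie correction may report a slightly different value, and the three samples are made up.

```python
def pooled_ranks(values):
    """Map each value to its rank in the pooled data; ties share the mean rank."""
    ordered = sorted(values)
    return {v: ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in set(values)}

def kruskal_wallis_h(*samples):
    """H test statistic without a tie correction (an assumption of this sketch)."""
    pooled = [v for s in samples for v in s]
    total_n = len(pooled)
    rank_of = pooled_ranks(pooled)
    # Sum of (rank sum squared / sample size) over the samples
    total = sum(sum(rank_of[v] for v in s) ** 2 / len(s) for s in samples)
    return 12 / (total_n * (total_n + 1)) * total - 3 * (total_n + 1)

# Made-up samples that do not overlap at all
h = kruskal_wallis_h([1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15])
print(h)
```

A large H (with a small p-value on the printout) is evidence against the claim that the populations have the same median.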

Know what a regression line is and what each symbol in a regression equation stands for

The regression line (or line of best fit, or least-squares line) is the straight line that "best" fits the scatterplot of the data. The regression equation ŷ = b0 + b1x algebraically describes the regression line and expresses a relationship between x and ŷ.
x = explanatory variable (also called predictor variable or independent variable)
ŷ = response variable (also called dependent variable or criterion variable)
b0 = y-intercept
b1 = slope

Know the symbol for the linear correlation coefficient and what it represents

The symbol is r. A linear correlation exists between two variables when there is a correlation and the plotted points of paired data result in a pattern that can be approximated by a straight line. The linear correlation coefficient r measures the strength of the linear correlation between the paired quantitative x values and y values in a sample. It is sometimes referred to as the Pearson product moment correlation coefficient in honor of Karl Pearson, who originally developed it. Round r to three decimal places.

Misleading terminology

The term distribution-free test correctly indicates that a test does not require a particular distribution. The term nonparametric test is misleading in the sense that it suggests that the tests are not based on a parameter, but there are some nonparametric tests that are based on a parameter such as the median. Due to the widespread use of the term nonparametric test, we use that terminology, but we define it to be a test that does not require a particular distribution.

Density Curve

The total area under the curve is equal to 1

What does the term treatment/factor mean? They are synonyms.

A treatment (or factor) is a property or characteristic that allows us to distinguish the different populations from one another. The word treatment is used because early applications of analysis of variance involved agricultural experiments in which different plots of farmland were treated with different fertilizers, seed types, insecticides, and so on.

Know how many outcomes are available in a binomial probability distribution

Each trial has only two possible outcomes (commonly called success and failure).

Be able to state how A Universal Truth described on p. 288 ties to the Word of God

There is one Creator, and that God made order. Everything has a normal distribution.

Be able to identify skewness and an outlier in a box plot

Think of the histogram when it comes to skewness. Where the box is is where most of the data is. Right skewed means that there is a longer "tail" to the right.
Outliers:
1. Find the three quartiles.
2. Find the interquartile range, where IQR = Q3 - Q1.
3. Calculate 1.5 x IQR.
4. In a modified boxplot, a data value is an outlier if it is above Q3 by an amount greater than 1.5 x IQR or below Q1 by an amount greater than 1.5 x IQR.
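
The outlier steps above can be sketched as follows. Quartile conventions differ between textbooks and software, so this sketch assumes the default method of Python's `statistics.quantiles`, which may not match Triola or SPSS exactly for every data set. The data values are made up.

```python
import statistics

def find_outliers(values):
    """Return the values lying more than 1.5 x IQR above Q3 or below Q1."""
    q1, q2, q3 = statistics.quantiles(values, n=4)  # step 1: the three quartiles
    iqr = q3 - q1                                   # step 2: interquartile range
    reach = 1.5 * iqr                               # step 3: 1.5 x IQR
    # step 4: flag values beyond Q1 - reach or Q3 + reach
    return [x for x in values if x > q3 + reach or x < q1 - reach]

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50]   # made-up values; 50 sits far to the right
outliers = find_outliers(data)
print(outliers)
```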

Realize that the odds are stacked in favor of the house in gambling.

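This has to do with actual odds versus payoff odds: the payoff odds offered by the house are less favorable than the actual odds against winning, so the house comes out ahead in the long run.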

Given the instructions, know what claims a Kruskal-Wallis Test can test

It tests the claim that three or more independent samples come from populations with the same median.

Given the instructions, know what claims a Rank Correlation Test can test

It tests whether there is a relationship or association between two variables. It uses ranks and is also known as Spearman's rank correlation test. The data must be a simple random sample and must be ranks or be converted to ranks. We use rs to test for an association between two variables.
Null hypothesis: ρs = 0 (there is no correlation)
Alternative hypothesis: ρs ≠ 0 (there is a correlation)
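
One way to see what rs measures is to compute the ordinary linear correlation coefficient on the ranks instead of the original values. The sketch below does that with hypothetical helper names and made-up data in which y = x², a relationship that is increasing but not linear.

```python
import math

def ranks(values):
    """Ranks from smallest to largest; tied values share the mean of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]

def pearson_r(x, y):
    """Linear correlation coefficient r for paired data."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rs(x, y):
    """Rank correlation coefficient: r computed on the ranks of x and y."""
    return pearson_r(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]   # made-up data: y = x squared
r = pearson_r(x, y)
rs = spearman_rs(x, y)
print(r, rs)
```

Here rs reaches 1 because the ranks match perfectly, while r stays below 1 because the points do not fall on a straight line.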

Be able to use Table A-2 to find probabilities

Use the area. That is the probability.

Understand SPSS output

When looking at the differences between groups, look at the p-values and see whether each one falls under the 0.05 bar. Don't look only at the ANOVA table above; also look at the LSD ("Least Significant Difference") post hoc table. Post hoc tests: there is no general agreement on which one is best.

Given the table on page 289, be able to determine how to calculate the correct Z score

When working with an individual value from a normally distributed population, use z = (x - μ) / σ. When working with a mean from some sample of n values, be sure to use σ / (the square root of n) for the standard deviation of the sample means, so use z = (x̄ - μ) / (σ / sqrt(n)).
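
The difference between the two z scores can be sketched with assumed numbers: a population with μ = 68 and σ = 8, an individual value of 72, and a sample mean of 72 from n = 16 values. `NormalDist().cdf(z)` gives the area to the left of z, the same quantity that Table A-2 lists.

```python
from math import sqrt
from statistics import NormalDist

def z_individual(x, mu, sigma):
    """z score for a single value x."""
    return (x - mu) / sigma

def z_sample_mean(xbar, mu, sigma, n):
    """z score for the mean of a sample of n values: uses sigma / sqrt(n)."""
    return (xbar - mu) / (sigma / sqrt(n))

mu, sigma = 68, 8   # assumed population values for illustration

z1 = z_individual(72, mu, sigma)
z2 = z_sample_mean(72, mu, sigma, n=16)

# Area to the right of each z score under the standard normal curve
p1 = 1 - NormalDist().cdf(z1)
p2 = 1 - NormalDist().cdf(z2)
print(z1, p1)
print(z2, p2)
```

The same value of 72 is unremarkable for one individual but much less likely as the mean of 16 values, because sample means vary less than individual values.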

Be able to interpret a Wilcoxon Signed-Ranks Test printout from SPSS

You will be able to see how many differences are negative, how many are positive, and how many are tied. You will see a T test statistic or a z test statistic, depending on the sample size.

Understand what is meant by Z score critical values.

a critical value is a z score on the borderline separating those z scores that are significantly low or significantly high.

Understand what a random variable is

A random variable is a variable that has a single numerical value, determined by chance, for each outcome of a procedure. A discrete random variable has either a finite or a countable number of values. A continuous random variable has infinitely many values associated with measurements.

Know what a z score allows you to compare

A z score allows you to compare values from different data sets on a common scale, and to see whether a value is significantly low or high (at least two standard deviations from the mean).

Understand the terms "at most" "no more than" and "at least"

at most k: k is the largest value allowed (x ≤ k). no more than k: same meaning, k is the largest value allowed (x ≤ k). at least k: k is the smallest value allowed (x ≥ k).

Understand the following when it comes to histogram: bell-shape, uniform, skew to the right and left

bell-shape: normal distribution.
uniform: the different possible values occur with approximately the same frequency.
skew to the right: positively skewed, with a longer right tail.
skew to the left: negatively skewed, with a longer left tail.

Know what a probability distribution is

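A description that gives the probability for each value of a random variable. It is often expressed as a table, formula, or graph. It does not have to be a normal distribution.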

Be able to interpret a 5-Number Summary

consists of five values: minimum, first quartile, second quartile (same as median), third quartile, maximum.

Know what correlation means

exists between two variables when the values of one variable are somehow associated with the values of another variable

Know what linear correlation means

exists between two variables when there is a correlation and the plotted points of paired data result in a pattern that can be approximated by a straight line

Know what multiple regression is and why we use it

A multiple regression equation expresses a linear relationship between a response variable y and two or more predictor variables (x1, x2, ..., xk). The general form of a multiple regression equation obtained from sample data is ŷ = b0 + b1x1 + b2x2 + ... + bkxk. The coefficients b0, b1, ..., bk are sample statistics. Each of x1, x2, ..., xk is a single independent variable.

Be able to interpret what that Z score means.

how many standard deviations you are from the mean

Know what "efficiency" means

Efficiency describes how much evidence is necessary to reject a null hypothesis. For example, with an efficiency of 0.95, a nonparametric test needs 100 sample values to achieve the same results as a parametric test with only 95.

Given the instructions, know what claims a Sign Test can test

The sign test can test claims involving matched pairs of sample data, nominal data with two categories, or the median of a single population. If you have fewer than five girls and five boys, then you would have to do a sign test.

Know how to calculate the three measures of center: mean, median, mode. When to use each. 99-102

mean: add all the values and divide the total by the number of values (not resistant).
median: the middle value when the values are arranged in order (resistant).
mode: the value that occurs with the greatest frequency.
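
A short sketch with Python's `statistics` module shows the three measures and why the mean is not resistant. The two data sets are made up and differ only in their last value.

```python
import statistics

typical = [1, 2, 2, 3, 4]          # made-up values
with_outlier = [1, 2, 2, 3, 100]   # same values, but the largest is replaced by an outlier

for data in (typical, with_outlier):
    print(
        "mean:", statistics.mean(data),
        "median:", statistics.median(data),
        "mode:", statistics.mode(data),
    )
```

The outlier drags the mean from 2.4 up to 21.6, while the median and the mode both stay at 2.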

Know the difference between a parameter and a statistic

parameter: a numerical measurement describing some characteristic of a population. Statistic: a numerical measurement describing some characteristic of a sample.

Know the difference between a proportion and a mean

proportion: the fraction of values that fall in a category (yes/no type data). mean: the average of quantitative values.

Be able to calculate range

range: the difference between the maximum data value and the minimum data value

F distribution formula

F = (variance between samples) / (variance within samples). See the full formula in Triola.
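
The ratio can be sketched for one-way ANOVA as follows. This assumes the usual definitions: the variance between samples is Σ n_i (x̄_i - x̄)² / (k - 1), and the variance within samples is the pooled variance Σ (n_i - 1) s_i² / (N - k). The function name and the three small samples are made up; check the notation against the formula in Triola.

```python
import statistics

def one_way_f(*samples):
    """F = (variance between samples) / (variance within samples)."""
    k = len(samples)
    all_values = [v for s in samples for v in s]
    grand_mean = statistics.mean(all_values)
    # Variation of the sample means around the overall mean
    between = sum(
        len(s) * (statistics.mean(s) - grand_mean) ** 2 for s in samples
    ) / (k - 1)
    # Pooled variation of the values inside each sample
    within = sum(
        (len(s) - 1) * statistics.variance(s) for s in samples
    ) / (len(all_values) - k)
    return between / within

f_stat = one_way_f([1, 2, 3], [2, 3, 4], [5, 6, 7])   # made-up samples
print(f_stat)
```

Because both parts of the ratio are variances, F can never be negative, and a large F indicates that the sample means differ by more than the spread within the samples would explain.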

Be able to use Formula 6-2 to convert values to Z scores in order to find probabilities.

z = (x - μ) / σ (round z scores to two decimal places)

