Geog 200 Final
What are Type 1 and Type 2 errors in statistics and how do they relate to the significance level of a statistical test?
A Type I error rejects the null hypothesis when it is actually true. This is also known as a false positive: seeing a relationship where one does not exist. A Type II error accepts the null hypothesis when it is actually false. This is also known as a false negative: not seeing a relationship where it does exist. A false positive is typically considered worse. Selecting a significance level determines the chance of making a Type I error.
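A minimal sketch in Python (illustrative, not part of the course notes) of the link between the significance level and Type I errors: when the null hypothesis is really true, a test run at alpha = 0.05 should reject it (a false positive) about 5% of the time. The sample sizes and seed below are arbitrary choices.

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = 0.05          # chosen significance level
false_positives = 0
trials = 10_000

for _ in range(trials):
    # Both samples come from the same population, so the null
    # hypothesis (no difference in means) is actually true.
    a = rng.normal(loc=0, scale=1, size=30)
    b = rng.normal(loc=0, scale=1, size=30)
    _, p = stats.ttest_ind(a, b)
    if p < alpha:     # rejecting a true null = Type I error
        false_positives += 1

print(false_positives / trials)   # comes out close to alpha (~0.05)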
What is meant by a normal distribution? Use a diagram to show how the mean, the standard deviation, and representative z-scores of a variable relate to the normal distribution.
A normal distribution is represented by a bell-shaped curve with its peak at the mean. A z-score, z = (x - mean) / standard deviation, gives the number of standard deviations a value lies above or below the mean (a z-score of zero corresponds to the mean itself). For example, a z-score of one represents one standard deviation above the mean. Values above the mean have positive z-scores, values below the mean have negative z-scores, and the larger the absolute value of the z-score, the further the value is from the mean.
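Since the question asks for a diagram, here is a minimal Python/matplotlib sketch (my own illustration, not from the notes) that draws a standard bell curve and marks the mean and z-scores of +/-1 and +/-2, i.e. one and two standard deviations from the mean.

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

mu, sigma = 0, 1                      # standard normal: mean 0, SD 1
x = np.linspace(-4, 4, 400)
plt.plot(x, stats.norm.pdf(x, mu, sigma))

# Mark the mean (z = 0) and z-scores of +/-1 and +/-2
for z in [-2, -1, 0, 1, 2]:
    plt.axvline(mu + z * sigma, linestyle="--", color="gray")
    plt.text(mu + z * sigma, 0.02, f"z={z}", ha="center")

plt.title("Normal distribution: peak at the mean, z = SDs from the mean")
plt.xlabel("z-score")
plt.ylabel("density")
plt.show()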
What is meant by the statement "you can't prove anything with [inferential] statistics"?
Conclusions drawn with inferential statistics go beyond the data themselves. Inferential statistics can be used to infer characteristics of a population from a sample, to compare characteristics of different samples, to test the relationship between variables, or to predict future behavior, rather than proving something outright with calculations.
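As a hedged illustration (the data below are invented), inferring a population mean from a sample might look like this in Python: we never "prove" the population mean, we only estimate it with a confidence interval.

import numpy as np
from scipy import stats

# Hypothetical sample of 25 household incomes (thousands of dollars)
rng = np.random.default_rng(1)
sample = rng.normal(loc=52, scale=8, size=25)

mean = sample.mean()
ci = stats.t.interval(0.95, len(sample) - 1,
                      loc=mean, scale=stats.sem(sample))

# The interval is a statement of probability, not proof: we are 95%
# confident that the population mean lies somewhere in this range.
print(f"sample mean = {mean:.1f}, 95% CI = ({ci[0]:.1f}, {ci[1]:.1f})")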
Nominal
Data are differentiated by name only. There are no inherent numerical values, nor can it be said that one data point is necessarily bigger, better, etc.; they are just "different." Nominal data can be numbers, but the numbers have no numerical significance and merely serve as identifiers (e.g. zip codes, the number on a player's jersey). Categories are mutually exclusive: no observation can be classified in more than one category ("If it's A, it can't be B"). Categories are exhaustive: the categories cover the entire data set (there is a category for every observation).
Ordinal
Ordinal data do not have a specific numerical value, but can be ranked according to some quality or characteristic. Examples include many ordered survey responses (e.g. poor, fair, good, excellent) and 1st, 2nd, 3rd rankings. Arithmetic operations cannot be performed on these data. Weakly ordered: each observation fits into a ranked category (e.g. poor, fair, good; 1-5, 6-10, 11-15, 16-20, 21+). Strongly ordered: each observation is given a specific position in a rank-order scheme (e.g. a "Top Ten List").
Compare and contrast the classical hypothesis testing approach and the probability value (p-value) approach to inferential statistics.
In the classical hypothesis testing approach, the proposed explanation (hypothesis) is stated and the acceptance criteria (significance levels) are selected beforehand; hypothesis testing uses significance levels and null hypotheses. In the probability value approach, the test is run first and the results are used to draw inferences, using probability values. For example, a p-value of 0.01 corresponds to 99% confidence that there is a legitimate difference; the p-value gives the probability that the result arose from random chance.
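A minimal Python sketch (with invented data) contrasting the two approaches on the same one-sample t-test: the classical route compares the test statistic against a critical value chosen in advance, while the p-value route reports the probability directly.

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
sample = rng.normal(loc=10.6, scale=2, size=20)
t_stat, p_value = stats.ttest_1samp(sample, popmean=10)   # H0: mean = 10

# Classical approach: pick alpha beforehand, compare |t| to t-crit
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=len(sample) - 1)    # two-tailed
print("classical:", "reject H0" if abs(t_stat) > t_crit else "accept H0")

# P-value approach: run the test, then interpret the probability itself
# (e.g. p = 0.01 -> 99% confidence the difference is not random chance)
print("p-value approach: p =", round(p_value, 3))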
Interval
Interval data have a specific numerical value, but are measured on a scale with an arbitrary zero point. Addition and subtraction are meaningful, but multiplication and division (ratios) are not, because the zero point is arbitrary. Classic examples are the Celsius and Fahrenheit temperature scales.
How do parametric and non-parametric statistical tests differ? What requirements does a data set have to meet for parametric testing to be used?
Parametric tests require interval or ratio scale data and assume "normality" in the data distribution. Non-parametric tests allow data at any measurement level to be used and do not assume normality. Because non-parametric tests incorporate less information, they yield less certain outcomes for a given sample size than parametric tests; therefore non-parametric tests need a larger sample size to achieve a similar significance level, and they may yield less conclusive results.
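A hedged Python sketch (the data are invented) running a parametric test and a common non-parametric counterpart on the same two samples; pairing an independent-samples t-test with the Mann-Whitney U test is my choice of example, not something specified in the notes.

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
group_a = rng.normal(loc=50, scale=10, size=25)   # interval/ratio data
group_b = rng.normal(loc=56, scale=10, size=25)

# Parametric: t-test assumes roughly normal, interval/ratio data
t_stat, p_param = stats.ttest_ind(group_a, group_b)

# Non-parametric: Mann-Whitney U works on ranks, no normality assumption
u_stat, p_nonparam = stats.mannwhitneyu(group_a, group_b,
                                        alternative="two-sided")

# With the same samples the non-parametric p-value is usually somewhat
# larger (less powerful), reflecting the information lost by ranking.
print(f"t-test p = {p_param:.4f},  Mann-Whitney p = {p_nonparam:.4f}")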
Discuss some of the factors you might consider when deciding how large a sample size to employ in a study.
Factors include the sampling technique used, the population parameter being estimated, the desired degree of precision, and the desired level of confidence; the researcher must strike a balance between resource expenditures and acceptable error levels when deciding whether to increase the sample size. The necessary sample size also depends on the statistic being calculated (e.g. mean, total, proportion), the characteristics of the population, and the type of sampling. Use a larger sample size to decrease error, and use a larger sample size if there is a lot of variation in the population.
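A minimal sketch of how precision, confidence, and variability interact when estimating a mean, using the standard n = (z * sigma / E)^2 formula; the population standard deviation and error tolerances below are invented for illustration.

import math
from scipy import stats

def sample_size_for_mean(sigma, error, confidence=0.95):
    """Required n to estimate a mean within +/- error at a given confidence."""
    z = stats.norm.ppf(1 - (1 - confidence) / 2)   # two-tailed z value
    return math.ceil((z * sigma / error) ** 2)     # round up to whole units

# More variation (sigma), tighter precision (smaller error), or higher
# confidence all push the required sample size up.
print(sample_size_for_mean(sigma=15, error=5, confidence=0.95))   # ~35
print(sample_size_for_mean(sigma=15, error=2, confidence=0.95))   # ~217
print(sample_size_for_mean(sigma=15, error=5, confidence=0.99))   # ~60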
Ratio scale
Specific numerical values for which there is a fixed zero point. Often combined with interval scale data, to form interval-ratio scale data. Examples include linear distance, area, population, GDP, sunlight hours, etc. All arithmetic operations can be performed on ratio-scale data.
Inferential Statistics
The use of statistical techniques to make conclusions (inferences) that go beyond the data themselves. Inferential statistics are used to make generalizations about a population on the basis of a sample, and include estimation and hypothesis testing.
What three factors determine the appropriate value of the critical statistic for a t-test? Explain what happens to the critical value as each of these three factors changes.
Use a t-test if the sample size is less than 30. The chosen significance level (α), the number of tails in the test, and the number of degrees of freedom (sample size) determine the appropriate value of the critical statistic. As α increases, t-crit decreases (e.g. if α increases from 0.01 to 0.05, t-crit decreases). When a test goes from one-tailed to two-tailed, t-crit increases. When the sample size increases, the absolute value of t-crit, |t-crit|, decreases.
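A minimal Python check (using scipy, my assumed tool here) showing how the critical t value moves as each of the three factors changes from a baseline of α = 0.05, two-tailed, n = 15.

from scipy import stats

# Baseline: alpha = 0.05, two-tailed, n = 15 (df = 14)
base = stats.t.ppf(1 - 0.05 / 2, df=14)

# 1. Larger alpha -> smaller t-crit
looser = stats.t.ppf(1 - 0.10 / 2, df=14)

# 2. One-tailed -> smaller t-crit than two-tailed at the same alpha
one_tail = stats.t.ppf(1 - 0.05, df=14)

# 3. Larger sample (more degrees of freedom) -> smaller |t-crit|
bigger_n = stats.t.ppf(1 - 0.05 / 2, df=29)

print(round(base, 3), round(looser, 3), round(one_tail, 3), round(bigger_n, 3))
# roughly 2.145, 1.761, 1.761, 2.045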
Confidence level
probability that a statistic was not the result of "random chance" (Confidence Level = 1 - α). So for α = 0.05, the confidence level is 0.95, or 95%. Confidence levels are used as minimum thresholds to be met during hypothesis testing.
Significance Level
probability that the result of a statistical test could have arisen from "random chance" (e.g. α = 0.05 means that there is a 5% chance that a result came from randomness). In p-value testing, the p-value plays an equivalent role: it reports the probability that the result came from randomness.
Classic Hypothesis Testing
proposed explanation is stated and acceptance criteria are selected beforehand (uses significance levels)
Descriptive Statistics
summarize pertinent characteristics of a data set (mean, range, standard deviation), whereas inferential statistics go beyond the data themselves
Probability (P-value) Testing
test is run first and results are used to draw inferences (uses probability values)