Math 202: Probability and Statistics Final Test Prep
Quartile
Division of ordered data into 4 equal parts.
Stemplot
A graphical representation of a quantitative data set. The leading digits of each data point are presented as stems and the final digits are given as leaves.
Ogive
A line graph that depicts cumulative frequencies.
Correlation
A measure of the extent to which two factors vary together.
Standard Deviation
A measure of variability that describes the typical distance of the scores from the mean (σ for a population, s for a sample).
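As a minimal sketch, the population and sample standard deviations can be computed by hand in Python. The scores are made-up illustration data, and the variable names are assumptions rather than anything from the course.

```python
import math

# Hypothetical scores, chosen only for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(scores) / len(scores)

# Sum of squared distances of every score from the mean.
squared_deviations = sum((x - mean) ** 2 for x in scores)

# Population standard deviation (sigma): divide by n.
sigma = math.sqrt(squared_deviations / len(scores))

# Sample standard deviation (s): divide by n - 1 instead.
s = math.sqrt(squared_deviations / (len(scores) - 1))
```

Dividing by n - 1 makes s slightly larger than sigma for the same data, which matters most for small samples.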
Probability
A number between 0 and 1 that describes how likely it is that an event will occur.
Parameter
A numerical measurement describing some characteristic of a population.
Statistic
A numerical measurement describing some characteristic of a sample.
Degrees of Freedom
A parameter of the t distribution. When the t distribution is used in the computation of an interval estimate of a population mean, the appropriate t distribution has n-1 degrees of freedom, where n is the size of the simple random sample.
z-Test
A parametric inferential statistical test of the null hypothesis for a single sample where the population standard deviation is known.
t-Test
A parametric inferential statistical test of the null hypothesis for a single sample where the population standard deviation is unknown.
Outcome
A possible result of a probability experiment
Independent
A relationship between two events or two sets of data in which the outcome of one has no effect on the outcome of the other.
Experiment
A research method in which an investigator manipulates one or more factors to observe the effect on some behavior or mental process
Quota Sample
A sample deliberately constructed to reflect several of the major characteristics of a given population.
Systematic Sample
A sample drawn by selecting individuals systematically from a sampling frame.
Random Sample
A sample in which every element in the population has an equal chance of being selected.
Convenience Sample
A sample that includes members of the population that are easily accessed.
Normal
A distribution that is symmetric and bell-shaped and follows the Empirical Rule.
Cluster Sample
A sampling design in which entire groups are chosen at random.
Line of Best Fit
A straight line that comes closest to the points on a scatter plot.
Two-way Table
A table containing counts for two categorical variables. It has r rows and c columns.
Frequency Table
A table for organizing a set of data that shows the number of times each item or number appears.
Wording Bias
A type of response bias where the question is posed to achieve a desired result.
Standardized Value
A value found by subtracting the mean and dividing by the standard deviation.
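The calculation can be sketched in a few lines of Python. The function name `z_score` and the exam numbers are hypothetical.

```python
def z_score(x, mean, sd):
    """Standardize x: how many standard deviations it lies from the mean."""
    return (x - mean) / sd

# Hypothetical exam with mean 70 and standard deviation 8.
z = z_score(86, 70, 8)
```

A score of 86 on this exam sits two standard deviations above the mean, so its standardized value is 2.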
Lurking Variable
A variable other than x and y that simultaneously affects both variables, accounting for the correlation between the two.
Sample Space
All possible outcomes of an experiment.
Type I Error
An error that occurs when a researcher concludes that the independent variable had an effect on the dependent variable, when no such relation exists; a false positive.
Type II Error
An error that occurs when a researcher concludes that the independent variable had no effect on the dependent variable, when in truth it did; a false negative.
Binomial
An experiment with a fixed number of independent trials, each having two possible outcomes (success or failure) and the same probability of success.
Geometric
An experiment with no set number of trials; independent trials are repeated until the first success occurs.
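To make the contrast between the binomial and geometric settings concrete, here is a small sketch using the standard textbook probability formulas. The formulas, function names, and examples are not taken from these notes and should be read as assumptions.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def geometric_pmf(k, p):
    """P(the first success occurs on trial k), for k = 1, 2, 3, ..."""
    return (1 - p) ** (k - 1) * p

# Hypothetical examples: exactly 5 heads in 10 fair coin flips (fixed number of trials),
# and the first six appearing on the 3rd roll of a fair die (trials until a success).
p_five_heads = binomial_pmf(5, 10, Fraction(1, 2))
p_first_six_on_third = geometric_pmf(3, Fraction(1, 6))
```

Using `Fraction` keeps the answers exact (63/256 and 25/216 here) instead of rounded decimals.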
Observational Study
A study which observes individuals and measures variables of interest but does not attempt to influence the responses.
Outlier
An extreme deviation from the mean.
Confounded Variable
An unintended difference between the conditions of an experiment that could have affected the dependent variable.
Undercoverage
Occurs when some groups in the population are left out of the process of choosing the sample.
Experimental Probability
Probability based on what happens when an experiment is actually done.
IQR
Range of the middle 50% of the values; Q3-Q1 = 75th percentile - 25th percentile.
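A short sketch of the IQR calculation using Python's `statistics.quantiles`. The data set is hypothetical, and the "inclusive" quartile rule is only one of several conventions, so a textbook or calculator may report slightly different quartiles.

```python
import statistics

# Hypothetical data set, already sorted for readability.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# The three quartiles cut the ordered data into four equal parts.
# method="inclusive" is an assumption; other quartile rules exist.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

# IQR: range of the middle 50% of the values.
iqr = q3 - q1
```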
Steps Used in a Hypothesis Test
Regardless of the type of hypothesis being considered, the process of carrying out a significance test is the same and relies on four basic steps:
Step One: State the null and alternative hypotheses (see section 11.2). Also think about the Type I error (rejecting a true null) and Type II error (failing to reject a false null) possibilities at this time and how serious each mistake would be in terms of the problem.
Step Two: Collect and summarize the data so that a test statistic can be calculated. A test statistic is a summary of the data that measures the difference between what is seen in the data and what would be expected if the null hypothesis were true. It is typically standardized so that a p-value can be obtained from a reference distribution like the normal curve.
Step Three: Use the test statistic to find the p-value. The p-value is the probability of getting our test statistic or any test statistic more extreme, if in fact the null hypothesis is true. For a one-sided "greater than" alternative hypothesis, "more extreme" refers to test statistic values larger than the one observed. For a one-sided "less than" alternative hypothesis, "more extreme" refers to test statistic values smaller than the one observed. For a two-sided "not equal to" alternative hypothesis, "more extreme" refers to test statistic values that are farther away from the null hypothesis than the one observed, at either the upper end or the lower end of the reference distribution (both "tails").
Step Four: Interpret what the p-value is telling you and make a decision using the p-value. Does the null hypothesis provide a reasonable explanation of the data or not? If not, the result is statistically significant and we have evidence favoring the alternative. State a conclusion in terms of the problem.
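The four steps can be sketched for a one-sample, two-sided z-test. Every number below (null value, population standard deviation, sample size, sample mean, alpha) is hypothetical, and a z-test is assumed only because the population standard deviation is treated as known.

```python
import math
from statistics import NormalDist

# Step One: H0: mu = 100 versus Ha: mu != 100 (two-sided).
mu_0 = 100
sigma = 15      # population standard deviation, assumed known
alpha = 0.05

# Step Two: summarize the data and compute the standardized test statistic.
n = 36
sample_mean = 105
standard_error = sigma / math.sqrt(n)
z = (sample_mean - mu_0) / standard_error

# Step Three: p-value = probability of a statistic at least this extreme,
# counting both tails because the alternative is "not equal to".
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Step Four: compare the p-value with alpha and make a decision.
reject_null = p_value < alpha
```

Here z works out to 2 and the p-value is a little under 0.05, so at alpha = 0.05 the null hypothesis would be rejected. For a one-sided alternative, only the single relevant tail would be used in Step Three.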
Marginal Frequency
Row and column totals in a contingency table (cross-tabulation) that represent the univariate frequency distributions for the row and column variables.
Snowball Sample
Samples in which informants provide contact information about other people who share some of the characteristics necessary for a study.
Empirical Rule
States that, in a normal distribution, about 68% of the values are within one standard deviation of the mean, about 95% are within two standard deviations, and about 99.7% are within three standard deviations (normal curve).
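These percentages can be checked against the normal curve itself. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

# Rounded, these come out to roughly 0.68, 0.95, and 0.997.
within_1, within_2, within_3 = (share_within(k) for k in (1, 2, 3))
```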
Descriptive Statistics
Statistical procedures used to describe characteristics and responses of groups of subjects.
Inferential Statistics
Statistics that are used to interpret data and draw conclusions.
Statistical Method
Step 1 Prepare: 1) Context: What is the goal of the study? 2) Source of the data: Does the data come from a source with a conflict of interest? 3) Sampling method: Which method was used?
Step 2 Analyze: 1) Graph the data. 2) Explore the data (are there outliers? how is the data distributed?). 3) Apply the statistical method.
Step 3 Conclude: Is there statistical significance?
Experiment
The act of conducting a controlled test or investigation.
Simulation
The act of repeating an experiment to get more accurate statistical evidence.
Margin of Error
The range of percentage points in which the sample accurately reflects the population, the range surrounding a sample's response within which researchers are confident the larger population's true response would fall.
Theoretical Probability
The ratio of the number of favorable outcomes to the number of possible outcomes if all outcomes have the same chance of happening.
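As an illustration, the ratio can be computed by listing a sample space of equally likely outcomes and counting the favorable ones. The two-dice example is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
sample_space = list(product(range(1, 7), repeat=2))

# Event: the two dice sum to 7.
event = [outcome for outcome in sample_space if sum(outcome) == 7]

# Theoretical probability = favorable outcomes / possible outcomes.
p_sum_is_7 = Fraction(len(event), len(sample_space))
```

Six of the 36 outcomes sum to 7, so the probability reduces to 1/6.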
Central Limit Theorem
The sampling distribution of the mean approaches a normal distribution as the sample size n increases, regardless of the shape of the population (a common rule of thumb is n > 30).
Sample Space
The set of all possible outcomes
Standard Error
The standard deviation of a sampling distribution.
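A rough simulation sketch of both ideas: sample means drawn from a skewed population cluster around the population mean, and their spread is close to sigma / sqrt(n). The population (exponential with mean 1 and standard deviation 1), the sample size, and the number of repetitions are all assumptions chosen for illustration.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

n = 40              # size of each sample
num_samples = 5000  # number of samples drawn

# Population: exponential with rate 1, strongly right-skewed,
# with mean 1 and standard deviation 1.
sample_means = []
for _ in range(num_samples):
    sample = [random.expovariate(1.0) for _ in range(n)]
    sample_means.append(sum(sample) / n)

mean_of_means = statistics.mean(sample_means)  # should land near 1
observed_se = statistics.stdev(sample_means)   # spread of the sample means
predicted_se = 1 / math.sqrt(n)                # sigma / sqrt(n)
```

Plotting `sample_means` as a histogram would show a roughly bell-shaped pile even though the population itself is far from normal.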
Interpolation
Using the Least Squares Regression Line to predict a y-value for an x-value within the range of the observed x-values.
Discrete Random Variable
Variable where the number of outcomes can be counted and each outcome has a measurable and positive probability.
Population Probability
The proportion of times you expect something to occur when you draw randomly from a population.
Coefficient of Determination
Measures the percentage of variation in a dependent variable explained by one or more independent variables (r^2).
Scatter Plot
A graph with points plotted to show a possible relationship between two sets of data.
Scatterplot
A graphed cluster of dots, each of which represents the values of two variables.
Data
Information gathered from observations.
Sample
Items selected at random from a population and used to test hypotheses about the population.
Histogram
A bar graph that shows the frequency of data within equal intervals.
Non-Response Bias
A bias caused by a number of people who did not respond to the survey.
Causation
A cause and effect relationship in which one variable controls the changes in another variable.
Event
A collection of one or more outcomes of an experiment
Spread
A descriptive feature that describes the range of the data graphically.
Center
A descriptive feature which describes the placement and relation of the median to the other parts of the graphic representation.
Bar Graph
A graph that uses horizontal or vertical bars to display data
Response Bias
Anything in the survey design that influences the responses from the sample.
Voluntary Response Bias
Bias introduced to a sample when individuals can choose on their own whether to participate in the sample.
Law of Large Numbers
Law stating that as the number of items taken at random from a population grows, the sample statistics (such as the sample mean) tend to get closer to the corresponding population values.
Qualitative
Data identified by something other than numbers.
Quantitative
Data that are numerically defined.
Boxplot
Displays the 5-number summary as a central box with whiskers that extend to the non-outlying data values.
Mutually Exclusive
Events that cannot occur at the same time: if one occurs, the other cannot, so P(A and B) = 0. This is not the same as independence.
Matched Pairs
Either two measurements are taken on each individual such as pre and post OR two individuals are matched by a third variable (different from the explanatory variable and the response variable) such as identical twins.
Extrapolation
Estimating a value outside the range of measured data.
Simple Random Sample
A sample of size n chosen so that every possible sample of size n, and therefore every member of the population, has an equal chance of selection.
Placebo Effect
Experimental results caused by expectations alone; any effect on behavior caused by the administration of an inert substance or condition, which is assumed to be an active agent.
Dotplot
Graphs a dot for each case against a single axis.
Trial
In probability, a single repetition or observation of an experiment
Mean
The arithmetic average of a distribution, obtained by adding the scores and then dividing by the number of scores.
Block Design
The random assignment of subjects to treatments is carried out separately within each block.
Mode
The value that occurs most often in a set of data.
Residual
The difference between an observed value of the response variable and the value predicted by the regression line.
Sampling Distribution
The distribution of values taken by the statistic in all possible samples of the same size from the same population.
Population
The entire aggregation of items from which samples can be drawn.
Null Hypothesis
The hypothesis that states there is no difference between two or more sets of data in a significance test.
Alternative Hypothesis
The hypothesis which states the Null Hypothesis is incorrect in a significance test.
Probability
The likelihood that a particular event will occur.
Least Squares Regression Line
The line that minimizes the sum of squared residuals.
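The sketch below fits a least-squares line by hand and ties it to the residual and coefficient of determination entries. The data points are hypothetical, and the slope and intercept formulas are the standard textbook ones rather than anything stated in these notes.

```python
# Hypothetical (x, y) data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)

# Least-squares slope and intercept: the choice that minimizes
# the sum of squared residuals.
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Residual = observed y minus the y predicted by the line.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
sse = sum(e ** 2 for e in residuals)

# Coefficient of determination: share of the variation in y explained by the line.
r_squared = 1 - sse / s_yy
```

For this data the fitted line is approximately y = 2.2 + 0.6x, with r^2 of about 0.6. The residuals of a least-squares line with an intercept always sum to zero, up to rounding.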
Correlation
The measure of a relationship between two variables or sets of data.
Median
The middle score in a distribution; half the scores are above it and half are below it.
Joint Frequency
The count in a single cell of a two-way table: the number of responses that share a given combination of row and column categories.
Stratified Sample
The population is divided into strata and a random sample is taken from each stratum.
p-Value
The probability of getting a result at least as extreme as the result given from the test, assuming the null hypothesis is true. The lower the value, the stronger the evidence against the null hypothesis.
Conditional Probability
The probability that a particular event will occur, given that another event has already occurred.
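A small sketch using a two-way table of counts, based on the standard relationship P(A given B) = P(A and B) / P(B). The table, its categories, and the counts are hypothetical.

```python
from fractions import Fraction

# Hypothetical two-way table: (studied?, result) -> number of students.
counts = {
    ("studied", "passed"): 40,
    ("studied", "failed"): 10,
    ("did not study", "passed"): 20,
    ("did not study", "failed"): 30,
}
total = sum(counts.values())

# Joint frequency: a single cell of the table.
joint_studied_passed = counts[("studied", "passed")]

# Marginal frequency: a row total.
marginal_studied = sum(c for (row, _), c in counts.items() if row == "studied")

# Conditional probability: P(passed | studied) = P(passed and studied) / P(studied).
p_passed_and_studied = Fraction(joint_studied_passed, total)
p_studied = Fraction(marginal_studied, total)
p_passed_given_studied = p_passed_and_studied / p_studied
```

Among the 50 students who studied, 40 passed, so the conditional probability is 4/5, compared with an overall pass rate of 60/100.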
Statistical Significance
When the computed p-value is less than alpha (.05 if not given). It indicates that chance alone would rarely produce an equally extreme result.
Outlier
A number in a set of data that is much larger or much smaller than most of the other numbers in the set.
Box-and-Whisker Plot
Shows the distribution of data. The middle half of the data is represented by a "box" with a vertical line at the median. The lower fourth and upper fourth are represented by "whiskers" that extend to the smallest and largest values.
Histogram
Shows or compares the frequency of continuous data.
Chi-Squared Goodness of Fit
Uses sample data to test hypotheses about the shape or proportions of a population distribution. The test determines how well the obtained sample proportions fit the population proportions specified by the null hypothesis.
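A minimal sketch of the goodness-of-fit calculation, using the standard chi-squared statistic (the sum of (observed - expected)^2 / expected). The die-roll counts are hypothetical, and the critical value is the usual table entry for 5 degrees of freedom at alpha = 0.05.

```python
# Hypothetical counts from 60 rolls of a die; H0 says each face is equally likely.
observed = [5, 8, 9, 8, 10, 20]
expected = [sum(observed) / len(observed)] * len(observed)  # 10 per face under H0

# Chi-squared statistic: how far the observed counts are from the expected counts.
chi_squared = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Degrees of freedom = number of categories - 1.
degrees_of_freedom = len(observed) - 1

# Critical value for df = 5 at alpha = 0.05, taken from a standard chi-squared table.
critical_value = 11.07
reject_null = chi_squared > critical_value
```

The statistic here is about 13.4, which exceeds the critical value, so these counts would be judged a poor fit to a fair die at the 0.05 level.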