Statistics Study Guide
Factorial
n! = n × (n-1) × (n-2) × ... × 3 × 2 × 1
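The definition can be checked with a short Python sketch. The function name `factorial` is illustrative and not part of this guide, and 0! = 1 is the usual convention rather than something stated above.

```python
import math

def factorial(n: int) -> int:
    """Return n! = n * (n-1) * ... * 2 * 1, taking 0! = 1 by convention."""
    if n < 0:
        raise ValueError("n must be a non-negative integer")
    result = 1
    for k in range(2, n + 1):  # multiply 2, 3, ..., n together
        result *= k
    return result

print(factorial(5))  # 5 * 4 * 3 * 2 * 1 = 120
```

The standard library's `math.factorial` gives the same values and is the better choice in practice.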
Formula for sample standard deviation
s = √[∑(xᵢ - x-bar)² / (n - 1)]
Formula for sample variance
s² = ∑(xᵢ - x-bar)² / (n - 1)
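A minimal sketch of the sample variance and sample standard deviation, written out term by term. The data values and function names are made up for illustration.

```python
import math
import statistics

def sample_variance(data):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)

def sample_std(data):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(data))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample; its mean is 5
print(sample_variance(data))     # equals 32/7
print(sample_std(data))
```

Python's `statistics.variance` and `statistics.stdev` use the same n - 1 divisor, so they can be used to check hand calculations.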
Sample-mean
x-bar; the mean of the sample data
Formula for sample z-score
z = (x - x-bar) / s
Formula for population z-score
z = (x - µ) / σ
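Both z-score formulas have the same shape: subtract a center, then divide by a spread. A small sketch, with a hypothetical exam score as the example (the numbers are invented):

```python
def z_score(x, center, spread):
    """Number of standard deviations that x lies above (+) or below (-) the center."""
    if spread <= 0:
        raise ValueError("the standard deviation must be positive")
    return (x - center) / spread

# Hypothetical example: a score of 85 when the mean is 70 and the standard deviation is 10.
print(z_score(85, 70, 10))  # 1.5
```

Pass the sample mean and sample standard deviation for a sample z-score, or the population mean and population standard deviation for a population z-score.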
Formula for expected value
µ = E(X) = ∑ x p(x) = x₁ p(x₁) + x₂ p(x₂) + ...
Formula for population standard deviation
σ = √[∑(X - µ)² / N]
Formula for variance of a discrete random variable
σ² = E[(X - µ)²] = ∑(x - µ)² p(x) = ∑x² p(x) - µ²
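To see that the definition of the variance and the shortcut form agree, here is a sketch for one roll of a fair six-sided die. The die is an assumed example; exact fractions are used so no rounding is involved.

```python
from fractions import Fraction

# Probability distribution of a fair die: each face has probability 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# Expected value: sum of x * p(x).
mu = sum(x * p for x, p in pmf.items())

# Variance from the definition: sum of (x - mu)^2 * p(x).
var_definition = sum((x - mu) ** 2 * p for x, p in pmf.items())

# Variance from the shortcut: sum of x^2 * p(x), minus mu^2.
var_shortcut = sum(x ** 2 * p for x, p in pmf.items()) - mu ** 2

print(mu)              # 7/2
print(var_definition)  # 35/12
print(var_shortcut)    # 35/12
```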
Formula for population variance
σ² = ∑(X - µ)² / N
Descriptive statistics
A branch of statistics that uses numerical and graphical methods to look for patterns in a dataset and to present that information in a convenient form, without drawing conclusions beyond the data at hand
Inferential statistics
A branch of statistics that uses sample data to make estimates, predictions, and decisions about a population
Variable
A characteristic of interest about each individual element of a population or sample
Population
A collection of individuals or objects that is under study
Compound event
A composition of two or more events; it can be the result of a union or an intersection of events
Normal distribution
A continuous distribution with a symmetric, mound-shaped (bell-shaped) curve, used to model many population distributions
Interval estimator (confidence interval)
A formula that tells us how to use the sample data to calculate an interval that estimates the target parameter.
Histogram
A graphical representation of a frequency distribution in which observations are divided into classes
Box plot
A graphical representation of the distribution using five numbers: minimum, lower quartile, median, upper quartile, maximum
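The five numbers behind a box plot can be computed as sketched below. Quartile conventions differ between textbooks and software; this sketch assumes the "inclusive" interpolation method of Python's `statistics.quantiles`, and the data values are made up.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    ordered = sorted(data)
    # n=4 asks for the three cut points that split the data into four parts.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return (ordered[0], q1, median, q3, ordered[-1])

data = [7, 1, 13, 3, 9, 5, 11]  # hypothetical observations
summary = five_number_summary(data)
print(summary)  # (1, 4.0, 7.0, 10.0, 13)

# Upper quartile minus lower quartile: the interquartile range.
iqr = summary[3] - summary[1]
print(iqr)      # 6.0
```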
Pie chart
A graphical summary of data that uses a circle partitioned into sectors to show the relative frequency of items in each class
Bar chart
A graphical summary of data that uses bars of fixed width and varying height to show the frequency or relative frequency of items in each class
Symmetric distribution
A distribution whose left and right halves mirror each other about the center; mound- or bell-shaped examples have a high frequency of observations in the middle of the range
Subjective probability
A probability assigned based on the subjective judgment of an individual
Theoretical probability
A probability in which basic outcomes of the process are defined, probabilities are assigned to the basic outcomes and probabilities of compound events are computed
Experiment
A process that yields a single outcome that cannot be predicted with certainty
Bernoulli
A random variable that can assume only two possible values: 1 (success) and 0 (failure)
Discrete random variable
A random variable that takes either finitely many values or a countably infinite set of values
Point estimator
A rule or formula that tells us how to use the sample data to calculate a single number that can be used as an estimate of the target parameter (example: sample mean).
Simple random sample
A sample of size n selected so that every possible sample of size n, and therefore every element in the population, has an equal chance of being selected
Systematic sample
A sample in which the first element is picked at random, and then every kth element is picked
Cluster sample
A sample obtained by sampling some of, but not all of, the possible subdivisions within the population
Stratified random sample
A sample obtained by stratifying the sampling frame and then selecting a fixed number of elements from each stratum by simple random sampling
Countable set
A set that is finite or can be put in one-to-one correspondence with the positive integers
Split stem plot
A stem and leaf plot in which some stems are split into two parts to reduce the size of the plot
Sample
A subset of the population
Stem and leaf plot
A summary of data that divides each observation into two parts, with the leaves grouped on a stem
Relative frequency distribution
A tabular summary of data showing both the frequency and the relative frequency of items in each class
Frequency distribution
A tabular summary of data showing different classes and the frequency of items in each of several non-overlapping classes
Continuous random variable
A variable in which the set of possible values consists of one or more intervals on the number line
Random variable
A variable that assigns exactly one numerical value to each outcome of the sample space, S
Percentile ranking
A way to express where an observation falls in a dataset using percentiles
Complementary event
All sample points that do not belong to the event
Binomial experiment
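An experiment in which n identical, independent trials are repeated, each trial has only two possible outcomes (success or failure) with the same probability of success, and we are interested in the number of successes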
Outlier
An extreme observation that does not match the general pattern of a dataset
Real line
An interval of the form (-∞,∞)
Finite interval
An interval of the form (a,b) or [a,b]
Semi-infinite interval
An interval that is bounded on one side only, such as [0,∞)
Interquartile range
A measure of spread equal to the upper quartile minus the lower quartile; it is the width of the middle 50% of a dataset
Event
Any subset of the sample space
Sample point
Basic outcome of an experiment
Independence
Events for which the occurrence of one does not change the probability of the other: P(A|B) = P(A), or equivalently P(AB) = P(A)P(B)
Mutually exclusive events
Events that share no sample points
Chebyshev's rule
For any number k > 1, at least a fraction 1 - 1/k² of the data will lie within k standard deviations of the mean
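A quick numerical check of Chebyshev's rule on a small dataset. The data and function names are illustrative; the population standard deviation (`statistics.pstdev`) is an assumption made for this sketch.

```python
import statistics

def chebyshev_bound(k):
    """Minimum fraction of data within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's rule requires k > 1")
    return 1 - 1 / k ** 2

def fraction_within(data, k):
    """Observed fraction of data within k standard deviations of the mean."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return sum(abs(x - mean) <= k * sd for x in data) / len(data)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
for k in (2, 3):
    print(k, chebyshev_bound(k), fraction_within(data, k))
```

The observed fraction is never below the bound, and for most datasets it is well above it, because the rule makes no assumption about the shape of the distribution.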
Additive rule of probability
For any two events A and B, P(A∪B) = P(A) + P(B) - P(AB)
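The additive rule can be verified by counting sample points. This sketch assumes one roll of a fair die, with A = "even number" and B = "greater than 3" (both events are chosen only for illustration).

```python
from fractions import Fraction

S = set(range(1, 7))              # sample space for one roll of a fair die
A = {x for x in S if x % 2 == 0}  # event A: even number -> {2, 4, 6}
B = {x for x in S if x > 3}       # event B: greater than 3 -> {4, 5, 6}

def prob(event):
    """Probability of an event when all sample points are equally likely."""
    return Fraction(len(event), len(S))

left_side = prob(A | B)                        # P(A union B), counted directly
right_side = prob(A) + prob(B) - prob(A & B)   # additive rule

print(left_side, right_side)  # 2/3 2/3
```

Subtracting P(AB) corrects for the sample points 4 and 6, which would otherwise be counted twice.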
The empirical rule
For data with a bell-shaped distribution, approximately 68% of the observations will be within one standard deviation of the mean, approximately 95% of the observations will be within two standard deviations of the mean, and approximately 99.7% of the observations will be within three standard deviations of the mean
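The three percentages in the empirical rule come from the normal curve. As a sketch, they can be reproduced with `statistics.NormalDist` from Python's standard library (the helper name `prob_within` is made up):

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def prob_within(k):
    """Probability that a normal observation falls within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
```

The printed values are close to 0.68, 0.95, and 0.997, which is where the rule's rounded percentages come from.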
Examples of sample points
HH, HT, TH, TT
Probability density function (PDF)
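A function f(x) that gives the height of the density curve at x for a continuous random variable; probabilities are areas under this curve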
Distribution
How the observations are spread over the range of the data
Area under the curve
How to determine probability using a probability histogram
Central limit theorem
If the sample size n is large, then the sampling distribution of x-bar is approximately normal with mean µ and standard deviation σ/√n (variance σ²/n)
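A small simulation can make the theorem concrete. The sketch below assumes a population that is uniform on [0, 1] (mean 0.5, standard deviation √(1/12)), draws many samples of size n = 30, and compares the spread of the sample means with the population standard deviation divided by √n. The seed, sample size, and number of samples are arbitrary choices.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 30                  # size of each sample
num_samples = 2000      # number of samples drawn

# One sample mean per simulated sample.
sample_means = [
    statistics.fmean([rng.random() for _ in range(n)])
    for _ in range(num_samples)
]

mu = 0.5                   # mean of the uniform [0, 1] population
sigma = math.sqrt(1 / 12)  # standard deviation of the uniform [0, 1] population

print("mean of sample means:", statistics.fmean(sample_means), "vs", mu)
print("sd of sample means:  ", statistics.stdev(sample_means), "vs", sigma / math.sqrt(n))
```

A histogram of `sample_means` would look roughly bell-shaped even though the underlying population is flat.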
Basic counting principle
In an experiment done in two independent stages, where Stage I has m possible outcomes and Stage II has n possible outcomes, the experiment can be performed in (m)(n) ways
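As an illustration, take Stage I to be a coin toss (m = 2) and Stage II a die roll (n = 6). Both stages are assumed examples. Listing every pair of outcomes confirms the count (m)(n) = 12.

```python
from itertools import product

stage_one = ["H", "T"]           # m = 2 possible outcomes
stage_two = [1, 2, 3, 4, 5, 6]   # n = 6 possible outcomes

# Every way of pairing a Stage I outcome with a Stage II outcome.
outcomes = list(product(stage_one, stage_two))

print(len(outcomes))                   # 12
print(len(stage_one) * len(stage_two)) # 12
```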
Quartile
One of three numbers that partition an ordered dataset into four parts, each containing about 25% of the observations
Permutation
Ordered arrangement
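A short sketch that lists the ordered arrangements of three letters. The letters are an arbitrary example; the count formulas n! and n!/(n-r)! are standard results that this guide does not state.

```python
import math
from itertools import permutations

letters = ["A", "B", "C"]

# Every ordered arrangement of all three letters.
arrangements = ["".join(p) for p in permutations(letters)]
print(arrangements)       # ['ABC', 'ACB', 'BAC', 'BCA', 'CAB', 'CBA']
print(len(arrangements))  # 3! = 6

# Ordered arrangements of 2 items chosen from 5: 5!/(5-2)! = 20.
print(math.perm(5, 2))    # 20
```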
Multiplicative rule
P(AB)=P(A)P(B|A)=P(B)P(A|B)
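A worked example of the multiplicative rule, assuming two cards are drawn without replacement from a standard 52-card deck, with A = "first card is an ace" and B = "second card is an ace". The scenario is illustrative; the result is checked by listing every ordered pair of cards.

```python
from fractions import Fraction
from itertools import permutations

# Multiplicative rule: P(AB) = P(A) * P(B|A).
p_a = Fraction(4, 52)          # 4 aces among 52 cards
p_b_given_a = Fraction(3, 51)  # 3 aces left among the remaining 51 cards
p_ab = p_a * p_b_given_a
print(p_ab)                    # 1/221

# Brute-force check: cards 0-3 stand for the aces, 4-51 for everything else.
draws = list(permutations(range(52), 2))  # all ordered two-card draws
both_aces = sum(1 for first, second in draws if first < 4 and second < 4)
p_ab_enumerated = Fraction(both_aces, len(draws))
print(p_ab_enumerated)         # 1/221
```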
Empirical probability
P(Event) = Relative frequency of the event = (Number of occurrences of the event) / (Number of times the experiment is repeated)
Range
The difference between the maximum and minimum observations (maximum - minimum)
Stem
The digits of each observation, excluding the leaf (in a stem and leaf plot)
Union
The set of sample points that belong to A or B or both; in a Venn diagram, the whole region covered by the two circles, including their overlap
Median
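The middle-most observation when the data are arranged in order; for an even number of observations, the average of the two middle observations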
Mode
The most frequent observation
Binomial random variable
The number of successes in the n trials of a binomial experiment
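The probabilities of a binomial random variable can be computed as sketched below. The formula P(X = k) = C(n, k) p^k (1 - p)^(n - k) is the standard binomial formula and is not stated in this guide; the function name and the coin-toss example are illustrative.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k): probability of exactly k successes in n independent trials,
    each with success probability p."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Hypothetical example: number of heads in 4 tosses of a fair coin.
n, p = 4, 0.5
distribution = {k: binomial_pmf(k, n, p) for k in range(n + 1)}
print(distribution[2])             # 6/16 = 0.375
print(sum(distribution.values()))  # the probabilities add up to 1
```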
Intersection
The part of a Venn diagram where the circles overlap
Relative frequency
The proportion of the total number of observations belonging to the class
Leaf
The right-most digit of each observation (in a stem and leaf plot)
Leaf unit
The place value of the leaf digit in a stem and leaf plot; assumed to be 1 unless stated otherwise
Target parameter
The unknown population parameter that we are interested in estimating.
Qualitative data
Variables that are not numerical (in the true sense) but are categorized into various groups
Quantitative data
Variables that can assume numerical values (in the true sense)
Law of Large Numbers
When an experiment is repeated many times, the relative frequency of a particular outcome approaches the actual probability of that outcome
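A simulation sketch of the law, assuming a fair coin (probability of heads 0.5). The seed and the numbers of flips are arbitrary choices.

```python
import random

def relative_frequency_of_heads(num_flips, rng):
    """Flip a simulated fair coin num_flips times and return the proportion of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(num_flips))
    return heads / num_flips

rng = random.Random(42)  # fixed seed so the run is repeatable
for num_flips in (10, 100, 10_000, 100_000):
    print(num_flips, relative_frequency_of_heads(num_flips, rng))
```

With only 10 flips the relative frequency can be far from 0.5; with 100,000 flips it is typically within a fraction of a percentage point.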
Parameter
Numerical descriptive measure of the population
Sample statistic
Numerical descriptive measure of the sample
Mean
The average of a group of numbers: their sum divided by how many numbers there are
Variance
The average squared distance between all the observations and the mean
Expected value (mean)
The center of the distribution of a random variable
Center
Measure of central tendency
Z score
Measures the relative position of an observation compared to the mean, expressed in units of standard deviations
Data
Numbers or information with a context
Venn diagram
Pictorial diagram in which the sample space is represented by a rectangle with sample points represented by solid dots inside the rectangle and events by circles within the rectangle
Conditional probability
The probability of an event A given that another related event B is known to have occurred, written P(A|B) = P(AB)/P(B) for P(B) > 0; probabilities of events change when such additional information is provided
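A counting sketch of conditional probability, assuming two tosses of a fair coin with A = "both tosses are heads" and B = "at least one toss is heads" (events chosen for illustration). It uses the relation P(A|B) = P(AB)/P(B), which follows from the multiplicative rule.

```python
from fractions import Fraction

S = ["HH", "HT", "TH", "TT"]        # equally likely sample points
A = {s for s in S if s == "HH"}     # both tosses are heads
B = {s for s in S if "H" in s}      # at least one toss is heads

def prob(event):
    """Probability of an event when all sample points are equally likely."""
    return Fraction(len(event), len(S))

p_a = prob(A)                       # 1/4 with no extra information
p_a_given_b = prob(A & B) / prob(B) # P(A|B) = P(AB) / P(B)

print(p_a, p_a_given_b)             # 1/4 1/3
```

Knowing that B occurred rules out TT, so the probability of A rises from 1/4 to 1/3.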
Example of a sample space
S = {HH, HT, TH, TT}
Sample space
Set of all possible outcomes of an experiment, denoted by S
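The sample space for two coin tosses can be generated rather than written out by hand. A sketch, with "at least one head" as an illustrative event:

```python
from fractions import Fraction
from itertools import product

# Each toss is H or T; two tosses give every ordered pair.
S = ["".join(tosses) for tosses in product("HT", repeat=2)]
print(S)  # ['HH', 'HT', 'TH', 'TT']

# An event is any subset of the sample space.
at_least_one_head = [s for s in S if "H" in s]
print(Fraction(len(at_least_one_head), len(S)))  # 3/4
```

Changing `repeat=2` to `repeat=3` lists the 8 sample points for three tosses.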
Negatively skewed distribution
Skewed to the left; a distribution with a high frequency of observations in the high end of the range
Positively skewed distribution
Skewed to the right; a distribution with a high frequency of observations in the lower end of the range
Probability distribution
Specification of the possible values and probability associated with each possible value of the discrete random variable
Variability
Spread of the data
Standard deviation
Square root of the variance; roughly, the typical distance between an observation and the mean of a dataset
Probability
Study of randomness and uncertainty; numerical measure of chance