STA 2023 Exam 1
Probability (uniform distribution)
= (UV - LV)/(Max - Min), where LV and UV are the lower and upper values of the interval
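As a sketch, the uniform formula above can be written as a small Python function. The function name `uniform_prob` and the clipping of the interval to [Min, Max] are illustrative additions, not part of the course notes.

```python
def uniform_prob(lv: float, uv: float, min_val: float, max_val: float) -> float:
    """P(LV <= X <= UV) for X uniformly distributed on [min_val, max_val]."""
    # Clip the interval to the support so values outside [Min, Max] add no probability.
    lo = max(lv, min_val)
    hi = min(uv, max_val)
    if hi <= lo:
        return 0.0
    return (hi - lo) / (max_val - min_val)

# Example: X ~ Uniform(0, 10); P(2 <= X <= 5) = (5 - 2)/(10 - 0)
print(uniform_prob(2, 5, 0, 10))
```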
Expected value
= mean = average
Multiplicative rule for independent events
P(A and B) = P(A) x P(B)
Additive rule for disjoint events
P(A or B) = P(A) + P(B)
Additive rule
P(A or B) = P(A) + P(B) - P(A and B)
Conditional probabilities
P(A|B) = P(A and B)/P(B) or P(B|A) = P(A and B)/P(A)
Complement rule
P(not A) = 1-P(A)
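To check the rules above numerically, here is a minimal sketch that enumerates one roll of a fair six-sided die (an illustrative assumption) with exact fractions. Note that Python's `|` on sets means union ("A or B"), not the conditional bar.

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die (illustrative assumption).
S = {1, 2, 3, 4, 5, 6}

def P(event):
    """Probability of an event (a subset of S) when all outcomes are equally likely."""
    return Fraction(len(event & S), len(S))

A = {2, 4, 6}   # roll is even
B = {4, 5, 6}   # roll is greater than 3

p_and = P(A & B)                      # P(A and B), Python set intersection
p_or = P(A) + P(B) - p_and            # additive rule
union = A | B                         # Python set union, i.e. "A or B"
assert p_or == P(union)               # matches a direct count of "A or B"
p_A_given_B = p_and / P(B)            # conditional probability P(A|B)
p_not_A = 1 - P(A)                    # complement rule
independent = p_and == P(A) * P(B)    # multiplicative rule holds only if independent

print(p_or, p_A_given_B, p_not_A, independent)
```

Here P(A and B) = 1/3 but P(A) x P(B) = 1/4, so these two events are not independent and the plain multiplicative rule would give the wrong answer.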
Random variable
a variable that represents the numerical outcome(s) of a random phenomenon
Outliers
any observations that are significantly far away from the rest of the data points
Categorical data graphs
bar charts (bars do not touch) and pie charts
Normal
bell curve, mean=median=mode
Discrete RVs
binomial and non-binomial; RVs that assume a countable (often finite) number of values; the probability of an exact value can be computed
Exact binomial probability, P(X = x)
binompdf(n,p,x)
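A minimal Python sketch of what binompdf(n,p,x) computes, P(X = x) = C(n, x) p^x (1 - p)^(n - x). The function is named after the calculator command for convenience; it is a re-implementation, not the calculator itself.

```python
from math import comb

def binompdf(n: int, p: float, x: int) -> float:
    """P(X = x) for X ~ B(n, p): C(n, x) * p^x * (1 - p)^(n - x)."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Exactly 3 heads in 5 fair coin flips = C(5,3)/2^5 = 10/32
print(binompdf(5, 0.5, 3))  # 0.3125
```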
Graphical
categorical and quantitative
Numerically
center and spread
r²
coefficient of determination; the percent of the variation in y that is explained by x, between 0% and 100%; 1 - r² = the fraction of the variation in y that is NOT explained by x; a larger r² means the line explains more of the variation in y
Binomial RVs
count the number of successes in a collection of yes/no (binary) outcomes
r
correlation coefficient; has the same sign as the slope; measures the strength and direction of a linear relationship; between -1 and 1; not affected by the units of x and y
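A sketch of r (Pearson's correlation) and r² in Python, using the standard formula r = Σ(x - x̄)(y - ȳ) / sqrt(Σ(x - x̄)² Σ(y - ȳ)²). The data values are made up for illustration. The last line changes the units of x (multiply by 10, add 3) to show r does not change.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)
print(round(r, 3), round(r**2, 3))            # r and r² (fraction explained)
print(round(correlation([10 * x + 3 for x in xs], ys), 3))  # same r after a unit change
```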
Discrete
countable
Quantitative
data that is described using numbers, can be averaged
Categorical
data that is described using words or categories, qualitative
Probabilities
determined by the proportion of times the event(s) will occur in a long series of independent trials (law of large numbers)
Less than x times
doesn't include x
More than x times
doesn't include x
At least 1
P(at least 1) = 1 - P(none)
Quantitative data graphs
histograms (bars usually touch), stem plots, box plots, and dot plots
At least x times
includes x
At most x times
includes x
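The wording rules above decide which counts to add up. A minimal Python sketch, assuming X ~ B(10, 0.3) and x = 4 purely as example values:

```python
from math import comb

def binompdf(n, p, x):
    # P(X = x) for X ~ B(n, p)
    return comb(n, x) * p**x * (1 - p)**(n - x)

def prob(n, p, xs):
    """Sum P(X = x) over the listed counts."""
    return sum(binompdf(n, p, x) for x in xs)

n, p, x = 10, 0.3, 4
less_than    = prob(n, p, range(0, x))          # X < 4: does not include 4
at_most      = prob(n, p, range(0, x + 1))      # X <= 4: includes 4
more_than    = prob(n, p, range(x + 1, n + 1))  # X > 4: does not include 4
at_least     = prob(n, p, range(x, n + 1))      # X >= 4: includes 4
at_least_one = 1 - binompdf(n, p, 0)            # 1 - P(none)

print(less_than, at_most, more_than, at_least, at_least_one)
```

"At most x" and "more than x" together cover every count exactly once, so they add to 1; the same holds for "less than x" and "at least x".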
IQR
interquartile range, Q3-Q1 where Q3= 75th percentile (the median of the 'top' half) and Q1= 25th percentile (the median of the 'bottom' half), gives the spread of the central (middle) 50% of the data set
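A sketch of the IQR exactly as described above: Q1 and Q3 are the medians of the bottom and top halves. When n is odd, this sketch leaves the overall median out of both halves (an assumption about the convention; other software may interpolate and report slightly different quartiles). The data are made up.

```python
def median(values):
    """Middle ordered value; average of the two middle values when n is even."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2

def iqr(values):
    """Q3 - Q1, where Q1/Q3 are the medians of the bottom/top halves."""
    s = sorted(values)
    n = len(s)
    lower = s[: n // 2]          # bottom half (excludes the median when n is odd)
    upper = s[(n + 1) // 2 :]    # top half (excludes the median when n is odd)
    return median(upper) - median(lower)

data = [1, 3, 4, 6, 7, 8, 10, 12, 15]
print(median(data), iqr(data))  # Q1 = 3.5, Q3 = 11
```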
Non-binomial RVs
many types (distinguishing specific type is not required)
Center
mean, median, mode
Skewed right
mean > median > mode
Skewed left
mean < median < mode
Bimodal
mean = median when the two peaks are symmetric
Rectangular/uniform
mean=median
Bell shaped with a high outlier
mean > median (the outlier pulls the mean toward it)
Continuous
measurable
Range
max - min, the measure of spread that is most affected by outliers
Least-squares regression line
minimizes the sum of the squared residuals
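A minimal sketch of the least-squares line ŷ = a + bx using the standard formulas b = Σ(x - x̄)(y - ȳ)/Σ(x - x̄)² and a = ȳ - b·x̄. The data are made up; the helper `sse` is illustrative and shows that nudging a or b away from the fitted values increases the sum of squared residuals.

```python
def least_squares(xs, ys):
    """Intercept a and slope b of the least-squares line y-hat = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx          # slope
    a = my - b * mx        # intercept: the line passes through (mean of x, mean of y)
    return a, b

def sse(xs, ys, a, b):
    """Sum of squared residuals (actual - predicted) for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]  # actual - predicted
print(a, b, sum(residuals))  # residuals sum to 0, up to rounding error
```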
Disjoint
mutually exclusive, cannot occur together
Binomial distribution
each observation is binary (success/failure), the probability of success p is the same for every observation, there is a fixed number n of observations, the n observations are independent, X ~ B(n,p)
Independent
occurrence of one does not affect the probability of the other
Uniformly distributed RVs
outcomes are equally likely
Value to probability
normalcdf(LV,UV,mean,SD); for z-scores use 0 as the mean and 1 as the SD
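The calculator's normalcdf can be mimicked with Python's `statistics.NormalDist` as a sketch. The function name `normalcdf` and the N(70, 3) heights example are illustrative assumptions. The z-score line uses the standard z = (x - mean)/SD, so 67 and 73 become -1 and 1.

```python
from statistics import NormalDist

def normalcdf(lv, uv, mean=0.0, sd=1.0):
    """P(LV < X < UV) for X ~ N(mean, sd), like the calculator's normalcdf."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(uv) - dist.cdf(lv)

# Hypothetical heights ~ N(70, 3): probability of falling within 1 SD of the mean
print(round(normalcdf(67, 73, 70, 3), 4))
# Same probability with z-scores: z = (x - mean)/SD, mean 0 and SD 1
print(round(normalcdf(-1, 1), 4))
```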
Spread
range, variance, standard deviation, IQR
Standard deviation
represents the 'typical' distance from the mean, in units of data, measure of spread that is smaller for distributions where the points are clustered around the middle, cannot be negative
Variance
represents the 'typical' squared distance from the mean, in squared units, measure of spread around the mean, but its units are not the same as those of the data points
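A sketch of variance and standard deviation in Python. It assumes the sample versions (dividing by n - 1), which is the usual choice for sample data; the population versions divide by n instead. The data are made up.

```python
from math import sqrt

def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1 (squared units)."""
    n = len(data)
    mean = sum(data) / n
    deviations = [x - mean for x in data]   # signed distances from the mean; they sum to 0
    return sum(d ** 2 for d in deviations) / (n - 1)

def sample_sd(data):
    """Square root of the variance, back in the units of the data (never negative)."""
    return sqrt(sample_variance(data))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_variance(data), sample_sd(data))
```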
Normally distributed RVs
you should be told the population is normal; z-scores can then be used
b
slope, as x increases by 1, y is predicted to increase/decrease by b
Mean
the 'average', the balancing point; the signed deviations of the data points from the mean always add up to zero
Sensitivity
how often the condition is correctly determined to exist in subjects who have it
Specificity
how often the condition is correctly determined to not exist in subjects who do not have it
False positive
the condition is incorrectly determined to exist in the subject
False negative
the condition is incorrectly determined to not exist in the subject
Median
the middle ordered value, the 50th percentile, located at position (n+1)/2 in the ordered data; half of the observations lie on either side of it; not very sensitive to outliers (robust)
Mode
the most frequently occurring value; the measure of center that represents the most common observation or class of observations
Statistics
the science of collecting, analyzing, interpreting, and presenting data
Non-binomial
three or more outcomes, mean
Binomial
two outcomes, mean and SD
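A sketch of both calculations: the mean of a general discrete RV is Σ x·P(x), and for X ~ B(n, p) the mean is np and the SD is sqrt(np(1 - p)). The three-outcome distribution below is hypothetical.

```python
from math import sqrt

def discrete_mean(values, probs):
    """Expected value of a discrete RV: sum of x * P(x)."""
    assert abs(sum(probs) - 1) < 1e-9, "probabilities must sum to 1"
    return sum(x * p for x, p in zip(values, probs))

def binomial_mean_sd(n, p):
    """Mean n*p and standard deviation sqrt(n*p*(1-p)) of X ~ B(n, p)."""
    return n * p, sqrt(n * p * (1 - p))

# Hypothetical non-binomial RV with three outcomes
print(discrete_mean([0, 1, 2], [0.2, 0.5, 0.3]))
# X ~ B(100, 0.5): mean 50, SD 5
print(binomial_mean_sd(100, 0.5))
```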
Continuous RVs
uniformly distributed and normally distributed; RVs that assume an infinite (uncountable) number of values, measurable; the probability of any exact value is 0 (move on)
Percent (area) to value
invNorm(area to the left,mean,SD); for z-scores use 0 as the mean and 1 as the SD
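invNorm is the inverse of normalcdf: given the area to the left, it returns the value. A sketch using `statistics.NormalDist.inv_cdf`; the function name `inv_norm` and the N(500, 100) example are illustrative assumptions.

```python
from statistics import NormalDist

def inv_norm(area_left, mean=0.0, sd=1.0):
    """Value with the given area to its left under N(mean, sd), like invNorm."""
    return NormalDist(mu=mean, sigma=sd).inv_cdf(area_left)

print(round(inv_norm(0.975), 2))           # z-score with 97.5% of the area to its left
print(round(inv_norm(0.90, 500, 100), 1))  # 90th percentile of a hypothetical N(500, 100)
```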
Residuals
vertical distance between each point and the line; residuals sum to 0; residual = actual - predicted
a
y-intercept; when x = 0, y is predicted to be a