ST 231 - Final Exam
linear vs ~normal vs independent vs same
linear regression inference conditions to check, residual: scatterplot/no departure from straight line in population vs histogram/varies about population line/y-values often from different x-values vs repeated observations on same individual not allowed vs scatterplot/response SD same for all x values
CI length vs confidence level vs CI vs margin of error
determined from knowing: margin of error (m) vs m/population SD (O)/sample size (n) vs sample mean (x^bar)/m vs confidence level (C)/o/n
degrees of freedom vs vs association between 2 categorical variables vs p^hat vs (#successes + 2)/(n + 4) vs
df vs previous = (r - 1)(c - 1) vs sample proportion estimating _ (#successes/total)
Z~N(0,1) vs t(n-1) vs t(df)
distribution: population mean (u) + (o) known/population proportion (p)/comparing proportion vs previous o un-/matched pairs difference (ud) vs comparing means (u1 - u2)
type I error (a) vs type II error (b)
errors, reject H0/when actually: yes/true vs no/false
y^hat vs a vs b vs x
first = second + third x last: dependent variable vs y-intercept vs slope vs independent variable
boxplot vs max vs Q3 vs median vs Q1 vs min
five-number summary graph/4 sections (25% each) vs >est line point vs >est box point vs middle vs <est box point vs <est line point
residual vs least-squares regression line
(y - y^hat) min vertical distance between dots/regression line vs (y on x) <s sum of squared distance of data points
parameter vs statistic vs sampling distribution
# describing population/value unknown in statistical practice (can't examine entire population) vs # computable from sample data without using + used to estimate unknown previous vs distribution of values taken by previous in all possible samples of same n from same population
scatterplot/histogram
(2) most important figure to construct + examine for checking conditions for linear regression inference
nCr vs binomial distribution
(^nr) vs P(X = r) = (previous)(p)^r x (1-p)^(n-r)
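A minimal Python sketch of the binomial probability above, using only the standard library. The function name `binomial_pmf` and the example numbers (4 trials, p = 0.5) are made up for illustration.

```python
from math import comb

def binomial_pmf(n, r, p):
    """P(X = r) for X ~ Binomial(n, p): nCr * p^r * (1 - p)^(n - r)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

# Hypothetical example: 4 independent trials, success probability 0.5
prob_two = binomial_pmf(4, 2, 0.5)

# Probabilities over all possible r (0..n) should add up to 1
total = sum(binomial_pmf(4, r, 0.5) for r in range(5))
print(prob_two, total)
```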
test statistic vs estimate vs mean vs standard error (SE)
(next - third)/last, t or z vs x^bar/d^bar/p^hat vs u0/ud0/p0 vs ()^0.5
expected (Ei) vs observed (Oi)
(never/occasionally/always: male = 228/81/185 "vs" female = 107/29/80 "vs" total = 710) counts (never): (228 + 107)(228 + 81 + 185)/710 vs 228
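The same table can be worked through in Python as a rough check. This sketch takes the counts from the card, computes every expected count as row total x column total / table total, then sums (observed - expected)^2/expected for the X^2 statistic with df = (r - 1)(c - 1). Variable names are illustrative only.

```python
# Observed counts from the card: rows = male, female;
# columns = never, occasionally, always
observed = [[228, 81, 185],
            [107, 29, 80]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
table_total = sum(row_totals)

# Expected count for each cell: row total x column total / table total
expected = [[r * c / table_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum over cells of (observed - expected)^2 / expected
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(expected[0][0], 2), round(chi_square, 3), df)
```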
z-score
(x - u)/o
<p-value
>evidence against H0 (no effect nor difference statement) or doesn't fall between lower confidence limit (LCL) + upper (UCL)/probability test statistic will take value as extreme as observed assuming H0 = true
comparing means vs association between 2 quantitative variables
CI: (x1^bar - x2^bar) +- t**(s1^2/n1 + s2^2/n2)^0.5 vs b +- t**SEb
u = u0 vs u = ud0 vs u1 = u2 vs p = p0 vs p1 = p2 (p1 - p2 = 0) vs no association vs B = 0
H0: population mean (u) + (o) known or un- vs matched pairs difference (ud) vs comparing means (u1 - u2) vs population proportion (p) vs comparing proportions (p1 - p2) vs association between 2 categorical variables vs previous quantitative
A vs AUB vs independence vs disjoint/mutually exclusive
P(): 1 - P(A^bar) [probability of _ complement] vs P(A) + P(B) - P(A&B) [general addition rule] vs occurrence of one doesn't affect other/P(A&B) = P(A) x P(B) vs (2) not previous/no event in common
p1^hat - p2^hat vs p^hat
SE, for CI (C for Copy Cat)/under H0: ((p1^hat(1-p1^hat))/n1 + (p2^hat(1-p2^hat))/n2)^0.5/"p1 = p2", (p^hat(1-p^hat)(1/n1 + 1/n2))^0.5 vs ((p^hat(1-p^hat))/n)^0.5/"p = p0", ((p0(1-p0))/n)^0.5
0.51 vs 0.49
a (random # generator), P(x _ a) = 0.51: < vs >
X~X^2(df) vs t(n-2)
association between 2 variables distribution: categorical vs quantitative
X^2/categorical vs t/quantitative
association between 2 variables test statistic: Σ((observed count - expected count)^2/expected count) vs (b - 0)/SEb
ordinal vs nominal
categorical variable, ordered: yes/stage 0-4 cancer (0 = benign, 4 = metastasis) vs no/eye colour or sex
categorical vs >+ve vs +ve vs right
chi-square: statistic best for _ data vs X^2 statistic best evidence against H0 when # vs distributions = family taking on only _ values vs skewed to _
P(x<=b) - P(x<=a) vs 1 - P(x<=a)
cumulative, P(): a < x <= b vs x>a
row total vs column total vs table total
expected count: first x second/last
CI vs independent vs two vs no (must be <10%)
includes value (u) vs previous 90% for differences between 2 _ population means vs _-sided t-test vs rejected at 10% significance
inversely proportional vs quadruple
margin of error (m)-sample size (n): relationship (n or also with p-value) vs change to n <ing CI width by 1/2
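A quick numeric check of the quadrupling rule, assuming a z interval with known population SD. The helper `margin_of_error` and the numbers (o = 15, n = 25 vs 100) are hypothetical. Because m = z* x o/(n)^0.5, multiplying n by 4 divides m (and so the CI width) by 2.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sigma, n, confidence=0.95):
    """m = z* x sigma / sqrt(n), with z* from the standard Normal distribution."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_star * sigma / sqrt(n)

# Hypothetical numbers: same sigma and confidence level, n quadrupled
m_small = margin_of_error(sigma=15, n=25)
m_large = margin_of_error(sigma=15, n=100)
print(m_small, m_large, m_large / m_small)
```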
1.5 x IQR vs r^2 vs extrapolation vs influential
max or min/spots suspected outliers (>Q3/<Q1) vs %variation vs x > max or < min vs removal >ly changes calculation
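The 1.5 x IQR rule can be sketched in Python as below. Quartiles are taken as medians of the lower and upper halves of the sorted data, which is one common textbook convention; software often uses other quartile rules, so results may differ slightly. Function names and the sample data are made up.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]          # values below the median position
    upper = xs[half + n % 2:]  # values above it (median itself excluded when n is odd)
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

def suspected_outliers(data):
    """Values more than 1.5 x IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]  # hypothetical data
print(five_number_summary(sample), suspected_outliers(sample))
```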
measures of centre vs measures of spread
mean/median/mode vs range/IQR/quartiles/SD/variance
median (M) vs mean (x^bar)
measures of centre, outlier (skewness) resistant: yes/distribution midpoint/same as next when symmetrical vs no/farther out in tail > previous for skewed/average
range vs interquartile range (IQR)
measures of spread, outlier resistant: no/>est - <est vs yes/Q3 - Q1
sample SD vs sample variance
measures of spread: ((sum(xi - x^bar)^2)/(n-1))^0.5 vs previous without ()^0.5
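A small sketch of the two formulas, checked against Python's `statistics` module (whose `variance` and `stdev` also divide by n - 1). The helper name `sample_variance` and the data are illustrative.

```python
from math import sqrt
from statistics import stdev, variance

def sample_variance(xs):
    """Sum of squared deviations from x^bar, divided by n - 1."""
    x_bar = sum(xs) / len(xs)
    return sum((x - x_bar) ** 2 for x in xs) / (len(xs) - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
s2 = sample_variance(data)       # sample variance
s = sqrt(s2)                     # sample SD is its square root
print(s2, s, variance(data), stdev(data))
```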
central limit theorem vs normal vs proportion sd
n >= ~30, sampling distribution for x^bar ~Normal, ux^bar = u, ox^bar = o/(n)^0.5 vs >er n = closer distribution to _ vs ((p(1-p))/n)^0.5, np >= 10, n(1 - p) >= 10
confidence interval (CI) vs estimate vs margin of error (m)
next +- last/2x last vs x^bar/d^bar/p^hat vs critical value (z**/t**) x standard error (SE)
dependent vs matched pair vs independent
one selection process related to other vs test/samples not next vs test 2 x^bar differences/use SE for x1^bar - x2^bar for normal populations with = variances
25th vs 50th vs 75th
percentile, Q (_% below it): 1 vs 2 vs 3
2 vs 4 vs 7/21
plus four CI for p (p^~): (#successes in sample + first)/(n + second), 5/17 = last
o known vs o unknown vs matched pairs difference (ud)
population mean (u) CI: x^bar +- z**o/(n)^0.5 vs x^bar +- t**s/(n)^0.5 vs d^bar +- t**sd/(n)^0.5
z/o known vs t/o unknown vs t/matched pairs difference (ud) vs population proportion (p)
population mean (u) test statistic, letter/formula: (x^bar - u0)/(o/(n)^0.5) vs (x^bar - u0)/(s/(n)^0.5) vs (d^bar - ud0)/(sd/(n)^0.5) vs (p^hat - p0)/((p0(1-p0))/n)^0.5
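A sketch of the first case only (o known, z statistic) with a two-sided p-value from the standard Normal distribution. The t cases need a t distribution, which the Python standard library does not provide, so they are left out here. The function name and numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(x_bar, mu0, sigma, n):
    """z = (x^bar - u0)/(o/sqrt(n)) and its two-sided p-value."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

# Hypothetical numbers: H0: u = 100, o = 15, sample of 36 with x^bar = 105
z, p_value = one_sample_z(x_bar=105, mu0=100, sigma=15, n=36)
print(z, p_value)
```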
difference vs all vs proportion
population parameter example: _ in mean IQ for 7th grade boys + girls vs mean depth for _ northern hemisphere icebergs vs _ of previous ppl at WLU drinking coffee regularly
confidence interval (CI) vs hypothesis test vs condition
population parameter: possible values range + precision estimation/estimate +- margin of error vs how confident in drawing conclusions from sample/H0 + Ha vs previous/independent + counts (success/failure) each 5+ in both samples
population (p) vs comparing (p1 - p2)
proportion CI: p^hat +- z**((p^hat(1-p^hat))/n)^0.5 vs (p1^hat - p2^hat) +- z**((p1^hat(1-p1^hat))/n1 + (p2^hat(1-p2^hat))/n2)^0.5
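Both intervals can be sketched in Python as follows, assuming large-sample z intervals. The function names and the counts in the example are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_star(confidence):
    """Critical value z* leaving (1 - C)/2 in each tail."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def proportion_ci(successes, n, confidence=0.95):
    """p^hat +- z* x sqrt(p^hat(1 - p^hat)/n)."""
    p_hat = successes / n
    m = z_star(confidence) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - m, p_hat + m

def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """(p1^hat - p2^hat) +- z* x unpooled SE."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    m = z_star(confidence) * se
    return (p1 - p2) - m, (p1 - p2) + m

# Hypothetical counts
low, high = proportion_ci(50, 100)
diff_low, diff_high = two_proportion_ci(60, 100, 50, 100)
print(low, high, diff_low, diff_high)
```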
discrete vs continuous
quantitative variable, finite decimal places: yes (counts)/#pushups in 1 min vs in-/height + weight
b vs a vs residual vs u
r(sy/sx) vs y^bar - (b)x^bar vs y - y^hat vs second + (first)"x", y~N(_, o)
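The slope, intercept and residual formulas above, sketched in Python on a tiny made-up data set. `least_squares` is an illustrative helper, not a library function.

```python
from math import sqrt

def least_squares(xs, ys):
    """Return (a, b, r) with b = r(sy/sx) and a = y^bar - b x^bar."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sx = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    sy = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * sx * sy)
    b = r * sy / sx
    a = y_bar - b * x_bar
    return a, b, r

xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
a, b, r = least_squares(xs, ys)

# Residual = observed y minus predicted y^hat = a + b x
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(a, b, r ** 2, residuals)
```

Residuals from a least-squares line with an intercept sum to zero (up to rounding), which gives a quick sanity check.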
explain vs residuals vs systematic vs extrapolation
regression analysis: square of correlation coefficient (r^2) measures variation in y _ed by variation in x vs difference between actual y values + predicted values vs previous plot indicating line good fit if no _ pattern vs predicting y values for x outside observed data range
yes vs no vs p-value vs critical value (t*)
reject H0, a-third or test statistic (t/X^2)-last: > vs < vs p vs t-table/requires C or P (one or 2 sided) + df
z/comparing proportions (p1 = p2 or p1 - p2 = 0) vs t/comparing means (u1 - u2)
test statistic, letter/formula: ((p1^hat - p2^hat) - 0)/(p^hat(1-p^hat)(1/n1 + 1/n2))^0.5 vs ((x1^bar - x2^bar) - 0)/(s1^2/n1 + s2^2/n2)^0.5
(s1^2/n1 + s2^2/n2)^2 / ((1/(n1 - 1))(s1^2/n1)^2 + (1/(n2 - 1))(s2^2/n2)^2)
two-sample t test df formula
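The expression on this card gives the approximate degrees of freedom that go with the unpooled two-sample t statistic. A sketch computing both, with a hypothetical function name and made-up summary statistics:

```python
from math import sqrt

def two_sample_t(x1_bar, s1, n1, x2_bar, s2, n2):
    """Unpooled two-sample t statistic and its approximate df."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (x1_bar - x2_bar) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Hypothetical summary statistics: equal SDs and equal sample sizes
t, df = two_sample_t(x1_bar=12, s1=2, n1=10, x2_bar=10, s2=2, n2=10)
print(t, df)
```

With equal SDs and equal sample sizes the df works out to n1 + n2 - 2; otherwise it is usually smaller and not a whole number.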
E(X) vs sd(X)
u = np vs o = (np(1-p))^0.5
population parameter vs confidence level vs parameter estimate
unbiased estimator/sample statistic with mean (expected value) = to/knowing unbiased doesn't tell how close to _ vs CI/probability an interval selected at random will contain previous vs x1^bar - x2^bar
categorical (qualitative) variable vs quantitative variable
variables: groups individuals/ordinal + nominal vs numerical value/discrete + continuous
1.645 vs 1.96 vs 2.576
z*, C (confidence/%): 90 vs 95 vs 99
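These critical values can be recomputed from the standard Normal distribution with Python's `statistics.NormalDist`. The helper name `z_star` is made up. Printed tables sometimes round the last digit differently.

```python
from statistics import NormalDist

def z_star(confidence):
    """z* such that the central area under N(0, 1) equals the confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

for c in (0.90, 0.95, 0.99):
    print(c, round(z_star(c), 3))
```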