NCSU ST370 Chapters 1-8 Important Content
r² sentence
"about {100 × r²}% of the variability in Y can be explained by its linear relationship with X"
μ (ANOVA model)
(ANOVA) baseline mean
i
(ANOVA) index of the factor level (treatment group); i = 1, ..., t, where t is the number of levels in the factor
j
(ANOVA) index of the replicate within a treatment group; j = 1, ..., n, where n is the number of replicates in each group
N (ANOVA)
(ANOVA) total number of observations, t × n
significance of a small MSE
(ANOVA) very little variation in response variable within treatment groups
the total area under a pdf, f(y)
1
uniform distribution properties
1) X assumes values only in a bounded interval 2) the pdf of X is constant over the interval, f(x) = c
requirements for a binomial experiment
1) fixed number of identical trials 2) trials are independent 3) two outcomes, often "success" or "failure" 4) each trial has the same probability of success, p
design of experiments procedure (6 steps)
1) identify problem 2) determine factors 3) determine number of experimental units 4) determine how the factors will be handled (controlled, manipulated, etc) 5) collect data and perform analysis 6) draw conclusions (inferential statistics)
types of experimental error
1) inherent variability 2) measurement error 3) variations in applying/creating treatments 4) extraneous/lurking/confounding variables
assumptions of SLR
1) linear relationship between X and Y (scatter plot) 2) Yi's are independent of each other (study design) 3) for each X, Y is approximately normal (QQ plot of residuals)
assumptions of one-way ANOVA and how to test them
1) random sample is selected from each group (design) 2) true variance of Y is the same for all groups (Levene Test) 3) Y is normally distributed within each population (histogram or QQ plot)
when are observational studies performed?
1) to study relationships among variables 2) to learn about the population distribution 3) obtain info/data for experimental studies 4) when control is unethical or impossible
the two sources of variation in one-way ANOVA
1) treatment effect 2) error
percentage of all values within 1 SD of mean (normal distribution)
68.3%
percentage of all values within 2 SDs of mean (normal distribution)
95.4%
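These two figures are the normal-distribution ones. A quick check, as a sketch only: it uses the fact that P(|Z| < k) = erf(k / √2) for a standard normal Z. The helper name `within_k_sd` is made up for illustration.

```python
import math

def within_k_sd(k):
    """P(|Z| < k) for a standard normal Z, via the error function."""
    return math.erf(k / math.sqrt(2))

pct_1sd = 100 * within_k_sd(1)
pct_2sd = 100 * within_k_sd(2)
print(round(pct_1sd, 1), round(pct_2sd, 1))  # 68.3 95.4
```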
E(Y) of a binomial distribution
E(Y) = np
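One way to convince yourself of E(Y) = np is to sum y × P(Y = y) over all possible y. The sketch below does that for hypothetical values n = 10 and p = 0.3; `binomial_pmf` is an illustrative helper, not something from the course notes.

```python
from math import comb

def binomial_pmf(y, n, p):
    """P(Y = y) for Y ~ Bin(n, p)."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)

n, p = 10, 0.3  # hypothetical values chosen for illustration

# Definition of expected value for a discrete random variable
expected = sum(y * binomial_pmf(y, n, p) for y in range(n + 1))
print(expected, n * p)  # the two numbers should agree up to rounding
```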
replicate
EUs that receive the same treatment
disjoint or mutually exclusive
P(A ∩ B) = 0
r² equation
SS(model) / SS(Tot)
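A minimal sketch of the r² formula for simple linear regression, using made-up data. The slope and intercept come from the usual least-squares formulas; all variable names are illustrative.

```python
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # made-up responses

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares estimates of slope and intercept
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
beta1_hat = s_xy / s_xx
beta0_hat = y_bar - beta1_hat * x_bar

# Sums of squares: model (fitted vs. overall mean) and total (observed vs. overall mean)
y_hat = [beta0_hat + beta1_hat * xi for xi in x]
ss_model = sum((yh - y_bar) ** 2 for yh in y_hat)
ss_total = sum((yi - y_bar) ** 2 for yi in y)

r_squared = ss_model / ss_total
print(r_squared)
```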
cumulative distribution function (CDF) for discrete RV
The probability that the observed value of X will be at most x; denoted as F(x) = P(X≤x).
expected value of a discrete random variable is a parameter (T/F)
True
how to write the binomial distribution
X ~ Bin(n, p); n = number of trials, p = probability of success in each trial
how to write the exponential distribution
X ~ Exp(λ)
how to write a normal distribution
X ~ N(μ,σ²)
how to write the uniform distribution
X ~ U (a,b); where a and b are the bounds/parameters of the distribution
writing of the standard normal distribution
Z ~ N(0,1)
parameter
a (usually) unknown summary value about the population
two-way ANOVA analyzes data from what experiment type?
a factorial experiment
F-stat
a measure of the ratio of the variation between the groups to the variation within the groups
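A hand computation of the F-stat for a one-way layout, as a sketch with made-up data (t = 3 groups, n = 3 replicates each). It assumes the usual definitions: between-group variation is MS(treatment) = SS(treatment) / (t - 1) and within-group variation is MSE = SSE / (N - t).

```python
groups = [[4, 5, 6], [7, 8, 9], [1, 2, 3]]  # made-up responses

t = len(groups)                    # number of treatment groups
N = sum(len(g) for g in groups)    # total number of observations
grand_mean = sum(sum(g) for g in groups) / N
group_means = [sum(g) / len(g) for g in groups]

# Between-group variation: group means vs. the overall mean
ss_treatment = sum(len(g) * (m - grand_mean) ** 2
                   for g, m in zip(groups, group_means))
# Within-group variation: observations vs. their own group mean
ss_error = sum((obs - m) ** 2
               for g, m in zip(groups, group_means) for obs in g)

ms_treatment = ss_treatment / (t - 1)
ms_error = ss_error / (N - t)
f_stat = ms_treatment / ms_error
print(f_stat)  # 27.0
```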
random variable ***
a real-valued function whose domain is the sample space; it assigns a real number to each possible outcome
treatment
a specific experimental condition, either the level of a factor or combinations of levels from multiple factors
correlation
a statistic that measures the strength and direction of the linear association between two quantitative variables
sample
a subset of the population we observe data on
statistic
a summary value calculated from the sample observations
qualitative
a variable that is described by attributes or labels
quantitative
a variable that is described by numerical measurements
methods for accounting for/reducing effects of lurking variables
a) controlled variables b) blocking
significance of total sum of squares
all of the variation in the response variable in our sample compared to the overall mean
full factorial experiment
all possible level combinations are used as treatments
probability distribution
all possible values with their corresponding probabilities
population
all the values, items, or individuals of interest
stratified sampling (define and tell how)
allows the researcher to control for variables that may influence the outcome. 1) divide the population into groups (strata) 2) select an SRS from each group
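The two steps can be sketched as follows. The population, the stratum labels, and the per-stratum sample size are all hypothetical; the seed is fixed only so the sketch is repeatable.

```python
import random

rng = random.Random(370)  # fixed seed, for repeatability only

# Hypothetical population of 100 units, each tagged with a stratum label
population = [(i, "freshman" if i < 60 else "senior") for i in range(100)]

# 1) divide the population into strata
strata = {}
for unit_id, label in population:
    strata.setdefault(label, []).append(unit_id)

# 2) select an SRS (without replacement) from each stratum
per_stratum = 5  # hypothetical sample size per stratum
sample = {label: rng.sample(units, per_stratum)
          for label, units in strata.items()}
print(sample)
```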
Bernoulli Trial
an experiment with only 2 possible mutually exclusive outcomes
ANOVA (acronym)
analysis of variance
one way ANOVA answers the question:
are the means of these groups different? (for only 1 factor)
control treatment
benchmark treatment sometimes necessary for comparison
factor
categorical (qualitative) explanatory variable of interest
statistical inference
claim about a population based on sample data
properties of a normal distribution
1) continuous 2) unimodal 3) symmetric 4) defined entirely by the mean and SD
continuous
data type in which any value in an interval is possible
ordinal
data type in which categories can be ordered
nominal
data type in which categories have no ordering
discrete
data type with a finite or countably infinite number of possible values
probability density function (pdf)
describes the probability distribution of a continuous RV; denoted by f(y)
blocking
divide subjects with similar characteristics into "blocks," then in each block randomly assign to treatment groups
variations in applying/creating treatments
error due to treatment not being clearly defined, leaving room for interpretation
extraneous/lurking/confounding variables
error from variables that are not part of the treatment, but may influence the response
inherent variability
error type characterized by the fact that no two experimental units are the same
measurement error
error type due to error in measurement
mutually independent
n events are _____________ if, for every subset of the n events, the probability of the intersection equals the product of the individual probabilities of the events in that subset
simple random sampling (define and tell how)
every unit in the population has an equal chance of being selected. 1) assign each unit of the population a number 2) use a random number generator to select which units to use
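A minimal sketch of the two steps, assuming a hypothetical population of 50 units and a sample size of 10. `random.Random.sample` draws without replacement, so every unit has the same chance of selection.

```python
import random

rng = random.Random(370)  # fixed seed, for repeatability only

population = ["unit_" + str(k) for k in range(1, 51)]  # hypothetical units

# 1) assign each unit of the population a number (here, its list index)
numbered = dict(enumerate(population))

# 2) use a random number generator to select which numbers to use
chosen_numbers = rng.sample(range(len(population)), 10)
srs = [numbered[k] for k in chosen_numbers]
print(srs)
```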
what type of study establishes causality?
experimental study
for a CDF, F(x), F'(x) = ?
f(x), the pdf
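The derivative of a CDF is the pdf. A numerical check, as a sketch only, using the exponential distribution with a hypothetical rate λ = 2, whose CDF is F(x) = 1 - e^(-λx) for x ≥ 0:

```python
import math

lam = 2.0  # hypothetical rate parameter

def cdf(x):
    """F(x) = P(X <= x) for X ~ Exp(lam), x >= 0."""
    return 1 - math.exp(-lam * x)

def pdf(x):
    """f(x) for X ~ Exp(lam), x >= 0."""
    return lam * math.exp(-lam * x)

# Central-difference approximation of F'(x) at an arbitrary point
x, h = 0.7, 1e-5
numeric_derivative = (cdf(x + h) - cdf(x - h)) / (2 * h)
print(numeric_derivative, pdf(x))  # the two values should nearly match
```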
when interaction is not significant, use (treatment effects/main effects)
fitted main effects
when interaction is important, use (treatment effects/main effects)
fitted treatment effects
completely randomized design (CRD)
for t treatments, replicated n_t times each, use a random number generator to assign the treatments to the EUs
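One possible way to carry out the random assignment, sketched for a balanced case with hypothetical values (t = 3 treatments, 4 replicates each, 12 EUs): list every treatment label the required number of times, shuffle, then pair the labels with the units.

```python
import random

rng = random.Random(370)  # fixed seed, for repeatability only

treatments = ["A", "B", "C"]   # hypothetical treatment labels, t = 3
n_reps = 4                     # hypothetical number of replicates per treatment
units = list(range(1, len(treatments) * n_reps + 1))  # EUs numbered 1..12

# One label per EU, each treatment appearing n_reps times, in random order
labels = [trt for trt in treatments for _ in range(n_reps)]
rng.shuffle(labels)

assignment = dict(zip(units, labels))  # EU number -> treatment
print(assignment)
```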
controlled variables
holding certain variables constant across the EUs decreases generalizability, but reduces experimental error
independent
if and only if any one of these 3 holds (each implies the other two): 1) P(A|B) = P(A) 2) P(B|A) = P(B) 3) P(A ∩ B) = P(A) * P(B)
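A small worked check with one roll of a fair die. The events are chosen for illustration: A = "even" and B = "at most 4". Exact fractions avoid rounding issues.

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}   # one roll of a fair die
A = {2, 4, 6}                   # even
B = {1, 2, 3, 4}                # at most 4

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

p_a_given_b = prob(A & B) / prob(B)
p_b_given_a = prob(A & B) / prob(A)

print(p_a_given_b == prob(A))            # True
print(p_b_given_a == prob(B))            # True
print(prob(A & B) == prob(A) * prob(B))  # True
```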
observational study
individuals in a sample are studied but the investigator does not attempt to manipulate or influence the variables of interest
multiplicative model indicators
interaction p < .05 = significant interaction = the effect of one factor depends on the level of the other
additive model indicators
interaction p > .05 = no significant interaction = the effect of one factor does not depend on the level of the other
beta_0 hat meaning
intercept; when X = 0, we expect Y to be beta_0 hat
beta_0 and beta_1 are estimated with the method of
least squares
μ
mean (parameter)
x̄
mean (statistic)
randomization
means that the treatments are randomly allocated to the EUs
repeated measures
measuring the same experimental unit multiple times
k (two way ANOVA)
number of replicates in each treatment group
extrapolation
predicting a new Y value that is outside the range of the data.
conditional probability
probability of an event A given that event B has already occurred; P(A|B)
p-value
probability of getting an F-stat at least as large as the one we observed by chance alone, i.e. if the null hypothesis were true
p or π
proportion (parameter)
p̂
proportion (statistic)
probability
proportion of times something would likely occur in many repeated trials
covariate
quantitative explanatory variable
epsilon_ij
random error corresponding to the j-th observation in the i-th level
replication
repetition of an experiment using a large group of subjects to reduce chance variation in the results
coefficient of determination
r²
r or rho hat
sample (estimated) correlation
SRS avoids ________
selection bias
beta_1 hat meaning
slope; for every 1-unit increase in X, we expect a change of about beta_1 hat in the average of Y
σ
standard deviation (parameter)
S or s
standard deviation (statistic)
discrete random variable
takes on a finite or countably infinite number of values
continuous random variable
can take on any value in an interval (or union of intervals) of real numbers
study
the act or process of investigating something
main effects
the differences in the mean response when the factor goes from one level to another
union
the event consisting of all outcomes that are in A or B or both (A ∪ B)
intersection
the event consisting of all outcomes that are in both A and B (A ∩ B)
memoryless property
the exponential distribution has this special property: if X is the lifetime of a component, the failure rate is constant over time; the probability the component will last more than "a+t" time units given it has already lasted "a" units is the same as that of a new component lasting more than "t" time units
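A numerical check of the property, as a sketch with hypothetical values λ = 0.5, a = 3, t = 2. It uses P(X > x) = e^(-λx) for X ~ Exp(λ), so P(X > a + t | X > a) = P(X > a + t) / P(X > a).

```python
import math

lam = 0.5          # hypothetical rate parameter
a, t = 3.0, 2.0    # hypothetical "already lasted" and "additional" times

def survival(x):
    """P(X > x) for X ~ Exp(lam), x >= 0."""
    return math.exp(-lam * x)

# P(X > a + t | X > a): the event {X > a + t} is contained in {X > a}
conditional = survival(a + t) / survival(a)
# P(X > t) for a brand-new component
fresh = survival(t)
print(conditional, fresh)  # the two values should agree up to rounding
```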
alpha_i
the main effect of group i
probability mass function (pmf)
the probability distribution function of a discrete random variable Y, symbolized by f(y) = P(Y = y)
complement (A')
the set of all available outcomes not contained in the event
level
the specified value for the factor
experiment
the study environment is regulated, the variables of interest are manipulated by the investigator
MSE in SLR
the variation between the observed values and the predicted values of Y
experimental error
the variation in response among replicates
treatment effect
there is an effect due to the variables we are setting
rho
true population correlation
what does a 3x4 factorial design indicate?
two factors, one with 3 levels the other with 4
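In a full factorial 3x4 design, the treatments are all 3 × 4 = 12 level combinations. A short sketch that enumerates them; the level names are hypothetical.

```python
from itertools import product

factor_a_levels = ["low", "medium", "high"]   # hypothetical factor with 3 levels
factor_b_levels = ["w", "x", "y", "z"]        # hypothetical factor with 4 levels

# Every combination of one level from each factor is a treatment
treatments = list(product(factor_a_levels, factor_b_levels))
print(len(treatments))  # 12
```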
experimental units
units to which the treatments are assigned
convenience sample
use of the most convenient group available
σ²
variance (parameter)
S² or s²
variance (statistic)
exponential distribution's single parameter
λ; λ > 0
pdf of an exponential distribution
f(x) = λe^(-λx), for x ≥ 0