Stats 221 Exam 2 Review
The T-distribution is always centered at zero and has a single parameter: ___ ___ ___.
Degrees of freedom
The chi-square distribution has just one parameter called ___ ___ ___, which influences the shape, center, and spread of the distribution.
Degrees of freedom
The chi-squared distribution has one parameter: ___ ___ ___.
Degrees of freedom
___association if variable Y tends to decrease as variable X increases.
Negative
___ ___ is a single plausible guess at the value of a parameter underlying a sample.
Point Estimate
___ ___ of parameters: Good for "single-best-guess" situations.
Point estimates
p(x|μ,σ) = 1/(σ√(2π)) × e^(-0.5((x-μ)/σ)^2) is known as the ___ ___ function.
Probability Density
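As a quick check on this formula, here is a minimal Python sketch (values are illustrative, and scipy is assumed available) that computes the density by hand and compares it with scipy.stats.norm.pdf:

```python
import numpy as np
from scipy import stats

def normal_pdf(x, mu, sigma):
    """Probability density function of N(mu, sigma), written out by hand."""
    return (1 / (sigma * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((x - mu) / sigma) ** 2)

# Illustrative values: x = 1.5 on the standard normal.
x, mu, sigma = 1.5, 0.0, 1.0
print(normal_pdf(x, mu, sigma))                # ~0.1295
print(stats.norm.pdf(x, loc=mu, scale=sigma))  # same value from scipy
```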
___ ___ demonstrate that sample proportions will stabilize toward underlying probabilities as sample size increases.
Probability experiments
___ is a statistical measure that describes the amount of variation in the response that is explained by the least-squares line.
R-squared (R^2)
Chi-Squared Distribution: ___ ___ ___ between two categorical variables: (m.1 - 1)(m.2 - 1) (two variables: (# levels in var1 - 1) * (# levels in var2 - 1)).
Test for independence
A ___ ___ distribution, or ___ for short, is a normal distribution that has a μ of zero and a σ of one.
Standard normal, z-distribution
The ___ is always centered at ___ and has a single parameter: degrees of freedom.
T-distribution, zero
___ hypothesis: A real association in the population
Alternative
Chi-Squared Distribution: ___ ___ ___ for one multilevel categorical variable: m - 1 (one variable: number of levels in variable - 1)
Goodness of fit
In other words, the ___ test tells you if your sample data represents the data you would expect to find in the actual population.
Goodness-of-fit
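For instance, a minimal goodness-of-fit sketch in Python on hypothetical die-roll counts, using scipy.stats.chisquare (assumed available); here one six-level variable gives df = m - 1 = 5:

```python
from scipy import stats

# Hypothetical counts from 120 die rolls; H0: the die is fair.
observed = [18, 22, 16, 25, 19, 20]
expected = [20] * 6  # 120 rolls / 6 faces under H0

# df = m - 1 = 5 for one six-level categorical variable
chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")  # a small p would argue against H0
```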
Correlation, which always takes values between ___ and ___, describes the strength of the linear relationship between two variables. We denote the correlation by R.
-1, 1
The standard normal distribution is a normal distribution where {μ = ___, σ=___}
0, 1
Compare means across more than two levels => ___.
ANOVA
___ uses a single hypothesis test to check whether the means across many groups are equal.
ANOVA
Interval estimates strike a balance between ___ and ___.
Accuracy and precision
Expanding the width of an interval estimate improves its ___, at the expense of its ___.
Accuracy, precision
Statistical Dependence: if the value or level of one variable is known, we have a different & more precise understanding of how a related variable varies than otherwise; they are ___.
Associated
The ___ distribution is sometimes used to characterize data sets and statistics that are always positive and typically right skewed.
Chi-square
The ___ test is intended to test whether a pattern or association observed in a set of sample data represents a real association in the population from which the sample was drawn, or reflects random sampling error when, in reality, there is no real association in the population.
Chi-square
The ___ test is used to test the statistical significance of associations in a two-way table (so, between categorical variables).
Chi-square
When applying the ___ test to a two-way table, we use df = (R - 1) * (C - 1) where R is the number of rows in the table and C is the number of columns.
Chi-square
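A small illustrative sketch of that df rule on a hypothetical 2x3 table, using scipy.stats.chi2_contingency (scipy assumed available):

```python
import numpy as np
from scipy import stats

# Hypothetical 2x3 two-way table of counts (rows: group, columns: response).
table = np.array([[30, 45, 25],
                  [20, 35, 45]])

chi2, p, df, expected = stats.chi2_contingency(table)
print(df)                                # (R - 1)(C - 1) = (2 - 1)(3 - 1) = 2
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```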
The ___ ___ is the sum of the squares of k independent standard normal random variables.
Chi-squared distribution
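A quick simulation sketch of this definition (seed and k are illustrative): the simulated mean and variance of the sum of k squared standard normals should match the chi-square distribution's k and 2k.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
k = 4  # degrees of freedom

# Sum of squares of k independent standard normal draws, many times over.
sims = (rng.standard_normal((100_000, k)) ** 2).sum(axis=1)

print(sims.mean(), sims.var())                           # ~4 and ~8
print(stats.chi2(df=k).mean(), stats.chi2(df=k).var())   # exactly 4 and 8
```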
When we are calculating ___ ___, we're trying to estimate some unknown parameter, and we want to put some boundaries around a range of credible or plausible values.
Confidence intervals
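For example, a minimal sketch of a t-based 95% confidence interval for a mean, on hypothetical data (numpy/scipy assumed available):

```python
import numpy as np
from scipy import stats

# Hypothetical sample of n = 25 measurements.
rng = np.random.default_rng(1)
x = rng.normal(loc=50, scale=8, size=25)

n, xbar, s = len(x), x.mean(), x.std(ddof=1)
se = s / np.sqrt(n)                      # standard error of the mean
t_crit = stats.t.ppf(0.975, df=n - 1)    # critical value for 95% confidence

lower, upper = xbar - t_crit * se, xbar + t_crit * se
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")
```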
___, which always takes values between -1 and 1, describes the strength of the linear relationship between two variables. We denote it by R.
Correlation
___: strength of a linear relationship
Correlation
___ ___ (based on selected level of significance): the minimum value at which the test statistic would lead you to reject the null hypothesis.
Critical value
The calculation of degrees of freedom is ___ between a one-way table and a two-way table.
Different
One way to determine the location of the rejection region is to look at the ___ in which the ___ sign in the ___ hypothesis is pointing.
Direction, inequality, alternative
Failing to reject H.0 ___ equal accepting H.0. (Does or does not?)
Does not
___ ___ = the relative frequencies that we would expect if there was no association in the data.
Expected Frequencies
If we ___, we are making an unreliable bet that the approximate linear relationship will be valid in places where it has not been analyzed.
Extrapolate
Applying a model estimate to values outside of the realm of the original data is called ___.
Extrapolation
___ is risky...or treacherous.
Extrapolation
The ___ test is used to test if sample data fits a distribution from a certain population (e.g. a population with a normal distribution or one with a binomial distribution).
Goodness-of-fit
___ is directly scrutinized against observed data. ___ is only indirectly evaluated.
H.0, H.A
___ ___ uses data to scrutinize the plausibility of a single possible value of parameter(s) of interest.
Hypothesis testing
Important behavior: a sample will converge towards its underlying probability distribution as sample size ___.
Increases
Using a normal distribution when certain conditions are met: 1. a broader ___ condition; 2. the ___ condition must be met by both groups.
Independence, success-failure
The sampling distribution for p.hat based on a sample of size n from a population with a true proportion p is nearly normal when: 1. The sample's observations are ___, e.g. are from a simple random sample. 2. We expect to see at least 10 successes and 10 failures in the sample, i.e. np ≥ 10 and n(1-p) ≥ 10. This is called the ___ ___.
Independent, Success-failure condition
___ ___: as a guess at the unknown value of a parameter of interest, an interval estimate constitutes a range of plausible ("reasonable") values. Formally, intervals are written [L, U] where L is the lower boundary and U is the upper boundary.
Interval Estimate
___ ___: Appealing to researchers who want to let the data do the talking.
Interval Estimates
For the normal distribution, when sigma is ___, the distribution gets stretched out and, as a result, it also gets shorter (flatter).
Large
___ ___ association: Variable Y increases or decreases at a constant rate for each unit increase in variable X.
Linear monotonic
It is more common to explain the strength of a ___ ___ using R^2, called R-squared.
Linear fit
___ associations: if two associated variables are ordinal or numerical, their association is ___ if one variable tends to change in a single direction as the other increases.
Monotonic
___ ___ association: Variable Y increases or decreases at a changing rate for each unit increase in variable X.
Nonlinear monotonic
___ associations: variability in Y is associated with variability in X, but the central tendency of Y does not change in a single direction as variable X increases.
Nonmonotonic
The ___ ___, aka the Gaussian or the bell-shaped curve, models the distribution of continuous numerical outcomes (x), ideally assuming these are unbounded (-∞ < x < ∞).
Normal distribution
As n increases in size, the shape of the t-distribution begins to resemble a ___ ___ and the t-scores become ___ in magnitude (absolute value).
Normal distribution, smaller
___ hypothesis: no association in the population
Null
Generally we must check three conditions on the data before performing ANOVA: 1. the observations are independent within and across groups, 2. the data within each group are nearly normal, and 3. the variability across the groups is about equal. When these three conditions are met, we may perform an ANOVA to determine whether the data provide strong evidence against the ___ hypothesis that all the μ.i are ___.
Null, equal
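A minimal sketch of such a test on hypothetical data for three groups (assuming the three conditions above hold), using scipy.stats.f_oneway:

```python
from scipy import stats

# Hypothetical measurements for three groups.
group_a = [24, 27, 25, 29, 26]
group_b = [31, 28, 33, 30, 29]
group_c = [26, 25, 27, 24, 28]

# One F-test of H0: mu_a = mu_b = mu_c against HA: at least one mean differs.
f_stat, p = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p:.4f}")  # small p => evidence against H0
```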
___ ___ = the relative frequencies actually observed in the data for the sample.
Observed Frequencies
___ hypothesis test (assume sample size = n) e.g. when we use a t-distribution to model the sample mean; df = n - 1
One-sample
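For example, a one-sample t-test sketch on hypothetical data (n = 8, so df = 7; scipy assumed available):

```python
from scipy import stats

# Hypothetical sample; H0: the population mean is 100.
x = [104, 98, 101, 107, 99, 103, 96, 105]

# ttest_1samp models the sample mean with a t-distribution on df = n - 1 = 7.
t_stat, p = stats.ttest_1samp(x, popmean=100)
print(f"t = {t_stat:.2f}, p = {p:.3f}")
```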
A ___ table describes counts for each outcome in a single variable.
One-way
___ ___: The probability of observing the sample result (or a more "extreme" result) if the null hypothesis were actually true.
P-value
A ___ is a distributional characteristic of a given model or population.
Parameter
___ do the same thing for probability distributions or population distributions as sample statistics do for samples.
Parameters
___: N(μ, σ)
Population
___ ___: distribution of the population (the whole set of values)
Population distribution
Inference of sample mean using T-Distribution: When the ___ standard deviation σ is ___ (which is rarely known in real life).
Population, unknown
___ association if variable Y tends to increase as variable X increases.
Positive
Squaring each standardized difference before adding them together does two things: 1. Any standardized difference that is squared will now be ___. 2. Differences that already look unusual - e.g. a standardized difference of 2.5 - will become much ___ after being squared.
Positive, larger
When planning a study, we want to know how likely we are to detect an effect we care about. In other words, if there is a real effect, and that effect is large enough that it has practical value, then what's the probability that we detect that effect? This probability is called the ___, and we can compute it for different sample sizes or for different effect sizes.
Power
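A sketch of such a power calculation, assuming the statsmodels package is available; the effect size, alpha, and group size below are illustrative, not from the course:

```python
from statsmodels.stats.power import TTestIndPower

# Power of a two-sample t-test for a hypothetical standardized effect size
# of 0.3 (Cohen's d) at alpha = 0.05, with 100 subjects per group.
analysis = TTestIndPower()
power = analysis.solve_power(effect_size=0.3, nobs1=100, alpha=0.05)
print(f"power ~ {power:.2f}")

# Or solve for the sample size needed per group to reach 80% power.
n_needed = analysis.solve_power(effect_size=0.3, alpha=0.05, power=0.8)
print(f"n per group ~ {n_needed:.0f}")
```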
Narrowing the width of an interval estimate improves its ___, at the expense of its ___.
Precision, accuracy
It is more common to explain the strength of a linear fit using ___.
R-squared
We ___ observe the sampling distribution of the proportions. It is the distribution of the proportions we would get if we took infinite numbers of samples of the same size as our sample, but since we can't take infinite samples, we would like to model this underlying, unknown distribution.
Rarely
The chi-square test is intended to test whether a pattern or association observed in a set of sample data represents a ___ ___ in the population from which the sample was drawn or reflects ___ ___ ___ when, in reality, there is no real association in the population.
Real association, random sampling error
The ___ of the ith observation (x.i, y.i) is the difference of the observed response (y.i) and the response we would predict based on the model fit (y^.i): e.i = y.i - y^.i
Residual
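A small numpy sketch on hypothetical data that fits the least-squares line, computes each residual e.i = y.i - y^.i, and the R^2 from the earlier card:

```python
import numpy as np

# Hypothetical (x, y) data with a roughly linear relationship.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.3])

# Least-squares fit: y_hat = b0 + b1 * x
b1, b0 = np.polyfit(x, y, deg=1)
y_hat = b0 + b1 * x

# Residual of each observation: e_i = y_i - y_hat_i
residuals = y - y_hat

# R^2: fraction of variation in y explained by the least-squares line.
r_squared = 1 - np.sum(residuals**2) / np.sum((y - y.mean())**2)
print(residuals.round(3), f"R^2 = {r_squared:.3f}")
```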
___: N(x̄, s)
Sample
___ ___: distribution of the sample (a subset of the population)
Sample distribution
When a sample is small, we also require that the ___ ___ come from a ___ ___ population. We can relax this condition more and more for larger and larger sample sizes.
Sample observations, normally distributed
For example, we found that if we have a sample size of 100 in each group, we can only detect an effect size of 3mmHg with a probability of about 0.42. Suppose the researchers moved forward and used only 100 patients per group, and the data did not support the alternative hypothesis, i.e. the researchers did not reject H.0. This is a very bad situation to be in. We want to avoid this situation, so we need to determine an appropriate ___ ___ to ensure we can be pretty confident that we'll detect any effects that are practically important.
Sample size
Remember: ___ ___ summarize a sample distribution; a ___ summarizes a population distribution or probability distribution. But parameters come in the same kinds as statistics: location, scale, and shape parameters.
Sample statistics, parameter
Inference of sample mean using T-Distribution: So we use the ___ standard deviation s instead to estimate the ___ ___ (SE) of the sampling distribution.
Sample, standard error
"___ ___" is a fancy word for "standard deviation" that we only use when we are talking about.
Standard error
___ ___: distribution of the sample statistic you're interested in.
Sampling distribution
___ ___ of ___ using normal ___: N(p.obs, SE.p)
Sampling distribution, proportion, approximation
___ ___ are really the basis of most statistical inference.
Sampling distributions
For the normal distribution, when ___ is small, the distribution is more concentrated.
Sigma
For the normal distribution, when sigma is ___, the distribution is more concentrated.
Small
For the normal distribution, if we hold ___ constant but vary ___, we note that the distribution maintains the same scale; it's not more stretched out, but instead its center moves, and the whole distribution moves along with it.
Sigma, mu
___ p-value -> reject null.
Small
The sampling distribution for p.hat based on a sample of size n from a population with a true proportion p is nearly normal when: 1. The sample's observations are independent, e.g. are from a simple random sample. 2. We expect to see at least 10 successes and 10 failures in the sample, i.e. np ≥ 10 and n(1-p) ≥ 10. This is called the success-failure condition. When these conditions are met, then the sampling distribution of p.hat is nearly normal with mean p and ___ ___ = sqrt(p(1-p)/n)
Standard Error
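A quick simulation sketch (p, n, and the seed are hypothetical) confirming that the simulated spread of p.hat matches sqrt(p(1-p)/n):

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 0.3, 200  # true proportion and sample size (check: np and n(1-p) >= 10)

# Simulate many samples and record the sample proportion p_hat from each.
p_hats = rng.binomial(n, p, size=50_000) / n

# The simulated SD of p_hat should be close to SE = sqrt(p(1-p)/n).
print(p_hats.mean(), p_hats.std())  # ~0.3 and ~0.0324
print(np.sqrt(p * (1 - p) / n))     # 0.0324...
```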
___ ___: if the value or level of one variable is known, we have a different & more precise understanding of how a related variable varies than otherwise; they are associated.
Statistical dependence
___ ___: knowing the value of one variable does not change our understanding of another variable; they are unassociated.
Statistical independence
If we compare the means of a given variable between multiple groups (categorical levels) by analyzing each pair separately, we perform m(m-1)/2 ___ tests for independence. We compound our risk of committing both type 1 and type 2 errors.
Two-sample
___ hypothesis test (assume sample sizes = n.1 and n.2) e.g. t-tests for differences between two sample means (unpaired data); df = min(n.1 - 1, n.2 - 1) (the smaller of n - 1 for the two samples)
Two-sample
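A sketch of that conservative df rule on hypothetical unpaired samples; note that software often uses the more exact Welch df instead of min(n.1 - 1, n.2 - 1):

```python
import numpy as np
from scipy import stats

# Two hypothetical unpaired samples.
x1 = np.array([12.1, 13.4, 11.8, 14.0, 12.7, 13.1])
x2 = np.array([10.9, 11.5, 12.2, 10.4, 11.8])

n1, n2 = len(x1), len(x2)
se = np.sqrt(x1.var(ddof=1) / n1 + x2.var(ddof=1) / n2)
t_stat = (x1.mean() - x2.mean()) / se

# Conservative rule from the card: df = min(n1 - 1, n2 - 1)
df = min(n1 - 1, n2 - 1)
p = 2 * stats.t.sf(abs(t_stat), df=df)  # two-sided p-value
print(f"t = {t_stat:.2f}, df = {df}, p = {p:.3f}")
```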
A ___ table describes counts of combinations of outcomes for two variables.
Two-way
A ___ ___ error is rejecting the null hypothesis when H.0 is actually true. A ___ ___ error is failing to reject the null hypothesis when the alternative is actually true.
Type 1, type 2
Statistical Independence: knowing the value of one variable does not change our understanding of another variable; they are ___.
Unassociated
The standard normal distribution is also known as the ___ distribution.
Z
The ___ of an observation is the number of standard deviations it falls above or below the mean.
Z-score
We can rescale normal distributions to the z distribution by rescaling normally distributed observations to "standard scores" or ___.
Z-scores
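For example, a one-line standardization of hypothetical scores to z-scores in numpy:

```python
import numpy as np

# Hypothetical exam scores; standardize each to a z-score.
scores = np.array([72.0, 85.0, 90.0, 65.0, 78.0])
z = (scores - scores.mean()) / scores.std(ddof=1)
print(z.round(2))  # each value: standard deviations above/below the mean
```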
Generally we must check three conditions on the data before performing ANOVA: 1. the observations are ___ within and across groups, 2. the data within each group are nearly ___, and 3. the ___ across the groups is about equal.
independent, normal, variability
The Normal distribution has two parameters: ___ (___ parameter) and ___ (___ parameter).
μ, location, σ, scale