Stats 221 Exam 2 Review

The T-distribution is always centered at zero and has a single parameter: ___ ___ ___.

Degrees of freedom

The chi square distribution has just one parameter called ___ ___ ___, which influences the shape, center, and spread of the distribution.

Degrees of freedom

The chi-squared distribution has one parameter: ___ ____ ___.

Degrees of freedom

___ association if variable Y tends to decrease as variable X increases.

Negative

___ ___ is a single plausible guess at the value of a parameter underlying a sample.

Point Estimate

___ ___ of parameters: Good for "single-best-guess" situations.

Point estimates

p(x|μ,σ) = 1/(σ√(2π)) × e^(-0.5((x-μ)/σ)^2) is known as the ___ ___ function.

Probability Density
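
As a minimal sketch, the density above can be written out term by term in Python and compared with the standard library's `statistics.NormalDist`. The function name `normal_pdf` is illustrative and not part of the course material.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density p(x | mu, sigma), written out term by term."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z ** 2) / (sigma * math.sqrt(2 * math.pi))

# Height of the standard normal curve at its center.
density_at_mean = normal_pdf(0.0, 0.0, 1.0)
# Same quantity from the standard library, for comparison.
library_value = NormalDist(0.0, 1.0).pdf(0.0)
```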

___ ___ demonstrate that sample proportions will stabilize toward underlying probabilities as sample size increases.

Probability experiments
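
A small simulation can illustrate this stabilizing behavior. The sketch below assumes a fair coin (p = 0.5) and a fixed random seed; both are choices made for illustration only.

```python
import random

def sample_proportion(n, p=0.5, seed=221):
    """Simulate n independent trials and return the proportion of successes."""
    rng = random.Random(seed)
    successes = sum(1 for _ in range(n) if rng.random() < p)
    return successes / n

# Proportions from small to large samples; larger n tends to land nearer p.
proportions = {n: sample_proportion(n) for n in (10, 1_000, 100_000)}
```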

___ is a statistical measure that describes the amount of variation in the response that is explained by the least-squares line.

R-squared (R^2)

Chi-Squared Distribution: ___ ___ ___ between two categorical variables: df = (m.1 - 1)(m.2 - 1), i.e. (# levels in var1 - 1) * (# levels in var2 - 1)

Test for independence

A ___ ___ distribution, or ___ for short, is a normal distribution that has a μ of zero and a σ of one.

Standard normal, z-distribution

The ___ is always centered at ___ and has a single parameter: degrees of freedom.

T-distribution, zero

___ hypothesis: A real association in the population

Alternative

Chi-Squared Distribution: ___ ___ ___ for one multilevel categorical variable: m - 1 (one variable: number of levels in variable - 1)

Goodness of fit

In other words, the ___ test tells you if your sample data represents the data you would expect to find in the actual population.

Goodness-of-fit

Correlation, which always takes values between ___ and ___, describes the strength of the linear relationship between two variables. We denote the correlation by R.

-1, 1

The standard normal distribution is a normal distribution where {μ = ___, σ=___}

0, 1

Compare means across more than two levels => ___.

ANOVA

___ uses a single hypothesis test to check whether the means across many groups are equal.

ANOVA

Interval estimates strike a balance between ___ and ___.

Accuracy and precision

Expanding the width of an interval estimate improves its ___, at the expense of its ____.

Accuracy, precision

Statistical Dependence: if the value or level of one variable is known, we have a different & more precise understanding of how a related variable varies than otherwise; they are ___.

Associated

The ___ distribution is sometimes used to characterize data sets and statistics that are always positive and typically right skewed.

Chi-square

The ___ test is intended to test whether a pattern or association observed in a set of sample data represents a real association in the population from which the sample was drawn or reflects random sampling error when, in reality, there is no real association in the population.

Chi-square

The ___ test is used to test statistical significance of associations in a two-way table (so, between categorical variables)

Chi-square

When applying the ___ test to a two-way table, we use df = (R - 1) * (C - 1) where R is the number of rows in the table and C is the number of columns.

Chi-square
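
The two-way calculation can be sketched as follows. Expected counts are taken as (row total × column total) / grand total, which is the usual no-association expectation; the table of counts is made up for illustration, and the function name is hypothetical.

```python
def chi_square_two_way(table):
    """Return the chi-square statistic and df for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the variables were unassociated.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)  # (R - 1) * (C - 1)
    return statistic, df

counts = [[30, 20], [20, 30]]  # made-up 2x2 table
statistic, df = chi_square_two_way(counts)
```

With this table every expected count is 25, so the statistic works out to 4.0 on 1 degree of freedom.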

The ___ ___ is the sum of the squares of k independent standard normal random variables.

Chi-squared distribution

When we are calculating ___ ___, we're trying to estimate some unknown parameter, and we want to put some boundaries around a range of credible or plausible values.

Confidence intervals
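
To make the accuracy/precision trade-off concrete, here is a sketch of a normal-approximation interval for a proportion. The sample values (p.hat = 0.40, n = 200) are invented, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def proportion_interval(p_hat, n, confidence):
    """Normal-approximation interval [L, U] for a proportion."""
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z_star * se, p_hat + z_star * se

low_90, high_90 = proportion_interval(0.40, 200, 0.90)
low_99, high_99 = proportion_interval(0.40, 200, 0.99)
# The 99% interval is wider: more likely to capture the parameter, less precise.
```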

___, which always takes values between -1 and 1, describes the strength of the linear relationship between two variables. We denote it by R.

Correlation

___: strength of a linear relationship

Correlation
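
For reference, a minimal sketch of computing R from paired data, using the standard sum-of-products formula. The data values are made up.

```python
import math

def correlation(xs, ys):
    """Correlation R between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
r_positive = correlation(xs, [2, 4, 5, 4, 5])   # noisy upward trend
r_negative = correlation(xs, [10, 8, 6, 4, 2])  # perfect downward line
```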

___ ___ (based on selected level of significance): the minimum value at which the test statistic would lead you to reject the hypothesis.

Critical value

The calculation of degrees of freedom is ___ for a one-way table and a two-way table.

Different

One way to determine the location of the rejection region is to look at the ___ in which the ___ sign in the ___ hypothesis is pointing.

Direction, inequality, alternate

Failing to reject H.0 ___ equal accepting H.0. (Does or does not?)

Does not

___ ___ = the relative frequencies that we would expect if there was no association in the data.

Expected Frequencies

If we ___, we are making an unreliable bet that the approximate linear relationship will be valid in places where it has not been analyzed.

Extrapolate

Applying a model estimate to values outside of the realm of the original data is called ___

Extrapolation

___ is risky...or treacherous.

Extrapolation

The ___ test is used to test if sample data fits a distribution from a certain population (i.e. a population with a normal distribution or one with a binomial distribution)

Goodness-of-fit

___ is directly scrutinized against observed data. __ is only indirectly evaluated.

H.0, H.A

___ ___ uses data to scrutinize the plausibility of a single possible value of parameter(s) of interest.

Hypothesis testing

Important behavior: a sample will converge towards its underlying probability distribution as sample size ___.

Increases

Using a normal distribution when certain conditions are met: 1. a broader ___ condition 2. The ___ condition must be met by both groups.

Independence, success-failure

The sampling distribution for p.hat based on a sample of size n from a population with a true proportion p is nearly normal when: 1. The sample's observations are ___, e.g. are from a simple random sample. 2. We expect to see at least 10 successes and 10 failures in the sample, i.e. np ≥ 10 and n(1-p) ≥ 10. This is called the ___ ___.

Independent, Success-failure condition

___ ____: as a guess at the unknown value of a parameter of interest, an interval estimate constitutes a series of plausible ("reasonable") values. Formally intervals are written [L, U] where L is the lower boundary and U is the upper boundary.

Interval Estimate

___ ___: Appealing to researchers who want to let the data do the talking

Interval Estimates

For the normal distribution, when sigma is ___, the distribution gets stretched out and, as a result, it also gets shorter.

Large

___ ___ association: Variable Y increases or decreases at a constant rate for each unit increase in variable X.

Linear monotonic

It is more common to explain the strength of a ___ ___ using R^2, called R-squared.

Linear fit

___ associations: if two associated variables are ordinal or numerical, their association is (your answer) if one variable tends to change in a single direction as the other increases.

Monotonic

___ ___ association: Variable Y increases or decreases at a changing rate for each unit increase in variable X.

Nonlinear monotonic

___ associations: variability in Y is associated with variability in X, but the central tendency of Y does not change in a single direction as variable X increases.

Nonmonotonic

The ___ ___, aka the Gaussian or the bell-shaped curve, models the distribution of continuous numerical outcomes (x), ideally assuming these are unbounded (-∞ < x < ∞).

Normal distribution

As n increases in size, the shape of the t-distribution begins to resemble a ___ ___ and the t-scores become ___ in magnitude (absolute value).

Normal distribution, smaller

___ hypothesis.: no association in the population

Null

Generally we must check three conditions on the data before performing ANOVA: 1. the observations are independent within and across groups, 2. the data within each group are nearly normal, and 3. the variability across the groups is about equal. When these three conditions are met, we may perform an ANOVA to determine whether the data provide strong evidence against the ___ hypothesis that all the μ.i are ___.

Null, equal

___ ___ = the relative frequencies actually observed in the data for the sample.

Observed Frequencies

___ hypothesis test (assume sample size = n), e.g. when using a t-distribution to model the sample mean; df = n - 1

One-sample

A ___ table describes counts for each outcome in a single variable.

One-way

___ ___: The probability of observing the sample result (or a more "extreme" result) if the null hypothesis were actually true.

P-value

A ___ is a distributional characteristic of a given model or population.

Parameter

___ do the same thing for probability distributions or population distributions as sample statistics do for samples.

Parameters

___: N(μ, σ)

Population

___ ___: distribution of the population (the whole set of values)

Population distribution

Inference of sample mean using T-Distribution: When the ___ standard deviation σ is ___ (which is rarely known in real life).

Population, unknown

___ association if variable Y tends to increase as variable X increases.

Positive

Squaring each standardized difference before adding them together does two things: 1. Any standardized difference that is squared will now be ___. 2. Differences that already look unusual - e.g. a standardized difference of 2.5 - will become much ___ after being squared.

Positive, larger
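
The squaring step can be sketched for a one-way table. Here each standardized difference is taken as (observed - expected) / sqrt(expected), and df = m - 1 for m levels. The counts are invented: 60 rolls of a die assumed fair under the null.

```python
import math

observed = [8, 12, 10, 9, 11, 10]  # made-up counts from 60 rolls
expected = [10.0] * 6              # fair die: 60 / 6 per face

# Standardized differences can be negative; their squares cannot.
standardized = [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]
squared = [z ** 2 for z in standardized]

chi_square = sum(squared)
df = len(observed) - 1  # m - 1 levels
```

A standardized difference of 2.5 would contribute 2.5 ** 2 = 6.25 to the sum, which is how unusual cells come to dominate the statistic.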

When planning a study, we want to know how likely we are to detect an effect we care about. In other words, if there is a real effect, and that effect is large enough that it has practical value, then what's the probability that we detect that effect? This probability is called the ___, and we can compute it for different sample sizes or for different effect sizes.

Power

Narrowing the width of an interval estimate improves its ___, at the expense of its ___.

Precision, accuracy

It is more common to explain the strength of a linear fit using ___.

R-squared

We ___ observe the sampling distribution of the proportions. It is the distribution of the proportions we would get if we took an infinite number of samples of the same size as our sample, but since we can't take infinite samples, we would like to model this underlying, unknown distribution.

Rarely

The Chi-square test is intended to test whether a pattern or association observed in a set of sample data represents a ___ ___ in the population from which the sample was drawn or reflects ___ ___ ___ when, in reality, there is no real association in the population.

Real association, random sampling error

The ___ of the ith observation (x.i, y.i) is the difference between the observed response (y.i) and the response we would predict based on the model fit (ŷ.i): e.i = y.i - ŷ.i

Residual
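
A short sketch of residuals from a least-squares line, together with R-squared computed as 1 - SS.res / SS.tot. The data are made up, and `least_squares` is an illustrative helper rather than a library function.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
intercept, slope = least_squares(xs, ys)

predicted = [intercept + slope * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]  # e.i = y.i - ŷ.i

mean_y = sum(ys) / len(ys)
ss_res = sum(e ** 2 for e in residuals)
ss_tot = sum((y - mean_y) ** 2 for y in ys)
r_squared = 1 - ss_res / ss_tot  # share of variation explained by the line
```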

___: N(x̄, s)

Sample

___ ___: distribution of the sample (a subset of the population)

Sample distribution

When a sample is small, we also require that the ___ ___ come from a ___ ___ population. We can relax this condition more and more for larger and larger sample sizes.

Sample observations, normally distributed

For example, we found that if we have a sample size of 100 in each group, we can only detect an effect size of 3mmHg with a probability of about 0.42. Suppose the researchers move forward and only used 100 patients per group, and the data did not support the alternative hypothesis, i.e. the researchers did not reject H.0. This is a very bad situation to be in for a few reasons: - We want to avoid this situation, so we need to determine an appropriate ___ ___ to ensure we can be pretty confident that we'll detect any effects that are practically important.

Sample size
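
The power figure quoted above can be roughly reproduced with a normal model. This is only a sketch under stated assumptions: a per-group standard deviation of 12 mmHg (not given in the card), a two-sided test at the 0.05 level, and an effect of -3 mmHg. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def power_two_sample(effect, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test for a difference in two means."""
    se = math.sqrt(sd ** 2 / n_per_group + sd ** 2 / n_per_group)
    cutoff = NormalDist().inv_cdf(1 - alpha / 2) * se  # rejection boundary
    alternative = NormalDist(mu=effect, sigma=se)      # if the effect is real
    # Probability that the observed difference lands in either rejection tail.
    return alternative.cdf(-cutoff) + (1 - alternative.cdf(cutoff))

# sd=12.0 is an assumed value, not taken from the card.
power_100 = power_two_sample(effect=-3.0, sd=12.0, n_per_group=100)
power_500 = power_two_sample(effect=-3.0, sd=12.0, n_per_group=500)
```

Under these assumptions, 100 patients per group gives a power of about 0.42, while 500 per group pushes it above 0.9.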

Remember ___ ___ will summarize a sample distribution; a ___ summarizes a population distribution or probability distribution. But we use the same kinds of parameters as we use statistics; we have location, scale, and shape parameters.

Sample statistics, parameter

Inference of sample mean using T-Distribution: So we use the ___ standard deviation s instead to estimate the ___ ___ in the sampling distribution SE.

Sample, standard error

"___ ___" is a fancy word for "standard deviation" that we only use when we are talking about.

Standard error

___ ___: distribution of the sample statistic you're interested in.

Sampling distribution

___ ___ of ___ using normal ____: N(p.obs, SE.p)

Sampling distribution, proportion, approximation

___ ___ are really the basis of most of the statistical inferences.

Sampling distributions

For the normal distribution, when ___ is small, the distribution is more concentrated.

Sigma

For the normal distribution, when sigma is ___, the distribution is more concentrated.

Small

For the normal distribution, if we hold ___ constant but vary ___, then we note that the distribution maintains the same scale; it's not more stretched out, but instead its center moves, and as its center moves, the whole distribution moves along with it.

Sigma, mu

___ p-value -> reject null.

Small

The sampling distribution for p.hat based on a sample of size n from a population with a true proportion p is nearly normal when: 1. The sample's observations are independent, e.g. are from a simple random sample. 2. We expect to see at least 10 successes and 10 failures in the sample, i.e. np ≥ 10 and n(1-p) ≥ 10. This is called the success-failure condition. When these conditions are met, then the sampling distribution of p.hat is nearly normal with mean p and ___ ___ = sqrt(p(1-p)/n)

Standard Error
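
The success-failure check and the standard error formula can be sketched directly. The function names and the example values of n and p are illustrative.

```python
import math

def success_failure_ok(n, p):
    """True when we expect at least 10 successes and 10 failures."""
    return n * p >= 10 and n * (1 - p) >= 10

def standard_error(n, p):
    """Standard error of p.hat: sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)

ok_large = success_failure_ok(100, 0.2)  # 20 successes, 80 failures expected
ok_small = success_failure_ok(30, 0.1)   # only 3 successes expected
se = standard_error(100, 0.2)
```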

___ ___: if the value or level of one variable is known, we have a different & more precise understanding of how a related variable varies than otherwise; they are associated.

Statistical dependence

___ ___: knowing the value of one variable does not change our understanding of another variable; they are unassociated.

Statistical independence

If we compare the means of a given variable between multiple groups (categorical levels) by analyzing each pair separately, we perform m(m-1)/2 ___ tests for independence. We compound our risk of committing both type 1 and type 2 errors.

Two-sample
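
To see how quickly the risk compounds, the sketch below counts the m(m-1)/2 pairwise tests and estimates the chance of at least one type 1 error. It assumes the tests are independent and each uses the same alpha, which is a simplification added here for illustration.

```python
import math

def pairwise_tests(m):
    """Number of distinct pairs among m groups: m(m-1)/2."""
    return math.comb(m, 2)

def familywise_error(m, alpha=0.05):
    """Chance of at least one type 1 error, assuming independent tests."""
    return 1 - (1 - alpha) ** pairwise_tests(m)

tests_for_5_groups = pairwise_tests(5)   # 10 pairwise tests
risk_for_5_groups = familywise_error(5)  # roughly 0.40 at alpha = 0.05
```

This is one motivation for using a single ANOVA test instead of many pairwise comparisons.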

___ hypothesis test (assume sample sizes = n1 and n2) e.g. t-tests for differences between two sample means (unpaired data) df=min(n.1-1, n.2-1) (as the smaller of n-1 for the two samples)

Two-sample

A ___ table describes counts of combinations of outcomes for two variables.

Two-way

A ___ ___ Error is rejecting the null hypothesis when H.0 is actually true. A ___ ___ error is failing to reject the null hypothesis when the alternative is actually true.

Type 1, type 2

Statistical Independence: knowing the value of one variable does not change our understanding of another variable; they are ___.

Unassociated

The standard normal distribution is also known as the ___ distribution.

Z

The ___ of an observation is the number of standard deviations it falls above or below the mean.

Z-score

We can rescale normal distributions to the z distribution by rescaling normally distributed observations to "standard scores" or ___.

Z-scores
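
As a quick sketch, rescaling an observation to a standard score looks like this. The values (mean 100, standard deviation 15) are made up for illustration.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x falls above (+) or below (-) the mean."""
    return (x - mu) / sigma

z_above = z_score(130, 100, 15)  # 2.0: two standard deviations above the mean
z_below = z_score(85, 100, 15)   # -1.0: one standard deviation below the mean
```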

Generally we must check three conditions on the data before performing ANOVA: 1. the observations are ___ within and across groups, 2. the data within each group are nearly ___, and 3. the ___ across the groups is about equal.

independent, normal, variability

The Normal distribution has two parameters: ___ (___ parameter) and ___ (___ parameter).

μ, location, σ, scale

