STATS

Large samples are over

30

Minimizing error

Choosing the parameter estimate that has the least error given the data. This approach is also called ordinary least squares (OLS). The estimate is not always accurate; it simply has less error than the alternatives.
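
As a minimal sketch of the least-squares idea, the snippet below checks that, among a grid of candidate estimates, the sample mean gives the smallest sum of squared errors. The scores and the helper name `sum_squared_error` are made up for illustration.

```python
def sum_squared_error(scores, estimate):
    """Total squared error when `estimate` stands in for every score."""
    return sum((x - estimate) ** 2 for x in scores)

scores = [1, 3, 4, 3, 2]          # hypothetical data
mean = sum(scores) / len(scores)  # the least-squares estimate of the centre

# Try candidate estimates from 0.0 to 5.0 in steps of 0.1 and keep the one
# with the smallest total squared error.
candidates = [i / 10 for i in range(0, 51)]
best = min(candidates, key=lambda b: sum_squared_error(scores, b))

print("mean:", mean, "best candidate:", best)
print("error at the mean:", sum_squared_error(scores, mean))
```

The grid search is only there to make the point visible; in practice the mean is computed directly.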

What are Type I and Type II errors?

A Type I error occurs when we believe that there is a genuine effect in our population, when in fact there isn't. A Type II error occurs when we believe that there is no effect in the population when, in reality, there is.

What is a test statistic and what does it tell us?

A test statistic is a statistic for which we know how frequently different values occur. The observed value of such a statistic is typically used to test hypotheses, or to establish whether a model is a reasonable representation of what's happening in the population.

Mean is center of

the confidence interval (CI). S denotes the sample standard deviation (SD); the CI equation is omitted from these notes.

Central limit theorem:

As samples get large, the sampling distribution of the mean approaches a normal distribution with a mean equal to the population mean and a standard deviation (SD) of $\sigma_{\bar{X}} = S/\sqrt{N}$, where $\bar{X}$ is the sample mean, $S$ is the sample SD, and $N$ is the sample size.
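
The simulation below is one way to see this in action, not a proof. It draws repeated samples from a deliberately skewed (exponential) population and compares the SD of the sample means with the predicted value. The population, the sample size of 50, and the number of repetitions are all assumptions chosen for illustration, and the population SD stands in for $S$.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# A skewed, clearly non-normal population (hypothetical).
population = [random.expovariate(1.0) for _ in range(20_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

N = 50  # size of each sample
sample_means = [
    statistics.fmean(random.sample(population, N)) for _ in range(2000)
]

observed_se = statistics.stdev(sample_means)  # SD of the sample means
predicted_se = pop_sd / N ** 0.5              # SD / sqrt(N)

print("population mean:", pop_mean)
print("mean of sample means:", statistics.fmean(sample_means))
print("observed SE:", observed_se, "predicted SE:", predicted_se)
```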

Standard error of the mean (SE):

the standard deviation of sample means

Outcome = (model) + error

The data we observe can be predicted from the model we choose to fit plus some degree of error

What is the mean and how do we tell if it's representative of our data?

The mean is a simple statistical model of the centre of a distribution of scores. A hypothetical estimate of the 'typical' score. We use the variance, or standard deviation, to tell us whether it is representative of our data. The standard deviation is a measure of how much error there is associated with the mean: a small standard deviation indicates that the mean is a good representation of our data.

Bonferroni correction

A correction applied to the alpha level to control the overall Type I error rate when multiple significance tests are carried out. The criterion for significance becomes the alpha level divided by the number of tests conducted. It is too strict when many tests are performed.
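
A short sketch of the correction, using made-up p-values: the alpha level is divided by the number of tests, and each p-value is compared against that stricter threshold. The function name `bonferroni_alpha` is hypothetical.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance criterion: alpha divided by the number of tests."""
    return alpha / n_tests

p_values = [0.001, 0.012, 0.030, 0.049]  # hypothetical results of four tests
threshold = bonferroni_alpha(0.05, len(p_values))

# A test counts as significant only if it beats the corrected threshold.
significant = [p < threshold for p in p_values]

print("corrected threshold:", threshold)
print("significant after correction:", significant)
```

All four p-values are below 0.05, but only the first two pass the corrected criterion, which shows how strict the correction becomes as the number of tests grows.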

Confidence interval

boundaries in which we believe the population value will fall

Deviance is another word for

error

Linear models

models based on straight lines

Some overlap is

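needed between confidence intervals if the means are to be treated as coming from the same population (no overlap suggests the means are from different populations)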

Null hypothesis means

no relationship between variables. The p-value is a long-run probability.

Two Tailed:

non-directional. Changing hypotheses and their tests after the fact is cheating.

Statistical significance

p < 0.05: our model explains a sufficient amount of variation to reflect a genuine effect

T- distribution:

a probability distribution that changes shape as sample size increases, getting closer to the normal distribution

Interval estimate

the sample value as the midpoint, with an upper and a lower limit

Sampling variation

samples vary because they contain different members of the population

Point estimate

single value from sample

Large samples

small differences can be significant

Sample

small sub-set of the population

Interval is

small, sample mean is close to true mean

Average error is

SS/n, where SS is the sum of squared errors and n is the number of scores (observations), not the number of variables

SS can be used to assess

total error

Population

total set of observations that can be made

P value:

The threshold is usually 0.05, although Fisher never stated this as the magic number. A 5% chance is the conventional threshold of confidence.

We predict ______ of an _______ ________ based on a model

values of an outcome variable

Degrees of freedom

(df) = the number of scores that are free to vary. It is adjusted to n - 1 because we are using the sample to estimate the population.

Parameters

(usually) constants believed to represent some fundamental truth about the relations between variables in the model

What do the sum of squares, variance and standard deviation represent? How do they differ?

All of these measures tell us something about how well the mean fits the observed sample data. Large values (relative to the scale of measurement) suggest the mean is a poor fit of the observed scores, and small values suggest a good fit. They are also, therefore, measures of dispersion, with large values indicating a spread-out distribution of scores and small values showing a more tightly packed distribution. These measures all represent the same thing, but differ in how they express it. The sum of squared errors is a 'total' and is, therefore, affected by the number of data points. The variance is the 'average' variability but in units squared. The standard deviation is the average variation but converted back to the original units of measurement. As such, the size of the standard deviation can be compared to the mean (because they are in the same units of measurement).
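
The three measures can be computed step by step, as in this small sketch. The scores are hypothetical, and the variance uses n - 1 because the sample is being used to estimate the population.

```python
import math

scores = [1, 3, 4, 3, 2]  # hypothetical sample
n = len(scores)
mean = sum(scores) / n

# Sum of squared errors: a total, so it grows with the number of scores.
ss = sum((x - mean) ** 2 for x in scores)

# Variance: the average squared error (in squared units), using n - 1.
variance = ss / (n - 1)

# Standard deviation: back in the original units of measurement.
sd = math.sqrt(variance)

print("SS:", ss, "variance:", variance, "SD:", sd)
```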

Types of Hypothesis

Alternative (H1, the one with 1): the effect will be present. Null (H0, the one with 0): the effect is absent. Neither is spoken of as true or untrue, only in terms of probability. Directional hypothesis: an effect will occur and its direction is stated. Non-directional hypothesis: an effect will occur but its direction is not stated.

What is statistical power?

Power is the ability of a test to detect an effect of a particular size (a value of 0.8 is a good level to aim for).

Statistical Power

The probability that a given test will find an effect, assuming one exists. It depends on how big the effect is, how strict we are about significance, and the sample size. Power = 1 - β, where β is the probability of a Type II error. It can be used to calculate the sample size necessary to achieve a given level of power.
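
To make the link between effect size, alpha, sample size, and power concrete, here is a rough sketch for one simple case: a two-tailed one-sample z-test with a standardized effect size. That choice of test, and the names `z_test_power` and `sample_size_for_power`, are assumptions for illustration; other tests need other formulas.

```python
from statistics import NormalDist

def z_test_power(effect_size, n, alpha=0.05):
    """Approximate power of a two-tailed one-sample z-test.

    effect_size is the true mean difference in SD units (an assumption
    of this sketch); n is the sample size.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)  # critical value for the chosen alpha
    shift = effect_size * n ** 0.5     # how far the effect moves the statistic
    # Probability of landing beyond either critical value when the effect exists.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

def sample_size_for_power(effect_size, target=0.8, alpha=0.05):
    """Smallest n whose power reaches the target, found by simple search."""
    n = 2
    while z_test_power(effect_size, n, alpha) < target:
        n += 1
    return n

power = z_test_power(0.5, 20)
beta = 1 - power  # probability of a Type II error
print("power with n = 20:", power, "beta:", beta)
print("n needed for 0.8 power:", sample_size_for_power(0.5))
```

When the effect size is zero, the function returns alpha, which is a handy sanity check: with no real effect, the only "detections" are Type I errors.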

What's the difference between the standard deviation and the standard error?

The standard deviation tells us how much observations in our sample differ from the mean value within our sample. The standard error tells us not about how the sample mean represents the sample itself, but how well the sample mean represents the population mean. The standard error is the standard deviation of the sampling distribution of a statistic. For a given statistic (e.g. the mean) it tells us how much variability there is in this statistic across samples from the same population. Large values, therefore, indicate that a statistic from a given sample may not be an accurate reflection of the population from which the sample came.

Why do we use samples?

We are usually interested in populations, but because we cannot collect data from every human being (or whatever) in the population, we collect data from a small subset of the population (known as a sample) and use these data to infer things about the population as a whole.

Intervals

Typically a 95% or sometimes a 99% CI is used. For a 95% CI, in 95% of samples the interval will contain the population value; in the other 5% of samples it won't.
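
The "95% of samples" reading can be checked with a simulation, sketched below. It assumes a normal population with a known mean (values invented for illustration) and builds each interval with the normal critical value of about 1.96 rather than a t value, so coverage comes out close to, but not exactly, 95%.

```python
import random
from statistics import NormalDist, fmean, stdev

random.seed(1)  # fixed seed so the run is repeatable

TRUE_MEAN, TRUE_SD, N = 100.0, 15.0, 40   # hypothetical population and sample size
Z_CRIT = NormalDist().inv_cdf(0.975)      # about 1.96 for a 95% interval

def confidence_interval(sample, z=Z_CRIT):
    """Sample mean plus or minus z standard errors."""
    m = fmean(sample)
    se = stdev(sample) / len(sample) ** 0.5
    return m - z * se, m + z * se

TRIALS = 2000
hits = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    lower, upper = confidence_interval(sample)
    hits += lower <= TRUE_MEAN <= upper   # did this interval catch the true mean?

coverage = hits / TRIALS
print("proportion of intervals containing the population mean:", coverage)
```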

Fit

the degree to which a statistical model represents the data collected (good, moderate, or poor)

One tailed:

directional (if the results are in the opposite direction, you MUST accept the null hypothesis). Changing hypotheses and their tests after the fact is cheating.

Non-directional hypothesis:

an effect will occur but its direction is not stated

Directional hypothesis:

effect will occur and the direction is stated

Test stats

effect / error; systematic variance / non-systematic variance

Experimentwise error rate

the error rate across statistical tests conducted on the same data: $1 - 0.95^n$, where $n$ is the number of tests
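
A quick sketch of how this error rate grows with the number of tests, assuming the tests are independent and each uses the same alpha. The function name `familywise_error_rate` is hypothetical.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one Type I error across n independent tests."""
    return 1 - (1 - alpha) ** n_tests

for n_tests in (1, 3, 10):
    rate = familywise_error_rate(n_tests)
    print(n_tests, "tests:", round(rate, 4))
```

With a single test the rate is just alpha; it rises quickly as more tests are run on the same data, which is the motivation for corrections such as Bonferroni.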

Sampling distribution

frequency distribution of sample means from the same population

Any parameter that can be estimated in a sample has a

hypothetical sampling distribution and standard error

Small samples

large differences can be non-significant

To calculate confidence interval we need to know the

limits within which 95% of sample means will fall

null hypothesis

H0, the one with 0; the effect is absent

alternative hypothesis

H1, the one with 1; the effect will be present

