Statistics (OpenIntro)
pnorm(1300, mean = 1100, sd = 200, lower.tail=FALSE) tells you
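The proportion of values greater than 1300 in a normal distribution with mean 1100 and standard deviation 200 (about 0.159, i.e. 15.9%)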
pnorm(1300, mean = 1100, sd = 200) tells you
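The proportion of values lower than 1300 in a normal distribution with mean 1100 and standard deviation 200 (about 0.841, i.e. 84.1%)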
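A minimal R check of the two pnorm cards. Both calls return proportions between 0 and 1, and the two tails sum to 1.

```r
# Upper and lower tail of a normal distribution with mean 1100 and sd 200
upper <- pnorm(1300, mean = 1100, sd = 200, lower.tail = FALSE)
lower <- pnorm(1300, mean = 1100, sd = 200)

round(upper, 4)  # 0.1587
round(lower, 4)  # 0.8413
upper + lower    # 1
```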
Hypothesis Testing Steps
1. Check assumptions 2. State the hypotheses 3. Use sample data to compute an estimate of the parameter 4. Compare your estimate to the claim (test statistic vs. critical value) 5. Make a conclusion about the claim (reject or fail to reject the null hypothesis)
binomial distribution describes the probability of...
binomial distribution describes the probability of having exactly k successes in n independent Bernoulli trials with probability of a success p (in Example 4.28, n = 4, k = 3, p = 0.7)
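As a quick check, the probability for the values quoted from Example 4.28 (n = 4, k = 3, p = 0.7) can be computed from the binomial formula and compared with R's built-in `dbinom`. This is a sketch; the variable names are illustrative.

```r
n <- 4    # number of trials
k <- 3    # number of successes
p <- 0.7  # probability of success on each trial

# Binomial formula: choose(n, k) * p^k * (1 - p)^(n - k)
by_formula <- choose(n, k) * p^k * (1 - p)^(n - k)
by_dbinom  <- dbinom(k, size = n, prob = p)

by_formula  # 0.4116
by_dbinom   # 0.4116
```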
binomial distribution is used to describe...
binomial distribution is used to describe the number of successes in a fixed number of trials. This is different from the geometric distribution, which describes the number of trials we must wait before we observe a success.
chi-square distribution has ____________ parameter(s) called ____________
chi-square distribution has just one parameter called degrees of freedom (df), which influences the shape, center, and spread of the distribution.
covariates
covariates are characteristics (excluding the actual treatment) of the participants in an experiment. If you collect data on characteristics before you run an experiment, you could use that data to see how your treatment affects different groups or populations
The _________________ is used to describe how many trials it takes to observe a success.
geometric distribution
negative binomial distribution describes...
negative binomial distribution describes the probability of observing the kth success on the nth trial.
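A small sketch of this definition in R, using hypothetical values (k = 3, n = 5, p = 0.7) chosen only for illustration. Note that R's `dnbinom` is parameterized by the number of failures (n - k), not the number of trials.

```r
k <- 3    # which success we are waiting for (hypothetical value)
n <- 5    # trial on which the kth success occurs (hypothetical value)
p <- 0.7  # probability of success on each trial (hypothetical value)

# Formula: choose(n - 1, k - 1) * p^k * (1 - p)^(n - k)
by_formula <- choose(n - 1, k - 1) * p^k * (1 - p)^(n - k)

# dnbinom counts failures before the kth success, so pass n - k
by_dnbinom <- dnbinom(n - k, size = k, prob = p)

by_formula  # 0.18522
```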
"one-way" meaning
testing one independent variable (e.g. the difference between two or three phrasings of a question). Two-way means two independent variables (e.g. question phrasing and a second factor, such as the respondent's age group).
sampling distribution
the distribution of values taken by the statistic in all possible samples of the same size from the same population
R squared
the proportion of the total variation in a dependent variable explained by an independent variable
confidence interval
a range of plausible values for a population parameter, computed from sample data at a stated confidence level (e.g. 95%)
x-bar vs x-hat
x-bar is always a sample mean but it is not the only possible method of estimating a population mean. Informally: a hat is an estimate that is sometimes calculated by the arithmetic mean, but can be some other type of estimate (median, mode, some kind of maximum likelihood estimate, etc.)
Examples of parameters
Because the mean and standard deviation describe a normal distribution exactly, they are called the distribution's parameters.
chi squared degrees of freedom
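(number of rows - 1) × (number of columns - 1) for a test on a two-way table; for a one-way goodness-of-fit test with k bins, df = k - 1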
Determining Independence
1) If the observations are from a simple random sample, then they are independent. 2) If a sample is from a seemingly random process, e.g. an occasional error on an assembly line, checking independence is more difficult. In this case, use your best judgement.
geometric distribution is either of two discrete probability distributions...
1) The probability distribution of the number X of Bernoulli trials needed to get one success, supported on the set { 1, 2, 3, ... } 2) The probability distribution of the number Y = X − 1 of failures before the first success, supported on the set { 0, 1, 2, 3, ... }
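The two conventions differ only by a shift of one. R's `dgeom` uses the second convention (number of failures before the first success), so the probability that the first success lands on trial x is `dgeom(x - 1, p)`. A short sketch, with p = 0.3 as a hypothetical value:

```r
p <- 0.3  # probability of success on each trial (hypothetical value)
x <- 3    # trial on which the first success occurs

# Convention 1: P(X = x) = (1 - p)^(x - 1) * p, support 1, 2, 3, ...
trials_formula <- (1 - p)^(x - 1) * p

# Convention 2 (used by dgeom): Y = X - 1 failures, support 0, 1, 2, ...
failures_dgeom <- dgeom(x - 1, prob = p)

trials_formula  # 0.147
failures_dgeom  # 0.147
```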
t distribution
A distribution specified by degrees of freedom used to model test statistics for the sample mean, differences between sample means, etc.
Adjusted R-squared
A goodness of fit measure in multiple regression analysis that penalizes additional explanatory variables by using a degrees of freedom adjustment in estimating the error variance
chi-square distribution
A skewed distribution whose shape depends solely on the number of degrees of freedom. As the number of degrees of freedom increases, the chi-square distribution becomes more symmetrical.
one-sided hypothesis test
A test in which the null hypothesis is rejected only if the evidence indicates that the population parameter is greater than (smaller than) θ0. The alternative hypothesis also has one side.
dummy variable
A variable for which all cases falling into a specific category assume the value of 1, and all cases not falling into that category assume a value of 0.
significance level
As a general rule of thumb, for those cases where the null hypothesis is actually true, we do not want to incorrectly reject H0 more than 5% of the time. This corresponds to a significance level of 0.05. That is, if the null hypothesis is true, the significance level indicates how often the data lead us to incorrectly reject H0. We often write the significance level using α (the Greek letter alpha): α = 0.05.
Bias
Bias describes a systematic tendency to over- or under-estimate the true population value. For example, if we were taking a student poll asking about support for a new college stadium, we'd probably get a biased estimate of the stadium's level of student support by wording the question as, "Do you support your school by supporting funding for the new stadium?"
Power (effect size)
In other words, if there is a real effect, and that effect is large enough that it has practical value, then what's the probability that we detect that effect? This probability is called the power, and we can compute it for different sample sizes or for different effect sizes.
data snooping or data fishing
It is inappropriate to examine all data by eye (informal testing) and only afterwards decide which parts to formally test. This is called data snooping or data fishing. Naturally, we would pick the groups with the large differences for the formal test, and this would lead to an inflation in the Type 1 Error rate. To understand this better, let's consider a slightly different problem. Suppose we are to measure the aptitude of students in 20 classes in a large elementary school at the beginning of the year. In this school, all students are randomly assigned to classrooms, so any differences we observe between the classes at the start of the year are completely due to chance. However, with so many groups, we will probably observe a few groups that look rather different from each other. If we select only the classes that look so different and then perform a formal test, we will probably reach the wrong conclusion that the assignment wasn't random.
Mean vs expected value
It is no accident that we use the symbol μ for both the mean and expected value. The mean and the expected value are one and the same.
Null and Alternative Hypothesis
Null hypothesis (H0): states that there is no difference. Alternative hypothesis (HA): states that there is a difference.
Why run simulations?
One simulation isn't enough to get a great sense of the distribution of estimates we might expect in the simulation, so we should run more simulations. In a second simulation, we get p̂2 = 0.885, which has an error of +0.005. In another, p̂3 = 0.878 for an error of -0.002. With the help of a computer, we've run the simulation 10,000 times and created a histogram of the results from all 10,000 simulations in Figure 5.2. This distribution of sample proportions is called a sampling distribution.
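A rough sketch of such a simulation in R. The true proportion p = 0.88 follows from the errors quoted on this card; the sample size n = 1000 and the seed are assumptions made for illustration and are not stated on the card.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

p <- 0.88          # true proportion implied by the quoted errors
n <- 1000          # sample size per simulation (assumed, not from the card)
n_sims <- 10000    # number of simulated samples

# Each simulation draws n Bernoulli trials and records the sample proportion
p_hat <- rbinom(n_sims, size = n, prob = p) / n

mean(p_hat)              # close to p
sd(p_hat)                # close to the theoretical standard error below
sqrt(p * (1 - p) / n)    # theoretical SE of p-hat

# hist(p_hat) would draw the sampling distribution
```

The standard deviation of the simulated proportions should land near the formula value sqrt(p(1 - p)/n), which is why that formula is used as the standard error of p̂.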
Percentile
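The percentile of an observation is the percentage of cases with lower values. For example, Ann's percentile on a test is the fraction of test takers who scored lower than Ann.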
Power (effect size) example
Suppose that the company researchers care about finding any effect on blood pressure that is 3 mmHg or larger vs the standard medication. Here, 3 mmHg is the minimum effect size of interest, and we want to know how likely we are to detect this size of an effect in the study.
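One way to sketch this power calculation in R is with `power.t.test` from the built-in stats package. Only the 3 mmHg effect size comes from the card; the per-group sample size (100) and the standard deviation (12 mmHg) below are hypothetical values chosen for illustration.

```r
# Power to detect a 3 mmHg difference in a two-sample t-test
result <- power.t.test(
  n = 100,           # patients per group (hypothetical)
  delta = 3,         # minimum effect size of interest, from the card
  sd = 12,           # standard deviation of blood pressure (hypothetical)
  sig.level = 0.05,  # significance level
  type = "two.sample",
  alternative = "two.sided"
)

result$power  # probability of detecting the effect under these assumptions
```

Leaving `n` out and supplying `power = 0.8` instead makes the same function solve for the sample size needed to reach 80% power.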
Z-score
The Z-score of an observation is defined as the number of standard deviations it falls above or below the mean.
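As a worked example, the Z-score for the observation 1300 in a distribution with mean 1100 and standard deviation 200 (the values from the pnorm cards) can be computed directly. The variable names are illustrative.

```r
x <- 1300      # observation
mu <- 1100     # mean
sigma <- 200   # standard deviation

z <- (x - mu) / sigma
z  # 1, i.e. one standard deviation above the mean

# Standardizing does not change the tail area
pnorm(z)                           # 0.8413447
pnorm(x, mean = mu, sd = sigma)    # 0.8413447
```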
chi-squared test a particular distribution example (evaluating goodness of fit for a distribution)
The actual data, shown in the Observed row in Figure 6.11, can be compared to the expected counts from the Geometric Model row. In general, the expected counts are determined by (1) identifying the null proportion associated with each bin, then (2) multiplying each null proportion by the total count to obtain the expected counts.
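The two steps can be sketched in R as follows. The observed counts and null proportions here are made up for illustration and are not the values from Figure 6.11; the test statistic is the standard chi-square sum of (observed - expected)^2 / expected.

```r
observed <- c(50, 30, 20)         # hypothetical observed counts per bin
null_p   <- c(0.50, 0.25, 0.25)   # hypothetical null proportion per bin

# Step 2: expected count = null proportion * total count
expected <- null_p * sum(observed)
expected  # 50 25 25

# Chi-square statistic and p-value with df = (number of bins) - 1
chi_sq  <- sum((observed - expected)^2 / expected)
df      <- length(observed) - 1
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

chi_sq  # 2

# The built-in test gives the same statistic
builtin <- chisq.test(observed, p = null_p)
```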
frequency histogram vs density histogram
The difference between a frequency histogram and a density histogram is that while in a frequency histogram the heights of the bars add up to the total number of observations, in a density histogram the areas of the bars add up to 1
MSG vs MSE (ANOVA)
The larger the observed variability in the sample means (MSG) relative to the within-group observations (MSE), the larger F will be and the stronger the evidence against the null hypothesis.
mean square error (MSE)
The mean square between the groups is, on its own, quite useless in a hypothesis test. We need a benchmark value for how much variability should be expected among the sample means if the null hypothesis is true. To this end, we compute a pooled variance estimate, often abbreviated as the mean square error (MSE), which has an associated degrees of freedom value dfE = n - k. It is helpful to think of MSE as a measure of the variability within the groups.
mean square between groups (MSG)
The method of analysis of variance in this context focuses on answering one question: is the variability in the sample means so large that it seems unlikely to be from chance alone? This question is different from earlier testing procedures since we will simultaneously consider many groups, and evaluate whether their sample means differ more than we would expect from natural variation. We call this variability the mean square between groups (MSG), and it has an associated degrees of freedom, dfG = k - 1 when there are k groups. The MSG can be thought of as a scaled variance formula for means. If the null hypothesis is true, any variation in the sample means is due to chance and shouldn't be too large.
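To make MSG, MSE, and F concrete, here is a small sketch in R that computes them by hand and checks the result against `anova()`. The data are made up (three groups of three observations) purely for illustration.

```r
# Hypothetical scores for k = 3 groups (made-up data)
scores <- c(4, 5, 6, 7, 8, 9, 10, 11, 12)
group  <- factor(rep(c("a", "b", "c"), each = 3))

n <- length(scores)   # total number of observations
k <- nlevels(group)   # number of groups

grand_mean  <- mean(scores)
group_means <- tapply(scores, group, mean)
group_sizes <- tapply(scores, group, length)

# MSG: variability of the group means around the grand mean, df = k - 1
msg <- sum(group_sizes * (group_means - grand_mean)^2) / (k - 1)

# MSE: pooled within-group variability, df = n - k
# ave() returns each observation's own group mean
mse <- sum((scores - ave(scores, group))^2) / (n - k)

f_stat  <- msg / mse
p_value <- pf(f_stat, df1 = k - 1, df2 = n - k, lower.tail = FALSE)

c(MSG = msg, MSE = mse, F = f_stat)  # 27, 1, 27

# Same F statistic from the built-in ANOVA table
builtin_f <- anova(lm(scores ~ group))$"F value"[1]
```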
reference level (hint: dummy variable)
The missing level is called the reference level, and it represents the default level that other levels are measured against
dummy variable trap
The mistake of including too many dummy variables among the independent variables; it occurs when an overall intercept is in the model and a dummy variable is included for each group. I.E. you need to remove one group/column.
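A brief R sketch of the reference level and the dummy variable trap, using a made-up three-level factor. `model.matrix` drops one level automatically when an intercept is present; adding it back by hand produces a rank-deficient design matrix.

```r
group <- factor(c("a", "b", "c", "a", "b", "c"))  # hypothetical groups

# With an intercept, R codes k - 1 dummies; "a" is the reference level
X <- model.matrix(~ group)
colnames(X)  # "(Intercept)" "groupb" "groupc"

# The trap: an intercept plus one dummy for every group
X_trap <- cbind(Intercept = 1, model.matrix(~ group - 1))
ncol(X_trap)      # 4 columns
qr(X_trap)$rank   # 3, so the columns are linearly dependent
```

The rank falls short of the column count because the three dummy columns sum to the intercept column.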
geometric distribution
The probability distribution of a geometric random variable X: it lists every possible number of trials until the first success is seen, together with the associated probabilities.
p-value
The probability of observing a test statistic as extreme as, or more extreme than, the statistic obtained from a sample, under the assumption that the null hypothesis is true.
collinear
We say the two predictor variables are collinear (pronounced as co-linear) when they are correlated, and this collinearity complicates model estimation. While it is impossible to prevent collinearity from arising in observational data, experiments are usually designed to prevent predictors from being collinear.
Central Limit Theorem
The theory that, as sample size increases, the distribution of sample means of size n, randomly selected, approaches a normal distribution.
qnorm(0.85,mean=70,sd=3) tells you
The value below which 85% of observations fall (about 73.1)
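A short R check: `qnorm` is the inverse of `pnorm`, so feeding the cutoff back into `pnorm` recovers 0.85.

```r
# 85th percentile of a normal distribution with mean 70 and sd 3
cutoff <- qnorm(0.85, mean = 70, sd = 3)
round(cutoff, 1)  # 73.1

# Round trip: the proportion below the cutoff is 0.85
pnorm(cutoff, mean = 70, sd = 3)  # 0.85
```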
influential point
Usually we can say a point is influential if, had we fitted the line without it, the influential point would have been unusually far from the least squares line. It is tempting to remove outliers. Don't do this without a very good reason. Models that ignore exceptional (and interesting) cases often perform poorly. For instance, if a financial firm ignored the largest market swings they would soon go bankrupt by making poorly thought-out investments.
Bernoulli random variable
When an individual trial only has two possible outcomes, often labeled as success or failure, it is called a Bernoulli random variable
Z Value for 90% confidence
Z = 1.645
Z value for 95% confidence
Z = 1.96
Z Value for 99% confidence
Z = 2.58
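These critical values can be recovered with `qnorm`: a two-sided interval with confidence level C leaves (1 - C)/2 in each tail. A minimal sketch, with illustrative variable names:

```r
conf_level <- c(0.90, 0.95, 0.99)

# Upper-tail cutoff that leaves (1 - C) / 2 in each tail
z_star <- qnorm(1 - (1 - conf_level) / 2)

round(z_star, 3)  # 1.645 1.960 2.576
```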
chi-squared test
a statistical test of the fit between a theoretical frequency distribution and a frequency distribution of observed data for which each observation may fall into one of several classes.
point estimate
a summary statistic from a sample that is just one number used as an estimate of the population parameter
high leverage
Points that fall horizontally far from the center of the cloud of points (outliers in the x direction)
When we're talking about a sampling distribution or the variability of a point estimate, we typically use the term...
standard error rather than standard deviation, and the notation SE_p̂ (the standard error of the sample proportion p̂)
