MATH-11 STATISTICS MEGASTUDY

if the correlation is 1 or -1, the scatterplot must make

a perfect line

Random Variable

a variable that takes on possible numeric values that result from a random event

If X = Uniform(1,4), what is the probability of getting a rational number? What is the probability of getting an irrational number?

a. 0 b. 1. The rational numbers between 1 and 4 take up zero total length, so essentially every value from 1 to 4 is irrational.

If you break up the sample space into disjoint sets, the probabilities of these events must

add up to 1

Trial

an action that creates data

Quadrant 1 or 2 curve

apply a function higher on the tower of power than is currently used

Quadrant 3 or 4 curve

apply a function lower on the tower of power than is currently used

Law of Large Numbers (LLN)

as a random process is repeated more and more, the proportion of times an event occurs converges to a number (the probability of that event)

Mean on a histogram

balance point of the histogram: the torque is the same on both sides of the mean

Correlation cannot reveal

causation

In general, the size of the SD gives a sense for how

closely your experience playing the game will hug the mean

The X^2 distribution is used to compare

counts in a table (to a list of expected values).

Outliers

data that stands apart from the distribution

median

data value in the middle of the ordered (sorted) list of data

CI for mean difference of paired samples

d̄ +/- t*_df × SE(d̄), where d stands for the paired differences, SE(d̄) = s_d / sqrt(n), and df = n - 1

Difference in Means: Confidence Interval for paired, dependent samples

d̄ +/- t*_df × SE(d̄), where SE(d̄) = (standard deviation of the differences d) / sqrt(n) and df = n - 1

Good way of inflating r^2

dividing data into subgroups that are more homogeneous

Residual

e = y - ŷ (how far off the model is at that value of x), where y = value observed from the actual data point and ŷ = value predicted by the regression line

Pros of range

easy to calculate, gives sense of span of data

Attributes of a scatterplot

form, direction, strength, outliers

high influence point

gives a significantly different slope for the regression line if it is included, versus excluded, from an analysis

When a histogram is skewed right, the mean is

greater than the median. (A small number of high values pull the mean up but don't affect the median.)

Standard deviation

roughly the typical distance of the data values from the mean

Extrapolation is dangerous because

it assumes the relationship holds beyond the data range you have seen and used for a model

Reasons for using the complement rule

it is often easier to calculate the complement of something.

marginal probabilities are the sums of

joint probabilities

Two random variables are independent if

knowing the outcome of one has no effect on the outcome of the other

As the graph skews to the right, the mean becomes

larger than the median. The mean is pulled right by the large values in the data set.

Tails of distribution

the left and right ends of a distribution, where the values thin out

When the relationship is curved, the correlation is

less meaningful

The density graph shows

the relative likelihood of values, not the probability of any single value occurring

Correlation only works with

linear relationships

correlation is unaffected by

linear scale changes with positive multipliers (cor(x,y) = cor(x,2.5y) = cor(x,y+14) = cor(2x-17, 99999y+1)). A negative multiplier flips the sign of the correlation.

Probability Model

lists the different outcomes for a random variable and gives the probability of each outcome

E(x) is also known as the

long-run average, denoted μ

Skewed left

longer tail on the left

Skewed right

longer tail on the right

When a histogram is skewed left, the mean is

lower than the median. (A small number of low values pull the mean down but don't affect the median.)

To establish a causation, eliminate

lurking variables

Sampling distribution

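the distribution of a statistic (such as the sample mean) across all possible samples of the same size; in practice, a histogram of the means from many different samples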

center used for symmetric distributions without outliers

mean

The center of the sampling distribution of the sample mean is at

the population mean, μ

Which center of distribution is resistant to outliers

median

center used for asymmetric distributions (skewed)

median

Outliers can occur for many different reasons:

mistakes, atypical, scientifically important

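How to find the line of best fit

choose the line that minimizes the model error, measured either as the sum of the absolute values of the residuals OR the sum of (residuals)^2. The least squares line minimizes the sum of (residuals)^2.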

As the sample size grows, the sampling distribution tends to look

more and more normal

Bigger padding in a confidence interval leads to

more confidence, but less relevance. (Being 100% confident that a value is in the interval 0-1000 is obvious, but is the value 2? 200? 547?)

In general, the bigger X^2 is, the

more evidence we have against H0

As df becomes larger, the t-distribution becomes

more standard/normal. The center does not change. The spread becomes narrower.

When you average things, you eliminate

much of the variation between individual values (the SD of an average of n values is σ/sqrt(n))

The Poisson model is a good approximation of the Binomial model when

n >/= 20 and p </= 0.05, or n >/= 100 and p </= 0.10. This is helpful because binomial calculations become unwieldy when n is large.

finding n given margin of error

n = [(z*)² (p̂)(q̂)] / (ME)²
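A minimal sketch of the sample-size formula, assuming a two-sided interval and the conservative guess p̂ = 0.5 when no earlier estimate is available (both are assumptions, not part of the card). The result is rounded up because rounding down would give a margin of error slightly larger than the target.

```python
import math
from statistics import NormalDist


def sample_size_for_me(me, confidence=0.95, p_hat=0.5):
    """Smallest whole n with z* * sqrt(p_hat * q_hat / n) <= me."""
    # two-sided critical value: the middle `confidence` of N(0, 1)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    q_hat = 1 - p_hat
    n = (z_star ** 2) * p_hat * q_hat / me ** 2
    return math.ceil(n)  # round UP so the ME is at most the target


# 95% confidence, margin of error of 3 percentage points
n_needed = sample_size_for_me(0.03)
```

With p̂ = 0.5 the product p̂q̂ is as large as it can be, so the resulting n is enough for any true proportion.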

Uniform model histogram

no peaks

As long as the conditions are met, it does not matter what population distribution you start with. If your samples are large enough, the sampling distribution of the mean will be approximately a

normal distribution

A high r^2 value is

not an indicator that a linear model is appropriate

If an outcome is common to both events, the events are

not disjoint

A random variable should always be a

numeric outcome, NEVER a probability

Exclusive "or"

often used in real life. A or B means: 1. A, but not B 2. B, but not A

don't assume your data are all part of

one homogeneous population. Think about possible subgroups to make your analysis better.

unimodal histogram

one peak

Scatterplot

one variable on x-axis (predictor), other on y-axis (response).

high leverage point

a point whose x-value is far from the mean of the x-values

the median is resistant to

outliers and skew

For a two sample proportion test testing p1 and p2, you would think about

p1 - p2. e.g. p1 - p2 > 0

Regression to the mean

people far from the mean are pulled towards it in subsequent trials because it is easier to score near the mean than far from it.

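When σ must be estimated by s, the t-distribution gives more

accurate results: its intervals are wider than z-intervals, reflecting the extra uncertainty from estimating σ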

General confidence interval formula

p̂ +/- [(z*)(SE(p̂))] = (p̂ - [(z*)(SE(p̂))],p̂ + [(z*)(SE(p̂))]) where SE(p̂) = sqrt(p̂*q̂ / n) z* is the critical value
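A minimal sketch of the one-proportion interval p̂ +/- z* × SE(p̂), assuming a two-sided interval; the counts 520 out of 1000 are made-up example values.

```python
import math
from statistics import NormalDist


def one_proportion_ci(successes, n, confidence=0.95):
    """Return (low, high) for p̂ +/- z* * sqrt(p̂ q̂ / n)."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    se = math.sqrt(p_hat * q_hat / n)  # SE(p̂)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value
    me = z_star * se  # margin of error
    return (p_hat - me, p_hat + me)


low, high = one_proportion_ci(520, 1000)
```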

So if p̂1 ~ N(p1, sqrt(p1q1/n1)) and p̂2 ~ N(p2, sqrt(p2q2/n2)) are independent, then

p̂1 - p̂2 ~ N(p1 - p2, sqrt(p1q1/n1 + p2q2/n2))

b1

b1 = r × (SD of y / SD of x). This is the slope (the amount the predicted y increases for every 1-unit increase in x).

One of the best ways to avoid bias is by introducing

random elements into the sampling process e.g. Stir the pot before tasting the soup

Continuous random variable

random quantity that can take on any value on a continuous scale ("a smooth interval of possibilities") e.g. The amount of water you drink in a day, how long you wait for a bus, how far you live from the nearest grocery store.

Three ideas for measuring spread

range, interquartile range, the five number summary

Median on a histogram

same amount of area on both sides

bias

sample is not representative of the population in some way -good sampling is about reducing as much bias as possible

Correlation matrix

shows the correlation of every variable with every other variable

For any x-value (or z-score, if you convert to a standard normal model, N(0,1)) the percentile is

simply the area to the left of this value

Conditions for creating a regression model

since correlations are involved, we need our three conditions from before, 1) quantitative variables 2) straight enough 3) no outliers, plus a fourth: 4) the residual plot shows only noise

the line of best fit is determined by

slope and y-intercept

The sample size does not need to be

some percentage of the population size. Larger samples are better, regardless of the population size. e.g. A spoonful from a small pot of soup gives the same amount of info as a spoonful from a large pot. However, tasting 3 spoons of soup is better than tasting 1 spoon.

event

some set of outcomes you might care about

Subgroups can be identified in the original data or in the residuals.

Split your data into different parts and do several linear regressions instead of one clunky regression.

Standard Error

SE(p̂) = sqrt(p̂q̂ / n). It has the same form as the standard deviation of the sampling distribution, sqrt(pq/n), but is built from p̂ (the sample proportion) instead of p. Since p is unknown, you use the sample's p̂ and q̂ to estimate it.

Standard deviation of a density function

sqrt(var(x))

p̂ pooled

p̂pooled = (successes1 + successes2) / (n1 + n2). Used when you are doing a hypothesis test: if we assume H0 is true (p1 - p2 = 0), then the two population proportions are the same, and pooling the two samples gives a better estimate of that common proportion than using p̂1 and p̂2 separately.
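A minimal sketch of a two-proportion z-test of H0: p1 - p2 = 0 that uses the pooled p̂ in the standard error, assuming a two-sided alternative. The counts (45 of 100 vs. 30 of 100) are made-up example values.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided test of H0: p1 - p2 = 0 using the pooled proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # under H0 both populations share one proportion, so pool the samples
    p_pooled = (x1 + x2) / (n1 + n2)
    q_pooled = 1 - p_pooled
    se = math.sqrt(p_pooled * q_pooled * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat - 0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # both tails
    return z, p_value


z, p_value = two_proportion_z_test(45, 100, 30, 100)
```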

cons of range

summarizes the data using only 2 data points, not resistant to outliers

Difference in Means: Hypothesis Test for paired, dependent samples

t_df = (d̄ - 0) / SE(d̄), where SE(d̄) = s_d / sqrt(n) and df = n - 1 (H0: the mean difference is 0)
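A minimal sketch of the paired t statistic: take the difference within each pair, then run a one-sample t on those differences against 0. The before/after numbers are made-up example values, and converting t to a p-value is left to a t table or technology.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(before, after):
    """Return (t, df) for H0: the mean paired difference is 0."""
    diffs = [a - b for a, b in zip(after, before)]  # d = after - before
    n = len(diffs)
    d_bar = mean(diffs)
    se = stdev(diffs) / math.sqrt(n)  # SE(d̄) = s_d / sqrt(n)
    t = (d_bar - 0) / se
    return t, n - 1


t, df = paired_t_statistic([10, 12, 9, 11, 14], [12, 16, 12, 16, 15])
```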

mean

a center of a distribution that takes into account the data values themselves, not just their order. It is the calculated average value (sum of all values / number of values).

outcome

the data created from a trial

To get a normal sampling distribution from samples of a population: The greater the skew in the population,

the higher n must be to get a normal sampling distribution

In any given situation (with a fixed sample size), the higher the risk of Type I error,

the lower the risk of Type II error.

For positive test results to be useful, you need

the orders of magnitude of "test accuracy" and "disease prevalence" to be better matched.

For a continuous random variable, you can ask questions about __________ since the probability of any individual outcome is always 0

the probability of some interval of values occurring

Conditional probability P(A|B)

the probability that event A occurs given the information that event B occurred. Pronounced P(A given B)

Using noise to determine whether a regression model is appropriate

the residual plot should show "noise", or no observable patterns in the plot. -if a pattern is seen, regression is not appropriate

sample space

the set of ALL possible outcomes

The units on variance will always be

the square of the units in the problem. This can make variance difficult to interpret

Reexpressing data

to make data more visually appealing, to create more commonly-shaped histograms, to get lens of analysis correct

Methods for conditional probabilities

tree diagrams, P(A|B), and Bayes' Theorem

bimodal histogram

two peaks

Inclusive "or"

used in probability. A or B means: 1. A, but not B 2. B, but not A 3. Both A and B

bad way of inflating r^2

using summarized data rather than unsummarized data

Extrapolation

using your model to predict a new y value for an x value that is outside the span of x data in your model

Interpolation

using your model to predict a new y value for an x value that is within the span of x data in your model

mode

value that occurs the most often in a set of data

Because of randomness, there is

variation in a sample statistic from one sample to the next

When r is close to -1, the correlation is

very strong and negative

When r is close to 1, the correlation is

very strong and positive

When r is close to 0, the correlation is

weak (little to no correlation)

Spread of distribution

how spread out the values are, i.e. how far from the center most of the data lie

95% of all point estimates are

within (+/-) (2)*(SE) of p

Predictor variable

x-axis variable that predicts y.

If X and Y are 2 independent random variables with normal distributions, then

X - Y is also normal. Also, since X and Y are independent, Var(X - Y) = Var(X) + Var(Y), and thus SD(X - Y) = sqrt(Var(X - Y)) = sqrt(Var(X) + Var(Y)) = sqrt(SD(X)^2 + SD(Y)^2)

CI for the mean of one sample

x̄ +/- t*df*(SE(x̄)) SE = s/sqrt(n) df = n-1

Confidence Interval formula for t-distribution

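x̄ +/- t*_(n-1) × SE(x̄), where SE(x̄) = s_x / sqrt(n). (The sample SD s_x stands in for the unknown σ; σ/sqrt(n) is the true SD of x̄, which cannot be computed when σ is unknown.)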
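A minimal sketch of the one-sample t interval. Python's standard library has no t-distribution, so this sketch assumes the caller supplies the critical value t_star (from a t table, for df = n - 1 at the chosen confidence level). The three data values are made-up example numbers.

```python
import math
from statistics import mean, stdev


def one_sample_t_ci(data, t_star):
    """x̄ +/- t* * s / sqrt(n); t_star must match df = n - 1 and the confidence level."""
    n = len(data)
    x_bar = mean(data)
    se = stdev(data) / math.sqrt(n)  # SE(x̄) = s / sqrt(n); s estimates σ
    return (x_bar - t_star * se, x_bar + t_star * se)


# t* for df = 2 at 95% confidence is about 4.303 (from a t table)
low, high = one_sample_t_ci([10, 12, 14], t_star=4.303)
```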

Response variable

y-axis variable that is predicted by x.

Z-score

z = (y - ȳ) / SD_y, where ȳ is the mean. A unitless number that tells you how many standard deviations a data value is above the mean (negative means below): z = 0 is the mean itself (0 standard deviations away), z = 1 means 1 standard deviation above the mean, and in general z = k means k standard deviations from the mean.

For a new x-value xf (e.g. a height of 6 feet), the prediction interval for the y-value yf of one individual with that x-value is:

ŷf +/- t*_(n-2) × SE_PI, where ŷf = b0 + b1 × xf and SE_PI = sqrt([SE(b1)]^2 × (xf - x̄)^2 + se^2/n + se^2)

Confidence interval formula from regression

ŷnew +/- t*_(n-2) × SE_CI, where SE_CI = sqrt([SE(b1)]^2 × (xnew - x̄)^2 + se^2/n)
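A minimal sketch that computes both standard errors from summary numbers (se, SE(b1), n, x̄) and builds the two intervals. The inputs are made-up example values, and t_star (df = n - 2) is assumed to come from a t table, since the standard library has no t-distribution.

```python
import math


def regression_interval_ses(se_b1, se, n, x_bar, x_new):
    """Return (SE_CI, SE_PI) for a new x-value."""
    slope_part = se_b1 ** 2 * (x_new - x_bar) ** 2  # slope uncertainty, grows away from x̄
    mean_part = se ** 2 / n                          # uncertainty in the line's center
    se_ci = math.sqrt(slope_part + mean_part)
    se_pi = math.sqrt(slope_part + mean_part + se ** 2)  # + one individual's scatter
    return se_ci, se_pi


def interval(center, t_star, se):
    return (center - t_star * se, center + t_star * se)


se_ci, se_pi = regression_interval_ses(se_b1=0.5, se=3.0, n=9, x_bar=10.0, x_new=12.0)
y_hat_new = 50.0  # hypothetical prediction b0 + b1 * x_new
t_star = 2.365    # approximately t* for df = 7 at 95% (from a t table)
ci = interval(y_hat_new, t_star, se_ci)
pi = interval(y_hat_new, t_star, se_pi)
```

The only difference between the two is the extra se^2 term, which is why a prediction interval is always wider than the confidence interval at the same x.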

Subgroups may not be visible unless

you think about them

If the residuals show any type of pattern

your current linear model is not appropriate

Critical value

z* If you want a confidence interval of 80%, then 10% would be to the left and 10% to the right. Therefore, the critical value of the z-score (z*) would be at the 90th percentile (80+10) because 10% is to the right of the 90th percentile.
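A minimal sketch of the same reasoning in code: put half of the leftover area in each tail, then read off the z-value at that percentile with the standard library's inverse normal CDF.

```python
from statistics import NormalDist


def z_critical(confidence):
    """z* such that the middle `confidence` of N(0, 1) lies between -z* and z*."""
    tail = (1 - confidence) / 2            # area in each tail
    return NormalDist().inv_cdf(1 - tail)  # percentile = confidence + one tail


table = {level: round(z_critical(level), 3) for level in (0.80, 0.90, 0.95, 0.99)}
```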

Regression line equation

ŷ = b0 + b1 * x

Recall the regression line equation

ŷ = b0 + b1 * x b0 is the intercept b1 is the slope

b0

b0 = ȳ - b1 × x̄. This is the y-intercept (the value of ŷ when x = 0).
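A minimal sketch of computing the least squares line from raw data with b1 = r × (s_y / s_x) and b0 = ȳ - b1 × x̄. The data are made-up example values; r is computed as the average product of z-scores with n - 1 in the denominator.

```python
from statistics import mean, stdev


def least_squares_line(x, y):
    """Return (b0, b1) using b1 = r * s_y / s_x and b0 = ȳ - b1 * x̄."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    # correlation r = sum(z_x * z_y) / (n - 1)
    r = sum((xi - x_bar) / s_x * (yi - y_bar) / s_y for xi, yi in zip(x, y)) / (n - 1)
    b1 = r * s_y / s_x
    b0 = y_bar - b1 * x_bar  # the line passes through (x̄, ȳ)
    return b0, b1


xs, ys = [1, 2, 3], [1, 2, 2]
b0, b1 = least_squares_line(xs, ys)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(xs, ys)]  # e = y - ŷ
```

For the least squares line the residuals always sum to (essentially) zero.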

If you have a situation modelled by Binom(n,p) in which n is large and p is small, then use a Poisson model instead where

λ = np, where: [n >/= 20 and p </= 0.05, or n >/= 100 and p </= 0.10] and [np </= 20]
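A minimal sketch comparing an exact binomial probability with its Poisson approximation, λ = np. The values n = 100 and p = 0.03 are made-up, chosen so that the rule of thumb (n >/= 100 and p </= 0.10) applies.

```python
import math


def binomial_pmf(k, n, p):
    """P(exactly k successes in n trials) = C(n, k) p^k q^(n-k)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(x, lam):
    """P(x) = λ^x e^(-λ) / x!"""
    return lam ** x * math.exp(-lam) / math.factorial(x)


n, p = 100, 0.03
lam = n * p                      # λ = np = 3
exact = binomial_pmf(2, n, p)    # exact binomial probability of 2 successes
approx = poisson_pmf(2, lam)     # Poisson approximation of the same probability
```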

Parameter, μ (or E(X))

μ (or E(X)): a value that helps summarize a probability model

Variance of a density function

Var(X) = integral from -∞ to ∞ of (x - μ)^2 × f(x) dx, where μ = E(X) is the mean. Easier version: Var(X) = E[X^2] - μ^2 = [integral from -∞ to ∞ of x^2 × f(x) dx] - μ^2

The standard deviation of the sampling distribution of a sample proportion p̂ is:

σ(p̂) = sqrt(pq/n) (square root of [probability of success] × [probability of failure] divided by the sample size n)

The spread of the sampling distribution of a sample mean x̄ is:

σ/sqrt(n) (the population standard deviation divided by the square root of the sample size n)

Whiskers

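Whiskers extend to the most extreme data values within 1.5 × IQR of Q1 and Q3. The fences are: Lower fence: Q1 - (1.5 × IQR), Upper fence: Q3 + (1.5 × IQR). Points beyond the fences are plotted individually as possible outliers.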
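A minimal sketch of the boxplot fences, computing Q1 and Q3 as medians of the lower and upper halves of the sorted data. Quartile conventions differ between textbooks and software; this sketch assumes that for odd n the overall median is left out of both halves. The data set is a made-up example.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3) as medians of the lower half, whole list, and upper half."""
    s = sorted(data)
    n = len(s)
    half = n // 2
    lower = s[:half]
    upper = s[half + 1:] if n % 2 else s[half:]  # skip the middle value when n is odd
    return median(lower), median(s), median(upper)


def fences(data):
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
low_fence, high_fence = fences(data)
outliers = [v for v in data if v < low_fence or v > high_fence]
# whiskers stop at the most extreme values still inside the fences
whisker_ends = (min(v for v in data if v >= low_fence),
                max(v for v in data if v <= high_fence))
```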

The Law of Averages

(Gambler's fallacy) Incorrect use of LLN. False way of thinking that says if the current situation is out of whack, then it must correct itself in the short term.

range

(max value)-(min value)

Confidence interval for 2 sample proportions

(p̂1 - p̂2) +/- z* × SE(p̂1 - p̂2), where SE(p̂1 - p̂2) = sqrt(p̂1q̂1/n1 + p̂2q̂2/n2). Conditions: the samples must be independent of each other, and each sample must have at least 10 successes and 10 failures.
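A minimal sketch of the two-proportion confidence interval with its success/failure check. Unlike the pooled hypothesis test, the interval keeps each sample's own p̂ in the SE, since no H0 is being assumed. The counts are made-up example values.

```python
import math
from statistics import NormalDist


def success_failure_ok(x, n):
    """At least 10 successes and 10 failures in one sample."""
    return x >= 10 and n - x >= 10


def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    if not (success_failure_ok(x1, n1) and success_failure_ok(x2, n2)):
        raise ValueError("success/failure condition not met")
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # unpooled SE
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se


low, high = two_proportion_ci(45, 100, 30, 100)
```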

interquartile range (IQR)

(upper quartile)-(lower quartile)

CI for mean difference in two samples

(x̄1 - x̄2) +/- t*_df × SE(x̄1 - x̄2), where SE = sqrt(s1^2/n1 + s2^2/n2) and df = min(n1 - 1, n2 - 1) (min means pick the smaller of the two numbers)

-If 60% of people run and 20% of runners wear long socks, what percent of people run and wear long socks? (What is the joint probability?) -What is the probability that someone doesn't wear long socks given that they run?

-0.6 × 0.2 = 0.12, so 12% of all people run and wear long socks. -Runners who don't wear long socks make up 0.6 - 0.12 = 0.48 of all people, so P(no long socks | runs) = 0.48 / 0.6 = 0.8 = 80%. (Equivalently, 1 - 0.2 = 0.8.)

Summary of Sampling Statistics: To estimate a population parameter p, we can

-Draw a random sample of size n. -This sample will have a statistic p̂ ≈ p. -If we drew many samples, each would have its own statistic p̂, and we could make a histogram of these values. -This histogram, the sampling distribution, is approximately N(p, sqrt(pq/n)). (For a sample mean x̄, the analogous model is N(μ, σ/sqrt(n)).)
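A minimal simulation sketch of these steps: draw many random samples of size n from a population with a known proportion p, record each p̂, and compare the center and spread of those p̂ values with p and sqrt(pq/n). The values p = 0.3, n = 100 and 2000 repetitions are made-up choices, and the random generator is seeded so the run is reproducible.

```python
import math
import random
from statistics import mean, pstdev


def simulate_p_hats(p, n, num_samples, seed=0):
    rng = random.Random(seed)
    p_hats = []
    for _ in range(num_samples):
        # one random sample of size n: each individual is a success with probability p
        successes = sum(rng.random() < p for _ in range(n))
        p_hats.append(successes / n)
    return p_hats


p, n = 0.3, 100
p_hats = simulate_p_hats(p, n, num_samples=2000)
center = mean(p_hats)                 # should be close to p
spread = pstdev(p_hats)               # should be close to sqrt(pq/n)
theory = math.sqrt(p * (1 - p) / n)
```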

When to use Z vs. T

-If you know sigma (almost never true): use z-distribution -In all other cases: Use t-distribution

Why doesn't X+X = 2X?

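-Here X + X means two independent plays of the game (X1 + X2), while 2X means one play with doubled stakes. -In the X + X scenario, winning and losing outcomes can mix and partly cancel each other out (win + win, loss + loss, win + loss, and loss + win are all possible). -In the 2X scenario, you either win twice as much or lose twice as much. -So E(X1 + X2) = E(2X) = 2E(X), but Var(X1 + X2) = 2Var(X) while Var(2X) = 4Var(X).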
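A minimal sketch that works out both scenarios exactly for a hypothetical game (win $10 or lose $5, each with probability 0.5): it builds the probability model of X1 + X2 from every pair of independent plays, builds the model of 2X, and computes the mean and variance of each.

```python
from itertools import product

# hypothetical game: win $10 or lose $5, each with probability 0.5
outcomes = {10: 0.5, -5: 0.5}


def ev_and_var(model):
    """E(X) = sum of x * P(x); Var(X) = sum of (x - E(X))^2 * P(x)."""
    ev = sum(x * p for x, p in model.items())
    var = sum((x - ev) ** 2 * p for x, p in model.items())
    return ev, var


# X1 + X2: two independent plays, so every pair of outcomes can happen
sum_model = {}
for (x1, p1), (x2, p2) in product(outcomes.items(), repeat=2):
    sum_model[x1 + x2] = sum_model.get(x1 + x2, 0) + p1 * p2

# 2X: one play with the stakes doubled
double_model = {2 * x: p for x, p in outcomes.items()}

ev_x, var_x = ev_and_var(outcomes)
ev_sum, var_sum = ev_and_var(sum_model)
ev_double, var_double = ev_and_var(double_model)
```

Both versions have the same expected value, but doubling the stakes doubles the variance of X1 + X2.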

Cluster Sampling

-Sampling in which elements are selected in two or more stages: first a random selection of naturally occurring clusters, then a random selection of elements within the chosen clusters. -e.g. asking people as they walk into various gyms on campus what their GPA is. Each of the 3 gyms has both grads and undergrads. -The population is split into pieces just because it's more convenient. -Each piece is heterogeneous with respect to the parameter you're measuring (every gym has a similar mix of undergrads and grads).

In inference about regression, you use the histogram for all b1 values because b0 doesn't really tell us anything.

-The histogram of all possible b1's is centered at the true population parameter, β1. -SE(b1) = se / (sx × sqrt(n - 1)), where se = standard deviation of the residuals and sx = standard deviation of the x-values. -The histogram is best approximated by a t-distribution with n - 2 degrees of freedom (t_(n-2)), provided that: 1. the conditions for a regression line are met (straight enough, quantitative, no outliers, residual noise) 2. the independence condition holds (random, and <10% rule) 3. the histogram of residuals is nearly normal

Stratified Random Sampling

-What is the average GPA of UCSD students? -Since grads and undergrads have very different average GPAs, you split the population into 2 groups (strata), take an SRS from each, then combine the results. -The pieces are homogeneous with respect to the parameter you are measuring (undergrads have lower GPAs, grads have higher GPAs).

Common Geometric Model questions

-What is the probability that it takes exactly k <Bernoulli trials> to get the first <success>? -On average, how many <Bernoulli Trials> will it take to get the first <success>?

Common Binomial Model questions

-What's the probability of getting exactly k <successes> in n <Bernoulli trials>? -On average, how many <successes> will I get if I do n <Bernoulli trials>?
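A minimal sketch of how both families of questions are usually answered, using the standard formulas (not stated on these cards): geometric P(X = k) = q^(k-1) p with E(X) = 1/p, and binomial P(X = k) = C(n, k) p^k q^(n-k) with E(X) = np. The success probability 0.2 and n = 10 are made-up example values.

```python
import math


def geometric_pmf(k, p):
    """P(first success happens on trial k) = q^(k-1) * p."""
    return (1 - p) ** (k - 1) * p


def binomial_pmf(k, n, p):
    """P(exactly k successes in n trials) = C(n, k) * p^k * q^(n-k)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


p = 0.2
expected_trials = 1 / p        # geometric: average number of trials to the first success
expected_successes = 10 * p    # binomial with n = 10: average number of successes
```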

Margin of error is increased by

-smaller samples -higher level of confidence

The probability of any particular outcome happening is

0 (for a continuous random variable). This is because the integral from a to a of f(x) dx = 0

For a continuous random variable X, which takes on any real number, we need to model it through a density function f(x) which has 2 properties:

1) f(x) >/= 0 for all x 2) The integral from -∞ to ∞ of f(x) equals 1

How do we decide on the null hypothesis and the alternative hypothesis?

1. Adopt some belief for the moment (null hypothesis) 2. Operating under the assumption that this belief is true, you collect some data. -If the data supports the belief, you continue to operate with this mindset (fail to reject null hypothesis) -If the data supports an alternative belief, discard old belief in favor of new belief (reject null hypothesis in favor of alternative hypothesis)

Steps to testing a hypothesis

1. Create the null hypothesis H0. 2. Create the alternative hypothesis HA. 3. Draw a sample and consider it assuming the null hypothesis H0 is true. Find the mean and SD of this data and make a plot. (Use p and q instead of p̂ and q̂: if you assume H0 is true, you are assuming you know the values of p and q.) -Calculate the p-value: the probability of seeing our result, or something more extreme, in a universe where H0 is true (e.g. H0: "The drug works as well as the placebo"). 4. If p-value </= 0.05, reject the null hypothesis. If p-value > 0.05, fail to reject the null hypothesis.
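A minimal sketch of these steps for a single proportion, assuming a two-sided alternative (HA: p ≠ p0) and α = 0.05; a one-sided HA would use only one tail. The data (60 successes in 100 trials, H0: p = 0.5) are made-up example values.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0, alpha=0.05):
    """Two-sided test of H0: p = p0."""
    p_hat = successes / n
    sd = math.sqrt(p0 * (1 - p0) / n)  # uses p0, not p̂, because H0 is assumed true
    z = (p_hat - p0) / sd
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # results at least as extreme, both tails
    decision = "reject H0" if p_value <= alpha else "fail to reject H0"
    return z, p_value, decision


z, p_value, decision = one_proportion_z_test(60, 100, 0.5)
```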

Two ways to do inference about two proportions:

1. Estimate p1-p2 using a confidence interval about p̂1 - p̂2 2. Run a hypothesis test with H0: p1-p2 = 0

Steps for Test for Independence

1. Find the expected counts for each cell. This is equal to (row total)(column total)/(table total). 2. Find X^2 (same as before), X^2 = sum (Oi-Ei)^2/Ei. 3. Find the P-value: look up the X^2 value on X^2df, where df = (r-1)(c-1), r = number of rows, c = number of columns (exclude the total row/column). 4. Use the P-value to conclude based on the null.
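A minimal sketch of steps 1 to 3 for a table of observed counts (no total row or column). The 2 × 2 table is a made-up example. The standard library has no chi-square distribution, so turning X^2 and df into a P-value is left to a table or technology.

```python
def chi_square_independence(table):
    """Return (X^2, df) for a table of observed counts (rows x columns, no totals)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # expected = (row total)(column total)/(table total)
            expected = row_totals[i] * col_totals[j] / grand_total
            x2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)  # (r - 1)(c - 1)
    return x2, df


x2, df = chi_square_independence([[10, 20], [30, 40]])
```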

Two cases for the X^2 distribution

1. Goodness-of-fit 2. Test of homogeneity/independence

Examples of goodness-of-fit questions

1. If we look at the birth months of National Hockey League (NHL) players, do they resemble what we might see in the larger US population? 2. If we breed a bunch of peas, do we really get the results expected from Mendel's theory of genetics?

Steps of the Goodness-of-fit test

1. You wish to compare a collection of counts to those predicted by some theory 2. Calculate the expected counts from your theory (Expected = Total population * Percentage expected for that category) 3. Calculate X^2 = sum (Oi-Ei)^2/Ei Oi = observed counts Ei = expected counts 4. Find the P-value. Look up the X^2 value on the curve X^2k-1, where k is the number of categories 5. Use the P-value to decide about H0: The observed and expected values are the same.
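A minimal sketch of steps 2 to 4 up to the test statistic: expected counts come from the theory's percentages, and df = k - 1. The observed counts and the "all categories equally likely" theory are made-up examples; the P-value lookup is left to a table or technology.

```python
def goodness_of_fit(observed, expected_props):
    """Return (X^2, df) comparing observed counts with a theory's category proportions."""
    total = sum(observed)
    expected = [total * prop for prop in expected_props]  # Expected = total * percentage
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return x2, len(observed) - 1  # df = k - 1


x2, df = goodness_of_fit([18, 22, 20, 40], [0.25, 0.25, 0.25, 0.25])
```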

Incorrect uses of linear regression

1. failing to look at the residuals to make sure the model is reasonable 2. extrapolating without caution 3. not considering outliers carefully enough 4. building a linear model for data that isn't straight enough

p-value main points

1. P-values can indicate how incompatible the data are with a specified statistical model. 2. P-values do not measure the probability that the studied hypothesis is true. 3. A P-value (statistical significance) does not measure the size of an effect or the importance of a result (practical significance). 4. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.

P overload

1. p: the proportion of some trait in a population. It is a parameter. 2. p̂: the proportion of some trait in a sample. It is a statistic. 3. P(A): the probability of some event A happening. 4. P-value: a conditional probability. It is the probability of getting the value p̂ (or something more extreme) in a universe where H0 (the hypothesized value of p) is true.

Test for Independence

2 or more populations split across a categorical variable. You should have a 2-dimensional table of counts.

multimodal histogram

3+ peaks

Common confidence levels and critical values

90% = 1.645 95% = 1.96 99% = 2.576

If an event can never occur, then P(A)

=0

If an event must occur, then P(A)

=1

Suppose A and B are independent events, then P(A and B)

=P(A) * P(B)

Sample

A (hopefully) representative subset of your population e.g. A spoonful of soup from the top of the pot

Hypothesis

A claim that may or may not be true

Convenience Sample Bias

A form of bad sample frame. Easiest sample to take is not representative of population. e.g. You work at facebook and survey 5000 on whether they love FB. This is convenience sample bias because you are likely friends with your coworkers, who also work at facebook and are more likely to either love it or hate it (depending on how working there affects them).

Correlation (R or r)

A statistic that measures strength and direction of a linear association between two quantitative variables where no outliers are present.

Lurking variable

A variable other than x or y that influences x and/or y, which can create or hide an apparent association between them.

Visualize the probability table on a graph

An outcome is more likely if there is more area in the bar for that value on the graph. The areas of all the bars must sum to 1. Heights must be at least 0 (no negative bars).

Spread

Another parameter we might care about. Big spread is exciting for people in Vegas because they focus on the bigger wins

Exponential Distribution

Asks about a continuous quantity, usually related to time (e.g. waiting time): f(x) = λe^(-λx) when x >/= 0, and 0 otherwise

Calculating area under a distribution using a z-score table

Calculate the z-score, then find the % value on the table corresponding to your calculated z-score value.
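A minimal sketch of the same lookup in code, with the standard library's normal CDF standing in for the z-table (it gives the area to the left of a z-score). The values 75, mean 70 and SD 5 are made-up examples.

```python
from statistics import NormalDist


def z_score(y, mean, sd):
    return (y - mean) / sd  # SDs above (positive) or below (negative) the mean


def area_left_of(y, mean, sd):
    """Area to the left of y under N(mean, sd), i.e. the z-table value for its z-score."""
    return NormalDist().cdf(z_score(y, mean, sd))


z = z_score(75, 70, 5)
area_left = area_left_of(75, 70, 5)
# area within 1 SD of the mean, for comparison with the 68-95-99.7 rule
area_between = NormalDist().cdf(1) - NormalDist().cdf(-1)
```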

Ch 24

Chi-Squared Tests

ch 4

Comparing distributions

Ch 18

Confidence Intervals for Proportions

Subjective probability

Consider a number of factors important to the situation, personally decide how important they are, and use these to come up with an answer. eg. I have a 60% chance of getting an A because i do all readings, HW, come to classes, and am an A/B student in other math classes.

Handout Lectures

Continuous Random Variables

Correlation vs causation thinking

Correlation thinking: weight and height are correlated, so heavier people tend to be taller Causation thinking: (wrong) weighing more causes you to become taller

Categorical/Qualitative Data

Data that falls into categories or labels; often text ideas; tend not to have units

Normal Distribution (aka the Bell Curve)

The density function is hard to integrate by hand, so areas (probabilities) are usually read from a table or computed with technology. -The mean is in the middle of the bell curve. Values are distributed symmetrically around the mean, resulting in a bell curve.

CH14

Dependent events, tree diagrams, Bayes Theorem

Expected Value of a density function

E(X) = The integral from -∞ to ∞ of x*f(x)dx

Expected value for a discrete random variable

E(X) = sum over all outcomes x of [x × P(x)] (the sum of each outcome multiplied by its probability)

Adding random variables (no constants)

E(X±Y) = E(X) ± E(Y) if X and Y are independent variables: Var(X±Y) = Var(X)+Var(Y) (always +, never -!) SD(X±Y) = sqrt(Var(X)+Var(Y)) (always +, never -!)

Adding constants to random variables

E(X±c) = E(X) ± c Var(x±c) = Var(x) SD(x±c) = SD(x)

Scaling random variables

E(aX) = aE(X) Var(aX) = a^2 * Var(X) SD(aX) = |a| * SD(X)

If x = Uniform(a,b):

E(x) = (a+b)/2 Var(x) = [(b-a)^2]/12 SD(x) = sqrt(Var(x)) = (b-a)/sqrt(12)

Disjoint events

Events A and B are disjoint if they share no common outcomes ex: A: rolling an even number on a die B: rolling a 5 on a die

Independence

Events A and B are independent if event A occurring has no effect on the probability of B occurring, and vice versa.

Population

Everything you want to study e.g. A huge pot of soup

Type II Error

Failing to reject the null hypothesis when it is actually false. Its probability is β.

Claim 2: 65% of UCSD students are FB users

False. Population parameter may not match the sample statistic

Percent of variance explained (R^2 or r^2)

For a given linear model, r^2 (the correlation coefficient squared) is the proportion of the variation in the y-variable that is accounted for (or explained) by the variation in the x-variable

Goodness-of-fit

Goodness of Fit Test: 1 population (NHL players, peas) split across a categorical variable (birth month, phenotype). You should have a 1-dimensional table of counts.

Discrete random variable

Has only a) finitely-many outcomes (e.g. X is time of DMV service) or b) space between the values (e.g. Y is the number of meteors that have hit a planet)

From the SEpi equation: what does (xf - x̄)^2 tell us?

How far the individual is from the center of all the individuals we used to build our model. As we move far away from the core of our data, we should be more worried.

The expected value of the geometric model answers

How many trials are needed to get the first success, on average

From the SEpi equation: what does [SE(b1)]^2 tell us?

How unsure we are about the real slope of the regression line.

Understanding p-value

If the p-value is below 0.05, then, assuming H0 is true (so the sampling distribution is centered at the value H0 specifies), there is less than a 5% chance of observing a result at least as extreme as ours.

"95% confident" technically means

If you drew many, many samples, and for each one you found p̂ and built a confidence interval by reaching out +/- 2 standard errors, then the true population parameter would be in about 95% of these intervals. -It is not "there is a 95% chance that the parameter is in your interval." -It is "95% of the intervals built this way will contain the parameter."

If you have two quantitative variables, you can measure the strength of an association using a correlation coefficient.

If you have two qualitative (categorical) variables, you can use a chi-squared test for the significance of an association.

Simple Random Sample (SRS)

Every possible group of n individuals has the same chance of being chosen. Imagine each point in a box as a person: we just pick a certain number of points at random.

Approximation rule

In ANY data set that is normally distributed: -About 68% of the data values are within 1 SD of the mean -About 95% of the data values are within 2 SDs of the mean -About 99.7% of the data values are within 3 SDs of the mean

Common question for the Poisson distribution

In general, <some behavior> is average. How likely am I to see <some specific behavior>? e.g. You have 12.5 emails per day and X% are spam. How likely are you to see 5 spam emails in a day? e.g. There is an average of 2.5 goals scored in each soccer game. How likely is it for a game to have 9 goals?

Assumptions made for statistical inference when using the t-distribution

Independence of data: Randomization condition, <10% condition. Population distribution must be nearly normal: -look for near-normality in histogram of your sample -More skew is OK as n gets larger

Ch 25 P1

Inference About the Regression Coefficients

Ch 20

Inferences for Means

Which is more accurate: interpolation or extrapolation?

Interpolation is more accurate because the pattern you built applies to the data within range

The density graph is NOT P(X)

It is a function that helps you figure out probabilities by examining the area under it. Its shape suggests which values are (relatively) more likely, but the probability of any particular outcome occurring is still 0.

For smaller sample sizes (n<30) or populations where you don't know σ (and must approximate using sx), there is a better approximation of the sampling distribution than the normal model

It is called the t-distribution

ch 7

Linear Regression

How does intensity of skew affect the difference between mean and median?

Lower skew = lower difference between mean and median. Greater skew = greater difference.

How do you increase the power of a test?

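Raise α (accepting a higher risk of Type I error; equivalently, lower the critical value needed to reject H0), or increase the sample size n.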

Margin of Error

ME = z*(sqrt(p̂*q̂ / n)) ME = z*(SE)

Table of Contents:

Midterm 1
Ch 3: Welcome
Ch 4: Comparing distributions
Ch 6: Scatterplots, Association, Correlation (2 variables)
Ch 7: Linear Regression
Ch 8 and 9: More things about Regression
Ch 13: Probability
Ch 14: More Probability Theory
Ch 15: Random Variables
_________________________________
Midterm 2
Ch 16: Modeling
Handout: Continuous Random Variables
Ch 5: Z-scores, the normal model, the standard normal model
Ch 17: Sampling Distributions
Ch 18: Confidence Intervals for Proportions
Ch 19: Testing Hypotheses About Proportions
__________________________________
Final
Ch 20: Inferences for Means
Ch 21: Types of Errors and 21 Questions
Ch 22: Two-Sample Proportion Inference
Ch 22 and 23: Paired Data, Two Sample Means
Ch 25 P1: Inference About the Regression Coefficients
Ch 25 P2: Prediction Intervals/Confidence Intervals
Ch 24: Chi-Squared Tests

Midterm 2 material

When multiplying data by a value Y, how are the statistics affected?

Minimum value = Y*min Maximum value = Y*max Mean = Y*mean Median = Y*median SD = |Y|*SD IQR = |Y|*IQR

When multiplying data by a value Y and adding a value X, how are the statistics affected?

Minimum value = Y*min + X Maximum value = Y*max + X Mean = Y*mean + X Median = Y*median + X SD = |Y|*SD IQR = |Y|*IQR

When adding a value X to a data, how are the statistics affected?

Minimum value = min + X Maximum value = max + X Mean = mean + X Median = median + X SD = SD (unaffected) IQR = IQR (unaffected)

Ch 16

Modeling

Ch 14

More Probability Theory

ch 8 and 9

More things about Regression

The sampling distribution is a normal curve with model

N( μ, σ/sqrt(n) )

Numeric/Quantitative Data

Numerical data with units

Density function

Only area under the graph is linked to probability.

P(A|B)

P(A and B)/P(B)

Losing Disjointness: P(A or B) =

P(A) + P(B) - P(A and B)

If all the outcomes in a sample space are equally likely, we define the probability of an event A to be

P(A) = (# of outcomes in event A)/(# of outcomes in the sample space), where 0 <= P(A) <= 1

Complement rule

P(A) = 1 - P(A^c)

In general, P(A and B) =

P(A|B) * P(B)

Expanded Bayes' Theorem (for when P(B) is not known)

P(A|B) = [P(B|A)*P(A)] / [P(B|A)*P(A) + P(B|A^c)*P(A^c)]

Bayes' Theorem

P(A|B) = [P(B|A)*P(A)]/(P(B))
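A minimal sketch of Bayes' Theorem where P(B) is not known directly and is built from the total probability P(B|A)P(A) + P(B|A^c)P(A^c). The screening-test numbers (1% prevalence, 99% sensitivity, 5% false-positive rate) are hypothetical and only illustrate how a rare condition keeps P(disease | positive) low.

```python
def bayes_posterior(prior, p_b_given_a, p_b_given_not_a):
    """P(A|B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|A^c)P(A^c)]."""
    numerator = p_b_given_a * prior
    p_b = numerator + p_b_given_not_a * (1 - prior)  # total probability of B
    return numerator / p_b


# hypothetical test: A = has the disease, B = tests positive
p_disease_given_positive = bayes_posterior(0.01, 0.99, 0.05)
```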

P(making a Type I error) =

P(reject H0|H0 is true) = alpha

The Poisson Distribution

P(x) = (λ^x)(e^-λ) / (x!) λ = average value x = value whose probability you are trying to predict E(x) = λ SD(x) = sqrt(λ)
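
A sketch of the Poisson formula, assuming λ = 3 as an example rate. Summing x*P(x) and (x - E)^2*P(x) over enough values of x recovers E(X) = λ and SD(X) = sqrt(λ). The helper `poisson_pmf` is a hypothetical name.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    return lam ** x * math.exp(-lam) / math.factorial(x)

lam = 3.0  # assumed average count per interval
probs = [poisson_pmf(x, lam) for x in range(60)]   # tail beyond 59 is negligible

mean = sum(x * p for x, p in enumerate(probs))
var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
print("P(X = 2):", round(poisson_pmf(2, lam), 4))
print("E(X):", round(mean, 4), " SD(X):", round(math.sqrt(var), 4))
```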

Ch 22 and 23

Paired Data, Two Sample Means

Ch 11

Populations and Samples

Ch 25 P2

Prediction Intervals/Confidence Intervals

Prediction interval vs confidence interval

Prediction: range of plausible values for a single future observation at a given x [a single person]. Confidence: range of plausible values for the mean response at that x [the average of all people like that person]. The confidence interval is narrower because an average varies less than one individual does.

Ch 13

Probability

Know the symbols for both statistics and parameters:

Proportion (p̂ vs p), mean (x̄ vs μ), SD (s vs σ), correlation (r vs ρ), regression coefficient (b vs β), written as statistic vs parameter

The Central Limit Theorem (CLT)

States that the sampling distribution of a sample proportion or sample mean will be approximately normal, regardless of the shape of the population distribution, as long as we have met the 2 conditions: Independence and Nearly Normal (a large enough sample size)

Q1, Q2, and Q3

Q1: median in the first (lower) half of the data Q2 (median): median of the whole distribution Q3: median in the second (upper) half of the data

Ch 15

Random Variables

Bernoulli trial

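A random trial with exactly 2 possible outcomes (success or failure), with the same success probability p on every trial and trials independent of one another. P(x) = p if x = success, or q = 1 - p if x = failure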

Confidence Interval

Range of values around a point estimate that convey our uncertainty about the population parameter (as well as a range of plausible values for it)

Type I Error

Rejecting the null hypothesis when it is actually true

Standard Deviation (σ)

SD(X) = sqrt(Var(X))

Systematic Sampling

Sample elements are selected at a fixed interval from an ordered list or sequential file, e.g. asking every 10th person you see
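
A minimal sketch of systematic sampling, assuming the population is already an ordered list: pick a random starting position in the first k entries, then take every kth one after it. The list of 100 people, k = 10, and the function name `systematic_sample` are hypothetical.

```python
import random

def systematic_sample(population, k, seed=None):
    """Every k-th element of an ordered list, starting at a random offset in [0, k)."""
    rng = random.Random(seed)
    start = rng.randrange(k)          # random start gives each element the same chance
    return population[start::k]

people = [f"person_{i}" for i in range(1, 101)]  # hypothetical ordered list of 100 people
sample = systematic_sample(people, k=10, seed=3)
print(sample)
```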

Bad Sample Frame Bias

Sample is not representative of the population. e.g. You want to determine if people in the US like Facebook, so you study Facebook users in the US. You completely underrepresent people who don't use Facebook. Maybe they don't use Facebook because they hate it!

Ch 17

Sampling Distributions

Ch 6

Scatterplots, Association, Correlation (2 variables)

Parameter

Some value summarizing the population

Statistic

Some value summarizing the sample

Standard deviation equation

s = sqrt( sum of (yi - ȳ)^2 / (n - 1) )
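
The sample SD formula written out step by step, as a sketch on a made-up data set (ȳ = 5), cross-checked against Python's built-in `statistics.stdev`, which also divides by n - 1. The helper name `sample_sd` is hypothetical.

```python
import math
import statistics

def sample_sd(ys):
    """s = sqrt( sum of (y_i - ybar)^2 / (n - 1) )."""
    n = len(ys)
    ybar = sum(ys) / n
    return math.sqrt(sum((y - ybar) ** 2 for y in ys) / (n - 1))

ys = [2, 4, 4, 4, 5, 5, 7, 9]          # made-up data, ybar = 5
print(round(sample_sd(ys), 4))
print(round(statistics.stdev(ys), 4))  # standard-library cross-check
```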

Null hypothesis (H0)

Statement that says nothing interesting is happening (the opposite of what you're looking for) e.g. if you're trying to prove that a drug produces more treatment than a placebo, your null hypothesis would be that the drug produces the same amount of treatment as the placebo: p(drug) = p(placebo)

Ch 19

Testing Hypotheses about Proportions

Statistical Inference

The attempt to say something about the population parameter given a particular sample statistic (i.e. point estimate)

Memoryless

The exponential distribution is memoryless: the probability that a washing machine lasts at least 3 more years is the same whether it is brand new or has already lasted 30 years, i.e. P(X > 33 | X > 30) = P(X > 3). (This might not be true of real washing machines, but it is true of the exponential model.)
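
A numeric check of memorylessness, assuming a failure rate of λ = 0.1 per year (chosen only for illustration). For X ~ Exp(λ), P(X > t) = e^(-λt), and the conditional probability is P(X > 33 and X > 30)/P(X > 30) = P(X > 33)/P(X > 30). The helper name `survival` is hypothetical.

```python
import math

def survival(t, lam):
    """P(X > t) for X ~ Exp(lam)."""
    return math.exp(-lam * t)

lam = 0.1  # assumed failure rate: 0.1 per year (mean lifetime 10 years)

p_fresh = survival(3, lam)                       # P(X > 3)
p_used = survival(33, lam) / survival(30, lam)   # P(X > 33 | X > 30)
print(round(p_fresh, 6), round(p_used, 6))
```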

If f(x) is a density function for the continuous random variable x, then P(a < x < b) equals

The integral from a to b of f(x)
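
A sketch of "probability = area under the density", assuming an exponential density with λ = 0.5 as the example. A simple midpoint Riemann sum approximates the integral from 0 to 2 and is compared with the closed form 1 - e^(-2λ). The helper names `integrate` and `density` are hypothetical.

```python
import math

def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f from a to b."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

lam = 0.5

def density(x):
    """Exp(0.5) density, valid for x >= 0."""
    return lam * math.exp(-lam * x)

approx = integrate(density, 0, 2)   # P(0 < X < 2) as an area
exact = 1 - math.exp(-lam * 2)      # closed form for the exponential
print(round(approx, 6), round(exact, 6))
```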

From the SEpi equation: what does se^2/n + se^2 tell us?

The more spread there is around our line (i.e., the bigger se), the less confident we are in our prediction. More data shrinks the se^2/n term, but the se^2 term does not depend on n, so SE_PI can never drop below se.
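
A sketch of why more data only helps so much, assuming the usual textbook form SE_PI = sqrt(SE(b1)^2*(x_new - x̄)^2 + se^2/n + se^2). Predicting at x_new = x̄ makes the first term vanish, leaving exactly the se^2/n + se^2 piece. The value se = 2 and the helper name `se_pi_at_mean` are hypothetical.

```python
import math

def se_pi_at_mean(se, n):
    """SE of a prediction at x_new = x-bar, where the SE(b1) term drops out."""
    return math.sqrt(se ** 2 / n + se ** 2)

se = 2.0  # hypothetical residual standard deviation
for n in (5, 50, 500, 5000):
    print(n, round(se_pi_at_mean(se, n), 4))
```

As n grows the printed values shrink toward 2.0 (the value of se) but never go below it.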

Power of a test

The power of a test of statistical significance is the probability that it rejects a false null hypothesis: Power = P(reject H0 | H0 is false) = 1 - β, where β = P(fail to reject H0 | HA is true) is the probability of a Type II error.

Marginal Probabilities

The probability of one value of a categorical variable occurring (A)

Joint probabilities

The probability of two things joining forces and happening simultaneously (A and B)

p-value

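The probability, assuming H0 is true, of getting a test statistic at least as extreme as the one observed: the area under the curve beyond the observed test statistic z0 (recall that z0 = (statistic - null value)/SE, e.g. z0 = (x̄ - μ0)/SE). For a one-tailed test: p-value = P(Z > z0) if HA is on the right tail, or p-value = P(Z < z0) if HA is on the left tail. For a two-tailed test (most common): p-value = 2*P(Z > |z0|) = 2*P(Z < -|z0|) = 2*(1 - P(Z < |z0|))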
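
A sketch of the three p-value cases using the standard normal model from Python's `statistics.NormalDist` (Python 3.8+). The function name `p_value` and its `alternative` parameter are hypothetical labels for "right tail", "left tail", and "two-sided".

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal model N(0, 1)

def p_value(z0, alternative="two-sided"):
    """p-value for an observed z statistic z0."""
    if alternative == "greater":      # HA on the right tail
        return 1 - Z.cdf(z0)
    if alternative == "less":         # HA on the left tail
        return Z.cdf(z0)
    return 2 * (1 - Z.cdf(abs(z0)))   # two-sided: both tails beyond |z0|

print(round(p_value(1.96), 4))
print(round(p_value(1.645, "greater"), 4))
print(round(p_value(-2.1, "less"), 4))
```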

Requirements for both types of X^2 tests

The requirements are the same: 1) You start with a one-dimensional (k x 1) table of observed counts. You wish to compare these counts to those predicted by some theory. 2) The counts in the cells of the table must be independent of one another. Randomly sampling the people that comprise these counts usually gives us this. 3) The expected count for each cell must be at least 5. (Note: We don't require that the observed counts be at least 5, just the expected counts.)
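
A sketch of the X^2 statistic, sum of (Obs - Exp)^2/Exp, for a hypothetical goodness-of-fit example: 60 rolls of a die compared with the fair-die expectation of 10 per face. The counts are made up, the helper name `chi_square_stat` is hypothetical, and df = (number of cells) - 1 is the usual goodness-of-fit rule. Only the statistic is computed; the p-value would come from an X^2 table or a stats library.

```python
def chi_square_stat(observed, expected):
    """Sum of (Obs - Exp)^2 / Exp over all cells."""
    if any(e < 5 for e in expected):
        raise ValueError("every expected count should be at least 5")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical goodness-of-fit check: 60 rolls of a die, fair-die theory says 10 per face.
observed = [8, 9, 12, 11, 6, 14]
expected = [10] * 6
stat = chi_square_stat(observed, expected)
df = len(observed) - 1   # goodness-of-fit degrees of freedom
print(round(stat, 3), df)
```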

What are the effects on error of increasing alpha (α)?

The risk of a Type I error increases (since α = P(Type I error)) and the risk of a Type II error decreases, so the power of the test increases.
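
A sketch of the α trade-off, assuming a one-sided z-test of H0: p = 0.5 vs HA: p > 0.5 with n = 100 and a true p of 0.6, using the normal approximation. All numbers and the function name `power_one_prop` are hypothetical. A larger α lowers the rejection cutoff, which raises power (1 - β) at the cost of more Type I errors.

```python
import math
from statistics import NormalDist

Z = NormalDist()

def power_one_prop(p0, p_true, n, alpha):
    """Power of a one-sided z-test of H0: p = p0 vs HA: p > p0 (normal approximation)."""
    z_star = Z.inv_cdf(1 - alpha)                         # critical value
    cutoff = p0 + z_star * math.sqrt(p0 * (1 - p0) / n)   # reject H0 if p-hat > cutoff
    sd_true = math.sqrt(p_true * (1 - p_true) / n)        # spread of p-hat when HA is true
    return 1 - Z.cdf((cutoff - p_true) / sd_true)         # P(reject H0 | HA true) = 1 - beta

for alpha in (0.01, 0.05, 0.10):
    print(alpha, round(power_one_prop(p0=0.5, p_true=0.6, n=100, alpha=alpha), 3))
```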

Volunteer bias

Those who are willing to take their own time to voluntarily complete something like a survey usually look different from those who don't. The sample does not represent those who opt out of volunteering.

Hypothesis testing on Slopes of Regression Lines

t(n-2) = (b1 - 0)/SE(b1), with df = n - 2; SE(b1) will be given

neutral way of inflating r^2

Tossing outliers and doing the analysis without them (good or bad depending on the situation) -if outliers are trolls, tossing them is fine -if outliers are valid, observed data, you cannot toss them

Draw SRS of 200 UCSD students and ask if they have a FB account. 130 say they do. Claim 1: 65% of our sample are FB Users

True. p̂ = 130/200 = 65%

Ch 22

Two-Sample Proportion Inference

Ch 21

Types of Errors

Sampling Frame

The list of individuals (the "universe") from which you will actually be picking your sample

Difference in means (x̄1 and x̄2): Hypothesis test for two independent samples

Use t(df) = [(x̄1 - x̄2) - 0] / SE, where SE = sqrt(s1^2/n1 + s2^2/n2) and df = min(n1 - 1, n2 - 1); no pooling necessary
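
A sketch of the two-sample t statistic for independent groups, with unpooled SE = sqrt(s1^2/n1 + s2^2/n2) and the conservative df = min(n1 - 1, n2 - 1). The two groups are made-up measurements and `two_sample_t` is a hypothetical name; the p-value would then come from a t table with that df.

```python
import math
import statistics

def two_sample_t(g1, g2):
    """t statistic and conservative df for H0: mu1 - mu2 = 0 (independent, unpooled)."""
    n1, n2 = len(g1), len(g2)
    s1, s2 = statistics.stdev(g1), statistics.stdev(g2)
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    t = (statistics.mean(g1) - statistics.mean(g2) - 0) / se
    df = min(n1 - 1, n2 - 1)   # conservative degrees of freedom
    return t, df

group1 = [5, 7, 8, 6, 9]   # made-up measurements
group2 = [4, 5, 6, 5, 3]
t, df = two_sample_t(group1, group2)
print(round(t, 3), df)
```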

Difference in proportions (p̂1 and p̂2): Hypothesis test for two independent samples

Use z = [(p̂1 - p̂2) - 0] / SEpooled, where SEpooled = sqrt( p̂pooled*q̂pooled/n1 + p̂pooled*q̂pooled/n2 ), p̂pooled = (total successes in both groups)/(n1 + n2), and q̂pooled = 1 - p̂pooled
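
A sketch of the pooled two-proportion z statistic. Pooling combines both groups' successes because H0 says p1 = p2. The counts (60 of 100 vs 45 of 100) are hypothetical, as is the function name `two_prop_z`.

```python
import math

def two_prop_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: p1 - p2 = 0."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)        # pooled proportion of successes
    q_pool = 1 - p_pool
    se_pooled = math.sqrt(p_pool * q_pool / n1 + p_pool * q_pool / n2)
    return (p1_hat - p2_hat - 0) / se_pooled

# Hypothetical counts: 60 of 100 in group 1 and 45 of 100 in group 2 say "yes".
z = two_prop_z(60, 100, 45, 100)
print(round(z, 3))
```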

Claim 3: About 65% of UCSD students use FB

Vague. Need to learn how to do better. "About" is not precise enough in statistics

Variance (σ^2)

Var(X) = Sum of all x: (x-μ)^2 * P(x)
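
A sketch of E(X), Var(X), and SD(X) for a discrete random variable, using a hypothetical game that pays $0, $10, or $50 with probabilities 0.5, 0.3, and 0.2. By hand: E(X) = 13, Var(X) = 361, SD(X) = 19.

```python
import math

# Hypothetical game: win $0, $10, or $50 with these probabilities.
dist = {0: 0.5, 10: 0.3, 50: 0.2}

mean = sum(x * p for x, p in dist.items())                 # E(X)
var = sum((x - mean) ** 2 * p for x, p in dist.items())    # Var(X)
sd = math.sqrt(var)                                        # SD(X) = sqrt(Var(X))
print(mean, var, sd)
```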

Methods for Marginal and Joint Probabilities

Venn diagrams, contingency tables, P(A or B) rule, P(A and B) rule.

Confidence statement format

We are (C%) confident that the (population parameter) is in the (confidence interval)
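
A sketch of a one-proportion confidence interval, p̂ ± z*·sqrt(p̂q̂/n), applied to the UCSD Facebook sample from the earlier card (130 of 200). It assumes the randomization, 10%, and success/failure conditions hold; `one_prop_ci` is a hypothetical name. The interval comes out to roughly 0.584 to 0.716.

```python
import math
from statistics import NormalDist

def one_prop_ci(successes, n, confidence=0.95):
    """p-hat +/- z* * sqrt(p-hat * q-hat / n)."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    me = z_star * se                                           # margin of error
    return p_hat - me, p_hat + me

low, high = one_prop_ci(130, 200)   # the UCSD Facebook sample: 130 of 200
print(f"We are 95% confident that the proportion of UCSD students with a FB account "
      f"is between {low:.3f} and {high:.3f}.")
```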

Do we always get a normal model for the sampling distribution of a mean?

We do if 2 conditions are met: 1. Independence Assumption: The items in each sample must be independent of one another. Typically we check two conditions that together justify independence: 1.A. Randomness Condition: the items in your sample must be randomly chosen. 1.B. 10% Condition: your sample size must be less than 10% of the population size. 2. Nearly Normal Condition (Sample Size Condition): the population histogram should look nearly normal. If it shows skew, the sample size needs to be large for the sampling distribution to be normal, e.g. n > 30 for moderate skew, n > 60 for large skew.

Ch 3

Welcome

Alternate hypothesis (HA)

What you expect might be true. Opposite to the null hypothesis. e.g. The drug produces more treatment than the placebo: p(drug) > p(placebo)

Uniform Distribution

When all values in a finite interval [a, b] are equally likely: f(x) = 1/(b-a) for a <= x <= b, and f(x) = 0 otherwise, so the density is a flat line of height 1/(b-a)

Tower of power

When original data or the residuals convince you that the data are not straight enough, apply a mathematical function to the values

Proportions and means

When populations are big, we must draw a random sample and estimate these parameters using statistics

histogram symmetry

When the left and right halves of a histogram look similar/the same

Two-sided alternative hypothesis

When you are excited about results on both sides. You are wondering if your percentage is different from the comparison %. P(a) =/= P(b)

One-sided alternative hypothesis

When you are excited with only one side; the better side. You are hoping your % is on a certain side of the comparison %. P(a) > P(b) or P(a) < P(b)

There are many curves in the t-distribution family

With n data points in sample, you use t-distribution with df (degrees of freedom) = n-1

Geometric model

X = Geom(p), where p is the probability of success and X is the number of trials needed to get a success. -Assume we are doing a Bernoulli trial with success probability p (and failure probability q=1-p) over and over until we get a success. The probability of getting a success in x trials is: P(x)=[q^(x-1)]*p E(X) = 1/p SD(X) = sqrt(q/p^2) = [sqrt(q)]/(p)
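
A sketch of the geometric model, assuming p = 0.2 as the example success probability. Summing over the first 499 trials (the remaining tail is negligible for this p) recovers E(X) = 1/p = 5 and SD(X) = sqrt(q)/p ≈ 4.472. The helper name `geom_pmf` is hypothetical.

```python
import math

def geom_pmf(x, p):
    """P(first success on trial x) = q^(x-1) * p, for x = 1, 2, 3, ..."""
    q = 1 - p
    return q ** (x - 1) * p

p = 0.2              # assumed success probability per trial
xs = range(1, 500)   # the tail beyond 499 trials is negligible for p = 0.2
mean = sum(x * geom_pmf(x, p) for x in xs)
var = sum((x - mean) ** 2 * geom_pmf(x, p) for x in xs)
print(round(mean, 4), round(math.sqrt(var), 4))   # compare with 1/p and sqrt(q)/p
```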

E(X±Y) = E(X) ± E(Y) is true even if

X and Y are dependent

The choose symbol (n nCr k)

n nCr k: the number of ways to arrange k successes among n attempts. Formula: n! / [k! * (n-k)!]
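
The choose formula written directly with factorials, as a sketch, and checked against Python's built-in `math.comb` (Python 3.8+). The helper name `choose` is hypothetical.

```python
import math

def choose(n, k):
    """n! / (k! * (n - k)!): ways to place k successes among n trials."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

print(choose(5, 2))      # ways to get 2 successes in 5 trials
print(math.comb(5, 2))   # built-in equivalent
```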

If x = Exp(λ):

X represents how long we will have to wait before an event with rate λ occurs. E(X) = 1/λ (the average waiting time), Var(X) = 1/(λ^2), SD(X) = 1/λ

Binomial Model

X = Binom(n,p), where n is the number of trials, p is the probability of success, and X is the number of successes in n trials. The probability of getting k successes in n Bernoulli trials is: P(k) = (n nCr k) * (q^(n-k)) * (p^(k)), E(X) = np, SD(X) = sqrt(npq)
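
A sketch of the binomial model, assuming n = 10 trials with p = 0.3 as the example. The probabilities sum to 1, and the computed mean and SD match np = 3 and sqrt(npq) = sqrt(2.1). The helper name `binom_pmf` is hypothetical.

```python
import math

def binom_pmf(k, n, p):
    """P(k successes in n independent Bernoulli(p) trials)."""
    q = 1 - p
    return math.comb(n, k) * q ** (n - k) * p ** k

n, p = 10, 0.3   # assumed example: 10 trials, success probability 0.3
probs = [binom_pmf(k, n, p) for k in range(n + 1)]
mean = sum(k * pk for k, pk in enumerate(probs))
sd = math.sqrt(sum((k - mean) ** 2 * pk for k, pk in enumerate(probs)))
print(round(mean, 4), round(sd, 4))   # compare with np and sqrt(npq)
```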

Does empirical probability make sense?

Yes, you tend to get what you expect

Theoretical probability

You build a mathematical model to describe a situation and use the axioms of probability to determine the likelihood of some events, e.g. determining that the chance of rolling an even number on a die is 1/2 because 3/6 of the possible outcomes are even numbers

Empirical probability

You determine how likely something is by trying it over and over and looking at tons of data, e.g. determining if a coin is fair by flipping it 100,000 times and recording the number of heads and tails
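
A sketch of empirical probability (and the Law of Large Numbers) with a simulated fair coin standing in for physical flips; the seed and the use of Python's `random` module are illustrative choices. With 100,000 flips the observed proportion of heads should land very close to 0.5.

```python
import random

random.seed(2024)          # fixed seed so the run is repeatable
flips = 100_000
heads = sum(random.random() < 0.5 for _ in range(flips))   # simulated fair coin
print("empirical P(heads):", heads / flips)
```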

Multistage Sampling

You focus on undergrads today and ask every 4th one you see. You do grads the next day and ask every 4th one you see. -Uses 2 or more of the previous methods (excluding SRS)

By assuming H0, build a universe where p is in accordance with H0.

You must first make sure that the sampling distribution is approximately normal: make sure the sample is < 10% of the total population, and make sure np0 >= 10 expected successes and nq0 >= 10 expected failures (using the p0 from H0)
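
A sketch of the condition checks before using a normal model for p̂ under H0. The randomness condition can't be checked by code, so only the 10% and success/failure conditions appear here. The function name `normal_model_ok` and the assumed population size of 30,000 are hypothetical.

```python
def normal_model_ok(n, p0, population_size):
    """Check the 10% and success/failure conditions for a normal model of p-hat."""
    q0 = 1 - p0
    ten_percent = n < 0.10 * population_size          # sample is < 10% of the population
    success_failure = n * p0 >= 10 and n * q0 >= 10   # expected successes and failures
    return ten_percent and success_failure

# Hypothetical: H0 says p = 0.5, assumed population of 30,000.
print(normal_model_ok(200, 0.5, 30_000))
print(normal_model_ok(15, 0.5, 30_000))    # only 7.5 expected successes
```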

Finding Z using pooling

Z = [(p̂1 - p̂2) - 0] / SEpooled, where SEpooled = sqrt( p̂pooled*q̂pooled/n1 + p̂pooled*q̂pooled/n2 )

Ch 5

Z-scores, the normal model, the standard normal model

The five number summary

[lower whisker {Minimum value, Q1, Median, Q3, Maximum Value} upper whisker]
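
A sketch of the five number summary using the Q1/Q3 definition from the quartile card (medians of the lower and upper halves). It assumes that, when n is odd, the middle value is left out of both halves; other textbooks and software use slightly different quartile rules. The function name `five_number_summary` is hypothetical.

```python
import statistics

def five_number_summary(data):
    """Min, Q1, median, Q3, max, with Q1/Q3 = medians of the lower/upper halves.

    Assumption: when n is odd, the middle value is left out of both halves.
    """
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]          # values below the median position
    upper = xs[(n + 1) // 2 :]    # values above the median position
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])

print(five_number_summary([7, 1, 3, 9, 5, 2, 8, 4, 6]))
```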

All confidence intervals work the same way, with slight changes

a

Do not create a regression when what type of outlier is present?

a high influence outlier is present

