Statistics

Binomial random variables

"counts" of the number of times a 0-1 outcome occurs in n independent trials:

symbol for the population mean

µ (mu)

event

A set of basic outcomes of the experiment (i.e., a subset of the sample space)

Difference in graphs between a spline and categories

Categories assume a world in which people are types and there's no difference within a type (bar graph). Splines assume that there are linear trends, but the slope of the effect can differ from segment to segment.

expected value

Expected value is the probability-weighted average. So let's say demand can be 1, 2, or 3. The chance of it being 1 is .2, of 2 is .6, and of 3 is .2. The expected demand is 1(.2) + 2(.6) + 3(.2) = 2
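
A minimal Python sketch of this arithmetic, using the card's numbers (variable names are just for illustration):

```python
# Probability-weighted average, using the card's numbers.
values = [1, 2, 3]          # possible demand
probs = [0.2, 0.6, 0.2]     # P(demand = value); must sum to 1

expected = sum(v * p for v, p in zip(values, probs))
print(expected)  # 2.0
```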

How to read binomial(50,49, .93)

Find the probability of this number of successes occurring: (number of trials, number of successes, probability of success in one trial). So binomial(50, 49, .93) is the probability of exactly 49 successes in 50 trials when each trial succeeds with probability .93.
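
A quick way to check such a number, assuming you have scipy available (the function name and argument order below are scipy's, not the card's notation):

```python
from scipy.stats import binom

# binomial(50, 49, .93) in the card's notation: the probability of exactly
# 49 successes in 50 trials, each succeeding with probability .93.
# Note scipy's argument order is pmf(k, n, p).
print(binom.pmf(49, 50, 0.93))  # about 0.10
```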

error variance

How far the observations tend to fall from the regression line in OLS. Estimated by the mean square error

Essential feature of OLS

It describes the average Y. So if you plug in the average for each x, you will get the mean y of the sample

How good is your prediction interval around a point prediction in a linear regression?

It's quite good if your point is near the average x (xbar). If it isn't, your prediction interval gets wider, because you're less sure. So the band looks like a waist: narrowest at xbar

Variations of the standard error

Standard error of a mean: SD/√n
Standard error of a proportion: √[p(1-p)/n]
Standard error for β₁ in OLS: s(b₁) = √MSE/√SSx (the typical size of an error, √MSE, divided by the square root of the sum of squared deviations of x)

What the F test does (and doesn't) predict

The F test only tells us that there is evidence of a relationship between the dependent variable and AT LEAST ONE of the independent variables. Once we conclude that a relationship exists, we need to conduct separate tests to determine which β slopes are different from zero.

what is ρ?

ρ (rho), the lowercase Greek letter that looks like a rounded p, stands for the population correlation coefficient

Fixed effects

Use this when you think there are omitted variables that you can't account for. It nets out all the unobserved components that are constant within families (preferences, income, family culture). Unlike random effects, it also doesn't have to assume that the error is uncorrelated with the x; here the error can be correlated with the x. It's a safer bet.

What happens if you run an lr test and change only one coefficient

You get the same p value as the t-test

variance

a measure of how dispersed the outcomes of a random variable are. Higher variance is viewed as higher risk in financial contexts. Find its square root to get the SD

What is a "standard normal distribution"

a random variable whose µ=0 and whose σ=1. We call this variable Z rather than X or Y. You use a Z table to find the probability of values distributed like this

combination

how many goody bags you can make out of the possible options. Here bda = adb. Written as (n r), in which n is stacked on top of r. Formula: n!/[(n-r)!r!]. Think of it this way: you add an r! to the denominator to make the number of combinations smaller than the number of permutations, since bda and adb used to be unique orderings but now count as the same combination
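
A quick check of the formula in Python; math.comb is the built-in "n choose r":

```python
from math import comb, factorial

n, r = 4, 3
print(comb(n, r))                                         # 4, the built-in "n choose r"
print(factorial(n) // (factorial(n - r) * factorial(r)))  # 4, same via the card's formula
```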

What is this? X~N(µ,σ²)

"X is normally distributed with mean µ and variance σ²." So the N stands for the normal distribution, and the parentheses hold its parameters, just as binomial(50, 49, .93) states the distribution and the parameters that affect it (number of trials, number of successes, probability of success in one trial). ***Note: when you want the distribution of a sample mean rather than a single draw, don't use the population variance itself; divide it by your sample's n: X̄ ~ N(µ, σ²/n)***

There is an assumption for the linear regression that the errors are randomly distributed. Wtf?

It means that if you have a line and random noise around it, you can imagine a bunch of bell curves turned vertically. There will be the most error near your actual regression line, and the error will peter out as you go farther away from your predicted estimate (i.e., big misses are less likely).

Why you can't just look at t statistics

If the model is wrong (omits something important) then the t-stats for that specification aren't valid in any case.

Gauss-Markov theorem

In a linear regression model in which the errors have expectation zero and are uncorrelated and have equal variances, the best linear unbiased estimator (BLUE) of the coefficients is given by the ordinary least squares (OLS) estimator

How does random effects help you? What is the command

It does not help with endogeneity (so if you suspect that, better go for instrumental variables or fixed effects), but it helps with group-level clustering, i.e., adjusting for the fact that schooling for kids in the same state is more similar than for kids in different states. xtreg ***, i(id). So in the fertilizer example, it adjusts for the fact that all the land in Minnesota might be different from all the land in Idaho, but not for the fact that people put more fertilizer on bad land overall.

What's up with covariance?

It tells us whether x and y move together. If they both go up and down together, it's positive. If they go in opposite directions, it's negative. We can't really interpret its magnitude, because its units (X's units times Y's units) depend on scale, but we can use it to test for independence of x and y. If x and y are independent, the covariance is zero. BUT the covariance *can* be zero for non-independent events. So how is that helpful? If the covariance is not zero, the events are definitely dependent. If it is zero, reserve judgment.

How to get the variance of the whole portfolio if you only have the variance of the pieces in the portfolio

You can't just add up little variances to get big variances: Var(X + Y) = Var(X) + Var(Y) + 2Cov(X,Y). And if you have multiple pieces (say X, Y, and Z), don't panic. Just add on the extra Var(Z) plus 2Cov(X,Z) and 2Cov(Y,Z). Caution: if you own many shares of X, you will need to multiply X by those shares inside the variance, i.e., Var(200X) + 2Cov(200X,Y). The constant comes straight out of the covariance, so 2Cov(200X,Y) = 400Cov(X,Y), but it has to be squared coming out of the variance, so Var(200X) = 40000Var(X). Caution: don't forget to double the cov. Common error! The rules: var(aX) = a²var(X); cov(aX,bY) = ab·cov(X,Y); var(a+X) = var(X) (i.e., if you add to µ, the variance and the covariance stay the same); cov(a+X, b+Y) = cov(X,Y). Conceptual note: the cov here is telling you whether your portfolio varies a lot or a little. If two of your investments go in different directions (negative covariance), then you're hedging, so you have a smaller variance, and thus less risk.
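
A minimal sketch of the Var(aX + Y) rule in Python; the variances, covariance, and share count below are made-up numbers, only the algebra comes from the card:

```python
# Hypothetical inputs; only the algebra comes from the card.
var_x, var_y = 4.0, 9.0   # Var(X), Var(Y)
cov_xy = -1.5             # Cov(X, Y): negative, so the two holdings hedge each other
a = 200                   # shares of X

# Var(aX + Y) = a^2 Var(X) + Var(Y) + 2a Cov(X, Y)
portfolio_var = a**2 * var_x + var_y + 2 * a * cov_xy
portfolio_sd = portfolio_var ** 0.5  # take the square root only at the end
print(portfolio_var, portfolio_sd)
```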

Uniform Random Variables

every value is equally likely between a and b; the pdf is flat, f(x) = 1/(b-a) on [a, b].

difference between independent events and mutually exclusive events

independent events are like a coin toss and a die roll: getting a number on one doesn't affect the other. Mutually exclusive events are events that can't happen at the same time. You can't get both a one and a six on a die in the same roll, so the probability of both of these mutually exclusive events happening together is zero.

r² vs R²

The r² associated with the simple linear regression model for one predictor extends to a "multiple coefficient of determination," denoted R², for the multiple linear regression model with more than one predictor.

permutation

number of different orders you can have for a set number of things (the number of ways you can make a three-letter arrangement out of abcd). Here bda does not equal adb. Permutation = PERManent order. Formula: n!/(n-r)!, in which n is the number of options (letters, 4) and r is the number of slots (3). In this case: 4·3·2·1/1 = 24. Notice that after you use a letter, you can't use it again, so the next slot only has three letters to choose from
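
The same check in Python for permutations (math.perm is the built-in, Python 3.8+):

```python
from math import perm, factorial

n, r = 4, 3  # four letters (abcd), three slots
print(perm(n, r))                        # 24 ordered arrangements
print(factorial(n) // factorial(n - r))  # 24, same via n!/(n-r)!
```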

multicollinearity

when a variable appears to have a slope that is insignificant, but only because it is correlated with another explanatory variable. We can try dropping the suspected variable and seeing if that makes our boy significant

symbol for sample mean

x with a bar over it

general setup for finding a confidence interval

your average ± (a standard-error term)(z score for large samples, t score for small samples)

compute total expected value of two separate clients

you will have two tables with the values and the probability of each value. Multiply each value by its probability, and add up the products. Add the sums for the two clients to get the total expected value.

What elements affect the shape of the normal distribution?

σ (sigma, standard deviation) and µ (mu, the mean). Changing µ shifts the center. Increasing σ means a wider/fatter bell because there is now more deviation from µ

symbol for population standard deviation

σ (lowercase sigma)

symbol for population variance

σ² (lowercase sigma, squared)

Number of variables trade-off:

• Too few = important variables omitted ⇒ potential confounding (estimates biased, like regressing Florida residency on mortality without accounting for the confounding factor of age being higher in Florida overall). • Too many = irrelevant variables included ⇒ blows up SEs (estimates imprecise).

It is possible to reject the joint hypothesis that both coefficients are zero, but still can't reject either coefficient individually. Why?

1) If you have 100 coefficients and only a few are significant, you can fail the joint test. 2) Multicollinearity: if X and Z move together, you can't tell apart which effect is which. Each could be zero while the other takes up its effect, but when you do a joint test, they're definitely not both zero. Useful when the variables in the group are highly correlated. If testing whether firm performance affects CEO pay, there are many ways to measure firm performance, and you're not sure ahead of time which measure is most important. Since measures of firm performance are likely to be highly correlated, hoping to find individually significant measures might be asking too much due to multicollinearity. The F test can determine whether, as a group, the firm performance variables affect salary.

what do you do if your residuals are systemic in your linear model?

1) Try a polynomial. If that doesn't work: 2) semi-log, where you take the log of y only. b1×100 is the percent change in Y from a one-unit increase in x. 3) log-log: take the log of both Y and X. b1 is the percent change in Y from a one-percent increase in X

Two ways to test joint hypotheses

1) Wald test: there are formulas for doing this, but they involve the covariance matrix of the coefficient estimates. The F test is one version. 2) LR test (likelihood ratio test): estimate the restricted and unrestricted models and compute the F statistic according to the formula. In a linear model, the LR test and the Wald test are numerically identical.

Why you want to use a joint hypothesis test (ie F test)

1) To test the significance of two or more dummies. You need to make sure you can sweep all of them at once: it's not good enough to see that one is significant at the .5 percent level and the other at the .1 percent level, because then you don't know at what level you can sweep both. 2) To test when you think you have multicollinearity

Properties of Normally distributed random variables

1. Sums. The sum of Normally distributed r.v.'s is also Normally distributed. E.g., if X, Y are Normal r.v.'s, then W = X+Y is also a Normal r.v. 2. Constants. You can add or multiply a Normal r.v. by constants and still have a Normal r.v. E.g., if Z~N(0,1), then a+bZ ~ N(a, b²) (s.d. is |b|)

relationship between binomial and bernoulli

A Bernoulli random variable is a Binomial random variable with parameters (1, p) (so it can only be 0 or 1, and the probability of it being 1 is p). For a single Bernoulli trial, the expected value is p (the probability of success) and the variance is p(1-p) (probability of success times probability of failure). The binomial count comes in when you have to give the expected count over several of these Bernoulli trials. Luckily, by independence, you can just multiply the number of trials by the expected value to get the expected value of the binomial, np. Same for the variance: np(1-p). Aren't Bernoulli variables easy and great?!

Say the effects of mother's education on respondent's education does not vary with respondent's sex. Draw a graph of what this would look like for men and women

A graph with two parallel lines. just because the effect is the same doesn't mean they start off at the same intercept. Indeed, the coefficient for maeduc may be the same, but the coefficient for (male) probably isn't

Confounding:

A situation in which a measure of the effect of an independent variable (X) on a dependent variable (Y) is distorted because of the association of X with other factor(s) (Z) that influence the outcome of interest. Z must be correlated with X. It is not an intermediate variable in the pathway from exposure to outcome. (I.e., it's that x and z go together, like inflation and time passing, rather than x causing z. If the side effect of a medication is that it *causes* low blood pressure, which in turn affects the number of headaches, you wouldn't put both medication and low blood pressure in as variables.) http://www.medicine.mcgill.ca/epidemiology/joseph/courses/epib-621/confounding.pdf

Likelihood Ratio Test Formula

F = [(R² - R²*)/J] / [(1 - R²)/(n - K)], in which R² is for the unrestricted model, R²* is for the restricted model, J is the number of coefficient restrictions, and K is the number of coefficients in the unrestricted model. The larger the F, the more the restrictions hurt the model. Also, for large samples, the statistic can be compared to a chi-squared distribution
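
A minimal sketch of this formula in Python; the R² values, sample size, and coefficient counts below are hypothetical, invented for illustration:

```python
def f_stat(r2_u, r2_r, j, n, k):
    """F = [(R2_u - R2_r)/J] / [(1 - R2_u)/(n - K)]."""
    return ((r2_u - r2_r) / j) / ((1 - r2_u) / (n - k))

# Hypothetical: imposing 2 restrictions drops R-squared from .40 to .35
# in a sample of 100 with 5 unrestricted coefficients.
print(f_stat(0.40, 0.35, j=2, n=100, k=5))  # about 3.96
```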

monty hall problem: you have three doors and you want a car. You pick door one, and monty opens another door that has a goat in it. Should you stick with door 1?

No, you should switch. There's a 1/3 chance the car is in door 1. There's a 2/3 chance it's in one of the other two. You know it's not in the door he opened, so there's a 2/3 chance it's in the last door you didn't pick
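
If you don't believe the arithmetic, a quick simulation settles it. A minimal Python sketch (the simulation setup is mine, not the card's):

```python
import random

def play(switch, trials=100_000):
    """Simulate the game; return the share of wins."""
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)
        pick = random.randrange(3)
        # Monty opens a door that is neither your pick nor the car
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += pick == car
    return wins / trials

print(play(switch=False))  # about 1/3
print(play(switch=True))   # about 2/3
```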

what does the mean square error measure?

Remember the SSE? That's the sum of the squared errors, i.e., you add up all the distances from the regression line after squaring them so that the negative distances won't cancel out. The MSE is the average squared distance from the regression line: SSE/(n-2). It's -2 because to make the line you had to estimate alpha and beta1. I.e., it measures the error variance in your model.

Why you can't add up the square roots of the variances of the investments in a portfolio to get the portfolio standard deviation

Since the SD of a random variable is the square root of the variance, to get the SD for a sum of two random variables you have to first work out the variance of the sum and then take the square root. There are no shortcuts here; the square root of a sum is not the sum of the square roots.

Continuous Random Variables

Some variables can take any value in an interval, e.g., exact age, height, income. For a continuous random variable X, we calculate the uncertainty through the probability density function: the area under the pdf between a and b, i.e., the integral of f(x) from a to b, is P(a ≤ X ≤ b)

Limits of R²

Sometimes it's really high only because you have few degrees of freedom for a lot of parameters. It's like fitting a 5-dimensional plane to 5 points: of course it fits. So the adjusted R² is calculated, which is the usual R² except the SSE and the SST are divided by their respective degrees of freedom (which means it uses the 1 - SSE/SST form of the formula). SST is how far yi is from the average y. SSE is how far yhat is from yi, and SSR is how far yhat is from the average y

In words, what is the standard error?

Standard error measures the accuracy with which a sample represents a population. In OLS, a residual measures the vertical distance between the data point (xi, yi) and the fitted hyperplane, assessing the degree of fit between the actual data and the model. The sum of squared residuals (also called the error sum of squares or residual sum of squares) is a measure of the overall model fit.

P test in stata

Gives the two-tailed p-value for the null hypothesis that your coefficient is actually equal to zero. Stata uses the t-distribution with n-2 degrees of freedom to calculate p-values. For small samples, this assumes the errors are Normal

Difference between standard deviation and standard error

The SD (standard deviation) quantifies scatter — how much the values vary from one another. The SEM (standard error of the mean) quantifies how precisely you know the true mean of the population. It takes into account both the value of the SD and the sample size.

z distribution vs t distribution

The Z is the standard normal distribution (mean 0, SD 1). So if you have a normal distribution that doesn't have those two parameters, subtract µ from your xbar and divide by the SE before looking that number up on the z table. Note that if you're checking how far a sample xbar is from µ, you don't want the SD of the whole population, but rather the standard error (how far xbars tend to fall from the mean). So if the problem gives you the SD of the whole population, make sure to divide it by √n to get the SE. The t-distribution is for when the population is normally distributed but you don't know σ. You get the test statistic from the same equation, (xbar-µ)/(S/√n), except that whereas S was actually σ (the population standard deviation) in the example above, here S is the sample standard deviation, used to estimate what σ would be.
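
A minimal sketch of both statistics in Python, assuming scipy and made-up numbers (sample mean, null mean, sample size, and SDs are all hypothetical):

```python
from math import sqrt
from scipy import stats

xbar, mu, n = 10.4, 10.0, 25  # made-up sample mean, null mean, sample size

# z: the population sigma is known
sigma = 1.5
z = (xbar - mu) / (sigma / sqrt(n))
p_z = 2 * (1 - stats.norm.cdf(abs(z)))

# t: sigma unknown, estimated by the sample SD s; same formula otherwise
s = 1.5
t = (xbar - mu) / (s / sqrt(n))
p_t = 2 * (1 - stats.t.cdf(abs(t), df=n - 1))  # fatter tails, so a larger p-value

print(z, p_z)
print(t, p_t)
```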

sample space

The set of all possible basic outcomes (i.e., stuff in curly brackets)

Explain the math in r²

There are two types of sums of squares: 1) the sum of squares for error and 2) the sum of squares for regression. SSR is the distance between the average y and the predicted yhat (what your regression line tells you). SSE is the distance between your predicted yhat and the actual yi. The total (SST) is the distance between the average y and the yi. r² describes the proportion of the variation of y that is due to your regression rather than to error, so you want it to be big to show that a high proportion is due to your model. Equals SSR/SST. Note that lousy models can have high r²'s, like models with as many regressors as data points
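
A minimal sketch of the decomposition in Python on made-up data, checking that SST = SSR + SSE and r² = SSR/SST:

```python
import numpy as np

# Made-up data; the decomposition itself is the card's.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)     # total: y_i around the mean of y
ssr = np.sum((yhat - y.mean()) ** 2)  # regression: yhat around the mean of y
sse = np.sum((y - yhat) ** 2)         # error: y_i around yhat

print(sst, ssr + sse)  # equal: SST = SSR + SSE
print(ssr / sst)       # r-squared
```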

When to use the t statistic

When the POPULATION is normally distributed, the population σ is unknown, and you have a sample size smaller than 30

heteroscedasticity

When the width of the scatter plot of the residuals increases or decreases as x increases. So let's say you plot your residuals on the y-axis and x (or yhat) on the x-axis. Your residuals should show no pattern, because your error is supposed to be random. But maybe it's not! If the errors fan out like a cone, you've got a problem. Another problem you could have is if your residuals show a linear trend when plotted against time. In that case, include time as a variable

How to get the total standard deviation of two clients who are independent and have two separate expected values

You will need to know how to get the total expected value for the two clients (see that flashcard). (You can only get the SD this way if the two clients' values are independent of one another.) To get the SD, you want to see how far each potential value is from that client's expected value. So subtract the expected value from each potential value, square it to get rid of the negative, and multiply by the probability of getting that potential value in the first place to weight it. Add all of these up for both clients to get the total variance. Take the square root at the end to get the total SD.
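
A minimal sketch of the whole procedure in Python; the two client distributions below are hypothetical numbers invented for illustration:

```python
# Hypothetical distributions for two independent clients: value -> probability.
client_a = {100: 0.5, 200: 0.5}
client_b = {50: 0.4, 150: 0.6}

def ev(dist):
    return sum(v * p for v, p in dist.items())

def var(dist):
    m = ev(dist)  # deviations are taken from this client's own expected value
    return sum(p * (v - m) ** 2 for v, p in dist.items())

total_ev = ev(client_a) + ev(client_b)
# Independence lets us add the two variances; square-root only at the end.
total_sd = (var(client_a) + var(client_b)) ** 0.5
print(total_ev, total_sd)  # 260 70.0
```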

sampling variation

describes the fact that if you randomly sample 40 houses you will get different income values than if you randomly sample 40 different houses.

Variance in a binomial distribution (and how it's different when you're trying to find the average standard error of a population estimate)

p is your probability of success. For a single trial, variance = p(1-p); for a binomial count over n trials, variance = np(1-p). Note that if you have the variance for the entire population and you want the average variance of an estimate deviating from the mean, you have to divide by n, and do the square root for the SD AFTER that: √[p(1-p)/n]. Get it? It's the average because you divide by the number of observations.

command for plotting residuals to check for heteroscedasticity

predict r, resid
scatter r class_size

in which class_size is the X

Prob a or b =

prob a + prob b - prob(a and b). If there is no overlap, you just add. This is the overlapping squares thing from the lecture

What to do when it's essentially a binomial problem but the results of the trials are not independent. I want to know the probability that three of five randomly chosen customers will like the product, but liking the product is not independent of whether the product was a success or failure

prob success = .33, prob failure = .66, prob a customer likes it = .6. The probability that 3 of the 5 will like it is .33·binomial(5, 3, .6) + .66·binomial(5, 3, .6): condition on success/failure and weight each binomial probability by the probability of that branch

prob(A∩B)

prob(a)·prob(b|a). First I need to know the general probability that A will happen. Then I need to know the probability of B happening IF A already happened. I can't just multiply the two unconditional probabilities together, because maybe A impedes B from happening.

Prob(a|b)

probability that A will happen given that B already happened = prob(A∩B)/prob(B) (think of the tree diagram)

symbol for sample correlation

r = sample covariance/[SD(X)·SD(Y)]

Difference between r² and MSE

r² measures how well the regression fits, and so does the MSE, but r² can be compared across models. The MSE measures the size of errors in the regression (so small errors mean good fit) but doesn't tell me anything about whether the explained part of the regression is a good fit. r² does, because it's the proportion of the total sum of squares that is made up of the regression sum of squares

symbol for sample standard deviation

s

How stata gets the right number for the intercept and the slope of OLS

slope: the sum of cross-products of the x and y deviations divided by the sum of squared deviations of x (b1 = Sxy/Sxx). intercept: average y - (slope)(average x)
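
The same two formulas in Python on made-up data, checked against numpy's own fit:

```python
import numpy as np

# Made-up data; the two formulas are the card's.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

sxy = np.sum((x - x.mean()) * (y - y.mean()))  # sum of cross-products of deviations
sxx = np.sum((x - x.mean()) ** 2)              # sum of squared deviations of x

slope = sxy / sxx
intercept = y.mean() - slope * x.mean()
print(slope, intercept)     # 0.6 2.2
print(np.polyfit(x, y, 1))  # [0.6 2.2], the same line
```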

symbol for sample variance

s² Calculated by seeing how far each observation is from the sample average. Formula: [1/(n-1)]∑(x-xbar)². So you add up the squared distances of the observations from the mean and divide by (almost) the number of observations

symbol for sample covariance

s²xy Calculated by seeing how far each observation is from the average of each variable. Formula: [1/(n-1)]∑(x-xbar)(y-ybar). So you add up the products of the distances from the two means and divide by (almost) the number of observations. You can see how, if x is above its average while y is below its average, you'll get a negative covariance, showing that they don't move together.

The binomial distribution

tells us how to calculate the probability for the number of times an "either-or" outcome occurs in a fixed number of independent and identical attempts. • There is a fixed number of independent trials. • Each trial has two basic outcomes (e.g., "success"/"failure"). • The probability of a success, p, is constant across trials. Formula: [n!/(#succ!·#fail!)] · p(success)^(#succ) · p(failure)^(#fail)

binomial distribution formula (probability you will get a success x amount of times)

tells us how to calculate the probability for the number of times an "either-or" outcome occurs in a fixed number of independent and identical attempts. Formula: [n!/(#succ!·#fail!)] · p(success)^(#succ) · p(failure)^(#fail)

r²

the coefficient of determination. Measures how well the regression line fits the data. = SSR/SST

What does a linear regression predict?

the expected average y given x: E(Y|X)

Three facts about the normal distribution

the expected value of x is µ (the population mean). the variance of x is the square of the σ (standard deviation) 95% of the time, X is within two standard deviations of the mean

Degrees of freedom for the regression, for the area, for the total

the regression contains k degrees of freedom, in which k is the number of variables you're regressing. The error contains as many degrees of freedom as there are data points, except you have to subtract the regression terms plus the constant: n-(k+1). Which means k + (n-k-1) is n-1 degrees of freedom for the total

"t tests are conditional"

the significance or non-significance of a variable in the equation is conditional on the fact that the regression equation contains other variables

Why do you use the F test

to test multiple hypotheses at once [as opposed to the t test, which can only check one hypothesis, i.e., B=0 or B1=B2]. Check if all the dummies are jointly significant (test maed1 maed2 maed3, and if the p-value is close to zero, we reject the null hypothesis that the coefficients are jointly equal to zero), check the significance of a polynomial, check if one coefficient substantively equals another (effect of maeduc = effect of paeduc; if the p-value is small, we reject that ma = pa), check if all the interactions with your dummy variables are jointly significant (so rather than doing an estimates store, you can just joint F-test all three interactions)

How to turn a fully interacted OLS model into two equations

wage = a + b1(ed) + b2(maeduc) + b3(fem) + b4(ed*fem) + b5(maeduc*fem) + e. Make one equation for males and one for females without the interactions, and it's then assumed that every term is interacted: wagef = af + b1f(ed) + b2f(maeduc) + ef, in which af = a + b3, b1f = b1 + b4, and b2f = b2 + b5. Note: this only works in a FULLY INTERACTED MODEL

