Practice Final
Suppose a sample of five retail stores' monthly profits are: $4,000, $7,000, $5,000, $3,000, and $1,000. What will the sample variance of the stores' profits be?
5 million (in dollars squared)
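As a quick check of the arithmetic, the sample variance (sum of squared deviations from the sample mean, divided by N − 1) can be computed with Python's standard library. This is only a sketch; the variable names are ours, not part of the question.

```python
import statistics

# Monthly profits (in dollars) for the five stores in the question.
profits = [4000, 7000, 5000, 3000, 1000]

# statistics.variance divides the sum of squared deviations by N - 1.
sample_variance = statistics.variance(profits)
print(sample_variance)
```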
Suppose you want to build a 95% confidence interval for the ATE when the average outcome for the treated (55 subjects) is 12, with a sample standard deviation of 4, while the average outcome for the control (65 subjects) is 10, with a sample standard deviation of 6. Which of the following would be the proper confidence interval?
(12 − 10) ± 1.96 √(4^2/55 + 6^2/65)
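A minimal sketch of this interval in code, assuming the large-sample normal approximation. The helper name `ate_confidence_interval` is hypothetical, and the standard error used is √(s_t²/n_t + s_c²/n_c), with treated and control variances each divided by their own sample size.

```python
import math

def ate_confidence_interval(mean_t, sd_t, n_t, mean_c, sd_c, n_c, z=1.96):
    """Large-sample confidence interval for a difference in means."""
    diff = mean_t - mean_c
    # Standard error of the difference in sample means.
    se = math.sqrt(sd_t**2 / n_t + sd_c**2 / n_c)
    return diff - z * se, diff + z * se

# Treated: mean 12, sd 4, n 55. Control: mean 10, sd 6, n 65.
low, high = ate_confidence_interval(12, 4, 55, 10, 6, 65)
print(round(low, 3), round(high, 3))
```

Passing `z=1.65` instead of the default gives the 90% version of the same interval.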
Suppose you wanted to determine if you should reject the null hypothesis that playing fast tempo (as opposed to slow tempo) music in your store has no effect on sales. On 65 randomly selected days with fast tempo music, average store sales are $2,345 with a sample standard deviation of 45, while on 75 randomly selected days with slow tempo music, average store sales are $2,555 with a sample standard deviation of 65. What would be the proper t-stat for this hypothesis test?
(2345 − 2555)/√(45^2/65 + 65^2/75)
Suppose you have a random sample of 2,179 credit scores from a population of mortgage applicants with a sample mean of 620 and sample standard deviation of 70, and would like to calculate the t-stat for the null hypothesis that the population mean is 610. Which of the following is the correct construction of the t-stat?
(620−610)/(70/√2179)
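The construction above can be sketched in a few lines. The function name `one_sample_t` is ours; it simply divides the gap between the sample mean and the hypothesized mean by the standard error s/√n.

```python
import math

def one_sample_t(sample_mean, null_mean, sample_sd, n):
    """t-stat for a null hypothesis about a population mean."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - null_mean) / standard_error

# Credit score example: sample mean 620, null value 610, sd 70, n 2,179.
t_stat = one_sample_t(620, 610, 70, 2179)
print(round(t_stat, 2))
```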
For a dichotomous treatment regression (X = 1 or 0), the mean outcome for the treated group (X = 1) is 35 and the mean outcome for the untreated group is 67. What will the slope of the regression line be?
-32
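To see why the slope equals the difference in group means, here is a small sketch using hypothetical data chosen so that the treated outcomes average 35 and the untreated outcomes average 67. The helper `ols_simple` is our own implementation of the usual least-squares formulas.

```python
def ols_simple(x, y):
    """Slope and intercept of the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical sample: three treated units (mean 35), two untreated (mean 67).
x = [1, 1, 1, 0, 0]
y = [30, 35, 40, 60, 74]

slope, intercept = ols_simple(x, y)
print(slope, intercept)
```

The intercept recovers the untreated mean and the slope recovers the treated mean minus the untreated mean.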
Suppose the regression of Output per Hour on employee Age yields a coefficient of -4.3 with a standard error of 1.2. Which of the following equations would properly report the 95% confidence interval for the population coefficient?
-4.3 ± 1.96(1.2)
Suppose the regression line to describe the relationship between Y and a dichotomous treatment (X = 1 for treated, = 0 untreated) is given by Y = 4 + 3X. Suppose that one of the observations that was treated was observed to have an outcome, Y = 8. For this observation, what is the residual?
1
Suppose you wanted to determine if you should reject the null hypothesis that playing fast tempo (as opposed to slow tempo) music in your store decreases sales by $100 on average. On 65 randomly selected days with fast tempo music, average store sales are $2,345 with a sample standard deviation of 45, while on 75 randomly selected days with slow tempo music, average store sales are $2,555 with a sample standard deviation of 65. What would be the proper t-stat for this hypothesis test?
(2345 − 2555 + 100)/√(45^2/65 + 65^2/75)
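The two music-tempo questions differ only in the hypothesized value of the difference. A sketch, with a hypothetical helper `two_sample_t` whose `null_diff` argument is the difference in means claimed by the null hypothesis (0 for "no effect", −100 for "decreases sales by $100"):

```python
import math

def two_sample_t(mean_1, sd_1, n_1, mean_2, sd_2, n_2, null_diff=0.0):
    """t-stat for a null hypothesis about a difference in two means."""
    se = math.sqrt(sd_1**2 / n_1 + sd_2**2 / n_2)
    return (mean_1 - mean_2 - null_diff) / se

# Fast tempo days: mean 2345, sd 45, n 65. Slow tempo days: mean 2555, sd 65, n 75.
t_no_effect = two_sample_t(2345, 45, 65, 2555, 65, 75)
t_minus_100 = two_sample_t(2345, 45, 65, 2555, 65, 75, null_diff=-100)
print(round(t_no_effect, 2), round(t_minus_100, 2))
```

Subtracting a null value of −100 is what produces the "+100" in the numerator.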
Suppose you want to build a 90% confidence interval for the ATE when the average outcome for the treated (55 subjects) is 12, with a sample standard deviation of 4, while the average outcome for the control (65 subjects) is 10, with a sample standard deviation of 6. Which of the following would be the proper confidence interval?
2 ± 1.65 √(4^2/55 + 6^2/65)
Assuming you are testing a null hypothesis (two-tailed) about the population mean and determining whether to reject the null hypothesis based on the t-stat at the 95% confidence level, which t-stat would warrant rejecting the null hypothesis?
3
If one was estimating a simple regression of Earnings (Y) on Height of individuals (X), and got a coefficient on the Height variable of 30, what would the coefficient on Height be if you added 3 inches to every individual in the sample but kept their earnings the same?
30
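Shifting every X by the same constant leaves the slope unchanged and moves only the intercept. The sketch below illustrates this with made-up heights and earnings (the numbers are hypothetical, so the slope is not 30 here); `ols_simple` is our own least-squares helper.

```python
def ols_simple(x, y):
    """Slope and intercept of the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Hypothetical heights (inches) and earnings, for illustration only.
height = [62, 65, 68, 70, 74]
earnings = [1900, 2050, 2000, 2200, 2300]

slope, intercept = ols_simple(height, earnings)

# Add 3 inches to everyone, keeping earnings the same.
shifted = [h + 3 for h in height]
slope_shifted, intercept_shifted = ols_simple(shifted, earnings)

print(slope, slope_shifted)
print(intercept, intercept_shifted)
```

The shifted intercept equals the original intercept minus 3 times the slope, which is why the intercept changes even though the slope does not.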
If one is planning to use multiple regression to summarize how the variables X1, X2, X3 explain the variation in Y, how many parameters are involved in estimating the linear regression?
4
For a random variable that can take values from zero to 10, what would be the maximum sample variance that could be observed from a sample of two observations?
50
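A brute-force check of this answer, assuming a grid of candidate values between 0 and 10 in steps of 0.5 (the grid is our choice; the largest variance comes from the two endpoints regardless of the step):

```python
import itertools
import statistics

# Candidate values 0.0, 0.5, ..., 10.0.
grid = [i / 2 for i in range(21)]

# Try every pair of distinct values and keep the one with the largest
# sample variance (statistics.variance divides by N - 1 = 1 here).
best_pair = max(itertools.combinations(grid, 2), key=statistics.variance)
max_variance = statistics.variance(best_pair)
print(best_pair, max_variance)
```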
Suppose you have a random sample of 2179 credit scores from a population of mortgage applicants with a sample mean of 620 and known population standard deviation of 70, and would like to calculate the 99% confidence interval of the population mean credit score. Which of the following would be the correct construction?
620 ± 2.58 (70/√2179)
Suppose a sample of five econometrics students' heights were: 69 inches, 73 inches, 65 inches, 67 inches, and 71 inches. What is the sample mean of the height of the students?
69 inches
Suppose you have a random sample of 200 students' GMAT scores that have a sample mean of 700 and sample standard deviation of 50, and would like to calculate the 90 percent confidence interval of the population mean. Which of the following would be the correct construction?
700 ± 1.65 (50/√200)
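Both confidence intervals for a population mean (GMAT scores here, credit scores earlier) follow the same pattern: sample mean ± critical value × standard deviation/√n. A sketch with a hypothetical helper `mean_confidence_interval`, using the rounded critical values from the questions:

```python
import math

def mean_confidence_interval(sample_mean, sd, n, z):
    """Large-sample confidence interval for a population mean."""
    half_width = z * sd / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

# 90% interval for GMAT scores: mean 700, sd 50, n 200.
gmat_low, gmat_high = mean_confidence_interval(700, 50, 200, 1.65)

# 99% interval for credit scores: mean 620, sd 70, n 2,179.
credit_low, credit_high = mean_confidence_interval(620, 70, 2179, 2.58)

print(round(gmat_low, 2), round(gmat_high, 2))
print(round(credit_low, 2), round(credit_high, 2))
```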
The government runs an experiment where a random sample of 200 adults over 40 get a 5% tax cut and a random sample of 200 adults under 40 get no tax cut. Their results show that spending, on average, increased 8% with a 5% tax cut. They conclude that a 5% tax cut for all adult Americans will increase spending by 8%. Why is this logic flawed?
The estimate suffers from selection bias: the tax cut went only to adults over 40, so the treated and untreated groups differ by age.
Consider the following proposed determining function for the winning percentage of baseball teams WinPcti = 0.500 - 0.1 Team ERAi + 0.2 Team BAi + Ui, where Team ERA is the team earned run average, and Team BA is the team batting average. What is the effect (change) on winning percentage from an increase in Team ERA from 2.3 to 3.3?
A decrease of 0.1 in winning percentage
As the size of a random sample gets larger, what does the distribution of the sample mean begin to resemble?
A normal distribution
Randomizing the treatment in an experiment facilitates all of the following conclusions except for what?
ATE = 0
A confidence interval can be constructed for which of the following population parameters?
All of the answers: Population mean; Population standard deviation; Population variance
In a dichotomous regression all of the following conditions must hold except for what?
An equal number of positive and negative residuals.
To calculate the partial correlation of Coconut Milk household purchases CMi and Salami Deli Meat household purchases Si controlling for household income Yi, (i.e. pCorr(CMi,Si;Yi)), which of the following regressions will be run:
CMi = b0 + b1Yi + ei
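One common way to compute a partial correlation is to regress each variable on the control, keep the residuals, and correlate the two residual series. The sketch below follows that recipe with made-up household data; the helper names and all numbers are hypothetical.

```python
import math

def residuals(y, x):
    """Residuals from the least-squares regression of y on x (with intercept)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

def corr(u, v):
    """Sample correlation of two equally long lists."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    cov = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    suu = sum((a - u_bar) ** 2 for a in u)
    svv = sum((b - v_bar) ** 2 for b in v)
    return cov / math.sqrt(suu * svv)

# Hypothetical household data, for illustration only.
income = [30, 45, 50, 62, 80, 95]
coconut_milk = [2, 3, 5, 4, 7, 8]
salami = [5, 4, 6, 3, 4, 2]

# Step 1: regress each purchase variable on income and keep the residuals.
cm_resid = residuals(coconut_milk, income)
s_resid = residuals(salami, income)

# Step 2: the partial correlation is the correlation of the residuals.
partial_corr = corr(cm_resid, s_resid)
print(round(partial_corr, 4))
```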
Why can't the regression coefficients in correlation analysis be interpreted as the causal effect of a treatment on the outcome?
Co-movement/correlation amongst two variables could be generated by their relationship to other variables.
Which of the following limits the use of experimental data in business settings/applications?
Conducting experiments relevant for business questions is often not feasible.
In evaluating the hypothesis test on experimental data, all the following will change the resulting p-value except for what?
Confidence level
In evaluating the hypothesis test on experimental data, all the following will change the resulting test statistic except for what?
Confidence level
If we wish to use a regression line to determine the effect of multiple treatment levels (e.g., Treatment = 1, 2, 3), why can't we just plot the average outcome for each treatment level and "connect the dots?"
Connecting the dots generally will not form a line.
All of the following are conditions that will hold at the estimated coefficients for the simple linear regression line (of Y on X) except for what?
Covariance between Y and the residuals is zero.
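The conditions that do hold at the estimated coefficients can be verified numerically. In this sketch (hypothetical data, our own variable names) the residuals sum to zero and are uncorrelated with X, but they are not uncorrelated with Y.

```python
# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
sxx = sum((a - x_bar) ** 2 for a in x)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Residual = observed outcome minus the point on the regression line.
resid = [b - (intercept + slope * a) for a, b in zip(x, y)]

sum_resid = sum(resid)                              # zero (up to rounding)
sum_resid_x = sum(e * a for e, a in zip(resid, x))  # zero (up to rounding)
sum_resid_y = sum(e * b for e, b in zip(resid, y))  # generally not zero

print(sum_resid, sum_resid_x, sum_resid_y)
```

Because the residuals average zero, the last sum divided by N − 1 is the sample covariance between Y and the residuals, which equals the residual sum of squares divided by N − 1 and is positive unless the fit is perfect.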
In estimating the causal relationship between Sales and Price in the following determining function Salesi = β0 + β1Pricei + Ui, what assumption in addition to E[Ui] = 0, justifies that the estimated coefficient on Price can be interpreted as an estimate of the causal effect of Price on Sales?
E[Pricei Ui] = 0
In testing if promotions (binary) Pi have an effect on sales, Si with a null hypothesis that the average treatment effect is zero, the null hypothesis could be written as:
E [Si (Pi = 1) - Si (Pi = 0)] = 0
Let Y1i be the output for factory i when it receives a new machine, and let Y0i be the output for factory i when it does not receive a new machine. Also let Yi be the realized output for factory i and let Di be a binary variable that equals one if factory i actually received a new machine. If the company's headquarters randomly assigns new machines to a subset of the company's factories, this ensures:
E(Yi|Di = 1) - E(Yi|Di = 0) = E(Y1i - Y0i)
If you have a sufficiently large random sample from the population, but do not know how treatment was assigned, what object will a confidence interval centered on the difference in average outcomes for the treated and control groups be describing?
ETT + Selection Bias
In trying to measure the "treatment effect" of getting an MBA on career earnings, the observation that individuals with an MBA have higher salaries than individuals without them is likely to suffer from which of the following?
ETT > ATE
Regardless of how treatment was assigned, what can be assumed about what the difference in the average outcomes of the treated and untreated groups is equal to?
Effect of the treatment on the treated + Selection Bias
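The decomposition can be checked on a toy data set in which both potential outcomes are written down for every unit. This is purely illustrative: in real data only one potential outcome per unit is observed, and all numbers below are hypothetical.

```python
# Each unit: (outcome if untreated y0, outcome if treated y1, treated flag d).
units = [
    (10, 14, 1),
    (12, 15, 1),
    (14, 18, 1),
    (6, 9, 0),
    (8, 12, 0),
    (7, 9, 0),
]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

treated = [u for u in units if u[2] == 1]
untreated = [u for u in units if u[2] == 0]

# Observed difference in average outcomes: E[Y | D=1] - E[Y | D=0].
observed_diff = mean(y1 for _, y1, _ in treated) - mean(y0 for y0, _, _ in untreated)

# Effect of the treatment on the treated: E[Y1 - Y0 | D=1].
ett = mean(y1 - y0 for y0, y1, _ in treated)

# Selection bias: E[Y0 | D=1] - E[Y0 | D=0].
selection_bias = mean(y0 for y0, _, _ in treated) - mean(y0 for y0, _, _ in untreated)

print(observed_diff, ett, selection_bias)
```

Here the treated units would have had higher outcomes even without treatment, so the observed difference overstates the effect on the treated by exactly the selection bias.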
Suppose you've run a regression relating Revenues to TV Ads and Online Ads. You are willing to make the necessary assumptions to deduce causality and run hypothesis tests. Your results are as follows: If you tested the null hypothesis that Online Ads have no impact on Revenues at the 90% confidence level (i.e., 90% degree of support), you would:
Fail to reject, and conclude there is insufficient evidence to establish that Online Ads impact Revenues
Consider the following proposed determining function for a student's grade on the econometrics final, FinalGradei = 68 + 4 HoursStudiedi - 2 Num Other Finalsi + Ui, where hours studied is the number of hours studied during finals week by student i, and number of other finals is the number of other finals student i has during finals week. Derive the formula for the change in a student's final grade with respect to a unit change in the number of other finals the student has in finals week.
Final Grade decreases by 2.
Consider the following proposed determining function for a student's grade on the econometrics final, FinalGradei = 68 + 4 HoursStudiedi − 0.5 (Num Other Finalsi)^2 + Ui, where hours studied is the number of hours studied during finals week by student i, and number of other finals squared is the number of other finals student i has during finals week squared. Derive the formula for the change in a student's final grade with respect to a unit change in the number of other finals the student has in finals week.
Final Grade decreases by Num Other Finals.
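The answer comes from differentiating the squared term: the derivative of −0.5 N² with respect to N is −N. A small sketch that compares this analytical marginal effect with a numerical derivative (function names are ours, and the error term U is left out):

```python
def final_grade(hours_studied, num_other_finals):
    """Deterministic part of the determining function (U omitted)."""
    return 68 + 4 * hours_studied - 0.5 * num_other_finals**2

def marginal_effect_of_finals(num_other_finals):
    # Analytical derivative of -0.5 * N^2 with respect to N.
    return -1.0 * num_other_finals

# Numerical check with a central difference at N = 3, hours held fixed at 10.
h = 1e-6
numeric = (final_grade(10, 3 + h) - final_grade(10, 3 - h)) / (2 * h)

print(marginal_effect_of_finals(3), round(numeric, 4))
```

Unlike the linear specification, the effect here depends on how many other finals the student already has.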
Why should we expect to estimate the treatment effect of price on quantity sold to be so difficult in nonexperimental data?
Firms vary their prices strategically in response to expectations of the resulting sales.
Why is Line A a better fit for the data in this graph?
For Line A, the average error (difference between the actual Profits and point on the line) is zero.
Why is Line B a better fit for the data in this graph?
For Line B, the residuals (difference between the actual Profits and point on the line) are uncorrelated with Price.
If you were building a hypothesis test to determine whether or not the price elasticity of demand for your product is -3.0, which of the following would be a natural null hypothesis?
H0: Price Elasticity of Demand = -3.0
Suppose you are given a set of regression results on the role of a random treatment assignment experiment on the effect of an ad campaign on consumers' willingness to buy your product. The regression results report an R-squared of 0.04. How should you value the regression results?
Highly: despite the low R-squared, the estimated effect is likely causal and valuable for active prediction.
In principle, why are t-statistics and critical values not as useful in the practical construction of confidence intervals in most applications?
In large samples (i.e., large degrees of freedom) the t-distribution and standard normal distribution are very similar.
Suppose you're running a multiple regression of Home Prices (in thousands of $) on five different treatment variables including number of bedrooms, number of bathrooms, total square feet, total lot size, and garage size, where all the treatment variables have been standardized (i.e., transformed to have a mean of zero and standard deviation equal to 1). If the coefficient on number of bedrooms is estimated to be 3, how would you interpret the coefficient on the number of bedrooms?
Increasing the number of bedrooms by 1 standard deviation, holding number of bathrooms, total square feet, lot and garage size fixed increases the average home price by three thousand dollars.
Suppose you estimate the following regression of a firm's Sales and number of employees at each location across the country: Sales = 95,342 + 0.76 Number of Employees. You are willing to believe you have a random sample of store locations, and a sufficiently large sample size. Which statements are not yet justified by the regression results?
Increasing the number of employees at a store by 2 will raise sales by 0.76 × 2 = 1.52
Suppose you're running a multiple regression of Home Prices on five different treatment variables including number of bedrooms, number of bathrooms, total square feet, total lot size, and garage size, where all the treatment variables have been standardized (i.e., transformed to have a mean of zero and standard deviation equal to 1). Which conditions about the multiple regression must hold?
Intercept for the multiple linear regression equals the sample average home price.
Which of the following will yield nonexperimental data?
Investment performance over the last ten years of several portfolio managers
As the size of the random sample gets larger, what happens to the standard deviation of the distribution for the sample mean?
It gets smaller.
In the scientific method why is it crucial to start with a question before conducting the empirical analysis?
It motivates the specific variation in the treatment required to test the hypothesis.
For the same sample, the 95% confidence interval will have what relation to the 99% confidence interval?
It will be smaller.
Suppose you had a random sample of 50 observations with a sample mean of 10 and sample standard deviation of 5. Suppose another observation is observed that has a value of 10. How will the sample mean change from the original sample?
It will not change.
What is the most critical hurdle that the use of nonexperimental data must overcome?
Lack of the random assignment of the treatment
Suppose you estimate the following regression of a movie's ticket sales on the season of its initial release (Summer = 1 if released in May, June, July, or August; = 0 otherwise): Sales = 295,342 + 40.24 Summer. You are willing to believe you have a random sample of movies, and a sufficiently large sample size. Which statements are well justified by the regression results?
Movies released in the summer tend to have higher ticket sales.
If you have a sufficiently large sample that was randomly drawn from your target population with a randomly assigned treatment then all of the following conditions hold except for what?
None of the answers are correct
Suppose you have a random sample of 21 credit scores from a population of mortgage applicants with a sample mean of 620 and sample standard deviation of 70, and would like to calculate the 99 percent confidence interval of the population mean credit score. Which of the following would be the correct construction?
None of the answers are correct
If one was attempting to estimate the parameter b that best explains the relationship between Sales and Price using the equation Sales = (Price − b)^2, which of the following methods would be the most appropriate to estimate b?
Nonlinear regression
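Because b enters the equation nonlinearly, it cannot be read off a linear regression coefficient; one instead searches for the b that minimizes the sum of squared errors. The sketch below uses a plain grid search rather than a dedicated optimizer, and the data are hypothetical values generated from b = 3.

```python
# Hypothetical data generated from Sales = (Price - 3)^2, for illustration only.
prices = [4, 5, 6, 7]
sales = [1, 4, 9, 16]

def sum_squared_errors(b):
    """Sum of squared gaps between observed sales and (price - b)^2."""
    return sum((s - (p - b) ** 2) ** 2 for p, s in zip(prices, sales))

# Nonlinear least squares by brute force: try b = 0.00, 0.01, ..., 10.00.
candidates = [i / 100 for i in range(1001)]
b_hat = min(candidates, key=sum_squared_errors)
print(b_hat)
```

In practice a numerical optimizer would replace the grid, but the objective being minimized is the same.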
In building a confidence interval for ATE in experimental data which of the following is not required?
Null hypothesis proposed value of the ATE
A convenient way to modify your degree of support for a decision on a hypothesis test is to calculate the ________ in conjunction with the test statistic?
P-value
Under which sort of prediction is R-squared particularly informative of the value of the regression results?
Passive prediction
Which of the following settings would require the use of multiple regression (as opposed to simple regression)?
Predicting grocery sales as a function of price and local population.
Which of the following settings best describes a setting in which one would be making an active prediction?
Predicting the change in click-through rates following the change in banner size.
Which of the following settings best describes a setting in which one would be making a passive prediction?
Predicting vehicle sales as a function of the daily max temperature and total rainfall.
Which of the following is an example of nonrandom treatment assignment in nonexperimental data?
Price's impact on the number of products sold
Which of the following departments is likely to experience experimental data more frequently than nonexperimental data?
R&D
The assumption of no correlation between the error term (U) and treatment(s) (X) is similar to the assumption of what aspect used in the scientific method?
Random assignment of the treatment
What are the two key assumptions necessary to establish causality from a regression model?
Random sample of participants and random assignment of the treatment
If the t-stat for the sample estimate of a coefficient M1, calculated as t = |m1/Sm1|, where m1 is the estimated coefficient and Sm1 is the properly estimated standard error for the coefficient, comes out to be 2.7, what is the appropriate conclusion?
Reject the null hypothesis that the population coefficient M1 = 0 at a 99% confidence level.
If the t-stat for the sample estimate of a coefficient M1, calculated as t = |(m1 − 1)/Sm1|, where m1 is the estimated coefficient and Sm1 is the properly estimated standard error for the coefficient, comes out to be 2.7, what is the appropriate conclusion?
Reject the null hypothesis that the population coefficient M1 = 1 at a 99% confidence level.
If the t-stat for a hypothesis test (with a two-sided alternative) comes back as 2.6, what would be the appropriate conclusion to draw regarding the null hypothesis under a 95% confidence level?
Reject the null hypothesis.
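The reject/fail-to-reject decisions in these questions compare the absolute t-stat with a two-sided critical value. A sketch assuming the large-sample normal approximation, using `statistics.NormalDist` from the standard library (the helper names are ours):

```python
from statistics import NormalDist

def critical_value(confidence_level):
    """Two-sided critical value from the standard normal distribution."""
    alpha = 1 - confidence_level
    return NormalDist().inv_cdf(1 - alpha / 2)

def reject_null(t_stat, confidence_level):
    """True when |t| exceeds the two-sided critical value."""
    return abs(t_stat) > critical_value(confidence_level)

for level in (0.90, 0.95, 0.99):
    print(level, round(critical_value(level), 3))

print(reject_null(2.7, 0.99), reject_null(2.6, 0.95), reject_null(1.5, 0.90))
```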
For a given set of sample statistics, changing the null hypothesized value (K) for a population mean changes everything except for what?
Sample standard deviation
When running an experiment, suppose we assume the participants in the experiment are a random sample of the population. Let Yi be the outcome for individual i and let Di equal one if individual i receives the treatment and zero otherwise. What does the assumption of a random treatment imply?
Selection Bias = 0
In trying to measure the "treatment effect" of taking Tylenol on reducing your "next day's temperature from today", and given the fact that you only take Tylenol when you're feeling sick, which of the following conditions are likely?
Selection Bias > 0
How can applying the insights from the experiment ideal and the scientific method approach facilitate better analysis with nonexperimental data?
Sharpen attention towards the variation in treatment that is most appropriate for measuring treatment effects
To calculate the semi-partial correlation of Coconut Milk household purchases CMi and Salami Deli Meat household purchases Si controlling for household income Yi, (i.e. pCorr(CMi,Si(Yi))), which of the following regressions will be run:
Si = b0 + b1Yi + ei
If one is trying to explain the time series variation in a stock price for a company by using the number of Twitter mentions that day, which method is most appropriate?
Simple regression
The methods for solving for the intercept and slope of all of the following procedures will yield identical estimates except for which procedure?
Solving min_{b,m} (∑_{i=1}^{N} e_i)/N
Given that nonexperimental datasets are likely to have treatments that have not been randomly assigned, what is likely to contribute to difference in average outcomes between groups with different treatment levels?
Some of those differences are driven by a selection bias
Nonexperimental data are likely to have issues with all of the following assumptions except for what?
Sufficiently large sample size to appeal to normality
Suppose you are comfortable with the assumptions required for causal analysis and you have estimated the relationship between Sales and running a promotion together with a price discount to be Salesi = 140.3 (60) + 4.3 (0.7) Promo, with standard errors reported in parentheses. What would you predict to occur in the event of running a discount next week?
That it will increase sales by 4.3, and you are 90% confident that Promotions have some effect on Sales.
In making an active prediction that using a large banner advertisement will increase click-through rates based on a sample of data, what is not an appropriate criticism that someone might have for your prediction?
That the underlying population distribution is not normal.
Suppose you have a random sample of employees in your company and their tenure. The sample mean of this sample is 4.2 years and the sample standard deviation is 4.5 years. How would knowing that the random sample was of size 100 instead of 60 change the 90% confidence interval for the population mean of employee tenure?
The 90% confidence interval will be smaller for 100 than 60.
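The interval narrows because the standard error s/√n shrinks as n grows. A quick sketch with the tenure numbers from the question, assuming the normal critical value 1.645 for 90% confidence (the helper name is ours):

```python
import math

def half_width(sd, n, z=1.645):
    """Half-width of a large-sample confidence interval for a mean."""
    return z * sd / math.sqrt(n)

# Sample standard deviation of 4.5 years, with n = 60 versus n = 100.
width_60 = 2 * half_width(4.5, 60)
width_100 = 2 * half_width(4.5, 100)
print(round(width_60, 3), round(width_100, 3))
```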
Suppose you're running a multiple regression of Home Prices on two treatment variables: City, a binary variable for whether the home is located in a city, and Finished Basement, a binary variable for whether the home has a finished basement. If you solve for the multiple regression using the moment conditions, all of the following conditions must hold except for what?
The correlation between the residuals and Home Prices must be equal to zero.
Which of the following is not a condition to estimate a model for causality?
The error terms are distributed normally.
Suppose your lead analyst runs a simple regression of Profits (Y) on price (X). You know that the average profit in the sample was $1,000 and the average price was $25. If your analyst reports that the intercept from the simple regression is 900, what can you infer about the estimated slope?
The estimated slope is 4.
Suppose Apple is considering increasing its advertising expenditures to promote the most recent iPhone. To try to assess the effect of such a move, it looks at sales in several small markets. In some of these markets Apple increased advertising expenditures by 30% and in others there was no change. When conducting this "experiment," what is the treatment?
The increase in advertising expenditures
Why is it helpful to think about the moment conditions of a simple linear regression even if OLS yields the same estimates?
The moment conditions are used directly to produce the slope and intercept.
Why is it helpful to think about the moment conditions of a simple linear regression even if OLS yields the same estimates?
The moment conditions facilitate assessing assumptions about causality.
Which of the following are you assuming is true to calculate the p-value of a test statistic?
The null hypothesis
To use the normal distribution to calculate the p-value of a hypothesis test involving the population mean, which assumption is not necessary?
The null hypothesis is zero.
In the dichotomous regression if one was to replace the regression line prediction of the outcome means (for treated/untreated) with medians, which of the following conditions would hold?
The number of strictly positive residuals would equal the number of strictly negative residuals.
Which of the following is not an assumption required to build a confidence interval for an ATE in an experimental design?
The outcomes for the treated group and control group are the same
Suppose you have a random sample of employees in your company and their tenure. The sample mean of this sample is 4.2 years and the sample standard deviation is 4.5 years. How would knowing that the random sample was of size 100 instead of 60 change the p-value of the hypothesis test?
The p-value will be smaller for 100 than 60.
Which of the following is not a required assumption in the reasoning behind constructing a hypothesis test of a population mean?
The population distribution is normal.
After estimating a regression of your firm's store sales and the number of local competitors as follows: Sales = 321,752 + 70.35 Number of Competitors. You are willing to believe you have a random sample of store locations, and a sufficiently large sample size. What is wrong with the following logic, "We need more competitors to enter the markets we're in, so that our sales will rise?"
The positive coefficient on number of competitors is not a causal estimate.
In running the simple linear regression of Y on X, if you know that no observation in your sample has an X value equal to the sample mean X̄, what condition might not be true?
The residual for the observation closest to X̄ will be zero.
Which of the following conditions ensures that the estimates of the coefficients for the population regression equation are distributed normally?
The sample is "large" enough.
Which of the following conditions is necessary for the sample estimates of the population coefficients that best describe the co-movement amongst the variables to be consistent?
The sample is a random sample.
Suppose you send out 350 surveys to a random sample of all past customers (your target population) asking them to report their level of satisfaction with your product. Of the 350, you used the 112 that responded to the survey to construct a confidence interval for the population "satisfaction score." What might be a potential problem with this confidence interval?
The sample you're using is not a random sample from the target population.
Suppose one runs the regression of Y on X1 and X2 and the coefficient on X2 is positive. Which of the following correlation conditions must hold in the sample?
The semi-partial correlation between Y and X2 holding X1 constant must be positive.
All of the following are statements of the criteria used to find the line that "best" describes the data in a multiple regression except for what?
The size of the residuals is not correlated with the outcome level.
Beyond the conditions required for consistent estimation of a model for causality, in order to conduct inference what additional assumption is required?
The size of the sample is sufficiently large (e.g., 30(K + 1), where K is the number of coefficients).
Which of the following is a reason why estimating the slope and intercept of a simple linear regression line using the least absolute deviations (LAD) approach is not as common as OLS?
The solution for LAD isn't always unique.
Suppose you're running a multiple regression of Home Prices on two treatment variables: City, a binary variable for whether the home is located in a city, and Finished Basement, a binary variable for whether the home has a finished basement. If you solve for the multiple regression using the moment conditions, which condition must hold?
The sum of the residuals for the observations with Finished Basements (=1) must be zero.
In a dichotomous regression which condition must hold?
The sum of the residuals for treated and untreated groups must be equal.
In a broad sense, the role of a confidence interval for the population mean is meant to accurately portray what?
The uncertainty involved with observing a sample and not the entire population.
When calculating the sample variance of a random sample, you divide the sum of the squared deviations (from the sample mean) by N - 1 instead of N to ensure the estimator achieves what property?
Unbiasedness
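Unbiasedness means that, averaged over all possible samples, the estimator equals the population variance. This can be checked exactly on a toy population by enumerating every sample. The sketch assumes a fair six-sided die as the population and samples of size 2 drawn with replacement; both choices are ours.

```python
import itertools
import statistics

# Toy population: the six equally likely faces of a fair die.
population = [1, 2, 3, 4, 5, 6]
true_variance = statistics.pvariance(population)  # population variance

# Every equally likely sample of size 2 drawn with replacement.
samples = list(itertools.product(population, repeat=2))

# Average of the estimator that divides by N - 1 over all samples.
avg_n_minus_1 = statistics.fmean(statistics.variance(s) for s in samples)

# Average of the estimator that divides by N over all samples.
avg_n = statistics.fmean(statistics.pvariance(s) for s in samples)

print(true_variance, avg_n_minus_1, avg_n)
```

Dividing by N − 1 hits the population variance on average, while dividing by N systematically understates it (by half when N = 2).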
Which of the following objects must be the same sign as the covariance of variables X and Y?
Unconditional correlation of X and Y.
Consider the following proposed determining function for the number of views on the New York Times home webpage for a day, i. NumViewsi = 1400 - 700 Weekend Dayi + 200 Election Yeari + Ui, where weekend day is binary variable for if day i is a Saturday or Sunday, and election year is a binary variable for if day i is in an election year. Derive the formula for the change in views with respect to a change in going from a weekday to a weekend (holding Election Year constant).
Views decrease by 700.
In the simple linear regression, the intercept will equal the sample average of the outcome (Y) variable if which of the following is true?
X̄ = 0
Which of the following equations cannot be estimated using linear regression techniques?
Y = m1X × m2Z
After estimating a regression of each employee's number of contracts sold and their tenure (in number of years) at the company as follows: Contracts = 30.5 + 4.5 Tenure. You are willing to believe you have a random sample of employees, and a sufficiently large sample size. Given these results, is it appropriate to make the following claim, "Our more tenured employees at the company on average get awarded more contracts"?
Yes, you're making a passive prediction.
If the determining function for the likelihood of a mortgage application receiving a loan is given by, Loan Successful = 0.60 + 0.01 Credit Score. What would be the causal effect of increasing your credit score by 10 points?
Your expected probability of getting a loan would increase by 0.01(10) = 0.10
The distinction between causality and correlation is best described as:
causality implies a change in one variable creates a change in another, correlation implies variables move together.
The step that requires the use of inductive reasoning when making an active prediction from a sample of data is:
determining the population parameter from a sample.
As long as your sample is large enough, you don't have to worry about using the sample standard deviation in place of the unknown population standard deviation in constructing a confidence interval because ________.
for a large sample, the t-distribution is similar to the standard normal distribution
If the residuals of a regression model, Yi = B + MXi + Ei, satisfy the condition that their variances are constant across all values of X, then they are said to be:
homoscedastic.
When implementing the scientific method, one moves from a research question to a proposed idea based on limited evidence that justifies further investigation, also known as a:
hypothesis
Suppose a sample of five econometrics students' heights were: 69 inches, 73 inches, 65 inches, 67 inches, and 71 inches. The standard deviation of the heights will be in what units?
inches
In the following determining function, Earningsi = α0 + α1Educationi + Ui, what might be a factor contained in Ui?
innate ability, which might be correlated with education and earnings
While there are a few instances in the business world where one will observe experimental data, understanding the scientific method is critical because:
it's the gold standard for establishing causality.
If one is trying to explain the cross-sectional variation in prices for milk across grocery stores in the country using commercial rental prices and a binary variable for if the grocery store chain owns a dairy farm, which method is most appropriate?
multiple regression
Given the average outcomes for the treated and control groups, and their respective sample standard deviations how does the number of observations impact the p-value of a hypothesis test?
none of the answers
Given the average outcomes for the treated and control groups, and their respective sample standard deviations, how does the number of observations impact the spread of a confidence interval?
none of the answers
Can a t-stat be negative for the hypothesis test of the population mean of the heights of econometrics students (which will always be positive)?
none of the answers are correct
If one is attempting to make a prediction on how much sales will increase in the event of a price discount of 10%, which step will not use deductive reasoning in conducting the prediction?
none of the answers are correct
Suppose one runs the regression of Y on X1 and X2 and both coefficients on X1 and X2 are positive. All of the following correlation conditions must hold in the sample except for which one?
none of the answers are correct
If one was estimating a simple regression of Earnings (Y) on Height of individuals (X), and got a coefficient on the Height variable of 30, what would the intercept be if you added 3 inches to every individual in the sample but kept their earnings the same?
none of the choices are correct
As long as our sample is large enough it will be the case that the average outcome for the treated group should have what distribution?
normal
As long as our sample is large enough it will be the case that the average outcome for the untreated group should have what distribution?
normal
If the determining function for Sales is given by Sales_i = α0 + α1 Price_i + U_i, what will the correlation between Sales and Price be?
not enough information
The critical hurdle with measuring treatment effects is that:
our subjects cannot be both untreated and treated at the same time.
The correlation between X and Y holding at least one other variable constant is known as:
partial correlation.
The most robust (but perhaps impractical) way to estimate the price elasticity of demand for your product would be:
randomize the price over a period of time and estimate the difference in sales resulting from those changes.
The difference between the observed outcome and the corresponding point on the regression line for a given observation is a:
residual
Which of the following sample statistics influences the sign of the slope coefficient in the simple linear regression (of Y on X)?
sCov(X,Y)
In the simple linear regression, the intercept will equal the sample average of the outcome (Y) variable if which of the following is true?
sCov(X,Y) = 0
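The two answers above follow from the simple-regression formulas m = sCov(X,Y) / sVar(X) and b = Ȳ − m·X̄: the sign of the slope comes from the covariance, and when the covariance is zero the slope is zero, so the intercept collapses to Ȳ. A minimal sketch with made-up data chosen so that the sample covariance is exactly zero (the data and the helper name `simple_ols` are assumptions for illustration):

```python
def simple_ols(x, y):
    """Return (intercept, slope, sample covariance) for a regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    s_var = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    m = s_cov / s_var          # slope takes the sign of the covariance
    b = y_bar - m * x_bar      # intercept
    return b, m, s_cov

# Hypothetical data with zero sample covariance between x and y.
x = [1, 2, 3, 4, 5]
y = [2, 1, 3, 1, 2]

b, m, s_cov = simple_ols(x, y)
y_bar = sum(y) / len(y)
print(f"sCov = {s_cov:.6f}, slope = {m:.6f}, intercept = {b:.6f}, mean of y = {y_bar:.6f}")
```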
The R-squared of a regression is 1 - X, where X is the:
sum of squared residuals divided by the total sum of squares.
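As a worked illustration of R-squared = 1 − (sum of squared residuals / total sum of squares), here is a small sketch. The five data points are invented for the example, and the line is fit with the usual simple-regression formulas:

```python
# Hypothetical sample; not taken from the study set.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS slope and intercept for a simple regression of y on x.
m = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
b = y_bar - m * x_bar

# Residual: observed outcome minus the corresponding point on the regression line.
residuals = [yi - (b + m * xi) for xi, yi in zip(x, y)]

ssr = sum(e ** 2 for e in residuals)        # sum of squared residuals
tss = sum((yi - y_bar) ** 2 for yi in y)    # total sum of squares
r_squared = 1 - ssr / tss

print(f"SSR = {ssr:.3f}, TSS = {tss:.3f}, R-squared = {r_squared:.3f}")
```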
Suppose you have a random sample of 2,179 credit scores from a population of mortgage applicants with a sample mean of 620 and sample standard deviation of 70, and would like to determine whether this is sufficient evidence to rule out a population mean of 610. Which of the following objects would you calculate to make this decision?
t-stat
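Using the numbers in the question, the t-stat is (620−610)/(70/√2179). A minimal sketch of the calculation, assuming a two-sided test at the 5% level with the large-sample critical value of 1.96:

```python
import math

n = 2179            # sample size
sample_mean = 620   # sample mean credit score
sample_sd = 70      # sample standard deviation
null_mean = 610     # population mean under the null hypothesis

# Standard error of the sample mean.
standard_error = sample_sd / math.sqrt(n)

# t-stat: distance between sample mean and null value, in standard errors.
t_stat = (sample_mean - null_mean) / standard_error

# Assumed decision rule: two-sided test, 5% level, critical value 1.96.
reject_null = abs(t_stat) > 1.96

print(f"t-stat = {t_stat:.2f}, reject null: {reject_null}")
```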
A treatment effect is:
the change in the outcome resulting from variation in the treatment.
The total sum of squares is given by the sum of:
the squared difference between each observation Y and the average value for Y.
In running an experiment, it is crucial that there exist both subjects who receive the treatment and subjects who do not because:
without variation in the treatment, testing the hypothesis would not be feasible.
When making passive predictions, it is not important to be able to conclude that:
your estimate describes a causal relationship between treatments and the outcome.
When making active predictions, it is important to be able to conclude that:
your estimate of the coefficients is a causal estimate.
Using OLS to solve for the slope and intercept of a simple linear regression will yield a regression line that satisfies which of the following conditions?
∑_{i=1}^{N} e_i X_i / N = 0
If you're running a multiple regression of employee Hours Worked on Tenure (in number of years) and MBA (a binary variable equal to 1 for an employee with an MBA, 0 otherwise), which moment conditions would be used?
∑_{i=1}^{N} (Hours_i − b − m1 Tenure_i − m2 MBA_i) MBA_i / N = 0
If you're running a multiple regression of employee Hours Worked on Tenure (in number of years) and MBA (a binary variable equal to 1 for an employee with an MBA, 0 otherwise), what moment conditions would not be used?
∑_{i=1}^{N} (Hours_i − b − m1 Tenure_i) MBA_i / N = 0
Suppose that the following regression equation best describes the co-movement between Sales, Price and Number of Competitors: Sales_i = B + M1 Price_i + M2 NumComp_i. What moment condition would not be used to yield a consistent estimate of B, M1, M2?
∑_{i=1}^{N} (Sales_i − b − m1 Price_i) Price_i / N = 0
To determine the intercept and slope coefficient in a simple linear regression line of Y on X, all the following conditions will be used except for what?
∑_{i=1}^{N} (Y_i − m X_i) X_i / N = 0
Using OLS to solve for the slope and intercept of a simple linear regression will yield a regression line that satisfies which of the following conditions?
∑_{i=1}^{N} e_i / N = 0
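The two OLS moment conditions for a simple regression (residuals average to zero, and residuals times X average to zero) can be checked numerically. The sketch below fits a line to invented data with the standard closed-form formulas and confirms both conditions hold up to floating-point error:

```python
# Hypothetical sample; any data set with variation in x would work.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Closed-form OLS slope (m) and intercept (b).
m = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
b = y_bar - m * x_bar

# Residuals e_i = Y_i - b - m * X_i.
e = [yi - b - m * xi for xi, yi in zip(x, y)]

# First moment condition: average residual is zero.
mean_resid = sum(e) / n

# Second moment condition: average of residual times X is zero.
mean_resid_x = sum(ei * xi for ei, xi in zip(e, x)) / n

print(f"mean(e) = {mean_resid:.2e}, mean(e * x) = {mean_resid_x:.2e}")
```

Dropping the intercept or a regressor from the residual, as in the "would not be used" answers above, generally breaks these equalities.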