Module 11: Machine Learning
What do we mean when we say the least squares estimator 𝛽̂ is unbiased?
Assuming a specific data generating process, if we engage in the OLS procedure, then on average across many draws, 𝛽̂ will be equal to the true 𝛽 in the population.
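The "on average across many draws" idea can be checked with a small simulation. The sketch below is illustrative only: the data generating process (y = 1 + 2x + noise), the sample size, the number of draws, and the names `ols_slope` and `simulate_slopes` are all assumptions made for the example, not anything taken from the lecture.

```python
import random
import statistics

def ols_slope(x, y):
    """Slope of a one-regressor OLS fit with an intercept."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def simulate_slopes(true_beta=2.0, n=50, draws=2000, seed=0):
    """Draw many samples from y = 1 + true_beta * x + noise and refit OLS each time."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(draws):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [1.0 + true_beta * xi + rng.gauss(0, 1) for xi in x]
        slopes.append(ols_slope(x, y))
    return slopes

slopes = simulate_slopes()
# Any single estimate misses the truth, but the average across draws sits close to it.
print("mean of beta-hat across draws:", round(statistics.fmean(slopes), 3))
print("spread of beta-hat across draws:", round(statistics.stdev(slopes), 3))
```

Unbiasedness is a statement about the first printed number, not about any single fit: individual estimates still scatter around the true value.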
True or False: Machine learning algorithms provide unbiased, consistent estimators.
False
True or False: One benefit of the estimation framework is that it guarantees causality.
False
True or False: The complexity of your model is something that you need to decide on ex ante.
False
True or False: Throughout the lecture, Professor Mullainathan calls a problem a "𝛽̂ problem" when it involves finding better predictors, and a "𝑦̂ problem" when it involves finding better estimates of a probability distribution process.
False
What is true regarding Principal Component Analysis (PCA):
In PCA, the first principal component is the direction that captures the most variance in the data
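A two-variable case makes this concrete, because the eigenvalues of a 2x2 covariance matrix have a closed form. This is a minimal sketch on simulated data; the function names and the simulated relationship between the two variables are assumptions for illustration.

```python
import math
import random

def covariance_2d(points):
    """Sample variances and covariance of a list of (x, y) pairs."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return sxx, sxy, syy

def pca_2d(points):
    """Return the two eigenvalues (largest first) and the first principal direction."""
    sxx, sxy, syy = covariance_2d(points)
    mid = (sxx + syy) / 2
    half_gap = math.hypot((sxx - syy) / 2, sxy)
    lam1, lam2 = mid + half_gap, mid - half_gap
    # Eigenvector for lam1; fall back to an axis when the data are already axis-aligned.
    if sxy != 0:
        v = (sxy, lam1 - sxx)
    else:
        v = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(v[0], v[1])
    return (lam1, lam2), (v[0] / norm, v[1] / norm)

rng = random.Random(0)
points = []
for _ in range(500):
    x = rng.gauss(0, 1)
    points.append((x, 0.8 * x + 0.3 * rng.gauss(0, 1)))  # two correlated variables

(lam1, lam2), direction = pca_2d(points)
share = lam1 / (lam1 + lam2)
print("first principal direction:", direction)
print("share of total variance on component one:", round(share, 3))
```

The eigenvalues are the variances along each component, so ordering them largest first is what makes component one the highest-variance direction.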
When fitting a decision tree, what constraints could be made?
Increasing the lower bound of observations per leaf (node); restricting the total number of leaves (nodes) in the tree
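The first constraint can be sketched with a tiny one-dimensional regression tree. This is an illustrative toy, not a production implementation: `grow_leaves`, `min_leaf`, and the simulated step-function data are assumptions for the example. Only the minimum-leaf-size rule is implemented; note that it also caps the leaf count implicitly, since n observations cannot fill more than n // min_leaf leaves.

```python
import random

def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def grow_leaves(ys, min_leaf):
    """Recursively split ys (already sorted by x) and return the list of leaves.

    A split is allowed only if both children keep at least min_leaf observations.
    """
    best_cut, best_sse = None, sse(ys) - 1e-12
    for cut in range(min_leaf, len(ys) - min_leaf + 1):
        split_sse = sse(ys[:cut]) + sse(ys[cut:])
        if split_sse < best_sse:
            best_cut, best_sse = cut, split_sse
    if best_cut is None:
        return [ys]  # no admissible split improves the fit
    return grow_leaves(ys[:best_cut], min_leaf) + grow_leaves(ys[best_cut:], min_leaf)

rng = random.Random(0)
n = 60
# Observations are ordered by x = i / n; the true signal is a single step at the midpoint.
ys = [(1.0 if i >= n // 2 else 0.0) + rng.gauss(0, 0.3) for i in range(n)]

leaves_loose = grow_leaves(ys, min_leaf=1)
leaves_tight = grow_leaves(ys, min_leaf=10)
print("leaves with min_leaf=1: ", len(leaves_loose))
print("leaves with min_leaf=10:", len(leaves_tight))
```

With no floor on leaf size the tree keeps splitting until it chases the noise; raising the floor forces coarser leaves.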
Suppose you are building a predictive model using regression. Which of the following could you do to avoid overfitting your data?
Limit the number of variables allowed in the regression
In what sense do we mean estimation "overfits" relative to prediction?
OLS minimizes the in-sample loss
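Because OLS minimizes in-sample loss, adding regressors can never raise the in-sample error, even when the extra regressors are pure noise. The sketch below illustrates this on simulated data; the data generating process, the sample sizes, and all function names are assumptions for the example, and the small Gaussian-elimination solver stands in for a linear algebra library.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def mse(rows, y, beta):
    """Mean squared prediction error of the linear model beta on (rows, y)."""
    total = 0.0
    for r, yi in zip(rows, y):
        total += (yi - sum(b * v for b, v in zip(beta, r))) ** 2
    return total / len(y)

def make_data(rng, n, n_noise):
    """y depends only on the first regressor; the n_noise extra columns are irrelevant."""
    rows, y = [], []
    for _ in range(n):
        x1 = rng.gauss(0, 1)
        rows.append([1.0, x1] + [rng.gauss(0, 1) for _ in range(n_noise)])
        y.append(1.0 + 2.0 * x1 + rng.gauss(0, 1))
    return rows, y

rng = random.Random(0)
train_rows, train_y = make_data(rng, 40, 20)
test_rows, test_y = make_data(rng, 1000, 20)

results = {}
for k in (2, 22):  # 2 = intercept + true regressor; 22 adds 20 irrelevant columns
    beta = ols([r[:k] for r in train_rows], train_y)
    in_mse = mse([r[:k] for r in train_rows], train_y, beta)
    out_mse = mse([r[:k] for r in test_rows], test_y, beta)
    results[k] = (in_mse, out_mse)
    print(f"{k} columns: in-sample MSE {in_mse:.3f}, out-of-sample MSE {out_mse:.3f}")
```

The in-sample error of the larger model is guaranteed to be no higher than that of the smaller one; its out-of-sample error is typically worse, which is why limiting the number of variables helps prediction.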
Suppose you want to measure how Twitter sentiment predicts upcoming political protests. Pre-processing the data with machine learning would:
Perform sentiment analysis on tweets
Even if we cannot back out individual coefficients from statistical learning methods, these methods can be useful in economics through:
Processing different data to generate economic indicators of interest; answering 𝑦̂ problems, where the relevant policy question relies on our ability to make accurate predictions, not on a causal question
Which of the following does Prof. Mullainathan cite as a reason for why the Artificial Intelligence (AI) approach to solving complex problems, such as face recognition, ultimately stalled?
Sheer complexity of the number of variations that could be encountered; subtleties, such as in language, that are difficult to account for
Which of the following are reasons why the statistical analysis approach to solving complex problems such as sentiment analysis and visual recognition is better than the Artificial Intelligence approach?
The algorithms generated using the statistical analysis approach tend to have greater predictive power; the statistical analysis approach allows you to identify patterns in the data that a human may not have thought of on his/her own
One big debate in this literature is whether human capital is a reason why countries stay poor. True or False: Since this is a question about whether or not human capital can explain some of the GDP differences, machine learning is not useful in this context.
false
True or False: Generally, when approaching a computational problem such as sentiment analysis, the artificial intelligence community first determines how humans could more efficiently and effectively perform the task at hand, and then programs algorithms that aim to be superior to human knowledge.
false
True or False: In machine learning, cross validation is used to find the model that maximizes the out-of-sample prediction.
false
True or False: In regards to high dimensional prediction, the term "high dimensional" refers to when a dataset has more raw variables than observations.
false
True or False: Machine learning algorithms decide the optimal training-tuning data split.
false
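A hold-out split shows the division of labor: the analyst supplies the training-tuning split, and the tuning set is then used to pick the model's complexity. The sketch below uses a one-dimensional nearest-neighbor average as the model purely for illustration; the simulated data, the candidate values of `k`, and the tuning-set size are all assumptions.

```python
import random

def knn_predict(train, x, k):
    """Average y over the k training points nearest to x (1-D inputs)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mse(train, data, k):
    """Mean squared error on data of a k-nearest-neighbor fit to train."""
    return sum((y - knn_predict(train, x, k)) ** 2 for x, y in data) / len(data)

rng = random.Random(0)
data = []
for _ in range(200):
    x = rng.uniform(0, 1)
    data.append((x, (2 * x - 1) ** 2 + rng.gauss(0, 0.2)))

n_tuning = 60  # chosen by the analyst, not by the algorithm
train, tuning = data[:-n_tuning], data[-n_tuning:]

# Smaller k means a more flexible (more complex) fit.
candidates = [1, 3, 5, 10, 20, 40]
tuning_mse = {k: mse(train, tuning, k) for k in candidates}
best_k = min(tuning_mse, key=tuning_mse.get)

print("in-sample MSE with k=1:", mse(train, train, 1))
for k in candidates:
    print(f"k={k:>2}: tuning MSE {tuning_mse[k]:.4f}")
print("complexity selected on the tuning set: k =", best_k)
```

The most flexible setting (k = 1) reproduces the training data exactly, so in-sample fit alone cannot be used to choose complexity; the tuning set does that job, given a split that was fixed by hand.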
True or False: Machine learning techniques are always better to use than traditional estimation techniques.
false
True or False: Minimizing bias in statistical models leads to better predictions.
false
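The bias-variance tradeoff behind this answer can be worked out exactly for a simple case: estimating a population mean with a shrunken sample mean. The sketch below evaluates the textbook decomposition MSE = bias^2 + variance; the population values and the name `mse_of_shrunk_mean` are hypothetical choices for illustration.

```python
def mse_of_shrunk_mean(shrink, mu, sigma, n):
    """Exact MSE of the estimator shrink * sample_mean for a population mean mu.

    bias     = (shrink - 1) * mu
    variance = shrink**2 * sigma**2 / n
    MSE      = bias**2 + variance
    """
    bias = (shrink - 1.0) * mu
    variance = shrink ** 2 * sigma ** 2 / n
    return bias ** 2 + variance, bias, variance

mu, sigma, n = 1.0, 3.0, 10  # hypothetical population mean, noise level, sample size
table = {}
for shrink in (1.0, 0.8, 0.5, 0.2):
    mse, bias, variance = mse_of_shrunk_mean(shrink, mu, sigma, n)
    table[shrink] = (mse, bias, variance)
    print(f"shrink={shrink}: bias={bias:+.3f}, variance={variance:.3f}, MSE={mse:.3f}")
```

With these values the unbiased choice (shrink = 1.0) has a higher mean squared error than shrink = 0.5: accepting some bias buys a larger reduction in variance, and shrinking too far (0.2) gives the advantage back.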
True or False: Prediction is always "observable," whereas estimation is always "unobservable".
false
True or False: Relative to survey data, data which is pre-processed using machine learning methods is much less reliable.
false
True or False: The "magic trick" for solving problems such as face recognition and sentiment analysis is to simply better understand how humans approach these problems and apply this information using the Artificial Intelligence approach.
false
True or False: The only reason prediction methods don't work well for estimation problems is that prediction methods don't do the bias-variance tradeoff.
false
True or False: The statistical approach to solving problems such as sentiment analysis was eagerly accepted by the Artificial Intelligence (AI) community.
false
True or False: Unlike parameter estimation, prediction does not require your sample to be independently and identically distributed (i.i.d.).
false
True or False: When fitting a decision tree, the decision tree that best fits the data will perform better predictions.
false
The higher the complexity of your model
the higher the signal it captures, and the more likely you are to have an overfitting problem; an overfit model will also perform worse out-of-sample
True or False: Consider the example discussed in class. Each eigenface constructed via PCA is a separate principal component.
true
True or False: Machine learning has the potential to improve our estimates by helping us produce certain variables, parameters, and data that can later be fed into our estimation machinery.
true
True or False: Policy questions of interest are not always causal in nature.
true
True or False: Sentiment analysis is a way of using language to infer whether a particular statement is positive or negative.
true