Module 5...
The degree to which a machine learning model retains its explanatory power when predicting out-of-sample is most commonly described as: A)generalization. B)predominance. C)hegemony.
A is correct
When the training data contains the ground truth, the most appropriate learning method is: A)supervised learning. B)unsupervised learning. C)machine learning.
A is correct
Which of the following is NOT a model that has a qualitative dependent variable? A)Event study. B)Logit. C)Discriminant analysis.
A is correct
Which of the following is least accurate regarding the Durbin-Watson (DW) test statistic? A)If the residuals have positive serial correlation, the DW statistic will be greater than 2. B)If the residuals have positive serial correlation, the DW statistic will be less than 2. C)In tests of serial correlation using the DW statistic, there is a rejection region, a region over which the test can fail to reject the null, and an inconclusive region.
A is correct
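As a quick illustration of how the DW statistic behaves (all numbers hypothetical), a minimal Python sketch that generates residuals with positive serial correlation and computes DW, which lands well below 2:

```python
import random

def durbin_watson(resid):
    """DW = sum of squared changes in residuals / sum of squared residuals."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den

# Hypothetical residuals generated with positive serial correlation (rho = 0.7).
random.seed(42)
resid, prev = [], 0.0
for _ in range(5000):
    prev = 0.7 * prev + random.gauss(0, 1)
    resid.append(prev)

dw = durbin_watson(resid)  # roughly 2 * (1 - rho), so well below 2 here
```

With negative serial correlation the same calculation would push DW above 2, matching the logic of the question.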
An AR(1) autoregressive time series model: A)can be used to test for a unit root, which exists if the slope coefficient equals one. B)can be used to test for a unit root, which exists if the slope coefficient is less than one. C)cannot be used to test for a unit root.
A is correct If you estimate the following model xt = b0 + b1 × xt-1 + et and get b1 = 1, then the process has a unit root and is nonstationary.
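A minimal sketch of this idea (simulated data, hypothetical parameters): estimating the AR(1) slope by OLS on a random walk recovers a b1 near 1, while a stationary AR(1) series recovers its true slope below 1.

```python
import random

def ar1_slope(x):
    """OLS estimate of b1 from x_t = b0 + b1 * x_{t-1} + e_t."""
    y, lag = x[1:], x[:-1]
    n = len(y)
    my, ml = sum(y) / n, sum(lag) / n
    cov = sum((lag[i] - ml) * (y[i] - my) for i in range(n))
    var = sum((v - ml) ** 2 for v in lag)
    return cov / var

random.seed(1)
walk = [0.0]          # unit-root series: x_t = x_{t-1} + e_t, so b1 = 1
for _ in range(5000):
    walk.append(walk[-1] + random.gauss(0, 1))

stationary = [0.0]    # covariance-stationary AR(1) with b1 = 0.4
for _ in range(5000):
    stationary.append(0.4 * stationary[-1] + random.gauss(0, 1))
```

Here `ar1_slope(walk)` sits near 1 (nonstationary) and `ar1_slope(stationary)` near 0.4.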
A U.S. firm is most likely to introduce into a simulation a constraint related to negative book value of equity, because if the firm experiences negative book value of equity: A)loan covenants may allow lenders to gain control of the firm. B)the firm will probably have to cease operations and liquidate. C)the firm will be prohibited by law from paying dividends.
A is correct Banks that lend to firms sometimes include loan covenants that allow the bank to gain partial control of the firm if the book value of equity becomes negative. In some Asian countries (though not in the U.S.), a firm with negative book value is prohibited by law from paying dividends. A firm does not have to cease operations if its book value of equity becomes negative; hundreds of firms in the U.S. continue to operate despite negative book value of equity.
Big data is most likely to suffer from low: A)veracity. B)variety. C)velocity.
A is correct Big data is defined as data with high volume, velocity, and variety. Big data often suffers from low veracity, because it can contain a high percentage of meaningless data.
If the different risks that an investment is exposed to are correlated, the least appropriate approach to probabilistic risk assessment would be: A)decision trees. B)simulations. C)scenario analyses.
A is correct Correlated risks are difficult to model using decision trees. When investment risks are correlated, simulations can be employed in order to incorporate modeling of these correlations. Correlations can also be dealt with subjectively in scenario analyses, by generating scenarios that reflect correlations.
Alexis Popov, CFA, is analyzing monthly data. Popov has estimated the model xt = b0 + b1 × xt-1 + b2 × xt-2 + et. The researcher finds that the residuals have a significant ARCH process. The best solution to this is to: A)re-estimate the model with generalized least squares. B)re-estimate the model using only an AR(1) specification. C)re-estimate the model using a seasonal lag.
A is correct If the residuals have an ARCH process, the correct remedy is generalized least squares, which will allow Popov to better interpret the results.
Which of the following statements regarding multicollinearity is least accurate? A)Multicollinearity may be present in any regression model. B)Multicollinearity may be a problem even if the multicollinearity is not perfect. C)If the t-statistics for the individual independent variables are insignificant, yet the F-statistic is significant, this indicates the presence of multicollinearity.
A is correct Multicollinearity is not an issue in simple linear regression. Multicollinearity is a condition in which two or more independent variables in a multiple regression equation are highly correlated with one another.
When two or more of the independent variables in a multiple regression are correlated with each other, the condition is called: A)multicollinearity. B)serial correlation. C)conditional heteroskedasticity.
A is correct Multicollinearity refers to the condition when two or more of the independent variables, or linear combinations of the independent variables, in a multiple regression are highly correlated with each other. This condition distorts the standard error of estimate and the coefficient standard errors, leading to problems when conducting t-tests for statistical significance of parameters.
Which of the following is least likely to contribute to a meaningful output using simulations? A)ad-hoc parameter estimates. B)stationary input distributions. C)static correlations across inputs.
A is correct Some of the issues that might prevent a simulation from generating meaningful output include: Ad-hoc specification (rather than specification based on sound analysis) of parameter estimates (i.e. the garbage-in, garbage-out problem), changing correlations across inputs, non-stationary distributions, and real data that does not fit (pre-defined) distributions.
Which of the following is a collection of a distinct set of tokens from all the texts in a sample dataset? A)bag-of-words. B)tokenization. C)stemming.
A is correct Text is considered to be a collection of tokens, where a token is equivalent to a word. Tokenization is the process of splitting a given text into separate tokens. Bag-of-words (BOW) is a collection of a distinct set of tokens from all the texts in a sample dataset. Stemming is the process of converting inflected word forms into a base word.
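A minimal sketch of both steps on hypothetical sample texts: tokenization splits each text into tokens, and the bag-of-words is the distinct set of tokens pooled across all texts.

```python
texts = [
    "The firm reported strong earnings",   # hypothetical sample documents
    "Strong earnings lifted the stock",
]

# Tokenization: split each text into separate tokens (lowercased words here).
tokens_per_text = [t.lower().split() for t in texts]

# Bag-of-words: the distinct set of tokens pooled from all texts in the sample.
bag_of_words = set()
for tokens in tokens_per_text:
    bag_of_words.update(tokens)
```

For these two texts the BOW holds 7 distinct tokens; repeated words such as "strong" and "earnings" appear only once.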
One choice a researcher can use to test for nonstationarity is to use a: A)Dickey-Fuller test, which uses a modified t-statistic. B)Dickey-Fuller test, which uses a modified χ2 statistic. C)Breusch-Pagan test, which uses a modified t-statistic.
A is correct The Dickey-Fuller test estimates the equation (xt - xt-1) = b0 + (b1 - 1) * xt-1 + et and tests if H0: (b1 - 1) = 0. Using a modified t-test, if it is found that (b1- 1) is not significantly different from zero, then it is concluded that b1 must be equal to 1.0 and the series has a unit root.
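A sketch of the Dickey-Fuller regression itself (simulated data with a true unit root): regress the first difference on the lagged level and examine the coefficient (b1 - 1), which should sit near zero. Judging significance would require the nonstandard Dickey-Fuller critical values, which are omitted here.

```python
import random

def df_slope(x):
    """Coefficient (b1 - 1) from regressing (x_t - x_{t-1}) on x_{t-1}."""
    dy = [x[t] - x[t - 1] for t in range(1, len(x))]
    lag = x[:-1]
    n = len(dy)
    md, ml = sum(dy) / n, sum(lag) / n
    cov = sum((lag[i] - ml) * (dy[i] - md) for i in range(n))
    var = sum((v - ml) ** 2 for v in lag)
    return cov / var

random.seed(7)
x = [0.0]
for _ in range(4000):
    x.append(x[-1] + random.gauss(0, 1))  # true unit root, so (b1 - 1) = 0

g = df_slope(x)  # near zero, consistent with a unit root
```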
An analyst is trying to determine whether stock market returns are related to size and the market-to-book ratio, through the use of multiple regression. However, the analyst uses returns of portfolios of stocks instead of individual stocks in the regression. Which of the following is a valid reason why the analyst uses portfolios? The use of portfolios: A)reduces the standard deviation of the residual, which will increase the power of the test. B)will remove the existence of multicollinearity from the data, reducing the likelihood of type II error. C)will increase the power of the test by giving the test statistic more degrees of freedom.
A is correct The use of portfolios reduces the standard deviation of the returns, which reduces the standard deviation of the residuals.
To make a bag-of-words (BOW) concise, the most appropriate procedure would be to: A)eliminate high- and low-frequency words. B)use a word cloud. C)use N-grams.
A is correct To make a BOW concise, usually high- and low-frequency words are eliminated. High-frequency words tend to be stop words or common vocabulary words. A word cloud is a text data visualization tool. N-grams are used when the sequence of words is important. (LOS 8.f)
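A minimal sketch of the trimming step, using made-up token counts and illustrative cutoffs: dropping tokens above a high-frequency cutoff (stop/common words) and below a low-frequency cutoff (rare words) leaves a concise BOW.

```python
from collections import Counter

# Hypothetical token counts pooled across a sample of documents.
counts = Counter({
    "the": 120, "and": 95, "earnings": 14, "dividend": 9,
    "accretive": 1, "escrow": 1,
})

LOW, HIGH = 2, 50  # illustrative frequency cutoffs for this toy sample

# Drop high-frequency (stop/common) words and low-frequency (rare) words.
concise_bow = {w for w, c in counts.items() if LOW <= c <= HIGH}
```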
Overfitting is least likely to result in: A)higher forecasting accuracy in out-of-sample data. B)inclusion of noise in the model. C)higher number of features included in the data set.
A is correct Overfitting results when a large number of features (i.e., independent variables) are included in the data sample. The resulting model can use the "noise" in the dependent variables to improve the model fit. Overfitting the model in this way will actually decrease the accuracy of model forecasts on other (out-of-sample) data.
Categorical dependent variables is another name for: A)quantitative dependent variables. B)qualitative dependent variables. C)the Breusch-Pagan test.
B is correct
An analyst runs a regression of portfolio returns on three independent variables. These independent variables are price-to-sales (P/S), price-to-cash flow (P/CF), and price-to-book (P/B). The analyst discovers that the p-values for each independent variable are relatively high. However, the F-test has a very small p-value. The analyst is puzzled and tries to figure out how the F-test can be statistically significant when the individual independent variables are not significant. What violation of regression analysis has occurred? A)serial correlation. B)multicollinearity. C)conditional heteroskedasticity.
B is correct An indication of multicollinearity is when the independent variables individually are not statistically significant but the F-test suggests that the variables as a whole do an excellent job of explaining the variation in the dependent variable.
An executive describes her company's "low latency, multiple terabyte" requirements for managing Big Data. To which characteristics of Big Data is the executive referring? A)Velocity and variety. B)Volume and velocity. C)Volume and variety.
B is correct Big Data may be characterized by its volume (the amount of data available), velocity (the speed at which data are communicated), and variety (degrees of structure in which data exist). "Terabyte" is a measure of volume. "Latency" refers to velocity.
When evaluating the fit of a machine learning algorithm, it is most accurate to state that: A)precision is the percentage of correctly predicted classes out of total predictions. B)recall is the ratio of correctly predicted positive classes to all actual positive classes. C)accuracy is the ratio of correctly predicted positive classes to all predicted positive classes.
B is correct Recall (also called sensitivity) is the ratio of correctly predicted positive classes to all actual positive classes. Precision is the ratio of correctly predicted positive classes to all predicted positive classes. Accuracy is the percentage of correctly predicted classes out of total predictions.
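The three fit measures above can be computed directly from confusion-matrix counts. A minimal sketch with hypothetical counts:

```python
# Confusion-matrix counts from a hypothetical binary classifier.
tp, fp, tn, fn = 40, 10, 35, 15

precision = tp / (tp + fp)                    # correct positives / predicted positives
recall = tp / (tp + fn)                       # correct positives / actual positives
accuracy = (tp + tn) / (tp + fp + tn + fn)    # correct predictions / all predictions
```

With these counts, precision is 0.80, recall is about 0.73, and accuracy is 0.75.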
_____________ is the ratio of correctly predicted positive classes to all predicted positive classes. A)recall (sensitivity). B)precision. C)accuracy.
B is correct Recall (also called sensitivity) is the ratio of correctly predicted positive classes to all actual positive classes. Precision is the ratio of correctly predicted positive classes to all predicted positive classes. Accuracy is the percentage of correctly predicted classes out of total predictions.
An analyst modeled the time series of annual earnings per share in the specialty department store industry as an AR(3) process. Upon examination of the residuals from this model, she found that there is a significant autocorrelation for the residuals of this model. This indicates that she needs to: A)switch models to a moving average model. B)revise the model to include at least another lag of the dependent variable. C)alter the model to an ARCH model.
B is correct She should estimate an AR(4) model, and then re-examine the autocorrelations of the residuals.
Which of the following would be the most appropriate approach to probabilistic risk assessment when facing risks with continuous (rather than discrete) outcomes: A)decision tree. B)simulation. C)scenario analysis.
B is correct Simulations are better-suited to continuous risks, while scenario analyses and decision trees are generally built around discrete outcomes.
The process of splitting a given text into separate words is best characterized as: A)bag-of-words. B)tokenization. C)stemming.
B is correct Text is considered to be a collection of tokens, where a token is equivalent to a word. Tokenization is the process of splitting a given text into separate tokens. Bag-of-words (BOW) is a collection of a distinct set of tokens from all the texts in a sample dataset. Stemming is the process of converting inflected word forms into a base word.
Which of the following most accurately describes one of the advantages of using simulations in decision making? A)Better decisions. B)Better estimation of input variables. C)Better estimates of expected value.
B is correct Two advantages of using simulation in decision making are (1) better input estimation and (2) simulation yields a distribution for expected value rather than a point estimate. Simulations do not yield better estimates of expected value than conventional risk-adjusted value models, nor do they lead to better decisions.
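A minimal sketch of the second advantage, using a stylized valuation with two uncertain inputs (all numbers and distributions hypothetical): the simulation output is a whole distribution of values rather than one point estimate.

```python
import random
import statistics

random.seed(0)

# Stylized valuation: value = cash_flow / (discount_rate - growth),
# with both uncertain inputs drawn from assumed distributions.
values = []
for _ in range(10000):
    cash_flow = random.gauss(100, 10)              # assumed input distribution
    rate_less_growth = random.uniform(0.06, 0.10)  # assumed input distribution
    values.append(cash_flow / rate_less_growth)

center = statistics.median(values)   # the simulation yields a distribution,
spread = statistics.pstdev(values)   # not just a single point estimate
```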
Using cross-sectional data to define the probability distribution of a variable in a simulation is most appropriate when: A)parameter estimates have low variability across companies. B)the peer data is representative of the subject. C)reliable historical data is available that covers a long period of time.
B is correct The cross-sectional variability of peer data can be used to proxy the distribution of a specific variable if the peers are representative of the subject. Cross-sectional data is compiled from observations of many subjects (in this case, firms) at the same point of time (not over time).
_______________ tests for conditional heteroskedasticity in the error of the regression. A) B)Durbin-Watson test. C)Breusch-Pagan test.
C is correct
Considering the various supervised machine learning algorithms, a penalized regression where the penalty term is the sum of the absolute values of the regression coefficients best describes: A)k-nearest neighbor (KNN). B)support vector machine (SVM). C)least absolute shrinkage and selection operator (LASSO)
C is correct
Which of the following statements about supervised learning is most accurate? A)Supervised learning requires human intervention in machine learning process. B)Supervised learning does not differentiate between tag and features. C)Typical data analytics tasks for supervised learning include classification and prediction.
C is correct
When constructing a regression model to predict portfolio returns, an analyst runs a regression for the past five year period. After examining the results, she determines that an increase in interest rates two years ago had a significant impact on portfolio results for the time of the increase until the present. By performing a regression over two separate time periods, the analyst would be attempting to prevent which type of misspecification? A)Forecasting the past. B)Using a lagged dependent variable as an independent variable. C)Incorrectly pooling data.
C is correct The relationship between returns and the dependent variables can change over time, so it is critical that the data be pooled correctly. Running the regression for multiple sub-periods (in this case two) rather than one time period can produce more accurate results.
An analyst is building a regression model which returns a qualitative dependent variable based on a probability distribution. This is least likely a: A)probit model. B)logit model. C)discriminant model.
C is correct A probit model is a qualitative dependent variable model based on a normal distribution. A logit model is a qualitative dependent variable model based on the logistic distribution. A discriminant model returns a qualitative dependent variable based on a linear relationship that can be used for ranking or classification into discrete states.
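A minimal sketch of the logit mapping with hypothetical coefficients: the linear combination b0 + b1*x is passed through the logistic function, producing a probability between 0 and 1.

```python
import math

def logit_prob(b0, b1, x):
    """Logit model: P(y = 1) = 1 / (1 + exp(-(b0 + b1 * x)))."""
    return 1 / (1 + math.exp(-(b0 + b1 * x)))

# Hypothetical coefficients; here b0 + b1*x = 0, so the probability is 0.5.
p = logit_prob(-1.0, 0.5, 2.0)
```

A probit model would instead pass the same linear combination through the standard normal CDF.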
Big data is characterized by: A)volume, velocity, values. B)values, variety, velocity. C)variety, volume, velocity.
C is correct Big Data may be characterized by its volume (the amount of data available), velocity (the speed at which data are communicated), and variety (degrees of structure in which data exist).
In designing a simulation, the step involving identification of probabilistic variables should most likely: A)define probability distributions for every input in a valuation. B)maximize the number of variables that are allowed to vary in a simulation. C)focus attention on a few variables that have a big impact on value.
C is correct In the "Determine probabilistic variables" step, it makes sense to focus attention on a few variables that have a significant impact on value, rather than trying to define probability distributions for dozens of inputs that may have only a marginal impact on value.
Which of the following most accurately describes one of the key steps in running a simulation? Check for: A)serial correlation of residuals. B)heteroskedasticity within variables. C)correlation across variables.
C is correct It is important that we check for correlations across variables before we run a simulation, in order to identify variables that are likely to be correlated with each other. After identifying correlated variables, we can address the issue by either choosing only one of the inputs to vary or explicitly building that correlation into the simulation.
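A minimal sketch of building an assumed correlation into a simulation (the correlation value is hypothetical): two correlated normal inputs are constructed from independent draws using a hand-written 2x2 Cholesky factor.

```python
import math
import random

random.seed(3)
rho = 0.7  # assumed correlation between two simulation inputs

# Build correlated normal draws from independent ones (2x2 Cholesky by hand).
a, b = [], []
for _ in range(20000):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    a.append(z1)
    b.append(rho * z1 + math.sqrt(1 - rho ** 2) * z2)

def corr(x, y):
    """Sample correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((x[i] - mx) * (y[i] - my) for i in range(n))
    vx = sum((v - mx) ** 2 for v in x)
    vy = sum((v - my) ** 2 for v in y)
    return cov / math.sqrt(vx * vy)

sample_corr = corr(a, b)  # lands near the assumed rho
```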
Which of the following statements regarding heteroskedasticity is least accurate? A)The presence of heteroskedastic error terms results in a variance of the residuals that is too large. B)Multicollinearity is a potential problem only in multiple regressions, not simple regressions. C)Heteroskedasticity only occurs in cross-sectional regressions.
C is correct Heteroskedasticity may occur in cross-sectional or time-series analysis. If there are shifting regimes in a time series (e.g., a change in regulation or the economic environment), it is possible to have heteroskedasticity in a time series.
Trend models can be useful tools in the evaluation of a time series of data. However, there are limitations to their usage. Trend models are not appropriate when which of the following violations of the linear regression assumptions is present? A)Model misspecification. B)Heteroskedasticity. C)Serial correlation.
C is correct One of the primary assumptions of linear regression is that the residual terms are not correlated with each other. If serial correlation, also called autocorrelation, is present, then trend models are not an appropriate analysis tool.
An analyst is estimating a regression equation with three independent variables, and calculates the R2, the adjusted R2, and the F-statistic. The analyst then decides to add a fourth variable to the equation. Which of the following is most accurate? A)The R2 and F-statistic will be higher, but the adjusted R2 could be higher or lower. B)The adjusted R2 will be higher, but the R2 and F-statistic could be higher or lower. C)The R2 will be higher, but the adjusted R2 and F-statistic could be higher or lower.
C is correct The R2 will always increase as the number of variables increases. The adjusted R2 specifically adjusts for the number of variables, and might not increase as the number of variables rises. As the number of variables increases, the regression sum of squares will rise and the residual sum of squares will fall; this will tend to make the F-statistic larger. However, the numerator degrees of freedom will also rise, and the denominator degrees of freedom will fall, which will tend to make the F-statistic smaller. Consequently, like the adjusted R2, the F-statistic could be higher or lower.
Which of the following statements regarding the R2 is least accurate? A)It is possible for the adjusted-R2 to decline as more variables are added to the multiple regression. B)The adjusted-R2 is not appropriate to use in simple regression. C)The adjusted-R2 is greater than the R2 in multiple regression.
C is correct The adjusted-R2 will always be less than R2 in multiple regression.
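The standard adjustment formula makes this concrete. A minimal sketch with illustrative numbers:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Illustrative numbers: 60 observations, 3 independent variables, R^2 = 0.50.
adj = adjusted_r2(0.50, n=60, k=3)  # about 0.4732, below the unadjusted R^2
```

Because (n - 1) / (n - k - 1) exceeds 1 whenever k >= 1, the adjusted R2 is always below R2 in a multiple regression.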
David Brice, CFA, has tried to use an AR(1) model to predict a given exchange rate. Brice has concluded the exchange rate follows a random walk without a drift. The current value of the exchange rate is 2.2. Under these conditions, which of the following would be least likely? A)The process is not covariance stationary. B)The forecast for next period is 2.2. C)The residuals of the forecasting model are autocorrelated.
C is correct The one-period forecast of a random walk model without drift is E(xt+1) = E(xt + et ) = xt + 0, so the forecast is simply xt = 2.2. For a random walk process, the variance changes with the value of the observation. However, the error term et = xt - xt-1 is not autocorrelated.
Which of the following statements regarding the results of a regression analysis is least accurate? The: A)slope coefficients in the multiple regression are referred to as partial betas. B)slope coefficient in a multiple regression is the change in the dependent variable for a one-unit change in the independent variable, holding all other variables constant. C)slope coefficient in a multiple regression is the value of the dependent variable for a given value of the independent variable.
C is correct The slope coefficient is the change in the dependent variable for a one-unit change in the independent variable.
Under which of these conditions is a machine learning model said to be underfit? A)The model identifies spurious relationships. B)The input data are not labelled. C)The model treats true parameters as noise.
C is correct Underfitting describes a machine learning model that is not complex enough to describe the data it is meant to analyze. An underfit model treats true parameters as noise and fails to identify the actual patterns and relationships. A model that is overfit (too complex) will tend to identify spurious relationships in the data. Labelling of input data is related to the use of supervised or unsupervised machine learning techniques.
_________ is the property of having a non-constant variance
Heteroskedasticity