Boosting
Which of the following statement(s) is/are correct for AdaBoost? A. An AdaBoost classifier is a meta-estimator that begins by fitting a classifier on the original dataset. B. The weights of incorrectly classified instances are adjusted such that subsequent classifiers focus more on difficult cases. C. The default base estimator is a decision tree with max_depth=1 D. The base estimator cannot be changed in the AdaBoost implementation of sklearn.
A, B, and C - An AdaBoost classifier first fits a classifier on the original dataset and then increases the weights of incorrectly classified instances so that subsequent classifiers focus more on difficult cases. In sklearn, the default base estimator is a decision tree with max_depth=1 (a decision stump), and the base estimator can be changed, so D is incorrect.
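A minimal sketch of these points, using sklearn's AdaBoostClassifier on a toy dataset (assumes scikit-learn >= 1.2, where the base learner is passed via the estimator parameter; older versions call it base_estimator):

```python
# Sketch: AdaBoost with the default decision stump and with a different base tree.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=42)

# Default base estimator: a decision tree with max_depth=1 (a decision stump).
ada_default = AdaBoostClassifier(n_estimators=50, random_state=42).fit(X, y)

# The base estimator can be changed, e.g. to a depth-3 tree.
ada_deeper = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=3),
    n_estimators=50,
    random_state=42,
).fit(X, y)

print(ada_default.score(X, y), ada_deeper.score(X, y))
```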
Which of the following hyperparameter(s) is/are common in AdaBoost, Gradient Boost, and XGBoost? A. random_state B. learning_rate C. subsample D. n_estimators
A, B, and D - The 'subsample' hyperparameter is available in Gradient Boost and XGBoost but not in AdaBoost.
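A small sketch illustrating the shared hyperparameters (assumes the xgboost package is installed; the parameter values are arbitrary):

```python
# Sketch: the same random_state, learning_rate and n_estimators values passed to
# AdaBoost, Gradient Boosting and XGBoost; subsample is only accepted by the latter two.
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from xgboost import XGBClassifier

common = dict(n_estimators=100, learning_rate=0.1, random_state=42)

ada = AdaBoostClassifier(**common)                         # no subsample parameter
gbm = GradientBoostingClassifier(subsample=0.8, **common)
xgb = XGBClassifier(subsample=0.8, **common)
```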
Which of the following statements is correct?
Boosting combines multiple weak learners to make a strong model. - Boosting follows sequential modeling and combines multiple weak learners to make a strong model.
Which of the following is correct with regards to XGBoost? A. It can build learners parallelly B. It has the advantage of distributed computing
Both A and B - XGBoost can build learners in parallel and has the advantage of distributed computing.
Which of the following statements is/are correct regarding AdaBoost? A) It builds weak learners (decision trees) with restricted depth B) It builds weak learners (decision trees) till a tree is fully grown C) Weights of incorrectly classified points are decreased D) Weights of incorrectly classified points are increased
Both A and D - AdaBoost builds weak learners (decision trees) with restricted depth. Since the decision trees are weak learners, they are typically only one or two levels deep (restricted depth). In an AdaBoost model, the weights of data points that were incorrectly classified by the previous learner are increased before they are used to train the subsequent learner.
Which of the following is/are true: I) In the AdaBoost model, after the first run, the weightage of data points that were predicted wrong is increased. II) AdaBoost consists of under-fitted models.
Both I and II - In the AdaBoost model, after the first run, the weightage of data points that were predicted wrong by the first learner is increased before the data points are used to train the second learner. AdaBoost consists of under-fitted models since it is built up of weak learners.
Select True or False for the following statement: Each tree contributes equally to the final prediction by AdaBoost
False - A weighted voting/weighted average is taken among the models/trees to get the final prediction by the AdaBoost model.
Select True or False for the following statement: In Gradient Boosting, init is a hyperparameter that specifies the base estimator of the algorithm.
False - In Gradient Boosting, init is a hyperparameter that specifies an estimator object that is used to compute the initial predictions.
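A hedged sketch of the init argument (LogisticRegression is just one possible choice of initial estimator; any estimator with fit and predict_proba, or the string 'zero', can be passed):

```python
# Sketch: passing an estimator via `init` so its predictions are used as the
# initial predictions that the subsequent boosting stages then improve on.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=0)

gbm = GradientBoostingClassifier(
    init=LogisticRegression(),  # computes the initial predictions
    n_estimators=100,
    random_state=0,
).fit(X, y)
```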
Select True or False for the following statement: The value of the learning rate is not always between 0 and 1
False - The value of the learning rate is always between 0 and 1. For example, if there are two learners and the learning rate is 1, the model will overfit, while if the learning rate is 0, the model will not take the second learner's predictions into account at all.
Weights of each sample remain the same for each subsequent weak learner in an AdaBoost model.
False - Weights of samples may change for each subsequent weak learner in an AdaBoost model. The samples which are incorrectly predicted by the previous weak learner are given more weightage when they are used for training the subsequent weak learner.
Which 'gain' is used to build a tree in XGBoost?
Gain calculated from similarity scores - For each candidate split, XGBoost computes the gain as the sum of the similarity scores of the left and right child nodes minus the similarity score of the parent node; the split with the highest gain is chosen.
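A rough sketch of this calculation for a regression split, following the usual XGBoost formulas (similarity = (sum of residuals)^2 / (number of residuals + lambda)); the function names and example residuals are illustrative only:

```python
# Illustrative sketch of XGBoost-style gain for a regression split.
# similarity = (sum of residuals)^2 / (number of residuals + lambda)
# gain       = left similarity + right similarity - parent similarity
def similarity(residuals, lam=1.0):
    return sum(residuals) ** 2 / (len(residuals) + lam)

def gain(left_residuals, right_residuals, lam=1.0):
    parent = left_residuals + right_residuals
    return (similarity(left_residuals, lam)
            + similarity(right_residuals, lam)
            - similarity(parent, lam))

print(gain([-10.5, 6.5], [7.5, -7.5]))  # higher gain => better split
```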
Which of the following predictive model(s) do not have an option to give more weightage to a certain class for classification problems?
Gradient Boost - sklearn's GradientBoostingClassifier does not provide a class_weight (or equivalent) option, whereas, for example, XGBoost offers scale_pos_weight to give more weightage to a class.
Which of the following is not an example of boosting algorithms?
Random Forest - Random forest is an example of bagging algorithms while AdaBoost, Gradient Boosting, and XGBoost are examples of boosting algorithms.
The boosting algorithm builds models
Sequentially - The boosting algorithm builds models sequentially while the bagging algorithm builds models in parallel.
Why do we use the learning rate?
To avoid overfitting - If the full residuals were added to the initial prediction, the result would exactly reproduce the original values we are trying to predict; the model would then learn the training data perfectly, including its noise, and overfit. The learning rate scales down each learner's contribution to avoid this.
Select True or False for the following statement: In XGBoost, gamma is a hyperparameter that specifies the minimum loss reduction required to make a split.
True - In XGBoost, gamma (also called min_split_loss) specifies the minimum loss reduction required to make a further split on a leaf node; larger values make the algorithm more conservative.
Select True or False for the following statement: Gradient Boosting algorithm tries to predict the residuals and keeps on minimizing the residuals with each iteration of weak learners
True - The Gradient Boosting algorithm tries to predict the residuals left by the previous model and keeps minimizing them (i.e., driving them closer to 0) with each iteration of weak learners.
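A toy sketch of this idea: each new tree is fit to the residuals of the current prediction, and its output is added back scaled by the learning rate (the data and parameter values are illustrative only):

```python
# Toy sketch of gradient boosting for regression with squared error:
# each weak learner is fit to the residuals of the current prediction and its
# contribution is shrunk by the learning rate before being added.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, noise=10.0, random_state=0)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())   # initial prediction: the mean of the target

for i in range(50):
    residuals = y - prediction                        # what is still unexplained
    tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, residuals)
    prediction += learning_rate * tree.predict(X)     # shrink each tree's contribution
    # the residuals get smaller with each iteration
```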
Select True or False for the following statement: In stacking, heterogeneous models can be built whose result is combined by using another metamodel
True - In stacking, heterogeneous models can be built whose results are combined by another meta-model. For example, suppose we have a training and a testing dataset. The training dataset is divided into two folds, fold 1 and fold 2. First, heterogeneous base models, say AdaBoost, Gradient Boost, and Random Forest, are trained on fold 1. These base models then make predictions on fold 2. Using those fold 2 predictions and the fold 2 observations, a meta-model, say XGBoost, is trained, and this meta-model makes the final predictions. Finally, the predictions of the meta-model are evaluated by predicting on the testing dataset.
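A minimal sketch using sklearn's StackingClassifier (note that sklearn generates the out-of-fold predictions internally via cross-validation rather than a manual fold 1 / fold 2 split, and a LogisticRegression is used here as the meta-model instead of the XGBoost of the example above):

```python
# Sketch: heterogeneous base models whose predictions are combined by a meta-model.
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("ada", AdaBoostClassifier(random_state=0)),
        ("gbm", GradientBoostingClassifier(random_state=0)),
        ("rf", RandomForestClassifier(random_state=0)),
    ],
    final_estimator=LogisticRegression(),  # meta-model combining the base predictions
)
stack.fit(X_train, y_train)
print(stack.score(X_test, y_test))         # evaluate the meta-model on the test data
```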
Which boosting algorithm has the advantage of parallel computation, efficient missing value treatment, and cache optimization features in its implementation?
XGBoost - XGBoost has the advantage of parallel computation, efficient missing value treatment, and cache optimization features in its implementation.
What is the correct sequence of steps for the stacking model prediction? a) Training 2 or more base models on fold 1 and predicting on fold 2 b) Predictions of meta-model are evaluated by predicting on test data c) Dividing the train data into 2 folds - fold 1 and fold 2 d) Using the predictions of base models a meta-model is trained
c -> a -> d -> b - The correct sequence of steps for the stacking model prediction is: (c) dividing the train data into 2 folds, fold 1 and fold 2; (a) training 2 or more base models on fold 1 and predicting on fold 2; (d) training a meta-model using the predictions of the base models; (b) evaluating the predictions of the meta-model by predicting on test data.
Select the correct pairing
i) Bagging - Equal weightage to all learners, parallel model building, samples are independent of each other.
ii) Boosting - Unequal weightage to all learners, sequential model building, samples are dependent on each other.
In bagging, models are built in parallel and equal weightage is given to all learners for making the final prediction; the samples used to train each individual learner are independent of each other since sampling with replacement is done.
In boosting, models are built sequentially and unequal weightage is given to the learners for making the final prediction, with more weightage given to the learners that give more accurate predictions; the samples used to train each individual learner are dependent on each other since the samples used to train the subsequent learner depend on the predictions made by the previous learner.
Which hyperparameter of XGBoost can be used to deal with the imbalance in data?
scale_pos_weight - It controls the balance of positive and negative weights; a common starting value is the ratio of the number of negative samples to the number of positive samples.
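A hedged sketch of using scale_pos_weight on an imbalanced binary problem (assumes the xgboost package is installed; the dataset and the negatives/positives ratio are illustrative only):

```python
# Sketch: weighting the positive class on an imbalanced binary problem.
# A common starting point is scale_pos_weight = (# negative samples) / (# positive samples).
import numpy as np
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

ratio = np.sum(y == 0) / np.sum(y == 1)            # negatives / positives
model = XGBClassifier(scale_pos_weight=ratio, random_state=0).fit(X, y)
```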