Ensemble Techniques
Which of the following hyperparameters can be tuned in random forest?
max_depth, max_features, min_samples_split, min_samples_leaf
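As a minimal sketch (synthetic data assumed, not the project dataset), these hyperparameters could be tuned with GridSearchCV:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption)
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

param_grid = {
    "max_depth": [3, 5, None],
    "max_features": ["sqrt", 0.5],
    "min_samples_split": [2, 10],
    "min_samples_leaf": [1, 5],
}

grid = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=3)
grid.fit(X, y)
print(grid.best_params_)  # best combination found by cross-validation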
In a classification setting, for a new test data point, the final prediction by a random forest is done by taking the
mode of the individual predictions
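A tiny illustration with hypothetical tree predictions - the ensemble's answer is simply the most frequent class (the mode):

from collections import Counter

tree_predictions = [1, 0, 1, 1, 0]   # hypothetical predictions from 5 trees for one test point
final_prediction = Counter(tree_predictions).most_common(1)[0][0]
print(final_prediction)              # 1, the majority (mode) class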
Bagging (Bootstrap Aggregating)
The most popular technique: start with the original data and draw different random samples from it (with replacement), then train a separate model on each sample - model 1, model 2, model 3, and so on.
Steps for the next project
Building the model
We are going to build 2 ensemble models here - Bagging Classifier and Random Forest Classifier.
First, let's build these models with default parameters and then use hyperparameter tuning to optimize model performance.
We will calculate all three metrics - Accuracy, Precision, and Recall - but the metric of interest here is Recall.
Recall gives the ratio of true positives to actual positives, so high Recall implies low false negatives, i.e. a low chance of predicting a defaulter as a non-defaulter.
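A minimal sketch of this step, assuming a synthetic imbalanced dataset as a stand-in for the actual defaulter data:

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (assumption - not the project dataset)
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=1)

for name, model in [("Bagging", BaggingClassifier(random_state=1)),
                    ("Random Forest", RandomForestClassifier(random_state=1))]:
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(name,
          "Accuracy:", round(accuracy_score(y_test, pred), 3),
          "Precision:", round(precision_score(y_test, pred), 3),
          "Recall:", round(recall_score(y_test, pred), 3))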
On average, approximately what percentage of samples get selected in sampling with replacement?
Approximately 63.2%, i.e. 1 - 1/e: in n draws with replacement, each observation is left out with probability (1 - 1/n)^n ≈ 1/e, so about 63.2% of observations get selected on average.
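A quick simulation makes this concrete - drawing n times with replacement from n observations leaves roughly 63.2% of them represented in the sample:

import numpy as np

rng = np.random.default_rng(0)
n = 10_000
bootstrap_sample = rng.integers(0, n, size=n)   # n draws with replacement
print(len(np.unique(bootstrap_sample)) / n)     # ~0.632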
In Random Forest, to get different n-models with the same algorithm, we use
Bootstrap aggregation
Bagging refers to
Bootstrap sampling and aggregation
Sampling with replacement makes estimators more and more correlated
False
Which of the following statement regarding the 'stratify' parameter of train_test_split() function in sklearn is correct?
It is used when the data is imbalanced to achieve stratified sampling, which ensures that relative class frequencies are approximately preserved in the train and test sets.
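A short sketch (synthetic imbalanced labels assumed) of how stratify preserves class proportions:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

# Class proportions stay close to the original ~90/10 split in both subsets
print(np.bincount(y_tr) / len(y_tr))
print(np.bincount(y_te) / len(y_te))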
Which of the following are true for bagging?
Makes the model more robust; guards against overfitting the model to the original data; follows parallel model building
Which of the following problems of a decision tree can be overcome by random forest?
Overfitting; instability due to changes in data
Which of the following predictive models would be considered overfit if the metric of interest is 'accuracy'?
Random Forest: Train accuracy: 1.0, Test accuracy: 0.65
When there are N number of models in an ensemble technique the predicted outcome is decided on the basis of
Voting in a classification problem; averaging in a regression problem
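An illustrative sketch with hypothetical predictions from 3 models, showing majority voting for classification and averaging for regression:

import numpy as np

# Classification: each row holds one model's predicted classes for 4 test points
clf_preds = np.array([[1, 0, 1, 1],
                      [1, 1, 1, 0],
                      [0, 0, 1, 1]])
print((clf_preds.sum(axis=0) >= 2).astype(int))    # majority vote -> [1 0 1 1]

# Regression: each row holds one model's predicted values for 4 test points
reg_preds = np.array([[2.0, 3.1, 4.0, 5.2],
                      [2.2, 2.9, 4.4, 5.0],
                      [1.8, 3.0, 4.2, 5.4]])
print(reg_preds.mean(axis=0))                      # averaged prediction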
In bagging, we use the same
algorithm and give it different data points (bootstrap samples) to obtain different models.
Which of the following hyperparameters of Random Forest is useful in dealing with imbalanced data by giving more importance to the minority class?
class_weight
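A minimal sketch of this parameter in use (synthetic imbalanced data assumed); class_weight="balanced" weights each class inversely to its frequency, so the minority class gets more importance:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
rf = RandomForestClassifier(class_weight="balanced", random_state=0)  # upweights the minority class
rf.fit(X, y)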
boosting
Constructs models in sequential order, where each subsequent model tries to correct the errors of the previous ones.
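A brief sketch of a boosting model (AdaBoost here, synthetic data assumed), which fits estimators one after another so each new one focuses on the mistakes of those before it:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=500, random_state=0)
boost = AdaBoostClassifier(n_estimators=50, random_state=0)  # 50 sequentially built weak learners
boost.fit(X, y)
print(boost.score(X, y))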
Bagging classifier can only have decision tree as the base estimator.
false
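A sketch showing a non-tree base estimator - here a logistic regression is bagged instead of a decision tree (synthetic data assumed; in sklearn versions before 1.2 the parameter is called base_estimator rather than estimator):

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=0)
bag_lr = BaggingClassifier(estimator=LogisticRegression(max_iter=1000),
                           n_estimators=10, random_state=0)
bag_lr.fit(X, y)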
In bootstrap sampling, the same observation can not be picked up more than once.
false
In ensemble methods, errors by each base estimator should not be independent of each other
false
Random forests always need to be pruned to get a good prediction.
false
Bagging is
homogeneous, therefore all the algorithms used have to be the same.
If p is the probability of choosing an observation then which of the following are true for sampling with replacement?
p remains the same at each stage for all observations
Random Forest
Randomly selects observations/rows and a subset of features to build multiple decision trees, and then aggregates the results across all the trees (averaging for regression, majority vote for classification)
Bootstrap sampling and aggregation
Sampling from the data with replacement and using a different sample to train each tree, then aggregating their predictions
Ensemble techniques leverage the "low computational time" and compensate for the "high error rate of weak learners" by combining them to create a more computationally complex model with a lower error rate.
true
Random forest randomly picks a subset of independent variables for each node's split. If m is the size of the subset and M is the total number of independent variables, then m is generally much smaller than M (a common default is m ≈ √M for classification).
true
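A minimal sketch: with M features, setting max_features="sqrt" makes each split consider only m ≈ √M randomly chosen features (synthetic data assumed):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=16, random_state=0)   # M = 16 features
rf = RandomForestClassifier(max_features="sqrt", random_state=0)           # m ≈ 4 features per split
rf.fit(X, y)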
ensemble learning
using multiple models to obtain better predictive performance