Decision Trees
How does the complexity parameter affect tree construction?
The lower the complexity parameter, the more complex the model - If the complexity parameter (i.e., alpha) is low, the difference between the errors of the pruned tree and the original tree is small. This means the pruned tree is almost identical to the original tree, so the model is still very complex. Hence, the lower the complexity parameter, the more complex the model.
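As an illustration, here is a minimal sketch of this effect in sklearn, assuming X_train and y_train already exist (the alpha values are illustrative): lower ccp_alpha prunes less, leaving a deeper tree with more leaves.

```python
from sklearn.tree import DecisionTreeClassifier

# Lower ccp_alpha -> less pruning -> a larger, more complex fitted tree.
for alpha in [0.0, 0.01, 0.05]:
    tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=42)
    tree.fit(X_train, y_train)
    print(f"alpha={alpha}: depth={tree.get_depth()}, leaves={tree.get_n_leaves()}")
```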
The entropy value of 0 for a node represents
Most pure node - An entropy value of 0 for a node indicates that it is the purest possible node, containing observations of one particular class only.
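For reference, a minimal sketch (the entropy helper below is hypothetical, not library code) that computes a node's entropy from its class counts:

```python
import numpy as np

def entropy(class_counts):
    """Entropy = -sum(p_i * log2(p_i)) over the classes present in the node."""
    counts = np.asarray(class_counts, dtype=float)
    p = counts / counts.sum()
    p = p[p > 0]  # classes with zero probability contribute nothing
    return -np.sum(p * np.log2(p))

print(entropy([10, 0]))  # 0.0 -> pure node
print(entropy([5, 5]))   # 1.0 -> maximally impure binary node
```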
What is the mechanism behind Pre-Pruning?
Early stopping - Stop building the tree by applying restrictions on different hyperparameters - Pre-pruning stops the tree from growing to its full length by placing bounds on hyperparameters such as the maximum depth or the minimum number of samples required to split a node.
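A minimal pre-pruning sketch in sklearn, assuming X_train and y_train exist; the hyperparameter values are illustrative:

```python
from sklearn.tree import DecisionTreeClassifier

pre_pruned_tree = DecisionTreeClassifier(
    max_depth=4,           # stop splitting below this depth
    min_samples_split=20,  # a node needs at least 20 samples to be split
    min_samples_leaf=10,   # every leaf must keep at least 10 samples
    random_state=42,
)
pre_pruned_tree.fit(X_train, y_train)
```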
Values of hyperparameters are already defined and cannot be changed.
False - Parameters are learned from the data whereas hyperparameters are values that are given by the practitioner and can be changed.
The highest value of Gini Impurity (0.5) means that the node is pure, i.e., the data set contains only one class.
False - The highest value of Gini Impurity (0.5) means that the node is perfectly impure, i.e., the data set contains an equal number of both classes.
Consider an example with four features. Select the feature that will be used to split the root node, given:
Feature 2 gives Gini gain: 0.11 - The feature that gives the highest Gini gain will be used to split the root node.
Which of the following is the correct expression for Gini impurity G(k) in terms of probability?
G(k) = 1 - Σ Pi^2 - where Pi is the probability of choosing class i
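As an illustration, a minimal sketch (the gini helper is hypothetical) implementing G(k) = 1 - Σ Pi^2 from a node's class counts:

```python
import numpy as np

def gini(class_counts):
    """Gini impurity G(k) = 1 - sum(p_i^2) over the classes in the node."""
    counts = np.asarray(class_counts, dtype=float)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini([10, 0]))  # 0.0 -> pure node
print(gini([5, 5]))   # 0.5 -> maximum impurity for two classes
```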
Which of the following functions helps in finding the optimal hyperparameters for a machine learning model?
GridSearchCV() - GridSearchCV() helps in finding the optimal hyperparameters for a machine learning model by evaluating the model across different combinations of hyperparameters and choosing the combination that gives the best performance.
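A minimal GridSearchCV sketch, assuming X_train and y_train exist; the grid values are illustrative:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

param_grid = {
    "max_depth": [3, 5, 7, None],
    "criterion": ["gini", "entropy"],
}
search = GridSearchCV(
    estimator=DecisionTreeClassifier(random_state=42),
    param_grid=param_grid,
    cv=5,                # 5-fold cross-validation per combination
    scoring="accuracy",
)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```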
What is the strategy used in the Cross-Validation technique?
It splits data into n groups and runs n experiments; in each experiment, n-1 groups are used to train a model and the model is tested on the left-out group - In the Cross-Validation technique, the data is split into n groups and n experiments are run, each training on (n-1) groups and testing on the group that was left out.
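A minimal sketch of this strategy using sklearn's cross_val_score, assuming X and y exist:

```python
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# cv=5 splits the data into 5 groups; each group is held out once
# while the model is trained on the remaining 4.
scores = cross_val_score(DecisionTreeClassifier(random_state=42), X, y, cv=5)
print(scores, scores.mean())
```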
What will the following code return? Decision_model.feature_importances_ # Decision_model is a decision tree model fitted on data.
It will return an array with the magnitude of feature importances based on which we can decide what importance the decision tree has given to each feature - The code returns an array of importance scores, one per feature, indicating how much each feature contributed to the tree's splits.
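A minimal usage sketch, assuming Decision_model is a fitted tree and feature_names is a hypothetical list of column names:

```python
importances = Decision_model.feature_importances_
for name, score in zip(feature_names, importances):
    print(f"{name}: {score:.3f}")
# The importances are normalized to sum to 1 (for a tree with at least one split).
```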
A small change in the data can result in large changes in the structure of a decision tree
True - A small change in the data can result in large changes in the structure of a decision tree and therefore the interpretations from the model will change.
Select True/False for the following statement: A decision tree at the maximum value of the complexity parameter (alpha) would just be a root node.
True - If the complexity parameter (i.e., alpha) is at its maximum, the difference between the errors of the pruned tree and the original tree is at its maximum, which means the pruned tree is reduced to just the root node.
Which of the following impurity measures can be used for a regression decision tree?
Variance - Variance is used as the impurity measure for a regression decision tree where the target variable is a continuous value.
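A minimal regression-tree sketch, assuming X_train and a continuous y_train exist; note that in recent sklearn versions the variance-based criterion is named "squared_error" (older versions used "mse"):

```python
from sklearn.tree import DecisionTreeRegressor

reg_tree = DecisionTreeRegressor(criterion="squared_error", max_depth=4, random_state=42)
reg_tree.fit(X_train, y_train)
print(reg_tree.predict(X_train[:5]))  # continuous predictions
```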
The CART algorithm simply chooses the right "split" by finding the split that
maximizes the Gini gain - The CART algorithm simply chooses the right "split" by finding the split that maximizes the Gini gain.
At the root node, the Gini Impurity value is 0.48. After branching, the weighted average value of Gini Impurity is 0.41. Then the Gini gain is
0.07 - In this case, the Gini gain is the difference between the Gini impurity at the root node and the weighted average value of Gini Impurity after branching. Therefore, Gini gain = 0.48 - 0.41 = 0.07
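The same arithmetic as a minimal sketch (gini_gain is a hypothetical helper, not library code):

```python
def gini_gain(parent_impurity, weighted_child_impurity):
    """Gini gain = impurity of the parent - weighted impurity of the children."""
    return parent_impurity - weighted_child_impurity

print(gini_gain(0.48, 0.41))  # 0.07 (up to floating-point rounding)
```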
What is the highest value that GINI impurity can take when we have a binary (0 and 1) classification problem?
0.5 - As given in the problem, we have two classes, 0 and 1. Let P(0) and P(1) be the probabilities of choosing 0 and 1 respectively, with P(0) + P(1) = 1. Then Gini impurity = P(0)*(1 - P(0)) + P(1)*(1 - P(1)) = P(0)*P(1) + P(1)*P(0) = 2*P(0)*P(1). The product 2*P(0)*P(1) takes its maximum value when P(0) = P(1) = 0.5. Therefore, in this case Gini impurity(max) = 2*0.5*0.5 = 0.5.
What are the problems with an overfitted decision tree? A) It is not able to generalize well on new data samples B) It captures noise too C) It forms complex rules for prediction D) It performs well on the training set and poorly on the test set
A, B, C, and D - The problem with an overfitted decision tree is that it performs well on the training set but poorly on the test set, i.e., it is not able to generalize well on new data samples. In the process of overfitting the training data, the tree forms complex rules for prediction and captures noise too.
Which of the following is/are application(s) of the decision tree? A) Classifying tumors B) Spam mail classification C) Stock Price Prediction
A, B, and C - A Decision Tree can be used for both classification and regression problems. Classifying tumors and spam mail classification are classification problems, since the target variable is a discrete value, whereas stock price prediction is a regression problem, since the target variable is a continuous value.
Which of the following is/are correct regarding pruning? A) It reduces the depth of the tree B) It helps in creating a generalized model C) It makes the tree complex and difficult to interpret D) There are two types of pruning - pre and post
A, B, and D - Pruning reduces the depth of the tree and helps in creating a generalized model by avoiding overfitting the training data. It makes the tree simpler and easier to interpret. There are two types of pruning - pre and post.
Decision Trees can be used for
Both Classification and Regression - Decision Trees can be used for both classification and regression problems. In classification problems, the target variable is a discrete value/class whereas, in regression problems, the target variable is a continuous value.
What is the mechanism behind Post-Pruning?
Building a complete tree and then pruning sub-trees. - The mechanism behind Post-Pruning is to build the complete tree first and then prune away the sub-trees that add little value.
The final prediction by a regression decision tree model is
Continuous values - The regression decision tree model is used when the target variable is a continuous value, so its final prediction is a continuous value (typically the average of the training targets in the leaf a sample falls into).
Identify the machine learning classification problems from the following options a) Intrusion Detection b) Identifying the presence of cancer c) Predicting the price of a car d) Weather prediction (amount of rainfall, temperature)
Options a and b - a) For Intrusion detection, the target variable is a discrete value/class. - b) For Identifying the presence of cancer, the target variable is a discrete value/class. - c) For Predicting the price of a car, the target variable is a continuous value i.e price. - d) For Weather prediction, the target variable is a continuous value like the amount of rainfall or temperature, etc. - In classification problems, the target variable is a discrete value/class whereas, for regression problems, the target variable is a continuous value.
Identify the machine learning classification problems from the following options a) Approve Loan / Disapprove Loan b) Predict sports score c) Admit a person in ICU or not d) Pricing of a house
Options a and c - a) For Approve Loan / Disapprove Loan, the target variable is a discrete value/class. - b) For Predict sports score, the target variable is a continuous value i.e score. - c) For Admit a person in ICU or not, the target variable is a discrete value/class. - d) For Pricing of a house, the target variable is a continuous value i.e price. - In classification problems, the target variable is a discrete value/class whereas, for regression problems, the target variable is a continuous value.
What is the difference between parameters and hyperparameters?
Parameters - are learned from data, hyperparameters - the value is defined by practitioner - Parameters are learned from the data while hyperparameters are defined by the practitioner. For example, the coefficients in linear regression or logistic regression are model parameters while max_depth is a model hyperparameter.
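A minimal sketch of the distinction, assuming X_train and y_train exist; it mirrors the example above (logistic-regression coefficients as learned parameters, a value we set as a hyperparameter):

```python
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(C=1.0)  # C is a hyperparameter set by the practitioner
model.fit(X_train, y_train)
print(model.coef_)                 # the coefficients are parameters learned from data
```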
What is one of the main problems of the decision tree?
Prone to Overfitting - One of the main problems of the decision tree is that unless the tree is restricted, it will grow to its full length to achieve complete homogeneity and in this process will overfit the training data.
The first node of a Decision tree is known as
Root node - The root node is the first node of a Decision tree from where the entire branching starts.
The score() function used on a decision tree model will return
The accuracy of the Decision Tree model - The score() function used on a decision tree classifier returns the mean accuracy of the model on the given data and labels.
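A minimal sketch, assuming Decision_model is a fitted decision tree classifier and X_test/y_test form a held-out split:

```python
acc = Decision_model.score(X_test, y_test)  # mean accuracy on the test set
print(f"Test accuracy: {acc:.3f}")
```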
Which of the following is correct if the max_depth of a tree is set to None?
The tree will continue to grow until all nodes of the tree are pure. - If max_depth is set to None, the tree is not restricted by depth and will continue to grow until all nodes are pure, i.e., complete homogeneity is achieved (unless another restriction such as min_samples_split stops it first).
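A minimal sketch, assuming X_train and y_train exist, showing an unrestricted tree memorizing the training data:

```python
from sklearn.tree import DecisionTreeClassifier

full_tree = DecisionTreeClassifier(max_depth=None, random_state=42)
full_tree.fit(X_train, y_train)
print(full_tree.get_depth(), full_tree.get_n_leaves())
print(full_tree.score(X_train, y_train))  # typically 1.0 -> training data fully memorized
```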
Why do we prune decision trees?
To avoid Overfitting - Pruning of decision trees is done to avoid overfitting because if the tree is not pruned/restricted, then it will grow to its full length to achieve complete homogeneity and try to capture every minute detail in training data.
Which hyperparameter helps to perform post-pruning in sklearn decision tree classifier?
ccp_alpha - The ccp_alpha hyperparameter helps to perform post-pruning in the sklearn decision tree classifier.
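A minimal post-pruning sketch, assuming X_train and y_train exist: compute the cost-complexity pruning path of the full tree, then refit with increasing ccp_alpha values:

```python
from sklearn.tree import DecisionTreeClassifier

path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(X_train, y_train)
for alpha in path.ccp_alphas[:: max(1, len(path.ccp_alphas) // 5)]:
    pruned = DecisionTreeClassifier(random_state=42, ccp_alpha=alpha)
    pruned.fit(X_train, y_train)
    print(f"alpha={alpha:.4f}, nodes={pruned.tree_.node_count}")  # larger alpha -> fewer nodes
```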
Which hyperparameter will be used to change the splitting criterion of a decision tree from 'gini' to 'entropy'?
criterion - criterion is the hyperparameter used to change the splitting measure of a decision tree from 'gini' to 'entropy'.
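A minimal sketch of the switch:

```python
from sklearn.tree import DecisionTreeClassifier

gini_tree = DecisionTreeClassifier(criterion="gini")        # the default
entropy_tree = DecisionTreeClassifier(criterion="entropy")  # split on information gain
```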