Pipeline and Hyperparameter Tuning
Which of the following statements is true about grid search?
Grid search looks at each combination of hyperparameters in a sequence - Grid search exhaustively evaluates every combination of hyperparameters in the specified grid, one after another.
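As a minimal sketch of what "each combination" means, the snippet below enumerates a small grid with itertools.product. The hyperparameter names and values are illustrative assumptions, not taken from any particular model.

```python
from itertools import product

# Hypothetical grid: names and values are for illustration only.
param_grid = {
    "max_depth": [2, 4, 6],
    "min_samples_leaf": [1, 5],
}

# Build every combination in a fixed order, the way a grid search visits them.
names = list(param_grid)
combinations = [
    dict(zip(names, values)) for values in product(*param_grid.values())
]

for combo in combinations:
    print(combo)
```

The number of combinations is the product of the list lengths, so the grid grows quickly as hyperparameters are added.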
Fill in the blanks with an appropriate option. Imbalanced data creates an issue because most classifiers/estimators work to improve the ______________, and thus the estimators are biased towards the _________.
Overall accuracy, majority class - Estimators work to improve overall accuracy but in the case of imbalanced data the estimators are biased towards the majority class.
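A small worked example may help here. The labels below are made up (95 majority-class samples and 5 minority-class samples), and the "classifier" simply predicts the majority class every time.

```python
# Hypothetical labels: 95 majority-class (0) samples and 5 minority-class (1) samples.
y_true = [0] * 95 + [1] * 5

# A "classifier" that always predicts the majority class.
y_pred = [0] * len(y_true)

# Overall accuracy: fraction of predictions that match the true label.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall on the minority class: fraction of actual 1s that were predicted as 1.
minority_total = sum(t == 1 for t in y_true)
minority_hits = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
minority_recall = minority_hits / minority_total

print(accuracy, minority_recall)
```

The overall accuracy looks high even though no minority-class sample is ever detected, which is why accuracy alone can be misleading on imbalanced data.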
State whether the following statement is True or False "Based on the performance of the model on test data, we should tweak the hyperparameters of the model to get a better result"
False - We should tweak the hyperparameters based on the validation set performance.
Which of the following is the purpose of using pipeline objects?
Standardize the steps used in the ML projects - Pipeline objects are used to standardize the steps used in the ML project.
Grid search will ensure that the best combination of hyperparameters is obtained universally.
False - Grid search selects the best combination from the grid. The best combination of the hyperparameters might exist outside the grid. Hence, grid search does not ensure the best combination of the hyperparameters.
State whether the following statement is True or False "Hyper-parameters are learned from the data"
False - Hyperparameters are set explicitly before training to tune the model; they are not learned from the data.
State whether the following statement is True or False. "Pipelines can only be used for classification purposes and not for regression"
False - Pipelines can be used for both classification and regression.
State whether the following statement is True or False "RandomizedSearchCV needs to be provided with a hyperparameters space in the form of a discrete list"
False - RandomizedSearchCV picks random combinations of the hyperparameters. A search space still has to be provided, but it does not have to be a discrete list as in grid search: each hyperparameter can also be given as a distribution to sample from.
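The following is a minimal sketch of a randomized search over a continuous distribution. RandomizedSearchCV takes its search space through the param_distributions argument; the dataset, the choice of LogisticRegression, and the range for C are assumptions made for illustration.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data, used only to make the example runnable.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# C is given as a continuous distribution to sample from, not a fixed list.
param_distributions = {"C": loguniform(1e-3, 1e2)}

search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions=param_distributions,
    n_iter=5,          # number of sampled combinations, chosen by the user
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```

A plain list of values is also accepted for any hyperparameter, in which case values are drawn uniformly from the list.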
State whether the following statement is True or False "Each step/stage in a pipeline should have a transform function"
False - The Pipeline object requires all the stages to have a "transform()" function except for the last stage which is an estimator.
State whether the following statement is True or False "The training dataset and test dataset should be scaled together to maintain uniformity"
False - The scaler should be fitted on the train set only and then applied to the test set. Scaling the two sets together lets test-set statistics influence training, which is data leakage.
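One common way to avoid this leakage is sketched below, assuming StandardScaler as the scaler and random synthetic data: the scaler's statistics are computed from the train set only and then reused on the test set.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic features, used only to make the example runnable.
rng = np.random.default_rng(0)
X = rng.normal(loc=10.0, scale=3.0, size=(100, 2))

X_train, X_test = train_test_split(X, test_size=25, random_state=0)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # statistics come from the train set only
X_test_scaled = scaler.transform(X_test)        # test set reuses the train statistics
```

The scaled test set will generally not have exactly zero mean and unit variance, and that is expected.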
What are the steps followed in implementing a pipeline?
1. Pipeline(), 2. Pipeline.fit(), 3. Pipeline.score() - The steps followed in implementing a pipeline are: - Pipeline() - The pipeline is defined as a list of tuples - Pipeline.fit() - The pipeline is fitted on the train set - Pipeline.score() - The pipeline object checks the performance.
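The three steps can be sketched as follows. The step names ("scaler", "model"), the synthetic dataset, and the choice of StandardScaler with LogisticRegression are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, used only to make the example runnable.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 1. Define the pipeline as a list of (name, step) tuples.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("model", LogisticRegression()),
])

# 2. Fit the pipeline on the train set.
pipe.fit(X_train, y_train)

# 3. Score on held-out data (mean accuracy for a classifier).
test_accuracy = pipe.score(X_test, y_test)
print(test_accuracy)
```

Replacing the last step with a regressor such as LinearRegression works the same way, in which case score() reports R squared.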
If there are 4 steps in the pipeline, how many times will the fit_transform() function be called?
3 - The pipeline calls fit_transform() on every step except the last. With 4 steps, fit_transform() is called 3 times, and the last step, the estimator, is only fitted.
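The count can be checked with a small experiment. RecordingTransformer below is a hypothetical pass-through transformer written for this sketch (it is not part of scikit-learn); it records every fit_transform() call made while a 4-step pipeline is fitted.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

fit_transform_calls = []  # labels of the steps whose fit_transform() was called


class RecordingTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical pass-through transformer that records fit_transform calls."""

    def __init__(self, label="step"):
        self.label = label

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X  # leaves the data unchanged

    def fit_transform(self, X, y=None, **fit_params):
        fit_transform_calls.append(self.label)
        return self.fit(X, y).transform(X)


X, y = make_classification(n_samples=100, n_features=4, random_state=0)

# Four steps: three transformers followed by the final estimator.
pipe = Pipeline([
    ("t1", RecordingTransformer("t1")),
    ("t2", RecordingTransformer("t2")),
    ("t3", RecordingTransformer("t3")),
    ("model", LogisticRegression()),
])
pipe.fit(X, y)
print(fit_transform_calls)
```

Only the three transformer steps appear in the recorded list; the final estimator receives the transformed data through fit().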
Which of the following statements are True? 1. GridSearchCV returns the best model from a search over a parameter_grid. 2. GridSearchCV uses cross validation to compute the mean score for each set of hyperparameters.
Both 1 and 2
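Both statements can be seen in a short sketch. The dataset, the DecisionTreeClassifier, and the grid values are assumptions for illustration; cv_results_, best_params_, best_score_ and best_estimator_ are attributes of a fitted GridSearchCV.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used only to make the example runnable.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

param_grid = {"max_depth": [2, 4], "min_samples_leaf": [1, 5]}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)

# One mean cross-validated score per combination in the grid.
mean_scores = search.cv_results_["mean_test_score"]

# With the default refit=True, the best combination is refitted on all of X, y.
best_model = search.best_estimator_
print(search.best_params_, search.best_score_)
```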
Which of the following data collection methods is used as an input to a Pipeline object?
List of tuples - Pipeline object takes a list of tuples as an input.
The number of iterations in randomized search is equal to?
n_iter (no. of iterations) - defined by user - RandomizedSearchCV tries out random combinations of the parameters n times, and this is controlled by an argument called 'n_iter'. The number of iterations is defined by the user.
State which of the following statements are true - We should focus only on improving the performance on the training set, and the performance on the testing set will improve automatically. - With an increase in model complexity, the testing error keeps on increasing along with the training error.
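None of the two - 1. Focusing only on the performance on the train set can lead to overfitting, and the model may then perform poorly on the test set. 2. With an increase in model complexity, the training error usually keeps decreasing, while the testing error typically decreases at first and then rises once the model starts to overfit.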
Which of the following packages is used to import GridSearchCV and RandomizedSearchCV?
sklearn.model_selection - GridSearchCV and RandomizedSearchCV are both imported from sklearn.model_selection (the name is lowercase).
On which splits of the data is the model trained, are the hyperparameters tuned, and is the final evaluation done?
Training, validation, test - The model is trained on the train set, hyperparameters are tuned on the validation set and final performance is checked on the test set.
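One simple way to obtain the three splits is to call train_test_split twice, as sketched below. The 60/20/20 sizes and the synthetic dataset are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic data, used only to make the example runnable.
X, y = make_classification(n_samples=100, n_features=5, random_state=0)

# First hold out 20 samples as the test set (an integer test_size is a sample count).
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=20, random_state=0
)

# Then split the remaining 80 samples into train (60) and validation (20).
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=20, random_state=0
)

print(len(X_train), len(X_val), len(X_test))
```

The test set is touched only once, after the hyperparameters have been chosen on the validation set.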
Which of the following statements is true about pipelines?
Transformations are always applied on train dataset and test dataset separately
State whether the following statement is True or False "The regression coefficients of the features are the model parameters of a linear regression model"
True - The coefficients of the features are model parameters because they are obtained by training the model on the data.
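The difference between a hyperparameter and a model parameter can be sketched with LinearRegression. The data below is synthetic and noise-free, generated from an assumed relationship y = 3*x1 - 2*x2 + 5, so the learned coefficients can be compared against known values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Noise-free synthetic data generated from y = 3*x1 - 2*x2 + 5.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + 5

# fit_intercept is a hyperparameter: it is chosen before training.
model = LinearRegression(fit_intercept=True)
model.fit(X, y)

# coef_ and intercept_ are model parameters: they are learned from the data.
print(model.coef_, model.intercept_)
```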
State whether the following statement is True or False "RandomizedSearchCV is computationally less expensive as compared to GridSearchCV"
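True - RandomizedSearchCV evaluates only a fixed number of randomly sampled combinations instead of every combination in the grid. This usually makes it less computationally expensive, provided the number of sampled combinations is smaller than the size of the grid.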
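A rough count of model fits illustrates the cost difference. The search space sizes, the number of folds, and the n_iter value below are hypothetical, and the final refit on the full data is ignored.

```python
from math import prod

# Hypothetical search space: number of candidate values per hyperparameter.
values_per_hyperparameter = {"max_depth": 5, "min_samples_leaf": 4, "max_features": 3}
cv_folds = 5
n_iter = 10  # combinations sampled by a randomized search (user-defined)

# Grid search fits every combination once per fold.
grid_candidates = prod(values_per_hyperparameter.values())
grid_fits = grid_candidates * cv_folds

# Randomized search fits only the sampled combinations once per fold.
random_fits = n_iter * cv_folds

print(grid_fits, random_fits)
```

With these numbers the grid search needs 300 fits and the randomized search needs 50; the gap widens as more hyperparameters or values are added to the grid.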
State whether the following statement is True or False "RandomizedSearch is known to perform better than GridSearch"
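True - For the same computational budget, random search often has a better chance of hitting a good combination of the hyperparameters, especially when only a few of them matter. Hence, random search is often reported to perform better than grid search, although this is not guaranteed.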
State whether the following statement is True or False. 'The estimator does not have a transform() function'
True - The estimator does not have a 'transform()' function because it builds the model using the data from the previous steps; it does not transform the data.
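This can be checked quickly with hasattr. The sketch assumes StandardScaler as a typical transformer and LogisticRegression as a typical final estimator; other scikit-learn classes may behave differently.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# A transformer exposes transform(); a typical predictor exposes predict() instead.
scaler_has_transform = hasattr(StandardScaler(), "transform")
model_has_transform = hasattr(LogisticRegression(), "transform")
model_has_predict = hasattr(LogisticRegression(), "predict")

print(scaler_has_transform, model_has_transform, model_has_predict)
```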
