ML Pipeline and Hyperparameter Tuning

Which of the following is true about grid search?

Grid search looks at each combination of parameters in a sequence and then chooses the optimal one

How do we know which hyperparameter values are best?

There is no industry standard

Two generic approaches to searching the hyperparameter space include

GridSearchCV (considers all parameter combinations) and RandomizedSearchCV (samples a given number of candidates from a parameter space with a specified distribution)
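The two approaches can be sketched side by side; a minimal example, assuming a KNN classifier on the iris dataset with an illustrative parameter grid:

```python
# Sketch: GridSearchCV tries every combination; RandomizedSearchCV
# samples a fixed number (n_iter) of candidates from the same space.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

param_grid = {"n_neighbors": [3, 5, 7, 9], "weights": ["uniform", "distance"]}

# Grid search: all 4 * 2 = 8 combinations are evaluated.
gs = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
gs.fit(X, y)

# Randomized search: only n_iter candidates are sampled.
rs = RandomizedSearchCV(KNeighborsClassifier(), param_grid, n_iter=4,
                        cv=5, random_state=0)
rs.fit(X, y)

print(gs.best_params_)
print(rs.best_params_)
```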

State which of the following statements are true: I. We should focus only on improving the performance on the training set, and the performance on the test set will improve automatically. II. The testing error keeps falling along with the training error as model complexity increases.

Neither of the two

Pipeline

An object that makes the workflow more efficient and clean. This sklearn class simplifies chaining the transformation steps and the final model; combined with GridSearchCV, it helps search over the hyperparameter space applicable at each stage
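A minimal sketch of this combination, assuming a scaler-plus-SVM pipeline on iris; note the sklearn convention that grid keys are prefixed with the step name:

```python
# Sketch: a Pipeline chaining a scaler and an SVM, searched with
# GridSearchCV. "svc__C" targets the C parameter of the "svc" step.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC())])

param_grid = {"svc__C": [0.1, 1, 10]}  # step-name prefix: <step>__<param>

gs = GridSearchCV(pipe, param_grid, cv=5)
gs.fit(X, y)
print(gs.best_params_)
```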

testing data should be

separately transformed using the same functions that were used to transform the rest of the data for model building and hyperparameter tuning

Pipelines are objects that help us _______ the processes/steps used in the ML project cycle.

standardize

To prevent data leakage while tuning hyperparameters, split the data into three parts

training, validation, testing

State whether the following statement is True or False "Grid Search is a computationally extensive cross-validation process"

true

Grid search parameter example

from sklearn.model_selection import GridSearchCV
gs = GridSearchCV(knn_clf, param_grid, cv=10)  # cv = number of cross-validation folds

GridSearchCV example

A parameter grid of candidate values for each hyperparameter, e.g. HP1, HP2, HP3, ...

Fill in the blanks with the most appropriate option Data imbalance creates an issue because most of the classifiers/estimators work to improve the ______________, and thus the estimators are biased towards the _________.

Overall accuracy, majority class

State whether the following statement is True or False "GridSearchCV is an exhaustive sampling technique and can be inefficient"

true

State whether the following statement is True or False "Hyperparameters are like handles available to data scientists to tune the performance of their models"

true

State whether the following statement is True or False "The regression coefficients related to the features are the model parameters of a linear regression model"

true

The parameter grid defined for GridSearchCV takes input as

Dictionary

gs.best_params_ does what

extracts the best combination

make_pipeline() differs from Pipeline() as

make_pipeline() does not need the user to input a name for each step, while Pipeline() does
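The difference is easy to see by inspecting the step names of each; a small sketch using an illustrative scaler/SVM pair:

```python
# Sketch: Pipeline requires explicit step names; make_pipeline derives
# them automatically as the lowercased class names.
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

named = Pipeline([("scale", MinMaxScaler()), ("clf", SVC())])
auto = make_pipeline(MinMaxScaler(), SVC())

print([name for name, _ in named.steps])  # ['scale', 'clf']
print([name for name, _ in auto.steps])   # ['minmaxscaler', 'svc']
```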

How to tune a model using train, test and validation split?

1. Pick a combination of hyperparameters 2. Train a model using those hyperparameters 3. Evaluate the model's performance on the validation set 4. Repeat this process for all available combinations 5. Choose the model with the best validation score, and find the final (generalized) score on the test set
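The steps above can be sketched as a manual loop; a minimal example assuming iris data, a KNN model, and an illustrative grid of neighbor counts:

```python
# Sketch of manual tuning with a train/validation/test split.
from sklearn.datasets import load_iris
from sklearn.model_selection import ParameterGrid, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
# Carve off the test set first, then split the rest into train/validation.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0)

best_score, best_params = -1.0, None
for params in ParameterGrid({"n_neighbors": [3, 5, 7]}):  # step 1
    model = KNeighborsClassifier(**params).fit(X_train, y_train)  # step 2
    score = model.score(X_val, y_val)                             # step 3
    if score > best_score:                                        # step 5
        best_score, best_params = score, params

# Final generalized score on the untouched test set.
final = KNeighborsClassifier(**best_params).fit(X_train, y_train)
print(best_params, final.score(X_test, y_test))
```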

GridsearchCV is

1. A basic hyperparameter tuning technique 2. It builds a model for each permutation of all of the given hyperparameter values 3. Each model is evaluated and ranked 4. The combination of hyperparameter values that gives the best-performing model is chosen 5. For every combination, cross-validation is used and the average score is calculated 6. It is an exhaustive sampling of the hyperparameter space and can be quite inefficient

Hyperparameters and tuning steps

1. Select the model type (regressor or classifier) 2. Identify the corresponding parameter space 3. Decide the method for searching and sampling the parameter space 4. Decide the cross-validation scheme to ensure the model will generalize 5. Decide a score function to use to evaluate the model

Pipeline steps

1. Sequentially apply a list of transforms and a final estimator 2. Intermediate steps of the pipeline must be "transforms", that is, they must implement fit and transform methods 3. The final estimator only needs to implement fit 4. Helps standardize the model project by enforcing consistency in building, testing, and production

In a pipeline with 4 steps including the estimation/prediction step, how many times is the fit_transform() function called while calling pipeline.fit()?

3
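This count can be verified with a hypothetical counting transformer: in a 4-step pipeline, only the 3 intermediate transformers have fit_transform() invoked by pipeline.fit(); the final estimator is only fitted.

```python
# Sketch: a 4-step pipeline (3 transformers + 1 estimator). A counting
# transformer records how often fit_transform runs during pipeline.fit().
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

calls = {"fit_transform": 0}

class CountingTransformer(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return X  # identity transform, for illustration only
    def fit_transform(self, X, y=None):
        calls["fit_transform"] += 1
        return X

X, y = load_iris(return_X_y=True)
pipe = Pipeline([
    ("t1", CountingTransformer()),
    ("t2", CountingTransformer()),
    ("t3", CountingTransformer()),
    ("knn", KNeighborsClassifier()),
])
pipe.fit(X, y)
print(calls["fit_transform"])  # 3
```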

Which of the following is NOT true about hyperparameter tuning?

Creates artificial data points

The pipeline object takes input as :

List of tuples

Which of the following can be the hyperparameter space of decision tree?

max_depth, min_samples_split, max_features

Grid search will ensure that the best combination of hyperparameters

The best set of hyperparameters is provided out of the given discrete sample space. It might not be the best combination for the model overall.

How to upgrade the Numpy library?

To upgrade the numpy library, you can run: !pip install numpy==1.20.3 --user in your Jupyter notebook OR pip install numpy==1.20.3 --user in Anaconda prompt

Fill in the blank with the most appropriate option Dataset is divided into 3 parts and below steps are followed: The model is trained on ----- dataset Hyperparameters are tuned on ----- dataset Final performance is checked on ----- dataset

Training, validation, test

Which of the following is true for pipelines?

Transformations are always applied on train dataset and test dataset separately

Which of the following statements is NOT true about data leakage?

Using different datasets for training and testing leads to data leakage

Make_Pipeline

A function that creates the pipeline and automatically names each step. We don't need to specify names; they will be set to the lowercase of their types automatically

What is the difference between fit, fit_transform, and transform?

a. fit - used to fit the parameters of the function b. transform - transforms the data using the parameters fitted with the fit function c. fit_transform - first fits the parameters of the function and then transforms the data
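A minimal sketch of the three calls, assuming a StandardScaler on toy data; note the test data is transformed with the parameters learned from the training data:

```python
# Sketch: fit learns parameters (mean/std) on the training data;
# transform applies them; fit_transform does both in one call.
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0], [2.0], [3.0]])
X_test = np.array([[2.0]])

scaler = StandardScaler()
scaler.fit(X_train)                        # learn mean_ and scale_
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)   # reuse training parameters

# fit_transform is equivalent to fit followed by transform:
same = StandardScaler().fit_transform(X_train)
print(scaler.mean_, X_test_scaled)
```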

KNeighborsClassifier list of parameters example

algorithm='auto'; leaf_size=30; metric='minkowski'; n_neighbors=5

Hyperparameters are supplied as

arguments to the model algorithms while initializing them, e.g. setting the criterion for decision tree building: dt_model = DecisionTreeClassifier(criterion="entropy")

Grid search follows a particular pattern and goes through

every set of hyperparameters available; the outputs are always the same

State whether the following statement is True or False "Based on the performance of the model on test data, we should tweak the hyper-parameters of the model to get a better result"

false

State whether the following statement is True or False "Each step/stage in a pipeline should have a transform function so that the data fed to the next step is transformed"

false

State whether the following statement is True or False "Greater model complexity always implies better model performance on test dataset"

false

State whether the following statement is True or False "Grid search gives different outputs every time you run it"

false

State whether the following statement is True or False "Grid search will ensure that the best combination of hyperparameters(from all possible values of hyperparameters) are highlighted"

false

State whether the following statement is True or False "Hyper-parameters are learned from the data"

false

State whether the following statement is True or False "In the transformation step of the pipeline, transformation is fit on the training dataset and then train dataset is transformed, similarly transformation is fit on the test dataset and then test dataset is transformed"

false

State whether the following statement is True or False "Pipelines can only be used for classification purpose and not for regression"

false

State whether the following statement is True or False "The training dataset and test dataset should be scaled together to maintain uniformity"

false

State whether the following statement is True or False "make_pipeline is similar to pipelines but it does not require and does not permit, naming the estimators. Instead, their names will be set to the uppercase of their types automatically"

false

Make a Pipeline example code

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
pipe = make_pipeline(MinMaxScaler(), SVC())
print("Pipeline steps:\n{}".format(pipe.steps))

To get a list of hyperparameters for a given algorithm, call the function

get_params(), e.g.:
from sklearn.svm import SVC
svc = SVC()
svc.get_params()

Trying out all the permutations of hyperparameter values looks like this

gs.fit(X_train, y_train)

KNeighborsClassifier is instantiated with its default values like this

knn_clf = KNeighborsClassifier()

Every stage in a pipeline needs a transform function except

last step

Hyperparameters are not

learnt from the data as other model parameters are; e.g. the attribute coefficients in a linear model are learnt from data, while the cost of error is input as a hyperparameter

Hyperparameters are

like handles available to the modeler to control the behavior of the algorithm used for modeling

