478 Midterm


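CART cost function

J(k, t_k) = (m_left / m) * G_left + (m_right / m) * G_right, where m_left and m_right are the number of instances in each subset, m = m_left + m_right, and G_left and G_right are the impurities of the subsets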
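
As a rough illustration of the split cost above, the sketch below weights each subset's impurity by its share of the instances. The function name `cart_cost` and the sample counts and impurity values are made up for the example; they are not from the course material.

```python
def cart_cost(m_left, g_left, m_right, g_right):
    """Weighted impurity of a candidate split: (m_left/m)*G_left + (m_right/m)*G_right."""
    m = m_left + m_right
    return (m_left / m) * g_left + (m_right / m) * g_right


# Hypothetical split: 40 instances go left (impurity 0.1), 60 go right (impurity 0.4).
cost = cart_cost(40, 0.1, 60, 0.4)
print(cost)
```

CART would evaluate this cost for each candidate split and keep the split with the lowest value.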

F-score / F1 score

F1 = 2 * (P * R) / (P + R) = 2 / ((1 / P) + (1 / R))
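
A minimal sketch, assuming precision P and recall R are already known, showing that the two forms of the formula above agree. The helper name `f1_from_pr` and the example values are hypothetical.

```python
def f1_from_pr(precision, recall):
    """F1 as 2PR / (P + R); returns 0.0 when both inputs are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


p, r = 0.5, 1.0                 # hypothetical precision and recall
f1 = f1_from_pr(p, r)
harmonic = 2 / (1 / p + 1 / r)  # harmonic-mean form of the same quantity
print(f1, harmonic)
```

Because F1 is a harmonic mean, it stays low unless both precision and recall are high.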

For a Multi-class classifier with 4 labels/classes, what is the dimensionality of the confusion matrix?

4 x 4
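
To see why 4 classes give a 4 x 4 matrix, here is a small sketch that counts (true class, predicted class) pairs. The function name `build_confusion_matrix` and the label lists are invented for illustration, and classes are assumed to be encoded as integers 0 to 3.

```python
def build_confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


# Hypothetical labels for a 4-class problem.
y_true = [0, 1, 2, 3, 3]
y_pred = [0, 1, 2, 3, 2]
matrix = build_confusion_matrix(y_true, y_pred, n_classes=4)
for row in matrix:
    print(row)
```

Correct predictions land on the main diagonal; the single mistake here (true class 3 predicted as 2) lands off the diagonal.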

CART algorithm

Constructs a binary tree; scikit-learn uses the CART algorithm for its implementation of decision trees; Recursive algorithm. It is a greedy algorithm in the sense that it greedily searches for an optimum split at the top level, then repeats the process at each level.

sklearn by default uses entropy as the criterion for splitting (T or F)

False (the default criterion is gini)

A regression model can predict categorical values (T or F)

False

Kernel trick is to transform the data to a lower dimensional space so that it becomes linearly separable (T or F)

False

Normalization is a random shuffling and always hurts results (T or F)

False

Shuffling data randomly before or after splitting to train/test sets would significantly reduce the model performance (T or F)

False

The cost function of decision trees is a weighted average between both gini and entropy of each node. (T or F)

False

Gini

G_i = 1 - sum over k of (p_i,k)^2, where p_i,k is the ratio of class k instances among the training instances in node i
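
A small sketch of the Gini formula above, computed from the class labels that reach a node. The function name `gini` and the example label lists are assumptions made for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


pure = gini(["a", "a", "a", "a"])   # a node holding a single class
mixed = gini(["a", "a", "b", "b"])  # a node split evenly between two classes
print(pure, mixed)
```

A node containing only one class has impurity 0, which is the best case.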

Recall Function

TP / (TP + FN)

Precision function

TP / (TP + FP)
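
The recall and precision formulas can be checked with a quick sketch like the one below. The counts for true positives, false positives, and false negatives are hypothetical.

```python
def precision(tp, fp):
    """Share of positive predictions that are correct: TP / (TP + FP)."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Share of actual positives that are found: TP / (TP + FN)."""
    return tp / (tp + fn)


tp, fp, fn = 8, 2, 4  # hypothetical counts from a binary classifier
prec = precision(tp, fp)
rec = recall(tp, fn)
print(prec, rec)
```

Reducing false negatives raises recall, while reducing false positives raises precision.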

Soft margin & hard margin SVM

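If we strictly impose that all instances be off the street and on the correct side of the decision boundary, this is called hard margin classification; Hard margin SVM only works if the data is linearly separable, and it is quite sensitive to outliers. Soft margin classification relaxes this requirement: it looks for a balance between keeping the street as wide as possible and limiting margin violations (instances that end up in the street or on the wrong side).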

SVM overfitting and underfitting

If you trained an SVM classifier with a linear kernel and it seems to underfit the training set, try changing the kernel to a non-linear kernel such as poly or rbf. If you trained an SVM classifier with a non-linear kernel and it seems to overfit the training set, try changing the kernel to a linear kernel or a non-linear kernel with a lower complexity, e.g. a polynomial with a lower degree. A higher value for the C parameter is more likely to lead to overfitting as it narrows the margins. A lower value for the C parameter is less likely to lead to overfitting as it makes the model more general by widening the margins.

Example of unsupervised learning?

Image clustering

Online learning vs batch learning differences

In batch learning, the system is incapable of learning incrementally; it must be trained using all available data, whereas in online learning, you train the system incrementally by feeding it data instances sequentially, either individually or by small groups called mini-batches

Ridge Regression Cost Function

J(theta) = MSE(theta) + (alpha / 2) * sum over i of (theta_i)^2

Lasso Regression Cost Function

J(theta) = MSE(theta) + alpha * sum over i of |theta_i|
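
The Ridge and Lasso cost functions differ only in the penalty term, which the sketch below makes explicit. The function names, the data, the weights, and the value of alpha are all hypothetical, and the bias term is assumed to be left out of the penalty.

```python
def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def ridge_cost(y_true, y_pred, weights, alpha):
    """MSE plus (alpha / 2) times the sum of squared weights (L2 penalty)."""
    return mse(y_true, y_pred) + 0.5 * alpha * sum(w ** 2 for w in weights)


def lasso_cost(y_true, y_pred, weights, alpha):
    """MSE plus alpha times the sum of absolute weights (L1 penalty)."""
    return mse(y_true, y_pred) + alpha * sum(abs(w) for w in weights)


# Hypothetical targets, predictions, and feature weights (bias excluded).
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 5.0]
weights = [3.0, -4.0]
alpha = 0.1

ridge = ridge_cost(y_true, y_pred, weights, alpha)
lasso = lasso_cost(y_true, y_pred, weights, alpha)
print(ridge, lasso)
```

Setting alpha to 0 removes the penalty, so both costs reduce to plain MSE.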

What ML method is supervised learning but sounds like it shouldn't be?

Logistic Regression

What is overfitting?

The model performs well on the training data but does not generalize well to new data; The model is likely to detect patterns in the noise itself when there is not enough training data or the data is noisy.

What is RMSE used for?

Regression error

Cross validation:

Splits the data into k folds and runs for k iterations; in each iteration, 1 fold is reserved for testing and the remaining k-1 folds are used for training; It is a way to measure the performance of ML models.
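
The fold logic described above can be sketched in plain Python as follows. The generator name `k_fold_indices` is hypothetical, and the sketch assumes the data has already been shuffled, since it cuts contiguous folds.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) once per fold."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(n_samples=10, k=5))
for train, test in folds:
    print("train:", train, "test:", test)
```

Each sample appears in a test fold exactly once, so every instance is used for both training and evaluation across the k iterations.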

Difference between supervised and unsupervised learning

Supervised learning and unsupervised learning both involve training; however, in supervised learning, the training data you feed the algorithm includes the desired solutions, called labels, whereas in unsupervised learning the data is not labeled

Examples of Machine Learning regression problems

Temperature forecast; predicting stock market

MNIST dataset (0-9) handwritten

The features of each data sample are pixel intensities and there are 10 different labels; The features are numbers from 0-255 and the labels are from 0-9; It is often called the "Hello World" of machine learning

Precision vs recall

The lower the number of False Negatives, the higher the recall

A confusion matrix with high scores on the main diagonal indicates a good model performance (T or F)

True

A good way to reduce overfitting is to regularize the model (constrain it) by adding a regularization parameter (T or F)

True

A smaller value for Entropy is better and should be preferred for choosing the feature in decision trees (T or F)

True
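
A short sketch of why lower entropy is preferred: entropy is 0 for a node with a single class and grows as the classes mix. The function name `entropy` and the example labels are made up for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits: -sum of p * log2(p) over the classes present."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


pure = entropy(["a", "a", "a", "a"])   # only one class present
mixed = entropy(["a", "a", "b", "b"])  # two classes in equal proportion
print(pure, mixed)
```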

Clustering is an example of unsupervised learning (T or F)

True

It is common to use 80% of the data for training and 20% for testing (T or F)

True

Comparing AI, ML & DL, one can argue that there is a superset-subset relationship between them, such that DL is a subset of ML and ML is a subset of the broad field of approaches, algorithms and techniques in AI (T or F)

True

Cross validation is an effective way of model evaluation (T or F)

True

Finding an optimal decision tree is an NP complete problem (T or F)

True

Fine tuning model parameters may improve the results of the ML model (T or F)

True

Gradient Descent solves optimization problems on the cost function using gradient matrix and a learning rate which should be neither too small nor too large (T or F)

True
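
The effect of the learning rate can be seen in a one-dimensional sketch. The cost function J(theta) = (theta - 3)^2, the starting point, and the three learning rates are hypothetical choices for the example.

```python
def gradient_descent(grad, theta0, learning_rate, n_steps):
    """Repeatedly step against the gradient: theta <- theta - lr * grad(theta)."""
    theta = theta0
    for _ in range(n_steps):
        theta = theta - learning_rate * grad(theta)
    return theta


def grad(theta):
    """Gradient of the toy cost J(theta) = (theta - 3)^2, minimized at theta = 3."""
    return 2 * (theta - 3)


good = gradient_descent(grad, theta0=0.0, learning_rate=0.1, n_steps=100)
too_small = gradient_descent(grad, theta0=0.0, learning_rate=0.0001, n_steps=100)
too_large = gradient_descent(grad, theta0=0.0, learning_rate=1.1, n_steps=100)
print(good, too_small, too_large)
```

With the moderate rate the result ends up very close to 3; with the tiny rate it barely moves from the start; with the large rate each step overshoots and the value diverges.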

Machine Learning is great for problems for which solutions require a lot of hand-tuning or long lists of rules: one ML algorithm can often simplify code and perform better (T or F)

True

Matplotlib is a Python module that has a wide variety of plotting features and functions and can be used for data visualization (T or F)

True

Normalization is one way of scaling and usually improves the model performance (T or F)

True

Normalization may change the data range (T or F)

True
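
One common form of normalization is min-max scaling, which maps values into the range 0 to 1. The sketch below assumes that form; the function name `min_max_scale` and the pixel values are hypothetical.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no range to spread over.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


pixels = [0, 51, 255]  # hypothetical pixel intensities in the 0-255 range
scaled = min_max_scale(pixels)
print(scaled)
```

The order of the values is preserved; only the range changes.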

Preprocessing the data is a critical step in preparing the data for the ML model and may include cleaning the data by dropping NA values (T or F)

True

ROC curve plots "true positive rate" TPR on y-axis against "false positive rate" FPR on x-axis, and its "area under curve" AUC is a performance measure of ML models (T or F)

True

Regression is predicting a target numeric value, such as the price of a car, given a set of features called predictors (T or F)

True

Some algorithms can deal with partially labeled training data, usually a lot of unlabeled data and a bit of labeled data. This is called semisupervised learning (T or F)

True

Sometimes scaling large values in features may improve the results of the ML model (T or F)

True

Stochastic Gradient Descent uses only one random instance to compute the gradients at every step whereas Batch Gradient Descent uses the whole training set. (T or F)

True

There is a trade-off between precision and recall such that any attempt to increase precision will decrease recall and vice-versa (T or F)

True

Typical supervised learning task is classification (T or F)

True

Gini

A lower Gini index is better because it means lower impurity
