ML final

Why is dual form SVM better than primal form?

1) The kernel trick can be applied to the dual form but not the primal form. 2) When the number of training examples is less than the number of features, the dual SVM learns fewer parameters than the primal SVM (one alpha per example versus one weight per feature).
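As a sketch, the standard soft-margin dual objective illustrates both points: there are n variables alpha_i (one per training example), and the data enter only through inner products, which is exactly where a kernel k(x_i, x_j) can be substituted:

\max_{\alpha}\; \sum_{i=1}^{n} \alpha_i \;-\; \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j\, k(x_i, x_j)
\quad \text{subject to} \quad 0 \le \alpha_i \le C,\;\; \sum_{i=1}^{n} \alpha_i y_i = 0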

What are the steps in multiclass dual soft margin SVM classification using a kernel?

1) Derive the dual soft-margin SVM formulation with the kernel. 2) Use a quadratic programming solver to train the model. 3) Train multiple SVMs, one per class. 4) Given an input x, compute the kernel function k(x_i, x) against the support vectors x_i for prediction.
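A minimal sketch of these steps using scikit-learn's SVC, which solves the dual soft-margin quadratic program internally and handles the multiclass case by training several binary SVMs; the dataset and parameter values here are illustrative assumptions.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Illustrative 3-class dataset
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# SVC solves the dual soft-margin QP and trains several binary SVMs
# internally for the multiclass case.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)

# Prediction evaluates k(x_i, x) between the support vectors x_i
# and the new input x.
print(clf.score(X_test, y_test))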

What are reasonable ways to reduce training error for dual-form soft-margin SVM multiclass classification?

1) Increase the penalty parameter C. 2) Increase the degree of the polynomial kernel. 3) Use an RBF kernel instead of a polynomial kernel.
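A hedged illustration of these three knobs with scikit-learn's SVC; the dataset and specific values are assumptions, and the point is only the direction in which each change moves training error.

from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Illustrative non-linearly-separable data
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

for name, clf in [
    ("poly deg 2, C=1",   SVC(kernel="poly", degree=2, C=1.0)),
    ("poly deg 2, C=100", SVC(kernel="poly", degree=2, C=100.0)),  # larger C
    ("poly deg 5, C=1",   SVC(kernel="poly", degree=5, C=1.0)),    # higher degree
    ("rbf,        C=1",   SVC(kernel="rbf", C=1.0)),               # RBF kernel
]:
    clf.fit(X, y)
    print(name, "training accuracy:", clf.score(X, y))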

What is the Naive Bayes assumption?

That the features x_1, x_2, ..., x_n are conditionally independent given y.
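In symbols, the assumption factors the class-conditional joint distribution into a product of per-feature terms:

P(x_1, x_2, \ldots, x_n \mid y) \;=\; \prod_{i=1}^{n} P(x_i \mid y)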

For Naive Bayes with Gaussian class-conditional distributions and a binary class variable, which combination of means and variances yields a linear decision boundary?

When the two distributions have unequal means and equal variances.
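A one-dimensional sketch of why this works (the same cancellation happens with a shared covariance in higher dimensions): the decision boundary is where the log posterior ratio is zero, and the quadratic terms in x cancel when the variances are equal:

\log\frac{P(y{=}1 \mid x)}{P(y{=}0 \mid x)} = \log\frac{\pi_1}{\pi_0} - \frac{(x-\mu_1)^2}{2\sigma_1^2} + \frac{(x-\mu_0)^2}{2\sigma_0^2} - \log\frac{\sigma_1}{\sigma_0}

With \sigma_1 = \sigma_0 = \sigma this reduces to \log\frac{\pi_1}{\pi_0} + \frac{(\mu_1-\mu_0)}{\sigma^2}\,x + \frac{\mu_0^2-\mu_1^2}{2\sigma^2}, which is linear in x.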

T or F : A convex objective function cannot learn a non-convex decision boundary

False

T or F : Overfitting is less likely to happen when the feature space is larger.

False (more features will yield a more complex model which could easily overfit)

T or F : Both the training error and the test error of a classifier are expected to decrease as it is trained on a larger dataset.

False (test error will typically decrease, but training error will typically increase)

Suppose we have a kernel function defined over an original M-dimensional space, which corresponds to calculating dot (inner) products in a higher K-dimensional space. If we use this kernel function to learn a kernel SVM, the result is a decision boundary which is linear in the data points in the

K-dimensional space
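To see why, the kernel SVM's decision function can be written in terms of the implicit feature map \phi into the K-dimensional space, where it is a hyperplane:

f(x) = \sum_{i} \alpha_i y_i\, k(x_i, x) + b = \Big(\sum_{i} \alpha_i y_i\, \phi(x_i)\Big)^{\top}\phi(x) + b = w^{\top}\phi(x) + b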

What is an example of a generative model?

Naive Bayes
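Generative models learn the joint distribution P(x, y) = P(x \mid y)P(y) rather than the conditional P(y \mid x) directly; Naive Bayes is one example, classifying through Bayes' rule:

P(y \mid x) \;\propto\; P(y)\prod_{i=1}^{n} P(x_i \mid y)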

T or F: A non-linearly separable training set in a given feature space can always be made linearly separable in another space. (Assume there are no overlapping points).

True

T or F: Using the kernel trick, one can get non-linear decision boundaries using algorithms designed originally for linear models.

True
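A hedged illustration: mapping the data into a richer feature space (here an explicit polynomial map, which a kernel would do implicitly) lets a purely linear classifier separate data that is not linearly separable in the original space. The dataset and degree are illustrative assumptions.

from sklearn.datasets import make_circles
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import LinearSVC

# Concentric circles: not linearly separable in the original 2-D space
X, y = make_circles(n_samples=200, factor=0.5, noise=0.05, random_state=0)

linear_only = LinearSVC().fit(X, y)
with_features = make_pipeline(PolynomialFeatures(degree=2), LinearSVC()).fit(X, y)

print("linear model, original space:", linear_only.score(X, y))
print("linear model, expanded space:", with_features.score(X, y))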

T or F : If one is using value iteration and the values have converged, the policy must have converged as well

True (If value iteration has converged, the values of each state will not change anymore. This means that the policy from iteration to iteration will not change either.)

T or F: Value iteration will converge to the same vector of values (V*) no matter what values we use to initialize V.

True (There is only one set of values that satisfies the Bellman optimality equations, so no matter where we start value iteration, we will always arrive at the same set of values on convergence)

T or F : For an infinite horizon MDP with a finite number of states and actions and with a discount factor gamma that satisfies 0 < gamma < 1, value iteration is guaranteed to converge.

True (with a discount factor less than 1, value iteration is guaranteed to converge, as shown in lecture)
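A minimal value iteration sketch for a small, made-up MDP (the states, transitions, and rewards are illustrative assumptions, not from any particular assignment); with gamma < 1 the Bellman backup is a contraction, so the loop below converges from any initialization, and the greedy policy extracted at the end stops changing once the values stop changing.

# Hypothetical 3-state MDP: P[s][a] is a list of (prob, next_state, reward).
P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 0.0)], "go": [(1.0, 2, 5.0)]},
    2: {"stay": [(1.0, 2, 0.0)]},
}
gamma, theta = 0.9, 1e-8

V = {s: 0.0 for s in P}          # any initialization works
while True:
    delta = 0.0
    for s in P:
        q = [sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]) for a in P[s]]
        new_v = max(q)
        delta = max(delta, abs(new_v - V[s]))
        V[s] = new_v
    if delta < theta:            # values have converged
        break

# Greedy policy extracted from the converged values
policy = {
    s: max(P[s], key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]))
    for s in P
}
print(V, policy)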

T or F : A classifier trained on more training data is less likely to overfit

True (with more training data, the model is better able to generalize and is less likely to overfit)

To reduce overfitting in an SVM using an RBF kernel, you should try

decreasing gamma
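For reference, a smaller gamma widens the RBF kernel, which smooths the decision boundary and reduces overfitting:

k(x, x') = \exp\!\left(-\gamma\,\lVert x - x' \rVert^{2}\right)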

