Support Vector Machines
Define the following terms: decision boundary, negative/positive hyperplane, margin
- Decision boundary: the hyperplane the model fits midway along the widest possible 'street' between the two classes (w·x + b = 0); it lies halfway between the closest support vectors of each class.
- Positive/negative hyperplane: the two edges of the street, each passing through the support vectors of one class (w·x + b = +1 and w·x + b = -1 respectively).
- Margin: the distance from the decision boundary to either edge of the street, so the full street is two margins wide.
List 4 parameters of SVM classification and explain what they do
1) C: controls margin hardness. Higher C means a harder margin, i.e. lower bias / higher variance
2) gamma: used with the RBF, polynomial, and sigmoid kernels. Decides the width/sphere of influence of a data point on its neighbours. Higher gamma means lower bias / higher variance
3) kernel: decides which function to use for the kernel trick
4) decision_function_shape: decides whether to use OvR or OvO
(See the sketch below for all four together.)
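A minimal sketch of the four hyperparameters on one estimator; the toy dataset from make_classification is an assumption, not from the original notes:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Assumed toy data: 200 samples, 3 classes.
X, y = make_classification(n_samples=200, n_classes=3, n_informative=4,
                           random_state=42)

clf = SVC(
    C=1.0,                          # margin hardness: higher C -> harder margin
    kernel="rbf",                   # which kernel-trick function to use
    gamma="scale",                  # sphere of influence (RBF/poly/sigmoid only)
    decision_function_shape="ovr",  # OvR vs OvO shape of decision_function output
)
clf.fit(X, y)
```

Note that SVC always trains one-vs-one classifiers internally; decision_function_shape only changes the shape of the scores it reports.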
Give two implementations of linear SVM classification and explain when you would use either
1) LinearSVC: the default implementation. Converges faster on moderate-sized datasets that fit in memory
2) SGDClassifier: uses stochastic gradient descent to optimise the same cost function as LinearSVC. Use it when the dataset is too large to fit in memory (out-of-core learning) or for online classification
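A minimal sketch of both options; the toy data and the 500-row split for partial_fit are assumptions for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=1000, random_state=42)

# Batch solver: fine when the whole dataset fits in memory.
svm = LinearSVC(C=1.0).fit(X, y)

# Hinge loss makes SGDClassifier optimise a linear-SVM objective; partial_fit
# lets it learn chunk by chunk (out-of-core / online).
sgd = SGDClassifier(loss="hinge")
sgd.partial_fit(X[:500], y[:500], classes=[0, 1])
sgd.partial_fit(X[500:], y[500:])
```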
Give implementations for: i) SVM for linearly-separable data ii) kernel usage (polynomial)
1) from sklearn.svm import LinearSVC
2) from sklearn.svm import SVC, then SVC(kernel='poly')
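A runnable version of both snippets; the iris dataset and the degree/coef0 values are assumptions added to make the sketch complete:

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC, LinearSVC

X, y = load_iris(return_X_y=True)  # assumed stand-in data

# i) Linear SVM for (near) linearly-separable data
lin_clf = LinearSVC(C=1.0).fit(X, y)

# ii) Kernelised SVM with a polynomial kernel
poly_clf = SVC(kernel="poly", degree=3, coef0=1).fit(X, y)
```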
Explain gamma
A hyperparameter of kernelised SVM classification. It manages the influence of a support vector, i.e. the effect a particular data point has on its neighbours. A large gamma implies that a support vector does *not* have a wide-spread influence: even if a data point is blue, the model would not necessarily classify another data point right next to it as blue. A small gamma gives each point a wide reach, so nearby points are pulled towards its class.
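A small sketch contrasting two gamma values on an RBF kernel; the moons dataset and the specific values 0.1 and 100 are assumptions for illustration:

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=42)

wide = SVC(kernel="rbf", gamma=0.1).fit(X, y)    # wide influence: smooth boundary
narrow = SVC(kernel="rbf", gamma=100).fit(X, y)  # narrow influence: wiggly, overfit-prone

# With a narrow reach, more points end up as support vectors.
print(len(wide.support_), len(narrow.support_))
```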
Explain hard margin classification (HMC) and soft margin classification (SMC). Explain how to get each and when you would use either
HMC means that the optimal hyperplane must classify *all* data points correctly, without a single data point ending up inside the margin or on the wrong side. SMC is the opposite: it is more lenient and allows some points to violate the margin or be misclassified. The degree of hardness is controlled by the hyperparameter C (higher C = harder margin; see the sketch below). Usually, SMC is preferred because of these issues with HMC:
i) HMC can result in overfitting
ii) HMC can't work without linearly separable data
iii) HMC is even more sensitive to outliers
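A sketch of C controlling margin softness; the seven hand-picked points (including one class-1 outlier) are assumptions for illustration:

```python
import numpy as np
from sklearn.svm import SVC

# Two clusters plus one outlier from class 1 sitting inside class 0's region.
X = np.array([[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6], [1.5, 1.5]])
y = np.array([0, 0, 0, 1, 1, 1, 1])

soft = SVC(kernel="linear", C=0.1).fit(X, y)  # soft margin: tolerates the outlier
hard = SVC(kernel="linear", C=1e6).fit(X, y)  # near-hard margin: contorts to fit it
```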
Give 2 pros and cons of SVM classification
Pros:
- Great at handling smaller datasets
- Great at handling high-dimensional (many-feature) datasets
Cons:
- Computationally intensive when using kernels
- Computationally intensive when searching for the best hyperparameters
Define SVM and explain how this model works
SVM is a core ML model that can be used for classification and regression. It is very sensitive to feature scales, so data should be scaled first. For classification, it works by fitting the widest possible 'street' between the closest data points that belong to different classes. These closest data points are called 'support vectors' and the separating line in the middle of the street is the 'optimal hyperplane' (the decision boundary). For regression, it works by doing the reverse: fitting a street that contains as many data points as possible, with the street's width set by a hyperparameter (epsilon in scikit-learn's SVR).
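Because of the scale sensitivity, a scaler is usually piped in front of the SVM; a minimal sketch, with the iris dataset as assumed stand-in data:

```python
from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# StandardScaler centres/scales each feature before the SVM sees it.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
model.fit(X, y)
```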
Explain the kernel trick and give 4 different functions for performing it
The kernel trick is a mechanism which allows us to classify non-linearly separable data. It computes results *as if* the data had been projected into a higher-dimensional space, without actually performing that projection. In other words, it effectively adds dimensions and bends space so that a hyperplane can separate our classes. Functions: linear, radial basis function (RBF), sigmoid, polynomial (all fitted in the sketch below).
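A sketch fitting each of the four kernel functions on the same non-linear data; the moons dataset and the training-set scores are assumptions for illustration:

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.15, random_state=42)

# The non-linear kernels should separate the interleaved moons far better
# than the linear one.
for kernel in ("linear", "rbf", "sigmoid", "poly"):
    clf = SVC(kernel=kernel).fit(X, y)
    print(kernel, clf.score(X, y))
```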
Explain how binary vs multi-class classification works with SVM
Two options:
- OvR (one-vs-rest): create a decision boundary that isolates one class from all the rest, and repeat this for every class. Thus, the number of classifiers is n, where n = #classes.
- OvO (one-vs-one): create a decision boundary for each pair of classes, giving n(n-1)/2 classifiers. Very computationally intensive with larger datasets and many classes, although each classifier only trains on two classes' worth of data.
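A sketch of both strategies via scikit-learn's explicit meta-estimators; the iris dataset is an assumed stand-in (with 3 classes, both strategies happen to build 3 classifiers):

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)  # 3 classes

ovr = OneVsRestClassifier(LinearSVC()).fit(X, y)
ovo = OneVsOneClassifier(LinearSVC()).fit(X, y)

print(len(ovr.estimators_))  # n = 3 classifiers
print(len(ovo.estimators_))  # n*(n-1)/2 = 3 classifiers
```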