ML2 - Inductive Learning


DT applications

> multivalued attributes: information gain is no good here; convert to Boolean tests.
> continuous input: identify split points with the highest information gain.
> continuous output: build a regression tree which ends in a linear function.

Linear classifiers

A linear function can be turned into a linear classifier by setting it equal to zero. It defines a boundary between two linearly separable classes.
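A minimal Python sketch of the idea (the weight vector w, bias b and example points are illustrative assumptions, not from the card):

```python
import numpy as np

def linear_classifier(x, w, b):
    """Classify x by which side of the boundary w.x + b = 0 it falls on."""
    return 1 if np.dot(w, x) + b >= 0 else 0

# Example boundary x1 + x2 - 1 = 0 separating two linearly separable classes.
w, b = np.array([1.0, 1.0]), -1.0
print(linear_classifier(np.array([0.2, 0.3]), w, b))  # below the line -> class 0
print(linear_classifier(np.array([0.8, 0.9]), w, b))  # above the line -> class 1
```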

Perceptron

A simple linear classifier with an input function, an activation function and an output. A neural network is a combination of many perceptrons.
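A minimal sketch of a single perceptron with the classic update rule (the AND dataset, learning rate and epoch count are illustrative assumptions):

```python
import numpy as np

def perceptron_train(X, y, epochs=20, lr=1.0):
    """Train a perceptron: weighted-sum input function, hard-threshold activation."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            out = 1 if np.dot(w, x) + b >= 0 else 0   # activation on the input function
            error = target - out
            w += lr * error * x                        # perceptron learning rule
            b += lr * error
    return w, b

# AND is linearly separable, so the perceptron converges.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = perceptron_train(X, y)
print([1 if np.dot(w, x) + b >= 0 else 0 for x in X])  # [0, 0, 0, 1]
```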

Information gain

An attribute a splits s into subsets s_i, each with its own entropy. The goodness of a is measured by the reduction in entropy: the entropy of s minus the weighted sum of the children's entropies.
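A small sketch of the calculation (the toy labels are illustrative):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent_labels, child_label_sets):
    """Entropy of s minus the weighted sum of the children's entropies."""
    n = len(parent_labels)
    remainder = sum(len(s_i) / n * entropy(s_i) for s_i in child_label_sets)
    return entropy(parent_labels) - remainder

# Splitting a 50/50 set into two pure children gives the maximum gain of 1 bit.
print(information_gain(['y', 'y', 'n', 'n'], [['y', 'y'], ['n', 'n']]))  # 1.0
```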

Random forests

Bagging applied to decision trees in two ways: a set of DTs is learned from bootstrapped datasets, and bagging is also applied at the feature level, known as the random subspace method.
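One way to see both kinds of bagging is scikit-learn's random forest, where bootstrap resampling and per-split feature subsampling are separate knobs (the dataset and parameter values are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
# bootstrap=True bags the examples; max_features controls the random subspace
# (how many features are considered at each split).
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", bootstrap=True)
forest.fit(X, y)
print(forest.predict(X[:3]))
```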

Bagging

Bootstrap aggregation. Improves the stability of unstable classifiers such as NNs and DTs. The training set is used to generate m new datasets by sampling with replacement. A model is fitted to each of the m datasets and the results are combined by averaging (regression) or voting (classification).
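A minimal sketch of bagging by hand, using decision trees as the unstable base classifier (the dataset and m are illustrative assumptions):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, m=25):
    """Fit m trees, each on a bootstrap sample (sampling with replacement)."""
    n = len(y)
    models = []
    for _ in range(m):
        idx = np.random.randint(0, n, size=n)   # bootstrap sample of the training set
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Classification: each model votes and the plurality label wins."""
    votes = np.array([m.predict(X) for m in models])           # shape (m, n_samples)
    return np.array([np.bincount(col).argmax() for col in votes.T])

X, y = load_iris(return_X_y=True)
print(bagging_predict(bagging_fit(X, y), X[:5]))
```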

CART

Classification And Regression Trees. Maximises the homogeneity of child nodes by recursively selecting the most discriminative attribute. Uses the Gini index. Post-prunes, starting with the branches with the weakest predictive power.

Linear regression

Estimate the parameter values which minimise the loss function. The weights can be solved for:
> directly
> iteratively
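A small sketch of both routes on a toy 1-D problem (the data, learning rate and iteration count are illustrative assumptions):

```python
import numpy as np

# Toy data: y = 2x + 1 plus noise; a column of ones carries the intercept.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=50)
y = 2 * x + 1 + 0.05 * rng.normal(size=50)
X = np.column_stack([np.ones_like(x), x])

# Direct solution: normal equations, w = (X^T X)^{-1} X^T y.
w_direct = np.linalg.solve(X.T @ X, X.T @ y)

# Iterative solution: gradient descent on the mean squared-error loss.
w = np.zeros(2)
lr = 0.5
for _ in range(2000):
    grad = 2 * X.T @ (X @ w - y) / len(y)
    w -= lr * grad

print(w_direct, w)   # both close to [1, 2]
```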

Reduced error pruning

Every branch is considered for pruning. Each branch is removed and converted into a leaf with the plurality classification for that leaf. This is then tested and kept if performance is superior to the original. This requires:
> a training set
> a pruning set
> a test set

C4.5

Extends the ideas of ID3 and adds error-based pruning, where the error estimate uses the upper bound of a confidence interval.
+ handles mixed data types
+ handles missing values
- overfitting is a problem

Gain ratio

Information gain favours attributes with many values. Split information penalises attributes with many values. Gain ratio balances the two.
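A small sketch of the calculation (the toy labels are illustrative):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(parent_labels, child_label_sets):
    """Gain ratio = information gain / split information."""
    n = len(parent_labels)
    gain = entropy(parent_labels) - sum(len(s) / n * entropy(s) for s in child_label_sets)
    # Split information: entropy of the partition sizes; large for many-valued attributes.
    split_info = -sum((len(s) / n) * math.log2(len(s) / n) for s in child_label_sets)
    return gain / split_info if split_info > 0 else 0.0

# A many-valued split (one child per example) no longer looks artificially good.
print(gain_ratio(['y', 'y', 'n', 'n'], [['y'], ['y'], ['n'], ['n']]))  # 0.5
print(gain_ratio(['y', 'y', 'n', 'n'], [['y', 'y'], ['n', 'n']]))      # 1.0
```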

Gini impurity

The probability that a randomly selected element of the set would be mislabelled if it were labelled randomly according to the class distribution of the set.
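A one-function sketch (the example label lists are illustrative):

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum over classes of p_k squared."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini(['a', 'a', 'a', 'a']))   # 0.0  (pure node)
print(gini(['a', 'a', 'b', 'b']))   # 0.5  (maximally mixed, two classes)
```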

ID3

Iterative Dichotomiser. Top-down induction of decision trees:
1. A window is chosen: a randomly selected subset of the training data.
2. A tree is grown that classifies the window with 100% accuracy.
3. The tree is tested on all other training instances.
4. If accuracy == 1.0, done.
5. Else repeat, adding the misclassified instances to the window.
Uses information gain and no pruning.

DT overfitting

Low resubstitution error but poor generalisation. Preventions include:
> stop growing the tree early, before it reaches the point of overfitting
> allow the tree to overfit, then prune

Entropy

A measure of the impurity of a set of examples, H(s) = -Σ p_k log2(p_k) summed over the classes k. Used within information gain to select the attribute that best splits the data.

Gradient descent

Views the loss function as a surface over the weight space and iteratively steps W in the direction of the negative gradient, seeking the values of W at the (ideally global) minimum.
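A generic sketch of the update loop (the toy loss, learning rate and step count are illustrative assumptions):

```python
def gradient_descent(grad, w0, lr=0.1, steps=100):
    """Repeatedly step the weights against the gradient of the loss."""
    w = list(w0)
    for _ in range(steps):
        w = [wi - lr * gi for wi, gi in zip(w, grad(w))]
    return w

# Minimise L(w) = (w0 - 3)^2 + (w1 + 1)^2, whose gradient is (2(w0 - 3), 2(w1 + 1)).
print(gradient_descent(lambda w: [2 * (w[0] - 3), 2 * (w[1] + 1)], [0.0, 0.0]))
# -> approximately [3, -1]
```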

Rule post pruning

A pruning solution for data-limited situations. A rule represents a route from the root to a node. Pruning removes the branch below a node v and applies the rule "at node v, predict class label x", keeping the change if it improves performance.
+ very flexible

Decision trees

The aim is to find the most compact decision tree consistent with the training examples. The algorithm recursively selects the best attribute to split the examples. At a given node we may find:
1. All examples are positive: we are finished.
2. All examples are negative: we are finished.
3. Examples remain mixed: select the best attribute and split.
4. No examples remain (lack of data). You can return:
- a default value
- the plurality classification: the best guess given the parent
- a probability
5. Mixed examples and no attributes remain, due to noise or unobservable attributes:
- plurality classification
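A compact ID3-style sketch of this recursion, with the cases marked in comments (the toy dataset and attribute names are illustrative assumptions):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def plurality(labels):
    return Counter(labels).most_common(1)[0][0]

def learn_tree(examples, attributes, parent_examples):
    """examples: list of (attribute-value dict, label) pairs."""
    labels = [lab for _, lab in examples]
    if not examples:                       # case 4: no examples, best guess from parent
        return plurality([lab for _, lab in parent_examples])
    if len(set(labels)) == 1:              # cases 1 and 2: all examples share one class
        return labels[0]
    if not attributes:                     # case 5: mixed examples, no attributes left
        return plurality(labels)
    def gain(a):                           # case 3: split on the best attribute
        rem = 0.0
        for v in set(x[a] for x, _ in examples):
            sub = [lab for x, lab in examples if x[a] == v]
            rem += len(sub) / len(examples) * entropy(sub)
        return entropy(labels) - rem
    best = max(attributes, key=gain)
    tree = {best: {}}
    for v in set(x[best] for x, _ in examples):
        subset = [(x, lab) for x, lab in examples if x[best] == v]
        tree[best][v] = learn_tree(subset, [a for a in attributes if a != best], examples)
    return tree

data = [({'outlook': 'sunny', 'windy': 'no'}, 'play'),
        ({'outlook': 'sunny', 'windy': 'yes'}, 'stay'),
        ({'outlook': 'rain', 'windy': 'no'}, 'stay'),
        ({'outlook': 'rain', 'windy': 'yes'}, 'stay')]
print(learn_tree(data, ['outlook', 'windy'], data))
```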

Boosting

This maintains a weight over the training examples, with a higher weight indicating a more difficult example that needs more learning. Classifiers with low weighted error are given a higher-weighted vote. AdaBoost is a common approach.
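A minimal AdaBoost-style sketch with decision stumps (labels are assumed to be -1/+1; the round count and toy data are illustrative assumptions):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, n_rounds=10):
    """Each round fits a stump on the weighted examples, then re-weights."""
    n = len(y)
    w = np.full(n, 1.0 / n)                      # start with uniform example weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.clip(np.sum(w * (pred != y)) / np.sum(w), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)    # vote weight: accurate stumps count more
        w *= np.exp(-alpha * y * pred)           # up-weight the misclassified examples
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    return np.sign(sum(a * s.predict(X) for a, s in zip(alphas, stumps)))

X = np.array([[0.1], [0.2], [0.3], [0.4], [0.6], [0.7], [0.8], [0.9]])
y = np.array([-1, -1, -1, -1, 1, 1, 1, 1])
stumps, alphas = adaboost(X, y, n_rounds=5)
print(adaboost_predict(stumps, alphas, X))       # matches y on this toy set
```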

Inductive learning

This simply means learning from examples

Logistic function

This smooths the hard threshold into a differentiable boundary, dampening the effect of misclassifications near the boundary.
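A one-function sketch (the sample inputs are illustrative):

```python
import math

def logistic(z):
    """Sigmoid: maps the linear score w.x + b to a smooth value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Scores far from the boundary saturate towards 0 or 1; scores near it stay soft.
for z in (-5, -1, 0, 1, 5):
    print(z, round(logistic(z), 3))   # 0.007, 0.269, 0.5, 0.731, 0.993
```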

Ensemble methods

This trains N classifiers on the training examples and gets them to vote on the final decision. This can dramatically reduce error.

Regularisation

To guard against overfitting we must take model complexity into account, typically by adding a complexity penalty to the loss function.
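A small sketch using an L2 (ridge) penalty as the complexity term (the data, lambda value and true weights are illustrative assumptions):

```python
import numpy as np

def ridge_fit(X, y, lam=0.1):
    """Minimise sum((Xw - y)^2) + lam * sum(w^2): squared error plus an L2
    complexity penalty. Closed form: w = (X^T X + lam I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=20)
print(ridge_fit(X, y, lam=1.0))   # close to [1, -2, 0.5], shrunk slightly towards zero
```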

