Decision Tree Learning (ML 3)


rule post-pruning

1. Infer the decision tree from the training set, growing the tree until the training data is fit as well as possible and allowing overfitting to occur.
2. Convert the learned tree into an equivalent set of rules by creating one rule for each path from the root node to a leaf node.
3. Prune each rule by removing any preconditions whose removal improves its estimated accuracy.
4. Sort the pruned rules by their estimated accuracy, and consider them in this sequence when classifying subsequent instances.
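
For instance (an illustrative example, not part of the original card), a path Outlook = Sunny, Humidity = High ending in a negative leaf would yield the rule IF (Outlook = Sunny) AND (Humidity = High) THEN PlayTennis = No; rule post-pruning would then consider dropping (Outlook = Sunny) or (Humidity = High) and keep whichever removal does not hurt the rule's estimated accuracy.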

greedy algorithm

A greedy algorithm is an algorithmic paradigm that follows the problem solving heuristic of making the locally optimal choice at each stage with the hope of finding a global optimum.

gain ratio

A measure alternative to information gain designed to counteract the effect of attributes that can separate instances into very small subsets.

use statistics

A statistical test (for example, a chi-square test) can be used to estimate whether expanding or pruning a node is likely to produce an improvement beyond the training set

split information

A term incorporated into the gain ratio that penalizes attributes which split the data broadly and uniformly; it is the entropy of the collection with respect to the values of the attribute itself, rather than with respect to the target classification
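
A rough sketch of both formulas (the function names and the tiny example data are illustrative, not from the original set):

```python
from collections import Counter
from math import log2

def split_information(attribute_values):
    """SplitInformation(S, A) = -sum_i (|S_i|/|S|) * log2(|S_i|/|S|), where the S_i
    are the subsets of S produced by the values of attribute A; i.e., the entropy
    of S with respect to the values of A rather than the target classification."""
    n = len(attribute_values)
    return -sum((c / n) * log2(c / n) for c in Counter(attribute_values).values())

def gain_ratio(information_gain, attribute_values):
    """GainRatio(S, A) = Gain(S, A) / SplitInformation(S, A)."""
    return information_gain / split_information(attribute_values)

# An attribute that splits the data into many tiny subsets pays a large penalty:
print(split_information(["a", "b", "c", "d"]))            # 2.0 (four singleton subsets)
print(split_information(["hot", "hot", "mild", "mild"]))  # 1.0 (two equal halves)
```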

conjunction

AND

ID3's hypothesis space

All decision trees. A complete space of finite discrete-valued functions, relative to the available attributes.

Cost(A)

A cost assigned to each attribute, used to weight the information gain appropriately so that lower-cost attributes are preferred.
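
One cost-sensitive measure cited in the decision tree literature (an example, not stated on this card) replaces Gain(S, A) with Gain(S, A)^2 / Cost(A), so that cheaper attributes are favored without information gain being ignored.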

C4.5

An extension of ID3 that addresses several of its practical limitations, such as handling continuous attributes and missing attribute values, and performing rule post-pruning

preference or search bias

An inductive bias for certain hypotheses over others.

restriction bias or language bias

An inductive bias generated by the expressiveness of the hypothesis representation.

two approaches to overfitting

Approaches that stop growing the tree early, before it reaches the point where it perfectly classifies the training data, and approaches that allow the tree to overfit the data and then post-prune it.

ID3

Begins by asking which attribute should be tested at the root of the tree. Each instance attribute is evaluated using a statistical test to find the best one, and a branch or descendant is created for each value of the selected attribute. The process is then repeated for each descendant node, using only the training examples associated with that node.
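
A minimal, illustrative sketch of that loop (the dict-based tree format, the function names, and the stored "labels" field are assumptions for this example, not part of the original card):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy(S) = -sum_c p_c * log2(p_c) over the classes present in S."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, attribute):
    """Expected reduction in entropy from partitioning on `attribute`."""
    n = len(examples)
    gain = entropy(labels)
    for value in {ex[attribute] for ex in examples}:
        subset = [lbl for ex, lbl in zip(examples, labels) if ex[attribute] == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

def id3(examples, labels, attributes):
    """Greedy top-down growth: choose the attribute with the highest information
    gain, branch on each of its values, and recurse with the remaining attributes
    and the examples that took each branch. No backtracking is performed."""
    if len(set(labels)) == 1:            # every example has the same class
        return labels[0]
    if not attributes:                   # nothing left to test: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(examples, labels, a))
    node = {"attribute": best, "branches": {}, "labels": list(labels)}
    for value in {ex[best] for ex in examples}:
        keep = [i for i, ex in enumerate(examples) if ex[best] == value]
        node["branches"][value] = id3([examples[i] for i in keep],
                                      [labels[i] for i in keep],
                                      [a for a in attributes if a != best])
    return node
```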

inductive learning methods

Can be characterized as searching a space of hypotheses for one that fits the training examples. For ID3, this space is the set of all possible decision trees, searched simple-to-complex, beginning with the empty tree.

incorporating continuous values in decision trees

Defining new discrete-valued attributes that partition the continuous attribute's values into a discrete set of intervals.

best suited problems characteristics #3

Disjunctive descriptions may be required

entropy and encoding

Entropy can be used to determine the minimum number of bits needed to encode the classification of an arbitrary member of a collection of examples

Information gain's bias

Favors attributes with many different values as opposed to attributes with very few values.

finding a c threshold

Sort the examples by the continuous attribute and generate a set of candidate thresholds midway between adjacent values whose target classifications differ. These candidates can then be evaluated by computing the information gain associated with each.
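
A small sketch of that procedure (the function name is illustrative, and the numbers mirror the common textbook Temperature example rather than anything on this card):

```python
def candidate_thresholds(values, labels):
    """Sort the examples by the continuous attribute and propose a threshold
    midway between each pair of adjacent values whose classifications differ."""
    pairs = sorted(zip(values, labels))
    return [(v1 + v2) / 2
            for (v1, l1), (v2, l2) in zip(pairs, pairs[1:])
            if l1 != l2 and v1 != v2]

# Temperature vs. PlayTennis: the class changes between 48/60 and between 80/90.
print(candidate_thresholds([40, 48, 60, 72, 80, 90],
                           ["No", "No", "Yes", "Yes", "Yes", "No"]))  # [54.0, 85.0]
```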

Definition of decision tree overfitting

Given a hypothesis space H, a hypothesis h in H is said to overfit the training data if there exists some alternative hypothesis h' in H such that h has smaller error than h' over the training examples, but h' has a smaller error than h over the entire distribution of instances.
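
In symbols (a restatement of the definition above): error_train(h) < error_train(h'), but error_D(h') < error_D(h), where D denotes the entire distribution of instances.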

handling missing attributes in instances

Assigning the missing attribute the value that is most common among the training examples at that node, or assigning a probability to each of its possible values.
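
For example (illustrative numbers), if Humidity is missing for an example at a node where 60% of the examples have Humidity = Normal, either assign Normal outright, or send 0.6 of the example down the Normal branch and 0.4 down the High branch.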

Favoring low cost attributes over others

ID3 can be modified to bias towards low cost attributes at the top of the tree

ID3's search hypotheses

ID3 maintains only a single current hypothesis as it searches through the space of decision trees.

ID3's backtracking

ID3 performs no backtracking in its purest form

best suited problems characteristics #1

Instances are represented by attribute-value pairs

advantage of using all training examples at each step

Much less sensitive to errors in individual training examples

disjunction

OR - distinct alternatives

Occam's razor

Prefer the simplest hypothesis that fits the data

Approximate inductive bias of ID3

Shorter trees are preferred over larger trees. Trees that place high information gain attributes close to the root are preferred over those that do not.

minimum description length

Stop growing the tree at the point that minimizes the combined encoding length of the tree and the training examples it misclassifies (the Minimum Description Length principle)

inductive bias

The bias an algorithm adopts as it generalizes from training instances to unseen instances.

expected entropy

The sum of the entropies of each subset, weighted by the fraction of examples that belong to that subset
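
In symbols, for an attribute A that splits S into subsets S_v (one per value v of A): expected entropy = sum over v of (|S_v| / |S|) * Entropy(S_v).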

ID3 Restrictions

The target attribute whose value is predicted by the learned tree must be discrete.

best suited problems characteristics #2

The target function has discrete output values

best suited problems characteristics #4

The training data may contain errors.

best suited problems characteristics #5

The training data may contain missing attribute values

training and validation set

Use a set of examples distinct from the training set and examine the utility of post-pruning the tree

preference bias versus restriction bias

Usually better to work with a preference bias, because the complete hypothesis space is sure to contain the target function, whereas a restriction bias might leave the target function inexpressible

discrete attributes with many values

Will have a high information gain while separating the training examples into very small subsets; this causes extreme overfitting, since such an attribute can fit the training data perfectly yet generalize poorly

decision trees represent

a disjunction of conjunctions of constraints on the attribute values of instances
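
For example, the textbook PlayTennis tree corresponds to (Outlook = Sunny AND Humidity = Normal) OR (Outlook = Overcast) OR (Outlook = Rain AND Wind = Weak): each path from the root to a positive leaf is a conjunction, and the tree as a whole is their disjunction.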

decision tree learning

a method for approximating discrete-valued target functions, in which the learned function is represented by a tree

entropy is 0

all examples belong to the same class

entropy

characterizes the impurity of an arbitrary collection of examples
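
For a boolean classification, Entropy(S) = -(p_pos) * log2(p_pos) - (p_neg) * log2(p_neg), and more generally -sum_i p_i * log2(p_i). As an illustrative example, a collection with 9 positive and 5 negative examples has entropy -(9/14)log2(9/14) - (5/14)log2(5/14), which is roughly 0.940.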

reduced-error pruning

consider each of the decision nodes in the tree to be candidates for pruning. Pruning a decision node means removing the subtree of that decision node, making it a leaf, and assigning it the most common classification of the training examples affiliated with that node.
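
A compact sketch of this pruning loop, reusing the dict-based tree format assumed in the ID3 sketch above (each node's "labels" field holds the training classifications that reached it); the function names and the (example, label) validation-set format are assumptions for illustration:

```python
from collections import Counter

def classify(tree, example):
    """Walk from the root to a leaf; if an example's value has no branch,
    fall back to the most common training classification at that node."""
    while isinstance(tree, dict):
        subtree = tree["branches"].get(example.get(tree["attribute"]))
        if subtree is None:
            return Counter(tree["labels"]).most_common(1)[0][0]
        tree = subtree
    return tree

def accuracy(tree, validation):
    """Fraction of (example, label) pairs in the validation set classified correctly."""
    return sum(classify(tree, ex) == lbl for ex, lbl in validation) / len(validation)

def reduced_error_prune(root, validation):
    """Bottom-up: tentatively replace each decision node with a leaf labeled by its
    most common training classification, and keep the change only if the pruned
    tree does no worse than before over the validation set."""
    holder = {"root": root}                     # mutable handle on the current tree
    def visit(parent_branches, key):
        node = parent_branches[key]
        if not isinstance(node, dict):
            return
        for value in list(node["branches"]):    # prune the subtrees first
            visit(node["branches"], value)
        before = accuracy(holder["root"], validation)
        parent_branches[key] = Counter(node["labels"]).most_common(1)[0][0]
        if accuracy(holder["root"], validation) < before:
            parent_branches[key] = node         # pruning hurt: restore the subtree
    visit(holder, "root")
    return holder["root"]
```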

entropy is 1

if there are equal numbers of positive and negative examples

information gain

is simply the expected reduction in entropy caused by partitioning the examples according to a given attribute.
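
In symbols: Gain(S, A) = Entropy(S) - sum over the values v of A of (|S_v| / |S|) * Entropy(S_v), i.e., the original entropy minus the expected entropy after partitioning on A.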

classification problems

labeling a specific example as belonging to one of a discrete set of classes

information gain

measures how well a given attribute separates the training examples according to their target classification

Causes of ID3 Overfitting

noise in the data, or a number of training examples too small to produce a representative sample of the true target function

Cause of ID3 inductive bias

only its search strategy: because ID3's hypothesis space is complete, the bias follows from the order in which hypotheses are searched (a preference bias), not from a restriction of the hypothesis space.

best approach to overfitting

pruning the tree after it has overfit, because it is difficult to know precisely when to stop growing the tree

node

represents some attribute of an instance

branch

some value of that attribute

decision tree classification

starting at the root node of the tree, testing the attribute specified by this node, then moving down the tree branch corresponding to the value of the attribute in the given example. This process is then repeated for the subtree rooted at the new node.

condition for node-removal in reduce-error pruning

the pruned tree performs no worse than the original tree over the validation set

