ITCS 3162 Final Exam Review Guide
Specificity
(Number of True Negatives)/(Number of Actual Negatives)
Sensitivity
(Number of True Positives)/(Number of Actual Positives)
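The two formulas above can be sketched directly from confusion-matrix counts. The counts below are invented example values, not data from the course:

```python
# Sketch: sensitivity and specificity from confusion-matrix counts
# (tp/fn/tn/fp values are made-up example numbers).
tp, fn = 40, 10   # actual positives: tp + fn = 50
tn, fp = 85, 15   # actual negatives: tn + fp = 100

sensitivity = tp / (tp + fn)  # true positives / actual positives
specificity = tn / (tn + fp)  # true negatives / actual negatives

print(sensitivity)  # 0.8
print(specificity)  # 0.85
```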
Confidence
(Number of transactions containing A and B)/(number of transactions containing A). A measure of the accuracy of the rule determined by the formula above.
Support
(Number of transactions containing A and B)/(total number of transactions). The proportion of transactions in D that contain both A and B.
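The support and confidence formulas can be sketched on a toy transaction list (the items and transactions below are invented for illustration):

```python
# Sketch: support and confidence for the rule A -> B over a
# made-up list of market-basket transactions.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"bread"},
    {"milk", "eggs"},
    {"milk", "bread", "butter"},
]

A, B = {"milk"}, {"bread"}

n_A = sum(1 for t in transactions if A <= t)          # transactions containing A
n_AB = sum(1 for t in transactions if (A | B) <= t)   # containing A and B

support = n_AB / len(transactions)   # 3/5
confidence = n_AB / n_A              # 3/4

print(support)     # 0.6
print(confidence)  # 0.75
```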
What are the requirements for using decision trees?
- Decision trees are supervised learning, so a target variable is needed. - Training data should be rich and varied, since decision trees learn by example. - The target attribute must be discrete (it cannot be a continuous target variable).
How does the Apriori algorithm work?
The Apriori algorithm takes advantage of the a priori property (every subset of a frequent itemset must itself be frequent) to prune candidate itemsets and shrink the search space.
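A minimal sketch of that level-by-level pruning, on an invented four-transaction data set (not an optimized implementation):

```python
# Sketch of Apriori frequent-itemset generation. Because no candidate
# is ever built from an infrequent subset, supersets of pruned itemsets
# are never generated -- the a priori property shrinking the search.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
]
min_support = 0.5
n = len(transactions)

def support(itemset):
    return sum(1 for t in transactions if itemset <= t) / n

# frequent 1-itemsets
items = sorted({i for t in transactions for i in t})
frequent = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
all_frequent = list(frequent)

k = 2
while frequent:
    # join step: combine frequent (k-1)-itemsets into k-item candidates
    candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
    # prune step: keep only candidates meeting minimum support
    frequent = [c for c in candidates if support(c) >= min_support]
    all_frequent.extend(frequent)
    k += 1

print(sorted(tuple(sorted(s)) for s in all_frequent))
```

Here {a, b, c} is never reported frequent: it appears in only 1 of 4 transactions, below the 0.5 threshold.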
Are association rules supervised or unsupervised learning?
Although association rules are used for unsupervised learning, they can also be applied to supervised learning for a classification task.
What types of data can be used in association rule mining?
Association rule mining can be used for flag data types and categorical data.
How can we use association rules?
Association rules can be used to:
- Investigate the proportion of subscribers to your company's cell phone plan that respond positively to an offer of a service upgrade.
- Predict degradation in telecommunication networks.
- Determine the proportion of cases in which a new drug will exhibit dangerous side effects.
What is meant by the statement "association rules are easy to do badly"?
Association rules need to be applied with care, since their results are sometimes deceptive.
Why should numeric values in a data set be normalized before mining?
Because large values can overwhelm the influence of other attributes which are measured on a smaller scale. This can be avoided through normalizing the data.
Evaluation for Description
Checking the clarity of understanding elicited in your target audience.
What does CART stand for? What kind of tree does the CART algorithm produce?
Classification and regression trees. The CART algorithm produces binary trees, containing exactly two branches for each decision node.
When do we use clustering and why?
Clustering is often performed as a preliminary step in a data mining process, so that the search space for the downstream algorithms is reduced.
What is clustering?
Clustering refers to the grouping of records, observations, or cases into classes of similar objects
When can you use k nearest neighbor algorithm?
Considered a non-parametric method (used when you do not need to know the distribution of the data), it is often used for classification and regression.
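A minimal sketch of k-NN classification on invented 2-D points (the training data and choice of k are made up for illustration):

```python
from collections import Counter
import math

# Toy labeled training instances: (point, class). "Instance-based"
# means this stored data set itself acts as the model.
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
         ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B")]

def knn_predict(query, k=3):
    # rank stored instances by distance to the query record,
    # then take a majority vote among the k nearest
    nearest = sorted(train, key=lambda p: math.dist(query, p[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict((5.0, 4.8)))  # "B"
```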
What is a decision rule?
Decision rules are rules that are constructed from the decision tree. It is basically a rule on how to interpret the decision tree. It comes in the form if antecedent, then consequent.
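The if-antecedent-then-consequent form can be sketched as ordinary code. The attributes, split values, and class labels below are hypothetical, standing in for rules read off some fitted tree:

```python
# Sketch: a decision rule extracted from a (hypothetical) decision tree,
# written in "if antecedent, then consequent" form.
def classify(record):
    if record["income"] == "high" and record["age"] <= 30:  # antecedent
        return "buys"                                       # consequent
    return "does not buy"

print(classify({"income": "high", "age": 25}))  # "buys"
```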
Do decision trees represent supervised or unsupervised learning?
Decision tree algorithms represent supervised learning.
In the CRISP-DM process, when do we apply evaluation techniques?
Evaluation techniques are applied before the deployment phase.
Evaluation for Estimation and Prediction
Examining the estimation error using the mean square error.
What are two methods of normalizing numeric attributes? (discuss min-max and z-score)
Min-max: X* = (X - min)/(max - min). Values will almost always lie between 0 and 1; this is preferred when mixing categorical and continuous variables. Z-score: Z = (X - mean)/(standard deviation). Usually takes values -3 < z < 3, representing a wider scale than min-max.
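Both normalizations can be sketched on an invented numeric column (this sketch uses the population standard deviation):

```python
# Sketch: min-max and z-score normalization of a made-up numeric column.
values = [2.0, 4.0, 6.0, 8.0, 10.0]

mn, mx = min(values), max(values)
min_max = [(v - mn) / (mx - mn) for v in values]   # lies in [0, 1]

mean = sum(values) / len(values)
std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
z_scores = [(v - mean) / std for v in values]      # typically -3 < z < 3

print(min_max)  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(z_scores)
```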
Unsupervised Learning and Examples
No target variable identified. Clustering and Association
False Negative
Represents a record that is classified as negative but it is actually positive.
False Positive
Represents a record that is classified as positive but it is actually negative.
True Negative
Represents a record that is classified as negative and is actually negative.
True Positive
Represents a record that is classified as positive and is actually positive.
Why do we apply a suite of models to our knowledge discovery process?
So we can have a confluence of results. This way, the models act as a validation for each other.
Error Rate
Sum of the false negatives and false positives divided by the total number of records.
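A quick sketch of that formula, reusing made-up confusion-matrix counts:

```python
# Sketch: error rate from confusion-matrix counts (example numbers).
tp, fp, tn, fn = 40, 15, 85, 10
total = tp + fp + tn + fn

error_rate = (fp + fn) / total  # misclassified records / all records
print(error_rate)
```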
Supervised Learning and Examples
Target variable identified. Classification, Regression, and Association
What type of attribute does the target attribute need to be?
The target attribute classes must be discrete.
Why is k-nearest neighbor considered instance based learning?
This is because in the k-nearest neighbor algorithm, the training data set itself is stored; classification is deferred until a new record must be compared against the stored instances.
Why is it important for instance based learning methods like k-nearest neighbor to have a rich database? What does this mean?
This is so that rare classifications can be represented sufficiently, which keeps the algorithm from only predicting common classifications. A rich database means it contains as many different combinations of attribute values as possible.
Binary Tree
Two nodes at each branch, simpler, can save storage space, deeper tree
Evaluation for Classification
Using error rate, false positive, false negative, error cost adjustment, lift, lift charts, and gain charts.
How do we measure the usefulness of rules? (Lift, Gain - understand ROC curve)
We measure the usefulness of association rules through lift which is defined as: Lift = (Rules confidence)/(Prior proportion of the consequent)
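The lift formula can be sketched on the same style of toy transaction data used for support and confidence (all numbers invented):

```python
# Sketch: lift of the rule A -> B on a made-up transaction list.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"bread"},
    {"milk", "eggs"},
    {"milk", "bread", "butter"},
]
A, B = {"milk"}, {"bread"}

n = len(transactions)
n_A = sum(1 for t in transactions if A <= t)
n_B = sum(1 for t in transactions if B <= t)
n_AB = sum(1 for t in transactions if (A | B) <= t)

confidence = n_AB / n_A   # rule's confidence: 3/4
prior_B = n_B / n         # prior proportion of the consequent: 4/5
lift = confidence / prior_B

print(lift)  # about 0.9375: the rule does no better than the prior
```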
How do you choose the value of k? What are the advantages/disadvantages of using a small versus large value for k?
When choosing the value of k, the data analyst must pick a balanced value so that it is neither too big nor too small. With a small k, the algorithm simply returns the target value of the nearest observation, which leads the algorithm toward overfitting. When k is too big, locally interesting behavior will be overlooked.
Bushier Tree
Multiple nodes at each branch - e.g., categorical variables may have a branch for each value; can result in overfitting.