CIS 412: Quiz 1, Quiz 2, and Attendance Questions

Select ALL the statements that are TRUE about model accuracy. - Accuracy is misleading with imbalanced data - Accuracy determines the proportion of actual negatives that are correctly identified - Accuracy measures true positive rate - Accuracy doesn't make distinctions between false positives and false negatives

- Accuracy is misleading with imbalanced data - Accuracy doesn't make distinctions between false positives and false negatives

SVM uses a loss function known as hinge loss. Select ALL the following statements that are TRUE about hinge loss. - Hinge loss equals zero when a negative instance is on the negative side of the boundary - Hinge loss increases linearly with the distance when instances are on the wrong side of the boundary and beyond the margin - Hinge loss becomes positive when a negative instance is on the correct side of the margin - The farther away the instances are from the separating boundary, the less the loss

- Hinge loss equals zero when a negative instance is on the negative side of the boundary - Hinge loss increases linearly with the distance when instances are on the wrong side of the boundary and beyond the margin
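
As an added illustration (not part of the original quiz), here is a minimal Python sketch of the hinge loss max(0, 1 - y*f(x)), where y is the true label coded as -1 or +1 and f(x) is the signed score/distance from the boundary; the helper name is my own.

    # Hypothetical helper illustrating hinge loss: max(0, 1 - y * score)
    def hinge_loss(y_true, score):
        """y_true is -1 or +1; score is the signed output f(x) of the classifier."""
        return max(0.0, 1.0 - y_true * score)

    # Negative instance (-1) on the negative side and beyond the margin: zero loss
    print(hinge_loss(-1, -2.0))  # 0.0
    # Negative instance on the wrong (positive) side: loss grows linearly with distance
    print(hinge_loss(-1, 0.5))   # 1.5
    print(hinge_loss(-1, 1.5))   # 2.5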

The population, with 30 instances, shown below is considered a ______ class problem

2

Given the visual plot below, select the size (number) of nodes to be included in the optimal tree model.

20

What does the following figure describe?

5-fold cross-validation

How many steps are in the CRISP-DM (Cross-Industry Standard Process for Data Mining) cycle/process?

6

predictive model

A formula for estimating the target variable

Which of the following is NOT true with regard to a Classification/Decision Tree? - All the variables in the decision tree model must be categorical - The target variable for a classification decision tree cannot be numeric - Decision trees for classification recursively perform information-gain-based attribute selection - A decision tree is one of the methods used for classification

All the variables in the decision tree model must be categorical

Which of the following is NOT part of the data format spectrum in data mining? - unstructured visual data - unstructured textual data - structured numeric data - Big Data

Big Data

In the Classification/Decision Tree shown below, which variable has the highest information gain?

Body shape

Which of the following is NOT a characteristic of Data Mining? - DM extracts useful information and knowledge from large volumes of data by following a well-defined process - DM revolves around data - DM helps make decisions based on intuition

DM helps make decisions based on intuition

A training set is used to train and evaluate a data mining model.

False

After we have added a new, hypothetical instance (star in right chart), we can conclude that Support Vector Machines appear to be more prone to over-fitting than Logistic Regression.

False

Based on the following 10-fold Cross-Validation results for the MegaTelCo data, a Data Analyst would recommend using the Logistic Regression model based on the model performance.

False

Determining which customers are most likely to leave a business (or a social media site) is unsupervised learning.

False

Estimating probability of default (or "write-off") for a new loan application is a regression problem.

False

In selecting informative attributes, we should look for attributes that produce subsets with highest entropy.

False

Linear discriminant boundary uses attributes recursively to classify the data.

False

SVM can only deal with linearly separable data

False

Tree-structured predictive model creates a linear decision boundary to separate data points into different classes.

False

We expect to have as many hidden layers as possible to get good model performance on the test set.

False

In order to obtain a good model accuracy, we want to minimize which of the following? - True positive - True negative - True positive and false positive - False positive and false negative

False positive and false negative

A Fitting Graph plots ... - Generalization Performance vs. Model Complexity - True Positive rate vs. False Positive rate - True Positive rate vs. False Negative rate - Generalization Performance vs. Size of Training Dataset

Generalization Performance vs. Model Complexity

Determine the best initial attribute to segment the set of Stick-Figures shown below. Choose between two attributes: Head Shape and Body Shape. Entropy (parent/population) = 0.954 Show all your work. You may use the sample log table and simple calculator in your computer. (tips: create two tree structures based on Head Shape and Body Shape respectively, calculate information gain for each of the attributes and compare their values) IG = Entropy(parent) - [ p(c1) * Entropy(c1) + p(c2) * Entropy(c2) + ... ]

Head shape: entropy1 = -1/2*log2(1/2) - 1/2*log2(1/2) = 1; entropy2 = -4/6*log2(4/6) - 2/6*log2(2/6) = 0.918; IG1 = 0.954 - (2/8*entropy1 + 6/8*entropy2) = 0.016. Body shape: entropy3 = -1/3*log2(1/3) - 2/3*log2(2/3) = 0.918; entropy4 = -1/5*log2(1/5) - 4/5*log2(4/5) = 0.722; IG2 = 0.954 - (3/8*entropy3 + 5/8*entropy4) = 0.159. Body shape has the higher information gain, so it is the best initial attribute.
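
A small Python sketch of the same calculation (added for illustration; the class counts per child node are taken from the worked answer above for the 8-instance stick-figure sample):

    import math

    def entropy(counts):
        """Entropy of a node given its class counts, e.g. [4, 2]."""
        total = sum(counts)
        return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

    parent = entropy([5, 3])  # ~0.954 for the 8-instance sample
    # Head Shape split: children of size 2 and 6 with class counts [1, 1] and [4, 2]
    ig_head = parent - (2/8) * entropy([1, 1]) - (6/8) * entropy([4, 2])
    # Body Shape split: children of size 3 and 5 with class counts [1, 2] and [4, 1]
    ig_body = parent - (3/8) * entropy([1, 2]) - (5/8) * entropy([4, 1])
    print(round(ig_head, 3), round(ig_body, 3))  # ~0.016 vs ~0.159 -> Body Shape wins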

Which of the following CANNOT be a rule derived from the Classification/Decision Tree shown below? - IF(Employed = yes) THEN Class = Not Write-off - IF(Employed = No) AND (Balance < 50k) THEN Class = Not Write-off - IF(Employed = No) AND (Balance >= 50k) AND (Age <45) THEN Class = Write-off - IF(Employed = No) AND (Balance >= 50k) AND (Age >= 45) THEN Class = Write-off

IF(Employed = No) AND (Balance >= 50k) AND (Age <45) THEN Class = Write-off

What do the following two figures describe? Select ALL that apply. - Loss function of SVM - Kernel - Network Topology - Mapping data into a higher dimension

- Kernel - Mapping data into a higher dimension

In linear discriminant boundaries, we select the best possible line to separate the instances based on: - Best model performance - Loss function - AUC value - Estimated weights

Loss Function

When should the growth (generation) of a Classification/Decision Tree be stopped/terminated? Select ALL that apply.

- Nodes are pure - There are no more instances to process

In the context of a decision tree, if a categorical attribute to be used for segmentation of a dataset has m possible values, then the dataset will be segmented into _________ subsets/subgroups. - m + 1 - m / (m + n) - m - 1 - None of the above

None of the above (m)

Which of the following steps is NOT part of a Cross-Validation process? - Performing multiple splits - Systematic swapping - Obtaining new data sets - Computing the mean of multiple estimated performances

Obtaining new data sets

Select ALL the statements that are TRUE about overfitting: - Overfitting may be caused by the lack of representative instances in the training data - Holdout method can help detect the issue of overfitting - All data mining procedures have some tendency to overfit - Overfitting can be avoided by applying multiple data mining models to the dataset

- Overfitting may be caused by the lack of representative instances in the training data - Holdout method can help detect the issue of overfitting - All data mining procedures have some tendency to overfit

Theoretically which is the preferred method when pruning a decision/classification tree?

Post-pruning

Which of the following components is NOT needed for implementing the Expected Value framework? - The Cost/Benefit Matrix - Probability based Confusion Matrix - The Confusion Matrix - The percentage of targeted instances

Probability based Confusion Matrix

What kind of visualization can we use to show the performance of a classification model at all classification thresholds?

ROC curve

Which of the following is NOT true about the ROC curve? - The dashed line represents a random guessing strategy/model - ROC graphs are dependent on the class proportions as well as the costs and benefits - ROC curves are not an intuitive visualization for business stakeholders - AUC measures the area under the ROC curve

ROC graphs are dependent on the class proportions as well as the costs and benefits

How much will this customer use the service?

Regression

Which of the following does NOT describe a Support Vector Machine model (SVM)? - SVMs can estimate class membership probability - SVMs are based on supervised learning - SVMs use the Hinge Loss function - SVMs can only classify linearly separable data

SVMs can only classify linearly separable data

Which of the following is NOT considered an application of Data Mining? - fraud detection - prediction of loan repayment - Summary of the population in Arizona - Prediction of membership of a social media site

Summary of the population in Arizona

Briefly explain the basic difference between supervised and unsupervised data mining.

Supervised data mining has a specific, quantifiable target that we are interested in or trying to predict; its two subgroups are classification and regression. Unsupervised data mining does not specify a particular purpose or target for the grouping, and there is no guarantee that the results will be meaningful or useful for any particular purpose. The difference, therefore, is that supervised learning has a specific, quantifiable target while unsupervised learning does not.

Which of the following Confusion Matrices is most preferable (i.e. a good model)? [Note: x > 0 and y > 0.] (TP: True Positives, TN: True Negatives, FN: False Negatives, FP: False Positives) - (TP: 0 | FP: x / FN: y | TN: 0) - (TP: 0 | FP: x / FN: 0 | TN: y) - (TP: 0 | FP: 0 / FN: y | TN: 0) - (TP: x | FP: 0 / FN: 0 | TN: y)

(TP: x | FP: 0 / FN: 0 | TN: y)

Given a Confusion Matrix represented by (see image) Calculate its True Positive Rate (i.e., recall ) and True Negative Rate (i.e., specificity). Provide your answers in fraction form (e.g. x/y). (Clearly show your work.)

TPR = TP/(TP+FN) = 65/(65+9) = 65/74 ≈ 0.878. TNR = TN/(TN+FP) = 52/(52+4) = 52/56 ≈ 0.929.
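
A minimal Python sketch of the same arithmetic (added for illustration; the counts are taken from the worked answer above):

    # Recall (TPR) and specificity (TNR) from a 2x2 confusion matrix
    tp, fn, fp, tn = 65, 9, 4, 52

    tpr = tp / (tp + fn)  # recall / sensitivity: 65/74 ≈ 0.878
    tnr = tn / (tn + fp)  # specificity: 52/56 ≈ 0.929
    print(tpr, tnr)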

In the mathematical equation below, representing a linear boundary, variable "y" represents the _________. y = b + w1x1 + w2x2 + w3x3 + ... - Input variable - Target variable - y-intercept - Slope

Target Variable

Model induction

The creation of models from data

Based on the figure below, which of the following is NOT valid in terms of basic characteristics of a neural network? - x1, x2, and x3 each process a single feature (predictor) in the dataset - A feature's value will be transformed by the corresponding node's activation function - There is one hidden layer in this figure - The number of input nodes is predetermined by the number of features in the input data

There is one hidden layer in this figure

One reason why logistic regression uses the log-odds is that: - It's easier to calculate - To make the estimated value (f(x)) range from -∞ to +∞ - We need negative values to represent the probability - None of the above are correct

To make the estimated value (f(x)) range from -∞ to +∞
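
As an added illustration (not from the quiz): the log-odds (logit) transform maps a probability p in (0, 1) to log(p / (1 - p)), which ranges over (-∞, +∞), so it can be modeled directly by a linear function b + w1x1 + w2x2 + ...

    import math

    def logit(p):
        """Log-odds of a probability p in (0, 1); output ranges from -inf to +inf."""
        return math.log(p / (1 - p))

    print(logit(0.5))   # 0.0
    print(logit(0.99))  # ~+4.6 (large positive)
    print(logit(0.01))  # ~-4.6 (large negative)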

What are training data and test data used for when we perform data mining tasks?

Training data is the input data for model induction: it is used to train the model so that it learns the patterns in the data and produces the outputs. The training set is a certain percentage of the overall data set. Test data is used to see how accurate the model is after it is trained, on data it has not seen before, which helps detect whether overfitting is happening. It is typically a smaller portion of the data than the training set and will show whether there are issues with the model.

"Do my customers form natural groups?" is an example of clustering.

True

A predictive model is a sort of formula to estimate the unknown value of interest, which we often call "the target".

True

An "Objective (Loss) Function" measures the amount of classification error a model has for a given training dataset.

True

Classification models attempt to predict which class an instance in a population belongs to.

True

Data Mining is the application of various analytical techniques to find useful knowledge, patterns and relationships among data.

True

Deciding which customers are most likely to leave is an example of a classification problem.

True

Finding the features that differentiate customers into different groups is an example of an unsupervised learning task. [Hint: Think clustering!]

True

In a Classification/Decision Tree induction (generation) process, the next attribute added is the one with the largest increase in Information Gain value.

True

In a Classification/Decision Tree, the root node represents the attribute with the highest information gain value

True

In supervised segmentation, informative attributes increase model accuracy.

True

In the confusion matrix, a false negative occurs when a classifier predicts an instance as negative when it is a positive.

True

In the context of a Classification/Decision Tree, every instance/data point will correspond to one and only one path ending at a leaf node

True

Logistic regression estimates the probability of class membership over a categorical class

True

Over-fitting occurs when a model learns the training data perfectly but cannot be generalized to a new dataset.

True

The Support Vector Machine (SVM) approach classifies instances by finding the widest possible bar that fits between points of two different classes.

True

When conducting supervised data mining, the values of the attributes, except for the target variable, are known when the model is used.

True

Suppose you are working on a marketing team and trying to advertise a new product to your customers. You developed a classification model to identify the customers who would purchase the new product. If the model predicts that a particular customer is going to purchase this product, you will send him/her a promotional offer so they can purchase this product at a discounted price. To evaluate your classification model, you decided to use the expected value framework as the evaluation metric. To calculate the expected value you had to define the cost/benefit matrix. Given the information below: Profit to sell one product to a customer: $100. Cost to manufacture this product: $50. Cost to target (e.g., send a promotional offer to) the customer who will purchase the product: $2. What are the costs (or benefits) associated with the True Positive cell and the False Positive cell in the confusion matrix?

A true positive means we predict a customer will buy if we send them an offer, and they do buy. A false positive means we predict they will buy if we send them an offer, and they do not buy. For a true positive we spend $2 on targeting and receive the $100 profit, so the benefit is 100 - 2 = $98 (treating the stated $100 profit as already net of the $50 manufacturing cost). For the false positive cell the value is -$2, because we lose the $2 spent targeting a customer who did not buy anything.
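
A minimal Python sketch of the expected-value calculation this feeds into (added for illustration; the cell probabilities are hypothetical, and the false-negative and true-negative values are assumed to be $0):

    # Hypothetical probabilities for each confusion-matrix cell (must sum to 1)
    probs    = {"TP": 0.10, "FP": 0.05, "FN": 0.02, "TN": 0.83}
    # Cost/benefit values: TP and FP from the answer above; FN and TN assumed $0
    benefits = {"TP": 98, "FP": -2, "FN": 0, "TN": 0}

    expected_value = sum(probs[cell] * benefits[cell] for cell in probs)
    print(expected_value)  # 0.10*98 + 0.05*(-2) = 9.7 dollars per customer, under these assumptions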

Which of the following is NOT a characteristic of a tree-structured model? - two parents share children - Made up of root, interior nodes, leaf nodes, and branches - every instance always ends up at a leaf node - each branch represents a distinct value of the attribute at that node

Two parents share children

Below is the instance space as we use two attributes (age and balance) to predict write-off or non write-off. Plus signs denote non write-off. Filled dots denote write-off. If a new instance has a balance of 40,000 dollars and his age is 40, which class will he be assigned to?

Write-off

Which attribute did we use to do the first partitioning in the stick figure exercise in order to produce the pure final subgroups?

body shape

Did advertisements influence a consumer to purchase?

causal modeling

How likely is this consumer to respond to our campaign?

classification

Which dataset do we use for Software lab 2 demonstration?

credit data

Which of the following is NOT a step in a typical Data Mining process? - modeling - data understanding - data storage - data preparation

data storage

Which function do we use to get the first several rows in a data frame (suppose the data frame is called df)?

df.head()
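
For example (a sketch assuming pandas is installed; the CSV file name is hypothetical):

    import pandas as pd

    df = pd.read_csv("credit_data.csv")  # hypothetical file name
    print(df.head())                     # first 5 rows by default; df.head(10) for the first 10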

information gain

the difference between the entropy of the parent node and the weighted sum of the entropies of the child nodes

Assume the image below is a representation of one leaf node in a classification/decision tree (having only two classes: + and -). Which of the following information can we get from this image? Select ALL that apply. - Information gain - Entropy - Total number of instances in this segment - All of the above

- Entropy - Total number of instances in this segment

Knowing that the complexity of a model increases as the complexity of its linear equation increases, as well as the relationship between model complexity and overfitting, which one of the following linear discriminants is most prone to over-fitting a training data set? - f(𝑥) = w0 + w1x1 + w2x2 + w3x3 + w4x4 + w5(x1/x5) + w6x4^2 - f(𝑥) = w0 + w1x1 + w2x2 + w3x3 + w4x4 + w5(x1/x5) - f(𝑥) = w0 + w1x1 + w2x2 + w3x3 + w4x4 - f(𝑥) = w0 + w1x1 + w2x2 + w3x3

f(𝑥) = w0 + w1x1 + w2x2 + w3x3 + w4x4 + w5(x1/x5) + w6x4^2

Which of the following is NOT used to compare model performance? - Lift curve - Fitting graph - ROC curve - Cumulative profit curve

fitting graph

entropy

how mixed up classes are in a group

In k-fold Cross-Validation method, what is used for training a model? - k + 1 folds - k folds - k - 1 folds - None of the above

k - 1 folds
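
As an added illustration (a sketch using scikit-learn's KFold on a toy array, not the course data): in each of the k iterations, k - 1 folds are used for training and the remaining fold is held out for evaluation.

    import numpy as np
    from sklearn.model_selection import KFold

    X = np.arange(10)       # toy data: 10 instances
    kf = KFold(n_splits=5)  # k = 5
    for train_idx, test_idx in kf.split(X):
        # 4 folds (k - 1) are used for training, 1 fold for testing
        print("train:", train_idx, "test:", test_idx)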

Which of the following is NOT a data approach to dealing with imbalanced data? - Oversampling - K-fold cross-validation - Under-sampling - None of the above

K-fold cross-validation

logistic regression

log odds

What items are commonly purchased together?

market basket analysis

Information gain: - measures the change in entropy due to any amount of new information being added - is only used to calculate entropy - is a measure of correlation between numeric variables - is prone to over-fitting

measures the change in entropy due to any amount of new information being added

A "measure of purity" known as entropy ...

measures the impurity of a set

Is there a single evaluation metric that is "right" for any data-mining task?

no

Regression is distinguished from classification by

numerical target variable

regression

numerical target variable

Calculate the entropy value for the set shown below. Show all your work. You may use the log table attached and the calculator in your computer. (Hint: show how you calculate p1, p2 to make sure you get some points) Entropy = - [p1 * log(p1) + p2 * log(p2) + ... ]

p1 = proportion of negatives = 3/4; p2 = proportion of positives = 1/4. entropy = -[3/4*log2(3/4) + 1/4*log2(1/4)] = -[3/4*(-0.415) + 1/4*(-2)] = -[-0.311 - 0.5] = 0.811

A Fitting Graph plots

performance vs. model complexity

The goal of Laplace "correction" is to ... - increase the influence of segments (leaf nodes) with only a few instances - reduce the influence of segments (leaf nodes) with only a few instances

reduce the influence of segments (leaf nodes) with only a few instances
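
As an added illustration (a sketch assuming the common binary-class form of the Laplace correction, p = (n + 1) / (n + m + 2), where n and m are the counts of the two classes in the leaf; the helper name is my own):

    def laplace_estimate(n, m):
        """Laplace-corrected probability for a leaf with n instances of the class of interest and m of the other class."""
        return (n + 1) / (n + m + 2)

    # A leaf with a single positive instance: the raw estimate would be 1/1 = 1.0
    print(laplace_estimate(1, 0))   # ~0.67, pulled toward 0.5 because the leaf is tiny
    # A larger pure leaf keeps a confident estimate
    print(laplace_estimate(20, 0))  # ~0.95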

Test data

the data for model evaluation

What does the "k" mean in k-means cluster analysis?

the number of clusters

Target variables

the unknown value

Which attribute should be used to create the segmentation for the 10-instance facebook sample data?

usage hours

