MSIS 4263 - Exam 2
In building a decision tree, the stopping rule determines ___________
When the splitting should stop
In a decision tree, each leaf represents a _____.
a class
Association Rules (Market Basket) is a predictive data mining approach.
False
Classification trees are used for predicting dependent variables that are continuous in nature, whereas regression trees are used for predicting categorical dependent variables.
False
Decision Tree is an unsupervised learning algorithm
False
Decision tree construction is performed in a bottom-up manner.
False
If the probability of an outcome is $p = \frac{\text{outcome of interest}}{\text{all possible outcomes}}$, then the probability of a coin flip is $p(\text{heads}) = \frac{2}{6} = 0.333$
False
In interpreting the results of a logistic regression model, when the coefficient is negative, this means that the odds ratio is more than 1.
False
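A negative coefficient cannot produce an odds ratio above 1, because the odds ratio for a one-unit increase in a predictor is the exponential of its coefficient. The sketch below illustrates this; the coefficient values are hypothetical and chosen only for illustration.

```python
import math


def odds_ratio(coefficient: float) -> float:
    """Odds ratio for a one-unit increase in a predictor: exp(coefficient)."""
    return math.exp(coefficient)


# Hypothetical coefficients (not from the exam), one per sign.
for beta in (-0.7, 0.0, 0.7):
    ratio = odds_ratio(beta)
    print(f"coefficient={beta:+.1f} -> odds ratio={ratio:.3f}")
```

A negative coefficient gives an odds ratio below 1, a zero coefficient gives exactly 1, and a positive coefficient gives a value above 1.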
Logistic regression uses a straight line to model the probabilities of the predicted values.
False
Odds can range from 0 to 1, whereas probability is a ratio of two probabilities ranging from 0 to infinity
False
The association rule A --> B suggests that "if A occurs, then B occurs"
True
One of the following is a common algorithm for generating association rules.
Apriori
The module in SAS EM that deals with text mining processes is the _____
Text mining
In text mining, the data preprocessing concept known as Filtering refers to _____.
- Removing unique terms - Removing common words
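Filtering as described above can be sketched in a few lines. This is a minimal illustration, not how SAS EM implements it: the stop-word list, the sample documents, and the reading of "unique terms" as terms that occur only once in the corpus are all assumptions.

```python
import re
from collections import Counter

# Assumed, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}


def tokenize(text):
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def filter_terms(documents):
    """Drop common words (stop words) and terms seen only once in the corpus."""
    tokenized = [tokenize(doc) for doc in documents]
    counts = Counter(term for doc in tokenized for term in doc)
    return [
        [term for term in doc if term not in STOP_WORDS and counts[term] > 1]
        for doc in tokenized
    ]


# Hypothetical two-document corpus.
docs = [
    "The customer is loyal to the brand",
    "A loyal customer buys the brand often",
]
print(filter_terms(docs))
```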
In preparing to perform text mining, identify some of the issues one encounters in converting unstructured data to structured data.
- Tokenization - Stemming - Filtering - Parsing
Identify the types of data used for classification predictions.
- discrete - categorical - nominal
Consider the transactions below. What is the support of the rule {Milk} --> {Sugar}?
TID | Items
1 | Bread, Eggs, Milk
2 | Beer, Bread, Milk
3 | Apples, Bread, Eggs
4 | Bread, Milk, Sugar
5 | Eggs, Milk
6 | Bread
7 | Bread, Milk, Sugar
8 | Beer, Bread
9 | Apples, Eggs, Sugar
10 | Bread, Eggs
20%
Consider the transactions below. What is the support of the rule {Bread} --> {Eggs}?
TID | Items
1 | Bread, Eggs, Milk
2 | Beer, Bread, Milk
3 | Apples, Bread, Eggs
4 | Bread, Milk, Sugar
5 | Eggs, Milk
6 | Bread
7 | Bread, Milk, Sugar
8 | Beer, Bread
9 | Apples, Eggs, Sugar
10 | Bread, Eggs
30%
Consider the transactions below. What is the support of the rule {Bread} --> {Milk}?
TID | Items
1 | Bread, Eggs, Milk
2 | Beer, Bread, Milk
3 | Apples, Bread, Eggs
4 | Bread, Milk, Sugar
5 | Eggs, Milk
6 | Bread
7 | Bread, Milk, Sugar
8 | Beer, Bread
9 | Apples, Eggs, Sugar
10 | Bread, Eggs
40%
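The three support values above can be checked with a short script. Support of a rule is the fraction of all transactions that contain every item on both sides of the rule. The function names `support` and `confidence` are illustrative; the transactions are the ten listed in the questions.

```python
# The ten transactions from the questions, one set of items per TID.
transactions = [
    {"Bread", "Eggs", "Milk"},
    {"Beer", "Bread", "Milk"},
    {"Apples", "Bread", "Eggs"},
    {"Bread", "Milk", "Sugar"},
    {"Eggs", "Milk"},
    {"Bread"},
    {"Bread", "Milk", "Sugar"},
    {"Beer", "Bread"},
    {"Apples", "Eggs", "Sugar"},
    {"Bread", "Eggs"},
]


def support(antecedent, consequent):
    """Fraction of transactions containing all items of the rule."""
    items = set(antecedent) | set(consequent)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent):
    """Among transactions containing the antecedent, the fraction that
    also contain the consequent."""
    antecedent = set(antecedent)
    both = antecedent | set(consequent)
    with_antecedent = sum(1 for t in transactions if antecedent <= t)
    with_both = sum(1 for t in transactions if both <= t)
    return with_both / with_antecedent


for lhs, rhs in [("Milk", "Sugar"), ("Bread", "Eggs"), ("Bread", "Milk")]:
    print(f"{{{lhs}}} --> {{{rhs}}}: support = {support({lhs}, {rhs}):.0%}")
```

The `confidence` helper is included only to contrast the two measures: support divides by all transactions, while confidence divides by the transactions that contain the left-hand side.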
An Odds ratio can be interpreted as ______
A comparison between two Odds
In logistic regression, the target is a/an ___
Binary variable
The measure of strength of an association rule is called _____
Confidence
Which of the following is considered the right-hand side of a rule?
Consequent
In text mining, the data preprocessing concept known as lemmatization refers to _____.
Dealing with base words and their related terms
What is the term used to describe the additional information required to predict an event, measured in bits?
Entropy
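Entropy in bits can be computed directly from class probabilities as $H = -\sum_i p_i \log_2 p_i$. The sketch below uses hypothetical class distributions to show how entropy falls to zero as a decision tree node becomes pure.

```python
import math


def entropy(probabilities):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


# Hypothetical class distributions for a two-class node.
for dist in ([0.5, 0.5], [0.9, 0.1], [1.0, 0.0]):
    print(dist, round(entropy(dist), 3))
```

An evenly mixed two-class node has the maximum entropy of 1 bit, while a pure node (a single class) has entropy 0.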
The association rule A <-- B suggests that "if A occurs, then B occurs"
False
Text mining is the process of applying data mining algorithms to categorical data.
False - Text mining is the process of applying data mining algorithms and approaches to textual data.
Roger is a data scientist working in a marketing consulting company. He has been asked to build a model that identifies loyal and non-loyal customers for one of their biggest clients. What type of classification model is Roger planning to use?
Logistic regression
A field of study that gives computers the ability to learn without being explicitly programmed
Machine learning
Unlike linear regression, logistic regression uses ___ to model
Maximum Likelihood Estimation
In classification, when you test a model, you are _______.
Measuring the model's ignorance
A classifier that performs well on the training data, but poorly on real or new data is known as _____
Overfitting
Identify the type of task that a data scientist is performing when he/she is measuring the model's ignorance.
Partitioning the available data
When one term relates to multiple concepts, this is known as _____
Polysemy
Which of the following is true of supervised learning.
Requires training
The measure of relevance of an association rule is called _____
Support
When multiple terms represent the same concept, this is known as ____
Synonymy
Logistic regression forces predicted values to fall between 0 and 1.
True
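Predictions stay between 0 and 1 because logistic regression passes a linear score through the logistic (sigmoid) function, which is an S-shaped curve and not a straight line. A minimal sketch, where the input scores are arbitrary illustrative values:

```python
import math


def logistic(z: float) -> float:
    """Logistic (sigmoid) function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Arbitrary linear scores; every output lies strictly between 0 and 1.
for z in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(f"z={z:+.1f} -> probability={logistic(z):.4f}")
```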
In the classification process, which data partition is usually used in constructing the classification model?
Training data
A corpus is a collection of documents
True
Deep Learning is a subset of Machine Learning
True
Entropy provides us the information required to predict an event with certainty.
True
Latent Semantic Analysis is a text mining algorithm used for topic extraction
True
Odds of 2 mean that a person is twice as likely to experience an event as not to experience it.
True
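The relationship between odds and probability is odds = p / (1 - p), so probabilities range from 0 to 1 while odds range from 0 to infinity. The helper names below are illustrative.

```python
def odds_from_probability(p: float) -> float:
    """Odds = probability of the event / probability of no event (p < 1)."""
    return p / (1.0 - p)


def probability_from_odds(odds: float) -> float:
    """Inverse conversion: p = odds / (1 + odds)."""
    return odds / (1.0 + odds)


# Odds of 2 correspond to a probability of 2/3: the event is twice as
# likely to happen (2/3) as not to happen (1/3).
p = probability_from_odds(2.0)
print(f"probability={p:.4f}, odds={odds_from_probability(p):.4f}")
```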
In the classification process, which data partition is usually used in fine-tuning and assessing the performance of the classification model?
Validation data
In a decision tree, each branch represents a _____.
a decision
The term greedy algorithm refers to _______
algorithms that choose the locally optimal solution at each step
Classification is a(n) _____ method
predictive
When a node in a decision tree is impure, this means that _____
we are not certain