MSIS 4263: Exam 2 Quizzes
True
Odds of 2 mean that a person is twice as likely to experience an event as not to experience it. (T/F)
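A minimal sketch of the relationship behind this card: odds and probability convert into each other through odds = p / (1 - p). The function names are illustrative and not taken from the course material.

```python
def odds_from_probability(p: float) -> float:
    """Return the odds p / (1 - p) for a probability 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return p / (1.0 - p)


def probability_from_odds(odds: float) -> float:
    """Return the probability odds / (1 + odds) for odds >= 0."""
    if odds < 0.0:
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)


# Odds of 2 correspond to a probability of 2/3.
p_event = probability_from_odds(2.0)
```

With odds of 2, the event has probability 2/3 and the non-event 1/3, so the event is twice as likely to happen as not.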
Entropy
What is the term used to describe the additional information required to predict an event, measured in bits?
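As a rough illustration, entropy in bits can be computed from class proportions with the Shannon formula H = -sum(p * log2(p)). The function name and the example proportions are assumptions made for this sketch.

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy in bits; zero-probability classes contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


mixed_node = entropy_bits([0.5, 0.5])  # evenly mixed two-class node
pure_node = entropy_bits([1.0])        # node containing a single class
```

An evenly mixed two-class node needs one full bit of extra information, while a pure node needs none, which is why impurity is read as uncertainty.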
we are not certain
When a node in a decision tree is impure, this means that _____
False
Association Rules (Market Basket) is a predictive data mining approach. (T/F)
Filtering Stemming Tokenization Parsing
In preparing to perform text mining, identify some of the issues one encounters in converting unstructured data to structured data. •Filtering •Stemming •Tokenization •Parsing
Removing common words Removing unique terms
In text mining, the data preprocessing concept known as Filtering refers to _____. •Removing common words •Removing unique terms •Dealing with base words and their related terms •Removal of term suffixes
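A toy sketch of tokenization followed by the common-word part of filtering, using only the standard library. The stop-word list and the sample sentence are hypothetical; real tools ship much larger lists and also drop very rare terms.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def filter_tokens(tokens, stop_words=STOP_WORDS):
    """Drop common words that carry little meaning on their own."""
    return [t for t in tokens if t not in stop_words]


tokens = tokenize("The price of milk is rising in the city.")
kept = filter_tokens(tokens)
```

Here `kept` holds only the content-bearing words of the sentence.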
Dealing with base words and their related terms
In text mining, the data preprocessing concept known as lemmatization refers to _____.
False
Text mining is the process of applying data mining algorithms to categorical data. (T/F)
False
(T/F)
Overfitting
A classifier that performs well on the training data, but poorly on real or new data is known as _____
True
A corpus is a collection of documents (T/F)
Machine learning
A field of study that gives computers the ability to learn without being explicitly programmed
A comparison between two Odds
An Odds ratio can be interpreted as ______
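To make the comparison concrete, the sketch below divides the odds of one group by the odds of another. The two probabilities are made-up numbers and the function names are illustrative.

```python
def odds(p: float) -> float:
    """Odds of an event with probability 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return p / (1.0 - p)


def odds_ratio(p_group_a: float, p_group_b: float) -> float:
    """Ratio of the odds in group A to the odds in group B."""
    return odds(p_group_a) / odds(p_group_b)


# Hypothetical groups: event probability 0.8 in A and 0.5 in B.
ratio = odds_ratio(0.8, 0.5)
```

Group A has odds of about 4 and group B has odds of 1, so the odds ratio is about 4: the odds of the event are roughly four times higher in A than in B.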
predictive
Classification is a(n) _____ method
False
Classification trees are used for predicting dependent variables that are continuous in nature, whereas regression trees are used for predicting categorical dependent variables. (T/F)
20%
Consider the transactions in this image. What is the support of the rule {Milk} --> {Sugar}?
30%
Consider the transactions in this image. What is the support of the rule {Bread} --> {Eggs}?
40%
Consider the transactions in this image. What is the support of the rule {Bread} --> {Milk}?
False
Decision Tree is an unsupervised learning algorithm (T/F)
False
Decision tree construction is performed in a bottom-up manner. (T/F)
True
Deep Learning is a subset of Machine Learning (T/F)
True
Entropy provides us the information required to predict an event with certainty. (T/F)
Partitioning the available data
Identify the type of task that a data scientist is performing when he/she is measuring the model's ignorance.
Categorical Nominal Discrete
Identify the types of data used for classification predictions. • categorical • ratio • nominal • discrete
a decision
In a decision tree, each branch represents a _____.
class
In a decision tree, each leaf represents a _____.
When the splitting should stop
In building a decision tree, the stopping rule determines ___________
Measuring the model's ignorance
In classification, when you test a model, you are _______.
False
In interpreting the results of a logistic regression model, when the coefficient is negative, this means that the odds ratio is more than 1. (T/F)
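A quick check of why this statement is false: the odds ratio for a one-unit increase in a predictor is exp(coefficient), so the sign of the coefficient decides which side of 1 it falls on. The coefficient values below are invented for illustration.

```python
import math


def odds_ratio_from_coefficient(beta: float) -> float:
    """Odds ratio for a one-unit increase in the predictor."""
    return math.exp(beta)


negative = odds_ratio_from_coefficient(-0.7)  # hypothetical coefficient
zero = odds_ratio_from_coefficient(0.0)
positive = odds_ratio_from_coefficient(0.7)   # hypothetical coefficient
```

A negative coefficient gives an odds ratio below 1 (the odds shrink), a zero coefficient gives exactly 1, and a positive coefficient gives an odds ratio above 1.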
Binary variable
In logistic regression, the target is a/an ___
Training data
In the classification process, which data partition is usually used in constructing the classification model?
Validation data
In the classification process, which data partition is usually used in fine-tuning and assessing the performance of the classification model?
True
Latent Semantic Analysis is a text mining algorithm used for topic extraction (T/F)
True
Logistic regression forces predicted values to fall between 0 and 1. (T/F)
False
Logistic regression uses a straight line to model the probabilities of the predicted values. (T/F)
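The two cards above can be illustrated with the logistic (sigmoid) function, which maps any linear score to a value strictly between 0 and 1 along an S-shaped curve. This is a minimal sketch; the input scores are arbitrary.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + exp(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Arbitrary linear scores, from strongly negative to strongly positive.
probabilities = [sigmoid(z) for z in (-10.0, -1.0, 0.0, 1.0, 10.0)]
```

The outputs rise with the score but flatten near 0 and 1, so the predicted probabilities follow a curve and not a straight line.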
Synonymy
Having multiple synonyms that represent the same concept is known as ____
False
Odds can range from 0 to 1, whereas a probability is a ratio of two probabilities ranging from 0 to infinity. (T/F)
Apriori
One of the following is a common algorithm for generating association rules.
Logistic regression
Roger is a data scientist working in a marketing consulting company. He has been asked to build a model that identifies loyal and non-loyal customers for one of their biggest clients. What type of classification model is Roger planning to use?
Support
The measure of relevance of an association rule is called _____
Confidence
The measure of strength of an association rule is called _____
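Support and confidence can be sketched as simple counts over a list of transactions. The transactions below are hypothetical and are not the ones from the image referenced in the earlier cards; the function names are illustrative.

```python
# Hypothetical market baskets, one set of items per transaction.
transactions = [
    {"pasta", "sauce"},
    {"pasta", "cheese"},
    {"sauce", "cheese"},
    {"pasta", "sauce", "cheese"},
]


def support(antecedent, consequent, baskets):
    """Share of all transactions that contain both sides of the rule."""
    both = antecedent | consequent
    return sum(1 for t in baskets if both <= t) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Share of antecedent transactions that also contain the consequent."""
    with_antecedent = [t for t in baskets if antecedent <= t]
    if not with_antecedent:
        return 0.0
    return sum(1 for t in with_antecedent if consequent <= t) / len(with_antecedent)


rule_support = support({"pasta"}, {"sauce"}, transactions)
rule_confidence = confidence({"pasta"}, {"sauce"}, transactions)
```

For the rule {pasta} --> {sauce}, two of the four baskets contain both items (support 50%), and two of the three baskets with pasta also contain sauce (confidence about 67%).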
Text mining
The module in SAS EM that deals with text mining processes is the _____
Algorithms that choose the locally optimal solution at each step
The term greedy algorithm refers to _______
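Decision tree construction is a common example of a greedy procedure: at each node it picks the split that looks best right now without looking ahead. The sketch below chooses a split by information gain; the tiny data set, the feature names, and the helper functions are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on one categorical feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    n = len(labels)
    weighted = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - weighted


def best_split(rows, labels):
    """Greedy choice: the feature with the highest gain at this node only."""
    return max(rows[0].keys(), key=lambda f: information_gain(rows, labels, f))


# Hypothetical customer records and loyalty labels.
rows = [
    {"member": "yes", "region": "north"},
    {"member": "yes", "region": "south"},
    {"member": "no", "region": "north"},
    {"member": "no", "region": "south"},
]
labels = ["loyal", "loyal", "not loyal", "not loyal"]
chosen = best_split(rows, labels)
```

Splitting on `member` separates the classes perfectly (gain of 1 bit), while `region` gives no gain, so the greedy step picks `member`.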
True
The association rule A --> B suggests that "if A occurs, then B occurs" (T/F)
False
The association rule A <-- B suggests that "if A occurs, then B occurs" (T/F)
Maximum Likelihood Estimation
Unlike linear regression, logistic regression uses ___ to model
Polysemy
When one term relates to multiple concepts, this is known as _____
Consequent
Which of the following is considered the right-hand side of a rule?
Requires training
Which of the following is true of supervised learning?