Data Mining Week 1
Which of the following IS one of the last two steps in the decision-making process?
Choose an alternative
The father of information theory is _____________.
Claude Shannon
Which of the following is NOT one of the last three steps in a CRISP-DM project?
Data scoring
From the bottom of the data mining pyramid to the top, what is the sequence?
Data-Information-Knowledge-Decisions
What is the purpose of CRISP-DM?
Decision Making
What is the Shannon definition of information?
Information is surprise: the less probable a message, the more information it carries
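The "information as surprise" idea can be made concrete with a small sketch. It assumes the usual base-2 measure, where an event of probability p carries log2(1/p) bits. The function names `surprisal_bits` and `entropy_bits` are illustrative, not from the course material.

```python
import math

def surprisal_bits(p: float) -> float:
    """Surprise (self-information) of an event with probability p, in bits."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return math.log2(1.0 / p)

def entropy_bits(probs) -> float:
    """Average surprise of a discrete distribution, in bits."""
    return sum(p * surprisal_bits(p) for p in probs if p > 0)

# A rare event is more surprising than a common one,
# and a certain event carries no information at all.
for p in (1.0, 0.5, 0.125):
    print(p, surprisal_bits(p))
```

Rarer events yield larger values, which is the sense in which data must contain surprise to be a source of new knowledge.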
The person credited with coining the three V's defining Big Data is _________.
Doug Laney
Rattle requires that we click on the ___________ button at each stage of processing.
Execute
The problem-solving process includes all of the decision-making process and ________.
implementing the alternative and evaluating the results
Data is the raw material of the _________.
Information Age
Information is the input for ________.
Knowledge
Which of the following is NOT one of the first three steps in the decision-making process?
Obtain the data
Analytics goes back at least as far as the work of _____________.
Sir Ronald A. Fisher
What made 15th-century printing important?
It spread knowledge and allowed information to travel faster.
Data transformation refers to the process of altering or converting the original data to a different form or scale, often to meet the assumptions of a ___________ analysis, improve interpretability, or enhance the performance of a model.
Statistical
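One common transformation is the logarithm, sketched below on hypothetical values (not from the course material). It changes the scale so that multiplicative steps become additive ones, which often brings right-skewed data closer to what a statistical analysis assumes.

```python
import math
import statistics

# Hypothetical right-skewed values (illustrative only).
raw = [1, 10, 100, 1000, 10000]

# Log transform: each tenfold increase becomes a step of 1.
logged = [math.log10(x) for x in raw]

# Before the transform the mean sits far above the median;
# afterwards the two agree, because the values are evenly spaced.
print(statistics.mean(raw), statistics.median(raw))
print(statistics.mean(logged), statistics.median(logged))
```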
What is data mining?
The process of discovering useful information from data.
A sample is ________.
a subset of the population
Data are facts acquired through ______, ________, experimentation and ________.
conveyance, theory, and computation
A numerical variable is discrete if it results from a ________.
count
According to Fayyad, et al., which of the following is true regarding the relationship of data mining and knowledge discovery in databases?
data mining is a subset of knowledge discovery in databases
Analytics enables us to begin with ______ and end with _______.
data | knowledge
Why are outliers important?
because their impact on the analysis must be determined
Knowledge is required for ________.
good decisions
What makes a variable numeric?
if meaningful arithmetic can be performed on it
According to Fayyad, et al., which of the following is true regarding the relationship of data mining and model building?
model building is a subset of data mining
According to Fayyad, et al., knowledge discovery in databases is the non-trivial process of identifying valid, novel, potentially useful and ultimately understandable __________.
patterns in data
Managing data is essential to ________.
produce information
Why is the standard deviation important?
quantifies the spread or dispersion of data points
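A short sketch of what "spread" means in practice, using Python's standard `statistics` module on made-up numbers. The two lists share the same mean but differ in how far their points sit from it.

```python
import statistics

# Two hypothetical data sets with the same mean (5) but different spread.
tight = [4, 5, 5, 5, 5, 5, 5, 6]
wide = [2, 4, 4, 4, 5, 5, 7, 9]

for name, values in (("tight", tight), ("wide", wide)):
    mean = statistics.mean(values)
    # pstdev treats the values as the whole population (divide by n);
    # stdev treats them as a sample (divide by n - 1).
    print(name, mean, statistics.pstdev(values), statistics.stdev(values))
```

The sample version is always a little larger than the population version for the same data, because it divides by n - 1 instead of n.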
Data is the ____________ of the Information Age
raw material
The purpose of a model in data mining is to ____________________.
separate the systematic part of the phenomenon from the unsystematic part
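As a rough illustration of that split, the sketch below fits a straight line by ordinary least squares to made-up points. The fitted values play the role of the systematic part and the residuals the unsystematic part. The straight-line model and the helper name `fit_line` are assumptions chosen for illustration; the course may use other kinds of model.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations: roughly y = 2x plus some noise.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
systematic = [intercept + slope * x for x in xs]      # what the model explains
residuals = [y - f for y, f in zip(ys, systematic)]   # what it leaves over

print(slope, intercept)
print(residuals)
```

Each observation equals its fitted value plus its residual, and for a least-squares line with an intercept the residuals sum to zero up to rounding.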
DMRR mentions confusion matrices in chapter 2. Which of the following is NOT a term referring to correct predictions?
specificity
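As general background (not taken from DMRR's text), a binary confusion matrix has four cells. The sketch below counts them from hypothetical labels; the correct predictions are the true positives and true negatives. The function name `confusion_counts` and the label encoding (1 = positive) are assumptions.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count the four cells of a binary confusion matrix."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1          # predicted positive, actually positive
        elif p != positive and a != positive:
            tn += 1          # predicted negative, actually negative
        elif p == positive:
            fp += 1          # predicted positive, actually negative
        else:
            fn += 1          # predicted negative, actually positive
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}

# Hypothetical labels for eight cases.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 0, 1, 0, 1]

counts = confusion_counts(actual, predicted)
accuracy = (counts["TP"] + counts["TN"]) / len(actual)
print(counts, accuracy)
```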
The Summary provided by Rattle and R has all of Tukey's five-number summary values plus the ________.
mean
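For reference, Tukey's five-number summary consists of the minimum, lower quartile, median, upper quartile, and maximum. The sketch below computes it in Python on made-up data. It assumes the "inclusive" quartile method of `statistics.quantiles`; other quartile conventions (including Tukey's original hinges) can give slightly different values on some data sets.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using inclusive quartiles."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, q2, q3, max(values))

# Hypothetical data (illustrative only).
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(five_number_summary(data))
```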
Good decisions generate ________.
successful outcomes
For data to be the source for new knowledge requires ___________.
surprise
Data is _______________.
the new oil
What is indicated when the median and the mean differ noticeably?
the distribution is skewed, often because there are outliers
What kind of skewness occurs when there are a few very large values?
skewness to the right (positive skew)
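The link between large values, the mean, and the median can be checked with a small sketch on made-up numbers. Comparing mean and median is only a rule of thumb for the direction of skew, not a formal skewness statistic, and the helper name `skew_direction` is illustrative.

```python
import statistics

def skew_direction(values):
    """Rule of thumb: the mean is pulled toward the long tail."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "right"
    if mean < median:
        return "left"
    return "none indicated"

# One very large value drags the mean up but barely moves the median.
with_large_value = [1, 2, 3, 4, 100]
print(statistics.mean(with_large_value), statistics.median(with_large_value))
print(skew_direction(with_large_value))
```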
Which of the following is NOT one of Big Data's three V's?
variance