Data Mining Final Questions


To use a trained decision tree model to make predictions for a scoring data set in RapidMiner, you would use the _________ operator. Apply Model Predict Model Apply Predict()

Apply Model

A _________ is a type of data store that is intentionally created for a specific business unit, usually for management and reporting purposes. data extraction data mart data warehouse relational database

data mart

The mathematical formula for calculating the ________ percent is the number of times an association did occur divided by the number of times the premise (or antecedent) occurred in the data set. support confidence Laplace gain

confidence
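The confidence formula on this card can be sketched in a few lines of Python. This is only an illustration: the transactions and the `confidence` helper are made up for the example and are not part of RapidMiner.

```python
# Hypothetical market-basket data: each set is one transaction.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
    {"bread", "milk"},
]


def confidence(transactions, premise, conclusion):
    """Times the association occurred divided by times the premise occurred."""
    premise_count = sum(1 for t in transactions if premise <= t)
    both_count = sum(1 for t in transactions if (premise | conclusion) <= t)
    # Guard against a premise that never appears in the data.
    return both_count / premise_count if premise_count else 0.0


bread_to_butter = confidence(transactions, {"bread"}, {"butter"})
print(f"confidence(bread -> butter) = {bread_to_butter:.2f}")
```

Here bread appears in four transactions and bread with butter in two of them, so the confidence of the rule bread -> butter is 2 / 4.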

__________ is a statistical measure of how strong the relationships are between attributes in a data set.

correlation

In RapidMiner, which operator is used to create correlation coefficients? Coefficient Matrix Statistical Correlation Correlate Correlation Matrix

correlation matrix

Which of the following is NOT one of the six steps of the CRISP-DM process? Evaluation Data Execution Data Understanding Deployment

data execution

Given three political parties, the Republican Party, the Democratic Party, and the Green Party, logistic regression can be used to predict which party will win the election.

false

If you attempt to make a prediction for an out-of-range scoring observation in a linear regression model in RapidMiner, the software will throw an error.

false

The Logistic Regression operator in RapidMiner offers only one algorithm for model creation.

false

k-Means clustering is a data mining model that predicts values.

false

True or false: A support percent of 18% in an association set would be considered too low to be of any use.

false

True or false: Business (or Organizational) Understanding and Data Understanding are not necessary when you are confident that data is prepared and ready for analysis.

false

True or false: Conducting data mining and analytics on high-volume transactional database systems is recommended because such systems have the most up-to-date data.

false

True or false: Decision tree models do not provide confidence percentages alongside their predictions.

false

True or false: FP-Growth creates association rules in RapidMiner.

false

True or false: In decision tree models, all independent variables are given equal weight when making predictions.

false

True or false: Scatterplots can only show the correlation of two attributes at a time.

false

True or false: The cluster number assigned to each cluster in a k-Means model indicates the relative importance of each cluster when compared to the others.

false

True or false: When correlation coefficients between two attributes reach the "strong" or "very strong" ranges, you have discovered statistical evidence that one of the attributes causes the other to change in some way.

false

Unlike in linear regression, it is possible to have more than one dependent variable in a logistic regression model.

false

Which of the following is NOT another name for a row of data in a database? case example field observation record tuple

field

To examine all records in one specific cluster in RapidMiner, use a ______________ operator. Sample Filter Examples Cluster Select Attributes

filter examples

To remove out-of-range values from a scoring data set in RapidMiner, use a ___________ operator.

filter examples

To remove unwanted observations from a data set in RapidMiner, use the _______ operator. Select Attributes Remove Observations Filter Observations Filter Examples

filter examples

To remove unwanted or unusable observations from data sets in RapidMiner, use the _______ operator. Filter Examples Select Attributes Declare Missing Values Replace

filter examples

To view which observations are assigned to each cluster in a k-Means model in RapidMiner, use the ________ feature.

folder view

Sloppy organization of data causing dubious analysis results is an example of ________. Data Preparation Lazy Modeling Garbage In, Garbage Out Data Understanding

garbage in, garbage out

To calculate the sum total of all predictions in a linear regression operator in RapidMiner, use a(n) __________ operator. Sum Total Aggregate Summarize

aggregate

In the Neural Net operator in RapidMiner, which of the following parameters will cause the model to stop the training process if its value is reached? Training cycles Momentum Learning rate All of these would stop training if their value is reached.

all of these would stop training if their value is reached

In RapidMiner, the attribute you wish to predict must be set to the role of _________.

label

In a decision tree, the dependent variable value found at the end of each path through the tree is known as a _________. fork branch leaf node

leaf

The output of a correlation is called a ________. contingency table match table similarity index matrix

matrix

To prevent a k-Means model for a large data set from taking a long time to run, you can adjust the _________ parameter in RapidMiner. divergence max runs measure types start values

max runs

In linear regression, the p-values for each independent variable must be smaller than ________. the intercept the value of the y variable alpha the coefficients for each independent variable

alpha

The k-Means clustering technique for data analysis is ideal for _________. machine learning prediction time series forecasting segmentation

segmentation

To remove unwanted attributes from a data set in RapidMiner, use the _______ operator. Filter Examples Select Attributes Remove Attributes Select Examples

select attributes

Correlation coefficients are generally considered strong if they are at least _________. 0.2 or -0.2 0.95 or -0.95 0.6 or -0.6 none of the above

0.6 or -0.6
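As a rough illustration of how a correlation coefficient is computed and then checked against the 0.6 / -0.6 cutoff from this card, here is a minimal Pearson correlation in plain Python. The sample data and the helper names `pearson` and `is_strong` are assumptions for the example; RapidMiner's Correlation Matrix operator produces these coefficients for you.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def is_strong(r, threshold=0.6):
    """Apply the card's rule of thumb: strong if at least 0.6 in absolute value."""
    return abs(r) >= threshold


# Made-up data: as one attribute rises, the other falls (negative correlation).
hours_practiced = [1, 2, 3, 4, 5]
errors_made = [10, 8, 6, 4, 2]

r = pearson(hours_practiced, errors_made)
print(f"r = {r:.2f}, strong: {is_strong(r)}")
```

The sign of `r` gives the direction (positive or negative correlation) and its absolute value gives the strength. A strong coefficient still says nothing about causation.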

In RapidMiner, if the number of hidden layers is not specified by the analyst, how many hidden layers will be used to train a neural network? 2 10 5 1

1

The default Confidence Percent used for logistic regression models is _______.

95%

The operator required in RapidMiner to find frequency patterns in a data set is called ________. FP-Growth FP-Find FP-Detect FP-Associate

FP-Growth

Increasing which parameter of the Decision Tree operator in RapidMiner would reduce the size of the tree? Minimal Leaf Size Criterion Confidence Split Size

Minimal Leaf Size

Databases designed to support a dimensional examination and aggregation are referred to as ________. SQL systems OLAP systems ENTP systems OLTP systems

OLAP systems

Databases designed to support a high number of reads and writes are referred to as ________. OLTP systems OLAP systems SQL systems ENTP systems

OLTP systems

If a training data set in RapidMiner contains a non-predictive, numeric identification column, how must this be handled when creating logistic regression models? Nothing. RapidMiner can detect and ignore identification columns in training data. The role for the identification column must be set to "ID". Nothing. RapidMiner will allow you to remove the identification column in Results view after the model is created. The role for the identification column must be set to "label".

The role for the identification column must be set to "ID".

Data types for independent variables in a decision tree model must be_______ numeric binary text any of the above

any of the above

Association rules use the _______ algorithm to find frequently associated attributes in a data set. apriori incarini logetti associati

apriori

In a decision tree model represented visually in RapidMiner, the first predictive independent variable is represented __________. at the bottom on the right on the left at the top

at the top

Removing columns from a data set because they are not useful for a certain type of data analysis is an example of _________. attribute reduction observation reduction content reduction document reduction

attribute reduction

The data type of the dependent variable in logistic regression must be ________.

binominal

The required data type for all attributes in an association rule model in RapidMiner is _______. binary polynominal binominal integer

binominal

In RapidMiner, which of the following will automatically be generated when the Apply Model operator applies a neural network model to a scoring data set? both class predictions and confidence percentages class predictions only both class predictions and standard error values confidence predictions only

both class predictions and confidence percentages

Establishing permission to use company data in analytic activities should take place during which CRISP-DM phase? Business (or Organizational) Understanding Data Preparation Deployment Evaluation

business understanding

________ is the first step in the CRISP-DM process. Business (or Organizational) Understanding Evaluation Modeling Data Preparation

business understanding

Which of the following is NOT another name for a column of data in a database? attribute case variable field

case

The averages for each attribute in each cluster created by a k-Means model are called _______. midpoints complex means centroids simple means

centroids
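To make the idea of a centroid concrete, the sketch below averages each attribute within each cluster. It assumes the cluster assignments are already known; the data and the `centroids` helper are hypothetical, and this is not a full k-Means implementation.

```python
from collections import defaultdict


def centroids(rows, assignments):
    """Return {cluster: [mean of each attribute]} for already-clustered rows."""
    grouped = defaultdict(list)
    for row, cluster in zip(rows, assignments):
        grouped[cluster].append(row)
    # zip(*members) turns the rows of one cluster into columns (attributes).
    return {
        cluster: [sum(col) / len(col) for col in zip(*members)]
        for cluster, members in grouped.items()
    }


# Made-up observations with two numeric attributes each.
rows = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [12.0, 22.0]]
assignments = [0, 0, 1, 1]  # cluster number for each row

result = centroids(rows, assignments)
print(result)
```

The cluster numbers here are only labels; as another card in this set notes, they do not rank the clusters by importance.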

In linear regression, the m variable is the independent variable's ________.

coefficient

The values in correlational analysis results are called _______. convergences covariates coefficients contingencies

coefficients

Evaluation within the CRISP-DM process is intended to ensure that ________. data inputs have numeric data types data mining results are reliable and useful employees within a company know about data mining the company has complied with governmental regulations

data mining results are reliable and useful

Reformatting phone numbers so that they all conform to a 12-character standard, such as 123-456-7890, would take place in which phase of the CRISP-DM process? Deployment Data Preparation Evaluation Data Understanding

data preparation

A __________ is a subset of a database or data warehouse usually created for a specific analytic purpose. relational database reduction denormalization data set

data set

Auditing an organization's databases, spreadsheets, file servers, and records repositories for information to use in data analysis is an example of ________. Modeling Deployment Business (or Organizational) Understanding Data Understanding

data understanding

The attribute you want to predict in a predictive model is called a(n) _________. dependent variable independent variable identifying variable category variable

dependent variable

To see the size of each cluster in RapidMiner, click the ______________ icon in Results view. Folder View Description Centroid Table Graph

description

Changing one categorical attribute (e.g., "Blue," "Red," "Green") into a series of binary attributes (e.g., Blue = 0/1; Red = 0/1; Green = 0/1) is known as _________. dummification digitization binarification dummy coding

dummy coding
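A minimal sketch of dummy coding in plain Python, using the color example from this card. The `dummy_code` helper is made up for illustration and is not a RapidMiner operator.

```python
def dummy_code(values):
    """Turn one categorical attribute into one 0/1 attribute per category."""
    categories = sorted(set(values))
    return [{cat: int(value == cat) for cat in categories} for value in values]


colors = ["Blue", "Red", "Green", "Blue"]
coded = dummy_code(colors)

for color, row in zip(colors, coded):
    print(color, row)
```

Each observation ends up with exactly one attribute set to 1, which is what lets models that require numeric inputs work with the original categories.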

Which of the following is NOT an example of data scrubbing as described in the text? handling inconsistent data reducing data handling marginal data handling missing data

handling marginal data

The space between the independent variables and the dependent variable where a neural network model gets trained is called the ____________. training layer hidden layer neural layer synapse layer

hidden layer

A value of "middle-aged" in an attribute that otherwise contains people's ages in number of years would be an example of _______. modified data inconsistent data aged data alphabetic data

inconsistent data

An attribute used to predict outcome values in a predictive model is called a(n) _________. dependent variable independent variable identifying variable category variable

independent variable

What is calculated in the nodes of the hidden layer of a neural network? dependent variable weights independent variable ranges dependent variable ranges independent variable weights

independent variable weights

In linear regression, the b variable is the model's ________.

intercept coefficient
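The cards on m, b, and x refer to the familiar form y = mx + b. A minimal sketch of a prediction with one independent variable follows; the coefficient and intercept values are invented for the example, not taken from any real model.

```python
def predict(x, m, b):
    """Linear regression prediction: y = m * x + b.

    x is the independent variable's value, m is its coefficient,
    and b is the model's intercept.
    """
    return m * x + b


# Hypothetical model values, chosen only to illustrate the arithmetic.
m = 2.5
b = 10.0

y = predict(4.0, m, b)
print(y)
```

With several independent variables the same idea extends to one coefficient per variable, all added to the single intercept.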

What type of correlation occurs when two attributes are correlated to one another, and as the values in one attribute increase, the values in the other attribute decrease? mutual neutral negative positive

negative

In a decision tree, the independent variable found at each branch of the tree is known as a _________. fork branch leaf node

node

The data type for the dependent variable in a classification decision tree model must be __________. Nominal Numeric Binary None of these

nominal

Missing values in a data set mean that ________. the data contains errors the data set is unusable the data set contains outliers none of the above

none of the above

In linear regression, p-values larger than alpha indicate that their corresponding independent variables are __________.

not statistically significant
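The comparison of p-values against alpha can be sketched as a simple filter. The p-values below are invented for the example, and an alpha of 0.05 is assumed here (it corresponds to a 95% confidence level).

```python
ALPHA = 0.05  # assumed significance threshold (95% confidence level)

# Hypothetical p-values for three independent variables.
p_values = {"age": 0.003, "income": 0.21, "tenure": 0.04}

# Variables whose p-value is smaller than alpha are statistically significant.
significant = [name for name, p in p_values.items() if p < ALPHA]
not_significant = [name for name, p in p_values.items() if p >= ALPHA]

print("significant:", significant)
print("not statistically significant:", not_significant)
```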

The data type required for independent variables in a neural network model must be _________. categorical label binary numeric

numeric

The data types of all independent variables in logistic regression must be _________.

numeric

What data type must be assigned to the dependent variable in RapidMiner when building linear regression models? Label Target Numeric Polynominal

numeric

Line-by-line records of each item sold at a grocery store would be an example of ________. organizational data strategic data operational data aggregate data

operational data

Data analysis processes in RapidMiner are built using rectangular building blocks called _________.

operators

What type of correlation occurs when two attributes are correlated to one another, and as the values in one attribute decrease, the values in the other attribute also decrease? positive negative mutual neutral

positive

Discriminant analysis, k-Nearest Neighbors, and Naïve Bayes are all data mining models used to __________ data values. categorize predict & categorize guess predict

predict & categorize

Considerations for Data Understanding include all of the following EXCEPT _________. accuracy of the data presentation of the data age of the data completeness of the data

presentation of the data

The Naïve Bayes technique for predicting categorical outcomes employs both ________ and _______. outliers; skewness probability; outliers variance; skewness probability; variance

probability, variance

If a data analyst finds that a decision tree model has too many nodes or leaves to be meaningful, the analyst should apply _________ to the tree. chopping lifting stumping pruning

pruning

Association rules data mining is often used to produce ________. ideations forecasts predictions recommendations

recommendations

Removing records that contain missing or inconsistent data from a data set before analysis is an example of _________. purging reduction missing values data mining

reduction

Selecting some subset of records from a data set is called ________ the data. modifying sampling morphing correcting

sampling

Data that does not have known outcome values for an attribute you wish to predict is known as ________ data. unknown scoring evaluating training

scoring

The mathematical formula for calculating the ________ percent is the number of times an association did occur divided by the number of times it could have occurred in the data set. support confidence Laplace gain

support
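For comparison with confidence, support divides by every observation in which the association could have occurred, not only those containing the premise. A small sketch with made-up transactions and a hypothetical `support` helper:

```python
# Hypothetical market-basket data: each set is one transaction.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
    {"bread", "milk"},
]


def support(transactions, items):
    """Times the items occurred together divided by the number of transactions."""
    occurrences = sum(1 for t in transactions if items <= t)
    return occurrences / len(transactions)


bread_and_butter = support(transactions, {"bread", "butter"})
print(f"support(bread, butter) = {bread_and_butter:.2f}")
```

Bread and butter appear together in two of the five transactions, so their support is 2 / 5.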

In neural networks, the pathways between independent variables and dependent variables are called __________. synapses convergences line paths propagations

synapses

Data arranged into columns and rows in a database are stored in ________. warehouses tables dimensional arrays matrices

tables

In RapidMiner, if one or more independent variables have a non-numeric data type, what would be required for the Neural Net operator to work correctly? The non-numeric independent variables could be recoded to numeric values or excluded from the model. Nothing. The Neural Net operator in RapidMiner will automatically convert independent variable data types if it needs to. Nothing. The Neural Net operator in RapidMiner will use a Linear Regression model instead. Nothing. Like Decision Trees in RapidMiner, the Neural Net operator can also handle independent variables of all different data types.

the non-numeric independent variables could be recoded to numeric values or excluded from the model

The k in k-Means indicates ________. the intercept for the model the number of clusters desired the coefficient of the dependent variable the coefficient of the independent variable

the number of clusters desired

In a neural network model, how many nodes will the output layer always have? the number of variables used in the scoring dataset the number of distinct values in the dependent variable a random number based on how many training iterations (cycles) occur the number of variables used to train the model

the number of distinct values in the dependent variable

Data containing known outcome values for an attribute you wish to predict is known as ________ data. testing training evaluating scoring

training

Which parameter of the Neural Net operator in RapidMiner will limit the number of forward/backward propagations? training cycles learning rate momentum maximum iterations

training cycles

Using two or three different modeling techniques on the same data and then comparing predicted outcomes across the different models is called _________. the circle method trial-and-error analytics hypothesis testing triangulation

triangulation

All of the attribute values must be numeric if one wants to use the k-Means Clustering model.

true

In RapidMiner, the label (dependent variable) can be coded either alphabetically (e.g., true/false) or numerically (e.g., 0/1).

true

In logistic regression, the smaller the p-value for an independent variable, the more predictive power that variable has relative to the dependent variable.

true

The values true/false or 0/1 would both be valid combinations for the dependent variable in a logistic regression model.

true

True or false: A confidence percent of 18% in an association set would be considered too low to be of any use.

true

True or false: Data mining modeling techniques can classify, predict, or both.

true

True or false: In RapidMiner, data can be either imported or read into the software from CSV, text, and spreadsheet files.

true

True or false: In RapidMiner, if the min support parameter on the FP-Growth operator is set at 0.8 and the min confidence on the Create Association Rules operator is set at 0.75, the software may still return rules if an association has confidence of 88% but support of just 52%.

true

True or false: It is permissible at any point in the CRISP-DM process to return to an earlier step.

true

True or false: Scatterplots are a method of visualizing statistical correlations.

true

Value ranges for all attributes for every observation in a scoring data set must be within the value ranges for the corresponding attributes in the training data set in a linear regression model.

true
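Because, per an earlier card, RapidMiner will not throw an error on out-of-range scoring observations, the analyst has to filter them out. The sketch below shows the idea in plain Python, roughly what a Filter Examples step would accomplish; the data and helper names are made up for illustration.

```python
def attribute_ranges(training):
    """Return {attribute: (min, max)} observed in the training data."""
    names = training[0].keys()
    return {
        name: (min(row[name] for row in training), max(row[name] for row in training))
        for name in names
    }


def within_ranges(row, ranges):
    """True if every attribute value lies inside the training range."""
    return all(low <= row[name] <= high for name, (low, high) in ranges.items())


# Hypothetical training and scoring observations.
training = [{"age": 20, "income": 30000}, {"age": 60, "income": 90000}]
scoring = [{"age": 35, "income": 50000}, {"age": 75, "income": 40000}]

ranges = attribute_ranges(training)
usable = [row for row in scoring if within_ranges(row, ranges)]
print(usable)
```

The second scoring row is dropped because an age of 75 falls outside the 20 to 60 range seen in training.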

In linear regression, the x variable is the independent variable's ________.

value

