SAS Enterprise Miner Certification
builds classification models in an attempt to improve classification of rare events in the target.
Rule Induction Node A. builds classification models in an attempt to improve classification of rare events in the target. B. provides several predictive modeling techniques using latent variables.
enables the user to manually build if-then-else rules.
Rules Builder Node A. enables the user to manually build if-then-else rules. B. replace specific values and unknown levels for class variables.
The answer is 17.6576. The measure of association is provided as the Chi-Square value in the image.
Refer to the output from the StatExplorer node shown in the image. What is the measure of association of variable used_ind with the target?
Impute
Regression input variables with missing values can cause problems such as biased predictions. Methods to _______________ these values include synthetic distributions, estimation, and tree algorithms. A. Transform B. Modify C. Impute
Logistic
______________ regression is used by the tool if you have a categorical response (a binary or ordinal measurement level). A. Logistic B. Linear
creates random forest models in the HP Env.
HP Forest Node A. generates an identifier variable identifying the observations to be used for training and validation. B. creates random forest models in the HP Env.
creates Generalized Linear models in the HP Env.
HP GLM Node A. creates Generalized Linear models in the HP Env. B. creates random forest models in the HP Env.
creates new datasets or views by combining columns from datasets.
Merge Node A. creates new datasets or views by combining columns from datasets. B. creates a sample dataset.
used to modify columns metadata for variables.
Metadata Node A. used to modify columns metadata for variables. B. establishes a central connection to join multiple nodes.
fits both linear and logistic regression models.
Regression Node A. fits both linear and logistic regression models. B. provides several predictive modeling techniques using latent variables.
performs unsupervised learning using Kohonen vector quantization (VQ), Kohonen self-organizing maps (SOMs), or batch SOMs with Nadaraya-Watson or local-linear smoothing.
SOM/Kohonen Node A. performs unsupervised learning using Kohonen vector quantization (VQ), Kohonen self-organizing maps (SOMs), or batch SOMs with Nadaraya-Watson or local-linear smoothing. B. performs market basket analysis for data with potential item taxonomy.
creates a sample dataset.
Sample Node A. creates a sample dataset. B. creates new datasets or views by combining columns from datasets.
used to save data to a dataset or other specified location.
Save Data Node A. used to save data to a dataset or other specified location. B. used to register a model to the SAS Metadata Server.
Input Tool
The _______________ tool represents the data source that you choose for your mining analysis. It also provides details (metadata) about the variables in the data source that you want to use. A. Input Tool B. Source Tool
Average
The _____________function takes the average of the prediction estimates from the different models as the prediction from the Ensemble tool. This function is the default method. A. Average B. Maximum
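The Average function described in this card can be sketched in a few lines. This is only an illustration, not Enterprise Miner's implementation; the function name `ensemble_average` and the probabilities are hypothetical.

```python
def ensemble_average(predictions_by_model):
    """Average per-case predictions across models (one inner list per model)."""
    n_models = len(predictions_by_model)
    # zip(*...) groups the predictions that different models made for the same case.
    return [sum(case_preds) / n_models for case_preds in zip(*predictions_by_model)]


# Hypothetical posterior probabilities from three models for four cases.
model_predictions = [
    [0.20, 0.80, 0.50, 0.10],
    [0.30, 0.70, 0.40, 0.20],
    [0.10, 0.90, 0.60, 0.30],
]
consensus = ensemble_average(model_predictions)
```

Each entry of `consensus` is the single consensus prediction for one case.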
Squared Error
The squared difference between a target and an estimate is called the ___________________. A. Misclassification B. Squared Error C. Variance Probability
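A minimal sketch of squared error, and of the average squared error that other cards refer to. The function names and sample values are hypothetical.

```python
def squared_error(target, estimate):
    """Squared difference between a target value and its estimate."""
    return (target - estimate) ** 2


def average_squared_error(targets, estimates):
    """Mean of the per-case squared errors."""
    errors = [squared_error(t, e) for t, e in zip(targets, estimates)]
    return sum(errors) / len(errors)


# Hypothetical binary targets (1 = primary outcome) and predicted probabilities.
ase = average_squared_error([1.0, 0.0], [0.8, 0.3])
```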
Complex
______________ models can eliminate prediction bias but are much harder to interpret. A. Complex B. Simple
Doubling Amount
A __________________ gives the amount of change required for doubling the primary outcome odds. It is equal to log(2) ≈ 0.69 divided by the parameter estimate of the input of interest. A. Change Event B. Doubling Amount C. Logit Score
Nominal
A _____________variable (sometimes called a categorical variable) is one that has two or more categories, but there is no intrinsic ordering to the categories. For example, gender is a categorical variable having two categories (male and female) and there is no intrinsic ordering to the categories. A. Nominal B. Ordinal C. Interval
All of the Above
A data source in SAS Enterprise Miner differs from a raw data file because a data source has additional attached metadata. This metadata includes which of the following? a. the variable roles b. the variable measurement levels c. the data table role d. all of the above
Logit Score
A linear combination of the inputs generates a ______________ A. Logit Score B. Primary Outcome C. Secondary Input
Consequence
A profit value called a profit _________________ is assigned to both correct (and incorrect) outcome and decision combinations. The profit _____________ is the expected revenue (profit) or cost (loss or negative profit) that is associated with each outcome and decision combination. A. Consequence B. Increase C. Loss
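One way to see how profit consequences drive decisions is to compute the expected profit of each decision for a case and pick the largest. The sketch below assumes a hypothetical two-outcome, two-decision profit matrix; the labels and dollar values are illustrative only.

```python
def expected_profits(p_primary, profit_matrix):
    """Expected profit of each decision, given the predicted primary-outcome probability."""
    outcome_probs = {"primary": p_primary, "secondary": 1.0 - p_primary}
    decisions = profit_matrix["primary"].keys()
    return {
        decision: sum(outcome_probs[o] * profit_matrix[o][decision] for o in outcome_probs)
        for decision in decisions
    }


def best_decision(p_primary, profit_matrix):
    """Decision with the largest expected profit."""
    profits = expected_profits(p_primary, profit_matrix)
    return max(profits, key=profits.get)


# Hypothetical profit consequences for each outcome and decision combination.
profit_matrix = {
    "primary": {"solicit": 10.0, "ignore": 0.0},
    "secondary": {"solicit": -1.0, "ignore": 0.0},
}
```

With these values, a case with a predicted probability of 0.2 is worth soliciting, while a case at 0.05 is not.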
Odds Ratio
An ______________ expresses the increase in primary outcome odds associated with a unit change in an input. It is obtained by exponentiating the parameter estimate of the input of interest. A. Odds Ratio B. Primary Ratio C. Explanatory Ratio
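The odds ratio and the doubling amount both follow directly from a logistic regression parameter estimate. A short sketch, with a hypothetical estimate `beta`:

```python
import math


def odds_ratio(parameter_estimate):
    """Multiplicative change in primary outcome odds for a unit change in the input."""
    return math.exp(parameter_estimate)


def doubling_amount(parameter_estimate):
    """Change in the input required to double the primary outcome odds."""
    return math.log(2) / parameter_estimate


beta = 0.05  # hypothetical parameter estimate for one input
ratio = odds_ratio(beta)
amount = doubling_amount(beta)
```

Changing the input by `amount` multiplies the odds by `exp(beta * amount)`, which equals 2.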
Interval
An _________________ variable is similar to an ordinal variable, except that the intervals between the values of the interval variable are equally spaced. For example, suppose you have a variable such as annual income that is measured in dollars, and we have three people who make $10,000, $15,000 and $20,000. The $5,000 gap between the first and second person is the same size as the gap between the second and third. A. Nominal B. Ordinal C. Interval
Ordinal
An ____________ variable is similar to a categorical variable. The difference between the two is that there is a clear ordering of the categories. For example, suppose you have a variable, economic status, with three categories (low, medium and high). In addition to being able to classify people into these three categories, you can order the categories as low, medium and high. A. Nominal B. Ordinal C. Interval
Binary
An example of a ___________________target variable is purchase or no-purchase, which is often used for modeling customer profiles. A. Interval B. Ordinal C. Binary
Interval
An example of an ________________target variable is value of purchase, which is useful for modeling the best customers for particular products, catalogs, or sales campaigns. A. Interval B. Ordinal C. Binary
Ordinal
An example of an _____________target is sales volume, which might contain a few discrete values such as low, medium, and high. A. Interval B. Ordinal C. Binary
Appends Data Sets
Append Node A. imports an external file. B. appends data sets.
performs Association or Sequence discovery.
Association Node A. performs Association or Sequence discovery. B. creates new datasets or views by combining columns from datasets.
Logistic
Attempts to predict the probability that a binary or ordinal target will acquire the event of interest as a function of one or more independent inputs. A. Linear B. Logistic
Linear
Attempts to predict the value of a continuous target as a ___________function of one or more independent inputs. A. Linear B. Logistic
automated tool to help find optimal configurations for a neural network model.
AutoNeural Node A. automated tool to help find optimal configurations for a neural network model. B. applies transformations to dataset variables and helps to equally distribute variables that are skewed.
Average Square Error
By changing the model Assessment Measure property to ___________________, you can construct what is known as a class probability tree. It can be shown that by doing so, you can minimize the imprecision of the tree. Analysts sometimes use this model assessment measure to select inputs for a flexible predictive model such as neural networks. A. Assessment Measure property B. Average Square Error C. Regression
Average Square Error
By changing the model assessment measure to _______________________, you can construct what is known as a class probability tree. It can be shown that this action minimizes the imprecision of the tree. Analysts sometimes use this model assessment measure to select inputs for a flexible predictive model such as neural networks. A. Average Square Error B. Misclassification
Standard Deviation
By default, the Neural Network node standardizes all input variables prior to the weight estimation step so that they have a mean of zero and a __________________ of one. This default setting is known as _____________________. A. Standard Deviation B. Irregular Deviation
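Standardizing an input to a mean of zero and a standard deviation of one can be sketched as follows. Using the sample standard deviation is an assumption here, and the input values are hypothetical.

```python
import statistics


def standardize(values):
    """Rescale values to have mean 0 and (sample) standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    return [(v - mean) / sd for v in values]


# Hypothetical interval input.
z = standardize([10.0, 20.0, 30.0, 40.0])
```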
Categorical
Chi-Squared logworth, Entropy and Gini evaluate split worth for which type of variables? A. Categorical B. Interval
performs observation clustering which can be used to segment databases.
Cluster Node A. performs Association or Sequence discovery. B. performs observation clustering which can be used to segment databases.
establishes a central connection to join multiple nodes.
Control Point Node A. used to examine segmented or clustered data and identify factors that differentiate data segments from the population. B. establishes a central connection to join multiple nodes.
cutoff node for binary target decisions.
Cutoff Node A. cutoff node for binary target decisions. B. provides several predictive modeling techniques using latent variables.
computes summary statistics using the DMDB procedure.
DMDB Node A. computes summary statistics using the DMDB procedure. B. performs observation clustering which can be used to segment databases.
fit an additive non-linear model using bucketed principal components as inputs.
DMNeural Node A. fit an additive non-linear model using bucketed principal components as inputs. B. computes a forward stepwise least squares regression optionally including 2-way interactions, AOV16 and group variables.
partitions data into test, training and validation tables.
Data Partition Node A. partitions data into test, training and validation tables. B. creates a sample dataset.
Indicates if the advisor should use SQL pass-through to determine the measurement levels and compute the summary statistics.
Database Pass-Through A. Indicates if the advisor should use SQL pass-through to determine the measurement levels and compute the summary statistics. B. Specify whether to apply the "Class levels count threshold" property.
empirical tree represents a segmentation of the data that is created by applying a series of simple rules.
Decision Tree Node A. empirical tree represents a segmentation of the data that is created by applying a series of simple rules. B. applies transformations to dataset variables and helps to equally distribute variables that are skewed.
used to create or modify decision data for building models based on the value of the decisions and/or prior probabilities.
Decisions Node A. used to create or modify decision data for building models based on the value of the decisions and/or prior probabilities. B. provides several predictive modeling techniques using latent variables.
Specify whether to apply the "Class levels count threshold" property.
Detect Class Levels A. Specify whether to mark variables with more than "Max. Missing Percent" as REJECTED. B. Specify whether to apply the "Class levels count threshold" property.
computes a forward stepwise least squares regression optionally including 2-way interactions, AOV16 and group variables.
Dmine Regression Node A. empirical tree represents a segmentation of the data that is created by applying a series of simple rules. B. computes a forward stepwise least squares regression optionally including 2-way interactions, AOV16 and group variables.
drops variables from the preceding training path.
Drop Node A. drops variables from the preceding training path. B. divides a set of input variables into disjoint or hierarchical clusters.
establishes an end point for group processing. Must be used with a start group.
End Groups Node A. establishes an end point for group processing. Must be used with a start group. B. establishes a central connection to join multiple nodes.
creates a new model by taking a function of posterior probabilities (for class targets) or the predicted values (for interval models) from multiple models.
Ensemble Node A. creates a new model by taking a function of posterior probabilities (for class targets) or the predicted values (for interval models) from multiple models. B. computes a forward stepwise least squares regression optionally including 2-way interactions, AOV16 and group variables.
False. Ensemble models combine model predictions to form a single consensus prediction.
Ensemble models combine predictions from multiple models to create multiple consensus predictions. A. True B. False
tool that illustrates the various UI elements that can be used by extension nodes.
Ext Demo A. tool that illustrates the various UI elements that can be used by extension nodes. B. establishes a central connection to join multiple nodes.
imports an external file.
File Import Node A. imports an external file. B. removes observations from data based upon specified criteria.
removes observations from data based upon specified criteria.
Filter Node A. Provides details about the variables used as input for data mining. B. removes observations from data based upon specified criteria.
Categorical
For _____________________ inputs, replace any missing values with the most frequent category. A. Interval B. Categorical
Decision Tree
For ____________ models, the modeling algorithm automatically ignores irrelevant inputs. Other modeling methods must be modified or rely on additional tools to properly deal with irrelevant inputs. A. Regression B. Decision Tree C. Neural Network
Binary
For a ___________ target, the weight estimation process is driven by an attempt to maximize the log-likelihood function. A. Primary B. Interval C. Binary
Estimate
For a binary target, _______________ predictions are the probability of the primary outcome for each case. Primary outcome cases should have a high predicted probability; secondary outcome cases should have a low predicted probability. A. Estimate B. Approximation C. Real
All of the above
For a k-means clustering analysis, which of the following statements is true about input variables? a. Input variables should be limited in number and be relatively independent. b. Input variables should be of interval measurement level. c. Input variables should have distributions that are somewhat symmetric. d. Input variables should be meaningful to analysis objectives. e. All of the above
Logit Link Function
For binary prediction, any monotonic function that maps the unit interval to the real number line can be considered as a link. The _________________ is one of the most common, because it makes it easy to interpret the model. A. Regression Model B. Train Node C. Logit Link Function
True
For interval inputs, replace any missing values with the mean of the non-missing values. A. True B. False
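The two simple imputation rules in this set (mean for interval inputs, most frequent category for categorical inputs) can be sketched as below. Missing values are represented by `None`, which is an assumption of this example, and the function names are hypothetical.

```python
from collections import Counter


def impute_mean(values):
    """Replace missing interval values with the mean of the non-missing values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def impute_mode(values):
    """Replace missing categorical values with the most frequent category."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]
```

For example, `impute_mean([1.0, None, 3.0])` fills the gap with 2.0, and `impute_mode(["a", None, "b", "a"])` fills it with "a".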
High Performance Linear and Logistic regression models in the HP Env.
HP Regression A. generates principal components to be used as inputs for successor nodes in the HP Env. B. High Performance Linear and Logistic regression models in the HP Env.
equals 2 × (ROC index - 0.5)
Gini coefficient (for binary prediction)? A. identifies the expected revenues and expected costs for each decision alternative for each level of a target variable B. equals 2 × (ROC index - 0.5)
creates a series of decision trees by fitting the residual of the prediction from the earlier tree in the series.
Gradient Boosting Node A. creates a series of decision trees by fitting the residual of the prediction from the earlier tree in the series. B. computes a forward stepwise least squares regression optionally including 2-way interactions, AOV16 and group variables.
generates graphical reports and interactive graphs.
Graph Explore Node A. computes summary statistics using the DMDB procedure. B. generates graphical reports and interactive graphs.
performs clustering in the HP environment.
HP Cluster Node A. performs clustering in the HP environment. B. extracts the source code and score metadata to a folder. Must be preceded by a score node.
generates an identifier variable identifying the observations to be used for training and validation.
HP Data Partition Node A. generates an identifier variable identifying the observations to be used for training and validation. B. extracts the source code and score metadata to a folder. Must be preceded by a score node.
generates summary statistics for HP Env.
HP Explore Node A. generates summary statistics for HP Env. B. extracts the source code and score metadata to a folder. Must be preceded by a score node.
generates values for missing variables in the HP Env.
HP Impute A. generates values for missing variables in the HP Env. B. creates Generalized Linear models in the HP Env.
generates neural networks in the HP Env.
HP Neural A. generates neural networks in the HP Env. B. creates Generalized Linear models in the HP Env.
generates principal components to be used as inputs for successor nodes in the HP Env.
HP Principal Components A. generates principal components to be used as inputs for successor nodes in the HP Env. B. creates Generalized Linear models in the HP Env.
creates Support Vector Machine models in the HP Env.
HP SVM Node A. creates Support Vector Machine models in the HP Env. B. High Performance Linear and Logistic regression models in the HP Env.
creates variable transformations in the HP Env using high performance procedures.
HP Transform Node A. creates variable transformations in the HP Env using high performance procedures. B. High Performance Linear and Logistic regression models in the HP Env.
generates tree models in the HP Env.
HP Tree Node A. generates tree models in the HP Env. B. High Performance Linear and Logistic regression models in the HP Env.
select variables using high performance procedures.
HP Variable Selection A. select variables using high performance procedures. B. High Performance Linear and Logistic regression models in the HP Env.
The correct answer is c. You must calculate statistics for the models generated in each step of the selection process using both the training and validation data sets. When you study results and assess iterative plots you must compare the average squared error and the misclassification rate associated with different models. You use Validation Misclassification if your predictions are decisions and use Validation Error if your predictions are estimates.
How can you optimize complexity for regression models when using SAS Enterprise Miner? a. Calculate statistics for the models generated in each step of the selection process using only the training data set. b. Focus primarily on the Validation data set when you study regression results and assess iterative plots. c. You can configure the Regression tool to adjust the Entry and Stay Significance Level properties to consider a richer class of models than a stepwise model that is based on the default settings. d. As a selection criterion, use Validation Error if your predictions are decisions, and use Validation Misclassification if your predictions are estimates.
Polynomial combinations of the model inputs enable predictions to better match the true input/target association. They reduce prediction bias, but they increase the chances of overfitting and make the predictions harder to interpret.
How would you characterize the effects of adding polynomial combinations of the model inputs to a regression? a. Polynomial combinations of the model inputs enable predictions to better match the true input/target association. b. Polynomial combinations of the model inputs decrease the chances of overfitting. c. Polynomial combinations of the model inputs do not minimize prediction bias. d. Polynomial combinations of the model inputs enhance the interpretability of the predictions.
True
If a profit matrix is defined for a data source, the Model Comparison tool selects the model with the largest validation average profit by default. Otherwise, the Model Comparison tool selects the model with the smallest validation misclassification rate. A. True B. False
False. Model prediction estimates will be biased if the outcome proportions in the training sample and scoring populations do not match.
If the outcome proportions in the training sample and scoring populations do not match, model prediction estimates will not be biased. A. True B. False
Categorical
In Decision Tree modeling, if the input is ____________________, the average value of the target is taken within each ____________________ input level. The averages serve the same role as the unique interval input values in the discussion that follows. A. Categorical B. Interval
Logworth
In Decision Tree modeling for splits, at least one ______________ must exceed a threshold for a split to occur with that input. By default, this threshold corresponds to a chi-squared p-value of 0.20 or a ______________ of approximately 0.7. A. Leaf Node B. Chi-Square Estimate C. Logworth
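The correspondence between a p-value of 0.20 and a logworth of about 0.7 follows from logworth being the negative base-10 logarithm of the p-value. A quick check:

```python
import math


def logworth(p_value):
    """Logworth of a split: -log10 of the chi-squared p-value."""
    return -math.log10(p_value)


# The p-value of 0.20 quoted in this card.
default_threshold = logworth(0.20)
```

`default_threshold` evaluates to roughly 0.699, which rounds to the 0.7 quoted in the card.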
Interval
In Decision Tree modeling, if the measurement scale of the selected input is ______________, each unique value serves as a potential split point for the data. A. Categorical B. Interval
Inputs
In Partial Least Squares (PLS) regression, the goal is to have linear combinations of the ________________ (called latent vectors) that account for variation in both inputs and the target. A. Inputs B. Outputs C. Targets
Logit
In Regression analysis, a linear combination of the inputs generates a ___________ score, the log of the odds of primary outcome. A. Logit B. Log C. Link Logit
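A sketch of how a logit score is formed and converted to a probability. The intercept, weights, and inputs are hypothetical, and the function names are not from any SAS API.

```python
import math


def logit_score(intercept, weights, inputs):
    """Linear combination of the inputs: the log of the odds of the primary outcome."""
    return intercept + sum(w * x for w, x in zip(weights, inputs))


def probability_from_logit(score):
    """Invert the logit link to recover the primary outcome probability."""
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical parameter estimates and one case's input values.
score = logit_score(1.0, [2.0, -1.0], [0.5, 3.0])
p = probability_from_logit(score)
```

Taking `log(p / (1 - p))` returns the original score, which is what makes it the log of the odds.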
Data Splitting
In predictive modeling, the standard strategy for honest assessment of model performance is ________________. A portion of the data source is used for fitting the model—the training data set. The rest of the data source (the validation data set and the test data set) is held out for empirical validation. A. Data Refinement B. Data Splitting C. Data Convergence
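Data splitting can be sketched as a random shuffle followed by slicing into three tables. The 50/25/25 fractions and the seed below are assumptions chosen for the example, not defaults of any tool.

```python
import random


def split_data(cases, train_fraction=0.5, validation_fraction=0.25, seed=12345):
    """Randomly split cases into training, validation, and test lists."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = int(len(shuffled) * train_fraction)
    n_valid = int(len(shuffled) * validation_fraction)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # whatever remains is held out for testing
    return train, validation, test


train, validation, test = split_data(range(100))
```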
Good
In the Cumulative Lift chart, high values of cumulative lift suggest that the model is doing a ____________ job of separating the primary and secondary cases. A. Poor B. Good
Incremental Response Modeling.
Incremental Response Node A. Incremental Response Modeling. B. High Performance Linear and Logistic regression models in the HP Env.
Provides details about the variables used as input for data mining.
Input Data Node A. removes observations from data based upon specified criteria. B. Provides details about the variables used as input for data mining.
groups variable values into classes that can be used as inputs for predictive modeling.
Interactive Binning Node A. drops variables from the preceding training path. B. groups variable values into classes that can be used as inputs for predictive modeling.
describes the ability of the model to separate the primary and secondary outcomes
Kolmogorov-Smirnov (KS) statistic? A. equals the percent of concordant cases plus one-half times the percent of tied cases B. describes the ability of the model to separate the primary and secondary outcomes
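The card describes the KS statistic only qualitatively. The sketch below uses the usual two-sample definition, the maximum gap between the empirical score distributions of primary and secondary cases; that this matches what a given tool reports is an assumption.

```python
def ks_statistic(scores, targets):
    """Maximum gap between the empirical score CDFs of primary (1) and secondary (0) cases."""
    primary = [s for s, t in zip(scores, targets) if t == 1]
    secondary = [s for s, t in zip(scores, targets) if t == 0]
    best = 0.0
    for cutoff in sorted(set(scores)):
        cdf_primary = sum(s <= cutoff for s in primary) / len(primary)
        cdf_secondary = sum(s <= cutoff for s in secondary) / len(secondary)
        best = max(best, abs(cdf_primary - cdf_secondary))
    return best
```

A value near 1 means the two groups receive almost completely separate scores; a value near 0 means the score distributions overlap.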
least angle regressions.
LARS node A. least angle regressions. B. computes a forward stepwise least squares regression optionally including 2-way interactions, AOV16 and group variables.
performs link analysis.
Link Analysis Node A. performs link analysis. B. generates graphical reports and interactive graphs.
creates a model for predicting binary and nominal targets, based on its k-closest neighbors from a training data set.
MBR Memory Based Reasoning Node A. least angle regressions. B. creates a model for predicting binary and nominal targets, based on its k-closest neighbors from a training data set.
performs market basket analysis for data with potential item taxonomy.
Market Basket Node A. performs link analysis. B. performs market basket analysis for data with potential item taxonomy.
Specify a maximum percentage of missing values for variables to be rejected. The default value is 50.
Missing Percentage Threshold A. Specify a maximum percentage of missing values for variables to be rejected. The default value is 50. B. If "Detect class levels"=Yes, interval variables with less than the number specified for this property will be marked as NOMINAL. The default value is 20.
compares model predictions from the prior modeling nodes and selects the best modeling candidate.
Model Comparison Node A. used to create or modify decision data for building models based on the value of the decisions and/or prior probabilities. B. compares model predictions from the prior modeling nodes and selects the best modeling candidate.
enables you to import and assess a model that is not created with one of the EMiner modeling nodes.
Model Import Node A. enables you to import and assess a model that is not created with one of the EMiner modeling nodes. B. creates a model for predicting binary and nominal targets, based on its k-closest neighbors from a training data set.
Training
More data devoted to ___________ results in more stable predictive models but less stable model assessments. A. Test B. Validation C. Training
Validation
More data devoted to ______________ results in less stable predictive models but more stable model assessments. A. Test B. Validation C. Training
generates various plots and charts.
MultiPlot Node A. generates various plots and charts. B. performs market basket analysis for data with potential item taxonomy.
a class of flexible, nonlinear regression models, discriminant models, and data reduction models that are interconnected in a nonlinear dynamic system.
Neural Network Node A. a class of flexible, nonlinear regression models, discriminant models, and data reduction models that are interconnected in a nonlinear dynamic system. B. creates a model for predicting binary and nominal targets, based on its k-closest neighbors from a training data set.
False positive fraction
One of the most useful is ___________________, the proportion of secondary outcome cases that find their way into the top ranks of predicted cases. A. Sensitivity B. Response C. False positive fraction
Sensitivity
One quantity of interest for a selected fraction of cases is _________________, the proportion of primary outcome cases. A. Sensitivity B. Response
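Sensitivity and false positive fraction for a selected fraction of top-ranked cases can be computed together. This is a sketch with hypothetical scores and targets (1 = primary outcome).

```python
def sensitivity_and_fpf(scores, targets, fraction):
    """Sensitivity and false positive fraction for the top-ranked fraction of cases."""
    ranked = sorted(zip(scores, targets), key=lambda pair: pair[0], reverse=True)
    n_selected = max(1, round(len(ranked) * fraction))
    selected = [t for _, t in ranked[:n_selected]]
    n_primary = sum(targets)
    n_secondary = len(targets) - n_primary
    sensitivity = sum(selected) / n_primary  # share of all primary cases selected
    fpf = (len(selected) - sum(selected)) / n_secondary  # share of all secondary cases selected
    return sensitivity, fpf


# Hypothetical model scores and observed outcomes for ten cases.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
targets = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
sens_20, fpf_20 = sensitivity_and_fpf(scores, targets, 0.2)
```

Repeating the calculation over all selection fractions gives the points of a receiver operating characteristic curve.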
submit programs written in open source languages.
Open Source Integration Node A. submit programs written in open source languages. B. establishes a central connection to join multiple nodes.
provides several predictive modeling techniques using latent variables.
Partial Least Squares node A. a class of flexible, nonlinear regression models, discriminant models, and data reduction models that are interconnected in a nonlinear dynamic system. B. provides several predictive modeling techniques using latent variables.
analyze preprocessed web data.
Path Analysis Node A. analyze preprocessed web data. B. performs market basket analysis for data with potential item taxonomy.
Cumulative lift
Plotting ______________________ for all selection fractions yields a ____________________ chart. High values of ______________________ suggest that the model is doing a good job of separating the primary and secondary cases. A. Response B. Cumulative lift C. Combined lift
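Cumulative lift at a given selection fraction is the primary outcome rate among the top-ranked cases divided by the overall rate. A minimal sketch with hypothetical data:

```python
def cumulative_lift(scores, targets, fraction):
    """Primary outcome rate in the top-ranked fraction, relative to the overall rate."""
    ranked = sorted(zip(scores, targets), key=lambda pair: pair[0], reverse=True)
    n_selected = max(1, round(len(ranked) * fraction))
    selected_rate = sum(t for _, t in ranked[:n_selected]) / n_selected
    overall_rate = sum(targets) / len(targets)
    return selected_rate / overall_rate


# Hypothetical model scores and observed outcomes (1 = primary) for ten cases.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
targets = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
lift_at_20 = cumulative_lift(scores, targets, 0.2)
```

Selecting every case always gives a lift of 1, so the chart is read by how far above 1 the curve stays at small selection fractions.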
used to register a model to the SAS Metadata Server.
Register Model A. submit programs written in open source languages. B. used to register a model to the SAS Metadata Server.
Receiver operating characteristic
Plotting the trade-off between sensitivity and false positive fraction across all selected fractions of data creates a _____________________ curve. The curve that is generated by a given model can be compared to a diagonal line running from the lower left to the upper right of the ________________chart. This diagonal line represents the trade-off between sensitivity and false positive fraction for various-sized random selections of cases. The further a model's _________________ curve pushes upward and to the left, away from the diagonal line, the better it is in separating primary and secondary outcomes. A. Receiver operating characteristic B. Response C. Demand
Failure to adjust for separate sampling. The advantage of separate sampling is that you are able to obtain (on the average) a model of similar predictive power with a smaller overall case count. This is in concordance with the idea that the amount of information in a data set with a categorical outcome is determined not by the total number of cases in the data set itself, but instead by the number of cases in the rarest outcome category.
Prediction estimates reflect target proportions in the training sample, not the population from which the sample was drawn. Score Rankings plots are inaccurate and misleading. Decision-based statistics related to misclassification or accuracy misrepresent the model performance on the population. A. This is due to missing values in the data. B. Failure to adjust for separate sampling.
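One common way to adjust a predicted probability for separate sampling is to reweight it by the ratio of population to sample outcome proportions. The sketch below shows that correction; whether a given tool applies exactly this formula is an assumption, and the proportions are hypothetical.

```python
def adjust_for_separate_sampling(p_sample, sample_prior, population_prior):
    """Rescale a probability estimated on an oversampled training set to the population."""
    primary = p_sample * population_prior / sample_prior
    secondary = (1.0 - p_sample) * (1.0 - population_prior) / (1.0 - sample_prior)
    return primary / (primary + secondary)


# Hypothetical: the training sample is 50% primary, but the population is only 5% primary.
p_adjusted = adjust_for_separate_sampling(0.5, sample_prior=0.5, population_prior=0.05)
```

Here a case that looks like a coin flip in the balanced sample corresponds to a 5% probability in the population.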
generates principal components to be used as inputs for successor nodes.
Principal Components Node A. generates principal components to be used as inputs for successor nodes. B. groups variable values into classes that can be used as inputs for predictive modeling.
equals the percent of concordant cases plus one-half times the percent of tied cases
ROC index (concordance)? A. equals the percent of concordant cases plus one-half times the percent of tied cases B. provides a tally of the correct or incorrect prediction decisions
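The concordance definition can be computed directly by comparing every primary case with every secondary case. The sketch also applies the Gini relation 2 × (ROC index - 0.5) given in the Gini coefficient card; the function names are hypothetical.

```python
def roc_index(scores, targets):
    """Share of concordant primary/secondary pairs plus half the share of tied pairs."""
    primary = [s for s, t in zip(scores, targets) if t == 1]
    secondary = [s for s, t in zip(scores, targets) if t == 0]
    concordant = 0
    tied = 0
    for p in primary:
        for s in secondary:
            if p > s:
                concordant += 1  # the primary case is ranked above the secondary case
            elif p == s:
                tied += 1
    pairs = len(primary) * len(secondary)
    return (concordant + 0.5 * tied) / pairs


def gini_coefficient(scores, targets):
    """Gini coefficient for binary prediction: 2 * (ROC index - 0.5)."""
    return 2.0 * (roc_index(scores, targets) - 0.5)
```

A model that ranks every primary case above every secondary case has a ROC index of 1 and a Gini coefficient of 1; a model with no ranking ability sits at 0.5 and 0.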
140%. The answer is derived from the cumulative lift chart at a depth of 20%.
Refer to the graphs shown on the two exhibit slides. The graphs are from a study of response rate to a marketing campaign. How much more likely are the top 20% of targeted respondents to purchase the product than a randomly selected sample? A. 60% B. 30% C. 140%
Specify a maximum number of levels for a class variable before being marked REJECTED. The default value is 20.
Reject Levels Count Threshold A. Specify a maximum number of levels for a class variable before being marked REJECTED. The default value is 20. B. If "Detect class levels"=Yes, interval variables with less than the number specified for this property will be marked as NOMINAL. The default value is 20.
Specify a maximum number of levels for a class variable before being marked REJECTED. The default value is 20.
Reject Levels Count Threshold A. Specify a maximum number of levels for a class variable before being marked REJECTED. The default value is 20. B. Specify whether to apply the "Class levels count threshold" property.
Specify whether to apply the "Reject levels count threshold" property.
Reject Vars with Excessive Class Values A. Specify whether to apply the "Reject levels count threshold" property. B. Specify whether to apply the "Class levels count threshold" property.
Specify whether to mark variables with more than "Max. Missing Percent" as REJECTED.
Reject Vars with Excessive Missing Values A. Specify whether to mark variables with more than "Max. Missing Percent" as REJECTED. B. If "Detect class levels"=Yes, interval variables with less than the number specified for this property will be marked as NOMINAL. The default value is 20.
Overfitting
Removing redundant or irrelevant inputs from a training data set often reduces __________________and improves prediction performance A. Underfitting B. Overfitting
replace specific values and unknown levels for class variables.
Replacement Node A. generates principal components to be used as inputs for successor nodes. B. replace specific values and unknown levels for class variables.
generates a document for nodes in the process flow.
Reporter Node A. generates a document for nodes in the process flow. B. used to register a model to the SAS Metadata Server.
used to run SAS code.
SAS Code Node A. used to run SAS code. B. used to register a model to the SAS Metadata Server.
Multi-way Splits
SAS Enterprise Miner enables a multitude of variations of the default tree algorithm. One variation involves the use of __________________instead of binary splits. _________________affect decision trees in the following ways: trades tree height for tree width, complicates the split search, and uses heuristic shortcuts. A. Binary B. Nominal C. Multi-way splits
Stopped The name stopped training comes from the fact that the final model is selected as if training were stopped on the optimal iteration. Detecting when this optimal iteration occurs (while actually training) is somewhat problematic. To avoid stopping too early, the Neural Network tool continues to train until convergence on the training data or until reaching the maximum iteration count, whichever comes first.
SAS Enterprise Miner treats each iteration in the optimization process as a separate model. The iteration with the smallest value of the selected fit statistic is chosen as the final model. This method of model optimization is called ___________ training. A. Stopped B. Iterative C. Prediction
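The selection rule in this card can be illustrated with a minimal Python sketch. It assumes a hypothetical list holding the selected fit statistic for each iteration; the function name and the numbers are made up for illustration and are not part of SAS Enterprise Miner.

```python
def select_stopped_iteration(fit_statistics):
    """Return (iteration_index, value) for the smallest fit statistic.

    Each iteration is treated as a separate candidate model, and the
    one with the smallest value of the fit statistic is kept.
    """
    best = min(range(len(fit_statistics)), key=lambda i: fit_statistics[i])
    return best, fit_statistics[best]


# Hypothetical validation error recorded after each training iteration.
history = [0.30, 0.22, 0.18, 0.17, 0.19, 0.21]
best_iteration, best_value = select_stopped_iteration(history)
print(best_iteration, best_value)
```

Training still runs through every iteration in the list; the choice is made afterward, as if training had stopped at the best one.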
is a penalized likelihood statistic, which can be thought of as a weighted average square error
Schwarz Bayesian Criterion (SBC) / likelihood? A. measures the difference between the prediction estimate and the observed target value B. is a penalized likelihood statistic, which can be thought of as a weighted average square error
extracts the source code and score metadata to a folder. Must be preceded by a score node.
Score Card Export Node A. used to save data to a dataset or other specified location. B. extracts the source code and score metadata to a folder. Must be preceded by a score node.
applies the score code from the preceding training path of a dataset with a role of Score. Used to validate against training/validation data.
Score Node A. applies the score code from the preceding training path of a dataset with a role of Score. Used to validate against training/validation data. B. compares model predictions from the prior modeling nodes and selects the best modeling candidate.
used to examine segmented or clustered data and identify factors that differentiate data segments from the population.
Segment Profile Node A. used to examine segmented or clustered data and identify factors that differentiate data segments from the population. B. compares model predictions from the prior modeling nodes and selects the best modeling candidate.
generates summary and association statistics.
StatExplorer Node A. generates summary and association statistics. B. performs market basket analysis for data with potential item taxonomy.
Survival Data Mining.
Survival Node A. Survival Data Mining. B. High Performance Linear and Logistic regression models in the HP Env.
analyzes the autocorrelation and cross-correlation of the time series data.
TS Correlation A. analyzes the autocorrelation and cross-correlation of the time series data. B. High Performance Linear and Logistic regression models in the HP Env.
provides time series data cleaning, summarization, transformation and transpose.
TS Data Preparation Node A. provides time series data cleaning, summarization, transformation and transpose. B. High Performance Linear and Logistic regression models in the HP Env.
creates the classical seasonal decomposition of time series data.
TS Decomposition Node A. creates the classical seasonal decomposition of time series data. B. High Performance Linear and Logistic regression models in the HP Env.
reduces the dimension of time series using Discrete Wavelet Transform, Discrete Fourier Transform, Singular Value Decomposition or Line Segment Approximation.
TS Dimension Reduction A. creates the classical seasonal decomposition of time series data. B. reduces the dimension of time series using Discrete Wavelet Transform, Discrete Fourier Transform, Singular Value Decomposition or Line Segment Approximation.
generates forecasts by using exponential smoothing models with optimized smoothing weights for many time series.
TS Exponential Smoothing A. generates forecasts by using exponential smoothing models with optimized smoothing weights for many time series. B. reduces the dimension of time series using Discrete Wavelet Transform, Discrete Fourier Transform, Singular Value Decomposition or Line Segment Approximation.
computes similarity measures associated with time-stamped or time series data.
TS Similarity Node A. computes similarity measures associated with time-stamped or time series data. B. reduces the dimension of time series using Discrete Wavelet Transform, Discrete Fourier Transform, Singular Value Decomposition or Line Segment Approximation.
The number of cluster centers With k selected, the k-means algorithm chooses cases to represent the initial cluster centers (also named seeds).
The 'k' in k means clustering represents what? A. The number of clusters. B. The number of cluster centers?
decision tree and neural network The Rule Induction tool makes predictions using a combination of rules (from decision trees) and a formula (from a neural network, by default).
The Rule Induction tool combines which of the following models? a. decision tree and neural network b. regression and decision tree c. regression and neural network
True
The Score tool creates score code modules in the SAS, C, and Java languages. These score code modules can be saved and used outside of SAS Enterprise Miner or outside of the SAS System. A. True B. False
True
The Score tool is used to score new data inside SAS Enterprise Miner and to create scoring modules for use outside of SAS Enterprise Miner. The Score tool adds predictions to any data source that has a role of Score. This data source must have the same inputs as the training data. A. True B. False
True
The Variable Selection tool's chi-square approach is similar to a decision tree. The advantage of the tree-like approach is its ability to detect nonlinear and non-additive relationships between the inputs and the target. However, its method for handling categorical inputs makes it sensitive to spurious input/target correlations. A. True B. False
heuristic
The ___________ algorithm alternately merges branches and reassigns consolidated groups of observations to different branches. The process stops when a binary split is reached. Among all candidate splits considered, the one with the best worth is chosen. The ____________ algorithm initially assigns each consolidated group of observations to a different branch, even if the number of such branches is more than the limit allowed in the final split. At each merge step, the two branches are merged that degrade the worth of the partition the least. After the two branches are merged, the algorithm considers reassigning consolidated groups of observations to different branches. Each consolidated group is considered in turn, and the process stops when no group is reassigned. A. Association B. Heuristic C. Regression
Validation
The ______________ data set is used for monitoring and tuning the model to improve its generalization. The tuning process usually involves selecting among models of different types and complexities. The tuning process optimizes the selected model on the validation data. Consequently, a further holdout sample is needed for a final, unbiased assessment. A. Training B. Validation C. Test
average squared error Average square error is a fundamental statistical measure of model performance. It is calculated by averaging (across all cases in a data set) the squared difference between the actual and the predicted target values of the target variable, as shown in the equation below.
The ______________ measures the difference between the prediction estimate and the observed target value. A. Concordance B. Misclassification Rate C. average squared error
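As a minimal sketch of the calculation described in this card, the following Python function averages the squared difference between actual and predicted target values across all cases. It assumes one numeric prediction per case; the function name and the sample values are illustrative only.

```python
def average_squared_error(actual, predicted):
    """Average, across all cases, of (actual - predicted) squared."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared) / len(squared)


# Hypothetical binary target (1 = primary outcome) and predicted probabilities.
actual = [1, 0, 1, 0]
predicted = [0.5, 0.5, 0.5, 0.5]
print(average_squared_error(actual, predicted))  # 0.25
```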
File Import
The _______________ tool enables you to convert selected external flat files, spreadsheets, and database tables into a format that SAS Enterprise Miner recognizes as a data source. A. Data Modify B. File Import
DMNeural
The _______________ tool is designed to provide a flexible target prediction using an algorithm with some similarities to a neural network. A multi-stage prediction formula scores new cases. The problem of selecting useful inputs is circumvented by a principal components method. Model complexity is controlled by choosing the number of stages in the multi-stage prediction formula. A. DMNeural B. Dmine Regression C. Rule Induction
Link Analysis
The _______________ tool is used to discover and examine connections between items in a complex system. The tool transforms data from different sources into a data model that can be graphed. Centrality measures are derived from the graph and the tool can perform item-cluster detection for certain types of data. Recommendation tables can also be provided for transactional input data. A. Sector Analysis B. Data Analysis C. Link Analysis
Rule Induction The Rule Induction algorithm has three steps. 1. Using a decision tree, the first step attempts to locate "pure" concentrations of cases. These are regions of the input space containing only a single value of the target. The rules identifying these pure concentrations are recorded in the scoring code, and the cases in these regions are removed from the training data. The two-color example data does not contain any "pure" regions, so the step is skipped. 2. The second step attempts to filter easy-to-classify cases. This is done with a sequence of binary target decision trees. The first tree in the sequence attempts to distinguish the most common target level from the others. Cases found in leaves correctly classifying the most common target level are removed from the training data. Using this revised training data, a second tree is built to distinguish the second most common target class from the others. Again, cases in any leaf correctly classifying the second most common target level are removed from the training data. This process continues through the remaining levels of the target. With a binary target, no cases will remain in the training data after this step, so the Rule Induction node is essentially a decision tree.
The ________________ tool combines decision tree and neural network models to predict nominal targets. It is intended to be used when one of the nominal target levels is rare. New cases are predicted using a combination of prediction rules (from decision trees) and a prediction formula (from a neural network, by default). A. Rules Node B. Rule Induction C. Rule Inclusion
Number of Surrogate
The __________________Rules, under the Node property, specifies the maximum number of surrogate rules that are sought in each non-leaf node. A surrogate rule is a backup to the main splitting rule. When the main splitting rule relies on an input whose value is missing, the first surrogate rule is invoked. If the first surrogate rule also relies on an input whose value is missing, the next surrogate rule is invoked. If missing values prevent the main rule and all of the surrogate rules from applying to an observation, then the main rule assigns the observation to the branch that it designated as receiving missing values. A. Number of Surrogate B. Data Detection C. Linear Regression
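The fallback order described in this card can be sketched in Python. This is an illustration of the idea only, not the Decision Tree node's implementation: the rule representation (an input name paired with a decision function), the variable names, and the thresholds are all hypothetical.

```python
def route(observation, main_rule, surrogate_rules, missing_branch):
    """Pick a branch using the main rule, then surrogates, then the default.

    Each rule is a pair (input_name, decide), where decide maps a
    non-missing input value to a branch label.
    """
    for input_name, decide in [main_rule, *surrogate_rules]:
        value = observation.get(input_name)
        if value is not None:  # this rule's input is available, so use it
            return decide(value)
    # Every rule relied on a missing input: use the branch that the
    # main rule designated for missing values.
    return missing_branch


# Hypothetical rules and observations.
main = ("income", lambda v: "left" if v < 50 else "right")
surrogates = [("age", lambda v: "left" if v < 30 else "right")]

print(route({"income": 80, "age": 25}, main, surrogates, "right"))    # main rule applies
print(route({"income": None, "age": 25}, main, surrogates, "right"))  # first surrogate applies
print(route({}, main, surrogates, "right"))                           # missing-value branch
```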
Variable
The ___________________ Selection tool provides selection based on one of two criteria: the R-square variable selection criterion or the chi-square selection criterion. A. Model B. Variable C. Input
Dmine Regression The main distinguishing feature of Dmine regression versus traditional regression is its grouping of categorical inputs and binning of continuous inputs. • The levels of each categorical input are systematically grouped together using an algorithm reminiscent of a decision tree. Both the original and grouped inputs are made available for subsequent input selection. • All interval inputs are broken into a maximum of 16 bins in order to accommodate nonlinear associations between the inputs and the target. The levels of the maximally binned interval inputs are grouped using the same algorithm for grouping categorical inputs. These binned-and-grouped inputs and the original interval inputs are made available for input selection.
The ____________________ tool is designed to provide a regression model with more flexibility than a standard regression model. It should be noted that with increased flexibility comes increased chances of overfitting. A. Rule Induction B. Regression Node C. Dmine Regression
Curse of dimensionality
The ______________________ refers to the exponential increase in data required to densely populate space as the dimension increases. For example, the eight points fill the one-dimensional space but become more separated as the dimension increases. In 100-dimensional space, they would be like distant galaxies. The ______________________ limits your practical ability to fit a flexible model to noisy data (real data) when there are a large number of input variables. A densely populated input space is required to fit highly complex models. In assessing how much data is available for data mining, the dimension of the problem must be considered. A. Profiling B. Curse of dimensionality C. Explanation
Model Comparison
The ______________________ tool collates statistics from different modeling nodes for easy comparison. The output below shows some of the statistics that the _______________________ tool generates, such as the misclassification rate, the average squared error, the ROC index, and the Kolmogorov-Smirnov statistic. A. Fit Statistics B. Model Comparison C. Data Explorer
AutoNeural
The ____________________ tool ignores decision processing data. Predictions from the ______________ tool are adjusted for prior probabilities, but its actual model selection process is based strictly on misclassification (without prior adjustment). When the primary and secondary outcome proportions are not equal, this process can lead to unexpected prediction results. A. AutoNeural B. Regression C. Neural Network
Bonferroni
The ___________________ correction is an adjustment made to P values when several dependent or independent statistical tests are being performed simultaneously on a single data set. To perform a ________________ correction, divide the critical P value (α) by the number of comparisons being made. A. Bonferroni B. Cluster C. Ksax
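The arithmetic in this card is a single division, shown here as a small Python sketch. The function name and the example p-values are hypothetical.

```python
def corrected_alpha(alpha, n_comparisons):
    """Divide the critical P value (alpha) by the number of comparisons."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha / n_comparisons


# Hypothetical p-values from five simultaneous tests at alpha = 0.05.
p_values = [0.001, 0.012, 0.030, 0.200, 0.008]
threshold = corrected_alpha(0.05, len(p_values))

# Only tests whose p-value falls below the corrected threshold stay significant.
significant = [p for p in p_values if p < threshold]
print(threshold, significant)
```

With five comparisons the threshold drops from 0.05 to about 0.01, so a p-value of 0.012 no longer counts as significant.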
Maximum
The ________________ function takes the maximum of the prediction estimate from the different models as the prediction from the Ensemble tool. A. Average B. Maximum
Dimension
The ________________of a problem refers to the number of input variables that are available for creating a prediction. Data mining problems are often massive in ________________. Therefore, predictive models must have a means of selecting useful inputs from a potentially vast number of candidates. Predictive modeling methods such as decision trees, regressions, and neural networks have built-in mechanisms for ______________reduction. A. Dimension B. Logistics C. Collinearity
Ensemble
The ________________tool creates a new model by combining the predictions from multiple models. For prediction estimates and rankings, this combination is usually done by averaging. When the predictions are decisions, this combination is done by voting. The commonly observed advantage of ______________ models is that the combined model provides a better prediction than the individual models that compose it. It is important to note that the _______________ model can be more accurate than the individual models only if the individual models disagree with one another. You should always compare the model performance of the _______________ model with the individual models. Using an _____________ model, you can combine predictions from multiple models to create a single consensus prediction. A. Probability B. Score C. Ensemble
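The two combination methods named in this card, averaging for estimates and voting for decisions, can be sketched in Python as follows. The function names and the sample predictions are hypothetical, and the sketch makes no claim about how the tool itself is implemented.

```python
from collections import Counter


def average_estimates(model_estimates):
    """Average the prediction estimates of several models, case by case.

    model_estimates holds one list of estimates per model, all the same length.
    """
    return [sum(case) / len(case) for case in zip(*model_estimates)]


def vote_decisions(model_decisions):
    """Return the most common decision across models, case by case."""
    return [Counter(case).most_common(1)[0][0] for case in zip(*model_decisions)]


# Hypothetical predictions from three models for two cases.
estimates = [[0.2, 0.8], [0.6, 0.4], [0.4, 0.9]]
decisions = [["yes", "no"], ["yes", "yes"], ["no", "no"]]

print(average_estimates(estimates))
print(vote_decisions(decisions))  # ['yes', 'no']
```

If all three models returned identical predictions, the combined result would equal each individual result, which is why the card notes that an ensemble can only help when the models disagree.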
Split
The ___________search starts by selecting an input for partitioning the available training data. If the measurement scale of the selected input is interval, each unique value serves as a potential _________ point for the data. If the input is categorical, the average value of the target is taken within each categorical input level. The averages serve the same role as the unique interval input values in the discussion that follows. A. Explanatory B. Data C. Split
True
The analysis role of each variable tells SAS Enterprise Miner the purpose of the variable in the current analysis. The measurement level of each variable distinguishes continuous numeric variables from categorical variables. The analysis role of the data set tells SAS Enterprise Miner how to use the selected data set in the analysis. A. True B. False
Hidden Units
The basic building blocks of multilayer perceptrons are called ____________. ______________ are modeled after the neuron. Each ______________ receives a linear combination of input variables. The coefficients are called the (synaptic) weights. An activation function transforms the linear combinations and then outputs them to another unit that can then use them as inputs. A. Linear Units B. Regression Nodes C. Hidden Units
Gini Index
The basis for calculating worth in the ___________ method is the proportion of individual class outcomes in the cases (for all cases and for the weighted subsets of the cases). Proportions are squared and then summed across classes. Worth is calculated as the difference between one and the sum of the squares of proportions. A. Entropy B. Gini Index C. Variance
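The calculation described in this card, one minus the sum of squared class proportions, can be sketched in Python. The function name and the class counts are illustrative.

```python
def gini_index(class_counts):
    """One minus the sum of squared class proportions."""
    total = sum(class_counts)
    return 1.0 - sum((count / total) ** 2 for count in class_counts)


print(gini_index([5, 5]))   # 0.5 for an evenly mixed two-class node
print(gini_index([10, 0]))  # 0.0 for a pure node
```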
Variance
The basis for calculating worth with the _________ method is the difference between the target value for each case outcome (in the node or subset) and the average of the target values for that case outcome (in the node or subset). These differences are squared and then summed to calculate worth. A. Entropy B. Gini Index C. Variance
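A minimal Python sketch of the quantity described in this card: the squared differences between each target value in a node and the node's average target value, summed. The function name and the sample values are hypothetical.

```python
def sum_squared_deviations(targets):
    """Sum of squared differences between each target and the node mean."""
    mean = sum(targets) / len(targets)
    return sum((t - mean) ** 2 for t in targets)


# Hypothetical interval target values for the cases in one node.
print(sum_squared_deviations([1.0, 2.0, 3.0]))  # 2.0
```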
Entropy
The basis for calculating worth with the ___________method is the product of the proportion of individual class outcomes in the cases (for all cases and for the weighted subsets of the cases) and the log (base 2) of the proportion of individual class outcomes in the cases (for all cases and for the weighted subsets of the cases). This product is summed over all classes to calculate worth. A. Entropy B. Gini Index C. Variance
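The sum described in this card can be sketched in Python. One assumption goes beyond the card's wording: the sum of proportion times log base 2 of proportion is negated, which is the conventional sign for entropy and keeps the result non-negative. The function name and the class counts are illustrative.

```python
import math


def entropy(class_counts):
    """Negated sum over classes of proportion * log2(proportion).

    Classes with zero cases are skipped because log2(0) is undefined;
    they contribute nothing to the sum.
    """
    total = sum(class_counts)
    proportions = [count / total for count in class_counts if count > 0]
    return -sum(p * math.log2(p) for p in proportions)


print(entropy([5, 5]))  # 1.0 for an evenly mixed two-class node
```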
Split Search
The first part of the algorithm is called the _______________. The _______________ starts by selecting an input for partitioning the available training data. A. Split Search B. Random Search C. Allocated Search
DMine
The main distinguishing feature of ____________ regression versus traditional regression is its grouping of categorical inputs and binning of continuous inputs. A. Logistic B. Linear C. DMine
True
The metadata definition serves three primary purposes for a selected data set. It informs SAS Enterprise Miner of the following: the analysis role of each variable, the measurement level of each variable, and the analysis role of the data set. A. True B. False
Missing Values
The most prevalent problem for neural networks is __________________. Like regressions, neural networks require a complete record for estimation and scoring. Neural networks resolve this complication in the same way that regression does: by imputation. A. Missing Values B. Incomplete Case Structures C. Data Incompatibilities
True
The primary advantage of separate sampling is a reduction in the number of cases required to build a model (with little reduction in model quality). A. True B. False
Test
The ______________ data set has only one use: to give a final honest estimate of generalization. Consequently, cases in the __________ set must be treated just as new data would be treated. Therefore, they cannot be involved in the determination of the fitted prediction model. A. Test B. Validation C. Training
Weights
Then you use the Decision _____________ tab in the Decision Processing window to specify the values in the profit matrix. A. Nodes B. Estimates C. Weights
Inputs
This term is also known as predictors, features, explanatory variables, or independent variables? A. Targets B. Inputs
Target
This term is also known as response, outcome or dependent variables? A. Target B. Inputs
Categorical
To decrease a model's degrees of freedom and prevent model overfitting, you can consolidate _________________ inputs. First, each level of a _________________ variable is transformed into a numeric value using dummy variables. Because _________________ variables can have many levels, this can result in the addition of many variables to the model. To reduce the degrees of freedom used by the model, several levels of the _________________ variable can be assigned to a single dummy variable. A. Nominal B. Ordinal C. Categorical
applies transformations to dataset variables and helps to equally distribute variables that are skewed.
Transform Variables Node A. enables the user to manually build if-then-else rules. B. applies transformations to dataset variables and helps to equally distribute variables that are skewed.
model a class and interval target variable. The interval target is usually the value that is associated with a level of the class target.
TwoStage Node A. model a class and interval target variable. The interval target is usually the value that is associated with a level of the class target. B. provides several predictive modeling techniques using latent variables.
Accommodate
Unlike standard regression models, neural networks easily accommodate nonlinear and nonadditive associations between inputs and target. In fact, the main challenge is over-accommodation, that is, falsely discovering nonlinearities and interactions. A. Accommodate B. Reject
Expected
Using a completed profit matrix, SAS Enterprise Miner can calculate the ______________profit associated with each action (decision). The ____________profit is equal to the sum of the outcome/action profits multiplied by the outcome probabilities. A. Expected B. Average C. Summarized
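The expected profit calculation in this card can be sketched in Python: for each action, sum the outcome/action profits multiplied by the outcome probabilities. The profit matrix layout, the outcome and action names, and all numbers below are hypothetical.

```python
def expected_profits(profit_matrix, outcome_probabilities):
    """Expected profit per action.

    profit_matrix[outcome][action] is the profit of taking `action`
    when the true outcome is `outcome`.
    """
    actions = next(iter(profit_matrix.values())).keys()
    return {
        action: sum(
            outcome_probabilities[outcome] * profit_matrix[outcome][action]
            for outcome in profit_matrix
        )
        for action in actions
    }


# Hypothetical profit matrix and predicted outcome probabilities for one case.
profit_matrix = {
    "primary": {"solicit": 14.0, "ignore": 0.0},
    "secondary": {"solicit": -1.0, "ignore": 0.0},
}
probabilities = {"primary": 0.1, "secondary": 0.9}

profits = expected_profits(profit_matrix, probabilities)
best_action = max(profits, key=profits.get)
print(profits, best_action)
```

Here soliciting has an expected profit of about 0.5 (0.1 × 14 minus 0.9 × 1) against 0 for ignoring, so the sketch picks "solicit".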
divides a set of input variables into disjoint or hierarchical clusters.
Variable Clustering Node A. generates summary and association statistics. B. divides a set of input variables into disjoint or hierarchical clusters.
provides a tool to reduce the number of input variables using R-square, and Chi-Square selection criteria and so on.
Variable Selection Node A. provides a tool to reduce the number of input variables using R-square, and Chi-Square selection criteria and so on. B. divides a set of input variables into disjoint or hierarchical clusters.
Interval
Variance and ProbF logworth evaluate split worth for which type of variables? A. Categorical B. Interval
Model implementation is the process of applying your model to data other than your training data. Model implementation is the process of applying your model to score data in order to make predictions.
What is model implementation? a. Model implementation is the process of saving your predictive model as an xml file. b. Model implementation is the process of refining your model to more accurately make predictions. c. Model implementation is the process of applying your model to data other than your training data. d. Model implementation is the process of deploying your model to other users.
DMNeural
What prediction tool provides these features? Up to three PCs with highest target R square are selected. One of eight continuous transformations are selected and applied to selected PCs. The process is repeated three times with residuals from each stage. A. DMNeural B. Neural Network C. Auto Neural
Concordance
When a pair of primary and secondary cases is correctly ordered, the pair is said to be in ___________________. A. Discordance B. Concordance
Discordance
When a pair of primary and secondary cases is incorrectly ordered, the pair is said to be in ___________________. A. Discordance B. Concordance
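The concordance and discordance cards can be illustrated by counting pairs in Python. The sketch assumes that "correctly ordered" means the primary case received the higher predicted score; that reading, the function name, and the scores are assumptions made for illustration.

```python
def pair_counts(primary_scores, secondary_scores):
    """Count concordant, discordant, and tied primary/secondary pairs."""
    concordant = discordant = tied = 0
    for p in primary_scores:
        for s in secondary_scores:
            if p > s:
                concordant += 1  # primary case ranked above secondary case
            elif p < s:
                discordant += 1  # primary case ranked below secondary case
            else:
                tied += 1
    return concordant, discordant, tied


# Hypothetical predicted scores.
primary = [0.9, 0.4]
secondary = [0.5, 0.4, 0.1]
print(pair_counts(primary, secondary))  # (4, 1, 1)
```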
Logit
When the target variable is binary, as in the demonstration data, the main neural network regression equation receives the same __________ link function featured in logistic regression. As with logistic regression, the weight estimation process changes from least squares to maximum likelihood. A. data source B. Logit C. Processing
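For reference, the logit link and its inverse can be written as two short Python functions. This is the standard definition of the logit rather than anything specific to the Neural Network tool, and the function names are illustrative.

```python
import math


def logit(p):
    """Log odds of a probability p, for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inverse_logit(x):
    """Map a value on the logit scale back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


print(logit(0.5))            # 0.0: even odds
print(inverse_logit(0.0))    # 0.5
```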
False When you score data internally in SAS Enterprise Miner, a copy of the scored data table is stored on the SAS Foundation Server assigned to your project.
When you score data internally in SAS Enterprise Miner, a copy of the scored data table is not stored on the SAS Foundation Server assigned to your project. If the data table to be scored is very large, you might want to consider scoring the data outside of SAS Enterprise Miner using score code modules. A. True B. False
Memory-Based Reasoning The Memory-Based Reasoning (MBR) tool uses a k-nearest neighbor algorithm to categorize or predict observations.
Which modeling tool uses a k-nearest neighbor algorithm to categorize or predict observations? a. DMNeural b. Dmine Regression c. Memory-Based Reasoning
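A k-nearest neighbor classifier can be sketched in a few lines of Python. This shows the general technique only; it does not reproduce how the MBR node prepares inputs or searches for neighbors, and the Euclidean distance, the function name, and the data are assumptions.

```python
import math
from collections import Counter


def knn_classify(training_cases, query, k=3):
    """Classify `query` by majority vote among its k nearest training cases.

    training_cases is a list of (features, label) pairs; distance is Euclidean.
    """
    nearest = sorted(training_cases, key=lambda case: math.dist(case[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-input training data.
training = [
    ((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"),
    ((5.0, 5.0), "b"), ((5.0, 6.0), "b"),
]
print(knn_classify(training, (0.2, 0.2)))       # a
print(knn_classify(training, (5.0, 5.5), k=2))  # b
```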
You use the Score tool to apply a predictive model to scoring data, and you can use the SAS Code tool to save the scored data table to a different SAS library. The Model Implementation tool, the Java Code tool, and the C Code tool do not exist.
Which of the following SAS Enterprise Miner tools might you use to perform model implementation? a. the Score tool and the SAS Code tool b. the Score tool, the SAS Code tool, the Java Code tool, and the C Code tool c. the Score tool and the Model Implementation tool d. all of the above
Neural networks are universal approximators. Neural networks have no internal, automated process for selecting useful inputs.
Which of the following are true about neural networks in SAS Enterprise Miner? a. Neural networks are universal approximators. b. Neural networks have no internal, automated process for selecting useful inputs. c. both a & b. d. Neural networks cannot model nonlinear relationships.
Another benefit is ease in model interpretation. A cost associated with "regularizing" the input distributions using a simple transformation is difficulty in model interpretation.
Which of the following is not a good reason to "regularize" input distributions using a simple transformation? a. Regression models are sensitive to extreme or outlying values in the input space. b. When you perform regression, inputs with highly skewed or highly kurtotic distributions can be selected over inputs that would yield better overall predictions. c. One benefit is improved model performance. d. Another benefit is ease in model interpretation.
A prediction estimate for the target variables is formed from a simple linear combination of the inputs.
Which of the following is not true about logistic regression? a. A prediction estimate for the target variables is formed from a simple linear combination of the inputs. b. The model predicts the probability of a particular level of the target variable at the given values of the input variables. c. The logit link function is one of the most common ways to make predictions, because it makes it easy to interpret the model. d. Two ways to interpret a logistic regression model are an odds ratio and a doubling amount.
Increasing a model's degrees of freedom increases the chances of a model overfitting.
Which of the following is true about collapsing categorical inputs? a. A single categorical input can vastly decrease a model's degrees of freedom. b. Increasing a model's degrees of freedom increases the chances of a model overfitting. c. Like levels should not be collapsed into one numeric variable to reduce the number of variables needed to represent the categorical variable. d. If an input has a small number of levels, it is more efficient for you to use an autonomous method to consolidate the levels of a variable.
Score data contains the same input variables as training data, but the target variables might be different or missing.
Which of the following statements about the data table that you use for scoring is true? a. Score data contains the same target variable as training data, but the input variables might be different. b. Score data contains the same input variables as training data, but the target variables might be different or missing. c. Score data contains none of the same variables as training data. d. Score data must contain all of the same variables as training data, including input variables and target variables.
Profit
You can enhance the process of model selection by using _______________as a measure of model performance. _____________ is used to weight outcomes to emphasize or de-emphasize their importance in generating statistical measures. A. Response B. Loss C. Profit
None of the above. Typically, you do not need to alter any of the default settings within the Data Source Wizard when you create a data source for your scoring data. Usually, you want to apply your predictive model to the data without making changes to the data first. Partitioning data into training, validation, and test subsets is a task you perform on your training data, but not on your scoring data.
Which of the following statements about the score data source is true? a. You often partition your score data into training, validation, and test subsets. b. You often have to alter the variable roles, decision processing settings, and profit matrix specifications for your scoring data before you add it to your process flow. c. Both a and b are true. d. None of the above.
Train
You grow a decision tree automatically by using the __________ Node function in the Interactive Decision Tree. A. Train B. Data C. Regression
AutoNeural
You can use the ______________ tool to automatically explore alternative network architectures and hidden unit counts. The ________________ tool conducts limited searches in order to find better network configurations. There are several options that the tool uses to control the algorithm. A. Neural Network B. AutoNeural C. Data Partitioning
They are performed to reduce the bias in model predictions.
Which statement below is true about transformations of input variables in a regression analysis? a. They are never a good idea. b. They help model assumptions match the assumptions of maximum likelihood estimation. c. They are performed to reduce the bias in model predictions. d. They are typically
Cutoff Value
Within the Replacement node, you can control the number of standard deviations by selecting the ___________________ property. A. Bin B. Cutoff Value C. Inputs
Assessment Measure
You use the __________property to specify the method that you want to use to select the best tree, based on the validation data. The options for this property include the following: Decision (profit), Average Square Error, Misclassification, and Lift. Misclassification, Lift, and specialized pruning metrics should be used sparingly. A. Assessment Measure B. Average Square Error C. Regression
Estimate
____________ predictions approximate the expected value of the target, conditional on input values. For cases with numeric targets, this number can be thought of as the average value of the target for all cases having the observed input measurements. For cases with categorical targets, this number might equal the probability of a particular target outcome. A. Ranking B. Decision C. Estimate
Linear Linear regression predicts values by fitting a linear equation to available data. If you plot the data, linear regression draws a line through the values. The line represents the predicted value of Y for each value of the input X.
_____________ regression is used by the Regression tool if you have a continuous response (interval measurement level). A. Logistic B. Linear
Sensitivity Charts
______________ contrast the sensitivity statistic (the ability to detect primary outcome cases) versus the selection fraction or false positive fraction. A. Response Rate Charts B. Sensitivity Charts
Predictive
______________ modeling tries to find good rules for predicting the values of one or more variables in a data source from the values of other variables in the data source. After a good rule has been found, it can be applied to new data sources that might or might not contain the variable(s) that are being predicted. A. Regression B. Decision Tree C. Predictive
Simple
______________ models (linear regression, logistic regression) are easy to interpret, but linear predictions might lead to prediction bias. A. Complex B. Simple
Stepwise
______________ selection combines elements from both the forward and backward selection procedures. The method begins in the same way as the forward procedure, sequentially adding inputs with the smallest p-value below the entry cutoff. After each input is added, however, the algorithm reevaluates the statistical significance of all included inputs. If the p-value of any of the included inputs exceeds the stay cutoff, the input is removed from the model and reentered into the pool of inputs that are available for inclusion in a subsequent step. The process terminates when all inputs available for inclusion in the model have p-values in excess of the entry cutoff and all inputs already included in the model have p-values below the stay cutoff. A. Forward B. Backward C. Stepwise
Backward
______________ selection creates a sequence of models of decreasing complexity. The sequence starts with a saturated model, which is a model that contains all available inputs, and therefore, has the highest possible fit statistic. Inputs are sequentially removed from the model. At each step, the input chosen for removal least reduces the overall model fit statistic. This is equivalent to removing the input with the highest p-value. The sequence terminates when all remaining inputs have a p-value that is less than the predetermined stay cutoff. A. Forward B. Backward C. Stepwise
Accuracy
_______________ measures the fraction of cases where the decision matches the actual target value. A. Misclassification B. Accuracy
Unsupervised
________________ classification (also known as clustering and segmenting) attempts to group training data set cases based on similarities in input variables. A. Supervised B. Unsupervised
Synthetic Distribution
________________ methods use a "one size fits all" approach to handle missing values. Any case with a missing input measurement has the missing value replaced with a fixed number. The net effect is to modify an input's distribution to include a point mass at the selected fixed number. The location of the point mass in ____________________ methods is not arbitrary. Ideally, it should be chosen to have minimal impact on the magnitude of an input's association with the target. With many modeling methods, this can be achieved by locating the point mass at the input's mean value. A. Synthetic Distribution B. Equal Distribution C. Estimate Distribution
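The mean-based point mass described in this card can be sketched in a few lines of Python. This is only an illustration, not what the Impute node runs internally; the function name `impute_with_mean` and the sample values are hypothetical.

```python
from statistics import mean


def impute_with_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)  # location of the point mass
    return [fill if v is None else v for v in values]


# Hypothetical input with two missing measurements.
incomes = [40.0, None, 60.0, 50.0, None]
imputed = impute_with_mean(incomes)
print(imputed)
```

Every missing case receives the same fixed number, which is what makes this a "one size fits all" method.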
Input
________________ variables with extreme distributions can diminish the predictive power of regression models. Transformations of these variables to a less extreme, more symmetric form is recommended in most cases. A. Simple B. Input C. Output
Sample, Sampling
_________________ is recommended for extremely large databases because it can significantly decrease model training time. If the ______________ is sufficiently representative, relationships found in the ________________ can be expected to generalize to the complete data set. The ____________ tool writes the ___________ observations to an output data set. It saves the seed values that are used to generate the random numbers for the ___________ so that you can replicate the ________________. A. Sample B. Stratification
Discordance
__________________ measures the fraction of primary-target cases with a predicted score lower than the predicted score of secondary-target cases. A. Concordance B. Discordance
Estimates
_______________ predictions approximate the expected value of the target, conditioned on the input values. For cases with numeric targets, this number can be thought of as the average value of the target for all cases having the observed input measurements. A. Decisions B. Ranking C. Estimates
Misclassification
____________________ measures the fraction of cases where the decision does not match the actual target value. A. Misclassification B. Accuracy
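Accuracy and misclassification are complements of each other, which a short Python sketch makes concrete. The decision and target lists below are made-up illustration data, and the function names are hypothetical.

```python
def accuracy(decisions, actuals):
    """Fraction of cases where the decision matches the actual target value."""
    matches = sum(1 for d, a in zip(decisions, actuals) if d == a)
    return matches / len(actuals)


def misclassification(decisions, actuals):
    """Fraction of cases where the decision does not match the actual target value."""
    return 1.0 - accuracy(decisions, actuals)


# Hypothetical decisions and observed targets for five cases.
decisions = [1, 0, 1, 1, 0]
actuals = [1, 0, 0, 1, 1]
print(accuracy(decisions, actuals), misclassification(decisions, actuals))
```

Three of the five decisions match here, so accuracy is 3/5 and misclassification is 2/5.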
Forward
____________________ selection creates a sequence of models of increasing complexity. The sequence starts with the baseline model, a model predicting the overall average target value for all cases. The algorithm searches the set of one-input models and selects the model that most improves on the baseline model. It then searches the set of two-input models that contain the input selected in the previous step and selects the model showing the most significant improvement. By adding a new input to those selected in the previous step, a nested sequence of increasingly complex models is generated. The sequence terminates when no significant improvement can be made. A. Forward B. Backward C. Stepwise
Estimate
_____________________ methods provide tailored imputations for each case with missing values. This is done by viewing the missing value problem as a prediction problem. You can train a model to predict an input's value from other inputs. Then, when an input's value is unknown, you can use this model to predict or estimate the unknown missing value. This approach is best suited for missing values that result from a lack of knowledge about values that have no match or are not disclosed. The ______________ method is not appropriate for not-applicable missing values. A. Synthetic B. Equal C. Estimate
Response Rate Charts
_____________________ plot the proportion of primary outcome cases selected (or transformations of this statistic) versus the selection fraction. A. Response Rate Charts B. Sensitivity Charts
Memory-based reasoning
_______________________ is a process that identifies similar cases and applies the information that is obtained from these cases to a new record. In SAS Enterprise Miner, the ____________________ tool uses a k-nearest neighbor algorithm to categorize or predict observations. A. DMNeural B. Rule Induction C. Memory-based reasoning
Decisions
_____________________ usually are associated with some type of action (such as classifying a case as a donor or a non-donor). For this reason, ____________ are also known as classifications. ______________ prediction examples include handwriting recognition, fraud detection, and direct mail solicitation. A. Decisions B. Ranking C. Estimates
Model Model Implementation is the process of applying a predictive model to data that lacks a target variable in order to make predictions.
________________ implementation is the process of applying your chosen predictive model to score data. A. Model B. Training C. Validation
Concordance
_______________ measures the fraction of primary-target cases with a predicted score that exceeds the predicted score of secondary-target cases. A. Concordance B. Discordance
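Concordance and discordance can be illustrated with a small Python sketch. It assumes the usual pairwise reading of the definition: every primary-outcome case is compared with every secondary-outcome case, and tied scores count toward neither statistic. The scores and targets are hypothetical.

```python
def concordance_stats(scores, targets):
    """Return (concordance, discordance) over all primary/secondary pairs."""
    primary = [s for s, t in zip(scores, targets) if t == 1]
    secondary = [s for s, t in zip(scores, targets) if t == 0]
    pairs = len(primary) * len(secondary)
    # Concordant: the primary case is scored higher than the secondary case.
    concordant = sum(1 for p in primary for q in secondary if p > q)
    # Discordant: the primary case is scored lower than the secondary case.
    discordant = sum(1 for p in primary for q in secondary if p < q)
    return concordant / pairs, discordant / pairs


# Hypothetical predicted scores; target 1 marks the primary outcome.
scores = [0.9, 0.4, 0.7, 0.2]
targets = [1, 1, 0, 0]
print(concordance_stats(scores, targets))
```

Of the four primary/secondary pairs, three are concordant and one is discordant.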
Ranking
_____________ predictions order cases based on the input variables' relationships with the target variable. Using the training data, the prediction model attempts to ________ high value cases higher than low value cases. A. Decisions B. Ranking C. Estimates
Ranking
____________ predictions order cases based on the inputs' relationships with the target. Using the training data, the prediction model attempts to __________ high value cases higher than low value cases. It is assumed that a similar pattern exists in the scoring data so that high value cases will have high scores. The actual scores produced are inconsequential; only the relative order matters. The most common example of a ____________ prediction is a credit score. A. Ranking B. Decision C. Target
measures the difference between the prediction estimate and the observed target value
average squared error? A. measures the difference between the prediction estimate and the observed target value B. is a penalized likelihood statistic, which can be thought of as a weighted average square error
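A minimal Python sketch of average squared error, assuming the standard definition (the mean of the squared differences between each prediction estimate and the observed target value). The sample values are hypothetical.

```python
def average_squared_error(actuals, estimates):
    """Mean of squared differences between observed targets and estimates."""
    squared = [(a - e) ** 2 for a, e in zip(actuals, estimates)]
    return sum(squared) / len(squared)


# Hypothetical binary targets and predicted probabilities.
actuals = [1, 0, 1, 0]
estimates = [0.8, 0.2, 0.6, 0.4]
ase = average_squared_error(actuals, estimates)
print(ase)
```

Smaller values indicate estimates that sit closer to the observed targets.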
identifies the expected revenues and expected costs for each decision alternative for each level of a target variable
profit or loss? A. describes the ability of the model to separate the primary and secondary outcomes B. identifies the expected revenues and expected costs for each decision alternative for each level of a target variable
accuracy or misclassification
provides a tally of the correct or incorrect prediction decisions A. profit or loss B. accuracy or misclassification
Model implementation is the process of applying your model to data other than your training data. Model implementation is the process of applying your model to score data in order to make predictions. Review: What Is Model Implementation?
What is model implementation? a. Model implementation is the process of saving your predictive model as an xml file. b. Model implementation is the process of refining your model to more accurately make predictions. c. Model implementation is the process of applying your model to data other than your training data. d. Model implementation is the process of deploying your model to other users.
The number of cases required to build a model is reduced, with little reduction in model quality. The main advantage of separate sampling is that the number of cases required to build a model is reduced, with little reduction in model quality. Review: What is Separate Sampling?
What is the main advantage of separate sampling? a. Analysis results reflect the proportions of primary and secondary outcomes in the population of data. b. The number of cases required to build a model is reduced, with little reduction in model quality. c. Larger sample sizes yield more accurate results. d. both a and b
Data Source
A ________________ is a link between SAS Enterprise Miner and a SAS table. It contains metadata that informs SAS Enterprise Miner of all the information about the data set that is required for the analysis project. A. Data Source B. Control Node C. Analytic Workflow
Data Source
A __________________ is a metadata definition that provides SAS Enterprise Miner with information about a SAS data set or SAS table. A. Control Node B. Data Source C. Explorer Window
overfitting The maximal tree represents the most complicated model you are willing to construct from a set of training data. To avoid potential overfitting, many predictive modeling procedures offer some mechanism for adjusting model complexity. For decision trees, this process is known as pruning. Review: Introduction
Decision Tree models use pruning to adjust model complexity and avoid the potential problem known as what? a. overfitting b. accuracy c. concordance d. misclassification
Sample, Explore, Modify, Model, and Assess.
What is the SEMMA architecture?
A project can contain one or more diagrams. A diagram can contain one or more process flows. A process flow contains multiple nodes. Projects contain diagrams, diagrams contain process flows, and process flows contain nodes. Remember that a node is a SAS Enterprise Miner tool. Review: What Is a Project?
Which of the following correctly describes the hierarchical organization of an analysis within SAS Enterprise Miner? a. A project can contain one or more diagrams. A diagram is composed of multiple nodes. A node is composed of multiple process flows. b. A project can contain one or more process flows. A process flow can contain one or more diagrams. c. A project can contain only one diagram, which is composed of one process flow. A process flow can contain multiple nodes. d. A project can contain one or more diagrams. A diagram can contain one or more process flows. A process flow contains multiple nodes.
Fit Statistics can provide information that affects decision predictions but does not affect estimate predictions. Fit Statistics can provide information that affects both decision predictions and estimate predictions. If the decision predictions are of interest, model fit can be judged by misclassification. If estimate predictions are the focus, model fit can be assessed by average square error. Review: Interpreting the Results of the Regression Tool
Which of the following is not true about results produced by the Regression node? a. Variable Summary information identifies the roles of variables used by the Regression node. b. Model Information provides you with information that includes the number of target categories and the number of model parameters. c. Type 3 Analysis of Effects provides you with information about the number of parameters that each input contributes to the model. d. Fit Statistics can provide information that affects decision predictions but does not affect estimate predictions.
none of the above All of these statements about working with profit matrices are true. Review: About Profit Matrices
Which of the following statements about working with profit matrices is false? a. Profit values can be random and can vary between cases. b. When you incorporate a profit matrix into your analysis, the Model Comparison tool uses average profit to compare model performance. c. To specify a profit matrix, you use the Decision Processing window. d. none of the above
Sequence
______________ analysis is a type of association analysis that analyzes the order in which services or products are acquired. A. Market Basket B. Sequence C. Explanatory
Market Basket
_________________ analysis (also known as association rule discovery or affinity analysis) is a popular data mining method for exploring associations between items. A. Profiling B. Market Basket C. Association
Novelty Detection
_________________ methods seek unique or previously unobserved data patterns. These methods are used in business, science, and engineering. Business applications include fraud detection, warranty claims analysis, and general business process monitoring. A. Profiling B. Data Reduction C. Novelty Detection
51 There is one weight for each input per hidden unit. Each hidden unit has a bias. The output layer has a bias and a weight for each hidden unit. This equates to (3*10) + 10 + 1 + 10 = 51.
A multilayer perceptron neural network is using three interval inputs to model one interval target (outcome). The neural network has ten hidden units and one hidden layer. How many weights, including the biases, are being estimated? A. 51 B. 50 C. 41 D. 40
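The count in this card can be checked with a short Python sketch. It assumes a fully connected network with a single hidden layer; the function name `mlp_parameter_count` is hypothetical.

```python
def mlp_parameter_count(n_inputs, n_hidden, n_outputs=1):
    """Count weights and biases in a one-hidden-layer, fully connected MLP."""
    input_to_hidden = n_inputs * n_hidden    # one weight per input, per hidden unit
    hidden_biases = n_hidden                 # one bias per hidden unit
    hidden_to_output = n_hidden * n_outputs  # one weight per hidden unit, per output
    output_biases = n_outputs                # one bias per output unit
    return input_to_hidden + hidden_biases + hidden_to_output + output_biases


total = mlp_parameter_count(n_inputs=3, n_hidden=10)
print(total)
```

With three inputs and ten hidden units the four terms are 30, 10, 10, and 1.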
0.69/b1 The logistic regression equation (for two input variables) can in general be written as: log(p/(1-p)) = b0 + b1*X1 + b2*X2. The doubling amount is the change required in an input variable to double the primary outcome odds. It is equal to log(2) (about 0.69) divided by the parameter estimate of the input of interest.
A useful concept in logistic regression is the doubling amount. How would you calculate doubling amount for an input variable that has a parameter estimate of b1? A. 2*b1 B. 2*log(b1) C. 0.69/b1 D. 2/log(b1)
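The doubling amount follows from the odds changing by a factor of exp(b1 * delta) when the input changes by delta: setting that factor to 2 gives delta = log(2) / b1. A small Python check, using a hypothetical parameter estimate of 0.5:

```python
import math


def doubling_amount(b1):
    """Change in the input that doubles the primary outcome odds."""
    return math.log(2) / b1  # natural log of 2 is about 0.69


b1 = 0.5  # hypothetical parameter estimate
delta = doubling_amount(b1)
odds_multiplier = math.exp(b1 * delta)  # factor applied to the odds
print(delta, odds_multiplier)
```

The odds multiplier comes out as 2 (up to floating-point rounding), confirming the formula.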
Association
An ________________ rule is a statement of the form (item set A) => (item set B). The goal of the analysis is to determine the strength of all the association rules among a set of items. The value of the generated rules is gauged by confidence, support, and lift. A. Lift B. Association C. Explanatory
57% The confidence that the purchase of A implies the purchase of B = (total purchases of A and B together) / (total purchases of A) = 100 / 175 = 57%
An analyst is performing a market basket analysis (affinity analysis) on the purchasing of shaving cream and seltzer water. The purchase data from a set of 250 customers is shown in the image: What is the confidence of the rule "Shaving Cream implies Seltzer Water"? A. 40% B. 57% C. 60%
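The arithmetic behind this answer can be reproduced in Python. The referenced image is not available here, so the counts are taken from the answer explanation (100 customers bought both items, 175 bought shaving cream) and from the question (250 customers in total); the variable names are hypothetical.

```python
n_customers = 250      # total customers, from the question
n_shaving_cream = 175  # customers who bought shaving cream, from the answer
n_both = 100           # customers who bought both items, from the answer

# Confidence of "Shaving Cream implies Seltzer Water".
confidence = n_both / n_shaving_cream
# Support of the same rule: both items together over all customers.
support = n_both / n_customers

print(f"confidence = {confidence:.0%}, support = {support:.0%}")
```

Note that 40%, one of the distractors, is the support of the rule rather than its confidence.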
Nominal The reason is that each level of the variable simply represents the fact that they are different from each other. No level is greater or smaller than the other level. These levels do not follow any natural order. Therefore, the underlying scale of measurement is nominal in nature.
Assume a variable is coded as: 1=unmarried 2=married 3=divorced 4=widowed. Which measurement level should be selected in SAS EMiner for this variable? A. Unary B. Nominal C. Ordinal D. Interval
Gini coefficient
Assume in a data mining project that the task is to predict rankings of a target variable as accurately as possible. Which of the following should be used to judge prediction models? Choose one: KS statistic average squared error misclassification Gini coefficient
Stratify Stratification will ensure that the segment scheme values will be distributed evenly between the partitions.
Assume that a company has an excellent customer segmentation in place and the segment scheme is a variable in the input data set. What is the best partition method that one should use? A. Cluster B. Stratify C. Random D. Systemic
Sample Method: Stratify and Criterion: Equal The binary target variable will require a Stratify sample method. If equal is selected, the node samples the same number of observations from each stratum.
Assume the Target has an event proportion of 2% in the original data. Which of the following property values should be used in the Sample node of SAS EMiner to create a sample from that data with a balanced 50/50 split for Target? A. Sample Method: Stratify and Criterion: Proportional B. Sample Method: Stratify and Criterion: Equal C. Sample Method: Random and Criterion: Proportional
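The idea of equal-allocation stratified sampling can be sketched in Python. This is a conceptual illustration only and is not the Sample node's actual algorithm; it assumes the sample takes as many cases per stratum as the rarest stratum contains, and the names `equal_stratified_sample` and `target` are hypothetical.

```python
import random


def equal_stratified_sample(rows, target_key, seed=12345):
    """Draw the same number of rows from each level of the target."""
    rng = random.Random(seed)  # fixed seed so the sample can be replicated
    strata = {}
    for row in rows:
        strata.setdefault(row[target_key], []).append(row)
    # Assumption: use the size of the smallest stratum for every stratum.
    n_per_stratum = min(len(group) for group in strata.values())
    sample = []
    for group in strata.values():
        sample.extend(rng.sample(group, n_per_stratum))
    return sample


# Hypothetical data with a 2% event rate: 20 events among 1000 cases.
rows = [{"id": i, "target": 1 if i < 20 else 0} for i in range(1000)]
sample = equal_stratified_sample(rows, "target")
events = sum(row["target"] for row in sample)
print(len(sample), events)
```

The resulting sample holds 20 events and 20 non-events, a 50/50 split.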
Each unique value has the potential of being the optimal split point. Split point definition: partitioning the available training data. If the measurement scale of the selected input is interval, each unique value serves as a potential split point for the data.
Choose the correct statement that illustrates Decision Tree Split Search for continuous (interval) inputs: A. The variable goes through a non-linear transformation, and the transformed variable is used for testing. B. The variable goes through a binning process, the bins are weighted based on the proportion of events in each bin, and then finally tested as an optimal split point. C. Each unique value has the potential of being the optimal split point.
the highest logworth The best split for an input is the split that yields the highest logworth. Review: Split Search
The best split for an input is a split that yields what? a. a maximal tree b. a contingency table c. a depth adjustment d. the highest logworth
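As a rough Python sketch, logworth is commonly defined as the negative base-10 logarithm of the p-value from the chi-square test on the split, so smaller p-values give higher logworth. That definition is stated here as an assumption, since the card itself does not spell it out, and the candidate p-values are hypothetical.

```python
import math


def logworth(p_value):
    """Negative base-10 logarithm of a split's chi-square p-value."""
    return -math.log10(p_value)


# Hypothetical p-values for three candidate split points on one input.
candidate_p_values = {"x < 10": 0.04, "x < 25": 0.001, "x < 40": 0.2}
best_split = max(candidate_p_values, key=lambda s: logworth(candidate_p_values[s]))
print(best_split, logworth(candidate_p_values[best_split]))
```

The split with the smallest p-value is the one selected as best for the input.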
Each unique value has the potential of being the optimal split point. Split point definition: partitioning the available training data. If the measurement scale of the selected input is interval, each unique value serves as a potential split point in the data.
Choose the correct statement that illustrates Decision Tree Split Search for continuous (interval) inputs: A. The variable goes through a non-linear transformation, and the transformed variable is used for testing. B. The variable goes through a binning process, the bins are weighted based on the proportion of events in each bin, and then finally tested as an optimal split point. C. Each unique value has the potential of being the optimal split point. D. Each unique value has the potential of being the optimal split point, except for the extreme observation.
Segment
For higher dimension data when clustering (4+ variables), you can use the ______________ Profile tool to understand the generated partitions. This tool enables you to compare the distribution of a variable in an individual _______________ to the distribution of the variable overall. As a bonus, the variables are sorted by how well they characterize the ______________. A. Sourcing B. Profiling C. Segment
It uses a squared correlation and then a stepwise regression to eliminate irrelevant inputs. When you use the R-square variable selection criterion, a two-step process is followed. 1. SAS EMiner computes the squared correlation for each variable and then assigns the Rejected role to those variables that have a value less than the squared correlation criterion. (The default is 0.005.) 2. SAS EMiner evaluates the remaining (not rejected) variables using a forward stepwise R-square regression. Variables that have a stepwise R-square improvement less than the threshold criterion (default = 0.0005) are assigned the Rejected role.
For the Variable Selection node, which statement describes the R-square variable selection criterion? A. It uses a chi-squared Decision Tree with no Bonferroni adjustment to select the relevant inputs. B. It is similar to a decision tree algorithm in being able to detect nonlinear and non-additive relationships between inputs and the target. C. It looks for a set of collinear inputs that correlate with the target. D. It uses a squared correlation and then a stepwise regression to eliminate irrelevant inputs.
all of the above Using SAS Enterprise Miner, you can save your score code as a score code module in SAS code, C code, or Java code. Review: Introduction
How does SAS Enterprise Miner enable you to save your score code? a. as SAS code b. as C code c. as Java code d. all of the above
Two hidden layers In a two-layer MLP, the inputs are fed into the input layer and are multiplied by the interconnection weights as they are passed from the input layer to the first hidden layer. Within the first hidden layer, they get summed and then processed by a nonlinear function (usually the hyperbolic tangent). As the processed data leaves the first hidden layer, it again gets multiplied by the interconnection weights and then summed and processed by the second hidden layer. Finally, the data is multiplied by the interconnection weights and processed one last time within the output layer to produce the neural network output. Thus, the combination of the two hidden layers enables you to model complex and discontinuous relationships among the inputs and the target.
How many hidden layers are generally needed in an MLP-based neural network to capture a discontinuous relationship between inputs and target? A. No hidden layer; direct connection between inputs and output is preferred. B. One hidden layer C. Two hidden layers D. Three or more hidden layers
Polynomial combinations of the model inputs enable predictions to better match the true input/target association. Polynomial combinations of the model inputs enable predictions to better match the true input/target association and minimize the chances of overfitting. Review: Standard Logistic Regression
How would you characterize the effects of adding polynomial combinations of the model inputs to a regression? a. Polynomial combinations of the model inputs enable predictions to better match the true input/target association. b. Polynomial combinations of the model inputs decrease the chances of overfitting. c. Polynomial combinations of the model inputs do not minimize prediction bias. d. Polynomial combinations of the model inputs enhance the interpretability of the predictions.
Inputs
It is important to ensure that ____________ have desirable distribution properties. The formula builder can be used to modify existing ___________ variables and create new variables with desirable properties. ___________ should have roughly the same scale of measurement and be free of unusual or missing values. A. Data Sources B. Inputs C. Outputs
all of the above
In SAS Enterprise Miner's Decision Tree node, which of the following types of target variable can be used? Choose one: all of the above nominal with any number of categories interval binary
Association Tool
In SAS Enterprise Miner, market basket and sequence analyses are handled by the _______________ tool. The tool transforms a transactions data set into rules. The data source you use must have the role of Transaction and must have an ID variable and a Target variable. The ID variable is some identifier of the transaction and the Target variable is the item. A. Profiling B. Association C. Data Source
Cluster Analysis
In _____________ analysis, the goal is to identify distinct groupings of cases across a set of inputs. A. Cluster B. Segmentation C. Data Source
Segmentation
In _______________ analysis, the goal is to partition cases from a cloud of data (data that doesn't necessarily have distinct groups) into contiguous groups. A. Cluster B. Segmentation C. Data Source
Transform input data, apply analysis, and generate deployment methods. SAS Enterprise Miner enables you to perform some of the steps of the full analytic workflow. Some tasks, such as defining the business problem or analytic objective, and selecting, extracting, validating, and repairing input data, must be done before you begin working with SAS Enterprise Miner. Other tasks, such as deploying the analysis, gathering, and assessing results, must be done after you have worked with SAS Enterprise Miner. Review: The Analytic Workflow
In a typical applied analytics project, which of the following tasks would you use SAS Enterprise Miner to perform? a. Extract, validate, and repair input data. b. Transform input data, apply analysis, and generate deployment methods. c. Gather and assess results of deployment. d. All of the above.
high-correlations among input variables When input(X) variables are highly correlated with each other or linear combinations of other input variables, the input variables are called collinear and the situation is said to exhibit multicollinearity.
Multicollinearity in regression refers to which of the following? A. high skewness in distributions of input variables B. non-constant variance of the target variable C. non-normality of the target variable D. high correlations among input variables
Outcome
The _____________ of market basket analysis is a set of association rules such as buying item A implies buying item B (A => B). The strength of the association is measured by the support, confidence, and lift of the rule. A. Outcome B. Input C. Result
Prediction is more important than explanation. What makes neural networks interesting is their ability to approximate virtually any continuous association between the inputs and the target. Neural networks are especially useful for prediction problems where no mathematical formula is known that relates inputs to outputs, prediction is more important than explanation, and there is a lot of training data. The Rule Induction tool combines decision tree and neural network models to predict nominal targets. It is intended to be used when one of the nominal target levels is rare. Review: Neural Network Structure
Neural networks are especially useful for prediction problems where which of the following is true? a. A mathematical formula can be used to relate inputs to outputs. b. Prediction is more important than explanation. c. Little training data is available. d. One of the nominal target levels is rare.
Cluster
One of the main uses of _____________ analysis is data reduction. This process involves identifying distinct groups of cases across a set of inputs. ____________ analysis is useful because it is easier to manage, explore, and model groups rather than individual observations. New observations can be assigned to the appropriate ___________ based on the values of their associated inputs. A. Explanatory B. Data C. Cluster
orphan nodes SAS Enterprise Miner stopping rules help to avoid orphan nodes, control sensitivity, and grow large trees. Increasing the minimum leaf size will avoid orphan nodes. For large data sets, you might want to increase the maximum leaf setting to obtain additional modeling resolution. If you want very large trees and use the chi-square split-worth criterion, deactivate the Split Adjustment option. Review: Stopping Rules
SAS Enterprise Miner stopping rules help to avoid which of the following: a. logworth b. orphan nodes c. missing values d. probability
Confidence
The ______________of an association rule A => B is the conditional probability of a transaction that contains item set B given that it contains item set A (or the probability that a customer has B given that the customer has A). The ________________is estimated by using the following formula: A. Confidence B. Support C. Lift
not impute any missing variables because trees can handle them Do not impute any missing values because trees can handle them as a separate category. The split search criteria for decision trees assign the missing values along one side of a branch at the splitting node as a category. This is quite different from a regression or neural network, where each input variable is used in a mathematical equation and hence cannot have missing values.
Suppose your input variables have missing values. What should you do before running a decision tree with these input variables? A. impute all missing variables using the Tree method B. impute only interval variables using the Tree method but do not impute the class variables C. not impute any missing variables because trees can handle them
Role: Input Level: Nominal The Advanced Metadata Advisor assists the SAS EMiner user in assigning appropriate Role and Level metadata based upon a data set's variable data type, name, and stored values. Although units_sold is a numeric variable, it will be assigned the Level Nominal because it contains fewer than 10 distinct values. The role of the variable units_sold will be Input because it does not contain any missing values and is below the Missing Percentage Threshold value.
The SAS data set credit_customers contains a numeric variable units_sold that holds only the values: 1,2,3,4. Based on the settings provided in the Advanced Advisor Options, what will be the Role and Level of the units_sold variable when the credit_customers data set is created using Advanced Metadata Advisor in the Data Source Wizard? Property: Value: Missing Percentage Threshold - 50 Reject Vars with Excessive Missing Values - Yes Class Levels Count Threshold - 10 Detect Class Levels - Yes Reject Levels Count Threshold - 20 Reject Vars with Excessive Class Values - Yes Database Pass-Through - Yes A. Role: Interval Level: Input B. Role: Input Level: Interval C. Role: Input Level: Nominal
The overall distribution of bargain item sales is approximately normal and Segment 1 contains stores selling fewer than average bargain items. The segment profile node provides a graphical representation of overall input distribution - the red outlined bars - as well as segment-specific input distribution - the blue shaded bars. The red outlined bars specify overall case distribution and the blue shaded bars represent segment-specific case distribution.
The SAS data set retail contains information on the count of retail stores sales based on the following item types: bargain, essential, gourmet, and health. Based on the results from the Cluster Profile node, which statement is true? A. The overall distribution of bargain item sales is approximately normal and Segment 1 contains stores selling fewer than average bargain items. B. The overall distribution of essential item sales is approximately normal and Segment 1 contains stores selling fewer than average essential items.
Lift The lift can be interpreted as a general measure of association between the two item sets. Lift values greater than 1 indicate positive correlation; values equal to 1 indicate zero correlation; and values less than 1 indicate negative correlation. If Lift=2 for the rule A => B, then a customer having A is twice as likely to have B than a customer chosen at random. Lift is symmetric, so the lift of the rule A => B is the same as the lift of the rule B => A.
The ______ of the rule A => B is the confidence of the rule divided by the expected confidence, assuming that the item sets are independent. The expected confidence of A => B is the probability that a customer has B. A. Confidence B. Support C. Lift
Support
The __________ for the rule A => B is the probability that the two item sets occur together (or the probability that a customer has both A and B). __________ is symmetric, so the support of the rule A => B is the same as the support of the rule B => A. The support of the rule A => B is estimated by using the following formula: A. Confidence B. Support C. Lift
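Support, confidence, and lift can be computed together from a list of transactions. The Python sketch below uses the definitions given in these cards; the function name `rule_metrics` and the four baskets are hypothetical illustration data, not output from the Association node.

```python
def rule_metrics(transactions, a, b):
    """Return (support, confidence, lift) for the rule A => B."""
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= t)         # contains item set A
    n_b = sum(1 for t in transactions if b <= t)         # contains item set B
    n_ab = sum(1 for t in transactions if (a | b) <= t)  # contains both sets
    support = n_ab / n                # P(A and B)
    confidence = n_ab / n_a           # P(B given A)
    lift = confidence / (n_b / n)     # confidence over expected confidence P(B)
    return support, confidence, lift


# Hypothetical market baskets, one set of items per transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "milk", "butter"},
]
forward = rule_metrics(baskets, {"bread"}, {"milk"})
reverse = rule_metrics(baskets, {"milk"}, {"bread"})
print(forward, reverse)
```

Swapping A and B leaves support and lift unchanged, matching the symmetry noted in the cards, while confidence can differ between the two directions in general.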
Results
The ____________ window for the cluster analysis includes the Segment Plot, Segment Size, Mean Statistics, and Output windows. A. Explorer B. Results C. Clustering
none of the above
The importance of an input variable in predicting a target in an MLP-based neural network can be figured out by which of the following? Choose one: none of the above the average of the absolute values of parameter estimates between the input and all of the hidden neurons the highest absolute value of the parameter estimate between the input and any of the hidden neurons multiplied by the absolute value of the parameter estimate of the hidden neuron the highest absolute value of the parameter estimate between the input and any of the hidden neurons
Advanced
Using the default ____________ options, the Metadata Advisor does the following: • rejects variables with more than 50% missing values • detects the number of class levels for numeric variables and assigns a role of Nominal to those with class counts below 20 • detects the number of class levels for character variables and assigns a role of Rejected to those with class counts above 20. A. Basic B. Advanced
all of the above The Metadata Advisor Options step in the Data Source Wizard enables you to set the Metadata Advisor, which controls how SAS Enterprise Miner creates metadata for the variables in your data source. The Metadata Advisor has two modes: Basic and Advanced. You can use the Advanced Advisor Options window to customize the advanced metadata options. Review: Using the Advanced Metadata Advisor
Using the default advanced options, the Metadata Advisor performs which of the following options automatically? a. rejects variables with more than 50% missing values b. detects the number of class levels for numeric variables and assigns a role of Nominal to those with class counts below 20 c. detects the number of class levels for character variables and assigns a role of Rejected to those with class counts above 20 d. all of the above
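The three default rules listed in this card can be sketched in Python. This is only an illustration of the logic, not the Metadata Advisor's actual implementation: the function name `advise` is hypothetical, missing values are assumed to be represented by `None`, and the fallback labels ("Interval" for numeric variables with many levels, "Nominal" for character variables with few levels) are assumptions not stated in the card.

```python
def advise(values, is_numeric, missing_cutoff=0.5, class_cutoff=20):
    """Return an illustrative label for one variable.

    values: list of observed values, with None marking a missing value.
    is_numeric: True for numeric variables, False for character variables.
    """
    present = [v for v in values if v is not None]
    missing_fraction = 1 - len(present) / len(values)
    n_levels = len(set(present))

    # Rule 1: reject variables with more than 50% missing values.
    if missing_fraction > missing_cutoff:
        return "Rejected"

    # Rule 2: numeric variables with fewer than 20 levels become Nominal.
    if is_numeric:
        return "Nominal" if n_levels < class_cutoff else "Interval"  # "Interval" is an assumption

    # Rule 3: character variables with more than 20 levels are Rejected.
    return "Rejected" if n_levels > class_cutoff else "Nominal"  # "Nominal" is an assumption


mostly_missing = advise([None] * 6 + [1] * 4, is_numeric=True)
binary_flag = advise([0, 1] * 10, is_numeric=True)
income = advise(list(range(100)), is_numeric=True)
customer_id = advise([f"id{i}" for i in range(30)], is_numeric=False)
```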
increase the performance of logistic regression Often predictor variables with a high degree of right or left skewness are transformed so that their distributions become approximately symmetrical, which improves the predictive performance of any regression-type model, including logistic regression.
Transformations of input variables to make their distributions more symmetric will likely have what impact in a logistic regression? A. increase the performance of logistic regression B. decrease the performance of logistic regression C. neither increase nor decrease the performance of logistic regression D. create convergence problems in maximum likelihood estimation
To ensure that the choice of split is not influenced by input measurement scale. The Kass adjustment penalizes inputs that offer many candidate split points, so an input is not chosen as the best split merely because it has more values to test.
What is the purpose of the Kass (Bonferroni) adjustment in the decision tree split search algorithm? A. To ensure that the choice of split is not influenced by input measurement scale. B. To ensure a non-negative logworth value. C. To give categorical inputs a greater chance to be used in the split.
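A rough sketch of how a Bonferroni-style (Kass) adjustment changes logworth for a binary split on a binary target follows. It assumes the usual definition logworth = -log10(p-value) from a Pearson chi-square test with 1 degree of freedom, and that the adjustment multiplies the p-value by the number of candidate splits; the function names and the 2x2 counts are hypothetical, and the exact computation inside SAS Enterprise Miner may differ.

```python
import math


def chi2_2x2(table):
    """Pearson chi-square statistic for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def logworth(table, n_candidate_splits=1):
    """Logworth of a split, optionally Kass-adjusted.

    Multiplying the p-value by the number of candidate splits is the same
    as subtracting log10(n_candidate_splits) from the unadjusted logworth.
    """
    chi2 = chi2_2x2(table)
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail, chi-square with 1 df
    return -math.log10(p_value) - math.log10(n_candidate_splits)


# Made-up counts: rows are the two branches, columns are the target levels.
split_table = [[30, 10], [10, 30]]
unadjusted = logworth(split_table)
adjusted = logworth(split_table, n_candidate_splits=10)
```

With ten candidate splits, the adjusted logworth is exactly one unit lower than the unadjusted value, so an input with many possible split points has to show a stronger association to win.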
Sample Properties, Sample Statistics, sample data for the selected variable and a histogram of the data for the selected variable.
When the Explore button is selected in the graphic, what information will be displayed? A. Sample Properties, Sample Statistics, sample data with all variables and values, and a histogram of the data for the selected variable. B. Sample Properties, Sample Statistics, sample data for the selected variable, and a histogram of the data for the selected variable.
AutoNeural You can use the AutoNeural tool to automatically explore alternative network architectures and hidden unit counts. The AutoNeural tool conducts limited searches in order to find better network configurations. There are several options that the tool uses to control the algorithm. Review: Introduction
Which SAS Enterprise Miner tool can be used to automatically explore alternative network architectures and hidden unit counts? a. AutoNeural b. DMNeural c. Neural Network d. Rule Induction
The Filter tool You use the Filter tool to apply a filter to exclude certain observations from your data for your analysis. The Input Data tool enables you to include a data source in your process flow. The Data Partition tool enables you to partition your training data into subsets for training, validation, and testing. The Explore window enables you to visually explore the details in your data source. Review: Introduction to Filtering Data
Which SAS Enterprise Miner tool would you use to exclude certain observations in your data source, such as extreme outliers, from your analysis? a. The Input Data tool b. The Filter tool c. The Data Partition tool d. The Explore window
growing decision trees interactively There are three methods for constructing decision tree models: interactively (or by hand), automatically, and autonomously. Building models interactively, or by hand, enables you to select and modify the splits of your data to create the tree and its leaves. It is an informative method that is recommended when you are first learning about the Decision Tree node, and it remains valid even when you are more expert at predictive modeling. You grow a decision tree automatically by using the Train node function of the Interactive Decision Tree tool. Review: Constructing Decision Trees
Which method of growing decision trees is the most informative and hands-on approach? a. growing decision trees automatically b. growing decision trees autonomously c. growing decision trees interactively d. none of the above methods
Stepwise Forward selection evaluates the improvement upon the baseline and builds increasingly complex models from one input up, sequentially adding the input with the smallest p-value. Backward selection starts with all available inputs and sequentially removes the input with the largest p-value. Stepwise selection builds from the baseline as forward selection does, but after each new input is added it rescans all inputs in the model; those with large p-values are returned to the pool for reassessment.
Which method of input selection for regression analysis evaluates the statistical significance of all included inputs after each input is added? A. Forward B. Backward C. Stepwise D. Simple
Forward Forward selection evaluates the improvement upon the baseline and builds increasingly complex models from one input up, sequentially adding the input with the smallest p-value. Backward selection starts with all available inputs and sequentially removes the input with the largest p-value. Stepwise selection builds from the baseline as forward selection does, but after each new input is added it rescans all inputs in the model; those with large p-values are returned to the pool for reassessment.
Which method of input selection for regression analysis evaluates the statistical significance of the total model to see if it improves on the baseline as the variables are added and once no further improvement is made then variable selection ends? A. Forward B. Backward C. Stepwise D. Simple
the Score tool and the SAS Code tool You use the Score tool to apply a predictive model to scoring data, and you can use the SAS Code tool to save the scored data table to a different SAS library. The Model Implementation tool, the Java Code tool, and the C Code tool do not exist. Review: Introduction
Which of the following SAS Enterprise Miner tools might you use to perform model implementation? a. the Score tool and the SAS Code tool b. the Score tool, the SAS Code tool, the Java Code tool, and the C Code tool c. the Score tool and the Model Implementation tool d. all of the above
all of the above The most widely used type of neural network in data analysis is the multilayer perceptron (MLP). Multilayer perceptrons are often represented by a network diagram instead of an equation. The network diagram is arranged in layers. The first layer, called the input layer, contains any number of inputs. The input layer connects to a hidden layer, which is composed of hidden units. The hidden layer connects to a final layer called the target, or output, layer. A multilayer perceptron can contain additional hidden layers with any number of hidden units. Review: Neural Network Structure
Which of the following is a characteristic of a multilayer perceptron? a. any number of inputs b. one or more hidden layers with any number of hidden units c. linear combination functions in the hidden and output layers d. all of the above
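The layered structure described in this card can be sketched as a forward pass through a single hidden layer. This is a minimal illustration, not the Neural Network node's implementation: the function name `mlp_predict` and all weights are hypothetical, the hidden units use hyperbolic tangent activations, and the logistic output for a binary target is an assumption.

```python
import math


def mlp_predict(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass of a one-hidden-layer multilayer perceptron.

    Each hidden unit applies tanh to a linear combination of the inputs;
    the output layer applies a logistic function to a linear combination
    of the hidden units (assumed binary target).
    """
    hidden = [
        math.tanh(bias + sum(w * x for w, x in zip(weights, inputs)))
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    logit = output_bias + sum(w * h for w, h in zip(output_weights, hidden))
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical network: two inputs, three hidden units, one output.
probability = mlp_predict(
    inputs=[0.5, -1.2],
    hidden_weights=[[0.4, -0.6], [1.1, 0.3], [-0.7, 0.9]],
    hidden_biases=[0.1, -0.2, 0.0],
    output_weights=[0.8, -1.5, 0.6],
    output_bias=0.05,
)
```

Adding more hidden units means adding rows to `hidden_weights` and entries to `hidden_biases` and `output_weights`; the number of inputs is likewise unrestricted.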
all of the above Predictive models are widely used and come in many varieties. Any model must perform three essential tasks: predict new cases, select useful inputs, and optimize complexity. Different modeling tools use different methods to complete each task. Review: A Predictive Model's Tasks
Which of the following is an essential task for any predictive model? a. predict new cases b. select useful inputs c. optimize complexity d. all of the above
All of the above. For cluster analysis, you should seek inputs that have all the listed attributes. Inputs should also be meaningful to the analysis objective, relatively independent, and have low kurtosis and skewness statistics (at least in the training data). Review: Selecting and Refining Inputs for Analysis
Which of the following is important to consider when selecting inputs for cluster analysis? a. Inputs should have measurement scales with similar ranges. b. Inputs should be limited in number. c. Inputs should have a measurement level of Interval. d. All of the above.
Another benefit is ease in model interpretation. A cost associated with "regularizing" the input distributions using a simple transformation is difficulty in model interpretation, so ease of interpretation is not a reason to transform. Review: Introduction
Which of the following is not a good reason to "regularize" input distributions using a simple transformation? a. Regression models are sensitive to extreme or outlying values in the input space. b. When you perform regression, inputs with highly skewed or highly kurtotic distributions can be selected over inputs that would yield better overall predictions. c. One benefit is improved model performance. d. Another benefit is ease in model interpretation.
Fit Statistics can provide information that affects decision predictions, but does not affect estimate predictions.
Which of the following is not true about results produced by the Regression node? Choose one: Variable Summary information identifies the roles of variables used by the Regression node. Fit Statistics can provide information that affects decision predictions, but does not affect estimate predictions. Model Information provides you with information that includes the number of target categories and the number of model parameters. Type 3 Analysis of Effects provides you with information about the number of parameters that each input contributes to the model.
missing values The most prevalent problem for neural networks is missing values. Like regressions, neural networks require a complete record for estimation and scoring. Extreme or unusual values also present a problem for neural networks. This problem is mitigated somewhat by the hyperbolic tangent activation functions in the hidden units. Review: Beyond the Prediction Formula
Which of the following is the most prevalent problem for neural networks? a. extreme or unusual values b. nonnumeric inputs c. missing values d. nonlinear association
all of the above All of these problems can occur if you do not adjust for separate sampling. Review: Adjusting for Separate Sampling
Which of the following problems can result if you do not adjust for separate sampling? a. Prediction estimates reflect target proportions in the training sample, not the population from which the sample was drawn. b. Score rankings plots are inaccurate and misleading. c. Decision-based statistics related to misclassification or accuracy misrepresent the model's performance on the population. d. all of the above
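The first problem in the list, prediction estimates that reflect the training-sample proportions, is commonly corrected by rescaling the predicted probabilities with the population priors. The sketch below uses the standard prior-adjustment formula for a binary target; the function name and the numeric values are hypothetical, and it is not taken from the SAS course material.

```python
def adjust_for_priors(p1_sample, pi1, rho1):
    """Rescale a predicted primary-outcome probability to the population.

    p1_sample: probability predicted by a model trained on the separate sample.
    pi1: proportion of the primary outcome in the population (prior).
    rho1: proportion of the primary outcome in the training sample.
    """
    pi0, rho0 = 1 - pi1, 1 - rho1
    primary = p1_sample * pi1 / rho1
    secondary = (1 - p1_sample) * pi0 / rho0
    return primary / (primary + secondary)


# Hypothetical case: a 50/50 training sample drawn from a population in which
# the primary outcome occurs 5% of the time.
adjusted = adjust_for_priors(p1_sample=0.5, pi1=0.05, rho1=0.5)
unchanged = adjust_for_priors(p1_sample=0.3, pi1=0.2, rho1=0.2)
```

When the sample proportion equals the population proportion, the adjustment leaves the prediction unchanged, as expected.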
Stepwise The stepwise selection method adds inputs as forward selection does, but after each input is added it re-examines every input already in the model and removes any input that is not significant at the specified level.
Which of the following sequential selection methods do you use so that SAS Enterprise Miner will look at all variables already included in the model and delete any variable that is not significant at the specified level? a. Backward b. Forward c. Stepwise d. None
When you impute a synthetic value, it eliminates the incomplete case problem. Imputing a synthetic value for a missing value eliminates the incomplete case problem but modifies the input's distribution, which can bias the model predictions. Review: Introduction
Which of the following solves problems for you when you impute missing values? a. When you impute a synthetic value, it replaces missing values with 1 or 0. b. When you impute a synthetic value, it eliminates the incomplete case problem. c. When you impute a synthetic value, predictive information is retained. d. When you impute a synthetic value, each missing value becomes an input to the model.
Unless a profit matrix is defined, the Model Comparison tool selects the model with the smallest validation misclassification rate by default. The Model Comparison tool (on the Assess tab) generates a variety of statistics of fit that are listed for both the training and validation data partitions. By default, unless a profit matrix is defined, the Model Comparison tool selects the model with the smallest validation misclassification rate. You should look at the validation fit statistics that are appropriate to the type of prediction that you are interested in. Whether the best fit is indicated by the highest value or the lowest value depends on the specific fit statistic. Review: Understanding Model Assessment
Which of the following statements about assessing model performance using the Model Comparison tool is true? a. Unless a profit matrix is defined, the Model Comparison tool selects the model with the smallest validation misclassification rate by default. b. The Model Comparison tool calculates values for up to three statistics at a time. c. For all fit statistics that the Model Comparison tool generates, the highest value indicates the best fit. d. The Model Comparison tool appears on the Explore tab.
Filter Tool
You use the _______________ to exclude certain observations from your analysis. You might want to remove any cases that seem in error or out of place in a data source. After you remove these cases, your data is ready for subsequent analysis. A. Source Node B. SAS Table C. Filter Tool
Score data contains the same input variables as training data, but the target variables might be different or missing. Score data is data that is structured like the training data but that lacks a target variable. Review: Introduction
Which of the following statements about the data table that you use for scoring is true? a. Score data contains the same target variable as training data, but the input variables might be different. b. Score data contains the same input variables as training data, but the target variables might be different or missing. c. Score data contains none of the same variables as training data. d. Score data must contain all of the same variables as training data, including input variables and target variables.
A data source is a metadata definition that informs SAS Enterprise Miner about the name and location of a SAS table, the SAS code that is used to define a library path, and the variable roles, measurement levels, and other attributes that are important for the data mining project. A data source is a link between an existing SAS table and SAS Enterprise Miner. It contains metadata that informs SAS Enterprise Miner about the name and location of the SAS table, the SAS code that is used to define a library path, and the variable roles, measurement levels, and other attributes that guide the data mining process. Review: What Is a Data Source?
Which of the following statements best describes a SAS Enterprise Miner data source? a. A data source is a SAS table that is used in a SAS Enterprise Miner process flow. b. A data source is a SAS table that has been modified so that it can be used in a SAS Enterprise Miner project. c. A data source is a metadata definition that is used in a SAS Enterprise Miner process flow to inform SAS Enterprise Miner only of the location of the SAS table. d. A data source is a metadata definition that informs SAS Enterprise Miner about the name and location of a SAS table, the SAS code that is used to define a library path, and the variable roles, measurement levels, and other attributes that are important for the data mining project.
The Neural Network tool stops training when the final model is selected. SAS Enterprise Miner treats each iteration of the optimization process as a separate model. The iteration with the smallest value of the selected fit statistic is chosen as the final model. This method of model optimization is called stopped training. To avoid stopping too early, the Neural Network tool continues to train until the training data converges or until the maximum iteration count is reached, whichever comes first. Review: Introduction
Which of the following statements is false? a. The Neural Network tool treats each iteration of the optimization process for a neural network as a separate model. b. The Neural Network tool chooses the iteration with the smallest value of the selected fit statistic as the final model. c. The Neural Network tool stops training when the final model is selected. d. The Neural Network tool often requires trial and error to find the best architecture.
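Stopped training, as described in this card, amounts to keeping the fit statistic of every iteration and picking the best iteration after training has finished. A minimal sketch, assuming the selected fit statistic is a validation error where smaller is better; the function name and the error values are made up for illustration.

```python
def select_stopped_training_iteration(validation_errors):
    """Return (iteration index, error) of the smallest validation error.

    Every iteration is treated as a separate candidate model, and training
    is assumed to have already run to convergence or to the iteration limit.
    """
    best_iteration = min(range(len(validation_errors)),
                         key=validation_errors.__getitem__)
    return best_iteration, validation_errors[best_iteration]


# Hypothetical validation error per iteration: it falls, then rises again
# as the network starts to overfit the training data.
errors_by_iteration = [0.40, 0.31, 0.27, 0.29, 0.33]
best_iteration, best_error = select_stopped_training_iteration(errors_by_iteration)
```

The selected iteration sits in the middle of the run, which is why training does not stop at the moment the final model is found.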
All of the above. The Explore window enables you to visually explore the variables in your data. You can use the Explore window to search for anticipated trends, unanticipated trends, or anomalies. The Explore window includes a Variable Values window that displays all of the values in the data source. It can also contain a histogram for each variable in the data source, which enables you to graphically explore relationships between the variables. The Explore window also includes a Plot Wizard that enables you to create scatter plots and other types of charts to further explore your data. Review: Using the Explore Window
Which of the following tasks can you perform using the Explore window in SAS Enterprise Miner? a. Use histograms to visually explore relationships between the variables in your data source. b. Examine a table of the values of each variable in your data source. c. View a scatter plot, bar chart, or pie chart of the values in your data source. d. All of the above.
assessing a tree based on validation data The Interactive Decision Tree tool enables you to create decision trees either by interactive training or automatic training. You can use the Interactive Decision Tree tool to view results based on the training data, but in order to assess your tree based on the validation data, you must exit the Interactive Decision Tree tool and use the Results window of the Decision Tree node. Review: Interactively Creating a Decision Tree with One Split
Which of the following tasks cannot be performed in the Interactive Decision Tree tool? a. training interactively b. growing trees automatically c. assessing a tree based on training data d. assessing a tree based on validation data
sensitivity Sensitivity is the proportion of primary outcome cases in a selected fraction. Review: Sensitivity Charts
Which of the following terms refers to the proportion of primary outcome cases in a selected fraction? a. cumulative lift b. cumulative gain c. sensitivity d. false positive fraction
devote more data to training and less data to validation You use the Data Partition tool to specify the fraction of input data devoted to the training, validation, and test partitions. When you select a partitioning strategy it is important to note that there are various trade-offs. More data devoted to training results in more stable predictive models but less stable model assessments. More data devoted to validation results in less stable predictive models but more stable model assessments. The test partition is used only for calculating fit statistics after the modeling and model selection is complete. It is not uncommon to omit the test partition and assign an equal number of cases to the training partition and the validation partition. Review: Using the Data Partition Tool
Which partitioning strategy results in more stable predictive models? a. devote more data to training and less data to validation b. devote more data to validation and less data to training c. devote more data to testing and less data to validation d. devote more data to scoring and less data to training
Cluster analysis is considered to be supervised classification because the k-means clustering algorithm assigns cases to clusters. Cluster analysis is considered to be unsupervised classification because it attempts to group training data set cases based on similarities in input variables. Review: Introduction
Which statement about cluster analysis is false? a. There is no target variable for cluster analysis. b. The Segment Profile tool can be helpful in determining the composition of clusters, particularly when more than three variables are used to generate the segments. c. Cluster analysis is considered to be supervised classification because the k-means clustering algorithm assigns cases to clusters. d. SAS Enterprise Miner uses the k-means clustering algorithm for cluster analysis.
Market basket analysis is performed on transactions data, and variables must be assigned to both the ID and Target roles. Market basket analysis is performed on transactions data in which one variable is assigned to the ID role and one variable is assigned to the Target role. The Sequence role does not need to be used unless you want to perform a sequence analysis. Market basket analysis results in a list of association rules. The strength of the association is measured by the support, confidence, and lift of the rule. Review: Introduction
Which statement about market basket analysis is true? a. To perform a market basket analysis, one variable must be assigned to the Sequence role. b. Market basket analysis is performed on transactions data, and variables must be assigned to both the ID and Target roles. c. Market basket analysis results in a list of association rules. The strength of the rules can be measured by the CCC plot. d. Market basket analysis is useful for identifying the order in which customers buy products or services.
Sequence analysis requires inputs in the ID, Target, and Sequence roles. Sequence analysis is useful for determining the order in which a customer acquires products or services over time. The analysis requires a variable in the Sequence role as well as the Target and ID roles. In SAS Enterprise Miner, you use the Association tool to perform sequence analysis. The strength of the rule is measured by the support and confidence of the rule. Review: Introduction
Which statement about sequence analysis is true? a. Sequence analysis can be helpful for determining what product or services a customer purchases in a single transaction. b. Sequence analysis requires inputs in the ID, Target, and Sequence roles. c. The strength of a sequence rule is measured by the confidence, support, and lift of the rule. d. In SAS Enterprise Miner, you use the Sequence Analysis tool to perform a sequence analysis.
The average target value is calculated for each level, and then passed on for testing if it is the optimal split point. Split Point Definition: if the input is categorical, the average value of the target is taken within each categorical input level. The averages serve the same role as the unique interval input values.
Which statement describes the Decision Tree Split Search mechanism for categorical inputs? A. All levels are weighted and the weights are used for testing. B. The levels that have target rate of 0 or 100% are re-binned first, then weighted and the weights are used for testing. C. The average target value is calculated for each level, and then passed on for testing if it is the optimal split point.
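The split-point definition in this card can be sketched as follows: compute the average target value within each level, order the levels by that average, and treat the boundaries between consecutive levels as candidate split points, just as the unique values of an interval input would be treated. The function names and data are hypothetical, and the sketch leaves out the statistical test that picks the best candidate.

```python
from collections import defaultdict


def level_means(levels, targets):
    """Average target value within each level of a categorical input."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for level, y in zip(levels, targets):
        totals[level] += y
        counts[level] += 1
    return {level: totals[level] / counts[level] for level in totals}


def candidate_splits(levels, targets):
    """List binary splits between levels ordered by average target value."""
    means = level_means(levels, targets)
    ordered = sorted(means, key=means.get)
    # Each cut point divides the ordered levels into a left and right branch.
    return [(ordered[:i], ordered[i:]) for i in range(1, len(ordered))]


# Made-up data: a three-level input and a binary target.
region = ["a", "a", "b", "b", "c", "c"]
responded = [1, 1, 0, 0, 1, 0]
means = level_means(region, responded)
splits = candidate_splits(region, responded)
```

Ordering the levels first reduces the search from every possible grouping of levels to one candidate per boundary between neighboring averages.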
link graph You can use the link graph and rules table to help interpret the results of market basket analysis. Review: Using the Link Graph and Rules Table to Interpret Results
Which tool can be used to help interpret the results of market basket analysis? a. link graph b. segment profile c. CCC plot d. three-dimensional scatter plot
validation A validation data set is used for monitoring and tuning a model to improve its generalization. The tuning process usually involves selecting among models of different types and complexities. The tuning process optimizes the selected model on the validation data. Review: Introduction
Which type of data set is used to monitor and tune a predictive model? a. training b. validation c. testing d. score
ranking Ranking predictions order cases based on the inputs' relationships with the target. Using the training data, the prediction model attempts to rank high value cases higher than low value cases. It is assumed that a similar pattern exists in the scoring data so that high value cases have high scores. The actual scores produced are inconsequential; only the relative order matters. The most common example of a ranking prediction is a credit score. Review: Rankings
Which type of prediction orders cases based on the inputs' relationship to the target? a. decision b. classification c. ranking d. estimate
Decision
_________ are the simplest type of prediction and are usually associated with some kind of action (such as classifying a donor or non-donor). For this reason, ___________ are also known as classifications. Examples of _________ prediction include handwriting recognition, fraud detection, and direct mail solicitation. _________ predictions usually relate to a categorical target variable. For this reason, they are identified as primary, secondary, and tertiary in correspondence with the levels of the target. A. Ranking B. Decision C. Target
Sequence
__________ analysis is an extension of market basket analysis to include a time dimension in the analysis. In this way, transaction data is examined for ________________s of items that occur (or do not occur) more (or less) commonly than expected. A Webmaster might use _____________ analysis to identify patterns or problems of navigation through a Web site. A. Profiling B. Sequence C. Market Basket
Profiling
__________ is a by-product of reduction methods such as cluster analysis. The idea is to create rules that isolate clusters or segments, often based on demographic or behavioral measurements. _____________ helps describe the attributes of the people in a cluster. A marketing analyst might develop profiles of a customer database to describe the consumers of a company's products. A. Profiling B. Data Reduction C. Novelty Detection
Clustering
__________ is considered to be unsupervised classification because the goal is to group training data set cases based on similarities in input variables with no target variable or specific outcome. A. Profiling B. Clustering C. Data Analysis
Data Reduction
________________ is the most ubiquitous application: exploiting patterns in data to create a more compact representation of the original. Though vastly broader in scope, data reduction includes analytic methods such as cluster analysis, which you learn to perform in this lesson. A. Profiling B. Data Reduction C. Novelty Detection
Market Basket
_________________ analysis (or association rule discovery) is used to analyze streams of transaction data (for example, ________________) for combinations of items that occur (or do not occur) more (or less) commonly than expected. Retailers can use this as a way to identify interesting combinations of purchases or as predictors of customer segments. A. Data Reduction B. Profiling C. Market Basket
Market Basket
___________________ analysis (also known as association rule discovery or affinity analysis) is a popular data mining method for exploring associations between items. In the simplest situation, the data consists of two variables: a transaction and an item. For each transaction, there is a list of items. Typically, a transaction is a single customer purchase, and the items are the things that were bought. The result of __________________ analysis is a list of association rules. The value of the generated rules is gauged by confidence, support, and lift. In SAS Enterprise Miner, you use the Association tool to perform ___________________ analysis. After you have run the association analysis, you can use a variety of tools such as the Link Graph and Rules table to interpret the results of the analysis. A. Explanatory B. Diverse C. Market Basket