SAS - Machine Learning (Quizzes)
*Lesson 3* Which of the following statements is correct about the maximal tree? a) The maximal tree excludes the irrelevant inputs. b) The maximal tree generalizes well to new data. c) The maximal tree is the result of the pruning process. d) The maximal tree is constructed on the validation data.
Correct: A (The maximal tree does not contain irrelevant inputs.)
*Lesson 6* A confusion matrix helps you classify which type of target? a) binary b) categorical c) interval
Correct: A (A confusion matrix displays performance statistics for a model with a binary target.)
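As a rough illustration of what a confusion matrix tabulates for a binary target, the sketch below counts the four cells from actual and predicted labels. The function name, the `event` argument, and the sample data are made up for this example and are not part of Model Studio.

```python
def confusion_matrix(actual, predicted, event=1):
    """Count the four cells of a binary confusion matrix."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == event and a == event:
            tp += 1  # predicted event, actually an event
        elif p == event:
            fp += 1  # predicted event, actually a nonevent
        elif a == event:
            fn += 1  # predicted nonevent, actually an event
        else:
            tn += 1  # predicted nonevent, actually a nonevent
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}

# Hypothetical labels for eight cases
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
cm = confusion_matrix(actual, predicted)
print(cm)
```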
*Lesson 1* SAS Viya is a cloud-enabled, in-memory analytics run-time environment that scales to data of any size, type, speed, and complexity. True or False?
True
*Lesson 3* A classification tree predicts a categorical target, and a regression tree predicts an interval target. True or False?
True (A decision tree that predicts a categorical target is called a classification tree. A decision tree that predicts an interval target is called a regression tree.)
*Lesson 2* Which of the following statements is true about the Text Mining node? a) It processes audio and video data. b) It transforms a term-by-document frequency matrix using singular value decomposition (SVD) to create binary coefficients. c) It creates topics based on groups of terms that occur together in several documents. Each term-document pair is assigned a score for every topic. d) It does not allow terms and documents to belong to multiple topics.
Correct: C (The Text Mining node creates topics and assigns scores as described.)
*Lesson 5* In support vector machines, finding the separating hyperplane is an optimization problem with constraints that involve the values of the binary target. True or False?
Correct: True (Solving for the support vector machine is actually an optimization problem with two constraints. The first constraint is based on a target value of +1, and the second constraint is based on a target value of -1.)
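Written out, the two constraints usually take the following hard-margin form. The notation is assumed here rather than taken from the quiz: w is the normal vector, b is the bias term, x_i holds the inputs of case i, and y_i is its target value.

```latex
\begin{aligned}
\mathbf{w}\cdot\mathbf{x}_i + b &\ge +1 \quad \text{if } y_i = +1,\\
\mathbf{w}\cdot\mathbf{x}_i + b &\le -1 \quad \text{if } y_i = -1.
\end{aligned}
```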
*Lesson 1* CAS is designed to run in a single-machine symmetric multiprocessing (SMP) configuration but not in a multi-machine massively parallel processing (MPP) configuration. True or False?
False (CAS can run in both SMP and MPP configurations.)
When you create a project in Model Studio, event-based sampling is used (that is, turned on) by default. True or False?
False (Event-based sampling is turned off by default when a project is created. If you want to use event-based sampling, you can manually turn it on in the project settings before you run a pipeline in the project.)
*Lesson 3* The reason for building a decision tree that allows for three-way splits, compared to a tree that allows for two-way splits, is improved model performance. True or False?
False (Research has shown that trees with multi-way splits do not necessarily outperform trees with binary splits.)
*Lesson 1* In SAS Viya, you might have nondeterministic results. True or False?
True (Due to the distributed nature of the engine that underlies SAS Viya (CAS), results might not be reproducible; that is, they can be nondeterministic.)
*Lesson 6* The primary considerations for choosing an appropriate model selection statistic, other than the business needs, are the prediction type and the target type. True or False?
True (Other than the business needs, the prediction type and the measurement level of the target are the two factors you must consider when choosing a model selection statistic.)
*Lesson 5* Which of the following describes a Local Interpretable Model-Agnostic Explanation (LIME) plot when the Model Interpretability feature is used in Model Studio? a) It creates a localized linear regression model around a particular observation based on a perturbed sample set of data. b) It is calculated by depth-one decision trees using each input to estimate the predicted values of the support vector machine model. c) It depicts the functional relationship between the model inputs and the model's predictions. d) It presents a disaggregation of the partial dependency (PD) plot to reveal interactions and differences at the observation level.
Correct: A (A LIME plot creates a localized linear regression model around a particular observation based on a perturbed sample set of data.)
*Lesson 4* In an MLP with one hidden layer, which of the following is a mathematical transformation that is applied to a linear combination of the input values? a. activation function b. error function c. hidden layer d. hidden unit
Correct: A (An activation function is a mathematical transformation that is applied to a linear combination of the input values.)
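A minimal sketch of a single hidden unit may make this concrete: the inputs are combined linearly, and the activation function is then applied to that combination. The hyperbolic tangent is used here only as an example activation; the weights and inputs are invented.

```python
import math

def hidden_unit(inputs, weights, bias):
    """Apply an activation function to a linear combination of the inputs."""
    # Linear combination: bias + w1*x1 + w2*x2 + ...
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    # Activation function (tanh chosen for illustration)
    return math.tanh(z)

# Hypothetical inputs and weights; the linear combination is 0.5 - 0.5 = 0
output = hidden_unit([1.0, 2.0], [0.5, -0.25], 0.0)
print(output)
```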
*Lesson 6* Before you score and manage a model in Model Manager, how do you register the model? a) In Model Studio, select Register Models from the Project Pipeline menu on the Pipeline Comparison tab. b) In Model Studio, add a Model Registration node from the Postprocessing group to the pipeline and run the node. c) In Model Manager, select the model in the Projects list and select Register Models from the Models menu.
Correct: A (Before you score and manage a model in Model Manager, you must register the model in Model Studio. To do this, you select Register Models from the Project Pipeline menu on the Pipeline Comparison tab.)
*Lesson 5* A support vector machine in Model Studio can be used only for which of the following? a) binary targets b) linearly separable data c) a maximum of two input variables d) a linear relationship between inputs and the target
Correct: A (In Model Studio, support vector machines are used exclusively with binary targets.)
*Lesson 6* During the model deployment phase, champion-challenger testing compares the performance of the currently deployed model and a challenger model based on which of the following? a) historic data b) new, streaming data
Correct: A (In model deployment, champion-challenger testing compares performance on historic data.)
*Lesson 2* How do the transformations available in the Transformations node minimize bias in model predictions? a) by reducing the effect of extreme or unusual input values b) by replacing missing values and avoiding complete case analysis c) by converting unstructured data to structured data d) by reducing the total number of variables to reduce dimensionality
Correct: A (The transformations available in the Transformations node minimize bias in model predictions by reducing the effect of extreme or unusual input values.)
*Lesson 1* The probability of churning is which type of prediction? a) estimate b) ranking c) decision d) imputation
Correct: A (estimate. Churn is a categorical target, so the probability of the target outcome is an estimate.)
*Lesson 6* A cumulative lift chart shows that a machine learning model has a lift of 2.6 at a depth of 10%. What does this mean? a) For the top 10% of cases, a random model captures 2.6 times more primary outcome cases than the machine learning model. b) For the top 10% of cases, the machine learning model captures 2.6 times more primary outcome cases than a random model. c) For the top 90% of cases, the machine learning model captures 2.6 times more primary outcome cases than a random model. d) For the top 90% of cases, the machine learning model captures 2.6 times more primary outcome cases than secondary outcome cases.
Correct: B (For the top 10% of cases, the machine learning model captures 2.6 times more primary outcome cases than a random model.)
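The arithmetic behind a cumulative lift value can be sketched as follows: rank the cases by predicted score, take the top fraction (the depth), and divide the event rate in that slice by the overall event rate, which is what a random model would be expected to capture. The function name and the 20-case data set are hypothetical.

```python
def cumulative_lift(actual, scores, depth):
    """Event rate in the top `depth` fraction of ranked cases / overall event rate."""
    # Sort cases from highest to lowest predicted score
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(len(ranked) * depth))
    top_rate = sum(a for _, a in ranked[:n_top]) / n_top
    overall_rate = sum(actual) / len(actual)
    return top_rate / overall_rate

# Hypothetical data: 20 cases already listed from highest to lowest score,
# with 5 primary outcome cases overall (event rate 0.25)
scores = [20 - i for i in range(20)]
actual = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

lift_10 = cumulative_lift(actual, scores, 0.10)
lift_20 = cumulative_lift(actual, scores, 0.20)
print(lift_10, lift_20)
```

In this toy data set, both cases in the top 10% are events, so the model captures four times as many primary outcome cases there as random selection would.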
*Lesson 2* Which of the following is a reason why it is important to reduce the number of inputs during data preparation? a) A model that is based on a large number of inputs is very likely to be underfit to the training data. b) The more inputs you use to build the model, the more cases are required to discover the relationship between the inputs and the target. c) Modeling algorithms do not reduce the number of inputs.
Correct: B (If you do not reduce the number of inputs, you need more cases to build a good model. This problem is known as the curse of dimensionality.)
*Lesson 6* Which of the following statements is true about an ROC chart? a) The selection value of each point is displayed on the chart. b) Each point on the chart corresponds to a specific fraction of the sorted data. c) True positives are on the x axis. d) For a perfect model, the ROC curve is a straight line from the bottom left corner to the top right corner of the plot.
Correct: B (In an ROC chart, each point corresponds to a specific fraction of the sorted data.)
*Lesson 2* The Data Exploration node in Model Studio enables you to do which of the following? a) Impute variables based on summary statistics. b) View the most important inputs or suspicious variables. c) See variables with a high percentage of nonmissing values.
Correct: B (In the Data Exploration node's properties panel, you can set Variable selection criterion to Importance to see the most important inputs or to Screening to see suspicious variables.)
*Lesson 2* Which of the following statements is true about the validation data that the Variable Selection node creates from the training data? a) The Variable Selection node always creates these validation data. b) These validation data are used for variable selection during data preparation. c) These validation data are used for model assessment during the modeling process, instead of the original validation partition.
Correct: B (It is recommended that you use the default option to create validation data from the training data. These data are used for variable selection during data preparation. The original validation partition is used for model assessment during the modeling process.)
*Lesson 5* Which of the following describes the Input Relative Importance table that appears in the results when the Model Interpretability feature is used in Model Studio? a) It creates a localized linear regression model around a particular observation based on a perturbed sample set of data. b) It is calculated by depth-one decision trees using each input to estimate the predicted values of the model being interpreted. c) It compares the performance of the model at certain depths of the data ranked by the posterior probability of the event compared to a random model. d) It presents a disaggregation of the partial dependency (PD) plot to reveal interactions and differences at the observation level.
Correct: B (The Input Relative Importance table is calculated by depth-one decision trees using each input to estimate the predicted values of the model being interpreted.)
*Lesson 2* The Variable Selection node uses only supervised methods to select inputs. a) True b) False
Correct: B (The Variable Selection node can perform input selection based on both supervised and unsupervised methods.)
*Lesson 3* Which of the following statements is true about perturb and combine methods? a) These methods can be used only with decision trees. b) The perturb step creates different models by manipulating the distribution of the data or altering the construction method. c) The combine step selects the best splits from a series of models and then builds an individual decision tree model with those splits. d) The final model from a P&C method often has higher variance than a model created by other methods.
Correct: B (The perturb step creates different models, and the combine step produces a single result from the trees in the ensemble model.)
*Lesson 6* Which of the following statements is true about doing a pipeline comparison in Model Studio? a. You can add a challenger model by selecting it on the Pipeline Comparison tab. b. You can add a challenger model by selecting it in its pipeline. c. You cannot add challenger models to a pipeline comparison.
Correct: B (You can add a challenger model to a pipeline comparison. You select the model in its pipeline.)
*Lesson 1* Which of the following statements is true about machine learning? a) Machine learning is another name for artificial intelligence. b) You can combine machine learning algorithms in SAS Viya with open source tools. c) Neural networks are the only machine learning models that learn from the data. d) Compared to conventional models, machine learning models produce better results from big data but require more time to do so.
Correct: B (You can combine machine learning algorithms in SAS Viya with open source tools.)
*Lesson 2* Which of the following is a best practice for handling high-cardinality input variables? a) binning b) Winsorizing c) standardization d) text mining
Correct: A (Binning is a method of transformation that converts numeric inputs to categories or groups the levels of a high-cardinality input.)
*Lesson 5* Which statement is true about kernel functions? a) When a kernel function is used, the solution is no longer a hyperplane. b) Model Studio provides three kernel functions: linear, polynomial, and sigmoid. c) A kernel function is a math trick used to avoid having to calculate dot products on transformed data.
Correct: C (A kernel function is a math trick used to avoid having to calculate dot products on transformed data.)
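The trick can be checked numerically on a small case. The sketch below assumes a degree-2 polynomial kernel on two inputs and an explicit feature map `phi` chosen to match it; both are illustrative and not Model Studio settings. The kernel, computed on the original inputs, gives the same value as the dot product of the transformed vectors.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    """Explicit transformation into a 3-dimensional feature space."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly_kernel(x, y):
    """Degree-2 polynomial kernel, evaluated on the original inputs."""
    return dot(x, y) ** 2

x, y = (1.0, 2.0), (3.0, 4.0)
via_kernel = poly_kernel(x, y)          # no transformation needed
via_transform = dot(phi(x), phi(y))     # transform first, then dot product
print(via_kernel, via_transform)
```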
*Lesson 4* Which architecture of a neural network is best for modeling data with discontinuous input-output mappings? a. multilayer perceptron with no hidden layers b. multilayer perceptron with one hidden layer c. multilayer perceptron with two hidden layers d. skip-layer perceptron
Correct: C (Adding a second hidden layer can improve performance by enabling the MLP to realize discontinuous input-output mappings.)
*Lesson 3* Which of the following statements is true about decision trees as compared to other machine learning models? a) They are relatively stable models. b) They use complete case analysis. c) They are relatively easy to interpret. d) They require more data preparation.
Correct: C (Compared to other machine learning models, decision trees are relatively easy to interpret.)
*Lesson 4* Which of the following hyperparameters or set of hyperparameters in Model Studio controls weight decay? a) annealing rate b) learning rate c) L1 and L2 d) momentum
Correct: C (In Model Studio, the L1 and L2 hyperparameters control weight decay.)
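As a sketch of the idea, weight decay adds a penalty on the size of the weights to the error being minimized. The exact penalty form that Model Studio uses is not stated here; the version below, with plain L1 and L2 terms, is an assumption for illustration.

```python
def penalized_error(error, weights, l1, l2):
    """Error plus an assumed L1 (absolute value) and L2 (squared) weight penalty."""
    l1_penalty = l1 * sum(abs(w) for w in weights)
    l2_penalty = l2 * sum(w * w for w in weights)
    return error + l1_penalty + l2_penalty

# Hypothetical values: 1.0 + 0.1 * 3 + 0.01 * 5
total = penalized_error(1.0, [1.0, -2.0], l1=0.1, l2=0.01)
print(total)
```

Larger weights raise the penalized error, so the optimizer is pushed toward smaller weights and a smoother model.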
*Lesson 5* Which statement is true about the maximum-margin hyperplane in a two-dimensional input space? a) It is a regularization parameter often denoted by C. b) It includes all of the data points in the training data. c) It has the largest possible margin of error on its positive and negative sides. d) It is the thickest line that touches the innermost values of one target outcome and the innermost values of the other target outcome.
Correct: C (In a two-dimensional input space, the maximum-margin hyperplane is the exact center of the thickest line that touches the innermost values of one target outcome and the innermost values of the other target outcome. This hyperplane has the largest possible margin of error on its positive and negative sides.)
*Lesson 4* Which of the following statements is true about neural networks? a) Neural networks require that a specified form be stated prior to modeling. b) Neural networks perform best when the signal to noise ratio in a data set is low. c) Due in part to their lack of interpretability, neural networks are most relevant to pure prediction scenarios. d) Neural networks are universal approximators, so they are universally better than other types of models.
Correct: C (Neural networks are generally considered to be "black boxes." Because they are minimally interpretable, at best, neural networks are most useful in pure prediction scenarios.)
*Lesson 4* Which of the following occurs during a neural network's learning process? a) avoiding global minima b) late stopping c) numerical optimization
Correct: C (Numerical optimization is an important part of the learning process.)
*Lesson 6* In model comparison, the best model has the highest value of which of the following measures? a) average squared error b) misclassification rate c) sensitivity
Correct: C (Of the listed measures, sensitivity is the one for which a higher value indicates better model performance.)
*Lesson 5* What are support vectors? a. the data points that are easiest to classify b. the data points that are farthest from the maximum-margin hyperplane c. the only data points that determine the location of the maximum-margin hyperplane
Correct: C (Support vectors are the points in the data that are closest to the maximum-margin hyperplane.)
*Lesson 5* For a support vector machine, the classifier model has which of the following elements? a) an intercept and a slope b) a series of if-then-else rules c) a normal vector and a bias term d) an input layer, a hidden layer, and an output layer
Correct: C (The classifier model (H) has two elements: a normal vector w and a bias term b.)
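A minimal sketch of how the two elements are used to classify a case: compute the dot product of w with the inputs, add b, and take the sign. The values of `w` and `b` below are invented rather than fitted.

```python
def classify(x, w, b):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Hypothetical normal vector and bias term
w = (1.0, -1.0)
b = -0.5

print(classify((2.0, 0.0), w, b))  # score = 1.5
print(classify((0.0, 2.0), w, b))  # score = -2.5
```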
*Lesson 3* Which of the following statements is true about using logworth as the splitting criterion for building a decision tree? a) The input selected to represent a given split has the smallest logworth compared to the other inputs. b) A logworth value is always positive. c) The logworth value is adjusted to penalize inputs with many split points. d) The logworth value is large when the underlying Pearson chi squared p-value is also large.
Correct: C (The logworth value is adjusted in two ways to penalize inputs with many split points.)
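Logworth is the negative base-10 logarithm of the p-value, so a small p-value gives a large logworth. The sketch below also shows one Bonferroni-style penalty, in which the p-value is multiplied by the number of candidate split points; that specific adjustment is an assumption for illustration and may not match Model Studio exactly.

```python
import math

def logworth(p_value):
    """Logworth = -log10(p-value)."""
    return -math.log10(p_value)

def adjusted_logworth(p_value, n_split_points):
    """Assumed Bonferroni-style penalty for inputs with many split points."""
    return -math.log10(p_value * n_split_points)

print(logworth(0.001))               # small p-value, large logworth
print(logworth(0.05))                # larger p-value, smaller logworth
print(adjusted_logworth(0.001, 10))  # penalized for 10 candidate split points
print(adjusted_logworth(0.5, 4))     # an adjusted value can fall below zero
```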
*Lesson 3* When you build a forest in Model Studio, a sample of the original training data set is used to train each individual tree. What do you call the portion of the original training data that is not used to train the individual tree? a) validation sample b) test sample c) out-of-bag sample d) bagged sample
Correct: C (The portion of the original training data that is not used to train the individual tree is called the out-of-bag sample.)
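The split between the rows used to train one tree and its out-of-bag rows can be sketched as below. Sampling with replacement and a sample as large as the training data are assumptions made for the example; the function name and seed are hypothetical.

```python
import random

def bootstrap_split(n_rows, seed=0):
    """Draw row indices for one tree and return the rows left out of the sample."""
    rng = random.Random(seed)
    # Assumed: sample n_rows indices with replacement
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    # Rows never drawn form the out-of-bag sample for this tree
    out_of_bag = sorted(set(range(n_rows)) - set(in_bag))
    return in_bag, out_of_bag

in_bag, out_of_bag = bootstrap_split(10)
print("in-bag rows:", sorted(in_bag))
print("out-of-bag rows:", out_of_bag)
```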
*Lesson 4* When you are building a neural network model, which of the following methods helps to avoid overfitting? a) deviance estimation b) input standardization c) weight decay
Correct: C (Weight decay is one of two main methods of avoiding overfitting when you build a neural network model.)
*Lesson 4* When early stopping is used to build a neural network model, which data partition does Model Studio use to select the final model? a) test b) train c) validate
Correct: C (When early stopping is used, Model Studio uses the validation data to select the final model.)
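In outline, early stopping keeps the iteration whose validation error is lowest, even if the training error keeps falling afterward. The error values below are invented, and the selection rule is a simplified sketch rather than Model Studio's exact procedure.

```python
def select_final_iteration(validation_errors):
    """Return the index of the iteration with the lowest validation error."""
    return min(range(len(validation_errors)), key=lambda i: validation_errors[i])

# Hypothetical error by training iteration
train_errors = [0.45, 0.33, 0.25, 0.19, 0.14]       # keeps decreasing
validation_errors = [0.46, 0.36, 0.31, 0.33, 0.38]  # starts rising after iteration 2

best = select_final_iteration(validation_errors)
print(best, validation_errors[best])
```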
*Lesson 2* After a pipeline is run, which of the following can you do using the Manage Variables node? a) Specify a different target variable. b) Modify the target variable attributes. c) Set up imputation and transformation rules. d) Perform imputation and transformations.
Correct: C (You can use the Manage Variables node to set up imputation and transformation rules. The Manage Variables node cannot perform the other tasks listed.)
*Lesson 1* A model that generalizes well has which of the following? a) high variance b) high bias c) a balance of variance and bias
Correct: C (balance of variance and bias)
*Lesson 1* After you create a new project, Model Studio takes you to the Data tab. Which of the following can you do on the Data tab? a) add variable labels b) modify variable names c) modify variable roles and measurement levels
Correct: C (modify variable roles and measurement levels. You cannot modify the variable names and variable labels in Model Studio.)
*Lesson 4* Which neural network architecture is best for modeling nonstationary data? a) multilayer perceptron with no hidden layers b) multilayer perceptron with one hidden layer c) multilayer perceptron with two hidden layers d) skip-layer perceptron
Correct: D (An MLP tends to perform poorly when applied to nonstationary data (that is, data that have elements that change over time). A skip-layer perceptron is the best architecture to use with nonstationary data.)
*Lesson 3* Which of the following statements is true about autotuning? a) Autotuning is guaranteed to arrive at the best model. b) After you build a model using autotuning, you can no longer manually specify settings of the hyperparameters. c) An advantage of autotuning is that it usually reduces run time compared to running a model with settings that you specify manually. d) Autotuning capabilities are available for the following nodes: Decision Tree, Forest, Gradient Boosting, Neural Network, and SVM.
Correct: D (Autotuning capabilities are available for the specific nodes listed above.)
*Lesson 1* Which of the following statements about complete case analysis is true? a) In complete case analysis, all available cases are used to build the model, regardless of whether they have missing values. b) If missingness is predictive of the target, you can use complete case analysis without introducing bias into the model. c) All machine learning algorithms in Model Studio use complete case analysis. d) Complete case analysis can reduce the predictive accuracy of the model.
Correct: D (Complete case analysis can remove a tremendous amount of information from the training data and reduce the predictive accuracy of the model.)
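A small sketch shows how quickly complete case analysis can shrink the training data: any row with at least one missing value is dropped. The rows and variable names are hypothetical.

```python
# Hypothetical training rows; None marks a missing value
rows = [
    {"age": 34, "income": 52000, "tenure": 5},
    {"age": None, "income": 61000, "tenure": 2},
    {"age": 45, "income": None, "tenure": 9},
    {"age": 29, "income": 48000, "tenure": None},
    {"age": 51, "income": 75000, "tenure": 12},
]

def complete_cases(rows):
    """Keep only the rows that have no missing values."""
    return [row for row in rows if all(v is not None for v in row.values())]

kept = complete_cases(rows)
print(f"{len(kept)} of {len(rows)} rows kept")
```

Each dropped row is missing only one value, yet three of the five rows are lost.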
*Lesson 3* Which of the following statements is true about gradient boosting models? a) The trees in the series are independent of each other. b) Gradient boosting models always outperform forest models. c) Each tree is built on a sample with replacement from the original training data. d) A major advantage of gradient boosting models is their emphasis on misclassified cases.
Correct: D (In each successive iteration of boosting, the misclassification rate is used to weight the case. This emphasis on misclassified cases improves the performance of the model.)
*Lesson 1* Which of the following is a common interface for SAS Viya applications, that enables you to easily view, organize, and share your content from one place? a) Applications Menu b) Model Studio c) Exchange d) SAS Drive
Correct: D (SAS Drive)
*Lesson 6* The confusion matrix is the foundation for which of the following assessment plots? a) cumulative lift chart b) cumulative percent hits chart c) LIME plot d) ROC chart
Correct: D (The confusion matrix contains statistics that are the foundation for the ROC chart.)
*Lesson 4* Which activation function is commonly used in the target layer when modeling a binary target? a) exponential b) hyperbolic tangent (Tanh) c) identity d) logistic
Correct: D (The logistic function is the target layer activation function (or target layer link function) that is typically used with a binary target.)
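The logistic function suits a binary target because it maps any real number into the interval (0, 1), so its output can be read as a probability. A minimal sketch:

```python
import math

def logistic(z):
    """Map a real-valued linear combination to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

for z in (-5.0, 0.0, 5.0):
    print(z, logistic(z))
```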
*Lesson 6* Based on the following C-statistic values, which model has the best classification accuracy? a) 0.50 b) 0.55 c) 0.60 d) 0.65
Correct: D (The model with highest C-statistic value has the best performance.)
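One way to read the C-statistic is as a concordance measure: among all pairs of one event and one nonevent, it is the share in which the event has the higher predicted score, with ties counted as half. The sketch below computes it that way on hypothetical data; a value of 0.50 corresponds to random ranking.

```python
def c_statistic(actual, scores):
    """Fraction of event/nonevent pairs ranked correctly (ties count 0.5)."""
    events = [s for a, s in zip(actual, scores) if a == 1]
    nonevents = [s for a, s in zip(actual, scores) if a == 0]
    concordant = sum(
        1.0 if e > n else 0.5 if e == n else 0.0
        for e in events
        for n in nonevents
    )
    return concordant / (len(events) * len(nonevents))

# Hypothetical targets and predicted probabilities
actual = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
c = c_statistic(actual, scores)
print(c)
```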
*Lesson 5* Suppose you are modeling data with a binary target and three inputs. The data are linearly separable. How many possible solutions exist that classify the target? a) two (one for each target outcome) b) three (one for each input) c) the dot product between the normal vector w and the vector of inputs x d) an infinite number
Correct: D (When the data are linearly separable, there are an infinite number of solutions that can classify the binary target.)
*Lesson 2* Which of the following transformations creates bins for a numeric variable? a) inverse b) exponential c) standardize d) quantile
Correct: D (The quantile transformation creates bins for a numeric variable. The remaining transformations in the list are in a different category: mathematical transformations.)
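A rough sketch of quantile binning: sort the values, then assign each one to a bin so that the bins hold roughly equal numbers of cases. The function below is a simplified illustration, not the Transformations node's implementation; it assigns bins by rank, so tied values could land in different bins.

```python
def quantile_bins(values, n_bins):
    """Assign each value a bin number (0-based) with about equal counts per bin."""
    # Indices of the values, ordered from smallest to largest value
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins

# Hypothetical skewed numeric input
values = [10, 200, 35, 7, 90, 1500, 60, 42]
bins = quantile_bins(values, 4)
print(bins)
```

The extreme value 1500 simply falls in the top bin, which is why binning also dampens the effect of unusual input values.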
*Lesson 5* A feature space is constructed by applying a nonlinear transformation to data so that linear separation exists in this higher-dimensional space. True or False?
Correct: True
*Lesson 3* The process of pruning a decision tree creates a sequence of models that are nested within each other. True or False?
False (The candidate sub-trees that result from pruning are not necessarily nested within each other.)
*Lesson 2* To define variable metadata and assign rules to modify variables (for example, assigning a type of transformation), you can use either the Data tab or the Manage Variables node. a) True b) False
True (You can perform these tasks using either the Data tab or the Manage Variables node.)
*Lesson 4* Like decision trees, neural networks can select inputs. a) True b) False
Correct: B (False. Unlike some other types of models, such as tree-based models, a neural network cannot select inputs.)