NT Risk Analytics Consultant Interview
What are the three components of a Value at Risk (VaR) model?
1. A time frame (days, weeks, years). 2. A confidence level (90%, 95%, 99%). 3. A loss amount (or loss percentage).
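The three components can be seen together in a minimal historical-simulation sketch. The function name `historical_var`, the sample returns, and the simple quantile convention are illustrative assumptions, not a prescribed method; real VaR models differ in how they estimate the loss distribution.

```python
def historical_var(returns, confidence=0.95):
    """Loss level that observed losses stay at or below in about `confidence` of periods.

    `returns` holds one return per period, so the period length is the time frame.
    """
    if not returns:
        raise ValueError("returns must be non-empty")
    losses = sorted(-r for r in returns)  # positive value = loss
    # Simple empirical quantile; the indexing convention is an assumption.
    idx = min(int(confidence * len(losses)), len(losses) - 1)
    return losses[idx]


# Hypothetical daily returns (time frame = 1 day).
daily_returns = [0.01, -0.02, 0.005, -0.035, 0.012, -0.004, 0.02, -0.015, 0.003, -0.05]
var_90 = historical_var(daily_returns, confidence=0.90)
print(f"1-day 90% VaR: {var_90:.1%} of portfolio value")
```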
The AUC for a classifier with no predictive value is ______.
0.5
The AUC for a perfect classifier is ____.
1
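Both AUC reference values can be checked with a small sketch that treats AUC as the probability that a randomly chosen positive is scored above a randomly chosen negative. The helper name `auc` and the toy labels are assumptions for illustration.

```python
def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]
perfect = auc(labels, [0.1, 0.2, 0.8, 0.9])      # every positive outranks every negative
uninformative = auc(labels, [0.5, 0.5, 0.5, 0.5])  # constant score carries no information
print(perfect, uninformative)
```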
A statistical measure that represents the proportion of the variance for a dependent variable that's explained by an independent variable or variables in a regression model. ... It may also be known as the coefficient of determination.
R^2
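As a rough illustration of "proportion of variance explained", R^2 can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The function name and sample values below are assumptions.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("y_true has zero variance, so R^2 is undefined")
    return 1 - ss_res / ss_tot


y = [1.0, 2.0, 3.0, 4.0]
print(r_squared(y, y))                     # predictions match exactly
print(r_squared(y, [2.5, 2.5, 2.5, 2.5]))  # always predicting the mean explains nothing
```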
The anti-monotone property of support states that the support of an itemset is _________ than that of its subsets. A. sometimes more B. always less C. always more D. sometimes less
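B. Always less or D. Sometimes less are both accepted. Strictly, the support of an itemset is never greater than that of any of its subsets; it can be equal.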
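A small sketch of the anti-monotone property, using a made-up set of transactions: adding items to an itemset can only keep its support the same or lower it. The `support` helper and the data are illustrative assumptions.

```python
# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


print(support({"bread"}, transactions))            # subset
print(support({"bread", "milk"}, transactions))    # superset: lower support
print(support({"beer"}, transactions))             # subset
print(support({"beer", "diapers"}, transactions))  # superset: equal support here
```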
Errors due to ______ are errors made as a result of choosing a learning algorithm that is not well suited for the data or problem. A. Variance B. Bias C. Noise D. Sampling
B. Bias
Which of these is a regression problem? A. Which states have the highest infant mortality rate? B. Can I determine a person's income based on their age and type of job? C. Identify similarities in shopping patterns between customers of a department store. D. How can I group supermarket products using purchase frequency?
B. Can I determine a person's income based on their age and type of job?
The attribute or feature that you are trying to predict, which is described by the other features within an instance is known as the ________. A. Class B. Dependent variable C. Instance D. Feature
B. Dependent Variable
For decision trees, entropy is a quantification of the level of ___________ within a set of class values. A. nodes B. disorder C. randomness D. static
B. Disorder and C. Randomness are both correct
The ROC curve is a measure of the True Positive Rate against the _________________ of a model. A. True Negative Rate B. False Positive Rate C. Specificity D. False Negative Rate
B. False Positive Rate
Which of these is not a problem with the partitioning approach of the holdout method? A. Some samples may have too many or too few difficult cases, easy-to-predict cases, or outliers. B. It's not always possible to create representative partitions of a data set. C. Substantial portions of data must be reserved to test and validate the model. D. Each partition may have a larger or smaller proportion of some classes.
B. It's not always possible to create representative partitions of a data set.
The link function used for binomial logistic regression is called the _____________. A. logarithmic function B. logit function C. logos function D. inverse function
B. Logit function
Which of these is not a limitation of the existing approaches to machine learning? A. Human intent is complex and sometimes difficult to understand. B. Machine learning systems often cannot handle the large amounts of data available to them. C. Machine learning systems tend to fail in a brittle manner. D. Machine learning systems are not able to easily transfer ideas from one problem domain to another.
B. Machine learning systems often cannot handle the large amounts of data available to them.
Which of these is a distance measure employed by k-means clustering? A. Cluster distance B. Manhattan distance C. Centroid distance D. Euclidean distance
B. Manhattan Distance or D. Euclidean Distance
Rather than the sum of squares used in linear regression, in logistic regression, the coefficients are estimated using a technique called _______________. A. Mean Estimation of Means B. Maximum Likelihood Estimation C. Gradient Boosting D. Maximum Logistic Error
B. Maximum Likelihood Estimation
The Amelia package in R is useful for dealing with ___________ data. A. imbalanced B. missing C. aggregate D. skewed
B. Missing
As we discussed in class, the elbow method makes use of the Within Cluster Sum of Squares (WCSS) metric to suggest the appropriate value for "k". If we keep increasing the value for "k", what will happen to the value for WCSS? A. The value for WCSS will tend towards 1. B. The value for WCSS will tend towards 0. C. The value for WCSS will eventually become negative. D. The value for WCSS will grow infinitely.
B. The value for WCSS will tend towards 0.
Errors due to ________ are errors made as a result of not providing the learning algorithm with the right amount or type of training data. A. Bias B. Variance C. Sampling D. Randomness
B. Variance
Which of these data transformation approaches results in a data set with the mean located at zero? A. decimal scaling B. z-score normalization C. mean-max normalization D. min-max normalization
B. z-score normalization
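A minimal sketch of z-score normalization, showing that the transformed values have mean 0 (and standard deviation 1). The function name and sample values are assumptions; the population standard deviation is used here, and some tools use the sample version instead.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("all values are identical, so z-scores are undefined")
    return [(v - mu) / sigma for v in values]


scaled = z_scores([10, 20, 30, 40, 50])
print(scaled)
print(mean(scaled))  # centered at zero, up to floating-point error
```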
The _______ of a dataset represents the number of features in the dataset. A. Coarseness B. Resolution C. Dimensionality D. Density
C. Dimensionality
The meta-learning approach that utilizes the principle of creating a varied team of experts is known as an ________. A. assemble B. bagged learner C. ensemble D. meta-learner
C. Ensemble
A common type of distance measure used in k-NN is the _________ distance. A. Bayesian B. Nearest C. Euclidean D. Eucalyptus
C. Euclidean
In order to choose the next feature to split on, a decision tree learner calculates the Information Gain for features A, B, C and D as 0.022, 0.609, 0.841 and 0.145 respectively. Which feature will it next choose to split on? A. Feature D B. Feature A C. Feature C D. Feature B
C. Feature C since the 0.841 gain is the highest
A clustering method in which every object belongs to every cluster with a membership weight that goes between 0 (if it absolutely doesn't belong to the cluster) and 1 (if it absolutely belongs to the cluster) is known as _______ clustering. A. overlapping B. hierarchical C. fuzzy D. partitional
C. Fuzzy
Which of these is not a common approach to choosing the right value for K? A. Test different k values against a variety of test datasets and choose the one that performs best. B. The square root of the number of training examples. C. Half the number of training examples. D. Use weighted voting where the closest neighbors have larger weights.
C. Half the number of training examples
The recall of a model is a measure of the completeness of the results of its predictions. This measure has the same value as the ____________ of the model. A. kappa B. precision C. sensitivity D. specificity
C. Sensitivity
Before we use k-NN, what can we do if we have significant variance in the range of values for our features? A. We create dummy variables. B. We exclude the outlier features. C. We normalize the data. D. We convert them all to 0s and 1s.
C. We normalize the data
The primary difference between classification and regression is that classification is used to predict _____ values, while regression is used to predict ______ values. A. ordinal, nominal B. nominal, binomial C. discrete, continuous D. continuous, discrete
C. discrete, continuous
The method of imputation that fills in missing values using similar instances from the same dataset is known as _________ imputation. A. Warm-deck B. Cold-deck C. Hot-deck D. Same-deck
C. Hot-deck
The goal of cross-validation is to _____________ across the iterations. A. choose the coolest model B. improve the performance of a model C. eliminate bad models D. evaluate future performance
D. Evaluate future performance
The increased likelihood that a rule occurs in a dataset relative to its typical rate of occurrence is known as __________. A. Count B. Confidence C. Support D. Lift
D. Lift
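Lift can be sketched as the confidence of a rule divided by the support of its consequent, so a value above 1 means the consequent occurs more often than its typical rate when the antecedent is present. The helper names and the transactions below are illustrative assumptions.

```python
# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]


def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(set(itemset) <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """How often the consequent appears in transactions containing the antecedent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence relative to the consequent's baseline rate of occurrence."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


rule_lift = lift({"diapers"}, {"beer"}, transactions)
print(rule_lift)  # greater than 1: beer is more likely when diapers are bought
```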
Good clustering will produce clusters with _____ inter-class similarity and ______ intra-class similarity. A. low, low B. high, high C. high, low D. low, high
D. Low, High
A small K makes a model susceptible to noise and/or outliers and can lead to ___________. A. underfitting B. error C. randomness D. overfitting
D. Overfitting
The article draws a distinction between robots and what it calls 'bots' in that, robots are _________ agents, while 'bots' are ________ agents. A. sensory, automata B. living, autonomous C. intelligent, autonomous D. physical, virtual
D. Physical, Virtual
For decision trees, the process of remediating the size of a tree in order for it to generalize better is known as ___________. A. planing B. partitioning C. purging D. pruning
D. Pruning
The error generated by a classifier during the training stage is known as the ____________ error. A. validation B. bootstrap C. holdout D. resubstitution
D. Resubstitution
One of the ways to reduce the computational complexity of frequent itemset generation is by the use of ____________. A. the FP-Growth algorithm B. the association rules algorithm C. the post-pruning algorithm D. the apriori algorithm
D. The apriori Algorithm
The K in k-NN has to do with ______________. A. The number of clusters that need to be created to properly label the unlabeled observation. B. The number of unlabeled observations with the letter K. C. The size of the training set. D. The number of labeled observations to compare with the unlabeled observation.
D. The number of labeled observations to compare with the unlabeled observation.
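A minimal k-NN sketch, assuming Euclidean distance and simple majority voting: the unlabeled observation is compared with its k closest labeled observations. The function name, the training points, and the tie handling (whatever `Counter.most_common` returns first) are assumptions.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """train: list of (features, label) pairs. Returns the majority label of the k nearest."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training examples")
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled observations in two well-separated groups.
train = [
    ((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"),
    ((8, 8), "b"), ((9, 8), "b"), ((8, 9), "b"),
]
print(knn_predict(train, (2, 2), k=3))
print(knn_predict(train, (8.5, 8.5), k=3))
```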
Clustering results in labels against previously unlabeled data, that is why it is sometimes referred to as _____________________. A. predictive partitioning B. supervised labeling C. predictive labeling D. unsupervised classification
D. Unsupervised Classification
________ is the total value a bank is exposed to when a loan defaults.
Exposure at Default (EaD)
According to the article, as a result of advances in machine learning, systems can now outperform humans at all tasks. True or False
False
Association rules are great with small data sets. True False
False
Features with a large number of distinct values will have lower intrinsic value than features with a small number of distinct values. True False
False
Regression establishes causation between the independent variables and the dependent variable. True False
False
The AUC metric and ROC curve can be used interchangeably because if two models have the same or identical AUC values, they will always have the same ROC curve. True False
False
Decision tree learners typically output the resulting tree structure in human-readable format. This makes them well suited for applications that require transparency for legal reasons or for knowledge transfer. True False
True
Entropy is highest when the split is 50-50. However, as one class dominates the other, entropy reduces to zero. True False
True
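The entropy claim for two classes can be checked with a short sketch using the base-2 Shannon entropy. The function name is an assumption.

```python
import math


def entropy(proportions):
    """Shannon entropy in bits for a list of class proportions that sum to 1."""
    return sum(-p * math.log2(p) for p in proportions if p > 0)


print(entropy([0.5, 0.5]))  # even split: maximum disorder for two classes
print(entropy([0.9, 0.1]))  # one class dominates: lower entropy
print(entropy([1.0, 0.0]))  # pure set: no disorder
```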
K-Means clustering only works with numeric data. True False
True
Machine learning exists in the cross-roads between data science, statistics and computer science. True False
True
Missing values can have meaning True False
True
One of the disadvantages of the random cross-validation approach is that some instances may not be used and others may be used more than once. True False
True
One of the strengths of association rules is that they are easy to understand. True False
True
One of the weaknesses of a Random Forest model is that unlike a decision tree, the model is not easily interpretable. TRUE FALSE
True
The kappa statistic is an adjustment of accuracy by accounting for the possibility of a correct prediction by chance alone. TRUE FALSE
True
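As a sketch of the chance adjustment, Cohen's kappa for a binary confusion matrix compares observed accuracy with the agreement expected if predictions and true labels were independent. The function name and the example counts are assumptions.

```python
def cohen_kappa(tp, fp, fn, tn):
    """Kappa = (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement from the marginal totals of predicted and actual classes.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


balanced = cohen_kappa(tp=40, fp=10, fn=10, tn=40)
# A model that always predicts the majority class: 95% accurate, but no better than chance.
majority_only = cohen_kappa(tp=0, fp=0, fn=5, tn=95)
print(balanced, majority_only)
```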
The famous computing test in which a machine might be called intelligent, if its responses to questions could convince a person that it was human, is known as the _______ test.
Turing Test
Which of the following is not one of the key branches of machine learning? A. Reinforcement Learning B. Unsupervised Learning C. Deductive Learning D. Supervised Learning
C. Deductive Learning
Which of these terms is used to describe the degree to which data exists for each feature of all observations. A. Dimensionality B. Resolution C. Density D. Coarseness
C. Density
In class we discussed 6 stages in the "Analytic Process". Which of these is not one of those stages? A. Data Summarization B. Validation and Interpretation C. Data Exploration D. Modeling
A. Data Summarization
The recursive process used in logistic regression to minimize the cost function during maximum likelihood estimation is known as _________. A. gradient descent B. logit function C. sum of squared errors D. log odds
A. Gradient Descent
One of the major disadvantages of the leave-one-out cross-validation approach is that it _______________. A. is computationally expensive B. violates the holdout principle C. is not a good predictor of future performance D. uses too much data
A. Is computationally expensive
The confidence of an association rule is the __________ of the rule. A. predictive power B. support strength C. complete coverage D. likelihood level
A. Predictive Power
As part of the data transformation process, we sometimes have to discretize our data or create dummy variables. Which of these is a reason why we would need to do this? A. Some algorithms only work with either continuous or discrete variables. B. It helps when trying to fix duplicate data. C. This is an approach to normalize our data set. D. This is an important step in balancing imbalanced datasets.
A. Some algorithms only work with either continuous or discrete variables.
A dataset with two class values that is significantly skewed (more than 90%) towards one of those class values is known as _______ dataset. A. an imbalanced B. a skewed C. a bimodal D. an inverted
A. an imbalanced
According to the formal definition of machine learning, "A computer program is said to learn from _______ with respect to some class of _______ and performance measure P, if its performance at ________, as measured by P, improves with __________". A. experience (E), tasks (T), tasks (T), experience (E) B. exposure (E), tasks (T), tasks (T), exposure (E) C. tasks (T), experience (E), tasks (T), experience (E) D. experience (E), test (T), test (T), experience (E)
A. experience (E), tasks (T), tasks (T), experience (E)
k-NN is an example of a ________ model. A. non-parametric B. parametric C. metric D. unsupervised
A. non-parametric
Lazy learners such as k-Nearest Neighbor are also known as _______ learners. A. rote learners B. non-learners C. just-in-time learners D. instance-based learners
A. rote learners or D. instance-based learners
In the 18th century, English statistician, philosopher and Presbyterian minister, Thomas Bayes, developed a mathematical theorem for probability which is still foundational in modern day machine learning. What is this theorem called? A. Naive Bayes B. Bayes Theorem C. Probabilistic Theorem D. Bayesian Stochastics
B. Bayes' Theorem
Which of these is NOT a method used in choosing the appropriate value for "k"? A. Elbow Method B. Gap statistic C. Ankle Method D. A priori knowledge
C. Ankle Method
The process of conducting a search to identify the optimal combination of hyperparameters to use for the learning process using a choice of evaluation methods and metrics is known as _______________ tuning. A. automatic hyper B. model settings C. automated parameter D. search space
C. Automated parameter
The technique that sequentially builds strong learners as a linear combination of weak learners is known as __________. A. bagging B. bootstrap aggregation C. boosting D. bumming
C. Boosting
The sampling approach that creates a training set of equal length as the original data, using sampling with replacement, is known as _______________. A. equal length sampling B. stratified sampling C. bootstrapping D. cross-validation
C. Bootstrapping
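A minimal bootstrapping sketch: draw as many items as the original data contains, with replacement, and treat the items never drawn as an out-of-bag test set. The function name, the `seed` parameter, and the out-of-bag split are illustrative assumptions.

```python
import random


def bootstrap_sample(data, seed=None):
    """Return (in_bag, out_of_bag); in_bag has the same length as data."""
    rng = random.Random(seed)
    indices = [rng.randrange(len(data)) for _ in data]  # sampling with replacement
    chosen = set(indices)
    in_bag = [data[i] for i in indices]
    out_of_bag = [x for i, x in enumerate(data) if i not in chosen]
    return in_bag, out_of_bag


data = list(range(20))
in_bag, out_of_bag = bootstrap_sample(data, seed=42)
print(len(in_bag), len(out_of_bag))
```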
The functions that govern how disagreements among the predictions of ensemble models are reconciled are known as _________ functions. A. sigmoid B. stacking C. combination D. allocation
C. Combination
In Association rules, a collection of one or more items is known as ______________. A. a set of items B. a ruleset C. a set of rules D. an itemset
D. An itemset
While the underlying principles of machine learning are not new, recent advances in algorithmic techniques coupled with an increase in ___________ and _________ have resulted in increased accuracy and reliability of machine learning. A. connected devices, online users B. computing horsepower, artificial intelligence C. microprocessors, connected devices D. computing power, available data
D. Computing Power, Available Data
One of the limitations to using the F-score is that it assumes that ________________________. A. recall is always more important than precision B. the harmonic mean of precision and recall is zero C. precision is always more important than recall D. equal weight should be given to both precision and recall
D. Equal weight should be given to both precision and recall
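The equal-weight assumption is specific to the F1 score. A sketch of the more general F-beta score shows how the weighting can be changed: beta greater than 1 favors recall, beta less than 1 favors precision. The function name and the sample values are assumptions.

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall; beta=1 gives the F1 score."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


f1 = f_beta(0.9, 0.3)          # equal weight
f2 = f_beta(0.9, 0.3, beta=2)  # recall weighted more heavily, so low recall is penalized more
print(f1, f2)
```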
K-Means clustering is useful in creating non-spherical clusters. True False
False. K-means tends to produce roughly spherical clusters.
The goal of unsupervised learning is to predict future outcomes based on prior experience. True False
False. The goal of unsupervised learning is to model the underlying structure or distribution in the data in order to learn more about the data.
Color, shape, angle and number of edges are examples of nominal (or discrete) features. True False
False. Angle is a continuous feature: it can take any value within its range.
Association rules imply causality in that they explain why item B is bought whenever item A is bought. True False
False. Association rules predict the occurrence of one or more items based on the occurrence of other items in a grouping, such as a transaction or an individual. They describe co-occurrence, not causation.
Occam's Razor states that when presented with competing hypothetical answers to a problem, one should select the answer that makes the most assumptions. True False
False. When competing answers explain the problem equally well, choose the one that makes the fewest assumptions.
In binomial logistic regression, the best cut-off point is always at 0.5. True False
False. The best cut-off point depends on the data and can be chosen using the ROC (receiver operating characteristic) curve.
Given a set of candidate models from the same data, the model with the highest AIC is usually the "preferred" model. True False
False. The model with the lowest AIC is usually preferred.
_________ is the amount of money a bank or other financial institution loses when a borrower defaults on a loan, depicted as a percentage of total exposure at the time of default.
Loss Given Default (LGD)
Which of these is a method employed by the k-means algorithm to mitigate the effects of the random initialization trap? A. Centroid placement B. K-means++ C. Random initialization escape D. K-medoids
B. K-means++
The ____________ is the line that makes the vertical distance from the data points to the regression line as small as possible.
Least Squares Regression Line
A ______ is used in hypothesis testing to help you support or reject the null hypothesis. The _________ is the evidence against a null hypothesis. The smaller the ________, the stronger the evidence that you should reject the null hypothesis.
p-value
_____________, under the Federal Reserve's Comprehensive Capital Analysis and Review (CCAR), measures net revenue forecast from asset-liability spreads and non-trading fees of banks.
Pre-Provision Net Revenue
_________, a financial term describing the likelihood of a default over a particular time horizon. It provides an estimate of the likelihood that a borrower will be unable to meet its debt obligations.
Probability of Default (PD)
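The three credit-risk terms in this set (PD, LGD, EaD) are commonly combined into an expected loss figure, EL = PD x LGD x EaD. The sketch below assumes that standard relationship; the function name and the loan figures are hypothetical.

```python
def expected_loss(pd, lgd, ead):
    """Expected loss = probability of default x loss given default x exposure at default."""
    for name, value in (("pd", pd), ("lgd", lgd)):
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be a fraction between 0 and 1")
    if ead < 0:
        raise ValueError("ead must be non-negative")
    return pd * lgd * ead


# Hypothetical loan: 2% chance of default, 45% of exposure lost on default, 1,000,000 exposed.
el = expected_loss(pd=0.02, lgd=0.45, ead=1_000_000)
print(f"Expected loss: {el:,.0f}")
```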
In 2016, Google DeepMind's AlphaGo system made use of _________ learning to further improve its ability to win at the game of Go. A. supervised B. stochastic C. reinforcement D. unsupervised
C. Reinforcement
Which of these types of visualizations is best to use to explore the correlation between two continuous features? A. Scatter plot B. Sankey diagram C. Pie chart D. Histogram
A. Scatter plot
The random sampling method that tries to maintain the same class distribution as the original dataset is known as ___________. A. Stratified random sampling B. Random sampling with replacement C. Purposeful random sampling D. Systematic random sampling
A. Stratified random sampling
Clustering results in labels against previously unlabeled data, that is why it is sometimes referred to as unsupervised classification. True False
True
As the complexity of a model increases, bias decreases but variance increases. True False
True
Both online and offline learning systems are trained and tested in an offline setting before deployment. True False
True
Clustering is a type of unsupervised learning? True False
True
The logistic function is a sigmoid function that assumes values from __ to __.
0 to 1
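A short sketch of the logistic (sigmoid) function and its inverse, the logit, which is the link function used in binomial logistic regression. The function names are assumptions; the two-branch form is only there to avoid overflow for large negative inputs.

```python
import math


def logistic(x):
    """Sigmoid: maps any real number to a value between 0 and 1."""
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    z = math.exp(x)  # avoids overflow when x is very negative
    return z / (1 + z)


def logit(p):
    """Log-odds: the inverse of the logistic function, defined for 0 < p < 1."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1 - p))


for x in (-5, 0, 5):
    print(x, logistic(x))
print(logit(logistic(1.5)))  # recovers the input, up to floating-point error
```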
The machine learning approach in which an agent learns to identify strategies that minimize negative consequences while maximizing its reward is known as __________________ learning.
reinforcement