NT Risk Analytics Consultant Interview


What are the three components of a Value at Risk (VaR) model?

1. A time frame (days, weeks, years). 2. A confidence level (90%, 95%, 99%). 3. A loss amount (or loss percentage).
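
As an illustration of how the three components fit together, here is a minimal sketch of a historical-simulation VaR. The function name, the nearest-rank quantile rule, the return series and the portfolio value are all assumptions made for this example; real VaR models differ in method and detail.

```python
import math


def historical_var(returns, confidence=0.95):
    """Return the loss (as a positive fraction) at the given confidence level.

    Uses a simple nearest-rank quantile over the historical losses.
    """
    if not returns:
        raise ValueError("returns must not be empty")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    losses = sorted(-r for r in returns)        # a loss is a negative return
    rank = math.ceil(confidence * len(losses))  # nearest-rank position (1-based)
    return losses[rank - 1]


# Hypothetical one-day returns (time frame = 1 day); illustrative numbers only.
daily_returns = [-0.05, -0.02, -0.01, 0.00, 0.01, 0.01, 0.02, 0.02, 0.03, 0.04]
portfolio_value = 1_000_000  # assumed portfolio size

var_95 = historical_var(daily_returns, confidence=0.95)  # confidence level
loss_amount = var_95 * portfolio_value                   # loss amount
```

Read together: over one day, with 95% confidence, the loss should not exceed `loss_amount` under the assumptions of this sketch.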

The AUC for a classifier with no predictive value is ______.

0.5

The AUC for a perfect classifier is ____.

1
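
The two AUC values above can be checked with a small sketch. It uses the rank interpretation of AUC (the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half). The function name and sample data are assumptions for illustration.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half
    return wins / (len(positives) * len(negatives))


labels = [0, 0, 1, 1]
perfect_scores = [0.1, 0.2, 0.8, 0.9]       # every positive outranks every negative
uninformative_scores = [0.5, 0.5, 0.5, 0.5]  # scores carry no information

perfect_auc = auc(labels, perfect_scores)
uninformative_auc = auc(labels, uninformative_scores)
```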

A statistical measure that represents the proportion of the variance for a dependent variable that's explained by an independent variable or variables in a regression model. ... It may also be known as the coefficient of determination.

R^2
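
A minimal sketch of the calculation, using the usual definition R^2 = 1 - SS_res / SS_tot. The function name and sample values are assumptions for illustration.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual values have zero variance")
    return 1 - ss_res / ss_tot


actual = [1.0, 2.0, 3.0, 4.0]
perfect_fit = r_squared(actual, actual)       # model explains all the variance
mean_only = r_squared(actual, [2.5] * 4)      # model predicts the mean every time
```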

The anti-monotone property of support states that the support of an itemset is _________ than that of its subsets. A. sometimes more B. always less C. always more D. sometimes less

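B. Always less or D. Sometimes less (both are accepted: the support of an itemset can never exceed the support of any of its subsets, so it is always less than or equal to theirs)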
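
A small sketch of the property: adding items to an itemset can only keep its support the same or lower it. The transactions and the function name are made up for illustration.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    matches = sum(1 for t in transactions if itemset <= t)
    return matches / len(transactions)


# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]

support_bread = support({"bread"}, transactions)
support_bread_milk = support({"bread", "milk"}, transactions)
```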

Error due to ______ are errors made as a result of choosing a learning algorithm that is not well suited for the data or problem. A. Variance B. Bias C. Noise D. Sampling

B. Bias

Which of these is a regression problem? A. Which states have the highest infant mortality rate? B. Can I determine a person's income based on their age and type of job? C. Identify similarities in shopping patterns between customers of a department store. D. How can I group supermarket products using purchase frequency?

B. Can I determine a person's income based on their age and type of job?

The attribute or feature that you are trying to predict, which is described by the other features within an instance is known as the ________. A. Class B. Dependent variable C. Instance D. Feature

B. Dependent Variable

For decision trees, entropy is a quantification of the level of ___________ within a set of class values. A. nodes B. disorder C. randomness D. static

B. Disorder and C. Randomness are both correct

The ROC curve is a measure of the True Positive Rate against the _________________ of a model. A. True Negative Rate B. False Positive Rate C. Specificity D. False Negative Rate

B. False Positive Rate

Which of these is not a problem with the partitioning approach of the holdout method? A. Some samples may have too many or too few difficult cases, easy-to-predict cases, or outliers. B. It's not always possible to create representative partitions of a data set. C. Substantial portions of data must be reserved to test and validate the model. D. Each partition may have a larger or smaller proportion of some classes.

B. It's not always possible to create representative partitions of a data set.

The link function used for binomial logistic regression is called the _____________. A. logarithmic function B. logit function C. logos function D. inverse function

B. Logit function
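
For reference, a minimal sketch of the logit function and its inverse, the logistic (sigmoid) function. The function names are assumptions, and the simple `sigmoid` below is not hardened against overflow for very large negative inputs.

```python
import math


def logit(p):
    """Log-odds of a probability p, defined for 0 < p < 1."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1 - p))


def sigmoid(z):
    """Logistic function: maps any real number to a value between 0 and 1."""
    return 1 / (1 + math.exp(-z))


log_odds = logit(0.8)            # probability -> log-odds
probability = sigmoid(log_odds)  # log-odds -> back to the probability
```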

Which of these is not a limitation of the existing approaches to machine learning? A. Human intent is complex and sometimes difficult to understand. B. Machine learning systems often cannot handle the large amounts of data available to them. C. Machine learning systems tend to fail in a brittle manner. D. Machine learning systems are not able to easily transfer ideas from one problem domain to another.

B. Machine learning systems often cannot handle the large amounts of data available to them.

Which of these is a distance measure employed by k-means clustering? A. Cluster distance B. Manhattan distance C. Centroid distance D. Euclidean distance

B. Manhattan Distance or D. Euclidean Distance

Rather than the sum of squares used in linear regression, in logistic regression, the coefficients are estimated using a technique called _______________. A. Mean Estimation of Means B. Maximum Likelihood Estimation C. Gradient Boosting D. Maximum Logistic Error

B. Maximum Likelihood Estimation

The Amelia package in R is useful for dealing with ___________ data. A. imbalanced B. missing C. aggregate D. skewed

B. Missing

As we discussed in class, the elbow method makes use of the Within Cluster Sum of Squares (WCSS) metric to suggest the appropriate value for "k". If we keep increasing the value for "k", what will happen to the value for WCSS? A. The value for WCSS will tend towards 1. B. The value for WCSS will tend towards 0. C. The value for WCSS will eventually become negative. D. The value for WCSS will grow infinitely.

B. The value for WCSS will tend towards 0.
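
To see why, here is a sketch that computes WCSS for fixed cluster assignments (it does not run k-means itself). The points, labels and function name are assumptions for illustration. Once every point sits in its own cluster, each point is its own centroid and WCSS is exactly 0.

```python
from collections import defaultdict


def wcss(points, labels):
    """Within-cluster sum of squared distances from each point to its cluster centroid."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    total = 0.0
    for members in clusters.values():
        # Centroid = per-dimension mean of the cluster's points.
        centroid = [sum(coords) / len(members) for coords in zip(*members)]
        for point in members:
            total += sum((x - c) ** 2 for x, c in zip(point, centroid))
    return total


points = [(0, 0), (0, 2), (10, 0), (10, 2)]

wcss_k1 = wcss(points, [0, 0, 0, 0])  # one cluster
wcss_k2 = wcss(points, [0, 0, 1, 1])  # two clusters
wcss_k4 = wcss(points, [0, 1, 2, 3])  # every point in its own cluster
```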

Error due to ________ are errors made as a result of not providing the learning algorithm with the right amount or type of training data. A. Bias B. Variance C. Sampling D. Randomness

B. Variance

Which of these data transformation approaches results in a data set with the mean located at zero? A. decimal scaling B. z-score normalization C. mean-max normalization D. min-max normalization

B. Z score normalization
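
A minimal sketch of z-score normalization, assuming the population standard deviation is used (some tools use the sample standard deviation instead). The function name and values are illustrative.

```python
import statistics


def z_scores(values):
    """Rescale values to mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (an assumption)
    if std == 0:
        raise ValueError("values have zero variance")
    return [(v - mean) / std for v in values]


values = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, population standard deviation 2
normalized = z_scores(values)
```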

The _______ of a dataset represents the number of features in the dataset. A. Coarseness B. Resolution C. Dimensionality D. Density

C. Dimensionality

The meta-learning approach that utilizes the principle of creating a varied team of experts is known as an ________. A. assemble B. bagged learner C. ensemble D. meta-learner

C. Ensemble

A common type of distance measure used in k-NN is the _________ distance. A. Bayesian B. Nearest C. Euclidean D. Eucalyptus

C. Euclidean

In order to choose the next feature to split on, a decision tree learner calculates the Information Gain for features A, B, C and D as 0.022, 0.609, 0.841 and 0.145 respectively. Which feature will it next choose to split on? A. Feature D B. Feature A C. Feature C D. Feature B

C. Feature C since the 0.841 gain is the highest
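
A sketch of the selection step, plus one common way the gain itself is computed (parent entropy minus the weighted entropy of the child subsets). The function names and the toy labels are assumptions; the four gain values are the ones given in the question.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the child subsets."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Information gain values from the question; the learner picks the largest.
gains = {"A": 0.022, "B": 0.609, "C": 0.841, "D": 0.145}
best_feature = max(gains, key=gains.get)

# Toy example: a 50-50 parent split perfectly into two pure children.
parent = ["yes", "yes", "no", "no"]
gain_perfect_split = information_gain(parent, [["yes", "yes"], ["no", "no"]])
```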

A clustering method in which every object belongs to every cluster with a membership weight that goes between 0 (if it absolutely doesn't belong to the cluster) and 1 (if it absolutely belongs to the cluster) is known as _______ clustering. A. overlapping B. hierarchical C. fuzzy D. partitional

C. Fuzzy

Which of these is not a common approach to choosing the right value for K? A. Test different k values against a variety of test datasets and choose the one that performs best. B. The square root of the number of training examples. C. Half the number of training examples. D. Use weighted voting where the closest neighbors have larger weights.

C. Half the number of training examples

The recall of a model is a measure of the completeness of the results of its predictions. This measure has the same value as the ____________ of the model. A. kappa B. precision C. sensitivity D. specificity

C. Sensitivity
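
A short sketch of the related confusion-matrix measures. The function names and counts are assumptions for illustration; recall and sensitivity are the same quantity, TP / (TP + FN).

```python
def recall(tp, fn):
    """Recall (also called sensitivity or true positive rate)."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Share of predicted positives that are actually positive."""
    return tp / (tp + fp)


def specificity(tn, fp):
    """True negative rate."""
    return tn / (tn + fp)


# Hypothetical confusion-matrix counts.
tp, fn, fp, tn = 8, 2, 4, 6

model_recall = recall(tp, fn)
model_precision = precision(tp, fp)
model_specificity = specificity(tn, fp)
```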

Before we use k-NN, what can we do if we have significant variance in the range of values for our features? A. We create dummy variables. B. We exclude the outlier features. C. We normalize the data. D. We convert them all to 0s and 1s.

C. We normalize the data

The primary difference between classification and regression is that classification is used to predict _____ values, while regression is used to predict ______ values. A. ordinal, nominal B. nominal, binomial C. discrete, continuous D. continuous, discrete

C. discrete, continuous

The method of imputation that fills in missing values using similar instances from the same dataset is known as _________ imputation. A. Warm-deck B. Cold-deck C. Hot-deck D. Same-deck

C. Hot-deck

The goal of cross-validation is to _____________ across the iterations. A. choose the coolest model B. improve the performance of a model C. eliminate bad models D. evaluate future performance

D. Evaluate future performance

The increased likelihood that a rule occurs in a dataset relative to its typical rate of occurrence is known as __________. A. Count B. Confidence C. Support D. Lift

D. Lift
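
A minimal sketch of how lift relates to support and confidence, assuming the usual definitions: confidence(A -> B) = support(A and B) / support(A), and lift(A -> B) = confidence(A -> B) / support(B). The transactions and function names are made up for illustration.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """How often the consequent appears in transactions that contain the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence of the rule relative to the consequent's baseline rate."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


# Hypothetical transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread"},
    {"milk"},
    {"eggs"},
]

rule_confidence = confidence({"bread"}, {"milk"}, transactions)
rule_lift = lift({"bread"}, {"milk"}, transactions)
```

A lift above 1 suggests the items occur together more often than their individual rates would predict; it does not by itself show causation.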

Good clustering will produce clusters with _____ inter-class similarity and ______ intra-class similarity. A. low, low B. high, high C. high, low D. low, high

D. Low, High

A small K makes a model susceptible to noise and/or outliers and can lead to ___________. A. underfitting B. error C. randomness D. overfitting

D. Overfitting

The article draws a distinction between robots and what it calls 'bots' in that, robots are _________ agents, while 'bots' are ________ agents. A. sensory, automata B. living, autonomous C. intelligent, autonomous D. physical, virtual

D. Physical, Virtual

For decision trees, the process of remediating the size of a tree in order for it to generalize better is known as ___________. A. planing B. partitioning C. purging D. pruning

D. Pruning

The error generated by a classifier during the training stage is known as the ____________ error. A. validation B. bootstrap C. holdout D. resubstitution

D. Resubstitution

One of the ways to reduce the computational complexity of frequent itemset generation is by the use of ____________. A. the FP-Growth algorithm B. the association rules algorithm C. the post-pruning algorithm D. the apriori algorithm

D. The apriori Algorithm

The K in k-NN has to do with ______________. A. The number of clusters that need to be created to properly label the unlabeled observation. B. The number of unlabeled observations with the letter K. C. The size of the training set. D. The number of labeled observations to compare with the unlabeled observation.

D. The number of labeled observations to compare with the unlabeled observation.
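
A minimal k-NN sketch showing the role of k: the unlabeled observation is compared with its k nearest labeled observations, and the majority label wins. The function name and training points are assumptions, Euclidean distance is used, and features are assumed to be on comparable scales already.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest labeled points.

    `train` is a list of (features, label) pairs.
    """
    if not 0 < k <= len(train):
        raise ValueError("k must be between 1 and the number of training examples")
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled observations.
train = [
    ((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"),
    ((8, 8), "b"), ((9, 8), "b"), ((8, 9), "b"),
]

label_near_a = knn_predict(train, (2, 2), k=3)
label_near_b = knn_predict(train, (8.5, 8.5), k=3)
```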

Clustering results in labels against previously unlabeled data, that is why it is sometimes referred to as _____________________. A. predictive partitioning B. supervised labeling C. predictive labeling D. unsupervised classification

D. Unsupervised Classification

________ is the total value a bank is exposed to when a loan defaults.

Exposure at Default (EaD)

According to the article, as a result of advances in machine learning, systems can now outperform humans at all tasks. True or False

False

Association rules are great with small data sets. True False

False

Features with a large number of distinct values will have lower intrinsic value than features with a small number of distinct values. True False

False

Regression establishes causation between the independent variables and the dependent variable. True False

False

The AUC metric and ROC curve can be used interchangeably because if two models have the same or identical AUC values, they will always have the same ROC curve. True False

False

Decision tree learners typically output the resulting tree structure in human-readable format. This makes them well suited for applications that require transparency for legal reasons or for knowledge transfer. True False

True

Entropy is highest when the split is 50-50. However, as one class dominates the other, entropy reduces to zero. True False

True

K-Means clustering only works with numeric data. True False

True

Machine learning exists in the cross-roads between data science, statistics and computer science. True False

True

Missing values can have meaning True False

True

One of the disadvantages of the random cross-validation approach is that some instances may not be used and others may be used more than once. True False

True

One of the strengths of association rules is that they are easy to understand. True False

True

One of the weaknesses of a Random Forest model is that unlike a decision tree, the model is not easily interpretable. True False

True

The kappa statistic is an adjustment of accuracy by accounting for the possibility of a correct prediction by chance alone. True False

True
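
A sketch of the adjustment for a binary classifier, assuming Cohen's kappa: kappa = (observed accuracy - expected accuracy) / (1 - expected accuracy), where expected accuracy is the agreement that would occur by chance. The function name and counts are illustrative.

```python
def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    observed = (tp + tn) / total
    # Chance agreement: P(actual pos) * P(predicted pos) + P(actual neg) * P(predicted neg).
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / total ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical counts: accuracy is 0.70, chance agreement is 0.50.
kappa = cohen_kappa(tp=20, fp=10, fn=5, tn=15)
```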

The famous computing test in which a machine might be called intelligent, if its responses to questions could convince a person that it was human, is known as the _______ test.

Turing Test

Which of the following is not one of the key branches of machine learning? A. Reinforcement Learning B. Unsupervised Learning C. Deductive Learning D. Supervised Learning

C. Deductive Learning

Which of these terms is used to describe the degree to which data exists for each feature of all observations? A. Dimensionality B. Resolution C. Density D. Coarseness

C. Density

In class we discussed 6 stages in the "Analytic Process". Which of these is not one of those stages? A. Data Summarization B. Validation and Interpretation C. Data Exploration D. Modeling

A. Data Summarization

The recursive process used in logistic regression to minimize the cost function during maximum likelihood estimation is known as _________. A. gradient descent B. logit function C. sum of squared errors D. log odds

A. Gradient Descent

One of the major disadvantages of the leave-one-out cross-validation approach is that it _______________. A. is computationally expensive B. violates the holdout principle C. is not a good predictor of future performance D. uses too much data

A. Is computationally expensive

The confidence of an association rule is the __________ of the rule. A. predictive power B. support strength C. complete coverage D. likelihood level

A. Predictive Power

As part of the data transformation process, we sometimes have to discretize our data or create dummy variables. Which of these is a reason why we would need to do this? A. Some algorithms only work with either continuous or discrete variables. B. It helps when trying to fix duplicate data. C. This is an approach to normalize our data set. D. This is an important step in balancing imbalanced datasets.

A. Some algorithms only work with either continuous or discrete variables.

A dataset with two class values that is significantly skewed (more than 90%) towards one of those class values is known as _______ dataset. A. an imbalanced B. a skewed C. a bimodal D. an inverted

A. an imbalanced

According to the formal definition of machine learning, "A computer program is said to learn from _______ with respect to some class of _______ and performance measure P, if its performance at ________, as measured by P, improves with __________". A. experience (E), tasks (T), tasks (T), experience (E) B. exposure (E), tasks (T), tasks (T), exposure (E) C. tasks (T), experience (E), tasks (T), experience (E) D. experience (E), test (T), test (T), experience (E)

A. experience (E), tasks (T), tasks (T), experience (E)

k-NN is an example of a ________ model. A. non-parametric B. parametric C. metric D. unsupervised

A. non-parametric

Lazy learners such as k-Nearest Neighbor are also known as _______ learners. A. rote learners B. non-learners C. just-in-time learners D. instance-based learners

A. rote learners or D. instance-based learners

In the 18th century, English statistician, philosopher and Presbyterian minister, Thomas Bayes, developed a mathematical theorem for probability which is still foundational in modern day machine learning. What is this theorem called? A. Naive Bayes B. Bayes Theorem C. Probabilistic Theorem D. Bayesian Stochastics

B. Bayes Theorem

Which of these is NOT a method used in choosing the appropriate value for "k"? A. Elbow Method B. Gap statistic C. Ankle Method D. A priori knowledge

C. Ankle Method

The process of conducting a search to identify the optimal combination of hyperparameters to use for the learning process using a choice of evaluation methods and metrics is known as _______________ tuning. A. automatic hyper B. model settings C. automated parameter D. search space

C. Automated parameter

The technique that sequentially builds strong learners as a linear combination of weak learners is known as __________. A. bagging B. bootstrap aggregation C. boosting D. bumming

C. Boosting

The sampling approach that creates a training set of equal length as the original data, using sampling with replacement, is known as _______________. A. equal length sampling B. stratified sampling C. bootstrapping D. cross-validation

C. Bootstrapping
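
A minimal sketch of drawing one bootstrap sample. The function name, seed and data are assumptions for illustration. Because sampling is with replacement, some items repeat and others are left out; the left-out items are often used for testing.

```python
import random


def bootstrap_sample(data, seed=None):
    """Draw len(data) items from data, sampling with replacement."""
    rng = random.Random(seed)  # local generator so the global random state is untouched
    return rng.choices(data, k=len(data))


data = list(range(10))
sample = bootstrap_sample(data, seed=42)

# Items never drawn into the sample (the "out-of-bag" items).
out_of_bag = [x for x in data if x not in sample]
```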

The functions that govern how disagreements among the predictions of ensemble models are reconciled are known as _________ functions. A. sigmoid B. stacking C. combination D. allocation

C. Combination

In Association rules, a collection of one or more items is known as ______________. A. a set of items B. a ruleset C. a set of rules D. an itemset

D. An itemset

While the underlying principles of machine learning are not new, recent advances in algorithmic techniques coupled with an increase in ___________ and _________ have resulted in increased accuracy and reliability of machine learning. A. connected devices, online users B. computing horsepower, artificial intelligence C. microprocessors, connected devices D. computing power, available data

D. Computing Power, Available Data

One of the limitations to using the F-score is that it assumes that ________________________. A. recall is always more important than precision B. the harmonic mean of precision and recall is zero C. precision is always more important than recall D. equal weight should be given to both precision and recall

D. Equal weight should be given to both precision and recall
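
One common way around this limitation is the F-beta score, which weights recall beta times as much as precision; beta = 1 gives the usual F-score. A minimal sketch, with the function name and sample values as assumptions:

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


f1 = f_beta(0.5, 1.0)            # equal weight on precision and recall
f2 = f_beta(0.5, 1.0, beta=2.0)  # recall weighted more heavily
```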

K-Means clustering is useful in creating non-spherical clusters. True False

False. K-means tends to produce roughly spherical clusters, so it is poorly suited to non-spherical ones.

The goal of unsupervised learning is to predict future outcomes based on prior experience. True False

False. The goal of unsupervised learning is to model the underlying structure or distribution of the data in order to learn more about it.

Color, shape, angle and number of edges are examples of nominal (or discrete) features. True False

False. Angle is a continuous feature: it can take an unlimited number of values.

Association rules imply causality in that they explain why item B is bought whenever item A is bought. True False

False. Association rules predict the occurrence of one or more items based on the occurrence of other items in a grouping, such as a transaction or an individual. They describe co-occurrence, not causation.

Occam's Razor states that when presented with competing hypothetical answers to a problem, one should select the answer that makes the most assumptions. True False

False. When competing models perform equally well, choose the one that makes the fewest assumptions.

In binomial logistic regression, the best cut-off point is always at 0.5. True False

False. The best cut-off point depends on the problem and can be chosen using the ROC (receiver operating characteristic) curve.

Given a set of candidate models from the same data, the model with the highest AIC is usually the "preferred" model. True False

False. The model with the lowest AIC is usually preferred.

_________ is the amount of money a bank or other financial institution loses when a borrower defaults on a loan, depicted as a percentage of total exposure at the time of default.

Loss Given Default (LGD)

Which of these is a method employed by the k-means algorithm to mitigate the effects of the random initialization trap? A. Centroid placement B. K-means++ C. Random initialization escape D. K-medoids

B. K-means++

The ____________ is the line that makes the vertical distance from the data points to the regression line as small as possible.

Least Squares Regression Line

A ______ is used in hypothesis testing to help you support or reject the null hypothesis. The _________ is the evidence against a null hypothesis. The smaller the ________, the stronger the evidence that you should reject the null hypothesis.

P value

_____________, under the Federal Reserve's Comprehensive Capital Analysis and Review (CCAR), measures net revenue forecast from asset-liability spreads and non-trading fees of banks.

Pre-Provision Net Revenue

_________, a financial term describing the likelihood of a default over a particular time horizon. It provides an estimate of the likelihood that a borrower will be unable to meet its debt obligations.

Probability of Default (PD)
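
PD, LGD and EAD are commonly combined into an expected loss figure: Expected Loss = PD x LGD x EAD. This formula is not stated in the cards above; it is added here as standard credit-risk background. The function name and loan figures are assumptions for illustration.

```python
def expected_loss(pd, lgd, ead):
    """Expected loss = probability of default * loss given default * exposure at default."""
    if not 0 <= pd <= 1 or not 0 <= lgd <= 1:
        raise ValueError("pd and lgd must be between 0 and 1")
    if ead < 0:
        raise ValueError("ead must be non-negative")
    return pd * lgd * ead


# Hypothetical loan: 2% chance of default, 40% loss if it defaults, 1,000,000 exposed.
loss = expected_loss(pd=0.02, lgd=0.40, ead=1_000_000)
```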

In 2016, Google DeepMind's AlphaGo system made use of _________ learning to further improve its ability to win at the game of Go. A. supervised B. stochastic C. reinforcement D. unsupervised

C. Reinforcement

Which of these types of visualizations is best to use to explore the correlation between two continuous features? A. Scatter plot B. Sankey diagram C. Pie chart D. Histogram

A. Scatter Plot

The random sampling method that tries to maintain the same class distribution as the original dataset is known as ___________. A. Stratified random sampling B. Random sampling with replacement C. Purposeful random sampling D. Systematic random sampling

A. Stratified Random Sampling

Clustering results in labels against previously unlabeled data, that is why it is sometimes referred to as unsupervised classification. True False

True

As the complexity of a model increases, bias decreases but variance increases. True False

True

Both online and offline learning systems are trained and tested in an offline setting before deployment. True False

True

Clustering is a type of unsupervised learning. True False

True

The logistic function is a sigmoid function that assumes values from __ to __.

Zero to 1

The machine learning approach in which an agent learns to identify strategies that minimize negative consequences while maximizing its reward is known as __________________ learning.

reinforcement

