Data Mining Exam 2 Review (Practice Quiz Questions)

If a model has high bias, which of the following is correct?

High bias means underfitting: the model has a high training error, and making the model a bit more flexible can reduce the bias.

Suppose we are building a neural network with 2 hidden layers and 10 neurons in each layer, we have 5 features (input nodes) and we are solving a regression problem. How many parameters do we need to learn for this neural network?

1 hidden layer: 10*(5 + 1) + (10 + 1) = 71. 2 hidden layers: 10*(5 + 1) + 10*(10 + 1) + (10 + 1) = 60 + 110 + 11 = 181.
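The layer-by-layer count can be checked with a short script. This is a minimal sketch that assumes a fully connected network with one bias per neuron and a single output node for regression; the function name `count_parameters` is illustrative, not from the course material.

```python
def count_parameters(n_inputs, hidden_layers, n_outputs=1):
    """Count weights and biases in a fully connected feed-forward network."""
    sizes = [n_inputs] + list(hidden_layers) + [n_outputs]
    total = 0
    for fan_in, fan_out in zip(sizes, sizes[1:]):
        # Each neuron in the next layer has fan_in weights plus one bias.
        total += fan_out * (fan_in + 1)
    return total


print(count_parameters(5, [10]))      # one hidden layer of 10 neurons -> 71
print(count_parameters(5, [10, 10]))  # two hidden layers of 10 neurons -> 181
print(count_parameters(5, [2, 2]))    # two hidden layers of 2 neurons -> 21
```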

Explain how K-means clustering works. Write down the steps. Describe any limitations.

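1) Start with k initial centroids (for example, k randomly chosen data points).
2) For each point, place it in the cluster whose current centroid it is nearest (centroid = average of the data points in the cluster).
3) After all points are assigned, update the locations of the centroids of the k clusters.
4) Reassign all points to their closest centroid; this sometimes moves points between clusters.
5) Repeat 3 and 4 until convergence, i.e., points no longer move between clusters and the centroids stabilize.
Limitations: k must be chosen in advance, and the result depends on feature scaling and on the choice of distance measure.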
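The steps above can be sketched in plain Python. This is only an illustration under stated assumptions: initial centroids are k randomly chosen data points, distance is squared Euclidean, and the toy `points` list is made up for the example.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: random data points as start
    assignments = None
    for _ in range(max_iter):
        # Assign every point to its nearest current centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # convergence: no point moved between clusters
        assignments = new_assignments
        # Update each centroid to the average of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_point(members)
    return centroids, assignments


# Hypothetical toy data: two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

Rerunning with a different `seed` or an unscaled feature can change the result on less tidy data, which is one way to see the limitations in practice.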

We are interested in making a bet with a friend. The probability of success (winning the bet) is 0.75. What are the odds of failure?

1/3

Suppose we are building a neural network with 2 hidden layers and 2 neurons in each hidden layer. We have 5 features (input nodes) and we are solving a regression problem. How many parameters do we need to learn for this neural network?

21

We are interested in making a bet with a friend. The probability of success (winning the bet) is 0.75. What are the odds of success?

3
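Both betting answers follow from the definition odds = p / (1 - p). A small sketch using exact fractions (the helper name `odds` is just for illustration):

```python
from fractions import Fraction


def odds(probability):
    """Odds of an event = P(event) / P(not event)."""
    return probability / (1 - probability)


p_success = Fraction(3, 4)
odds_success = odds(p_success)      # 3, i.e. 3 to 1
odds_failure = odds(1 - p_success)  # 1/3
print(odds_success, odds_failure)
```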

Suppose we have a dataset with n data points (rows). What is the usually recommended number of folds for k-fold cross-validation?

5-10

Suppose you are given data with a numerical feature X. We know the following info about X: mean of X = M = 62, median of X = Med = 70, Max of X = 95, Min of X = 35. Which of the following statements is correct?

50% of X values are less than Med

Three components of Gradient Boosting

1) A Loss Function: a function that measures the amount of error in the model, e.g., residual sum of squares (RSS) from linear regression or logarithmic loss for classification.
2) A Weak Learner: a simple model that might not have great predictive power alone; however, many weak learners combined can make a very good predictive model. A small tree of depth one or two is an example of a weak learner and is usually used in gradient boosting.
3) An Additive Model: adding weak learners (simple tree models) to get the ensemble model.
(BAIS 3500: Data Mining)
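The three components can be seen together in a toy regression example. This is a rough sketch, not the course's implementation: it assumes squared-error loss (so the negative gradient is just the residual), uses a one-split tree on a single feature as the weak learner, and the data and names (`fit_stump`, `gradient_boost`) are made up for illustration.

```python
def fit_stump(xs, residuals):
    """Weak learner: a one-split regression tree on a single feature."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # last value would leave the right side empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, best_threshold, best_left, best_right = best
    return lambda x: best_left if x <= best_threshold else best_right


def gradient_boost(xs, ys, n_trees=50, learning_rate=0.1):
    base = sum(ys) / len(ys)  # start from the mean prediction
    stumps = []
    predictions = [base] * len(ys)
    for _ in range(n_trees):
        # Loss function: squared error, whose negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Additive model: add a shrunken copy of the new weak learner.
        predictions = [p + learning_rate * stump(x) for x, p in zip(xs, predictions)]

    def model(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return model


def mse(model, xs, ys):
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Hypothetical training data.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.2, 1.9, 3.1, 3.9, 5.2, 6.1, 6.8, 8.1]
for n_trees in (1, 10, 100):
    print(n_trees, round(mse(gradient_boost(xs, ys, n_trees=n_trees), xs, ys), 4))
```

The training error keeps falling as trees are added, which is why the number of trees and the learning rate need tuning against held-out data.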

Which scenario is correct according to Condorcet's Jury Theorem? Group X: people are 60% correct. Group Y: people are 40% correct. Options: (a) Adding more people from group X to vote would increase the chances of majority winning; (b) Adding more people from group Y to vote would increase the chances of majority winning; (c) Adding more people from group X to vote would decrease the chances of majority winning; (d) None of the other options.

Adding more people from group X to vote would increase the chances of majority winning
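The theorem can be checked numerically with the binomial distribution. The sketch below assumes independent voters, an odd number of voters, and a simple majority rule; `majority_correct` is an illustrative name.

```python
from math import comb


def majority_correct(p, n):
    """Probability that more than half of n independent voters are correct (n odd)."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


for n in (1, 11, 101):
    print(n, round(majority_correct(0.6, n), 3), round(majority_correct(0.4, n), 3))
```

For group X (p = 0.6) the probability rises toward 1 as voters are added; for group Y (p = 0.4) it falls toward 0.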

Which of the following is an integral component of Gradient Boosting? Options: (a) Weights given to different data points; (b) Additive Model; (c) Maximum Likelihood; (d) Least Squares

Additive Model

Which of the following statements is correct about the Ensemble Method? i) It is a set of models combined in some way to make predictions. ii) Each model in the ensemble will be built from a potentially different training set. iii) The models in an ensemble must be independent/diverse. Options: (a) All (i, ii, iii); (b) i and ii; (c) i and iii; (d) ii and iii

All (i, ii, iii)

Which of the following statements is correct? i) Gradient Boosting is a state-of-the-art model for tabular (structured) data. ii) Gradient Boosting is quite a flexible model and can easily overfit. iii) The Gradient Boosting model is more flexible than one decision tree model. Options: (a) ii and iii; (b) i and iii; (c) iii only; (d) All (i, ii, iii)

All (i, ii, iii)

Suppose we have the proportions of students with different letter grades (A, B, C, D, F) in the Data Mining class. p_X = proportion of students with letter grade X. Which of the following combinations will have the highest entropy? Options: (a) p_A = 2/5, p_B = 2/10, p_C = 2/10, p_D = 2/10, p_F = 0; (b) All proportions are equal to 1/5; (c) p_A = 2/5, p_B = 2/10, p_C = 1/10, p_D = 2/10, p_F = 1/10; (d) p_A = 1/5, p_B = 3/5, p_C = 0, p_D = 0, p_F = 1/5

All proportions are equal to 1/5
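The four grade distributions in the question can be compared directly. This sketch assumes Shannon entropy with log base 2 (the question does not fix a base, and the ranking is the same for any base); terms with p = 0 are treated as contributing nothing.

```python
from math import log2


def entropy(proportions):
    """Shannon entropy in bits; zero proportions are skipped."""
    return -sum(p * log2(p) for p in proportions if p > 0)


options = {
    "2/5, 2/10, 2/10, 2/10, 0": [0.4, 0.2, 0.2, 0.2, 0.0],
    "all equal to 1/5": [0.2] * 5,
    "2/5, 2/10, 1/10, 2/10, 1/10": [0.4, 0.2, 0.1, 0.2, 0.1],
    "1/5, 3/5, 0, 0, 1/5": [0.2, 0.6, 0.0, 0.0, 0.2],
}
for label, proportions in options.items():
    print(f"{label}: {entropy(proportions):.3f} bits")

best = max(options, key=lambda label: entropy(options[label]))
print("highest entropy:", best)
```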

Which of the following statements is correct about boosting methods?

Boosting involves building multiple models in a sequence

How do we build different training sets when building a random forest ensemble model?

Bootstrapping (Bootstrapping is any test or metric that uses random sampling with replacement, and falls under the broader class of resampling methods.)
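A bootstrap sample is just a draw of n rows with replacement from the n original rows. A minimal sketch with the standard library; the `rows` list stands in for a real training set and the seed is arbitrary.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows at random, with replacement."""
    return rng.choices(rows, k=len(rows))


rng = random.Random(42)   # arbitrary seed so the run is repeatable
rows = list(range(10))    # placeholder for the rows of a training set
training_sets = [bootstrap_sample(rows, rng) for _ in range(3)]
for sample in training_sets:
    print(sorted(sample))  # some rows typically repeat, others are left out
```

Each tree in the forest would be trained on one such sample.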

Which scenario has data leakage? A. Predicting whether a criminal will re-offend in the future. Features include: his/her age, previous offenses, marital status, etc. B. Predicting the salary of a fresh graduate student before he finds a job. Features include: the GPA of the student, degree, major, age, internship experience (if there is any), etc. C. Predicting the total amount of money a fundraiser will receive on GoFundMe before posting to GoFundMe. Features include: the characteristics of the project, the personal information of the fundraiser, the total number of donors, time of the year, etc.

C (the total number of donors is unknown before the fundraiser is posted)

You are tasked with building a model to predict whether Iowa will beat Iowa State in the football game this season. This data mining problem would be an example of

Classification

Which of the following is not part of the "Data Preparation" steps, as laid out in the lecture notes? Options: (a) Transforming Features; (b) Collecting Data; (c) Data Exploration; (d) Partitioning Data

Collecting Data

Which of the following is not a machine learning approach to estimate the true function f? Options: (a) Parametric Methods; (b) Dynamic Programming; (c) Non-parametric methods

Dynamic Programming

Which of the following is not an example of Unsupervised Learning? Options: (a) Reducing higher dimension data to low dimension to visualize the data; (b) Movies grouped by the ratings assigned by movie viewers; (c) Estimating the customer income levels; (d) Finding subgroups of breast cancer patients grouped by their gene expression measurements

Estimating the customer income levels

After the "evaluation" phase is completed, a data scientist needs to proceed to deployment and cannot go back to update the previous steps because they are final.

False

CRISP-DM is enforced to make sure that the experiences of the individual team members (data scientists) will have a significant impact on the output of the data mining process

False

Data Mining is focused on extracting/collecting data for analysis

False

Entropy formula, with p_1 = ¾ and p_2 = ¼

H = -sum_i p_i log(p_i), so H = -(¾ log(¾) + ¼ log(¼)), which is about 0.811 bits if the log is base 2

Which of the following is not part of CRISP-DM? Options: (a) Model Evaluation; (b) Data Preparation; (c) Hiring new staff; (d) Business Understanding

Hiring new staff

We are building a model to predict whether an F1 car will win the race. The average speed of the car (miles per hour) is our only feature. We run a logistic regression model and find that b0 = -3, b1 = 0.015. What is the meaning (interpretation) of b1 in this model?

If the average speed increases by one unit (1 mph), the log odds of winning change by b1 (here, +0.015)
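The roles of b0 and b1 are easy to see by evaluating the fitted model. The sketch below uses the coefficients from the question; the speeds plugged in are arbitrary examples, and the function names are illustrative.

```python
from math import exp

B0, B1 = -3.0, 0.015  # coefficients given in the question


def log_odds(speed_mph):
    return B0 + B1 * speed_mph


def win_probability(speed_mph):
    # The logistic (sigmoid) function turns log odds into a probability.
    return 1 / (1 + exp(-log_odds(speed_mph)))


print(log_odds(0))                    # b0: log odds of winning at speed 0
print(log_odds(151) - log_odds(150))  # b1: change in log odds per extra mph
print(exp(B1))                        # the odds are multiplied by this per extra mph
print(win_probability(200))           # log odds are 0 here, so probability is 0.5
```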

How is Random Forest different from Bagging?

In a random forest, each split in each tree considers only a small, randomly selected subset of the features.

Which of the following is correct about K-means? Options: (a) None of the other options; (b) K-means always finds the right number of clusters in the given data; (c) K must be chosen so that within cluster average distance is minimal; (d) It does not matter what value of K is used for K-means

K must be chosen so that within cluster average distance is minimal

Which of the following methods does not involve measuring distances? Options: (a) Linear Regression; (b) KNN; (c) K-Means Clustering; (d) Hierarchical Clustering

Linear Regression

We fit different Boosting models on the same dataset. The results are given below. Which models are potentially underfitting and overfitting, respectively?
Error      Training   Test
Model 1    25.3       75
Model 2    50         61
Model 3    63         83
Options: (a) Model 1 Overfitting only; (b) Model 1 Overfitting, Model 3 Underfitting; (c) Model 2 Underfitting only; (d) Model 3 Overfitting, Model 2 Underfitting

Model 1 Overfitting, Model 3 Underfitting

Which of the following is not a significant reason for the recent popularity of neural networks? Options: (a) Neural Networks were developed recently; (b) More data that can be stored efficiently now; (c) More computation power available now; (d) Backpropagation algorithm and other better algorithms to train models now

Neural Networks were developed recently

What is the fundamental difference between an artificial neural network (ANN) and a deep neural network (DNN)?

The number of hidden layers (a DNN has more hidden layers, which lets it learn features automatically from unstructured data)

An ESPN Analyst is estimating the viewership number for the Iowa vs. Iowa State football game. The model built has the form f(X) = b0 + b1 Hawkeye_Fans + b2 IowaState_Fans + b3 Historical_Viewers + ... Which of the following is not a correct way to describe this model f? Options: (a) None of the other options; (b) Parametric Model; (c) Linear Regression; (d) Non-Parametric Model

Non-Parametric Model

Which of the following is not a stopping criterion when building a tree? Options: (a) Minimum number of data points in a leaf; (b) Maximum depth allowed; (c) Number of random features to use when splitting a tree; (d) Minimum split size of a node

Number of random features to use when splitting a tree

What is the type of feature: Tesla stock price (sample values: $190, $80, $220, etc.)

Numeric

Which type of data mining problem is it: Predicting the daily new COVID-19 cases in Iowa

Regression

Cross-validation is used to estimate

Test error using training data

Which of the following statements about decision trees is correct? Options: (a) They are interpretable models; (b) Decision Tree can only be used for classification; (c) They are the cutting edge (state of the art) method for classification; (d) None of the other options.

They are interpretable models

The data used to build a model is called

Training Set

Overfitting happens when: Options: (a) Training error is High, Test error is Low; (b) Training error is Low, Test error is Low; (c) Training error is low, Test error is high; (d) Training error is High, Test error is High

Training error is low, Test error is high

During the Data Prep step of CRISP-DM, we might have to merge data from different tables into a single table for modeling

True

Clustering is an example of

Unsupervised Learning

Which of the following is not an issue or a limitation for clustering? Options: (a) Feature scaling; (b) Using the idea of distance to identify similar items; (c) Similarity/Distance choice given the data; (d) How many clusters to choose?

Using the idea of distance to identify similar items

When building decision trees, we want to make splits

With the highest information gain

Centroid of a cluster is?

a point which is the average of all data points in that cluster

How would you deal with the missing value for "Scientific Writing"?

Encode missing as a new category. Scientific Writing has structurally missing data: it is missing for some students because they didn't take the course. In this case you cannot impute with the mode, since these students are not supposed to have a grade. You can only replace the missing value with a new category, named "no grade" or anything else that distinguishes it from the existing letter grades.
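In code, this amounts to mapping the missing marker to its own label before modeling. A tiny sketch in plain Python, assuming missing grades are stored as `None`; the sample grades are invented.

```python
def encode_missing(grades, label="no grade"):
    """Replace structurally missing grades with an explicit category."""
    return [label if grade is None else grade for grade in grades]


# Hypothetical Scientific Writing grades; None = student did not take the course.
scientific_writing = ["A", None, "B", "C", None]
encoded = encode_missing(scientific_writing)
print(encoded)
```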

Which of the following statements is correct? i) Decision trees can be post-pruned. ii) Decision boundaries for trees are rectangular box-like regions. iii) Decision trees are not sensitive to changes in training data. Options: (a) ii only; (b) ii and iii; (c) i and ii; (d) i and iii

i and ii

Hierarchical clustering involves i) merging nearby clusters ii) specifying the number of clusters before implementation iii) measuring the distance between clusters (with multiple points)

i and iii

Which of the following statements is correct? i) Information Gain helps us select the best splits. ii) Information Gain is a measure of uncertainty. iii) Information Gain involves calculating the entropy of parent and children nodes. Options: (a) i and ii; (b) i and iii; (c) i only; (d) ii and iii

i and iii
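Statement iii can be made concrete: information gain is the parent's entropy minus the size-weighted entropy of the children. A minimal sketch, assuming log base 2 and made-up class labels:

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted average entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["win", "win", "lose", "lose"]
perfect_split = [["win", "win"], ["lose", "lose"]]
useless_split = [["win", "lose"], ["win", "lose"]]
print(information_gain(parent, perfect_split))  # all uncertainty removed
print(information_gain(parent, useless_split))  # no uncertainty removed
```

A tree-building routine would compute this for every candidate split and keep the one with the highest gain.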

Which of the following is true about Random Forests? i) It is an ensemble method where each individual model is a tree. ii) Random Forest does not make sure that each individual model is independent. iii) Random Forest does not have any hyperparameters to tune.

i only

Which of the following statements are correct for Logistic Regression? i) Logistic Regression is used for a classification problem, where the target variable is categorical. ii) Logistic Regression is an example of unsupervised learning. iii) Logistic Regression is used for predicting a quantitative variable since it has the word regression in its name. Options: (a) i and iii; (b) i and ii; (c) i only; (d) All (i, ii, and iii)

i only

Which of the following statements is correct? i) Each model in boosting is a weak learner. ii) Each model in boosting is learned in a way that only a small set of features are available at each step of making splits. iii) There is only one kind of boosting method. Options: (a) ii; (b) i and ii; (c) ii and iii; (d) i only

i only

How can we increase the complexity of a neural network structure? i) Increase the no. of neurons in the output layer. ii) Increase the no. of neurons in the hidden layer. iii) Increase the no. of neurons in the input layer. Options: (a) ii; (b) i & ii; (c) i & iii; (d) ii & iii

ii

Which of the following statements are correct about the gradient boosting method? i) There's no restriction on the size of each tree model we build. ii) The number of trees must not be too large as it can lead to overfitting. iii) Learning rate is not a hyperparameter for gradient boosting. Options: (a) ii and iii; (b) i and ii; (c) iii only; (d) ii only

ii only

What does Bias Measure?

Bias measures the deviation between the average predicted value and the true value. It usually arises when we fit too simple a model to data that needs a more flexible (complex) model. Models with high bias are inflexible and underfit. High bias always leads to high error on both training and test data.

Overfitting refers to...

The tendency of data mining procedures to tailor models to the training data, at the expense of generalization to previously unseen data points (the test set)

We are building a model to predict whether an F1 car will win the race. The average speed of the car (miles per hour) is our only feature. We run a logistic regression model and find that b0 = -3, b1 = 0.015. What is the meaning of b0 in this model?

The log odds of winning when the average car speed is 0

Suppose we have n data points (rows) in our dataset. We decided to split the data such that one data point is kept for validation and all the other data points are used for training. What is the value of k here if we consider this as a k-fold cross-validation?

n
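Holding out one point at a time is k-fold cross-validation with k = n (leave-one-out). The sketch below builds the folds by index; it skips shuffling for simplicity, which a real split would normally include, and the function name is illustrative.

```python
def k_fold_splits(n, k):
    """Yield (train_indices, validation_indices) for k roughly equal folds."""
    indices = list(range(n))
    # The first n % k folds get one extra point so every point is used once.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


# k = n: each fold validates on exactly one data point.
for train, validation in k_fold_splits(n=6, k=6):
    print("train:", train, "validate:", validation)
```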

Bias-Variance Tradeoff focuses primarily on: Options: (a) the decrease in variance when model becomes less flexible; (b) None of the other options; (c) opposite behavior of bias and variance as model flexibility changes; (d) the decrease in bias when model becomes more complex

opposite behavior of bias and variance as model flexibility changes

If the model has high variance

The model could be overfitting; increasing the flexibility of a model increases the variance.

A histogram is used to show

the range (distribution) of a numerical feature

