DATA 5600 - Final Exam

QUIZ 3 AND 5 HAVE MORE QUESTIONS TO GO OVER

LOOK AT QUIZ 3 AND 5

In K-Nearest Neighbors (KNN) algorithm, what does the "K" represent? a. Number of observations b. Number of neighbors c. Number of features d. None of the above

Number of neighbors

Which of the following statements are correct in regards to "why do we use cross validation?" in machine learning? A) For model architecture selection B) For estimation of model performance in train set a. Only A b. Only B c. Both A and B d. Neither A nor B

Only A

Which of the following statements is NOT correct with regard to ridge regression? A) Ridge regression uses L1 norm. B) Ridge regression can shrink the estimates toward zero (but not exactly to zero). C) Cross validation is used to select a good value for the shrinkage parameter. D) It is best to apply ridge regression after variable standardization. a. Only B b. Only A c. Only A and D d. C

Only A

Which of the following statements is correct with regards to the concept of norms? A) In mathematics, the norm of a vector is its length. B) L2 norm is based on Least absolute errors C) L1 norm is based on Least squared errors a. All of the above b. Only A c. Only B and C d. Only A and B

Only A

Which of the following is NOT a typical goal of econometrics research? a. Estimating relationships between variables b. Testing Hypothesis c. Outperforming complicated machine learning or deep learning models d. Predicting random variables

Outperforming complicated machine learning or deep learning models

If an independent variable in a multiple linear regression model is an exact linear combination of other independent variables, the model suffers from the: a. Multicollinearity b. Perfect collinearity c. Heteroskedasticity d. Homoskedasticity

Perfect collinearity

Which of the following models is not an example of penalized regression? a. Ridge regression b. Lasso regression c. Polynomial regression d. Elastic net regression

Polynomial regression

This feature scaling method subtracts the average from each observation and then divides the result by the standard deviation of the observations. a. Min-Max normalization over [0,1] b. Mean normalization c. Standardization (Z-score) d. Min-Max normalization over [-1,1]

Standardization (Z-score)
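
As a minimal sketch of the standardization described above, the snippet below subtracts the mean and divides by the standard deviation. The function name `standardize` and the sample data are illustrative, and the use of the population standard deviation is an assumption (the sample standard deviation is also common).

```python
from statistics import mean, pstdev

def standardize(values):
    """Return z-scores: (x - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population SD; sample SD is also used in practice
    return [(x - mu) / sigma for x in values]

incomes = [30.0, 40.0, 50.0, 60.0, 70.0]  # made-up data
z_scores = standardize(incomes)
```

After scaling, the values have mean 0 and standard deviation 1, so no single feature dominates because of its units.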

Which of the following is a code editor? a. Python b. Anaconda c. VS Code d. PyCaret

VS Code

Which of the following statements is NOT correct with regards to the definition of Cost function? a. The cost function tells us "how good" our model is at making predictions for a given set of parameters. b. The cost function has its own curve and its own gradients. The slope of this curve tells us how to update our parameters to make the model more accurate. c. We can always use gradient descent to optimize the cost function even if the cost function is not differentiable d. None of the above

We can always use gradient descent to optimize the cost function even if the cost function is not differentiable

In two-dimensional space, the neighborhood represented by Euclidean distance is a ------------ a. circle b. square c. star d. oval

circle

In linear regression, we fit the line using ---------- and in logistic regression we fit the S curve using -------- a. maximum likelihood - maximum likelihood b. least squares - maximum likelihood c. least squares - least squares d. maximum likelihood- least squares

least squares - maximum likelihood

The main idea of regularization in machine learning is to introduce a small amount of bias to the model by making it ------------------ in order to get a significant reduction in ----------------. a. less flexible - model variance b. more flexible - model variance c. more flexible - model bias d. less flexible - model bias

less flexible - model variance

Considering a two-class classification problem (2 by 2 confusion matrix), a good machine learning model is a model that -------------- a. maximizes true positives and true negatives, and minimizes the false positives and false negatives b. maximizes the true positives and false positives, and minimizes the true negatives and false negatives c. minimizes true positives and true negatives, and maximizes the false positives and false negatives d. minimizes the true positives and false positives, and maximizes the true negatives and false negatives

maximizes true positives and true negatives, and minimizes the false positives and false negatives

In the gradient descent algorithm, if the learning rate is too -----------, gradient descent can be very slow and if the learning rate is too ---------, the gradient descent can even diverge. a. small - big b. big - small c. small - small d. big - big

small - big
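
A small sketch of the learning-rate effect, assuming the toy cost function f(x) = x**2 (gradient 2*x). The function name and the specific learning rates are illustrative choices, not values from the course.

```python
def gradient_descent(learning_rate, steps=20, start=1.0):
    """Minimize f(x) = x**2 (gradient 2*x) starting from `start`."""
    x = start
    for _ in range(steps):
        x = x - learning_rate * 2 * x  # standard update: x <- x - lr * gradient
    return x

too_small = gradient_descent(0.001)  # barely moves away from the start
reasonable = gradient_descent(0.1)   # gets close to the minimum at 0
too_big = gradient_descent(1.5)      # each step overshoots and |x| doubles
```

With the rate 0.001 the iterate is still near 1.0 after 20 steps, while with 1.5 it moves farther from the minimum on every step.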

One of the main goals in machine learning is to see if the model predictions are close to the actual numbers in the ------------ set. This is called generalization. a. train b. test c. validation d. cross validation

test

Which one is the correct definition of supervised learning? In supervised learning ------- a. the machines learn to model relationships based on labeled data! b. the machines are not given labeled data c. the machines learn from interacting with their environment by producing actions and discovering rewards

the machines learn to model relationships based on labeled data!

Which of the following KNN classification models has the most flexible (complex) decision boundary, i.e. maximum accuracy in the train set? A KNN model with K equal to? a. 10 b. 5 c. 1 d. Sample size

1
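
To see why K = 1 maximizes train accuracy, here is a minimal KNN sketch in plain Python. The helper names and the one-dimensional toy data are hypothetical, and the sketch assumes no two training points share the same coordinates.

```python
from collections import Counter

def knn_predict(train_X, train_y, point, k):
    """Majority vote among the k training points closest to `point`."""
    ranked = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], point)),
    )
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

def train_accuracy(train_X, train_y, k):
    """Share of training points that the model classifies correctly."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y
               for x, y in zip(train_X, train_y))
    return hits / len(train_X)

X = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]  # made-up feature values
y = ["a", "a", "b", "a", "a"]                 # one isolated "b" point

acc_k1 = train_accuracy(X, y, 1)  # every point is its own nearest neighbor
acc_k3 = train_accuracy(X, y, 3)  # the isolated "b" is outvoted by its neighbors
```

With K = 1 each training point votes for itself, so the boundary bends around every observation; a larger K smooths the boundary and misclassifies the isolated point.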

What is the common-sense accuracy (the benchmark accuracy) for the following classification problem? - You want to predict if someone is going to default on their credit card or not while only 10% of the observations in the train set have defaulted in reality. a. 10% b. 90% c. 5% d. 45%

90%
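
The benchmark can be checked with a short sketch that always predicts the most frequent class. The function name and the 100-row label list are illustrative.

```python
from collections import Counter

def baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    majority_count = Counter(labels).most_common(1)[0][1]
    return majority_count / len(labels)

# 10% defaulted, 90% did not (mirrors the question above)
labels = ["default"] * 10 + ["no default"] * 90
benchmark = baseline_accuracy(labels)
```

Any classifier worth using on this data has to beat the 90% obtained by predicting "no default" for everyone.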

Which of the following statements accurately describes the trade-off between bias and variance in decision trees? a. A smaller tree with fewer splits will have lower bias and higher variance. b. A larger tree with more splits will have higher bias and lower variance. c. A smaller tree with fewer splits may lead to lower variance and better interpretation, but at the cost of higher bias. d. A larger tree with more splits may lead to lower bias and better interpretation, but at the cost of higher variance.

A smaller tree with fewer splits may lead to lower variance and better interpretation, but at the cost of higher bias.

A Challenging one!!! Suppose you are hired as a data analyst to analyze some COVID related data. You want to use logistic regression and construct a confusion matrix based on the COVID test results. If your goal is to avoid missing too many cases of COVID (avoid false negative), then A: which of the following probability thresholds would satisfy your objective? B: what is the consequence of that? Hint: if y_hat > threshold then your prediction is positive, otherwise negative. a. A= 0.2 and B: lower precision and higher recall b. A= 0.8 and B: lower precision and higher recall c. A= 0.2 and B: higher precision and lower recall d. A= 0.8 and B: higher precision and lower recall

A= 0.2 and B: lower precision and higher recall
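
The trade-off can be illustrated with a small sketch that applies the hint's rule (predict positive if y_hat > threshold). The probabilities and labels below are made up for illustration.

```python
def precision_recall(y_true, y_prob, threshold):
    """Precision and recall when predicting positive if probability > threshold."""
    y_pred = [1 if p > threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

y_true = [1, 1, 1, 1, 0, 0, 0, 0]                       # 1 = has COVID
y_prob = [0.9, 0.85, 0.5, 0.3, 0.6, 0.25, 0.1, 0.05]    # hypothetical model outputs

low_threshold = precision_recall(y_true, y_prob, 0.2)
high_threshold = precision_recall(y_true, y_prob, 0.8)
```

On this toy data the 0.2 threshold catches every true case (recall 1.0) but flags more healthy people, while the 0.8 threshold has perfect precision and misses half of the cases.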

Which of the following statements are correct: A) K nearest neighbor (KNN) is one of the simplest and best known non parametric supervised learning technique most often used for classification. B) Contrary to other learning algorithms that allow discarding the training data after the model is built, KNN keeps all training examples in memory. a. Only A b. Only B c. Both A and B d. Neither A nor B

Both A and B

Which of the following statements are correct: A) One big difference between linear regression and logistic regression is how the line is fit to the data. B) Maximum likelihood estimation (MLE) is a method of estimating the parameters of a probability distribution by maximizing a likelihood function, so that under the assumed statistical model the observed data is most probable. a. Only A b. Only B c. Neither A nor B d. Both A and B

Both A and B

Which of the following statements are correct? A: Machine Learning is a subset of AI that enables computers to learn from data. B: ML involves automated detection of meaningful patterns in data and apply the pattern to make predictions on unseen data! The purpose is to generalize. a. only A b. only B c. Both A and B d. Neither A nor B

Both A and B

Which of the following statements is a disadvantage of the polynomial regression model? A) Polynomial models have notorious tail behavior B) Polynomial models are global fits a. Only A b. Only B c. Both A and B d. Neither A nor B

Both A and B

Which of the following statements is correct? A) A model is called a linear model as long as it is linear in parameters. B) Polynomial regression model is a special case of general linear regression models. a. Neither A nor B b. Both A and B c. Only A d. Only B

Both A and B

Why do we need to do feature scaling in machine learning? To .... A) Avoid numerical overflow and speed up the machine learning algorithm B) Reduce dominant effects of specific features a. Only A b. Only B c. Neither A nor B d. Both A and B

Both A and B

Which of the following statements are correct? A) Linear probability model (LPM) may generate outcomes (probabilities) less than zero and greater than one. B) Logistic regression model may generate outcomes (probabilities) less than zero and greater than one. C) Both the LPM and logistic regression models output probabilities. a. Only A b. Only B c. Both A and C d. Both B and C

Both A and C
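
A quick sketch of why statement A holds and B does not: a linear predictor is unbounded, while the logistic (sigmoid) function maps any real number into (0, 1). The coefficients below are arbitrary illustrative values.

```python
import math

def linear_probability(x, intercept, slope):
    """LPM prediction: a straight line, so it can leave the [0, 1] range."""
    return intercept + slope * x

def logistic_probability(x, intercept, slope):
    """Logistic prediction: the sigmoid of the same linear index."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))

intercept, slope = 0.5, 0.2  # hypothetical coefficients
lpm_high = linear_probability(5.0, intercept, slope)
lpm_low = linear_probability(-5.0, intercept, slope)
logit_high = logistic_probability(5.0, intercept, slope)
logit_low = logistic_probability(-5.0, intercept, slope)
```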

Regression and Classification methods are different types of Reinforcement learning! True or False

False - Regression and Classification methods are different types of supervised learning!

In order to decide which feature to begin with and where to put the split, the CART algorithm compares -------------- for classification and ----------------- for regression in each region. a. Gini impurity - MSE b. Gini impurity - Gini impurity c. MSE - MSE d. MSE - Gini impurity

Gini impurity - MSE
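
For reference, the two node criteria can be written in a few lines. This is a sketch of the per-node quantities only, not a full CART implementation; the function names are illustrative.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity of a node: 1 minus the sum of squared class shares."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def node_mse(values):
    """MSE of a regression node around its own mean prediction."""
    avg = sum(values) / len(values)
    return sum((v - avg) ** 2 for v in values) / len(values)

mixed = gini_impurity(["a", "a", "b", "b"])  # evenly mixed node
pure = gini_impurity(["a", "a", "a", "a"])   # pure node
spread = node_mse([1.0, 3.0])
```

A split is attractive when it sends the data into child nodes whose impurity (classification) or MSE (regression) is lower than the parent's.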

What is the main difference between Google Colab and other cloud platforms for machine learning tasks? a. Google Colab is free, while other cloud platforms have a cost associated with them b. Google Colab has more powerful hardware than other cloud platforms c. Google Colab offers more advanced collaboration features than other cloud platforms d. Google Colab allows for faster data transfer than other cloud platforms

Google Colab is free, while other cloud platforms have a cost associated with them

Which of the following optimization solvers is the most stable one in minimizing the cost function? (i.e has the most stable path to the global minimum of the cost function) a. Gradient Descent b. Minibatch Gradient Descent c. Stochastic Gradient Descent d. They all have the same robustness

Gradient Descent

Which of the following is a disadvantage of KNN algorithm? a. It is simple to implement b. It can handle non-linear data c. It requires a lot of memory to store all the training data d. It is easy to implement for multi class problem

It requires a lot of memory to store all the training data

Which of the following is a web-based interactive development environment? a. JupyterLab b. VS code c. Anaconda d. PyCaret

JupyterLab

In this type of penalized regression, we can force some of the coefficient estimates to be exactly zero. In other words, this model allows feature selection. a. Ridge and LASSO b. Ridge and Elastic Net c. LASSO and Elastic Net d. only LASSO

LASSO and Elastic Net
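
One way to see the difference is the special case of an orthonormal design matrix, where both estimators have simple closed forms in terms of the OLS coefficient: the L1 penalty soft-thresholds it, and the L2 penalty rescales it. The sketch below assumes that special case and one particular scaling of the penalty `lam`; it is not a general solver.

```python
def lasso_shrink(beta_ols, lam):
    """Soft-thresholding: move toward zero by lam and clip at exactly zero."""
    magnitude = max(abs(beta_ols) - lam, 0.0)
    return magnitude if beta_ols >= 0 else -magnitude

def ridge_shrink(beta_ols, lam):
    """Proportional shrinkage: smaller, but never exactly zero for nonzero input."""
    return beta_ols / (1.0 + lam)

small_lasso = lasso_shrink(0.3, 0.5)  # weak coefficient is set to zero
large_lasso = lasso_shrink(2.0, 0.5)  # strong coefficient survives, shrunk
small_ridge = ridge_shrink(0.3, 0.5)  # weak coefficient shrinks but stays nonzero
```

Elastic Net keeps an L1 component in its penalty, which is why it can also produce exact zeros.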

---------- is a plot that shows the relationship between the amount of training data and the performance of a machine learning model. It is used to diagnose whether a model has high bias, high variance, or is just right. a. Learning rate b. Learning curve c. Cross validation curve d. None of the above

Learning curve

Which of the following regression models CANNOT completely overfit the data by construction? a. Linear regression b. KNN regression c. Both linear and KNN regression d. None of them

Linear regression

Which of the following best describes the main difference between statistical learning and machine learning? a. Machine learning is concerned with making predictions using data, while statistical learning is concerned with understanding the underlying relationship of the data b. Statistical learning is concerned with making predictions using data, while machine learning is concerned with understanding the underlying relationship of the data c. Statistical learning relies on mathematical models and assumptions, while machine learning does not d. Machine learning relies on mathematical models and assumptions, while statistical learning does not

Machine learning is concerned with making predictions using data, while statistical learning is concerned with understanding the underlying relationship of the data

Q: What are the outputs (predictions) of a CART model for classification and regression trees respectively? (what are the predicted values in each terminal node) A: for classification, the predictions are the ------------- in each region and for regression, the predictions are the ----------------- of the observations in each region. a. Majority class - Minority class b. Average values - Average values c. Majority class - Average values d. Minority class - Average values

Majority class - Average values

L1 norm is also known as ------------ distance and L2 norm is also known as ------------- distance. a. Boston - Euclidean b. Manhattan - Euclidean c. Euclidean - Manhattan d. Boston - Manhattan

Manhattan - Euclidean
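
Both distances are easy to compute by hand; the sketch below uses the classic 3-4-5 triangle as an illustrative example.

```python
import math

def manhattan(a, b):
    """L1 distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    """L2 distance: square root of the sum of squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

origin, point = (0.0, 0.0), (3.0, 4.0)
l1 = manhattan(origin, point)  # walk 3 blocks east, then 4 blocks north
l2 = euclidean(origin, point)  # straight-line distance
```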

Challenging: Which of the following regression metrics has a meaningless unit of measurement? For example, the unit is not dollar or foot or percentage or anything meaningful. a. Mean Squared Error (MSE) b. Root Mean Squared Error (RMSE) c. Mean Absolute Error (MAE) d. Mean Absolute Percentage Error (MAPE)

Mean Squared Error (MSE)
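
The unit issue shows up as soon as the metrics are computed. In the sketch below the prices are made-up values in dollars, so MSE comes out in squared dollars while RMSE and MAE stay in dollars.

```python
import math

def mse(actual, predicted):
    """Mean squared error: units are the target's units squared."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: back in the target's own units."""
    return math.sqrt(mse(actual, predicted))

def mae(actual, predicted):
    """Mean absolute error: also in the target's own units."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

actual = [200.0, 300.0]     # dollars (hypothetical)
predicted = [210.0, 280.0]  # dollars (hypothetical)

mse_value = mse(actual, predicted)    # 250 "squared dollars"
rmse_value = rmse(actual, predicted)  # about 15.8 dollars
mae_value = mae(actual, predicted)    # 15 dollars
```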

High (but not perfect) correlation between two or more independent variables is called: a. Perfect collinearity b. Heteroskedasticity c. Multicollinearity d. None of the above

Multicollinearity

What is the main difference between PyCaret and Scikit Learn for machine learning tasks? a. PyCaret is written in Python, while scikit-learn is written in R b. PyCaret is more user-friendly, while scikit-learn requires more knowledge of machine learning concepts c. PyCaret is more suited for deep learning tasks, while scikit-learn is more suited for traditional machine learning tasks d. PyCaret is a high-code library, while scikit-learn is a low-code library

PyCaret is more user-friendly, while scikit-learn requires more knowledge of machine learning concepts

Which of the following optimization solvers is the fastest one when working with large datasets? a. Gradient Descent b. Minibatch Gradient Descent c. Stochastic Gradient Descent d. They all have the same speed

Stochastic Gradient Descent

What is the characteristic of the greedy method used in decision trees? a. The method looks ahead and picks a split that will lead to a better tree in some future step. b. The method selects the best split based on a global criterion that minimizes the impurity of the tree. c. The method selects the best split at each step without considering the future consequences, also known as the greedy approach. d. The method evaluates all possible splits and selects the one that maximizes the information gain.

The method selects the best split at each step without considering the future consequences, also known as the greedy approach.

A non-parametric machine learning model is a model in which no restriction is imposed on the functional form; in other words, f(x) is NOT assumed. True or False

True

Adding a new relevant variable can only increase the goodness-of-fit (R^2) in the regression model. True or False

True

Anaconda is a distribution of the Python and R programming languages that simplifies package management with conda environments. True or False

True

Correlation is not Causation! True or False

True

Google Colab is a notebook-style environment which provides access to machine learning libraries and computing resources like GPU. True or False

True

In LASSO some weights are reduced to zero, but others may be quite large. In Ridge, weights are small in magnitude, but they are not reduced to zero. In Elastic Net we get the best of both worlds by making some weights zero while reducing the magnitude of the others. True or False

True

Like logistic regression, KNN classification model can output probabilities for each class in the target space. True or False

True
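
A minimal sketch of how such probabilities can be formed, assuming the common convention that each class probability is its share of votes among the K nearest neighbors. The function name and labels are illustrative.

```python
from collections import Counter

def knn_class_probabilities(neighbor_labels):
    """Class probabilities as vote shares among the K nearest neighbors."""
    k = len(neighbor_labels)
    return {label: count / k for label, count in Counter(neighbor_labels).items()}

# Labels of the 5 nearest neighbors of some query point (hypothetical)
probabilities = knn_class_probabilities(["yes", "yes", "no", "yes", "no"])
```

Note that with K neighbors the probabilities can only take values that are multiples of 1/K.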

One disadvantage of CART is that it may overfit the data, especially if there is no stopping criterion in the algorithm. This will generally lead to a very bushy tree. True or False

True

Polynomial regression models are more prone to overfit the data compared to other linear regression models not using polynomial features. True or False

True

Regularized logistic regression refers to a model that adds a penalty term (either L1, L2 or a combination of the two) to the logistic regression cost function. True or False

True

The goal of unsupervised learning is to discover the underlying patterns and find groups of samples that behave similarly! True or False

True

The zero conditional mean assumption states that given any values of independent variables X, the errors are on average zero. True or False

True

To create a regularized model, we modify the loss function by adding a penalizing term whose value is higher when the model is more complex. True or False

True

In order to tune the hyper parameter (d) of the polynomial regression model, we plot the model performance vs model complexity in the -------------- data. a. train b. test c. cross validation d. All of the above

cross validation
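
As a sketch of how the cross validation data is produced, the generator below splits row indices into k contiguous folds; each fold serves once as the validation set while the rest is used for training. The function name is hypothetical, and real workflows usually shuffle the rows first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, validation
        start += size

folds = list(k_fold_indices(10, 3))
```

To tune the degree d, one would fit the model for each candidate d on every training part, average the validation errors across folds, and pick the d with the lowest average.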

As the order of polynomial model increases (more complex model), the model bias will ----------- and model variance will ------------ a. increase - increase b. decrease - increase c. increase - decrease d. decrease - decrease

decrease - increase

In a supervised machine learning algorithm, as we increase model complexity, the bias ---------- and the model variance ------------. a. increase - increase b. decrease - decrease c. increase - decrease d. decrease - increase

decrease - increase

From the Elements of Statistical Learning (aka the Bible of Machine Learning and btw our main textbook): Trees have one aspect that prevents them from being the ideal tool for predictive learning, namely ------------- a. simplicity b. interpretability c. accuracy d. inaccuracy

inaccuracy

As we increase the hyperparameter K in a KNN model from 1 to 10, the model bias ----------------- and the model variance ---------------- a. increase - decrease b. increase - increase c. decrease - decrease d. decrease - increase

increase - decrease

Which of the following statements are correct? A) While recall expresses the ability to find all relevant instances in a dataset, precision expresses the proportion of the data points our model says were relevant that were actually relevant. B) F1 uses the simple average instead of a harmonic mean because simple average punishes extreme values. a. only A b. only B c. Both A and B d. Neither A nor B

only A
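
Statement B has it backwards: F1 is the harmonic mean of precision and recall, and it is the harmonic mean that punishes extreme values. A short sketch with illustrative numbers:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def simple_average(precision, recall):
    """Arithmetic mean, shown only for comparison."""
    return (precision + recall) / 2

# A model with perfect precision but very poor recall
f1 = f1_score(1.0, 0.1)
avg = simple_average(1.0, 0.1)
```

Here the simple average is 0.55, which looks acceptable, while F1 is roughly 0.18 and reflects the poor recall.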

In machine learning, ------------ happens when the fitted algorithm, does not generalize well to the unseen data. a. underfitting b. overfitting c. over specification d. under specification

overfitting

The linear regression model is an example of --------------- model. a. parametric b. non-parametric c. can be either parametric or non-parametric d. none of the above

parametric

In machine learning the focus is more on ------------ and less on --------------. a. prediction - inference b. inference - prediction c. prediction - forecasting d. forecasting - prediction

prediction - inference

Which of the following statements is NOT an advantage of CART (Classification and Regression Trees)? a. Interpretability b. handling categorical variables without the need of one-hot encoding c. using a greedy algorithm which can get stuck in a local optimum d. handling non-linear data sets

using a greedy algorithm which can get stuck in a local optimum

-------------- set is used to tune (optimize) the hyperparameters of the model. a. train b. validation c. test d. All of the above

validation

In this type of penalized regression, we can shrink the coefficients estimate toward zero. a. Ridge b. LASSO c. Elastic Net d. All of the above

All of the above

An explanatory variable is said to be exogenous if it is correlated with the error term. True or False

False

OLS is an estimation technique that maximizes the residual sum of squares! True or False

False

There is only one hyperparameter in KNN model. True or False

False

Through Gauss-Markov assumptions we have a clear understanding of the distributions of beta hats. True or False

False

To train a machine learning model, we should always partition the data into 3 parts (train, validation, test) even if the data set is small. True or False

False

Which of the following best describes the main difference between AI, ML, and DL? a. AI is concerned with understanding and replicating human intelligence, while ML is concerned with making predictions using data, and DL is concerned with finding patterns in data b. ML is concerned with understanding and replicating human intelligence, while AI is concerned with making predictions using data, and DL is concerned with finding patterns in data c. DL is concerned with understanding and replicating human intelligence, while ML is concerned with making predictions using data, and AI is concerned with finding patterns in data d. AI is concerned with finding patterns in data, while ML is concerned with understanding and replicating human intelligence, and DL is concerned with making predictions using data

AI is concerned with understanding and replicating human intelligence, while ML is concerned with making predictions using data, and DL is concerned with finding patterns in data

Which of the following metrics is NOT an evaluation metric for regression models? a. R-squared b. MSE (Mean Squared Error) c. MAE (Mean Absolute Error) d. Accuracy

Accuracy

Which of the following performance metrics incorporates all the cells in a 2 by 2 confusion matrix (TP, TN, FP, FN)? a. Accuracy and Matthews Correlation Coefficient (MCC) b. only Matthews Correlation Coefficient (MCC) c. precision, recall d. f1 score

Accuracy and Matthews Correlation Coefficient (MCC)
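
Both metrics can be written directly from the four cells. The sketch below uses the standard formulas; returning 0.0 when the MCC denominator is zero is a common convention and is treated here as an assumption.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Share of correct predictions; uses all four cells."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient; also uses all four cells."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumption: conventional value when a row or column is empty
    return (tp * tn - fp * fn) / denominator

# A model that always predicts "negative" on imbalanced data (made-up counts)
lazy_accuracy = accuracy(tp=0, tn=98, fp=0, fn=2)
lazy_mcc = mcc(tp=0, tn=98, fp=0, fn=2)

# A perfect classifier
perfect_mcc = mcc(tp=5, tn=5, fp=0, fn=0)
```

The always-negative model scores 98% accuracy but an MCC of 0, which is why accuracy alone can mislead on imbalanced targets.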

What are the common ways of reducing the overfitting in a machine learning exercise? a. Collecting more data b. Complexity reduction (ex regularization) c. Using cross validation d. All of the above

All of the above

What are the steps in Econometrics analysis? a. Specify the model b. Collect data c. Quantify the model d. All of the above

All of the above

What is the primary benefit of using Google Colab over a personal workstation for machine learning tasks? a. Cost-effective b. Access to powerful GPUs c. Automatic software updates d. All of the above

All of the above

Which of the following is a disadvantage of using a personal workstation for machine learning tasks compared to cloud platforms and Google Colab? a. Limited access to powerful hardware b. Limited scalability c. Higher cost d. All of the above

All of the above

Which of the following statements is correct with regards to LASSO regressions? A) LASSO regression uses L1 norm. B) LASSO eliminates the least important features from the model, it automatically performs a type of feature selection. C) Cross validation is used to select a good value for the shrinkage parameter. D) It is best to apply LASSO regression after variable standardization a. All but A b. All but B c. only C and D d. All of the above

All of the above

Which of the following statements is correct with regards to MSE decomposition? A) Model variance is the variance if we had estimated the model with a different training set B) Model bias is the error due to using an approximate model (model is too simple). On average, the model is not hitting at the target. C) Irreducible error is due to missing variables and limited samples. Can't be fixed with modeling a. Only A b. Only A and B c. Only B and C d. All of the above

All of the above

Which one is correct about machine learning? a. Subset of AI that enables computers to learn from data. the model is trained with a set of algorithms b. A machine learning system is trained (with algorithms) rather than explicitly programmed c. ML involves automated detection of meaningful patterns in data and apply the pattern to make predictions on unseen data d. All of the above

All of the above

Which of the following statements are correct? A) The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. B) The true-positive rate is also known as sensitivity, recall or probability of detection in machine learning. C) The false-positive rate is also known as probability of false alarm. a. Only A b. Only B c. Only C d. All of them

All of them
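
The construction in statement A can be sketched in a few lines: for each threshold, classify as positive when the score exceeds it, then record (FPR, TPR). The scores, labels, and thresholds below are made up for illustration.

```python
def roc_points(y_true, y_score, thresholds):
    """Return (FPR, TPR) pairs, predicting positive when score > threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = []
    for threshold in thresholds:
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s > threshold)
        fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s > threshold)
        points.append((fp / negatives, tp / positives))
    return points

y_true = [1, 1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
points = roc_points(y_true, y_score, [1.0, 0.75, 0.35, 0.0])
```

Lowering the threshold moves the point from (0, 0) toward (1, 1); plotting the pairs and connecting them gives the ROC curve.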

We usually use MSE (Mean Squared Error) as the loss function for the linear regression model. Why is that the case? A. It is a well-behaved cost function (continuous and differentiable) B. There exists a closed-form solution for the MSE cost function a. Only A b. Only B c. Neither A nor B d. Both A and B

Both A and B
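
For the one-feature case, the closed-form solution is short enough to write out. This sketch uses the textbook simple-regression formulas (slope = covariance / variance); the function name and data are illustrative.

```python
def ols_fit(x, y):
    """Closed-form least squares fit of y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    intercept = y_bar - slope * x_bar
    return intercept, slope

x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]  # generated from y = 1 + 2x, no noise
intercept, slope = ols_fit(x, y)
```

No iterative optimizer is needed here; the minimizer of the MSE is computed directly from sums over the data.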

Which of the following statements is true regarding classification trees? a. Classification trees are used to predict quantitative responses. b. Classification trees are used to predict qualitative responses, and the prediction at each terminal node is the category with the majority of data points. c. The prediction of the algorithm at each terminal node in a classification tree is the average value of the target variable d. Regression trees are very similar to classification trees with respect to decision criteria

Classification trees are used to predict qualitative responses, and the prediction at each terminal node is the category with the majority of data points.

Which of the following is NOT a type of machine learning? a. Supervised learning b. Unsupervised learning c. Compiler learning d. Reinforcement learning

Compiler learning

Which of the following is typically NOT a goal of statistical learning? a. Understanding the relationships between variables in a dataset b. Predicting the value of a target variable based on the values of other variables c. Finding patterns in data d. Constructing super complex models for maximum predictability

Constructing super complex models for maximum predictability

Which of the following statements accurately describes the cost complexity pruning method in decision trees? a. Cost complexity pruning involves randomly removing branches from a decision tree until a desired level of accuracy is achieved. b. Cost complexity pruning involves adding more branches to a decision tree to improve its predictive accuracy. c. Cost complexity pruning involves scaling the weights of features in a decision tree to adjust their importance in the final prediction. d. Cost complexity pruning involves adjusting the complexity of a decision tree by adding or removing branches to minimize a cost function that balances between model complexity and goodness of fit.

Cost complexity pruning involves adjusting the complexity of a decision tree by adding or removing branches to minimize a cost function that balances between model complexity and goodness of fit.

Which of the following is NOT an advantage of the KNN model? a. Intuitive and simple b. No assumptions on distribution of the data c. Curse of dimensionality d. Easy to implement for multi-class problem

Curse of dimensionality

Which of the following is NOT a common application of machine learning? a. Database management b. Fraud detection c. Image classification d. Customer segmentation

Database management

Which of the following statements accurately describes the approach used by decision trees in machine learning? a. Decision trees use a bottom-up approach to group and label observations that are similar b. Decision trees apply a top-down approach to data, progressively dividing data sets into smaller data groups based on a descriptive feature until they reach sets that are small enough to be described by some label c. Decision trees group data sets into larger data groups based on a descriptive feature until they reach sets that are large enough to be described by some label. d. Decision trees do not group or label observations in any specific way.

Decision trees apply a top-down approach to data, progressively dividing data sets into smaller data groups based on a descriptive feature until they reach sets that are small enough to be described by some label

Which of the following statements is true regarding decision trees? a. Decision trees are parametric models. b. Decision trees are non-parametric models. c. Decision trees can be both parametric and non-parametric models. d. None of the above

Decision trees are non-parametric models.

Which of the following decision trees criteria is more sensitive to purity/impurity of the target classes? a. MSE b. Error rate c. Gini d. Entropy

Entropy

Which of the following terms is not an alternative name for "Dependent Variable"? a. Response Variable b. Regressand c. Explanatory Variable d. Predicted Variable

Explanatory Variable

Which of the following performance metrics is the best way to compare different machine learning classification models when the target variable is highly imbalanced (for example 98% class 1 and only 2% class 2)? a. R-squared b. Accuracy c. F1-score d. Error rate

F1-score

AUC stands for "Area Under the (ROC) Curve". A model with a lower AUC is preferred to a model with a higher AUC. True or False

False

