AIML


You would like to write a model that examines individual customer account activity and, for each account, decides whether it has been hacked or compromised. What type of machine learning problem is this?

Classification

You know that a few of your features are strongly correlated, so it is recommended that you begin your training with which one of the following machine learning algorithms?

Elastic Net

Exploratory data analysis is performed only to inform data cleansing. True or False?

F

K-Means is not sensitive to scale, reducing the need to pre-process a model's input features. T/F?

F

Model performance on the training set matters more than performance on the validation set or test set. True or False?

F

In machine learning, data needs to be fit and transformed during the training process. Transformation must happen before fitting. True or False?

False

Machine learning algorithms cannot work with missing data points. Therefore, part of the data preparation includes caring for missing data. Which one of the following methods is considered imputation?

Fill missing values with some value like zero or one, a median value, or some other estimated value.
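As a minimal sketch of the imputation idea above, the snippet below fills missing entries with the median of the observed values. The helper name `impute_median` and the sample `ages` list are assumptions made for illustration, and `None` stands in for a missing data point.

```python
from statistics import median

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

# Hypothetical feature column with two missing entries.
ages = [25, None, 40, 31, None]
imputed = impute_median(ages)
print(imputed)
```

In practice the fill value would be computed on the training data only and then reused for validation and test data.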

Tuning a model means running a machine learning algorithm to find the model ________ that will make it best fit the training data.

Hyperparameters

How many inputs will a neural network have if it is seeking to classify an email as spam or not?

It depends on the number of features in the dataset

Artificial neural networks have been around for a long time but more recently have profoundly impacted our way of life because of _______.

Moore's Law, newer activation functions, and available toolkits (Keras and TensorFlow)

If your Y / target data set looks as follows, what kind of a problem are you trying to solve? [3, 2, 8, 2, 3, 0, 3, 9, 9, 7]

Multiclass classification

DBSCAN approaches clustering differently than K-Means because it seeks to define clusters _____.

On density of a continuous region

Machine learning is often described by this definition: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E. Based on this definition, which variable is represented by MAE (mean absolute error) for some machine learning model?

P

All of the following are considered clustering algorithms EXCEPT for which one? Principal component analysis (PCA), Gaussian Mixture, K-Means, DBSCAN

PCA

This popular, open source Python library provides many machine learning algorithms.

Scikit-Learn (sklearn)

Which of the following activation functions are you most likely not going to use in your neural network design?

STEP

In order to prepare this data for a machine learning algorithm, you will most likely choose to use OrdinalEncoder on which one of the following features?

Size
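The reason an ordinal encoding suits a feature like size is that its categories have a natural order. The sketch below shows the idea in plain Python; the category names in `SIZE_ORDER` are assumed for illustration, since the original data table is not shown.

```python
# Assumed category order, smallest to largest (hypothetical values).
SIZE_ORDER = ["small", "medium", "large"]
size_to_rank = {size: rank for rank, size in enumerate(SIZE_ORDER)}

sizes = ["medium", "small", "large", "small"]
encoded = [size_to_rank[s] for s in sizes]
print(encoded)
```

The integers preserve the ordering small < medium < large, which would be misleading for a feature with no inherent order.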

Which one of the following methods helps you to avoid sampling bias in your training data set?

Stratification
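A rough sketch of stratification, assuming a simple two-class label list: each class is split separately so the test set keeps the same class proportions as the full data set. The function `stratified_split` and the labels are hypothetical and written only for this example.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction, seed=0):
    """Return (train_idx, test_idx) with per-class proportions preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx

# Imbalanced toy labels: 10% "hacked", 90% "ok".
labels = ["hacked"] * 10 + ["ok"] * 90
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
test_labels = [labels[i] for i in test_idx]
print(test_labels.count("hacked"), test_labels.count("ok"))
```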

In unsupervised learning algorithms like K-Means, there is no need for a target vector (i.e. the y labels). True or False?

T

Spam detection is typically a supervised learning problem. True or False?

T

The more features in our model, the higher the likelihood there will be overfitting. True or False?

T

The Gaussian Mixture model results in "soft clustering". Soft clustering means the model will, for each new data instance, predict _____.

The probability that a data instance belongs to a cluster
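To illustrate soft clustering, the sketch below fits Scikit-Learn's `GaussianMixture` on synthetic data and asks for per-cluster membership probabilities with `predict_proba`. The two-blob data set and the random seeds are assumptions chosen only for the example.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic data: two well-separated 2-D blobs (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(50, 2)),
    rng.normal(8.0, 1.0, size=(50, 2)),
])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

# Each row holds the probability of belonging to each of the 2 clusters.
probs = gmm.predict_proba(X[:3])
print(probs)
```

Each row of `probs` sums to 1, in contrast to a hard assignment that returns a single cluster label.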

Cross validation is a method that helps to alleviate which one of the following problems?

The validation set is too small to really measure the accuracy of a model
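A minimal sketch of how k-fold cross validation reuses the data: every sample serves as validation data exactly once, so the evaluation does not depend on one small validation set. The generator `k_fold_indices` is a hypothetical helper, not a library function.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

folds = list(k_fold_indices(n_samples=10, k=5))
for train, validation in folds:
    print(len(train), validation)
```

A model would be trained and scored once per fold, and the scores averaged.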

In machine learning, data needs to be fit and transformed during the training process. An example of transforming data is when the data is scaled between 0 and 1. True or False?

True
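The fit-then-transform order can be sketched with a tiny min-max scaler: `fit` learns the minimum and maximum from the training data, and `transform` applies them. The class `MinMaxScaler01` is a hypothetical stand-in written for this example, and it assumes the training values are not all identical.

```python
class MinMaxScaler01:
    """Scale values to [0, 1] using the min and max learned in fit()."""

    def fit(self, values):
        self.min_ = min(values)
        self.max_ = max(values)
        return self

    def transform(self, values):
        span = self.max_ - self.min_  # assumes max_ > min_
        return [(v - self.min_) / span for v in values]

train = [10.0, 20.0, 30.0]
test = [15.0, 35.0]

scaler = MinMaxScaler01().fit(train)      # fit on training data only
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)      # reuse the training min and max
print(train_scaled, test_scaled)
```

Because the scaler is fit on the training data only, a test value outside the training range can fall outside 0 to 1.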

A neural network is designed to analyze lots of data and find an optimal solution. This requires adjusting which of the following parameters in its model as it processes batches of data?

Weights and biases (incorrect options: inputs, layers)

A tensor is ____.

a multidimensional array

This measurement is helpful to evaluate the performance of a classification model when the data set is well-balanced (i.e. a good representation of instances for each of the target classes).

accuracy

K-Means can manage the following situations EXCEPT for which one?

clusters of variable density

A sparse matrix can result from the transformation process when a data set ____.

has columns that are one-hot encoded (OHE)
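The following plain-Python sketch shows why one-hot encoding tends to produce sparse output: each row has a single 1 and zeros everywhere else. The helper `one_hot` and the `colors` values are assumptions for illustration.

```python
def one_hot(values):
    """Return (categories, rows) where each row has a single 1."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot(colors)

zeros = sum(row.count(0) for row in encoded)
total = sum(len(row) for row in encoded)
print(categories, encoded, f"{zeros}/{total} entries are zero")
```

With many categories the share of zeros grows, which is why libraries often store such output in a sparse matrix format.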

Which one of the following K-Means silhouette diagrams demonstrates the optimal number of clusters for a given model?

high silhouette score with similar size clusters

When evaluating K-Means algorithms for a given number of clusters, the best solution will be the model with the lowest _______

inertia
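Inertia is the sum of squared distances from each point to its nearest centroid. The sketch below computes it by hand for a toy 2-D data set; the points and centroids are assumed values, and the function `inertia` is a hypothetical helper rather than a library call.

```python
def inertia(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    total = 0.0
    for px, py in points:
        total += min((px - cx) ** 2 + (py - cy) ** 2 for cx, cy in centroids)
    return total

points = [(0, 0), (0, 2), (10, 0), (10, 2)]

two_clusters = inertia(points, centroids=[(0, 1), (10, 1)])
one_cluster = inertia(points, centroids=[(5, 1)])
print(two_clusters, one_cluster)
```

Inertia keeps falling as clusters are added, so it is most meaningful when comparing models that use the same number of clusters.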

In order to prepare this data for a machine learning algorithm, you will most likely use OneHotEncoder on which one of the following features? qualification, ID, name, age

name

Gradient descent is a(n) ______ algorithm that iteratively finds the best possible solution for a function.

optimization
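A minimal sketch of gradient descent as an optimization loop, assuming the simple function f(x) = (x - 3)^2 whose minimum is at x = 3. The learning rate and step count are arbitrary choices for the example.

```python
def gradient(x):
    """Derivative of f(x) = (x - 3)**2."""
    return 2 * (x - 3)

x = 0.0               # starting guess
learning_rate = 0.1   # assumed step size
for _ in range(100):
    x -= learning_rate * gradient(x)  # step against the gradient

print(x)
```

Each iteration moves `x` a little closer to the minimum, which is the same idea a neural network uses to adjust its weights and biases.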

This Scikit-Learn tool provides a way for you to organize the execution of steps used to support your machine learning model (steps like data transformation, feature reduction and even the execution of an algorithm).

pipeline
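As an illustration, the sketch below chains a scaling step and a classifier in a Scikit-Learn `Pipeline`. The choice of `MinMaxScaler` and `KNeighborsClassifier`, the step names, and the toy data are all assumptions for the example.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Toy data: two features on very different scales (illustrative only).
X = [[0, 0], [1, 100], [10, 1000], [11, 1100]]
y = [0, 0, 1, 1]

pipe = Pipeline([
    ("scale", MinMaxScaler()),                      # transformation step
    ("model", KNeighborsClassifier(n_neighbors=1)), # final estimator
])
pipe.fit(X, y)  # fits the scaler, transforms X, then fits the model

predictions = pipe.predict([[0.5, 50], [10.5, 1050]]).tolist()
print(predictions)
```

Calling `predict` reuses the scaling learned during `fit`, so new data goes through the same steps as the training data.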

In order to prepare this data for a machine learning algorithm, you may choose to bin which one of the following features?

price, age

This measurement is helpful to evaluate the performance of a classification model even when the data set is not well representative of the different target classes.

recall
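The sketch below shows why accuracy can mislead on an imbalanced data set while recall does not. It assumes 100 accounts of which only 5 are positive, and a model that always predicts the negative class; the helper functions are written for this example.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model found."""
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    false_neg = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return true_pos / (true_pos + false_neg)

y_true = [1] * 5 + [0] * 95   # 5 positives, 95 negatives
y_pred = [0] * 100            # model always predicts the negative class

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(acc, rec)
```

The model scores 95% accuracy yet finds none of the positive cases, which recall exposes.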

Which is NOT true of regularization?

reduces underfitting

All of the following are terms associated with dimensionality reduction EXCEPT which one? Regression, feature extraction, clustering, regularization, principal component analysis

regression

Machine learning is often described by this definition: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E. Based on this definition, what is another way to describe E?

training data

Neural networks are sensitive to scale; it is important to standardize or normalize the model's input features. True or False?

true

The challenge of principal component analysis (PCA) is to determine the number of features while still ensuring an acceptable level of ____ in the model.

variance

Assume your data set is stored in a dataframe named df and only has 100 records (yes, way too small but this is a theoretical question). If the following Python statement is issued, how many records will your test data set contain? train, test = train_test_split(df, test_size=.30)

30
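The split can be checked with a short sketch. It uses a plain list of 100 integers in place of the dataframe `df`, and the `random_state` value is an assumption added only to make the shuffle repeatable.

```python
from sklearn.model_selection import train_test_split

# Stand-in for a 100-record data set.
records = list(range(100))

train, test = train_test_split(records, test_size=0.30, random_state=42)
print(len(train), len(test))
```

With `test_size=0.30`, 30% of the 100 records go to the test set and the remaining 70 to the training set.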

Consider the machine learning landscape and your efforts to build a machine learning model. Order these common machine learning tasks below, starting with the first task to be accomplished and ending with the last task.

1. Analyze data; 2. Stratify/split data; 3. Transform training data; 4. Select algorithm and create model; 5. Tune model; 6. Evaluate performance; 7. Make predictions

Assume the following parameter list is used with a GridSearchCV. How many hyperparameter combinations will be tested?

The number of items in the first list multiplied by the number of items in the second list
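The count can be reproduced by enumerating the grid directly. The parameter names and values in `param_grid` below are hypothetical, since the original parameter list is not shown; the sketch uses `itertools.product` rather than GridSearchCV itself.

```python
from itertools import product

# Hypothetical grid: 3 values for one hyperparameter, 4 for another.
param_grid = {
    "n_estimators": [3, 10, 30],
    "max_features": [2, 4, 6, 8],
}

combinations = [
    dict(zip(param_grid, values))
    for values in product(*param_grid.values())
]
print(len(combinations))
```

This gives 3 * 4 = 12 combinations. With cross validation, each combination is trained once per fold, so the number of model fits is larger than the number of combinations.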

Your regression model is designed to review a Spotify user's playlists and from that predict the user's age. The model's RMSE is 2.5. How would you describe the performance of this model to someone? Be specific.

The model's predicted age is typically off by about 2.5 years, plus or minus.
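A small sketch of how such an error measure is computed, assuming made-up ages and predictions. It shows both MAE and RMSE; because RMSE squares the errors before averaging, it weights large misses more heavily than MAE does.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical actual and predicted user ages.
actual = [20, 30, 40, 50]
predicted = [22, 27, 43, 48]

mae_value = mae(actual, predicted)
rmse_value = rmse(actual, predicted)
print(mae_value, rmse_value)
```

Both values are in the same unit as the target, here years of age.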

