Week 7 Machine Learning

Pataasin ang iyong marka sa homework at exams ngayon gamit ang Quizwiz!

What is the first step in constructing a decision tree? Start with all samples at a node. Partition the samples into subsets based on the input variables. Repeatedly partition data into successively purer subsets until stopping criteria are satisfied.

Start with all samples at a node.

In general, are classification and regression often supervised or unsupervised approaches? Supervised Unsupervised

Supervised

What does the following method call return? accuracy_score(data_true = data_test, data_pred = predictions) The fraction of correctly classified samples. The number of correctly classified samples.

The fraction of correctly classified samples.

What type of object does the function Kmeans output? kmeans dataframe integer series

kmeans

What is the function call to output the name of columns of a dataframe named x? x.columns(0) x.columns columns(x)

x.columns

You are given a dataframe labeled x where the column 'number' indicates the index of a record. Which function call would create a new dataframe y that takes more than 10 samples x if x has 100 records? y = x[(x['number']%5)==0] y = x[(x['number']%10)==0] y = x[(x['number']%15)==0]

y = x[(x['number']%5)==0]

Which Root Mean Square Error (RMSE) would represent a perfect prediction with no errors in regression? 0 NaN 1 -1

0

For a classification problem, if you want to predict the letter grade that a student would receive, what are 2 examples of reasonable input data to consider? Amount of time spent studying Percentage grade these students received in the previous semester Letter grade different students received in another class The students' ID numbers

Amount of time spent studying Percentage grade these students received in the previous semester

What is the difference between regression and classification for machine learning in Python? Regression transforms categorical values to numeric and then follows the same as classification. Regression is used to predict a numeric value while classification is used to predict a categorical value. Classification is used when the input data is categorical and regression is used when the input data is numeric.

Regression is used to predict a numeric value while classification is used to predict a categorical value.

What is the next step in building a classification model after the model is constructed and parameters are adjusted? Apply model to new data Train the data Minimize errors

Apply model to new data

How do you assign each sample in a dataset to a centroid using the k-means algorithm? Assign the sample to the cluster with the closest centroid. Assign the sample to the cluster with the furthest centroid. Assign the sample to a random cluster.

Assign the sample to the cluster with the closest centroid.

As an example, you have a dataset containing numerical values of subjects' heart rates during exercise and categorical values describing how much they smoke. You want to determine whether smoking and heart rate are related. What machine learning category would this fall under? Classification Regression Cluster analysis Association analysis

Association analysis

How do you determine the new centroid of a cluster? Calculate the mean of the cluster Calculate the max of the cluster Calculate the mode of the cluster Calculate the min of the cluster

Calculate the mean of the cluster

Is age group a numeric or a categorical variable? Numeric Categorical

Categorical

For example, you want to predict the number of kids someone will have: either 0, 1, 2, or 3+. Is this an example of regression or classification? Regression Classification

Classification

Why are decision boundaries of a decision tree parallel to the axes formed by the variables? Each split considers only a single variable Each subset should be as homogenous as possible The induction algorithm eventually stops expanding

Each split considers only a single variable

Cluster analysis is a supervised task. True False

False

In machine learning, algorithms and programs directly aim to learn a given task. True False

False

Regression is an unsupervised task. True False

False

Test data is the same dataset as training data in classification models. True False

False

True or False: The function call train_test_split(a, b) where a and b are dataframes will always output the same result. True False

False

What is the command to get the number of rows in a data set titled "data"? data.shape[0] data.shape[1] data.size() data.length()

data.shape[0]

Final clusters are sensitive to initial centroids. True False

True

It works out better mathematically to measure the impurity of a split in a decision tree, rather than the purity. True False

True

The target variable is always categorical in classification. True False

True

When you search an incorrectly spelled term online, suggested words is an example of machine learning. True False

True

What is the definition of 'data mining'? Activities related to finding patterns in databases and data warehouses. Process of inspecting, cleansing, transforming, and engineering a particular dataset. Query processing and statistical analysis to summarize a dataset.

Activities related to finding patterns in databases and data warehouses.

What does the "within-cluster sum of squared error" provide? A mathematical measure of the variation within a cluster. An error measurement for a specific sample in relation to the centroid of a particular cluster. An answer to which cluster is the most 'correct.'

A mathematical measure of the variation within a cluster.

Training Data--> Learning Algorithm--> ???? ----> Model (Training Phase) What goes in the ??? Build model Test data Apply model Results

Build model

What is true between supervised and unsupervised approaches? In supervised approaches, the target is unavailable. In unsupervised approaches, the target is unavailable. In supervised approaches, the target is provided. In unsupervised approaches, the target is provided. In supervised approaches, the target is unavailable. In unsupervised approaches, the target is provided. In supervised approaches, the target is provided. In unsupervised approaches, the target is unavailable.

In supervised approaches, the target is provided. In unsupervised approaches, the target is unavailable.

In a decision tree, which nodes do NOT have test conditions? Root nodes Internal nodes Leaf nodes

Leaf nodes

What 2 statements describe classification in the context of machine learning? Predict the category of the target given input data Supervised task Unsupervised task Numerical target variable

Predict the category of the target given input data Supervised task

How would you initially handle an anomaly (apparent outlier) in cluster analysis? Throw it out of the dataset Disregard in further analysis Provide further analysis on the anomaly

Provide further analysis on the anomaly

What is the correct word to describe an instance of an entity in your data? Sample Feature Attribute Field

Sample

Which is NOT mentioned in the course as a common similarity measure in cluster analysis? Euclidean distance Manhattan distance Cosine similarity Sine similarity

Sine similarity

How do you determine the size of a decision tree? The number of edges from the root node to that node. The number of edges in the longest path from the root node to the leaf node The number of nodes in the tree correct

The number of nodes in the tree correct

In building a machine learning model, why do we want to adjust the parameters? To reduce the model's error To compare different model variations To provide the best graph of the model outputs

To reduce the model's error

A Root Mean Square Error (RMSE) higher than our mean value would be too high. True False

True

In the parallel_plot function, what was represented on the y-axis of the resulting plot? Each of the features Values of each cluster center Location of each cluster center Min and max values in each cluster

Values of each cluster center

When is a prediction task referred to as simple linear regression? When there is only one input variable. When there is more than one input variable. When there are two input variables.

When there is only one input variable.

When would you use the machine learning technique 'regression'? When your model has to predict a categorical value. When your model has to predict a numerical value. When you want to organize similar items in your dataset into groups. When you want to capture associations between items

When your model has to predict a numerical value.

Which algorithm to build classification models relies on the notion that samples with similar characteristics likely belong to the same class? kNN Decision Tree Naive Bayes

kNN

Which parameter in the KMeans clustering algorithm do you have to specify for the number of clusters you want? n_clusters clusters tot cluster_centers

n_clusters

To use scikit-learn: DecisionTreeRegressor, train_test_split, and mean_squared_error, which of the following libraries are necessary? pandas sklearn.metrics sklearn.model_selection sklearn.tree scikitlearn

pandas sklearn.metrics sklearn.model_selection sklearn.tree

Which of the following is true about a model? built using test data evaluated on training data trained by the training data set

trained by the training data set

What is the appropriate input for the following line of code to make a linear regression prediction? y_prediction = regressor.predict(___) x_test x_train y_train y_test

y_test


Kaugnay na mga set ng pag-aaral

Supply Chain Exam 2 (Chapter 11)

View Set

Marquis Leadership 8e Ch 17: Staffing Needs and Scheduling Policies

View Set