Data Analytics Final

In a box plot, the box includes 50% of the data, the horizontal line represents (i)____________, and the top and bottom of the box represent (ii)________, respectively.

(i) the median (50th percentile), (ii) 75th and 25th percentiles
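As a rough illustration of the answer above, the three values a box plot draws can be computed directly. This is a minimal sketch using Python's standard `statistics` module; the function name `box_plot_summary` and the sample data are made up for illustration, and the "inclusive" quantile method is one of several conventions (JMP may use a different one).

```python
import statistics

def box_plot_summary(values):
    """Return the quartiles that define the box of a box plot."""
    # quantiles(..., n=4) returns the three cut points: Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}

# Hypothetical sample data.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(summary)
```

The bottom and top of the box correspond to `q1` and `q3`, and the line inside the box corresponds to `median`.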

How would the correlations change if we normalized the data first?

Correlations will not change, because computing a correlation already normalizes the data.
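The invariance can be checked numerically. The sketch below assumes "normalized" means z-score standardization (subtract the mean, divide by the standard deviation); the helper names `pearson` and `zscore` and the sample values are illustrative only.

```python
import math
import statistics

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def zscore(values):
    """Standardize values to mean 0 and sample standard deviation 1."""
    m = statistics.fmean(values)
    s = statistics.stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical sample data.
x = [2.0, 4.0, 6.0, 8.0, 11.0]
y = [1.0, 3.0, 2.0, 7.0, 9.0]

r_raw = pearson(x, y)
r_norm = pearson(zscore(x), zscore(y))
print(r_raw, r_norm)  # the two values agree up to floating-point rounding
```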

Which of the following are characteristics of Naive Bayes Classifier? (2 correct answers)

Data-driven
Makes no assumptions about the distribution of the data

What are the characteristics of k-NN algorithm? (2 correct answers)

Makes no assumptions about the data
Data-driven, not model-driven

Two models are applied to a dataset that has been partitioned. Model A is considerably more accurate than model B on the training data, but slightly less accurate than model B on the validation data. Which model are you more likely to consider for final deployment?

Model B

When a model is fit to training data, zero error with those data is not necessarily good. This special case is called ______.

Overfitting

Which of the following are true about Principal Component Analysis (PCA)? (2 correct answers)

PCA is intended for use with quantitative variables.
The idea of PCA is to find a linear combination of the two variables that contains most, even if not all, of the information, so that this new variable can replace the two original variables.

Which of the following are the methods that we use for dimension reduction? (4 correct answers)

Removing one of the variables in pairs that have a very strong correlation
Logistic Regression
Multiple Linear Regression
Principal Component Analysis

Which of the following are advantages of Naive Bayes Method? (3 correct answers)

Simple and computationally efficient
Handles purely categorical data well
Works well with very large data sets

Identify whether the task required is supervised or unsupervised learning: Printing of custom discount coupons at the conclusion of a grocery store checkout based on what you just bought and what others have bought previously.

This is unsupervised learning

True or False: Bar charts are useful for comparing a single statistic (e.g. average, count, percentage) across groups. The height of the bar represents the value of the statistic, and different bars correspond to different groups.

True

True or False: The Naive Bayes method relies on the assumption of independence between predictor variables within each class.

True
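The independence assumption means the class score is the prior multiplied by one conditional probability per predictor. The following is a minimal sketch for categorical predictors, without smoothing; the function name `naive_bayes_posterior`, the predictor values, and the class labels are all hypothetical.

```python
from collections import Counter

def naive_bayes_posterior(records, labels, query):
    """Posterior class probabilities under the naive independence assumption."""
    class_counts = Counter(labels)
    n = len(labels)
    scores = {}
    for cls, count_cls in class_counts.items():
        score = count_cls / n  # prior P(class)
        for j, value in enumerate(query):
            # P(x_j = value | class), estimated one predictor at a time.
            matches = sum(
                1 for rec, lab in zip(records, labels)
                if lab == cls and rec[j] == value
            )
            score *= matches / count_cls
        scores[cls] = score
    total = sum(scores.values())
    # Normalize so the scores sum to 1 (left unnormalized if all are zero).
    return {c: s / total for c, s in scores.items()} if total else scores

# Hypothetical training data: (company size, prior legal trouble) -> class.
records = [
    ("small", "yes"), ("small", "no"), ("large", "yes"),
    ("large", "yes"), ("small", "yes"), ("large", "no"),
]
labels = ["fraud", "honest", "fraud", "fraud", "honest", "honest"]

posterior = naive_bayes_posterior(records, labels, ("small", "yes"))
print(posterior)
```

Each predictor contributes its own factor, estimated separately within each class, which is exactly where the independence assumption enters.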

True or False: Pairs of variables that have a very strong (positive or negative) correlation contain duplicative information. Therefore, we want to omit the variables that are strongly correlated to others to avoid multicollinearity (when fitting models).

True

True or False: The classification matrix, also called confusion matrix, gives estimates of the true classification and misclassification rates.

True
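A classification matrix is a count of (actual, predicted) pairs, from which the correct-classification and misclassification rates follow. This is a minimal sketch with made-up labels; the function name `confusion_matrix` is illustrative and not tied to any particular library.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) class pair occurs."""
    return Counter(zip(actual, predicted))

# Hypothetical class labels: 1 = positive, 0 = negative.
actual    = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 0, 1, 0, 1]

cm = confusion_matrix(actual, predicted)
correct = sum(count for (a, p), count in cm.items() if a == p)
accuracy = correct / len(actual)   # estimated correct-classification rate
error_rate = 1 - accuracy          # estimated misclassification rate
print(dict(cm), accuracy, error_rate)
```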

True or False: k-NN is a "lazy learner": the time-consuming computation is deferred to the time of prediction. For every record to be predicted, we compute its distances from the entire set of training records only at the time of prediction. This behavior prohibits using this algorithm for real-time prediction of a large number of records simultaneously.

True
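The "lazy" behavior shows up clearly in code: fitting only stores the training records, and all distance computation happens inside prediction. This is a minimal sketch assuming Euclidean distance and majority voting; the class name `LazyKNN` and the sample points are hypothetical.

```python
import math
from collections import Counter

class LazyKNN:
    """Toy k-NN classifier: no work at fit time, all work at predict time."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, points, labels):
        # "Training" just memorizes the data.
        self.points = list(points)
        self.labels = list(labels)
        return self

    def predict_one(self, query):
        # Distances to every training record are computed here, per query.
        distances = sorted(
            ((math.dist(query, p), lab) for p, lab in zip(self.points, self.labels)),
            key=lambda pair: pair[0],
        )
        votes = Counter(lab for _, lab in distances[: self.k])
        return votes.most_common(1)[0][0]

# Hypothetical training data: two well-separated clusters.
train_points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
train_labels = ["a", "a", "a", "b", "b", "b"]

model = LazyKNN(k=3).fit(train_points, train_labels)
near_a = model.predict_one((2, 2))
near_b = model.predict_one((8.5, 8.5))
print(near_a, near_b)
```

Because `predict_one` scans every stored record, the cost of each prediction grows with the size of the training set.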

To obtain an honest estimate of future classification error, we use the classification matrix that is computed from ________.

Validation Data

Scatter plots play an important role in prediction, and developing a model can be the next step. Scatter plots provide information about relationships (linear or non-linear) between variables. The variables in a scatter plot ________.

must be numerical

The density ellipsoid in scatterplot matrix is a good graphical indicator of the correlation between two variables. The ellipsoid collapses diagonally as the correlation between the two variables approaches either 1 or -1. The ellipsoid is more circular if the two variables are more correlated. (TRUE or FALSE?)

False

True or False: Sensitivity and Specificity are plotted on an ROC Curve.

False
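One likely reason the statement is marked False is that an ROC curve conventionally plots sensitivity against 1 - specificity (the false positive rate), not against specificity itself. The sketch below computes such points for a few cutoffs; the function name `roc_points`, the scores, and the thresholds are made up for illustration.

```python
def roc_points(actual, scores, thresholds):
    """Return (1 - specificity, sensitivity) for each cutoff."""
    points = []
    for cutoff in thresholds:
        predicted = [1 if s >= cutoff else 0 for s in scores]
        pairs = list(zip(actual, predicted))
        tp = sum(1 for a, p in pairs if a == 1 and p == 1)
        fn = sum(1 for a, p in pairs if a == 1 and p == 0)
        tn = sum(1 for a, p in pairs if a == 0 and p == 0)
        fp = sum(1 for a, p in pairs if a == 0 and p == 1)
        sensitivity = tp / (tp + fn)   # true positive rate
        specificity = tn / (tn + fp)   # true negative rate
        points.append((1 - specificity, sensitivity))
    return points

# Hypothetical actual classes and predicted probabilities.
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]

points = roc_points(actual, scores, thresholds=[1.1, 0.75, 0.5, 0.0])
print(points)
```

The first and last cutoffs classify nothing and everything as positive, so the curve runs from (0, 0) to (1, 1).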

True or False: The test data are used to build models, or to further tweak the model or improve its fit.

False

True or False: To implement the k-NN algorithm successfully in JMP Pro, one has to normalize the continuous predictors first.

False

True or False: k-NN algorithm can only be used for classification (of categorical outcome)

False

True or False: The Naive Bayes platform fits a model to predict the value of a numerical variable as well as the value of a categorical variable.

False

Which of the following are the most popular visualization tools in JMP Pro? (3 correct answers)

Fit Y by X
Distribution
Graph Builder

Identify whether the task required is supervised or unsupervised learning: Deciding whether to issue a loan to an applicant based on demographic and financial data (with reference to a database of similar data on prior customers).

This is supervised learning

In JMP, a diamond is displayed in the box, where the center of the diamond is ________.

The mean

