Analytics Final


Which of the following is a disadvantage of neural networks?

- Their "black box" reputation.
- If the network sees only cases in a certain range, its ability to extrapolate (predict outside this range) is a serious danger.
- Neural networks do not have a built-in variable selection method.
- All of the above are weaknesses of neural networks.

Which of the following statements about a ROC curve is true?

- It stands for "Receiver Operating Characteristic Curve".
- It plots the false positive rate (1 - specificity) and true positive rate (sensitivity).
- Each point on a ROC curve corresponds to a particular confusion matrix that depends on a specific threshold or cutoff.
- All the above statements are true.
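The link between cutoffs and points on the curve can be sketched in a few lines of Python. The labels, scores, and thresholds below are hypothetical values chosen for illustration, and `roc_points` is a name introduced here, not part of any library.

```python
def roc_points(labels, scores, thresholds):
    """Return one (false positive rate, true positive rate) pair per threshold.

    labels: 1 for an actual positive, 0 for an actual negative.
    scores: model scores, where higher means "more likely positive".
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        # Predict positive whenever the score meets or exceeds the cutoff.
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical data, for illustration only.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
points = roc_points(labels, scores, thresholds=[1.1, 0.75, 0.5, 0.35, 0.0])
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Each threshold yields its own confusion matrix, so lowering the cutoff moves the point from (0, 0) toward (1, 1).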

Which of the following is a violation of one of the major assumptions of the simple regression model?

As the value of x increases, the value of the error term also increases

A negative correlation coefficient (r) implies weak relationship among the variables.

False

Cluster Analysis is a supervised learning technique.

False

Decision Tree is an un-supervised data mining technique.

False

Hierarchical clustering is a good technique when you have a very large data set.

False

If 100 patients known to have a disease were tested, and 43 test positive, then the test has 43% specificity

False
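The statement is false because 43 positives among 100 patients known to have the disease describes sensitivity (the true positive rate). Specificity would need results from patients without the disease. A minimal Python sketch, with function and variable names chosen here for illustration:

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of actual positives that the test correctly flags."""
    return true_positives / (true_positives + false_negatives)


diseased = 100         # patients known to have the disease
tested_positive = 43   # of those, how many tested positive

sens = sensitivity(tested_positive, diseased - tested_positive)
print(f"sensitivity = {sens:.2f}")
```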

In medicine, if the test is positive, it is good news.

False

In predicting the financial status of firms, sensitivity is the ability to predict a firm that is going to stay solvent correctly.

False

One advantage of neural networks is that there is very little chance of overfitting, so validation or testing data is not needed.

False

Predicting something like the average length of a delivery person's shift is a well-suited task for decision tree modeling.

False

The correlation coefficient (r) indicates the amount of change in variable Y when variable X changes by one unit.

False

The dependent variable in logistic regression is always binary.

False

When creating a decision tree, we want to keep splitting as long as we can create more impurity in the nodes.

False

When creating a decision tree, we want to keep splitting as long as we keep increasing R-Square for the training data set.

False

When the F test is used to test the overall significance of a multiple regression model, if the null hypothesis is rejected, it can be concluded that all of the independent variables X1, X2, X3, ..., Xk are significantly related to the dependent variable Y.

False

When the predictor variable is categorical and the response variable is continuous, you would use a logistic regression model.

False

In a simple linear regression model, the coefficient of determination (R-Square) not only indicates the strength of the relationship between independent and dependent variables but also shows whether the relationships are positive or negative.

False

The variance inflation factor (VIF) measures the relationship between the dependent variable and the rest of the independent variables in the regression model.

False
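The statement is false because VIF looks only at the predictors: for predictor j it equals 1 / (1 - R_j^2), where R_j^2 comes from regressing that predictor on the other predictors. The dependent variable is not involved. With exactly two predictors, R_j^2 reduces to their squared correlation, which the sketch below assumes. The data and function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF for either predictor when the model has exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical predictor values; note that Y never appears.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
vif = vif_two_predictors(x1, x2)
print(f"VIF = {vif:.3f}")
```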

Where would you most likely see a dendrogram?

In a hierarchical clustering algorithm

An odds ratio ____________ 1 indicates that the condition or event is less likely to occur in the first group.

Less than
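As a rough illustration, an odds ratio can be computed from event counts in two groups. The counts below are hypothetical, and `odds_ratio` is a helper defined here for the example.

```python
def odds_ratio(events_1, non_events_1, events_2, non_events_2):
    """Odds of the event in the first group divided by odds in the second."""
    odds_first = events_1 / non_events_1
    odds_second = events_2 / non_events_2
    return odds_first / odds_second


# Hypothetical counts: 10 of 100 in the first group, 30 of 100 in the second.
ratio = odds_ratio(10, 90, 30, 70)
print(f"odds ratio = {ratio:.3f}")  # below 1: event less likely in the first group
```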

A survey handed out to customers in a local mall asked the following questions: marital status -- including single (S), married (M), widowed (W), divorced (D); annual household income (in $); age (in years); and overall satisfaction with city services (on a scale from 1 to 5 with 1 being poor and 5 being excellent.) Select the appropriate test/model to determine if there is a relationship between age and household income.

Linear Regression

Which two analytical methods can be used for categorical target variables?

Logistic Regression and Decision Tree

What is a distinct property of Logistic Regression compared to Linear Regression?

Logistic Regression returns probability estimates of a response variable
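A minimal sketch of where those probability estimates come from: the linear predictor is passed through the logistic (sigmoid) function, so the output always falls between 0 and 1. The intercept and slope below are made-up values, not fitted coefficients.

```python
import math


def predicted_probability(x, intercept, slope):
    """Logistic regression prediction for a single numeric predictor."""
    linear = intercept + slope * x           # same form as linear regression
    return 1.0 / (1.0 + math.exp(-linear))   # squashed into the (0, 1) interval


# Hypothetical coefficients, for illustration only.
for x in (-2.0, 0.0, 2.0):
    p = predicted_probability(x, intercept=0.5, slope=1.2)
    print(f"x={x:+.1f}  P(y=1)={p:.3f}")
```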

Which of the following activation functions is not used in neural networks (in JMP)?

None of the above

What is the graph of the prediction equation obtained from the following model? $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$

Plane

In the Titanic data analysis, which of the independent variables in this model are significant predictors of Survived variable?

Sex, Passenger Class, Age, & Sibling and Spouses

An anti-theft scanner at an entrance of a book store buzzes once for every 1000 innocent people walking through the scanner. The accuracy of the scanner is:

Specificity of 99.9%

How is False Positive Rate defined?

The fraction of negative instances that were misclassified
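Applied to the scanner example above, one false alarm per 1000 innocent people gives a false positive rate of 0.1%, and specificity is its complement. A short Python sketch, with names chosen here for illustration:

```python
def false_positive_rate(false_positives, true_negatives):
    """Fraction of actual negatives that were misclassified as positive."""
    return false_positives / (false_positives + true_negatives)


# Scanner example: 1 buzz per 1000 innocent people.
fpr = false_positive_rate(false_positives=1, true_negatives=999)
specificity = 1 - fpr
print(f"FPR = {fpr:.1%}, specificity = {specificity:.1%}")
```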

In simple regression analysis, if the correlation coefficient (r) is a positive value, then

The slope of the regression line must also be positive.
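The reason is that the least-squares slope and r share the same numerator, the sum of cross-products of deviations, while their denominators are always positive. The sketch below checks this on hypothetical data; `slope_and_r` is a helper written for this example.

```python
import math


def slope_and_r(xs, ys):
    """Return the least-squares slope and the correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx                  # sign comes from sxy
    r = sxy / math.sqrt(sxx * syy)     # sign also comes from sxy
    return slope, r


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 6]
slope, r = slope_and_r(xs, ys)
print(f"slope = {slope:.3f}, r = {r:.3f}")
```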

A False Positive error is a Type I error.

True

A confusion matrix is used to describe the performance of a classification model.

True

A good clustering scheme will have little variation within clusters and significant variation between clusters.

True

According to one of the videos, one disadvantage of neural networks is that they are slow learners

True

According to the text, the most popular choice for the number of hidden layers is one.

True

Cluster analysis is a very attractive initial data-mining tool because it can be used to discover rules and patterns.

True

Each terminal node in a decision tree can be translated into a single IF-THEN rule.

True

If the probability of The University of Akron basketball team winning against the Kent State team is 0.5, then the odds of The University of Akron winning against Kent State are 1.

True

If the probability of winning a game is 0.2, the odds of winning the game is 0.25.

True
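Both odds statements follow from odds = p / (1 - p). A quick Python check, using a helper named `odds` defined here:

```python
def odds(probability):
    """Convert a probability (strictly between 0 and 1) to odds in favor."""
    return probability / (1 - probability)


for p in (0.5, 0.2):
    print(f"p = {p}: odds = {odds(p):.2f}")
```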

In data mining over-fitting results in developing too precise a model that will fail to generalize and therefore will have poor predicting power.

True

In order to include a categorical variable in k-means cluster analysis in JMP, the data must be coded numerically. Therefore, categorical variables should be coded as 0/1.

True

In order to use a classification tree, the target variable must be categorical and not continuous.

True

In regression analysis, if the normal probability plot of residuals exhibits approximately a straight line, then it can be concluded that the assumption of normality is not violated.

True

K-means algorithm is a typical algorithm for cluster analysis that uses "Euclidean distance" to find clusters

True
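A bare-bones sketch of the idea in plain Python: each point is assigned to the centroid with the smallest Euclidean distance, then each centroid moves to the mean of its points. The points and starting centroids are hypothetical, and this simplified loop is not how JMP or any particular package implements k-means.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Simplified k-means: returns final centroids and the cluster members."""
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        updated = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                updated.append(tuple(sum(c) / len(cluster) for c in zip(*cluster)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        centroids = updated
    return centroids, clusters


# Hypothetical 2-D points and starting centroids.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
print(centroids)
```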

Logistic Regression analysis is a supervised data mining technique

True

Neural networks can be used for both continuous and categorical dependent (output) variables

True

One way to decide on the number of clusters in a cluster analysis is to arbitrarily pick a value.

True

Predicting the approval or disapproval of a loan based on credit scores and demographic information is a good application of Logistic Regression.

True

Regression analysis is an example of a supervised learning technique.

True

Specificity measures how good a test is at finding something if it's false.

True

The curse of dimensionality refers to the computational complexity of developing clusters using a large number of variables.

True

The most popular method for using model errors to update weights is called back propagation of error.

True

Training a neural network model involves estimating the weights that will lead to the best predictive results.

True

In this residual plot, it appears that the constant variance assumption is not being violated.

True

In a decision tree algorithm, how is the attribute picked for the next split?

You pick the attribute with the highest Logworth.
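LogWorth is the negative base-10 logarithm of a split's p-value, so a smaller p-value means a larger LogWorth. The sketch below assumes the p-values for each candidate attribute are already available. The attribute names and p-values are invented for illustration.

```python
import math


def logworth(p_value):
    """LogWorth = -log10(p-value); larger values indicate stronger splits."""
    return -math.log10(p_value)


# Hypothetical p-values for the best split on each candidate attribute.
candidates = {"age": 0.03, "income": 0.0004, "region": 0.2}

best = max(candidates, key=lambda name: logworth(candidates[name]))
for name, p in candidates.items():
    print(f"{name}: LogWorth = {logworth(p):.2f}")
print("split on:", best)
```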

