Module 5
The default probability threshold for determining the outcome class in logistic regression is -- 0.5 -- 1 -- 0.4 -- 0.6
0.5
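A minimal sketch, assuming scikit-learn and a synthetic dataset, of how that default 0.5 cutoff turns predicted probabilities into class labels:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=200, n_features=4, random_state=0)
    model = LogisticRegression().fit(X, y)

    probs = model.predict_proba(X)[:, 1]    # P(class = 1) for each row
    labels = (probs >= 0.5).astype(int)     # explicit default 0.5 cutoff

    # model.predict(X) applies the same cutoff internally, so the labels agree.
    print("agreement with predict():", np.mean(labels == model.predict(X)))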
To decrease the size of the generated tree in decision tree analysis, one should ________________ A) increase the minimum size for split B) increase the minimum size for leaf C) use the gain ratio algorithm D) A & B
A & B (increase the minimum size for split and increase the minimum size for leaf)
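A sketch of both options, assuming scikit-learn, where min_samples_split and min_samples_leaf play the roles of "minimum size for split" and "minimum size for leaf"; raising either shrinks the resulting tree:

    from sklearn.datasets import load_breast_cancer
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)

    default_tree = DecisionTreeClassifier(random_state=0).fit(X, y)
    pruned_tree = DecisionTreeClassifier(min_samples_split=40,  # larger minimum size for split
                                         min_samples_leaf=20,   # larger minimum size for leaf
                                         random_state=0).fit(X, y)

    print("default tree leaves:", default_tree.get_n_leaves())
    print("pruned tree leaves: ", pruned_tree.get_n_leaves())   # noticeably fewer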
If not specified otherwise, logistic regression usually refers to __________ -- Multinomial Logistic Regression -- Polynomial Logistic Regression -- Binomial Logistic Regression -- None of the above
Binomial Logistic Regression
Decision tree analysis can only be used to predict a continuous dependent variable. -- True -- False
False
Logistic Regression is the same as polynomial regression. -- True -- False
False
Multinomial logistic regression is the same as linear regression. -- True -- False
False
You need to let the data inform you about the appropriate decision threshold. -- True -- False
False (No; the threshold is highly contextual and has to be determined mostly from business understanding)
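A tiny illustration with made-up costs (the numbers are purely hypothetical) of how the business context, not the data, sets the cutoff:

    cost_fn = 500.0   # assumed cost of missing a positive case (false negative)
    cost_fp = 50.0    # assumed cost of a false alarm (false positive)

    # Flag a case as positive whenever the expected cost of missing it exceeds
    # the expected cost of a false alarm: p * cost_fn > (1 - p) * cost_fp.
    threshold = cost_fp / (cost_fp + cost_fn)
    print("context-driven threshold:", round(threshold, 3))   # about 0.09, far from 0.5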
Logistic Regression is also known as __________ -- Logarithmic Regression -- Logit Regression -- Binary Regression -- Linear Regression
Logit Regression
Which one is not a statistical measure used in logistic regression model evaluation? -- Hit Rate -- R-Squared -- Nagelkerke R-Squared -- -2LL
R-Squared
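For reference, a sketch (scikit-learn assumed, synthetic data) computing the three measures that do belong to logistic regression evaluation; plain R-squared is the odd one out:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=300, n_features=5, random_state=1)
    p = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]
    n = len(y)

    hit_rate = np.mean((p >= 0.5) == y)                            # share classified correctly

    ll_model = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))     # log-likelihood, fitted model
    p0 = y.mean()                                                  # intercept-only (null) model
    ll_null = np.sum(y * np.log(p0) + (1 - y) * np.log(1 - p0))

    minus_2ll = -2 * ll_model                                      # -2LL (deviance)
    cox_snell = 1 - np.exp(2 * (ll_null - ll_model) / n)
    nagelkerke = cox_snell / (1 - np.exp(2 * ll_null / n))         # rescaled to top out at 1

    print(hit_rate, minus_2ll, nagelkerke)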
The decision tree algorithm starts with __________ predictor and then branches out to __________ -- The neutral, better predictors -- The weakest, better predictors -- The best, weaker predictors -- The worst, stronger predictors
The best, weaker predictors
Logistic Regression is similar to Linear regression in that -- They both only take one independent variable -- They both only take one dependent variable -- They both estimate an imaginary curvilinear line that fits the data the best -- They both estimate an imaginary straight line that fits the data the best
They both only take one dependent variable
Anything in the world can be modeled using binary outcomes. -- True -- False
True
Decision tree analysis can detect and model localities within our data. -- True -- False
True
Decision tree analysis is not sensitive to missing values and outliers. -- True -- False
True
A decision tree is also known as a classification tree. -- True -- False
True
Using the same training data, we can generate one decision tree model for the purpose of understanding and explaining, and another one for the purpose of best predictive performance. -- True -- False
True
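A sketch of that idea, assuming scikit-learn and its built-in breast cancer dataset: one shallow tree you can read as rules, one deeper tree aimed only at accuracy, both grown from the same training split:

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier, export_text

    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    explain_tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_tr, y_tr)
    predict_tree = DecisionTreeClassifier(min_samples_leaf=5, random_state=0).fit(X_tr, y_tr)

    print(export_text(explain_tree))                                    # small enough to read
    print("explanatory tree accuracy:", explain_tree.score(X_te, y_te))
    print("predictive tree accuracy: ", predict_tree.score(X_te, y_te))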
Considering the decision tree model, nodes _______________ -- show us the distribution of categories from the label attribute -- are an alternative for leaves -- represent all of the independent variables in our model -- are attributes which serve as predictors for the dependent attribute
are attributes which serve as predictors for the dependent attribute
Why is it recommended for the scoring data to be within the ranges of the training data in classification/prediction tasks? -- because computer algorithms are very sensitive to data boundaries -- because computer algorithms also learn by experience; they cannot accurately classify/predict something they have not yet experienced -- There is no specific reason; it is the tradition -- because the scoring data is part of the training data and cannot exceed it
because computer algorithms also learn by experience; they cannot accurately classify/predict something they have not yet experienced
An overfitted decision tree model -- has many leaves with the minimum number of allowed observations -- performs great on training/testing data -- is a fit tree with not many branches; it looks fit and slim -- is sensitive to outliers since it is overly fit -- has too many branches -- is sensitive to missing values -- fails on scoring data
has many leaves with the minimum number of allowed observations; performs great on training/testing data; has too many branches; fails on scoring data
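A quick demonstration of that failure mode, assuming scikit-learn and noisy synthetic data; the unconstrained tree memorizes the training set and then drops off on data it has not seen:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, n_features=20, flip_y=0.2, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    overfit = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)   # no size limits at all

    print("leaves:           ", overfit.get_n_leaves())                # many tiny leaves
    print("training accuracy:", overfit.score(X_tr, y_tr))             # close to 1.0
    print("hold-out accuracy:", overfit.score(X_te, y_te))             # noticeably lower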
Which of the following are true about Random Forest analysis? -- it randomly chooses a model as the best one -- it easily falls into the overfitting problem -- it takes time to grow a forest; it is not good for quick tasks -- it uses a voting mechanism to select the best model -- it is an ensemble of decision tree models -- it is an extended decision tree model with many roots and branches -- it generates different trees by changing model parameters randomly -- models are uncorrelated
it uses a voting mechanism to select the best model; it is an ensemble of decision tree models; it generates different trees by changing model parameters randomly; models are uncorrelated
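A sketch, assuming scikit-learn, of the ensemble-plus-voting idea: many trees grown on bootstrap samples with random feature subsets, combined by a vote (scikit-learn itself averages the trees' class probabilities, which nearly always agrees with a hard majority vote):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)
    forest = RandomForestClassifier(n_estimators=100,      # number of trees in the ensemble
                                    max_features="sqrt",   # random feature subset at each split
                                    random_state=0).fit(X, y)

    # Hard majority vote across the individual trees.
    votes = np.stack([tree.predict(X) for tree in forest.estimators_])
    majority = (votes.mean(axis=0) >= 0.5).astype(int)
    print("agreement with forest.predict:", np.mean(majority == forest.predict(X)))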