INET 4061 Midterm
Which p-value supports this statement: The test found that the data sample failed to reject the null hypothesis at a 5% significance level. - 0.03 - 0.07 - 0.005 - 0.01
- 0.07
Based on Bayes' Theorem, what is a patient's probability of having liver disease if they are an alcoholic, given: A is the event "Patient has liver disease." Past data tells you that 10% of patients entering your clinic have liver disease. B is the litmus test that "Patient is an alcoholic." Five percent of the clinic's patients are alcoholics. Among those patients diagnosed with liver disease, 7% are alcoholics. - 0.14 - 0.035 - 0.10 - 0.007
- 0.14
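A quick check of this answer with Bayes' Theorem, P(A|B) = P(B|A)·P(A)/P(B), using only the rates given in the question:

```python
# Bayes' Theorem check for the liver-disease question
p_disease = 0.10                  # P(A): patient has liver disease
p_alcoholic = 0.05                # P(B): patient is an alcoholic
p_alcoholic_given_disease = 0.07  # P(B|A): alcoholics among liver-disease patients

p_disease_given_alcoholic = p_alcoholic_given_disease * p_disease / p_alcoholic
print(round(p_disease_given_alcoholic, 2))  # 0.14
```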
We are building a decision tree classifier to predict whether emails are spam, using a dataset of 100 emails (50 spam and 50 not spam). After building the decision tree, one of its leaf nodes contains 20 emails, of which 15 are spam. For this leaf, what is our support and confidence, respectively? - 20 emails / 25% - Not enough information to answer - 20 emails / 75% - 15 emails / 20 emails
- 20 emails / 75%
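A quick worked check of this answer in plain Python, with the counts taken from the question:

```python
# Support = number of observations that reach the leaf; confidence = fraction of those that are spam
support = 20
confidence = 15 / 20
print(support, confidence)  # 20 0.75 -> "20 emails / 75%"
```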
A simple regression equation predicts air conditioner costs per month, based on temperature. The correlation between predicted costs and the temperature is 0.80. What is the correct interpretation of the prediction? - 64% of the variability in the air conditioner costs can be explained by temperature. - 80% of the variability in air conditioner costs can be explained by temperature - For each unit increase in temperature, air conditioner costs increase by 80 cents. - For each unit increase in temperature, air conditioner costs increase by 49 cents.
- 64% of the variability in air conditioner costs can be explained by temperature.
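The interpretation comes from squaring the correlation coefficient to get R-squared; a one-line check:

```python
r = 0.80
print(round(r ** 2, 2))  # 0.64 -> 64% of the variability is explained
```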
Using the following linear regression equation and regression output, what numbers do you plug into b0 and b1, respectively? ŷ = b0 + b1x. Coefficients: Intercept = 76 (t = 2.53, p = 0.01); Predictor = 35 (t = 1.75, p = 0.04). - .01, 0.04 - 2.53, 1.75 - 17.5, 2.53 - 76, 35 - 76, .04 - 35, 76
- 76, 35
1% of people have a certain genetic defect. 90% of tests for the gene detect the defect (true positives). 9.6% of the tests are false positives. If a person gets a positive test result, what are the odds they actually have the genetic defect? - 8.65% - 10.40% - 90.4% - 91.35%
- 8.65%
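The same Bayes pattern works here, except the denominator P(positive test) has to be expanded over both the defect and no-defect cases; a short check with the rates from the question:

```python
# Posterior probability of the genetic defect given a positive test result
p_defect = 0.01                # prior: 1% of people have the defect
p_pos_given_defect = 0.90      # true positive rate
p_pos_given_no_defect = 0.096  # false positive rate

p_pos = p_pos_given_defect * p_defect + p_pos_given_no_defect * (1 - p_defect)
print(round(p_pos_given_defect * p_defect / p_pos, 4))  # 0.0865 -> 8.65%
```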
Which evaluation measure does not detect class imbalance? - Sensitivity/specificity - ROC Curve - Accuracy - True positive and true negative rates
- Accuracy
Which action handles an imbalanced dataset? - Collect more data to even the imbalances in the dataset - All of these - Resample the dataset to correct for imbalances - Try a different algorithm altogether on your dataset
- All of these
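Of these actions, resampling is the easiest to sketch. The example below is a minimal illustration on made-up data (the arrays X and y are hypothetical), using scikit-learn's resample utility to randomly oversample the minority class; other strategies such as undersampling or SMOTE exist as well.

```python
import numpy as np
from sklearn.utils import resample

# Hypothetical imbalanced data: 90 negatives, 10 positives
y = np.array([0] * 90 + [1] * 10)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder single feature column

# Randomly oversample the minority class up to the majority-class size
X_min, y_min = X[y == 1], y[y == 1]
X_min_up, y_min_up = resample(X_min, y_min, replace=True, n_samples=90, random_state=0)

X_balanced = np.vstack([X[y == 0], X_min_up])
y_balanced = np.concatenate([y[y == 0], y_min_up])
print(np.bincount(y_balanced))  # [90 90] -> classes are now balanced
```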
The null hypothesis for an ANOVA test is - All pairs of samples are the same, i.e., all sample means are equal - A sample average is the same as the population mean - Sample variables in the test are independent and identically distributed random variables - At least one pair of samples is significantly different
- All pairs of samples are the same, i.e., all sample means are equal
How can f(X) be used? - Depending on the complexity of f, we may be able to understand how each component Xj of X affects Y - With a good f, we can make predictions of Y at new points X = x - All statements are true. - We can understand which components of X = (X1, X2, ..., Xp) are important in explaining Y, and which are irrelevant.
- All statements are true.
Which of these statements about dimensionality reduction is not true? - Lots of redundant or correlated features and noisy features are two reasons to do dimensionality reduction. - All the above are true. - We select the ideal number of components based on cumulative proportion of variance explained. - The PCA algorithm constructs a new set of properties based on combinations of the old ones rather than throwing out features.
- All the above are true.
Which evaluation measure is appropriate to use for a classification model with imbalanced classes? - Coefficient of Determination (R^2) - Accuracy - Mean Squared Error (MSE) - Area under the curve (AUC)
- Area under the curve (AUC)
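As a small illustration of why AUC is preferred here, the sketch below uses hypothetical labels and scores: a classifier that always predicts the majority class gets a flattering accuracy on imbalanced data, while AUC exposes that it has no ranking skill.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Hypothetical imbalanced labels: 95 negatives, 5 positives
y_true = np.array([0] * 95 + [1] * 5)

# A "model" that always predicts the majority class
y_pred = np.zeros_like(y_true)
scores = np.zeros_like(y_true, dtype=float)  # constant scores carry no ranking information

print(accuracy_score(y_true, y_pred))  # 0.95 -- looks great, but is misleading
print(roc_auc_score(y_true, scores))   # 0.5  -- reveals the model has no skill
```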
The purpose of a test dataset is - Compute dimensionality reduction on a model - Fit a model - Assess generalization error of a model - Estimate training error of a model
- Assess generalization error of a model
Which test is appropriate to compare categorical variables and test if a sample matches the population? - t-test - Chi-square test - ANOVA - z-test
- Chi-square test
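A minimal sketch of the second use (testing whether a sample matches the population), assuming hypothetical observed counts and known population proportions, with scipy.stats.chisquare:

```python
import numpy as np
from scipy.stats import chisquare

# Hypothetical observed category counts in a sample of 200
observed = np.array([50, 60, 90])
# Expected counts if the sample matched assumed population proportions (25%, 25%, 50%)
expected = np.array([0.25, 0.25, 0.50]) * observed.sum()

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print(stat, p_value)  # reject H0 (sample matches population) if p_value < 0.05
```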
Which data science task is unsupervised (undirected)? - Optimization - Regression - Clustering - Classification
- Clustering
Rank the following types of reasoning in increasing order of difficulty - Inferential -> Exploratory -> Descriptive -> Causal -> Predictive - Exploratory -> Descriptive -> Predictive -> Inferential -> Causal - Descriptive -> Exploratory -> Inferential -> Predictive -> Causal
- Descriptive -> Exploratory -> Inferential -> Predictive -> Causal
Which statement is not a characteristic of the Curse of Dimensionality? - Exponential growth in data causes low sparsity in a data set. - In the era of Big Data, noise (defined as unhelpful, irrelevant, and possibly misleading information) is growing faster than the signal. - With Big Data, there will be millions of meaningless patterns in the data, the results of pure chance. - As the number of features or dimensions grows, the amount of data we need to generalize accurately grows exponentially.
- Exponential growth in data causes low sparsity in a data set.
What is true about R^2 and F-statistic when evaluating a linear regression model? - F-statistic tells us if the overall model is significant. - Once R^2 hits a certain threshold, we can accept the model. - Both tell us if the model is significant or not. - We don't need F-statistic if all variables individually are significant.
- F-statistic tells us if the overall model is significant
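Both quantities are reported by standard regression software. The sketch below fits an OLS model with statsmodels on synthetic data (assumed for illustration) and pulls out R-squared, the F-statistic, and its p-value.

```python
import numpy as np
import statsmodels.api as sm

# Synthetic data for illustration: y depends linearly on x plus noise
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 + 3.0 * x + rng.normal(scale=1.0, size=100)

X = sm.add_constant(x)        # adds the intercept column (b0)
model = sm.OLS(y, X).fit()

print(model.rsquared)                # proportion of variance explained
print(model.fvalue, model.f_pvalue)  # overall model significance
```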
Adding a non-important feature to a linear regression model may result in a decrease in R^2. (T/F)
- False
Bagging iteratively combines multiple weak learners, usually weighted by the weak learners' accuracy, to create one strong learner. (T/F)
- False
Bootstrap aggregation or bagging reduces bias. (T/F)
- False
Classification trees use RSS (residual sum of squares) as a criterion for making tree splits. (T/F)
- False
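(Classification trees instead split on an impurity measure such as Gini or entropy; RSS is the regression-tree criterion.) A minimal sketch of the Gini impurity calculation for a two-class node:

```python
# Gini impurity, a common classification-tree split criterion
def gini(proportions):
    return 1.0 - sum(p ** 2 for p in proportions)

print(gini([0.75, 0.25]))  # 0.375 for a node that is 75% one class
print(gini([0.50, 0.50]))  # 0.5, the maximum impurity for two classes
```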
If R2 (coefficient of determination) is .6 then 40% of the variation is explained by the input variable (X) on the dependent variable (Y). (T/F)
- False
If the slope != 0 in a simple linear regression model y = β0 + β1x + ε, then there is no relationship between y and x. (T/F)
- False
If you decrease alpha (the level of significance), you increase the probability of incorrectly rejecting the null hypothesis and also decrease the confidence level. (T/F)
- False
Standard errors can be used to perform hypothesis tests on the coefficients in a linear regression model. If the hypothesis test of a standard error fails (i.e., you should reject the null hypothesis and conclude that β1 is not zero), then the confidence interval will contain zero. (T/F)
- False
The higher the p-value, the more unlikely the null hypothesis is (T/F)
- False
The output of a classification task is continuous. (T/F)
- False
The relative importance for each predictor variable in a multiple linear regression model can be calculated directly from variables with different magnitude ranges that are not standardized. (T/F)
- False
Typically as the flexibility of f hat increases, its variance decreases and its bias increases. (T/F)
- False
Which method adds variables to a model? - Subset Selection - Feature Construction - Dimension Reduction - Shrinkage
- Feature Construction
An example of a filter type of feature selection is - Feature selection occurs naturally as part of the model - Try all possible feature subsets as input to model - Features are selected before input to model - Use a data mining algorithm as a black box to find best subset
- Features are selected before input to model
The feature selection method that selects features independent of a model is - Ensemble - Filter - Embedded - Wrapper
- Filter
What is the definition of a p-value? - A standardized value where the expected value is zero and the standard error is one. - If the null hypothesis were true, the probability of observing results due to random chance. - A family of curves that depend on a parameter called degrees of freedom. - If the null hypothesis were true, the probability of observing results as extreme as or more extreme than the results observed in this test.
- If the null hypothesis were true, the probability of observing results as extreme as or more extreme than the results observed in this test.
Hypothesis testing corresponds to which level of statistical reasoning? - Mechanistic - Exploratory - Inference - Causal
- Inference
Which is worse? Type 1 or Type 2 error? - Type 1 - Type 2 - It depends
- It depends
How does a 1-nearest neighbor classifier compare to a 3-nearest neighbor classifier? - KNN-1 has lower variance and higher bias - KNN-1 has lower variance and lower bias - KNN-1 has higher variance and lower bias - KNN-1 has higher variance and higher bias
- KNN-1 has higher variance and lower bias
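A rough way to see the higher variance of KNN-1 is to compare cross-validated accuracy of 1-NN and 3-NN on noisy data. The sketch below uses synthetic data (assumed for illustration) with scikit-learn; results will vary with the data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic, somewhat noisy classification data (for illustration only)
X, y = make_classification(n_samples=300, n_features=10, flip_y=0.1, random_state=0)

for k in (1, 3):
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(k, scores.mean())  # 1-NN tends to fit the noise, so it often scores lower here
```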
The main difference between Lasso and Ridge Regression is: - Lasso helps with high bias while ridge regression helps with high variance - No answer text provided - Ridge regression depends on the parameter lambda, while Lasso does not - Lasso essentially eliminates variables by reducing slope to zero while Ridge doesn't completely remove variables.
- Lasso essentially eliminates variables by reducing slope to zero while Ridge doesn't completely remove variables
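The sketch below (synthetic data, assumed for illustration) shows that behavioral difference with scikit-learn: with enough regularization, Lasso drives some coefficients exactly to zero while Ridge only shrinks them.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: only the first 2 of 10 features actually matter
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.5).fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)

print(np.round(lasso.coef_, 2))  # several coefficients are exactly 0
print(np.round(ridge.coef_, 2))  # coefficients shrink toward 0 but rarely hit it exactly
```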
The ratio of the probability of a particular outcome occurring to the probability of it not occurring is - Conditional Probability - Likelihood - Probability - Odds
- Odds
When the alternative hypothesis is that the mean is less than a particular value, what type of statistical test should we use?
- One-tailed (one-sided) t-test
Which type of analytics addresses the question "What will happen"? - Diagnostic - Descriptive - Prescriptive - Predictive
- Predictive
The response variable in a classification model is - Any Data Type - Ordinal - Qualitative/Discrete - Quantitative/Continuous
- Qualitative/Discrete
Which machine learning algorithm is based on the idea of bagging? - Decision Tree - Random Forest - Classification - Regression
- Random forest
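A minimal example of the bagging idea behind random forests, using scikit-learn on synthetic data (assumed for illustration): many trees are fit to bootstrapped samples of the training set and their votes are combined.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data for illustration
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each tree is trained on a bootstrap sample; predictions are combined by majority vote
forest = RandomForestClassifier(n_estimators=100, bootstrap=True, random_state=0)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))  # accuracy on the held-out test set
```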
Which measurement is not used for best subset and stepwise model procedures to select a best regression model? - Cross-validated prediction error - BIC - AIC - Sensitivity
- Sensitivity
Which statement does not explain why stepwise methods are good alternatives to best subset selection? - A large search space can lead to overfitting and high variance of coefficient estimates - Stepwise methods explore a larger and broader set of models than best subset selection - Best subset selection cannot be applied with very large p
- Stepwise methods explore a larger and broader set of models than best subset selection
Which variable would likely not be chosen by feature selection to be removed from a feature set? - A variable has one constant value - The variability of a variable explains a large amount of the variability of the response variable. - Each value in a variable is a unique integer. - A predictor variable is strongly correlated with another predictor variable.
- The variability of a variable explains a large amount of the variability of the response variable.
Which null hypothesis is appropriate for the blood pressure in this plot https://drive.google.com/file/d/1xC4HABZntKsrpYQHtA2bh_X4MDBXfhlg/view? - There is no significant difference in blood pressure for a patient before and after treatment; the difference in the means of the two groups may be due to chance and sampling error. - There is a significant difference between the average blood pressures for two groups; the difference in the means of the two groups may be due to chance and sampling error. - There is a significant difference in the blood pressure for a patient before and after treatment; the difference in the means of the two groups is most likely not due to chance or sampling error. - There is no significant difference between the average blood pressures for this group compared to the population; the difference in the means of the two groups may be due to chance and sampling error.
- There is no significant difference in blood pressure for a patient before and after treatment; the difference in the means of the two groups may be due to chance and sampling error.
A chi-square fit test for two independent variables is appropriate to use to compare two variables in a contingency table to check if the data fits a distribution (T/F)
- True
A classification tree can have multiple leaf nodes with the same classification class. (T/F)
- True
A decision tree building process is greedy because at each step of the tree-building process, the best split is made at that step rather than looking ahead and picking a split that will lead to a better tree in some future step (T/F)
- True
A decision tree divides the predictor space (the set of all possible values of the Xi's) into a number of distinct and non-overlapping regions/boxes/nodes. The same prediction is made for all observations in a region, which is the mean or most common class of the response values for the training observations in that region. (T/F)
- True
A simple linear regression model is built using least squared error and a confidence interval = 95%. This model is executed 100 times with a new data set from the population each time, and each time a different confidence interval is usually produced. The probability is 95% that a resulting confidence interval will contain the true value of β1, and 5% that it will not. (T/F)
- True
A z-test is used when the population parameters (mean and standard deviation) are known. (T/F)
- True
Bagging takes repeated samples from a training data set which, in effect, generates multiple different bootstrapped training sets. (T/F)
- True
Binary logistic regression can be used to predict the probability of a categorical dependent variable.(T/F)
- True
Bonferroni's Principle states that if you look for events of a given type, you can expect to find such events even if the data is completely random. (T/F)
- True
Combining the results of a large number of decision trees can often greatly improve prediction accuracy, but at the expense of some loss in interpretation. (T/F)
- True
If the response variable is a binary variable, a logistic regression model is more appropriate for classification than a linear regression model. A linear regression model might produce probabilities less than zero or larger than one. (T/F)
- True
In each iteration of k-fold cross validation, one of the k subsets is used as the test set and the other k-1 subsets are put together to form a training set. (T/F)
- True
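A minimal sketch of that split pattern with scikit-learn's KFold on toy data (assumed for illustration): in each iteration one fold is held out as the test set and the remaining k-1 folds form the training set.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10).reshape(-1, 1)  # 10 toy observations

for train_idx, test_idx in KFold(n_splits=5, shuffle=False).split(X):
    print("train:", train_idx, "test:", test_idx)  # each test fold holds 2 observations
```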
R-squared is a measure of how much variability of the target variable can be explained by the predictor variables in a multiple linear regression model. (T/F)
- True
Random forests provide an improvement over bagged trees by decorrelating the trees, which reduces the variance when the trees are averaged or majority vote wins. (T/F)
- True
Regularization is a process of introducing additional information in order to solve an ill-posed problem or to prevent overfitting (high variance) (T/F)
- True
The curse of dimensionality applies to the challenge of performing meaningful mathematics when the number of features, p, is much greater than the number of samples, n, i.e., in the limit p >> n (T/F)
- True
The greater the inter-tree correlation, the greater the random forest error rate, so it is desirable that trees are as uncorrelated as possible. (T/F)
- True
The optimal predictor of Y with regard to mean-squared prediction error is f(x) = E(Y | X = x), the function that minimizes E[(Y - g(X))^2 | X = x] over all functions g at all points X = x. (T/F)
- True
The response variable of a logistic regression model can have more than two levels or values. (T/F)
- True
The size of the test set for one iteration of K-fold cross validation is one fold. (T/F)
- True
The training dataset should be used to build a supervised model; a test dataset should not be used to build a supervised model. (T/F)
- True
Typically as the flexibility of f hat increases, its variance increases and its bias decreases. (T/F)
- True
Unsupervised data mining techniques specify a target variable. (T/F)
- False
When a new variable is added to a multiple regression model, the R2 value always increases (T/F)
- True
Which data characteristic is desirable in model data? - Different distribution compared to predicted cases - Constant value with respect to other values in other fields - Uncorrelated with other variables - Granularity is different than predicted cases
- Uncorrelated with other variables
Which distribution should you consider for the best fit of continuous, symmetric data when the data is not clustered around a central value? - lognormal - Poisson - Geometric - Uniform or Multi-modal
- Uniform or Multi-modal
Wage in the Wage data set can best be modeled with a linear regression model using the different variables. Out of the answers provided below, which is true of a simple linear regression model to predict wage? - Wage is a continuous or quantitative output with year as the input variable, demonstrating a small linear relationship with year - Wage, education level, and age can be observed as input variables without any output variables - Wage is a continuous or quantitative output and age is the only input variable, demonstrating a significant amount of variability associated with age - Education level is a categorical or qualitative output that has an increasing relationship with wage as the input variable
- Wage is a continuous or quantitative output with year as the input variable, demonstrating a small linear relationship with year
What is the true positive rate (also known as sensitivity or recall) for the following contingency table? Context: we are trying to predict who has heart disease based on whether they smoke or not. (https://canvas.umn.edu/courses/262279/files/21880935/preview) - (a+c)/(a+b+c+d) - a/(a+b) - a/(a+c) - (a+b)/(a+c)
- a/(a+c)
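A small worked example with hypothetical cell counts, assuming the usual layout where a = true positives, b = false positives, c = false negatives, and d = true negatives (consistent with the answer a/(a+c)):

```python
# Hypothetical contingency-table counts, assuming a = TP, b = FP, c = FN, d = TN
a, b, c, d = 40, 10, 20, 30

sensitivity = a / (a + c)             # true positive rate / recall
specificity = d / (d + b)             # true negative rate
accuracy = (a + d) / (a + b + c + d)
print(sensitivity, specificity, accuracy)  # 0.667, 0.75, 0.7
```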
Which of the following is a discrete random variable? - annual number of sweepstakes winners from New York City - all the above - average height of a randomly selected group of boys - distance between two locations
- annual number of sweepstakes winners from New York City
Given a five-point summary where: minimum = 10, Q1 = 80, median = 100, Q3 = 500, and maximum = 700. What would be a possible interpretation of this summary? - data is uniform - the median has roughly the same value as the mean - data is skewed - data roughly fits a Gaussian distribution
- data is skewed
Which statement about p-value is false? - p-value is the probability of finding the observed or a more extreme result when the null hypothesis (H0) of a study question is true - p-value is the probability of making a mistake by rejecting a true null hypothesis (Type I error). - The smaller the p-value, the stronger the evidence that you should reject the null hypothesis. - A p-value provides a measure of the weight of evidence that an effect (usually a relationship between variables) observed in the sample data is a real effect in the population.
- p-value is the probability of making a mistake by rejecting a true null hypothesis (Type I error).
Which term is defined as the continuous variable value at any given sample in the sample space, which can be interpreted as providing relative likelihood that the value of the random variable would equal that sample? - cumulative distribution function - expected value - probability density function - variance
- probability density function
As the size of a sample increases, the law of large numbers estimates the shape of the distribution of sample means to approximate a Gaussian (T/F)
- False
Correlation implies causation; that is, correlation between two variables implies that one causes the other. (T/F)
- False
What is the purpose of a dimensionality constant? - understand if enough data is available for a model - all of these answers - calculate accuracy of a model - determine the probability distribution for a data set
- understand if enough data is available for a model
Which method(s) can be used to decide which distribution fits a data set? - all of these answers - run a test for goodness of fit that compares the actual data to a distribution - compare a histogram of actual data to a distribution - compute the moments of the actual data distribution and examine them for a fit to a distribution
- all of these answers