6500: 601 - Business Analytics & Information Strategy Final Exam
If 100 megabytes of storage costs a penny, a terabyte of storage would cost $1,000; $10,000; $100,000; $1,000,000; $100
$100
If 100 megabytes of storage costs a penny, a petabyte of storage would cost $100,000; $1,000; $10,000; $100; $1,000,000
$100,000
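The two storage answers above follow from unit conversion. A minimal sketch of the arithmetic, assuming decimal (SI) units, that is 1 TB = 10^6 MB and 1 PB = 10^9 MB; the function name is made up for illustration:

```python
# Assumption: decimal (SI) units, so 1 TB = 10**6 MB and 1 PB = 10**9 MB.
MB_PER_TB = 10**6
MB_PER_PB = 10**9

def storage_cost_dollars(megabytes, cents_per_100_mb=1):
    """Cost in dollars when every 100 MB costs `cents_per_100_mb` cents."""
    cents = megabytes // 100 * cents_per_100_mb
    return cents / 100

terabyte_cost = storage_cost_dollars(MB_PER_TB)   # 10,000 pennies
petabyte_cost = storage_cost_dollars(MB_PER_PB)   # 10,000,000 pennies
print(terabyte_cost, petabyte_cost)
```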
The range of feasible values for the correlation coefficient is from: -1 to 1; 0 to 1; -1 to 0
-1 to 1
If two variables do not have a strong (linear) relationship, the correlation coefficient between the two variables will be close to -1; +1; 0; None of these
0
The range of feasible values for the multiple coefficient of determination is from: 0 to infinity; 0 to 1; -1 to 0; -1 to 1; Minus infinity to 0
0 to 1
In an examination, the mean score was 80 with a standard deviation of 5. The population of scores is normally distributed. What proportion of tests has scores over 90? 0.0228; 0.0456; 0.9544; 0.0027
0.0228
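The tail proportion in this question can be checked numerically. A short sketch using `statistics.NormalDist` from the Python standard library; the variable names are illustrative:

```python
from statistics import NormalDist

# Exam scores: normal with mean 80 and standard deviation 5 (from the question).
scores = NormalDist(mu=80, sigma=5)

# P(score > 90) is the upper tail beyond z = (90 - 80) / 5 = 2.
p_over_90 = 1 - scores.cdf(90)
print(round(p_over_90, 4))
```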
A large sample of 400 females had their systolic blood pressure measured. The mean blood pressure was 125 millimeters of mercury and the standard deviation was 10 millimeters of mercury. Approximately how many females in the sample had blood pressures higher than 145 millimeters of mercury? (Use empirical rule about Normal distribution to answer this question). 10 females; 128 females; 3 females; 20 females
10 females
A large sample of females had their systolic blood pressure measured. The mean blood pressure was 125 millimeters of mercury and the standard deviation was 10 millimeters of mercury. What percentage of females had blood pressures between 105 and 135 millimeters of mercury? (Use empirical (standard deviation) rule about Normal distribution to answer this question). 81.5; 99.7; 68; 95
81.5%
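Both blood pressure questions rest on the empirical (68-95-99.7) rule. A rough sketch of that reasoning, using the rule's rounded percentages rather than exact normal areas; the helper names are invented for this example:

```python
# Empirical rule: share of a normal population within k standard deviations.
WITHIN = {1: 0.68, 2: 0.95, 3: 0.997}

def share_above(k):
    """Share lying more than k standard deviations above the mean."""
    return (1 - WITHIN[k]) / 2

def share_between(k_below, k_above):
    """Share between k_below SDs under the mean and k_above SDs over it."""
    return WITHIN[k_below] / 2 + WITHIN[k_above] / 2

# Mean 125, SD 10: 145 is 2 SDs above; 105 is 2 SDs below; 135 is 1 SD above.
females_above_145 = 400 * share_above(2)      # about 10 of 400
pct_105_to_135 = 100 * share_between(2, 1)    # about 81.5 percent
print(round(females_above_145), round(pct_105_to_135, 1))
```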
If the random variable x is normally distributed, ______% of all possible observed values of x will be within two standard deviations of the mean. 95.44; 85.00; 68.26; 99.73
95.44%
According to the empirical rule, the percentage of data that will fall within 3 standard deviations of the average in a normal distribution is approximately 90%; 95%; 99%; 75%; 89%
99%
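The "within k standard deviations" percentages can be computed directly from the standard normal distribution. A small sketch with `statistics.NormalDist`; printed z-tables round slightly differently, which is presumably why the cards show 95.44 and 68.26 rather than the values computed here:

```python
from statistics import NormalDist

standard_normal = NormalDist()  # defaults to mean 0, standard deviation 1

def share_within(k):
    """Proportion of a normal population within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(100 * share_within(k), 2))
```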
A regression equation will have the best prediction capabilities if the independent variables have: A low degree of multicollinearity & a large standard error; A low degree of multicollinearity & a small standard error; A high degree of multicollinearity & a small standard error; A high degree of multicollinearity & a large standard error
A low degree of multicollinearity & a small standard error
A survey handed out to customers in a local mall asked the following questions: marital status -- including single (S), married (M), widowed (W), divorced (D); annual household income (in $); age (in years); and overall satisfaction with city services (on a scale from 1 to 5 with 1 being poor and 5 being excellent.) Select the appropriate test/model to analyze differences in mean household income based on the four different levels of marital status. T-test; Chi-square test/contingency table; Linear regression; ANOVA
ANOVA
Analytics are not applicable when You have no historical data; Variables are difficult to be measured; There is not time to perform analysis (rapid decision making); Totally unstructured decisions; All of the above
All of the above
CRISP-DM is a hierarchical process model that consists of Phases; Generic tasks; Specialized tasks; Process instances; All of the above
All of the above
In the regression tree DM technique, The outcome variable is continuous; The procedures are similar to classification trees; The prediction is the average value of the target variable; All of the above are true
All of the above are true
Which of the following statements about a ROC curve is true? It stands for "Receiver Operating Characteristic Curve"; It plots the false positive rate (1 - specificity) & true positive rate (sensitivity); Each point on a ROC curve corresponds to a particular confusion matrix that depends on a specific threshold or cutoff; All of the above statements are true;
All of the above statements are true
The choice of k (number of clusters in a Cluster Analysis) can be made using a variety of methods. Which of the following methods is appropriate in selecting the number of clusters? Based on subject-matter experts; Based on convenience; Based on constraints; Arbitrarily select k; All of the methods listed above are appropriate
All of the methods listed above are appropriate
If a null hypothesis is rejected at a significance level of .01, it will ______ be rejected at a significance level of .05. Always; Sometimes; Never
Always
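The reasoning behind "Always" is that any p-value small enough to clear the 0.01 bar also clears the looser 0.05 bar. A minimal sketch, assuming the usual decision rule "reject when the p-value is at or below alpha"; the p-values are hypothetical:

```python
def is_rejected(p_value, alpha):
    """Reject H0 when the p-value is at or below the significance level."""
    return p_value <= alpha

# Hypothetical p-values, chosen only for illustration.
for p_value in (0.004, 0.03, 0.20):
    print(p_value, is_rejected(p_value, 0.01), is_rejected(p_value, 0.05))
```

The reverse does not hold: a p-value of 0.03 is rejected at 0.05 but not at 0.01.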
In testing the difference between two independent population means, it is assumed that the level of measurement of the variables is at least _____. An interval scale; A ratio scale; An ordinal scale; A nominal scale
An interval scale
Which of the following is a violation of one of the major assumptions of the simple regression model? As the value of x increases, the value of the error term also increases; Histogram of the residuals form a bell-shaped symmetric curve; The error terms are independent of each other; The error term shows no pattern over time
As the value of x increases, the value of the error term also increases
The following business question - What combination of products do customers frequently purchase together on our website? - can be answered by building a _____ DM model. Clustering; Classification; Prediction; Association
Association
Data has been collected on visitors' viewing habits at a bank's website. Which technique is used to identify pages commonly viewed during the same visit to the website? Association rules; Regression; Correlation; Clustering
Association rules
________ is an iterative variable selection procedure that allows an independent variable to be deleted from a multiple regression model during the next iteration. Forward elimination; Backward elimination; Stepwise regression; Mixed elimination
Backward elimination
Which of the following data visualization charts might include "whiskers?" Pareto chart; Stem & leaf plot; Pie chart; Boxplot
Boxplot
Which of the following tasks would NOT be part of the data understanding phase in CRISP-DM? Exploring the data; Collecting the data; Building the model; Describing the data
Building the model
A survey handed out to customers in a local mall asked the following questions: marital status - including single (s), married (m), widowed (w), divorced (d); annual household income (in $); age (in years); and overall satisfaction with city services (on a scale from 1 to 5, with 1 being poor & 5 being excellent). For the variable Marital Status (S/M/W/D), identify the data type. Categorical/Nominal; Categorical/Ordinal; Continuous
Categorical/Nominal
A survey handed out to customers in a local mall asked the following questions: marital status -- including single (S), married (M), widowed (W), divorced (D); annual household income (in $); age (in years); and overall satisfaction with city services (on a scale from 1 to 5 with 1 being poor and 5 being excellent.). Select the appropriate test/model to determine if city service satisfaction level & marital status are independent? Linear regression; Chi-square test; T-test; ANOVA
Chi-square test
The following business question - What customer characteristics best predict whether a customer will default on a bank loan or commit fraud? - can be answered by building a ______ DM model. Classification; Prediction; Clustering; Association
Classification
Which of the following is an unsupervised data mining technique? ANOVA; Cluster Analysis; Logistic Regression; Linear Regression; Decision Tree
Cluster Analysis
The ________ measures the percentage of the variation in y (response variable) explained by the multiple regression model or the set of independent variables included in the multiple regression equation. Standard error; Correlation coefficient; Coefficient of determination; Total variation; F-test
Coefficient of determination
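The coefficient of determination can be written as R^2 = 1 - SSE/SST, the share of total variation in y that the model explains. A small sketch with made-up observed and fitted values (not taken from any real regression):

```python
def r_squared(actual, predicted):
    """R^2 = 1 - SSE / SST for observed values and model predictions."""
    mean_y = sum(actual) / len(actual)
    sst = sum((y - mean_y) ** 2 for y in actual)                   # total variation
    sse = sum((y - f) ** 2 for y, f in zip(actual, predicted))     # unexplained variation
    return 1 - sse / sst

# Hypothetical data, for illustration only.
observed = [3, 5, 7, 9]
fitted = [2.8, 5.3, 6.9, 9.0]
print(round(r_squared(observed, fitted), 3))
```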
An acceptable residual plot exhibits: A curve pattern; Decreasing error variance; Constant error variance; Increasing error variance
Constant error variance
"Would you like fries with sandwich?" is an example of which specific business application of market basket analysis? Up-sell; Prison cell; Cross-sell; Down-sell
Cross-sell
All of the following are correct except one when describing the present day analytics. Decision focus; Fact-based decision making; Data driven; Entire organization; Centrally managed
Data driven
Which is not considered a defining characteristic of big data? Volume; Processing complexity; Data structure; Data quality
Data quality
In logistic regression, if we change the cutoff value from .5 (the default) to a cutoff value of .7 we would expect it to: Increase the number of false positives; Decrease the number of false positives; Have no effect on the number of false positives
Decrease the number of false positives
Which of the following is not one of the four drivers of business analytics? Comply with regulatory requirements; Desire to analyze big data; Predict new business opportunity; Desire to optimize business operations
Desire to analyze big data
You have been assigned to do a study of the daily revenue effect of a pricing model of online transactions. All the data currently available to you has been loaded into your analytics database; revenue data, pricing data, and online transaction data. You have done visualization and univariate analysis of all the data and have decided that there are three different models that you would like to build & test. What is your next step? Ask the business owner which model they prefer & proceed with that model; Develop all three models & select model based on the results; Pick the model that demonstrates the most sophisticated technique; Prioritize the models by complexity & proceed with the simplest
Develop all three models & select model based on the results
These are the reasons why companies use analytics except Government regulations protect companies; Proprietary technologies are copied; Geographic advantages do not exist anymore; Getting difficult to differentiate based on products, services, or uses of IT among companies today
Government regulations protect companies
Multicollinearity between independent variables is severe if the variance inflation factor is: Substantially less than 1; Greater than 10; Between -1 and +1; Less than 5
Greater than 10
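The variance inflation factor for a predictor is 1 / (1 - R^2), where R^2 comes from regressing that predictor on the other predictors. A simplified sketch for the special case of exactly two predictors, where that R^2 equals the squared correlation between them; the data are invented for illustration:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from deviations about the means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R^2); with only two predictors, R^2 is r squared."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)

# Hypothetical predictors: x2 nearly duplicates x1, x3 is unrelated to x1.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [1.1, 2.0, 2.9, 4.2, 5.0, 6.1]
x3 = [1, -1, 1, -1, -1, 1]
print(vif_two_predictors(x1, x2) > 10, vif_two_predictors(x1, x3) > 10)
```

With more than two predictors, each VIF requires a full auxiliary regression, which this sketch does not attempt.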
What is the motivation for using CRISP-DM methodology? Explore all possible approaches; Limits the amount of data needed; Generate positive results; Guarantees a successful project
Guarantees a successful project
The assumption of constant error variance of residuals in regression analysis is called: Homoscedasticity; Normality; Heteroscedasticity; Linearity
Homoscedasticity
Where would you most likely see a dendrogram? In a k-means clustering algorithm; In a neural network model; In decision tree analysis; In a hierarchical clustering algorithm; None of the above
In a hierarchical clustering algorithm
Which of the following questions is typically addressed via a business intelligence project? What will be the impact if we acquire a competing product? What is the optimal product mix? Why are we losing 10% of our most valuable customers? In which country did we have the highest sales last quarter?
In which country did we have the highest sales last quarter?
In logistic regression, if we change the cutoff value from 0.5 (the default) to a cutoff value of 0.3 we would expect it to: Increase the number of false positives; Decrease the number of false positives; Have no effect on the number of false positives
Increase the number of false positives
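The effect of the cutoff on false positives can be seen by scoring the same predictions at several cutoffs. A toy sketch with invented predicted probabilities and actual outcomes (1 = event, 0 = no event), assuming a case is classified positive when its probability is at or above the cutoff:

```python
# Hypothetical predicted probabilities and true labels, for illustration only.
probabilities = [0.10, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.90]
actual = [0, 0, 1, 0, 0, 1, 0, 1]

def false_positives(cutoff):
    """Count actual negatives that are classified positive at this cutoff."""
    return sum(1 for p, y in zip(probabilities, actual) if p >= cutoff and y == 0)

for cutoff in (0.3, 0.5, 0.7):
    print(cutoff, false_positives(cutoff))
```

Lowering the cutoff labels more cases positive, so false positives can only stay the same or rise; raising it has the opposite effect.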
Assuming a fixed sample size, as alpha (Type I error) decreases, beta (Type II error) ______. Stays the same; Increases; Randomly fluctuates; Decreases
Increases
In a multiple regression analysis, if the normal probability plot ___________, then it can be concluded that the assumption of normality is not violated. Is left-skewed; Is greatly curved; Is a straight line; Has the shape of a parabola that opens upward; Has the shape of a symmetric bell curve
Is a straight line
If the wages of workers for a given company are normally distributed with a mean of $15 per hour, then the proportion of the workers earning more than $13 per hour: Is less than the proportion earning more than the mean wage; Is less than 50%; Is greater than the proportion earning less than $13 per hour; Is greater than the proportion earning less than $18 per hour
Is greater than the proportion earning less than $13 per hour
You want to group customers in your dataset by similarity and assign labels to each group. What is the preferred analytic method to use for this task? Market basket analysis; Logistic regression; K-means clustering; Decision trees
K-means clustering
An odds ratio ____________ 1 indicates that the condition or event is less likely to occur in the first group. Equal to; Less than; Not equal to; Greater than
Less than
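An odds ratio compares the odds of an event in the first group with the odds in the second. A minimal sketch with hypothetical counts; the function and argument names are made up for this example:

```python
def odds_ratio(events_a, non_events_a, events_b, non_events_b):
    """Odds of the event in group A divided by the odds in group B."""
    odds_a = events_a / non_events_a
    odds_b = events_b / non_events_b
    return odds_a / odds_b

# Hypothetical counts: 10 of 100 in the first group, 30 of 100 in the second.
ratio = odds_ratio(10, 90, 30, 70)
print(round(ratio, 3))  # below 1: the event is less likely in the first group
```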
A survey handed out to customers in a local mall asked the following questions: marital status - including single (S), married (M), widowed (W), divorced (D); annual household income (in $); age (in years); and overall satisfaction with city services (on a scale from 1 to 5 with 1 being poor and 5 being excellent.) Select the appropriate test/model to determine if there is a relationship between age and household income. ANOVA; Linear regression; Chi-squared test/contingency table; T-test of means
Linear regression
You are tasked with predicting if a customer will purchase a product (Yes/No) when the customer visits the website and the probability of a purchase decision. You are provided with other relevant variables that are associated with the problem. Which analytical method would you recommend? Linear regression; ANOVA; Association rules; Logistic regression
Logistic regression
Which two analytical methods can be used for categorical target variables? Linear regression & decision tree; Linear regression & logistic regression; Decision tree & cluster analysis; Cluster analysis & logistic regression; Logistic regression & decision tree
Logistic regression & decision tree
What is a distinct property of Logistic Regression compared to Linear Regression? Logistic regression handles missing values well; Logistic regression is robust with correlated predictor variables; Logistic regression works very well with a continuous target variable; Logistic regression returns probability estimates of a response variable
Logistic regression returns probability estimates of a response variable
The type of association rule that is associated with items being purchased together (at the same time) is called _____. Sequence; Retroactive; Lift; Market basket
Market basket
In order to test the effectiveness of a drug called XZR designed to reduce cholesterol levels, 9 heart patients' cholesterol levels are measured before they are given the drug. The same 9 patients use XZR for two continuous months. After two months of continuous use the 9 patients' cholesterol levels are measured again. The comparison of cholesterol levels before vs. after administering the drug is an example of testing the difference between: Two population proportions; Two means from independent populations; Matched pairs from two dependent populations; Two population variances from independent populations
Matched pairs from two dependent populations
If a population distribution is skewed to the right, then, given a random sample from that population, one would expect that the Median would be greater than the mean; Mode would be equal to the mean; Median would be less than the mean; Median would be equal to the mean
Median would be less than the mean
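In a right-skewed sample, a few large values pull the mean above the median. A quick sketch on an invented sample with a long right tail:

```python
from statistics import mean, median

# Hypothetical right-skewed sample: most values are small, a few are large.
sample = [1, 2, 2, 3, 3, 3, 4, 5, 9, 20]

sample_mean = mean(sample)
sample_median = median(sample)
print(sample_mean, sample_median)
```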
In which CRISP-DM phase is data partitioned & training data & validation data sets are created? Modeling; Data understanding; Data preparation; Evaluation; Business understanding
Modeling
Which data asset is an example of unstructured data? Database table; XML data file; Webserver log file; News article text
News article text
An investigator hired by a client suing for sex discrimination has developed a multiple regression model for employee salaries for the company in question. In this multiple regression model, the salaries are in thousands of dollars. For example, a data entry of 35 for the dependent variable indicates a salary of $35,000. The indicator (dummy) variable for gender is coded as X1=0 if male and X1=1 if female. The computer output of this multiple regression model shows that the coefficient for this variable (X1) is -4.2. The t test showed that X1 was significant at Alpha =0.05. This result implies that for male and female workers of the company, On average, salaries do not differ; On average, males earn $4,200 less; On average, females have 4.2 more years of experience; On average, males have 4.2 more years of experience; On average, females earn $4,200 less
On average, females earn $4,200 less
Data mining models can do the following except: Optimize taste & price of a product; Predict sales based on historical data; Assign customers to different segments; Classify into groups based on characteristics; Associate products that are bought together
Optimize taste & price of a product
Assumptions of a regression model can be evaluated by plotting and analyzing the_____. Against the predicted value of Y; Over time; On a normal plot; All of the above
Over time
The data-mining consultant meets with the vice-president for marketing, who says that he would like to move forward with improving customer relationship. Identify the CRISP-DM phase for this task. Phase 1; Phase 2; Phase 3; Phase 5; Phase 6
Phase 1
The data mining project manager meets with the data-warehousing manager to discuss how the data will be collected. Identify the CRISP-DM phase for this task. Phase 1; Phase 2; Phase 3; Phase 5; Phase 6
Phase 2
The analysts meet to discuss whether the neural network or decision tree models should be implemented. Identify the CRISP-DM phase for this task. Phase 1; Phase 2; Phase 3; Phase 4; Phase 6
Phase 4
Managers want to know by next week whether deployment will take place. Therefore, analysts meet to discuss how useful and accurate their model is. Identify the CRISP-DM phase for this task. Phase 1; Phase 2; Phase 3; Phase 5; Phase 6
Phase 5
The graph of the prediction equation obtained from the following model y = b0 + b1x1 + b2x2 + e is a(n) Line; Exponential curve; Plane; Parabola
Plane
The following business question - How do bank customer demographics & financial stability impact customer credit risk level - can be answered by building a ______ DM model. Association; Clustering; Prediction; Classification
Prediction
The following business question - How much can we expect this customer to spend on the basis of customer's shopping behavior & demographics - can be answered by building a _____ model. Association; Classification; Prediction; Clustering;
Prediction
_____ is a form of analytics which examines data to answer the question "what is going to happen?" Descriptive; Exploratory; Predictive; Prescriptive
Predictive
WNBA scouts research UA basketball star Rachel Tecca's college scoring history to estimate how much they would blow away the other teams if she signed with them. This is an example of Predictive analytics; Prescriptive analytics; Descriptive analytics; None of the above
Predictive analytics
______ is a form of analytics which examines data to answer the question "what should be done?" or "what can we do to make XYZ happen?" Descriptive; Exploratory; Predictive; Prescriptive
Prescriptive
__________ is a form of analytics which examines data to answer the question "what should be done?" Descriptive; Exploratory; Predictive; Prescriptive
Prescriptive
An experiment studies the number of tickets written each day by campus police for illegal parking by The University of Akron students. This variable is: Ratio; Nominal; Ordinal; Interval
Ratio
Which of the following is not true about regression trees DM technique Variable selection & reduction is automatic; Easy to understand; Requires assumptions of statistical models; Produces rules that are easy to interpret & implement
Requires assumptions of statistical models
The dependent variable, the variable of interest in an experiment, is also called ______ variable. Categorical; Factor; Regression; Response
Response
If one of the assumptions of the regression model is violated, performing data transformations on the ____________ may remedy the situation: Independent variable; Slope; Predictor variable; Response variable
Response variable
The Y intercept (b0) in a multiple regression model represents the estimated value of the _____ variable, when the value of all independent variables is ____. Response, one; Response, zero; Predictor, zero; Predictor, one
Response, zero
Which activity is performed in the deployment phase of the CRISP-DM process? Evaluate results; Produce project plan; Assess whether the model met business objectives or not; Review project; Try different analytical techniques
Review project
You received 100,000 home loan records. You want to take a quick look and see if there is any relationship between mortgage age and mortgage amount before conducting advanced analytics. Which graphical tool would you employ for your preliminary analysis? Histogram; Boxplot; Stacked bar chart; Scatterplot
Scatterplot
In simple regression analysis the quantity that gives the amount by which Y (dependent variable) changes for a unit change in X (independent variable) is called the Correlation coefficient; Y-intercept of the regression line; Standard error; Slope of the regression line; Coefficient of determination
Slope of the regression line
An anti-theft scanner at an entrance of a bookstore buzzes once for every 1000 innocent people walking through the scanner. The accuracy of the scanner is Specificity of 99.9%; Sensitivity of 99.9%; None of the above; Do not have enough info to answer this question
Specificity of 99.9%
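Specificity is the share of actual negatives (here, innocent people) that the test correctly leaves alone. A brief sketch of the scanner arithmetic; sensitivity cannot be computed because the question gives no information about actual thieves:

```python
def specificity(true_negatives, false_positives):
    """Share of actual negatives that are correctly classified as negative."""
    return true_negatives / (true_negatives + false_positives)

# Per 1000 innocent people: 1 false alarm, 999 correctly not flagged.
scanner_specificity = specificity(true_negatives=999, false_positives=1)
false_positive_rate = 1 - scanner_specificity
print(scanner_specificity, round(false_positive_rate, 3))
```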
The z-value tells us the number of _____ that a value of x is from the mean. Standard deviations; Variances; Standard means; None of the above
Standard deviations
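The z-value is computed as z = (x - mean) / standard deviation. A one-function sketch, reusing the exam-score numbers from an earlier question (mean 80, standard deviation 5) as the example input:

```python
def z_value(x, mean, standard_deviation):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / standard_deviation

z_for_90 = z_value(90, mean=80, standard_deviation=5)
z_for_75 = z_value(75, mean=80, standard_deviation=5)
print(z_for_90, z_for_75)
```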
Insight provides the following benefit except Helps us understand cause & effect; Helps us understand & solve problems; Tells us what happened in the past; Provides us foresight
Tells us what happened in the past
What are the outputs generated by a k-Means clustering Analysis? The centroids of the discovered cluster and the assignment of each input datum to a cluster; The rules that associate each input datum to a class and the diameter of the discovered clusters; Within Sum of Squares for each discovered cluster and the overall cluster dispersion; Class association for each datum and class probabilities;
The centroids of the discovered cluster and the assignment of each input datum to a cluster.
In a multiple regression model, the ratio of MSRegression/MSError yields which statistic, used to test the overall model? The chi-square statistic; The wrong statistic; The t-statistic; The F-statistic
The F-statistic
How is False Positive Rate defined? The number of true positives/all positives; The number of true negatives/all positives; The fraction of negative instances that were misclassified; The fraction of positive instances that were misclassified
The fraction of negative instances that were misclassified
In simple regression analysis, if the correlation coefficient (r) between the dependent and independent variable is a positive value, then The slope of the regression line must be positive; The y-intercept must also be positive; The coefficient of determination can be either positive or negative, depending on the value of the slope; The least squares regression equation could either have a positive or negative slope
The slope of the regression line must be positive
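The slope and the correlation coefficient share a sign because, in simple regression, b1 = r * (s_y / s_x) and both standard deviations are positive. A small sketch that checks this identity on made-up data:

```python
from math import sqrt
from statistics import stdev

# Hypothetical (x, y) observations, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)

slope = sxy / sxx                    # least squares slope b1
intercept = mean_y - slope * mean_x  # least squares intercept b0
r = sxy / sqrt(sxx * syy)            # correlation coefficient

# b1 = r * (s_y / s_x), so slope and r always have the same sign.
print(round(slope, 3), round(r * stdev(y) / stdev(x), 3))
```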
A decision maker wants to incorporate geographic region into a multiple regression model, where the possible responses are north, south, east, and west. How many dummy variables will be added to the model? One; Two; Three; Four
Three
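A categorical variable with k levels needs k - 1 dummy variables, because the omitted level acts as the baseline. A minimal sketch of that coding; the helper name and the choice of "north" as baseline are assumptions made for illustration:

```python
def dummy_code(values, baseline):
    """Return one dict of 0/1 indicators per value, omitting the baseline level."""
    levels = sorted(set(values) - {baseline})
    return [{f"is_{level}": int(value == level) for level in levels} for value in values]

regions = ["north", "south", "east", "west", "south"]
rows = dummy_code(regions, baseline="north")

print(len(rows[0]))  # number of dummy variables for four regions
print(rows[0])       # the baseline row has every indicator set to 0
```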
Which of the following techniques is NOT a common data mining technique? Association; Prediction; Classification; Training
Training
The rejection rate of a true null hypothesis is called a ____ error. Type I; Beta; Random; Type II
Type I
For a fixed sample size, the lower we set alpha, the higher the ______. Random error; Type II error; Type I error; P-value
Type II error
______ refers to the accuracy or trustworthiness of the big data. Volume; Value; Veracity; Velocity; None of the above
Veracity
The business understanding phase includes the following except Initial assessment of tools & techniques; Determine business goals & objectives; Determine data mining goals & objectives; Visualize data
Visualize data
Which of the following is a categorical variable? Bank account balance; Value of company; Whether a person has a traffic violation; Daily sales in a store; Air temperature
Whether a person has a traffic violation
A Lift ratio of 23 for the Market Basket association rule "If X Then Y" indicates . . . Purchases of X and Y have no significant relationship; X being purchased moderately decreases the chance of Y also being purchased; X being purchased moderately increases the chance of Y also being purchased; X being purchased strongly increases the chance of Y also being purchased; X being purchased strongly decreases the chance of Y also being purchased;
X being purchased strongly increases the chance of Y also being purchased;
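Lift compares how often Y appears among baskets containing X with how often Y appears overall: lift = confidence(X -> Y) / support(Y). A lift near 1 suggests no relationship, and values well above 1 suggest that X raises the chance of Y. A toy sketch on invented transactions:

```python
def lift(transactions, antecedent, consequent):
    """Lift of the rule 'if antecedent then consequent' over a list of item sets."""
    n = len(transactions)
    support_x = sum(antecedent in t for t in transactions) / n
    support_y = sum(consequent in t for t in transactions) / n
    support_xy = sum(antecedent in t and consequent in t for t in transactions) / n
    confidence = support_xy / support_x
    return confidence / support_y

# Hypothetical baskets, for illustration only.
baskets = [
    {"bread", "butter"}, {"bread", "butter"}, {"bread"}, {"milk"}, {"milk"},
    {"milk", "bread", "butter"}, {"milk"}, {"eggs"}, {"eggs"}, {"milk"},
]
bread_butter_lift = lift(baskets, "bread", "butter")
print(round(bread_butter_lift, 2))
```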
If we are testing the significance of the independent variable X1, and we reject the null hypothesis H0: b1=0, we conclude that: X1 is not significantly related to Y; X1 is an unimportant independent variable; B1 is significantly related to the dependent variable Y; X1 is significantly related to Y
X1 is significantly related to Y
The __________ of the simple linear regression model (Y = B0 + B1*X) is the value of Y when the value of X is zero. Coefficient of X; Error; Y-intercept; Slope
Y-intercept
In a decision tree algorithm, how is the attribute picked for the next split? You pick the attribute with the lowest logworth; You pick the attribute where the conditional entropy is higher than the base entropy; You pick the attribute where the conditional entropy is maximum; You pick the attribute with the highest logworth
You pick the attribute with the highest logworth
A standard normal distribution has a mean of ___ & a standard deviation of ___. One, zero; Zero, one; One, one; Zero, zero
Zero, one