6500: 601 - Business Analytics & Information Strategy Final Exam

If 100 megabytes of storage costs a penny, a terabyte of storage would cost: $1,000; $10,000; $100,000; $1,000,000; $100

$100

If 100 megabytes of storage costs a penny, a petabyte of storage would cost: $100,000; $1,000; $10,000; $100; $1,000,000

$100,000
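
The arithmetic behind the two storage answers can be checked with a short sketch. It assumes decimal units (1 TB = 1,000,000 MB, 1 PB = 1,000,000,000 MB), which the questions do not state explicitly; the function name is made up for illustration.

```python
# Assumption: decimal units, so 1 TB = 10**6 MB and 1 PB = 10**9 MB.
TERABYTE_MB = 10**6
PETABYTE_MB = 10**9


def storage_cost_dollars(megabytes: int) -> float:
    """Cost in dollars when every 100 MB costs one cent."""
    cents = megabytes / 100  # one cent per 100 MB
    return cents / 100       # 100 cents per dollar


terabyte_cost = storage_cost_dollars(TERABYTE_MB)
petabyte_cost = storage_cost_dollars(PETABYTE_MB)
print(terabyte_cost, petabyte_cost)
```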

The range of feasible values for the correlation coefficient is from: -1 to 1; 0 to 1; -1 to 0

-1 to 1

If two variables do not have a strong (linear) relationship, the correlation coefficient between the two variables will be close to -1; +1; 0; None of these

0

The range of feasible values for the multiple coefficient of determination is from: 0 to infinity; 0 to 1; -1 to 0; -1 to 1; Minus infinity to 0

0 to 1

In an examination, the mean score was 80 with a standard deviation of 5. The population of scores is normally distributed. What proportion of tests has scores over 90? 0.0228; 0.0456; 0.9544; 0.0027

0.0228
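
A score of 90 lies two standard deviations above the mean of 80, so the answer is the upper tail beyond z = 2. As a minimal sketch, the standard library's `statistics.NormalDist` can reproduce the figure; the variable names are illustrative.

```python
from statistics import NormalDist

# Exam scores: mean 80, standard deviation 5 (from the question).
scores = NormalDist(mu=80, sigma=5)

# Proportion of scores above 90 is the upper tail beyond z = (90 - 80) / 5 = 2.
proportion_over_90 = 1 - scores.cdf(90)
print(round(proportion_over_90, 4))
```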

A large sample of 400 females had their systolic blood pressure measured. The mean blood pressure was 125 millimeters of mercury and the standard deviation was 10 millimeters of mercury. Approximately how many females in the sample had blood pressures higher than 145 millimeters of mercury? (Use empirical rule about Normal distribution to answer this question). 10 females; 128 females; 3 females; 20 females

10 females

A large sample of females had their systolic blood pressure measured. The mean blood pressure was 125 millimeters of mercury and the standard deviation was 10 millimeters of mercury. What percentage of females had blood pressures between 105 and 135 millimeters of mercury? (Use empirical (standard deviation) rule about Normal distribution to answer this question). 81.5; 99.7; 68; 95

81.5%
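
Both blood-pressure answers follow from the empirical rule. The sketch below uses the rule's rounded figures (68% within one standard deviation, 95% within two) rather than exact normal probabilities; the constant names are illustrative.

```python
# Empirical-rule approximations for a normal distribution.
WITHIN_1_SD = 68.0  # percent within one standard deviation of the mean
WITHIN_2_SD = 95.0  # percent within two standard deviations of the mean

# Mean 125, SD 10: 105 is 2 SD below the mean, 135 is 1 SD above it.
below_mean = WITHIN_2_SD / 2  # share between 105 and 125
above_mean = WITHIN_1_SD / 2  # share between 125 and 135
percent_between = below_mean + above_mean

# Above 145 (2 SD above the mean): half of the 5% that lies outside 2 SD.
sample_size = 400
count_above_145 = sample_size * (100 - WITHIN_2_SD) / 2 / 100

print(percent_between, count_above_145)
```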

If the random variable of x is normally distributed, ______% of all possible observed values of x will be within two standard deviations of the mean. 95.44; 85.00; 68.26; 99.73;

95.44%

According to the empirical rule, the percentage of data that will fall within 3 standard deviations of the average in a normal distribution is approximately: 90%; 95%; 99%; 75%; 89%

99%

A regression equation will have the best prediction capabilities if the independent variables have: A low degree of multicollinearity & a large standard error; A low degree of multicollinearity & a small standard error; A high degree of multicollinearity & a small standard error; A high degree of multicollinearity & a large standard error

A low degree of multicollinearity & a small standard error

A survey handed out to customers in a local mall asked the following questions: marital status -- including single (S), married (M), widowed (W), divorced (D); annual household income (in $); age (in years); and overall satisfaction with city services (on a scale from 1 to 5 with 1 being poor and 5 being excellent). Select the appropriate test/model to analyze differences in mean household income based on the four different levels of marital status. T-test; Chi-square test/contingency table; Linear regression; ANOVA

ANOVA

Analytics are not applicable when: You have no historical data; Variables are difficult to measure; There is no time to perform analysis (rapid decision making); Decisions are totally unstructured; All of the above

All of the above

CRISP-DM is a hierarchical process model that consists of Phases; Generic tasks; Specialized tasks; Process instances; All of the above

All of the above

In the regression tree DM technique: The outcome variable is continuous; The procedures are similar to those for classification trees; The prediction is the average value of the target variable; All of the above are true

All of the above are true

Which of the following statements about a ROC curve is true? It stands for "Receiver Operating Characteristic Curve"; It plots the false positive rate (1 - specificity) & true positive rate (sensitivity); Each point on a ROC curve corresponds to a particular confusion matrix that depends on a specific threshold or cutoff; All of the above statements are true;

All of the above statements are true

The choice of k (number of clusters in a Cluster Analysis) can be made using a variety of methods. Which of the following methods is appropriate for selecting the number of clusters? Based on subject-matter experts; Based on convenience; Based on constraints; Arbitrarily select k; All of the methods listed above are appropriate

All of the methods listed above are appropriate

If a null hypothesis is rejected at a significance level of .01, it will ______ be rejected at a significance level of .05. Always; Sometimes; Never

Always

In testing the difference between two independent population means, it is assumed that the level of measurement of the variables is at least _____. An interval scale; A ratio scale; An ordinal scale; A nominal scale

An interval scale

Which of the following is a violation of one of the major assumptions of the simple regression model? As the value of x increases, the value of the error term also increases; Histogram of the residuals forms a bell-shaped symmetric curve; The error terms are independent of each other; The error term shows no pattern over time

As the value of x increases, the value of the error term also increases

The following business question - What combination of products do customers frequently purchase together on our website? - can be answered by building a _____ DM model. Clustering; Classification; Prediction; Association

Association

Data has been collected on visitors' viewing habits at a bank's website. Which technique is used to identify pages commonly viewed during the same visit to the website? Association rules; Regression; Correlation; Clustering

Association rules

________ is an iterative variable selection procedure that allows an independent variable to be deleted from a multiple regression model during the next iteration. Forward elimination; Backward elimination; Stepwise regression; Mixed elimination

Backward elimination

Which of the following data visualization charts might include "whiskers?" Pareto chart; Stem & leaf plot; Pie chart; Boxplot

Boxplot

Which of the following tasks would NOT be part of the data understanding phase in CRISP-DM? Exploring the data; Collecting the data; Building the model; Describing the data

Building the model

A survey handed out to customers in a local mall asked the following questions: marital status - including single (s), married (m), widowed (w), divorced (d); annual household income (in $); age (in years); and overall satisfaction with city services (on a scale from 1 to 5, with 1 being poor & 5 being excellent). For the variable Marital Status (S/M/W/D), identify the data type. Categorical/Nominal; Categorical/Ordinal; Continuous

Categorical/Nominal

A survey handed out to customers in a local mall asked the following questions: marital status -- including single (S), married (M), widowed (W), divorced (D); annual household income (in $); age (in years); and overall satisfaction with city services (on a scale from 1 to 5 with 1 being poor and 5 being excellent). Select the appropriate test/model to determine if city service satisfaction level & marital status are independent. Linear regression; Chi-square test; T-test; ANOVA

Chi-square test

The following business question - What customer characteristics best predict whether a customer will default on a bank loan or commit fraud? - can be answered by building a ______ DM model. Classification; Prediction; Clustering; Association

Classification

Which of the following is an unsupervised data mining technique? ANOVA; Cluster Analysis; Logistic Regression; Linear Regression; Decision Tree

Cluster Analysis

The ________ measures the percentage of the variation in y (response variable) explained by the multiple regression model or the set of independent variables included in the multiple regression equation. Standard error; Correlation coefficient; Coefficient of determination; Total variation; F-test

Coefficient of determination

An acceptable residual plot exhibits: A curve pattern; Decreasing error variance; Constant error variance; Increasing error variance

Constant error variance

"Would you like fries with sandwich?" is an example of which specific business application of market basket analysis? Up-sell; Prison cell; Cross-sell; Down-sell

Cross-sell

All of the following are correct except one when describing the present day analytics. Decision focus; Fact-based decision making; Data driven; Entire organization; Centrally managed

Data driven

Which is not considered a defining characteristic of big data? Volume; Processing complexity; Data structure; Data quality

Data quality

In logistic regression, if we change the cutoff value from .5 (the default) to a cutoff value of .7 we would expect it to: Increase the number of false positives; Decrease the number of false positives; Have no effect on the number of false positives

Decrease the number of false positives
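
The effect of the cutoff on false positives can be illustrated with a small sketch. The predicted probabilities and labels below are hypothetical, invented only to show the direction of the change; they do not come from any real model.

```python
# Hypothetical (predicted probability, true label) pairs; 1 = positive, 0 = negative.
predictions = [
    (0.95, 1), (0.80, 1), (0.65, 0), (0.60, 1),
    (0.45, 0), (0.40, 1), (0.35, 0), (0.10, 0),
]


def false_positives(cutoff: float) -> int:
    """Count actual negatives that are classified positive at this cutoff."""
    return sum(1 for prob, label in predictions if prob >= cutoff and label == 0)


# Raising the cutoff classifies fewer cases as positive, so false positives fall;
# lowering it has the opposite effect.
fp_by_cutoff = {cutoff: false_positives(cutoff) for cutoff in (0.3, 0.5, 0.7)}
print(fp_by_cutoff)
```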

Which of the following is not one of the four drivers of business analytics? Comply with regulatory requirements; Desire to analyze big data; Predict new business opportunity; Desire to optimize business operations

Desire to analyze big data

You have been assigned to do a study of the daily revenue effect of a pricing model of online transactions. All the data currently available to you has been loaded into your analytics database; revenue data, pricing data, and online transaction data. You have done visualization and univariate analysis of all the data and have decided that there are three different models that you would like to build & test. What is your next step? Ask the business owner which model they prefer & proceed with that model; Develop all three models & select model based on the results; Pick the model that demonstrates the most sophisticated technique; Prioritize the models by complexity & proceed with the simplest

Develop all three models & select model based on the results

These are the reasons why companies use analytics except Government regulations protect companies; Proprietary technologies are copied; Geographic advantages do not exist anymore; Getting difficult to differentiate based on products, services, or uses of IT among companies today

Government regulations protect companies

Multicollinearity between independent variables is severe if the variance inflation factor is: Substantially less than 1; Greater than 10; Between -1 and +1; Less than 5

Greater than 10
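
For a model with exactly two predictors, the variance inflation factor reduces to 1 / (1 - r^2), where r is the correlation between the two predictors. The sketch below computes it for hypothetical data; with more predictors, r^2 would be replaced by the R^2 from regressing one predictor on all the others.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF in the two-predictor case: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)


# Hypothetical predictor values, for illustration only.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 5, 4, 5]
vif = vif_two_predictors(x1, x2)
print(round(vif, 2))  # well below 10, so multicollinearity is not severe here
```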

What is the motivation for using CRISP-DM methodology? Explore all possible approaches; Limits the amount of data needed; Generate positive results; Guarantees a successful project

Guarantees a successful project

The assumption of constant error variance of residuals in regression analysis is called: Homoscedasticity; Normality; Heteroscedasticity; Linearity

Homoscedasticity

Where would you most likely see a dendrogram? In a k-means clustering algorithm; In a neural network model; In decision tree analysis; In a hierarchical clustering algorithm; None of the above

In a hierarchical clustering algorithm

Which of the following questions is typically addressed via a business intelligence project? What will be the impact if we acquire a competing product? What is the optimal product mix? Why are we losing 10% of our most valuable customers? In which country did we have the highest sales last quarter?

In which country did we have the highest sales last quarter?

In logistic regression, if we change the cutoff value from 0.5 (the default) to a cutoff value of 0.3 we would expect it to: Increase the number of false positives; Decrease the number of false positives; Have no effect on the number of false positives

Increase the number of false positives

Assuming a fixed sample size, as alpha (Type I error) decreases, beta (Type II error) ______. Stays the same; Increases; Randomly fluctuates; Decreases

Increases

In a multiple regression analysis, if the normal probability plot ___________, then it can be concluded that the assumption of normality is not violated. Is left-skewed; Is greatly curved; Is a straight line; Has the shape of a parabola that opens upward; Has the shape of a symmetric bell curve

Is a straight line

If the wages of workers for a given company are normally distributed with a mean of $15 per hour, then the proportion of the workers earning more than $13 per hour: Is less than the proportion earning more than the mean wage; Is less than 50%; Is greater than the proportion earning less than $13 per hour; Is greater than the proportion earning less than $18 per hour

Is greater than the proportion earning less than $13 per hour

You want to group customers in your dataset by similarity and assign labels to each group. What is the preferred analytic method to use for this task? Market basket analysis; Logistic regression; K-means clustering; Decision trees

K-means clustering

An odds ratio ____________ 1 indicates that the condition or event is less likely to occur in the first group. Equal to; Less than; Not equal to; Greater than

Less than
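
An odds ratio compares the odds of an event in one group with the odds in another. The counts below are hypothetical and only illustrate why a ratio below 1 means the event is less likely in the first group.

```python
def odds(events: int, non_events: int) -> float:
    """Odds of the event: events divided by non-events."""
    return events / non_events


# Hypothetical counts, for illustration only.
odds_group_1 = odds(events=10, non_events=90)  # first group
odds_group_2 = odds(events=30, non_events=70)  # second group

# Ratio of first-group odds to second-group odds.
odds_ratio = odds_group_1 / odds_group_2
print(round(odds_ratio, 3))
```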

A survey handed out to customers in a local mall asked the following questions: marital status - including single (S), married (M), widowed (W), divorced (D); annual household income (in $); age (in years); and overall satisfaction with city services (on a scale from 1 to 5 with 1 being poor and 5 being excellent). Select the appropriate test/model to determine if there is a relationship between age and household income. ANOVA; Linear regression; Chi-squared test/contingency table; T-test of means

Linear regression

You are tasked with predicting if a customer will purchase a product (Yes/No) when the customer visits the website and the probability of a purchase decision. You are provided with other relevant variables that are associated with the problem. Which analytical method would you recommend? Linear regression; ANOVA; Association rules; Logistic regression

Logistic regression

Which two analytical methods can be used for categorical target variables? Linear regression & decision tree; Linear regression & logistic regression; Decision tree & cluster analysis; Cluster analysis & logistic regression; Logistic regression & decision tree

Logistic regression & decision tree

What is a distinct property of Logistic Regression compared to Linear Regression? Logistic regression handles missing values well; Logistic regression is robust with correlated predictor variables; Logistic regression works very well with a continuous target variable; Logistic regression returns probability estimates of a response variable

Logistic regression returns probability estimates of a response variable

The type of association rule that is associated with items being purchased together (at the same time) is called _____. Sequence; Retroactive; Lift; Market basket

Market basket

In order to test the effectiveness of a drug called XZR designed to reduce cholesterol levels, 9 heart patients' cholesterol levels are measured before they are given the drug. The same 9 patients use XZR for two continuous months. After two months of continuous use the 9 patients' cholesterol levels are measured again. The comparison of cholesterol levels before vs. after administering the drug is an example of testing the difference between: Two population proportions; Two means from independent populations; Matched pairs from two dependent populations; Two population variances from independent populations

Matched pairs from two dependent populations

If a population distribution is skewed to the right, then, given a random sample from that population, one would expect that the Median would be greater than the mean; Mode would be equal to the mean; Median would be less than the mean; Median would be equal to the mean

Median would be less than the mean
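
A quick check with a small right-skewed sample shows the pattern: the long right tail pulls the mean above the median. The data values are hypothetical.

```python
from statistics import mean, median

# Hypothetical right-skewed sample: most values are small, one is very large.
sample = [1, 2, 2, 3, 4, 5, 20]

sample_mean = mean(sample)
sample_median = median(sample)
print(sample_median, round(sample_mean, 2))
```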

In which CRISP-DM phase is data partitioned & training data & validation data sets are created? Modeling; Data understanding; Data preparation; Evaluation; Business understanding

Modeling

Which data asset is an example of unstructured data? Database table; XML data file; Webserver log file; News article text

News article text

An investigator hired by a client suing for sex discrimination has developed a multiple regression model for employee salaries for the company in question. In this multiple regression model, the salaries are in thousands of dollars. For example, a data entry of 35 for the dependent variable indicates a salary of $35,000. The indicator (dummy) variable for gender is coded as X1=0 if male and X1=1 if female. The computer output of this multiple regression model shows that the coefficient for this variable (X1) is -4.2. The t test showed that X1 was significant at Alpha =0.05. This result implies that for male and female workers of the company: On average, salaries do not differ; On average, males earn $4,200 less; On average, females have 4.2 more years of experience; On average, males have 4.2 more years of experience; On average, females earn $4,200 less

On average, females earn $4,200 less

Data mining models can do the following except: Optimize taste & price of a product; Predict sales based on historical data; Assign customers to different segments; Classify into groups based on characteristics; Associate products that are bought together

Optimize taste & price of a product

Assumptions of a regression model can be evaluated by plotting and analyzing the _____. Against the predicted value of Y; Over time; On a normal plot; All of the above

Over time

The data-mining consultant meets with the vice-president for marketing, who says that he would like to move forward with improving customer relationship. Identify the CRISP-DM phase for this task. Phase 1; Phase 2; Phase 3; Phase 5; Phase 6

Phase 1

The data mining project manager meets with the data-warehousing manager to discuss how the data will be collected. Identify the CRISP-DM phase for this task. Phase 1; Phase 2; Phase 3; Phase 5; Phase 6

Phase 2

The analysts meet to discuss whether the neural network or decision tree models should be implemented. Identify the CRISP-DM phase for this task. Phase 1; Phase 2; Phase 3; Phase 4; Phase 6

Phase 4

Managers want to know by next week whether deployment will take place. Therefore, analysts meet to discuss how useful and accurate their model is. Identify the CRISP-DM phase for this task. Phase 1; Phase 2; Phase 3; Phase 5; Phase 6

Phase 5

The graph of the prediction equation obtained from the following model y = b0 + b1x1 + b2x2 + e is a(n): Line; Exponential curve; Plane; Parabola

Plane

The following business question - How do bank customer demographics & financial stability impact customer credit risk level - can be answered by building a ______ DM model. Association; Clustering; Prediction; Classification

Prediction

The following business question - How much can we expect this customer to spend on the basis of customer's shopping behavior & demographics - can be answered by building a _____ model. Association; Classification; Prediction; Clustering;

Prediction

_____ is a form of analytics which examines data to answer the question "what is going to happen?" Descriptive; Exploratory; Predictive; Prescriptive

Predictive

WNBA scouts research UA basketball star Rachel Tecca's college scoring history to estimate how much they would blow away the other teams if she signed with them. This is an example of Predictive analytics; Prescriptive analytics; Descriptive analytics; None of the above

Predictive analytics

______ is a form of analytics which examines data to answer the question "what should be done?" or "what can we do to make XYZ happen?" Descriptive; Exploratory; Predictive; Prescriptive

Prescriptive

__________ is a form of analytics which examines data to answer the question "what should be done?" Descriptive; Exploratory; Predictive; Prescriptive

Prescriptive

An experiment studies the number of tickets written each day by campus police for illegal parking by The University of Akron students. This variable is: Ratio; Nominal; Ordinal; Interval

Ratio

Which of the following is not true about the regression tree DM technique? Variable selection & reduction is automatic; Easy to understand; Requires assumptions of statistical models; Produces rules that are easy to interpret & implement

Requires assumptions of statistical models

The dependent variable, the variable of interest in an experiment, is also called ______ variable. Categorical; Factor; Regression; Response

Response

If one of the assumptions of the regression model is violated, performing data transformations on the ____________ may remedy the situation: Independent variable; Slope; Predictor variable; Response variable

Response variable

The Y intercept (b0) in a multiple regression model represents the estimated value of the _____ variable, when the values of all independent variables are ____. Response, one; Response, zero; Predictor, zero; Predictor, one

Response, zero

Which activity is performed in the deployment phase of the CRISP-DM process? Evaluate results; Produce project plan; Assess whether the model met business objectives or not; Review project; Try different analytical techniques

Review project

You received 100,000 home loan records. You want to take a quick look and see if there is any relationship between mortgage age and mortgage amount before conducting advanced analytics. Which graphical tool would you employ for your preliminary analysis? Histogram; Boxplot; Stacked bar chart; Scatterplot

Scatterplot

In simple regression analysis the quantity that gives the amount by which Y (dependent variable) changes for a unit change in X (independent variable) is called the Correlation coefficient; Y-intercept of the regression line; Standard error; Slope of the regression line; Coefficient of determination

Slope of the regression line

An anti-theft scanner at an entrance of a bookstore buzzes once for every 1000 innocent people walking through the scanner. The accuracy of the scanner is Specificity of 99.9%; Sensitivity of 99.9%; None of the above; Do not have enough info to answer this question

Specificity of 99.9%
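
Specificity is the share of true negatives among all actual negatives, so one false alarm per 1,000 innocent people gives 999 / 1,000. The sketch below makes that explicit. Sensitivity cannot be computed because the question gives no information about how the scanner behaves with actual thieves.

```python
# From the question: 1 false alarm for every 1,000 innocent people.
innocent_people = 1000
false_positives = 1
true_negatives = innocent_people - false_positives

# Specificity = TN / (TN + FP); false positive rate = 1 - specificity.
specificity = true_negatives / (true_negatives + false_positives)
false_positive_rate = false_positives / (true_negatives + false_positives)
print(f"{specificity:.1%}", f"{false_positive_rate:.1%}")
```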

The z-value tells us the number of _____ that a value of x is from the mean. Standard deviations; Variances; Standard means; None of the above

Standard deviations

Insight provides the following benefit except Helps us understand cause & effect; Helps us understand & solve problems; Tells us what happened in the past; Provides us foresight

Tells us what happened in the past

What are the outputs generated by a k-Means clustering Analysis? The centroids of the discovered cluster and the assignment of each input datum to a cluster; The rules that associate each input datum to a class and the diameter of the discovered clusters; Within Sum of Squares for each discovered cluster and the overall cluster dispersion; Class association for each datum and class probabilities;

The centroids of the discovered cluster and the assignment of each input datum to a cluster.
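
The two outputs named in the answer, centroids and cluster assignments, can be seen in a minimal one-dimensional sketch of the usual assign-then-update loop. The data, starting centroids, and function name are hypothetical; real tools work on multi-dimensional data and choose starting points differently.

```python
def k_means_1d(points, centroids, iterations=10):
    """Simple k-means on 1-D data; returns (centroids, assignments)."""
    centroids = list(centroids)  # copy so the caller's list is not modified
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda c: abs(p - centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, assignments


# Hypothetical data with two obvious groups.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, assignments = k_means_1d(points, centroids=[1.0, 12.0])
print(centroids, assignments)
```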

In a multiple regression model, the ratio of MSRegression/MSError yields which statistic, used to test the overall model? The chi-square statistic; The wrong statistic; The t-statistic; The F-statistic

The F-statistic

How is False Positive Rate defined? The number of true positives/all positives; The number of true negatives/all positives; The fraction of negative instances that were misclassified; The fraction of positive instances that were misclassified

The fraction of negative instances that were misclassified

In simple regression analysis, if the correlation coefficient (r) between the dependent and independent variable is a positive value, then The slope of the regression line must be positive; The y-intercept must also be positive; The coefficient of determination can be either positive or negative, depending on the value of the slope; The least squares regression equation could either have a positive or negative slope

The slope of the regression line must be positive

A decision maker wants to incorporate geographic region into a multiple regression model, where the possible responses are north, south, east, and west. How many dummy variables will be added to the model? One; Two; Three; Four

Three
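
A categorical variable with four levels needs three dummy variables because one level acts as the baseline and is represented by all zeros. The sketch below assumes "north" as the baseline, which is an arbitrary choice; the column names are made up for illustration.

```python
REGIONS = ["north", "south", "east", "west"]
BASELINE = "north"  # assumption: any one level can serve as the reference


def dummy_encode(region: str) -> dict:
    """Return k - 1 indicator variables; the baseline maps to all zeros."""
    return {
        f"is_{level}": int(region == level)
        for level in REGIONS
        if level != BASELINE
    }


east_row = dummy_encode("east")
north_row = dummy_encode("north")
print(east_row)
print(north_row)
```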

Which of the following techniques is NOT a common data mining technique? Association; Prediction; Classification; Training

Training

The rejection rate of a true null hypothesis is called a ____ error. Type I; Beta; Random; Type II

Type I

For a fixed sample size, the lower we set alpha, the higher the ______. Random error; Type II error; Type I error; P-value

Type II error

______ refers to the accuracy or trustworthiness of the big data. Volume; Value; Veracity; Velocity; None of the above

Veracity

The business understanding phase includes the following except Initial assessment of tools & techniques; Determine business goals & objectives; Determine data mining goals & objectives; Visualize data

Visualize data

Which of the following is a categorical variable? Bank account balance; Value of company; Whether a person has a traffic violation; Daily sales in a store; Air temperature

Whether a person has a traffic violation

A Lift ratio of 23 for the Market Basket association rule "If X Then Y" indicates . . . Purchases of X and Y have no significant relationship; X being purchased moderately decreases the chance of Y also being purchased; X being purchased moderately increases the chance of Y also being purchased; X being purchased strongly increases the chance of Y also being purchased; X being purchased strongly decreases the chance of Y also being purchased;

X being purchased strongly increases the chance of Y also being purchased;
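
Lift compares how often X and Y are bought together with how often that would happen if the two were independent. A lift of 1 means no relationship, and values far above 1 mean a strong positive association. The transactions below are hypothetical, and the resulting lift is much smaller than the 23 in the question.

```python
# Hypothetical transactions, for illustration only.
transactions = [
    {"X", "Y"}, {"X", "Y"}, {"X", "Y"}, {"X"}, {"Y"},
    {"Z"}, {"Z"}, {"Z"}, {"Z"}, {"Z"},
]

n = len(transactions)
count_x = sum(1 for t in transactions if "X" in t)
count_y = sum(1 for t in transactions if "Y" in t)
count_both = sum(1 for t in transactions if {"X", "Y"} <= t)

# lift = P(X and Y) / (P(X) * P(Y)), written with counts to avoid rounding noise.
lift = (count_both * n) / (count_x * count_y)
print(lift)
```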

If we are testing the significance of the independent variable X1, and we reject the null hypothesis H0: b1=0, we conclude that: X1 is not significantly related to Y; X1 is an unimportant independent variable; B1 is significantly related to the dependent variable Y; X1 is significantly related to Y

X1 is significantly related to Y

The __________ of the simple linear regression model (Y = B0 + B1*X) is the value of Y when the value of X is zero. Coefficient of X; Error; Y-intercept; Slope

Y-intercept
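
The slope and Y-intercept of a simple linear regression can be computed directly from the least-squares formulas. The sketch below uses hypothetical points that lie exactly on Y = 1 + 2X, so the intercept is the value of Y at X = 0 and the slope is the change in Y per unit change in X.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data lying exactly on Y = 1 + 2X.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
intercept, slope = least_squares_line(xs, ys)
print(intercept, slope)
```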

In a decision tree algorithm, how is the attribute picked for the next split? You pick the attribute with the lowest logworth; You pick the attribute where the conditional entropy is higher than the base entropy; You pick the attribute where the conditional entropy is maximum; You pick the attribute with the highest logworth

You pick the attribute with the highest logworth

A standard normal distribution has a mean of ___ & a standard deviation of ___. One, zero; Zero, one; One, one; Zero, zero

Zero, one

