Business Analytics quiz 1-5

In testing the difference between two independent population means, it is assumed that the level of measurement of the variable is at least _____________. Options: an ordinal scale; a nominal scale; a ratio scale; an interval scale

an interval scale

If 100 megabytes of storage cost a penny, a terabyte of storage would cost _____. Options: $1,000; $1,000,000; $100; $10,000; $100,000

$100

If 100 megabytes of storage cost a penny, a petabyte of storage would cost _____. Options: $1,000,000; $100; $10,000; $1,000; $100,000

$100,000
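The arithmetic behind both storage answers can be checked in a few lines. This sketch assumes decimal (SI) units, 1 TB = 10^6 MB and 1 PB = 10^9 MB; with binary units (1 TiB = 1,048,576 MiB) the figures would come out slightly higher. The names `MB_PER_UNIT` and `storage_cost_dollars` are illustrative only.

```python
# Assumption: decimal (SI) units, so 1 TB = 10**6 MB and 1 PB = 10**9 MB.
MB_PER_UNIT = {"TB": 10**6, "PB": 10**9}

CENTS_PER_100_MB = 1  # "100 megabytes of storage cost a penny"


def storage_cost_dollars(unit: str) -> float:
    """Return the cost in dollars of one `unit` of storage."""
    blocks_of_100_mb = MB_PER_UNIT[unit] // 100
    cents = blocks_of_100_mb * CENTS_PER_100_MB
    return cents / 100  # 100 cents per dollar


print(storage_cost_dollars("TB"))  # 100.0
print(storage_cost_dollars("PB"))  # 100000.0
```

Each step up the scale (MB to GB to TB to PB) multiplies the cost by 1,000, which is why the petabyte answer is exactly 1,000 times the terabyte answer.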

If two variables do not have a strong (linear) relationship, the correlation coefficient between the two variables will be close to _____. Options: -1; +1; 0; None of the above

0
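The word "linear" in the question matters: Pearson's r measures only straight-line association. In the sketch below, with made-up data and a hand-written `pearson_r` helper, a perfect straight line gives r = 1, while a perfect parabola, which is a strong but non-linear relationship, gives r = 0.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [-2, -1, 0, 1, 2]
linear = [2 * v + 1 for v in x]  # perfect straight-line relationship
curved = [v ** 2 for v in x]     # strong, but not linear, relationship

print(pearson_r(x, linear))  # 1.0
print(pearson_r(x, curved))  # 0.0
```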

If the random variable x is normally distributed, ______% of all possible observed values of x will be within two standard deviations of the mean. Options: 95.44; 85.00; 99.73; 68.26

95.44

According to the empirical rule, the percentage of data that will fall within 3 standard deviations of the average in a normal distribution is approximately _____. Options: 99%; 95%; 90%; 75%; 89%

99%
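The empirical-rule percentages follow from the normal CDF: for a standard normal Z, P(-k < Z < k) = erf(k / sqrt(2)). The sketch below uses only `math.erf` from the standard library; `share_within` is an illustrative name. The exact values are 68.27%, 95.45%, and 99.73%; the textbook figures 68.26% and 95.44% come from doubling rounded z-table entries (for example 2 × 0.4772 = 0.9544).

```python
import math


def share_within(k: float) -> float:
    """Share of a normal distribution within k standard deviations of the mean.

    For a standard normal Z, P(-k < Z < k) = erf(k / sqrt(2)).
    """
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.2%}")
# within 1 SD: 68.27%
# within 2 SD: 95.45%
# within 3 SD: 99.73%
```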

In testing the difference between two independent population means, it is assumed that the level of measurement is at least _____________. Options: a ratio variable; a nominal variable; an interval variable; an ordinal variable

an interval variable

An experiment studies the number of tickets written each day by campus police for illegal parking by The University of Akron students. This variable is _____. Options: Nominal; Ordinal; Ratio; Interval

Ratio

Which activity is performed in the Deployment phase of the CRISP-DM process? Options: Try different analytical techniques; Evaluate results; Produce project plan; Assess whether the model met business objectives or not; Review project

Review project

A stem-and-leaf plot is similar to a histogram but is usually a more informative display of relatively small data sets

True

A histogram is only appropriate for variables whose values are numerical and measured on an interval or ratio scale

True

A one-way ANOVA is a method that allows us to estimate and compare the effects of several treatments on a response variable

True

A statistical hypothesis is a statement about the value of a population parameter

True

A two-tailed test is one in which Ha indicates no direction and uses ≠ (not equal to)

True

An outlier is an observation in a data set that is far removed in value from the others in the data set

True
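One common rule of thumb for "far removed" (Tukey's fences) flags a value as a potential outlier when it lies more than 1.5 × IQR below the first quartile or above the third quartile. The sketch below uses Python's `statistics.quantiles`, whose default "exclusive" method is only one of several quartile conventions, so other software may place the fences slightly differently. The data and the helper name `iqr_outliers` are invented for illustration. Flagging a value is not the same as deleting it; whether to remove it is a separate judgment.

```python
import statistics


def iqr_outliers(data, k=1.5):
    """Return values outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # default method="exclusive"
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in data if v < low or v > high]


daily_counts = [10, 12, 11, 13, 12, 11, 14, 12, 40]
print(iqr_outliers(daily_counts))  # [40]
```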

Big data can generate significant financial value across many companies and industry sectors

True

Big data make companies smart

True

CRISP-DM is an iterative process

True

Data Mining is a complex process requiring various tools and different people

True

Data mining should be viewed as a process

True

Getting data in order is so critical to analytics that most organizations have to undertake substantial data management efforts before they can do a lot of analysis

True

In multiple regression analysis, a t-test is used to test the significance of an individual independent variable

True

In a supervised data mining (DM) technique, the algorithms learn which values of the target variable are associated with which values of the predictor variables

True

Pie and bar charts are used to summarize nominal and ordinal data

True

Predicting whether a company will go bankrupt based on comparing its financial data to those of similar bankrupt and non-bankrupt firms is a supervised technique

True

Quartiles are values that divide a sample of data into four groups containing (as far as possible) equal numbers of observations

True
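A quick way to see quartiles split data into four equal-sized groups is Python's `statistics.quantiles` with `n=4`. This is a sketch with made-up data; the default "exclusive" interpolation method is one of several conventions, so spreadsheet or other statistics software may report slightly different cut points for the same data.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8]
q1, q2, q3 = statistics.quantiles(data, n=4)  # default method="exclusive"
print(q1, q2, q3)  # 2.25 4.5 6.75

# Assign each value to one of the four groups bounded by the quartiles.
groups = [
    [v for v in data if v < q1],
    [v for v in data if q1 <= v < q2],
    [v for v in data if q2 <= v < q3],
    [v for v in data if v >= q3],
]
print([len(g) for g in groups])  # [2, 2, 2, 2]
```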

Regression analysis is an example of a supervised learning technique

True

The error term in the regression model describes the effects of all factors other than the independent variables on y (response variable)

True

The mean and median are the same for a normal distribution

True

The residual is the difference between the observed value of the dependent variable and the predicted value of the dependent variable

True
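To make the residual definition concrete, the sketch below fits a simple least-squares line by hand and computes each residual as observed minus predicted. The data and the helper name `fit_simple_ols` are hypothetical. For an ordinary least-squares fit with an intercept, the residuals sum to zero (up to floating-point rounding).

```python
def fit_simple_ols(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


x = [1, 2, 3, 4]
y = [2, 3, 5, 6]
b0, b1 = fit_simple_ols(x, y)  # intercept 0.5, slope 1.4 for this data
predicted = [b0 + b1 * v for v in x]
residuals = [obs - pred for obs, pred in zip(y, predicted)]  # observed - predicted
print([round(r, 2) for r in residuals])  # [0.1, -0.3, 0.3, -0.1]
```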

The three basic building blocks of business analytics are technology, process, and people

True

The yearly amount of snowfall in Akron over 10 years can be categorized as Ratio data

True

For a fixed sample size, the lower we set alpha, the higher is the __________. Options: Type I error; Type II error; Random error; p-value

Type II error
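The alpha/beta trade-off can be seen numerically. This sketch assumes a one-sided (upper-tail) z-test with known σ and a true mean that sits `effect` standard deviations above the null value; under those assumptions β = Φ(z₁₋α − effect·√n). The effect size (0.5) and sample size (16) are hypothetical. With them, lowering alpha from 0.10 to 0.01 raises β from roughly 0.24 to roughly 0.63.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def type_ii_error(alpha: float, effect: float, n: int) -> float:
    """Beta for an upper-tail z-test when the true mean is `effect` SDs above mu0."""
    z_crit = std_normal.inv_cdf(1 - alpha)  # rejection cutoff on the z scale
    return std_normal.cdf(z_crit - effect * n ** 0.5)


for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(type_ii_error(alpha, effect=0.5, n=16), 3))
```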

______________ refers to the accuracy or trustworthiness of big data. Options: Volume; Value; Veracity; Velocity; None of the above

Veracity

The business understanding phase includes all of the following except _____. Options: Determine business goals & objectives; Initial assessment of tools and techniques; Determine data mining goals & objectives; Visualize data

Visualize data

Records that have outlier values should always be removed from a dataset during data preparation and cleaning

False

Which of the following statements about this Pareto chart is false? It is a specialized form of a bar chart. It is a visualization tool for continuous data only. It is based on the "80/20 rule." All of the above statements are false.

It is a visualization tool for continuous data only.

Data has been collected on visitors' viewing habits at a bank's website. Which technique is used to identify pages commonly viewed during the same visit to the website? Options: Classification; Association; Prediction; Clustering

Association
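The core of association analysis is counting which items occur together. The sketch below counts page pairs that appear in the same visit and reports their support (the share of visits containing both pages), which is the first step of market-basket style rule mining such as Apriori; a full rule miner would also compute confidence and lift. The sessions and page names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sessions: the set of pages viewed in each website visit.
sessions = [
    {"home", "loans", "rates"},
    {"home", "rates"},
    {"loans", "rates"},
    {"home", "careers"},
    {"home", "rates", "contact"},
]

pair_counts = Counter(
    pair
    for pages in sessions
    for pair in combinations(sorted(pages), 2)  # sorted -> consistent pair order
)

for pair, count in pair_counts.most_common(2):
    support = count / len(sessions)  # share of visits containing both pages
    print(pair, count, support)
# ('home', 'rates') 3 0.6
# ('loans', 'rates') 2 0.4
```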

An acceptable residual plot exhibits _____. Options: Constant error variance; Increasing error variance; A curved pattern; Decreasing error variance

Constant error variance

Which of the following is not one of the four drivers of business analytics? Options: Desire to analyze big data; Desire to optimize business operations; Predict new business opportunity; Comply with regulatory requirements

Desire to analyze big data

Data was collected on students at a nearby university on their GPA at the end of their freshman year and their living accommodations (dorm, off-campus, and other). For which form of housing is the distribution of GPAs the most symmetric (least skewed)? Options: Dorm; Off-campus; Other; None of the above

Dorm

A box plot can be used to summarize nominal data

False

A model's accuracy rate on the training data set is a better measure of the model's predictive ability than its accuracy rate on the validation data set

False

A negative correlation coefficient (r) implies a weak relationship among the variables

False

A scatterplot can be drawn with a set of two categorical data

False

Analytics help companies to be efficient but not effective

False

Business Intelligence is a subset of Business Analytics

False

CRISP-DM model is dependent on industry sector and technology used

False

CRISP-DM process model consists of five phases

False

For a hypothesis test about a population mean, if the level of significance (alpha) is less than the p-value, the null hypothesis is rejected

False
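The decision rule runs the other way: reject H0 only when the p-value is at or below alpha. The sketch below assumes a two-sided one-sample z-test with known σ; the numbers (sample mean 52, μ0 = 50, σ = 5, n = 25) are hypothetical and give z = 2.0. Some texts write the rule with a strict "<"; the conclusion only differs when p equals alpha exactly.

```python
from statistics import NormalDist


def two_sided_z_p_value(sample_mean, mu0, sigma, n):
    """Two-sided p-value for a one-sample z-test with known sigma."""
    z = (sample_mean - mu0) / (sigma / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def reject_null(p_value, alpha):
    """Reject H0 only when the p-value is at or below alpha."""
    return p_value <= alpha


p = two_sided_z_p_value(sample_mean=52, mu0=50, sigma=5, n=25)  # z = 2.0
print(round(p, 4))           # 0.0455
print(reject_null(p, 0.05))  # True
print(reject_null(p, 0.01))  # False: alpha < p-value, so H0 is not rejected
```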

If the random variable of x is normally distributed, 68.26% of all possible observed values of x will be within two standard deviations of the mean

False

In a simple linear regression model, the coefficient of determination (R-squared) not only indicates the strength of the relationship between independent and dependent variables, but also shows whether the relationship is positive or negative

False
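In simple linear regression R² equals r², and squaring discards the sign. The sketch below, with invented data and a hand-written `pearson_r` helper, mirrors one data set to reverse its direction: r changes sign while R² stays the same.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 5]
y_down = [-v for v in y_up]  # same spread, opposite direction

for label, y in (("positive", y_up), ("negative", y_down)):
    r = pearson_r(x, y)
    print(label, round(r, 4), round(r ** 2, 4))
# positive 0.7746 0.6
# negative -0.7746 0.6
```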

In multiple regression analysis, if the simple correlation coefficient (r_xy) between the dependent variable and one of the independent variables is .95, then this indicates that multicollinearity exists

False

Most charts, graphs, and other visualizations would be considered predictive models

False

Predictive analytics involves more mathematically complex models than prescriptive analytics

False

The controller of a chain of toy stores is interested in determining whether there is any difference in the weekly sales of store 1 and store 2. The weekly sales are normally distributed. This problem should be analyzed using one-way ANOVA

False

The final stage of the CRISP-DM data mining process is "Evaluation."

False

The five Vs of Big Data are volume, velocity, volatility, variety, and veracity

False

The three basic building blocks of business analytics are technology, data, and people

False

The variance inflation factor measures the correlation between the dependent variable and the rest of the independent variables in the regression model

False

This is a valid alternate hypothesis: The average weight of desks made on assembly line one is same as the average weight of desks made on assembly line two

False

This is a valid null hypothesis: The average weight of desks made on assembly line one is different from the average weight of desks made on assembly line two

False

Today, business analytics are data driven rather than business driven

False

Two variables (x1 and y) have a high correlation coefficient (r_x1y). Therefore, it can be concluded that changes in x1 cause y to change

False

Unstructured data increases the veracity in the data

False

When the F test is used to test the overall significance of a multiple regression model, if the null hypothesis is rejected, it can be concluded that all of the independent variables X1, X2, X3,... Xk are significantly related to the dependent variable Y

False

When using simple regression analysis, if there is a strong correlation between the independent and dependent variable, then we can conclude that an increase in the value of the independent variable causes an increase in the value of the dependent variable

False

Multicollinearity between independent variables is severe if the variance inflation factor is _____. Options: Greater than 10; Less than 5; Substantially less than 1; Between -1 and +1

Greater than 10
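The VIF for predictor j is 1 / (1 − R_j²), where R_j² comes from regressing predictor j on all the other predictors (not on the dependent variable). In the special case of exactly two predictors, R_j² is just the squared correlation between them, which links this rule of thumb to the ".95 correlation" rule used elsewhere in this set: r = 0.95 gives a VIF just above 10. The helper name `vif_two_predictors` is illustrative, and the two-predictor shortcut does not carry over to larger models.

```python
def vif_two_predictors(r12: float) -> float:
    """VIF when a model has exactly two predictors whose correlation is r12.

    In general VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    predictor j on all other predictors; with only two predictors R_j^2 = r12**2.
    """
    return 1 / (1 - r12 ** 2)


for r in (0.5, 0.9, 0.95):
    vif = vif_two_predictors(r)
    print(r, round(vif, 2), "severe" if vif > 10 else "not severe")
# 0.5 1.33 not severe
# 0.9 5.26 not severe
# 0.95 10.26 severe
```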

Please identify the type of visualization shown in the graphic. Options: Bar Graph; Histogram; Stem & Leaf Plot; Pareto Plot; Scatterplot

Histogram

Which of the following questions is typically addressed via a Business Intelligence project? What will be the impact if we acquire a competing product? What is the optimal product mix? Why are we losing 10% of our most valuable customers? In which country did we have the highest sales last quarter?

In which country did we have the highest sales last quarter?

The primary use of stepwise regression is to identify the most significant ___________ that should be included in the multiple regression model. Options: Dependent variables; Dummy variables; Independent variables; Correlated variables

Independent variables

A survey handed out to customers in a local mall asked the following questions: marital status -- including single (S), married (M), widowed (W), divorced (D); annual household income (in $); age (in years); and overall satisfaction with city services (on a scale from 1 to 5 with 1 being poor and 5 being excellent.) Select the appropriate test/model to determine if there is a relationship between age and household income. Options: ANOVA; Linear Regression; Chi-Square test/Contingency model; t-test of means

Linear Regression

An important factor in selecting software for word-processing and database management systems is the time required to learn how to use the system. To evaluate three file management systems, a firm designed a test involving word processing operators. Since operator variability was believed to have an impact, each of the operators was trained on all three of the file management systems.

ANALYSIS OF VARIANCE TABLE
SOURCE      SUM OF SQ'S   D.F.   MEAN SQ.   F(D.F./8)   P-VALUE
System         103.33       2      51.67      56.36     0.000019
Operator        64.67       4      16.17      17.63     0.000494
Error            7.33       8       0.92
Total          175.33      14

If we are using an alpha value of .05, then we would conclude that A) differences exist among both systems and operators. B) differences exist among systems only. C) differences exist among operators only. D) no differences exist.

A) differences exist among both systems and operators.
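The table can be checked by recomputing each mean square as SS/df and each F as MS/MS(Error). Because each operator used all three systems, operators act as blocks, and both F ratios are tested against 8 error degrees of freedom. This sketch hard-codes the sums of squares from the question's table and the upper 5% critical values from a standard F table (F(2, 8) ≈ 4.46, F(4, 8) ≈ 3.84). The recomputed F values differ from the table in the second decimal only because the table's sums of squares are rounded.

```python
# Sums of squares and degrees of freedom copied from the ANOVA table in the question.
ss = {"System": 103.33, "Operator": 64.67, "Error": 7.33}
df = {"System": 2, "Operator": 4, "Error": 8}

# Upper 5% critical values of F with 8 denominator df, from a standard F table.
F_CRIT_05 = {"System": 4.46, "Operator": 3.84}  # F(2, 8) and F(4, 8)

ms_error = ss["Error"] / df["Error"]
for source in ("System", "Operator"):
    ms = ss[source] / df[source]
    f_stat = ms / ms_error
    significant = f_stat > F_CRIT_05[source]
    print(source, round(f_stat, 2), "significant" if significant else "not significant")
```

Both F statistics far exceed their critical values (and both p-values in the table are below .05), which is why answer A is correct.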

A survey handed out to customers in a local mall asked the following questions: marital status -- including single (S), married (M), widowed (W), divorced (D); annual household income (in $); age (in years); and overall satisfaction with city services (on a scale from 1 to 5 with 1 being poor and 5 being excellent.) Select the appropriate test/model to analyze differences in mean household income based on the four different levels of marital status. Options: t-test; Chi-square test/contingency table; Linear regression; ANOVA

ANOVA

Suppose everyone who visits our retail website either gets one of two promotional offers, or no promotion at all. We want to see if making the promotional offers makes a difference in sales. What statistical method would you recommend for this analysis? Options: P-test; Z-test; ANOVA; T-test

ANOVA
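With three groups (offer A, offer B, no offer), a one-way ANOVA compares the between-group mean square with the within-group mean square. The sketch below computes that F statistic from scratch; the sales figures and the helper name `one_way_anova_f` are hypothetical. A large F relative to the F(k − 1, n − k) critical value suggests the group means differ.

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA: MS(between) / MS(within)."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Hypothetical daily sales (in $1,000s) under each condition.
offer_a = [5, 6, 7]
offer_b = [8, 9, 10]
no_offer = [2, 3, 4]
print(one_way_anova_f([offer_a, offer_b, no_offer]))  # 27.0
```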

CRISP-DM is a hierarchical process model that consists of _____. Options: Phases; Generic tasks; Specialized tasks; Process instances; All of the above

All of the above

What is the benefit of running a pilot project during the final phase (Operationalize) of an analytics project? Options: Limit risk; Learn about performance constraints; Learn what is needed to retrain the model over time; All of the above

All of the above

Analytics are not applicable when _____. Options: You have no historical data; Variables are difficult to measure; There is no time to perform analysis (rapid decision making); Totally unstructured decisions; All of the above are true

All of the above are true

___________ is an iterative variable selection procedure that allows an independent variable to be deleted from a multiple regression model during the next iteration. Options: Mixed Elimination; Forward Elimination; Backward Elimination; Stepwise regression

Backward Elimination

Which of the following data visualization charts might include "whiskers"? Options: Stem and leaf plot; Pareto chart; Pie chart; Boxplot

Boxplot

Which of the following tasks would NOT be part of the data understanding phase in CRISP-DM? Options: Exploring the data; Collecting the data; Building the model; Describing the data

Building the model

We can test "goodness of fit" or "independence" of categorical variables by using which of the following distributions? A) Binomial distribution B) F-distribution C) Chi-Square distribution D) Normal distribution E) t-distribution

C) Chi-Square distribution
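For a test of independence, each expected count is (row total × column total) / grand total, and the statistic sums (observed − expected)² / expected over all cells, with (rows − 1)(columns − 1) degrees of freedom. The sketch below uses a hypothetical 2 × 2 table; the helper name `chi_square_independence` is illustrative. Its statistic of 4.0 with 1 df exceeds the 5% chi-square critical value of about 3.84, so independence would be rejected at alpha = .05.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


# Hypothetical counts: rows = two customer groups, columns = bought / did not buy.
observed = [[20, 30], [30, 20]]
print(chi_square_independence(observed))  # (4.0, 1)
```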

Which of the following statements is not a property of the normal probability distribution? A. The area under the normal curve to the right of the mean is equal to the area under the normal curve to the left of the mean. B. The normal distribution is symmetric. C. 95.44% of all possible observed values of the random variable x are within plus or minus three standard deviations of the population mean. D. The mean, median, and mode are equal.

C. 95.44% of all possible observed values of the random variable x are within plus or minus three standard deviations of the population mean.

A survey handed out to customers in a local mall asked the following questions: marital status -- including single (S), married (M), widowed (W), divorced (D); annual household income (in $); age (in years); and overall satisfaction with city services (on a scale from 1 to 5 with 1 being poor and 5 being excellent.) For the variable city service satisfaction (scale of 1-5), identify the type of data. Options: Categorical/Ordinal; Continuous; Categorical/nominal

Categorical/Ordinal

A survey handed out to customers in a local mall asked the following questions: marital status -- including single (S), married (M), widowed (W), divorced (D); annual household income (in $); age (in years); and overall satisfaction with city services (on a scale from 1 to 5 with 1 being poor and 5 being excellent.) For the variable marital status (S/M/W/D), identify the data type. Options: Categorical/Ordinal; Categorical/nominal; Continuous

Categorical/nominal

Accuracy and completeness of data are very important for analytics. The date 02/31/2020 is _____. Options: Both complete and accurate; Complete but not accurate; Not complete but accurate; Not complete and not accurate; None of the above

Complete but not accurate
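A value can have every field filled in (complete) and still not be a real date (not accurate). A simple validation check is to parse it against the calendar. This sketch assumes the month/day/year format used in the question; `is_valid_date` is an illustrative name built on the standard `datetime.strptime`, which raises `ValueError` for impossible dates such as February 31.

```python
from datetime import datetime


def is_valid_date(text: str, fmt: str = "%m/%d/%Y") -> bool:
    """True if `text` parses as a real calendar date in the given format."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


print(is_valid_date("02/31/2020"))  # False: February has no 31st
print(is_valid_date("02/29/2020"))  # True: 2020 is a leap year
```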

Which is not considered a defining characteristic of Big Data? Options: Volume; Processing Complexity; Data Structure; Data Quality

Data Quality

All of the following correctly describe present-day analytics except _____. Options: Fact based decision-making; Centrally managed; Entire organization; Decision focus; Data driven

Data driven

In which CRISP-DM phase is the data partitioned and are the training and validation data sets created? Options: Modeling; Evaluation; Data preparation; Data Understanding; Business understanding

Modeling
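A training/validation partition can be as simple as shuffling the records and cutting them at a fixed share. This is a minimal sketch: the 70/30 split, the fixed seed, and the helper name `train_validation_split` are assumptions for illustration, not part of CRISP-DM itself. The model is fitted on the training part and judged on the validation part, since accuracy on the training data tends to overstate predictive ability.

```python
import random


def train_validation_split(records, validation_share=0.3, seed=42):
    """Shuffle a copy of `records` and split it into training and validation lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n_validation = round(len(shuffled) * validation_share)
    return shuffled[n_validation:], shuffled[:n_validation]


train, validation = train_validation_split(range(100))
print(len(train), len(validation))  # 70 30
```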

Data mining models can do all of the following except _____. Options: Predict sales based on historical data; Classify into groups based on characteristics; Assign customers to different segments; Optimize Taste & Price of a product; Associate products that are bought together

Optimize Taste & Price of a product

The data mining project manager meets with the data-warehousing manager to discuss how the data will be collected. Identify the CRISP-DM phase for this task. Options: Phase 1; Phase 2; Phase 3; Phase 5; Phase 6

Phase 2

The analysts meet to discuss whether the neural network or decision tree models should be implemented. Identify the CRISP-DM phase for this task. Options: Phase 1; Phase 2; Phase 3; Phase 4; Phase 6

Phase 4

WNBA scouts research UA basketball star Rachel Tecca's college scoring history to estimate how much they would blow away other teams if she signed with them. This is an example of _____. Options: Predictive Analytics; Prescriptive analytics; Descriptive Analytics; None of the above

Predictive Analytics

____________________ is a form of analytics which examines data to answer the question "what should be done?" or "what can we do to make XYZ happen?" Options: Descriptive; Exploratory; Predictive; Prescriptive

Prescriptive

Assumptions of a regression model can be evaluated by plotting and analyzing the ________. Options: dependent variables; independent variables; p values; error terms

error terms

If the simple correlation coefficient between two independent variables is greater than .95, then ______________________ is considered to be severe. Options: multicollinearity; coefficient of determination; interaction

multicollinearity

Significant _________ may exist when the overall F-statistic is significant and the individual t statistics for all independent variables are insignificant. Options: autocorrelation; independence; multicollinearity; outliers

multicollinearity

How many outlier records appear to be present in this distribution? Options: zero; one; two; unable to determine based on given information

one

In a multiple regression model, the ratio MSRegression/MSError yields which statistic, used to test the overall model? Options: the F statistic; the wrong statistic; the Chi-Square statistic; the t statistic

the F statistic.