Predictive Analytics and BI Exam 1

Which of the following is true about clustering? Clustering can be used for:

Assigning customers to different segments

Which of the following combines architectures, databases, analytical tools, applications, and methodologies?

BI

The classification method that uses conditional probabilities to build classification models is called:

Bayesian classifiers

Random sampling of a fixed number of instances from the original data with replacement to construct the training data set is achieved by

Bootstrapping
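
A minimal bootstrapping sketch in plain Python. It assumes the "fixed number" of draws equals the size of the original data set, which is a common choice but not the only one. The instances that are never drawn (out-of-bag) are returned as a possible test set; the function name is hypothetical.

```python
import random

def bootstrap_sample(data, seed=None):
    """Draw len(data) instances uniformly at random WITH replacement."""
    rng = random.Random(seed)
    n = len(data)
    # Independent draws, so the same index can be picked more than once.
    idx = [rng.randrange(n) for _ in range(n)]
    train = [data[i] for i in idx]
    # Out-of-bag instances (never drawn) can serve as a test set.
    oob = sorted(set(range(n)) - set(idx))
    test = [data[i] for i in oob]
    return train, test

train, test = bootstrap_sample(list(range(10)), seed=42)
```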

______________ classification approach uses historical samples and cases to identify commonalities in order to assign a new case to the most similar category.

Case-based reasoning

When an SVM prediction model is developed, it can be integrated into a decision support system by which of the following methods?

Computational object

In classification problems, the main source for all accuracy estimation metrics is a

Confusion matrix
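
A small sketch of how a confusion matrix is tabulated and how overall accuracy is read off its diagonal. The three-class labels and predictions are hypothetical, chosen to show that the table is not limited to binary problems.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

def overall_accuracy(matrix):
    # Correct predictions sit on the diagonal.
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

labels = ["low", "medium", "high"]
actual = ["low", "low", "medium", "high", "high", "medium"]
predicted = ["low", "medium", "medium", "high", "low", "medium"]
cm = confusion_matrix(actual, predicted, labels)
```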

Which of the following provides an estimate of the degree of linear association between numerically represented variables?

Correlation
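
For reference, a sketch of the Pearson correlation coefficient, the usual measure of linear association between two numeric variables. It returns a value between -1 and +1; the sample data are illustrative only.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Covariance numerator and the two standard-deviation terms.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r_pos = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])  # perfectly linear, increasing
r_neg = pearson_r([1, 2, 3], [3, 2, 1])        # perfectly linear, decreasing
```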

The _____________ method's common idea is to split the data sample into a number of randomly drawn, disjoint subsamples.

Cross validation
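
A minimal k-fold cross-validation splitter, assuming the data are addressed by index. Indices are shuffled once and dealt into k disjoint folds; each fold serves as the test set exactly once. The function name is hypothetical.

```python
import random

def k_fold_splits(n, k, seed=None):
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Deal shuffled indices into k disjoint folds.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

splits = list(k_fold_splits(10, 5, seed=0))
```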

Which of the following is not commonly used as an enabler of descriptive analytics?

Data mining

Which one of the following represents the final phase of data preprocessing?

Data reduction

Data mining is primarily concerned with mining (i.e., digging out data) from a variety of disparate data sources.

False

Decision trees are part of the regression-type prediction methods.

False

If a classification problem is not binary, we cannot use a confusion matrix to tabulate prediction outcomes.

False

In banking and finance, data mining is often used to manage microeconomic movements and overall cash flow outcomes.

False

In linear regression, the independence-of-errors assumption is also known as homoscedasticity.

False

In the Dallas Cowboys case study, the focus was on using data analytics to decide which players would play every week.

False

In time-series forecasting, an estimator's mean squared error measures the average absolute error between the estimated and the actual values.

False
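
The statement is false because mean squared error averages the squared errors, not the absolute errors (that is the mean absolute error). A short comparison on illustrative numbers:

```python
def mean_squared_error(actual, estimated):
    return sum((a - e) ** 2 for a, e in zip(actual, estimated)) / len(actual)

def mean_absolute_error(actual, estimated):
    return sum(abs(a - e) for a, e in zip(actual, estimated)) / len(actual)

actual = [10, 12, 14]
estimated = [11, 12, 11]  # errors: -1, 0, 3
mse = mean_squared_error(actual, estimated)   # (1 + 0 + 9) / 3
mae = mean_absolute_error(actual, estimated)  # (1 + 0 + 3) / 3
```

Squaring makes MSE weight the single large error (3) much more heavily than MAE does.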

Major commercial business intelligence products and services were well established in the early 1970s.

False

Nominal data represent the labels of multiple classes used to divide a variable into specific groups.

False

Novel is a key term in the definition of data mining, which means that the patterns are known by the user within the context of the system being analyzed.

False

Prediction modeling is often classified under the unsupervised machine learning methods.

False

The Naïve Bayes method requires output variables to have numeric values.

False

The data storage component of a business reporting system builds the various reports and hosts them for, or disseminates them to users. It also provides notification, annotation, collaboration, and other services.

False

The most important driver behind business analytics popularity is the need for the business managers to make experience- and intuition-driven business decisions.

False

The ratio of correctly classified positives divided by the total actual positive count is defined as a precision metric.

False
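
The statement is false because correctly classified positives divided by all actual positives is recall (true positive rate, sensitivity); precision divides by all predicted positives instead. A sketch with hypothetical counts:

```python
def precision(tp, fp):
    """Correct positives / all cases predicted positive."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Correct positives / all actual positives (true positive rate)."""
    return tp / (tp + fn)

tp, fp, fn = 40, 10, 20  # hypothetical counts
p = precision(tp, fp)    # 40 / 50
r = recall(tp, fn)       # 40 / 60
```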

The tasks that are followed in the SVM model when performing data preprocessing include:

Handling missing and incomplete values; Handling noisy values; Numericizing the data; Normalizing the data

____________ clustering methods are based on the basic idea that nearby objects are more related to each other than are those that are farther away from each other.

Hierarchical

Which of the following is the definition of data?

Information

Which of the following application areas make use of association rule mining:

Medical records; Credit card transactions; Banking services; Sales transactions

With dashboards, the layer of information that uses graphical, abstracted data to keep tabs on key performance metrics is the ________ layer.

Monitoring

The classification method in which the class of a case is predicted to be the class of the closest training samples is called:

Nearest-neighbor algorithm
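
A minimal nearest-neighbor classifier sketch, assuming numeric feature tuples and Euclidean distance (other distance measures are possible). With k=1 it returns the class of the single closest training sample; with larger k it takes a majority vote. The training data are illustrative.

```python
import math
from collections import Counter

def knn_classify(train, query, k=1):
    """train: list of (feature_tuple, label) pairs."""
    # Sort training samples by Euclidean distance to the query point.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [((1, 1), "A"), ((1, 2), "A"), ((5, 5), "B"), ((6, 5), "B")]
```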

What are the main reasons that data mining has gained overwhelming attention in the business world?

Need for better decision making; Cost of ownership; Technological advancements; Availability of data

Which of the following relates to a pattern-recognition methodology for machine learning?

Neural computing

The critical key terms used in defining data mining include:

Nontrivial; Potentially useful; Process; Novel

_____________ is a type of linear least squares method for estimating the unknown parameters in a linear regression model.

Ordinary least squares
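
A sketch of ordinary least squares for the simple case of one explanatory variable, using the closed-form slope and intercept that minimize the sum of squared residuals. The data lie exactly on y = 1 + 2x, so the fit should recover those parameters.

```python
def ols_fit(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx        # slope
    b0 = my - b1 * mx     # intercept
    return b0, b1

b0, b1 = ols_fit([1, 2, 3, 4], [3, 5, 7, 9])
```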

________ charts or network diagrams show precedence relationships among the project activities/tasks.

PERT

In Bayes' theorem, the posterior probability is defined as

Posterior = (Likelihood*Prior) / Evidence
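
A worked instance of the formula. The probabilities are hypothetical: 20% of messages are spam, and a given word appears in 50% of spam and 10% of non-spam messages. The evidence is the total probability of seeing the word.

```python
def posterior(likelihood, prior, evidence):
    return likelihood * prior / evidence

prior_spam = 0.2
p_word_given_spam = 0.5
p_word_given_ham = 0.1
# Evidence: P(word) summed over both classes.
evidence = p_word_given_spam * prior_spam + p_word_given_ham * (1 - prior_spam)
p_spam_given_word = posterior(p_word_given_spam, prior_spam, evidence)  # 0.10 / 0.18
```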

In retailing, data mining is most commonly used to

Predict future sales

In the analytics taxonomy, data mining is an essential part of what type of analytics?

Predictive

What type of analytics seeks to determine what is likely to happen in the future?

Predictive

If I am interested in identifying the optimal quantity of purchase orders in order to minimize the overall cost, which of the following analytics type should I use?

Prescriptive

In brokerages and securities trading, data mining is used to

Prevent fraudulent activities

____________, a statistical measure of a regression model, is also known as the coefficient of determination.

R-squared
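
A sketch of R-squared computed as 1 minus the ratio of residual sum of squares to total sum of squares. The values are illustrative.

```python
def r_squared(actual, predicted):
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

perfect = r_squared([3, 5, 7, 9], [3, 5, 7, 9])  # no residual error
partial = r_squared([2, 4, 6], [3, 4, 5])        # 1 - 2/8
```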

In data mining, prediction models are further subclassified into

Regression

_________ is used to describe the relationship between a response variable and one or more explanatory variables.

Regression, SVM, ANN, Naive Bayes

Which of the following methods takes into account the partial membership of class labels in predefined categories while building models for classification problems?

Rough sets

In data mining, clustering is classified further into

Segmentation, Outlier Analysis

The ratio of correctly classified negatives divided by the total negative count is called:

Specificity
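
Specificity (true negative rate) in code, with hypothetical counts of true negatives and false positives:

```python
def specificity(tn, fp):
    """Correctly classified negatives / all actual negatives."""
    return tn / (tn + fp)

spec = specificity(tn=90, fp=10)
```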

Which one of the following represents unstructured data?

Textual; Multimedia; XML/HTML; Pictures

Analytics is the art and science of discovering insight to support accurate and timely decision making.

True

Apriori and FP-Growth algorithms are part of the association type data mining tasks.

True
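
This is not Apriori or FP-Growth itself. It is a sketch of the two measures they build on: support, used to find frequent itemsets, and confidence, commonly used to derive rules from those itemsets. The baskets are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions containing every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(transactions, lhs, rhs):
    """Support of lhs+rhs divided by support of lhs for the rule lhs -> rhs."""
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)

baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
]
```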

Association patterns can also include capturing the sequence of events and things.

True

Business analytics and data science have the same purpose: to convert data into actionable insight through an algorithm-based discovery process.

True

Business intelligence is nothing more than the descriptive analytics part of the simple business analytics taxonomy.

True

CRM aims to create one-on-one relationships with customers by developing an intimate understanding of their needs and wants.

True

Cubes in OLAP are defined as a multidimensional representation of the data stored in and retrieved from data warehouses.

True

Data mining leverages capabilities of statistics, artificial intelligence, machine learning, management science, information systems, and databases, in a systematic and synergistic way.

True

ERP stands for enterprise resource planning and is used for the integration of company-wide data.

True

F1 metric is simply the harmonic mean of precision and recall.

True
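
F1 as the harmonic mean of precision and recall, with illustrative inputs. The harmonic mean (about 0.615 here) sits below the arithmetic mean (0.65), so a low value on either metric pulls F1 down.

```python
def f1_score(precision, recall):
    return 2 * precision * recall / (precision + recall)

f1 = f1_score(0.8, 0.5)
```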

Google Maps has set new standards for data visualization with its intuitive Web mapping software.

True

Homoscedasticity states that the response variables must have the same variance in their error, regardless of the explanatory variables' values.

True

The interpretability characteristic of a prediction method describes how, and what, the model concludes about certain predictions.

True

In an SVM model, normalization's main benefit is to avoid having attributes in greater numeric ranges dominate those in smaller numeric ranges.

True
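
A min-max normalization sketch, assuming a target range of [0, 1] (scaling to [-1, 1] is also common). After scaling, an income column and an age column occupy the same range, so neither dominates distance-based computations. The columns are hypothetical.

```python
def min_max_normalize(column, new_min=0.0, new_max=1.0):
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant column: map everything to the lower bound.
        return [new_min for _ in column]
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in column]

incomes = min_max_normalize([20000, 50000, 80000])
ages = min_max_normalize([20, 40, 60])
```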

In prediction, linear regression uses a mathematical equation to identify additive mathematical relationships between explanatory variables and the response variable.

True

In the FEMA case study, the BureauNet software was the primary reason behind the increased speed and relevance of the reports FEMA employees received.

True

Information warfare often refers to identifying and stopping malicious attacks on critical information infrastructures in literally any and every organization and business domain.

True

Interval data are variables that can be measured on interval scales.

True

One of SiriusXM's challenges was tracking potential customers when cars were sold.

True

The purpose of data preparation, which is also commonly known as data preprocessing, is to eliminate the possibility of GIGO (garbage in, garbage out) errors.

True

The ratio of accurately classified instances (positives and negatives) divided by the total number of instances is defined as the overall accuracy metric.

True

Time series is a sequence of data points of interest measured and represented at consecutive and regular time intervals.

True

To deploy a developed SVM model, the model coefficients can be extracted and integrated directly into the decision support system.

True

Visualization differs from traditional charts and graphs in complexity of data sets and use of multiple dimensions and measures.

True

k-NN is a prediction method used not only for classification but also for regression-type prediction problems.

True
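
For the regression case, a k-NN sketch that averages the numeric targets of the k nearest training samples (Euclidean distance assumed). The training data are illustrative.

```python
import math

def knn_regress(train, query, k=3):
    """train: list of (feature_tuple, numeric_target) pairs."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Prediction is the mean target of the nearest neighbors.
    return sum(target for _, target in nearest) / len(nearest)

train = [((1,), 10.0), ((2,), 12.0), ((3,), 14.0), ((10,), 40.0)]
estimate = knn_regress(train, (2,), k=3)
```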

Which type of question does visual analytics seek to answer?

Why is it happening?

This plot is a graphical illustration of several descriptive statistics about a given data set.

box-and-whiskers plot

Which characteristic of data means that all the required data elements are included in the data set?

data richness

What is the fundamental challenge of dashboard design?

ensuring that the required information is shown clearly on a single screen

Which type of visualization tool can be very helpful when a data set contains location data?

geographic map

When validating the assumptions of a regression, ________ assumes that the relationship between the response variable and the explanatory variables is linear.

linearity

What is the management feature of a dashboard?

operational data that identify what actions to take to resolve a problem

Which type of visualization tool can be very helpful when the intention is to show relative proportions of dollars per department allocated by a university administration?

pie chart

This measure of dispersion is calculated by simply taking the square root of the variance.

standard deviation
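
Standard deviation as the square root of the variance, shown for both the population form (divide by n) and the sample form (divide by n - 1). The data set is illustrative.

```python
import math

def std_dev(values, sample=True):
    n = len(values)
    mean = sum(values) / n
    ss = sum((v - mean) ** 2 for v in values)
    variance = ss / (n - 1) if sample else ss / n
    return math.sqrt(variance)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, sum of squared deviations 32
```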

Data mining can be used to predict the results of sporting events to identify means to decrease the odds of winning against a specific opponent.

False

Business Analytics is the process of developing code and frameworks.

False

Business intelligence is a broad concept that also includes business analytics within its simple taxonomy.

False

The main roadblocks for adopting analytics include which of the following?

All of the answers are true

The term knowledge discovery has been used to refer to which of the following?

Data Mining

Jim, the marketing manager in the company, is interested in the sales numbers in the south region by each product type for the last six months. What type of analytics would you use to help him?

Descriptive

Which of the following is not among the most important drivers behind business analytics and data science popularity?

Domain specific knowledge

Business intelligence is a specific term that describes architectures and tools only.

False

Correlation is meant to represent the linear relationships between two nominal input variables.

False

Analytics and analysis are essentially the same thing; they both focus on the granular level representation of complex problems through decomposition of the whole into its lower-level parts.

False

Balancing skewed data means oversampling of the more represented class records and undersampling of the less represented class records.

False
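
The statement is false because balancing works the other way: the less represented class is oversampled and/or the more represented class is undersampled. A sketch of the oversampling half, duplicating randomly chosen minority records until every class matches the majority count. The function name and records are hypothetical.

```python
import random

def random_oversample(records, label_of, seed=None):
    rng = random.Random(seed)
    by_class = {}
    for r in records:
        by_class.setdefault(label_of(r), []).append(r)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Duplicate random records of smaller classes up to the majority size.
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced

records = [(i, "no") for i in range(8)] + [(i, "yes") for i in range(8, 10)]
balanced = random_oversample(records, lambda r: r[1], seed=1)
```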

Bootstrapping methodology is similar to the leave-one-out methodology in that it can be used to calculate accuracy by leaving one sample out at each iteration of the estimation process.

False

Which of the following is the overarching principle in DeepQA?

Integration of shallow and deep knowledge

The most commonly used clustering technique is

K-means
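
A minimal k-means sketch (Lloyd's algorithm) on 2-D points, assuming Euclidean distance and initial centroids drawn at random from the data. It alternates assigning each point to its nearest centroid and moving each centroid to the mean of its cluster until the centroids stop changing. The points are illustrative.

```python
import math
import random

def k_means(points, k, iterations=100, seed=None):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster mean (keep it if empty).
        new_centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = k_means(points, 2, seed=0)
```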

Which of the following developments is not contributing to facilitating growth of decision support and analytics?

Locally concentrated workforces

________ are typically used together with other charts and graphs, as opposed to by themselves, and show postal codes, country names, etc.

Maps
