MSIS 4263 Exam 1 Review

In what decade did disjointed information systems begin to be integrated?
- 2010s
- 1970s
- 1990s
- 1980s
- 2000s

1980s

In CRISP-DM methodology, how many sequential steps exist?
- 5
- 6
- 7
- 4
- 8

6

Which of the following can clustering do?
- Assign customers to different segments
- Classify customers into predefined classes
- Find sequential relationships
- Tell the nature of future occurrences
- Forecast future sales trends

Assign customers to different segments

The classification method that uses conditional probabilities to build classification models is called:
- Bayesian classifiers
- Genetic algorithms
- Neural networks
- Rough sets
- Case-based reasoning

Bayesian classifiers
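
Study note: as a rough illustration of how conditional probabilities drive a Bayesian classifier, the sketch below applies Bayes' rule to one feature. The class names, priors, and likelihoods are made-up numbers chosen for illustration, not values from the course material.

```python
# Hypothetical priors P(class) and likelihoods P(complained | class).
# All numbers are illustrative assumptions.
priors = {"churn": 0.2, "stay": 0.8}
likelihood = {"churn": 0.6, "stay": 0.1}


def posterior(priors, likelihood):
    """Bayes' rule: P(class | feature) = P(feature | class) * P(class) / P(feature)."""
    joint = {c: priors[c] * likelihood[c] for c in priors}
    evidence = sum(joint.values())  # P(feature), summed over all classes
    return {c: joint[c] / evidence for c in joint}


post = posterior(priors, likelihood)
predicted = max(post, key=post.get)  # class with the highest posterior
print(post, predicted)
```

The classifier picks the class with the largest posterior probability, even though that class has the smaller prior here.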

The most relevant methodology that is used to implement data science and business analytics projects is
- CRISP-DM methodology
- Knowledge discovery in databases (KDD) methodology
- SEMMA methodology
- Agile Methodology
- Six Sigma methodology

CRISP-DM methodology

Which of the following is not a supervised machine learning algorithm?
- Regression
- Classification
- Time series
- Forecasting
- Clustering

Clustering

The assessment of the project outcomes is carried out in which step of the Six Sigma process?
- Analyze
- Control
- Define
- Measure
- Improve

Control

In the CRISP-DM process, the data preprocessing step that prepares the data identified in the previous step for analysis is
- Data Understanding
- Testing and Evaluation
- Model building
- Business Understanding
- Data Preparation

Data Preparation

Usually, which step in the CRISP-DM process takes the most time to complete?

Data Preparation

Identifying the relevant data from different sources is achieved in which step of the CRISP-DM process?
- Testing and Evaluation
- Model building
- Data Preparation
- Business Understanding
- Data Understanding

Data Understanding

The term knowledge discovery has been used to refer to which of the following?
- Business Analytics
- Text Analytics
- Social Analytics
- Data Mining
- Web Mining

Data mining

Which of the following is not commonly used as an enabler of descriptive analytics?
- Dashboards and scorecards
- Data warehousing
- Data visualization
- Data mining
- Business Reporting

Data mining

Data mining is primarily concerned with mining (i.e., digging out data) from a variety of disparate data sources.

False

Decision trees are part of the regression-type prediction methods.

False

DeepQA is a massively parallel, web mining focused, probabilistic computational algorithm developed by the SAS Institute.

False

Define, explore, measure, and assess are the steps involved in Six Sigma process.

False

Handling the missing values in the data is typically performed in Data Consolidation phase.

False

If I am distributing funds to different financial products to maximize return, I am essentially doing descriptive analytics.

False

If a classification problem is not binary, we cannot use a confusion matrix to tabulate prediction outcomes.

False
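
Study note: a confusion matrix simply gets one row and one column per class, so it extends naturally past two classes. The sketch below is a minimal illustration. The three labels and the sample predictions are hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Return matrix[actual_label][predicted_label] = number of instances."""
    counts = Counter(zip(actual, predicted))
    return {a: {p: counts[(a, p)] for p in labels} for a in labels}


# Hypothetical three-class example; the data are illustrative only.
labels = ["bad", "fair", "excellent"]
actual = ["bad", "bad", "fair", "fair", "excellent", "excellent"]
predicted = ["bad", "fair", "fair", "fair", "excellent", "bad"]

matrix = confusion_matrix(actual, predicted, labels)
for a in labels:
    # Rows are actual classes, columns are predicted classes.
    print(a, [matrix[a][p] for p in labels])
```

Correct predictions sit on the diagonal. Every off-diagonal cell counts one specific kind of misclassification.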

In CRISP-DM process, it is not important or necessary to follow the sequential order of each step. That is, the steps can be executed in an arbitrary sequence.

False

In SEMMA process, visualization and description of the data is carried out in the modify step.

False

In banking and finance, data mining is often used to manage microeconomics movements and overall cash flow outcomes.

False

In the project finalization task, both CRISP-DM and SEMMA methodologies prescribe deploying the results.

False

In the testing and evaluation step of CRISP-DM methodology, monitoring and maintenance of the models are important.

False

Major commercial business intelligence products and services were well established in the early 1970s.

False

Novel is a key term in the definition of data mining, which means that the patterns are known by the user within the context of the system being analyzed.

False

One of the most pronounced reasons for the increasing popularity of data mining is that there are fewer suppliers than corresponding demand in the business marketplace.

False

Prediction modeling is often classified under the unsupervised machine learning methods.

False

The area under the ROC curve is a graphical assessment technique for binary classification problems, in which sensitivity is plotted on the y-axis and the specificity is plotted on the x-axis.

False
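
Study note: the statement is false because the ROC curve plots sensitivity (true positive rate) against 1 - specificity (false positive rate), not against specificity itself. The sketch below computes ROC points and a trapezoidal area for a tiny set of scores. The scores and labels are hypothetical.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        # x = 1 - specificity, y = sensitivity
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Trapezoidal area under the curve through the (x, y) points."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Hypothetical classifier scores and true labels (1 = positive).
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0]
points = roc_points(scores, labels)
area = auc(points)
print(points, area)
```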

The modify step in Six-Sigma involves the process of assessing the mapping between organizational data repositories and the business problem.

False

The most important driver behind the popularity of business analytics is the need for business managers to make experience- and intuition-driven business decisions.

False

The multi split methodology partitions data into exactly two mutually exclusive subsets called training set and test set.

False

The original terminology of data mining commonly refers to discovering known patterns in large and structured data sets.

False

The ratio of correctly classified positives divided by the total actual positive count is defined as a precision metric.

False
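
Study note: the ratio described is recall (also called sensitivity), while precision divides by the predicted positives instead. A minimal sketch with hypothetical counts makes the difference concrete.

```python
def recall(tp, fn):
    """Correctly classified positives / all actual positives."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Correctly classified positives / all predicted positives."""
    return tp / (tp + fp)


# Hypothetical counts: true positives, false positives, false negatives.
tp, fp, fn = 40, 10, 20
rec = recall(tp, fn)      # 40 / 60
prec = precision(tp, fp)  # 40 / 50
print(rec, prec)
```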

The k-means algorithm is part of the prediction type of data mining methods.

False

Which of the following algorithms uses the analogy of natural evolution to build directed-search-based mechanisms to classify data samples?
- Genetic
- Rough sets
- K-means
- Statistical analysis
- Support vector machines

Genetic

Which of the following questions can be answered by prescriptive analytics?
- How long will the current problem continue to happen?
- How can the best be realized?
- Why did we lose five percent of customers last year?
- Why did the sales drop in Dallas?
- Will our sales increase or decrease next month?

How can the best be realized?

In the SEMMA process, in which step do analysts have the option to select and transform variables to improve the model construction process?
- Explore
- Sample
- Model
- Assess
- Modify

Modify

The categorical data contains

Nominal

The types of patterns discovered with data mining include all of these, except:
- Forecasting
- Optimization
- Classification
- Clustering
- Association

Optimization

Customer credit ratings such as bad, fair, and excellent are considered what type of data?
- Numeric
- Continuous
- Quantitative
- Nominal
- Ordinal

Ordinal

In retailing, data mining is most commonly used to
- Discover time-variant association
- Develop managerial dashboards
- Predict future sales
- Optimize cash returns
- Detect policy failures

Predict future sales

Data mining is an essential part of what type of analytics in the analytics taxonomy?

Predictive

What type of analytics seeks to determine what is likely to happen in the future?

Predictive

If I am interested in identifying the optimal quantity of purchase orders in order to minimize the overall cost, which of the following analytics type should I use?

Prescriptive

What type of analytics seeks to identify the courses of action to achieve the best performance possible?
- Diagnostic
- Domain specific
- Predictive
- Descriptive
- Prescriptive

Prescriptive

The critical key terms used in defining data mining include all of these, except:
- Potentially useful
- Process
- Previously known
- Nontrivial
- Novel

Previously known

In data mining, prediction models are further sub-classified into
- Affinity analysis
- Link analysis
- Outlier analysis
- Regression
- Segmentation

Regression

The well-known standardized process for data analytics that was developed by SAS is called
- CRISP-DM methodology
- SEMMA methodology
- Knowledge discovery in databases (KDD) methodology
- Six Sigma methodology
- Agile Methodology

SEMMA methodology

In data mining, clustering is classified further into
- Segmentation, Outlier Analysis
- Segmentation, Classification, Sequence Analysis
- Segmentation, Outlier Analysis, Link Analysis
- Segmentation, Outlier Analysis, Classification
- Segmentation, Classification

Segmentation, Outlier Analysis

The primary difference between statistics and data mining is:
- Statistics starts with a well-defined proposition and hypothesis whereas data mining starts with a loosely defined discovery statement.
- Statistics starts with a vaguely defined discovery system whereas data mining starts with a predefined proposed system.
- None of the answers are true.
- Statistics starts with a loosely defined discovery statement whereas data mining starts with a well-defined proposition and hypothesis.
- Data mining starts with a well-defined hypothesis and statistics starts with a novel discovery statement.

Statistics starts with a well-defined proposition and hypothesis whereas data mining starts with a loosely defined discovery statement.

In the SEMMA process, the first step, Sample, involves which of the following sub-steps?
- Training, Testing, Deployment
- Training, Evaluation, Test, Deployment
- Testing, Evaluation, Deployment
- Training, Deployment, Maintenance
- Training, Validation, Test

Training, Validation, Test

A typical example of interval scale measurement is the temperature on the Celsius scale.

True

Analytics is the art and science of discovering insight to support accurate and timely decision making.

True

Apriori and FP-Growth algorithms are part of the association type data mining tasks.

True
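
Study note: the sketch below is not Apriori or FP-Growth itself. It only computes support and confidence, the two measures commonly used to rank association rules. The transactions are hypothetical.

```python
# Hypothetical market baskets; the items are illustrative only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) = support(both) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


sup = support({"bread", "milk"}, transactions)        # 2 of 4 baskets
conf = confidence({"bread"}, {"milk"}, transactions)  # 0.5 / 0.75
print(sup, conf)
```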

Association patterns can also include capturing the sequence of events and things.

True

Business analytics and data science have the same purpose: to convert data into actionable insight through an algorithm-based discovery process.

True

Business intelligence is nothing more than the descriptive analytics part of the simple business analytics taxonomy.

True

CRM aims to create one-on-one relationships with customers by developing an intimate understanding of their needs and wants.

True

Data mining leverages capabilities of statistics, artificial intelligence, machine learning, management science, information systems, and databases, in a systematic and synergistic way.

True

During the model building step in CRISP-DM process, the data mining methods and algorithms are applied to the current data set.

True

ERP stands for enterprise resource planning and is used for the integration of company-wide data.

True

F1 metric is simply the harmonic mean of precision and recall.

True
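
Study note: a minimal sketch of the F1 calculation, checked against the standard library's `statistics.harmonic_mean`. The precision and recall values are hypothetical.

```python
from statistics import harmonic_mean


def f1_score(precision, recall):
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0  # convention used here to avoid division by zero
    return 2 * precision * recall / (precision + recall)


# Hypothetical precision and recall values.
p, r = 0.8, 0.5
f1 = f1_score(p, r)
print(f1, harmonic_mean([p, r]))
```

Because the harmonic mean is pulled toward the smaller value, F1 stays low unless both precision and recall are high.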

The interpretability characteristic of a prediction method describes how and what the model concludes on certain predictions.

True

Identifying the most pressing problem and defining the goals and objectives can be done in the define step in Six Sigma process.

True

If a data scientist is analyzing historical data to identify problems and root causes, he/she is essentially conducting descriptive analytics.

True

In SEMMA process, the accuracy and usefulness of the models are evaluated in the assess step.

True

In the model-building task, both CRISP-DM and SEMMA methodologies build and test various models.

True

In the retail industry, association rule mining is frequently called market-basket analysis.

True

Information warfare often refers to identifying and stopping malicious attacks on critical information infrastructures in literally any and every organization and business domain.

True

Manufacturers use data mining to classify anomalies and commonalities in the production system to improve the manufacturing system.

True

One of the key differences between business analytics and data science is their primary focus either on business problems or on mathematical algorithms.

True

Organizations apply analytics to business problems to identify problems, foresee future trends, and make best possible decisions.

True

Six Sigma process promotes an error-free/perfect business execution.

True

A centralized data repository that combines data sources to support managerial decisions is known as a data warehouse.

True

An important part of the KDD process is the feedback loop that allows the process flow to redirect backward, from any step to any previous step, for rework and readjustments.

True

The purpose of data preparation, which is also commonly known as data preprocessing, is to eliminate the possibility of GIGO (garbage in, garbage out) errors.

True

The ratio of accurately classified instances (positives and negatives) divided by the total number of instances is defined as the overall accuracy metric.

True

Today, analytics can be defined as simply as "the discovery of information/knowledge/insight in data."

True

Business Analytics is the process of developing code and frameworks.

False

The main reason that data mining has gained overwhelming attention in the business world

All answers are true

The main roadblocks for adopting analytics include which of the following?
- All of the answers are true
- Sheer size of big data
- Justifying ROI
- Lack of analytics talent
- Corporate culture

All of the answers are true

The other names commonly used for data mining include...
- Information harvesting
- Knowledge discovery in databases
- Information extraction
- All of the answers are true
- Pattern analysis

All of the answers are true

Firms have used analytics to enhance which of the following business activities:
- To empower employees with the information
- To improve their relationships with their customers
- To identify fraudulent transactions
- To make better decisions
- All the answers are true

All the answers are true

Business intelligence is a broad concept that also includes business analytics within its simple taxonomy.

False

In a longitudinal view of the evolution of analytics, what we nowadays call analytics was called __________________ in the 1970s.

Decision support systems

Jim, the marketing manager in the company, is interested in the sales numbers in the south region by each product type for the last six months. What type of analytics would you use to help him?
- Descriptive
- Predictive
- Diagnostic
- Domain specific
- Prescriptive

Descriptive

Which of the following is not among the most important drivers behind the popularity of business analytics and data science?
- Cheaper hardware and software
- Availability of ample digitized data
- Need to make better decisions
- Domain specific knowledge
- Enhanced algorithms

Domain specific knowledge

The CRISP-DM methodology was proposed by Fayyad et al. in 1996.

False

Which of the following is the definition of data?
- Knowledge
- Chart
- Information
- Facts and measures
- Analytics

Facts and measures

Analytics and analysis are essentially the same thing; they both focus on the granular level representation of complex problems through decomposition of the whole into its lower-level parts.

False

Balancing skewed data means oversampling the more represented class records and undersampling the less represented class records.

False
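
Study note: the statement has the directions reversed. Balancing means oversampling the less represented class or undersampling the more represented one. The sketch below shows random oversampling, one simple balancing approach among several. The function name `oversample_minority` and the sample records are hypothetical.

```python
import random
from collections import Counter


def oversample_minority(records, label_of, seed=0):
    """Randomly duplicate records of smaller classes until every class
    has as many records as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for record in records:
        by_class.setdefault(label_of(record), []).append(record)
    target = max(len(rows) for rows in by_class.values())
    balanced = []
    for rows in by_class.values():
        balanced.extend(rows)
        # Draw the missing records with replacement from the same class.
        balanced.extend(rng.choices(rows, k=target - len(rows)))
    return balanced


# Hypothetical skewed data set: 2 fraud records versus 8 legitimate ones.
records = [("t1", "fraud"), ("t2", "fraud")] + [(f"t{i}", "ok") for i in range(3, 11)]
balanced = oversample_minority(records, label_of=lambda rec: rec[1])
counts = Counter(label for _, label in balanced)
print(counts)
```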

The bootstrapping methodology is similar to the leave-one-out methodology in that it can be used to calculate accuracy by leaving one sample out at each iteration of the estimation process.

False
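
Study note: the two methods build their training sets differently. Leave-one-out holds out exactly one sample per iteration. Bootstrapping samples the training set with replacement and tests on whatever was never drawn. The sketch below only generates index splits under those assumptions. The function names are hypothetical.

```python
import random


def leave_one_out_splits(n):
    """Yield (train_indices, test_indices); each sample is held out exactly once."""
    for held_out in range(n):
        train = [i for i in range(n) if i != held_out]
        yield train, [held_out]


def bootstrap_split(n, rng):
    """Draw n training indices with replacement; undrawn indices form the test set."""
    train = [rng.randrange(n) for _ in range(n)]
    drawn = set(train)
    test = [i for i in range(n) if i not in drawn]
    return train, test


rng = random.Random(42)
loo = list(leave_one_out_splits(5))
boot_train, boot_test = bootstrap_split(5, rng)
print(len(loo), boot_train, boot_test)
```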

Which of the following is the overarching principle in DeepQA?
- All the answers are true
- Computer intelligence
- Human intelligence
- Integration of shallow and deep knowledge
- Maturity continuum

Integration of shallow and deep knowledge

The most commonly used clustering technique is

k-means
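
Study note: a minimal one-dimensional sketch of the k-means loop, which assigns each point to its nearest centroid and then recomputes each centroid as the mean of its cluster. The data and the starting centroids are hypothetical. Real uses typically work on multi-dimensional data and choose starting centroids more carefully.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Basic k-means on 1-D data with caller-supplied starting centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to its cluster mean
        # (an empty cluster keeps its previous centroid).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


# Hypothetical data with two obvious groups.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = kmeans_1d(values, centroids=[1.0, 12.0])
print(centroids, clusters)
```

No class labels are supplied at any point, which is why k-means counts as unsupervised clustering and not as a prediction method.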

The first and earliest data mining process is known by the name of
- Knowledge discovery in databases (KDD) methodology
- SEMMA methodology
- Waterfall methodology
- CRISP-DM methodology
- Six Sigma methodology

Knowledge discovery in databases (KDD) methodology

Which of the following developments is not contributing to facilitating growth of decision support and analytics?

Locally Concentrated Workforces

During which step in DMAIC are the identified data sources consolidated and transformed into a format that is amenable to machine processing?

Measure

