MSIS 4263 Exam 1 Review
In what decade did disjointed information systems begin to be integrated? 2010s; 1970s; 1990s; 1980s; 2000s
1980s
In the CRISP-DM methodology, how many sequential steps exist? 5; 6; 7; 4; 8
6
Which of the following can clustering do? Assign customers to different segments; Classify customers into predefined classes; Find sequential relationships; Tell the nature of future occurrences; Forecast future sales trends
Assign customers to different segments
The classification method that uses conditional probabilities to build classification models is called: Bayesian classifiers; Genetic algorithms; Neural networks; Rough sets; Case-based reasoning
Bayesian classifiers
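To make "uses conditional probabilities" concrete, here is a minimal sketch of Bayes' rule applied to a single categorical feature. The toy weather data and the function name `posterior` are invented for illustration and are not from the course material.

```python
from collections import Counter

# Hypothetical toy data: (weather, played) pairs, invented for illustration.
observations = [
    ("sunny", "no"), ("sunny", "no"), ("rainy", "yes"),
    ("sunny", "yes"), ("rainy", "yes"), ("rainy", "no"),
]

def posterior(feature_value, data):
    """Return P(class | feature_value) for every class via Bayes' rule.

    Assumes feature_value occurs at least once in the data.
    """
    class_counts = Counter(label for _, label in data)
    joint_counts = Counter(data)  # counts of (feature, class) pairs
    total = len(data)
    unnormalized = {}
    for label, n_label in class_counts.items():
        prior = n_label / total  # P(class)
        likelihood = joint_counts[(feature_value, label)] / n_label  # P(feature | class)
        unnormalized[label] = prior * likelihood
    evidence = sum(unnormalized.values())  # P(feature)
    return {label: value / evidence for label, value in unnormalized.items()}

probabilities = posterior("sunny", observations)
print(probabilities)
```

The class with the largest posterior probability would be the predicted class.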
The most relevant methodology used to implement data science and business analytics projects is: CRISP-DM methodology; Knowledge discovery in databases (KDD) methodology; SEMMA methodology; Agile Methodology; Six Sigma methodology
CRISP-DM methodology
Which of the following is not a supervised machine learning algorithm? Regression; Classification; Time series; Forecasting; Clustering
Clustering
The assessment of the project outcomes is carried out in which step of the Six Sigma process? Analyze; Control; Define; Measure; Improve
Control
Which step of the CRISP-DM process is the data preprocessing step, which prepares the data identified in the previous step for analysis? Data Understanding; Testing and Evaluation; Model building; Business Understanding; Data Preparation
Data Preparation
Which step in the CRISP-DM process usually takes the most time to complete?
Data Preparation
Identifying the relevant data from different sources is achieved in which step of the CRISP-DM process? Testing and Evaluation; Model building; Data Preparation; Business Understanding; Data Understanding
Data Understanding
The term knowledge discovery has been used to refer to which of the following? Business Analytics; Text Analytics; Social Analytics; Data Mining; Web Mining
Data mining
Which of the following is not commonly used as an enabler of descriptive analytics? Dashboards and scorecards; Data warehousing; Data visualization; Data mining; Business Reporting
Data mining
Data mining is primarily concerned with mining (i.e., digging out data) from a variety of disparate data sources.
False
Decision trees are part of the regression-type prediction methods.
False
DeepQA is a massively parallel, web mining focused, probabilistic computational algorithm developed by the SAS Institute.
False
Define, explore, measure, and assess are the steps involved in the Six Sigma process.
False
Handling the missing values in the data is typically performed in the data consolidation phase.
False
If I am distributing funds to different financial products to maximize return, I am essentially doing descriptive analytics.
False
If a classification problem is not binary, we cannot use confusion matrix to tabulate prediction outcomes.
False
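A confusion matrix extends naturally beyond two classes: it becomes an n-by-n table of actual versus predicted labels. The sketch below tabulates a three-class problem; the labels and predictions are made up for illustration.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Return a nested dict of counts indexed as counts[actual][predicted]."""
    pair_counts = Counter(zip(actual, predicted))
    return {a: {p: pair_counts[(a, p)] for p in labels} for a in labels}

# Hypothetical three-class example (not from the exam).
labels = ["bad", "fair", "excellent"]
actual = ["bad", "fair", "fair", "excellent", "bad", "excellent"]
predicted = ["bad", "fair", "bad", "excellent", "fair", "excellent"]

matrix = confusion_matrix(actual, predicted, labels)
for a in labels:
    # One row per actual class, one column per predicted class.
    print(a, [matrix[a][p] for p in labels])
```

The diagonal cells hold the correct predictions; every off-diagonal cell is a specific kind of misclassification.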
In the CRISP-DM process, it is not important or necessary to follow the sequential order of the steps. That is, the steps can be executed in an arbitrary sequence.
False
In SEMMA process, visualization and description of the data is carried out in the modify step.
False
In banking and finance, data mining is often used to manage microeconomics movements and overall cash flow outcomes.
False
In the project finalization task, both CRISP-DM and SEMMA methodologies prescribe deploying the results.
False
In the testing and evaluation step of CRISP-DM methodology, monitoring and maintenance of the models are important.
False
Major commercial business intelligence products and services were well established in the early 1970s.
False
Novel is a key term in the definition of data mining, which means that the patterns are known by the user within the context of the system being analyzed.
False
One of the most pronounced reasons for the increasing popularity of data mining is that there are fewer suppliers than corresponding demand in the business marketplace.
False
Prediction modeling is often classified under the unsupervised machine learning methods.
False
The area under the ROC curve is a graphical assessment technique for binary classification problems, in which sensitivity is plotted on the y-axis and the specificity is plotted on the x-axis.
False
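The statement is false because the x-axis of a ROC curve is 1 - specificity (the false positive rate), not specificity itself. A minimal sketch that computes ROC points at a few thresholds, using invented scores and labels:

```python
def roc_points(scores, labels, thresholds):
    """Return (1 - specificity, sensitivity) pairs, one per threshold.

    Assumes labels contain at least one positive (1) and one negative (0).
    """
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        sensitivity = tp / (tp + fn)  # y-axis: true positive rate
        specificity = tn / (tn + fp)
        points.append((1 - specificity, sensitivity))  # x-axis: false positive rate
    return points

# Hypothetical classifier scores and true labels.
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0]
points = roc_points(scores, labels, [0.0, 0.5, 1.1])
print(points)
```

The lowest threshold yields the top-right corner (1, 1) and a threshold above every score yields the origin (0, 0).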
The modify step in Six-Sigma involves the process of assessing the mapping between organizational data repositories and the business problem.
False
The most important driver behind business analytics popularity is the need for the business managers to make experience and intuition driven business decisions.
False
The multi split methodology partitions data into exactly two mutually exclusive subsets called training set and test set.
False
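The two-subset partition described in the statement is usually called a simple split (or holdout). A minimal sketch, assuming a 75/25 split and a fixed seed chosen only for reproducibility:

```python
import random

def simple_split(records, train_fraction=0.75, seed=42):
    """Shuffle a copy of the records and cut it into training and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical data set of 20 record ids.
train, test = simple_split(range(20))
print(len(train), len(test))
```

The two subsets are mutually exclusive and together cover every record exactly once.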
The original terminology of data mining commonly refers to discovering known patterns in large and structured data sets.
False
The ratio of correctly classified positives divided by the total actual positive count is defined as a precision metric.
False
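The ratio in the statement is recall (also called sensitivity or the true positive rate); precision divides by the predicted positives instead. A short sketch with made-up confusion-matrix counts:

```python
def recall(tp, fn):
    """Correctly classified positives divided by all actual positives."""
    return tp / (tp + fn)

def precision(tp, fp):
    """Correctly classified positives divided by all predicted positives."""
    return tp / (tp + fp)

# Hypothetical counts: 40 true positives, 10 false positives, 20 false negatives.
tp, fp, fn = 40, 10, 20
recall_value = recall(tp, fn)
precision_value = precision(tp, fp)
print(recall_value, precision_value)
```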
The k-means algorithm is part of the prediction type of data mining methods.
False
Which of the following algorithms uses the analogy of natural evolution to build directed-search-based mechanisms to classify data samples? Genetic; Rough sets; K-means; Statistical analysis; Support vector machines
Genetic
Which of the following questions can be answered by prescriptive analytics? How long will the current problem continue to happen? How can the best be realized? Why did we lose five percent of customers last year? Why did the sales drop in Dallas? Will our sales increase or decrease next month?
How can the best be realized?
In the SEMMA process, in which step do analysts have the option to select and transform the variables to improve the model construction process? Explore; Sample; Model; Assess; Modify
Modify
The categorical data contains
Nominal
The types of patterns discovered with data mining include all of these, except: Forecasting; Optimization; Classification; Clustering; Association
Optimization
Customer credit ratings such as bad, fair, and excellent are considered what type of data? Numeric; Continuous; Quantitative; Nominal; Ordinal
Ordinal
In retailing, data mining is most commonly used to: Discover time-variant association; Develop managerial dashboards; Predict future sales; Optimize cash returns; Detect policy failures
Predict future sales
Data mining is an essential part of which type of analytics in the analytics taxonomy?
Predictive
What type of analytics seeks to determine what is likely to happen in the future?
Predictive
If I am interested in identifying the optimal quantity of purchase orders in order to minimize the overall cost, which of the following analytics type should I use?
Prescriptive
What type of analytics seeks to identify the courses of action to achieve the best performance possible? Diagnostic; Domain specific; Predictive; Descriptive; Prescriptive
Prescriptive
The critical key terms used in defining data mining include all of these, except: Potentially useful; Process; Previously known; Nontrivial; Novel
Previously known
In data mining, prediction models are further sub-classified into: Affinity analysis; Link analysis; Outlier analysis; Regression; Segmentation
Regression
The well-known standardized process for data analytics developed by SAS is called: CRISP-DM methodology; SEMMA methodology; Knowledge discovery in databases (KDD) methodology; Six Sigma methodology; Agile Methodology
SEMMA methodology
In data mining, clustering is further classified into: Segmentation, Outlier Analysis; Segmentation, Classification, Sequence Analysis; Segmentation, Outlier Analysis, Link Analysis; Segmentation, Outlier Analysis, Classification; Segmentation, Classification
Segmentation, Outlier Analysis
The primary difference between statistics and data mining is: Statistics starts with a well-defined proposition and hypothesis whereas data mining starts with a loosely defined discovery statement; Statistics starts with a vaguely defined discovery system whereas data mining starts with a predefined proposed system; None of the answers are true; Statistics starts with a loosely defined discovery statement whereas data mining starts with a well-defined proposition and hypothesis; Data mining starts with a well-defined hypothesis and statistics starts with a novel discovery statement
Statistics starts with a well-defined proposition and hypothesis whereas data mining starts with a loosely defined discovery statement.
In the SEMMA process, the first step, Sample, involves which of the following sub-steps? Training, Testing, Deployment; Training, Evaluation, Test, Deployment; Testing, Evaluation, Deployment; Training, Deployment, Maintenance; Training, Validation, Test
Training, Validation, Test
A typical example of interval scale measurement is the temperature on the Celsius scale.
True
Analytics is the art and science of discovering insight to support accurate and timely decision making.
True
Apriori and FP-Growth algorithms are part of the association type data mining tasks.
True
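Association algorithms such as Apriori search for rules that meet thresholds on measures like support and confidence. The sketch below computes only those two measures for one rule; it is not an implementation of Apriori or FP-Growth, and the transactions are invented.

```python
# Hypothetical market baskets, invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

def count_containing(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    return sum(1 for basket in baskets if itemset <= basket)

def support(itemset, baskets):
    """Fraction of baskets that contain the whole itemset."""
    return count_containing(itemset, baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the antecedent, the fraction that also contain the consequent."""
    both = count_containing(antecedent | consequent, baskets)
    return both / count_containing(antecedent, baskets)

rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(rule_support, rule_confidence)
```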
Association patterns can also include capturing the sequence of events and things.
True
Business analytics and data science have the same purpose: to convert data into actionable insight through an algorithm-based discovery process.
True
Business intelligence is nothing more than the descriptive analytics part of the simple business analytics taxonomy.
True
CRM aims to create one-on-one relationships with customers by developing an intimate understanding of their needs and wants.
True
Data mining leverages capabilities of statistics, artificial intelligence, machine learning, management science, information systems, and databases, in a systematic and synergistic way.
True
During the model building step in CRISP-DM process, the data mining methods and algorithms are applied to the current data set.
True
ERP stands for enterprise resource planning and is used for the integration of company-wide data.
True
F1 metric is simply the harmonic mean of precision and recall.
True
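As a quick check of the harmonic-mean definition, the sketch below compares the usual F1 formula with Python's `statistics.harmonic_mean`. The precision and recall values are arbitrary examples.

```python
from statistics import harmonic_mean

def f1_score(precision, recall):
    """F1 = 2 * P * R / (P + R), the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Arbitrary example values.
p, r = 0.8, 0.5
f1 = f1_score(p, r)
print(f1, harmonic_mean([p, r]))
```

Because the harmonic mean is pulled toward the smaller value, F1 stays low unless both precision and recall are high.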
How and what the model concludes on certain predictions is obtained by the interpretability characteristic of a prediction method.
True
Identifying the most pressing problem and defining the goals and objectives can be done in the define step in Six Sigma process.
True
If a data scientist is analyzing historical data to identify problems and root causes, he/she is essentially conducting descriptive analytics.
True
In SEMMA process, the accuracy and usefulness of the models are evaluated in the assess step.
True
In the model-building task, both CRISP-DM and SEMMA methodologies build and test various models.
True
In the retail industry, association rule mining is frequently called market-basket analysis.
True
Information warfare often refers to identifying and stopping malicious attacks on critical information infrastructures in literally any and every organization and business domain.
True
Manufacturers use data mining to classify anomalies and commonalities in the production system to improve the manufacturing system.
True
One of the key differences between business analytics and data science is their primary focus either on business problems or on mathematical algorithms.
True
Organizations apply analytics to business problems to identify problems, foresee future trends, and make best possible decisions.
True
Six Sigma process promotes an error-free/perfect business execution.
True
A centralized data repository that combines data sources to support managerial decisions is known as a data warehouse.
True
The important part of KDD process is the feedback loop that allows the process flow to redirect backward, from any step to any other previous steps, for rework and readjustments.
True
The purpose of data preparation, which is also commonly known as data preprocessing, is to eliminate the possibility of GIGO (garbage in, garbage out) errors.
True
The ratio of accurately classified instances (positives and negatives) divided by the total number of instances is defined as the overall accuracy metric.
True
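Overall accuracy counts both kinds of correct prediction in the numerator. A one-function sketch with made-up confusion-matrix counts:

```python
def overall_accuracy(tp, tn, fp, fn):
    """(True positives + true negatives) divided by all instances."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical counts for 100 instances.
accuracy_value = overall_accuracy(tp=40, tn=30, fp=10, fn=20)
print(accuracy_value)
```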
Today, analytics can be defined as simply as "the discovery of information/knowledge/insight in data."
True
Business Analytics is the process of developing code and frameworks.
False
The main reason that data mining has gained overwhelming attention in the business world
All answers are true
The main roadblocks for adopting analytics include which of the following? All of the answers are true; Sheer size of big data; Justifying ROI; Lack of analytics talent; Corporate culture
All of the answers are true
Other names commonly used for data mining include: Information harvesting; Knowledge discovery in databases; Information extraction; All of the answers are true; Pattern analysis
All of the answers are true
Firms have used analytics to enhance which of the following business activities? To empower employees with the information; To improve their relationships with their customers; To identify fraudulent transactions; To make better decisions; All the answers are true
All the answers are true
Business intelligence is a broad concept that also includes business analytics within its simple taxonomy.
False
In a longitudinal view of the evolution of analytics, what we nowadays call analytics was called __________________ in the 1970s.
Decision support systems
Jim, the marketing manager in the company, is interested in the sales numbers in the south region by each product type for the last six months. What type of analytics would you use to help him? Descriptive; Predictive; Diagnostic; Domain specific; Prescriptive
Descriptive
Which of the following is not among the most important drivers behind the popularity of business analytics and data science? Cheaper hardware and software; Availability of ample digitized data; Need to make better decisions; Domain specific knowledge; Enhanced algorithms
Domain specific knowledge
The CRISP-DM methodology was proposed by Fayyad et al. in 1996.
False
Which of the following is the definition of data? Knowledge; Chart; Information; Facts and measures; Analytics
Facts and measures
Analytics and analysis are essentially the same thing; they both focus on the granular level representation of complex problems through decomposition of the whole into its lower-level parts.
False
Balancing skewed data means oversampling of the more represented class records and undersampling of the less represented class records.
False
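The statement has the directions reversed: balancing oversamples the less represented class or undersamples the more represented one. A minimal sketch of random oversampling, where the records, the `label_of` accessor, and the seed are all illustrative assumptions:

```python
import random
from collections import Counter

def oversample_minority(records, label_of, seed=0):
    """Duplicate randomly chosen records of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for record in records:
        by_class.setdefault(label_of(record), []).append(record)
    target = max(len(rows) for rows in by_class.values())
    balanced = []
    for rows in by_class.values():
        balanced.extend(rows)
        # Sample with replacement to fill the gap (no-op for the largest class).
        balanced.extend(rng.choices(rows, k=target - len(rows)))
    return balanced

# Hypothetical skewed data: 8 "no" records and 2 "yes" records.
records = [(i, "no") for i in range(8)] + [(i, "yes") for i in range(8, 10)]
balanced = oversample_minority(records, label_of=lambda rec: rec[1])
class_counts = Counter(label for _, label in balanced)
print(class_counts)
```

Undersampling would instead discard randomly chosen records from the larger class until the counts match.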
Bootstrapping methodology is similar to the leave-one-out methodology in that it can be used to calculate accuracy by leaving one sample out at each iteration of the estimation process.
False
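The two methods differ in how the training set is formed: leave-one-out holds out exactly one record per iteration, while bootstrapping samples the training set with replacement and tests on whatever was never drawn. A sketch over record indices, with an arbitrary seed:

```python
import random

def leave_one_out(n):
    """Yield n (train, test) index splits, each holding out exactly one record."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]

def bootstrap(n, seed=0):
    """Draw n training indices with replacement; undrawn indices form the test set."""
    rng = random.Random(seed)
    train = [rng.randrange(n) for _ in range(n)]
    drawn = set(train)
    test = [j for j in range(n) if j not in drawn]
    return train, test

loo_folds = list(leave_one_out(5))
boot_train, boot_test = bootstrap(10)
print(len(loo_folds), len(boot_train), len(boot_test))
```

The bootstrap training set has the same size as the original data but typically contains duplicates, so the size of its test set varies from run to run.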
Which of the following is the overarching principle in DeepQA? All the answers are true; Computer intelligence; Human intelligence; Integration of shallow and deep knowledge; Maturity continuum
Integration of shallow and deep knowledge
The most commonly used clustering technique is
k-means
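For intuition, here is a minimal pure-Python sketch of the k-means loop: assign each point to its nearest centroid, then move each centroid to the mean of its points. The sample points, the initial centroids, and the fixed iteration count are assumptions made for the example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def k_means(points, centroids, iterations=10):
    """Alternate the assignment and update steps for a fixed number of iterations."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean_point(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical 2-D points forming two obvious groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, centroids=[(0, 0), (10, 10)])
print(centroids)
```

Real implementations usually stop when assignments no longer change and try several random initializations, since the result depends on the starting centroids.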
The first and earliest data mining process is known by the name of: Knowledge discovery in databases (KDD) methodology; SEMMA methodology; Waterfall methodology; CRISP-DM methodology; Six Sigma methodology
Knowledge discovery in databases (KDD) methodology
Which of the following developments is not contributing to facilitating growth of decision support and analytics?
Locally Concentrated Workforces
During which step in DMAIC are the identified data sources consolidated and transformed into a format that is amenable to machine processing?
Measure
