MSIS-4263 Midterm

In CRISP-DM methodology, how many sequential steps exist?

6

Which of the following is true about what clustering can do?

Assigning customers in different segments

Random sampling of a fixed number of instances from the original data with replacement to construct the training data set is achieved by

Bootstrapping
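
A minimal sketch of sampling with replacement, assuming a hypothetical helper named `bootstrap_sample` and made-up data. It is only an illustration of the idea, not a reference implementation:

```python
import random

def bootstrap_sample(data, n, seed=None):
    """Draw n instances from data uniformly at random, with replacement."""
    rng = random.Random(seed)  # seeded only so the illustration is repeatable
    return [rng.choice(data) for _ in range(n)]

original = [10, 20, 30, 40, 50]  # made-up data set
training = bootstrap_sample(original, n=len(original), seed=42)

# Instances that were never drawn ("out-of-bag") can serve as a test set.
out_of_bag = [x for x in original if x not in training]
```

Because draws are made with replacement, the same instance may appear several times in `training` while others are left out entirely.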

Identifying the goals, purpose, and requirements of the customers is achieved in which step of the CRISP-DM process?

Business Understanding

The most relevant methodology that is used to implement data science and business analytics projects is

CRISP-DM methodology

______________ classification approach uses historical samples and cases to identify commonalities in order to assign a new case to the most similar category.

Case-based reasoning

Which of the following is not a supervised machine learning algorithm?

Clustering

When an SVM prediction model is developed, it can be integrated into a decision support system by which of the following methods?

Computational object

In classification problems, the main source for all accuracy estimation metrics is a

Confusion matrix
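
As a rough illustration of why the confusion matrix is the source of the accuracy metrics, the sketch below tallies the four cells for a binary problem and derives the usual ratios from them. The function name `confusion_counts` and the labels are made up for the example:

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Tally true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

# Made-up actual and predicted labels (1 = positive, 0 = negative).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / (tp + fp + tn + fn)  # correct / all instances
sensitivity = tp / (tp + fn)                # correct positives / actual positives
specificity = tn / (tn + fp)                # correct negatives / actual negatives
precision = tp / (tp + fp)                  # correct positives / predicted positives
```

Note that sensitivity divides by the actual positives, while precision divides by the predicted positives.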

Which of the following provides an estimate of the degree of linear association between numerically represented variables?

Correlation
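
One common way to measure linear association is the Pearson correlation coefficient. The sketch below is a plain-Python version under that assumption; the helper name `pearson_r` and the data are invented for illustration:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

hours = [1, 2, 3, 4, 5]        # made-up study hours
score = [52, 58, 61, 70, 74]   # made-up exam scores
r = pearson_r(hours, score)    # close to +1: strong positive linear association
```

Values near +1 or -1 indicate a strong linear relationship, and values near 0 indicate little or none.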

The _____________ method's common idea is to split the data sample into a number of randomly drawn, disjointed subsamples.

Cross validation
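
A minimal sketch of k-fold cross validation, assuming a hypothetical helper `k_fold_indices` that works on row indices only. Each fold takes one turn as the test set while the remaining folds form the training set:

```python
import random

def k_fold_indices(n_samples, k, seed=None):
    """Shuffle the indices 0..n_samples-1 and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

folds = k_fold_indices(10, k=5, seed=0)

# Build the (train, test) index pairs, one per fold.
splits = []
for i, test_idx in enumerate(folds):
    train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
    splits.append((train_idx, test_idx))
```

The accuracy estimate would then be the average of the scores obtained on the k test folds.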

Identifying the relevant data from different sources is achieved in which step of the CRISP-DM process?

Data Understanding

Usually, which step in the CRISP-DM process takes the most time to complete?

Data preparation

Data mining is primarily concerned with mining (i.e., digging out data) from a variety of disparate data sources.

False

If I am distributing funds to different financial products to maximize return, I am essentially doing descriptive analytics.

False

If a classification problem is not binary, we cannot use a confusion matrix to tabulate prediction outcomes.

False
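
To see why the statement is false, here is a small sketch of a confusion matrix for three classes. The function name `confusion_matrix` and the labels are assumptions made for the example:

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(actual, predicted)] for predicted in labels]
            for actual in labels]

labels = ["bad", "fair", "excellent"]
y_true = ["bad", "bad", "fair", "fair", "fair", "excellent"]        # made up
y_pred = ["bad", "fair", "fair", "fair", "excellent", "excellent"]  # made up

matrix = confusion_matrix(y_true, y_pred, labels)
# Correct predictions sit on the diagonal.
correct = sum(matrix[i][i] for i in range(len(labels)))
```

The matrix simply grows to one row and one column per class.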

In the SEMMA process, visualization and description of the data is carried out in the modify step.

False

In banking and finance, data mining is often used to manage microeconomic movements and overall cash flow outcomes.

False

In linear regression, the independence of errors assumption is also known as homoscedasticity.

False

In linear regression, hypothesis testing reveals the existence of relationships between explanatory (i.e., input) variables.

False

Under the normality of errors assumption of linear regression, the response variable's values are expected to be randomly distributed.

False

In the project finalization task, both CRISP-DM and SEMMA methodologies prescribe deploying the results.

False

In the testing and evaluation step of CRISP-DM methodology, monitoring and maintenance of the models are important.

False

Linear Regression aims to capture the functional relationships between one or more numeric input variables and a categorical output variable.

False

Logistic regression is like linear regression where both of them are used to predict a numeric target variable.

False

Major commercial business intelligence products and services were well established in the early 1970s.

False

One of the most pronounced reasons for the increasing popularity of data mining is that there are fewer suppliers than corresponding demand in the business marketplace.

False

Prediction modeling is often classified under the unsupervised machine learning methods.

False

The Naïve Bayes method requires output variables to have numeric values.

False

The area under the ROC curve is a graphical assessment technique for binary classification problems, in which sensitivity is plotted on the y-axis and the specificity is plotted on the x-axis.

False
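
The statement is false because the x-axis of an ROC curve shows the false positive rate (1 minus specificity), not specificity itself. A minimal sketch, with hypothetical helpers `roc_points` and `auc` and made-up scores:

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        # x = 1 - specificity, y = sensitivity
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the curve by the trapezoid rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

y_true = [1, 1, 0, 1, 0, 0]               # made-up actual classes
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]   # made-up predicted probabilities

points = roc_points(y_true, scores)
area = auc(points)
```

An area of 1.0 would mean a perfect ranking of positives above negatives, and 0.5 corresponds to random guessing.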

The modify step in Six-Sigma involves the process of assessing the mapping between organizational data repositories and the business problem.

False

The multi split methodology partitions data into exactly two mutually exclusive subsets called training set and test set.

False

The ratio of correctly classified positives divided by the total actual positive count is defined as a precision metric.

False

____________ clustering methods are based on the basic idea that nearby objects are more related to each other than are those that are farther away from each other.

Hierarchical

During which step in DMAIC are the identified data sources consolidated and transformed into a format that is amenable to machine processing?

Measure

________________ is the occurrence of high intercorrelations among two or more independent variables in a multiple regression model.

Multicollinearity

Which of the following relates to a pattern-recognition methodology for machine learning?

Neural computing

The categorical data contains

Nominal

The types of patterns discovered with data mining include all of these, except:

Optimization

Customer credit ratings such as bad, fair, and excellent are considered what type of data?

Ordinal

_____________ is a type of linear least squares method for estimating the unknown parameters in a linear regression model.

Ordinary least squares
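
For a single explanatory variable, the least-squares estimates have a simple closed form. The sketch below assumes that one-variable case; the helper name `ols_fit` and the data are made up:

```python
def ols_fit(xs, ys):
    """Fit y = intercept + slope * x by minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept, slope

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]  # made-up points lying exactly on y = 1 + 2x
intercept, slope = ols_fit(xs, ys)
```

With several explanatory variables the same idea applies, but the estimates are usually computed with matrix algebra rather than by hand.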

In retailing, data mining is most commonly used to

Predict future sales

Data mining is an essential part of which type of analytics in the analytics taxonomy?

Predictive

In brokerages and securities trading, data mining is used to

Prevent fraudulent activities

____________ is the coefficient of determination, a statistical measure of how well a regression model fits the data.

R-squared
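
A short sketch of the usual definition, one minus the ratio of the residual sum of squares to the total sum of squares. The function name `r_squared` and the numbers are assumptions for illustration:

```python
def r_squared(y_actual, y_predicted):
    """R^2 = 1 - SS_residual / SS_total."""
    mean_y = sum(y_actual) / len(y_actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_actual, y_predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in y_actual)
    return 1 - ss_res / ss_tot

y_actual = [3, 5, 7, 9]  # made-up observed values

perfect = r_squared(y_actual, [3, 5, 7, 9])   # predictions match exactly
baseline = r_squared(y_actual, [6, 6, 6, 6])  # always predicting the mean
```

A model that predicts every value exactly scores 1, while a model that always predicts the mean scores 0.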

The well-known standardized process for data analytics which was developed by SAS is called

SEMMA methodology

In data mining, clustering is classified further into

Segmentation, Outlier Analysis

Which of the following is not among the main assumptions in linear regression?

Simplicity

The ratio of accurately classified negatives divided by the total negative count is defined as

Specificity

The ratio of correctly classified negatives divided by the total negative count is called:

Specificity

The primary difference between statistics and data mining is

Statistics starts with a well-defined proposition and hypothesis whereas data mining starts with a loosely defined discovery statement.

A typical example of interval scale measurement is the temperature on the Celsius scale.

True

Analytics is the art and science of discovering insight to support accurate and timely decision making.

True

Apriori and FP-Growth algorithms are part of the association type data mining tasks.

True

Association patterns can also include capturing the sequence of events and things.

True

Business intelligence is nothing more than the descriptive analytics part of the simple business analytics taxonomy.

True

CRM aims to create one-on-one relationships with customers by developing an intimate understanding of their needs and wants.

True

Cubes in OLAP are defined as a multidimensional representation of the data stored in and retrieved from data warehouses.

True

Data mining leverages capabilities of statistics, artificial intelligence, machine learning, management science, information systems, and databases, in a systematic and synergistic way.

True

During the model building step in the CRISP-DM process, the data mining methods and algorithms are applied to the current data set.

True

ERP stands for enterprise resource planning and is used for the integration of company-wide data.

True

Homoscedasticity states that the response variables must have the same variance in their error, regardless of the explanatory variables' values.

True

In the SEMMA process, the accuracy and usefulness of the models are evaluated in the assess step.

True

In prediction, linear regression uses a mathematical equation to identify additive mathematical relationships between explanatory variables and the response variable.

True

In the model-building task, both CRISP-DM and SEMMA methodologies build and test various models.

True

In the retail industry, association rule mining is frequently called market-basket analysis.

True
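
Association rules are typically judged by support and confidence. The sketch below computes both for one rule over a handful of invented shopping baskets; the helper names `support` and `confidence` are assumptions for the example:

```python
# Made-up transactions, one set of items per basket.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the share that also
    contain the consequent."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

# Rule: {diapers} -> {beer}
rule_support = support({"diapers", "beer"}, transactions)         # 3 of 5 baskets
rule_confidence = confidence({"diapers"}, {"beer"}, transactions)  # 3 of 4 diaper baskets
```

Algorithms such as Apriori search for all itemsets whose support clears a chosen minimum before rules are generated from them.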

Manufacturers use data mining to classify anomalies and commonalities in the production system to improve the manufacturing system.

True

Multicollinearity can be triggered by having two or more perfectly correlated explanatory variables present in the model.

True

Organizations apply analytics to business problems to identify problems, foresee future trends, and make best possible decisions.

True

The Six Sigma process promotes an error-free/perfect business execution.

True

An important part of the KDD process is the feedback loop that allows the process flow to redirect backward, from any step to any previous step, for rework and readjustments.

True

The purpose of data preparation, also commonly known as data preprocessing, is to eliminate the possibility of GIGO (garbage in, garbage out) errors.

True

The ratio of accurately classified instances (positives and negatives) divided by the total number of instances is defined as the overall accuracy metric.

True

Today, analytics can be defined as simply as "the discovery of information/knowledge/insight in data."

True

k-NN is a prediction method used not only for classification but also for regression-type prediction problems.

True
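
A minimal sketch showing the same neighbor search used for both kinds of prediction: a majority vote for classification and an average for regression. The helper names and data are made up, and squared Euclidean distance is assumed:

```python
from collections import Counter

def k_nearest(train, query, k):
    """Return the k training rows closest to query (squared Euclidean distance)."""
    def distance(row):
        features, _ = row
        return sum((a - b) ** 2 for a, b in zip(features, query))
    return sorted(train, key=distance)[:k]

def knn_classify(train, query, k):
    """Classification: majority vote among the k nearest neighbors."""
    votes = Counter(label for _, label in k_nearest(train, query, k))
    return votes.most_common(1)[0][0]

def knn_regress(train, query, k):
    """Regression: average target value of the k nearest neighbors."""
    neighbors = k_nearest(train, query, k)
    return sum(target for _, target in neighbors) / len(neighbors)

# Made-up training data: (features, label) and (features, numeric target).
labelled = [((1.0, 1.0), "low"), ((1.5, 2.0), "low"), ((2.0, 1.0), "low"),
            ((6.0, 6.0), "high"), ((7.0, 6.5), "high"), ((6.5, 7.0), "high")]
numeric = [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 30.0), ((10.0,), 100.0)]

predicted_class = knn_classify(labelled, (6.2, 6.1), k=3)
predicted_value = knn_regress(numeric, (2.1,), k=3)
```

Only the final aggregation step differs between the two uses.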

Association and clustering type patterns are often classified as the result of

Unsupervised learning procedures

Business Analytics is the process of developing code and frameworks.

False

In linear regression the relationship between the variables can be represented as:

All the answers are true: mathematical equation, additive function, linear representation, linear coefficient

Which one of the following represents unstructured data?

All answers are true: multimedia, XML/HTML, pictures, textual

Which of the following application areas make use of association rule mining:

All answers are true: sales transactions, medical records, credit card transactions, banking services

The tasks performed during data preprocessing for an SVM model include:

All the answers are true: handling noisy values, handling missing and incomplete data, normalizing the data, numericizing the data

Business intelligence is a broad concept that also includes business analytics within its simple taxonomy.

False

The CRISP-DM methodology was proposed by Fayyad et al. in 1996.

False

Analytics and analysis are essentially the same thing; they both focus on the granular level representation of complex problems through decomposition of the whole into its lower-level parts.

False

Balancing skewed data means oversampling of the more represented class records and undersampling of the less represented class records.

False

The bootstrapping methodology is similar to the leave-one-out methodology in that it can be used to calculate accuracy by leaving one sample out at each iteration of the estimation process.

False

The most commonly used clustering technique is

K-means
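
A rough sketch of the standard k-means loop (assign each point to its nearest centroid, then move each centroid to the mean of its points), restricted to one-dimensional data for brevity. The function name `k_means_1d`, the starting centroids, and the data are assumptions for the example:

```python
def k_means_1d(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]  # made-up observations
centroids, clusters = k_means_1d(points, centroids=[0.0, 5.0])
```

The number of clusters is fixed in advance by the number of starting centroids, and the result can depend on where those centroids start.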

________________ clustering is a method of vector quantization that partitions observations into a predetermined, fixed number of clusters.

K-means

The first and earliest data mining process is known as

Knowledge discovery in databases (KDD) methodology

Which of the following is not a classification method?

Linear regression

