DSCI

Predictors of a multiple linear regression model can only be of the numeric type. True or false

false
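
One common way categorical predictors enter a regression model is dummy (indicator) coding. The sketch below is illustrative only: the helper name `one_hot` and the sample colors are assumptions, not part of the course material.

```python
def one_hot(values, drop_first=True):
    """Encode categorical values as 0/1 indicator columns.

    Returns (rows, levels), where each row holds one indicator per kept level.
    """
    levels = sorted(set(values))
    if drop_first:
        # Drop one level as the baseline; the intercept absorbs it.
        levels = levels[1:]
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return rows, levels


# Hypothetical categorical predictor with three levels.
rows, columns = one_hot(["red", "green", "blue", "green"])
```

Each kept level becomes a numeric 0/1 column, which is why a categorical predictor can still be used in multiple linear regression.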

ANOVA is an analysis under which of the following data mining task categories? A) Data exploration B) Data cleaning C) Dimension reduction D) Visualization

A) Data exploration

a. Deciding whether to issue a loan to an applicant based on demographic and financial data (with reference to a database of similar data on prior customers). This is supervised learning, because the database includes whether the loan was approved or not.
b. In an online bookstore, making recommendations to customers concerning additional items to buy based on the buying patterns in prior transactions. This is unsupervised learning, because there is no apparent outcome (e.g., whether the recommendation was adopted or not).
c. Identifying a network data packet as dangerous (virus, hacker attack) based on comparison to other packets whose threat status is known. This is supervised learning, because for the other packets the status is known.
d. Identifying segments of similar customers. This is unsupervised learning, because there is no known outcome (though once you use unsupervised learning to identify segments, you could use supervised learning to classify new customers into those segments).
e. Predicting whether a company will go bankrupt based on comparing its financial data to those of similar bankrupt and nonbankrupt firms. This is supervised learning, because the status of the similar firms is known.
f. Estimating the repair time required for an aircraft based on a trouble ticket. This is supervised learning, because there is likely to be knowledge of actual (historic) repair times of similar repairs.
g. Automated sorting of mail by zip code scanning. This is supervised learning, as there is likely to be knowledge about whether the sorting was correct.
h. Printing of custom discount coupons at the conclusion of a grocery store checkout based on what you just bought and what others have bought previously. This is unsupervised learning, if we assume that we do not know what will be purchased in the future.

What essential element of machine learning algorithms distinguishes supervised from unsupervised learning? A) Supervised learning models use a target variable, whereas unsupervised learning models have no target variable B) Supervised learning models require significantly larger computational resources than unsupervised learning models C) Unsupervised learning models require significantly larger computational resources than supervised learning models D) Supervised learning models are better than unsupervised learning models

A) Supervised learning models use a target variable, whereas unsupervised learning models have no target variable

The target variable in a multiple linear regression model must be a: A) Numerical variable B) Nominal variable C) Ordinal variable D) Binary variable

A) Numerical variable

Which of the following tasks is a supervised learning task? A) Predicting air pollution level B) Segmenting the market based on common interests and priorities C) Visualizing the relationship between the target variable and predictors D) Finding association rules in a large amount of retail transactions

A) Predicting air pollution level

Which statement about the data mining process is INCORRECT? A) Data cleaning and pre-processing is usually a trivial step in the process B) The overall objective of any data mining process is to improve the business C) There is no value in raw, unprocessed, unactionable data D) Data mining is a multidisciplinary field that makes significant use of statistics

A) Data cleaning and pre-processing is usually a trivial step in the process

Which statement about the business intelligence (BI) workflow is CORRECT? A) Data in the operational database is transformed to analytical data in the data warehouse B) External data sources are never used in BI analytics C) Business users and BI analysts are usually the same group of people D) Transactional data software is directly connected to the data warehouse

A) Data in the operational database is transformed to analytical data in the data warehouse

Which of the following is NOT a step in data pre-processing? A) Data modeling B) Data integration C) Data cleaning D) Data reduction

A) Data modeling

"Learn from the observed records to predict numerical values of unseen records." In data mining, this is called: A) Regression B) Classification C) Clustering D) Segmentation

A) Regression

To show the relationship between one numeric and one categorical variable, which plot type is NOT useful? A) Scatter plot B) Boxplot C) Bar chart D) Pie chart

A) Scatter plot

Which of the following statements about missing values in a data set is INCORRECT? A) The best strategy is always to drop records with any missing values. B) Missing values are a common issue and there are strategies to handle them C) Both categorical and numerical values can be missing D) Missing values can be estimated

A) The best strategy is always to drop records with any missing values.
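
As an illustration of estimating missing values instead of dropping records, here is a minimal sketch of mean imputation. The function name `impute_mean`, the use of `None` as the missing-value marker, and the sample ages are all assumptions made for this example; mean imputation is only one of several possible strategies.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)  # assumes at least one observed value
    return [fill if v is None else v for v in values]


# Hypothetical numeric column with one missing entry.
ages = [20, None, 40, 30]
filled = impute_mean(ages)
```

All four records are kept; only the missing entry is replaced by an estimate.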

Which statement about exploratory data visualization is INCORRECT? A) The purpose of visual exploration of data is to perform target prediction B) It aims to discover and display interesting and useful patterns and trends in data C) A variety of plot types are used in exploratory visualization D) It usually overlaps with exploratory statistical analytics

A) The purpose of visual exploration of data is to perform target prediction

Which of the following is a core idea/task in data mining? A) Data visualization B) Regression modeling C) Data cleaning and pre-processing D) All of the others

D) All of the others

"Identifying segments of similar customers." Performing this task in data mining requires a supervised learning approach. True or False

False

"Predicting whether a company will go bankrupt based on comparing its financial data to those of similar bankrupt and nonbankrupt firms." Performing this task in data mining requires an unsupervised learning approach. True or False

False

Both the histogram and the box plot are univariate plots that are useful for exploring the distribution of a variable. True or False

True

Data exploration includes summary statistics, univariate and bivariate analysis, basic statistical tests (t-test, correlation), ANOVA, and outlier detection. True or False

True

In the data preparation step, normalizing numeric data is a popular method to transform variables into a more suitable structure for modeling. True or False

True
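
Two widely used ways to normalize a numeric variable are min-max scaling and z-score standardization. The sketch below is a minimal illustration; the function names and sample values are assumptions, and both functions assume the input is not constant (otherwise the denominator would be zero).

```python
from statistics import mean, pstdev


def min_max(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Center values on the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical numeric column.
incomes = [10, 20, 30]
scaled = min_max(incomes)
standardized = z_score(incomes)
```

Min-max scaling keeps values in a fixed range, while z-scores express each value as a number of standard deviations from the mean.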

Transforming numerical variables means performing mathematical functions on them and creating new variables that are better suited for our data mining model. True or False

True

A higher value of adjusted R² indicates a model that can explain more of the variance in the data. True or False

true
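
For reference, adjusted R² is computed from ordinary R² as 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of records and p the number of predictors. The sketch below is a direct transcription of that formula; the function name and the example numbers are assumptions chosen for illustration.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for a model with p predictors fit on n records."""
    if n - p - 1 <= 0:
        raise ValueError("need more records than predictors plus one")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Same R-squared, but one extra predictor: the adjusted value drops.
two_predictors = adjusted_r2(0.8, n=11, p=2)
three_predictors = adjusted_r2(0.8, n=11, p=3)
```

Because of the penalty on p, an added predictor raises adjusted R² only if it improves R² enough to offset the lost degree of freedom.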

We use the least squares method to find the best-fit regression line. True or False

true
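
A minimal sketch of least squares for a single predictor follows, using the closed-form slope and intercept that minimize the sum of squared residuals. The function name `fit_line` and the sample points are assumptions for illustration; multiple regression generalizes the same idea to several predictors.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope = sum of cross-deviations / sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)  # assumes xs are not all equal
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical points that lie exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```

With noisy data the points would not fall on the line, and the fitted line would be the one with the smallest total squared vertical distance to them.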

