Data Mining Final

Data mining is a simple transformation of technology developed from databases, statistics, and machine learning?

False (it is not)

A process of finding a model that describes and distinguishes data classes or concepts, so that the model can be used to predict the class of objects whose class label is unknown. It predicts categorical (discrete, unordered) labels.

classification

database

clean data

has difficulty when number of classes is large:

gini index
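
As a rough illustration of the measure named on this card, the Gini index of a set of class labels can be computed as one minus the sum of squared class proportions. The function names `gini_index` and `gini_split` are illustrative choices, not taken from the course material.

```python
from collections import Counter


def gini_index(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def gini_split(partitions):
    """Weighted Gini index of a split, given one list of labels per partition."""
    total = sum(len(part) for part in partitions)
    return sum(len(part) / total * gini_index(part) for part in partitions)


if __name__ == "__main__":
    # An evenly mixed two-class set has impurity 0.5; a pure set has 0.0.
    print(gini_index(["yes", "yes", "no", "no"]))
    print(gini_index(["yes", "yes", "yes"]))
```

A split with a lower weighted Gini index separates the classes more cleanly.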

data warehouse

any data used, dirty data

How do you handle noisy data?

binning, regression, clustering, detect suspicious values
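
Of the techniques listed, binning is the easiest to show in a few lines. The sketch below assumes equal-frequency bins and smoothing by bin means; the helper name `smooth_by_bin_means` and the sample prices are illustrative only.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into equal-frequency bins of bin_size,
    and replace every value with the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        bin_mean = sum(bin_values) / len(bin_values)
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed


if __name__ == "__main__":
    prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]  # illustrative data
    print(smooth_by_bin_means(prices, 3))
    # prints [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```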

A process to analyze data objects without consulting a known class label. The objects are clustered or grouped based on the principle of maximizing the intra-class similarity and minimizing the interclass similarity

clustering

1950-1990

computational science

a process that removes or transforms noise and inconsistent data

data cleaning

where multiple data sources may be combined

data integration

1990-now

data science

Clustering partitions data set into clusters based on ________.

similarities

The goal of data reduction is to obtain a reduced representation of the data set that is much ____ in volume yet produces the same ____ results.

smaller, analytical

regression

smooth by fitting data into a function

Noise is error or variance in a measured variable.

true

When data is integrated, redundant attributes may be generated. Such redundancy could be detected by ____ analysis and _____ analysis

covariance, correlation
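
To make the two measures concrete, here is a minimal sketch for numeric attributes, assuming population (divide-by-n) covariance and the Pearson correlation coefficient. The function names are illustrative.

```python
import math


def covariance(xs, ys):
    """Population covariance of two equal-length numeric attributes."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def correlation(xs, ys):
    """Pearson correlation coefficient: covariance divided by the product
    of the two standard deviations. Assumes neither attribute is constant."""
    std_x = math.sqrt(covariance(xs, xs))
    std_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (std_x * std_y)


if __name__ == "__main__":
    a = [1, 2, 3, 4]
    b = [2, 4, 6, 8]  # b is exactly 2 * a, so b is redundant given a
    print(covariance(a, b), correlation(a, b))
```

A correlation close to +1 or -1 suggests that one of the two attributes may be redundant after integration.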

an essential process where intelligent and efficient methods are applied in order to extract patterns

data mining

objects are made up of:

entities

The task of data cleaning is just to get rid of noisy data.

false

Decision tree is constructed in a bottom-up recursive divide-and-conquer manner.

false (top-down)

Dissimilarity of data is higher when objects are more alike

false (lower)

prefers unbalanced splits in which one partition is much smaller than the others:

gain ratio

data preprocessing

improves data

Another challenge in data mining is the parallel, distributed, and [a1] processing of data mining algorithms. Due to the high cost of some data mining processes, [a1] data mining algorithms incorporate database updates without the need to mine the entire data again from scratch. The two blanks should be filled with the same word. What is it?

incremental

has bias towards multivalued attributes:

information gain
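
The bias can be seen in a small sketch. It assumes entropy-based information gain and, for comparison, gain ratio computed as the gain divided by the split information. The function names and the toy data (a unique `row_id` attribute versus a two-valued `useful` attribute) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def partition(values, labels):
    """Group the class labels by the attribute value of each row."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    return list(groups.values())


def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after splitting."""
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in partition(values, labels))
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain normalized by the entropy of the attribute itself."""
    split_info = entropy(values)
    if split_info == 0:
        return 0.0
    return information_gain(values, labels) / split_info


if __name__ == "__main__":
    labels = ["yes", "yes", "no", "no"]
    row_id = [1, 2, 3, 4]          # unique per row: many values, no real signal
    useful = ["a", "a", "b", "b"]  # two values that match the classes
    print(information_gain(row_id, labels), information_gain(useful, labels))
    print(gain_ratio(row_id, labels), gain_ratio(useful, labels))
```

Both attributes reach the same information gain here, even though the unique identifier cannot generalize; the gain ratio ranks the two-valued attribute higher.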

where visualization and knowledge representation techniques are used to present the mined knowledge to the user

knowledge presentation

Central tendency of data can be measured by mean, median, and ____.

mode
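
The three measures can be checked quickly with Python's standard `statistics` module. The wrapper name `central_tendency` and the sample values are illustrative.

```python
import statistics


def central_tendency(values):
    """Return the mean, median, and mode(s) of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # multimode returns every most-frequent value, so ties are not hidden
        "modes": statistics.multimode(values),
    }


if __name__ == "__main__":
    print(central_tendency([2, 3, 3, 5, 7, 10]))
```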

Classification is a two-step process. The two steps are:

model construction, model usage

what are the 5 data attributes

nominal, binary, ordinal, interval-scaled, ratio-scaled

data sets are made up of:

objects

A process to analyze the objects that do not comply with the general behavior or model of the data. Examples include fraud detection based on a large dataset of credit card transactions

outlier analysis

An induced tree may _____ the training data when it has too many branches. Some may reflect anomalies due to noise or outliers.

overfit

where data relevant to the analysis task are retrieved from the database

data selection

where data are transformed or consolidated into forms appropriate for mining

data transformation

clustering

detect and remove outliers

The need for data mining has arisen due to the wide availability of huge amounts of data and the imminent need for turning such data into useful _____ and ____

information, knowledge

In supervised learning, the training data are accompanied by ____ which are indicating the class of the observations.

labels

Correlation analysis measures the _________ relationship between objects.

linear

a process that identifies the truly interesting patterns representing knowledge based on some interesting measures

pattern evaluation

A process to model continuous-valued functions. It is used to predict missing or unavailable numerical data values rather than (discrete) class label

regression

One challenge to data mining regarding performance issues is the ___and ___ of data mining algorithms, because it is extremely important to effectively extract information from large amounts of data in databases within predictable and acceptable running times

scalability, efficiency

data normalization

scales data
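
Two common ways to scale an attribute are min-max normalization and z-score normalization. The sketch below assumes a target range of [0, 1] by default and a population standard deviation; the function names are illustrative.

```python
import math


def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly map values from [min, max] onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no spread; map everything to new_min.
        return [new_min for _ in values]
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_score_normalize(values):
    """Center values on the mean and divide by the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


if __name__ == "__main__":
    incomes = [10, 20, 30]  # illustrative data
    print(min_max_normalize(incomes))
    print(z_score_normalize(incomes))
```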

data visualization

search patterns, trends among data

The purpose of data pre-processing is to improve data quality.

true

