Data Mining Exam 1: Lecture 2

Attribute transformation

a function that maps the entire set of values of a given attribute to a new set of replacement values such that each old value can be identified with one of the new values
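As one possible illustration, a base-10 logarithm is a common attribute transformation for values that span several orders of magnitude. The sketch below assumes strictly positive inputs; the function name and the sample data are made up for this example.

```python
import math

def log10_transform(values):
    """Map every value of an attribute to its base-10 logarithm.

    Assumes all values are strictly positive. The mapping is one-to-one,
    so each old value can still be identified with its new value.
    """
    return [math.log10(v) for v in values]

# Hypothetical attribute: city populations spanning several orders of magnitude.
populations = [100, 1000, 1000000]
transformed = log10_transform(populations)
```

After the transformation the values are roughly 2, 3, and 6, so the order of the objects is preserved while the scale is compressed.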

An attribute is

a property or characteristic of an object

Binary attributes are

a special case of discrete attributes

Outliers

are data objects with characteristics that are considerably different than most of the other data objects in the data set
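The definition above does not say how "considerably different" is measured. One common rule of thumb, used here purely as an assumption, flags values that lie more than 1.5 times the interquartile range (IQR) outside the quartiles. The function name and the readings are hypothetical.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR below Q1 or above Q3.

    The 1.5 * IQR rule is an assumed convention, not part of the definition.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical sensor readings with one value far from the rest.
readings = [10, 12, 11, 13, 12, 11, 95]
outliers = iqr_outliers(readings)
```

Here only the reading 95 falls outside the computed bounds.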

Dimensionality Reduction

attempts to avoid the curse of dimensionality using techniques such as PCA, SVD, and supervised or non-linear methods

A collection of attributes

describes an object

Feature Subset Selection

Removes redundant and irrelevant features
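A deliberately simple sketch of the idea follows. It treats a column with a single constant value as irrelevant and an exact copy of an earlier column as redundant. Both criteria are assumptions chosen for brevity; practical feature selection usually relies on correlation measures or model-based scoring instead. All names and data are hypothetical.

```python
def select_features(columns):
    """Keep only columns that are neither constant nor exact duplicates.

    columns: dict mapping a feature name to its list of values.
    """
    kept = {}
    seen = set()
    for name, values in columns.items():
        if len(set(values)) <= 1:
            continue  # constant column: carries no information (assumed irrelevant)
        key = tuple(values)
        if key in seen:
            continue  # identical to an earlier column (redundant)
        seen.add(key)
        kept[name] = values
    return kept

# Hypothetical data set with one duplicated and one constant feature.
data = {
    "height_cm": [150, 160, 170],
    "height_copy": [150, 160, 170],
    "country": ["US", "US", "US"],
    "weight_kg": [50, 65, 80],
}
selected = select_features(data)
```

Only `height_cm` and `weight_kg` survive the filter.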

Sampling

The main technique employed for data selection because obtaining the entire set of data is too expensive or time consuming
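A minimal sketch of simple random sampling without replacement, using only the standard library. The population, sample size, and seed are arbitrary choices for illustration.

```python
import random

def simple_random_sample(population, size, seed=0):
    """Draw `size` distinct objects uniformly at random, without replacement.

    A fixed seed makes the draw repeatable; the default of 0 is arbitrary.
    """
    rng = random.Random(seed)
    return rng.sample(population, size)

# Hypothetical population of 1000 object IDs.
population = list(range(1000))
sample = simple_random_sample(population, 50)
```

The analysis then runs on the 50 sampled objects instead of the full population.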

Discretization

The process of converting a continuous attribute into an ordinal attribute
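One way to do this, sketched below, is equal-width binning: the value range is split into intervals of equal size and each interval gets an ordered label. Equal-width binning is only one of several possible schemes, and the labels and heights are hypothetical.

```python
def equal_width_bins(values, labels):
    """Assign each continuous value to one of len(labels) equal-width bins.

    `labels` must be given in increasing order so the result is ordinal.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("need at least two distinct values")
    width = (hi - lo) / len(labels)
    result = []
    for v in values:
        # The maximum value would index one past the last bin, so clamp it.
        index = min(int((v - lo) / width), len(labels) - 1)
        result.append(labels[index])
    return result

# Hypothetical heights in centimeters mapped to three ordered categories.
heights_cm = [150, 158, 165, 172, 180, 190]
categories = equal_width_bins(heights_cm, ["short", "medium", "tall"])
```

With these inputs each category receives two of the six heights.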

An attribute is a property or characteristic of an object

True

For ordinal attributes

Values distinguish and order objects (<, >)

For nominal attributes

Values only distinguish one object from another (=, !=)

Data Preprocessing Strategies

-Aggregation
-Sampling
-Dimensionality Reduction
-Feature subset selection
-Feature creation
-Discretization & Binarization
-Attribute Transformation

Aggregation

Combining two or more attributes (or objects) into a single attribute (or object) in order to reduce data, change scale, or get more stable data
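As an illustration of combining objects to change scale, the sketch below rolls daily sales records up into monthly totals. The record layout (ISO date string, amount) and the figures are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical daily records: (ISO date, sales amount).
daily_sales = [
    ("2024-01-03", 120.0),
    ("2024-01-17", 80.0),
    ("2024-02-02", 200.0),
    ("2024-02-20", 50.0),
]

def monthly_totals(records):
    """Aggregate daily objects into one object per month by summing amounts."""
    totals = defaultdict(float)
    for date, amount in records:
        month = date[:7]  # "YYYY-MM" prefix of the ISO date
        totals[month] += amount
    return dict(totals)

totals = monthly_totals(daily_sales)
```

Four daily objects become two monthly ones, which reduces the data and moves the analysis from a daily to a monthly scale.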

How to handle missing values?

-Eliminate data objects
-Estimate missing values
-Ignore the missing value during analysis
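The three strategies can be sketched on a single attribute as follows. Missing values are represented by `None`, and the estimate used is the mean of the observed values; both are assumptions for this example, and other estimators are equally valid.

```python
import statistics

# Hypothetical attribute with two missing entries.
ages = [23, None, 31, 27, None, 35]

# Strategy 1: eliminate the data objects that have a missing value.
dropped = [a for a in ages if a is not None]

# Strategy 2: estimate the missing values (here: mean of the observed values).
observed_mean = statistics.mean(dropped)
imputed = [observed_mean if a is None else a for a in ages]

# Strategy 3: ignore missing values during analysis, leaving the data unchanged.
def mean_ignoring_missing(values):
    """Compute the mean over the observed values only."""
    return statistics.mean(v for v in values if v is not None)

average_age = mean_ignoring_missing(ages)
```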

Continuous Attributes

-Have real numbers as attribute values, like temperature, height, or weight
-Real values can only be measured and represented using a finite number of digits
-Typically represented as floating-point variables

Discrete Attributes

-Have only a finite or countably infinite set of values, like zip codes, counts, or the set of words in a document
-Often represented as integer variables
-Binary attributes are a special case of discrete attributes

Data quality problems

-Noise and outliers
-Missing values
-Duplicate data

4 Types of attributes

-Nominal
-Ordinal
-Interval
-Ratio

Which of the following statements about asymmetric attributes is correct?
A. Non-zero attribute values are equally important as zero values in data analysis.
B. Non-zero attribute values are more important than zero values in data analysis.
C. Zero attribute values are more important than non-zero attribute values in data analysis.
D. None of the above.

B. Non-zero attribute values are more important than zero values in data analysis.

For ratio attributes

Both differences AND ratios are meaningful (*, /)

(qualitative/categorical) Ordinal data examples

Rankings (taste of potato chips from 1-10), grades, height in categories (tall, medium, short)

(quantitative/numeric) Interval data examples

Calendar dates, temperatures in Celsius or Fahrenheit

What is Data?

Collection of data objects and their attributes.

Feature Creation

Creates new attributes that can capture the important information in a data set much more efficiently than the original attributes

Which of the following is an example of data quality problems?
A. Noise and outliers.
B. Missing values.
C. Duplicate data.
D. All of the above.

D. All of the above

Which of the following is NOT one of the three important characteristics of structured data?
A. Resolution.
B. Dimensionality.
C. Sparsity.
D. Sample size.

D. Sample size

For interval attributes

Differences between values are meaningful (+, -)

Important characteristics of structured data

-Dimensionality: curse of dimensionality
-Sparsity: only presence counts
-Resolution: patterns depend on the scale

Age in years is:

Discrete, quantitative, ratio

Attributes and attribute values are equivalent

False

(qualitative/categorical) Nominal data examples

ID numbers, eye color, zip codes

Binarization

Maps a continuous or categorical attribute into one or more binary variables
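Two small sketches follow: one-hot encoding for a categorical attribute (one binary variable per category) and a threshold for a continuous attribute (a single binary variable). The function names, the threshold of 25.0, and the sample data are all hypothetical.

```python
def one_hot(values):
    """Map a categorical attribute to one binary variable per category."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded

def threshold_binarize(values, threshold):
    """Map a continuous attribute to one binary variable: 1 if above threshold."""
    return [1 if v > threshold else 0 for v in values]

# Hypothetical categorical attribute.
eye_colors = ["brown", "blue", "green", "brown"]
categories, encoded = one_hot(eye_colors)

# Hypothetical continuous attribute, split at an arbitrary 25.0 degrees.
temperatures = [12.5, 30.1, 22.0]
is_hot = threshold_binarize(temperatures, 25.0)
```

Each row of `encoded` contains exactly one 1, in the column of that object's category.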

Noise

refers to modification of original values

Normalization is a form of attribute transformation that:

refers to various techniques to adjust to differences among attributes in terms of frequency of occurrence, mean, variance, and magnitude
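Two widely used techniques are min-max scaling, which rescales values to the range 0 to 1, and z-score standardization, which gives the attribute mean 0 and standard deviation 1. The sketch below assumes the attribute has at least two distinct values; the income figures are made up.

```python
import statistics

def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

# Hypothetical attribute whose magnitude would dominate smaller-scale attributes.
incomes = [20000, 40000, 60000, 100000]
scaled = min_max_scale(incomes)
standardized = z_score(incomes)
```

After either transformation, the attribute can be compared with others measured on very different scales.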

(quantitative/numeric) Ratio data examples

temperature in Kelvin, length, time, counts

An Attribute is also known as

variable, characteristic, or feature

