Data Mining Exam 1: Lecture 3

The cosine measure falls in the range [-1, 1]. True or False?

True

Similarity/Dissimilarity for Simple Attributes: Interval or Ratio

d = |p - q|; s = -d, or s = 1/(1 + d)

Similarity/Dissimilarity for Simple Attributes: Ordinal

d = |p - q|/(n - 1); s = 1 - |p - q|/(n - 1), where the values are mapped to the integers 0 to n - 1 and n is the number of values
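
The ordinal formulas can be sketched in Python as follows. This is a minimal illustration, assuming each level is mapped to its rank 0 to n - 1; the function names and the `QUALITY` scale are hypothetical and not part of the original card.

```python
def ordinal_dissimilarity(p, q, levels):
    """d = |p - q| / (n - 1), with each level mapped to its rank 0..n-1."""
    rank = {level: i for i, level in enumerate(levels)}
    n = len(levels)
    return abs(rank[p] - rank[q]) / (n - 1)


def ordinal_similarity(p, q, levels):
    """s = 1 - d."""
    return 1 - ordinal_dissimilarity(p, q, levels)


# Hypothetical quality scale, ordered from worst to best.
QUALITY = ["poor", "fair", "ok", "good", "wonderful"]

print(ordinal_dissimilarity("poor", "good", QUALITY))  # 0.75
print(ordinal_similarity("poor", "good", QUALITY))     # 0.25
```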

Simple Matching Coefficient

(F11+F00) / (F01 + F10 + F11 + F00)

Similarity Measure

- A numerical measure of how alike two data objects are
- Is higher when objects are more alike
- Often falls in the range [0,1]

Dissimilarity Measure

- A numerical measure of how different two data objects are
- Is lower when objects are more alike
- Minimum dissimilarity is often 0
- Upper limit varies

Similarity/Dissimilarity for Simple Attributes: Nominal

d = 0 if p = q, d = 1 if p != q; s = 1 if p = q, s = 0 if p != q

Which of the following statements regarding data exploration is NOT true?
A. Data exploration can make use of humans' abilities to recognize patterns.
B. In data mining, clustering and anomaly detection can only be applied to handle data exploration tasks.
C. Data exploration can help to select the right tool for preprocessing or analysis.
D. A preliminary exploration of the data can help to better understand its characteristics.

B. In data mining, clustering and anomaly detection can only be applied to handle data exploration tasks.

For the following vectors, x = (1, 1, 0, 1, 0, 1) and y = (1, 1, 1, 0, 0, 1), the Jaccard coefficient of x and y is:
A. 0.4
B. 0.5
C. 0.6
D. 0.7

C. 0.6

Which of the following statements regarding similarity and dissimilarity measures is NOT true?
A. Similarity measure often falls in the range [0,1].
B. The dissimilarity measure is lower when objects are more alike.
C. Dissimilarity measures always have an invariant upper limit.
D. Proximity refers to a similarity or dissimilarity.

C. Dissimilarity measures always have an invariant upper limit.

Mahalanobis Distance

Used to determine the similarity of an unknown sample set to a known one. It takes into account the correlations of the data set and is scale-invariant.
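
A small sketch may make the role of the correlations concrete. It assumes the common definition sqrt((p - q) Σ^-1 (p - q)^T), where Σ is the covariance matrix of the known data set (some texts omit the square root), and it is restricted to 2-D points so the matrix inverse can be written by hand. The function name, the covariance matrix `COV`, and the points are illustrative assumptions.

```python
import math


def mahalanobis_2d(p, q, cov):
    """sqrt((p - q) * inverse(cov) * (p - q)^T) for 2-D points."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Inverse of the 2x2 matrix [[a, b], [c, d]].
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx, dy = p[0] - q[0], p[1] - q[1]
    squared = (dx * (inv[0][0] * dx + inv[0][1] * dy)
               + dy * (inv[1][0] * dx + inv[1][1] * dy))
    return math.sqrt(squared)


# Illustrative covariance with positively correlated attributes.
COV = [[0.3, 0.2], [0.2, 0.3]]
A, B, C = (0.5, 0.5), (0.0, 1.0), (1.5, 1.5)

# C lies along the direction of correlation, B lies across it.
print(math.dist(A, B), math.dist(A, C))  # Euclidean: B is nearer to A
print(mahalanobis_2d(A, B, COV), mahalanobis_2d(A, C, COV))  # Mahalanobis: C is nearer to A
```

Under these assumed values, the ordering of B and C relative to A flips between the two distances, which is the effect of accounting for correlation.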

Euclidean Distance

dist = sqrt(sum over k = 1..n of (p_k - q_k)^2), where n is the number of dimensions (attributes) and p_k and q_k are, respectively, the kth attributes of data objects p and q

Similarity Between Binary Vectors

F01 = # of attributes where p was 0 and q was 1
F10 = # of attributes where p was 1 and q was 0
F00 = # of attributes where p was 0 and q was 0
F11 = # of attributes where p was 1 and q was 1

Jaccard Coefficient

F11/(F01 + F10 + F11)
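
As a worked check, the counts and both coefficients can be computed for the vectors x and y from the Jaccard question in this set. This is a minimal sketch; the function names are hypothetical, and it assumes the vectors contain only 0s and 1s and that the denominators are non-zero.

```python
def binary_counts(p, q):
    """Return (F01, F10, F00, F11) for two equal-length 0/1 vectors."""
    pairs = list(zip(p, q))
    f01 = sum(1 for a, b in pairs if a == 0 and b == 1)
    f10 = sum(1 for a, b in pairs if a == 1 and b == 0)
    f00 = sum(1 for a, b in pairs if a == 0 and b == 0)
    f11 = sum(1 for a, b in pairs if a == 1 and b == 1)
    return f01, f10, f00, f11


def smc(p, q):
    """Simple Matching Coefficient: counts 0-0 and 1-1 matches."""
    f01, f10, f00, f11 = binary_counts(p, q)
    return (f11 + f00) / (f01 + f10 + f11 + f00)


def jaccard(p, q):
    """Jaccard coefficient: ignores 0-0 matches."""
    f01, f10, f00, f11 = binary_counts(p, q)
    return f11 / (f01 + f10 + f11)


x = (1, 1, 0, 1, 0, 1)
y = (1, 1, 1, 0, 0, 1)

print(binary_counts(x, y))  # (1, 1, 1, 3)
print(jaccard(x, y))        # 0.6
print(smc(x, y))            # 4/6, about 0.667
```

The two coefficients differ here only because SMC counts the single 0-0 match and Jaccard does not.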

If two objects have a cosine measure of 1, they are identical. True or False?

False

Minkowski Distance is a

Generalization of Euclidean Distance

Cosine Similarity

If d1 and d2 are document vectors, then cos(d1, d2) = (d1 · d2) / (||d1|| * ||d2||), where · is the vector dot product and ||d|| is the length of vector d
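
The formula can be sketched in plain Python as below. The function name and the example vectors are hypothetical, and the sketch assumes neither vector is all zeros. The example also illustrates why a cosine of 1 does not imply identical objects: d2 is d1 scaled by 2, so the two vectors point in the same direction but are not equal.

```python
import math


def cosine_similarity(d1, d2):
    """cos(d1, d2) = (d1 . d2) / (||d1|| * ||d2||)."""
    dot = sum(a * b for a, b in zip(d1, d2))
    norm1 = math.sqrt(sum(a * a for a in d1))
    norm2 = math.sqrt(sum(b * b for b in d2))
    return dot / (norm1 * norm2)


d1 = (1, 2, 0, 3)
d2 = (2, 4, 0, 6)      # same direction as d1, twice the length
d3 = (-1, -2, 0, -3)   # opposite direction to d1

print(cosine_similarity(d1, d2))  # close to 1, although d1 != d2
print(cosine_similarity(d1, d3))  # close to -1
print(cosine_similarity((1, 0), (0, 1)))  # 0.0 for orthogonal vectors
```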

Correlation

Measures the linear relationship between objects, and ranges from -1 (perfect negative correlation) to 1 (perfect positive correlation)
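
A short sketch of this measure, assuming the Pearson correlation coefficient is the one intended (the card does not name it). The function name and sample data are illustrative, and the sketch assumes neither input is constant.

```python
import math


def correlation(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


print(correlation((1, 2, 3, 4), (2, 4, 6, 8)))  # close to 1: y = 2x
print(correlation((1, 2, 3, 4), (8, 6, 4, 2)))  # close to -1: y decreases linearly
# y = x^2 is a perfect but non-linear relationship, so the correlation is 0.
print(correlation((-2, -1, 0, 1, 2), (4, 1, 0, 1, 4)))
```

The last case shows that the measure captures only linear relationships.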

Minkowski Distance Examples

r = 1: city block (Manhattan, L1 norm) distance
r = 2: Euclidean distance
r = infinity: supremum (L_max norm) distance, the maximum difference between any component of the vectors
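
The three cases can be sketched with a single function, assuming the usual Minkowski form (sum over k of |p_k - q_k|^r)^(1/r). The function name and the two sample points are hypothetical.

```python
import math


def minkowski(p, q, r):
    """Minkowski distance; r = math.inf gives the supremum distance."""
    diffs = [abs(a - b) for a, b in zip(p, q)]
    if math.isinf(r):
        # Limit as r grows: the largest component difference dominates.
        return max(diffs)
    return sum(d ** r for d in diffs) ** (1 / r)


p = (0, 2)
q = (2, 0)

print(minkowski(p, q, 1))         # 4.0, city block distance
print(minkowski(p, q, 2))         # sqrt(8), about 2.828, Euclidean distance
print(minkowski(p, q, math.inf))  # 2, supremum distance
```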

Common Properties of a Similarity

s(p, q) = 1 only if p = q
s(p, q) = s(q, p) for all p and q (symmetry)

Proximity refers to a

similarity or dissimilarity
