Data Mining: Chapters 1, 2, & 3

Given the following sets of integers: A = {1, 2, 3, 4}, B = {2, 3, 5, 7}, C = {2, 4, 6}. Which pair has the highest Jaccard similarity?

A and C
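
The answer can be checked by computing the Jaccard similarity, |X ∩ Y| / |X ∪ Y|, for every pair. A minimal sketch; the names `jaccard`, `scores`, and `best_pair` are chosen here for illustration:

```python
from fractions import Fraction
from itertools import combinations

def jaccard(x, y):
    """Jaccard similarity: size of the intersection over size of the union."""
    return Fraction(len(x & y), len(x | y))

sets = {"A": {1, 2, 3, 4}, "B": {2, 3, 5, 7}, "C": {2, 4, 6}}

# Similarity for every unordered pair of sets
scores = {(p, q): jaccard(sets[p], sets[q]) for p, q in combinations(sets, 2)}
best_pair = max(scores, key=scores.get)
```

This gives 1/3 for A and B, 2/5 for A and C, and 1/6 for B and C, so A and C is the most similar pair.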

Which of the following algorithms would be more applicable to use for clustering when the number of clusters is known and time efficiency is an issue?

K-means

Which of the following is generally NOT a reason for using MapReduce over OLAP?

MapReduce provides quick access to simple statistics for the data

Identify all the data mining tasks that could be naturally associated with the following application: Finding how a person's average weight varies with demographics.

Regression

What is the main purpose of a data warehouse?

To provide a repository for the purpose of analysis

Given the following characteristic matrix:
Document 1   Document 2   Document 3
    1            1            0
    0            1            1
    1            0            1
    1            1            0
and permutations P1 = (2, 4, 3, 1), P2 = (1, 2, 4, 3), P3 = (4, 3, 2, 1), what pair of documents has the lowest Jaccard similarity based on their minhash signatures?

1 and 3

Given the following characteristic matrix:
Document 1   Document 2   Document 3
    1            1            0
    0            1            1
    1            0            1
    1            1            0
and permutations P1 = (2, 4, 3, 1), P2 = (1, 2, 4, 3), P3 = (4, 3, 2, 1), what is the minhash signature of document 3?

2,2,3
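
Both minhash answers can be reproduced with a short sketch. It assumes the rows of the characteristic matrix are 110, 011, 101, 110 (columns are Documents 1 to 3) and that each permutation lists row numbers in the order they are visited, so the minhash value is the first visited row that holds a 1. The function names are illustrative:

```python
# Characteristic matrix keyed by row number (1 to 4); each tuple holds the
# entries for Document 1, Document 2, Document 3. The row layout is an assumption.
matrix = {
    1: (1, 1, 0),
    2: (0, 1, 1),
    3: (1, 0, 1),
    4: (1, 1, 0),
}
permutations = [(2, 4, 3, 1), (1, 2, 4, 3), (4, 3, 2, 1)]

def signature(doc):
    """Minhash signature of a document (0-based column index)."""
    # For each permutation, take the first row in the permuted order holding a 1.
    return tuple(next(r for r in perm if matrix[r][doc]) for perm in permutations)

def estimated_similarity(sig_a, sig_b):
    """Fraction of signature positions on which two documents agree."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

sigs = [signature(d) for d in range(3)]
```

Under these assumptions the signatures are (4, 1, 4), (2, 1, 4), and (2, 2, 3). Documents 1 and 3 agree in no position, which makes them the least similar pair.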

Assume you have the following clusters of points: C1 = {1, 2}, C2 = {4}. What is the average silhouette score of this clustering, assuming Euclidean distance is used?

7/18
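
The value 7/18 can be derived point by point. For each point, a is the mean distance to the other points in its own cluster, b is the mean distance to the nearest other cluster, and the silhouette is (b - a) / max(a, b). The sketch below assumes the common convention that a point in a singleton cluster scores 0; the helper names are illustrative:

```python
from fractions import Fraction

clusters = [[1, 2], [4]]

def mean_distance(p, points):
    """Average absolute (1-D Euclidean) distance from p to the given points."""
    return Fraction(sum(abs(p - q) for q in points), len(points))

def silhouette(p, own, others):
    """Silhouette coefficient of point p."""
    rest = [q for q in own if q != p]
    if not rest:
        return Fraction(0)  # convention assumed here: a singleton cluster scores 0
    a = mean_distance(p, rest)
    b = min(mean_distance(p, c) for c in others)
    return (b - a) / max(a, b)

scores = [
    silhouette(p, own, [c for c in clusters if c is not own])
    for own in clusters
    for p in own
]
average = sum(scores) / len(scores)
```

The per-point scores are 2/3, 1/2, and 0, so the average is (2/3 + 1/2 + 0) / 3 = 7/18.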

Assume the output from Mappers in a MapReduce application is the following set of key/value pairs: <1, 1>, <1, 2>, <1, 3>, <2, 1>, <2, 2>, <3, 1>. What is the set of inputs that are given to the reducers?

<1, [1, 2, 3]>, <2, [1, 2]>, <3, [1]>
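
The grouping step between map and reduce can be imitated in a few lines. This is a plain-Python stand-in for the shuffle phase, not Hadoop's API, and the function name `shuffle` is chosen here for illustration:

```python
from collections import defaultdict

mapper_output = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 1)]

def shuffle(pairs):
    """Collect all values that share a key into one list per key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)

reducer_input = shuffle(mapper_output)
```

Each reducer call then receives one key together with the full list of values emitted for that key.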

Identify all the data mining tasks that could be naturally associated with the following application: Finding possible drug interactions from patient data.

Association Rule Mining

A Decision Tree is an example of a:

Classifier

Identify all the data mining tasks that could be naturally associated with the following application: Recommending a possible friend for a user of a social network.

Collaborative Filtering, Classification, and Link Analysis

Which of the following is an appropriate way of determining the number of clusters for the k-means method?

Compute average silhouette scores for different values of k and choose the k that gives the largest value. Alternatively, compute the average cluster diameter for different values of k, plot the diameter against k, and choose a knee point.

Which of the following attribute types is most fitting for an attribute specifying a distance value, where the values are specified in meters?

Continuous

How does Hadoop know that DataNodes fail?

DataNodes stop sending heartbeat messages

Which of the following distance measures is most applicable for comparing strings?

Edit Distance
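
Edit distance counts the single-character edits needed to turn one string into another. Below is a minimal sketch of the Levenshtein variant, in which insertions, deletions, and substitutions each cost 1. Choosing this variant is an assumption, since some course texts define edit distance with insertions and deletions only:

```python
def edit_distance(s, t):
    """Levenshtein distance via dynamic programming over one rolling row."""
    prev = list(range(len(t) + 1))  # distance from the empty prefix of s
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]
```

For example, `edit_distance("kitten", "sitting")` evaluates to 3 under this definition.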

Based on the TF.IDF measure of similarity, if a term appears in many documents, it will be given a higher IDF value.

False

The purpose of shingling is to reduce the representation size of a text document.

False
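
One way to see why this is false: shingling turns a document into the set of its k-character substrings so that set measures such as Jaccard similarity apply, and that set is often larger than the text itself. A minimal sketch, using character-level shingles and a sample string chosen here for illustration:

```python
def shingles(text, k):
    """Set of all k-character substrings (k-shingles) of the text."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}

example = shingles("abcdabd", 2)
```

The 7-character string yields five distinct 2-shingles (ab, bc, cd, da, bd), which together take 10 characters to store.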

Using Locality-Sensitive Hashing, two dissimilar documents will always end up in separate buckets and therefore the similarity of these documents will not have to be computed.

False

Which of the following attribute types is most fitting for an attribute specifying an object's color (red, green, white, etc.)?

Nominal

Which of the following attribute types is most fitting for an attribute specifying a person's income as low, medium, or high?

Ordinal

Which of the following SQL statements is equivalent to the Relational Algebra query:

SELECT name FROM Employee WHERE age=21

In Hadoop's HDFS, there is a single NameNode, whose purpose is to manage the file system namespace.

True

A data schema is

a description of a data set's attributes and their properties

Assume you have 3 documents with the following terms: D1 = "computer", "web", "storage", "options"; D2 = "computer", "game", "development"; D3 = "web", "development", "frameworks". What is the TF.IDF score of the term "computer" in document D1?

0.58496
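
The score can be reproduced under one common definition, assumed here: TF is the term's count divided by the count of the most frequent term in the document, and IDF is log2(N / n), where N is the number of documents and n is the number containing the term. The function name `tf_idf` is illustrative:

```python
import math

docs = {
    "D1": ["computer", "web", "storage", "options"],
    "D2": ["computer", "game", "development"],
    "D3": ["web", "development", "frameworks"],
}

def tf_idf(term, doc_id):
    """TF.IDF with max-frequency-normalized TF and base-2 IDF (assumed definition)."""
    words = docs[doc_id]
    # TF: term count divided by the count of the most frequent term in the document
    tf = words.count(term) / max(words.count(w) for w in words)
    # IDF: log2 of (number of documents / number of documents containing the term)
    containing = sum(term in ws for ws in docs.values())
    return tf * math.log2(len(docs) / containing)

score = tf_idf("computer", "D1")
```

Here TF = 1/1 = 1 and IDF = log2(3/2), so the score is approximately 0.58496. A term that appeared in all three documents would get IDF = log2(1) = 0.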

