Data Mining Midterm
A major feature of a data warehouse is that ____. A. old data is removed periodically to improve performance B. typical users include clerks and database professionals C. it focuses on the day-to-day operations of an organization D. it is time-variant
D
Intuitively, the drill-down OLAP operation corresponds to concept ____ in a concept hierarchy. A. cooperation B. ascension C. forecasting D. specialization
D
The major dimensions of a multidimensional view of data mining are: A. data, knowledge, utilization, applications B. data, knowledge, technologies, and applications C. data, knowledge, applications D. data, utilization, concurrency, modernization
b
Characterization and discrimination; the mining of frequent patterns, associations, and correlations; classification and regression; clustering analysis, and outlier analysis are all examples of data mining ____________
functionalities
A pattern is ________ if it is valid on test data with some degree of certainty, novel, potentially useful, and easily understood by humans
interesting
13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70 What is the first quartile? A. 19 B. 20 C. 21 D. 22
b
13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70 What is the median? A. 24 B. 25 C. 22 D. 27
b
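The quartile answers above can be checked with a minimal sketch. It assumes the (n + 1) * p position rule, which Python's `statistics.quantiles` applies with `method="exclusive"`; other quartile conventions can give slightly different values (for example 20.5 for the first quartile). The names `values` and `quartiles` are illustrative only.

```python
import statistics

# The 27 sorted values from the cards above.
values = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
          33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

def quartiles(data):
    """Return (Q1, median, Q3) using the (n + 1) * p position rule."""
    q1, q2, q3 = statistics.quantiles(sorted(data), n=4, method="exclusive")
    return q1, q2, q3

print(quartiles(values))  # (20.0, 25.0, 35.0)
```

With 27 values the positions are 7, 14, and 21, so no interpolation is needed.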
Attribute-oriented induction is an alternative to the _____ approach for data generalization. A. three tier architecture B. concept hierarchies C. back-end tools D. data cube
d
________ is the process of discovering interesting patterns from massive amounts of data
data mining
Interesting patterns represent knowledge. (t/f)
t
Smoothing, attribute construction, aggregation, normalization, and discretization are examples of data ____________ strategies
transformation
An _____________ association rule is a rule that association analysis deems strong but that is of no value to the problem at hand or is misleading
uninteresting
Measures of pattern interestingness are either objective or ________
subjective
Consider two objects represented by the tuples (22, 1, 42, 10) and (20, 0, 36, 8). What is the Manhattan distance?
11
How many cuboids are there in a 5-dimensional data cube if there were no hierarchies associated to any dimensions?
32
How many cuboids are there in a 9-dimensional data cube if there were no hierarchies associated to any dimension?
512
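The two cuboid counts above follow from the general formula: the total number of cuboids is the product of (L_i + 1) over all dimensions, where L_i is the number of hierarchy levels of dimension i and the extra 1 accounts for the virtual top level "all". With no hierarchies each L_i is 1, giving 2^n. A minimal sketch, with hypothetical function and dimension names:

```python
from itertools import combinations
from math import prod

def count_cuboids(levels_per_dimension):
    """Product of (L_i + 1); the +1 is the virtual top level 'all'."""
    return prod(levels + 1 for levels in levels_per_dimension)

def enumerate_cuboids(dimensions):
    """List every group-by (subset of dimensions) when no hierarchies exist."""
    return [combo
            for k in range(len(dimensions) + 1)
            for combo in combinations(dimensions, k)]

print(count_cuboids([1] * 5))  # 32
print(count_cuboids([1] * 9))  # 512
print(len(enumerate_cuboids(["a", "b", "c", "d", "e"])))  # 32
```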
Consider two objects represented by the tuples (22, 1, 42, 10) and (20, 0, 36, 8). What is the Supremum distance? (round to 2 decimal places)
6
Consider two objects represented by the tuples (22, 1, 42, 10) and (20, 0, 36, 8). What is the Minkowski distance using h = 3? (round to 2 decimal places)
6.15
Consider two objects represented by the tuples (22, 1, 42, 10) and (20, 0, 36, 8). What is the Euclidean distance? (round to 2 decimal places)
6.71
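The four distance answers for these two tuples can be reproduced with one Minkowski function. This is a minimal sketch; the order h = 3 for the Minkowski card is an assumption inferred from the answer 6.15, and the function names are illustrative.

```python
def minkowski(x, y, h):
    """Minkowski distance of order h (h=1 Manhattan, h=2 Euclidean)."""
    return sum(abs(a - b) ** h for a, b in zip(x, y)) ** (1 / h)

def supremum(x, y):
    """Supremum (Chebyshev) distance: the largest per-attribute difference."""
    return max(abs(a - b) for a, b in zip(x, y))

x = (22, 1, 42, 10)
y = (20, 0, 36, 8)

print(minkowski(x, y, 1))            # 11.0  (Manhattan)
print(round(minkowski(x, y, 2), 2))  # 6.71  (Euclidean)
print(round(minkowski(x, y, 3), 2))  # 6.15  (assumed h = 3)
print(supremum(x, y))                # 6
```

The per-attribute differences are 2, 1, 6, and 2, so the supremum distance is simply the largest of them.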
How many steps are there involved in data mining when viewed as KDD?
7
13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70 What is the mean? A. 25 B. 29.96 C. 32.56 D. 30.40
B
13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70 What is the mode? A. 25 B. 35 C. 25 and 35 D. Does not exist
C
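The mean and mode answers can be checked with the standard library. A minimal sketch; `values` is just an illustrative name for the data on the cards.

```python
import statistics

values = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
          33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

# Mean: the sum (809) divided by the number of values (27).
mean = statistics.mean(values)
print(round(mean, 2))  # 29.96

# multimode returns every value tied for the highest frequency.
modes = statistics.multimode(values)
print(modes)  # [25, 35]
```

Both 25 and 35 occur four times, so the data set is bimodal.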
In attribute-oriented induction, data relevant to the task at hand is collected and then generalization is performed by either attribute generalization or ___. A. full materialization B. concept description C. attribute removal D. partial materialization
C
13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70 What is the 3rd quartile? A. 35 B. 36 C. 40 D. 45
a
13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70 What is the minimum? A. 13 B. 15 C. 16 D. 17
a
A ____ is a repository for long-term storage of data from multiple sources organized so as to facilitate management decision making. A. Data warehouse B. data mart C. transactional database D. object oriented programming language
a
Consider a data cube measure obtained by applying the average() function. The measure is ___. A. algebraic B. analytic C. holistic D. atomic
a
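An algebraic measure can be computed from a bounded number of distributive aggregates; for average() those are sum() and count(). The sketch below illustrates this under assumed names, with a hypothetical partition of a few values into chunks: each chunk is summarized independently and the summaries are combined without revisiting the raw data.

```python
def partial_aggregate(chunk):
    """Distributive summaries of one partition: (sum, count)."""
    return sum(chunk), len(chunk)

def combine_average(partials):
    """Average derived only from the per-partition (sum, count) pairs."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

# Hypothetical partitioning of some values across three chunks.
chunks = [[13, 15, 16], [16, 19], [20, 20, 21, 22]]
partials = [partial_aggregate(chunk) for chunk in chunks]

print(combine_average(partials))  # 18.0
```

A holistic measure such as the median has no such fixed-size summary, which is why it is harder to compute in a data cube.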
Data mining functionalities are used to specify kinds of patterns or _______ to be found in data mining tasks. A. knowledge B. transactions C. relations D. history
a
In the ___ method, the process to design and construct a data warehouse is sequential, moving onto each phase only if the previous phase is complete. A. waterfall B. top-down C. spiral D. bottom-up
a
Location | Resource: Brazil 8,233; USA 3,069; Canada 2,902; China 2,840; Colombia 2,132. This 2-D data cube represents information on freshwater resources per country (in cubic kilometers). The cube contains the dimensions location and resource. The concept hierarchy for location is defined as the total order "country < continent". Which operation materializes the following view? Location | Resource: Canada 2,902; China 2,840. A. dice B. drill-up C. drill-through D. rotate
a
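A dice selects a subcube by applying conditions on two or more dimensions. The card does not state which conditions produced the view, so the ones in this sketch (a set of locations and an upper bound on the resource value) are purely hypothetical; they merely happen to yield the same two cells.

```python
# Freshwater resources per country (cubic kilometers), copied from the card.
cube = {"Brazil": 8233, "USA": 3069, "Canada": 2902,
        "China": 2840, "Colombia": 2132}

def dice(cube, locations, max_resource):
    """Keep cells that satisfy a condition on each of the two dimensions."""
    return {loc: val for loc, val in cube.items()
            if loc in locations and val <= max_resource}

# Hypothetical selection criteria, chosen only for illustration.
view = dice(cube, {"USA", "Canada", "China"}, 3000)
print(view)  # {'Canada': 2902, 'China': 2840}
```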
Multidimensional data mining is also called _______ multidimensional data mining A. Exploratory B. Meaningful C. Modern D. Useful
a
The ____ OLAP operation performs aggregation on a data cube, either by climbing up a concept hierarchy for a dimension or by dimension reduction. A. roll-up B. rotate C. drill-down D. slice
a
A data warehouse is a ____ collection of data in support of management's decision making process. A. day-to-day operations oriented, integrated, time-variant, and nonvolatile B. subject-oriented, integrated, time-variant, and nonvolatile C. subject-oriented, integrated, time-variant, and volatile D. subject-oriented, integrated, time-invariant, and nonvolatile
b
The bottom layer in a three-tier data warehouse architecture typically consists of _____. A. analysis and API tools B. a relational database system C. a server implemented using a ROLAP or MOLAP model D. a client layer
b
13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70 What is the midrange? A. 30 B. 41 C. 41.5 D. 42.5
c
Among the data warehouse applications, ______ applications support OLAP operations such as roll-up, drill-down, and slice. A. star schema B. data mining C. analytical processing D. information processing
c
Data warehouse systems provide multidimensional data analysis capabilities, collectively referred to as ____. A. TCP B. UDP C. OLAP D. relational database
c
The ___ OLAP operation is realized by either stepping down a concept hierarchy for a dimension or introducing additional dimensions. A. dice B. rotate C. drill-down D. drill-up
c
Measures of _________ tendency indicate where most of the values in our data set fall
central
13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70 What is the maximum? A. 45 B. 46 C. 52 D. 70
d
A form of dimensional modeling used in online analytical processing systems is the _____. A. relational diagram B. entity-relationship model C. object-oriented data model D. star schema
d
An advantage of the spiral method to design and construct a data warehouse is that ____. A. it requires fewer resources B. the process moves onto each phase only if the previous phase is complete C. risks are managed late in the process D. modifications can be done quickly
d
Location | Resource: Brazil 8,233; USA 3,069; Canada 2,902; China 2,840; Colombia 2,132. This 2-D data cube represents information on freshwater resources per country (in cubic kilometers). The cube contains the dimensions location and resource. The concept hierarchy for location is defined as the total order "country < continent". Which operation materializes the following view? Location | Resource: South America 10,365; North America 5,971; Asia 2,840. A. pivot B. drill-down C. slice D. roll-up
d
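The roll-up in this card climbs the location hierarchy from country to continent and sums the measure. A minimal sketch, assuming the cube is held in a plain dictionary; the names `cube`, `continent_of`, and `roll_up` are illustrative.

```python
from collections import defaultdict

# Freshwater resources per country (cubic kilometers), copied from the card.
cube = {"Brazil": 8233, "USA": 3069, "Canada": 2902,
        "China": 2840, "Colombia": 2132}

# One step up the concept hierarchy "country < continent".
continent_of = {"Brazil": "South America", "Colombia": "South America",
                "USA": "North America", "Canada": "North America",
                "China": "Asia"}

def roll_up(cube, parent_of):
    """Aggregate each cell into its parent in the concept hierarchy."""
    totals = defaultdict(int)
    for member, value in cube.items():
        totals[parent_of[member]] += value
    return dict(totals)

print(roll_up(cube, continent_of))
# {'South America': 10365, 'North America': 5971, 'Asia': 2840}
```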
Some measures of ________ are variance, standard deviation, and interquartile range.
dispersion
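The three dispersion measures named above can be computed for the data set used throughout these cards. This is a minimal sketch that assumes the population forms of variance and standard deviation and the (n + 1) * p quartile rule; a course may instead use the sample forms (`statistics.variance`, `statistics.stdev`).

```python
import statistics

values = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
          33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

# Interquartile range: Q3 - Q1.
q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
iqr = q3 - q1
print(iqr)  # 15.0

# Population variance and standard deviation (assumed convention).
variance = statistics.pvariance(values)
std_dev = statistics.pstdev(values)
print(variance, std_dev)
```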
CIST MPK: a useful acronym for the 7 steps of data mining when viewed as KDD. Data cleansing, Data integration, Data selection, Data transformation, Data mining, Pattern evaluation, ________________. What is the last step? (starts with K)
knowledge presentation
The common measures of central tendency are mean, median, and ______
mode