CIS3920 test bank 1

A

(2) What is the final step of knowledge discovery? A. Knowledge presentation B. Data mining C. Data cleaning D. Pattern evaluation

D

1. What is an example of data quality issues? a. Noise b. Outliers c. Duplicate Data d. All of the above

F

2. Regression is a descriptive function of data mining. True / False

D

2. Which of the following is NOT a type of attribute? a. Nominal b. Ordinal c. Interval d. Visual

A

3) What is known as developing techniques for mining encrypted data? (Tan page 14) a) Privacy-preserving data mining b) Security data mining c) Clustering Analysis d) All of the above

B

3. Which attribute type would 'grades' fall into? a. Nominal b. Ordinal c. Interval d. Ratio

D

3. Which of the following is Not a form of Machine Learning? a. Supervised Learning b. Unsupervised Learning c. Semi-supervised Learning d. Multi-supervised Learning

A

3. Which one is a predictive function of data mining? *slides p33 A. Classification B. Clustering C. Association Rule Discovery D. Sequential Pattern Discovery

C

Anomaly detection is the task of identifying observations whose characteristics are significantly different from the rest of the data. Such observations are known as: (Intro to Data Mining, Chapter 1.5, page 11) a) anomalies b) outliers c) both

D

What is data visualization? (Han Chapter 2, page 56) A. Helping to identify noise and outliers. B. A line of best fit can be drawn to study the correlation between the variables. C. Measures of central tendency and measures of dispersion. D. Aims to communicate data clearly and effectively through graphical representation.

A

1) What kinds of data can be mined? (Han page 8) a) Any data can be mined b) Only database data c) Business related data d) None

A

1. Web mining may involve: *slides p12 A. Data cleaning B. Data exploration C. Decision making D. Data presentation

C

1. Many factors make up data quality. Which of the following is NOT one of them? (Han, page 84) a) Accuracy b) Completeness c) Inconsistency (correct answer) d) Believability

T

1. Through data mining, we can recognize patterns in data. True / False

A

1. What are the tasks associated with data mining? a. Association Analysis, Cluster Analysis, Classification, Anomaly Detection, Predictive modeling b. Scalability, High Dimensionality, Non-traditional Analysis c. Classification, Regression modeling, Cluster Analysis d. Anomaly Detection, Machine Learning

E

2) What is a major dimension of multidimensional view? (Han slide 33) a) Data b) Knowledge c) Technologies d) Applications e) All of the Above

D

2. Which of the following is NOT a product of Postprocessing in Data Mining? a. Filtering Patterns b. Visualization c. Pattern Interpretation d. Data Subsetting

D

2. What is association analysis used for? (page 9) a. Find groups of closely related observations so that observations that belong to the same cluster are similar to each other b. Identify observations whose characteristics are significantly different from the rest of the data c. Predict the value of a particular attribute based on the values of other attributes d. Discover patterns that describe strongly associated features in the data; they are typically represented in the form of implication rules or feature subsets

C

3. Two examples of predictive modeling are: (Tan Chapter 1, page 8) A. Regression and Scalability B. Association analysis and Scalability C. Classification and Regression D. Classification and Association analysis

A

3. What are the two types of attributes? a. Qualitative and Quantitative b. Spatial and Temporal c. Training and Validation d. Data Preparation and Modeling

A

3. What form of analysis is used for fraud detection at credit card companies? (page 11) a. Anomaly detection b. Cluster analysis c. Association analysis d. Regression Analysis

B

Accuracy, completeness, consistency, timeliness, believability and interpretability all make up: a. Data mining b. data quality c. data cleaning d. data reduction

A

Another name for a data set that may contain objects that do not comply with the general behavior or model: a. Outliers queries b. decision tree c. cluster analysis d. database

D

What is a repository of information collected from multiple sources, stored under a unified schema and usually resides at a single site? a. Data selection b. data mining c. data transformation d. data warehouse

A

What is the pattern of knowledge discovery. A) Cleaning and Integration -> Selection and Transformation -> Data Mining -> Patterns -> Knowledge. B) Selection and Transformation -> Data Mining -> Cleaning and Integration -> Patterns -> Knowledge. C) Data Mining -> Cleaning and Integration -> Patterns -> Selection and Transformation -> Knowledge. D) Patterns -> Cleaning and Integration -> Selection and Transformation -> Data Mining -> Knowledge.

C

What is the last step of data mining in business intelligence? A. Data magnifying B. Data mining C. Decision making D. Dashboarding

D

Which is NOT an attribute type? Han Chapter 2, Page 41-44 A. Nominal B. Binary C. Ordinal D. BoxPlots

B

Which of the following are not general techniques for icon-based visualization? (Lecture #2, pg. 33) a) Shape coding b) Stick figures c) Color icons d) Tile bars

D

Which of the following is NOT a step in data mining? A. Obtain the data set to be used in the analysis B. Explore, clean, and preprocess the data. C. Reduce the data if necessary. D. Recycle the data.

B

Which of the following is NOT in the knowledge discovery(KDD) process? A. Database B. Data storehouse C. Data warehouse D. Data mining

D

Which of the following is Quantitative Attribute? A) Zip Codes B) ID Numbers C) Gender D) Length

D

Which of the following is a Qualitative Attribute? A) Age B) Temperature C) Length D) ID Numbers

C

The creation of a new set of features from the original raw data is known as: a)Aggregation b)Sampling c)Feature creation d)Dimensionality Reduction

D

(1) Which one is NOT one of the major tasks in data preprocessing? A. Data reduction B. Data normalization C. Data reduction D. Data visualization

D

(1) Which statement is correct about "data rich but information poor situation"? (Chapter page number: HAN Ch1. p5) A. The abundance of data, coupled with the need for powerful data analysis tools B. The fast-growing, tremendous amount of data, collected and stored in large and numerous data repositories C. The data we collect and stored today has far exceeded our human ability for comprehension without powerful tools. D. All of the above.

A

(2) Which one is discrete attribute? A. Zip codes B. Height C. Temperature D. All of the above

C

(3) Which one is NOT the alternative name of Data mining? A. Knowledge discovery in databases (KDD) B. Data dredging C. Knowledge harvesting D. Business intelligence

B

1. In data mining, tasks are usually divided into two major categories, which are: a. Intensive tasks & Non-Intensive tasks b. Predictive tasks & Descriptive tasks c. Dependent & Independent tasks d. None of the above

A

1. In what step of the knowledge discovery process do we explore and clean the data? a. Preprocessing b. Deployment c. Evaluation d. Modeling

C

1. Please select the example of quantitative attribute. *slides p12 A. Eye color B. Letter grade C. Weight D. Zip code

D

1. We live in a world where vast amounts of data are collected daily. Considering that analyzing data is an important need, read the options and select the best alternative for data mining: (Han, page 8) a) All types of data can be mined b) Analyzing data is a fast process c) Data sources include only databases and data warehouses d) Data mining can be applied to any kind of data as long as the data are meaningful for a target application

D

1. Which of the following is NOT a form of clustering techniques? a. K-means b. Agglomerative Hierarchical c. DBSCAN d. Association

D

1. Which of the following is NOT a property (operation) of numbers used to describe attributes? a. Distinctness b. Order c. Addition d. Exponential

C

1. Which of these is a task used for predictive modeling? (page 8) a. Classification b. Regression c. Both a and b d. None of the above

B

1. What is data mining? (Tan Chapter 1, page 2) A. The step that ensures that only valid and useful results are incorporated into the decision support system. B. Data mining is the process of automatically discovering useful information in large data repositories. C. Statistical measures or hypothesis testing methods can also be applied during postprocessing to eliminate spurious data mining results. D. The implementation of novel data structures to access individual records in an efficient manner.

C

2. A relational database is a collection of tables which consist of which of the following: a. Attributes (rows/records) & Tuples (columns/fields) b. Keys (rows) & Values (columns) c. Attributes (columns/fields) & Tuples (rows/records) d. Attributes (columns/fields) & Keys (rows)

C

2. Although in large databases where 'less can be more', how would aggregation become an issue? a. There should be no issue when aggregating b. This form of data processing should not be used on large databases overall c. Figuring which way to combine the values of the attributes d. The data becomes less useful and more complicated to work with

A

2. An attribute is ___ a. A property or characteristic of an object b. Dataset c. A collection of data objects d. An observation

A

2. Data Mining is an integral part of _____ a. Knowledge Discovery in Databases (KDD) b. Classification c. Regression Modeling d. Learning

D

2. Data Warehouse is a type of data source. According to the book, a data warehouse: (Han, page 10) a) Is a repository of information collected from single sources b) Is not constructed via process of data cleaning c) Is designed to store details of individual transactions d) Is a repository of information collected from multiple sources

B

2. In which type of data is the value of each component the number of times the corresponding term occurs? *slides p21 A. Data matrix B. Document data C. Transaction data D. Graph data
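
As an illustration of the document-data card above, here is a minimal sketch of turning a document into a term-count vector. The function name `term_vector`, the vocabulary, and the sample sentence are hypothetical and not taken from the course slides.

```python
from collections import Counter

def term_vector(document, vocabulary):
    """Return one count per vocabulary term (hypothetical helper)."""
    # Lowercase and split on whitespace; real tokenizers are more careful.
    counts = Counter(document.lower().split())
    # Counter returns 0 for terms that never occur in the document.
    return [counts[term] for term in vocabulary]

vocabulary = ["data", "mining", "pattern"]
doc = "Data mining finds pattern after pattern in data"
print(term_vector(doc, vocabulary))  # [2, 1, 2]
```

Each component of the vector corresponds to one term, which is what distinguishes document data from a general data matrix in this card.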

A

2. What is Data Mining and Knowledge Discovery? Tan Chapter 1, Page 3 A. Data Mining is an integral part of knowledge discovery in databases (KDD) - which is the overall process of converting raw data into useful information. B. An example is visualization which allows analysts to explore the data and the data mining results from a variety of viewpoints. C. Handle massive data set, then must be scalable. D. The implementation of novel data structures to access individual records in an efficient manner.

C

2. What is not data mining? *slides p9 A. KDD process B. Group together similar documents returned by search engine according to the context C. Query a Web search engine for information D. Certain names are more prevalent in certain locations

C

3. Data cleaning is an important part of data preprocessing. Which of the following best describes data cleaning? (Han, page 88) a) Data cleaning corrects inconsistencies but does not fill in missing values b) Data cleaning does not correct inconsistencies but identifies outliers c) Data cleaning corrects inconsistencies and fills in missing values d) Data cleaning corrects inconsistencies and identifies outliers

A

3. There are a number of data mining functionalities for predictive analysis, including classification and regression. In this context, classification can be described as: (Han, page 18) a) The process of finding a model or function that describes and distinguishes data classes or concepts (correct answer) b) A set of items that appear frequently together in a transactional data set c) The comparison of the general features of the target data class object against general features of objects from contrasting classes d) None of the above

A

3. What are some specific challenges that motivated data mining? a. Scalability, High Dimensionality, Heterogeneous Data, Data Ownership and Distribution, Non-traditional Analysis b. Dependent and Independent Variables c. Clustering, Anomalies, Databases d. Inflation, Low Economic Activity

A

3. What is the key principle to sampling? *slides p44 A. The sample should be representative of the entire dataset. B. The sample should be easy to acquire. C. The sample should be big. D. The sample should have equal probability of selecting any particular item
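
A minimal sketch of simple random sampling without replacement, related to the sampling card above. The helper name `simple_random_sample` and the population of 100 integers are assumptions made for illustration; drawing at random does not by itself guarantee a representative sample.

```python
import random

def simple_random_sample(items, k, seed=None):
    """Draw k distinct items, each equally likely to be chosen."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(items, k)

population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=42)
print(sample)
```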

D

3. Which are examples of anomaly detection? a. Credit card fraud detection b. Industrial damage detection c. Intrusion detection d. All of the above

D

3. Which of the following is NOT a data transformation strategy? a. Smoothing b. Discretization c. Aggregation d. Sampling

C

4. What is the dimensionality of a data set? (pg 29) a. The quality of the data provided from the data b. How thinly distributed the data is c. The number of attributes that the objects in the data set possess d. The number of objects in the data set

A

5. Which type of data is sequential data recorded at specific time intervals? (pg 35) a. Time Series Data b. Spatial Data c. Sequence Data d. Transactional Data

T

Association analysis is used to discover patterns that describe strongly associated features in the data True False

D

Based on the Market Basket Analysis which items correlate with one another? (a) Bread and Milk (b) Butter and Salmon (c) Sugar and Tea (d) Beer and Diapers

D

Because preventing data quality problems is typically not an option, data mining focuses on (Tan, pg. 37) a) The detection and correction of data quality problems b) The use of algorithms that can tolerate poor data quality c) The omission of data objects or attribute values d) Choices A and B may both apply

T

Data exploration can aid in selecting the appropriate preprocessing and data analysis techniques. True / False

D

During this process, transformations are applied in order to obtain a reduced or "compressed" representation of the original data. a. Dimensionality reduction b. data cleansing c. numerosity reduction d. data compression

T

John Tukey created Exploratory Data Analysis (EDA) in the 1970s. True / False

T

KDD is a process consisting of a series of transformation steps, from data preprocessing to postprocessing of data mining results. (T/F)

A

One of the major processes in data preprocessing is Data reduction and it can be defined as: (Han, page 86) a) Reduced representation of the data set that is much smaller in volume, yet produces the same analytical results. b) Data replaced by alternative, smaller representations using parametric models. c) Reduced representation of a data set using encoding schemes. d) None of the above

D

Outliers are a) Data objects with characteristics different from most other data objects b) Atypical values of an attribute c) Data objects that should be ignored d) Choices A and B can both apply

F

Projection techniques cannot help users find interesting projections of multidimensional data sets. (T/F)

D

Some ideas that data mining draws upon: Intro to data mining - Chapter 1.3 - Page 6 a)Search algorithms b)Modeling Techniques c)Sampling d)All of the above

T

The attribute to be predicted is commonly known as the target or dependent variable. (T/F)

A

The combining of two or more objects into a single object is: a)Aggregation b)Sampling c)Feature creation d)Dimensionality Reduction

D

The key challenges faced by distributed data mining algorithms include: Intro to data mining - Chapter 1.2 - Page 5 a) reduce the amount of communication needed to perform the distributed computation b) how to effectively consolidate the data mining results obtained from multiple source c) how to address data security issue d) all of the above

T

The purpose of preprocessing is to transform the raw input data into an appropriate format for subsequent analysis. True / False

T

Variance and standard deviation are measures of data dispersion. (T/F)
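
For the dispersion card above, a minimal sketch using Python's standard `statistics` module. The sample values are made up for illustration, and the population forms (`pvariance`, `pstdev`) are used here as an assumption; the course material may use the sample forms instead.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data, mean is 5

# Population variance: average squared distance from the mean.
variance = statistics.pvariance(values)
# Population standard deviation: square root of the variance.
std_dev = statistics.pstdev(values)

print(variance, std_dev)  # 4 2.0
```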

D

What are general characteristics of data sets? (a) Dimensionality (b) Sparsity (c) Resolution (d) All of the above

B

What is a commonly used approach for selecting a subset of the data objects to be analyzed? a)Aggregation b)Sampling c)Feature creation d)Dimensionality Reduction

B

What is a concept of summarization of the general characteristics or features of a target class of data. A) Data Discrimination B) Data Characterization C) Data Aggregation D) Data Separation

A

What is a property or characteristic of an object that varies from one object to another? a. Attribute b. ordinal c. data set d. anomaly detection

B

What is an Attribute? Han Chapter 2, Page 40 A. A sample should be representative of the entire dataset. B. An attribute is a data field, representing a characteristic or feature of a data object. C. A scatter plots can be extended to n attributes. D. Small icon to represent multidimensional data values.

A

What is noise in data? (Han page 89) a) Random error or variance in a measured variable. b) Statistical data c) Irrelevant data d) None of the above

C

What is the attribute type of calendar dates? A. Nominal B. Ordinal C. Interval D. Ratio

A

What is the first step in data cleaning? (Han page 91) a) Discrepancy detection/Monitoring Errors b) Analyzing data c) Communicating with Team d) All of the above

B

What is the process of integrating data mining results into decision support systems'? a. KDD b. closing the loop c. scalability d. postprocessing

C

What is the process where intelligent methods are applied to extract data patterns? A) Data Selection B) Data Transformation C) Data Mining D) Pattern Evaluation

A

Which technique can be applied to obtain a reduced representation of the data set? A) Data Reduction B) Data Value Conflict Detection C) Data Aggregation D) Data Transformation

A

What property(s) or operation(s) are used to describe distinctness? (Tan page 47) a) = and != b) + and - c) * and / d) All of the above

A

Which of the following is NOT a data mining function? A. Serialization B. Classification C. Generalization D. Association and correlation analysis

D

Which of the following is a continuous variable? A. Zip code B. Social security number C. Student ID number D. Temperature

C

Which of the following is a qualitative attribute? A. Weight B. Number of TV C. Eye color D. Temperature

D

Which of the following is NOT one of the most basic forms of data for mining? (Han, pg. 8) a) Database data b) Warehouse data c) Transactional data d) Descriptive Data

A

Which of the following is the correct sequence of the first four steps in data mining? (Lecture #1 ppt, pg. 55) a) 1. Understand the data mining project's purpose 2. Obtain the data set to be used in the analysis 3. Explore, clean, and preprocess the data 4. Reduce the data, if necessary b) 1. Obtain the data set to be used in the analysis 2. Understand the data mining project's purpose 3. Determine the data mining task 4. Reduce the data, if necessary c) 1. Determine the data mining task 2. Obtain the data set to be used in the analysis 3. Explore, clean, and preprocess the data 4. Reduce the data, if necessary d) 1. Understand the data mining project's purpose 2. Determine the data mining task 3. Explore, clean, and preprocess the data 4. Reduce the data, if necessary

B

Which of these is a method to perform data transformation? (02a-Lec2a-Han-Ch3-Tan-Ch2.pdf page 29) a. Data compression b. Normalization c. Filling in missing data d. Dimensionality reduction
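
Normalization, the answer to the card above, can be sketched as min-max scaling to the range [0, 1]. This is one common form of normalization, chosen here as an assumption; the function name `min_max_normalize` and the income values are hypothetical.

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    low, high = min(values), max(values)
    if high == low:
        # All values identical: avoid division by zero.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]

incomes = [200, 300, 400, 600, 1000]
print(min_max_normalize(incomes))  # [0.0, 0.125, 0.25, 0.5, 1.0]
```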

C

Which of these is a method for handling missing values? (a) Fill in the value manually (b) Use a global constant (c) All of the above (d) None of the above
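
A minimal sketch of two ways to fill in missing values: a global constant, as in the card above, and the attribute mean. Representing missing entries as `None`, and the helper names, are assumptions made for this example.

```python
def fill_with_constant(values, constant):
    """Replace every missing entry (None) with the same global constant."""
    return [constant if v is None else v for v in values]

def fill_with_mean(values):
    """Replace missing entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

ages = [20, None, 40, 30, None]
print(fill_with_constant(ages, -1))  # [20, -1, 40, 30, -1]
print(fill_with_mean(ages))          # [20, 30.0, 40, 30, 30.0]
```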

C

Which of these does NOT fall in the knowledge discovery process? (a) Data transformation (b) Data selection (c) Data presentation (d) Data cleaning

D

Which of these does data mining fall into? (a) Statistics (b) Machine Learning (c) Pattern Recognition (d) None of the above

D

Which one is a smoothing technique? (a) Binning (b) Regression (c) Outlier Analysis (d) All of the above
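
Binning, one of the smoothing techniques listed in the card above, can be sketched as smoothing by bin means over equal-frequency bins. The function name `smooth_by_bin_means` and the price values are illustrative assumptions.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort values, split into bins of bin_size, replace each value by its bin mean."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        mean = sum(bin_values) / len(bin_values)
        # Every value in the bin is replaced by the same mean.
        smoothed.extend([mean] * len(bin_values))
    return smoothed

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(smooth_by_bin_means(prices, 3))
# [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```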

C

Which one of the following methods does not bias the data when addressing missing values in data cleaning? (Han, pg. 88-89) a) Use the most probable value to fill in the missing value b) Use a global constant to fill in the missing value c) Ignore the tuple d) Use the attribute mean or median for all samples belonging to the same class as a given tuple

