CIS3920 Test Bank 1


1. What is an example of data quality issues? a. Noise b. Outliers c. Duplicate Data d. All of the above

1. There are many factors comprising data quality. Which of the following is not a factor comprising data quality? (Han, Page 84) a) Accuracy b) Completeness c) Inconsistency (correct answer) d) Believability

1. Through data mining, we can recognize patterns in data. True False

2) What is a major dimension of the multidimensional view of data mining? (Han slide 33) a) Data b) Knowledge c) Technologies d) Applications e) All of the Above

3. What are the two types of attributes? a. Qualitative and Quantitative b. Spatial and Temporal c. Training and Validation d. Data Preparation and Modeling

3. What form of analysis is used for fraud detection at credit card companies? (page 11) a. Anomaly detection b. Cluster analysis c. Association analysis d. Regression Analysis

3. Which of the following is NOT a data transformation strategy? a. Smoothing b. Discretization c. Aggregation d. Sampling

1. What is Data Mining? Tan Chapter 1, Page 2 A. The step that ensures that only valid and useful results are incorporated into the decision support system. B. Data Mining is the process of automatically discovering useful information in large data repositories. C. Statistical measures or hypothesis testing methods can also be applied during postprocessing to eliminate spurious data mining results. D. The implementation of novel data structures to access individual records in an efficient manner.

3. There are a number of data mining functionalities for predictive analysis, including classification and regression. In this context, classification can be described as: (Han, page 18) a) The process of finding a model or function that describes and distinguishes data classes or concepts (correct answer) b) A set of items that appear frequently together in a transactional data set c) The comparison of the general features of the target data class object against general features of objects from contrasting classes d) None of the above

3. Two examples of predictive modeling are: Tan Chapter 1, Page 8 A. Regression and Scalability B. Association analysis and Scalability C. Classification and Regression D. Classification and Association analysis

3. What are some specific challenges that motivated data mining? a. Scalability, High Dimensionality, Heterogeneous Data, Data Ownership and Distribution, Non-traditional Analysis b. Dependent and Independent Variables c. Clustering, Anomalies, Databases d. Inflation, Low Economic Activity

3. What is the key principle to sampling? *slides p44 A. The sample should be representative of the entire dataset. B. The sample should be easy to acquire. C. The sample should be big. D. The sample should have equal probability of selecting any particular item

4. What is the dimensionality of a data set? (pg 29) a. The quality of the data provided from the data b. How thinly distributed the data is c. The number of attributes that the objects in the data set possess d. The number of objects in the data set

Which of the following is the correct sequence of the first four steps in data mining? (Lecture #1 ppt, pg. 55) a) 1. Understand the data mining project's purpose 2. Obtain the data set to be used in the analysis 3. Explore, clean, and preprocess the data 4. Reduce the data, if necessary b) 1. Obtain the data set to be used in the analysis 2. Understand the data mining project's purpose 3. Determine the data mining task 4. Reduce the data, if necessary c) 1. Determine the data mining task 2. Obtain the data set to be used in the analysis 3. Explore, clean, and preprocess the data 4. Reduce the data, if necessary d) 1. Understand the data mining project's purpose 2. Determine the data mining task 3. Explore, clean, and preprocess the data 4. Reduce the data, if necessary

Which of these is a method for handling missing values? (a) Fill in the value manually (b) Use a global constant (c) All of the above (d) None of the above

What is a repository of information collected from multiple sources, stored under a unified schema and usually resides at a single site? a. Data selection b. data mining c. data transformation d. data warehouse

The creation of a new set of features from the original raw data is known as: a) Aggregation b) Sampling c) Feature creation d) Dimensionality Reduction

Data exploration can aid in selecting the appropriate preprocessing and data analysis techniques. True False

Variance and standard deviation are measures of data dispersion. (T/F)
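
As a small illustration of the two dispersion measures named above, the sketch below computes the population variance and population standard deviation with Python's standard `statistics` module. The sample values and the helper name `dispersion` are hypothetical and not taken from the course material.

```python
import statistics

# Hypothetical sample data, chosen only for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]


def dispersion(data):
    """Return (population variance, population standard deviation)."""
    return statistics.pvariance(data), statistics.pstdev(data)


variance, std_dev = dispersion(values)
print(variance, std_dev)
```

The standard deviation is the square root of the variance, so it is expressed in the same units as the data.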

3) What is known as developing techniques for mining encrypted data? (Tan page 14) a) Privacy-preserving data mining b) Security data mining c) Clustering Analysis d) All of the above

(2) Which one is discrete attribute? A. Zip codes B. Height C. Temperature D. All of the above

What is Data Visualization? Han Chapter 2, Page 56 A. Helping to identify noise and outliers. B. A line of best fit can be drawn to study the correlation between the variables. C. Measures of central tendency and measures of dispersion. D. Aims to communicate data clearly and effectively through graphical representation.

1. Please select the example of quantitative attribute. *slides p12 A. Eye color B. Letter grade C. Weight D. Zip code

2. An attribute is ___ a. A property or characteristic of an object b. Dataset c. A collection of data objects d. An observation

Accuracy, completeness, consistency, timeliness, believability and interpretability all make up: a. Data mining b. data quality c. data cleaning d. data reduction

Another name for a data set that may contain objects that do not comply with the general behavior or model: a. Outliers queries b. decision tree c. cluster analysis d. database

1. We live in a world where vast amounts of data are collected daily. Considering that analyzing data is an important need, read the options and select the best alternative for data mining: (Han, page 8) a) All types of data can be mined b) Analyzing data is a fast process c) Data sources include only databases and data warehouses d) Data mining can be applied to any kind of data as long as the data are meaningful for a target application

1. What are the tasks associated with data mining? a. Association Analysis, Cluster Analysis, Classification, Anomaly Detection, Predictive modeling b. Scalability, High Dimensionality, Non-traditional Analysis c. Classification, Regression modeling, Cluster Analysis d. Anomaly Detection, Machine Learning

1. Which of the following is NOT a clustering technique? a. K-means b. Agglomerative Hierarchical c. DBSCAN d. Association

1. Which of the following is NOT a property or operation of numbers used to describe attributes? a. Distinctness b. Order c. Addition d. Exponential

1. Which of these is a task used for predictive modeling? (page 8) a. Classification b. Regression c. Both a and b d. None of the above

2. Which of the following is NOT a product of Postprocessing in Data Mining? a. Filtering Patterns b. Visualization c. Pattern Interpretation d. Data Subsetting

2. What is association analysis used for? (page 9) a. Find groups of closely related observations so that observations that belong to the cluster are similar to each other b. Identify observations whose characteristics are significantly different from the rest of the data c. Predict the value of a particular attribute based on the values of other attributes d. Discover patterns that describe strongly associated features in the data; they are typically represented in the form of implication rules or feature subsets

3. Data cleaning is an important process of preprocessing data. Which of the following best describes data cleaning? (Han, page 88) a) Data cleaning corrects inconsistencies but does not fill in missing values b) Data cleaning does not correct inconsistencies but identifies outliers c) Data cleaning corrects inconsistencies and fills in missing values d) Data cleaning corrects inconsistencies and identifies outliers

3. Which of the following is NOT a form of Machine Learning? a. Supervised Learning b. Unsupervised Learning c. Semi-supervised Learning d. Multi-supervised Learning

3. Which one is a predictive function of data mining? *slides p33 A. Classification B. Clustering C. Association Rule Discovery D. Sequential Pattern Discovery

5. Which type of data is sequential data recorded at specific time intervals? (pg 35) a. Time Series Data b. Spatial Data c. Sequence Data d. Transactional Data

Association analysis is used to discover patterns that describe strongly associated features in the data True False

Based on the Market Basket Analysis which items correlate with one another? (a) Bread and Milk (b) Butter and Salmon (c) Sugar and Tea (d) Beer and Diapers

Because preventing data quality problems is typically not an option, data mining focuses on (Tan, pg. 37) a) The detection and correction of data quality problems b) The use of algorithms that can tolerate poor data quality c) The omission of data objects or attribute values d) Choices A and B may both apply

One of the major processes in data preprocessing is Data reduction and it can be defined as: (Han, page 86) a) Reduced representation of the data set that is much smaller in volume, yet produces the same analytical results. b) Data replaced by alternative, smaller representations using parametric models. c) Reduced representation of a data set using encoding schemes. d) None of the above

What is the summarization of the general characteristics or features of a target class of data called? A) Data Discrimination B) Data Characterization C) Data Aggregation D) Data Separation

What is a property or characteristic of an object that varies from one object to another? a. Attribute b. ordinal c. data set d. anomaly detection

What is the first step in data cleaning? (Han page 91) a) Discrepancy detection/Monitoring Errors b) Analyzing data c) Communicating with Team d) All of the above

What is the sequence of steps in knowledge discovery? A) Cleaning and Integration -> Selection and Transformation -> Data Mining -> Patterns -> Knowledge. B) Selection and Transformation -> Data Mining -> Cleaning and Integration -> Patterns -> Knowledge. C) Data Mining -> Cleaning and Integration -> Patterns -> Selection and Transformation -> Knowledge. D) Patterns -> Cleaning and Integration -> Selection and Transformation -> Data Mining -> Knowledge.

What is the process of integrating data mining results into decision support systems? a. KDD b. closing the loop c. scalability d. postprocessing

What is the process where intelligent methods are applied to extract data patterns? A) Data Selection B) Data Transformation C) Data Mining D) Pattern Evaluation

Which technique can be applied to obtain a reduced representation of the data set? A) Data Reduction B) Data Value Conflict Detection C) Data Aggregation D) Data Transformation

What property(s) or operation(s) are used to describe distinctness? (Tan page 47) a) = and != b) + and - c) * and / d) All of the above

What's the last step of data mining in business intelligence? A. Data magnifying B. Data mining C. Decision making D. Dashboarding

Which of the following is a qualitative attribute? A. Weight B. Number of TV C. Eye color D. Temperature

Which of the following is not one of the most basic forms of data for mining? (Han, pg. 8) a) Database data b) Warehouse data c) Transactional data d) Descriptive Data (correct answer)

Which of these is a method to perform data transformation? (02a-Lec2a-Han-Ch3-Tan-Ch2.pdf page 29) a. Data compression b. Normalization c. Filling in missing data d. Dimensionality reduction
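
For readers who want to see what normalization looks like in practice, here is a minimal sketch of min-max normalization, one common form of it. The function name `min_max_normalize`, its parameters, and the income values are hypothetical; the cited lecture slides may present other normalization variants.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into the range [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so map them all to new_min.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


# Hypothetical income values, used only for illustration.
incomes = [12000, 35000, 73600, 98000]
normalized = min_max_normalize(incomes)
print(normalized)
```

The smallest input maps to `new_min` and the largest to `new_max`, while the relative spacing of the values is preserved.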

Which one is a smoothing technique? (a) Binning (b) Regression (c) Outlier Analysis (d) All of the above
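
To make the binning option concrete, the sketch below smooths sorted data by bin means: the values are split into equal-frequency bins and each value is replaced by the mean of its bin. The function name `smooth_by_bin_means`, the bin size, and the price list are illustrative assumptions.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of bin_size, and replace
    every value with the mean of the bin it falls in."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        bin_mean = sum(bin_values) / len(bin_values)
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed


# Illustrative price data; three equal-frequency bins of three values each.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
smoothed_prices = smooth_by_bin_means(prices, 3)
print(smoothed_prices)
```

Smoothing by bin medians or bin boundaries follows the same pattern, with a different replacement value per bin.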

Which one of the following methods does not bias the data when addressing missing values in Data Cleaning? (Han, pg. 88-89) a) Use the most probable value to fill in the missing value b) Use a global constant to fill in the missing value c) Ignore the tuple d) Use the attribute mean or median for all samples belonging to the same class as a given tuple
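
As an illustration of one of the strategies listed above, filling a missing value with the attribute mean of the samples in the same class, here is a minimal sketch. The record layout, the keys `income` and `risk`, and the function name `fill_with_class_mean` are hypothetical.

```python
from collections import defaultdict


def fill_with_class_mean(rows, value_key, class_key):
    """Return copies of rows where a missing (None) value is replaced by
    the mean of the known values in the same class."""
    totals = defaultdict(lambda: [0.0, 0])  # class -> [sum, count]
    for row in rows:
        if row[value_key] is not None:
            totals[row[class_key]][0] += row[value_key]
            totals[row[class_key]][1] += 1

    filled = []
    for row in rows:
        new_row = dict(row)
        if new_row[value_key] is None:
            total, count = totals[new_row[class_key]]
            if count:  # leave the gap if the class has no known values
                new_row[value_key] = total / count
        filled.append(new_row)
    return filled


# Hypothetical customer records; None marks a missing income.
customers = [
    {"risk": "low", "income": 50000},
    {"risk": "low", "income": 70000},
    {"risk": "low", "income": None},
    {"risk": "high", "income": 20000},
    {"risk": "high", "income": None},
]
completed = fill_with_class_mean(customers, "income", "risk")
print(completed)
```

The original records are left unchanged, which makes it easy to compare the data before and after filling.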

KDD is a process consisting of a series of transformation steps, from data preprocessing to postprocessing of data mining results. (T/F)

Outliers are a) Data objects with characteristics different from most other data objects b) Atypical values of an attribute c) Data objects that should be ignored d) Choices A and B can both apply

Projection techniques cannot help users find interesting projections of multidimensional data sets. (T/F)

1) What kinds of data can be mined? (Han page 8) a) Any data can be mined b) Only database data c) Business related data d) None

1. Web mining may involve: *slides p12 A. Data cleaning B. Data exploration C. Decision making D. Data presentation

1. In what step of the knowledge discovery process do we explore and clean the data? a. Preprocessing b. Deployment c. Evaluation d. Modeling

2. Regression is a descriptive function of data mining. True False

3. Which are examples of anomaly detection? a. Credit card fraud detection b. Industrial damage detection c. Intrusion detection d. All of the above

The key challenges faced by distributed data mining algorithms include: Intro to data mining - Chapter 1.2 - Page 5 a) how to reduce the amount of communication needed to perform the distributed computation b) how to effectively consolidate the data mining results obtained from multiple sources c) how to address data security issues d) all of the above

What is a commonly used approach for selecting a subset of the data objects to be analyzed? a) Aggregation b) Sampling c) Feature creation d) Dimensionality Reduction

What is an Attribute? Han Chapter 2, Page 40 A. A sample should be representative of the entire dataset. B. An attribute is a data field, representing a characteristic or feature of a data object. C. A scatter plots can be extended to n attributes. D. Small icon to represent multidimensional data values.

What is noise in data? (Han page 89) a) Random error or variance in a measured variable. b) Statistical data c) Irrelevant data d) None of the above

What is the attribute type of calendar dates? A. Nominal B. Ordinal C. Interval D. Ratio

Which of the following is NOT a data mining function? A. Serialization B. Classification C. Generalization D. Association and correlation analysis

Which of the following is NOT a step in data mining? A. Obtain the data set to be used in the analysis B. Explore, clean, and preprocess the data. C. Reduce the data if necessary. D. Recycle the data.

Which of the following is NOT in the knowledge discovery (KDD) process? A. Database B. Data storehouse C. Data warehouse D. Data mining

Which of the following is a Quantitative Attribute? A) Zip Codes B) ID Numbers C) Gender D) Length

Which of the following is a Qualitative Attribute? A) Age B) Temperature C) Length D) ID Numbers

Which of the following is a continuous variable? A. Zip code B. Social security number C. Student ID number D. Temperature

Which of these does NOT fall within the knowledge discovery process? (a) Data transformation (b) Data selection (c) Data presentation (d) Data cleaning

Which of these does data mining fall into? (a) Statistics (b) Machine Learning (c) Pattern Recognition (d) None of the above

Which is NOT an attribute type? Han Chapter 2, Page 41-44 A. Nominal B. Binary C. Ordinal D. BoxPlots

Which of the following are not general techniques for icon-based visualization? (Lecture #2, pg. 33) a) Shape coding b) Stick figures c) Color icons d) Tile bars

Anomaly detection is the task of identifying observations whose characteristics are significantly different from the rest of the data. Such observations are known as: Intro to data mining - Chapter 1.5 - Page 11 a) anomalies b) outliers c) both

2. A relational database is a collection of tables which consist of which of the following: a. Attributes (rows/records) & Tuples (columns/fields) b. Keys (rows) & Values (columns) c. Attributes (columns/fields) & Tuples (rows/records) d. Attributes (columns/fields) & Keys (rows)

2. Data Warehouse is a type of data source. According to the book, a data warehouse: (Han, page 10) a) Is a repository of information collected from single sources b) Is not constructed via process of data cleaning c) Is designed to store details of individual transactions d) Is a repository of information collected from multiple sources

2. In which type of data is the value of each component the number of times the corresponding term occurs? *slides p21 A. Data matrix B. Document data C. Transaction data D. Graph data

2. What is Data Mining and Knowledge Discovery? Tan Chapter 1, Page 3 A. Data Mining is an integral part of knowledge discovery in databases (KDD) - which is the overall process of converting raw data into useful information. B. An example is visualization which allows analysts to explore the data and the data mining results from a variety of viewpoints. C. Handle massive data set, then must be scalable. D. The implementation of novel data structures to access individual records in an efficient manner.

2. What is not data mining? *slides p9 A. KDD process B. Group together similar documents returned by search engine according to the context C. Query a Web search engine for information D. Certain names are more prevalent in certain locations

2. Which of the following is NOT a type of attribute? a. Nominal b. Ordinal c. Interval d. Visual

During this process, transformations are applied in order to obtain a reduced or "compressed" representation of the original data. a. Dimensionality reduction compression b. data cleansing c. numerosity reduction d. data

John Tukey created Exploratory Data Analysis (EDA) in the 1970s. True False

The attribute to be predicted is commonly known as the target or dependent variable. (T/F)

The combining of two or more objects into a single object is: a) Aggregation b) Sampling c) Feature creation d) Dimensionality Reduction

What are general characteristics of data sets? (a) Dimensionality (b) Sparsity (c) Resolution (d) All of the above

(1) Which one is NOT one of the major tasks in data preprocessing? A. Data reduction B. Data normalization C. Data reduction D. Data visualization

(1) Which statement is correct about the "data rich but information poor" situation? (Chapter page number: HAN Ch1. p5) A. The abundance of data, coupled with the need for powerful data analysis tools B. The fast-growing, tremendous amount of data, collected and stored in large and numerous data repositories C. The data we collect and store today has far exceeded our human ability for comprehension without powerful tools. D. All of the above.

(2) What is the final step of knowledge discovery? A. Knowledge presentation B. Data mining C. Data cleaning D. Pattern evaluation

(3) Which one is NOT an alternative name for data mining? A. Knowledge discovery in databases (KDD) B. Data dredging C. Knowledge harvesting D. Business intelligence

1. In data mining, tasks are usually divided into two major categories, which are: a. Intensive tasks & Non-Intensive tasks b. Predictive tasks & Descriptive tasks c. Dependent & Independent tasks d. None of the above

2. In large databases, where 'less can be more', how can aggregation become an issue? a. There should be no issue when aggregating b. This form of data processing should not be used on large databases overall c. Figuring out which way to combine the values of the attributes d. The data becomes less useful and more complicated to work with

2. Data Mining is an integral part of _____ a. Knowledge Discovery in Databases (KDD) b. Classification c. Regression Modeling d. Learning

3. Which attribute type would 'grades' fall into? a. Nominal b. Ordinal c. Interval d. Ratio

Some ideas that data mining draws upon: Intro to data mining - Chapter 1.3 - Page 6 a) Search algorithms b) Modeling Techniques c) Sampling d) All of the above

The purpose of preprocessing is to transform the raw input data into an appropriate format for subsequent analysis. True False
