Data mining test study guide
discrimination
a comparison of the general features of target-class data objects against the general features of objects from one or more contrasting classes
characterization
a summarization of the general characteristics or features of a target class of data
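Characterization and discrimination can be sketched with simple averages: characterization summarizes one class, and discrimination puts that summary next to a contrasting class. This is a minimal sketch assuming hypothetical customer records and class names ("big_spender", "budget"); real systems often use richer summaries such as generalized rules or charts.

```python
from statistics import mean

# Hypothetical toy records: (class_label, age, yearly_purchases)
customers = [
    ("big_spender", 34, 40),
    ("big_spender", 40, 52),
    ("big_spender", 37, 46),
    ("budget", 22, 8),
    ("budget", 27, 12),
    ("budget", 26, 10),
]

def characterize(records, label):
    """Characterization: summarize the general features of one (target) class."""
    rows = [r for r in records if r[0] == label]
    return {
        "count": len(rows),
        "avg_age": mean(r[1] for r in rows),
        "avg_purchases": mean(r[2] for r in rows),
    }

def discriminate(records, target, contrast):
    """Discrimination: pair the target-class summary with a contrasting class."""
    t = characterize(records, target)
    c = characterize(records, contrast)
    # Each value is (target_class_value, contrasting_class_value).
    return {key: (t[key], c[key]) for key in ("avg_age", "avg_purchases")}

print(characterize(customers, "big_spender"))
print(discriminate(customers, "big_spender", "budget"))
```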
data qualities
accuracy, completeness, consistency, believability, timeliness, interpretability
two reasons why data mining is popular now and wasn't as popular 20 years ago
(1) advances in technology and the computerization of society, which produce huge amounts of data; (2) powerful data collection and storage tools
data mining
an essential process where intelligent methods are applied to extract data patterns
what are the data mining functionalities?
characterization and discrimination; mining of frequent patterns, associations, and correlations; classification and regression; cluster analysis; outlier analysis
database
a collection of interrelated data that is managed and accessed by specialized software known as a database management system (DBMS)
five-number summary of a distribution
consists of the median (Q2), the quartiles (Q1 and Q3), and the smallest and largest individual observations, usually written in the order Minimum, Q1, Median, Q3, Maximum
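The five-number summary can be computed with Python's standard library. This sketch assumes the "inclusive" quartile method of statistics.quantiles; textbooks and tools use several quartile conventions, so Q1 and Q3 may differ slightly elsewhere.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    data = sorted(values)
    # method="inclusive" treats the data as the whole population range.
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return data[0], q1, q2, q3, data[-1]

print(five_number_summary([7, 1, 9, 3, 5, 2, 8, 4, 6]))  # (1, 3.0, 5.0, 7.0, 9)
```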
What are the steps of the knowledge discovery (KDD) process?
data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation, knowledge presentation
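The KDD steps can be chained as a toy pipeline, one small function per step. This is a minimal sketch: the sales records, field names ("item", "qty", "region"), and the "best-selling items" mining step are hypothetical stand-ins chosen only to show the order of the steps.

```python
from collections import Counter

# Hypothetical sales records from two sources; field names are illustrative only.
source_a = [
    {"item": "Milk ", "qty": 2, "region": "north"},
    {"item": "bread", "qty": None, "region": "north"},  # missing value
    {"item": "milk", "qty": 1, "region": "south"},
]
source_b = [
    {"item": "MILK", "qty": 3, "region": "north"},
    {"item": "eggs", "qty": 12, "region": "north"},
]

def clean(rows):
    # Data cleaning: drop rows with missing qty and fix inconsistent item names.
    return [{**r, "item": r["item"].strip().lower()} for r in rows if r["qty"] is not None]

def integrate(*sources):
    # Data integration: combine several sources into one collection.
    return [row for src in sources for row in src]

def select(rows, region):
    # Data selection: keep only the data relevant to the task (one region).
    return [r for r in rows if r["region"] == region]

def transform(rows):
    # Data transformation: aggregate quantities per item.
    totals = Counter()
    for r in rows:
        totals[r["item"]] += r["qty"]
    return totals

def mine(totals, top_k=2):
    # Data mining (toy stand-in): find the best-selling items.
    return totals.most_common(top_k)

def evaluate(patterns, min_qty):
    # Pattern evaluation: keep only patterns that pass a threshold.
    return [(item, qty) for item, qty in patterns if qty >= min_qty]

def present(patterns):
    # Knowledge presentation: show the results to the user.
    for item, qty in patterns:
        print(f"{item}: {qty} units")

data = integrate(clean(source_a), clean(source_b))
result = evaluate(mine(transform(select(data, "north"))), min_qty=6)
present(result)
```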
association
discovering which items or attribute values frequently occur together in a data set, e.g., buys(computer) => buys(software)
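what are some procedures for handling missing values?
ignore (delete) the tuple, although this throws away valuable data and can leave an incomplete data set; fill in the missing value manually; use a global constant such as "Unknown"; use the attribute mean or median; use the mean or median of samples in the same class; use the most probable value (e.g., predicted by regression or a decision tree)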
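Three of these strategies can be sketched in a few lines. The income values are hypothetical, and None is assumed to mark a missing value.

```python
from statistics import mean

# Hypothetical attribute values; None marks a missing value.
incomes = [42000, None, 55000, 61000, None, 38000]

def drop_missing(values):
    # Ignore (delete) tuples with missing values; simple, but loses data.
    return [v for v in values if v is not None]

def fill_constant(values, constant="Unknown"):
    # Replace each missing value with a global constant.
    return [constant if v is None else v for v in values]

def fill_mean(values):
    # Replace each missing value with the mean of the known values.
    m = mean(drop_missing(values))
    return [m if v is None else v for v in values]

print(fill_mean(incomes))
```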
interpretability
whether the data are easy to understand
accuracy
whether the values are correct (free of errors) or not
consistency
whether the values are consistent with other values (e.g., no conflicting codes or formats for the same thing)
completeness
whether all values of interest are recorded and available, or some are missing
believability
whether the data and the data source are trusted by users
timeliness
whether the data are kept up to date and available in time for processing
confidence in an association rule
the certainty of a rule A => B: the percentage of transactions containing A that also contain B, i.e., confidence(A => B) = P(B | A)
support in an association rule
the usefulness of a rule A => B: the percentage of all transactions in the data set that contain both A and B
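Support and confidence can be computed directly from a list of transactions. This sketch assumes hypothetical market-basket transactions stored as Python sets; confidence is computed from raw counts, which equals support(A and B) / support(A).

```python
# Hypothetical market-basket transactions.
transactions = [
    {"computer", "software", "printer"},
    {"computer", "software"},
    {"computer", "printer"},
    {"software"},
    {"computer", "software", "mouse"},
]

def count(itemset, transactions):
    # Number of transactions that contain every item in itemset.
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    # Fraction of all transactions that contain the itemset.
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    # Of the transactions containing A, the fraction that also contain B.
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / count(antecedent, transactions)

sup = support({"computer", "software"}, transactions)
conf = confidence({"computer"}, {"software"}, transactions)
print(f"computer => software: support={sup:.0%}, confidence={conf:.0%}")
# computer => software: support=60%, confidence=75%
```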
correlation
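measuring how strongly two attributes (or itemsets) are related, e.g., with Pearson's correlation coefficient for numeric attributes or the chi-square test for nominal attributes; in association analysis, correlation measures check whether an association is statistically meaningful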
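For two numeric attributes, correlation is commonly measured with Pearson's correlation coefficient, which ranges from -1 (perfect negative) to +1 (perfect positive). A minimal sketch with no external libraries:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # close to 1.0: positively correlated
print(pearson([1, 2, 3], [3, 2, 1]))        # close to -1.0: negatively correlated
```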
data warehouse
a repository of information collected from multiple sources, stored under a unified schema, and usually residing at a single site
attributes
data fields that represent characteristics or features of data objects (e.g., a customer's age or income)
data object
represents an entity, such as a customer, an item, or a sale; also called a sample, example, instance, or data point
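what is the minimum confidence threshold?
the minimum confidence, set by the user or a domain expert, that a rule must reach to be considered interesting; it works like the minimum support threshold but applies to confidence instead of support. Rules that meet both thresholds are called strong rules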
outlier analysis
studying data objects that do not comply with the general behavior or model of the data (outliers) in order to explain why they occurred, e.g., for fraud detection
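One common statistical way to flag outliers is the 1.5 x IQR rule: values below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR are treated as suspected outliers. This is a sketch of that one technique only (using the "inclusive" quartile method as an assumption); outlier analysis also uses distance-, density-, and model-based methods.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values that fall outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

print(iqr_outliers([12, 14, 15, 15, 16, 17, 18, 95]))  # [95]
```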
target data set
the class of data under study
what is the minimum support threshold?
the minimum support, set by the user or a domain expert, that an itemset or rule must achieve to be considered frequent (interesting)
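A minimum support threshold filters itemsets: only those whose support reaches it count as frequent. The sketch below enumerates itemsets by brute force (it is not the Apriori algorithm) on hypothetical transactions, and adds a hypothetical is_strong helper that applies both the minimum support and minimum confidence thresholds to a rule.

```python
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"computer", "software", "printer"},
    {"computer", "software"},
    {"computer", "printer"},
    {"software"},
    {"computer", "software", "mouse"},
]

def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute force: check every itemset up to max_size against min_support."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    frequent = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            sup = sum(1 for t in transactions if set(combo) <= t) / n
            if sup >= min_support:
                frequent[combo] = sup
    return frequent

def is_strong(sup, conf, min_sup, min_conf):
    # A rule is strong when it meets both thresholds.
    return sup >= min_sup and conf >= min_conf

print(frequent_itemsets(transactions, min_support=0.5))
```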
What is data mining?
the process of discovering interesting patterns and knowledge from large amounts of data
classification
the process of finding a model that describes and distinguishes data classes or concepts, so that the model can be used to predict the class label of objects whose class label is unknown
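A very small classification model is a decision stump: a single rule of the form "value >= threshold => yes". This sketch assumes hypothetical training data (income in thousands, whether the customer bought a computer); it learns the threshold with the fewest training errors and then labels new objects. It is one simple model choice, not the only way to classify.

```python
# Hypothetical training data: (income_in_thousands, buys_computer)
training = [(20, "no"), (25, "no"), (30, "no"), (45, "yes"), (50, "yes"), (60, "yes")]

def train_stump(samples):
    """Pick the threshold t that minimizes errors for the rule: value >= t => 'yes'."""
    best_t, best_errors = None, None
    for t in sorted({v for v, _ in samples}):
        errors = sum((v >= t) != (label == "yes") for v, label in samples)
        if best_errors is None or errors < best_errors:
            best_t, best_errors = t, errors
    return best_t

def classify(value, threshold):
    # Apply the learned rule to an object whose class label is unknown.
    return "yes" if value >= threshold else "no"

threshold = train_stump(training)
print(threshold, classify(40, threshold), classify(55, threshold))
```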
pattern evaluation
identifying the truly interesting patterns representing knowledge, based on interestingness measures
why do we preprocess data?
to reduce redundancy (the same data stored more than once), which saves time during the mining phase; to handle incomplete data by finding suitable values to replace missing ones; and to clean noisy data
data cleaning
to remove noise and inconsistent data
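One classic way to smooth noisy numeric data is binning: sort the values, split them into equal-frequency bins, and replace each value with its bin mean. A minimal sketch (the price values are illustrative):

```python
def smooth_by_bin_means(values, bin_size):
    """Equal-frequency binning, then replace every value with its bin mean."""
    data = sorted(values)
    smoothed = []
    for start in range(0, len(data), bin_size):
        bin_values = data[start:start + bin_size]
        m = sum(bin_values) / len(bin_values)
        smoothed.extend([m] * len(bin_values))
    return smoothed

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(smooth_by_bin_means(prices, 3))
# [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```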
clustering
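groups data objects without consulting class labels, so that objects in the same cluster are highly similar to each other and dissimilar to objects in other clusters; it can be used to generate class labels for a group of data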
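k-means is one widely used clustering method. This toy one-dimensional version starts from given initial centers, repeatedly assigns each value to its nearest center, and moves each center to the mean of its cluster; the cluster index can then serve as a generated class label. The data and starting centers are assumptions for illustration.

```python
def kmeans_1d(values, centers, iterations=10):
    """Toy k-means on 1-D numbers, starting from the given initial centers."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            # Assign v to the index of the nearest current center.
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to its cluster mean (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

centers, clusters = kmeans_1d([1, 2, 3, 10, 11, 12], centers=[0, 5])
print(centers, clusters)
```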
regression
used to predict missing or unavailable numerical data values rather than class labels
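Simple linear regression fits a line y = slope * x + intercept by least squares and uses it to predict a numeric value. The data below (years of experience vs. salary in thousands) are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return slope, intercept

def predict(x, slope, intercept):
    return slope * x + intercept

slope, intercept = fit_line([1, 2, 3, 4], [30, 35, 40, 45])
print(slope, intercept, predict(6, slope, intercept))  # 5.0 25.0 55.0
```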
knowledge presentation
where visualization and knowledge representation techniques are used to present the mined knowledge to users
data transformation
where data are transformed and consolidated into forms appropriate for mining, e.g., by performing summary or aggregation operations
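Aggregation is a typical transformation: for example, daily sales can be rolled up into monthly totals. A minimal sketch, assuming hypothetical records whose dates are ISO strings ("YYYY-MM-DD"):

```python
from collections import defaultdict

# Hypothetical daily sales: (ISO date, amount)
daily_sales = [("2023-01-03", 120.0), ("2023-01-17", 80.0), ("2023-02-05", 200.0)]

def aggregate_monthly(rows):
    """Sum amounts per month, using the "YYYY-MM" prefix of each date."""
    totals = defaultdict(float)
    for date, amount in rows:
        totals[date[:7]] += amount
    return dict(totals)

print(aggregate_monthly(daily_sales))
```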
data selection
where data relevant to the analysis task are retrieved from the database
data integration
where multiple data sources may be combined
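A common integration task is matching the same entity across sources that name their keys differently. This sketch assumes two hypothetical sources ("crm" with cust_id and "billing" with customer_id) and keeps every CRM customer, attaching the billing total when one exists; billing rows with no CRM match are left out.

```python
# Hypothetical sources: the same customers stored under different key names.
crm = [{"cust_id": 1, "name": "Ana"}, {"cust_id": 2, "name": "Ben"}]
billing = [
    {"customer_id": 1, "total": 250.0},
    {"customer_id": 2, "total": 90.0},
    {"customer_id": 3, "total": 40.0},  # no matching CRM record
]

def integrate(crm_rows, billing_rows):
    """Left-join billing totals onto CRM rows, mapping cust_id to customer_id."""
    totals = {row["customer_id"]: row["total"] for row in billing_rows}
    return [
        {"id": r["cust_id"], "name": r["name"], "total": totals.get(r["cust_id"])}
        for r in crm_rows
    ]

print(integrate(crm, billing))
```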