Data Mining Test # 1
Data transformation techniques
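Min-max normalization - linearly rescales values to a new range such as [0, 1]. Z-score normalization - subtracts the attribute mean from each value and divides by the attribute standard deviation. Decimal scaling normalization - moves the decimal point by dividing each value by a power of 10 chosen so that the largest absolute value falls below 1.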
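A minimal sketch of the three normalization techniques named above, in plain Python. The function names and the sample values are illustrative assumptions, not part of the original notes, and the z-score version assumes the population standard deviation (some texts use the sample standard deviation instead).

```python
import statistics


def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to the lower bound.
        return [new_min for _ in values]
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_score(values):
    """Center on the mean and scale by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumption: population std
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def decimal_scaling(values):
    """Divide by 10**j, with j the smallest integer so that max(|v|) < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


# Illustrative data (hypothetical incomes).
incomes = [200, 300, 400, 600, 1000]
scaled_01 = min_max(incomes)
standardized = z_score(incomes)
decimal_scaled = decimal_scaling(incomes)
```

Min-max keeps the original ordering and relative spacing but is sensitive to outliers, which is one reason z-score normalization is often preferred when the true minimum and maximum are unknown.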
How to use the binning method to handle noisy data?
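Sort the data and partition it into bins, then smooth the values in each bin (for example by bin means or bin boundaries). Equal-width binning - divides the value range into N intervals of equal size. Equal-depth binning - divides the sorted values into N intervals that each hold approximately the same number of values.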
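The two partitioning schemes above can be sketched as follows, together with two common smoothing rules. The helper names and the sample prices are assumptions made for illustration; this is one simple way to realize the idea, not a definitive implementation.

```python
def equal_width_bins(values, k):
    """Split the value range into k intervals of equal size."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    bins = [[] for _ in range(k)]
    for v in sorted(values):
        # The maximum value is placed in the last bin.
        idx = min(int((v - lo) / width), k - 1) if width > 0 else 0
        bins[idx].append(v)
    return bins


def equal_depth_bins(values, k):
    """Split the sorted values into k bins of (nearly) equal count."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[i * n // k:(i + 1) * n // k] for i in range(k)]


def smooth_by_means(bins):
    """Replace every value in a bin with the bin mean."""
    return [[sum(b) / len(b)] * len(b) for b in bins if b]


def smooth_by_boundaries(bins):
    """Replace every value with the closer of the bin's min and max."""
    return [
        [b[0] if v - b[0] <= b[-1] - v else b[-1] for v in b]
        for b in bins
        if b
    ]


# Illustrative data (hypothetical prices).
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
depth_bins = equal_depth_bins(prices, 3)
width_bins = equal_width_bins(prices, 3)
by_means = smooth_by_means(depth_bins)
by_boundaries = smooth_by_boundaries(depth_bins)
```

Equal-width bins can end up very unevenly filled when the data is skewed, whereas equal-depth bins adapt to the distribution at the cost of uneven interval sizes.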
The major steps of the KDD (knowledge discovery from databases) or data mining process
Input data - a collection of data objects and their attributes. Data preprocessing - data extraction, cleaning, and transformation, which together comprise the majority of the work of building a data warehouse. Data mining. Postprocessing. Information.
Differences between data mining and traditional statistical methods
Statistics is the traditional field that deals with the quantification, collection, analysis, and interpretation of data, and with drawing conclusions from it. Data mining is an interdisciplinary field that draws on computer science (databases, artificial intelligence, machine learning, graphical and visualization models), statistics, and engineering (pattern recognition, neural networks).
The major tasks of data preprocessing
Data cleaning - fill in missing values, identify outliers and smooth out noisy data, correct inconsistent data, and resolve redundancy caused by data integration. Missing data - ignore the tuple: usually done when the class label is missing (assuming the task is classification); not effective when the percentage of missing values per attribute varies considerably. Alternatively, fill in the missing value automatically with a global constant or the attribute mean. Noisy data - random error or variance in a measured variable; incorrect attribute values and other data problems also require data cleaning.
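The missing-data strategies mentioned above (ignoring the tuple, filling with a global constant, filling with the attribute mean) can be sketched in plain Python. The record layout, the key names, and the use of None as the missing-value marker are all assumptions made for this illustration.

```python
MISSING = None  # assumption: missing values are stored as None


def drop_unlabeled(rows, label_key):
    """Ignore the tuple: drop rows whose class label is missing."""
    return [r for r in rows if r.get(label_key) is not MISSING]


def fill_constant(rows, key, constant):
    """Fill missing values of one attribute with a global constant."""
    return [
        {**r, key: constant if r.get(key) is MISSING else r[key]}
        for r in rows
    ]


def fill_mean(rows, key):
    """Fill missing values of one attribute with the attribute mean."""
    known = [r[key] for r in rows if r.get(key) is not MISSING]
    mean = sum(known) / len(known)
    return [
        {**r, key: mean if r.get(key) is MISSING else r[key]}
        for r in rows
    ]


# Illustrative records (hypothetical).
rows = [
    {"income": 30000, "label": "yes"},
    {"income": None, "label": "no"},
    {"income": 50000, "label": None},
]
labeled = drop_unlabeled(rows, "label")
with_constant = fill_constant(rows, "income", -1)
with_mean = fill_mean(rows, "income")
```

Each helper returns new records rather than modifying the input, so the strategies can be compared side by side on the same data.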
Data Mining
The non-trivial extraction of implicit, previously unknown, and potentially useful information from data. Exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns that support future decisions.
Data mining tasks
Predictive Methods - Use some variables to predict unknown or future values of other variables. Descriptive Methods - Find human interpretable patterns/rules that describe the data.
Differences between data mining and database query processing
Query tools - tools that help analyze the data in a database. They provide query building, query editing, searching, finding, reporting, and summarizing functionality. Data mining - the extraction of previously unknown and interesting information from raw data, using statistical models to look for hidden patterns in the data. Data miners are interested in finding useful relationships between different data elements.
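To make the contrast concrete, here is a small sketch: a query answers a question the analyst already knows how to ask, while a mining step searches for relationships that were not specified in advance. The table name, basket data, and the pair-counting step are illustrative assumptions; counting co-occurring pairs is only a greatly simplified stand-in for real association analysis.

```python
import sqlite3
from collections import Counter
from itertools import combinations

# Illustrative market baskets (hypothetical data).
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
]

# Query processing: retrieve a known fact with an explicit question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (basket_id INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [(i, item) for i, basket in enumerate(baskets) for item in basket],
)
milk_baskets = conn.execute(
    "SELECT COUNT(DISTINCT basket_id) FROM purchases WHERE item = ?",
    ("milk",),
).fetchone()[0]
conn.close()

# Mining-style step: look for item pairs that occur together most often,
# without naming any particular pair in advance.
pair_counts = Counter(
    pair
    for basket in baskets
    for pair in combinations(sorted(basket), 2)
)
top_pair, top_count = pair_counts.most_common(1)[0]
```

The query returns exactly what was asked for (how many baskets contain milk), whereas the pair count surfaces a relationship between items that the analyst did not have to name beforehand.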
Data preprocessing
To transform the raw input data into an appropriate format for subsequent analysis.