Ch 1- Intro to Data Mining
Estimation
We approximate the value of a numeric target variable using a set of numeric and/or categorical predictor variables.
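As a minimal sketch of estimation, assume a single numeric predictor and a small made-up data set. The function names `fit_line` and `estimate`, and all the values, are illustrative and do not come from the chapter. A least-squares line is one simple way to approximate a numeric target:

```python
def fit_line(xs, ys):
    """Ordinary least squares for one numeric predictor: y ~ intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def estimate(model, x):
    """Approximate the numeric target for a new predictor value."""
    intercept, slope = model
    return intercept + slope * x


# Hypothetical data: years of experience (predictor), salary in $1000s (target).
years = [1, 2, 3, 4, 5]
salary = [40, 45, 50, 55, 60]

model = fit_line(years, salary)
print(estimate(model, 6))  # prints 65.0
```

Categorical predictors would first need to be encoded as numbers (for example, as 0/1 indicator variables) before a line like this could use them.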
Modeling Phase
Select and apply appropriate modeling techniques, and calibrate model settings to optimize results. Often, several different techniques may be applied to the same data mining problem. This phase may require looping back to the data preparation phase to bring the form of the data into line with the specific requirements of a particular data mining technique.
Prediction
Similar to classification and estimation, except that the results lie in the future.
Exploratory Data Analysis
A graphical method of exploring the data in search of patterns and trends.
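A rough illustration of the idea, assuming integer-valued data and a hypothetical helper named `text_histogram` (neither is from the chapter): even a plain-text histogram can reveal where values cluster.

```python
from collections import Counter


def text_histogram(values, bin_width):
    """Group integer values into fixed-width bins; return one bar of '#' per bin."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    for start in sorted(counts):
        end = start + bin_width - 1
        lines.append(f"{start}-{end}: {'#' * counts[start]}")
    return lines


# Hypothetical customer ages.
ages = [22, 25, 27, 31, 33, 34, 35, 38, 41, 58]

for line in text_histogram(ages, 10):
    print(line)
# 20-29: ###
# 30-39: #####
# 40-49: #
# 50-59: #
```

In practice a plotting library would draw the same picture; the point is only that the pattern (most customers in their thirties) is seen rather than computed.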
CRISP-DM
Cross-Industry Standard Process for Data Mining. It provides a non-proprietary and freely available standard process for fitting data mining into the general problem-solving strategy of a business or research unit. It has six phases: Business/Research Understanding Phase, Data Understanding Phase, Data Preparation Phase, Modeling Phase, Evaluation Phase, Deployment Phase.
Deployment Phase
Model creation does not signify the completion of the project; the created models need to be put to use. Example of a simple deployment: generate a report. Example of a more complex deployment: implement a parallel data mining process in another department. For businesses, the customer often carries out the deployment based on your model.
Description
Describing patterns and trends lying within the data. These models should be as transparent as possible and describe clear patterns that are amenable to intuitive interpretation and explanation.
Most Common Data Mining Tasks
Description, Estimation, Prediction, Classification, Clustering, Association
Business/Research Understanding Phase
First, clearly enunciate the project objectives and requirements in terms of the business or research unit as a whole. Then, translate these goals and restrictions into the formulation of a data mining problem definition. Finally, prepare a preliminary strategy for achieving these objectives.
Data Understanding Phase
First, collect the data. Then, use exploratory data analysis to familiarize yourself with the data, and discover initial insights. Evaluate the quality of the data. Finally, select interesting subsets that may contain actionable patterns.
Evaluation Phase
The modeling phase has delivered one or more models. These models must be evaluated for quality and effectiveness, before we deploy them for use in the field. Determine whether the model in fact achieves the objectives set for it in phase 1. Establish whether some important facet of the business or research problem has not been sufficiently accounted for. Finally, come to a decision regarding the use of the data mining results.
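One way this check might look, as a sketch only: assume the phase 1 objective was stated as a maximum acceptable average error, and that some records were held out from modeling. The threshold, the data, and the name `mean_absolute_error` are all hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and estimated target values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical held-out records: actual target values and the model's estimates.
actual = [52, 61, 47, 70]
predicted = [50, 63, 45, 74]

# Hypothetical objective set during the Business/Research Understanding Phase.
max_acceptable_error = 3.0

mae = mean_absolute_error(actual, predicted)
meets_objective = mae <= max_acceptable_error
print(mae, meets_objective)  # prints 2.5 True
```

A passing number does not settle the question by itself; the phase also asks whether some important facet of the problem was left out, which no single metric can answer.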
Data Mining
The process of discovering useful patterns and trends in large data sets. Also defined as the process of exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns and rules.
Data Mining Fallacies
There are automatic data mining tools; data mining is an autonomous process requiring little human input; data mining pays for itself quickly; data mining software packages are intuitive and easy to use; data mining will identify the causes of our business or research problems; data mining will automatically clean up our messy database; data mining always provides positive results.
Data Preparation Phase
This labor-intensive phase covers all the aspects of preparing the final data set, which will be used in subsequent phases, from the initial raw, dirty data. Select the cases and variables you want to analyze and that are appropriate for your analysis. Perform transformations on certain variables if needed. Clean the raw data so that it is ready for the modeling tools.
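The three steps above (select, transform, clean) can be sketched as follows. The records, field names, and the `prepare` function are made up for illustration. Dropping incomplete cases and min-max scaling are only one possible set of choices, not a method prescribed by the chapter.

```python
# Hypothetical raw records: values arrive as strings, some missing or invalid.
raw_records = [
    {"id": 1, "age": "34", "income": "52000", "notes": "n/a"},
    {"id": 2, "age": "", "income": "61000", "notes": ""},
    {"id": 3, "age": "45", "income": "48000", "notes": "call back"},
    {"id": 4, "age": "29", "income": "-1", "notes": ""},
]


def prepare(records):
    """Select variables, clean out unusable cases, and rescale income."""
    cleaned = []
    for rec in records:
        # Clean: skip cases with a missing age.
        if rec["age"] == "":
            continue
        age = int(rec["age"])
        income = float(rec["income"])
        # Clean: skip cases with an impossible (negative) income.
        if income < 0:
            continue
        # Select: keep only the variables needed for analysis.
        cleaned.append({"age": age, "income": income})

    # Transform: min-max scale income to the range [0, 1].
    lo = min(r["income"] for r in cleaned)
    hi = max(r["income"] for r in cleaned)
    for r in cleaned:
        r["income_scaled"] = (r["income"] - lo) / (hi - lo) if hi > lo else 0.0
    return cleaned


prepared = prepare(raw_records)
print(prepared)
```

Here records 2 and 4 are removed, and the two remaining incomes are scaled to 1.0 and 0.0.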