Ch 1 - Intro to Data Mining

Estimation

We approximate the value of a numeric target variable using a set of numeric and/or categorical predictor variables.
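As a minimal sketch of estimation with a single numeric predictor, the following fits a least-squares line and uses it to approximate the target for a new case. The data values and the names `fit_line` and `estimate` are hypothetical, chosen only for illustration; real estimation models usually combine many predictors.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for one predictor: y ~ intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def estimate(model, x):
    """Approximate the numeric target for a new predictor value."""
    intercept, slope = model
    return intercept + slope * x

# Hypothetical data: age (predictor) and systolic blood pressure (target).
ages = [30, 40, 50, 60]
pressures = [110, 120, 130, 140]

model = fit_line(ages, pressures)
estimated_pressure = estimate(model, 55)  # estimate for a 55-year-old
```

The target here is numeric, which is what distinguishes estimation from classification, where the target is categorical.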

Modeling Phase

Select and apply appropriate modeling techniques. Calibrate model settings to optimize results. Often, several different techniques may be applied for the same data mining problem. May require looping back to data preparation phase, in order to bring the form of the data into line with the specific requirements of a particular data mining technique.

Prediction

Similar to classification and estimation, except that the results lie in the future.

Exploratory Data Analysis

A graphical method of exploring the data in search of patterns and trends.

CRISP-DM

Cross-Industry Standard Process for Data Mining. It provides a non-proprietary and freely available standard process for fitting data mining into the general problem-solving strategy of a business or research unit. It has six phases: Business/Research Understanding Phase, Data Understanding Phase, Data Preparation Phase, Modeling Phase, Evaluation Phase, Deployment Phase.

Deployment Phase

Model creation does not signify the completion of the project. Need to make use of created models. Example of a simple deployment: Generate a report. Example of a more complex deployment: Implement a parallel data mining process in another department. For businesses, the customer often carries out the deployment based on your model.

Description

Describing patterns and trends lying within the data. These models should be as transparent as possible and describe clear patterns that are amenable to intuitive interpretation and explanation.

Most Common Data Mining Tasks

Description, Estimation, Prediction, Classification, Clustering, Association

Business/Research Understanding Phase

First, clearly enunciate the project objectives and requirements in terms of the business or research unit as a whole. Then, translate these goals and restrictions into the formulation of a data mining problem definition. Finally, prepare a preliminary strategy for achieving these objectives.

Data Understanding Phase

First, collect the data. Then, use exploratory data analysis to familiarize yourself with the data, and discover initial insights. Evaluate the quality of the data. Finally, select interesting subsets that may contain actionable patterns.

Evaluation Phase

The modeling phase has delivered one or more models. These models must be evaluated for quality and effectiveness, before we deploy them for use in the field. Determine whether the model in fact achieves the objectives set for it in phase 1. Establish whether some important facet of the business or research problem has not been sufficiently accounted for. Finally, come to a decision regarding the use of the data mining results.

Data Mining

The process of discovering useful patterns and trends in large data sets. The process of exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns and rules.

Data Mining Fallacies

Each of the following is a fallacy: there are automatic data mining tools that find the answers on their own; data mining is an autonomous process requiring little human input; data mining pays for itself quickly; data mining software packages are intuitive and easy to use; data mining will identify the causes of our business or research problems; data mining will automatically clean up our messy database; data mining always provides positive results.

Data Preparation Phase

This labor-intensive phase covers all the aspects of preparing the final data set, which shall be used for subsequent phases, from the initial, raw, dirty data. Select the cases and variables you want to analyze, and that are appropriate for your analysis. Perform transformations on certain variables if needed. Clean the raw data so that it is ready for the modeling tools.
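The three activities above (select, transform, clean) can be sketched in a few lines. This is only an illustration under stated assumptions: the records, the field names, and the `prepare` function are hypothetical, dropping incomplete cases is just one possible cleaning policy, and min-max scaling is just one possible transformation.

```python
# Hypothetical raw records; None marks a missing value.
raw = [
    {"id": 1, "age": 25, "income": 30000, "note": "a"},
    {"id": 2, "age": None, "income": 52000, "note": "b"},
    {"id": 3, "age": 45, "income": 70000, "note": "c"},
    {"id": 4, "age": 65, "income": 50000, "note": "d"},
]

def prepare(records, variables):
    # Select: keep only the variables chosen for analysis.
    selected = [{v: r[v] for v in variables} for r in records]
    # Clean: drop cases with a missing value in any selected variable.
    complete = [r for r in selected if all(r[v] is not None for v in variables)]
    # Transform: min-max scale each variable to the range [0, 1].
    for v in variables:
        lo = min(r[v] for r in complete)
        hi = max(r[v] for r in complete)
        span = hi - lo
        for r in complete:
            r[v] = (r[v] - lo) / span if span else 0.0
    return complete

prepared = prepare(raw, ["age", "income"])
```

After this step, the record with the missing age is gone and the remaining values are on a common scale, which many modeling techniques expect.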

