Data Mining: What is data mining?
elements of data mining
extract, store, provide, analyze, present.
data
facts, numbers or text that can be processed by a computer. the amount and variety are huge and growing.
continuous innovation: example
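a grocery chain used Oracle software to find local buying patterns. shoppers who bought diapers also tended to buy beer. the analysis compared the days they did their weekly shopping with the days they bought only a few items, and the chain concluded the beer was being bought for the coming week.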
MPP
massively parallel processors deliver order-of-magnitude improvements in query time.
types of data
operational (transactional) data: sales, costs, inventory, payroll and accounting. nonoperational data: industry sales, forecasts, macroeconomic data. metadata: data about the data itself, such as the logical database design or data dictionary definitions.
information
the patterns, associations, or relationships among data. these are what turn data into information.
internal factors
price, product positioning, staff skills.
nearest neighbor method
a level of data analysis. classifies each record in a dataset based on a combination of the classes of the k records most similar to it in a historical dataset. also called k-nearest neighbor.
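a minimal sketch of the nearest neighbor idea, assuming numeric features, Euclidean distance, and a simple majority vote as the "combination" of classes. the records, labels, and the function name `knn_classify` are made up for illustration.

```python
from collections import Counter
import math

def knn_classify(record, history, k=3):
    """Classify `record` by majority vote among its k most similar historical records."""
    # history is a list of (features, label) pairs; features are numeric tuples
    nearest = sorted(history, key=lambda item: math.dist(record, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# hypothetical historical dataset: (visits per week, average basket size) -> class
history = [
    ((1.0, 1.0), "low spender"),
    ((1.2, 0.8), "low spender"),
    ((0.9, 1.1), "low spender"),
    ((5.0, 5.2), "high spender"),
    ((5.1, 4.9), "high spender"),
    ((4.8, 5.0), "high spender"),
]

print(knn_classify((1.1, 0.9), history, k=3))
```

larger k smooths out noisy records but blurs the boundary between classes, so k is usually tuned against held-out data.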
rule induction
a level of data analysis. extraction of useful if-then rules from data based on statistics.
artificial neural networks
a level of data analysis. non-linear predictive models that learn through training and resemble biological neural networks in structure.
genetic algorithms
a level of data analysis. optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of natural evolution.
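a toy sketch of those three processes, assuming candidates are bit lists and fitness is just the count of 1s. the function name `evolve`, its parameters, and the fitness function are illustrative choices, not from the notes.

```python
import random

def evolve(fitness, length=12, pop_size=20, generations=40,
           mutation_rate=0.05, seed=0):
    """Evolve bit lists toward higher fitness; returns the best one found."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        # natural selection: the fitter half survives unchanged as parents
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # genetic combination: one-point crossover of two parents
            cut = rng.randrange(1, length)
            child = a[:cut] + b[cut:]
            # mutation: flip each bit with a small probability
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve(fitness=sum)
print(best, sum(best))
```

because the parents carry over unchanged, the best fitness in the population never gets worse from one generation to the next.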
decision trees
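a level of data analysis. tree-shaped structures that represent sets of decisions. the decisions generate rules for classifying a dataset. methods include CART (Classification and Regression Trees) and CHAID (Chi-Square Automatic Interaction Detection). both produce rules to apply to new data to predict outcomes.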
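a small sketch of applying a tree's rules to a new record. the tree here is written by hand with hypothetical fields (`income`, `years_customer`) and thresholds. it is not the output of CART or CHAID, which would learn the splits from data.

```python
# hypothetical hand-built tree: internal nodes test one field against a
# threshold, leaves hold the predicted class
tree = {
    "field": "income", "threshold": 40000,
    "below": {"leaf": "decline"},
    "above": {
        "field": "years_customer", "threshold": 2,
        "below": {"leaf": "review"},
        "above": {"leaf": "approve"},
    },
}

def predict(node, record):
    """Walk from the root to a leaf, following the branch each test selects."""
    while "leaf" not in node:
        branch = "below" if record[node["field"]] < node["threshold"] else "above"
        node = node[branch]
    return node["leaf"]

print(predict(tree, {"income": 50000, "years_customer": 3}))
```

each root-to-leaf path reads as one if-then rule, for example: if income is at least 40000 and years_customer is at least 2, then approve.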
data visualization
a level of data analysis. visual interpretation of complex relationships in multidimensional data. graphic tools.
walmart and datamining
a pioneer of data mining. used it to transform its supplier relationships. point-of-sale transactions are captured and sent to its data warehouse, which suppliers can access and analyze. suppliers find buying patterns and use them to improve results at the local store level.
data warehouses
a process of centralized data management and retrieval. central repository of organizational data.
what allows warehouses
advances in data capture, processing power, data transmission, and storage capabilities. these allow organizations to integrate their databases into data warehouses.
store
an element of data mining. manage the data in multidimensional systems.
analyze
an element of data mining. the data by application software.
levels of data analysis
artificial neural networks. genetic algorithms. decision trees. nearest neighbor method. rule induction. data visualization.
types of data relationships
classes. clusters. associations. sequential patterns.
who uses data mining
companies with a strong consumer focus. retail, financial, communication, marketing.
associations
data can be mined to identify associations. the beer-diaper correlation is an example.
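one common way to put a number on an association is with support and confidence. these two measures are not named in the notes, and the baskets below are invented for illustration.

```python
# hypothetical shopping baskets, one set of items per transaction
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Of the baskets containing `antecedent`, the fraction that also contain `consequent`."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

print(support({"diapers", "beer"}, baskets))       # 2 of 5 baskets
print(confidence({"diapers"}, {"beer"}, baskets))  # 2 of the 3 diaper baskets
```

high confidence alone can mislead when the consequent is popular anyway, so it is usually read alongside support.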
sequential patterns
data is mined to anticipate behavior patterns and trends.
clusters
data items are grouped according to logical relationships or customer preferences. can be mined to identify market segments.
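a rough sketch of grouping by similarity, using k-means on a single numeric field. k-means is one clustering method among many and is an assumption here, as are the spend figures and starting centroids.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Group values around the nearest centroid, then move each centroid to its group's mean."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            groups[nearest].append(v)
        # an empty group keeps its old centroid
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return centroids, groups

# hypothetical weekly spend per customer
spend = [12, 15, 14, 80, 85, 90]
centroids, groups = kmeans_1d(spend, centroids=[0, 100])
print(groups)
```

here the two groups could be read as two market segments: occasional shoppers and heavy shoppers.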
data mining
also called data or knowledge discovery. the process of analyzing data from different perspectives and summarizing it into useful information.
external factors
economic indicators. competition. customer demographics.
useful information
information that can increase revenue, cut costs or both.
how does mining work
it analyzes relationships and patterns in stored transaction data based on open-ended user queries. there are four types of relationships.
what data mining does
allows companies to determine relationships of internal factors to external factors. determine the impact on sales, satisfaction and profits. drill down into summary info to view transactional data.
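a small sketch of drilling down from summary information to the transactions behind it. the records, field names, and the helpers `summarize` and `drill_down` are hypothetical.

```python
from collections import defaultdict

# hypothetical transaction records
transactions = [
    {"store": "north", "item": "beer", "amount": 12.0},
    {"store": "north", "item": "diapers", "amount": 20.0},
    {"store": "south", "item": "milk", "amount": 3.0},
]

def summarize(transactions):
    """Summary view: total sales per store."""
    totals = defaultdict(float)
    for t in transactions:
        totals[t["store"]] += t["amount"]
    return dict(totals)

def drill_down(transactions, store):
    """Detail view: the individual transactions behind one store's total."""
    return [t for t in transactions if t["store"] == store]

print(summarize(transactions))
print(drill_down(transactions, "north"))
```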
provide
an element of data mining. data access to business analysts and information technology professionals
present
an element of data mining. the data in a useful format such as a graph or table.
extract
an element of data mining. transform and load transaction data onto the warehouse system.
data mining software
an analytical tool for analyzing data. lets users analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified.
knowledge
what information is converted into: an understanding of historical patterns and future trends. often takes the form of summary information. used to make decisions.
classes
stored data is used to locate data in predetermined groups.
data mining and fundamentals
technically it is the process of finding correlations or patterns among dozens of fields in a large relational database.
continuous innovation
technology of mining is not new. computer processing power, disk storage and statistical software are increasing the accuracy of data analysis and lowering costs.
query complexity
the more complex the queries and the greater the number of queries being processed, the more powerful the system needs to be.
size of the database
the more data being processed and maintained, the more powerful the system needs to be.
centralization of data
used to maximize user access and analysis. users can access the data freely.