Lecture 1
Spatial Data Examples (7 Other Kinds of Data)
- Maps.
Examples of Scientific and Engineering Science Data Sources (4 Major Sources of Abundant Data)
- Remote sensing. - Process measuring. - Scientific experiments. - System performance. - Engineering observations. - Environment surveillance.
2 Data Mining Applications
1) Business Intelligence 2) Web Search Engines
semantic, entity-relationship
A ________ data model such as an ______-______________ (ER) data model is often constructed for relational databases.
Data Warehouse
A repository of information collected from multiple sources, stored under a unified schema, and usually residing at a single site.
Data Mining (Def. 1)
Automated analysis of massive data sets.
Text Mining (3 Kinds of Data Mining)
Customer sentiment analysis, aspect-based mining, identifying trends.
meaningful
Data mining can be applied to any kind of data as long as the data are __________ for a target application.
strides
Data mining has and will continue to make great _______ in our journey from the data age toward the coming information age.
multidimensional, cube, attribute, schema
A data warehouse is usually modeled by a ________________ data structure, called a data ____, in which each dimension corresponds to an _________ or a set of attributes in the ______, and each cell stores the value of some aggregate measure such as count or sum of sales amount.
cleaning, integration, transformation, loading, refreshing
Data warehouses are constructed via a process of data ________, data ___________, data ______________, data _______, and periodic data __________.
interrelated
A database management system (DBMS) consists of a collection of ____________ data.
Web Mining (3 Kinds of Data Mining)
It can help us learn about the distribution of information on the WWW in general, characterize and classify web pages, and uncover web dynamics and the association and other relationships among different web pages, users, communities, and web-based activities.
Transactional Database
It captures a transaction, such as a customer's purchase, a flight booking, or a user's clicks on a web page.
Multimedia Data Mining (3 Kinds of Data Mining)
It includes image data and video data mining.
Business Intelligence (2 Data Mining Applications)
It involves conducting effective market analysis, comparing customer feedback on similar products, discovering the strengths and weaknesses of competitors, retaining highly valuable customers, and making smart business decisions.
Multidimensional Data Mining (Exploratory Multidimensional Data Mining)
It performs data mining in multidimensional space in an OLAP style.
Data Cube
It provides a multidimensional view of data and allows the precomputation and fast access of summarized data.
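The precomputation idea can be illustrated with a minimal sketch, assuming a made-up fact table with dimensions item, city, and quarter and a sum measure. The table, the names `SALES`, `DIMENSIONS`, and `build_cube`, and all numbers are hypothetical, and real systems do not materialize cubes this naively.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact table: (item, city, quarter, sales_amount)
SALES = [
    ("tv", "Vancouver", "Q1", 400),
    ("tv", "Vancouver", "Q2", 300),
    ("tv", "Toronto", "Q1", 250),
    ("phone", "Vancouver", "Q1", 150),
    ("phone", "Toronto", "Q2", 100),
]
DIMENSIONS = ("item", "city", "quarter")


def build_cube(rows, dimensions):
    """Precompute sum(sales_amount) for every subset of the dimensions."""
    cube = {}
    for k in range(len(dimensions) + 1):
        for dims in combinations(range(len(dimensions)), k):
            cuboid = defaultdict(int)
            for row in rows:
                key = tuple(row[i] for i in dims)
                cuboid[key] += row[-1]  # last column is the measure
            cube[tuple(dimensions[i] for i in dims)] = dict(cuboid)
    return cube


cube = build_cube(SALES, DIMENSIONS)
grand_total = cube[()][()]                                   # all dimensions rolled up
tv_total = cube[("item",)][("tv",)]                          # one dimension kept
tv_vancouver = cube[("item", "city")][("tv", "Vancouver")]   # two dimensions kept
print(grand_total, tv_total, tv_vancouver)
```

Once the cube is built, each summarized value is a dictionary lookup instead of a scan over the fact table, which is the "fast access" part of the definition.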
Machine Learning (Difference Between Data Mining and Machine Learning)
The purpose of (Data Mining / Machine Learning) is to construct a complete autonomous learning system.
Web Data Examples (7 Other Kinds of Data)
- A huge, widely distributed information repository made available by the Internet.
Interactive Mining (User Interaction)
- Build flexible user interfaces and an exploratory mining environment, facilitating the user's interaction with the system. - Allow users to dynamically change the focus of a search, to refine mining requests based on returned results.
Incorporation of background knowledge (User Interaction)
- Constraints, rules, and other information regarding the domain under study should be incorporated into the knowledge discovery process. - Such knowledge can be used for pattern evaluation as well as to guide the search toward interesting patterns.
Handling Uncertainty, Noise, or Incompleteness of Data (Mining Methodology)
- Data often contain noise, errors, exceptions, or uncertainty, or are incomplete. - These problems can confuse the data mining process, leading to the derivation of erroneous patterns. - Example remedies: data cleaning, data preprocessing, outlier detection and removal, and uncertainty reasoning.
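Two of the remedies named above, handling incomplete values and outlier removal, can be sketched as follows. This is only one possible approach: it assumes missing readings are marked with `None` and flags outliers by their distance from the median, measured in multiples of the median absolute deviation. The data, the function names, and the cutoff `k=5.0` are illustrative choices, not a recommended setting.

```python
from statistics import median

# Hypothetical sensor readings; None marks a missing value.
READINGS = [10.1, 9.8, None, 10.3, 55.0, 9.9, 10.0]


def drop_missing(values):
    """Handle incompleteness in the simplest way: discard missing entries."""
    return [v for v in values if v is not None]


def split_outliers(values, k=5.0):
    """Separate values lying more than k median absolute deviations from the median."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        # No spread to measure against, so nothing is flagged.
        return list(values), []
    kept = [v for v in values if abs(v - center) <= k * mad]
    outliers = [v for v in values if abs(v - center) > k * mad]
    return kept, outliers


complete = drop_missing(READINGS)
kept, outliers = split_outliers(complete)
print(kept, outliers)
```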
Mining Dynamic, Networked, and Global Data Repositories (Diversity of Database Types)
- Discovering knowledge from different sources of structured, semi-structured, or unstructured yet interconnected data with diverse data semantics poses great challenges to data mining.
Privacy-Preserving Data Mining (Data Mining and Society)
- Data mining helps scientific discovery, business management, economic recovery, and security protection (e.g., the real-time discovery of intruders and cyberattacks). - However, it poses the risk of disclosing an individual's personal information.
Time-Related or Sequence Data Examples (7 Other Kinds of Data)
- Historical records. - Stock exchange data. - Time-series and biological sequence data.
Presentation and Visualization of Data Mining Results (User Interaction)
- How can a data mining system present its results so that they are easily understood and directly usable by humans? - This is crucial if the data mining process is interactive. - It requires the system to adopt expressive knowledge representations, user-friendly interfaces, and visualization techniques.
Social Impacts of Data Mining (Data Mining and Society)
- How can we use data mining technology to benefit society? - How can we guard against its misuse?
Invisible Data Mining (Data Mining and Society)
- Systems perform invisible data mining by incorporating data mining into their components to improve their functionality and performance. - For example, when purchasing items online, users may be unaware that the store is likely collecting data on the buying patterns of its customers, which may be used to recommend other items for purchase in the future.
Examples of Medical and Health Industry Data Sources (4 Major Sources of Abundant Data)
- Medical records. - Patient monitoring and medical imaging.
Parallel, Distributed, and Incremental Mining Algorithms (Efficiency and Scalability)
- Factors motivating the development of parallel and distributed data-intensive mining algorithms: the huge size of many data sets, the wide distribution of data, and computational complexity. - Cloud and cluster computing.
Handling Complex Types of Data (Diversity of Database Types)
- New data types, from structured data such as relational and data warehouse data to semi-structured and unstructured data. - Temporal data, biological sequences, sensor data, spatial data, hypertext data, multimedia data, software program code, Web data, and social network data.
Examples of Communities and Social Media Data Sources (4 Major Sources of Abundant Data)
- News. - YouTube. - Digital pictures and videos. - Blogs. - Web communities. - Various kinds of social networks.
Pattern Evaluation and Pattern or Constraint-Guided Mining (Mining Methodology)
- Not all the patterns generated by data mining processes are interesting. - Techniques are needed to assess the interestingness of discovered patterns based on subjective measures.
Efficiency and Scalability of Data Mining Algorithms (Efficiency and Scalability)
- The running time of a data mining algorithm must be predictable, short, and acceptable to applications. - Efficiency, scalability, performance, optimization, and the ability to execute in real time are key criteria that drive the development of many new data mining algorithms.
Examples of Business Data Sources (4 Major Sources of Abundant Data)
- Sales transactions. - Stock trading records. - Product descriptions. - Sales promotions. - Company profiles and performance. - Customer feedback.
Graph and Networked Data Examples (7 Other Kinds of Data)
- Social and information networks.
Technologies Used in Data Mining (Figure)
- Statistics - Machine learning - Pattern recognition - Visualization - Algorithms - High-performance computing - Applications - Information retrieval - Data warehouse - Database systems
Hypertext and Multimedia Data Examples (7 Other Kinds of Data)
- Text. - Image. - Video. - Audio data.
Engineering Design Data Examples (7 Other Kinds of Data)
- The design of buildings. - System components. - Integrated circuits.
Data Streams Examples (7 Other Kinds of Data)
- Video surveillance and sensor data, which are continuously transmitted.
Boosting the Power of Discovery in a Networked Environment (Mining Methodology)
- Data objects reside in a linked or interconnected environment, whether it be the Web, database relations, files, or documents. - Semantic links across multiple data objects can be used to advantage in data mining.
4 Major Sources of Abundant Data
1) Business. 2) Scientific and Engineering Science. 3) Medical and Health Industry. 4) Communities and Social Media.
Predictive Data Mining Functionalities (2 Types of Data Mining Functionalities)
1) Classification 2) Regression
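The difference between the two can be sketched briefly: classification predicts a categorical label, while regression predicts a numeric value. The example below assumes a one-nearest-neighbor classifier and a least-squares line as stand-ins for each task. Both are just convenient illustrations, and the training data and function names are made up.

```python
# Hypothetical labeled examples: (feature value, class label)
LABELED = [(1.0, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high")]

# Hypothetical numeric examples: (x, y)
NUMERIC = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]


def classify(x, examples):
    """Classification: return the label of the nearest training example."""
    _, label = min(examples, key=lambda pair: abs(pair[0] - x))
    return label


def fit_line(points):
    """Regression: fit y = slope * x + intercept by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


label_a = classify(3.0, LABELED)      # a category
label_b = classify(7.5, LABELED)
slope, intercept = fit_line(NUMERIC)
prediction = slope * 5.0 + intercept  # a number
print(label_a, label_b, prediction)
```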
7 Steps in KDD (Knowledge Discovery in Databases)
1) Data Cleaning. 2) Data Integration. 3) Data Selection. 4) Data Transformation. 5) Data Mining. 6) Pattern Evaluation. 7) Knowledge Presentation.
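The seven steps can be chained as a toy pipeline. This is a sketch under heavy simplifying assumptions: the two "stores" and their records are invented, missing values are marked with `None`, and the mining step is just a ranking by total quantity rather than a real mining algorithm.

```python
from collections import Counter

# Hypothetical raw sources; None marks a missing value.
STORE_A = [
    {"customer": "ann", "item": "milk", "qty": 2},
    {"customer": "bob", "item": "bread", "qty": None},
]
STORE_B = [
    {"customer": "ann", "item": "bread", "qty": 1},
    {"customer": "cy", "item": "milk", "qty": 3},
    {"customer": "cy", "item": "milk", "qty": 1},
]


def clean(rows):
    """1) Data cleaning: drop records with missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]


def integrate(*sources):
    """2) Data integration: combine several sources into one list."""
    return [r for source in sources for r in source]


def select(rows, fields):
    """3) Data selection: keep only the fields relevant to the task."""
    return [{f: r[f] for f in fields} for r in rows]


def transform(rows):
    """4) Data transformation: aggregate quantity per item."""
    totals = Counter()
    for r in rows:
        totals[r["item"]] += r["qty"]
    return totals


def mine(totals):
    """5) Data mining (toy stand-in): rank items by total quantity."""
    return totals.most_common()


def evaluate(patterns, min_total):
    """6) Pattern evaluation: keep patterns above an interestingness threshold."""
    return [(item, total) for item, total in patterns if total >= min_total]


def present(patterns):
    """7) Knowledge presentation: render patterns as readable text."""
    return [f"{item}: {total} units sold" for item, total in patterns]


rows = integrate(clean(STORE_A), clean(STORE_B))
rows = select(rows, ["item", "qty"])
report = present(evaluate(mine(transform(rows)), min_total=2))
print(report)
```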
2 Types of Data Mining Functionalities
1) Descriptive Data Mining Functionalities 2) Predictive Data Mining Functionalities
Descriptive Data Mining Functionalities (2 Types of Data Mining Functionalities)
1) Frequent Pattern Mining 2) Correlation 3) Clustering 4) Outlier Detection
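As one illustration of a descriptive task, frequent pattern mining can be sketched by brute-force support counting over market-basket transactions. The transactions, the function name, and the thresholds are assumptions for illustration. Practical algorithms prune the search instead of enumerating every combination.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]


def frequent_itemsets(transactions, min_count, max_size=2):
    """Count every itemset up to max_size and keep those seen at least min_count times."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(transaction)  # sorted so each itemset has one canonical form
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {itemset: c for itemset, c in counts.items() if c >= min_count}


frequent = frequent_itemsets(TRANSACTIONS, min_count=3)
for itemset, count in sorted(frequent.items()):
    print(itemset, count)
```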
5 Major Issues in Data Mining
1) Mining Methodology 2) User Interaction 3) Efficiency and Scalability 4) Diversity of Database Types 5) Data Mining and Society
3 Kinds of Data Mining
1) Text Mining. 2) Multimedia Data Mining. 3) Web Mining.
10 Data Sources (Types)
1- Databases. 2- Data warehouses. 3- Transactional data. 4- Data streams. 5- Ordered/sequence data or graphs. 6- Networked data. 7- Spatial data. 8- Text data. 9- Multimedia data. 10- WWW.
3 Points of Search Engines
1- They pose grand challenges to data mining. 2- They often have to deal with online data. 3- They often have to deal with queries that are asked only a very small number of times.
7 Other Kinds of Data
1- Time-related or sequence data. 2- Data streams. 3- Spatial data. 4- Engineering design data. 5- Hypertext and multimedia data. 6- Graph and networked data. 7- Web.
Relational Database
A collection of tables, each of which is assigned a unique name.
Database Management System (DBMS)
A set of software programs to manage and access the data.
related
A transactional database may have additional tables which contain other information _______ to the transactions, such as item description, information about the salesperson or the branch, and so on.
Data Mining (7 Steps in KDD)
An essential process where intelligent methods are applied to extract data patterns.
reduction
Data ________ may also be performed to obtain a smaller representation of the original data without sacrificing its integrity.
database, web, computerized
Data are available through ________ systems, the ___, and a ____________ society.
automated
Data collection can be done by _________ data collection tools.
Mining Various and New Kinds of Knowledge (Mining Methodology)
Diversity of applications and new mining tasks continue to emerge, making data mining a dynamic and fast-growing field.
attributes, tuples
Each table consists of a set of __________ (columns or fields) and usually stores a large set of ______ (records or rows).
object, unique key
Each tuple in a relational table represents an ______ identified by a ______ ___ and described by a set of attribute values.
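A small sketch of these terms using Python's built-in `sqlite3` module. The `customer` table and its contents are hypothetical. It shows attributes as columns, tuples as rows, and a unique key identifying each tuple.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A table with a unique name, three attributes, and cust_id as the unique key.
conn.execute(
    "CREATE TABLE customer ("
    "cust_id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)"
)

# Each inserted row is one tuple describing one object (a customer).
conn.executemany(
    "INSERT INTO customer (cust_id, name, city) VALUES (?, ?, ?)",
    [(1, "Ann", "Vancouver"), (2, "Bob", "Toronto")],
)

# Attribute (column) names as stored in the schema.
columns = [info[1] for info in conn.execute("PRAGMA table_info(customer)")]

# The key identifies exactly one tuple.
row = conn.execute("SELECT * FROM customer WHERE cust_id = ?", (2,)).fetchone()

# Reusing an existing key value is rejected by the database.
try:
    conn.execute("INSERT INTO customer VALUES (1, 'Eve', 'Ottawa')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(columns, row, duplicate_rejected)
```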
Data Mining (Def. 2)
Extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) patterns or knowledge from huge amounts of data.
Ad Hoc Data mining and Data Mining Query Languages (User Interaction)
High-level data mining query languages or other high-level flexible user interfaces will give users the freedom to define ad hoc data mining tasks.
Video Data Mining (Multimedia Data Mining)
In video of a hockey game, for example, we can detect video sequences corresponding to goals.
Image Data Mining (Multimedia Data Mining)
Identifying objects and classifying them by assigning semantic labels or tags.
Example 1 of Data Mining - An Interdisciplinary Effort
Mining data with natural language text. - Fuse data mining methods with methods of information retrieval and natural language processing.
Example 2 of Data Mining - An Interdisciplinary Effort
Mining of software bugs in large programs. - This form of mining, known as bug mining, benefits from the incorporation of software engineering knowledge into the data mining process.
patterns, knowledge
Multidimensional data mining allows the exploration of multiple combinations of dimensions at varying levels of granularity in data mining, and thus has greater potential for discovering interesting ________ representing _________.
Mining Knowledge in Multidimensional Space (Mining Methodology)
- When searching for knowledge in large data sets, we can explore the data in multidimensional space. - This means searching for interesting patterns among combinations of dimensions (attributes) at varying levels of abstraction. - In many cases data can be aggregated or viewed as a multidimensional data cube. - Mining knowledge in cube space can substantially enhance the power and flexibility of data mining.
Mining Relational Databases
Searching for trends or data patterns.
transformation, consolidation, selection, warehousing
Sometimes data ______________ and _____________ are performed before the data _________ process, particularly in the case of data ___________.
Web Search Engines (2 Data Mining Applications)
Specialized computer servers that search for information on the Web.
information
The necessity of ___________ has led to the birth of data mining.
Data Mining - An Interdisciplinary Effort (Mining Methodology)
The power of data mining can be substantially enhanced by integrating new methods from multiple disciplines.
Data Mining (Difference Between Data Mining and Machine Learning)
The purpose of (Data Mining / Machine Learning) is to turn raw data into useful information.
mining
The term data ______ is often used to refer to the entire knowledge discovery process.
terabytes, petabytes
There is an explosive growth of data from _________ to _________.
Pattern Evaluation (7 Steps in KDD)
To identify the truly interesting patterns representing knowledge based on interestingness measures.
Data Cleaning (7 Steps in KDD)
To remove noise and inconsistent data.
data, information
We are actually living in the ____ age, moving towards the ____________ age.
Data Selection (7 Steps in KDD)
Where data relevant to the analysis task are retrieved from the database.
Data Transformation (7 Steps in KDD)
Where data are transformed and consolidated into forms appropriate for mining by performing summary or aggregation operations.
Data Integration (7 Steps in KDD)
Where multiple data sources may be combined.
Knowledge Presentation (7 Steps in KDD)
Where visualization and knowledge representation techniques are used to present mined knowledge to users.
Efficiency and Scalability
a) Efficiency and scalability of data mining algorithms. b) Parallel, distributed, and incremental mining algorithms.
Diversity of Database Types
a) Handling complex types of data. b) Mining dynamic, networked, and global data repositories.
User Interaction
a) Interactive mining. b) Incorporation of background knowledge. c) Ad hoc data mining and data mining query languages. d) Presentation and visualization of data mining results.
Mining Methodology
a) Mining various and new kinds of knowledge. b) Mining knowledge in multidimensional space. c) Data mining - an interdisciplinary effort. d) Boosting the power of discovery in a networked environment. e) Handling uncertainty, noise, or incompleteness of data. f) Pattern evaluation and pattern or constraint-guided mining.
Data Mining and Society
a) Social impacts of data mining. b) Privacy-preserving data mining. c) Invisible data mining.