Chapter 6
Attributes
(also called columns or fields) are the data elements associated with an entity.
entity
(also referred to as a table) stores information about a person, place, thing, transaction, or event.
INFORMATION CLEANSING OR SCRUBBING
A process that weeds out and fixes or discards inconsistent, incorrect, or incomplete information
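A minimal sketch of what cleansing can look like in practice, assuming customer records held as dictionaries with hypothetical `name` and `email` fields. The validity rules here (a non-empty name, an email containing `@`) are illustrative assumptions, not rules from the chapter.

```python
def cleanse(records):
    """Fix what can be fixed, discard what cannot, and drop duplicates."""
    seen_emails = set()
    clean = []
    for rec in records:
        # Fix inconsistent formatting: stray whitespace and mixed case.
        name = (rec.get("name") or "").strip().title()
        email = (rec.get("email") or "").strip().lower()
        # Discard incomplete or incorrect records.
        if not name or "@" not in email:
            continue
        # Discard duplicates of a record already kept.
        if email in seen_emails:
            continue
        seen_emails.add(email)
        clean.append({"name": name, "email": email})
    return clean

# Hypothetical raw data with one fixable record, one duplicate, and two bad records.
raw = [
    {"name": " ada lovelace ", "email": "ADA@Example.com"},
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "", "email": "x@example.com"},
    {"name": "Grace Hopper", "email": "not-an-email"},
]
cleaned = cleanse(raw)
```

The first record is fixed, the second is dropped as a duplicate, and the last two are discarded as incomplete or incorrect.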
Prediction
A statement about what will happen or might happen in the future, for example, predicting future sales or employee turnover.
Regression
A statistical process for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables.
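As a rough illustration of the simplest case (one dependent variable, one independent variable), ordinary least squares fits a straight line through the data. The function name `fit_line` and the sample figures are assumptions made for this sketch.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: advertising spend (independent) and sales (dependent).
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(ad_spend, sales)
```

With this made-up data the points lie exactly on a line, so the fit recovers a slope of 2 and an intercept of 1.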
Optimization
A statistical process that finds the way to make a design, system, or decision as effective as possible, for example, finding the values of controllable variables that determine maximal productivity or minimal waste
Cluster analysis
A technique used to divide information sets into mutually exclusive groups such that the members of each group are as close as possible to one another and the different groups are as far apart as possible. Cluster analysis segments customer information to help organizations identify customers with similar behavioral traits, such as clusters of best customers or onetime customers. Cluster analysis also can uncover naturally occurring patterns in information. A great example of using cluster analysis in business is to create target-marketing strategies based on zip codes. Evaluating customer segments by zip code allows a business to assign a level of importance to each segment. Zip codes offer valuable insight into such things as income levels, demographics, lifestyles, and spending habits. With target marketing, a business can decrease its costs while increasing the success rate of the marketing campaign.
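One common way to form such groups is k-means, which the chapter does not name; the sketch below is a deliberately tiny one-dimensional version with caller-supplied starting centers. The function `k_means_1d` and the spending figures are hypothetical.

```python
def k_means_1d(values, centers, iterations=10):
    """Tiny k-means for one-dimensional data (assumes iterations >= 1)."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its members.
        centers = [
            sum(members) / len(members) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical annual spend per customer: onetime buyers versus best customers.
annual_spend = [20, 25, 30, 400, 420, 450]
centers, clusters = k_means_1d(annual_spend, centers=[0, 500])
```

The two resulting groups are mutually exclusive: the low spenders end up in one cluster and the high spenders in the other.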
Market basket analysis
Analyzes such items as websites and checkout scanner information to detect customers' buying behavior and predict future behavior by identifying affinities among customers' choices of products and services. Market basket analysis is frequently used to develop marketing campaigns for cross-selling products and services (especially in banking, insurance, and finance) and for inventory control, shelf product placement, and other retail and marketing applications.
Social media analytics
Analyzes text flowing across the Internet, including unstructured text from blogs and messages.
Web analytics
Analyzes unstructured data associated with websites to identify consumer behavior and website navigation
Text analytics
Analyzes unstructured data to find trends and patterns in words and sentences. Text mining a firm's customer support email might identify which customer service representative is best able to handle the question, allowing the system to forward it to the right person.
CLASSIFICATION
Assigns records to one of a predefined set of classes
DATA MART
Contains a subset of data warehouse information
Easy to store large amounts of data
Data-driven websites can keep large volumes of information organized. Website owners can use templates to implement changes for layouts, navigation, or website structure. This improves website reliability, scalability, and performance.
Easy to eliminate human errors
Data-driven websites trap data-entry errors, eliminating inconsistencies while ensuring that all information is entered correctly
ESTIMATION
Determines values for an unknown continuous variable, such as a behavior or an estimated future value
AFFINITY GROUPING
Determines which things go together
Inconsistent Data Definitions
Every department had its own method for recording data so when trying to share information, data did not match and users did not get the data they really needed.
Lack of Data Standards
Managers needed to perform cross-functional analysis using data from all departments, which differed in granularities, formats, and levels
Ineffective Direct Data Access
Most data stored in operational databases did not allow users direct access; users had to wait to have their queries or questions answered by MIS professionals who could code SQL
Association detection
Reveals the relationship between variables along with the nature and frequency of the relationships. Many people refer to association detection algorithms as association rule generators because they create rules to determine the likelihood of events occurring together at a particular time or following each other in a logical progression. Percentages usually reflect the patterns of these events. For example, "55 percent of the time, events A and B occurred together," or "80 percent of the time that items A and B occurred together, they were followed by item C within three days."
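The percentages in rules like these can be computed directly from transaction data. The sketch below uses the measures usually called support and confidence (terms not used in the chapter), ignores timing such as "within three days," and relies on made-up baskets.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction that also contain `consequent`."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)

# Hypothetical baskets of items A, B, and C.
baskets = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "C"},
]
together = support(baskets, {"A", "B"})             # how often A and B occur together
then_c = confidence(baskets, {"A", "B"}, {"C"})     # how often C accompanies A and B
```

Here A and B occur together in three of the five baskets, and C accompanies them in two of those three.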
CLUSTERING
Segments a heterogeneous population of records into a number of more homogeneous subgroups
Poor Data Quality
The data, if available, were often incorrect or incomplete. Therefore, users could not rely on the data to make decisions
DATA MINING
The process of analyzing data to extract information not offered by the raw data alone
Speech analytics
The process of analyzing recorded calls to gather information; brings structure to customer interactions and exposes information buried in customer contact center interactions with an enterprise. Speech analytics is heavily used in the customer service department to help improve processes by identifying angry customers and routing them to the appropriate customer service representative.
Forecasting
Time-series information is time-stamped information collected at a particular frequency. Formally defined, forecasts are predictions based on time-series information. Examples of time-series information include web visits per hour, sales per month, and calls per day. Forecasting data-mining tools allow users to manipulate the time series for forecasting activities
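A very simple forecast over time-series information is a moving average, shown here only as one possible technique; the chapter does not prescribe a method. The function name and the monthly figures are assumptions.

```python
def moving_average_forecast(series, window):
    """Forecast the next value as the mean of the last `window` observations."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    recent = series[-window:]
    return sum(recent) / window

# Hypothetical time series: sales per month, oldest first.
monthly_sales = [100, 110, 120, 130, 140, 150]
next_month = moving_average_forecast(monthly_sales, window=3)
```

With a three-month window the forecast is the mean of 130, 140, and 150.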
Inadequate Data Usefulness
Users could not get the data they needed; what was collected was not always useful for intended purposes
Easy to manage content
Website owners can make changes without relying on MIS professionals; users can update a data-driven website with little or no training
data artist
a business analytics specialist who uses visual tools to help people understand complex data
Big data
a collection of large, complex data sets, including structured and unstructured data, which cannot be analyzed using traditional database methods and tools
record
a collection of related data elements
Structured data
has a defined length, type, and format and includes numbers, dates, or strings such as Customer Address
primary key
a field (or group of fields) that uniquely identifies a given record in a table
data warehouse
a logical collection of information, gathered from many operational databases, that supports business analysis activities and decision-making tasks.
Information integrity
a measure of the quality of information
foreign key
a primary key of one table that appears as an attribute in another table and acts to provide a logical relationship between the two tables
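The relationship between a primary key and a foreign key can be sketched with Python's built-in `sqlite3` module. The `customer` and `orders` tables and their columns are hypothetical; note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

# customer_id is the primary key of customer...
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
# ...and appears in orders as a foreign key linking the two tables.
conn.execute("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        total REAL
    )
""")
conn.execute("INSERT INTO customer VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders VALUES (100, 1, 25.0)")

def try_insert_orphan_order():
    """Attempt to add an order for a customer that does not exist."""
    try:
        conn.execute("INSERT INTO orders VALUES (101, 999, 10.0)")
        return True
    except sqlite3.IntegrityError:
        return False

orphan_accepted = try_insert_orphan_order()
rows = conn.execute(
    "SELECT o.order_id, c.name FROM orders o "
    "JOIN customer c ON c.customer_id = o.customer_id"
).fetchall()
```

The join follows the foreign key to look up each order's customer, and the order that points at a nonexistent customer is rejected.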
Extraction, transformation, and loading (ETL)
a process that extracts information from internal and external databases, transforms it using a common set of enterprise definitions, and loads it into a data warehouse
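The three ETL steps can be sketched end to end with in-memory data. Everything named here is an assumption for illustration: the two source layouts, the shared definition (customer name plus amount in US dollars), and the `sales_fact` table standing in for a data warehouse.

```python
import sqlite3

# Extract: two hypothetical sources that record the same kind of fact differently.
sales_system = [{"cust": "ada", "amount_usd": "25.00"}]
web_system = [{"customer_name": "Grace", "amount_cents": 1999}]

def transform(record):
    """Map either source layout onto one shared definition: (customer, amount_usd)."""
    if "amount_cents" in record:
        return record["customer_name"].title(), record["amount_cents"] / 100
    return record["cust"].title(), float(record["amount_usd"])

def load(rows, conn):
    """Load the transformed rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales_fact (customer TEXT, amount_usd REAL)")
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?)", rows)

warehouse = sqlite3.connect(":memory:")
load([transform(r) for r in sales_system + web_system], warehouse)
loaded = warehouse.execute(
    "SELECT customer, amount_usd FROM sales_fact ORDER BY customer"
).fetchall()
```

After loading, both records share the same column names, capitalization, and currency unit regardless of which system they came from.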
relational database management system
allows users to create, read, update, and delete data in a relational database
dynamic catalog
an area of a website that stores information about products in a database
data-driven website
an interactive website kept constantly updated and relevant to the needs of its customers using a database
structured query language (SQL)
asks users to write lines of code to answer questions against a database
data dictionary
compiles all of the metadata about the data elements in the data model
Machine-generated data
created by a machine without human intervention. Machine-generated structured data includes sensor data, point-of-sale data, and web log (blog) data
database management system (DBMS)
creates, reads, updates, and deletes data in a database while controlling access and security.
Dynamic information
data that change based on user actions.
Human-generated data
data that humans, in interaction with computers, generate. Human-generated structured data includes input data, click-stream data, or gaming data.
physical view of information
deals with the physical storage of information on a storage device
business rule
defines how a company performs certain aspects of its business and typically results in either a yes/no or true/false answer
Data visualization
describes technologies that allow users to see or visualize data to transform information into a business perspective
Metadata
details about data
data quality audits
determine the accuracy and completeness of an organization's data
Business-critical integrity constraints
enforce business rules vital to an organization's success and often require more insight and knowledge than relational integrity constraints.
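One way a database can enforce a business rule is a `CHECK` constraint. The rule below (returns are accepted only within 15 days of delivery) and the `product_return` table are hypothetical, chosen only to show a rule with a yes/no outcome.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE product_return (
        return_id INTEGER PRIMARY KEY,
        days_since_delivery INTEGER NOT NULL,
        CHECK (days_since_delivery <= 15)  -- hypothetical business rule
    )
""")

def record_return(return_id, days_since_delivery):
    """Return True if the database accepts the return, False if the rule rejects it."""
    try:
        conn.execute(
            "INSERT INTO product_return VALUES (?, ?)",
            (return_id, days_since_delivery),
        )
        return True
    except sqlite3.IntegrityError:
        return False

on_time = record_return(1, 3)
too_late = record_return(2, 30)
```

The first return satisfies the rule and is stored; the second violates it and is refused by the database itself rather than by application code.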
Dirty data
erroneous or flawed data
data scientist
extracts knowledge from data by performing statistical analysis, data mining, and advanced analytics on big data to identify trends, market changes, and other relevant information
Advanced analytics
focuses on forecasting future trends and producing insights using sophisticated quantitative methods, including statistics, descriptive and predictive data mining, simulation, and optimization
logical view of information
focuses on how individual users logically access information to meet their own particular business needs
query-by-example (QBE) tool
helps users graphically design the answer to a question against a database
Real-time information
immediate, up-to-date information
Static information
includes fixed data incapable of change in the event of a user action
Information redundancy
is the duplication of data, or the storage of the same data in multiple places.
Data models
logical data structures that detail the relationships among data elements by using graphics or pictures
database
maintains information about various types of objects (inventory), events (transactions), people (employees), and places (warehouses)
Data visualization tools
move beyond Excel graphs and charts into sophisticated analysis techniques such as controls, instruments, maps, time-series graphs, and more
Unstructured data
not defined, does not follow a specified format, and is typically free-form text such as emails, Twitter tweets, and text messages.
Analysis paralysis
occurs when the user goes into an emotional state of over-analyzing (or over-thinking) a situation so that a decision or action is never taken, in effect paralyzing the outcome
Infographics
present the results of data analysis, displaying the patterns, relationships, and trends in a graphical format
Distributed computing
processes and manages algorithms across many machines in a computing environment.
Real-time systems
provide real-time information in response to requests
Relational integrity constraints
rules that enforce basic and fundamental information-based constraints.
Integrity constraints
rules that help ensure the quality of information
relational database model
stores information in the form of logically related two-dimensional tables.
information cube
the common term for the representation of multidimensional information
Data governance
the overall management of the availability, usability, integrity, and security of company data
content creator
the person responsible for creating the original website content
content editor
the person responsible for updating and maintaining website content
Master data management (MDM)
the practice of gathering data and ensuring that it is uniform, accurate, consistent, and complete, including such entities as customers, suppliers, products, sales, employees, and other critical entities that are commonly integrated across organizational systems.
data element (or data field)
the smallest or basic unit of information
Information granularity
refers to the extent of detail within the information (fine and detailed or coarse and abstract)
Business intelligence dashboards
track corporate metrics such as critical success factors and key performance indicators and include advanced capabilities such as interactive controls, allowing users to manipulate data for analysis
Data-mining tools
use a variety of techniques to find patterns and relationships in large volumes of information that predict future behavior and guide decision making.
Information integrity issues
when a system produces incorrect, inconsistent, or duplicate data
Information inconsistency
when the same data element has different values