Chapter 6 - Business Intelligence
Consequences of using low quality data
Inability to track customers accurately. Difficulty identifying the organization's most valuable customers. Inability to identify selling opportunities. Lost revenue opportunities from marketing to nonexistent customers. The cost of sending undeliverable mail. Difficulty tracking revenue because of inaccurate invoices. Inability to build strong relationships with customers.
Dynamic information
Includes data that change based on user actions. In contrast, static websites supply only information that will not change until the content editor changes the information.
Levels, Formats, and Granularities of Organizational Information
Information Levels: Individual, Department, Enterprise
Information Formats: Document, Presentation, Spreadsheet, Database
Information Granularities: Detail (fine), Summary, Aggregate (coarse)
Data models
Logical data structures that detail the relationships among data elements by using graphics or pictures.
Information Integrity issues
Occur when a system produces incorrect, inconsistent or duplicate data
Information types
Transactional: encompasses all of the information contained within a single business process or unit of work, and its primary purpose is to support daily operational tasks.
Analytical: encompasses all organizational information, and its primary purpose is to support the performance of managerial analysis tasks. Analytical information is useful when making important decisions such as whether the organization should build a new manufacturing plant or hire additional sales personnel.
data artist
a business analytics specialist who uses visual tools to help people understand complex data.
Big data
a collection of large, complex data sets, including structured and unstructured data, which cannot be analyzed using traditional database methods and tools.
primary key
a field that uniquely identifies a given record in a table
Data warehouse
a logical collection of information, gathered from many operational databases, that supports business analysis activities and decision-making tasks. The primary purpose of a data warehouse is to combine information, more specifically strategic information, from throughout an organization into a single repository so that the people who need that information can make decisions and undertake business analysis.
Information integrity
a measure of the quality of information
foreign key
a primary key of one table that appears as an attribute in another table and acts to provide a logical relationship between the two tables.
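A minimal sketch of how a primary key and a foreign key link two tables, using Python's built-in sqlite3 module. The customer and orders tables and their columns are made up for illustration.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# customer_id is the primary key: it uniquely identifies each customer record.
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# orders.customer_id is a foreign key: the customer table's primary key
# appearing as an attribute in another table.
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(customer_id))"
)

conn.execute("INSERT INTO customer (customer_id, name) VALUES (1, 'Dana')")
conn.execute("INSERT INTO orders (order_id, customer_id) VALUES (100, 1)")  # customer 1 exists

try:
    # Customer 99 does not exist, so the foreign key rejects this order.
    conn.execute("INSERT INTO orders (order_id, customer_id) VALUES (101, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(rejected, order_count)
```

The rejected insert is also a small example of a relational integrity constraint at work: an order cannot point at a customer that is not in the customer table.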
Extraction, transformation, and loading (ETL)
a process that extracts information from internal and external databases, transforms it using a common set of enterprise definitions, and loads it into a data warehouse.
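The three ETL steps can be sketched roughly as follows. The two source systems, their field names, and the warehouse_sales table are assumptions invented for this example, not part of any real system.

```python
import sqlite3

# Hypothetical source systems that record the same facts in different formats.
sales_system = [{"cust": "ACME CORP", "amount_usd": "1200.50"}]
web_system = [{"customer_name": "acme corp", "total_cents": 35000}]

def extract():
    """Extract: pull raw rows from each source, tagged with their origin."""
    return [("sales", r) for r in sales_system] + [("web", r) for r in web_system]

def transform(tagged_rows):
    """Transform: map each source format onto one common definition
    (title-cased customer name, amount in dollars)."""
    out = []
    for source, row in tagged_rows:
        if source == "sales":
            out.append((row["cust"].title(), float(row["amount_usd"])))
        else:
            out.append((row["customer_name"].title(), row["total_cents"] / 100))
    return out

def load(rows, conn):
    """Load: write the standardized rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS warehouse_sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO warehouse_sales VALUES (?, ?)", rows)

warehouse = sqlite3.connect(":memory:")
load(transform(extract()), warehouse)

total = warehouse.execute(
    "SELECT SUM(amount) FROM warehouse_sales WHERE customer = 'Acme Corp'"
).fetchone()[0]
print(total)
```

After the transform step both sources agree on the customer name and the unit of currency, so one query can total them.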
Information cleansing or scrubbing
a process that weeds out and fixes or discards inconsistent, incorrect, or incomplete information.
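A small sketch of cleansing in Python. The rules used here (a name is required, an email must contain "@", duplicates are matched on email) are simplifying assumptions chosen for the example.

```python
def cleanse(records):
    """Fix what can be fixed, discard what cannot, and drop duplicates."""
    seen, clean = set(), []
    for rec in records:
        # Fix: collapse stray spacing and normalize case.
        name = " ".join(rec.get("name", "").split()).title()
        email = rec.get("email", "").strip().lower()
        # Discard: incomplete (no name) or incorrect (no "@") records.
        if not name or "@" not in email:
            continue
        # Discard: duplicate of a record already kept.
        if email in seen:
            continue
        seen.add(email)
        clean.append({"name": name, "email": email})
    return clean

raw = [
    {"name": "  jane   DOE ", "email": "Jane@Example.com"},
    {"name": "Jane Doe", "email": "jane@example.com "},  # duplicate
    {"name": "", "email": "nobody@example.com"},         # missing name
    {"name": "Sam Lee", "email": "not-an-email"},        # invalid email
]
cleaned = cleanse(raw)
print(cleaned)
```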
Relational database management system
allows users to create, read, update, and delete data in a relational database
Dynamic catalogue
an area of a website that stores information about products in a database.
data-driven website
an interactive website kept constantly updated and relevant to the needs of its customers using a database
Integrity constraints
are rules that help ensure the quality of information. The database design needs to consider integrity constraints
Structured query language (SQL)
asks users to write lines of code to answer questions against a database.
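As an illustration, here is one question ("what are total sales by region, largest first?") written as SQL and run through Python's sqlite3 module. The sales table and its values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("East", 100.0), ("West", 250.0), ("East", 50.0)],
)

# The question, expressed as lines of SQL code.
query = """
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    ORDER BY total DESC
"""
totals = conn.execute(query).fetchall()
print(totals)
```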
record
collection of related data elements.
Data dictionary
compiles all of the metadata about the data elements in the data model.
data mart
contains a subset of data warehouse information
Machine-generated data
created by a machine without human intervention. Machine-generated structured data includes sensor data, point-of-sale data, and web log data.
database management system (DBMS)
creates, reads, updates, and deletes data in a database while controlling access and security. Managers send requests to the DBMS, and the DBMS performs the actual manipulation of the data in the database.
attributes
data elements associated with an entity
Human-generated data
data that humans, in interaction with computers, generate. Human-generated structured data includes input data, click-stream data, or gaming data.
physical view of information
deals with the physical storage of information on a storage device.
business rule
defines how a company performs certain aspects of its business and typically results in either a yes/no or true/false answer. Stating that merchandise returns are allowed within 10 days of purchase is an example of a business rule.
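The 10-day return rule can be sketched as a function with a true/false answer. Treating day 10 itself as still allowed is an assumption; the rule as stated does not say whether the window is inclusive.

```python
from datetime import date

RETURN_WINDOW_DAYS = 10  # from the example rule above

def return_allowed(purchase_date: date, return_date: date) -> bool:
    """Business rule: returns are allowed within 10 days of purchase.
    Assumes the 10th day is still inside the window."""
    days_since_purchase = (return_date - purchase_date).days
    return 0 <= days_since_purchase <= RETURN_WINDOW_DAYS

print(return_allowed(date(2024, 3, 1), date(2024, 3, 11)))  # 10 days later
print(return_allowed(date(2024, 3, 1), date(2024, 3, 12)))  # 11 days later
```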
Data visualization
describes technologies that allow users to see or visualize data to transform information into a business perspective.
data quality audits
determine the accuracy and completeness of an organization's data.
Business-critical integrity constraints
enforce business rules vital to an organization's success and often require more insight and knowledge than relational integrity constraints.
Dirty Data
erroneous or flawed data
data scientist
extracts knowledge from data by performing statistical analysis, data mining, and advanced analytics on big data to identify trends, market changes, and other relevant information.
Advanced analytics
focuses on forecasting future trends and producing insights using sophisticated quantitative methods, including statistics, descriptive and predictive data mining, simulation, and optimization
Logical view of information
focuses on how individual users logically access information to meet their own particular business needs.
Structured data
has a defined length, type, and format and includes numbers, dates, or strings such as Customer Address. Structured data is typically stored in a traditional system such as a relational database or spreadsheet and accounts for about 20 percent of the data that surrounds us.
query-by-example (QBE) tool
helps users graphically design the answer to a question against a database
Static information
includes fixed data incapable of change in the event of a user action.
database
maintains information about various types of objects (inventory), events (transactions), people (employees), and places (warehouses).
Data visualization tools
move beyond Excel graphs and charts into sophisticated analysis techniques such as controls, instruments, maps, time-series graphs, and more.
Information inconsistency
occurs when the same data element has different values
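One way to spot inconsistency is to group records by an identifier and flag any data element that carries more than one value. The records and field names below are hypothetical.

```python
from collections import defaultdict

def find_inconsistencies(records, key, field):
    """Return {key value: sorted distinct values} wherever the same
    data element has more than one value."""
    values = defaultdict(set)
    for rec in records:
        values[rec[key]].add(rec[field])
    return {k: sorted(v) for k, v in values.items() if len(v) > 1}

records = [
    {"customer_id": 1, "city": "Denver", "source": "billing"},
    {"customer_id": 1, "city": "Boulder", "source": "shipping"},
    {"customer_id": 2, "city": "Austin", "source": "billing"},
    {"customer_id": 2, "city": "Austin", "source": "shipping"},
]
conflicts = find_inconsistencies(records, "customer_id", "city")
print(conflicts)
```

Customer 2 is stored twice with the same city, which is redundancy but not inconsistency. Only customer 1 is flagged.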
Analysis paralysis
occurs when the user goes into an emotional state of over-analyzing (or over-thinking) a situation so that a decision or action is never taken, in effect paralyzing the outcome.
entity
or table, stores information about a person, place, thing, transaction, or event.
Infographics
present the results of data analysis, displaying the patterns, relationships, and trends in a graphical format
Distributed computing
processes and manages algorithms across many machines in a computing environment
Real-time systems
provide real-time information in response to requests.
Metadata
provides details about data. For example, metadata for an image could include its size, resolution, and date created. Metadata about a text document could contain document length, date created, author's name, and summary.
Information Granularity
refers to the extent of detail within the information (fine and detailed or coarse and abstract)
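The three granularities can be seen by rolling the same data up step by step. The sales figures are made up for illustration.

```python
from collections import defaultdict

# Detail (fine): one row per sale as (month, region, amount).
detail = [
    ("2024-01", "East", 100),
    ("2024-01", "West", 200),
    ("2024-02", "East", 150),
]

# Summary: totals per month, with the regions rolled together.
summary = defaultdict(int)
for month, _region, amount in detail:
    summary[month] += amount

# Aggregate (coarse): a single grand total.
aggregate = sum(summary.values())

print(dict(summary), aggregate)
```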
Relational integrity constraints
rules that enforce basic and fundamental information-based constraints
Unstructured data
Is not defined, does not follow a specified format, and is typically free-form text such as emails, Twitter tweets, and text messages. Unstructured data accounts for about 80 percent of the data that surrounds us.
data element (or Data field)
Is the smallest or basic unit of information. Data elements can include a customer's name, address, email, discount rate, preferred shipping method, product name, and quantity ordered.
relational database model
stores information in the form of logically related, two dimensional tables
information cube
the common term for the representation of multidimensional information
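A rough sketch of the idea, assuming three dimensions (product, region, quarter) and units sold as the measure. A plain dictionary stands in for the cube here, and all names and numbers are hypothetical.

```python
# Each cell of the cube is addressed by one value per dimension.
cube = {
    ("Widget", "East", "Q1"): 10,
    ("Widget", "West", "Q1"): 7,
    ("Gadget", "East", "Q1"): 4,
    ("Widget", "East", "Q2"): 12,
}

def slice_total(cube, product=None, region=None, quarter=None):
    """Sum every cell that matches the fixed dimensions; None means 'all'."""
    wanted = (product, region, quarter)
    return sum(
        value
        for key, value in cube.items()
        if all(w is None or w == k for w, k in zip(wanted, key))
    )

widget_total = slice_total(cube, product="Widget")
east_q1_total = slice_total(cube, region="East", quarter="Q1")
print(widget_total, east_q1_total)
```

Fixing one or more dimensions and totaling the rest is what "slicing" a cube means in this sketch.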
Information redundancy
the duplication of data, or the storage of the same data in multiple places.
Data governance
the overall management of the availability, usability, integrity, and security of company data.
content creator
the person responsible for creating the original website content
content editor
the person responsible for updating and maintaining website content
Master Data Management (MDM)
the practice of gathering data and ensuring that it is uniform, accurate, consistent, and complete, including such entities as customers, suppliers, products, sales, employees, and other critical entities that are commonly integrated across organizational systems
Data mining
the process of analyzing data to extract information not offered by the raw data alone.
Business intelligence dashboards
track corporate metrics such as critical success factors and key performance indicators and include advanced capabilities such as interactive controls, allowing users to manipulate data for analysis.
Data-mining tools
use a variety of techniques to find patterns and relationships in large volumes of information that predict future behavior and guide decision making.
advantages of using data-driven websites
- Much easier to use than a custom query tool
- Web interface requires few or no changes to the database model
- Costs less to add a web interface in front of a DBMS than to redesign and rebuild the system to support changes
- Easy to manage content
- Easy to store large amounts of data
- Easy to eliminate human errors
Four Primary Reasons for low quality Information
1. Online customers intentionally enter inaccurate information
2. Different systems have different entry standards and formats
3. Data entry personnel enter abbreviated information
4. Third-party and external information has inconsistencies, inaccuracies, and errors
Example of low quality information
1. completeness 2. consistency 3. accuracy
How different industries use business intelligence
Airlines: Analyze popular vacation locations with current flight listings. Banking: Understand customer credit card usage and nonpayment rates. Health care: Compare the demographics of patients with critical illnesses. Insurance: Predict claim amounts and medical coverage costs. Law enforcement: Track crime patterns, locations, and criminal behavior. Marketing: Analyze customer demographics. Retail: Predict sales, inventory levels, and distribution. Technology: Predict hardware failures.
3 Core Concepts of data warehousing
Data Mart, Information Cleansing, Data Mining
Methods for Analyzing Big Data
Data Mining, Big Data Analytics, Data Visualization
Real-time information
Immediate, up-to-date information. One of the biggest pitfalls associated with real-time information is continual change.