CH6 Foundations of Business Intelligence: Databases and Information Management
web mining
The discovery and analysis of useful patterns and information from the web. One form, web content mining, is the process of extracting knowledge from the content of web pages, which may include text, image, audio, and video data
normalization
The process of streamlining complex groups of data to minimize redundant data elements and awkward many-to-many relationships and increase stability and flexibility
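A minimal sketch of the idea, assuming a hypothetical flat list of part rows in which the supplier name repeats on every row. The table and field names are made up for illustration. Splitting the rows into two smaller tables stores each supplier name only once:

```python
# Unnormalized data: the supplier name is repeated on every part row.
flat = [
    {"part": "Door latch", "supplier_id": 1, "supplier_name": "Acme"},
    {"part": "Side mirror", "supplier_id": 1, "supplier_name": "Acme"},
    {"part": "Door seal", "supplier_id": 2, "supplier_name": "Bolt Co"},
]

# Normalized: one supplier table keyed by supplier_id ...
suppliers = {row["supplier_id"]: row["supplier_name"] for row in flat}

# ... and one part table that refers to a supplier only by its id.
parts = [{"part": row["part"], "supplier_id": row["supplier_id"]} for row in flat]

print(suppliers)
print(parts)
```

After the split, correcting a supplier's name means changing one entry instead of every part row that mentions it.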
key field
This field uniquely identifies each record so that the record can be retrieved, updated, or sorted
Multidimensional Data Model
Represents data as a cube whose faces give different views of the same data. One view shows product versus region. If you rotate the cube 90 degrees, the face that will show is product versus actual and projected sales. If you rotate the cube 90 degrees again, you will see region versus actual and projected sales. Other views are possible.
file
a group of records of the same type
data dictionary
an automated or manual file that stores definitions of data elements and their characteristics
Structured Query Language (SQL)
an international standard language for processing a database
smart contracts
computer programs that implement the rules governing transactions between firms, e.g., the price of products, how they will be shipped, when the transaction will be completed, who will finance the transaction, and what the financing terms are
data definition
capability to specify the structure of the content of the database
entity-relationship diagram
clarifies table relationships in a relational database
data warehouse
database that stores current and historical data of potential interest to decision makers throughout the company
big data
data sets with volumes so huge that they are beyond the ability of a typical DBMS to capture, store, and analyze
Blockchain
distributed database technology that enables firms and organizations to create and verify transactions on a network nearly instantaneously without a central authority
Data governance
encompasses policies and procedures through which data can be managed as an organizational resource. It establishes the organization's rules for sharing, disseminating, acquiring, standardizing, classifying, and inventorying information
sequences
events are linked over time. We might find, for example, that if a house is purchased, a new refrigerator will be purchased within two weeks 65 percent of the time, and an oven will be bought within one month of the home purchase 45 percent of the time.
Data cleansing
also known as data scrubbing; consists of activities for detecting and correcting data in a database that are incorrect, incomplete, improperly formatted, or redundant
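A minimal sketch of a scrubbing pass, assuming hypothetical customer records with an `id` and an `email` field. The `cleanse` helper and its rules are illustrative only: it fixes formatting, sets incomplete records aside, and drops redundant copies.

```python
# Hypothetical raw records with typical quality problems.
raw_records = [
    {"id": "1", "email": " Ana@Example.com "},  # improperly formatted
    {"id": "2", "email": "ben@example.com"},
    {"id": "2", "email": "ben@example.com"},    # redundant copy
    {"id": "3", "email": ""},                   # incomplete
]


def cleanse(records):
    """Return (clean, rejected) after fixing format and removing duplicates."""
    clean, rejected, seen = [], [], set()
    for rec in records:
        email = rec["email"].strip().lower()  # correct the formatting
        if not email:
            rejected.append(rec)              # incomplete: set aside for review
            continue
        key = (rec["id"], email)
        if key in seen:
            continue                          # redundant: keep the first copy only
        seen.add(key)
        clean.append({"id": rec["id"], "email": email})
    return clean, rejected


clean, rejected = cleanse(raw_records)
print(clean)
print(rejected)
```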
Associations
occurrences linked to a single event. For instance, a study of supermarket purchasing patterns might reveal that, when corn chips are purchased, a cola drink is purchased 65 percent of the time, but when there is a promotion, cola is purchased 85 percent of the time. This information helps managers make better decisions because they have learned the profitability of a promotion.
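Percentages like these can be computed as the share of baskets containing the first item that also contain the second, a measure often called confidence. A minimal sketch, assuming a hypothetical list of baskets. The data and the `confidence` helper are made up for illustration:

```python
# Hypothetical market baskets; each set is one customer's purchase.
baskets = [
    {"corn chips", "cola"},
    {"corn chips", "cola", "salsa"},
    {"corn chips", "salsa"},
    {"cola"},
    {"corn chips", "cola"},
]


def confidence(baskets, antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    both = [b for b in with_antecedent if consequent in b]
    return len(both) / len(with_antecedent)


chips_to_cola = confidence(baskets, "corn chips", "cola")
print(f"corn chips -> cola: {chips_to_cola:.0%}")
```

Running the same calculation on baskets from a promotion period and comparing the two figures is one way to arrive at the kind of comparison described in the card.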
distributed database
one that is stored in multiple physical locations. Parts or copies of the database are physically stored in one location and other parts or copies are maintained in other locations
Hadoop
open source software framework managed by the Apache Software Foundation that enables distributed parallel processing of very large amounts of data across inexpensive computers
relational database
organizes data into two-dimensional tables (called relations) with columns and rows. Each table contains data about an entity and its attributes
forecasting
uses predictions in a different way. It uses a series of existing values to forecast what other values will be. For example, forecasting might find patterns in data to help managers estimate the future value of continuous variables, such as sales figures.
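One simple way to forecast from a series of existing values is to fit a straight-line trend and extend it. This is a minimal sketch using an ordinary least-squares line, assuming hypothetical quarterly sales figures. The `linear_forecast` helper is illustrative, and real forecasting tools use richer models.

```python
def linear_forecast(values, steps_ahead=1):
    """Fit a least-squares line to the series and extend it `steps_ahead` periods."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to fit a trend")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + steps_ahead)


# Hypothetical sales for four consecutive quarters.
quarterly_sales = [100, 110, 120, 130]
next_quarter = linear_forecast(quarterly_sales)
print(next_quarter)
```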
Data mining
provides insights into corporate data that cannot be obtained with OLAP by finding hidden patterns and relationships in large databases and inferring rules from them to predict future behavior
Classification
recognizes patterns that describe the group to which an item belongs by examining existing items that have been classified and by inferring a set of rules. For example, businesses such as credit card or telephone companies worry about the loss of steady customers. Classification helps discover the characteristics of customers who are likely to leave and can provide a model to help managers predict who those customers are so that the managers can devise special campaigns to retain such customers.
in-memory computing
relies primarily on a computer's main memory (RAM) for data storage
data lake
repository for raw unstructured data or structured data that for the most part have not yet been analyzed, and the data can be accessed in many ways.
bit
represents the smallest unit of data a computer can handle
query
request for data from a database
referential integrity
rules to ensure that relationships between coupled tables remain consistent
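A minimal sketch of such a rule being enforced, using Python's built-in `sqlite3` module. The `supplier` and `part` tables are hypothetical. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute("CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE part ("
    " part_id INTEGER PRIMARY KEY,"
    " name TEXT,"
    " supplier_id INTEGER NOT NULL REFERENCES supplier(supplier_id))"
)

conn.execute("INSERT INTO supplier VALUES (1, 'Acme')")
conn.execute("INSERT INTO part VALUES (100, 'Door latch', 1)")  # supplier 1 exists

# A part that points at a supplier that does not exist must be refused.
try:
    conn.execute("INSERT INTO part VALUES (101, 'Side mirror', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print("orphan row rejected:", rejected)
```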
Sentiment analysis
software can mine text comments in an email message, blog, social media conversation, or survey form to detect favorable and unfavorable opinions about specific subjects
database management system (DBMS)
specific type of software for creating, storing, organizing, and accessing data from a database
data mart
subset of a data warehouse in which a summarized or highly focused portion of the organization's data is placed in a separate database for a specific population of users
online analytical processing (OLAP)
supports multidimensional data analysis, enabling users to view the same data in different ways using multiple dimensions
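A minimal sketch of viewing the same data along two dimensions, assuming a hypothetical list of sales facts. The data and the `total_by` helper are made up for illustration. OLAP tools do this kind of summarizing interactively and over far more dimensions.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, region, units sold).
sales = [
    ("nuts", "East", 50),
    ("nuts", "West", 60),
    ("bolts", "East", 40),
    ("bolts", "West", 70),
    ("nuts", "East", 10),
]


def total_by(facts, dimension):
    """Sum units along one dimension: 0 = product, 1 = region."""
    totals = defaultdict(int)
    for fact in facts:
        totals[fact[dimension]] += fact[2]
    return dict(totals)


by_product = total_by(sales, 0)  # same facts, viewed by product
by_region = total_by(sales, 1)   # same facts, viewed by region
print(by_product)
print(by_region)
```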
data manipulation language
a specialized language used to add, change, delete, and retrieve the data in the database
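In SQL, the four operations correspond to the `INSERT`, `UPDATE`, `DELETE`, and `SELECT` statements. A minimal sketch using Python's built-in `sqlite3` module, with a hypothetical `part` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: specify the structure of the table first.
conn.execute(
    "CREATE TABLE part (part_id INTEGER PRIMARY KEY, name TEXT, unit_price REAL)"
)

# Data manipulation: add, change, delete, and retrieve rows.
conn.execute("INSERT INTO part VALUES (137, 'Door latch', 22.00)")
conn.execute("INSERT INTO part VALUES (145, 'Side mirror', 12.00)")
conn.execute("UPDATE part SET unit_price = 24.50 WHERE part_id = 137")
conn.execute("DELETE FROM part WHERE part_id = 145")
rows = conn.execute("SELECT part_id, name, unit_price FROM part").fetchall()

print(rows)
```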
"3Vs"
the extreme volume of data, the wide variety of data types and sources, and the velocity at which the data must be processed
primary key
the unique identifier for all the information in any row of the table; its value cannot be duplicated
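A minimal sketch of the no-duplicates rule, using Python's built-in `sqlite3` module and a hypothetical `student` table. A second row with the same key value is refused.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO student VALUES (1001, 'Ana')")

# Reusing an existing primary key value violates the uniqueness rule.
try:
    conn.execute("INSERT INTO student VALUES (1001, 'Ben')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The key retrieves exactly one row.
row = conn.execute("SELECT name FROM student WHERE student_id = 1001").fetchone()
print(duplicate_rejected, row)
```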
Nonrelational database management systems
use a more flexible data model and are designed for managing large data sets across many distributed machines and for easily scaling up or down
analytic platforms
specialized high-speed platforms that use both relational and nonrelational technology and are optimized for analyzing large data sets
data quality audit
a structured survey of the accuracy and level of completeness of the data in an information system. Data quality audits can be performed by surveying entire data files, surveying samples from data files, or surveying end users for their perceptions of data quality.
Clustering
works in a manner similar to classification when no groups have yet been defined. A data-mining tool can discover different groupings within data, such as finding affinity groups for bank cards or partitioning a database into groups of customers based on demographics and types of personal investments.
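A minimal sketch of discovering groups that were not defined in advance, assuming hypothetical account balances. The `two_means` helper is a simplified one-dimensional, two-group version of k-means, chosen here only for illustration; the card does not name a specific algorithm.

```python
def two_means(values, iterations=10):
    """Split numbers into two groups around two centers (simplified 1-D k-means)."""
    centers = [min(values), max(values)]  # start from the two extremes
    groups = ([], [])
    for _ in range(iterations):
        groups = ([], [])
        for v in values:
            # Assign each value to its nearest center.
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups


# Hypothetical customer balances; no group labels are supplied.
balances = [120, 150, 130, 5000, 5200, 4900]
low, high = two_means(balances)
print(low, high)
```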
byte
A group of bits that represents a single character, which can be a letter, a number, or another symbol
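A minimal sketch of the bit, byte, and character relationship, assuming ASCII encoding, in which a character fits in one eight-bit byte:

```python
char = "A"
encoded = char.encode("ascii")    # one character becomes one byte
bits = format(encoded[0], "08b")  # that byte written out as eight bits

print(len(encoded), bits)
```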
record
A group of related fields, such as a student's identification number (ID), the course taken, the date, and the grade
database
A group of related files
field
A grouping of characters into a word, a group of words, or a complete number (such as a person's name or age)
foreign key
A primary key of one table that appears as an attribute in another table and acts to provide a logical relationship between the two tables
attributes
The specific characteristics that describe an entity
entity
A generalized category representing a person, place, or thing on which we store information
Text mining
Tools for analyzing unstructured text such as email, memos, call center transcripts, survey responses, legal cases, patent descriptions, and service reports. These tools can extract key elements from unstructured big data sets, discover patterns and relationships, and summarize the information