ch.3 Database Systems, Data Warehouses, and Data Marts
Validation rule
A rule determining whether a value is valid; for example, a student's age cannot be a negative number.
Subject oriented
Focused on a specific area, such as the home-improvement business or a university, whereas data in a database is transaction/function oriented
Purpose
Used for analytical purposes, whereas data in a database is used for capturing and managing transactions
fragmentation
The fragmentation approach to a distributed DBMS addresses how tables are divided among multiple locations. There are three variations: horizontal, vertical, and mixed.
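The three variations can be sketched in a few lines of Python. This is only an illustration, assuming a hypothetical customer table; the column names, the helper functions, and the site names are invented for the example. Horizontal fragmentation splits by rows, vertical fragmentation splits by columns, and mixed fragmentation applies both.

```python
# Hypothetical customer table; column names and values are illustrative only.
customers = [
    {"id": 1, "name": "Ana", "region": "East", "balance": 120.0},
    {"id": 2, "name": "Ben", "region": "West", "balance": 75.5},
    {"id": 3, "name": "Cy", "region": "East", "balance": 0.0},
]

def horizontal_fragment(rows, region):
    """Keep whole rows, but only those that belong to one site (split by rows)."""
    return [row for row in rows if row["region"] == region]

def vertical_fragment(rows, columns):
    """Keep every row, but only some columns (split by columns).
    The key column "id" is always kept so fragments can be rejoined later."""
    return [{col: row[col] for col in ("id", *columns)} for row in rows]

east_site = horizontal_fragment(customers, "East")         # horizontal
billing_site = vertical_fragment(customers, ["balance"])   # vertical
east_billing = vertical_fragment(east_site, ["balance"])   # mixed
```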
allocation
The allocation approach to a distributed DBMS combines fragmentation and replication, with each site storing the data it uses most often.
database
A database is a collection of related data that is stored in a central location or in multiple locations.
data hierarchy
A data hierarchy is the structure and organization of data, which involves fields, records, and files.
database management system (DBMS)
A database management system (DBMS) is software for creating, storing, maintaining, and accessing database files. A DBMS makes using databases more efficient.
Type of data
Captures aggregated data, whereas data in a database captures raw transaction data
Time variant
Categorized based on time, such as historical information, whereas data in a database only keeps recent activity in memory
Field data type
Character (text), date, and number
Integrated
Comes from a variety of sources, whereas data in a database usually does not
Integrity rules
Define the boundaries of a database, such as maximum and minimum values allowed for a field, constraints (limits on what type of data can be stored in a field), and access methods
Data structure
Describes how data is organized and the relationship among records
Operations
Describe methods, calculations, and so forth that can be performed on data, such as updating and querying data
Field name
Student name, admission date, age, and major
data dictionary
The data dictionary stores definitions, such as data types for fields, default values, and validation rules for data in each field.
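A data dictionary can be pictured as a table of field definitions that every new record is checked against. The sketch below is a minimal illustration, not how any particular DBMS stores its dictionary; the `data_dictionary` layout and the `apply_dictionary` helper are assumptions made for the example. It reuses the chapter's own cases: age cannot be negative, and an undeclared major defaults to "undecided."

```python
# Hypothetical data dictionary: one entry per field with a data type,
# a default value, and a validation rule.
data_dictionary = {
    "name":  {"type": str, "default": None,        "rule": lambda v: v.strip() != ""},
    "age":   {"type": int, "default": None,        "rule": lambda v: v >= 0},
    "major": {"type": str, "default": "undecided", "rule": lambda v: True},
}

def apply_dictionary(record):
    """Fill in defaults, then check each field's type and validation rule."""
    cleaned = {}
    for field, spec in data_dictionary.items():
        value = record.get(field, spec["default"])
        if value is None:
            raise ValueError(f"{field} is required and has no default")
        if not isinstance(value, spec["type"]):
            raise TypeError(f"{field} must be {spec['type'].__name__}")
        if not spec["rule"](value):
            raise ValueError(f"{field} failed its validation rule")
        cleaned[field] = value
    return cleaned

student = apply_dictionary({"name": "Ana", "age": 19})  # major falls back to the default
```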
physical view
The physical view involves how data is stored on and retrieved from storage media, such as hard disks, magnetic tapes, or CDs.
replication
The replication approach to a distributed DBMS has each site store a copy of the data in the organization's database.
Default value
The value entered if none is available; for example, if no major is declared, the value is "undecided."
Variety
This refers to the combination of structured data (e.g., customers' product ratings between 1 and 5) and unstructured data (e.g., call center conversations or customer complaints about a service or product).
Volume
This refers to the sheer quantity of transactions, measured in petabytes (1,024 terabytes) or exabytes (1,024 petabytes).
Velocity
This refers to the speed with which the data has to be gathered and processed.
data-driven Web site
acts as an interface to a database, retrieving data for users and allowing users to enter data in the database.
data administration component
also used by IT professionals and database administrators, is used for tasks such as backup and recovery, security, and change management.
Data-mining
analysis is used to discover patterns and relationships.
object-oriented databases
both data and their relationships are contained in a single object. An object consists of attributes and methods that can be performed on the object's data.
data model
determines how data is created, represented, organized, and maintained. It usually contains data structure, operations, and integrity rules.
database administrator (DBA)
found in large organizations, design and set up databases, establish security measures, develop recovery procedures, evaluate database performance, and add and fine-tune database functions.
Prescriptive analytics
goes beyond descriptive and predictive analytics by recommending a course of action that a decision maker should follow and showing the likely outcome of each decision.
normalization
improves database efficiency by eliminating redundant data and ensuring that only related data is stored in a table.
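As a rough illustration of removing redundancy, the sketch below starts from a hypothetical table in which each major's department is repeated on every student row, then splits it into two tables so each fact is stored once. The table and field names are invented for the example, and this shows only the general idea, not a formal walk through the normal forms.

```python
# Unnormalized: the department of a major is repeated on every student row.
enrollments = [
    {"student_id": 1, "name": "Ana", "major": "MIS", "department": "Business"},
    {"student_id": 2, "name": "Ben", "major": "MIS", "department": "Business"},
    {"student_id": 3, "name": "Cy", "major": "Physics", "department": "Science"},
]

# Normalized: facts about a major live once, in their own table ...
majors = {row["major"]: row["department"] for row in enrollments}

# ... and the student table keeps only data about students, plus the major
# name as a cross-reference into the majors table.
students = [
    {"student_id": r["student_id"], "name": r["name"], "major": r["major"]}
    for r in enrollments
]
```

After the split, changing the department of a major means updating one entry in `majors` instead of every matching student row.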
logical view
involves how information appears to users and how it can be organized and retrieved.
data warehouse
is a collection of data from a variety of sources used to support decision-making applications and generate business intelligence.
foreign key
is a field in a relational table that matches the primary key column of another table. It can be used to cross-reference tables.
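The cross-referencing role of a foreign key can be tried out with Python's built-in `sqlite3` module. The table and column names below are hypothetical. Note that SQLite checks foreign keys only after `PRAGMA foreign_keys = ON` is issued; other DBMS packages may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE majors (major_id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE students (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        major_id   INTEGER REFERENCES majors(major_id)  -- foreign key
    )
""")
conn.execute("INSERT INTO majors VALUES (1, 'MIS')")
conn.execute("INSERT INTO students VALUES (100, 'Ana', 1)")

# Cross-reference the two tables through the foreign key.
row = conn.execute(
    "SELECT s.name, m.title FROM students s JOIN majors m ON s.major_id = m.major_id"
).fetchone()

# A student row that points at a nonexistent major is rejected.
try:
    conn.execute("INSERT INTO students VALUES (101, 'Ben', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```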
Structured Query Language (SQL)
is a standard fourth-generation query language used by many DBMS packages, such as Oracle 12c and Microsoft SQL Server. SQL consists of several keywords specifying actions to take.
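For a feel of how the keywords specify actions, here is a small sketch that runs SQL through Python's built-in `sqlite3` module rather than Oracle or SQL Server. The `students` table and its contents are made up for the example. `SELECT` names the fields to return, `FROM` names the table, `WHERE` sets the condition, and `ORDER BY` sorts the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, age INTEGER, major TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [("Ana", 19, "MIS"), ("Ben", 22, "Physics"), ("Cy", 24, "MIS")],
)

# SELECT <fields> FROM <table> WHERE <condition> ORDER BY <field>
rows = conn.execute(
    "SELECT name, age FROM students WHERE major = ? ORDER BY age",
    ("MIS",),
).fetchall()
```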
big data
is data so voluminous that conventional computing methods are not able to efficiently process and manage it.
network model
is similar to the hierarchical model, but records are organized differently. Unlike the hierarchical model, each record in the network model can have multiple parent and child records.
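The difference from a tree can be sketched with two lookup tables, one from each record to its parents and one to its children. The supplier and part names and the `link` helper are hypothetical and serve only to show a record with more than one parent.

```python
from collections import defaultdict

# Each record may have several parents and several children.
parents = defaultdict(set)
children = defaultdict(set)

def link(parent, child):
    """Record a parent-child relationship in both directions."""
    parents[child].add(parent)
    children[parent].add(child)

link("Supplier A", "Part 7")
link("Supplier B", "Part 7")   # "Part 7" now has two parents
link("Supplier A", "Part 9")
```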
data manipulation component
is used to add, delete, modify, and retrieve records from a database.
data definition component
is used to create and maintain the data dictionary and define the structure of files in a database.
application generation component
is used to design elements of an application using a database, such as data entry screens, interactive menus, and interfaces with other programming languages.
online transaction processing (OLTP)
is used to facilitate and manage transaction-oriented applications, such as point-of-sale, data entry, and retrieval transaction processing. It generally uses internal data and responds in real time.
random access file structure
records can be accessed in any order, regardless of their physical locations in storage media. This method of access is fast and very effective when a small number of records need to be processed daily or weekly.
indexed sequential access method (ISAM)
records can be accessed sequentially or randomly, depending on the number being accessed. For a small number, random access is used, and for a large number, sequential access is used.
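The choice between the two access paths can be sketched as follows. This is a simplified in-memory illustration, not a real ISAM implementation, which keeps its index and data on disk. The record layout, the `fetch` helper, and the `threshold` of 3 are assumptions made for the example.

```python
# Records kept in key order, as in a sequential file. The data is hypothetical.
records = [{"id": i, "name": f"student-{i}"} for i in range(10, 110, 10)]

# The index maps each key to the record's position in the file.
index = {rec["id"]: pos for pos, rec in enumerate(records)}

def fetch(ids, threshold=3):
    """Return (access_mode, matching_records) for the requested keys."""
    wanted = set(ids)
    if len(wanted) <= threshold:
        # Small request: jump straight to each record through the index.
        found = [records[index[k]] for k in sorted(wanted) if k in index]
        return "random", found
    # Large request: read the file once, in order, keeping the wanted records.
    found = [rec for rec in records if rec["id"] in wanted]
    return "sequential", found

small_mode, small_rows = fetch([30, 10])
large_mode, large_rows = fetch([10, 20, 30, 40, 50])
```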
sequential access file structure
records in files are organized and processed in numerical or sequential order, typically the order in which they were entered.
inheritance
refers to new objects being created faster and more easily by entering new data in attributes.
encapsulation
refers to the grouping into a class of various objects along with their attributes and methods—meaning, grouping related items into a single unit. This helps handle more complex types of data, such as images and graphs.
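Both object-oriented terms can be illustrated with a pair of Python classes. The class names and attributes are hypothetical. `Vehicle` groups its attributes and the method that works on them into one unit (encapsulation), and `Truck` reuses all of that and adds only what is new (inheritance).

```python
class Vehicle:
    """Encapsulation: attributes and the methods that use them live together."""

    def __init__(self, vin, year):
        self.vin = vin
        self.year = year

    def age(self, current_year):
        return current_year - self.year


class Truck(Vehicle):
    """Inheritance: a new kind of object built by adding only the new attribute."""

    def __init__(self, vin, year, payload_tons):
        super().__init__(vin, year)       # reuse the attributes of Vehicle
        self.payload_tons = payload_tons  # add the attribute specific to Truck


truck = Truck("T-001", 2020, 5)
truck_age = truck.age(2024)  # the method is inherited from Vehicle
```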
Extraction, transformation, and loading (ETL)
refers to the processes used in a data warehouse. It includes extracting data from outside sources, transforming it to fit operational needs, and loading it into the end target (database or data warehouse).
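The three steps can be sketched end to end with the Python standard library. Everything here is illustrative: the CSV text stands in for an outside source, an in-memory SQLite database stands in for the warehouse, and the cleaning rules (trim spaces, uppercase the region, skip rows with no amount) are assumptions chosen for the example.

```python
import csv
import io
import sqlite3

# Stand-in for an outside source; note the stray spaces, mixed case, and a missing amount.
raw = "order_id,amount,region\n1, 19.99 ,east\n2,5.00,WEST\n3,,east\n"

def extract(text):
    """Extract: read rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: clean values and drop rows that cannot be used."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip rows with a missing amount
        cleaned.append((int(row["order_id"]), float(amount), row["region"].strip().upper()))
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (order_id INTEGER, amount REAL, region TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

warehouse = sqlite3.connect(":memory:")
load(transform(extract(raw)), warehouse)

totals = warehouse.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
```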
create, read, update, and delete (CRUD)
refers to the four basic functions that can be performed on data; data administrators determine who has permission to perform each of them.
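One way to picture this is a permission table that is checked before each of the four operations runs. The roles, the `PERMISSIONS` table, and the `perform` helper below are hypothetical; a real DBMS handles this through its own security features.

```python
# Hypothetical roles and the CRUD operations each one may perform.
PERMISSIONS = {
    "clerk":   {"create", "read"},
    "analyst": {"read"},
    "dba":     {"create", "read", "update", "delete"},
}

store = {}  # stand-in for a database table, keyed by record ID

def perform(role, operation, key, value=None):
    """Run one CRUD operation if the role is permitted to perform it."""
    if operation not in PERMISSIONS.get(role, set()):
        raise PermissionError(f"{role} may not {operation}")
    if operation == "create":
        if key in store:
            raise KeyError(f"record {key} already exists")
        store[key] = value
    elif operation == "read":
        return store.get(key)
    elif operation == "update":
        if key not in store:
            raise KeyError(f"record {key} does not exist")
        store[key] = value
    elif operation == "delete":
        store.pop(key, None)

perform("clerk", "create", 1, "Ana")
name = perform("analyst", "read", 1)
try:
    perform("analyst", "delete", 1)
    blocked = False
except PermissionError:
    blocked = True
```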
Descriptive analytics
reviews past events, analyzes the data, and provides a report indicating what happened in a given period and how to prepare for the future. Predictive analytics, as the name indicates, is a proactive strategy; it prepares a decision maker for future events.
distributed database management system (DDBMS)
stores data on multiple servers throughout an organization.
A database engine
the heart of DBMS software, is responsible for data storage, manipulation, and retrieval.
hierarchical model
the relationships between records form a treelike structure (hierarchy). Records are called nodes, and relationships between records are called branches. The node at the top is called the root, and every other node (called a child) has a parent. Nodes with the same parents are called twins or siblings.
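The vocabulary above (root, child, parent, siblings) maps directly onto a small tree class. The `Node` class and the university example are hypothetical and are meant only to make the terms concrete.

```python
class Node:
    """A record in a hierarchical model: one parent, any number of children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # None marks the root
        self.children = []
        if parent is not None:
            parent.children.append(self)  # the branch between parent and child

    def siblings(self):
        """Nodes that share this node's parent (also called twins)."""
        if self.parent is None:
            return []
        return [child for child in self.parent.children if child is not self]


root = Node("University")
business = Node("Business", root)
science = Node("Science", root)
mis = Node("MIS", business)
```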
primary key
uniquely identifies every record in a relational database. Examples include student ID numbers, account numbers, Social Security numbers, and invoice numbers.
relational model
uses a two-dimensional table of rows and columns of data. Rows are records (also called tuples), and columns are fields (also referred to as attributes).
database marketing
uses an organization's database of customers and potential customers to promote products or services.
Business analytics (BA)
uses data and statistical methods to gain insight into the data and provide decision makers with information they can act on.
data mart
usually a smaller version of a data warehouse, used by a single department or function.
query by example (QBE)
With query by example (QBE), you request data from a database by constructing a statement made up of query forms. With current graphical databases, you simply click to select query forms instead of having to remember keywords, as you do with SQL. You can add AND, OR, and NOT operators to the QBE form to fine-tune the query.
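The idea of querying with filled-in forms instead of keywords can be imitated in Python. This is a loose sketch, not the interface of any real QBE tool. The `matches` and `query_by_example` helpers are hypothetical, with fields inside one form combined by AND, several forms combined by OR, and excluded forms acting as NOT.

```python
# Hypothetical records and field names, for illustration only.
students = [
    {"name": "Ana", "major": "MIS", "year": 1},
    {"name": "Ben", "major": "Physics", "year": 2},
    {"name": "Cy", "major": "MIS", "year": 2},
]

def matches(record, form):
    """A form is a dict of example values; every filled-in field must match (AND)."""
    return all(record.get(field) == value for field, value in form.items())

def query_by_example(records, any_of, none_of=()):
    """Return records matching at least one form in any_of (OR)
    and no form in none_of (NOT)."""
    return [
        record for record in records
        if any(matches(record, form) for form in any_of)
        and not any(matches(record, form) for form in none_of)
    ]

# MIS students who are NOT in year 1.
result = query_by_example(students, any_of=[{"major": "MIS"}], none_of=[{"year": 1}])
```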
In summary, a database has the following advantages over a flat file system:
• More information can be generated from the same data.
• Complex requests can be handled more easily.
• Data redundancy is eliminated or minimized.
• Programs and data are independent, so more than one program can use the same data.
• Data management is improved.
• A variety of relationships among data can be easily maintained.
• More sophisticated security measures can be used.
• Storage space is reduced.