MIS 385 Exam II
Second Normal Form
1NF PLUS every non-key attribute is fully functionally dependent on the ENTIRE primary key No partial functional dependencies
Action Assertions
Business rules
Advantages of Partitioning
Efficiency: Records used together are grouped together Local optimization: Each partition can be optimized for performance Security: data not relevant to users are segregated Recovery and uptime: smaller files take less time to back up
ETL
Extract, transform, and load
First Normal Form
No multivalued attributes Every attribute value is atomic All relations are in 1st Normal Form.
Entity Integrity
No primary key attribute may be null. All primary key fields MUST have data
Data manipulation
Powerful SQL operations for retrieving and modifying data
Data Normalization
Primarily a tool to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of data The process of decomposing relations with anomalies to produce smaller, well-structured relations
Enterprise Keys
Primary keys that are unique in the whole database, not just within a single relation
Delete Rules
Restrict-don't allow delete of "parent" side if related rows exist in "dependent" side Cascade-automatically delete "dependent" side rows that correspond with the "parent" side row to be deleted Set-to-Null set the foreign key in the dependent side to null if deleting from the parent side
Most common data model
dimensional model
Pull Replication
receiving sites control when update messages will be processed
Application Program Interface
routines that an application uses to direct the performance of procedures by the computer's operating system
Designing Fields
smallest unit of application data recognized by system software
UDDI
- standard for creating and distributing Web services
Third Normal Form
2NF PLUS no transitive dependencies This is called transitive, because the primary key is a determinant for another attribute, which in turn is a determinant for a third
Decentralized Database
A collection of independent databases on non-networked computers
Service Oriented Architecture
A collection of services that communicate with each other, usually by passing data or coordinating a business activity
Data Mart
A data warehouse that is limited in scope
Transaction
A discrete unit of work that must be completely processed or not processed at all
Data Administration
A high-level function that is responsible for the overall management of data resources in an organization, including maintaining corporate-wide definitions and standards
Cloud computing
A model for creating ubiquitous, convenient, on-demand access to network services
Physical File
A named portion of secondary memory allocated for the purpose of storing physical records
Data steward
A person responsible for ensuring that organizational applications properly support the organization's data quality goals
Relation
A relation is a named, two-dimensional table of data A table consists of rows (records) and columns (attribute or field). Requirements for a table to qualify as a relation: It must have a unique name. Every row must be unique Attributes (columns) in tables must have unique names All relations are in 1st Normal form
Well-Structured Relations
A relation that contains minimal data redundancy and allows users to insert, delete, and update rows without causing data inconsistencies Goal is to avoid anomalies
SQL Environment Catalog
A set of schemas that constitute the description of a database
Distributed Database
A single logical database spread physically across computers in multiple locations that are connected by a data communications link
Data Warehouse
A subject-oriented, integrated, time-variant, non-updatable collection of data used in support of management decision-making processes
General rule of thumb
A table should not pertain to more than one entity type.
Database Administration
A technical function that is responsible for physical database design and for dealing with technical issues such as security enforcement, database performance, and backup and recovery
Candidate Key
A unique identifier. One of the candidate keys will become the primary key Each non-key field is functionally dependent on every candidate key.
Domain Constraints
Allowable values for an attribute
Thin Client
An application where the client (PC) accessing the application primarily provides the user interfaces and some application processing, usually with no or limited local data storage.
SQL Database Definition
CREATE SCHEMA-defines a portion of the database owned by a particular user CREATE TABLE-defines a new table and its columns CREATE VIEW-defines a logical table from one or more tables or views
Field design
Choosing data type Coding, compression, encryption Controlling data integrity
View Integration
Combining entities from multiple ER models into common relations
Data Control Language
Commands that control a database, including administering privileges and committing data
Data Definition Language
Commands that define a database, including creating, altering, and dropping tables and establishing constraints
Data Manipulation Language
Commands that maintain and query a database
Server
Computer (PC/mini/mainframe) that provides a service
Consolidation
Consolidating all data into a centralized database
Surrogate Keys
Dimension table keys should be surrogate
Master Data Management
Disciplines, technologies, and methods to ensure the currency, meaning, and quality of reference data within and across various subject areas
Vertical Partitioning
Distributing the columns of a logical relation into several separate physical tables
Horizontal Partitioning
Distributing the rows of a logical relation into several separate tables
Noncorrelated subqueries
Do not depend on data from the outer query Execute once for the entire outer query
Data dictionary
Documents data elements of a database
Data propagation
Duplicate data across databases, with near real-time delay
Data governance
High-level organizational groups and processes overseeing data stewardship across the organization
Disadvantages of Partitioning:
Inconsistent access speed: Slow retrievals across partitions Complexity: Non-transparent partitioning Extra space or update time: Duplicate data; access from multiple partitions
Normalization can
Increase chance of errors and inconsistencies Reintroduce anomalies Force reprogramming when business rules change
changed data capture
Indicates which data have changed since previous data integration activity
Query Efficiency Considerations
Instead of SELECT *, identify the specific attributes in the SELECT clause; this helps reduce network traffic of result set Limit the number of subqueries; try to make everything done in a single query if possible If data is to be used many times, make a separate query and store it as a view
Data Mining
Knowledge discovery using a blend of statistical, AI, and computer graphics techniques
Correlated subqueries
Make use of data from the outer query Execute once for each row of the outer query Can use the EXISTS operator
Transforming EER Diagrams into Relations
Mapping Regular Entities to Relations
Database Recovery
Mechanism for restoring a database quickly and accurately after loss or damage
Data integrity
Mechanisms for implementing business rules that maintain integrity of manipulated data
Database Security
Protection of the data against accidental or intentional loss, destruction, or misuse
Data federation
Provides a virtual view of data without actually creating one centralized database
Information Repository Dictionary System (IRDS)
Software tool managing/controlling access to information repository
SQL
Structured Query Language The standard for relational database management systems
System catalog
System-created database that describes all database objects
Data structure
Tables (relations), rows, columns
File Organizations
Technique for physically arranging records of a file on secondary storage
Concurrency Control
The process of managing simultaneous operations against a database so that data integrity is maintained and the operations do not interfere with each other in a multi-user environment
Schema
The structure that contains descriptions of objects created by a user
Online Analytical Processing (OLAP) Tools
The use of a set of graphical tools that provides users with multidimensional views of their data and allows them to analyze the data using simple windowing techniques
Functional Dependency
The value of one attribute (the determinant) determines the value of another attribute
Denormalization
Transforming normalized relations into non-normalized physical record specifications Benefits:Can improve performance (speed) by reducing number of table lookups
Client
Workstation (usually a PC) that requests and uses a service
Outer join
a join in which rows that do not have matching values in common columns are nonetheless included in the result table (as opposed to inner join, in which rows must have matching values in order to appear in the result table)
Equi-join
a join in which the joining condition is based on equality between values in the common columns; common columns appear redundantly in the result table
Join
a relational operation that causes two or more tables with a common domain to be combined into a single table or view
Web Services
a set of emerging XML-based standards that define protocols for automatic communication between software programs over the Web
Informational system
a system designed to support decision making based on historical point-in-time and prediction data for complex queries or data-mining applications
Operational system
a system that is used to run a business in real time, based on current data; also called a system of record
Index
a table or other data structure used to determine in a file the location of records that satisfy some condition
Range control
allowable value limitations (constraints or validation rules)
Natural join
an equi-join in which one of the duplicate columns is eliminated in the result table
Default value
assumed value if no explicit value
Referential integrity
constraint that ensures that foreign key values of a table must match primary key values of a related table in 1:M relationships
Union join
includes all columns from each table in the join, and an instance for each row of each table
Subquery
placing an inner query (SELECT statement) inside an outer query
Middleware
software that allows an application to interoperate with other software without requiring user to understand and code low-level operations
Joining
the process of combining data from various sources into a single table or view
Normalization
the process of decomposing relations with anomalies to produce smaller, well-structured relations
Selection
the process of partitioning data according to predefined criteria
Aggregation
the process of transforming data from detailed to summary level
Push Replication
updating site sends changes to other sites