WGU C192 Data Management for Programmers
OODM (object oriented data model)
A (logical) data model that captures the semantics of objects supported in object-oriented programming
Equijoin
A Theta join where the predicate F contains only equality (=)
package
A collection of procedures, functions, variables, and SQL statements that are grouped together and stored as a single program unit
internal schema
A complete description of the internal model, down to the physical storage of data. Each DB has exactly one.
timestamping
A concurrency control protocol that orders transactions in such a way that older transactions (transactions with smaller timestamps) get priority in the event of conflict
dimensional model
A database design composed of one table with a composite primary key, called the fact table, and a set of smaller tables, called dimension tables. All natural keys are replaced with surrogate keys.
data mart
A database that contains a subset of corporate data to support the analytical requirements of a particular business unit or to support users who share the same requirement to analyze a particular business process
temporal database
A database that contains time-varying historical data with the possible inclusion of current and future data and has the ability to manipulate this data
timeout
A deadlock technique where a transaction that requests a lock will wait for only a system-defined period of time before being automatically restarted
deadlock detection and recovery
A deadlock technique where the DBMS allows deadlock to occur but recognizes occurrences of deadlock and breaks them, usually through a Wait-For graph
deadlock prevention
A deadlock technique where the DBMS looks ahead to determine if a transaction would cause deadlock, and never allows deadlock to occur
star schema (star join)
A dimensional data model that has a fact table in the center, surrounded by *denormalized* dimension tables
starflake schema
A dimensional data model that has a fact table in the center, surrounded by normalized *and* denormalized dimension tables
snowflake schema
A dimensional data model that has a fact table in the center, surrounded by normalized dimension tables
fact constellation
A dimensional model, which contains more than one fact table sharing one or more conformed dimension tables
dynamic optimization
A form of query optimization where decomposition and optimization are carried out every time query is run. Selects best optimum strategy, but has a performance overhead.
static optimization
A form of query optimization where the query is parsed, validated, and optimized at once, similar to a compiler. Runtime overhead removed, but may no longer use optimum strategy when run.
two phase locking (2PL)
A locking protocol where all locking operations precede the first unlock operation in the transaction. A transaction must acquire a lock (read or write, as needed) on an item before operating on it. Once a lock is released, no new locks can be acquired.
dimensionality modeling
A logical design technique that aims to present the data in a standard, intuitive form that allows for high-performance access
base relation
A named relation corresponding to an entity in the conceptual schema, whose tuples are physically stored in the database
locking
A procedure used to control concurrent access to data. When one transaction is accessing the database, this may deny access to other transactions to prevent incorrect results
recursive relationship
A relationship type in which the *same* entity type participates more than once in *different roles*
operational data store (ODS)
A repository of current and integrated operational data used for analysis. May be a staging area for a data warehouse. Provides the ease of use of a RDBMS while remaining distant from the decision support functions of a data warehouse
nonserial schedule
A schedule where the operations from a set of concurrent transactions are interleaved
serial schedule
A schedule where the operations of each transaction are executed consecutively without any interleaved operations from other transactions
sagas
A sequence of (flat) transactions that can be interleaved with other transactions
row type
A sequence of field name/data type pairs that provides a data type to represent the types of rows in tables, so that complete rows can be stored in variables
shared subclass
A subclass with more than one superclass
data warehousing
A subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision making process
candidate key
A superkey such that no proper subset is a superkey within the relation (atomic superkey)
timestamp
A unique identifier created by the DBMS that indicates the relative starting time of a transaction
consistency
ACID component. A transaction must transform the database from one consistent state to another consistent state
atomicity
ACID component. The "all or nothing" property-a transaction is an indivisible unit that is either performed in its entirety or is not performed at all
durability
ACID component. The effects of a successfully committed transaction are permanently recorded in the database and must not be lost because of a subsequent failure
isolation
ACID component. Transactions execute independently of one another.
Columns, constraints, defaults
ALTER TABLE allows you to add and remove these elements from a table
discretionary access control (DAC)
Access control based on assigning individual rights to specific users via simple GRANT and REVOKE SQL statements.
mandatory access control (MAC)
Access control based on system-wide policies that cannot be changed by individual users. Each database object is assigned a security class, users assigned clearances. Rules determine whether a user can read/write an object based on their clearance.
external schema
Also called a subschema, this is a specific view of the data utilized by an end user
superkey
An attribute, or set of attributes, that uniquely identifies a tuple within a relation
savepoint
An identifiable point in a flat transaction representing some partially consistent state, which can be used as an internal restart point for the transaction if a subsequent problem is detected
data model
An integrated collection of concepts for describing and manipulating data, relationships between data, and constraints on the data in an organization (structural, manipulative, constraints)
conformed dimension
Any dimension that is to be represented in more than one dimensional model—and hence data mart
CREATE OR REPLACE PACKAGE BODY PackageName AS
Code to create a package body
CREATE OR REPLACE PACKAGE PackageName AS
Code to create a package specification
CREATE OR REPLACE PROCEDURE name (...) AS
Code to create a stored procedure
CREATE TRIGGER TriggerName When ON
Code to create a trigger
Fan trap
Connection trap where a model represents a relationship between entity types, but the pathway between certain entity occurrences is ambiguous
chasm trap
Connection trap where a model suggests the existence of a relationship between entity types, but the pathway does not exist between certain entity occurrences
link analysis
Data mining technique that aims to establish associations between the individual records, or sets of records, in a database
deviation detection
Data mining technique that identifies outliers, which express deviation from some previously known expectation and norm
predictive modeling
Data mining technique that uses observations to form a model of the important characteristics of some phenomenon
database segmentation
Data mining technique used to partition a database into an unknown number of segments/clusters of records that share a number of properties and so are considered to be homogeneous
temporal data
Data that changes over time
Corporate Information Factory (CIF)
Data warehouse approach by Inmon which begins with a data model of all data, creates the warehouse, then uses the warehouse to feed data marts. Uses traditional techniques.
Business Dimensional Lifecycle (BDL)
Data warehouse approach by Kimball, starts by identifying info requirements and business processes to create Data Warehouse Bus Matrix, which is then used to create data marts, which are finally integrated into a warehouse. Uses dimensionality modeling.
centralized approach
Database design approach where requirements for each user view are merged into a single set of requirements for the new database system. A data model representing all user views is created during the database design stage.
view integration approach
Database design approach where requirements for each user view are merged into a single set of requirements for the new database system. A data model representing all user views is created during the database design stage.
native XML database (NXD)
Defines a logical data model for an XML document that is used to store and retrieve XML. The XML document must be the logical unit of storage, although the underlying physical storage model may be different
vertical partitioning
Distributes the attributes of a relation across a number of (smaller) relations (the primary key is duplicated to allow the original relation to be reconstructed)
horizontal partitioning
Distributes the tuples of a relation across a number of (smaller) relations
disjoint constraint
EER constraint that describes the relationship between members of the subclasses and indicates whether it is possible for a member of a superclass to be a member of one, or more than one, subclass
participation constraint
EER constraint that determines whether every member in the superclass must participate as a member of a subclass
aggregation
EER technique that represents a "has-a" or "is-part-of" relationship between entity types, where one represents the "whole" and the other the "part."
composition
EER technique; a specific form of aggregation that represents an association between entities, where there is a strong ownership and coincidental lifetime between the "whole" and the "part."
specialization
EER technique; the process of maximizing the differences between members of an entity by identifying their distinguishing characteristics
generalization
EER technique; the process of minimizing the differences between entities by identifying their common characteristics
identify facts
Final step in dimensionality modeling, where you decide on the columns in the fact table (i.e., the facts)
referential integrity
If a foreign key exists in a relation, either the foreign key value must match a candidate key value of some tuple in its home relation or the foreign key value must be wholly null
True
In a CREATE TABLE statement, after specifying columns, remaining clauses can be proceeded by "CONSTRAINT name" to allow those to be dropped using ALTER TABLE
location transparency
In a DDBMS, allows a user to access data without caring where it's located, although they may need to be aware of fragmentation
distribution transparency
In a DDBMS, allows a user to perceive a DDBMS as a single logical entity
transaction transparency
In a DDBMS, ensures all distributed transactions maintain the distributed database's integrity and consistency
failure transparency
In a DDBMS, ensures atomicity for distributed transactions in events of failure
performance transparency
In a DDBMS, ensures it performs as well as a centralized DBMS
fragmentation transparency
In a DDBMS, when a user doesn't need to know data is fragmented
entity integrity
In a base relation, no attribute of a primary key can be null
exclusive lock (write)
Lock where a transaction can both read and update an item
shared lock (read)
Lock where a transaction can read an item but not update it
nested transaction model
Model where a transaction is viewed as a collection of related subtasks, or subtransactions, each of which may also contain any number of subtransactions
Locking and timestamping
Name two pessimistic concurrency control techniques
. (dot)
Operator used to reference items declared in a package
connection trap
Problem that arises in ER modeling when the meaning of a relationship is not fully understood nor clearly defined
shadow paging
Recovery protocol that maintains a current page table and a shadow page table. When a transaction completes, the current page table becomes the new shadow page table.
immediate update
Recovery protocol where updates are applied to the database as they occur without waiting to reach the commit point
deferred update
Recovery protocol where updates are not written to the database until after a transaction has reached its commit point
intersection
Relational algebra operation that defines a relation consisting of the set of all tuples that are in both R and S. R and S must be union-compatible.
difference
Relational algebra operation that defines a relation consisting of the tuples that are in relation R, but not in S. R and S must be union-compatible.
Division
Relational algebra operation that defines a relation over the attributes C that consists of the set of tuples from R that match the combination of *every* tuple in S
union
Relational algebra operation that defines a relation that contains all the tuples of R, or S, or both R and S, duplicate tuples being eliminated. R and S must be union-compatible.
Semijoin
Relational algebra operation that defines a relation that contains the tuples of R that participate in the join of R with S satisfying the predicate F.
Theta join (θ)
Relational algebra operation that defines a relation that contains tuples satisfying the predicate F from the Cartesian product of R and S. The predicate F is of the form R.ai θ S.bi, where θ may be one of the comparison operators (<, ≤, >, ≥, =, ≠).
Cartesian product
Relational algebra operation that defines a relation that is the concatenation of every tuple of relation R with every tuple of relation S
Outer join
Relational algebra operation that is a join in which tuples from R that do not have matching values in the common attributes of S are also included in the result relation. Missing values in the second relation are set to null.
Natural join
Relational algebra operation that is an Equijoin of the two relations R and S over all common attributes x. One occurrence of each common attribute is eliminated from the result.
rename
Relational algebra operation that provides a new name S for the expression E, and optionally names the attributes as a1, a2, . . . , an. (Greek letter rho)
projection
Relational algebra operation that works on a single relation R and defines a relation that contains a vertical subset of R, extracting the values of specified attributes and eliminating duplicates (i.e., selecting specific columns from a table)
selection (restriction)
Relational algebra operation that works on a single relation R and defines a relation that contains only those tuples of R that satisfy the specified condition (predicate)
CREATE ASSERTION name
SQL DDL command to create a constraint that is not directly linked with a table definition, using a CHECK statement
CREATE DOMAIN name AS type
SQL DDL command to create a new datatype based on an underlying existing type using a CHECK statement
CREATE SCHEMA name AUTHORIZATION user
SQL DDL command to create a schema
CHECK
SQL DDL keyword to add a domain constraint to a column
EXISTS (NOT EXISTS)
SQL keyword used to check only for the existence or nonexistence of rows in the subquery result table
ESCAPE
SQL keyword used to define an escape character
declare grain
Second step in dimensionality modeling, where you decide how granular you want to be in the fact table (e.g., an individual sale vs some aggregate)
serializability
Term for finding nonserial schedules that allow transactions to execute concurrently without interfering with one another, and thereby produce a database state that could be produced by a serial execution
degree (relation)
Term for the number of attributes a relation/table contains
cardinality (relation)
Term for the number of tuples in a relation/table. Changes over time.
view maintenance
Term for the process of updating a materialized view in response to changes to the underlying data
migrating rows
Term for the rows that enter or leave a view
intension
Term for the structure of a relation/table, together with a specification of the domains and any other restrictions on possible values
extension (state)
Term for the tuples of a relation, which change over time
view materialization
Term for when a DBMS stores the view as a temporary table in the database when the view is first queried, for performance
DOMAIN and TABLE
The SQL DDL ALTER command can only be used on these two types of objects
alternate keys
The candidate keys that are not selected to be the primary key
systems specification
The document that describes in detail the data to be held in the database and how the data is to be used
users requirements specification
The document that describes in detail the data to be held in the database and how the data is to be used
view
The dynamic result of one or more relational operations operating on the base relations to produce another relation; a virtual relation that does not necessarily exist in the database but can be produced upon request by a particular user, at the time of request
online analytical processing (OLAP)
The dynamic synthesis, analysis, and consolidation of large volumes of multidimensional data
select business process
The first step in dimensionality modeling, where you choose which part of the business to model.
fact finding
The formal process of using techniques such as interviews and questionnaires to collect facts about systems, requirements, and preferences
select business process, declare grain, choose dimensions, identify facts
The four steps in phase 1 of dimensional modeling
physical data independence
The immunity of the conceptual schema to changes in the internal schema
logical data independence
The immunity of the external schemas to changes in the conceptual schema
fact table
The main table in a dimensional model, composed of a composite primary key (made up of foreign keys) containing one or more numerical records
dimension table
The other tables in a dimensional model, composed of a noncomposite primary key referenced by the main table and containing descriptive textual info, with attributes used as the constraints in queries
growing phase
The phase where a transaction acquires all the locks needed but cannot release any locks
shrinking phase
The phase where a transaction releases its locks but cannot acquire any new locks
checkpoint
The point of synchronization between the database and the transaction log file. All buffers are force-written to secondary storage
conceptual modeling
The process of constructing a detailed architecture for a database that is independent of implementation details
logical modeling
The process of constructing a model of the data based on a specific data model, but that is independent of implementation details
logical database design
The process of constructing a model of the data used in an enterprise based on a specific data model, but independent of a particular DBMS and other physical considerations
Conceptual database design
The process of constructing a model of the data used in an enterprise, independent of all physical considerations
data mining
The process of extracting valid, previously unknown, comprehensible, and actionable information from large databases and using it to make crucial business decisions
physical database design
The process of producing a description of the implementation of the database on secondary storage.
closure
The property allowing expressions to be nested in the relational algebra, just as we can nest arithmetic operations
cascading rollback
The situation in which a single transaction leads to a series of rollbacks. Avoidable using rigorous 2PL or strict 2PL.
read, validation, write
The three phases of an optimistic concurrency control protocol
optimistic
These concurrency control techniques are based on the premise that conflict is rare, and so allow transactions to proceed unsynchronized, checking for conflicts only at commit
Pessimistic (conservative)
These concurrency control techniques cause transactions to be delayed in case they conflict with others at some future time
fact finding techniques
These include examining documentation, interviewing, observing the enterprise in operation, research, and questionnaires.
choose dimensions
Third step in dimensionality modeling, where you decide on dimension tables to set the context for asking about facts in the fact table
cursor
This acts as a pointer to a particular row of a query result. It allows the rows of a query result to be accessed one at a time.
conceptual schema
This describes all the entities, attributes, and relationships together with integrity constraints. Each DB has exactly one.
Normalization
This is used to test the correctness of a logical data model
True
True or false: The WHERE clause filters individual rows going into the final result table, whereas HAVING filters groups going into the final result table
True
True or false: When GROUP BY is used, each item in the SELECT list must be single-valued per group
conflict serializability
Type of serializability that orders any conflicting operations in the same way as some serial execution
livelock
When two transactions are left in a wait state indefinitely, unable to acquire any new locks, although the DBMS is not in deadlock
deadlock
When two transactions wait for locks on items held by the other
database and log files
Which parts of a database need to be backed up?