CMSC 508 Midterm
Overlapping
subclasses may have overlapping sets of entities
Disjoint
subclasses must have disjoint sets of entities
Tuple
records that correspond to the attributes; horizontal
Consistency constraints
refers to the requirement that any given database transaction must change affected data only in allowed ways. refers to maintaining and assuring the accuracy and consistency of data over its entire life-cycle,
Relation schema
relation name followed by a parenthesized list of attribute names with associated domains followed by constraints
Superclass and subclass
The derived class (the class that is derived from another class) is called a subclass. The class from which it's derived is called the superclass
Physical schema
defines how the conceptual schema is actually stored in data structures on disks
Data abstraction
entities
Total participation
every entity in R participates in at least one relationship in S
Atomic domain
indivisible
UML representation
is a way of visualizing a software program using a collection of diagrams.
Weak entity
is one whose existence is dependent on another identifying entity
Primary key
key in a relational database that is unique for each record. It is a unique identifier, such as a driver license number, telephone number (including area code), or vehicle identification number (VIN).
Schema
logical structure of the Database
Null value
null signifies unknown or does not exist o All comparisons involving null are false by definition o The result of any arithmetic expression involving null is null
Partial participation
only some entities in R participate in relationships in S
Atomicity
either all actions in a transaction occur, or none
Database languages
Procedural • Specifies what data is required and how to get those data o Declarative • Specifies what data is required without specifying how
Normalization
Process of organizing the columns and tables of a relational database to minimize data redundancy • Normalization decomposes a table into smaller tables without losing information • References are defined using primary keys and foreign keys
Projection
Projection (∏) • Outputs specified attributes from all rows of the input relation, and removes duplicate tuples from the output • Let R (A,B,C) Properties • Idempotent ∏A1,...,AN(∏B1,...,BM(R )) = ∏A1,...,AN(R ) where {A1,...,AN} ⊆ {B1,...,BM} • Distributive over set union ∏A1,...,AN(R P ) = ∏A1,...,AN(R ) ∏A1,...,AN(P ) Projection does NOT distribute over intersection and set difference
One-to-many relationship
1:M
Strong entity
An entity set that has a primary key
Physical data independence
Being able to change the physical schema without changing the logical schema
Query language
SQL
Instance
The actual content of a database at a certain point in time
One-to-one relationship
1:1
Logical schema
describes the data model
Generalization
A bottom-up design process to combine a number of entity sets that share the same features into a higher-level entity set • Specialization and generalization are simple inversions of each other; they are represented in an E-R diagram in the same way
Data models
A collection of tools for describing o Data o Data relationships o Data semantics o Data constraints
Database Management System
Collection of interrelated data Set of programs to access the data An environment that is convenient, efficient, and reliable
Relational data model
Collection of tables to represent both data and the relationships • Tables have multiple columns with unique names • Tables contain records (tuples) that correspond to the attributes • Tables are called relations
Derived attribute
Computed from other attributes o Example: age given the date of birth, GPA given grades
ACID properties
Consistency - making sure that if a DBMS starts in a consistent state, that it ends up in a consistent state. Isolation - One transaction does not affect other transactions Durability - If a transactions commits then that effect is persistent
Failure recovery
DBMS provides fault-tolerant design using transactions
Entity-relationship model
Data modelling as a collection of entities and relationships • Represented diagrammatically by an entity-relationship diagram • Unified Modeling Language (UML) • Entity sets are represented by a rectangular box • Relationship sets are represented by a diamond connecting related entity sets
Unary, Binary, Ternary relationship
Degree of a relationship
Database administrator (DBA)
Designs logical /physical schemas o Handles security and authorization o Data availability, crash recovery o Database tuning as needs evolve
Data inconsistency
Different rows should different content for an attribute that should have the same value
Aggregation
E-R model cannot express relationships among relationships • Suppose we want to record evaluations of a student by a guide on a project Eliminate this redundancy via aggregation without redundancy o Treat relationship as an abstract entity o Allows relationships between relationships o Abstraction of relationship into new entity • The following diagram represents: o A student is guided by a instructor on a project o A student, instructor, project set may have an evaluation To represent aggregation, create a schema containing o Primary key of the aggregated relationship o The primary key of the associated entity set o Any descriptive attributes eval_for (s_ID, project_id, i_ID, evaluation_id) The schema proj_guide is redundant!
Entity and entity set
Entity sets • Entities are represented by their name and a set of attributes • Entities are uniquely identified by the primary key • Foreign keys guarantee referential integrity • Entities are represented graphically as rectangles, listing the attributes. The primary key is underlined 4 Example: instructor = (ID, name, salary ) student = (ID, name, tot_cred)
Mapping cardinality
Express the number of entities to which another entity can be associated via a relationship set
Functional dependencies
Given a relation R, a set of attributes X in R is said to functionally determine another set of attributes Y, also in R, (written X → Y) if, and only if, each X value is associated with precisely one Y value • A functional dependency X → Y holds over relation R if, for every allowable instance r of R: o t1ϵ r, t2ϵ r, (t1) = (t2) implies (t1) = (t2) o if the X values agree, then the Y values must also agree • An FD is a statement about all allowable relations o Must be identified based on semantics of application o Given some allowable instances of R, we can check if they violate some FDs, but we cannot tell if holds over R!
E-R model and normalization
Identify main entities and relationships (list) 2. Populate entities with attributes, identify PK 3. Draw entities, relationships, and name the roles 4. Define the cardinality and participation of the relationships 5. Refine and simplify the model 6. Translate the model to relational schemas Common errors in ERDs • Missing cardinalities, participation, roles, diamonds (NOTATION!) • Excessive decomposition (attributes -> entities), specialization (car category), ternary relationships, weak entities • Age is NOT an attribute (birthdate is) process of organizing the attributes and relations to minimize data redundancy • Normalization involves decomposing a table into less redundant and smaller tables without losing information using foreign keys • The objective is to isolate data so that modifications of an attribute can be made in just one table and then propagated through the rest of the database using the defined foreign keys
Data-manipulation language
Language for accessing and manipulating the data SQL is the most widely used query language (declarative) Example: select name from instructor where instructor.ID = '22222' Application programs generally access databases through: • Language extensions to allow embedded SQL • Application program interface (e.g., ODBC/JDBC) which allow SQL queries to be sent to a database (drivers)
Many-to-one relationship
M:1
Many-to-many relationship
M:N
Joins
Natural join (⨝) • Let r and s be relations on schemas R and S respectively • Natural join of relations R and S is a relation on schema R S as: o Consider each pair of tuples tr from r and ts from s. o If tr and ts have the same value on each of the attributes in R S, add a tuple t to the result, where t has the same value as tr on r t has the same value as ts on s • Properties: associative and commutative Theta-join (θ) • R ⨝θS = σθ(R×S) • Selection σ meeting condition θ after cross product Equijoin • θ is the equality operator Outer join • An extension of the join operation that avoids loss of information • Computes the join and then adds tuples form one relation that does not match tuples in the other relation • Uses null values o null signifies unknown or does not exist o All comparisons involving null are false by definition o The result of any arithmetic expression involving null is null Inner join • Instructor ⨝ teaches Left outer join Right outer Join Full outer join
Object-based data model
Objects, classes and inheritance are directly supported in database schemas and in the query language • Extend the relational data model by including object orientation and constructs to deal with added data types • Allow attributes of tuples to have complex types, including nonatomic values such as nested relations • Preserve relational foundations, in particular the declarative access to data, while extending modeling power • Provide upward compatibility with existing relational languages
Query processor
Parsing and translation Optimization Evaluation • Cost difference between equivalent queries can be enormous
Relationship and relationship set
Relationships define associations among entities • Relationships may generate new attributes due to the association • Relationships are represented graphically as diamonds
Selection
Selection (σ) • Returns rows of the relation that satisfy the predicate • Let R (A,B,C,D) Properties • Idempotent σA(R ) = σA(σA(R )) • Commutative σA(σB(R ) ) = σB(σA(R )) • Breaking up σA ∧ B(R) = σA (σB(R)) = σB(σA(R)) σA ∨ B(R) = σA (R) σB(R)
Simple and composite attributes
Simple (atomic) and composite attributes (divisible) Example: VCU eID vs name (First name + last name)
Data-definition language
Specification notation for defining the database schema Example: create table instructor ( ID char(5), name varchar(20), dept_name varchar(20), salary numeric(8,2)) • DDL compiler generates a set of tables stored in a data dictionary • Data dictionary contains metadata o Database schema o Integrity constraints • Primary key • Referential integrity o Authorization
Entity-relationship data model
The ER data mode was developed to facilitate database design by allowing specification of an enterprise schema that represents the overall logical structure of a database • The ER model is very useful in mapping the meanings and interactions of real-world enterprises onto a conceptual schema • Because of this usefulness, many database-design tools draw on concepts from the ER model
Identifying relationship
The relationship associating the weak entity set with the identifying entity set
Specialization and generalization
Top-down design process: sub-groupings within an entity set that are distinctive from other entities in the set • These sub-groupings become lower-level entity sets that have attributes or participate in relationships that do not apply to the higher-level entity set • Depicted by a triangle component labeled ISA e.g., instructor "is a" person, student "is a" person, but attributes and relationships for instructor and student are different
Total and partial generalization
Total: every entity in the superclass must be a member of a subclass • Partial: an entity may not belong to any of the subclasses
Operations on relations
Union () • Let r and s be relations on schemas R and S respectively • 𝑅 𝑆 = 𝑥 𝑥 ∈ 𝑅 𝑜𝑟 𝑥 ∈ 𝑆} Intersection () • Let r and s be relations on schemas R and S respectively • 𝑅 𝑆 = 𝑥 𝑥 ∈ 𝑅 𝑎𝑛𝑑 𝑥 ∈ 𝑆} Difference(−) • Let r and s be relations on schemas R and S respectively • 𝑅 − 𝑆 = { 𝑥 ∈ 𝑅 | 𝑥 ∉ 𝑆} Cartesian product(×) • Output all pairs of rows from the two input relations • 𝑅 × 𝑆 = 𝑟, 𝑠 𝑟 ∈ 𝑅 𝑎𝑛𝑑 𝑠 ∈ 𝑆}
Attribute
Vertical on table
Semistructured data model
XML - Extensible Markup Language • Originally intended as a document markup language • The ability to specify new tags, and to create nested tag structures made XML a great way to exchange data • XML has become the basis for new generation data interchange • A wide variety of tools is available for parsing, browsing and querying XML documents/data
Attribute inheritance
a lower-level entity set inherits all the attributes and relationship participation of the higher-level entity set to which it is linked
Candidate key
a minimal set of attributes which is a key
Storage manager
a program that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system. • The storage manager is responsible to the following tasks: o Interaction with the file manager o Efficient storing, retrieving and updating of data • Issues: o Storage access o File organization o Indexing and hashing
Table
a set of data elements (values) using a model of vertical columns (identifiable by name) and horizontal rows, the cell being the unit where a row and column intersect.
Relation instance
a table with a column for each attribute which is atomic; each row represents an entity
Referential integrity constraint
all foreign key constraints are enforced.
Foreign key
an attribute of a relation which serves as a primary key of another relation
Superkey
any key including a set of all attributes of a relation
Multi-valued attributes
can have more than one value at a time for an attribute
Transactions
collection of operations that performs a single logical function (atomic) in a database application
Database schema
collection of relation schemas
Nonprocedural language
computer language that does not require writing traditional programming logic. Also known as a "declarative language," users concentrate on defining the input and output rather than the program steps required in a procedural programming language such as C++ or Java.
Application program
computer program whose primary purpose is entering and retrieving information from a computerized database. Early examples of database applications were accounting systems and airline reservations systems
Single-valued attributes
considered single-valued if there is at most one value associated with it at any one point in time
Metadata
container object that keeps together many different features of a database (or multiple databases) being described
Data dictionary
contains metadata o Database schema o Integrity constraints • Primary key • Referential integrity o Authorization
Concurrency control
controls the interaction among the concurrent transactions, to ensure the consistency of the database
Domain
set of allowed values for each attribute
Database instance
set of memory structures that manage database files
Relation
set of tuples (d1, d2, ..., dn), where each element dj is a member of Dj, a data domain. a table
Keys
set of zero, one, or more attributes. The value(s) of these attributes are required to be unique for each tuple (row) in a relation
Schema diagram
shows the relations, attributes, and keys
ER diagram
shows the relationships of entity sets stored in a database. An entity in this context is a component of data. In other words, ER diagrams illustrate the logical structure of databases
Procedural language
type of computer programming language that specifies a series of well-structured steps and procedures within its programming context to compose a program. It contains a systematic order of statements, functions and commands to complete a computational task or program.
Recursive relationship set
unary with roles
Degree of relationship set
unary, binary, ternary
Set operations
union, intersection, difference, product
Discriminator attributes
used to identify other attributes of a weak entity
Null value
value used to represent an unknown piece of data.
Lossless decomposition
• A decomposition of R into R1 and R2 is lossless join if at least one of the following dependencies is in F + : o R1 R2 R1 o R1 R2 R2 • The above functional dependencies are a sufficient condition for lossless join decomposition; the dependencies are a necessary condition only if all constraints are functional dependencies 4 R1 (R) R2 (R) = R
Two- and three-tier architectures
• Two-tier architectures - application runs on client machine, communicates directly with the DB running on the server. • Three-tier architectures - application code running on the client's machine communicates with with an application server at the server, but never communicates with the database itself.