DB 1
arity of a relation
the number of attributes it has
LInear hashing
the one that expands linearly (if index 0 is full, start at index M [2M + i where i is 0]. Can be maintained by a file-load factor instead of just overflows
Total participation with subclass
the superclass HAS to be one of the subclasses
Relation state
the tuples existing in a relation. A subset of the cross-products of all attribute domains
Seek time
time for disk head to find track
block transfer time
time needed to read info from block into main memory
Mappings
transforming requests/results between three DB levels
A database represents some aspect of the real world, sometimes called the
universe of discourse miniworld
A state that satisfies the structure and constraints specified in the schema is called a ________ state.
valid
Traditional file processing
writing custom programs to access data stored in files. User defines and implements files needed for a specific software app as part of programming the app
DBMS
- allows user to create database - specify types of data - relationships among data elements - constraints on what is stored
Data normalization is great, but sometimes we use _________ to improve the performance of queries, like storing pieces of data in multiple places, called _________________
1. controlled redundancy 2. denormalization
View
A composite of a subset of columns from one or more related tables. Contains virtual data derived from the database files but is not explicitly stored
Online transaction processing applications
A concurrently available application that needs to specify holds on data per user efficiently and correctly (online airline booking example)
Data model
A form of data abstraction / conceptual representation that supports basic operations (retrievals and updates on the database) and dynamic behavior (allows database designer to specify a set of valid operations allows on DB objects, like COMPUTE_GPA and stuff). Hides implementation details form users
Operation
A function on data built into the DB definitions. Specified as an interface (inputs and the operation name) and the implementation.
Stored Procedure
A set of SQL statements that can be parameterized and executed ("macro").
Table
A set of data records of the same format, divided into columns that all contain the same kind of data, and rows of related records.
Domain
A set of valid atomic values
Index
A type of access path, allows direct access to data using an index term or a keyword
Four Desirable Properties of Transactions
ACID properties Atomicity Consistency preservation Isolation Durability
Partial participation constraint
An entity can exist without a relationship to the other entity
Total participation constraint
An entity cannot exist without a relationship to the other entity
Dense vs sparse index
An entry for every value versus entries for just some
SQL CHECK Clause
Applies to each tuple individually Example: Dnumber INT NOT NULL CHECK (Dnumber > 0 AND Dnumber < 21); CHECK (Dept_create_date <= Mgr_start_date);
Relationship
Association among entities
Uniqueness constraint
Attributes must be distinct between entities.
Difference between B+ tree and B tree
B tree contains pointers at intermediate levels, B+ tree only has data pointers in leaves. Leaves point to their right neighbors too in B+ trees
SQL Create statement
CREATE TABLE [name] (columns, constraitns, etc);
Which of the CRUD operation(s) result in a change of state in a database?
CUD (not retrieve)
A row in a relational model
Called a tuple. represents a collection of related data values. Typically corresponds to real-world entity or relationship.
Definitions of attributes
Can be an ordered list (this is customary) or an unordered list of (attribute, value) pairs.
Data Independence
Capacity to change the schema at one level of a database system Without having to change the schema at the next higher level
Types of end users
Casual end users (occasionally access the DB, usually through specific interfaces) Naive/parametric end users (constantly query/update the DB using canned transactions, like bank tellers and stuff) Sophisticated end users (engineers/scientists/etc that use the DBMS often to execute complex requirements) Standalone users (maintain personal DBs and use GUIs and packages, like someone using a financial software package)
Query processing/optimization
Chooses an efficient query plan for each query based on storage structures
Category
Class that is a subset of the union of n defining superclasses
Subclass
Class whose entities must always be a subset of the entities in another class
Modules and who develops them?
Components of a DBMS for implementing the catalog, querying, interface processing, accessing data. Developed by DBMS System designers/implementers
Semantic integrity constraints
Constraints on the type of values for an attribute
What are nine advantages of using a DBMS and the capabilities that a good DBMS should possess?
Controlling redundancy. Restricting unauthorized access. Providing persistent storage for program objects. Providing storage structures and search techniques for efficient query processing. Providing backup and recovery. Providing multiple user interfaces. Representing complex relationships among data. Enforcing integrity constraints. Permitting inferencing and actions using rules and triggers
Buffering/caching
DBMS often uses this to keep parts of the database in memory to avoid queries loading/unloading data from memory
Who are the three categories of people whose jobs are associated with the design, development, and operation of the DBMS software and system environment.
DBMS system designers and implementers. Tool developers. Operators and maintenance personnel.
In many DBMSs where no strict separation of levels is maintained, this one language is used by the DBA and database designers to define both the conceptual and internal schemas. Otherwise, it is used to specify the conceptual schema only.
Data Definition Language (DDL)
Once the database schemas are compiled and the database is populated with data, users must have some means to manipulate the database. Typical manipulations include retrieval, insertion, deletion, and modification of the data. This language provides these manipulations.
Data Manipulation Language (DML)
Self-describing data
Data contains item names and data values together in one structure (usually used in NOSQL)
Mirroring/Shadowing
Data is written redundantly to two discs that are treated as one logical disc.
Who are the four categories of people whose jobs involve the day-to-day use of a large database
Database administrators, database designers, end users, and system analysts and application programmers (software engineers).
Storage definition language
Define internal schema only
Data definition language (DDL)
Define the internal and conceptual schema
The DBMS is a general-purpose software system that facilitates what four processes that involve databases?
Defining, constructing, manipulating, and sharing databases
Physical data model
Describe how files are stored on computer.
Atomic
Describes a value that is indivisible
External/view level
Describes part of the database that a particular user group is interested in
Internal level
Describes physical storage structure of the database
Conceptual level
Describes structure of the whole database for a community of users
Primary key
Designated among candidate keys. Other candidate keys are designated as unique keys
Predicate-defined subclass
Displayed by writing the predicate condition next to the line that connects the subclass to the specialization circle. An attribute equals a particular value.
What constraints can Insert violate?
Domain constraints Key constraints Entity integrity Referential integrity
Weak Entities
Don't have key attributes of their own. They are identified by being related to another entity. Always has total participation constraint. If parent entity is deleted, all related weak entities are deleted too.
Backup and recovery system
Ensures DB can recover, such as, in the middle of a transaction or by disk backup etc
Isolation property of transactions
Ensures that each transaction appears to execute in isolation from other transactions
Atomicity property of transactions
Ensures that either all the database operations in a transaction are executed or none are
Specialization hierarchy
Every subclass participates as a subclass in ONLY ONE class/subclass relationship
Transaction
Executing process that includes multiple DB accesses
A Primary Key must be composed of exactly one Attribute in a Relation schema (TRUE/FALSE)
FALSE
A database whose state violates only one Integrity Constraint may still be possible to be in a valid state. (TRUE/FALSE)
FALSE
Tuples in a relation are ordered, TRUE OR FALSE
FALSE. A relation state is a SET of tuples
Sequential file
File containing ordered records
Heap/Pile file
File containing unordered records
Difference between STUDENT and STUDENT(Name, Ssn, ...)
Former is the current state and latter is only the schema
Multiple hashing
Hashing collision method that applies a second hash function if a collision happens. If that collides, use open addressing or a third hash function
Chaining
Hashing collision resolution method that dumps records in an overflow location and sets a pointer of the occupied hash address location to the address of the overflow location
Open addressing
Hashing collision resolution method that proceeds in order from the specified address and checks each position until an open one is found
The overhead costs of using a DBMS are due to which three general areas?
High initial investment in hardware, software, and training. The generality that a DBMS provides for defining and processing data. Overhead for providing security, concurrency control, recovery, and integrity functions.
Entity-relationship ER model/diagram
High-level conceptual model and diagrammatic notation associated with the ER model
Two types of distributed DDBMS software
Homogenous (same DBMS at all sites) and heterogenous (different DBMS at each site)
Database designers
Identifying data to be stored and choosing structures to represent/store the data. Develop views. Collect requirements for the entities to be stored
Deductive database systems
Inferring new information from the stored database facts
Types of Model Constraints
Inherent model-based (constraints inherent in the data model) Schema-based (expressed in schemas of data model by specifying them in the DDL) Application based (business rules not expressed in the schema, but expressed by application program)
What four main types of actions involve databases?
Insert (Add) Delete (Remove) Modify (Update) Retrieve (Query)
Program-operation independence
Interface includes operation name and data types of its arguments, so implementation can be changed without affecting interface
What are the three levels of the Three-Schema Architecture?
Internal level Conceptual level External level
Levels of three-schema architecture
Internal level Conceptual level External/view level
Basic SQL constraints
Key and referential integrity constraints Restrictions on attribute domains and NULLs Constrains on individual tuples within a relation
List four common Database System Utilities discussed in Chapter 2.
Loading (load data files into DB) Backup DB storage reorganization (reorganize DB or make new access paths) Performance monitoring
What are the two types of data independence?
Logical data independence Physical data independence
Of the two types of Data Independence, which is harder to achieve? Why?
Logical data independence is harder to achieve because it allows structural and constraint changes without affecting application programs.
List the nine DBMS Interfaces discussed in the Chapter 2.
Menu-based for web clients or browsing Apps for mobile devices Forms-based GUIs Natural Language Keyword-based DB search Speech I/O Interfaces for Parametric users Interfaces for DBA
Stored Procedures
More involved procedures to enforce rules
Three things DBMS offer
Multiuser usage Encryption and authentication API to interact with it
Entity integrity constraint
No NULL primary key
Key constraints
No two tuples can have the same combination of values for all their attributes
Why include class/subclass specializations in the model?
Not all attributes/relationships may apply to all members of the entity type
What constraints can Delete violate
Only referential integrity. CASCADE, RESTRICT, SET NULL/DEFAULT
Schema evolution
Ops that change the schema. Modern DBMS makes it possible to do so while the DB is operational
Tools and who develops them?
Optional software packages that facilitate DB modeling and design, DB system design, and improved performance. Tool developers develop these.
End users
People whose jobs require access to DB for querying, updating, generating reports.
What are five additional implications of using the database approach, i.e. those that can benefit most organizations
Potential for enforcing standards. Reduced application development time. Flexibility. Availability of up-to-date information. Economies of scale.
How to specify a relation schema R
R(A1, A2, A3... An) where Ai is attribute i (name of a role played by domain D in the relation schema R)
Know the following
Referential integrity constraint Uniqueness/Key constraint Business rules
Attribute
Represents some property of interest about an entity
Security and authorization subsystem
Responsible for creating username/passwords for accessing the database according to certain access restrictions
Triggers
Rule activated by updates to the table
What are the four main characteristics of the database approach versus the file-processing approach?
Self-describing nature of a database system. Insulation between programs and data, and data abstraction. Support of multiple views of the data. Sharing of data and multiuser transaction processing.
Relational database schema
Set of relation schemas S = {R1, R2, ..., Rm) and a set of integrity constraints IC.
Relational database state
Set of relation states DB = { r1, r2,.... rm}
Class
Set or collection of entities.
In spite of the advantages of using a DBMS, there are a few situations in which a DBMS may involve unnecessary overhead costs that would not be incurred in traditional file processing. These four situations are:
Simple, well-defined database applications that are not expected to change at all. Stringent, real-time requirements for some application programs that may not be met because of DBMS overhead. Embedded systems with limited storage capacity, where a general-purpose DBMS would not fit. No multiple-user access to data.
Union constraint on specialization
Single superclass/subclass relationship with more than one superclass (disjoint)
Privileged software
Software for applying big changes to the DB like creating new accounts
Spanned vs unspanned
Spanned -- records can span more than one block, with the tail end of a block pointing to the rest of the record in the next block Unspanned -- if there's not enough space for one more, don't store another one
Canned transaction
Standard DB operations like queries and updates
In databases where a clear separation is maintained between the conceptual and internal levels, this language is used to specify the internal schema.
Storage Definition Language (SDL)
Program-data independence
Structure of data files is stored in DBMS catalog separately from access programs
What features are in EER diagrams that aren't in ER diagrams
Sub-/super-classes Specialization/generalization Category/union type Attribute/relationship inheritance
User-defined subclass
Subclass is not defined by a predicate
Specialization lattice
Subclasses can be a subclass of more than one class/subclass relationship (multiple inheritance).
Overlap constraint on specialization
Subclasses of a specialization don't have to be disjoint
Disjointness constraint on specialization
Subclasses of a specialization must be disjoint (only one specialization per entity)
Key
Superkey of Relation r.
Query optimizer
Takes compiled query/form and makes an efficient search plan
Dynamic hashing
The one that looks like a binary tree that splits on bit values until the leaf node points to the address of the bucket
Extendible hashing
The one with a directory indexed by higher order bits. Index by the first d bits, but each bucket can be indexed by the first <d bits until they overflow, at which point, they're split.
How do you represent inheritance in an EER diagram
The pitchfork U where the forked end points toward the parent
Relationship participation constraints
Total and partial
A Primary Key has a unique value for each Tuple (i.e. Row) in a given Relation (i.e. Table). (TRUE/FALSE)
True
What two properties does a key satisfy
Uniqueness (no two tuples can have the same values for all attributes in key) Minimal superkey (cannot remove any attributes and still have uniqueness constraint hold)
Do key attributes have to be unique?
Uniqueness property must hold for every entity set of the entity type by key attributes
How do you match queries on aggregate values?
Use the HAVING keyword (after WHERE)
What constraints can update violate
Usually causes no problems unless the primary/foreign key is being updated
Referential integrity constraint
Value of Foreign key in a tuple is the primary key of another tuple OR is NULL
In true Three-schema architectures, this language specifies user views (perspectives) and their mapping to the conceptual schema.
View Definition Language (VDL)
Correlated nested query
When a condition in the where clause of a nested query references an attribute of a relation declared in the outer query, the two are correlated
Does the order of table creation matter?
Yes, when foreign keys are involved
Attribute-defined subclass
You can tell the specialization of a class by a particular attribute that the subclass may have.
Relational model represents data as
a collection of relations
Consistency of a transaction
a correct execution of this transaction moves the DB from one consistent state to another
low-level/procedural DML
a data sublanguage embedded in general purpose host language. uses record-at-a-time processing to work on one record at a time
High-level/nonprocedural DML
a query language used on their own to specify complex DB ops at batch levels. Set-at-a-time. DECLARATIVE LANGUAGE because it says what to retrieve instead of how
Valid state
a state that satisfies ALL integrity constraints
Overflow/transaction file
a supplementary file to sequential files -- insertions are just thrown in there and merged in later
Atomicity of a transaction
a transaction is an atomic unit of processing; it is either performed in its entirety or not performed at all.
Isolation of a transaction
a transaction should not make its updates visible to other transactions until it is committed; this property, when enforced strictly, solves the temporary update problem and makes cascading rollbacks of transactions unnecessary
Centralized DBMS
all DBMS ops executed on ONE machine
Data manipulation language
allows CRUD
Schema construct
an object in the schema (which I think is a table)
rotational delay
beginning of block rotates until head finds it
RAID 6
block striping with TWO parity blocks
RAID 5
block striping with parity block distributed
Does DBMS allow you to manipulate the schema or the actual data?
both!
Logical data independence
capacity to change the conceptual schema without having to change external schemas or application programs
Physical data independence
capacity to change the internal schema without having to change the conceptual schema
High-Level/Conceptual data models
closer to the way users perceive data
Entity type
collection of entities that have the same attributes
What is a DB
collection of related data over a miniworld that has inherent meaning
Stored data manager
conducts low-level IO between main-memory and disk
Virtual data
data derived from the DB files but not explicitly stored
A database design that stores each logical data item in only one place in the database is called...
data normalization
Specialization
defining a set of subclasses for an entity type defined on the basis of some distinguishing characteristic of the entities in the superclass. They can define specific attributes or specific relationship types
Low-level/physical data models
describe the details of how data is stored on computer storage media
Metadata
describes data and is stored in the DB
System analysts
determine requirements of end users and create canned transactions for them. Also application programmers, software developers, software engineers
Data striping
distributing data over multiple discs to make them look like a single, large, fast disc (bit-level striping, block-level striping, etc)
Representational data models
easily understood by end users also similar to how data is organized in computer storage hides many detail of data storage on disk, but can be implemented on a computer system directly
Query compiler
evaluates queries typed in interactively.
RAID 1
exact copy of data on both discs. No striping
Runtime DB Processor
executes all queries and updates catalog with statistic changes
Blocking Factor
for fixed-length records of size R and disc of block size B, floor(B/R) which is the number of records per block.
Traditional database systems often suffered from the so-called [a] problem, since the data structures provided by the DBMS were incompatible with the programming language's data structures.
impedance mismatch
Three types of constraints on entities
implicit (inherent in the data model) explicit (defined by the schema) semantic (contained in the business rules)
Clustering index
index on non-distinct key (that the table is ordered on) and has one entry per distinct value -- points to the block where that distinct value first shows up. BAD PERFORMANCE DUE TO REDIRECTION
The DBMS must provide specialized data structures and search techniques to speed up disk search for the desired records. Auxiliary files called _____ are often used for this purpose.
indexes
Primary index
indexes on primary key and has one entry per DISK BLOCK. entry is for the first record in that block
Column names in a relational model
interpret meanings of the values in tuple attributes
The DBMS stores the descriptions of the schema constructs and constraints—also called the ________ —in the DBMS catalog so that DBMS software can refer to the schema whenever it needs to.
metadata
Type of higher-level implementation data model
object-data model
Secondary index
on a potentially non-ordered potentially non-distinct value. dense index that points to the BLOCK where the value is contained -- indexed values are stored in order in index
Durability of a transaction
once a transaction changes the DB and the changes are committed, these changes must never be lost because of subsequent failure
Client module vs Server module
one runs on user-facing end, other is responsible for data storage, access, etc
RAID 0
only data striping. No redundancy
Stored procedures
part of the DB definition and are invoked when certain conditions are met
active DB system
provides active rules that auto initiate actions when certain events/conditions occur
Open Database Connectivity (ODBC)
provides an API for direct DB access that can run on the client side (LIKE JDBC)
Entity
real-world object or concept
RAID
redundancy (reliable), data striping (performance), parity (error detection).
Candidate key
relation schema can have more than one key -- any attribute is a candidate key
Operators and maintenance personnel
responsible for actual running and maintenance of the hardware and software environment for the DB system
Database administrator
responsible for administering the database, the DBMS, and related software. Responsible for authorizing access to database, coordinating and monitoring its use, and acquiring software and hardware resources as needed
In a data model, it is important to distinguish between the description of the database and the database itself. The description of a database is the database _____________
schema
access path
search structure that makes the search for a particular DB record efficient, such as indexing or hashing
A DBMS should provide a ________, which the DBA uses to create accounts and to specify account restrictions.
security and authorization subsystem
Relation
set of tuples of arity n. Each tuple t is an ordered list of n values, where each is an element in the domain of Ai or a NULL value.
Concurrency control
several users trying to update the same data do so in a controlled manner and the result of the updates is correct
View definition language
specifies user views/mappings to conceptual schema
DBMS catalog (and what does it store)
stores info such as the structure of each file, the type and storage format of each data item, and various constrains on the data -- called METADATA
Access path
structure that makes the search for particular database records efficient
Schema
structure/framework of the DB
precompiler
takes DB queries written in progrzmming languages and extracts DML commands while sending the rest of hte program to the compiler
Deductive DB systems
the ability to deduce/infer new information from stored DB facts. Done by specifying declared rules.
database state
the data in the database at a point in time
What is used to classify a DBMS
the data model (relational versus NOSQL) number of users (single vs multiuser systems) number of sites the DB is distributed over (centralized vs distributed) cost (opensource versus other) Access path options (reverse structures vs others) General vs specific purpose
Intension
the database schema
Extension
the database state
Superkey
the entire tuple
relational data model, network/hierarchal models, record-based data model
the first is new, the next two are old, and the last is another representational data model