Database Management Systems

Ace your homework & exams now with Quizwiz!

design of DML

(not done)

Boolean searches

(not done) and, or, not, George Boole

element-element associations

(not done) (compare with element-attribute associations)

element-attribute associations

(not done) (compare with element-element associations)

design of DDL

(not done) (in reference to implementation of eleme)nt associations

fragmentation of files

(not done) (to optimize storage) and traversal of the fragmented files (vertical and horizontal fragmentation)

heuristics of database design

(not done) (what are you interested in)

database locking (or records)

(not done) as aside, deadlock, livelock

entity/attribute model

(not done) as compared to hierarchical, network and relational models

interface between db design, data structures, and telecommunications

(not done) data distribution and processing, levels of database usage (operational vs decision support

History of database design

(not done) hierarchical model, network model, relational model, comparison with entity/attribute model

Programming a DBMS

(not done) interpretation of DDL saving the interpretation in a table loading data in batch mode creation and testing of the DML: query (find / search), insertion, deletion, update values (change)

File organization techniques

(not done) related to the key in elementary sets, attribute fields, key in relationship sets

speedy access

(not done) sequential access vs random or direct, all data vs subsets, inverted file

Representation of objects

(not done) set theory and it's application to database to database design sets relationships attributes

transaction

- a piece of work - a collection of operations that performs a single, logical function in a database operation. each transaction is a a unit of both atomicity and consistency - is required to have the ACID properties (atomicity, consistency, isolation, durability) - is ended in SQL with a "commit work" or "rollback work" statement - in log-based schemes, is considered to have committed when it's last log record - the commit log record - has been output to stable storage.

race condition

- an undesirable condition that occurs when a device or system attempts to perform two or more operations at teh same time - occurs when two or more threads can access shared data and try to change it at the same time (one gets there first). Because the thread scheduling mechanism can swap threads at any time, you don't know the order in which the threads will attempt to access the shared data. - which gets done first? (in terms of locking)

select operator

- horizontal subset - that is, serves to shrink the table vertically by eliminating unwanted rows (tuples) - [not SQL SELECT, but relational algebra]

project operator

- vertical subset - serves to shrink the table horizontally by removing unwanted columns - most commercial implementations of SQL do not support a project operation, and projections are achieved by specifying the columns desired in the output. e.g. FROM table

partial dependency

-where only part of the key is sufficient to identify the tuple. e.g. in "{SSN, age} -> name" the SSN alone is sufficient to determine the name.

SQL parts

DDL to define schemas, delete relations, and modify relation schemas DML to query information from a database, insert tuples, delete tuples, modify tuples Integrity - commands to specify integrity constraints that the data stored in the db must satisfy. Updates that violate constraints are disallowed. View definition - sql includes commands for defining views Transaction Control - commands for specifying the beginning and ending of transactions Embedded SQL and dynamic SQL - these define how SQL statements can be embedded within a general-purpose programming language such as C, C++ and Java. Authorization - SQL DDL includes commands for specifying access rights to relations and views.

Physical level

How data are stored. e.g. an instructor, department, or student record can be described as a block of consecutive storage locations

natural join

Unlike the Cartesian product of two relations, which concatenates each tuple of the first relation with every tuple of the second, natural join considers only those pairs of tuples with the same value on those attributes that appear in the schemas of both relations.

extension

We use the term _______ of the entity set to refer to the actual collection of entities belonging to the entity set.

referential integrity constraint

a constraint that requires that the values appearing in a specified group of attributes of any tuple, also appear in specified attributes of at least one tuple in the referenced relation

tuple

a sequence (a list) of values

commit

a transaction that completes its execution successfully is said to be "committed"

abort

a transaction that fails to execute successfully is aborted. an aborted transaction musd be rolled back in one of two ways; restart transaction if it was aborted due to a hardware or software eroror not created through the internal logic of the transaction kill transaction if it

multi-valued attributes

a user may have multiple phone numbers, so rather tahn put multiple values in one cell, maybe we'd put mlutiple rows, repeating the info, varying only in the telephume number row.

join operator

allows the dynamic linking of two tables that share a common value - the join operation is achieved by stating the selection criteria fo rtwo tables and equating them with the common columns

assertions

any condition that the database must always satisfy. domain constraints and special forms are assertions.

hash indices

based on a uniform distribution of values across a range of buckets. the bucket to which a value is assigned is determined by a function, called a hash function

file header

bytes allocated at beginning of file which contain variety of information about file. for now we store only the address of the first record we deleted

SQL types

char(n), fixed length character. varchar(n), variable length character with maximum length n. int - a finite subset of the integers that is machine dependent smallint - a machine-depended subset of the integer type numeric(p,d) - fixed-point number with user-specified precision. consists of p digits (plus a sign), and d of teh p digits are to the right of the decimal point. e.g. numeric(3,1) allows 44.5 to be stored exactly, but neither 444.5 or 0.32 can be stored exactly in a field of this type. real, double precision. floating-point and double-precision floating-point numbers with machine-depended precision. float(n) - a floating-point number with precision of at least n digits

instantaneous commit

commits instantaneously, there is an option to delay commitment ...

functional dependencies

constraints

phantom phenomenon

e.g. SELECT name FROM instructor WHERE salary > 90000 - - and followed by -- Insert 'James', salary =100000. This would insert a value that would be missed by the query

multi value attribute

e.g. various phone numbers assigned to one person

transaction management

ensures that the database remains in a consistent (correct) state despite system failures. ensures that concurrent transaction executions proceed without conflicting.

relation

in the relational model, a relation refers to a table

Three levels of database design

internal external conceptual

multiple granularity locking

locks are set on objects that contain other objects. MGL explains the hierarchical nature of the "contains" relationship. e.g.

file

organized logically as a sequence of records, these records are mapped to disk blocks

Technical issues

privacy, integrity, security or data, data compaction

project

projects only certain (or all of the?) columns (vertical) and all the rows

SQL operators

select, project, join

livelock

situation where 2 or more processes carry out actions, but are unable to complete their task because of interaction with the other process(es).

decomposition

the act of putting single value attributes in one table, multi-value attributes in another

relational database scheme

the collection of relations used to represent a set of information, while the current values/instance of the corresponding relations is called the (relational) database.

relational database

the current values corresponding to the relations of a given schema (compare with relational database schema)

throughput

the number of transactions executed in a given amount of time.

intension

the permanent part of the relation (meaning/the schema). What is specified by in the relational schema, and not any instance (elements), but a definition of all permissible elements (a structure and set integrity constraints). compare with "Extension"

relational model

uses a collection of tables to represent both data and relationships among those data. it is record-based.

dependency preservation

when decomposing, don't break up functional dependencies

deadlock

where at least two threads each hold a lock that the other thread holds and will not give up until it has acquired the other lock.

indexing techniques

no technique is best for particular applications. each must be evaluated on the basis of: access types access time insertion time deletion time space overhead

Durability

Property that ensures that once a transaction has been committed, it will remain so, even in the event of a crash, power loss, or errors. (e.g. you want your account balance to not be wiped out at the end of the day)

sequential file organization

Records are stored in sequential order, according to the value of a "search key" of each record. out-of-order records are chained together [to maintain order/searchability] To allow fast random access, we use an index structure.

data model types

Relational Model Entity-Relationship Model Object-Based Data Model Semistructured Data Model

entity

a "thing" or "object" in the real world that is distinguishable from other objects. entities in a database are described by a set of attributes

primary key

a candidate key (minimal set of attributes needed to uniquely identify a tuple/row) from a relation chosen by the DBA as the unique identifier

database system

a collection of interrelated data and a set of programs that allow users to access and modify these data. db subsystems include storage manager and query processor

hash function

a database indexing function used to determine to which bucket a value will be assigned in the index

database instances and schemas as programming instances/classes

a database schema corresponds to classes and variables (the definition/design of the class) in a program. each variable has a specific value at a given instant. The value of the variables at a specific point are an instance of the database schema.

Integrity problems

a disadvantage of file processing systems vs DBMS -

storage manager data structure implementations

as part of the physical implementation of the system, the storage manager implements several data structures: Data files Data dictionary Indices

Logical level

at a logical level a record is described by a type definition: e.g . type instructor = record id : char (5); name : char(20); salary : numeric(8,2); end;

ordered indices

based on sorted ordering of the values - dense index - sparse index

Indexing

basically two types of indices (for accellerating searches on DB) Ordered indices Hash indices

View level

computer users see application programs that hide the details of the data. at the view level, several views of the database are defined, and a database user sees some of all of these views.

serializability

ensures consistency: 2 phases - growing/expanding phase lock x lock y - shrinking phase guaranteed consistency, but .... but what?

schema

- relationship description - the overall design of the database the description of a database, may be visualized with a schema diagram analogous to a "class" definition in programming, with such things as variables and variable types defining/creating parameters for (a model) for the class

storage manager components

Authorization and integrity manager Transaction manager File manager Buffer manager

File Processing System disadvantages vs DBMS

Data redundancy and inconsistency Difficulty in accessing data Data isolation Integrity problems Atomicity problems Concurrent access anomalies Security problems

data abstraction levels

Physical Level (*How* data stored) Logical Level (*What* data stored) View Level (exists to enable us to see only what we want, filtering out the rest of the database?)

Automicity problems

a disadvantage of file processing systems vs DBMS - if a transfer of $500 is to happen, the system may not fail between the start (remove $500 from account A) and end (add $500 to account B). The funds transfer must be "atomic"; meaning, it must happen in it's entirety or not at all.

Data Redundancy and Inconsistency

a disadvantage of file processing systems vs DBMS - if created over long periods of time, different programmers create different structures, maybe written in different languages. Data may be replicated redundantly recorded; e.g. a student who is both Music and Math major, may have name/telephone in both the Music dept, file, and same entries in Math dept. file. Plus, when data is updated one place, it may not be updated in another, becoming inconsistent.

Difficulty in accessing data

a disadvantage of file processing systems vs DBMS - maybe a program to access "all students" was written, but not one for "all students in zip code X". a new program may need to be written, or get "all students" and manually extract the information

Security problems

a disadvantage of file processing systems vs DBMS - not every user should be able to access all data. E.g. only payroll personnel may need to see certain financial records.

Concurrent access-anomalies

a disadvantage of file processing systems vs DBMS - say two students concurrently try to register for a class with 39 ppl, and say it's limited to 40 people. if both add at the same time and they both receive feedback that the class now has 40 ppl, both the feedback is incorrect, and capacity has been exceeded.

hashing file organization

a hash function is computed on some attribute of each record. the result of the hash function specifies in which block of the file the record

candidate key

a minimal superkey, that is, a set of attributes that forms a superkey, but none of whose subsets is a superkey. One of the candidate keys of a relation is chosen by the DBA as the primary key

n-tuple

a relationship between n number of values. corresponds to number of columns/attributes in a table/relation. A 4-tuple, is a row in a relation with 4 columns. Each row is a tuple.

serializable schedule

a schedule (of transactions on data in a database) in which concurrent transactions have the same final effect as if they'd been performed in serial (one after another). That is, they don't F-up the output if done together, so they can be "serialized"

superkey

a set of one or more attributes which taken collectively are guaranteed to uniquely identify a tuple/entity (a single row in a table/relation) e.g. a SSN is a superkey, but a name (not unique) is not

relationship

an association among several entities (a table is also called a relationship, and it contains a number of entities/rows/records)

search key

an attribute or set of attributes used to look up records in a file (we often have more than one index for a file. E.g., we may wish to search for a book by author, by subject, or by title.)

domain

an attribute's set of permitted values (e.g. the set of all the names of instructors in a table where "names" is an attribute)

dense index

an index entry appears for every search-key value in the file

sparse index

an index entry appears only for some of the search-key values.

clustering index

an index whose search key also defines the sequential order of the file. (also called a primary index)

schema diagram

an object in a schema (e.g. Student, Course, Section in image) is a "schema construct" record type names, data items may be illustrated. A schema diagram does not include the data types

database

an organized body of related information

DBMS

database management system - a collection of interrelated data and a set of programs to access those data - primary goal of DBMS is to provide a way to store and access database information that is both convenient and efficient

storage manager

database system component responsible for storing, retrieving, and updating data in the database. provides interface between low-level data stored in the db (e.g. data files from physical implementation) and the application programs and queries submitted to the system.

Design Process overview

design of database schema design of programs that access and update the data design of a security scheme to control access to data

concurrency

enables better performance than serial processes (if a waiter server 1 person at a time, the entire meal, before going onto the next customer, it would take forever to get through all the meals)

index sequential files

files with a clustering index on the search index

B+-tree lookup

is straightforward and efficient. Insertion and deletion, however, are more complicated but still efficient. The number of operations required for lookup, insertion, and deletion of B+-trees is proportional to the logarithm to the base N of teh number of records in the relation, where each nonleaf node store N pointers.

data manipulation language

language enabling users to access or manipulate data (access = Retrieval, Insertion, Deletion, Modification). Two Types Procedural DML - requires user to specify what data and how to get those data Declarative DML - (or nonprocedural DMLs) require a user to specify what data but NOT how to get those data. (The SQL query language is nonprocedural. A query takes as input several tables (possibly only one) and always returns a single table.)

query language

language in which a user requests information from a database. may be procedural or non-procedural

Consistency

property that ensures that any transaction will bring the database from one valid state to another. Any data written to the database must be valid according to all defined rules, including constraints, cascades, triggers, and any combination thereof. This doesn't guarantee correctness of a transaction, but merely that any programming errors cannot result in the violation of any defined rules. E.g. if a column can only accept "Male" or "Female" as input, but one tries to input "Hermaphrodite, then the database consistency rule would be violated, and as such, the transaction would not be allowed, thereby keeping the database consistent.

Isolation

property that ensures that the concurrent execution of transactions results in a system state the would be obtained if transactions were executed serially. a disadvantage of file processing systems vs DBMS -

data model

provides a way to describe the design of a database at the physical, logical, and view levels. a collection of tools for describing data, data relationships, data semantics, and consistency contraints.

inconsistent state

say A + B = 300. if we have A=100 & B=200, and want to move 50 to B from A, we subtract from A 100-50 and add to B 200+50, if we subtract 50 but have a failure which then doesn't ALSO add the 50 B (A+B=300 before, but not after transfer) then we've left our database in an inconsistent state

ACID

set of properties that guarantee that database transactions are processed reliably. Atomicity Consistency Isolation Durability

data definition language

specification notation for defining the database schema - e.g. CREATE TABLE instructor ( ID char(5), name varchar(20), salary numeric(8,2) ) in practice, DDL and DML are not two separate languages; instead, they form parts of a single database language such as SQL.

B+-tree

takes the form of a balanced tree, in which every path from the root of the tree to a leaf of the tree is of the same length. The height of the tree is proportional to the log to the base N of the number of records in the relation, where each nonleaf node store N pointers; the value of N is often around 50 or 100. B+-trees are much shorter than other balanced binary-tree structures such as AVL trees, and therefore require fewer disk accesses to locate records.

space overhead

the additional space occupied by an index structure. it the amount of space is moderate, it is usually worthwhile to sacrifice the space to achieve improved performance.

instance

the collection of the information stored in a database at any particular moment

Atomicity

the requirement that all the several operations required to perform a single logical unit of work do happen, or not-at-all. E.g. a transfer of $5 from one account (subtract from Source) to another (add Target) must ensure all transactions occur, or that none do. a failure cannot lave the database in a state where a transaction is partially executed.

entity set

the set of all entities of the same type

relationship set

the set of all relationships of the same type

deletion time

the time it takes to delete a data item. this value includes (1) the time to find data item, and (2) the time it takes to update the index structure.

access time

the time it takes to find a particular data item, or set of times, using the technique in question

insertion time

the time it takes to insert a new data item. this value includes (1) the time to find the correct place to insert the new data item, and (2) the time it takes to update the index structure.

procedural language

the user instructs system to perform sequence of operations on a database to compute the desired result

access types

types of access that are supported efficiently - can include finding records with a specified attribute value and finding records whose attribute values fall in a specific range.

non-procedural language

user describes the desired information without giving specific procedures for obtaining that information - SQL is non-procedural

Entity-Relationship data model (E-R)

uses a collection of basic objects, called entities, and relationships among these objects.

Relational Model

uses a collection of tables to represent both data and relationships among those data. tables are also known as relations. is a record-based model. Each table contains records (rows) of a particular type. each record type defines a fixed number of fields/attributes. The relations/tables are related to each other through the sharing of common entity characteristics (values in columns).

functional dependencies

when the value of one attribute implies or determines the value of another. In the relational model, all attributes in the table must be functionally dependent on the primary key for that table (e.g. X determines Y, written X → Y)


Related study sets

prep u 140 infection and inflammation

View Set

Chemical Reactions and Reaction Stoichiometry

View Set

Chapter 6: Appendicular Skeletal System

View Set

Benjamin Franklin "Poor Richard's Almanac"

View Set

Macroeconomics Practice Problems Ch.8

View Set

Chapter 2: Workforce Safety and Wellness

View Set

Astronomy Chapter 30 - Life in the Universe

View Set