Midterm Study Guide
Codd's 12 Rules
1. Information Rule 2. Guaranteed Access Rule 3. Systematic Treatment of NULL Values 4. Active Online Catalog 5. Comprehensive Data Sub-Language Rule 6. View Updating Rule 7. High-Level Insert, Update, and Delete Rule 8. Physical Data Independence 9. Logical Data Independence 10. Integrity Independence 11. Distribution Independence 12. Non-Subversion Rule
Data Definition Language (DDL)
A data definition language (DDL) is a standard for commands that define the different structures in a database. DDL statements create, modify, and remove database objects such as tables, indexes, and users. Common DDL statements are CREATE, ALTER, DROP, TRUNCATE, RENAME
Data Manipulation Language (DML)
A data manipulation language is a family of syntax elements similar to a computer programming language used for selecting, inserting, deleting, and updating data in a database. Performing read-only queries of data is sometimes also considered a component of DMLs.
Comprehensive Data Sub-Language Rule
A database can only be accessed using a language having linear syntax that supports data definition, data manipuation, and transaction management operations. This language can be used directly or by means of some application. If the database allows access to data without any help of this language, then it is considered as a violation.
Database instance
A database instance is a state of operational database with data at any given time. It contains a snapshot of the database. Database instances tend to change over time. A DBMS ensures that its every instance (state) is in a valid state, by diligently following all validations, constraints, and conditions that the database designers have imposed.
Database Management System
A database management system stores data in such a way that it becomes easier to retrieve, manipulate, and produce information.
Integrity Independence
A database must be independent of the application that uses it. All its integrity constraints can be independently modified without the need of any change in the application. this rule has been regarded as the foundation of distributed database systems
High-Level Insert, Update, and Delete Rule
A database must support high-level insertion, updation, and deletion. This must not be limited to a single row, that is, it must also support union, intersection, and minus operations to yield sets of data records.
Database Schema
A database schema is the skeleton structure that represents the logical view of the entire database. It defines how the data is organized and how the relations among them are associated. It formulates all the constraints that are to be applied on the data. A database schema defines its entities and the relationship among them. It contains a descriptive detail of the database, which can be depicted by any means of schema diagrams.
Isolation of data and application
A database system is entirely different than its data. A database is an active entity, whereas data is said to be passive, on which the database works and organizes. DBMS also stores metadata, which is about data, to ease it own process.
Relation instance
A finite set of tuples in the relational database system represents a relation instance. Relation instances do not have duplicate tuples.
Logical Database Schema
A logical database schema defines all the logical constraints that need to be applied on the data stored. It defines tables, views, and integrity constraints.
Real-world entity
A modern DBMS is more realistic and uses real-world entities to design its architecture. It uses the behavior and attributes too. For example, a school database may use students as an entity and their age as an attribute.
Physical Database Schema
A physical database schema pertains to the actual storage of data and its form of storage like files, indices, etc. It defines how the data will be stored in secondary storage.
Primary Keys
A primary key, also called a primary keyword, is a key in a relational database that is unique for each record. It is a unique identifier, such as a license number, telephone number, VIN. A relational database must always have one and only one primary key.
Relation schema
A relation schema describes the relation name (table name), attributes, and their names
Tuple
A single row of a table, which contains a single record for that relation is called a tuple.
Administrators
Administrators maintain the DBMS and are responsible for administrating the database. They are responsible to look after its usage and by whom it should be used. They create access profiles for users and apply limitations to maintain isolation and force security. Administrators also look after DBMS resources like system license, required tools, and other software and hardware related maintenance.
View Updating Rule
All the views of a database, which can theoretically be updated, must also be updatable by the system.
Application (Middle) Tier
At this tier reside the application server and the programs that access the database. For a user, this application tier presents an abstracted view of the database. End-users are unaware of any existence of the database beyond the application. At the other end, the database tier is not aware of any other user beyond the application tier. Hence, the application layer sits in the middle and acts as a mediator between the end-user and the database.
Database (Data) Tier
At this tier, the database resides along with its query processing languages. We also have the relations that define the data and their constraints at this level.
Boyce-Codd Normal Form
Boyce-Codd Normal Form is an extension of the Third Normal Form on strict terms. BCNF states that - - For any non-trivial functional dependency, X -> A, X must be a super-key
Consistency
Consistency is a state where every relation in a database remains consistent. There exist methods and techniques, which can detect attempt of leaving database in inconsistent state. A DBMS can provide greater consistency as compared to earlier forms of data storing applications like file-processing systems.
Relation-based tables
DBMS allows entities and relations among them to form tables. A user can understand the architecture of a database just by looking at the table names.
ACID Properties
DBMS follows the concepts of Atomicity, Consistency, Isolation, and Durability (normally shortened as ACID). These concepts are applied on transactions, which manipulate data in a database. ACID properties help the database stay healthy in multi-transactional environments and in case of failure.
Less redundancy
DBMS follows the rules of normalization, which splits a relation when any of its attributes is having redundancy in values. Normalization is a mathematically rich and scientific process that reduces data redundancy.
Query Language
DBMS is equipped with query language, which makes it more efficient to retrieve and manipulate data. A user can apply as many and as different filtering options as required to retrieve a set of data. Traditionally it was not possible where file-processing system was used.
Multiple views
DBMS offers multiple views for different users. A user who is in the Sales department will have a different view of database than a person working in the Production department. This feature enables the users to have a concentrate view of the database according to their requirements.
Multi-user and Concurrent Access
DBMS supports multi-user environment and allows them to access and manipulate data in parallel. Though there are restrictions on transactions when users attempt to handle the same data item, but users are always unaware of them.
Data
Data mostly represents recordable facts. Data aids in producing information, which is based on facts. For example, if we have data about marks obtained by all students, we can then conclude about toppers and average marks.
Database
Database is a collection of related data and data is a collection of facts and figures that can be processed to produce information
Designers
Designers are the group of people who actually work on designing part of the database. They keep a close watch on what data should be kept and in what format. They identify and design the whole set of entities, relations, constraints, and views.
Relation key
Each row has one or more attributes, known as relation key, which can identify the row in the relation (table) uniquely.
End Users
End users are those who actually reap the benefits of having a DBMS. End users can range from simple viewers who pay attention to the logs or market rates to sophisticated users such as business analysts.
User (Presentation) Tier
End-users operate on this tier and they know nothing about any existence of the database beyond this layer. At this layer, multiple views of the database can be provided by the application. All views are generated by applications that reside in the application tier.
Attribute domain
Every attribute has some pre-defined value scope, known as attribute domain.
Relational Integrity Constraints
Every relation has some conditions that must hold for it to be a valid relation. These conditions are called Relational Integrity Constraints. There are three main integrity constraints - 1. Key constraints 2. Domain constraints 3. Referential integrity constraints
Guaranteed Access Rule
Every single data element (value) is guaranteed to be accessible logically with a combination of table-name, primary-key (row value), and attribute-name (column value). No other means, such as pointers, can be used to access data.
Security
Feature like multiple views offer security to some extent where users are unable to access data of other users and departments. DBMS offers methods to impose constraints while entering data into the database and retrieving the same at a later stage. DBMS offers many different levels of security features, which enables multiple users to have different views with different features. For example, a user in the sales department. For example, a user in the Sales department cannot see the data that belongs to the Purchase department. Additionally, it can also be managed how much data of the Sales department should be displayed to the user. Since a DBMS is not saved on the disk as traditional file systems, it is very hard for miscreants to break the code.
First Normal Form
First Normal Form is defined in the definition of relations (tables) itself. This rule defines that all the attributes in a relation must have atomic domains. The values in an atomic domain are indivisible units. Each attribute must contain only a single value from its pre-defined domain
Third Normal Form
For a relation to be in Third Normal Form, it must be in Second Normal Form and the following must satisfy - 1. No non-prime attribute is transitively dependent on prime key attribute 2. For any non-trivial functional dependency, X -> A, then either - X is a superkey or, - A is prime attribute
Functional Dependency
Functional dependency (FD) is a set of constraints between two attributes in a relation. Functional dependency says that if two tuples have same values for attributes A1, A2,..., An, then those two tuples must have the same values for attributes B1, B2, ..., Bn. In a relational database, a functional database is a constraint between two sets of attributes in a relation from a database. In other words, functional dependency is a constraint that describes the relationship between attributes in a relation. Functional dependency is a relationship that exists when one attribute uniquely determines another attribute. If R is a relation with attributes X and Y, a functional dependency between the attributes is represented as X -> Y, which specifies Y is functionally dependent on X.
Normalization
If a database design is not perfect, it may contain anomalies, which are like a bad dream for any database administrator. Managing a database with anomalies is next to impossible. Normalization is a method to remove all these anomalies and bring the database to a consistent state.
Trivial Functional Dependency
If a functional dependency (FD) X -> Y holds, where Y is a subset of X, then it is called a trivial FD. Trivial FDs always hold
Non-Subversion Rule
If a system has an interface that provides access to low-level records, then the interface must not be able to subvert the system and bypass security and integrity constraints.
Non-trivial Functional Dependency
If an FD X -> Y holds, where Y is not a subset of X, then it is called a non-trivial FD
Completely Non-trivial Functional Dependency
If an FD X -> Y holds, where x intersect Y = phi, it is said to be a completely non-trivial FD
Update anomalies
If data items are scattered and are not linked to each other properly, then it could lead to strange situations. For example, when we try to update one data item having its copies scattered over several places, a few instances get updated properly while a few others are left with old values. Such instances leave the database in an inconsistent state.
Candidate keys
If there are more than one such minimal subsets, these are called candidate keys.
Second Normal Form
If we follow the second normal form, then every non-prime attribute should be fully functionally dependent on prime key attribute. That is, if X -> A holds, then there should not be any proper subset Y of X, for which Y -> A also holds true. Eliminates partial dependency.
Tables
In relational data model, relations are saved in the format of Tables. This format stores the relation among entities. A table has rows and columns, where rows represents records and columns represent the attributes.
Many-to-many Relationship
In system's analysis a many-to-many relationship is a type of cardinality that refers to the relationship between two entities A and B in which A may contain a parent instance for which there are many children in B and vice versa. In a relational database a many-to-many relationship exists when many rows in table A may be linked to many rows in table B and vice versa.
One-to-many Relationship
In systems analysis, a one-to-many relationship is a type of cardinality that refers to the relationship between two entities A and B in which an element of A may be linked to many elements of B, but a member of B is linked only to one element of A. In a relational database, a one-to-many relationship exists when one row in table A may be linked with many rows in table B, but one row in table B is linked only to one row in table A.
One-to-one Relationship
In systems analysis, a one-to-one relationship is a type of cardinality that refers to the relationship between two entities A and B in which one element of A may only be linked to one element of B and vice versa. In a relational database, a one-to-one relationship exists when one row in a table may be linked with only one row in another table and vice versa.
Foreign Keys
In the context of relational databases, a foreign key is a field (or collection of fields) in one table that uniquely identifies a row of another table. In simpler words, the foreign key is defined in a second table, but it refers to the primary key in the first table.
Schema Diagram
It formulates all the constraints that are to be applied on the data a database schema defines its entities and the relationship among them. It contains a descriptive detail of the database, which can be depicted by means of schema diagrams.
Join
Join is a combination of a Cartesian product followed by a selection process. A join operation pairs tuples from different relations, if and only if a given join condition can be satisfied
Referential Integrity
Referential integrity is a property of data which, when satisfied, requires every value of one attribute (column) of a relation (table) to exist as a value of another attribute (column) in a different (or the same) relation (table). For referential integrity to hold in a relational database, any field in a table that is declared a foreign key can contain either a null value, or only values from a parent table's primary key or candidate key. In other words, when a foreign key value is used it must reference a valid, existing primary key in the parent table.
SQL Delete
The DELETE statement is used to delete rows in a table DELETE FROM table_name WHERE some_column=some_value;
SQL Group By
The GROUP BY statement is used in conjunction with the aggregate functions to group the result-set by one or more columns SELECT column_name, aggregate_function(column_name) FROM table_name WHERE column_name operator value GROUP BY column_name;
SQL Having
The HAVING clause was added to SQL because the WHERE keyword could not be used with aggregate functions SELECT column_name, aggregate_function(column_name) FROM table_name WHERE column_name operator value GROUP BY column_name HAVING aggregate_function(column_name) operator value;
SQL Inner Join
The INNER JOIN keyword selects all rows from both tables as long as there is a match between the columns in both tables SELECT column_name(s) FROM table1 INNER JOIN table2 ON table1.column_name=table2.column_name; or: SELECT column_name(s) FROM table1 JOIN table2 ON table1.column_name=table2.column_name;
SQL Insert Into
The INSERT INTO statement is used to insert new records in a table The first form does not specify the column names where the data will be inserted, only their values: INSERT INTO table_name VALUES (v1, v2, ..., vN); The second form specifies both the column names and the values to be inserted: INSERT INTO table_name (c1, c2, ..., cN) VALUES (v1, v2, ..., vN);
SQL Left/Left Outer Join
The LEFT JOIN keyword returns all rows from the left table (table1), with the matching rows in the right table (table2). The result is NULL in the right side when there is no match. SELECT column_name(s) FROM table1 LEFT JOIN table2 ON table1.column_name=table2.column_name; or: SELECT column_name(s) FROM table1 LEFT OUTER JOIN table2 ON table1.column_name=table2.column_name;
Systematic Treatment of NULL Values
The NULL values in a database must be given a systematic and uniform treatment. This is a very important rule because a NULL can be interpreted as one of the following - data is missing, data is not known, or data is not applicable.
SQL Order By
The ORDER BY keyword is used to sort the result-set by one or more columns The ORDER BY keyword sorts the records in ascending order by default. To sort the records in a descending order, you can use the DESC keyword. SELECT c1, c2, ..., cN FROM table_name ORDER BY c1 ASC|DESC, c2 ASC|DESC, ..., cN ASC|DESC;
SQL Right/Right Outer Join
The RIGHT JOIN keyword returns all rows from the right table (table2), with the matching rows in the table (table1). The result is NULL in the left side when there is no match. SELECT column_name(s) FROM table1 RIGHT JOIN table2 ON table1.column_name=table2.column_name; or: SELECT column_name(s) FROM table1 RIGHT OUTER JOIN table2 ON table1.column_name=table2.column_name;
SQL Distinct
The SELECT DISTINCT statement is used to return only distinct (different) values SELECT DISTINCT column_name1, column_name2, ..., column_nameN FROM table_name;
SQL Select
The SELECT statement is used to select data from a database SELECT column_name1, column_name2, ..., column_nameN FROM table_name; SELECT * FROM table_name
SQL Update
The UPDATE statement is used to update existing records in a table UPDATE table_name SET c1=v1, c2=v2, ..., cN=vN WHERE some_column=some_value;
SQL Where
The WHERE clause is used to extract only those records that fulfill a specified criterion SELECT c1, c2, ..., cN FROM table_name WHERE c1 operator value;
Physical Data Independence
The data stored in a database must be independent of the applications that access the database. Any change in the physical structure of a database must not have any impact on how the data is being accessed by external applications.
Information Rule
The data stored in a database, may it be user data or metadata, must be a value of some table cell. Everything in a database must be stored in a table format.
Logical Data Independence
The logical data in a database must be independent of its user's view (application). Any change in logical data must not affect the applications using it. For example, if two tables are merged or one is split into two different tables, there should be no impact or change on the user application. This is one of the most difficult rules to apply.
Relational Model
The most popular data model in DBMS is the Relational Model. It is more scientific a model than others. This model is based on the first-order predicate logic and defines a table as an n-array relation. The main highlights of this model are - 1. Data is stored in tables called relations 2. Relations can be normalized 3. In normalized relations, values saved are atomic values. 4. Each row in a relation contains a unique value 5. Each column in a relation contains values from a same domain
Active Online Catalog
The structure description of the entire database must be stored in an online catalog, known as data dictionary, which can be accessed by authorized users. Users can use the same query language to access the catalog which they use to access the database itself.
Key
There must be at least one minimal subset of attributes in the relation, which can identify a tuple uniquely. This minimal subset of attributes is called key for that relation. Key constraints force that - 1. In a relation with a key attribute, no two tuples can have identical values for key attributes 2. A key attribute can not have NULL values.
Theta Join
Theta join combines tuples from different relations provided they satisfy the theta condition. R1 and R2 are relations having attributes (A1, A2,...,An) and (B1, B2,...,Bn) such that the attributes don't have anything in common, that is R1 intersect R2 = phi. Theta join
Deletion anomalies
We tried to delete a record, but parts of it was left undeleted because of unawareness, the data is also saved somewhere else
Insert anomalies
We tried to insert data in a record that does not exist at all
Non-prime attribute
an attribute, which is not a part of the prime-key, is said to be a non-prime attribute
Prime attribute
an attribute, which is part of the prime-key, is known as a prime attribute
Augmentation rule
if a -> b holds and y is attribute set, then ay -> by also holds. That is adding attributes in dependencies, does not change the basic dependencies
Reflexive rule
if alpha is a set of attributes and beta is_subset_of_alpha, the alpha holds beta
Transitivity rule
same as transitive rule in algebra, if a -> b holds and b -> c holds, then a -> c also holds. a -> b is called as functionally that determines b.