DBMS - Database Management System
One to Many
A single row of table 1 associates with more than one rows of table 2
Database
Collection of Data
Composite Key
A key that consists of more than one attribute to uniquely identify rows in a table is called composite key. It is also known as compound key.
Data Model
A logical structure of Database that describes the design of database to reflect entities, attributes, relationship among data, constrains etc.
What must you have to understand the view of data?
Data Abstraction Instance and Schema
Types of constraints
NOT NULL UNIQUE DEFAULT CHECK Key Constraints - PRIMARY KEY, FOREIGN KEY Domain constraints Mapping constraints
What is the need of DBMS?
Store Data in an optimized and systematic manner and Retrieve Data Fast
SQL
Structured Query Language a dbms language
how is a deadlock corrected once it is detected?
Terminating processes involved in deadlock Resource Preemption
Foreign Key
The columns of a table that points to the primary key of another table. They act as a cross-reference between tables. Foreign keys are the columns of a table that points to the candidate key of another table.
Transitive functional dependency
each functional dependency X-> Y at least one of the following conditions hold: X is a super key of table Y is a prime attribute of table -can only occur in a relation of three of more attributes.
Constraints in DBMS
enforce limits to the data or type of data that can be inserted/updated/deleted from a table. The whole purpose of constraints is to maintain the data integrity during an update/delete/insert into a table.
other names for rows
records tuples
Resource Preemption
the preemption of resources and allocation of them to the other processes until the deadlock is resolved.
Deadlock detection
use a resource scheduler. if there is a deadlock it is known to the resource scheduler.
E-R Data Model
(Entity-Relationship model) a systematic way of describing and defining a business process. An ER model is typically implemented as a database. The main components of E-R model are: entity set and relationship set.
No preemption condition
A resource cannot be forcibly taken from a process. Only the process can release a resource that is being held by it.
Super Key vs Candidate Key
Candidate keys are selected from the set of super keys. they do not have any redundant attributes. That's the reason they are also termed as minimal super key.
Disadvantages of DBMS over FILE System
Higher implementation Cost Complexity Performance for some applications
Deadlock Handling
Ignore the deadlock (Ostrich algorithm) Deadlock detection
3 types of Anomalies in un-normalized db
Insertion Update Delete
DBMS languages
Languages used for read, update and store data in a database. ie. SQL
Many to One
Many rows of table 1 associate with a single row of table 2
What does the Wait/Die algorithm do when an Older process needs a resource held by a younger process?
Older process waits.
What are the 3 levels of Data abstraction?
Physical Logical View
Delete anomaly Example:
Suppose, if at a point of time the company closes the department D890 then deleting the rows that are having emp_dept as D890 would also delete the information of employee Maggie since she is assigned only to this department.
What are some Examples Applications where we use Database Management Systems are:
Telecom Industry Banking Education sector Online shopping
Cardinality in context of Data models
The relationship between two tables. One to One One to Many Many to One Many to Many
Mutual exclusion condition
There must be at least one resource that cannot be used by more than one process at a time.
Avoid circular wait condition
This can be avoided if the resources are maintained in a hierarchy and process can hold the resources in increasing order of precedence. This avoid circular wait. Another way of doing this to force one resource per process rule - A process can request for a resource once it releases the resource currently being held by it. This avoids the circular wait.
Atomicity
This property ensures that either all the operations of a transaction reflect in database or none. Suppose Account A has a balance of 400$ & B has 700$. Account A is transferring 100$ to Account B. This is a transaction that has two operations a) Debiting 100$ from A's balance b) Creating 100$ to B's balance. Let's say first operation passed successfully while second failed, in this case A's balance would be 300$ while B would be having 700$ instead of 800$. This is unacceptable in a banking system. Either the transaction should fail without executing any of the operation or it should process both the operations.
ACID properties in DBMS
To ensure the integrity of data during a transaction, the database system maintains the following properties. Atomicity Consistency Isolation Durability
Types of Functional Dependencies
Trivial functional dependency non-trivial functional dependency Multivalued dependency Transitive dependency
DML (Data Manipulation Language)
Used for accessing and manipulating data in a database. SELECT INSERT UPDATE DELETE
DCL (Data Control language)
Used for granting and revoking user access on a database. GRANT REVOKE
keys in DBMS
Used for identifying unique rows from table. Establishes relationship among tables.
two algorithms for deadlock avoidance
Wait/Die Wound/Wait
What does Wound/Wait algorithm do when an Older process needs a resource held by a younger process?
Younger process dies
What does the Wait/Die algorithm do when a Younger process needs a resource held by older process
Younger process dies
What does Wound/Wait algorithm do when a Younger process needs a resource held by older process
Younger process waits
Primary Key
a column or set of columns in a table that uniquely identifies tuples (rows) in that table. -can not contain duplicate values or nulls
Deadlock in DBMS
a condition wherein two or more tasks are waiting for each other in order to be finished but none of the task is willing to give up the resources that other task needs. In this situation no task ever gets finished and is in waiting state forever.
Management System
a set of programs to store and retrieve data.
First normal form (1NF)
an attribute (column) of a table cannot hold multiple values. It should hold only atomic values. a single row for each unique value
Mapping constraints:
constraints can be explained in terms of mapping cardinality
Candidate key
minimal super keys with no redundant attributes.
DBMS
(Database Management System) A collection of inter-related data that contains programs to store & access that data in an easy and effective manner.
Hold and wait condition:
A process that is holding a resource can request for additional resources that are being held by other processes in the system.
Removing mutual exclusion
All resources must be sharable that means at a time more than one processes can get a hold of the resources. That approach is practically impossible.
Column / Attribute:
Each attribute and its values are known as attributes in a database.
Common normal forms
First normal form (1NF) Second normal form (2NF) Third normal form(3NF) Boyce & Codd normal form (BCNF)
RDBMS
Relational Database Management System has following major components: Table, Record / Tuple, Field & Column /Attribute.
Physical Data Models
describe data at the lowest level of abstraction.
Many to Many
Many rows of table 1 associate with many rows of table 2
Hierarchical Model
Data is organized into a tree like structure with each record is having one parent record and many children. The main drawback of this model is that, it can have only one to many relationships between nodes.
How do you overcome anomalies?
Normalize Data
Types of Data Models
Object Based Record Based
Durability
Once a transaction completes successfully, the changes it has made into the database should be permanent even if there is a system failure. The recovery-management component of database systems ensures the durability of transaction.
What is data abstraction?
The process of hiding irrelevant details from user
Physical abstraction level
This is the lowest level of data abstraction. It describes how data is actually stored in database. You can get the complex data structure details at this level. records can be described as blocks of storage (bytes, gigabytes, terabytes etc.) in memory.
DDL (Data Definition Language)
Used for specifying the database schema CREATE ALTER DROP TRUNCATE RENAME
Table
a collection of data represented in rows and columns.
Circular wait condition
A condition where one process is waiting for a resource that is being held by second process and second process is waiting for third process ....so on and the last process is waiting for the first process. Thus making a circular chain of waiting.
Boyce & Codd normal form (BCNF)
Advanced version of 3NF that's why it is also referred as 3.5NF. BCNF is stricter than 3NF. A table complies with BCNF if it is in 3NF and for every functional dependency X->Y, X should be the super key of the table.
prime attribute
An attribute that is a part of one of the candidate keys is known as prime attribute.
non-prime attribute
An attribute that is not part of any candidate key
Example of 3NF
Break emp_zip, emp_state, emp_city and emp_district out of Employee table into Employee_Zip table so that they are not all dependent on employee_id. Then just have emp_zip in employee table
Isolation
For every pair of transactions, one transaction should start execution only when the other finished execution.
non-trivial functional dependency
If a functional dependency X->Y holds true where Y is not a subset of X then this dependency is called non trivial Functional dependency. An employee table with three attributes: emp_id, emp_name, emp_address. The following functional dependencies are non-trivial: emp_id -> emp_name (emp_name is not a subset of emp_id) emp_id -> emp_address (emp_address is not a subset of emp_id) On the other hand, the following dependencies are trivial: {emp_id, emp_name} -> emp_name [emp_name is a subset of {emp_id, emp_name}]
Update anomaly Example:
In a table we have two rows for an employee Rick as he belongs to two departments of the company. If we want to update the address of Rick then we have to update the same in two rows or the data will become inconsistent. If somehow, the correct address gets updated in one department but not in other then as per the database, Rick would be having two different addresses, which is not correct and would lead to inconsistent data.
Alternate key
Out of all candidate keys, only one gets selected as primary key, remaining keys are known as alternative or secondary keys.
3 Schema levels
Physical Logical View
Preemption of resources
Preemption of resources from a process can result in rollback and thus this needs to be avoided in order to maintain the consistency and stability of the system.
Types of keys
Primary Super Candidate Alternate Composite Foreign
Cardinality In Context of Query Optimization
Refers to the uniqueness of a column in a table. The column with all unique values would be having the high cardinality and the column with all duplicate values would be having the low cardinality. These cardinality scores helps in query optimization.
Deadlock prevention
Removing mutual exclusion Removing hold and wait condition Preemption of resources Avoid circular wait condition
Insert anomaly Example:
Suppose a new employee joins the company, who is under training and currently not assigned to any department then we would not be able to insert the data into the table if emp_dept field doesn't allow nulls.
Second normal form (2NF)
Table is in 1NF (First normal form) No non-prime attribute is dependent on the proper subset of any candidate key of table. break column into a new table instead of having to update it twice if there is more then one record. Ie. add details table for age, height, etc.
Third normal form(3NF)
Table must be in 2NF Transitive functional dependency of non-prime attribute on any super key should be removed.
Terminating processes involved in deadlock
Terminating all the processes involved in deadlock or terminating process one by one until deadlock is resolved can be the solutions but both of these approaches are not good. Terminating all processes cost high and partial work done by processes gets lost. Terminating one by one takes lot of time because each time a process is terminated, it needs to check whether the deadlock is resolved or not. Thus, the best approach is considering process age and priority while terminating them during a deadlock condition.
Functional dependency in DBMS
The attributes of a table is said to be dependent on each other when an attribute of a table uniquely identifies another attribute of the same table. If column A of a table uniquely identifies the column B of same table then it can represented as A->B (Attribute B is functionally dependent on attribute A)
Relational Model
The data and relationships are represented by collection of inter-related tables. Each table is a group of column and rows, where column represents attribute of an entity and rows represents records.
Database Instance
The data stored in database at a particular moment of time
Trivial functional dependency
The dependency of an attribute on a set of attributes Symbolically: A ->B is trivial functional dependency if B is a subset of A. The following dependencies are also trivial: A->A & B->B the following dependencies are trivial: {emp_id, emp_name} -> emp_name [emp_name is a subset of {emp_id, emp_name}]
Ignore the deadlock (Ostrich algorithm)
When it is believed that deadlocks are very rare and cost of deadlock handling is higher, in that case ignoring is better solution than handling it. For example: Let's take the operating system example - If the time requires handling the deadlock is higher than the time requires rebooting the windows then rebooting would be a preferred choice considering that deadlocks are very rare in windows.
Normalization
a process of organizing the data in database to avoid data redundancy, insertion anomaly, update anomaly & deletion anomaly
Super Key
a set ofone or more columns (attributes) to uniquely identify rows in a table.
Record based logical Models
describe data at the conceptual and view levels. These models specify logical structure of database with records, fields and attributes. Relational Model Hierarchical Model Network Model
Field
name of column
Advantages of DBMS over a FILE System
no redundant data Data consistency and Integrity Secure Privacy Easy access to data Easy recovery Flexible
Multivalued dependency
occurs when there are more than one independent multivalued attributes in a table. Here columns manuf_year and color are independent of each other and dependent on bike_model. In this case these two columns are said to be multivalued dependent on bike_model. These dependencies can be represented like this: bike_model ->> manuf_year bike_model ->> color
resource scheduler
one that keeps the track of resources allocated to and requested by processes.
Anomalies
problems that can occur in poorly planned, un-normalized databases where all the data is stored in one table
Mapping Cardinality
One to One: One to Many Many to One Many to Many
Coffman conditions
Mutual exclusion condition Hold and wait condition No preemption condition Circular wait condition
View abstraction level
- Highest level of data abstraction. - Describes the user interaction with database system. user just interact with system with the help of GUI and enter the details at the screen, they are not aware of how the data is stored and what data is stored; such details are hidden from them.
Logical abstraction level
- Middle level - Describes what data is stored in database - Developers and admins access this level. - records can be described as fields and attributes along with their data types, their relationship among each other can be logically implemented.
One to One
A single row of table 1 associates with single row of table 2
Types of DBMS languages
DDL (Data Definition Language) DML (Data Manipulation Language) DCL (Data Control language)
Database vs Instance
Database is the set of files where application data (the reason for a database) and meta data is stored. An instance is the software (and memory) that Oracle uses to manipulate the data in the database.
Object Based logical models
Describe Data at the conceptual and view levels. E-R Model Object Oriented Model
Database Schema
Design of a database. Defines the variable declarations in tables that belong to a particular database;
Domain constraints:
Each table has certain set of columns and each column allows a same type of data, based on its data type. The column does not accept values of any other data type. Domain constraints are user defined data type and we can define them like this: Domain Constraint = data type + Constraints (NOT NULL / UNIQUE / PRIMARY KEY / FOREIGN KEY / CHECK / DEFAULT)
Removing hold and wait condition
This can be removed if the process acquires all the resources that are needed before starting out. Another way to remove this to enforce a rule of requesting resource when there are none in held by the process.
Consistency
To preserve the consistency of database, the execution of transaction should take place in isolation (that means no other transaction should run concurrently when there is a transaction already running). For example account A is having a balance of 400$ and it is transferring 100$ to account B & C both. So we have two transactions here. Let's say these transactions run concurrently and both the transactions read 400$ balance, in that case the final balance of A would be 300$ instead of 200$. This is wrong. If the transaction were to run in isolation then the second transaction would have read the correct balance 300$ (before debiting 100$) once the first transaction went successful.
Network Model
data is organized into a graph-like structure Allows each record to have more than one parent record.