db
M:N
(*..*)- an employee may learn many job skills, and each job skill may be learned by many employees.
1:M
(1..*) - a painter creates many different paintings, but each is painted by only one painter
Overview of DBMS Components
-
1:1
1..1)- Each department must have only one employee who is a manager and each employee who is a manager can manage only one department
Relationship
Describes a bi-directional association among entities. For example, a relationship exists between customers and sales agents that could be described as follows: A sales agent can serve many customers, and each customer may be served by one sales agent. Data models use three types of relationships:
Database Designers
Logical database designers identify: - the data (entities and attributes) - relationships between the data- the constraints on data to be stored in the database Physical database designers:- translates the logical model into statements that define the database
Data Administrator
Responsible for: the management of the data resource database planning development maintenance of standards policies & procedures conceptual/logical database design
Conceptual Database Design:
The information gathered in the requirements analysis step is used to develop a high-level description of the data to be stored in the database, along with the constraints that are known to hold on this data.
Naïve users:
Typically know nothing about database of DBMS Access the database through simple application programs
Sparse data
occurs when the number of attributes is very large, but the number of actual data instances is low.
Functions of the DBMS include
1 Allowing users to create new databases. This is done via data definition languages (DDLs). 2 Allow users to query the database via data manipulation languages (DMLs). 3 Support the storage of very large amounts of data. Typically gigabytes or more for very long periods of time. Maintaining its security and integrity in the process. 4 Control access to data from many users simultaneously.
DBMS
A DBMS is software used to design and facilitate the use of large collections of data
Logical Database Design:
A DBMS must be selected to implement the database and to convert the conceptual database design into a database schema within the data model of the chosen DBMS.
The Data Definition Language (DDL)
A Data Definition Language is a language that allows the DBA or user to describe and name the entities, attributes, and relationships required for the application, together with any associated integrity and security constraints.
Entities
A person, place, thing, or event about which data will be collected and stored. Represents a particular type of object in the real world "distinguishable" that is, each occurrence is unique and distinct. May be a tangible object, i.e., one that you can touch such as a person or a product. May be intangible, such as a flight route, or a rock concert (an event).
external data model
An external data model to represent each user's view of the organization.
internal data model
An internal data model to represent the conceptual schema in such a way that it can be understood by the DBMS.
Physical Database Design:
At this stage in the design of a database, potential workloads and access patterns are simulated to identify potential weaknesses in the conceptual database. This will often cause the creation of additional indices and/or clustering relations. In critical situations, the entire conceptual model will need restructuring.
ACID properties
Atomicity: all of a transaction is executed or none of it is executed. Consistency: data cannot be in a inconsistent state. Isolation: concurrent transactions must be isolated from each other both in effect and in visibility. Durability: changes to the database caused by a transaction must not be lost even if the system fails immediately after the transaction completes.
Query Manager
Converts a query (or database manipulation), expressed at a very high level (e.g., SQL) into a sequence of request for stored data such as specific tuples of a relation or parts of an index to a relation. The Query Manager may also have to optimize the query. This involves the formulation of a good query execution strategy.
Functions of the DBMS
Database design and application development Data Analysis Concurrency and Robustness Efficiency and Scalability
The Internal Level
Describes how the data is stored in the database. That is the physical representation of the database on the computer. Describes the physical implementation necessary to achieve optimal runtime performance and storage space utilization. Covers the data structures and file organizations used to store the data on the storage devices. Interfaces with the OS access methods (file management techniques for storing and retrieving data records) to place the data on the storage devices, build indexes, retrieve the data, and so on.
Application Developers
Develop application programs that use the database to provide the functionality required by end-user
Security Design
Different user groups are identified and their different roles are analyzed so that access patterns to the data can be defined. There is often a seventh step in this process with the last step being a tuning phase, during which the database is made operational (although it may be through a simulation) and further refinements are made as the system is "tweaked" to provide the expected environment. The illustration on the following page summarizes the main phases of database design.
Storage Manager
In a simple database system, the storage manager is nothing more than the file system of the underlying OS. In larger systems, for the purposes of efficiency, the DBMS's normally control storage on the disk directly. The storage manager consists of two basic components (1) the buffer manager, and (2) the file manager
Database
In the most general sense a database is simply a collection of related data.
Schema Refinement:
In this step the schemas developed in step 3 above are analyzed for potential problems. It is in this step that the database is normalized. Normalization of a database is based upon some elegant and powerful mathematical theory. We will discuss normalization later in the term
Relational Databases
Introduced by Codd in 1970 Suggested that database data should be stored in tables known as relations Advantages: Facilitates more rapid queries No need to be concerned with the storage structure of data Queries can be written in high level language Queries can be processed more efficiently
File Manager
Keeps track of the location of files on the disks and obtains the block or blocks containing a file on request from the buffer manager.
smaller databases
Originally, DBMS's were large, expensive running on large mainframe computers. Large size was necessary, since storage of a gigabyte of data required a large computer. Today, many gigabytes fit on small jump drives that fit in your hand making it quite feasible to run a DBMS controlling a VLDB on a personal computer. Relational DBMS based on the relation model are a common tool for small-computer applications much as spreadsheets and word processors were before them
Procedural DMLs
Procedural DMLs are languages in which the user informs the system what data is required and exactly how to retrieve that data.
Semantic Data Modeling
Semantic data models attempt to capture the "meaning" of a database. Practically, they provide an approach for conceptual data modeling. Over the years there have been several different semantic data models that have been proposed. By far the most common is the entity-relationship data model, most often referred to as simply the E-R data model.
Entity Set
Set of entities of the same type that share the same attributes. For example, the set of all persons who are customers at a particular bank can be defined as the entity set customers. Entity sets do not need to be disjoint. For example, we could define the entity set of all persons who work for a bank (employee) and the entity set of all persons who are customers of the bank (customers). A given person entity might be an employee, a customer, both, or neither.
Mappings
The DBMS is responsible for mapping between the different levels of schemas to ensure consistency It checks that each view/external schema is derivable from the conceptual schema Each external schema is related to the conceptual schema by an external/conceptual mapping. This enables the DBMS to map names in the user's view on to the relevant part of the conceptual schema. The conceptual schema is mapped to the internal schema through a conceptual/internal mapping conceptual/internal mapping enables the DBMS to find the actual record or combination of records in physical storage that constitute a logical record in the conceptual schema, together with any constraints to be enforced on the operations for that logical record.
The Conceptual Level
The conceptual level is the community view of the database. Describes what data is stored in the database and the relationships among the data. Level at which the DBA can see the entire logical structure of the database. Contains no storage-dependent details. For example, an entity may be defined as represented by an integer data type at this level, but the number of bytes it occupies is not specified at this level. Represents a complete view of the data requirements of the organization that is independent of any storage considerations. Supports all external views, since data in the views must be contained in, or derivable from, the conceptual level.
Stored Database
The stored database resides on secondary and tertiary devices
Transaction Manager
There are certain guarantees that a DBMS must make when performing operations on a database. These guarantees are often referred to as the ACID properties.
The External Level
User's view of the database. Describes that part of the database which is relevant to each user. Consists of a number of different external views of the database. Includes only those entities, attributes, and relationships in the "real world" that the user is interested in while hiding others
Parallel Computing
Very large databases require speed enhancers. Speed enhancement is handled in many different fashions in modern databases including: Indexing structures Parallelism - both in terms of CPUs as well as in terms of the database itself. To some extent, distributed database systems can also be included as a speed enhancer, although in a slightly different manner, as we will see later in the term.
Strong entities
exist independently of other types of entities has its own unique identifier
Meta-data
is data about data. It is a description of the data components of the database
conceptual schema
used at the conceptual level, describes all the entities, attributes, and relationships along with their integrity constraints.
The Relational Model
The relational model was introduced in 1970 by E.F. Codd (working for IBM at the time), in his landmark paper "A Relational Model of Data for Large Shared Databanks" (Communications of the ACM, June 1970, pp.377-387). The relational model's foundation is a mathematical concept known as a relation. To avoid the complexity of abstract mathematical theory, think of a relation as a matrix composed of intersecting rows and columns. Each row is called a tuple. Each column represents an attribute. The relational model also describes a precise set of data manipulation constructs based on advanced mathematical concepts.
system catalog
The result of the compilation/execution of the DDL statements is a set of tables stored in special files collectively referred to as the The system catalog is also commonly referred to as the data dictionary or data directory
Database Administrator
The role of the DBA is more technically oriented than that of the DA. The responsibilities include: the physical realization of the database physical database design and implementation security and integrity control maintenance of the operational system ensuring satisfactory performance of the applications for users.
internal schema
a complete description of the internal model, containing the definition of the stored records, methods of representation, etc..
Data Manipulation Language
A Data Manipulation Language is a language that provides a set of operations to support the basic data manipulation operations on the data held in the database. DML operations usually include the following: insertion of new data into the database. modification of data stored in the database. retrieval of data contained in the database. deletion of data from the database. The part of the DML that involves data retrieval is called a query language.
business rule
A business rule is a brief, precise, and unambiguous description of a policy, procedure, or principle within a specific organization.
Attribute
A characteristic of an entity. For example, a customer entity would be described by attributes such as last name, first name, phone numbers, address, etc.. It is also possible for relationships to have attributes.
conceptual data model
A conceptual data model to represent the logical (or community view) that is DBMS independent.
Data Models
A structural part, consisting of a set of rules according to which databases can be constructed. A manipulative part, defining the types of operations that are allowed on the data (this includes operations that are used for updating or retrieving data from the database and for changing the structure of the database). Possibly a set of integrity rules, which ensures that the data is accurate.
Derived Data
Data calculated by the DBMS from physical data. If the data from which it is derived changes in some fashion, then so too will the derived value.
Physical Data:
Data that was entered into the database either manually or electronically.
Sophisticated users
Familiar with the structure of the database and the facilities offered by the DBMS Use a high-level query language like SQL to perform their required operations and may even write their own application programs.
File Systems
File systems were first commercial database systems appearing in the late 1960's These facilitated data storage but did not really facilitate concurrent use. Did not support query languages their support for schemas is limited to the creation of directory structures for files. Examples were airline and banking systems which consisted of many small items which were queried and modified often
Buffer Manager
Handles main memory. IT obtains blocks of data from the disk, via the file manager, and chooses a page of main memory in which to store than block. The paging algorithm will determine how long a page will remain in main memory
Data Independence
Means that data is separate from all of the programs that use it Protects data from corruption by application programs Allows database designers to alter database at the conceptual and physical level while not affecting the external view of the database perceived by external users. Primary motivation of data independence is to restrict access to the storage structure of the data by user applications. This is achieved by hiding the specifics of the data storage structure from the applications.
Database Schema
Overall description of the database 3 types of database schema corresponding to the levels of abstraction
Attribute Domains
Permitted set of values for an attribute Sometimes called the value set, of that attribute An attribute of an entity set is a function that maps from the entity set into a domain Since an entity set may have several attributes, each entity in the set can be described by a set of <attribute, data- value> pairs, one for each attribute of the entity set.
Queries
Primary motivation for creating database is to store data in an organized way so that information can be extracted from the database Information is extracted from a database by posing specially formatted questions. These questions are called queries For queries to be processed, they have to be interpreted correctly by the DBMS. This is facilitated by database languages
What Should an Entity Be?
SHOULD BE: An object that will have many instances in the database An object that will be composed of multiple attributes An object that we are trying to model SHOULD NOT BE: A user of the database system An output of the database system (e.g. a report)
E-R model
The E-R model is often used as a form of communication between database designers and the end users during the developmental stages of a database The E-R model contains an extensive set of modeling tools. In this course we focus on those involved in conceptual database design
Data modeling
The first step in designing a database Refers to the process of creating a specific data model for a determined problem domain When done properly, the final data model effectively is a "blueprint" with all the instructions to build the database that will meet all end-user requirements.
Requirements Analysis
The first step in designing a database application is to: understand what data is to be stored in the database what applications must be built on top of it what operations are most frequent and subject to performance requirements Often this is an informal process involving discussions with user groups and studying the current environment. Examining existing applications expected to be replaced or complemented by the database system.
The Hierarchical Model
The hierarchical model was developed in the 1960s to manage large amounts of data from complex manufacturing projects, such as the Apollo rocket program (moon landing in 1969). The model's basic logical structure is represented as a tree. The tree contains levels, or segments. A segment is the equivalent of a file system's record type. Within the hierarchy, a higher layer is perceived as the parent of the segment directly beneath it, which is called the child. The hierarchical model depicts a set of 1:M relationships between a parent and its children segments. Each parent can have many children, but each child has only one parent.
key-value data model
The key-value data model is based on a structure composed of two data elements: a key and a value, in which every key has a corresponding value or set of values.
Tertiary Storage
The largest databases today require more than disks. Tertiary devices tend to store a terabyte each and have longer access times than do disks. Disadvantages Typical disk access times are in the 10-20msec range. A typical tertiary device may take several seconds. Tertiary devices involve transporting the object on which the data is stored to some reading device via a robotic conveyance of some sort. It is common to use CDs as the tertiary medium
The Network Model
The network model was created to represent complex data relationships more effectively than the hierarchical model, to improve database performance, and to impose a database standard. In the network model, the user perceives the database as a collection of records in 1:M relationships. Unlike, the hierarchical model, the network model allows a record to have more than one parent. Although it is generally not used today, the network model defined many standards and concepts that are still in use today, such as the terms schema and sub-schema and definitions for a data manipulation language (DML) and data definition language (DDL).
The Physical Level
The physical level below the DBMS consists of items only the OS knows, such as exactly how the sequencing is implemented and whether the fields of internal records are stored as contiguous bytes on the disk. Is managed by the OS under the direction of the DBMS. The functions of the DBMS and the OS at the physical level are not clear cut and vary from system to system. Some DBMSs take advantage of many of the OS access methods, while others will use only the most basic ones and create their own file organizations.
End Users
These are the "clients" for the database and can be broadly categorized into two groups based upon how they utilize the system.
Database Sub-Languages
These consists of two parts: a Data Definition Language (DDL) and a Data Manipulation Language (DML). The DDL is used to specify the database schema The DML is used to both read and update the database. Called data sublanguages because they do not include constructs for all computing needs such as conditional or iterative statements, which are provided by the high-level programming languages. Most DBMSs embed the sublanguage in a high-level programming language such as COBOL, Pascal, C, C++, Java, or Visual Basic which is then called the host language. Most data sublanguages also provide a non-embedded or interactive version of the language to be input directly from a terminal.
Characteristics of Identifiers
Will not change in value. Will not be null. No intelligent identifiers (e.g. containing locations or people that might change). Substitute new, simple keys for long, composite keys.
external schema
also called subschemas, correspond to different views of the data.
Non-procedural DMLs
are languages in which the user informs the system only of what data is required and leaves the how to retrieve the data entirely up to the system Procedural DMLs tend to be more focused on individual records while non-procedural DMLs tend to operate on sets of records.
Weak entity
dependent on a strong entity...cannot exist on its own does not have a simple unique identifier Identifier is composite including the identifier of the strong entity to which it is related. This is referred to as the identifying relationship. It links strong entities to weak entities
Data Model Basic Building Blocks
entities, attributes, constraints, relationships
Three-Levels of Abstraction in a Database System
external level, conceptual level, internal level
Physical data independence
refers to the immunity of the external schema to changes in the internal schema. Example changes in file organization and indexing structures
Logical data independence
refers to the immunity of the external schemas to changes in the conceptual schema. Example changes in data structure
constraint,
which is a restriction placed on the data. Constraints are important because they help to ensure data integrity. Constraints arise from business rules which are derived from a detailed description of how an organization operates.