Database 2


Physical data independence

Generally, physical data independence exists in most databases and file environments where physical details such as the exact location of data on disk, and hardware details of storage encoding, placement, compression, splitting, merging of records, and so on are hidden from the user. Applications remain unaware of these details. On the other hand, logical data independence is harder to achieve because it allows structural and constraint changes without affecting application programs, a much stricter requirement.
Whenever we have a multiple-level DBMS, its catalog must be expanded to include information on how to map requests and data among the various levels. The DBMS uses additional software to accomplish these mappings by referring to the mapping information in the catalog. Data independence occurs because when the schema is changed at some level, the schema at the next higher level remains unchanged; only the mapping between the two levels is changed. Hence, application programs referring to the higher-level schema need not be changed.
The three-schema architecture can make it easier to achieve true data independence, both physical and logical. However, the two levels of mappings create an overhead during compilation or execution of a query or program, leading to inefficiencies in the DBMS. Because of this, few DBMSs have implemented the full three-schema architecture.
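As a minimal sketch of physical data independence, the snippet below uses Python's built-in sqlite3 module. The table name, column names, index name, and sample rows are all hypothetical. It runs the same query text before and after an access path (an index) is added at the internal level, and the result is the same.

```python
import sqlite3

# Hypothetical SECTION table with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE section ("
    "section_id INTEGER PRIMARY KEY, course TEXT, semester TEXT, year INTEGER)"
)
conn.executemany(
    "INSERT INTO section VALUES (?, ?, ?, ?)",
    [
        (85, "MATH2410", "Fall", 2007),
        (92, "CS1310", "Fall", 2007),
        (112, "MATH2410", "Fall", 2008),
        (119, "CS1310", "Fall", 2008),
    ],
)

# The application's query text; it never mentions any index.
QUERY = (
    "SELECT section_id FROM section "
    "WHERE semester = 'Fall' AND year = 2008 ORDER BY section_id"
)

before = conn.execute(QUERY).fetchall()

# Internal-level change only: add an access path on (semester, year).
conn.execute("CREATE INDEX idx_section_term ON section (semester, year)")

# Same query text, same answer; the DBMS is free to use the new index.
after = conn.execute(QUERY).fetchall()
```

Whether the engine actually uses the index is its own decision; the point is only that the query and its result do not change.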

The three-schema architecture can be used to further explain the concept of data independence, which can be defined as the capacity to change the schema at one level of a database system without having to change the schema at the next higher level. We can define two types of data independence:

1. Logical data independence is the capacity to change the conceptual schema without having to change external schemas or application programs. We may change the conceptual schema to expand the database (by adding a record type or data item), to change constraints, or to reduce the database (by removing a record type or data item). In the last case, external schemas that refer only to the remaining data should not be affected. For example, the external schema of Figure 1.5(a) should not be affected by changing the GRADE_REPORT file (or record type) shown in Figure 1.2 into the one shown in Figure 1.6(a). Only the view definition and the mappings need to be changed in a DBMS that supports logical data independence. After the conceptual schema undergoes a logical reorganization, application programs that reference the external schema constructs must work as before. Changes to constraints can be applied to the conceptual schema without affecting the external schemas or application programs.
2. Physical data independence is the capacity to change the internal schema without having to change the conceptual schema. Hence, the external schemas need not be changed either. Changes to the internal schema may be needed because some physical files were reorganized (for example, by creating additional access structures) to improve the performance of retrieval or update. If the same data as before remains in the database, we should not have to change the conceptual schema. For example, providing an access path to improve retrieval speed of section records (Figure 1.2) by semester and year should not require a query such as "list all sections offered in fall 2008" to be changed, although the query would be executed more efficiently by the DBMS by utilizing the new access path.
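Logical data independence can be sketched with a view standing in for an external schema. This is an illustration using Python's sqlite3 module, not the textbook's figures: the tables grade_report and grade_report_v2, the view transcript, and the rows are hypothetical. The conceptual-level table is reorganized, only the view definition (the mapping) is rewritten, and the application query is left untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual schema (hypothetical) plus an external view used by the application.
conn.executescript("""
CREATE TABLE grade_report (student_number INTEGER, section_id INTEGER, grade TEXT);
INSERT INTO grade_report VALUES (17, 112, 'B'), (8, 85, 'A');
CREATE VIEW transcript AS
    SELECT student_number, section_id, grade FROM grade_report;
""")

# The application only ever refers to the external view.
APP_QUERY = "SELECT student_number, grade FROM transcript ORDER BY student_number"
before = conn.execute(APP_QUERY).fetchall()

# Logical reorganization: a new record layout with an extra data item.
# Only the view definition (the external/conceptual mapping) is rewritten.
conn.executescript("""
CREATE TABLE grade_report_v2 (
    student_number INTEGER, student_name TEXT, section_id INTEGER, grade TEXT);
INSERT INTO grade_report_v2
    SELECT student_number, NULL, section_id, grade FROM grade_report;
DROP VIEW transcript;
DROP TABLE grade_report;
CREATE VIEW transcript AS
    SELECT student_number, section_id, grade FROM grade_report_v2;
""")

# The unchanged application query still works and returns the same rows.
after = conn.execute(APP_QUERY).fetchall()
```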

Categories of Data Models

Many data models have been proposed, which we can categorize according to the types of concepts they use to describe the database structure. High-level or conceptual data models provide concepts that are close to the way many users perceive data, whereas low-level or physical data models provide concepts that describe the details of how data is stored on the computer storage media, typically magnetic disks. Concepts provided by low-level data models are generally meant for computer specialists, not for end users. Between these two extremes is a class of representational (or implementation) data models, which provide concepts that may be easily understood by end users but that are not too far removed from the way data is organized in computer storage. Representational data models hide many details of data storage on disk but can be implemented on a computer system directly.

Three-Tier and n-Tier Architectures for Web Applications

Many Web applications use an architecture called the three-tier architecture, which adds an intermediate layer between the client and the database. This intermediate layer or middle tier is called the application server or the Web server, depending on the application. This server plays an intermediary role by running application programs and storing business rules (procedures or constraints) that are used to access data from the database server. It can also improve database security by checking a client's credentials before forwarding a request to the database server. Clients contain GUI interfaces and some additional application-specific business rules. The intermediate server accepts requests from the client, processes the request and sends database queries and commands to the database server, and then acts as a conduit for passing (partially) processed data from the database server to the clients, where it may be processed further and filtered to be presented to users in GUI format. Thus, the user interface, application rules, and data access act as the three tiers.
Figure 2.7(b) shows another architecture used by database and other application package vendors. The presentation layer displays information to the user and allows data entry. The business logic layer handles intermediate rules and constraints before data is passed up to the user or down to the DBMS. The bottom layer includes all data management services. The middle layer can also act as a Web server, which retrieves query results from the database server and formats them into dynamic Web pages that are viewed by the Web browser at the client side.
Other architectures have also been proposed. It is possible to divide the layers between the user and the stored data further into finer components, thereby giving rise to n-tier architectures, where n may be four or five tiers. Typically, the business logic layer is divided into multiple layers. Besides distributing programming and data throughout a network, n-tier applications afford the advantage that any one tier can run on an appropriate processor or operating system platform and can be handled independently. Vendors of ERP (enterprise resource planning) and CRM (customer relationship management) packages often use a middleware layer, which accounts for the front-end modules (clients) communicating with a number of back-end databases (servers).
Advances in encryption and decryption technology make it safer to transfer sensitive data from server to client in encrypted form, where it will be decrypted. The latter can be done by the hardware or by advanced software. This technology gives higher levels of data security, but the network security issues remain a major concern. Various technologies for data compression also help to transfer large amounts of data from servers to clients over wired and wireless networks.
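The division of labor among the three tiers can be sketched in a single Python file. This is only a toy illustration: a real deployment would run the tiers as separate processes or machines. The account table, the token table, and all function names are hypothetical, and the credential check is deliberately simplistic.

```python
import sqlite3

# --- Data tier: owns storage and answers queries (in-memory stand-in). ---
def make_database():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (owner TEXT, balance REAL)")
    conn.executemany(
        "INSERT INTO account VALUES (?, ?)",
        [("alice", 120.0), ("bob", 35.5)],
    )
    return conn

# --- Middle tier: business rules and credential checks. ---
# Hypothetical token table; real systems would use a proper auth service.
VALID_TOKENS = {"token-alice": "alice", "token-bob": "bob"}

def get_balance(conn, token):
    owner = VALID_TOKENS.get(token)
    if owner is None:
        # Reject the request before it ever reaches the database server.
        raise PermissionError("unknown credentials")
    row = conn.execute(
        "SELECT balance FROM account WHERE owner = ?", (owner,)
    ).fetchone()
    return {"owner": owner, "balance": row[0]}

# --- Presentation tier: formats processed data for the user. ---
def render(result):
    return f"{result['owner']}: {result['balance']:.2f}"
```

A client would call `render(get_balance(conn, "token-alice"))` and never issue SQL itself; only the middle tier talks to the data tier.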

Mapping and three schema

Notice that the three schemas are only descriptions of data; the stored data that actually exists is at the physical level only. In a DBMS based on the three-schema architecture, each user group refers to its own external schema. Hence, the DBMS must transform a request specified on an external schema into a request against the conceptual schema, and then into a request on the internal schema for processing over the stored database. If the request is a database retrieval, the data extracted from the stored database must be reformatted to match the user's external view. The processes of transforming requests and results between levels are called mappings. These mappings may be time-consuming, so some DBMSs, especially those that are meant to support small databases, do not support external views. Even in such systems, however, a certain amount of mapping is necessary to transform requests between the conceptual and internal levels.

Physical data models

Physical data models describe how data is stored as files in the computer by representing information such as record formats, record orderings, and access paths. An access path is a structure that makes the search for particular database records efficient. We discuss physical storage techniques and access structures in Chapters 17 and 18. An index is an example of an access path that allows direct access to data using an index term or a keyword. It is similar to the index at the end of this book, except that it may be organized in a linear, hierarchical (tree-structured), or some other fashion.
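To make the idea of an access path concrete, here is a small sketch of a linear (sorted) index over an unordered list of records, using Python's standard bisect module. The records and function names are hypothetical, and real DBMS indexes are usually tree-structured and disk-based; this only shows why an index avoids scanning every record.

```python
from bisect import bisect_left

# Stored records in arbitrary (insertion) order; sample data is made up.
records = [
    {"id": 3, "name": "Smith"},
    {"id": 1, "name": "Brown"},
    {"id": 2, "name": "Wong"},
]

def linear_search(name):
    """Without an access path: examine every record until a match is found."""
    for rec in records:
        if rec["name"] == name:
            return rec
    return None

# The index: (key, record position) pairs kept sorted by key.
index = sorted((rec["name"], pos) for pos, rec in enumerate(records))
keys = [key for key, _ in index]

def index_lookup(name):
    """With an access path: binary search on the sorted keys, then one fetch."""
    i = bisect_left(keys, name)
    if i < len(keys) and keys[i] == name:
        return records[index[i][1]]
    return None
```

Both functions return the same record; the indexed lookup needs on the order of log n key comparisons instead of n.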

Representational or implementation data models

Representational or implementation data models are the models used most frequently in traditional commercial DBMSs. These include the widely used relational data model, as well as the so-called legacy data models (the network and hierarchical models) that have been widely used in the past. Part 2 is devoted to the relational data model, and its constraints, operations and languages. The SQL standard for relational databases is described in Chapters 4 and 5. Representational data models represent data by using record structures and hence are sometimes called record-based data models.

Three-schema architecture and data independence

Three of the four important characteristics of the database approach, listed in Section 1.3, are (1) use of a catalog to store the database description (schema) so as to make it self-describing, (2) insulation of programs and data (program-data and program-operation independence), and (3) support of multiple user views. In this section we specify an architecture for database systems, called the three-schema architecture, that was proposed to help achieve and visualize these characteristics. Then we discuss the concept of data independence further.

Object data model

We can regard the object data model as an example of a new family of higher-level implementation data models that are closer to conceptual data models. A standard for object databases called the ODMG object model has been proposed by the Object Data Management Group (ODMG). We describe the general characteristics of object databases and the proposed object model standard in Chapter 11. Object data models are also frequently utilized as high-level conceptual models, particularly in the software engineering domain.

Basic idea of data models

In addition to the basic operations provided by the data model, it is becoming more common to include concepts in the data model to specify the dynamic aspect or behavior of a database application. This allows the database designer to specify a set of valid user-defined operations that are allowed on the database objects. An example of a user-defined operation could be COMPUTE_GPA, which can be applied to a STUDENT object. On the other hand, generic operations to insert, delete, modify, or retrieve any kind of object are often included in the basic data model operations. Concepts to specify behavior are fundamental to object-oriented data models (see Chapter 11) but are also being incorporated in more traditional data models. For example, object-relational models (see Chapter 11) extend the basic relational model to include such concepts, among others. In the basic relational data model, there is a provision to attach behavior to the relations in the form of persistent stored modules, popularly known as stored procedures.
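A user-defined operation such as COMPUTE_GPA can be pictured as a method attached to a STUDENT object. The sketch below is one possible reading in Python: the attribute layout, the 4-point grade scale, and the credit-hour weighting are assumptions for illustration, not something the text specifies.

```python
from dataclasses import dataclass, field

# Assumed 4-point scale; the source text does not define one.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}

@dataclass
class Student:
    name: str
    # Each entry is (credit_hours, letter_grade); this layout is hypothetical.
    grades: list = field(default_factory=list)

    def compute_gpa(self):
        """User-defined operation: credit-weighted grade point average."""
        total_hours = sum(hours for hours, _ in self.grades)
        if total_hours == 0:
            return None  # no completed courses yet
        points = sum(hours * GRADE_POINTS[grade] for hours, grade in self.grades)
        return points / total_hours
```

Generic operations (insert, delete, modify, retrieve) would apply to any object, whereas `compute_gpa` is meaningful only for students.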

Two-Tier Client/Server Architectures for DBMS

In relational database management systems (RDBMSs), many of which started as centralized systems, the system components that were first moved to the client side were the user interface and application programs. Because SQL (see Chapters 4 and 5) provided a standard language for RDBMSs, this created a logical dividing point between client and server. Hence, the query and transaction functionality related to SQL processing remained on the server side. In such an architecture, the server is often called a query server or transaction server because it provides these two functionalities. In an RDBMS, the server is also often called an SQL server.
The user interface programs and application programs can run on the client side. When DBMS access is required, the program establishes a connection to the DBMS (which is on the server side); once the connection is created, the client program can communicate with the DBMS. A standard called Open Database Connectivity (ODBC) provides an application programming interface (API), which allows client-side programs to call the DBMS, as long as both client and server machines have the necessary software installed. Most DBMS vendors provide ODBC drivers for their systems. A client program can actually connect to several RDBMSs and send query and transaction requests using the ODBC API, which are then processed at the server sites. Any query results are sent back to the client program, which can process and display the results as needed. A related standard for the Java programming language, called JDBC, has also been defined. This allows Java client programs to access one or more DBMSs through a standard interface.
A different approach to two-tier client/server architecture was taken by some object-oriented DBMSs, where the software modules of the DBMS were divided between client and server in a more integrated way. For example, the server level may include the part of the DBMS software responsible for handling data storage on disk pages, local concurrency control and recovery, buffering and caching of disk pages, and other such functions. Meanwhile, the client level may handle the user interface; data dictionary functions; DBMS interactions with programming language compilers; global query optimization, concurrency control, and recovery across multiple servers; structuring of complex objects from the data in the buffers; and other such functions. In this approach, the client/server interaction is more tightly coupled and is done internally by the DBMS modules (some of which reside on the client and some on the server) rather than by the users/programmers. The exact division of functionality can vary from system to system. In such a client/server architecture, the server has been called a data server because it provides data in disk pages to the client. This data can then be structured into objects for the client programs by the client-side DBMS software.
The architectures described here are called two-tier architectures because the software components are distributed over two systems: client and server. The advantages of this architecture are its simplicity and seamless compatibility with existing systems. The emergence of the Web changed the roles of clients and servers, leading to the three-tier architecture.

Centralized DBMSs Architecture

Architectures for DBMSs have followed trends similar to those for general computer system architectures. Earlier architectures used mainframe computers to provide the main processing for all system functions, including user application programs and user interface programs, as well as all the DBMS functionality. The reason was that most users accessed such systems via computer terminals that did not have processing power and only provided display capabilities. Therefore, all processing was performed remotely on the computer system, and only display information and controls were sent from the computer to the display terminals, which were connected to the central computer via various types of communications networks.
As prices of hardware declined, most users replaced their terminals with PCs and workstations. At first, database systems used these computers similarly to how they had used display terminals, so that the DBMS itself was still a centralized DBMS in which all the DBMS functionality, application program execution, and user interface processing were carried out on one machine. Figure 2.4 illustrates the physical components in a centralized architecture. Gradually, DBMS systems started to exploit the available processing power at the user side, which led to client/server DBMS architectures.

Conceptual data models entity attribute and relationship

Conceptual data models use concepts such as entities, attributes, and relationships. An entity represents a real-world object or concept, such as an employee or a project from the miniworld that is described in the database. An attribute represents some property of interest that further describes an entity, such as the employee's name or salary. A relationship among two or more entities represents an association among the entities, for example, a works-on relationship between an employee and a project. Chapter 7 presents the Entity-Relationship model, a popular high-level conceptual data model. Chapter 8 describes additional abstractions used for advanced modeling, such as generalization, specialization, and categories (union types).
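The three concepts can be mirrored loosely in code. In this Python sketch, classes play the role of entity types, fields play the role of attributes, and a separate class records the works-on relationship. The class names, the `hours` attribute on the relationship, and the sample instances are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Employee:          # entity type
    name: str            # attribute
    salary: float        # attribute

@dataclass(frozen=True)
class Project:           # entity type
    title: str           # attribute

@dataclass(frozen=True)
class WorksOn:           # relationship between an employee and a project
    employee: Employee
    project: Project
    hours: float         # assumed attribute of the relationship itself

# Made-up instances from a small miniworld.
smith = Employee("Smith", 30000.0)
wong = Employee("Wong", 40000.0)
product_x = Project("ProductX")
product_y = Project("ProductY")

works_on = [
    WorksOn(smith, product_x, 32.5),
    WorksOn(smith, product_y, 7.5),
    WorksOn(wong, product_y, 10.0),
]

def projects_of(employee):
    """Follow the works-on relationship from one employee to project titles."""
    return [w.project.title for w in works_on if w.employee == employee]
```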

Graphical User Interfaces.

A GUI typically displays a schema to the user in diagrammatic form. The user then can specify a query by manipulating the diagram. In many cases, GUIs utilize both menus and forms. Most GUIs use a pointing device, such as a mouse, to select certain parts of the displayed schema diagram.

Basic Client/Server Architecture

First, we discuss client/server architecture in general, then we see how it is applied to DBMSs. The client/server architecture was developed to deal with computing environments in which a large number of PCs, workstations, file servers, printers, database servers, Web servers, e-mail servers, and other software and equipment are connected via a network. The idea is to define specialized servers with specific functionalities. For example, it is possible to connect a number of PCs or small workstations as clients to a file server that maintains the files of the client machines. Another machine can be designated as a printer server by being connected to various printers; all print requests by the clients are forwarded to this machine. Web servers or e-mail servers also fall into the specialized server category. The resources provided by specialized servers can be accessed by many client machines. The client machines provide the user with the appropriate interfaces to utilize these servers, as well as with local processing power to run local applications. This concept can be carried over to other software packages, with specialized programs, such as a CAD (computer-aided design) package, being stored on specific server machines and being made accessible to multiple clients. Figure 2.5 illustrates client/server architecture at the logical level; Figure 2.6 is a simplified diagram that shows the physical architecture. Some machines would be client sites only (for example, diskless workstations or workstations/PCs with disks that have only client software installed).
The concept of client/server architecture assumes an underlying framework that consists of many PCs and workstations as well as a smaller number of mainframe machines, connected via LANs and other types of computer networks. A client in this framework is typically a user machine that provides user interface capabilities and local processing.
When a client requires access to additional functionality, such as database access, that does not exist at that machine, it connects to a server that provides the needed functionality. A server is a system containing both hardware and software that can provide services to the client machines, such as file access, printing, archiving, or database access. In general, some machines install only client software, others only server software, and still others may include both client and server software, as illustrated in Figure 2.6. However, client and server software more commonly run on separate machines. Two main types of basic DBMS architectures were created on this underlying client/server framework: two-tier and three-tier. We discuss them next.

Database System Utilities

In addition to possessing the software modules just described, most DBMSs have database utilities that help the DBA manage the database system. Common utilities have the following types of functions:
■ Loading. A loading utility is used to load existing data files (such as text files or sequential files) into the database. Usually, the current (source) format of the data file and the desired (target) database file structure are specified to the utility, which then automatically reformats the data and stores it in the database. With the proliferation of DBMSs, transferring data from one DBMS to another is becoming common in many organizations. Some vendors are offering products that generate the appropriate loading programs, given the existing source and target database storage descriptions (internal schemas). Such tools are also called conversion tools. For the hierarchical DBMS called IMS (IBM) and for many network DBMSs including IDMS (Computer Associates), SUPRA (Cincom), and IMAGE (HP), the vendors or third-party companies are making a variety of conversion tools available (e.g., Cincom's SUPRA Server SQL) to transform data into the relational model.
■ Backup. A backup utility creates a backup copy of the database, usually by dumping the entire database onto tape or other mass storage medium. The backup copy can be used to restore the database in case of catastrophic disk failure. Incremental backups are also often used, where only changes since the previous backup are recorded. Incremental backup is more complex, but saves storage space.
■ Database storage reorganization. This utility can be used to reorganize a set of database files into different file organizations, and create new access paths to improve performance.
■ Performance monitoring. Such a utility monitors database usage and provides statistics to the DBA. The DBA uses the statistics in making decisions such as whether or not to reorganize files or whether to add or drop indexes to improve performance.
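The loading and backup functions can be sketched with Python's standard library. This is an illustration rather than a real DBA utility: the CSV text, the employee table, and the function name `load_csv` are hypothetical, and the backup shown is a full copy (via sqlite3's `Connection.backup`), not an incremental one.

```python
import csv
import io
import sqlite3

# Source data in a text (CSV) format; contents are made up.
SOURCE_TEXT = "name,salary\nSmith,30000\nWong,40000\n"

def load_csv(conn, text):
    """Loading: reformat source rows to the target table structure and store them."""
    reader = csv.DictReader(io.StringIO(text))
    rows = [(row["name"], int(row["salary"])) for row in reader]
    conn.executemany("INSERT INTO employee VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)

# Target database with the desired file structure.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE employee (name TEXT, salary INTEGER)")
loaded = load_csv(db, SOURCE_TEXT)

# Backup: dump the entire database into a second database.
backup_db = sqlite3.connect(":memory:")
db.backup(backup_db)

# Reading from the copy shows it could be used to restore the data.
restored = backup_db.execute(
    "SELECT name, salary FROM employee ORDER BY name"
).fetchall()
```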

DBMS Interfaces

Menu-Based Interfaces for Web Clients or Browsing. These interfaces present the user with lists of options (called menus) that lead the user through the formulation of a request. Menus do away with the need to memorize the specific commands and syntax of a query language; rather, the query is composed step-by-step by picking options from a menu that is displayed by the system. Pull-down menus are a very popular technique in Web-based user interfaces. They are also often used in browsing interfaces, which allow a user to look through the contents of a database in an exploratory and unstructured manner.
Forms-Based Interfaces. A forms-based interface displays a form to each user. Users can fill out all of the form entries to insert new data, or they can fill out only certain entries, in which case the DBMS will retrieve matching data for the remaining entries. Forms are usually designed and programmed for naive users as interfaces to canned transactions. Many DBMSs have forms specification languages, which are special languages that help programmers specify such forms. SQL*Forms is a form-based language that specifies queries using a form designed in conjunction with the relational database schema. Oracle Forms is a component of the Oracle product suite that provides an extensive set of features to design and build applications using forms. Some systems have utilities that define a form by letting the end user interactively construct a sample form on the screen.
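The retrieval behavior of a forms-based interface (fill in some entries, get back matching data) can be sketched as a function that turns the filled-in entries into a parameterized query. This is a hedged illustration in Python with sqlite3; it is not how SQL*Forms or Oracle Forms work internally, and the form fields, table, and rows are hypothetical.

```python
import sqlite3

# The form's entries; only these names may appear in the generated SQL.
FORM_FIELDS = ("name", "department", "city")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, department TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        ("Smith", "Research", "Houston"),
        ("Wong", "Research", "Dallas"),
        ("Zelaya", "Administration", "Houston"),
    ],
)

def query_by_form(entries):
    """Retrieve rows matching the filled-in entries; blank entries are ignored."""
    filled = {
        f: v for f, v in entries.items()
        if f in FORM_FIELDS and v not in (None, "")
    }
    sql = "SELECT name, department, city FROM employee"
    if filled:
        # Column names come from the fixed whitelist; values are bound parameters.
        sql += " WHERE " + " AND ".join(f"{f} = ?" for f in filled)
    sql += " ORDER BY name"
    return conn.execute(sql, tuple(filled.values())).fetchall()
```

For example, `query_by_form({"department": "Research", "city": ""})` returns the Research employees with their remaining entries filled in.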

Natural Language Interfaces.

These interfaces accept requests written in English or some other language and attempt to understand them. A natural language interface usually has its own schema, which is similar to the database conceptual schema, as well as a dictionary of important words. The natural language interface refers to the words in its schema, as well as to the set of standard words in its dictionary, to interpret the request. If the interpretation is successful, the interface generates a high-level query corresponding to the natural language request and submits it to the DBMS for processing; otherwise, a dialogue is started with the user to clarify the request. The capabilities of natural language interfaces have not advanced rapidly. Today, we see search engines that accept strings of natural language (like English or Spanish) words and match them with documents at specific sites (for local search engines) or Web pages on the Web at large (for engines like Google or Ask). They use predefined indexes on words and use ranking functions to retrieve and present resulting documents in a decreasing degree of match. Such "free form" textual query interfaces are not yet common in structured relational or legacy model databases, although a research area called keyword-based querying has emerged recently for relational databases.

Data manipulation language

Once the database schemas are compiled and the database is populated with data, users must have some means to manipulate the database. Typical manipulations include retrieval, insertion, deletion, and modification of the data. The DBMS provides a set of operations or a language called the data manipulation language (DML) for these purposes.
In current DBMSs, the preceding types of languages are usually not considered distinct languages; rather, a comprehensive integrated language is used that includes constructs for conceptual schema definition, view definition, and data manipulation. Storage definition is typically kept separate, since it is used for defining physical storage structures to fine-tune the performance of the database system, which is usually done by the DBA staff. A typical example of a comprehensive database language is the SQL relational database language (see Chapters 4 and 5), which represents a combination of DDL, VDL, and DML, as well as statements for constraint specification, schema evolution, and other features. The SDL was a component in early versions of SQL but has been removed from the language to keep it at the conceptual and external levels only.
There are two main types of DMLs. A high-level or nonprocedural DML can be used on its own to specify complex database operations concisely. Many DBMSs allow high-level DML statements either to be entered interactively from a display monitor or terminal or to be embedded in a general-purpose programming language. In the latter case, DML statements must be identified within the program so that they can be extracted by a precompiler and processed by the DBMS. A low-level or procedural DML must be embedded in a general-purpose programming language. This type of DML typically retrieves individual records or objects from the database and processes each separately.
Therefore, it needs to use language constructs, such as looping, to retrieve and process each record from a set of records. Low-level DMLs are also called record-at-a-time DMLs because of this property. DL/1, a DML designed for the hierarchical model, is a low-level DML that uses commands such as GET UNIQUE, GET NEXT, or GET NEXT WITHIN PARENT to navigate from record to record within a hierarchy of records in the database. High-level DMLs, such as SQL, can specify and retrieve many records in a single DML statement; therefore, they are called set-at-a-time or set-oriented DMLs. A query in a high-level DML often specifies which data to retrieve rather than how to retrieve it; therefore, such languages are also called declarative.
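The contrast between the two styles can be sketched in Python with sqlite3. The first function imitates record-at-a-time processing by looping over rows in the host language; it is only a stand-in, since it still uses SQL to fetch and update each row rather than a genuine navigational DML like DL/1. The second does the same work in one set-oriented statement. The table, rows, and function names are hypothetical.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER)")
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?, ?)",
        [
            ("Smith", "Research", 30000),
            ("Wong", "Research", 40000),
            ("Zelaya", "Administration", 25000),
        ],
    )
    return conn

def raise_record_at_a_time(conn, dept, amount):
    """Procedural style: the host program loops and handles each record itself."""
    rows = conn.execute("SELECT rowid, dept, salary FROM employee").fetchall()
    for rowid, row_dept, salary in rows:
        if row_dept == dept:  # the program, not the DBMS, decides what matches
            conn.execute(
                "UPDATE employee SET salary = ? WHERE rowid = ?",
                (salary + amount, rowid),
            )

def raise_set_at_a_time(conn, dept, amount):
    """Declarative style: one statement says which records, not how to find them."""
    conn.execute(
        "UPDATE employee SET salary = salary + ? WHERE dept = ?", (amount, dept)
    )

def salaries(conn):
    return conn.execute("SELECT name, salary FROM employee ORDER BY name").fetchall()
```

Both functions leave the table in the same state; the set-oriented version leaves the choice of access strategy to the DBMS.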

DBMS Languages

Once the design of a database is completed and a DBMS is chosen to implement the database, the first step is to specify conceptual and internal schemas for the database and any mappings between the two. In many DBMSs where no strict separation of levels is maintained, one language, called the data definition language (DDL), is used by the DBA and by database designers to define both schemas. The DBMS will have a DDL compiler whose function is to process DDL statements in order to identify descriptions of the schema constructs and to store the schema description in the DBMS catalog.
In DBMSs where a clear separation is maintained between the conceptual and internal levels, the DDL is used to specify the conceptual schema only. Another language, the storage definition language (SDL), is used to specify the internal schema. The mappings between the two schemas may be specified in either one of these languages. In most relational DBMSs today, there is no specific language that performs the role of SDL. Instead, the internal schema is specified by a combination of functions, parameters, and specifications related to storage. These permit the DBA staff to control indexing choices and mapping of data to storage.
For a true three-schema architecture, we would need a third language, the view definition language (VDL), to specify user views and their mappings to the conceptual schema, but in most DBMSs the DDL is used to define both conceptual and external schemas. In relational DBMSs, SQL is used in the role of VDL to define user or application views as results of predefined queries (see Chapters 4 and 5).
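As a small illustration of SQL acting in both the DDL and VDL roles, the sketch below defines a table and a view through Python's sqlite3 module and then reads the stored schema descriptions back from the catalog. The table and view names are hypothetical, and `sqlite_master` is SQLite's own catalog table; other DBMSs expose their catalogs differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL role: define a conceptual-level table (hypothetical schema).
conn.execute(
    "CREATE TABLE student ("
    "student_number INTEGER PRIMARY KEY, name TEXT NOT NULL, major TEXT)"
)

# VDL role: define a user view as the result of a predefined query.
conn.execute(
    "CREATE VIEW cs_students AS "
    "SELECT student_number, name FROM student WHERE major = 'CS'"
)

# The DBMS stored both descriptions in its catalog, which can itself be queried.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name"
).fetchall()

# The view behaves like a table for the user group that relies on it.
conn.execute("INSERT INTO student VALUES (17, 'Smith', 'CS'), (8, 'Brown', 'MATH')")
view_rows = conn.execute("SELECT student_number, name FROM cs_students").fetchall()
```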

Data models schemas and instances

One fundamental characteristic of the database approach is that it provides some level of data abstraction. Data abstraction generally refers to the suppression of details of data organization and storage, and the highlighting of the essential features for an improved understanding of data. One of the main characteristics of the database approach is to support data abstraction so that different users can perceive data at their preferred level of detail. A data model (a collection of concepts that can be used to describe the structure of a database) provides the necessary means to achieve this abstraction. By structure of a database we mean the data types, relationships, and constraints that apply to the data. Most data models also include a set of basic operations for specifying retrievals and updates on the database.

Tools, Application Environments, and Communications Facilities

Other tools are often available to database designers, users, and the DBMS. CASE tools are used in the design phase of database systems. Another tool that can be quite useful in large organizations is an expanded data dictionary (or data repository) system. In addition to storing catalog information about schemas and constraints, the data dictionary stores other information, such as design decisions, usage standards, application program descriptions, and user information. Such a system is also called an information repository. This information can be accessed directly by users or the DBA when needed. A data dictionary utility is similar to the DBMS catalog, but it includes a wider variety of information and is accessed mainly by users rather than by the DBMS software.
Application development environments, such as PowerBuilder (Sybase) or JBuilder (Borland), have been quite popular. These systems provide an environment for developing database applications and include facilities that help in many facets of database systems, including database design, GUI development, querying and updating, and application program development.
The DBMS also needs to interface with communications software, whose function is to allow users at locations remote from the database system site to access the database through computer terminals, workstations, or personal computers. These are connected to the database site through data communications hardware such as Internet routers, phone lines, long-haul networks, local networks, or satellite communication devices. Many commercial database systems have communication packages that work with the DBMS. The integrated DBMS and data communications system is called a DB/DC system. In addition, some distributed DBMSs are physically distributed over multiple machines. In this case, communications networks are needed to connect the machines. These are often local area networks (LANs), but they can also be other types of networks.

Classifications of DBMS

Several criteria are normally used to classify DBMSs. The first is the data model on which the DBMS is based. The main data model used in many current commercial DBMSs is the relational data model. The object data model has been implemented in some commercial systems but has not had widespread use. Many legacy applications still run on database systems based on the hierarchical and network data models. Examples of hierarchical DBMSs include IMS (IBM) and some other systems like System 2K (SAS Inc.) and TDMS. IMS is still used at governmental and industrial installations, including hospitals and banks, although many of its users have converted to relational systems. The network data model was used by many vendors, and the resulting products like IDMS (Cullinet, now Computer Associates), DMS 1100 (Univac, now Unisys), IMAGE (Hewlett-Packard), VAX-DBMS (Digital, then Compaq and now HP), and SUPRA (Cincom) still have a following, and their user groups have their own active organizations. If we add IBM's popular VSAM file system to these, we can easily say that a reasonable percentage of worldwide computerized data is still in these so-called legacy database systems.
The relational DBMSs are evolving continuously and, in particular, have been incorporating many of the concepts that were developed in object databases. This has led to a new class of DBMSs called object-relational DBMSs. We can categorize DBMSs based on the data model: relational, object, object-relational, hierarchical, network, and other. More recently, some experimental DBMSs are based on the XML (eXtensible Markup Language) model, which is a tree-structured (hierarchical) data model. These have been called native XML DBMSs. Several commercial relational DBMSs have added XML interfaces and storage to their products.
The second criterion used to classify DBMSs is the number of users supported by the system. Single-user systems support only one user at a time and are mostly used with PCs. Multiuser systems, which include the majority of DBMSs, support concurrent multiple users.
The third criterion is the number of sites over which the database is distributed. A DBMS is centralized if the data is stored at a single computer site. A centralized DBMS can support multiple users, but the DBMS and the database reside totally at a single computer site. A distributed DBMS (DDBMS) can have the actual database and DBMS software distributed over many sites, connected by a computer network. Homogeneous DDBMSs use the same DBMS software at all the sites, whereas

Speech input and output

Limited use of speech as an input query and speech as an answer to a question or result of a request is becoming commonplace. Applications with limited vocabularies, such as inquiries for telephone directory, flight arrival/departure, and credit card account information, are allowing speech for input and output to enable customers to access this information. The speech input is detected using a library of predefined words and used to set up the parameters that are supplied to the queries. For output, a similar conversion from text or numbers into speech takes place.

DBMS Component Modules

The database and the DBMS catalog are usually stored on disk. Access to the disk is controlled primarily by the operating system (OS), which schedules disk read/write. Many DBMSs have their own buffer management module to schedule disk read/write, because this has a considerable effect on performance. Reducing disk read/write improves performance considerably. A higher-level stored data manager module of the DBMS controls access to DBMS information that is stored on disk, whether it is part of the database or the catalog.
Let us consider the top part of Figure 2.3 first. It shows interfaces for the DBA staff, casual users who work with interactive interfaces to formulate queries, application programmers who create programs using some host programming languages, and parametric users who do data entry work by supplying parameters to predefined transactions. The DBA staff works on defining the database and tuning it by making changes to its definition using the DDL and other privileged commands.
The DDL compiler processes schema definitions, specified in the DDL, and stores descriptions of the schemas (meta-data) in the DBMS catalog. The catalog includes information such as the names and sizes of files, names and data types of data items, storage details of each file, mapping information among schemas, and constraints. In addition, the catalog stores many other types of information that are needed by the DBMS modules, which can then look up the catalog information as needed.
Casual users and persons with occasional need for information from the database interact using some form of interface, which we call the interactive query interface in Figure 2.3. We have not explicitly shown any menu-based or form-based interaction that may be used to generate the interactive query automatically. These queries are parsed and validated for correctness of the query syntax, the names of files, data elements, and so on by a query compiler that compiles them into an internal form. This internal query is subjected to query optimization (discussed in Chapters 19 and 20). Among other things, the query optimizer is concerned with the rearrangement and possible reordering of operations, elimination of redundancies, and use of correct algorithms and indexes during execution. It consults the system catalog for statistical and other physical information about the stored data and generates executable code that performs the necessary operations for the query and makes calls on the runtime processor.
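The two stages described above, validation by the query compiler and plan selection by the optimizer, can be observed in a small way with Python's sqlite3 module. This is only a sketch using SQLite as a convenient stand-in. The section table, its columns, the sample rows, and the index name are assumptions, and the exact wording of the plan text differs between SQLite versions, so the code inspects it rather than relying on a fixed string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE section (
        section_id    INTEGER PRIMARY KEY,
        course_number TEXT,
        semester      TEXT,
        year          INTEGER
    )
""")
# Illustrative rows only.
conn.executemany(
    "INSERT INTO section VALUES (?, ?, ?, ?)",
    [(85, "MATH2410", "Fall", 2007), (112, "MATH2410", "Fall", 2008)],
)

query = "SELECT section_id FROM section WHERE semester = 'Fall' AND year = 2008"

def plan(sql):
    """Return the plan steps chosen by the optimizer (last column of each row)."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Validation: a name that is not in the catalog is rejected at compile time.
try:
    conn.execute("SELECT section_id FROM no_such_table")
    rejected = False
except sqlite3.OperationalError:
    rejected = True

plan_before = plan(query)   # no suitable access path exists yet
conn.execute("CREATE INDEX idx_section_term ON section (semester, year)")
plan_after = plan(query)    # the optimizer can now pick the new index

print(rejected, plan_before, plan_after)
```

The query text is identical in both calls to plan; only the catalog changed, and with it the access path the optimizer selects.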

Distinction between database schema and database state

The distinction between database schema and database state is very important. When we define a new database, we specify its database schema only to the DBMS. At this point, the corresponding database state is the empty state with no data. We get the initial state of the database when the database is first populated or loaded with the initial data. From then on, every time an update operation is applied to the database, we get another database state. At any point in time, the database has a current state. The DBMS is partly responsible for ensuring that every state of the database is a valid state, that is, a state that satisfies the structure and constraints specified in the schema. Hence, specifying a correct schema to the DBMS is extremely important and the schema must be designed with utmost care. The DBMS stores the descriptions of the schema constructs and constraints (also called the meta-data) in the DBMS catalog so that DBMS software can refer to the schema whenever it needs to. The schema is sometimes called the intension, and a database state is called an extension of the schema.
Although, as mentioned earlier, the schema is not supposed to change frequently, it is not uncommon that changes occasionally need to be applied to the schema as the application requirements change. For example, we may decide that another data item needs to be stored for each record in a file, such as adding the Date_of_birth to the STUDENT schema in Figure 2.1. This is known as schema evolution. Most modern DBMSs include some operations for schema evolution that can be applied while the database is operational.
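The progression from empty state to initial state to later states, and then a schema evolution step, can be sketched with Python's sqlite3 module. The STUDENT column names, the CHECK constraint, and the sample rows below are assumptions made for illustration and are not taken from Figure 2.1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Defining the schema gives the empty state: structure, but no data.
conn.execute("""
    CREATE TABLE student (
        name           TEXT NOT NULL,
        student_number INTEGER PRIMARY KEY,
        class          INTEGER CHECK (class BETWEEN 1 AND 5),
        major          TEXT
    )
""")
empty_state = conn.execute("SELECT * FROM student").fetchall()

# Loading the initial data gives the initial state.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [("Smith", 17, 1, "CS"), ("Brown", 8, 2, "CS")],
)
initial_state = conn.execute(
    "SELECT * FROM student ORDER BY student_number"
).fetchall()

# The DBMS refuses an update that would lead to an invalid state.
try:
    conn.execute("INSERT INTO student VALUES ('Lee', 21, 9, 'MATH')")
    invalid_accepted = True
except sqlite3.IntegrityError:
    invalid_accepted = False

# Schema evolution: add a data item while the database holds data.
conn.execute("ALTER TABLE student ADD COLUMN date_of_birth TEXT")
columns = [row[1] for row in conn.execute("PRAGMA table_info(student)")]
evolved_state = conn.execute(
    "SELECT * FROM student ORDER BY student_number"
).fetchall()

print(empty_state, initial_state, invalid_accepted, columns, evolved_state)
```

Existing rows survive the schema change and simply carry no value for the new data item until one is supplied.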

Three-schema architecture

The goal of the three-schema architecture, illustrated in Figure 2.2, is to separate the user applications from the physical database. In this architecture, schemas can be defined at the following three levels:
1. The internal level has an internal schema, which describes the physical storage structure of the database. The internal schema uses a physical data model and describes the complete details of data storage and access paths for the database.
2. The conceptual level has a conceptual schema, which describes the structure of the whole database for a community of users. The conceptual schema hides the details of physical storage structures and concentrates on describing entities, data types, relationships, user operations, and constraints. Usually, a representational data model is used to describe the conceptual schema when a database system is implemented. This implementation conceptual schema is often based on a conceptual schema design in a high-level data model.
3. The external or view level includes a number of external schemas or user views. Each external schema describes the part of the database that a particular user group is interested in and hides the rest of the database from that user group. As in the previous level, each external schema is typically implemented using a representational data model, possibly based on an external schema design in a high-level data model.
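The three levels listed above can be loosely mirrored in a relational system, with a table as the conceptual level, a view as one external schema, and an index as an internal-level detail. The sketch below uses Python's sqlite3 module. All names and sample values are made up for illustration, and a real DBMS exposes far more internal-level detail than a single index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: the structure of the whole database for all users.
conn.execute("""
    CREATE TABLE student (
        student_number INTEGER PRIMARY KEY,
        name           TEXT,
        major          TEXT,
        date_of_birth  TEXT
    )
""")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [(17, "Smith", "CS", "2001-05-04"), (8, "Brown", "CS", "2000-11-30")],
)

# External level: a view for one user group that hides date_of_birth.
conn.execute("""
    CREATE VIEW student_directory AS
    SELECT student_number, name, major FROM student
""")

def directory():
    """What the user group sees through its external schema."""
    return conn.execute(
        "SELECT * FROM student_directory ORDER BY student_number"
    ).fetchall()

before = directory()

# Internal level: add an access path; neither table nor view definition changes.
conn.execute("CREATE INDEX idx_student_major ON student (major)")
after = directory()

view_columns = [
    d[0] for d in conn.execute("SELECT * FROM student_directory").description
]
print(view_columns, before == after)
```

The user group's query and its results are the same before and after the internal-level change, which is the separation the architecture aims for.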

Use of the three-schema architecture

The three-schema architecture is a convenient tool with which the user can visualize the schema levels in a database system. Most DBMSs do not separate the three levels completely and explicitly, but support the three-schema architecture to some extent. Some older DBMSs may include physical-level details in the conceptual schema. The three-level ANSI architecture has an important place in database technology development because it clearly separates the users' external level, the database's conceptual level, and the internal storage level for designing a database. It is very much applicable in the design of DBMSs, even today. In most DBMSs that support user views, external schemas are specified in the same data model that describes the conceptual-level information (for example, a relational DBMS like Oracle uses SQL for this). Some DBMSs allow different data models to be used at the conceptual and external levels. An example is Universal Data Base (UDB), a DBMS from IBM, which uses the relational model to describe the conceptual schema, but may use an object-oriented model to describe an external schema.

DML

Whenever DML commands, whether high level or low level, are embedded in a general-purpose programming language, that language is called the host language and the DML is called the data sublanguage. On the other hand, a high-level DML used in a standalone interactive manner is called a query language. In general, both retrieval and update commands of a high-level DML may be used interactively and are hence considered part of the query language.
Casual end users typically use a high-level query language to specify their requests, whereas programmers use the DML in its embedded form. For naive and parametric users, there usually are user-friendly interfaces for interacting with the database; these can also be used by casual users or others who do not want to learn the details of a high-level query language. We discuss these types of interfaces next.
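To make the host language and data sublanguage roles concrete, the sketch below embeds SQL DML in Python through the sqlite3 module. Python plays the host language and SQL the data sublanguage. The grade_report table and the two functions are hypothetical, and they resemble the predefined transactions a parametric user would invoke by supplying only parameter values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE grade_report (
        student_number INTEGER,
        section_id     INTEGER,
        grade          TEXT
    )
""")

def record_grade(student_number, section_id, grade):
    """Update command embedded in the host language; values are parameters."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO grade_report VALUES (?, ?, ?)",
            (student_number, section_id, grade),
        )

def grades_for(student_number):
    """Retrieval command embedded in the host language."""
    return conn.execute(
        "SELECT section_id, grade FROM grade_report "
        "WHERE student_number = ? ORDER BY section_id",
        (student_number,),
    ).fetchall()

record_grade(17, 112, "B")
record_grade(17, 119, "C")
grades = grades_for(17)
print(grades)
```

Typing the same SELECT statement directly into an interactive SQL shell would be the standalone, query-language use of the same DML.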

Schemas, Instances, and Database State

In any data model, it is important to distinguish between the description of the database and the database itself. The description of a database is called the database schema, which is specified during database design and is not expected to change frequently. Most data models have certain conventions for displaying schemas as diagrams. A displayed schema is called a schema diagram. Figure 2.1 shows a schema diagram for the database shown in Figure 1.2; the diagram displays the structure of each record type but not the actual instances of records. We call each object in the schema, such as STUDENT or COURSE, a schema construct.
A schema diagram displays only some aspects of a schema, such as the names of record types and data items, and some types of constraints. Other aspects are not specified in the schema diagram; for example, Figure 2.1 shows neither the data type of each data item nor the relationships among the various files. Many types of constraints are not represented in schema diagrams. A constraint such as "students majoring in computer science must take CS1310 before the end of their sophomore year" is quite difficult to represent diagrammatically.
The actual data in a database may change quite frequently. For example, the database shown in Figure 1.2 changes every time we add a new student or enter a new grade. The data in the database at a particular moment in time is called a database state or snapshot. It is also called the current set of occurrences or instances in the

