1. Introduction to DBMS
What does a DBMS store?
A very large, integrated collection of data.
What information does a DBMS model?
Entities (e.g., students, courses) Relationships (e.g., Madonna is taking CS564)
• What is a transaction? What guarantees does a DBMS offer with respect to transactions? (Section 1.7)
The ACID Test Atomic - Atomicity requires that each transaction be "all or nothing" Consistency - The consistency property ensures that any transaction will bring the database from one valid state to another Isolation - The isolation property ensures that the concurrent execution of transactions results in a system state that would be obtained if transactions were executed serially Durability - Durability means that once a transaction has been committed, it will remain so, even in the event of power loss, crashes, or errors
What is the relational data model? (Section 1.5)
Main concept: relation is a table with rows and columns) (Every relation has a schema which defines columns or fields)
What is the WAL property, and why is it important?
The Write-Ahead-Log is a history of all actions carried out by the DBMS while executing a set of Xacts. Before a change is made to the database, the corresponding log entry is forced to a safe location. After a crash, the effects of partially executed transactions are undone using the log. (Thanks to WAL, if log entry wasn't saved before the crash, corresponding change was not applied to database!) The WAL property ensures atomicity.
What is the most widely used model of data today? (Section 1.5)
The relational model of data
Why Study Databases?
- Shift from computation to information (low end: scramble to webspace high end: scientific applications) - Datasets increasing in diversity and volume - DBMS encompasses most of CS (OS, languages, theory, AI, multimedia, logic)
Explain the advantages of using a query language instead of custom programs to process data. (Section 1.6)
A DBMS takes great care to evaluate queries as efficiently as possible. Indexes can be used to speed up many queries----in fact, a good choice of indexes for the underlying relations can speed up each query in the preceding list. A DBMS enables users to create, modify, and query data through a data manipulation language (DML).
What is a DBMS
A Database Management System is a software package designed to store and manage databases.
What is a schema? (Section 1.5)
A description of a particular collection of data, using a given data model.
What does a DBMS model?
A real-world enterprise.
What is a transaction?
An atomic sequence of database actions (reads/writes).
Explain the strict two-phase locking protocol.
Before reading/writing an object, a transaction requests a lock on the object, and waits till the DBMS gives it the lock. All locks are released at the end of the transaction.
What must a user guarantee with respect to a transaction and database consistency?
Ensuring that a transaction (run alone) preserves consistency is ultimately the user's responsibility!
(25 pts) Scrooge McNugget wants to store information (names, addresses, descriptions of embarrassing moments, etc.) about the many ducks on his payroll. Not surprisingly, the volume of data compels him to buy a database system. To save money, he wants to buy one with the fewest possible features, and he plans to run it as a stand-alone application on his PC clone. Of course, Scrooge does not plan to share his list with anyone. Indicate which of the following DBMS features Scrooge should pay for; in each case, also indicate why Scrooge should (or should not) pay for that feature in the system he buys.
(a) A security facility is necessary because Scrooge does not plan to share his list with anyone else. Even though he is running it on his stand-alone PC, a rival duckster could break in and attempt to query his database. The databases security features would foil the intruder. (5 pts) (b) Concurrency control is not needed because only he uses the database.(5 pts) (c) Crash recovery is essential for any database; Scrooge would not want to lose his data if the power was interrupted while he was using the system.(5 pts) (d) A view mechanism is needed. Scrooge could use this to develop custom screens that he could conveniently bring up without writing long queries repeatedly.(5 pts) (e) A query language is necessary since Scrooge must be able to analyze the dark secrets of his victims. In particular, the query language is also used to dene views.
• When would you store data in a DBMS instead of in operating system files and vice-versa? (Section 1.3)
- A database application must stage large datasets between main memory and secondary storage (e.g., buffering, page-oriented access, 32-bit addressing, etc.) - DBMS has special code for different queries - DBMS must protect data from inconsistency due to multiple concurrent users - DBMS must support crash recovery - DBMS must have security and access control
• What are the main benefits of using a DBMS to manage data in applications involving extensive data access? (Sections 1.1, 1.4)
- Data Independence - Efficient Data Access - Data Integrity and Security - Data Administration - Concurrent Access and Crash Recovery - Reduced Application Development Time
Why would you choose a database system instead of simply storing data in operating system files? When would it make sense not to use a database system?
- Data Independence - Efficient Data Access - Data Integrity and Security - Data Administration - Concurrent Access and Crash Recovery - Reduced Application Development Time A DBMS is a complex piece of software, optimized for certain kinds of workloads (e.g., answering complex queries or handling many concurrent requests), and its performance may not be adequate for certain specialized applications. Examples include applications with tight real-time constraints or just a few well-defined critical operations for which efficient custom code must be written. Another reason for not using a DBMS is that an application may need to manipulate the data in ways not supported by the query language. In such a situation, the abstract view of the data presented by the DBMS does not match the application's needs and actually gets in the way. As an example, relational databases do not support flexible analysis of text data (although vendors are now extending their products in this direction).
• What are locks in a DBMS, and why are they used?
1. Every object that is read or written by a transaction is first locked in shared or exclusive mode, respectively. Placing a lock on an object restricts its availability to other transactions and thereby affects performance. Suppose that the following locking protocol is followed: Every transaction begins by obtaining a shared lock on each data object that it needs to read and an exclusive lock on each data object that it needs to rnod~fy, then releases all its locks after completing all actions. Consider two transactions T1 and T2 such that T1 wants to modify a data object and T2 wants to read the same object. Intuitively, if T1's request for an exclusive lock on the object is granted first, T2 cannot proceed until T1 relea..':les this lock, because T2's request for a shared lock will not be granted by the DBMS until then. Thus, all of T1's actions will be completed before any of T2's actions are initiated.
What is a data model? (Section 1.5)
A collection of concepts for describing data.
What is data independence and how does a DBMS support it?
Applications insulated from how data is structured and stored. Logical data independence: Protection from changes in logical structure of data. Physical data independence: Protection from changes in physical structure of data. Data independence is achieved through use of the three levels of data abstraction. The conceptual schema describes stored data in terms of a data model. The physical schema specifies additional storage details. The external schema, allows data access to be customized and authorized at the level of individual users or groups.
Explain the difference between logical and physical data independence.
Logical data independence: Protection from changes in logical structure of data. Physical data independence: Protection from changes in physical structure of data.
What is logical data independence and why is it important?
Logical data independence: Protection from changes in logical structure of data. It shields users from changes in the logical structure of the data, or changes in the choice of relations to be stored.
• Explain the different roles of database administrators, application programmers, and end users of a database. Who needs to know the most about database systems? (Section 1.9)
The DBA is responsible for many critical tasks: • Design of the Conceptual and Physical Schemas: The DBA is responsible for interacting with the users of the system to understand what data is to be stored in the DBMS and how it is likely to be used. Based on this knowledge, the DBA must design the conceptual schema (decide what relations to store) and the physical schema (decide how to store them). The DBA may also design widely used portions of the external schema, although users probably augment this schema by creating additional views. • Security and Authorization: The DBA is responsible for ensuring that unauthorized data access is not permitted. In general, not everyone should be able to access all the data. In a relational DBMS, users can be granted permission to access only certain views and relations. For example, although you might allow students to find out course enrollments and who teaches a given course, you would not want students to see faculty salaries or each other's grade information. The DBA can enforce this policy by giving students permission to read only the Courseinfo view. • Data Availability and Recovery from Failures: The DBA must take steps to ensure that if the system fails, users can continue to access as much of the uncorrupted data as possible. The DBA must also work to restore the data to a consistent state. The DBMS provides software support for these functions, but the DBA is responsible for implementing procedures to back up the data periodically and maintain logs of system activity (to facilitate recovery from a crash). • Database Tuning: Users' needs are likely to evolve with time. The DBA is responsible for modifying the database, in particular the conceptual and physical schemas, to ensure adequate performance as requirements change. Database application programmers develop packages that facilitate data access for end users, who are usually not computer professionals, using the host or data languages and software tools that DBMS vendors provide. (Such tools include report writers, spreadsheets, statistical packages, and the like.) Application programs should ideally access data through the external schema. It is possible to write applications that access data at a lower level, but such applications would comprornise data independence. DBA's need to know the most about database systems
Why does a DBMS interleave the actions of different transactions instead of executing transactions one after the other?
The DBMS interleaves the actions of different transaction in order to ensure consistency, e.g., prevent a situation such as a check being cleared while an account balance is being computed.
Explain the difference between external, internal, and conceptual schemas.
The external schema, allows data access to be customized and authorized at the level of individual users or groups. The conceptual schema describes stored data in terms of a data model. The physical schema specifies additional storage details.
What should a DBMS guarantee with respect to concurrent execution of several transactions and database consistency'?
Users can specify some simple integrity constraints on the data, and the DBMS will enforce these constraints. The ACID Test Atomic - Atomicity requires that each transaction be "all or nothing" Consistency - The consistency property ensures that any transaction will bring the database from one valid state to another Isolation - The isolation property ensures that the concurrent execution of transactions results in a system state that would be obtained if transactions were executed serially Durability - Durability means that once a transaction has been committed, it will remain so, even in the event of power loss, crashes, or errors
• Identify the main components in a DBMS and briefly explain what they do. (Section 1.8)
http://ecomputernotes.com/fundamental/what-is-a-database/components-of-dbms http://www.scs.ryerson.ca/~mcarberr/cps610/Recap.pdf 1. File Manager manages the allocation space on disk storage and the data structures used to represent info stored on other media. In most applications (99.9%) the file is the central element. All applications are designed with the specific goal: generation and use of information. A typical file system layered architecture is the following (see also CPS510). User Program Sequential Indexed Random Lists Logical I/O Basic File System Structure Device Drivers (Disk,tape,etc) Controllers Actual Device 2. Buffer Manager among other tasks, it transfers blocks between disk (or other devices) and Main Memory (MM). A DMA (Direct Memory Access) is a form of I/O that controls the exchange of blocks between MM and a device. When a processor receives a request for a transfer of a block, it sends it to the DMA which transfers the block uninterrupted. 3. Query Parser translates statements in a query language, whether embedded or not, into a lower level language. (See RL language example from CPS510). This parser is also a strategy selector: i.e., finding the best and most efficient way (faster?) of executing the query. 4. Authorization and Integrity Manager checks for the authority of the users to access and modify info, as well as integrity constraints (keys, etc). 5. Recovery Manager ensures that the database is and remains in a consistent (sound) state after any kind of failure. 6. Concurrency Controller enforces Mutual Exclusion by ensuring that concurrent interactions with the data base proceed without conflict (deadlocks, etc).
What are the responsibilities of a DBA? If we assume that the DBA is never interested in running his or her own queries, does the DBA still need to understand query optimization? Why?
• Design of the Conceptual and Physical Schemas: The DBA is responsible for interacting with the users of the system to understand what data is to be stored in the DBMS and how it is likely to be used. Based on this knowledge, the DBA must design the conceptual schema (decide what relations to store) and the physical schema (decide how to store them). The DBA may also design widely used portions of the external schema, although users probably augment this schema by creating additional views. • Security and Authorization: The DBA is responsible for ensuring that unauthorized data access is not permitted. In general, not everyone should be able to access all the data. In a relational DBMS, users can be granted permission to access only certain views and relations. For example, although you might allow students to find out course enrollments and who teaches a given course, you would not want students to see faculty salaries or each other's grade information. The DBA can enforce this policy by giving students permission to read only the Courseinfo view. • Data Availability and Recovery from Failures: The DBA must take steps to ensure that if the system fails, users can continue to access as much of the uncorrupted data as possible. The DBA must also work to restore the data to a consistent state. The DBMS provides software support for these functions, but the DBA is responsible for implementing procedures to back up the data periodically and maintain logs of system activity (to facilitate recovery from a crash). • Database Tuning: Users' needs are likely to evolve with time. The DBA is responsible for modifying the database, in particular the conceptual and physical schemas, to ensure adequate performance as requirements change. Actually the DBA needs to understand query optimization even more than the people who are running the queries, because how fast the query returns a result directly correlates to the value of the database to the end user. In addition, query optimization is an entire subfield of databases, and can be very difficult to do correctly.