Databases Intro

Pataasin ang iyong marka sa homework at exams ngayon gamit ang Quizwiz!

Miniworld/Universe of Discourse

- The end users of a database may perform business transactions (for example, a customer buys a camera) or events may happen (for example, an employee has a baby) that cause the information in the database to change -In order for a database to be accurate and reliable at all times, it must be a true reflection of the miniworld that it represents; therefore, changes must be reflected in the database as soon as possible

Backup and recovery subsystem

-A DBMS must provide facilities for recovering from hardware or software failures -The backup and recovery subsystem of the DBMS is responsible for recovery. -For example, if the computer system fails in the middle of a complex update transaction, the recovery subsystem is responsible for making sure that the database is restored to the state it was in before the transaction started executing -Disk backup is also necessary in case of a catastrophic disk failure

Database (uses)

-A collection of related data; by data, we mean known facts that can be recorded and that have implicit meaning; this collection of related data with an implicit meaning is a database -Data typically stored in mobile phones have their own simple database software -This data can also be recorded in an indexed address book or stored on a hard drive, using a personal computer and software such as Microsoft Access or Excel

Manipulating a database

-Includes functions such as querying the database to retrieve specific data, updating the database to reflect changes in the miniworld, and generating reports from the data

Database Management System/DBMS (definition)

-A computerized system that enables users to create and maintain a database; a collection of programs managing databases -An interface between databases and end users -The DBMS is a general-purpose software system; not specific purpose, can be applied to general case -Facilitates the processes of defining, constructing, manipulating, and sharing databases among various users and applications -Manage: create, read, update and delete data -Usually running on database servers, especially for large databases; exception is embedded databases -Very sophisticated with built-in efficient algorithms and data structures -Built based on some data model (logical structure of the database) ex. Relational model -Operations through standard language, SQL

View and Virtual Data

-A database typically has many types of users, each of whom may require a different perspective or view of the database on the same set of data -A view may be a subset of the database or it may contain virtual data that is derived from the database files but is not explicitly stored; some users may not need to be aware of whether the data they refer to is stored or derived -Virtual data is not physically stored and is derived from physical data (multiple tables, each table has a physical storage location in hard drive, view data may be derived from this); this saves storage space, can be physically stored if necessary (if important, or needs to be reused) -A multiuser DBMS whose users have a variety of distinct applications must provide facilities for defining multiple views -For example, one user of the database of may be interested only in accessing and printing the transcript of each student requires a specific view, while a second user, who is interested only in checking that students have taken all the prerequisites of each course for which the student registers, may require a different view

Referential Integrity constraint

-A more complex type of constraint that frequently occurs involves specifying that a record in one file must be related to records in other files (done by DBMS) -Foreign key

Concurrency Control

-A multiuser DBMS must allow multiple users to access the database at the same time; this is essential if data for multiple applications is to be integrated and maintained in a single database -The DBMS must include concurrency control software to ensure that several users trying to update the same data do so in a controlled manner so that the result of the updates is correct

Query vs Transaction

-A query typically causes some data to be retrieved; sometimes loosely used for all types of interactions with databases, including modifying the data -In a narrow sense, query means data retrieval; in a broader sense, query means any database access, including modifying data -A transaction may cause some data to be read and some data to be written into the database

Maintaining database (by DBMS)

-A typical large database may have a life cycle of many years, so the DBMS must be able to maintain the database system by allowing the system to evolve as requirements change over time

Application Program (purpose)

-Accesses the database by sending (general) queries or requests for data to the DBMS -May include creating a database, creating a table, creating a stored procedure, calling a stored procedure, or doing other regular queries; all of these are related to database design and are a big decision because changing the structure of a database

Sharing a database

-Allows multiple users and programs to access the database simultaneously

NOSQL Systems (purpose)

-Also referred to as big data storage systems -Data from social media sites ; store nontraditional data, such as posts, tweets, images, and video clips

Transaction (definition)

-An executing program or process that includes one or more database accesses, such as reading or updating of database records -Each transaction is supposed to execute a logically correct database access if executed in its entirety without interference from other transactions

DBMS and controlled redundancy

-DBMS should have capabilities to control this redundancy in order to prohibit inconsistencies among the files -DBMS can enforce certain constraints; cannot enforce business rules as DBMS is a general-purpose software -Application programs enforce business rules; responsibility of application programmers

What is a Database

-In general, a database is a collection of related data; emphasize some relationship -A database has some business purpose; model aspects of reality, business trying to solve real world problem, model or represent something relevant in real world -Organized in a special way to support certain operations (done by DBMS): Retrieval, Insertion (new data), Update (modify), Deletion

Program-operation independence

-In some types of database systems, such as object-oriented and object-relational systems, users can define operations on data as part of the database definitions; an operation (also called a function or method) is specified in two parts: 1. The interface (or signature) of an operation includes the operation name and the data types of its arguments (or parameters). 2. The implementation (or method) of the operation is specified separately and can be changed without affecting the interface. -User application programs can operate on the data by invoking these operations through their names and arguments, regardless of how the operations are implemented (called program-operation independence)

Integrity constraint (definition)

-Integrity constraints are used to ensure accuracy and consistency of data in a relational database -Most database applications have certain integrity constraints that must hold for the data -A DBMS should provide capabilities for defining and enforcing these constraints -The simplest type of integrity constraint involves specifying a data type for each data item (done by DBMS) -These constraints are derived from the meaning or semantics of the data and of the miniworld it represents -It is the responsibility of the database designers to identify integrity constraints during database design -Some constraints can be specified to the DBMS and automatically enforced; other constraints may have to be checked by update programs or at the time of data entry; for typical large applications, it is customary to call such constraints business rules

Defining a database

-Involves specifying the data types, structures, and constraints of the data to be stored in the database -Structure of a database (tables, table definitions) -Data types (inside a table, for each data, what is the type) -Constraints of data (impose conditions on data); if data violates constraint, DBMS does not allow it -Uniqueness (ex. all columns in certain table have unique value) -The database definition or descriptive information is also stored by the DBMS in the form of a database catalog or dictionary (called meta-data)

Controlled Redundancy

-It is sometimes necessary to use controlled redundancy to improve the performance of queries -Faster access if stored in a new table with redundancy -By placing all the data together, we do not have to search multiple files to collect the data -So, multiple table access with complex queries is replaced by single table access with simple query (known as data denormalization) -DBMS should have capabilities to control this redundancy in order to prohibit inconsistencies among the files

Stored Procedures

-More involved procedures to enforce rules are popularly called stored procedures; they become a part of the overall database definition and are invoked appropriately when certain conditions are met -In form of functions (with input and output) -Written in SQL -Execute a sequence SQL statements in DBMS -Part of database definition -Can be called locally and remotely (from the application programs) -Can be created in application programs -Can reduce data traffic on Internet -Can enforce business rules

Traditional Database (properties)

-Most of the information that is stored and accessed is either textual or numeric -Relational Database

Data Format Stored in databases

-Most popular: textural and numeric -Multimedia data includes images, audios, and videos (all are binary files) -Two ways to store multimedia data: 1)Binary: Store data directly; SQL format is BLOBS (Binary large objects) 2)Textural: Store file names in databases and leave the data on a hard drive of the server; have the location/path to access a given file; usually used in web applications

Nontradtional Database (properties)

-NOSQL (not only SQL) -New databases used on top of relational databases (SQL) to do something more -Much simpler than relational databases, as traditional databases handle the simple data in a complex manner -Mainly handle relatively simple data but a large amount supporting a fast search -Take advantage of simple structure of nontraditional data (performance) -Not replacing traditional databases because need to handle complicated data; augment traditional databases to support fast performance for social media data

Data format for social media

-Nontraditional data: posts, tweets, images, video clips, etc -These have a large data amount -The structure of the data is fairly simple: typically key-value pairs; for each data piece, there is a key which is used for search operations -If this data is treated as traditional data, must use complex traditional way to manipulate it; slows performance

Embedded Database

-Powered by database engines -May be installed on small device, such as a phone app -Not remote, is a local database that does not require internet connection -Since it is embedded, it is a small database and thus cannot include database server; instead, include database engine for all necessary queries

Object-oriented databases

-Programming languages typically have complex data structures, such as structs or class definitions in C++ or Java -The values of program variables or objects are discarded once a program terminates, unless the programmer explicitly stores them in permanent files, which often involves converting these complex structures into a format suitable for file storage -When the need arises to read this data once more, the programmer must convert from the file format to the program variable or object structure -Object-oriented database systems are compatible with programming languages such as C++ and Java, and the DBMS software automatically performs any necessary conversions -Hence, a complex object in C++ can be stored permanently in an object-oriented DBMS

Protecting database (by DBMS)

-Protection includes system protection against hardware or software malfunction (or crashes) and security protection against unauthorized or malicious access

Meta-Data

-Refers to database definition; different from business data -Data used for database definition; descriptive data for data -Stored by the DBMS in the form of a database catalog or dictionary -For traditional databases, the meta-data (stored definition of a database) is part of the database, and the DBMS uses software to access and modify the meta-data of a primary database when the user modifies the primary database -NOSQL systems do not have meta-data; data is simple and self-descriptive, already contains structure data you need

Database (properties)

-Represents some aspect of the real world, sometimes called the miniworld or the universe of discourse (UoD); changes to the miniworld are reflected in the database -A database is a logically coherent collection of data with some inherent meaning; a random assortment of data cannot correctly be referred to as a database -A database is designed, built, and populated with data for a specific purpose; it has an intended group of users and some preconceived applications in which these users are interested -Overall, a database has some source from which data is derived, some degree of interaction with events in the real world, and an audience that is actively interested in its contents

Data denormalization

-Simplify the query -Space-performance trade off -User more space to exchange for better performance (with more effort for data consistency) -Update derived data for consistency (extra cost)

Indexes

-Since the database is typically stored on disk, the DBMS must provide specialized data structures and search techniques to speed up disk search for the desired records -Auxiliary files called indexes used for this; these files can be sorted (such as binary search) while original files cannot be sorted -Indexes are typically based on tree data structures or hash data structures that are suitably modified for disk search -In order to process the database records needed by a particular query, those records must be copied from disk to main memory; therefore, the DBMS often has a buffering or caching module that maintains parts of the database in main memory buffers; make data loading from disk to memory faster -In general, the operating system is responsible for disk-to-memory buffering; however, because data buffering is crucial to the DBMS performance, most DBMSs do their own data buffering -The query processing and optimization module of the DBMS is responsible for choosing an efficient query execution plan for each query based on the existing storage structures; the choice of which indexes to create and maintain is part of physical database design and tuning

Uniqueness Constraint

-Specifies uniqueness on data item values, such as a key (done by DBMS)

Data persistence (definition)

-Store data permanently (such as programming objects, which are data with complex structures) on hard disk -Databases can be used to provide persistent storage for program objects and data structures -Such an object is said to be persistent, since it survives the termination of program execution and can later be directly retrieved by another program

Multimedia databases (use)

-Store images, audio clips, and video streams digitally (as files)

Transaction properties

-The DBMS must enforce several transaction properties -The isolation property ensures that each transaction appears to execute in isolation from other transactions, even though hundreds of transactions may be executing concurrently -The atomicity property ensures that either all the database operations in a transaction are executed or none are -Define a business task containing a sequence of database accesses that cannot be separated (cannot execute only "part" of it)

Database System (broad definition)

-The database and DBMS software together can be referred to as the database system -The database system includes Application programs/Queries, the DBMS software, the stored database, and the stored database definition (meta-data)

When to not use a DBMS

-The overhead costs of using a DBMS are due to the following: High initial investment in hardware, software, and training; the generality that a DBMS provides for defining and processing data; overhead for providing security, concurrency control, recovery, and integrity functions -It may be more desirable to develop customized database applications under the following circumstances: Simple, well-defined database applications that are not expected to change at all; stringent, real-time requirements for some application programs that may not be met because of DBMS overhead; embedded systems with limited storage capacity, where a general-purpose DBMS would not fit; no multiple-user access to data

Constructing the database

-The process of storing the data on some storage medium that is controlled by the DBMS

Impedance Mismatch Problem

-Traditional database systems often suffered from the socalled impedance mismatch problem, since the data structures provided by the DBMS were incompatible with the programming language's data structures -The RDBMS is a tabular data structure; program objects are a tree data structure; multiple tables with correlations may be needed to store one object, and the maintenance cost is high -Object-oriented database systems typically offer data structure compatibility with one or more object-oriented programming languages


Kaugnay na mga set ng pag-aaral

Odessey American Government CH1 section 2

View Set

Chapter 41: Gastrointestinal Dysfunction NCLEX

View Set

Which of the following words is most closely related to technological risks associated with HVAC?

View Set

Evolve Cardiovascular System, Blood, and Lymphatic Systems

View Set

Medical Terminology- Chapters 1 + 2

View Set