Everything

¡Supera tus tareas y exámenes ahora con Quizwiz!

DBMS Selection

- Define Terms of Reference of study: states the objectives and scope of the study and the tasks that need to be undertaken - Shortlist two or three products - Evaluate products - Recommend selection

trigger

- Defines an action that the database should take when some event occurs in the application. - Based on Event-Condition-Action (ECA) model. - User must be owner or have TRIGGER privilege.

Statement to grant and revoke system privileges

- Grant System Privileges/Roles dialog box and Revoke System Privileges/Roles dialog box of the Oracle Security Manager - SQL GRANT and REVOKE statements (see Section 7.6)

Java can be connected to an ODBC-compliant DBMS through:

- JDBC (dynamic SQL) or SQLJ (static SQL), - Container-Managed Persistence (CMP) - Java Data Objects (JDO) - Java Persistence API (JPA)

models based on the concepts of open nested transactions and split transactions

- Kangaroo - Reporting and Co-Transactions - MoFlex A mobile host starts a transaction and a subtransaction is started at the connected mobile support station

Attribute

An attribute is a named column of a relation.

Derived attribute

An attribute that represents a value that is derivable from the value of a related attribute or set of attributes, not necessarily in the same entity type.

DA

Data Administrator - responsible for the management of the data resource -database planning - development and maintenance of standards, policies and procedures - conceptual/logical database design - consults with and advises senior managers

DDL

Data Definition Language - The DDL allows users to specify the data types and structures and the constraints on the data to be stored in the database.

DBMS

Database Management System - A software system that enables users to define, create, maintain, and control access to the database.

EJB

Enterprise JavaBeans (EJB) is a server-side component architecture for the business tier, encapsulating business and data logic

"irreversible" and "reversible."

Irreversible techniques, as the name implies, do not permit the original data to be known. However, the data can be used to obtain valid statistical information. Reversible is more commonly used.

JDOQL

JDO Query Language - data-source-neutral query language based on Java boolean expressions

MOLAP

MOLAP servers use specialized data structures and multidimensional database management systems (MDDBMSs) to organize, navigate, and analyze data. To enhance query performance, the data is typically aggregated and stored according to predicted usage. issues include: Only a limited amount of data can be efficiently stored and analyzed. Navigation and analysis of data are limited, because the data is designed according to previously determined requirements. Data may need to be physically reorganized to optimally support new requirements.

Entity types

Main objects user is interested in

asterisk:asterisk is

Many to many

Set difference

R - S The Set difference operation defines a relation consisting of the tuples that are in relation R, but not in S. R and S must be union-compatible. (List all cities where there is a branch office but no properties for rent.)

Cartesian product

R × S The Cartesian product operation defines a relation that is the concatenation of every tuple of relation R with every tuple of relation S. (List the names and comments of all clients who have viewed a property for rent.) kind of default join called a Cartesian Join (or Cross Join)

enhanced modeling concepts

specialization/generalization, aggregation, and composition

semantic net

symbols for entities and relationships:

In deciding where to post (or place) the foreign key attribute(s), we must first identify

the "parent" and "child" entities involved in the relationship

Thomas's write rule

ignore obsolete write rule

Lock that is held for a much shorter duration than a normal lock

latch * A latch can be used before a page is read from, or written to, disk to ensure that the operation is atomic

rigorous 2PL

leave the release of all locks until the end of the transaction

left in a wait state indefinitely, unable to acquire any new locks

livelock

The technique of locking a child node and releasing the lock on the parent node if possible

lock-coupling or crabbing

Concurrency control techniques

locking and timestamping *both conservative (pessimistic) approaches (they cause transactions to be delayed in case they conflict with other transactions at some time in the future)

distributed database

logically interrelated collection of shared data (and a description of this data), physically distributed over a computer network

relational keys

one or more attributes that uniquely identifies each tuple in a relation

too much study too soon leads to...

paralysis by analysis

The process of maximizing the differences between members of an entity by identifying their distinguishing characteristics.

Specialization - bottom-up

System Definition

Specifying the scope and boundaries of the database system, including major user views, its users, and application areas.

CASE benefits:

Standards Integration Support for standard methods Consistency Automation

Strong and Weak entity types

Strong entity type: An entity type that is not existence-dependent on some other entity type. Weak entity type: An entity type that is existence-dependent on some other entity type.

Criteria for data models

Structural validity Simplicity Expressibility Nonredundency Sharability Extensibility (evolve to meet new requirements) Integrity Diagrammic representation (easy to diagram)

SQL

Structured Query Language

SDLC Activities

Systems Development Life Cycle

view maintenance

The process of updating a materialized view in response to changes to the underlying data The basic aim of view maintenance is to apply only those changes necessary to the view to keep it current

The Join operation

combines two relations to form a new relation

database design methodology main phases:

conceptual, logical, and physical database design

A schedule that orders any conflicting operations in the same way as some serial execution

conflict serializability

XML Query Working Group

created XQuery formed to produce a data model for XML documents, a set of query operators on this model, and a query language based on these query operators

subschema

The part of the database as seen by the user or application program

Join operation types

Theta join Equijoin (a particular type of Theta join) Natural join Outer join Semijoin

An action, or series of actions, carried out by a single user or application program, that reads or updates the contents of the database.

Transaction

UML

Unified Modeling Language - first letter is upper-case

subtransaction

types: - compensatable - repeatable - pivot - location-dependent correct execution order of subtransactions ensures the termination despite disconnection

bean

used to encapsulate many objects into a single object (the bean)

A view is created by..

using the CREATE VIEW statement by specifying a defining query

OLAP

Online Analytical Processing The dynamic synthesis, analysis, and consolidation of large volumes of multidimensional data.

global logical data model

Represents the data requirements for all user views

The two principal rules for the relational model

entity integrity referential integrity

The responsibility to authorize use of the DBMS usually rests with the...

Database Administrator (DBA)

Elements

Elements, or tags, are the most common form of markup

(1:*)

One-to-many "One" is parent, primary key should be posted to child

DBMS provides

- a security system - an integrity system - a concurrency control system - a recovery control system - a user-accessible catalog

mandatory participation on one side of 1:1 relationship

"Optional" is parent, primary key should be posted to child (mandatory)

Cursor Attribuites

%ROWCOUNT - evaluates total number of rows returned so far %ISOPEN - true if the cursor is open %FOUND - true if most resent fetch returned a row %NOTFOUND

Categories of OLAP Servers

- Multidimensional OLAP (MOLAP) - Relational OLAP (ROLAP) - Hybrid OLAP (HOLAP) - Desktop OLAP (DOLAP)

Web-DBMS advantages

- Simplicity - Platform independence - Graphical user interface - Standardization - Cross-platform support - Transparent network access - Scalable deployment - Innovation

polyinstantiation

- The presence of data objects that appear to have different values to users with different clearances - Precludes possibility of hints of existing tuples to lower clearance

steal policy

- allows the buffer manager to write a buffer to disk before a transaction commits (the buffer is unpinned). In other words, the buffer manager "steals" a page from the transaction. The alternative policy is no-steal - avoids the need for a very large buffer space

functions that a database management system (DBMS) should provide

- transaction support - concurrency control services - recovery services

Physical data models

- unifying model - frame memory model

Windowing calculations

- used to compute cumulative, moving, and centered aggregates EX:

Timestamping

-if a transaction attempts to read or write a data item, then the read or write is only allowed to proceed if the last update on that data item was carried out by an older transaction - each data item contains a read_timestamp, giving the timestamp of the last transaction to read the item, and a write_timestamp

Guidelines for indexes

-the primary key of a table is automatically indexed, but a field whose data type is Memo, Hyperlink, or OLE Object cannot be indexed. Microsoft advises indexing a field if all the following apply: - the field's data type is Text, Number, Currency, or Date/Time - the user anticipates searching for values stored in the field - the user anticipates sorting values in the field;the user anticipates storing many different values in the field. If many of the values in the field are the same, the index may not significantly speed up queries.

Composite key

A candidate key that consists of two or more attributes

Relational database

A collection of normalized relations with distinct relation names

Foreign key

A column or combination of columns that is used to establish and enforce a link between the data in two tables - an attribute, or set of attributes, within one relation that is the candidate key of another relation A FOREIGN KEY is a key used to link two tables together. A FOREIGN KEY is a field (or collection of fields) in one table that refers to the PRIMARY KEY in another table. The table containing the foreign key is called the child table, and the table containing the candidate key is called the referenced or parent table.

Candidate key

A combination of attributes that uniquely identifies a database record without any extraneous data (minimal superkey)

semantic data modeling

A data model that represents the "real world"

Data Mart

A database that contains a subset of corporate data to support the analytical requirements of a particular business unit (such as the Sales department) or to support users who share the same requirements to analyse a particular business process (such as property sales).

Subclass

A distinct subgrouping of occurrences of an entity type, which must be represented in a data model. Every sub a member of super...but not all super members of subs.

Determining attribute domains

A domain is a pool of values from which one or more attributes draw their values A fully developed data model specifies the domains for each attribute and includes:allowable set of values for the attribute;sizes and formats of the attribute. Document in Data Dictionary.

Domain

A domain is the set of allowable values for one or more attributes.

FOREIGN KEY

A foreign key is a column, or set of columns, that links each row in the child table containing the foreign key to the row of the parent table containing the matching candidate key value. Referential integrity means that, if the foreign key contains a value, that value must refer to an existing, valid row in the parent table - used in CREATE or ALTER TABLE statements

Dimensionality modeling

A logical design technique that aims to present the data in a standard, intuitive form that allows for high-performance access Every dimensional model (DM) is composed of one table with a composite primary key, called the fact table, and a set of smaller tables called dimension tables Each dimension table has a simple (non-composite) primary key that corresponds exactly to one of the components of the composite key in the fact table. Forms 'star-like' structure, which is called a star schema or star join.

XML

A metalanguage (a language for describing other languages) that enables designers to create their own customized tags to provide functionality not available with HTML.

package

A package is a schema object that groups logically related PL/SQL types, variables, and subprograms. Packages usually have two parts, a specification (spec) and a body - a collection of procedures, functions, variables, and SQL statements that are grouped together and stored as a single program unit providing a form of encapsulation - consists of specification and body Oracle package steps: - It compiles the procedure or package. - It stores the compiled code in memory - It stores the procedure or package in the database.

database developer

A person or group of people responsible for the analysis, design, and implementation of a database system.

Ranking functions

A ranking function computes the rank of a record compared to other records in the dataset based on the values of a set of measures EX: RANK( ) OVER (ORDER BY columnList) DENSE_RANK( ) OVER (ORDER BY columnList)

Relation

A relation is a table with columns and rows - subset of the Cartesian product of two or more sets

fragments

A relation may be divided into a number of subrelations called fragments, which are allocated to one or more sites. Fragments may be replicated to provide improved availability and performance.

Relationship type

A set of meaningful associations among entity types

Relational database schema

A set of relation schemas, each with a distinct name.

Database

A shared collection of logically related data and its description, designed to meet the information needs of an organization. *self-describing collection of integrated records

Tuple

A tuple is a row of a relation. The tuples are called the extension (or state) of a relation, which changes over time. The nearest equivalent in SQL is a "row". SQL rows differ from relational tuples in a few fundamental ways, for example: SQL rows are always ordered lists of values rather than unordered sets; SQL rows permit values that are un-named or have duplicate names.

Relationship occurrence

A uniquely identifiable association that includes one occurrence from each participating entity type

Character Data

ASCII and EBCDIC are commonly used character sets

Data Abstraction

Ability to change the internal definition of an object without affecting the users of the object

HAVING

Acts as a WHERE clause for groups, restricting the groups that appear in the final result table

General constraints

Additional rules specified by the users or database administrators of a database that define or constrain some aspect of the enterprise

INSERT

Adds new rows

Advantages and disadvantages of triggers

Advantages: Elimination of redundant code. Simplifying modifications. Increased security. Improved integrity. Improved processing power. Good fit with the client-server architecture. Disadvantages: Performance overhead. Cascading effects. Cannot be scheduled. Less portable.

APB-1

Analytical Processing Benchmark The aim of the APB-1 is to measure a server's overall OLAP performance rather than the performance of individual tasks

AQM

Analytical Queries per Minute a standard benchmark metric that allows for comparison of performances of different combinations of hardware and software

Special needs of a DDBMS

As well as having the standard functionality expected of a centralized DBMS, a DDBMS will need: - extended communication services - extended system catalog - distributed query processing - extended security, concurrency, and recovery services.

ACID properties of a transaction

Atomicity, Consistency, Isolation, and Durability

A property of an entity or a relationship type:

Attribute

The set of allowable values for one or more attributes:

Attribute domain

derived attribute

Attributes whose values are based on the values of other attributes

A mechanism that determines whether a user is who he or she claims to be.

Authentication

EER is used for

CAD CAM (computer aided manufacturing) CASE (computer aided software engineering) OIS (office information systems) EER byproduct of a need for "semantic" modeling concepts

Examples of Oracle roles

CONNECT: Includes only the following system privilege: CREATE SESSION RESOURCE: Includes the following system privileges: CREATE CLUSTER, CREATE INDEXTYPE, CREATE OPERATOR, CREATE PROCEDURE, CREATE SEQUENCE, CREATE TABLE, CREATE TRIGGER, CREATE TYPE DBA: All system privileges WITH ADMIN OPTION

aggregate functions

COUNT SUM AVG MIN MAX

The main aggregate functions

COUNT - returns the number of values in the associated attribute. SUM - returns the sum of the values in the associated attribute. AVG - returns the average of the values in the associated attribute. MIN - returns the smallest value in the associated attribute. MAX - returns the largest value in the associated attribute.

Primary key

Candidate key chosen for use in identification of tuples. A relation must always have a primary key.

used to improve database recovery - all modified buffer blocks, all log records, and a record identifying all active transactions are written to disk

Checkpoints

CGI disadvantages

Common Gateway Interface - a new process is created for each invocation of the CGI script, which can overload the Web server during peak times

The five basic search conditions (or predicates)

Comparison Range (BETWEEN/NOT BETWEEN) Set membership (IN/NOT IN) Pattern match (LIKE/NOT LIKE) Null

Simple/composite attributes

Composite attributes are made up of simple attributes (EX: address using street, city, zip)

Three phases of database design:

Conceptual Logical Physical

Condition handling

DECLARE {CONTINUE | EXIT | UNDO} HANDLERFOR SQLSTATE {sqlstateValue | conditionName | SQLEXCEPTION | SQLWARNING | NOT FOUND} handlerAction; A condition name and an optional corresponding SQLSTATE value can be declared using: DECLARE conditionName CONDITION[FOR SQLSTATE sqlstateValue]

RESTRICT VS CASCADE

DROP behaviors - specifies the action to be taken if the domain is currently being used - if RESTRICT is specified and the domain is used in an existing table, view, or assertion definition the drop will fail - CASCADE, any table column that is based on the domain is automatically changed to use the domain's underlying data type, and any constraint or default clause for the domain is replaced by a column constraint or column default clause, if appropriate

Roles in the Database Environment

Data and Database Administrators Database Designers Application Developers End-Users

DDL

Data definition language

Non-volatile Data

Data in the warehouse is not normally updated in real-time (RT) but is refreshed from operational systems on a regular basis

Time-variant Data

Data in the warehouse is only accurate and valid at some point in time or over some time interval

DP

Data processing

The number of participating entity types in a relationship is called the:

Degree of a relationship type Example of ternary relationship called "Registers":

conceptual schema

Describes all the entities, attributes, and relationships together with integrity constraints

Cardinality

Describes the maximum number of possible relationship occurrences for an entity participating in a given relationship type.

Disjoint constraint

Describes the relationship between members of the subclasses and indicates whether it is possible for a member of a superclass to be a member of one, or more than one, subclass (The disjoint constraint only applies when a superclass has more than one subclass. If the subclasses are disjoint, then an entity occurrence can be a member of only one of the subclasses.)

The difference between RANK and DENSE_RANK:

EX: SELECT branchNo, SUM(saleAmount) AS sales, RANK() OVER (ORDER BY SUM(saleAmount)) DESC AS ranking, DENSE_RANK() OVER (ORDER BY SUM(saleAmount)) DESC AS dense_ranking FROM Branch, PropertySale WHERE Branch.branchNo = PropertySale.branchNo AND Branch.city = 'Edinburgh' GROUP BY(branchNo);

INTERSECT use

EX: Construct a list of all cities where there is both a branch office and a property. (SELECT city FROM branch) INTERSECT (SELECT city FROM PropertyForRent);

Three-level architecture

External - The users' view of the database. This level describes that part of the database that is relevant to each user Conceptual - The community view of the database. This level describes what data is stored in the database and the relationships among the data Internal - The physical representation of the database on the computer. This level describes how the data is stored in the database.

FLWOR expressions

FOR forVar IN inExpression LET letVar := letExpression [WHERE filterExpression] [ORDER BY orderSpec] RETURN expression

extensions

Features that are provided on top of the standard by the vendors.

Updatable view

For a view to be updatable, the DBMS must be able to trace any row or column back to its row or column in the source table.

Data Warehouse vs Data Mart

For many enterprises the way to avoid the complexities associated with designing a data warehouse is to start by building one or more data marts. Data marts allow designers to build something that is far simpler and achievable for a specific group of users.

HOLAP

HOLAP (Hybrid Online Analytical Processing) servers deliver selected data directly from the DBMS or via a MOLAP (Multidimensional OLAP) server to the desktop (or local server) in the form of a data cube, where it is stored, analyzed, and maintained locally issues include: - architecture results in redundancy - lack of consistency as well

Components of the DBMS Environment

Hardware Software Data Procedures People

File index types

Heap - (simplest - records inserted at end) Hash - (bucket via hash function) Indexed Sequential Access Method (ISAM) B -tree (paths to root and leaf same length) Clusters (two or more tables same data block)

Referential integrity

If a foreign key exists in a relation, either the foreign key value must match a candidate key value of some tuple in its home relation or the foreign key value must be wholly null. A foreign key links each tuple in the child relation to the tuple in the parent relation containing the matching candidate key value. Referential integrity means that if the foreign key contains a value, that value must refer to an existing tuple in the parent relation.

Entity integrity

In a base relation, no attribute of a primary key can be null.

homogeneous vs heterogeneous

In a homogeneous system, all sites use the same DBMS product. In a heterogeneous system, sites may run different DBMS products,

The process of keeping and maintaining a log file of all changes made to the database to enable recovery to be undertaken effectively in the event of a failure

Journaling

Outer JOIN types

LEFT JOIN RIGHT JOIN FULL JOIN

Testing evaluation criteria

Learnability: How long does it take a new user to become productive with the system? Performance: How well does the system response match the user's work practice? Robustness: How tolerant is the system of user error? Recoverability: How good is the system at recovering from user errors? Adapatability: How closely is the system tied to a single model of work?

Link Analysis

Link analysis aims to establish links, called associations, between the individual records, or sets of records, in a database

Discretionary Access Control (DAC)

Most commercial DBMSs provide an approach to managing privileges that uses SQL called Discretionary Access Control (DAC). The SQL standard supports DAC through the GRANT and REVOKE commands Has security loopholes

Sort results using

ORDER BY

OEM

Object Exchange Model: proposed models for semistructured data - nested object model.

Data Mart

Often holds only one subject area- for example, Finance, or Sales May hold more summarised data (although many hold full detail) Concentrates on integrating information from a given subject area or set of source systems Is built focused on a dimensional model using a star schema.

Guidelines for determining secondary indexes (simplified)

One approach to selecting an appropriate file organization for a relation is to keep the tuples unordered and create as many secondary indexes as necessary. Another approach is to order the tuples in the relation by specifying a primary or clustering index. One approach to determining which secondary indexes are needed is to produce a "wish-list" of attributes that we consider are candidates for indexing, and then to examine the impact of maintaining each of these indexes.

ODS

Operational Data Store - Holds current and integrated operational data for analysis. - Often structured and supplied with data in the same way as the data warehouse. - May act as staging area for data to be moved into the warehouse. - Often created when legacy operational systems are found to be incapable of achieving reporting requirements

put_line

Outputs information to a buffer in the SGA (System Global Area - an area of shared memory that is used to store data and control information for one Oracle instance). Displayed using "get_line"

Benefits of Data Warehousing

Potential high returns on investment Competitive advantage Increased productivity of corporate decision-makers

ROLLUP

ROLLUP creates subtotals that roll up from the most detailed level to a grand total put a different way: ROLLUP creates subtotals at n 1 levels, where n is the number of grouping columns EX: SELECT propertyType, yearMonth, city, SUM(saleAmount) AS sales FROM Branch, PropertyForSale, PropertySale WHERE Branch.branchNo = PropertySale.branchNo AND PropertyForSale.propertyNo = PropertySale.propertyNo AND PropertySale.year Month IN ('2013-08', '2013-09') AND Branch.city IN ('Aberdeen', 'Edinburgh', 'Glasgow') GROUP BY ROLLUP(propertyType, yearMonth, city);

A schedule in which for each pair of transactions Ti and Tj, if Tj reads a data item previously written by Ti, then the commit operation of Ti precedes the commit operation of Tj.

Recoverable schedule

RDF

Resource Description Framework - infrastructure that enables the encoding, exchange, and reuse of structured metadata

Use of HAVING

SELECT * FROM Staff GROUP BY branchNo HAVING COUNT(staffNo) > 1

Use of AS

SELECT COUNT(DISTINCT propertyno) AS MyCount

privileges

SELECT—the privilege to retrieve data from a table INSERT—the privilege to insert new rows into a table UPDATE—the privilege to modify rows of data in a table DELETE—the privilege to delete rows of data from a table REFERENCES—the privilege to reference columns of a named table in integrity constraints USAGE—the privilege to use domains, collations, character sets, and translations. * Issued via GRANT

Relational DBMSs generation...

Second-generation DBMS

Java Servlets

Servlets are programs that run on a Java-enabled Web server and build Web pages, analogous to CGI programming

Domain

Set of legal values CREATE DOMAIN Example: CREATE DOMAIN DomainName [AS] dataType [DEFAULT defaultOption] [CHECK (searchCondition)]

Locking

Shared lock:If a transaction has a shared lock on a data item, it can read the item but not update it. Exclusive lock:If a transaction has an exclusive lock on a data item, it can both read and update the item.

confusion matrix

Shows the counts of the actual versus predicted class values

SAX

Simple API for XML: an event-based, serial-access API for XML

Star schema

Star schema is a logical structure that has a fact table (containing factual data) in the center, surrounded by denormalized dimension tables (containing reference data).

Starflake schema

Starflake schema is a hybrid structure that contains a mixture of star (denormalized) and snowflake (normalized) dimension tables

view resolution

Tables and Views both have definitions such as the columns of the table, or the columns in a view. All columns have given data types. Tables hold row data for a given set of columns spread across the physical pages stored on a disk. Views have definitions similar to tables. However, they do not hold row data. The data found in a view is populated by an underlying query that is reading from tables that hold row data. When you read from a view, or join a view to an existing query the SQL server will execute the query in the view and join it to your dataset. When it does that, that would be view resolution.

SQL for relations, attributes, and tuples

Tables, columns, and rows

ODBC

The Microsoft Open Database Connectivity - provides a common interface for accessing heterogeneous SQL databases - based on SQL as a standard for accessing data

Database Segmentation

The aim of database segmentation is to partition a database into an unknown number of segments, or clusters, of similar records, that is, records that share a number of properties and so are considered to be homogeneous. (Segments have high internal homogeneity and high external heterogeneity.)

Physical vs Conceptual

The basic concept of the ER model is the entity type, which represents a group of "objects" in the "real world" with the same properties. An entity type has an independent existence and can be objects with a physical (or "real") existence or objects with a conceptual (or "abstract") existence

Primary key

The candidate key that is selected to uniquely identify each occurrence of an entity type.

Degree

The degree of a relation is the number of attributes it contains.

View

The dynamic result of one or more relational operations operating on the base relations to produce another relation. A view is a virtual relation that does not necessarily exist in the database but can be produced upon request by a particular user, at the time of request

transaction/relation cross-reference matrix

The matrix shows, in a visual way, the transactions that are required and the relations they access:

Candidate key

The minimal set of attributes that uniquely identifies each occurrence of an entity type.

update anywhere ownership

The update-anywhere (symmetric replication) model creates a peer-to-peer environment in which multiple sites have equal rights to update replicated data.

Subject-oriented Data

The warehouse is organized around the major subjects of the enterprise (e.g. customers, products, and sales) rather than the major application areas (e.g. customer invoicing, stock control, and product sales). Need for decision support

To perform multi-table queries

To combine columns from several tables into a result table, we need to use a join operation

Data Modeling

Two main purposes: - assist in the understanding of the meaning (semantics) of the data - facilitate communication about the information requirements Data modeling ensures we understand: - each user's perspective of the data - the nature of the data itself, independent of its physical representations - the use of data across user views.

UNDO/REDO in context of a crash

UNDO uncommitted transactions REDO commited

Three results combinations

UNION (full) INTERSECT EXCEPT (difference, MINUS)

"View" use disadvantages

Update restriction Structure restriction Performance

Composite key

When a key consists of more than one attribute

Lore's XML-based data model

XML element is a pair (eid, value)

parallel DBMS

a DBMS running across multiple processors and disks and which has been designed to evaluate operations in parallel, whenever possible, in order to improve performance

Diagrammatic representation of attributes on relationships is done using:

a dashed line:

The MoFlex model

allow defining predicates about time, costs, and location that affect the execution of a transaction. Compensatable. Repeatable. A transaction that eventually succeeds, but might be repeated a number of times. Pivot. A transaction that is neither compensatable nor repeatable. Location-dependent. If the subtransaction has to terminate at this specific location (cell, MSS).

Foreign Key

an attribute (possible composite) in a relation of a database that serves as the primary key of another relation in the same database

kernel-based replication

replication protocol is tightly coupled to the concurrency control mechanism of the local database system

relational schema

set of relations

distributed processing

where a centralized DBMS is accessed over a network

Mapping strong entity types

"Regular" entities: - Simple - Composite - Multi-valued

Functions of a DBMS

(1) Data storage, retrieval, and update (2) A user-accessible catalog (3) Transaction support (4) Concurrency control services (5) Recovery services (6) Authorization services (7) Support for data communication (8) Integrity services (9) Services to promote data independence (10) Utility services

Codd's Rules for OLAP Tools

(1) Multidimensional conceptual view (2) Transparency (3) Accessibility (4) Consistent reporting performance (5) Client-server architecture (6) Generic dimensionality (Every data dimension must be equivalent in both structure and operational capabilities) (7) Dynamic sparse matrix handling (8) Multi-user support (9) Unrestricted cross-dimensional operations (10) Intuitive data manipulation (11) Flexible reporting (12) Unlimited dimensions and aggregation levels

OLAP key features

(online analytical processing) - multidimensional views of data - support for complex calculations - time intelligence.

lazy update propagation

- (asynchronous update propagation) - the target database is updated after the source database has been modified.

eager update propagation

- (synchronous update propagation) - ensures that all copies are updated within the enclosing transaction and voting at the end ensures the atomicity.

multilevel relation

- A relation that reveals different tuples to users with different security clearances - Related to MAC

Aspects of transactions

- A transaction should always transform the database from one consistent state to another - "Committed" if successful, "Aborted" otherwise, where database is "rolled back" or "undone" - "Compensating transaction" fixes a mistake

ASP

- Active Server Pages - allows dynamic, interactive Web pages to be created on the Web server - Active X is programming extension of ASP

API

- Application Programming Interface - "non-CGI gateways" - implemented by extending the web server

Properties of a transaction

- Atomicity: "all or nothing" - Consistency: one consistent state to another - Isolation: execute independently of one another - Durability: effects permanently recorded

Derived Attributes

- Attributes whose value can be found by examining the values of other attributes (calculated attributes) - Often, derived attributes do not appear in the logical data model but are documented in the data dictionary. If a derived attribute is displayed in the model, a "/" is used to indicate that it is derived - Considering physical design - calculated every time or stored in DB?

Phases of CRISP-DM model

- Business understanding - Data understanding - Data preparation - Modeling - Evaluation - Deployment

CGI

- Common Gateway Interface - CGI is a standard, portable, and modular method for supporting application-specific functionality by allowing scripts to be activated by the server to handle client requests - A specification for transferring information between a Web server and a CGI program - cookies - popular technique for integrating databases into the web

CGI advantages

- Common Gateway Interface - simplicity - language independence - Web server independence - wide acceptance

Data Mart vs Data Warehouse

- Data marts support only the requirements of a particular department or functional area and can therefore be built more rapidly. - The data mart is a subset of the data warehouse and is usually oriented to a specific business line or team. - Whereas data warehouses have an enterprise-wide depth, the information in data marts pertains to a single department.

Stages of the database system development lifecycle

- Database Planning - System Definition - Requirements collection and analysis - Database Design - DBMS Selection - Application Design - Prototyping - Implementation - Data Conversion and Loading - testing - Operational Maintenance

CODASYL

- First generation of DBMS Disadvantages: - Complex programs have to be written to answer even simple queries based on navigational record-oriented access - There is minimal data independence - There is no widely accepted theoretical foundation.

Mandatory Access Control (MAC)

- Mandatory Access Control (MAC) is based on system-wide policies that cannot be changed by individual users. In this approach each database object is assigned a security class and each user is assigned a clearance for a security class, and rules are imposed on reading and writing of database objects by users - SQL standard does not include support for MAC - Popular version: Bell-LaPadula model (objects, subjects, security classes and clearances)

UI Design Guidelines

- Meaningful title - Comprehensible instructions - Logical grouping/sequencing of fields - Visually appealing layout - Familiar field labels - Consistent terminology and abbreviations - Consistent use of color - Visible space and boundaries for data entry fields - Convenient cursor movement - Error correction for individual characters and entire fields - Error messages for unacceptable values - Optional fields marked clearly - Explanatory messages for fields - Completion signal

Microsoft .NET

- Microsoft's latest Web Platform - a vision for the third generation of the Internet where "software is delivered as a service, accessible by any device, any time, any place, and is fully programmable and personalizeable."

formal XML semantics processing model

- Parsing (ensures that the input expression is an instance of the language) - Normalization (converts the expression into an XQuery Core expression) - Static type analysis (bottom-up technique by applying type inference rules over expressions) - Dynamic evaluation (computes the value of the expression from the abstract syntax tree in the core language)

Web-DBMS disadvantages

- Reliability - Security - Cost - Scalability - Limited functionality of HTML - Statelessness - Bandwidth - Performance - Immaturity of deployment tools

The five fundamental operations in relational algebra

- Selection (or Restriction) - Projection - Cartesian product - Union - Set difference * In addition, there are also the Join, Intersection, and Division operations, which can be expressed in terms of the five basic operations

cursors

- Since SELECT can only be used if the query returns only one row cursors are used to navigate results - used to allow the rows of a query result to be accessed one at a time - must be declared and opened before it can be used, and it must be closed to deactivate it after it is no longer required - DECLARE, OPEN, FETCH, CLOSE, DEALLOCATE

SQL

- Structured Query Language - Non-procedural - Standard English words (for pros and non-pros) - Transform-oriented language: designed to use relations to transform inputs into required outputs. Two major components: - Data Definition Language (DDL) for defining the database structure and controlling access to the data - Data Manipulation Language (DML) for retrieving and updating data.

Synchronous vs Asynchronous

- Synchronous: All changes are applied at all sites participating in the replication as part of a single transaction. If the transaction fails at any site, the entire transaction is rolled back. - Asynchronous: Local changes are captured, stored in a queue, and propagated and applied at the remote sites at regular intervals.

Problems of Data Warehousing

- Underestimation of resources for data loading - Hidden problems with source systems - Required data not captured - Increased end-user demands - Data homogenization - High demand for resources - Data ownership - High maintenance - Long duration projects - Complexity of integration

Fan trap

- Where a model represents a relationship between entity types, but the pathway between certain entity occurrences is ambiguous. - Resolved by repositioning relationships

Chasm trap

- Where a model suggests the existence of a relationship between entity types, but the pathway does not exist between certain entity occurrences. - Resolved by identifying a missing relationship.

A DBMS should provide the following facilities to assist with recovery:

- a backup mechanism: makes periodic backup copies of the database - logging facilities: keep track of the current state of transactions and database changes - checkpoint facility: enables updates to the database that are in progress to be made permanent - recovery manager: allows the system to restore the database to a consistent state following a failure.

View Mechanisms provide

- a level of security - a mechanism to customize the appearance of the database - can present a consistent, unchanging picture of the structure of the database

Identifying entity types

- a method of identifying entities: examine users' requirements specification. From this specification identify nouns or noun phrases that are mentioned (for example, staff number, staff name, property number, property address, rent, number of rooms). and look for major objects, such as people, places, or concepts of interest - watch for synonyms/homonyms - assign meaningful and obvious names - documents the types

The relational algebra

- a theoretical language with operations that work on one or more relations to define another relation without changing the original relation(s) - relation-at-a-time (or set) language in which all tuples, possibly from several relations, are manipulated in one statement without looping.

cryptosystem components

- an encryption key to encrypt the data (plaintext) - an encryption algorithm that with the encryption key transforms the plaintext into ciphertext - a decryption key to decrypt the ciphertext - a decryption algorithm that with the decryption key transforms the ciphertext back into plaintext

exception

- an identifier in PL/SQL raised during the execution of a block that terminates its main body of actions - To handle raised exceptions, separate routines called exception handlers are specified

CREATE INDEX

- an index is a structure that provides accelerated access to the rows of a table based on the values of one or more columns. The presence of an index can significantly improve the performance of a query. - creation of indexes is not standard SQL

transaction analysis form includes

- any predicates that will be used - any attributes that will be required to join relations together (for a query transaction) - attributes used to order results (for a query transaction) - attributes used to group data together (for a query transaction) - any built-in functions that may be used (such as AVG, SUM) - any attributes that will be updated by the transaction.

There are three specializations of link analysis:

- associations discovery (finds items that imply the presence of other items) - sequential pattern discovery (finds patterns where one set of items is followed by another over time) - similar time sequence discovery (links between data sets that are time dependent)

Documenting attributes

- attribute name and description - data type and length - any aliases that the attribute is known by - whether the attribute is composite and, if so, the simple attributes that make up the composite attribute - whether the attribute is multi-valued - whether the attribute is derived and, if so, how it is to be computed - any default value for the attribute

relational data model

- based on the concept of mathematical relations - data and relationships are represented as tables, each of which has a number of columns with a unique name

Optimistic methods

- based on the premise that conflict is rare - sacrifices locking to gain performance, risking costly rollback - involves read (new value to local variable), validation, write

Domain constraints

- can be specified using the CHECK clause or by defining domains using the CREATE DOMAIN statement.

the four allocation strategies

- centralized (a single centralized database) - fragmented (fragments assigned to one site) - complete replication (complete copy of the database maintained at each site) - selective replication (combination of the first three)

Disadvantages of a DBMS

- complexity - size - cost (software/hardware/conversion) - performance - greater impact of failure

Logical Database Designer

- concerned with identifying the data (that is, the entities and attributes) - the relationships between the data - and the constraints on the data that is to be stored in the database - must have a thorough and complete understanding of the organization's data and any constraints on this data (business rules)

Boolean data

- consists of the distinct truth values TRUE and FALSE - unless prohibited by a NOT NULL constraint it also supports the UNKNOWN truth value as the NULL value - CHAR fixed, VARCHAR variable

Deadlock detection is usually handled by

- construction of a wait-for graph (WFG) that shows the transaction dependencies - generated at regular intervals

neural network

- contains collections of connected nodes with input, output, and processing at each node - each processing unit connected by a weighted value - may contain hidden processing layers

disadvantages of a DDBMS

- cost - complexity - lack of standards - experience

relational model

- data is logically structured within relations (tables) - each relation has a name and is made up of named attributes (columns) of data - each tuple (row) contains one value per attribute

network model

- data is represented as collections of records - relationships are represented by sets - compared with the relational model, relationships are explicitly modeled by the sets, which become pointers in the implementation.

important features of data mining tools

- data preparation (most time consuming) - selection of data mining operations (algorithms) - product scalability and performance - facilities for understanding results.

metadata

- data that describes the objects in the database - makes it easier for those objects to be accessed or manipulated - contains definitions of records, data items, etc.

Semistructured data

- data that has some structure, but the structure may not be rigid

Record-Based Data Model

- database consists of a number of fixed-format records - relational data model (simplicity, data independence...high cost, complex) - network data model (easy access to data, many to many.....changes occur to entire DB) - hierarchical data model (addresses flat file storage issue...redundant, bad at many to many)

Physical Database Designer

- decides how the logical database design is to be physically realized - mapping the logical database design into a set of tables and integrity constraints - selecting specific storage structures and access methods for the data to achieve good performance - designing any security measures required on the data.

Recovery techniques using deferred update vs immediate update

- deferred update recovery protocol: updates are not written to the database until after a transaction has reached its commit point - immediate update recovery protocol: updates are applied to the database as they occur without waiting to reach the commit point

HTTP

- defines how clients and servers communicate Request-response paradigm: - connection - request - response - close

Validate conceptual data model against user transactions

- describing the transactions - using transaction pathways

EXISTS and NOT EXISTS

- designed for use only with subqueries - produce a simple true/false result. EX: 'Find all staff such that there exists a Branch row containing his/her branch number, branchNo, and the branch city equal to London'. SELECT staffNo, fName, lName, position FROM Staff s WHERE EXISTS (SELECT * FROM Branch b WHERE s.branchNo = b.branchNo AND city = 'London');

asymmetric encryption

- different keys, public and private - digital signature can be used - RSA

The DDBMS should appear like a centralized DBMS by providing a series of transparencies:

- distribution transparency - performance transparency - transaction transparency - DBMS transparency

dynamic query optimization

- dynamically carry out decomposition and optimization every time the query is run - advantage: all information required to select an optimum strategy is up to date - disadvantages: are that the performance of the query is affected because the query has to be parsed, validated, and optimized before it can be execute

advantages of SQL include:

- easy to learn - nonprocedural - free-format - DBMS-independent - recognized international standard

PRIMARY KEY

- entity integrity - used in CREATE or ALTER TABLE statements

The Conceptual Data Model comprises:

- entity types - relationship types - attributes and attribute domains - primary keys and alternate keys - integrity constraints

A conceptual data model comprises:

- entity types - relationship types - attributes and attribute domains - primary keys and alternate keys - integrity constraints.

The five commonly used fact-finding techniques:

- examining documentation - interviewing - observing the enterprise in operation - research - questionnaires

two types of subprograms

- functions (always returns a single value to caller) - (stored) procedures (used unless single value needed)

Reasons for Creating a Data Mart

- give users access to the data they need to analyze most often - provide data in a form that matches the collective view of the data by a group of users in a department or business application area - improve end-user response time due to the reduction in the volume of data to be accessed. - provide appropriately structured data as dictated by the requirements of the end-user access tools. - simpler build compared with establishing an enterprise-wide DW (EDW). - cost of implementing data marts is normally less than that required to establish a EDW. - future users of a data mart are more easily defined and targeted to obtain support for a data mart than an enterprise-wide data warehouse project.

external schema

- highest level - subschemas - corresponds to different views of the data

Large object data type

- holds a large amount of data, such as a long text file or a graphics file. - three different types: - Binary Large Object (BLOB) - Character Large Object (CLOB) - National Character Large Object (NCLOB)

isolation level

- indicates the degree of interaction that is allowed from other transactions during the execution of the transaction. Dirty read: A transaction reads data that has been written by another as yet uncommitted transaction. Nonrepeatable read: A transaction rereads data that it has previously read, but another committed transaction has modified or deleted the data in the intervening period. Phantom read: A transaction executes a query that retrieves a set of rows satisfying a certain search condition. When the transaction re-executes the query at a later time, additional rows are returned that have been inserted by another committed transaction in the intervening period. Only the SERIALIZABLE isolation level is safe, that is, generates serializable schedules.

The types of features that should be described in the systems specification include:

- initial database size - database rate of growth - the types and average number of record searches - networking and shared access requirements - performance -security -backup and recovery -legal issues

Data model

- integrated collection of concepts for describing and manipulating data, relationships between data, and constraints on the data in an organization. - structural part: set of rules according to which databases can be constructed - manipulative part: defining the types of operation that are allowed on the data - set of integrity constraints: ensures that the data is accurate.

Forms generators

- interactive facility for rapidly creating data input and display layouts for screen forms - allows the user to define what the screen is to look like

attributes in the data dictionary consist of

- its domain, consisting of a data type, length, and any constraints on the domain - an optional default value for the attribute - whether the attribute can hold nulls - whether the attribute is derived and, if so, how it should be computed

main ways to implement a replication protocol:

- kernel-based replication - middleware-based replication

unique mobile environment query processing requirments

- location-aware queries - location-dependent queries - moving object database queries - spatio-temporal queries (query results depend on the query's spatial properties - space and time) - continuous queries To enable location-dependent queries, a database must support two- and three-dimensional geometric shapes and geometric operations

transaction

- logical unit of work consisting of one or more SQL statements that is guaranteed to be atomic with respect to recovery - COMMIT statement signals successful completion of a transaction and all changes to the database are made permanent. -ROLLBACK statement signals that the transaction should be aborted and all changes to the database are undone.

Identifying and associating attributes with entity or relationship types

- look for nouns or noun phrases in the users' requirements specification. The attributes can be identified where the noun or noun phrase is a property, quality, identifier, or characteristic of one of these entities or relationships - "What information are we required to hold on x or y?" - note if simple or composite, single or multi-valued, derived - each should be associated with one relationship or entity - document attributes

potential problems caused by concurrency:

- lost update problem: apparently successful operation overidden by another user - uncommitted dependency problem: one transaction is allowed to see the intermediate results of another transaction before it has committed - inconsistent analysis problem: a transaction reads several values from the database but a second transaction updates some of them during the execution of the first

internal schema

- lowest level of abstraction - complete description of the internal model - definitions of stored records - methods of representation - data fields

shadow paging

- maintains two-page tables during the life of a transaction: a current page table and a shadow page table - when the transaction starts, the two-page tables are the same (shadow page table is never changed thereafter) - when the transaction completes, the current page table becomes the shadow page table

mobile DBMS issues

- managing limited resources - security - transaction handling - query processing - disconnections (major issue)

One-to-one (1:1) binary relationship types

- mandatory participation on both sides of 1:1 relationship - mandatory participation on one side of 1:1 relationship - optional participation on both sides of 1:1 relationship.

Subprograms

- named PL/SQL blocks that can take parameters and be invoked - two types of subprogram called functions (always returns a single value to caller) and (stored) procedures (used unless single value needed)

Ways to address OLAP server implementation issues

- preaggregation (cube materialization) - dimensional hierarchy - sparse data management

operations associated with data mining techniques

- predictive modeling - database segmentation - link analysis - deviation detection

Conceptual modeling

- process of constructing a model of the information use in an enterprise that is independent of implementation details A conceptual data model comprises: - entity types - relationship types - attributes and attribute domains - primary keys and alternate keys - integrity constraints

subquery

- produces a temporary table with results that can be accessed and used by the outer statement - scalar subquery: returns a single column and a single row - row subquery returns multiple columns, but only a single row. - table subquery returns one or more columns and multiple rows.

Secondary indexes

- provide a mechanism for specifying an additional key for a base relation that can be used to retrieve data more efficiently - obviously, extra overhead required

XQuery

- query language for XML - functional language in which a query is represented as an expression EX: Find the staff number of the first member of staff in the XML document doc("staff_list.xml")/STAFFLIST/STAFF[1]//STAFFNO

Checking model for redundancy

- re-examine one-to-one (1:1) relationships (may have identified two entities that represent the same object) - remove redundant relationships (relationship is redundant if the same information can be obtained via other relationships) - consider time dimension

Properties of Relation

- relation has a name that is distinct from all other relation names in the relational schema - each cell of the relation contains exactly one atomic (single) value - each attribute has a distinct name - the values of an attribute are all from the same domain - each tuple is distinct - the order of attributes has no significance - the order of tuples has no significance, theoretically

Identifying relationship types

- relationships could be verbs in the users' requirements specification - only interested in required relationships between entities in this step - using an ER diagram can help - determine multiplicity constraints - check for fan/chasm traps - document relationships

Deviation Detection

- relatively new technique - identifies outliers, which express deviation from some previously known expectation and norm - can be performed using statistics and visualization techniques or as a by-product of data mining

primary copy ownership

- replicated data is owned by one site, the primary (or master) copy, and can be updated only by that site - "publish and subscribe" - Potential uses: Decision support system analysis, distribution and dissemination of centralized information, consolidation of remote information, mobile computing

entity type

- represents a group of "objects" in the "real world" with the same properties - Each is represented with a rectangle - has an independent existence and can be objects with a physical (or "real") existence or objects with a conceptual (or "abstract") existence

Types of integrity constraints

- required data (when an entity must contain valid data) - attribute domain constraints (legal values) - multiplicity (a branch has many staff and a member of staff works at a single branch) -entity integrity (primary key can not be null) - referential integrity (if the foreign key contains a value, that value must refer to an existing tuple in the parent relation (candidate key)) - general constraints

hierarchical model

- restricted type of network model - data is represented as collections of records and relationships are represented by sets - allows a node to have only one parent

system catalog

- result of the compilation of the DDL statements - a set of tables stored in special files collectively - also called data dictionary and data directory

analytical operations that can be performed on data cubes

- roll-up - drill-down - "slice and dice" - pivot

Two types of triggers

- row-level triggers (FOR EACH ROW) that execute for each row of the table that is affected by the triggering event - statement-level triggers (FOR EACH STATEMENT) that execute only once even if multiple rows are affected by the triggering event.

Approaches to Integrating the Web and DBMSs

- scripting languages such as JavaScript and VBScript - Common Gateway Interface (CGI), an early, and possibly one of the most widely used, techniques - HTTP cookies;extensions to the Web server, such as the Netscape API (NSAPI) and Microsoft's Internet Information Server API (ISAPI) - Java, JEE, JDBC, SQLJ, JDO, JPA, Servlets, and JavaServer Pages (JSP) - Microsoft's Web Solution Platform: .NET, Active Server Pages (ASP), and ActiveX Data Objects (ADO) - Oracle's Internet Platform

XML advantages

- simplicity - open standard - extensibility (allow users to define own tags) - reuse (libraries of XML tags, etc) - separation of content and presentation - improved load balancing - support for the integration of data from multiple sources -ability to describe data from a wide variety of applications - more advanced search engines

Field

- smallest unit of data in a database - corresponds to a simple attribute from the ER diagram

ALTER TABLE

- statement for changing the structure of a table once it has been created - add a new column to a table - drop a column from a table - add a new table constraint - drop a table constraint - set a default for a column - drop a default for a column.

Relationship structure types

- strong entity types - weak entity types - one-to-many (1:*) binary relationship types - one-to-one (1:1) binary relationship types - one-to-one (1:1) recursive relationship types - superclass/subclass relationship types - many-to-many (*:*) binary relationship types - complex relationship types - multi-valued attributes.

alias

- the alias is separated from the table name with a space. - can be used to qualify a column name whenever there is ambiguity regarding the source of the column name - used as a shorthand notation for the table name. EX: Column: SELECT column_name AS alias_name FROM table_name; Table: SELECT column_name(s) FROM table_name AS alias_name; Table example with: SELECT o.OrderID, o.OrderDate, c.CustomerName FROM Customers AS c, Orders AS o WHERE c.CustomerName="Around the Horn" AND c.CustomerID=o.CustomerID; Without: SELECT Orders.OrderID, Orders.OrderDate, Customers.CustomerName FROM Customers, Orders WHERE Customers.CustomerName="Around the Horn" AND Customers.CustomerID=Orders.CustomerID;

XML schema

- the definition (both in terms of its organization and its data types) of a specific XML structure

Logical data independence

- the immunity of the external schemas to changes in the conceptual schema - the ability to modify the logical schema without causing application program to be rewritten

relations identified in the logical data model consist of

- the name of the relation - a list of simple attributes in brackets - the primary key and, where appropriate, alternate keys (AK) and foreign keys (FK) - referential integrity constraints for any foreign keys identified

Physical database design

- the process of producing a description of the implementation of the database on secondary storage - describes base relations, storage structures and access methods - integrity constraints and security measures

static query optimization

- the query is parsed, validated, and optimized once advantage: the runtime overhead is removed and there may be more time available to evaluate a larger number of execution strategies disadvantage: the execution strategy that is chosen as being optimal when the query is compiled may no longer be optimal when the query is run

There are three general techniques for handling deadlock:

- timeouts (a transaction that requests a lock will wait for only a system-defined period of time. If the lock has not been granted within this period, the lock request times out) - deadlock prevention - deadlock detection and recovery * only way to recover - terminate a process

Requirements Collection and Analysis

- to create a users' requirements specification that describes in detail the data to be held in the database and how the data is to be used - the purpose of gathering this information is to create a systems specification, which describes any features to be included in the new database system (including views)

Transaction records contain:

- transaction identifier - type of log record (transaction start, insert, update, delete, abort, commit) - identifier of data item affected by the database action (insert, delete, and update operations) - before-image of the data item, that is, its value before change (update and delete operations only) - after-image of the data item, that is, its value after change (insert and update operations only) - log management information, such as a pointer to previous and next log records for that transaction (all operations)

Examples of transaction performance criteria

- transactions that run frequently and will have a significant impact on performance - transactions that are critical to the operation of the business - the times during the day/week when there will be a high demand made on the database (called the peak load) We use this information to identify the parts of the database that may cause performance problems

Approximate numeric data

- used for defining numbers that do not have an exact representation, such as real numbers. - approximate numeric, or floating point, notation is similar to scientific notation, in which a number is written as a mantissa times some power of ten (the exponent). EX: 10E3, 5.2E6, −0.2E-4. FLOAT [precision] REAL DOUBLE PRECISION

Multiplicity constraints

- used to check and maintain data quality - assertions about entity occurrences that can be applied when the database is updated to determine whether the updates violate the stated rules of the enterprise (like upper/lower limits)

Bit data

- used to define bit strings, that is, a sequence of binary digits (bits), each having either the value 0 or 1

Exact Numeric Data

- used to define numbers with an exact representation. - consists of digits, an optional decimal point, and an optional sign. - consists of a precision (total number of significant decimal digits) and a scale (total number of decimal places). EX: -12.345 has precision 5 and scale 3 NUMERIC [ precision [, scale] ] DECIMAL [ precision[, scale] ] INTEGER SMALLINT BIGINT

Datetime data

- used to define points in time to a certain degree of accuracy - DATE (uses YEAR, MONTH, and DAY) - TIME [timePrecision] [WITH TIME ZONE] (uses HOUR, MINUTE, and SECOND) - TIMESTAMP [timePrecision] [WITH TIME ZONE] (used to store date and times) - timePrecision is the number of decimal places of accuracy to which the SECOND field is kept

SQL Identifiers

- used to identify objects in the database, such as table names, view names, and columns - an identifier can be no longer than 128 characters (most dialects have a much lower limit than this) - an identifier must start with a letter - an identifier cannot contain spaces

Interval data

- used to represent periods of time - two classes of interval data type: year-month intervals and day-time intervals. INTERVAL {{startField TO endField} singleDatetimeField}startField = YEAR | MONTH | DAY | HOUR | MINUTE [(intervalLeadingFieldPrecision)]endField = YEAR | MONTH | DAY | HOUR | MINUTE | SECOND [(fractionalSecondsPrecision)]singleDatetimeField = startField | SECOND [(intervalLeadingFieldPrecision [, fractionalSecondsPrecision])]

Application Developer

- work from a specification produced by systems analysts - Each program contains statements that request the DBMS to perform some operation on the database, which includes retrieving data, inserting, updating, and deleting data.

The purpose of the system definition stage is...

...to define the scope and boundary of the database system and its major user views

The 8 steps

1. Build Conceptual Data Model 2. Build Logical Data Model 3. Translate Logical Data Model for Target DBMS 4. Design File Organizations and Indexes 5. Design User Views 6. Design Security Mechanisms 7. Consider the Introduction of Controller Redundancy 8. Monitor and Tune the Operational System

Guidelines for determining secondary indexes

1. Do not index small relations 2. In general, index the primary key of a relation if it is not a key of the file organization 3. Add a secondary index to a foreign key, if it is frequently accessed 4. Add a secondary index to any attribute that is heavily used as a secondary key 5. Add a secondary index on attributes that are frequently involved in: - selection or join criteria - ORDER BY - GROUP BY - other operations involving sorting (such as UNION or DISTINCT) 6. Add a secondary index on attributes involved in built-in aggregate functions, along with any attributes used for the built-in functions 7. As a more general case of the previous guideline, add a secondary index on attributes that could result in an index-only plan. 8. Avoid indexing an attribute or relation that is frequently updated. 9. Avoid indexing an attribute if the query will retrieve a significant proportion (for example 25%) of the tuples in the relation. In this case, it may be more efficient to search the entire relation than to search using an index. 10. Avoid indexing attributes that consist of long character strings.

relationally complete

A language that can be used to produce any relation that can be derived using the relational calculus is said to be relationally complete

MDBS

A multidatabase system (MDBS) is a distributed DBMS in which each site maintains complete autonomy

Base relation

A named relation corresponding to an entity in the conceptual schema, whose tuples are physically stored in the database

Relation schema

A named relation defined by a set of attribute and domain name pairs.

domain

A pool of values from which one or more attributes draw their values (employee number being 2 letters, 3 numbers....sex (M/F))

Multidimensional Schemas

A popular data model for multidimensional data is the star schema, which is characterized by having facts (measures) in the center surrounded by dimensions, forming a star-like shape. Variations of the star schema include the snowflake and starflake schema.

Deriving relations for logical data model results in

A relational schema

View serializability

A schedule is view serializable if it is view equivalent to a serial schedule

Schema

A schema is a named collection of database objects that are in some way related to one another Relations and other database objects exist in an environment. Among other things, each environment consists of one or more catalogs, and each catalog consists of a set of schemas.

Composition

A specific form of aggregation that represents an association between entities, where there is a strong ownership and coincidental lifetime between the "whole" and the "part."

Design methodology

A structured approach that uses procedures, techniques, tools, and documentation aids to support and facilitate the process of design.

shared subclass

A subclass with more than one superclass Results in multiple inheritance

Data Warehousing

A subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management's decision-making process

2PL

A transaction follows the two-phase locking protocol if all locking operations precede the first unlock operation in the transaction. According to the rules of this protocol, every transaction can be divided into two phases: - growing phase: in which it acquires all the locks needed but cannot release any locks - shrinking phase: in which it releases its locks but cannot acquire any new locks.

users' requirements specification

All of the information gathered on each user view of the database system is described in this document

GROUP BY

Allows summary information to be included in the result table. Rows that have the same value for one or more columns can be grouped together and treated as a unit for using the aggregate functions.

well-formed XML document

An XML document that conforms to the structural and notational rules of XML is considered well-formed.

valid XML document

An XML document that is well-formed and also conforms to a DTD (Document Type Definition) is considered valid

Transaction

An action, or series of actions, carried out by a single user or application program, that accesses or changes the content of the database. Three main types: Retrieval Update Mixed (involves both retrieval and update)

Superkey

An attribute, or set of attributes, that uniquely identifies a tuple within a relation - may contain attributes not necessary for unique identification

Superclass

An entity type that includes one or more distinct subgroupings of its occurrences, which must be represented in a data model.

Deadlock

An impasse that may result when two (or more) transactions are each waiting for locks to be released that are held by the other.

CUBE

CUBE takes a specified set of grouping columns and creates subtotals for all of the possible combinations If n columns are specified for a CUBE, there will be 2n combinations of subtotals returned EX: SELECT propertyType, yearMonth, city, SUM(saleAmount) AS sales FROM Branch, PropertyForSale, PropertySale WHERE Branch.branchNo = PropertySale.branchNo AND PropertyForSale.propertyNo = PropertySale.propertyNo AND PropertySale.yearMonth IN ('2013-08', '2013-09') AND Branch.city IN ('Aberdeen', 'Edinburgh', 'Glasgow') GROUP BY CUBE(propertyType, yearMonth, city);

Multi-user view handling approaches

Centralized: Requirements for each user merged into single set of requirements (one-shot approach). View integration: View requirements remain on separate lists. Data models for single user view called "local data model" - which are merged into "global data model" in later stage.

Classification

Classification is used to establish a specific predetermined class for each record in a database from a finite set of possible class values. There are two specializations of classification: tree induction and neural induction. Example of tree induction:

Requirements Collection and Analysis

Collection and analysis of the requirements for the new database system. Details for each major user view: - description of data - details of how data used - additional requirements Requirements Specifications documents produced. Multi-user view handling (centralized or view integration)

CASE tools

Computer-Aided Software Engineering CASE support may include: - a data dictionary to store information about the database system's data - design tools to support data analysis - tools to permit development of the corporate data model, and the conceptual and logical data models - tools to enable the prototyping of applications

This phase of database design is primarily concerned with developing the ER diagram

Conceptual (ER = Entity Relationship)

Phases of Database Design

Conceptual, Logical, Physical

Database Design

Conceptual, logical and physical design of the database. Two approaches - "bottom up" (begins with attributes - entities and their relationships) and "top down" (more appropriate for complex databases - uses "Entity-Relationship" model) Database and Application design are parallel processes

Control statements

Conditional: IF THEN ELSE CASE Iterative: LOOP WHILE REPEAT FOR

Integrity control

Consists of constraints that we wish to impose in order to protect the database from becoming inconsistent. Defined in the CREATE and ALTER TABLE statements. Types: - required data (NOT NULL) - domain constraints - entity integrity (PRIMARY KEY) - referential integrity (FOREIGN KEY) - general constraints (CHECK, UNIQUE, CREATE ASSERTION)

UPDATE

Contents of existing rows in a named table to be changed

The logical design is said to have this with the conceptual design....

Correspondence with E-R Model Relations (tables) correspond with entity types and with many-to-many relationship types Rows correspond with entity instances and with many-to-many relationship instances Columns correspond with attributes NOTE: The word relation (in relational database) is NOT the same as the word relationship (in E-R model)

CRISP-DM

Cross Industry Standard Process for Data Mining - CRISP-DM remains the top methodology for data mining projects - provides a structured approach to planning a data mining project 1. Business understanding 2. Data understanding 3. Data preparation 4. Modeling 5. Evaluation 6. Deployment

DDL

Data Definition Language Language that allows the DBA or user to describe and name the entities, attributes, and relationships required for the application, together with any associated integrity and security constraints.

DML

Data Manipulation Language - It allows users to insert, update, delete, and retrieve data from the database

DML

Data Manipulation Language - A language that provides a set of operations to support the basic data manipulation operations on the data held in the database. - part of a DML that involves data retrieval is called a query language - The terms "query language" and "DML" are commonly used interchangeably

DBA

Database Administrator - responsible for the physical realization of the database - physical database design and implementation - security and integrity control - maintenance of the operational system - ensuring satisfactory performance - more technically oriented than the role of the DA

DSDLC

Database System Development Lifecycle The effort spent on maintenance (correcting faults that had been detected, implementing new user requirements, and modifying the software to run on new or upgraded platforms) began to absorb resources at an alarming rate. As a result, many major software projects were late, over budget, unreliable, and difficult to maintain, and performed poorly. These issues led to what has become known as the software crisis (software depression)

Which parts of a database need to be backed up?

Database and log file

System Catalog

Description of the data *data dictionary or metadata

Application Design

Designing the user interface and the application programs that use and process the database Database and Application design are parallel processes Involves transaction design and interface design

DOLAP

Desktop OLAP DOLAP tools store the OLAP data in client-based files and support multidimensional processing using a client multidimensional engine issues include: - developing security controls

Participation

Determines whether all or only some entity occurrences participate in a relationship

Participation constraint

Determines whether every member in the superclass must participate as a member of a subclass. Mandatory or Optional

Comparison of DM and ER models

Dimensional vs Entity Relationship A single ER model normally decomposes into multiple DMs. Multiple DMs are then associated through 'shared' dimension tables. E-R modeling will have logical and physical model. Dimensional model will have only physical model. E-R modeling is used for normalizing the OLTP database design. Dimensional modeling is used for de-normalizing the ROLAP/MOLAP design. E-R modelling revovles around the Entities and their relationships to capture the overall process of the system. Dimensional model/Muti-Dimensinal Modelling revolves around Dimensions(point of analysis) for decison making and not to capture the process.

authorization mechanisms

Discretionary access control: Each user is given appropriate access rights (or privileges) on specific database objects Mandatory access control: Each database object is assigned a certain classification level (for example, Top Secret, Secret, Confidential, Unclassified) and each subject (for example, users or programs) is given a designated clearance level

Guidelines for representation of superclass/subclass relationship

Disjoint represented by an 'OR' Non-disjoint (overlapping) represented by 'AND' Disjoint constraint only used for a hierarchy with more than one subclass So 4 possibilities for constraints shown on EERM: {Mandatory, OR} Must belong to exactly one subclass {Mandatory, AND} Must belong to one or more subclasses {Optional, OR} May belong to one subclass or none {Optional, AND} May belong to any number of subclasses

DDBMS

Distributed Database Management System: software that transparently manages the distributed database

DOM

Document Object Model: a tree-based API for XML

DTD

Document Type Definition - defines the valid syntax of an XML document

General constraint example:

DreamHome has a rule that prevents a member of staff from managing more than 100 properties at the same time:

ER vs EER

ER (entity relationship) includes: - entities - relationships - attributes EER (enhanced entity relationship) also includes: - specialization/generalization (superclass/subclass) - aggregation ("has-a" or "is-part-of" relationship) - composition (association between entities, "whole" and the "part.")

logical data model includes

ER diagram(s), relational schema, and supporting documentation such as the data dictionary, which is produced throughout the development of the model.

File based system

Each program defines and manages its own data -Data is isolated -Prone to data duplication -References and querries hard-coded resulting in Data Dependence -Data in proprietary format

ER modeling

Entity Relationship - top-down approach to database design - begins by identifying the important data called entities and relationships between the data that must be represented in the model - then more detail added such as the information we want to hold about the entities and relationships (attributes and any constraints on the entities, relationships, and attributes)

Use of JOIN

FROM Client JOIN Viewing USING clientNo

Facts

Facts are generated by events that occurred in the past, and are unlikely to change, regardless of how they are analyzed

An HTTP request consists of:

GET POST (posts to resource) HEAD (requests header) PUT DELETE OPTIONS

The process of minimizing the differences between entities by identifying their common characteristics.

Generalization - top-down

Data Warehouse

Holds multiple subject areas Holds very detailed information Works to integrate all data sources Does not necessarily use a dimensional model but feeds dimensional models.

PHP

Hypertext Preprocessor - popular open source HTML-embedded scripting language

SQL modify statements

INSERT - adds new rows of data to a table UPDATE - modifies existing data in a table DELETE - removes rows of data from a table

functional dependencies

Indicate important relationships between the attributes of a relation. It is those functional dependencies and the primary key for each relation that are used in the process of normalization.

ISLC

Information Systems Lifecycle

Relation anomalies

Insertion (adding new rows forces user to create duplicate date) Deletion (deleting rows may cause a loss of data that would be needed for other future rows) Modification (changing data in a row forces changes to other rows because of duplication)

interpreted programming language

Interpreted (not compiled) by client

The most commonly used and normally the most useful fact-finding technique

Interviewing

The 80/20 rule

It has been suggested that the most active 20% of user queries account for 80% of the total data access (Wiederhold, 1983).

%TYPE

It is also possible to declare a variable to be of the same type as a column in a specified table or another variable using the %TYPE attribute. For example, to declare that the vStaffNo variable is the same type as the staffNo column of the Staff table, we could write: vStaffNo Staff.staffNo%TYPE; vStaffNo1 vStaffNo%TYPE; Also %ROWTYPE

write-set

It is good practice to collect all updates at the local site first and propagate them in one message to the remote sites later. Such an approach requires the extraction of the write-set

Requirements for a table to qualify as a relation:

It must have a unique name. Every attribute value must be atomic (not multivalued, not composite). Every row must be unique (can't have two rows with exactly the same values for all their fields). Attributes (columns) in tables must have unique names. The order of the columns must be irrelevant. The order of the rows must be irrelevant.

JDO

Java Data Objects - provide a standard interface between application objects and data sources, such as relational databases, XML databases, legacy databases, and file systems. - provide developers with a transparent Java-centric mechanism for working with persistent data to simplify application development.

JDBC

Java Database Connectivity - the most prominent and mature approach for accessing relational DBMSs from - modeled after the Open Database Connectivity (ODBC) specification - JDBC package defines a data-base access API that supports basic SQL functionality and enables access to a wide range of relational DBMS products - with JDBC, Java can be used as the host language for writing database applications

JPA

Java Persistence API - defines an interface to persist normal Java objects, sometimes called Plain Old Java Objects (or POJOs), to a datastore.

JSP

JavaServer Pages is a Java-based server-side scripting language that allows static HTML to be mixed with dynamically generated HTML.

Lore

Lightweight Object REpository - a multiuser DBMS, supporting crash recovery, materialized views, bulk loading of files in some standard format (XML is supported), and a declarative update language - Lorel is its language

This phase of database design is primarily concerned transforming the ER diagram into relations

Logical

VBScript

Microsoft proprietary interpreted scripting language

MIME

Multipurpose Internet Mail Extensions - a standard for encoding binary data into ASCII, as well as a standard for indicating the type of data contained inside a message. Subtypes include HTML, JPEG, MPEG, PDF

OLAP environment multidimensional data is represented as:

N-dimensional data cubes. An alternative representation for a data cube is as a lattice of cuboids

A schedule where the operations from a set of concurrent transactions are interleaved

Nonserial schedule

optional participation on both sides of 1:1 relationship

Parent/child assignment arbitrary

Database segmentation

Partitions a database into an unknown number of segments, or clusters, of similar records. This approach uses unsupervised learning to discover homogeneous subpopulations in a database to improve the accuracy of the profiles.

Warehouse Manager

Performs all the operations associated with the management of the data in the warehouse such as: - Analysis of data to ensure consistency. - Transformation and merging of source data from temporary storage into data warehouse tables. - Creation of indexes and views on base tables. - Generation of denormalizations, (if necessary). - Generation of aggregations, (if necessary). - Backing-up and archiving data.

Query Manager

Performs the operations associated with the management of user queries

three types of classes in JDO

Persistence-capable: classes whose instances can be persisted to a data store. Persistence-aware: classes that manipulate persistence-capable classes. Normal: classes that are not persistable and have no knowledge of persistence.

Database Planning

Planning how the stages of the lifecycle can be realized most efficiently and effectively. Identification of mission statement, mission objectives and standards definitions.

Perl

Practical Extraction and Report Language - high-level interpreted programming language with extensive, easy-to-use text processing capabilities.

dimensional model advantages

Predictable and standard form of the underlying dimensional model offers important advantages: - Efficiency - Ability to handle changing requirements - Extensibility - Ability to model common business situations - Predictable query processing

WITH CHECK OPTION

Prevents "migrating rows" resulting from inserts or updates WITH CASCADED CHECK OPTION will prevent migration resulting from changes that cascade to other views

Statement to create roles

Privileges can be explicitly granted to users. For example, a user can explicitly grant the privilege to insert rows into the PropertyForRent table to the user Beech:GRANT INSERT ON PropertyForRent TO Beech; Privileges can also be granted to a role (a named group of privileges), and then the role granted to one or more users

PL/SQL block parts

Procedural Language SQL an optional declaration part, in which variables, constants, cursors, and exceptions are defined and possibly initialized a mandatory executable part, in which the variables are manipulated an optional exception part, to handle any exceptions raised during execution

Two types of DML:

Procedural: specifies how the output of a DML statement is to be obtained. 3GLs Nonprocedural: describe only what output is to be obtained. 4GLs

Division

R ÷ S The Division operation defines a relation over the attributes C that consists of the set of tuples from R that match the combination of every tuple in S.

Intersection

R ∩ S The Intersection operation defines a relation consisting of the set of all tuples that are in both R and S. R and S must be union-compatible.

Union

R ∪ S The union of two relations R and S defines a relation that contains all the tuples of R, or S, or both R and S, duplicate tuples being eliminated. R and S must be union-compatible. (List all cities where there is either a branch office or a property for rent)

Natural join

R ⋈ S The Natural join is an Equijoin of the two relations R and S over all common attributes x. One occurrence of each common attribute is eliminated from the result.

EquiJoin

R ⋈FS In the case where the predicate F contains only equality (=), the term Equijoin is used instead

Theta join (θ-join)

R ⋈FS The Theta join operation defines a relation that contains tuples satisfying the predicate F from the Cartesian product of R and S. The predicate F is of the form R.ai θ S.bi, where θ may be one of the comparison operators (<, ≤, >, ≥, =, ≠). Further: A theta join allows for arbitrary comparison relationships (such as ≥). An equijoin is a theta join using the equality operator. A natural join is an equijoin on attributes that have the same name in each relationship.

(Left) Outer join

R ⋊ S The (left) Outer join is a join in which tuples from R that do not have matching values in the common attributes of S are also included in the result relation. Missing values in the second relation are set to null.

Semijoin

R ▹F S The Semijoin operation defines a relation that contains the tuples of R that participate in the join of R with S satisfying the predicate F.

A relationship type in which the same entity type participates more than once in different roles.

Recursive relationship:

RDBMS

Relational Database Management System - has become the dominant data-processing software in use today,

ROLAP

Relational online analytical processing - fastest-growing type of OLAP server (response to MOLAP limitations) - supports relational DBMS products through the use of a metadata layer issues include: - performance problems due to processing complex queries - middleware development challenges

Aggregation

Represents a "has-a" or "is-part-of" relationship between entity types, where one represents the "whole" and the other the "part."

local logical data model

Represents the data requirements of one or more but not all user views of a database

The SQL programming language is known as

SQL/PSM (persistent stored modules)

A sequence of the operations by a set of concurrent transactions that preserves the order of the operations in each of the individual transactions.

Schedule

The two unary operations

Selection and Projection Projection means choosing which columns (or expressions) the query shall return. Selection means which rows are to be returned. - extract information from only one relation

A schedule where the operations of each transaction are executed consecutively without any interleaved operations from other transactions

Serial schedule

Predictive modeling

Similar to the human learning experience in using observations to form a model of the important characteristics of some phenomenon Model is developed using a supervised learning approach, which has two phases: training (using historical data for a training set) and testing (new data)

Snowflake schema

Snowflake schema is a variant of the star schema that has a fact table in the center, surrounded by normalized dimension tables

SDLC

Software Development Lifecycle

Conceptual database design steps

Step 1 Build conceptual data model Step 1.1 Identify entity types Step 1.2 Identify relationship types Step 1.3 Identify and associate attributes with entity or relationship types Step 1.4 Determine attribute domains Step 1.5 Determine candidate, primary, and alternate key attributes Step 1.6 Consider use of enhanced modeling concepts (optional step) Step 1.7 Check model for redundancy Step 1.8 Validate conceptual data model against user transactions Step 1.9 Review conceptual data model with user

Steps in building the Conceptual Data Model

Step 1.1: Identify entity types (define main objects user is interested in) Step 1.2: Identify relationship types (Use Entity-Relationship (ER) modeling to visualize the entity and relationships) Step 1.3: Identify and associate attributes with entity or relationship types (simple/composite attributes, single-valued/multi-valued attributes, and derived attributes) Step 1.4: Determine attribute domains Step 1.5: Determine candidate, primary, and alternative key attributes Step 1.6: Consider use of enhanced modeling concepts (optional step) (specialization/generalization, aggregation, and composition) Step 1.7: Check model for redundancy Step 1.8: Validate conceptual data model against user transactions Step 1.9: Review conceptual data model with user

Logical database design steps for the relational model

Step 2 Build logical data model Step 2.1 Derive relations for logical data model Step 2.2 Validate relations using normalization Step 2.3 Validate relations against user transactions Step 2.4 Check integrity constraints Step 2.5 Review logical data model with user Step 2.6 Merge logical data models into global model (optional step) Step 2.7 Check for future growth

Steps in Building Logical Data Model

Step 2.1 Derive relations for logical data model Step 2.2 Validate relations using normalization Step 2.3 Validate relations against user transactions Step 2.4 Check integrity constraints Step 2.5 Review logical data model with user Step 2.6 Merge logical data models into global data model (optional step) Step 2.7 Check for future growth

Steps in building the Logical Data Model

Step 2.1: Derive relations for logical data model Step 2.2: Validate relations using normalization Step 2.3: Validate relations against user transactions Step 2.4: Check integrity constraints Step 2.5: Review logical data model with user Step 2.6: Merge logical data models into global model Step 2.7: Check for future growth

Steps of physical database design

Step 3 Translate logical data model for target DBMS Step 3.1 Design base relations Step 3.2 Design representation of derived data Step 3.3 Design general constraints Step 4 Design file organizations and indexes Step 4.1 Analyze transactions Step 4.2 Choose file organizations Step 4.3 Choose indexes Step 4.4 Estimate disk space requirements Step 5 Design user views Step 6 Design security mechanisms Step 7 Consider the introduction of controlled redundancy Step 8 Monitor and tune the operational system

Physical database design steps for relational databases

Step 3 Translate logical data model for target DBMS Step 3.1 Design base relations Step 3.2 Design representation of derived data Step 3.3 Design general constraints Step 4 Design file organizations and indexes Step 4.1 Analyze transactions Step 4.2 Choose file organizations Step 4.3 Choose indexes Step 4.4 Estimate disk space requirements Step 5 Design user views Step 6 Design security mechanisms Step 7 Consider the introduction of controlled redundancy Step 8 Monitor and tune the operational system

Steps in "Translate Logical Data Model for Target DBMS"

Step 3.1: Design base relations Step 3.2: Design representation of derived data Step 3.3: Design general constraints

Steps for designing File Organizations and Indexes

Step 4.1: Analyze transactions Step 4.2: Choose file organizations Step 4.3: Choose indexes Step 4.4: Estimate disk space requirements

a transaction updates a data item based on its old value, which is first read by the transaction

Testing for conflict serializability - Constrained write rule o Transaction updates data item based on its old value, which is first read - Precedence Graph o Node for each transaction o Edge from node that writes a value to node that then reads that same value o Edge from node that reads a value to a node that then writes into

Cardinality

The cardinality of a relation is the number of tuples it contains. In the relational model, tables can be related as any of "one-to-many" or "many-to-many". This is said to be the cardinality of a given table in relation to another. Cardinality refers to the uniqueness of data values contained in a particular column (attribute) of a database table. The lower the cardinality, the more duplicated elements in a column

Integrity constraints are

The constraints that we wish to impose in order to protect the database from becoming incomplete, inaccurate, or inconsistent

Integrated Data

The data warehouse integrates corporate application-oriented data from different source systems, which often includes data that is inconsistent.

Physical data independence

The immunity of the conceptual schema to changes in the internal schema.

schema

The logical organization of the entire database as seen by the DBA

Multiplicity

The number (or range) of possible occurrences of an entity type that may relate to a single occurrence of an associated entity type through a particular relationship (1 to 1, 1 to many, many to many, complex)

Multiplicity constraints

The number (or range) of possible occurrences of an entity type that may relate to a single occurrence of an associated entity type through a particular relationship. Multiplicity constrains the way that entities are related. It is a representation of the policies (or business rules) established by the user or enterprise.

Logical database design output

The output of this process is a logical data model consisting of an ER/relation diagram, relational schema, and supporting documentation that describes this model, such as a data dictionary. Logical = what, physical = how

Logical database design

The process of constructing a model of the data used in an enterprise based on a specific data model but independent of a particular DBMS and other physical considerations.

Data mining

The process of extracting valid, previously unknown, comprehensible, and actionable information from large databases and using it to make crucial business decisions. Uses:

concurrency control

The process of managing simultaneous operations on the database without having them interfere with one another.

relational schema validation

The structure of the relational schema is validated: - using normalization - then checked to ensure that the relations are capable of supporting the transactions given in the users' requirements specification - next check whether all important integrity constraints are represented by the logical data model - then validated by the users to ensure that they consider the model to be a true representation of the data requirements of the enterprise

Types of fragmentation

There are two main types of fragmentation: horizontal and vertical. - horizontal fragments are subsets of tuples - vertical fragments are subsets of attributes. There are also two other types of fragmentation: mixed and derived, a type of horizontal fragmentation where the fragmentation of one relation is based on the fragmentation of another relation

cube materialization

To address storage/resource issues create only a subset of all possible cuboids with the aim of supporting the majority of queries and/or the most demanding in terms of resources

Conceptual database design

To build the conceptual representation of the database, which includes identification of the important entities, relationships, and attributes. Constructing a model independent of implementation.

Physical database design

To decide how the logical structure is to be physically implemented (as base relations) in the target DBMS.

Objective of Design File Organizations and Indexes step

To determine the optimal file organizations to store the base relations and the indexes that are required to achieve acceptable performance, that is, the way in which relations and tuples will be held on secondary storage. Step 4.1 Analyze transactions Step 4.2 Choose file organizations Step 4.3 Choose indexes Step 4.4 Estimate disk space requirements

The purpose of normalization is

To ensure that the set of relations has a minimal yet sufficient number of attributes necessary to support the data requirements of the enterprise with minimal redundancy

Purpose for Entity-Relationship (ER) Modeling

To ensure that we get a precise understanding of the nature of the data and how it is used by the enterprise, we need a model for communication that is nontechnical and free of ambiguities.

Objective of Translate Logical Data Model for Target DBMS step

To produce a relational database schema from the logical data model that can be implemented in the target DBMS. Step 3.

view materialization

To store the view as a temporary table in the database when the view is first queried (The speed difference may be critical in applications where the query rate is high and the views are complex, so it is not practical to recompute the view for every query) One disadvantage with materialized views is maintaining the currency of the temporary table.

Logical database design

To translate the conceptual representation to the logical structure of the database, which includes designing the relations. Independent of DBMS or other physical considerations.

When is updating a view allowed?

Updates are allowed through a view defined using a simple query involving a single base relation and containing either the primary key or a candidate key of the base relation. Updates are not allowed through views involving multiple base relations. Updates are not allowed through views involving aggregation or grouping operations.

CASE tools categories:

Upper-CASE tools support the initial stages of the database system development lifecycle, from planning through to database design. Lower-CASE tools support the later stages of the lifecycle, from implementation through testing, to operational maintenance. Integrated-CASE tools support all stages of the lifecycle and thus provide the functionality of both upper- and lower-CASE in one tool.

transaction analysis form example

Used in Step 4: Design File Organizations and Indexes (map all transaction paths to relations; determine which relations are most frequently accessed by transactions; analyze the data usage of selected transactions that involve these relations)

transaction usage map

Used in Step 4: Design File Organizations and Indexes (map all transaction paths to relations; determine which relations are most frequently accessed by transactions; analyze the data usage of selected transactions that involve these relations)

Value prediction

Value prediction is used to estimate a continuous numeric value that is associated with a database record. This technique uses the traditional statistical techniques of linear regression and nonlinear regression

Determining candidate, primary, and alternate key attributes

When choosing a primary key from among the candidate keys, use the following guidelines to help make the selection: - the candidate key with the minimal set of attributes - the candidate key that is least likely to have its values changed - the candidate key with fewest characters (for those with textual attribute(s)) - the candidate key with smallest maximum value (for those with numerical attribute(s)) - the candidate key that is easiest to use from the users' point of view

workflow ownership

Workflow ownership allows the right to update replicated data to move from site to site. However, at any one moment, there is only ever one site that may update that particular data set

XML Information Set

XML Information Set (or Infoset) is an abstract description of the information available in a well-formed XML document that meets certain XML namespace constraints

persistence entity

a lightweight Java class that typically represents a table in a relational database.

conservative 2PL

a transaction obtains all its locks when it begins, or it waits until all the locks are available

A data warehouse is well equipped for providing data for mining because

a warehouse not only holds data of high quality and consistency, and from multiple sources, but is also capable of providing subsets (views) of the data for analysis and lower level details of the source data, when required.

candidate key, primary key, alternate keys

candidate key: a minimal set of attributes of an entity that uniquely identifies each occurrence of that entity. We may identify more than one candidate key, in which case we must choose one to be the primary key (simplest, least likely to change); the remaining candidate keys are called alternate keys.

Situation in which a single transaction leads to a series of rollbacks

cascading rollback *cascading rollbacks are undesirable, because they potentially lead to the undoing of a significant amount of work

two techniques associated with predictive modeling

classification and value prediction

mandatory participation on both sides of 1:1 relationship

combine the entities involved into one relation and choose one of the primary keys of the original entities to be the primary key of the new relation, while the other (if one exists) is used as an alternate key

dimensional hierarchy

defines mappings from a set of lower-level concepts to higher-level concepts zipCode → area → city → region → country

advantages of a DDBMS

distributed database management system - it reflects the organizational structure - it makes remote data more shareable - it improves reliability, availability, and performance - it may be more economical - it provides for modular growth, facilitates integration - helps organizations remain competitive.

Java Web Services

document-oriented—those that deal directly with processing XML documents procedure-oriented—those that deal with procedures.

force policy

ensures that all pages updated by a transaction are immediately written to disk when the transaction commits. The alternative policy is no-force

Object-based data models

entities: a distinct object attributes: a property that describes some aspect of the object relationships: an association between entities

ER

entity-relationship modeling

ETL

extract, transform and load - used in data warehousing, data marts - migrating from one database to another

middleware-based replication

middleware is responsible for coordinating client requests across the different replicas and appears as one database system to the client

Relational modeling uses these to maintain relationships

primary keys and foreign keys Primary keys are typically the unique identifier noted on the conceptual model Foreign keys are the primary key of another entity to which an entity has a relationship Composite keys are primary keys that are made of more than one attribute Weak entities Associative entities

allows a user to create or access (that is read, write, or modify) some database object (such as a relation, view, or index) or to run certain DBMS utilities

privilege

Java

simple, object-oriented, distributed, interpreted, robust, secure, architecture-neutral, portable, high-performance, multithreaded, and dynamic language from Sun Microsystems. Java applications are compiled into bytecodes, which are interpreted and executed by the Java Virtual Machine

Two parts to a package

specification: declares all public constructs of the package body: defines all constructs (public and private) of the package, and so implements the specification

holds only exclusive locks until the end of the transaction

strict 2PL

The SQL:2011 standard

supports OLAP functionality in the provision of extensions to grouping capabilities such as: - CUBE and ROLLUP functions - elementary operators such as moving windows and ranking functions

uses the same key for both encryption and decryption

symmetric encryption DES, PGP

Variable assignment in the executable part of a PL/SQL block

using the normal assignment statement (:=) or as the result of an SQL SELECT or FETCH statement. For example: vStaffNo := 'SG14' ;vRent := 500;SELECT COUNT(*) INTO x FROM PropertyForRent WHERE staffNo = vStaffNo; *PL/SQL (Procedural Language/Structured Query Language) is Oracle Corporation's procedural extension for SQL and the Oracle relational database

log records (or at least certain parts of them) are written before the corresponding write to the database

write-ahead log protocol

UNIQUE constraint

you can use UNIQUE constraints to make sure that no duplicate values are entered in specific columns that do not participate in a primary key. Although both a UNIQUE constraint and a PRIMARY KEY constraint enforce uniqueness, use a UNIQUE constraint instead of a PRIMARY KEY constraint when you want to enforce the uniqueness of a column, or combination of columns, that is not the primary key

Projection

Πa1 ... , an(R) The Projection operation works on a single relation R and defines a relation that contains a vertical subset of R, extracting the values of specified attributes and eliminating duplicates. (Produce a list of salaries for all staff, showing only the staffNo, fName, lName, and salary details)

Selection

σpredicate(R) The Selection operation works on a single relation R and defines a relation that contains only those tuples of R that satisfy the specified condition (predicate) (List all staff with a salary greater than £10000)


Conjuntos de estudio relacionados

Chapter 19: Transoceanic Encounters and Global Connections

View Set

chapter 6: What sort of learning does classical conditioning explain?

View Set

Comprehension Check 2 - Sentences (TEFL)

View Set

PSC 100: Chapter 1 (Intro to Cognitive Psychology)

View Set

IUHS Immunology Exam Master Questions

View Set