Everything
DBMS Selection
- Define Terms of Reference of study: states the objectives and scope of the study and the tasks that need to be undertaken - Shortlist two or three products - Evaluate products - Recommend selection
trigger
- Defines an action that the database should take when some event occurs in the application. - Based on Event-Condition-Action (ECA) model. - User must be owner or have TRIGGER privilege.
Statement to grant and revoke system privileges
- Grant System Privileges/Roles dialog box and Revoke System Privileges/Roles dialog box of the Oracle Security Manager - SQL GRANT and REVOKE statements (see Section 7.6)
Java can be connected to an ODBC-compliant DBMS through:
- JDBC (dynamic SQL) or SQLJ (static SQL), - Container-Managed Persistence (CMP) - Java Data Objects (JDO) - Java Persistence API (JPA)
models based on the concepts of open nested transactions and split transactions
- Kangaroo - Reporting and Co-Transactions - MoFlex A mobile host starts a transaction and a subtransaction is started at the connected mobile support station
Attribute
An attribute is a named column of a relation.
Derived attribute
An attribute that represents a value that is derivable from the value of a related attribute or set of attributes, not necessarily in the same entity type.
DA
Data Administrator - responsible for the management of the data resource -database planning - development and maintenance of standards, policies and procedures - conceptual/logical database design - consults with and advises senior managers
DDL
Data Definition Language - The DDL allows users to specify the data types and structures and the constraints on the data to be stored in the database.
DBMS
Database Management System - A software system that enables users to define, create, maintain, and control access to the database.
EJB
Enterprise JavaBeans (EJB) is a server-side component architecture for the business tier, encapsulating business and data logic
"irreversible" and "reversible."
Irreversible techniques, as the name implies, do not permit the original data to be known. However, the data can be used to obtain valid statistical information. Reversible is more commonly used.
JDOQL
JDO Query Language - data-source-neutral query language based on Java boolean expressions
MOLAP
MOLAP servers use specialized data structures and multidimensional database management systems (MDDBMSs) to organize, navigate, and analyze data. To enhance query performance, the data is typically aggregated and stored according to predicted usage. issues include: Only a limited amount of data can be efficiently stored and analyzed. Navigation and analysis of data are limited, because the data is designed according to previously determined requirements. Data may need to be physically reorganized to optimally support new requirements.
Entity types
Main objects user is interested in
asterisk:asterisk is
Many to many
Set difference
R - S The Set difference operation defines a relation consisting of the tuples that are in relation R, but not in S. R and S must be union-compatible. (List all cities where there is a branch office but no properties for rent.)
Cartesian product
R × S The Cartesian product operation defines a relation that is the concatenation of every tuple of relation R with every tuple of relation S. (List the names and comments of all clients who have viewed a property for rent.) kind of default join called a Cartesian Join (or Cross Join)
enhanced modeling concepts
specialization/generalization, aggregation, and composition
semantic net
symbols for entities and relationships:
In deciding where to post (or place) the foreign key attribute(s), we must first identify
the "parent" and "child" entities involved in the relationship
Thomas's write rule
ignore obsolete write rule
Lock that is held for a much shorter duration than a normal lock
latch * A latch can be used before a page is read from, or written to, disk to ensure that the operation is atomic
rigorous 2PL
leave the release of all locks until the end of the transaction
left in a wait state indefinitely, unable to acquire any new locks
livelock
The technique of locking a child node and releasing the lock on the parent node if possible
lock-coupling or crabbing
Concurrency control techniques
locking and timestamping *both conservative (pessimistic) approaches (they cause transactions to be delayed in case they conflict with other transactions at some time in the future)
distributed database
logically interrelated collection of shared data (and a description of this data), physically distributed over a computer network
relational keys
one or more attributes that uniquely identifies each tuple in a relation
too much study too soon leads to...
paralysis by analysis
The process of maximizing the differences between members of an entity by identifying their distinguishing characteristics.
Specialization - bottom-up
System Definition
Specifying the scope and boundaries of the database system, including major user views, its users, and application areas.
CASE benefits:
Standards Integration Support for standard methods Consistency Automation
Strong and Weak entity types
Strong entity type: An entity type that is not existence-dependent on some other entity type. Weak entity type: An entity type that is existence-dependent on some other entity type.
Criteria for data models
Structural validity Simplicity Expressibility Nonredundency Sharability Extensibility (evolve to meet new requirements) Integrity Diagrammic representation (easy to diagram)
SQL
Structured Query Language
SDLC Activities
Systems Development Life Cycle
view maintenance
The process of updating a materialized view in response to changes to the underlying data The basic aim of view maintenance is to apply only those changes necessary to the view to keep it current
The Join operation
combines two relations to form a new relation
database design methodology main phases:
conceptual, logical, and physical database design
A schedule that orders any conflicting operations in the same way as some serial execution
conflict serializability
XML Query Working Group
created XQuery formed to produce a data model for XML documents, a set of query operators on this model, and a query language based on these query operators
subschema
The part of the database as seen by the user or application program
Join operation types
Theta join Equijoin (a particular type of Theta join) Natural join Outer join Semijoin
An action, or series of actions, carried out by a single user or application program, that reads or updates the contents of the database.
Transaction
UML
Unified Modeling Language - first letter is upper-case
subtransaction
types: - compensatable - repeatable - pivot - location-dependent correct execution order of subtransactions ensures the termination despite disconnection
bean
used to encapsulate many objects into a single object (the bean)
A view is created by..
using the CREATE VIEW statement by specifying a defining query
OLAP
Online Analytical Processing The dynamic synthesis, analysis, and consolidation of large volumes of multidimensional data.
global logical data model
Represents the data requirements for all user views
The two principal rules for the relational model
entity integrity referential integrity
The responsibility to authorize use of the DBMS usually rests with the...
Database Administrator (DBA)
Elements
Elements, or tags, are the most common form of markup
(1:*)
One-to-many "One" is parent, primary key should be posted to child
DBMS provides
- a security system - an integrity system - a concurrency control system - a recovery control system - a user-accessible catalog
mandatory participation on one side of 1:1 relationship
"Optional" is parent, primary key should be posted to child (mandatory)
Cursor Attribuites
%ROWCOUNT - evaluates total number of rows returned so far %ISOPEN - true if the cursor is open %FOUND - true if most resent fetch returned a row %NOTFOUND
Categories of OLAP Servers
- Multidimensional OLAP (MOLAP) - Relational OLAP (ROLAP) - Hybrid OLAP (HOLAP) - Desktop OLAP (DOLAP)
Web-DBMS advantages
- Simplicity - Platform independence - Graphical user interface - Standardization - Cross-platform support - Transparent network access - Scalable deployment - Innovation
polyinstantiation
- The presence of data objects that appear to have different values to users with different clearances - Precludes possibility of hints of existing tuples to lower clearance
steal policy
- allows the buffer manager to write a buffer to disk before a transaction commits (the buffer is unpinned). In other words, the buffer manager "steals" a page from the transaction. The alternative policy is no-steal - avoids the need for a very large buffer space
functions that a database management system (DBMS) should provide
- transaction support - concurrency control services - recovery services
Physical data models
- unifying model - frame memory model
Windowing calculations
- used to compute cumulative, moving, and centered aggregates EX:
Timestamping
-if a transaction attempts to read or write a data item, then the read or write is only allowed to proceed if the last update on that data item was carried out by an older transaction - each data item contains a read_timestamp, giving the timestamp of the last transaction to read the item, and a write_timestamp
Guidelines for indexes
-the primary key of a table is automatically indexed, but a field whose data type is Memo, Hyperlink, or OLE Object cannot be indexed. Microsoft advises indexing a field if all the following apply: - the field's data type is Text, Number, Currency, or Date/Time - the user anticipates searching for values stored in the field - the user anticipates sorting values in the field;the user anticipates storing many different values in the field. If many of the values in the field are the same, the index may not significantly speed up queries.
Composite key
A candidate key that consists of two or more attributes
Relational database
A collection of normalized relations with distinct relation names
Foreign key
A column or combination of columns that is used to establish and enforce a link between the data in two tables - an attribute, or set of attributes, within one relation that is the candidate key of another relation A FOREIGN KEY is a key used to link two tables together. A FOREIGN KEY is a field (or collection of fields) in one table that refers to the PRIMARY KEY in another table. The table containing the foreign key is called the child table, and the table containing the candidate key is called the referenced or parent table.
Candidate key
A combination of attributes that uniquely identifies a database record without any extraneous data (minimal superkey)
semantic data modeling
A data model that represents the "real world"
Data Mart
A database that contains a subset of corporate data to support the analytical requirements of a particular business unit (such as the Sales department) or to support users who share the same requirements to analyse a particular business process (such as property sales).
Subclass
A distinct subgrouping of occurrences of an entity type, which must be represented in a data model. Every sub a member of super...but not all super members of subs.
Determining attribute domains
A domain is a pool of values from which one or more attributes draw their values A fully developed data model specifies the domains for each attribute and includes:allowable set of values for the attribute;sizes and formats of the attribute. Document in Data Dictionary.
Domain
A domain is the set of allowable values for one or more attributes.
FOREIGN KEY
A foreign key is a column, or set of columns, that links each row in the child table containing the foreign key to the row of the parent table containing the matching candidate key value. Referential integrity means that, if the foreign key contains a value, that value must refer to an existing, valid row in the parent table - used in CREATE or ALTER TABLE statements
Dimensionality modeling
A logical design technique that aims to present the data in a standard, intuitive form that allows for high-performance access Every dimensional model (DM) is composed of one table with a composite primary key, called the fact table, and a set of smaller tables called dimension tables Each dimension table has a simple (non-composite) primary key that corresponds exactly to one of the components of the composite key in the fact table. Forms 'star-like' structure, which is called a star schema or star join.
XML
A metalanguage (a language for describing other languages) that enables designers to create their own customized tags to provide functionality not available with HTML.
package
A package is a schema object that groups logically related PL/SQL types, variables, and subprograms. Packages usually have two parts, a specification (spec) and a body - a collection of procedures, functions, variables, and SQL statements that are grouped together and stored as a single program unit providing a form of encapsulation - consists of specification and body Oracle package steps: - It compiles the procedure or package. - It stores the compiled code in memory - It stores the procedure or package in the database.
database developer
A person or group of people responsible for the analysis, design, and implementation of a database system.
Ranking functions
A ranking function computes the rank of a record compared to other records in the dataset based on the values of a set of measures EX: RANK( ) OVER (ORDER BY columnList) DENSE_RANK( ) OVER (ORDER BY columnList)
Relation
A relation is a table with columns and rows - subset of the Cartesian product of two or more sets
fragments
A relation may be divided into a number of subrelations called fragments, which are allocated to one or more sites. Fragments may be replicated to provide improved availability and performance.
Relationship type
A set of meaningful associations among entity types
Relational database schema
A set of relation schemas, each with a distinct name.
Database
A shared collection of logically related data and its description, designed to meet the information needs of an organization. *self-describing collection of integrated records
Tuple
A tuple is a row of a relation. The tuples are called the extension (or state) of a relation, which changes over time. The nearest equivalent in SQL is a "row". SQL rows differ from relational tuples in a few fundamental ways, for example: SQL rows are always ordered lists of values rather than unordered sets; SQL rows permit values that are un-named or have duplicate names.
Relationship occurrence
A uniquely identifiable association that includes one occurrence from each participating entity type
Character Data
ASCII and EBCDIC are commonly used character sets
Data Abstraction
Ability to change the internal definition of an object without affecting the users of the object
HAVING
Acts as a WHERE clause for groups, restricting the groups that appear in the final result table
General constraints
Additional rules specified by the users or database administrators of a database that define or constrain some aspect of the enterprise
INSERT
Adds new rows
Advantages and disadvantages of triggers
Advantages: Elimination of redundant code. Simplifying modifications. Increased security. Improved integrity. Improved processing power. Good fit with the client-server architecture. Disadvantages: Performance overhead. Cascading effects. Cannot be scheduled. Less portable.
APB-1
Analytical Processing Benchmark The aim of the APB-1 is to measure a server's overall OLAP performance rather than the performance of individual tasks
AQM
Analytical Queries per Minute a standard benchmark metric that allows for comparison of performances of different combinations of hardware and software
Special needs of a DDBMS
As well as having the standard functionality expected of a centralized DBMS, a DDBMS will need: - extended communication services - extended system catalog - distributed query processing - extended security, concurrency, and recovery services.
ACID properties of a transaction
Atomicity, Consistency, Isolation, and Durability
A property of an entity or a relationship type:
Attribute
The set of allowable values for one or more attributes:
Attribute domain
derived attribute
Attributes whose values are based on the values of other attributes
A mechanism that determines whether a user is who he or she claims to be.
Authentication
EER is used for
CAD CAM (computer aided manufacturing) CASE (computer aided software engineering) OIS (office information systems) EER byproduct of a need for "semantic" modeling concepts
Examples of Oracle roles
CONNECT: Includes only the following system privilege: CREATE SESSION RESOURCE: Includes the following system privileges: CREATE CLUSTER, CREATE INDEXTYPE, CREATE OPERATOR, CREATE PROCEDURE, CREATE SEQUENCE, CREATE TABLE, CREATE TRIGGER, CREATE TYPE DBA: All system privileges WITH ADMIN OPTION
aggregate functions
COUNT SUM AVG MIN MAX
The main aggregate functions
COUNT - returns the number of values in the associated attribute. SUM - returns the sum of the values in the associated attribute. AVG - returns the average of the values in the associated attribute. MIN - returns the smallest value in the associated attribute. MAX - returns the largest value in the associated attribute.
Primary key
Candidate key chosen for use in identification of tuples. A relation must always have a primary key.
used to improve database recovery - all modified buffer blocks, all log records, and a record identifying all active transactions are written to disk
Checkpoints
CGI disadvantages
Common Gateway Interface - a new process is created for each invocation of the CGI script, which can overload the Web server during peak times
The five basic search conditions (or predicates)
Comparison Range (BETWEEN/NOT BETWEEN) Set membership (IN/NOT IN) Pattern match (LIKE/NOT LIKE) Null
Simple/composite attributes
Composite attributes are made up of simple attributes (EX: address using street, city, zip)
Three phases of database design:
Conceptual Logical Physical
Condition handling
DECLARE {CONTINUE | EXIT | UNDO} HANDLERFOR SQLSTATE {sqlstateValue | conditionName | SQLEXCEPTION | SQLWARNING | NOT FOUND} handlerAction; A condition name and an optional corresponding SQLSTATE value can be declared using: DECLARE conditionName CONDITION[FOR SQLSTATE sqlstateValue]
RESTRICT VS CASCADE
DROP behaviors - specifies the action to be taken if the domain is currently being used - if RESTRICT is specified and the domain is used in an existing table, view, or assertion definition the drop will fail - CASCADE, any table column that is based on the domain is automatically changed to use the domain's underlying data type, and any constraint or default clause for the domain is replaced by a column constraint or column default clause, if appropriate
Roles in the Database Environment
Data and Database Administrators Database Designers Application Developers End-Users
DDL
Data definition language
Non-volatile Data
Data in the warehouse is not normally updated in real-time (RT) but is refreshed from operational systems on a regular basis
Time-variant Data
Data in the warehouse is only accurate and valid at some point in time or over some time interval
DP
Data processing
The number of participating entity types in a relationship is called the:
Degree of a relationship type Example of ternary relationship called "Registers":
conceptual schema
Describes all the entities, attributes, and relationships together with integrity constraints
Cardinality
Describes the maximum number of possible relationship occurrences for an entity participating in a given relationship type.
Disjoint constraint
Describes the relationship between members of the subclasses and indicates whether it is possible for a member of a superclass to be a member of one, or more than one, subclass (The disjoint constraint only applies when a superclass has more than one subclass. If the subclasses are disjoint, then an entity occurrence can be a member of only one of the subclasses.)
The difference between RANK and DENSE_RANK:
EX: SELECT branchNo, SUM(saleAmount) AS sales, RANK() OVER (ORDER BY SUM(saleAmount)) DESC AS ranking, DENSE_RANK() OVER (ORDER BY SUM(saleAmount)) DESC AS dense_ranking FROM Branch, PropertySale WHERE Branch.branchNo = PropertySale.branchNo AND Branch.city = 'Edinburgh' GROUP BY(branchNo);
INTERSECT use
EX: Construct a list of all cities where there is both a branch office and a property. (SELECT city FROM branch) INTERSECT (SELECT city FROM PropertyForRent);
Three-level architecture
External - The users' view of the database. This level describes that part of the database that is relevant to each user Conceptual - The community view of the database. This level describes what data is stored in the database and the relationships among the data Internal - The physical representation of the database on the computer. This level describes how the data is stored in the database.
FLWOR expressions
FOR forVar IN inExpression LET letVar := letExpression [WHERE filterExpression] [ORDER BY orderSpec] RETURN expression
extensions
Features that are provided on top of the standard by the vendors.
Updatable view
For a view to be updatable, the DBMS must be able to trace any row or column back to its row or column in the source table.
Data Warehouse vs Data Mart
For many enterprises the way to avoid the complexities associated with designing a data warehouse is to start by building one or more data marts. Data marts allow designers to build something that is far simpler and achievable for a specific group of users.
HOLAP
HOLAP (Hybrid Online Analytical Processing) servers deliver selected data directly from the DBMS or via a MOLAP (Multidimensional OLAP) server to the desktop (or local server) in the form of a data cube, where it is stored, analyzed, and maintained locally issues include: - architecture results in redundancy - lack of consistency as well
Components of the DBMS Environment
Hardware Software Data Procedures People
File index types
Heap - (simplest - records inserted at end) Hash - (bucket via hash function) Indexed Sequential Access Method (ISAM) B -tree (paths to root and leaf same length) Clusters (two or more tables same data block)
Referential integrity
If a foreign key exists in a relation, either the foreign key value must match a candidate key value of some tuple in its home relation or the foreign key value must be wholly null. A foreign key links each tuple in the child relation to the tuple in the parent relation containing the matching candidate key value. Referential integrity means that if the foreign key contains a value, that value must refer to an existing tuple in the parent relation.
Entity integrity
In a base relation, no attribute of a primary key can be null.
homogeneous vs heterogeneous
In a homogeneous system, all sites use the same DBMS product. In a heterogeneous system, sites may run different DBMS products,
The process of keeping and maintaining a log file of all changes made to the database to enable recovery to be undertaken effectively in the event of a failure
Journaling
Outer JOIN types
LEFT JOIN RIGHT JOIN FULL JOIN
Testing evaluation criteria
Learnability: How long does it take a new user to become productive with the system? Performance: How well does the system response match the user's work practice? Robustness: How tolerant is the system of user error? Recoverability: How good is the system at recovering from user errors? Adapatability: How closely is the system tied to a single model of work?
Link Analysis
Link analysis aims to establish links, called associations, between the individual records, or sets of records, in a database
Discretionary Access Control (DAC)
Most commercial DBMSs provide an approach to managing privileges that uses SQL called Discretionary Access Control (DAC). The SQL standard supports DAC through the GRANT and REVOKE commands Has security loopholes
Sort results using
ORDER BY
OEM
Object Exchange Model: proposed models for semistructured data - nested object model.
Data Mart
Often holds only one subject area- for example, Finance, or Sales May hold more summarised data (although many hold full detail) Concentrates on integrating information from a given subject area or set of source systems Is built focused on a dimensional model using a star schema.
Guidelines for determining secondary indexes (simplified)
One approach to selecting an appropriate file organization for a relation is to keep the tuples unordered and create as many secondary indexes as necessary. Another approach is to order the tuples in the relation by specifying a primary or clustering index. One approach to determining which secondary indexes are needed is to produce a "wish-list" of attributes that we consider are candidates for indexing, and then to examine the impact of maintaining each of these indexes.
ODS
Operational Data Store - Holds current and integrated operational data for analysis. - Often structured and supplied with data in the same way as the data warehouse. - May act as staging area for data to be moved into the warehouse. - Often created when legacy operational systems are found to be incapable of achieving reporting requirements
put_line
Outputs information to a buffer in the SGA (System Global Area - an area of shared memory that is used to store data and control information for one Oracle instance). Displayed using "get_line"
Benefits of Data Warehousing
Potential high returns on investment Competitive advantage Increased productivity of corporate decision-makers
ROLLUP
ROLLUP creates subtotals that roll up from the most detailed level to a grand total put a different way: ROLLUP creates subtotals at n 1 levels, where n is the number of grouping columns EX: SELECT propertyType, yearMonth, city, SUM(saleAmount) AS sales FROM Branch, PropertyForSale, PropertySale WHERE Branch.branchNo = PropertySale.branchNo AND PropertyForSale.propertyNo = PropertySale.propertyNo AND PropertySale.year Month IN ('2013-08', '2013-09') AND Branch.city IN ('Aberdeen', 'Edinburgh', 'Glasgow') GROUP BY ROLLUP(propertyType, yearMonth, city);
A schedule in which for each pair of transactions Ti and Tj, if Tj reads a data item previously written by Ti, then the commit operation of Ti precedes the commit operation of Tj.
Recoverable schedule
RDF
Resource Description Framework - infrastructure that enables the encoding, exchange, and reuse of structured metadata
Use of HAVING
SELECT * FROM Staff GROUP BY branchNo HAVING COUNT(staffNo) > 1
Use of AS
SELECT COUNT(DISTINCT propertyno) AS MyCount
privileges
SELECT—the privilege to retrieve data from a table INSERT—the privilege to insert new rows into a table UPDATE—the privilege to modify rows of data in a table DELETE—the privilege to delete rows of data from a table REFERENCES—the privilege to reference columns of a named table in integrity constraints USAGE—the privilege to use domains, collations, character sets, and translations. * Issued via GRANT
Relational DBMSs generation...
Second-generation DBMS
Java Servlets
Servlets are programs that run on a Java-enabled Web server and build Web pages, analogous to CGI programming
Domain
Set of legal values CREATE DOMAIN Example: CREATE DOMAIN DomainName [AS] dataType [DEFAULT defaultOption] [CHECK (searchCondition)]
Locking
Shared lock:If a transaction has a shared lock on a data item, it can read the item but not update it. Exclusive lock:If a transaction has an exclusive lock on a data item, it can both read and update the item.
confusion matrix
Shows the counts of the actual versus predicted class values
SAX
Simple API for XML: an event-based, serial-access API for XML
Star schema
Star schema is a logical structure that has a fact table (containing factual data) in the center, surrounded by denormalized dimension tables (containing reference data).
Starflake schema
Starflake schema is a hybrid structure that contains a mixture of star (denormalized) and snowflake (normalized) dimension tables
view resolution
Tables and Views both have definitions such as the columns of the table, or the columns in a view. All columns have given data types. Tables hold row data for a given set of columns spread across the physical pages stored on a disk. Views have definitions similar to tables. However, they do not hold row data. The data found in a view is populated by an underlying query that is reading from tables that hold row data. When you read from a view, or join a view to an existing query the SQL server will execute the query in the view and join it to your dataset. When it does that, that would be view resolution.
SQL for relations, attributes, and tuples
Tables, columns, and rows
ODBC
The Microsoft Open Database Connectivity - provides a common interface for accessing heterogeneous SQL databases - based on SQL as a standard for accessing data
Database Segmentation
The aim of database segmentation is to partition a database into an unknown number of segments, or clusters, of similar records, that is, records that share a number of properties and so are considered to be homogeneous. (Segments have high internal homogeneity and high external heterogeneity.)
Physical vs Conceptual
The basic concept of the ER model is the entity type, which represents a group of "objects" in the "real world" with the same properties. An entity type has an independent existence and can be objects with a physical (or "real") existence or objects with a conceptual (or "abstract") existence
Primary key
The candidate key that is selected to uniquely identify each occurrence of an entity type.
Degree
The degree of a relation is the number of attributes it contains.
View
The dynamic result of one or more relational operations operating on the base relations to produce another relation. A view is a virtual relation that does not necessarily exist in the database but can be produced upon request by a particular user, at the time of request
transaction/relation cross-reference matrix
The matrix shows, in a visual way, the transactions that are required and the relations they access:
Candidate key
The minimal set of attributes that uniquely identifies each occurrence of an entity type.
update anywhere ownership
The update-anywhere (symmetric replication) model creates a peer-to-peer environment in which multiple sites have equal rights to update replicated data.
Subject-oriented Data
The warehouse is organized around the major subjects of the enterprise (e.g. customers, products, and sales) rather than the major application areas (e.g. customer invoicing, stock control, and product sales). Need for decision support
To perform multi-table queries
To combine columns from several tables into a result table, we need to use a join operation
Data Modeling
Two main purposes: - assist in the understanding of the meaning (semantics) of the data - facilitate communication about the information requirements Data modeling ensures we understand: - each user's perspective of the data - the nature of the data itself, independent of its physical representations - the use of data across user views.
UNDO/REDO in context of a crash
UNDO uncommitted transactions REDO commited
Three results combinations
UNION (full) INTERSECT EXCEPT (difference, MINUS)
"View" use disadvantages
Update restriction Structure restriction Performance
Composite key
When a key consists of more than one attribute
Lore's XML-based data model
XML element is a pair (eid, value)
parallel DBMS
a DBMS running across multiple processors and disks and which has been designed to evaluate operations in parallel, whenever possible, in order to improve performance
Diagrammatic representation of attributes on relationships is done using:
a dashed line:
The MoFlex model
allow defining predicates about time, costs, and location that affect the execution of a transaction. Compensatable. Repeatable. A transaction that eventually succeeds, but might be repeated a number of times. Pivot. A transaction that is neither compensatable nor repeatable. Location-dependent. If the subtransaction has to terminate at this specific location (cell, MSS).
Foreign Key
an attribute (possible composite) in a relation of a database that serves as the primary key of another relation in the same database
kernel-based replication
replication protocol is tightly coupled to the concurrency control mechanism of the local database system
relational schema
set of relations
distributed processing
where a centralized DBMS is accessed over a network
Mapping strong entity types
"Regular" entities: - Simple - Composite - Multi-valued
Functions of a DBMS
(1) Data storage, retrieval, and update (2) A user-accessible catalog (3) Transaction support (4) Concurrency control services (5) Recovery services (6) Authorization services (7) Support for data communication (8) Integrity services (9) Services to promote data independence (10) Utility services
Codd's Rules for OLAP Tools
(1) Multidimensional conceptual view (2) Transparency (3) Accessibility (4) Consistent reporting performance (5) Client-server architecture (6) Generic dimensionality (Every data dimension must be equivalent in both structure and operational capabilities) (7) Dynamic sparse matrix handling (8) Multi-user support (9) Unrestricted cross-dimensional operations (10) Intuitive data manipulation (11) Flexible reporting (12) Unlimited dimensions and aggregation levels
OLAP key features
(online analytical processing) - multidimensional views of data - support for complex calculations - time intelligence.
lazy update propagation
- (asynchronous update propagation) - the target database is updated after the source database has been modified.
eager update propagation
- (synchronous update propagation) - ensures that all copies are updated within the enclosing transaction and voting at the end ensures the atomicity.
multilevel relation
- A relation that reveals different tuples to users with different security clearances - Related to MAC
Aspects of transactions
- A transaction should always transform the database from one consistent state to another - "Committed" if successful, "Aborted" otherwise, where database is "rolled back" or "undone" - "Compensating transaction" fixes a mistake
ASP
- Active Server Pages - allows dynamic, interactive Web pages to be created on the Web server - Active X is programming extension of ASP
API
- Application Programming Interface - "non-CGI gateways" - implemented by extending the web server
Properties of a transaction
- Atomicity: "all or nothing" - Consistency: one consistent state to another - Isolation: execute independently of one another - Durability: effects permanently recorded
Derived Attributes
- Attributes whose value can be found by examining the values of other attributes (calculated attributes) - Often, derived attributes do not appear in the logical data model but are documented in the data dictionary. If a derived attribute is displayed in the model, a "/" is used to indicate that it is derived - Considering physical design - calculated every time or stored in DB?
Phases of CRISP-DM model
- Business understanding - Data understanding - Data preparation - Modeling - Evaluation - Deployment
CGI
- Common Gateway Interface - CGI is a standard, portable, and modular method for supporting application-specific functionality by allowing scripts to be activated by the server to handle client requests - A specification for transferring information between a Web server and a CGI program - cookies - popular technique for integrating databases into the web
CGI advantages
- Common Gateway Interface - simplicity - language independence - Web server independence - wide acceptance
Data Mart vs Data Warehouse
- Data marts support only the requirements of a particular department or functional area and can therefore be built more rapidly. - The data mart is a subset of the data warehouse and is usually oriented to a specific business line or team. - Whereas data warehouses have an enterprise-wide depth, the information in data marts pertains to a single department.
Stages of the database system development lifecycle
- Database Planning - System Definition - Requirements collection and analysis - Database Design - DBMS Selection - Application Design - Prototyping - Implementation - Data Conversion and Loading - testing - Operational Maintenance
CODASYL
- First generation of DBMS Disadvantages: - Complex programs have to be written to answer even simple queries based on navigational record-oriented access - There is minimal data independence - There is no widely accepted theoretical foundation.
Mandatory Access Control (MAC)
- Mandatory Access Control (MAC) is based on system-wide policies that cannot be changed by individual users. In this approach each database object is assigned a security class and each user is assigned a clearance for a security class, and rules are imposed on reading and writing of database objects by users - SQL standard does not include support for MAC - Popular version: Bell-LaPadula model (objects, subjects, security classes and clearances)
UI Design Guidelines
- Meaningful title - Comprehensible instructions - Logical grouping/sequencing of fields - Visually appealing layout - Familiar field labels - Consistent terminology and abbreviations - Consistent use of color - Visible space and boundaries for data entry fields - Convenient cursor movement - Error correction for individual characters and entire fields - Error messages for unacceptable values - Optional fields marked clearly - Explanatory messages for fields - Completion signal
Microsoft .NET
- Microsoft's latest Web Platform - a vision for the third generation of the Internet where "software is delivered as a service, accessible by any device, any time, any place, and is fully programmable and personalizeable."
formal XML semantics processing model
- Parsing (ensures that the input expression is an instance of the language) - Normalization (converts the expression into an XQuery Core expression) - Static type analysis (bottom-up technique by applying type inference rules over expressions) - Dynamic evaluation (computes the value of the expression from the abstract syntax tree in the core language)
Web-DBMS disadvantages
- Reliability - Security - Cost - Scalability - Limited functionality of HTML - Statelessness - Bandwidth - Performance - Immaturity of deployment tools
The five fundamental operations in relational algebra
- Selection (or Restriction) - Projection - Cartesian product - Union - Set difference * In addition, there are also the Join, Intersection, and Division operations, which can be expressed in terms of the five basic operations
cursors
- Since SELECT can only be used if the query returns only one row cursors are used to navigate results - used to allow the rows of a query result to be accessed one at a time - must be declared and opened before it can be used, and it must be closed to deactivate it after it is no longer required - DECLARE, OPEN, FETCH, CLOSE, DEALLOCATE
SQL
- Structured Query Language - Non-procedural - Standard English words (for pros and non-pros) - Transform-oriented language: designed to use relations to transform inputs into required outputs. Two major components: - Data Definition Language (DDL) for defining the database structure and controlling access to the data - Data Manipulation Language (DML) for retrieving and updating data.
Synchronous vs Asynchronous
- Synchronous: All changes are applied at all sites participating in the replication as part of a single transaction. If the transaction fails at any site, the entire transaction is rolled back. - Asynchronous: Local changes are captured, stored in a queue, and propagated and applied at the remote sites at regular intervals.
Problems of Data Warehousing
- Underestimation of resources for data loading - Hidden problems with source systems - Required data not captured - Increased end-user demands - Data homogenization - High demand for resources - Data ownership - High maintenance - Long duration projects - Complexity of integration
Fan trap
- Where a model represents a relationship between entity types, but the pathway between certain entity occurrences is ambiguous. - Resolved by repositioning relationships
Chasm trap
- Where a model suggests the existence of a relationship between entity types, but the pathway does not exist between certain entity occurrences. - Resolved by identifying a missing relationship.
A DBMS should provide the following facilities to assist with recovery:
- a backup mechanism: makes periodic backup copies of the database - logging facilities: keep track of the current state of transactions and database changes - checkpoint facility: enables updates to the database that are in progress to be made permanent - recovery manager: allows the system to restore the database to a consistent state following a failure.
View Mechanisms provide
- a level of security - a mechanism to customize the appearance of the database - can present a consistent, unchanging picture of the structure of the database
Identifying entity types
- a method of identifying entities: examine users' requirements specification. From this specification identify nouns or noun phrases that are mentioned (for example, staff number, staff name, property number, property address, rent, number of rooms). and look for major objects, such as people, places, or concepts of interest - watch for synonyms/homonyms - assign meaningful and obvious names - documents the types
The relational algebra
- a theoretical language with operations that work on one or more relations to define another relation without changing the original relation(s) - relation-at-a-time (or set) language in which all tuples, possibly from several relations, are manipulated in one statement without looping.
cryptosystem components
- an encryption key to encrypt the data (plaintext) - an encryption algorithm that with the encryption key transforms the plaintext into ciphertext - a decryption key to decrypt the ciphertext - a decryption algorithm that with the decryption key transforms the ciphertext back into plaintext
exception
- an identifier in PL/SQL raised during the execution of a block that terminates its main body of actions - To handle raised exceptions, separate routines called exception handlers are specified
CREATE INDEX
- an index is a structure that provides accelerated access to the rows of a table based on the values of one or more columns. The presence of an index can significantly improve the performance of a query. - creation of indexes is not standard SQL
transaction analysis form includes
- any predicates that will be used - any attributes that will be required to join relations together (for a query transaction) - attributes used to order results (for a query transaction) - attributes used to group data together (for a query transaction) - any built-in functions that may be used (such as AVG, SUM) - any attributes that will be updated by the transaction.
There are three specializations of link analysis:
- associations discovery (finds items that imply the presence of other items) - sequential pattern discovery (finds patterns where one set of items is followed by another over time) - similar time sequence discovery (links between data sets that are time dependent)
Documenting attributes
- attribute name and description - data type and length - any aliases that the attribute is known by - whether the attribute is composite and, if so, the simple attributes that make up the composite attribute - whether the attribute is multi-valued - whether the attribute is derived and, if so, how it is to be computed - any default value for the attribute
relational data model
- based on the concept of mathematical relations - data and relationships are represented as tables, each of which has a number of columns with a unique name
Optimistic methods
- based on the premise that conflict is rare - sacrifices locking to gain performance, risking costly rollback - involves read (new value to local variable), validation, write
Domain constraints
- can be specified using the CHECK clause or by defining domains using the CREATE DOMAIN statement.
the four allocation strategies
- centralized (a single centralized database) - fragmented (fragments assigned to one site) - complete replication (complete copy of the database maintained at each site) - selective replication (combination of the first three)
Disadvantages of a DBMS
- complexity - size - cost (software/hardware/conversion) - performance - greater impact of failure
Logical Database Designer
- concerned with identifying the data (that is, the entities and attributes) - the relationships between the data - and the constraints on the data that is to be stored in the database - must have a thorough and complete understanding of the organization's data and any constraints on this data (business rules)
Boolean data
- consists of the distinct truth values TRUE and FALSE - unless prohibited by a NOT NULL constraint it also supports the UNKNOWN truth value as the NULL value - CHAR fixed, VARCHAR variable
Deadlock detection is usually handled by
- construction of a wait-for graph (WFG) that shows the transaction dependencies - generated at regular intervals
neural network
- contains collections of connected nodes with input, output, and processing at each node - each processing unit connected by a weighted value - may contain hidden processing layers
disadvantages of a DDBMS
- cost - complexity - lack of standards - experience
relational model
- data is logically structured within relations (tables) - each relation has a name and is made up of named attributes (columns) of data - each tuple (row) contains one value per attribute
network model
- data is represented as collections of records - relationships are represented by sets - compared with the relational model, relationships are explicitly modeled by the sets, which become pointers in the implementation.
important features of data mining tools
- data preparation (most time consuming) - selection of data mining operations (algorithms) - product scalability and performance - facilities for understanding results.
metadata
- data that describes the objects in the database - makes it easier for those objects to be accessed or manipulated - contains definitions of records, data items, etc.
Semistructured data
- data that has some structure, but the structure may not be rigid
Record-Based Data Model
- database consists of a number of fixed-format records - relational data model (simplicity, data independence...high cost, complex) - network data model (easy access to data, many to many.....changes occur to entire DB) - hierarchical data model (addresses flat file storage issue...redundant, bad at many to many)
Physical Database Designer
- decides how the logical database design is to be physically realized - mapping the logical database design into a set of tables and integrity constraints - selecting specific storage structures and access methods for the data to achieve good performance - designing any security measures required on the data.
Recovery techniques using deferred update vs immediate update
- deferred update recovery protocol: updates are not written to the database until after a transaction has reached its commit point - immediate update recovery protocol: updates are applied to the database as they occur without waiting to reach the commit point
HTTP
- defines how clients and servers communicate Request-response paradigm: - connection - request - response - close
Validate conceptual data model against user transactions
- describing the transactions - using transaction pathways
EXISTS and NOT EXISTS
- designed for use only with subqueries - produce a simple true/false result. EX: 'Find all staff such that there exists a Branch row containing his/her branch number, branchNo, and the branch city equal to London'. SELECT staffNo, fName, lName, position FROM Staff s WHERE EXISTS (SELECT * FROM Branch b WHERE s.branchNo = b.branchNo AND city = 'London');
asymmetric encryption
- different keys, public and private - digital signature can be used - RSA
The DDBMS should appear like a centralized DBMS by providing a series of transparencies:
- distribution transparency - performance transparency - transaction transparency - DBMS transparency
dynamic query optimization
- dynamically carry out decomposition and optimization every time the query is run - advantage: all information required to select an optimum strategy is up to date - disadvantages: are that the performance of the query is affected because the query has to be parsed, validated, and optimized before it can be execute
advantages of SQL include:
- easy to learn - nonprocedural - free-format - DBMS-independent - recognized international standard
PRIMARY KEY
- entity integrity - used in CREATE or ALTER TABLE statements
The Conceptual Data Model comprises:
- entity types - relationship types - attributes and attribute domains - primary keys and alternate keys - integrity constraints
A conceptual data model comprises:
- entity types - relationship types - attributes and attribute domains - primary keys and alternate keys - integrity constraints.
The five commonly used fact-finding techniques:
- examining documentation - interviewing - observing the enterprise in operation - research - questionnaires
two types of subprograms
- functions (always returns a single value to caller) - (stored) procedures (used unless single value needed)
Reasons for Creating a Data Mart
- give users access to the data they need to analyze most often - provide data in a form that matches the collective view of the data by a group of users in a department or business application area - improve end-user response time due to the reduction in the volume of data to be accessed. - provide appropriately structured data as dictated by the requirements of the end-user access tools. - simpler build compared with establishing an enterprise-wide DW (EDW). - cost of implementing data marts is normally less than that required to establish a EDW. - future users of a data mart are more easily defined and targeted to obtain support for a data mart than an enterprise-wide data warehouse project.
external schema
- highest level - subschemas - corresponds to different views of the data
Large object data type
- holds a large amount of data, such as a long text file or a graphics file. - three different types: - Binary Large Object (BLOB) - Character Large Object (CLOB) - National Character Large Object (NCLOB)
isolation level
- indicates the degree of interaction that is allowed from other transactions during the execution of the transaction. Dirty read: A transaction reads data that has been written by another as yet uncommitted transaction. Nonrepeatable read: A transaction rereads data that it has previously read, but another committed transaction has modified or deleted the data in the intervening period. Phantom read: A transaction executes a query that retrieves a set of rows satisfying a certain search condition. When the transaction re-executes the query at a later time, additional rows are returned that have been inserted by another committed transaction in the intervening period. Only the SERIALIZABLE isolation level is safe, that is, generates serializable schedules.
The types of features that should be described in the systems specification include:
- initial database size - database rate of growth - the types and average number of record searches - networking and shared access requirements - performance -security -backup and recovery -legal issues
Data model
- integrated collection of concepts for describing and manipulating data, relationships between data, and constraints on the data in an organization. - structural part: set of rules according to which databases can be constructed - manipulative part: defining the types of operation that are allowed on the data - set of integrity constraints: ensures that the data is accurate.
Forms generators
- interactive facility for rapidly creating data input and display layouts for screen forms - allows the user to define what the screen is to look like
attributes in the data dictionary consist of
- its domain, consisting of a data type, length, and any constraints on the domain - an optional default value for the attribute - whether the attribute can hold nulls - whether the attribute is derived and, if so, how it should be computed
main ways to implement a replication protocol:
- kernel-based replication - middleware-based replication
unique mobile environment query processing requirments
- location-aware queries - location-dependent queries - moving object database queries - spatio-temporal queries (query results depend on the query's spatial properties - space and time) - continuous queries To enable location-dependent queries, a database must support two- and three-dimensional geometric shapes and geometric operations
transaction
- logical unit of work consisting of one or more SQL statements that is guaranteed to be atomic with respect to recovery - COMMIT statement signals successful completion of a transaction and all changes to the database are made permanent. -ROLLBACK statement signals that the transaction should be aborted and all changes to the database are undone.
Identifying and associating attributes with entity or relationship types
- look for nouns or noun phrases in the users' requirements specification. The attributes can be identified where the noun or noun phrase is a property, quality, identifier, or characteristic of one of these entities or relationships - "What information are we required to hold on x or y?" - note if simple or composite, single or multi-valued, derived - each should be associated with one relationship or entity - document attributes
potential problems caused by concurrency:
- lost update problem: apparently successful operation overidden by another user - uncommitted dependency problem: one transaction is allowed to see the intermediate results of another transaction before it has committed - inconsistent analysis problem: a transaction reads several values from the database but a second transaction updates some of them during the execution of the first
internal schema
- lowest level of abstraction - complete description of the internal model - definitions of stored records - methods of representation - data fields
shadow paging
- maintains two-page tables during the life of a transaction: a current page table and a shadow page table - when the transaction starts, the two-page tables are the same (shadow page table is never changed thereafter) - when the transaction completes, the current page table becomes the shadow page table
mobile DBMS issues
- managing limited resources - security - transaction handling - query processing - disconnections (major issue)
One-to-one (1:1) binary relationship types
- mandatory participation on both sides of 1:1 relationship - mandatory participation on one side of 1:1 relationship - optional participation on both sides of 1:1 relationship.
Subprograms
- named PL/SQL blocks that can take parameters and be invoked - two types of subprogram called functions (always returns a single value to caller) and (stored) procedures (used unless single value needed)
Ways to address OLAP server implementation issues
- preaggregation (cube materialization) - dimensional hierarchy - sparse data management
operations associated with data mining techniques
- predictive modeling - database segmentation - link analysis - deviation detection
Conceptual modeling
- process of constructing a model of the information use in an enterprise that is independent of implementation details A conceptual data model comprises: - entity types - relationship types - attributes and attribute domains - primary keys and alternate keys - integrity constraints
subquery
- produces a temporary table with results that can be accessed and used by the outer statement - scalar subquery: returns a single column and a single row - row subquery returns multiple columns, but only a single row. - table subquery returns one or more columns and multiple rows.
Secondary indexes
- provide a mechanism for specifying an additional key for a base relation that can be used to retrieve data more efficiently - obviously, extra overhead required
XQuery
- query language for XML - functional language in which a query is represented as an expression EX: Find the staff number of the first member of staff in the XML document doc("staff_list.xml")/STAFFLIST/STAFF[1]//STAFFNO
Checking model for redundancy
- re-examine one-to-one (1:1) relationships (may have identified two entities that represent the same object) - remove redundant relationships (relationship is redundant if the same information can be obtained via other relationships) - consider time dimension
Properties of Relation
- relation has a name that is distinct from all other relation names in the relational schema - each cell of the relation contains exactly one atomic (single) value - each attribute has a distinct name - the values of an attribute are all from the same domain - each tuple is distinct - the order of attributes has no significance - the order of tuples has no significance, theoretically
Identifying relationship types
- relationships could be verbs in the users' requirements specification - only interested in required relationships between entities in this step - using an ER diagram can help - determine multiplicity constraints - check for fan/chasm traps - document relationships
Deviation Detection
- relatively new technique - identifies outliers, which express deviation from some previously known expectation and norm - can be performed using statistics and visualization techniques or as a by-product of data mining
primary copy ownership
- replicated data is owned by one site, the primary (or master) copy, and can be updated only by that site - "publish and subscribe" - Potential uses: Decision support system analysis, distribution and dissemination of centralized information, consolidation of remote information, mobile computing
entity type
- represents a group of "objects" in the "real world" with the same properties - Each is represented with a rectangle - has an independent existence and can be objects with a physical (or "real") existence or objects with a conceptual (or "abstract") existence
Types of integrity constraints
- required data (when an entity must contain valid data) - attribute domain constraints (legal values) - multiplicity (a branch has many staff and a member of staff works at a single branch) -entity integrity (primary key can not be null) - referential integrity (if the foreign key contains a value, that value must refer to an existing tuple in the parent relation (candidate key)) - general constraints
hierarchical model
- restricted type of network model - data is represented as collections of records and relationships are represented by sets - allows a node to have only one parent
system catalog
- result of the compilation of the DDL statements - a set of tables stored in special files collectively - also called data dictionary and data directory
analytical operations that can be performed on data cubes
- roll-up - drill-down - "slice and dice" - pivot
Two types of triggers
- row-level triggers (FOR EACH ROW) that execute for each row of the table that is affected by the triggering event - statement-level triggers (FOR EACH STATEMENT) that execute only once even if multiple rows are affected by the triggering event.
Approaches to Integrating the Web and DBMSs
- scripting languages such as JavaScript and VBScript - Common Gateway Interface (CGI), an early, and possibly one of the most widely used, techniques - HTTP cookies;extensions to the Web server, such as the Netscape API (NSAPI) and Microsoft's Internet Information Server API (ISAPI) - Java, JEE, JDBC, SQLJ, JDO, JPA, Servlets, and JavaServer Pages (JSP) - Microsoft's Web Solution Platform: .NET, Active Server Pages (ASP), and ActiveX Data Objects (ADO) - Oracle's Internet Platform
XML advantages
- simplicity - open standard - extensibility (allow users to define own tags) - reuse (libraries of XML tags, etc) - separation of content and presentation - improved load balancing - support for the integration of data from multiple sources -ability to describe data from a wide variety of applications - more advanced search engines
Field
- smallest unit of data in a database - corresponds to a simple attribute from the ER diagram
ALTER TABLE
- statement for changing the structure of a table once it has been created - add a new column to a table - drop a column from a table - add a new table constraint - drop a table constraint - set a default for a column - drop a default for a column.
Relationship structure types
- strong entity types - weak entity types - one-to-many (1:*) binary relationship types - one-to-one (1:1) binary relationship types - one-to-one (1:1) recursive relationship types - superclass/subclass relationship types - many-to-many (*:*) binary relationship types - complex relationship types - multi-valued attributes.
alias
- the alias is separated from the table name with a space. - can be used to qualify a column name whenever there is ambiguity regarding the source of the column name - used as a shorthand notation for the table name. EX: Column: SELECT column_name AS alias_name FROM table_name; Table: SELECT column_name(s) FROM table_name AS alias_name; Table example with: SELECT o.OrderID, o.OrderDate, c.CustomerName FROM Customers AS c, Orders AS o WHERE c.CustomerName="Around the Horn" AND c.CustomerID=o.CustomerID; Without: SELECT Orders.OrderID, Orders.OrderDate, Customers.CustomerName FROM Customers, Orders WHERE Customers.CustomerName="Around the Horn" AND Customers.CustomerID=Orders.CustomerID;
XML schema
- the definition (both in terms of its organization and its data types) of a specific XML structure
Logical data independence
- the immunity of the external schemas to changes in the conceptual schema - the ability to modify the logical schema without causing application program to be rewritten
relations identified in the logical data model consist of
- the name of the relation - a list of simple attributes in brackets - the primary key and, where appropriate, alternate keys (AK) and foreign keys (FK) - referential integrity constraints for any foreign keys identified
Physical database design
- the process of producing a description of the implementation of the database on secondary storage - describes base relations, storage structures and access methods - integrity constraints and security measures
static query optimization
- the query is parsed, validated, and optimized once advantage: the runtime overhead is removed and there may be more time available to evaluate a larger number of execution strategies disadvantage: the execution strategy that is chosen as being optimal when the query is compiled may no longer be optimal when the query is run
There are three general techniques for handling deadlock:
- timeouts (a transaction that requests a lock will wait for only a system-defined period of time. If the lock has not been granted within this period, the lock request times out) - deadlock prevention - deadlock detection and recovery * only way to recover - terminate a process
Requirements Collection and Analysis
- to create a users' requirements specification that describes in detail the data to be held in the database and how the data is to be used - the purpose of gathering this information is to create a systems specification, which describes any features to be included in the new database system (including views)
Transaction records contain:
- transaction identifier - type of log record (transaction start, insert, update, delete, abort, commit) - identifier of data item affected by the database action (insert, delete, and update operations) - before-image of the data item, that is, its value before change (update and delete operations only) - after-image of the data item, that is, its value after change (insert and update operations only) - log management information, such as a pointer to previous and next log records for that transaction (all operations)
Examples of transaction performance criteria
- transactions that run frequently and will have a significant impact on performance - transactions that are critical to the operation of the business - the times during the day/week when there will be a high demand made on the database (called the peak load) We use this information to identify the parts of the database that may cause performance problems
Approximate numeric data
- used for defining numbers that do not have an exact representation, such as real numbers. - approximate numeric, or floating point, notation is similar to scientific notation, in which a number is written as a mantissa times some power of ten (the exponent). EX: 10E3, 5.2E6, −0.2E-4. FLOAT [precision] REAL DOUBLE PRECISION
Multiplicity constraints
- used to check and maintain data quality - assertions about entity occurrences that can be applied when the database is updated to determine whether the updates violate the stated rules of the enterprise (like upper/lower limits)
Bit data
- used to define bit strings, that is, a sequence of binary digits (bits), each having either the value 0 or 1
Exact Numeric Data
- used to define numbers with an exact representation. - consists of digits, an optional decimal point, and an optional sign. - consists of a precision (total number of significant decimal digits) and a scale (total number of decimal places). EX: -12.345 has precision 5 and scale 3 NUMERIC [ precision [, scale] ] DECIMAL [ precision[, scale] ] INTEGER SMALLINT BIGINT
Datetime data
- used to define points in time to a certain degree of accuracy - DATE (uses YEAR, MONTH, and DAY) - TIME [timePrecision] [WITH TIME ZONE] (uses HOUR, MINUTE, and SECOND) - TIMESTAMP [timePrecision] [WITH TIME ZONE] (used to store date and times) - timePrecision is the number of decimal places of accuracy to which the SECOND field is kept
SQL Identifiers
- used to identify objects in the database, such as table names, view names, and columns - an identifier can be no longer than 128 characters (most dialects have a much lower limit than this) - an identifier must start with a letter - an identifier cannot contain spaces
Interval data
- used to represent periods of time - two classes of interval data type: year-month intervals and day-time intervals. INTERVAL {{startField TO endField} singleDatetimeField}startField = YEAR | MONTH | DAY | HOUR | MINUTE [(intervalLeadingFieldPrecision)]endField = YEAR | MONTH | DAY | HOUR | MINUTE | SECOND [(fractionalSecondsPrecision)]singleDatetimeField = startField | SECOND [(intervalLeadingFieldPrecision [, fractionalSecondsPrecision])]
Application Developer
- work from a specification produced by systems analysts - Each program contains statements that request the DBMS to perform some operation on the database, which includes retrieving data, inserting, updating, and deleting data.
The purpose of the system definition stage is...
...to define the scope and boundary of the database system and its major user views
The 8 steps
1. Build Conceptual Data Model 2. Build Logical Data Model 3. Translate Logical Data Model for Target DBMS 4. Design File Organizations and Indexes 5. Design User Views 6. Design Security Mechanisms 7. Consider the Introduction of Controller Redundancy 8. Monitor and Tune the Operational System
Guidelines for determining secondary indexes
1. Do not index small relations 2. In general, index the primary key of a relation if it is not a key of the file organization 3. Add a secondary index to a foreign key, if it is frequently accessed 4. Add a secondary index to any attribute that is heavily used as a secondary key 5. Add a secondary index on attributes that are frequently involved in: - selection or join criteria - ORDER BY - GROUP BY - other operations involving sorting (such as UNION or DISTINCT) 6. Add a secondary index on attributes involved in built-in aggregate functions, along with any attributes used for the built-in functions 7. As a more general case of the previous guideline, add a secondary index on attributes that could result in an index-only plan. 8. Avoid indexing an attribute or relation that is frequently updated. 9. Avoid indexing an attribute if the query will retrieve a significant proportion (for example 25%) of the tuples in the relation. In this case, it may be more efficient to search the entire relation than to search using an index. 10. Avoid indexing attributes that consist of long character strings.
relationally complete
A language that can be used to produce any relation that can be derived using the relational calculus is said to be relationally complete
MDBS
A multidatabase system (MDBS) is a distributed DBMS in which each site maintains complete autonomy
Base relation
A named relation corresponding to an entity in the conceptual schema, whose tuples are physically stored in the database
Relation schema
A named relation defined by a set of attribute and domain name pairs.
domain
A pool of values from which one or more attributes draw their values (employee number being 2 letters, 3 numbers....sex (M/F))
Multidimensional Schemas
A popular data model for multidimensional data is the star schema, which is characterized by having facts (measures) in the center surrounded by dimensions, forming a star-like shape. Variations of the star schema include the snowflake and starflake schema.
Deriving relations for logical data model results in
A relational schema
View serializability
A schedule is view serializable if it is view equivalent to a serial schedule
Schema
A schema is a named collection of database objects that are in some way related to one another Relations and other database objects exist in an environment. Among other things, each environment consists of one or more catalogs, and each catalog consists of a set of schemas.
Composition
A specific form of aggregation that represents an association between entities, where there is a strong ownership and coincidental lifetime between the "whole" and the "part."
Design methodology
A structured approach that uses procedures, techniques, tools, and documentation aids to support and facilitate the process of design.
shared subclass
A subclass with more than one superclass Results in multiple inheritance
Data Warehousing
A subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management's decision-making process
2PL
A transaction follows the two-phase locking protocol if all locking operations precede the first unlock operation in the transaction. According to the rules of this protocol, every transaction can be divided into two phases: - growing phase: in which it acquires all the locks needed but cannot release any locks - shrinking phase: in which it releases its locks but cannot acquire any new locks.
users' requirements specification
All of the information gathered on each user view of the database system is described in this document
GROUP BY
Allows summary information to be included in the result table. Rows that have the same value for one or more columns can be grouped together and treated as a unit for using the aggregate functions.
well-formed XML document
An XML document that conforms to the structural and notational rules of XML is considered well-formed.
valid XML document
An XML document that is well-formed and also conforms to a DTD (Document Type Definition) is considered valid
Transaction
An action, or series of actions, carried out by a single user or application program, that accesses or changes the content of the database. Three main types: Retrieval Update Mixed (involves both retrieval and update)
Superkey
An attribute, or set of attributes, that uniquely identifies a tuple within a relation - may contain attributes not necessary for unique identification
Superclass
An entity type that includes one or more distinct subgroupings of its occurrences, which must be represented in a data model.
Deadlock
An impasse that may result when two (or more) transactions are each waiting for locks to be released that are held by the other.
CUBE
CUBE takes a specified set of grouping columns and creates subtotals for all of the possible combinations If n columns are specified for a CUBE, there will be 2n combinations of subtotals returned EX: SELECT propertyType, yearMonth, city, SUM(saleAmount) AS sales FROM Branch, PropertyForSale, PropertySale WHERE Branch.branchNo = PropertySale.branchNo AND PropertyForSale.propertyNo = PropertySale.propertyNo AND PropertySale.yearMonth IN ('2013-08', '2013-09') AND Branch.city IN ('Aberdeen', 'Edinburgh', 'Glasgow') GROUP BY CUBE(propertyType, yearMonth, city);
Multi-user view handling approaches
Centralized: Requirements for each user merged into single set of requirements (one-shot approach). View integration: View requirements remain on separate lists. Data models for single user view called "local data model" - which are merged into "global data model" in later stage.
Classification
Classification is used to establish a specific predetermined class for each record in a database from a finite set of possible class values. There are two specializations of classification: tree induction and neural induction. Example of tree induction:
Requirements Collection and Analysis
Collection and analysis of the requirements for the new database system. Details for each major user view: - description of data - details of how data used - additional requirements Requirements Specifications documents produced. Multi-user view handling (centralized or view integration)
CASE tools
Computer-Aided Software Engineering CASE support may include: - a data dictionary to store information about the database system's data - design tools to support data analysis - tools to permit development of the corporate data model, and the conceptual and logical data models - tools to enable the prototyping of applications
This phase of database design is primarily concerned with developing the ER diagram
Conceptual (ER = Entity Relationship)
Phases of Database Design
Conceptual, Logical, Physical
Database Design
Conceptual, logical and physical design of the database. Two approaches - "bottom up" (begins with attributes - entities and their relationships) and "top down" (more appropriate for complex databases - uses "Entity-Relationship" model) Database and Application design are parallel processes
Control statements
Conditional: IF THEN ELSE CASE Iterative: LOOP WHILE REPEAT FOR
Integrity control
Consists of constraints that we wish to impose in order to protect the database from becoming inconsistent. Defined in the CREATE and ALTER TABLE statements. Types: - required data (NOT NULL) - domain constraints - entity integrity (PRIMARY KEY) - referential integrity (FOREIGN KEY) - general constraints (CHECK, UNIQUE, CREATE ASSERTION)
UPDATE
Contents of existing rows in a named table to be changed
The logical design is said to have this with the conceptual design....
Correspondence with E-R Model Relations (tables) correspond with entity types and with many-to-many relationship types Rows correspond with entity instances and with many-to-many relationship instances Columns correspond with attributes NOTE: The word relation (in relational database) is NOT the same as the word relationship (in E-R model)
CRISP-DM
Cross Industry Standard Process for Data Mining - CRISP-DM remains the top methodology for data mining projects - provides a structured approach to planning a data mining project 1. Business understanding 2. Data understanding 3. Data preparation 4. Modeling 5. Evaluation 6. Deployment
DDL
Data Definition Language Language that allows the DBA or user to describe and name the entities, attributes, and relationships required for the application, together with any associated integrity and security constraints.
DML
Data Manipulation Language - It allows users to insert, update, delete, and retrieve data from the database
DML
Data Manipulation Language - A language that provides a set of operations to support the basic data manipulation operations on the data held in the database. - part of a DML that involves data retrieval is called a query language - The terms "query language" and "DML" are commonly used interchangeably
DBA
Database Administrator - responsible for the physical realization of the database - physical database design and implementation - security and integrity control - maintenance of the operational system - ensuring satisfactory performance - more technically oriented than the role of the DA
DSDLC
Database System Development Lifecycle The effort spent on maintenance (correcting faults that had been detected, implementing new user requirements, and modifying the software to run on new or upgraded platforms) began to absorb resources at an alarming rate. As a result, many major software projects were late, over budget, unreliable, and difficult to maintain, and performed poorly. These issues led to what has become known as the software crisis (software depression)
Which parts of a database need to be backed up?
Database and log file
System Catalog
Description of the data *data dictionary or metadata
Application Design
Designing the user interface and the application programs that use and process the database Database and Application design are parallel processes Involves transaction design and interface design
DOLAP
Desktop OLAP DOLAP tools store the OLAP data in client-based files and support multidimensional processing using a client multidimensional engine issues include: - developing security controls
Participation
Determines whether all or only some entity occurrences participate in a relationship
Participation constraint
Determines whether every member in the superclass must participate as a member of a subclass. Mandatory or Optional
Comparison of DM and ER models
Dimensional vs Entity Relationship A single ER model normally decomposes into multiple DMs. Multiple DMs are then associated through 'shared' dimension tables. E-R modeling will have logical and physical model. Dimensional model will have only physical model. E-R modeling is used for normalizing the OLTP database design. Dimensional modeling is used for de-normalizing the ROLAP/MOLAP design. E-R modelling revovles around the Entities and their relationships to capture the overall process of the system. Dimensional model/Muti-Dimensinal Modelling revolves around Dimensions(point of analysis) for decison making and not to capture the process.
authorization mechanisms
Discretionary access control: Each user is given appropriate access rights (or privileges) on specific database objects Mandatory access control: Each database object is assigned a certain classification level (for example, Top Secret, Secret, Confidential, Unclassified) and each subject (for example, users or programs) is given a designated clearance level
Guidelines for representation of superclass/subclass relationship
Disjoint represented by an 'OR' Non-disjoint (overlapping) represented by 'AND' Disjoint constraint only used for a hierarchy with more than one subclass So 4 possibilities for constraints shown on EERM: {Mandatory, OR} Must belong to exactly one subclass {Mandatory, AND} Must belong to one or more subclasses {Optional, OR} May belong to one subclass or none {Optional, AND} May belong to any number of subclasses
DDBMS
Distributed Database Management System: software that transparently manages the distributed database
DOM
Document Object Model: a tree-based API for XML
DTD
Document Type Definition - defines the valid syntax of an XML document
General constraint example:
DreamHome has a rule that prevents a member of staff from managing more than 100 properties at the same time:
ER vs EER
ER (entity relationship) includes: - entities - relationships - attributes EER (enhanced entity relationship) also includes: - specialization/generalization (superclass/subclass) - aggregation ("has-a" or "is-part-of" relationship) - composition (association between entities, "whole" and the "part.")
logical data model includes
ER diagram(s), relational schema, and supporting documentation such as the data dictionary, which is produced throughout the development of the model.
File based system
Each program defines and manages its own data -Data is isolated -Prone to data duplication -References and querries hard-coded resulting in Data Dependence -Data in proprietary format
ER modeling
Entity Relationship - top-down approach to database design - begins by identifying the important data called entities and relationships between the data that must be represented in the model - then more detail added such as the information we want to hold about the entities and relationships (attributes and any constraints on the entities, relationships, and attributes)
Use of JOIN
FROM Client JOIN Viewing USING clientNo
Facts
Facts are generated by events that occurred in the past, and are unlikely to change, regardless of how they are analyzed
An HTTP request consists of:
GET POST (posts to resource) HEAD (requests header) PUT DELETE OPTIONS
The process of minimizing the differences between entities by identifying their common characteristics.
Generalization - top-down
Data Warehouse
Holds multiple subject areas Holds very detailed information Works to integrate all data sources Does not necessarily use a dimensional model but feeds dimensional models.
PHP
Hypertext Preprocessor - popular open source HTML-embedded scripting language
SQL modify statements
INSERT - adds new rows of data to a table UPDATE - modifies existing data in a table DELETE - removes rows of data from a table
functional dependencies
Indicate important relationships between the attributes of a relation. It is those functional dependencies and the primary key for each relation that are used in the process of normalization.
ISLC
Information Systems Lifecycle
Relation anomalies
Insertion (adding new rows forces user to create duplicate date) Deletion (deleting rows may cause a loss of data that would be needed for other future rows) Modification (changing data in a row forces changes to other rows because of duplication)
interpreted programming language
Interpreted (not compiled) by client
The most commonly used and normally the most useful fact-finding technique
Interviewing
The 80/20 rule
It has been suggested that the most active 20% of user queries account for 80% of the total data access (Wiederhold, 1983).
%TYPE
It is also possible to declare a variable to be of the same type as a column in a specified table or another variable using the %TYPE attribute. For example, to declare that the vStaffNo variable is the same type as the staffNo column of the Staff table, we could write: vStaffNo Staff.staffNo%TYPE; vStaffNo1 vStaffNo%TYPE; Also %ROWTYPE
write-set
It is good practice to collect all updates at the local site first and propagate them in one message to the remote sites later. Such an approach requires the extraction of the write-set
Requirements for a table to qualify as a relation:
It must have a unique name. Every attribute value must be atomic (not multivalued, not composite). Every row must be unique (can't have two rows with exactly the same values for all their fields). Attributes (columns) in tables must have unique names. The order of the columns must be irrelevant. The order of the rows must be irrelevant.
JDO
Java Data Objects - provide a standard interface between application objects and data sources, such as relational databases, XML databases, legacy databases, and file systems. - provide developers with a transparent Java-centric mechanism for working with persistent data to simplify application development.
JDBC
Java Database Connectivity - the most prominent and mature approach for accessing relational DBMSs from - modeled after the Open Database Connectivity (ODBC) specification - JDBC package defines a data-base access API that supports basic SQL functionality and enables access to a wide range of relational DBMS products - with JDBC, Java can be used as the host language for writing database applications
JPA
Java Persistence API - defines an interface to persist normal Java objects, sometimes called Plain Old Java Objects (or POJOs), to a datastore.
JSP
JavaServer Pages is a Java-based server-side scripting language that allows static HTML to be mixed with dynamically generated HTML.
Lore
Lightweight Object REpository - a multiuser DBMS, supporting crash recovery, materialized views, bulk loading of files in some standard format (XML is supported), and a declarative update language - Lorel is its language
This phase of database design is primarily concerned transforming the ER diagram into relations
Logical
VBScript
Microsoft proprietary interpreted scripting language
MIME
Multipurpose Internet Mail Extensions - a standard for encoding binary data into ASCII, as well as a standard for indicating the type of data contained inside a message. Subtypes include HTML, JPEG, MPEG, PDF
OLAP environment multidimensional data is represented as:
N-dimensional data cubes. An alternative representation for a data cube is as a lattice of cuboids
A schedule where the operations from a set of concurrent transactions are interleaved
Nonserial schedule
optional participation on both sides of 1:1 relationship
Parent/child assignment arbitrary
Database segmentation
Partitions a database into an unknown number of segments, or clusters, of similar records. This approach uses unsupervised learning to discover homogeneous subpopulations in a database to improve the accuracy of the profiles.
Warehouse Manager
Performs all the operations associated with the management of the data in the warehouse such as: - Analysis of data to ensure consistency. - Transformation and merging of source data from temporary storage into data warehouse tables. - Creation of indexes and views on base tables. - Generation of denormalizations, (if necessary). - Generation of aggregations, (if necessary). - Backing-up and archiving data.
Query Manager
Performs the operations associated with the management of user queries
three types of classes in JDO
Persistence-capable: classes whose instances can be persisted to a data store. Persistence-aware: classes that manipulate persistence-capable classes. Normal: classes that are not persistable and have no knowledge of persistence.
Database Planning
Planning how the stages of the lifecycle can be realized most efficiently and effectively. Identification of mission statement, mission objectives and standards definitions.
Perl
Practical Extraction and Report Language - high-level interpreted programming language with extensive, easy-to-use text processing capabilities.
dimensional model advantages
Predictable and standard form of the underlying dimensional model offers important advantages: - Efficiency - Ability to handle changing requirements - Extensibility - Ability to model common business situations - Predictable query processing
WITH CHECK OPTION
Prevents "migrating rows" resulting from inserts or updates WITH CASCADED CHECK OPTION will prevent migration resulting from changes that cascade to other views
Statement to create roles
Privileges can be explicitly granted to users. For example, a user can explicitly grant the privilege to insert rows into the PropertyForRent table to the user Beech:GRANT INSERT ON PropertyForRent TO Beech; Privileges can also be granted to a role (a named group of privileges), and then the role granted to one or more users
PL/SQL block parts
Procedural Language SQL an optional declaration part, in which variables, constants, cursors, and exceptions are defined and possibly initialized a mandatory executable part, in which the variables are manipulated an optional exception part, to handle any exceptions raised during execution
Two types of DML:
Procedural: specifies how the output of a DML statement is to be obtained. 3GLs Nonprocedural: describe only what output is to be obtained. 4GLs
Division
R ÷ S The Division operation defines a relation over the attributes C that consists of the set of tuples from R that match the combination of every tuple in S.
Intersection
R ∩ S The Intersection operation defines a relation consisting of the set of all tuples that are in both R and S. R and S must be union-compatible.
Union
R ∪ S The union of two relations R and S defines a relation that contains all the tuples of R, or S, or both R and S, duplicate tuples being eliminated. R and S must be union-compatible. (List all cities where there is either a branch office or a property for rent)
Natural join
R ⋈ S The Natural join is an Equijoin of the two relations R and S over all common attributes x. One occurrence of each common attribute is eliminated from the result.
EquiJoin
R ⋈FS In the case where the predicate F contains only equality (=), the term Equijoin is used instead
Theta join (θ-join)
R ⋈FS The Theta join operation defines a relation that contains tuples satisfying the predicate F from the Cartesian product of R and S. The predicate F is of the form R.ai θ S.bi, where θ may be one of the comparison operators (<, ≤, >, ≥, =, ≠). Further: A theta join allows for arbitrary comparison relationships (such as ≥). An equijoin is a theta join using the equality operator. A natural join is an equijoin on attributes that have the same name in each relationship.
(Left) Outer join
R ⋊ S The (left) Outer join is a join in which tuples from R that do not have matching values in the common attributes of S are also included in the result relation. Missing values in the second relation are set to null.
Semijoin
R ▹F S The Semijoin operation defines a relation that contains the tuples of R that participate in the join of R with S satisfying the predicate F.
A relationship type in which the same entity type participates more than once in different roles.
Recursive relationship:
RDBMS
Relational Database Management System - has become the dominant data-processing software in use today,
ROLAP
Relational online analytical processing - fastest-growing type of OLAP server (response to MOLAP limitations) - supports relational DBMS products through the use of a metadata layer issues include: - performance problems due to processing complex queries - middleware development challenges
Aggregation
Represents a "has-a" or "is-part-of" relationship between entity types, where one represents the "whole" and the other the "part."
local logical data model
Represents the data requirements of one or more but not all user views of a database
The SQL programming language is known as
SQL/PSM (persistent stored modules)
A sequence of the operations by a set of concurrent transactions that preserves the order of the operations in each of the individual transactions.
Schedule
The two unary operations
Selection and Projection Projection means choosing which columns (or expressions) the query shall return. Selection means which rows are to be returned. - extract information from only one relation
A schedule where the operations of each transaction are executed consecutively without any interleaved operations from other transactions
Serial schedule
Predictive modeling
Similar to the human learning experience in using observations to form a model of the important characteristics of some phenomenon Model is developed using a supervised learning approach, which has two phases: training (using historical data for a training set) and testing (new data)
Snowflake schema
Snowflake schema is a variant of the star schema that has a fact table in the center, surrounded by normalized dimension tables
SDLC
Software Development Lifecycle
Conceptual database design steps
Step 1 Build conceptual data model Step 1.1 Identify entity types Step 1.2 Identify relationship types Step 1.3 Identify and associate attributes with entity or relationship types Step 1.4 Determine attribute domains Step 1.5 Determine candidate, primary, and alternate key attributes Step 1.6 Consider use of enhanced modeling concepts (optional step) Step 1.7 Check model for redundancy Step 1.8 Validate conceptual data model against user transactions Step 1.9 Review conceptual data model with user
Steps in building the Conceptual Data Model
Step 1.1: Identify entity types (define main objects user is interested in) Step 1.2: Identify relationship types (Use Entity-Relationship (ER) modeling to visualize the entity and relationships) Step 1.3: Identify and associate attributes with entity or relationship types (simple/composite attributes, single-valued/multi-valued attributes, and derived attributes) Step 1.4: Determine attribute domains Step 1.5: Determine candidate, primary, and alternative key attributes Step 1.6: Consider use of enhanced modeling concepts (optional step) (specialization/generalization, aggregation, and composition) Step 1.7: Check model for redundancy Step 1.8: Validate conceptual data model against user transactions Step 1.9: Review conceptual data model with user
Logical database design steps for the relational model
Step 2 Build logical data model Step 2.1 Derive relations for logical data model Step 2.2 Validate relations using normalization Step 2.3 Validate relations against user transactions Step 2.4 Check integrity constraints Step 2.5 Review logical data model with user Step 2.6 Merge logical data models into global model (optional step) Step 2.7 Check for future growth
Steps in Building Logical Data Model
Step 2.1 Derive relations for logical data model Step 2.2 Validate relations using normalization Step 2.3 Validate relations against user transactions Step 2.4 Check integrity constraints Step 2.5 Review logical data model with user Step 2.6 Merge logical data models into global data model (optional step) Step 2.7 Check for future growth
Steps in building the Logical Data Model
Step 2.1: Derive relations for logical data model Step 2.2: Validate relations using normalization Step 2.3: Validate relations against user transactions Step 2.4: Check integrity constraints Step 2.5: Review logical data model with user Step 2.6: Merge logical data models into global model Step 2.7: Check for future growth
Steps of physical database design
Step 3 Translate logical data model for target DBMS Step 3.1 Design base relations Step 3.2 Design representation of derived data Step 3.3 Design general constraints Step 4 Design file organizations and indexes Step 4.1 Analyze transactions Step 4.2 Choose file organizations Step 4.3 Choose indexes Step 4.4 Estimate disk space requirements Step 5 Design user views Step 6 Design security mechanisms Step 7 Consider the introduction of controlled redundancy Step 8 Monitor and tune the operational system
Physical database design steps for relational databases
Step 3 Translate logical data model for target DBMS Step 3.1 Design base relations Step 3.2 Design representation of derived data Step 3.3 Design general constraints Step 4 Design file organizations and indexes Step 4.1 Analyze transactions Step 4.2 Choose file organizations Step 4.3 Choose indexes Step 4.4 Estimate disk space requirements Step 5 Design user views Step 6 Design security mechanisms Step 7 Consider the introduction of controlled redundancy Step 8 Monitor and tune the operational system
Steps in "Translate Logical Data Model for Target DBMS"
Step 3.1: Design base relations Step 3.2: Design representation of derived data Step 3.3: Design general constraints
Steps for designing File Organizations and Indexes
Step 4.1: Analyze transactions Step 4.2: Choose file organizations Step 4.3: Choose indexes Step 4.4: Estimate disk space requirements
a transaction updates a data item based on its old value, which is first read by the transaction
Testing for conflict serializability - Constrained write rule o Transaction updates data item based on its old value, which is first read - Precedence Graph o Node for each transaction o Edge from node that writes a value to node that then reads that same value o Edge from node that reads a value to a node that then writes into
Cardinality
The cardinality of a relation is the number of tuples it contains. In the relational model, tables can be related as any of "one-to-many" or "many-to-many". This is said to be the cardinality of a given table in relation to another. Cardinality refers to the uniqueness of data values contained in a particular column (attribute) of a database table. The lower the cardinality, the more duplicated elements in a column
Integrity constraints are
The constraints that we wish to impose in order to protect the database from becoming incomplete, inaccurate, or inconsistent
Integrated Data
The data warehouse integrates corporate application-oriented data from different source systems, which often includes data that is inconsistent.
Physical data independence
The immunity of the conceptual schema to changes in the internal schema.
schema
The logical organization of the entire database as seen by the DBA
Multiplicity
The number (or range) of possible occurrences of an entity type that may relate to a single occurrence of an associated entity type through a particular relationship (1 to 1, 1 to many, many to many, complex)
Multiplicity constraints
The number (or range) of possible occurrences of an entity type that may relate to a single occurrence of an associated entity type through a particular relationship. Multiplicity constrains the way that entities are related. It is a representation of the policies (or business rules) established by the user or enterprise.
Logical database design output
The output of this process is a logical data model consisting of an ER/relation diagram, relational schema, and supporting documentation that describes this model, such as a data dictionary. Logical = what, physical = how
Logical database design
The process of constructing a model of the data used in an enterprise based on a specific data model but independent of a particular DBMS and other physical considerations.
Data mining
The process of extracting valid, previously unknown, comprehensible, and actionable information from large databases and using it to make crucial business decisions. Uses:
concurrency control
The process of managing simultaneous operations on the database without having them interfere with one another.
relational schema validation
The structure of the relational schema is validated: - using normalization - then checked to ensure that the relations are capable of supporting the transactions given in the users' requirements specification - next check whether all important integrity constraints are represented by the logical data model - then validated by the users to ensure that they consider the model to be a true representation of the data requirements of the enterprise
Types of fragmentation
There are two main types of fragmentation: horizontal and vertical. - horizontal fragments are subsets of tuples - vertical fragments are subsets of attributes. There are also two other types of fragmentation: mixed and derived, a type of horizontal fragmentation where the fragmentation of one relation is based on the fragmentation of another relation
cube materialization
To address storage/resource issues create only a subset of all possible cuboids with the aim of supporting the majority of queries and/or the most demanding in terms of resources
Conceptual database design
To build the conceptual representation of the database, which includes identification of the important entities, relationships, and attributes. Constructing a model independent of implementation.
Physical database design
To decide how the logical structure is to be physically implemented (as base relations) in the target DBMS.
Objective of Design File Organizations and Indexes step
To determine the optimal file organizations to store the base relations and the indexes that are required to achieve acceptable performance, that is, the way in which relations and tuples will be held on secondary storage. Step 4.1 Analyze transactions Step 4.2 Choose file organizations Step 4.3 Choose indexes Step 4.4 Estimate disk space requirements
The purpose of normalization is
To ensure that the set of relations has a minimal yet sufficient number of attributes necessary to support the data requirements of the enterprise with minimal redundancy
Purpose for Entity-Relationship (ER) Modeling
To ensure that we get a precise understanding of the nature of the data and how it is used by the enterprise, we need a model for communication that is nontechnical and free of ambiguities.
Objective of Translate Logical Data Model for Target DBMS step
To produce a relational database schema from the logical data model that can be implemented in the target DBMS. Step 3.
view materialization
To store the view as a temporary table in the database when the view is first queried (The speed difference may be critical in applications where the query rate is high and the views are complex, so it is not practical to recompute the view for every query) One disadvantage with materialized views is maintaining the currency of the temporary table.
Logical database design
To translate the conceptual representation to the logical structure of the database, which includes designing the relations. Independent of DBMS or other physical considerations.
When is updating a view allowed?
Updates are allowed through a view defined using a simple query involving a single base relation and containing either the primary key or a candidate key of the base relation. Updates are not allowed through views involving multiple base relations. Updates are not allowed through views involving aggregation or grouping operations.
CASE tools categories:
Upper-CASE tools support the initial stages of the database system development lifecycle, from planning through to database design. Lower-CASE tools support the later stages of the lifecycle, from implementation through testing, to operational maintenance. Integrated-CASE tools support all stages of the lifecycle and thus provide the functionality of both upper- and lower-CASE in one tool.
transaction analysis form example
Used in Step 4: Design File Organizations and Indexes (map all transaction paths to relations; determine which relations are most frequently accessed by transactions; analyze the data usage of selected transactions that involve these relations)
transaction usage map
Used in Step 4: Design File Organizations and Indexes (map all transaction paths to relations; determine which relations are most frequently accessed by transactions; analyze the data usage of selected transactions that involve these relations)
Value prediction
Value prediction is used to estimate a continuous numeric value that is associated with a database record. This technique uses the traditional statistical techniques of linear regression and nonlinear regression
Determining candidate, primary, and alternate key attributes
When choosing a primary key from among the candidate keys, use the following guidelines to help make the selection: - the candidate key with the minimal set of attributes - the candidate key that is least likely to have its values changed - the candidate key with fewest characters (for those with textual attribute(s)) - the candidate key with smallest maximum value (for those with numerical attribute(s)) - the candidate key that is easiest to use from the users' point of view
workflow ownership
Workflow ownership allows the right to update replicated data to move from site to site. However, at any one moment, there is only ever one site that may update that particular data set
XML Information Set
XML Information Set (or Infoset) is an abstract description of the information available in a well-formed XML document that meets certain XML namespace constraints
persistence entity
a lightweight Java class that typically represents a table in a relational database.
conservative 2PL
a transaction obtains all its locks when it begins, or it waits until all the locks are available
A data warehouse is well equipped for providing data for mining because
a warehouse not only holds data of high quality and consistency, and from multiple sources, but is also capable of providing subsets (views) of the data for analysis and lower level details of the source data, when required.
candidate key, primary key, alternate keys
candidate key: a minimal set of attributes of an entity that uniquely identifies each occurrence of that entity. We may identify more than one candidate key, in which case we must choose one to be the primary key (simplest, least likely to change); the remaining candidate keys are called alternate keys.
Situation in which a single transaction leads to a series of rollbacks
cascading rollback *cascading rollbacks are undesirable, because they potentially lead to the undoing of a significant amount of work
two techniques associated with predictive modeling
classification and value prediction
mandatory participation on both sides of 1:1 relationship
combine the entities involved into one relation and choose one of the primary keys of the original entities to be the primary key of the new relation, while the other (if one exists) is used as an alternate key
dimensional hierarchy
defines mappings from a set of lower-level concepts to higher-level concepts zipCode → area → city → region → country
advantages of a DDBMS
distributed database management system - it reflects the organizational structure - it makes remote data more shareable - it improves reliability, availability, and performance - it may be more economical - it provides for modular growth, facilitates integration - helps organizations remain competitive.
Java Web Services
document-oriented—those that deal directly with processing XML documents procedure-oriented—those that deal with procedures.
force policy
ensures that all pages updated by a transaction are immediately written to disk when the transaction commits. The alternative policy is no-force
Object-based data models
entities: a distinct object attributes: a property that describes some aspect of the object relationships: an association between entities
ER
entity-relationship modeling
ETL
extract, transform and load - used in data warehousing, data marts - migrating from one database to another
middleware-based replication
middleware is responsible for coordinating client requests across the different replicas and appears as one database system to the client
Relational modeling uses these to maintain relationships
primary keys and foreign keys Primary keys are typically the unique identifier noted on the conceptual model Foreign keys are the primary key of another entity to which an entity has a relationship Composite keys are primary keys that are made of more than one attribute Weak entities Associative entities
allows a user to create or access (that is read, write, or modify) some database object (such as a relation, view, or index) or to run certain DBMS utilities
privilege
Java
simple, object-oriented, distributed, interpreted, robust, secure, architecture-neutral, portable, high-performance, multithreaded, and dynamic language from Sun Microsystems. Java applications are compiled into bytecodes, which are interpreted and executed by the Java Virtual Machine
Two parts to a package
specification: declares all public constructs of the package body: defines all constructs (public and private) of the package, and so implements the specification
holds only exclusive locks until the end of the transaction
strict 2PL
The SQL:2011 standard
supports OLAP functionality in the provision of extensions to grouping capabilities such as: - CUBE and ROLLUP functions - elementary operators such as moving windows and ranking functions
uses the same key for both encryption and decryption
symmetric encryption DES, PGP
Variable assignment in the executable part of a PL/SQL block
using the normal assignment statement (:=) or as the result of an SQL SELECT or FETCH statement. For example: vStaffNo := 'SG14' ;vRent := 500;SELECT COUNT(*) INTO x FROM PropertyForRent WHERE staffNo = vStaffNo; *PL/SQL (Procedural Language/Structured Query Language) is Oracle Corporation's procedural extension for SQL and the Oracle relational database
log records (or at least certain parts of them) are written before the corresponding write to the database
write-ahead log protocol
UNIQUE constraint
you can use UNIQUE constraints to make sure that no duplicate values are entered in specific columns that do not participate in a primary key. Although both a UNIQUE constraint and a PRIMARY KEY constraint enforce uniqueness, use a UNIQUE constraint instead of a PRIMARY KEY constraint when you want to enforce the uniqueness of a column, or combination of columns, that is not the primary key
Projection
Πa1 ... , an(R) The Projection operation works on a single relation R and defines a relation that contains a vertical subset of R, extracting the values of specified attributes and eliminating duplicates. (Produce a list of salaries for all staff, showing only the staffNo, fName, lName, and salary details)
Selection
σpredicate(R) The Selection operation works on a single relation R and defines a relation that contains only those tuples of R that satisfy the specified condition (predicate) (List all staff with a salary greater than £10000)