Advanced Database Systems
What are the aims of DBMS?
-Provide users with an abstract view of data -Hide certain details of data from different users -No details about how data is stored
When are pairs of operations not in conflict?
-When two transactions only read some data item. -When two transactions read or write completely different data items
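The rule above can be turned into a small check. This is a minimal sketch, not taken from the deck: the `Op` tuple and the `in_conflict` name are illustrative assumptions, and a conflict is taken to mean two operations from different transactions on the same data item where at least one is a write.

```python
from typing import NamedTuple


class Op(NamedTuple):
    """One operation in a schedule (hypothetical representation)."""
    txn: str   # transaction identifier, e.g. "T1"
    kind: str  # "r" for read, "w" for write
    item: str  # name of the data item


def in_conflict(a: Op, b: Op) -> bool:
    # Different transactions, same item, and at least one write.
    return a.txn != b.txn and a.item == b.item and "w" in (a.kind, b.kind)


# Both only read x: not in conflict.
read_read = in_conflict(Op("T1", "r", "x"), Op("T2", "r", "x"))
# Writes to completely different items: not in conflict.
different_items = in_conflict(Op("T1", "w", "x"), Op("T2", "w", "y"))
# A read and a write on the same item: in conflict.
read_write = in_conflict(Op("T1", "r", "x"), Op("T2", "w", "x"))
```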
What are the main advantages of the Enhanced ER model?
-avoid describing similar concepts more than once -have relations that include a subclass but not the superclass -more semantic information to the design of a form
What is a recoverable schedule?
-Once a transaction T has committed, it should never be necessary to roll T back.
Semi-structured data
-self describing data -schema information is mixed with data values
Unstructured data
-very limited indication of the type of data
Several actions are necessary to create a database. Place these in the correct order: 1. Create the data dictionary views 2. Create the parameter file 3. Create the password file 4. Issue the CREATE DATABASE command 5. Issue the STARTUP command
2. Create the parameter file 3. Create the password file 5. Issue the STARTUP command 4. Issue the CREATE DATABASE command 1. Create the data dictionary views
Page 211 Question 1
A = Segment B = Extent C = Oracle Block D = Datafile
Database management system (DBMS)
A central set of common functions for managing a database
Field
A character or group of characters that have a specific meaning, used to define and store data
What is a database?
A collection of files storing related data.
File
A collection of related records
Primary Key
A column in a relational database whose values must be unique for each row
Foreign Key
A column or field that is a primary key in another table
Surrogate Key
A column you can create to be the record's primary key identifier
Row
A data set representing a single item
Multidimensional data model (OLAP)
A database is a set of facts (points) in a multidimensional space. Consists of fact tables and dimension tables
Relationship
A logical connection between different tables
Record
A logically connected set of one or more fields that describes a person, place, or thing
Database administrator (DBA)
A person charged with administering and maintaining the database
Database
A place to store and manage application data
Data Dictionary views
A program that allows the user to view objects by reference
Transaction log
A record of all related changes in the computer's memory
What is a non serial schedule?
A schedule where the operations of some transactions are interleaved.
What is a serial schedule?
A schedule where the operations of two transactions are not interleaved.
What is a schedule?
A sequence of operations from a set of n transactions T1,T2,...Tn such that: the order of the operations in each transaction Ti is preserved in the schedule.
Foreign key
A set of attributes within a relation R1 that matches the primary key of a relation R2.
Table
A set of rows sharing the same attributes
What files are generated when you choose the option to Generate Database Creation Scripts in the Database Configuration Assistant?
A shell script SQL scripts A parameter file A password file
What is a cascading rollback?
A single transaction failure leading to a series of transaction rollbacks
Triggers
A statement that is executed automatically by the system as a side effect of a modification to the database.
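A trigger can be tried out with Python's built-in sqlite3 module. The sketch below is only an illustration: the `product` and `price_audit` tables and the trigger name are made up for the example, and SQLite's trigger syntax differs in detail from Oracle's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (p_code TEXT PRIMARY KEY, p_price REAL)")
conn.execute("CREATE TABLE price_audit (p_code TEXT, old_price REAL, new_price REAL)")

# The trigger fires automatically as a side effect of an UPDATE on p_price.
conn.execute("""
    CREATE TRIGGER trg_price_audit
    AFTER UPDATE OF p_price ON product
    BEGIN
        INSERT INTO price_audit VALUES (OLD.p_code, OLD.p_price, NEW.p_price);
    END
""")

conn.execute("INSERT INTO product VALUES ('A1', 10.0)")
conn.execute("UPDATE product SET p_price = 12.5 WHERE p_code = 'A1'")

# The audit row was written by the trigger, not by an explicit INSERT.
audit_rows = conn.execute("SELECT * FROM price_audit").fetchall()
conn.close()
```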
Normalization
A step-by-step process to eliminate data redundancy 1NF, 2NF, 3NF
Field
A table consists of several records, each record can be broken into several smaller entities
Raw facts
A telephone number, a birth date, a customer name, and a year to date (YTD) sales value
What is the uncommitted dependency (dirty data) problem?
A transaction is allowed to see the intermediate results of another transaction before it has committed
Roll back
A transaction log reverses back to a state before any change occurred
What is Consistency?
A transaction must transform the database from one consistent state to another consistent state.
What is the inconsistent analysis problem?
A transaction reads some values while they are being updated by another transaction.
Composite Key
A unique key created by combining two or more columns. Usually containing fields that are primary keys in other tables
Oracle Net
A utility that enables network communication between the client and the server, used by Oracle servers
What are the 4 properties of a transaction?
ACID: Atomicity Consistency Isolation Durability
Which of these actions will not be recorded in the alert log? A. ALTER DATABASE commands B. ALTER SESSION commands C. ALTER SYSTEM commands D. Archiving an online redo log file E. Creating a tablespace F. Creating a user
ALTER SESSION commands Creating a user
Changing a Column's Data Type
ALTER can be used to change a column's data type. Some RDBMSs do not permit changes to data types unless the column is empty. Syntax - ALTER TABLE tablename MODIFY (columnname datatype);
Null
Absence of any data value that could represent: An unknown attribute value A known, but missing, attribute value An inapplicable condition
At what point can you not choose or change the database character set?
After database creation, using DBCA to install options.
Operations on Multidimensional Data Model
Aggregation (roll-up), navigation (drill-down), selection (slice), calculation, ranking, etc
Objects
All database objects
ALL operator
Allows comparison of a single value with a list of values returned by the first subquery, using a comparison operator other than equals
What is a transaction?
An action carried out by the user/ program that reads or updates the database.
How do we guarantee serializability with locks?
An additional protocol controls the positioning of locks. Two-phase locking (2PL): -for every single transaction, all locking operations occur before the first unlocking operation
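The 2PL rule can be checked mechanically for a single transaction. A minimal sketch, assuming the transaction's lock activity is given as a list of `("lock" | "unlock", item)` pairs; the function name and the representation are illustrative, not from the deck.

```python
def follows_two_phase_locking(actions):
    """Return True if no lock is acquired after the first unlock.

    `actions` lists the (action, item) pairs of ONE transaction in order,
    where action is "lock" or "unlock".
    """
    shrinking = False  # becomes True once the first unlock is seen
    for action, _item in actions:
        if action == "unlock":
            shrinking = True
        elif action == "lock" and shrinking:
            return False  # a lock in the shrinking phase violates 2PL
    return True


# Growing phase (all locks) followed by shrinking phase (all unlocks).
two_phase = [("lock", "x"), ("lock", "y"), ("unlock", "x"), ("unlock", "y")]
# Locks y after having released x, so it is not two-phase.
not_two_phase = [("lock", "x"), ("unlock", "x"), ("lock", "y"), ("unlock", "y")]
```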
Determinant
Any attribute whose value determines other values within a row
Relationships
Association between entities that always operate in both directions
Primary key (PK)
Attribute or combination of attributes that uniquely identifies any given row
Foreign Key
Attribute or combination of attributes in one table whose values must either match the primary key in another table or be null
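The "match the primary key or be null" rule can be observed with sqlite3. A small sketch with made-up `vendor` and `product` tables; note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`, which is an SQLite detail and not part of the definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK enforcement
conn.execute("CREATE TABLE vendor (v_code INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE product ("
    " p_code TEXT PRIMARY KEY,"
    " v_code INTEGER REFERENCES vendor (v_code))"
)
conn.execute("INSERT INTO vendor VALUES (1)")

conn.execute("INSERT INTO product VALUES ('A1', 1)")     # matches a vendor row
conn.execute("INSERT INTO product VALUES ('A2', NULL)")  # null is allowed

try:
    conn.execute("INSERT INTO product VALUES ('A3', 99)")  # no vendor 99
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

product_count = conn.execute("SELECT COUNT(*) FROM product").fetchone()[0]
conn.close()
```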
Super Key
Attribute or combination of attributes that uniquely identifies each row in a table
Levels of Database Backups
Backups are provided with high security
BLOB
Binary Large Object stores up to 4 GB of data
Command
CREATE SCHEMA AUTHORIZATION {creator}; Seldom used directly
Syntax to create table
CREATE TABLE tablename();
Q.43 In the figure below, Customer_ID in the CUSTOMER Table is which type of key?
Candidate
Primary Key
Candidate key selected to uniquely identify all other attribute values in any given row; cannot contain null entries
Alternate key
Candidate keys not selected
Atomic attribute
Cannot be further subdivided
Business intelligence
Captures and processes business data to generate information that support decision making
What is a problem of timestamp protocol? What is the solution?
Cascading rollback. Solution: -A transaction is structured such that all its writes are all performed at the end of its processing -All writes of a transaction form an atomic action
Database management system (DBMS)
Collection of programs, Manages the database structure, Controls access to data stored in the database
NoSQL Disadvantages
Complex programming is required, There is no relationship support, There is no transaction integrity support, In terms of data consistency, it provides an eventually consistent model
Main points of conservative method and optimistic method?
Conservative: delay the transactions when they are in conflict Optimistic: assume that transactions are rarely in conflict. Check for conflicts just before the transaction commits
Keys
Consist of one or more attributes that determine other attributes
Disjoint subtypes
Contain a unique subset of the supertype entity set, Known as nonoverlapping subtypes, Implementation is based on the value of the subtype discriminator attribute in the supertype
Processing mismatch
Conventional programming languages process one data element at a time. Newer programming environments manipulate data sets in a cohesive manner
What operation cannot be applied to a tablespace after creation?
Convert from manual segment space management to automatic segment space management
Conviction (Association Rules)
Conviction (A => B) = Pr(A)*Pr(not B) / Pr(A ∧ not B)
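The formula can be evaluated directly on a list of transactions. A minimal sketch, assuming each transaction is a Python set of items and probabilities are estimated as relative frequencies; the function name and sample baskets are invented for illustration.

```python
def conviction(transactions, a, b):
    """Conviction(a => b) = Pr(a) * Pr(not b) / Pr(a and not b)."""
    n = len(transactions)
    pr_a = sum(a in t for t in transactions) / n
    pr_not_b = sum(b not in t for t in transactions) / n
    pr_a_and_not_b = sum(a in t and b not in t for t in transactions) / n
    if pr_a_and_not_b == 0:
        return float("inf")  # the rule is never violated in this data
    return pr_a * pr_not_b / pr_a_and_not_b


baskets = [{"bread", "milk"}, {"bread"}, {"milk"}, {"bread", "milk"}]
# Pr(bread) = 3/4, Pr(not milk) = 1/4, Pr(bread and not milk) = 1/4
bread_milk_conviction = conviction(baskets, "bread", "milk")
```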
Factors Affecting Software Purchasing Decision
Cost DBMS features and tools Underlying model Portability DBMS hardware requirements
Which of these operations can be accomplished with the DBCA? A. Create a database B. Remove a database C. Upgrade a database D. Add database options E. Remove database options
Create a database, Remove a database, Add database options
Developing an ER Diagram
Create a detailed narrative of the organization's description of operations
Database Developer
Create and maintain database-based applications, programming, database fundamentals, SQL
Cloud database
Created and maintained using cloud data services that provide defined performance measures for the database
Oracle application server
Creates a World Wide Web site that allows users to access Oracle databases and create dynamic web pages
Virtualization
Creates logical representations of computing resources independent of underlying physical computing resources
Which view will list all tables in the database?
DBA_TABLES
Which views could you query to find out about the temporary tablespaces and the files that make them up?
DBA_TABLESPACES DBA_TEMP_FILES V$TABLESPACE V$TEMPFILE
What instance parameter cannot be changed after database creation?
DB_BLOCK_SIZE
Which parameter controls the location of background process trace files?
DIAGNOSTIC_DEST
Listing Unique Values
DISTINCT clause: Produces list of values that are unique. Syntax - SELECT DISTINCT columnlist FROM tablelist; Access places nulls at the top of the list. Oracle places them at the bottom. Placement of nulls does not affect list contents
Which of these commands can be executed against a table in a read-only tablespace? A. DELETE B. DROP C. INSERT D. TRUNCATE E. UPDATE
DROP
DML
Data Manipulation Language: insertion of new data modification of stored data retrieval of data deletion of data
Data dictionary management
Data dictionary
________ is a component of the relational data model included to specify business rules to maintain the integrity of data when they are manipulated.
Data integrity
Distributed database
Data is distributed across different sites
Centralized database
Data is located at a single site
First Normal Form (1NF)
Data is structured to have a primary key and no repeating groups
Data type mismatch §
Data types provided by SQL might not match data types used in different host languages
Time-Variant Data
Data whose values change over time and for which a history of the data changes must be retained
USERS
Database Users
VIEWS
Database Views
Performance Factors of an Information System
Database design and implementation Application design and implementation Administrative procedures
Database Design Challenges: Conflicting Goals
Database design must conform to design standards
Audit trail
A record of logins and logouts, and also of the tables that were accessed
TABLES
Database tables
ERD depicts the
The database's main components: entities, attributes, and relationships
Which files must be synchronized for a database to open?
Datafiles, online redo log files, and the controlfile
SQL Functions
Date and time functions Numeric functions String functions Conversion functions
Structured Query Language (SQL)
De facto query language and data access standard supported by the majority of DBMS vendors
DROP TRIGGER trigger_name command
Deletes a trigger without deleting the table
DROP TABLE
Deletes a table from the database. Syntax - DROP TABLE tablename;
Truncate table tablename
Deleting the data in the table
Attributes
Describe the properties of an object
Relationship
Describes an association among entities
Cloud Computing Data Architect
Design and implement the infrastructure for next generation cloud database systems, internet technologies, cloud storage technologies, data security, performance tuning, large databases, etc.
Database Architect
Design and implementation of database environments, DBMS fundamentals, data modeling, SQL, hardware knowledge, etc.
Database Designer
Design and maintain databases, system design, database design, SQL
Foreign key
Do not need to have unique values in the referencing relation
Optional attribute
Does not require a value, can be left empty
Relational model aimed at logical level
Does not require physical
Index
Each index is associated with only one table
Entity Integrity Purpose
Each row will have a unique identity, and foreign key values can properly reference primary key values
Inheritance
Enables an entity subtype to inherit attributes and relationships of the supertype, All entity subtypes inherit their primary key attribute from their supertype, At the implementation level, supertype and its subtype(s) maintain a 1:1 relationship, Entity subtypes inherit all relationships in which supertype entity participates, Lower-level subtypes inherit all attributes and relationships from its upper-level supertypes
Backup and recovery management
Enables recovery of the database after a failure
Schema data definition language (DDL)
Enables the database administrator to define the schema components
The External Model
End users' view of the data environment, ER diagrams are used to represent the external views
Performance tuning
Ensures efficient performance of the database in terms of storage and access speed
SQL Constraints
NOT NULL: ensures that a column does not accept nulls; UNIQUE: ensures that all values in a column are unique; DEFAULT: assigns a value to an attribute when a new row is added to the table; CHECK: validates data when an attribute value is entered
Full functional dependence
Entire collection of attributes in the determinant is necessary for the relationship
Participants
Entities that participate in a relationship
Segments
Equivalent of a file system's record type
SQL engine
Executes all queries
Cardinality
Expresses the minimum and maximum number of entity occurrences associated with one occurrence of related entity
HAVING Clause
Extension of GROUP BY feature. Applied to output of GROUP BY operation. Used in conjunction with GROUP BY clause in second SQL command set. Similar to WHERE clause in SELECT statement
Theta join
Extension of natural join, denoted by adding a theta subscript after the JOIN symbol
Logical View of Data
Facilitated by the creation of data relationships based on a logical construct called a relation
Proper naming
Facilitates communication between parties, Promotes self-documentation
Well designed database
Facilitates data management, Generates accurate and valuable information
Structural independence
File structure is changed without affecting the application's ability to access the data
Big Data Aims to
Find new and better ways to manage large amounts of web and sensor-generated data, Provide high performance and scalability at a reasonable cost
Normalization Normal forms
First normal form (1NF) Second normal form (2NF) Third normal form (3NF)
CHAR
Fixed-length character data. Syntax - column_name CHAR(maximum_size), with maximum_size in the range 0-2000
Database Design
Focuses on the design of the database structure that will be used to store and manage end user data, Well designed database, Poorly designed database causes difficult to trace errors
ALTER TABLE command
Followed by a keyword that produces the specific change one wants to make. Options include ADD, MODIFY, and DROP
Design Case 1: Implementing 1:1 Relationships
Foreign keys work with primary keys to properly implement relationships in relational model
Grouping Data
Frequency distributions created by GROUP BY clause within SELECT statement. Syntax - SELECT columnlist FROM tablelist [WHERE conditionlist] [GROUP BY columnlist] [HAVING conditionlist] [ORDER BY columnlist [ASC | DESC]];
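The GROUP BY and HAVING syntax can be exercised with sqlite3. A short sketch; the `product` table, its columns, and the sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (p_code TEXT, v_code INTEGER, p_price REAL)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [("A1", 1, 10.0), ("A2", 1, 30.0), ("B1", 2, 5.0), ("B2", 2, 7.0), ("C1", 3, 50.0)],
)

# GROUP BY builds one row per vendor; HAVING then filters those grouped rows,
# in the way WHERE filters individual rows.
grouped_rows = conn.execute("""
    SELECT v_code, COUNT(*) AS n, AVG(p_price) AS avg_price
    FROM product
    GROUP BY v_code
    HAVING AVG(p_price) < 25
    ORDER BY v_code
""").fetchall()
conn.close()
```

Vendor 3 has an average price of 50, so its group is removed by the HAVING condition.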
Partial dependency
Functional dependence in which the determinant is only part of the primary key. Assumption: one candidate key. Straightforward and easy to identify
SQL Functions
Functions always use a numerical, date, or string value. The value may be part of a command or may be an attribute located in a table. A function may appear anywhere in an SQL statement where a value or an attribute can be used
Data Management
Generation, storage, and retrieval of data
The Entity Relationship Model
Graphical representation of entities and their relationships in a database structure
Repeating group
Group of multiple entries of same type can exist for any single key attribute occurrence, Existence proves the presence of data redundancies
HAVING subqueries
HAVING clause restricts the output of a GROUP BY query by applying conditional criteria to the grouped rows
HTML vs XML
HTML describes the presentation of data XML describes the content of data
Outer join
Have common values in common columns or have no matching values
Edgar Frank "Ted" Codd
An English computer scientist who worked for IBM and invented the relational model for database management
Database Consultant
Help companies leverage database technologies to improve business processes and achieve specific goals, database fundamentals, data modeling, database design, SQL, DBMS, hardware vendor specific technologies, etc.
Syntax
INSERT INTO tablename SELECT columnlist FROM tablename
Developing an ER Diagram
Identify business rules based on the descriptions
Developing an ER Diagram
Identify main entities and relationships from the business rules
Developing an ER Diagram
Identify the attributes and primary keys that adequately describe entities
Fully functional dependence (composite key)
If attribute B is functionally dependent on a composite key A but not on any subset of that composite key, then B is fully functionally dependent on A.
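The definition can be tested against sample rows. This is only a sketch: a functional dependency is a statement about every valid state of a table, so a check on one data sample can refute a dependency but cannot prove it. The helper names and the project/employee rows are illustrative assumptions.

```python
from itertools import combinations


def determines(rows, lhs, rhs):
    """True if, in these rows, each value of `lhs` maps to one value of `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def fully_dependent(rows, composite_key, attr):
    """True if `attr` depends on the whole key but on no smaller part of it."""
    if not determines(rows, composite_key, attr):
        return False
    for size in range(1, len(composite_key)):
        for subset in combinations(composite_key, size):
            if determines(rows, subset, attr):
                return False  # partial dependency found
    return True


assignments = [
    {"proj": 1, "emp": "e1", "hours": 5, "emp_name": "Ann"},
    {"proj": 1, "emp": "e2", "hours": 7, "emp_name": "Bob"},
    {"proj": 2, "emp": "e1", "hours": 3, "emp_name": "Ann"},
]
hours_is_full = fully_dependent(assignments, ("proj", "emp"), "hours")
name_is_full = fully_dependent(assignments, ("proj", "emp"), "emp_name")
```

Here `hours` needs both `proj` and `emp`, while `emp_name` is determined by `emp` alone, which is a partial dependency.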
When is a schedule cascadeless?
If cascade rollbacks cannot occur
How can a tablespace be made larger?
If it is a SMALLFILE tablespace, add files Resize the existing file(s)
When is a non-serial schedule serializable?
If it produces a database state that can be produced by some serial execution of the same transactions
What conditions are needed for a recoverable schedule?
A schedule S is recoverable if no transaction T in S commits until every transaction T' from which T has read has committed.
Order of rows and columns
Immaterial to the dbms
Oracle Sequences
Independent object in the database. Have a name and can be used anywhere a value is expected. Not tied to a table or column. Generate a numeric value that can be assigned to any column in any table. Table attribute with an assigned value can be edited and modified. Can be created and deleted any time
Unique index
Index key can have only one pointer value associated with it
Index key
Index's reference point that leads to data location identified by the key
Single Data Value
Intersection of a row and column
Primary key
Is a candidate key that is most appropriate to become main key of the table
Foreign key
Is a field in a relational table that matches the primary key column of another table
Data integrity
Is a fundamental component of information security
Composite index:
Is based on two or more attributes. Prevents data duplication
Conversion to Third Normal Form Table is in 3NF when it:
Is in 2NF Contains no transitive dependencies
Domain
It can be considered a constraint on a value of the attribute
Unstructured data
It exists in their original state
One to one
It exists when one row in a table may be linked with one row in another table and vice versa
One to many
It exists when one row in table A may be linked with many rows in table B but one row in table B is linked to only one row in table A
Relational database
It is a collection of data items organized as a set of formally described tables from which data can be accessed or reassembled in many different ways without having to reorganize the database tables
Database
It is a collection of logically related data
Schema
It is the overall design of a database
Structured query language
Meaning of SQL
Stored function
Named group of procedural and SQL statements that returns a value, as indicated by a RETURN statement in its program code
Unnormalized
No primary keys and redundant data
Partial completeness
Not every supertype occurrence is a member of a subtype
Arithmetic operators
Perform, in order of precedence: operations within parentheses, power operations, multiplications and divisions, additions and subtractions
In which data model would a code table appear?
Physical
Data types
Primitive data types
UNION ALL
Produces a relation that retains duplicate rows. Can be used to unite more than two queries
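The difference between UNION ALL and plain UNION shows up as soon as the two inputs share a row. A small sqlite3 sketch with invented `customer` and `customer_2` tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT)")
conn.execute("CREATE TABLE customer_2 (name TEXT)")
conn.executemany("INSERT INTO customer VALUES (?)", [("Ann",), ("Bob",)])
conn.executemany("INSERT INTO customer_2 VALUES (?)", [("Bob",), ("Cy",)])

# UNION ALL keeps the duplicate 'Bob'; UNION removes it.
union_all_names = [row[0] for row in conn.execute(
    "SELECT name FROM customer UNION ALL SELECT name FROM customer_2 ORDER BY name"
)]
union_names = [row[0] for row in conn.execute(
    "SELECT name FROM customer UNION SELECT name FROM customer_2 ORDER BY name"
)]
conn.close()
```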
Data quality
Promoting accuracy, validity, and timeliness of data
Public Key
Publicly published key used to encrypt data, but cannot be used to decrypt data.
Subquery
Query embedded/nested inside another query
Database access languages and application programming interfaces
Query language, Structured query language (SQL)
Existence independence
Referred to as a strong entity or regular entity
Data integrity
Refers to the accuracy and consistency of data stored in a database
Recursive relationship
Relationship exists between occurrences of the same entity set
The Internal Model
Representing database as seen by the DBMS mapping conceptual model to the DBMS, Is software dependent and hardware independent
The Conceptual Model
Represents a global view of the entire database by the entire organization, Has a macro-level view of data environment, Is software and hardware independent
Extensible Markup Language (XML)
Represents data elements in textual format
Client database processes
Requests for data across the network
Design Case 2: Maintaining History of Time-Variant Data
Requires creating a new entity in a 1:M relationship with the original entity, New entity contains the new value, date of the change, and other pertinent attribute
Hierarchical Model Disadvantages
Requires knowledge of physical data storage characteristics, Navigational system requires knowledge of hierarchical path, Changes in structure require changes in all application programs, Implementation limitations, No data definition, Lack of standards
Class hierarchy
Resembles an upside-down tree in which each class has only one parent
Tuple
Rows
Specific integrity rule
Rules that apply to a particular relation
General integrity rule
Rules that apply to all relations or database
Constraints
Rules that restrict the data values that can be entered into a column in a database table
Desktop database
Runs on PC
Attribute List Subqueries
SELECT statement uses attribute list to indicate what columns to project in the resulting set. Column alias cannot be used in attribute list computation if alias is defined in the same attribute list
Comparison Operators: Computed Columns and Column Aliases
SQL accepts any valid expressions/formulas in the computed columns. Computed column, an alias, and date arithmetic can be used in a single query
Relational Set Operators
SQL data manipulation commands are set-oriented. UNION, INTERSECT, and EXCEPT (MINUS) work properly when relations are union-compatible
Homonym
Same name is used to label different attributes
SEQUENCES
Sequences used to generate surrogate key values automatically
Triggering event
Statement that causes the trigger to execute
Conf(LHS => RHS) (in terms of supports)
Supp(LHS ∪ RHS) / Supp(LHS)
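Confidence follows from two support counts. A minimal sketch, assuming transactions are Python sets and support is the fraction of transactions that contain every item of an itemset; the function names and baskets are illustrative.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, lhs, rhs):
    """Conf(LHS => RHS) = Supp(LHS ∪ RHS) / Supp(LHS)."""
    return support(transactions, lhs | rhs) / support(transactions, lhs)


market_baskets = [{"bread", "milk"}, {"bread"}, {"milk"}, {"bread", "milk"}]
# Supp({bread, milk}) = 2/4 and Supp({bread}) = 3/4, so the confidence is 2/3.
bread_to_milk = confidence(market_baskets, {"bread"}, {"milk"})
```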
Extended relational data model (ERDM)
Supports OO features and complex data representation
Workgroup databases
Supports a small number of users or a specific department
Database Design
Supports company's operations and objectives Checks the ultimate final product from all perspectives Pointers for examining completion procedures Data component is an element of whole system System analysts/programmers design procedures to convert data into information Database design is an iterative process
COMMIT: Command to save changes
Syntax - COMMIT [WORK]; Ensures database update integrity
DELETE: Command to delete
Syntax - DELETE FROM tablename [WHERE conditionlist];
Data Manipulation Commands - INSERT: Command to insert data into table
Syntax - INSERT INTO tablename VALUES(); Used to add table rows with NULL and NOT NULL attributes
ROLLBACK: Command to restore the database
Syntax - ROLLBACK; Undoes the changes since last COMMIT command
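COMMIT and ROLLBACK can be seen working together in sqlite3. A minimal sketch with an invented `account` table; it relies on the module's default behaviour of opening a transaction implicitly before INSERT and UPDATE statements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.commit()  # the insert is now permanent

# Two uncommitted updates...
conn.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
conn.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
# ...are both undone, because ROLLBACK goes back to the last COMMIT.
conn.rollback()

balance_after_rollback = conn.execute(
    "SELECT balance FROM account WHERE id = 1"
).fetchone()[0]
conn.close()
```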
SELECT: Command to list the contents
Syntax - SELECT columnlist FROM tablename; Wildcard character (*): Substitute for other characters/command
UPDATE: Command to modify data
Syntax - UPDATE tablename SET columnname = expression [, columnname = expression] [WHERE conditionlist];
Which protocols can Oracle Net 12c use?
TCP, SDP, TCP with secure sockets, Named Pipes
Which of these are types of segment? A. Sequence B. Stored procedure C. Table D. Table partition E. View
Table Table partition
Attribute or combination of attributes
Table must have to uniquely identify each row
Difference
Tables must be union-compatible to yield valid results
Intersect
Tables must be union-compatible to yield valid results
Base tables
Tables on which the view is based
Union-compatible
Tables share the same number of columns, and their corresponding columns share compatible domains
Logical design
Task of creating a conceptual data model
How does multiversion timestamp work?
Tell me!
Connectivity
Term used to label the relationship types
Q.15 The figure below is an example of mapping which type of relationship?
Ternary
What tools can be used to manage templates?
The Database Configuration Assistant
What happens when a user issues a COMMIT?
The LGWR flushes the log buffer to the online redo log
Which of these tools can configure a listener.ora file? A. The Database Configuration Assistant B. Database Express C. The lsnrctl utility D. The Net Configuration Assistant E. The Net Manager
The Net Configuration Assistant The Net Manager
Data independence
The ability to modify a scheme definition in one level without affecting a scheme definition in a higher level
Logical data independence
The ability to modify the conceptual scheme without causing application programs to be rewritten
Functional dependence
The attribute B is fully functionally dependent on the attribute A if each value of A determines one and only one value of B
Primary key
The candidate key selected to identify rows uniquely with the table
Schema objects (database objects)
The data within a user schema
What memory structures are a required part of the SGA?
The database buffer cache The log buffer The shared pool
You shut down your instance with SHUTDOWN IMMEDIATE. What will happen on the next startup?
The database will open without recovery
Which SGA memory structure(s) cannot be resized automatically after instance startup?
The log buffer
Which SGA memory structure(s) cannot be resized dynamically after instance startup?
The log buffer
Structured Query Language
The standard query language for relational databases is Structured Query Language
If a statement is suspended because of a space error, what will happen when the problem is fixed?
The statement will continue executing from the point it had reached immediately after the problem is fixed
Attribute
The table columns
You issue the command SHUTDOWN, and it seems to hang. What could be the reason?
There are other sessions logged on
Surrogate Keys Primary key used to simplify the identification of entity instances are useful when:
There is no natural key, Selected candidate key has embedded semantic contents or is too long
Which statement is correct regarding the online redo log?
There must be at least two log file groups, with at least one member each
Ternary relationship
Three entities are associated
How to handle deadlocks?
Timeouts: -a transaction that requests a lock waits for only a system-defined (maximum) period of time. Deadlock detection: -construct a wait-for graph -nodes for each transaction -a cycle indicates that there is a deadlock
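Deadlock detection on a wait-for graph comes down to finding a cycle. A minimal sketch, assuming the graph is a dict that maps each transaction to the set of transactions it is waiting for; the representation and function name are illustrative.

```python
def has_deadlock(wait_for):
    """Return True if the wait-for graph contains a cycle (depth-first search)."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {}

    def visit(node):
        colour[node] = GREY
        for nxt in wait_for.get(node, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY:
                return True  # edge back onto the current path: a cycle
            if state == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour.get(t, WHITE) == WHITE and visit(t) for t in list(wait_for))


# T1 waits for T2 and T2 waits for T1: deadlock.
deadlocked = has_deadlock({"T1": {"T2"}, "T2": {"T1"}})
# T1 -> T2 -> T3 is a chain without a cycle: no deadlock.
chain = has_deadlock({"T1": {"T2"}, "T2": {"T3"}})
```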
Data transformation and presentation
Transforms entered data to conform to required data structures
One to one One to many Many to one Many to many
Types of relationship between entities
Advanced Data Updates
UPDATE command updates only data in existing rows. If a relationship is established between entries and existing columns, the relationship can assign values to appropriate slots. Arithmetic operators are useful in data updates. In Oracle, the ROLLBACK command undoes all changes made since the last COMMIT (for example, the last two UPDATE statements if neither has been committed)
Select (Restrict)
Unary operator that yields a horizontal subset of a table
Disadvantages of Not Storing Derived Attributes
Uses CPU processing cycles, Increases data access time, Adds coding complexity to queries
Entity relationship diagram (ERD)
Uses graphic representations to model database components
WHERE Subqueries
Uses inner SELECT subquery on the right side of a WHERE comparison expression. Value generated by the subquery must be of a comparable data type. If the query returns more than a single value, the DBMS will generate an error. Can be used in combination with joins
Divide
Uses one 2-column table as the dividend and one single-column table as the divisor
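Relational division can be sketched with plain Python sets. This assumes the dividend is a set of `(a, b)` pairs and the divisor a set of `b` values; the result holds every `a` that is paired with all divisor values. The supplier/part data is invented for illustration.

```python
def divide(dividend, divisor):
    """Return the a-values that appear with EVERY b-value of the divisor."""
    candidates = {a for a, _ in dividend}
    return {a for a in candidates if all((a, b) in dividend for b in divisor)}


# (supplier, part) pairs as the 2-column dividend.
supplies = {("s1", "p1"), ("s1", "p2"), ("s2", "p1")}
# Single-column divisor: the parts that must all be supplied.
required_parts = {"p1", "p2"}

suppliers_of_all = divide(supplies, required_parts)
```

Only `s1` supplies both parts, so `s2` is left out of the result.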
EER diagram (EERD)
Uses the EER model
Validates logical model
Using normalization Integrity constraints Against user requirements
Relvar
Variable that holds a relation
Left outer join
Yields all of the rows in the first table, including those that do not have a matching value in the second table
Right outer join
Yields all of the rows in the second table, including those that do not have matching values in the first table
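A left outer join can be tried in sqlite3; a right outer join is the mirror image with the table roles swapped. The sketch below uses only LEFT OUTER JOIN because older SQLite versions lack RIGHT JOIN. The `vendor` and `product` tables are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vendor (v_code INTEGER PRIMARY KEY, v_name TEXT)")
conn.execute("CREATE TABLE product (p_code TEXT PRIMARY KEY, v_code INTEGER)")
conn.executemany("INSERT INTO vendor VALUES (?, ?)", [(1, "Acme"), (2, "Bolt")])
conn.execute("INSERT INTO product VALUES ('A1', 1)")

# Every vendor row is kept; vendors without a product get NULL (None) for p_code.
left_rows = conn.execute("""
    SELECT v.v_code, v.v_name, p.p_code
    FROM vendor v
    LEFT OUTER JOIN product p ON p.v_code = v.v_code
    ORDER BY v.v_code
""").fetchall()
conn.close()
```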
Product
Yields all possible pairs of rows from two tables
Difference
Yields all rows in one table that are not found in the other table
Intersect
Yields only the rows that appear in both tables
The LOG_BUFFER parameter is a static parameter. How can you change it?
You can change it in the parameter file (for example with SCOPE=SPFILE), but the new value takes effect only at the next startup
Consider this line from a listener.ora file: L1=(description=(address=(protocol=tcp)(host=serv1)(port=1521))) What will happen if you issue this connect string: connect scott/tiger@L1
You can't tell - it depends on how the client side is configured
You issue the URL https://127.0.0.1:5500/em and receive an error. What could be the problem?
You have not started the database listener Database Express is running on a different port You are not logged on to the database server node You have not started the database
Client
a program that requests and uses a server's resources
User Schema
the part of a user's account in which the user creates tables and other objects; this area is known as the user's schema
Multidimensional OLAP (MOLAP)
array based storage structures, direct access to array data structures, so fast access times
A form of denormalization where the same data are purposely stored in multiple places in the database is called:
data replication
A detailed coding scheme recognized by system software for representing organizational data is called a(n):
data type
Average linkage (clustering)
distance b/w 2 clusters A and B = (1/(|A|*|B|)) ∑a∑b d(a,b)
Complete linkage (clustering)
distance b/w 2 clusters A and B = max{d(a,b): a∈A, b∈B}
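Both linkage definitions (average and complete) can be written straight from their formulas. A minimal sketch, assuming clusters are lists of points and `d` is any distance function passed in by the caller; the one-dimensional sample data is invented.

```python
from itertools import product


def average_linkage(A, B, d):
    """(1 / (|A| * |B|)) * sum of d(a, b) over all pairs a in A, b in B."""
    return sum(d(a, b) for a, b in product(A, B)) / (len(A) * len(B))


def complete_linkage(A, B, d):
    """max{ d(a, b) : a in A, b in B }."""
    return max(d(a, b) for a, b in product(A, B))


def abs_distance(a, b):
    return abs(a - b)


cluster_a = [0.0, 2.0]
cluster_b = [5.0, 9.0]
# Pairwise distances: 5, 9, 3, 7
avg_dist = average_linkage(cluster_a, cluster_b, abs_distance)
max_dist = complete_linkage(cluster_a, cluster_b, abs_distance)
```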
The ________ states that no primary key attribute may be null.
entity integrity rule
A command used in Oracle to display how the query optimizer intends to access indexes, use parallel servers and join tables to prepare query results is the:
explain plan
Practical significance of data dependence
is difference between logical and physical format
Market Basket (Association Rules)
items bought by one customer in a single transaction; more generally, a group of entities that can be considered related for the purposes of data mining
An index on columns from two or more tables that come from the same domain of values is called a:
join index.
Column constraint
limits the value that can be placed in a specific column, irrespective of values that exist in other table rows
A form of database specification which maps conceptual requirements is called:
logical specifications
What are the three types of problems caused by interleaving transactions?
Lost update problem, uncommitted dependency problem, inconsistent analysis problem
The need to ________ relations commonly occurs when different views need to be integrated.
merge
Attribute Hierarchies
most common form of attribute relation. Either linear or lattice structure
Entity integrity
no attribute in a primary key is null
A(n) ________ is a field of data used to locate a related field or record.
pointer
Server DBMS process
retrieves data from the database and performs the requested functions on the data
What will be the setting of the OPTIMIZER_MODE parameter for your session after the next start up if you issue these commands: alter system set optimizer_mode=all_rows scope=spfile; alter system set optimizer_mode=rule; alter session set optimizer_mode=first_rows;
rule
One field or combination of fields for which more than one record may have the same combination of values is called a(n):
secondary key
A key decision in the physical design process is:
selecting structures
Snowflake Schema
set up like a star schema, but with normalized (and connected) versions of the dimension tables (each dimension may now have more than one table)
How does the timestamp method guarantee serializability?
since all the arcs in the precedence graph are: Transaction with smaller timestamp ---> transaction with larger timestamp. Therefore no cycle and no deadlocks! But may not be recoverable
Fixed-Point
specific number of digits on both the left and right of the decimal
Exec SQL statement
statement used to identify embedded SQL requests to the preprocessor: EXEC SQL <embedded SQL statement> END_EXEC
Knowledge Discovery in Databases (KDD)
the nontrivial process of identifying valid, potentially useful, and ultimately understandable patterns in data.
You have to use local naming. Which file(s) must you create on the client machine?
tnsnames.ora only
When a session changes data, where does the change get written?
To the data block in the cache, and the redo log buffer
Computer-Aided Systems Engineering (CASE)
Tool that produces: Time and cost effective systems Structured, documented, and standardized applications
Specialization
Top-down process, Identifies lower-level, more specific entity subtypes from a higher-level entity supertype, Based on grouping unique characteristics and relationships of the subtypes
Systems Development Life Cycle (SDLC)
Traces history of an information system Provides a picture within which database design and application development are mapped out and evaluated Iterative rather than sequential process
What is Isolation?
Transactions execute independently -the partial effects of an incomplete transaction must not be visible to other transactions
Binary relationship
Two entities are associated
Physical data independence Logical data independence
Two kinds of data independence
Specific integrity rule General integrity rule
Two relational integrity rules
Table
Two-dimensional structure composed of rows and columns
Project
Unary operator that yields a vertical subset of a table
Entity Relationship Model Advantages
Visual modeling yields conceptual simplicity, Visual representation makes it an effective communication tool, Is integrated with the dominant relational model
Big Data Challenges
Volume does not allow the usage of conventional structures, Expensive, OLAP tools proved inconsistent dealing with unstructured data
Big Data Characteristics
Volume, Velocity, Variety
Use of Composite Primary Keys Identifiers of weak entities
Weak entity has a strong identifying relationship with the parent entity
What is a lock?
When a transaction accesses the database the lock denies access to other transactions to prevent incorrect results
When is a non-serial schedule view serializable?
When it is view equivalent to serial schedule (returns same result)
Under what circumstances would a connection through a Database Resident Connection Pool (SERVER=POOLED) connection be suitable?
When many short-lived connections share a schema
When are pairs of operations in conflict?
When one transaction writes a data item and another one either reads or writes the same data item.
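The conflict rule can be written as a small predicate. This is a sketch under an assumed encoding: each operation is a (transaction, action, item) tuple with action "R" or "W"; none of these names come from the source.

```python
def in_conflict(op1, op2):
    """Return True when two operations conflict.

    Operations conflict when they belong to different transactions,
    touch the same data item, and at least one of them is a write.
    """
    t1, a1, x1 = op1
    t2, a2, x2 = op2
    return t1 != t2 and x1 == x2 and "W" in (a1, a2)

print(in_conflict(("T1", "W", "x"), ("T2", "R", "x")))  # write/read on x
print(in_conflict(("T1", "R", "x"), ("T2", "R", "x")))  # two reads
print(in_conflict(("T1", "W", "x"), ("T2", "W", "y")))  # different items
```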
SQL Indexes
When primary key is declared, DBMS automatically creates unique index
How is deadlock formed?
When two or more transactions wait for each other -> They can wait forever.
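One common way to detect this situation is to look for a cycle in a wait-for graph. The sketch below assumes a hypothetical dictionary that maps each transaction to the set of transactions it is waiting on; it is an illustration, not how any particular DBMS implements detection.

```python
def has_deadlock(waits_for):
    """Depth-first search for a cycle in the wait-for graph."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    color = {}

    def visit(t):
        color[t] = GREY
        for u in waits_for.get(t, ()):
            c = color.get(u, WHITE)
            # Reaching a GREY node means we walked back onto our own path.
            if c == GREY or (c == WHITE and visit(u)):
                return True
        color[t] = BLACK
        return False

    return any(color.get(t, WHITE) == WHITE and visit(t)
               for t in list(waits_for))

print(has_deadlock({"T1": {"T2"}, "T2": {"T1"}}))   # T1 and T2 wait on each other
print(has_deadlock({"T1": {"T2"}, "T2": set()}))    # T2 can finish, so no deadlock
```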
Data Series
When you turn a sequence of ordered points into a "time series", but ordering unrelated to time; then can use similar/the same techniques on these series
Time Series
a collection of observations made sequentially in time (useful because intuitive, easy to understand).
Server
a computer that shares its resources with other computers (terminals)
Data Warehouse
a decision support DB that is maintained separately from the organization's operational DBs. Collection of data that is used primarily for organizational decision making/ analysis Usually pulls data from different sources, important to sanitize, normalize, and preprocess the data
Server process
a program that listens to requests for resources from clients and then responds to those requests
Star Schema
a single fact table and a single table for each dimension, generated keys used for maintenance, performance reasons fact table: keys == foreign keys into dimension table (dimension attributes), other attributes = measure (aka dependent) attributes fact table much larger than dimension tables, normalized
Data Mining
a step in the KDD process consisting of applying data analysis and discovery algorithms that, under acceptable computational efficiency limitations, produce a particular enumeration of patterns over the data
What is timestamp?
a unique identifier TS(Ti) that indicates the relative starting time of a transaction Ti
Syntax to create SQL indexes
CREATE INDEX indexname ON tablename(columnlist);
Syntax to delete an index
DROP INDEX indexname;
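A short sketch of both statements, run against an in-memory SQLite database through Python's sqlite3 module. The product table and the index name p_price_ndx are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (p_code INTEGER PRIMARY KEY, "
    "p_descript TEXT, p_price REAL)")

def index_names(connection):
    """List the indexes SQLite has recorded in its catalog."""
    return [r[0] for r in connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index'")]

# CREATE INDEX indexname ON tablename(column)
conn.execute("CREATE INDEX p_price_ndx ON product(p_price)")
after_create = index_names(conn)

# DROP INDEX indexname
conn.execute("DROP INDEX p_price_ndx")
after_drop = index_names(conn)

print(after_create, after_drop)
```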
On-Line Analytical Processing (OLAP)
information technology to help the knowledge worker (executive, manager, analyst) to make faster and better decisions (element of decision support systems, more high level than SQL) - mostly read operations, can optimize on that
Data access layer
interfaces between business logic layer and the underlying database provides mapping from object model of business layer to relational model of database
Dynamic Time Warping
Intuition: we copy an element multiple times so as to achieve a better matching. Create a matrix W of size |Q| by |C|, where W_i,j = dist(q_i, c_j) for some distance metric. Compute the best/shortest "warping path" between Q and C with DTW(Q, C) = min{sqrt(∑(k=1 to K) w_k)/K}, where K is the number of weights being added. Dynamic programming can be used to find the shortest path.
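The dynamic-programming step can be sketched as follows. This is a simplified variant, not the exact formula on the card: it returns the minimum cumulative cost of a warping path and leaves out the square root and the division by K. The function name and the squared-difference distance are assumptions.

```python
def dtw_cost(Q, C, dist=lambda q, c: (q - c) ** 2):
    """Minimum cumulative cost of a warping path between Q and C."""
    n, m = len(Q), len(C)
    INF = float("inf")
    # D[i][j] = cheapest path cost aligning Q[:i] with C[:j].
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = dist(Q[i - 1], C[j - 1])          # W_i,j from the card
            # Extend the cheapest of the three allowed predecessor cells.
            D[i][j] = w + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

# The repeated 2 in the second sequence is absorbed by warping.
print(dtw_cost([1, 2, 3], [1, 2, 2, 3]))
```

Matching one element of Q against two elements of C is the "copy an element multiple times" intuition: the path simply stays on the same row for an extra step.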
A method that speeds query processing by running a query at the same time against several partitions of a table using multiprocessors is called:
parallel query processing
A functional dependency in which one or more nonkey attributes are functionally dependent on part, but not all, of the primary key is called a ________ dependency.
partial functional
An attribute (or attributes) that uniquely identifies each row in a relation is called a:
primary key
Business-logic layer
provides high level view of data and actions on data hides details of data storage schema
Precision
total number of digits both to the left and to the right of the decimal point
A method for handling missing data is to:
track missing data with special reports
Augment transactions to get generalized association rules
transaction T -> {items in T + all of the ancestors of items in T} NOTE: if we have a large itemset Y∪x, we can eliminate all other itemsets Y∪ancestor(x) as redundant
Support for generalized association rules
transaction T supports an item x if: x is an item in T, or x is an ancestor of an item in T note: if {x,y} has above threshold support, then {x, \hat(y)} where \hat(y) is an ancestor of y also has above threshold support (sim for {\hat(x), y} and {\hat(x), \hat(y)})
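Augmenting a transaction with ancestors can be sketched like this. The type hierarchy is stored in a hypothetical child-to-parent dictionary, and the item names are invented for the example.

```python
def ancestors(item, parent):
    """All ancestors of an item, walking up the type hierarchy."""
    found = []
    while item in parent:
        item = parent[item]
        found.append(item)
    return found

def augment(transaction, parent):
    """Items in T plus every ancestor of every item in T."""
    result = set(transaction)
    for item in transaction:
        result.update(ancestors(item, parent))
    return result

# Hypothetical hierarchy: jacket -> outerwear -> clothes, shirt -> clothes.
parent = {"jacket": "outerwear", "ski pants": "outerwear",
          "outerwear": "clothes", "shirt": "clothes"}

print(sorted(augment({"jacket", "boots"}, parent)))
```

After augmentation, an ordinary support count over the extended transactions already counts a transaction as supporting "outerwear" whenever it contains a jacket.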
Database access frequencies are estimated from:
transaction volumes
A functional dependency between two or more nonkey attributes is called a:
transitive dependency
Interestingness (Association Rules)
try to model departure from independence of A and B = Pr(A, B) / (Pr(A)*Pr(B)) -> can model with Supp(A∪B)/(Supp(A)*Supp(B)) range 0 to positive inf - = 1 if A and B are independent
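A small sketch of the ratio, with support computed as the fraction of transactions containing an itemset. The function names and the four sample transactions are assumptions; the sample is built so that a and b are independent.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def interestingness(A, B, transactions):
    """Supp(A ∪ B) / (Supp(A) * Supp(B)); undefined if either support is 0."""
    joint = support(set(A) | set(B), transactions)
    return joint / (support(A, transactions) * support(B, transactions))

# a appears in half the transactions, b in half, and both in a quarter.
transactions = [{"a", "b"}, {"a"}, {"b"}, set()]
print(interestingness({"a"}, {"b"}, transactions))
```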
Generalized Association Rules
using (forest of) type hierarchies to generalize items to a less specific type (and then getting association rules based on some of these)
Floating-Point
variable number of decimal places
A relation that contains minimal redundancy and allows easy use is considered to be:
well-structured
Horizontal partitioning makes sense:
when different categories of a table's rows are processed separately
Fact Constellation
when multiple fact tables share common dimension tables
Integer
whole numbers
Sources of Business Rules
Company managers, Policy makers, Department managers, Written documentation, Direct interviews with end users
Data Hardware Software End users Procedure
Components of the dbms environment (5)
Associative Entities
Composed of the primary key attributes of each parent entity
Q.23 In the figure below, the primary key for "Order Line" is which type of key?
Composite
Database Design Process
Conceptual Design: data analysis and requirements; entity relationship modeling and normalization; data model verification; distributed database design. Select the DBMS. Logical Design: map conceptual model to logical model components; validate logical model using normalization; validate logical model integrity constraints; validate logical model against user requirements. Physical Design: define data storage organization; define integrity and security measures; determine performance measures.
ERD depicts the
Conceptual database as viewed by end user
Schema
Conceptual organization of the entire database as viewed by the database administrator
Network Model Advantages
Conceptual simplicity, Handles more relationship types, Data access is flexible, Data owner/member relationship promotes data integrity, Conformance to standards, Includes data definition language (DDL) and data manipulation language (DML)
Database Environments
Conceptual, Logical, and physical
Entity integrity
Condition in which each row in the table has its own unique identity
Overlapping subtypes
Contain nonunique subsets of the supertype entity set, Implementation requires the use of one discriminator attribute for each subtype
Character data type
Contain strings that are alphanumeric not used in calculations
General purpose databases
Contains a wide variety of data used in multiple disciplines
Object
Contains data and their relationships with operations that are performed on it
Discipline specific databases
Contains data focused on specific subject areas
Linking Table
Contains the primary key of each of the tables being connected
Entity subtype
Contains unique characteristics of each entity subtype
During the transition from NOMOUNT to MOUNT mode, which files are required?
Controlfiles
PL/SQL Processing with Cursors
Cursor-style processing involves retrieving data from the cursor one row at a time. The current row is copied to PL/SQL variables.
DDL DML can provide security system Can provide an integrity system Can provide a concurrency control system Can provide a recovery control system
Different facilities that DBMS provide? (6)
Synonym
Different names are used to describe the same attribute
Data inconsistency
Different versions of the same data appear in different places
Use of Composite Primary Keys Identifiers of composite entities
Each primary key combination is allowed only once in the M:N relationship. When used as identifiers of weak entities, composite primary keys represent a real-world object that is existence-dependent on another real-world object and is represented in the data model as two separate entities in a strong identifying relationship.
Collection of tables stored in the database
Each table is independent from another, Rows in different tables are related based on common values in common attributes
Normalization Process Objective is to ensure that each table conforms to the concept of well-formed relations
Each table represents a single subject, No data item will be unnecessarily stored in more than one table, All nonprime attributes in a table are dependent on the primary key, Each table is void of insertion, update, and deletion anomalies
Conversion to First Normal Form
Enable reducing data redundancies Steps Eliminate the repeating groups Identify the primary key Identify all dependencies All relational tables satisfy 1NF requirements Some tables contain partial dependencies Subject to data redundancies and various anomalies
Online analytical processing (OLAP)
Enable retrieving, processing, and modeling data from the data warehouse
Security management
Enforces user security and data privacy
Keys Used to
Ensure that each row in a table is uniquely identifiable
Existence independence
Entity exists apart from all of its related entities
Existence dependence
Entity exists in the database only when it is associated with another related entity occurrence
Data manipulation language (DML)
Environment in which data can be managed and is used to work with the data in the database
Keys Used to
Establish relationships among tables and to ensure the integrity of the data
The Rule of Precedence
Establish the order in which computations are completed
Requirements for Good Normalized Set of Tables
Evaluate PK assignments and naming conventions Refine attribute atomicity, Identify new attributes and new relationships, Refine primary keys as required for data granularity, Maintain historical accuracy and evaluate using derived attributes
Normalization
Evaluating and correcting table structures to minimize data redundancies Reduces data anomalies Assigns attributes to tables based on determination, Properly designed 3NF structures meet the requirement of fourth normal form (4NF)
Updatable view restrictions
GROUP BY expressions or aggregate functions cannot be used; set operators cannot be used; JOINs or group operators cannot be used
Entity supertype
Generic entity type related to one or more entity subtypes, Contains common characteristics
What protocol(s) can be used to contact Database Express?
HTTP HTTPS
Big Data New Technologies
Hadoop, Hadoop Distributed File System (HDFS), MapReduce, NoSQL
Weak Entity Conditions
Has a primary key that is partially or totally derived from parent entity in the relationship
In which type of file is multiple key retrieval not possible?
Hashed
Which type of file is easiest to update?
Hashed
Natural join
Have common values in common columns
Relvar
Heading contains the names of the attributes and the body contains the relation
Reasons for Identifying and Documenting Business Rules
Help standardize company's view of data, Communications tool between users and designers
Composite entity (Bridge or associative entity):
Helps avoid problems inherent to M:N relationships, includes the primary keys of tables to be linked
NoSQL Advantages
High scalability, availability, and fault tolerance are provided, Uses low-cost commodity hardware, Supports Big Data, Key-value model improves storage efficiency
Structural point of view of normal forms
Higher normal forms are better than lower normal forms
Explicit cursor
Holds the output of a SQL statement that may return two or more rows
Data Dictionary and the System Catalog
Homonyms and synonyms must be avoided to lessen confusion
Deadlock thoughts...
How far to roll back 'victim'? Choice of deadlock victim? How long a transaction has been running? How many still to update? Avoid starvation? Solution: store number of times it has been aborted. Next time use different selection criteria.
Translating Business Rules into Data Model Components Questions to identify the relationship type
How many instances of B are related to one instance of A?, How many instances of A are related to one instance of B?
Which form of compression uses compression algorithms rather than de-duplication algorithms?
Hybrid Columnar Compression
International Business Machines
IBM means
The Physical Model
Operates at lowest level of abstraction, Describes the way data are saved on storage media such as disks or tapes, Requires the definition of physical storage and data access methods-level details
Index
Orderly arrangement to logically access rows in a table
Secondary key
Other term for alternative key
Attribute
Other term for column
Tuple/record
Other term for row
Relation
Other term for table
Divide
Output is a single column that contains all values from the second column of the dividend that are associated with every row in the divisor
No value
Output of the outer query might result in an error or a null empty set
What will happen if you do not run the CATALOG.SQL and CATPROC.SQL scripts after creating a database?
It will not be possible to query the data dictionary views.
Data modeling
Iterative and progressive process of creating a specific data model for a determined problem domain
JDBC
Java Database Connectivity - works with Java Connects a java program to the SQL server (mysql-connector-java-5.1.39-bin.jar)
Java Server Page
Java technology that allows dynamically generated HTML pages. Server executes the script embedded within the HTML before it generates a web page. Java code enclosed between <% ... %> (index.jsp)
Client side scripting
JavaScript often used. Security is needed to prevent malicious scripts from damaging client machines. Easy to limit in scripting languages, harder to limit general use languages like Java. Java applet does not make system calls directly. Prevents file writes and notifies user about potentially dangerous actions.
Cross Correlation (time series)
Keep one sequence static and slide the other to compute the correlation (inner product) of each shift; determine shift that gives maximum correlation. Eq: see notes
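The sliding procedure can be sketched as below. This is an unnormalized version (a plain inner product over the overlapping part of the two sequences), which may differ from the equation in the notes; the function name and sample sequences are assumptions.

```python
def best_shift(x, y, max_shift):
    """Slide y against a static x and return (shift, correlation) at the maximum.

    For shift s, x[i] is paired with y[i + s]; pairs that fall outside
    y are skipped.
    """
    best = None
    for s in range(-max_shift, max_shift + 1):
        corr = sum(x[i] * y[i + s]
                   for i in range(len(x)) if 0 <= i + s < len(y))
        if best is None or corr > best[1]:
            best = (s, corr)
    return best

x = [0, 1, 2, 1, 0, 0]
y = [0, 0, 0, 1, 2, 1]   # same bump as x, delayed by two positions
print(best_shift(x, y, 3))
```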
Private Key
Key known only to individual user, and used to decrypt data. Need not be transmitted to the site doing encryption.
Composite key
Key that consists of two or more attributes that uniquely identify an entity occurrence
Composite key
Key that is composed of more than one attribute
Advanced Data Definition Commands
Keywords used with the ALTER TABLE command: ADD (adds a column), MODIFY (changes column characteristics), DROP (deletes a column). Also used to add table constraints and remove table constraints.
Disadvantages of Storing Derived Attributes
Requires constant maintenance to ensure derived value is current, especially if any values used in the calculation change
Relational Model Disadvantages
Requires substantial hardware and system software overhead, Conceptual simplicity gives untrained people the tools to use a good system poorly, May promote information problems
UNIQUE constraint
Restriction placed on a column to ensure that no duplicate values exist for that column
Extended Entity Relationship Model (EERM)
Result of adding more semantic constructs to the original entity relationship (ER) model
Developing an ER Diagram
Revise and review ERD
What is required before shrinking a table?
Row movement must be enabled Automatic segment space management must be enabled
Entity instance or entity occurrence
Rows in the relational table
Following syntax enables to specify which rows to select
SELECT columnlist FROM tablelist [WHERE conditionlist];
Copying Parts of Tables
SQL permits copying contents of selected table columns, so data need not be reentered manually into newly created table(s). The table structure is created first; rows are then added to the new table using rows from another table.
Format of Association Rules
LHS => RHS, meaning something along the lines of "if item(s) of the LHS are in a transaction, item(s) of the RHS are also likely to be in the same transaction"
Problems with File System Data Processing
Lengthy development times, Difficulty of getting quick answers, Complex system administration, Lack of security and limited data sharing, Extensive programming
Query language
Lets the user specify what must be done without having to specify how
Granularity
Level of detail represented by the values stored in a table's row
Entity Relationship Model Disadvantages
Limited constraint representation, Limited relationship representation, No data manipulation language, Loss of information content occurs when attributes are removed from entities to avoid crowded displays
Natural join
Links tables by selecting only the rows with common values in their common attributes
Equijoin
Links tables on the basis of an equality condition that compares specified columns of each table
Relationships
Links that show how different records are related
The Database Schema
Logical group of database objects related to each other
Additional SELECT Query Keywords
Logical operators work well in the query environment. SQL provides useful functions that count, find minimum and maximum values, and calculate averages. SQL allows the user to limit queries to entries having no duplicates or whose duplicates may be grouped.
Logical View of Data
Logical simplicity yields simple and effective database design methodologies
Database Systems
Logically related data stored in a single logical data repository, DBMS eliminates most of file system's problems, Current generation DBMS software
Which process is responsible for sending the alert when a tablespace usage critical threshold is reached?
MMON, the Manageability Monitor process
Density - MOLAP vs. ROLAP
MOLAP tends to work well with dense data, and ROLAP with sparser data (hybrid systems can pick representation based on density of data to exploit this difference)
DDL DML DQL DCL TCC
Main categories of sql commands (5)
Conversion to Second Normal Form Steps
Make new tables to eliminate partial dependencies Reassign corresponding dependent attributes
Conversion to Third Normal Form Steps
Make new tables to eliminate transitive dependencies Reassign corresponding dependent attributes
Database Administrator
Manage and maintain DBMS and databases, database fundamentals, SQL, vendor courses
Hierarchical Models
Manage large amounts of data for complex manufacturing projects, Represented by an upside-down tree which contains segments, Depicts a set of one-to-many (1:M) relationships
Extensible Markup Language (XML)
Manages unstructured data for efficient and effective exchange of all data types
Evolution of File System Data Processing
Manual File Systems -> Computerized File Systems -> File System Redux: Modern End User Productivity Tools
Outer join
Matched pairs are retained and unmatched values in the other table are left null
Relation or table
Matrix composed of intersecting tuple and attribute
Associative Entities
May also contain additional attributes that play no role in connective process
Data definition language
Meaning of ddl
Equality or inequality
Meet a given join condition
Candidate Key
Minimal (irreducible) super key; a superkey that does not contain a subset of attributes that is itself a superkey
Integrated
Minimize data redundancy
Data integrity management
Minimizes redundancy and maximizes consistency
________ are anomalies that can be caused by editing data in tables.
Modification
Create table Alter table Drop table Create index Alter index Drop index create view Drop view
Most fundamental DDL commands (8)
A virtual table
Multicolumn, multirow set of values
Which of the following statements about listeners is correct? A. A listener can connect you to one instance only B. A listener can connect you to one service only C. Multiple listeners can share one network interface card D. An instance will only accept connections from the listener specified on the local_listener parameter
Multiple listeners can share one network interface card
Second Normal Form (2NF)
Must be in 1NF and have no partial dependencies; a partial dependency means a field in the table depends on only part of the primary key
Third Normal Form (3NF)
Must be in 2NF and have no transitive dependencies; a transitive dependency means a field depends on another field in the table that is not part of the primary key
Values in a column
Must conform to same data format
Required attribute
Must have a value, cannot be left empty
Features of table creating command sequence
NOT NULL specification; UNIQUE specification
NUMBER
NUMBER [([Precision,] [scale])] Three subtypes of number int, fixed_point, and floating_point
Stored Procedures
Named collection of procedural and SQL statements. Advantages: reduce network traffic and increase performance; reduce code duplication by means of code isolation and code sharing.
Database Design Challenges: Conflicting Goals
Need for high processing speed may limit the number and complexity of logically desirable relationships
Database Design Challenges: Conflicting Goals
Need for maximum information generation may lead to loss of clean design structures and high transaction speed
What are the advantages of multiversion timestamp?
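Increased concurrency: because several versions of every data item are kept, a read can be served from an older version instead of waiting or being rejected.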
Entity Integrity Example
No invoice can have a duplicate number, nor can it be null
Entity integrity
No key attribute in the primary key can contain a null
Which of the following are properties of relations?
No two rows in a relation are identical
Desirable Primary Key Characteristics
Non intelligent, No change over time, Preferably single-attribute, Preferably numeric, Security-compliant
What action should you take after terminating the instance with SHUTDOWN ABORT?
None, Recovery will be automatic
Structured Query Language (SQL)
Nonprocedural language with a basic command vocabulary of fewer than 100 words. Differences in SQL dialects are minor.
1:M relationship
Norm for relational databases
Normalization and Database Design
Normalization should be part of the design process. Proposed entities must meet the required normal form before table structures are created. Principles and normalization procedures must be understood in order to redesign and modify databases. The ERD is created through an iterative process. Normalization focuses on the characteristics of specific entities.
NoSQL Databases
Not based on the relational model, Support distributed database architectures, Provide high scalability, high availability, and fault tolerance, Support large amounts of sparse data, Geared toward performance rather than transaction consistency, Store data in key-value stores
Raw Data
Not yet been processed to reveal the meaning
When updating rows locally and through a database link in one transaction, what must you do to ensure a two-phase commit?
Nothing special because two-phase commit is automatic
Translating Business Rules into Data Model Components
Nouns translate into entities, Verbs translate into relationships among entities, Relationships are bidirectional
Union-compatible
Number of attributes are the same and their corresponding data types are alike
Common SQL Data Types
Numeric: NUMBER(L,D) or NUMERIC(L,D). Character: CHAR(L), VARCHAR(L) or VARCHAR2(L). Date: DATE
ANSI SQL allows use of following clauses to cover CASCADE, SET NULL, or SET DEFAULT
ON DELETE and ON UPDATE
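A minimal sketch of ON DELETE CASCADE using Python's sqlite3 module. The customer and invoice tables are hypothetical, and the PRAGMA line is specific to SQLite, which leaves foreign key enforcement off unless it is switched on for the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite-specific: enable FK enforcement
conn.executescript("""
    CREATE TABLE customer (cus_code INTEGER PRIMARY KEY);
    CREATE TABLE invoice (
        inv_number INTEGER PRIMARY KEY,
        cus_code   INTEGER REFERENCES customer(cus_code)
                   ON DELETE CASCADE ON UPDATE CASCADE
    );
    INSERT INTO customer VALUES (1), (2);
    INSERT INTO invoice VALUES (100, 1), (101, 1), (102, 2);
""")

# Deleting the parent row removes its dependent invoice rows as well.
conn.execute("DELETE FROM customer WHERE cus_code = 1")
remaining = [r[0] for r in conn.execute(
    "SELECT inv_number FROM invoice ORDER BY inv_number")]
print(remaining)
```

SET NULL or SET DEFAULT would keep the invoice rows and overwrite their cus_code instead of deleting them.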
Ordering a Listing
ORDER BY clause is useful when listing order is important. Syntax: SELECT columnlist FROM tablelist [WHERE conditionlist] [ORDER BY columnlist [ASC | DESC]]; Cascading order sequence: a multilevel ordered sequence, created by listing several attributes after the ORDER BY clause.
Inheritance
Object inherits methods and attributes of parent class
Hibernate
Object relational mapping system. Supports a query language that can express complex queries involving joins. Allows relationships to be mapped to sets associated with objects.
Design Case 4: Redundant Relationships
Occur when there are multiple relationship paths between related entities, Need to remain consistent across the model, Help simplify the design
Design Case 3: Design trap
Occurs when a relationship is improperly or incompletely identified, Represented in a way not consistent with the real world
Design Case 3: Fan Trap
Occurs when one entity is in two 1:M relationships to other entities, Produces an association among other entities not expressed in the model
A list of values
One column and multiple rows
One single value
One column and one row
1:1 relationship
One entity can be related to only one other entity and vice versa
Optional Relationship participation
One entity occurrence does not require a corresponding entity occurrence in a particular relationship
Mandatory Relationship participation
One entity occurrence requires a corresponding entity occurrence in a particular relationship
Identifiers
One or more attributes that uniquely identify each entity instance
Many to many
One or more rows in a table can be related to 0, 1 or many rows in another table
Differential backup
Only modified/updated objects since last full backup are backed up
Inner join
Only returns matched records from the tables that are being joined
Inner join
Only rows that meet a given criterion are selected
Transaction log backup
Only the transaction log operations that are not reflected in a previous backup are backed up
ODBC
Open Database Connectivity - works with C, C++, C#, and visual basic
Set-oriented
Operate over entire sets of rows and columns at once
Dynamic SQL
SQL statement is generated at run time. The attribute list and condition are not known until the end user specifies them. Slower than static SQL; requires more computer resources.
Transitive dependency
An attribute functionally depends on another nonkey attribute
What is an outer join?
An extension of the natural join operation that avoids loss of information. Computes the natural join and then adds to the result the tuples from relation R that do not match any tuple in relation S, padding the missing attributes with null values.
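A short sketch of a left outer join in SQLite through Python's sqlite3 module. The relations r and s and their rows are invented; the unmatched tuple from r comes back with None (SQL NULL) in the column taken from s.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE r (id INTEGER, name TEXT);
    CREATE TABLE s (id INTEGER, city TEXT);
    INSERT INTO r VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO s VALUES (1, 'Oslo');
""")

# Bob has no matching row in s, so his city is padded with NULL.
rows = conn.execute("""
    SELECT r.id, r.name, s.city
    FROM r LEFT OUTER JOIN s ON r.id = s.id
    ORDER BY r.id
""").fetchall()
print(rows)
```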
Which statements are correct about extents? A. An extent is a consecutive grouping of Oracle blocks B. An extent is a random grouping of Oracle blocks C. An extent can be distributed across one or more datafiles D. An extent can contain blocks from one or more segments E. An extent can be assigned to only one segment
An extent is a consecutive grouping of Oracle blocks An extent can be assigned to only one segment
What statements are correct about extents? A. An extent is a grouping of several Oracle blocks B. An extent is a grouping of several operating system blocks C. An extent can be distributed across one or more datafiles D. An extent can contain blocks from one or more segments E. An extent can be assigned to only one segment
An extent is a grouping of several Oracle blocks An extent can be assigned to only one segment
Large itemset
An itemset that satisfies the minimum support requirement (ie, Supp(itemset)≥min_supp)
Entity
An object about which you want to store data (i.e. students, faculty, courses)
Instance
An occurrence of a specific entity
Purpose of Database Initial Study
Analyze company situation Define problems and constraints Define objectives Define scope and boundaries
End users can use PL/SQL to create:
Anonymous PL/SQL blocks and triggers; stored procedures and PL/SQL functions
Database schema
Another term for schema
Structured data
It results from formatting, Structure is applied based on type of processing to be performed
Triggers
Procedural SQL code automatically invoked by RDBMS when given data manipulation event occurs
Procedural SQL
Procedural code is executed as a unit by DBMS when invoked by end user
What is Atomicity?
"All or nothing property" -either the transaction is performed entirely or not at all -the recovery subsystem of the DBMS is responsible for ensuring this
What is the lost update problem?
"Override by mistake" -an apparently successful completed update operation by one user is overridden by another user
Confidence (of an association rule)
% of the transactions containing LHS that ALSO contain RHS eq: |{transactions S s.t. (LHS ∪ RHS ⊆ S)}| / |{transactions S' s.t. (LHS)⊆S'}|
Support (of a set of items/ association rule)
% of the transactions containing all of the items in the rule (in both the LHS and RHS). eq: |{transactions S s.t. (LHS ∪ RHS)⊆S}| / |all transactions|
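The two formulas above (support and confidence) can be sketched in a few lines of Python. The basket data and the function names are made up for illustration; the sketch also assumes the LHS occurs in at least one transaction, otherwise the confidence is undefined.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)


def confidence(transactions, lhs, rhs):
    """Of the transactions containing LHS, the fraction that also contain RHS."""
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)


# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(baskets, {"bread", "milk"})          # 2 of 4 baskets
rule_confidence = confidence(baskets, {"bread"}, {"milk"})  # 2 of the 3 bread baskets
print(rule_support, rule_confidence)
```

For the rule bread ⇒ milk, 2 of the 4 baskets contain both items (support 0.5), and 2 of the 3 baskets containing bread also contain milk (confidence 2/3).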
0-I
(0,1) the one side is optional
0<-
(0,N) the many side is optional
II
(1,1) the one side is mandatory
I<-
(1,N) the many side is mandatory
One-to-one
(1:1)
One-to-many
(1:M)
Many-to-many
(M:N or M:M)
What is the cascading rollback problem? What is the solution?
-A transaction rolls back after a long time -causing a pile-up of rollbacks in the transactions that depend on it SOLUTION: release all locks at the end of every transaction
Course outline
-Structured / Relational Data model -Relational Calculus vs Relational Algebra -Enhanced Entity-Relational model (EER) -Structured Query language -Semi-structured data (SSD) -Querying XML:XPath+XQuery -Transactions - concurrency
What is multiversion timestamp ordering?
-generalized timestamp protocol. -keeps several versions of every data item to increase concurrency.
Types of XML validation
1. Well formed: single root element and matching tags (properly nested) 2. Valid: must be well formed and elements must follow a pre-defined structure, described in the DTD.
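The first check (well-formedness) can be sketched with Python's standard xml.etree.ElementTree parser, which rejects documents with mismatched tags or more than one root element. The helper name `is_well_formed` is invented here. Validating against a DTD is not covered by this sketch; as far as I know the standard-library parser does not validate, so that step would need a separate validating tool.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as XML: one root element, properly nested tags."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


nested_ok = is_well_formed("<book><title>DB</title></book>")
bad_nesting = is_well_formed("<book><title>DB</book></title>")
two_roots = is_well_formed("<book/><book/>")
print(nested_ok, bad_nesting, two_roots)
```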
Relational database
A database structured to recognize relations among stored items of information
The SYSAUX tablespace is mandatory. What will happen if you attempt to issue a CREATE DATABASE command that does not specify a datafile for the SYSAUX tablespace?
A default SYSAUX tablespace and datafile will be created.
Dimension Table (OLAP)
A dimension has attributes, which can occur in hierarchy (i.e., time over days, weeks, years)
Describe a precedence graph.
A directed graph G = (N, E) with a set of nodes N and a set of directed edges E, with: -a node for each transaction -a directed edge Ti → Tj whenever: *Tj reads a value of an item written by Ti or *Tj writes a value into an item after it has been read or written by Ti
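The edge rule above can be sketched in Python as follows. The schedule encoding (a list of `(transaction, operation, item)` triples with `"r"` for read and `"w"` for write) and both function names are assumptions made for this example. The cycle check reflects the standard result that a schedule is conflict serializable exactly when its precedence graph is acyclic.

```python
def precedence_graph(schedule):
    """Edges Ti -> Tj: Ti's operation comes first, same item, at least one write."""
    edges = set()
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            if ti != tj and item_i == item_j and "w" in (op_i, op_j):
                edges.add((ti, tj))
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in the directed graph given by `edges`."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # reached a node that is still on the current path
        visiting.add(node)
        if any(visit(nxt) for nxt in graph[node]):
            return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(visit(node) for node in graph)


# T2 writes X between T1's read and T1's write: edges T1 -> T2 and T2 -> T1.
interleaved = [("T1", "r", "X"), ("T2", "w", "X"), ("T1", "w", "X")]
# T1 finishes before T2 starts: only T1 -> T2.
serial = [("T1", "r", "X"), ("T1", "w", "X"), ("T2", "r", "X")]

print(precedence_graph(interleaved), has_cycle(precedence_graph(interleaved)))
print(precedence_graph(serial), has_cycle(precedence_graph(serial)))
```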
Fact Table (OLAP)
A fact has a number of dimensions, including at least one measure dimension (quantity that can be analysed by user)
Referential Integrity Requirement
A foreign key may have either a null entry or an entry that matches a primary key value in a table to which it is related
Java Servlet
A java class used to extend the capabilities of servers that host web applications. Defines an API for communicating between the web server and application program. (DBentry.java)
Column
A labelled element of a tuple
Candidate key
A minimal set of attributes that uniquely identify a tuple.
INTERSECT
Combines rows from two queries, returning only the rows that appear in both sets. Syntax: query INTERSECT query
Join columns
Common columns
ANY operator
Allows comparison of a single value to a list of values and selects only the rows for which the value is greater than or less than any value in the list
End-user interface
Allows end user to interact with the data
Join
Allows information to be intelligently combined from two or more tables
Character
Alphabetic or numeric
Associative Entities
Also known as composite or bridge entities
Alias
Alternate name given to a column or table in any SQL statement to improve the readability
Which of these background processes is optional? A. ARCn, the archive process B. CKPT, the checkpoint process C. DBWn, the database writer D. LGWR, the log writer E. MMON, the manageability monitor
ARCn, the archive process
Model
Abstraction of a real-world object or event
Database communication interfaces
Accept end user requests via multiple, different network environments
Structural dependence
Access to a file is dependent on its own structure, All file system programs are modified to conform to a new file structure
Manual File Systems
Accomplished through a system of file folders and filing cabinets
Trigger action based on DML predicates
Actions depend on the type of DML statement that fires the trigger
SELECT command
Acts as a subquery and is executed first
Comparison Operators
Add conditional restrictions on selected table contents § Used on: § Character attributes § Dates
Optional WHERE clause
Adds conditional restrictions to the SELECT statement
Encrypting
The Advanced Encryption Standard (AES), based on the Rijndael algorithm, is used. Public-key encryption is based on each user having two keys. If user1 wants to share data with user2, user1 encrypts the data using the public key of user2.
Joining Tables With an Alias
Alias identifies the source table from which data are taken § Any legal table name can be used as alias § Add alias after table name in FROM clause § FROM tablename alias
Full backup/dump:
All database objects are backed up in their entirety
Conversion to First Normal Form 1NF describes tabular format in which
All key attributes are defined, There are no repeating groups in the table, All attributes are dependent on the primary key
Entity integrity
All of the values in the primary key must be unique
Entity Integrity Requirement
All primary key entries are unique, and no part of a primary key may be null
TAB_COLUMNS
All table columns
Minimum data rule
All that is needed is there, and all that is there is needed
Study this tnsnames.ora file: test = (description = (address_list = (address = (protocol = tcp)(host = serv2)(port = 1521)) ) (connect_data = (service_name = prod) ) ) prod = (description = (address_list = (address = (protocol = tcp)(host = serv1)(port = 1521)) ) (connect_data = (service_name = prod) ) ) dev = (description = (address_list = (address = (protocol = tcp)(host = serv2)(port = 1521)) ) (connect_data = (service_name = dev) ) ) Which of the following statements are correct about the connect strings test, prod, and dev? A. All three are valid B. All three can succeed only if the instances are set up for dynamic instance registration C. The test connection will fail, because the connect string doesn't match the service name D. There will be a port conflict on serv2, because prod and dev try to use the same port
All three are valid All three can succeed only if the instances are set up for dynamic instance registration
Q.32 In the figure below, what is depicted?
An associative entity
Importance of Data Models
Are a communication tool, Give an overall view of the database, Organize data for various users, Are an abstraction for the creation of good database
Fields
Are columns that contain data items and character descriptions.
Records
Are rows of a collection of related fields that contain related information
Unary relationship
Association is maintained within a single entity
Class
Collection of similar objects with shared structure and behavior organized in a class hierarchy
Functional dependence (Generalized definition)
Attribute A determines attribute B if all of the rows in the table that agree in value for attribute A also agree in value for attribute B
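The generalized definition translates almost directly into a check over the rows of a table. In this sketch the rows are plain dictionaries and the helper `fd_holds` and the sample data are hypothetical. Note that such a check can only show that a dependency is not violated by the current rows, not that it holds by design.

```python
def fd_holds(rows, a, b):
    """True if all rows that agree on attribute `a` also agree on attribute `b`."""
    seen = {}
    for row in rows:
        # Remember the first b-value seen for each a-value; any later mismatch breaks A -> B.
        if seen.setdefault(row[a], row[b]) != row[b]:
            return False
    return True


# Hypothetical student rows.
students = [
    {"stu": 1, "major": "CS", "dept": "Eng"},
    {"stu": 2, "major": "CS", "dept": "Eng"},
    {"stu": 3, "major": "Bio", "dept": "Sci"},
]

major_determines_dept = fd_holds(students, "major", "dept")
dept_determines_stu = fd_holds(students, "dept", "stu")
print(major_determines_dept, dept_determines_stu)
```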
Subtype Discriminator
Attribute in the supertype entity that determines to which entity subtype the supertype occurrence is related, Default comparison condition is the equality comparison
Secondary Key
Attribute or combination of attributes used strictly for data retrieval purposes
Composite attribute
Attribute that can be subdivided to yield additional attributes
Simple attribute
Attribute that cannot be subdivided
Single-valued attribute
Attribute that has only a single value
Key attribute
Attribute that is a part of a key
Determinant
Attribute whose value determines another
Derived attribute
Attribute whose value is calculated from other attributes, Derived using an algorithm
Dependent
Attribute whose value is determined by the other attribute
Table Column
Attribute, and each column has a distinct name
Multivalued attributes
Attributes that have many values and require creating: Several new attributes, one for each component of the original multivalued attribute, A new entity composed of the original multivalued attribute's components
Implicit cursor
Automatically created when SQL statement returns only one value
Candidate Key
Any column that could be used as a primary key
Host language
Any language that contains embedded SQL statements
Union
Combines all rows from two tables, excluding duplicate rows
Special Operators
BETWEEN • Checks whether attribute value is within a range IS NULL • Checks whether attribute value is null LIKE • Checks whether attribute value matches given string pattern IN • Checks whether attribute value matches any value within a value list EXISTS • Checks if subquery returns any rows
Object/Relational Database Management System (O/R DBMS)
Based on ERDM, focuses on better data management
Object-oriented database management system (OODBMS)
Based on OODM
Determination
Based on the relationships among the attributes
The Object-Oriented Data Model (OODM)
Basic building block for autonomous structures, Abstraction of real-world entity
Conceptual schema
Basis for the identification and high-level description of the main data objects
Entity Relationship Model (ERM)
Basis of an entity relationship diagram (ERD)
Entity names - Required to:
Be descriptive of the objects in the business environment, Use terminology that is familiar to the users
Advantages of the DBMS
Better data integration and less data inconsistency, Increased end user productivity, Improved: Data sharing, Data security, Data access, Decision making
Where is the division between the client and the server in the Oracle environment?
Between the user process and the server process
BFILE
Binary file stored outside the database
An appropriate datatype for adding a sound clip would be:
Blob
Persistent stored module (PSM)
Block of code containing: § Standard SQL statements § Procedural extensions that is stored and executed at the DBMS server
Generalization
Bottom-up process, Identifies a higher-level, more generic entity supertype from lower-level entity subtypes, Based on grouping common characteristics and relationships of the subtypes
DTW Path Properties
Boundary Constraint (w must start and finish in the first and last points of the sequences); Continuity (at any given pt in w, we can only travel to neighboring pts); Monotonicity (points in w must be monotonically ordered)
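A minimal dynamic-programming sketch of dynamic time warping shows where the three path properties enter. It assumes one-dimensional numeric sequences and absolute difference as the local cost; the function name is made up for this example.

```python
def dtw_distance(a, b):
    """Cost of the cheapest warping path between sequences a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest path aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0  # boundary: the path starts at the first points
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # continuity and monotonicity: only steps from the three
            # neighbouring cells that never move backwards are allowed
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]  # boundary: the path ends at the last points


same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([1, 2, 3], [2, 3, 4])
print(same_shape, shifted)
```

The first pair differs only by a repeated point, so the warping path absorbs the repetition at zero cost.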
The normal form which removes any remaining functional dependencies because there was more than one primary key for the same nonkeys is called:
Boyce-Codd normal form
Business Rules
Brief, precise, and unambiguous description of a policy, procedure, or principle, Enable defining the basic building blocks, Describe main and distinguishing characteristics of the data
One segment can be spread across many datafiles. How?
By assigning multiple datafiles to a tablespace
UNION
Combines rows from two or more queries without including duplicate rows. Syntax: query UNION query
EXCEPT (MINUS)
Combines rows from two queries and returns only the rows that appear in the first set but not in the second § Syntax § query EXCEPT query § query MINUS query
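The UNION, INTERSECT, and EXCEPT cards can be tried out with Python's built-in sqlite3 module. This is only a sketch: tables `a` and `b` are invented, and SQLite spells the difference operator EXCEPT (MINUS is the Oracle spelling).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (x INTEGER);
    INSERT INTO a VALUES (1), (2), (2), (3);
    INSERT INTO b VALUES (2), (3), (4);
""")


def run(sql):
    """Return the single result column of a query as a list."""
    return [row[0] for row in conn.execute(sql)]


union_rows = run("SELECT x FROM a UNION SELECT x FROM b ORDER BY x")          # duplicates removed
intersect_rows = run("SELECT x FROM a INTERSECT SELECT x FROM b ORDER BY x")  # rows in both
except_rows = run("SELECT x FROM a EXCEPT SELECT x FROM b ORDER BY x")        # rows only in a
print(union_rows, intersect_rows, except_rows)
```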
Shared
Can be accessed by different users at the same time
PL/SQL Stored Functions
Can be invoked only from within stored procedures or triggers
Foreign key
Can be used to cross-reference tables
Deleting a Table from the Database
Can drop a table only if it is not the one side of any relationship § RDBMS generates a foreign key integrity violation error message if the table is dropped
Physical independence
Changes in physical model do not affect internal model
Logical independence
Changing internal model without affecting the conceptual model
An appropriate datatype for one wanting a fixed-length type for last name would include:
Char
NCLOB
Character LOB supports 2 byte character codes
CLOB
Character Large Object, stores up to 4GB of character data
Atomicity
Characteristic of an atomic attribute
Attribute
Characteristic of an entity, Columns
Attributes
Characteristics of entities
Value Constraints
Check Condition (CC), NOT NULL (NN), Unique (UK)
Metadata
Data about data, through which the end-user data are integrated and managed, Describe data characteristics and relationships
Data dependence
Data access changes when data storage characteristics change
CREATE VIEW statement
Data definition command that stores the subquery specification in the data dictionary § CREATE VIEW command § CREATE VIEW viewname AS SELECT query
Categories of SQL function
Data definition language (DDL) § Data manipulation language (DML)
DDL
Data definition language: specifies entity names/ attributes/ relationships for the stored data
Computerized File Systems
Data processing (DP) specialist: Created a computer based system that would track data and produce required reports
Data redundancy: to be controlled except in the following circumstances
Data redundancy must be increased to make the database serve crucial information purposes, Exists to preserve the historical accuracy of the data
Data independence
Data storage characteristics can be changed without affecting the program's ability to access the data
Structured data
Data that is represented in a strict format, e.g., the relational data model
If there are several databases created off the same Oracle Home, how will Database Express be configured?
Database Express will give access to each database through different ports.
Decrypting
Decrypting using public-key encryption is based on each user having two keys. If user1 wants to share data with user2, user1 encrypts the data with the public key of user2, and user2 then decrypts it with their own private key.
DATE
Default format (DD-MON-YY)
restricted actions
Deleting a table from a user schema, Changing an existing column's data type, Decreasing the width of an existing column, Adding a primary key constraint to an existing column, Adding a foreign key constraint, Adding a UNIQUE constraint, Adding a CHECK constraint, Changing a column's default value
________ problems are encountered when removing data with transitive dependencies.
Deletion
Dependency diagram
Depicts all dependencies found within given table structure Helps to get an overview of all relationships among table's attributes Makes it less likely that an important dependency will be overlooked
Specialization Hierarchy
Depicts arrangement of higher-level entity supertypes and lower-level entity subtypes, Relationships are described in terms of "is-a" relationships, Subtype exists within the context of a supertype, Every subtype has one supertype to which it is directly related, Supertype can have many subtypes
Conceptual data model
Describes main data entities, attributes, relationships, and constraints
Unified Modeling Language (UML)
Describes sets of diagrams and symbols to graphically model a system
Connectivity
Describes the relationship classification
Domain
Describes the set of possible values for a given attribute
Data dictionary
Description of all tables in the database created by the user and designer
Denormalization
Design goals: Creation of normalized relations, Processing requirements and speed. Number of database tables expands when tables are decomposed to conform to normalization requirements. Joining a larger number of tables: Takes additional input/output (I/O) operations and processing logic, Reduces system speed
Operational database
Designed to support a company's day to day operations
Conceptual Design
Designs a database independent of database software and physical details Designed as software and hardware independent
Logical design
Designs an enterprise-wide database that is based on a specific data model but independent of physical-level details
Database Analyst
Develop Databases for decision support reporting, SQL, query optimization, data warehouses
Developing an ER Diagram
Develop the initial ERD
Data anomaly
Develops when not all of the required changes in the redundant data are made successfully
One to one One to many Many to many
Different entity relationships? (3)
The Boyce-Codd Normal Form (BCNF)
Every determinant in the table should be a candidate key, Equivalent to 3NF when the table contains only one candidate key, Violated only when the table contains more than one candidate key, Considered to be a special case of 3NF
When will the Segment Advisor run?
Every night, as an autotask On demand
Referential integrity
Every reference to an entity instance by another entity instance is valid
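A small sqlite3 sketch shows referential integrity being enforced. The `agent` and `customer` tables and the helper `try_sql` are hypothetical, and SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON` (assuming a build with foreign-key support, which is the usual case for Python's bundled SQLite).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
    CREATE TABLE agent (agent_code INTEGER PRIMARY KEY);
    CREATE TABLE customer (
        cus_code   INTEGER PRIMARY KEY,
        agent_code INTEGER REFERENCES agent(agent_code)
    );
    INSERT INTO agent VALUES (501);
    INSERT INTO customer VALUES (1, 501);   -- valid reference
    INSERT INTO customer VALUES (2, NULL);  -- a null foreign key is allowed
""")


def try_sql(sql, params=()):
    """Run a statement and report whether the DBMS accepted it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


invalid_reference_accepted = try_sql("INSERT INTO customer VALUES (?, ?)", (3, 999))
parent_delete_accepted = try_sql("DELETE FROM agent WHERE agent_code = ?", (501,))
print(invalid_reference_accepted, parent_delete_accepted)
```

Both statements are rejected: agent 999 does not exist, and agent 501 is still referenced by a customer row.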
A priori property of large itemsets
Every subset of a large itemset is also large
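The a priori property is what lets a frequent-itemset search discard candidates early: a candidate of size k cannot be large if any of its subsets of size k-1 is not large. The sketch below assumes made-up basket data and helper names, and enumerates itemsets by brute force rather than implementing a full Apriori algorithm.

```python
from itertools import combinations


def support(transactions, items):
    """Fraction of transactions containing every item in `items`."""
    items = frozenset(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def large_itemsets(transactions, size, min_supp):
    """All itemsets of the given size that meet the minimum support (brute force)."""
    universe = sorted(set().union(*transactions))
    return {
        frozenset(c)
        for c in combinations(universe, size)
        if support(transactions, c) >= min_supp
    }


def survives_pruning(candidate, large_smaller):
    """A candidate can only be large if all of its (k-1)-subsets are large."""
    k = len(candidate)
    return all(frozenset(sub) in large_smaller for sub in combinations(sorted(candidate), k - 1))


baskets = [frozenset(b) for b in (
    {"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"},
)]
large_1 = large_itemsets(baskets, 1, 0.5)
large_2 = large_itemsets(baskets, 2, 0.5)

# {butter, milk} is not large, so the 3-itemset is pruned without counting it.
triple_survives = survives_pruning({"bread", "milk", "butter"}, large_2)
print(sorted(map(sorted, large_2)), triple_survives)
```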
Total completeness
Every supertype occurrence must be a member of at least one subtype
Lack of Design and Data Modeling Skills
Evident despite the availability of multiple personal productivity tools, Such skills are vital in the data design process, Their absence decreases communication between the designer, user, and the developer
Correlated Subquery
Executes once for each row in the outer query § Inner query references a column of the outer query § Can be used with the EXISTS special operator
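A correlated subquery with EXISTS can be sketched with sqlite3 as follows. The `customer` and `invoice` tables are invented for the example; the inner query refers to `c.cus_code`, a column of the outer query, so it is conceptually evaluated once per customer row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cus_code INTEGER PRIMARY KEY, cus_name TEXT);
    CREATE TABLE invoice (inv_number INTEGER PRIMARY KEY, cus_code INTEGER);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO invoice VALUES (100, 1), (101, 1), (102, 3);
""")

# Customers that have at least one invoice.
customers_with_invoices = conn.execute("""
    SELECT c.cus_name
    FROM customer c
    WHERE EXISTS (SELECT 1 FROM invoice i WHERE i.cus_code = c.cus_code)
    ORDER BY c.cus_name
""").fetchall()
print(customers_with_invoices)
```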
Weak Entity Conditions
Existence-dependent
Use of Composite Primary Keys, When used as identifiers of weak entities, represent a real-world object that is:
Existence-dependent on another real-world object, Represented in the data model as two separate entities in a strong identifying relationship
Module coupling
Extent to which modules are independent of one another, Low coupling decreases unnecessary intermodule dependencies
Syntax alternatives
IN and NOT IN subqueries can be used in place of INTERSECT
Normalization Process Ensures that all tables are in at least 3NF Higher forms are not likely to be encountered in business environment Works one relation at a time Starts by:
Identifying the dependencies of a relation (table), Progressively breaking the relation into new set of relations
Database Security Officer
Implement security policies for data administration, DBMS fundamentals, database administration, SQL, data security technologies, etc.
Many-to-many (M:N) relationship
Implemented by creating a new entity in 1:M relationships with the original entities
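As a sketch of this rule, a hypothetical M:N relationship between students and classes is broken into two 1:M relationships through a bridge table `enroll`, whose composite primary key is made of the two foreign keys. All table and column names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (stu_num INTEGER PRIMARY KEY, stu_name TEXT);
    CREATE TABLE class (class_code TEXT PRIMARY KEY);
    -- Bridge (associative) entity: on the "many" side of both 1:M relationships.
    CREATE TABLE enroll (
        stu_num    INTEGER REFERENCES student(stu_num),
        class_code TEXT    REFERENCES class(class_code),
        PRIMARY KEY (stu_num, class_code)
    );
    INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO class VALUES ('DB101'), ('OS201');
    INSERT INTO enroll VALUES (1, 'DB101'), (1, 'OS201'), (2, 'DB101');
""")

enrollments = conn.execute("""
    SELECT s.stu_name, e.class_code
    FROM student s
    JOIN enroll e ON e.stu_num = s.stu_num
    ORDER BY s.stu_name, e.class_code
""").fetchall()
print(enrollments)
```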
Which of the following is an objective of selecting a data type?
Improve data integrity
Where is the current redo byte address, also known as the incremental checkpoint position, recorded?
In the controlfile
File System Redux: Modern End User Productivity Tools
Includes spreadsheet programs such as Microsoft Excel
Disadvantages of Database Systems
Increased costs, Management complexity, Maintaining currency, Vendor dependence, Frequent upgrade/replacement cycles
Relationship Degree
Indicates the number of entities or participants associated with a relationship
Triggering timing
Indicates when trigger's PL/SQL code executes
Module
Information system component that handles specific business function
Implementation and Loading
Install the DBMS Create the databases Requires the creation of special storage-related constructs to house the end-user tables Load or convert the data Requires aggregating data from multiple sources
Role of the DBMS
Intermediary between the user and the database, Enables data to be shared, Presents the end user with an integrated view of the data, Receives and translates application requests into operations required to fulfill the requests, Hides database's internal complexity from the application programs and users
Conversion to Second Normal Form Table is in 2NF when it:
Is in 1NF Includes no partial dependencies
Associative Entities
Is in a 1:M relationship with the parent entities
Determination
Is the basis for establishing the role of a key
How can you enable the suspension and resumption of statements that hit space errors?
Issue an ALTER SESSION ENABLE RESUMABLE command Set the instance parameter RESUMABLE_TIMEOUT
DDL
It allows the user to create and restructure database objects
Relational database
It is a digital database
Entity relationship diagram
It is a graphical representation of entities and their relationships to each other
Primary key
It is a key that uniquely identifies each record in a table
Entity
It is a piece of data-an object or concept about which data is stored
What is a DB management system?
Software that allows a DB to be managed efficiently (i.e., define/create/maintain/control access)
What is a Database Management system?
Software that allows the database to be managed efficiently.
Candidate key
It is an attribute or set of attributes that can act as a primary key for a table to uniquely identify each record in that table
Database system
It is an automated system that enables users to define, create, maintain and control access to the database
Relation
It is composed of rows and columns of data
Super key
It is defined as a set of attributes within a table that uniquely identifies each record within a table
Candidate key
It is defined as the set of fields from which primary key can be selected
Relational database
It is divided into logical units called tables, each composed of rows and columns of data
Relationship
It is how the data is shared between entities
Referential Integrity Example
It is impossible to have an invalid sales representative number
Referential Integrity Purpose
It is possible for an attribute not to have a corresponding value, but it is impossible to have an invalid entry. It is impossible to delete a row in a table whose primary key has mandatory matching foreign key values in another table
Super key
It is superset of candidate key
What is Database recovery?
It is the process of restoring a database to a correct state after failure
SQL
It is the standard language used to define, query, update and maintain relational databases
Alter Table
It is used to add, delete or modify columns in an existing table
Key
It is used to establish and identify relationships between tables
Relational model
It organizes data into one or more tables of columns and rows, with a unique key identifying each row
Domain
It refers to a set of valid atomic values for a given attribute
Primary key
It refers to an attribute or field that serves as a unique identifier for a particular record within a relation
Triggering action
PL/SQL code enclosed between the BEGIN and END keywords
Data storage management
Performance tuning
Joining Database Tables
Performed when data are retrieved from more than one table at a time § Equality comparison between foreign key and primary key of related tables § Tables are joined by listing tables in FROM clause of SELECT statement § DBMS creates Cartesian product of every table in the FROM clause
Procedural SQL
Performs a conditional or looping operation by isolating critical code and making all application programs call the shared code § Yields better maintenance and logic control
Relational Database Management System (RDBMS)
Performs basic functions provided by the hierarchical and network DBMS systems, Makes the relational data model easier to understand and implement, Hides the complexities of the relational model from the user
________ database specification indicates all the parameters for data storage that are then input to database implementation.
Physical
Testing Factors
Physical security Password security Access rights Audit trails Data encryption Diskless workstations Optimization
Logically related data stored in a single logical data repository
Physically distributed among multiple storage facilities
Design Case 1: Implementing 1:1 Relationships Options for selecting and placing the foreign key:
Place a foreign key in both entities, Place a foreign key in one of the entities
NOT NULL constraint
Placed on a column to ensure that every row in the table has a value for that column
Batch update routine
Pools multiple transactions into a single batch to update a master table field in a single operation
Data Redundancy Implications
Poor data security, Data inconsistency, Increased likelihood of data entry errors when complex entries are made in different files, Data anomaly
Subschema
Portion of the database seen by the application programs that produce the desired information from the data within the database
Distributed Database Design
Portions of database may reside in different physical locations, Ensures database integrity, security, and performance
Periodic Maintenance Activities
Preventive maintenance (backup) Corrective maintenance (recovery) Adaptive maintenance Assignment of access permissions and their maintenance for new and old users Generation of database access statistics Periodic security audits Periodic system-usage summaries
Integrity Constraints
Primary Key (PK), Foreign Key (FK), Composite Key (CK)
Primary Key and Foreign Key
Primary key attributes contain both a NOT NULL and a UNIQUE specification § RDBMS will automatically enforce referential integrity for foreign keys § Command sequence ends with semicolon
Composite identifier
Primary key composed of more than one attribute
Weak (non-identifying) relationship
Primary key of the related entity does not contain a primary key component of the parent entity
Authentication
Process DBMS uses to verify that only registered users access the data § Required for the creation of tables § User should log on to RDBMS using user ID and password created by database administrator
Systems development
Process of creating information system
Physical design
Process of data storage organization and data access characteristics of the database
Database development
Process of database design and its implementation
Systems analysis
Process that establishes need for and extent of information system
Semistructured data
Processed to some extent
The Relational Model
Produced an automatic transmission database that replaced standard transmission databases, Based on a relation, Describes a precise set of data manipulation constructs
Information
Produced by processing data, Reveals the meaning of data, Enables knowledge creation, Should be accurate, relevant, and timely to enable good decision making
Denormalization
Produces a lower normal form, Results in increased performance and greater data redundancy. Defects in unnormalized tables: Data updates are less efficient because tables are larger, Indexing is more cumbersome, No simple strategies for creating virtual tables known as views
Static SQL
Programmer uses predefined SQL statements and parameters § SQL statements will not change while application is running
Hierarchical Model Advantages
Promotes data sharing, Parent/child relationship promotes conceptual simplicity and data integrity, Database security is provided and enforced by DBMS, Efficient with 1:M relationships
The Information System
Provides for data collection, storage, and retrieval Composed of: People, hardware, software Database(s), application programs, procedures
Description of Operations
Provides precise, up-to-date, and reviewed description of activities defining organization's operating environment
You receive an alert warning you that a tablespace is nearly full. What action could you take to prevent this becoming a problem, without any impact for your users?
Purge all recycle bin objects in the tablespace Shrink the tables in the tablespace
Design Case 1: Implementing 1:1 Relationships Rule
Put primary key of the parent entity on the dependent entity as foreign key
Tasks to be Completed Before Using a New RDBMS Create database structure
RDBMS creates physical files that will hold database § Differs from one RDBMS to another
Data
Raw facts, Have little meaning unless they have been organized in some logical manner, Building block of information
End User Data
Raw facts of interest to end user
Natural Keys or Natural Identifier
Real-world identifier used to uniquely identify real-world objects, Familiar to end users and forms part of their day-to-day business vocabulary, Used as the primary key of the entity being modeled
Q.34 In the figure below, what type of key is depicted?
Recursive foreign
Recursive Joins
Recursive query : Table is joined to itself using alias § Use aliases to differentiate the table from itself
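A recursive (self) join can be sketched with sqlite3 on a hypothetical `emp` table in which each row stores the number of the employee's manager. The aliases `e` and `m` let the same table play two roles in one query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (emp_num INTEGER PRIMARY KEY, emp_name TEXT, emp_mgr INTEGER);
    INSERT INTO emp VALUES (1, 'Ann', NULL), (2, 'Bob', 1), (3, 'Cy', 1);
""")

# e = the employee's row, m = the row of that employee's manager (same table).
employee_manager = conn.execute("""
    SELECT e.emp_name, m.emp_name
    FROM emp e
    JOIN emp m ON e.emp_mgr = m.emp_num
    ORDER BY e.emp_name
""").fetchall()
print(employee_manager)
```

Ann has no manager, so the inner join leaves her out of the result.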
Entity
Refers to the entity set and not to a single entity occurrence
Data Redundancy
Relational database facilitates control of data redundancies through use of foreign keys
Logical View of Data
Relational database model enables logical representation of the data and its relationships
SQL Join Operators
Relational join operation merges rows from two tables and returns rows with one of the following
Relational Algebra
Relational operators have the property of closure
unrestricted action
Renaming a table, Adding new fields, Deleting fields, Increasing the maximum_size value of a field, Deleting constraints
Network Models
Represent complex data relationships, Improve database performance and impose a database standard, Depicts both one-to-many (1:M) and many-to-many (M:N) relationships
Attribute name
Required to be descriptive of the data represented by the attribute
Embedded SQL
SQL statements contained within an application programming language § Differences between SQL and procedural languages § Run-time mismatch § SQL is executed one instruction at a time § Host language runs at client side in its own memory space
Which of the following violates the atomic property of relations?
Sam Hinz
Candidate key
Same characteristics as primary key but not chosen to be the primary key
Advantages of Storing Derived Attributes Stored
Saves CPU processing cycles, saves data access time, data value is readily available, can be used to keep track of historical data
Advantages of Storing Derived Attributes Not Stored
Saves storage space, computation always yields current value
Islands of information
Scattered data locations, Increases the probability of having different versions of the same data
A relation that contains no multivalued attributes and has nonkey attributes solely dependent on the primary key but contains transitive dependencies is in which normal form?
Second
Object-Oriented Model Advantages
Semantic content is added, Visual representation includes semantic content, Inheritance promotes data integrity
Which type of file is most efficient with storage space?
Sequential
Sequences
Sequential lists of numbers the database automatically generates to guarantee values are unique
Server Side Scripting
Server executes the script embedded within the HTML before it generates a web page. Java Server Pages allow java code to be embedded in static HTML. Java code is enclosed between <% ... %>. Simplifies task of connecting a database to the web.
How do sessions communicate with the database?
Server processes execute SQL received from user processes
Database Role
Set of database privileges that could be assigned as a unit to a user or group
Domain
Set of possible values for a given attribute
Constraint
Set of rules to ensure data integrity
Concurrency control system
Several users accessing the database at the same time
Difference between shared (read) lock and exclusive (write) lock?
Shared lock: -T is allowed only to read some data item -any other transaction can only read this item. Exclusive lock: -T is allowed to read and write on some data item -any other transaction has no access to this item
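The two lock modes can be summarised as a compatibility table. In this sketch `"S"` stands for a shared lock and `"X"` for an exclusive lock; the names `COMPATIBLE` and `can_grant` are invented for illustration and do not model waiting, upgrades, or deadlock handling.

```python
# (mode already held by another transaction, mode requested) -> can both coexist?
COMPATIBLE = {
    ("S", "S"): True,   # several readers may share an item
    ("S", "X"): False,  # a writer must wait for readers
    ("X", "S"): False,  # readers must wait for the writer
    ("X", "X"): False,  # only one writer at a time
}


def can_grant(requested, held_modes):
    """True if `requested` is compatible with every lock other transactions hold."""
    return all(COMPATIBLE[(held, requested)] for held in held_modes)


read_with_readers = can_grant("S", ["S", "S"])
write_with_reader = can_grant("X", ["S"])
read_with_writer = can_grant("S", ["X"])
write_on_free_item = can_grant("X", [])
print(read_with_readers, write_with_reader, read_with_writer, write_on_free_item)
```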
Database
Shared, integrated computer structure that stores a collection of: End user data and metadata
Data models
Simple representations of complex real-world data structures, Useful for supporting a specific problem domain
Primary Keys
Single attribute or a combination of attributes, which uniquely identifies each entity instance, Guarantees entity integrity, Works with foreign keys to implement relationships
Tuple, Table Row
Single entity occurrence within the entity set
Object-Oriented Model Disadvantages
Slow development of standards caused vendors to supply their own enhancements, Compromised widely accepted standard, Complex navigational system, Learning curve is steep, High system overhead slows transactions
Cookies
Small piece of text containing identifying information. Sent by server to browser on first interaction. Sent by browser to server on all subsequent interactions. Server saves information about cookies it issued and can use it when serving a request, e.g. authentication information and user preferences.
Sources of Database Failure
Software, Hardware, Programming exemptions, Transactions, External factors
Multiuser access control
Sophisticated algorithms ensure that multiple users can access the database concurrently without compromising its integrity
Flags
Special codes used to indicate the absence of some value
Cursor
Special construct used to hold data rows returned by a SQL query
Specialization vs Generalization
Specialisation: -the top-down process of maximising the differences between entity occurrences, by identifying their distinguishing characteristics -given superclass(es), it leads to identifying subclasses. Generalisation: -the bottom-up process of minimising the differences between entity occurrences, by identifying their common characteristics -given subclasses, it leads to identifying superclass(es)
External schema
Specific representation of an external view
Internal schema
Specific representation of an internal model, Uses the database constructs supported by the chosen database
WHERE condition
Specifies the rows to be selected
FROM Subqueries: FROM clause
Specifies the tables from which the data will be drawn, Can use SELECT subquery
Completeness Constraint
Specifies whether each supertype occurrence must also be a member of at least one subtype
Embedded SQL framework defines:
Standard syntax to identify embedded SQL code within the host language, Standard syntax to identify host variables, Communication area used to exchange status and error information between SQL and host language
To create a database, in what mode must the instance be?
Started in NOMOUNT mode
Determination
State in which knowing the value of one attribute makes it possible to determine the value of another
Triggering level
Statement-level and row-level
Large Object (LOB)
Stores binary data, e.g. images, digitized sounds
Data warehouse
Stores data in a format optimized for decision support
Current generation DBMS software
Stores data structures, relationships between structures, and access paths, Defines, stores, and manages all access paths and components
Data dictionary
Stores definitions of the data elements and their relationships
Analytical database
Stores historical data and business metrics used exclusively for tactical or strategic decision making, data warehouse
Cohesivity
Strength of the relationships among the module's entities
Q.41 In the figure below, what type of relationship do the relations depict?
Strong entity/weak entity
Relational Model Advantages
Structural independence is promoted using independent tables, Tabular view improves conceptual simplicity, Ad hoc query capability is based on SQL, Isolates the end user from physical-level details, Improves implementation and management simplicity
Inline subquery
Subquery expression included in the attribute list that must return one value
Database fragment
Subset of a database stored at a given location
Specialization Hierarchy Provides the means to:
Support attribute inheritance, Define a special supertype attribute known as the subtype discriminator, Define disjoint/overlapping constraints and complete/partial constraints
Enterprise database
Supports many users across many departments
Multiuser database
Supports multiple users at the same time, Workgroup databases, Enterprise database
Single user database
Supports one user at a time, Desktop Database
Alter table tablename add columnname datatype
Syntax of alter table, add
Alter table tablename drop column columnname
Syntax of alter table, delete
Alter Table tablename modify column columnname datatype
Syntax of alter table, modify
Create Table tablename ( Column1 datatype(size), Column2 datatype(size), Primary Key(column1) );
Syntax of create table
Drop table tablename
Syntax of deleting a table with data
Adding Primary and Foreign Key Designations
Syntax to add or modify columns: ALTER TABLE tablename {ADD | MODIFY} ( columnname datatype [ {ADD | MODIFY} columnname datatype ] ) ; ALTER TABLE tablename ADD constraint [ ADD constraint ] ;
Network Model Disadvantages
System complexity limits efficiency, Navigational system yields complex implementation, application development, and management, Structural changes require changes in all application programs
System catalog
System data dictionary that describes all objects within the database
TIMESTAMP
TIMESTAMP (NumberofDecimalPlaces)
IND_COLUMNS
Table Columns that have indexes
CONSTRAINTS
Table Constraints
CONS_COLUMNS
Table columns that have constraints
Relational database
Table formatted database, a matrix with columns and rows
INDEXES
Table indexes created to improve query retrieval performance
Fourth Normal Form (4NF)
Table is in 4NF when it: Is in 3NF, Has no multivalued dependencies. Rules: All attributes must be dependent on the primary key, but they must be independent of each other, No row may contain two or more multivalued facts about an entity
Clustered Tables
Technique that stores related rows from two related tables in adjacent data blocks on disk
What statements regarding instance memory and session memory are correct?
The SGA is written to by all sessions; a PGA is written to by one session. The SGA is allocated at instance startup
Physical data independence
The ability to modify the physical schema without causing application programs to be rewritten
An Oracle instance can have only one of some processes, but several of others. Which of these processes can occur several times? A. The archive process B. The checkpoint process C. The database writer process D. The log writer process E. The session server process
The archive process, The database writer process, The session server process
Alternative key
A candidate key that is not selected as the primary key
Consider this tnsnames.ora net service name: orcl=(description= (address=(protocol=tcp)(host=dbserv1)(port=1521)) (connect_data=(service_name=orcl)(server=dedicated)) ) What will happen if shared server is configured and this net service name is used?
The connect will succeed with a dedicated server connection
What files are created by the CREATE DATABASE command?
The control file, The online redo log files, The SYSAUX tablespace datafile, The SYSTEM tablespace datafile
What is Durability?
The effects of a committed transaction are permanently recorded -they should never be lost because of a failure.
If a tablespace is created with the syntax create tablespace tbs1 datafile 'tbs1.dbf' size 10m; which of these characteristics will it have? A. The datafile will autoextend, but only to double its initial size B. The datafile will autoextend with MAXSIZE UNLIMITED C. The extent management will be local D. Segment space management will be with bitmaps E. The file will be created in the DB_CREATE_FILE_DEST directory
The extent management will be local, Segment space management will be with bitmaps
What is the concurrency control protocol?
The process of managing simultaneous operations on the DB without them interfering with each other.
Prepared Statement
The same query can be compiled once and then run multiple times with different parameter values. PreparedStatement pStmt = conn.prepareStatement("insert into instructor values(?,?,?,?)");
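The card above shows the idea with JDBC. As a rough parallel, here is a minimal sketch using Python's built-in `sqlite3` module, where one SQL text with `?` placeholders is reused for several parameter tuples. The four `instructor` columns and the sample rows are assumptions made for illustration; the original card does not name them.

```python
import sqlite3

# In-memory database so the sketch is self-contained.
conn = sqlite3.connect(":memory:")

# Assumed four-column layout for the instructor table (not given in the card).
conn.execute(
    "CREATE TABLE instructor (id TEXT, name TEXT, dept_name TEXT, salary REAL)"
)

# One parameterized statement, reused with different parameter values.
insert_sql = "INSERT INTO instructor VALUES (?, ?, ?, ?)"
sample_rows = [
    ("10101", "Srinivasan", "Comp. Sci.", 65000.0),
    ("12121", "Wu", "Finance", 90000.0),
]
conn.executemany(insert_sql, sample_rows)

row_count = conn.execute("SELECT COUNT(*) FROM instructor").fetchone()[0]
```

Binding values through placeholders, rather than concatenating them into the SQL string, also keeps user input from being interpreted as SQL.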
What is a DB schema?
The total description of the DB.
Relational Algebra
Theoretical way of manipulating table contents using relational operators
Entity Supertypes and Subtypes Criteria to determine the usage
There must be different, identifiable kinds of the entity in the user's environment, The different kinds of instances should each have one or more attributes that are unique to that kind of instance
If you stop your listener, what will happen to sessions that connected through it?
They will not be affected in any way
Reasons for Identifying and Documenting Business Rules Allow designer to
Understand the nature, role, scope of data, and business processes, Develop appropriate relationship participation rules and constraints, Create an accurate data model
Entity
Unique and distinct object used to collect and store data
Data Redundancy
Unnecessarily storing same data at different places, Islands of information
Types of Data Anomaly
Update Anomalies, Insertion Anomalies, Deletion Anomalies
Adding a column
Use ALTER and ADD, Do not include the NOT NULL clause for new column
Dropping a column
Use ALTER and DROP, Some RDBMSs impose restrictions on the deletion of an attribute
Changing Column's Data Characteristics
Use ALTER to change data characteristics, Changes in column's characteristics are permitted if changes do not alter the existing data type, Syntax: ALTER TABLE tablename MODIFY (columnname(characteristic)) ;
Procedural Language SQL (PL/SQL)
Use and storage of procedural code and SQL statements within the database, Merging of SQL and traditional programming constructs
Closure
Use of relational algebra operators on existing relations produces new relations
Creating Table Structures
Use one line per column (attribute) definition, Use spaces to line up attribute characteristics and constraints, Table and attribute names are capitalized
Surrogate Keys Require ensuring that the candidate key of entity in question performs properly
Use unique index and not null constraints
Surrogate Keys
Used by designers when the primary key is considered to be unsuitable, System-defined attribute, Created and managed via the DBMS, Have a numeric value which is automatically incremented for each new row
Forms Builder
Used for creating custom applications
Reports Builder
Used for creating reports for displaying, printing, and distributing summary data
Enterprise Manager
Used for performing database administration tasks such as creating new user accounts and configuring how the DBMS stores and manages data
Inserting Table Rows with a SELECT Subquery
Used to add multiple rows using another table as source
IN subqueries
Used to compare a single attribute to a list of values
SQL*Plus
Used to create and test command-line SQL queries and to execute PL/SQL procedural programs
Data definition language (DDL) commands
Used to create new database objects and modify or delete existing objects. DDL Commands automatically save the change made to the database
Data manipulation language (DML) commands
Used to insert, update and view database data. DML commands must be explicitly saved
Associative Entities
Used to represent an M:N relationship between two or more entities
Selecting Rows Using Conditional Restrictions
Used to select partial table contents by placing restrictions on the rows
Updatable Views
Used to update attributes in any base tables used in the view
Need for Normalization
Used while designing a new database structure, Analyzes the relationship among the attributes within each entity, Determines if the structure can be improved, Improves the existing data structure and creates an appropriate database design
Presentation Layer
User Interface
When the database is in mount mode, what views may be queried to find what datafiles and tablespaces make up the database?
V$DATAFILE, V$TABLESPACE
Which of these views can be queried successfully in NOMOUNT mode? A. DBA_DATA_FILES B. DBA_TABLESPACES C. V$DATABASE D. V$DATAFILE E. V$INSTANCE F. V$SESSION
V$INSTANCE, V$SESSION
NVARCHAR2 & NCHAR
VARCHAR2 & CHAR use ASCII coding, Unicode is used with NVARCHAR2 & NCHAR to address the need to expand beyond the 256-character limitation of ASCII for other languages
Functional dependence
Value of one or more attributes determines the value of one or more other attributes
Key Fields
Values that help identify an individual row or link data from different tables
VARCHAR2
Variable-length character data, e.g. l_name VARCHAR2(maximum_size_of_string), with a maximum size of up to 4000
Data Model Verification
Verified against proposed system processes, Revision of original design, Careful reevaluation of entities, Detailed examination of attributes describing entities
________ partitioning distributes the columns of a table into several separate physical records.
Vertical
Entity Cluster
Virtual entity type used to represent multiple entities and relationships in ERD, Avoid the display of attributes to eliminate complications that result when the inheritance rules change
View
Virtual table based on a SELECT query
Referential integrity
If an attribute of a relation is a foreign key, then its value is either: -null or -matches a primary key value of the corresponding relation
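The two permitted cases (null, or a match on an existing primary key) can be seen in a small experiment. This is a minimal sketch using Python's built-in `sqlite3` module; the `department` and `employee` tables and their columns are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical parent and child tables for the illustration.
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY,"
    " dept_id INTEGER REFERENCES department(dept_id))"
)

conn.execute("INSERT INTO department VALUES (1)")
conn.execute("INSERT INTO employee VALUES (100, 1)")     # matches a primary key
conn.execute("INSERT INTO employee VALUES (101, NULL)")  # null is allowed

try:
    conn.execute("INSERT INTO employee VALUES (102, 99)")  # no department 99
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # the DBMS refuses the dangling reference

employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

Only the first two employee rows are stored; the row pointing at a nonexistent department is refused.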
An alternative name for an attribute is called a(n):
alias
Which of the following commands will shrink space in a table or index segment and relocate the HWM? A. alter table employees shrink space compact hwm; B. alter table employees shrink space hwm; C. alter table employees shrink space compact; D. alter table employees shrink space; E. alter index employees shrink space cascade;
alter table employees shrink space;
INTERVAL YEAR TO MONTH
column_name INTERVAL YEAR[(year_precision)] TO MONTH
Sensitivity testing involves:
checking to see if missing data will greatly impact results
A method to allow adjacent secondary memory space to contain rows from several tables is called:
clustering
Attribute domain
column has a specific range of values
INTERVAL DAY TO SECOND
column_name INTERVAL DAY(MAXIMUM DIGITS IN THE DAY FIELD) TO SECOND(MAXIMUM DIGITS IN THE FRACTIONAL SECONDS)
ALTER TABLE
command: To make changes in the table structure
A primary key that consists of more than one attribute is called a:
composite key
lookup table (pick list)
contains a list of legal values for a column in another table
In the SQL language, the ________ statement is used to make table definitions.
create table
When a regular entity type contains a multivalued attribute, one must:
create two new relations, one containing the multivalued attribute
The storage format for each attribute from the logical data model is chosen to maximize ________ and minimize storage space.
data integrity
The value a field will assume unless the user enters an explicit value for an instance of that field is called a:
default value
integrity constraints
define primary and foreign keys
value constraints
define specific data values or date ranges that must be inserted into columns (unique or null)
Answering queries with CUBE(F) - Usage
define tuples for a query q s.t.: 1) if the query q has a condition s.t. A=v for some constant v, then t must have A=v 2) if query GROUPS BY attribute A, should NOT aggregate over A; t should have concrete values (any non * value) for A 3) if attribute A does not appear in GROUP BY or WHERE clause, then t should have A = * (aggregate over A for all t's)
Designing physical files requires ________ of where and when data are used in various ways.
descriptions
A nonkey attribute is also called a(n):
descriptor
The attribute on the left-hand side of the arrow in a functional dependency is the:
determinant
Database designer
determines whether an entity is weak based on business rules
Mapping Quantitative Association Rules to Boolean Rules:
discretize the data of each column into categories and then, for a column with k categories, replace it with k new boolean columns (one for each category)
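The mapping can be sketched in a few lines. This is an illustrative example only: the function name `to_boolean_columns`, the age bins, and the sample values are assumptions, and real bin boundaries would come from the data.

```python
def to_boolean_columns(values, bins):
    """Discretize a quantitative column into boolean columns.

    bins is a list of (label, low, high) tuples; a value v falls in a bin
    when low <= v < high. Each value becomes one row of k booleans.
    """
    return [
        {label: (low <= v < high) for label, low, high in bins}
        for v in values
    ]


# Hypothetical bins for an "age" column: k = 3 categories, so 3 boolean columns.
age_bins = [("age_young", 0, 30), ("age_middle", 30, 60), ("age_senior", 60, 200)]
encoded = to_boolean_columns([25, 42, 71], age_bins)
```

After this step each row has exactly one true value among the k new columns, so ordinary boolean association rule mining can be applied.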
Single linkage (clustering)
distance b/w 2 clusters A and B = min{d(a,b): a∈A, b∈B}
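A direct translation of the formula, as a minimal sketch: it assumes points are coordinate tuples and that d is Euclidean distance (the card does not fix a particular d), and it checks every pair, which is fine for small clusters.

```python
import math


def single_linkage(cluster_a, cluster_b):
    """Distance between two clusters = smallest pairwise point distance."""
    return min(math.dist(a, b) for a in cluster_a for b in cluster_b)


# Two small hypothetical clusters on a line.
cluster_a = [(0, 0), (1, 0)]
cluster_b = [(4, 0), (10, 0)]
gap = single_linkage(cluster_a, cluster_b)  # closest pair is (1, 0) and (4, 0)
```

Because only the closest pair matters, single linkage tends to merge clusters that touch at one point even if the rest of their members are far apart.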
An advantage of partitioning is:
efficiency
In most cases, the goal of ________ dominates the design process.
efficient data processing
A factor to consider when choosing a file organization is:
efficient storage
INTERVAL
elapsed time between two dates
A primary key whose value is unique across all relations is called a(n):
enterprise key
A contiguous section of disk storage space is called a(n):
extent
A disadvantage of partitioning is:
extra space and update time
The smallest unit of application data recognized by system software is a:
field
A(n) ________ is a technique for physically arranging the records of a file on secondary storage devices.
file organization
When all multivalued attributes have been removed from a relation, it is said to be in:
first normal form
Constraints on generalized association rules
for non-trivial association rules X => Y: no item in Y is an ancestor of an item in X (b/c x=>ancestor(x) is trivial/ always holds)
An attribute in a relation of a database that serves as the primary key of another relation in the same database is called a:
foreign key
The normal form which deals with multivalued dependencies is called:
fourth normal form
A constraint between two attributes is called a(n):
functional dependency
Transaction Processing
grouping of related database changes into a unit of work that will succeed (proceed) or fail (terminate)
A file organization that uses hashing to map a key into a location in an index where there is a pointer to the actual data record matching the hash key is called a:
hash index table
A(n) ________ is a routine that converts a primary key value into a relative record number.
hashing algorithm
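One common choice for such a routine is the division-remainder method, sketched below. The card does not say which hashing algorithm is meant, so this method, the function name, and the slot count are assumptions for illustration.

```python
def relative_record_number(key, num_slots):
    """Division-remainder hashing: map an integer primary key to a slot
    number in the range 0 .. num_slots - 1."""
    if num_slots <= 0:
        raise ValueError("num_slots must be positive")
    return key % num_slots


# Hypothetical file with 97 record slots (a prime count spreads keys more evenly).
slot = relative_record_number(12345, 97)
```

Different keys can map to the same slot, so a real file organization also needs a collision strategy such as overflow chains.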
A file organization where files are not stored in any particular order is considered a:
heap file organization.
An attribute that may have more than one meaning is called a(n):
homonym
Distributing the rows of data into separate files is called:
horizontal partitioning
The entity integrity rule states that:
no primary key attribute can be null
A requirement to begin designing physical files and databases is:
normalized relations
Understanding the steps involved in transforming EER diagrams into relations is important because:
one must be able to check the output of a CASE tool
Q.16 In the figure below, what type of relationship do the relations depict?
one-to-Many
While Oracle has responsibility for managing data inside a tablespace, the tablespace as a whole is managed by the:
operating system
Iceberg queries
queries that generate a large set of data but only keep/return a small amount after filtering (i.e., tip of the iceberg) (can use the a priori algorithm to optimize these by first getting sets of entities that satisfy the constraints and then joining entity types to get full data candidates & applying filtering again, e.g., only include customers that bought more than a minimum number of items)
Subquery
query inside another query
An integrity control supported by a DBMS is:
range control
A rule that states that each foreign key value must match a primary key value in the other relation is called the:
referential integrity constraint
A two-dimensional table of data sometimes is called a:
relation
Relational OLAP (ROLAP)
relational and specialized relational DBMS to store and manage warehouse data (set up like traditional relational DBs)
Attributes
represent different characteristics of the entity
table constraint
restricts the data value with respect to all other values in the table
Euclidean Distance between two time series
sqrt(∑(c_i - q_i)²), where Q and C are assumed to be same length. Problem: time series that are even a little staggered along time axis have very high distance for this distance function (cannot compensate for small distortions in time axis)
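The formula and its weakness can both be seen in a short sketch. The function name and the two sample series are made up for illustration; the second series is the first one shifted by a single time step.

```python
import math


def euclidean_distance(q, c):
    """sqrt of the summed squared differences; q and c must have equal length."""
    if len(q) != len(c):
        raise ValueError("time series must have the same length")
    return math.sqrt(sum((ci - qi) ** 2 for qi, ci in zip(q, c)))


q = [0, 1, 0, 0]          # a single spike
c_shifted = [0, 0, 1, 0]  # the same spike, one step later

same = euclidean_distance(q, q)             # identical series
shifted = euclidean_distance(q, c_shifted)  # same shape, small time shift
```

The two series have the same shape, yet the shifted copy is penalized at both the old and the new spike position, which is the time-axis distortion problem the card describes.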
Column-oriented storage for OLAP
store each column's data together instead of storing by rows; good for read only data (but can be expensive to update/write to); easier run-length compression
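A minimal sketch of why column storage helps run-length compression: once a column's values sit together, repeated neighbours collapse into (value, count) pairs. The sample rows and the name `run_length_encode` are assumptions for illustration.

```python
from itertools import groupby


def run_length_encode(column):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(column)]


# Hypothetical row-oriented data: (country, customer).
rows = [("UK", "A"), ("UK", "B"), ("UK", "C"), ("US", "D")]

# Column-oriented view: all country values stored together.
country_column = [country for country, _customer in rows]
encoded = run_length_encode(country_column)
```

In the row layout the repeated country values are interleaved with customer values, so the same runs would not be adjacent.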
datetime
stores date and time data
Association rules as implication rules
support({A,B}) = Pr(A, B); Conf(A=>B) = Pr(B|A) = Pr(A, B)/Pr(A) (= Pr(B) if A and B are independent -- problem!)
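Both measures can be estimated from transaction counts. The sketch below treats probabilities as relative frequencies over a list of baskets; the function names and the four sample baskets are assumptions for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Conf(X => Y) = support(X and Y together) / support(X)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Four hypothetical market baskets.
baskets = [{"A", "B"}, {"A"}, {"A", "B"}, {"B"}]

support_ab = support(baskets, {"A", "B"})   # 2 of 4 baskets contain A and B
conf_a_b = confidence(baskets, {"A"}, {"B"})  # 2 of the 3 baskets with A contain B
```

Comparing the confidence with the plain frequency of B (here 3 of 4 baskets) shows whether knowing A actually raises the chance of B; in this sample it does not, which illustrates the independence caveat.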
Two or more attributes having different names but the same meaning are called:
synonyms
Data is represented in the form of:
tables
Within Oracle, the named set of storage elements in which physical files for database tables may be stored is called a(n):
tablespace