544

Réussis tes devoirs et examens dès maintenant avec Quizwiz!

Dynamic view (slides)

-A "virtual table" created dynamically upon request by a user -No data actually stored; instead data from base table made available to user -Based on SQL SELECT statement on base tables or other views

Tips for Developing Queries

-Be familiar with the data model (entities and relationships) -Understand the desired results -Know the attributes desired in results -Identify the entities that contain desired attributes -Review the ERD -Construct a WHERE equality for each link -Fine tune with GROUP BY and HAVING clauses if needed -Consider the effect on unusual data

Materialized view (slides)

-Copy or replication of data -Data actually stored -Must be refreshed periodically to match corresponding base tables

Several reasons to consider embedding SQL

-It is possible to create a more flexible, accessible, interface for user. -It may be possible to improve performance by using embedded SQL -Database security may be improved

Advantages of SQL Standard

-Reduced training costs -Productivity -Application portability -Application longevity -Reduced dependence on a single vendor -Cross-system communication

Advantages of view

-Simplify query commands -Assist with data security (but don't rely on views for security, there are more important security measures) -Enhance programming productivity -Contain most current base table data -Use little storage space -Provide customized view for user -Establish physical data independence

SQL Standard purposes

-Specify syntax/semantics for data definition and manipulation -Define data structures and basic operations -Enable portability of database definition and application modules -Specify minimal (level 1) and complete (level 2) standards -Allow for later growth/enhancement to standards

Disadvantages of SQL Standard

-Stifle creativity & Innovation -One standard is never enough to meet all needs -Fixing deficiencies may take considerable effort -Loss advantages like application portability

Guidelines for Better Query Design

-Understand how indexes are used in query processing -Keep optimizer statistics up-to-date -Use compatible data types for fields and literals -Write simple queries -Break complex queries into multiple simple parts -Don't nest one query inside another query -Don't combine a query with itself (if possible avoid self-joins) -Create temporary tables for groups of queries -Combine update operations -Retrieve only the data you need -Don't have the DBMS sort without an index -Learn! -Consider the total query processing time for ad hoc queries

Disadvantages of view

-Use processing time each time view is referenced -May or may not be directly updateable

Comparison Operators

= (Equals) | Equal to > (Greater Than) |Greater than < (Less Than) |Less than >= (Greater Than or Equal To) |Greater than or equal to <= (Less Than or Equal To) | Less than or equal to <> (Not Equal To) |Not equal to != (Not Equal To) |Not equal to !< (Not Less Than)|Not less than !> (Not Greater Than)|Not greater than

Backup facility

A DBMS COPY utility that produces a backup copy (or save) of an entire database or a subset of a database

Before-image; after-image

A before-image is simply a copy of a record before it has been modified, and an after-image is a copy of the same record after it has been modified.

Enterprise data warehouse (EDW) (graph)

A centralized, integrated data warehouse that is the control point and single source of all data made available to end users for decision support applications

Information Resource Dictionary System (IRDS)

A computer software tool that is used to manage and control access to the information repository

Smart card

A credit card-sized plastic card with an embedded microprocessor chip that can store, process, and output electronics data in a secure manner

Logical data mart

A data mart created by a relational view of a data warehouse

Dependent data mart (graph)

A data mart filled exclusively from an enterprise data warehouse and its reconciled data

Independent data mart (graph)

A data mart filled with data extracted from the operational environment, without the benefit of a data warehouse.

Data mart

A data warehouse that is limited in scope, whose data are obtained by selecting and summarizing data from a data warehouse or from separate extract, transform, and load processes from source data systems

RDBMS

A database management system that manages data as a collection of tables in which all data relationships are represented by common values in related tables.

Transaction

A discrete unit of work that must be completely processed or not processed at all within a computer system.

Checkpoint facility

A facility by which a DBMS periodically refuses to accept any new transactions. The system is in a quiet state, and the database and transaction logs are synchronized

Data administration

A high-level function that is responsible for the overall management of data resources in an organization, including maintaining corporate-wide definitions and standards.

Database change log

A log that contains before and after images of records that have been modified by transactions

Volatility of a data warehouse vs. volatility of an operational database:

A major difference between a data warehouse and an operational system is the type of data stored. An operational system most often stores transient data, which are overwritten when changes to the data occur. Thus, the data in an operational system are very volatile. On the other hand, a data warehouse usually contains periodic data, which are never overwritten once they have been added to the store. A data warehouse contains a history of the varying values for important (dimensional) data.

Two-phase locking protocol

A procedure for acquiring the necessary locks for a transaction where all necessary locks are acquired before any locks are released, resulting in a growing phase, when locks are acquired, and a shrinking phase, when they are released.

Heartbeat query

A query submitted by the DBA to test the current performance of the database. A heartbeat query is also called a canary query.

Transaction log

A record of the essential data for each transaction that is processed against the database

Data dictionary

A repository of information about a database that documents data elements of a database

What's SQL Standard?

A set of directions

Catalog

A set of schemas that, when put together, constitute a description of a database

Star schema

A simple database design in which dimensional data are separated from fact or event data. A dimensional model is another name for a star schema. It's not suited to online transaction processing, and therefore, it is not generally used in operational systems, operational data stores or an EDW. Contains one fact table and one or more dimension tables. Example: Sales - Product, Period, Store

Data warehouse

A subject-oriented (organized around key subjects), integrated, time variant (contain a time dimension to study trends and changes), nonupdateable collection of data used in support of management decision-making processes.

Informational system

A system designed to support decision-making based on historical point-in-time and prediction data for complex queries or data mining applications

Operational system

A system that is used to run a business in real time, based on current data. Also called a system of record

System catalog

A system-created database that describes all database objects, including data dictionary information, and also includes user access information

Virtual table

A table constructed automatically as needed by a DBMS. Virtual tables are not maintained as real data

Base table

A table in the relational data model containing the inserted raw data, Base table correspond to the relations that are identified in the database's conceptual schema

Database administration

A technical function that is responsible for physical database design and for dealing with technical issues such as security enforcement, database performance, and backup and recovery.

Restore/Rerun

A technique that involves reprocessing the day's transactions (up to the point of failure) against the backup copy of the database

Aborted transaction

A transaction in progress that terminates abnormally

Why is transaction integrity important?

A transaction is the identification of the sequence of steps that constitute a well-defined business activity. Transaction integrity includes those actions taken to enforce the commitment of all of the steps that constitute a transaction. In processing a transaction, we want any changes to the database to be made only if the entire transaction is processed successfully. If the transaction aborts at any point, no changes can be allowed, or else the database will not reflect the data needed to manage the firm.

Dynamic SQL

Ability for an application program to generate SQL code on the fly, as the application is running

Characteristics of a surrogate key for a data warehouse or data mart:

All keys used to join the fact table to the dimension tables should be system assigned. The key should be simple as compared to the production or composite key. It is best to maintain the same length and format for all surrogate keys across the entire data warehouse, regardless of the business dimensions involved.

When would the use of the LIKE keyword with the CREATE TABLE command be useful

Allow for the copying of the existing structure of one or more tables into a new table

ON UPDATE SET NULL

Allow the update on the table but to change the involved PK value in the other table to NULL

What is the meaning of the phrase "slowly changing dimension"?

Although data warehouses track data over time, the business does not remain static. We need to keep track of the history of values in order to record the history of facts with correct dimensional descriptions when the facts occurred. Dimension data changes slower than transactional data, thus we can consider dimensions to be slowly changing dimensions.

Journalizing facility

An audit trail of transactions and database changes

Real time data warehouse (graph)

An enterprise data warehouse that accepts near-real-time feeds of transactional data from the systems of record, analyzes warehouse data, and in near-real-time relays business rules to the data warehouse and systems of record so that immediate action can be taken in response to business events

Snowflake Schema

An expanded version of a star schema in which dimension tables are normalized into several related tables

Big data

An ill-defined term applied to database whose size strains the ability of commonly used relational DBMSs to capture, manage, and process the data within a tolerable elapsed time.

Deadlock

An impasse that results when two or more transactions have locked a common resource, and each waits for the other to unlock that resource.

Operational data store

An integrated, subject-oriented, continuously updateable, curren- valued (with recent history), enterprise-wide, detailed database designed to serve operational users as they do decision support processing

Inconsistent read problem

An unrepeatable read, one that occurs when one user reads data that have been partially updated by another user

ON UPDATE RESTRICT

Any updates that would delete or change a primary key value will be rejected unless no foreign key references that value in any child table

COUNT, COUNT DISTINCT, COUNT (*)

COUNT tallies only rows that contain values, it ignores all null values; COUNT(*) counts all rows selected by a query, regardless of whether any of the rows contain null values; COUNT DISTINCT only counts non-duplicate rows

How might a view be used in SQL

CREATE VIEW xxxx_V AS Views provide users controlled access to tables Purpose of a view: simplify query commands, improve data security and enhance programming consistency and productivity for a database

What can and cannot be changed using ALTER

Cannot be changed: view

SQL environment

Catalog, Schema, DDL, DML, DCL An SQL environment includes an instance of an SQL database management system along with the databases accessible by that DBMS and the users and programs that may use that DBMS to access the database.

Versioning

Concurrency control mechanism that doesn't use record locking. Each transaction is restricted to a view of the database as of the time that transaction started, and when a transaction modifies a record, the DBMS creates a new record version instead of overwriting the old record.

Referential Integrity

Constraint that ensures that foreign key values of a table must match primary key values of related table in 1:M relationships

Authorization rules

Controls incorporated in a data management systems that restrict access to data and also restrict the actions that people may take when they access data

Explain the processing order of a correlated subquery.

Correlated subqueries use the result of the outer query to determine the processing of the inner query. Thus, the inner query may be somewhat different for each row referenced in the outer query.

The most common approach used to handle slowly changing dimensions.

Create a new dimension table row (with a new key) each time the dimension object changes and this new row will contain all the dimension characteristics. A fact row is associated with the key whose attributes apply at the time of the fact. This approach allows us to create as many dimensional object changes as necessary. It can become unwieldy if rows change frequently. We may also want to store the surrogate key value for the original object in the dimension row so that we can relate changes back to the original object.

The changing roles of the data administrator and database administrator:

Data administration is a high-level function that is responsible for the overall management of data resources in an organization, including maintaining corporate-wide data definitions and standards. Typically, the role of database administration is taken to be a more hands-on, physical involvement with the management of a database or databases. Database administration is a technical function that is responsible for physical database design and for dealing with technical issues such as security enforcement, database performance, and backup and recovery. As business practices change, the roles are also changing within organizations. There are, however, a core set of database administration functions, which must be met in every organization, regardless of the database administration chosen. On one hand, these functions may be spread across data administrators and database administrators. At the other extreme, all of these functions may be handled by a single DBA.

DCL (3 major statements)

Data control language Commands used in to control a database, including those for administering privileges and committing (saving) data GRANT ADD REVOKE

DDL (4 major statements)

Data definition language Commands used to define a database, including those for creating, altering, and dropping tables and establishing constraints CREATE SCHEMA CREATE TABLE CREATE VIEW other CREATE ALTER

Transient data

Data in which changes to existing records are written over previous records, thus destroying the previous data content

DML (3 major statements)

Data manipulation language Commands used to maintain and query a database, including those for updating, inserting, modifying, and querying data. INSERT UPDATE SELECT

Periodic data

Data that are never physically altered or deleted once they have been added to the store

Derived data

Data that have been selected, formatted, and aggregated before end user decision support applications

What is the difference between deadlock prevention and deadlock resolution?

Deadlock prevention requires all users to lock all records they will require at the beginning of a transaction as opposed to one at a time. Unfortunately, it is often difficult to predict in advance all records that will be needed at the beginning of a transaction. Deadlock resolution requires DBMS mechanisms that detect and break deadlocks. A matrix of a resource usage is maintained that enables deadlock detection.

What is a derived table? How are derived table used?

Derived tables are used to create a temporary table that is treated as if it were an actual table. This table is not persistent in that it goes away after the query in which it was created is run.

Reconciled data

Detailed, current data intended to be single authoritative source for all decisions support applications

Surrogate Keys and why?

Dimension table keys should be surrogate (non-intelligent and non-business related) because: -Business keys may change over time -Helps keep track of non-key attribute values for a given production key -Surrogate keys are simpler and shorter -Surrogate keys can be same length and format for all key

In what order are the clauses of an SQL statement processed

FROM ON JOIN WHERE GROUP BY WITH CUBE or WITH ROLLUP HAVING SELECT DISTINCT ORDER BY TOP

Open Source DBMS

Free DBMS source code software that provides the core functionality of an SQL-compliant DBMS

What are the key areas of IT that are examined during a Sarbanes-Oxley audit?

IT change management Logical access to data IT operations

Why is it necessary to limit the kinds of updates that can be performed on data when referencing data through a view

In general, update operations to data in a view are permitted as long as the update is unambiguous in terms of data modification in the base table. However, when the CREATE VIEW statement contains any of the following situations, that view may not be updated directly. a. The SELECT clause includes the keyword DISTINCT. b. The SELECT clause contains expressions, including derived columns, aggregates, statistical functions, and so forth. c. The FROM clause, a subquery, or a UNION clause references more than one table. d. The FROM clause or a subquery references another view, which is not updateable. e. The CREATE VIEW command contains an ORDER BY, GROUP BY, or HAVING clause.

Embedded SQL

Including hard-coded SQL statements in a program written in another language such as C or Java

Integrity controls in database security:

Integrity controls protect data from unauthorized use and update. Integrity controls limit the values a field may hold, limit the actions that can be performed, or trigger the execution of some procedure (such as placing an entry into a log to record which users have performed which actions on which data). A domain is an example of an integrity constraint that can be used to define a user-defined data type. If this type ever needs to be changed, it can be changed in only one place, and all fields using this domain will be updated automatically. Assertion constraints enforce desirable database conditions and check automatically whenever transactions are run. Triggers can be used for events or conditions, and actions needed to be tracked against a database. Triggers cannot be circumvented.

Data mining

Knowledge discovery, using a sophisticated blend of techniques from traditional statistics, artificial intelligence, and computer graphics

Database recovery

Mechanisms for restoring a database quickly and accurately after loss or damage

In what way are dimension tables often not normalized?

Most dimension tables are not normalized so that for a given user group the dimension data are only one join away from associated facts. One example might be multivalued data, in which one could store multiple values by using several different fields. Another example would be the incorporation of data from other tables that are not part of the star schema but might be needed for analysis.

Why SQL is called a set-oriented language

Most operations performed using SQL statements do not operate on a single row in a table, but instead operate on multiple rows in a table

Vector aggregate

Multiple values returned from an SQL query that includes an aggregate function

Evaluation order for Boolean operators

NOT -> AND -> OR

Data warehouses nonupdatable?

Non-updatable means that data, once included in the data warehouse, are never changed (except to correct errors). Instead, new versions of the same data may be stored.

Multidimensional OLAP

OLAP tools that load data into an intermediate structure, usually a three or higher dimensional array

Can an outer join be easily implemented when joining more than two tables?

OUTER JOIN syntax does not apply easily to a join condition of more than two tables. The results will vary, be sure to test any outer join syntax.

Example of derived table

One example of the use of a derived table would be to find all ships that were loaded beyond capacity. In this example, a shipment's weight is calculated by computing the sum of the quantity order times the weight. The query follows: SELECT ship.ship_no FROM ship, shipment, (SELECT shipment_line.shipment_id, SUM(item.weight*shipment_line.quantity) AS tweight FROM shipment_line,item WHERE shipment_line.item_no = item.item_no GROUP BY shipment_id) AS ship_wt WHERE ship.ship_no = shipment.ship_no AND shipment.shipment_id = ship_wt.shipment_id AND ship.capacity < ship_wt.tweight;

Conformed dimension

One of more dimension tables associated with two or more fact tables for which the dimension tables have the same business meaning and primary key with each table

If two queries involved in a UNION operation contained columns that were incompatible with one another, how would you recommend fixing this? Explain

One possibility would be to convert one of the data types. For example, if one data type is a character and the other numeric, you could use a function such as Oracle's TO_CHAR to convert the numeric to a character. Another option is to decide which tables might be involved in UNION operations and make sure that the data types are compatible.

GROUP BY limitation

Only those columns that have a single value for each group can be included.

ON UPDATE CASCADE

Pass the change through to the child tables by using this

Shared lock (S Lock, or Read Lock); Exclusive lock (X Lock, or Write Lock)

Placing a shared lock on a record prevents another user from placing an exclusive lock on that record. Placing an exclusive lock on a record prevents another user from placing any type of lock on that record.

Pros and cons of logical data marts:

Pros: a. New data marts can be created quickly because no physical database or database technology needs to be acquired or created. Also, loading routines do not need to be written. b. Data marts are always up-to-date because data in a view are created when the view is referenced. Views can be materialized. Con: Logical data marts are only practical for moderate-sized data warehouses or when high performance data warehousing technology is used.

Database security

Protection of database against accidental or intentional loss, destruction or misuse

ON DELETE CASCADE

Remove one PK value would remove all associated records from other table

Procedures

Routines that do not return values and can take input or output parameters

Triggers

Routines that execute in response to a database event (INSERT, UPDATE, or DELETE)

Functions

Routines that return values and take input parameters

Self-join

Self-joins are usually used on tables with unary relationships

NoSQL

Short for "not only SQL", NoSQL is a class of database technology used to store and access textual and other unstructured data, using more flexible structures than the rows and columns format of relational database. It's used when the user unsure of what structure to use for data.

Information repository

Stores metadata that describe an organization's data and data processing resources. Manages the total information-processing environment. Combines information about an organization's business information and its application portfolio.

Schema

Structure that contains descriptions of objects created by a user (base tables, views, constraints)

How a view can be used to reinforce data security

Tables and columns that are not included will not be obvious to the user of the view

How creating a view may increase data security, and why not to rely completely on using views to enforce data security:

The advantage of a view is that it can be built to present only the data to which the user requires access, thus effectively preventing the user from viewing other data (not produced by the SQL for the view) that may be private or confidential. The user may be granted the right to access the view, but not to access the base tables upon which the view is based. For most views, data may not be updated, thus eliminating the possibility of unauthorized changes to the base data. However, views are not adequate security measures because unauthorized persons may gain knowledge of or access to a particular view. In addition, with high-level query languages, an unauthorized person may gain access to data through simple experimentation. Views also do not protect data from access outside the DBMS by hackers. Thus, views provide more convenience than security.

Encryption

The coding (or scrambling) of data so that humans cannot read them.

Database destruction

The database itself is lost, destroyed, or cannot be read

Grain

The level of detail in a fact table, determined by the intersection of all the components of the primary key, including all foreign keys and any other primary key elements

What is the advantage of optimistic concurrency control and pessimistic concurrency control? Explain.

The main advantage of versioning over locking is performance improvement. Read-only transactions can run concurrently with updating transactions without loss of database consistency.

Concurrency control

The process of managing simultaneous operations against a database so that data integrity is maintained and the operations do not interfere with each other in a multiuser environment

Data archiving

The process of moving inactive data to another location where it can be accessed when needed.

What is the purpose of the GRANT and REVOKE SQL commands? Explain.

The purpose of the GRANT and REVOKE SQL commands is to control the access to data, and the actions that can be taken on the data, within a database management system. The GRANT and REVOKE commands are generally operated against User accounts within a database management system. The DBMS vendors may vary in their ability to control the authorization rules within a particular DBMS. In Oracle, authorization rules are handled with privileges on User accounts at the database or the table level, as well as INSERT and UPDATE privileges on a column level. Oracle privileges that can be GRANTed or REVOKEd from a user include: SELECT, INSERT, UPDATE, DELETE, ALTER, INDEX, REFERENCES, EXECUTE.

Data visualization

The representation of data in graphical and multimedia formats for human analysis

Scalar aggregate

The single value returned from an SQL query that includes an aggregate function

Is a star schema a relational model?

The star schema is a denormalized implementation of the relational data model. The fact table plays the role of a normalized n-ary associative entity that links together the instances of the various dimensions. Usually, the dimension tables are in second normal form or possibly (but rarely) in third normal form. The dimension tables are denormalized and because they are neither updated nor joined with one another, provide an optimized user view for specific information needs but could not be used for operational purposes.

Online analytical processing

The use of a set of graphical tools that provides users with multidimensional views of their data and allows them to analyze the data using simple windowing techniques

Correlated Subqueries

This type of subquery is processed outside in, rather than inside out. That is, the inner query is executed for each row in the outer query, and the inner query depends in some way on values from the current row in the outer query.

Why is time almost always a dimension in a data warehouse or data mart

Time is almost always a dimension in a data warehouse or data mart because data marts and data warehouses record facts about dimensions over time. Date and time are almost always included as a dimension table, and a date surrogate key is usually one of the components of the primary key of the fact table. The time dimension is critical to most of the reporting and analysis needs that end users of the data warehouse have. Often, users will want to view how facts (such as sales) have changed over time or may want to compare one time period against another.

IN/NOT IN

To match a list of values

Difference between a trigger and stored procedures?

Trigger in act which is performed automatically before or after a event occur Stored procedure is a set of functionality which is executed when it is explicitly invoked

In ORDER BY clause, 2 ways to refer to columns to be used for sorting the results of the query

Type column names Use their column positions in the select list

Wildcard

Use of the (*), in the WHERE clause when an exact match is not possible. LIKE '%Desk' - all different types of desks LIKE '_-drawer' - exactly one character before drawer

If want to work in desire order rather than prescribed order,

Use parentheses

User-defined procedures

User exits (or interfaces) that allow system designers to define their own security procedures in addition to the authorization rules

CHECK within CREATE TABLE

Using CHECK as a column constraint, to establish validation rules for values to be inserted into the database

How is the order in which attributes appear in a result table changed

Using ORDER BY Reverse order use DESC after column name

How is referential integrity established

Using REFERENCES clause

How are the column heading labels in a result table changed

Using alias, AS xxx

Boolean /logical Operators

WHERE.... (AND, OR, NOT)

How to use Comparison Operators

WHERE.... (single quote)

Deadlock prevention; deadlock resolution

When deadlock prevention is employed, user programs must lock all records they will require at the beginning of a transaction, rather than one at a time. Deadlock resolution allows deadlocks to occur, but builds mechanisms into the DBMS for detecting and breaking the deadlocks.

WITH CHECK OPTION in CREATE VIEW

When the data values do not meet the specifications of WITH CHECK OPTION attempts to insert data through the view will be rejected

Any query that can be written using the subquery approach can also be written using the joining approach, but not vice versa. Why?

While SQL*PLUS allows a subquery to return more than one column, most systems allow pairwise joining of one and only one column in an inner query with one column in an outer query. (An exception to this is when a subquery is used with the EXISTS keyword.) You can display data only from the table(s) referenced in the outer query. If you want to include data from the subquery in the result, then you would have to use the join technique because data from the subquery cannot be included in the results. The joining technique is useful when data from several relations are to be retrieved and displayed, and the relationships are not necessarily nested.

Backward recovery; forward recovery

With backward recovery (also called rollback), the DBMS backs out of or undoes unwanted changes to the database. Before-images of the records that have been changed are applied to the database. As a result, the database is returned to an earlier state; the unwanted changes are eliminated. With forward recovery (also called rollforward), the DBMS starts with an earlier copy of the database. By applying after-images (the results of good transactions), the database is quickly moved forward to a later state (Figure 12-7).

Is it possible to use both WHERE and HAVING? Differences between two?

Yes it's possible. -WHERE doesn't allow aggregate -WHERE qualifies a set of rows, HAVING qualifies a set of groups -HAVING follows GROUP BY

Outer Join

a join in which rows that do not have matching values in common columns are nonetheless included in the result table

Equi-Join

a join in which the joining condition is based on equality between values in the common columns; common columns appear redundantly in the result table

Join

a relational operation that causes two or more tables with a common domain to be combined into a single table or view

Five claimed limitations of independent data marts:

a. A separate ETL process has to be developed for each data mart. This can yield costly redundant data and efforts. b. A clear, enterprise-wide view of data may not be provided because data marts may not be consistent with one another. c. Analysis is limited because there is no capability to drill down into greater detail or into related facts in other data marts. d. Scaling costs are excessive as each new application creates a separate data mart, which repeats all the extract and load steps. e. Attempting to make the separate data marts consistent generates a high cost to the organization.

What are the common types of database failure?

a. Aborted transaction: Aborting a transaction in progress due to some abnormal condition, such as the loss of transmission in a communication link while a transaction is in progress. b. Incorrect data: The database is updated with incorrect, but valid data: for example, an incorrect grade recorded for a student. c. System failure: Some component of the system fails, but the database is not damaged. Causes include power loss, operator failure, loss of communications transmission, or system software failure. d. Database loss or destruction: The database itself is lost or destroyed or cannot be read.

Five areas where threats to data security may occur:

a. Accidental losses, including human error, software, and hardware-caused breaches: Establishing operating procedures, such as user authorization, uniform software installation procedures, and hardware maintenance schedules, are examples of actions that may be taken to address threats from accidental losses. As in any effort that involves human beings, some losses are inevitable, but well thought-out policies and procedures should reduce the amount and severity of losses. b. Theft and fraud: These activities are going to be perpetrated by people, quite possibly through electronic means, and may or may not alter data. Attention here should focus on each possible location shown in Figure 12-3. For example, control of physical security, so that unauthorized personnel are not able to gain access to the machine room, should be established. Data access policies that restrict altering data immediately prior to a payroll run will help to secure the data. Establishment of a firewall to protect unauthorized access to inappropriate parts of the database through outside communication links is another example of a security procedure that will hamper people who are intent on theft or fraud. c. Loss of privacy or confidentiality: Loss of privacy is usually taken to mean loss of protection of data about individuals, while loss of confidentiality is usually taken to mean loss of protection of critical organizational data that may have strategic value to the organization. Failure to control privacy of information may lead to blackmail, bribery, public embarrassment, or use of user passwords. Failure to control confidentiality may lead to loss of competitiveness. d. Loss of data integrity: When data integrity is compromised, data will be invalid or corrupted. Unless data integrity can be restored through established backup and recovery procedures, an organization may suffer serious losses or make incorrect and expensive decisions based on the invalid data. e. Loss of availability: Sabotage of hardware, networks, or applications may cause the data to become unavailable to users, which again may lead to severe operational difficulties.

Two claimed benefits of independent data marts:

a. Allow for the concept of a data warehouse to be proved by working on a series of small, fairly independent projects b. A reduction in the amount of time until a benefit from data warehousing is perceived by the organization, so that there is not a delay until all data are centralized

What are the different types of DBMS facilities that are required for database backup and recovery?

a. Backup facilities provide periodic backup copies of the entire database. b. Journalizing facilities maintain an audit trail of transactions and database changes. c. Checkpoint facility allows periodic suspension of all processing and synchronization of a database's files and journals. d. Recovery manager allows the DBMS to restore the database to a correct condition and restart processing transactions.

Changes in data administration procedures that will decrease development and implementation time:

a. Database planning: Improve technology selection through selective evaluation of possible products. Consider each technology's fit with the enterprise data model, reducing time required in later stages by effective selection of technology at the database planning stage. b. Database analysis: Work on physical design in parallel with development of the logical and physical models. Prototyping the application now may well lead to changes in the logical and physical data models earlier in the development process. c. Database design: Prioritize application transactions by volume, importance, and complexity. These transactions are going to be most critical to the application, and specifications for them should be reviewed as quickly as the transactions are developed. Logical data modeling, physical database modeling, and prototyping may occur in parallel. DBAs should strive to provide adequate control of the database environment while allowing the developers space and opportunity to experiment. d. Database implementation: Institute database change control procedures so that development and implementation are supported rather than slowed. Wherever possible, segment the model into modules that can be analyzed and implemented more quickly. Find ways to test the system more quickly without compromising quality. Testing may be moved earlier in the development; use testing and change control tools to build and manage the test and production environments. e. Operation and maintenance: Review all timesaving measures that have been taken to ensure that database quality has not been compromised. Consider using third-party tools and utilities wherever possible to save work; other tools, such as Lotus Notes, may reduce the need for meetings, thus saving time.

What are the main components of a disaster recovery plan

a. Develop a detailed disaster recovery plan. b. Choose and train a multidisciplinary team to carry out the plan. c. Establish a backup data center at an off-site location. d. Send backup copies of the databases to the backup data center on a scheduled basis.

Major trends that have led to the growth in data warehouse

a. No single system of record b. Multiple systems are not synchronized c. Organizations want to analyze the activities in a balanced way d. Customer relationship management e. Supplier relationship management

Major components of a data warehouse architechture

a. Operational data. Stored in the various operational systems throughout the organization (and sometimes in external systems) b. Reconciled data. The type of data stored in the enterprise data warehouse c. Derived data. The type of data stored in each of the data marts

List three types of metadata; briefly describe the purpose of each type:

a. Operational metadata. These are metadata that describe the data in the various operational systems (as well as external data) that feed the enterprise data warehouse. Operational metadata typically exist in a number of different formats, and they are, unfortunately, often of poor quality. b. Enterprise data warehouse (EDW) metadata. These metadata are derived from (or at least are consistent with) the enterprise data model. They describe the reconciled data layer. EDW metadata also describe the rules that are used to transform operational data to reconciled data. c. Data mart metadata. These metadata describe the derived data layer. They also describe the rules that are used to transform reconciled data to derived data.

Purpose of the COMMIT command; its relation to the notion of a business transaction.

a. SQL transactions terminate by executing either a COMMIT or ROLLBACK operation. COMMIT [WORK] takes the contents of the log file and applies them to the database and then empties the log file. There is also an AUTOCOMMIT (ON/OFF) command in many RDBMSs that specifies whether changes are made permanent after each data modification command (ON) or only when work is explicitly made permanent (OFF) by the COMMIT WORK command. These commands are necessary to maintain a valid database and are transparent to the user in most interactive SQL situations. b. SQL transactions are logical units of work. Either all of the operations performed in the SQL transaction will be committed, or none of the operations will be committed to the database. An SQL transaction may be more involved than an accounting transaction. For example, the entry of a customer order may also trigger inventory adjustment. Executing the COMMIT command will either make permanent changes to all relations involved in the logical unit of work, or it will make changes to none of them.

Three types of operations that can be easily performed with OLAP tools:

a. Slicing a cube b. Drill-down c. Data mining

Four characteristics of a data warehouse:

a. Subject-oriented. A data warehouse is organized around the key subjects (or high-level entities) of the enterprise. Major subjects may include customers, patients, students, products, time, etc., depending on the real-world domain within which the organization operates. b. Integrated. The data housed in the data warehouse are defined using consistent naming conventions, formats, encoding structures, and related characteristics gathered from several internal systems of record and also often from sources external to the organization. This means that the data warehouse holds the one version of "the truth." c. Time-variant. Data in the data warehouse contain a time dimension so that they may be used to study trends and changes. d. Non-updatable. Data in the data warehouse are loaded and refreshed from operational systems, but cannot be updated by end users.

Natural Join

an equi-join in which one of the duplicate columns is eliminated in the result table

Conceptual data modeling

data administrator

Database planning

data administrator

Managing the data repository

data administrator

Database backup and recovery

database administrator

Installing and upgrading the DBMS

database administrator

Managing data security and privacy

database administrator

Running heartbeat queries

database administrator

Tuning database performance

database administrator

inner join

in which rows must have matching values in order to appear in the result table

Union Join

includes all columns from each table in the join, and an instance for each row of each table


Ensembles d'études connexes

Financial Management of the Firm

View Set

Information Systems Security - C845

View Set

AP Chemistry Semester Test Study Guide

View Set

Chapter 35: Hypothalamic and Pituitary Agents

View Set

Ch. 11 - Input Demand: Capital Market and the Investment Decision

View Set