ISA 245 Final Exam
Consistent database state
One in which all data integrity constraints are satisfied. A successful transaction changes the database from one consistent state to another.
Data cube
Conceptually, MDBMS end users visualize the stored data as a three-dimensional cube known as a data cube. The location of each data value in the data cube is a function of the x-, y-, and z-axes in a three-dimensional space. The three axes represent the dimensions of the data value. The data cubes can grow to n number of dimensions, thus becoming hypercubes. Data cubes are created by extracting data from the operational databases or from the data warehouse. One important characteristic of data cubes is that they are static; that is, they are not subject to change and must be created before they can be used. Data cubes cannot be created by ad hoc queries. Instead, you query precreated cubes with defined axes; for example, a cube for sales will have the product, location, and time dimensions, and you can query only those dimensions. Therefore, the data cube creation process is critical and requires in-depth front-end design work.
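The precreated, fixed-axes idea above can be sketched in a few lines of Python; the product, location, and time values are purely illustrative:

```python
# A minimal sketch of a precreated sales data cube with three fixed
# dimensions (product, location, time). All names are illustrative.
from itertools import product as cartesian

products = ["widget", "gadget"]
locations = ["east", "west"]
periods = ["2023", "2024"]

# The cube is built up front: every cell is addressable by its
# (product, location, time) coordinates; unpopulated cells hold 0.
cube = {cell: 0 for cell in cartesian(products, locations, periods)}
cube[("widget", "east", "2024")] = 120
cube[("gadget", "west", "2023")] = 75

# Queries can address only the predefined axes, e.g. total widget sales:
widget_sales = sum(v for (p, loc, t), v in cube.items() if p == "widget")
```

Because the axes are defined at creation time, a question involving a fourth dimension (say, salesperson) would require building a new cube, which is why the front-end design work matters.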
Database security
Refers to DBMS features and other related measures that comply with the organization's security requirements.
Availability
Refers to the accessibility of data whenever required by authorized users and for authorized purposes. To ensure data availability, the entire system must be protected from service degradation or interruption caused by any internal or external source.
Concurrency control
Coordinating the simultaneous execution of transactions in a multiuser database system is known as concurrency control. The objective of concurrency control is to ensure the serializability of transactions in a multiuser database environment. Concurrency control is important because the simultaneous execution of transactions over a shared database can create several data integrity and consistency problems. The three main problems are lost updates, uncommitted data, and inconsistent retrievals.
Extraction, transformation, and loading (ETL)
Data extraction, transformation, and loading (ETL) tools collect, filter, integrate, and aggregate internal and external data to be saved into a data store optimized for decision support. Internal data are generated by the company during its day-to-day operations, such as product sales history, invoicing, and payments. The external data sources provide data that cannot be found within the company but are relevant to the business, such as stock prices, market indicators, and marketing information.
Atomicity
Requires that all operations (SQL requests) of a transaction be completed; if not, the transaction is aborted. If a transaction T1 has four SQL requests, all four requests must be successfully completed; otherwise, the entire transaction is aborted. In other words, a transaction is treated as a single, indivisible, logical unit of work.
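A minimal sqlite3 sketch of this all-or-nothing behavior, with a simulated failure partway through a two-step transfer (the table and values are hypothetical):

```python
# Atomicity sketch: if any part of the transaction fails, the rollback
# discards every change the transaction made.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100), (2, 100)")
conn.commit()

try:
    conn.execute("UPDATE account SET balance = balance - 50 WHERE id = 1")
    # The matching credit to account 2 never runs:
    raise RuntimeError("simulated failure before the second UPDATE")
except RuntimeError:
    conn.rollback()  # atomicity: the first UPDATE is discarded too

balances = [row[0] for row in
            conn.execute("SELECT balance FROM account ORDER BY id")]
# Both balances remain 100: the transaction behaved as one indivisible unit.
```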
Data administrator (DA)
The DA, also known as the information resource manager (IRM), usually reports directly to top management and is given a higher degree of responsibility and authority than the DBA, although the two roles can overlap.
Very large databases (VLDBs)
The DBMS must be capable of supporting very large databases (VLDBs). To support a VLDB adequately, the DBMS might be required to support advanced storage technologies, and even more importantly, to support multiple-processor technologies, such as a symmetric multiprocessor (SMP) or a massively parallel processor (MPP).
Database security officer (DSO)
The DSO's sole job is to ensure database security and integrity. In large organizations, the DSO's activities are often classified as disaster management.
Algorithms find (3)
The data analysis and classification phase studies the data to identify common data characteristics or patterns. During this phase, the data-mining tool applies specific algorithms to find: 1) Data groupings, classifications, clusters, or sequences 2) Data dependencies, links, or relationships 3) Data patterns, trends, and deviations
Metrics
Facts can also be computed or derived at run time. Such computed or derived facts are sometimes called metrics to differentiate them from stored facts.
System administrator
The general coordinator of all DBAs is sometimes known as the systems administrator.
Lost update
The lost update problem occurs when two concurrent transactions, T1 and T2, are updating the same data element and one of the updates is lost (overwritten by the other transaction).
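The interleaving that causes a lost update can be shown step by step; the quantity-on-hand values here are illustrative:

```python
# Lost update sketch: T1 and T2 both read the same value before either
# writes, so the later write silently overwrites the earlier one.
prod_qoh = 35

t1_read = prod_qoh          # T1 reads 35
t2_read = prod_qoh          # T2 reads 35 (before T1 writes back)

prod_qoh = t1_read + 100    # T1 writes 135
prod_qoh = t2_read - 30     # T2 writes 5, overwriting T1's update

# A serial schedule would yield 35 + 100 - 30 = 105;
# instead T1's +100 update is lost.
```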
Roll-up
In contrast to drill down, roll-up aggregates the data to a higher level of summarization.
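A small sketch of rolling city-level sales up to the region level (the data are hypothetical; drilling down would move in the opposite direction):

```python
# Roll-up sketch: aggregate finer-grained city sales to coarser regions.
from collections import defaultdict

city_sales = {
    ("east", "boston"): 40,
    ("east", "nyc"): 60,
    ("west", "seattle"): 30,
}

region_sales = defaultdict(int)
for (region, _city), amount in city_sales.items():
    region_sales[region] += amount   # city level rolled up to region level
```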
Extends
A database administrator (DBA) determines the initial size of the data files that make up the database; however, the data files can automatically expand as required in predefined increments known as extends.
Data mart
A small, single-subject data warehouse subset that provides decision support to a small group of people. In addition, a data mart could be created from data extracted from a larger data warehouse for the specific purpose of supporting faster data access to a target group or function. That is, data marts and data warehouses can coexist within a business intelligence environment.
ACID Test
Atomicity, Consistency, Isolation, Durability
Explanatory analytics
Focuses on discovering and explaining data characteristics and relationships based on existing data. Explanatory analytics uses statistical tools to formulate hypotheses, test them, and answer the how and why of such relationships—for example, how do past sales relate to previous customer promotions?
Information systems architecture (ISA)
The output of the IE process is an information systems architecture (ISA) that serves as the basis for planning, development, and control of future information systems.
Pessimistic locking
The use of locks based on the assumption that conflict between transactions is likely is often referred to as pessimistic locking.
Dimension tables
In effect, dimensions are the magnifying glass through which you study the facts. Such dimensions are normally stored in dimension tables.
Slice and dice
In multidimensional terms, the ability to focus on slices of the cube to perform a more detailed analysis is known as slice and dice.
Disaster management
Includes all of the DBA activities designed to secure data availability following a physical disaster or a database integrity failure. Disaster management includes all planning, organizing, and testing of database contingency plans and recovery procedures.
Lock granularity
Lock granularity indicates the level of lock use. Locking can take place at the following levels: database, table, page, row, or even field (attribute).
Periodicity
These fact tables must have an implicit or explicit periodicity defined. Periodicity, which is usually expressed as current year only, previous years, or all years, provides information about the time span of the data stored in the table.
Online analytical processing (OLAP) 3 characteristics
a BI style whose systems share three main characteristics: 1) Multidimensional data analysis techniques 2) Advanced database support 3) Easy-to-use end-user interfaces
Master data management (MDM)
a collection of concepts, techniques, and processes for the proper identification, definition, and management of data elements within an organization. MDM's main goal is to provide a comprehensive and consistent definition of all data within an organization. MDM ensures that all company resources (people, procedures, and IT systems) that work with data have uniform and consistent views of the company's data.
Security policy
a collection of standards, policies, and procedures created to guarantee the security of a system and ensure auditing and compliance.
Star schema and components (4)
a data-modeling technique used to map multidimensional decision support data into a relational database. In effect, the star schema creates the near equivalent of a multidimensional database schema from the existing relational database. Star schemas yield an easily implemented model for multidimensional data analysis while preserving the relational structures on which the operational database is built. The basic star schema has four components: 1) facts 2) dimensions 3) attributes 4) attribute hierarchies.
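A hedged sketch of a minimal star schema in SQLite, with one fact table joined to two dimension tables (all table and column names are illustrative):

```python
# Star schema sketch: fact_sales is the fact table; dim_product and
# dim_time are dimension tables referenced by foreign keys.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_time    (time_id INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    time_id    INTEGER REFERENCES dim_time(time_id),
    sales_amount REAL
);
INSERT INTO dim_product VALUES (1, 'widget'), (2, 'gadget');
INSERT INTO dim_time VALUES (1, 2023), (2, 2024);
INSERT INTO fact_sales VALUES (1, 1, 100.0), (1, 2, 150.0), (2, 2, 80.0);
""")

# Facts are studied through the dimensions, e.g. sales by product:
rows = conn.execute("""
    SELECT p.product_name, SUM(f.sales_amount)
    FROM fact_sales f
    JOIN dim_product p ON f.product_id = p.product_id
    GROUP BY p.product_name ORDER BY p.product_name
""").fetchall()
```

The relational structure is preserved: the multidimensional view comes entirely from ordinary joins and GROUP BY aggregation.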
Table space
a logical grouping of several data files that store data with similar characteristics. For example, you might have a system table space where the data dictionary table data are stored, a user data table space to store the user-created tables, an index table space to hold all indexes, and a temporary table space to do temporary sorts, grouping, and so on. Each time you create a new database, the DBMS automatically creates a minimum set of table spaces.
Index selectivity
a measure of the likelihood that an index will be used in query processing.
Governance
a method or process of government. In this case, BI provides a method for controlling and monitoring business health and for consistent decision making. Furthermore, having such governance creates accountability for business decisions.
Data cache
a shared, reserved memory area that stores the most recently accessed data blocks in RAM. The data read from the data files are stored in the data cache after the data have been read or before the data are written to the data files. The data cache also caches system catalog data and the contents of the indexes.
SQL cache/Procedure cache
a shared, reserved memory area that stores the most recently executed SQL statements or PL/SQL procedures, including triggers and functions. The SQL cache does not store the SQL written by the end user. Rather, the SQL cache stores a "processed" version of the SQL that is ready for execution by the DBMS.
Data analytics
a subset of BI functionality that encompasses a wide range of mathematical, statistical, and modeling techniques with the purpose of extracting knowledge from data. Data analytics is used at all levels within the BI framework, including queries and reporting, monitoring and alerting, and data visualization. Hence, data analytics is a "shared" service that is crucial to what BI adds to an organization. Data analytics represents what business managers really want from BI: the ability to extract actionable business insight from current events and foresee future problems or opportunities.
Business intelligence
a term for a comprehensive, cohesive, and integrated set of applications used to capture, collect, integrate, store, and analyze data with the purpose of generating and presenting information to support business decision making.
Snowflake schema
a type of star schema in which the dimension tables can have their own dimension tables. The snowflake schema is usually the result of normalizing dimension tables.
Security vulnerability
a weakness in a system component that could be exploited to allow unauthorized access or cause service disruptions.
Field-level lock
allows concurrent transactions to access the same row as long as they require the use of different fields (attributes) within that row. Although field-level locking clearly yields the most flexible multiuser data access, it is rarely implemented in a DBMS because it requires an extremely high level of computer overhead and because the row-level lock is much more useful in practice.
Information engineering
allows for translation of the company's strategic goals into the data and applications that will help the company achieve those goals. IE focuses on the description of corporate data instead of the processes. The IE rationale is simple: business data types tend to remain fairly stable, but processes change often and thus require frequent modification of existing systems. By placing the emphasis on data, IE helps decrease the impact on systems when processes change.
Security threat
an imminent security violation.
Data warehouse
an integrated, subject-oriented, time-variant, nonvolatile collection of data that provides support for decision making.
Transaction
any action that reads from or writes to a database. A transaction may consist of the following: - A simple SELECT statement to generate a list of table contents - A series of related UPDATE statements to change the values of attributes in various tables - A series of INSERT statements to add rows to one or more tables - A combination of SELECT, UPDATE, and INSERT statements The sales transaction example includes a combination of INSERT and UPDATE statements.
Policies
are general statements of direction or action that communicate and support DBA goals.
Dimensions
are qualifying characteristics that provide additional perspectives to a given fact. Recall that dimensions are of interest because decision support data are almost always viewed in relation to other data. For instance, sales might be compared by product from region to region and from one time period to the next.
Key performance indicators (KPIs) 4 types
are quantifiable numeric or scale-based measurements that assess the company's effectiveness or success in reaching its strategic and operational goals 1) General. Year-to-year measurements of profit by line of business, same-store sales, product turnovers, product recalls, sales by promotion, and sales by employee 2) Finance. Earnings per share, profit margin, revenue per employee, percentage of sales to account receivables, and assets to sales 3) Human resources. Applicants to job openings, employee turnover, and employee longevity 4) Education. Graduation rates, number of incoming freshmen, student retention rates, publication rates, and teaching evaluation scores
Audit log
automatically records a brief description of database operations performed by all users. Such audit trails enable the DBA to pinpoint access violations.
Active data dictionary
automatically updated by the DBMS with every database access to keep its access information up to date.
Integrity
concerned with keeping data consistent and free of errors or anomalies.
Information resource dictionary
Another term for the data dictionary.
Dirty data
data that suffer from inaccuracies and inconsistencies. Data can become dirty for many reasons: - Lack of enforcement of integrity constraints, such as not null, uniqueness, and referential integrity - Data-entry errors and typographical errors - Use of synonyms and homonyms across systems - Nonstandard use of abbreviations in character data - Different decompositions of composite attributes into simple attributes across systems
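Some of the dirty-data symptoms listed above can be detected mechanically; this sketch uses hypothetical records and a simple two-letter state-code rule:

```python
# Dirty-data detection sketch: missing values, duplicate keys, and
# nonstandard values for the same attribute (illustrative data).
records = [
    {"id": 1, "state": "OH"},
    {"id": 2, "state": "Ohio"},   # nonstandard spelling of the same state
    {"id": 3, "state": None},     # violates a NOT NULL expectation
    {"id": 3, "state": "KY"},     # duplicate id violates uniqueness
]

missing = [r["id"] for r in records if r["state"] is None]
ids = [r["id"] for r in records]
duplicates = sorted({i for i in ids if ids.count(i) > 1})
nonstandard = [r["id"] for r in records
               if r["state"] and len(r["state"]) != 2]
```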
Confidentiality
deals with ensuring that data are protected against unauthorized access, and if the data are accessed by an authorized user, that the data are used only for an authorized purpose.
Drill down
decomposes the data into more atomic components, that is, finer-grained data at lower levels of aggregation (contrast with roll-up).
Authorization management
defines procedures to protect and guarantee database security and integrity. Includes: user access management, view definition, DBMS access control, and DBMS usage monitoring.
Standards
describe the minimum requirements of a given DBA activity; they are more detailed and specific than policies. In effect, standards are rules that evaluate the quality of the activity. For example, standards define the structure of application programs and the naming conventions programmers must use.
Durability
ensures that once transaction changes are done and committed, they cannot be undone or lost, even in the event of a system failure.
Exclusive lock
exists when access is reserved specifically for the transaction that locked the object. The exclusive lock must be used when the potential for conflict exists.
Multidimensional online analytical processing (MOLAP)
extends OLAP functionality to multidimensional database management systems (MDBMSs). An MDBMS uses proprietary techniques to store data in matrix-like n-dimensional arrays. MOLAP's premise is that multidimensional databases are best suited to manage, store, and analyze multidimensional data.
Predictive analytics
focuses on predicting future data outcomes with a high degree of accuracy. Predictive analytics uses sophisticated statistical tools to help the end user create advanced models that answer questions about future data occurrences—for example, what would next month's sales be based on a given customer promotion?
Lock
guarantees exclusive use of a data item to a current transaction. In other words, transaction T2 does not have access to a data item that is currently being used by transaction T1.
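The guarantee can be illustrated with Python's `threading` module: while one "transaction" holds the lock, the other cannot touch the shared data item, so no increments are lost:

```python
# Lock sketch: every read-modify-write on balance is serialized by the
# lock, so two threads of 100 deposits always total exactly 200.
import threading

balance = 0
lock = threading.Lock()

def deposit(times):
    global balance
    for _ in range(times):
        with lock:              # exclusive use of the data item
            balance += 1

t1 = threading.Thread(target=deposit, args=(100,))
t2 = threading.Thread(target=deposit, args=(100,))
t1.start(); t2.start()
t1.join(); t2.join()
```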
Consistency
indicates the permanence of the database's consistent state. A transaction takes a database from one consistent state to another. When a transaction is completed, the database must be in a consistent state; if any of the transaction parts violates an integrity constraint, the entire transaction is aborted.
Passive data dictionary
is not updated automatically and usually requires running a batch process.
Isolation
means that the data used during the execution of a transaction cannot be used by a second transaction until the first one is completed. In other words, if transaction T1 is being executed and is using the data item X, that data item cannot be accessed by any other transaction (T2 ... Tn) until T1 ends. This property is particularly useful in multiuser database environments because several users can access and update the database at the same time.
Sparsity (cubes) computed by
measures the density of the data held in the data cube; it is computed by dividing the total number of actual values in the cube by its total number of cells. Because the data cube's dimensions are predefined, not all cells are populated.
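The computation is a single division; this sketch uses a hypothetical 2x2x2 cube with three populated cells:

```python
# Cube sparsity sketch: actual values divided by total cells.
dims = (2, 2, 2)                 # product x location x time
total_cells = 1
for d in dims:
    total_cells *= d             # 8 cells in all

actual_values = 3                # only 3 of the 8 cells hold data
density = actual_values / total_cells   # 3 / 8 = 0.375
```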
Uncommitted data
occurs when two transactions, T1 and T2, are executed concurrently and the first transaction (T1) is rolled back after the second transaction (T2) has already accessed the uncommitted data, thus violating the isolation property of transactions.
Incremental backup
produces a backup of only the data that have changed since the last backup date.
Full backup (database dump)
produces a complete copy of the entire database.
Relational online analytical processing (ROLAP) adds extensions: 3
provides OLAP functionality by using relational databases and familiar relational query tools to store and analyze multidimensional data. This approach builds on existing relational technologies. ROLAP adds the following extensions to traditional RDBMS technology: 1) Multidimensional data schema support within the RDBMS 2) Data access language and query performance optimized for multidimensional data 3) Support for very large databases (VLDBs)
Decision support system
refers to an arrangement of computerized tools used to assist managerial decision making within a business. DSSs were the original precursor of current-generation BI systems.
Security ensures (3)
refers to activities and measures that ensure the 1) confidentiality 2) integrity 3) availability of an information system and its main asset, data.
Compliance
refers to activities that meet data privacy and security reporting guidelines. These guidelines are either part of internal procedures or are imposed by external regulatory agencies such as the federal government.
Data mining
refers to analyzing massive amounts of data to uncover hidden trends, patterns, and relationships; to form computer models to simulate and explain the findings; and then to use such models to support business decision making. In other words, data mining focuses on the discovery and explanation stages of knowledge acquisition.
Data sparsity (for indexes)
refers to the number of different values a column could have. For example, a STU_SEX column in a STUDENT table can have only two possible values, M or F; therefore, that column is said to have low sparsity. When you perform a search in a column with low sparsity, you are likely to read a high percentage of the table rows anyway; therefore, index processing might be unnecessary work.
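The low-versus-high sparsity contrast can be made concrete with a hypothetical 100-row student table:

```python
# Column sparsity sketch: count the distinct values each column holds.
students = [
    {"stu_num": n, "stu_sex": "M" if n % 2 else "F"}
    for n in range(1, 101)
]

sex_sparsity = len({s["stu_sex"] for s in students})    # 2 -> low sparsity
num_sparsity = len({s["stu_num"] for s in students})    # 100 -> high sparsity

# A search on STU_SEX = 'F' still matches about half the rows, so an
# index on that low-sparsity column may be unnecessary work; an index
# on the high-sparsity STU_NUM is far more selective.
```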
Optimistic locking phases (3)
requires neither locking nor timestamping techniques. Instead, a transaction is executed without restrictions until it is committed. Using an optimistic approach, each transaction moves through two or three phases, referred to as read, validation, and write. 1) During the read phase, the transaction reads the database, executes the needed computations, and makes the updates to a private copy of the database values. All update operations of the transaction are recorded in a temporary update file, which is not accessed by the remaining transactions. 2) During the validation phase, the transaction is validated to ensure that the changes made will not affect the integrity and consistency of the database. If the validation test is positive, the transaction goes to the write phase. If the validation test is negative, the transaction is restarted and the changes are discarded. 3) During the write phase, the changes are permanently applied to the database.
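The three phases can be sketched with a version number standing in for the validation check (the record structure and names are illustrative, not any particular DBMS's mechanism):

```python
# Optimistic locking sketch: read against a private copy, validate that
# no other transaction wrote in the meantime, then write permanently.
record = {"value": 10, "version": 1}

def run_transaction(db, compute):
    # Read phase: compute updates on a private copy, noting the version.
    read_version = db["version"]
    private = {"value": compute(db["value"])}
    # Validation phase: fail (restart) if another transaction committed.
    if db["version"] != read_version:
        return False            # changes are discarded
    # Write phase: apply the changes permanently.
    db["value"] = private["value"]
    db["version"] += 1
    return True

ok = run_transaction(record, lambda v: v + 5)
```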
Concurrent backup
takes place while the user is working on the database.
Page-level lock
the DBMS locks an entire disk page. A disk page, or page, is the equivalent of a disk block, which can be described as a directly addressable section of a disk. A page has a fixed size, such as 4K, 8K, or 16K.
Enterprise database
the company's data representation that provides support for all present and expected future operations. Most of today's successful organizations depend on the enterprise database to provide support for all of their operations— from design to implementation, from sales to services, and from daily decision making to strategic planning.
Database lock
the entire database is locked, thus preventing the use of any tables in the database by transaction T2 while transaction T1 is being executed. This level of locking is good for batch processes, but it is unsuitable for multiuser DBMSs.
Table-level lock
the entire table is locked, preventing access to any row by transaction T2 while transaction T1 is using the table. If a transaction requires access to several tables, each table may be locked. However, two transactions can access the same database as long as they access different tables.
Privacy
the extent to which individuals and organizations have the right to determine the details of data usage (who, what, when, where, and how).
Dashboard
uses Web-based technologies to present key business performance indicators or information in a single integrated view, generally using graphics that are clear, concise, and easy to understand.
Multidimensional database management system (MDBMS)
uses proprietary techniques to store data in matrix-like n-dimensional arrays. MOLAP's premise is that multidimensional databases are best suited to manage, store, and analyze multidimensional data. Most of the proprietary techniques used in MDBMSs are derived from engineering fields such as computer-aided design/computer-aided manufacturing (CAD/CAM) and geographic information systems (GIS). MOLAP tools store data using multidimensional arrays, row stores, or column stores. Conceptually, MDBMS end users visualize the stored data as a three-dimensional cube known as a data cube.
Security breach
when a security threat is exploited to endanger the integrity, confidentiality, or availability of the system. Security breaches can lead to a database whose integrity is either preserved or corrupted.
Procedures
written instructions that describe a series of steps to be followed during the performance of a given activity. Procedures must be developed within existing working conditions, and they must support and enhance the work environment.