245 Review
Integrated data warehouse
Meaning that hit is a centralized, consolidated database that integrates data derived from the entire organization and from multiple sources with diverse formats. Data integration implies that all data is described in the same way throughout the organization
MOLAP
Multidimensional online analytical processing: This extends OLAP functionality to multidimensional database management systems . A MDBMS uses techniques to store data in a matrix like structure, because they believe that multidimensional databases are best suited to manage, store, and analyze multidimensional data. MDBMS visualize the stored data as a data cube. They are much quicker than using ROLAP. Used for very large databases and are faster for larger data sets with predefined dimensions.
BI data
Non normalized, historic data. Generally stored in a few tables and represent transaction summaries. BI data give tactical and strategic business meaning to the operational data. Cover a long period of time, more general, and include many data dimensions. Periodic updates are required.
Outer join
Not only returns the rows matching the join condition, but also the rows with unmatched values. When you have a left outer join, the unmatched values appear on the left and vice versa.
Facts (star schema)
Numeric measurements that represent a specific business aspect or activity. Facts are stored in the center of the star schema and are linked through their dimensions. EX: sales figures
Subject oriented data warehouse
Organized and summarized by topic, such as sales, marketing, or finance. For each topic, the warehouse contains specific subjects of interest
UNION query
Perfect tool for generating a combined listing of customers, one that excludes duplicate records. This combines rows from two or more queries without including duplicate rows (combines the output of two SELECT queries)
Governance
Business intelligence is a method for controlling and monitoring business health for consistent decision making
create an index in SQL
CREATE INDEX <name> ON <attribute>
JOIN USING vs. JOIN ON
Can be used interchangeably most of the time. JOIN USING can only be used when the two tables have the same attribute name. JOING <table> USING ( <fieldname>); JOIN ON can be used when the two tables do not have the same field name but they are the same thing. JOIN <table> ON <table1.fieldname=table2.fieldname>
Updatable view
Can be used to update attributes in any base table used in the view. However, not all views are updatable. One easy way to determine whether a view can be used to update a base table is to examine the view's output. If the primary key columns of the base table you want to update still have unique values in the view, the base table is updatable.
PL/SQL
Procedural language sequel: this is a language that makes it possible to use and store procedural code and SQL statements within the database and to merge SQL with transitional programming constructs, such as variables, loops, and error trapping. Most commonly used with triggers and stored procedures PL/SQL functions are mainly invoked within PL/SQL programs such as triggers and stored procedures.
Dimensions (star schema)
Qualifying characteristics that provide additional perspectives to a given fact. Dimensions are an area of interest because decision support data are almost always viewed in relation to other data. Dimensions are the magnifying glass through which you study the facts.
Key performance indicators
Quantifiable numeric or scaled based measurements that assess the company's effectiveness and success in reaching its strategic and operational goals.
BI architecture
Ranges from highly integrated single-vendor systems to loosely integrated, multivendor environments. Includes ETL tools, data storage, query and reporting, data visualization, data altering and monitoring, and data analytics
Integrity of security
Concerned with keeping data consistent and free of errors. ex: shredding or using encryption to copy data
Availability of security
Refers to the accessibility of data whenever required by authorized users and for authorized purposes.
Concurrency Control
Coordinating the simultaneous execution of transactions in a multi-user database. The objective is to ensure the serializability of transactions in a multiuser database environment. Using a database between multiple users and environments can create data integrity problems
Very large databases
DBMS must be required to support advanced storage technologies and support multi-processor technologies. Drive the need for data warehouses
Delete a value in SQL
DELETE FROM <table> WHERE <condition>
To drop a table you have already completed
DROP TABLE <table> CASCADE CONSTRAINT ;
DA
Data administrator: reports directly to top management and is given a higher degree of responsibility and authority than the DBA. The DA is responsible for controlling the overall corporate data resources He sets long term goals and policies. The job is more broad in scope, and focuses on the long term goals.
Data as an asset
Data can translate into information, which if used accurately and timely, it can enhance a companies competitive advantage
DDL
Data definition language: used to create tables, indexes, commands
Table space/file group (architecture)
Data files are grouped in table space or a file group. This is a logical grouping of several data files that store data with similar characteristics. EX: specific table space where data dictionary information is
Star schema
Data modeling technique used to map multidimensional decision support data into a relational database. Creates the equivalent of a multidimensional database schema from the existing relational database.
DBA
Database administrator: The focal point for data and user interaction. the DBA defines and enforces the procedures and standards to be used by programmers and end users during their work with the DBMS. Responsible for controlling, monitoring, and allocating resources to people and data. Gathers user requirements, building user end confidence, resolves conflicts, finds solutions to information needs, ensuring quality of data
DML
Date manipulation langage: used to insert values, update, and delete values
ROLAP
Relational online analytical processing: PRovides OLAP functionality by using relational databases to analyze multidimensional data. Very easy to use and manipulate This approach builds on existing relational technology and represent a natural extension to companies that already use relational database management systems within their organizations ROLAP uses to star schema to enable relational database technology to support multidimensional data representations.
Time variant data warehouse
Represent the flow of data through time. Data is periodically updated
Atomicity
Requires that all operations of a transaction be completed and if not then the transaction is aborted. If a transaction has four database requests, all four requests must successfully be completed otherwise it is aborted.
Data mart
Small, single subject data warehouse subset that provides decision support to a small group of people. Can be created from the data extracted for a larger data warehouse. Powerful computers can provide a customized decision support system to small groups in ways that might not be possible with a centralized system
Operational data
Stored in a relational database in which the structures are highly normalized. Optimized to store transactions that represent daily operations. Cover a short period of time, atomic detailed data, focuses on representing individual transactions rather than the effects of transactions over time.
Master data management
Technology is not enough in BI. Master data management is a collection of concepts, techniques, and processes for the proper identification, definition, and management of data elements within an organization. The main goal is to provide a comprehensive and consistent definition of all data within an organization. (uniform and consistent data)
Business Intelligence
Describes a comprehensive, cohesive, integrated set of tools and processes used to capture, collect, integrate, store, and analyze data with the purpose of generating and presenting information to support business decision making. Allows a business to transform data into information, then into knowledge, then into wisdom. Helps identify opportunities to create a competitive advantage Has the potential to affect a company's culture by creating continuous business performance improvement through active decision support at all levels in an organization.
Attributes (star schema)
Each dimension table contains attributes. Attributes are often used to search, filter, and classify facts. Dimensions provide descriptive characteristics about the facts through their attributes.
Data analytics
Encompasses a wide range of mathematical, statistical, and modeling techniques with the purpose of extracting knowledge from data. Represent what business managers really want from BI- the ability to extract actionable business insight from current events and foresee future problems.
What type of integrity is enforced when a primary key is declared
Entity integrity is the condition in which each row (entity instance) in a table has its own unique identity. To ensure entity integrity, the primary key has two requirements: all the values in PK must be unique and cannot contain a null
NOT used in SQL
Finds the rows that do not match a criteria WHERE NOT( attribute=);
Database request
The equivalent of a single SQL statement in an application program or transaction. Ex: three SQL statements is equivalent to three database requests.
IN used in SQL
The in operator uses a value list. All of the values must be of the same data type WHERE <attribute> IN (1, 2, 3);
Fact table
The middle of the star schema and contains facts that are linked through their dimensions. Each fact table is designed to answer specific decision support questions.
OLAP
A BI style whose systems share three main characteristics: multidimensional data analysis, advanced database support, and easy to use interfaces Designed to use both operational and BI data. The objective is to increase the speed of data access and data visualization
Lock
A lock guarantees exclusive use of a data item to a current transaction. In other words, transaction 2 does not have access to a data item that is currently being used in transaction
Data cache/ buffer cache (architecture)
A shared, reserved memory area that stores the most recently accessed data blocks in RAM. This also caches system catalogue data and contents of the indexes
SQL/ Procedure Cache (architecture)
A shared, reserved memory area that stores the most recently executed SQL statements, including triggers and functions.
Consistent database state
A successful database transaction changes the database from one consistent state to another. A consistent database state is one in which all data integrity constraints are satisfied.
Data files (architecture )
All data in a database are stored in data files. A typical enterprise database is normally composed of several data files. A data file can contain rows from a single table or it can contain rows from many different tables.
Transaction
Any action that reads from or writes to a database. A logical unit of work that must be entirely completed or entirely aborted; no intermediate states are acceptable meaning that no transaction can be partially completed COMMIT is used to end a successful transaction
ACID TEST
Atomicity, consistency, isolation, durability. Each transaction must display these
Explanatory analytics
Focuses on discovering and explaining data characteristics and relationships based on existing data. This uses statistical tools to formulate hypotheses and answer questions about relationships
Predictive analytics
Focuses on predicting future data outcomes with a high degree of accuracy. Uses statistical tools to answer questions about the future.
Data cube
How end users visualize stored data. Data cubes are created by extracting data form the operational databases or from the data warehouse. You cannot change a data cube
Insert values into a table
INSERT INTO <table> VALUES (); - Character and date values must be entered between apostrophes, number values do not need to
Pessimistic locking
The use of locks based on the assumption that conflict between transactions
Slice and dice
In multidimensional terms, the ability to focus on slices of the cube to perform a more detailed analysis. To slice and dice, it must be possible to identify each slice of the cube. To do so, you use the values of each attribute in a given dimension. For example, to use the location dimension, you might need to define a STORE ID to a particular store.
Indexes
These are crucial in speeding up data access because they facilitate searching, sorting, and using aggregate functions An index is an ordered set of values that contains the index key and pointers. An index scan is more efficient than a full table scan because the index data are preordered and the amount of data is usually much smaller. Index when the field appears in a where or having clause, order or group by, high data sparsity, high index selectivity.
Durability
This ensure that once transaction changes are done and committed, they cannot be undone or lost even in the event of a system failure.
Serializability
This ensures that the schedule for the concurrent execution of the transactions yields consistent results. This is important in multiuser and distributed databases in which multiple transactions are likely to be executed at the same time.
Lock gradularity
This indicates the level of lock use. Database lock, table level locking, page level lock, row level low, or field level lock
Consistency
This indicates the permanence of the database's consistent state. A transaction takes the database from one consistent state to another
Inner query/ subquery
This is how you add rows to a table using another table as the source of the data. A subquery/inner query is a query that is embedded inside another query. The inner query is executed first
Isolation
This means that the data used during the execution of a transaction cannot be used by a second transaction until the first one is completed. This is used in multiuser environments because several users access the database at the same time.
Database security officer
This persons job is to ensure database security and integrity. Planning, organizing, and testing of database contingency plans and recovery procedures
Batch update routine (updatable view)
This pools multiple transactions into a single batch to update a master table field in a single operation. For example, used to update a product's quantity on hand based on summary sales transactions.
Data mining
This refers to analyzing massive amounts of data to uncover hidden trends, patterns, and relationships. To use computer models, explain the findings, and then make business decisions. Data warehouses are a good source of data mining operations because the data has already been cleaned.
Periodicity
Time is the most common dimension used in business analysis. Therefore it is very common to have one fact table for each level of aggregation defined within the time dimension.
I/O requests (architecture)
To work with data, the DBMS must retrieve the data from permanent storage and place them in RAM or the data cache. To move data from permanent storage, the DBMS issues I/O requests and waits for replies. Working with data in the cache is faster because you do not have to wait for retrieval.
Extraction, transformation, and loading (ETL)
Tools to collect, filter, integrate, and aggregate internal and external data to be saved into a data store optimized for decision support.
Snowflake schema
Type of star schema in which the dimension tables can have their own dimension tables. The snowflake schema is usually the result of normalizing dimension tables
Union compatible
UNION only works properly if relations are union compatible, which meant the number of attributes must be the same and their corresponding data types must be alike. EX: Char and Varchar are similar, as are number and small integer
Update a value in SQL
UPDATE <table> SET <attribute> WHERE <condition>
Update/delete cascade
Update: this ensures that if you make changes to the table you referenced, the change is updated throughout the entire system Delete: if you delete a value in the table you referenced the change is updated throughout the entire system
Transaction log
Used by a DBMS to keep track of all transactions that update the database. The DBMS uses this information for a recovery requirement triggered by a ROLLBACK statement or to recover the database forward to a consistent state. A transaction log is costly, but the ability to restore a corrupted database is worth it.
GROUP BY in SQL
Used when you have attribute columns combined with aggregate functions in the SELECT statement
Between used in SQL
WHERE <attribute> BETWEEN <> AND <> - can also be used with OR in the where line
Dashboards
Web based technologies to present key business performance indicators or information in a single integrated view, generally using graphics that are clear concise, and easy to understand
Systems administrator
When a business has several incompatible DBMS to support different operations, the company might have one DBA assigned to each DBMS. The coordinator of all of the DBAs is called the system administrator
Inconsistent retrievals
When a transaction accesses data before and after one or more other transactions finish working with the data. The transaction might read some data before they are changed, thus yielding inconsistent results
Lost update problem
When two concurrent transactions are updating the same data element and one of the updates is lost or overwritten by the other transaction
Uncommitted data
When two transactions are executed concurrently and the first transaction is rolled back after the second transaction has already accessed the uncommitted data
Recursive join
When you join a table to itself
IN subqueries
When you want to compare a single attribute to a list of values, you use the IN operator in the sub query.
Multidimensional data analysis
Where data is processed and viewed as part of a multidimensional structure. This type of data is important to businesses because they tend to view business data as being related to other business data.
Data warehouse
a repository that contains data in formats that facilitate data extraction, data analysis, and decision making. An integrated, subject oriented, time variant, consolidated database that integrates data derived from the entire organization and from multiple sources with diverse formats.
Security
activities that ensure confidentiality, integrity, and availability of an information system and its main asset. You cannot secure data if you do not secure all the processes and systems around it.
Dirty Data
data that suffers from inaccuracies and inconsistencies and is a huge threat to organizations. Some dirty data can come form the movement of data across systems, as in the creation of a data warehouse
Confidentiality of security
ensuring that data are protected against unauthorized access, and if the data are accessed by an authorized user, that the data is used for an authorized purpose
ROLLBACK transaction
if you lose power or do not finish a transaction, the database is in an inconsistent state and it is not usable for transactions. assuming the DBMS supports transaction management, the DBMS will roll back the database to the previous consistent state
Stored procedure
named collection of PL/SQL statements. They can be used to encapsulate and represent business transactions. For example, you can create a stored procedure to represent a product sale and that stored procedure can be executed in a single transaction.
Nonvolatile data warehouse
once entered in a data warehouse, they can never be removed
View
this is a virtual table based on a SELECT query. The query can contain columns, computed columns, and aggregate functions. The tables on which the view is based are called base tables. DDL language. Views are dynamically updated meaning that it is re-created on demand each time it is invoked. Views create a level of security because they can restrict users to only seeing certain columns and rows in a table.