Exam 2 combined set Study Guide(Chapters 4,5,6,7,9)
Union Join
Includes all columns from each table in the join, and an instance for each row of each table
Tree search
Indexed File Organization uses ____
data warehouse
Informational systems are also known as the____
Decisions
Inputs in the Physical Design process lead to ____
Entity integrity
No primary key attribute may be null. All primary key fields MUST have data.
Inputs
Normalized relations, volume estimates, data security/backup needs, attribute definitions/physical specs, response time expectations, technology used, and integrity expectations are all _____ in the Physical Design Process?
common columns
The ______ in joined tables are usually the primary key of the dominant table and the foreign key of the dependent table in 1:M relationships
Time-variant
The ability to use a data warehouse to study trends and changes over time is ____
Data Mining
The goal of ___ is to explain observed events or conditions, confirm hypothesis, and exploring data for new or unexpected relationships.
irrelevant
The order of columns and rows in a relation must be ___
Schema`
The organization of data to create a blueprint of how a database will be constructed (divided into tables)
transitive dependencies
functional dependecies on non-primary-key attributes
Drill-down
going from summary to more detailed views in OLAP operation is ___
Range Control
limits allowable values for field (e.g., drivers license)
Data Type
limits type of data and length
data integrity
maintaining and assuring the accuracy and consistency of data
Multivalued Attributes
non-key attributes or groups of non-key attributes the values of which are not uniquely identified by (directly or indirectly) (not functionally dependent on) the value of the Primary Key (or its part).
Integer
positive/negative whole number up to 38 digits long
Subject oriented
products,customers, patients, students, etc are all part of ____ in data warehousing
Referential Integrity
range control for foreign-key to primary-key match-ups
Procedures
routines that do not return values and can take input or output parameters
Triggers
routines that execute in response to a database event
Base Tables
tables containing the raw data
ORDER BY
the ___ statement Sorts the result according to specified criteria
WHERE
the ___ statement will indicate the conditions under which a row will be included in the result
GROUP BY
the ____ statement will Indicate categorization of results
HAVING
the ____ statement will Indicate the conditions under which a category (group) will be included
FROM
the ____ statement will Indicate the table(s) or view(s) from which data will be obtained
Data Dictionary
the schema is stored in the ____
COUNT
use the ___ function to find totals
Default value
value-assumed value if no explicit value entered
VARCHAR@
variable-length character, must enter max length, only consumes space needed
transitive dependency
when a non-key attribute determines another non-key attribute.
Partial Dependency
when an non-key attribute is determined by a part, but not the whole, of a COMPOSITE primary key.
Identify columns that are unique(candidate keys). Select a Primary key
3rd and 4th steps of Table Creation
Identify primary key-foreign key mates. Determine default values
5th and 6th steps of Table Creation
Dynamic View
A "virtual table" created upon request by user. No data is stored; data from base tables is available to user. Based on SQL SELECT statement on base tables or other views
Data Mart
A 'mini' data warehouse that is limited in scope, i.e., a slice of the data warehouse.
Relational Database Management System(RDBMS)
A DBMS that manages data as a collection of tables in which all relationships are represented by common values in related tables
Data Mart; Data Warehouse
A ___ would have few internal and external sources while ___ has many
Data mart; Warehouse
A ____ is highly denormalized while a ___ is lightly denormalized
Hash Algorithm
A routine that converts primary key to record address. Locates records based on algorithm.
Natural Join
An equi-join in which one of the duplicate columns is eliminated in the result table
Join
An operation that causes two or more tables with a common field to be combined into a single table or view
Denormalization
Can improve performance (speed) by reducing number of table lookups (i.e. reduce number of necessary join queries)
Operational
Clerks, salespersons, and administrators are all primary users of the ____ System
Data Control Language(DCL)
Commands that control a database, including administering privileges and committing data
Data Definition Language(DDL)
Commands that define a database, including creating, altering, and dropping tables and establishing constraints
Data Manipulation Language(DML)
Commands that maintain and query a database
referential integrity
Constraint that states that foreign key values of a table must match primary key values of a related table
Materialized View
Copy of data that is actually stored on disk. Must be refreshed periodically to maintain sync to the base tables.
Goal of Physical Database Design
Create a design for storing data that will provide adequate performance and insure database integrity, security, and recoverability
Informational System
The primary purpose of ___ is to support managerial decision making?
Operational System
The primary purpose of ____ is to run the business on a current basis?
Indexed File Organization
The storage of records either sequentially or nonsequentially with an index that allows software to locate individual records
Schema
The structure that contains descriptions of objects created by a user (base tables, views, constraints)
Online Analytical Processing(OLAP)
The use of a set of graphical tools that provides users with multidimensional views of their data and allows them to analyze the data using simple techniques
Functional Dependency
The value of one attribute determines the value of another attribute
Operational
The volume of ___ System includes many constant updates and queries on one or a few table rows
File Organization
Fast data retrieval, efficient storage space utilization, protection from failure/ data loss, low need for reorganization, growth accomodation, are all factors for selecting _____.
logical views of the data warehouse.
In logical Data Mart and Real Time Warehouse Architecture data marts are NOT seperate databases but _______. making it easier to create new data marts
Informational
The volume of ____System includes periodic batch updates and queries requiring many or all rows
unnormalized
There are multivalued attributes or repeating groups is ____
a seperate relation with a foreign key taken from the original entity
When transforming entities to relations, a multivalued attribute becomes___
their simple, component attributes
When transforming entities to relations, if you have composite attributes only use ___
Referential Integrity
You cannot have an order for a nonexistent customer. You cannot delete a customer that has an order. These are examples of keeping___
Vertical Partioning
You would use ____ if different users need access to different rows
Vertical Partioning
You would use _____ if different users need access to different columns
Operational
___ Systems usage is narrow, planned, and have simple updates and queries
Relation
___ is a named, two demensional table of data
Data Mart
___ is specific DSS application and is decentralized by user area. Also, is organic and possibly not planned
Normalization
___ is the process of decomposing relations with anomalies to produce smaller, well-structured relations
Procedures
____ are called explicitly
Trigger
____ are event-driven.
Company-Wide View
____ can lead to inconsistent key structures/ data values ,synonyms, missing data, and uses free-form instead of structured fields
Fact Tables
____ contain factual or quantitative data. provides statistics for sales broken down by product, period and store dimensions
Operational/Informational
____ design goal is performance(throughput, availability) while ____ design goal is ease of flexible access and use
Data Warehouse
____ is application independent, centralized(possibly enterprise wide) and planned
DELETE
____ statement allows you to remove rows from a table
DROP TABLE
____ statement allows you to remove tables
SELECT
____ statement is used to List the columns (and expressions) that should be returned from the query
Fractals
____ technique in Data mining is used to compress large databases without losing information
Case-based reasoning
____ technique in Data mining is used to derive rules from real-world case examples
Neural nets
____ technique in Data mining is used to develop predictive models based on principles modele after the human brain
Sequence association
____ technique in Data mining is used to discover cycles of events and behaviors
Affinity
____ technique in Data mining is used to discover strong mutual relationships
Clustering and signal processing
____ technique in Data mining is used to discover subgroups or segments
Rule Discovery
____ technique in Data mining is used to search for patterns and correlations in large data sets
Decision Tree Induction
____ technique in Data mining is used to test or discover if ... then rules for decision propensity
Regression
____ technique in Data mining is used to test or discover relationships from historical data
Candidate Keys
____ uniquely identify each entity instance (row) and one will become the primary key
Informational
_____ Systems usage is broad, ad hoc, and have complex queries/ analysis
Embedded SQL
_____ includes hard coded SQL statements in a program written in another languance such as Jave and creates a more flexible, accessible interface for the user; sometimes creates performance improvement; always provides security improvements
Dynamic SQL
______ has the ability for an application program to generate SQL code on the fly, as the application is running, is more flexible for ad hoc(random queries)
Insertion Anomaly
____adding new rows forces user to create duplicate data
Deletion Anomaly
____deleting rows may cause a loss of data that would be needed for other future rows
ALIAS
____is an alternative column or table name. It is often used for brevity with long table names.
Modification Anomaly
___changing data in a row forces changes to other rows because of duplication
simple
a key with a single field or attribute are called____
composite
a key with more than one field is called ___
Informational System
a system designed to support decision making based on a historical point-in-time and prediction data for complex queries or data mining applications
Operational System
a system that is used to run a business in real time, based on current data
DATE
actual date
1st Normal form
all relations are in ____
Null value control
allowing or prohibiting empty fields
BLOB
binary large object capable of storing 4 gigabytes of binary data, good for graphics, sound clips, etc
Cube slicing
coming up with 2-D view of data is ___ in OLAP operation
Dimension Tables
contain descriptions about the objects (e.g., products and stores) of the business
CREATE VIEW
defines a logical table from one or more tables or views
CREATE TABLE
defines a new table and its columns
CREATE SCHEMA
defines a portion of the database owned by a particular user
Physical Record
A group of fields stored in adjacent memory locations and retrieved together as a unit
Equi-join
A join in which the joining condition is based on matching the values of the common columns.. Common columns appear redundantly in the result table
Physical File
A named portion of memory allocated for the purpose of storing physical records
Structured Query Langauge(SQL)
A special-purpose programming language designed for managing data RDBMS.
Data Warehouse
A subject-oriented, integrated, time-variant, non-updatable collection of data used in support of management decision-making processes
Index
A table or other data structure used to determine in a file the location of records that satisfy some condition
extract, transform, and load(ETL)
All data warehouse architectures involve some form of ____
Informational
An ____ System provides data that shows a historical point in time snapshot and predictions
Operational
An _____ System provides data that represents the current state of the business
repeating groups
Another name for multivalued attributes is ____
Decisions
Attribute data types, Physical record descriptions, file organizations, indexes/database architectures, and query optimization are all types of ____ in the Physical Design process?
depth of the tree
Average time to find desired record=
Integrated
Being ___ includes consistent naming conventions, formats, and encoding structures; from multiple data sources
Multidimensional OLAP (MOLAP)
Cube structure of online analytical processing
Maintenance&Implementation
DCL is part of the ___&____ database development processes
Physical Design&Maintenance
DDL is part of the ___&____ database development processes
implementation
DML is part of the ___ database development process
Project; Data
Data Marts are ___ oriented while Data warehouses are ___ oriented
Horizontal Partitioning
Distributing the rows of a table into several seperate files.
Non-updatable
Data warehouses are read only, and are periodically refreshed.This makes them ___
EDW
Dependent Data marts are loaded from ____
denormalized
Dimension tables are ___ to maximize performance
Vertical Partioning
Distributing the columns of a table into several seperate relations
functionally dependent
Each non-key attribute (column, field) is ____ on every primary and candidate key
Second Normal Form(2NF)
Every non-key attribute must be defined by the entire primary key (not just part of the key) is in ____
unique
Every row in a Relation must be ___
Identify data types for attributes. Identify columns that can/cannot be null
First two steps of Table Creation
CHAR
Fixed-length character, max 2000 characters, default length is 1 character
atomic(not multivalued, not composite)
For a table to qualify as a Relation it must have a unique name and every attribute value must be ____
primary or candidate key
For referential integrity to hold, any field in a table that is declared a foreign key can contain only values from a parent table's ______
Partitioning
Form of denormaliztion that creates more tables. Split tables in multiple sub tables. Often corresponds with user schemas or views.
foreign to primary key references
How are referential integrity constraints implemented in a DBMS?
Triggers
INSERT,UPDATE, or DELETE are all examples of ____
Sequential File Storage
If using ____ you must start at the beginning and scan until you find the desired record(not used as much because inflexible)
Logical Data Mart and Real Time Warehouse Architecture
In ___ ODS and data warehouse are one and the same
Dependent Data Mart with Operational Data Store
In ___ architecture ODS provides option for obtaining current data.
Independent Data Mart
In ____ architecture there are Seperate ETL for each data mart. It also creates data access complexity due to multiple data marts
Enterprise data warehouse(EDW)
In a Dependent Data Mart with Operational Data Store a single ETL is for ____
Data Warehouse
In a Logical Data Mart and Real Time Warehouse Architecture Near real-time ETL is for ___
attributes or field(StudentName, StudentID)
In a relation table columns are ___
entity instances (records)
In a relation table rows are ___
Data Mining
Knowledge discovery using a blend of statistical, AI, and computer graphics techniques
Identify constraints on columns(domain specifications). Create the table(and indexes if appropriate)
Last two steps of Table Creation
Informational
Managers, business analysts, customers are all the primary users of the ____ System
Tablespace
Named set of disk storage elements in which physical files for database tables can be stored
2NF
No partial functional dependencies Would be _____ form
Database
Operational System is also known as the ___
secondary keys
Other fields or combinations of fields that can also be indexed are called____
Indexed
Primary keys are automatically ____
Routines
Program modules that execute on demand
entity types
Relations are ____(Student)
Functions
Routines that return values and take input parameters
Inner Join
Rows must have matching values in order to appear in the result
Outer Join
Rows that do not have matching values in common columns are still included in the result
Field
Smallest unit of data in a database
Index
Speed up random/sequential access to base table data. Similar to the back of a textbook
online transaction processing
Star Schema is bad for _____
ad-hoc
Star Schema is excellent for ____ queries.
TIMESTAMPTIMESTAMP WITH LOCAL TIME ZONE
Stores a moment an event occurs, using a definable fraction-of-a-second precision. Value adjusted to the users session time zone
BINARY LARGE OBJECT(BLOB)
Stores binary string values in hexadecimal format. defined to be a variable length.
NUMERIC
Stores exact numbers with a defined precision and scale.
Integer(INT)
Stores exact numbers with a predefined precision and scale of zero.
CHARACTER VARYING(VARCHAR or VARCHAR2)
Stores string values containing any characters in a character set but of definable variable length.
Character(CHAR)
Stores string values containing any characters in a character set. Is defined to be a fixed length
BOOLEAN
Stores truth values:TRUE, FALSE, or UNKNOWN
ALTER TABLE
The _____ statement allows you to change column specifications
INSERT
To add one or more rows to a table you would use the ____ statement
UPDATE
To modify data in an existing row/rows you would uses the ___ Statement
Denormalization
Transforming normalized relations into non-normalized physical record specifications
Purpose of Physical Database Design
Translate the logical design/description of data into the technical specifications for storing and retrieving data
wildcard*
Use the _____ after SELECT for everything in the table
AND, OR and NOT
Use the _______ operators for customizing conditions in the WHERE clause
1:N
What type of relationships between dimension tables and fact tables?
data integrity
When choosing data types you want to improve ____?
storage space
When choosing data types you want to minimize ____?
all possible values of the attribute
When choosing data types you want to represent ___?
data manipulations
When choosing data types you want to support ____?
Hashed
Which type of file organization is best for adding new records?
Hashed
Which type of file organization is best for updating records?
Sequential
Which type of file organization is the best for not wasting storage space?Worst?
Hashed
Which type of file organization would be best for Deleting records?
Indexed
Which type of file organization would be best for Multiple-key retrieval?
Sequential
Which type of file organization would not be used for random retrieval on primary key?
Hashed; impractical unless using a hash index
Which type of file organization would you not want to use for Sequential retrieval on primary key?
Data warehouse
Which would have a longer life a data warehouse or a Data Mart?
Warehouse; Mart has one central subject of concern to users
Which would have more subjects a Data warehouse or Mart?
extra work
With hashed file organization, when adding new records multiple keys with the same address would require?
Data warehouse
Would a Data mart or Data Warehouse be more flexible?