ISM 4481 Final
True
A hashing algorithm is a routine that converts a primary key value into a relative record number.
Outer Join
A join in which rows that do not have matching values in common columns are nonetheless included in the result table.
Equi-Join
A join in which the joining condition is based upon equality between values in the common columns; columns appear redundantly in the result table
True
A major benefit of SQL as a standard is reduced training costs.
True
A procedure is run by calling it by its name.
False
A routine is a named set of SQL statements that are considered when a data modification occurs.
Stored Procedures
A set of Structured Query Language (SQL) statements with an assigned name, which are stored in a relational database management system as a group, so it can be reused and shared by multiple programs.
False
A snowflake schema is usually heavily aggregated.
True
Efficient database structures will be beneficial only if the queries and underlying DBMS are tuned to properly use these structures
True
Figuring out what attributes you want in your query before you write the query will help with query writing.
Sequential File organization
File organization method in which Records in the file are stored in sequence by primary key field values (if sorted, every insert or delete requires a resort)
Inputs of physical database design process
Inputs of ___ include: - Normalized Relations - Volume Estimates - Attribute Definitions - Response time expectations - Data Security needs - backup / recovery needs - integrity expectations - DBMS technology used
True
ODBC is an application programming interface that provides a common language for application programs to access and process an SQL database independent of the particular RDBMS that is accessed.
True
One decision in the physical design process is selecting structures
False
Quality data are not essential for well-run organizations.
Characteristics of Hashed organization
The following are Characteristics of _____ file organization - Extra space may be needed to allow for addition an deletion of records after the initial set of records is loaded - Sequential retrieval is impractical unless using a hash index - Random retrieval on primary key is very fast - multiple key retrieval is not possible unless using a hash index - Deleting records is very easy -Adding records is easy but multiple records with the same address requires extra work - updating records is very easy
True
Web services allow for communication between programs over the Internet.
True
When creating tables, it's important to decide which columns will allow null values before the table is created.
range controller
allowable value limitations (constraints or validation rules)
default value
assumed value if no explicit value is assigned
Fact Tables
characteristic of star schema. Table which contains factual or quantitative data
Indexed File Organization
file organization method in which records are stored sequentially or non-sequentially with an index that allows software to locate individual records. Uses a Tree search: Average time to find records = Depth of the Tree.
Hashed
file organization method which Usually uses division remainder to determine record position. Records with same position are grouped in lists.
API- Application Program interface
routines that an application uses to direct the performance of procedures by the computer's operating system
When performing Field Design
• Choosing data type • Coding, compression, encryption • Controlling data integrity Factors to consider when...
Types of Cloud computing
• Infrastructure-as-a-service (IaaS) • Platform-as-a-service (PaaS) • Software-as-a-service (SaaS)
SOA - Service Oriented Architecture
- A new paradigm for IT application development, based mostly on Web services - Loosely coupled, highly inter-operable components - Leads to flexibility and shorter development time
False
A DTD is a language used for defining XML databases.
Three Tiered Architectures
A client-server architecture in which the functional process logic, data access, computer data storage and user interface are developed and maintained as independent modules on separate platforms
Service-Oriented Architectures (SOA)
A collection of services that communicate with each other, usually by passing data or coordinating a business activity
True
A correlated subquery is executed once for each iteration through the outer loop.
False
A data mart is a data warehouse that contains data that can be used across the entire organization.
True
A data steward is a person assigned the responsibility of ensuring the organizational applications properly support the organization's enterprise goals for data quality.
Data Mart
A data warehouse that is limited in scope
False
A database is maintained and queried using the data mapping language (DML).
False
A file organization is a named portion of primary memory.
False
A join index is a combination of two or more indexes.
False
A key is a data structure used to determine the location of rows in a file that satisfy some condition.
False
A language used to transform complex XML documents and also used to create HTML pages from XML documents is called Extensible Stylesheet Transformation.
Data Type
A particular kind of data item, as defined by the values it can take, language used, or the operations that can be performed on it
True
A pointer is a field of data that can be used to locate a related field or record of data.
False
A referential integrity constraint specifies that the existence of an attribute in one table depends upon the existence of a foreign key in the same or another table.
Join
A relational operation that causes two or more tables with a common domain to be combined into a single table or view
Data Warehouse
A subject oriented, integrated, time variant, non-updatable collection of data used in support of the management decision making process
True
A tablespace is a named set of disk storage elements in which physical files for the database tables may be stored
True
A trigger is a named set of SQL statements that are considered when a data modification occurs.
Table Action
ADD [COLUMN] column_definition ALTER [COLUMN] column_name DROP [COLUMN] column_name ADD table_constraint
Alter Table Statement
ALTER TABLE table_name alter_table_action;
True
Adding the DISTINCT keyword to a query eliminates duplicates.
Advantages of Stored Procedures
Advantages of ___ include: - performance improves for compiled SQL statements - reduced network traffic - improved security - improved data integrity - Thinner Clients
Advantages of user-defined transactions
Advantages of ______ include: - can create a more flexible, accessible interface for the user - Possible performance improvement - Database security improvement; grant access only to the application instead of users
True
An SQL query that implements an outer join will return rows that do not have matching values in common columns.
Natural Join
An equi-join in which one of the duplicate columns is eliminated in the result table
False
An equi-join is a join in which one of the duplicate columns is eliminated in the result table.
False
An extent is a named portion of secondary memory allocated for the purpose of storing physical records.
True
An independent data mart is filled with data extracted from the operational environment without the benefit of a data warehouse.
True
An insert command does not need to have the fields listed.
True
An operational data store is typically a relational database and normalized, but it is tuned for decision-making applications.
True
Application partitioning gives developers the opportunity to write application code that can later be placed on either a client workstation or a server, depending upon which location will give the best performance.
Single Table Query
CREATE TABLE tablename ( {column definition [table constrain]} . , . . [ON COMMIT {DELETE | PRESERVE} ROWS] ); where column definition ::= Column_name {domain name | datatype [(size)] } [column_constraint_clause...] [default value] [collate clause] And table constraint ::= [CONSTRAINT constraint_name] Constraint_type [constraint_atttributes]
Characteristics of Quality Data
Characteristics of ____ include: - completeness - consistency - uniqueness - timeliness
Characteristics of Cloud Technologies
Characteristics of _____ include: - On demand - broad network access - resource pooling - rapid elasticity - measured service
False
Cloud computing will have a limited impact on three-tier applications.
Data Definition Language (DDL)
Commands that define database, including creating, altering, and dropping tables and establishing constraints.
Data Manipulation Language (DML)
Commands that maintain and Query a database
True
Completeness means that all data that are needed are present.
Star Schema
Consists of one or more fact tables referencing any number of dimension tables
True
Correlated sub-queries are less efficient than queries that do not use nesting.
False
Count(*) tallies only those rows that contain a value, while Count counts all rows.
False
DCL is used to update the database with new records.
Case-Based reasoning
Data mining technique used to Derive rules from real world case examples
Neural nets
Data mining technique used to Develop predictive models based on principles modeled after the human brain
Sequence association
Data mining technique used to Discover cycles of events and behaviors
Affinity
Data mining technique used to Discover strong mutual relationships
Clustering and Signal Processing
Data mining technique used to Discover subgroups or segments
Rule discovery
Data mining technique used to Search for patterns and correlations in large datasets
Regression
Data mining technique used to Test or discover relationships from historical data
Decision Tree induction
Data mining technique used to Test or discover; IF (condition) THEN rules for decision propensity
Fractals
Data mining technique used to compress large databases without losing information
True
Data quality is essential for SOX and Basel II compliance.
True
Denormalization almost always leads to more storage space for raw data.
False
Dirty data saves work for information systems projects.
False
EXISTS takes a value of false if the subquery returns an intermediate result set.
False
Even with XML, different types of devices will require a different page.
True
Expressions are mathematical manipulations of data in a table that may be included as part of the SELECT statement.
True
Extensible Markup Language (XML) is a scripting language based upon SGML that allows the creation of customized tags.
Goals of Data Mining
Goals of ____ include: - Explain observed events or conditions - Confirm hypotheses - Explore data for new or unexpected relationships
Handling Changing Dimensions
How to maintain knowledge of the past - Kimble's approaches: - Type 1: just replace old data with new (lose historical data) - Type 3: for each changing attribute, create a current value field and several old-valued fields (multivalued) - Type 2: create a new dimension table row each time the dimension object changes, with all dimension characteristics at the time of change. Most common approach.
True
If multiple Boolean operators are used in an SQL statement, NOT is evaluated first, then AND, then OR.
True
In order to update data in SQL, one must inform the DBMS which relation, columns, and rows are involved.
Embedded SQL
Included hard coded SQL statements in a program written in another programming language such as C or Java
Union Join
Includes all columns for each table in the join and an instance for each row of each table
False
Independent data marts do not generally lead to redundant data and efforts.
True
Indexes are most useful for columns that frequently appear in WHERE clauses of SQL commands, either to qualify the rows to select or for linking.
False
Infrastructure-as-a-Service refers to the use of technologies such as servers, storage and networks from external service providers.
True
Joining tables or using a subquery may produce the same result.
False
Logical data marts are physically separate databases from the enterprise data warehouse.
Self Join
Matching rows of a table with other rows from the same table
False
Multidimensional OLAP (MOLAP) tools use variations of SQL and view the database as a relational database, in either a star schema or other normalized or denormalized set of tables.
False
NoSQL is a great technology for storing well-structured data.
False
One method to handle missing values is to substitute an exact value.
True
One of the original purposes of the SQL standard was to provide a vehicle for portability of database definition and application modules between conforming DBMSs.
Outputs of Physical Database Design Process
Outputs of _____ include: - Attribute Data Types - Physical record descriptions (doesn't always match logical design) - File organizations - Indexes and database archetectures - Query optimization
True
PHP is a platform for Web development.
True
Requirements for Response time, data security, backup recovery, are all requirements for physical database design
Procedure
Routines that do not return values and can take input or output parameters
Rules of thumb for choosing indexes
Rules of thumb for _____ include: - Use on Larger Tables - Index the primary key of each table - Index Search fields (frequently in WHERE clause) - Fields in ORDER BY and GROUP BY commands - When there are > 100 values but not when <30 - Avoid use of indexes for fields with long vaules - If key to index is used to determine location of record use surrogate to allow the even spread in storage area. -DBMS may have limited number of indexes per table - Be careful indexing attributes with null values, DBMS may not recognize
Clauses of Select Statement
SELECT • List the columns (and expressions) to be returned from the query FROM • Indicate the table(s) or view(s) from which data will be obtained WHERE • Indicate the conditions under which a row will be included in the result GROUP BY • Indicate categorization of results HAVING • Indicate the conditions under which a category (group) will be included ORDER BY • Sorts the result according to specified criteria
True
SQL is both an American and international standard for database access.
True
SQL statements can be included in another language, such as C or Java.
True
Scalable technology is critical to a data mart.
Write single and multiple table SQL queries such as: Union, Select ... from ... where
Select Customer_T.CustomerID, Order_T.CustomerID, CustomerName, Order ID FROM Customer_T, Order_T WHERE Customer_T.CustomerID = Order_T.CustomerID ORDER BY OrderID
True
Sensitivity testing involves ignoring missing data unless value may significantly change results
True
Simple Object Access Protocol (SOAP) is an XML-based communication protocol.
False
Specifying the attribute names in the SELECT statement will make it easier to find errors in queries and also correct for problems that may occur in the base system.
Dimension Tables
Star Schema Characteristic which Contain descriptions about the subjects of the business
True
Storing XML data is becoming a big question as XML data becomes more prevalent.
How to handle missing data
Strategies for ______ include - Substitute an estimate of the missing value - construct a report listing missing values - In programs, ignore missing data unless the value is significant
False
Stripping an XML document means storing each element of an XML schema in a relational table and using other tables to store the elements.
File Organization
Technique for physically arranging records of a file on secondary storage.
Dynamic SQL
The Ability for an application program to generate SQL code on the fly, as the application is running
False
The DELETE TABLE DDL command is used to remove a table from the database.
True
The ORDER BY clause sorts the final results rows in ascending or descending order.
True
The WHERE clause is always processed before the GROUP BY clause when both occur in a SELECT statement.
True
The asterisk (*) wildcard designator can be used to select all fields from a table as well as in WHERE clauses when an exact match is not possible.
False
The comparison operators = and != are used to establish a range of values.
True
The first requirement for building a user-friendly interface is a set of metadata that describes the data in the data mart in business terms that users can easily understand.
True
The following SQL statement is an example of a correlated subquery. select first_name, last_name, total_sales from salesman s1 where total_sales > all (select total_sales from salesman s2 where s1.salesman_id != s2.salesman_id);
Characteristics of Sequential File organization
The following are Characteristics of _____ file organization - No wasted storage space - very fast retrieval on primary key - Random Retrieval on primary key is unpractical - Multiple key retrieval is possible but requires scanning the entire field - deleting records can create wasted space or require reorganization -Adding records requires rewriting a file -updating records requires rewriting the file
Characteristics of Indexed file organization
The following are Characteristics of _____ file organization - no wasted space for data but extra space for index - moderately fast sequential retrieval - moderately fast random retrieval on primary key - Multiple key retrieval: Very fast with multiple indexes - deleting records easy but requires maintenance of all indexes -adding records easy but requires maintenance of all indexes -updating records easy but requires maintenance of indexes
False
The following query will execute without errors. select customer.customer_name, salesman.sales_quota from customer where customer.salesman_id = (select salesman_id where lname = 'SMITH');
Physical Database design process
The goal of ______ is to create a design for storing data that will provide adequate performance, and insure database integrity, security, and recover ability
False
The grain of a data warehouse indicates the size and depth of the records.
True
The joining condition of an equi-join is based upon an equality.
False
The logical database design always forms the best foundation for grouping attributes in the physical design.
Purposes of SQL Standard
The purposes of ___ are too: - Specify semantics for data definition and manipulation - Define Data structures and basic operations - Enable portability of database definition & application modules - Specify minimum level and complete level standards - Allow for later growth/enhancement to standard (referential integrity, transaction management, user-defined functions, extended join operations, national character sets)
Star Schema
The simplest style of data mart schema and is the approach most widely used to develop data warehouses and dimensional data marts
field
The smallest unit of application data recognized by system software
False
The status of data is the representation of the data after an event has occurred.
True
The storage component of a client/server architecture is responsible for data storage and retrieval from the physical storage devices associated with the application.
True
There are applications for fact tables without any nonkey data, only the foreign keys for the associated dimensions.
False
Transaction integrity commands are not used to identify whole units of database changes that must be completed in full for the database to retain integrity.
False
Transient data are never changed.
True
Triggers have three parts: the event, the condition, and the action.
False
Two-tier architectures are much more scalable than three-tier architectures.
Inner Join
Type of Join that will only return rows from each table that have a matching row in the other.
False
Using an index for attributes referenced in ORDER BY and GROUP BY clauses has no significant impact upon database performance.
True
Web Services Description Language is an XML-based grammar.
True
When EXISTS or NOT EXISTS is used in a subquery, the select list of the subquery will usually just select all columns as a placeholder because it doesn't matter which columns are returned.
True
When a dimension participates in a hierarchy, the database designer can normalize the dimension into a nested set of tables with 1:M relationships between them.
False
When a subquery is used in the FROM clause, it is called a denied table.
False
When creating a table, it is not important to consider foreign key - primary key mates.
Factors to consider when selecting an appropriate file organization:
When selecting ____, factors that should be considered are - Fast Data retrieval and throughput - Efficient storage space utilization - protection from failure and data loss - Minimizing need for reorganization - Accommodating Growth - Security form authorized use
True
XML addresses the structuring and manipulation of information in a Web browser.
Web Services Description Language (WSDL)
XML-based grammar for describing a web service and specifying its public interface
False
XML-based vocabularies, such as XBRL, do not allow meaningful comparisons to be made of data across many organizations.
False
XQuery is an HTML transformation language used to query HTML pages.
True
XQuery is an XML transformation language for querying relational data as well as XML databases.
Physical database design
____ must comply with the following: •Sarbanes- Oxley Act (SOX) - protect investors by improving accuracy and reliability •Committee of Sponsoring Organizations (COSO) of the Treadway Commission •IT Infrastructure Library (ITIL) •Control Objectives for Information and Related Technology (COBIT)
Cloud Computing
a model for creating ubiquitous, convenient, on demand access to network services
Thin Client
a networked computer where resources are distributed over a network
Fat Client
a networked computer with most resources installed locally, rather than distributed over a network
Web Services
a set of emerging XML-based standards that define protocols for automatic communication between software programs over the web
null Value
allowing or prohibiting empty fields
Big Data
database whose volume, velocity, and variety strain the ability of the relational DBMS to capture, manage, and process data in a timely fashion
Delete statement: Removes rows from a table
o Delete certain rows DELETE FROM CUSTOMER_T WHERE CUSTOMERSTATE = 'HI'; o Delete all rows DELETE FROM CUSTOMER_T;
high-quality data
quality refers to the condition of a set of values of qualitative or quantitative variables. There are many definitions of data quality, but data is generally considered high quality if it is "fit for [its] intended uses in operations, decision making and planning"
Middleware
software that allows an application to inter-operate with other software with out requiring the user to understand the code-low level operations
Physical Data Model
transforms the logical description of data into technical specifications for storing and retrieving data