Databases Comprehensive
The SDLC phase in which database processing programs are created is the ________ phase.
Implementation
A method of capturing only the changes that have occurred in the source data since the last capture is called ________ extract.
Incremental
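A minimal sketch of an incremental extract, assuming each source row carries a hypothetical last_modified timestamp (real sources may instead rely on logs or change flags):

```python
from datetime import datetime

# Hypothetical source rows; the last_modified column is an assumption.
source_rows = [
    {"id": 1, "name": "Ann", "last_modified": datetime(2024, 1, 5)},
    {"id": 2, "name": "Bob", "last_modified": datetime(2024, 2, 10)},
    {"id": 3, "name": "Cy", "last_modified": datetime(2024, 3, 1)},
]


def incremental_extract(rows, last_capture):
    """Return only the rows changed after the previous capture."""
    return [r for r in rows if r["last_modified"] > last_capture]


# Capture everything that changed since 1 February 2024.
changed = incremental_extract(source_rows, datetime(2024, 2, 1))
changed_ids = [r["id"] for r in changed]
```

A static extract, by contrast, would copy all three rows regardless of when they changed.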
In the figure below, which attribute is derived?
Years_Employed
Operational and informational systems are generally separated because of which of the following factors?
A data warehouse centralizes data that are scattered throughout disparate operational systems and makes them readily available for decision support applications.
What results would the following SQL statement produce? select owner, table_name from dba_tables where table_name = 'CUSTOMER';
A listing of the owner of the customer table
An entity cluster is:
A set of one or more entity types and associated relationships grouped into a single abstract entity type
Which of the following is NOT a component of a repository system architecture?
An informational model
The SDLC phase in which the detailed conceptual data model is created is the ________ phase.
Analysis
A property or characteristic of an entity type that is of interest to the organization is called a(n):
Attribute
Which of the following data-mining techniques identifies clusters of observations with similar characteristics?
Clustering and signal processing
________ is/are a new technology which trade(s) off storage space savings for computing time.
Column databases
A(n) ________ constraint is a type of constraint that addresses whether an instance of a supertype must also be an instance of at least one subtype.
Completeness
An attribute that uniquely identifies an entity and consists of a composite attribute is called a(n):
Composite identifier
A primary key that consists of more than one attribute is called a:
Composite key
Which of the following is true of data visualization?
Correlations and clusters in data can be easily identified
A technique using pattern recognition to upgrade the quality of raw data is called:
Data scrubbing
With HDFS it is less expensive to move the execution of computation to data than to move the:
Data to computation
The role of a ________ emphasizes integration and coordination of metadata across many data sources.
Data warehouse administrator
A data mart is a(n):
Data warehouse that is limited in scope
________ is a technical function responsible for database design, security, and disaster recovery.
Database administration
Which of the following is software used to create, maintain, and provide controlled access to databases?
Database management system (DBMS)
The value a field will assume unless the user enters an explicit value for an instance of that field is called a:
Default value
Allowing users to dive deeper into the view of data with online analytical processing (OLAP) is an important part of:
Descriptive analytics
The oldest form of analytics is:
Descriptive analytics
A star schema contains both fact and ________ tables.
Dimension
The ________ rule specifies that an entity can be a member of only one subtype at a time.
Disjoint
A ________ addresses whether an instance of a supertype may simultaneously be a member of two or more subtypes.
Disjointness constraint
________ is the most popular RDBMS data model notation.
ERD
In order to embed SQL inside of another language, the ________ statement must be placed before the SQL in the host language.
EXEC SQL
A database action that results from a transaction is called a(n):
Event
The goal of data mining related to analyzing data for unexpected relationships is:
Exploratory
A contiguous section of disk storage space is called a(n):
Extent
Datatype conflicts are an example of a(n) ________ reason for deteriorated data quality.
External data source
A disadvantage of partitioning is:
Extra space and update time
A file organization is a named portion of primary memory.
False
A join index is a combination of two or more indexes.
False
A synonym is an attribute that may have more than one meaning.
False
A ternary relationship is equivalent to three binary relationships.
False
A transversal dependency is a functional dependency between two or more nonkey attributes.
False
An extent is a named portion of secondary memory allocated for the purpose of storing physical records.
False
Horizontal partitioning refers to the process of combining several smaller relations into a larger table.
False
In the figure below, Name would be an ideal identifier.
False
In the figure shown below, a person has to be married.
False
In the figure shown below, a rental unit can be both a house and an apartment.
False
It is desirable that no two attributes across all entity types have the same name.
False
Master data management is the disciplines, technologies and methods to ensure the currency, meaning and quality of data within one subject area.
False
Most data outages in organizations are caused by hardware failures.
False
Operational metadata are derived from the enterprise data model.
False
Quality data are not essential for well-run organizations.
False
Reduced uptime is a disadvantage of partitioning.
False
Specifying the attribute names in the SELECT statement will make it easier to find errors in queries and also correct for problems that may occur in the base system.
False
Subqueries can only be used in the WHERE clause.
False
The degree of a relationship is the number of attributes that are associated with it.
False
The following figure is an example of total specialization.
False
The following queries produce the same results. Select DISTINCT customer_name, customer_city from customer, salesman where customer.salesman_id = salesman.salesman_id and salesman.lname = 'SMITH'; select customer_name, customer_city from customer where customer.salesman_id = (select salesman_id from salesman where lname = 'SMITH');
False
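One situation in which the two queries can differ, sketched with Python's built-in sqlite3 module. The sample rows are hypothetical, and this is only one possible reason for the answer: when two customers share a name and city, DISTINCT collapses them while the subquery version keeps both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE salesman (salesman_id INTEGER PRIMARY KEY, lname TEXT);
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT,
                           customer_city TEXT, salesman_id INTEGER);
    INSERT INTO salesman VALUES (1, 'SMITH');
    -- Assumed data: two distinct customers with the same name and city.
    INSERT INTO customer VALUES (10, 'Acme', 'Dallas', 1);
    INSERT INTO customer VALUES (11, 'Acme', 'Dallas', 1);
""")

# Join version: DISTINCT removes the duplicate (name, city) pair.
join_rows = conn.execute("""
    SELECT DISTINCT customer_name, customer_city
    FROM customer, salesman
    WHERE customer.salesman_id = salesman.salesman_id
      AND salesman.lname = 'SMITH'
""").fetchall()

# Subquery version: no DISTINCT, so both customer rows are returned.
subquery_rows = conn.execute("""
    SELECT customer_name, customer_city
    FROM customer
    WHERE customer.salesman_id =
          (SELECT salesman_id FROM salesman WHERE lname = 'SMITH')
""").fetchall()
```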
The internal schema consists of the physical schema and the enterprise data model.
False
Older systems that often contain data of poor quality are called ________ systems.
Legacy
A database is an organized collection of ________ related data.
Logically
________ tools commonly load data into intermediate hypercube structures.
MOLAP
Which of the following is NOT true of poor data and/or database administration?
Maintaining a secure server
A student can attend five classes, each with a different professor. Each professor has 30 students. The relationship of students to professors is a ________ relationship.
Many-to-many
When an organization must decide on optimization and simulation tools to make things happen it is using:
Prescriptive analytics
An attribute (or attributes) that uniquely identifies each row in a relation is called a:
Primary Key
A two-dimensional table of data sometimes is called a:
Relation
________ are established between entities in a well-structured database so that the desired information can be retrieved.
Relationships
Packaged data models:
Require customization
An attribute that must have a value for every entity (or relationship) instance is a(n):
Required Attribute
A centralized knowledge base of all data definitions, data relationships, screen and report formats, and other system components is called a(n):
Repository
The ________ rule specifies that each entity instance of the supertype must be a member of some subtype in the relationship.
Total specialization
A method for handling missing data is to:
Track missing data with special reports
A candidate key is an attribute, or combination of attributes, that uniquely identifies a row in a relation.
True
A data quality audit helps an organization understand the extent and nature of data quality problems.
True
A data warehouse contains summarized and historical information.
True
A good data definition is always accompanied by diagrams, such as the entity-relationship diagram.
True
A hashing algorithm is a routine that converts a primary key value into a relative record number.
True
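A minimal sketch of such a routine, assuming the simple division-remainder method and integer keys (the function name and slot count are illustrative only):

```python
def relative_record_number(key: int, num_slots: int) -> int:
    """Map a primary key value to a slot using division-remainder hashing."""
    return key % num_slots


# With 7 slots, key 1010 lands in slot 2 because 1010 = 7 * 144 + 2.
slot = relative_record_number(1010, 7)

# Two different keys can hash to the same slot (a collision).
collides = relative_record_number(1017, 7) == slot
```

Real file organizations also need a policy for collisions, which this sketch does not handle.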
A method of capturing data in a snapshot at a point in time is called static extract.
True
A partial functional dependency is a functional dependency in which one or more nonkey attributes are functionally dependent on part (but not all) of the primary key.
True
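A small sketch of how a partial dependency might show up in sample data. The relation, its composite key (order_id, product_id), and the helper name are hypothetical. Sample rows can only refute a dependency, never prove that it holds in general.

```python
def determines(rows, lhs, rhs):
    """True if, in these rows, equal lhs values always come with equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


# Assumed relation with composite primary key (order_id, product_id).
order_lines = [
    {"order_id": 1, "product_id": "A", "product_name": "Desk", "quantity": 2},
    {"order_id": 1, "product_id": "B", "product_name": "Lamp", "quantity": 1},
    {"order_id": 2, "product_id": "A", "product_name": "Desk", "quantity": 5},
]

# product_name depends on product_id alone: a partial dependency.
partial = determines(order_lines, ["product_id"], ["product_name"])

# quantity is not determined by either half of the key on its own.
qty_by_product = determines(order_lines, ["product_id"], ["quantity"])
qty_by_order = determines(order_lines, ["order_id"], ["quantity"])
```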
A physical schema contains the specifications for how data from a conceptual schema are stored in a computer's secondary memory.
True
A pointer is a field of data that can be used to locate a related field or record of data.
True
A primary key is an attribute that uniquely identifies each row in a relation.
True
A single occurrence of an entity is called an entity instance.
True
A tablespace is a named set of disk storage elements in which physical files for the database tables may be stored.
True
Data scrubbing is a technique using pattern recognition and other artificial intelligence techniques to upgrade the quality of raw data before transforming and moving the data to the data warehouse.
True
Data structures include data organized in the form of tables with rows and columns.
True
ETL is short for Extract, Transform, Load.
True
For the relationship represented in the figure below, a department can have more than one employee.
True
HBase is a wide-column store database that runs on top of HDFS (modeled after Google's Bigtable).
True
Human intervention is an important part of big data analytics.
True
Improving data capture process is a fundamental step in data quality improvement.
True
In order to find out what customers have not placed an order for a particular item, one might use the NOT qualifier along with the IN qualifier.
True
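A minimal sketch of the NOT IN pattern, using Python's built-in sqlite3 module with hypothetical tables and data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         customer_id INTEGER NOT NULL, item TEXT);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (100, 1, 'desk'), (101, 2, 'lamp'), (102, 3, 'desk');
""")

# Customers whose ID is NOT IN the set of customers who ordered a desk.
no_desk = conn.execute("""
    SELECT name FROM customer
    WHERE customer_id NOT IN
          (SELECT customer_id FROM orders WHERE item = 'desk')
    ORDER BY name
""").fetchall()
```

The NOT NULL on orders.customer_id matters here: if the subquery returned a NULL, NOT IN would match no rows at all.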
Informational systems are designed to support decision making based on historical point-in-time and prediction data.
True
One of the biggest challenges of the extraction process is managing changes in the source system.
True
One property of a relation is that each attribute within a relation has a unique name.
True
Packaged data models are as flexible as possible, because all supertype/subtype relationships allow the total specialization and overlap rules.
True
Participation in a relationship may be optional or mandatory.
True
Security is one advantage of partitioning.
True
The UNION clause is used to combine the output from multiple queries into a single result table.
True
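A minimal sketch of UNION, using Python's built-in sqlite3 module with hypothetical customer and supplier tables. Note that plain UNION also removes duplicate rows from the combined result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (name TEXT, city TEXT);
    CREATE TABLE supplier (name TEXT, city TEXT);
    INSERT INTO customer VALUES ('Ann', 'Dallas'), ('Bob', 'Austin');
    INSERT INTO supplier VALUES ('Parts Co', 'Austin'), ('Wood Inc', 'Tulsa');
""")

# Combine the output of two queries into a single result table.
cities = conn.execute("""
    SELECT city FROM customer
    UNION
    SELECT city FROM supplier
    ORDER BY city
""").fetchall()
```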
The first requirement for building a user-friendly interface is a set of metadata that describes the data in the data mart in business terms that users can easily understand.
True
The need for data warehousing in an organization is driven by its need for an integrated view of high-quality data.
True
The relationship between the instances of two entity types is called a binary relationship.
True
The three major types of analytics are: descriptive, predictive, and prescriptive.
True
________ includes concern about data quality issues.
Veracity
______ partitioning distributes the columns of a table into several separate physical records.
Vertical
The three 'v's commonly associated with big data include:
Volume, variety, velocity
An entity type whose existence depends on another entity type is called a ________ entity.
Weak
A relation that contains minimal redundancy and allows easy use is considered to be:
Well-structured
Data federation is a technique which:
provides a virtual view of integrated data without actually creating one centralized database.
A ________ is a DBMS module that restores the database to a correct condition when a failure occurs.
recovery manager
Relational databases establish the relationships between entities by means of common fields included in a file called a(n):
relation
A DBMS may perform checkpoints automatically or in response to commands in user application programs.
True
A business transaction is a sequence of steps that constitute some well-defined business activity.
True
A join in which the joining condition is based on equality between values in the common column is called an equi-join.
True
An SQL query that implements an outer join will return rows that do not have matching values in common columns.
True
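A minimal sketch of an outer join, using Python's built-in sqlite3 module. The Customer_T and Order_T names follow the cards in this set; the rows are hypothetical. A LEFT OUTER JOIN is used so that the customer with no order still appears.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer_T (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
    CREATE TABLE Order_T (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
    INSERT INTO Customer_T VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Order_T VALUES (1001, 1);
""")

# Bob has no matching order, so OrderID comes back as NULL (None in Python).
rows = conn.execute("""
    SELECT Customer_T.CustomerID, CustomerName, OrderID
    FROM Customer_T LEFT OUTER JOIN Order_T
         ON Customer_T.CustomerID = Order_T.CustomerID
    ORDER BY Customer_T.CustomerID
""").fetchall()
```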
An enterprise data warehouse that accepts near-real time feeds of transactional data and immediately transforms and loads the appropriate data is called a real-time data warehouse.
True
SQL statements can be included in another language, such as C or Java.
True
Smartphones can produce millions of observations per second making them Business Intelligence and Analytics 3.0.
True
The external schema contains a subset of the conceptual schema relevant to a particular group of users.
True
An optimistic approach to concurrency control is called:
Versioning
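A simplified sketch of the optimistic idea behind versioning, assuming a single in-memory record and a version counter (the class and method names are hypothetical, and a real DBMS works very differently in detail). No locks are taken; a conflict is detected only when a transaction tries to commit.

```python
class VersionedRecord:
    """A record that accepts a write only if it is based on the current version."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        return self.value, self.version

    def write(self, new_value, expected_version):
        """Commit only if nobody else has updated the record since it was read."""
        if expected_version != self.version:
            return False  # conflict: the caller must re-read and retry
        self.value = new_value
        self.version += 1
        return True


balance = VersionedRecord(100)
_, version_a = balance.read()  # transaction A reads version 0
_, version_b = balance.read()  # transaction B also reads version 0

first_ok = balance.write(150, version_a)  # A commits first and succeeds
second_ok = balance.write(80, version_b)  # B is rejected as stale
```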
A transaction that terminates abnormally is called a(n) ________ transaction.
aborted
While views promote security by restricting user access to data, they are not adequate security measures because:
an unauthorized person may gain access to a view through experimentation
At a basic level, analytics refers to:
analysis and interpretation of data
One way to improve the data capture process is to:
check entered data immediately for quality against data in the database
The actions that must be taken to ensure data integrity is maintained during multiple simultaneous transactions are called ________ actions.
concurrency control
Organizing the database in computer disk storage is done in the ________ phase.
design
Embedded SQL consists of:
hard-coded SQL statements included in a program written in another language
Data quality is important for all of the following reasons EXCEPT:
it provides a stream of profit
The process of combining data from various sources into a single table or view is called:
joining
Data that describe the properties of other data are:
metadata
Data quality ROI stands for:
risk of incarceration
A(n) ________ is a procedure for acquiring the necessary locks for a transaction where all necessary locks are acquired before any are released.
two-phase lock
Establishing IF-THEN-ELSE logical processing within an SQL statement can be accomplished by:
using the CASE keyword in a statement
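A minimal sketch of IF-THEN-ELSE logic with CASE, using Python's built-in sqlite3 module. The product table and the 100 price threshold are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (name TEXT, price REAL);
    INSERT INTO product VALUES ('chair', 45.0), ('desk', 250.0);
""")

# CASE returns one value per row depending on which condition is met.
tiers = conn.execute("""
    SELECT name,
           CASE WHEN price >= 100 THEN 'premium'
                ELSE 'standard'
           END AS tier
    FROM product
    ORDER BY name
""").fetchall()
```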
The following figure shows an example of:
A composite attribute
For the relationship represented in the figure below, which of the following is true?
A department can have more than one employee
Which of the following advances in information systems contributed to the emergence of data warehousing?
Advances in middleware products that enabled enterprise database connectivity across heterogeneous platforms.
The SDLC phase in which every data attribute is defined, every category of data is listed and every business relationship between data entities is defined is called the ________ phase.
Analysis
A ________ defines or constrains some aspect of the business.
Business rule
Which of the following factors drive the need for data warehousing?
Businesses need an integrated view of company information
Which of the following criteria should be considered when selecting an identifier?
Choose an identifier that doesn't have large composite attributes
An attribute that can be broken down into smaller parts is called a(n) ________ attribute.
Composite
________ takes a value of TRUE if a subquery returns an intermediate results table which contains one or more rows.
EXISTS
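A minimal sketch of EXISTS with a correlated subquery, using Python's built-in sqlite3 module. The table names follow the cards in this set; the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer_T (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
    CREATE TABLE Order_T (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
    INSERT INTO Customer_T VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Order_T VALUES (1001, 1);
""")

# EXISTS is true for a customer when the subquery returns at least one row.
with_orders = conn.execute("""
    SELECT CustomerName FROM Customer_T
    WHERE EXISTS (SELECT 1 FROM Order_T
                  WHERE Order_T.CustomerID = Customer_T.CustomerID)
""").fetchall()

# NOT EXISTS selects the customers for whom the subquery is empty.
without_orders = conn.execute("""
    SELECT CustomerName FROM Customer_T
    WHERE NOT EXISTS (SELECT 1 FROM Order_T
                      WHERE Order_T.CustomerID = Customer_T.CustomerID)
""").fetchall()
```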
In the following diagram, which of the answers below is true?
Each patient has one or more patient histories
An advantage of partitioning is:
Efficiency
A primary key whose value is unique across all relations is called a(n):
Enterprise key
A person, place, object, event, or concept about which the organization wishes to maintain data is called a(n):
Entity
The logical representation of an organization's data is called a(n):
Entity-relationship model
A constraint is a rule in a database system that can be violated by users.
False
A data mart is a data warehouse that contains data that can be used across the entire organization.
False
A default value is the value that a field will always assume, regardless of what the user enters for an instance of that field.
False
Creating a data model from a packaged data model requires much more skill than creating one from scratch.
False
Databases are generally the property of a single department within an organization.
False
Denormalization is the process of transforming relations with variable-length fields into those with fixed-length fields.
False
Generalization is a top-down process.
False
Hadoop is considered a relational database management system.
False
There are two principal types of authorization tables: one for subjects and one for facts.
False
There can be multivalued attributes in a relation.
False
Transient data are never changed.
False
The smallest unit of application data recognized by system software is a:
Field
A(n) ________ is a technique for physically arranging the records of a file on secondary storage devices.
File organization
A(n) ________ is a field of data used to locate a related field or record.
Pointer
An audit trail of database changes is kept by a:
Journalizing facility
The entity integrity rule states that:
No primary key attribute can be null
Which of the following are properties of relations?
No two rows in a relation are identical
According to your text, NoSQL stands for:
Not Only Structured Query Language
In a supertype/subtype hierarchy, each subtype has:
Only one supertype
The most commonly used form of join operation is the:
Natural join
In the figure below, which of the following is a subtype of patient?
Outpatient
The ________ rule states that an entity instance can simultaneously be a member of two (or more) subtypes.
Overlap
Which type of file is most efficient with storage space?
Sequential
In the figure below, which attribute is multivalued?
Skill
________ are examples of Business Intelligence and Analytics 3.0 because they have millions of observations per second.
Smartphones
The process of defining one or more subtypes of a supertype and forming relationships is called:
Specialization
Which of the following is an entity that exists independently of other entity types?
Strong
________ is a tool even non-programmers can use to access information from a database.
Structured Query Language
The characteristic that indicates that a data warehouse is organized around key high-level entities of the enterprise is:
Subject-oriented
A type of query that is placed within a WHERE or HAVING clause of another query is called a:
Subquery
The following code is an example of a: SELECT CustomerName, CustomerAddress, CustomerCity, CustomerState, CustomerPostalCode FROM Customer_T WHERE Customer_T.CustomerID = (SELECT Order_T.CustomerID FROM Order_T WHERE OrderID = 1008);
Subquery
An attribute of the supertype that determines the target subtype(s) is called the:
Subtype discriminator
Which of the following is a generic entity type that has a relationship with one or more subtypes?
Supertype
Which of the following is a basic method for single field transformation?
Table lookup
An open-source DBMS is:
A free source-code RDBMS that provides the functionality of an SQL-compliant DBMS
An alternative name for an attribute is called a(n):
Alias
The following code would include: SELECT Customer_T.CustomerID, CustomerName, OrderID FROM Customer_T RIGHT OUTER JOIN Order_T ON Customer_T.CustomerID = Order_T.CustomerID;
All rows of the Order_T Table regardless of matches with the Customer_T Table
The property by which subtype entities possess the values of all attributes of a supertype is called:
Attribute inheritance
Which of the following is a type of network security?
Authentication of the client workstation
A method to allow adjacent secondary memory space to contain rows from several tables is called:
Clustering
The real-time data warehouse is characterized by which of the following?
Data are immediately transformed and loaded into the warehouse
A repository of information about a database that documents data elements of a database is called a:
Data dictionary
Including data capture controls (i.e., dropdown lists) helps reduce ________ deteriorated data problems.
Data entry
________ is a component of the relational data model included to specify business rules to maintain the integrity of data when they are manipulated.
Data integrity
The Hadoop Distributed File System (HDFS) is the foundation of a ________ infrastructure of Hadoop.
Data management
A graphical system used to capture the nature and relationships among data is called a(n):
Data model
________ duplicates data across databases.
Data propagation
Which of the following functions develops cost/benefit models?
Database planning
The process of managing simultaneous operations against a database so that data integrity is maintained is called completeness control.
False
The schema on write and schema on read are considered synonymous approaches.
False
The status of data is the representation of the data after an event has occurred.
False
When all multivalued attributes have been removed from a relation, it is said to be in:
First normal form
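A small sketch of removing a multivalued attribute, assuming a hypothetical employee record whose skills field holds several values. The multivalued attribute moves to its own relation keyed by employee and skill.

```python
# Unnormalized data: "skills" is a multivalued attribute (assumed sample data).
employees = [
    {"emp_id": 1, "name": "Ann", "skills": ["SQL", "Java"]},
    {"emp_id": 2, "name": "Bob", "skills": ["SQL"]},
]

# EMPLOYEE(emp_id, name): one atomic value per attribute.
employee_rel = [(e["emp_id"], e["name"]) for e in employees]

# EMPLOYEE_SKILL(emp_id, skill): one row per employee/skill pair.
employee_skill_rel = [(e["emp_id"], s) for e in employees for s in e["skills"]]
```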
An attribute in a relation of a database that serves as the primary key of another relation in the same database is called a:
Foreign Key
Loading data into a data warehouse does NOT involve:
Formatting the hard drive
A constraint between two attributes is called a(n):
Functional dependency
The process of defining a more general entity type from a set of more specialized entity types is called:
Generalization
In which type of file is multiple key retrieval not possible?
Hashed
An attribute that may have more than one meaning is called a(n):
Homonym
The Hadoop framework consists of the ________ algorithm to solve large scale problems.
MapReduce
The methods to ensure the quality of data across various subject areas are called:
Master Data management
A functional dependency in which one or more nonkey attributes are functionally dependent on part, but not all, of the primary key is called a ________ dependency.
Partial functional
The ________ rule specifies that an entity instance of a supertype is allowed not to belong to any subtype.
Partial specialization
In the figure below, which of the following apply to both OUTPATIENTs and RESIDENT_PATIENTs?
Patient_Name
All of the following are applications for big data and analytics EXCEPT:
Personal finances
________ is arguably the most common concern by individuals regarding big data analytics.
Personal privacy
Descriptive, predictive, and ________ are the three main types of analytics.
Prescriptive
One of the most popular RAD methods is:
Prototyping
Conformed dimensions allow users to do the following:
Query across fact tables with consistency
Data that are detailed, current, and intended to be the single, authoritative source of all decision support applications are called ________ data.
Reconciled
Which of the following is NOT an advantage of database systems?
Redundant data
While triggers run automatically, ________ do not and have to be called.
Routines
One field or combination of fields for which more than one record may have the same combination of values is called a(n):
Secondary Key
The traditional methodology used to develop, maintain and replace information systems is called the:
Systems Development Life Cycle.
Data is represented in the form of:
Tables
Security measures for dynamic Web pages are different from static HTML pages because:
The connection requires full access to the database for dynamic pages
The following figure shows an example of:
The overlap rule
Subtypes should be used when:
There are attributes that apply to some but not all instances of an entity type.
A trigger can be used as a security measure in which of the following ways?
To cause special handling procedures to be executed
A referential integrity constraint is a rule that maintains consistency among the rows of two relations.
True
Triggers have three parts: the event, the condition, and the action.
True
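A minimal sketch of the three parts of a trigger, using Python's built-in sqlite3 module. The account and audit_log tables and the 1000 threshold are hypothetical, and trigger syntax varies between DBMS products.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL);
    CREATE TABLE audit_log (account_id INTEGER, old_balance REAL, new_balance REAL);

    CREATE TRIGGER log_large_change
    AFTER UPDATE OF balance ON account             -- event
    WHEN ABS(NEW.balance - OLD.balance) > 1000     -- condition
    BEGIN
        INSERT INTO audit_log                      -- action
        VALUES (OLD.id, OLD.balance, NEW.balance);
    END;

    INSERT INTO account VALUES (1, 500.0);
    UPDATE account SET balance = 600.0 WHERE id = 1;   -- small change: not logged
    UPDATE account SET balance = 5000.0 WHERE id = 1;  -- large change: logged
""")

log = conn.execute("SELECT * FROM audit_log").fetchall()
```

The trigger fires automatically on the UPDATE; nothing in the application code calls it by name.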
When two or more attributes describe the same characteristic of an entity, they are synonyms.
True
The ________ operator is used to combine the output from multiple queries into a single result table.
UNION
A generic or template data model that can be reused as a starting point for a data modeling project is called a(n):
Universal data model
Regarding big data value, the primary focus is on:
Usefulness
A(n) ________ is often developed by identifying a form or report that a user needs on a regular basis.
User View
In the figure below, to which of the following entities are the entities "CAR" and "TRUCK" generalized?
Vehicle
A procedure is:
called by name
A join operation:
causes two tables with a common domain to be combined into a single table or view
A logical data mart is a(n):
data mart created by a relational view of a slightly denormalized data warehouse.
A technique using artificial intelligence to upgrade the quality of raw data is called:
data scrubbing
A detailed coding scheme recognized by system software for representing organizational data is called a(n):
data type
The coding or scrambling of data so that humans cannot read them is called:
encryption
The following code is an example of a(n): SELECT Customer_T.CustomerID, Order_T.CustomerID, CustomerName, OrderID FROM Customer_T, Order_T WHERE Customer_T.CustomerID = Order_T.CustomerID;
equi-join
Data governance can be defined as:
high-level organizational groups and processes that oversee data stewardship
Most data outages in organizations are caused by:
human error
The best place to improve data entry across all applications is:
in the database definitions
The analysis of summarized data to support decision making is called:
informational processing
An operational data store (ODS) is a(n):
integrated, subject-oriented, updateable, current-valued, detailed database designed to serve the decision support needs of operational users
A dependent data mart:
is filled exclusively from the enterprise data warehouse with reconciled data.
Dynamic SQL:
is used to generate appropriate SQL code on the fly as an application is processing
Big Data includes:
large volumes of data with many different data types that are processed at very high speeds.
The approach in which the organization of the data for reporting and analysis is determined when the data are used is called a:
schema on read
External data sources present problems for data quality because:
there is a lack of control over data quality
User-defined transactions can improve system performance because:
transactions are processed as sets, reducing system overhead
A relatively small team of people who collaborate on the same project is called a:
workgroup