exam 4 review
______ includes NoSQL accommodation of various data types.
Variety
______ includes the value of speed in a NoSQL database.
Velocity
According to the UML Notation Guide, complete means:
all subclasses have been specified, and no others are expected.
While views promote security by restricting user access to data, they are not adequate security measures because:
an unauthorized person may gain access to a view through experimentation.
At a basic level, analytics refers to:
analysis and interpretation of data.
A(n) ________ is a named relationship between or among classes.
association
A(n) ________ is shown as a solid line between the participating classes.
association
Controls designed to restrict access and activities are called:
authorization rules.
The preferred method of fixing an aborted transaction is:
backing out the transaction.
A device to measure or detect fingerprints or signatures is called a(n) ________ device.
biometric
{disjoint, complete} is an example of a UML:
business rule constraint.
One way to improve the data capture process is to:
check entered data immediately for quality against data in the database
A DBMS periodically suspends all processing and synchronizes its files and journals through the use of a:
checkpoint facility.
A diagram that shows the static structure of an object-oriented model is called a(n):
class diagram.
A(n) ________ is an attribute of a class that specifies a value common to an entire class.
class-scope attribute
An operation that applies to a class rather than an object instance is a(n):
class-scope operation.
A class that has direct instances is called a(n) ________ class.
concrete
The actions that must be taken to ensure data integrity is maintained during multiple simultaneous transactions are called ________ actions.
concurrency control
A design goal for distributed databases that states that although a distributed database runs many transactions, it appears that a given transaction is the only one in the system is called:
concurrency transparency.
An operation that creates a new instance of a class is called a(n):
constructor operation.
Data quality problems can cascade when:
data are copied from legacy systems
Conformance means that:
data are stored, exchanged or presented in a format that is specified by its metadata
A repository of information about a database that documents data elements of a database is called a:
data dictionary.
Including data capture controls (e.g., dropdown lists) helps reduce ________ deteriorated data problems.
data entry
A technique using artificial intelligence to upgrade the quality of raw data is called:
data scrubbing
A technique using pattern recognition to upgrade the quality of raw data is called:
data scrubbing
Converting data from the format of its source to the format of its destination is called:
data transformation
The role of a ________ emphasizes integration and coordination of metadata across many data sources.
data warehouse administrator
A(n) ________ is a database stored on multiple computers in multiple locations that are NOT connected by a data communications link.
decentralized database
Loading data into a data warehouse does NOT involve:
formatting the hard drive
Simple paths to other databases without the benefits of one logical database are called:
gateways.
An approach in which the organization of data for reporting and analysis is determined when the data are used is called a:
schema on read.
A credit-card sized plastic card with an embedded microprocessor chip with the ability to store, process and output electronic data in a secure manner is called a(n):
smart card.
External data sources present problems for data quality because:
there is a lack of control over data quality
One characteristic of quality data which pertains to the expectation for the time between when data are expected and when they are available for use is:
timeliness
A discrete unit of work that must be processed completely or not at all within a computer system is called a:
transaction.
User interaction integration is achieved by creating fewer ________ that feed different systems.
user interfaces
Although volume, variety, and velocity are considered the initial three V dimensions, two additional Vs of big data were added and include:
veracity and value.
An optimistic approach to concurrency control is called:
versioning.
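Note: a minimal Python sketch of the optimistic idea behind versioning. It assumes a simplified scheme in which each record carries a version number that is checked at commit time; the class `VersionedRecord` and its methods are hypothetical names for illustration, not a particular DBMS's mechanism.

```python
class VersionedRecord:
    """A record that detects conflicting updates at commit time."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        # No lock is taken; the reader just remembers which version it saw.
        return self.value, self.version

    def try_commit(self, new_value, version_read):
        # Reject the update if another transaction committed in the meantime.
        if version_read != self.version:
            return False
        self.value = new_value
        self.version += 1
        return True


balance = VersionedRecord(100)
value_a, version_a = balance.read()   # transaction A reads
value_b, version_b = balance.read()   # transaction B reads the same version
first_ok = balance.try_commit(value_a - 30, version_a)   # A commits first
second_ok = balance.try_commit(value_b + 50, version_b)  # B is stale, so it is rejected
```

In this sketch the rejected transaction would simply re-read the record and try again.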
Which of the following is true of distributed databases?
Better local control
Which type of index is commonly used in data warehousing environments?
Bit-mapped index
Which of the following functions model business rules?
Database analysis
When a data repository (including internal and external data) does NOT follow a predefined schema, this is called a:
data lake.
The Hadoop Distributed File System (HDFS) is the foundation of a ________ infrastructure of Hadoop.
data management
All of the following are ways to consolidate data EXCEPT:
data rollup and integration
With HDFS it is less expensive to move the execution of computation to data than to move the:
data to computation
All of the following are benefits of object-oriented modeling EXCEPT:
decreased communication among the users, analysts, designers, and programmers.
All of the following are advantages of vertical partitioning EXCEPT:
easier to set up than horizontal partitioning.
The coding or scrambling of data so that humans cannot read them is called:
encryption.
A synchronized replication strategy has a(n) ________ reliability.
excellent
A(n) ________ prevents another transaction from reading and therefore updating a record until it is unlocked.
exclusive lock
A researcher trying to explain why sales of garden supplies in Hawaii have decreased would be an example of ________ data mining.
explanatory
The ________ occurs when one user reads data that have been partially updated by another user.
inconsistent read problem
A method of capturing only the changes that have occurred in the source data since the last capture is called ________ extract.
incremental
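Note: a small Python sketch of an incremental extract. It assumes each source row carries a last-modified timestamp; the rows and the function name `incremental_extract` are made up for illustration.

```python
from datetime import datetime

# Hypothetical source rows with a last-modified timestamp.
source_rows = [
    {"id": 1, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "updated_at": datetime(2024, 1, 5)},
    {"id": 3, "updated_at": datetime(2024, 1, 9)},
]


def incremental_extract(rows, last_capture):
    # Keep only the rows changed since the previous capture.
    return [row for row in rows if row["updated_at"] > last_capture]


changed = incremental_extract(source_rows, datetime(2024, 1, 4))
```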
A(n) ________ stores metadata about an organization's data and data processing resources.
information repository
The Unified Modeling Language:
is a notation useful for graphically depicting an object-oriented analysis or design model.
Data quality is important for all of the following reasons EXCEPT:
it provides a stream of profit
The process of combining data from various sources into a single table or view is called:
joining
An optimization strategy that allows sites that can update to proceed and other sites to catch up is called:
lazy commit
The implementation of an operation is called a(n):
method.
An organization using HDFS realizes that hardware failure is a(n):
norm.
NoSQL includes data storage and retrieval:
not based on the relational model.
A(n) ________ is a concept, abstraction, or thing that has a state, behavior, and identity.
object
The following figure is an example of a(n):
object diagram.
Both E-R models and object-oriented models are centered around:
objects.
The process of replacing a method inherited from a superclass by a more specific implementation of the method in a subclass is called:
overriding.
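Note: a brief Python sketch of overriding. The classes `Account` and `StudentAccount` and the fee amounts are hypothetical examples, not taken from the text.

```python
class Account:
    def monthly_fee(self):
        # General implementation defined in the superclass.
        return 5.0


class StudentAccount(Account):
    def monthly_fee(self):
        # More specific implementation that replaces the inherited method.
        return 0.0


fees = [account.monthly_fee() for account in (Account(), StudentAccount())]
```

The same call, `monthly_fee()`, behaves differently depending on the class of the object, which is also what polymorphism describes.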
In the ________ approach, one consolidated record is maintained from which all applications draw data.
persistent
All of the following are applications for big data and analytics EXCEPT:
personal finances.
A centralized strategy has ________ expandability.
poor
Application of statistical and computational methods to predict data events is:
predictive analytics.
Descriptive, predictive, and ________ are the three main types of analytics.
prescriptive
When an organization must decide on optimization and simulation tools to make things happen, it is using:
prescriptive analytics.
A distributed database must:
present a single logical database that is physically distributed
Data federation is a technique which:
provides a virtual view of integrated data without actually creating one centralized database.
A(n) ________ encompasses an object's properties and the values of those properties.
state
One simple task of a data quality audit is to:
statistically profile all files
A form of distributed database in which all data across a network are kept continuously updated, so a user can access any data anywhere on the network and get the same answer is called a(n) ________ distributed database.
synchronous
With a pull strategy of replication, the ________ node determines when a database is updated.
target
Replication should be used when:
there are no or few triggers.
All of the following are tasks of data cleansing EXCEPT:
creating foreign keys
Most data outages in organizations are caused by:
human error.
The NoSQL model that includes a simple pair of a key and an associated collection of values is called a:
key-value store.
It is true that in an HDFS cluster the DataNodes are the:
large number of slaves
The three 'v's commonly associated with big data include:
volume, variety, and velocity.
Apache Cassandra is a leading ________ NoSQL database management system.
wide-column
The NoSQL model that incorporates 'column families' is called a:
wide-column store.
Hive is a(n) ________ data warehouse software.
Apache
Which of the following are key steps in a data quality program?
Apply TQM principles and practices
Which of the following environments uses a different DBMS at each node and supports local databases for unique data requests?
Heterogeneous; federated
Which of the following environments uses the same DBMS at each node with a central or master DBMS coordinating database access across nodes?
Homogeneous; nonautonomous
______ is a design goal for a distributed database that says a site can independently administer and operate its database.
Local autonomy
_______ is a design goal for a distributed database, which says a user does not need to know the location of data to use the data.
Location transparency
Which of the following threats involves outside parties using information to embarrass a company?
Loss of confidentiality
________ tools commonly load data into intermediate hypercube structures.
MOLAP
Which of the following is NOT true of poor data and/or database administration?
Maintaining a secure server
The Hadoop framework consists of the ________ algorithm to solve large scale problems.
MapReduce
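Note: a single-process Python sketch of the MapReduce pattern, using a word count as the example. It only imitates the map, shuffle, and reduce steps in memory and does not use Hadoop; the function names and sample documents are assumptions for illustration.

```python
from collections import defaultdict


def map_phase(document):
    # Emit a (word, 1) pair for every word in one input document.
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    # Group all emitted values by key, as the framework would between phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    # Combine the values for each key into a single result.
    return {key: sum(values) for key, values in grouped.items()}


documents = ["big data big value", "data lake"]
pairs = [pair for document in documents for pair in map_phase(document)]
word_counts = reduce_phase(shuffle(pairs))
```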
The methods to ensure the quality of data across various subject areas are called:
Master Data Management
An organization that decides to adopt the most popular NoSQL database management system would select:
MongoDB
_______ indicates how many objects participate in a given relationship.
Multiplicity
An organization that requires a graph database that is highly scalable would select the ________ database management system.
Neo4j
Which of the following is true of data replication?
Node decoupling
According to your text, NoSQL stands for:
Not Only Structured Query Language.
Which of the following factors in deciding on database distribution strategies is related to autonomy of organizational units?
Organizational forces
________ is arguably the most common concern by individuals regarding big data analytics.
Personal privacy
Which of the following operations does NOT alter the state of an object?
Query
An organization that requires a sole focus on performance with the ability for keys to include strings, hashes, lists, and sorted sets would select ________ database management system.
Redis
______ is the most popular key-value store NoSQL database management system.
Redis
Data may be loaded from the staging area into the warehouse by following:
SQL Commands (Insert/Update)
Which of the following characterizes homogeneous environments?
Same DBMS used at all locations
________ are examples of Business Intelligence and Analytics 3.0 because they have millions of observations per second.
Smartphones
Sarbanes-Oxley Act was enacted to ensure the integrity of:
public companies' financial statements.
Event-driven propagation:
pushes data to duplicate sites as an event occurs
The major advantage of data propagation is:
real-time cascading of data changes throughout the organization
A ________ is a DBMS module that restores the database to a correct condition when a failure occurs.
recovery manager
All of the following are disadvantages of data replication EXCEPT:
reduced network traffic at prime time.
An approach to filling a data warehouse that employs bulk rewriting of the target data periodically is called:
refresh mode
A design goal for distributed databases to allow programmers to treat a data item replicated at several sites as though it were at one site is called:
replication transparency.
When incorrect data have been introduced, the database is best recovered by:
restarting from the most recent checkpoint and processing subsequent transactions.
Research shows that if an online customer does not get the service he or she expects within a few ________, the customer will switch to a competitor.
seconds
Guidelines for server security should include all of the following EXCEPT:
securing the network between client and server.
A joining operation in which only the joining attribute from one site is transmitted to the other site is called a(n):
semijoin.
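Note: a short Python sketch of a semijoin between two sites. Only the joining attribute values are "transmitted" (here, passed as a set). The table contents and the names `site_a_customers`, `site_b_orders`, and `semijoin` are hypothetical.

```python
# Hypothetical data: customers are stored at site A, orders at site B.
site_a_customers = [
    {"cust_id": 1, "name": "Ann"},
    {"cust_id": 2, "name": "Bob"},
    {"cust_id": 3, "name": "Cy"},
]
site_b_orders = [
    {"order_id": 10, "cust_id": 1},
    {"order_id": 11, "cust_id": 3},
    {"order_id": 12, "cust_id": 3},
]


def semijoin(local_rows, remote_rows, key):
    # Step 1: only the joining attribute values cross the network.
    shipped_keys = {row[key] for row in remote_rows}
    # Step 2: the local site keeps just the rows that have a match.
    return [row for row in local_rows if row[key] in shipped_keys]


matching_customers = semijoin(site_a_customers, site_b_orders, "cust_id")
```

Only the matching customer rows would then need to be sent on to complete the full join, which is the point of the technique.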
NoSQL systems enable automated ________, which distributes the data among multiple nodes so that each server can operate independently on the data located on it.
sharding
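Note: a minimal Python sketch of hash-based sharding, one common way to spread keys across nodes. The three in-memory "nodes", the sample keys, and the function names are assumptions; real NoSQL systems use their own partitioning schemes.

```python
import hashlib


def shard_for(key, num_nodes):
    # A stable hash of the key decides which node owns it.
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


# Three hypothetical nodes, each holding only its own share of the data.
nodes = {0: {}, 1: {}, 2: {}}
for user_id, name in [("u1", "Ann"), ("u2", "Bob"), ("u3", "Cy")]:
    nodes[shard_for(user_id, len(nodes))][user_id] = name


def lookup(user_id):
    # A read goes straight to the one node that owns the key.
    return nodes[shard_for(user_id, len(nodes))].get(user_id)
```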
It is true that in an HDFS cluster the NameNode is the:
single master server.
Security measures for dynamic Web pages are different from static HTML pages because:
the connection requires full access to the database for dynamic pages.
First degree or complete price discrimination relates to:
the maximum price customers are willing to pay.
Forward recovery is faster than restore/rerun because:
transactions do not have to be repeated
One way to generate, store, and forward messages for completed transactions to be broadcast across a network is through the use of:
triggers.
A(n) ________ is a procedure for acquiring the necessary locks for a transaction where all necessary locks are acquired before any are released.
two-phase lock
Quality data can be defined as being:
unique
The sequence of instructions required to process a transaction is called the:
unit of work.
Regarding big data value, the primary focus is on:
usefulness.
________ duplicates data across databases.
Data propagation
_______ is an application that can effectively employ snapshot replication in a distributed environment.
Data warehousing
________ is a technical function responsible for database design, security, and disaster recovery.
Database administration
Which of the following functions develop integrity controls?
Database design
Which of the following functions do cost/benefit models?
Database planning
________ is the technique of hiding the internal implementation details of an object from its external view.
Encapsulation
All of the following are popular architectures for Master Data Management EXCEPT:
Normalization
Object-oriented model objects differ from E-R models because:
OO objects exhibit behavior.
Which of the following is a function or service provided by all instances of a class?
Operation
In the figure below, which of the following is true?
Students use various software tools for different courses.
Which of the following is a principal type of authorization table?
Subject
Which of the following is a basic method for single field transformation?
Table lookup
______ includes concern about data quality issues.
Veracity
According to the UML Notation Guide, overlapping means:
a descendant may be descended from more than one of the subclasses.
An open-source DBMS is:
a free source-code RDBMS that provides the functionality of an SQL-compliant DBMS.
A(n) ________ defines the form or protocol of an operation, but not its implementation.
abstract operation
A characteristic of reconciled data that means the data reflect an enterprise-wide view is:
comprehensive
Allowing users to dive deeper into the view of data with online analytical processing (OLAP) is an important part of:
descriptive analytics.
The oldest form of analytics is:
descriptive analytics.
The goal of data mining related to analyzing data for unexpected relationships is:
exploratory
Datatype conflicts are an example of a(n) ________ reason for deteriorated data quality.
external data source
Getting poor data from a supplier is a(n) ________ reason for deteriorated data quality.
external data source
With ________, all of the actions of a transaction are either committed or not committed.
failure transparency
In a distributed database, a transaction that requires reference to data at one or more nonlocal sites is called a ________ transaction.
global
The step in which a distributed database decides the order in which to execute the distributed query is called:
global optimization.
The NoSQL model that is specifically designed to maintain information regarding the relationships (often real-world instances of entities) between data items is called a:
graph-oriented database.
A(n) ________ is submitted by a DBA to test the current performance of a database or predict the response time for queries.
heartbeat query
Data governance can be defined as:
high-level organizational groups and processes that oversee data stewardship
Data that are accurate, consistent, and available in a timely fashion are considered:
high-quality
The best place to improve data entry across all applications is:
in the database definitions
An audit trail of database changes is kept by a:
journalizing facility.
Big Data includes:
large volumes of data with many different data types that are processed at very high speeds.
Informational and operational data differ in all of the following ways EXCEPT:
level of detail
In the following diagram, ________ objects are present (i.e., :Registration).
link
With ________, users can act as if all the data were located at a single node.
location transparency
The extent of the database resource that is included with each lock is called the level of:
lock granularity.
Big data requires effectively processing:
many data types.
A ________ is the implementation of an operation.
method
Data replication allowing for each transaction to proceed without coordination is called:
node decoupling
In the ________ approach, one consolidated record is maintained, and all applications draw on that one actual "golden" record.
persistent
An organization should have one data warehouse administrator for every:
100 gigabytes of data in the enterprise data warehouse.
Which of the following is NOT a component of a repository system architecture?
A data transformation process
In the figure below, what relationship is shown?
Aggregation
Which of the following is a type of network security?
Authentication of the client workstation
Which of the following supports a simple path to other databases, without the benefits of one logical database?
Gateways
________ is an important scripting language to help reduce the complexity of MapReduce.
Pig
The W3C standard for Web privacy is called:
Platform for Privacy Preferences.
______ means that the same operation can apply to two or more classes in different ways.
Polymorphism
________ is used to undo unwanted database changes.
Rollback
A transaction that terminates abnormally is called a(n) ________ transaction.
aborted
With ________, the database itself is lost, destroyed, or cannot be read.
database destruction
NoSQL focuses on:
flexibility.
A graph of instances that are compatible within a class diagram is called a(n):
object diagram.
Data quality ROI stands for:
risk of incarceration
NoSQL systems allow ________ by incorporating commodity servers that can be easily added to the architectural solution.
scaling out
In the figure below, which of the following is true?
A faculty may advise up to a maximum of 10 students.
An information repository supplies information:
to database management systems.
An integrated partition strategy is ________ to manage.
difficult
A ________ allows a single SQL statement to refer to tables in more than one remote DBMS.
distributed request
Big data:
does not require a strictly defined data model.
When online analytical processing (OLAP) studies last year's sales, this represents:
descriptive analytics.
______ ensures that a transaction is successfully completed or else it is aborted.
Commit protocol
Which of the following is true about horizontal partitioning?
Data can be stored to optimize local access.
Which of the following are business conditions that encourage the use of distributed databases?
Data communication reliability
A trigger can be used as a security measure in which of the following ways?
To cause special handling procedures to be executed
TQM stands for:
Total Quality Management
_______ generally processes the largest quantities of data.
Transaction processing
Which of the following is NOT an area of concern when trying to maintain a well-tuned database?
User interface design
________ are not used for querying and analyzing data stored in data warehouses.
Word processing programs
Snapshot replication is most appropriate for:
a data warehouse application.
The process of transforming data from a detailed to a summary level is called:
aggregating