DB Ch10-14
The process of transforming data from a detailed to a summary level is called:
aggregating.
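A minimal sketch of aggregating, assuming hypothetical detail rows of (region, sale amount); the names `detail_rows` and `aggregate_by_region` are illustrative only and not from the text:

```python
from collections import defaultdict

# Hypothetical detail-level data: one row per individual sale.
detail_rows = [("East", 100), ("West", 250), ("East", 50), ("West", 25)]


def aggregate_by_region(rows):
    """Roll detail rows up to one summary total per region."""
    totals = defaultdict(int)
    for region, amount in rows:
        totals[region] += amount
    return dict(totals)


summary = aggregate_by_region(detail_rows)
print(summary)
```

The four detail rows collapse into one summary value per region, which is the detailed-to-summary transformation the question describes.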
A distributed database must:
all of the above
A model of a real world application and associated properties are created in the:
analysis phase
Replication should be used when:
no or few triggers
An organization using HDFS realizes that hardware failure is a(n):
norm
According to your text, NoSQL stands for:
Not Only SQL
A graph of instances that are compatible within a class diagram is called a(n):
object diagram
Data replication allowing for each transaction to proceed without coordination is called:
node decoupling
NoSQL focuses on:
flexibility
An organization should have one data warehouse administrator for every:
100 gigabytes of data in the warehouse
Hive is a(n) ________ data warehouse software.
Apache
Which of the following are key steps in a data quality program?
Apply TQM principles and practices
A trigger can be used as a security measure in which of the following ways?
Cause handling procedures to be executed
Which of the following functions model business rules?
DB analysis
The best place to improve data entry across all applications is:
In the database definitions
Which type of index is commonly used in data warehousing environments?
Join and bit-mapped index
________ tools commonly load data into intermediate hypercube structures.
MOLAP
The methods to ensure the quality of data across various subject areas are called:
Master Data Management
An organization that decides to adopt the most popular NoSQL database management system would select:
MongoDB
An organization that requires a graph database that is highly scalable would select the ________ database management system.
Neo4j
TQM stands for:
Total Quality Management.
Which type of operation has side effects?
Update
A design goal for distributed databases that states that although a distributed database runs many transactions, it appears that a given transaction is the only one in the system is called:
concurrency transparency
All of the following are tasks of data cleansing EXCEPT:
creating foreign keys.
Snapshot replication is most appropriate for:
data warehouse
In the ________ approach, one consolidated record is maintained, and all applications draw on that one actual "golden" record.
persistent
Loading data into a data warehouse does NOT involve:
formatting the hard drive.
An audit trail of database changes is kept by a:
journalizing facility
Which of the following factors in deciding on database distribution strategies is related to autonomy of organizational units?
organizational forces
NoSQL systems enable automated ________ to distribute the data among multiple nodes so that each server can operate independently on the data located on it.
sharding
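One way to picture sharding is a hash-based assignment of keys to nodes. This is a rough sketch under that assumption; real NoSQL systems use their own schemes (range-based or consistent hashing, for example), and `shard_for` is a hypothetical helper:

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to a shard number using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


# Distribute hypothetical customer keys across three nodes.
NUM_SHARDS = 3
shards = {i: [] for i in range(NUM_SHARDS)}
for customer_id in ["c1", "c2", "c3", "c4", "c5"]:
    shards[shard_for(customer_id, NUM_SHARDS)].append(customer_id)

for shard_id, keys in shards.items():
    print(shard_id, keys)
```

Because the same key always hashes to the same shard, each server can work on its own slice of the data without consulting the others.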
External data sources present problems for data quality because:
there is a lack of control over data quality.
Quality data can be defined as being:
unique.
________ includes NoSQL accommodation of various data types.
variety
Although volume, variety, and velocity are considered the initial three v dimensions, two additional Vs of big data were added and include:
veracity and value
The three 'v's commonly associated with big data include:
volume, variety, and velocity.
Apache Cassandra is a leading producer of ________ NoSQL database management systems.
wide-column
The NoSQL model that incorporates 'column families' is called a:
wide-column store
Which of the following functions develop integrity controls?
DB design
Which of the following functions do cost/benefit models?
DB planning
Including data capture controls (i.e., dropdown lists) helps reduce ________ deteriorated data problems
Data entry
________ duplicates data across databases.
Data propagation
All of the following are popular architectures for Master Data Management EXCEPT
Normalization
The W3C standard for Web privacy is called:
Platform for Privacy Preferences (P3P)
An organization that requires a sole focus on performance with the ability for keys to include strings, hashes, lists, and sorted sets would select ________ database management system.
Redis
________ is the most popular key-value store NoSQL database management system.
Redis
Data may be loaded from the staging area into the warehouse by following:
SQL Commands (Insert/Update).
Research shows that if an online customer does not get the service he or she expects within a few ________, the customer will switch to a competitor.
Seconds
________ are examples of Business Intelligence and Analytics 3.0 because they have millions of observations per second.
Smartphones
Which of the following is a principal type of authorization table?
Subject
Which of the following is a basic method for single field transformation?
Table lookup
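A small sketch of a table-lookup transformation on a single field, assuming a hypothetical code table that maps state abbreviations to full names:

```python
# Hypothetical lookup table: source code -> target value.
STATE_LOOKUP = {"CA": "California", "NY": "New York", "TX": "Texas"}


def transform_state(code: str) -> str:
    """Translate one source field through the lookup table.

    Codes that are not in the table become "Unknown" rather than
    raising an error, which is a design choice made for this sketch.
    """
    return STATE_LOOKUP.get(code.strip().upper(), "Unknown")


print(transform_state(" ca "))
print(transform_state("ZZ"))
```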
Which of the following is true of distributed databases?
better local control
One way to improve the data capture process is to
check entered data immediately for quality against data in the database
A diagram that shows the static structure of an object-oriented model is called a(n):
class diagram
First degree or complete price discrimination relates to:
the company charges each customer the maximum that customer is willing to pay
A part object which belongs to only one whole object and which lives and dies with the whole object is called a:
composition
A characteristic of reconciled data that means the data reflect an enterprise-wide view is
comprehensive
Data quality problems can cascade when:
data are copied from legacy systems.
Conformance means that
data are stored, exchanged or presented in a format that is specified by its metadata.
Which of the following is true about horizontal partitioning?
data can be stored to optimize local access
The Hadoop Distributed File System (HDFS) is the foundation of a ________ infrastructure of Hadoop.
data management
All of the following are ways to consolidate data EXCEPT:
data rollup and integration
A technique using pattern recognition to upgrade the quality of raw data is called:
data scrubbing
A technique using artificial intelligence to upgrade the quality of raw data is called
data scrubbing.
With HDFS it is less expensive to move the execution of computation to data than to move the:
data to computation
Which of the following is NOT a component of a repository system architecture?
data transformation
Converting data from the format of its source to the format of its destination is called:
data transformation.
The role of a ________ emphasizes integration and coordination of metadata across many data sources.
data warehouse administrator
________ is an application that can effectively employ snapshot replication in a distributed environment.
data warehousing
The oldest form of analytics is:
descriptive
When online analytical processing (OLAP) studies last year's sales, this represents:
descriptive (past)
Allowing users to dive deeper into the view of data with online analytical processing (OLAP) is an important part of:
descriptive analytics
A researcher trying to explain why sales of garden supplies in Hawaii have decreased would be an example of ________ data mining.
explanatory
The goal of data mining related to analyzing data for unexpected relationships is:
exploratory
Datatype conflicts is an example of a(n) ________ reason for deteriorated data quality
external data source
Getting poor data from a supplier is a(n) ________ reason for deteriorated data quality
external data source.
Which of the following is true of data replication?
fast response, node decoupling, additional storage requirements
The step in which a distributed database decides the order in which to execute the distributed query is called:
global optimization
The NoSQL model that is specifically designed to maintain information regarding the relationships (often real-world instances of entities) between data items is called a:
graph-oriented database.
Data governance can be defined as:
high-level organizational groups and processes that oversee data stewardship
Data that are accurate, consistent, and available in a timely fashion are considered:
high-quality.
A method of capturing only the changes that have occurred in the source data since the last capture is called ________ extract.
incremental
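An incremental extract can be sketched as a filter on a last-modified timestamp. The row layout, the `updated_at` field, and the `incremental_extract` helper are assumptions made for illustration:

```python
from datetime import datetime

# Hypothetical source rows, each carrying a last-modified timestamp.
source_rows = [
    {"id": 1, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "updated_at": datetime(2024, 1, 5)},
    {"id": 3, "updated_at": datetime(2024, 1, 9)},
]


def incremental_extract(rows, last_capture):
    """Return only the rows changed after the previous capture time."""
    return [row for row in rows if row["updated_at"] > last_capture]


changed = incremental_extract(source_rows, datetime(2024, 1, 4))
print([row["id"] for row in changed])
```

Only the rows modified since the last capture are pulled, in contrast to a full (static) extract that would copy every row each time.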
The process of combining data from various sources into a single table or view is called:
joining.
The NoSQL model that includes a simple pair of a key and an associated collection of values is called a:
key-value store
An optimization strategy that allows sites that can update to proceed and other sites to catch up is called:
lazy commit
Informational and operational data differ in all of the following ways EXCEPT:
level of detail.
Big data requires effectively processing:
many data types
The Hadoop framework consists of the ________ algorithm to solve large scale problems.
MapReduce
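The map, shuffle, and reduce steps can be imitated in a single process with a word count. This is only a sketch of the idea, not Hadoop's API; the function names are hypothetical:

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group the emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the grouped values to get one count per word."""
    return {key: sum(values) for key, values in grouped.items()}


documents = ["big data big value", "data variety"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts)
```

In a real cluster the map and reduce calls would run in parallel on the nodes that hold the data; here they run sequentially to keep the example self-contained.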
Object-oriented model objects differ from E-R models because:
objects vs relations
The process of replacing a method inherited from a superclass by a more specific implementation of the method in a subclass is called:
overriding
In the ________ approach, one consolidated record is maintained from which all applications draw data.
persistent
________ is arguably the most common concern by individuals regarding big data analytics.
personal privacy
________ means that the same operation can apply to two or more classes in different ways.
polymorphism
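A short sketch of polymorphism, which also shows overriding: each subclass replaces the inherited `compute_pay` method with its own version. The class names and pay rules are hypothetical:

```python
class Employee:
    def __init__(self, name):
        self.name = name

    def compute_pay(self):
        """Default behavior; subclasses override this method."""
        return 0.0


class HourlyEmployee(Employee):
    def __init__(self, name, hours, rate):
        super().__init__(name)
        self.hours = hours
        self.rate = rate

    def compute_pay(self):
        # Overrides the inherited method with hourly pay.
        return self.hours * self.rate


class SalariedEmployee(Employee):
    def __init__(self, name, annual_salary):
        super().__init__(name)
        self.annual_salary = annual_salary

    def compute_pay(self):
        # Overrides the inherited method with monthly salary.
        return self.annual_salary / 12


staff = [HourlyEmployee("Ann", 10, 20.0), SalariedEmployee("Bob", 60000)]
pay = [person.compute_pay() for person in staff]
print(pay)
```

The same `compute_pay` call is applied to both objects, and each class responds in its own way.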
Application of statistical and computational methods to predict data events is:
predictive analytics.
Descriptive, predictive, and ________ are the three main types of analytics.
prescriptive
When an organization must decide on optimization and simulation tools to make things happen it is using:
prescriptive analytics
Regarding big data value, the primary focus is on:
privacy
Data federation is a technique which
provides a virtual view of integrated data without actually creating one centralized database
Event-driven propagation:
pushes data to duplicate sites as an event occurs.
The major advantage of data propagation is:
real-time cascading of data changes throughout the organization.
An approach to filling a data warehouse that employs bulk rewriting of the target data periodically is called
refresh mode
A design goal for distributed databases to allow programmers to treat a data item replicated at several sites as though it were at one site is called:
replication transparency (sites)
Data quality ROI stands for:
return on investment
NoSQL systems allow ________ by incorporating commodity servers that can be easily added to the architectural solution.
scaling out
An approach in which the organization of the data for reporting and analysis is determined when the data is used is called:
schema on read.
It is true that in an HDFS cluster the NameNode is the:
single master server
It is true that in an HDFS cluster the DataNodes are the:
slaves
One simple task of a data quality audit is to:
statistically profile all files
The end of an association where it connects to a class is called a(n):
association role
Security measures for dynamic Web pages are different from static HTML pages because:
the connection requires full access to the database for dynamic pages
One characteristic of quality data which pertains to the expectation for the time between when data are expected and when they are available for use is:
timeliness
________ generally processes the largest quantities of data.
transaction processing
User interaction integration is achieved by creating fewer ________ that feed different systems
user interfaces