Database chapter 10
Data in MongoDB is represented in:
BSON.
Hive uses ________ to query data.
HiveQL
The Hadoop framework consists of the ________ algorithm to solve large scale problems.
MapReduce
HBASE is a wide-column store database that runs on top of HDFS (modeled after Google).
True
HP HAVEn integrates HP technologies with open source big data technologies.
True
NoSQL databases DO NOT support ACID (atomicity, consistency, isolation, and durability).
True
NoSQL stands for 'Not only SQL.'
True
The 'schema on read' approach often incorporates JSON or XML.
True
The original three 'v's' attributed to big data include volume, variety, and velocity.
True
NoSQL focuses on:
flexibility.
The NoSQL model that includes a simple pair of a key and an associated collection of values is called a:
key-value store.
It is true that in an HDFS cluster the DataNodes are the:
large number of slaves.
Big data includes:
large volumes of data with many different data types that are processed at very high speeds.
Big data requires effectively processing:
many data types.
Structured Query Language (SQL) is a set of methodologies, processes, architectures, and technologies that transform raw data into meaningful information.
False
MongoDB databases are composed of:
collections.
When a data repository (including internal and external data) does NOT follow a predefined schema, this is called a:
data lake.
The Hadoop Distributed File System (HDFS) is the foundation of a ________ infrastructure of Hadoop.
data management
With HDFS it is less expensive to move the execution of computation to data than to move the:
data to computation.
Big data:
does not require a strictly defined data model.
Value (related to the five 'v's' of big data) addresses the pursuit of a meaningful goal.
True
________ includes NoSQL accommodation of various data types.
Variety
The NoSQL model that is specifically designed to maintain information regarding the relationships (often real-world instances of entities) between data items is called a:
graph-oriented database.
NoSQL includes data storage and retrieval:
not based on the relational model.
Hive is a(n) ________ data warehouse software.
Apache
The dive in anywhere characteristic of a data lake overrides constraints related to confidentiality.
False
The schema on write and schema on read are considered synonymous approaches.
False
Transaction processing and management reporting tend to fit big data databases better than relational databases.
False
An organization that requires a sole focus on performance with the ability for keys to include strings, hashes, lists, and sorted sets would select ________ database management system.
Redis
________ is the most popular key-value store NoSQL database management system.
Redis
________ includes the value of speed in a NoSQL database.
Velocity
________ includes concern about data quality issues.
Veracity
________ is an important scripting language to help reduce the complexity of MapReduce.
Pig
Although volume, variety, and velocity are considered the initial three v dimensions, two additional Vs of big data were added and include:
veracity and value.
The three 'v's' commonly associated with big data include:
volume, variety, and velocity.
Big data allows for two different data types (text and numeric).
False
The target market for Hadoop is small to medium companies using local area networks.
False
Word processing documents are commonly stored in a 'document store' NoSQL database model.
False
Hive creates MapReduce jobs and executes them on a Hadoop Cluster.
True
NoSQL systems allow ________ by incorporating commodity servers that can be easily added to the architectural solution
scaling out
When reporting and analysis organization of the data is determined when the data is used is called a(n):
schema on read
Economies of storage indicate data storage costs increase every year.
False
Neo4j is a wide-column NoSQL database management system developed by Oracle.
False
NoSQL focuses on avoidance of replication and minimizing storage space.
False
An organization using HDFS realizes that hardware failure is a(n):
norm.
An organization that decides to adopt the most popular NoSQL database management system would select:
MongoDB.
An organization that requires a graph database that is highly scalable would select the ________ database management system.
Neo4j
According to your text, NoSQL stands for:
Not Only SQL.
________ generally processes the largest quantities of data.
Transaction processing
A business owner that needs carefully normalized tables would likely need a relational database instead of a NoSQL database.
True
Apache Cassandra is a wide-column NoSQL database management system.
True
Big data databases tend to sacrifice consistency for availability.
True
Collect everything is a characteristic of a data lake.
True
JSON is commonly used in conjunction with the 'document store' NoSQL database model.
True
At a basic level, analytics refers to:
analysis and interpretation of data.
NoSQL systems enable automated ________ to allow distribution of the data among multiple nodes to allow servers to operate independently on the data located on it.
sharding
It is true that in an HDFS cluster the NameNode is the:
single master server.
The primary use of Pig is to:
transform raw data into a format that is useful for analysis.
Apache Cassandra is a leading producer of ________ NoSQL database management systems.
wide-column
The NoSQL model that incorporates 'column families' is called a:
wide-column store.