BCIS 4620 final pt3 draft
When using MapReduce, best practices suggest that the number of mappers on a given node should be ______.
100 or less
When using HDFS, a heartbeat is sent every ______ to notify the name node that the data node is still available.
3 seconds
______ processing occurs when a program runs from beginning to end without any user interaction.
Batch
__ tools focus on the strategic and tactical use of information.
Business intelligence
Which of the following is NOT one of the standard NoSQL categories?
Chart databases
The interactive, declarative query language in Neo4j is called ______.
Cypher
______ is used to extract knowledge from sources of data—NoSQL databases, Hadoop data stores, and data warehouses—to provide decision support to all organizational users.
Data Analytics
__ can serve as a test vehicle for companies exploring the potential benefits of data warehouses.
Data marts
__ are in charge of presenting data to the end user in a variety of ways.
Data visualization tools
______ do not store relationships as perceived in the relational model and generally have no support for join operations.
Document databases
(T F) For a data set to be considered Big Data, it must display only one of the 3 Vs (volume, velocity and variety).
False
(T F) Hive is a good choice for jobs that require a small subset of data to be returned very quickly.
False
(T F) The ability to graphically present data in a way that makes it understandable is the concept of value.
False
(T F) A block report is used to let the name node know that the data node is still available.
False
A data store is used by data analysts to create queries that access the database.
False
A data warehouse designer must define common business dimensions that will be used by a data analyst to narrow a search, group information, or describe attributes.
False
By default, the fact table's primary key is always formed by combining the superkeys pointing to the dimension tables to which they are related.
False
Data warehouse data are organized and summarized by table, such as CUSTOMER and ADDRESS.
False
Master data management's main goal is to provide a partial and segmented definition of all data within an organization.
False
Normalizing fact tables improves data access performance and saves data storage space.
False
Operational data and decision support data serve the same purpose.
False
Queries against operational data typically are broad in scope and high in complexity.
False
Relational data warehouses use multidimensional data schema support to handle multidimensional data.
False
The CUBE extension enables you to get a grand total for each column listed in the expression.
False
The ROLLUP extension is used with the GROUP BY clause to generate aggregates by the listed columns, including the last one.
False
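A rough sketch of how the two GROUP BY extensions differ, written in plain Python rather than SQL. It assumes the standard SQL grouping sets: ROLLUP(a, b) aggregates by (a, b), (a), and the grand total, while CUBE(a, b) aggregates by every combination of the listed columns. The sales rows and the function names are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sales rows: (region, product, amount)
rows = [
    ("East", "Widget", 10),
    ("East", "Gadget", 20),
    ("West", "Widget", 5),
]

def aggregate(rows, keep):
    """Sum amounts grouped by the column positions in `keep`.

    Columns that are not grouped on are shown as None, similar to the
    NULL placeholders in SQL subtotal rows.
    """
    totals = defaultdict(int)
    for *dims, amount in rows:
        key = tuple(d if i in keep else None for i, d in enumerate(dims))
        totals[key] += amount
    return dict(totals)

def rollup(rows, n_dims=2):
    """ROLLUP-style grouping sets: (a, b), then (a), then the grand total."""
    result = {}
    for n in range(n_dims, -1, -1):
        result.update(aggregate(rows, set(range(n))))
    return result

def cube(rows, n_dims=2):
    """CUBE-style grouping sets: every subset of the dimension columns."""
    result = {}
    for n in range(n_dims, -1, -1):
        for keep in combinations(range(n_dims), n):
            result.update(aggregate(rows, set(keep)))
    return result

for key, total in rollup(rows).items():
    print(key, total)
```

With these rows, rollup() has a subtotal per region but none per product, while cube() adds the per-product subtotals as well.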
______ refers to the analysis of the data to produce actionable results.
Feedback loop processing
______ was the first SQL on Hadoop application.
Impala
______ is a human-readable text format for data interchange that defines attributes and values in a document.
JavaScript Object Notation (JSON)
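A small sketch of what a JSON document looks like in practice, using Python's standard json module. The customer record and its field names are invented for illustration; any attribute/value pairs would work the same way.

```python
import json

# Hypothetical customer record; field names are illustrative only.
document = {"customer_id": 101, "name": "Ada", "tags": ["retail", "online"]}

# Serialize to human-readable JSON text, then parse it back.
text = json.dumps(document, indent=2)
restored = json.loads(text)

print(text)
```

The indent argument only affects readability (line breaks and indentation); the parsed result is the same with or without it.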
Which of the following is a personal analytics vendor for BI applications?
MicroStrategy
_____ splits a table into subsets of rows or columns and places the subsets close to the client computer to improve data access time.
Partitioning
______ is the coexistence of a variety of data storage and data management technologies within an organization's infrastructure.
Polyglot persistence
__ provide a unified, single point of entry for information distribution.
Portals
__ extends SQL so that it can differentiate between access requirements for data warehouse data and operational data.
ROLAP
______ minimizes the number of disk reads necessary to retrieve a row of data.
Row-centric storage
______ is a tool for converting data back and forth between a relational database and the HDFS.
Sqoop
Within Hadoop, ______ can transfer data in both directions - into and out of HDFS.
Sqoop
______ focuses on filtering data as it enters the system to determine which data to keep and which to discard.
Stream processing
(T F) A column family database is a NoSQL database model that organizes data in key-value pairs with keys mapped to a set of columns in the value component.
True
(T F) Big Data processing imposes a structure on the data as needed for applications as a part of retrieval and processing.
True
(T F) Characteristics that are important in working with data in the relational database model also apply to Big Data.
True
(T F) Hadoop is a database that has become the de facto standard for most Big Data storage and processing.
False
(T F) In many ways, the issues associated with volume and velocity are the same.
True
(T F) Interest in graph databases can be tied to the area of social networks.
True
(T F) The analysis of data to produce actionable results is feedback loop processing.
True
(T F) The name, MongoDB, comes from the word humongous as its developers intended their new product to support extremely large data sets.
True
(T F) Under the HDFS system, using a write-once, read-many model simplifies concurrency issues.
True
A star schema is designed to optimize data query operations rather than data update operations.
True
Advanced OLAP features become more useful when access to them is kept simple.
True
Business Intelligence (BI) architecture is composed of data, people, processes, technology, and the management of such components.
True
Business intelligence is a framework that allows a business to transform data into information, information into knowledge, and knowledge into wisdom.
True
Decision support data are a snapshot of the operational data at a given point in time.
True
Multidimensional data analysis techniques include advanced computational functions.
True
Periodicity, usually expressed as current year only, previous years, or all years, provides information about the time span of the data stored in a table.
True
ROLAP and MOLAP vendors are working toward the integration of their respective solutions within a unified decision support framework.
True
The data warehouse development life cycle differs from classical systems development.
True
To provide better performance, some OLAP systems merge data warehouse and data mart approaches by storing small extracts of the data warehouse at end-user workstations.
True
In the context of Big Data, ______ relates to changes in meaning.
Variability
______ is the Big Data 3 V that relates to the speed at which data is entering the system.
Velocity
In the context of Big Data, ______ refers to the trustworthiness of a set of data.
Veracity
Data collected or aggregated around a central topic or entity is said to be ______ aware.
aggregate
The attribute hierarchy provides a top-down data organization that is used for two main purposes: ______ and drill-down/roll-up data analysis.
aggregation
A __ index is based on 0 and 1 bits to represent a given condition.
bitmapped
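A minimal sketch of the idea behind a bitmapped index, assuming one list of 0/1 bits per distinct column value and one bit per row. The region values and the function name are hypothetical; real DBMS implementations pack and compress the bits.

```python
# Hypothetical values of a low-cardinality column, one entry per row.
regions = ["East", "West", "East", "North", "West"]

def build_bitmap_index(values):
    """Map each distinct value to a list of bits: 1 if the row has that value."""
    index = {}
    for row, value in enumerate(values):
        bits = index.setdefault(value, [0] * len(values))
        bits[row] = 1
    return index

index = build_bitmap_index(regions)

# Rows where region is East OR West: bitwise OR of the two bitmaps.
east_or_west = [a | b for a, b in zip(index["East"], index["West"])]

print(index["East"])
print(east_or_west)
```

Conditions combined with AND or OR become cheap bitwise operations on the bitmaps, which is why this kind of index suits columns with few distinct values.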
Big Data ______.
captures data in whatever format it naturally exists
When using HDFS, the ______ node creates new files by communicating with the ______ node.
client; name
Scaling out is also referred to as _______.
clustering
Document databases group documents into logical groups called ______.
collections
Conceptually, MDBMS end users visualize the stored data as a three-dimensional cube known as a __.
data cube
A __ is optimized for decision support and is generally represented by a data warehouse or a data mart.
data store
Bill Inmon and Chuck Kelley created a set of 12 rules to define a(n) __ .
data warehouse
In the business intelligence framework, data are captured from a production system and placed in the __ on a near-real-time basis.
data warehouse
The __ schema must support complex (non-normalized) data representations.
decision support database
From a data analyst's point of view, decision support data differ from operational data in three main areas: time span, granularity, and __.
dimensionality
The basic star schema has four components: facts, __ , attributes, and attribute hierarchies.
dimensions
Decision support data tend to be non-normalized, __ , and pre-aggregated.
duplicated
Most organizations that use Hadoop also use a set of other related products that interact and complement each other to produce an entire ______ of applications and tools.
ecosystem
To query the value component of the pair when using a key-value database, use get or ______.
fetch
In MongoDB, ______ method retrieves objects from a collection that match the restrictions provided.
find()
Fact and dimension tables are related by __ keys.
foreign
Neo4j is a ______ database.
graph
Data __ implies that all business entities, data elements, data characteristics, and business metrics are described in the same way throughout the enterprise.
integration
______ databases simply store data with no attempt to understand the contents of the value component or its meaning.
key-value (KV)
A(n) ______ is a tag that is used to associate a collection of nodes as being of the same type or belonging to the same group.
label
In star schema representation, a fact table is related to each dimension table in a __ relationship.
many-to-one (M:1)
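A small star schema sketch using Python's built-in sqlite3 module. The table and column names (fact_sales, dim_product, dim_time) are invented for illustration. Each fact row points to exactly one row in each dimension table (M:1), and the fact table's primary key combines the foreign keys to its dimensions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, year INTEGER);
-- Fact table: composite primary key made of the dimension foreign keys.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    time_id INTEGER REFERENCES dim_time(time_id),
    amount REAL,
    PRIMARY KEY (product_id, time_id)
);
INSERT INTO dim_product VALUES (1, 'Tools'), (2, 'Toys');
INSERT INTO dim_time VALUES (1, 2023), (2, 2024);
INSERT INTO fact_sales VALUES (1, 1, 100.0), (1, 2, 150.0), (2, 2, 40.0);
""")

# Use a dimension attribute (category) to group the facts (amount).
totals = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()

print(totals)
```

The query reads facts through a single join per dimension, which is the access pattern a star schema is meant to keep simple.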
When using MapReduce, a _______ function takes a collection of data and sorts and filters it into a set of key-value pairs.
map
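A single-machine sketch of the map and reduce roles, using word counting as the example. This is plain Python and not the Hadoop API; the function names and input lines are hypothetical, and a real job would distribute the map calls across nodes.

```python
from collections import defaultdict

def map_words(line):
    """Map step: emit a (word, 1) key-value pair for each word in one line."""
    return [(word.lower(), 1) for word in line.split()]

def reduce_counts(pairs):
    """Reduce step: sum the values collected for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

lines = ["big data big", "data node"]

# Run the map function over every input line, then reduce all the pairs.
pairs = [pair for line in lines for pair in map_words(line)]
counts = reduce_counts(pairs)

print(counts)
```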
A _____ is a dynamic table that not only contains the SQL query command to generate the rows, but also stores the actual rows.
materialized view
A multidimensional database management system (MDBMS) uses __ proprietary techniques to store data in n-dimensional arrays.
matrix-like
A ______ is a programmed function within an object used to manipulate the data in that same object.
method
Computed or derived facts, at run time, are sometimes called __ to differentiate them from stored facts.
metrics
Graph theory is a mathematical and computer science field that models relationships, or edges, between objects called ______.
nodes
The reliance on __ as the design methodology for relational databases is seen as a stumbling block to its use in OLAP systems.
normalization
In MongoDB, the ______ method is used to improve the readability of retrieved documents through the use of line breaks and indentation.
pretty()
A method of text analysis that attempts to determine if a statement conveys a positive, negative, or neutral attitude is referred to as ______ analysis.
sentiment
A __ schema is a type of star schema in which dimension tables can have their own dimension tables.
snowflake
In a column family database, a column that is composed of a group of other related columns is called a(n) ______.
super column
By default, Hadoop uses a replication factor of ______.
three
Operational data are commonly stored in many tables, and the stored data represent information about a given __ only.
transaction
(T F) Most NoSQL products run only in a Linux or Unix environment.
True
In a star schema, attributes are often used to search, filter, or classify facts.
True
In a typical star schema, each dimension record is related to thousands of records.
true