IST 210 Databases Chapter 14: Big Data Analytics and NoSQL
Graph Database
A NoSQL database model based on graph theory that stores relationship-rich data as a collection of nodes and edges.
Column Family Database
A NoSQL database model that organizes data into key-value pairs, in which the value component is composed of a set of columns that vary by row.
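To make the column family model concrete, here is a minimal Python sketch that models one column family as nested dictionaries. The family name `customers`, the row keys, and the helper functions `get_column` and `rows_with_column` are hypothetical; the point is only that each row key maps to its own set of columns, so two rows need not share the same columns.

```python
# Hypothetical "customers" column family: each row key maps to its own set of columns.
# Columns vary by row: row "C1" has an email column, row "C2" has a phone column instead.
customers = {
    "C1": {"name": "Ana", "city": "Erie", "email": "ana@example.com"},
    "C2": {"name": "Ben", "city": "Altoona", "phone": "555-0100"},
}

def get_column(family, row_key, column):
    """Return one column value for a row, or None if that row lacks the column."""
    return family.get(row_key, {}).get(column)

def rows_with_column(family, column):
    """Return the row keys that actually store the given column."""
    return [key for key, columns in family.items() if column in columns]

print(get_column(customers, "C2", "phone"))   # 555-0100
print(get_column(customers, "C1", "phone"))   # None
print(rows_with_column(customers, "email"))   # ['C1']
```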
Key-Value Database (KV)
A NoSQL database model that stores data as a collection of key-value pairs in which the value component is unintelligible to the DBMS.
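A toy in-memory store illustrates why the value is opaque: the store saves and returns raw bytes and can only look them up by key. The class `KeyValueStore`, its methods, and the choice of JSON as the application's encoding are assumptions for illustration, not any product's API.

```python
import json

class KeyValueStore:
    """Toy in-memory key-value store: values are stored as opaque bytes."""

    def __init__(self):
        self._data = {}

    def put(self, key: str, value: bytes) -> None:
        # The store never inspects the value; it only keeps the raw bytes.
        self._data[key] = value

    def get(self, key: str):
        # Lookups are by key only; there is no way to query inside a value.
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)

store = KeyValueStore()
# The application decides how to encode the value (JSON here, by assumption).
store.put("cart:1001", json.dumps({"items": ["pen", "ink"]}).encode("utf-8"))
raw = store.get("cart:1001")
cart = json.loads(raw.decode("utf-8"))  # only the application interprets the bytes
print(cart["items"])  # ['pen', 'ink']
```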
Document Database
A NoSQL database model that stores data in key-value pairs in which the value component is a tag-encoded document.
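The sketch below, a simplified assumption rather than any real document database's API, stores each value as a JSON document and filters on a tag inside the documents, something a key-value database cannot do because it never parses its values. The collection `documents` and the function `find` are hypothetical.

```python
import json

# Each value is a tag-encoded (JSON) document; the database can read the tags
# and filter on them. Documents in one collection need not share the same tags.
documents = {
    "order:1": json.dumps({"customer": "Ana", "total": 42.5, "status": "shipped"}),
    "order:2": json.dumps({"customer": "Ben", "total": 19.0, "status": "pending"}),
    "order:3": json.dumps({"customer": "Ana", "total": 7.25}),  # no status tag
}

def find(collection, field, value):
    """Return the keys of documents whose parsed `field` equals `value`."""
    matches = []
    for key, text in collection.items():
        doc = json.loads(text)          # parse the JSON document
        if doc.get(field) == value:     # documents may omit a field entirely
            matches.append(key)
    return matches

print(find(documents, "customer", "Ana"))   # ['order:1', 'order:3']
print(find(documents, "status", "pending")) # ['order:2']
```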
Job Tracker
A central control program used to accept, distribute, monitor, and report on MapReduce processing jobs in a Hadoop environment.
Volume
A characteristic of Big Data that describes the quantity of data to be stored.
Velocity
A characteristic of Big Data that describes the speed at which data enters the system and must be processed.
Variety
A characteristic of Big Data that describes the variations in the structure of data to be stored.
Batch Processing
A data processing method that runs data processing tasks from beginning to end without any user interaction.
NewSQL
A database model that attempts to provide ACID-compliant transactions across a highly distributed infrastructure.
Hadoop Distributed File System (HDFS)
A highly distributed, fault-tolerant file storage system designed to manage large amounts of data at high speeds.
JSON (JavaScript Object Notation)
A human-readable text format for data interchange that defines attributes and values in a document.
BSON (Binary JSON)
A binary, computer-readable format for data interchange that expands the JSON format to include additional data types, including binary objects.
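Hand-encoding a tiny document shows what "binary" means here. This is a minimal sketch, assuming the element layout published in the BSON specification (bsonspec.org) and supporting only UTF-8 string and 32-bit integer values; the function `encode_bson` is hypothetical, and a real application would use a driver library instead. The result is a byte sequence with length prefixes and type codes, not readable text.

```python
import struct

def encode_bson(doc: dict) -> bytes:
    """Encode a flat dict of str and int values as a BSON document.

    Supports only two element types from the BSON spec:
    0x02 (UTF-8 string) and 0x10 (32-bit signed integer).
    """
    body = b""
    for name, value in doc.items():
        key = name.encode("utf-8") + b"\x00"  # element names are null-terminated strings
        if isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            # type byte, name, int32 length (including trailing null), string bytes
            body += b"\x02" + key + struct.pack("<i", len(data)) + data
        elif isinstance(value, int) and not isinstance(value, bool):
            # type byte, name, little-endian int32 value
            body += b"\x10" + key + struct.pack("<i", value)
        else:
            raise TypeError(f"unsupported type: {type(value).__name__}")
    # document = int32 total size (including itself) + elements + trailing 0x00
    return struct.pack("<i", len(body) + 5) + body + b"\x00"

print(encode_bson({"hello": "world"}))
# b'\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00'
```

The leading `\x16` is the total length (22 bytes), which lets a reader skip a whole document without parsing it.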
Scaling Out
A method for dealing with data growth that involves distributing data storage structures across a cluster of commodity servers.
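A common way to decide which server in a scaled-out cluster holds a given record is hash partitioning. This is a minimal sketch under stated assumptions: the node names and customer IDs are hypothetical, and simple modulo hashing is only one placement scheme among several.

```python
import hashlib

# Hypothetical cluster of commodity servers.
NODES = ["node-a", "node-b", "node-c"]

def node_for(key: str, nodes=NODES) -> str:
    """Pick a server by hashing the key; a stable hash keeps placement deterministic."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

# Spread six customer records across the cluster.
placement = {}
for customer_id in ["C1", "C2", "C3", "C4", "C5", "C6"]:
    placement.setdefault(node_for(customer_id), []).append(customer_id)
print(placement)
```

A stable hash from `hashlib` is used because Python's built-in `hash()` for strings is randomized per process. One drawback of modulo placement is that adding a server changes `len(nodes)` and moves most keys; distributed systems often use consistent hashing to limit that movement.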
Scaling Up
A method for dealing with data growth that involves migrating the same structure to more powerful systems.
Sentiment Analysis
A method of text analysis that attempts to determine if a statement conveys a positive, negative, or neutral attitude.
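One simple approach to sentiment analysis is lexicon-based scoring: count positive and negative words and compare. This is a deliberately tiny sketch; the word lists `POSITIVE` and `NEGATIVE` and the function `sentiment` are hypothetical, and production systems typically rely on much larger lexicons or trained models.

```python
# Hypothetical, tiny sentiment lexicon; real systems use far larger lexicons or trained models.
POSITIVE = {"good", "great", "love", "fast", "helpful"}
NEGATIVE = {"bad", "slow", "hate", "broken", "rude"}

def sentiment(statement: str) -> str:
    """Classify a statement as positive, negative, or neutral by counting lexicon hits."""
    words = [w.strip(".,!?").lower() for w in statement.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("Great service, very helpful staff!"))  # positive
print(sentiment("The app is slow and broken."))         # negative
print(sentiment("I ordered a pen."))                    # neutral
```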
NoSQL
A new generation of database management systems that is not based on the traditional relational database model.
Column-Centric Storage
A physical data storage technique in which data is stored in blocks, which hold data from a single column across many rows.
Row-Centric Storage
A physical data storage technique in which data is stored in blocks, which hold data from all columns of a given set of rows.
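The difference between the two storage techniques shows up when a query touches only one column. The sketch below, with hypothetical data and Python lists standing in for disk blocks, stores the same table both ways and sums a single column.

```python
# The same three-row table laid out two ways (hypothetical data).
rows = [
    ("C1", "Ana", 120.0),
    ("C2", "Ben", 80.0),
    ("C3", "Cy", 45.5),
]

# Row-centric: each "block" holds every column of a set of rows.
row_blocks = [rows[0:2], rows[2:3]]

# Column-centric: each "block" holds one column across many rows.
column_blocks = {
    "id":      [r[0] for r in rows],
    "name":    [r[1] for r in rows],
    "balance": [r[2] for r in rows],
}

# Summing one column: the row layout must read whole rows from every block...
total_from_rows = sum(r[2] for block in row_blocks for r in block)
# ...while the column layout reads only the single "balance" block.
total_from_columns = sum(column_blocks["balance"])

print(total_from_rows, total_from_columns)  # 245.5 245.5
```

Both layouts give the same answer; the column-centric layout simply reads less data to get it, which is why column stores suit analytic queries over a few columns of many rows.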
Algorithm
A process or set of operations in a calculation.
Data Mining
A process that employs automated tools to analyze data in a data warehouse and other sources and to proactively identify possible relationships and anomalies.
Task Tracker
A program in the MapReduce framework responsible for running map and reduce tasks on a node.
Mapper
A program that performs a map function.
Reducer
A program that performs a reduce function.
Traversal
A query in a graph database.
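A traversal typically starts at one node and follows edges outward. The following is a minimal breadth-first traversal in Python over hypothetical `FRIEND` edges, answering "who is within two hops of Ana?"; the data and the function `within_hops` are illustrative assumptions, not a graph database's query language.

```python
from collections import deque

# Nodes are entity instances; edges are relationships (hypothetical "FRIEND" edges).
edges = [
    ("Ana", "FRIEND", "Ben"),
    ("Ben", "FRIEND", "Cy"),
    ("Cy", "FRIEND", "Dee"),
    ("Ana", "FRIEND", "Eve"),
]

# Build an adjacency list so each node can reach its neighbors directly.
adjacency = {}
for src, _label, dst in edges:
    adjacency.setdefault(src, []).append(dst)
    adjacency.setdefault(dst, []).append(src)  # treat FRIEND as undirected

def within_hops(start, max_hops):
    """Breadth-first traversal: return nodes reachable from start in 1..max_hops edges."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # don't expand beyond the hop limit
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    return sorted(n for n, hops in seen.items() if hops > 0)

print(within_hops("Ana", 2))  # ['Ben', 'Cy', 'Eve']
```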
Data Analytics
A subset of business intelligence functionality that encompasses a wide range of mathematical, statistical, and modeling techniques with the purpose of extracting knowledge from data.
MapReduce
An open-source application programming interface (API) that provides fast data analytics services; one of the main Big Data technologies that allows organizations to process massive data stores.
Feedback Loop Processing
Analyzing stored data to produce actionable results.
Explanatory Analytics
Data analysis that provides ways to discover relationships, trends, and patterns among data.
Predictive Analytics
Data analytics that use advanced statistical and modeling techniques to predict future business outcomes with great accuracy.
Unstructured Data
Data that does not conform to a predefined data model.
Structured Data
Data that conforms to a predefined data model.
Column Family
In a column family database, a collection of columns or super columns related to a collection of rows.
Super Column
In a column family database, a column that is composed of a group of other related columns.
Edge
In a graph database, the representation of a relationship between nodes.
Node
In a graph database, the representation of a single entity instance.
Properties
In a graph database, the attributes or characteristics of a node or edge that are of interest to the users.
Bucket
In a key-value database, a logical collection of related key-value pairs.
Block Report
In the Hadoop Distributed File System (HDFS), a report sent every 6 hours by the data node to the name node, informing the name node which blocks are on that data node.
Heartbeat
In the Hadoop Distributed File System (HDFS), a signal sent every 3 seconds from the data node to notify the name node that the data node is still available.
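The heartbeat and block report definitions describe two separate signals the name node relies on. This hypothetical `NameNode` class is a simplified model, not HDFS code: it records heartbeat times and block reports, then answers which live data nodes hold a block. The `DEAD_AFTER` timeout is an illustrative value, not the HDFS default.

```python
HEARTBEAT_INTERVAL = 3               # seconds, per the Heartbeat definition
DEAD_AFTER = 10 * HEARTBEAT_INTERVAL  # hypothetical timeout, not the HDFS default

class NameNode:
    """Simplified, hypothetical model of name-node bookkeeping."""

    def __init__(self):
        self.last_heartbeat = {}   # data node -> time of last heartbeat
        self.block_locations = {}  # block id -> set of data nodes holding it

    def heartbeat(self, data_node, now):
        self.last_heartbeat[data_node] = now

    def block_report(self, data_node, block_ids):
        # A block report lists every block on the node, so replace its old entries.
        for holders in self.block_locations.values():
            holders.discard(data_node)
        for block in block_ids:
            self.block_locations.setdefault(block, set()).add(data_node)

    def live_nodes(self, now):
        """Data nodes whose last heartbeat is recent enough."""
        return sorted(n for n, t in self.last_heartbeat.items() if now - t <= DEAD_AFTER)

    def readable_locations(self, block, now):
        """Data nodes that hold the block and are still sending heartbeats."""
        live = set(self.live_nodes(now))
        return sorted(self.block_locations.get(block, set()) & live)

nn = NameNode()
nn.block_report("dn1", ["blk_1", "blk_2"])
nn.block_report("dn2", ["blk_1"])
nn.heartbeat("dn1", now=0)
nn.heartbeat("dn2", now=0)
nn.heartbeat("dn1", now=30)  # dn2 goes silent
print(nn.live_nodes(now=33))                   # ['dn1']
print(nn.readable_locations("blk_1", now=33))  # ['dn1']
```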
Visualization
The ability to graphically present data in such a way as to make it understandable to users.
Variability
The characteristic of Big Data in which the same data values vary in meaning over time.
Polyglot Persistence
The coexistence of a variety of data storage and data management technologies within an organization's infrastructure.
Value
The degree to which data can be analyzed to provide meaningful insights.
Reduce
The function in a MapReduce job that collects and summarizes the results of map functions to produce a single result.
Map
The function in a MapReduce job that sorts and filters data into a set of key-value pairs as a subtask within a larger job.
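Taken together, the Map and Reduce definitions describe the classic word-count job. This plain-Python sketch imitates the flow (map, then a grouping or shuffle step, then reduce) on two hypothetical input splits; the function names are illustrative and this is not Hadoop's Java API.

```python
from collections import defaultdict

def map_words(document):
    """Map: emit a (word, 1) key-value pair for every word in one input split."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_counts(key, values):
    """Reduce: summarize all values for one key into a single result."""
    return key, sum(values)

splits = ["big data big ideas", "data beats ideas"]  # hypothetical input splits
intermediate = [pair for split in splits for pair in map_words(split)]
result = dict(reduce_counts(k, v) for k, v in shuffle(intermediate).items())
print(result)  # {'big': 2, 'data': 2, 'ideas': 2, 'beats': 1}
```

In a real cluster each split would be mapped on a different node, which is why the grouping step is needed before any reducer can produce a total.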
Stream Processing
The processing of data inputs in order to make decisions about which data to keep and which data to discard before storage.
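A minimal sketch of the keep-or-discard decision: a Python generator inspects each reading as it arrives and passes on only out-of-range values for storage. The sensor data, the function `stream_filter`, and the `low`/`high` thresholds are hypothetical.

```python
def stream_filter(readings, low, high):
    """Yield only readings outside [low, high]; in-range readings are discarded, never stored."""
    for sensor_id, value in readings:
        if value < low or value > high:
            yield sensor_id, value

# Hypothetical sensor stream; iter() stands in for data arriving one item at a time.
incoming = iter([("s1", 21.0), ("s2", 98.6), ("s1", 22.5), ("s3", -5.0)])
stored = list(stream_filter(incoming, low=0.0, high=50.0))
print(stored)  # [('s2', 98.6), ('s3', -5.0)]
```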
Veracity
The trustworthiness of a set of data.