Chapter 14 - Advanced SQL

Hadoop Distributed File System

A highly distributed, fault-tolerant file storage system designed to manage large amounts of data at high speeds.

JSON (JavaScript Object Notation)

A human-readable text format for data interchange that defines attributes and values in a document. (Know the full term)
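As a small illustration of the definition above, the following sketch round-trips a record through JSON text using Python's standard `json` module. The `customer` record and its attribute names are made up for the example.

```python
import json

# Hypothetical record used only for illustration.
customer = {"id": 101, "name": "Ada", "tags": ["vip", "online"]}

# Serialize the attributes and values to human-readable JSON text.
text = json.dumps(customer)

# Parse the text back into an equivalent Python structure.
restored = json.loads(text)
```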

Scaling out

A method for dealing with data growth that involves distributing data storage structures across a cluster of commodity servers.
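One common way to distribute data across a cluster is to hash each key and use the hash to pick a server. The sketch below assumes a fixed list of hypothetical server names and a simple modulo placement rule; real systems use more elaborate schemes.

```python
import zlib

# Hypothetical cluster of commodity servers.
SERVERS = ["node-a", "node-b", "node-c"]

def server_for(key, servers=SERVERS):
    """Pick the server that stores a key by hashing the key."""
    # crc32 is deterministic across runs, unlike Python's built-in hash() for str.
    return servers[zlib.crc32(key.encode("utf-8")) % len(servers)]
```

With plain modulo placement, adding or removing a server changes where most keys land, which is one reason production systems tend to prefer other placement strategies.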

Scaling up

A method for dealing with data growth that involves migrating the same structure to more powerful systems.

Sentiment analysis

A method of text analysis that attempts to determine if a statement conveys a positive, negative, or neutral attitude.
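A very rough sketch of the idea, assuming a tiny hand-made word list (the `POSITIVE` and `NEGATIVE` sets are hypothetical). Practical sentiment analysis relies on much richer language models than simple word counting.

```python
# Hypothetical, tiny word lists used only for illustration.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}

def classify_sentiment(statement):
    """Label a statement positive, negative, or neutral by counting known words."""
    words = [w.strip(".,!?").lower() for w in statement.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```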

NoSQL

A new generation of database management systems that is not based on the traditional relational database model.

Row-centric storage

A physical data storage technique in which data is stored in blocks, which hold data from all columns of a given set of rows.

Column-centric storage

A physical data storage technique in which data is stored in blocks, which hold data from a single column across many rows.
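The difference between row-centric and column-centric storage can be sketched with plain Python lists. The table contents and the block size of two rows are assumptions made for the example.

```python
# Hypothetical table with columns (id, name, age).
rows = [
    (1, "Ada", 30),
    (2, "Bob", 25),
    (3, "Cy", 41),
]

# Row-centric: each block holds all columns for a set of rows.
row_blocks = [rows[0:2], rows[2:3]]

# Column-centric: each block holds one column across many rows.
column_blocks = {
    "id": [r[0] for r in rows],
    "name": [r[1] for r in rows],
    "age": [r[2] for r in rows],
}

# An aggregate over one column only needs to read that column's block.
average_age = sum(column_blocks["age"]) / len(column_blocks["age"])
```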

Data mining

A process that employs automated tools to analyze data in a data warehouse and other sources and to proactively identify possible relationships and anomalies.

Task tracker

A program in the MapReduce framework responsible for running map and reduce tasks on a node.

Mapper

A program that performs a map function.

Traversal

A query in a graph database.

Visualization

The ability to graphically present data in such a way as to make it understandable to users.

Variability

The characteristic of Big Data in which the same data values vary in meaning over time.

Polyglot persistence

The coexistence of a variety of data storage and data management technologies within an organization's infrastructure.

Value

The degree to which data can be analyzed to provide meaningful insights.

Reduce

The function in a MapReduce job that collects and summarizes the results of map functions to produce a single result.

Map

The function in a MapReduce job that sorts and filters data into a set of key-value pairs as a subtask within a larger job.
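The Map and Reduce entries can be illustrated with a word count written in plain Python. This is a single-process sketch of the idea, not the Hadoop API; the function names `map_words`, `shuffle`, `reduce_counts`, and `word_count` are hypothetical.

```python
from collections import defaultdict

def map_words(line):
    """Map: emit a (word, 1) key-value pair for each word in a line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group the values emitted by the map step under their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_counts(key, values):
    """Reduce: summarize all values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    pairs = [pair for line in lines for pair in map_words(line)]
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())
```

In a real cluster the map calls would run on many nodes in parallel and the grouped pairs would be routed to reducers over the network.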

Stream processing

The processing of data inputs in order to make decisions about which data to keep and which data to discard before storage.
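A minimal sketch of the keep-or-discard decision, assuming hypothetical sensor readings and a made-up threshold rule. Only the readings that pass the rule would go on to storage.

```python
def keep_reading(reading, threshold=50.0):
    """Hypothetical rule: keep only readings at or above the threshold."""
    return reading["value"] >= threshold

def process_stream(readings, threshold=50.0):
    """Yield readings worth storing, discarding the rest as they arrive."""
    for reading in readings:
        if keep_reading(reading, threshold):
            yield reading
```

Because `process_stream` is a generator, each input is examined once as it arrives rather than after the whole data set has been collected.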

Veracity

The trustworthiness of a set of data.

Explanatory analytics

Data analysis that provides ways to discover relationships, trends, and patterns among data.

Predictive analytics

Data analytics that use advanced statistical and modeling techniques to predict future business outcomes with great accuracy.

Structured data

Data that conforms to a predefined data model.

Unstructured data

Data that does not conform to a predefined data model.

Graph database

A NoSQL database model based on graph theory that stores relationship-rich data as a collection of nodes and edges.

Column family database

A NoSQL database model that organizes data into key-value pairs, in which the value component is composed of a set of columns that vary by row.
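The "set of columns that vary by row" part of this definition can be pictured as a dictionary of dictionaries. The row keys and column names below are invented for the example.

```python
# Hypothetical column family: row key -> the columns present in that row.
users = {
    "row1": {"name": "Ada", "email": "ada@example.com"},
    "row2": {"name": "Bob", "phone": "555-0100", "city": "Oslo"},
}

def columns_of(row_key):
    """Return the column names stored for one row."""
    return sorted(users[row_key])
```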

Key-value database

A NoSQL database model that stores data as a collection of key-value pairs in which the value component is unintelligible to the DBMS.
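A toy in-memory sketch of this model, assuming a hypothetical `KeyValueStore` class. The store treats every value as opaque bytes: it can save and return a value by key but never looks inside it.

```python
class KeyValueStore:
    """Toy key-value store whose values are opaque bytes."""

    def __init__(self):
        self._buckets = {}  # bucket name -> {key: value}

    def put(self, bucket, key, value):
        if not isinstance(value, bytes):
            raise TypeError("value must be bytes; the store does not interpret it")
        self._buckets.setdefault(bucket, {})[key] = value

    def get(self, bucket, key):
        # Lookup is by key only; there is no way to query on the value's contents.
        return self._buckets[bucket][key]
```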

Document database

A NoSQL database model that stores data in key-value pairs in which the value component is composed of a tag-encoded document.
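Unlike a key-value database, a document database understands the structure of the stored document, so it can answer queries on the document's contents. The sketch below assumes JSON as the document encoding and a hypothetical `DocumentStore` class.

```python
import json

class DocumentStore:
    """Toy document store: each value is a JSON-encoded document."""

    def __init__(self):
        self._docs = {}  # key -> JSON text

    def put(self, key, document):
        self._docs[key] = json.dumps(document)

    def get(self, key):
        return json.loads(self._docs[key])

    def find(self, field, value):
        """Return the keys of documents whose top-level field equals value."""
        return sorted(
            key for key, text in self._docs.items()
            if json.loads(text).get(field) == value
        )
```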

Job tracker

A central control program used to accept, distribute, monitor, and report on MapReduce processing jobs in a Hadoop environment.

Velocity

A characteristic of Big Data that describes the speed at which data enters the system and must be processed.

Volume

A characteristic of Big Data that describes the quantity of data to be stored.

Variety

A characteristic of Big Data that describes the variations in the structure of data to be stored.

BSON (Binary JSON)

A computer-readable format for data interchange that expands the JSON format to include additional data types including binary objects.

Batch processing

A data processing method that runs data processing tasks from beginning to end without any user interaction.

NewSQL

A database model that attempts to provide ACID-compliant transactions across a highly distributed infrastructure.

Data analytics

A subset of business intelligence functionality that encompasses a wide range of mathematical, statistical, and modeling techniques with the purpose of extracting knowledge from data.

MapReduce

An open-source application programming interface (API) that provides fast data analytics services; one of the main Big Data technologies that allows organizations to process massive data stores.

Column family

In a column family database, a collection of columns or super columns related to a collection of rows.

Super column

In a column family database, a column that is composed of a group of other related columns.

Properties

In a graph database, the attributes or characteristics of a node or edge that are of interest to the users.

Edge

In a graph database, the representation of a relationship between nodes.

Node

In a graph database, the representation of a single entity instance.
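The node, edge, and properties entries, together with the earlier traversal entry, can be tied together in a short sketch. The people, company, and relationship types are invented, and `traverse` is a plain breadth-first walk rather than any particular graph database's query language.

```python
from collections import deque

# Nodes: one entry per entity instance, each with its own properties.
nodes = {
    "alice": {"label": "Person", "age": 30},
    "bob": {"label": "Person", "age": 25},
    "acme": {"label": "Company"},
}

# Edges: (source node, target node, edge properties).
edges = [
    ("alice", "bob", {"type": "FRIEND_OF"}),
    ("bob", "acme", {"type": "WORKS_AT"}),
]

def traverse(start):
    """Traversal: follow edges outward from a start node, breadth first."""
    seen = [start]
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for source, target, _props in edges:
            if source == current and target not in seen:
                seen.append(target)
                queue.append(target)
    return seen
```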

Block report

In the Hadoop Distributed File System (HDFS), a report sent every 6 hours by the data node to the name node informing the name node which blocks are on that data node.

Heartbeat

In the Hadoop Distributed File System (HDFS), a signal sent every 3 seconds from the data node to the name node to notify the name node that the data node is still available.
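The heartbeat idea can be sketched as a name node that remembers when it last heard from each data node. This is not Hadoop's API: the `NameNodeSketch` class is hypothetical, and `MISSED_LIMIT` (how many missed heartbeats are tolerated) is an assumed value chosen for the example. Timestamps are passed in explicitly so the behavior is easy to follow.

```python
HEARTBEAT_INTERVAL = 3  # seconds between heartbeats, per the definition above
MISSED_LIMIT = 10       # hypothetical: missed intervals before a node is presumed down

class NameNodeSketch:
    """Toy name node that tracks the last heartbeat from each data node."""

    def __init__(self):
        self.last_seen = {}  # data node name -> time of last heartbeat

    def receive_heartbeat(self, data_node, now):
        self.last_seen[data_node] = now

    def available_nodes(self, now):
        """Return the data nodes heard from recently enough to count as available."""
        cutoff = HEARTBEAT_INTERVAL * MISSED_LIMIT
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen <= cutoff
        )
```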

Bucket

In a key-value database, a logical collection of related key-value pairs.

Algorithm

A process or set of operations in a calculation.

Reducer

A program that performs a reduce function.

Feedback loop processing

Analyzing stored data to produce actionable results.

