Introduction to NoSQL Databases


Graph Databases

A web of information whose nodes are very small (perhaps nothing more than a name) but which has a rich structure of interconnections between them. You can ask questions such as "find the books in the Databases category that are written by someone whom a friend of mine likes." Graph databases specialize in capturing this sort of information, on a much larger scale than a readable diagram could show. This makes them well suited to any data consisting of complex relationships, such as social networks, product preferences, or eligibility rules.
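A minimal sketch of how the example question becomes a graph traversal. The people, books, and edge labels (FRIEND, LIKES, WROTE, IN_CATEGORY) are hypothetical sample data, and a plain Python adjacency index stands in for a real graph database.

```python
from collections import defaultdict

# Hypothetical sample graph: each edge is (source, label, target).
edges = [
    ("me", "FRIEND", "alice"),
    ("me", "FRIEND", "bob"),
    ("alice", "LIKES", "carol"),
    ("bob", "LIKES", "dave"),
    ("carol", "WROTE", "Graph Basics"),
    ("dave", "WROTE", "Bread Baking"),
    ("Graph Basics", "IN_CATEGORY", "Databases"),
    ("Bread Baking", "IN_CATEGORY", "Cooking"),
]

# Adjacency index: (node, label) -> set of neighbouring nodes.
out = defaultdict(set)
for src, label, dst in edges:
    out[(src, label)].add(dst)

def books_in_category_by_author_friend_likes(person, category):
    """Books in `category` written by someone whom a friend of `person` likes."""
    result = set()
    for friend in out[(person, "FRIEND")]:
        for author in out[(friend, "LIKES")]:
            for book in out[(author, "WROTE")]:
                if category in out[(book, "IN_CATEGORY")]:
                    result.add(book)
    return result

print(books_in_category_by_author_friend_likes("me", "Databases"))  # {'Graph Basics'}
```

Each hop follows stored relationships rather than joining tables, which is the kind of query graph databases are built for.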

Key-Value Data Stores

Example: SimpleDB, based on Amazon's Simple Storage Service (S3). Items (which represent objects) have one or more (name, value) pairs, where the name denotes an attribute. An attribute can have multiple values. Items are grouped into domains.
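A minimal sketch of the domain/item/attribute model described above. The `Domain` class and its `put`/`get` methods are hypothetical illustrations of the data model, not the SimpleDB API.

```python
from collections import defaultdict

class Domain:
    """A named container of items (hypothetical sketch, not the SimpleDB API)."""

    def __init__(self, name):
        self.name = name
        # item name -> attribute name -> set of values (attributes are multi-valued)
        self.items = defaultdict(lambda: defaultdict(set))

    def put(self, item_name, pairs):
        """Add (name, value) pairs to an item; a repeated name adds another value."""
        for attr, value in pairs:
            self.items[item_name][attr].add(value)

    def get(self, item_name):
        return {attr: sorted(vals) for attr, vals in self.items[item_name].items()}

shirts = Domain("shirts")
shirts.put("item1", [("name", "T-shirt"), ("color", "red"), ("color", "blue")])
print(shirts.get("item1"))  # {'name': ['T-shirt'], 'color': ['blue', 'red']}
```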

Document vs Key/Value Databases

A document database is, at its core, a key/value store with one major exception. Instead of storing any blob, a document database requires the data to be stored in a format the database can understand (e.g., JSON or XML). In most document databases, this means queries can be run against the contents of the documents.
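A minimal sketch of the difference, assuming JSON documents and using plain dictionaries as stand-ins for both kinds of store. The helper names `doc_insert` and `doc_find` are hypothetical.

```python
import json

# Key/value store: the value is an opaque blob the store never interprets.
kv_store = {"user:1": b"\x00\x17opaque-bytes"}
blob = kv_store["user:1"]  # the only possible query: fetch by key

# Document store: values must be JSON, so the store can parse them
# and answer queries about fields inside the documents.
doc_store = {}

def doc_insert(key, json_text):
    doc_store[key] = json.loads(json_text)  # raises an error if the text is not valid JSON

def doc_find(field, value):
    """Return the keys of documents whose `field` equals `value`."""
    return sorted(k for k, doc in doc_store.items() if doc.get(field) == value)

doc_insert("user:1", '{"name": "Joan", "city": "San Francisco"}')
doc_insert("user:2", '{"name": "Bill", "city": "Boston"}')
print(doc_find("city", "San Francisco"))  # ['user:1']
```

The sketch assumes every document is a JSON object; a real document database also indexes fields so such queries avoid a full scan.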

What is a Graph Database?

A graph database uses graph structures, with nodes, edges, and properties, to represent and store data and to answer semantic queries. Every element contains a direct pointer to its adjacent elements, so no index lookups are necessary.

Example of Graph Database

Data model: nodes and edges. Nodes may have properties (including an ID). Edges may have labels or roles.
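A minimal sketch of this data model, assuming in-memory objects: nodes carry properties, edges carry a label and their own properties, and each node holds direct references to its outgoing edges, so traversal follows pointers instead of consulting an index. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class Node:
    id: str
    properties: dict = field(default_factory=dict)
    # Direct references to outgoing edges: traversal follows these pointers.
    out_edges: list = field(default_factory=list)

@dataclass(eq=False)
class Edge:
    label: str
    target: Node
    properties: dict = field(default_factory=dict)

def connect(src, label, dst, **props):
    src.out_edges.append(Edge(label, dst, props))

def neighbours(node, label):
    return [e.target.id for e in node.out_edges if e.label == label]

alice = Node("alice", {"age": 30})
bob = Node("bob", {"age": 25})
connect(alice, "FRIEND", bob, since=2015)
print(neighbours(alice, "FRIEND"))  # ['bob']
```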

Columnar Databases

In relational databases, data is stored on disk row by row, whereas in columnar databases data is stored on disk column by column.
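A minimal sketch of the two layouts, using a Python list to stand in for the sequence of values written to disk. The three-record table is hypothetical sample data.

```python
rows = [
    (1, "Wilson", "New York"),
    (2, "Shaw", "San Francisco"),
    (3, "Lee", "New York"),
]

# Row store: the values of one record sit next to each other.
row_layout = [value for record in rows for value in record]

# Column store: the values of one column sit next to each other.
column_layout = [record[i] for i in range(3) for record in rows]

print(row_layout)
# [1, 'Wilson', 'New York', 2, 'Shaw', 'San Francisco', 3, 'Lee', 'New York']
print(column_layout)
# [1, 2, 3, 'Wilson', 'Shaw', 'Lee', 'New York', 'San Francisco', 'New York']
```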

Advantages of NoSQL Databases

Massive write performance. Fast key-value lookups. Flexible schema and data types. No single point of failure. Fast prototyping and development. Out-of-the-box scalability (particularly horizontal scalability).

Column Store Comments

More efficient than a row (or document) database format if: multiple rows/records/documents are inserted at the same time, so updates of column blocks can be aggregated; or retrievals access only some of the columns in a row/record/document.

Why Columnar Databases?

-Can be significantly faster than row stores for some applications: they fetch only the columns a query requires, get better cache effects, and compress better (similar attribute values sit together within a column).
-But can be slower for other applications, e.g. OLTP with many row inserts, ...
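Why similar values within a column compress well can be sketched with run-length encoding. RLE is only one of several schemes column stores may use (dictionary encoding is another); it is chosen here because it is the simplest to show, and the column data is hypothetical.

```python
from itertools import groupby

def rle_encode(column):
    """Run-length encode a column into [(value, run_length), ...]."""
    return [(value, len(list(group))) for value, group in groupby(column)]

def rle_decode(runs):
    return [value for value, count in runs for _ in range(count)]

region = ["East"] * 4 + ["West"] * 3 + ["East"] * 2
encoded = rle_encode(region)
print(encoded)  # [('East', 4), ('West', 3), ('East', 2)]
```

Nine stored values shrink to three runs; the gain grows when a column is sorted, because equal values then form longer runs. A row layout interleaves unrelated values and breaks such runs.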

Column-oriented Database

-Data is stored in column order.
-Allows key-value pairs to be stored (and retrieved by key) in a massively parallel system.
Data model: families of attributes defined in a schema; new attributes can be added.
Storing principle: big hashed, distributed tables.
Properties: partitioning (horizontal and/or vertical), high availability, etc., completely transparent to the application.

NoSQL characteristics

-Does not use SQL as its query language: NoSQL databases are not primarily built on tables and generally do not use SQL for data manipulation (though some have SQL-like interfaces).
-May not give full ACID guarantees: usually only eventual consistency is guaranteed, or transactions are limited to single data items.
-Distributed, fault-tolerant architecture: data are partitioned across a large number of servers and replicated among these machines (horizontal scaling).

NoSQL involves more programming and less database design

-Alternative to traditional relational DBMS
-Flexible schema
-Quicker/cheaper to set up
-Massive scalability
-Relaxed consistency → higher performance and availability
-No declarative query language (SQL) → more programming
-Relaxed consistency → fewer guarantees

Horizontal scalability

-Scale out: add servers to an existing system with little effort (aka elastic scalability)
-Bugs, hardware errors: things fail all the time
-It should become cheaper: cost efficiency
-Shared nothing
-Use of commodity/cheap hardware
-Heterogeneous systems

Why NoSQL databases?

-The exponential growth of the volume of data generated by users and systems
-The increasing interdependency and complexity of data, accelerated by the Internet, Web 2.0, and social networks
NoSQL databases are useful when working with a huge quantity of data whose nature does not require a relational model.

Graph Databases Pros and Cons

Graph databases are a good solution when:
-Data is highly linked to other items in the database, including situations where nodes may be linked to many other nodes.
-The relationships between data are important to using the database, for example when you need to compute the shortest path between two objects or centrality measures.
However, unlike many other NoSQL database types, they do not always provide performance gains over relational databases, so their use should be balanced against how important the relationships are to the project. Querying the database can also be more complicated than with other NoSQL databases. Examples include Neo4j and Titan.
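The shortest-path use case can be sketched with breadth-first search on an unweighted graph. This is a plain-Python illustration with a hypothetical adjacency list, not the algorithm any particular graph database uses internally (weighted graphs would need something like Dijkstra's algorithm instead).

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Breadth-first search on an unweighted graph; returns a node list or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back from goal to start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": ["F"], "E": ["F"]}
print(shortest_path(graph, "A", "F"))  # ['A', 'B', 'D', 'F']
```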

CouchDB Data Model (JSON)

In CouchDB, no schema is enforced, so new document types with new meanings can be added alongside the old ones. A CouchDB document is an object that consists of named fields. Field values may be strings, numbers, dates, etc.

MongoDB

Data are organized in collections. A collection stores a set of documents (a collection is like a table, a document like a record), but:
-each document can have a different set of attributes, even in the same collection
-semi-structured schema!
-only requirement: every document should have an "_id" field (the key)

Key-Value NoSQL Databases

Extremely simple interface.
-Data model: (key, value) pairs; the value can be a complex structure
-Example operations: Insert(key, value), Fetch(key), Update(key, value), Delete(key)
-Implementation: efficiency, scalability, fault tolerance
-Records are distributed to nodes based on the key
-Replication
-Single-record transactions, "eventual consistency"
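A minimal sketch of the four operations with records distributed to nodes by key. The `PartitionedKV` class is hypothetical; it uses simple hash-modulo placement and omits replication, whereas real systems commonly use consistent hashing so that adding a node moves only a fraction of the keys.

```python
import hashlib

class PartitionedKV:
    """Toy key-value store that places each record on a node chosen by hashing its key."""

    def __init__(self, num_nodes):
        self.nodes = [dict() for _ in range(num_nodes)]

    def _node_for(self, key):
        # A stable hash (unlike Python's per-process salted hash()) so placement is repeatable.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest[:4], "big") % len(self.nodes)]

    def insert(self, key, value):
        self._node_for(key)[key] = value

    def fetch(self, key):
        return self._node_for(key).get(key)

    def update(self, key, value):
        node = self._node_for(key)
        if key not in node:
            raise KeyError(key)
        node[key] = value

    def delete(self, key):
        self._node_for(key).pop(key, None)

store = PartitionedKV(num_nodes=3)
store.insert("user:42", {"name": "Joan", "cart": ["book", "pen"]})
store.update("user:42", {"name": "Joan", "cart": []})
print(store.fetch("user:42"))  # {'name': 'Joan', 'cart': []}
store.delete("user:42")
print(store.fetch("user:42"))  # None
```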

Eventual Consistency

Given a sufficiently long period of time over which no changes are sent, all updates can be expected to propagate through the system eventually, and all the replicas will be consistent.
Conflict resolution:
-Read repair: the correction is done when a read finds an inconsistency. This slows down the read operation.
-Write repair: the correction takes place during a write operation if an inconsistency has been found, slowing down the write operation.
-Asynchronous repair: the correction is not part of a read or write operation.
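A minimal sketch of read repair. It assumes each replica stores a value with an integer version number (a hypothetical simplification; real systems use timestamps or vector clocks) and that a read contacts all replicas.

```python
class Replica:
    def __init__(self):
        self.value, self.version = None, 0

def write(replicas, value, version, reachable):
    """Write only to the replicas that are currently reachable."""
    for i in reachable:
        replicas[i].value, replicas[i].version = value, version

def read_with_repair(replicas):
    """Return the newest value and push it to any stale replica (read repair)."""
    newest = max(replicas, key=lambda r: r.version)
    for r in replicas:
        if r.version < newest.version:  # inconsistency found during the read
            r.value, r.version = newest.value, newest.version
    return newest.value

replicas = [Replica() for _ in range(3)]
write(replicas, "v1", 1, reachable=[0, 1, 2])
write(replicas, "v2", 2, reachable=[0])  # replicas 1 and 2 miss this update
print([r.value for r in replicas])       # ['v2', 'v1', 'v1']
print(read_with_repair(replicas))        # v2
print([r.value for r in replicas])       # ['v2', 'v2', 'v2']
```

The repair work happens inside the read, which is why read repair slows reads down.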

Categories of NoSQL storages

-Key-value: SimpleDB
-Column family: BigTable, Cassandra
-Document: MongoDB, Couchbase
-Graph: Neo4j, Titan

Columnar Database Weaknesses

Columnar databases are not a good solution for applications that require ACID transactions for reads and writes. Also, if you need the database to aggregate data using queries (such as SUM or AVG), you may have to do this on the client side, using data the client retrieves from all the rows.

BASE Transactions Characteristics

-Weak consistency: stale data OK
-Availability first
-Best effort
-Approximate answers OK
-Aggressive (optimistic)
-Simpler and faster

Relational (Row Store) and Columnar (Column Store)

Relational (row store)
-Pro: easy to add/modify a record
-Con: might read in unnecessary data
Columnar (column store)
-Pro: only need to read in relevant data
-Con: writing a tuple requires multiple accesses (one per column)
So column stores are suitable for read-mostly, read-intensive, large data repositories.

NoSQL Categories

The relational data model is best visualized as a set of tables, rather like a page of a spreadsheet.
-NoSQL is a move away from the relational model. Four categories are widely referred to as NoSQL: key-value, columnar, document, and graph.

Cassandra NoSQL Columnar Database

Some characteristics:
-Column family: a structure containing an unlimited number of rows
-Column: a tuple with a name, a value, and a timestamp
-Key: the name of a record
-Super column: contains more columns
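The timestamp in each column is what lets Cassandra reconcile conflicting writes: the version with the higher timestamp wins ("last write wins"). A minimal sketch with hypothetical names and data; it models a column family as nested dictionaries and, unlike Cassandra's own tie-breaking rule, simply keeps the existing value when timestamps are equal.

```python
from dataclasses import dataclass

@dataclass
class Column:
    name: str
    value: str
    timestamp: int  # e.g. microseconds since the epoch, supplied by the writer

def merge(a, b):
    """Reconcile two versions of the same column: the higher timestamp wins."""
    return a if a.timestamp >= b.timestamp else b

# A column family as nested maps: row key -> column name -> Column
users = {"joan": {"city": Column("city", "Boston", 1000)}}

incoming = Column("city", "San Francisco", 2000)
users["joan"]["city"] = merge(users["joan"]["city"], incoming)
print(users["joan"]["city"].value)  # San Francisco
```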

CAP Theorem

States that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
-Consistency: all nodes see the same data at the same time
-Availability: every request receives a response about whether it succeeded or failed
-Partition tolerance: the system continues to operate despite arbitrary message loss or failure of part of the system

ACID vs BASE

ACID: strong consistency, less availability, pessimistic concurrency, complex.
BASE: availability is the most important thing, and the system is willing to sacrifice consistency for it (CAP). Weaker (eventual) consistency, best effort, simple and fast, optimistic.

Vertical scalability

Also called scaling up: increasing server capacity by adding more CPU and RAM. Managing it is hard, and there may be downtime.

Why use this type of Structure?

An order makes a good aggregate when a customer is making and reviewing orders, and when the retailer is processing orders. However, if a retailer wants to analyze its product sales over the last few months, an order aggregate becomes a problem: to get the product sales history, you have to dig into every aggregate in the database. So an aggregate structure may help with some data interactions but be an obstacle for others. An aggregate-ignorant model allows you to look at the data in different ways easily, so it is a better choice when you don't have a primary structure for manipulating your data. The aggregate-oriented approach is very efficient when running on a cluster of nodes; however, it does not support ACID transactions that span multiple aggregates.
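A minimal sketch of the trade-off, with hypothetical order documents: fetching one order touches a single aggregate, but computing product sales has to open every order.

```python
from collections import Counter

# Each order is one aggregate: customer and line items stored together.
orders = [
    {"id": 1, "customer": "joan", "items": [{"product": "pen", "qty": 2},
                                           {"product": "book", "qty": 1}]},
    {"id": 2, "customer": "bill", "items": [{"product": "pen", "qty": 5}]},
]

# Customer/retailer view: one lookup returns the whole aggregate.
order = next(o for o in orders if o["id"] == 2)

# Analytics view: product sales require digging into every order aggregate.
sales = Counter()
for o in orders:
    for item in o["items"]:
        sales[item["product"]] += item["qty"]
print(dict(sales))  # {'pen': 7, 'book': 1}
```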

Transactions - ACID Properties (Relational Databases do this very well)

-Atomic: all of the work in a transaction completes (commit) or none of it completes.
-Consistent: a transaction transforms the database from one consistent state to another. Consistency is defined in terms of constraints.
-Isolated: the results of any changes made during a transaction are not visible until the transaction has committed.
-Durable: the results of a committed transaction survive failures.
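Atomicity and constraint-based consistency can be sketched with Python's built-in `sqlite3` module and an in-memory database. The `account` table and `transfer` function are hypothetical examples; the `CHECK` constraint plays the role of the consistency rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, "
             "balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, src))
    except sqlite3.IntegrityError:
        print("transfer rejected, rolled back")

transfer(conn, "alice", "bob", 30)   # succeeds
transfer(conn, "alice", "bob", 500)  # violates CHECK, so bob's credit is undone too
print(dict(conn.execute("SELECT name, balance FROM account")))
# {'alice': 70, 'bob': 30}
```

In the failed transfer, bob's balance had already been increased when the second statement failed; the rollback removes that partial work, which is exactly the "all or nothing" guarantee.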

What is BASE?

BASE is an alternative to ACID:
-Basically Available
-Soft state
-Eventual consistency
Weak consistency, availability first, approximate answers, faster.

Document Databases

Like key-value stores, except the value is a document.
-Data model: (key, document) pairs
-Document: JSON, XML, or other semi-structured formats
-Basic operations: Insert(key, document), Fetch(key), Update(key, document), Delete(key)
-Also fetch based on document contents
-Example systems: MongoDB, SimpleDB

Relational (Row Store) and Columnar (Column Store)

Many queries do not process all the attributes of a relation. For example, the query SELECT name, address FROM Customer WHERE region = 'New York' processes only three attributes of the Customer table, even though the table can have many more. Column stores are more I/O-efficient for read-only queries because they read only the attributes a query accesses.
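A minimal sketch of that query against a column-oriented Customer table. The data and the `select` helper are hypothetical; the point is that only the three referenced columns are ever touched.

```python
# Customer table stored column by column (hypothetical sample data).
customer = {
    "name":    ["Joan", "Bill", "Ann"],
    "address": ["1 Elm St", "23 Plum Street", "9 Oak Ave"],
    "region":  ["New York", "California", "New York"],
    "phone":   ["555-0101", "555-0102", "555-0103"],
    "email":   ["j@example.com", "b@example.com", "a@example.com"],
}

def select(table, columns, where_col, where_val):
    """SELECT <columns> FROM table WHERE where_col = where_val, reading only needed columns."""
    touched = set(columns) | {where_col}
    matches = [i for i, v in enumerate(table[where_col]) if v == where_val]
    result = [tuple(table[c][i] for c in columns) for i in matches]
    return result, sorted(touched)

rows, read = select(customer, ["name", "address"], "region", "New York")
print(rows)  # [('Joan', '1 Elm St'), ('Ann', '9 Oak Ave')]
print(read)  # ['address', 'name', 'region']  (phone and email are never read)
```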

RDBMS will start to add NoSQL features

Next-generation RDBMSs will evolve to be much more scalable and elastic. "NewSQL" databases will be designed to scale out horizontally on shared-nothing machines, while still:
-providing ACID guarantees
-letting applications interact with the database primarily using SQL
-providing higher performance than is available from traditional systems
Examples: MySQL Cluster (most mature solution), VoltDB, Clustrix, ScaleArc, ...

What is NoSQL?

NoSQL is a class of database management systems identified by non-adherence to the widely used relational database management system (RDBMS) model and its structured query language (SQL). NoSQL has evolved to mean "Not Only SQL." NoSQL became prominent with the advent of web-scale data and the systems created by Google, Facebook, Amazon, Twitter, and others to manage data for which SQL was not the best fit.

The NoSQL landscape can be confusing

NoSQL marketing is confusing... everything does everything, and at a small scale everything works. "If you're evaluating Mongo vs. Riak or Couch vs. Cassandra, you don't understand either your problem or the technologies."

RDBMS: Pros and Cons

Pros:
-Many programmers are already familiar with it
-Transactions and ACID make development easy
-Lots of tools to use
-Rigorous design (schema)
Cons:
-Impedance mismatch: object-relational mapping doesn't work quite well
-Rigid schema design
-Harder to scale
-Replication
-Joins across multiple nodes? Hard
-How does an RDBMS handle data growth (e.g., scaling)? Hard
-Need for a DBA
SQL and NoSQL have different strengths and weaknesses.

MongoDB NoSQL Example

{
  "_id": ObjectId("6ffa3loe7d284dad101e4bc9"),
  "Last Name": "Wilson",
  "First Name": "Joan",
  "Date of Birth": "04-8-87"
},
{
  "_id": ObjectId("6fefa3l097d284son101e4bc7"),
  "Last Name": "Shaw",
  "First Name": "Bill",
  "Date of Birth": "04-28-1970",
  "Address": "23 Plum Street",
  "City": "San Francisco"
}
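A minimal sketch of why these two documents can live in the same collection: the collection enforces no schema, only the presence of an `_id`. The `Collection` class with its `insert` and `find` methods is hypothetical (not the MongoDB driver API), and a random UUID stands in for MongoDB's ObjectId.

```python
import uuid

class Collection:
    """Toy document collection: schema-free, but every document needs an _id."""

    def __init__(self):
        self.docs = {}

    def insert(self, doc):
        doc = dict(doc)
        doc.setdefault("_id", uuid.uuid4().hex)  # stand-in for an ObjectId
        self.docs[doc["_id"]] = doc
        return doc["_id"]

    def find(self, **criteria):
        """Documents whose fields equal all given criteria (missing fields never match)."""
        return [d for d in self.docs.values()
                if all(d.get(k) == v for k, v in criteria.items())]

people = Collection()
people.insert({"Last Name": "Wilson", "First Name": "Joan", "Date of Birth": "04-8-87"})
people.insert({"Last Name": "Shaw", "First Name": "Bill", "Date of Birth": "04-28-1970",
               "Address": "23 Plum Street", "City": "San Francisco"})
print([d["First Name"] for d in people.find(City="San Francisco")])  # ['Bill']
```

Joan's document has no "City" field at all, yet both documents coexist; the query simply does not match her.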

