#4: Unit 11
When should NoSQL be used?
- Client wants 99.999% availability on a high traffic site. - Your data makes no sense in SQL, you find yourself doing multiple JOIN queries for accessing some piece of information. - You are breaking the relational model, you have CLOBs that store denormalized data and you generate external indexes to search that data.
Why has NoSQL become a popular solution for some organizations?
- Improved ability to keep data consistent - Faster access to data than relational database management systems (RDBMS) - More easily allows for data to be held across multiple servers
What are 2 motivations for using the NoSQL approach to storing data?
- Simplicity of design - Simpler "horizontal" scaling to clusters of machines (which is a problem for relational databases) - Finer control over availability.
What is eventual consistency? What databases provide this feature?
A consistency model, which is used in many large distributed databases. Such databases require that all changes to a replicated piece of data eventually reach all affected. MongoDB, CouchDB, Amazon SimpleDB, Amazon DynamoDB, Riak, DeeDS, ZATARA
What is an in-memory database?
A database management system that primarily relies on main memory for computer data storage.
What is Apache Hadoop?
A framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
What property requires that each transaction is "all or nothing"?
Atomicity
What are the four ACID guarantees for transactions in a database?
Atomicity Consistency Isolation Durability
What are different ways of organizing and/or grouping documents?
Collections Tags Non-visible metadata Directory hierarchies
Which ACID property ensures that any transaction will bring the database from one valid state to another? What does the consistency property insure?
Consistency Ensures that only valid data following all rules and constraints is written in the database.
What are three features of key value stores?
Consistency - all replications of a database have the same data Availability - a request sent receives always a response Partition Tolerance - the system resists when some replications encounter problems
What is {FirstName:"Bob", Address:"5 Oak St.", Hobby:"sailing"}?
Document
What database offers an API or query language that allows a user to retrieve based on content?
Document Store
What company originally developed the NoSQL database Apache Cassandra?
What is the name of a proprietary (closed-source) database created by Google that uses columns?
Google Apps
What databases store data as a collection of nodes, connected by edges?
Graph
What are the four major post-relational data models?
Graph Model Multivalue Model Object-oriented Database Models
What kind of database is designed for data whose elements are interconnected with an undetermined number of relations between them?
Graph database
What is HDFS?
Hadoop Distributed File System is a distributed file system that provides high-performance access to data across Hadoop clusters. Like other Hadoop-related technologies, HDFS has become a key tool for managing pools of big data and supporting big data analytics applications.
What is done to protect against loss of data in the case of a complete system crash of a main memory database system (MMDB)?
High availability implementations that rely on database replication, with automatic failover to an identical standby database in the event of primary database failure. To protect against loss of data in the case of a complete system crash, replication of an IMDB is normally used in addition to one or more of the mechanisms listed above.
What are the BASE guarantees?
If no new updates are made to a given data item, eventually all accesses to that item will return the last updated value.
What is JSON?
JSON is short for JavaScript Object Notation, and is a way to store information in an organized, easy-to-access manner. In a nutshell, it gives us a human-readable collection of data that we can access in a really logical manner.
What are the four types of NoSQL data models?
Key-Value Store Document-Based Store Column-Based Store Graph-Based
In a distributed data store, what has similar importance like a schema has in a relational database?
Keyspace
What type of database provides a mechanism for storage and retrieval of data that use less structured consistency models than traditional relational database?
NoSQL Database
Why did Google develop its own database?
Scalability and better control of performance
What are major problems/limitations with NoSQL databases?
Security Data consistency Lack of standardization Scalability Caveat
What does soft state mean in a NoSQL database?
Soft state is information (state) the user put into the system that will go away if the user doesn't maintain it. Stated another way, the information will expire unless it is refreshed.
What does the CAP theorem assert about a distributed, networked system?
States that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees
What is the purpose of the MAP function in the MapReduce Framework?
Takes a series of key/value pairs, processes each, and generates zero or more output key/value pairs. The input and output types of the map can be (and often are) different from each other.
Designers of distributed data stores have increased what feature at the expense of consistency?
The ability of arbitrary querying
What is the three dimensional mapping in Google BigTable? What are the row key, column key and timestamp?
Timestamp Arbitrary string values
What is it called when changes to the database are recorded in a journal file to facilitate automatic recovery of an in-memory database?
Transaction logging
What are two primary methods to run a database on the cloud?
Virtual machine image Database as a service
When are main memory databases often used?
When time response is critical