Data Science - MongoDB Cartula Inceptivus & MongoDB Mini Project

Split

- When a chunk grows beyond its size limit, MongoDB splits its key range into two key ranges. - This keeps individual chunks from growing too large and keeps the data easy to move between shards. For example, with a 64 MB limit, a chunk that grows past 64 MB is split into two chunks of roughly 32 MB each.
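The split can be pictured with a small plain-JavaScript model. This is only a sketch, not MongoDB's implementation: it assumes every document has the same size, so halving the key range also halves the chunk, and the names `MAX_CHUNK_MB` and `splitIfTooLarge` are made up for illustration.

```javascript
// Toy model of a chunk split. Sizes are in MB; documents are assumed to be
// evenly spread over the key range, so the midpoint halves the chunk.
const MAX_CHUNK_MB = 64; // the limit quoted in the card

function splitIfTooLarge(chunk) {
  if (chunk.sizeMB <= MAX_CHUNK_MB) return [chunk]; // still within the limit
  const mid = Math.floor((chunk.min + chunk.max) / 2); // hypothetical split point
  return [
    { min: chunk.min, max: mid, sizeMB: chunk.sizeMB / 2 },
    { min: mid, max: chunk.max, sizeMB: chunk.sizeMB / 2 },
  ];
}

const parts = splitIfTooLarge({ min: 0, max: 1000, sizeMB: 66 });
console.log(parts);
```

A chunk of 66 MB over the keys 0 to 1000 comes back as two chunks of 33 MB that meet at key 500; a chunk under the limit is returned unchanged.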

By default, MongoDB automatically logs slow queries above _____ ms to the log.

100

Default Listening port for MongoDB is ____________. * not 8080

27017

Which of the following is not correct in MongoDB? db.customer.fine({"instocks" : this.numbers_buy}) (not this) db.customer.find() db.customer.find({"instocks" ; "this.numbers_buy"}) db.customer.find({})

?

Which tool can write data from a binary database dump created by mongodump to a MongoDB instance? mongorestore mongofiles (not this one) mongosupport all

mongorestore

Over Indexing

An index takes up disk space and memory, and every additional index adds overhead to write operations. Rules of thumb: - Design the schema so that it needs as few indexes as possible. - Strike a balance between query speed and the number of indexes.

MongoDB architecture

Application, Driver, Databases, Collections, Documents, Indexes, Security Features, Storage Engine

Workload Tuning

Approach to workload tuning: repeat the steps below until performance improves. - Find the candidate statements that perform badly (badstatements). - For each statement in badstatements, run statement.explain(). - Identify the issues and fix them. Workload tuning can be performed using: - Database Profiling - Explain

Balancer

The balancer is a background process that manages chunk migrations. - In older releases it can run from any of the mongos instances in a cluster; since MongoDB 3.4 it runs on the primary of the config server replica set. - When the distribution of a sharded collection across the cluster becomes uneven, the balancer migrates chunks from the shard with the most chunks to the shard with the fewest chunks until the collection is balanced.
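The migration loop can be sketched in plain JavaScript. This is a toy model under stated assumptions: the shard names, the chunk labels, and the stopping threshold of 2 are invented for illustration, and the real balancer uses its own thresholds and moves actual data.

```javascript
// Toy balancer: move one chunk at a time from the shard with the most chunks
// to the shard with the fewest, until the gap drops below the threshold.
function balance(shards, threshold = 2) {
  let migrations = 0;
  while (true) {
    const names = Object.keys(shards);
    const most = names.reduce((a, b) => (shards[a].length >= shards[b].length ? a : b));
    const least = names.reduce((a, b) => (shards[a].length <= shards[b].length ? a : b));
    if (shards[most].length - shards[least].length < threshold) return migrations;
    shards[least].push(shards[most].pop()); // one chunk migration
    migrations++;
  }
}

const cluster = {
  shardA: ["c1", "c2", "c3", "c4", "c5", "c6"],
  shardB: ["c7"],
  shardC: ["c8", "c9"],
};
const moved = balance(cluster);
console.log(moved, cluster);
```

Starting from 6, 1, and 2 chunks, three migrations leave each shard holding three chunks.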

MongoDB is written in which language?

C++

document example

Consider the following example, in which two documents in the same collection have different schemas. The first document has the fields "City" and "DOJ" (date of joining), which are not present in the second document. The second document has the fields "Occupation", "DOB" and "Salary", which are not present in the first one. db.employee.insert({Name: 'Abc12', City: 'Delhi', DOJ: '12/01/1980'}) db.employee.insert({Name: 'Abc123', Occupation: 'Software Engineer', DOB: '29/07/1988', Salary: '250000'})

User Management

Creating a user, updating a user, getting all user info, removing a user

Fail Over

During a failover, the replica set holds an election and a new primary node is elected. Once the failed node recovers, it rejoins the replica set and works as a secondary node.

Enabling Profiling

Enable profiling for an entire mongod instance. Example: the command below sets the profiling level to 1, defines slow operations as those that last longer than 25 milliseconds, and specifies that only 10% of slow operations should be profiled. mongod --profile 1 --slowms 25 --slowOpSampleRate 0.1

In MongoDB, which is the storage measure used by master's oplog?

GB

Roles

Grant roles to a user; revoke roles from a user

Schema Design - Read Ratio/Write Ratio

Identify the business needs and design the schema for a read-heavy or a write-heavy workload. Read-heavy: when an application is read-heavy, design a schema that minimizes the number of reads from MongoDB. Write-heavy: when an application is write-heavy, design a schema that maximizes MongoDB write throughput.

JSON

JavaScript Object Notation. An alternative to XML, used primarily to transmit data. Consists of key/value pairs.

storage engines (supported by mongoDB)

MMAPv1: the default storage engine before MongoDB 3.2 (removed in MongoDB 4.2). WiredTiger: the default storage engine starting from MongoDB 3.2. In-Memory Storage Engine: available in the Enterprise version; it retains documents in memory.

Operational Factors and Data Models

Modeling in MongoDB depends on both: - the data - the characteristics and features of MongoDB. Along with schema design, the following factors should also be taken into consideration: - Document Growth - Atomicity - Storage - Indexing - Sharding

Hash Based Partitioning

MongoDB computes a hash of a field's value and uses these hashes to create chunks. This partitioning gives a random distribution of a collection across the cluster: two documents with close shard key values are unlikely to be part of the same chunk.
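The difference between range-based and hash-based placement can be sketched in plain JavaScript. This is an illustration only: the FNV-1a hash stands in for MongoDB's real hash function, and `NUM_CHUNKS`, `rangeChunk`, and `hashChunk` are hypothetical names.

```javascript
const NUM_CHUNKS = 4;

// Range-based: neighbouring key values fall into the same key range.
function rangeChunk(key, maxKey = 4000) {
  return Math.floor(key / (maxKey / NUM_CHUNKS));
}

// FNV-1a (32-bit), used here only as a stand-in for MongoDB's hash function.
function fnv1a(text) {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Hash-based: the chunk is chosen from the hash, not from the value itself.
function hashChunk(key) {
  return fnv1a(String(key)) % NUM_CHUNKS;
}

for (const key of [1001, 1002, 1003]) {
  console.log(key, "range chunk:", rangeChunk(key), "hash chunk:", hashChunk(key));
}
```

With range partitioning the neighbouring keys 1001 and 1002 share a chunk, while the hashed placement sends them to different chunks.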

data types

ObjectId, double, string, date (Date(), new Date(), ISODate()), integer, boolean, timestamp (can be used to track when records are inserted and modified), null (stores null values), array (holds int, date, double, string), regular expression

From the following ____ is not a NoSQL database .

Oracle

Avoid Application Joins

MongoDB has only limited support for server-side joins (the $lookup aggregation stage, available since version 3.2), so joins often end up in application code. Performance can degrade when you pull back and join a lot of data. When a design needs many joins, it is advisable to denormalize the schema.

The ____________ command returns a document that provides an overview of the database's state.

db.serverStatus()

How MongoDB Query Plan Works?

Summary of how the query plan works: - If there is no matching cache entry, the query planner generates candidate plans (Step 3). - If a matching entry exists in Step 2, the query planner evaluates the cached plan's performance through a replanning mechanism. - From Step 3 onwards, the query planner chooses a winning plan, creates a cache entry containing the winning plan, and uses it to generate the result documents.

Get User

The db.getUsers() method returns information for all users associated with a database; db.getUser() returns information for a single user. Example: db.getUser("mynewuser"); { "_id" : "test.mynewuser", "user" : "mynewuser", "db" : "test", "roles" : [ { "role" : "readWrite", "db" : "test" }, { "role" : "dbAdmin", "db" : "test" } ] }

Avoid Growing Documents

When documents constantly grow in size, it can impact: - Disk I/O - Database performance. * In such cases, use document buckets and document pre-allocation.

capped collections

A fixed-size collection that maintains insertion order. Once the specified size is reached, the oldest documents are overwritten, so it acts as a circular queue. Use it to restrain/limit the size of a collection. db.createCollection(<CollectionName>, {capped: <true/false>, size: <number of bytes>, max: <number of documents>}) Note: fs.files (stores metadata) and fs.chunks (stores the file chunks) are the two collections used by GridFS; they are not types of capped collection. ex: restrict the collection to no more than four documents: db.createCollection("LogUsers", {capped: true, size: 100, max: 4})
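The circular-queue behaviour can be modelled in a few lines of plain JavaScript. This is a sketch only: the `CappedCollection` class is hypothetical and models just the `max` document limit, not the byte-size limit.

```javascript
// Toy capped collection: keeps insertion order and drops the oldest
// document once the maximum number of documents is exceeded.
class CappedCollection {
  constructor(max) {
    this.max = max;
    this.docs = [];
  }
  insert(doc) {
    this.docs.push(doc);
    if (this.docs.length > this.max) this.docs.shift(); // oldest document goes first
  }
  find() {
    return [...this.docs];
  }
}

const logUsers = new CappedCollection(4); // mirrors max: 4 in the LogUsers example
for (let i = 1; i <= 6; i++) logUsers.insert({ user: "user" + i });
console.log(logUsers.find().map((d) => d.user)); // [ 'user3', 'user4', 'user5', 'user6' ]
```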

key

a key is a string enclosed with quotation marks { 'key' : value, 'key' : value, }

skip()

Accepts a numeric argument: the number of documents to skip from the start of the result. In an aggregation pipeline the equivalent stage is { $skip: <positive int> }. ex: db.art.aggregate([{ $skip : 5 }])
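Skip and limit together give simple paging. The sketch below models that on a plain JavaScript array; the `page` helper and the sample `art` array are made up for illustration and do not talk to a database.

```javascript
// Twelve sample documents with _id 1..12, standing in for a collection.
const art = Array.from({ length: 12 }, (_, i) => ({ _id: i + 1 }));

// Skip the first `skip` documents, then return at most `limit` of the rest.
function page(docs, skip, limit) {
  return docs.slice(skip, skip + limit);
}

const result = page(art, 5, 3).map((d) => d._id);
console.log(result); // [ 6, 7, 8 ]
```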

In MongoDB, identify the command that adds a shard to a sharded cluster.

addShard

Choose the operations that can be captured by database profiler?

all; read ops write ops cursor ops

security features of mongoDB

authentication, authorization, encryption of data, auditing, hardening (ensure only trusted hosts have access)

Which collection maintains insertion order and behaves like a circular queue once specified size has reached?

capped collection

Which is the data type used by capped Collection?

circular queue

drivers

Client libraries that provide interfaces and methods for apps to interact with a MongoDB database. A driver handles the translation of documents between BSON objects and the language's own data structures. Some supported drivers: C++, Java, .NET, Ruby, JavaScript, Node.js, Python, Perl, PHP, Scala

Revoke Role from User

db.revokeRolesFromUser() removes one or more roles from a user on the current database. Syntax: db.revokeRolesFromUser( "<username>", [ <roles> ], { <writeConcern> } ) Example: the following command removes - the read role on the film database - the readWrite role on the Books database. use Books db.revokeRolesFromUser( "accountUser01", [ { role: "read", db: "film" }, "readWrite" ], { w: "majority" } )

create a source list file for mongoDB

echo "deb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu xenial/mongodb-org/3.6 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.6.list

Which model is used when there are "contains" relationships between entities?

embedded model

index sorting - ascending/descending

ex: a collection 'Player' containing score and location
db.Player.insert( { "_id": "1", "score": 10340, "location": { state: "NSW", city: "Sydney" } } )
//create an index on score, where:
// 1 indexes the scores in ascending order
// -1 indexes the scores in descending order
db.Player.createIndex( { score: 1 } )
//the index above makes the queries below faster
db.Player.find( { score: 10340 } )
db.Player.find( { score: { $gt: 10000 } } )

insert() & save()

Inserts one or more documents into a collection. Since version 3.2 you can use insertMany() to insert multiple documents, passed as an array. db.Collection.insert(document) db.Collection.insertMany([document1, document2, ...]) Inserting one record: db.topic.insert({ title: 'MongoDB', desc: 'MongoDB is document store', tags: ['MongoDB', 'NoSQL database'] })

Which is the method used to check whether collection is capped or not?

isCapped()

From the following ...... is the simplest NoSQL databases

key-value

Which of the following in MongoDB can limit the size of the result document for a query operation?

limit()

__________ is the method used to limit the records in MongoDB.

limit()

data import/export tools (nonbinary?)

mongoexport: exports MongoDB data to JSON, TSV, or CSV files
syntax: mongoexport -d <database> -c <collection name> -o <output file name>.json
example: mongoexport -d customer -c order -o student.json
==================================================
mongoimport: imports JSON, TSV, or CSV data into a MongoDB database
syntax: mongoimport -d <database name> -c <collection name> --file <filename>
example: mongoimport -d customers -c orders --file student.json

CRUD operations

projection: db.topic.find({}, {"title": 1, _id: 0})
delete: db.topic.deleteOne({title: 'MongoDB'})
insert records one by one from the console:
db.topic.insert({title: 'MongoDB', desc: 'MongoDB is document oriented database'})
db.topic.insert({title: 'Hbase', desc: 'Hbase is a column-oriented database'})
list all collections: show collections
read data from the db: db.topic.find()

document-oriented databases

A special type of NoSQL database used for storing, retrieving, and managing semi-structured data. It pairs each key with a complex data structure, commonly a block of XML or JSON, termed a document. Examples: MongoDB, Couchbase, OrientDB, RavenDB. Features: flexible data modeling, fast querying, faster write performance.

The chunk operations performed in the background are __________.

splitting and balancing

Install MongoDB Package

sudo apt-get install -y mongodb-org

add official MongoDB repository to server

sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 2930ADAE8CAF5059EE73BB4B58712A2291FA4AD5

Start MongoDB Service

sudo service mongod start

indexes

support fast and efficient execution of queries

Which of the following in MongoDB provides information for various lock types and lock modes held during the operation?

system.profile.locks

In MongoDB, ________ represents the number of times the operation had to wait for the acquireCount.

system.profile.locks.acquireWaitCount (not this)

Replica Set Data Synchronization

There are two forms of data synchronization: initial sync, which populates new members with the full data set, and replication, which applies ongoing changes to the entire data set. - Initial sync: to copy the full data set to a new member, MongoDB clones all databases except the local database. To perform the clone, mongod scans every collection in each source database and inserts all the data into its own copies of these collections. - Fault tolerance: initial sync can recover when there are network or operation failures.

update

Updates values in an existing document. Methods: updateOne(), updateMany(), replaceOne()
db.Collection.update(<selection criteria>, <updating data>)
db.collectionname.updateOne(<filter criteria>, <updating data>, <options>)
db.collectionname.updateMany(<filter criteria>, <updating data>, <options>)
db.collectionname.replaceOne(<filter criteria>, <replacement data>, <options>)

create a db

use <db name> ex: use TestDB

shards

Shards store the data. Each shard holds a subset of the sharded data; deploying each shard as a replica set provides high availability and data consistency.

database

Viewed as a physical container of collections. A MongoDB server can have one or more databases. The default database for MongoDB is test: if no database is selected, collections are stored in the test database. Check the databases on a MongoDB server: show dbs

dynamic schema

implies that the documents stored in the db can have different fields, with different types for each field

Which of the following is not a stage in Pipeline aggregation?

$order

One-to-Many Relationships

- A 1:N relationship is one where one side can hold more than one related entity, whereas the reverse relationship is single-sided. Consider the following one-to-many relationship between user and address data, where the user has multiple address entities.
NORMALIZED DATA
---User---
{ _id: "Jose", FullName: "Jose Varghese" }
---Address 1---
{ user_id: "Jose", House: "#4D Golden Flat", streetname: "William street", city: "BANGALORE", state: "KA", zip: "560060" }
---Address 2---
{ user_id: "Jose", House: "1 Some other house name", streetname: "Alexander street", city: "London", state: "NA", zip: "NA" }
EMBEDDED DATA
{ _id: "Jose", FullName: "Jose Varghese", Addresses: [ { House: "#4D Golden Flat", streetname: "William street", city: "BANGALORE", state: "KA", zip: "560060" }, { House: "1 Some other house name", streetname: "Alexander street", city: "London", state: "NA", zip: "NA" } ] }
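The cost difference between the two layouts can be sketched with plain JavaScript arrays standing in for collections. This is an illustration only: the `loadNormalized` and `loadEmbedded` helpers are hypothetical, and the address documents are trimmed to the city field.

```javascript
// Normalized layout: users and addresses live in separate "collections".
const users = [{ _id: "Jose", FullName: "Jose Varghese" }];
const addresses = [
  { user_id: "Jose", city: "BANGALORE" },
  { user_id: "Jose", city: "London" },
];

// Two lookups, joined in application code.
function loadNormalized(id) {
  const user = users.find((u) => u._id === id); // lookup 1
  const found = addresses.filter((a) => a.user_id === id); // lookup 2
  return { ...user, Addresses: found.map(({ city }) => ({ city })) };
}

// Embedded layout: the addresses travel with the user document.
const embeddedUsers = [
  { _id: "Jose", FullName: "Jose Varghese", Addresses: [{ city: "BANGALORE" }, { city: "London" }] },
];

// A single lookup returns everything.
function loadEmbedded(id) {
  return embeddedUsers.find((u) => u._id === id);
}

console.log(loadNormalized("Jose"));
console.log(loadEmbedded("Jose"));
```

Both calls return the same shape; the embedded version gets there with one lookup instead of two.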

One-to-One Relationships Embedded Model

- A 1:1 relationship links exactly one entity to one other entity. - With referencing, the application needs to issue multiple queries to resolve the references. The example below illustrates the advantage of embedding over referencing.
NORMALIZED DATA: consider separate User and Address documents
---User---
{ _id: "Jose", FullName: "Jose Varghese" }
---Address---
{ user_id: "Jose", House: "#4D Golden Flat", streetname: "William street", city: "BANGALORE", state: "KA", zip: "560060" }
EMBEDDED DATA: suppose the address data is frequently retrieved together with the FullName. In that scenario, it is better to embed the address data in the user data, as shown below.
{ _id: "Jose", FullName: "Jose Varghese", Address: { House: "#4D Golden Flat", streetname: "William street", city: "BANGALORE", state: "KA", zip: "560060" } }

Mongo Connectivity

- You cannot use the JDBC API to interact with MongoDB from Java.
- Instead, use the MongoDB Java driver API. Its MongoClient class connects to a MongoDB server and performs database-related operations. The snippets below appear to target the legacy 2.x driver API, where the constructor can throw UnknownHostException.
// Connect to the default MongoDB server
MongoClient mongoClient = new MongoClient();
// Connect to a specific host that listens on port 27017
MongoClient mongoClient = new MongoClient("abcd.server.com", 27017);
// Connect to a replica set of servers
List<ServerAddress> servername = new ArrayList<ServerAddress>();
servername.add(new ServerAddress("ab.server.com", 27017));
servername.add(new ServerAddress("abcd.server.com", 27019));
MongoClient mongoClient = new MongoClient(servername);
example:
import com.mongodb.DB;
import com.mongodb.MongoClient;
import java.net.UnknownHostException;
import java.util.List;
import java.util.Set;
public class SampleDBConnection {
  public static void main(String[] args) {
    try {
      // Connect to MongoDB
      MongoClient mongodbClients = new MongoClient("localhost", 27017);
      // List every database
      List<String> databases = mongodbClients.getDatabaseNames();
      for (String mdbNames : databases) {
        System.out.println("- Databases: " + mdbNames);
        DB mdb = mongodbClients.getDB(mdbNames);
        // List every collection in the database
        Set<String> collectionnames = mdb.getCollectionNames();
        for (String colNames : collectionnames) {
          System.out.println("\t + Collection: " + colNames);
        }
      }
      mongodbClients.close();
    } catch (UnknownHostException ex) {
      ex.printStackTrace();
    }
  }
}

compound indexes

- A compound index is a single index structure that holds references to multiple fields.
- MongoDB limits a compound index to 31 fields.
db.collectionname.createIndex( { <field1>: <type>, <field2>: <type2>, ... } )
ex: a collection 'school' containing student documents; each document holds the full name of the student, the subject studied, and the score in that subject
try {
  db.school.insertMany( [
    { "_id": "1", "Fullname": "Mridhula", "subject": "Science", "score": 680 },
    { "_id": "2", "Fullname": "Mridhula", "subject": "English", "score": 770 },
    { "_id": "3", "Fullname": "Akhila", "subject": "science", "score": 670 },
    { "_id": "4", "Fullname": "Akhila", "subject": "English", "score": 890 },
    { "_id": "5", "Fullname": "Abhilasha", "subject": "Science", "score": 670 }
  ] );
} catch (e) { print(e); }
//find the details of students whose subject is sports and whose score is greater than 670
db.school.find( { score: { $gt: 670 }, subject: "sports" } )
//create a compound index with either field order
db.school.createIndex( { subject: 1, score: 1 } )
db.school.createIndex( { score: 1, subject: 1 } )
* Performance varies with the order of the fields in the compound index.

Which of the following is correct to suppress a "Title" from the resultset? Title,it Title,0 Title,1 (not it) all

Title,0

index properties

1. Unique Index: makes MongoDB reject duplicate values for the indexed field. 2. Sparse Index: the index contains entries only for documents that have the indexed field; documents without the field are skipped. 3. TTL Index: used to automatically remove documents from a collection after a certain amount of time. 4. Partial Index: indexes only the documents in a collection that meet a specified filter expression. It offers a superset of the sparse index functionality, with lower storage requirements and reduced performance costs for index creation and maintenance.

Which index uses planar geometry when returning results?

2d indexes

Database Profiling

The database profiler collects detailed information, such as CRUD operations and configuration and administration commands, executed against a running mongod instance. It logs read and write operations, cursor operations, and other database commands in the system.profile collection, which is a capped collection.
- By default, the profiler is turned off.
- The output of system.profile helps to find out which queries need tuning.
Profiling levels:
- 0: the profiler is off and collects no data. This is the default level.
- 1: collects data for operations that take longer than the value of slowms.
- 2: collects data for all operations.
Enable and configure database profiling
//Pass the profiling level as a parameter to db.setProfilingLevel() in the mongo shell.
syntax: db.setProfilingLevel(<profiling level>)
example 1: db.setProfilingLevel(2)
example 2: //Sets the profiling level to 1 and the slow operation threshold for the mongod instance to 25 milliseconds, i.e. profiles anything slower than 25 milliseconds
db.setProfilingLevel(1, 25)
//Return only the profiling level
db.getProfilingLevel()
//Check the profiling level, slowms, and sample rate
db.getProfilingStatus()
//sample output looks like: { "was" : 0, "slowms" : 500, "sampleRate" : 5.0, "ok" : 1 }
was: the current profiling level. slowms: the threshold in milliseconds. sampleRate: the share of slow operations that should be profiled.
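The three profiling levels boil down to a simple rule, sketched here in plain JavaScript. This is a simplified model: `shouldProfile` and the sample operations are made up for illustration, and the sampling rate is ignored.

```javascript
// Decide whether an operation would be recorded at a given profiling level.
function shouldProfile(op, level, slowms) {
  if (level === 0) return false; // profiler off
  if (level === 2) return true; // every operation
  return op.millis > slowms; // level 1: slow operations only
}

const ops = [
  { name: "find by _id", millis: 2 },
  { name: "unindexed scan", millis: 140 },
];

for (const level of [0, 1, 2]) {
  const recorded = ops.filter((op) => shouldProfile(op, level, 25)).map((op) => op.name);
  console.log("level", level, recorded);
}
```

With slowms set to 25, level 1 records only the 140 ms scan, level 2 records both operations, and level 0 records nothing.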

Find Long Running Queries

Finding and terminating long-running operations in MongoDB: db.currentOp() returns all in-progress operations on the database. After examining the output, find the "opid" value of the operation and terminate it with db.killOp(<opid>).

Set Up a Replica Set

Basic syntax for starting a replica set member: mongod --port <port> --dbpath "<database data path>" --replSet "<replica set name>" Example: mongod --port 27018 --dbpath "C:\mongodb\data1" --replSet rs0 From the mongo shell, run rs.initiate() to initiate a new replica set. rs.conf() checks the replica set configuration. rs.status() checks the status of the replica set. db.isMaster() checks whether the connected node is the primary.

Rebuild Indexes

If queries are running slowly even though the indexes are well designed, you can rebuild the indexes. This operation drops all indexes on a collection and recreates them, which is expensive for collections that contain a significant amount of data. db.collectionname.reIndex() Example: db.records.reIndex()

Chunk Operations

In MongoDB, the set of documents that fall within a key range is called a chunk. Chunk operations performed in the background are either: - Splitting, or - Balancing

Pre-Allocated Documents (MMAP)

If a document is known to grow to a certain size, you can avoid document moves by pre-allocating the maximum size of the document up front. Impact: every later operation on the document becomes an in-place update.

Pre-Aggregated Data

If application queries aggregate a lot of data, consider pre-aggregating it. Example: you have a web application and want to know how many users view a page. Rather than summing up the views for a page on each request: - Keep a view counter for that page and increment it each time the page is viewed. - Read the number of page views from this counter.
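The counter idea can be sketched in plain JavaScript, with a Map standing in for the collection. The `recordView` and `viewCount` helpers are hypothetical; in MongoDB the increment would typically be an update that uses the $inc operator.

```javascript
// One counter per page, updated at write time instead of summed at read time.
const pageViews = new Map();

function recordView(pageId) {
  pageViews.set(pageId, (pageViews.get(pageId) || 0) + 1);
}

function viewCount(pageId) {
  return pageViews.get(pageId) || 0; // a single cheap read, no aggregation
}

recordView("/home");
recordView("/home");
recordView("/pricing");
console.log(viewCount("/home")); // 2
```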

MongoDB Mini Project - Importing Data to MongoDB

The Office of Foreign Labor Certification (OFLC) has conducted a survey on H-1B visas. As a part of the survey, they gathered information regarding employer details, agent details, workspace details, and case status of different employees. In this case study, you are provided with a dataset that contains the information collected from the survey (the notes also mention a dataset named googleplaystore.csv). The case study covers: - Importing the data to MongoDB - Performing query operations on it
1. Install MongoDB with 'install'.
2. Run MongoDB: Run > Run. *Now the environment is ready.
3. Use the following database and collection names: MongoDB Database Name: mongo; MongoDB Collection Name: mongomini
4. Enter the mongo shell: $ mongo
5. Use the database mongo: > use mongo //create the collection if needed: db.createCollection('mongomini')
6. Query the inserted documents in MongoDB: db.mongomini.find() //count them: db.mongomini.count()
7. Query the top 10 employers who file the most cases, along with the number of cases. //<employer field> is a placeholder for the dataset's employer column
db.mongomini.aggregate([{ $group: { _id: "$<employer field>", count: { $sum: 1 } } }, { $sort: { count: -1 } }, { $limit: 10 }])
* Note: Store the above command results in top_employers.txt * Format: ( echo "use mongo" ; echo '<Query_command>' ) | mongo > file_name.txt
8. Query the top 5 job titles with the highest number of applications. //<job title field> is a placeholder for the dataset's job title column
db.mongomini.aggregate([{ $group: { _id: "$<job title field>", count: { $sum: 1 } } }, { $sort: { count: -1 } }, { $limit: 5 }])
* Note: Store the above command results in top_job_titles.txt
9. Query the most popular city, i.e. the one with the highest number of filed cases, along with its count. //<city field> is a placeholder for the dataset's city column
db.mongomini.aggregate([{ $group: { _id: "$<city field>", count: { $sum: 1 } } }, { $sort: { count: -1 } }, { $limit: 1 }])

update packages list

The package lists must be updated after the repository is added: sudo apt-get update

Model Tree Structures

This data model describes a tree-like structure by storing references to parent nodes in the child nodes.
db.BookDetails.insert( { _id: "Oracle", parent: "Databases" } )
db.BookDetails.insert( { _id: "Sqlserver", parent: "Databases" } )
db.BookDetails.insert( { _id: "Databases", parent: "Programming" } )
db.BookDetails.insert( { _id: "Languages", parent: "Programming" } )
db.BookDetails.insert( { _id: "Programming", parent: "Books" } )
db.BookDetails.insert( { _id: "Books", parent: null } )
Books is the root. Books > Programming. Programming > Languages & Databases. Databases > Sqlserver & Oracle.
//Query to retrieve a node's parent:
db.BookDetails.findOne( { _id: "Oracle" } ).parent
//Query to find the immediate children of a node by its parent field:
db.BookDetails.find( { parent: "Databases" } )
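Walking such a tree only needs the parent references. The sketch below uses a plain JavaScript array in place of the BookDetails collection; the `ancestors` and `children` helpers are made up for illustration.

```javascript
// Same documents as in the card, held in an array instead of a collection.
const bookDetails = [
  { _id: "Oracle", parent: "Databases" },
  { _id: "Sqlserver", parent: "Databases" },
  { _id: "Databases", parent: "Programming" },
  { _id: "Languages", parent: "Programming" },
  { _id: "Programming", parent: "Books" },
  { _id: "Books", parent: null },
];
const byId = new Map(bookDetails.map((doc) => [doc._id, doc]));

// Follow parent references upwards until the root (parent: null) is reached.
function ancestors(id) {
  const path = [];
  let node = byId.get(id);
  while (node && node.parent !== null) {
    path.push(node.parent);
    node = byId.get(node.parent);
  }
  return path;
}

// Immediate children: every document whose parent field matches.
function children(id) {
  return bookDetails.filter((doc) => doc.parent === id).map((doc) => doc._id);
}

console.log(ancestors("Oracle")); // [ 'Databases', 'Programming', 'Books' ]
console.log(children("Databases")); // [ 'Oracle', 'Sqlserver' ]
```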

config servers

Config servers store the cluster's metadata, which maps the cluster's data set to the shards. The query router uses this metadata to target operations to specific shards.

query routers

These are mongos instances that interface with client applications and direct operations to the appropriate shard. The router processes and targets the operations to the shards and then returns the combined results to the clients.

Data Model Design

Understand the different strategies to consider when choosing a data model, along with their strengths and weaknesses. - The key decision when designing the structure of documents is whether to embed data or to use references. Embedded Data Models: store related pieces of information in the same database record. There are two types of embedded data model: - Model One-to-One Relationships - Model One-to-Many Relationships. Normalized Data Models: describe relationships using references between documents.

MongoDB Query Plan

When a user runs a query, the MongoDB query optimizer processes the query and chooses the most efficient query plan. For each query sent to MongoDB, the query planner performs the following steps:
1. Find a matching cache entry.
2. If a match is found:
2.1 Evaluate the plan's performance.
2.2 If it fails, evict the cache entry and go to STEP 3.1.
2.3 If it passes, go to STEP 3.5.
2.4 Go to STEP 3.6.
3. If no match is found in STEP 2:
3.1 Generate candidate plans.
3.2 Evaluate the candidate plans.
3.3 Choose a winning plan.
3.4 Create a cache entry.
3.5 Generate the results.
3.6 Send the results to the user.
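The steps can be sketched as a tiny plan cache in plain JavaScript. This is a toy model, not the server's planner: the `runQuery` function, the numeric `cost` field, and the `isStillFast` callback are all assumptions made for illustration.

```javascript
const planCache = new Map();

function runQuery(shape, candidates, isStillFast) {
  let plan = planCache.get(shape); // step 1: look for a cache entry
  if (plan && !isStillFast(plan)) {
    planCache.delete(shape); // step 2.2: evict a plan that no longer performs
    plan = undefined;
  }
  if (!plan) {
    // steps 3.1 to 3.3: evaluate the candidates and pick the cheapest
    plan = candidates.reduce((best, c) => (c.cost < best.cost ? c : best));
    planCache.set(shape, plan); // step 3.4: cache the winner
  }
  return plan.name; // stand-in for steps 3.5 and 3.6
}

const plans = [
  { name: "COLLSCAN", cost: 100 },
  { name: "IXSCAN", cost: 5 },
];
const first = runQuery("find-by-score", plans, () => true);
const second = runQuery("find-by-score", [{ name: "OTHER", cost: 1 }], () => true);
const third = runQuery("find-by-score", [{ name: "OTHER", cost: 1 }], () => false);
console.log(first, second, third); // IXSCAN IXSCAN OTHER
```

The second call reuses the cached winner even though a cheaper candidate is offered; the third call fails the performance check, evicts the entry, and replans.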

MongoDB

A NoSQL, document-oriented database. The name comes from 'humongous', meaning 'huge'.
- written in C++
- pairs each key with a complex data structure called a 'document'
- stores documents in a binary-encoded format termed BSON (an extended, binary form of the JSON data model)
Supports queries on: single-value fields, range queries, conditional operators, regular expression searches
Advantages:
- rich document-based queries for easy readability
- schema-free: the schema can change as the app evolves
- performance-oriented: best suited for fast request/response
- ease of use: simple codebase, less hardware, quick and easy to add new functionality
- high scalability, achievable on low-cost commodity hardware
- supports consistency and partition tolerance in terms of the CAP theorem
- easy replication for high availability
- able to handle large volumes of structured, semi-structured, and unstructured data
Limits:
- not suited for complex transactions spanning multiple operations
- not suited for apps with traditional database requirements such as foreign key constraints

BSON

A binary-encoded serialization of JSON-like documents. Supports embedding objects and arrays within other objects and arrays. MongoDB can reach inside BSON objects to build indexes and match objects against query expressions on both top-level and nested BSON keys. Traversable and efficient.
BSON format, shown for the document {"Study": "MongoDB"}:
\x18\x00\x00\x00 //document size (24 bytes)
\x02 //type: string
Study\x00 //name of field
\x08\x00\x00\x00MongoDB\x00 //value (length, then text)
\x00 //end of document
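The byte layout can be checked with a minimal encoder in plain JavaScript. This sketch handles only documents whose values are all strings, and the `encodeStringDoc` helper is hypothetical; real drivers ship full BSON libraries.

```javascript
// Write a 32-bit integer as four little-endian bytes.
function int32le(n) {
  return [n & 0xff, (n >> 8) & 0xff, (n >> 16) & 0xff, (n >>> 24) & 0xff];
}

// Encode a flat document of string values following the BSON layout:
// int32 size, then per field: 0x02, name, 0x00, int32 length, text, 0x00; then 0x00.
function encodeStringDoc(doc) {
  const utf8 = new TextEncoder();
  const body = [];
  for (const [key, value] of Object.entries(doc)) {
    const text = utf8.encode(value);
    body.push(0x02, ...utf8.encode(key), 0x00); // type byte and field name
    body.push(...int32le(text.length + 1), ...text, 0x00); // length counts the trailing 0x00
  }
  const total = 4 + body.length + 1; // size prefix + fields + terminator
  return Uint8Array.from([...int32le(total), ...body, 0x00]);
}

const bytes = encodeStringDoc({ Study: "MongoDB" });
console.log(bytes.length); // 24
console.log(Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join(" "));
```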

storage engine

The part of the database that manages how data is stored, both in memory and on disk. It acts as an interface between MongoDB and persistent storage, i.e., disk. Examples: MMAPv1, WiredTiger, In-Memory Storage Engine

working set

The portion of the data that clients access frequently. Accessing the disk for data is a time-consuming operation, so performance is best when the working set fits in memory. Separately, MongoDB may create multiple candidate query plans for a query.

document

a set of key-value pairs that support dynamic schema similar to 'row' in RDBMS mongoDB allows the insertion of data without a predefined schema

GridFS

A specification for storing and retrieving large files within MongoDB collections. GridFS splits a large file into smaller chunks and stores each chunk in a separate document, with a default chunk size of 255 KB.
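The chunking step can be sketched in plain JavaScript. This is only an illustration: the `toChunks` helper and the "demo-file" id are made up, while the chunk fields (files_id, n, data) follow the layout of the fs.chunks collection.

```javascript
const CHUNK_SIZE = 255 * 1024; // default GridFS chunk size, in bytes

// Cut a byte array into numbered chunk documents of at most CHUNK_SIZE bytes.
function toChunks(filesId, data) {
  const chunks = [];
  for (let n = 0; n * CHUNK_SIZE < data.length; n++) {
    chunks.push({
      files_id: filesId,
      n, // sequence number of the chunk
      data: data.subarray(n * CHUNK_SIZE, (n + 1) * CHUNK_SIZE),
    });
  }
  return chunks;
}

const file = new Uint8Array(600 * 1024); // a 600 KB "file"
const chunks = toChunks("demo-file", file);
console.log(chunks.map((c) => c.data.length / 1024)); // [ 255, 255, 90 ]
```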

sharding

a technique used for distributing data across multiple servers allows horizontal scaling used by mongo for splitting up a large collection among multiple servers

MMAPv1 (Storage Engine)

A traditional storage engine based on memory-mapped files. Performs well on workloads with high-volume reads, inserts, and in-place updates.
- automatically allocates power-of-two-sized records when new documents are inserted
- the operating system decides which pages fit into memory
Two allocation strategies:
1. power-of-two allocation: stores documents in record sizes that are powers of two, e.g. 32, 64, 128, 256, 512, ... bytes. Works efficiently for workloads with many inserts, updates, and deletes.
2. exact fit (no padding): for collections whose workloads consist of insert-only operations, or of update operations that do not increase document size.
Consistency is achieved through journaling: MongoDB writes journal files every 100 milliseconds and writes data files to disk every 60 seconds.
- indexes and data are mapped into virtual address space
- frequently used pages are retained in RAM
- offers collection-level locking
- uses B-trees to store indexes
- fast for reads and slow for writes

mongofiles

A utility that helps to manipulate files stored in GridFS in your MongoDB instance. Commands take the form: mongofiles <options> <commands> <filename> options: one or more options that control the behavior of mongofiles. commands: determine the action mongofiles performs. filename: the name of a file on the local file system or of a GridFS object.

In Mongodb high ____ can be achieved with replica sets

availability

$match in MongoDB is similar to SQL.

both WHERE and HAVING

value

can be a string, number, boolean, array, or object { 'key' : value, 'key' : value, }

Sharding

The process of distributing data across various servers for storage. A key from the collection is chosen as the shard key, and the data is split using that key. Factors to consider when selecting a shard key: - Good cardinality/granularity: the key should have enough distinct values to spread the collection. - Fields that are common in queries on the collection. - The schema of the data. - How the application queries the database and performs write operations. Main components: - shards - config servers - query routers

Collections in MongoDB that support a fixed size are called _______________.

capped collection

In MongoDB ..... sorting is not supported collation collection heap (not it)

collation

Documents in MongoDB are stored in _________.

collections

compression (for wiredTiger)

Compresses indexes and collections. It works by identifying repeating values or patterns that can be stored once in compressed form, which reduces the total amount of space. Larger units of data tend to compress more effectively because they offer more repeating values and patterns. Compressors: snappy, zlib, none (when no compression is needed). Index options: no compression, prefix compression (effective for data sets whose values repeat, like Country).

collection

Consists of a group of MongoDB documents; similar to an RDBMS table. Documents inside a collection can have the same or different fields.

replica sets

Contains two or more copies of the data. By default, read and write operations are performed on the primary replica, and the secondaries maintain a copy of the primary's data. MongoDB uses replica sets for high availability.

default_id

Created when a collection is created. It is a unique index on the _id field: the _id index prevents clients from inserting two documents with the same value for the _id field.

The result set of the find() method in MongoDB uses a ___________.

cursor

Shard Key

Data distribution is based on the shard key. Shard key values are divided into chunks that are evenly distributed across the shards. MongoDB divides shard key values by:
- Range-based partitioning
- Hash-based partitioning
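
Hash-based partitioning can be sketched in plain JavaScript as follows. This is a minimal illustration, not MongoDB's implementation: the shard names and the `toyHash` function are assumptions made up for the example (MongoDB uses its own hash function for hashed shard keys).

```javascript
// Hypothetical shard names, for illustration only.
const SHARDS = ["s0", "s1", "s2", "s3"];

// Toy string hash (an assumption for this sketch, not MongoDB's hash function).
function toyHash(key) {
  let h = 0;
  for (const ch of key) {
    h = (h * 31 + ch.codePointAt(0)) >>> 0; // keep the value in unsigned 32-bit range
  }
  return h;
}

// Hash-based partitioning: the hash of the shard key value picks the shard.
function hashShardFor(key) {
  return SHARDS[toyHash(key) % SHARDS.length];
}

// Neighbouring key values may land on different shards, unlike with ranges.
for (const name of ["joe", "joey", "jose"]) {
  console.log(name, "->", hashShardFor(name));
}
```

The same key always maps to the same shard, but the ordering of keys is not preserved, which is why range queries on a hashed key typically have to consult several shards.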

Which of the following commands is used to determine index sizes and index details for the Products collection?

db.Products.stats();

From the collection Books, find the command that removes a single document matching the condition "Auditor is Joseph".
- db.books.remove({ Auditor : "Joseph" }, 1, 1) (not this)
- db.books.remove({ Auditor : "Joseph" }, { justOne: true })
- db.books.removeOne({ Auditor : "Joseph" }, 1)
- all (not this)

db.books.remove({ Auditor : "Joseph" }, { justOne: true })

Find the correct syntax to calculate aggregate values for the data in a collection.

db.collection.aggregate()

Create User

db.createUser()
Example: create a user with roles
use sample
db.createUser({
  user: "usertest",
  pwd: "usertest123",
  roles: [ "readWrite", "dbAdmin" ]
});

Which of the following will help to identify long running queries?

db.currentOp()

Find the command that removes the user from the current database.

db.dropUser()

Drop User

db.dropUser() removes the user from the current database.
Example: the operation below drops the user User1 on the sample database.
use sample
db.dropUser("User1", { w: "majority", wtimeout: 2000 })

Which of the following method is used to create Index in MongoDB?

db.collection.ensureIndex() (deprecated after 3.0; use db.collection.createIndex())

Which method is used to return information for all users associated with a database?

db.getUsers()

Grant Roles to User

db.grantRolesToUser() grants additional roles to a user.
Syntax: db.grantRolesToUser( "<username>", [ <roles> ], { <writeConcern> } )
Example: the operation below gives Usr01
- the readWrite role on the Books database
- the read role on the Film database
use Books
db.grantRolesToUser(
  "Usr01",
  [ "readWrite", { role: "read", db: "Film" } ],
  { w: "majority", wtimeout: 2000 }
)

______ can be used to check if the connected node is primary or not?

db.isMaster()

Which is the method used to terminate certain operations after examination?

db.killOp()

Which of the following statements skips the first ten documents in the stud collection and returns the remaining documents?

db.stud.find().skip(10)

Update User

db.updateUser() updates a user on the current database. Updating a field completely replaces the previous field's values, including the user's roles array.
Syntax:
db.updateUser(
  "<username>",
  {
    customData : { <any information> },
    roles : [ { role: "<role>", db: "<database>" } | "<role>", ... ],
    pwd: "<cleartext password>"
  },
  writeConcern: { <write concern> }
)

snappy

The default compressor. It has low overhead and uses resources efficiently.

WiredTiger (storage engine)

The default storage engine starting from MongoDB 3.2. It was created by the architects of Berkeley DB, and WiredTiger was later acquired by MongoDB.
Supports:
- document-level concurrency model
- compression
- encryption at rest (MongoDB Enterprise edition)
- durability with and without the journal
- B-trees by default, with support for LSM trees
- non-locking algorithms such as hazard pointers
Reported to yield 7x-10x better write performance and 80% file system compression compared with MMAP.

pretty()

Displays records in a formatted way.
Syntax: db.collection.find().pretty()
Example: db.topic.find().pretty()

From the following ... is a NoSQL Database Type document database sql server oracle

document database

Replication

Helps to synchronize data across multiple servers.
- Replication is achieved by placing multiple copies of the data on different database servers.
- Replication provides redundancy and increased data availability.
Benefits:
- Protects the database from the loss of a single server.
- Keeps data safe, with higher availability.
- No downtime is required for maintenance and disaster recovery (such as backups, index rebuilds, and compaction).
How replication works:
- Insertion occurs on the primary node.
- Each operation is tracked in the oplog, which is part of the local database (local.oplog.rs in a replica set).
- Secondary nodes read the operations from the oplog and apply them to their own copy of the data.
- A group of mongod instances that host the same data set is called a replica set.
- In a replica set, one node acts as the primary node and the remaining nodes are secondary nodes.
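
The oplog flow described above can be modeled with a small in-memory sketch. The `Node` class, its fields, and the helper functions are hypothetical names for illustration. Real replication also involves elections, timestamps, and idempotent operations, none of which are modeled here.

```javascript
// Minimal in-memory model of a node; names are hypothetical.
class Node {
  constructor() {
    this.data = [];   // documents held by this node
    this.applied = 0; // number of oplog entries already applied
  }
}

const oplog = []; // ordered log of write operations
const primary = new Node();
const secondary = new Node();

// Writes go to the primary and are recorded in the oplog.
function insertOnPrimary(doc) {
  primary.data.push(doc);
  oplog.push({ op: "insert", doc });
  primary.applied = oplog.length;
}

// A secondary replays the oplog entries it has not applied yet.
function syncSecondary(node) {
  while (node.applied < oplog.length) {
    const entry = oplog[node.applied];
    if (entry.op === "insert") node.data.push(entry.doc);
    node.applied += 1;
  }
}

insertOnPrimary({ _id: 1, name: "Orange" });
insertOnPrimary({ _id: 2, name: "Apple" });
syncSecondary(secondary);
console.log(secondary.data);
```

Because the secondary tracks how far it has read, calling `syncSecondary` again after more inserts applies only the new entries.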

pipeline aggregation

Documents are piped through a processing pipeline that executes in stages and transforms the documents into an aggregated result. When there is more than one stage, each stage is placed inside the array.
Some stages:
- $project: reshapes the documents. Handling is 1:1.
- $match: filters the documents. It reduces the number of documents, so handling is n:1 (n is the input).
- $group: groups documents together and applies aggregate operators such as sum and count. It reduces the number of documents, so handling is n:1 (n is the input).
- $sort: sorts the documents in a given order, for example once grouping is complete. 1:1.
- $skip: skips some documents. n:1.
- $limit: limits the number of documents. n:1.
- $unwind: unwinds an array field of a document.
- $out: writes the result to an output collection. 1:1.
- $redact: security-related stage used to restrict content to certain users.
- $geoNear: used to perform location-based queries that limit results by location.
Syntax:
db.collectionname.aggregate([ { <stage1> }, { <stage2> }, ..... { <stageN> } ])
Example:
db.customers.aggregate([
  { $match : { status : "Active" } },
  { $group : { _id : "$Customer_id", total : { $sum : "$salesamount" } } }
])
* customers is the collection; $match and $group are stages
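
To make the stage-by-stage flow concrete, here is a hedged in-memory analogue of a $match stage followed by a $group stage with $sum, written in plain JavaScript. The sample documents are invented for illustration, and the array methods only imitate what the server does. This is not the aggregation API itself.

```javascript
// Hypothetical sample documents standing in for the customers collection.
const customers = [
  { Customer_id: "C1", status: "Active", salesamount: 100 },
  { Customer_id: "C1", status: "Active", salesamount: 50 },
  { Customer_id: "C2", status: "Inactive", salesamount: 70 },
  { Customer_id: "C3", status: "Active", salesamount: 30 },
];

// Stage 1, like $match: keep only the documents that satisfy the filter (n:1).
const matched = customers.filter((doc) => doc.status === "Active");

// Stage 2, like $group with $sum: one output document per distinct key (n:1).
const totals = new Map();
for (const doc of matched) {
  totals.set(doc.Customer_id, (totals.get(doc.Customer_id) || 0) + doc.salesamount);
}
const grouped = [...totals].map(([id, total]) => ({ _id: id, total }));

console.log(grouped);
```

Each stage consumes the output of the previous one, which is why filtering early with $match usually reduces the work done by later stages.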

Column in SQL terminology is _________ in MongoDB.

field

read data from db & find()

find() displays the documents in a collection. It returns its output as a cursor, and the MongoDB server returns the query results in batches.
Syntax: db.collection.find()
Example: db.topic.find()

Obsessions

Areas to address for query optimization in MongoDB:
- bad schema design
- statement tuning
- instance tuning

Which collections are used to store GridFS data in MongoDB?

fs.files and fs.chunks

Range Based Partitioning

Each shard holds a range of shard key values from low to high: [kLow, kHigh].
Example: consider a collection "User" whose documents contain a name field {.....name...}. Here name is the shard key. The metadata looks like this:
Shard          nameLow - nameHigh
Shard 0 (s0) : range [jane - jose]
Shard 1 (s1) : range [joe - kyle]
Shard 2 (s2) : range [kyle - matt]
Shard 3 (s3) : range [Robert - Zzzzz]
// When the user runs the command below, the search goes to only two shards, s0 and s1.
db.users.find({name:/^jo/})
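
The routing decision for a prefix query can be sketched as an interval-overlap check. The chunk boundaries below are hypothetical, non-overlapping, half-open ranges chosen for illustration (they differ slightly from the card's table), and `shardsForPrefix` is an invented helper, not a MongoDB function.

```javascript
// Hypothetical chunk metadata: half-open ranges [low, high) on the shard key "name".
const chunks = [
  { shard: "s0", low: "jane", high: "joe" },
  { shard: "s1", low: "joe", high: "kyle" },
  { shard: "s2", low: "kyle", high: "matt" },
  { shard: "s3", low: "matt", high: "zzzzz" },
];

// A prefix query such as /^jo/ covers the key interval [prefix, prefixEnd),
// where prefixEnd is the prefix with its last character incremented.
function shardsForPrefix(prefix) {
  const last = prefix.charCodeAt(prefix.length - 1);
  const prefixEnd = prefix.slice(0, -1) + String.fromCharCode(last + 1);
  // Two half-open intervals overlap when each starts before the other ends.
  return chunks
    .filter((c) => c.low < prefixEnd && prefix < c.high)
    .map((c) => c.shard);
}

console.log(shardsForPrefix("jo"));
```

Only the shards whose ranges overlap the queried interval are contacted, which is the main benefit of range-based partitioning for range and prefix queries.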

GridFs

Helps to store data larger than 16 MB. It splits a larger file into small chunks and stores each chunk in a document with a maximum size of 255 KB.
mongofiles.exe -d <database_name> put <file_name>
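
The chunking arithmetic can be illustrated with a short sketch. It assumes the 255 KB chunk size stated on this card. The function names and the 600 KB sample file are hypothetical, and the code only imitates the splitting step, not the GridFS driver API.

```javascript
const CHUNK_SIZE_BYTES = 255 * 1024; // 255 KB chunk size, as stated on the card

// Number of chunks needed to hold a file of the given size.
function chunkCount(fileSizeBytes, chunkSize = CHUNK_SIZE_BYTES) {
  return Math.ceil(fileSizeBytes / chunkSize);
}

// Split a byte buffer into chunk-sized slices, numbered from 0.
function splitIntoChunks(bytes, chunkSize = CHUNK_SIZE_BYTES) {
  const parts = [];
  for (let offset = 0, n = 0; offset < bytes.length; offset += chunkSize, n += 1) {
    parts.push({ n, data: bytes.subarray(offset, offset + chunkSize) });
  }
  return parts;
}

const file = new Uint8Array(600 * 1024); // hypothetical 600 KB file
const parts = splitIntoChunks(file);
console.log(chunkCount(file.length), parts.length);
```

Only the last chunk is smaller than the chunk size. The other chunks are always full.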

Find the command that forces MongoDB to use a particular index for a db.collection.find() operation.

hint()

limit()

Limits the number of records returned. It takes a positive integer that specifies the maximum number of documents to pass along. If no number is given, all records are displayed.
Syntax: db.collection.find().limit(number)
Example, to display only 5 documents:
db.Stock.find({}, {titlename: 1, _id: 0}).limit(5)

connect to mongoDB running instance

mongo

core processes

mongo: a powerful interactive JavaScript shell interface. It helps system administrators and developers test queries and operations directly against the database.
mongod: the primary process of MongoDB. It manages data requests, controls data access, and performs other background management operations.

mongoDB indexes

Required for faster retrieval of data. Indexes are sorted and stored in a B-tree structure, and they support fast and efficient execution of queries. There should be a balance between indexes and queries. When a query in MongoDB is not indexed, a full collection scan is performed, so the absence of an index can cause significant database performance degradation.
Query performance can be improved by:
- reducing the number of documents inspected in memory
- removing the need to perform in-memory sorts
Index types:
- default _id: each collection contains an index on the _id field
- single field: used for a single field or sort; the index can be in ascending or descending order
- compound index: used for multiple fields
- text index: supports text search queries on string content
- multikey index: used to index array data
- geospatial index: 2d (two-dimensional) and 2dsphere (geolocation)

binary import/export tools

mongodump: creates a binary export of the contents of a database. Part of a backup strategy.
Example: create a dump containing only the collection named 'student' in the database named 'School':
mongodump --db School --collection student
mongorestore: loads data from a binary database dump generated by mongodump.
Example: read the database dump in the dumps directory that contains only the collection named 'student' in the database named 'School':
mongorestore --collection student --db School dumps/
bsondump: converts BSON files into human-readable formats. It runs from the system command line, not the mongo shell.
Syntax: bsondump --outFile collectionname.json collectionname.bson
--outFile: specifies the path of the file to which bsondump should write its output JSON data
mongooplog: a tool that polls operations from the replication oplog.

Which is the command-line tool used to provide a method to keep track of the amount of time a MongoDB instance spends reading and writing data?

mongotop

diagnostic tools

mongotop: tracks the amount of time a MongoDB instance spends reading and writing data.
Example, to return values every 30 seconds: mongotop 30
mongostat: provides a quick summary of a currently running instance and returns counters of database operations. The counters include inserts, updates, deletes, cursors, and queries. It helps to troubleshoot performance issues.
Example: mongostat
mongoperf: a tool for quickly testing disk I/O performance. It accepts configuration options in the form of a file that holds a JSON document.
mongosniff: helps to investigate MongoDB database activity. From version 3.4, it is replaced by mongoreplay, which lets you reproduce and investigate issues by recording and replaying the operations that trigger them.

single purpose aggregation operations

Operations that aggregate documents from a single collection. They are less capable and flexible than the aggregation pipeline and map-reduce.
Examples:
- count(): returns the count of matching documents
- distinct(): returns the distinct values for a field
Example: db.customers.distinct("Customer_ID")

winning query plan

Placed in the cache, then evicted from the cache when:
- writes reach a threshold number
- an index must be rebuilt
- the server must be restarted

__________ is the method used to display results in a formatted way.

pretty()

Which is the key in RDBMS similar to ObjectID in MongoDB?

primary key

aggregation

Processes records and returns computed results. Collections of documents are taken as input, and the aggregated results are returned in the form of documents, a cursor, or a collection.
Types:
- Pipeline aggregation: documents are piped through a processing pipeline that executes in stages and transforms the documents into a final aggregated result.
- Map-reduce: splits a larger problem into smaller chunks and sends them to different machines for processing. It comprises two phases: map and reduce.
- Single purpose: aggregates documents from a single collection.

Which of the following in MongoDB is a system that can help identify inefficient queries and operations?

profiler

Explain

Provides useful insight when you are trying to optimize a query. It describes the process and the indexes used to return the query results, and gives information on the query plan.
Syntax: db.collectionname.find().explain()
Example: db.books.find({year:1936}).explain()
db.collectionname.explain("executionStats") provides statistics about the performance of a query.
Example: consider school, a collection that contains student documents. To find the details of students whose score is greater than 670:
db.school.find( { score: { $gt: 670 } } ).explain("executionStats")
Output:
- queryPlanner.winningPlan.stage: shows COLLSCAN, indicating that no index is used.
- executionStats.nReturned: shows 7, indicating that the query matches and returns seven documents.
- executionStats.totalKeysExamined: shows 0, indicating that no index keys were examined and MongoDB scanned the entire collection.
- executionStats.totalDocsExamined: shows 9, indicating that MongoDB scanned 9 documents.
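
One way to read these numbers is to compare documents examined with documents returned. The helper below is a hypothetical sketch: only the three input field names come from the explain output described on this card, while the function name and the returned field names are assumptions.

```javascript
// Summarize the executionStats fields described on this card.
// The function name and the returned field names are assumptions for this sketch.
function summarizeExecutionStats({ nReturned, totalKeysExamined, totalDocsExamined }) {
  return {
    // No index keys examined while documents were scanned suggests a collection scan.
    likelyCollectionScan: totalKeysExamined === 0 && totalDocsExamined > 0,
    // Documents examined per document returned; closer to 1 is better.
    docsExaminedPerReturned: nReturned > 0 ? totalDocsExamined / nReturned : null,
  };
}

const summary = summarizeExecutionStats({
  nReturned: 7,
  totalKeysExamined: 0,
  totalDocsExamined: 9,
});
console.log(summary);
```

For the example on this card, 9 documents were examined to return 7, and no index keys were used, so an index on score would be a natural candidate to test.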

NoSQL

Provides a mechanism for the storage and retrieval of data that is modeled by means other than the tabular relations used in relational databases.
- not relational
- runs on clusters
- often open source
- mostly used in big data and real-time web apps
Commonly used data structures:
- document
- graph
- key-value
- wide column

delete

Removes documents from a collection.
Methods:
- remove()
- deleteOne()
- deleteMany()
Syntax: db.collectionName.remove()

Geospatial Indexes

Support queries on geospatial data stored as GeoJSON objects or coordinate pairs.
Syntax: db.collectionname.createIndex({ <location field> : "2d" })
// Suppose a 2d geospatial index is defined on the key location in the collection hotel. Query to find the three hotels closest to the location 75, 245:
db.hotel.find({ 'location': { $near: [75, 245] } }).limit(3)
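
What the $near query with limit(3) returns can be imitated with a brute-force sketch over an in-memory array. The hotel documents and the helper names are invented for illustration. A real 2d index avoids computing the distance to every document.

```javascript
// Hypothetical hotels with [x, y] coordinate pairs.
const hotels = [
  { name: "A", location: [70, 240] },
  { name: "B", location: [75, 250] },
  { name: "C", location: [100, 300] },
  { name: "D", location: [76, 245] },
];

// Planar distance between two coordinate pairs.
function distance([x1, y1], [x2, y2]) {
  return Math.hypot(x1 - x2, y1 - y2);
}

// Brute-force analogue of find({ location: { $near: point } }).limit(n):
// sort by distance to the point, then keep the first n documents.
function nearest(docs, point, n) {
  return [...docs]
    .sort((a, b) => distance(a.location, point) - distance(b.location, point))
    .slice(0, n);
}

console.log(nearest(hotels, [75, 245], 3).map((h) => h.name));
```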

sort()

returns documents in ascending or descending order

Mongo client will generate _____ command to initiate a new replica set.

rs.initiate()

Which is the command to Check the Size of the Oplog?

rs.printReplicationInfo()

Which is the method to check the size of the oplog for a given replica set member?

rs.printReplicationInfo()

Which is the method used to check the current length of replication lag?

rs.printSlaveReplicationInfo()

Mongoshell command used to list all the databases in MongoDB

show dbs

show db in mongodb

show dbs

Zlib

Similar to gzip. It provides 10x better compression than snappy but costs more CPU.

Collection in MongoDB is __________ in SQL terminology.

table

stemming

A technique used in text search. Text search looks for the specified words in the string fields, and stemming reduces the words in a collection so that only root words are stored.
// Consider the following document in the users collection, which contains user_comments text and databasenames
{ "user_comments": "Topics mentioned in Frescoplay for MongoDB is very informative", "databasenames": [ "mongodb", "cassandra" ] }
// Create a text index on user_comments
db.users.createIndex({user_comments: "text"})
// Search for all the user_comments that contain the word MongoDB in their text
db.users.find({$text: {$search: "MongoDB"}})
* Compared to a normal search, text search improves search efficiency.

drop index

Removes (drops) an index from a collection. The default index on the _id field can't be removed.
Example: db.file.dropIndex({tags: 1})

create index

Use ensureIndex() [deprecated after 3.0] or createIndex().
Syntax:
db.collection.ensureIndex({KEY:1})
db.collection.createIndex({KEY:1})
Example: db.file.createIndex({tags:1})

multikey indexes

Used to make efficient queries against array fields. A multikey index can be created over arrays that hold scalar values as well as nested documents.
// Suppose we have an Employees collection that contains details of employees with multiple skills
try {
  db.Employees.insertMany([
    { "_id": "1", "Name": "Mridhula", "EmployeeCode": "EC01", "Country": "IND", "Skills": ["java", "oracle", "Informatica"] },
    { "_id": "2", "Name": "Akhila", "EmployeeCode": "EC02", "Country": "US", "Skills": ["java", "oracle", "Informatica"] },
    { "_id": "3", "Name": "Alisha", "EmployeeCode": "EC03", "Country": "UK", "Skills": ["java", "MongoDB", "Informatica"] },
    { "_id": "4", "Name": "Anwita", "EmployeeCode": "EC04", "Country": "IND", "Skills": ["java", "Cassandra", "Informatica"] },
    { "_id": "5", "Name": "Ameya", "EmployeeCode": "EC06", "Country": "US", "Skills": ["java", "oracle", "Informatica"] }
  ]);
} catch (e) {
  print(e);
}
// Create a multikey index on Skills
db.Employees.createIndex({ "Skills": 1 });
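
Conceptually, a multikey index holds one index entry per array element. The sketch below models that idea with a plain `Map` over a subset of the Employees documents. `buildMultikeyIndex` is a hypothetical helper, and a real index is a B-tree rather than a hash map.

```javascript
// A subset of the Employees documents from this card.
const employees = [
  { _id: "1", Name: "Mridhula", Skills: ["java", "oracle", "Informatica"] },
  { _id: "3", Name: "Alisha", Skills: ["java", "MongoDB", "Informatica"] },
  { _id: "4", Name: "Anwita", Skills: ["java", "Cassandra", "Informatica"] },
];

// Build one index entry per array element: value -> list of document _ids.
function buildMultikeyIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    for (const value of doc[field]) {
      if (!index.has(value)) index.set(value, []);
      index.get(value).push(doc._id);
    }
  }
  return index;
}

const skillsIndex = buildMultikeyIndex(employees, "Skills");
console.log(skillsIndex.get("MongoDB"));
```

A lookup such as "who has the MongoDB skill" then reads a single index entry instead of scanning the Skills array of every document.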

Map-Reduce Aggregation

Used when a problem is complex in nature. It splits the problem into smaller chunks and sends them to different machines, so that each chunk is processed on a separate machine. When all the machines have finished, the outputs of all the chunks are combined to generate the final solution.
Phases:
- map: documents are processed, and one or more objects are emitted for each input document
- reduce: combines the output generated by the map operation
- finalize: optional; can be used to make a final modification to the results
** Custom JavaScript functions are used to perform the map and reduce operations.
Syntax:
db.collectionname.mapReduce(
  function() { emit(key, value); },                  // mapper function
  function(key, values) { return reducerFunction },  // reducer function
  { out: collection, query: document, sort: document, limit: number }
)
Example:
db.Customers.mapReduce(
  // map
  function() { emit(this.customer_id, this.salesamount); },
  // reduce
  function(key, values) { return Array.sum(values) },
  {
    // filter applied to the input documents
    query : { status : "Active" },
    // final output collection
    out : "Customer_orderdetail"
  }
)
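
The map and reduce phases can be imitated on an in-memory array, as in the hedged sketch below. The local `mapReduce` function, the `orders` sample data, and passing `emit` as an argument are all assumptions made so the example runs outside the mongo shell. It is not the server's implementation.

```javascript
// Hypothetical sample documents.
const orders = [
  { customer_id: "C1", status: "Active", salesamount: 100 },
  { customer_id: "C1", status: "Active", salesamount: 50 },
  { customer_id: "C2", status: "Inactive", salesamount: 70 },
  { customer_id: "C3", status: "Active", salesamount: 30 },
];

// Toy single-machine map-reduce over an array of documents.
function mapReduce(docs, mapFn, reduceFn, query) {
  const emitted = new Map(); // key -> array of emitted values
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  // Map phase: run the map function once per matching input document.
  docs.filter(query).forEach((doc) => mapFn.call(doc, emit));
  // Reduce phase: combine the values emitted for each key.
  return [...emitted].map(([key, values]) => ({ _id: key, value: reduceFn(key, values) }));
}

const result = mapReduce(
  orders,
  function (emit) { emit(this.customer_id, this.salesamount); },
  (key, values) => values.reduce((sum, v) => sum + v, 0),
  (doc) => doc.status === "Active"
);
console.log(result);
```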

Which of the following data types is not used in MongoDB?

value

objectID

When a document is created, MongoDB generates an ObjectId that makes the document unique. It is a 12-byte BSON type, usually displayed as a 24-character hexadecimal string, with the following structure:
DATE    | MACD    | PID     | Counter
4 bytes | 3 bytes | 2 bytes | 3 bytes
- DATE: the creation time in seconds
- MACD: identifies the machine
- PID: the process ID
- Counter: a counter that starts from a random value
Newer MongoDB versions replace the MACD and PID fields with a single 5-byte random value.
Example:
var id = ObjectId()
db.Product.insert({_id: id, name : "Orange"});
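
Because the first 4 bytes hold the creation time in seconds, the timestamp can be recovered from the hexadecimal form of an ObjectId. The sketch below does this in plain JavaScript. `objectIdTimestampSeconds` is a hypothetical helper, and the sample value is just an illustrative 24-character hex string.

```javascript
// Extract the 4-byte timestamp (seconds since the Unix epoch) from the
// 24-character hexadecimal form of an ObjectId.
function objectIdTimestampSeconds(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected 24 hexadecimal characters");
  }
  // The first 8 hex characters are the first 4 bytes.
  return parseInt(hex.slice(0, 8), 16);
}

const sampleId = "507f1f77bcf86cd799439011"; // illustrative value
const seconds = objectIdTimestampSeconds(sampleId);
console.log(seconds, new Date(seconds * 1000).toISOString());
```

In the mongo shell, the built-in getTimestamp() method on an ObjectId returns the same information as a date.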

_______ storage engine supports document level concurrency model.

wiredTiger

In the absence of a primary, which operations can a replica set not accept?

write (not this) maybe both read and write?

