MongoDB Aggregation Pipeline

अब Quizwiz के साथ अपने होमवर्क और परीक्षाओं को एस करें!

$sum

$group accumulator and $project accumulator that returns a sum of numerical values. It ignores non-numeric values. EXAMPLE total: {$sum: "$value"}

$push

$group accumulator that returns an array of expression values for each group. Every time it encounters a value, it pushes it onto each group's array. EXAMPLE username: {$push: "$username"}

$addToSet

$group accumulator that returns an array of unique expression values for each group. EXAMPLE colors: {$addToSet: "$color"}

$avg

$group accumulator that returns an average of numerical values. It ignores non-numeric values. As a $project accumulator, this returns an average of the specified expression or list of expressions for each document. It still ignores non-numeric values EXAMPLE total: {$sum: "$value"}

$max

$group accumulator that returns the highest expression value for each group. As a $project accumulator, this returns the maximum of the specified expression or list of expressions for each document. EXAMPLE maxValue: {$max: "$value"}

$min

$group accumulator that returns the lowest expression value for each group. As a $project accumulator, this returns the minimum of the specified expression or list of expressions for each document. EXAMPLE minValue: {$min: "$value"}

The real power of the aggregation framework comes from the ___, which has its roots in the SQL GROUP BY clause.

$group stage

___ filters out documents based on some criteria and passes the matching documents to the next stage.

$match

What happens if you specify a field that is an empty array during the $unwind stage?

$unwind, by default, ignores the input document and will not output documents for that input document.

What happens if you specify a path for a field that does not exist in an input document during the $unwind stage?

$unwind, by default, ignores the input document and will not output documents for that input document.

Write a query that allows you to find movies directed by Steven Spielberg.

//using aggregation pipeline and $match... db.movies.aggregate([ { $match : {director : "Steven Spielberg" } } ]); //using find() method... db.movies.find( {director : "Steven Spielberg" } );

Write a query to find movies which either tomato meter is greater than 95 or imbd rating is greater than or equal to 9.5. (NOTE: both meter and rating are embedded documents within tomato and imbd documents, respectively)

//using aggregation pipeline with $match stage... db.movies.aggregate([ { $match: { $or: [ { "tomato.meter": { $gt: 95 } }, { "imdb.rating": { $gte: 9.5 } } ] } } ]); //using the find method... db.movies.find( { $or: [ { "tomato.meter": { $gt: 95 } }, { "imdb.rating": { $gte: 9.5 } } ] });

What are the main $group accumulators?

1. $max 2. $min 3. $avg 4. $sum 5. $push 6. $addToSet

What are the basic pipeline stages?

1. Filtering 2. Projecting 3. Grouping 4. Sorting 5. Skipping 6. Limiting

What are two key functions of aggregation operations?

1. The ability to filter out documents and modifying their format 2. The ability to group values from multiple documents and perform a variety of operations on the grouped data

What are the two major functions of the $match stage?

1. To perform aggregation on a subset of documents 2. To filter the result set after aggregation

What mapping does $project produce?

1:1 mapping

What mapping does the $sort stage product?

1:1 mapping

What mapping does $match produce?

1:1 or 1:0 mapping

What mapping does the $unwind produce?

1:M mapping

To group all input documents in to one group, we can use ___.

A null or empty string, "".

The $limit stage takes a(n) ___ that specifies the number of documents to pass along.

A positive integer

If specified, the output of the $group stage can contain ___ that hold the values of some accumulator expression grouped by the $group's ___.

Accumulator expression... _id field

In general, expression operators take ___.

An array of arguments

$max, $min, $avg and $sum accumulators are ___ in the group stage.

Arithmetic operators

Pipelines appear in an ___.

Array

$push and $addToSet group accumulators build ___.

Arrays

To perform aggregation on a subset of documents, when should you place the $match stage? Why?

As early in the pipeline as possible to minimize the amount of processing down the pipe

When grouping documents, why do we prepend the field we use for the _id with a dollar sign and put quotes around it?

Because even though is it a field within the input documents, it is now serving as the value to the _id field.

Why does the cardinality of stages range from 1-1 to 1-M and M-1?

Because pipelines do not need to produce one output document for every input document.

What is the difference between the $push and $addToSet group accumulators?

Both push values in to an array. However, $addToSet ONLY pushes UNIQUE values on to an array for each group.

How do you create a composite _id field in the $group stage?

By making the _id field a document that consists of two fields. {$group: {_id: {state: "$state", city: "$city"}}}

In the $group stage, ___ are optional and computed using the ___.

Computed fields... accumulators

How are collections sent through the pipeline?

Document by document; that is, documents are piped through processing stages and these stages transform documents into an aggregated result set.

Each stage has a ___ that contains expressions to transform the collection.

Document structure

Each aggregation stage is specified in a ___, which is a(n) ___ in the aggregate() method.

Document... element of the array

$add

Expression operator that adds an array of numbers to return the sum. EXAMPLE valuePlus5: {$add: ["$value", 5]}

$concat

Expression operator that concatenates any number of strings EXAMPLE title: {$concat: ["$title"," ","$name"]}

$size

Expression operator that counts and returns the total number of items in an array EXAMPLE numberOfColors: {$size: "$colors"}

$multiply

Expression operator that multiplies an array of numbers to return the product EXAMPLE valueTimes5: {$multiply: ["$value", 5]}

$mod

Expression operator that returns the remainder of the first number divided by the second EXAMPLE valueMod5: {$mod: ["$value", 5]}

$divide

Expression operator that returns the result of dividing the first number by the second EXAMPLE valueDividedBy5: {$divide: ["$value", 5]}

$subtract

Expression operator that returns the result of subtracting the second value from the first EXAMPLE valueMinus5: {$subtract: ["$value", 5]}

$month

Expression operator that takes the date field as the value and returns the month of the date as an integer.

The ___ are available to construct expression for use in aggregation stages.

Expression operators

What do we call operators that can be used when computing new fields but are NOT accumulators?

Expression operators

T or F: If the field does not resolve to an array and isn't empty (for example, is a string field), $unwind ignores the input.

FALSE; $unwind treats the value as a single element array.

T or F: The requested fields to be forwarded to the next stage after $project must be existing fields from the input documents.

FALSE; The requested fields can be existing fields from the input document OR newly computed fields.

T or F: You can only use existing fields in the sort stage.

FALSE; You can use existing fields AND calculated fields in the sort stage.

T or F: The project stage in the aggregation pipeline is the same as the projection document in the find() query method.

FALSE; it is similar but much more general than the projection used in the find() method.

T or F: Pipeline stages need to produce one output document for every input document

FALSE; some stages may generate new documents or filter out documents.

T or F: The _id field in the group by field is not mandatory.

FALSE; the _id field IS mandatory. However, you can specify an _id value of null to calculate accumulated values for all input documents as a whole.

T or F: The output document of the $group stage contains the identifier field and any accumulated fields.

FALSE; the output document only includes accumulated fields if specified in the $group document.

T or F: Using aggregation with the $match stage gives the same result as the find() method.

FALSE; this is only true if we are ONLY filtering a set of documents.

In the $unwind stage, each output document is the ___ with the value of the ___ field replaced by the ___.

Input document... array... element.

What are the key functions of the document taken by the $project stage?

It can specify: 1. The inclusion of fields 2. The exclusion of fields 3. The addition of new fields 4. The resetting of the values of existing fields

What does the $sum group accumulator do with non-numeric values?

It ignores them.

$match

Known as the filtering stage, this stage filters the document stream to allow only matching documents to pass unmodified into the next pipeline stage. For each input document, it outputs either one document (a match) or zero document (no match).

$group

Known as the grouping stage, it groups input documents by a specified identifier expression and applies the accumulator expression(s), if specified, to each group. It consumes all input documents and outputs one document per each distinct group. The output documents only contain the identifier field and, if specified, accumulated fields.

$unwind

Known as the normalization stage, it deconstructs an array field from the input documents to output a document for each element. Each output document replaces the array with an element value. For each input document, it outputs n documents, where n is the number of array elements and can be zero for an empty array.

$project

Known as the projection stage, it reshapes each document in the stream, such as by adding new fields or removing existing fields. For each input document, it outputs one document.

$sort

Known as the sorting stage, it reorders the document stream by a specified sort key. Only the order changes; the documents remain unmodified. For each input document, it outputs one document.

What mapping does the $group stage produce?

M:1 mapping

$group accumulators

Operators that maintain their state for the group of documents that share the same key

The $skip stage takes a ___ that specifies the number of documents to skip.

Positive integer

The ___ stage lets you reshape the documents as they come through the pipeline.

Project

Unlike the find() method, $match does not allow for ___.

Projection

How do you set the $sort order to descending (e.g. 3, 2, 1... or C, B, A...)?

Set the field sort order to -1 EXAMPLE {$sort: {city: -1}}

How do you set the $sort order to ascending (e.g. 1, 2, 3... or A, B, C...)?

Set the field sort order to 1 EXAMPLE {$sort: {total: 1}}

In the $group stage, how do you calculate accumulate values for all the input documents as one single group?

Specify an _id value of null

The aggregation pipeline consists of ___, each of which transforms documents as they pass through the pipeline.

Stages

T or F: Documents pass through the aggregation stages in sequence.

TRUE

T or F: Stages can be combined in any order and repeated as many times as necessary

TRUE

T or F: The $avg group accumulator ignores non-numeric values.

TRUE

T or F: The $limit has no effect on the content of the documents it passes along the pipeline.

TRUE

T or F: The $match query syntax is identical to the find() method query syntax

TRUE

T or F: If an expression operator accepts a single argument, you can omit an array for just one argument.

TRUE; This would take the following form: {<operator>: <argument1>}

Aggregation

The process of applying a series of operations to documents as they are being compiled into a result set

What function do existing fields serve when used with expressions operators?

They allow us to calculate new fields from the existing fields.

What is the difference between the skip and limit cursor methods in the find() query and the skip and limit stages in the aggregation pipeline?

They work pretty much the same. However, the order in which these stages are specified matters in the aggregation pipeline. In the cursor method, the order of skip and limit does not matter when used together because the documents are first sorted, then skipped and finally limited.

$skip

This is the stage that allows you to skip some documents. It skips the first n documents, where n is the specified skip number and passes the remaining documents unmodified to the pipeline.

$limit

This is the stage to limit the output. It passes the first n documents unmodified to the pipeline, where n is the specified limit.

What is the primary purpose of the $unwind stage?

To normalize documents in order to group them on things within the array.

How do you rename fields in the $project stage?

To rename, you put the name of the new field, followed by a colon and the field we want to rename, prepended with a $ and placed inside quotes. EXAMPLE - title field becomes name {$project: {name:"$title", total: {$add: ["$price", "$tax"]}}}

How do $project accumulators behave differently than accumulators in the group stage?

When used in the $project stage, accumulators operate over an array in the current document. They do not carry values over all documents. That is, accumulators have no memory between documents.

When does it make the most sense to use the $limit and $skip stages?

When you have previously used the $sort stage in the aggregation pipeline

How do you exclude fields in the $project stage?

You list the field(s) followed by a 0 in the document taken by the $project stage EXAMPLE {$project: {title: 0, year: 0, "imbd.rating": 0}}

How do you include fields in the $project stage?

You list the field(s) followed by a 1 in the document taken by the $project stage EXAMPLE {$project: {title: 1, year: 1, "imbd.rating": 1}}

How do you reshape fields in the $project stage?

You name the new field followed by a colon and a document containing the values. EXAMPLE - embedded fields rating and meter are set to strings and used to create the new rating embedded document {$project: {title:1, rating: {imdb:"$imdb.rating", tomato:"$tomato.meter"}}}

How do you add new fields in the $project stage?

You name the new field followed by a colon and a document containing the values. EXAMPLE 1- add a new field for rating {$project: {title:1, rating: {imdb:"$imdb.rating", tomato:"$tomato.meter"}}} EXAMPLE 2- add a new field for total {$project: {name:"$title", total: {$add: ["$price", "$tax"]}}}

The output documents of the $group stage contain a(n) ___, which contains the distinct group by key.

_id field

What is the syntax used for the aggregate method in MongoDB?

db.collection.aggregate([{stage1}. {stage2}, {stage3}]);

What is the syntax used for the $group stage?

{$group: {_id: <group by key>, <computedfield1>: {<accumulator1>: <expressiong1>},...}};

What is the syntax used for the $project stage?

{$project: {<specification(s)>}}

What is the syntax used in the $sort stage?

{$sort: {<field1>: <sort order>, <field2>: <sort order>...}}

What is the syntax used with expression operators?

{<operator>: [<argument1>, <argument2>...]}


संबंधित स्टडी सेट्स

Chapter 21 - Developmental Concepts

View Set

MicroEcon 247 Practice Mid-Term Exam Part 1 (all MindTap Quizlets into one)

View Set