MongoDB
The Complete Developer’s Guide

Introduction to MongoDB

What is MongoDB?
MongoDB is a database created by the company of the same name, MongoDB Inc. The name
stems from the word "humongous". This database is built to store large amounts of data while still
being able to work with that data efficiently. Ultimately, it is a database solution.
There are many other database solutions such as MySQL, PostgreSQL, Microsoft SQL Server etc.
MongoDB is most importantly a database server that allows us to run different databases on it, for
example a Shop database. Within a database we would have different collections, such as a Users
collection or an Orders collection. We can have multiple databases and multiple collections per
database.
Inside of a collection we have something called documents. Documents look like JavaScript/JSON
objects. Inside of a collection the documents are schema-less and can contain different data.
This is the flexibility that MongoDB provides us with, whereas SQL-based databases are very strict
about the data stored within their tables. Therefore, the MongoDB database can grow with the
application's needs. MongoDB is a NoSQL database.


Typically we will still need some kind of structure in a collection, because applications typically
require some type of structure to work with the data.

Diagram 1.1:
Database: Shop
  Collection: Users
    Documents: {name: 'Max', age: 28}, {name: 'Sam'}
  Collection: Orders
    Documents: {product: 'pen', price: 1.99}, {product: 't-shirt'}


JSON (BSON) Data Format:
{
  "name": "Alex",
  "age": 29,
  "address": {
    "city": "Munich"
  },
  "hobbies": [
    { "name": "Cooking" },
    { "name": "Football" }
  ]
}
The above is an example of the JSON data format. A single document is surrounded by curly
brackets. The data is structured as keys. A key consists of a name of the key and a key value. The
name of the key (which will be referred to as the key from now on) and the key value must be
wrapped in quotation marks (unless the value is of type number).
There are different types of values we can store, such as strings, numbers, booleans and arrays.
We can also nest documents within documents. This allows us to create complex relations between
data and store them within one document, which makes working with the data and fetching data
more efficient because it is contained in one document in a logical way. SQL, in contrast, requires a
more complex method of fetching data, using joins to find data in table A and data in table
B to retrieve the relevant data.
Behind the scenes on the server, MongoDB converts the JSON data to a binary version of the data
which can be stored and queried more efficiently. We do not need to concern ourselves with BSON
as we would tend to work with JSON data.
The whole theme of MongoDB is flexibility, optimisation and usability, and this is what really sets
MongoDB apart from other database solutions: it is very efficient from a performance perspective
because we can store and query data in the format we need it in, instead of running complex
restructuring on the server.

The Key MongoDB Characteristics
MongoDB is a NoSQL solution because it follows a concept/philosophy opposite to that of SQL-based
databases. Instead of normalising the data, i.e. storing it distributed across multiple tables
where every table has a clear schema and then using relations, MongoDB stores data
together in a document. It does not force a schema, hence schema-less/NoSQL.

We can have multiple documents in a single collection and they can have different structures, as we
have seen in Diagram 1.1. This is important: it can lead to messy data, but it is still our responsibility as
developers to work with clean data and to implement a solution that works. On the other hand, this
provides us with a lot of flexibility. We could use mongoDB for applications that are still evolving,
where the exact data requirements are not set yet. MongoDB allows us to get started, and we can
always add data with more information in the same collection at a later point in time.
We also work with fewer relations. There are some relations, but with these embedded (nested)
documents we have fewer collections (tables) to connect; instead we store data together.
This is where the efficiency is derived from: since data is stored together, when we fetch data
for our application we do not need to reach out to multiple tables and merge the data, because
all the data is already within the single collection. This is where the speed, performance and
flexibility come from, which is beneficial when building applications. This is the main
reason why NoSQL solutions are so popular for read and write heavy applications.


MongoDB Ecosystem
The below Diagram 1.2 is a current snapshot of the MongoDB company's ecosystem and product
offerings. The focus of this guide is on the MongoDB database, used locally on our machines and in
the cloud using Atlas. We will also dive into Compass and the Stitch world of MongoDB.

Diagram 1.2:
MongoDB Database: Self-Managed/Enterprise, Atlas (Cloud), Mobile, CloudManager/OpsManager, Compass
Stitch: Serverless Query API, Serverless Functions, Database Triggers, Realtime Sync
Also: BI Connectors, MongoDB Charts

Installing MongoDB
MongoDB runs on all Operating Systems (OS), including Windows, Mac and Linux. To install
MongoDB we can visit the webpage:
https://www.mongodb.com/
Under products select MongoDB server and download the MongoDB Community Server for our OS
platform of choice. Install the MongoDB Server by following the installation steps.
Important Note: On Windows, when installing, click on the Custom Setup Type. MongoDB will be
installed as a service, which is slightly different to how MongoDB runs on Mac & Linux.
On Mac and Linux we simply have an extracted folder which contains files. We would copy all the
contents within this folder and paste them into any place within our OS, i.e. wherever we want
to install MongoDB.
We would then want to create a folder called data and a sub-folder called db anywhere within our
OS, preferably in the root of the OS.
On Windows open up the command prompt, or on Mac/Linux open up the terminal. This is where
we are going to spend most of our time using special commands and queries. Run the following
command:
$ mongo
This should return command not found.
To fix this problem on a Mac, go to the user folder and find a file called .bash_profile (if this does
not exist we could simply create it). Edit the file using a text editor. Add the following line:
export PATH=/Users/Username/mongodb/bin:$PATH
The path should be wherever we placed the mongoDB binary files. We need to add :$PATH at
the end on Mac/Linux to make sure all our other commands continue to work on our OS. Save the
file and close it.
Important Note: if you run into a problem where you cannot edit the .bash_profile using a text
editor, use the following command to edit it within the terminal:
$ sudo nano ~/.bash_profile
This will allow you to edit the file within the terminal and enter the mongo bin file path. Press CTRL +
O to save and CTRL + X to exit the nano editor.

To fix this on a Windows OS, we need to create an environment variable. Press the Windows key and
type environment, which should suggest the Edit Environment Variables option. Under the user
variables edit Path to add the directory path to where we installed the mongoDB files:
C:\Program Files\MongoDB\Server\4.0\bin
Restart the terminal/command prompt and now run the command:
$ mongo
This should now return an error of connect failed on Mac/Linux.
On Windows it will connect because MongoDB is running as a service and has already started as a
background service (we would have checked this during the installation). If we open the
command prompt as administrator and run the command 'net stop MongoDB', this will stop the
background service running automatically and we can manually start and stop the MongoDB
service on Windows. DO NOT RUN THIS COMMAND ON MAC/LINUX.
The mongo command is the client which allows us to connect to the server and then run commands
on the databases. To start the server on Mac/Linux we would use the following command:
$ mongod

When running this command to start the server, it may fail if we chose a different folder than the
default /data/db folder. If we used a different folder that is not within the root of our OS, we would
need to start the mongod command followed by the --dbpath flag and the place where the /data/db
folder is located.
$ sudo mongod --dbpath "/data/db"
On Mac we would need to run the mongod command every time we wish to run the mongoDB
server, whereas on Windows it will run automatically, even after restarting the system.
Now that we have the mongod server running, minimise the terminal on Mac/Linux and open up a
new terminal. We cannot close the mongod server terminal because it is running the server; if it is
closed, everything will stop working and we cannot continue to work with the database server.
Pressing the CTRL + C keys within the terminal will quit the mongod process, and we would need to
re-run the mongod command again should we wish to run the server again.
Running the mongo command in the new terminal connects us to the server, and we are now in the
mongo shell, which is the environment where we can run commands against our database server.
We can create new databases, collections and documents, which we will now focus on in the
following sections.


Time to get Started
Now that we have the mongod server running and can connect to it using the mongo shell, we can
enter the following basic commands in the mongo terminal:
$ cls
Clears the terminal screen.

$ show dbs
Displays the existing databases (there are three default databases: admin, config and local, which store metadata).

$ use databaseName
Connects/switches to a database. If the database does not exist it will be created implicitly under databaseName. The database is not actually created until a collection and a document are added.

$ db.collectionName.insertOne( {"name of key": "key value"} )
Creates a new collection. db relates to the currently connected database. This will implicitly create the collection if it does not exist. We must pass at least one piece of data into the collection using the insertOne() command, passing in a JSON document. This will return an object confirming the data was inserted into the database.


Important Note: we can omit the quotes around the name of the key within the shell, but we must
keep the quotes for the key value unless the key value is of type number. This is a feature of the
mongo shell which works behind the scenes. MongoDB will also generate a unique id for new
documents inserted into the collection.
$ db.collectionName.find()
Displays the documents within the database collection.

$ db.collectionName.find().pretty()
Displays the documents within the database collection, but prettifies the data in a more human-readable format.

This is a very basic introductory look at the shell commands we can run in the mongo terminal to
create a new database, switch to a database, create collections and documents, and display all the
documents within a database collection in either the standard or pretty format.
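As a quick illustration, a minimal shell session putting these commands together might look like the below (the shop database and products collection are hypothetical names):
$ use shop
$ db.products.insertOne( {name: "A Book", price: 12.99} )
$ db.products.find().pretty()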
Tip: to run the mongod server on a port other than the default 27017, run the following command.
Note that you would then need to specify the port when running the mongo shell command as well.
You would do this if the default port is already being used by something else.
$ sudo mongod --port 27018
$ mongo --port 27018


Shell vs Drivers
The shell is a great neutral ground for working with mongoDB. Drivers are packages we install for
the different programming languages an application might be written in. There is a whole host of
drivers for the various application server languages such as PHP, Node.js, C#, Python, etc. Drivers are
the bridges between the programming language and the mongoDB server.
As it turns out, in these drivers we would use the same commands as we use in the shell; they are
just slightly adjusted to the syntax of the language we are working with.
The drivers can be found on the mongoDB website:
https://docs.mongodb.com/ecosystem/drivers/
Throughout this document we will continue to use the shell commands, as they are neutral.
We can take the knowledge of how to insert, configure inserts, query data, filter data,
sort data and much more from the shell. These commands will continue to work when we use the
drivers, but we would need to refer to the driver documentation to understand how to express the
same shell commands in the programming language's syntax. This makes us more flexible with the
language we use when building applications that use mongoDB.
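For example, a minimal sketch using the official Node.js driver might look like the below; the connection string, database and collection names are placeholders, but the insertOne command mirrors the shell exactly:

const { MongoClient } = require("mongodb");

async function run() {
  // Placeholder connection string for a locally running MongoDB server
  const client = new MongoClient("mongodb://localhost:27017");
  await client.connect();
  try {
    // The same command we would run in the shell, expressed in JavaScript
    const result = await client
      .db("shop")
      .collection("products")
      .insertOne({ name: "A Book", price: 12.99 });
    console.log(result.insertedId);
  } finally {
    await client.close();
  }
}

run();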


MongoDB & Clients: The Big Picture
Diagram 1.3:
Application: Frontend (UI) + Backend (Server), working with Data
Drivers (Node, Java, Python) / MongoDB Shell (playground, administration)
Drivers/Shell send Queries to the MongoDB Server
MongoDB Server communicates with the Storage Engine
Storage Engine handles File/Data Access

MongoDB & Clients: The Big Picture
Diagram 1.4:
MongoDB Server -> Storage Engine
Storage Engine -> Data: Read + Write Data to Files (slow)
Storage Engine -> Memory: Read + Write Data to Memory (fast)
As we can see in Diagram 1.3, the application driver/shell communicates with the mongoDB server.
The MongoDB server communicates with the storage engine. It is the Storage Engine which deals
with the data passed along by the MongoDB Server, and as Diagram 1.4 depicts it will read/write to
the files on disk and/or memory.


Understanding the Basics & CRUD Operations

Create, Read, Update & Delete (CRUD)
We could use MongoDB in a variety of contexts, such as an application, analytics/BI tools or
data administration. In the application case, we may have an app where the user interacts with our
code (the code can be written in any programming language) and the mongoDB driver is
included in the application. In the case of an analytics/BI tool we may use the BI Connector/shell
provided by mongoDB or another import mechanism provided by our BI tool. Finally, in the
database administrator case we would interact with the mongoDB shell.
In all the above cases we would want to interact with the mongoDB server. In an application we
would typically want to be able to create, read, update or delete elements, e.g. a blog post app.
With analytics, we would at least want to be able to read the data, and as admins we would
probably want to perform all the CRUD actions.
CRUD covers the only actions we would want to perform with our data, i.e. to create it, manage it or
read it. We perform all these actions via the mongoDB server.


Diagram 1.5:
CREATE: insertOne(data, options), insertMany(data, options)
READ: findOne(filter, options), find(filter, options)
UPDATE: updateOne(filter, data, options), updateMany(filter, data, options), replaceOne(filter, data, options)
DELETE: deleteOne(filter, options), deleteMany(filter, options)
The above are the four CRUD operations and the commands we can run for each action. In later
sections we will focus on each CRUD action individually to understand it in depth, along with the
syntax/commands we can use when performing CRUD operations on our mongoDB collections and
documents.


Understanding the Basics & CRUD Operations

Finding, Inserting, Updating & Deleting Elements
To show all the existing databases within the mongoDB server we use the command "show dbs",
while we use the use command followed by the database name to switch to a database. db will then
refer to the database we switched to.
To perform any CRUD operations, these commands must always be performed/executed on a
collection where you want to create/update/delete documents. Below are example snippets of
CRUD commands on a fictitious flights database (where the collection is called flightData).
$ db.flightData.insertOne( {distance: 1200} )
This will add a single document to the collection as we have seen previously.
$ db.flightData.deleteOne( {departureAirport: "LHR"} )
$ db.flightData.deleteMany( {departureAirport: "LHR"} )
The delete commands take in a filter. To add a filter we use the curly brackets, passing in the
name of the key and the key value of the data we wish to filter and delete. In the above example we
used the departureAirport key and the value of LHR. The deleteOne command will find the first
document in our database collection that meets the criteria and delete it. The command will return:
{ "acknowledged" : true, "deletedCount" : 1 }
If a document was deleted in the collection this will show the number of deleted documents (the
deleteOne command will always return 1). If no documents matched the filter and none were
deleted, the returned deletedCount value will be 0.
The deleteMany command in contrast will delete many documents at once, namely all documents
that match the filter criteria specified.
Note: The easiest way to delete all data in a collection is to delete the collection itself.
$ db.flightData.updateOne( {distance: 1200}, { $set: {marker: "delete"} } )
$ db.flightData.updateMany( {distance: 1200}, { $set: {marker: "delete"} } )
The update commands take in 3 arguments/parameters. The first is the filter, which works like the
filter for the delete commands. The second describes how we want to update/change the data. We
must use the { $set: { } } keyword (anything with a $ dollar sign in front of it is a reserved word in
mongoDB), which lets mongoDB know how we are describing the changes we want to make to a
document. If the key:value in the update does not exist, this will create a new key:value property
within the document; otherwise it will update the existing key:value with the new value passed in.
The third parameter is options, which we will analyse in greater detail in later sections.
Important Note: when passing in a filter we can also pass in empty curly brackets { }, which will
select all documents within the collection.
If updating many documents is successful, this will return an acknowledgement in the terminal as
seen below, showing the number of documents that matched the filter criteria and the number of
documents modified:
{ "acknowledged" : true, "matchedCount" : 2, "modifiedCount" : 2 }
If we were to delete all the documents within a collection and then use the command to find data in
that collection, i.e. using the db.flightData.find().pretty() command, the terminal will return empty/
nothing as there are no existing documents to read/display.
The above demonstrates how we can find, insert, update and delete elements using the update and
delete command.

Now we have seen how we can use insertOne() to add a single document into our collection.
However, what if we want to add more than one document? We would use the insertMany()
command instead.
db.flightData.insertMany( [
  {
    "departureAirport": "LHT",
    "arrivalAirport": "TXL"
  },
  {
    "departureAirport": "MUC",
    "arrivalAirport": "SFO"
  }
] )
We pass in an array of objects in order to add multiple documents into our database collection. The
square brackets are used to declare an array. The curly brackets declare an object and we must use
commas to separate each object. If successful, this will return acknowledged of true and the
insertedIds of each object/document added into the collection.

Important Note: mongoDB by default will create a unique id for each new document, assigned to a
key called "_id" with a randomly generated value. When inserting an object we could assign our own
unique id using the _id key followed by a unique value. If we insert an object and pass in our own
_id key and the value is not unique, this will return a duplicate key error in the terminal. We must
always use a unique id for our documents, and if we do not specify a value for _id then mongoDB
will generate one for us automatically.
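For example (hypothetical values), we could supply our own _id when inserting; running the same command a second time would then fail with a duplicate key error because that _id already exists:
$ db.flightData.insertOne( {_id: "txl-lhr-1", departureAirport: "TXL", arrivalAirport: "LHR"} )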

Understanding the Basics & CRUD Operations

Diving Deeper Into Finding Data
So far we have seen the .find() function used without passing any arguments for finding data
within a collection. This will retrieve all the data within the collection. Just as we would use a filter to
specify particular records or documents when deleting or updating a collection, we can also filter
when finding data.
We can pass a document into the find function which will be treated as a filter, as seen in the
example below. This allows us to retrieve a subset of the data rather than all the data within an
application.
db.flightData.find( {intercontinental : true } ).pretty()


We can also use logical queries to retrieve more than one document within a collection that
matches the criteria, as demonstrated in the example below. We query using another object and
then one of the special operators in mongoDB.
db.flightData.find( {distance: {$gt: 1000 } } ).pretty()
In the above we are using the $gt: operator, which is used for finding documents "greater than" the
value specified. If we were to use the findOne() function, this will return the first document within the
collection that matches the criteria.
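As a further sketch (reusing the flightData fields from the earlier examples), query operators can also be combined in a single filter; for instance, $gte and $lt together select a range of distances:
db.flightData.find( {distance: {$gte: 500, $lt: 1500} } ).pretty()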

Understanding the Basics & CRUD Operations

Update vs UpdateMany
Previously we have seen the updateOne() and updateMany() functions. However, we can also use
another update function called update(), as seen in the example below:
db.flightData.update( { _id: ObjectId("abc123") }, { $set: { delayed: true } } )
The update() function works like the updateMany() function in that all documents matching the
filter are updated. The first difference between update() and updateMany() is that the $set:
operator is not required for the update() function, whereas omitting it will cause an error for both the
updateOne() and updateMany() functions. So we can write the above syntax like so and would not
get an error:
db.flightData.update( { _id: ObjectId("abc123") }, { delayed: true } )
The second and main difference is that, used this way, the update() function takes the new object
and replaces the existing document with it (this does not affect the unique id). It will only patch the
document with the update object instead of replacing the whole existing object (just like the
updateOne() and updateMany() functions) if we use the $set: operator; otherwise it overrides the
existing document.
This is something to be aware of when using the update() function. If we intend to replace the whole
existing document with a new object then we can omit the $set: operator. In general it is
recommended to use updateOne() and updateMany() to avoid this issue.
If, however, we want to replace a document, we should use the replaceOne() function. Again, we
pass our filter and the object we want to replace the document with. This is a more explicit and safer
way of replacing data in a collection.
db.flightData.replaceOne( { _id: ObjectId("abc123") }, { departureAirport: "LHT", distance: 950 } )


Understanding the Basics & CRUD Operations

Understanding Find() & The Cursor Object
If we have a passengers collection which stores the name and age of passengers, and we want to
retrieve all the documents within the passengers collection, we can use the find() function as we have
seen previously.
db.passengers.find().pretty()
Useful Tip: when writing commands in the shell we can use the tab key to autocomplete. For
example, if we wrote db.passe and pressed tab on our keyboard, this should auto-complete to db.passengers.
We will notice that where a collection has a lot of data, the find() function will not display all of the
data in the shell. If we scroll down to the last record we should see Type "it" for more within the
shell. If we type the command it and press enter, this will display more data from
the returned find() function. The find() command in general returns what is called a Cursor
Object and not all of the data.
The find() function does not give us an array of all the documents within a collection. This makes
sense as the collection could be really large; if find() were to return the whole array, imagine a
collection with 2 million documents: this could take a really long time and also send a lot of data
over the connection.
The Cursor Object is an object with a lot of metadata behind it that allows us to cycle through the
results, which is what the "it" command did. It used the Cursor Object to fetch the next group (cycle)
of data from the collection.
We can use other methods on the find() function, such as toArray(), which will exhaust the cursor, i.e.
go through all of the cursor's results and fetch back all the documents in an array (i.e. not stopping
after the first 20 documents, which is the default behaviour in the mongoDB shell).
db.passengers.find().toArray()
There is also a forEach method that can be used on the find() function. The forEach allows us to
write some code to do something with every element that is in the database. The syntax can be found
within the driver documentation for whichever language we are using for our application, e.g. PHP or
JavaScript etc. Below is a JavaScript function which the shell can also use:
db.passengers.find().forEach( (document) => { printjson(document) } )

The forEach function in JavaScript gets the document object passed automatically into the arrow
function, and we can call it whatever we want, i.e. passengersData, data, x, etc. In the above we
called it document. We can then use this object and do whatever we want with it, i.e. we used the
printjson() command to print/output the document data as JSON. The above will also return all the
documents within the collection because forEach loops over every document the cursor yields.
To conclude, the find() function does not provide us with all the documents in a collection, even
though it may look like it in some circumstances where there is very little data within a collection.
Instead it returns a Cursor Object which we can cycle through to return more documents from the
collection. It is up to us as developers to use the cursor to either force it to get all the documents
from a collection and place them in an array, or, better, to use forEach or other methods to retrieve
more than 20 documents (the default number of items returned in the shell) from the collection.
Note the forEach approach is more efficient because it fetches/returns objects on demand through
each iteration rather than fetching all the data in advance and loading it into memory, which saves
both bandwidth and memory.
The Cursor Object is also the reason why we cannot use the .pretty() command on the findOne()
function because the findOne returns one document and not a Cursor Object. For Insert, Update
and Delete commands the Cursor Object does not exist because these methods do not fetch data,
they simply manipulate the data instead.

Understanding the Basics & CRUD Operations

Understanding Projections
In Database:
{
  "_id": "…",
  "name": "John",
  "age": 35,
  "Job": "Teacher"
}

In Application (after Projection):
{
  "name": "John",
  "age": 35
}
Imagine in our database we have the data for a person record, and within our application we do
not need all the data from the document but only the name and age to display on our web
application. We could fetch all the data and filter/manipulate the data within our application in any
programming language. However, this approach still has an impact on the bandwidth by
fetching unnecessary data, something we want to prevent. It is better to filter the data out on the
mongoDB server, and this is exactly what projection allows us to do.
Below are examples of using projections to filter the necessary data to retrieve from our find query.

db.passengers.find( {}, {name: 1} ).pretty()
We need to pass in a first argument to filter the find search (note: an empty object will retrieve all
documents). The second argument allows us to project. A projection is set up by passing another
document specifying which key:value pairs we want to retrieve back. The 1 means include this field
in the data returned to us.
The above will return all the passenger documents but only the name and _id, omitting the age from
the returned search results. The _id is a special field in our data and by default it is always included.
To exclude the _id from the returned results, we must explicitly exclude it. To exclude something
explicitly we specify the name of the key and set the value to zero, as seen below:
db.passengers.find( {}, {name: 1, _id:0} ).pretty()
Note: we could do the same for age (e.g. age: 0); however, this is not required because by default any
field not explicitly included with a 1 is excluded anyway; only the _id is included unless explicitly
excluded.
The data transformation/filtering occurs on the mongoDB server before the data is shipped to
us, which is what we want because we do not want to retrieve unnecessary data
that would impact the bandwidth.


Understanding the Basics & CRUD Operations

Embedded Documents & Arrays
Embedded documents are a core feature of mongoDB. Embedded documents allow us to nest
documents within each other and have one overarching document in the collection.
There are two hard limits to nesting/embedded documents:
1. We can have up to 100 levels of nesting (a hard limit) in mongoDB.
2. The overall document size has to be below 16mb.
The size limit for documents may seem small, but since we are only storing text and not files (we
would use file storage for files), 16mb is more than enough.
Along with embedded documents, we can also store arrays. Arrays are not strictly linked to
embedded documents: we can have arrays of embedded documents, but arrays can hold any data.
This means we can have lists of data in a document.
Below are examples of embedded documents and arrays.


db.flightData.updateMany( {}, {$set: {status: {description: "on-time", lastUpdated: "1 hour ago"} } } )
In the above example we have added a new document property called status, which has an
embedded/nested document containing description and lastUpdated. If we output the document
using the .find() function, the document would now look something like the below:
{
  "_id": ...,
  "departureAirport": "LHT",
  "arrivalAirport": "TXL",
  "status": {
    "description": "on-time",
    "lastUpdated": "1 hour ago"
  }
}
Note: we could add more nested child documents i.e. description could have a child nested
document called details and that child could have further nested child documents and so on.


db.passengers.updateOne( {name: "Albert Twostone"}, {$set: {hobbies: ["Cooking", "Reading"] } } )
Arrays are marked with square brackets. Inside the array we can have any data: this could be
multiple documents (i.e. using the curly brackets {}), numbers, strings, booleans etc.
If we were to output the document using the .find() function, the document would look something
like the below:
{
  "_id": ...,
  "name": "Albert Twostone",
  "age": 63,
  "hobbies": [
    "Cooking",
    "Reading"
  ]
}
Albert Twostone will be the only person with hobbies and this will be a list of data. It is important to
note that hobbies is not a nested/embedded document but simply a list of data.


Understanding the Basics & CRUD Operations

Accessing Structured Data
To access structured data within a document we could use the following syntax:
db.passengers.findOne( {name: "Albert Twostone"} ).hobbies
We can access structured data within a document by using the find query and then the name of the
key we wish to access from the document. In the above we wanted to access the hobbies data,
which will return the hobbies array as the output:
["Cooking", "Reading"]
We can also search for all documents that have a hobby of Cooking using the syntax below, as we
have seen previously. This will return the whole document entry where someone has Cooking as a
hobby. MongoDB is clever enough to look inside arrays to find documents that match the criteria.
db.passengers.find( {hobbies: "Cooking"} ).pretty()
Below is an example of searching for objects (this includes searching within nested documents):
db.flightData.find( {"status.description": "on-time"} ).pretty()


We use the dot notation to drill into our embedded documents to query our data. It is important
that we wrap the dot notation path in quotation marks (e.g. "status.description"), otherwise the find()
function would fail.
This returns all documents (the whole document) where the drilled-into criteria matches. This
allows us to query by nested/embedded documents. We can drill as far as we need to using the dot
notation, as seen in the example below:
db.flightData.find( {"status.details.responsible": "John Doe"} ).pretty()
This dot notation is a very important syntax to understand as we would use this a lot to query our
data within our mongoDB database.

Understanding the Basics & CRUD Operations

Conclusion
We have now covered all the basic and core features of mongoDB to understand how mongoDB
works and how we can work with it i.e. store, update, delete and read data within the database as
well as how we can structure our data.


Understanding the Basics & CRUD Operations

Resetting The Database
To purge all the data within our mongoDB database server we would use the following command:
use databaseName
db.dropDatabase()
We must first switch to the database using the use command followed by the database name. Once
we have switched to the desired database we can reference the current database using db and then
call on the dropDatabase() command which will purge the specified database and its data.
Similarly, we could get rid of a single collection in a database using the following command:
db.myCollection.drop()
The myCollection should relate to the collection name.
These commands will allow us to clean our database server by removing the database/collections
that we do not want to keep on our mongoDB server.


Schemas & Relations: How to Structure Documents

Why Do We Use Schemas?
There is one important question to ask: wasn't mongoDB all about having no data schemas, i.e.
schema-less? To answer this question: mongoDB enforces no schemas. Documents do not have to
use the same schema inside of one collection. Our documents can look like whatever we want them
to look like, and we can have totally different documents in one and the same collection, i.e. we can
mix different schemas.
Schemas are the structure of one document, i.e. what it looks like, which fields it has and
what types of value these fields have. MongoDB does not enforce schemas; however, that does
not mean that we cannot use some kind of schema, and in reality we would indeed have some form
of schema for our documents. It is in our interest, if we are building a backend database, to have
some form of structure to the types of documents we are storing. This makes it easier for
us to query our database, get the relevant data and then cycle through this data using a
programming language to display the relevant data within our application.
We are most likely to have some form of schemas because we as developers would want it and our
applications will need it. Whilst we are not forced to have a schema we would probably end up with
some kind of schema structure and this is important to understand.

Schemas & Relations: How to Structure Documents

Structuring Documents
Chaos (very different documents):
Products
{ "title": "Book", "price": 12.99 }
{ "name": "Bottle", "available": true }

Middle ground (structured, but extra data allowed):
Products
{ "title": "Book", "price": 12.99 }
{ "title": "Bottle", "price": 5.99, "available": true }

SQL World (full equality):
Products
{ "title": "Book", "price": 12.99 }
{ "title": "Bottle", "price": 5.99 }

We can use any of the structured approaches in the diagram above depending on what we require
in our applications. In reality we would tend to use the approach in the middle or on the right.

The middle approach uses the best of both worlds: there is some structure to the data,
but it also has the flexibility advantage that mongoDB provides us, so that we can store extra
information.
Note: we can assign the null value to properties in order to have a structured approach even where
the data does not have an actual value for the property. A null value is considered a
valid value, and therefore we can use a SQL (structured) type approach with all our documents.
There is no single best practice for how to set the structure of our data within our documents, and
it is up to us as developers to use the structure that works best for our applications or
whichever is our personal preference.

Schemas & Relations: How to Structure Documents

Data Types
Now that we understand that we are free to define our own schemas/structure for our documents,
we are now going to analyse the different data types we can use in mongoDB.
Data Types are the types of data we can save in the fields within our documents. The table below
breaks down the different data types for us:

Type: Example Value
String: "John Doe"
Boolean: true
NumberInt (int32): 55, 100, 145
NumberLong (int64): 10000000000
NumberDecimal: 12.99
ObjectId: ObjectId("123abc")
ISODate: ISODate("2019-02-09")
Timestamp: Timestamp(11421532)
Embedded Documents: {"a": {...}}
Arrays: {"b": [...]}

Notice how the text type requires quotation marks (single or double) around the value. There is no
limitation on the size of the text; the only limitation is the 16mb limit for the whole document. The
longer the text, the more space it takes.
Notice how numbers and booleans do not require quotation marks around the value.
There are different types of numbers in mongoDB. Integers (int32) are 32-bit numbers, and if we
try to store a larger number it would overflow that range and we would end up with a
different number. For larger integer numbers we would use NumberLong (int64). The integer
type we decide to choose will dictate how much space is allocated and used by the
data. Finally, we can also store NumberDecimal, i.e. numbers with decimal values (a.k.a. floats in other
programming languages).
The default within the shell is to store numbers as 64-bit floating point values (doubles), but we also
have a special NumberDecimal type provided by mongoDB to store high precision floating point
values. Normal floating point values (a.k.a. doubles) are rounded and are not exact in their decimal
places. However, for many use cases the floating point (double) provides enough precision, e.g. a
shop application. If we are performing scientific calculations or something else that requires high
precision, we are able to use the special type that offers very high decimal precision (34 decimal
digits).
The ObjectId is a special value that is automatically generated by mongoDB to provide a unique id;
it also has a temporal component built in, so sorting by ObjectId respects the insertion timestamp.
The above table provides all the data types within mongoDB that we can use to store data within
our database server.


Schemas & Relations: How to Structure Documents

Data Types & Limits
MongoDB has a couple of hard limits. The most important limitation: a single document in a
collection (including all embedded documents it might have) must be less than or equal to 16mb.
Additionally we may only have 100 levels of embedded documents.
We can read more on all the limitation (in great detail) on the below link:
https://docs.mongodb.com/manual/reference/limits/
For all the data types that mongoDB supports, we can find a detailed overview on the following link:
https://docs.mongodb.com/manual/reference/bson-types/
Important data type limits are:
Normal Integers (int32) can hold a maximum value of +-2,147,483,647
Long Integers (int64) can hold a maximum value of +-9,223,372,036,854,775,807
Text can be as long as we want — the limit is the 16mb restriction for the overall document.


It's also important to understand the difference between int32 (NumberInt), int64 (NumberLong)
and a normal number as you can enter it in the shell. The same goes for a normal double and
NumberDecimal.
NumberInt creates an int32 value => NumberInt(55)
NumberLong creates an int64 value => NumberLong(7489729384792)
If we just use a number within the shell, for example insertOne( {a: 1} ), this will get added as a
normal double into the database. The reason for this is that the shell is based on JavaScript,
which only knows float/double values and does not differentiate between integers and floats.
NumberDecimal creates a high precision double value => NumberDecimal("12.99")
This can be helpful for cases where we need (many) exact decimal places for calculations.
When working with mongoDB drivers for our application’s programming language (e.g. PHP, .NET,
Node.js, Python, etc.), we can use the driver to create these specific numbers. We should always
browse the API documents for the driver we are using within our applications to identify the
methods for building int32, int64 etc.
Finally we can use the db.stats() command in the mongoDB shell to see stats of our database.
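As a quick sketch (the numbers collection name is hypothetical), we can insert the different number types in the shell and then filter by BSON type using the $type operator:
$ db.numbers.insertOne( {a: 1, b: NumberInt(55), c: NumberLong("7489729384792"), d: NumberDecimal("12.99")} )
$ db.numbers.find( {b: {$type: "int"} } )
$ db.numbers.find( {a: {$type: "double"} } )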

Schemas & Relations: How to Structure Documents

How to Derive Our Data Structure Requirements
Below are some guidelines to keep in mind when we think about how to structure our data:
What data does our App need to generate? What is the business model?
User Information, Products Information, Orders etc. This will help define the fields we would need
(and how they relate).
Where do I need my data?
For example, if building a website do we need the data on the welcome page, products list page,
orders page etc. This helps define our required collections and field groupings.
Which kind of data or information do we want to display?
For example the welcome page displays product names. This will help define which queries we
need i.e. do we need a list of products or a single product.
These queries we plan to have also have an impact on our collections and document structure.
MongoDB embraces the idea of planning our data structure based on the way we retrieve the data,
so that we do not have to perform complex joins but instead retrieve the data in the format, or almost
the format, we need it in within our application.
How often do we fetch the data?
Do we fetch data on every page reload, every second or not that often? This will help define
whether we should optimise for easy fetching of data.
How often do we write or change the data?
Whether we change or write data often or rarely will help define whether we should optimise for easy
writing of data.
The above are things to keep in mind or to think about when structuring our data structures and
schemas.

Schemas & Relations: How to Structure Documents

Understanding Relations
Typically we would have multiple collections, for example a users collection, a products collection
and an orders collection. If we have multiple collections where the documents are related, we
obviously have to think about how we store the related data.

Do we use embedded documents because this is one way of reflecting a relation or alternatively, do
we use references within our documents?

Nested/Embedded Documents

Customers Collection:
{
  "userName": "John",
  "age": 28,
  "address": {
    "street": "First Street",
    "City": "Chicago"
  }
}

References

Customers Collection:
{
  "userName": "Alan",
  "favBooks": ["id1", "id2"]
}

Books Collection:
{
  "_id": "id1",
  "name": "Lord of the Rings"
}

In the reference example above, we would have to run two queries to join the data from the
different collections. However, if a book were to change, we would only update it in the books
collection, as the id would remain the same, whereas with an embedded document relation we
would have to update every customer record affected by the change.
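As a sketch of the two queries needed with the reference approach above (collection and field names taken from the example), we would first fetch the customer and then look up the referenced books with the $in operator:
$ var customer = db.customers.findOne( {userName: "Alan"} )
$ db.books.find( {_id: {$in: customer.favBooks} } ).pretty()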


Schemas & Relations: How to Structure Documents

One to One Embedded Relation Example
Example:
One patient has one disease summary, a disease summary belongs to one patient.

Patient A <-> Summary A
Patient B <-> Summary B
Patient C <-> Summary C

Code snippet:
$ use hospital
$ db.patients.insertOne( { name: "John Doe", age: "25", diseaseSummary: { diseases: ["cold", "sickness"] } } )
Where there is a strong one to one relation between two pieces of data, it is ideal to use the one to
one embedded approach as demonstrated in the above example.
The advantage of the embedded/nested approach is that within our application we only require a
single find query to fetch the patient and disease summary data from our database
collection.
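For example (reusing the patients collection above), one query retrieves the patient together with the embedded disease summary, and we can drill into the nested data directly:
$ db.patients.findOne( {name: "John Doe"} ).diseaseSummary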

Schemas & Relations: How to Structure Documents

One to One Reference Relation Example
Example:
One person has one car, a car belongs to one person.

Person A <-> Car 1
Person B <-> Car 2
Person C <-> Car 3

Code snippet:

$ use carData
$ db.persons.insertOne( { name: "John", age: 30, salary: 30000 } )
$ db.cars.insertOne( { model: "BMW", price: 25000, owner: ObjectId("5b98d4654d01c") } )
In most one to one relationships we would generally use the embedded document relations.
However, we can opt to use a reference relation approach as we are not forced to use one
approach.
For example, say we have a more analytics-oriented use case rather than a web application, and we
are interested in analysing the person data and/or the car data but not so much the relation between
them. In this example we have an application-driven reason for splitting the data.
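A sketch of how the two documents above would be joined from the application side (field names taken from the example):
$ var person = db.persons.findOne( {name: "John"} )
$ db.cars.findOne( {owner: person._id} )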

Schemas & Relations: How to Structure Documents

One to Many Embedded Relation Example
Example:
One question thread has many answers, one answer belongs to one question thread.

Question Thread A -> Answer 1, Answer 2
Question Thread B -> Answer 1
Code snippet:
$ use support
$ db.questionThreads.insertOne( { creator: "John", question: "How old are you?", answers: [ { text: "I am 30." }, { text: "Same here." } ] } )
A scenario where we may use an embedded one to many relation would be posts and comments. This
is because we would often need to fetch the question along with the answers from an application
perspective. Also, there are usually not too many answers to worry about the 16mb document limit.
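A small sketch reusing the questionThreads collection above: the thread and its answers come back in a single query, and a projection can narrow the result down to just the answers if that is all the page needs:
$ db.questionThreads.findOne( {creator: "John"} )
$ db.questionThreads.findOne( {creator: "John"}, {answers: 1, _id: 0} )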

Schemas & Relations: How to Structure Documents

One to Many Reference Relation Example
Example:
One city has many citizens, one citizen belongs to one city.

City A -> Citizen 1, Citizen 2
City B -> Citizen 1
Code snippet:
$ use cityData
$ db.cities.insertOne( { name: "New York City", coordinates: { lat: 2121, lng: 5233 } } )
$ db.citizens.insertMany( [ { name: "John Doe", cityId: ObjectId("5b98d6b44d") }, { name: "Bella Lorenz", cityId: ObjectId("5b98d6b44d") } ] )


In the above scenario we may have a database containing a collection of all major cities in the world
and a list of every single person living within each city. It might seem to make sense to have a one to
many embedded relationship; however, from an application perspective we may wish to
retrieve the city data only. Furthermore, a city like New York may have data for over 1 million people,
and this would make fetching the data slow due to the volume of data passing over the wire.
We may also end up running into the document size limit of 16mb. In this type of scenario,
it makes sense to split the data up and use the reference relation to link the data.
In the above we only store the city metadata and do not store a list of citizen references, as this
would also end up being a huge list of citizen unique ids. Instead, we create a citizens collection
and within each citizen document we store a reference to the city. The reference can be
anything but must be unique, i.e. we could use the ObjectId() or the city name etc.
This ensures that we do not exceed the limitation of 16mb per document, as well as not
retrieving unnecessary data if we are only interested in returning just the city metadata from a
collection.
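A sketch of the queries this structure supports (names taken from the example above): we can fetch just the city metadata, or fetch the citizens of one city without touching the cities collection:
$ var city = db.cities.findOne( {name: "New York City"} )
$ db.citizens.find( {cityId: city._id} ).pretty()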


Schemas & Relations: How to Structure Documents

Many to Many Embedded Relation Example
Example:
One customer has many products (via orders), a product belongs to many customers.

Customer A -> Product 1, Product 2
Customer B -> Product 3
Code snippet:
$ use shop
$ db.products.insertOne( { title: "A Book", price: 12.99 } )
$ db.customers.insertOne( { name: "Cathy", age: 18, orders: [ { title: "A Book", price: 12.99, quantity: 2 } ] } )
We would normally model many to many relationships using references. However, it is possible to
use the embedded approach as seen above. We could keep a products collection as metadata and
have the application retrieve that data to populate the embedded documents in the customers
collection using a programming language.
A disadvantage of the embedded approach is data duplication: we have the title and price of the
product within the orders array, the customer can order the product multiple times, and other
customers order it too, which will cause a lot of duplication.
If we decide to change the data for the product, not only do we need to change it within the
products collection but we also have to change it in all the orders affected by this change (or do we
actually need to change old orders?). If we do not care about the product title and price changing,
i.e. we have an application that takes a snapshot of the data, we may not worry too much about
duplicating that data, because we might not need to change it in all the places where we have
duplicated it if the original data changes; this highly depends on the application we build. Therefore
an embedded approach may work.
In other scenarios, where we absolutely need the latest data everywhere, a reference approach
may be most appropriate for a many to many relationship. It is important to think about how we
would fetch our data, how often we want to change it, and whether we need to change it
everywhere or whether duplicate data is fine, before deciding which approach to adopt for many to many.

Schemas & Relations: How to Structure Documents

Many to Many Reference Relation Example
Example:
One book has many authors, an author belongs to many books.

Book A -> Author 1, Author 2
Book B -> Author 3
Code snippet:
$ use bookRegistry
$ db.books.insertOne( { name: "favourite book", authors: [ ObjectId("5b98d9e4"), ObjectId("5b98d9a7") ] } )
$ db.authors.insertMany( [ { name: "Martin", age: 42 }, { name: "Robert", age: 56 } ] )
The above is an example of a many to many relation where a reference approach may be suitable
for a scenario where the data that changes needs to be reflected everywhere else.

Schemas & Relations: How to Structure Documents

Summarising Relations
We have now explored the different relation options that are available to us. This should provide
us with enough knowledge to think about relations and when to use the most appropriate approach
depending on:
the application needs
how often data changes
if snapshot data suffices
how large is the data (how much data do we have).
Nested/Embedded Documents — group data together logically. This makes it easier when fetching
the data. This approach is great for data that belongs together and does not overlap with other
data. We should always avoid super-deep nesting (100+ levels) or extremely long arrays (16mb size
limit per document).
References — split data across collections. This approach is great for related data but also shared
data as well as for data which is used in relations and standalone. This allows us to overcome
nesting and size limits (by creating new documents).

Schemas & Relations: How to Structure Documents

Using $lookup for Merging Reference Relations
MongoDB has a useful operation called $lookup that allows us to merge related documents that are
split up using the reference approach.
The example below shows a scenario of a reference approach where the customers and books have
been split into two collections. The $lookup operator is then used as seen in the command that
follows. This uses the aggregate method, which we have not covered yet.

customers collection:
{
  userName: "John",
  favBooks: ["id1", "id2"]
}

books collection:
{
  _id: "id1",
  name: "Harry Potter"
}
$ db.customers.aggregate( [
{ $lookup: { from: "books", localField: "favBooks", foreignField: "_id", as: "favBookData" } }
] )
The $lookup operator allows us to fetch two related documents merged together in one document
within one step (rather than having to perform two steps). This mitigates some of the disadvantages
of splitting our documents across multiple collections because we can merge them in one go.

This uses the aggregate framework (which we will dive into in later chapters), and within the
aggregate we pass in an array, because we can define multiple steps describing how to aggregate
the data. For now we are only interested in one step (a step is a document we pass into the array),
where we pass the $lookup step. The lookup takes a document as its value, in which we define 4
attributes:
from — the other collection we want to relate documents from, i.e. we pass in the name of the
collection where the other documents live that we wish to merge.
localField — in the collection we are running the aggregate function on, the field where the
reference to the other (from) collection can be found, i.e. the key that stores the reference.
foreignField — the field we are relating to in the target (i.e. the from) collection.
as — an alias for the merged data. This becomes the new key under which the merged data
will sit (see the sketch below).
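As a sketch of the result, based on the customers/books example above, each customer document would come back with a new favBookData key holding the matched book documents (the second book is elided here):
{
  "userName": "John",
  "favBooks": ["id1", "id2"],
  "favBookData": [
    { "_id": "id1", "name": "Harry Potter" },
    …
  ]
}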
This is not an excuse to always use a reference relation approach, because this costs more
performance than having an embedded document.
If we have references or want to use references, we have the lookup step in the aggregate
method that we can use to help get the data we need. This is a first look at aggregate, and we will
explore what else aggregate can do for us in later chapters.


Schemas & Relations: How to Structure Documents

Understanding Schema Validation
MongoDB is very flexible i.e. we can have totally different schemas and documents in one and the
same collection and that flexibility is a huge benefit. However, there are times where we would want
to lock down this flexibility and require a strict schema.
Schema validation allows mongoDB to validate incoming data against a schema that we
have defined: it will either accept the incoming data and perform the write or update to the database,
or it will reject the incoming data, in which case the database is not changed by the new data and
the user gets an error.

validationLevel (which documents get validated?):
strict: all inserts & updates
moderate: all inserts & updates to correct (already valid) documents

validationAction (what happens if validation fails?):
error: throw an error and deny the insert/update
warn: log a warning but proceed


Schemas & Relations: How to Structure Documents

Adding Collection Document Validation
The easiest method to add schema validation in mongoDB is to add the validation when we explicitly
create a new collection for the very first time (rather than creating it implicitly by adding data). We
can use createCollection to create and configure a new collection:
$ db.createCollection("posts", { validator: { $jsonSchema: {
  bsonType: "object",
  required: ["title", "text", "creator", "comments"],
  properties: {
    title: { bsonType: "string", description: "must be a string and is required" },
    text: { bsonType: "string", description: "must be a string and is required" },
    creator: { bsonType: "objectId", description: "must be an objectId and is required" },
    comments: { bsonType: "array", description: "must be an array and is required",
      items: { bsonType: "object", required: ["text"], properties: {
        text: { bsonType: "string", description: "must be a string and is required" },
        author: { bsonType: "objectId", description: "must be an objectId and is required" }
      } } }
  }
} } } )
The first argument to the createCollection method is the name of the collection we are creating. The
second argument is a document in which we configure the new collection. The validator is the
important piece of the configuration here.


The validator key takes another sub-document in which we define the schema against which
incoming inserts and updates have to be validated. We do this by adding a $jsonSchema
key with another nested sub-document which holds the schema.
We can add a bsonType with the value of object, so that everything that gets added to the collection
must be a valid document or object. We can set a required key which has an array value. In this
array we define the names of fields in the documents of the collection that are
absolutely required; if we try to add data that does not have these fields, we will get an error or
warning depending on our settings.
We can add a properties key, which is another nested document, where we can define what every
property of every document that gets added to the collection should look like. In the example above
we defined the title property, which is a required property, in more detail. We can set the bsonType,
which is the data type, i.e. string, number, boolean, object, array etc. We can also set a description
for the data property.
Because an array has multiple items, we can add an items key and describe what the items
should look like. We can nest this, and it can have its own required and properties keys for
the item objects that exist within the array.

60

So the keys to remember are:
The bsonType key is the data type.
The required key is an array of required properties that must be present in an insert/update document.
The properties key describes the individual properties. Each property can have sub key:values such as bsonType and description.
The items key describes the array items. This can have sub key:values of all of the above.
Important Note: long commands like this may be difficult to read in the terminal, so it may be easier
to write them in a text editor first and then paste them into the terminal to execute. We can call the
file validation.js to save the collection validation configuration. Visual Studio Code/Atom/Sublime or
any other text editor/IDE will help with auto-formatting. Visual Studio Code has an option under
Code > Preferences > Keyboard Shortcuts where you can search for a command such as Format
Document (the shortcut is Shift + Option + F on a Mac).
We can now validate the incoming data when we explicitly create the new collection. We can copy
the command from the text editor, paste it back into the shell and run it to create the new collection
with all our validation setup. This will return { "ok" : 1 } in the shell if the new collection is
successfully created.
If a new insert/update document fails the validation rules, the new document will not be added to
the collection.
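For illustration (the exact error output can vary between mongoDB versions), an insert that omits the required fields would be rejected roughly like this:
$ db.posts.insertOne( { title: "My first post" } )
This fails with a "Document failed validation" write error because the required text, creator and comments fields are missing.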
61

Schemas & Relations: How to Structure Documents

Changing the Validation Action
As a database administrator we can run the following command:
$ db.runCommand( { collMod: "posts", validator: {…}, validationAction: "warn" } )
This allows us to run administrative commands in the shell. We pass a document with information
about the command we wish to run. For example, in the above we run a command called collMod
which stands for collection modifier, whereby we pass in the collection name and then we can pass
in the validator along with the whole schema.
We can amend the validator as we like, i.e. add or remove validations. In the above we added
another option after the validator document, as a sibling, called validationAction.
The validationLevel controls whether all inserts and updates are checked or only updates to
elements which were valid before. The validationAction on the other hand will either throw an
"error" and stop the insert/update action, or "warn" about the error but allow the insert/update to
occur. The warn action writes a warning into our log file, which is stored on our system. We can
update the validation action later using the runCommand() method as seen above.
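As a sketch of what the full command could look like for the posts collection created earlier (the schema is abbreviated here, and it only assumes we now want warnings instead of errors):
$ db.runCommand( { collMod: "posts", validator: { $jsonSchema: { bsonType: "object", required: ["title", "text", "creator", "comments"], properties: { title: { bsonType: "string" }, text: { bsonType: "string" }, creator: { bsonType: "objectId" }, comments: { bsonType: "array" } } } }, validationAction: "warn" } )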

62

Schemas & Relations: How to Structure Documents

Conclusion
Things to consider when modelling and structuring our Data.
In which format will we fetch our data?
How does the application or data scientists need the data? We want to store the data in a
way that it is easy to fetch especially in a use case where we would fetch a lot.
How often will we fetch and change the data?
Do we need to optimise for writes or reads? It is often for reads but it may be different
depending on the scenario. If we write a lot then we want to avoid duplicates. If we read a lot
then maybe some duplicates are OK, provided these duplicates do not change often.
How much data will we save (and how big is it)?
If the data is huge, maybe embedding is not the best choice.
How is the data related (one to one, one to many, many to many)?
Will duplicate data hurt us (=> many updates)?
Do we update our data a lot, in which case we would have to update a lot of duplicates? Or do
we have snapshot data where we do not care whether duplicates reflect the most recent data?
Will we hit the MongoDB data/storage limits (embedding up to 100 levels deep and 16MB per
document)?
63

Modelling Schemas

• Schemas should be modelled based on application needs.
• Important factors are: read and write frequencies, relations, amount (and size) of data.

Schema Validation

• We can define rules to validate inserts and updates before writing to the database.
• Choose the validation level and action based on the application requirements.

Modelling Relations
• Two options: embedded documents or references.
• Use embedded documents if we have one-to-one or one-to-many relationships and

there are no app or data size reasons to split the data.
• Use references if the data amount/size or app needs require it, or for many-to-many
relations.
• Exceptions are always possible — keep the app requirements in mind!
Useful Articles & Documents:
https://docs.mongodb.com/manual/reference/limits/
https://docs.mongodb.com/manual/reference/bson-types/
https://docs.mongodb.com/manual/core/schema-validation/
64

Exploring The Shell & The Server

Setting dbpath & logpath
In the terminal we can run the following command to see all the available options for our mongoDB
server:
$ mongod --help
This command will provide a list of all the available options we can use to setup/configure our
mongoDB server. For example the --quiet option allows us to change the way things get logged or
output by the server.
Note: use the official document on the MongoDB website for more detailed explanation of all the
available options.
The --dbpath arg and --logpath arg options allow us to configure where the data and log files get
stored, because mongoDB writes our data to real files on our system. The logs allow us to see, for
example, warnings from JSON schema validation as we saw in the last section.
We can create folders such as db and logs (these can be named as anything we want) and have
65

these folders located anywhere we want for example we could create it within the mongoDB
directory which contains the bin folder and other related files.
If we start the mongod instance without any additional settings, it will use the default data/db folder
in the root of our system to store all our database records. However, we can use the
settings above to tell mongod to use another directory to store our data, and the same is true for
our logs.
When we start the instance of our mongoDB server, we can run the following command and
passing in the options to declare the path of the dbpath and logpath as seen below:
Mac/Linux:
$ sudo mongod --dbpath /Users/userName/mongoDB/db
Windows command:
$ mongod --dbpath \Users\userName\mongoDB\db
Enter our password and this should bring up our mongoDB server as we have seen previously. We
should now see in the db folder, mongoDB has created a bunch of files as it is now saving the data
66

in the specified folder that we passed into our command. This is now using a totally different
database storage for writing all our data which is detached from the previous database storage of
the default database path. Running the following command will also work for our logs:
Mac/Linux:
$ sudo mongod --dbpath /Users/userName/mongoDB/db --logpath /Users/userName/mongoDB/
logs/logs.log
Windows command:
$ mongod --dbpath /Users/userName/mongoDB/db --logpath
\Users\userName\mongoDB\logs\logs.log
The logs folder path requires a log file which we would define with a .log extension. This will
automatically create and add a logs.log file within the directory path if the file does not exist when
we run the command. All the output in the terminal will now be logged in the logs.log file
compared to previously where it was logged in the terminal shell. This file can be reviewed for
persistent and auditing of our server and viewing any warnings/errors.
This is how we set custom paths for our database and log files.
67

Exploring The Shell & The Server

Exploring the mongoDB Options
If we explore the different options in mongoDB using the mongod --help command in the terminal,
there are many setup options available to us.
The WiredTiger options are related to our storage engine and we could either use the default
settings or change some configurations if we know what we are doing.
We have useful commands such as --repair, which we could run if we have any issues connecting or
any warnings or issues related to our database files being corrupted. We could use the option
--directoryperdb, which will store each database in its own separate directory.
We could change the storage engine using the --storageEngine arg option, which by default is
set to WiredTiger. Theoretically, mongoDB supports a variety of storage engines, but WiredTiger is
the default high-performance storage engine. Unless we know what we are doing and have a
strong reason to change the engine, we should stick to the default.
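For example (the paths are illustrative), starting the server with one directory per database could look like this:
$ sudo mongod --dbpath /Users/userName/mongoDB/db --directoryperdb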
There are other settings regarding security which we will touch on in later chapters.
68

Exploring The Shell & The Server

MongoDB as a Background Service
In the mongoDB options, there is an option called --fork which can only run on Mac and Linux.
$ mongod --fork --logpath /Users/userName/mongoDB/logs/logs.log
The above fork command will error if we do not pass in a logpath to the log file. This command will
start the mongoDB server as a child process. This does not block the terminal and we can continue
to type in other commands in the same terminal with the server running. The server is now running
as a background process instead of a foreground process which usually blocks the terminal
window. In other words the mongoDB server is now running as a service (a service in the
background). Therefore, in the same terminal we could run the mongo command to connect to the
background mongoDB server service. This is also the reason why we are required to pass in a
logpath: because the service runs in the background it cannot log error/warning messages to the
terminal, so instead it writes the warnings and errors to the log file.
On Windows, the fork option is unavailable. However, on Windows we can still start up the
mongoDB server as a service if we selected this option during the installation process. If we right
click on command prompt and run as administrator, we can run the following command:

69

$ net start MongoDB
This will start up the mongoDB server as a background service. The question then becomes, how do
we stop such a service?
On Mac we can stop the service by connecting to the server with the mongo shell and then
switching to the admin database and running the shutdown server command to shut down the
server we are connected to. Example commands below:
$ use admin
$ db.shutdownServer()
The exact same approach as the above will work on Windows. On Windows we also have an
alternative method by opening the command prompt as administrator and running the following
command:
$ net stop MongoDB
This is how we can use MongoDB server as a background service (instead of a foreground service)
on either Mac, Linux or Windows.

70

Exploring The Shell & The Server

Using a Config File
Now that we have seen the various options we can set and use to run our mongoDB server, it is also
worth noting that we can save our settings in a configuration file.
https://docs.mongodb.com/manual/reference/configuration-options/
This file may already have been created for us when mongoDB was installed, or we can create the
config file ourselves and save it anywhere we want. We could create the config file
within the Users/userName/MongoDB/bin folder using a text editor such as VS Code to add the
configuration code:
storage:
  dbPath: "/Users/userName/mongoDB/db"
systemLog:
  destination: file
  path: "/Users/userName/mongoDB/logs/logs.log"
We can look at the documentation or search online for a more comprehensive config file setup.
71

Once we have the config file setup, how do we use the config file when we run an instance of the
mongoDB server? MongoDB does not automatically pickup this file when we start to run the
mongoDB server, instead when starting mongoDB we can use the following command to specify
the config file the server should use:
$ sudo mongod --config /Users/userName/mongoDB/bin/mongod.cfg
$ sudo mongod -f /Users/userName/mongoDB/bin/mongod.cfg
Either of the above commands will tell mongoDB to use the config file from the specified path. This
will start the mongoDB server with the settings defined in the configuration file. This is a useful
feature because it allows us to save a snapshot of our settings (a reusable blueprint) in a separate
file which we can always use when starting up our mongoDB server. This also saves us from typing
a very long command with all our settings each time we start the mongoDB server.
Important Note: we could use either .cfg or .conf as the file extension name when creating the
mongoDB configuration file.

72

Exploring The Shell & The Server

Shell Options & Help
In this section we will go over the various shell options available for us to use. Similar to the
mongoDB server, there is a help option for the mongoDB shell:
$ mongo --help
This will provide all the command options for the shell. It has fewer options compared to the server
because the shell is just a connecting client at the end of the day and not a server. We can use the
shell without connecting to a database (if we just want to run JavaScript code) using the --nodb
option, or we could use the --quiet option to have less output information in the terminal, we can
define the port and host for the server using the --port arg and --host arg options (by default
it uses localhost:27017), and there are many more options.
We can also add authentication options, which we will learn more about in later chapters.
In the shell we also have another command we can run:
$ help

73

This command will output a shortlist of some important help information/commands we can
execute in the shell. We can also dive deeper into the help by running the help command followed
by the command we want further help on, for example:
$ help admin
This will show further useful commands that we can execute when using the admin command e.g.
admin.hostname() or admin.pwd() etc.
We can also have help displayed for a given database or collection in a database. For example:
$ use test
$ db.help()
We would now see all the commands that we did not see before that we can use on the new "test"
database. We can also get help at the collection level, which will provide a list of all the commands
we can execute at the collection level.
$ db.testCollection.help()
Useful Links:
https://docs.mongodb.com/manual/reference/configuration-options/
https://docs.mongodb.com/manual/reference/program/mongo/
https://docs.mongodb.com/manual/reference/program/mongod/
74

Using the MongoDB Compass to Explore Data Visually

Exploring MongoDB Compass
We can download MongoDB Compass from the below link:
https://www.mongodb.com/products/compass
This is a GUI tool to interact with our MongoDB database. Once downloaded and installed on our
machines we are ready to use the GUI tool. It is important to have the mongod server running in the
background when we open the MongoDB Compass to connect to the database. We would connect
to a Host and this by default will have localhost and port 27017. We can click connect and this will
connect the GUI tool to the mongod server. We should be able to see the 3 default databases of
admin, config and local.
We can now use the GUI tool to create a new database and collection name. Once a database and
collection has been created we can then insert documents to the collection. We can also query our
database documents.
We can now start using a GUI tool to interact with our database, collections and documents.
Note: it is best practice to learn how to use the shell first before using GUI tools.
75

Diving Into Create Operation

Understanding insert() Methods
We already understand that there are two methods for inserting documents into mongoDB, which
are insertOne() and insertMany(). The most important thing to note is that insertOne() takes in a
single document and we can, but do not need to, specify an id because we will get one
automatically. The insertMany() method does the same but with an array (list) of documents.
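As a quick illustrative sketch (the persons collection and values are only examples), the two methods are used like this:
$ db.persons.insertOne( { name: "Max", age: 29 } )
$ db.persons.insertMany( [ { name: "Anna", age: 31 }, { name: "Chris", age: 25 } ] )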
There is also a third alternative method for inserting documents called insert() — below is an
example:
$ db.collectionName.insert()
This command is more flexible because it takes both a single document or an array of documents.
Insert was used in the past, but insertOne and insertMany were introduced on purpose so that we
are clearer about what we are inserting. Previously, in application code it was difficult to tell with
the insert command whether the application was inserting a single document or multiple documents,
and it was therefore more error prone.
There is also an importing data command as seen below:
$ mongoimport -d cars -c carsList --drop --jsonArray
76

The insert method can still be used in mongoDB but it is not recommended. The insert() method
works with both a single document and multiple documents as seen in the examples below:
$ db.persons.insert( { name: “Annie”, age: 20 } )
$ db.persons.insert( [ { name: “Barbara”, age: 45 }, { name: “Carl”, age: 65 } ] )
The output message in the terminal is also slightly different i.e. we would receive a text of:
$ WriteResult( { “nInserted” : 1 } )
$ BulkWriteResult( { “writeErrors”: [], “writeConcernErrors”: [], “nInserted”: 2, “nUpserted”: 0,
“nMatched”: 0, “nModified”: 0, “nRemoved”: 0, “upserted”: [] } )
The above does not mean that the inserted documents did not get an autogenerated id. The insert
method will automatically create an ObjectId but will not display the ObjectId, unlike the insertOne
and insertMany commands whose output messages do display the ObjectId. We can see the
advantage of insertOne and insertMany, as the output message is a little more meaningful/helpful
because we can immediately work with the document using the ObjectId provided (i.e. we do not
need to query the database to get the new document id).

77

Diving Into Create Operation

Working With Ordered Inserts
When inserting documents we can define or specify some additional information. Let's look at an
example of a hobbies collection where we keep track of all the hobbies people could possibly have
when we insert many hobbies. Each hobby is a document with the name of the hobby:
$ db.hobbies.insertMany( [ { _id: "sports", "name": "Sports" }, { _id: "cooking", "name": "Cooking" },
{ _id: "cars", "name": "Cars" } ] )
The id’s for these hobbies can be auto-generated. However, there may be times when we want to
use our own id because the data may have been fetched from some other database where we
already have an existing id associated or maybe we need a shorter id. We can use _id and assign a
value for the id. In the above the hobby name could act as a good id because each hobby will be
unique. We must use _id and not just id if we want to set our own id for our documents.
Furthermore, the id must be unique else this would not work. We will no longer see an ObjectId()
for these documents as we have used the _id as the unique identifier for the documents inserted.
If we try to insert a document with the same id we would receive an error message in the terminal
referencing the index number (mongoDB uses zero indexing) of the document that failed the insert
operation along with a description of duplicate key error.
78

$ db.hobbies.insertMany( [ { _id: "yoga", "name": "Yoga" }, { _id: "cooking", "name": "Cooking" }, { _id:
"hiking", "name": "Hiking" } ] )
The above would fail due to the duplicate key error of cooking, which was inserted in the previous
command. However, we would notice that the first item in the insertMany array, i.e. Yoga, will be
inserted into the hobbies collection, but the cooking and hiking documents will not be inserted
into the collection due to the error. This is the default behaviour of mongoDB and is called an
ordered insert.
An ordered insert simply means that every element we insert is processed standalone, but if one
fails, it cancels the entire insert operation without rolling back the elements it has already inserted.
This is important to note because it cancels the operation and does not continue to the next
document (i.e. hiking), which we know would have been inserted successfully.
Often we would want this default behaviour, however, sometimes we do not. In these cases, we
could override the behaviour. We would pass in a second argument, separated by a comma, to the
insertMany command which is a document. This is a document that configures the insertMany
operation.
$ db.hobbies.insertMany( [ { _id: "yoga", "name": "Yoga" }, { _id: "cooking", "name": "Cooking" }, { _id:
"hiking", "name": "Hiking" } ], { ordered: false } )
79

The ordered option allows us to specify whether mongoDB should perform an ordered insert which
is the default (we could set this ordered option to true which is redundant because this is the default
option) or we could set this option to false which will make the insert operation not an ordered
insert i.e. an unordered insert.
If we hit enter, we would still get a list of all the errors; however, the operation will continue to the
next documents and insert those that do not have any duplicate key issues, i.e. hiking will now be
inserted into the hobbies collection (yoga and cooking would fail due to the duplicate key issue).
By setting the ordered to false, we have changed the default behaviour and it is up to us to decide
what we require or want in our application. It is important to note that this will not rollback the entire
insert operation if something failed. This is something we will cover in the Transactions chapter. We
can control whether the operation continues with the other documents and tries to insert everything
that is perfectly fine.
We may use an unordered insert where we do not have much control over what is inserted into the
database and do not care about documents that fail because they already exist in the database; we
could simply add everything that is not yet in the database.

80

Diving Into Create Operation

Understanding the writeConcern
There is a second option we can specify on insertOne and insertMany which is the writeConcern
option. We have a client (either the shell or the application using a mongoDB server) and we have
our mongoDB server. If we wanted to insert one document in our mongoDB server, on the
mongoDB server we have a so called storage engine which is responsible for really writing our data
onto the disk and also for managing it in memory. So our write might first end up in memory and
there it manages the data which it needs to access with high frequency because memory is faster
than working with the disk. The write is also scheduled to then end up on the disk, so it will
eventually store data on the disk. This is true for all write operations i.e. insertMany and update.
We can configure a so-called writeConcern on all the write operations with an additional argument,
the writeConcern which is another document where we can set settings.

Diagram: a client (e.g. the shell) sends a write operation such as insertOne() with a writeConcern of
{ w: 1, j: undefined } to the MongoDB server (mongod). On the server, the storage engine manages
the write in memory, can record it in the journal, and eventually writes the data to disk.

The w option (default w: 1) tells the mongoDB server by how many instances we want the write to be
acknowledged. The j option stands for journal, which is an additional file that the storage engine
manages, a bit like a to-do file. The journal can be kept to, for example, perform save
operations that the storage engine still needs to do but has not completed yet.
The storage engine is aware of the write and that it needs to store the data on disk just by having
the write being acknowledged and being in memory. The idea behind a journal file is to make the
storage engine aware of this and if the server should go down for some reason or anything else
should happen, the journal file is there. If we restart the server or if the server recovers, the server
can look to this file and see what it needs to do. This is a nice backup because the memory might
have been wiped by then. The journal acts as a backup to-do list for the storage engine.
Writing into the database files is more performance heavy whereas a journal is like a single line
which describes the write operations. Writing into the database is of course a more complex task
because we need to find the correct position to insert the data and if we have indexes we also need
to update these as well and therefore takes longer to perform. Writing in a to-do type list is much
quicker.
We can set j: true as an option, which will only report success for a write operation once it has
been acknowledged and saved to the journal. This provides greater security.
82

There is a third option to the writeConcern, which is the wtimeout option. This simply sets the
timeframe that we give our server to report a success for the write before we cancel it. For example,
if we have some issues with the server connection or anything of that nature, we may simply
time out.
If we set the timeout value to a very low number, we may get more fails even though there is no
actual problem, just some small latency.

{ w: 1, j: undefined }

{ w: 1, j: true }

{ w: 1, wtimeout: 200, j: true }

This is the writeConcern option we can add to our write operations and how we can control it using
the above document settings. Enabling the journal means that our writes will take longer, because
we do not only tell the server about the write operation but also have to wait for the server to store
the write operation in the journal; however, we get higher security that the write also succeeded.
These options will again depend on our application needs.

83

Diving Into Create Operation

The writeConcern in Practice
Below is an example of using the writeConcern:
$ db.persons.insertOne( { name: “Alan”, age: 44 }, { writeConcern: { w: 1, j: true } } )
The w: 1 (default) simply means to make sure the server acknowledged the write operation. Note we
could set this value to 0, which will return { "acknowledged" : false } in the terminal when we insert
the document. This option sends the request and immediately returns without waiting for a response
from the server. The storage engine has had no chance to store the write in memory or to generate
an ObjectId, hence why we receive { "acknowledged" : false } in the terminal. This makes the
write super fast because we do not have to wait for any response, but we do not know whether the
write succeeded or not.
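As an illustration (assuming a persons collection as used elsewhere in this chapter), a fire-and-forget insert could look like this:
$ db.persons.insertOne( { name: "Chris", age: 32 }, { writeConcern: { w: 0 } } )
This returns { "acknowledged" : false } almost immediately.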
The journal by default is set to undefined or false. We can set this option to j: true. The output in the
terminal does not change. The write will be slightly slower (note if playing around locally we would
not notice any change in speed) because the engine would add the write to the journal and we
would have to wait for that to finish before the operation is completed. This will provide a higher
security by ensuring the write appears in the to-do list of the storage engine which will eventually
84

lead to the write operation occurring on the database.
Finally, the wtimeout option is used to set a timeout for the write operation. This allows us to set a
time frame for the write so that, in periods where we have shaky connections, we would rather have
the write operation fail quickly, recognise it in our client application (we would have access to the
error) and try again at a later time, instead of waiting unnecessarily for the write operation over an
unreliable connection.
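A hedged sketch combining all three options (the 200 millisecond value is only an example):
$ db.persons.insertOne( { name: "Maria", age: 27 }, { writeConcern: { w: 1, j: true, wtimeout: 200 } } )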

Diving Into Create Operation

What is Atomicity?
Atomicity is a very important concept to any write operation. Most of the time the write operation
e.g. insertOne() would succeed; however, it can fail (there can be an error). These are errors that
occur whilst the document is being inserted/written to memory (i.e. whilst being handled by the
storage engine). For example, if we were writing a document for a person including a name, age and
an array of hobbies, and the name and age were written but then the server had issues and was not
able to write the hobbies to memory, MongoDB protects us against this as it guarantees us an atomic
transaction. This means the transaction either succeeds as a whole or it fails as a whole. If it fails
during the write, everything is rolled back for the document we inserted.

85

This is important as it is on a per document level. The document means the top level document, so
it includes all embedded documents and all arrays.
If we use insertMany(), where there are multiple documents being inserted into the database, and the
server fails during a write, we do not get atomicity across the whole operation because it only works
at the document level. If we have multiple documents in one operation like insertMany(), then each
document on its own is guaranteed to either fail or succeed, but the insertMany operation as a whole
is not. Therefore, if we have issues during the insertMany operation, only the documents that failed
are not inserted; the exact behaviour will depend on whether we used ordered or unordered inserts,
but the documents already inserted will not be rolled back.
We are able to control this on a bulk insert or bulk update level using a concept called transactions
which we will look at in a later section as it requires some additional knowledge about mongoDB
and how the service works.

Diagram: an operation (e.g. insertOne()) either succeeds and is saved as a whole, or errors and is
rolled back (i.e. nothing is saved). MongoDB CRUD operations are atomic on the document level
(including embedded documents).

86

Diving Into Create Operation

Importing Data
To import data into our database, we must first exit the shell by pressing the control + c keys on our
keyboard.
In the normal terminal, we need to navigate to a folder that contains the JSON file that we would
want to import (JSON files can be imported) using the cd command. We can use the ls command to
view the list of items within the directory we are currently in.
Once navigated to the folder containing the import file, we can run the following command:
$ mongoimport tv-shows.json -d moviesData -c movies --jsonArray --drop
The mongoimport command should be globally available since we added the path to our mongo
binary to our path variables on our operating systems. If we did not do this, we need to navigate
into the folder where our mongoDB binaries are in order to execute the mongoimport command
above.

87

The first argument we pass is the name of the file we want to import (if we are not in the path of the
located file we would have to specify the full folder path along with the file name). We then specify
the database we want to import the data into using the -d flag. We can also specify the collection by
using the -c flag.
If the JSON file holds an array of documents, we must also specify the --jsonArray flag to make the
mongoimport command aware of this fact about the import data.
The last option we can add to the import command is the --drop flag, which tells mongoimport that
if this collection already exists it should be dropped and then re-added; otherwise the data will be
appended to the existing collection.
Important Note: the mongod server should be running in the background when we use the import
command. When we press enter to execute the command, the terminal will respond with
connected to: localhost, dropping: moviesData.movies and imported: # documents to inform us
which mongoDB server it is connected to, whether a collection was dropped/deleted from the
database and the total number of documents imported into the database collection.

88

Diving Into Read Operation

Methods, Filters & Operators
In the shell, we access the database with the db command (this will differ slightly in a mongoDB
driver). We would get access to a database and then to a collection in the database. Now we can
execute a method like find, insert, update or delete on the collection. We would pass some data
into the method as parameters/arguments. These are usually key:value pairs where
one is the field and the other is the value for that field name (documents are all about fields and
values, or keys and values).
db.myCollection.find( { age: 32 } )
Here db accesses the current database, myCollection accesses this collection, find( ) is the method
we apply, and { age: 32 } is an equality/single value filter in which age is the field and 32 is the value.

The argument in the above example happens to also be a filter because the find method accepts a
filter. It can use a filter to narrow down the set of documents it returns to us. In the above we have an
equality or single value filter where the data exactly matches the criteria, i.e. equality.

89

We can also use more complex filters as seen in the below example. We have a document which
has a field whose value is another document, which in turn has an operator as a field followed by a
value.
db.myCollection.find( { age: { $gt: 30 } } )
Here the filter is a range filter: age is the field, $gt is the operator and 30 is the value.

We can recognise operators by the dollar sign $ at the beginning of the operator. These are all
reserved fields which are understood by mongoDB. The operator in the above example creates a
range filter because it does not just filter for equality; instead it will look for all documents that
have an age greater than ($gt) the value, i.e. 30.
This is how the Read operation works, and we will look at various different operators, the different
ways of using them and the different ways of filtering the data that is returned to us. This is the
structure we should familiarise ourselves with for all of our Read operations.

90

Diving Into Read Operation

Operators and Overview
There are different operators that we can differentiate into two groups:
Query Selectors
Projection Operators
Query selectors such as $gt allows us to narrow down the set of documents we retrieve while
projection operators allows us to transform/change the data we get back to some extent. Both the
Query and Projection operators are Read related operators.
Aggregation allows us to read from a database but also perform more complex transformations. This
concept allows us to set up a pipeline of stages to funnel our data through, and we have a few
operators that allow us to shape the data we get back into the form we need in our application.
Pipeline Stages
Pipeline Operators
Update has operators for fields and arrays. Inserts have no operators, and deletes use the
same operators as the Read operations.
91

How do operators impact our data?

Type | Purpose | Change Data? | Example
Query Operator | Locate data | No | $eq, $gt
Projection Operator | Modify data presentation | No | $
Update Operator | Modify & add additional data | Yes | $inc

Diving Into Read Operation

Query Selectors & Projection Operators
There are a couple of categories for Query Selectors:
Comparison, Logical, Element, Evaluation, Array, Comments & Geospatial.
For Projection Operators we have:
$, $elemMatch, $meta & $slice
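As a small illustrative sketch (using the movies collection from this chapter), a projection that limits which fields come back and slices an array could look like this:
$ db.movies.find( { genres: "Drama" }, { name: 1, genres: { $slice: 2 }, _id: 0 } )
This returns only the name and the first two genres of each matching document.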

92

Diving Into Read Operation

Understanding findOne() and find()
The findOne() method finds exactly one document. We are able to pass in a filter into the method to
define which one document to return back. This will find the first matching document.
$ db.movies.findOne( )
$ db.movies.findOne( { } )
Both of the above syntax will return the first document within the database collection. Note, this
does not return a cursor as it only returns one document.
The alternative to findOne() is the find() method. The find() method will return back a cursor. This
method theoretically returns all matching documents, but since it provides us with a cursor, it does
not give us all the documents at once but only the first 20 documents within the shell.
$ db.movies.find( )
To narrow the find search we would need to provide a filter. To provide a filter we would pass in a
document as the first argument (this is true for both find and findOne methods). The difference
would be that findOne will return the first document that meets the criteria while the find method
93

will return all documents that meet the criteria.
$ db.movies.findOne( { name: “The Last Ship” } )
$ db.movies.findOne( { runtime: 60 } )
$ db.movies.find( { name: “The Last Ship” } )
$ db.movies.find( { runtime: 60 } )
To filter the data, we would specify the name of the field/key followed by the value we are expecting
to filter the field by. In the above example we are filtering the name of the movie to be “The Last
Ship”. By default mongoDB will try to find the filter by equality.
This is the difference between find and findOne and how we would pass in a filter to narrow down
the returned read results. It is important to note that there are way more operators and ways of
filtering our queries to narrow down our Read results when using either of the find commands.

Diving Into Read Operation

Working with Comparison Operators
In the official documentation we can view all the various operations available to us:
https://docs.mongodb.com/manual/reference/operator/query/
94

We will explore some of the comparative operators in the below examples:
$ db.movies.find( { runtime: 60 } ).pretty( )
$ db.movies.find( { runtime: { $eq: 60 } } ).pretty( )
The $eq operator is the exact same as the default equality query which will find the document that
matches equally to the query value which in the above case is runtime = 60.
$ db.movies.find( { runtime: { $ne: 60 } } ).pretty( )
This will return all documents that have a runtime not equal to 60.
$ db.movies.find( { runtime: { $gt: 40 } } ).pretty( )
$ db.movies.find( { runtime: { $gte: 40 } } ).pretty( )
The $gt operator returns all documents that have a runtime greater than 40, while $gte returns those
greater than or equal to 40.
$ db.movies.find( { runtime: { $lt: 40 } } ).pretty( )
$ db.movies.find( { runtime: { $lte: 40 } } ).pretty( )
The $lt operator returns all documents that have a runtime less than 40, while $lte returns those less
than or equal to 40.
95

Diving Into Read Operation

Querying Embedded Fields & Arrays
We are not limited to querying top level fields and are also able to query embedded fields and
arrays. To query embedded fields and arrays is quite simple as demonstrated below:
$ db.movies.find( { “rating.average”: { $gt: 7 } } ).pretty( )
We specify the path to the field that we are interested in querying. We must put the path within
quotation marks because we use the dot notation (which would otherwise invalidate the syntax) to
describe each embedded field within the path that leads to the field we are interested in. The above
example is a single-level embedded document; if we wrote e.g. rating.total.average, this would be a
two-level embedded document. We can make the path as deep as we need it to be and we are not
limited to one level.
We can also query arrays as seen below:
$ db.movies.find( { genres: “Drama” } ).pretty( )
$ db.movies.find( { genres: [“Drama”] } ).pretty( )
The casing is important in the query. This will return all documents that have Drama included in the
genres array. Equality on an array does not mean that Drama is the only item within the array; it
means that Drama exists within the array. If we wanted the array to contain exactly Drama and
nothing else, we would use the square bracket syntax. We can also use dot notation to go down
embedded paths that contain an array.
96

Diving Into Read Operation

Understanding $in and $nin
If there are two discrete values that we wish to query for, for example a runtime that is either 30 or
42, we can use the $in operator. The $in operator takes in an array which holds all the values that
will be accepted as values for the key/field.
The below example returns documents that have a runtime equal to 30 or 42.
$ db.movies.find( { runtime: { $in: [ 30, 42 ] } } ).pretty( )
The $nin on the other hand is the opposite to $in operator. It finds everything where the value is not
within the set of values defined in the square brackets. The below example returns all entries where
the runtime is not equal to 30 or 42.
$ db.movies.find( { runtime: { $nin: [ 30, 42 ] } } ).pretty( )
We have now explored all the Comparison operators within mongoDB and will continue by looking
at logical query operators such as $and, $not, $nor and $or in the next section.

97

Diving Into Read Operation

Understanding $or and $nor
There are four different logical operators and these are $and, $not, $nor and $or operators. We
would probably use the $or logical operator more compared to the other logical operators. Below
is an example of the $or and $nor operator in action.
$ db.movies.find( { $or: [ {“rating.average”: { $lt: 5 } }, {“rating.average”: { $gt: 9.3 } } ] } ).pretty( )
$ db.movies.find( { $or: [ {“rating.average”: { $lt: 5 } }, {“rating.average”: { $gt: 9.3 } } ] } ).count( )
We start the filter with the $or to tell mongoDB that we have multiple conditions and then add an
array which will hold all the conditions that mongoDB will check. The or logical condition means
that it will return results that match any of these conditions. We specify our filters as we normally
would within our find, but now held within the $or array. We can have as many expressions
as we want within the $or array; in the above we have two conditions. Note: if we change pretty( ) for
count( ), this will return the total number of documents that meet the criteria rather than the
document itself.
$ db.movies.find( { $nor: [ {“rating.average”: { $lt: 5 } }, {“rating.average”: { $gt: 9.3 } } ] } ).pretty( )
The $nor operator is the opposite/inverse of the $or operator. It returns documents where none of
the conditions are met, i.e. where neither condition is true, the complete opposite.
98

Diving Into Read Operation

Understanding $and Operator
The syntax for the $and operator is similar to the $or and $nor operator. The array of documents
acts as the logical conditions and will return all documents where all conditions are met. We can
have as many conditions as we want. Below is an example of the $and logical operator:
$ db.movies.find( { $and: [ { “rating.average”: {$gt: 9} }, { genres: “Drama” } ] } ).pretty( )
In this example, we are trying to find all documents that are Drama with a high rating that is greater
than 9. This is the old syntax and there is now a shorter syntax as seen below:
$ db.movies.find( { "rating.average": { $gt: 9 }, genres: "Drama" } ).pretty( )
The new shorter syntax does not require the $and operator, instead we use a single document and
write our conditions separating each condition with a comma. By default, mongoDB ands all key
fields that we add to the filtered document. The $and is the default concatenation for mongoDB.
The reason why we have the $and operator is because not all drivers accept the above syntax.
Furthermore, the above shorthand syntax would return a different result when we filter on the same
key elements.
99

If we examine the two syntaxes below, we would notice that they both return a different result.
$ db.movies.find( { $and: [ { genres: "Drama" }, { genres: "Horror" } ] } ).count( )
$ db.movies.find( { genres: "Drama", genres: "Horror" } ).count( )
When we use the second syntax, the second value for the same key overrides the first, and it is
therefore the same as filtering for the Horror genre only.
$ db.movies.find( { genres: "Drama", genres: "Horror" } ).count( )
$ db.movies.find( { genres: "Horror" } ).count( )
Therefore, in the scenario where we are looking for both the Drama and Horror values in the
genres key, it is recommended to use the $and operator so that mongoDB checks both conditions.

100

Diving Into Read Operation

Understanding $not Operator
The $not operator inverts the effect of a query operator. For example, we can query to find movies
that do not have a runtime of 60 minutes as seen in the below syntax.
$ db.movies.find( { runtime: { $not: { $eq: 60 } } } ).count( )
The $not operator is less likely to be used, as the same result can often be achieved using much
simpler alternatives, for example the not equal operator or the $nor operator:
$ db.movies.find( { runtime: { $ne: 60 } } ).count( )
$ db.movies.find( { $nor: [ { runtime: { $eq: 60 } } ] } ).count( )
There are a lot of ways of querying the inverse; however, where we cannot simply invert the
query in another way, we have $not which we can use to look for the opposite.
We have now examined all four of the logical operators available within mongoDB that we can use
as filters for our Read operations.

101

Diving Into Read Operation

Element Operators
There are two types of element operators which are $exists and $type. This allows us to query by
elements within our database collection.
$ db.users.find( { age: { $exists: true } } ).pretty( )
This will check within our database and return all results where the document contains an age
element/field. Alternatively, we could have queried $exists to be false in order to retrieve all
documents that do not have age as an element/field.
We can query the $exists operator with other operators. In the below example we are filtering by
the age element to exist and age is greater than or equal to 30:
$ db.users.find( { age: { $exists: true, $gte: 30 } } ).pretty( )
To search for a field that exists but also has a value in the field, we would query as seen below:
$ db.users.find( { age: { $exists: true, $ne: null } } ).pretty( )
The $type operator on the other hand, as the name suggests, returns the documents that have
102

the specified field element of the specified data type.
$ db.users.find( { phone: {$type: “number”} } ).pretty( )
The example above returns documents where the phone field element has values of the data type
number. Number is an alias that basically sums up floats and integers. If we searched for the type of
double this would also return back a document even if there are no decimal places. Since the shell
is based on JavaScript, by default, a number inserted into the database will be stored as a floating
point number/double because JavaScript which drives the shell does not know the difference
between integers and doubles as it only knows doubles. The shell takes the number and stores it as
a double even if we have no decimal places. This is the reason why we could also search by
the type of double and retrieve the documents as well.
We can also specify multiple types by passing an array. The below will look for both data types of a
double and a string and return documents that match the filter condition:
$ db.users.find( { phone: {$type: [ “double”, “string” ] } } ).pretty( )
We can use the type operator to ensure that we only work with the right type of data when returning
some documents.

103

Diving Into Read Operation

Understanding Evaluation Operators - $regex
The $regex operator allows us to search for text. This type of query is not super performant. Regex
stands for regular expression and it allows us to search for certain text based on certain patterns.
Regular expressions is a huge complex topic on its own and is something not covered deeply within
this mongoDB guide. Below is an example of using a simple $regex operator.
$ db.movies.find( { summary: { $regex: /musical/ } } ).pretty( )
In this example the query will look at all the summary key field values to find the word musical
contained in the value and return all matching results.
Regex is very useful for searching for text based on a pattern, however, it is not the most efficient/
performant way of searching/retrieving data (the text index may be a better option and we will
explore this in later chapters).

104

Diving Into Read Operation

Understanding Evaluation Operators - $expr
The $expr operator is useful if we want to compare two fields inside of one document and then find
all documents where this comparison returns a certain result. Below is an example code:
$ use financialData
$ db.sales.insertMany( [ { volume: 100, target: 120 }, { volume: 89, target: 80 }, { volume: 200, target:
177 } ] )
$ db.sales.find( { $expr: { $gt: [ “$volume”, “$target” ] } } ).pretty( )
In the above $expr (expression) we are retrieving all documents where the volume is above the
target. This is the most typical use case where we would use the expression operator to query the
data in such a manner.
The $expr operator takes in a document describing the expression. We can use comparison
operators like gt, lt and so on — more valid expressions and which operators we can use can be
found in the official documentation. We reference fields in the array rather than the number, and
these must be wrapped in quotation marks along with a dollar sign at the beginning. This will tell
mongoDB to look in the field and use the value in the expression. This should return two documents
that meet the expression criteria.
105

Below is another more complex expression example:
$ db.sales.find( { $expr: { $gt: [ { $cond: { if: { $gte: [“$volume”, 190 ] }, then: { $subtract: [“$volume”,
10 ] }, else: “$volume” } }, “$target” ] } } ).pretty( )
Not only are we comparing whether volume is greater than target but also where volume is above
190, the difference between volume and target must be at least 10. To achieve this we have to
change the expression inside our $gt operator.
The first value will be a document where we use a special $cond operator for condition. The $cond
works in tandem with the $expr operator. We are using an if: and then: to calculate the value
dynamically. The if is another comparative operator. We are $subtracting 10 from the volume value
for all the items that are greater than or equal to 190. We use an else: case to define cases that do
not match the above criteria, and in this case we would just use the volume value.
We would finally compare the value with the target to check whether the value is still greater than or
equal to the target. This should return 2 documents.
As we can see from the example above, this is a very powerful command within our tool belt when
querying data from a mongoDB database.
106

Diving Into Read Operation

Diving Deeper into Querying Arrays
There are multiple things we can do when querying arrays, and there are special operators that
help us with querying arrays. If we want to search, for example, for all documents that have an
embedded sports document, we cannot use the normal queries that we have previously used. For
example, if we had embedded documents for hobbies that had a title and a frequency:
$ db.users.find( { hobbies: “Sports" } ).pretty( )
$ db.users.find( { hobbies: { title: “Sports” } } ).pretty( )
Both of these will not return any results if there are multiple fields within an embedded document.
$ db.users.find( { hobbies: { title: “Sports”, frequency: 2 } } ).pretty( )
This will find any documents that meet both the criteria, however, what if we only want to retrieve all
documents that have an embedded Sports document in an array, regardless of the frequency?
$ db.users.find( { “hobbies.title”: “Sports” } ).pretty( )
We search for a path using dot notation. This must be wrapped in quotation marks. MongoDB will go
through all of the hobbies elements and within each element it will dig into the document and
compare title to our query value of Sports. Therefore, this will retrieve the relevant documents even
if within an embedded array and there are multiple array documents. This is how we would query
array data using the dot notation.
107

Diving Into Read Operation

Using Array Query Selector - $size, $all &
$elemMatch
There are three dedicated query selectors for Arrays which are $size, $all and $elemMatch
operators. We will examine each of these selectors and their applications.
The $size selector operator allows us to select or retrieve documents where the array is of a certain
size, for example if we wanted to return all documents that have an embedded array holding exactly
3 documents. For example:
$ db.users.find( { hobbies: { $size: 3 } } ).pretty( )
This will return all documents within the users collection where the hobbies array size is 3
documents. Note: the $size operator takes an exact number and cannot express something like
greater than or less than 3. This is something mongoDB does not directly support, and we would
have to retrieve such documents using a different method, as shown in the sketch below.
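One possible workaround (a sketch, assuming every user document has a hobbies array) is to combine $expr with the aggregation $size operator:
$ db.users.find( { $expr: { $gt: [ { $size: "$hobbies" }, 2 ] } } ).pretty( )
This would return all users with more than two hobbies.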
The $all selector operator allows us to retrieve documents from an array based on the exact values
without worrying about the order of the items within the array. For example if we had a movie
collection where we wanted to retrieve those with a genre of thriller and action but without caring
108

for the order of the values, this is where the $all array selector will help us.
$ db.movies.find( { genre: [“action”, “thriller”] } ).pretty( )
$ db.movies.find( { genre: { $all: [“action”, “thriller”] } } ).pretty( )
The second syntax will ensure both array elements of action and thriller exists within the genre field
and ignores the ordering of these elements (i.e. ordering does not matter) whereas, the first syntax
would take the order of the elements into consideration (i.e. the ordering matters).
Finally, the $elemMatch array selector allows us to retrieve documents where one and the same
element should match our conditions.
In the syntax below, the $and query will find all documents where hobbies has at least one element
with the title of Sports and at least one element with a frequency greater than or equal to 3, but it
does not have to be the same element. This means that a user who has the title of Sports with a
frequency below 3, and a different hobby with a frequency greater than 3, will still match the criteria
as it has at least one match for each condition.
$ db.users.find( { $and: [ { "hobbies.title": "Sports" }, { "hobbies.frequency": { $gte: 3 } } ] } ).pretty( )
To ensure we only retrieve documents where one and the same hobbies element has the title Sports
and a frequency greater than or equal to 3, the $elemMatch operator is useful:
$ db.users.find( { hobbies: { $elemMatch: { title: "Sports", frequency: { $gte: 3 } } } } ).pretty( )
109

Diving Into Read Operation

Understanding Cursors
The find() method, unlike the findOne() method, yields us a cursor object. Why is this cursor
object important? Our client communicates with the mongoDB server and, depending on the scale
of the application, we could potentially retrieve thousands if not millions of documents with our
find() query. Retrieving all these results at once is very inefficient because all these documents
would have to be fetched from the database, sent over the wire and then loaded into memory in the
client application; these are three things that are not optimal. In most cases we would not need
all the data at the same time; therefore, find gives us a cursor.
A cursor is basically a pointer which has the query we wrote stored and can go back to the database
to request the next batch. We therefore work with batches of data rather than fetching everything at
once. The shell by default takes the cursor and retrieves the first 20 documents, and then fetches
further documents in batches of 20.
If we write our own application with a mongoDB driver, we have to control that cursor manually to
make sure that we get back our results. The cursor approach is beneficial because it saves on
resources and we only load a batch of documents rather than all the documents from a query.

110

Diving Into Read Operation

Applying Cursors
When using the find command in the shell, this will display the first 20 documents. We can use the
"it" command in the shell to retrieve the next batch of 20 documents and keep using this command
until we have exhausted the cursor, i.e. there are no more documents to load. The "it" command will
not work with the mongoDB drivers. Instead, most drivers will have a next() method we can call.
If we use the next() method within the shell, this will retrieve only one document and there
will be no "it" command to retrieve the next document. If we run the command again, this will restart
the find query, retrieving the first document again.
$ db.movies.find( ).next( )
Since the shell uses JavaScript, we can use JavaScript syntax to store the cursor in a variable.
We can then use the next() method to cycle through the documents on the cursor; each call
retrieves the next document, continuing on from where the cursor left off.
$ const dataCursor = db.movies.find( )
$ dataCursor.next( )
There are other cursor methods available to us in mongoDB that we can use on our find() query.
111

$ dataCursor
$ it
This will return the first batch of 20 documents. Using the it command will retrieve the next 20
documents, i.e. the default shell behaviour for cursors.
$ dataCursor.forEach( doc => { printjson(doc) } )
The forEach() method will vary depending on the driver we are using, but in JavaScript the forEach()
method takes in a function that will be executed for every element that can be loaded through the
cursor. The document is passed into that function as the input, and the body of our arrow function
describes what we want to do with it. In the above example we are using the printjson() method,
which is provided by the shell, to output each document.
This will cycle through all the remaining documents inside of the cursor (excluding any documents
we have already retrieved, for example via the next() method). The forEach() call retrieves all the
remaining documents, so there will be no “it” command afterwards as we would have exhausted the
cursor.
$ dataCursor.hasNext( )
The hasNext() method will return true or false to indicate whether there are still documents left in
the cursor. To reset an exhausted cursor we simply run the find() query again: if we stored it in a
const we have to assign it to a new variable, whereas with let or var we can re-assign the original
variable.
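As a small sketch (again assuming the movies collection), hasNext() and next() can be combined to iterate through a cursor manually, which is roughly what a driver would do for us; the manualCursor variable name is simply chosen for this example.
$ const manualCursor = db.movies.find( )
$ while ( manualCursor.hasNext( ) ) { printjson( manualCursor.next( ) ) }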
112

We can learn more on cursors and the shell or the mongoDB drivers on the official mongoDB
documentation:
https://docs.mongodb.com/manual/tutorial/iterate-a-cursor/

Diving Into Read Operation

Sorting Cursor Results
A common operation is to sort the data that we retrieve. We are able to sort by anything whether it
is a string sorted alphabetically or a number sorted by numeric value. To sort the data in mongoDB
we would use the sort() method on our find function. Below is an example:
$ db.movies.find( ).sort( { “rating.average”: 1 } ).pretty( )
The sort takes in a document to describe how to sort the retrieved data. We can sort by a top level
document field or an embedded document field. The values we use to sort describe the direction to
sort the data i.e. 1 means ascending (lowest value first) and -1 means descending (highest value
first). We are not limited to one sorting criterion; for example, we may want to sort by the average
rating first and then by the runtime:
$ db.movies.find( ).sort( { “rating.average”: 1, runtime: -1 } ).pretty( )

113

Diving Into Read Operation

Skipping & Limiting Cursor Results
We are able to skip a certain number of documents. Why would we want to skip documents? On an
application or web app we may implement pagination, where users view results distributed across
multiple pages (e.g. 10 results per page) because we do not want to show all results on one page. If
the user switches to, say, page 3, we would want to skip the first 20 results and display the results for
page 3.
The skip method allows us to skip cursor results. Below is an example of skipping the first 20 results.
Skipping allows us to move through our dataset.
$ db.movies.find( ).sort( { “rating.average”: 1, runtime: -1 } ).skip(20).pretty( )
The limit method allows us to cap the number of documents the cursor will return, i.e. we only
retrieve as many documents as we specify.
$ db.movies.find( ).sort( { “rating.average”: 1, runtime: -1 } ).skip(100).limit(10).pretty( )
We can have the sort, skip and limit methods in any order we want and this will not affect the sorting
as mongoDB will automatically do the actions in the correct order.
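As a minimal pagination sketch, assuming a hypothetical pageNumber variable and a page size of 10, the skip value can be derived from the page number like this:
$ const pageSize = 10
$ const pageNumber = 3 // hypothetical current page
$ db.movies.find( ).sort( { "rating.average": -1 } ).skip( ( pageNumber - 1 ) * pageSize ).limit( pageSize ).pretty( )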
114

Diving Into Read Operation

Using Projections to Shape Results
How can we shape the data which we get back from our find query into the form we need? When we
use the find function, it retrieves every field of each matched document. This not only transfers
redundant data over the wire but also makes the data harder to work with if we have to manually
parse it all. Projection allows us to control which fields are returned from our Read Operation.
Below is an example of projecting only the name, genre, runtime and rating fields from the
returned results, because all other data on the document does not matter to us.
$ db.movies.find( { }, { name: 1, genre: 1, runtime: 1, rating: 1 } ).pretty( )
In order to perform a projection we need to pass a second argument to the find function. If we do
not want to specify any filter criteria for the first argument of find, we simply pass an empty
document as seen above. The second argument allows us to configure how values are projected i.e.
which data fields we extract. We name the field and pass the value of 1. Any field we do not
mention with a 1, or that we explicitly set to 0, will be excluded. This will retrieve a
reduced version of the document, but the _id will always be included in the results even if we do not
specify it within the projection. If we want to exclude the _id we must explicitly exclude it as seen
115

in the below example:
$ db.movies.find( { }, { name: 1, genre: 1, runtime: 1, rating: 1, _id: 0 } ).pretty( )
This will retrieve the data in the projection and explicitly exclude the _id value from the results. This
is only required for the _id field; all other fields are implicitly excluded by default in an inclusion
projection.
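It is also possible to go the other way and only exclude specific fields, leaving everything else in the result. A small sketch, assuming the movies documents contain a summary field we are not interested in (a hypothetical field name):
$ db.movies.find( { }, { summary: 0 } ).pretty( )
Note that, apart from _id, inclusion (1) and exclusion (0) cannot be mixed within the same projection.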
We are also able to project fields of embedded documents; for example, we may only be interested
in the time from the schedule embedded document and not the day field. We would use the dot
notation to select the embedded field to project as seen below:
$ db.movies.find( { }, { name: 1, genre: 1, runtime: 1, “schedule.time”: 1 } ).pretty( )
Projections can also work with arrays and help us shape the array data included in our returned find
query results. For example, if we only want to project Drama from the genres, we would first filter
for all documents whose genres array contains Drama and then use the projection argument to
display only Drama from the array:
$ db.movies.find( { genres: “Drama” }, { “genres.$”: 1 } ).pretty( )
The special syntax of $ after the genres will tell mongoDB to project on the one genre it found.
116

Now the retrieved document may have had more than Drama within the array, but we have told
mongoDB to only fetch Drama and output that result because it is the only data we are interested in
retrieving. The documents behind the scenes may hold more array entries, just as they have other
fields too. The $ syntax projects the first array element that matched our query criteria, which in the
above scenario was Drama.
If we had more complex criteria whereby we search for all documents containing both Drama and
Horror, the $ projection syntax will return Horror in the example below. The $all operator only
produces a match once Horror has also been found, so Horror counts as the matching element and
it is what the projection displays.
$ db.movies.find( { genres: { $all: [ “Drama”, “Horror” ] } }, { “genres.$”: 1 } ).pretty( )
There may be cases where we want to project items from an array in our document that are not the
items we queried for. In the below example we query for all documents containing Drama, but we
want to project Horror only. Using the $elemMatch operator in the projection allows us to do this:
$ db.movies.find( { genres: “Drama” }, { genres: { $elemMatch: { $eq: “Horror” } } } ).pretty( )
The filter criteria and projection can be totally unrelated as seen in the below example:
$ db.movies.find( { “rating.average”: { $gt: 9 } }, { genres: { $elemMatch: { $eq: “Horror” } } } ).pretty( )
117

Diving Into Read Operation

Understanding $slice
The final special projection relating to arrays is the $slice operator. This operator returns the
first x items from the array. In the below syntax example we are projecting the first 2 array
items within the genres field.
$ db.movies.find( { “rating.average”: { $gt: 9 } }, { genres: { $slice: 2 }, name: 1 } ).pretty( )
The documents may have more genres assigned to them but we only see the first two items in the
array because we used the $slice: 2 to return only 2 items. The number can be any value to return
any number of items from the array. The alternative method of slice is to use an array form.
$ db.movies.find( { “rating.average”: { $gt: 9 } }, { genres: { $slice: [ 1, 2 ] }, name: 1 } ).pretty( )
The first element in the slice array is the number of array elements to skip, e.g. we skip one item.
The second element is the number of items we want to limit the result to, e.g. two items. This will
skip item 1 and return item 2 and item 3 from the array when it projects the results (i.e. we projected
the next two items in the array).
We have many ways of controlling what we see: 1 and 0 for normal fields, and $, $elemMatch and
$slice to control which elements of an array are projected in our results.
118

We have now completely explored how to fully control what we fetch with our filtering criteria and
then control which fields of the found document to include in our displayed result sets using
projections.
Useful Links:
https://docs.mongodb.com/manual/reference/method/db.collection.find/
https://docs.mongodb.com/manual/tutorial/iterate-a-cursor/
https://docs.mongodb.com/manual/reference/operator/query/

119

Diving Into Update Operation

Updating Fields with updateOne(), updateMany()
and $set
An update operation requires two pieces of information (two arguments):
1. Identify the document that should be updated/changed (i.e. the filter)
2. Describe how it should be updated/changed
For identifying the document we can use any of the filters we could use for finding documents and
do not necessarily need to use the _id value. Using the _id is nonetheless a common approach for
updates as it guarantees the correct document is being updated.
The updateOne() method simply takes the first document that matches the filter criteria and
updates it even if multiple documents may match the criteria. The updateMany() method on the
other hand will take the criteria/filter and update all documents that match.
$ db.users.updateOne( { _id: ObjectId(“5b9f707ead7”) }, { $set: { hobbies: [ { title: “Sports”,
frequency: 5 }, { title: “Cooking”, frequency: 3 }, { title: “Hiking”, frequency: 1 } ] } } )
$ db.users.updateMany( { “hobbies.title”: “Sports” }, { $set: { isSporty: true } } )
120

The second argument/parameter is how to update the document and the various update operators
can be found on the official mongoDB documentation:
https://docs.mongodb.com/manual/reference/operator/update/
The $set takes in a document where we describe some fields that should be changed or added to
the existing document. The updateOne example overwrites the existing hobbies array elements
with the new hobbies array elements. The console will provide feedback when it updates the
document:
$ { “acknowledged” : true, “matchedCount” : 1, “modifiedCount” : 1 }
If we ran the exact same updateOne command again, the console would still show a matchedCount
of 1 but the modifiedCount would be 0, as no document data would have been modified because
the new value is exactly the same as the existing value.
$ { “acknowledged” : true, “matchedCount” : 1, “modifiedCount” : 0 }
If we were to find all the documents using the find() command, we would notice the user still has
the same _id and other document fields. This is because the $set operator does not replace the
whole document; instead it defines some changes that mongoDB evaluates and, if they make sense,
applies to the existing document by adding or overwriting the fields given in the second argument.
All other existing fields are left untouched. The $set operator by default simply adds or edits the
fields specified in the update command.
121

Diving Into Update Operation

Updating Multiple Fields with $set
In the previous section we demonstrated the $set operator updating one field at a time in a
document. It is important to note that $set is not limited to a single field but can update multiple
fields within a document, as seen in the below example where we add a field of age and another
field of phone (stored as a string so the leading zero is preserved):
$ db.users.updateOne( { _id: ObjectId(“5b9f707ead7”) }, { $set: { age: 40, phone: “07933225211” } } )
The console will again report that a document matched the filter and was modified, whether the
modification was an overwrite of existing fields or the addition of new fields to the matched document.
$ { “acknowledged” : true, “matchedCount” : 1, “modifiedCount” : 1 }
The $set operator can add fields, arrays and embedded documents inside and outside of an array.
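As a small illustration of this, the sketch below adds an embedded address document to a user; the address field and its contents are hypothetical and not part of the earlier examples:
$ db.users.updateOne( { name: "Maria" }, { $set: { address: { city: "Munich", street: "Main Street 1" } } } )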

Diving Into Update Operation

Incrementing & Decrementing Values
Update operators not only allow us to set values, they can also
122

increment or decrement a number for us. For example, we may want to update a user's age without
using the $set operator, because we would not necessarily know every user's age; instead we would
want mongoDB to perform the simple, common transformation of taking the current age and
recalculating the new age as part of the update request.
MongoDB has a built-in operator to allow us to perform common increment and decrement
operations: the $inc operator.
$ db.users.updateOne( { name: “Emanuel” }, { $inc: { age: 1 } } )
This will increment the existing age field value by one. Note we could choose a different number
such as 5 and increment by 5 instead. To decrement a field value we would continue to use the
$inc operator but with a negative number.
$ db.users.updateOne( { name: “Emanuel” }, { $inc: { age: -1 } } )
Note we can perform multiple different things in the same update such as increment certain fields
while setting new fields/editing existing fields, all within the second update parameter.
$ db.users.updateOne( { name: “Emanuel” }, { $inc: { age: 1 }, $set: { gender: “M” } } )
If we tried to increment/decrement a value as well as set the same field value, this would give us an error:
123

$ db.users.updateOne( { name: “Emanuel” }, { $inc: { age: 1 }, $set: { age: 30 } } )
This would error in the console because updating the same path twice causes a conflict: two update
operations working on the same field are not allowed in mongoDB, so the command fails with a
shell message similar to the below:
$ WriteError: Updating the path ‘age’ would create a conflict at ‘age’: …

Diving Into Update Operation

Using $min, $max and $mul Operators
The $min, $max and $mul operators are other useful update operators available to us. Below are
scenarios and the syntax to overcome various update problems.
Scenario 1: We want to set the age to 35 only if the existing age is higher than 35. We would use
the $min operator because this will only change the current value if the new value is lower than
existing value. Note: this will not throw any errors if the existing value is higher than the new value, it
will simply not update the document.
$ db.users.updateOne( { name: “Chris”}, { $min: { age: 35 } } )

124

Scenario 2: We want to set the age to 38 only if the existing age is lower than 38. We would use the
$max operator, which is the opposite of the $min operator. Again this will not throw any error if the
existing value is higher than the update value; it will simply ignore the update.
$ db.users.updateOne( { name: “Chris”}, { $max: { age: 38 } } )
Important Note: the modifiedCount in the terminal will show 0 if no update occurred.
Scenario 3: We want to multiply the existing value with a multiplier. We would use the $mul
operator which stands for multiply to perform this type of update operation.
$ db.users.updateOne( { name: “Chris”}, { $mul: { score: 1.1 } } )
This will multiply the existing score value by the multiplier of 1.1 and update the score field with
the new value.

Diving Into Update Operation

Getting Rid of Fields
We may want to update records to drop a field based on certain criteria. There are two solutions to
this problem, and below is example syntax to drop all phone numbers for users that are isSporty.
125

$ db.users.updateMany( { isSporty: true }, { $set: { phone: null } } )
We could use the $set operator to set the phone to null. This will not drop the field but it would
mean that the field has no value which we can use in our application to not display the phone data.
The alternative solution is to use the special operator of $unset to truly get rid of a field from a
document.
$ db.users.updateMany( { isSporty: true }, { $unset: { phone: “” } } )
The value we assign to phone (or any other key) is totally up to us but is typically set to “” which
represents empty. The value is ignored by the update; the important part of the $unset operator
document is the field name we wish to drop.

Diving Into Update Operation

Renaming Fields
Just as we have the $unset operator to drop a field from a document, we are also able to rename
fields using the $rename operator. We can leave the first update argument as an empty document to
target all documents. The $rename document takes the field we want to rename as the key and the
new field name as a string value. This will only rename the field in documents that actually have a
field called age.
$ db.users.updateMany( { }, { $rename: { age: “totalAge” } } )
126

Diving Into Update Operation

Understanding The Upsert Option.
Sometimes we want to update a document but we are uncertain whether it exists. For example, in
our application we may not know if the data was saved in the database yet; if it was not saved yet we
want to create a new document, and if it does exist we want to overwrite/update the existing
document.
In the scenario below, if Maria did not exist as a document, nothing would happen other than a
message in the terminal telling us no document was updated. Instead, we would want mongoDB to
create the document for us rather than having to do this manually.
$ db.users.updateOne( { name: “Maria” }, { $set: { age: 29, hobbies: [ { title: “Cooking”, frequency: 3 }
], isSporty: true } } )
To allow mongoDB to update or create documents for us, we pass a third argument with an option
called upsert set to true (by default this is set to false). Upsert is a combination of update and
insert and works with both the updateOne and updateMany methods. If Maria does not exist it will
create a new document and it will also include the name: “Maria” field automatically for us.
$ db.users.updateOne( { name: “Maria” }, { $set: { age: 29, hobbies: [ { title: “Cooking”, frequency: 3 }
], isSporty: true } }, { upsert: true } )
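If no document matched and an upsert took place, the shell feedback will report a matchedCount of 0 and include an upsertedId for the newly created document, roughly like the following (the ObjectId is elided here):
$ { "acknowledged" : true, "matchedCount" : 0, "modifiedCount" : 0, "upsertedId" : ObjectId("...") }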
127

Diving Into Update Operation

Understanding Matched Array Elements.
Example scenario: we want to update all user documents where the person has a hobby of Sports
and the frequency is greater than or equal to three. The hobbies field holds an array of embedded
documents.
$ db.users.find( { $and: [ { “hobbies.title”: “Sports” }, { “hobbies.frequency”: { $gte: 3 } } ] } ).pretty( )
This syntax will find all users which have a hobbies title of Sports and a hobbies frequency that is
greater than or equal to three. This will find documents even where the Sports frequency is below
three, so long as there is another embedded hobbies document which has a frequency greater than
or equal to three.
The correct query is to use the $elemMatch operator which will look at the same embedded
document for both criteria:
$ db.users.find( { hobbies: {$elemMatch: { title: “Sports”, frequency: { $gte: 3 } } } } ).pretty( )
Now that we know the correct query to find the documents we wish to update, the question now
becomes how do we update that embedded array document only. So essentially we know which
128

overarching document we want to change but we want to change something exactly within that
document found in the array.
The $set operator by default applies the changes to the overall document and not to the array
element we found. To target that element we use the $set operator, select the array field and
append .$ to it. This placeholder automatically refers to the element matched by our filter criteria
(the first argument to the update command). We can define the new value after the colon.
$ db.users.updateMany( { hobbies: {$elemMatch: { title: “Sports”, frequency: { $gte: 3 } } } }, { $set:
{ “hobbies.$”: { title: “Sports”, frequency: 3 } } } )
Note this will update all matching documents to the updated changes. However, if we only want to
add a new field to all matching documents the syntax would be to add a dot after the .$ as seen
below:
$ db.users.updateMany( { hobbies: {$elemMatch: { title: “Sports”, frequency: { $gte: 3 } } } }, { $set:
{ “hobbies.$.highFrequency”: true } } )
The above syntax will find all documents which match the embedded document criteria and update
that embedded document to add a new field/value of highFrequency: true if the highFrequency
field did not exist (if it did, it would simply update the existing field value). The $set works exactly as
it did before; the only difference is adding the special placeholder within the array path to quickly
get access to the matched array element.
129

Diving Into Update Operation

Updating All Array Elements.
Example Scenario: The below find method returns the overall person document where the filter
matched; it does not return only the embedded document we filtered on. The filter is just the means
of selecting the overall document.
$ db.users.find( { “hobbies.frequency”: { $gt: 2 } } ).pretty( )
This will also retrieve documents containing hobbies with a frequency lower than two, provided at
least one embedded document has a frequency above two to match the filter. So we have found all
documents meeting the criteria, but not every array element in those documents fulfilled the filter.
We only want to change the embedded documents in the hobbies array that did fulfil the filter
criteria.
$ db.users.updateMany( { “hobbies.frequency”: { $gt: 2 } }, { $set: { “hobbies.$.goodFrequency”:
true } } )
Again we can use the $set operator, but we want to change all the matched hobbies with a
frequency above two. The “hobbies.$” syntax we saw in the previous section only edits one hobby
per person: if there are multiple matching hobbies for a person, it will not update them all but only
the first matching element within the array.

130

Now in order to update all elements in an array, we would use a special placeholder in mongoDB
that is the .$[] which simply means update all elements. We can use the dot notation after the .$[] to
select a specific field in the array document.
$ db.users.updateMany( { totalAge: { $gt: 30 } }, { $inc: { “hobbies.$[].frequency”: -1 } } )
This will update all users that have a totalAge greater than 30 and decrement the frequency of every
hobby by 1. The .$[] syntax is used to update all array elements.

Diving Into Update Operation

Finding and Updating Specific Fields
Continuing on from the last section, we were able to use the .$[] notation to update all elements
within the array per person. We can now build on this notation to update specific fields and below is
the solution to the previous problem.
$ db.users.updateMany( { “hobbies.frequency”: { $gt: 2 } }, { $set: { “hobbies.$[el].goodFrequency”:
true } }, { arrayFilters: [ { “el.frequency”: { $gt: 2 } } ] } )
Note: the el within the square brackets is an identifier which we could have named anything. In the
third argument of our update method we use the arrayFilters option to define the filter for that
identifier. The identifier's filter does not need to be the same as the query filter; for example, we
could query by age but have the identifier in $set update based on a
131

completely different filter such as the frequency greater than two. The arrayFilters array can contain
multiple filter documents, one for each identifier we use.
This would now update specific array elements that meet the identifier arrayFilter criteria. Note that
the third argument of updateOne and updateMany is an options argument whereby we previously
used upsert as an option to update/insert documents. We can also use arrayFilters to provide a filter
criteria for our identifiers.
Note: if an identifier is used we must use the arrayFilters option to define its filter criteria,
otherwise the update method will fail as mongoDB will not know what to do with the identifier.
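As a small sketch of a slightly richer filter, a single identifier document inside arrayFilters can also combine several conditions; the example below (reusing the el identifier) only updates hobby elements that are titled Sports and have a frequency above two:
$ db.users.updateMany( { "hobbies.title": "Sports" }, { $set: { "hobbies.$[el].goodFrequency": true } }, { arrayFilters: [ { "el.title": "Sports", "el.frequency": { $gt: 2 } } ] } )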

Diving Into Update Operation

Adding Elements to Arrays
If we want to add elements onto an array in a document, instead of using the $set operator (which
we can still use to replace the whole field), we can use $push to push a new element onto the array.
The $push operator takes in a document where we first describe the array we wish to push to and
then the element we want to push/add to the existing array.
$ db.users.updateOne( { name: “Maria” }, { $push: { hobbies: { title: “Sports”, frequency: 2 } } } )
132

The $push operator can be used with more than one document to be added. We use the $each
operator which takes in an array of multiple documents that should be added/pushed.
$ db.users.updateOne( { name: “Maria” }, { $push: { hobbies: { $each: [ { title: “Running”,
frequency: 1 }, { title: “Hiking”, frequency: 2 } ] } } } )
There are two sibling options we can add next to the above $each syntax. The first is the $sort
modifier. Technically, we are still in the same object where we have the $each operator, i.e. it is a
sibling to $each. The sort describes how the elements in the array should be sorted before they are
pushed into hobbies.
$ db.users.updateOne( { name: “Maria” }, { $push: { hobbies: { $each: [ { title: “Running”,
frequency: 1 }, { title: “Hiking”, frequency: 2 } ], $sort: { frequency: -1 } } } } )
This will sort the hobbies array by frequency in a descending order i.e. having the highest frequency
first.
The second sibling is the $slice modifier which allows us to keep only a certain number of elements.
We can use this in conjunction with the $sort modifier as seen below.
$ db.users.updateOne( { name: “Maria” }, { $push: { hobbies: { $each: [ { title: “Running”,
frequency: 1 }, { title: “Hiking”, frequency: 2 } ], $sort: { frequency: -1 }, $slice: 1 } } } )
In this example the slice keeps only the first element after the sort in the hobbies array. The sort is
applied to the overall array, i.e. new and existing elements, not just the elements we add.
133

Diving Into Update Operation

Removing Elements from Arrays
Not only are we able to push elements to an array but we are also able to pull elements from an
array using the $pull operator, as demonstrated below.
$ db.users.updateOne( { name: “Maria” }, { $pull: { hobbies: { title: “Hiking” } } } )
The $pull operator takes in a document where we describe what we want to pull from the array. In
the above we are pulling from the hobbies array based on a condition i.e. pull every element where
the title is equal to Hiking. We are not limited to equality conditions; we can also use all the normal
filter operators that we have seen before, such as the greater than or less than operators, as shown
in the sketch below.
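A minimal sketch of this, assuming we want to remove every hobby whose frequency is below 2:
$ db.users.updateOne( { name: "Maria" }, { $pull: { hobbies: { frequency: { $lt: 2 } } } } )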
Sometimes we may wish to remove the last element from an array and have no specific filter criteria.
We would use the $pop operator with a document that names the array field we want to pop from.
A value of -1 pops the first element and a value of 1 pops the last element from the array.
$ db.users.updateOne( { name: “Chris” }, { $pop: { hobbies: 1 } } )

134

Diving Into Update Operation

Understanding the $addToSet Operator
The final update command we will explore is the $addToSet operator.
$ db.users.updateOne( { name: “Maria” }, { $addToSet: { hobbies: { title: “Hiking”, frequency: 2 } } } )
The difference between the $addToSet and $push operators is that $push allows us to push
duplicate values whereas $addToSet does not. It is important to note the console will not error but
simply show that no document was updated with the change. Always remember that the $addToSet
operator adds unique values only.
This concludes the Update Operations available to us in mongoDB. We now understand the three
arguments we can pass to both the updateOne and updateMany commands, which are:
1. Specify a filter (query selector) using the same operators we know from the find() command.
2. Describe the updates via $set or other update operators.
3. Additional options, e.g. upsert or arrayFilters, for the update operation.
In the official documentation we can view all the various operations available to us:
https://docs.mongodb.com/manual/tutorial/update-documents/
135

Diving Into Delete Operation

Understanding deleteOne() and deleteMany()
To delete a single document from a collection we would use the deleteOne() command. We need
to specify a query selector/filter. The filter we specify here is exactly the same as we would use for
finding and updating documents; we simply need to narrow down the document we want to delete.
deleteOne() will delete the first document that matches the criteria.
$ db.users.deleteOne( { name: “Chris” } )
We can use the deleteMany() command to delete all documents where the query selector/filter
criteria has been met. Below are two examples.
$ db.users.deleteMany( { totalAge: {$gt: 30}, isSporty: true } )
$ db.users.deleteMany( { totalAge: {$exists: false}, isSporty: true } )
Note: we can add as many query selectors as we want to narrow down the document(s) we wish to
delete from the database.

136

Diving Into Delete Operation

Deleting All Entries in a Collection
There are two approaches to deleting all entries in a collection. The first method is to reach out to
the collection, execute the .deleteMany( ) command and pass an empty document as the
argument. This argument is a filter that matches every document in the collection and will therefore
delete all entries within the collection.
$ db.users.deleteMany( { } )
The alternative approach is to delete the entire collection using the .drop( ) command on the
specified collection. This returns true if the collection was successfully dropped.
$ db.users.drop( )
When creating applications it is very unlikely we would drop collections; adding and dropping
collections is more of a system admin task. We can also drop an entire database using the
dropDatabase command. We first use the use command followed by the database name to switch
to the desired database, and then drop it.
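For example, assuming a database named shop (a hypothetical name used only for illustration):
$ use shop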
$ db.dropDatabase( )
https://docs.mongodb.com/manual/tutorial/remove-documents/
137

Working with Indexes

What are Indexes and why do we use them?
An index can speed up our find, update or delete queries i.e. all the queries where we are looking
for certain documents that should match some criteria.
$ db.products.find( { seller: “Marlene” } )
If we take a look at this find query, we have a collection of documents called products and we are
searching for a seller called Marlene. By default, if we don't have an index on the seller field,
mongoDB will go ahead and do a so-called collection scan. This simply means that, to fulfil this
query, mongoDB will go through the entire collection, look at every single document and check
whether the seller equals “Marlene” (equality). As we can imagine, for a very large collection with
thousands or millions of documents this can take a while to complete, and it is the only approach
mongoDB can take when no indexes are set up, even if the query only retrieves maybe two
documents out of the thousands in the collection.
We can create an index; an index is not a replacement for a collection but rather an addition to it.
We would create an index for the seller key of the products collection, and that index then exists
alongside the collection. The index is essentially an ordered list of all the values stored in the seller
key across all the documents.
138

It is not an ordered list of the documents, it is just the values for the field for which we created that
index. Also it is not just an ordered list of the values, every value/item in the index has a pointer to
the full document it belongs to.
This allows mongoDB to perform a so-called index scan to fulfil the query. MongoDB will see that
such an index exists for seller, go to that seller index and quickly jump to the right values because,
unlike in the collection, it knows the values are sorted by that key. If we are searching for a seller
starting with M, it does not need to search through the first few records. The ordering, together with
the pointer that every item within the index has, allows mongoDB to go through the index very
efficiently and find the matching products. So mongoDB finds the value for the query and then
follows the pointer to the related document to return.
This is how an index works in mongoDB and it also answers why we would use indexes: creating
indexes can drastically speed up our queries. However, we should not overdo it with indexes. Let's
take the example of a collection with _id, name, age and hobbies fields. We could create an index
for all four fields and we would have the best read performance because no matter what we look
for, we have an index and can query every field efficiently, which will speed up our find queries.
Having said this, indexes do not come for free. We pay a performance cost on inserts because every
extra index has to be maintained and updated with each insert. This is because we have an ordered
list of elements with pointers
139

to the documents. So if we add a new document, we also have to add a new element to the index.
This may sound simple and would not take very long, but if we have 10 indexes on our collection,
we have to update all 10 indexes for every insert. We may then quickly run into issues because we
have to do a lot of work for all these fields on every insert and every update too. Therefore, indexes
do not come for free and we have to figure out which indexes make sense to have and which don't.
We are now going to explore indexes in more detail and look at all the type of indexes that exist in
mongoDB and how to measure whether an index makes sense or does not make sense to have.

Working with Indexes

Adding a Single Field Index
To determine whether an index can help us in our find query, mongoDB provides us with a nice tool
that we can use to analyse how it executed the query. We can chain the explain method onto our
normal query. This method works with find, update and delete commands but not with inserts (i.e.
it works for methods that narrow down documents).
$ db.contacts.explain( ).find( { “dob.age”: { $gt: 60 } } )
This will provide a detailed description of what mongoDB did to arrive at the results.
140

MongoDB thinks in so-called plans: plans are the alternatives it considers for executing the
query, and in the end it picks the winning plan. The winning plan is simply what mongoDB did to
get to our results. Without indexes, a full collection scan is always the only thing mongoDB can do.
However, if there were alternatives and they were rejected, they will appear in the rejectedPlans array.
To get a more detailed report we can run the same command but passing in an argument:
$ db.contacts.explain( “executionStats” ).find( { “dob.age”: { $gt: 60 } } )
The executionStats mode provides a detailed output for our query and how the results were returned.
This will show things such as executionTimeMillis, which is the time it took to execute the query in
milliseconds, and totalDocsExamined, which shows the number of documents that needed to be
scanned in order to return our query results. The larger the gap between the totalDocsExamined
and nReturned values, the more inefficient the query is.
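As an illustration, an executionStats output might contain fields like the following (the numbers here are made up purely for illustration and depend entirely on the dataset):
{
“executionStats” : {
“nReturned” : 1222,
“executionTimeMillis” : 4,
“totalKeysExamined” : 0,
“totalDocsExamined” : 5000,
…
}
}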
To add an index on a collection, we use the createIndex method and pass in a document. The key
in that document is the name of the field we want to create an index on (this can be a top level field
as well as an embedded field name). The value states whether mongoDB should create the list of
field values in ascending (1) or descending (-1) order.
$ db.contacts.createIndex( { “dob.age”: 1 } )

141

Once we run the command we should see in the terminal that the index has been created.
{
“createdCollectionAutomatically” : false,
“numIndexesBefore” : 1,
“numIndexesAfter” : 2,
“ok” : 1
}
If we were to run the above explain command again on our collection, we should notice the
executionTimeMillis for the same query has dropped. We should also see two execution stages,
the first being an index scan (IXSCAN). The index scan does not return the documents but the keys/
pointers to the documents. The second stage is the fetch (FETCH), which takes the pointers
returned from the index scan, reaches out to the actual collection and fetches the real documents.
We would notice that mongoDB only has to look at a reduced number of documents to return the
results of our query.
This is how an index can help us to speed up our queried searches and how to use the explain
method to determine whether an index should be used in our collection to speed up the query.

142

Working with Indexes

Indexes Behind the Scenes
What does createIndex( ) method do in detail?
Whilst we can't really see the index, we can think of the index as a simple list of values and pointers
to the original document.
Something like this (for the "age" field):
(29, "address in memory/ collection a1")
(30, "address in memory/ collection a2")
(33, "address in memory/ collection a3”)
The documents in the collection would be at the "addresses" a1, a2 and a3. The order does not
have to match the order in the index (and most likely, it indeed won’t).
The important thing is that the index items are ordered (ascending or descending - depending on
how we created the index). The syntax of createIndex({age: 1}) creates an index with ascending
sorting while the syntax of createIndex({age: -1}) creates an index with descending sorting.

143

MongoDB is now able to quickly find a fitting document when we filter for its age as it has a sorted
list. Sorted lists are way quicker to search because we can skip entire ranges (and don't have to look
at every single document).
Additionally, sorting (via sort(...)) will also be sped up because you already have a sorted list. Of
course this is only true when sorting for the age.

Working with Indexes

Understanding Index Restriction
In the previous section we created an index which sped up our query when looking for people with
an age greater than 60. However, if we run the same query but find people older than 20, we will
notice that the execution time is higher than it was for people above the age of 60.
To drop an index from our collection we would use the dropIndex method and pass in the
document that we created to create the index.
$ db.contacts.dropIndex( { “dob.age”: 1 } )
If we were to run the query again as a full collection scan, we would notice that it is faster than
144

having an index. The reason the query is faster is that we have saved the step of going through the
index. If a query returns a large portion or the majority of our documents, an index can actually be
slower because we have the extra step of walking almost the entire index list and then still have to
go to the collection to fetch all of those documents. With a full collection scan we do not have this
extra index step, because we are reading the documents from the collection directly, so the index
offers no benefit and only adds work.
Important note: if we have queries that regularly return the majority or all of our documents, an
index will not help us and it might even slow down the execution. This is the first important note to
keep in mind (a first restriction) when planning our queries and whether or not to use indexes.
If we have a dataset where our queries typically return a fraction like 20% or lower than that of the
documents, then indexes will certainly always speed up our queries. If we have a lot of queries that
give us back all the documents or close to all the documents, then indexes can not do much for us.
The whole point of indexes is to quickly get to a narrow subset of our document list and return the
documents from that index.

145

Working with Indexes

Creating Compound Indexes
Not only can we have indexes on fields that hold number values, we can also have indexes on
fields that hold text values (both can be sorted). It rarely makes sense to index a boolean field, as
there are only two kinds of values, i.e. true and false, so such an index will not meaningfully speed
up our queries. Below is an example of creating an index on a text field.
$ db.contacts.createIndex( { gender: 1 } )
Now the above index would not make too much sense on its own because gender only has the two
values of male and female, and a query on it would probably return around half of all documents.
However, if we want to find, as an example, all people who are older than 30 and are male, we can
create a so-called compound index.
$ db.contacts.createIndex( { “dob.age”: 1, gender: 1 } )
The order of the two fields in our createIndex method does matter, because a compound index is
simply an index with more than one field. This stores one index where each entry is based not on a
single value but on two combined values. It is really important to note that this does not create two
indexes: it creates one index where every element is a connected value (a pair, which in the above
example is the pair of the age and gender values).
146

The order of the fields defines which kind of pairs mongoDB will create in our compound index (for
example does mongoDB create a 31 male index or a male 31 index — this will be important for our
queries).
There are two queries we can now run which will take advantage of the compound index. The first is
to find based on age and gender:
$ db.contacts.explain( ).find( { “dob.age”: 35, gender: “male” } )
This will perform an index scan with our index name (the index name is auto-generated e.g.
“indexName” : “dob.age_1_gender_1”)
The second query that can utilise the compound index is a query on the age only:
$ db.contacts.explain( ).find( { “dob.age”: 35 } )
This will also use the same compound index we created for the index scan, even though we never
specified the gender in the search. Compound indexes can be used from left to right, but the
leftmost field must always be part of the search, i.e. a find query on the gender alone will not use
the compound index.
$ db.contacts.explain( ).find( { gender: “male” } )
The above query would use a full collection scan and not the index scan using the compound index.
147

Compound index entries are grouped together: they are ordered by the first (leftmost) field and,
within that, by the following fields. We can have a compound index with more than 2 fields, up to a
maximum of 31 fields. However, we cannot utilise the compound index without the first field.
These are the restrictions we have on compound indexes, but compound indexes allow us to speed
up queries that use multiple values.

Working with Indexes

Using Indexes for Sorting
Now that we have had a look at the basics of indexes, it is important to know that indexes are not
only used for narrowing down our find queries; they can also help with sorting. Since the index is a
sorted list of elements, mongoDB can utilise it whenever we want to sort in the same way that the
index list is sorted.
$ db.contacts.explain( ).find( { “dob.age”: 35 } ).sort( { gender: 1 } )
In the above we can find people with the age of 35 but sort them by gender in an ascending order.
We will notice that this will use an index scan for both the gender and age even though we filtered
by age only. It uses the gender information for the sorting. Since we already have an ordered list of
values, mongoDB can utilise this to quickly give back the order of documents we need.
148

It is important to understand that if we are not using indexes and we sort a large number of
documents, the operation can actually fail because mongoDB has a threshold of 32MB of memory
for in-memory sorting. With no index, mongoDB essentially fetches all our documents into memory
and does the sort there, and for very large collections or a large number of fetched documents this
can be too much to sort.
Sometimes we need indexes not only to speed up the query but also to be able to sort at all. This is
not a problem for small datasets, but when we fetch so many documents that the default in-memory
sort is simply not possible, we need an index that is already sorted so that mongoDB does not have
to sort in memory and can instead take the order we have in the index.
Important Note: mongoDB reserves a threshold of 32MB in memory for the fetched documents and
for sorting them. This is the second important consideration to keep in mind when deciding whether
or not to create an index.
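As a quick sketch of this, assuming an index on dob.age (or a compound index whose first field is dob.age) still exists, a sort on that field can be served from the index rather than in memory:
$ db.contacts.explain( "executionStats" ).find( ).sort( { "dob.age": 1 } )
The winning plan should then show an IXSCAN stage rather than an in-memory SORT stage.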

149

Working with Indexes

Understanding the Default Index
When creating an index it may seem as though an index already exists in our collection.
To see all indexes that exist for a collection we can use the getIndexes command.
$ db.contacts.getIndexes( )
This command will print all the indexes we have on that collection within the shell. If we have
created a new index on our collection, we will notice that there are two indexes. The first index, on
the _id field, is a default index mongoDB maintains for us. The remaining indexes are the ones that
we have created.
The default index on _id is created and maintained on every collection by mongoDB automatically.
This means if we are filtering by _id, or sorting by _id (which is the default order in which documents
are fetched), mongoDB is at least utilising that index.

150

Working with Indexes

Configuring Indexes
The _id index that we get out of the box for this field is actually unique by default. This is a setting
mongoDB gives us to ensure that we cannot add another document with the same value in the
same collection. There are use cases where we also need that behaviour for a different field and
therefore we can add our own unique indexes.
For example, if we wanted email to be a unique index. We would create an index on the email field
and then pass in a second argument to the createIndex command. The second argument allows us
to configure the index — this is where we can set the unique option to true.
$ db.contacts.createIndex( { email: 1 }, { unique: true } )
If we execute this command, we may receive a duplicate key error if we already have duplicate
values within our collection. The error will also show the value for which the duplicate key exists.
This is an advantage of the unique index: we get such an error when creating it over existing
duplicates, and again if the index is in place and we try to add a document with a value that already
exists. Unique indexes can help us as developers to ensure data consistency and avoid duplicate
data for fields that need to be unique. This index is not only useful to speed up our find
queries but also to guarantee that we have unique values for that given field in that collection.
151

Working with Indexes

Understanding Partial Filters
Another interesting kind of index configuration is setting up a so-called partial filter.
For example, if we were creating an application for calculating what someone will get once they
retire, we would typically only look for people older than 60, so having an index on the dob.age
field might make sense. The problem of course is that we then have a lot of values in our index that
we never actually query for. The index will still be efficient, but it will be unnecessarily big, it eats up
space on our disk, and the bigger the index is, the more it can cost certain operations nonetheless.
If we know certain values will not be looked at, or only very rarely, and we would be fine using a full
collection scan for those, we can create a partial index where we only add the values we are regularly
going to look at.
$ db.contacts.createIndex( { “dob.age”: 1 }, { partialFilterExpression: { “dob.age”: { $gt: 60 } } } )
$ db.contacts.createIndex( { “dob.age”: 1 }, { partialFilterExpression: { gender: “male” } } )
We can add this option to compound indexes as well. In the partialFilterExpression, we define which
field narrows down the set of values we want to add to the index (we can even use a totally different
field, e.g. gender). We can use all the query expressions we have previously seen, e.g. $gt, $lt, $exists etc.
152

The second expression creates an index on the age but only for elements where the underlying
document is for a male, while the first creates an index on age only for elements where the
underlying document is for a person older than 60.
If we only created a partial index using the second example and performed the below query, we
would notice that mongoDB performs a full collection scan and ignores the partial index. This
is because mongoDB determined that, yes, we are looking for a field that is part of the index (age),
but since we did not search for gender in our query it considered the partial index too risky to use;
mongoDB's top priority is to ensure that we do not lose any results. Therefore, the results we receive
back will also include documents with a gender of female, not only male, because it
performed a full collection scan and not a partial index scan.
$ db.contacts.explain( ).find( { “dob.age”: { $gt: 60 } } )
In order for mongoDB to use the partial index we must also filter by gender:
$ db.contacts.explain( ).find( { “dob.age”: { $gt: 60 }, gender: “male” } )
The difference between a partial index and a compound index is that, for partial indexes, the overall
index is smaller. In the above example only the ages of males are stored; female keys are not stored
in the index, so the index size is smaller, leading to a lower impact on our hard drive. Our write
queries are also sped up, because if we insert a female that document never has to be added to the
153

index. This still makes a lot of sense if we often filter for this type of combination, i.e. for the age and
then only males; a partial index can make a lot of sense if we rarely look for the other result, i.e. we
rarely look for women.
Whenever mongoDB has the impression that our find request would yield more than what's in our
index, it will not use that index; but if we typically run queries that stay within our index (a filtered
or partial index), then mongoDB will take advantage of it. We then benefit from having a
smaller index and less impact on writes.
So again it depends on the application we're writing and whether we often just need a subset
or whether we typically need to be able to query everything, in which case a partial index won't
make much sense.

Working with Indexes

Applying the Partial Index
An interesting variation or use case of the partial index can be seen in conjunction with a unique
index. Below is an example demonstrating this use case.
$ db.users.insertMany( [ { name: “Abel”, email: “abel@email.com” }, { name: “Diane” } ] )
154

We have collection of users but not all documents has an email field (only Abel has an email). We
can try to create an index on our email field within the users collection.
$ db.users.createIndex( { email: 1 } )
The above will successfully create an index in ascending order on the email field of our users
collection. We can now drop this index and create a new one using the unique option.
$ db.users.dropIndex( { email: 1 } )
$ db.users.createIndex( { email: 1 }, { unique: true } )
The above will successfully create a unique index in ascending order on the email field of our
users collection. If we now try to insert a new document without an email:
$ db.users.insertOne( { name: “Anna” } )
We would now see a duplicate key error because the non-existing email, for which we have an index,
is treated as a duplicate key: we now have a missing email value stored twice. This is an
interesting behaviour we need to be aware of. MongoDB still treats non-existing values as values in
our index, i.e. they are stored as null, and therefore if we have two documents with null values for an
indexed field and that index is unique, we will get this error.

155

Now if we have a use case where we want to create a unique index on a field and it is ok for that
field to be missing, then we have to create the index slightly differently.
$ db.users.dropIndex( { email: 1 } )
$ db.users.createIndex( { email: 1 }, { unique: true, partialFilterExpression: { email: { $exists: true } } } )
In the above, we now use the partialFilterExpression as a second option along with the unique
option. The partialFilterExpression of $exists: true on email lets mongoDB know that we only want
to add elements into our index where the email field exists. This will avoid the case of having a clash
with our unique option. Therefore, if we now try to run the below insert command (the same as
before) this will now work and we will not see any errors.
$ db.users.insertOne( { name: “Anna” } )
We use the combination of unique and partialFilterExpression to not index fields where no value or
where the entire field does not exist and this allows us to continue to use the unique option on that
field.

156

Working with Indexes

Understanding Time-To-Live (TTL) Index
The last interesting index option is the Time-To-Live (TTL) index. This type of index can be very
helpful for a lot of applications where we have self-destroying data for example a session of a user
where we want to clear their data after some duration or anything similar to that nature. Below is an
example of a TTL index:
$ db.sessions.insertOne( { data: “randomText”, createdAt: new Date( ) } )
$ db.sessions.find( ).pretty( )
The sessions document will hold some random string data and the createdAt field will be a date
stamp of the current date. new Date( ) will provide an ISODate for us, for example:
ISODate(“2019-03-31T19:52:24.272Z”)
To create a TTL index for our sessions collection we would use the expireAfterSeconds option. The
TTL option below is created on the createdAt field.
$ db.sessions.createIndex( { createdAt: 1 }, { expireAfterSeconds: 10 } )
This is a special feature mongoDB offers and will only work on date fields/indexes. We could add
157

this to other field types (i.e. numbers, texts, booleans etc.) but it would simply be ignored. In the
above we have set expireAfterSeconds to 10 seconds. It is important to note that the index does
not immediately delete elements in hindsight, i.e. elements that already existed before the TTL index
was created. If we now insert a new element into this collection and then run the find command
below, we would notice that after 10 seconds both elements have been deleted.
$ db.sessions.find( ).pretty( )
Adding a new element to the collection will trigger mongoDB to re-evaluate the entire collection
after 10 seconds which will include the existing elements, to see whether the createdAt field which
is indexed has fulfilled the expireAfterSeconds criteria (i.e. only being valid for 10 seconds).
This can be very useful because it allows us to maintain a collection of documents which destroy
themselves after a certain time span. It is helpful for many applications, for example session data
for users on our web app, or an online shop where we want to clear a cart after one day. Whenever
we have a use case where data should clean itself up, we do not need to write a complex script for
that; we can use the TTL index expireAfterSeconds option.
It is important to note that we can only use this option on single field indexes that are date objects
and it does not work on compound indexes.

158

Working with Indexes

Query Diagnosis & Query Planning
Now that we have looked at what indexes do and how we can create our own, it is important to keep
experimenting to get a better understanding of the different options and how indexes work. In order
to experiment and understand whether an index is worth the effort, we need to know how to
diagnose our queries. We have already seen the explain( ) method for this.

explain( ) accepts one of three verbosity modes:
“queryPlanner”: shows a summary for the executed query and the winning plan.
“executionStats”: shows a detailed summary for the executed query, the winning plan and possibly rejected plans.
“allPlansExecution”: shows a detailed summary for the executed query, the winning plan and the winning plan decision process.

159

It is important to note we can execute the explain as is or pass in queryPlanner as an argument to
get the default minimal output where it tells us the winning plan and nothing much else. We can
also use executionStats as an argument to see a detailed summary output and see information
about the winning plan and possibly the rejected plan as well as how long it took to execute the
query. Finally, there is also the allPlansExecution argument which shows a detailed summary and information on how the winning plan was chosen.
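As a quick sketch (reusing the users collection from the earlier examples), the three verbosity levels are invoked like this:
$ db.users.explain( ).find( { name: "Anna" } )
$ db.users.explain( "executionStats" ).find( { name: "Anna" } )
$ db.users.explain( "allPlansExecution" ).find( { name: "Anna" } )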
To determine whether a query is efficient, we can look at the milliseconds process time and compare the solution with and without an index, i.e. does the index scan (IXSCAN) beat the collection scan (COLLSCAN)? Other important measures are the number of index keys examined, the number of documents examined and the number of documents returned.

Milliseconds Process Time: an IXSCAN typically beats a COLLSCAN.
# of Keys (in Index) Examined vs # of Documents Examined: should be as close as possible, or the number of documents examined should be 0 (a covered query!).
# of Documents Examined vs # of Documents Returned: should be as close as possible.

Working with Indexes

Understanding Covered Queries
We can reach a so-called covered query if we only return fields which are also the indexed fields, in which case the query does not examine any documents because it can be answered entirely from inside the index. We will not always be able to reach this state, but if we can optimise our index to reach the covered query state (as the name suggests, the query is fully covered by the index) then we of course have a very efficient query, because we have skipped the stage of reaching out to the collection to get the documents, which speeds up our query considerably.
If we have a query that we typically run against certain fields, it might be worth creating a single field index on that field, or, if it is two fields, a compound index, so that we can fully cover the query from inside of our index.
Below is an example of a Covered Query - using a projection to only return the name in our query:
$ db.customers.insertMany( [ { name: “Abbey”, age: 29, salary: 30000 }, { name: “Bill”, age: 20, salary:
18000 } ] )
$ db.customers.createIndex( { name: 1 } )
$ db.customers.explain(“executionStats”).find( { name: “Bill” }, { _id: 0, name: 1 } )
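In the executionStats output of this query we would expect values along these lines (a sketch; the exact figures depend on the data), which is how we recognise a covered query:
totalKeysExamined: 1
totalDocsExamined: 0
nReturned: 1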
161

Working with Indexes

How MongoDB Rejects a Plan
To understand how mongoDB rejects a plan we will use the customers collection example from the previous section. In the customers collection we have two indexes: the standard _id index and our own name index. We will now add a compound index on the customers collection which creates an index on age in ascending order and then on name, as seen below:
$ db.customers.createIndex( { age: 1, name: 1 } )
We now have three indexes on our customers collection. We can now query the collection and use the explain method to see how mongoDB rejects a plan.
$ db.customers.explain( ).find( { name: “Abbey”, age: 29 } )
We will notice the winningPlan will be an IXSCAN using the compound age_1_name_1 index. We should also now see a rejectedPlan, which was the IXSCAN on the single field name_1 index. MongoDB considered both indexes because the query on the name field fits both indexes. It is interesting to know which index was rejected and which one was considered the winningPlan. The question now is: how exactly does mongoDB figure out which plan is better?
MongoDB uses an approach where it first of all looks for indexes that could help with the query at hand. Since our find query includes a look-up on the name field, mongoDB automatically derived that both the single field index and the compound index could help. In this scenario we only have two approaches, but in other scenarios we may have even more. Hypothetically, let's say we had three approaches to our find query: mongoDB then simply lets those approaches race against each other, but not for the full dataset. It sets a certain winning condition, e.g. finding 100 documents. Whichever approach is the first to find 100 documents wins, and mongoDB will then use that approach for the real query.
This would be cumbersome if mongoDB had to do it for every find query we send to the database, because it would cost a little bit of performance. Therefore, mongoDB caches this winningPlan for this type of query. For future queries that look exactly the same it uses the cached winningPlan, and for future queries that look different, i.e. use different values or different keys, mongoDB will race the approaches again and find a winning plan for that type of query.

Diagram: Approach 1, Approach 2 and Approach 3 race against each other; the winning plan is used for the query and stored in the cache.

This cache is not there forever and is cleared after a certain amount of writes or a database restart. To be precise, the winningPlan is removed from the cache when:
a. We have written a certain amount of documents to that collection, because mongoDB can no longer be sure the current winningPlan will still win now that the collection has changed a lot, and it should reconsider.
b. We rebuild the index, i.e. we drop and recreate the index.
c. We add or remove other indexes, because a new index could be better.
d. We restart the mongoDB server.
This is how mongoDB derives the winningPlan and how it stores it in cache memory.

Stored forever? No. The cached winning plan is cleared after a write threshold is reached (currently 1,000 writes), after the index is rebuilt, after other indexes are added or removed, or after the server is restarted.

It is interesting for us as developers to regularly check our queries (our find, update or delete queries) and see what mongoDB actually does: whether it uses indexes efficiently, whether a new index should be added (something we can do on our own if we own the database, or otherwise pass on to the database administrator), or whether we need to adjust the query.
Maybe we are always fetching data that we do not really need, and we could use a covered query if we just projected the data we need, which happens to be the data stored in the index.
This is why, as developers, we need to know how indexes work: either because we need to create them on our own in a project we work on alone, or because we can optimise our queries or tell the database administrator how to optimise the indexes.
The last level of verbosity that the explain method offers to us is the allPlansExecution:
$ db.customers.explain(“allPlansExecution”).find( { name: “Abbey”, age: 29 } )
This will provide output with detailed statistics for all plans, including the rejected plans. We can therefore see in detail how an index scan on our compound index performs, as well as how the query would perform on any other indexes. With this option, we can get detailed analytics on different indexes & queries and the possible ways of running our query. We should now have all the tools we need to optimise our queries and our indexes.

165

Working with Indexes

Using Multi-Key Indexes
We are now going to explore two new types of indexes, the first of which is called a multi-key index.
$ db.contacts.drop( )
$ db.contacts.insertOne( { name: “Max”, hobbies: [ “Cooking”, “Football” ], addresses: [ { street: “First
Street” }, { street: “Second Street” } ] } )
In mongoDB it is also possible to index arrays as seen below:
$ db.contacts.createIndex( { hobbies: 1 } )
$ db.contacts.find( { hobbies: “Football” } ).pretty( )
If we explain the above find query using the executionStats argument to see how mongoDB arrived at the winningPlan, we will notice that mongoDB used an index scan, with isMultiKey set to true for the hobbies index.
$ db.contacts.explain(“executionStats”).find( { hobbies: “Football” } ).pretty( )
MongoDB treats index on arrays as a multi-key index because it is an index on an array of values.
Multi-key indexes technically work like regular indexes but are stored slightly differently.

MongoDB pulls out all the values in our index key i.e. hobbies from the above case and stores them
as separate elements in an index. This will mean that multi-key indexes for a lot of documents are
larger than single field indexes. For example, if every document has an array with four values on
average and we have a thousand documents and we indexed that array field, we would store four
thousand index entries (4 x 1,000 = 4,000). This is something to keep in mind: multi-key indexes are possible, but they are also bigger, which does not mean we shouldn’t use them.
$ db.contacts.createIndex( { addresses: 1 } )
$ db.contacts.explain(“executionStats”).find( { “addresses.street”: “First Street” } )
We will notice with the above that we can create an index on the addresses array; however, when we explain the find query we will notice that mongoDB uses a collection scan and not the index. The reason for this is that our index holds the whole embedded documents and not the fields of those documents. MongoDB does not go so far as to pull out the elements of an array and then pull out all field values of a nested document that array might hold. If we instead search for the whole embedded document, i.e. an address equal to { street: “First Street” }, then we would see mongoDB using the index scan, because the whole document is what is stored in our index.
$ db.contacts.explain(“executionStats”).find( { addresses: { street: “First Street” } } )
MongoDB pulls out elements of the array for addresses as single elements which happens to be a
document, so that document is what mongoDB pulled out and then stored in the index registry. This
is something to be aware of with multi-key indexes.

Note that what we can do is create an index on addresses.street, as seen below. This will also be a multi-key index, and if we now try the earlier find query on addresses.street, we would notice that mongoDB uses an index scan on the multi-key index.
$ db.contacts.createIndex( { “addresses.street”: 1 } )
$ db.contacts.explain(“executionStats”).find( { “addresses.street”: “First Street” } )
We can therefore use an index on a field in an embedded document which is part of an array with
the multi-key feature. We must be aware though that using the multi-key index feature on a single
collection will quickly lead to some performance issue with writes because for every new document
we add, all these multi-key indexes have to be updated. If we add a new document with 10 values in
that array which we happen to store in a multi-key index, then these 10 new entries need to be
added to the index registry. If we then have four or five of these multi-key indexes per document we
would then quickly end up in a low performance world.
Multi-key indexes are helpful if we have queries that regularly target array values or even nested
values or values in an embedded document in arrays.
We are also able to add a multi-key index as part of a compound index, which is possible as seen below:
$ db.contacts.createIndex( { name: 1, hobbies: 1 } )

However, there is one important restriction to be aware of: a compound index made up of two or more multi-key (array) fields will not work, for example the below:
$ db.contacts.createIndex( { addresses: 1, hobbies: 1 } )
We cannot index parallel arrays because mongoDB would have to store the cartesian product of the
values of both indexes, of both arrays, so it would have to pull out all the addresses and for every
address it would have to store all the hobbies. So if we have two addresses and five hobbies, we
would have to store ten combinations, and this would become worse the more addresses and hobbies we have, which is why this is not possible.
Compound indexes with multi-key indexes are possible, but only with one multi-key field, i.e. with one array and not multiple arrays. We can have multiple multi-key indexes as separate indexes, but in one and the same index only one array field can be included.

Working with Indexes

Understanding Text Indexes
There is a special kind of multi-key index called a text index. Let's take the text below as an example, which could be stored in a field of our document as some kind of product description.

This product is a must-buy for all fans of modern fiction!

Text Index (keywords): product, must, buy, fans, modern, fiction

If we want to search for the above text, we have previously seen that we could use the regex operator. However, regex is not a great way of searching text as it offers very low performance. A better method is to use a text index, a special kind of index supported by mongoDB which will essentially turn the text into an array of single words and store it as such. An extra thing mongoDB does for us is that it removes all the stop words and stems all words, so that we end up with an array of keywords; words such as “is” or “a” are not stored because they are not typically something we would search on as they would appear all over the place. The keywords are what matters for text searches.
Using the below example of a products collection, we will explore the syntax to setup a text index:
$ db.products.insertMany( [ { title: “A Book”, description: “This is an amazing book about a young explorer!” }, { title: “Red T-Shirt”, description: “This T-Shirt is red and it’s pretty amazing.” } ] )

$ db.products.createIndex( { description: 1 } )
$ db.products.createIndex( { description: “text” } )
We would create the index the same way as we would for any other index; however, the important distinction is that we do not add the 1 or -1 for ascending/descending. We could add this, but then the index would be a regular single field index and we could only search for the exact whole text to utilise it, not for individual keywords. Instead, we add the special “text” keyword, which lets mongoDB know to create a text index by removing all the stop words and storing the keywords in an array.
When performing the find command, we can now use the $text and $search keys to search for the
keyword. The casing is not important as every keyword is stored as lowercase.
$ db.products.find( { $text: { $search: “amazing” } } )
We do not specify the field we want to search on because we are only allowed to have one text index per collection. Text indexes are very expensive, especially if we have a lot of long text that has to be split up; we do not want to do this, for example, ten times per collection. Therefore, there is only one text index the $search can look into.
We can actually merge multiple fields into one text index, and the search will then look through them automatically, as we will see in a later section.


Note that if we look for the keyword “red book” this will find both documents, as it treats each word as an individual keyword and searches for all documents which have red and all documents which have book. If we specifically want to find the phrase red book, treated as one keyword, then we have to wrap the text in escaped double quotes like so:
$ db.products.find( { $text: { $search: "\"red book\"" } } )
$ db.products.find( { $text: { $search: "\"amazing book\"" } } )
Because we are already inside double quotes, we need to add a backslash before the inner quotes at the beginning and end of the phrase to escape them. The first query will not find anything in the collection because we do not have the phrase red book anywhere in our text (“amazing book” would work though).
Text indexes are very powerful and much faster than regular expressions and this is definitely the
way to go if we need to look for keywords in text.

Working with Indexes

Text Indexes and Sorting
If we want to find texts from a text index, however, we would want to order the returned documents
where the closest matches are at the top, this is possible in mongoDB.

For example, if we want to search for “amazing t-shirt”, this will return both documents because the
amazing keyword exists in both document. However, we would rather have the t-shirt product
appear before the book because it is the better match as it has both keywords in the description.
$ db.products.find( { $text: { $search: ”amazing t-shirt” } } ).pretty( )
MongoDB does something special when managing/searching text indexes — we can find out how it
scores its results. If we use projection as the second argument to our find method in order to project
the score, we can use the $meta operator to add the textScore. The textScore is a meta field added/
managed by mongoDB for text searches i.e. $text operator on a text index.
$ db.products.find( { $text: { $search: ”amazing t-shirt” } }, { score: { $meta: “textScore” } } ).pretty( )
We would see the score mongoDB has assigned to a result and it automatically sorts all returned
documents by the score. To make sure that the returned documents are sorted we could add the
sort command as seen below — however, this is a longer syntax and the above already sorts by the
score:
$ db.products.find( { $text: { $search: ”amazing t-shirt” } }, { score: { $meta:
“textScore” } } ).sort( { score: { $meta: “textScore” } } ).pretty( )
We can therefore use the textScore meta managed by mongoDB to sort the returned results for us.
173

Working with Indexes

Creating Combined Text Indexes
As previously mentioned, we can only have one text index per collection. If we look at the indexes using the syntax below, we would notice that the default_language for the text index is english, which we are able to change, as we will see later.
$ db.products.getIndexes( )
If we try to add another text index to the same collection but now on the title like so:
$ db.products.createIndex( { title: “text” } )
Notice that we would now receive an IndexOptionsConflict error in the shell; this is because we can only have one text index per collection. However, what we can do is merge the text of multiple fields together into one text index. First, we need to drop the existing text index. Dropping text indexes is a little harder, as we cannot drop them by the field specification (i.e. { description: “text” } will not work); instead we need to use the text index name.
$ db.products.dropIndex( { description: “text” } )
$ db.products.dropIndex(“description_text”)


Now that we have dropped the existing text index from the collection, we can now create a new text
index combining/merging multiple fields.
$ db.products.createIndex( { title: “text”, description: “text” } )
Ultimately, we still only have one text index in our collection; however, it now contains the keywords from both the title and the description fields. We can now search for keywords from either field, for example the keyword book, which appears in both the title and the description, or a keyword that only appears in one of the two fields.
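For example (a sketch), a search for the keyword book now matches against keywords collected from both fields:
$ db.products.find( { $text: { $search: "book" } } )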

Working with Indexes

Using Text Indexes to Exclude Words
With text indexes not only can we search for keywords but we can also exclude/rule out keywords.
$ db.products.find( { $text: { $search: “amazing -t-shirt” } } ).pretty( )
In the example above, by adding the minus in front of the keyword, this will tell mongoDB to
exclude any results that has the keyword t-shirt. This is really helpful to narrow down text search
queries like the above where we find amazing products that are not T-Shirts or which at least don’t
have T-Shirt in the title or in the description (the above result will only return one document and not
both as previously seen with the keyword of amazing t-shirt).
175

Working with Indexes

Setting the Default Language & Using Weights
To drop an existing text index we would first need to search for the index name and then use the
dropIndex command as seen below:
$ db.products.getIndexes( )
$ db.products.dropIndex(“title_text_description_text”)
If we now create a new index but pass in a second options argument, we have two interesting options available to configure our text indexes. The first option is the default language: we can assign the default language a new value. The default language is english, but we can set this to a different language such as german; mongoDB has a list of supported languages we can use. This determines how words are stemmed (i.e. how suffixes are removed) and which stop words are removed; for example words like “is” or “a” are removed in English, while the equivalent German stop words are removed if the language is set to german. English is the default, but we can also specify the option explicitly.
$ db.products.createIndex( { title: “text”, description: “text” }, { default_language: “german” } )
The second option we have available is the ability to set different weights for the different fields we merge together. In the below example we are merging the title and description fields together, and we want to specify that the description should have a higher weight. The weights matter when mongoDB calculates the score of the results. To set up such weights, we add the weights key to our options object; this key holds a document as a value in which we reference the field names and assign weights that are relative to each other.
$ db.products.createIndex( { title: “text”, description: “text” }, { default_language: “english”, weights: {
title: 1, description: 10 } } )
The description will now weigh ten times as much as the title. When we search for a keyword in our products collection we can not only search for the keyword but also set the language, as seen below:
$ db.products.find( { $text: { $search: “red”, $language: “german” } } )
This is an interesting option if we store documents in different languages. We can also turn on case sensitivity by setting $caseSensitive to true. The default for caseSensitive is false, as demonstrated below:
$ db.products.find( { $text: { $search: “red”, $caseSensitive: true } } )
If we print the score we would notice that the scoring will be weighted differently if we set weight
options.
$ db.products.find( { $text: { $search: “red” } }, { score: { $meta: “textScore” } } ).pretty( )
177

Working with Indexes

Building Indexes
There are two ways in which we can build indexes: in the foreground and in the background.
So far we have always added indexes in the foreground; the createIndex command ran as soon as we executed it. Something we did not notice, because it always occurred instantly, is that during the creation of the index the collection is locked and we cannot edit it. Alternatively, we can add indexes in the background and the collection will remain accessible.
The advantage of the foreground mode is that it is faster, while the background mode is slower. However, if we have a collection that is used in production, we probably do not want to lock it just because we are adding an index.

Foreground: the collection is locked during index creation; faster.
Background: the collection is accessible during index creation; slower.

We will observe how we can add an index in the background and see what difference it makes. To see the difference we can use the credit-rating.js file; the mongo shell can execute this file by simply typing mongo followed by the JavaScript file name.
$ mongo credit-rating.js
The shell will still connect to the server, but it will then execute the commands in the .js file against the server. In this file we have a for loop that will add one million documents to a collection with random numbers. Executing this will take quite a while depending on our system, and we can always quit the command using control + c on our keyboard, or alternatively reduce the number of documents created in the .js file for loop. Once completed we will have a new database and a collection with one million documents in it.
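A minimal sketch of what such a file might contain (the actual credit-rating.js shipped with the course may use different field names or values):
for (let i = 0; i < 1000000; i++) {
  // each document gets a random score and age; person_id is just the loop counter
  db.ratings.insertOne({
    person_id: i + 1,
    score: Math.random() * 100,
    age: Math.floor(Math.random() * 70) + 18
  });
}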
$ show dbs
$ use credit
$ show collections
$ db.ratings.count( )
We can use this collection to demonstrate the difference between both the foreground and
background modes. If we were to create an index now on this collection, we would notice that the
indexing does not occur instantly because we have a million documents although this can still be
quick depending on our system.

To demonstrate the point where the foreground mode takes time to create but also blocks us from
doing anything with the collection while it is creating the index, we would open a second mongoDB
shell instance and prepare a query in that new shell.
$ db.ratings.findOne( )
In the first shell instance we would need to create the index, but then quickly change to the second
shell instance to run the findOne query (as it does not take too long to create the new index).
$ db.ratings.createIndex( { age: 1 } )
We will notice the findOne( ) query does not finish instantly; it waits for the foreground index creation to complete before it can execute. There are no errors, but the command is deferred until the index has been created.
For more complex indexes such as a text index, or for even more documents, the index creation would take much longer. This becomes a problem because the database or the collection might be locked for a couple of minutes or longer, which is not an option for a production database: we cannot suddenly lock down the entire database so that the app can no longer interact with it. This is why we can create indexes in the background.


To create a background mode index, we would pass in a second option argument to our
createIndex command setting the background to true (the default is set to false, meaning indexes
are created in the foreground).
$ db.ratings.createIndex( { age: 1 }, { background: true } )
If we now create the new index and in the second shell instance run a command, we can
demonstrate that the database/collection is no longer locked during the index creation.
$ db.ratings.insertOne( { person_id: “dfjve9f348u6iew”, score: 44.2531, age: 60 } )
We should notice that the insertOne command continues to work and the new document is inserted immediately while the index is being created in the background. This is a very useful feature for production databases, as we do not want to add an index in the foreground in production, especially not if the index creation will take quite a while.
Useful Links:
https://docs.mongodb.com/manual/core/index-partial/
https://docs.mongodb.com/manual/reference/text-search-languages/#text-search-languages
https://docs.mongodb.com/manual/tutorial/specify-language-for-text-index/#create-a-text-index-for-a-collection-in-multiple-languages
181

Working with Geospatial Data

Adding GeoJSON Data
In the official documentation we will find an article about GeoJSON, how it is structured and which kinds of GeoJSON objects mongoDB supports. MongoDB supports all the major objects such as points, lines or polygons, as well as more advanced objects.
https://docs.mongodb.com/manual/reference/geojson/
The most important thing is to understand how GeoJSON objects are created, and creating them is very simple. To get some data to work with we can open up Google Maps and look at different locations. On Google Maps, if we click on a location, we can easily access the coordinates of the place from inside the URL. The first coordinate is the latitude, while the second coordinate after the comma is the longitude. We need to remember this in order to store it correctly in mongoDB, because GeoJSON expects the longitude first. The longitude describes the east-west position and the latitude describes the north-south position on the globe. With this coordinate system we can map any point on the earth.
Below is an example of adding a GeoJSON data in mongoDB.
$ use awesomeplaces
$ db.places.insertOne( { name: “California Academy of Sciences”, location: { type: “Point”, coordinates: [ -122.46636, 37.77014 ] } } )

There is nothing special about the key name; we can use any name we want, i.e. location, loc or something completely different. What matters with GeoJSON data is the structure of the value. The
value should be an embedded document and in that embedded document we need two pieces of
information, the type and the coordinates. The coordinates is an array where the first value has to be
longitude and the second value is the latitude. The type must be one of the types supported by mongoDB, such as Point.
We have now created a GeoJSON object and mongoDB will treat the document as a GeoJSON
object because it has fulfilled the requirements of having a type which is one of the supported
objects and having coordinates which is an array where the first value is treated as a longitude and
the second value is treated as a latitude.

Working with Geospatial Data

Running Geo Queries
We may have a web application where users can locate themselves through some web API, or a mobile app where the user can locate themselves. Location APIs will always return coordinates in the form of latitude and longitude, which is the standard format. Our application will therefore give us some latitude and longitude data for whatever the user did, for example locating themselves.

We can simulate this by taking another location from google maps to query whether the location we
created in the previous section is located near the new location coordinates.
$ db.places.find( { location: { $near: { $geometry: { type: “Point”, coordinates: [ -122.471114, 37.771104 ] } } } } )
The location key relates to what we named the field and is not a special reserved key (i.e. if we had called it loc then we would need to use loc). The $near operator provided by mongoDB is an operator for working with geospatial data. The $near operator requires another document as a value, and in there we can define a $geometry to which we want to check proximity. The $geometry takes in a document which describes a GeoJSON object; here we check whether a point we add is close to our stored point.
The above query requires a geospatial index to run without errors (as written, it will simply error). Not all geospatial queries require an index, but they all, just as with other indexes, will most likely benefit from having one.
Important note: the $near operator requires a geospatial index, whereas operators such as $geoIntersects and $geoWithin (which we will look at in later sections) do not require an index, although an index can speed up the search query.

184

Working with Geospatial Data

Adding Geospatial Index to Track the Distance
The query example from the previous section would have failed because we had no geospatial index; a geospatial index is required for the $near query operator. So how do we add such an index? This is quite straightforward and is similar to how we create other indexes.
$ db.places.createIndex( { location: “2dsphere” } )
In the above example we add the index to the location field; note that if we had called this field something different, such as loc, then we would need to use that field name. The difference is the type of index: we do not sort ascending or descending (1 or -1) and we do not use the text type, but rather the special 2dsphere type. This will create a geospatial index on the location field.
If we now repeat the geospatial query we had from the previous section, it should now succeed
without giving any error message in the console.
$ db.places.find( { location: { $near: { $geometry: { type: “Point”, coordinates: [ -122.471114, 37.771104 ] } } } } ).pretty( )
Now this will find the one point, but it does not really tell us too much, i.e. how is “near” defined? The $near operator does not make much sense unless we restrict it. Typically, we would not just pass in a geometry to $near as we have done above; we would also pass in another argument and define a max and maybe also a min distance.
$ db.places.find( { location: { $near: { $geometry: { type: “Point”, coordinates: [ -122.471114, 37.771104 ] }, $maxDistance: 30, $minDistance: 10 } } } ).pretty( )
The $maxDistance and $minDistance values are simply distances in meters. The above would look for a location that is a minimum of 10 meters and a maximum of 30 meters away. If we run this query, it will not find any geolocation, because the California Academy of Sciences is further away from the location point we added in our query (the point in the query represents our current location). We can look on Google Maps and measure the distance by right clicking on the map and selecting measure distance; we would notice the distance is around 435.36 meters. If we run the query again but update the maximum distance to a larger number, we should have one geospatial location point returned from our find query.
$ db.places.find( { location: { $near: { $geometry: { type: “Point”, coordinates: [ -122.471114, 37.771104 ] }, $maxDistance: 500, $minDistance: 10 } } } ).pretty( )
This query allows us to find the nearest places/locations within a certain radius of our current location, which is a question we often want to answer in an application that uses geolocation.

186

Working with Geospatial Data

Adding Additional Locations
In the last section we answered the question of which points are near to our current location point. The next typical question we want to answer is: given a sphere or polygon area, which points are inside of that area?
In order to answer the above question, we need to learn how to add more points to our database
before we can dive into the above question. Below is a sample syntax of adding more new locations
to our database collection:
$ db.places.insertOne( { name: “Conservatory of Flowers”, location: { type: “Point”, coordinates: [ -122.4615748, 37.7701756 ] } } )
$ db.places.insertOne( { name: “Golden Gate Park Tennis Courts”, location: { type: “Point”, coordinates: [ -122.4593702, 37.7705046 ] } } )
$ db.places.insertOne( { name: “Nopa”, location: { type: “Point”, coordinates: [ -122.4389058, 37.7747415 ] } } )
We should now have a total of four locations added to our database collection to answer the above
question in the next section. Note: when taking locations from google maps the coordinates are
based on our screen and if not centred properly the coordinates may be slightly off.
187

Working with Geospatial Data

Finding Places Inside a Certain Area
In the previous section we wanted to answer the question: which locations are within a certain sphere/polygon area?
Using Google Maps to make it easier to run this query, we can go into the Menu and select Your
Places tab and within the Maps tab create a new Map (note: we may need to be logged into our
Google account to use this feature). This will make it easier to navigate around and find coordinates.
To draw a polygon around the area we wish to capture for our query we can use the add marker to
mark locations to get the exact coordinates (we could delete these once we are done with them).
We can use this information within our query:
Note: we can create variables within the shell to store these variables so that we can use this within
our query as the shell uses JavaScript.
$ const p1 = [ -122.46636, 37.77014 ]
$ const p2 = [ -122.45303, 37.76641 ]
$ const p3 = [ -122.51026, 37.76411 ]
$ const p4 = [ -122.51088, 37.77131 ]
These four coordinates will act as the corners of the area we want to query; the returned query should bring back only 3 out of the 4 locations in our collection.

Below is the syntax to query what locations are within our polygon area:
$ db.places.find( { location: { $geoWithin: { $geometry: { type: “Polygon”, coordinates: [ [ [ -122.4547, 37.77473 ], [ -122.45303, 37.76641 ], [ -122.51026, 37.76411 ], [ -122.51088, 37.77131 ], [ -122.4547, 37.77473 ] ] ] } } } } ).pretty( )
Or using the variables, the syntax would look something like the below:
$ db.places.find( { location: { $geoWithin: { $geometry: { type: “Polygon”, coordinates: [ [ p1, p2, p3,
p4, p1 ] ] } } } } ).pretty( )
The $geoWithin operator provided by mongoDB allows us to find all elements that lie within a certain shape, typically a polygon. The $geoWithin operator takes a document as a value, and inside it we add a $geometry object which is simply a GeoJSON object.
Note that the type is now not a Point but a Polygon which has an array of coordinates of a polygon.
A Point uses a single pair of coordinates whereas a Polygon uses more than one pair of coordinates,
and therefore we must use a nested array i.e. an array within an array. Within that array we then
again add more arrays, where each array now describes one longitude and latitude pair for each
corner of the polygon. We can use the const variables which store our array coordinates of each
point. We require the p1 variable again at the end of the array because the polygon must end with
the starting point to close the polygon. This should return back the results of locations/points within
the polygon area.
189

Working with Geospatial Data

Finding Out if a User is Inside a Specific Area
Another typical use case would be to check whether a user is in a certain area. For example, we do not want to find all places in an area, but instead we store a couple of different areas in the database, e.g. the neighbourhoods of a city; the user sends some coordinates because they have located themselves, and we want to know which neighbourhood the user is located in. This is essentially the opposite of the query in the last section.
We can store the multiple polygons within our database within a new collection called areas (we
could name this whatever we want) as seen below:
$ db.areas.insertOne( { name: “Golden Gate Park”, area: { type: “Polygon”, coordinates: [ [ p1, p2, p3,
p4, p1 ] ] } } )
To check whether the user is within the area we would first need to create an index for our areas
collection.
$ db.areas.createIndex( { area: “2dsphere” } )
Now that we have our 2dsphere index created on our areas collection on the area field we can now
run our find query to see if the user is inside of a specific area.

$ db.areas.find( { area: { $geoIntersects: { $geometry: { type: “Point”, coordinates: [ -122.49089, 37.76992 ] } } } } ).pretty( )
The $geoIntersects operator gets used on the area field to return the area where the user point
intersects the geometry area. The area field is used because this is where we stored the indexed
areas that we are looking for in our query.
The $geoIntersects operator returns all areas that have a common point or common area with the given geometry. The reason we cannot use $geoWithin here is that we would be looking for areas within the point where the user is, and an area will never be within a single point. Instead we check the other way round, whether the user point intersects with the area; if it does, the user is within the area and we can return it, because that is what we are trying to find out, i.e. which area the user is in.
We pass in a $geometry document which is now just a Point for the user's coordinates, i.e. an array with a single longitude and latitude pair.
If the point is within multiple areas then all the areas the point intersects are returned. Not only can we intersect points with an area, we could also intersect areas with areas to see if two areas touch one another. If nothing intersects then we retrieve no results.
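As a sketch of intersecting an area with the stored areas (reusing the p1 to p4 variables defined earlier), we could pass a Polygon instead of a Point:
$ db.areas.find( { area: { $geoIntersects: { $geometry: { type: "Polygon", coordinates: [ [ p1, p2, p3, p4, p1 ] ] } } } } ).pretty( )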

191

Working with Geospatial Data

Finding Places Within a Certain Radius
To conclude the geospatial queries we can run, the final query is to return all places in a certain
radius around the user. We kind of looked at this before using the $near operator, however, there is
an alternative method. The $near operator also sorted the results which is an important difference
to this alternative approach.
We want to find all elements in an unsorted order that are within a certain radius:
$ db.places.find( { location: { $geoWithin: { $centerSphere: [ [ -122.46203, 37.77286 ], 1 / 6378.1 ] } } } ).pretty( )
We want to use the $geoWithin and not the $geoIntersects operator because we want to find all places within an area which we define in the query. Previously we had the opposite, where we had a point in the query and wanted to find an area that surrounds the point.
The $geoWithin operator takes in a $geometry operator which describes the GeoJSON object,
however, there is a special operator we can use in mongoDB and that is the $centerSphere
operator. This operator allows us to quickly get a circle around a point. Essentially it uses a radius
and a centre to give us a whole circle.

The $centerSphere takes in an array as the value and that array has two elements. The first element
is another array that holds the coordinates of the centre of the circle we want to draw while the
second element is the radius itself.
The radius needs to be translated manually from meters or miles into radians. This conversion is relatively easy to do, and there is an article in the mongoDB documentation explaining how to translate miles or kilometres to radians. In the above example we worked with kilometres (i.e. we wanted a 1 km radius, so we divide by the earth's radius in kilometres: 1 / 6378.1).
https://docs.mongodb.com/manual/tutorial/calculate-distances-using-spherical-geometry-with-2dgeospatial-indexes/
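For example, a 5 mile radius would be expressed by dividing by the earth's radius in miles (roughly 3963.2, the value used in the linked article); a sketch:
$ db.places.find( { location: { $geoWithin: { $centerSphere: [ [ -122.46203, 37.77286 ], 5 / 3963.2 ] } } } ).pretty( )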
The important difference between $near and $geoWithin with $centerSphere is that the former returns an ordered list, while the latter returns an unordered list which keeps the order of the elements in the database (we can manually sort it using the sort method). The $near operator gave us elements in a certain radius sorted by proximity. Which to use depends on our requirements, i.e. do we need sorted results with the nearest results first, or do we just want an unsorted list of elements in general?
Useful Links:
https://docs.mongodb.com/manual/geospatial-queries/
https://docs.mongodb.com/manual/reference/operator/query-geospatial/
193

Understanding the Aggregate Framework

What is the Aggregation Framework?
The aggregation framework at its core is just an alternative to the find method. We have our collection, and the aggregation framework is all about building a pipeline of steps that runs on the data retrieved from our collection and then gives us the output in the form we need. These steps are sometimes related to what we already know from find; for example, the $match stage is equivalent to filtering in the find method. There are a lot of different steps we can combine as we want, and we can reuse certain stages, so we have a very powerful way of modelling our data transformations.

Collection -> { $match } -> { $sort } -> { $group } -> { $project } -> Output (list of documents)
Every stage receives the output of the previous stage.
This gives us a very structured way of taking some input data and slowly modifying it into the shape we need at the end.
194

Understanding the Aggregate Framework

Getting Started with the Aggregation Pipeline
To get started with the aggregation pipeline, we require some data. We will use the persons data by importing the file using the mongoimport command, ensuring the command is run in the directory where we have stored the persons.json file.
$ mongoimport persons.json -d analytics -c persons --jsonArray
The above will import the file into a database called analytics and a persons collection. We use the --jsonArray flag because the file contains an array of documents, and therefore the flag is required for the import. This should import 5000 documents into the database.
We can now connect to the database within the shell and we should see the persons collection in
the analytics database using the commands below. We can use the findOne to see how a single
document looks. We will use this dataset to learn how to use the aggregation framework on this
dataset.
$ use analytics
$ show collections
$ db.persons.findOne( )
195

Understanding the Aggregate Framework

Using the Aggregation Framework
To use the aggregation framework with the persons collection we imported in the last section, instead of running the find or findOne commands on the collection we now run the aggregate command.
$ db.persons.aggregate( [ {…}, {…}, {…} ] )
The aggregate method takes in an array whereby we define a series of steps that should be run on
our data. The first step will receive the entire dataset right from the collection and then the next step
can do something with the data returned by the first step and so on.
It is important to note that aggregate does not simply fetch all the data from the database, hand it to us and then work on it. The first step runs on the database and can take advantage of indexes, so if we filter or sort in the first step we benefit from the index, and we do not have to fetch all the documents just because we are using aggregate. Aggregate (the same as find) executes on the mongoDB server and therefore can take advantage of things like indexes.
Every step is a document. Below is an example which has been separated onto multiple lines to make it easier to read the syntax of each step within the aggregate.
$ db.persons.aggregate( [
… { $match: { gender: “female” } }
] ).pretty( )
The match is simply a filtering step. We define some criteria on which we want to filter our data in
the persons collection. We can filter here in the same way we can filter in the find command. We
could at this stage finish our pipeline by closing the square brackets around our stages. Just like the
find method, the aggregate method returns a cursor.
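To illustrate the cursor behaviour (a sketch; the variable name is just an example), we can step through the results manually:
$ const cursor = db.persons.aggregate( [ { $match: { gender: "female" } } ] )
$ cursor.hasNext( )
$ cursor.next( )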

Understanding the Aggregate Framework

Understanding the Group Stage
Following on from the last section, we can now add a new $group stage onto our aggregate
pipeline.
$ db.persons.aggregate( [
… { $match: { gender: “female” } },
… { $group: { _id: { state: “$location.state” }, totalPersons: { $sum: 1 } } }
] ).pretty( )

The group stage allows us to group our data by a certain field or by multiple fields. There are a couple of parameters that we need to define, the first being the _id field. Thus far we have always used an ObjectId, a string or maybe a number, but we have never used a document. Just as with any other field we can assign a document to the _id field; it is just not that common. We often see a document used in the group stage syntax because it will be interpreted in a special way and it basically allows us to define multiple fields by which we want to group.
In the above we have grouped by location state, and we do this by assigning a key (which we can
name as anything we want) and then the value of $location.state — the dollar sign is important
because it tells mongoDB that we are referring to a field of our document which is passed into the
group stage. This will group our results by the state. We can now add a new key to each document
and we can name this as whatever we want, in the above we used totalPersons. We would pass in a
document where we now describe the kind of aggregation function we want to execute and these
functions (accumulation operators) can be found in the official mongoDB documentation.
In the above we used the $sum accumulation operator and passed in a value we want to add for
every document that is grouped together. For example, if we have three people from the same
location state, the sum would be incremented by 1, three times i.e. 1x3 = 3. MongoDB will keep the
aggregated sum in memory until it is done with a group and then writes the total sum into the
totalPersons field.

It is important to understand that group accumulates data which simply means that we may have
multiple documents with the same state but the group will only output one i.e. the three documents
with the same state will be merged into one because of the aggregating (we are building a sum in
this case).
We now have a totally different data output. We no longer have any person data because we changed it: we used group to merge our documents into new documents with totally different data, holding the total number of persons and the _id. The _id is the document we defined, containing the state by which we grouped.
We can verify that our aggregation is working correctly by manually reaching out to our persons collection, finding all persons where the location state is equal to a given state, e.g. sinop, and then counting those that are female in the results.
$ db.persons.find( { “location.state”: “sinop” } ).pretty( )
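A quicker check (a sketch using the same fields) is to filter on both conditions and count directly; the number should match the totalPersons value for that state:
$ db.persons.find( { gender: "female", "location.state": "sinop" } ).count( )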
This is the group stage in action within the Aggregation framework.

199

Understanding the Aggregate Framework

Diving Deeper Into the Group Stage
When we group our data using the aggregation pipeline, we saw that we lose all the existing document data, but we are typically fine with that because we are grouping the documents together.
When we ran our pipeline we got a bunch of output documents in a totally unsorted order. We are of
course able to sort our data and one advantage of the aggregation pipeline is that we are able to
sort at any place in the pipeline. If we want to sort on the amount of persons in a state (i.e. the
totalPersons field) we can only sort after we have grouped the data.
$ db.persons.aggregate( [
… { $match: { gender: “female” } },
… { $group: { _id: { state: “$location.state” }, totalPersons: { $sum: 1 } } },
… { $sort: { totalPersons: -1 } }
] ).pretty( )
We use the $sort stage which takes in a document as an input to define how the sorting should
happen. We can sort as we have seen previously, however, we are now able to use the totalPersons
field from the previous group pipeline stage.
Each pipeline stage passes some output data to the next stage and that output data is the only data
that the next stage has. So in the above, the sort stage does not have access to the original data as
we fetched it from the collection, it only has access to the output data of our group stage. So in
there we only have a totalPersons field that we can now sort by, in the above we sorted in
descending order i.e. highest value first.
As we can see, we already have a lot of power with the first three stages of the aggregation framework. This is the kind of operation we could not perform with the normal find method, because we could not group and then sort on the result of the grouping; we would have had to do that in client side code. Using the aggregation framework, we can run the aggregate on the mongoDB server and simply get back exactly the data our client needs to work with.

Understanding the Aggregate Framework

Working with $project
We are now going to look at a different pipeline stage called $project, which allows us to transform every document instead of grouping multiple documents together. We already know projection from the find method; however, project becomes more powerful as an aggregate stage.
Let's start simple and just transform every document using only the $project stage:

$ db.persons.aggregate( [
… { $project: { _id: 0, gender: 1, fullName: { $concat: [ “$name.first”, “ ”, “$name.last” ] } } }
] ).pretty( )
As with all stages, $project takes in a document as its value to configure the stage. In its simplest form, project works the same way as projection works in the find method. In the above we do not want the _id, but we do want the gender, and we add a new fullName field. Notice that we are able to add new fields and also reformat existing ones; for example the name data is now exposed as fullName and treated as one field rather than the embedded document which separated the title, first and last name.
The $concat operator allows us to concatenate multiple strings together. We need to pass an object as the value of fullName because we are performing an operation, and inside it we pass an array of strings we wish to join together. We can either hard code these string values or use fields from our collection. To use our data it is important to wrap the field in double quotes and prefix it with a dollar sign $, using dot notation for embedded documents. MongoDB will know from the $ that the value is not hardcoded and is coming from a field value instead. $concat does not add white spaces; we must add these ourselves.
If we run the aggregate method we get the same amount of documents as before, because unlike group, project does not group multiple documents together; it simply transforms every single document, and therefore we get the same amount of documents but with totally different data.
Interestingly, not only can we include and exclude data but we can also add new fields with
hardcoded values or derived values from the data in the documents if we wanted to.
If we wanted to make sure the first and last name start with uppercase characters, we are also able
to do this in the projection stage. Below is an example broken into a more readable multi-line syntax:
$ db.persons.aggregate( [
… { $project: {
…     _id: 0, gender: 1,
…     fullName: { $concat: [
…       { $toUpper: { $substrCP: [ “$name.first”, 0, 1 ] } },
…       { $substrCP: [ “$name.first”, 1, { $subtract: [ { $strLenCP: “$name.first” }, 1 ] } ] },
…       “ ”,
…       { $toUpper: { $substrCP: [ “$name.last”, 0, 1 ] } },
…       { $substrCP: [ “$name.last”, 1, { $subtract: [ { $strLenCP: “$name.last” }, 1 ] } ] }
…     ] }
… } }
] ).pretty( )

We can use the $toUpper operator (note: all operators are wrapped in an object) to uppercase a string; here we apply it only to the first character of each name.
The $substrCP operator returns the substring of a string. The $substrCP operator uses an array with
the first argument of the string, second argument the starting character point from the string and
the last argument of how many characters should be included in the substring (this uses zero
indexing).
The $subtract operator allows us to return the difference of two numbers, while the $strLenCP
operator calculates the length of a string.
As we can see, we can nest operators to transform our documents in the project stage, and this is not uncommon. The more we get used to the different operators, the more powerful the transformations we can perform on our data within the project stage; this is something we get better at the more we practice.
Important Note: it may be easier to use a text editor such as Visual Studio Code to write our
complex $project transformation code rather than the terminal as it will allow us to easily separate
our code into multiple lines without executing the code. This would also allow us to read our code
much more easily and ensure there are no syntax errors in the code.
204

Understanding the Aggregate Framework

Turning the Location Into a GeoJSON Object
We can transform the location data into a GeoJSON object so that we can work with it later.
Important Note: we can have multiple $project stages and this is not uncommon; for example we could do some matching, sorting and grouping, then project, and then maybe sort again (so we often have some in-between stages).
$ db.persons.aggregate( [
{ $project: { _id: 0, name: 1, email: 1, location: { type: “Point”, coordinates: [
{ $convert: { input: “$location.coordinates.longitude”, to: “double”, onError: 0.0, onNull: 0.0 } },
{ $convert: { input: “$location.coordinates.latitude”, to: “double”, onError: 0.0, onNull: 0.0 } }
] } } },
{ $project: { gender: 1, email: 1, location: 1, … } }
] ).pretty( )
We want the coordinates from the location field however if the field values are of the wrong format
type we can transform this into the correct type by hardcoding the type as seen above.

The original coordinates are in a nested document and of the type string, however we can easily
change this into an array with the type of number for the longitude and latitude coordinates.
Converting data is something we would have to do often. In the above we converted the string data
for the longitude and latitude into a number using mongoDB $convert operator.
The $convert operator takes in a couple of fields. The first is the input and the second is the to field
which defines the type we want to convert the input field value to. We can finally define the onError
and onNull values i.e. the default values to be returned in case the transformation fails. This will
transform the original data type to the new required data type we want returned from our
aggregate pipeline.

Understanding the Aggregate Framework

Using Shortcuts for Transformations
In the previous section we looked at the $convert operator and how we can use it to transform data
from one type to another. Below is another example of transforming the dob.date from a string to a
date type. We can omit the onError and onNull should we wish to:
$ db.persons.aggregate( [
{ $project: { _id: 0, name: 1, email: 1, birthdate: { $convert: { input: “$dob.date”, to: “date” } } } }
] ).pretty( )

We should now have a special ISODate( ) date object/type for our birthdate rather than a string.
Also note that if we have multiple $project stages where the first projection adds new fields, we must also include these new fields in the second projection stage in order to output them in the results.
This demonstrates how we are able to transform our data considerably from the original data using
the aggregate framework.
If we require a simple conversion where we do not specify onError and onNull values, as seen above, we can use a shortcut. MongoDB has special operators starting with $to, such as $toDate, $toDecimal, $toBool, $toString, etc., which are shortcuts if we need to do a specific transformation. So in the above case, rather than using the $convert operator, we could use the $toDate operator, passing in the field we wish to convert, which will transform the original data into an ISODate( ) date object.
$ db.persons.aggregate( [
{ $project: { _id: 0, name: 1, email: 1, birthdate: { $toDate: "$dob.date" } } }
] ).pretty( )
These shortcuts help us reduce the amount of syntax we have to write, and we should be aware of them for simple transformations. If we want to specify the onError and onNull fallback values because we have incomplete data in our dataset, then we have to use the $convert operator.
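As a minimal sketch (assuming a MongoDB version that supports the $to… shortcut operators), the following two projections are equivalent for this simple conversion:
$ db.persons.aggregate( [ { $project: { _id: 0, birthdate: { $toDate: "$dob.date" } } } ] ).pretty( )
$ db.persons.aggregate( [ { $project: { _id: 0, birthdate: { $convert: { input: "$dob.date", to: "date" } } } } ] ).pretty( )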

Understanding the Aggregate Framework

Understanding the $isoWeekYear Operator
Having looked at the transformation in the last section, we can now look at a scenario where we
want to group the result data by the new birthdate field but not by the date but by the birth year
instead.
$ db.persons.aggregate( [
{ $project: { _id: 0, name: 1, email: 1, birthdate: { $toDate: "$dob.date" } } },
{ $group: { _id: { birthYear: { $isoWeekYear: “$birthdate” } }, numPersons: { $sum: 1 } } },
{ $sort: { numPersons: -1 } }
] ).pretty( )
The $isoWeekYear operator returns the (ISO week numbering) year of a date. The above will therefore group the documents by birth year, returning the number of people in our dataset born in a particular year. We can also sort the returned grouped data using the $sort stage, in either ascending or descending order.
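The grouped and sorted output then consists of documents roughly like the following (the years and counts here are purely illustrative; $isoWeekYear returns a long, hence the NumberLong wrapper shown by the shell):
{ "_id" : { "birthYear" : NumberLong(1961) }, "numPersons" : 82 }
{ "_id" : { "birthYear" : NumberLong(1975) }, "numPersons" : 79 }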


Understanding the Aggregate Framework

The $group vs $project Stage
It is really important for us to understand the difference between the $group and $project stages. The $group stage groups multiple documents into one document, whereas the $project stage is a one-to-one relation i.e. we take in one document and return one document, but that one document will have changed.
So for $group we have multiple documents and we return one document per group, grouped by one or more categories of our choice, with any new fields holding summary/statistic calculations. In projections we have the one-to-one relation.
Therefore in grouping we do things such as summing, counting, averaging and so on, while in the projection phase we transform a single document, add new fields and so on. This is a very important difference to understand and to get right.


Understanding the Aggregate Framework

Pushing Elements into Newly Created Arrays
We are now going to look at operators and stages that help us work with arrays. To demonstrate this, we can import a new dataset (friends) that contains arrays. We are able to do quite a lot with arrays in the aggregation framework.
One thing we often want to do with arrays is to merge or combine array values in a grouping stage.
$ db.friends.aggregate( [
{ $group: { _id: { age: “$age” }, allHobbies: { $push: “$hobbies” } } }
] ).pretty( )
There are two operators that help with combining array values, the first is the $push operator. The
$push operator allows us to push a new element into the allHobbies array for every incoming
document.
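With the friends dataset this produces roughly the following shape of output, where allHobbies ends up as an array of arrays (the ages come from the dataset, but the hobby values shown here are only illustrative):
{ "_id" : { "age" : 29 }, "allHobbies" : [ [ "Sports", "Cooking" ], [ "Cooking", "Skiing" ] ] }
{ "_id" : { "age" : 30 }, "allHobbies" : [ [ "Reading", "Hiking" ] ] }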
What we will see is that we have two groups, of age 30 and age 29, and allHobbies is an array of arrays (nested) because we pushed the hobbies into our array and hobbies happens to be an array itself. This is what we meant by being able to push any values. What if we wanted to push the existing array values, but not as an array? In other words, we want to pull these values out of the hobbies array and then add them individually to the allHobbies array.

Understanding the Aggregate Framework

Understanding the $unwind Stage
So in the previous section we saw how we can push elements into newly created arrays, however,
the values we push do not have to be arrays themselves. We left off with the question of how can we
have the hobbies array values in the allHobbies array but not in their nested array. To do this we
have a new pipeline stage we can use called the $unwind stage.
The $unwind stage is always a great stage when we have an array of which we want to pull out the
elements. The $unwind has two different syntaxes but in its most common usage, we just pass the
name of a field that holds an array.
$ db.friends.aggregate( [
{ $unwind: “$hobbies” }
] ).pretty( )
The $unwind stage flattens the array by repeating the document that held the array as often as needed, merging the document with each array element. So the original hobbies array of Max simply has Sports and Cooking, and therefore Max was repeated twice, i.e. two new documents: Max with the hobby Sports and Max with the hobby Cooking. The same is true for all other documents.
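Sketching this with the Max document just described (the _id values and remaining fields are omitted or illustrative), the unwound output contains one document per hobby:
{ "_id" : ObjectId("..."), "name" : "Max", "hobbies" : "Sports" }
{ "_id" : ObjectId("..."), "name" : "Max", "hobbies" : "Cooking" }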
So where the $group stage merges multiple documents into one, the $unwind stage takes one document and spits out multiple documents. Now, with that, we can group again, but this time hobbies is no longer an array but a single value, as seen below:
$ db.friends.aggregate( [
{ $unwind: "$hobbies" },
{ $group: { _id: { age: "$age" }, allHobbies: { $push: "$hobbies" } } }
] ).pretty( )
This now solves the original question: we still have two groups for the ages (29 and 30), but allHobbies now holds a flat array of values and not an embedded array. However, we may notice that this can contain duplicate values. So how can we solve this problem?


Understanding the Aggregate Framework

Eliminating Duplicate Values
We may not want to have duplicate values in our aggregate returned results. To avoid duplicate
values, we can use an alternative to the $push operator called the $addToSet operator.
$ db.friends.aggregate( [
{ $unwind: "$hobbies" },
{ $group: { _id: { age: "$age" }, allHobbies: { $addToSet: "$hobbies" } } }
] ).pretty( )
The $addToSet does almost the same as the $push operator; however, we no longer see any
duplicate values because $addToSet essentially pushes but avoids duplicate values. If it finds that
an entry already exists it does not push the new value.
So with the $unwind stage, the $push and $addToSet operators (in the $group stage), we have
some powerful features that should help us manage our array data efficiently and transform them
into whichever format we require.


Understanding the Aggregate Framework

Using Projections with Arrays
We are now going to look at projections with arrays. Looking at the friends data and the
examScores array value, what if we only wanted to output the first value of that array instead of all
the examScores values from the array?
$ db.friends.aggregate( [
{ $project: { _id: 0, examScore: { $slice: [ “$examScores”, 1 ] } } }
] ).pretty( )
The $slice operator allows us to get back a slice of an array. It takes an array of arguments: the first argument is the array we wish to slice, which can be hardcoded or point to a collection field. The second argument is the number of elements we want to slice from the start of the array. In the above we sliced the first array value. If we use a negative value as the second argument, for example -2, MongoDB will slice the last two values from the array.
If we want to retrieve one element but starting at the second element, then the syntax is slightly different: the second argument becomes the starting position and the third argument is the number of elements we wish to slice. Indexing is zero based.
{ $project: { _id: 0, examScore: { $slice: [ “$examScores”, -2 ] } } }
{ $project: { _id: 0, examScore: { $slice: [ “$examScores”, 2, 1 ] } } }

Understanding the Aggregate Framework

Getting the Length of an Array
How do we get the length of an array, for example if we wanted to know how many exams a friend took? This can be achieved in the projection stage using the $size operator, which calculates the length of an array.
$ db.friends.aggregate( [
{ $project: { _id: 0, numScore: { $size: “$examScores” } } }
] ).pretty( )
We can either hard code the array or point the operator to the field which holds the array for which
we want to calculate the length and this will get stored in the new numScore (or whatever we wish
to call this) projection field.

Understanding the Aggregate Framework

Using the $filter Operator
What if we want to transform examScores to be an array where we only see scores higher than 60?
Now this can also be done in the projection stage because we want to transform every single record
but we do not want to group by anything. We just want to transform the array in there.

$ db.friends.aggregate( [
{ $project: { _id: 0, scores: { $filter: { input: "$examScores", as: "sc", cond: { $gt: [ "$$sc.score", 60 ] } } } } }
] ).pretty( )
The $filter operator allows us to filter out certain elements of an array and only return elements that fulfil a certain condition. In our scenario we want to filter for the score being greater than 60. The first argument is the input, i.e. the array we wish to filter. With the second argument (as) we assign a temporary name, known as the local variable, which we can name whatever we want. The last argument is the condition (cond), which can take a number of expression operators; in our example we used the greater-than operator.
Note that the $gt operator works a bit differently here than when we use it in match or find. Here $gt takes an array of the values it should compare, which makes sense as we are now in the context of another operator. We want to compare the temporary variable (i.e. the sc local variable), which refers to the individual elements of examScores, and see if its score is greater than 60. The $filter operator executes over and over again on all the elements in the array and compares each against the condition to build the returned array. A single $ sign tells MongoDB to look for a field name, while $$ tells MongoDB to refer to the temporary (local) variable.
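As a small sketch combining this with the $size operator from earlier (field names as in the examples above), we could count how many scores pass the condition instead of returning them:
$ db.friends.aggregate( [
{ $project: { _id: 0, numHighScores: { $size: { $filter: { input: "$examScores", as: "sc", cond: { $gt: [ "$$sc.score", 60 ] } } } } } }
] ).pretty( )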


Understanding the Aggregate Framework

Applying Multiple Operations to Arrays
What if we wanted to transform our friends data so that we only output the highest exam score for every person? We would still get three person results, but instead of the examScores array each result would only contain that person's highest exam score.
There may be multiple ways of achieving this, but below is one solution to the problem.
$ db.friends.aggregate( [
{ $unwind: “$examScores” },
{ $project: { _id: 1, name: 1, age: 1, score: “$examScores.score” } },
{ $sort: { score: -1 } },
{ $group: { _id: “$_id”, name: { $first: “$name” } , maxScore: { $max: “$score” } } },
{ $sort: { maxScore: -1 } }
] ).pretty( )
We first need to unwind the examScores array to get multiple documents per person, with the score as a top-level element, so that we can then sort the documents by score, group them back together by person and take the name and maximum score for that person. The $first operator tells MongoDB to return the first value it encounters for the group, and the $max operator returns the maximum value.
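The result is then one document per friend, roughly of this shape (the names and scores here are purely illustrative):
{ "_id" : ObjectId("..."), "name" : "Max", "maxScore" : 88.5 }
{ "_id" : ObjectId("..."), "name" : "Manu", "maxScore" : 74.3 }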

Understanding the Aggregate Framework

Understanding $bucket Stage
Sometimes we want to get a feeling for the distribution of the data we have, and there is a useful pipeline stage that can help us with that called the $bucket stage. The $bucket stage allows us to output our data in buckets, for which we can calculate certain summary statistics.
$ db.persons.aggregate( [
{ $bucket: { groupBy: “$dob.age”, boundaries: [ 0, 18, 30, 50, 80, 120 ], output: { numPersons:
{ $sum: 1 }, averageAge: { $avg: ”$dob.age” } } } }
] ).pretty( )
The $bucket stage takes a groupBy field where we define by which field we want to put our data into buckets; this tells the $bucket stage which input value to bucket on. We then define some boundaries, which are essentially our categories for the data. Finally, we define what we want to output for each bucket; this is where we define the structure of what we get back in our results. This allows us to get an idea of the distribution, the average age, etc. in each bucket.
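The returned bucket documents look roughly like this, with _id holding the lower boundary of each bucket (the counts and averages shown are illustrative, not actual results):
{ "_id" : 18, "numPersons" : 868, "averageAge" : 23.6 }
{ "_id" : 30, "numPersons" : 1828, "averageAge" : 39.4 }
{ "_id" : 50, "numPersons" : 1044, "averageAge" : 61.5 }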
There is an alternative to the above called $bucketAuto which, as the name suggests, does the bucketing algorithm for us. We simply define the groupBy field, the number of buckets and the output.

$ db.persons.aggregate( [
{ $bucketAuto: { groupBy: “$dob.age”, buckets: 5, output: { numPersons: { $sum: 1 },
averageAge: { $avg: ”$dob.age” } } } }
] ).pretty( )
Now each bucket holds almost the same number of documents, because MongoDB tries to derive an equal distribution; $bucketAuto can therefore be an even quicker way of getting a feeling for our data.
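With $bucketAuto, each bucket's _id describes the min and max boundary MongoDB derived, roughly like this (the values are illustrative):
{ "_id" : { "min" : 21, "max" : 32 }, "numPersons" : 1000, "averageAge" : 25.1 }
{ "_id" : { "min" : 32, "max" : 43 }, "numPersons" : 1000, "averageAge" : 37.4 }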

Understanding the Aggregate Framework

Diving into Additional Stages
We are going to look at some additional stages for which we have to get the syntax and the ordering right.
We want to find the 10 persons with the oldest birth date and thereafter we want to find the next 10.
$ db.persons.aggregate( [
{ $project: { _id: 0, name: { $concat: [ “$name.first”, “ “, “$name.last” ] }, birthdate: { $toDate:
“$dob.date” } } },
{ $sort: { birthdate: 1 } },
{ $limit: 10 }
] ).pretty( )

The $limit stage allows us to limit the number of entries we want to see; in the above we set this to 10. Notice that there is no cursor prompt, as we have exhausted the cursor: we only asked for 10 entries, and that is exactly what we get back.
If we want to see the next 10 eldest persons, we have another command called $skip stage and it is
important that this stage comes prior to the $limit stage as seen below.
$ db.persons.aggregate( [
{ $match: { gender: "male" } },
{ $project: { _id: 0, name: { $concat: [ "$name.first", " ", "$name.last" ] }, birthdate: { $toDate: "$dob.date" } } },
{ $sort: { birthdate: 1 } },
{ $skip: 10 },
{ $limit: 10 }
] ).pretty( )
This will skip the first 10 records and then show the next 10 records. The order of $sort, $skip and $limit is important. If $skip comes after $limit we will have no results returned, because $limit returns 10 documents and skipping 10 of those leaves zero. The order does not matter on the find method, but it does in the aggregation pipeline because our pipeline is processed step by step. The same is true for sorting: if we sort after $skip and $limit, we would get a totally different set of results. We would notice that we return persons who are not that old, because we would have skipped the first ten persons in our dataset as it is stored in the collection, taken the next ten persons, and then only sorted those ten persons we limited our output to. This also applies to the matching stage, where the order matters for what is returned from our aggregation pipeline.
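To make the ordering issue concrete, here is a sketch of the problematic order described above; with $limit before $skip, the 10 documents produced by $limit are all skipped and no documents are returned:
$ db.persons.aggregate( [
{ $project: { _id: 0, name: { $concat: [ "$name.first", " ", "$name.last" ] }, birthdate: { $toDate: "$dob.date" } } },
{ $sort: { birthdate: 1 } },
{ $limit: 10 },
{ $skip: 10 }
] ).pretty( )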
This is a very important concept to understand when ordering the aggregation pipeline stages. We should also note that MongoDB performs some optimisation of our pipelines for us, so it might well fix such an issue, but we shouldn't rely too much on this; we should always try to build correct pipelines, with an order that gives us both the results and the performance we want.
MongoDB actually tries its best to optimise our Aggregation Pipelines without interfering with our
logic. More can be read about the default optimisations mongoDB performs:
https://docs.mongodb.com/manual/core/aggregation-pipeline-optimization/


Understanding the Aggregate Framework

Writing Pipeline Results Into a New Collection
We can take the result of a pipeline and write it into a new collection. To do this we need to specify
another pipeline stage called $out stage for output. The $out stage will take the result of our
operation and write it into a collection, either a new one or an existing one.
$ db.persons.aggregate( [ …
{ $out: “transformedPersons” }
])
If we run the show collections command in the shell, we should see the new collection appear; if we output the results to a collection that does not yet exist, it is created on the fly. The $out stage is great if we have a pipeline whose results we want to funnel straight into a new collection.
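As a minimal, self-contained sketch (the match stage and the target collection name malePersons are made up for illustration; any pipeline can precede $out):
$ db.persons.aggregate( [
{ $match: { gender: "male" } },
{ $out: "malePersons" }
] )
$ show collections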

Understanding the Aggregate Framework

Working with the $geoNear Stage
We are going to explore how to work with the $geoNear stage within mongoDB aggregation
framework using the transformedPersons collection as an example.

First we would create a geospatial index on the location field:
$ db.transformedPersons.createIndex( { location: “2dsphere” } )
Now with the index created, we can now use the transformedPersons collection for geolocation
queries as well as the geolocation/geospatial aggregation pipeline stage. The $geoNear stage
takes in a bunch of arguments to configure it.
$ db.transformedPersons.aggregate( [
{ $geoNear: {
near: { type: "Point", coordinates: [ -18.4, -42.8 ] },
maxDistance: 100000,
num: 10,
query: { age: { $gt: 30 } },
distanceField: "distance"
} }
] ).pretty( )
Firstly, we need to define the point where we are for which we want to find close points. This is
because $geoNear allows us to simply find elements in our collection which are close to our current
position. The near argument takes in a GeoJSON object.

The second argument defines the maxDistance in meters. We can limit the amount of results we
want to retrieve in the num argument.
The third argument allows us to add a query where we can filter for other things. This is available
because $geoNear has to be the first element in a pipeline because it needs to use the geospatial
index. The first pipeline stage is the only stage with direct access to the collection while other
pipeline stages just get the output of the previous pipeline stage. Therefore if we have any filters
which we want to run directly on the collection we can add it here and mongoDB will execute a very
efficient query against the collection and not force us to use a match stage, which will mean that we
have to fetch all the data in order to be able to match in the next step.
Finally, we can use the distanceField argument. $geoNear gives back the distance it calculated between our point and each document it found, and we can tell MongoDB in which new field it should store that value. We named this new field distance, but it could be named anything we want. When executing the aggregation pipeline, we will notice a new distance field in the output.
This is how we can use $geoNear as a pipeline stage. The most important thing to remember with $geoNear is that it must be the first pipeline stage; thereafter we can add all the other stages we have looked at before.

Useful Links:
Official Aggregation Framework:
https://docs.mongodb.com/manual/core/aggregation-pipeline/
Learn more about $project:
https://docs.mongodb.com/manual/reference/operator/aggregation/project/
Learn more about $cond:
https://docs.mongodb.com/manual/reference/operator/aggregation/cond/


Working with Numeric Data

Number Types an Overview
Which important number types do we have to differentiate? These are mostly integers, long integers (a.k.a. longs) and doubles, which also come in different variants. To be precise, we can work with the four number types shown below. The table below provides details on each number type:

Integers ( int32 ): Only full numbers. Range: -2,147,483,648 to 2,147,483,647. Used for normal integers.

Longs ( int64 ): Only full numbers. Range: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. Used for large integers.

Doubles ( 64bit ): Numbers with decimal places. Decimal values are approximated. Used for floats where high precision is not required.

High Precision Doubles ( 128bit ): Numbers with decimal places. Decimal values are stored with high precision (up to 34 decimal digits). Used for floats where high precision is required.
MongoDB by default stores numbers as a 64bit double when passing in the number through the
shell no matter if the number is theoretically an integer and has no decimal places. It is important to
note that the decimals are approximated and not guaranteed/stored with high precision.
If we know a number is within an integer range, we should consider using an integer because it will
simply take up less space than if we just enter it as a normal value and therefore automatically
stored as a 64bit double. We should use a long if we are working with full numbers above the
integer threshold range.
We can use doubles for basically all values where we do not need high precision i.e. the quick and
lazy approach to storing numbers, but it is also a valid approach for storing numbers that have
decimal places where we do not need high precision.
Finally, we have high precision doubles if we need high precision for calculations with monetary/
scientific data calculations.


Working with Numeric Data

Understanding Programming Language Defaults
It is important to note that the mongoDB Shell is based on JavaScript and runs JavaScript. This is
also the reason why we can use JavaScript syntax in the shell. The default data types are the default
JavaScript data types. This matters especially for the numbers.
JavaScript does not differentiate between integers and floating point numbers and every number is
a 64bit float instead. So 12 and 12.0 are seen as exactly the same number in JavaScript and
therefore also in the Shell and stored as such. Behind the scenes the number is stored with some
kind of imprecision because it is a 64bit float (i.e. 12.0 could be stored behind the scenes as
12.0000003).
This is inherent to the mongoDB shell because it is based on JavaScript and we would face the
exact same behaviour when working with nodeJS mongo driver. However, for other languages and
their drivers e.g. Python this would differ. In Python 12 would be stored as an integer and a value of
12.0 would be stored as a float because Python does differentiate the numeric data types. We
would always need to know the language we are working in and know the defaults for the language
i.e. does it differentiate between integers and doubles? If yes, what is the default integer - is it an int
32? We would then know if we need to convert the number.

Working with Numeric Data

Working with Int32
Why would we use Int32? Let’s say we have a collection in our database and we insertOne a person
record. This person has a name and an age. We would normally store the data as seen below:
$ db.persons.insertOne( { name: “Andy”, age: 29 } )
Now there is nothing wrong with the above, and if we retrieve our data using the findOne( ) method we would see that the age appears to be stored as 29. Because it is a 64bit float/double, the integer part is stored with full precision while the decimal part is not. Even though we do not see a decimal part, it is stored behind the scenes and there may be some imprecision at some point. In our app we would use this number as an integer, so we may not care about some imprecision, but it is worth noting that it exists.
$ db.persons.stats( )
If we look at the stats for the collection we would notice the size of the document e.g. a size of 49,
and this is due to a single entry with a name and an age. If we delete all entries and insertOne again
but this time looking at the age, we should notice the difference in size when using a double or
int32.
$ db.persons.deleteMany( { } )

$ db.persons.insertOne( { age: 29 } )
$ db.persons.stats( )
Notice the size is now 35 for the one entry in our database. If we now contrast this with an int32 for
the age we would see a difference in size.
$ db.persons.deleteMany( { } )
$ db.persons.insertOne( { age: NumberInt(29) } )
$ db.persons.stats( )
NumberInt (a constructor within the shell) allows us to store a number as an int32 rather than the default float/double. It is important to note that we can pass in the number as seen above or wrapped in quotation marks. We will now notice the size of the single age entry is 31, which is slightly smaller. This is one reason why it might be worth considering using int32.
If we are using the drivers, we would need to look at the driver documentation for our language to see how to perform the conversion to int32. NumberInt is a method only for the shell (which is based on JavaScript), while each driver is specific to its programming language and that language's default number type, and provides its own way to convert into an int32.


Working with Numeric Data

Working with Int64
In this example we will look at storing a very large int64 value, for example a company's net value. An int32 can only store numbers up to roughly 2.1 billion, so if we have a company valued higher than this we need to store the valuation as an int64 value.
Below is an example of storing a company worth 5 billion while trying to use an int32. This will store successfully without any errors; however, if we try to find this record we will notice that what is actually stored is a totally different value. The reason for this is that we exceeded the available range limit, so MongoDB ends up storing a different value to the original.
$ db.companies.insertOne( { valuation: NumberInt(“5000000000”) } )
$ db.companies.findOne( )
If we store a number just above the 32bit integer range using the default 64bit double, we would notice that it is stored without error, because a 64bit double can represent integers beyond the 32bit range. However, a 64bit double does not have the same range as a 64bit integer, because the double does not just store integers but also handles decimal places, so the 64 bits are not fully available for integer values.
$ db.companies.insertOne( { valuation: 2147483648 } )

If we have really large numbers, the best way to store them and guarantee that we can store the biggest numbers supported by the int64 range is to use the NumberLong wrapper (if using a driver, we should refer to the driver guide and check the default number type for that language).
$ db.companies.insertOne( { valuation: NumberLong(2147483648) } )
Let's say we want to store the largest possible int64 number, as seen below. We will notice that we now get an error, even though we are in the range of accepted values. The problem is that the literal is first handled as a 64bit double by the JavaScript shell before it gets wrapped by NumberLong, and the double cannot represent this value exactly.
$ db.companies.insertOne( { valuation: NumberLong(9223372036854775807) } )
To fix the above we should wrap the number in quotation marks instead and this should now work
without any errors.
$ db.companies.insertOne( { valuation: NumberLong(“9223372036854775807”) } )
It is really important to understand that both NumberInt and NumberLong can be used with a plain number as the value as well as with a number wrapped in quotation marks. We should
always use quotation marks i.e. basically pass in a string representation of the number because
mongoDB internally will convert the string and store it appropriately as a number. If we pass a
number it still faces the JavaScript limitations in the shell whereas a string does not.

Working with Numeric Data

Performing Maths with Floats, int32s & int64s
Previously we saw that we could store numbers as text and this allows us to store really huge
numbers without losing the accuracy of the number. However, the problem we face with storing
number as a string is that we are not able to perform mathematical calculations using string values.
The below demonstrates the problem when calculating using strings:
$ db.accounts.insertOne( { amount: “10” } )
$ db.accounts.updateOne( { }, { $inc: { amount: 10 } } )
The increment function will not work and this will cause an error in the terminal of “Cannot apply
$inc to a value of a non-numeric type”. We cannot use strings to calculate; however, calculations will
work with NumberInt and NumberLong though.
$ db.accounts.insertOne( { amount: NumberInt(“10”) } )
$ db.accounts.updateOne( { }, { $inc: { amount: 10 } } )
MongoDB will convert the string into an int32 number when we insert; however, when we increment in the update command, MongoDB will automatically convert the number into a float 64bit number. To keep the number as an int32 we would need to write the update command like so:

$ db.accounts.updateOne( { }, { $inc: { amount: NumberInt(“10”) } } )
This will succeed as we are no longer working with strings but these special number types provided
by mongoDB. We must remember that we must work with either NumberInt or NumberLong within
our insert and update commands should we wish for the number to be stored behind the scenes as
either an int32 or int64, else mongoDB will store the number as a float 64bit behind the scenes
(although in the terminal it is displayed as a integer).
Note: if we have an int64 number and we update the value without specifying NumberLong, MongoDB will convert the number into a float 64bit; due to the precision limit of the 64bit float the number gets rounded. In the example below, the value after the increment was stored as 123456789123456780:
$ db.companies.insertOne( { valuation: NumberLong("123456789123456789") } )
$ db.companies.updateOne( { }, { $inc: { valuation: 1 } } )
If we update the number using the NumberLong constructor provided by MongoDB, then this will correctly increment the int64 number by 1, i.e. 123456789123456790.
$ db.companies.updateOne( { }, { $inc: { valuation: NumberLong(“1”) } } )
$ db.companies.findOne( )

Working with Numeric Data

What's Wrong with Normal Doubles?
We have looked at both int32 and int64 numbers, and it is important to note that we can also sort and query on them using the special NumberInt and NumberLong constructors MongoDB provides. We are now going to look at the double number types.
Lets say we have some database which we use for scientific calculations as a hypothetical example.
The default type in the shell is the 64bit floating point number and so if we insertOne document, as
seen below, this will be stored as such and not the high precision decimals.
$ db.science.insertOne( { a: 0.3, b: 0.1 } )
$ db.science.findOne( )
When we find the document, the number will look good as well i.e. they look the same as we stored
them. However, behind the scenes they will actually be stored slightly different for example 0.3 will
be stored something like 0.3000000001 (there will be more decimal places than what is displayed
within the shell and inherently some imprecision).
We will notice the difference between the default 64bit floating point and the high precision
decimal when we perform some form of calculation.

$ db.science.aggregate( [ { $project: { result: { $subtract: [ “$a”, “$b” ] } } } ] )
If we perform the above calculation, using the aggregation projection (or any regular calculation
method), we will notice that the number for the calculation of a - b (i.e. 0.3 - 0.1) is not equal to 0.2
but rather mongoDB shows the result of 0.19999999999999998 which is not what we expected.
This demonstrates the imprecision we are talking about when it comes to the default 64bit floating
point number type.
These values are not stored exactly as we insert them. In some use cases this might not even matter. For example, if we are building an online shop with products where we store the price, we might only display the price on the web page, and the approximation is alright because we only show 2 decimal places, which will be displayed correctly. Even if we are charging, we might be fine, because we would send the amount returned from the database to some kind of third-party payment service and rely on that provider charging exactly that amount and not some incrementally lower amount.
If we work with the number and perform some calculation on the server (as aggregate performs on
the mongoDB server) then we might have a problem because the result may not be acceptable for
the application; this is where the high precision double, the 128bit double, can help us.


Working with Numeric Data

Working with the 128bit Decimal
Continuing on from the previous example, we will look at the syntax (i.e. the constructor) in order to
use a high precision decimal.
$ db.science.insertOne( { a: NumberDecimal(“0.3”), b: NumberDecimal(“0.1”) } )
$ db.science.findOne( )
NumberDecimal is the MongoDB constructor for creating a 128bit decimal value. Again, if we were using drivers for Python, Node, C++, Java, etc., we would need to look at the documentation for the driver to find the constructor to use to create a 128bit decimal. Again we wrap the number in quotation marks; we do not strictly have to, but if we do not, the number is first parsed as a 64bit double and we face the imprecision issue again.
$ db.science.aggregate( [ { $project: { result: { $subtract: [ “$a”, “$b” ] } } } ] )
If we now run the aggregation command we saw before, we should see the result we were expecting, i.e. exactly 0.2 (an exact decimal number). Just like the other number types, we can calculate, sort, filter, update, etc. using this high precision decimal constructor. The same rules we saw for int32 and int64 also apply to the 128bit decimal.
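As a small sketch of that rule, an update keeps the 128bit decimal type only if we pass a NumberDecimal into the operation as well (collection and field names as in the example above):
$ db.science.updateOne( { }, { $inc: { a: NumberDecimal("0.1") } } )
$ db.science.findOne( )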

It is worth noting that the high precision does come at a price to the size of the document.
$ db.nums.insertOne( { a: 0.1 } )
$ db.nums.stats( )
$ db.nums.deleteMany( { } )
$ db.nums.insertOne( { a: NumberDecimal("0.1") } )
$ db.nums.stats( )
We will notice the size of the default 64bit floating decimal for this document is 33 whereas the size
for the 128bit floating decimal is 41. This is the cost we must pay when using the high precision
decimal because more space is reserved for this number type. Therefore using this for all decimal
values may not be optimal, but cases where we do need the high precision then this is the solution
for performing mathematical calculations without losing the precision.
Useful Links:
Float vs Double vs Decimal - A Discussion on Precision:
https://stackoverflow.com/questions/618535/difference-between-decimal-float-and-double-in-net
Modelling Number/ Monetary Data in MongoDB:
https://docs.mongodb.com/manual/tutorial/model-monetary-data/

Number Ranges:
https://social.msdn.microsoft.com/Forums/vstudio/en-US/d2f723c7f00a-4600-945a-72da23cbc53d/can-anyone-explain-clearly-about-float-vs-decimal-vs-double-?forum=csharpgeneral

MongoDB & Security

Understanding Role Based Access Control
Authentication and Authorisation are actually two concepts which are closely related. Authentication
is all about identifying users in our database, while authorisation on the other hand is all about
identifying what these users may then actually do in the database. MongoDB uses these two
concepts to control who is able to connect to the database server and then what these people who
are able to connect can do to the database. It is important to understand that when we say people or users, we mean apps/users directly interacting with the MongoDB server, and not the end users of the application we are building.
It is important to fully understand the authentication system mongoDB uses. MongoDB employs a
role based access control system. Below is an example diagram to demonstrate how the
authentication system works.


Diagram: A user (a data analyst or an application, not the end user of our application) logs in with a username and password against a MongoDB server with authentication enabled. The server hosts, for example, a Shop database (Products and Customers collections), a Blog database (Posts and Authors collections) and the special Admin database. The user is granted privileges, which are made up of resources and actions.

We have a mongoDB server with three databases, the admin database being one of them (the admin database is a special database that exists out of the box). We enable authentication, and suddenly our mongoDB server only allows authenticated users to interact with the collections, documents and the overall server.
Lets say we have a user, it is important to note the user is either a real person like a data analyst who
directly connects to the database through the shell or an application which uses the mongoDB
driver to connect to our database. They need to login with a username and a password. MongoDB
automatically has a process in place where we have to enter a username and password.

The user must exist on the mongoDB server, otherwise logging in will be impossible. If the user exists and we log in, that alone does not allow us to do anything yet. Users in mongoDB are not just entities made up of a username and password; they are also assigned roles, and these roles are essentially groups of privileges.
A privilege is a combination of a resource and an action. A resource would be something like the products collection in our shop database, and an action would be something like an insert command. So actions are essentially the kinds of tasks/commands we can execute in our mongoDB database, and the resource is the collection on which we can execute the command.
Privilege example - Resource: Shop => Products, Action: insert( )

So in the privilege example, we have access to insert documents into the Products collection in the
Shop database. This will allow us to do something as a logged in user. This is what is known as a
privilege and typically we do not just have one privilege but instead multiple privileges are grouped
into a so-called roles and therefore a user has a role and that role includes all the privileges.
This is the model mongoDB uses and this is a very flexible model because it allows us to create
multiple users where we as the database owner or administrator can give every user exactly the
rights the user needs which is very important.

MongoDB & Security

Creating a User
Users are created by a user with sufficient permissions with the createUser( ) command. We then
create a user with a username and a password and as mentioned this user will have one or more
roles and then each role will typically contain a bunch of privileges. Now a user is created on a
database, so a user is attached to a database.

createUser( ) / updateUser( ) - create or update a user (e.g. "ashleywilliams") with roles/privileges, on a database (e.g. admin). Access is NOT limited to the authentication database.

This does not actually limit the access of the user to that database only as we might think. This is the
database against which the user will have to authenticate, the exact rights that the user has will
depend on the roles we have assigned to that user. So even if the user authenticates against the
shop database, the user might still have the role to work with all databases that exists in our
mongoDB environment. Whilst this sounds strange, it will become more apparent as to why we can
assign users to different databases.
Not only can we create users, we can also update them if we ever need to. This means that the administrator can update a user, for example to change the password (in which case we should tell the user too).

To update the user we would use the updateUser( ) command.
So to see this in practice we would use the following commands. First we must bring up the
mongoDB server but now adding the --auth flag. This flag simply means that we now need to
authenticate to work with that database.
$ sudo mongod --auth
Previously we did not use this flag, which meant that everyone who connected was allowed to do everything. That ends with this setting: when we start the server with it, users are required to authenticate. If we open up a second terminal and connect using mongo, we might expect it to fail, but we are actually still able to connect. However, we will face authorisation issues whereby nothing is displayed if we use the show dbs or show collections commands.
$ mongo
There are two different ways of signing in. We can either sign in with the db.auth( ) command, which takes a username and password, or during the connection process using the -u and -p flags for the username and password. Both are demonstrated below:
$ db.auth( "Abel", "Password" )
$ mongo -u Abel -p Password

The issue we have is that we have no users created; however, mongoDB has a special solution
called the localhost exception. When connecting to our database in this state, where we have not
added any users, we are allowed to add one user and that one user should of course be a user who
is then allowed to create more users. To do this we first of all should switch to the admin database
and then use db.createUser command.
$ use admin
$ db.createUser( { user: "Abel", pwd: "Password", roles: [ "userAdminAnyDatabase" ] } )
A user is created by passing a document into the createUser command: the user field holds the username, the pwd field holds the password, and the roles key holds an array of the roles the user should have. We can add roles in different ways, but the one role we should add here is the userAdminAnyDatabase role. This is a special built-in role which grants the user the right to administrate any database in this mongoDB environment. Once the user has been created, we then
need to authenticate using the .auth( ) command:
$ db.auth( “Abel”, “Password” )
When executing the command we must make sure we are within the admin database to
authenticate. Once we have authenticated we should be able to run the show dbs or show
collection commands without any error.

MongoDB & Security

Built-in Roles Overview
MongoDB ships with a bunch of built-in roles that basically cover all the typical use cases we have.
We can also create our own roles which is beyond the scope of this guide because this will be a
pure admin task — there will be a link to the official document where we can learn how to create
such roles and it’s actually pretty simple.
So what built-in roles does mongoDB provide? We have typical roles as seen in the below table:
Database User: read, readWrite
Database Admin: dbAdmin, userAdmin, dbOwner
All Database Roles: readAnyDatabase, readWriteAnyDatabase, userAdminAnyDatabase, dbAdminAnyDatabase
Cluster Admin: clusterManager, clusterMonitor, hostManager, clusterAdmin

Sometimes we will have cases where we want to give a user rights to do something in all databases
and also in all future databases and we do not want to manually add this, therefore we have these
all database roles.

Besides these roles we also have cluster administration roles. Clusters are essentially constructs
where we have multiple mongoDB servers working together. This is used for scaling, so that we can
have multiple machines running mongoDB servers and store our data, which then works together.
Managing these clusters of servers is a meaningful task performed by people who know what they are doing; they have their own rights, and we can assign these rights so they can do their job.
Backup / Restore: backup, restore
Super User: dbOwner (admin), userAdmin (admin), userAdminAnyDatabase, root

We also have special backup and restore roles in case we have some users who are only
responsible for this type of job.
Finally, we have the super user roles. If we assign the dbOwner or userAdmin role on the admin database, this is a special case: because the admin database is a special database, these users are then able to create new users and also change their own roles, which is why they are super users and very powerful roles. The root role is the most powerful role, as this user can do everything (i.e. the same rights we had before locking down the database with the --auth flag).


MongoDB & Security

Assigning Roles to Users & Databases
An alternative method of signing into the mongoDB database is to use the flags, but it is important to add the --authenticationDatabase flag if we have not switched to the database the user was created on.
$ mongo -u Abel -p Password --authenticationDatabase admin
This will successfully log us into the database without any errors and we do not need to run db.auth
again. Below is another example of creating a new user on the shop database and assigning a role.
$ use shop
$ db.createUser( { user: “appdev”, pwd: “dev”, roles: [ “readWrite” ] } )
We now have a new user and we can log in as the user using the db.auth command passing in the
username and password as arguments.
$ db.auth( “appdev”, “dev” )
The 1 in the shell will signify the login was successful. If we now try to insert a new document into
the shop collection we would notice that we get an error of “too many users are authenticated”.

To mitigate this we should have run the logout command before switching to the new user.
$ db.logout( )
The quickest way to fix this error is to quit the mongo shell and then restart it.
$ ^C
$ mongo -u appdev -p dev --authenticationDatabase shop
Notice the appdev authenticationDatabase is not admin but rather shop. This is because the user
was created in the shop database and must be authenticated against the shop database. If we try
against the wrong database the login will fail.
If we now try to insert a new product into the shop database logged in as the appdev user, we would still get an error if we are not currently on the shop database; we must first switch to it with the use command (for example after quitting the mongo shell and logging back in).
The readWrite role was assigned to this user, but since the user was created on the shop database,
the readWrite role by default also only gave us readWrite access to that shop database. This is the
one thing that assigning a user to a database does for us out of the box. Therefore we can really
create the users who are assigned to the database they should work with and they only have the
necessary roles, so that the user would have to login against the database they should work with.

MongoDB & Security

Updating & Extending Roles to other Databases
If we want to change our users roles so that they can access another database we can use the
updateUser command. We must perform this command with a user account that has access to
create/update users, otherwise we would receive an error. Furthermore, we must execute the
command connected to the shop database in order to find the user account that requires updating:
$ db.updateUser( “appdev”, { roles: [ “readWrite”, { role: “readWrite”, db: “blog” } ] } )
The first argument to the updateUser command is the username of the user that we want to update.
The second argument is a document describing of the changes/update we want to make to that
user. We have a couple of key value pairs we can use which can be found in the official
documentations (e.g. pwd is a key we can use to change the password of a user).
It is important to note that the roles we specify here are not added to the existing roles; they replace the user's existing roles. The first entry in the roles array grants the readWrite role on the database the user was registered against, while the second entry is a document granting the user the readWrite role on another database (blog). We can check the user account for the update:
$ db.getUser( “appdev” )
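The returned user document should then look roughly like this (metadata fields such as authentication mechanisms that newer server versions add are omitted here):
{
"_id" : "shop.appdev",
"user" : "appdev",
"db" : "shop",
"roles" : [ { "role" : "readWrite", "db" : "shop" }, { "role" : "readWrite", "db" : "blog" } ]
}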

Important Note: When logging into an account using the auth command we must either be in the
database the user is registered to or use the --authenticationDatabase flag to authenticate against
the database the user was created. When logging out using the logout command we must be within
the database the user account is registered to. Therefore, using the use command to switch to the correct database before running commands is crucial, else this could be the cause of errors in the shell such as "too many users are authenticated". Ultimately we can
quit the shell and re-log into the shell which will log us out from all accounts completely.
We can also create a user with multiple roles on different databases, without giving any roles on the database the account is registered to, as seen below:
$ db.createUser( { user: "appdev", pwd: "dev", roles: [ { role: "readWrite", db: "customers" }, { role: "readWrite", db: "sales" } ] } )

MongoDB & Security

Adding SSL Transport Encryption
Now that we know how to lock down our database using users, we are now going to look at making
sure data that is transferred from our app to the database is also secure. The app could be written in Node, PHP, Python, C++ or any other language that uses a MongoDB driver to communicate with the mongoDB server and store data. It is important that the data is encrypted whilst it is in transit, so that someone intercepting our connection cannot read our data, and mongoDB has
everything we need for this built in. We are going to now explore how we can easily secure our data
whilst it is on its way from client to server.
MongoDB uses SSL (actually TLS which is the successor to SSL). SSL in the end is just a way of
encrypting our data whilst it is on its way. We will use a public private key pair to decrypt the
information on the server and to prove that we as the client are who we make the server think we
are. So essentially it is a secure way of encrypting our data and decrypting it on the server and whilst
it is on its way, it is consistently encrypted.
We will need a command to execute which will create us the files we need to enable SSL encryption.
On mac/linux we can simply run the command in the terminal but on windows this will not work. For
Windows OS we need to install openSSL in order to run the command.
$ openssl req -newkey rsa:2048 -new -x509 -days 365 -nodes -out mongodb-cert.crt -keyout mongodb-cert.key
We will be asked a couple of questions; note that we can skip some of the first few questions by pressing dot and then enter:
$ Country Name: UK
$ State or Province Name (full Name) [Some-State]: .

$ Locality Name (eg, city) []: London
$ Organization Name (eg, company) [Internet Widgits Pty Ltd]: .
$ Organization Unit Name (eg, section) []: .
$ Common Name (e.g. server FQDN or your name): localhost
$ Email Address []: email@email.com
The important part of the setup is the Common Name. We must fill in localhost during development whilst running this on localhost. If we were to deploy our Mongo database onto a server on the web, we would need to fill in the address of that web server. This is important because it will later be validated that the server we are connecting to is the server mentioned in our certificate; otherwise the connection will fail.
Once we add the email address we are done and should have two files, mongodb-cert.key and mongodb-cert.crt. We now need to concatenate both of these files into one file, using the following command on Mac/Linux:
$ cat mongodb-cert.key mongodb-cert.crt > mongodb.pem
On Windows we would run the command instead:
$ type mongodb-cert.key mongodb-cert.crt > mongodb.pem

We will now have a mongodb.pem file, and this is the file we need to enable SSL encryption. How does this work? We can copy or move the .pem file around in our file system, but from the folder where the file lives we can start our mongod server with SSL enabled.
Running mongod --help will show us all the SSL options available to us. One of the options we need to set is the mode. The --sslMode flag defines whether SSL is disabled, allowSSL (allows connections with or without SSL), preferSSL (only important if we are using replica sets) or requireSSL (connections must use SSL, else they are denied).
We would need to point to our .pem file using the --sslPEMKeyFile flag. We can also pass a --sslCAFile argument for a certificate authority file, which we can obtain from an official certificate authority online (paid or unpaid). This is passed in addition to our .pem file, and the CA file is an extra layer of security that ensures man-in-the-middle attacks can be prevented.
If we deploy our mongoDB database in production, we would get our SSL certificate from a certified
authority and they would give us a .pem file and a .ca file so that we can basically add both
arguments and point at the respective files when launching our server.


Mac/Linux:
$ sudo mongod --sslMode requireSSL --sslPEMKeyFile mongodb.pem
Windows:
$ mongod --sslMode requireSSL --sslPEMKeyFile mongodb.pem
This will start the server and we will see a warning that we have no SSL certificate validation in place
because we are missing that sslCAFile, but besides this we now have our server which is now
waiting for connection on port 27017 ssl.
To connect to the server, we should navigate in a new terminal window, to the same folder where
we have the .pem file and launch our mongo client. Running the mongo command should now fail,
if it succeeds then we have connected to another mongoDB instance which we can shut down using
the db.shutdownServer( ) command. This should fail because we have no mongod running which
would allow connections from non-SSL clients.
We need to set two things: first we need to enable SSL using the --ssl flag, and we will need to pass our .pem file as the --sslCAFile. We may also need to add the host (which we specified during certificate creation) for this command to work:
$ mongo --ssl --sslCAFile mongodb.pem --host localhost

If we do not specify the host, the shell will try to connect to 127.0.0.1, which is the localhost IP address but technically a different name and therefore not considered equal to localhost. By specifying --host localhost we have made it clear that this is the host name we expect to see on the backend; this is also the host on the certificate, and therefore it works.
Now obviously we can have more elaborate setup, but this will do for now and demonstrates how
we can generally connect with SSL turned on and now all data we send from the client i.e. from the
mongo shell to the server (mongod) will be encrypted.

MongoDB & Security

Encryption at REST
Encryption at REST simply means that the data we stored on our mongoDB server in a file, this
might also be encrypted. We can encrypt two different things.
We can encrypt the overall storage, so the files themselves; this is a feature built into MongoDB Enterprise. As developers we should also always encrypt or hash at least certain values in our code, for example user passwords should be hashed and never stored as plain text. We could go as far as hashing/encrypting all data.
We can encrypt and hash both our data and the overall files to have the most security possible.

The hashing of passwords is something we will see in the From Shell to Driver chapter.
Useful Links:
Official MongoDB Users & Auth Doc:
https://docs.mongodb.com/manual/core/authentication/
Official Security Checklist:
https://docs.mongodb.com/manual/administration/security-checklist/
What is SSL/TLS?:
https://www.acunetix.com/blog/articles/tls-security-what-is-tls-ssl-part-1/
Official MongoDB SSL Setup Doc:
https://docs.mongodb.com/manual/tutorial/configure-ssl/
Official “Encryption at Rest” Doc:
https://docs.mongodb.com/manual/core/security-encryption-at-rest/


Performance, Fault Tolerance & Deployment

What Influences Performance?
What influences performance?
On one hand, there are things which we directly or indirectly control as a developer.
1. We should write efficient queries and operations in general i.e. inserting, finding, etc. and all this
should be done in a careful way that we only retrieve data we need, we insert data in the right
format with the right write concerns and so on.
2. We should use indexes, either we have access to the database and we can create them on our
own or we need to communicate with our database admin, so that we can ensure that for the
queries our application does, we got the right indexes to support these queries to run efficiently.
3. A fitting data schema is also important. If we always need to transform our data, either on the
client side or when fetching it through the aggregation framework (if we need to do a lot of
transformation for common queries), then our data format as it is stored in our collection might
not be optimal. We should try to reach a data schema in our database that fits the application or
our use case needs.
This is the block of factors which we can influence as a developer and, in part, as a database administrator.
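To illustrate the second point, here is a quick example of creating an index and checking that a query actually uses it (the orders collection and the userId field are purely illustrative):

$ db.orders.createIndex( { userId: 1 } )
$ db.orders.find( { userId: 123 } ).explain( "executionStats" )

In the explain( ) output, an IXSCAN stage tells us the index was used, while a COLLSCAN stage means the whole collection had to be scanned.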

On the other hand, the hardware and network on which we deploy our mongoDB server and database matter. Sharding is another important concept and so are replica sets. These factors are not really tasks a developer needs to be involved in too much, as these are typical DB/System Admin tasks. We will not dive too deeply into them because they are very complex matters, but we do need to understand them at a high level to see the big picture of mongoDB and what it is all about.

Developer / DB Admin: Efficient Queries / Operations, Indexes, Fitting Data Schema

DB Admin / System Admin: Hardware & Network, Sharding, Replica Sets

Performance, Fault Tolerance & Deployment

Understanding Capped Collections
Capped Collections are a special type of collection which we have to create explicitly where we limit
the amount of data or documents that can be stored in the collection. The old documents will
simply be deleted when the size is exceeded. It is basically a store where the oldest data is
automatically deleted when new data comes in.
This can be efficient for high throughput application logs where we only need the most recent logs
or as a caching service where we cache some data and if the data then was deleted because it has
not been used in a while, then we are fine with this and we can just re-add it.
To create such a collection, we use the createCollection command:
$ db.createCollection( "collectionName", { capped: true, size: 10000, max: 10 } )
Firstly, we define the name of the collection and then pass in a document specifying the options for the collection. We set capped to true to enable the capped collection. The size option sets the maximum size of the collection in bytes; whatever value we set is automatically rounded up to a multiple of 256 bytes (very small values are raised to a minimum of 4096 bytes). The max option allows us to define the maximum number of documents that can be stored in the collection, i.e. capping by number of documents.

It is important to note that for capped collections, the order in which we retrieve the documents is always the order in which they were inserted. For a normal collection that may be the case but it is not guaranteed. If we want to change the order and sort in reverse, there is a special key we can use called $natural which allows us to sort by the natural order: positive 1 sorts in the normal order while negative 1 sorts in the reverse order. We can also create indexes in a capped collection, and we have an index on _id by default.
$ db.collectionName.find( ).sort( { $natural: -1 } ).pretty( )
The idea of a capped collection is not to give an error when we insert too many documents, but rather to clear the oldest document: if we are already at the maximum (say 100 documents), the next insert deletes the very first inserted document in the collection, so the total remains 100 documents.
When would we use a capped collection instead of a normal collection? Well because of the
automatic clear up. We can keep this collection fairly small and therefore the collection will be more
efficient to work with and we do not have to worry about manually deleting old data. So for use
cases where we need to get rid of old data anyways with the new data coming in or where we need
a high throughput and it is ok to lose old data at some point, like for caching, then the capped
collection is something we should keep in mind as a tool to improve performance.
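A quick illustration of this behaviour in the shell (the recentLogs name and the event documents are purely illustrative):

$ db.createCollection( "recentLogs", { capped: true, size: 10000, max: 3 } )
$ db.recentLogs.insertOne( { event: "login" } )
$ db.recentLogs.insertOne( { event: "logout" } )
$ db.recentLogs.insertOne( { event: "purchase" } )
$ db.recentLogs.insertOne( { event: "error" } )
$ db.recentLogs.find( )

Because max is 3, the fourth insert pushes out the oldest document, so the find( ) only returns the "logout", "purchase" and "error" documents.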


Performance, Fault Tolerance & Deployment

What are Replica Sets?
Replica sets are something we would create and manage as a database or system administrator. What are replica sets? Let's say we have our client, either the mongo shell or some native driver for Node, PHP, Python, etc. We want to write some data to our database, so we send our insert/write operation to the mongoDB server, which in the end talks to the primary node. Note that a node here is simply a mongoDB server; the single mongod process we have used thus far gave us one node, and this was the only node we had. So the mongoDB server is technically attached to that node, but it is easier to picture it like this:

[Diagram: the Client (Shell, Driver) sends a Write to the MongoDB Server, which writes to the Primary Node of the Replica Set; the Primary Node asynchronously replicates the data to two Secondary Nodes]

We can add more nodes which are called secondary nodes, which are essentially additional
database servers that we start up but are all tied together into what is known as a replica set. The
idea here is that we always communicate with the primary node automatically and we do not need
to do this manually. If we send an insert command to our connected mongo server, it will
automatically talk to the primary node, but behind the scenes the primary node will asynchronously
replicate the data on the secondary nodes. Asynchronously means if we insert data, it is not
immediately written to the secondary nodes but relatively soon.
So we have this replication of data. Why do we replicate data? If we have the setup seen above and we want to read data but, for some reason, our primary node is offline, we can reach out to a secondary node in the replica set, which can then be elected as the new primary node. The secondary nodes in a replica set hold a so-called election when the primary node goes down to elect a new primary node, and we then talk to that new primary node until our entire replica set is restored. We therefore get fault tolerance here, because even if one of our servers goes down we can talk to another node in that server network (in the cluster) to still read data. Since that node is now the primary, we can then not just read but also write data.

[Diagram: the Primary Node of the Replica Set is offline; the client's reads go to the newly elected Primary Node, which was previously a Secondary Node]

This is the reason we use replica sets. We get a backup and fault tolerance, and we also get better read performance. We can configure everything such that our backend will automatically distribute incoming read requests across all nodes. This applies to read requests only, as write requests always go to the primary node. The configuration of the reads is a task for the system/database administrator, and we want to ensure we can read our data as fast as possible. Therefore, with replica sets we get backups, fault tolerance and improved read performance.
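For reference, a minimal local replica set can be sketched with the standard commands below (the set name rs0, the ports and the dbpath folders are purely illustrative, and the dbpath folders must already exist):

$ mongod --replSet rs0 --port 27017 --dbpath /data/rs1
$ mongod --replSet rs0 --port 27018 --dbpath /data/rs2
$ mongod --replSet rs0 --port 27019 --dbpath /data/rs3

With the three nodes running, we connect to one of them with the mongo shell and initiate the set:

$ rs.initiate( { _id: "rs0", members: [ { _id: 0, host: "localhost:27017" }, { _id: 1, host: "localhost:27018" }, { _id: 2, host: "localhost:27019" } ] } )
$ rs.status( )

rs.status( ) then shows which node is the primary and which ones are secondaries.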

Performance, Fault Tolerance & Deployment

Understanding Sharding
Sharding and replica sets are sometimes confused with each other, but they are actually different things. Sharding is all about horizontal scaling. Say we have a mongoDB server (and we really mean a computer which runs our mongoDB server and where our database files are stored) and we need more power, because we are now getting more requests coming in, because we have more users and more read and write operations. What can we do? We can upgrade the server, i.e. buy more CPU, memory, etc. and put that into the server (using a cloud provider we can simply upgrade at the click of a button). This is a solution but it only gets us so far, because at some point we cannot squeeze any more CPU or memory into a single machine. At that point we need horizontal scaling, which means we need more servers.


The issue here of course is that the servers do not duplicate the data and they are not backups; instead they split the data. So server A stores data for the same application as the other servers, but a different chunk of the data. With sharding, we have multiple computers which all run mongoDB servers, but these servers do not work standalone; they work together and split up the available data, so the data is distributed (not replicated) across our shards and queries are run across all shards.

Therefore, queries for find, insert, update and delete operations have to be run against all the servers or against the correct server, because each shard manages its own chunk, or range, of the data. For example, if we were to split alphabetically stored data, the first server would store A to F, the next server F to L and so on, and the operations would have to be forwarded to the correct servers. How does this work?
We have our different mongod instances (our different servers/shards), and each shard can and typically will itself be a replica set (i.e. we have multiple replica sets, because each shard is also a replica set). Between these servers and our client sits a new middleman which we have not used in this guide, called mongos. mongos is a separate executable offered by MongoDB which acts as our router.


The mongos router is responsible for forwarding our operations such as inserts, reads, updates and so on to the correct shards. So the router has to find out which shard is responsible for the data we are inserting, i.e. where should this data be stored, and which shard holds the data we want to retrieve.

[Diagram: the Client talks to mongos (the router), which forwards operations to several mongod instances (Server / Shard); each shard is responsible for its own range of the Shard Key]

For this splitting we use a so-called shard key. A shard key is essentially just a field that is added to every document and which is important for the server to understand where this document belongs. The shard key configuration is actually not trivial, because we want to ensure we have a shard key that is evenly distributed. We can assign our own values, but we should ensure that the data is roughly evenly distributed so that the data is not all stored on one server.
Example of Sharding — We issue a find query on our client to look for a user named Andrea, this
reaches mongos and now there are two options:
The first option is that our find query did not specify a value for the shard key e.g. we are looking for
Andrea but our shard key is some other value and not the name. In our find filter there is no
information regarding the shard key and therefore mongos does not know which shard is
responsible for handling this request. In such a case mongos has to broadcast our request to all
shards, and each shard then has to check whether it is responsible, i.e. whether it has the data, and each shard returns its response, which is either the data or an empty result. Mongos will then have to merge all that data together and return it.
The second option is that our find query does contain the shard key, e.g. the shard key is the username and we are searching for the username in our find filter. Mongos can directly forward this to the correct shard and fetch the data from where it is stored, which of course is more efficient.
Choosing the shard key wisely is important and usually a job of the admin. This is an important part
of mongos, the router, finding out where an operation should go i.e. which server is responsible for
the operation.
As a developer, if we know that we are using sharding, we should sit together with the administrator and choose a wise shard key based on our application needs, one that is evenly distributed, so that we can write queries that use the shard key as often as possible.
Sharding is all about distributing data across servers and setting up everything so that data can be
queried and used efficiently. This is an advanced topic just as with replica sets and is something a
developer will not have to worry about but is something we should be aware of.
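For completeness, once a sharded cluster (config servers plus a mongos router) is in place, enabling sharding for a database and a collection uses standard commands like the ones below, run against mongos (the blog database and the hashed name shard key are purely illustrative):

$ sh.enableSharding( "blog" )
$ sh.shardCollection( "blog.users", { name: "hashed" } )
$ sh.status( )

Setting up the sharded cluster itself is an administrator task and is beyond the scope of this guide.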

Performance, Fault Tolerance & Deployment

Deploying a MongoDB Server
We now want to get our local mongod instance onto the web, i.e. we want to get it onto a server that we can reach from anywhere and not only from inside our local computer. Deploying
a mongoDB server is a complex task and is definitely more complex than just deploying a website
because we need to do a lot of configuration. We need to manage shards if we have sharding, we
have to manage replica sets, setup authentication to the database, protect the web server and the
network (totally unrelated to mongoDB), ensure software stays up to date (e.g. general software,
mongoDB related software and security patches etc), setup regular backups (including backup to a
disk) and encryption both during transportation and at rest.
There are a lot of complex tasks that we have to manage when deploying and is beyond the scope
of most developers as this is very much a system administrator task. This is something outside the
scope of this guide, but we can look at a managed solution provided by mongoDB called mongoDB Atlas, which gives us a scalable, best-practice mongoDB server running in the cloud which we can configure through a convenient interface. We can scale the servers up and down, and there is also a free tier which provides all of the things mentioned above automatically.


Performance, Fault Tolerance & Deployment

Using MongoDB Atlas
If we visit mongoDB.com there is a chance that we see the mongoDB Atlas on the home page
where we can click on the get started free button. We are then required to sign up to the service
and there is no credit card required so that we can start absolutely free without any danger of
getting charged.
Once signed in we will either be greeted by the dashboard page with the clusters option selected, or have the option to create a project first before being shown the dashboard.
A cluster simply describes the mongoDB environment. A cluster contains all our replica sets and
shards. It is basically what we would deploy, i.e. our deployed mongoDB server. On the cluster option
page we can click on build a new cluster and this will take us to a page where we can now configure
our mongoDB environment.
Generally, we have a global cluster configuration which is not available for free but we can choose
different parts of the world where we want to deploy different clusters or shards that then talk to
each other with a couple of clicks. The idea is so that our data is distributed across the world so that
users have the shortest way to the data possible. This is something we can enable or disable.

The next step is to choose the underlying cloud provider. We do not need to sign up to these cloud providers ourselves, but mongoDB the company does not host its own data centres; instead, the mongoDB solution we configure here will be deployed in one of these providers' data centres. We can choose from the Amazon AWS, Google Cloud Platform or Microsoft Azure providers, and then the region where we want to deploy, in case we are not using the global cluster configuration. We need to make sure we select a region where the free tier is available.
The next option is to select the Cluster Tier, which defines the power we have and what we can do. In the free tier we have to select the M0 cluster instance; otherwise we can choose a different tier such as M10.
We can then customise the storage i.e. how much can be stored overall in gigabytes (GB). We can
also enable the auto-expand storage option to expand storage before we exceed it.
In the Additional Settings we can choose the storage engine version we want to use (this is only
available in the paid tier).
We can then configure backups, of which we have two types. Continuous means that all data is backed up all the time, i.e. a continuous backup history, while the alternative is a snapshot approach where we have a backup every 24 hours. The snapshot approach poses the danger of losing data for the last 23 hours or so if the last backup is that long ago. Again, this is only available if we are on a paid tier.


The next option is sharding (we require the M30 tier or above for this option), where we can choose the number of shards we want, which adds more power on top of the tier we chose. We can also add options such as Encryption at Rest and the Business Intelligence Connector, but again these are only available on the paid tiers.
There are then more configuration options which we can either ignore or use to set up some more things. Finally, we can assign a cluster name and then click on the create cluster button. MongoDB will deploy this solution onto the web, i.e. onto a couple of servers; this will take a couple of minutes and thereafter we will have a fully functional mongoDB environment running in the cloud, automatically secured and configured according to best practices. It will also automatically be a replica set, i.e. we get a three-node replica set for free. We can add sharding and also reconfigure the cluster after it is running. We can see more on this within the mongoDB Atlas documentation.
We can then go to the security tab and set up secure access to our mongoDB databases here. We should definitely add a user here, and we can add as many users as we want. Another important setting is the IP Whitelist: we need to add the IP address of the server that is running our application, or our own IP address during development. We can automatically fetch our current local IP address using the button provided. We can also turn on some Enterprise Security Features, but this is again beyond the scope of this guide.

Performance, Fault Tolerance & Deployment

Backups & Setting Alerts in MongoDB Atlas
Atlas is a powerful tool for getting our mongoDB environment up and running. If we have backups
turned on, we can restore them here once they are available. We can also configure alerts which
allows us to see what happened and also allow us to create new alerts with the add new alert button
option where we can get an email alert when something happens e.g. when a user logs in or when
the average execution time for reads is above a certain millisecond value. We can setup a bunch of
alerts for all kinds of things to always keep track of what is happening in our cluster which is of
course really useful.
In general we should look at the Atlas documentation to learn everything about Atlas when
planning to use it in production. It is a strong recommendation to use Atlas as a managed solution for getting our mongoDB environment up and running, unless we are a system admin and absolutely know what we are doing and want to configure everything on our own, which we can do but which is not covered in this guide, as it is aimed at developers.


Performance, Fault Tolerance & Deployment

Connecting to Our Clusters
Now with the cluster up and running, we can work with the cluster. On the cluster we have a couple of options, such as migrating data into the cluster, pausing the cluster, terminating the cluster, checking general metrics about the cluster and viewing all other information regarding our cluster. Note some of the options may be outside the free tier. The interesting part is: how can we connect to the cluster?
On the overview page we can click on the connect button and a modal should appear. We will see
all the IP addresses that will be able to connect and we can also add a whitelist entry from here. We
can choose the way of connecting to the cluster. We can connect through the shell but of course are
also able to connect from an application. If connecting via shell, we can follow the instruction for our
OS to download the necessary software and run the shell command. This will require our password
to connect. We no longer need to run the mongod on our local machine as we now have it on the
web. Below is an example of the shell command:
$ mongo "mongodb+srv://cluster0-ntrwp.mongodb.net/test" --username John
The path is important as it tells mongo not to connect locally but to connect to the server at the
address specified in the command. The /test tells which default database to connect to.

We can change this to another database if we want. We then need to add our username using the
flag and we can then hit enter. This will prompt for a password which we should enter. This
password will be the password we used when creating the user we are trying to login as. This
should then connect to our mongoDB cluster server running in the cloud. We can now use the
terminal to run our mongoDB commands on our database as normal. We do not need a separate
terminal when running the commands because the mongod server is already running in the cloud.
We have connected to the server from our shell to run our normal mongoDB commands on our
databases. We can now use the mongoDB server from anywhere and not just from our local
computer.
Useful Links:
https://docs.mongodb.com/manual/replication/
https://docs.mongodb.com/manual/sharding/
https://docs.atlas.mongodb.com/getting-started/


Transactions

What are Transactions?
MongoDB v4.0 or higher is required for transactions. Atlas does not provide MongoDB v4.0 for the
free tier.
What are transactions? Let's take the use case of a Users collection and a Posts collection, and let's say most users have a couple of posts. The posts are related to the users because the user is the person who created the post. So we have a stored relation, either via a reference or a key stored in the user or in the posts document; it doesn't really matter. Now we delete the user account. Therefore, we want to delete the documents in both the users and the posts collections together, i.e. delete documents in two collections.
Now this can be done without transactions, we can simply delete a user and right before we do that,
we can save the id of the user and then reach out to the posts collection and find all posts linked to
that user id and delete those posts. This is perfectly possible without transactions.
However, what if we have a use case where the deletion of the user succeeds but during the post
deletion something goes wrong i.e. temporary server outage or network issue, etc. We now end up
in a state where the user was deleted but the posts are still there but the user they are pointing at
doesn't exist anymore. This is the exact use case that transactions help us with.
With a transaction we can basically tell mongoDB that these operations (as many as we want) either
succeed together or they fail together and we roll back i.e. restore the database in the state it was
before the transaction regarding the documents that were affected in the transaction. This is the
idea behind transactions. In order to try this out we need MongoDB v4.0 and a replica set.

Transactions

A Typical Use Case
So if we had an Atlas server setup that has access to MongoDB v4.0 or above, we can connect to it and set up a database collection to try this out, using the commands below:
$ use blog
$ db.users.insertOne( { name: "Beth" } )
$ db.posts.insertMany( [ { title: "First Post", userId: ObjectId("5ba0adfacfd31f") }, { title: "Second Post", userId: ObjectId("5ba0adfacfd31f") } ] )
We now have a user called Beth and two posts created, each holding a reference to Beth's userId. Now if we want to delete the user, we would obviously first find the userId, which is something we
would know in the application as the user would be in their account and click some button to delete
their account. So in the end we would do something like the below:
$ db.users.deleteOne( { _id: ObjectId("5ba0adfacfd31f") } )
$ db.posts.deleteMany( { userId: ObjectId("5ba0adfacfd31f") } )
This is how we could clear the data, and this would work in 99.9% of cases. The problem with the above is the case where it doesn't work, because something goes wrong and we end up in a state where we suddenly deleted the user but not the posts, or vice versa. We can use transactions for this use case and will explore how transactions work in the next section.

Transactions

How Does a Transaction Work?
Now for a transaction, we need a so-called session. A session basically means that all our requests
are grouped together logically. We create a session and store it in a constant as seen below:
$ const session = db.getMongo( ).startSession( )
db.getMongo gives us access to the mongo server while .startSession creates a session. We now
have a session stored in a const variable. This session is basically an object that now allows us to
group all requests that we send based on that session together. We can now use that session to
start a transaction:
$ session.startTransaction( )
The session is important because technically every command we issue is sent to the server, and normally the server would then forget about us. With a session, when we send something based on that session, the server remembers it, because behind the scenes sessions are managed through a session key and so on. The server will know that the command we just sent should be considered in the context of the other commands sent based on that session.
Below are more commands using the session:
$ const usersCol = session.getDatabase( "blog" ).users
$ const postsCol = session.getDatabase( "blog" ).posts
The above creates a new const variable holding the users collection of the blog database, retrieved through the session. We repeat the same for the posts collection. We have now started the transaction and have access to our collections. We can now write all the commands we want to run against these collections.

$ db.users.find( ).pretty( )
If we run the above command to get the user id, note that this will not be included in the transaction, because the command was not issued on the session on which we started the transaction. We just needed to get the id right now. We can now go back to the session transaction using the userId we retrieved.
$ usersCol.deleteOne( { _id: ObjectId("5ba0adfacfd31f") } )
The usersCol variable is a pointer to our user collection but also mapped to the session. We are now
using the collection in the context of our session through the variable and we can now use our
normal operations such as insert, update, delete, etc. If we hit enter to run the command, this session transaction command will be acknowledged as deleted, but if we repeat the db.users.find( ) command, we will see the user still exists. So the user has not been deleted yet; it was just saved as a to-do. We can then run a command on the postsCol variable.
$ postsCol.deleteMany( { userId: ObjectId("5ba0adfacfd31f") } )
This will also be acknowledged and reported as deleted in the terminal; however, it has not yet been written to the database, as we can prove by looking into the posts using the db.posts.find( ).pretty( ) command to view all posts in the collection.

In order to actually commit the session transaction changes to the database we have to run the
following command:
$ session.commitTransaction( )
The commitTransaction( ) command will execute the session's commands. To abort the transaction, should we not wish to continue with it, there is another command we can use:
$ session.abortTransaction( )
This allows us to cleanly close the transaction; the commands will not be committed to the database.
If we execute the steps in order (creating a session const, getting access to the collections for the session, starting the transaction, specifying the two commands that belong to the transaction and then committing it), this should work with no errors, i.e. we should not put any plain db commands in between our session commands. If we then look at the database collections we should no longer see the user or the posts created by that user, i.e. they have been deleted by the delete commands within the transaction.
Therefore this is how transactions work with MongoDB v4.0 and above. To summarise what we have learnt about transactions and how they work: we get access to a session and, based on that session, we store a reference to our collections in some variable(s)/constant(s), and we do
this from our native drivers e.g. Node, Python, PHP etc. We then start a transaction on the session
and then we would execute our manipulating queries. Finding does not make much sense in
transactions as transactions are all about safely changing data like in our use case of deleting the
user and their posts in two different collections. Finally we commit to the transaction. We can abort
the transaction if we no longer wish to continue with the transaction.
It is also important to understand that if something was to go wrong during this transaction, if it somehow failed such that the user was deleted but the posts were not (or vice versa), the commit will roll back these operations and thereafter our database will be in the same state as before. Therefore the actions in the transaction either succeed together or they fail together. This is the idea behind transactions. This provides atomicity across multiple operations, so across multiple steps or even something like deleteMany (normally atomicity is only on a document level: a single document is either written entirely or not written at all). We can wrap our insertMany or deleteMany commands in a session transaction to guarantee that either all documents are inserted/deleted or none at all. Therefore we can get atomicity on an operational level and not just on a document level. We should definitely not overuse it though, as this takes a bit more performance than a normal delete or insert, so we should only use it if we need that cross-operation consistency.
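For reference, below is a minimal sketch of the same user/posts transaction written with the Node.js driver (driver version 3.x or higher); the connection string placeholders, the deleteUserWithPosts function name and the assumption that userId holds the user's ObjectId are all illustrative:

const mongodb = require('mongodb');
const MongoClient = mongodb.MongoClient;

function deleteUserWithPosts(userId) {
  return MongoClient.connect('mongodb+srv://<username>:<password>@<cluster-host>/blog?retryWrites=true')
    .then( async client => {
      const session = client.startSession( );
      try {
        session.startTransaction( );
        const db = client.db( );
        // Both operations are bound to the session and therefore to the transaction.
        await db.collection('users').deleteOne( { _id: userId }, { session } );
        await db.collection('posts').deleteMany( { userId: userId }, { session } );
        await session.commitTransaction( ); // succeed together...
      } catch (err) {
        await session.abortTransaction( );  // ...or fail together
        throw err;
      } finally {
        session.endSession( );
        client.close( );
      }
    } );
}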
Useful Links:
https://docs.mongodb.com/manual/core/transactions/

From Shell to Driver

Splitting Work Between the Driver & the Shell
We have to understand the split between managing the database, typically through the shell, and interacting with the database through the driver. Below is how we would roughly split the work between the shell and the driver:

Shell: Configure Database, Create Collections, Create Indexes

Driver: CRUD Operations, Aggregation Pipelines
The tasks in the shell are typically things we do up front, i.e. setting up the backend for our application. The driver is tightly coupled with our application, i.e. the app logic. If we build an application for a shop as an example, our code will basically be responsible for handling products, users, orders, etc. The initial setup is not something the application deals with, as it assumes that the
database is there to then communicate with. The driver typically handles CRUD operations to
handle data within the database i.e. Create, Read, Update and Delete. This is how we would roughly
split the responsibilities but technically we can create collections and indexes from inside the driver
which is still a possibility.

From Shell to Driver

Preparing Application Project
We will create an application to demonstrate the Shell to Driver module and how we would use the
driver code to perform CRUD operations within our application which will communicate with the
database. For this project we will use MongoDB Atlas as our database host. All following sections
will demonstrate the driver commands to communicate with the database via the drivers and how
easy it is to transfer from the shell to driver.
To prepare for the project we will setup a new free cluster on MongoDB Atlas using one of the
cloud hosting platforms and the free hosting region. We can call this cluster something like sandbox
or FromShellToDriverTutorial. Once the cluster has been created, we would need to go into security
and make sure we have at least one user with readWriteAnyDatabase access in our database which
is really important. The setup of the server will be done upfront and we do not need atlasAdmin
role. If we do need to do something we will do it by connecting to the server in the shell — so
anything related to setting up connections, collection validation or indexes, we would do that
through the shell. For the cluster we should also make sure that our local IP is whitelisted. Once
setup, we should wait for MongoDB Atlas to finish provisioning, and in the meantime we can do something else, such as setting up Node.js and our application project files, as we will be using Node.js as our server language to help build our React/Node.js application.

We should head over to the Node.js webpage to download and install Node.js on our local machine. This will also install the Node Package Manager (npm), which will allow us to install all the dependencies for our project by reading from the package.json file.
In the terminal, we should navigate to our project directory that contains the package.json file and
run the following terminal command to install all the dependencies.
$ npm install
This will take a while as it installs all the dependency files. Once all files have been installed we can
run the following command within the terminal to start the project in our browser on our localhost
server.
$ npm start
This will run the react script and open the single page application within the browser. We should notice an error modal message appear on the webpage; this is because it fails to fetch data from the backend of the application, as this has not been set up yet. We need to start the Node REST API
by simply opening another terminal navigated to the same project directory but run the following
command:
$ npm run start:server
We need to run both processes at the same time and keep them both running. With both running
we can reload the app and notice the products have been loaded from a local stored dummy data
and not from a database. We are now ready to play with the application and learn driver code and
see the slight difference between the shell and driver and how to connect to our database.

From Shell to Driver

Installing the Node.JS Driver
Within the MongoDB official documentation we can go into the MongoDB drivers section and we
can find instructions on how to install and use the driver for the language our application uses.
https://docs.mongodb.com/ecosystem/drivers/
For our demo application we would use the node.js drivers. This will forward us to the link/page
containing the instructions on how to install and use the node.js driver. We should find similar
documentation for all drivers so that we can learn how to use the driver with our application
language. We can also look at the Driver API which shows a list of all methods/objects that are part
of the driver.
To install the node.js driver in our project we must run the terminal command within our project
directory:
$ npm install mongodb --save
We now have the mongodb driver installed in our project file. Our mongoDB driver code
(credentials) will sit in the backend server and will never be exposed to the front end client.

From Shell to Driver

Connecting Node.js and the MongoDB Cluster
On the MongoDB Atlas clusters page click on the connect button. This is where we can whitelist our
localHost IP address as well as find the various methods of connecting to our cluster. We can now
select the Connect Your Application as the method. We can then select the driver and version we
are using and then copy the connection string.
In the project application directory we will need to add some code to our app.js file.
/backend/app.js file:
We need to import our mongoDB client. We do this by creating a const variable and importing the
mongodb package to use the MongoClient in order to establish a connection. We can then use the
const variable and call the connect method on it, passing in the url connection string we got from
the MongoDB Atlas page. Replace the <username> and <password> placeholders with the credentials of the actual user created on the database.
const mongodb = require('mongodb').MongoClient;

mongodb.connect('mongodb+srv://<username>:<password>@fromshelltodrivertutorial-gtajb.mongodb.net/test?retryWrites=true')
  .then( client => {
    console.log( 'Connected!' );
    client.close( );
  })
  .catch( err => {
    console.log(err);
  } );
This will establish a connection to our database. We can pass a function within our promises which will be executed when the connection has completed, and this will output a success message for successful connections or errors for unsuccessful connections. In our .then clause we can pass in the client as an argument
which will allow us access to the client which will then allow us to execute a database method which
will then allow us for example to work with collections and so on. In the example above we close the
client immediately but will later look at how to interact with the database using the client.
After saving the changes to our server-side code, we have to go to the terminal where we ran the npm run start:server command, quit that process using ctrl + c on our keyboard, and simply re-run the command for the changes to the server to take effect. After restarting we should see Connected! in the terminal if the connection to our mongoDB database server was successful.
Now that we are connected to our database server, in the following sections we will look at driver
code on how to perform database actions to interact with our database server.

From Shell to Driver

Storing Products in the Database
Previously we saw how to connect to the mongoDB Atlas Database Server using a dummy
connection. We now want to use that connection and do something useful with it.
We can cut the previous code from the app.js file and head over to the products.js file. This file
contains a function that allows us to add a new product but we want to store the new product data
to our database. We want to also store the price as a 128bit decimal value and therefore we can
also practice transformation of our data.
backend/routes/products.js File:
const mongodb = require('mongodb');
const MongoClient = mongodb.MongoClient;

router.post( '', (req, res, next) => {
  const newProduct = {
    name: req.body.name,
    description: req.body.description,
    price: parseFloat(req.body.price), // will later be stored as a 128bit decimal in MongoDB
    image: req.body.image
  };
  console.log(newProduct);
  MongoClient.connect('mongodb+srv://<username>:<password>@fromshelltodrivertutorial-gtajb.mongodb.net/test?retryWrites=true')
    .then( client => {
      client.db( ).collection( 'products' ).insertOne(newProduct);
      client.close( );
    })
    .catch( err => {
      console.log(err);
    } );
  res.status(201).json( { message: 'Product Added', productId: 'DUMMY' } );
} );
We can paste the mongoDB connection code from our app.js file below the newProduct object. With this approach we connect from scratch every time we want to do something with the database.

In the .then clause of our promise we can use client.db( ) to get access to our database and then call
collection as a method which allows us to access a collection. The products collection does not
currently exist. If we wanted to create the collection ahead of time because we wanted to add
schema validation or anything of the sort, we would create it within the shell.
$ mongo "mongodb+srv://fromshelltodrivertutorial-gtajb.mongodb.net/test" --username

We would replace  with the username we wish to connect to and the shell will then ask
you to enter the password when we run the command to login to that user.
Inside the shell, if we want to switch to a database and create the collection explicitly, we would use commands like the ones seen below:
$ use shop
$ db.createCollection( "products" )
In our case we do not need any special settings on our collections, and therefore we can use the on-the-fly approach and simply use a CRUD operation to insert something into the collection; the collection and the database will then be created automatically.
Notice how the insert command for the Node.JS driver is exactly the same as it is in the shell i.e. we
have the insertOne and insertMany commands as we have seen before. This is true for all the other
languages whereby the operations we can do are the same but the language may have a slight
difference in the syntax. In our above example we used the insertOne method and passed in our
newProduct variable.

From Shell to Driver

Storing the Price as a 128bit Decimal
It is always a great idea to dive into the official documentation for the driver we are using to understand the driver syntax. The Driver API page has a list of all available driver methods, and we can look for the method we want; in this example we are looking for something related to the 128bit decimal (there is a Decimal128 link which provides sample snippets for the object's methods). We can use the new Decimal128( ) constructor or, better, the Decimal128.fromString( ) method.
We can import this into our code using our mongodb variable:
const mongodb = require('mongodb');
const MongoClient = mongodb.MongoClient;
const Decimal128 = mongodb.Decimal128;
This is the reason we have mongodb imported separately as a const variable as we can use it to
import various objects from the mongodb package. We can now use this decimal const variable
as a reference to convert our price into a 128bit decimal. We would replace the native JavaScript
parseFloat( ) object method with the mongoDB driver Decimal128.fromString( ) method:
router.post( '', (req, res, next) => {
  const newProduct = {
    name: req.body.name,
    description: req.body.description,
    price: Decimal128.fromString(req.body.price.toString()), // store as 128bit decimal in MongoDB
    image: req.body.image
  };
When we create a new product this should convert the price and store it as a 128bit decimal value.
For the Decimal128.fromString( ) to work we need to ensure the value we pass through is a string.
We can use the JavaScript toString( ) method to convert a number into a string value.
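As a small illustration of the round trip (the 12.99 value is arbitrary):

const Decimal128 = require('mongodb').Decimal128;

const price = Decimal128.fromString( (12.99).toString( ) ); // number -> string -> Decimal128
console.log( price.toString( ) ); // back to the string "12.99" for use in JavaScript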
Finally we can use the .then and .catch on our insertOne( ) because it also returns a promise so that
we can either react to any errors or console.log our results. We want to make sure to close the client
within the promises as we do not want to close the client before the operation has completed.

MongoClient.connect('mongodb+srv://<username>:<password>@fromshelltodrivertutorial-gtajb.mongodb.net/test?retryWrites=true')
  .then( client => {
    client.db( ).collection( 'products' ).insertOne(newProduct)
      .then( result => {
        console.log(result);
        client.close( );
        res.status(201).json( { message: 'Product Added', productId: result.insertedId } );
      })
      .catch( err => {
        console.log(err);
        client.close( );
        res.status(500).json( { message: 'An error occurred.' } );
      });
  })
  .catch( err => {
    console.log(err);
  } );
We have now restructured the code so that we are now connected to our database inside the post
route, and we are sending data to the products collection in the database, so we should have working code. We can restart the server by running npm run start:server again, as we changed some code, and now in our application (npm start to run the application dev server) we should be able to create a new product and have it saved into our database.
In the terminal running our backend server we will notice something that does not look like an error.
We receive a commandResult which has a lot of information about the operation that was executed.
The host sends the data to us and at the bottom we would see the insertedCount to indicate one
document was inserted/added to our database, along with the ID. This is not an ObjectId( ) because the ObjectId type does not exist natively in JavaScript; the id is basically the string which is wrapped by the mongoDB ObjectId.
In the other terminal, which is the shell connected to our mongoDB Atlas database server, we can run the commands:
$ show collections
$ db.products.find( ).pretty( )
We will notice that our products collection has been created and if we look into the collection we
should also have the new product document added to the collection. We now have a working insert
through the client application.

From Shell to Driver

Fetching Data From the Database
Below is an example JavaScript code for fetching data from a mongoDB database. Note we have
copied the same code from the insert but changed it to a fetch command using the driver syntax.
backend/routes/products.js File:
router.get( '/', (req, res, next) => {
  MongoClient.connect('mongodb+srv://<username>:<password>@fromshelltodrivertutorial-gtajb.mongodb.net/test?retryWrites=true')
    .then( client => {
      const products = [];
      client.db( ).collection( 'products' ).find( ).forEach( productDoc => {
        productDoc.price = productDoc.price.toString( );
        products.push(productDoc);
      } )
        .then( result => {
          // console.log(result);
          client.close( );
          res.status(200).json(products);
        })
        .catch( err => {
          console.log(err);
          client.close( );
          res.status(500).json( { message: 'An error occurred.' } );
        });
    })
    .catch( err => {
      console.log(err);
      res.status(500).json( { message: 'An error occurred.' } );
    } );
} );
We use the same connection code as we did for the insert command; however, we change the insertOne( ) call to the find( ) command. Note that the find command will return us a cursor and we must
traverse the cursor through our data. We can learn more about the cursor for our driver in the
official driver documentation (in the Driver API list we should see the cursor object and we can see
all the methods we could execute on the cursor and how they work and how we can interact with
the cursor).

The forEach method simply goes through all the documents inside our cursor (i.e. it fetches all the
data in the forEach) and we can have access to each product document one at a time and do
something with each document i.e. push the documents into our products array. This method also
returns a promise, as this is an asynchronous task that fetches each document from the database server.
We also need to convert the 128bit decimal into a value that JavaScript can handle, else we would receive an error. The Decimal128 object provides us with a toString( ) method which converts the 128bit decimal back into a string.
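As an alternative to forEach, the cursor also offers a toArray( ) method, which returns a promise resolving to all documents at once. Below is a minimal sketch of the same .then( client => ... ) body using it; this is fine for small collections, since everything is loaded into memory at once:

client.db( ).collection( 'products' ).find( ).toArray( )
  .then( productDocs => {
    const products = productDocs.map( productDoc => {
      productDoc.price = productDoc.price.toString( ); // convert Decimal128 back to a string
      return productDoc;
    } );
    client.close( );
    res.status(200).json(products);
  } )
  .catch( err => {
    console.log(err);
    client.close( );
    res.status(500).json( { message: 'An error occurred.' } );
  } );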
If we restart the server after making changes to the server code and rerun our application we should
notice the terminal showing that some data is being fetched and we should also see in our
application that the single document that we added to our database is now appearing within the list
of products.
Note that images should always be stored in a file storage and never on the database. If we had an
application that allows upload of images, we would save this uploaded image in a file storage and
then store the file path to the image in the database and not the image itself. This is because images will bloat our database, are inefficient to serve from it (a lot of data transfer) and are not what a database is built for.
