MongoDB
The Complete Developer’s Guide

Introduction to MongoDB

What is MongoDB?
MongoDB is a database created by the company of the same name. The name stems from the word "humongous": the database is built to store a lot of data while still being able to work with that data efficiently. Ultimately, this is a database solution.
There are many database solutions, such as MySQL, PostgreSQL, Microsoft SQL Server (T-SQL), etc.
MongoDB is, most importantly, a database server that allows us to run different databases on it, for example a Shop database. Within the database we would have different collections, such as a Users collection or an Orders collection. We can have multiple databases and multiple collections per database.
Inside of a collection we have something called documents. Documents look like JSON objects. Inside of a collection the documents are schema-less and can contain different data. This is the flexibility MongoDB provides us, whereas SQL-based databases are very strict about the data stored within their tables. The MongoDB database can therefore grow with the application's needs. MongoDB is a NoSQL database.


Typically we will still need some kind of structure in a collection, because applications typically require some type of structure to work with the data.

Diagram 1.1: A Shop database containing two collections, Users and Orders. The Users collection holds documents such as {name: 'Max', age: 28} and {name: 'Sam'}; the Orders collection holds documents such as {product: 'pen', price: 1.99} and {product: 't-shirt'}.

JSON (BSON) Data Format:
{
  "name": "Alex",
  "age": 29,
  "address": {
    "city": "Munich"
  },
  "hobbies": [
    { "name": "Cooking" },
    { "name": "Football" }
  ]
}
In the pair "name": "Alex", "name" is the name of the key and "Alex" is the key value.
The above is an example of the JSON data format. A single document is surrounded by curly brackets. The data is structured with keys. A key consists of a name of the key and a key value. The name of the key (which will be referred to as the key from now on) and the key value must be wrapped in quotation marks (unless the value is a number).
There are different types of values we can store such as: string, number, booleans and arrays.
We can also nest documents within documents. This allows us to create complex relations between data and store them within one document, which makes working with the data and fetching data more efficient because it is contained in one document in a logical way. SQL, in contrast, requires a more complex method of fetching data, using joins to find data in table A and data in table B and retrieve the relevant combined result.
Behind the scenes on the server, MongoDB converts the JSON data to a binary version of the data
which can be stored and queried more efficiently. We do not need to concern ourselves with BSON
as we would tend to work with JSON data.
The whole theme of MongoDB is flexibility, optimisation and usability, and this is what really sets MongoDB apart from other database solutions: it is efficient from a performance perspective because we can query data in the format we need it, instead of running complex restructuring on the server.

The Key MongoDB Characteristics.
MongoDB is a NoSQL solution because it follows a concept/philosophy opposite to that of SQL-based databases. Instead of normalising the data, i.e. storing it distributed across multiple tables where every table has a clear schema, and then using relations, MongoDB stores data together in a document. It does not force a schema, hence schema-less/NoSQL.

We can have multiple documents in a single collection and they can have different structures, as we have seen in Diagram 1.1. This is important: it can lead to messy data, but it is still our responsibility as developers to work with clean data and to implement a solution that works. On the other hand, this provides us with a lot of flexibility. We could use MongoDB for applications that are still evolving, where the exact data requirements are not yet set. MongoDB allows us to get started, and we can always add documents with more information to the same collection at a later point in time.
We also work with fewer relations. There are some relations, but with embedded (nested) documents we have fewer collections (tables) to connect; instead we store data together. This is where the efficiency is derived from: since data is stored together, fetching data from our application does not require reaching out to multiple tables and merging the data, because all the data is already within a single collection. This is where the speed, performance and flexibility come from, which is beneficial when building applications. This is the main reason why NoSQL solutions are so popular for read- and write-heavy applications.


MongoDB Ecosystem
The below Diagram 1.2 is the current snapshot of the MongoDB companies ecosystem and product
offerings. The focus of this guide is on the MongoDB database used locally on our machines and on
the cloud using Atlas. We will also dive in Compass and the Stitch world of MongoDB.

Diagram 1.2: The MongoDB Database (Self-Managed/Enterprise, Atlas (Cloud), CloudManager/OpsManager, Compass, Mobile) and Stitch (Serverless Query API, Serverless Functions, Database Triggers, Realtime Sync), plus BI Connectors and MongoDB Charts.

Installing MongoDB
MongoDB runs on all Operating Systems (OS), including Windows, Mac and Linux. To install MongoDB we can visit their webpage on:
https://www.mongodb.com/
Under Products select MongoDB Server and download the MongoDB Community Server for our OS platform of choice. Install the MongoDB Server by following the installation steps.
Important Note: on Windows, when installing, click on the Custom Setup Type. MongoDB will be installed as a service, which is slightly different to how MongoDB runs on Mac & Linux.
On Mac and Linux we simply have an extracted folder which contains files. We would copy all the contents within this folder and paste them anywhere within our OS, i.e. wherever we want to install MongoDB.
We would then want to create a folder called data and a sub-folder called db anywhere within our OS, preferably in the root of the OS.
On Windows open up the Command Prompt, or on Mac/Linux open up the terminal. This is where we are going to spend most of our time, using special commands and queries. Run the following command:
$ mongo
This should return command not found.
To fix this problem on a Mac, go to the user folder and find a file called .bash_profile (if this does not exist we can simply create it). Edit the file using a text editor and add the following line:
export PATH=/Users/Username/mongodb/bin:$PATH
The path should be wherever we placed the MongoDB binary files. We need to add :$PATH at the end on Mac/Linux to make sure all our other commands continue to work on our OS. Save and close the file.
Important Note: if you run into a problem of not being able to edit the .bash_profile file using a text editor, use the following command to edit it within the terminal:
$ sudo nano ~/.bash_profile
This will allow you to edit the file within the terminal and enter the mongo bin file path. Press CTRL + O to save and CTRL + X to exit the nano editor.

To fix this on a Windows OS, we need to create an environment variable. Press the windows key and
type environment which should suggest the Edit Environment Variable option. Under the user
variables edit Path to add the directory path to where we installed the mongoDB files:
C:\Program Files\MongoDB\Server\4.0\bin
Restart the terminal/command prompt and now run the command:
$ mongo
This should now return an error of connect failed on Mac/Linux.
On Windows it will connect, because MongoDB is running as a service and has already started as a background service (we would have checked this option during the installation). If we open the Command Prompt as administrator and run the command 'net stop MongoDB', this will stop the background service running automatically, and we can then manually start and stop the MongoDB service on Windows. DO NOT RUN THIS COMMAND ON MAC/LINUX.
The mongo command is the client which allows us to connect to the server and then run commands
on the databases. To start the service on Mac/Linux we would use the following command:
$ mongod

When running this command to start the server, it may fail if we chose a folder other than the default /data/db. If we used a different folder, not within the root of our OS, we would need to start the mongod instance with the --dbpath flag followed by the location of the /data/db folder.
$ sudo mongod --dbpath “/data/db”
On Mac we would need to run the mongod command every time we wish to run the mongoDB
service whereas on Windows this will run automatically even after restarting the system.
Now that we have the mongod server running, minimise the terminal on Mac/Linux and open up a new terminal. We cannot close the mongod server terminal: it is running the service, and if it is closed the MongoDB server stops and we cannot continue to work with the database server. Pressing the CTRL + C keys within that terminal will quit the mongod service, and we would need to re-run the mongod command should we wish to run the server again.
Running the mongo command in the new terminal now places us in the mongo shell, which is the environment where we can run commands against our database server. We can create new databases, collections and documents, which we will now focus on in the following sections.


Time to get Started
Now that we have the mongod server running and can connect to it using the mongo shell, we can enter the following basic commands in the mongo terminal:
Command: cls
Description: Clears the terminal.

Command: show dbs
Description: Displays the existing databases (there are three default databases: admin, config and local, which store metadata).

Command: use databaseName
Description: Connects/switches to a database. If the database does not exist it will implicitly create a new database called databaseName; however, the database is not actually created until a collection and a document are added.

Command: db.collectionName.insertOne( {"name of key": "key value"} )
Description: Creates a new collection. The db relates to the currently connected database. This will implicitly create a new collection if it does not exist. We must pass at least one document into the collection using the insertOne() command, passing in a JSON object. This returns an acknowledgement to confirm the data was inserted into the database.

Important Note: we can omit the quotes around the name of the key within the shell, but we must keep the quotes for the key value unless the value is a number. This is a feature of the mongo shell which works behind the scenes. MongoDB will also generate a unique id for new documents inserted into the collection.
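As a small illustrative sketch (the users collection and its fields here are invented for this example), an insert and the acknowledgement the shell prints back might look like this:

db.users.insertOne( { name: "Max", age: 28 } )
// The shell acknowledges the write and returns the generated unique id:
// { "acknowledged" : true, "insertedId" : ObjectId("5b9...") }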
Command: db.collectionName.find()
Description: Displays the documents within the database collection.

Command: db.collectionName.find().pretty()
Description: Displays the documents within the database collection, but prettifies the data in a more humanly readable format.

This is a very basic introductory look at the shell commands we can run in the mongo terminal to create a new database, switch to a database, create a collection and documents, and display all the documents within a database collection in either the standard or pretty format.
Tip: to run the mongod server on a port other than the default 27017, run the following command. Note that you would then need to specify the port when running the mongo shell command as well. You would use this in case the default port is already being used by something else.
$ sudo mongod --port 27018
$ mongo --port 27018


Shell vs Drivers
The shell is a great neutral ground for working with mongoDB. Drivers are packages we install for the different programming languages an application might be written in. There are a whole host of drivers for the various application server languages such as PHP, Node.js, C#, Python, etc. Drivers are the bridges between the programming language and the mongoDB server.
As it turns out, in these drivers we use the same commands as in the shell; they are just slightly adjusted to the syntax of the language we are working with.
The drivers can be found on the mongoDB website:
https://docs.mongodb.com/ecosystem/drivers/
Throughout this document we will continue to use the shell commands, as they are neutral. The knowledge of how to insert, configure inserts, query, filter and sort data carries over: these commands will continue to work when we use the drivers, but we need to refer to the driver documentation to understand how to perform the same commands using the programming language's syntax. This makes us flexible with the language we use when building applications that use mongoDB.


MongoDB & Clients: The Big Picture

Diagram 1.3: The application (Frontend (UI) and Backend (Server)) talks to the MongoDB server through drivers (Node, Java, Python), while the MongoDB shell is used as a playground and for administration. The MongoDB server receives the queries and communicates with the storage engine, which handles file/data access.

Diagram 1.4: The MongoDB server passes data to the storage engine, which reads and writes data to files (slow) and to memory (fast).
As we can see in Diagram 1.3, the application driver/shell communicates with the MongoDB server. The MongoDB server communicates with the storage engine. It is the storage engine which deals with the data passed along by the MongoDB server and, as Diagram 1.4 depicts, it will read/write to files and/or memory.


Understanding the Basics & CRUD Operations

Create, Read, Update & Delete (CRUD)
We could use MongoDB for a variety of things, such as an application, Analytics/BI tools or data administration. In the application case, we may have an app where the user interacts with our code (the code can be written in any programming language) and the MongoDB driver is included in the application. In the case of an Analytics/BI tool, we may use the BI Connector/shell provided by MongoDB, or another import mechanism provided by our BI tool. Finally, in the database administrator case we would interact with the MongoDB shell.
In all the above cases we want to interact with the MongoDB server. In an application we typically want to be able to create, read, update or delete elements, e.g. a blog post app. With analytics we at least want to be able to read the data, and as admins we would probably want to perform all the CRUD actions.
CRUD are the only actions we would want to perform with our data i.e. to create it, manage it or
read it. We perform all these actions using the mongoDB server.


Diagram 1.5:

CREATE
insertOne(data, options)
insertMany(data, options)

READ
findOne(filter, options)
find(filter, options)

UPDATE
updateOne(filter, data, options)
updateMany(filter, data, options)
replaceOne(filter, data, options)

DELETE
deleteOne(filter, options)
deleteMany(filter, options)
The above are the four CRUD operations and the commands we can run for each action. In later sections we will focus on each CRUD action individually, to understand in depth each of the actions and the syntax/commands we can use when performing CRUD operations on our mongoDB data collections and documents.


Understanding the Basics & CRUD Operations

Finding, Inserting, Updating & Deleting Elements
To show all the existing databases within the mongoDB server we use the command show dbs, while we use the use command followed by the database name to switch to a database. The db will then relate to the switched database.
Any CRUD operation must always be performed/executed on a collection in which we want to create/update/delete documents. Below are example snippets of CRUD commands on a fictitious flights database (where the collection is called flightData).
$ db.flightData.insertOne( {distance: 1200} )
This will add a single document to the collection as we have seen previously.
$ db.flightData.deleteOne( {departureAirport: "LHR"} )
$ db.flightData.deleteMany( {departureAirport: "LHR"} )
The delete commands take in a filter. To add a filter we use the curly brackets, passing in the name of the key and the key value of the data we wish to filter and delete. In the above example we used the departureAirport key and the value of "LHR". The deleteOne command will find the first document in our database collection that meets the criteria and delete it. The command will return:
{ "acknowledged" : true, "deletedCount" : 1 }
The deletedCount shows the number of deleted documents (the deleteOne command will always return 1 if a document was deleted). If no documents matched the filter and none were deleted, the returned deletedCount value will be 0.
The deleteMany command, in contrast, will delete at once all documents that match the specified filter criteria.
Note: The easiest way to delete all data in a collection is to delete the collection itself.
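For completeness, a minimal sketch of clearing out the documents while keeping the collection itself, assuming the flightData collection from above:

db.flightData.deleteMany( {} )
// The empty filter document matches every document in the collection,
// so all documents are removed and deletedCount reports how many were deleted.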
$ db.flightData.updateOne( {distance: 1200}, { $set: {marker: "delete"} } )
$ db.flightData.updateMany( {distance: 1200}, { $set: {marker: "delete"} } )
The update commands take in three arguments/parameters. The first is the filter, which works like the filter of the delete commands. The second is how we want to update/change the data. We must use the { $set: { } } keyword (anything with a $ dollar sign in front of it is a reserved word in mongoDB), which lets mongoDB know how we are describing the changes we want to make to a document. If the update key:value does not exist, this will create a new key:value property within the document; otherwise it will update the existing key:value with the new value passed in. The third parameter is options, which we will analyse in great detail in later sections.
Important Note: when passing in a filter we can also pass in empty curly brackets { } which will
select all documents within the collection.
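For instance, a hedged sketch (the marker field is made up for this example) that touches every document in the collection:

db.flightData.updateMany( {}, { $set: { marker: "toDelete" } } )
// {} matches all documents; $set adds the marker field where it is missing
// and overwrites it where it already exists.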
If the updateMany is successful, the terminal will return an acknowledgement as seen below, with the number of documents that matched the filter criteria and the number of documents modified:
{ "acknowledged" : true, "matchedCount" : 2, "modifiedCount" : 2 }
If we were to delete all the documents within a collection and then use the command to find data in that collection, i.e. using the db.flightData.find().pretty() command, the terminal will return empty/nothing as there are no existing documents to read/display.
The above demonstrates how we can find, insert, update and delete elements using the update and
delete command.

Now we have seen how we can use insertOne() to add a single document into our collection. However, what if we want to add more than one document? We would use the insertMany() command instead.
db.flightData.insertMany( [
  {
    "departureAirport": "LHT",
    "arrivalAirport": "TXL"
  },
  {
    "departureAirport": "MUC",
    "arrivalAirport": "SFO"
  }
] )
We pass in an array of objects in order to add multiple documents into our database collection. The square brackets are used to declare an array. The curly brackets declare an object, and we must use commas to separate each object. If successful, this will return an acknowledged value of true and the insertedIds of each object/document added into the collection.

Important Note: mongoDB by default creates a unique id for each new document, assigned to a key called "_id" with a randomly generated value. When inserting an object we can assign our own unique id using the _id key followed by a unique value. If we insert an object passing in our own _id key and the value is not unique, the terminal will return a duplicate key error. We must always use a unique id for our documents; if we do not specify a value for _id, mongoDB will generate one for us automatically.
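A brief sketch of supplying our own _id, using made-up values:

db.flightData.insertOne( { departureAirport: "LHR", arrivalAirport: "TXL", _id: "lhr-txl-1" } )
// Succeeds: { "acknowledged" : true, "insertedId" : "lhr-txl-1" }
db.flightData.insertOne( { departureAirport: "LHR", arrivalAirport: "TXL", _id: "lhr-txl-1" } )
// Running the exact same insert again fails with a duplicate key error,
// because the _id value must be unique within the collection.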

Understanding the Basics & CRUD Operations

Diving Deeper Into Finding Data
So far we have seen the .find() function used without passing any arguments, which retrieves all the data within a collection. Just as we would use a filter to specify particular records or documents when deleting or updating a collection, we can also filter when finding data.
We can pass a document into the find function which will be treated as a filter, as seen in the example below. This allows us to retrieve a subset of the data rather than the whole data within an application.
db.flightData.find( {intercontinental : true } ).pretty()


We can also use logical queries to retrieve more than one document within a collection that matches the criteria, as demonstrated in the below example. We query using another object and one of the special operators in mongoDB.
db.flightData.find( {distance: {$gt: 1000 } } ).pretty()
In the above we are using the $gt: operator, which is used for finding documents "greater than" the value specified. If we were to use the findOne() function instead, it would return only the first document within the collection that matches the criteria.
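As a minimal sketch against the same hypothetical flightData collection:

db.flightData.findOne( { distance: { $gt: 900 } } )
// Returns just the first matching document (not a cursor),
// e.g. the flight inserted earlier with distance: 1200.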

Understanding the Basics & CRUD Operations

update() vs updateMany()
Previously we have seen the updateOne() and updateMany() functions. However, we can also use
another update function called update() as seen in the example below:
db.flightData.update( { _id: ObjectId("abc123") }, { $set: { delayed: true } } )
The update() function works like the updateMany() function in that all documents matching the filter are updated. The first difference between update() and updateMany() is that the $set: operator is not required for the update() function, whereas omitting it will cause an error for both the updateOne() and updateMany() functions. So we can write the above syntax like so and would not get an error:
db.flightData.update( { _id: ObjectId("abc123") }, { delayed: true } )
The second and main difference is that, without $set:, the update function takes the new object and replaces the existing document with it (this does not affect the unique id). It will only patch the document (just like the updateOne() and updateMany() functions) if we use the $set: operator; otherwise it overrides the existing document.
This is something to be aware of when using the update() function. If we intend to replace the whole existing document with a new object, then we can omit the $set: operator. In general it is recommended to use updateOne() and updateMany() to avoid this issue.
If, however, we want to replace a document, we should use the replaceOne() function. Again, we pass our filter and the object we want to replace the document with. This is a more explicit and safer way of replacing the data in a collection.
db.flightData.replaceOne( { _id: ObjectId("abc123") }, { departureAirport: "LHT", distance: 950 } )


Understanding the Basics & CRUD Operations

Understanding Find() & The Cursor Object
If we have a passengers collection which stores the name and age of passengers, and we want to retrieve all the documents within the collection, we can use the find() function as we have seen previously.
db.passengers.find().pretty()
Useful Tip: when writing commands in the shell we can use the tab key to autocomplete. For example, if we type db.passe and press tab on our keyboard, this should auto-complete to db.passengers.
We will notice that where a collection has a lot of data, the find() function will not display all of it in the shell. If we scroll down to the last record we should see Type "it" for more within the shell. If we type the command it and press enter, this will display more data from the returned find() results. The find() command in general returns what is called a Cursor object, and not all of the data.
The find() does not give us an array of all the documents within a collection. This makes sense, as the collection could be really large; if find() returned the whole array for a collection with, say, 2 million documents, this could take a really long time and also send a lot of data over the connection.
The Cursor object is an object with a lot of metadata behind it that allows us to cycle through the results, which is what the "it" command did: it used the Cursor object to fetch the next batch (cycle) of data from the collection.
We can use other methods on the find() function, such as toArray(), which will exhaust the cursor, i.e. go through all of its results and fetch back all the documents in an array (not stopping after the first 20 documents, the default within the mongoDB shell).
db.passengers.find().toArray()
There is also a forEach method that can be used on the find() function. The forEach allows us to write some code to do something with every element in the database. The syntax can be found within the driver documentation for whichever language we are using for our application, e.g. PHP or JavaScript etc. Below is a JavaScript function which the shell can also use:
db.passengers.find().forEach( (document) => { printjson(document) } )

The forEach function in JavaScript gets the document object passed automatically into the arrow function, and we can call this parameter whatever we want, i.e. passengersData, data, x, etc. In the above we called it document. We can then use this object to do whatever we want, i.e. we used the printjson() command to print/output the document data as JSON. The above will also return all the documents within the collection, because the forEach loops over every document the cursor yields.
To conclude, the find() function does not provide us with all the documents in a collection, even though it may look like it in circumstances where there is very little data in a collection. Instead it returns a Cursor object, which we can cycle through to return more documents from the collection. It is up to us as developers to use the cursor, either forcing it to get all the documents from a collection into an array, or better, using forEach or other methods to retrieve more than 20 documents (the default number of items returned in the shell) from the collection.
Note that forEach is more efficient because it fetches/returns objects on demand through each iteration, rather than fetching all the data in advance and loading it into memory; this saves both bandwidth and memory.
The Cursor Object is also the reason why we cannot use the .pretty() command on the findOne()
function because the findOne returns one document and not a Cursor Object. For Insert, Update
and Delete commands the Cursor Object does not exist because these methods do not fetch data,
they simply manipulate the data instead.

Understanding the Basics & CRUD Operations

Understanding Projections
In Database:
{ "_id": "…", "name": "John", "age": 35, "Job": "Teacher" }

Projection

In Application:
{ "name": "John", "age": 35 }
Imagine that in our database we have the data for a person record, and within our application we do not need all the data from the document, only the name and age to display on our web application. We could fetch all the data and filter/manipulate it within our application in any programming language. However, this approach still has an impact on bandwidth by fetching unnecessary data, something we want to prevent. It is better to filter the data out on the mongoDB server, and this is exactly what projection allows us to do.
Below are examples of using projections to filter the necessary data to retrieve from our find query.

db.passengers.find( {}, {name: 1} ).pretty()
We need to pass in a first argument to filter the find search (note: an empty object will retrieve all documents). The second argument allows us to project. A projection is set up by passing another document specifying which key:value pairs we want to retrieve back. The value 1 means to include that field in the data returned to us.
The above will return all the passengers documents, but only the name and _id, omitting the age from the returned search results. The _id is a special field in our data and by default it is always included. To exclude the _id from the returned results we must exclude it explicitly, by specifying the name of the key and setting the value to zero as seen below:
db.passengers.find( {}, {name: 1, _id: 0} ).pretty()
Note: we could do the same for age (e.g. age: 0); however, this is not required, because by default everything except the _id is excluded from the projection unless explicitly included with a 1.
The data transformation/filtering occurs on the mongoDB server before the data is shipped to us, which is what we want: we do not want to retrieve unnecessary data and impact bandwidth.


Understanding the Basics & CRUD Operations

Embedded Documents & Arrays
Embedded documents are a core feature of mongoDB. They allow us to nest documents within each other, with one overarching document in the collection.
There are two hard limits to nesting/embedding documents:
1. We can have up to 100 levels of nesting (a hard limit) in mongoDB.
2. The overall document size has to be below 16mb.
The size limit for documents may seem small, but since we are only storing text and not files (we would use file storage for files), 16mb is more than enough.
Along with embedded documents, we can also store arrays. This is not strictly linked to embedded documents: we can have arrays of embedded documents, but arrays can hold any data. This means we can have lists of data in a document.
Below are examples of embedded documents and arrays.


db.flightData.updateMany( {}, {$set: {status: {description: "on-time", lastUpdated: "1 hour ago"} } } )
In the above example we have added a new document property called status, which has an embedded/nested document containing description and lastUpdated. If we output the document using the .find() function, the document would now look something like the below:
{
  "_id": …,
  "departureAirport": "LHT",
  "arrivalAirport": "TXL",
  "status": {
    "description": "on-time",
    "lastUpdated": "1 hour ago"
  }
}
Note: we could add more nested child documents i.e. description could have a child nested
document called details and that child could have further nested child documents and so on.
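As an illustrative sketch of going one level deeper (the details/responsible fields are invented here, though they reappear in a query example later in this section):

db.flightData.updateMany( {}, { $set: { status: { description: "on-time", details: { responsible: "John Doe" } } } } )
// status.details.responsible now sits three levels deep inside the document;
// we could keep nesting like this up to the 100-level hard limit.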


db.passengers.updateOne( {name: "Albert Twostone"}, {$set: {hobbies: ["Cooking", "Reading"] } } )
Arrays are marked with square brackets. Inside an array we can have any data: multiple documents (i.e. using the curly brackets {}), numbers, strings, booleans, etc.
If we were to output the document using the .find() function, the document would look something like the below:
{
  "_id": …,
  "name": "Albert Twostone",
  "age": 63,
  "hobbies": [
    "Cooking",
    "Reading"
  ]
}
Albert Twostone will be the only person with hobbies and this will be a list of data. It is important to
note that hobbies is not a nested/embedded document but simply a list of data.


Understanding the Basics & CRUD Operations

Accessing Structured Data
To access structured data within a document we can use the following syntax:
db.passengers.findOne( {name: "Albert Twostone"} ).hobbies
We specify the find query and then the name of the key we wish to access from the returned document. In the above we wanted to access the hobbies data, which returns the hobbies array as the output:
["Cooking", "Reading"]
We can also search for all documents that have a hobby of Cooking, using the syntax below as we have seen previously. This will return the whole document for anyone who has Cooking as a hobby. MongoDB is clever enough to look inside arrays to find documents that match the criteria.
db.passengers.find( {hobbies: "Cooking"} ).pretty()
Below is an example of searching for objects (this includes searching within nested documents):
db.flightData.find( {"status.description": "on-time"} ).pretty()


We use the dot notation to drill into our embedded documents to query our data. It is important that we wrap the dot notation path in quotation marks (e.g. "status.description"), otherwise the find() function would fail.
This returns all documents (the whole document) where the drilled-into criteria matches, allowing us to query by nested/embedded documents. We can drill as deep as we need to using the dot notation, as seen in the example below:
db.flightData.find( {"status.details.responsible": "John Doe"} ).pretty()
This dot notation is a very important syntax to understand as we would use this a lot to query our
data within our mongoDB database.

Understanding the Basics & CRUD Operations

Conclusion
We have now covered all the basic and core features of mongoDB to understand how mongoDB
works and how we can work with it i.e. store, update, delete and read data within the database as
well as how we can structure our data.


Understanding the Basics & CRUD Operations

Resetting The Database
To purge all the data within our mongoDB database server we would use the following command:
use databaseName
db.dropDatabase()
We must first switch to the database using the use command followed by the database name. Once
we have switched to the desired database we can reference the current database using db and then
call on the dropDatabase() command which will purge the specified database and its data.
Similarly, we could get rid of a single collection in a database using the following command:
db.myCollection.drop()
The myCollection should relate to the collection name.
These commands will allow us to clean our database server by removing the database/collections
that we do not want to keep on our mongoDB server.


Schemas & Relations: How to Structure Documents

Why Do We Use Schemas?
There is one important question to ask: wasn't mongoDB all about having no data schemas, i.e. schema-less? To answer this question: mongoDB enforces no schemas. Documents do not have to use the same schema inside one collection. Our documents can look however we want them to look, and we can have totally different documents in one and the same collection, i.e. we can mix different schemas.
Schemas are the structure of a document, i.e. what it looks like, which fields it has and what types of value those fields have. MongoDB does not enforce schemas; however, that does not mean that we cannot use some kind of schema, and in reality we would indeed have some form of schema for our documents. If we are building a backend database, it is in our interest to have some form of structure for the types of documents we are storing. This makes it easier to query our database, get the relevant data and then cycle through this data using a programming language to display it within our application.
We are most likely to have some form of schema, because we as developers want it and our applications need it. Whilst we are not forced to have a schema, we will probably end up with some kind of schema structure, and this is important to understand.

Schemas & Relations: How to Structure Documents

Structuring Documents
Chaos: documents in the same Products collection that look completely different, e.g. { "title": "Book", "price": 12.99 } alongside { "name": "Bottle", "available": true }.

Extra Data: documents share a common core but some carry additional fields, e.g. { "title": "Book", "price": 12.99 } alongside { "title": "Bottle", "price": 5.99, "available": true }.

SQL World: full equality, every document has exactly the same structure, e.g. { "title": "Book", "price": 12.99 } and { "title": "Bottle", "price": 5.99 }.
We can use any of the structured approaches in the diagram above, depending on how we require it in our applications. In reality we would tend to use the approach in the middle or on the right.

The middle approach takes the best of both worlds: there is some structure to the data, but it also has the flexibility advantage that mongoDB provides, so that we can store extra information.
Note: we can assign the null value to properties in order to have a structured approach although
the data may not have any actual values associated with the property. A null value is considered a
valid value and therefore we can use a SQL (structured) type approach with all our documents.
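For example, a hedged sketch of keeping an SQL-like uniform structure by storing null where a value is unknown (these products documents mirror the diagram above):

db.products.insertOne( { title: "Bottle", price: 5.99, available: true } )
db.products.insertOne( { title: "Book", price: 12.99, available: null } )
// Both documents share the exact same set of fields;
// null is a valid value marking data we simply do not have.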
There is no single best practice for how to structure the data within our documents; it is up to us as developers to use the structure that works best for our applications, or whichever is our personal preference.

Schemas & Relations: How to Structure Documents

Data Types
Now that we understand that we are free to define our own schemas/structure for our documents,
we are now going to analyse the different data types we can use in mongoDB.
Data Types are the types of data we can save in the fields within our documents. The below table breaks down the different data types for us:

Type / Example Value:
String: "John Doe"
Boolean: true
NumberInt (int32): 55, 100, 145
NumberLong (int64): 10000000000
NumberDecimal: 12.99
ObjectId: ObjectId("123abc")
ISODate: ISODate("2019-02-09")
Timestamp: Timestamp(11421532)
Embedded Documents: {"a": {…}}
Arrays: {"b": […]}

Notice how the text type requires quotation marks (single or double) around the value. There is no limitation on the size of the text; the only limitation is the 16mb limit for the whole document. The larger the text, the more space it takes.
Notice how numbers and booleans do not require quotation marks around the value.
There are different types of numbers in mongoDB. Integers (int32) are 32-bit numbers; if we try to store a number larger than this range, it would overflow and we would end up with a different number. For larger integers we would use NumberLong (int64). The integer solution we choose dictates how much space will be allocated and eaten up by the data. Finally, we can also store NumberDecimal, i.e. numbers with decimal values (a.k.a. floats in other programming languages).
The default within the shell is to store a 64-bit floating point value (a double), but we also have the special NumberDecimal type provided by mongoDB to store high-precision floating point values. Normal floating point values (a.k.a. doubles) are rounded and are not precise after their decimal place. However, for many use cases the floating point (double) provides enough precision, e.g. a shop application. If we are performing scientific calculations or something else that requires high precision, we can use the special type that offers very high decimal place precision (i.e. 34 digits after the decimal point).
The ObjectId is a special value that is automatically generated by mongoDB to provide a unique id, but it also contains a temporal component: sorting by ObjectId respects the timestamp at which the id was created.
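A quick sketch of that temporal component in the shell:

ObjectId().getTimestamp()
// Returns an ISODate with the creation time that is encoded inside the id,
// which is why sorting by _id roughly follows insertion order.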
The above table provides all the data types within mongoDB that we can use to store data within
our database server.


Schemas & Relations: How to Structure Documents

Data Types & Limits
MongoDB has a couple of hard limits. The most important limitation: a single document in a
collection (including all embedded documents it might have) must be less than or equal to 16mb.
Additionally we may only have 100 levels of embedded documents.
We can read more on all the limitation (in great detail) on the below link:
https://docs.mongodb.com/manual/reference/limits/
For all the data types that mongoDB supports, we can find a detailed overview on the following link:
https://docs.mongodb.com/manual/reference/bson-types/
Important data type limits are:
Normal integers (int32) can hold a maximum value of +-2,147,483,647.
Long integers (int64) can hold a maximum value of +-9,223,372,036,854,775,807.
Text can be as long as we want; the limit is the 16mb restriction for the overall document.


It is also important to understand the difference between int32 (NumberInt), int64 (NumberLong) and a normal number as you can enter it in the shell. The same goes for a normal double and NumberDecimal.
NumberInt creates an int32 value => NumberInt(55)
NumberLong creates an int64 value => NumberLong(7489729384792)
If we just use a plain number within the shell, for example insertOne( {a: 1} ), it will be added as a normal double into the database. The reason for this is that the shell is based on JavaScript, which only knows float/double values and does not differentiate between integers and floats.
NumberDecimal creates a high precision double value => NumberDecimal(“12.99”)
This can be helpful for cases where we need (many) exact decimal places for calculations.
When working with mongoDB drivers for our application's programming language (e.g. PHP, .NET, Node.js, Python, etc.), we can use the driver to create these specific number types. We should always browse the API documentation of the driver we are using to identify the methods for building int32, int64, etc.
Finally we can use the db.stats() command in the mongoDB shell to see stats of our database.

Schemas & Relations: How to Structure Documents

How to Derive Our Data Structure Requirements
Below are some guidelines to keep in mind when we think about how to structure our data:
What data does our App need to generate? What is the business model?
User Information, Products Information, Orders etc. This will help define the fields we would need
(and how they relate).
Where do we need our data?
For example, if building a website: do we need the data on the welcome page, the products list page, the orders page, etc.? This helps define our required collections and field groupings.
Which kind of data or information do we want to display?
For example the welcome page displays product names. This will help define which queries we
need i.e. do we need a list of products or a single product.
These queries we plan to have also have an impact on our collections and document structure.
MongoDB embraces the idea of planning our data structure based on the way we retrieve the data, so that we do not have to perform complex joins but instead retrieve the data in the format, or almost the format, we need it in our application.
How often do we fetch the data?
Do we fetch data on every page reload, every second or not that often? This will help define
whether we should optimise for easy fetching of data.
How often do we write or change the data?
Whether we change or write data often or rarely will help define whether we should optimise for easy writing of data.
The above are things to keep in mind or to think about when structuring our data structures and
schemas.

Schemas & Relations: How to Structure Documents

Understanding Relations
Typically we would have multiple collections, for example a users collection, a products collection and an orders collection. If we have multiple collections whose documents are related, we obviously have to think about how we store that related data.

Do we use embedded documents because this is one way of reflecting a relation or alternatively, do
we use references within our documents?

Nested/Embedded Documents

Customers collection:
{
  "userName": "John",
  "age": 28,
  "address": {
    "street": "First Street",
    "city": "Chicago"
  }
}

References

Customers collection:
{
  "userName": "Alan",
  "favBooks": ["id1", "id2"]
}

Books collection:
{
  "_id": "id1",
  "name": "Lord of the Rings"
}

In the reference example above, we would have to run two queries to join the data from the different collections. However, if a book were to change, we would only update it in the books collection (the id would remain the same), whereas with an embedded document relation we would have to update every affected customer record with the new change.
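As a hedged sketch, the two queries for the reference example above might look like this (using the $in operator to match any of the stored ids):

var customer = db.customers.findOne( { userName: "Alan" } )
db.books.find( { _id: { $in: customer.favBooks } } )
// Step one fetches the customer; step two uses the stored ids
// to fetch the related book documents.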


Schemas & Relations: How to Structure Documents

One to One Embedded Relation Example
Example: one patient has one disease summary; a disease summary belongs to one patient (Patient A - Summary A, Patient B - Summary B, Patient C - Summary C).

Code snippet:
$ use hospital
$ db.patients.insertOne( { name: "John Doe", age: 25, diseaseSummary: { diseases: ["cold", "sickness"] } } )
Where there is a strong one to one relation between two pieces of data, it is ideal to use the one to one embedded approach, as demonstrated in the above example.
The advantage of the embedded (nested) approach is that within our application we only require a single find query to fetch both the patient and the disease data from our database collection.
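A minimal sketch of that single query, against the patients document inserted above:

db.patients.findOne( { name: "John Doe" } )
// One lookup returns the patient together with the embedded diseaseSummary;
// no second query or join step is needed.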

Schemas & Relations: How to Structure Documents

One to One Reference Relation Example
Example: one person has one car; a car belongs to one person (Person A - Car 1, Person B - Car 2, Person C - Car 3).
Code snippet:
$ use carData
$ db.persons.insertOne( { name: "John", age: 30, salary: 30000 } )
$ db.cars.insertOne( { model: "BMW", price: 25000, owner: ObjectId("5b98d4654d01c") } )
In most one to one relationships we would generally use the embedded document relation. However, we can opt for a reference relation approach, as we are not forced to use one approach.
For example, suppose we have an analytics use case rather than a web application, and we are interested in analysing the person data and/or the car data, but not so much the relation between them. In this example we have an application-driven reason for splitting the data.

Schemas & Relations: How to Structure Documents

One to Many Embedded Relation Example
Example: one question thread has many answers; one answer belongs to one question thread (Question Thread A - Answer 1, Answer 2; Question Thread B - Answer 1).
Code snippet:
$ use support
$ db.questionThreads.insertOne( { creator: "John", question: "How old are you?", answers: [ { text: "I am 30." }, { text: "Same here." } ] } )
A scenario where we might use an embedded one to many relation is posts and comments: from an application perspective we would often need to fetch the question along with the answers. Also, there are usually not so many answers that we would need to worry about the 16mb document limit.

Schemas & Relations: How to Structure Documents

One to Many Reference Relation Example
Example: one city has many citizens; one citizen belongs to one city (City A - Citizen 1, Citizen 2; City B - Citizen 1).
Code snippet:
$ use cityData
$ db.cities.insertOne( { name: "New York City", coordinates: { lat: 2121, lng: 5233 } } )
$ db.citizens.insertMany( [ { name: "John Doe", cityId: ObjectId("5b98d6b44d") }, { name: "Bella Lorenz", cityId: ObjectId("5b98d6b44d") } ] )


In the above scenario we may have a database containing a collection of all major cities in the world and a list of every single person living within each city. It might seem to make sense to use a one to many embedded relationship; however, from an application perspective we may wish to retrieve only the city data. Furthermore, a city like New York may have over 1 million people, and this would make fetching the data slow due to the volume of data passing over the wire. We may also end up running into the 16mb document size limit. In this type of scenario it makes sense to split the data up, using the reference relation to link the data.
In the above we only store the city metadata and do not store any citizen references in the city document, as this would also end up being a huge list of citizen unique ids. Instead, we create a citizens collection, and within each citizen's data we store a reference to the city. The reference can be anything but must be unique, i.e. we could use the ObjectId() or the city name, etc.
This will ensure that we do not exceed the limitation of the 16mb per document as well as not
retrieving unnecessary data if we are only interested in returning just the cities metadata from a
collection.


Schemas & Relations: How to Structure Documents

Many to Many Embedded Relation Example
Example: one customer has many products (via orders); a product belongs to many customers (Customer A - Product 1, Product 2; Customer B - Product 3).
Code snippet:
$ use shop
$ db.products.insertOne( { title: "A Book", price: 12.99 } )
$ db.customers.insertOne( { name: "Cathy", age: 18, orders: [ { title: "A Book", price: 12.99, quantity: 2 } ] } )
We would normally model many to many relationships using references. However, it is possible to use the embedded approach as seen above. We could keep the products collection as metadata, which an application can use (via a programming language) to populate the embedded orders documents in the customers collection.
A disadvantage of the embedded approach is data duplication: the title and price of the product are repeated within the orders array, since the customer can order the product multiple times, as can other customers, which causes a lot of duplication.
If we decide to change the data for a product, not only do we need to change it within the products collection, but we also have to change it in all the orders affected by this change (or do we actually need to change old orders?). If we do not care about the product title or price changing, i.e. we have an application that takes a snapshot of the data, we may not worry too much about duplicating that data, because we might not need to change it in all the places where we have duplicated it when the original data changes; this highly depends on the application we build. In such cases an embedded approach may work.
In other scenarios, where we absolutely need the latest data everywhere, a reference approach may be most appropriate for a many to many relationship. Before deciding which approach to adopt, it is important to think about how we would fetch our data, how often we want to change it, and whether we need to change it everywhere or duplicate data is fine.

Schemas & Relations: How to Structure Documents

Many to Many Reference Relation Example
Example: one book has many authors; an author belongs to many books (Book A - Author 1, Author 2; Book B - Author 3).
Code snippet:
$ use bookRegistry
$ db.books.insert( { name: "favourite book", authors: [ ObjectId("5b98d9e4"), ObjectId("5b98d9a7") ] } )
$ db.authors.insertMany( [ { name: "Martin", age: 42 }, { name: "Robert", age: 56 } ] )
The above is an example of a many to many relation where a reference approach may be suitable
for a scenario where the data that changes needs to be reflected everywhere else.

Schemas & Relations: How to Structure Documents

Summarising Relations
We have now explored the different relation options that are available to use. This should provide
us enough knowledge to think about relations and when to use the most appropriate approach
depending on:
the application's needs
how often the data changes
whether a snapshot of the data suffices
how large the data is (how much data we have).
Nested/Embedded Documents: group data together logically. This makes it easier to fetch the data. This approach is great for data that belongs together and does not overlap with other data. We should always avoid super-deep nesting (100+ levels) or extremely long arrays (16mb size limit per document).
References: split data across collections. This approach is great for related but also shared data, as well as for data which is used both in relations and standalone. It allows us to overcome nesting and size limits (by creating new documents).

Schemas & Relations: How to Structure Documents

Using $lookup for Merging Reference Relations
MongoDB has a useful operation called $lookup that allows us to merge related documents that are
split up using the reference approach.
The diagram below shows a reference approach where the customers and books have been split into two collections:

customers:
{ userName: "John", favBooks: ["id1", "id2"] }

books:
{ _id: "id1", name: "Harry Potter" }

The $lookup operator is used as seen below. It uses the aggregate method, which we have not yet covered.
$ db.customers.aggregate( [
  { $lookup: { from: "books", localField: "favBooks", foreignField: "_id", as: "favBookData" } }
] )
The $lookup operator allows us to fetch two related documents merged together in one document
within one step (rather than having to perform two steps). This mitigates some of the disadvantages
of splitting our documents across multiple collections because we can merge them in one go.

This uses the aggregate framework (which we will dive into in later chapters), and within the aggregate we pass in an array, because we can define multiple steps for aggregating the data. For now we are only interested in one step (a step is a document we pass into the array): the $lookup step. The lookup takes a document as its value, where we define 4 attributes:
from: the other collection we want to relate documents from, i.e. the name of the collection where the other document lives that we wish to merge.
localField: in the collection we are running the aggregate function on, the key under which the reference to the other (from) collection is stored.
foreignField: the field we are relating to in the target (i.e. the from) collection.
as: an alias for the merged data. This becomes the new key under which the merged data will sit.
This is not an excuse to always use a reference relation approach, because a lookup costs more performance than an embedded document.
If we have references, or want to use them, we have the $lookup step in the aggregate method to help get the data we need. This is a first look at aggregate, and we will explore what else it can do for us in later chapters.


Schemas & Relations: How to Structure Documents

Understanding Schema Validation
MongoDB is very flexible: we can have totally different schemas and documents in one and the same collection, and that flexibility is a huge benefit. However, there are times where we want to lock down this flexibility and require a strict schema.
Schema validation allows mongoDB to validate incoming data against the schema that we have defined. It will either accept the incoming write or update, or reject it, in which case the database is not changed by the new data and the user gets an error.

validationLevel (which documents get validated?):
strict: all inserts & updates.
moderate: all inserts, and updates to documents that were valid before.

validationAction (what happens if validation fails?):
error: throw an error and deny the insert/update.
warn: log a warning but proceed.

Schemas & Relations: How to Structure Documents

Adding Collection Document Validation
The easiest method to add schema validation in mongoDB is to add validation when we explicitly create a new collection for the very first time (rather than implicitly when we first add data). We can use createCollection to create and configure a new collection:
$ db.createCollection("posts", { validator: { $jsonSchema: {
  bsonType: "object",
  required: ["title", "text", "creator", "comments"],
  properties: {
    title: { bsonType: "string", description: "must be a string and is required" },
    text: { bsonType: "string", description: "must be a string and is required" },
    creator: { bsonType: "objectId", description: "must be an objectId and is required" },
    comments: {
      bsonType: "array",
      description: "must be an array and is required",
      items: {
        bsonType: "object",
        required: ["text"],
        properties: {
          text: { bsonType: "string", description: "must be a string and is required" },
          author: { bsonType: "objectId", description: "must be an objectId and is required" }
        }
      }
    }
  }
} } } )
The first argument to the createCollection method is the name of the collection i.e. we are defining
the name of the collection. The second argument is a document where we would configure the new
collection. The validator is an important piece of the configuration.


The validator key takes another sub-document in which we define a schema that incoming inserts and updates have to be validated against. We do this by adding a $jsonSchema key with another nested sub-document which holds the schema.
We can add a bsonType with the value of object, so that everything that gets added to the collection must be a valid document/object. We can set a required key, which takes an array value. In this array we define the names of fields that are absolutely required in every document added to the collection; if we try to add data that does not have these fields, we get an error or warning depending on our settings.
We can add a properties key, which is another nested document where we define what every property of every document that gets added to the collection should look like. In the example above we defined the title property, which is a required property, in more detail. We can set the bsonType, which is the data type, i.e. string, number, boolean, object, array, etc. We can also set a description for the data property.
Because an array has multiple items, we can add an items key and describe what the items should look like. We can nest this, and it can have its own nested required and properties keys for the item objects that exist within the array.


So the keys to remember are:
The bsonType key is the data type.
The required key is an array of required properties that must be within an insert/update document.
The properties key defines the properties. This has sub key:values of bsonType and description.
The items key defines the array items. This can have sub key:values of all the above.
Important Note: the command may be difficult to read in the terminal, so it may be easier to write it in a text editor first and then paste it into the terminal to execute. We can call the file validation.js to save the collection validation configuration. Visual Studio Code/Atom/Sublime or any other text editor/IDE will help with auto-formatting. Visual Studio Code has an option under Code > Preferences > Keyboard Shortcuts where we can search for a command such as Format Document (the shortcut is Shift + Option + F on a Mac).
We can now validate the incoming data when we explicitly create the new collection. We can copy the command from the text editor, paste it back into the shell and run it to create the new collection with all our validation setup. This will return { "ok" : 1 } in the shell if the new collection is successfully created.
If a new insert/update document fails the validation rules, the document will not be added to the collection.
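As a quick illustration (the document values here are made up), an insert that is missing required fields such as text, creator and comments would now be rejected, and the shell would report a "Document failed validation" error:
$ db.posts.insertOne( { title: "My first post" } )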

Schemas & Relations: How to Structure Documents

Changing the Validation Action
As a database administrator we can run the following command:
$ db.runCommand( { collMod: "posts", validator: { … }, validationAction: "warn" } )
This allows us to run administrative commands in the shell. We pass a document with information about the command we wish to run. In the above we run a command called collMod, which stands for collection modifier, whereby we pass in the collection name and then the validator along with the whole schema.
We can amend the validator as we like, i.e. add or remove validations. In the above we added another setting after the validator document, as a sibling, called validationAction.
The validationLevel controls whether all inserts and updates are checked, or only updates to documents which were valid before. The validationAction on the other hand will either throw an "error" and stop the insert/update action, or "warn" about the error but allow the insert/update to occur. The warn action writes a warning into our log file, which is stored on our system. We can update the validation action later using the runCommand() method as seen above.
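Below is a sketch of a fuller collMod call that sets both the level and the action; the reduced schema shown is illustrative, not the full posts schema from before:
$ db.runCommand( {
    collMod: "posts",
    validator: { $jsonSchema: { bsonType: "object", required: [ "title", "text" ], properties: { title: { bsonType: "string" }, text: { bsonType: "string" } } } },
    validationLevel: "moderate",
    validationAction: "warn"
  } )
With validationLevel set to "moderate", updates to documents that were already invalid before are not re-validated.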


Schemas & Relations: How to Structure Documents

Conclusion
Things to consider when modelling and structuring our Data.
In which format will we fetch our data?
How does the application or data scientists need the data? We want to store the data in a
way that it is easy to fetch especially in a use case where we would fetch a lot.
How often will we fetch and change the data?
Do we need to optimise for writes or reads? It is often for reads but it may be different
depending on the scenario. If we write a lot then we want to avoid duplicates. If we read a lot
then maybe some duplicates are OK, provided these duplicates do not change often.
How much data will we save (and how big is it)?
If the data is huge, maybe embedding is not the best choice.
How is the data related (one to one, one to many, many to many)?
Will duplicate data hurt us (=> many updates)?
Do we update our data a lot, in which case we would have to update a lot of duplicates? Or do we have snapshot data where we do not care about keeping duplicates up to date?
Will we hit the MongoDB data/storage limits (embedding up to 100 levels deep and 16MB per document)?

Modelling Schemas
• Schemas should be modelled based on application needs.
• Important factors are: read and write frequencies, relations, amount (and size) of data.

Schema Validation
• We can define rules to validate inserts and updates before writing to the database.
• Choose the validation level and action based on the application requirements.

Modelling Relations
• Two options: embedded documents or references.
• Use embedded documents if we have one-to-one or one-to-many relationships and there are no app or data size reasons to split the data.
• Use references if data amount/size or app needs require it, or for many-to-many relations.
• Exceptions are always possible — keep the app requirements in mind!
Useful Articles & Documents:
https://docs.mongodb.com/manual/reference/limits/
https://docs.mongodb.com/manual/reference/bson-types/
https://docs.mongodb.com/manual/core/schema-validation/

Exploring The Shell & The Server

Setting dbpath & logpath
In the terminal we can run the following command to see all the available options for our mongoDB
server:
$ mongod --help
This command will provide a list of all the available options we can use to setup/configure our
mongoDB server. For example the --quiet option allows us to change the way things get logged or
output by the server.
Note: use the official documentation on the MongoDB website for a more detailed explanation of all the available options.
The --dbpath arg and --logpath arg options allow us to configure where the data and log files get stored, because mongoDB writes our data to real files on our system. The logs allow us to see, for example, warnings from the json schema validation as we saw in the last section.
We can create folders such as db and logs (these can be named anything we want) and have these folders located anywhere we want; for example, we could create them within the mongoDB directory which contains the bin folder and other related files.
If we start the mongod instance without any additional settings, it will use the default data/db folder on our root drive to store all our database records. However, we can use the options above to tell mongod to use another directory to store our data; the same is true for our logs.
When we start the instance of our mongoDB server, we can run the following command, passing in options to declare the dbpath and logpath as seen below:
Mac/Linux:
$ sudo mongod --dbpath /Users/userName/mongoDB/db
Windows command:
$ mongod --dbpath \Users\userName\mongoDB\db
Enter our password and this should bring up our mongoDB server as we have seen previously. We should now see that mongoDB has created a bunch of files in the db folder, as it is now saving the data in the specified folder that we passed into our command. This is now a totally different database storage for writing all our data, detached from the default database path used previously. Running the following command will also work for our logs:
Mac/Linux:
$ sudo mongod --dbpath /Users/userName/mongoDB/db --logpath /Users/userName/mongoDB/
logs/logs.log
Windows command:
$ mongod --dbpath /Users/userName/mongoDB/db --logpath
\Users\userName\mongoDB\logs\logs.log
The logpath requires a log file, which we define with a .log extension. Running the command will automatically create the logs.log file within the directory path if it does not already exist. All the output will now be logged in the logs.log file instead of the terminal shell as before. This file can be reviewed for persistence and auditing of our server and for viewing any warnings/errors.
This is how we set custom paths for our database and log files.

Exploring The Shell & The Server

Exploring the mongoDB Options
If we explore the different options in mongoDB using the mongod --help command in the terminal,
there are many setup options available to us.
The WiredTiger options are related to our storage engine, and we could either use the default settings or change some configurations if we know what we are doing.
We have useful commands such as --repair, which we could run if we have any issues connecting or any warnings or issues related to our database files being corrupted. We could use the option --directoryperdb, which will store each database in its own separate directory folder.
We could change the storage engine using the --storageEngine arg option, which by default is set to WiredTiger. Theoretically, mongoDB supports a variety of storage engines, but WiredTiger is the default high performance storage engine. Unless we know what we are doing and have a strong reason to change the engine, we should stick to the default.
There are other settings in regards to security which we will touch in the latter chapters.

Exploring The Shell & The Server

MongoDB as a Background Service
In the mongoDB options, there is an option called --fork which can only run on Mac and Linux.
$ mongod --fork --logpath /Users/userName/mongoDB/logs/logs.log
The above fork command will error if we do not pass in a logpath to the log file. This command will
start the mongoDB server as a child process. This does not block the terminal and we can continue
to type in other commands in the same terminal with the server running. The server is now running
as a background process instead of a foreground process which usually blocks the terminal
window. In other words the mongoDB server is now running as a service (a service in the
background). Therefore, in the same terminal we could run the mongo command to connect to the background mongoDB server service. This is also the reason why we are required to pass in a logpath: because the service runs in the background, it cannot log error/warning messages to the terminal and instead writes warnings and errors to the log file.
On Windows, the fork command is unavailable. However, on Windows we can still start up the mongoDB server as a service if we checked this option during the installation process. If we right click on command prompt and run as administrator, we can run the following command:


$ net start MongoDB
This will start up the mongoDB server as a background service. The question then becomes, how do
we stop such a service?
On Mac we can stop the service by connecting to the server with the mongo shell and then
switching to the admin database and running the shutdown server command to shut down the
server we are connected to. Example commands below:
$ use admin
$ db.shutdownServer()
The exact same approach as the above will work on Windows. On Windows we also have an
alternative method by opening the command prompt as administrator and running the following
command:
$ net stop MongoDB
This is how we can use MongoDB server as a background service (instead of a foreground service)
on either Mac, Linux or Windows.


Exploring The Shell & The Server

Using a Config File
Now that we have seen the various options we can set and use to run our mongoDB server, it is also
worth noting that we can save our settings in a configuration file.
https://docs.mongodb.com/manual/reference/configuration-options/
This file may already have been created for us when mongoDB was installed; otherwise we can create the config file ourselves and save it anywhere we want. We could, for example, create the config file within the Users/userName/mongoDB/bin folder using a text editor such as VS Code, and add the configuration:
storage:
  dbPath: "/Users/userName/mongoDB/db"
systemLog:
  destination: file
  path: "/Users/userName/mongoDB/logs/logs.log"
We can look at the documentation or do a google search for more comprehensive config file setups.

Once we have the config file setup, how do we use the config file when we run an instance of the
mongoDB server? MongoDB does not automatically pickup this file when we start to run the
mongoDB server, instead when starting mongoDB we can use the following command to specify
the config file the server should use:
$ sudo mongod --config /Users/userName/mongoDB/bin/mongod.cfg
$ sudo mongod -f /Users/userName/mongoDB/bin/mongod.cfg
Either of the above commands will prompt mongoDB to use the config file from the specified path. This will start the mongoDB server with the settings set up in the configuration file. This is a useful feature because it allows us to save a snapshot of our settings (a reusable blueprint) in a separate file which we can always use when starting up our mongoDB server. It also saves us from writing a very long command with all our settings each time we start the server.
Important Note: we could use either .cfg or .conf as the file extension name when creating the
mongoDB configuration file.


Exploring The Shell & The Server

Shell Options & Help
In this section we will go over the various shell options available for us to use. Similar to the mongoDB server, there is a help option for the mongoDB shell:
$ mongo --help
This will provide all the command options for the shell. This has fewer options compared to the server, because the shell is just a connecting client at the end of the day and not a server. We can use the shell without connecting to a database (if we just want to run JavaScript code) using the --nodb option, we could use the --quiet option to have less output information in the terminal, we can define the port and host for the server using the --port arg and --host arg options (by default it uses localhost:27017), and there are many more options.
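As a small illustrative sketch (the values below simply restate the defaults), several of these options can be combined in a single command:
$ mongo --quiet --host localhost --port 27017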
We can also add Authentication Options information, which we will learn more about in later chapters.
In the shell we also have another command we can run:
$ help


This command will output a shortlist of some important help information/commands we can
execute in the shell. We can also dive deeper into the help by running the help command followed
by the command we want further help on, for example:
$ help admin
This will show further useful commands that we can execute when using the admin command e.g.
admin.hostname() or admin.pwd() etc.
We can also have help displayed for a given database or collection in a database. For example:
$ use test
$ db.help()
We would now see all the commands, which we did not see before, that we can use on the new "test" database. We can also get help at the collection level, which will provide a list of all the commands we can execute on a collection.
$ db.testCollection.help()
Useful Links:
https://docs.mongodb.com/manual/reference/configuration-options/
https://docs.mongodb.com/manual/reference/program/mongo/
https://docs.mongodb.com/manual/reference/program/mongod/

Using the MongoDB Compass to Explore Data Visually

Exploring MongoDB Compass
We can download MongoDB Compass from the below link:
https://www.mongodb.com/products/compass
This is a GUI tool to interact with our MongoDB database. Once downloaded and installed on our
machines, we are ready to use the GUI tool. It is important to have the mongod server running in the background when we open MongoDB Compass so that it can connect to the database. We connect to a Host, which by default is localhost on port 27017. We can click connect and this will connect the GUI tool to the mongod server. We should then see the 3 default databases: admin, config and local.
We can now use the GUI tool to create a new database and collection name. Once a database and
collection has been created we can then insert documents to the collection. We can also query our
database documents.
We can now start using a GUI tool to interact with our database, collections and documents.
Note: it is best practice to learn how to use the shell first before using GUI tools.

Diving Into Create Operation

Understanding insert() Methods
We already know that there are two methods for inserting documents into mongoDB: insertOne() and insertMany(). The most important thing to note is that insertOne() takes in a single document, and we can but do not need to specify an _id because we get one automatically. The insertMany() method does the same but with an array (list) of documents.
There is also a third alternative method for inserting documents called insert() — below is an
example:
$ db.collectionName.insert()
This command is more flexible because it takes either a single document or an array of documents. Insert was used in the past, but insertOne and insertMany were introduced on purpose so that we are clearer about what we are inserting. Previously, in application code it was difficult to tell from the insert command whether the application was inserting a single document or multiple documents, which made it error prone.
There is also a command for importing data, as seen below:
$ mongoimport -d cars -c carsList --drop --jsonArray

The insert method can still be used in mongoDB but it is not recommended. The insert() method
works with both a single document and multiple documents as seen in the examples below:
$ db.persons.insert( { name: "Annie", age: 20 } )
$ db.persons.insert( [ { name: "Barbara", age: 45 }, { name: "Carl", age: 65 } ] )
The output message in the terminal is also slightly different i.e. we would receive a text of:
$ WriteResult( { "nInserted" : 1 } )
$ BulkWriteResult( { "writeErrors": [], "writeConcernErrors": [], "nInserted": 2, "nUpserted": 0, "nMatched": 0, "nModified": 0, "nRemoved": 0, "upserted": [] } )
The above does not mean that the inserted documents did not get an autogenerated id. The insert method will automatically create an ObjectId but will not display it, unlike the insertOne and insertMany commands whose output messages do display the ObjectId. We can see the advantage of insertOne and insertMany, as the output message is a little more meaningful/helpful: we can immediately work with the document using the ObjectId provided (i.e. we do not need to query the database to get the new document's id).
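For comparison, an insertOne() call returns the generated id directly (the ObjectId value below is just a placeholder):
$ db.persons.insertOne( { name: "Denise", age: 31 } )
{ "acknowledged" : true, "insertedId" : ObjectId("...") }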


Diving Into Create Operation

Working With Ordered Inserts
When inserting documents we can define or specify some additional information. Let's look at an example of a hobbies collection, where we keep track of all the hobbies people could possibly have, inserting many hobbies at once. Each hobby is a document with the name of the hobby:
$ db.hobbies.insertMany( [ { _id: "sports", "name": "Sports" }, { _id: "cooking", "name": "Cooking" }, { _id: "cars", "name": "Cars" } ] )
The ids for these hobbies could be auto-generated. However, there may be times when we want to use our own id, because the data may have been fetched from some other database where we already have an existing id associated with it, or maybe we need a shorter id. We can use _id and assign a
value for the id. In the above the hobby name could act as a good id because each hobby will be
unique. We must use _id and not just id if we want to set our own id for our documents.
Furthermore, the id must be unique else this would not work. We will no longer see an ObjectId()
for these documents as we have used the _id as the unique identifier for the documents inserted.
If we try to insert a document with the same id we would receive an error message in the terminal
referencing the index number (mongoDB uses zero indexing) of the document that failed the insert
operation along with a description of duplicate key error.

$ db.hobbies.insertMany( [ { _id: "yoga", "name": "Yoga" }, { _id: "cooking", "name": "Cooking" }, { _id: "hiking", "name": "Hiking" } ] )
The above would fail due to the duplicate key error on cooking, which was inserted by the previous command. However, we would notice that the first item in the insertMany array, i.e. Yoga, will be inserted into the hobbies collection, but the cooking and hiking documents will not be inserted into the collection due to the error. This is the default behaviour of mongoDB and it is called an ordered insert.
An ordered insert simply means that every element we insert is processed standalone, but if one fails, it cancels the entire insert operation without rolling back the elements it has already inserted.
This is important to note because it cancels the operation and does not continue to the next document (i.e. hiking), even though we know that insert would have succeeded.
Often we would want this default behaviour, however, sometimes we do not. In these cases, we
could override the behaviour. We would pass in a second argument, separated by a comma, to the
insertMany command which is a document. This is a document that configures the insertMany
operation.
$ db.hobbies.insertMany( [ { _id: "yoga", "name": "Yoga" }, { _id: "cooking", "name": "Cooking" }, { _id: "hiking", "name": "Hiking" } ], { ordered: false } )

The ordered option allows us to specify whether mongoDB should perform an ordered insert which
is the default (we could set this ordered option to true which is redundant because this is the default
option) or we could set this option to false which will make the insert operation not an ordered
insert i.e. an unordered insert.
If we hit enter, we would still get a list of all the errors; however, the operation will continue to the next document and perform the insert for any document that does not have a duplicate key issue, i.e. hiking will now be inserted into the hobbies collection (yoga and cooking would fail due to the duplicate key issue).
By setting the ordered to false, we have changed the default behaviour and it is up to us to decide
what we require or want in our application. It is important to note that this will not rollback the entire
insert operation if something failed. This is something we will cover in the Transactions chapter. We
can control whether the operation continues with the other documents and tries to insert everything
that is perfectly fine.
We may use an unordered insert where we do not have much control over what is inserted into the database, but we do not care about documents that fail because they already exist. We could simply add everything that is not yet in the database.


Diving Into Create Operation

Understanding the writeConcern
There is a second option we can specify on insertOne and insertMany which is the writeConcern
option. We have a client (either the shell or the application using a mongoDB server) and we have
our mongoDB server. If we wanted to insert one document in our mongoDB server, on the
mongoDB server we have a so called storage engine which is responsible for really writing our data
onto the disk and also for managing it in memory. So our write might first end up in memory and
there it manages the data which it needs to access with high frequency because memory is faster
than working with the disk. The write is also scheduled to then end up on the disk, so it will
eventually store data on the disk. This is true for all write operations i.e. insertMany and update.
We can configure a so-called writeConcern on all the write operations with an additional argument,
the writeConcern which is another document where we can set settings.

Diagram: a client (e.g. the shell) sends a write operation such as insertOne() with writeConcern { w: 1, j: undefined } to the MongoDB server (mongod). The storage engine manages the data in memory, can note pending writes in the journal, and eventually writes the data to the database files on disk.

The w option (default w: 1) tells the mongoDB server how many instances the write should be acknowledged by. The j option stands for journal, which is an additional file the storage engine manages, like a to-do file. The journal can be used, for example, to record save operations that the storage engine needs to perform but has not completed yet.
The storage engine is aware of the write, and that it needs to store the data on disk, just by having the write acknowledged and in memory. The idea behind a journal file is to persist this awareness: if the server goes down for some reason, or anything else happens, the journal file is still there. If we restart the server, or the server recovers, it can look at this file and see what it still needs to do. This is a nice backup, because the memory might have been wiped by then. The journal acts as a backup to-do list for the storage engine.
Writing into the database files is more performance heavy, whereas a journal entry is like a single line describing the write operation. Writing into the database is a more complex task, because we need to find the correct position to insert the data, and if we have indexes we also need to update these as well, so it takes longer to perform. Writing into a to-do style list is much quicker.
We can set j: true as an option, which will only report a success for a write operation once it has been acknowledged and saved to the journal. This provides greater security.

There is a third option for the writeConcern, which is the wtimeout option. This simply sets the timeframe that we give our server to report a success for the write before we cancel it. For example, if we have some issues with the server connection or anything of that nature, we may simply time out.
If we set the timeout value to a very low number, we may get more failures even though there is no actual problem, just some small latency.

{ w: 1, j: undefined }

{ w: 1, j: true }

{ w: 1, wtimeout: 200, j: true }

This is the writeConcern option we can add to our write operations, and the above documents show how we can control it. Enabling the journal means that our writes take longer, because we do not only tell the server about the write operation but also wait for the server to store it in the journal; in return, we get higher security that the write actually succeeded. These options will again depend on our application needs.


Diving Into Create Operation

The writeConcern in Practice
Below is an example of using the writeConcern:
$ db.persons.insertOne( { name: "Alan", age: 44 }, { writeConcern: { w: 1, j: true } } )
The w: 1 (default) simply means to make sure the server acknowledged the write operation. Note we could set this value to 0, which will return { "acknowledged" : false } in the terminal when we insert the document. This option sends the request and immediately returns without waiting for a response from the server. The storage engine has had no chance to store it in memory or to generate an ObjectId, hence why we receive { "acknowledged" : false } in the terminal. This makes the write super fast, because we do not have to wait for any response, but we do not know whether the write succeeded or not.
The journal by default is set to undefined or false. We can set this option to j: true. The output in the terminal does not change. The write will be slightly slower (note that if we are playing around locally, we will not notice any change in speed), because the engine adds the write to the journal and we wait for that to finish before the operation is completed. This provides higher security by ensuring the write appears in the to-do list of the storage engine, which will eventually lead to the write operation occurring on the database.
Finally, the wtimeout option sets a timeout for the write operation. This allows us to set a time frame for the write so that, in periods where we have shaky connections, we would rather have the write operation fail, recognise it in our client application (we have access to the error) and try again at a later time, instead of waiting unnecessarily for the write operation to complete.
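Putting all three settings together (the document values here are illustrative):
$ db.persons.insertOne( { name: "Chris", age: 51 }, { writeConcern: { w: 1, j: true, wtimeout: 200 } } )
If the server cannot acknowledge and journal the write within 200 milliseconds, the operation reports a timeout error instead of succeeding.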

Diving Into Create Operation

What is Atomicity?
Atomicity is a very important concept for any write operation. Most of the time a write operation such as insertOne() will succeed; however, it can fail (there can be an error). These are errors that occur whilst the document is being inserted/written (i.e. whilst being handled by the storage engine). For example, say we were writing a document for a person including a name, age and an array of hobbies, and the name and age were written but then the server had issues and was not able to write the hobbies. MongoDB protects us against this, as it guarantees an atomic transaction: the operation either succeeds as a whole or fails as a whole. If it fails during the write, everything is rolled back for the document we inserted.


This guarantee is on a per document level. The document means the top level document, including all embedded documents and all arrays.
If we insertMany() multiple documents and the server fails during a write, we do not get atomicity across the whole batch, because it only works at a document level. If we have multiple documents in one operation like insertMany(), then only each document on its own is guaranteed to either fail or succeed, not the insertMany operation as a whole. Therefore, if we have issues during the insertMany operation, only the documents that failed are not inserted; the exact behaviour then depends on whether we used ordered or unordered inserts, but the documents already inserted will not be rolled back.
We are able to control this on a bulk insert or bulk update level using a concept called transactions
which we will look at in a later section as it requires some additional knowledge about mongoDB
and how the service works.

Diagram: an operation (e.g. insertOne()) either succeeds, in which case the document is saved as a whole, or errors, in which case it is rolled back (i.e. nothing is saved). MongoDB CRUD operations are atomic on the document level (including embedded documents).


Diving Into Create Operation

Importing Data
To import data into our database, we must first exit the shell by pressing the control + c keys on our
keyboard.
In the normal terminal, we need to navigate, using the cd command, to the folder that contains the JSON file we want to import. We can use the ls command to view the list of items within the directory we are currently in.
Once navigated to the folder containing the import file, we can run the following command:
$ mongoimport tv-shows.json -d moviesData -c movies --jsonArray --drop
The mongoimport command should be globally available since we added the path to our mongo
binary to our path variables on our operating systems. If we did not do this, we need to navigate
into the folder where our mongoDB binaries are in order to execute the mongoimport command
above.


The first argument we pass is the name of the file we want to import (if we are not in the path of the
located file we would have to specify the full folder path along with the file name). We then specify
the database we want to import the data into using the -d flag. We can also specify the collection by
using the -c flag.
If the JSON file holds array of documents we must also specify the --jsonArray flag to make the
mongoimport command aware of this fact about the import data.
The last argument option we can add to the import command is --drop, which tells mongoimport that if this collection already exists, it should be dropped and then re-added; otherwise the data is appended to the existing collection.
Important Note: the mongod server should be running in the background when we use the import command. When we press enter to execute the command, the terminal will print which mongoDB server it connected to (connected to: localhost), whether a collection was dropped from the database (dropping: moviesData.movies) and the total number of documents imported (imported: # documents).


Diving Into Read Operation

Methods, Filters & Operators
In the shell, we access the database with the db command (this will differ slightly with a mongoDB driver). We get access to a database and then to a collection in the database. We can then execute a method like find, insert, update or delete on the collection. We pass some data into the method as parameters/arguments. These are usually key:value pairs where one is the field and the other is the value for that field (documents are all about fields and values, or keys and values).
$ db.myCollection.find( { age: 32 } )

Here db accesses the current database, myCollection accesses this collection and find() is the method we apply. { age: 32 } is an equality/single value filter, where age is the field and 32 is the value.

The argument in the above example happens to also be a filter because the find method accepts a
filter. It can use a filter to narrow down the set of documents it returns to us. In the above we have a
equality or single value filter where the data is exactly the criteria i.e. equality.


We can also use more complex filters, as seen in the example below. We have a document which has a field whose value is another document, which in turn has an operator as a field followed by a value.
$ db.myCollection.find( { age: { $gt: 30 } } )

This is a range filter: age is the field, $gt is the operator and 30 is the value.

We can recognise operators by the dollar sign $ at the beginning of the operator. These are all reserved fields which are understood by mongoDB. The operator in the above example is called a range filter, because it does not just filter for equality; instead, it will look for all documents that have an age greater than ($gt) the value, i.e. 30.
This is how the Read operation works. We will look at various different operators, the different ways of using them and the different ways of filtering the data that is returned to us. This is the structure we should familiarise ourselves with for all of our Read operations.


Diving Into Read Operation

Operators and Overview
There are different operators that we can differentiate into two groups:
Query Selectors
Projection Operators
Query selectors such as $gt allow us to narrow down the set of documents we retrieve, while projection operators allow us to transform/change the data we get back to some extent. Both the query and projection operators are Read related operators.
Aggregation allows us to read from a database but also to perform more complex transforms. This concept allows us to set up a pipeline of stages to funnel our data through, and we have a few operators that allow us to shape the data we get back into the form we need in our application (an example follows the list below).
Pipeline Stages
Pipeline Operators
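As a tiny preview of such a pipeline (a sketch only; aggregation is covered properly in a later chapter, and the type field used for grouping is an assumption about the dataset):
$ db.movies.aggregate( [ { $match: { "rating.average": { $gt: 7 } } }, { $group: { _id: "$type", total: { $sum: 1 } } } ] )
This funnels the documents through a $match stage and then counts the surviving documents per type in a $group stage.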
Update operations have operators for fields and arrays. Inserts have no operators, and deletes use the same operators as Read operations.

How do operators impact our data?

Type                  Purpose                         Changes Data?   Example
Query Operator        Locate data                     No              $eq, $gt
Projection Operator   Modify data presentation        No              $
Update Operator       Modify & add additional data    Yes             $inc

Diving Into Read Operation

Query Selectors & Projection Operators
There are a couple of categories of Query Selectors:
Comparison, Logical, Element, Evaluation, Array, Comments & Geospatial.
For Projection Operators we have:
$, $elemMatch, $meta & $slice


Diving Into Read Operation

Understanding findOne() and find()
The findOne() method finds exactly one document. We are able to pass a filter into the method to define which document to return. It will find the first matching document.
$ db.movies.findOne( )
$ db.movies.findOne( { } )
Both of the above will return the first document within the database collection. Note, findOne does not return a cursor, as it only returns one document.
The alternative to findOne() is the find() method. The find() method returns a cursor. This method theoretically returns all matching documents, but since it provides us with a cursor, it does not give us all the documents at once; the shell displays the first 20 documents.
$ db.movies.find( )
To narrow the find search we need to provide a filter. To provide a filter we pass a document as the first argument (this is true for both the find and findOne methods). The difference is that findOne will return the first document that meets the criteria, while the find method will return all documents that meet the criteria.
$ db.movies.findOne( { name: "The Last Ship" } )
$ db.movies.findOne( { runtime: 60 } )
$ db.movies.find( { name: "The Last Ship" } )
$ db.movies.find( { runtime: 60 } )
To filter the data, we specify the name of the field/key followed by the value we expect to filter the field by. In the above example we are filtering for movies whose name is "The Last Ship". By default mongoDB filters by equality.
This is the difference between find and findOne and how we would pass in a filter to narrow down
the return read results. It is important to note that there are way more operators and ways to filtering
our queries to narrow down our Read results when using either of the find commands.

Diving Into Read Operation

Working with Comparison Operators
In the official documentation we can view all the various operations available to us:
https://docs.mongodb.com/manual/reference/operator/query/

We will explore some of the comparison operators in the below examples:
$ db.movies.find( { runtime: 60 } ).pretty( )
$ db.movies.find( { runtime: { $eq: 60 } } ).pretty( )
The $eq operator is exactly the same as the default equality query, which finds the documents that equal the query value, in the above case runtime = 60.
$ db.movies.find( { runtime: { $ne: 60 } } ).pretty( )
This will return all documents that have a runtime not equal to 60.
$ db.movies.find( { runtime: { $gt: 40 } } ).pretty( )
$ db.movies.find( { runtime: { $gte: 40 } } ).pretty( )
The $gt operator returns all documents that have a runtime greater than 40, while $gte returns those with a runtime greater than or equal to 40.
$ db.movies.find( { runtime: { $lt: 40 } } ).pretty( )
$ db.movies.find( { runtime: { $lte: 40 } } ).pretty( )
The $lt operator returns all documents that have a runtime less than 40, while $lte returns those with a runtime less than or equal to 40.

Diving Into Read Operation

Querying Embedded Fields & Arrays
We are not limited to querying top level fields; we are also able to query embedded fields and arrays. Querying embedded fields and arrays is quite simple, as demonstrated below:
$ db.movies.find( { "rating.average": { $gt: 7 } } ).pretty( )
We specify the path to the field we are interested in querying. We must put the path within quotation marks, because the dots used to describe each embedded level would otherwise invalidate the syntax. The above example is a single level embedded document; if we wrote e.g. rating.total.average, that would be a 2 level embedded document. We can make the path as deep as we need it to be and we are not limited to one level.
We can also query arrays as seen below:
$ db.movies.find( { genres: "Drama" } ).pretty( )
$ db.movies.find( { genres: ["Drama"] } ).pretty( )
The casing is important in the query. The first query returns all documents whose genres include Drama. Equality on an array does not mean that Drama is the only item within the array; it means that Drama exists within the array. If we wanted the array to contain exactly and only Drama, we would use the square brackets as in the second query. We can also use the dot notation to go down embedded paths that contain an array.

Diving Into Read Operation

Understanding $in and $nin
If there are discrete values that we wish to query for, for example a runtime that is either 30 or 42, we can use the $in operator. The $in operator takes an array which holds all the values that are accepted for the key/field.
The below example returns data with a runtime equal to 30 or 42.
$ db.movies.find( { runtime: { $in: [ 30, 42 ] } } ).pretty( )
The $nin operator on the other hand is the opposite of the $in operator. It finds everything where the value is not within the set of values defined in the square brackets. The below example returns all entries where the runtime is not equal to 30 or 42.
$ db.movies.find( { runtime: { $nin: [ 30, 42 ] } } ).pretty( )
We have now explored all the comparison operators within mongoDB and will continue by looking at the logical query operators $and, $not, $nor and $or in the next section.


Diving Into Read Operation

Understanding $or and $nor
There are four logical operators: $and, $not, $nor and $or. We will probably use the $or logical operator more than the other logical operators. Below is an example of the $or and $nor operators in action.
$ db.movies.find( { $or: [ { "rating.average": { $lt: 5 } }, { "rating.average": { $gt: 9.3 } } ] } ).pretty( )
$ db.movies.find( { $or: [ { "rating.average": { $lt: 5 } }, { "rating.average": { $gt: 9.3 } } ] } ).count( )
We start the filter with $or to tell mongoDB that we have multiple conditions, and then add an array which holds all the conditions that mongoDB will check. The or logical condition means that it will return results that match any of these conditions. We specify our filters as we normally would within find, but now held within the $or array. We can have as many expressions as we want within the $or array; in the above we have two conditions. Note: if we swap pretty( ) for count( ), the query returns the total number of documents that meet the criteria rather than the documents themselves.
$ db.movies.find( { $nor: [ { "rating.average": { $lt: 5 } }, { "rating.average": { $gt: 9.3 } } ] } ).pretty( )
The $nor operator is the opposite/inverse of the $or operator. It returns documents where neither of the conditions is met, i.e. the complete opposite.

Diving Into Read Operation

Understanding $and Operator
The syntax for the $and operator is similar to the $or and $nor operators. The array of documents acts as the logical conditions, and the query returns all documents where all conditions are met. We can have as many conditions as we want. Below is an example of the $and logical operator:
$ db.movies.find( { $and: [ { "rating.average": { $gt: 9 } }, { genres: "Drama" } ] } ).pretty( )
In this example, we are trying to find all documents that are Drama with a rating higher than 9. This is the old syntax; there is now a shorter syntax as seen below:
$ db.movies.find( { "rating.average": { $gt: 9 }, genres: "Drama" } ).pretty( )
The new shorter syntax does not require the $and operator; instead we use a single document and write our conditions separated by commas. By default, mongoDB ands all the key fields that we add to the filter document. The $and is the default concatenation for mongoDB.
The reason why we still have the $and operator is that not all drivers accept the above syntax. Furthermore, the shorthand syntax returns a different result when we filter on the same key more than once.

If we examine the two queries below, we notice that they return different results.
$ db.movies.find( { $and: [ { genres: "Drama" }, { genres: "Horror" } ] } ).count( )
$ db.movies.find( { genres: "Drama", genres: "Horror" } ).count( )
When we use the second syntax, mongoDB replaces the first condition with the second, and therefore it is the same as filtering for the Horror genre only.
$ db.movies.find( { genres: "Drama", genres: "Horror" } ).count( )
$ db.movies.find( { genres: "Horror" } ).count( )
Therefore, in the scenario where we are looking for both the Drama and Horror values in the genres field, it is recommended to use the $and operator so that mongoDB requires both conditions to be true.


Diving Into Read Operation

Understanding $not Operator
The $not operator inverts the effect of a query operator. For example, we can query for movies that do not have a runtime of 60 minutes, as seen in the syntax below.
$ db.movies.find( { runtime: { $not: { $eq: 60 } } } ).count( )
The $not operator is less likely to be used, as the same result can often be achieved with simpler alternatives, for example the not equal operator or the $nor operator:
$ db.movies.find( { runtime: { $ne: 60 } } ).count( )
$ db.movies.find( { $nor: [ { runtime: { $eq: 60 } } ] } ).count( )
There are a lot of ways of querying for the inverse; however, where we cannot simply invert the query in another way, we have $not to look for the opposite.
We have now examined all four logical operators available within mongoDB that we can use as filters for our Read operations.


Diving Into Read Operation

Element Operators
There are two element operators: $exists and $type. These allow us to query by the elements within our database collection.
$ db.users.find( { age: { $exists: true } } ).pretty( )
This will check within our database and return all documents that contain an age element/field. Alternatively, we could query $exists to be false in order to retrieve all documents that do not have an age element/field.
We can combine the $exists operator with other operators. In the below example we are filtering for documents where the age element exists and age is greater than or equal to 30:
$ db.users.find( { age: { $exists: true, $gte: 30 } } ).pretty( )
To search for a field that exists and also has a value (i.e. is not null), we would query as seen below:
$ db.users.find( { age: { $exists: true, $ne: null } } ).pretty( )
The $type operator on the other hand, as the name suggests, returns the documents where the specified field has the specified data type.
$ db.users.find( { phone: { $type: "number" } } ).pretty( )
The example above returns documents where the phone field holds values of the data type number. Number is an alias that covers both floats and integers. If we searched for the type double, this would also return documents even where the values have no decimal places. Since the shell is based on JavaScript, a number inserted via the shell is by default stored as a floating point number (a double), because JavaScript does not distinguish between integers and doubles; it only knows doubles. The shell takes the number and stores it as a double even if it has no decimal places, which is why searching by the type double also retrieves these documents.
We can also specify multiple types by passing an array. The below will look for both the double and string data types and return documents that match either:
$ db.users.find( { phone: { $type: [ "double", "string" ] } } ).pretty( )
We can use the $type operator to ensure that we only work with the right type of data when returning documents.


Diving Into Read Operation

Understanding Evaluation Operators - $regex
The $regex operator allows us to search for text. This type of query is not super performant. Regex
stands for regular expression and it allows us to search for certain text based on certain patterns.
Regular expressions are a huge, complex topic of their own and are not covered deeply within this mongoDB guide. Below is an example of using a simple $regex operator.
$ db.movies.find( { summary: { $regex: /musical/ } } ).pretty( )
In this example the query will look at all the summary key field values to find the word musical
contained in the value and return all matching results.
Regex is very useful for searching for text based on a pattern, however, it is not the most efficient/
performant way of searching/retrieving data (the text index may be a better option and we will
explore this in later chapters).


Diving Into Read Operation

Understanding Evaluation Operators - $expr
The $expr operator is useful if we want to compare two fields inside of one document and then find
all documents where this comparison returns a certain result. Below is an example code:
$ use financialData
$ db.sales.insertMany( [ { volume: 100, target: 120 }, { volume: 89, target: 80 }, { volume: 200, target:
177 } ] )
$ db.sales.find( { $expr: { $gt: [ "$volume", "$target" ] } } ).pretty( )
In the above $expr (expression) we are retrieving all documents where the volume is above the
target. This is the most typical use case where we would use the expression operator to query the
data in such a manner.
The $expr operator takes in a document describing the expression. We can use comparison operators like $gt, $lt and so on; more valid expressions and usable operators can be found in the official documentation. We reference fields (rather than hardcoded values) by wrapping the field name in quotation marks with a dollar sign at the beginning. This tells mongoDB to look at the field and use its value in the expression. The above query returns the two documents that meet the expression criteria.

Below is another more complex expression example:
$ db.sales.find( { $expr: { $gt: [ { $cond: { if: { $gte: [ "$volume", 190 ] }, then: { $subtract: [ "$volume", 10 ] }, else: "$volume" } }, "$target" ] } } ).pretty( )
Not only are we comparing whether volume is greater than target, but where volume is above 190, the difference between volume and target must also be at least 10. To achieve this we change the expression inside our $gt operator.
The first value is a document where we use the special $cond (condition) operator, which works in tandem with the $expr operator. We use if: and then: to calculate the value dynamically. The if takes another comparison expression: for all items where volume is greater than or equal to 190, we $subtract 10 from the volume value. We use the else: case for items that do not match this criterion, in which case we just use the volume value.
We finally compare the computed value with the target to check whether it is still greater than the target. This should return 2 documents.
As we can see from the example above, this is a very powerful command within our tool belt when
querying data from a mongoDB database.

Diving Into Read Operation

Diving Deeper into Querying Arrays
There are multiple things we can do when querying arrays, and there are special operators that help us with querying arrays. Say we want to find all documents that have an embedded Sports document inside a hobbies array, where each hobby has a title and a frequency. We cannot use the normal queries that we have used previously:
$ db.users.find( { hobbies: "Sports" } ).pretty( )
$ db.users.find( { hobbies: { title: "Sports" } } ).pretty( )
Neither of these returns any results when the embedded documents have multiple fields.
$ db.users.find( { hobbies: { title: "Sports", frequency: 2 } } ).pretty( )
This will find documents that match both fields exactly; however, what if we only want to retrieve all documents that have an embedded Sports document in the array, regardless of the frequency?
$ db.users.find( { "hobbies.title": "Sports" } ).pretty( )
We search for a path using the dot notation, which must be wrapped in quotation marks. MongoDB will go through all of the hobbies elements, and within each element it will dig into the document and compare the title to our query value of Sports. Therefore, this will retrieve the relevant documents even for embedded documents inside an array with multiple items. This is how we query array data using the dot notation.

Diving Into Read Operation

Using Array Query Selector - $size, $all &
$elemMatch
There are three dedicated query selectors for arrays: the $size, $all and $elemMatch operators. We will examine each of these selectors and their applications.
The $size selector operator allows us to retrieve documents where an array is of a certain size, for example all documents that have an embedded array of 3 documents:
$ db.users.find( { hobbies: { $size: 3 } } ).pretty( )
This will return all documents within the users collection where the hobbies array holds exactly 3 documents. Note: the $size operator takes an exact number value and cannot express something like greater or less than 3. MongoDB does not support that for $size yet, and we have to use a different method (see the sketch below).
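One possible workaround (a sketch, assuming every document has a hobbies array) is to combine the $expr operator from earlier with the $size aggregation operator:
$ db.users.find( { $expr: { $gte: [ { $size: "$hobbies" }, 3 ] } } ).pretty( )
This returns all users whose hobbies array holds 3 or more documents.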
The $all selector operator allows us to retrieve documents where an array contains a given set of values, without worrying about the order of the items within the array. For example, say we had a movie collection where we wanted to retrieve the movies with the genres thriller and action, without caring about the order of the values; this is where the $all array selector helps us.
$ db.movies.find( { genre: [ "action", "thriller" ] } ).pretty( )
$ db.movies.find( { genre: { $all: [ "action", "thriller" ] } } ).pretty( )
The second syntax ensures that both the action and thriller elements exist within the genre field while ignoring their order (i.e. ordering does not matter), whereas the first syntax takes the order of the elements into consideration (i.e. the ordering matters).
Finally, the $elemMatch array selector allows us to retrieve documents where one and the same array element matches all our conditions.
The query below finds all documents where hobbies contains at least one document with the title Sports and at least one document with a frequency greater than or equal to 3, but these do not have to be the same document/element. This means a user who has a Sports hobby with a frequency below 3, plus a different hobby with a frequency greater than or equal to 3, would still match the criteria:
$ db.users.find( { $and: [ { "hobbies.title": "Sports" }, { "hobbies.frequency": { $gte: 3 } } ] } ).pretty( )
To ensure we only retrieve documents where one and the same hobby is Sports and has a frequency greater than or equal to 3, the $elemMatch operator is useful:
$ db.users.find( { hobbies: { $elemMatch: { title: "Sports", frequency: { $gte: 3 } } } } ).pretty( )

Diving Into Read Operation

Understanding Cursors
The find() method, unlike the findOne() method, yields us a cursor object. Why is this cursor object important? When our client communicates with the mongoDB server, we could potentially retrieve thousands if not millions of documents with our find() query, depending on the scale of the application. Retrieving all these results at once would be very inefficient: all these documents would have to be fetched from the database, sent over the wire and then loaded into memory in the client application. In most cases we do not need all the data at the same time; therefore, find gives us a cursor.
A cursor is basically a pointer which has the query we wrote stored, and it can go back to the database and request the next batch. We therefore work with batches of data rather than fetching everything at once. The shell by default takes the cursor and retrieves the first 20 documents, and then fetches further documents in batches of 20.
If we write our own application with a mongoDB driver, we have to control that cursor manually to make sure that we get back our results. The cursor approach is beneficial because it saves resources; we only load a batch of documents rather than all the documents of a query.


Diving Into Read Operation

Applying Cursors
When using the find command in the shell, this will display the first 20 documents. We can use the "it" command in the shell to retrieve the next batch of 20 documents, and keep using this command until we have exhausted the cursor, i.e. there are no more documents to load. The "it" command will not work with the mongoDB drivers; instead most drivers have a next() method we can call. If we use the next() method directly on a find() in the shell, this will retrieve only one document and there will be no "it" command to retrieve the next document. If we run the command again, this will restart the find query and retrieve the first document again.
$ db.movies.find( ).next( )
Since the shell uses JavaScript, we can use JavaScript syntax to store the cursor in a variable. We can then use the next() method to cycle through the documents on the cursor; each call retrieves the next document, continuing from where the cursor left off.
$ const dataCursor = db.movies.find( )
$ dataCursor.next( )
There are other cursor methods available to us in mongoDB that we can use on our find() query.

$ dataCursor
$ it
This will return the first batch of 20 documents. Using the it command will retrieve the next 20
documents, i.e. the default shell behaviour for cursors.
$ dataCursor.forEach( doc => { printjson(doc) } )
The forEach() method will vary by the driver we are using, but in JavaScript the forEach() method
takes in a function that will be executed for every element that can be loaded through the cursor. In
JavaScript, each document is passed into our function as the input, and the body of our arrow
function defines what we want to do with it. In the above example we are using the printjson()
method, which is provided by the shell, to output the document.
This will cycle through all the remaining documents inside of the cursor (excluding any documents
we have already retrieved, e.g. via the next() method). The forEach() method will retrieve all the
remaining documents, and there will be no “it” command to fetch further documents as we will have
exhausted the cursor.
$ dataCursor.hasNext( )
The hasNext() method returns true or false to indicate whether there are more documents left on
the cursor. To reset an exhausted cursor, we can create a new variable (for const variables) or, if we
used let or var, re-instantiate the original variable again to start from the beginning.
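As a minimal sketch combining these methods, we could manually drain a cursor in the shell using
hasNext( ) and next( ):
$ const dataCursor = db.movies.find( )
$ while ( dataCursor.hasNext( ) ) { printjson( dataCursor.next( ) ) }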

We can learn more about cursors in the shell and in the mongoDB drivers in the official mongoDB
documentation:
https://docs.mongodb.com/manual/tutorial/iterate-a-cursor/

Diving Into Read Operation

Sorting Cursor Results
A common operation is to sort the data that we retrieve. We are able to sort by anything, whether it
is a string sorted alphabetically or a number sorted by numeric value. To sort the data in mongoDB
we use the sort() method on our find query. Below is an example:
$ db.movies.find( ).sort( { “rating.average”: 1 } ).pretty( )
The sort() method takes in a document describing how to sort the retrieved data. We can sort by a
top level document field or an embedded document field. The values describe the direction to
sort the data: 1 means ascending (lowest value first) and -1 means descending (highest value
first). We are not limited to one sorting criterion; for example, we can sort by the average rating first
and then by the runtime:
$ db.movies.find( ).sort( { “rating.average”: 1, runtime: -1 } ).pretty( )


Diving Into Read Operation

Skipping & Limiting Cursor Results
We are able to skip a certain amount of elements. Why would we want to skip elements? On our
application or web app we may implement pagination, where users can view results distributed
across multiple pages (e.g. 10 elements per page), because we do not want to show all results on
one page. If the user switches to page 3, we would want to skip the first 20 results and display the
results for page 3.
The skip() method allows us to skip cursor results. Below is an example of skipping 20 results.
Skipping allows us to move through our dataset.
$ db.movies.find( ).sort( { “rating.average”: 1, runtime: -1 } ).skip(20).pretty( )
The limit() method allows us to cap the amount of documents the cursor will retrieve, i.e. we only
get back at most the amount of documents we specify.
$ db.movies.find( ).sort( { “rating.average”: 1, runtime: -1 } ).skip(100).limit(10).pretty( )
We can chain the sort, skip and limit methods in any order we want and this will not affect the result,
as mongoDB will automatically apply them in the correct order (sort, then skip, then limit). A small
pagination sketch follows below.
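As a small sketch, we can compute the skip value for a given page in the shell; the page and
pageSize variables here are hypothetical values our application would supply:
$ const page = 3
$ const pageSize = 10
$ db.movies.find( ).sort( { “rating.average”: -1 } ).skip( (page - 1) * pageSize ).limit( pageSize ).pretty( )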

Diving Into Read Operation

Using Projections to Shape Results
How can we shape the data we get back from our find query into the form we need? When we use
the find function, it retrieves all fields of the matched documents. This not only transfers redundant
data over the wire but also makes it harder to work with the results if we have to manually parse all
the data. Projection allows us to control which data is returned from our Read Operation.
Below is an example of projecting only the name, genre, runtime and rating from the returned
results, where all other data on the document does not matter to us:
$ db.movies.find( { }, { name: 1, genre: 1, runtime: 1, rating: 1 } ).pretty( )
In order to perform a projection we pass a second argument to the find function. If we do not want
to specify any filter criteria for the first argument of find, we simply pass an empty document as
seen above. The second argument configures how values are projected, i.e. which data fields we
extract. We name the field and pass the value of 1. All fields that we do not mention with a 1, or that
we explicitly set to 0, will be excluded. This will retrieve a reduced version of the document;
however, the _id will always be included in the results even if we do not specify it within the
projection. If we want to exclude the _id we must explicitly exclude it, as seen
in the below example:
$ db.movies.find( { }, { name: 1, genre: 1, runtime: 1, rating: 1, _id: 0 } ).pretty( )
This will retrieve the data in the projection and explicitly exclude the _id value from the results. This
explicit exclusion is only required for the _id field; all other unmentioned fields are implicitly
excluded by default in projections.
We are also able to project on embedded documents. For example, if we are only interested in the
time field from the schedule embedded document and not the day field, we would use the dot
(path) notation to select the embedded field to project, as seen below:
$ db.movies.find( { }, { name: 1, genre: 1, runtime: 1, “schedule.time”: 1 } ).pretty( )
Projections can also work with arrays and help with array data that we include in our returned find
query results. For example, if we want to project Drama only from the genres, we would first filter
the data by the criteria of all documents with an array containing Drama in the Genres and then use
the projection argument to display only the Drama from the array:
$ db.movies.find( { genres: “Drama” }, { “genres.$”: 1 } ).pretty( )
The special syntax of $ after the genres will tell mongoDB to project on the one genre it found.

Now the document may have had more than Drama within the array, but we have told mongoDB to
only fetch and output the Drama entry because that is the only data we are interested in retrieving.
The documents behind the scenes may hold more array data, just as they have other fields too. The
$ syntax projects the first array element that matched our query criteria, which in the above scenario
was Drama.
If we had a more complex criteria whereby we search for all documents containing both Drama and
Horror, the $ projection syntax would return Horror in the below example. The $all operator only
produces a complete match once Drama is present and Horror is also present; Horror is therefore
technically the first element that completes the match, and it is what the projection displays.
$ db.movies.find( { genres: { $all: [ “Drama”, “Horror” ] } }, { “genres.$”: 1 } ).pretty( )
There may be cases where we want to project items from an array in our document that are not
items we queried for. In the below example we query to retrieve all documents with the value of
Drama, but we want to project Horror only. Using the $elemMatch operator allows us to do this:
$ db.movies.find( { genres: “Drama” }, { genres: { $elemMatch: { $eq: “Horror” } } } ).pretty( )
The filter criteria and projection can be totally unrelated as seen in the below example:
$ db.movies.find( { “rating.average”: { $gt: 9 } }, { genres: { $elemMatch: { $eq: “Horror” } } } ).pretty( )

Diving Into Read Operation

Understanding $slice
The final special projection relating to arrays is the $slice operator. This operator returns the
first x items from the array. In the below syntax example we are projecting the first 2 array
items within the genres field.
$ db.movies.find( { “rating.average”: { $gt: 9 } }, { genres: { $slice: 2 }, name: 1 } ).pretty( )
The documents may have more genres assigned to them but we only see the first two items in the
array because we used the $slice: 2 to return only 2 items. The number can be any value to return
any number of items from the array. The alternative method of slice is to use an array form.
$ db.movies.find( { “rating.average”: { $gt: 9 } }, { genres: { $slice: [ 1, 2 ] }, name: 1 } ).pretty( )
The first element in the slice array is the number of elements to skip (e.g. we skip one item). The
second element is the number of items to limit the projection to (e.g. we limit it to two items). The
example will therefore skip item 1 and return items 2 and 3 of the array when it projects the results,
i.e. we projected the next two items in the array.
We have many ways of controlling what we see: using 1 and 0 for normal fields, and using the $,
$elemMatch and $slice operators to control which elements in an array are projected in our results.

We have now completely explored how to fully control what we fetch with our filtering criteria and
then control which fields of the found document to include in our displayed result sets using
projections.
Useful Links:
https://docs.mongodb.com/manual/reference/method/db.collection.find/
https://docs.mongodb.com/manual/tutorial/iterate-a-cursor/
https://docs.mongodb.com/manual/reference/operator/query/


Diving Into Update Operation

Updating Fields with updateOne(), updateMany()
and $set
An update operation requires two pieces of information (two arguments):
1. Identify the document that should be updated/changed (i.e. the filter)
2. Describe how it should be updated/changed
For identifying the document we can use any of the filters we could use for finding documents and
do not necessarily need to use the _id value. Using the _id is nonetheless common when updating
documents, as it guarantees that the correct document is being updated.
The updateOne() method simply takes the first document that matches the filter criteria and
updates it even if multiple documents may match the criteria. The updateMany() method on the
other hand will take the criteria/filter and update all documents that match.
$ db.users.updateOne( { _id: ObjectId(“5b9f707ead7”) }, { $set: { hobbies: [ { title: “Sports”,
frequency: 5 }, { title: “Cooking”, frequency: 3 }, { title: “Hiking”, frequency: 1 } ] } } )
$ db.users.updateMany( { “hobbies.title”: “Sports” }, { $set: { isSporty: true } } )

The second argument/parameter is how to update the document and the various update operators
can be found on the official mongoDB documentation:
https://docs.mongodb.com/manual/reference/operator/update/
The $set takes in a document where we describe some fields that should be changed or added to
the existing document. The updateOne example overwrites the existing hobbies array with the new
array elements. The console will provide feedback when it updates the document:
$ { “acknowledged” : true, “matchedCount” : 1, “modifiedCount” : 1 }
If we ran the exact updateOne command again the console will still show a matchedCount of 1 but
the modifiedCount will be 0 as no document data would have been modified because it is exactly
the same as the existing value.
$ { “acknowledged” : true, “matchedCount” : 1, “modifiedCount” : 0 }
If we were to find all the documents using the find() command, we would notice the user still has
the same _id and other document fields. This is because the $set operator does not override the
whole document; instead, it simply describes some changes which mongoDB evaluates and then
applies to the existing document, adding or overwriting the fields given in the second argument.
All other existing fields are left untouched. The $set operator simply adds or edits the fields
specified in the update command.

Diving Into Update Operation

Updating Multiple Fields with $set
In the previous section we demonstrated the $set operator used to update one field at a time in a
document. It is important to note that the $set is not limited to updating one field in a document but
can update multiple fields within a document as seen in the below example whereby we add a field
of age and another field of phone:
$ db.users.updateOne( { _id: ObjectId(“5b9f707ead7”) }, { $set: { age: 40, phone: “07933225211” } } )
Note the phone number is stored as a string here so that the leading zero is preserved (a number
literal would drop it). The console will again confirm when a document has matched the filter and
been modified, whether the modification was an overwrite or the addition of new fields.
$ { “acknowledged” : true, “matchedCount” : 1, “modifiedCount” : 1 }
The $set operator can add fields, arrays and embedded documents inside and outside of an array.
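For instance, a single $set could add a scalar field, an embedded document and an array in one
command; the address and tags fields below are hypothetical and only serve as an illustration:
$ db.users.updateOne( { _id: ObjectId(“5b9f707ead7”) }, { $set: { age: 41, address: { city: “Munich” }, tags: [ “sports”, “cooking” ] } } )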

Diving Into Update Operation

Incrementing & Decrementing Values
The update operators not only allow us to set values, they can also increment or decrement a
number for us. For example, we may want to update a user's age without using the $set operator,
as we would not necessarily know every user's age; rather, we want mongoDB to perform the
simple, common transformation of taking the current age, recalculating the new age and applying
the update.
MongoDB has a built in operator to allow us to perform common increment and decrement
operations using the $inc operators.
$ db.users.updateOne( { name: “Emanuel” }, { $inc: { age: 1 } } )
This will increment the existing age field value by one. Note we could choose a different number
such as 5 and increment the value by 5. To decrement a field value we would still use the $inc
operator but with a negative number instead.
$ db.users.updateOne( { name: “Emanuel” }, { $inc: { age: -1 } } )
Note we can perform multiple different things in the same update such as increment certain fields
while setting new fields/editing existing fields, all within the second update parameter.
$ db.users.updateOne( { name: “Emanuel” }, { $inc: { age: 1 }, $set: { gender: “M” } } )
If we tried to increment/decrement a value as well as set the same field value, this will give us an error:

$ db.users.updateOne( { name: “Emanuel” }, { $inc: { age: 1 }, $set: { age: 30 } } )
This errors in the console because we have two update operations working on the same path,
which is not allowed in mongoDB. The update will fail with a message in the shell similar to the
below:
$ WriteError: Updating the path ‘age’ would create a conflict at ‘age’: …

Diving Into Update Operation

Using $min, $max and $mul Operators
The $min, $max and $mul are other useful operators available to us. Below are scenarios and the
syntax to overcome various update problems.
Scenario 1: We want to set the age to 35 only if the existing age is higher than 35. We would use
the $min operator because this will only change the current value if the new value is lower than the
existing value. Note: if the existing value is already lower than the new value, no error is thrown; the
document is simply not updated.
$ db.users.updateOne( { name: “Chris”}, { $min: { age: 35 } } )


Scenario 2: We want to set the age to 38 only if the existing age is lower than 38. We would use the
$max operator, which is the opposite of the $min operator. Again, no error is thrown if the existing
value is already higher than the update value; the update is simply ignored.
$ db.users.updateOne( { name: “Chris”}, { $max: { age: 38 } } )
Important Note: the modifiedCount in the terminal will show as 0 if no update occurred.
Scenario 3: We want to multiply the existing value by a multiplier. We would use the $mul
operator, which stands for multiply, to perform this type of update operation.
$ db.users.updateOne( { name: “Chris”}, { $mul: { score: 1.1 } } )
This will multiply the existing score value by the multiplier of 1.1 and update the score field with the
new value.

Diving Into Update Operation

Getting Rid of Fields
We may want to update documents to drop a field based on certain criteria. There are two solutions
to this problem, and below is example syntax to drop all phone numbers for users that are sporty.

$ db.users.updateMany( { isSporty: true }, { $set: { phone: null } } )
We could use the $set operator to set the phone to null. This will not drop the field, but it would
mean that the field has no value, which our application could use to not display the phone data.
The alternative solution is to use the special operator of $unset to truly get rid of a field from a
document.
$ db.users.updateMany( { isSporty: true }, { $unset: { phone: “” } } )
The value we use with phone (or any key field) is totally up to us but is typically set to “”, which
represents empty. The value is ignored by the update; the important part of the $unset operator
document is the name of the field we wish to drop.

Diving Into Update Operation

Renaming Fields
Just as we have the $unset operator to drop a field from a document, we are also able to rename
fields using the $rename operator. We can leave the first update argument empty to target all
documents. The $rename operator takes a document whose key is the current field name and
whose value is a string with the new field name. This will update only those documents that have a
field called age, renaming it to the new field name.
$ db.users.updateMany( { }, { $rename: { age: “totalAge” } } )

Diving Into Update Operation

Understanding The Upsert Option.
Sometimes we want to update a document but are uncertain whether it exists or not. For example,
our application may not know if the data was saved in the database yet; if it was not saved yet, we
want to create a new document, and if it does exist, we want to overwrite/update the existing
document.
In this scenario, if Maria did not exist as a document, nothing would happen other than a message
in the terminal telling us no document was updated. Instead, we would want mongoDB to create
the document for us rather than having to do this manually.
$ db.users.updateOne( { name: “Maria” }, { $set: { age: 29, hobbies: [ { title: “Cooking”, frequency: 3 }
], isSporty: true } } )
To allow mongoDB to update or create documents for us, we pass a third argument, an options
document, with upsert set to true (by default this is set to false). Upsert is a combination of update
and insert and works with both the updateOne and updateMany methods. If Maria does not exist,
mongoDB will create a new document and will automatically include the name: “Maria” field from
the filter for us.
$ db.users.updateOne( { name: “Maria” }, { $set: { age: 29, hobbies: [ { title: “Cooking”, frequency: 3 }
], isSporty: true } }, { upsert: true } )
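When the upsert actually inserts a new document, the shell feedback also contains an upsertedId;
the output looks something along these lines (the ObjectId value is just a placeholder):
$ { “acknowledged” : true, “matchedCount” : 0, “modifiedCount” : 0, “upsertedId” : ObjectId(“…”) }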

Diving Into Update Operation

Understanding Matched Array Elements.
Example scenario: we want to update all user documents where the person has a hobby of Sports
and the frequency is greater than or equal to three. The hobbies field holds an array of embedded
documents.
$ db.users.find( { $and: [ { “hobbies.title”: “Sports” }, { “hobbies.frequency”: { $gte: 3 } } ] } ).pretty( )
This syntax will find all users which have a hobbies title of Sports and a hobbies frequency that is
greater than or equal to three. However, this will also find documents where the Sports frequency is
below three, so long as another embedded hobbies document has a frequency greater than or
equal to three.
The correct query is to use the $elemMatch operator which will look at the same embedded
document for both criteria:
$ db.users.find( { hobbies: {$elemMatch: { title: “Sports”, frequency: { $gte: 3 } } } } ).pretty( )
Now that we know the correct query to find the documents we wish to update, the question
becomes how do we update that matched embedded array document only. Essentially, we know
which overarching document we want to change, but we want to change something within that
document, inside the array.
The $set operator by default applies the changes to the overall document, not to the element in the
array we found. We therefore use the $set operator, select the array field and append .$ to it. This
placeholder automatically refers to the element matched in our filter criteria (the first argument to
the update command). We define the new value after the colon.
$ db.users.updateMany( { hobbies: {$elemMatch: { title: “Sports”, frequency: { $gte: 3 } } } }, { $set:
{ “hobbies.$”: { title: “Sports”, frequency: 3 } } } )
Note this will overwrite the matched array element in all matching documents. However, if we only
want to add a new field to the matched element, the syntax is to add a dot after the .$, as seen
below:
$ db.users.updateMany( { hobbies: {$elemMatch: { title: “Sports”, frequency: { $gte: 3 } } } }, { $set:
{ “hobbies.$.highFrequency”: true } } )
The above syntax will find all documents which match the embedded document criteria and update
that embedded document to add a new field/value of highFrequency: true if the highFrequency
field did not exist (if it did, it would simply update the existing field value). The $set works exactly as
it did before; the only difference is the special placeholder within the array path, which quickly gives
us access to the matched array element.

Diving Into Update Operation

Updating All Array Elements.
Example Scenario: The below find method returns the overall person document where the filter
matched; it does not return only the embedded document we filtered on. The filter is just the key
factor for returning the overall document to us.
$ db.users.find( { “hobbies.frequency”: { $gt: 2 } } ).pretty( )
This will also retrieve documents containing hobbies with a frequency lower than two, provided at
least one embedded document has a frequency above two to match the filter. We have now found
all documents meeting the criteria, but not every array document within them fulfils our filter
criteria. We want to change only those embedded documents in the hobbies array that did fulfil the
filter criteria.
$ db.users.updateMany( { “hobbies.frequency”: { $gt: 2 } }, { $set: { “hobbies.$.goodFrequency”:
true } } )
Again we can use the $set operator, but we want to change all the matched hobbies with a
frequency above two. The “hobbies.$” syntax we saw in the previous section only edits one hobby
per person: if there are multiple matching hobbies per person, it will not update them all but only
the first matching element within the array.


Now in order to update all elements in an array, we would use a special placeholder in mongoDB
that is the .$[] which simply means update all elements. We can use the dot notation after the .$[] to
select a specific field in the array document.
$ db.users.updateMany( { totalAge: { $gt: 30 } }, { $inc: { “hobbies.$[].frequency”: -1 } } )
This will update all users that have a totalAge greater than 30 and decrement the frequency of every
hobby by 1. The .$[] syntax is used to update all array elements.

Diving Into Update Operation

Finding and Updating Specific Fields
Continuing on from the last section, we were able to use the .$[] notation to update all elements
within the array per person. We can now build on this notation to update only specific array
elements; below is the solution to the problem from the previous section.
$ db.users.updateMany( { “hobbies.frequency”: { $gt: 2 } }, { $set: { “hobbies.$[el].goodFrequency”:
true } }, { arrayFilters: [ { “el.frequency”: { $gt: 2 } } ] } )
Note: the el within the square brackets is an identifier which we could have named anything. In the
third argument of our update method we use the arrayFilters option to define which elements the
identifier refers to. The identifier's filter does not need to match the query filter criteria; for example,
we could filter the documents by age but have the $set identifier target elements based on a
completely different condition, such as a frequency greater than two. The arrayFilters array can also
hold multiple filter documents, one for each identifier we use.
This would now update specific array elements that meet the identifier arrayFilter criteria. Note that
the third argument of updateOne and updateMany is an options argument whereby we previously
used upsert as an option to update/insert documents. We can also use arrayFilters to provide a filter
criteria for our identifiers.
Note: if an identifier is used, we must use the arrayFilters option to define the identifier's filter
criteria, otherwise the update method will fail as mongoDB will not know what to do with the
identifier.
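An identifier can also combine several conditions. As a hedged sketch, the update below only
marks Sports hobbies that have a frequency greater than two, leaving all other hobbies untouched:
$ db.users.updateMany( { }, { $set: { “hobbies.$[el].highFrequency”: true } }, { arrayFilters: [ { “el.title”: “Sports”, “el.frequency”: { $gt: 2 } } ] } )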

Diving Into Update Operation

Adding Elements to Arrays
If we want to add elements onto an array in a document, instead of using the $set operator (which
we could still use to overwrite the whole array), we can use $push to push a new element onto the
array. The $push operator takes in a document where we describe firstly the array we wish to push
to and then the element we want to push/add to the existing array.
$ db.users.updateOne( { name: “Maria” }, { $push: { hobbies: { title: “Sports”, frequency: 2 } } } )

The $push operator can also add more than one document at a time. We use the $each operator,
which takes in an array of multiple documents that should be added/pushed.
$ db.users.updateOne( { name: “Maria” }, { $push: { hobbies: { $each: [ { title: “Running”,
frequency: 1 }, { title: “Hiking”, frequency: 2 } ] } } } )
There are two sibling options we can add next to the above $each syntax. The first is the $sort
operator. Technically, we are still in the same object that holds the $each operator, i.e. it is a
sibling to $each. The sort describes how the elements in the array should be sorted before they are
pushed into hobbies.
$ db.users.updateOne( { name: “Maria” }, { $push: { hobbies: { $each: [ { title: “Running”,
frequency: 1 }, { title: “Hiking”, frequency: 2 } ], $sort: { frequency: -1 } } } } )
This will sort the hobbies array by frequency in a descending order i.e. having the highest frequency
first.
The second sibling is the $slice operator, which allows us to keep only a certain number of
elements. We can use this in conjunction with the $sort operator as seen below.
$ db.users.updateOne( { name: “Maria” }, { $push: { hobbies: { $each: [ { title: “Running”,
frequency: 1 }, { title: “Hiking”, frequency: 2 } ], $sort: { frequency: -1 }, $slice: 1 } } } )
In this example the slice keeps only the first element after the sort in the hobbies array. The sort
applies to the overall array, i.e. both new and existing elements, not just the elements we add.

Diving Into Update Operation

Removing Elements from Arrays
Not only are we able to push elements to an array but we are also able to pull elements from an
array using the $pull operator, as demonstrated below.
$ db.users.updateOne( { name: “Maria” }, { $pull: { hobbies: { title: “Hiking” } } } )
The $pull operator takes in a document where we describe what we want to pull from the array. In
the above we are pulling from the hobbies array based on a condition i.e. pull every element where
the title is equal to Hiking. We do not need to only use equality conditions but we can also use all
the normal filter operators that we have seen before such as the greater than or less than operator.
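For example (a sketch using a comparison operator in the pull condition), we could remove every
hobby whose frequency is below two:
$ db.users.updateOne( { name: “Maria” }, { $pull: { hobbies: { frequency: { $lt: 2 } } } } )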
Sometimes we may wish to remove the last element from an array and have no specific filter criteria.
We would use the $pop operator with a document defining the name of the array field we want to
pop from. A value of -1 pops the first element and 1 pops the last element from the array.
$ db.users.updateOne( { name: “Chris” }, { $pop: { hobbies: 1 } } )


Diving Into Update Operation

Understanding the $addToSet Operator
The final update command we will explore is the $addToSet operator.
$ db.users.updateOne( { name: “Maria” }, { $addToSet: { hobbies: { title: “Hiking”, frequency: 2 } } } )
The difference between the $addToSet and $push operators is that $push allows us to push
duplicate values whereas $addToSet does not. It is important to note the console will not error but
will simply show that no document was updated with the change. Always remember that the
$addToSet operator adds unique values only.
This concludes the Update Operations available to us in mongoDB. We now understand the three
arguments we can pass to both the updateOne and updateMany commands which are:
1. Specify a filter (query selector) using the same operators we know from the find() command.
2. Describe the updates via $set or other update operators.
3. Pass additional options, e.g. upsert or arrayFilters, to the update operation.
In the official documentation we can view all the various operations available to us:
https://docs.mongodb.com/manual/tutorial/update-documents/

Diving Into Delete Operation

Understanding deleteOne() and deleteMany()
To delete a single document from a collection we use the deleteOne() command. We need to
specify a query selector/filter. The filter we specify here is exactly the same as we would use for
finding and updating documents. We simply need to narrow down the document we want to
delete. deleteOne() will delete the first document that matches the criteria.
$ db.users.deleteOne( { name: “Chris” } )
We can use the deleteMany() command to delete all documents where the query selector/filter
criteria has been met. Below are two examples.
$ db.users.deleteMany( { totalAge: {$gt: 30}, isSporty: true } )
$ db.users.deleteMany( { totalAge: {$exists: false}, isSporty: true } )
Note: we can add as many query selectors as we want to narrow down the document(s) we wish to
delete from the database.


Diving Into Delete Operation

Deleting All Entries in a Collection
There are two approaches to deleting all entries in a collection. The first method is to reach out to
the collection and execute the deleteMany( ) command, passing an empty document as the
argument. This argument is a filter that matches every document in the collection, and therefore all
entries within the collection will be deleted.
$ db.users.deleteMany( { } )
The alternative approach is to delete the entire collection using the .drop( ) command on the
specified collection. This will return true if the collection was successfully dropped.
$ db.users.drop( )
When creating applications it is very unlikely we would drop collections; adding and dropping
collections is more of a system admin task. We can also drop an entire database using the
dropDatabase command. We would first use the use command followed by the database name to
navigate to the database we want to drop.
$ db.dropDatabase( )
https://docs.mongodb.com/manual/tutorial/remove-documents/

Working with Indexes

What are Indexes and why do we use them?
An index can speed up our find, update or delete queries i.e. all the queries where we are looking
for certain documents that should match some criteria.
$ db.products.find( { seller: “Marlene” } )
If we take a look at this find query, we have a collection of documents called products and we are
searching for a seller called Marlene. By default, if we don't have an index on the seller field,
mongoDB will perform a so-called collection scan. This simply means that to fulfil this query,
mongoDB will go through the entire collection, look at every single document and check whether
the seller equals “Marlene” (equality). As we can imagine, for a very large collection with thousands
or millions of documents, this can take a while to complete. This is the default and only approach
mongoDB can take when there are no indexes set up, even if the query only retrieves two
documents out of the thousands in the collection.
We can create an index; an index is not a replacement for the collection but rather an addition to it.
We would create an index for the seller key of the products collection, and that index then exists
alongside the collection. The index is essentially an ordered list of all the values stored in the seller
key across all the documents.

It is not an ordered list of the documents, it is just the values for the field for which we created that
index. Also it is not just an ordered list of the values, every value/item in the index has a pointer to
the full document it belongs to.
This allows mongoDB to perform a so-called index scan to fulfil this query. This means mongoDB
will see that for seller, such an index exists and it therefore simply goes to that seller index and can
quickly jump to the right values because unlike for the collection, it knows the values are sorted by
that key. This means if we are searching for a seller starting with M, it does not need to search
through the earlier records. This allows mongoDB to go through the index very efficiently and find
the matching products, thanks to the ordering and the pointer that every item within the index has.
MongoDB finds the value for the query and then follows the pointer to the related document to
return it.
This is how an index works in mongoDB, and it also answers why we would use indexes: creating
indexes drastically speeds up our queries. However, we should not overdo it with indexes. Let's take
the example of a collection whose documents have _id, name, age and hobbies fields. We could
create an index for all four fields and we would have the best read performance, because no matter
what we look for, we have an index and can query every field efficiently. Having said this, indexes
do not come without cost. We pay a performance cost on inserts, because every extra index has to
be maintained and updated with every insert. An index is an ordered list of elements with pointers
to the documents, so if we add a new document, we also have to add a new element to the index.
This may sound simple, and a single update would not take long, but if we have 10 indexes on our
collection, we have to update all 10 indexes for every insert. We may then quickly run into issues
because we have to do a lot of work for all these fields on every insert, and on every update too.
Therefore, indexes do not come for free and we have to figure out which indexes make sense to
have and which don't.
We are now going to explore indexes in more detail, look at the types of indexes that exist in
mongoDB and learn how to measure whether an index makes sense to have.

Working with Indexes

Adding a Single Field Index
To determine whether an index can help us in our find query, mongoDB provides us with a nice tool
that we can use to analyse how it executed the query. We can chain the explain method onto our
normal query. This method works with the find, update and delete commands but not for inserts
(i.e. it works for methods that narrow down documents).
$ db.contacts.explain( ).find( { “dob.age”: { $gt: 60 } } )
This will provide a detailed description of what mongoDB did to arrive at the results.

MongoDB thinks in so-called plans: plans are the alternatives it considers for executing the query,
and in the end it picks a winning plan. The winning plan is simply what mongoDB did to get to our
results. Without indexes, a full collection scan is always the only thing mongoDB can do. However, if
there were alternatives and they were rejected, they will appear in the rejectedPlans array.
To get a more detailed report we can run the same command but passing in an argument:
$ db.contacts.explain( “executionStats” ).find( { “dob.age”: { $gt: 60 } } )
The executionStats provides a detailed output for our query and how the results were returned. This
will show things such as executionTimeMillis which is the time it took to execute the query in
milliseconds and totalDocsExamined which shows the number of documents that needed to be
scanned in order to return our query results. The larger the gap between the totalDocsExamined
and nReturned values, the more inefficient the query is.
To add an index on a collection, we use the createIndex method, passing in a document. The first
value (the key) in the document is the name of the field we want to create an index on (this can be a
top level field as well as an embedded field name). The value states whether mongoDB should sort
the values of that field in ascending (1) or descending (-1) order.
$ db.contacts.createIndex( { “dob.age”: 1 } )


Once we run the command we should see in the terminal that the index has been created.
{
“createdCollectionAutomatically” : false,
“numIndexesBefore” : 1,
“numIndexesAfter” : 2,
“ok” : 1
}
If we were to run the above explain command again on our collection, we should notice the
executionTimeMillis for the same query has dropped. We should also see two execution stages: the
first is an index scan (IXSCAN), which does not return the documents but the keys/pointers to the
documents. The second stage is the fetch (FETCH), which takes the pointers returned from the
index scan, reaches out to the actual collection and fetches the real documents. We would notice
that mongoDB only has to look at a reduced number of documents to return the documents for
our query.
This is how an index can help us speed up our searches, and how we can use the explain method to
determine whether an index in our collection would speed up a query.


Working with Indexes

Indexes Behind the Scenes
What does createIndex( ) method do in detail?
Whilst we can't really see the index, we can think of the index as a simple list of values and pointers
to the original document.
Something like this (for the "age" field):
(29, "address in memory/ collection a1")
(30, "address in memory/ collection a2")
(33, "address in memory/ collection a3”)
The documents in the collection would be at the "addresses" a1, a2 and a3. The order does not
have to match the order in the index (and most likely, it indeed won’t).
The important thing is that the index items are ordered (ascending or descending - depending on
how we created the index). The syntax of createIndex({age: 1}) creates an index with ascending
sorting while the syntax of createIndex({age: -1}) creates an index with descending sorting.


MongoDB is now able to quickly find a fitting document when we filter for its age as it has a sorted
list. Sorted lists are way quicker to search because we can skip entire ranges (and don't have to look
at every single document).
Additionally, sorting (via sort(...)) will also be sped up because we already have a sorted list. Of
course this is only true when sorting by the age.

Working with Indexes

Understanding Index Restriction
In the previous section we created an index which sped up our query when looking for people with
an age greater than 60. However, if we run the same query but find people older than 20, we will
notice that the execution time is higher than it was for people above the age of 60.
To drop an index from our collection we use the dropIndex method, passing in the same document
we used to create the index.
$ db.contacts.dropIndex( { “dob.age”: 1 } )
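dropIndex also accepts the index name instead of the key document, so the following should be
equivalent (the auto-generated name follows the field_direction pattern we will see shortly):
$ db.contacts.dropIndex( “dob.age_1” )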
If we were to run the broader query (people older than 20) as a full collection scan, we would notice
that it is actually faster than the query with an index. The reason is that we have saved a step: going
through the index. If a query will return a large portion or the majority of our documents, an index
can actually be slower, because we have the extra step of walking through almost the entire index
list and then still reaching out to the collection for all these documents. With a full collection scan
we go straight through all the documents in the collection, so the index doesn't offer us anything; it
is only an extra step.
Important note: if we have queries that regularly return the majority or all of our documents, an
index will not help us and it might even slow down the execution. This is the first important note to
keep in mind (a first restriction) when planning our queries and whether or not to use indexes.
If we have a dataset where our queries typically return only a fraction of the documents, say 20% or
less, then indexes will almost always speed up our queries. If a lot of our queries give us back all the
documents, or close to all of them, then indexes cannot do much for us. The whole point of an
index is to quickly get to a narrow subset of our document list and return the documents via the
index.
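To verify this behaviour on our own data, a simple sketch is to compare the executionStats of a
narrow and a broad query before and after creating the index:
$ db.contacts.explain( “executionStats” ).find( { “dob.age”: { $gt: 60 } } )
$ db.contacts.explain( “executionStats” ).find( { “dob.age”: { $gt: 20 } } )
$ db.contacts.createIndex( { “dob.age”: 1 } )
$ db.contacts.explain( “executionStats” ).find( { “dob.age”: { $gt: 60 } } )
$ db.contacts.explain( “executionStats” ).find( { “dob.age”: { $gt: 20 } } )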


Working with Indexes

Creating Compound Indexes
Not only can we have indexes on fields with number values, we can also have indexes on fields with
text values (both can be sorted). Indexes on boolean fields rarely make sense, because with only
two kinds of values (true and false) such an index will not narrow things down enough to speed up
our queries. Below is an example of creating an index on a text field.
$ db.contacts.createIndex( { gender: 1 } )
Now the above index would not make too much sense on its own, because gender has only two
values of male and female and a query on it would probably return more than half the results.
However, if we want to find, as an example, all people who are older than 30 and are male, we can
create a so-called compound index.
$ db.contacts.createIndex( { “dob.age”: 1, gender: 1 } )
The order of the two fields in our createIndex method does matter, because a compound index is
simply one index with more than one field. Each entry in the index is now based not on a single
value but on two combined values. It is really important to note that this does not create two
indexes: it creates one index where every element is a connected value (in the above example, a
pair of the age and gender values).

The order of the fields defines which kind of pairs mongoDB will create in our compound index (for
example does mongoDB create a 31 male index or a male 31 index — this will be important for our
queries).
There are two queries we can now run which will take advantage of the compound index. The first is
to find based on age and gender:
$ db.contacts.explain( ).find( { “dob.age”: 35, gender: “male” } )
This will perform an index scan with our index name (the index name is auto-generated e.g.
“indexName” : “dob.age_1_gender_1”)
The second query that can utilise the compound index is a query on the age only:
$ db.contacts.explain( ).find( { “dob.age”: 35 } )
This will also use the same compound index for the index scan even though we never filtered on
gender. Compound indexes can be used from left to right, but the leftmost field must always be
part of the search, i.e. a find query on the gender alone will not use this index.
$ db.contacts.explain( ).find( { gender: “male” } )
The above query would use a full collection scan and not the index scan using the compound index.

The compound index entries are grouped together: they are ordered by the first (leftmost) field and,
within that, by the following fields. We can have a compound index with more than 2 fields, up to a
maximum of 31 fields. However, we cannot utilise the compound index without the first field.
These are the restrictions we have on compound indexes but compound indexes allows us to speed
up queries that uses multiple values.

Working with Indexes

Using Indexes for Sorting
Now that we have had a look at the basics of indexes, it is important to know that indexes are not
only used for narrowing down our find queries; they can also help with sorting. Since the index is a
sorted list of elements, mongoDB can utilise it when we want to sort in the same way the index list
is sorted.
$ db.contacts.explain( ).find( { “dob.age”: 35 } ).sort( { gender: 1 } )
In the above we find people with the age of 35 and sort them by gender in ascending order. We
will notice that this uses an index scan on the compound index covering both age and gender, even
though we filtered by age only; it uses the gender information for the sorting. Since we already have
an ordered list of values, mongoDB can utilise this to quickly give back the documents in the order
we need.

It is important to understand that if we are not using indexes and we sort a large amount of
documents, the operation can actually fail, because mongoDB has a threshold of 32 MB of memory
for sorting. With no index, mongoDB fetches all the matched documents into memory and sorts
them there, and for very large collections with a large amount of fetched documents this can be too
much.
Sometimes we need indexes not only to speed up the query but to be able to sort at all. This is not
a problem for small datasets, but where we fetch so many documents that the default in-memory
sort is not possible, we need an index which is already sorted, so that mongoDB does not have to
sort in memory and can instead take the order we have in the index.
Important Note: mongoDB reserves a threshold of 32 MB in memory for fetching and sorting
documents. This is the second important point to keep in mind when deciding whether or not to
create an index.


Working with Indexes

Understanding the Default Index
When creating an index we may notice that an index already exists in our collection. To see all
indexes that exist on a collection we can use the getIndexes command.
$ db.contacts.getIndexes( )
This command will print all the indexes we have on that collection in the shell. If we have created an
index of our own, we will notice that there are two indexes: the first, on the _id field, is a default
index mongoDB maintains for us; the second is the index we created.
The default index on _id is created and maintained on every collection by mongoDB automatically.
This means that if we are filtering by _id, or sorting by _id (which is the default order in which
documents are fetched), mongoDB utilises this index.
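On the contacts collection with our dob.age index, the getIndexes( ) output would look something
along these lines (a sketch; the exact fields vary by server version):
[
{ “v” : 2, “key” : { “_id” : 1 }, “name” : “_id_” },
{ “v” : 2, “key” : { “dob.age” : 1 }, “name” : “dob.age_1” }
]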


Working with Indexes

Configuring Indexes
The _id index that we get out of the box is actually unique by default. This is a guarantee
mongoDB gives us to ensure that we cannot add another document with the same _id value to the
same collection. There are use cases where we need that behaviour for a different field, and
therefore we can add our own unique indexes.
For example, say we want the email field to be unique. We would create an index on the email field
and pass a second argument to the createIndex command. The second argument allows us to
configure the index; this is where we can set the unique option to true.
$ db.contacts.createIndex( { email: 1 }, { unique: true } )
If we execute this command, we may receive a duplicate key error if the collection already contains
duplicate values for that field. The error will also show the document(s) where the duplicate key
value exists. This is an advantage of the unique index: we get such a warning when creating the
index over existing duplicates, and also when the index is already in place and we try to add a
document with a value that already exists. Unique indexes help us as developers to ensure data
consistency and avoid duplicate data for fields that need to be unique. This index is therefore not
only useful to speed up our find queries but also to guarantee unique values for a given field in a
collection.

Working with Indexes

Understanding Partial Filters
Another interesting kind of index configuration is setting up a so-called partial filter.
For example, if we were creating an application for calculating what someone will get once they
retire — we would typically only look for a person older than 60. Having an index on the dob.age
field might make sense. The problem, of course, is that we then have a lot of values in our index that
we never actually query for. The index will still be efficient, but it will be unnecessarily big, and an
index eats up space on our disk. Additionally, the bigger the index is, the more certain queries will
cost nonetheless.
If we know certain values will not be looked at, or only very rarely, and we would be fine using a full
collection scan in those cases, we can create a partial index where we only add the values we are
regularly going to look at.
$ db.contacts.createIndex( { “dob.age”: 1 }, { partialFilterExpression: { “dob.age”: { $gt: 60 } } } )
$ db.contacts.createIndex( { “dob.age”: 1 }, { partialFilterExpression: { gender: “male” } } )
We can add this option to compound indexes as well. In the partialFilterExpression we define which
condition narrows down the set of values we want to add (we can use a totally different field, e.g.
gender). We can use the comparison and element operators we have previously seen, e.g. $gt, $lt,
$exists etc.

The second expression would create an index on the age but only for elements where the
underlying document is for a male, while the first creates an index on the age only for elements
where the underlying document is for a person older than 60.
If we only created the partial index from the second example and performed the below query, we
would notice that mongoDB performs a full collection scan and ignores the partial index. This is
because mongoDB determined that, yes, we are looking for a field that is part of the index (age),
but since we did not filter on gender in our query, it considered the partial index too risky to use:
mongoDB's top priority is ensuring that we do not lose any data. The results we receive back must
also include documents with a gender of female, not only male, so it performed a full collection
scan and not a partial index scan.
$ db.contacts.explain( ).find( { “dob.age”: { $gt: 60 } } )
In order for mongoDB to use the partial index we must also filter by gender:
$ db.contacts.explain( ).find( { “dob.age”: { $gt: 60 }, gender: “male” } )
The difference between a partial index and a comparable compound index is that the partial index
is smaller overall. In the above example only the ages of males are stored; female keys are not
stored in the index, and therefore the index size is smaller, leading to a lower impact on our hard
drive. Our write queries are also sped up, because if we insert a female document, it never has to
be added to the index. This makes a lot of sense if we often filter for this type of combination, i.e.
for the age and then only males; a partial index can make a lot of sense if we rarely look for the
other result, i.e. we rarely look for women.
Whenever mongoDB has the impression that our find request would yield more than what's in our
index, it will not use that index, but if we typically run queries that stay within our index (filtered
or partial index), then mongoDB will take advantage of it. We then benefit from having a
smaller index and less impact on writes.
So again, it depends on the application we're writing and whether we often just need a subset
or whether we typically need to be able to query everything, in which case a partial index won't
make much sense.

Working with Indexes

Applying the Partial Index
An interesting variation or use case of the partial index can be seen in conjunction with a unique
index. Below is an example demo of this use case.
$ db.users.insertMany( [ { name: “Abel”, email: “abel@email.com” }, { name: “Diane” } ] )

We have a collection of users but not all documents have an email field (only Abel has an email).
We can try to create an index on the email field within the users collection.
$ db.users.createIndex( { email: 1 } )
The above will successfully create an index in ascending order on the email field within our users
collection. We can now drop this index and create a new index using the unique option.
$ db.users.dropIndex( { email: 1 } )
$ db.users.createIndex( { email: 1 }, { unique: true } )
The above will successfully create a unique index in ascending order on the email field within
our users collection. If we now try to insert a new document without an email:
$ db.users.insertOne( { name: “Anna” } )
We would now see a duplicate key error, because the non-existing email for which we have an index
is treated as a duplicate key: we would now have the "no email" value stored twice. This is an
interesting behaviour we need to be aware of. MongoDB treats non-existing values as values in
our index, i.e. it stores them as null, and therefore if we have two documents with null values for an
indexed field and that index is unique, we will get this error.


Now if we have a use case where we want to create a unique index on a field and it is ok for that
field to be missing, we have to create the index slightly differently.
$ db.users.dropIndex( { email: 1 } )
$ db.users.createIndex( { email: 1 }, { unique: true, partialFilterExpression: { email: { $exists: true } } } )
In the above, we use the partialFilterExpression as a second option alongside the unique
option. The partialFilterExpression of $exists: true on email lets mongoDB know that we only want
to add elements to our index where the email field exists. This avoids the clash with our unique
option. Therefore, if we now run the below insert command (the same as before), it will work and
we will not see any errors.
$ db.users.insertOne( { name: “Anna” } )
We use the combination of unique and partialFilterExpression to avoid indexing documents where
the field has no value or does not exist at all, which allows us to keep using the unique option on
that field.


Working with Indexes

Understanding Time-To-Live (TTL) Index
The last interesting index option is the Time-To-Live (TTL) index. This type of index can be very
helpful for applications with self-destroying data, for example a user session where we want to clear
the data after some duration, or anything of a similar nature. Below is an example of a TTL index:
$ db.sessions.insertOne( { data: “randomText”, createdAt: new Date( ) } )
$ db.sessions.find( ).pretty( )
The sessions document will receive a random string as data, and createdAt will be a date stamp of
the current date. The new Date( ) will provide an ISODate for us, for example:
ISODate(“2019-03-31T19:52:24.272Z”)
To create a TTL index for our sessions collection we would use the expireAfterSeconds option. The
TTL option below is created on the createdAt field.
$ db.sessions.createIndex( { createdAt: 1 }, { expireAfterSeconds: 10 } )
This is a special feature mongoDB offers and it will only work on date fields/indexes. We could add
this option to other field types (i.e. numbers, texts, booleans etc.) but it would simply be ignored. In
the above we have set the expireAfterSeconds option to 10 seconds. It is important to note that
creating the TTL index does not immediately delete pre-existing elements, i.e. elements already in
the collection before the TTL index was created. If we now insert a new element into this collection,
we would notice that after 10 seconds both elements have been deleted.
$ db.sessions.find( ).pretty( )
Adding a new element to the collection triggers mongoDB to re-evaluate the entire collection,
including the existing elements, and check whether the indexed createdAt field has fulfilled the
expireAfterSeconds criteria (i.e. only being valid for 10 seconds).
This can be very useful because it allows us to maintain a collection of documents which destroy
themselves after a certain time span. This can be very helpful for many applications for example
session data for users on our web app or maybe an online shop where we want to clear a cart after
one day, etc. Whenever we have a use case where data should clean itself up, we do not need to
write a complex script for it; we can use a TTL index with the expireAfterSeconds option.
It is important to note that we can only use this option on single field indexes on date objects;
it does not work on compound indexes.


Working with Indexes

Query Diagnosis & Query Planning
Now that we have looked at what indexes do and how we can create our own, it is important to
keep playing around with them to get a better understanding of the different options and how
indexes work. In order to experiment and understand whether an index is worth the effort, we need
to know how to diagnose our queries. We have already seen the explain( ) method for this.

The explain( ) method accepts a verbosity argument:
“queryPlanner” (the default): shows a summary for the executed query and the winning plan.
“executionStats”: shows a detailed summary for the executed query, the winning plan and possibly rejected plans.
“allPlansExecution”: shows a detailed summary for the executed query, the winning plan and the winning plan decision process.

It is important to note that we can execute explain( ) as is, or pass in queryPlanner as an argument to get the default minimal output, which tells us the winning plan and not much else. We can also use executionStats as an argument to see a detailed summary with information about the winning plan and possibly the rejected plans, as well as how long it took to execute the query. Finally, there is the allPlansExecution argument, which shows a detailed summary and information on how the winning plan was chosen.
To determine whether a query is efficient, the obvious measure is the milliseconds of processing time, which lets us compare the solution with and without an index, i.e. does the index scan beat the collection scan. Another important measure is to compare the number of keys examined in the index, the number of documents examined and the number of documents returned.

Milliseconds Process Time: IXSCAN typically beats COLLSCAN.

# of Keys (in Index) Examined vs # of Documents Examined: should be as close as possible, or # of Documents Examined should be 0 (a Covered Query!).

# of Documents Examined vs # of Documents Returned: should be as close as possible.

Working with Indexes

Understanding Covered Queries
We can reach a so-called covered query if we only return fields which are also the indexed fields, in which case the query does not examine any documents because it can be answered entirely from inside the index. We will not always be able to reach this state, but if we can optimise our index to reach the covered query state (as the name suggests, the query is fully covered by the index), then we have a very efficient query, because we have skipped the stage of reaching out to the collection to get the documents, which obviously speeds up our query and gives us a very fast solution.
If we have a query that we typically run and that returns a known set of fields, it might be worth putting that field into a single field index or, if it is two fields, into a compound index, so that the query can be fully covered from inside our index.
Below is an example of a Covered Query - using a projection to only return the name in our query:
$ db.customers.insertMany( [ { name: "Abbey", age: 29, salary: 30000 }, { name: "Bill", age: 20, salary: 18000 } ] )
$ db.customers.createIndex( { name: 1 } )
$ db.customers.explain("executionStats").find( { name: "Bill" }, { _id: 0, name: 1 } )
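In the executionStats output we should then see values along these lines (abbreviated; exact numbers depend on the data):
totalKeysExamined: 1, totalDocsExamined: 0, nReturned: 1
The totalDocsExamined value of 0 is what confirms the query was fully covered by the index.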

Working with Indexes

How MongoDB Rejects a Plan
To understand how mongoDB rejects a plan we will use the customers collection from the previous example. In the customers collection we have two indexes: the default _id index and our own name index. We will now add a compound index on the customers collection which indexes age in ascending order followed by name, as seen below:
$ db.customers.createIndex( { age: 1, name: 1 } )
We now have three indexes on our customers collection. We can query the collection and use the explain method to see how mongoDB rejects a plan.
$ db.customers.explain( ).find( { name: "Abbey", age: 29 } )
We will notice the winningPlan is an IXSCAN using the compound age_1_name_1 index. We should also now see a rejectedPlan, which was the IXSCAN on the single field name_1 index. MongoDB considered both indexes because the query on the name field fits both. It is interesting to know which index was rejected and which one was chosen as the winningPlan. The question now is: how exactly does mongoDB figure out which plan is better?
MongoDB uses an approach where, first of all, it looks for indexes that could help with the query at hand. Since our find query filters on the name field, mongoDB automatically derived that both the single field index and the compound index could help. In this scenario we only have two approaches, but in other scenarios we may have more. Hypothetically, let's say we had three approaches to our find query; mongoDB then simply lets those approaches race against each other, but not over the full dataset. It sets a certain winning condition, e.g. the first approach to find 100 documents, and whichever approach gets there first is the one mongoDB will use for the real query.
It would be cumbersome if mongoDB had to do this for every find query we send to the database, because it would obviously cost a little bit of performance every time. Therefore, mongoDB caches the winningPlan for this type of query. For future queries that look exactly the same it uses the cached winningPlan; for future queries that look different, i.e. use different values or different keys, mongoDB will race the approaches again and find a winning plan for that type of query.

Diagram: Approach 1, Approach 2 and Approach 3 race against each other; the Winning Plan is chosen and then stored in the Cache.

This cache does not live forever; it is cleared after a certain amount of writes or a database restart. To be precise, instead of being stored forever, the winningPlan is removed from the cache when:
a. We have written a certain amount of documents to that collection, because mongoDB can no longer be sure the current winningPlan would still win now that the collection has changed a lot, and it should reconsider.
b. We rebuilt the index, i.e. we dropped and recreated it.
c. We added or removed other indexes, because a new index could be better.
d. We restarted the mongoDB server.
This is how mongoDB derives the winningPlan and how it stores it in cache memory.
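For completeness, the plan cache can also be cleared manually from the shell, which forces mongoDB to race the plans again on the next query; a minimal example on our collection:
$ db.customers.getPlanCache( ).clear( )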

Stored Forever? No; the cached winning plan is reset when the Write Threshold is reached (currently 1000 writes), when the Index is Rebuilt, when Other Indexes are Added or Removed, and when the Server is Restarted.

This is why it is interesting for us as developers to regularly check our queries (our find, update or delete queries) and see what mongoDB actually does: whether it uses indexes efficiently, whether a new index should be added (something we can do on our own if we own the database, or otherwise pass on to the db administrator), or whether we need to adjust the query.
Maybe we are always fetching data that we do not really need, and we could use a covered query if we just projected the data we need, which happens to be the data stored in the index.
This is why, as a developer, we need to know how indexes work: either because we need to create them on our own in a project we work on alone, or because knowing them lets us optimise our queries or tell the db administrator how to optimise the indexes.
The last level of verbosity that the explain method offers is allPlansExecution:
$ db.customers.explain("allPlansExecution").find( { name: "Abbey", age: 29 } )
This provides a large output with detailed statistics for all plans, including the rejected ones. We can therefore see in detail how an index scan on our compound index performs, as well as how the query would perform using any other index. With this option we get detailed analytics on the different indexes and queries and the possible ways of running our query. We should now have all the tools we need to optimise our queries and our indexes.


Working with Indexes

Using Multi-Key Indexes
We are now going to explore two new types of indexes, the first of which is called a multi-key index.
$ db.contacts.drop( )
$ db.contacts.insertOne( { name: "Max", hobbies: [ "Cooking", "Football" ], addresses: [ { street: "First Street" }, { street: "Second Street" } ] } )
In mongoDB it is also possible to index arrays as seen below:
$ db.contacts.createIndex( { hobbies: 1 } )
$ db.contacts.find( { hobbies: "Football" } ).pretty( )
If we explain the above find query with the executionStats argument to see how mongoDB arrived at the winningPlan, we will notice that mongoDB used an index scan and that isMultiKey is set to true for the hobbies index.
$ db.contacts.explain("executionStats").find( { hobbies: "Football" } ).pretty( )
MongoDB treats an index on an array as a multi-key index because it is an index on an array of values.
Multi-key indexes technically work like regular indexes but are stored slightly differently.
MongoDB pulls out all the values of our indexed key, i.e. hobbies in the above case, and stores them as separate elements in the index. This means that multi-key indexes on a lot of documents are larger than single field indexes. For example, if every document has an array with four values on average and we have a thousand documents and we index that array field, we would store four thousand index entries (4 x 1,000 = 4,000). This is something to keep in mind: multi-key indexes are possible, but they are also bigger. That does not mean we should not use them.
$ db.contacts.createIndex( { addresses: 1 } )
$ db.contacts.explain("executionStats").find( { "addresses.street": "First Street" } )
We will notice with the above that we can create an index on the addresses array; however, when we explain the find query we will see that mongoDB uses a collection scan and not the index. The reason is that our index holds the whole embedded documents, not the fields of those documents. MongoDB does not go so far as to pull out the elements of an array and then also pull out all the field values of a nested document that the array might hold. If we instead search for addresses matching the whole document { street: "First Street" }, we would see mongoDB using the index scan, because whole documents are what is stored in our index.
$ db.contacts.explain("executionStats").find( { addresses: { street: "First Street" } } )
MongoDB pulls out the elements of the addresses array as single values, and each element happens to be a document, so those documents are what mongoDB pulled out and stored in the index registry. This is something to be aware of with multi-key indexes.

Note that what we can do is create an index on addresses.street, as seen below. This will also be a multi-key index, and if we try the earlier find query on addresses.street again, we would notice that mongoDB now uses an index scan on the multi-key index.
$ db.contacts.createIndex( { "addresses.street": 1 } )
$ db.contacts.explain("executionStats").find( { "addresses.street": "First Street" } )
We can therefore use an index on a field in an embedded document which is part of an array, thanks to the multi-key feature. We must be aware, though, that using many multi-key indexes on a single collection can quickly lead to performance issues with writes, because for every new document we add, all these multi-key indexes have to be updated. If we add a new document with 10 values in an array which we happen to store in a multi-key index, then 10 new entries need to be added to the index registry. If we have four or five of these multi-key indexes per collection, we quickly end up in a low performance world.
Multi-key indexes are helpful if we have queries that regularly target array values, or even nested values or values in embedded documents inside arrays.
We can also add a multi-key field as part of a compound index, which is possible as seen below:
$ db.contacts.createIndex( { name: 1, hobbies: 1 } )
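A query filtering on both fields could then be served by this compound multi-key index; for example (a hypothetical query reusing the document inserted above):
$ db.contacts.explain("executionStats").find( { name: "Max", hobbies: "Cooking" } )
The explain output should again show an IXSCAN with isMultiKey set to true.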
However, there is one important restriction to be aware of: a compound index made up of two or more array fields will not work, for example the below:
$ db.contacts.createIndex( { addresses: 1, hobbies: 1 } )
We cannot index parallel arrays because mongoDB would have to store the cartesian product of the values of both arrays: it would have to pull out all the addresses and, for every address, store all the hobbies. So if we have two addresses and five hobbies, we would have to store ten values, and this gets worse the more addresses and hobbies we have, which is why this is not possible.
Compound indexes with a multi-key field are therefore possible, but only with one such field, i.e. with one array and not multiple arrays. We can have multiple multi-key indexes as separate indexes, but in one and the same index only one array can be included.
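Attempting to create the parallel-array index above fails immediately with an error along these lines (exact wording may vary by version):
cannot index parallel arrays [hobbies] [addresses]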

Working with Indexes

Understanding Text Indexes
There is a special kind of multi-key index called a text index. Let's take the below text as an example, which could be stored in a field of our document as some kind of product description.
This product is a must-buy for all fans of modern fiction!

Text Index: product, must, buy, fans, modern, fiction

If we want to search through the above text, we have previously seen that we could use the $regex operator. However, a regular expression is not a great way of searching text, as it offers very low performance. A better method is to use a text index, a special kind of index supported by mongoDB which essentially turns the text into an array of single words and stores it as such. An extra thing mongoDB does for us is that it removes all the stop words and stems all words, so that we end up with an array of keywords; words such as "is" or "a" are not stored because they are not typically something we would search for, as they appear all over the place. The keywords are what matters for text searches.
Using the below example of a products collection, we will explore the syntax to set up a text index:
$ db.products.insertMany( [ { title: "A Book", description: "This is an amazing book about a young explorer!" }, { title: "Red T-Shirt", description: "This T-Shirt is red and it's pretty amazing." } ] )
$ db.products.createIndex( { description: 1 } )
$ db.products.createIndex( { description: "text" } )
We create the index the same way we would for any other index; however, the important distinction is that we do not use 1 or -1 for ascending/descending. We could do that, but then the index would be a single field index and we could only search for the exact whole text to utilise it, not for individual keywords. Instead, we use the special "text" keyword, which lets mongoDB know to create a text index by removing all the stop words and storing the keywords in an array.
When performing the find command, we can now use the $text and $search keys to search for the
keyword. The casing is not important as every keyword is stored as lowercase.
$ db.products.find( { $text: { $search: "amazing" } } )
We do not specify the field we want to search on, because we are only allowed to have one text index per collection; text indexes are very expensive, especially if we have a lot of long text that has to be split up, and we would not want to do that, say, ten times per collection. Therefore, there is only one text index for $search to look into.
We can actually merge multiple fields into one text index, which is then searched automatically, as we will see in a later section.

Note that if we search for the keyword "red book", this will find both documents, as each word is treated as an individual keyword and the search matches all documents containing red and all documents containing book. If we specifically want to find the phrase "red book" treated as one term, we have to wrap the text in escaped double quotes like so:
$ db.products.find( { $text: { $search: "\"red book\"" } } )
$ db.products.find( { $text: { $search: "\"amazing book\"" } } )
Because the search string is already in double quotes, we need to add a backslash before each inner quotation mark to escape it. The first query will not find anything in the collection because we do not have the phrase red book anywhere in our text ("amazing book" would work though).
Text indexes are very powerful and much faster than regular expressions and this is definitely the
way to go if we need to look for keywords in text.

Working with Indexes

Text Indexes and Sorting
If we search a text index, we may want to order the returned documents so that the closest matches are at the top; this is possible in mongoDB.
For example, if we search for "amazing t-shirt", this returns both documents, because the amazing keyword exists in both. However, we would rather have the t-shirt product appear before the book, because it is the better match: it has both keywords in its description.
$ db.products.find( { $text: { $search: "amazing t-shirt" } } ).pretty( )
MongoDB does something special when managing and searching text indexes: we can find out how it scores its results. If we use a projection as the second argument to our find method, we can use the $meta operator to project the textScore. The textScore is a meta field added and managed by mongoDB for text searches, i.e. for the $text operator on a text index.
$ db.products.find( { $text: { $search: "amazing t-shirt" } }, { score: { $meta: "textScore" } } ).pretty( )
We will see the score mongoDB has assigned to each result, and it automatically sorts all returned documents by that score. To make absolutely sure the returned documents are sorted, we could add the sort command as seen below; however, this is a longer syntax and the above already sorts by the score:
$ db.products.find( { $text: { $search: "amazing t-shirt" } }, { score: { $meta: "textScore" } } ).sort( { score: { $meta: "textScore" } } ).pretty( )
We can therefore use the textScore meta field managed by mongoDB to sort the returned results for us.
Working with Indexes

Creating Combined Text Indexes
As previously mentioned, we can only have one text index per collection. If we look at the indexes using the below command, we will notice that the default_language for the text index is English; we are able to change this, as we will see later.
$ db.products.getIndexes( )
If we try to add another text index to the same collection, now on the title, like so:
$ db.products.createIndex( { title: "text" } )
we will notice that we receive an IndexOptionsConflict error in the shell, because we can only have one text index per collection. However, what we can do is merge the text of multiple fields together into one text index. First, we need to drop the existing text index. Dropping a text index is a little harder, because we cannot drop it by its field specification (i.e. the first command below will not work); instead we need to use the text index name.
$ db.products.dropIndex( { title: "text" } )
$ db.products.dropIndex("description_text")

Now that we have dropped the existing text index from the collection, we can create a new text index combining/merging multiple fields.
$ db.products.createIndex( { title: "text", description: "text" } )
Ultimately, we still have only one text index on our collection; however, it now contains the keywords from both the title and description fields. We can now search for keywords that appear in the title, for example the keyword book (which appears in both the title and the description), or for a keyword that only appears in the title and not in the description, and vice versa.
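For example, running these searches against the two documents inserted earlier:
$ db.products.find( { $text: { $search: "book" } } )
$ db.products.find( { $text: { $search: "red" } } )
The first search matches the book product; the second matches the T-Shirt product, whose title and description both contribute keywords to the same merged index.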

Working with Indexes

Using Text Indexes to Exclude Words
With text indexes not only can we search for keywords but we can also exclude/rule out keywords.
$ db.products.find( { $text: { $search: "amazing -t-shirt" } } ).pretty( )
In the example above, adding the minus in front of the keyword tells mongoDB to exclude any results that contain the keyword t-shirt. This is really helpful to narrow down text search queries like the above, where we find amazing products that are not T-Shirts, or which at least do not have T-Shirt in the title or description (the above query returns only one document and not both, as we previously saw with the search amazing t-shirt).
Working with Indexes

Setting the Default Language & Using Weights
To drop an existing text index we would first need to search for the index name and then use the
dropIndex command as seen below:
$ db.products.getIndexes( )
$ db.products.dropIndex("title_text_description_text")
If we now create a new index and pass in a second options argument, there are two interesting options we can configure for text indexes. The first is the default language: English by default, but we can set it to a different language such as German; mongoDB has a list of supported languages we can use. The language determines how words are stemmed (i.e. how suffixes are removed) and which stop words are removed; for example, words like "is" or "a" are removed in English, while words like "ist" or "der" are removed in German. It is important to note that English is the default language, but we can also specify the option explicitly.
$ db.products.createIndex( { title: "text", description: "text" }, { default_language: "german" } )
The second option available to us is the ability to set different weights for the different fields we merge together. In the below example we are merging the title and description fields together, but we want to specify that the description carries a higher weight. The weights matter when mongoDB calculates the score of the results. To set up such weights, we add the weights key to our options object; this key holds a document as its value, in which we reference the field names and assign weights that are relative to each other.
$ db.products.createIndex( { title: "text", description: "text" }, { default_language: "english", weights: { title: 1, description: 10 } } )
The description is therefore worth ten times as much as the title. When searching our products collection we can not only search for a keyword but also set the language, as seen below:
$ db.products.find( { $text: { $search: "red", $language: "german" } } )
This is an interesting search option if we store the language differently for different documents. We can also turn on case sensitivity by setting $caseSensitive to true; the default is false, demonstrated below:
$ db.products.find( { $text: { $search: "red", $caseSensitive: true } } )
If we print the score, we will notice that the results are scored differently once we have set the weights option.
$ db.products.find( { $text: { $search: "red" } }, { score: { $meta: "textScore" } } ).pretty( )
Working with Indexes

Building Indexes
There are two ways in which we can build indexes: in the foreground and in the background.
So far we have always added indexes in the foreground; createIndex executed just as we ran it. Something we did not notice, because it always happened instantly, is that during the creation of the index the collection is locked and we cannot edit it. Alternatively, we can add indexes in the background, and the collection remains accessible.
The advantage of the foreground mode is that it is faster, while the background mode is slower. However, if we have a collection that is used in production, we probably do not want to lock it just because we are adding an index.

Foreground: the collection is locked during index creation; faster.

Background: the collection stays accessible during index creation; slower.

We will now observe how we can add an index in the background and see what difference it makes. To see the difference we can use the credit-rating.js file; the mongo shell can execute this file if we simply type mongo followed by the JavaScript file name.
$ mongo credit-rating.js
MongoDB will still connect to the server, but it will then execute the commands in the .js file against the server. In this file we have a for loop that adds one million documents with random numbers to a collection. Executing this will take quite a while depending on our system; we can always quit with control + c on our keyboard, or alternatively reduce the number of documents created in the file's for loop. Once completed, we will have a new database and a collection with one million documents.
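The file itself is essentially just a loop of inserts; a minimal sketch of what such a credit-rating.js could look like (the field names are taken from the insert used later in this section, the random value ranges are assumptions):
for (let i = 0; i < 1000000; i++) {
  // insert a document with a random id string, score and age
  db.ratings.insertOne({
    person_id: Math.random().toString(36).substring(2),
    score: Math.random() * 100,
    age: Math.floor(Math.random() * 52) + 18
  });
}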
$ show dbs
$ use credit
$ show collections
$ db.ratings.count( )
We can use this collection to demonstrate the difference between the foreground and background modes. If we were to create an index on this collection now, we would notice that the indexing does not happen instantly because we have a million documents, although it can still be quick depending on our system.
To demonstrate that the foreground mode not only takes time but also blocks us from doing anything with the collection while the index is being created, we can open a second mongoDB shell instance and prepare a query in that new shell.
$ db.ratings.findOne( )
In the first shell instance we create the index, then quickly switch to the second shell instance to run the findOne query (since creating the new index does not take too long).
$ db.ratings.createIndex( { age: 1 } )
We will notice that the findOne( ) query does not finish instantly; it waits for the foreground index creation to complete before it can execute. There are no errors, but commands are deferred until the index has been created.
For more complex indexes, such as a text index, or for even more documents, the index creation would take much longer. That becomes a problem, because the database or the collection might be locked for a couple of minutes or longer, which is not an option for a production database: we cannot suddenly lock down the entire database so that the app can no longer interact with it. This is why we can create indexes in the background.

To create a background mode index, we would pass in a second option argument to our
createIndex command setting the background to true (the default is set to false, meaning indexes
are created in the foreground).
$ db.ratings.createIndex( { age: 1 }, { background: true } )
If we now create the new index and run a command in the second shell instance, we can demonstrate that the database/collection is no longer locked during index creation.
$ db.ratings.insertOne( { person_id: "dfjve9f348u6iew", score: 44.2531, age: 60 } )
We should notice that the insertOne command continues to work and the new document is inserted immediately while the index is being created in the background. This is a very useful feature for production databases, as we do not want to add an index in the foreground in production, especially not if the index creation will take quite a while.
Useful Links:
https://docs.mongodb.com/manual/core/index-partial/
https://docs.mongodb.com/manual/reference/text-search-languages/#text-search-languages
https://docs.mongodb.com/manual/tutorial/specify-language-for-text-index/#create-a-text-index-for-a-collection-in-multiple-languages
Working with Geospatial Data

Adding GeoJSON Data
In the official documentation we will find an article about GeoJSON: how it is structured and which kinds of GeoJSON objects mongoDB supports. MongoDB supports all the major objects such as points, lines and polygons, as well as more specialised advanced objects.
https://docs.mongodb.com/manual/reference/geojson/
The most important thing is to understand how GeoJSON objects are created, and creating them is very simple. To get some data to work with, we can open up Google Maps and pick different locations. On Google Maps, if we click on a location, we can easily read the coordinates of the place from the URL. The first coordinate is the latitude, while the second coordinate after the comma is the longitude. We need to remember this in order to store the values correctly in mongoDB: the longitude describes the east-west position and the latitude the north-south position on the globe. With this coordinate system we can map any point onto our earth.
Below is an example of adding a GeoJSON data in mongoDB.
$ use awesomeplaces
$ db.places.insertOne( { name: "California Academy of Sciences", location: { type: "Point", coordinates: [ -122.4724356, 37.7672544 ] } } )
There is nothing special about the key name in GeoJSON; we can use any name we want, i.e. location, loc or something completely different. What matters with GeoJSON data is the structure of the value. The value should be an embedded document, and in that embedded document we need two pieces of information: the type and the coordinates. The coordinates is an array where the first value has to be the longitude and the second value the latitude. The type must be one of the types supported by mongoDB, such as Point.
We have now created a GeoJSON object, and mongoDB will treat the document as a GeoJSON object because it fulfils the requirements: it has a type which is one of the supported objects, and it has coordinates, an array where the first value is treated as the longitude and the second value as the latitude.

Working with Geospatial Data

Running Geo Queries
We may have a web application or a mobile app where users can locate themselves through some location API. Location APIs will always return coordinates in the form of latitude and longitude, which is the standard format. Our application will therefore hand us latitude and longitude data for whatever the user did, for example locating themselves.
We can simulate this by taking another location from Google Maps and querying whether the location we created in the previous section is near the new coordinates.
$ db.places.find( { location: { $near: { $geometry: { type: "Point", coordinates: [ -122.471114, 37.771104 ] } } } } )
The location key relates to whatever we named the field and is not a special reserved key (i.e. if we had called it loc then we would need to use loc). The $near operator provided by mongoDB is an operator for working with geospatial data. It requires another document as its value, and in there we define a $geometry against which we want to check proximity. The $geometry takes a document describing a GeoJSON object; here we check whether the point we pass in is close to our stored point.
The above query requires a geospatial index in order to run without errors (as it stands, the query will error). Not all geospatial queries require an index, but all of them, just as with other indexes, will most likely benefit from having one.
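For reference, the geospatial index the query above asks for would be created with the 2dsphere index type, mongoDB's index type for GeoJSON data on a sphere (a minimal sketch matching the location field used above):
$ db.places.createIndex( { location: "2dsphere" } )
With this index in place, the $near query runs without errors and returns matching places sorted by distance from the given point.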


