MongoDB Admin Guide

MongoDB Administrator Training

Release 3.4
MongoDB, Inc.
Nov 15, 2017
Contents
1 Introduction 4
1.1 Warm Up ................................................. 4
1.2 MongoDB - The Company ....................................... 5
1.3 MongoDB Overview ........................................... 5
1.4 MongoDB Stores Documents ..................................... 8
1.5 MongoDB Data Types ......................................... 11
1.6 Lab: Installing and Configuring MongoDB .............................. 14
2 Storage 18
2.1 Introduction to Storage Engines .................................... 18
3 CRUD 24
3.1 Creating and Deleting Documents .................................. 24
3.2 Reading Documents .......................................... 29
3.3 Query Operators ............................................ 36
3.4 Lab: Finding Documents ........................................ 40
3.5 Updating Documents .......................................... 40
3.6 Lab: Updating Documents ....................................... 49
4 Indexes 50
4.1 Index Fundamentals .......................................... 50
4.2 Lab: Basic Indexes ........................................... 56
4.3 Compound Indexes ........................................... 57
4.4 Lab: Optimizing an Index ........................................ 62
4.5 Multikey Indexes ............................................. 63
4.6 Hashed Indexes ............................................. 67
4.7 Geospatial Indexes ........................................... 68
4.8 Using Compass with Indexes ..................................... 75
4.9 TTL Indexes ............................................... 79
4.10 Text Indexes ............................................... 81
4.11 Partial Indexes .............................................. 83
4.12 Lab: Finding and Addressing Slow Operations ........................... 86
4.13 Lab: Using explain() ........................................ 87
5 Replica Sets 88
5.1 Introduction to Replica Sets ...................................... 88
5.2 Elections in Replica Sets ........................................ 91
5.3 Replica Set Roles and Configuration ................................. 96
5.4 The Oplog: Statement Based Replication .............................. 97
5.5 Lab: Working with the Oplog ...................................... 99
5.6 Write Concern ..............................................101
5.7 Read Concern ..............................................106
5.8 Read Preference ............................................113
5.9 Lab: Setting up a Replica Set .....................................114
6 Sharding 118
6.1 Introduction to Sharding ........................................118
6.2 Balancing Shards ............................................125
6.3 Shard Zones ...............................................127
6.4 Lab: Setting Up a Sharded Cluster ..................................129
7 Reporting Tools and Diagnostics 136
7.1 Performance Troubleshooting .....................................136
8 Backup and Recovery 144
8.1 Backup and Recovery .........................................144
9 Aggregation 149
9.1 Intro to Aggregation ...........................................149
10 Views 157
10.1 Views Tutorial ..............................................157
10.2 Lab: Vertical Views ...........................................159
10.3 Lab: Horizontal Views .........................................160
10.4 Lab: Reshaped Views .........................................161
11 Security 162
11.1 Security Introduction ..........................................162
11.2 Authorization ...............................................165
11.3 Lab: Administration Users .......................................171
11.4 Lab: Create User-Defined Role (Optional) ..............................172
11.5 Authentication ..............................................174
11.6 Lab: Secure mongod ..........................................175
11.7 Auditing ..................................................176
11.8 Encryption ................................................178
11.9 Log Redaction ..............................................180
11.10 Lab: Secured Replica Set - KeyFile (Optional) ...........................181
11.11 Lab: LDAP Authentication & Authorization (Optional) ........................184
11.12 Lab: Security Workshop ........................................186
12 MongoDB Atlas, Cloud & Ops Manager Fundamentals 195
12.1 MongoDB Cloud & Ops Manager ...................................195
12.2 Automation ................................................197
12.3 Lab: Cluster Automation ........................................200
12.4 Monitoring ................................................201
12.5 Lab: Create an Alert ..........................................203
12.6 Backups .................................................203
13 MongoDB Cloud & Ops Manager Under the Hood 206
13.1 API ....................................................206
13.2 Lab: Cloud Manager API ........................................207
13.3 Architecture (Ops Manager) ......................................208
13.4 Security (Ops Manager) ........................................210
13.5 Lab: Install Ops Manager .......................................211
14 Introduction to MongoDB BI Connector 214
14.1 MongoDB Connector for BI ......................................214
1 Introduction
Warm Up (page 4) Activities to get the class started
MongoDB - The Company (page 5) About MongoDB, the company
MongoDB Overview (page 5) MongoDB philosophy and features
MongoDB Stores Documents (page 8) The structure of data in MongoDB
MongoDB Data Types (page 11) An overview of BSON data types in MongoDB
Lab: Installing and Configuring MongoDB (page 14) Install MongoDB and experiment with a few operations.
1.1 Warm Up
Introductions
• Who am I?
• My role at MongoDB
• My background and prior experience
Getting to Know You
• Who are you?
• What role do you play in your organization?
• What is your background?
• Do you have prior experience with MongoDB?
MongoDB Experience
• Who has never used MongoDB?
• Who has some experience?
• Who has worked with production MongoDB deployments?
• Who is more of a developer?
• Who is more of an operations person?
Logistics
1.2 MongoDB - The Company
10gen
• MongoDB was initially created in 2008 as part of a hosted application stack.
• The company was originally called 10gen.
• As part of their overarching plan to create the 10gen platform, the company built a database.
• Suddenly everybody said: “I like that! Give me that database!”
Origin of MongoDB
• 10gen became a database company.
• In 2013, the company rebranded as MongoDB, Inc.
• The founders have other startups to their credit: DoubleClick, ShopWiki, Gilt.
• The motivation for the database came from observing the following pattern with application development.
– The user base grows.
– The associated body of data grows.
– Eventually the application outgrows the database.
– Meeting performance requirements becomes difficult.
1.3 MongoDB Overview
Learning Objectives
Upon completing this module students should understand:
• MongoDB vs. relational databases and key/value stores
• Vertical vs. horizontal scaling
• The role of MongoDB in the development stack
• The structure of documents in MongoDB
• Array fields
• Embedded documents
• Fundamentals of BSON
MongoDB is a Document Database
Documents are associative arrays like:
• Python dictionaries
• Ruby hashes
• PHP arrays
• JSON objects
An Example MongoDB Document
A MongoDB document expressed using JSON syntax.
{
"_id" :"/apple-reports-second-quarter-revenue",
"headline" :"Apple Reported Second Quarter Revenue Today",
"date" :ISODate("2015-03-24T22:35:21.908Z"),
"author" :{
"name" :"Bob Walker",
"title" :"Lead Business Editor"
},
"copy" :"Apple beat Wall St expectations by reporting ...",
"tags" :[
"AAPL","Earnings","Cupertino"
],
"comments" :[
{"name" :"Frank","comment" :"Great Story" },
{"name" :"Wendy","comment" :"When can I buy an Apple Watch?" }
]
}
Vertical Scaling
[Figure: vertical scaling, adding more CPU, RAM, and I/O capacity to a single server]
Scaling with MongoDB
[Figure: horizontal scaling, a 1 TB collection (Collection1) partitioned across four shards, A through D, of 256 GB each]
Database Landscape
[Figure: database landscape chart plotting depth of functionality against scalability & performance; RDBMS rank high on functionality, Memcached on scalability and performance, and MongoDB aims for both]
MongoDB Deployment Models
1.4 MongoDB Stores Documents
Learning Objectives
Upon completing this module, students should understand:
• JSON
• BSON basics
• That documents are organized into collections
JSON
• JavaScript Object Notation
• Objects are associative arrays.
• They are composed of key-value pairs.
A Simple JSON Object
{
"firstname" :"Thomas",
"lastname" :"Smith",
"age" :29
}
JSON Keys and Values
• Keys must be strings.
• Values may be any of the following:
– string (e.g., “Thomas”)
– number (e.g., 29, 3.7)
– true / false
– null
– array (e.g., [88.5, 91.3, 67.1])
– object
• More detail at json.org1.
Example Field Values
{
"headline" :"Apple Reported Second Quarter Revenue Today",
"date" :ISODate("2015-03-24T22:35:21.908Z"),
"views" :1234,
"author" :{
"name" :"Bob Walker",
"title" :"Lead Business Editor"
},
"tags" :[
"AAPL",
23,
{"name" :"city","value" :"Cupertino" },
{"name" :"stockPrice","value":NumberDecimal("143.51")},
["Electronics","Computers" ]
]
}
1 http://json.org/
BSON
• MongoDB stores data as Binary JSON (BSON).
• MongoDB drivers send and receive data in this format.
• They map BSON to native data structures.
• BSON provides support for all JSON data types and several others.
• BSON was designed to be lightweight, traversable, and efficient.
• See bsonspec.org2.
BSON Hello World
// JSON
{"hello" :"world" }
// BSON
x16 x0 x0 x0 // document size
x2           // type 2 = string
hello x0     // name of the field, null terminated
x6 x0 x0 x0  // size of the string value
world x0     // string value, null terminated
x0           // end of document
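The byte dump above can be reproduced by hand. A minimal sketch in JavaScript (using Node.js Buffers, not a BSON library) that serializes { "hello" : "world" } and confirms the 0x16 (22-byte) document size:

```javascript
// Serialize { hello: "world" } by hand, following the BSON spec:
// int32 total size, then per element: a type byte, a NUL-terminated
// field name, an int32 string length, the string bytes plus NUL,
// and finally a trailing 0x00 terminator for the document.
function bsonHelloWorld() {
  const name = Buffer.from("hello\0", "ascii");
  const value = Buffer.from("world\0", "ascii");
  const valueLen = Buffer.alloc(4);
  valueLen.writeInt32LE(value.length); // 6: "world" + NUL
  const body = Buffer.concat([
    Buffer.from([0x02]), // type 2 = string
    name,
    valueLen,
    value,
    Buffer.from([0x00]), // end of document
  ]);
  const size = Buffer.alloc(4);
  size.writeInt32LE(body.length + 4); // total size includes these 4 bytes
  return Buffer.concat([size, body]);
}

const doc = bsonHelloWorld();
console.log(doc.length); // 22 bytes, i.e. 0x16, matching the dump above
```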
A More Complex BSON Example
// JSON
{"BSON" :["awesome",5.05,1986 ]}
// BSON
x31 x0 x0 x0  // document size
x4            // type=4, array
BSON x0       // name of first element
x26 x0 x0 x0  // size of the array, in bytes
x2            // type=2, string
x30 x0        // element name '0'
x8 x0 x0 x0   // size of value for array element 0
awesome x0    // string value for element 0
x1            // type=1, double
x31 x0        // element name '1'
x33 x33 x33 x33 x33 x33 x14 x40 // double value for array element 1
x10           // type=16, int32
x32 x0        // element name '2'
xc2 x7 x0 x0  // int32 value for array element 2
x0
x0
2 http://bsonspec.org/#/specification
Documents, Collections, and Databases
• Documents are stored in collections.
• Collections are contained in a database.
• Example:
– Database: products
– Collections: books, movies, music
• Each database-collection combination defines a namespace.
– products.books
– products.movies
– products.music
The _id Field
• All documents must have an _id field.
• If no _id is specified when a document is inserted, MongoDB will add the _id field as an ObjectId.
• Most drivers will actually create the ObjectId if no _id is specified.
• Some restrictions:
– The _id is immutable.
– It cannot be an array.
– The _id field must be unique within a collection.
* It acts as the primary key for replication.
1.5 MongoDB Data Types
Learning Objectives
By the end of this module, students should understand:
• What data types MongoDB supports
• Special considerations for some BSON types
What is BSON?
BSON is a binary serialization of JSON, used to store documents and make remote procedure calls in MongoDB. For more in-depth coverage of BSON, refer to bsonspec.org3.
Note: All official MongoDB drivers map BSON to native types and data structures.
BSON types
MongoDB supports a wide range of BSON types. Each data type has a corresponding number and string alias that can be used with the $type operator to query documents by BSON type.
Double         1   “double”
String         2   “string”
Object         3   “object”
Array          4   “array”
Binary data    5   “binData”
ObjectId       7   “objectId”
Boolean        8   “bool”
Date           9   “date”
Null          10   “null”
BSON types continued
Regular Expression     11   “regex”
JavaScript             13   “javascript”
JavaScript (w/ scope)  15   “javascriptWithScope”
32-bit integer         16   “int”
Timestamp              17   “timestamp”
64-bit integer         18   “long”
Decimal128             19   “decimal”
Min key                -1   “minKey”
Max key               127   “maxKey”
3 http://bsonspec.org/
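The number and alias pairs above can be collected into a small lookup table, for example when building $type queries programmatically. This is plain JavaScript for illustration, not a driver API:

```javascript
// BSON type numbers and their string aliases, as accepted by $type.
const bsonTypes = {
  double: 1, string: 2, object: 3, array: 4, binData: 5,
  objectId: 7, bool: 8, date: 9, null: 10, regex: 11,
  javascript: 13, javascriptWithScope: 15, int: 16,
  timestamp: 17, long: 18, decimal: 19, minKey: -1, maxKey: 127,
};

// $type accepts either form, so these two queries are equivalent:
//   db.movies.find({ year: { $type: "int" } })
//   db.movies.find({ year: { $type: 16 } })
console.log(bsonTypes.decimal); // 19
```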
ObjectId
[Figure: ObjectId layout, a 12-byte value rendered as a 24-character hex string: a 4-byte date (timestamp), a 3-byte MAC address, a 2-byte PID, and a 3-byte counter]
ObjectId:
>ObjectId()
ObjectId("58dc309ce3f39998099d6275")
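Because the leading four bytes are a timestamp, an ObjectId's creation time can be recovered from its hex string alone. A sketch in plain JavaScript (the mongo shell's ObjectId.getTimestamp() method does the equivalent):

```javascript
// Extract the creation time from an ObjectId's 24-character hex string.
// Bytes 0-3 (the first 8 hex characters) are seconds since the Unix epoch.
function objectIdTimestamp(hex) {
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

const ts = objectIdTimestamp("58dc309ce3f39998099d6275");
console.log(ts.toISOString()); // 2017-03-29T22:09:32.000Z (0x58dc309c seconds)
```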
Timestamps
BSON has a special timestamp type for internal MongoDB use; it is not associated with the regular Date type.
Date
BSON Date is a 64-bit integer that represents the number of milliseconds since the Unix epoch (Jan 1, 1970). This
results in a representable date range of about 290 million years into the past and future.
• The official BSON spec refers to the BSON Date type as UTC datetime.
• It is a signed data type; negative values represent dates before 1970.
var today =ISODate() // using the ISODate constructor
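The figure of roughly 290 million years follows directly from the signed 64-bit millisecond range. A quick check in plain JavaScript (precision above 2^53 is approximate, but the order of magnitude is unaffected):

```javascript
// A signed 64-bit millisecond count spans +/- 2^63 ms around the epoch.
const maxMs = 2 ** 63;
const msPerYear = 1000 * 60 * 60 * 24 * 365.25;
const years = maxMs / msPerYear;
console.log(Math.round(years / 1e6)); // ~292 million years in each direction
```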
Decimal
In MongoDB 3.4, support was added for 128-bit decimals.
• The decimal BSON type uses the decimal128 decimal-based floating-point format.
• This supports 34 significant digits and an exponent range of -6143 to +6144.
• It is intended for applications that handle monetary and scientific data requiring exact precision.
How to use Decimal
For specific information about how your preferred driver supports decimal128, see the MongoDB driver documentation4.
In the Mongo shell, we use the NumberDecimal() constructor.
• It can be created with a string argument or a double.
• It is stored in the database as NumberDecimal(“999.4999”).
>NumberDecimal("999.4999")
NumberDecimal("999.4999")
>NumberDecimal(999.4999)
NumberDecimal("999.4999")
4 https://docs.mongodb.com/ecosystem/drivers/
Decimal Considerations
• If upgrading an existing database to use decimal128, it is recommended that a new field be added to reflect the new type. The old field may be deleted after verifying consistency.
• If any fields contain decimal128 data, they will not be compatible with previous versions of MongoDB. There is no support for downgrading data files containing decimals.
• decimal types are not strictly equal to their double representations, so use the NumberDecimal constructor in queries.
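The last point follows from how binary floating point works. The classic illustration, in any IEEE 754 language (here plain JavaScript, not MongoDB-specific), shows why an exact decimal type matters for monetary data:

```javascript
// Binary doubles cannot represent most decimal fractions exactly,
// which is why decimal128 exists for monetary and scientific data.
console.log(0.1 + 0.2);         // 0.30000000000000004
console.log(0.1 + 0.2 === 0.3); // false
```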
1.6 Lab: Installing and Configuring MongoDB
Learning Objectives
Upon completing this exercise students should understand:
• How MongoDB is distributed
• How to install MongoDB
• Configuration steps for setting up a simple MongoDB deployment
• How to run MongoDB
• How to run the Mongo shell
Production Releases
64-bit production releases of MongoDB are available for the following platforms.
• Windows
• OS X
• Linux
Installing MongoDB
• Visit https://docs.mongodb.com/manual/installation/.
• Please install the Enterprise version of MongoDB.
• Click on the appropriate link, such as “Install on Windows” or “Install on OS X”, and follow the instructions.
• Versions:
– Even-numbered builds are production releases, e.g., 2.4.x, 2.6.x.
– Odd numbers indicate development releases, e.g., 2.5.x, 2.7.x.
Linux Setup
PATH=$PATH:<path to mongodb>/bin
sudo mkdir -p /data/db
sudo chmod -R 744 /data/db
sudo chown -R `whoami` /data/db
Install on Windows
• Download and run the .msi Windows installer from mongodb.org/downloads.
• By default, binaries will be placed in the following directory.
C:\Program Files\MongoDB\Server\<VERSION>\bin
• It is helpful to add the location of the MongoDB binaries to your path.
• To do this, from “System Properties” select “Advanced”, then “Environment Variables”.
Create a Data Directory on Windows
• Ensure there is a directory for your MongoDB data files.
• The default location is \data\db.
• Create a data directory with a command such as the following.
md \data\db
Launch a mongod
Explore the mongod command.
<path to mongodb>/bin/mongod --help
Launch a mongod with the MMAPv1 storage engine:
<path to mongodb>/bin/mongod --storageEngine mmapv1
Alternatively, launch with the WiredTiger storage engine (default).
<path to mongodb>/bin/mongod
Specify an alternate path for data files using the --dbpath option. (Make sure the directory already exists.) E.g.,
<path to mongodb>/bin/mongod --dbpath /test/mongodb/data/wt
The MMAPv1 Data Directory
ls /data/db
• The mongod.lock file
– This prevents multiple mongods from using the same data directory simultaneously.
– Each MongoDB database directory has one .lock file.
– The lock file contains the process id of the mongod that is using the directory.
• Data files
– The names of the files correspond to available databases.
– A single database may have multiple files.
The WiredTiger Data Directory
ls /data/db
• The mongod.lock file
– Used in the same way as in MMAPv1.
• Data files
– Each collection and index is stored in its own file.
– mongod will fail to start if MMAPv1 files are found.
Import Exercise Data
unzip usb_drive.zip
cd usb_drive
mongoimport -d sample -c tweets twitter.json
mongoimport -d sample -c zips zips.json
mongoimport -d sample -c grades grades.json
cd dump
mongorestore -d sample city
mongorestore -d sample digg
Note: If there is an error importing data directly from a USB drive, please copy the sampledata.zip file to your local
computer first.
Launch a Mongo Shell
Open another command shell. Then type the following to start the Mongo shell.
mongo
Display available commands.
help
Explore Databases
Display available databases.
show dbs
To use a particular database we can type the following.
use <database_name>
db
Exploring Collections
show collections
db.<COLLECTION>.help()
db.<COLLECTION>.find()
Admin Commands
• There are also a number of admin commands at our disposal.
• The following will shut down the mongod we are connected to through the Mongo shell.
• You can also just kill it with Ctrl-C in the shell window from which you launched the mongod.
db.adminCommand({shutdown : 1})
• Confirm that the mongod process has indeed stopped.
• Once you have, please restart it.
2 Storage
Introduction to Storage Engines (page 18) MongoDB storage engines
2.1 Introduction to Storage Engines
Learning Objectives
Upon completing this module, students should be familiar with:
• Available storage engines in MongoDB
• MongoDB journaling mechanics
• The default storage engine for MongoDB
• Common storage engine parameters
• The storage engine API
What is a Database Storage Engine?
How Storage Engines Affect Performance
• Writing and reading documents
• Concurrency
• Compression algorithms
• Index format and implementation
• On-disk format
Storage Engine Journaling
• Keeps track of all changes made to data files
• Stages writes sequentially before they are committed to the data files
• Enables crash recovery: writes from the journal can be replayed to data files in the event of a failure
MongoDB Storage Engines
As of MongoDB 3.4, three storage engine options are available:
• WiredTiger (default)
– with the option of on-disk / at-rest encryption (Enterprise only)
• MMAPv1
• In-memory storage (Enterprise only)
Specifying a MongoDB Storage Engine
Use the --storageEngine parameter to specify which storage engine MongoDB should use. E.g.,
mongod --storageEngine mmapv1
Specifying a Location to Store Data Files
• Use the --dbpath parameter
mongod --dbpath /data/db
• Other files are also stored here. E.g.,
– the mongod.lock file
– the journal
• See the MongoDB docs for a complete list of storage options5.
5 http://docs.mongodb.org/manual/reference/program/mongod/#storage-options
MMAPv1 Storage Engine
• MMAPv1 is MongoDB's original storage engine and was the default up to MongoDB 3.0.
• Specify the use of the MMAPv1 storage engine as follows:
mongod --storageEngine mmapv1
• MMAPv1 is based on memory-mapped files, which map data files on disk into virtual memory.
• As of MongoDB 3.0, MMAPv1 supports collection-level concurrency.
MMAPv1 Workloads
MMAPv1 excels at workloads where documents do not outgrow their original record size:
• High-volume inserts
• Read-only workloads
• In-place updates
Power of 2 Sizes Allocation Strategy
• MongoDB 3.0 uses power of 2 sizes allocation as the default record allocation strategy for MMAPv1.
• With this strategy, records include the document plus extra space, or padding.
• Each record has a size in bytes that is a power of 2 (e.g., 32, 64, 128, ... 2MB).
• For documents larger than 2MB, allocation is rounded up to the nearest multiple of 2MB.
• This strategy enables MongoDB to efficiently reuse freed records to reduce fragmentation.
• In addition, the added padding gives a document room to grow without requiring a move.
– Saves the cost of moving a document
– Results in fewer updates to indexes
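The allocation rules above can be sketched as a small function. This is an illustration of the strategy, not MongoDB's internal implementation:

```javascript
// Record size under the power of 2 sizes strategy: round the document
// size up to the next power of 2; above 2MB, round up to the next
// multiple of 2MB instead.
const TWO_MB = 2 * 1024 * 1024;

function recordSize(docBytes) {
  if (docBytes > TWO_MB) {
    return Math.ceil(docBytes / TWO_MB) * TWO_MB;
  }
  let size = 32; // smallest bucket in the list above
  while (size < docBytes) size *= 2;
  return size;
}

console.log(recordSize(100));     // 128
console.log(recordSize(3000000)); // 4194304 (2 x 2MB)
```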
Compression in MongoDB
• Compression can significantly reduce the amount of disk space / memory required.
• The tradeoff is that compression requires more CPU.
• MMAPv1 does not support compression.
• WiredTiger does.
WiredTiger Storage Engine
• The WiredTiger storage engine excels at all workloads, especially write-heavy and update-heavy workloads.
• Notable features of the WiredTiger storage engine that do not exist in the MMAPv1 storage engine include:
– Compression
– Document-level concurrency
• It has been the default storage engine since MongoDB 3.2.
• For older versions, specify the use of the WiredTiger storage engine as follows.
mongod --storageEngine wiredTiger
WiredTiger Compression Options
• snappy (default): less CPU usage than zlib, less reduction in data size
• zlib: greater CPU usage than snappy, greater reduction in data size
• no compression
Configuring Compression in WiredTiger
Use the wiredTigerCollectionBlockCompressor parameter. E.g.,
mongod --storageEngine wiredTiger \
  --wiredTigerCollectionBlockCompressor zlib
Configuring Memory Usage in WiredTiger
Use the --wiredTigerCacheSizeGB parameter to designate the amount of RAM for the cache used by the WiredTiger storage engine.
• By default, this value is set to the larger of:
– 50% of physical RAM minus 1 GB, or 256 MB (for MongoDB 3.4+)
– 60% of physical RAM minus 1 GB, or 1 GB (for MongoDB 3.2)
• Additionally, MongoDB uses memory for connections, aggregations, sorts, ...
• The rest of the memory is used by the file system cache, which is also needed by WiredTiger for optimal performance.
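The 3.4 default can be expressed as a small function. This is a sketch of the documented formula, not mongod's actual code:

```javascript
// Default WiredTiger cache size in MongoDB 3.4+:
// the larger of (50% of physical RAM minus 1GB) and 256MB.
const GB = 1024 ** 3;
const MB = 1024 ** 2;

function defaultWtCacheBytes(physicalRamBytes) {
  return Math.max(0.5 * (physicalRamBytes - GB), 256 * MB);
}

console.log(defaultWtCacheBytes(4 * GB) / GB); // 1.5 (GB) on a 4GB machine
console.log(defaultWtCacheBytes(1 * GB) / MB); // 256 (MB) floor on small machines
```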
Journaling in MMAPv1 vs. WiredTiger
• MMAPv1 uses write-ahead journaling to ensure consistency and durability between fsyncs.
• WiredTiger uses a write-ahead log in combination with checkpoints to ensure durability.
• Regardless of storage engine, always use journaling in production.
MMAPv1 Journaling Mechanics
• Journal files in <DATA-DIR>/journal are append only.
• Each journal file is limited to 1 GB.
• Once MongoDB applies all write operations from a journal file to the database data files, it deletes the journal file (or re-uses it).
• There are usually only a few journal files in the <DATA-DIR>/journal directory.
MMAPv1 Journaling Mechanics (Continued)
• Data is flushed from the shared view to data files every 60 seconds (configurable).
• The operating system may force a flush at a higher frequency than 60 seconds if the system is low on free memory.
• Once a journal file contains only flushed writes, it is no longer needed for recovery and can be deleted or re-used.
WiredTiger Journaling Mechanics
• WiredTiger will commit a checkpoint to disk every 60 seconds or when there are 2 gigabytes of data to write.
• Between and during checkpoints the data files are always valid.
• The WiredTiger journal persists all data modifications between checkpoints.
• If MongoDB exits between checkpoints, it uses the journal to replay all data modified since the last checkpoint.
• By default, the WiredTiger journal is compressed using snappy.
Storage Engine API
MongoDB 3.0 introduced a storage engine API:
• Abstracted storage engine functionality in the codebase
• Easier for MongoDB to develop future storage engines
• Easier for third parties to develop their own MongoDB storage engines
Conclusion
• MongoDB 3.0 introduced pluggable storage engines.
• Current options include:
– MMAPv1
– WiredTiger (default since 3.2)
• WiredTiger introduces the following to MongoDB:
– Compression
– Document-level concurrency
3 CRUD
Creating and Deleting Documents (page 24) Inserting documents into collections, deleting documents, and dropping collections
Reading Documents (page 29) The find() command, query documents, dot notation, and cursors
Query Operators (page 36) MongoDB query operators including: comparison, logical, element, and array operators
Lab: Finding Documents (page 40) Exercises for querying documents in MongoDB
Updating Documents (page 40) Using update methods and associated operators to mutate existing documents
Lab: Updating Documents (page 49) Exercises for updating documents in MongoDB
3.1 Creating and Deleting Documents
Learning Objectives
Upon completing this module students should understand:
• How to insert documents into MongoDB collections
• _id fields
• How to delete documents from a collection
• How to remove a collection from a database
• How to remove a database from a MongoDB deployment
Creating New Documents
• Create documents using insertOne() and insertMany().
• For example:
// Specify the collection name
db.<COLLECTION>.insertOne( { "name" :"Mongo" })
// For example
db.people.insertOne( { "name" :"Mongo" })
Example: Inserting a Document
Experiment with the following commands.
use sample
db.movies.insertOne( { "title" :"Jaws" })
db.movies.find()
Implicit _id Assignment
• We did not specify an _id in the document we inserted.
• If you do not assign one, MongoDB will create one automatically.
• The value will be of type ObjectId.
Example: Assigning _ids
Experiment with the following commands.
db.movies.insertOne( { "_id" :"Jaws","year" :1975 })
db.movies.find()
Inserts will fail if...
• There is already a document in the collection with that _id.
• You try to assign an array to the _id.
• The argument is not a well-formed document.
Example: Inserts will fail if...
// fails because _id can’t have an array value
db.movies.insertOne( { "_id" :["Star Wars",
"The Empire Strikes Back",
"Return of the Jedi" ]})
// succeeds
db.movies.insertOne( { "_id" :"Star Wars" })
// fails because of duplicate id
db.movies.insertOne( { "_id" :"Star Wars" })
// malformed document
db.movies.insertOne( { "Star Wars" })
insertMany()
• You may bulk insert using an array of documents.
• Use insertMany() instead of insertOne().
Ordered insertMany()
• For ordered inserts, MongoDB will stop processing inserts upon encountering an error.
• This means that only inserts occurring before an error will complete.
• The default setting for db.<COLLECTION>.insertMany() is an ordered insert.
• See the next exercise for an example.
Example: Ordered insertMany()
Experiment with the following operation.
db.movies.insertMany( [ { "_id" :"Batman","year" :1989 },
{"_id" :"Home Alone","year" :1990 },
{"_id" :"Ghostbusters","year" :1984 },
{"_id" :"Ghostbusters","year" :1984 }])
db.movies.find()
Unordered insertMany()
• Pass { ordered : false } to insertMany() to perform unordered inserts.
• If any given insert fails, MongoDB will still attempt all of the others.
• The inserts may be executed in a different order than you specified.
• The next exercise is very similar to the previous one.
• However, we are using { ordered : false }.
• One insert will fail, but all the rest will succeed.
Example: Unordered insertMany()
Experiment with the following insert.
db.movies.insertMany( [ { "_id" :"Jaws","year" :1975 },
{"_id" :"Titanic","year" :1997 },
{"_id" :"The Lion King","year" :1994 }],
{ ordered :false })
db.movies.find()
The Shell is a JavaScript Interpreter
• Sometimes it is convenient to create test data using a little JavaScript.
• The mongo shell is a fully-functional JavaScript interpreter. You may:
– Define functions
– Use loops
– Assign variables
– Perform inserts
Exercise: Creating Data in the Shell
Experiment with the following commands.
for (i=1;i<=10000;i++){
db.stuff.insert( { "a" :i})
}
db.stuff.find()
Deleting Documents
You may delete documents from a MongoDB deployment in several ways.
• Use deleteOne() and deleteMany() to delete documents matching a specific set of conditions.
• Drop an entire collection.
• Drop a database.
Using deleteOne()
• Delete a document from a collection using deleteOne().
• This command has one required parameter, a query document.
• The first document in the collection matching the query document will be deleted.
Using deleteMany()
• Delete multiple documents from a collection using deleteMany().
• This command has one required parameter, a query document.
• All documents in the collection matching the query document will be deleted.
• Pass an empty document to delete all documents.
Example: Deleting Documents
Experiment with removing documents. Do a find() after each deleteMany() command below.
for (i=1;i<=20;i++) { db.testcol.insertOne( { _id :i, a :i})}
db.testcol.deleteMany( { a :1}) // Delete the first document
// $lt is a query operator that enables us to select documents that
// are less than some value. More on operators soon.
db.testcol.deleteMany( { a :{$lt:5}}) // Remove three more
db.testcol.deleteOne( { a :{$lt:10 }}) // Remove one more
db.testcol.deleteMany() // Error: requires a query document.
db.testcol.deleteMany( { } ) // All documents removed
Dropping a Collection
• You can drop an entire collection with db.<COLLECTION>.drop().
• The collection and all its documents will be deleted.
• It will also remove any metadata associated with that collection.
• Indexes are one type of metadata removed.
• All collection and index files are removed and the allocated space is reclaimed.
– WiredTiger only!
• More on metadata later.
Example: Dropping a Collection
db.colToBeDropped.insertOne( { a :1})
show collections // Shows the colToBeDropped collection
db.colToBeDropped.drop()
show collections // collection is gone
Dropping a Database
• You can drop an entire database with db.dropDatabase().
• This drops the database on which the method is called.
• It also deletes the associated data files from disk, freeing disk space.
• Beware that in the mongo shell, this does not change the database context.
Example: Dropping a Database
use tempDB
db.testcol1.insertOne( { a :1})
db.testcol2.insertOne( { a :1})
show dbs // Here they are
show collections // Shows the two collections
db.dropDatabase()
show collections // No collections
show dbs // The db is gone
use sample // take us back to the sample db
3.2 Reading Documents
Learning Objectives
Upon completing this module students should understand:
• The query-by-example paradigm of MongoDB
• How to query on array elements
• How to query embedded documents using dot notation
• How the mongo shell and drivers use cursors
• Projections
• Cursor methods: .count(), .sort(), .skip(), .limit()
The find() Method
• This is the fundamental method by which we read data from MongoDB.
• We have already used it in its basic form.
• find() returns a cursor that enables us to iterate through all documents matching a query.
• We will discuss cursors later.
Query by Example
• To query MongoDB, specify a document containing the key/value pairs you want to match.
• You need only specify values for fields you care about.
• Other fields will not be used to exclude documents.
• The result set will include all documents in a collection that match.
Example: Querying by Example
Experiment with the following sequence of commands.
db.movies.drop()
db.movies.insertMany( [
{"title" :"Jaws","year" :1975,"imdb_rating" :8.1 },
{"title" :"Batman","year" :1989,"imdb_rating" :7.6 }
])
db.movies.find()
db.movies.find( { "year" :1975 })
// Multiple Batman movies from different years, find the correct one
db.movies.find( { "year" :1989,"title" :"Batman" })
Querying Arrays
• In MongoDB you may query array fields.
• Specify a single value you expect to find in that array in desired documents.
• Alternatively, you may specify an entire array in the query document.
• As we will see later, there are also several operators that enhance our ability to query array fields.
Example: Querying Arrays
db.movies.drop()
db.movies.insertMany(
[{ "title" :"Batman","category" :["action","adventure" ]},
{"title" :"Godzilla","category" :["action","adventure","sci-fi" ]},
{"title" :"Home Alone","category" :["family","comedy" ]}
])
// Match documents where "category" contains the value specified
db.movies.find( { "category" :"action" })
// Match documents where "category" equals the value specified
db.movies.find( { "category" :["action","sci-fi" ]}) // no documents
// only the second document
db.movies.find( { "category" :["action","adventure","sci-fi" ]})
Querying with Dot Notation
• Dot notation is used to query on fields in embedded documents.
• The syntax is:
"field1.field2" :value
• Put quotes around the field name when using dot notation.
Example: Querying with Dot Notation
db.movies.insertMany(
[{
"title" :"Avatar",
"box_office" :{"gross" :760,
"budget" :237,
"opening_weekend" :77
}
},
{
"title" :"E.T.",
"box_office" :{"gross" :349,
"budget" :10.5,
"opening_weekend" :14
}
}
])
db.movies.find( { "box_office" :{"gross" :760 }})// no documents: exact subdocument match
// dot notation
db.movies.find( { "box_office.gross" :760 }) // expected value
Example: Arrays and Dot Notation
db.movies.insertMany( [
{"title" :"E.T.",
"filming_locations" :
[{"city" :"Culver City","state" :"CA","country" :"USA" },
{"city" :"Los Angeles","state" :"CA","country" :"USA" },
{"city" :"Crescent City","state" :"CA","country" :"USA" }
]},
{"title":"Star Wars",
"filming_locations" :
[{"city" :"Ajim","state" :"Jerba","country" :"Tunisia" },
{"city" :"Yuma","state" :"AZ","country" :"USA" }
]}])
db.movies.find( { "filming_locations.country" :"USA" })// two documents
Projections
• You may choose to have only certain fields appear in result documents.
• This is called projection.
• You specify a projection by passing a second parameter to find().
Projection: Example (Setup)
db.movies.insertOne(
{
"title" :"Forrest Gump",
"category" :["drama","romance" ],
"imdb_rating" :8.8,
"filming_locations" :[
{"city" :"Savannah","state" :"GA","country" :"USA" },
{"city" :"Monument Valley","state" :"UT","country" :"USA" },
{"city" :"Los Angeles","state" :"CA","country" :"USA" }
],
"box_office" :{
"gross" :557,
"opening_weekend" :24,
"budget" :55
}
})
Projection: Example
db.movies.findOne( { "title" :"Forrest Gump" },
{"title" :1,"imdb_rating" :1})
{
"_id" :ObjectId("5515942d31117f52a5122353"),
"title" :"Forrest Gump",
"imdb_rating" :8.8
}
Projection Documents
• Include fields with fieldName: 1.
  – Any field not named will be excluded
  – except _id, which must be explicitly excluded.
• Exclude fields with fieldName: 0.
  – Any field not named will be included.
Example: Projections
for (i=1;i<=20;i++){
db.movies.insertOne(
{"_id" :i, "title" :i,
"imdb_rating" :i, "box_office" :i})
}
db.movies.find()
// no "box_office"
db.movies.find( { "_id" :3}, { "title" :1,"imdb_rating" :1})
// no "imdb_rating"
db.movies.find( { "_id" :{ $gte :10 }},{"imdb_rating" :0})
// just "title"
db.movies.find( { "_id" :4}, { "_id" :0,"title" :1})
// just "imdb_rating", "box_office"
db.movies.find( { "_id" :5}, { _id :0,"title" :0})
// Can’t mix inclusion/exclusion except _id
db.movies.find( { "_id" :6}, { "title" :1,"imdb_rating" :0})
Cursors
• When you use find(), MongoDB returns a cursor.
• A cursor is a pointer to the result set.
• You can iterate through documents in the result using next().
• By default, the mongo shell will iterate through 20 documents at a time.
Example: Introducing Cursors
db.testcol.drop()
for (i=1;i<=10000;i++){
db.testcol.insertOne( { a :Math.floor( Math.random() *100 +1),
b:Math.floor( Math.random() *100 +1)})
}
db.testcol.find()
it
it
Example: Cursor Objects in the Mongo Shell
// Assigns the cursor returned by find() to a variable x
var x=db.testcol.find()
// Displays the first document in the result set.
x.next()
// True because there are more documents in the result set.
x.hasNext()
// Assigns the next document in the result set to the variable y.
y=x.next()
// Return value is the value of the a field of this document.
y.a
// Displaying a cursor prints the next 20 documents in the result set.
x
Cursor Methods
• count(): Returns the number of documents in the result set.
• sort(): Sorts the result set by the specified fields.
• limit(): Limits the result set to the number of documents specified.
• skip(): Skips the number of documents specified.
Example: Using count()
db.testcol.drop()
for (i=1;i<=100;i++) { db.testcol.insertOne( { a :i})}
// all 100
db.testcol.count()
// just 41 docs
db.testcol.count( { a :{$lt:42 }})
// Another way of writing the same query
db.testcol.find( { a :{$lt:42 } } ).count( )
Example: Using sort()
db.testcol.drop()
for (i=1;i<=20;i++){
db.testcol.insertOne( { a :Math.floor( Math.random() *10 +1),
b:Math.floor( Math.random() *10 +1)})
}
db.testcol.find()
// sort descending; use 1 for ascending
db.testcol.find().sort( { a :-1})
// sort by b, then a
db.testcol.find().sort( { b :1,a:1})
// $natural order is just the order on disk.
db.testcol.find().sort( { $natural :1})
The skip() Method
• Skips the specified number of documents in the result set.
• The returned cursor will begin at the first document beyond the number specified.
• Regardless of the order in which you specify skip() and sort() on a cursor, sort() happens first.
The limit() Method
• Limits the number of documents in a result set to the first k.
• Specify k as the argument to limit().
• Regardless of the order in which you specify limit(), skip(), and sort() on a cursor, sort() happens first.
• Helps reduce resources consumed by queries.
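The ordering rule above can be sketched in plain JavaScript (an illustration of the semantics only, not MongoDB server code; applyCursorModifiers is a made-up helper):

```javascript
// Plain-JS model of a cursor: sort() is applied first, then skip(),
// then limit(), regardless of the order the modifiers were chained in.
function applyCursorModifiers(docs, { sort, skip = 0, limit = Infinity }) {
  let result = [...docs];
  if (sort) {
    const [field, dir] = Object.entries(sort)[0];
    result.sort((a, b) => (a[field] - b[field]) * dir);
  }
  return result.slice(skip, skip + limit);
}

const docs = [ { a: 3 }, { a: 1 }, { a: 2 }, { a: 5 }, { a: 4 } ];
// Models: db.testcol.find().sort( { a: 1 } ).skip(1).limit(2)
const page = applyCursorModifiers(docs, { sort: { a: 1 }, skip: 1, limit: 2 });
// page holds { a: 2 } and { a: 3 }: the sorted result minus the first document
```

Chaining limit(2).skip(1).sort( { a: 1 } ) in the shell returns the same two documents, because the server always sorts before skipping or limiting.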
The distinct() Method
• Returns all distinct values for a field found in a collection.
• Only works on one field at a time.
• Input is a string (not a document).
Example: Using distinct()
db.movie_reviews.drop()
db.movie_reviews.insertMany( [
{"title" :"Jaws","rating" :5},
{"title" :"Home Alone","rating" :1},
{"title" :"Jaws","rating" :7},
{"title" :"Jaws","rating" :4},
{"title" :"Jaws","rating" :8}])
db.movie_reviews.distinct( "title" )
3.3 Query Operators
Learning Objectives
Upon completing this module students should understand the following types of MongoDB query operators:
• Comparison operators
• Logical operators
• Element query operators
• Operators on arrays
Comparison Query Operators
• $lt: Exists and is less than
• $lte: Exists and is less than or equal to
• $gt: Exists and is greater than
• $gte: Exists and is greater than or equal to
• $ne: Does not exist, or exists but is not equal to
• $in: Exists and is in a set
• $nin: Does not exist or is not in a set
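One nuance worth demonstrating: $ne and $nin also match documents where the field is missing entirely, while $lt, $gt, and $in require the field to exist. A plain-JavaScript sketch of that matching rule (illustrative only, not driver code):

```javascript
// Three documents, one of which has no "a" field at all.
const docs = [ { a: 5 }, { a: 7 }, { } ];

// Models { a: { $ne: 5 } }: a missing field counts as "not equal".
const matchNe = docs.filter(d => !("a" in d) || d.a !== 5);
// matchNe contains { a: 7 } and the empty document

// Models { a: { $lt: 8 } }: a missing field never satisfies a range test.
const matchLt = docs.filter(d => ("a" in d) && d.a < 8);
// matchLt contains only { a: 5 } and { a: 7 }
```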
Example (Setup)
// insert sample data
db.movies.insertMany( [
{
"title" :"Batman",
"category" :["action","adventure" ],
"imdb_rating" :7.6,
"budget" :35
},
{
"title" :"Godzilla",
"category" :["action",
"adventure","sci-fi" ],
"imdb_rating" :6.6
},
{
"title" :"Home Alone",
"category" :["family","comedy" ],
"imdb_rating" :7.4
}
])
Example: Comparison Operators
db.movies.find()
db.movies.find( { "imdb_rating" :{ $gte :7}})
db.movies.find( { "category" :{$ne:"family" }})
db.movies.find( { "title" :{$in:["Batman","Godzilla" ]}})
db.movies.find( { "title" :{ $nin :["Batman","Godzilla" ]}})
Logical Query Operators
• $or: Match either of two or more values
• $not: Used with other operators
• $nor: Match neither of two or more values
• $and: Match both of two or more values
  – This is the default behavior for queries specifying more than one condition.
  – Use $and if you need to include the same operator more than once in a query.
Example: Logical Operators
db.movies.find( { $or :[
{"category" :"sci-fi" }, { "imdb_rating" :{ $gte :7}}
]})
// more complex $or: really good sci-fi movie or mediocre family movie
db.movies.find( { $or :[
{"category" :"sci-fi","imdb_rating" :{ $gte :8}},
{"category" :"family","imdb_rating" :{ $gte :7}}
]})
// find bad movies
db.movies.find( { "imdb_rating" :{ $not :{$gt:7}}})
Example: Logical Operators
// find movies within an imdb_rating range
db.movies.find( { "imdb_rating" :{$gt:5, $lte :7}}) // and is implicit
// queries can be nested, why are there no results?
db.movies.find( { $and :[
{$or:[
{"category" :"sci-fi","imdb_rating" :{ $gte :8}},
{"category" :"family","imdb_rating" :{ $gte :7}}
]},
{$or:[
{"category" :"action","imdb_rating" :{ $gte :6}}
]}
]})
Element Query Operators
• $exists: Select documents based on the existence of a particular field.
• $type: Select documents based on the type of a field.
• See BSON types [6] for reference on types.
Example: Element Operators
db.movies.find( { "budget" :{ $exists :true }})
// type 1 is Double
db.movies.find( { "budget" :{ $type :1}})
// type 3 is Object (embedded document)
db.movies.find( { "budget" :{ $type :3}})
Array Query Operators
• $all: Array field must contain all values listed.
• $size: Array must have a particular size. E.g., $size : 2 means 2 elements in the array.
• $elemMatch: All conditions must be matched by at least one element in the array.
[6] http://docs.mongodb.org/manual/reference/bson-types
Example: Array Operators
db.movies.find( { "category" :{ $all :["sci-fi","action" ]}})
db.movies.find( { "category" :{ $size :3}})
Example: $elemMatch
db.movies.insertOne( {
"title" :"Raiders of the Lost Ark",
"filming_locations" :[
{"city" :"Los Angeles","state" :"CA","country" :"USA" },
{"city" :"Rome","state" :"Lazio","country" :"Italy" },
{"city" :"Florence","state" :"SC","country" :"USA" }
]})
// This query is incorrect, it won’t return what we want
db.movies.find( {
"filming_locations.city" :"Florence",
"filming_locations.country" :"Italy"
})
// $elemMatch is needed, now there are no results, this is expected
db.movies.find( {
"filming_locations" :{
$elemMatch :{
"city" :"Florence",
"country" :"Italy"
}}})
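The difference between the two queries above comes down to which element must satisfy each condition. A plain-JavaScript sketch of the rule (illustrative only, not server code):

```javascript
const locations = [
  { city: "Los Angeles", state: "CA",    country: "USA"   },
  { city: "Rome",        state: "Lazio", country: "Italy" },
  { city: "Florence",    state: "SC",    country: "USA"   },
];

// Dot notation: each condition may be satisfied by a *different* element,
// so this document matches even though no location is Florence, Italy.
const dotMatch =
  locations.some(l => l.city === "Florence") &&
  locations.some(l => l.country === "Italy");

// $elemMatch: one single element must satisfy *all* the conditions,
// so this document does not match.
const elemMatch = locations.some(
  l => l.city === "Florence" && l.country === "Italy"
);
// dotMatch is true, elemMatch is false
```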
3.4 Lab: Finding Documents
Exercise: student_id < 65
In the sample database, how many documents in the grades collection have a student_id less than 65?
Exercise: Inspection Result “Fail” & “Pass”
In the sample database, how many documents in the inspections collection have a result of “Pass” or “Fail”?
Exercise: View Count > 1000
In the stories collection, write a query to find all stories where the view count is greater than 1000.
Exercise: Most comments
Find the news article that has the most comments in the stories collection.
Exercise: Television or Videos
Find all digg stories where the topic name is “Television” or the media type is “videos”. Skip the first 5 results and
limit the result set to 10.
Exercise: News or Images
Query for all digg stories whose media type is either “news” or “images” and where the topic name is “Comedy”. (For
extra practice, construct two queries using different sets of operators to do this.)
3.5 Updating Documents
Learning Objectives
Upon completing this module students should understand
• The replaceOne() method
• The updateOne() method
• The updateMany() method
• The required parameters for these methods
• Field update operators
• Array update operators
• The concept of an upsert and use cases
• The findOneAndReplace() and findOneAndUpdate() methods
The replaceOne() Method
• Takes one document and replaces it with another
  – But leaves the _id unchanged
• Takes two parameters:
  – A matching document
  – A replacement document
• This is, in some sense, the simplest form of update
First Parameter to replaceOne()
• Required parameters for replaceOne()
  – The query parameter:
    * Use the same syntax as with find()
    * Only the first document found is replaced
• replaceOne() cannot delete a document
Second Parameter to replaceOne()
• The second parameter is the replacement parameter:
  – The document to replace the original document
• The _id must stay the same
• You must replace the entire document
  – You cannot modify just one field
  – Except for the _id
Example: replaceOne()
db.movies.insertOne( { title:"Batman" })
db.movies.find()
db.movies.replaceOne( { title :"Batman" }, { imdb_rating :7.7 })
db.movies.find()
db.movies.replaceOne( { imdb_rating : 7.7 },
    { title : "Batman", imdb_rating : 7.7 } )
db.movies.find()
db.movies.replaceOne( { }, { title:"Batman" })
db.movies.find() // back in original state
db.movies.replaceOne( { }, { _id :ObjectId() } ) // fails: the _id field is immutable
The updateOne() Method
• Mutate one document in MongoDB using updateOne()
  – Affects only the first document found
• Two parameters:
  – A query document
    * same syntax as with find()
  – Change document
    * Operators specify the fields and changes
$set and $unset
• Use to specify fields to update for updateOne()
• If the field already exists, using $set will change its value
  – If not, $set will create it, set to the new value
• Only specified fields will change
• Alternatively, remove a field using $unset
Example (Setup)
db.movies.insertMany( [
{
"title" :"Batman",
"category" :["action","adventure" ],
"imdb_rating" :7.6,
"budget" :35
},
{
"title" :"Godzilla",
"category" :["action",
"adventure","sci-fi" ],
"imdb_rating" :6.6
},
{
"title" :"Home Alone",
"category" :["family","comedy" ],
"imdb_rating" :7.4
}
])
Example: $set and $unset
db.movies.updateOne( { "title" :"Batman" },
{ $set :{"imdb_rating" :7.7 }})
db.movies.updateOne( { "title" :"Godzilla" },
{ $set :{"budget" :1}})
db.movies.updateOne( { "title" :"Home Alone" },
{ $set :{"budget" :15,
"imdb_rating" :5.5 }})
db.movies.updateOne( { "title" :"Home Alone" },
{ $unset :{"budget" :1}})
db.movies.find()
Update Operators
• $inc: Increment a field’s value by the specified amount.
• $mul: Multiply a field’s value by the specified amount.
• $rename: Rename a field.
• $set: Update one or more fields (already discussed).
• $unset: Delete a field (already discussed).
• $min: Updates the field to a specified value if the specified value is less than the current value of the field.
• $max: Updates the field to a specified value if the specified value is greater than the current value of the field.
• $currentDate: Set the value of a field to the current date or timestamp.
Example: Update Operators
db.movies.updateOne( { title : "Batman" }, { $inc : { "imdb_rating" : 2 } } )
db.movies.updateOne( { title : "Home Alone" }, { $inc : { "budget" : 5 } } )
db.movies.updateOne( { title : "Batman" }, { $mul : { "imdb_rating" : 4 } } )
db.movies.updateOne( { title : "Batman" },
    { $rename : { budget : "estimated_budget" } } )
db.movies.updateOne( { title : "Home Alone" }, { $min : { budget : 5 } } )
db.movies.updateOne( { title : "Home Alone" },
    { $currentDate : { last_updated : { $type : "timestamp" } } } )
// increment movie rating by 1
db.movie_mentions.updateOne( { title : "Batman" },
    { $inc : { "imdb_rating" : 1 } } )
The updateMany() Method
• Takes the same arguments as updateOne()
• Updates all documents that match
  – updateOne() stops after the first match
  – updateMany() continues until it has matched all
Warning: Without an appropriate index, you may scan every document in the collection.
Example: updateMany()
// let’s start tracking the number of sequels for each movie
db.movies.updateOne( { }, { $set :{"sequels" :0}})
db.movies.find()
// we need updateMany to change all documents
db.movies.updateMany( { }, { $set :{"sequels" :0}})
db.movies.find()
Array Element Updates by Index
• You can use dot notation to specify an array index
• You will update only that element
  – Other elements will not be affected
Example: Update Array Elements by Index
// add a sample document to track mentions per hour
db.movie_mentions.insertOne(
    { "title" : "E.T.",
      "day" : ISODate("2015-03-27T00:00:00.000Z"),
      "mentions_per_hour" : [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                              0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ]
    })
// set the mentions count for hour 5 of the day (zero-based index 5)
db.movie_mentions.updateOne(
    { "title" : "E.T." },
    { "$set" : { "mentions_per_hour.5" : 2300 } })
Array Operators
• $push: Appends an element to the end of the array.
• $pushAll: Appends multiple elements to the end of the array (deprecated; prefer $push with the $each modifier).
• $pop: Removes one element from the end (1) or the beginning (-1) of the array.
• $pull: Removes all elements in the array that match a specified value.
• $pullAll: Removes all elements in the array that match any of the specified values.
• $addToSet: Appends an element to the array if not already present.
Example: Array Operators
db.movies.updateOne(
{"title" :"Batman" },
{ $push :{"category" :"superhero" }})
db.movies.updateOne(
{"title" :"Batman" },
{ $pushAll :{"category" :["villain","comic-based" ]}})
db.movies.updateOne(
{"title" :"Batman" },
{ $pop :{"category" :1}})
db.movies.updateOne(
{"title" :"Batman" },
{ $pull :{"category" :"action" }})
db.movies.updateOne(
{"title" :"Batman" },
{ $pullAll :{"category" :["villain","comic-based" ]}})
The Positional $ Operator
• $ [7] is a positional operator that specifies an element in an array to update.
• It acts as a placeholder for the first element that matches the query document.
• $ replaces the element in the specified position with the value given.
• Example:
db.<COLLECTION>.updateOne(
    { <array> : value ... },
    { <update operator> : { "<array>.$" : value } }
)
[7] http://docs.mongodb.org/manual/reference/operator/update/postional
Example: The Positional $ Operator
// the "action" category needs to be changed to "action-adventure"
db.movies.updateMany( { "category" : "action" },
    { $set : { "category.$" : "action-adventure" } } )
Upserts
• If no document matches a write query:
  – By default, nothing happens
  – With upsert: true, inserts one new document
    * $setOnInsert will add fields only in the insert scenario
• Works for updateOne(), updateMany(), replaceOne()
• Syntax:
db.<COLLECTION>.updateOne( <query document>,
    <update document>,
    { upsert : true } )
Upsert Mechanics
• Will update if documents matching the query exist
• Will insert if no documents match
  – Creates a new document using equality conditions in the query document
  – Adds an _id if the query did not specify one
  – Performs the write on the new document
• updateMany() will only create one document
  – If none match, of course
Example: Upserts
db.movies.updateOne( { "title" : "Jaws" },
    { $inc : { "budget" : 5 } },
    { upsert : true } )
db.movies.updateMany( { "title" : "Jaws II" },
    { $inc : { "budget" : 5 } },
    { upsert : true } )
db.movies.replaceOne( { "title" : "E.T.", "category" : [ "scifi" ] },
    { "title" : "E.T.", "category" : [ "scifi" ], "budget" : 1 },
    { upsert : true } )
save()
• The db.<COLLECTION>.save() method is syntactic sugar
  – Similar to replaceOne(), querying the _id field
  – Upsert if _id is not in the collection
• Syntax:
db.<COLLECTION>.save( <document> )
Example: save()
• If the document in the argument does not contain an _id field, then the save() method acts like the insertOne() method
  – An ObjectId will be assigned to the _id field.
• If the document in the argument contains an _id field, then the save() method is equivalent to a replaceOne() with the query argument on _id and the upsert option set to true.
// insert
db.movies.save( { "title" :"Beverly Hills Cops","imdb_rating" :7.3 })
// update with { upsert: true }
db.movies.save( { "_id" :1234,"title" :"Spider Man","imdb_rating" :7.3 })
Be careful with save()
Careful not to modify stale data when using save(). Example:
db.movies.drop()
db.movies.insertOne( { "title" :"Jaws","imdb_rating" :7.3 })
db.movies.find( { "title" :"Jaws" })
// store the complete document in the application
doc =db.movies.findOne( { "title" :"Jaws" })
db.movies.updateOne( { "title" :"Jaws" }, { $inc:{"imdb_rating" :2}})
db.movies.find()
doc.imdb_rating =7.4
db.movies.save(doc) // just lost our incrementing of "imdb_rating"
db.movies.find()
findOneAndUpdate() and findOneAndReplace()
• Update (or replace) one document and return it
  – By default, the document is returned pre-write
• Can return the state before or after the update
• Makes a read plus a write atomic
• Can be used with upsert to insert a document
findOneAndUpdate() and findOneAndReplace() Options
• The following are optional fields for the options document:
• projection: <document> - select the fields to see
• sort: <document> - sort to select the first document
• maxTimeMS: <number> - how long to wait
  – Returns an error, kills operation if exceeded
• upsert: <boolean> - if true, performs an upsert
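How sort interacts with these methods can be sketched in plain JavaScript (findOneAndUpdateSim is a made-up stand-in, not the real driver; the shell method also accepts a returnNewDocument option to return the post-write state):

```javascript
// Simulate findOneAndUpdate(): sort picks *which* single document is
// modified; the pre-write document is returned unless asked otherwise.
function findOneAndUpdateSim(docs, filter, setFields,
                             { sort, returnNewDocument = false } = {}) {
  let candidates = docs.filter(d =>
    Object.entries(filter).every(([k, v]) => d[k] === v));
  if (sort) {
    const [field, dir] = Object.entries(sort)[0];
    candidates = [...candidates].sort((a, b) => (a[field] - b[field]) * dir);
  }
  const target = candidates[0];
  if (!target) return null;            // no match (no upsert modeled here)
  const before = { ...target };
  Object.assign(target, setFields);    // apply the $set-style changes
  return returnNewDocument ? { ...target } : before;
}

const jobs = [
  { _id: 1, state: "unprocessed", priority: 2 },
  { _id: 2, state: "unprocessed", priority: 9 },
];
// Claim the highest-priority unprocessed job and see its new state.
const claimed = findOneAndUpdateSim(
  jobs,
  { state: "unprocessed" },
  { state: "processing" },
  { sort: { priority: -1 }, returnNewDocument: true });
// claimed is { _id: 2, state: "processing", priority: 9 }
```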
Example: findOneAndUpdate()
db.worker_queue.findOneAndUpdate(
    { state : "unprocessed" },
    { $set : { "worker_id" : 123, "state" : "processing" } },
    { upsert : true } )
findOneAndDelete()
• Not an update operation, but fits in with findOneAnd...
• Returns the document and deletes it.
• Example:
db.foo.drop();
db.foo.insertMany( [ { a :1}, { a :2}, { a :3}]);
db.foo.find(); // shows the documents.
db.foo.findOneAndDelete( { a :{ $lte :3}});
db.foo.find();
3.6 Lab: Updating Documents
Exercise: Pass Inspections
In the sample.inspections collection, let’s imagine that we want to do a little data cleaning.
We’ve decided to eliminate the “Completed” inspection result and use only “No Violation Issued” for such inspection
cases.
Please update all inspections accordingly.
Exercise: Set fine value
For all inspections that failed, set a fine value of 100.
Exercise: Increase fine in ROSEDALE
• Update all inspections done in the city of “ROSEDALE”.
• For failed inspections, raise the “fine” value by 150.
Exercise: Give a pass to “MongoDB”
• Today MongoDB got a visit from the inspectors.
• We passed, of course.
• So go ahead and update “MongoDB”, set the result to “AWESOME”, and give a corresponding certificate.
• The inspector may not have uploaded the basic details for “MongoDB”, so ensure the update takes place even if “MongoDB” isn’t in the collection.
• MongoDB’s information is:
business name: MongoDB
id: 10407-2017-ENFO
address:
  city: New York, zip: 10036, street: 43, number: 229
Exercise: Updating Array Elements
Insert a document representing product metrics for a backpack:
db.product_metrics.insertOne(
    { name : "backpack",
      purchasesPast7Days : [ 0, 0, 0, 0, 0, 0, 0 ] } )
Each 0 within the “purchasesPast7Days” field corresponds to a day of the week. (The first element is Monday, the
second element is Tuesday, etc.)
Write an update statement to increment the number of backpacks sold on Friday by 200.
4 Indexes
Index Fundamentals (page 50) An introduction to MongoDB indexes
Lab: Basic Indexes (page 56) A short exercise on the basics of index usage
Compound Indexes (page 57) Indexes on two or more fields
Lab: Optimizing an Index (page 62) Lab on optimizing a compound index
Multikey Indexes (page 63) Indexes on array fields
Hashed Indexes (page 67) Hashed indexes
Geospatial Indexes (page 68) Geospatial indexes: both those on legacy coordinate pairs and those supporting queries
that calculate geometries on an earth-like sphere.
Using Compass with Indexes (page 75) Using Compass to create a geospatial index
TTL Indexes (page 79) Time-To-Live indexes
Text Indexes (page 81) Free text indexes on string fields
Partial Indexes (page 83) Partial indexes in MongoDB
Lab: Finding and Addressing Slow Operations (page 86) Lab on finding and addressing slow queries
Lab: Using explain() (page 87) Lab on using the explain operation to review execution stats
4.1 Index Fundamentals
Learning Objectives
Upon completing this module students should understand:
• The impact of indexing on read performance
• The impact of indexing on write performance
• How to choose effective indexes
• The utility of specific indexes for particular query patterns
Why Indexes?
[Diagram: a B-tree index on the field x. Sample documents ({ x : 8.5 }, { x : 5 }, { x : 17 }, { x : 35 }, { x : 25 }, ...) hang off index entries stored in sorted order (8.5, 16, 26, 28, 33, 39, 55, ...), so a query on x can walk the tree to the matching entries instead of scanning every document.]
Index on x
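The benefit the index structure above provides can be made concrete with a small sketch (plain JavaScript, illustrating only the asymptotics, not MongoDB internals): without an index every document must be checked, while a sorted structure can be binary-searched.

```javascript
// Count comparisons for an unindexed lookup: check every value in turn.
function linearScanCount(values, target) {
  let comparisons = 0;
  for (const v of values) { comparisons++; if (v === target) break; }
  return comparisons;
}

// Count comparisons for an "indexed" lookup: binary search a sorted array.
function binarySearchCount(sortedValues, target) {
  let lo = 0, hi = sortedValues.length - 1, comparisons = 0;
  while (lo <= hi) {
    comparisons++;
    const mid = (lo + hi) >> 1;
    if (sortedValues[mid] === target) break;
    if (sortedValues[mid] < target) lo = mid + 1; else hi = mid - 1;
  }
  return comparisons;
}

const values = Array.from({ length: 100000 }, (_, i) => i + 1);
const scan = linearScanCount(values, 99999);      // ~100000 comparisons
const indexed = binarySearchCount(values, 99999); // ~17 comparisons
```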
Types of Indexes
• Single-field indexes
• Compound indexes
• Multikey indexes
• Geospatial indexes
• Text indexes
Exercise: Using explain()
Let’s explore what MongoDB does for the following query by using explain().
We are projecting only user.name so that the results are easy to read.
db.tweets.find( { "user.followers_count" :1000 },
{"_id" :0,"user.name":1})
db.tweets.find( { "user.followers_count" :1000 } ).explain()
Results of explain()
With the default explain() verbosity, you will see results similar to the following:
{
"queryPlanner" :{
"plannerVersion" :1,
"namespace" :"twitter.tweets",
"indexFilterSet" :false,
"parsedQuery" :{
"user.followers_count" :{
"$eq" :1000
}
},
Results of explain() -Continued
"winningPlan" :{
"stage" :"COLLSCAN",
"filter" :{
"user.followers_count" :{
"$eq" :1000
}
},
"direction" :"forward"
},
"rejectedPlans" :[]
},
...
}
explain() Verbosity Can Be Adjusted
•default: determines the winning query plan but does not execute query
•executionStats: executes query and gathers statistics
•allPlansExecution: runs all candidate plans to completion and gathers statistics
explain("executionStats")
>db.tweets.find( { "user.followers_count" :1000 })
.explain("executionStats")
Now we have query statistics:
..
"executionStats" :{
"executionSuccess" :true,
"nReturned" :8,
"executionTimeMillis" :107,
"totalKeysExamined" :0,
"totalDocsExamined" :51428,
"executionStages" :{
"stage" :"COLLSCAN",
"filter" :{
"user.followers_count" :{
"$eq" :1000
}
},
explain("executionStats") -Continued
"nReturned" :8,
"executionTimeMillisEstimate" :100,
"works" :51430,
"advanced" :8,
"needTime" :51421,
"needFetch" :0,
"saveState" :401,
"restoreState" :401,
"isEOF" :1,
"invalidates" :0,
"direction" :"forward",
"docsExamined" :51428
}
...
}
explain("executionStats") Output
• nReturned: number of documents returned by the query
• totalDocsExamined: number of documents touched during the query
• totalKeysExamined: number of index keys scanned
• A totalKeysExamined or totalDocsExamined value much higher than nReturned indicates we need a better index
• Based on the explain() output, this query would benefit from a better index
Other Operations
In addition to find(), we often want to use explain() to understand how other operations will be handled.
•aggregate()
•count()
•group()
•update()
•remove()
•findAndModify()
•insert()
db.<COLLECTION>.explain()
db.<COLLECTION>.explain() returns an ExplainableCollection.
>var explainable =db.tweets.explain()
>explainable.find( { "user.followers_count" :1000 })
equivalent to
>db.tweets.explain().find( { "user.followers_count" :1000 })
also equivalent to
>db.tweets.find( { "user.followers_count" :1000 } ).explain()
Using explain() for Write Operations
Simulate the number of writes that would have occurred and determine the index(es) used:
>db.tweets.explain("executionStats").remove( { "user.followers_count" :1000 })
>db.tweets.explain("executionStats").update( { "user.followers_count" :1000 },
{ $set :{"large_following" :true } }, { multi:true })
Single-Field Indexes
• Single-field indexes are based on a single field of the documents in a collection.
• The field may be a top-level field.
• You may also create an index on fields in embedded documents.
Creating an Index
The following creates a single-field index on user.followers_count.
db.tweets.createIndex( { "user.followers_count" :1})
db.tweets.find( { "user.followers_count" :1000 } ).explain()
explain() indicated there will be a substantial performance improvement in handling this type of query.
Listing Indexes
List indexes for a collection:
db.tweets.getIndexes()
List index keys:
db.tweets.getIndexKeys()
Indexes and Read/Write Performance
• Indexes improve read performance for queries that are supported by the index.
• Inserts will be slower when there are indexes that MongoDB must also update.
• The speed of updates may be improved because MongoDB will not need to do a collection scan to find target documents.
• An index is modified any time a document:
  – Is inserted (applies to all indexes)
  – Is deleted (applies to all indexes)
  – Is updated in such a way that its indexed field changes
Index Limitations
• You can have up to 64 indexes per collection.
• You should never be anywhere close to that upper bound.
• Write performance can degrade to unusable somewhere between 20 and 30 indexes.
Use Indexes with Care
• Every query should use an index.
• Every index should be used by a query.
• Any write that touches an indexed field will update every index that includes that field.
• Indexes require RAM.
• Be mindful about the choice of key.
Additional Index Options
•Sparse
•Unique
•Background
Sparse Indexes in MongoDB
• Sparse indexes only contain entries for documents that have the indexed field.
db.<COLLECTION>.createIndex(
{ field_name :1},
{ sparse :true })
Defining Unique Indexes
• Enforce a unique constraint on the index
  – On a per-collection basis
• Can’t insert documents with a duplicate value for the field
  – Or update to a duplicate value
• No duplicate values may exist prior to defining the index
db.<COLLECTION>.createIndex(
{ field_name :1},
{ unique :true })
Building Indexes in the Background
• Building indexes in the foreground is a blocking operation.
• Background index creation is non-blocking; however, it takes longer to build.
• The resulting index is initially larger, or less compact, than an index built in the foreground.
db.<COLLECTION>.createIndex(
{ field_name :1},
{ background :true })
4.2 Lab: Basic Indexes
Exercise: Creating a Basic Index
• Begin by importing the routes collection from the USB drive into a running mongod process.
• You should import 66985 documents.
# if no mongod running
mkdir -p data/db
mongod --port 30000 --dbpath data/db --logpath data/mongod.log --logappend --fork
# end if no mongod running
mongoimport --drop -d airlines -c routes routes.json
Executing a Query
• With the documents inserted, perform the following two queries, finding all routes for Delta:
db.routes.find({"airline.id":2009})
db.routes.find({"airline.id":2009}).explain("executionStats")
Creating an Index
• Create an index on the routes collection.
• The index should be on the "airline.id" key, in descending order.
• Rerun the query with explain().
• Verify that the newly created index supports the query.
4.3 Compound Indexes
Learning Objectives
Upon completing this module students should understand:
• What a compound index is.
• How compound indexes are created.
• The importance of considering field order when creating compound indexes.
• How to efficiently handle queries involving some combination of equality matches, ranges, and sorting.
• Some limitations on compound indexes.
Introduction to Compound Indexes
• It is common to create indexes based on more than one field.
• These are called compound indexes.
• You may use up to 31 fields in a compound index.
• You may not use hashed index fields.
The Order of Fields Matters
Specifically we want to consider how the index will be used for:
• Equality tests, e.g.,
db.movies.find( { "budget" : 7, "imdb_rating" : 8 } )
• Range queries, e.g.,
db.movies.find( { "budget" : 10, "imdb_rating" : { $lt : 9 } } )
• Sorting, e.g.,
db.movies.find( { "budget" : 10, "imdb_rating" : 6 }
    ).sort( { "imdb_rating" : -1 } )
Designing Compound Indexes
• Let’s look at some guiding principles for building compound indexes.
• These will generally produce a good if not optimal index.
• You can optimize after a little experimentation.
• We will explore this in the context of a running example.
Example: A Simple Message Board
Requirements:
• Find all messages in a specified timestamp range.
• Select for whether the messages are anonymous or not.
• Sort by rating from highest to lowest.
Load the Data
a=[{"timestamp" :1,"username" :"anonymous","rating" :3},
{"timestamp" :2,"username" :"anonymous","rating" :5},
{"timestamp" :3,"username" :"sam","rating" :1},
{"timestamp" :4,"username" :"anonymous","rating" :2},
{"timestamp" :5,"username" :"martha","rating" :5}]
db.messages.insertMany(a)
Start with a Simple Index
Start by building an index on { timestamp : 1 }
db.messages.createIndex( { timestamp :1}, { name :"myindex" })
Now let’s query for messages with timestamp in the range 2 through 4 inclusive.
db.messages.find( { timestamp :{ $gte :2, $lte :4} } ).explain("executionStats")
Analysis:
• Explain plan shows good performance, i.e. totalKeysExamined = n.
• However, this does not satisfy our query.
• Need to query again with { username : "anonymous" } as part of the query.
Query Adding username
Let’s add the username field to our query.
db.messages.find( { timestamp :{ $gte :2, $lte :4},
username :"anonymous" } ).explain("executionStats")
totalKeysExamined > n.
Include username in Our Index
db.messages.dropIndex( "myindex" );
db.messages.createIndex( { timestamp :1, username :1},
{ name :"myindex" })
db.messages.find( { timestamp :{ $gte :2, $lte :4},
username :"anonymous" } ).explain("executionStats")
totalKeysExamined is still > n. Why?
totalKeysExamined > n
timestamp   username
1           “anonymous”
2           “anonymous”
3           “sam”
4           “anonymous”
5           “martha”
A Different Compound Index
Drop the index and build a new one with username first.
db.messages.dropIndex( "myindex" );
db.messages.createIndex( { username :1, timestamp :1},
{ name :"myindex" })
db.messages.find( { timestamp :{ $gte :2, $lte :4},
username :"anonymous" } ).explain("executionStats")
totalKeysExamined is 2. n is 2.
totalKeysExamined == n
username      timestamp
“anonymous”   1
“anonymous”   2
“anonymous”   4
“martha”      5
“sam”         3
Let Selectivity Drive Field Order
• Order fields in a compound index from most selective to least selective.
• Usually, this means equality fields before range fields.
• When dealing with multiple equality values, start with the most selective.
• If a common range query is more selective instead (rare), specify the range component first.
Adding in the Sort
Finally, let’s add the sort and run the query
db.messages.find( {
timestamp :{ $gte :2, $lte :4},
username :"anonymous"
} ).sort( { rating :-1} ).explain("executionStats");
• Note that the winningPlan includes a SORT stage
• This means that MongoDB had to perform a sort in memory
• In-memory sorts can degrade performance significantly
  – Especially if used frequently
  – In-memory sorts that use > 32 MB will abort
In-Memory Sorts
Let’s modify the index again to allow the database to sort for us.
db.messages.dropIndex( "myindex" );
db.messages.createIndex( { username :1, timestamp :1, rating :1},
{ name :"myindex" });
db.messages.find( {
timestamp :{ $gte :2, $lte :4},
username :"anonymous"
} ).sort( { rating :-1} ).explain("executionStats");
• The explain plan remains unchanged, because the sort field comes after the range field.
• The index does not store entries in order by rating.
• Note that this requires us to consider a tradeoff.
Avoiding an In-Memory Sort
Rebuild the index as follows.
db.messages.dropIndex( "myindex" );
db.messages.createIndex( { username :1, rating :1, timestamp :1},
{ name :"myindex" });
db.messages.find( {
timestamp :{ $gte :2, $lte :4},
username :"anonymous"
} ).sort( { rating :-1} ).explain("executionStats");
• We no longer have an in-memory sort, but need to examine more keys.
• totalKeysExamined is 3 and n is 2.
• This is the best we can do in this situation and this is fine.
• However, if totalKeysExamined is much larger than n, this might not be the best index.
No need for stage : SORT
username      rating   timestamp
“anonymous”   2        4
“anonymous”   3        1
“anonymous”   5        2
“martha”      5        5
“sam”         1        3
General Rules of Thumb
• Equality before range
• Equality before sorting
• Sorting before range
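These rules can be illustrated with a plain-JavaScript model of the messages example (buildIndex and the other helpers are made-up stand-ins for the real index machinery, not MongoDB code):

```javascript
const messages = [
  { timestamp: 1, username: "anonymous", rating: 3 },
  { timestamp: 2, username: "anonymous", rating: 5 },
  { timestamp: 3, username: "sam",       rating: 1 },
  { timestamp: 4, username: "anonymous", rating: 2 },
  { timestamp: 5, username: "martha",    rating: 5 },
];

// Model an index as key tuples kept sorted by the given field order.
function buildIndex(docs, fields) {
  return [...docs].sort((a, b) => {
    for (const f of fields) {
      if (a[f] < b[f]) return -1;
      if (a[f] > b[f]) return 1;
    }
    return 0;
  });
}

// The contiguous run of entries matching the equality condition.
function equalityRun(index, username) {
  return index.filter(e => e.username === username);
}

// Walking the run backward yields rating descending only if the run is
// already ordered by rating, i.e. no in-memory SORT stage is needed.
function servesSortOnRating(run) {
  const r = [...run].reverse().map(e => e.rating);
  return r.every((v, i) => i === 0 || r[i - 1] >= v);
}

const byRating = buildIndex(messages, ["username", "rating", "timestamp"]);
const byTstamp = buildIndex(messages, ["username", "timestamp", "rating"]);
servesSortOnRating(equalityRun(byRating, "anonymous")); // true: sort avoided
servesSortOnRating(equalityRun(byTstamp, "anonymous")); // false: SORT stage
```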
Covered Queries
• When a query and projection include only the indexed fields, MongoDB will return results directly from the index.
• There is no need to scan any documents or bring documents into memory.
• These covered queries can be very efficient.
Exercise: Covered Queries
db.testcol.drop()
for (i=1;i<=20;i++){
db.testcol.insertOne({ "_id" :i, "title" :i, "name" :i,
"rating" :i, "budget" :i})
};
db.testcol.createIndex( { "title" :1,"name" :1,"rating" :1})
// Not covered because _id is present.
db.testcol.find( { "title" :3},
{"title" :1,"name" :1,"rating" :1}
).explain("executionStats")
// Not covered because other fields may exist in matching docs.
db.testcol.find( { "title" :3},
{"_id" :0,"budget" :0} ).explain("executionStats")
// Covered query!
db.testcol.find( { "title" :3},
{"_id" :0,"title" :1,"name" :1,"rating" :1}
).explain("executionStats")
4.4 Lab: Optimizing an Index
Exercise: What Index Do We Need?
Run the following JavaScript file from the handouts.
mongo --shell localhost/performance performance.js
In the shell that launches execute the following method
performance.init()
The method above will build a sample data set in the “sensor_readings” collection. What index is needed for this
query?
db.sensor_readings.find( { tstamp:{ $gte:ISODate("2012-08-01"),
$lte:ISODate("2012-09-01")},
active:true } ).limit(3)
Exercise: Avoiding an In-Memory Sort
What index is needed for the following query to avoid an in-memory sort?
db.sensor_readings.find( { active:true } ).sort( { tstamp :-1})
Exercise: Avoiding an In-Memory Sort, 2
What index is needed for the following query to avoid an in-memory sort?
db.sensor_readings.find(
{x:{$in:[100,200,300,400]}}
).sort( { tstamp :-1})
4.5 Multikey Indexes
Learning Objectives
Upon completing this module, students should understand:
• What a multikey index is
• When MongoDB will use a multikey index to satisfy a query
• How multikey indexes work
• How multikey indexes handle sorting
• Some limitations on multikey indexes
Introduction to Multikey Indexes
• A multikey index is an index on an array.
• An index entry is created on each value found in the array.
• Multikey indexes can support primitives, documents, or sub-arrays.
• There is nothing special that you need to do to create a multikey index.
• You create them using createIndex() just as you would with an ordinary single-field index.
• If there is an array as a value for an indexed field, the index will be multikey on that field.
Example: Array of Numbers
db.race_results.drop()
db.race_results.createIndex( { "lap_times" :1})
a=[{"lap_times" :[3,5,2,8]},
{"lap_times" :[1,6,4,2]},
{"lap_times" :[6,3,3,8]}]
db.race_results.insertMany( a )
// Used the index
db.race_results.find( { lap_times :1} ).explain()
// One document found.
// Index not used: multikey index entries do not record array position.
db.race_results.find( { "lap_times.2" :3} ).explain()
Exercise: Array of Documents, Part 1
Create a collection and add an index on the comments.rating field:
db.blog.drop()
b=[{"comments" :[
{"name" :"Bob","rating" :1},
{"name" :"Frank","rating" :5.3 },
{"name" :"Susan","rating" :3}]},
{"comments" :[
{ name :"Megan","rating" :1}]},
{"comments" :[
{"name" :"Luke","rating" :1.4 },
{"name" :"Matt","rating" :5},
{"name" :"Sue","rating" :7}]}]
db.blog.insertMany(b)
db.blog.createIndex( { "comments" :1})
// vs
db.blog.createIndex( { "comments.rating" :1})
// for this query
db.blog.find( { "comments.rating" :5})
Exercise: Array of Documents, Part 2
For each of the three queries below:
• How many documents will be returned?
• Will it use our multi-key index? Why or why not?
• If a query will not use the index, which index will it use?
db.blog.find( { "comments" :{"name" :"Bob","rating" :1}})
db.blog.find( { "comments" :{"rating" :1}})
db.blog.find( { "comments.rating" :1})
Exercise: Array of Arrays, Part 1
Add some documents and create an index simulating a player in a game moving on an X,Y grid.
db.player.drop()
db.player.createIndex( { "last_moves" :1})
c=[{"last_moves" :[[1,2], [ 2,3], [ 3,4]]},
{"last_moves" :[[3,4], [ 4,5]]},
{"last_moves" :[[4,5], [ 5,6]]},
{"last_moves" :[[3,4]]},
{"last_moves" :[[4,5]]}]
db.player.insertMany(c)
db.player.find()
Exercise: Array of Arrays, Part 2
For each of the queries below:
• How many documents will be returned?
• Does the query use the multi-key index? Why or why not?
• If the query does not use the index, what is an index it could use?
db.player.find( { "last_moves" :[3,4]})
db.player.find( { "last_moves" :3})
db.player.find( { "last_moves.1" :[4,5]})
db.player.find( { "last_moves.2" :[2,3]})
How Multikey Indexes Work
• Each array element is given one entry in the index.
• So an array with 17 elements will have 17 entries – one for each element.
• Multikey indexes can take up much more space than standard indexes.
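The fan-out of one entry per element can be sketched in plain JavaScript. This is not MongoDB's internal representation, just an illustration of how a single document produces multiple index entries.

```javascript
// Illustrative sketch (not MongoDB internals): generate one index entry
// per array element for a multikey index on the given field.
function multikeyEntries(doc, field) {
  const value = doc[field];
  const values = Array.isArray(value) ? value : [value];
  // Each entry pairs an indexed key value with the document's _id.
  return values.map(key => ({ key: key, id: doc._id }));
}

const entries = multikeyEntries({ _id: 1, lap_times: [3, 5, 2, 8] }, "lap_times");
// 4 array elements -> 4 index entries, all pointing back at _id 1.
```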
Multikey Indexes and Sorting
• If you sort using a multikey index:
  – A document will appear at the first position where a value would place the document.
  – It will not appear multiple times.
• This applies to array values generally.
• It is not a specific property of multikey indexes.
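In other words, a document is placed by its lowest array value for an ascending sort and by its highest value for a descending sort. A small JavaScript sketch of that placement rule (an illustration, not MongoDB's sort implementation):

```javascript
// Illustrative sketch: when sorting on an array field, a document is placed
// by its minimum value for an ascending sort (direction 1) and by its
// maximum value for a descending sort (direction -1) -- and appears once.
function sortByArrayField(docs, field, direction) {
  const best = doc =>
    direction === 1 ? Math.min(...doc[field]) : Math.max(...doc[field]);
  return [...docs].sort((a, b) => direction * (best(a) - best(b)));
}

const docs = [{ x: [1, 11] }, { x: [2, 10] }, { x: [3] }, { x: [4] }, { x: [5] }];
const asc = sortByArrayField(docs, "x", 1);   // [1, 11] first: it holds the lowest value
const desc = sortByArrayField(docs, "x", -1); // [1, 11] still first: it holds the highest value
```

This matches the exercise below, where `{ x : [ 1, 11 ] }` sorts first in both directions.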
Exercise: Multikey Indexes and Sorting
db.testcol.drop()
a=[{x:[1,11 ]},{x:[2,10 ]},{x:[3]},
{x:[4]},{x:[5]}]
db.testcol.insert(a)
db.testcol.createIndex( { x :1})
// x : [ 1, 11 ] array comes first. It contains the lowest value.
db.testcol.find().sort( { x :1})
// x : [ 1, 11 ] array still comes first. Contains the highest value.
db.testcol.find().sort( { x :-1})
Limitations on Multikey Indexes
• You cannot create a compound index using more than one array-valued field.
• This is because of the combinatorics.
• For a compound index on two array-valued fields you would end up with N * M entries for one document.
• You cannot have a hashed multikey index.
• You cannot have a shard key use a multikey index.
• We discuss shard keys in another module.
• The index on the _id field cannot become a multikey index.
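The N * M blow-up is easy to see by counting the entries a compound index would need. A toy JavaScript sketch (the function is invented for illustration):

```javascript
// Illustrative sketch: a compound index entry holds one value per indexed
// field, so a document with arrays in two indexed fields would need the
// full cartesian product of values -- N * M entries for a single document.
function compoundEntryCount(doc, fields) {
  return fields.reduce((count, f) => {
    const v = doc[f];
    return count * (Array.isArray(v) ? v.length : 1);
  }, 1);
}

// The rejected document from the example below: x and y are both arrays.
const n = compoundEntryCount({ _id: 4, x: [1, 2], y: [1, 2] }, ["x", "y"]);
// 2 * 2 = 4 entries would be needed for one document, which MongoDB disallows.
```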
Example: Multikey Indexes on Multiple Fields
db.testcol.drop()
db.testcol.createIndex( { x :1,y:1})
// no problems yet
db.testcol.insertOne( { _id :1,x:1,y:1})
// still OK
db.testcol.insertOne( { _id :2,x:[1,2], y :1})
// still OK
db.testcol.insertOne( { _id :3,x:1,y:[1,2]})
// Won’t work
db.testcol.insertOne( { _id :4,x:[1,2], y :[1,2]})
4.6 Hashed Indexes
Learning Objectives
Upon completing this module, students should understand:
• What a hashed index is
• When to use a hashed index
What is a Hashed Index?
• Hashed indexes are based on field values like any other index.
• The difference is that the values are hashed and it is the hashed value that is indexed.
• The hashing function collapses sub-documents and computes the hash for the entire value.
• MongoDB can use the hashed index to support equality queries.
• Hashed indexes do not support multikey indexes, i.e. indexes on array fields.
• Hashed indexes do not support range queries.
Why Hashed Indexes?
• In MongoDB, the primary use for hashed indexes is to support sharding a collection using a hashed shard key.
• In some cases, the field we would like to use to shard data would make it difficult to scale using sharding.
• Using a hashed shard key to shard a collection ensures an even distribution of data and overcomes this problem.
• See Shard a Collection Using a Hashed Shard Key8 for more details.
• We discuss sharding in detail in another module.
Limitations
• You may not create compound indexes that have hashed index fields.
• You may not specify a unique constraint on a hashed index.
• You can create both a hashed index and a non-hashed index on the same field.
8http://docs.mongodb.org/manual/tutorial/shard-collection-with-a-hashed-shard-key/
Floating Point Numbers
• MongoDB hashed indexes truncate floating point numbers to 64-bit integers before hashing.
• Do not use a hashed index for floating point numbers that cannot be reliably converted to 64-bit integers.
• MongoDB hashed indexes do not support floating point values larger than 2^53.
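The consequence of truncation is that distinct floats can collide in the index. A minimal sketch (this is not MongoDB's actual hash function, only the truncation step it performs before hashing):

```javascript
// Illustrative sketch: the indexed value is derived from the *truncated*
// number, so floats that truncate to the same integer become
// indistinguishable to a hashed index.
function truncateForHashing(value) {
  // Drop the fractional part, as the hashed index does before hashing.
  return Math.trunc(value);
}

const a = truncateForHashing(2.1);
const b = truncateForHashing(2.9);
// Both become 2: an equality query on the hashed index cannot tell them apart.
```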
Creating a Hashed Index
Create a hashed index using an operation that resembles the following. This operation creates a hashed index for the active collection on the a field.
db.active.createIndex( { a:"hashed" })
4.7 Geospatial Indexes
Learning Objectives
Upon completing this module, students should understand:
• Use cases of geospatial indexes
• The two types of geospatial indexes
• How to create 2d geospatial indexes
• How to query for documents in a region
• How to create 2dsphere indexes
• Types of GeoJSON objects
• How to query using 2dsphere indexes
Introduction to Geospatial Indexes
We can use geospatial indexes to quickly determine geometric relationships:
• All points within a certain radius of another point
• Whether or not points fall within a polygon
• Whether or not two polygons intersect
Easiest to Start with 2 Dimensions
• Initially, it is easiest to think about geospatial indexes in two dimensions.
• One type of geospatial index in MongoDB is a flat 2d index.
• With a geospatial index we can, for example, search for nearby items.
• This is the type of service that many phone apps provide when, say, searching for a nearby cafe.
• We might have a query location identified by an X in a 2d coordinate system.
Location Field
• A geospatial index is based on a location field within documents in a collection.
• The structure of location values depends on the type of geospatial index.
• We will go into more detail on this in a few minutes.
• We can identify other documents in this collection with Xs in our 2d coordinate system.
Find Nearby Documents
• A geospatial index enables us to efficiently query a collection based on geometric relationships between documents and the query.
• For example, we can quickly locate all documents within a certain radius of our query location.
• In this example, we’ve illustrated a $near query in a 2d geospatial index.
Flat vs. Spherical Indexes
There are two types of geospatial indexes:
• Flat, made with a 2d index
• Two-dimensional spherical, made with the 2dsphere index
  – Takes into account the curvature of the earth
  – Joins any two points using a geodesic or “great circle arc”
  – Deviates from flat geometry as you get further from the equator, and as your points get further apart
Flat Geospatial Index
• This is a Cartesian treatment of coordinate pairs.
• E.g., the index would not reflect the fact that the shortest path from Canada to Siberia is over the North Pole (if units are degrees).
• 2d indexes can be used to describe any flat surface.
• Recommended if:
  – You have legacy coordinate pairs (MongoDB 2.2 or earlier).
  – You do not plan to use GeoJSON objects such as LineStrings or Polygons.
  – You are not going to use points far enough North or South to worry about the Earth’s curvature.
Spherical Geospatial Index
• Spherical indexes model the curvature of the Earth.
• If you want to plot the shortest path from the Klondike to Siberia, this will know to go over the North Pole.
• Spherical indexes use GeoJSON objects (Points, LineStrings, and Polygons).
• Coordinate pairs are converted into GeoJSON Points.
Creating a 2d Index
Creating a 2d index:
db.<COLLECTION>.createIndex(
{ field_name :"2d",<optional additional field>:<value>},
{<optional options document>})
Possible options key-value pairs:
• min : <lower bound>
• max : <upper bound>
• bits : <bits of precision for geohash>
Exercise: Creating a 2d Index
Create a 2d index on the collection testcol with:
• A min value of -20
• A max value of 20
• 10 bits of precision
• The field indexed should be xy.
Inserting Documents with a 2d Index
There are two accepted formats:
• Legacy coordinate pairs
• Document with the following fields specified:
  – lng (longitude)
  – lat (latitude)
Exercise: Inserting Documents with 2d Fields
• Insert 2 documents into the ‘twoD’ collection.
• Assign 2d coordinate values to the xy field of each document.
• Longitude values should be -3 and 3 respectively.
• Latitude values should be 0 and 0.4 respectively.
Querying Documents Using a 2d Index
• Use $near to retrieve documents close to a given point.
• Use $geoWithin to find documents with a shape contained entirely within the query shape.
• Use the following operators to specify a query shape:
  – $box
  – $polygon
  – $center (circle)
Example: Find Based on 2d Coords
Write a query to find all documents in the testcol collection that have an xy field value that falls entirely within the circle with center at [ -2.5, -0.5 ] and a radius of 3.
db.testcol.find( { xy : { $geoWithin : { $center : [ [ -2.5, -0.5 ], 3 ] } } } )
Creating a 2dsphere Index
You can index one or more 2dsphere fields in an index.
db.<COLLECTION>.createIndex( { <location field>:"2dsphere" })
The GeoJSON Specification
• The GeoJSON format encodes location data on the earth.
• The spec is at http://geojson.org/geojson-spec.html
• This spec is incorporated in MongoDB 2dsphere indexes.
• It includes Point, LineString, Polygon, and combinations of these.
GeoJSON Considerations
• The coordinates of points are given in degrees (longitude then latitude).
• The LineString that joins two points will always be a geodesic.
• Short lines (around a few hundred kilometers or less) will go about where you would expect them to.
• Polygons are made of a closed set of LineStrings.
Simple Types of 2dsphere Objects
Point: A single point on the globe
{ <field_name> : { type : "Point",
                   coordinates : [ <longitude>, <latitude> ] } }
LineString: A geodesic line defined by an ordered list of Points
{ <field_name> : { type : "LineString",
                   coordinates : [ [ <longitude 1>, <latitude 1> ],
                                   [ <longitude 2>, <latitude 2> ],
                                   ...,
                                   [ <longitude n>, <latitude n> ] ] } }
Polygons
Simple Polygon:
{ <field_name> : { type : "Polygon",
                   coordinates : [ [ [ <Point1 coordinate pair> ],
                                     [ <Point2 coordinate pair> ],
                                     ...
                                     [ <Point1 coordinate pair again> ] ] ]
} }
Polygon with One Hole:
{ <field_name> : { type : "Polygon",
                   coordinates : [ [ <Points that define outer polygon> ],
                                   [ <Points that define inner polygon> ] ]
} }
Other Types of 2dsphere Objects
• MultiPoint: One or more Points in one document
• MultiLineString: One or more LineStrings in one document
• MultiPolygon: One or more Polygons in one document
• GeometryCollection: One or more GeoJSON objects in one document
Exercise: Inserting GeoJSON Objects (1)
Create a coordinate pair for each of the following airports. Create one variable per airport.
• LaGuardia (New York): 40.7772° N, 73.8726° W
• JFK (New York): 40.6397° N, 73.7789° W
• Newark (New York): 40.6925° N, 74.1686° W
• Heathrow (London): 51.4775° N, 0.4614° W
• Gatwick (London): 51.1481° N, 0.1903° W
• Stansted (London): 51.8850° N, 0.2350° E
• Luton (London): 51.9000° N, 0.4333° W
Exercise: Inserting GeoJSON Objects (2)
• Now let’s make arrays of these.
• Put all the New York area airports into an array called nyPorts.
• Put all the London area airports into an array called londonPorts.
• Create a third array for flight numbers: “AA4453”, “VA3333”, “UA2440”.
Exercise: Inserting GeoJSON Objects (3)
• Create documents for every possible New York to London flight.
• Include a flightNumber field for each flight.
Exercise: Creating a 2dsphere Index
• Create two indexes on the collection flights.
• Make the first a compound index on the fields:
  – origin
  – destination
  – flightNumber
• Specify 2dsphere indexes on both origin and destination.
• Specify a simple index on flightNumber.
• Make the second index just a 2dsphere index on destination.
Querying 2dsphere Objects
$geoNear: Finds all points, orders them by distance from a position.
{ <field name> : { $near : { $geometry : {
                       type : "Point",
                       coordinates : [ lng, lat ] },
                   $maxDistance : <meters> } } }
$near: Just like $geoNear, except in very edge cases; check the docs.
$geoWithin: Only returns documents with a location completely contained within the query.
$geoIntersects: Returns documents with their indexed field intersecting any part of the shape in the query.
4.8 Using Compass with Indexes
Learning Objectives
Upon completing this module, students should understand:
• How to view index usage with Compass
• How to create indexes with Compass
Introduction
• Compass provides a user friendly interface for interacting with MongoDB
• If you are unfamiliar with Compass, click below for a high level overview
/modules/compass
Execute a GeoJSON query with Compass
• Import the trips.json dataset into a database called citibike and a collection called trips
• Execute a geospatial query finding all trips that
  – Begin within a 1.2 mile radius (1.93 kilometers) of the middle of Central Park:
    * [ -73.97062540054321, 40.776398033956916 ]
  – End within a 0.25 mile radius (0.40 kilometers) of Madison Square Park:
    * [ -73.9879247077942, 40.742201076382784 ]
Execute Query (cont)
• Importing the data
mongoimport --drop -d citibike -c trips trips.json
• In Compass, executing the query
{
"start station location":{"$geoWithin":{"$centerSphere":[
[-73.97062540054321, 40.776398033956916 ],0.000302786 ]}},
"end station location":{"$geoWithin":{"$centerSphere":[
[-73.9879247077942, 40.742201076382784 ],0.00006308 ]}}
}
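The $centerSphere radius is expressed in radians, not kilometers. The radians in the query above come from dividing each distance by the Earth's approximate radius; a sketch of that conversion, assuming the 6378.1 km equatorial radius commonly used with MongoDB geospatial queries:

```javascript
// Convert a distance in kilometers to the radians that $centerSphere expects.
const EARTH_RADIUS_KM = 6378.1;

function kmToRadians(km) {
  return km / EARTH_RADIUS_KM;
}

const centralPark = kmToRadians(1.93);  // ~0.0003028, the first radius above
const madisonSq = kmToRadians(0.402);   // ~0.0000630, the second radius above
```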
GeoJSON Query Example
GeoJSON Query Explain Plan
GeoJSON Query Explain Detail
Query Explain (cont)
• Our explain visualizer is telling us key details
  – Documents returned, index keys examined, documents examined
  – Query execution time, sorting information, and if an index was available
  – A visualization of the query plan
Creating an Index Using Compass
• Navigate to the Indexes tab
• Create a new index named geospatial_start_end
• Select the “start station location” field and choose 2dsphere
• Add another field
• Select the “end station location” field and choose 2dsphere
• Click “Create”
The Index Tab
Creating an Index Example
Verifying the Index
• Navigate to the Schema tab
• Reset the query bar, and then re-run our geo query
• Navigate to the Explain tab
{
"start station location":{"$geoWithin":{"$centerSphere":[
[-73.97062540054321, 40.776398033956916 ],0.000302786 ]}},
"end station location":{"$geoWithin":{"$centerSphere":[
[-73.9879247077942, 40.742201076382784 ],0.00006308 ]}}
}
Index Performance
4.9 TTL Indexes
Learning Objectives
Upon completing this module students should understand:
• How to create a TTL index
• When a TTL indexed document will get deleted
• Limitations of TTL indexes
TTL Index Basics
• TTL is short for “Time To Live”.
• TTL indexes must be based on a field of type Date (including ISODate) or Timestamp.
• Any document with a Date value older than expireAfterSeconds in the targeted field of the index will get deleted at some point.
Creating a TTL Index
Create with:
db.<COLLECTION>.createIndex( { field_name :1},
{ expireAfterSeconds :some_number } )
Exercise: Creating a TTL Index
Let’s create a TTL index on the sessions collection that will delete documents older than 30 seconds. Write a script that will insert documents at a rate of one per second.
db.sessions.drop()
db.sessions.createIndex( { "last_user_action" :1},
{"expireAfterSeconds" :30 })
i=0
while (true){
i+= 1;
db.sessions.insertOne( { "last_user_action" :ISODate(), "b" :i});
sleep(1000); // Sleep for 1 second
}
Exercise: Check the Collection
Then, leaving that window open, open up a new terminal and connect to the database with the mongo shell. This will
allow us to verify the TTL behavior.
// look at the output and wait. After a ramp-up of up to a minute or so,
// count() will be reset to 30 once/minute.
while (true){
print(db.sessions.count());
sleep(100);
}
4.10 Text Indexes
Learning Objectives
Upon completing this module, students should understand:
• The purpose of a text index
• How to create text indexes
• How to search using text indexes
• How to rank search results by relevance score
What is a Text Index?
• A text index is based on the tokens (words, etc.) used in string fields.
• MongoDB supports text search for a number of languages.
• Text indexes drop language-specific stop words (e.g. in English “the”, “an”, “a”, “and”, etc.).
• Text indexes use simple, language-specific suffix stemming (e.g., “running” to “run”).
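The pipeline shape — tokenize, drop stop words, stem suffixes — can be sketched in a few lines. MongoDB uses real language-specific stemmers; the tiny stop-word list and "stemmer" below are deliberately simplistic stand-ins for illustration only.

```javascript
// Toy sketch of text-index token processing (not MongoDB's implementation).
const STOP_WORDS = new Set(["the", "an", "a", "and", "of", "is"]);

function tokensFor(text) {
  return text
    .toLowerCase()
    .split(/[^a-z]+/)                      // split on anything that is not a letter
    .filter(w => w && !STOP_WORDS.has(w))  // drop empty tokens and stop words
    .map(w => w.replace(/(ing|s)$/, ""));  // naive suffix "stemming"
}

const tokens = tokensFor("The swallows are carrying the coconuts");
// -> [ "swallow", "are", "carry", "coconut" ]
```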
Creating a Text Index
You create a text index a little bit differently than you create a standard index.
db.<COLLECTION>.createIndex( { <field name>:"text" })
Exercise: Creating a Text Index
Create a text index on the “dialog” field of the montyPython collection.
db.montyPython.createIndex( { dialog :"text" })
Creating a Text Index with Weighted Fields
• The default weight is 1 for each indexed field.
• The weight is relative to other weights in a text index.
db.<COLLECTION>.createIndex(
{"title" :"text","keywords":"text","author" :"text" },
{"weights" :{
"title" :10,
"keywords" :5
}})
• A term match in the “title” field has 10 times (i.e. 10:1) the impact of a term match in the “author” field.
Text Indexes are Similar to Multikey Indexes
• Continuing our example, you can treat the dialog field as a multikey index.
• A multikey index with each of the words in dialog as values.
• You can query the field using the $text operator.
Exercise: Inserting Texts
Let’s add some documents to our montyPython collection.
db.montyPython.insertMany( [
{_id:1,
dialog :"What is the air-speed velocity of an unladen swallow?" },
{_id:2,
dialog :"What do you mean? An African or a European swallow?" },
{_id:3,
dialog :"Huh? I... I don’t know that." },
{_id:45,
dialog :"You’re using coconuts!" },
{_id:55,
dialog :"What? A swallow carrying a coconut?" }])
Querying a Text Index
Next, let’s query the collection. The syntax is:
db.<COLLECTION>.find( { $text :{ $search :"query terms go here" }})
Exercise: Querying a Text Index
Using the text index, find all documents in the montyPython collection with the word “swallow” in it.
// Returns 3 documents.
db.montyPython.find( { $text :{ $search :"swallow" }})
Exercise: Querying Using Two Words
• Find all documents in the montyPython collection with either the word ‘coconut’ or ‘swallow’.
• By default MongoDB ORs query terms together.
• E.g., if you query on two words, results include documents using either word.
// Finds 4 documents, 3 of which contain only one of the two words.
db.montyPython.find( { $text :{ $search :"coconut swallow" }})
Search for a Phrase
• To match an exact phrase, include search terms in quotes (escaped).
• The following query selects documents containing the phrase “European swallow”:
db.montyPython.find( { $text:{ $search:"\"European swallow\"" }})
Text Search Score
• The search algorithm assigns a relevance score to each search result.
• The score is generated by a vector ranking algorithm.
• The documents can be sorted by that score.
db.<COLLECTION>.find(
    { $text : { $search : "swallow coconut" } },
    { textScore : { $meta : "textScore" } }
).sort(
    { textScore : { $meta : "textScore" } }
)
4.11 Partial Indexes
Learning Objectives
Upon completing this module, students should be able to:
• Outline how partial indexes work
• Distinguish partial indexes from sparse indexes
• List and describe the use cases for partial indexes
• Create and use partial indexes
What are Partial Indexes?
• Indexes with keys only for the documents in a collection that match a filter expression.
• Relative to standard indexes, benefits include:
  – Lower storage requirements
    * On disk
    * In memory
  – Reduced performance costs for index maintenance as writes occur
Creating Partial Indexes
• Create a partial index by:
  – Calling db.collection.createIndex()
  – Passing the partialFilterExpression option
• You can specify a partialFilterExpression on any MongoDB index type.
• The filter does not need to be on indexed fields, but it can be.
Example: Creating Partial Indexes
• Consider the following schema:
{ "_id" : 7, "integer" : 7, "importance" : "high" }
• Create a partial index on the “integer” field
• Create it only where “importance” is “high”
Example: Creating Partial Indexes (Continued)
db.integers.createIndex(
{ integer :1},
{ partialFilterExpression :{ importance :"high" },
name :"high_importance_integers" })
Filter Conditions
• As the value for partialFilterExpression, specify a document that defines the filter.
• The following types of expressions are supported.
• Use these in combinations that are appropriate for your use case.
• Your filter may stipulate conditions on multiple fields.
  – equality expressions
  – $exists : true expressions
  – $gt, $gte, $lt, $lte expressions
  – $type expressions
  – $and operator at the top level only
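The filter is just a predicate deciding which documents receive index entries. A JavaScript sketch of such a matcher, limited to equality, $exists, $gte, and $lte (an illustration only, not MongoDB's query matcher):

```javascript
// Illustrative sketch: decide whether a document receives an entry in a
// partial index, for a filter limited to equality, $exists, $gte, and $lte.
function matchesPartialFilter(doc, filter) {
  return Object.entries(filter).every(([field, cond]) => {
    const v = doc[field];
    if (cond !== null && typeof cond === "object") {
      return Object.entries(cond).every(([op, arg]) => {
        if (op === "$exists") return (v !== undefined) === arg;
        if (op === "$gte") return v >= arg;
        if (op === "$lte") return v <= arg;
        return false; // operator not supported in this sketch
      });
    }
    return v === cond; // plain equality expression
  });
}

// The running example: only "high" importance documents are indexed.
const filter = { importance: "high" };
const indexed = matchesPartialFilter({ _id: 7, integer: 7, importance: "high" }, filter);
const skipped = matchesPartialFilter({ _id: 8, integer: 8, importance: "low" }, filter);
```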
Partial Indexes vs. Sparse Indexes
• Both sparse indexes and partial indexes include only a subset of documents in a collection.
• Sparse indexes reference only documents for which at least one of the indexed fields exists.
• Partial indexes provide a richer way of specifying which documents to index than sparse indexes do.
db.integers.createIndex(
{ importance :1},
{ partialFilterExpression :{ importance :{ $exists :true }}}
)// similar to a sparse index
Quiz
Which documents in a collection will be referenced by a partial index on that collection?
Identifying Partial Indexes
>db.integers.getIndexes()
[
...,
{
"v" :1,
"key" :{
"integer" :1
},
"name" :"high_importance_integers",
"ns" :"test.integers",
"partialFilterExpression" :{
"importance" :"high"
}
},
...
]
Partial Indexes Considerations
• Not used when:
  – The indexed field is not in the query
  – A query goes outside of the filter range, even if no documents are out of range
• You can .explain() queries to check index usage
Quiz
Consider the following partial index. Note the partialFilterExpression in particular:
{
"v" :1,
"key" :{
"score" :1,
"student_id" :1
},
"name" :"score_1_student_id_1",
"ns" :"test.scores",
"partialFilterExpression" :{
"score" :{
"$gte" :0.65
},
"subject_name" :"history"
}
}
Quiz (Continued)
Which of the following documents are indexed?
{"_id" :1,"student_id" :2,"score" :0.84,"subject_name" :"history" }
{"_id" :2,"student_id" :3,"score" :0.57,"subject_name" :"history" }
{"_id" :3,"student_id" :4,"score" :0.56,"subject_name" :"physics" }
{"_id" :4,"student_id" :4,"score" :0.75,"subject_name" :"physics" }
{"_id" :5,"student_id" :3,"score" :0.89,"subject_name" :"history" }
4.12 Lab: Finding and Addressing Slow Operations
Set Up
• In this exercise let’s bring up a mongo shell with the following instructions
mongo --shell localhost/performance performance.js
In the shell that launches execute the following method
performance.init()
Exercise: Determine Indexes Needed
• In a mongo shell run performance.b(). This will run in an infinite loop printing some output as it runs various statements against the server.
• Now imagine we have detected a performance problem and suspect there is a slow operation running.
• Find the slow operation and terminate it. Every slow operation is assumed to run for 100 ms or more.
• In order to do this, open a second window (or tab) and run a second instance of the mongo shell.
• What indexes can we introduce to make the slow queries more efficient? Disregard the index created in the previous exercises.
4.13 Lab: Using explain()
Exercise: explain(“executionStats”)
Drop all indexes from previous exercises:
mongo performance
>db.sensor_readings.dropIndexes()
Create an index for the “active” field:
db.sensor_readings.createIndex({ "active" :1})
How many index entries and documents are examined for the following query? How many results are returned?
db.sensor_readings.find(
{"active":false,"_id":{ $gte:99, $lte:1000 }}
).explain("executionStats")
5 Replica Sets
Introduction to Replica Sets (page 88) An introduction to replication and replica sets
Elections in Replica Sets (page 91) The process of electing a new primary (automated failover) in replica sets
Replica Set Roles and Configuration (page 96) Configuring replica set members for common use cases
The Oplog: Statement Based Replication (page 97) The process of replicating data from one node of a replica set to another
Lab: Working with the Oplog (page 99) A brief lab that illustrates how the oplog works
Write Concern (page 101) Balancing performance and durability of writes
Read Concern (page 106) Settings to minimize/prevent stale and dirty reads
Read Preference (page 113) Configuring clients to read from specific members of a replica set
Lab: Setting up a Replica Set (page 114) Launching members, configuring, and initiating a replica set
5.1 Introduction to Replica Sets
Learning Objectives
Upon completing this module, students should understand:
• Striking the right balance between cost and redundancy
• The many scenarios replication addresses and why
• How to avoid downtime and data loss using replication
Use Cases for Replication
• High Availability
• Disaster Recovery
• Functional Segregation
High Availability (HA)
• Data still available following:
  – Equipment failure (e.g. server, network switch)
  – Datacenter failure
• This is achieved through automatic failover.
Disaster Recovery (DR)
• We can duplicate data across:
  – Multiple database servers
  – Storage backends
  – Datacenters
• Can restore data from another node following:
  – Hardware failure
  – Service interruption
Functional Segregation
There are opportunities to exploit the topology of a replica set:
• Based on physical location (e.g. rack or datacenter location)
• For analytics, reporting, data discovery, system tasks, etc.
• For backups
Large Replica Sets
Functional segregation can be further exploited by using large replica sets.
• 50 node replica set limit with a maximum of 7 voting members
• Useful for deployments with a large number of datacenters or offices
• Read-only workloads can position secondaries in datacenters around the world (closer to application servers)
Replication is Not Designed for Scaling
• Can be used for scaling reads, but generally not recommended.
• Drawbacks include:
  – Eventual consistency
  – Not scaling writes
  – Potential system overload when secondaries are unavailable
• Consider sharding for scaling reads and writes.
Replica Sets
(Diagram: a client application’s driver sends writes, and by default reads, to the primary; the primary replicates to two secondaries.)
Primary Server
• Clients send writes to the primary only.
• MongoDB, Inc. maintains client drivers in many programming languages like Java, C#, JavaScript, Python, Ruby, and PHP.
• MongoDB drivers are replica set aware.
Secondaries
• A secondary replicates operations from another node in the replica set.
• Secondaries usually replicate from the primary.
• Secondaries may also replicate from other secondaries. This is called replication chaining.
• A secondary may become primary as a result of a failover scenario.
Heartbeats
(Diagram: the primary and two secondaries exchange heartbeats with one another.)
The Oplog
• The operations log, or oplog, is a special capped collection that is the basis for replication.
• The oplog maintains one entry for each document affected by every write operation.
• Secondaries copy operations from the oplog of their sync source.
Initial Sync
• Occurs when a new server is added to a replica set, or we erase the underlying data of an existing server (--dbpath)
• All existing databases except the local database are copied
• As of MongoDB >= 3.4, all indexes are built while data is copied
• As of MongoDB >= 3.4, initial sync is more resilient to intermittent network failure/degradation
5.2 Elections in Replica Sets
Learning Objectives
Upon completing this module students should understand:
• That elections enable automated failover in replica sets
• How votes are distributed to members
• What prompts an election
• How a new primary is selected
Members and Votes
Calling Elections
(Diagram: when heartbeats to the primary fail, the remaining secondaries call an election for a new primary; a new primary is elected and replication resumes from it.)
Selecting a New Primary
• Depends on which replication protocol version is in use
•PV0
–Priority
–Optime
–Connections
•PV1
–Optime
–Connections
Priority
• PV0 factors priority into voting.
• The higher its priority, the more likely a member is to become primary.
• The default is 1.
• Servers with a priority of 0 will never become primary.
• Priority values are floating point numbers 0-1000 inclusive.
Optime
• Optime: Operation time, which is the timestamp of the last operation the member applied from the oplog.
• To be elected primary, a member must have the most recent optime.
• Only optimes of visible members are compared.
Connections
• Must be able to connect to a majority of the members in the replica set.
• Majority refers to the total number of votes.
• Not the total number of members.
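The majority calculation itself is simple. A sketch, using the vote count rather than the member count:

```javascript
// Sketch: the majority is computed from the number of votes in the set,
// not the number of members (non-voting members do not count).
function majority(totalVotes) {
  return Math.floor(totalVotes / 2) + 1;
}

// A 50-member set is capped at 7 voting members, so its majority is 4 votes.
const m7 = majority(7); // 4
const m3 = majority(3); // 2
```

Note that an even number of votes raises the majority without adding fault tolerance: majority(4) is 3, so a 4-vote set tolerates only one lost vote, the same as a 3-vote set.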
When will a primary step down?
• After receiving the replSetStepDown or rs.stepDown() command.
• If a secondary is eligible for election and has a higher priority.
• If it cannot contact a majority of the members of the replica set.
replSetStepDown Behavior
• Primary will attempt to terminate long running operations before stepping down.
• Primary will wait for an electable secondary to catch up before stepping down.
• “secondaryCatchUpPeriodSecs” can be specified to limit the amount of time the primary will wait for a secondary to catch up before the primary steps down.
Exercise: Elections in Failover Scenarios
• We have learned about electing a primary in replica sets.
• Let’s look at some scenarios in which failover might be necessary.
Scenario A: 3 Data Nodes in 1 DC
Which secondary will become the new primary?
Scenario B: 3 Data Nodes in 2 DCs
Which member will become primary following this type of network partition?
Scenario C: 4 Data Nodes in 2 DCs
What happens following this network partition?
Scenario D: 5 Nodes in 2 DCs
The following is similar to Scenario C, but with the addition of an arbiter in Data Center 1. What happens here?
Scenario E: 3 Data Nodes in 3 DCs
• What happens here if any one of the nodes/DCs fail?
• What about recovery time?
Scenario F: 5 Data Nodes in 3 DCs
What happens here if any one of the nodes/DCs fail? What about recovery time?
5.3 Replica Set Roles and Configuration
Learning Objectives
Upon completing this module students should understand:
• The use of priority to preference certain members or datacenters as primaries.
• Hidden members.
• The use of hidden secondaries for data analytics and other purposes (when secondary reads are used).
• The use of slaveDelay to protect against operator error.
Example: A Five-Member Replica Set Configuration
• For this example application, there are two data centers.
• We name the hosts accordingly: dc1-1, dc1-2, dc2-1, etc.
  – This is just a clarifying convention for this example.
  – MongoDB does not care about host names except to establish connections.
• The nodes in this replica set have a variety of roles in this application.
Configuration
conf = { // 5 data-bearing nodes
_id:"mySet",
members:[
{_id:0, host :"dc1-1.example.net:27017", priority :5},
{_id:1, host :"dc1-2.example.net:27017", priority :5},
{_id:2, host :"dc2-1.example.net:27017" },
{_id:3, host :"dc1-3.example.net:27017", hidden :true },
{_id:4, host :"dc2-2.example.net:27017", hidden :true,
slaveDelay:7200 }
]
}
Principal Data Center
{ _id: 0, host: "dc1-1.example.net", priority: 5 },
{ _id: 1, host: "dc1-2.example.net", priority: 5 },
Data Center 2
{ _id: 2, host: "dc2-1.example.net:27017" },
What about dc1-3 and dc2-2?
// Both are hidden.
// Clients will not distribute reads to hidden members.
// We use hidden members for dedicated tasks.
{ _id: 3, host: "dc1-3.example.net:27017", hidden: true },
{ _id: 4, host: "dc2-2.example.net:27017", hidden: true,
  slaveDelay: 7200 }
What about dc2-2?
{ _id: 4, host: "dc2-2.example.net:27017", hidden: true,
  slaveDelay: 7200 }
5.4 The Oplog: Statement Based Replication
Learning Objectives
Upon completing this module students should understand:
• Binary vs. statement-based replication.
• How the oplog is used to support replication.
• How operations in MongoDB are translated into operations written to the oplog.
• Why oplog operations are idempotent.
• That the oplog is a capped collection and the implications this holds for syncing members.
Binary Replication
• MongoDB replication is statement based.
• Contrast that with binary replication.
• With binary replication we would keep track of:
– The data files
– The offsets
– How many bytes were written for each change
• In short, we would keep track of actual bytes and very specific locations.
• We would simply replicate these changes across secondaries.
Tradeoffs
• The good thing is that figuring out where to write, etc. is very efficient.
• But we must have a byte-for-byte match of our data files on the primary and secondaries.
• The problem is that this couples our replica set members in ways that are inflexible.
• Binary replication may also replicate disk corruption.
Statement-Based Replication
• Statement-based replication facilitates greater independence among members of a replica set.
• MongoDB stores a statement for every operation in a capped collection called the oplog.
• Secondaries do not simply apply exactly the operation that was issued on the primary.
Example
Suppose the following command is issued and it deletes 100 documents:
db.foo.deleteMany({ age: 30 })
This will be represented in the oplog with records such as the following:
{ "ts": Timestamp(1407159845, 5), "h": NumberLong("-704612487691926908"),
  "v": 2, "op": "d", "ns": "bar.foo", "b": true, "o": { "_id": 65 } }
{ "ts": Timestamp(1407159845, 1), "h": NumberLong("6014126345225019794"),
  "v": 2, "op": "d", "ns": "bar.foo", "b": true, "o": { "_id": 333 } }
{ "ts": Timestamp(1407159845, 4), "h": NumberLong("8178791764238465439"),
  "v": 2, "op": "d", "ns": "bar.foo", "b": true, "o": { "_id": 447 } }
{ "ts": Timestamp(1407159845, 3), "h": NumberLong("-1707391001705528381"),
  "v": 2, "op": "d", "ns": "bar.foo", "b": true, "o": { "_id": 1033 } }
{ "ts": Timestamp(1407159845, 2), "h": NumberLong("-6814297392442406598"),
  "v": 2, "op": "d", "ns": "bar.foo", "b": true, "o": { "_id": 9971 } }
Replication Based on the Oplog
• One statement per document affected by each write: insert, update, or delete.
• Provides a level of abstraction that enables independence among the members of a replica set:
– With regard to MongoDB version.
– In terms of how data is stored on disk.
– Freedom to do maintenance without the need to bring the entire set down.
Operations in the Oplog are Idempotent
• Each operation in the oplog is idempotent.
• Whether applied once or multiple times it produces the same result.
• Necessary if you want to be able to copy data while simultaneously accepting writes.
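As an illustrative sketch in plain JavaScript (not actual server code): a client-side $inc is recorded in the oplog as a $set to the resulting value, and a $set to a concrete value can safely be replayed any number of times:

```javascript
// Apply a { $set: { field: value } } operation to a plain document.
function applySet(doc, op) {
    for (var field in op.$set) {
        doc[field] = op.$set[field];
    }
    return doc;
}

var doc = { _id: 2, inStock: 20 };
// Logged instead of the original { $inc: { inStock: -1 } }:
var oplogOp = { $set: { inStock: 19 } };

applySet(doc, oplogOp);
applySet(doc, oplogOp);  // applying it a second time changes nothing
// doc.inStock is 19 either way, so replaying the entry is safe
```

Had the oplog recorded the $inc itself, replaying it twice would leave inStock at 18, which is why the rewrite to $set matters.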
The Oplog Window
• Oplogs are capped collections.
• Capped collections are fixed-size.
• They guarantee preservation of insertion order.
• They support high-throughput operations.
• Like circular buffers, once a collection fills its allocated space:
– It makes room for new documents.
– By overwriting the oldest documents in the collection.
Sizing the Oplog
• The oplog should be sized to account for latency among members.
• The default oplog size is usually sufficient.
• But you want to make sure that your oplog is large enough:
– So that the oplog window is large enough to support replication.
– To give you a large enough history for any diagnostics you might wish to run.
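The oplog window is simply the time spanned by the oldest and newest oplog entries. A minimal sketch of the arithmetic in plain JavaScript (the timestamps are hypothetical values of the kind you might read from the first and last oplog documents, e.g. via db.getReplicationInfo() in the shell):

```javascript
// Hypothetical first/last oplog entry times, in seconds since the
// epoch (the time component of a BSON Timestamp).
var firstOpSecs = 1406944987;  // oldest entry still in the oplog
var lastOpSecs  = 1407159845;  // newest entry

// The oplog window: how far behind a secondary may fall and still
// catch up by replaying the oplog rather than needing a full resync.
var windowHours = (lastOpSecs - firstOpSecs) / 3600;  // roughly 60 hours here
```

A secondary that is down longer than this window cannot resync from the oplog alone.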
5.5 Lab: Working with the Oplog
Create a Replica Set
Let’s take a look at a concrete example. Launch mongo shell as follows.
mkdir -p /data/db
mongo --nodb
Create a replica set by running the following command in the mongo shell.
replicaSet = new ReplSetTest({ nodes: 3 })
ReplSetTest
• ReplSetTest is useful for experimenting with replica sets as a means of hands-on learning.
• It should never be used in production. Never.
• The command above will create a replica set with three members.
• It does not start the mongods, however.
• You will need to issue additional commands to do that.
Start the Replica Set
Start the mongod processes for this replica set.
replicaSet.startSet()
Issue the following command to configure replication for these mongods. You will need to issue this while output is
flying by in the shell.
replicaSet.initiate()
Status Check
• You should now have three mongods running on ports 20000, 20001, and 20002.
• You will see log statements from all three printing in the current shell.
• To complete the rest of the exercise, open a new shell.
Connect to the Primary
Open a new shell, connecting to the primary.
mongo --port 20000
Create some Inventory Data
Use the store database:
use store
Add the following inventory:
inventory = [ { _id: 1, inStock: 10 }, { _id: 2, inStock: 20 },
              { _id: 3, inStock: 30 }, { _id: 4, inStock: 40 },
              { _id: 5, inStock: 50 }, { _id: 6, inStock: 60 } ]
db.products.insert(inventory)
Perform an Update
Issue the following update. We might issue this update after a purchase of three items.
db.products.update({ _id: { $in: [2, 5] } },
                   { $inc: { inStock: -1 } },
                   { multi: true })
View the Oplog
The oplog is a capped collection in the local database of each replica set member:
use local
db.oplog.rs.find()
{ "ts": Timestamp(1406944987, 1), "h": NumberLong(0), "v": 2, "op": "n",
  "ns": "", "o": { "msg": "initiating set" } }
...
{ "ts": Timestamp(1406945076, 1), "h": NumberLong("-9144645443320713428"),
  "v": 2, "op": "u", "ns": "store.products", "o2": { "_id": 2 },
  "o": { "$set": { "inStock": 19 } } }
{ "ts": Timestamp(1406945076, 2), "h": NumberLong("-7873096834441143322"),
  "v": 2, "op": "u", "ns": "store.products", "o2": { "_id": 5 },
  "o": { "$set": { "inStock": 49 } } }
5.6 Write Concern
Learning Objectives
Upon completing this module students should understand:
• How and when rollback occurs in MongoDB.
• The tradeoffs between durability and performance.
• Write concern as a means of ensuring durability in MongoDB.
• The different levels of write concern.
• The relation between voting members and write concern.
What happens to the write?
• A write is sent to a primary.
• The primary acknowledges the write to the client.
• The primary then becomes unavailable before a secondary can replicate the write.
Answer to ‘What happens to the write?’
• Another member might be elected primary.
• It will not have the last write that occurred before the previous primary became unavailable.
• When the previous primary becomes available again:
– It will note it has writes that were not replicated.
– It will put these writes into a rollback file.
– A human will need to determine what to do with this data.
• This is default behavior in MongoDB and can be controlled using write concern.
Balancing Durability with Performance
• The previous scenario is a specific instance of a common distributed systems problem.
• For some applications it might be acceptable for writes to be rolled back.
• Other applications may have varying requirements with regard to durability.
• Tunable write concern:
– Make critical operations persist to an entire MongoDB deployment.
– Specify replication to fewer nodes for less important operations.
Defining Write Concern
• MongoDB acknowledges its writes.
• Write concern determines when that acknowledgment occurs:
– On how many servers
– Whether on disk or not
• Clients may define the write concern per write operation, if necessary.
• Standardize on specific levels of write concern for different classes of writes.
• In the discussion that follows we will look at increasingly strict levels of write concern.
• Only voting members participate in the write concern count.
Write Concern: {w: 1}
[Diagram: with writeConcern { w: 1 }, the driver sends the write to the primary mongod, which applies it and responds as soon as it has accepted the write.]
Example: {w: 1}
db.edges.insertOne({ from: "tom185", to: "mary_p" },
                   { writeConcern: { w: 1 } })
Write Concern: {w: 2}
[Diagram: with writeConcern { w: 2 }, the driver sends the write to the primary, which applies it; the response is returned only after at least one secondary has replicated and applied the write.]
Example: {w: 2}
db.customer.updateOne({ user: "mary_p" },
                      { $push: { shoppingCart:
                          { _id: 335443, name: "Brew-a-cup",
                            price: 45.79 } } },
                      { writeConcern: { w: 2 } })
Other Remarks regarding Write Concerns
• w can use any integer for write concern.
• Acknowledgment guarantees the write has propagated to the specified number of voting members.
– E.g., {w: 3}, {w: 4}, etc.
• j: true ensures the writes are in the journal (which is written to disk) before being acknowledged.
– PV0: only the primary needs to write to the journal.
– PV1: all nodes contributing to the majority write the journal to disk before acknowledging (writeConcernMajorityJournalDefault9).
• w: majority implies j: true in PV1.
Write Concern: {w: "majority"}
• Ensures the primary completed the write (in RAM).
– By default, also on disk.
• Ensures write operations have propagated to a majority of the voting members.
• Avoids hardcoding assumptions about the size of your replica set into your application.
• Using majority trades off performance for durability.
• It is suitable for critical writes and to avoid rollbacks.
Example: {w: "majority"}
db.products.updateOne({ _id: 335443 },
                      { $inc: { inStock: -1 } },
                      { writeConcern: { w: "majority" } })
Quiz: Which write concern?
Suppose you have a replica set with 7 data nodes, all voting members in the replica set. Your application has critical
inserts for which you do not want rollbacks to happen. Secondaries may be taken down from time to time for maintenance,
leaving you with a potential 4-server replica set. Which write concern is best suited for these critical inserts?
• {w: 1}
• {w: 2}
• {w: 3}
• {w: 4}
• {w: "majority"}
9http://docs.mongodb.org/manual/reference/replica-configuration/#rsconf.writeConcernMajorityJournalDefault
Further Reading
See Write Concern Reference10 for more details on write concern configurations, including setting timeouts and identifying specific replica set members that must acknowledge writes (i.e. tag sets11).
10 http://docs.mongodb.org/manual/reference/write-concern
11 http://docs.mongodb.org/manual/tutorial/configure-replica-set-tag-sets/#replica-set-configuration-tag-sets
5.7 Read Concern
Learning Objectives
Upon completing this module, students will be able to:
• Define read concern
• Distinguish stale from dirty reads
• Describe how read concern prevents dirty reads
• Understand how to use read concern in MongoDB
• Understand the differences between replication protocol versions 0 and 1
Read Concerns
• Local: the default
• Majority: added in MongoDB 3.2; requires WiredTiger and election protocol version 1 (PV1)
• Linearizable: added in MongoDB 3.4; works with MMAPv1 or WiredTiger
Local
• Default read concern.
• Will return data from the primary.
• Does not wait for the write to be replicated to other members of the replica set.
Majority
• Available only with WiredTiger and PV1.
• Reads majority-acknowledged writes from a snapshot.
– The server will need additional memory to keep an additional snapshot in memory.
– You need to start mongod with --enableMajorityReadConcern.
• Under certain circumstances (high volume, flaky network), can result in stale reads.
Linearizable
• Available with MongoDB versions >= 3.4.
• Will read the latest data acknowledged with w: majority, or block until the replica set acknowledges a write in progress with w: majority.
• Can result in very slow queries.
– Always use maxTimeMS with linearizable.
• Only guaranteed to be a linearizable read when the query fetches a single document.
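A sketch of a linearizable read in the mongo shell, bounded with maxTimeMS so the query cannot block indefinitely (the collection and field names are hypothetical):

```javascript
// Read a single document with read concern "linearizable".
// maxTimeMS bounds how long the server may block waiting for a
// majority acknowledgment before aborting the query.
db.accounts.find({ _id: 42 })
           .readConcern("linearizable")
           .maxTimeMS(10000)
```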
Example: Read Concern Level Majority
App1 is doing writes to a document with w: "majority".
App2 is reading the same document with read concern level "majority".
A new version of the document (W1) is written by App1, and the write is propagated to the secondaries.
The write also needs to be journaled (J1) on each secondary.
Once the write is journaled on a majority of nodes, App1 will get a confirmation of the commit on a majority (C1) of nodes.
If App2 reads the document with a read concern level majority at any time before C1, it will get the value R0.
However, after the committed state (C1), it will get the new value for the document (R1).
Background: Stale Reads
• Reads that do not reflect the most recent writes are stale.
• These can occur when reading from secondaries.
• Systems with stale reads are "eventually consistent".
• Reading from the primary minimizes the odds of stale reads.
– They can still occur in rare cases.
Stale Reads on a Primary
• In unusual circumstances, two members may simultaneously believe that they are the primary.
– One can acknowledge {w: "majority"} writes.
* This is the true primary.
– The other was a primary.
* But a new one has been elected.
• In this state, the other primary will serve stale reads.
Background: Dirty Reads
• Dirty reads are not stale reads.
• Dirty reads occur when you see a view of the data
– ... but that view may not persist
– ... even in the history (i.e., oplog)
• Occur when data is read that has not been committed to a majority of the replica set.
– Because that data could get rolled back.
Dirty Reads and Write Concern
• Write concern alone can not prevent dirty reads.
– Data on the primary may be vulnerable to rollback.
– The exception being linearizable reads on a primary with writeConcernMajorityJournalDefault set to true.
• Read concern was implemented to allow developers the option of preventing dirty reads.
Quiz
What is the difference between a dirty read and a stale read?
Read Concern and Read Preference
• Read preference determines the server you read from.
– Primary, secondary, etc.
• Read concern determines the view of the data you see, which may not reflect writes the moment they are received.
Read Concern and Read Preference: Secondary
• The primary has the most current view of the data.
– Secondaries learn which writes are committed from the primary.
• Data on secondaries might be behind the primary.
– But never ahead of the primary.
Using Read Concern
• To use level: majority read concern, you must:
– Use WiredTiger on all members
– Launch all mongods in the set with --enableMajorityReadConcern
– Specify the read concern level to the driver
• You should:
– Use write concern {w: "majority"}
– Otherwise, an application may not see its own writes
Example: Using Read Concern
• First, launch a replica set.
– Use --enableMajorityReadConcern.
• A script is in the shell_scripts directory of the USB drive.
./launch_replset_for_majority_read_concern.sh
Example: Using Read Concern (Continued)
#!/usr/bin/env bash
echo 'db.testCollection.drop();' | mongo --port 27017 readConcernTest; wait
echo 'db.testCollection.insertOne({ message: "probably on a secondary." });' |
    mongo --port 27017 readConcernTest; wait
echo 'db.fsyncLock()' | mongo --port 27018; wait
echo 'db.fsyncLock()' | mongo --port 27019; wait
echo 'db.testCollection.insertOne( { message : "Only on primary." } );' |
    mongo --port 27017 readConcernTest; wait
echo 'db.testCollection.find().readConcern("majority");' |
    mongo --port 27017 readConcernTest; wait
echo 'db.testCollection.find(); // read concern "local"' |
    mongo --port 27017 readConcernTest; wait
echo 'db.fsyncUnlock()' | mongo --port 27018; wait
echo 'db.fsyncUnlock()' | mongo --port 27019; wait
echo 'db.testCollection.drop();' | mongo --port 27017 readConcernTest
Quiz
What must you do in order to make the database return documents that have been replicated to a majority of the replica
set members?
Replication Protocol Version 0
• Better data consistency when using arbiters and w: 1 writes.
• Does not support majority read concern.
• 30-second buffer between elections.
• Supports vetoes based on priority.
– Should have fewer elections, and fewer w: 1 rollbacks.
Replication Protocol Version 1
• Version 1 is the default in MongoDB >= 3.2.
• With version 1, secondaries now write to disk before acknowledging writes.
• {w: "majority"} now implies {j: true}.
– Can be disabled by setting writeConcernMajorityJournalDefault to false for versions >= 3.4.
• Set the replication protocol version using the protocolVersion parameter in your replica set configuration.
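For example, in the mongo shell (a sketch; run against the primary of an existing replica set):

```javascript
// Reconfigure an existing replica set to use protocol version 1.
conf = rs.conf()
conf.protocolVersion = 1
rs.reconfig(conf)
```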
Replication Protocol Version 1 (continued)
• Also adds electionTimeoutMillis as an option.
– For secondaries: how long to wait before calling for an election.
– For primaries: how long to wait before stepping down.
* After losing contact with the majority.
* This applies to the primary only.
• Required for read concern level "majority".
Quiz
What are the advantages of replication protocol 1?
Further Reading
See Read Concern Reference12 for more details on read concerns.
12 http://docs.mongodb.org/manual/reference/read-concern
5.8 Read Preference
What is Read Preference?
• Read preference allows you to specify the nodes in a replica set to read from.
• Clients only read from the primary by default.
• There are some situations in which a client may want to read from:
– Any secondary
– A specific secondary
– A specific type of secondary
• Only read from a secondary if you can tolerate possibly stale data, as not all writes might have replicated.
Use Cases
• Running systems operations without affecting the front-end application.
• Providing local reads for geographically distributed applications.
• Maintaining availability during a failover.
Not for Scaling
• In general, do not read from secondaries to provide extra capacity for reads.
• Sharding13 increases read and write capacity by distributing operations across a group of machines.
• Sharding is a better strategy for adding capacity.
Read Preference Modes
MongoDB drivers support the following read preferences. Note that hidden nodes will never be read from when
connected via the replica set.
• primary: Default. All operations read from the primary.
• primaryPreferred: Read from the primary, but if it is unavailable, read from secondary members.
• secondary: All operations read from the secondary members of the replica set.
• secondaryPreferred: Read from secondary members, but if no secondaries are available, read from the primary.
• nearest: Read from the member of the replica set with the least network latency, regardless of the member's type.
13 http://docs.mongodb.org/manual/sharding
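In the mongo shell a read preference can be set per cursor; a minimal sketch (the collection name is hypothetical):

```javascript
// Route this query to a secondary if one is available,
// falling back to the primary otherwise.
db.products.find({ inStock: { $gt: 0 } })
           .readPref("secondaryPreferred")
```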
Tag Sets
• There is also the option to use tag sets.
• You may tag nodes such that queries that contain the tag will be routed to one of the servers with that tag.
• This can be useful for running reports, say for a particular data center or nodes with different hardware (e.g. hard disks vs SSDs).
For example, in the mongo shell:
conf = rs.conf()
conf.members[0].tags = { dc: "east", use: "production" }
conf.members[1].tags = { dc: "east", use: "reporting" }
conf.members[2].tags = { use: "production" }
rs.reconfig(conf)
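A query can then target the tagged members; a sketch (the collection name is hypothetical, and the mode and tag documents mirror the configuration above):

```javascript
// Read from a secondary tagged for reporting in the east data center.
db.sales.find({ quarter: "Q3" })
        .readPref("secondary", [ { dc: "east", use: "reporting" } ])
```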
5.9 Lab: Setting up a Replica Set
Overview
• In this exercise we will set up a 3-data-node replica set on a single machine.
• In production, each node should be run on a dedicated host:
– To avoid any potential resource contention
– To provide isolation against server failure
Create Data Directories
Since we will be running all nodes on a single machine, make sure each has its own data directory.
On Linux or Mac OS, run the following in the terminal to create the 3 directories ~/data/rs1, ~/data/rs2, and ~/data/rs3:
mkdir -p ~/data/rs{1,2,3}
On Windows, run the following command instead in Command Prompt or PowerShell:
md c:\data\rs1 c:\data\rs2 c:\data\rs3
Launch Each Member
Now start 3 instances of mongod in the foreground so that it is easier to observe and shutdown.
On Linux or Mac OS, run each of the following commands in its own terminal window:
mongod --replSet myReplSet --dbpath ~/data/rs1 --port 27017 --oplogSize 200
mongod --replSet myReplSet --dbpath ~/data/rs2 --port 27018 --oplogSize 200
mongod --replSet myReplSet --dbpath ~/data/rs3 --port 27019 --oplogSize 200
On Windows, run each of the following commands in its own Command Prompt or PowerShell window:
mongod --replSet myReplSet --dbpath c:\data\rs1 --port 27017 --oplogSize 200
mongod --replSet myReplSet --dbpath c:\data\rs2 --port 27018 --oplogSize 200
mongod --replSet myReplSet --dbpath c:\data\rs3 --port 27019 --oplogSize 200
Status
• At this point, we have 3 mongod instances running.
• They were all launched with the same replSet parameter of "myReplSet".
• Despite this, the members are not aware of each other yet.
• This is fine for now.
Connect to a MongoDB Instance
• Connect to one of the MongoDB instances with the mongo shell.
• To do so, run the following command in the terminal, Command Prompt, or PowerShell:
mongo // connect to the default port 27017
Configure the Replica Set
rs.initiate()
// wait a few seconds
rs.add('<HOSTNAME>:27018')
rs.addArb('<HOSTNAME>:27019')
// Keep running rs.status() until there's a primary and 2 secondaries
rs.status()
Problems That May Occur When Initializing the Replica Set
• The bindIp parameter is incorrectly set.
• The replica set configuration may need to be explicitly specified to use a different hostname:
> conf = {
    _id: "<REPLICA-SET-NAME>",
    members: [
        { _id: 0, host: "<HOSTNAME>:27017" },
        { _id: 1, host: "<HOSTNAME>:27018" },
        { _id: 2, host: "<HOSTNAME>:27019",
          "arbiterOnly": true },
    ]
}
> rs.initiate(conf)
Write to the Primary
While still connected to the primary (port 27017) with mongo shell, insert a simple test document:
db.testcol.insert({ a: 1 })
db.testcol.count()
exit // Or Ctrl-d
Read from a Secondary
Connect to one of the secondaries. E.g.:
mongo --port 27018
Read from the secondary
rs.slaveOk()
db.testcol.find()
Review the Oplog
use local
db.oplog.rs.find()
Changing Replica Set Configuration
To change the replica set configuration, first connect to the primary via mongo shell:
mongo --port <PRIMARY_PORT> # e.g. 27017
Let’s raise the priority of one of the secondaries. Assuming it is the 2nd node (e.g. on port 27018):
cfg = rs.conf()
cfg["members"][1]["priority"] = 10
rs.reconfig(cfg)
Verifying Configuration Change
You will see errors like the following, which are expected:
2014-10-07T17:01:34.610+0100 DBClientCursor::init call() failed
2014-10-07T17:01:34.613+0100 trying reconnect to 127.0.0.1:27017 (127.0.0.1) failed
2014-10-07T17:01:34.617+0100 reconnect 127.0.0.1:27017 (127.0.0.1) ok
reconnected to server after rs command (which is normal)
Verify that the replica set configuration is now as expected:
rs.conf()
The secondary will now become a primary. Check by running:
rs.status()
Further Reading
•Replica Configuration14
•Replica States15
14 http://docs.mongodb.org/manual/reference/replica-configuration/
15 http://docs.mongodb.org/manual/reference/replica-states/
6 Sharding
Introduction to Sharding (page 118) An introduction to sharding
Balancing Shards (page 125) Chunks, the balancer, and their role in a sharded cluster
Shard Zones (page 127) How zone-based sharding works
Lab: Setting Up a Sharded Cluster (page 129) Deploying a sharded cluster
6.1 Introduction to Sharding
Learning Objectives
Upon completing this module, students should understand:
• What problems sharding solves
• When sharding is appropriate
• The importance of the shard key and how to choose a good one
• Why sharding increases the need for redundancy
Contrast with Replication
• In an earlier module, we discussed Replication.
• This should never be confused with sharding.
• Replication is about high availability and durability.
– Taking your data and constantly copying it
– Being ready to have another machine step in to field requests
Sharding is Concerned with Scale
• What happens when a system is unable to handle the application load?
• It is time to consider scaling.
• There are 2 types of scaling we want to consider:
– Vertical scaling
– Horizontal scaling
Vertical Scaling
• Adding more RAM, faster disks, etc.
• When is this the solution?
• First, consider a concept called the working set.
The Working Set
Limitations of Vertical Scaling
• There is a limit to how much RAM one machine can support.
• There are other bottlenecks such as I/O, disk access and network.
• Cost may limit our ability to scale up.
• There may be requirements to have a large working set that no single machine could possibly support.
• This is when it is time to scale horizontally.
Sharding Overview
• MongoDB enables you to scale horizontally through sharding.
• Sharding is about adding more capacity to your system.
• MongoDB's sharding solution is designed to perform well on commodity hardware.
• The details of sharding are abstracted away from applications.
• Queries are performed the same way as if sending operations to a single server.
• Connections work the same by default.
When to Shard
• If you have more data than one machine can hold on its drives.
• If your application is write heavy and you are experiencing too much latency.
• If your working set outgrows the memory you can allocate to a single machine.
Dividing Up Your Dataset
[Diagram: a 1 TB collection (Collection1) is divided across four shards, A through D, each holding 256 GB.]
Sharding Concepts
To understand how sharding works in MongoDB, we need to understand:
• Shard Keys
• Chunks
Shard Key
• You must define a shard key for a sharded collection.
• Based on one or more fields (like an index).
• The shard key defines a space of values.
• Think of the key space like points on a line.
• A key range is a segment of that line.
Shard Key Ranges
• A collection is partitioned based on shard key ranges.
• The shard key determines where documents are located in the cluster.
• It is used to route operations to the appropriate shard.
• For reads and writes.
• Once a collection is sharded, you cannot change the shard key.
• You cannot update the value of the shard key for a document.
Targeted Query Using Shard Key
[Diagram: a targeted query { a: "z1" } on a collection sharded on key a; the driver sends the read to a mongos, which routes it only to the shard whose key range contains "z1" and returns the results.]
Chunks
• MongoDB partitions data into chunks based on shard key ranges.
• This is bookkeeping metadata.
• MongoDB attempts to keep the amount of data balanced across shards.
• This is achieved by migrating chunks from one shard to another as needed.
• There is nothing in a document that indicates its chunk.
• The document does not need to be updated if its assigned chunk changes.
Sharded Cluster Architecture
[Diagram: sharded cluster architecture — app servers connect through two or more mongos routers to two or more shards, each deployed as a replica set.]
Mongos
• A mongos is responsible for accepting requests and returning results to an application driver.
• In a sharded cluster, nearly all operations go through a mongos.
• A sharded cluster can have as many mongos routers as required.
• It is typical for each application server to have one mongos.
• Always use more than one mongos to avoid a single point of failure.
Config Servers
[Diagram: three config servers store the cluster metadata used by the mongos routers.]
Config Server Hardware Requirements
• Quality network interfaces.
• A small amount of disk space (typically a few GB).
• A small amount of RAM (typically a few GB).
• The larger the sharded cluster, the greater the config server hardware requirements.
Possible Imbalance?
• Depending on how you configure sharding, data can become unbalanced on your sharded cluster.
– Some shards might receive more inserts than others.
– Some shards might have documents that grow more than those in other shards.
• This may result in too much load on a single shard.
– Reads and writes
– Disk activity
• This would defeat the purpose of sharding.
Balancing Shards
• If a chunk grows too large MongoDB will split it into two chunks.
• The MongoDB balancer keeps chunks distributed across shards in equal numbers.
• However, a balanced sharded cluster depends on a good shard key.
With a Good Shard Key
You might easily see that:
• Reads hit only 1 or 2 shards per query.
• Writes are distributed across all servers.
• Your disk usage is evenly distributed across shards.
• Things stay this way as you scale.
With a Bad Shard Key
You might see that:
• Your reads hit every shard.
• Your writes are concentrated on one shard.
• Most of your data is on just a few shards.
• Adding more shards to the cluster will not help.
Choosing a Shard Key
Generally, you want a shard key:
• That has high cardinality.
• That is used in the majority of read queries.
• For which the values read and write operations use are randomly distributed.
• For which the majority of reads are routed to a particular server.
More Specifically
• Your shard key should be consistent with your query patterns.
• If reads usually find only one document, you only need good cardinality.
• If reads retrieve many documents:
– Your shard key should support locality.
– Matching documents will reside on the same shard.
Cardinality
• A good shard key will have high cardinality.
• A relatively small number of documents should have the same shard key.
• Otherwise operations become isolated to the same server.
• Because documents with the same shard key reside on the same shard.
• Adding more servers will not help.
• Hashing will not help.
Non-Monotonic
• A good shard key will generate new values non-monotonically.
• Datetimes, counters, and ObjectIds make bad shard keys.
• Monotonic shard keys cause all inserts to happen on the same shard.
• Hashing will solve this problem.
• However, doing range queries with a hashed shard key will perform a scatter-gather query across the cluster.
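A sketch of sharding on a hashed key in the mongo shell (the database and collection names are hypothetical):

```javascript
// Enable sharding for the database, then shard the collection on a
// hash of _id so monotonically increasing ObjectIds are spread
// evenly across shards.
sh.enableSharding("logs")
sh.shardCollection("logs.events", { _id: "hashed" })
```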
Shards Should be Replica Sets
• As the number of shards increases, the number of servers in your deployment increases.
• This increases the probability that one server will fail on any given day.
• With redundancy built into each shard you can mitigate this risk.
6.2 Balancing Shards
Learning Objectives
Upon completing this module students should understand:
• Chunks and the balancer
• The status of chunks in a newly sharded collection
• How chunk splits automatically occur
• Advantages of pre-splitting chunks
• How the balancer works
Chunks and the Balancer
• Chunks are groups of documents.
• The shard key determines which chunk a document will be contained in.
• Chunks can be split when they grow too large.
• The balancer decides where chunks go.
• It handles migrations of chunks from one server to another.
Chunks in a Newly Sharded Collection
• The range of a chunk is defined by the shard key values of the documents the chunk contains.
• When a collection is sharded it starts with just one chunk.
• The first chunk for a collection will have the range:
{ $minKey: 1 } to { $maxKey: 1 }
• All shard key values from the smallest possible to the largest fall in this chunk's range.
Chunk Splits
[Diagram: a 64.2 MB chunk on a shard is split into two 32.1 MB chunks.]
Pre-Splitting Chunks
• You may pre-split data before loading data into a sharded cluster.
• Pre-splitting is useful if:
– You plan to do a large data import early on.
– You expect a heavy initial server load and want to ensure writes are distributed.
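A sketch of pre-splitting in the mongo shell, assuming a collection already sharded on { userId: 1 } (the namespace and split points are hypothetical):

```javascript
// Create chunk boundaries ahead of a bulk import so inserts are
// spread across shards from the start.
sh.splitAt("app.users", { userId: 25000 })
sh.splitAt("app.users", { userId: 50000 })
sh.splitAt("app.users", { userId: 75000 })
```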
Start of a Balancing Round
• A balancing round is initiated by the balancer process on the primary config server.
• This happens when the difference in the number of chunks between two shards becomes too large.
• Specifically, the difference between the shard with the most chunks and the shard with the fewest.
• A balancing round starts when the imbalance reaches:
– 2 when the cluster has < 20 chunks
– 4 when the cluster has 20-79 chunks
– 8 when the cluster has 80+ chunks
Balancing is Resource Intensive
• Chunk migration requires copying all the data in the chunk from one shard to another.
• Each individual shard can be involved in one migration at a time. Parallel migrations can occur for each shard migration pair (source + destination).
• The number of possible parallel chunk migrations for n shards is n/2 rounded down.
• MongoDB creates splits only after an insert operation.
• For these reasons, it is possible to define a balancing window to ensure the balancer will only run during scheduled times.
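The arithmetic above can be sketched in plain JavaScript (illustrative only):

```javascript
// Maximum number of simultaneous chunk migrations: each migration
// occupies one source shard and one destination shard, so at most
// floor(n / 2) source/destination pairs can run in parallel.
function maxParallelMigrations(shardCount) {
    return Math.floor(shardCount / 2);
}
// e.g. with 5 shards, two pairs can migrate while one shard sits idle
```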
Chunk Migration Steps
1. The balancer process sends the moveChunk command to the source shard.
2. The source shard continues to process reads/writes for that chunk during the migration.
3. The destination shard requests documents in the chunk and begins receiving copies.
4. After receiving all documents, the destination shard receives any changes to the chunk.
5. Then the destination shard tells the config db that it has the chunk.
6. The destination shard will now handle all reads/writes.
7. The source shard deletes its copy of the chunk.
Concluding a Balancing Round
• Each chunk will move:
– From the shard with the most chunks
– To the shard with the fewest
• A balancing round ends when all shards differ by at most one chunk.
6.3 Shard Zones
Learning Objectives
Upon completing this module students should understand:
• The purpose for shard zones
• Advantages of using shard zones
• Potential drawbacks of shard zones
Zones - Overview
• Shard zones allow you to "tie" data to one or more shards.
• A shard zone describes a range of shard key values.
• If a chunk is in the shard tag range, it will live on a shard with that tag.
• Shard tag ranges cannot overlap. If we try to define overlapping ranges, an error will occur during creation.
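A sketch of defining a zone in the mongo shell (the shard, zone, and namespace names are hypothetical):

```javascript
// Assign a shard to the "LTS" zone, then tie a shard key range to
// that zone so matching chunks live only on LTS shards.
sh.addShardToZone("shard0000", "LTS")
sh.updateZoneKeyRange("app.events",
                      { created: MinKey },
                      { created: ISODate("2016-01-01") },
                      "LTS")
```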
Example: DateTime
• Documents older than one year need to be kept, but are rarely used.
• You set a part of the shard key as the ISODate of document creation.
• Add shards to the LTS zone.
• These shards can be on cheaper, slower machines.
• Invest in high-performance servers for more frequently accessed data.
Example: Location
• You are required to keep certain data in its home country.
• You include the country in the shard key and define a zone per country.
• Maintain data centers within each country that house the appropriate shards.
• This meets the country requirement but allows all servers to be part of the same system.
• As documents age and pass into a new zone range, the balancer will migrate them automatically.
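A sketch of the location case, assuming a shard key of { country: 1, _id: 1 } (shard, zone, and namespace names are illustrative):

```javascript
// Keep documents whose country is "DE" on shards in the "EU" zone.
sh.addShardToZone("shardEU0", "EU")
sh.updateZoneKeyRange(
  "app.users",
  { country: "DE", _id: MinKey },   // lower bound of the range
  { country: "DE", _id: MaxKey },   // upper bound of the range
  "EU"
)
```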
Example: Premium Tier
• You have customers who want to pay for a “premium” tier.
• The shard key permits you to distinguish one customer’s documents from all others.
• Tag the document ranges for each customer so that their documents will be located on shards of the appropriate tier (zone).
• Shards tagged as premium tier run on high-performance servers.
• Other shards run on commodity hardware.
• See Manage Shard Zones16
16 http://docs.mongodb.org/manual/tutorial/manage-shard-zone/
Zones - Caveats
• Because tagged chunks will only be on certain servers, if you tag more data than those servers can handle, you’ll have a problem.
  – You’re not only worrying about your overall server load; you’re worrying about server load for each of your zones.
• Your chunks will distribute themselves evenly across the available zones. You cannot control things at a finer grain than your zones.
6.4 Lab: Setting Up a Sharded Cluster
Learning Objectives
Upon completing this module students should understand:
• How to set up a sharded cluster, including:
  – Replica sets as shards
  – Config servers
  – Mongos processes
• How to enable sharding for a database
• How to shard a collection
• How to determine where data will go
Our Sharded Cluster
• In this exercise, we will set up a cluster with 3 shards.
• Each shard will be a replica set with 3 members (including one arbiter).
• We will insert some data and see where it goes.
Sharded Cluster Configuration
• Three shards:
  1. A replica set on ports 27107, 27108, 27109
  2. A replica set on ports 27117, 27118, 27119
  3. A replica set on ports 27127, 27128, 27129
• Three config servers on ports 27217, 27218, 27219
• Two mongos servers on ports 27017 and 27018
Build Our Data Directories
On Linux or MacOS, run the following in the terminal to create the data directories we’ll need.
mkdir -p ~/data/cluster/config/{c0,c1,c2}
mkdir -p ~/data/cluster/shard0/{m0,m1,arb}
mkdir -p ~/data/cluster/shard1/{m0,m1,arb}
mkdir -p ~/data/cluster/shard2/{m0,m1,arb}
mkdir -p ~/data/cluster/{s0,s1}
On Windows, run the following commands instead:
md c:\data\cluster\config\c0 c:\data\cluster\config\c1 c:\data\cluster\config\c2
md c:\data\cluster\shard0\m0 c:\data\cluster\shard0\m1 c:\data\cluster\shard0\arb
md c:\data\cluster\shard1\m0 c:\data\cluster\shard1\m1 c:\data\cluster\shard1\arb
md c:\data\cluster\shard2\m0 c:\data\cluster\shard2\m1 c:\data\cluster\shard2\arb
md c:\data\cluster\s0 c:\data\cluster\s1
Initiate a Replica Set (Linux/MacOS)
mongod --replSet shard0 --dbpath ~/data/cluster/shard0/m0 \
--logpath ~/data/cluster/shard0/m0/mongod.log \
--fork --port 27107 --shardsvr
mongod --replSet shard0 --dbpath ~/data/cluster/shard0/m1 \
--logpath ~/data/cluster/shard0/m1/mongod.log \
--fork --port 27108 --shardsvr
mongod --replSet shard0 --dbpath ~/data/cluster/shard0/arb \
--logpath ~/data/cluster/shard0/arb/mongod.log \
--fork --port 27109 --shardsvr
mongo --port 27107 --eval "\
rs.initiate(); sleep(3000);\
rs.add('$HOSTNAME:27108');\
rs.addArb('$HOSTNAME:27109')"
Initiate a Replica Set (Windows)
mongod --replSet shard0 --dbpath c:\data\cluster\shard0\m0 \
--logpath c:\data\cluster\shard0\m0\mongod.log \
--port 27107 --oplogSize 10 --shardsvr
mongod --replSet shard0 --dbpath c:\data\cluster\shard0\m1 \
--logpath c:\data\cluster\shard0\m1\mongod.log \
--port 27108 --oplogSize 10 --shardsvr
mongod --replSet shard0 --dbpath c:\data\cluster\shard0\arb \
--logpath c:\data\cluster\shard0\arb\mongod.log \
--port 27109 --oplogSize 10 --shardsvr
mongo --port 27107 --eval "\
rs.initiate(); sleep(3000);\
rs.add('<HOSTNAME>:27108');\
rs.addArb('<HOSTNAME>:27109')"
Spin Up a Second Replica Set (Linux/MacOS)
mongod --replSet shard1 --dbpath ~/data/cluster/shard1/m0 \
--logpath ~/data/cluster/shard1/m0/mongod.log \
--fork --port 27117 --shardsvr
mongod --replSet shard1 --dbpath ~/data/cluster/shard1/m1 \
--logpath ~/data/cluster/shard1/m1/mongod.log \
--fork --port 27118 --shardsvr
mongod --replSet shard1 --dbpath ~/data/cluster/shard1/arb \
--logpath ~/data/cluster/shard1/arb/mongod.log \
--fork --port 27119 --shardsvr
mongo --port 27117 --eval "\
rs.initiate(); sleep(3000);\
rs.add('$HOSTNAME:27118');\
rs.addArb('$HOSTNAME:27119')"
Spin Up a Second Replica Set (Windows)
mongod --replSet shard1 --dbpath c:\data\cluster\shard1\m0 \
--logpath c:\data\cluster\shard1\m0\mongod.log \
--port 27117 --oplogSize 10 --shardsvr
mongod --replSet shard1 --dbpath c:\data\cluster\shard1\m1 \
--logpath c:\data\cluster\shard1\m1\mongod.log \
--port 27118 --oplogSize 10 --shardsvr
mongod --replSet shard1 --dbpath c:\data\cluster\shard1\arb \
--logpath c:\data\cluster\shard1\arb\mongod.log \
--port 27119 --oplogSize 10 --shardsvr
mongo --port 27117 --eval "\
rs.initiate(); sleep(3000);\
rs.add('<HOSTNAME>:27118');\
rs.addArb('<HOSTNAME>:27119')"
A Third Replica Set (Linux/MacOS)
mongod --replSet shard2 --dbpath ~/data/cluster/shard2/m0 \
--logpath ~/data/cluster/shard2/m0/mongod.log \
--fork --port 27127 --shardsvr
mongod --replSet shard2 --dbpath ~/data/cluster/shard2/m1 \
--logpath ~/data/cluster/shard2/m1/mongod.log \
--fork --port 27128 --shardsvr
mongod --replSet shard2 --dbpath ~/data/cluster/shard2/arb \
--logpath ~/data/cluster/shard2/arb/mongod.log \
--fork --port 27129 --shardsvr
mongo --port 27127 --eval "\
rs.initiate(); sleep(3000);\
rs.add('$HOSTNAME:27128');\
rs.addArb('$HOSTNAME:27129')"
A Third Replica Set (Windows)
mongod --replSet shard2 --dbpath c:\data\cluster\shard2\m0 \
--logpath c:\data\cluster\shard2\m0\mongod.log \
--port 27127 --oplogSize 10 --shardsvr
mongod --replSet shard2 --dbpath c:\data\cluster\shard2\m1 \
--logpath c:\data\cluster\shard2\m1\mongod.log \
--port 27128 --oplogSize 10 --shardsvr
mongod --replSet shard2 --dbpath c:\data\cluster\shard2\arb \
--logpath c:\data\cluster\shard2\arb\mongod.log \
--port 27129 --oplogSize 10 --shardsvr
mongo --port 27127 --eval "\
rs.initiate(); sleep(3000);\
rs.add('<HOSTNAME>:27128');\
rs.addArb('<HOSTNAME>:27129')"
Status Check
• Now we have three replica sets running.
• We have one for each shard.
• They do not know about each other yet.
• To make them a sharded cluster we will:
  – Build our config databases
  – Launch our mongos processes
  – Add each shard to the cluster
• To benefit from this configuration we also need to:
  – Enable sharding for a database
  – Shard at least one collection within that database
Launch Config Servers (Linux/MacOS)
mongod --dbpath ~/data/cluster/config/c0 \
--replSet csrs \
--logpath ~/data/cluster/config/c0/mongod.log \
--fork --port 27217 --configsvr
mongod --dbpath ~/data/cluster/config/c1 \
--replSet csrs \
--logpath ~/data/cluster/config/c1/mongod.log \
--fork --port 27218 --configsvr
mongod --dbpath ~/data/cluster/config/c2 \
--replSet csrs \
--logpath ~/data/cluster/config/c2/mongod.log \
--fork --port 27219 --configsvr
mongo --port 27217 --eval "\
rs.initiate(); sleep(3000);\
rs.add('$HOSTNAME:27218');\
rs.add('$HOSTNAME:27219')"
Launch Config Servers (Windows)
mongod --dbpath c:\data\cluster\config\c0 \
--replSet csrs \
--logpath c:\data\cluster\config\c0\mongod.log \
--port 27217 --configsvr
mongod --dbpath c:\data\cluster\config\c1 \
--replSet csrs \
--logpath c:\data\cluster\config\c1\mongod.log \
--port 27218 --configsvr
mongod --dbpath c:\data\cluster\config\c2 \
--replSet csrs \
--logpath c:\data\cluster\config\c2\mongod.log \
--port 27219 --configsvr
mongo --port 27217 --eval "\
rs.initiate(); sleep(3000);\
rs.add('<HOSTNAME>:27218');\
rs.add('<HOSTNAME>:27219')"
Launch the Mongos Processes (Linux/MacOS)
Now we launch our mongos processes. We need to tell them about our config servers.
mongos --logpath ~/data/cluster/s0/mongos.log --fork --port 27017 \
--configdb "csrs/$HOSTNAME:27217,$HOSTNAME:27218,$HOSTNAME:27219"
mongos --logpath ~/data/cluster/s1/mongos.log --fork --port 27018 \
--configdb "csrs/$HOSTNAME:27217,$HOSTNAME:27218,$HOSTNAME:27219"
Launch the Mongos Processes (Windows)
Now we launch our mongos processes. We need to tell them about our config servers.
mongos --logpath c:\data\cluster\s0\mongos.log --port 27017 \
--configdb csrs/localhost:27217,localhost:27218,localhost:27219
mongos --logpath c:\data\cluster\s1\mongos.log --port 27018 \
--configdb csrs/localhost:27217,localhost:27218,localhost:27219
Add All Shards
echo "sh.addShard( ’shard0/$HOSTNAME:27107’ ); \
sh.addShard( ’shard1/$HOSTNAME:27117’ ); \
sh.addShard( ’shard2/$HOSTNAME:27127’ ); sh.status()" | mongo
Note: Instead of doing this through a bash (or other) shell command, you may prefer to launch a mongo shell and issue each command individually.
Enable Sharding and Shard a Collection
Enable sharding for the test database, shard a collection, and insert some documents.
mongo --port 27017
sh.enableSharding("test")
sh.shardCollection("test.testcol", { a: 1, b: 1 })
for (i = 0; i < 1000; i++) {
  docArr = [];
  for (j = 0; j < 1000; j++) {
    docArr.push({
      a: i, b: j,
      c: "Filler String 0000000000000000000000000000000000000000000000000"
    })
  }
  db.testcol.insert(docArr)
}
Observe What Happens
Connect to either mongos using a mongo shell and frequently issue:
sh.status()
7 Reporting Tools and Diagnostics
Performance Troubleshooting (page 136) An introduction to reporting and diagnostic tools for MongoDB
7.1 Performance Troubleshooting
Learning Objectives
Upon completing this module students should understand basic performance troubleshooting techniques and tools
including:
•mongostat
•mongotop
•db.setProfilingLevel()
•db.currentOp()
•db.<COLLECTION>.stats()
•db.serverStatus()
mongostat and mongotop
• mongostat samples a server every second.
  – See current ops, page faults, network traffic, etc.
  – Does not give a view into historic performance; use Ops Manager for that.
• mongotop looks at the time spent on reads/writes in each collection.
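Both tools ship with the server packages and accept a polling interval as a trailing argument. A sketch, assuming a mongod listening on the default port:

```shell
# Sample server counters every 2 seconds
mongostat --port 27017 2

# Show per-collection read/write time every 5 seconds
mongotop --port 27017 5
```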
Exercise: mongostat (setup)
In one window, perform the following commands.
db.testcol.drop()
for (i = 1; i <= 10000; i++) {
  arr = [];
  for (j = 1; j <= 1000; j++) {
    doc = { _id: (1000 * (i - 1) + j), a: i, b: j, c: (1000 * (i - 1) + j) };
    arr.push(doc)
  }
  db.testcol.insertMany(arr);
  var x = db.testcol.find({ b: 255 });
  x.next();
  var x = db.testcol.find({ _id: 1000 * (i - 1) + 255 });
  x.next();
  var x = "asdf";
  db.testcol.updateOne({ a: i, b: 255 }, { $set: { d: x.pad(1000) } });
  print(i)
}
Exercise: mongostat (run)
• In another window/tab, run mongostat.
• You will see:
  – Inserts
  – Queries
  – Updates
Exercise: mongostat (create index)
• In a third window, create an index when you see things slowing down:
db.testcol.createIndex( { a: 1, b: 1 } )
• Look at mongostat.
• Notice that things are going significantly faster.
• Then, let’s drop that and build another index.
db.testcol.dropIndexes()
db.testcol.createIndex( { b: 1, a: 1 } )
Exercise: mongotop
Perform the following then, in another window, run mongotop.
db.testcol.drop()
for (i = 1; i <= 10000; i++) {
  arr = [];
  for (j = 1; j <= 1000; j++) {
    doc = { _id: (1000 * (i - 1) + j), a: i, b: j, c: (1000 * (i - 1) + j) };
    arr.push(doc)
  }
  db.testcol.insertMany(arr);
  var x = db.testcol.find({ b: 255 }); x.next();
  var x = db.testcol.find({ _id: 1000 * (i - 1) + 255 }); x.next();
  var x = "asdf";
  db.testcol.updateOne({ a: i, b: 255 }, { $set: { d: x.pad(1000) } });
  print(i)
}
db.currentOp()
• currentOp is a tool that asks what the db is doing at the moment.
• currentOp is useful for finding long-running operations.
• Fields of interest:
  – microsecs_running
  – op
  – query
  – lock
  – waitingForLock
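currentOp also accepts a filter document, which is handy for isolating the long-running operations. A sketch (the threshold and opid are illustrative):

```javascript
// Show only operations that have been running for at least 3 seconds.
db.currentOp({ "secs_running": { $gte: 3 } })

// Kill a long-running operation by its opid (use with care).
db.killOp(12345)
```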
Exercise: db.currentOp()
Do the following; then connect with a separate shell and repeatedly run db.currentOp().
db.testcol.drop()
for (i = 1; i <= 10000; i++) {
  arr = [];
  for (j = 1; j <= 1000; j++) {
    doc = { _id: (1000 * (i - 1) + j), a: i, b: j, c: (1000 * (i - 1) + j) };
    arr.push(doc)
  }
  db.testcol.insertMany(arr);
  var x = db.testcol.find({ b: 255 }); x.next();
  var x = db.testcol.find({ _id: 1000 * (i - 1) + 255 }); x.next();
  var x = "asdf";
  db.testcol.updateOne({ a: i, b: 255 }, { $set: { d: x.pad(1000) } });
  print(i)
}
db.<COLLECTION>.stats()
• Used to view the current stats for a collection.
• Everything is in bytes; use the scale parameter to view in KB, MB, etc.
• You can also use db.stats() to do this at the scope of the entire database.
Exercise: Using Collection Stats
Look at the output of the following:
db.testcol.drop()
db.testcol.insertOne( { a: 1 } )
db.testcol.stats()
var x = "asdf"
db.testcol2.insertOne( { a: x.pad(10000000) } )
db.testcol2.stats()
db.stats()
The Profiler
• Off by default.
• To reset: db.setProfilingLevel(0)
• At setting 1, it captures “slow” queries.
• You may define what “slow” is.
• Default is 100 ms: db.setProfilingLevel(1)
• E.g., to capture operations over 20 ms: db.setProfilingLevel(1, 20)
The Profiler (continued)
• If the profiler level is 2, it captures all queries.
  – This will severely impact performance.
  – Turns all reads into writes.
• Always turn the profiler off when done (set level to 0).
• Creates the db.system.profile collection.
Exercise: Exploring the Profiler
Perform the following, then look in your db.system.profile.
db.setProfilingLevel(0)
db.testcol.drop()
db.system.profile.drop()
db.setProfilingLevel(2)
db.testcol.insertOne( { a: 1 } )
db.testcol.find()
var x = "asdf"
db.testcol.insertOne( { a: x.pad(10000000) } ) // ~10 MB
db.setProfilingLevel(0)
db.system.profile.find().pretty()
db.serverStatus()
• Takes a snapshot of server status.
• By taking diffs, you can see system trends.
• Most of the data that Ops Manager, Cloud Manager, and Atlas collect comes from this command.
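One way to turn two serverStatus() snapshots into a trend is to diff the counters between samples. The snippet below is a plain-JavaScript sketch; the sample numbers are made up for illustration, not output from a real server:

```javascript
// Diff two counter snapshots (e.g. x.metrics.document from two
// db.serverStatus() calls taken some seconds apart).
function diffCounters(before, after) {
  const delta = {};
  for (const key of Object.keys(after)) {
    delta[key] = after[key] - (before[key] || 0);
  }
  return delta;
}

// Hypothetical samples taken 10 seconds apart:
const t0 = { inserted: 1000, updated: 50, deleted: 0 };
const t1 = { inserted: 4000, updated: 75, deleted: 10 };
console.log(diffCounters(t0, t1)); // { inserted: 3000, updated: 25, deleted: 10 }
```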
Exercise: Using db.serverStatus()
• Open up two windows. In the first, type:
db.testcol.drop()
var x = "asdf"
for (i = 0; i <= 10000000; i++) {
  db.testcol.insertOne( { a: x.pad(100000) } )
}
• In the second window, periodically type:
var x = db.serverStatus(); x.metrics.document
Analyzing Profiler Data
• Enable the profiler at default settings.
• Run for 5 seconds.
• Slow operations are captured.
• The issue is that there is not a proper index on the message field.
• You will see how fast documents are getting inserted.
• It will be slow because the documents are big.
Performance Improvement Techniques
• Appropriate write concerns
• Bulk operations
• Good schema design
• Good shard key choice
• Good indexes
Performance Tips: Write Concern
• Increasing the write concern increases data safety.
• This will have an impact on performance, however.
• This is especially true when there are network issues.
• You will want to balance business needs against speed.
Bulk Operations
• Using bulk operations (including insertMany and updateMany) can improve performance, especially when using a write concern greater than 1.
• These enable the server to amortize acknowledgement.
Exercise: Comparing insertMany with mongostat
Let’s spin up a 3-member replica set:
mkdir -p /data/replset/{1,2,3}
mongod --logpath /data/replset/1/mongod.log \
--dbpath /data/replset/1 --replSet mySet --port 27017 --fork
mongod --logpath /data/replset/2/mongod.log \
--dbpath /data/replset/2 --replSet mySet --port 27018 --fork
mongod --logpath /data/replset/3/mongod.log \
--dbpath /data/replset/3 --replSet mySet --port 27019 --fork
echo "conf = {_id: ’mySet’, members: [{_id: 0, host: ’localhost:27017’}, \
{_id: 1, host: ’localhost:27018’}, {_id: 2, host: ’localhost:27019’}]}; \
rs.initiate(conf)" | mongo
mongostat: insertOne with {w: 1}
Perform the following, using insertOne() with write concern {w: 1}:
db.testcol.drop()
for (i = 1; i <= 10000; i++) {
  for (j = 1; j <= 1000; j++) {
    db.testcol.insertOne(
      { _id: (1000 * (i - 1) + j), a: i, b: j, c: (1000 * (i - 1) + j) },
      { writeConcern: { w: 1 } });
  }
  print(i);
}
Run mongostat and see how fast that happens.
Multiple insertOnes with {w: 3}
Increase the write concern to 3 (safer but slower):
db.testcol.drop()
for (i = 1; i <= 10000; i++) {
  for (j = 1; j <= 1000; j++) {
    db.testcol.insertOne(
      { _id: (1000 * (i - 1) + j), a: i, b: j, c: (1000 * (i - 1) + j) },
      { writeConcern: { w: 3 } }
    );
  }
  print(i);
}
Again, run mongostat.
mongostat: insertMany with {w: 3}
• Finally, let’s use insertMany to our advantage.
• Note that writeConcern is still {w: 3}.
db.testcol.drop()
for (i = 1; i <= 10000; i++) {
  arr = []
  for (j = 1; j <= 1000; j++) {
    arr.push(
      { _id: (1000 * (i - 1) + j), a: i, b: j, c: (1000 * (i - 1) + j) }
    );
  }
  db.testcol.insertMany( arr, { writeConcern: { w: 3 } });
  print(i);
}
Schema Design
• The structure of documents affects performance.
• Optimize for your application’s read/write patterns.
• We want as few requests to the database as possible to perform a given application task.
See the data modeling section for more information.
Shard Key Considerations
• Choose a shard key that distributes load across your cluster.
• Create a shard key such that only a small number of documents will have the same value.
• Create a shard key that has a high degree of randomness.
• Your shard key should enable a mongos to target a single shard for a given query.
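To see why a good shard key lets a mongos target a single shard, consider this simplified sketch of chunk routing in plain JavaScript. The chunk table is made up for illustration; real chunks hold BSON shard-key ranges in the config database:

```javascript
// Each chunk owns a half-open shard-key range [min, max) and lives
// on one shard. A query with an exact shard-key value matches
// exactly one chunk, so it can be routed to a single shard.
const chunks = [
  { min: 0,   max: 100, shard: "shard0" },
  { min: 100, max: 200, shard: "shard1" },
  { min: 200, max: 300, shard: "shard2" },
];

function targetShard(shardKeyValue) {
  const chunk = chunks.find(c => shardKeyValue >= c.min && shardKeyValue < c.max);
  return chunk ? chunk.shard : null;   // null: no chunk owns this value
}

console.log(targetShard(42));  // "shard0"
console.log(targetShard(250)); // "shard2"
```

A query that does not include the shard key cannot be routed this way and must be scattered to every shard.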
Indexes and Performance
• Reads and writes that don’t use an index will cripple performance.
• In compound indexes, order matters:
  – Sort on a field that comes before any range used in the index.
  – You can’t skip fields; they must be used in order.
  – Revisit the indexing section for more detail.
8 Backup and Recovery
Backup and Recovery (page 144) An overview of backup options for MongoDB
8.1 Backup and Recovery
Disasters Do Happen
Human Disasters
Terminology: RPO vs. RTO
• Recovery Point Objective (RPO): How much data can you afford to lose?
• Recovery Time Objective (RTO): How long can you afford to be off-line?
Terminology: DR vs. HA
• Disaster Recovery (DR)
• High Availability (HA)
• Distinct business requirements
• Technical solutions may converge
Quiz
• Q: What’s the hardest thing about backups?
• A: Restoring them!
• Regularly test that restoration works!
Backup Options
• Document level
  – Logical
  – mongodump, mongorestore
• Filesystem level
  – Physical
  – Copy files
  – Volume/disk snapshots
Document Level: mongodump
• Dumps collections to BSON files
• Mirrors your structure
• Can be run live or in offline mode
• Does not include indexes (rebuilt during restore)
• --dbpath for direct file access
• --oplog to record the oplog while backing up
• --query / --filter for a selective dump
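Typical invocations look like the following; host, port, and paths are illustrative. Note that --oplog applies to full-instance dumps, not single-database ones:

```shell
# Dump the whole instance, capturing an oplog slice for a
# consistent point-in-time backup, into ./backup
mongodump --host localhost --port 27017 --oplog -o backup/

# Dump one collection, filtered by a query
mongodump -d test -c testcol -q '{ "a": { "$gte": 500 } }' -o backup/
```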
mongodump
$ mongodump --help
Export MongoDB data to BSON files.
options:
--help produce help message
-v [ --verbose ] be more verbose (include multiple times for
more verbosity e.g. -vvvvv)
--version print the program’s version and exit
-h [ --host ] arg mongo host to connect to ( /s1,s2 for
--port arg server port. Can also use --host hostname
-u [ --username ] arg username
-p [ --password ] arg password
--dbpath arg directly access mongod database files in path
-d [ --db ] arg database to use
-c [ --collection ] arg collection to use (some commands)
-o [ --out ] arg (=dump) output directory or "-" for stdout
-q [ --query ] arg json query
--oplog Use oplog for point-in-time snapshotting
File System Level
• Must use journaling!
• Copy /data/db files
• Or snapshot the volume (e.g., LVM, SAN, EBS)
• Seriously, always use journaling!
Ensure Consistency
Flush RAM to disk and stop accepting writes:
•db.fsyncLock()
•Copy/Snapshot
•db.fsyncUnlock()
File System Backups: Pros and Cons
• Entire database
• Backup files will be large
• Fastest way to create a backup
• Fastest way to restore a backup
Document Level: mongorestore
•mongorestore
• --oplogReplay replays the oplog to restore to a point in time
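For a dump taken with --oplog, the restore side looks like this sketch (port and path illustrative):

```shell
# Restore the dump and replay the captured oplog entries for a
# consistent point-in-time restore
mongorestore --port 27017 --oplogReplay backup/
```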
File System Restores
• All database files
• Selected databases or collections
• Replay oplog
Backup Sharded Cluster
1. Stop Balancer (and wait) or no balancing window
2. Stop one config server (data R/O)
3. Backup Data (shards, config)
4. Restart config server
5. Resume Balancer
Restore Sharded Cluster
1. Dissimilar # shards to restore to
2. Different shard keys?
3. Selective restores
4. Consolidate shards
5. Changing addresses of config/shards
Tips and Tricks
• mongodump / mongorestore
  – --oplog / --oplogReplay
  – --objcheck / --repair
  – --dbpath
  – --query / --filter
• bsondump
  – inspect data at the console
• LVM snapshot time/space tradeoff
  – Multi-EBS (RAID) backup
  – clean up snapshots
9 Aggregation
Intro to Aggregation (page 149) An introduction to the aggregation framework, the pipeline concept, and selected stages
9.1 Intro to Aggregation
Learning Objectives
Upon completing this module students should understand:
• The concept of the aggregation pipeline
• Key stages of the aggregation pipeline
• What aggregation expressions and variables are
• The fundamentals of using aggregation for data analysis
Aggregation Basics
• Use the aggregation framework to transform and analyze data in MongoDB collections.
• For those who are used to SQL, aggregation provides functionality similar to several SQL clauses, such as GROUP BY and JOIN, along with other operations that allow us to compute over data sets.
•Theaggregationframeworkisbasedontheconceptofapipeline.
The Aggregation Pipeline
• An aggregation pipeline is analogous to a UNIX pipeline.
• Each stage of the pipeline:
  – Receives a set of documents as input.
  – Performs an operation on those documents.
  – Produces a set of documents for use by the following stage.
• A pipeline has the following syntax:
pipeline = [ $stage1, $stage2, ... $stageN ]
db.<COLLECTION>.aggregate( pipeline, { options } )
Aggregation Stages
• There are many aggregation stages.
• In this introductory lesson, we’ll cover:
  – $match: Similar to find()
  – $project: Shape documents
  – $sort: Like the cursor method of the same name
  – $group: Used to aggregate field values from multiple documents
  – $limit: Used to limit the number of documents returned
  – $lookup: Replicates an SQL left outer join
Aggregation Expressions and Variables
• Used to refer to data within an aggregation stage
• Expressions
  – Use a field path to access fields in input documents, e.g. "$field"
• Variables
  – Can be both user-defined and system variables
  – Can hold any type of BSON data
  – Accessed like expressions, but with two $, e.g. "$$<variable>"
  – For more information about variables in aggregation expressions, click here17
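For instance, the system variable $$ROOT refers to the whole input document. A small sketch against the tweets collection used later in this module:

```javascript
// Keep only the screen_name, but carry the entire original
// document along under "original" via the $$ROOT system variable.
db.tweets.aggregate([
  { $project: { screen_name: "$user.screen_name", original: "$$ROOT" } },
  { $limit: 1 }
])
```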
The Match Stage
• The $match operator works like the query phase of find().
• Documents in the pipeline that match the query document will be passed to subsequent stages.
• $match is often the first operator used in an aggregation pipeline.
• Like other aggregation operators, $match can occur multiple times in a single pipeline.
17 https://docs.mongodb.com/manual/reference/aggregation-variables/
The Project Stage
• $project allows you to shape the documents into what you need for the next stage.
  – The simplest form of shaping is using $project to select only the fields you are interested in.
  – $project can also create new fields from other fields in the input document.
    * E.g., you can pull a value out of an embedded document and put it at the top level.
    * E.g., you can create a ratio from the values of two fields and pass it along as a single field.
• $project produces one output document for every input document it sees.
A Twitter Dataset
• Let’s look at some examples that illustrate the MongoDB aggregation framework.
• These examples operate on a collection of tweets.
  – As with any dataset of this type, it’s a snapshot in time.
  – It may not reflect the structure of Twitter feeds as they look today.
Tweets Data Model
{
"text" :"Something interesting ...",
"entities" :{
"user_mentions" :[
{
"screen_name" :"somebody_else",
...
}
],
"urls" :[],
"hashtags" :[]
},
"user" :{
"friends_count" :544,
"screen_name" :"somebody",
"followers_count" :100,
...
}
}
Analyzing Tweets
• Imagine the types of analyses one might want to do on tweets.
• It’s common to analyze the behavior of users and the networks involved.
• Our examples will focus on this type of analysis.
Friends and Followers
• Let’s look again at two stages we touched on earlier:
  – $match
  – $project
• In our dataset:
  – friends are those a user follows.
  – followers are others that follow a user.
• Using these operators we will write an aggregation pipeline that will:
  – Ignore anyone with no friends and no followers.
  – Calculate who has the highest followers-to-friends ratio.
Exercise: Friends and Followers
db.tweets.aggregate( [
  { $match: { "user.friends_count": { $gt: 0 },
              "user.followers_count": { $gt: 0 } } },
  { $project: { ratio: { $divide: [ "$user.followers_count",
                                    "$user.friends_count" ] },
                screen_name: "$user.screen_name" } },
  { $sort: { ratio: -1 } },
  { $limit: 1 }
] )
Exercise: $match and $project
• There is one document per Twitter user.
• Of the users in the “Brasilia” time zone who have tweeted 100 times or more, who has the largest number of followers?
• Time zone is found in the “time_zone” field of the user object in each tweet.
• The number of tweets for each user is found in the “statuses_count” field.
• A result document should look something like the following:
{ _id: ObjectId('52fd2490bac3fa1975477702'),
  followers: 2597,
  screen_name: 'marbles',
  tweets: 12334
}
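One possible pipeline for this exercise, as a sketch; the field names follow the dataset described above:

```javascript
db.tweets.aggregate([
  { $match: { "user.time_zone": "Brasilia",
              "user.statuses_count": { $gte: 100 } } },
  { $project: { followers: "$user.followers_count",
                screen_name: "$user.screen_name",
                tweets: "$user.statuses_count" } },
  { $sort: { followers: -1 } },
  { $limit: 1 }
])
```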
The Group Stage
• For those coming from the relational world, $group is similar to the SQL GROUP BY statement.
• $group operations require that we specify which field to group on.
• Documents with the same identifier will be aggregated together.
• With $group, we aggregate values using accumulators18.
Tweet Source
• The tweets in our twitter collection have a field called source.
• This field describes the application that was used to create the tweet.
• Let’s write an aggregation pipeline that identifies the applications most frequently used to publish tweets.
Exercise: Tweet Source
db.tweets.aggregate( [
  { "$group": { "_id": "$source",
                "count": { "$sum": 1 } } },
  { "$sort": { "count": -1 } }
] )
Group Aggregation Accumulators
Accumulators available in the group stage:
•$sum
•$avg
•$first
•$last
•$max
•$min
•$push
•$addToSet
18 http://docs.mongodb.org/manual/meta/aggregation-quick-reference/#accumulators
Rank Users by Number of Tweets
• One common task is to rank users based on some metric.
• Let’s look at who tweets the most.
• Earlier we did the same thing for tweet source.
  – Group together all tweets by a user for every user in our collection
  – Count the tweets for each user
  – Sort in decreasing order
• Let’s add the list of tweets to the output documents.
• We need to use an accumulator that works with arrays.
• We can use either $addToSet or $push.
Exercise: Adding List of Tweets
For each user, aggregate all their tweets into a single array.
db.tweets.aggregate( [
  { "$group": { "_id": "$user.screen_name",
                "tweet_texts": { "$push": "$text" },
                "count": { "$sum": 1 } } },
  { "$sort": { "count": -1 } },
  { "$limit": 3 }
] )
The Sort Stage
• Uses the $sort operator
• Works like the sort() cursor method
• 1 to sort ascending; -1 to sort descending
• E.g., db.testcol.aggregate( [ { $sort: { b: 1, a: -1 } } ] )
The Skip Stage
• Uses the $skip operator
• Works like the skip() cursor method.
• Value is an integer specifying the number of documents to skip.
• E.g., the following will pass all but the first 3 documents to the next stage in the pipeline.
  – db.testcol.aggregate( [ { $skip: 3 }, ... ] )
The Limit Stage
• Used to limit the number of documents passed to the next aggregation stage.
• Works like the limit() cursor method.
• Value is an integer.
• E.g., the following will only pass 3 documents to the stage that comes next in the pipeline.
  – db.testcol.aggregate( [ { $limit: 3 }, ... ] )
The Lookup Stage
• Pulls documents from a second collection into the pipeline
  – The second collection must be in the same database
  – The second collection cannot be sharded
• Joins documents based on a matching field in each collection
• Previously, you could get this behavior with two separate queries
The Lookup Stage (continued)
• Joins documents based on a matching field in each collection
• Previously, you could get this behavior with two separate queries
  – One to the collection that contains reference values
  – The other to the collection containing the documents referenced
Example: Using $lookup
• Import the companies dataset into a collection called companies
• Create a separate collection for $lookup
// BEGIN EXAMPLES LOOKUP INSERT
db.commentOnCategory.insertMany( [
  { category_id: "consulting",
    comment: "Consulting - giving advice" },
  { category_id: "consulting",
    comment: "Consulting - providing human resources" },
  { category_id: "enterprise",
    comment: "Enterprise - constructing starships" },
  { category_id: "finance",
    comment: "Finance - making money" },
  { category_id: "hardware",
    comment: "Hardware - from a hammer to a laptop" },
  { category_id: "software",
    comment: "Software - everything else that is missing in order to have a solution" },
  { category_id: null,
    comment: "Null - have not decided yet what the business is about" },
  { category_id: null,
    comment: "Null - can't really disclose what we do" },
  { category_id: null,
    comment: "Null - is not in business anymore" }
] )
// END EXAMPLES LOOKUP INSERT
// BEGIN EXAMPLES LOOKUP AGGREGATION
db.companies.aggregate( [
  { $match: { number_of_employees: { $gte: 200000 } } },
  { $sort: { number_of_employees: -1 } },
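The printed pipeline breaks off before the $lookup stage itself. A plausible completion, assuming the companies documents carry a category_code field that matches category_id in commentOnCategory (a sketch, not the original example):

```javascript
  { $lookup: {
      from: "commentOnCategory",    // collection created above
      localField: "category_code",  // assumed field in companies
      foreignField: "category_id",
      as: "categoryComments"        // matches land in this array
  } },
  { $project: { name: 1, number_of_employees: 1, categoryComments: 1 } }
] )
```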
10 Views
Views Tutorial (page 157) Creating and Deleting views
Lab: Vertical Views (page 159) Creating a vertical view lab
Lab: Horizontal Views (page 160) Creating a horizontal view lab
Lab: Reshaped Views (page 161) Creating a reshaped view lab
10.1 Views Tutorial
Learning Objectives
Upon completing this module students should understand:
• What a view is
• What views are useful for
•Howtocreateanddropaview
•Internalmechanismsofaview
What a View is
• A non-materialized collection created from one or more other collections.
• For those who are used to SQL, MongoDB views are equivalent.
• Can be thought of as a predefined aggregation that can be queried.
What Views are useful for
• Views provide an excellent mechanism for data abstraction.
• Views provide an excellent means to protect data:
  – Sensitive data from a collection can be projected out of the view
  – Views are read-only
  – Combined with role-based authorization, views allow selecting information by role
How to create and drop a view
• Creating a view is a straightforward process.
  – We must give our view a <name>, which will be the name we can access it by
  – We must specify a <source> collection
  – We must define an aggregation <pipeline> to fill our new view with data
  – Optionally, we may also specify a <collation>
Example - Creating a view
// db.createView(<name>, <source>, <pipeline>, <collation>)
db.createView("contact_info", "patients", [
  { $project: {
      _id: 0,
      first_name: 1,
      last_name: 1,
      gender: 1,
      email: 1,
      phone: 1
  } }
])
// views are shown along with other collections
show collections
// views metadata is stored in the system.views collection
db.system.views.find()
Dropping Views
• Views can be dropped like any other collection
db.contact_info.drop()
Internal mechanisms of a view
Views can be thought of as a predefined aggregation. As such:
• Views do not contain any data nor take disk space by themselves
• Views benefit greatly from indexes on the source collection in their $match stage
• Views are considered sharded if their underlying collection is sharded
• Views are immutable and cannot be renamed
• A view will not be removed if the underlying collection is removed
10.2 Lab: Vertical Views
Exercise: Vertical View Creation
It is useful to create vertical views to give us a lens into a subset of our overall data.
• Start by importing the necessary data if you have not already.
tar xvzf views_dataset.tar.gz
# for version >= 3.4
mongoimport -d companies -c complaints --drop views_dataset.json
To help you verify your work, there are 404816 entries in this dataset.
Exercise : Vertical View Creation Instructions
Once you’ve verified the data import was successful:
• Create a view that only shows complaints in New York
• Ensure the view shows the most recently submitted complaints by default
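One way to satisfy these requirements, as a sketch in the companies database (the view name matches the sample output on the next slide; fields follow the imported dataset):

```javascript
// View over complaints: only NY, newest submissions first.
db.createView("companyComplaintsInNY", "complaints", [
  { $match: { state: "NY" } },
  { $sort: { date_received: -1 } }
])
```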
Exercise : Vertical View Creation Instructions Result
The resulting data should look like:
db.companyComplaintsInNY.findOne()
{
"complaint_id" :1416985,
"product" :"Debt collection",
"sub-product" :"",
"issue" :"Cont’d attempts collect debt not owed",
"sub-issue" :"Debt is not mine",
"state" :"NY",
"zip_code" :11360,
"submitted_via" :"Web",
"date_received" :ISODate("2015-06-11T04:00:00Z"),
"date_sent_to_company" :ISODate("2015-06-11T04:00:00Z"),
"company" :"Transworld Systems Inc.",
"company_response" :"In progress",
"timely_response" :"Yes",
"consumer_disputed" :""
}
Exercise: Vertical View Creation Validation Instructions
Verify the view is functioning correctly.
• Insert the document on the following slide
• Query your newly created view
• The newly inserted document should be the first in the result set
Exercise: Vertical View Creation Validation Instructions Cont’d
db.complaints.insert({
"complaint_id" :987654,
"product" :"Food and Beverage",
"sub-product" :"Coffee",
"issue" :"Coffee is too hot",
"sub-issue" :"",
"state" :"NY",
"zip_code" :11360,
"submitted_via" :"Web",
"date_received" :new Date(),
"date_sent_to_company" :"pending",
"company" :"CoffeeMerks",
"company_response" :"",
"timely_response" :"",
"consumer_disputed" :""
})
10.3 Lab: Horizontal Views
Exercise: Horizontal View Creation
Horizontal views allow us to provide a selective set of fields of the underlying collection of documents for efficiency
and role-based filtering of data.
• Let’s go ahead and create a horizontal view of our dataset.
• Start by importing the necessary data if you have not already.
mongoimport -d companies -c complaints --drop views_dataset.json
To help you verify your work, there are 404816 entries in this dataset.
Exercise : Horizontal View Creation Instructions
Once you’ve verified the data import was successful, create a view that only shows the following fields:
•product
•company
•state
Exercise: Horizontal View Creation Instructions Result
The resulting data should look like:
db.productComplaints.findOne()
{
"product" :"Debt collection",
"state" :"FL",
"company" :"Enhanced Recovery Company, LLC"
}
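A sketch of one possible solution (the view name is taken from the sample output above; $project excludes _id so that only the three requested fields appear):

```javascript
use companies
db.createView("productComplaints", "complaints", [
  { $project: { _id: 0, product: 1, company: 1, state: 1 } }
])
```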
10.4 Lab: Reshaped Views
Exercise: Reshaped View
We can create a reshaped view of a collection to enable more intuitive data queries and make it easier for applications
to perform analytics.
It is also possible to create a view from a view.
• Use the aggregation framework to create a reshaped view of our dataset.
• It is necessary to have completed Lab: Horizontal Views (page 160)
Exercise: Reshaped View Specification
Create a view that can be queried by company name and shows the number of complaints by state. The resulting data
should look like:
db.companyComplaintsByState.find({"company":"ROCKY MOUNTAIN MORTGAGE COMPANY"})
{
"company" :"ROCKY MOUNTAIN MORTGAGE COMPANY",
"states" :[
{
"state" :"TX",
"count" :4
}
]
}
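One possible pipeline, as a sketch; the exact stages may differ from the intended solution, but they produce the document shape shown above:

```javascript
use companies
db.createView("companyComplaintsByState", "complaints", [
  // count complaints per (company, state) pair
  { $group: { _id: { company: "$company", state: "$state" },
              count: { $sum: 1 } } },
  // collect the per-state counts under each company
  { $group: { _id: "$_id.company",
              states: { $push: { state: "$_id.state", count: "$count" } } } },
  { $project: { _id: 0, company: "$_id", states: 1 } }
])
```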
11 Security
Security Introduction (page 162) A high-level overview of security in MongoDB
Authorization (page 165) Authorization in MongoDB
Lab: Administration Users (page 171) Lab on creating admin users
Lab: Create User-Defined Role (Optional) (page 172) Lab on creating custom user roles
Authentication (page 174) Authentication in MongoDB
Lab: Secure mongod (page 175) Lab on standing up a mongod with authorization enabled
Auditing (page 176) Auditing in MongoDB
Encryption (page 178) Encryption at rest in MongoDB
Log Redaction (page 180) Enabling log redaction in MongoDB
Lab: Secured Replica Set - KeyFile (Optional) (page 181) Using keyfiles to secure a replica set
Lab: LDAP Authentication & Authorization (Optional) (page 184) Authentication & authorization with LDAP
Lab: Security Workshop (page 186) Securing a full deployment
11.1 Security Introduction
Learning Objectives
Upon completing this module students should understand:
• A high-level overview of security in MongoDB
• Security options for MongoDB
– Authentication
– Authorization
– Transport Encryption
– Enterprise-only features
A High-Level Overview
Security Mechanisms
Authentication Options
•Community
–Challenge/response authentication using SCRAM-SHA-1 (username & password)
–X.509 Authentication (using X.509 Certificates)
• Enterprise
–Kerberos
–LDAP
Authorization via MongoDB
• Predefined roles
• Custom roles
• LDAP authorization (MongoDB Enterprise)
– Query LDAP server for groups to which a user belongs.
– Distinguished names (DN) are mapped to roles on the admin database.
– Requires external authentication (X.509, LDAP, or Kerberos).
Transport Encryption
• TLS/SSL
– May use certificates signed by a certificate authority or self-signed.
• FIPS (MongoDB Enterprise)
Network Exposure Options
• bindIp limits the IP addresses the server listens on.
• Using a non-standard port can provide a layer of obscurity.
• MongoDB should still be run only in a trusted environment.
Security Flow
11.2 Authorization
Learning Objectives
Upon completing this module, students should be able to:
• Outline MongoDB's authorization model
• List authorization resources
• Describe actions users can take in relation to resources
• Create roles
• Create privileges
• Outline MongoDB built-in roles
• Grant roles to users
• Explain LDAP authorization
Authorization vs Authentication
Authorization and authentication are commonly confused concepts:
• Authorization defines the rules by which users can interact with a given system:
– Which operations they can perform
– Over which resources
• Authentication is the mechanism by which users identify themselves and are granted access to a system:
– Validation of credentials and identities
– Controls access to the system and operational interfaces
Authorization Basics
• MongoDB enforces a role-based authorization model.
• A user is granted roles that determine the user's access to database resources and operations.
The model determines:
• Which roles are granted to users
• Which privileges are associated with roles
• Which actions can be performed over different resources
What is a resource?
• Databases?
• Collections?
• Documents?
• Users?
• Nodes?
• Shards?
• Replica sets?
Authorization Resources
• Databases
• Collections
• Cluster
Cluster Resources
[Figure: a sharded cluster (config servers, mongos routers, and shards) illustrating cluster-level resources]
Types of Actions
Given a resource, we can consider the available actions:
• Query and write actions
• Database management actions
• Deployment management actions
• Replication actions
• Sharding actions
• Server administration actions
• Diagnostic actions
• Internal actions
Specific Actions of Each Type
Query / Write Database Mgmt Deployment Mgmt
find enableProfiler planCacheRead
insert createIndex storageDetails
remove createCollection authSchemaUpgrade
update changeOwnPassword killop
... ...
See the complete list of actions19 in the MongoDB documentation.
Authorization Privileges
A privilege defines a pairing between a resource and a set of permitted actions.
Resource:
{"db":"yourdb","collection":"mycollection"}
Action: find
Privilege:
{
resource:{"db":"yourdb","collection":"mycollection"},
actions:["find"]
}
Authorization Roles
MongoDB grants access to data through a role-based authorization system:
• Built-in roles: pre-canned roles that cover the most common sets of privileges users may require
• User-defined roles: if there is a specific set of privileges not covered by the existing built-in roles, you are able
to create your own roles
Built-in Roles
Database Admin: dbAdmin, dbOwner, userAdmin
Cluster Admin: clusterAdmin, clusterManager, clusterMonitor, hostManager
All Databases: readAnyDatabase, readWriteAnyDatabase, userAdminAnyDatabase, dbAdminAnyDatabase
Database User: read, readWrite
Backup & Restore: backup, restore
Superuser: root
Internal: __system
19 https://docs.mongodb.com/manual/reference/privilege-actions/
Built-in Roles
To grant roles while creating a user:
use admin
db.createUser(
{
user:"myUser",
pwd:"$up3r$3cr7",
roles:[
{role:"readAnyDatabase",db:"admin"},
{role:"dbOwner",db:"superdb"},
{role:"readWrite",db:"yourdb"}
]
}
)
Built-in Roles
To grant roles to an existing user:
use admin
db.grantRolesToUser(
"reportsUser",
[
{ role: "read", db: "accounts" }
]
)
User-defined Roles
• If no suitable built-in role exists, we can create a role.
• Define:
– Role name
– Set of privileges
– List of inherited roles (optional)
use admin
db.createRole({
role:"insertAndFindOnlyMyDB",
privileges:[
{resource:{db:"myDB", collection:"" }, actions:["insert","find"]}
],
roles:[]})
Role Privileges
To check the privileges of any particular role, we can use the getRole method:
db.getRole("insertAndFindOnlyMyDB", {showPrivileges:true})
LDAP Authorization
As of MongoDB 3.4, MongoDB supports authorization with LDAP.
How it works:
1. User authenticates via an external mechanism
$ mongo --username alice \
--password secret \
--authenticationMechanism PLAIN \
--authenticationDatabase ’$external’
LDAP Authorization (cont’d)
2. Username is transformed into an LDAP query
[
{
match:"(.+)@ENGINEERING",
substitution:"cn={0},ou=engineering,dc=example,dc=com"
}, {
match:"(.+)@DBA",
substitution:"cn={0},ou=dba,dc=example,dc=com"
}
]
LDAP Authorization (cont’d)
3. MongoDB queries the LDAP server
• A single entity's attributes are treated as the user's roles
• Multiple entities' distinguished names are treated as the user's roles
Mongoldap
mongoldap can be used to test configurations between MongoDB and an LDAP server
$ mongoldap -f mongod.conf \
--user "uid=alice,ou=Users,dc=example,dc=com" \
--password secret
11.3 Lab: Administration Users
Premise
Security roles often span different levels:
• Superuser roles
• DBA roles
• System administration roles
• User administration roles
• Application roles
In this lab we will look at several types of administration roles.
User Administration user
• Generally, in complex systems, we need someone to administer users.
• This role should be different from a root-level user for a few reasons:
• root-level users should only be used as a last resort
• Administration of users is generally handled by security officers
Create User Admin user
Create a user that will administer other users:
db.createUser(
{
user:"securityofficer",
pwd:"doughnuts",
customData:{ notes:["admin","the person that adds other persons"]},
roles:[
{ role: "userAdminAnyDatabase", db: "admin" }
]
})
Create DBA user
DBAs are generally concerned with maintenance operations in the database.
db.createUser(
{
user:"dba",
pwd:"i+love+indexes",
customData:{ notes:["admin","the person that admins databases"]},
roles:[
{ role: "dbAdmin", db: "X" }
]
})
If we want to make sure this DBA can administer all databases of the system, which role(s) should he have? See the
MongoDB documentation20.
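One likely answer, shown as a sketch: the dbAdminAnyDatabase role grants dbAdmin privileges on all databases.

```javascript
// Sketch: grant dbAdminAnyDatabase to the existing dba user
use admin
db.grantRolesToUser("dba", [ { role: "dbAdminAnyDatabase", db: "admin" } ])
```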
Create a Cluster Admin user
Cluster administration is generally an operational role that differs from the DBA role in that it is more focused on
deployment and cluster node management.
For a team managing a cluster, what roles enable individuals to do the following?
• Add and remove replica nodes
• Manage shards
• Do backups
• Not be able to read data from any application database
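One possible combination, sketched as a user creation; the username and password below are placeholders, and the role choices should be checked against the built-in roles reference:

```javascript
use admin
db.createUser({
  user: "clusterOps",          // placeholder name
  pwd: "choose+a+password",    // placeholder password
  roles: [
    { role: "clusterManager", db: "admin" },  // add/remove replica nodes, manage shards
    { role: "backup", db: "admin" },          // take backups
    { role: "restore", db: "admin" }
    // note: none of these roles grant read access to application data
  ]
})
```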
11.4 Lab: Create User-Defined Role (Optional)
Premise
• MongoDB provides a set of built-in roles.
• Please consider those before generating another role on your system.
• Sometimes it is necessary to create roles that match the specific needs of a system.
• For that we can rely on user-defined roles that system administrators can create.
• This function should be carried out by userAdmin-level administration users.
20 https://docs.mongodb.com/manual/reference/built-in-roles/
Define Privileges
• Roles are sets of privileges that a user is granted.
• Create a role with the following privileges:
– User can read user details from database brands
– Can list all collections of database brands
– Can update all collections on database brands
– Can write to the collection automotive in database brands
Create the JSON array that describes the requested set of privileges.
Create Role
• Given the privileges we just defined, we now need to create this role specific to database brands.
• The name of this role should be carlover
• What command do we need to issue?
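One possible solution, as a sketch; the action names come from the privilege-actions reference, and the exact set may differ from the intended answer:

```javascript
use brands
db.createRole({
  role: "carlover",
  privileges: [
    // read user details, list collections, and update any collection in brands
    { resource: { db: "brands", collection: "" },
      actions: [ "viewUser", "listCollections", "find", "update" ] },
    // write access to brands.automotive
    { resource: { db: "brands", collection: "automotive" },
      actions: [ "insert", "update", "remove" ] }
  ],
  roles: []
})
```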
Grant Role: Part 1
We now want to grant this role to the user named ilikecars on the database brands.
use brands;
db.createUser(
{
user:"ilikecars",
pwd:"ferrari",
customData:{notes:["application user"]},
roles:[
{role:"carlover",db:"brands"}
]
})
Grant Role: Part 2
• We now want to grant greater responsibility to our recently created ilikecars!
• Let's grant the dbOwner role to the ilikecars user.
Revoke Role
• Let's assume that the role carlover is no longer valid for user ilikecars.
• How do we revoke this role?
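Both the grant and the revoke can be sketched as follows (assuming the user and role were created on brands as in the earlier steps):

```javascript
use brands
// Part 2: grant greater responsibility
db.grantRolesToUser("ilikecars", [ { role: "dbOwner", db: "brands" } ])
// later: revoke the role that is no longer valid
db.revokeRolesFromUser("ilikecars", [ { role: "carlover", db: "brands" } ])
```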
11.5 Authentication
Learning Objectives
Upon completing this module, you should understand:
• Authentication mechanisms
• External authentication
• Native authentication
• Internal node authentication
• Configuration of authentication mechanisms
Authentication
• Authentication is concerned with:
– Validating identities
– Managing certificates / credentials
– Allowing accounts to connect and perform authorized operations
• MongoDB provides native authentication and supports X.509 certificates, LDAP, and Kerberos as well.
Authentication Mechanisms
MongoDB supports a number of authentication mechanisms:
• SCRAM-SHA-1 (default >= 3.0)
• MONGODB-CR (legacy)
• X.509 Certificates
• LDAP (MongoDB Enterprise)
• Kerberos (MongoDB Enterprise)
Internal Authentication
For internal authentication purposes (the mechanism used by replica sets and sharded clusters), MongoDB relies on:
• Keyfiles
– Shared password file used by replica set members
– 6 to 1024 characters from the base64 character set
• X.509 Certificates
Simple Authentication Configuration
To get started we just need to make sure we are launching our mongod instances with the --auth parameter.
mongod --dbpath /data/db --auth
For any connections to be established to this mongod instance, the system will require a username and password.
mongo --authenticationDatabase admin -u user -p
MongoDB shell version: 3.2.5
Enter password:
11.6 Lab: Secure mongod
Premise
It is time for us to get started setting up our first MongoDB instance with authentication enabled!
Launch mongod
Let’s start by launching a mongod instance:
mkdir /data/secure_instance_dbpath
mongod --dbpath /data/secure_instance_dbpath --port 28000
At this point there is nothing special about this setup. It is just an ordinary mongod instance ready to receive connections.
Root level user
Create a root level user:
mongo --port 28000 admin   # puts you in the admin database
use admin
db.createUser( {
user:"maestro",
pwd:"maestro+rules",
customData:{ information_field:"information value" },
roles:[ {role:"root",db:"admin" }]
})
Enable Authentication
Launch mongod with auth enabled
mongo admin --port 28000 --eval ’db.shutdownServer()’
mongod --port 28000 --dbpath /data/secure_instance_dbpath --auth
Connect using the recently created maestro user.
mongo --port 28000 admin -u maestro -p
11.7 Auditing
Learning Objectives
Upon completing this module, you should be able to:
• Outline the auditing capabilities of MongoDB
• Enable auditing
• Summarize auditing configuration options
Auditing
• MongoDB Enterprise includes an auditing capability for mongod and mongos instances.
• The auditing facility allows administrators and users to track system activity
• Important for deployments with multiple users and applications.
Audit Events
Once enabled, the auditing system can record the following operations:
• Schema
• Replica set and sharded cluster
• Authentication and authorization
• CRUD operations (DML, off by default)
Auditing Configuration
The following are command-line parameters to mongod/mongos used to configure auditing.
Enable auditing with --auditDestination.
• --auditDestination: where to write the audit log
– syslog
– console
– file
• --auditPath: audit log path in case we define "file" as the destination
Auditing Configuration (cont’d)
• --auditFormat: the output format of the emitted event messages
– BSON
– JSON
• --auditFilter: an expression that will filter the types of events the system records
By default we only audit DDL operations, but we can also enable DML (requires auditAuthorizationSuccess set to
true)
Auditing Message
The audit facility emits a message every time an auditable event occurs:
{
atype: <String>,
ts : { "$date": <timestamp> },
local: { ip: <String>, port: <int> },
remote: { ip: <String>, port: <int> },
users : [ { user: <String>, db: <String> }, ... ],
roles: [ { role: <String>, db: <String> }, ... ],
param: <document>,
result: <int>
}
Auditing Configuration
If we want to configure our audit system to generate a JSON file, we would issue the following command:
mongod --auditDestination file --auditPath /some/dir/audit.log --auditFormat JSON
If we want to capture events from a particular user myUser:
mongod --auditDestination syslog --auditFilter ’{"users.user": "myUser"}’
To enable DML we need to set a specific parameter:
mongod --auditDestination console --setParameter auditAuthorizationSuccess=true
11.8 Encryption
Learning Objectives
Upon completing this module, students should understand:
• The encryption capabilities of MongoDB
• Network encryption
• Native encryption
• Third-party integrations
Encryption
MongoDB offers two levels of encryption:
• Transport layer
• Encryption at rest (MongoDB Enterprise >= 3.2)
Network Encryption
• MongoDB enables TLS/SSL for transport-layer encryption of traffic between nodes in a cluster.
• Three different network architecture options are available:
– Encryption of application traffic connections
– Full encryption of all connections
– Mixed encryption between nodes
Native Encryption
MongoDB Enterprise comes with an encrypted storage engine.
• Native encryption supported by WiredTiger
• Encrypts data at rest
– AES256-CBC: 256-bit Advanced Encryption Standard in Cipher Block Chaining mode (default)
* symmetric key (same key to encrypt and decrypt)
– AES256-GCM: 256-bit Advanced Encryption Standard in Galois/Counter Mode
– FIPS is also available
• Enables integration with key management tools
Encryption and Replication
• Encryption is not part of replication:
– Data is not natively encrypted on the wire
* Requires transport encryption to ensure secured transmission
– Encryption keys are not replicated
* Each node should have its own individual keys
Third Party Integration
• Key Management Interoperability Protocol (KMIP)
– Integrates with Vormetric Data Security Manager (DSM) and SafeNet KeySecure
• Storage Encryption
– Linux Unified Key Setup (LUKS)
– IBM Guardium Data Encryption
– Vormetric Data Security Platform
* Also enables Application Level Encryption on a per-field or per-document basis
– Bitlocker Drive Encryption
11.9 Log Redaction
Learning Objectives
Upon completing this module students should understand:
• What log redaction is
• How to enable and disable log redaction
What is log redaction?
• Log redaction, when enabled, prevents the following:
– Details about specific queries from showing in the log when verbose mode is enabled
– Details about specific queries that trigger a profiling event (a slow query, for example)
Enabling Log Redaction
• There are several ways to enable log redaction:
– In the configuration file via redactClientLogData: true under security
– Passing the command line argument --redactClientLogData when starting a mongod or mongos
– Connecting to a mongod or mongos and running:
db.adminCommand({
setParameter:1, redactClientLogData:true
})
Exercise: Enable Log Redaction Setup
For this exercise we're going to start a mongod process with verbose logging enabled and then enable log redaction.
• Start a mongod with verbose logging enabled
mkdir -p data/db
mongod -v --dbpath data/db --logpath data/mongod.log --logappend --port 31000 --fork
• In another terminal, tail the mongod.log to view realtime logging events
tail -f data/mongod.log
Exercise: Enable Log Redaction (cont)
• Connect to your mongod process from the shell.
• Use a database called rd and insert a document, observing the output in mongod.log with tail.
mongo --port 31000
use rd
db.foo.insertOne({name:"bob", medicalCondition:"SENSITIVE, should not be logged"})
• In the log output, you should see something similar to the following:
2017-04-28T09:39:41.629-0700 I COMMAND [conn1] command rd.foo appName: "MongoDB Shell" command: insert {
insert: "foo", documents: [{ _id: ObjectId('5903704d2482ced24904c8a6'),
name: "bob", medicalCondition: "SENSITIVE, should not be logged"
}],
...
Exercise: Enable Log Redaction (cont)
• From the mongo shell, enable log redaction
• Insert another document
mongo --port 31000
// enable log redaction at runtime (one of the options described above)
db.adminCommand({ setParameter:1, redactClientLogData:true })
use rd
db.foo.insertOne({name:"mary", medicalCondition:"SENSITIVE, should not be logged"})
• Verify that the document is being redacted in the log
2017-04-28T12:23:07.111-0700 I COMMAND [conn1] command rd.foo appName: "MongoDB Shell" command: insert {
insert: "###", documents: [{ _id: "###", name: "###", medicalCondition: "###" }],
...
11.10 Lab: Secured Replica Set - KeyFile (Optional)
Premise
Security and replication are two aspects that are often neglected during the development phase in favor of usability and
faster development.
These are also important aspects to take into consideration for your production environments, since you probably don't
want your production environment unsecured and without high availability!
This lab will get you fully acquainted with all the steps necessary to create a secured replica set using the keyfile
cluster authentication mode.
Setup Secured Replica Set
A few steps are required to fully set up a secured replica set:
1. Instantiate one mongod node with no auth enabled
2. Create a root level user
3. Create a clusterAdmin user
4. Generate a keyfile for internal node authentication
5. Re-instantiate a mongod with auth enabled, keyfile defined and replSet name
6. Add Replica Set nodes
We will also base our setup on MongoDB configuration files21
Instantiate mongod
This is a rather simple operation that requires just a few instructions:
$ pwd
/data
$ mkdir -p /data/secure_replset/{1,2,3}; cd secure_replset/1
Then go to this yaml file22 and copy it into your clipboard
$ pbpaste > mongod.conf; cat mongod.conf
Instantiate mongod (cont’d)
systemLog:
destination: file
path: "/data/secure_replset/1/mongod.log"
logAppend: true
storage:
dbPath: "/data/secure_replset/1"
wiredTiger:
engineConfig:
cacheSizeGB: 1
net:
port: 28001
processManagement:
fork: true
# setParameter:
#   enableLocalhostAuthBypass: false
# security:
#   keyFile: /data/secure_replset/1/mongodb-keyfile
21 https://docs.mongodb.org/manual/reference/configuration-options/
22 https://github.com/thatnerd/work-public/blob/master/mongodb_trainings/secure_replset_config.yaml
Instantiate mongod (cont’d)
After defining the basic configuration we just need to call mongod passing the configuration file.
mongod -f mongod.conf
Create root user
We start by creating our typical root user:
$ mongo admin --port 28001
>use admin
>db.createUser(
{
user:"maestro",
pwd:"maestro+rules",
roles:[
{ role: "root", db: "admin" }
]
})
Create clusterAdmin user
We then need to create a clusterAdmin user to enable management of our replica set.
$ mongo admin --port 28001
>db.createUser(
{
user:"pivot",
pwd:"i+like+nodes",
roles:[
{ role: "clusterAdmin", db: "admin" }
]
})
Generate a keyfile
For internal Replica Set authentication we need to use a keyfile.
openssl rand -base64 741 > /data/secure_replset/1/mongodb-keyfile
chmod 600 /data/secure_replset/1/mongodb-keyfile
Add keyfile to the configuration file
Now that we have the keyfile generated, it's time to add that information to our configuration file. Just uncomment the
last few lines.
systemLog:
destination: file
path: "/data/secure_replset/1/mongod.log"
logAppend: true
storage:
dbPath: "/data/secure_replset/1"
net:
port: 28001
processManagement:
fork: true
setParameter:
enableLocalhostAuthBypass: false
security:
keyFile: /data/secure_replset/1/mongodb-keyfile
Configuring Replica Set
• Now it's time to configure our replica set
• The replica set should be named "VAULT"
• It should consist of 3 data-bearing nodes
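The final step can be sketched as follows, assuming nodes listening on ports 28001, 28002 and 28003, each started with the same keyfile and with replSetName: VAULT in its configuration file (host names below are placeholders):

```javascript
// connect to the first node and authenticate as the cluster admin:
//   mongo admin --port 28001
db.auth("pivot", "i+like+nodes")
rs.initiate({
  _id: "VAULT",
  members: [
    { _id: 0, host: "localhost:28001" },
    { _id: 1, host: "localhost:28002" },
    { _id: 2, host: "localhost:28003" }
  ]
})
rs.status()   // verify all 3 data-bearing members come up
```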
11.11 Lab: LDAP Authentication & Authorization (Optional)
Premise
• Authentication and authorization with an external service (like LDAP) is an important functionality for large
organizations that rely on centralized user management tools.
• This lab is designed to get you familiar with the procedure to run a mongod with authentication and authorization
enabled with an external LDAP service.
Test Connection to LDAP
• An LDAP server is up and running for you to connect to.
• Server Info:
– Server Address: 192.168.19.100:8389
– User: uid=alice,ou=Users,dc=mongodb,dc=com
– Password: secret
Test Connection to LDAP (cont’d)
• Your goal is to fill in the following configuration file and get mongoldap to successfully talk to the LDAP
server with the following command:
$ mongoldap --config mongod.conf --user alice --password secret
...
security:
authorization: "enabled"
ldap:
servers: "XXXXXXXXXXXXXX:8389"
authz:
queryTemplate: "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
userToDNMapping: ’[{match: "XXXX", substitution:
!→"XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"}]’
transportSecurity: "none"
bind:
method: "simple"
setParameter:
authenticationMechanisms: PLAIN
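A plausible completion of the blanks, as a sketch only; the queryTemplate and userToDNMapping values below are assumptions inferred from the server info above and should be verified with mongoldap before use:

```yaml
security:
  authorization: "enabled"
  ldap:
    servers: "192.168.19.100:8389"
    authz:
      # ask the LDAP server for the groups the user belongs to
      queryTemplate: "{USER}?memberOf?base"
    # map the bare username to a full DN (assumed pattern)
    userToDNMapping: '[{match: "(.+)", substitution: "uid={0},ou=Users,dc=mongodb,dc=com"}]'
    transportSecurity: "none"
    bind:
      method: "simple"
setParameter:
  authenticationMechanisms: PLAIN
```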
Authentication with LDAP
• Once you've successfully connected to LDAP with mongoldap, you should be able to use the same config file
with mongod.
$ mongod --config mongod.conf
• From here you should be able to authenticate with alice and secret.
$ mongo --username alice \
--password secret \
--authenticationMechanism PLAIN \
--authenticationDatabase ’$external’
Authorization with LDAP
• After successfully authenticating with LDAP, you'll need to take advantage of the localhost exception to enable
authorization with LDAP.
• Create a role that allows anyone who is a part of the cn=admins,ou=Users,dc=mongodb,dc=com LDAP group
to be able to manage users (e.g., inheriting userAdminAnyDatabase).
• To confirm that you've successfully set up authorization, the following command should execute without error if
you're authenticated as alice, since she's a part of the group.
> use admin
> db.getRoles()
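Following the documented pattern of naming a role after the LDAP group's distinguished name, the role might be created like this (a sketch, run under the localhost exception):

```javascript
use admin
db.createRole({
  role: "cn=admins,ou=Users,dc=mongodb,dc=com",  // must match the group DN exactly
  privileges: [],
  roles: [ { role: "userAdminAnyDatabase", db: "admin" } ]
})
```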
11.12 Lab: Security Workshop
Learning Objectives
Upon completing this workshop, attendees will be able to:
• Secure application communication with MongoDB
• Understand all security authentication and authorization options of MongoDB
• Encrypt MongoDB data at rest using the encrypted storage engine
• Enable auditing and understand the performance implications
• Feel comfortable deploying and securely configuring MongoDB
Introduction
In this workshop, attendees will install and configure a secure replica set on servers running in AWS.
• We are going to secure the backend communications using TLS/SSL
• Enable authorization on the backend side
• Encrypt the storage layer
• Make sure that there are no "leaks" of information
List of exercises
• 1: Accessing your AWS instances
• 2: Starting MongoDB and configuring the replica set
• 3: Launch the Client Application
• 4: Set up local accounts
• 5: Enable SSL between the nodes
• 6: Enable SSL connection from the mongo shell and from the Application
• 7: Encrypt Storage Layer
• 8: Avoid any log leaks
• 9: Enable Auditing
Exercise: Accessing your instances from Windows
• Download and install Putty from http://www.putty.org/
• Start Putty with: All Programs > PuTTY > PuTTY
• In Session:
– In the Host Name box, enter centos@<publicIP>
– Under Connection type, select SSH
• In Connection/SSH/Auth:
– Browse to the AdvancedAdministrator.ppk file
• Click Open
• Detailed info at:
Connect to AWS with Putty23
Exercise: Accessing your instances from Linux or Mac
• Get your .pem file and close the permissions on it
chmod 600 AdvancedAdministrator.pem
• Enable the keychain and ssh into node1, propagating your credentials
ssh-add -K AdvancedAdministrator.pem
ssh -i AdvancedAdministrator.pem -A centos@54.235.1.1
• SSH into node2 from node1
ssh -A node2
Solution: Accessing your instances
In our machines we will have access to all nodes in the deployment:
cat /etc/hosts
A /share/downloads folder with all necessary software downloaded
ls /share/downloads
ls /etc/ssl/mongodb
23 http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/putty.html
Important Notes
• Only use sudo when needed, otherwise you can run into permission issues
– use 'sudo service ...' to start mongod
• The replica set should be named SECURED
• You must use config files to start the mongod processes, not command line options
– use /etc/mongod.conf
– careful with the spacing in the YAML file
Exercise 2: Starting MongoDB and configuring the replica set
• /share/downloads/mongodb_packages contains MongoDB 3.2 and 3.4
• Installation instructions are at:
– https://docs.mongodb.com/manual/tutorial/install-mongodb-enterprise-on-red-hat/
• Configure the 3 nodes as a replica set named SECURED
• Use node1, node2 and node3 for your host names
• You MUST use a config file24
Solution 2: Installing MongoDB
• Installation
cd /data/downloads/mongodb_packages
sudo yum install -y mongodb-enterprise-3.4.9-1.el7.x86_64.rpm
sudo vi /etc/mongod.conf
sudo service mongod start
sudo service mongod status
# if errors OR mongod not running ...
cat /var/log/mongodb/mongod.log
Solution 2: Config File (cont)
• Configure the 3 nodes as a replica set named SECURED; change bindIp to the 10.0.0.X address, plus 127.0.0.1
• Use /mongod-data/appdb for your dbpath
• All other defaults are fine for now
storage:
dbPath: /mongod-data/appdb/
...
replication:
replSetName: SECURED
net:
bindIp: 10.0.0.101,127.0.0.1
24 https://docs.mongodb.com/manual/reference/configuration-options/
Solution 2: Replica Set Config (cont)
cfg ={
_id: "SECURED",
version: 1,
members: [
{_id: 0, host: "node1:27017"},
{_id: 1, host: "node2:27017"},
{_id: 2, host: "node3:27017"}
]
}
rs.initiate(cfg)
rs.status()
Solution 2: Verification (cont)
Let’s try to connect to our running MongoDB cluster:
mongo --host node1
If you want to be sure to connect to the Primary, instead, use:
mongo --host SECURED/node1,node2,node3
Finally, verify that the replica set is healthy
rs.status()
Exercise 3: Launch the Client Application
It’s time to connect our client application. Install the application on node4
cd ~
tar xzvf /share/downloads/apps/security_lab.tgz
cd mongo-messenger
npm install
npm start
... webpack: bundle is now VALID.
• If you get an error running npm install, there is a workaround on the next page
• Connect to the public IP of your node4 instance, port 8080
http://NODE4-public-IP:8080
Fixing node/npm issue with SSL (Sept 2017 bug)
npm: relocation error: npm: symbol SSL_set_cert_cb, version libssl.so.10 not defined
!→in file libssl.so.10 with link time reference
Update the OpenSSL lib
sudo yum update -y openssl
OR install a newer NPM version:
curl https://raw.githubusercontent.com/creationix/nvm/v0.13.1/install.sh | bash
source ~/.bash_profile
nvm install v8.6.0
OR use Yarn
sudo wget https://dl.yarnpkg.com/rpm/yarn.repo -O /etc/yum.repos.d/yarn.repo
sudo yum install yarn
How is the client application connecting to the database?
• The connection string used by the application is in message.js and looks like this:
const url = "mongodb://node1:27017,node2:27017,node3:27017/security-lab?replicaSet=SECURED"
• Confirm that the tool is writing to the database by running the following in a mongo shell:
use security-lab
db.messages.find({from:"your_username"}).pretty()
• This will work, for now...
WARNING: Spying on your deployment!
Throughout the lab, the instructor will be spying on your deployment!
This checking is done by running a few scripts on your machines that will verify whether or not you have completely
secured your deployment.
We will come back to this later on.
Authorization and Authentication
Discussion on the following questions:
• What is the difference between authorization and authentication?
• Which authentication mechanism will you use?
• Which authorization support will you use?
Exercise 4: Set up local accounts
It is time to start securing the system.
To do this, you will have to decide:
• The set of users required to operate this system
• Some references:
– MongoDB authentication25
– role-based access control26
SSL certificates
25 https://docs.mongodb.com/manual/core/authentication/
26 https://docs.mongodb.com/manual/core/authorization/
Exercise 5: Enable SSL between the nodes
• We restricted "bindIp" to a local network interface; however, if this were an outside address, it would not be good
enough
• Let's ensure we limit the connections to a list of nodes we control
– Let's use SSL certificates
– As a reminder, they are in /etc/ssl/mongodb/
Exercise 6: Enable SSL connection from the mongo shell and from the Application
• You will need to combine the mongomessenger.key and mongomessenger.pem files together to
quickly test the connection in the mongo shell.
• After you have tested SSL from the mongo shell, update the client's connection info to connect over SSL27.
• Use mongomessenger.key, mongomessenger.pem, and messenger-CA.pem for your client connection.
# Concatenate the PEM and KEY files. 'cut' will add the missing end-of-line chars
cut -b 1- /etc/ssl/mongodb/mongomessenger.* > ~/client.pem
mongo --ssl --sslCAFile /etc/ssl/mongodb/messenger-CA.pem \
--sslPEMKeyFile ~/client.pem --host SECURED/node1,node2,node3
A note about X.509 certificates with clusterAuthMode: x509
• Certificates must differ from the root CA certificate in the subject area by at least one of the following:
• "O": Organization
• "OU": Organizational Unit
• "DC": Domain Component
• If the client presents a certificate that matches the CA certificate in these 3 fields, the client will be given root
access, circumventing any role-based access control.
Gaining root access
openssl x509 -noout -subject -in /etc/ssl/mongodb/ca.pem
openssl x509 -noout -subject -in /etc/ssl/mongodb/node1.pem
# "O", "OU" (and no "DC") in the subject lines are the same!
# now, gain root even if a user is created
mongo --ssl --sslCAFile /etc/ssl/mongodb/messenger-CA.pem --sslPEMKeyFile \
/etc/ssl/mongodb/node1.pem --authenticationMechanism MONGODB-X509 \
--authenticationDatabase=’$external’ --host SECURED/node1,node2,node3
# using correctly created certs
openssl x509 -noout -subject -in ~/client.pem
# will fail
mongo --ssl --sslCAFile /etc/ssl/mongodb/messenger-CA.pem --sslPEMKeyFile \
~/client.pem --authenticationMechanism MONGODB-X509 \
--authenticationDatabase=’$external’ --host SECURED/node1,node2,node3
# will connect
27 http://mongodb.github.io/node-mongodb-native/2.2/tutorials/connect/ssl/
mongo --ssl --sslCAFile /etc/ssl/mongodb/messenger-CA.pem \
--sslPEMKeyFile ~/client.pem --host SECURED/node1,node2,node3
# will not be authorized until auth’d with user
show dbs
Exercise 7: Encrypt Storage Layer
To fully secure our MongoDB deployment we need to consider the actual MongoDB instance files.
Your instructor has some scripts that will enable him to have a peek into your collection and index data files.
Don't let him do so!!!
Exercise 8: Avoid any log leaks
Logs are an important asset of your system.
They allow us to understand any potential issue with our cluster or deployment. But they can also leak some confidential
information!
Make sure that you do not have any data leaks into your logs.
This should be done without downtime.
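The redaction itself can be toggled at runtime, which is what makes the no-downtime requirement achievable. A minimal sketch, assuming a MongoDB 3.4+ deployment (redactClientLogData can also be set persistently in the security section of the configuration file):

```javascript
// From the mongo shell, against a running mongod -- no restart needed:
db.adminCommand({ setParameter: 1, redactClientLogData: true })
```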
Auditing
At this point we have a secured MongoDB deployment hardened against outside attacks, and used Role-Based Access
Control to limit the access of users.
• The final step is to enable auditing, giving us a clear record of who performed an auditable action.
Exercise 9: Enable Auditing
• Enable auditing for all operations, to include CRUD operations, for the security-lab database
• Output the log file in JSON format
• Output the log file to /mongod-data/audit/SECURED
• There are many filter options28
28 https://docs.mongodb.com/manual/tutorial/configure-audit-filters/
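As a sketch of what the exercise asks for — one possible filter shape, not the only one; the atype/param.ns form follows the filter-options tutorial referenced above, and auditing CRUD requires turning on auditAuthorizationSuccess:

```yaml
auditLog:
  destination: file
  format: JSON
  path: /mongod-data/audit/SECURED/audit.json
  filter: '{ atype: "authCheck", "param.ns": /^security-lab\./ }'
setParameter: { auditAuthorizationSuccess: true }
```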
Putting it together
storage:
  dbPath: /mongod-data/appdb/
  ...
net:
  ssl:
    mode: requireSSL
    PEMKeyFile: /etc/ssl/mongodb/node1.pem
    CAFile: /etc/ssl/mongodb/ca.pem
security:
  clusterAuthMode: x509
  enableEncryption: true
  encryptionKeyFile: /etc/ssl/mongodb/mongodb-keyfile
  redactClientLogData: true
Putting it together (cont)
auditLog:
  destination: file
  format: JSON
  path: /mongod-data/audit/SECURED/audit.json
  filter: '{ roles: { role: "readWrite", db: "security-lab" } }'
setParameter: { auditAuthorizationSuccess: true }
Summary
What we did:
• Enabled basic authorization
• Used SSL certificates for the cluster
• Used X.509 certificates to authenticate the client
• Encrypted the database at rest
• Redacted the mongod logs
• Configured auditing for a specific user
12 MongoDB Atlas, Cloud & Ops Manager Fundamentals
MongoDB Cloud & Ops Manager (page 195) Learn about what Cloud & Ops Manager offers
Automation (page 197) Cloud & Ops Manager Automation
Lab: Cluster Automation (page 200) Set up a cluster with Cloud Manager Automation
Monitoring (page 201) Monitor a cluster with Cloud Manager
Lab: Create an Alert (page 203) Create an alert on Cloud Manager
Backups (page 203) Use Cloud Manager to create and administer backups
12.1 MongoDB Cloud & Ops Manager
Learning Objectives
Upon completing this module students should understand:
• Features of Cloud & Ops Manager
• Available deployment options
• The components of Cloud & Ops Manager
Cloud and Ops Manager
All services for managing a MongoDB cluster or group of clusters:
•Monitoring
•Automation
•Backups
Deployment Options
• Cloud Manager: Hosted, https://www.mongodb.com/cloud
• Ops Manager: On-premises
Architecture
Cloud Manager
• Manage MongoDB instances anywhere with a connection to Cloud Manager
• Option to provision servers via AWS integration
Ops Manager
On-premises, with additional features for:
• Alerting (SNMP)
• Deployment configuration (e.g. backup redundancy across internal data centers)
• Global control of multiple MongoDB clusters
Cloud & Ops Manager Use Cases
• Manage a 1000 node cluster (monitoring, backups, automation)
• Manage a personal project (3 node replica set on AWS, using Cloud Manager)
• Manage 40 deployments (with each deployment having different requirements)
Creating a Cloud Manager Account
Free account at https://www.mongodb.com/cloud
12.2 Automation
Learning Objectives
Upon completing this module students should understand:
• Use cases for Cloud / Ops Manager Automation
• The Cloud / Ops Manager Automation internal workflow
What is Automation?
Fully managed MongoDB deployment on your own servers:
• Automated provisioning
• Dynamically add capacity (e.g. add more shards or replica set nodes)
• Upgrades
• Admin tasks (e.g. change the size of the oplog)
How Does Automation Work?
• Automation agent is installed on each server in the cluster
• Administrator creates a goal environment/topology for the system (through the Cloud / Ops Manager interface)
• Automation agents periodically check with Cloud / Ops Manager to get new environment/topology instructions
• Agents create and follow a plan for implementing the instructions
• Minutes later, the cluster design is complete and the cluster is in the goal state
Automation Agents
Sample Use Case
Administrator wants to create a 100-shard sharded cluster, with each shard comprised of a 3 node replica set:
• Administrator installs the automation agent on 300 servers
• Cluster environment/topology is created in Cloud / Ops Manager, then deployed to agents
• Agents execute instructions until the 100-shard cluster is complete (usually several minutes)
Upgrades Using Automation
• Upgrades without automation can be a manually intensive process (e.g. 300 servers)
• A lot of edge cases when scripting (e.g. 1 shard has problems, or one replica set is a mixed version)
• One-click upgrade with Cloud / Ops Manager Automation for the entire cluster
Automation: Behind the Scenes
• Agents ping Cloud / Ops Manager for new instructions
• Agents compare their local configuration file with the latest version from Cloud / Ops Manager
• Configuration file is in JSON
• All communications over SSL
{
"groupId":"55120365d3e4b0cac8d8a52a737",
"state":"PUBLISHED",
"version":4,
"cluster":{...
Configuration File
When the version number of the configuration file on Cloud / Ops Manager is greater than the local version, the agent
begins making a plan to implement the changes:
"replicaSets":[
{
"_id":"shard_0",
"members":[
{
"_id":0,
"host":"DemoCluster_shard_0_0",
"priority":1,
"votes":1,
"slaveDelay":0,
"hidden":false,
"arbiterOnly":false
},
...
Automation Goal State
Automation agent is considered to be in goal state after all cluster changes (related to the individual agent) have been
implemented.
Demo
• The instructor will demonstrate using Automation to set up a small cluster locally.
• Reference documentation:
•The Automation Agent29
•The Automation API30
•Configuring the Automation Agent31
12.3 Lab: Cluster Automation
Learning Objectives
Upon completing this exercise students should understand:
• How to deploy, dynamically resize, and upgrade a cluster with Automation
Exercise #1
Create a cluster using Cloud Manager automation with the following topology:
• 3 shards
• Each shard is a 3 node replica set (2 data bearing nodes, 1 arbiter)
• Version 2.6.8 of MongoDB
• To conserve space, set "smallfiles" = true and "oplogSize" = 10
29 https://docs.cloud.mongodb.com/tutorial/nav/automation-agent/
30 https://docs.cloud.mongodb.com/api/
31 https://docs.cloud.mongodb.com/reference/automation-agent/
Exercise #2
Modify the cluster topology from Exercise #1 to the following:
• 4 shards (add one shard)
• Version 3.0.1 of MongoDB (upgrade from 2.6.8 -> 3.0.1)
12.4 Monitoring
Learning Objectives
Upon completing this module students should understand:
• Cloud / Ops Manager monitoring fundamentals
• How to set up alerts in Cloud / Ops Manager
Monitoring in Cloud / Ops Manager
• Identify cluster performance issues
• Identify individual nodes in the cluster with performance issues
• Visualize performance through graphs and overlays
• Configure and set alerts
Monitoring Use Cases
• Alert on performance issues, to catch them before they turn into an outage
• Diagnose performance problems
• Historical performance analysis
• Monitor cluster health
• Capacity planning and scaling requirements
Monitoring Agent
• Requests metrics from each host in the cluster
• Sends those metrics to the Cloud / Ops Manager server
• Must be able to contact every host in the cluster (the agent can live in a private network)
• Must have access to contact the Cloud / Ops Manager website with metrics from hosts
Agent Configuration
• Can use an HTTP proxy
• Can gather hardware statistics via munin-node
• Agent can optionally gather database statistics, and record slow queries (sampled)
Agent Security
• SSL certificate for SSL clusters
• LDAP/Kerberos supported
• Agent must have the "clusterMonitor" role on each host
Monitoring Demo
Visit https://www.mongodb.com/cloud
Navigating Cloud Manager Charts
• Add charts to view by clicking the name of the chart at the bottom of the host's page
• The "i" icon next to each chart title can be clicked to learn what the chart means
• Holding down the left mouse button and dragging on top of the chart will let you zoom in
Metrics
• Minute-level metrics for 48 hours
• Hourly metrics for about 3 months
• Daily metrics for the life of the cluster
Alerts
• Every chart can be alerted on
• Changes to the state of the cluster can trigger alerts (e.g. a failover)
• Alerts can be sent to email, SMS, HipChat, or PagerDuty
12.5 Lab: Create an Alert
Learning Objectives
Upon completing this exercise students should understand:
• How to create an alert in Cloud Manager
Exercise #1
Create an alert through Cloud Manager for any node within your cluster that is down.
After the alert has been created, stop a node within your cluster to verify the alert.
12.6 Backups
Learning Objectives
Upon completing this module students should understand:
• How Cloud / Ops Manager Backups work
• Advantages to Cloud / Ops Manager Backups
Methods for Backing Up MongoDB
• mongodump
• File system backups
• Cloud / Ops Manager Backups
Comparing MongoDB Backup Methods
Considerations       Mongodump   File System   Cloud Backup   Ops Manager
Initial Complexity   Medium      High          Low            High
Replica Set PIT      Yes **      Yes **        Yes            Yes
Sharded Snapshot     No          Yes **        Yes            Yes
Restore Time         Slow        Fast          Medium         Medium

** Requires advanced scripting
Cloud / Ops Manager Backups
• Based off oplogs (even for the config servers)
• Point-in-time recovery for replica sets, snapshots for sharded clusters
• Oplog on config server for sharded cluster backup
• Ability to exclude collections and databases (such as logs)
• Retention rules can be defined
Restoring from Cloud / Ops Manager
• Specify which backup to restore
• SCP push or HTTPS pull (one time use link) for data files
Architecture
Snapshotting
• Local copy of every replica set stored by Cloud / Ops Manager
• Oplog entries applied on top of the local copy
• Local copy is used for snapshotting
• Very little impact to the cluster (equivalent to adding another secondary)
Backup Agent
• Backup agent (can be managed by the Automation agent)
• Backup agent sends oplog entries to the Cloud / Ops Manager service to be applied to the local copy
13 MongoDB Cloud & Ops Manager Under the Hood
API (page 206) Using the Cloud & Ops Manager API
Lab: Cloud Manager API (page 207) Cloud & Ops Manager API exercise
Architecture (Ops Manager) (page 208) Ops Manager
Security (Ops Manager) (page 210) Ops Manager Security
Lab: Install Ops Manager (page 211) Install Ops Manager
13.1 API
Learning Objectives
Upon completing this module students should understand:
• Overview of the Cloud / Ops Manager API
• Sample use cases for the Cloud / Ops Manager API
What is the Cloud / Ops Manager API?
Allows users to programmatically:
• Access monitoring data
• Backup functionality (request backups, change snapshot schedules, etc.)
• Automation cluster configuration (modify, view)
API Documentation
https://docs.mms.mongodb.com/core/api/
Sample API Use Cases
• Ingest Cloud / Ops Manager monitoring data
• Programmatically restore environments
• Configuration management
Ingest Monitoring Data
The monitoring API can be used to ingest monitoring data into another system, such as Nagios, HP OpenView, or your
own internal dashboard.
Programmatically Restore Environments
Use the backup API to programmatically restore an integration or testing environment based on the last production
snapshot.
Configuration Management
Use the automation API to integrate with existing configuration management tools (such as Chef or Puppet) to automate
creating and maintaining environments.
13.2 Lab: Cloud Manager API
Learning Objectives
Upon completing this exercise students should understand:
• Have a basic understanding of working with the Cloud Manager API (or Ops Manager if the student chooses)
Using the Cloud Manager API
If Ops Manager is installed, it may be used in place of Cloud Manager for this exercise.
Exercise #1
Navigate the Cloud Manager interface to perform the following:
• Generate an API key
• Add your personal machine to the API whitelist
Exercise #2
Modify and run the following curl command to return alerts for your Cloud Manager group:
curl -u "username:apiKey" --digest -i \
    "https://mms.mongodb.com/api/public/v1.0/groups/<GROUP-ID>/alerts"
Exercise #3
How would you find metrics for a given host within your Cloud Manager account? Create an outline for the API calls
needed.
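One possible outline, sketched as shell strings. The /hosts and /metrics endpoint paths are assumptions modeled on the alerts URL from Exercise #2 and the Cloud Manager public API v1.0; <GROUP-ID> and <HOST-ID> remain placeholders to fill in:

```shell
BASE="https://mms.mongodb.com/api/public/v1.0"
GROUP="<GROUP-ID>"
HOST="<HOST-ID>"

# 1) List the hosts in the group to discover each host's ID
#    (call with curl --digest and -u "username:apiKey", as in Exercise #2):
hosts_url="$BASE/groups/$GROUP/hosts"

# 2) With a host ID taken from step 1, request that host's metrics:
metrics_url="$BASE/groups/$GROUP/hosts/$HOST/metrics"

echo "$hosts_url"
echo "$metrics_url"
```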
13.3 Architecture (Ops Manager)
Learning Objectives
Upon completing this module students should understand:
• Ops Manager overview
• Ops Manager components
• Considerations for sizing an Ops Manager environment
MongoDB Ops Manager
• On-premises version of Cloud Manager
• Everything stays within the private network
Components
• Application server(s): web interface
• Ops Manager application database: monitoring metrics, automation configuration, etc.
• Backup infrastructure: cluster backups and restores
Architecture
Application Server
• 15 GB RAM, 50 GB of disk space are required
• Equivalent to an m3.xlarge AWS instance
Application Database
• All monitoring metrics, automation configurations, etc. are stored here
• Replica set; however, a standalone MongoDB node can also be used
Backup Infrastructure
• Backup database (blockstore, oplog, sync)
• Backup daemon process (manages applying oplog entries, creating snapshots, etc.)
Backup Database
• 3 sections: blockstore for blocks, oplog, sync for initial sync slices
• Replica set; a standalone MongoDB node can also be used
• Must be sized carefully
• All snapshots are stored here
• Block-level de-duping: the same block isn't stored twice (significantly reduces database size for deployments with low/moderate writes)
Backup Daemon Process
• The "workhorse" of the backup infrastructure
• Creates a local copy of the database it is backing up (references the "HEAD" database)
• Requires 2-3X data space (of the database it is backing up)
• Can run multiple daemons, pointing to multiple backup databases (for large clusters)
13.4 Security (Ops Manager)
Learning Objectives
Upon completing this module students should understand:
• Ops Manager security overview
• Security and authentication options for Ops Manager
Ops Manager User Authentication
• Two-factor authentication can be enabled (uses Google Authenticator)
• LDAP authentication option
Authentication for the Backing Ops Manager Databases
Ops Manager application database and backup database:
• MongoDB-CR (SCRAM-SHA-1)
•LDAP
•Kerberos
Authenticating Between an Ops Manager Agent and Cluster
•LDAP
•MongoDB-CR
• Kerberos (Linux only)
Encrypting Communications
• All communications can be encrypted over SSL.
Ops Manager Groups
• Users can belong to many different groups
• Users have different levels of access per group
User Roles By Group
• Read Only
• User Admin
• Monitoring Admin
• Backup Admin
• Automation Admin
• Owner
Global User Roles
• Global Read Only
• Global User Admin
• Global Monitoring Admin
• Global Backup Admin
• Global Automation Admin
• Global Owner
13.5 Lab: Install Ops Manager
Learning Objectives
Upon completing this exercise students should understand:
• The components needed for Ops Manager
• How to successfully install Ops Manager
Install Ops Manager
A Linux machine with at least 15 GB of RAM is required
Install Ops Manager
We will follow an outline of the installation instructions here:
https://docs.opsmanager.mongodb.com/current/tutorial/install-basic-deployment/
Exercise #1
Prepare your environment for running all Ops Manager components: Monitoring, Automation, and Backups
• Set up a 3 node replica set for the Ops Manager application database (2 data bearing nodes, 1 arbiter)
• Set up a 3 node replica set for Ops Manager backups (2 data bearing nodes, 1 arbiter)
• Verify both replica sets have been installed and configured correctly
Exercise #2
Install the Ops Manager application
• The Ops Manager application requires a license for commercial use
• Download the Ops Manager application (after completing the form): http://www.mongodb.com/download
• Installation instructions (from above): docs.opsmanager.mongodb.com
• Verify Ops Manager is running successfully
Exercise #3
Install the Ops Manager Backup Daemon
• The Ops Manager backup daemon is required for using Ops Manager for backups
• Download and install the backup daemon (using the link from the past exercise)
• Verify the installation was successful by looking at the logs in: <install_dir>/logs
Exercise #4
Verify the Ops Manager installation was successful:
https://docs.opsmanager.mongodb.com/current/tutorial/test-new-deployment/
Exercise #5
Use Ops Manager to backup a test cluster:
• Create a 1 node replica set via Ops Manager automation
• Add sample data to the replica set:
> for (var i = 0; i < 10000; i++) { db.blog.insert({ "name": i }) }
WriteResult({ "nInserted": 1 })
> db.blog.count()
10000
• Use Ops Manager to back up the test cluster
• Perform a restore via Ops Manager of the test cluster
14 Introduction to MongoDB BI Connector
MongoDB Connector for BI (page 214) An introduction to MongoDB Connector for BI
14.1 MongoDB Connector for BI
Learning Objectives
Upon completing this module students should understand:
• The different tools included in the MongoDB BI Connector package
• The different configuration files required by the BI Connector
• The version of SQL statements supported
• How to launch mongosqld
• How to run SQL statements against a MongoDB server instance
MongoDB BI Connector: Introduction
MongoDB Connector for BI enables the execution of SQL statements in a MongoDB server.
It’s a native connector implementation that enables Business Intelligence tools to read data from a MongoDB server.
How it works
The MongoDB Connector for BI works in the following way:
• Generates a Document-Relational Definition Language (DRDL) file that defines a mapping between a given collection's shape and a relational schema
• Once the drdl file is generated, BI tools are able to request the corresponding relational SQL and express queries
• After receiving and processing a SQL statement, it provides results back in tabular format, native to BI tools
• The BI Connector also functions as a pass-through authentication proxy.
BI Connector Package
BI Connector is composed of the connector daemon and a schema definition utility:
• mongosqld: Runs as a server daemon and responds to incoming SQL queries
• mongodrdl: Utility that generates drdl files from the databases and collections in MongoDB
The mongodrdl Utility
mongodrdl generates a Document-Relational Definition Language (DRDL) file.
• The drdl file is a mapping between a given collection's shape and its corresponding relational schema
schema:
- db: <database name>
tables:
- table: <SQL table name>
collection: <MongoDB collection name>
pipeline:
- <optional pipeline elements>
columns:
- Name: <MongoDB field name>
MongoType: <MongoDB field type>
SqlName: <mapped SQL column name>
SqlType: <mapped SQL column type>
mongodrdl Example
To generate a drdl file we need to connect mongodrdl to a MongoDB instance:
mongodrdl -d training -c zips --host localhost:27017 -o zips.drdl
cat zips.drdl
schema:
- db: training
tables:
- table: zips
collection: zips
pipeline: []
columns:
- Name: _id
MongoType: bson.ObjectId
SqlName: _id
SqlType: varchar
...
Custom Filtering
mongodrdl allows you to define a --customFilterField in case we need to express MongoDB native queries
from within our SQL query expression.
mongodrdl -c zips -d training -o zips.drdl --customFilterField "mongoqueryfield"
For example, executing a geospatial query:
SELECT * FROM zips
WHERE mongoqueryfield = "{'loc': {'$geoNear': [30, 48, 100]}}"
mongosqld Daemon
mongosqld runs as a server daemon and responds to incoming SQL queries.
mongosqld --mongo-uri mongodb://localhost:27017 --schema zips.drdl
• By default mongosqld will be listening for incoming requests on 127.0.0.1:3307
mongosqld Authentication & Authorization
The BI Connector offers integration for three different authentication mechanisms:
• SCRAM-SHA-1
• MONGODB-CR
• PLAIN (LDAP authentication)
And external LDAP authorization:
• Requires defining the source attribute in the user name string
grace?mechanism=PLAIN&source=$external
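Put together, a connection through the MySQL client might look like this (a sketch; "grace" is the hypothetical LDAP user from the string above, and the connection assumes a running mongosqld on the default port):

```shell
mysql --protocol tcp --port 3307 \
      -u 'grace?mechanism=PLAIN&source=$external' -p
```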
mongosqld Encryption
BI Connector supports network encryption on all segments of the connection.
SQL Compatibility
• BI Connector version 2.0 is compatible with SQL-99 SELECT32 statements
• Uses the MySQL wire protocol
mysql --protocol tcp --port 3307
• This means we can use a SQL client like mysql to query data on MongoDB
use training;
SELECT * FROM zips;
32 https://docs.mongodb.com/bi-connector/master/supported-operations/
Find out more
mongodb.com | mongodb.org
university.mongodb.com
Having trouble?
File a JIRA ticket:
jira.mongodb.org
Follow us on twitter
@MongoDBInc
@MongoDB