Ruby and MongoDB Web Development Beginner's Guide

Ruby and MongoDB
Web Development
Beginner's Guide
Create dynamic web applications by combining
the power of Ruby and MongoDB
Gautam Rege
BIRMINGHAM - MUMBAI
Ruby and MongoDB Web Development Beginner's Guide
Copyright © 2012 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system,
or transmitted in any form or by any means, without the prior written permission of the
publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the
information presented. However, the information contained in this book is sold without
warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers
and distributors will be held liable for any damages caused or alleged to be caused directly
or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the
companies and products mentioned in this book by the appropriate use of capitals.
However, Packt Publishing cannot guarantee the accuracy of this information.
First published: July 2012
Production Reference: 1180712
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-84951-502-3
www.packtpub.com
Cover Image by Asher Wishkerman (wishkerman@hotmail.com)
Credits
Author
Gautam Rege
Reviewers
Bob Chesley
Ayan Dave
Michael Kohl
Srikanth AD
Acquisition Editor
Kartikey Pandey
Lead Technical Editor
Dayan Hyames
Technical Editor
Prashant Salvi
Copy Editors
Alda Paiva
Laxmi Subramanian
Project Coordinator
Leena Purkait
Proofreader
Linda Morris
Indexer
Hemangini Bari
Graphics
Valentina D'silva
Manu Joseph
Production Coordinator
Prachali Bhiwandkar
Cover Work
Prachali Bhiwandkar
About the Author
Gautam Rege has over twelve years of experience in software development. He is
a Computer Engineer from Pune Institute of Computer Technology, Pune, India. After
graduating in 2000, he worked in various Indian software development companies until
2002, after which, he settled down in Veritas Software (now Symantec). After five years
there, his urge to start his own company got the better of him and he started Josh Software
Private Limited along with his long time friend Sethupathi Asokan, who was also in Veritas.
He is currently the Managing Director at Josh Software Private Limited. Josh in Hindi
(his mother tongue) means "enthusiasm" or "passion" and these are the qualities that the
company culture is built on. Josh Software Private Limited works exclusively in Ruby and
Ruby related technologies, such as Rails – a decision Gautam and Sethu (as he is lovingly
called) took in 2007 and it has paid rich dividends today!
Acknowledgement
I would like to thank Sethu, my co-founder at Josh, for ensuring that my focus was on the
book, even during the hectic activities at work. Thanks to Satish Talim, who encouraged
me to write this book and Sameer Tilak, for providing me with valuable feedback while
writing this book! Big thanks to Michael Kohl, who was of great help in ensuring that every
tiny technical detail was accurate and rich in content. I have become "technically mature"
because of him!
The book would not have been completed without the positive and unconditional support
from my wife, Vaibhavi and daughter, Swara, who tolerated a lot of busy weekends and late
nights where I was toiling away on the book. Thank you so much!
Last, but not the least, a big thank you to Kartikey, Leena, Dayan, Ayan, Prashant, and
Vrinda from Packt, who ensured that everything I did was in order and up to the mark.
About the Reviewers
Bob Chesley is a web and database developer of around twenty years currently concentrating
on JavaScript cross platform mobile applications and SaaS backend applications that they
connect to. Bob is also a small boat builder and sailor, enjoying the green waters of the Tampa
Bay area. He can be contacted via his web site (www.nhsoftwerks.com) or via his blog
(www.cfmeta.com) or by email at bob.chesley@nhsoftwerks.com.
Ayan Dave is a software engineer with eight years of experience in building and delivering
high quality applications using languages and components in the JVM ecosystem. He is passionate
about software development and enjoys exploring open source projects. He is enthusiastic
about Agile and Extreme Programming and frequently advocates for them. Over the years he
has provided consulting services to several organizations and has played many different roles.
Most recently he was the "Architectus Oryzus" for a small project team with big ideas and
subscribes to the idea that running code is the system of truth.
Ayan has a Master's degree in Computer Engineering from the University of Houston - Clear
Lake and holds PMP, PSM-1 and OCMJEA certifications. He is also a speaker on various
technical topics at local user groups and community events. He currently lives in Columbus,
Ohio and works with Quick Solutions Inc. In the digital world he can be found at
http://daveayan.com.
Michael Kohl got interested in programming, and the wider IT world, at the young age of
12. Since then, he worked as a systems administrator, systems engineer, Linux consultant,
and software developer, before crossing over into the domain of IT security where he
currently works. He's a programming language enthusiast who's especially enamored with
functional programming languages, but also has a long-standing love affair with Ruby that
started around 2003. You can find his musings online at http://citizen428.net.
www.PacktPub.com
Support files, eBooks, discount offers and more
You might want to visit www.PacktPub.com for support files and downloads related to
your book.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub
files available? You can upgrade to the eBook version at www.PacktPub.com and as a print
book customer, you are entitled to a discount on the eBook copy. Get in touch with us at
service@packtpub.com for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a
range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
http://PacktLib.PacktPub.com
Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book
library. Here, you can access, read and search across Packt's entire library of books.
Why Subscribe?
Fully searchable across every book published by Packt
Copy and paste, print and bookmark content
On demand and accessible via web browser
Free Access for Packt account holders
If you have an account with Packt at www.PacktPub.com, you can use this to access
PacktLib today and view nine entirely free books. Simply use your login credentials for
immediate access.
Table of Contents
Preface
Chapter 1: Installing MongoDB and Ruby
Installing Ruby
Using RVM on Linux or Mac OS
The RVM games
The Windows saga
Using rbenv for installing Ruby
Installing MongoDB
Configuring the MongoDB server
Starting MongoDB
Stopping MongoDB
The MongoDB CLI
Understanding JavaScript Object Notation (JSON)
Connecting to MongoDB using Mongo
Saving information
Retrieving information
Deleting information
Exporting information using mongoexport
Importing data using mongoimport
Managing backup and restore using mongodump and mongorestore
Saving large files using mongofiles
bsondump
Installing Rails/Sinatra
Summary
Chapter 2: Diving Deep into MongoDB
Creating documents
Time for action – creating our first document
NoSQL scores over SQL databases
Using MongoDB embedded documents
Time for action – embedding reviews and votes
Fetching embedded objects
Using MongoDB document relationships
Time for action – creating document relations
Comparing MongoDB versus SQL syntax
Using Map/Reduce instead of join
Understanding functional programming
Building the map function
Time for action – writing the map function for calculating vote statistics
Building the reduce function
Time for action – writing the reduce function to process emitted information
Understanding the Ruby perspective
Setting up Rails and MongoDB
Time for action – creating the project
Understanding the Rails basics
Using Bundler
Why do we need the Bundler
Setting up Sodibee
Time for action – start your engines
Setting up Mongoid
Time for action – configuring Mongoid
Building the models
Time for action – planning the object schema
Testing from the Rails console
Time for action – putting it all together
Understanding many-to-many relationships in MongoDB
Using embedded documents
Time for action – adding reviews to books
Choosing whether to embed or not to embed
Time for action – embedding Lease and Purchase models
Working with Map/Reduce
Time for action – writing the map function to calculate ratings
Time for action – writing the reduce function to process the emitted results
Using Map/Reduce together
Time for action – working with Map/Reduce using Ruby
Summary
Chapter 3: MongoDB Internals
Understanding Binary JSON
Fetching and traversing data
Manipulating data
What is ObjectId?
Documents and collections
Capped collections
Dates in MongoDB
JavaScript and MongoDB
Time for action – writing our own custom functions in MongoDB
Ensuring write consistency or "read your writes"
How does MongoDB use its memory-mapped storage engine?
Advantages of write-ahead journaling
Global write lock
Transactional support in MongoDB
Understanding embedded documents and atomic updates
Implementing optimistic locking in MongoDB
Time for action – implementing optimistic locking
Choosing between ACID transactions and MongoDB transactions
Why are there no joins in MongoDB?
Summary
Chapter 4: Working Out Your Way with Queries
Searching by fields in a document
Time for action – searching by a string value
Querying for specific fields
Time for action – fetching only for specific fields
Using skip and limit
Time for action – skipping documents and limiting our search results
Writing conditional queries
Using the $or operator
Time for action – finding books by name or publisher
Writing threshold queries with $gt, $lt, $ne, $lte, and $gte
Time for action – finding the highly ranked books
Checking presence using $exists
Searching inside arrays
Time for action – searching inside reviews
Searching inside arrays using $in and $nin
Searching for exact matches using $all
Searching inside hashes
Searching inside embedded documents
Searching with regular expressions
Time for action – using regular expression searches
Summary
Chapter 5: Ruby DataMappers: Ruby and MongoDB Go Hand in Hand
Why do we need Ruby DataMappers
The mongo-ruby-driver
Time for action – using mongo gem
The Ruby DataMappers for MongoDB
MongoMapper
Mongoid
Setting up DataMappers
Configuring MongoMapper
Time for action – configuring MongoMapper
Configuring Mongoid
Time for action – setting up Mongoid
Creating, updating, and destroying documents
Defining fields using MongoMapper
Defining fields using Mongoid
Creating objects
Time for action – creating and updating objects
Using finder methods
Using find method
Using the first and last methods
Using the all method
Using MongoDB criteria
Executing conditional queries using where
Time for action – fetching using the where criterion
Revising limit, skip, and offset
Understanding model relationships
The one to many relation
Time for action – relating models
Using MongoMapper
Using Mongoid
The many-to-many relation
Time for action – categorizing books
MongoMapper
Mongoid
Accessing many-to-many with MongoMapper
Accessing many-to-many relations using Mongoid
The one-to-one relation
Using MongoMapper
Using Mongoid
Time for action – adding book details
Understanding polymorphic relations
Implementing polymorphic relations the wrong way
Implementing polymorphic relations the correct way
Time for action – managing the driver entities
Time for action – creating vehicles using basic polymorphism
Choosing SCI or basic polymorphism
Using embedded objects
Time for action – creating embedded objects
Using MongoMapper
Using Mongoid
Using MongoMapper
Using Mongoid
Reverse embedded relations in Mongoid
Time for action – using embeds_one without specifying embedded_in
Time for action – using embeds_many without specifying embedded_in
Understanding embedded polymorphism
Single Collection Inheritance
Time for action – adding licenses to drivers
Basic embedded polymorphism
Time for action – insuring drivers
Choosing whether to embed or to associate documents
Mongoid or MongoMapper – the verdict
Summary
Chapter 6: Modeling Ruby with Mongoid
Developing a web application with Mongoid
Setting up Rails
Time for action – setting up a Rails project
Setting up Sinatra
Time for action – using Sinatra professionally
Understanding Rack
Defining attributes in models
Accessing attributes
Indexing attributes
Unique indexes
Background indexing
Geospatial indexing
Sparse indexing
Dynamic fields
Time for action – adding dynamic fields
Localization
Time for action – localizing fields
Using arrays and hashes in models
Embedded objects
Defining relations in models
Common options for all relations
:class_name option
:inverse_of option
:name option
Relation-specific options
Options for has_one
:as option
:autosave option
:dependent option
:foreign_key option
Options for has_many
:order option
Options for belongs_to
:index option
:polymorphic option
Options for has_and_belongs_to_many
:inverse_of option
Time for action – configuring the many-to-many relation
Time for action – setting up the following and followers relationship
Options for :embeds_one
:cascade_callbacks option
:cyclic
Time for action – setting up cyclic relations
Options for embeds_many
:versioned option
Options for embedded_in
:name option
Managing changes in models
Time for action – changing models
Mixing in Mongoid modules
The Paranoia module
Time for action – getting paranoid
Versioning
Time for action – including a version
Summary
Chapter 7: Achieving High Performance on Your Ruby Application with MongoDB
Profiling MongoDB
Time for action – enabling profiling for MongoDB
Using the explain function
Time for action – explaining a query
Using covered indexes
Time for action – using covered indexes
Other MongoDB performance tuning techniques
Using mongostat
Understanding web application performance
Web server response time
Throughput
Load the server using httperf
Monitoring server performance
End-user response and latency
Optimizing our code for performance
Indexing fields
Optimizing data selection
Optimizing and tuning the web application stack
Performance of the memory-mapped storage engine
Choosing the Ruby application server
Passenger
Mongrel and Thin
Unicorn
Increasing performance of Mongoid using bson_ext gem
Caching objects
Memcache
Redis server
Summary
Chapter 8: Rack, Sinatra, Rails, and MongoDB – Making Use of them All
Revising Sodibee
The Rails way
Setting up the project
Modeling Sodibee
Time for action – modeling the Author class
Time for action – writing the Book, Category and Address models
Time for action – modeling the Order class
Understanding Rails routes
What is the RESTful interface?
Time for action – configuring routes
Understanding the Rails architecture
Processing a Rails request
Coding the Controllers and the Views
Time for action – writing the AuthorsController
Solving the N+1 query problem using the includes method
Relating models without persisting them
Designing the web application layout
Time for action – designing the layout
Understanding the Rails asset pipeline
Designing the Authors listing page
Time for action – listing authors
Adding new authors and their books
Time for action – adding new authors and books
The Sinatra way
Time for action – setting up Sinatra and Rack
Testing and automation using RSpec
Understanding RSpec
Time for action – installing RSpec
Time for action – sporking it
Documenting code using YARD
Summary
Chapter 9: Going Everywhere – Geospatial Indexing with MongoDB
What is geolocation
How accurate is a geolocation
Converting geolocation to geocoded coordinates
Identifying the exact geolocation
Storing coordinates in MongoDB
Time for action – geocoding the Address model
Testing geolocation storage
Time for action – saving geolocation coordinates
Using geocoder to update coordinates
Time for action – using geocoder for storing coordinates
Firing geolocation queries
Time for action – finding nearby addresses
Using mongoid_spacial
Time for action – firing near queries in Mongoid
Differences between $near and $geoNear
Summary
Chapter 10: Scaling MongoDB
High availability and failover via replication
Implementing the master/slave replication
Time for action – setting up the master/slave replication
Using replica sets
Time for action – implementing replica sets
Recovering from crashes – failover
Adding members to the replica set
Implementing replica sets for Sodibee
Time for action – configuring replica sets for Sodibee
Implementing sharding
Creating the shards
Time for action – setting up the shards
Configuring the shards with a config server
Time for action – starting the config server
Setting up the routing service – mongos
Time for action – setting up mongos
Testing shared replication
Implementing Map/Reduce
Time for action – planning the Map/Reduce functionality
Time for action – Map/Reduce via the mongo console
Time for action – Map/Reduce via Ruby
Performance benchmarking
Time for action – iterating Ruby objects
Summary
Pop Quiz Answers
Index
Preface
And then there was light – a lightweight database! How often have we all wanted some
database that was "just a data store"? Sure, you can use it in many complex ways but in
the end, it's just a plain simple data store. Welcome MongoDB!
And then there was light – a lightweight language that supports all the constructs of a
pure object-oriented language and is fun to program in. Welcome Ruby!
Both MongoDB and Ruby are the fruits of people who wanted to simplify things in a complex
world. Ruby, written by Yukihiro Matsumoto, was made by picking the best constructs from Perl,
Smalltalk, and Scheme. They say Matz (as he is lovingly called) "writes in C so that you don't
have to". Ruby is an object-oriented programming language that can be summarized in one
word: fun!
It's interesting to know that Ruby was created as an "object-oriented
scripting language". However, today Ruby can be compiled using JRuby
or Rubinius, so we could call it a programming language.
MongoDB has its roots in the word "humongous" and has the primary goal of managing
humongous data! As a NoSQL database, it relies heavily on data stored as key-value pairs.
Wait! Did we hear NoSQL – (also pronounced as No Sequel or No S-Q-L)? Yes! The roots of
MongoDB lie in its data not having a structured format! Even before we dive into Ruby and
MongoDB, it makes sense to understand some of these basic premises:
NoSQL
Brewer's CAP theorem
Basically Available, Soft-state, Eventually-consistent (BASE)
ACID or BASE
Understanding NoSQL
When the world was living in an age of SQL gurus and Database Administrators with
expertise in stored procedures and triggers, a few brave men dared to rebel. The reason was
"simplicity". SQL was good to use when there was a structure and a fixed set of rules. The
common databases such as Oracle, SQL Server, MySQL, DB2, and PostgreSQL, all promoted
SQL – referential integrity, consistency, and atomic transactions. One of the SQL-based rebels,
SQLite, decided to be really "lite" and either ignored most of these constructs or did not
enforce them based on the premise: "Know what you are doing or beware".
Similarly, NoSQL is all about using simple keys to store data. Searching keys uses various
hashing algorithms, but at the end of the day all we have is a simple data store!
With the advent of web applications and crowd sourcing web portals, the mantra was
"more scalable than highly available" and "more speed instead of consistency". Some web
applications may be okay with these and others may not. What is important is that there is
now a choice and developers can choose wisely!
It's interesting to note that "key-value pair" databases have existed from the early 1980s – the
earliest to my knowledge being Berkeley DB – blazingly fast, light-weight, and a very simple
library to use.
Brewer's CAP theorem
Brewer's CAP theorem states that any distributed computer system can support only any two
among consistency, availability, and partition tolerance.
Consistency means that every node returns the same, most recently written data
Availability means that every request receives a response, even when some
nodes fail
Partition tolerance deals with distributed data, scaling and replication
There is sufficient belief that any database can guarantee only two of the above. However, the
essence of the CAP theorem is not to find a solution to have all three behaviors, but to allow us
to look at designing databases differently based on the application we want to build!
For example, if you are building a Core Banking System (CBS), consistency and availability are
extremely important. The CBS must guarantee these two at the cost of partition tolerance.
Of course, a CBS has its failover systems, backup, and live replication to guarantee zero
downtime, but at the cost of additional infrastructure and usually a single large instance
of the database.
A heavily accessed information web portal with a large amount of data requires speed
and scale, not strict consistency. Does the order of comments submitted at the same time really
matter? What matters is how quickly and reliably the data was delivered. This is a clear
case of availability and partition tolerance at the cost of consistency.
An excellent article on the CAP theorem is at
http://www.julianbrowne.com/article/viewer/brewers-cap-theorem.
What are BASE databases?
"Basically Available, Soft-state, Eventually-consistent"!!
Just as the name suggests a trade-off, BASE databases (yes, they are called BASE databases
intentionally to mock ACID databases) use some tactics to have consistency, availability, and
partition tolerance "eventually". They do not really defy the CAP theorem but work around it.
Simply put: I can afford my database to be consistent over time by synchronizing information
between different database nodes. I can cache data (also called "soft-state") and persist it
later to improve the response time of my database. I can have a number of database nodes
with distributed data (partition tolerance) to be highly available and any loss of connectivity
to any nodes prompts other nodes to take over!
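The idea of becoming consistent "over time" can be sketched with a toy model in plain Ruby. This is only an illustration under simplifying assumptions: the Node class and its write, read, and sync_from methods are hypothetical names invented here, conflicts are resolved by a naive per-key version counter, and nothing below reflects how MongoDB actually replicates data.

```ruby
# Toy model of eventual consistency between two database nodes.
# Node, write, read and sync_from are hypothetical names for this sketch.
class Node
  attr_reader :data

  def initialize
    @data = {} # key => [version, value]
  end

  # Store a value and bump the version counter for that key.
  def write(key, value)
    current = @data[key]
    version = current ? current.first + 1 : 1
    @data[key] = [version, value]
  end

  # Return the stored value, or nil when the key is unknown.
  def read(key)
    entry = @data[key]
    entry && entry.last
  end

  # Copy every entry that is missing here or newer on the other node.
  def sync_from(other)
    other.data.each do |key, (version, value)|
      current = @data[key]
      @data[key] = [version, value] if current.nil? || current.first < version
    end
  end
end

primary = Node.new
replica = Node.new

primary.write("votes", 10)
puts replica.read("votes").inspect # stale read: prints nil

replica.sync_from(primary)
puts replica.read("votes").inspect # after syncing: prints 10
```

Between the write and the sync, the replica answers quickly but with stale data; that window is the price paid for speed and availability.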
This does not mean that BASE databases are not prone to failure. It does imply however,
that they can recover quickly and consistently. They usually reside on standard commodity
hardware, thus making them affordable for most businesses!
A lot of databases on websites prefer speed, performance, and scalability instead of pure
consistency and integrity of data. However, as the next topic will cover, it is important to
know what to choose!
Using ACID or BASE?
"Atomic, Consistent, Isolated, and Durable" (ACID) is a cliched term used for transactional
databases. ACID databases are still very popular today but BASE databases are catching up.
ACID databases are good to use when you have heavy transactions at the core of your
business processes. But most applications can live without this complexity. This does not
imply that BASE databases do not support transactions; it's just that ACID databases are
better suited for them.
Choose a database wisely – an old man said rightly! A choice of a database can decide the
future of your product. There are many databases today that we can choose from. Here are
some basic rules to help choose between databases for web applications:
A large number of small writes (vote up/down) – Redis
Auto-completion, caching – Redis, memcached
Data mining, trending – MongoDB, Hadoop, and Big Table
Content based web portals – MongoDB, Cassandra, and Sharded ACID databases
Financial Portals – ACID database
Using Ruby
So, if you are now convinced (or rather interested to read on about MongoDB), you might
wonder where Ruby fits in anyway? Ruby is one of the languages that is being adopted the
fastest among all the new-age object oriented languages. But the big differentiator is that
it is a language that can be used, tweaked, and cranked in any way that you want – from
writing sweet smelling code to writing a domain-specific language (DSL)!
Ruby metaprogramming lets us easily adapt to any new technology, frameworks, API, and
libraries. In fact, most new services today always bundle a Ruby gem for easy integration.
There are many Ruby implementations available today (sometimes called Rubies) such as
the original MRI, JRuby, Rubinius, MacRuby, MagLev, and the Ruby Enterprise Edition. Each
of them has a slightly different flavor, much like the different flavors of Linux.
I often have to "sell" Ruby to nontechnical or technically biased people. This simple
experiment never fails:
When I code in Ruby, I can guarantee, "My grandmother can read my code". Can any other
language guarantee that? The following is a simple snippet of code in C:
/* A simple snippet of code in C */
int i;
for (i = 0; i < 10; i++) {
    printf("Hi");
}
And now the same code in Ruby:
# The same snippet of code in Ruby
10.times do
  print "Hi"
end
There is no way that the Ruby code can be misinterpreted. Yes, I am not saying that you
cannot write complex and complicated code in Ruby, but most code is simple to read and
understand. Frameworks, such as Rails and Sinatra, use this feature to ensure that the code
we see is readable! There is a lot of code under the cover which enables this though. For
example, take a look at the following Ruby code:
# library.rb
class Library
  has_many :books
end

# book.rb
class Book
  belongs_to :library
end
It's quite understandable that "A library has many books" and that "A book belongs to
a library".
The really fun part of working in Ruby (and Rails) is the finesse in the language. For example,
in the small Rails code snippet we just saw, books is plural and library is singular. The
framework infers the Book model from the symbol :books and infers the Library
model from the symbol :library – it goes the distance to make code readable.
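To get a feel for the kind of code that sits "under the cover", here is a minimal sketch in plain Ruby, without Rails or Mongoid. The Associations module and the books_class helper are hypothetical names invented for this illustration, and the singularization simply strips a trailing "s"; real frameworks use far richer inflection rules and persist the related objects.

```ruby
# A framework-free sketch of has_many / belongs_to class macros.
# Associations is a hypothetical module, not the Rails or Mongoid API.
module Associations
  def has_many(name)
    # Naive inference: :books -> "Book". Irregular plurals are not handled.
    class_name = name.to_s.chomp("s").capitalize

    # Instance method returning the in-memory collection, created lazily.
    define_method(name) do
      @associations ||= {}
      @associations[name] ||= []
    end

    # Class method exposing the model class inferred from the symbol.
    define_singleton_method("#{name}_class") { Object.const_get(class_name) }
  end

  def belongs_to(name)
    # A plain reader and writer stand in for a real foreign key lookup.
    attr_accessor name
  end
end

class Library
  extend Associations
  has_many :books
end

class Book
  extend Associations
  belongs_to :library
end

library = Library.new
book = Book.new
book.library = library
library.books << book

puts Library.books_class  # prints Book, inferred from :books
puts library.books.length # prints 1
```

The macros run while the class body is being evaluated, which is why has_many :books reads like a declaration even though it is an ordinary method call.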
As a language, Ruby is free flowing with relaxed rules – you can define a method called true in
your class that could return false! Ruby is a language where you do whatever you want as
long as you know its impact. It's a human language and you can do the same thing in many
different ways! There is no right or wrong way; there is only a more efficient way. Here is a
simple example to demonstrate the power of Ruby! How do you calculate the sum of all the
numbers in the array [1, 2, 3, 4, 5]?
The non-Ruby way of doing this in Ruby is:
sum = 0
for element in [1, 2, 3, 4, 5] do
  sum += element
end
The not-so-much-fun way of doing this in Ruby could be:
sum = 0
[1, 2, 3, 4, 5].each do |element|
  sum += element
end
The normal-fun way of doing this in Ruby is:
[1, 2, 3, 4, 5].inject(0) { |sum, element| sum + element }
Finally, the kick-ass way of doing this in Ruby is either one of the following:
[1, 2, 3, 4, 5].inject(&:+)
[1, 2, 3, 4, 5].reduce(:+)
There you have it! So many different ways of doing the same thing in Ruby – but notice how
most Ruby code gets done in one line.
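As a quick check that these variants really agree, the following sketch runs them side by side. The results hash and its keys are names chosen for this illustration only. The last entry uses Array#sum, which exists only on Ruby 2.4 and later (newer than the versions this book targets), so the call is guarded.

```ruby
numbers = [1, 2, 3, 4, 5]

# The explicit loop from the text.
loop_sum = 0
numbers.each { |element| loop_sum += element }

# Collect the result of each style under a descriptive key.
results = {
  each: loop_sum,
  inject_block: numbers.inject(0) { |sum, element| sum + element },
  inject_to_proc: numbers.inject(&:+),
  reduce_symbol: numbers.reduce(:+)
}

# Array#sum is only available on Ruby 2.4 and later.
results[:sum] = numbers.sum if numbers.respond_to?(:sum)

results.each { |name, value| puts "#{name}: #{value}" }
```

One difference is worth keeping in mind: on an empty array, inject(&:+) and reduce(:+) return nil because there is no initial value, whereas inject(0) { ... } returns 0.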
Enjoy Ruby!
What this book covers
Chapter 1, Installing MongoDB and Ruby, describes how to install MongoDB on Linux and
Mac OS. We shall learn about the various MongoDB utilities and their usage. We then install
Ruby using RVM and also get a brief introduction to rbenv.
Chapter 2, Diving Deep into MongoDB, explains the various concepts of MongoDB and how it
differs from relational databases. We learn various techniques, such as inserting and updating
documents and searching for documents. We even get a brief introduction to Map/Reduce.
Chapter 3, MongoDB Internals, shares some details about what BSON is, usage of JavaScript,
the global write lock, and why there are no joins or transactions supported in MongoDB. If
you are a person in the fast lane, you can skip this chapter.
Chapter 4, Working Out Your Way with Queries, explains how we can query MongoDB
documents and search inside different data types such as arrays, hashes, and embedded
documents. We learn about the various query options and even regular expression
based searching.
Chapter 5, Ruby DataMappers: Ruby and MongoDB Go Hand in Hand, provides details
on how to use Ruby data mappers to query MongoDB. This is our first introduction to
MongoMapper and Mongoid. We learn how to configure both of them, query using
these data mappers, and even see some basic comparison between them.
Chapter 6, Modeling Ruby with Mongoid, introduces us to data models, Rails, Sinatra, and how
we can model data using MongoDB data mappers. This is the core of the web application and
we see various ways to model data, organize our code, and query using Mongoid.
Chapter 7, Achieving High Performance on Your Ruby Application with MongoDB,
explains the importance of profiling and ensuring better performance right from the
start of developing web applications using Ruby and MongoDB. We learn some best
practices and concepts concerning the performance of web applications, tools, and
methods which monitor the performance of our web application.
Chapter 8, Rack, Sinatra, Rails, and MongoDB – Making Use of them All, describes in
detail how to build the full web application in Rails and Sinatra using Mongoid. We
design the logical flow, the views, and even learn how to test our code and document it.
Chapter 9, Going Everywhere – Geospatial Indexing with MongoDB, helps us understand
geolocation concepts. We learn how to set up geospatial indexes, get introduced to
geocoding, and learn about geolocation spherical queries.
Chapter 10, Scaling MongoDB, provides details on how we scale MongoDB using replica
sets. We learn about sharding, replication, and how we can improve performance using
MongoDB map/reduce.
Appendix, Pop Quiz Answers, provides answers to the quizzes present at the end of chapters.
What you need for this book
This book requires the following:
MongoDB version 2.0.2 or later
Ruby version 1.9 or later
RVM (for Linux and Mac OS only)
DevKit (for Windows only)
MongoMapper
Mongoid
And other gems, of which I will inform you as we need them!
Who this book is for
This book assumes that you are experienced in Ruby and in web development skills such as
HTML and CSS. Having knowledge of using NoSQL will help you get through the concepts quicker,
but it is not mandatory. No prior knowledge of MongoDB is required.
Conventions
In this book, you will find several headings appearing frequently.
To give clear instructions of how to complete a procedure or task, we use:
Time for action – heading
1. Action 1
2. Action 2
3. Action 3
Instructions often need some extra explanation so that they make sense, so they are
followed with:
What just happened?
This heading explains the working of tasks or instructions that you have just completed.
You will also find some other learning aids in the book, including:
Pop quiz – heading
These are short multiple choice questions intended to help you test your own understanding.
Have a go hero – heading
These set practical challenges and give you ideas for experimenting with what you have learned.
You will also find a number of styles of text that distinguish between different kinds of
information. Here are some examples of these styles, and an explanation of their meaning.
Code words in text are shown as follows: "We can include other contexts through the use of
the include directive."
A block of code is set as follows:
book = {
name: "Oliver Twist",
author: "Charles Dickens",
publisher: "Dover Publications",
published_on: "December 30, 2002",
category: ['Classics', 'Drama']
}
When we wish to draw your attention to a particular part of a code block, the relevant lines
or items are set in bold:
function(key, values) {
var result = {votes: 0}
values.forEach(function(value) {
result.votes += value.votes;
});
return result;
}
Any command-line input or output is written as follows:
$ curl -L get.rvm.io | bash -s stable
New terms and important words are shown in bold. Words that you see on the screen, in
menus or dialog boxes for example, appear in the text like this: "clicking the Next button
moves you to the next screen".
Warnings or important notes appear in a box like this.
Tips and tricks appear like this.
Reader feedback
Feedback from our readers is always welcome. Let us know what you think about this
book: what you liked or may have disliked. Reader feedback is important for us to
develop titles that you really get the most out of.
To send us general feedback, simply send an e-mail to feedback@packtpub.com, and
mention the book title through the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or
contributing to a book, see our author guide on www.packtpub.com/authors.
Customer support
Now that you are the proud owner of a Packt book, we have a number of things to help
you to get the most from your purchase.
Downloading the example code
You can download the example code files for all Packt books you have purchased from
your account at http://www.packtpub.com. If you purchased this book elsewhere,
you can visit http://www.packtpub.com/support and register to have the files
e-mailed directly to you.
Errata
Although we have taken every care to ensure the accuracy of our content, mistakes do
happen. If you find a mistake in one of our books, maybe a mistake in the text or the
code, we would be grateful if you would report this to us. By doing so, you can save
other readers from frustration and help us improve subsequent versions of this book.
If you find any errata, please report them by visiting http://www.packtpub.com/support,
selecting your book, clicking on the errata submission form link, and entering
the details of your errata. Once your errata are verified, your submission will be accepted
and the errata will be uploaded to our website, or added to any list of existing errata,
under the Errata section of that title.
Piracy
Piracy of copyright material on the Internet is an ongoing problem across all media. At
Packt, we take the protection of our copyright and licenses very seriously. If you come
across any illegal copies of our works, in any form, on the Internet, please provide us
with the location address or website name immediately so that we can pursue a remedy.
Please contact us at copyright@packtpub.com with a link to the suspected
pirated material.
We appreciate your help in protecting our authors, and our ability to bring you
valuable content.
Questions
You can contact us at questions@packtpub.com if you are having a problem with
any aspect of the book, and we will do our best to address it.
1
Installing MongoDB and Ruby
MongoDB and Ruby have both been created as a result of technology getting
complicated. They both try to keep it simple and manage all the complicated
tasks at the same time. MongoDB manages "humongous" data and Ruby
is fun. Working together, they form a great bond that gives us what most
programmers desire: a fun way to build large applications!
Now that your interest has increased, we should first set up our system. In this chapter,
we will see how to do the following:
Install Ruby using RVM
Install MongoDB
Configure MongoDB
Set up the initial playground using MongoDB tools
But first, what are the basic system requirements for installing Ruby and MongoDB? Do we
need a heavy-duty server? Nah! On the contrary, any standard workstation or laptop will be
enough. Ensure that you have at least 1 GB memory and more than 32 GB disk space.
Did you say operating system? Ruby and MongoDB are both cross-platform compliant. This
means they can work on any flavor of Linux (such as Ubuntu, Red Hat, Fedora, Gentoo, and
SuSE), Mac OS (such as Leopard, Snow Leopard, and Lion) or Windows (such as XP, 2000,
and 7).
If you are planning on using Ruby and MongoDB professionally, my personal
recommendations for development are Mac OS or Linux. As we want to see detailed
instructions, I am going to use examples for Ubuntu or Mac OS (and point out additional
instructions for Windows whenever I can). While hosting MongoDB databases, I would
personally recommend using Linux.
Although Ruby is cross-platform, most Rubyists tend to
shy away from Windows as it's not always flawless. There are
efforts underway to rectify this.
Let the games begin!
Installing Ruby
I recommend using RVM (Ruby Version Manager) for installing Ruby. The detailed
instructions are available at http://beginrescueend.com/rvm/install/.
Incidentally, RVM was called Ruby Version Manager but its
name was changed to reflect how much more it does today!
Using RVM on Linux or Mac OS
On Linux or Mac OS you can run this initial command to install RVM as follows:
$ curl -L get.rvm.io | bash -s stable
$ source ~/.rvm/scripts/rvm
After this has been successfully run, you can verify it yourself.
$ rvm list known
If you have successfully installed RVM, this should show you the entire list of Rubies
available. You will notice that there are quite a few implementations of Ruby (MRI Ruby,
JRuby, Rubinius, REE, and so on). We are going to install MRI Ruby.
MRI Ruby is the "standard" or original Ruby implementation.
It's called Matz's Ruby Interpreter.
The following is what you will see if you have successfully executed the previous command:
$ rvm list known
# MRI Rubies
[ruby-]1.8.6[-p420]
[ruby-]1.8.6-head
[ruby-]1.8.7[-p352]
[ruby-]1.8.7-head
[ruby-]1.9.1-p378
[ruby-]1.9.1[-p431]
[ruby-]1.9.1-head
[ruby-]1.9.2-p180
[ruby-]1.9.2[-p290]
[ruby-]1.9.2-head
[ruby-]1.9.3-preview1
[ruby-]1.9.3-rc1
[ruby-]1.9.3[-p0]
[ruby-]1.9.3-head
ruby-head
# GoRuby
goruby
# JRuby
jruby-1.2.0
jruby-1.3.1
jruby-1.4.0
jruby-1.6.1
jruby-1.6.2
jruby-1.6.3
jruby-1.6.4
jruby[-1.6.5]
jruby-head
# Rubinius
rbx-1.0.1
rbx-1.1.1
rbx-1.2.3
rbx-1.2.4
rbx[-head]
rbx-2.0.0pre
# Ruby Enterprise Edition
ree-1.8.6
ree[-1.8.7][-2011.03]
ree-1.8.6-head
ree-1.8.7-head
# Kiji
kiji
# MagLev
maglev[-26852]
maglev-head
# Mac OS X Snow Leopard Only
macruby[-0.10]
macruby-nightly
macruby-head
# IronRuby -- Not implemented yet.
ironruby-0.9.3
ironruby-1.0-rc2
ironruby-head
Isn't that beautiful? So many Rubies and counting!
Fun fact
Ruby is probably the only language that has a plural notation!
When we work with multiple versions of Ruby, we collectively
refer to them as "Rubies"!
Before we actually install any Rubies, we should configure the RVM packages that are
necessary for all the Rubies. These are the standard packages that Ruby can integrate with,
and we install them as follows:
$ rvm package install readline
$ rvm package install iconv
$ rvm package install zlib
$ rvm package install openssl
The preceding commands install some useful libraries for all the Rubies that we will
install. These libraries make it easier to work with the command line, internationalization,
compression, and SSL. You can install these packages even after Ruby installation, but it's just
easier to install them first.
$ rvm install 1.9.3
The preceding command will install Ruby 1.9.3 for us. However, while installing Ruby, we
also want to pre-configure it with the packages that we have installed. So, here is how we do
it, using the following commands:
$ export rvm_path=~/.rvm
$ rvm install 1.9.3 --with-readline-dir=$rvm_path/usr --with-iconv-dir=$rvm_path/usr --with-zlib-dir=$rvm_path/usr --with-openssl-dir=$rvm_path/usr
The preceding commands will miraculously install Ruby 1.9.3 configured with the packages
we have installed. We should see something similar to the following on our screen:
$ rvm install 1.9.3
Installing Ruby from source to: /Users/user/.rvm/rubies/ruby-1.9.3-p0,
this may take a while depending on your cpu(s)...
ruby-1.9.3-p0 - #fetching
ruby-1.9.3-p0 - #downloading
ruby-1.9.3-p0, this may take a while depending on your connection...
...
ruby-1.9.3-p0 - #extracting
ruby-1.9.3-p0 to /Users/user/.rvm/src/ruby-1.9.3-p0
ruby-1.9.3-p0 - #extracted to /Users/user/.rvm/src/ruby-1.9.3-p0
ruby-1.9.3-p0 - #configuring
ruby-1.9.3-p0 - #compiling
ruby-1.9.3-p0 - #installing
...
Install of ruby-1.9.3-p0 - #complete
Of course, whenever we start our machine, we do want to load RVM, so do add this line in
your startup profile script:
$ echo '[[ -s "$HOME/.rvm/scripts/rvm" ]] && . "$HOME/.rvm/scripts/rvm" # Load RVM function' >> ~/.bash_profile
This will ensure that RVM, and with it Ruby, is loaded when you log in.
$ rvm requirements is a command that can assist you on
custom packages to be installed. This gives instructions based
on the operating system you are on!
The RVM games
Configuring RVM for a project can be done as follows:
$ rvm --create --rvmrc use 1.9.3@myproject
The previous command allows us to configure a gemset for our project. So, when we move
to this project, it has a .rvmrc file that gets loaded and voila: our very own
custom workspace!
A gemset, as the name suggests, is a group of gems that are loaded for a particular version
of Ruby or a project. As we can have multiple versions of the same gem on a machine, we
can configure a gemset for a particular version of Ruby and for a particular version of the
gem as well!
$ cd /path/to/myproject
Using ruby 1.9.2 p180 with gemset myproject
In case you need to install something via RVM with sudo
access, remember to use rvmsudo instead of sudo!
The Windows saga
RVM does not work on Windows; instead, you can use pik. All the detailed instructions
to install Ruby are available at http://rubyinstaller.org/. It is a pretty simple,
one-click installer.
Do remember to install DevKit as it is required for compiling
native gems.
Using rbenv for installing Ruby
Just like all good things, RVM became quite complex because the community started
contributing heavily to it. Some people wanted just a Ruby version manager, so rbenv was
born. Both are quite popular but there are quite a few differences between rbenv and RVM.
For starters, rbenv does not need to be loaded into the shell and does not override any shell
commands. It's very lightweight and unobtrusive. Install it by cloning the repository into your
home directory as .rbenv. It is done as follows:
$ cd
$ git clone git://github.com/sstephenson/rbenv.git .rbenv
Add the bin directory of the cloned repository ($HOME/.rbenv/bin) to the system path, that is,
the $PATH variable, and you're all set.
rbenv works on a very simple concept of shims. Shims are scripts that understand what
version of Ruby we are interested in. All the versions of Ruby should be kept in the
$HOME/.rbenv/versions directory. Depending on which Ruby version is being used, the shim
inserts that particular path at the start of the $PATH variable. This way, that Ruby version
is picked up!
This enables us to compile the Ruby source code too (unlike RVM where we have to specify
ruby-head).
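To make the path manipulation described above concrete, here is a minimal Ruby sketch. It is a simplified illustration of the idea as this chapter describes it, not rbenv's actual implementation; the helper name `path_for_version`, the home directory, and the version string are assumptions made up for the example.

```ruby
# Simplified sketch of version selection as described in the text: put the
# chosen Ruby's bin directory at the front of PATH so it is found first.
# `path_for_version` is a hypothetical helper, not part of rbenv.
def path_for_version(version, home:, path:)
  bin_dir = File.join(home, ".rbenv", "versions", version, "bin")
  # Drop any earlier occurrence so the directory appears exactly once.
  entries = path.split(File::PATH_SEPARATOR).reject { |entry| entry == bin_dir }
  ([bin_dir] + entries).join(File::PATH_SEPARATOR)
end

current_path = ["/usr/local/bin", "/usr/bin"].join(File::PATH_SEPARATOR)
puts path_for_version("1.9.3-p0", home: "/home/user", path: current_path)
```

Because the selected directory comes first, a lookup for `ruby` or `gem` finds that version before any system-wide installation.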
For more information on rbenv, see https://github.com/sstephenson/rbenv.
Installing MongoDB
MongoDB installers are a bunch of binaries and libraries packaged in an archive. All you
need to do is download and extract the archive. Could this be any simpler?
On Mac OS, you have two popular package managers: Homebrew and MacPorts. If you
are using Homebrew, just issue the following command:
$ brew install mongodb
If you don't have brew installed, it is strongly recommended to install it. But don't fret.
Here is the manual way to install MongoDB on any Linux, Mac OS, or Windows machine:
1. Download MongoDB from http://www.mongodb.org/downloads.
2. Extract the .tgz file to a folder (preferably which is in your system path).
It's done!
On any Linux Shell, you can issue the following commands to download and install. Be sure
to append the /path/to/MongoDB/bin to your $PATH variable:
$ cd /usr/local/
$ curl http://fastdl.mongodb.org/linux/mongodb-linux-i686-2.0.2.tgz > mongo.tgz
$ tar xf mongo.tgz
$ ln -s mongodb-linux-i686-2.0.2 MongoDB
For Windows, you can simply download the ZIP file and extract it in a folder. Ensure that
you update the </path/to/MongoDB/bin> in your system path.
MongoDB v1.6, v1.8, and v2.x are considerably different. Be
sure to install the latest version. Over the course of writing this
book, v2.0 was released and the latest version is v2.0.2. It is
that version that this book will reference.
Configuring the MongoDB server
Before we start the MongoDB server, it's necessary to configure the path where we want to
store our data, the interface to listen on, and so on. All these configurations are stored in
mongod.conf. The default mongod.conf looks like the following code and is stored at the
same location where MongoDB is installed, in our case /usr/local/mongodb:
# Store data in /usr/local/var/mongodb instead of the default /data/db
dbpath = /usr/local/var/mongodb
# Only accept local connections
bind_ip = 127.0.0.1
dbpath is the location where the data will be stored. Traditionally, this used to be /data/db
but this has changed to /usr/local/var/mongodb. MongoDB will create this dbpath if
you have not created it already.
bind_ip is the interface on which the server will run. Don't mess with this entry unless
you know what you are doing!
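To make the file format concrete, here is a minimal Ruby sketch that reads `key = value` lines like the ones in the sample configuration. It is only an illustration, since mongod parses this file itself; `parse_mongod_conf` and `SAMPLE_CONF` are hypothetical names introduced for this example.

```ruby
# Hypothetical helper: parse the simple "key = value" format shown above.
# Blank lines and lines starting with '#' are ignored.
def parse_mongod_conf(text)
  text.each_line.with_object({}) do |line, settings|
    line = line.strip
    next if line.empty? || line.start_with?("#")
    key, value = line.split("=", 2).map(&:strip)
    settings[key] = value
  end
end

# The sample configuration from the text.
SAMPLE_CONF = <<~CONF
  # Store data in /usr/local/var/mongodb instead of the default /data/db
  dbpath = /usr/local/var/mongodb
  # Only accept local connections
  bind_ip = 127.0.0.1
CONF

settings = parse_mongod_conf(SAMPLE_CONF)
puts settings["dbpath"]
puts settings["bind_ip"]
```

A small check like this can be handy in a deployment script to confirm which data directory a given configuration file points at.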
Write-ahead logging is a technique to ensure durability and
atomicity in database systems. Before actually writing to the
database, the information (such as redo and undo) is written to a
log (called the journal). This ensures that recovering from a crash
is reliable and fast. We shall learn more about this in the book.
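The idea can be illustrated with a toy Ruby sketch: every change is appended to a journal before it is applied, so the data can be rebuilt by replaying the journal. This is a conceptual model only and says nothing about how MongoDB implements its journal; the `TinyStore` class is made up for the illustration.

```ruby
# Toy model of write-ahead logging (not MongoDB's implementation).
class TinyStore
  attr_reader :data, :journal

  def initialize(journal = [])
    @journal = journal
    @data = {}
    # Recovery: replay every journaled operation to rebuild the data.
    @journal.each { |entry| apply(entry) }
  end

  def put(key, value)
    entry = { op: :put, key: key, value: value }
    @journal << entry # 1. record the intended change first
    apply(entry)      # 2. only then change the data
  end

  private

  def apply(entry)
    @data[entry[:key]] = entry[:value] if entry[:op] == :put
  end
end

store = TinyStore.new
store.put("name", "Oliver Twist")

# Simulate a crash: the in-memory data is lost, but the journal survives.
recovered = TinyStore.new(store.journal)
puts recovered.data["name"]
```

The ordering inside `put` is the point of the sketch: because the journal entry is written first, a crash between the two steps loses nothing that cannot be replayed.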
Starting MongoDB
We can start the MongoDB server using the following command:
$ sudo mongod --config /usr/local/mongodb/mongod.conf
Remember that if we don't give the --config parameter, the default dbpath will be
taken as /data/db.
When you start the server, if all is well, you should see something like the following:
$ sudo mongod --config /usr/local/mongodb/mongod.conf
Sat Sep 10 15:46:31 [initandlisten] MongoDB starting : pid=14914
port=27017 dbpath=/usr/local/var/mongodb 64-bit
Sat Sep 10 15:46:31 [initandlisten] db version v2.0.2, pdfile version 4.5
Sat Sep 10 15:46:31 [initandlisten] git version:
c206d77e94bc3b65c76681df5a6b605f68a2de05
Sat Sep 10 15:46:31 [initandlisten] build sys info: Darwin erh2.10gen.
cc 9.6.0 Darwin Kernel Version 9.6.0: Mon Nov 24 17:37:00 PST 2008;
root:xnu-1228.9.59~1/RELEASE_I386 i386 BOOST_LIB_VERSION=1_40
Sat Sep 10 15:46:31 [initandlisten] journal dir=/usr/local/var/mongodb/
journal
Sat Sep 10 15:46:31 [initandlisten] recover : no journal files present,
no recovery needed
Sat Sep 10 15:46:31 [initandlisten] waiting for connections on port 27017
Sat Sep 10 15:46:31 [websvr] web admin interface listening on port 28017
The preceding process does not terminate as it is running in the foreground! Some
explanations are due here:
The server started with pid 14914 on port 27017 (default port)
The MongoDB version is 2.0.2
The journal path is /usr/local/var/mongodb/journal (It also mentions that
there is no current journal file, as this is the first time we are starting this up!)
The web admin port is on 28017
The MongoDB server has some pretty interesting command-line
options: -v is verbose, -vv is more verbose, and -vvv is even
more verbose. Include the flag multiple times for more verbosity!
There are plenty of command line options that allow us to use MongoDB in various ways.
For example:
1. --jsonp allows JSONP access.
2. --rest turns on REST API.
3. Master/slave options, replication options, and even sharding options
(We shall see more in Chapter 10, Scaling MongoDB).
Stopping MongoDB
Press Ctrl+C if the process is running in the foreground. If it's running as a daemon, it has
its standard startup script. On Linux flavors such as Ubuntu, you have upstart scripts that
start and stop the mongod daemon. On Mac, you have the launchd and launchctl commands
that can start and stop the daemon. On other flavors of Linux, you would find more of the
resource scripts in the /etc/init.d directory. On Windows, the Services in the Control
Panel can control the daemon process.
The MongoDB CLI
Along with the MongoDB server binary, there are plenty of other utilities too that help us in
administration, monitoring, and management of MongoDB.
Understanding JavaScript Object Notation (JSON)
Even before we see how to use MongoDB utilities, it's important to know how information is
stored. We shall study a lot more of the object model in Chapter 2, Diving Deep into MongoDB.
What is a JavaScript object? Surely you've heard of JavaScript Object Notation (JSON).
MongoDB stores information similar to this. (It's called Binary JSON (BSON), which we shall
read more about in Chapter 3, The MongoDB Internals). BSON, in addition to JSON formats,
is ideally suited for "Document" storage. Don't worry, more information on this later!
So, if you want to save information, you simply use the JSON notation:
{
name : 'Gautam Rege',
passion: [ 'Ruby', 'MongoDB' ],
company : {
name : "Josh Software Private Limited",
country : 'India'
}
}
The previous example shows us how to store information:
String: "" or ''
Integer: 10
Float: 10.1
Array: ['1', 2]
Hash: {a: 1, b: 2}
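Since the rest of this book works from Ruby, it may help to see the same document as a Ruby hash and as JSON text. The sketch below uses only Ruby's standard json library; the constant name `PERSON` is introduced for this example. Note that strict JSON requires double-quoted keys and strings, whereas the shell example above uses the looser JavaScript object syntax.

```ruby
require "json"

# The document from the text, written as a Ruby hash.
PERSON = {
  name: "Gautam Rege",
  passion: ["Ruby", "MongoDB"],
  company: {
    name: "Josh Software Private Limited",
    country: "India"
  }
}.freeze

# Serialize to JSON text: keys and strings come out double-quoted.
json_text = JSON.generate(PERSON)
puts json_text

# Parsing gives back plain Ruby strings, arrays, and hashes (string keys).
parsed = JSON.parse(json_text)
puts parsed["company"]["country"]
puts parsed["passion"].class
```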
Connecting to MongoDB using Mongo
The Mongo client utility is used to connect to MongoDB database. Considering that this
is a Ruby and MongoDB book, it is a utility that we shall use rarely (because we shall be
accessing the database using Ruby). The Mongo CLI client, however, is indeed useful for
testing out basics.
We can connect to MongoDB databases in various ways:
$ mongo book
$ mongo 192.168.1.100/book
$ mongo db.myserver.com/book
$ mongo 192.168.1.100:9999/book
In the preceding cases, we connect to a database called book on localhost, on a remote
server, or on a remote server on a different port. When you connect to a database, you
should see the following:
$ mongo book
MongoDB shell version: 2.0.2
connecting to: book
>
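The argument forms shown above all follow a `[host[:port]/]database` pattern. As a rough sketch of how such an argument breaks down, the following Ruby helper splits it into its parts. `parse_mongo_target` is a hypothetical function written for this illustration, not part of any MongoDB tool; the defaults (127.0.0.1 and port 27017) are taken from the server output shown earlier in this chapter.

```ruby
# Hypothetical helper that splits the argument forms shown above into
# host, port, and database. Defaults: local host and port 27017.
def parse_mongo_target(target)
  address, database = target.include?("/") ? target.split("/", 2) : [nil, target]
  host, port = (address || "127.0.0.1").split(":", 2)
  { host: host, port: (port || 27017).to_i, database: database }
end

targets = [
  "book",
  "192.168.1.100/book",
  "db.myserver.com/book",
  "192.168.1.100:9999/book"
]

targets.each { |target| p parse_mongo_target(target) }
```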
Saving information
To save data, use the JavaScript object and execute the following command:
> db.shelf.save( { name: 'Gautam Rege',
passion : [ 'Ruby', 'MongoDB']
})
>
The previous command saves the data (usually called a "Document") into the collection
shelf. We shall talk more about collections and other terminologies in Chapter 3, MongoDB
Internals. A collection can loosely be compared to a table.
Retrieving information
We have various ways to retrieve the previously stored information:
Fetch the objects in the shelf collection of the book database (the shell displays
the first 20 by default), as follows:
> db.shelf.find()
{ "_id" : ObjectId("4e6bb98a26e77d64db8a3e89"), "name" : "Gautam Rege", "passion" : [ "Ruby", "MongoDB" ] }
>
Find a specific record by the name attribute. This is achieved by executing the
following command:
> db.shelf.find( { name : 'Gautam Rege' })
{ "_id" : ObjectId("4e6bb98a26e77d64db8a3e89"), "name" : "Gautam Rege", "passion" : [ "Ruby", "MongoDB" ] }
>
So far so good! But you may be wondering what the big deal is. This is similar to a select
query I would have fired anyway. Well, here is where things start getting interesting.
Find records by using regular expressions! This is achieved by executing the
following command:
> db.shelf.find( { name : /Rege/ })
{ "_id" : ObjectId("4e6bb98a26e77d64db8a3e89"), "name" : "Gautam Rege", "passion" : [ "Ruby", "MongoDB" ] }
>
Find records by using regular expressions with the case-insensitive flag! This is
achieved by executing the following command:
> db.shelf.find( { name : /rege/i })
{ "_id" : ObjectId("4e6bb98a26e77d64db8a3e89"), "name" : "Gautam Rege", "passion" : [ "Ruby", "MongoDB" ] }
>
As we can see, it's easy when we have programming constructs mixed with database
constructs with a dash of regular expressions.
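The matching rules behind these queries can be sketched in plain Ruby on an in-memory array. This is a simplified model for intuition only: `matches?` and `find` are illustrative helpers rather than a MongoDB API, the second sample document is invented, and real queries support far more (operators, array matching, indexes).

```ruby
# In-memory sketch of the matching rules used in the queries above.
# SHELF is sample data standing in for the shelf collection.
SHELF = [
  { "name" => "Gautam Rege", "passion" => ["Ruby", "MongoDB"] },
  { "name" => "Charles Dickens", "passion" => ["Writing"] }
].freeze

def matches?(document, query)
  query.all? do |field, expected|
    actual = document[field]
    # A Regexp is matched against the value; anything else must be equal.
    expected.is_a?(Regexp) ? expected.match?(actual.to_s) : actual == expected
  end
end

def find(collection, query = {})
  collection.select { |document| matches?(document, query) }
end

exact       = find(SHELF, { "name" => "Gautam Rege" })
insensitive = find(SHELF, { "name" => /rege/i })
sensitive   = find(SHELF, { "name" => /rege/ })

puts exact.length
puts insensitive.length
puts sensitive.length
```

The last query returns nothing because, without the `i` flag, the lowercase pattern does not match the capitalized name.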
Deleting information
No surprises here!
To remove all the data from the shelf collection, execute the following command:
> db.shelf.remove()
>
To remove specific data from the shelf collection, execute the following command:
> db.shelf.remove({name : 'Gautam Rege'})
>
Exporting information using mongoexport
Ever wondered how to extract information from MongoDB? It's mongoexport! What is
pretty cool is that the Mongo data transfer protocol is all in JSON/BSON formats. So what,
you ask? As JSON is now a universally accepted and common format of data transfer,
you can actually export the database, or the collection, directly in JSON format, so even
your web browser can process data from MongoDB. No more three-tier applications! The
opportunities are infinite!
Ok, back to basics. Here is how you can export data from MongoDB:
$ mongoexport -d book -c shelf
connected to: 127.0.0.1
{ "_id" : { "$oid" : "4e6c45b81cb76a67a0363451" }, "name" : "Gautam Rege", "passion" : [ "Ruby", "MongoDB" ]}
exported 1 records
This couldn't be simpler, could it? But wait, there's more. You can export this data into a
CSV file too!
$ mongoexport -d book -c shelf -f name,passion --csv -o test.csv
The preceding command saves data in a CSV file. Similarly, you can export data as a JSON
array too!
$ mongoexport -d book -c shelf --jsonArray
connected to: 127.0.0.1
[{ "_id" : { "$oid" : "4e6c61a05ff70cac810c6996" }, "name" : "Gautam
Rege", "passion" : [ "Ruby", "MongoDB" ] }]
exported 1 records
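The difference between the two JSON outputs above is only in the framing: by default each document is written on its own line, while --jsonArray wraps all documents in a single array. The following Ruby sketch mimics both framings using the standard json library. The helper names and the sample documents are assumptions for illustration, and the exact spacing of real mongoexport output differs.

```ruby
require "json"

# Sample documents standing in for the shelf collection.
DOCUMENTS = [
  { "name" => "Gautam Rege", "passion" => ["Ruby", "MongoDB"] },
  { "name" => "Charles Dickens", "passion" => ["Writing"] }
].freeze

# Default style: one JSON document per line.
def to_json_lines(documents)
  documents.map { |document| JSON.generate(document) }.join("\n")
end

# --jsonArray style: a single JSON array holding every document.
def to_json_array(documents)
  JSON.generate(documents)
end

puts to_json_lines(DOCUMENTS)
puts to_json_array(DOCUMENTS)
```

The line-per-document form is convenient for streaming and for tools that process one record at a time, while the array form can be handed to any JSON parser in a single call.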
Importing data using mongoimport
Wasn't this expected? If there is a mongoexport, you must have a mongoimport! Imagine
when you want to import information; you can do so in a JSON array, CSV, TSV or plain JSON
format. Simple and sweet!
Managing backup and restore using mongodump and mongorestore
Backups are important for any database and MongoDB is no exception. mongodump dumps
the entire database or databases in binary JSON format. We can store this and use this later to
restore it from the backup. This is the closest resemblance to mysqldump! It is done as follows:
$ mongodump -dconfig
connected to: 127.0.0.1
DATABASE: config to dump/config
config.version to dump/config/version.bson
1 objects
config.system.indexes to dump/config/system.indexes.bson
14 objects
...
config.collections to dump/config/collections.bson
1 objects
config.changelog to dump/config/changelog.bson
10 objects
$
$ ls dump/config/
changelog.bson databases.bson mongos.bson system.indexes.bson
chunks.bson lockpings.bson settings.bson version.bson
collections.bson locks.bson shards.bson
Now that we have backed up the database, in case we need to restore it, it is just a matter
of supplying the information to mongorestore, which is done as follows:
$ mongorestore -dbkp1 dump/config/
connected to: 127.0.0.1
dump/config/changelog.bson
going into namespace [bkp1.changelog]
10 objects found
dump/config/chunks.bson
going into namespace [bkp1.chunks]
7 objects found
dump/config/collections.bson
going into namespace [bkp1.collections]
1 objects found
dump/config/databases.bson
going into namespace [bkp1.databases]
15 objects found
dump/config/lockpings.bson
going into namespace [bkp1.lockpings]
5 objects found
...
1 objects found
dump/config/system.indexes.bson
going into namespace [bkp1.system.indexes]
{ key: { _id: 1 }, ns: "bkp1.version", name: "_id_" }
{ key: { _id: 1 }, ns: "bkp1.settings", name: "_id_" }
{ key: { _id: 1 }, ns: "bkp1.chunks", name: "_id_" }
{ key: { ns: 1, min: 1 }, unique: true, ns: "bkp1.chunks", name: "ns_1_min_1" }
...
{ key: { _id: 1 }, ns: "bkp1.databases", name: "_id_" }
{ key: { _id: 1 }, ns: "bkp1.collections", name: "_id_" }
14 objects found
Saving large files using mongofiles
The database should be able to store a large amount of data. Typically, the maximum size of
a stored JSON object is 4 MB (and in v1.7 onwards, 16 MB). So, can we store videos and other
large documents in MongoDB? That is where the mongofiles utility helps.
MongoDB uses the GridFS specification for storing large files. Language bindings are available to
store large files. GridFS splits larger files into chunks and maintains all the metadata in the
collection. It's interesting to note that GridFS is just a specification, not a mandate, and all
MongoDB drivers adhere to this voluntarily.
To manage large files directly in a database, we use the mongofiles utility.
$ mongofiles -d book -c shelf put /home/gautam/Relax.mov
connected to: 127.0.0.1
added file: { _id: ObjectId('4e6c6f9cc7bd0bf42f31aa3b'), filename:
"/Users/gautam/Relax.mov", chunkSize: 262144, uploadDate: new
Date(1315729317190), md5: "43883ace6022c8c6682881b55e26e745", length:
49120795 }
done!
Noce that 47 MB of data was saved in the database. I wouldn't want to leave you in the
dark, so here goes a lile bit of explanaon. GridFS creates an fs collecon that has two
more collecons called chunks and files. You can retrieve this informaon from MongoDB
from the command line or using Mongo CLI.
$ mongofiles –d book list
connected to: 127.0.0.1
/Users/gautam/Downloads/Relax.mov 49120795
Let's use Mongo CLI to fetch this informaon now. This can be done as follows:
$ mongo
MongoDB shell version: 1.8.3
connecting to: test
> use book
switched to db book
> db.fs.chunks.count()
188
> db.fs.files.count()
1
> db.fs.files.findOne()
{
"_id" : ObjectId("4e6c6f9cc7bd0bf42f31aa3b"),
"filename" : "/Users/gautam/Downloads/Relax.mov",
"chunkSize" : 262144,
"uploadDate" : ISODate("2011-09-11T08:21:57.190Z"),
"md5" : "43883ace6022c8c6682881b55e26e745",
"length" : 49120795
}
>
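The chunk count reported by the shell follows directly from the file length and the chunk size in the fs.files document. The short Ruby sketch below works through that arithmetic and shows, on a tiny string, what splitting into fixed-size chunks looks like. The helper names are made up for this illustration and this is not how a MongoDB driver is implemented.

```ruby
# Values copied from the fs.files document shown above.
CHUNK_SIZE = 262_144      # bytes per chunk (256 KB)
FILE_LENGTH = 49_120_795  # bytes in Relax.mov

# Number of chunks needed: full chunks plus one for any remainder.
def chunk_count(length, chunk_size)
  (length + chunk_size - 1) / chunk_size
end

# Split a string of bytes into chunk-sized pieces (illustrative only).
def split_into_chunks(bytes, chunk_size)
  bytes.bytes.each_slice(chunk_size).map { |slice| slice.pack("C*") }
end

puts chunk_count(FILE_LENGTH, CHUNK_SIZE)
puts split_into_chunks("abcdefghij", 4).inspect
```

Here 187 full chunks hold 49,020,928 bytes and the remaining 99,867 bytes go into one final, smaller chunk, which gives the 188 chunks counted in fs.chunks.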
bsondump
This is a utility that helps analyze BSON dumps. For example, if you want to filter all the
objects from a BSON dump of the book database, you could run the following command:
$ bsondump --filter "{name:/Rege/}" dump/book/shelf.bson
This command would analyze the entire dump and get all the objects where name has the
specified value in it! The other very nice feature of bsondump is if we have a corrupted dump
during any restore, we can use the objcheck flag to ignore all the corrupt objects.
Installing Rails/Sinatra
Considering that we aim to do web development with Ruby and MongoDB, Rails or Sinatra
cannot be far behind.
Rails 3 packs a punch. Sinatra was born because Rails 2.x was a really
heavy framework. However, Rails 3 has Metal that can be configured
with only what we need in our application framework. So Rails 3 can be
as lightweight as Sinatra and also get the best of the support libraries.
So Rails 3 it is, if I have to choose between Ruby web frameworks!
Installing Rails 3 or Sinatra is as simple as one command, as follows:
$ gem install rails
$ gem install sinatra
At the time of writing this chapter, Rails 3.2 had just been released in
production mode. That is what we shall use!
Summary
What we have learned so far is about getting comfortable with Ruby and MongoDB. We
installed Ruby using RVM, learned a little about rbenv and then installed MongoDB. We saw
how to configure MongoDB, start it, stop it, and finally we played around with the various
MongoDB utilities to dump information, restore it, save large files and even export to CSV
or JSON.
In the next chapter, we shall dive deep into MongoDB. We shall learn how to work with
documents, save them, fetch them, and search for them, all this using the mongo utility.
We shall also see a comparison with SQL databases.
2
Diving Deep into MongoDB
Now that we have seen the basic files and CLI utilities available with MongoDB,
we shall now use them. We shall see how these objects are modeled via Mongo
CLI as well as from the Ruby console.
In this chapter we shall learn the following:
Modeling the application data.
Mapping it to MongoDB objects.
Creating embedded and relational objects.
Fetching objects.
How does this differ from the SQL way?
Take a brief look at a Map/Reduce, with an example.
We shall start modeling an application, whereby we shall learn various constructs of
MongoDB and then integrate it into Rails and Sinatra. We are going to build the Sodibee
(pronounced as |saw-d-bee|) Library Manager.
Books belong to particular categories including Fiction, Non-fiction, Romance,
Self-learning, and so on. Books belong to an author and have one publisher.
Books can be leased or bought. When books are bought or leased, the customer's details
(such as name, address, phone, and e-mail) are registered along with the list of books
purchased or leased.
An inventory maintains the quantity of each book with the library, the quantity sold and the
number of times it was leased.
Over the course of this book, we shall evolve this application into a full-fledged web
application powered by Ruby and MongoDB. In this chapter we will learn the various
constructs of MongoDB.
Creating documents
Let's first see how we can create documents in MongoDB. As we have briefly seen, MongoDB
deals with collections and documents instead of tables and rows.
Time for action – creating our first document
Suppose we want to create the book object having the following schema:
book = {
name: "Oliver Twist",
author: "Charles Dickens",
publisher: "Dover Publications",
published_on: "December 30, 2002",
category: ['Classics', 'Drama']
}
On the Mongo CLI, we can add this book object to our collection using the following command:
> db.books.insert(book)
Suppose we also add the shelf collection (for example, the floor, the row, the column the
shelf is in, the book indexes it maintains, and so on that are part of the shelf object), which
has the following structure:
shelf = {
name : 'Fiction',
location : { row : 10, column : 3 },
floor : 1,
lex : { start : 'O', end : 'P' }
}
Remember, it's quite possible that a few years down the line, some shelf instances may
become obsolete and we might want to maintain their record. Maybe we could have another
shelf instance containing only books that are to be recycled or donated. What can we do?
We can approach this as follows:
The SQL way: Add additional columns to the table and ensure that there is a default
value set in them. This adds a lot of redundancy to the data. This also reduces the
performance a little and considerably increases the storage. Sad but true!
The NoSQL way: Add the additional fields whenever you want. The following are the
MongoDB schemaless object model instances:
> db.book.shelf.find()
{ "_id" : ObjectId("4e81e0c3eeef2ac76347a01c"), "name" : "Fiction",
"location" : { "row" : 10, "column" : 3 }, "floor" : 1 }
{ "_id" : ObjectId("4e81e0fdeeef2ac76347a01d"), "name" : "Romance",
"location" : { "row" : 8, "column" : 5 }, "state" : "window broken",
"comments" : "keep away from children" }
What just happened?
You will noce that the second object has more elds, namely comments and state. When
fetching objects, it's ne if you get extra data. That is the beauty of NoSQL. When the rst
document is fetched (the one with the name Fiction), it will not contain the state and
comments elds but the second document (the one with the name Romance) will have them.
Are you worried what will happen if we try to access non-exisng data from an object,
for example, accessing comments from the rst object fetched? This can be logically
resolved—we can check the existence of a key, or default to a value in case it's not there,
or ignore its absence. This is typically done anyway in code when we access objects.
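The three options just mentioned (check for the key, fall back to a default, or ignore the absence) could look roughly like this in Ruby. This is only a sketch: the plain hashes stand in for the two shelf documents, no MongoDB driver is involved, and the "no comments" default is an assumption made for illustration.

```ruby
# Plain hashes standing in for the two shelf documents shown above.
fiction = { "name" => "Fiction", "location" => { "row" => 10, "column" => 3 }, "floor" => 1 }
romance = { "name" => "Romance", "location" => { "row" => 8, "column" => 5 },
            "state" => "window broken", "comments" => "keep away from children" }

# 1. Check for the existence of a key before using it.
has_comments = fiction.key?("comments")

# 2. Default to a value when the key is absent (the default text is an assumption).
fiction_comments = fiction.fetch("comments", "no comments")
romance_comments = romance.fetch("comments", "no comments")

# 3. Ignore the absence: a missing key simply reads as nil.
fiction_state = fiction["state"]

puts has_comments, fiction_comments, romance_comments, fiction_state.inspect
```

Which option fits best depends on the calling code; fetch with a default keeps the rest of the code free of nil checks.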
Noce that when the schema changed we did not have to add elds in every object with
default values like we do when using a SQL database. So there is no redundant informaon
in our database. This ensures that the storage is minimal and in turn the object informaon
fetched will have concise data. So there was no redundancy and no compromise on storage
or performance. But wait! There's more.
NoSQL scores over SQL databases
The way many-to-many relaons are managed tells us how we can do more with MongoDB
that just cannot be simply done in a relaonal database. The following is an example:
Each book can have reviews and votes given by customers. We should be able to see these
reviews and votes and also maintain a list of top voted books.
Diving Deep into MongoDB
[ 34 ]
If we had to do this in a relaonal database, this would be somewhat like the relaonship
diagram shown as follows: (get scared now!)
Book User
Votes Review
vote_count
review count
The vote_count and review_countelds are inside the books table that would need to be
updated every me a user votes up/down a book or writes a review. So, to fetch a book along
with its votes and reviews, we would need to re three queries to fetch the informaon:
SELECT * from books where id = 3;
SELECT * from reviews where book_id = 3;
SELECT * from votes where book_id = 3;
We could also use a join for this:
SELECT * FROM books JOIN reviews ON reviews.book_id = books.id JOIN votes
ON votes.book_id = books.id WHERE books.id = 3;
In MongoDB, we can do this directly using embedded documents
or relational documents.
Using MongoDB embedded documents
Embedded documents, as the name suggests, are documents that are embedded in other
documents. This is one of the features of MongoDB and this cannot be done in relational
databases. Ever heard of a table embedded inside another table?
Instead of four tables and a complex many-to-many relationship, we can say that reviews and
votes are part of a book. So, when we fetch a book, the reviews and the votes automatically
come along with the book.
Embedded documents are analogous to chapters inside a book. Chapters cannot be read
unless you open the book. Similarly embedded documents cannot be accessed unless you
access the document.
For the UML savvy, embedded documents are similar to the contains
or composition relationship.
Time for action – embedding reviews and votes
In MongoDB, the embedded object physically resides inside the parent. So if we had to
maintain reviews and votes we could model the object as follows:
book : { name: "Oliver Twist",
reviews : [
{ user: "Gautam",
comment: "Very interesting read"
},
{ user: "Harry",
comment: "Who is Oliver Twist?"
}
],
votes: [ "Gautam", "Tom", "Dick"]
}
What just happened?
We now have reviews and votes inside the book. They cannot exist on their own. Did you
notice that they look similar to JSON hashes and arrays? Indeed, they are an array of hashes.
Embedded documents are just like hashes inside another object.
There is a subtle difference between hashes and embedded objects as we shall see later on
in the book.
Have a go hero – adding more embedded objects to the book
Try to add more embedded objects such as orders inside the book document. It works!
order = {
name: "Toby Jones",
type: "lease",
units: 1,
cost: 40
}
Fetching embedded objects
We can fetch a book along with the reviews and the votes with it. This can be done by
executing the following command:
> var book = db.books.findOne({name : 'Oliver Twist'})
> book.reviews.length
2
> book.votes.length
3
> book.reviews
[
{ user: "Gautam",
comment: "Very interesting read"
},
{ user: "Harry",
comment: "Who is Oliver Twist?"
}
]
> book.votes
[ "Gautam", "Tom", "Dick"]
This does indeed look simple, doesn't it? By fetching a single object, we are able to get the
review and vote count along with the data.
Use embedded documents only if you really have to!
Embedded documents increase the size of the object. So, if we have
a large number of embedded documents, it could adversely impact
performance. Even to get the name of the book, the reviews and
the votes are fetched.
Using MongoDB document relationships
Just like we have embedded documents, we can also set up relationships between
different documents.
Time for action – creating document relations
The following is another way to create the same relationship between books, users, reviews,
and votes. This is more like the SQL way.
book: {
_id: ObjectId("4e81b95ffed0eb0c23000002"),
name: "Oliver Twist",
author: "Charles Dickens",
publisher: "Dover Publications",
published_on: "December 30, 2002",
category: ['Classics', 'Drama']
}
Every document that is created in MongoDB has an object ID associated
with it. In the next chapter, we shall learn about object IDs in
MongoDB. By using these object IDs we can easily identify different
documents. They can be considered as primary keys.
So, we can also create the users, reviews, and votes collections as follows:
users: [
{
_id: ObjectId("8d83b612fed0eb0bee000702"),
name: "Gautam"
},
{
_id : ObjectId("ab93b612fed0eb0bee000883"),
name: "Harry"
}
]
reviews: [
{
_id: ObjectId("5e85b612fed0eb0bee000001"),
user_id: ObjectId("8d83b612fed0eb0bee000702"),
book_id: ObjectId("4e81b95ffed0eb0c23000002"),
comment: "Very interesting read"
},
{
_id: ObjectId("4585b612fed0eb0bee000003"),
user_id : ObjectId("ab93b612fed0eb0bee000883"),
book_id: ObjectId("4e81b95ffed0eb0c23000002"),
comment: "Who is Oliver Twist?"
}
]
votes: [
{
_id: ObjectId("6e95b612fed0eb0bee000123"),
user_id : ObjectId("8d83b612fed0eb0bee000702"),
book_id: ObjectId("4e81b95ffed0eb0c23000002"),
},
{
_id: ObjectId("4585b612fed0eb0bee000003"),
user_id : ObjectId("ab93b612fed0eb0bee000883"),
}
]
What just happened?
Hmm!! Not very interesting, is it? It doesn't even seem right. That's because it isn't the
right choice in this context. It's very important to know how to choose between nesting
documents and relating them.
In your object model, if you will never search by the nested document
(that is, look up the parent from the child), embed it.
Just in case you are not sure about whether you would need to search by an embedded
document, don't worry too much. It does not mean that you cannot search among embedded
objects. You can use Map/Reduce to gather the information. There is more on this later in this
chapter and a lot more detail in Chapter 4, Working out Your Way with Queries.
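To make the trade-off concrete, here is a small in-memory Ruby sketch of the two layouts. It is not MongoDB code: the arrays stand in for collections, and the integer ids are placeholders for ObjectIds, both chosen only for illustration.

```ruby
# Embedded layout: the reviews live inside the book document.
embedded_book = {
  "name"    => "Oliver Twist",
  "reviews" => [
    { "user" => "Gautam", "comment" => "Very interesting read" },
    { "user" => "Harry",  "comment" => "Who is Oliver Twist?" }
  ]
}

# Related layout: reviews are separate documents that point back to the
# book through book_id (integers are placeholders for ObjectIds).
books   = [{ "_id" => 1, "name" => "Oliver Twist" }]
reviews = [
  { "_id" => 10, "book_id" => 1, "user" => "Gautam", "comment" => "Very interesting read" },
  { "_id" => 11, "book_id" => 1, "user" => "Harry",  "comment" => "Who is Oliver Twist?" }
]

# Embedded: one fetch returns the book together with its reviews.
embedded_comments = embedded_book["reviews"].map { |r| r["comment"] }

# Related: a second lookup is needed to collect the reviews ...
book             = books.find { |b| b["name"] == "Oliver Twist" }
related_comments = reviews.select { |r| r["book_id"] == book["_id"] }.map { |r| r["comment"] }

# ... but a child can easily lead us back to its parent.
harrys_review    = reviews.find { |r| r["user"] == "Harry" }
parent_of_review = books.find { |b| b["_id"] == harrys_review["book_id"] }
```

Both layouts yield the same comments; the difference lies in how many lookups are needed and in which direction a search can start.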
Comparing MongoDB versus SQL syntax
This is a good me to sit back and evaluate the similaries and dissimilaries between the
MongoDB syntax and the SQL syntax. Let's map them together:
SQL commands | NoSQL (MongoDB) equivalent
SELECT * FROM books | db.books.find()
SELECT * FROM books WHERE id = 3; | db.books.find( { id : 3 } )
SELECT * FROM books WHERE name LIKE 'Oliver%' | db.books.find( { name : /^Oliver/ } )
SELECT * FROM books WHERE name LIKE '%Oliver%' | db.books.find( { name : /Oliver/ } )
SELECT * FROM books WHERE publisher = 'Dover Publications' AND published_date = "2011-08-01" | db.books.find( { publisher : "Dover Publications", published_date : ISODate("2011-08-01") } )
SELECT * FROM books WHERE published_date > "2011-08-01" | db.books.find( { published_date : { $gt : ISODate("2011-08-01") } } )
SELECT name FROM books ORDER BY published_date | db.books.find( {}, { name : 1 } ).sort( { published_date : 1 } )
SELECT name FROM books ORDER BY published_date DESC | db.books.find( {}, { name : 1 } ).sort( { published_date : -1 } )
SELECT votes.name FROM books JOIN votes WHERE votes.book_id = books.id | db.books.find( { votes : { $exists : 1 } }, { "votes.name" : 1 } )
Some more notable comparisons between MongoDB and relational databases are:
MongoDB does not support joins. Instead it fires multiple queries or uses
Map/Reduce. We shall soon see why the NoSQL faction does not favor joins.
SQL has stored procedures. MongoDB supports JavaScript functions.
MongoDB has indexes similar to SQL.
MongoDB also supports Map/Reduce functionality.
MongoDB supports atomic updates like SQL databases.
Embedded or related objects are used sometimes instead of a SQL join.
MongoDB collections are analogous to SQL tables.
MongoDB documents are analogous to SQL rows.
Using Map/Reduce instead of join
We have seen this menoned a few mes earlier—it's worth jumping into it, at least briey.
Map/Reduce is a concept that was introduced by Google in 2004.
It's a way of distributed task processing. We "map" tasks to works
and then "reduce" the results.
Understanding functional programming
Funconal programming is a programming paradigm that has its roots from lambda calculus.
If that sounds inmidang, remember that JavaScript could be considered a funconal
language. The following is a snippet of funconal programming:
$(document).ready( function () {
$('#element').click( function () {
// do something here
});
$('#element2').change( function () {
// do something here
});
});
We can have funcons inside funcons. Higher-level languages (such as Java and Ruby)
support anonymous funcons and closures but are sll procedural funcons. Funconal
programs rely on results of a funcon being chained to other funcons.
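Since the rest of this book works in Ruby, here is a rough Ruby illustration of the same ideas: an anonymous function that closes over a local variable, and results chained from one call into the next. The second book and the threshold value are made up for the example.

```ruby
# A closure: the lambda remembers `threshold` from the scope that defined it.
threshold = 2
popular = ->(book) { book[:votes].length >= threshold }

books = [
  { name: "Oliver Twist",       votes: %w[Gautam Tom Dick] },
  { name: "Great Expectations", votes: %w[Harry] }   # hypothetical second book
]

# Chaining: each call returns a new collection that feeds the next call.
popular_names = books.select(&popular).map { |b| b[:name].upcase }
total_votes   = books.map { |b| b[:votes].length }.reduce(0) { |sum, n| sum + n }
```

Notice that map and reduce already appear here as ordinary collection methods; Map/Reduce applies the same two steps across a whole database.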
Building the map function
The map funcon processes a chunk of data. Data that is fed to this funcon could be
accessed across a distributed lesystem, mulple databases, the Internet, or even any
mathemacal computaon series!
function map(void) -> void
The map funcon "emits" informaon that is collected by the "myscal super giganc
computer program" and feeds that to the reducer funcons as input.
MongoDB as a database supports this paradigm making it "the all powerful" (of course
I am joking, but it does indeed make MongoDB very powerful).
Time for action – writing the map function for calculating vote
statistics
Let's assume we have a document structure as follows:
{ name: "Oliver Twist",
votes: ['Gautam', 'Harry'],
published_on: "December 30, 2002"
}
The map function for such a structure could be as follows:
function() {
// emit the number of votes so that reduce can add the counts up
emit( this.name, {votes : this.votes.length} );
}
What just happened?
The emit funcon emits the data. Noce that the data is emied as a (key, value) structure.
Key: This is the parameter over which we want to gather informaon. Typically it
would be some primary key, or some key that helps idenfy the informaon.
For the SQL savvy, typically the key is the eld we use in
the GROUP BY clause.
Value: This is a JSON object. This can have mulple values and this is the data that is
processed by the reduce funcon.
We can call emit more than once in the map funcon. This would mean we are processing
data mulple mes for the same object.
Building the reduce function
The reduce funcons are the consumer funcons that process the informaon emied from
the map funcons and emit the results to be aggregated. For each emied data from the
map funcon, a reduce funcon emits the result. MongoDB collects and collates the results.
This makes the system of collecon and processing as a massive parallel processing system
giving the all mighty power to MongoDB.
The reduce funcons have the following signature:
function reduce(key, values_array) -> value
Time for action – writing the reduce function to process emitted
information
This could be the reduce function for the previous example:
function(key, values) {
var result = {votes: 0}
values.forEach(function(value) {
result.votes += value.votes;
});
return result;
}
What just happened?
reduce takes an array of values, so it is important to process an array every time. Later
on in the book we shall see how there are various options to Map/Reduce that help us
process data.
Let's analyze this function in more detail, starting with how the result is initialized:
var result = {votes: 0}
The variable result has a structure similar to what was emitted from the map function. This
is important, as we want the results from every document in the same format. If we need to
process the results further, we can use the finalize function (more on that later).
The loop over the values is the other part worth a closer look:
values.forEach(function(value) {
result.votes += value.votes;
});
The values are always passed as an array. It's important that we iterate over the array, as there
could be multiple values emitted from different map functions with the same key. So, we
process the whole array to ensure that we collate the results instead of overwriting them.
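One reason the result must keep the same shape as the emitted values is that the output of one reduce call may be fed into another reduce call for the same key. The following Ruby sketch imitates that behaviour in memory; the lambda and the sample vote counts are assumptions for illustration only.

```ruby
# Ruby counterpart of the reduce function above: sums the vote counts.
reduce = lambda do |key, values|
  result = { votes: 0 }
  values.each { |value| result[:votes] += value[:votes] }
  result
end

values = [{ votes: 2 }, { votes: 1 }, { votes: 4 }]

# Reducing everything in one pass ...
all_at_once = reduce.call("Oliver Twist", values)

# ... gives the same answer as reducing in batches and then reducing the
# partial results, because the output has the same shape as the input.
partial_a  = reduce.call("Oliver Twist", values[0, 2])
partial_b  = reduce.call("Oliver Twist", values[2, 1])
in_batches = reduce.call("Oliver Twist", [partial_a, partial_b])
```

If reduce returned a bare number instead of a hash with a votes key, the second round of reducing would fail, which is why the shapes are kept identical.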
Understanding the Ruby perspective
Unl now we have just been playing around with MongoDB. Now let's have a look at this
from Ruby. Aaahhh… bliss!
For this example, we shall write some basic classes in Ruby. We are using Rails 3 and the
Mongoid wrapper for MongoDB. (We shall see more about MongoDB wrappers later in
the book)
Setting up Rails and MongoDB
To set up a Rails project, we rst need to install the Rails gem. We shall also install the
Bundler gem that goes hand-in-hand with Rails.
Time for action – creating the project
First we shall create the sample Rails project. Assuming you have installed Ruby already, we
need to install Rails. The following command shows how to install Rails and Bundler.
$ gem install rails
$ gem install bundler
What just happened?
The preceding commands will install Rails and Bundler. For the sake of this example, I am
working with Rails 3.2.0 (that is, the current latest version) but I recommend that you should
use the latest version of Rails available.
Understanding the Rails basics
Rails is a web framework written in Ruby. It was released publicly in 2005 and it has gathered
a lot of steam since then. It is interesting to note that until Rails 2.x, the framework was a
tightly coupled one. This was when other loosely coupled web frameworks made their way
into the developer market. The most popular among them were Merb and Sinatra. These
frameworks leveraged Ruby to its full potential but were competing against each other.
Around 2008-2009, the Rails core team (David Heinemeier Hansson and team)
met the makers of Merb (Yehuda Katz and team) and they got
together and discussed a strategy that has literally changed the
face of web development. Rails 3 emerged with a bang; it had a
brand new framework with Metal and Rack with loosely coupled
components and very customizable middleware. This has made
Rails extremely popular today.
Using Bundler
Bundler is another awesome gem by "Carlhuda" (Yehuda Katz and Carl Lerche) that manages gem
dependencies in Ruby applications.
Why do we need Bundler?
In the "olden" days, when everything was a system installation, things would be running
smoothly till somebody upgraded a system library or a gem... and then Kaboom! The
application crashed for no apparent reason and no code change. Some libraries break
compatibility, which in turn requires us to install the new gems. So, even if a system
administrator upgraded the system (as a routine maintenance activity), our Ruby
application was prone to crashes.
A bigger problem arose when we were required to install multiple Ruby applications on
the same system. Ruby version, Rails version, gem versions, and system libraries all could
potentially clash to make development and deployment a nightmare!
One solution was to freeze gems and the Ruby version. This required us to ship everything in
our application bundle. Not only was this inefficient but it also increased the size of the bundle.
Then came along Bundler and, as the name suggests, it keeps track of dependencies in a
Ruby application. Java has a similar tool called Maven. But wait! Bundler has more in
store. We can now package gems (via a Gemfile) and specify environments with it. So, if we
require some gems only for testing, they can be specified to be a part of only the "test" group.
If that hasn't sold you on Bundler, we can specify the source of the gem files
too: GitHub, SourceForge, or even a gem in our local filesystem.
Bundler generates Gemfile.lock that manages the gem dependencies for the application.
It uses the system-installed gems, so that we don't have to freeze gems or Ruby versions with
each application.
Setting up Sodibee
Now that we have installed Rails and Bundler, it's time to set up the Sodibee project.
Time for action – start your engines
Now we shall create the Sodibee project in Rails 3. It can be done using the following
command:
$ rails new sodibee -JO
In the previous command, -J means skip-prototype (and use jQuery instead) and -O
means skip-activerecord. This is important, as we want to use MongoDB.
Add the following to Gemle:
gem 'mongoid'
gem 'bson'
gem 'bson_ext'
Now on command line, type the following:
$ bundle install
In Rails 3.2.1 a lot of automaon has been added. bundle install
is part of the process of creang a project.
What just happened?
The previous command, bundle install, fetches missing gems and their dependencies, and
installs them. It then generates Gemfile.lock. After bundle install is complete, you
would see the following on the screen:
$ bundle install
Fetching source index for http://rubygems.org/
Using rake (0.9.2)
Using abstract (1.0.0)
Using activesupport (3.2.0)
Using builder (2.1.2)
Using i18n (0.5.0)
Using activemodel (3.2.0)
Using erubis (2.6.6)
Using rack (1.2.4)
Using rack-mount (0.6.14)
Using rack-test (0.5.7)
Installing tzinfo (0.3.30)
Using actionpack (3.2.0)
Using mime-types (1.16)
Using polyglot (0.3.2)
Using treetop (1.4.10)
Using mail (2.2.19)
Using actionmailer (3.2.0)
Using arel (2.0.10)
Using activerecord (3.2.0)
Using activeresource (3.2.0)
Using bson (1.4.0)
Using bundler (1.0.10)
Using mongo (1.3.1)
Installing mongoid (2.2.1)
Using rdoc (3.9.4)
Using thor (0.14.6)
Using railties (3.2.0)
Using rails (3.2.0)
Your bundle is complete! Use `bundle show [gemname]` to see where a
bundled gem is installed.
Setting up Mongoid
Now that the Rails applicaon is set up, let's congure Mongoid.
Mongoid is an Object Document Mapper (ODM) tool that maps Ruby objects to MongoDB
documents. We shall learn a lot more in detail in the later chapters on Mongoid and other
similar ODM tools. For now, we shall simply issue the command to congure Mongoid.
Time for action – configuring Mongoid
The Mongoid gem has a Rails generator command to configure Mongoid.
A Rails generator, as the name suggests, sets up files. Generators are
used frequently in gems to set up config files with default settings.
g can be used instead of writing generate.
$ rails g mongoid:config
What just happened?
This command created a config/mongoid.yml file that is used to connect to MongoDB.
The file would look like the following code snippet:
development:
  host: localhost
  database: sodibee_development
test:
  host: localhost
  database: sodibee_test
# set these environment variables on your prod server
production:
  host: <%= ENV['MONGOID_HOST'] %>
  port: <%= ENV['MONGOID_PORT'] %>
  username: <%= ENV['MONGOID_USERNAME'] %>
  password: <%= ENV['MONGOID_PASSWORD'] %>
  database: <%= ENV['MONGOID_DATABASE'] %>
  # slaves:
  #   - host: slave1.local
  #     port: 27018
  #   - host: slave2.local
  #     port: 27019
Noce that there are now three environments to work with—development, test, and
producon. By default, Rails will pick up the development environment. We do not need
to explicitly create the database in MongoDB. The rst call to the database will create the
database for us.
The previous command also congures the config/application.rb to ensure that
AcveRecord is disabled. AcveRecord is the default Rails ORM (Object Relaonal Mapper).
As we are using Mongoid, we need to disable AcveRecord.
Building the models
Now that we have the project set up, it's me we create the models. Each model will
autocreate collecons in MongoDB. To create a model, all we need to do is create a le
in the app/models folder.
Time for action – planning the object schema
Here we shall build the dierent models and add their relaons.
Building the book model
This app/models/book.rb would contain the following code:
class Book
  include Mongoid::Document
  field :title, type: String
  field :publisher, type: String
  field :published_on, type: Date
  field :votes, type: Array
  belongs_to :author
  has_and_belongs_to_many :categories
  embeds_many :reviews
end
What just happened?
Let's study the previous code snippet in more detail, starting with the first line inside the class:
include Mongoid::Document
The preceding code includes the Mongoid module to save the documents in MongoDB.
include is the Ruby way of adding methods to a Ruby class by
including modules. This is called a module mixin. We can include as
many modules in a class as we want. Modules make the class richer
by adding all the module methods as instance methods.
extend is the Ruby way of adding class methods to a Ruby class by
extending it with modules. All the methods from the modules the class
is extended with become class methods.
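The difference between include and extend is easy to try out in plain Ruby. In the following sketch, the Describable module and the Shelf and Library classes are hypothetical names used only for this demonstration; they are not part of Sodibee or Mongoid.

```ruby
# A small module whose single method relies on a `label` method
# provided by whatever mixes it in.
module Describable
  def describe
    "I am #{label}"
  end
end

class Shelf
  include Describable   # describe becomes an instance method

  attr_reader :label

  def initialize(label)
    @label = label
  end
end

class Library
  extend Describable    # describe becomes a class method

  def self.label
    "the library"
  end
end

instance_result = Shelf.new("Fiction").describe
class_result    = Library.describe
```

Here Shelf.describe and Library.new.describe are both undefined, which shows that the same module lands in a different place depending on the keyword used.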
Let's have a look at the field declarations from the previous snippet again:
field :title, type: String
field :publisher, type: String
field :published_on, type: Date
field :votes, type: Array
The previous code congures the name and the type of the elds for a document.
Noce the Ruby 1.9 syntax for a hash. No more hash rockets (=>). Instead
in we use the JSON notaon directly. Remember it's type:String and
not type : String. You must have the key and the colon (:) together.
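A quick way to convince yourself that the two notations are interchangeable is to compare them in irb. The default key below is only an extra illustrative entry, not an option taken from the Book model.

```ruby
# Ruby 1.9 style: the key and the colon are written together.
new_style = { type: String, default: "unknown" }

# Older style with hash rockets; the keys are the very same symbols.
old_style = { :type => String, :default => "unknown" }

same_hash   = (new_style == old_style)
symbol_keys = new_style.keys
```

Both literals build the same hash with symbol keys, so the newer notation is purely a matter of readability.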
Now consider the following line from the same class:
belongs_to :author
This line makes the book a relational document. This means that the document has a
reference to the author document.
The next line to look at is the following:
has_and_belongs_to_many :categories
This line sets up a many-to-many relationship between books and categories.
Finally, let's look at the last line of the class:
embeds_many :reviews
The previous snippet is an example of nested or embedded documents. All the review
documents will be embedded into the books.
Have a go hero – building the remaining models
We need the Author, Category, and Review models. Here is how we can do this.
The app/models/author.rb file contains the following code:
class Author
include Mongoid::Document
field :name, type: String
has_many :books
end
The app/models/category.rb file contains the following code:
class Category
include Mongoid::Document
field :name, type: String
has_and_belongs_to_many :books
end
Note that the category and books have a many-to-many relationship. The app/models/review.rb
file contains the following code:
class Review
include Mongoid::Document
field :comment, type: String
field :username, type: String
embedded_in :book
end
It's very important that the inverse relation, that is, embedded_in, is mentioned in
reviews. This tells Mongoid how to store the embedded object. If this is not written, objects
will not get embedded.
Testing from the Rails console
Nothing is ever complete without testing. The Rails community is almost fanatical about
integrating tests into the project. We shall learn about testing soon, but for now let's test our
code from the Rails console.
Time for action – putting it all together
Now we shall test these models to see if they indeed work as expected. We shall create
different objects and their relations. The fun begins! Let's start the Rails console and create
our first book object:
$ rails console
The Rails console is an interactive command-line prompt
that loads the Rails environment and the models. It's the best way
to check and test if our data models are correct.
Let's create a book now. We can do that using the following code:
> b = Book.new(title: "Oliver Twist", publisher: "Dover Publications",
published_on: Date.parse("2002-12-30") )
=> #<Book _id: 4e86e45efed0eb0be0000010, _type: nil, title: "Oliver
Twist", publisher: "Dover Publications", published_on: 2002-12-30
00:00:00 UTC, votes: nil, author_id: nil, category_ids: []>
Here, we have populated the basic title, publisher, and published_on fields. Now let's
work with the relations. Let's create an author, which can be done as follows:
> Author.create(name: "Charles Dickens")
=> #<Author _id: 4e86e4b6fed0eb0be0000011, _type: nil, name: "Charles
Dickens">
Let's create a couple of categories too. This can be done as follows:
> Category.create(name: "Fiction")
=> #<Category _id: 4e86e4cbfed0eb0be0000012, _type: nil, name:
"Fiction", book_ids: []>
> Category.create(name: "Drama")
=> #<Category _id: 4e86e4d9fed0eb0be0000013, _type: nil, name: "Drama",
book_ids: []>
Now, let's add an author and some categories to our book. This can be done as follows:
> b.author = Author.where(name: "Charles Dickens").first
=> #<Author _id: 4e86e4b6fed0eb0be0000011, _type: nil, name: "Charles
Dickens">
> b.categories << Category.first
=> []
> b.categories << Category.last
=> []
> b
=> #<Book _id: 4e86df21fed0eb0be000000b, _type: nil, title: "Oliver
Twist", publisher: "Dover Publications", published_on: 2002-12-30
00:00:00 UTC, votes: nil, author_id: BSON::ObjectId('4e86e4b6fed0eb0
be0000011'), category_ids: [BSON::ObjectId('4e86e4cbfed0eb0be0000012'),
BSON::ObjectId('4e86e4d9fed0eb0be0000013')]>
> b.save
=> true
Remember to save the object!
save returns true if the object was saved successfully,
otherwise it returns false. To have an exception raised
when the save is unsuccessful, use save! instead.
What just happened?
We have just created books, authors, and categories.
Hmm... category and books have a many-to-many relationship. So does this mean that
category objects should also be updated? Let's check:
> Category.first
=> #<Category _id: 4e86e4cbfed0eb0be0000012, _type: nil, name:
"Fiction", book_ids: [BSON::ObjectId('4e86e45efed0eb0be0000010')]>
> Category.last
=> #<Category _id: 4e86e4d9fed0eb0be0000013, _type: nil, name: "Drama",
book_ids: [BSON::ObjectId('4e86e45efed0eb0be0000010')]>
Yeah! We are in good shape.
Let's check what MongoDB has stored. Start the Mongo CLI and see the books.
We can do this as follows:
$ mongo
MongoDB shell version: 1.8.3
connecting to: test
> use sodibee_development
switched to db sodibee_development
> db.books.findOne()
{
"_id" : ObjectId("4e86e45efed0eb0be0000010"),
"category_ids" : [
ObjectId("4e86e4cbfed0eb0be0000012"),
ObjectId("4e86e4d9fed0eb0be0000013")
],
"name" : "Oliver Twist",
"publisher" : "Dover Publications",
"published_on" : ISODate("2002-12-30T00:00:00Z"),
"author_id" : ObjectId("4e86e4b6fed0eb0be0000011")
}
>
And let's see the categories and author objects too:
> db.categories.findOne()
{
"_id" : ObjectId("4e86e4cbfed0eb0be0000012"),
"book_ids" : [
ObjectId("4e86e45efed0eb0be0000010")
],
"name" : "Fiction"
}
> db.categories.findOne({name: "Drama"})
{
"_id" : ObjectId("4e86e4d9fed0eb0be0000013"),
"book_ids" : [
ObjectId("4e86e45efed0eb0be0000010")
],
"name" : "Drama"
}
> db.authors.findOne()
{ "_id" : ObjectId("4e86e4b6fed0eb0be0000011"), "name" : "Charles
Dickens" }
>
All is well!
Have a go hero – adding more books, authors, and categories
Let's get creave (and funny) by adding the following:
Adventures of Banana Man by Willie Slip in the Adventure category.
World's craziest Moments and Dizzying moments by Mary Go Round in
the Travel category.
Procrasnate and Laziness Personied by Toby D Cided in the Self-help category
Understanding many-to-many relationships in MongoDB
In a SQL database, a many-to-many relationship is done using an intermediate table. For
example, the many-to-many relationship we have mentioned previously between books
and categories would be achieved in the following manner in a SQL database:
Books
  id int(10) auto increment
  name varchar(255)
Categories
  id int(10) auto increment
  name varchar(255)
Category_books
  id int(10) auto increment
  category_id references categories(id)
  book_id references books(id)
As MongoDB is a schemaless database, we do not need any additional intermediate collections.
The following is what the book object stores:
> db.books.findOne()
{
"_id" : ObjectId("4e86e45efed0eb0be0000010"),
"category_ids" : [
ObjectId("4e86e4cbfed0eb0be0000012"),
ObjectId("4e86e4d9fed0eb0be0000013")
],
"name" : "Oliver Twist",
"publisher" : "Dover Publications",
"published_on" : ISODate("2002-12-30T00:00:00Z"),
"author_id" : ObjectId("4e86e4b6fed0eb0be0000011")
}
>
The following is what the category object stores:
> db.categories.findOne()
{
"_id" : ObjectId("4e86e4cbfed0eb0be0000012"),
"book_ids" : [
ObjectId("4e86e45efed0eb0be0000010")
],
"name" : "Fiction"
}
No intermediate collecons needed!
Using embedded documents
When we built the models, we embedded reviews in the book model. An example would be
ideal to explain this.
Time for action – adding reviews to books
Let's start the Rails console again and add reviews to books. This is done as follows:
> b = Book.where(title: "Oliver Twist").first
=> #<Book _id: 4e86e45efed0eb0be0000010, _type: nil, title: "Oliver
Twist", publisher: "Dover Publications", published_on: 2002-12-30
00:00:00 UTC, votes: nil, author_id: nil, category_ids: []>
> b.reviews.create(comment: "Fast paced book!", username: "Gautam")
=> #<Review _id: 4e86f6c8fed0eb0be0000019, _type: nil, comment: "Fast
paced book!", username: "Gautam">
> b.reviews.create(comment: "Excellent literature", username: "Tom")
=> #<Review _id: 4e86f6fffed0eb0be000001a, _type: nil, comment:
"Excellent literature", username: "Tom">
What just happened?
That's it! We just created reviews for books. Let's fetch them and check:
> b.reviews
=> [#<Review _id: 4e86f68bfed0eb0be0000018, _type: nil,
comment: "Fast paced book!", username: "Gautam">, #<Review _id:
4e86f6fffed0eb0be000001a, _type: nil, comment: "Excellent literature",
username: "Tom">]
Let's look at the following code to see what was stored in MongoDB:
> db.books.findOne()
{
"_id" : ObjectId("4e86e45efed0eb0be0000010"),
"author_id" : ObjectId("4e86e4b6fed0eb0be0000011"),
"category_ids" : [
ObjectId("4e86e4cbfed0eb0be0000012"),
ObjectId("4e86e4d9fed0eb0be0000013")
],
"name" : "Oliver Twist",
"published_on" : ISODate("2002-12-30T00:00:00Z"),
"publisher" : "Dover Publications",
"reviews" : [
{
"comment" : "Fast paced book!",
"username" : "Gautam",
"_id" : ObjectId("4e86f68bfed0eb0be0000018")
},
{
"comment" : "Excellent literature",
"username" : "Tom",
"_id" : ObjectId("4e86f6fffed0eb0be000001a")
}
]
}
>
Noce that the reviews are embedded inside the book object. Now when we fetch the book
object, we will automacally get all the reviews too.
Choosing whether to embed or not to embed
Suppose we want to prepare orders for a book. The book can be leased or purchased. If
we want to maintain an order history in terms of lease and purchase, how do we build the
Lease, Purchase, and Order models?
Time for action – embedding Lease and Purchase models
We have three model files, Order, Lease, and Purchase, as follows:
# app/models/order.rb
class Order
  include Mongoid::Document
  field :created_at, type: DateTime
  field :type, type: String # Lease, Purchase
  belongs_to :book
  embeds_one :lease
  embeds_one :purchase
end
Now, depending on the type field, we can determine which embedded object to pick up:
the lease or the purchase. You can design the Lease and Purchase models as shown in the
following code:
# app/models/lease.rb
class Lease
  include Mongoid::Document
  field :from, type: DateTime
  field :till, type: DateTime
  embedded_in :order
end
# app/models/purchase.rb
class Purchase
  include Mongoid::Document
  field :quantity, type: Integer
  field :price, type: Float
  embedded_in :order
end
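The idea of picking the embedded object from the type field can be sketched in plain Ruby. This is only an illustration: the Struct classes below stand in for the Mongoid models so that the snippet runs without a database, and the details method is a hypothetical helper that is not part of Mongoid.

```ruby
# Plain-Ruby stand-ins for the Mongoid models (assumption: no database needed).
Lease    = Struct.new(:from, :till)
Purchase = Struct.new(:quantity, :price)

Order = Struct.new(:type, :lease, :purchase) do
  # Hypothetical helper: return the embedded object that matches `type`.
  def details
    case type
    when "Lease"    then lease
    when "Purchase" then purchase
    else raise ArgumentError, "unknown order type: #{type.inspect}"
    end
  end
end

order = Order.new("Purchase", nil, Purchase.new(2, 9.99))
puts order.details.quantity
```

In a real Mongoid model, the same case statement could live inside the Order class and return the lease or purchase relation.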
Working with Map/Reduce
To see an example of how Map/Reduce works, let's now add votes to books. The following
shows how we can add votes:
{
"username" : "Dick",
"rating" : 5
}
Rang could be on a scale of 1 to 10, with 10 being the best. Every user can rate a book.
Our aim is to collect the total rang by all users. We shall save this informaon as a hash in
the votes array in the book object. This should not be confused with an embedded object
(as it does not have an object ID).
We have not seen the MongoDB data types such as ObjectId
and ISODate. We shall learn about these data types in the future
chapters. All usual data types such as integer, oat, string, hash,
and array are supported.
The following is how we save this informaon as a hash in the votes array in the book object:
> db.books.findOne()
{
"_id" : ObjectId("4e86e45efed0eb0be0000010"),
"author_id" : ObjectId("4e86e4b6fed0eb0be0000011"),
"category_ids" : [
ObjectId("4e86e4cbfed0eb0be0000012"),
ObjectId("4e86e4d9fed0eb0be0000013")
],
"name" : "Oliver Twist",
"published_on" : ISODate("2002-12-30T00:00:00Z"),
"publisher" : "Dover Publications",
"reviews" : [
{
"comment" : "Fast paced book!",
"username" : "Gautam",
"_id" : ObjectId("4e86f68bfed0eb0be0000018")
},
{
"comment" : "Excellent literature",
"username" : "Tom",
"_id" : ObjectId("4e86f6fffed0eb0be000001a")
}
],
"votes" : [
{
"username" : "Gautam",
"rating" : 3
}
]
}
Before we see the example of Map/Reduce, it would be fun to add more books and votes,
so that the Map/Reduce results make more sense. This is done as shown next:
> Book.create(name: "Great Expectations", author: Author.first)
=> #<Book _id: 4e8704fdfed0eb0f97000001, _type: nil, title: nil,
publisher: nil, published_on: nil, votes: nil, author_id: BSON::Ob
jectId('4e86e4b6fed0eb0be0000011'), category_ids: [], name: "Great
Expectations">
> Book.create(name: "A tale of two cities", author: Author.first)
=> #<Book _id: 4e870521fed0eb0f97000002, _type: nil, title: nil,
publisher: nil, published_on: nil, votes: nil, author_id: BSON::Object
Id('4e86e4b6fed0eb0be0000011'), category_ids: [], name: "A tale of two
cities">
Now let's add votes for all three books.
First, for Oliver Twist (for example, one vote by Gautam)
> b = Book.first
=> #<Book _id: 4e86e45efed0eb0be0000010, _type: nil, title: nil,
publisher: "Dover Publications", published_on: 2002-12-30 00:00:00 UTC,
votes: nil, author_id: BSON::ObjectId('4e86e4b6fed0eb0be0000011'),
category_ids: [BSON::ObjectId('4e86e4cbfed0eb0be0000012'), BSON::ObjectId
('4e86e4d9fed0eb0be0000013')], name: "Oliver Twist">
> b.votes = []
=> []
> b.votes << {username: "Gautam", rating: 3}
=> [{:username=>"Gautam", :rating=>3}]
> b.save
=> true
Note that we rst set b.votes = [] ,that is, an empty array. This is
because MongoDB does not add the elds to the database unl they
are populated. So, by default b.votes would return nil. Hence it's
important to inialize it the rst me.
Now, for Great Expectaons (for example, three votes, one each by Gautam, Tom, and Dick)
> b = Book.where(name: "Great Expectations").first
=> #<Book _id: 4e8704fdfed0eb0f97000001, _type: nil, title: nil,
publisher: nil, published_on: nil, votes: nil, author_id: BSON::Ob
jectId('4e86e4b6fed0eb0be0000011'), category_ids: [], name: "Great
Expectations">
> b.votes = []
=> []
> b.votes << {username: "Gautam", rating: 9}
=> [{:username=>"Gautam", :rating=>9}]
> b.votes << {username: "Tom", rating: 3}
=> [{:username=>"Gautam", :rating=>9}, {:username=>"Tom", :rating=>3}]
> b.votes << {username: "Dick", rating: 7}
=> [{:username=>"Gautam", :rating=>9}, {:username=>"Tom", :rating=>3},
{:username=>"Dick", :rating=>7}]
> b.save
=> true
Finally, for A Tale of Two Cities (for example, two votes, one each by Gautam and Dick)
> c = Book.where(name: /cities/).first
=> #<Book _id: 4e870521fed0eb0f97000002, _type: nil, title: nil,
publisher: nil, published_on: nil, votes: nil, author_id: BSON::Object
Id('4e86e4b6fed0eb0be0000011'), category_ids: [], name: "A tale of two
cities">
> c.votes = []
=> []
> c.votes << {username: "Gautam", rating: 9}
=> [{:username=>"Gautam", :rating=>9}]
> c.votes << {username: "Dick", rating: 5}
=> [{:username=>"Gautam", :rating=>9}, {:username=>"Dick", :rating=>5}]
> c.save
=> true
If we want to collect all the votes and add up the rating for each user, it can be a pretty
cumbersome task to iterate over all of these objects. This is where Map/Reduce helps us.
One alternative to Map/Reduce in this particular example would be
to capture the vote count per book by incrementing a counter while
inserting votes and reviews itself. However, we shall use Map/Reduce
here so that we understand how it works.
Time for action – writing the map function to calculate ratings
This is how we can write the map function. As we have seen earlier, this function will emit
information; in our case, the key is the username and the value is the rating:
function() {
  this.votes.forEach(function(x) {
    emit(x.username, {rating: x.rating});
  });
}
What just happened?
This is a JavaScript funcon. MongoDB understands and processes all JS funcons. Every me
emit() is called, some data is emied for the reduce funcon to process. In the preceding
code this represents the collecon object.
What we want to do is emit all the rangs for each element in the votes array for every
book. The emit() takes the key and value as parameters. So, we are eming the users
votes for the reduce funcon to process. It's also important to remember the data structure
we are eming as the value. It should be consistent for all objects. In our case {rating:
x.rating}.
Time for action – writing the reduce function to process the
emitted results
Now let's write the reduce function. This takes a key and an array of values, shown as follows:
function(key, values) {
  var result = {rating: 0};
  values.forEach(function(value) {
    result.rating += value.rating;
  });
  return result;
}
What just happened?
The reduce funcon is the one which processes the values that were emied from the
map funcon.
Remember that the values parameter is always an array. The map funcon could emit
results for the same key mulple mes, so we should be sure to process the value as an
array and accumulate results. The return structure should be the same as what was emied.
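To see the two phases without a running MongoDB, here is a rough plain-Ruby emulation of the map and reduce functions. It is a sketch under stated assumptions: map_votes and reduce_votes are illustrative names, the grouping step merely imitates what the engine does between the phases, and the sample data mirrors the votes added on the Rails console.

```ruby
# Sample data: the votes stored on the three books above.
BOOKS = [
  { name: "Oliver Twist",         votes: [{ username: "Gautam", rating: 3 }] },
  { name: "Great Expectations",   votes: [{ username: "Gautam", rating: 9 },
                                          { username: "Tom",    rating: 3 },
                                          { username: "Dick",   rating: 7 }] },
  { name: "A tale of two cities", votes: [{ username: "Gautam", rating: 9 },
                                          { username: "Dick",   rating: 5 }] }
]

# Map phase: emit one [key, value] pair per vote.
def map_votes(book)
  book[:votes].map { |vote| [vote[:username], { rating: vote[:rating] }] }
end

# Reduce phase: fold the values for one key into the same shape that was emitted.
def reduce_votes(_key, values)
  { rating: values.sum { |value| value[:rating] } }
end

# Group the emitted pairs by key, then reduce each group.
emitted = BOOKS.flat_map { |book| map_votes(book) }
grouped = emitted.group_by(&:first)
TOTALS  = grouped.map { |key, pairs| [key, reduce_votes(key, pairs.map(&:last))] }.to_h

TOTALS.sort.each { |user, value| puts "#{user}: #{value[:rating]}" }
```

Because reduce_votes returns the same structure it receives, a partial result can be passed back in as one of the values, which is exactly why the emitted and returned shapes must match.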
MongoDB supports Map/Reduce and will invoke Map/Reduce
functions in parallel. This gives it power over standard SQL databases.
The closest a SQL database comes to this is when we use a GROUP
BY query. Depending on the indexes and the query fired, that can get
us results similar to Map/Reduce.
Using Map/Reduce together
As MongoDB requires JavaScript functions, the trick here is to pass the JavaScript functions
to the MongoDB engine via a string on the Rails console. So, we create two strings for the
map and reduce functions.
Time for action – working with Map/Reduce using Ruby
We shall now create two strings in Ruby for these functions:
> map = %q{function() {
    this.votes.forEach(function(x) {
      emit(x.username, {rating: x.rating});
    });
  }
}
> reduce = %q{function(key, values) {
    var result = {rating: 0};
    values.forEach(function(value) {
      result.rating += value.rating;
    });
    return result;
  }
}
%q is an ecient, clean, and opmized way of wring mulline
strings in Ruby!
Remember that we are now in the MongoDB realm, so we should not work on Ruby
objects but only on the MongoDB collecon. So, we call map_reduce on the book
collecon, as follows:
> results = Book.collection.map_reduce(map, reduce, out: "vr")
=> #<Mongo::Collection:0x20cf7a4 @name="vr", @db=#<Mongo::DB:0x1ab8564 @
name="sodibee_development",
...
...
@cache_time=300, @cache={}, @safe=false, @pk_factory=BSON::ObjectId, @
hint=nil>
The output you saw previously is the MongoDB collection that holds the Map/Reduce result. Let's fetch the
full results now. The following command does it for us:
> results.find().to_a
=> [{"_id"=>"Dick", "value"=>{"rating"=>12.0}}, {"_id"=>"Gautam",
"value"=>{"rating"=>21.0}}, {"_id"=>"Tom", "value"=>{"rating"=>3.0}}]
What just happened?
Voila! This shows that we have the following result:
Dick has 12 rangs
Gautam has 21 rangs
Tom has 3 rangs
Tally these rangs manually with the preceding code and verify.
What would you have to do if you did not have Map/Reduce?
Iterate over all book objects and collect the votes array. Then
keep a temporary hash of usernames and keep aggregang the
rangs. Lots of work indeed!
Don't always jump into using Map/Reduce. Sometimes it's just easier to query properly.
Suppose we want to find all the books that have votes or reviews for them. What do we do?
Do we iterate every book object and check the length of the votes array or the
reviews array?
Do we run Map/Reduce for this?
Is there a direct query for this?
We can directly re a query from the Rails console, as follows:
irb> Book.any_of({:reviews.exists => true}, {:votes.exists => true})
If we want to search directly on the mongo console, we have to execute the following
command:
mongo> db.books.find({"$or":[{reviews:{"$exists" : true}}, {votes :
{"$exists": true}}]})
Remember, we should use Map/Reduce only when we have to process data and return
results (for example, when it's mostly statistical data). For most cases, there would be a
query (or multiple queries) that would get us our results.
Pop quiz – swimming in MongoDB and Ruby
1. How does MongoDB store data?
a. As JSON.
b. As Binary JSON or BSON.
c. As text in les.
d. An encrypted binary le.
2. What are collecons in MongoDB?
a. Collecons store documents.
b. Collecons store other collecons.
c. There is no such thing as collecons.
3. How do we represent an array of hashes in MongoDB?
a. Arrays can only have strings or integers in them.
b. Like this [ { k1: "v1" }, { k1: "v2"} ].
c. Hashes are not supported in MongoDB.
d. Like this { k1: [ "v1", "v2"], k2: ["v1", "v2"] }.
4. Which answer represents one of the ways models in Ruby communicate
with MongoDB?
a. Models in Ruby cannot talk directly to MongoDB.
b. Install the BSON gem.
c. Install the Mongoid gem and include Mongoid::Document in the Ruby class.
d. We inherit the Ruby class from ActiveRecord::Base.
5. How are many-to-many relationships mapped in MongoDB?
a. We create a third collection to store ObjectId instances.
b. Many-to-many is not supported in MongoDB.
c. Each document saves the other in an Array field inside it.
d. Only one document saves information about the other.
6. How can we create a join of two collections in MongoDB?
a. We cannot! Joins are not supported in MongoDB.
b. db.collection1.find( { $join: "collection2" } ).
c. Always use Map/Reduce instead of joins.
d. db.join( { collection1: 1, collection2: 1 } ).
Summary
Here we really jumped into Ruby and MongoDB, didn't we? We saw how to create objects in
MongoDB directly and then via Ruby using Mongoid. We saw how to set up a Rails project,
configure Mongoid, and build models. We even went the distance to see how Map/Reduce
would work in MongoDB.
We saw a lot of new things too, which require explanation. For example, the various data
types that are supported in MongoDB, such as ObjectId and ISODate.
In the next chapter, we shall dive deeper into these internal concepts and understand more
about how MongoDB works. Hang on tightly!
3
MongoDB Internals
Now that we have had a brief look at Ruby and MongoDB interactions via
Mongoid, I believe it is the right time to know what happens under the hood.
This information is good to know but not mandatory. If you are a person in the
fast lane, you can skip this chapter and go straight to Chapter 4, Working Out
Your Way with Queries.
In this chapter we shall learn:
What exactly MongoDB documents and objects are.
What is BSON and how is it used in MongoDB to save information?
How and why does MongoDB use JavaScript?
What are MongoDB journal entries; how and why are they written?
What is the global write lock and how does it funcon?
Why are there no joins in MongoDB?
We have seen some examples of MongoDB objects earlier; these objects look similar to
JSON objects. However, MongoDB does not use JSON to store information; it uses Binary
JSON (BSON) for storage. Using BSON has a lot of advantages that we shall soon see.
Understanding Binary JSON
The following is a sample of a JSON object we have seen before:
{
"_id" : ObjectId("4e86e45efed0eb0be0000010"),
"author_id" : ObjectId("4e86e4b6fed0eb0be0000011"),
"category_ids" : [
ObjectId("4e86e4cbfed0eb0be0000012"),
ObjectId("4e86e4d9fed0eb0be0000013")
],
"name" : "Oliver Twist",
"published_on" : ISODate("2002-12-30T00:00:00Z"),
"publisher" : "Dover Publications"
}
There is a strange JSON output here (that I refrained from explaining earlier) for ObjectId
and ISODate. What is even stranger is that this data is not saved to the disk in the same
format as shown in the preceding code. Instead it is saved as Binary JSON—a serialized JSON
string. The following is a simple example:
{"hello": "world"}
Every BSON document has the following format, where each element is a type byte, a key, and a value:
<size> <elements> <null byte>
The data in the preceding example is stored on the disk in the following format:
\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00
This is explained as follows:
\x16\x00\x00\x00: This indicates that the size of the binary data is 22 bytes
(remember 16 hex is 22 decimal).
\x02: This indicates that the value is a BSON string.
hello\x00: This is the key, which is always a null-terminated string.
\x06\x00\x00\x00world\x00: This is the value: its length (6 bytes, including the
trailing null) followed by the null-terminated string itself.
\x00: This final null byte marks the end of the document.
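The byte layout above can be reproduced by hand in a few lines of Ruby. This is a minimal sketch that handles string values only; bson_string_element and bson_document are illustrative helper names, not the API of the bson gem.

```ruby
# Encode one string element: type byte, null-terminated key, then the value as a
# 4-byte little-endian length (including the trailing null) plus the bytes.
def bson_string_element(key, value)
  bytes = value.encode("UTF-8").b
  "\x02".b + key.b + "\x00".b + [bytes.bytesize + 1].pack("l<") + bytes + "\x00".b
end

# Wrap the elements in a document: total size, elements, closing null byte.
def bson_document(hash)
  body = hash.map { |key, value| bson_string_element(key.to_s, value) }.join.b
  [4 + body.bytesize + 1].pack("l<") + body + "\x00".b
end

HELLO_BSON = bson_document("hello" => "world")
puts HELLO_BSON.bytesize   # 22, matching the \x16 size prefix
puts HELLO_BSON.inspect
```

The size prefix counts itself (4 bytes), the elements, and the closing null byte, which is how 17 bytes of element data become a 22-byte document.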
You might ask, "Why not just plain old { "hello" : "world" }?" There are plenty of reasons:
Binary data is easier to store and manipulate
Binary data is packed, so it consumes less space
Insertions and deletions in binary embedded objects are easy
Of course, more explanations are due!
Fetching and traversing data
As the data is in BSON format, it's easy to traverse it. The first 4 bytes tell us how much
data is stored, so that objects can be easily skipped without parsing the data. It's easy to
skip embedded data too, as the size of all the data is known.
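As a rough illustration of skipping by size, the following Ruby sketch walks a buffer of back-to-back BSON documents and records where each one starts, reading nothing but the 4-byte size prefix. The two documents are written out by hand, and document_offsets is a hypothetical helper, not part of any driver.

```ruby
# Two hand-written BSON documents, {"hello": "world"} and {"a": "b"}.
DOC1 = "\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00".b
DOC2 = "\x0E\x00\x00\x00\x02a\x00\x02\x00\x00\x00b\x00\x00".b

# Return the starting offset of every document in the buffer,
# jumping from one size prefix to the next without parsing the contents.
def document_offsets(buffer)
  offsets = []
  pos = 0
  while pos < buffer.bytesize
    header = buffer.byteslice(pos, 4)
    raise ArgumentError, "truncated header at offset #{pos}" if header.bytesize < 4
    size = header.unpack1("l<")
    if size < 5 || pos + size > buffer.bytesize
      raise ArgumentError, "corrupt size at offset #{pos}"
    end
    offsets << pos
    pos += size
  end
  offsets
end

OFFSETS = document_offsets(DOC1 + DOC2)
puts OFFSETS.inspect
```

The second document is found at offset 22 simply because the first one declared itself to be 22 bytes long.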
Manipulating data
When an embedded document is manipulated, MongoDB simply calculates the offset and
reaches it. Now, when some data is changed or added to this embedded object, we don't
need to write the entire object back to the disk; MongoDB simply updates that BSON
document and the length of the data. This is quick and clean.
What is ObjectId?
ObjectId is a unique ID for a document. It is a 12-byte binary value designed to have a
reasonably high probability of being unique when allocated. By default, the ObjectId
is stored in the _id field.
The concept of a unique Object ID as a primary key is important for MongoDB. In a highly
scalable system, this ensures that an Object ID "almost" never repeats. The first 4 bytes of
ObjectId indicate the time (in seconds) since epoch and the last 3 bytes represent a counter.
Even if you insert two documents at the same moment, the counter value should increase.
There is no such thing as a guaranteed unique ID, but it's almost guaranteed.
According to Wikipedia, "Only after generating 1 billion UUIDs every second
for the next 100 years, the probability of creating just one duplicate would
be about 50%". Object IDs are not UUIDs, but they are designed to be unique
in practice in a similar way.
Object ID is generated using the timestamp, 3 bytes of the MD5 hash of the machine name,
its MAC address or a virtual machine ID, the process ID, and an ever-incrementing value.
Though every object has a unique ID, you would notice incrementing values for object IDs.
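To make the layout concrete, the following Ruby sketch splits one of the ObjectIds seen earlier into its parts. It assumes the layout described above (4-byte timestamp, 3-byte machine hash, 2-byte process ID, 3-byte counter); decode_object_id is an illustrative helper, not a driver method, and newer MongoDB versions may fill the middle bytes differently.

```ruby
# Split a 24-character hex ObjectId into the fields described above.
def decode_object_id(hex)
  raise ArgumentError, "expected 24 hex characters" unless hex.match?(/\A\h{24}\z/)
  {
    time:    Time.at(hex[0, 8].to_i(16)).utc,  # seconds since epoch
    machine: hex[8, 6],                        # 3-byte machine hash
    pid:     hex[14, 4].to_i(16),              # 2-byte process ID
    counter: hex[18, 6].to_i(16)               # 3-byte incrementing counter
  }
end

DECODED = decode_object_id("4e86e45efed0eb0be0000010")
puts DECODED.inspect
```

Decoding the neighbouring IDs from the earlier examples (ending in 11, 12, and 13) gives the same machine and process values with the counter going up by one each time, which is the incrementing pattern mentioned above.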
Documents and collections
Documents in MongoDB are structured documents saved in BSON format as mentioned in
the earlier section. The maximum size of documents is 16 MB. It's interesting to note that
16 MB is not a technical limitation but a limit maintained for the sake of sanity!
In case we are required to store documents larger than 16 MB, MongoDB may be the wrong
choice. For storing large documents, such as videos, GridFS is recommended.
Documents are analogous to records and are stored in collections, which are analogous
to database tables. Documents in a collection are usually structured similarly but it's
not mandatory. That means you can have differently structured documents in the same
collection. That's the essence of NoSQL or a "schema-free" database.
Collections can be scoped, or namespaced. For example, we could have a collection rack
which has shelves and panels in it. These collections have other collections inside them:
db.rack
db.rack.shelves
db.rack.shelves.sections
db.rack.panels
db.rack.panels.components
Capped collections
Capped collecons have a xed number of documents in them. They can be considered as a
"queue" that discards the oldest element when the cap is reached. The ideal example for this
is log entries. We create capped collecons as follows:
Db.createCollection("myqueue", {capped: true, size: 10000})
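The queue-like behaviour can be pictured with a toy Ruby class. This is only a simulation: CappedLog is a made-up name, and it caps by number of entries for simplicity, whereas the size option of a real capped collection is a number of bytes.

```ruby
# A toy stand-in for a capped collection: keeps at most `max` entries and
# drops the oldest one when a new entry would exceed the cap.
class CappedLog
  def initialize(max)
    raise ArgumentError, "max must be at least 1" if max < 1
    @max = max
    @entries = []
  end

  def insert(doc)
    @entries.shift if @entries.size >= @max  # discard the oldest entry
    @entries << doc
    self
  end

  def to_a
    @entries.dup
  end
end

LOG = CappedLog.new(3)
(1..5).each { |n| LOG.insert(line: "request #{n}") }
puts LOG.to_a.inspect
```

After five inserts into a log capped at three entries, only requests 3, 4, and 5 remain, in insertion order.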
Dates in MongoDB
Dates are saved independent of the time zone. They are always stored as epoch time: the
time in milliseconds since January 1, 1970 (UTC).
> new ISODate("2011-12-31T12:01:02+04:30")
ISODate("2011-12-31T07:31:02Z")
> new ISODate("sdf")
Tue Nov 8 08:14:49 uncaught exception: invalid ISO date
> new ISODate("garbage 2011-12-31T12:01:02+05:30 more garbage")
ISODate("2011-12-31T06:31:02Z")
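The same normalization can be checked from Ruby with the standard library alone. This is a small sketch, not driver code: it parses the first timestamp above, converts it to UTC, and computes a millisecond count of the kind BSON stores for dates.

```ruby
require "time"

# Parse an ISO 8601 timestamp that carries a +04:30 offset.
local = Time.iso8601("2011-12-31T12:01:02+04:30")

# Normalized to UTC, the offset disappears.
UTC_STRING = local.getutc.iso8601   # "2011-12-31T07:31:02Z"

# Milliseconds since the epoch; the time zone plays no part in this number.
EPOCH_MILLIS = local.to_i * 1000

puts UTC_STRING
puts EPOCH_MILLIS
```

Two timestamps written with different offsets but naming the same instant produce the same millisecond value, which is why the stored date carries no time zone.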
JavaScript and MongoDB
JavaScript seems a strange choice for server-side code execution in a database. However, it's
definitely a better choice than writing a custom language syntax: JavaScript is a very popular
language, well known among developers, and just like MongoDB it's evolving fast too.
We have already seen the use of JavaScript in Map/Reduce functions. But we can do more
than that. We can write our own custom JavaScript functions and call them when we want.
Consider them more like stored procedures written in JavaScript.
db.eval is a function that is used to evaluate custom JavaScript functions that we write.
Time for action – writing our own custom functions in MongoDB
Let's say we want to write a function to delete authors that don't have any books. We can
write this in JavaScript as follows:
function rid_fakes() {
  var ids = [];
  db.authors.find().forEach( function(obj) {
    // count() asks the server for the number of matches without loading the books
    if (db.books.find({author_id: obj._id }).count() == 0 ) {
      ids.push(obj._id);
    }
  });
  db.authors.remove({_id : { $in : ids }});
}
db.eval(rid_fakes);
In a Ruby app, it's recommended to manage the objects rather than the
documents. This is to ensure that the cache does not get corrupted.
Ensuring write consistency or "read your writes"
It's very important to ensure that the database is eventually consistent. As we shall soon
see, MongoDB delays all writes to the disk because the disk's I/O is slow. Write consistency
means that every time something is written to the database, the delayed write should not
cause inconsistency when we read back the data. MongoDB ensures this consistency for
every write operation and the updated value is always returned back in the read operation.
This is important for a couple of reasons:
Ensuring you always get the latest updated data
Easy and consistent crash recovery
How does MongoDB use its memory-mapped storage engine?
MongoDB tries to be as ecient and fast as it can get. So, to cater to this, it uses
memory-mapped les for storage. This is as fast as it can get with the disk I/O and
system cache. As every operang system works with virtual memory, MongoDB
leverages this and can eecvely be as large as the virtual memory allows it to be.
Memory-mapped les are segments of virtual memory that are mapped
byte-for-byte between the le and the memory. So, they can be
considered as fast as primary memory.
This also has an inherent advantage that as the operang system's virtual memory
management gets beer, it automacally improves the performance of the database
storage engine too!
There is a downside to everything! Memory-mapped les store informaon in the memory
and sync to the database aer a short while (by default in MongoDB that is 100 ms). So, we are
indeed dealing with a database where we could potenally lose the last 100 ms of informaon.
Advantages of write-ahead journaling
MongoDB (v1.7.5 onwards) supports write-ahead journaling. This means that before the
data is written to the collections, it is written to the journal. This ensures that there is always
write consistency. For every write to the database:
1. Information is first written to the journal.
2. After the journal entry is synchronized to the disk, data is written to
the database's memory-mapped file.
3. Information is then synchronized to the disk.
It's important to know that when a MongoDB client writes to the database, it is guaranteed
to return the updated result. If journaling fails, the entire write operation is deemed to have
failed. Journaling can be turned off but it's strongly recommended to keep it enabled.
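The ordering of these steps can be mimicked with a toy in-memory store. This is purely an illustration of the write-ahead idea, with made-up names (JournaledStore, replay); it does not reflect how MongoDB lays out its journal files.

```ruby
# A toy write-ahead log: every write is appended to the journal first
# and only then applied to the data.
class JournaledStore
  attr_reader :journal, :data

  # Passing an existing journal simulates a restart after a crash.
  def initialize(journal = [])
    @journal = journal
    @data = {}
    replay
  end

  def write(key, value)
    @journal << [key, value]  # step 1: record the write in the journal
    @data[key] = value        # step 2: apply it to the in-memory data
  end

  private

  # Rebuild the data from the journal, as crash recovery would.
  def replay
    @journal.each { |key, value| @data[key] = value }
  end
end

STORE = JournaledStore.new
STORE.write("name", "Oliver Twist")
STORE.write("publisher", "Dover Publications")

# Simulate a crash: the in-memory data is gone, but the journal survived.
RECOVERED = JournaledStore.new(STORE.journal.dup)
puts RECOVERED.data.inspect
```

Because the journal entry exists before the data is touched, replaying the journal is enough to bring a restarted store back to the last written state.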
Global write lock
I menoned earlier that MongoDB writes to the disk (using fsync) every 100 ms. However,
when this data is being wrien to the disk, it's important to keep it consistent. Hence,
MongoDB, for quite some versions, used a global write lock to ensure this.
This creates a problem because the enre database is locked unl the write is complete. This
means that if we have a long running write query, the database is locked for good and the
performance and eciency is seriously hit.
The later versions of MongoDB (at the time of writing) plan to implement a collection-based
lock to ensure that we can write simultaneously across collections, but it's not there today.
What it does have instead is lock yielding. That means any MongoDB thread will yield its
lock on page faults or long running queries. This solves the problem of the global lock to a
level of acceptable efficiency. This is also called interleaving: when a long running write is
in progress, the thread yields temporarily for intermediate reads and writes.
Transactional support in MongoDB
MongoDB's primary objecves are to manage large data, be fast, and scale easily! So,
it's never going to be a perfect t for all applicaons. This has been the source of debate
between the SQL and NoSQL facons.
From a praccal perspecve, we should know there are no ACID transacons in MongoDB.
There are a few ways to do transacons in MongoDB but it may not always be a suitable
choice. Basically if you require a mul-document transacon, such as nancial data that is
spread across dierent collecons, MongoDB may be the wrong choice. However, for most
web applicaons, transaconal support is usually a sanity check and not a complex rollback.
In any case, choose wisely!
Understanding embedded documents and atomic updates
All document updates in MongoDB are atomic. This can itself be a very easy way to simulate
transactional support in MongoDB. For example, if we require Orders to be created with
LineItems, we can easily simulate a transaction by embedding LineItems into Order.
That way when the document is saved, we are guaranteed atomic transactions.
Implementing optimistic locking in MongoDB
We can do opmisc locking using lock versioning. First let's understand what this means.
Every me the document, object, record, or row in the database is updated, we increment
a value of the eld. When we read the document, we know the value of the eld. When we
want to save the document, we ensure that the value we had read earlier has not changed.
If it's dierent, it means someone updated the document before us—so we need to read it
again. This is also called Compare and Set (CAS).
Opmisc locking already exists in AcveRecord. If you simply add
a column called lock_version in your table, it starts opmisc
locking. StateObjectError is raised in case the document's
lock_version value has changed.
Time for action – implementing optimistic locking
Let's add a eld in our document called lock_version and set its inial value as 0.
When we fetch this object, we know what the version is. So, when we re the update call,
we ensure that it's part of the object selector!
mongo> db.authors.findOne()
{
"_id" : ObjectId("4f81832efed0eb0bbb000002"),
"name" : "Victor Metz",
"_type" : "Author",
"lock_version" : 0
}
mongo> db.authors.update({ _id: ObjectId("4f81832efed0eb0bbb000002"),
lock_version: 0 }, { $set: {name: "Victor Matz", lock_version: 1} })
mongo> db.authors.find({ _id: ObjectId("4f81832efed0eb0bbb000002") })
{ "_id" : ObjectId("4f81832efed0eb0bbb000002"), "name" : "Victor Matz", "_type" : "Author", "lock_version" : 1 }
mongo> db.authors.update({ _id: ObjectId("4f81832efed0eb0bbb000002"),
lock_version: 0 }, { $set: {name: "NO SUCH AUTHOR", lock_version: 1} })
mongo> db.authors.find({ _id: ObjectId("4f81832efed0eb0bbb000002") })
{ "_id" : ObjectId("4f81832efed0eb0bbb000002"), "name" : "Victor Matz", "_type" : "Author", "lock_version" : 1 }
What just happened?
What's important is to keep a check on the lock_version field. When we fetched the
author object the first time, the lock_version value was 0.
mongo> db.authors.update(
  { _id: ObjectId("4f81832efed0eb0bbb000002"), lock_version: 0 },
  { $set: {name: "Victor Matz", lock_version: 1} })
We are not just updang an object that has an ID equal to 4f81832efed0eb0bbb000002
but also where the lock_version eld is set. Noce that lock_version is being updated.
This is a programmer's instrucon. If we don't update lock_version manually, this strategy
would fail! Now we have lock_version set at value 1. If we tried to update the object as
shown in the following code snippet, the object selecon would fail and the object would
not be updated:
mongo> db.authors.update(
{ _id: ObjectId("4f81832efed0eb0bbb000002"), lock_version: 0 },
{name: "NO SUCH AUTHOR", lock_version: 1})
If that object has been modied by some other process or thread, lock_version would
have been incremented. So, the object in our preceding query would not get updated if the
lock version changes. But how do we do this in our Ruby program?
How do we perform Opmisc locking using Mongoid?
There are a few extensions available for this. See an example here at
https://github.com/burgalon/mongoid_optimistic_
locking. Basically, this changes the atomic_selector method to
include a _lock_version eld and auto-increment it on every save!
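The compare-and-set idea itself can be sketched in plain Ruby. The AuthorStore class below is a hypothetical in-memory stand-in for the authors collection, not the Mongoid extension mentioned above; update_if_unchanged plays the role of an update whose selector includes lock_version.

```ruby
# In-memory stand-in for a collection: _id => document.
class AuthorStore
  def initialize
    @docs = {}
  end

  def insert(id, attrs)
    @docs[id] = attrs.merge(lock_version: 0)
  end

  def find(id)
    @docs.fetch(id).dup
  end

  # Compare-and-set: apply the change only if the stored lock_version
  # still equals the version the caller read earlier.
  def update_if_unchanged(id, expected_version, changes)
    current = @docs.fetch(id)
    return false unless current[:lock_version] == expected_version
    @docs[id] = current.merge(changes).merge(lock_version: expected_version + 1)
    true
  end
end

AUTHORS = AuthorStore.new
AUTHORS.insert(1, name: "Victor Metz")
seen = AUTHORS.find(1)[:lock_version]

FIRST  = AUTHORS.update_if_unchanged(1, seen, name: "Victor Matz")     # version matches
SECOND = AUTHORS.update_if_unchanged(1, seen, name: "NO SUCH AUTHOR")  # stale version
puts [FIRST, SECOND, AUTHORS.find(1)].inspect
```

A caller that gets false back would typically re-read the document and retry with the fresh lock_version.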
Choosing between ACID transactions and MongoDB transactions
Finally, we have seen how we can manipulate data safely using atomic operations and ensure
data consistency. However, where you require transactions that span multiple documents or
tables and that is a critical feature of your application, consider not using MongoDB.
For everything else, there's MongoDB.
Why are there no joins in MongoDB?
Joins are good, they say! And for a good reason: normalization is the best option! Let's say
we have authors, books, and orders. What if we wanted to find the orders of books sold
by authors that have the name Mark? An SQL query would probably be something like the
following query:
SELECT * FROM orders, books, authors WHERE books.author_id = authors.id
AND orders.book_id = books.id AND authors.first_name LIKE 'Mark%'
This causes an implicit join between authors, books, and orders. This is fine only under
the following circumstances:
The data in authors, books, and orders is not huge! If we had 1 million entries in
each table, it could reach a temporary join of around 1 million * 1 million * 1 million
entries, degrading the performance drastically. Every RDBMS is smart enough not to
create such a huge temporary table of course, but the result set is still huge.
The data is not distributed between nodes (sharded). If it is, the network
latency to gather information for a join from different nodes is going to be huge.
These are a few reasons why the NoSQL faction shies away from joins. As we have seen
earlier, the priorities for MongoDB are managing huge data with easy scaling, sharding, and
faster querying. So, what are the alternatives to joins? Plenty!
The simplest solution is to fire multiple queries and programmatically get your
result set. As querying is fast, the cumulative time taken by firing multiple queries
could be comparable to a fancy single query join, if not faster!
Denormalize and duplicate data: sometimes, it's just easier to add some redundant
information if it's going to make querying faster.
Use Map/Reduce techniques to distribute and gather data from the database.
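The first alternative, firing multiple queries, can be sketched on plain Ruby arrays. The sample rows and field names below are made up for illustration; with Mongoid, each select would be a separate query against its collection.

```ruby
# Hypothetical sample data standing in for the three collections.
authors = [
  { id: 1, first_name: "Mark",    last_name: "Twain" },
  { id: 2, first_name: "Charles", last_name: "Dickens" }
]
books = [
  { id: 10, author_id: 1, name: "The Adventures of Tom Sawyer" },
  { id: 11, author_id: 2, name: "Oliver Twist" }
]
orders = [
  { id: 100, book_id: 10 },
  { id: 101, book_id: 11 },
  { id: 102, book_id: 10 }
]

# Query 1: authors whose first name starts with "Mark".
author_ids = authors.select { |a| a[:first_name].start_with?("Mark") }.map { |a| a[:id] }

# Query 2: books written by those authors.
book_ids = books.select { |b| author_ids.include?(b[:author_id]) }.map { |b| b[:id] }

# Query 3: orders placed for those books.
MARK_ORDERS = orders.select { |o| book_ids.include?(o[:book_id]) }

puts MARK_ORDERS.inspect
```

Each step only carries a list of IDs forward, so no large intermediate join is ever built.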
Pop quiz – the dos and don'ts of MongoDB
1. Why does MongoDB use BSON and not just JSON?
a. MongoDB wants to be different!
b. BSON enables faster inline data manipulation and traversal.
c. BSON and JSON are the same.
d. MongoDB uses JSON and not BSON.
2. How does MongoDB persist data?
a. In memory-mapped files that are flushed to the disk every 100 ms.
b. Data is saved in the memory.
c. Data is saved in files on the disk.
d. Data is not saved.
3. Which of the following is true for MongoDB?
a. Joins and transacons are fully supported in MongoDB.
b. Joins are supported but transacons are not supported.
c. Joins and mul-collecon transacons are not supported.
d. Single collecon transacons are not supported.
4. What is write-ahead journaling in MongoDB?
a. Writes are wrien with a mestamp in the future.
b. Writes are wrien to the journal log rst and then lazily to the disk.
c. Writes are wrien to the disk rst and then to the journal log.
d. Writes are wrien only in the journal.
Summary
MongoDB has a lot of things going on under the covers, most of which we may either
take for granted or sometimes do not need to know to work with MongoDB. The team
behind MongoDB has been working hard to make MongoDB faster, easier, and more
humongous. If we understand how things work and what impact it's going to have on our
data or performance, it would help us build better applications by making the most of all
that is offered by MongoDB. MongoDB does not support joins and transactions. There are
alternatives to this but if you require ACID transactions, you should use an SQL database.
In the subsequent chapters, we shall learn a lot about using MongoDB but we may not see
many MongoDB internals. I do hope that this chapter makes the underlying concepts easy
to understand.
4
Working Out Your Way with Queries
Wherever there is a database, there has to be some search criteria! This
chapter takes our journey forward towards searching for data in MongoDB.
In this chapter we will see how we can search via the mongo console.
In this chapter we shall learn the techniques for:
Searching by field attributes (such as strings, numbers, float, and date)
Searching on indexed fields
Searching by values inside an array field
Searching by values inside a hash field
Searching inside embedded objects
Searching by regular expressions
Let's start searching with the help of our good old Sodibee database!
Searching by fields in a document
Let's consider a book structure like the following:
{
"_id" : ObjectId("4e86e45efed0eb0be0000010"),
"author_id" : ObjectId("4e86e4b6fed0eb0be0000011"),
"category_ids" : [
ObjectId("4e86e4cbfed0eb0be0000012"),
ObjectId("4e86e4d9fed0eb0be0000013")
],
"name" : "Oliver Twist",
"published_on" : ISODate("2002-12-30T00:00:00Z"),
"publisher" : "Dover Publications",
"reviews" : [
{
"comment" : "Fast paced book!",
"username" : "Gautam",
"_id" : ObjectId("4e86f68bfed0eb0be0000018")
},
{
"comment" : "Excellent literature",
"username" : "Tom",
"_id" : ObjectId("4e86f6fffed0eb0be000001a")
}
],
"votes" : [
{
"username" : "Gautam",
"rating" : 3
}
]
}
We have already done this earlier, but let's reiterate and dig deeper. Let's find all the books
published by Dover Publications. First let's start the mongo console as follows:
$ mongo
MongoDB shell version: 2.0.2
connecting to: test
> use sodibee
switched to db sodibee
Time for action – searching by a string value
Let's nd all the books that were published by Dover Publicaons. The following code shows
us how to accomplish this:
> db.find({ publisher : "Dover Publications"})
{ "_id" : ObjectId("4e86e45efed0eb0be0000010"), "author_id" : ObjectId
("4e86e4b6fed0eb0be0000011"), "category_ids" : [
ObjectId("4e86e4cbfed0eb0be0000012"),
ObjectId("4e86e4d9fed0eb0be0000013")
], "name" : "Oliver Twist", "publisher" : "Dover Publications",
"reviews" : [
{
"comment" : "Fast paced book!",
"username" : "Gautam",
"_id" : ObjectId("4e86f68bfed0eb0be0000018")
},
{
"comment" : "Excellent literature",
"username" : "Tom",
"_id" : ObjectId("4e86f6fffed0eb0be000001a")
}
], "votes" : [ { "username" : "Gautam", "rating" : 3 } ] }
What just happened?
We have just red a simple find() query on a collecon to help us get the relevant
documents from the database. We can also congure the parameters in find() to get more
specic details. To see what specic parameters find() has, issue the following command:
> db.books.find
function (query, fields, limit, skip) {
return new DBQuery(this._mongo, this._db, this, this._fullName,
this._massageObject(query), fields, limit, skip);
}
The conguraon parameters for find() in the preceding code are explained as follows:
query: This is the selecon criteria. For example, { publisher: "Dover
Publications" } as we had menoned earlier. This is similar to the WHERE clause
in a relaonal query.
fields: These are the elds which we want selected. This is similar to the SELECT
part of a query in a relaonal query. By default, all elds would be selected, so
SELECT * is the default. In MongoDB we can specify inclusion as well as exclusion
of elds. We will see an example of this shortly.
limit: This represents the number of elements we want returned from the query.
This is similar to the LIMIT part of a relaonal query.
skip: This is the number of elements the query should skip before collecng
results. This is similar to the OFFSET part of a relaonal query.
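To make these four parameters concrete, here is a minimal sketch in plain Ruby that imitates them on an in-memory array of hashes. It never talks to MongoDB; the helper name simulate_find and the sample books (including the publisher "Penguin") are assumptions made up purely for illustration.

```ruby
# A toy, in-memory imitation of find(query, fields, limit, skip).
# simulate_find is a hypothetical helper, not part of any MongoDB driver.
BOOKS = [
  { "name" => "Oliver Twist",         "publisher" => "Dover Publications" },
  { "name" => "Great Expectations",   "publisher" => "Penguin" },
  { "name" => "A Tale of Two Cities", "publisher" => "Dover Publications" }
].freeze

def simulate_find(docs, query = {}, fields = {}, limit = 0, skip = 0)
  # query: keep documents whose fields equal every given value (WHERE ... AND ...)
  matched = docs.select { |doc| query.all? { |key, value| doc[key] == value } }
  # skip: drop the first `skip` matches (OFFSET)
  matched = matched.drop(skip)
  # limit: 0 means "no limit"; otherwise keep at most `limit` matches (LIMIT)
  matched = matched.first(limit) if limit > 0
  # fields: an empty hash selects everything (SELECT *)
  return matched if fields.empty?
  matched.map { |doc| doc.select { |key, _| fields[key] == 1 } }
end

result = simulate_find(BOOKS, { "publisher" => "Dover Publications" }, { "name" => 1 }, 1, 1)
puts result.inspect
```

Here the query matches two books, skip drops the first of them, limit keeps one, and fields trims that one down to its name.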
Have a go hero – search for books from an author
How do we search for books that are published by Dover Publications and written by
Mark Twain?
Hint: We need to fire two queries. The first one would be to find the author by name
"Mark Twain". Then using that ObjectId, we can find the books written by that author
and published by Dover Publications.
Querying for specific fields
Let's now evaluate these options in greater detail.
Time for action – fetching only specific fields
First, let's select only a few fields and see how the fields parameter works. This would be
similar to an SQL query. For example:
SELECT name, published_on, publisher FROM books WHERE publisher =
"Dover Publications";
In MongoDB this is achieved as follows:
> db.books.find({ publisher: "Dover Publications"}, {name: 1,
published_on : 1, publisher : 1 })
{ "_id" : ObjectId("4e86e45efed0eb0be0000010"), "name" : "Oliver
Twist", "published_on" : ISODate("2002-12-30T00:00:00Z"), "publisher"
: "Dover Publications" }
So far so good! But here is where MongoDB is more customizable and can do something that
SQL cannot. Notice that the values for the selected fields are 1 (they can also be set to true
instead of 1). We can optionally set them to 0 or false and then these will be the fields
excluded from the result. Let's see it in action in the following code:
> db.books.find({ publisher: "Dover Publications"}, {name: 0,
published_on : 0, publisher : 0 })
{ "_id" : ObjectId("4e86e45efed0eb0be0000010"),
"author_id" : ObjectId("4e86e4b6fed0eb0be0000011"),
"category_ids" : [
ObjectId("4e86e4cbfed0eb0be0000012"),
ObjectId("4e86e4d9fed0eb0be0000013")
], "reviews" : [
{
"comment" : "Fast paced book!",
"username" : "Gautam",
"_id" : ObjectId("4e86f68bfed0eb0be0000018")
},
{
"comment" : "Excellent literature",
"username" : "Tom",
"_id" : ObjectId("4e86f6fffed0eb0be000001a")
}
], "votes" : [ { "username" : "Gautam", "rating" : 3 } ]
}
Noce that all elds are present in the result except name, published_on, and publisher.
What just happened?
Magic! Not only can we set inclusion elds but also exclusion elds. I don't believe there is
any way to set exclusion elds in an SQL query.
Let me be fair here, SQL databases intenonally do not allow exclusion
of elds from a SELECT query because of the structured nature of the
tables, so as to ensure good performance and to ensure that the contract
between the client-server is stable!
Imagine what happens to our query if we allow exclusion of columns and
those columns are deleted—so many addional checks and degradaon
of performance! Code extremists would even say, you can fetch the data,
lter it later, and remove the columns you don't want!
You can add more criteria to the query eld and they will be set. This would be similar to the
AND part in a WHERE clause.
Playing with inclusion and exclusion of elds
Remember that you cannot set inclusion and exclusion elds in the same
query. This means either all the elds should have value 1 or all should
have value 0. Otherwise MongoDB will throw an error 10053: You
cannot currently mix including and excluding elds.
The only excepon to this is the exclusion of the _id eld. We can
exclude the _id eld while including others. This means db.books.
findOne({}, {_id: 0, name: 1}) is valid.
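The inclusion and exclusion rule can be sketched in plain Ruby as follows. This is only an imitation of the behaviour described above, not MongoDB's own code; the helper name project and the sample document are assumptions for illustration.

```ruby
# Hypothetical helper that imitates the projection rule in plain Ruby.
def project(doc, fields)
  # _id may be excluded alongside inclusions, so judge the mode without it.
  flags = fields.reject { |key, _| key == "_id" }.values.uniq
  raise ArgumentError, "cannot mix including and excluding fields" if flags.size > 1

  # With only _id given (or nothing at all), fall back to the _id flag or to 0.
  mode = flags.first || fields["_id"] || 0
  if mode == 1
    # Inclusion: keep the listed fields, plus _id unless it is explicitly 0.
    doc.select { |key, _| fields[key] == 1 || (key == "_id" && fields["_id"] != 0) }
  else
    # Exclusion: drop every field that is set to 0.
    doc.reject { |key, _| fields[key] == 0 }
  end
end

book = { "_id" => 1, "name" => "Oliver Twist", "publisher" => "Dover Publications" }
included = project(book, { "_id" => 0, "name" => 1 })   # only the name survives
excluded = project(book, { "name" => 0 })               # everything except the name
```

Calling project(book, { "name" => 1, "publisher" => 0 }) raises an ArgumentError, mirroring the error MongoDB reports when the two styles are mixed.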
Have a go hero – including and excluding fields
Well, go ahead and experiment with the following:
Set different inclusion or exclusion fields for the books document.
Set the limit and skip values for the query. Let me give you some hints here. A limit
of 0 would mean no limit. skip values can be used for paging. Give it a shot and
check a little later in the chapter whether you got it right!
Using skip and limit
skip and limit are both optional parameters to the find query. limit will limit the
number of elements in the result and skip will skip elements in the result.
Time for action – skipping documents and limiting our search results
Suppose we want to query the second and third book in the collection. We can set the limit
value to 1 and the skip value to 1 (for the second book) or 2 (for the third). This is done as follows:
> db.books.find({}, {}, 1, 1)
{ "_id" : ObjectId("4e8704fdfed0eb0f97000001"), "author_id" :
ObjectId("4e86e4b6fed0eb0be0000011"), "category_ids" : [ ], "name" : "Great
Expectations", "votes" : [
{
"username" : "Gautam",
"rating" : 9
},
{
"username" : "Tom",
"rating" : 3
},
{
"username" : "Dick",
"rating" : 7
}
] }
> db.books.find({}, {}, 1, 2)
{ "_id" : ObjectId("4e870521fed0eb0f97000002"), "author_id" :
ObjectId("4e86e4b6fed0eb0be0000011"), "category_ids" : [ ], "name" : "A tale
of two cities", "votes" : [
{
"username" : "Gautam",
"rating" : 9
},
{
"username" : "Dick",
"rating" : 5
}
] }
What just happened?
Noce that in both cases, we have menoned the query and fields parameters as an
empty hash. This is just for the sake of brevity!
limit is 1 in both cases but the skip values have changed. This would be similar to the
following SQL query:
SELECT * FROM books LIMIT 1 OFFSET 1
Have a go hero – paginating document results
To see paginaon in acon, it would really be cool if you add 20 books to the collecon. Then
query them using the limit value as 10 with the skip value as 0 for geng results of page
1 and the skip value as 10 to get results of page 2.
There are ulity methods such as findOne(), which just get us the
rst record. This has only two parameters: query and fields, as
skip and limit would be irrelevant.
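The paging arithmetic behind skip and limit can be sketched in plain Ruby. The helper page_window and the 20 placeholder book names are hypothetical; Array#drop and Array#first simply stand in for skip and limit.

```ruby
# Hypothetical helper: turn a 1-based page number into skip and limit values.
def page_window(page, per_page)
  { skip: (page - 1) * per_page, limit: per_page }
end

books = (1..20).map { |n| "Book #{n}" }   # stand-in for 20 documents

first_page  = page_window(1, 10)   # skip 0,  limit 10
second_page = page_window(2, 10)   # skip 10, limit 10

# Array#drop and Array#first play the roles of skip and limit here.
page_two = books.drop(second_page[:skip]).first(second_page[:limit])
puts page_two.first   # the 11th book
puts page_two.last    # the 20th book
```

The same arithmetic carries over to the mongo console: page n with 10 results per page uses a limit of 10 and a skip of (n - 1) * 10.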
Writing conditional queries
We have seen how to query on multiple conditions. These were in conjunction, that is, they
were bound by the AND clause:
> db.books.find({publisher: "Dover Publications", name: "Oliver
Twist"})
This would be similar to an SQL query:
SELECT * FROM books WHERE publisher = "Dover Publications" AND name =
"Oliver Twist";
Notice that AND is the default condition when multiple query parameters are specified. But
this is not always what we want!
Using the $or operator
The $or operator is very common when we want a result set that satisfies any one of the
conditions specified.
Time for action – finding books by name or publisher
Let's find all the books that have the name Oliver Twist or are from Dover
Publications. The query is as follows:
db.books.find({ $or : [ { name: "Oliver Twist"} , {publisher : "Dover
Publications"} ] })
This will give us our result set of books with either the name as Oliver Twist or
publisher as Dover Publications.
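One way to picture how such a query is evaluated is the following plain Ruby sketch: ordinary keys are ANDed together, as we saw earlier, while $or is satisfied as soon as any one of its branches matches. The matches? helper, the sample book, and the extra title "Emma" are assumptions for illustration; this is not how MongoDB is implemented internally.

```ruby
# Hypothetical matcher: top-level keys are ANDed; "$or" needs any one branch to match.
def matches?(doc, query)
  query.all? do |key, condition|
    if key == "$or"
      condition.any? { |branch| matches?(doc, branch) }
    else
      doc[key] == condition
    end
  end
end

book = {
  "name" => "Oliver Twist",
  "publisher" => "Dover Publications",
  "published_on" => "2002-12-30"
}

either = { "$or" => [{ "name" => "Emma" }, { "publisher" => "Dover Publications" }] }
both   = either.merge("published_on" => "2002-12-30")

puts matches?(book, either)   # the publisher branch matches
puts matches?(book, both)     # the date matches AND one $or branch matches
```

Because the helper calls itself for each branch, a branch may contain several keys of its own, which are again ANDed.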
What just happened?
The previous query is similar to the following:
SELECT * FROM books WHERE publisher = "Dover Publications" OR name =
"Oliver Twist";
Let's look at the query parameters in a little more detail:
{$or : [
{name: "Oliver Twist"},
{publisher : "Dover Publications"}
]
}
$or is a special operator in MongoDB and takes an array of query parameters. We can use
this in conjunction with other parameters too:
db.books.find({ published_on: ISODate("2002-12-30"), $or : [ { name:
"Oliver Twist"} , {publisher : "Dover Publications"} ] })
This would query with AND and OR. Its SQL equivalent would be:
SELECT * from books WHERE published_on = "2002-12-30" AND (name =
"Oliver Twist" OR publisher = "Dover Publications");
Writing threshold queries with $gt, $lt, $ne, $lte, and $gte
We often need to search within a threshold, don't we?
MongoDB   SQL   Meaning
$gt       >     Greater than
$lt       <     Less than
$gte      >=    Greater than or equal to
$lte      <=    Less than or equal to
$ne       !=    Not equal to
Time for action – finding the highly ranked books
Suppose we add the rank field to the books, our book object will look something like the following:
{
"_id" : ObjectId("4e870521fed0eb0f97000002"),
"rank" : 10
}
Now, if we want to search for all books having a rank in the top 10, we can fire the
following query:
> db.books.find({ "rank" : { $lte : 10 } } )
You can add more operators in the same hash too. For example, if we want to find books in
the top ten but not the top ranked book (that is, rank != 1), we can do the following:
> db.books.find({ "rank" : { $lte : 10, $ne : 1 } } )
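The operator table maps neatly onto Ruby's own comparison methods, which makes for a small plain Ruby sketch of how several operators in one hash combine. The OPERATORS constant, the satisfies? helper, and the sample ranks are hypothetical names and data for this illustration.

```ruby
# Map each MongoDB comparison operator to the Ruby method with the same meaning.
OPERATORS = {
  "$gt" => :>, "$lt" => :<, "$gte" => :>=, "$lte" => :<=, "$ne" => :!=
}.freeze

# Hypothetical helper: every operator in the hash must hold (they are ANDed).
def satisfies?(value, conditions)
  conditions.all? { |op, operand| value.public_send(OPERATORS.fetch(op), operand) }
end

ranks = [1, 4, 10, 11]
top_ten_not_first = ranks.select { |rank| satisfies?(rank, { "$lte" => 10, "$ne" => 1 }) }
puts top_ten_not_first.inspect
```

Rank 1 fails the $ne condition and rank 11 fails $lte, so only 4 and 10 remain.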
Have a go hero – find books via rank
Why don't you give this a shot?
Find books which have a rank between 5 and 10
Find books before and after a particular date
Checking presence using $exists
As MongoDB is schema free, there are times when we want to check the presence of some
field in a document. For example, over the years, our schema for books evolved and we
added some new fields. If we want to take a specific action only on books that have these
new fields, we may need to check if these fields exist.
Suppose we want to search only for those books that have the rank field in them, it can be
done as follows:
> db.books.find({ "rank" : { $exists : 1} })
Searching inside arrays
Unlike most SQL databases, MongoDB can store values inside arrays and hashes. Now, we
shall see how we can search inside arrays.
Did you know that most of the operators we learned about earlier
could be used directly on arrays inside a document just like normal
fields? For example:
> db.books.insert( { "categories" : [ "Drama", "Action"] } )
> db.books.find( { categories : { $ne : "Romance"} } )
This will return the document we inserted previously. Isn't that cool?!
Time for action – searching inside reviews
Let's now have a look at our books document. We have an array of reviews. A review is an
embedded object (notice the _id parameter):
"reviews" : [
{
"comment" : "Fast paced book!",
"username" : "Gautam",
"_id" : ObjectId("4e86f68bfed0eb0be0000018")
},
{
"comment" : "Excellent literature",
"username" : "Tom",
"_id" : ObjectId("4e86f6fffed0eb0be000001a")
}
]
Let's try to retrieve the books that have a review from "Gautam".
> db.books.find( { "reviews.username" : "Gautam" } )
What just happened?
The MongoDB classic act!
"reviews.username" searches inside all the elements in the array for any field called
"username", which has the specified value.
Of course, there are other conventional ways of searching inside arrays.
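To see what the dotted "reviews.username" path does, here is a plain Ruby sketch that walks such a path through nested hashes and fans out over arrays along the way. The helper values_at_path is a hypothetical name and the document is a trimmed copy of our book; MongoDB's real matching logic is of course more involved.

```ruby
# Hypothetical helper: collect every value reachable through a dotted path,
# fanning out across arrays the way "reviews.username" does.
def values_at_path(node, path)
  path.split(".").reduce([node]) do |nodes, key|
    nodes.flat_map do |n|
      items = n.is_a?(Array) ? n : [n]
      items.select { |item| item.is_a?(Hash) && item.key?(key) }
           .map { |item| item[key] }
    end
  end
end

book = {
  "name" => "Oliver Twist",
  "reviews" => [
    { "username" => "Gautam", "comment" => "Fast paced book!" },
    { "username" => "Tom",    "comment" => "Excellent literature" }
  ]
}

usernames = values_at_path(book, "reviews.username")
reviewed_by_gautam = usernames.include?("Gautam")
puts reviewed_by_gautam
```

The book matches as soon as any element of the reviews array carries the requested username.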
Searching inside arrays using $in and $nin
This is something similar to the IN clause in SQL. Suppose we want to find documents in
which a field matches any one of a specified set of values, we can use the $in operator. Let's
see one of our book objects:
> db.books.findOne()
{
"_id" : ObjectId("4e86e45efed0eb0be0000010"),
"author_id" : ObjectId("4e86e4b6fed0eb0be0000011"),
"category_ids" : [
ObjectId("4e86e4cbfed0eb0be0000012"),
ObjectId("4e86e4d9fed0eb0be0000013")
],
"name" : "Oliver Twist",
}
We do know that these are Category objects referenced in some other collection. But that
should not stop us from firing a direct query:
> db.books.find( { category_ids : { $in : [
ObjectId("4e86e4cbfed0eb0be0000012"),
ObjectId("4e86e4d9fed0eb0be0000013")
] } } )
Alternavely, we could re a NOT IN query too, as follows:
> db.books.find( { category_ids : { $nin : [
ObjectId("555555555555555555555555"),
ObjectId("666666666666666666666666")
] } } )
This would return all the books in the collection, assuming that no book belongs to either
of these categories!
Searching for exact matches using $all
As we just saw, $in helps us search for documents that have any one of the values in the
array. It's $all that searches for documents that have all the values within the array in the
field. Let's take this book object again:
> db.books.findOne()
{
"_id" : ObjectId("4e86e45efed0eb0be0000010"),
"author_id" : ObjectId("4e86e4b6fed0eb0be0000011"),
"category_ids" : [
ObjectId("4e86e4cbfed0eb0be0000012"),
ObjectId("4e86e4d9fed0eb0be0000013")
],
"name" : "Oliver Twist",
}
Now, if we want to nd books which belong to both the categories menoned in the
previous code, we re the following query:
> db.books.find( { category_ids : { $all : [
ObjectId("4e86e4cbfed0eb0be0000012"),
ObjectId("4e86e4d9fed0eb0be0000013")
] } } )
This will return all the books that are in both categories. However, unlike the earlier case of
$in, the following query will not return the previously mentioned book because it doesn't
belong to all the categories mentioned next:
> db.books.find( { category_ids : { $all : [
ObjectId("4e86e4d9fed0eb0be0000011"),
ObjectId("4e86e4d9fed0eb0be0000012"),
ObjectId("4e86e4d9fed0eb0be0000013")
] } } )
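The difference between $in, $nin, and $all boils down to simple set logic, which the following plain Ruby sketch spells out. The short strings stand in for the ObjectId values used above, and the three lambdas are hypothetical names for this illustration.

```ruby
# Plain Ruby analogies for $in, $nin and $all on an array field.
# Short strings stand in for the ObjectId values used in the text.
category_ids = ["cat-12", "cat-13"]

in_match  = ->(field, wanted) { (field & wanted).any? }    # at least one value in common
nin_match = ->(field, wanted) { (field & wanted).empty? }  # no value in common
all_match = ->(field, wanted) { (wanted - field).empty? }  # every wanted value is present

puts in_match.call(category_ids, ["cat-12", "cat-99"])             # one shared value is enough
puts nin_match.call(category_ids, ["cat-55", "cat-66"])            # nothing shared
puts all_match.call(category_ids, ["cat-12", "cat-13"])            # both present
puts all_match.call(category_ids, ["cat-11", "cat-12", "cat-13"])  # "cat-11" is missing
```

The last call is the analogue of the three-category $all query above: one missing value is enough to reject the book.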
Searching inside hashes
Just like arrays, we also want to search inside hashes. Searching inside hashes involves keys and
values. Let's assume that the book object looks as follows (that is, a hash instead of an array):
{
categories: {
'drama': 1,
'thriller': 2
},
}
We can search for all books that have the drama set as 1:
> db.books.find({ "categories.drama" : 1 })
Noce that we access hash elds just like standard JSON object access.
It's interesng to note that the criteria for searching in
hashes and arrays is the same in most cases.
Searching inside embedded documents
Searching inside embedded documents is exactly like searching inside hashes. This seems to
make sense because MongoDB saves every document as a hash.
Embedded documents are somemes also called nested
documents in discussion.
The following is an example of an embedded document:
{
"_id" : ObjectId("6234a68bfed0eb0beabcd234"),
"name" : "The Adventures of Sindbad",
"category" : {
"_id" : ObjectId("5ad6f68bfed0eb0be1231213"),
"name" : "Adventure",
}
}
Finding books by a field of the embedded category object works exactly like searching inside a hash:
> db.books.find( { "category.name" : "Adventure" } )
And just like that, searching inside arrays, hashes, and embedded documents has almost
the same syntax!
Searching with regular expressions
The story isn't complete without regular expressions! Let's see a sample structure for the
names collection:
{
_id : ObjectId("1ad6f68bfed0eb0be1231234"),
name : "Joe"
}
{
_id : ObjectId("1ad6f68bfed0eb0be1231235"),
name : "Joey"
}
{
_id : ObjectId("1ad6f68bfed0eb0be1231236"),
name : "Jonas South"
}
{
_id : ObjectId("1ad6f68bfed0eb0be1231237"),
name : "Aron Bjoe"
}
Time for action – using regular expression searches
Now if we want to search for all the objects that have Joe in their name, we can fire the
following query:
> db.names.find({ name : /Joe/} )
{ _id : ObjectId("1ad6f68bfed0eb0be1231234"), name : "Joe" }
{ _id : ObjectId("1ad6f68bfed0eb0be1231235"), name : "Joey" }
Noce that we got the objects that had a "Joe" in them. But wait! What happened to the
third record, it has a Joe in it too!
MongoDB searches are case-sensive!
Now, if we require all the names that have a joe in them, irrespecve of the case, we re a
similar query again:
> db.names.find({ name : /joe/i} )
{ _id : ObjectId("1ad6f68bfed0eb0be1231234"), name : "Joe"}
{ _id : ObjectId("1ad6f68bfed0eb0be1231235"), name : "Joey"}
{ _id : ObjectId("1ad6f68bfed0eb0be1231237"), name : "Aron Bjoe"}
Now we get all three objects. What if we want only the authors whose names start with Jo? We
fire another query as follows:
> db.names.find({ name : /^Jo/} )
{ _id : ObjectId("1ad6f68bfed0eb0be1231234"), name : "Joe" }
{ _id : ObjectId("1ad6f68bfed0eb0be1231235"), name : "Joey" }
{ _id : ObjectId("1ad6f68bfed0eb0be1231236"), name : "Jonas South" }
Noce the dierence in the search result!
What just happened?
The magic of regular expressions! Here is a brief idea about how regular expressions work.
Then we can try out something complicated.
Regular expressions are divided into two parts: pattern and occurrence. Pattern, as the
name suggests, is the regular expression pattern. Occurrence is the number of times the
pattern should occur:
Pattern                                  Occurrence
\w: Alphanumeric                         a*: 0 or more of a
\d: Digits                               a+: 1 or more of a
.: Any character                         a?: 0 or 1 of a
\s: Any whitespace                       a{10}: Exactly 10 of a
\W: Non alphanumerics                    a{3,10}: Between 3 and 10 of a
\D: Non digits                           a{5,}: 5 or more of a
\S: Non whitespace                       a{0,10}: At most 10 of a
\b: Word boundary                        [abc]: a or b or c
[a-z]: Any character between a and z     [^abc]: Not a, b or c
[0-9]: Any digit between 0 and 9         ^: Start of line
|: Regex separator                       $: End of line
(...): Regex group
While specifying a regular expression, we write it between forward slashes (/), followed by any flags:
/<some regex>/<flags>
Flags can be:
i: Case insensitive.
m: Multiline.
x: Extended. Ignore all whitespaces in the regex.
s: Dot all. Allow dot to match all characters, including new line characters!
Let's see examples of their usage:
For one or more occurrences of a:
/a+/
For one or more occurrences of a followed by 0 or more of b:
/a+b*/
For abc or xyz only:
/abc|xyz/
For a case insensitive match for alphanumerics:
/\w/i
For zero or more occurrences of x,y or z:
/[xyz]*/
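These building blocks can be tried out directly in Ruby, whose regular expression syntax is the same for the simple patterns shown here. The sample strings below are made up for illustration. One caveat when moving between the two worlds: Ruby's flags differ slightly from the list above (in Ruby, the m flag is the one that lets a dot match new line characters).

```ruby
# String#[] with a regex returns the first match (or nil when nothing matches).
run_of_a    = "caaat"[/a+/]                        # one or more of a
a_then_b    = "xaabbz"[/a+b*/]                     # a's followed by zero or more b's
either_word = %w[abc xyz abd].grep(/abc|xyz/)      # abc or xyz only
word_chars  = "a_1 !".scan(/\w/i)                  # \w also covers digits and "_"
empty_match = "hello"[/[xyz]*/]                    # zero occurrences still match

puts run_of_a
puts a_then_b
puts either_word.inspect
puts word_chars.inspect
puts empty_match.inspect
```

The last line is worth a second look: because * allows zero occurrences, the pattern matches the empty string at the very start of "hello".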
Have a go hero – validate an e-mail address
Build a regular expression to match an e-mail ID. Let's keep this simple and not strictly follow
the RFC-compliant e-mail address format. This is just for learning and fun. Here are some hints:
An e-mail ID should start with two letters
An e-mail ID should be alphanumeric and may contain the following special
characters: ., +, and _
Some examples of valid e-mail IDs are gautam@joshsoftware.com and
gautam.rege@gmail.co.in while those of invalid e-mail IDs are gautam%rege@invalid and
gautam.@.com
Pop quiz – searching the right way
1. How do we nd the 10th to 15th documents in the books collecon, including the
10th and 15th document?
a. db.books.find({},{}, 10, 15)
b. db.books.find({}, {}, 10, 5)
c. db.books.find({}, {}, 6, 9)
d. db.books.find(10, 5)
2. How do we nd the books only with the id and no other elds?
a. db.books.find({}, { _id: 1})
b. db.books.find()
c. db.books.find({_id : 1 } )
d. db.books.find
3. How can we nd all the book documents that have a categories hash in them?
a. db.books.find( $exists: { categories : 1 })
b. db.books.find( { categories: $exists } )
c. db.books.exists( { categories: 1 } )
d. db.books.find({ categories : { $exists : 1 } } )
4. How do we nd all the books whose tle do not have the words the or a in it? For
example, "The Great Escape" should not be selected but "Tale of Two Cies" should
be selected.
a. db.books.find( { $nin: { title : [/the/, /a/] } )
b. db.books.find( { title: { $nin : [/the\b/i, /a\b/i ] } } )
c. db.books.find( { title: { $ne : "the"}, { $ne : "a"} } )
d. db.books.find( { title: { $neq : /the|a/i } } )
Summary
In this chapter, we have seen the various ways to query objects in MongoDB. We can search
by fields, inside arrays, hashes, and even embedded objects. We can even search by regular
expressions. Searching forms a vital part of any application as there would typically be a lot
more reads than writes to the database. Searching efficiently improves the performance of
the application, so it's important that we understand these concepts well.
This is just the tip of the iceberg. In the next chapters, we shall relate these querying
paradigms via Ruby using the various Ruby DataMappers.
5
Ruby DataMappers: Ruby and
MongoDB Go Hand in Hand
This is where we shi gears. Welcome to the land of Ruby. Unl now we have
been seeing how things work in MongoDB. Now, we shall connect to MongoDB
from Ruby. From here onwards there will be more of Ruby, objects, relaons,
and less of MongoDB syntax.
In this chapter we shall learn the following:
Why we need Ruby DataMappers
The dierent Ruby DataMappers and the power of open source
Comparing dierent Ruby DataMappers
Querying objects
Managing object relaons
Let's dive straight into Ruby with our Sodibee library management system!
Why do we need Ruby DataMappers
Well, how else would we connect to MongoDB? Let's rst see what a data mapper is.
By denion, a datamapper is a process, framework, or library that maps two dierent
sources of data. In our parcular case, one source is the MongoDB data structure and the
other is the Ruby object model.
If we have a relaonal database, we have tables which have columns. These are oen
mapped to the object-oriented language constructs—classes map to tables and aributes
map to columns. Considering the object-oriented nature of Ruby and the document data
structure of MongoDB, this makes a very good combinaon for a DataMapper. A class maps
to the collecon name and the object is the document inside a collecon. This is shown in
the following diagram:
[Figure: the class User { Integer age; String name; Float height; } maps to the USER
table with the columns Age, Name, and Height, shown with the sample row 10, Gauta, 5.10m.]
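The mapping in the diagram can be written out by hand in a few lines of Ruby, which is roughly the chore a data mapper takes off our shoulders. The User class, its to_document and from_document methods, and the sample values are hypothetical names and data for this sketch; they are not taken from any particular library.

```ruby
# A hand-rolled illustration of what a data mapper automates.
# User, to_document and from_document are hypothetical names for this sketch.
class User
  attr_accessor :age, :name, :height

  def initialize(age:, name:, height:)
    @age, @name, @height = age, name, height
  end

  # Object -> document (a Hash, which is how Ruby represents a document)
  def to_document
    { "age" => age, "name" => name, "height" => height }
  end

  # Document -> object
  def self.from_document(doc)
    new(age: doc["age"], name: doc["name"], height: doc["height"])
  end
end

doc  = User.new(age: 10, name: "Gautam", height: 5.1).to_document
user = User.from_document(doc)
puts user.name
```

A real data mapper generates this kind of plumbing from field declarations and adds querying, validation, and relations on top.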
Instead of directly ring queries on MongoDB using raw connecons, it's beer to have an
abstracon—via a data mapper. As is common in the open source world, there are usually
mulple opons available for everything and Ruby DataMappers are no dierent. There are
plenty of Ruby DataMappers for MongoDB and more are being born. In this book, we shall
concentrate on a few of the most popular ones.
The mongo-ruby-driver
This is the core driver that is available via the mongo gem. To install this gem, we simply
use the following command:
$ gem install mongo
MongoDB uses Binary JSON (BSON) to save data. So it's also necessary to install bson and
bson_ext gems. In most cases, as these are dependent gems, they should install along
with the mongo gem. Remember that you require the same version for mongo, bson, and
bson_ext! At the time of writing this book, the latest version of this driver is 1.6.2.
In case you see messages like the one shown next, please ensure that bson, bson_ext,
and mongo gem have the same version:
**Notice: C extension not loaded. This is required for optimum MongoDB
Ruby driver performance.
You can install the extension as follows:
gem install bson_ext
If you continue to receive this message after installing, make sure
that the bson_ext gem is in your load path and that the bson_ext and
mongo gems are of the same version.
$
Time for action – using mongo gem
It's never complete without an example. So, let's write a sample Ruby program to connect to
our Sodibee database.
require 'mongo'
conn = Mongo::Connection.new
db = conn['sodibee_development']
coll = db['books']
puts coll.find.first.inspect
The output should look something like this:
$ ruby mongo_driver.rb
{"_id"=>BSON::ObjectId('4e86e45efed0eb0be0000010'),
"author_id"=>BSON::ObjectId('4e86e4b6fed0eb0be0000011'),
"category_ids"=>[BSON::ObjectId('4e86e4cbfed0eb0be0000012'),
BSON::ObjectId('4e86e4d9fed0eb0be0000013')],
"name"=>"Oliver Twist", "published_on"=>2002-12-30 00:00:00 UTC,
"publisher"=>"Dover Publications",
"reviews"=>[{"_id"=>BSON::ObjectId('4e86f68bfed0eb0be0000018'),
"comment"=>"wow!", "username"=>"Gautam"},
{"comment"=>"Excellent literature", "username"=>"Tom",
"_id"=>BSON::ObjectId('4e86f6fffed0eb0be000001a')}],
"votes"=>[{"username"=>"Gautam", "rating"=>3}]}
What just happened?
Wow! We just connected to MongoDB from a Ruby program and fetched the first book from
the books collection. Let's take this slowly, shall we? Let's see the previous code again:
require 'mongo'
conn = Mongo::Connection.new
db = conn['sodibee_development']
coll = db['books']
puts coll.find.first.inspect
The command require loads the Ruby Mongo library.
In case you are using Ruby 1.8.7, you may need to require "rubygems" or
add "rubygems" to your RUBYOPT environment variable. In Ruby 1.9
onwards, this is implicitly included. RubyGems is the package manager that helps Ruby
locate and load installed gems.
Let's have a look at the previous code once again:
require 'mongo'
conn = Mongo::Connection.new
db = conn['sodibee_development']
coll = db['books']
puts coll.find.first.inspect
This sets up the connecon with MongoDB. Did I hear you say "What the hell?!
Magically? what happened to the host or the port?" Welcome to the world of
"convenon over conguraon".
The Mongo driver is congured with defaults:
Host: Localhost is the default
Port: 27017 is the default
Opons:
safe: If it is true, MongoDB starts in safe mode (it is false by default)
slave_ok: It is (false by default) set to true only when connecng to
a single slave
logger: Remember that logging can degrade performance (It is nil
by default)
pool_size: It is (1 by default) the number of sockets connecons
in the pool
pool_timeout: It is (5.0 seconds by default) the seconds to wait
before which an excepon will be thrown
op_timeout: It is (nil by default) the read meout. There is no
meout by default
connect_timeout: It is (nil by default) the connecon meout.
By default the connecon never mes out
ssl: It is (false by default) set to true for secure connecons only
Whoa! These are a lot of opons. Noce the default values. You don't need to remember
them all if you are working with defaults.
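"Convention over configuration" can be shown in miniature with a plain Ruby hash merge: every option has a default and we only state what differs. The DEFAULTS constant mirrors the values listed above, while connection_settings is a hypothetical helper written for this sketch; it is not part of the mongo gem.

```ruby
# Convention over configuration in miniature: defaults apply unless overridden.
# DEFAULTS mirrors the values listed above; connection_settings is hypothetical.
DEFAULTS = {
  host: "localhost", port: 27017, safe: false, slave_ok: false, logger: nil,
  pool_size: 1, pool_timeout: 5.0, op_timeout: nil, connect_timeout: nil, ssl: false
}.freeze

def connection_settings(overrides = {})
  unknown = overrides.keys - DEFAULTS.keys
  raise ArgumentError, "unknown option(s): #{unknown.join(', ')}" unless unknown.empty?
  DEFAULTS.merge(overrides)   # values we pass in win over the defaults
end

settings = connection_settings(pool_size: 5, safe: true)
puts settings[:port]        # untouched default
puts settings[:pool_size]   # our override
```

With the 1.x driver used in this chapter, overrides are, to the best of my knowledge, passed in a similar spirit as a trailing options hash after the host and port arguments of Mongo::Connection.new; do check the driver documentation for your version before relying on that.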
Once again, let's have a look at the previous code:
require 'mongo'
conn = Mongo::Connection.new
db = conn['sodibee_development']
coll = db['books']
puts coll.find.first.inspect
The next two lines select the database we require and the collection we want.
Guess what, looks are deceptive! The Mongo::Connection class has the method
Mongo::Connection#[] that initializes a Mongo::DB object and returns it. We can then
access the collection we want in this database. In case you require some specific options for
the database object (for example, you may want to access the database in strict mode),
you would need to explicitly instantiate the database object. This is done as follows:
db = Mongo::DB.new('sodibee_development', conn, :strict => true)
Strict mode ensures that the collection exists before accessing it.
Otherwise it throws an error.
Of course, we usually require the former:
require 'mongo'
conn = Mongo::Connection.new
db = conn['sodibee_development']
coll = db['books']
puts coll.find.first.inspect
The command coll.find gets us a cursor over the collection (similar to database cursors)
and from this we print the first document. We shall see a lot of the find method later on in this chapter.
The Ruby DataMappers for MongoDB
We do not want to get into details of how the mongo-ruby-driver is written. This is because
it does a lot of work under the covers and we don't want to get our hands that dirty! Think of
this like a device driver: we use them but we are not the experts who write them. So, we
leave the nitty-gritty details to the DataMappers!
There are quite a few DataMappers built in Ruby to map to documents in MongoDB. The
ones that are very popular while this book is being written are:
MongoMapper
Mongoid
We shall now learn how to use both and you can see for yourself which to use. It's a close
race for the winner and towards the end of this chapter I do declare a verdict based on my
experiments with them.
MongoMapper
MongoMapper was one of the first Ruby data mappers for MongoDB. Created by John
Nunemaker in early 2009, it has gained a lot of popularity. The entire library is written in
Ruby. However, MongoMapper is tightly coupled to Rails applications and does not
use the mongo-ruby-driver.
Mongoid
The work for the mongo-ruby-driver began in late 2008 and as it got stable it was also heavily
used in Ruby DataMappers. Mongoid, which was started in mid-2009 by Durran Jordan, has gained
tremendous popularity. It uses the Mongo driver for accessing MongoDB.
There has not been any clear winner among them, but my preference is with Mongoid.
I do leave it to your choice which one to choose as I will be going through both of them
in some detail.
Setting up DataMappers
We have seen how we can use the mongo-ruby-driver to access the MongoDB store via Ruby.
Now, we shall see how to use DataMappers for connecting, creating, and querying documents.
Configuring MongoMapper
As with any gem installation, this is done as follows:
$ gem install mongo_mapper
If you are using Bundler, we could also set this in the Gemfile using the following:
gem 'mongo_mapper'
If you are using Rails 3.1 or greater, we can create a new Rails project as follows:
$ rails new sodibee-mm
You should see something as follows:
create
create README
create Rakefile
create config.ru
create .gitignore
create Gemfile
create vendor/plugins
create vendor/plugins/.gitkeep
run bundle install
Fetching source index for http://rubygems.org/
Using rake (0.9.2.2)
Using multi_json (1.0.4)
...
Installing sqlite3 (1.3.5) with native extensions
Installing turn (0.8.2)
Installing uglifier (1.2.0)
Your bundle is complete! Use 'bundle show [gemname]' to see where a
bundled gem is installed.
$
Now that we have set up a project, we need to install MongoMapper.
Time for action – conguring MongoMapper
Let's set up MongoMapper for generang the mongo config le.
$ rails generate mongo_mapper:config
create config/mongo.yml
The contents of config/mongo.yml look like the following code listing:
defaults: &defaults
  host: 127.0.0.1
  port: 27017
development:
  <<: *defaults
  database: sodibee_mm_development
test:
  <<: *defaults
  database: sodibee_mm_test
# set these environment variables on your prod server
production:
  <<: *defaults
  database: sodibee_mm
  username: <%= ENV['MONGO_USERNAME'] %>
  password: <%= ENV['MONGO_PASSWORD'] %>
The preceding le is a standard YML le with defaults. Now let's generate a mongo model
as follows:
$ rails generate mongo_mapper:model Author
The preceding code should generate the following les:
create app/models/author.rb
invoke test_unit
create test/unit/author_test.rb
create test/fixtures/authors.yml
The model le would be like the following—very complicated!
class Author
include MongoMapper::Document
end
What just happened?
We just saw two things:
We congured MongoMapper (through config/mongo.yml).
We generated models pre-congured with MongoMapper
MongoMapper::Document is a Ruby module that we can include in any model. Rails 3 now
advocates the use of ActiveModel and not inheritance from ActiveRecord.
Ruby module mixins are a unique and interesting feature of Ruby. Using
modules, we can make classes richer by including or extending modules
in classes.
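As a rough illustration of the mixin idea, the following sketch shows a module that adds both instance methods (through include) and class methods (through extend) to a class. The Describable module and its methods are invented for this example and are not part of MongoMapper or Mongoid, although both libraries rely on the same included hook pattern.

```ruby
# Hypothetical module for illustration; not part of MongoMapper or Mongoid.
module Describable
  # Runs when a class includes the module; used here to add class-level methods too.
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Derive a collection-style name from the class name.
    def collection_name
      "#{name.downcase}s"
    end
  end

  # Instance method gained by every including class.
  def describe
    "#{self.class.name} stored in #{self.class.collection_name}"
  end
end

class Author
  include Describable
end

puts Author.collection_name
puts Author.new.describe
```

A single include line is enough for Author to gain behavior at both the class and the instance level, which is the trick behind include MongoMapper::Document.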
Have a go hero – creating models using MongoMapper
Create the other Sodibee models for MongoMapper: book, category, and review. Refer
to Chapter 2, Diving Deep into MongoDB for details on these fields.
Conguring Mongoid
Just like MongoMapper, Mongoid can be installed as a gem as follows:
$ gem install mongoid
You can also put the following in a Gemfile:
gem 'mongoid'
Time for action – setting up Mongoid
Once we have a project created (just like we saw earlier), we can configure Mongoid
as follows:
$ rails generate mongoid:config
create config/mongoid.yml
The next code lisng is what the config/mongoid.yml looks like:
development:
host: localhost
database: sodibee_development
test:
host: localhost
database: sodibee_test
# set these environment variables on your prod server
production:
host: <%= ENV['MONGOID_HOST'] %>
port: <%= ENV['MONGOID_PORT'] %>
username: <%= ENV['MONGOID_USERNAME'] %>
password: <%= ENV['MONGOID_PASSWORD'] %>
database: <%= ENV['MONGOID_DATABASE'] %>
# slaves:
# - host: slave1.local
# port: 27018
# - host: slave2.local
# port: 27019
There is no direct generator for Mongoid. Simply write the model as follows:
class Author
  include Mongoid::Document
end
Your Rails project should not load ActiveRecord. For a project that has already been
generated, ensure the following:
Remove config/database.yml
Remove the following line from config/application.rb:
require 'rails/all'
Add the following lines in config/application.rb:
require "action_controller/railtie"
require "action_mailer/railtie"
require "active_resource/railtie"
require "rails/test_unit/railtie"
For Rails 3.0.x and Rails 3.1.x, you can avoid loading ActiveRecord from the start by
creating the project with the following command:
$ rails new <project_name> -O --skip-bundle
What just happened?
We set up Mongoid, which looks almost identical to MongoMapper. However,
Mongoid::Document and MongoMapper::Document differ considerably in the
way they are structured internally.
MongoMapper::Document includes the various plugins as follows:
include Plugins::ActiveModel
include Plugins::Document
include Plugins::Querying
include Plugins::Associations
include Plugins::Caching
include Plugins::Clone
include Plugins::DynamicQuerying
include Plugins::Equality
include Plugins::Inspect
include Plugins::Indexes
include Plugins::Keys
include Plugins::Dirty
include Plugins::Logger
include Plugins::Modifiers
include Plugins::Pagination
include Plugins::Persistence
include Plugins::Accessible
include Plugins::Protected
include Plugins::Rails
include Plugins::Safe
include Plugins::Sci
include Plugins::Scopes
include Plugins::Serialization
include Plugins::Timestamps
include Plugins::Userstamps
include Plugins::Validations
include Plugins::EmbeddedCallbacks
include Plugins::Callbacks
Mongoid::Document includes these modules via Mongoid::Components as follows:
include ActiveModel::Conversion
include ActiveModel::MassAssignmentSecurity
include ActiveModel::Naming
include ActiveModel::Observing
include ActiveModel::Serializers::JSON
include ActiveModel::Serializers::Xml
include Mongoid::Atomic
include Mongoid::Attributes
include Mongoid::Collections
include Mongoid::Copyable
include Mongoid::DefaultScope
include Mongoid::Dirty
include Mongoid::Extras
include Mongoid::Fields
include Mongoid::Hierarchy
include Mongoid::Indexes
include Mongoid::Inspection
include Mongoid::JSON
include Mongoid::Keys
include Mongoid::Matchers
include Mongoid::NamedScope
include Mongoid::NestedAttributes
include Mongoid::Persistence
include Mongoid::Relations
include Mongoid::Safety
include Mongoid::Serialization
include Mongoid::Sharding
include Mongoid::State
include Mongoid::Validations
include Mongoid::Callbacks
include Mongoid::MultiDatabase
If we compare the modules, there is little to debate. Both have similar features but are
implemented in different ways internally. The only way to understand them in detail is to
dig into the code.
Initially, I did wonder why MongoMapper and Mongoid don't
just merge like Rails and Merb. When I started digging into the code,
I realized how different the internal implementation is. Do read this:
http://www.rubyinside.com/mongoid-vs-mongomapper-two-great-mongodb-libraries-for-ruby-3432.html
Creating, updating, and destroying documents
Now let's work with objects: creating, updating, and deleting them. But first, we need to set
up the model with attributes. We add these attributes in the models directly. Each attribute
has a name and also specifies the type of data storage. To ensure we see all the standard
data types, we shall use the Person model.
Defining fields using MongoMapper
We define the model in the app/models/person.rb file as follows:
class Person
  include MongoMapper::Document
  key :name, String
  key :age, Integer
  key :height, Float
  key :born_on, Date
  key :born_at, Time
  key :interests, Array
  key :is_alive, Boolean
end
Dening elds using Mongoid
With Mongoid, there is just a dierence in syntax:
class Person
include Mongoid::Document
field :name, type: String
field :age, type: Integer
field :height, type: Float
field :born_on, type: Date
field :born_at, type: Time
field :interests, type: Array
field :is_alive, type: Boolean
end
Creating objects
The way to create objects does not depend on the mapper. Just like we create objects in
Ruby, we pass the parameters as hash arguments.
Time for action – creating and updating objects
Let's create an object of the Person model with different values as shown next:
person = Person.new( name: "Tom Sawyer", age: 33, height: 5.10,
born_on: Date.parse("1972-12-23"),
born_at: Time.now, is_alive: true,
interests: ["Soccer", "Movies"])
=> #<Person _id: BSON::ObjectId('4ef4ab59fed0eb8962000002'), age: 33,
born_at: Fri, 23 Dec 2011 16:24:57 UTC +00:00, born_on: Sat, 23 Dec
1972, height: 5.1, interests: ["Soccer", "Movies"], is_alive: true,
name: "Tom Sawyer">
Now, if we want to update the previous object, we set the name and then call the save
method. It is done as follows:
person.name = "Huckleberry Finn"
person.save
Now if we want to destroy this object, we simply issue the following command:
person.destroy
That's it!
What just happened?
There is no dierent syntax when using Mongoid or MongoMapper. This is the real
advantage of using Ruby DataMappers.
In reality, Ruby frameworks such as Rails and Sinatra, try to be as independent of the data
source as possible. So, if we used MySQL, PostgreSQL, or any other database, we can easily
migrate them to MongoDB and vice versa by altering some part of the code.
However, this does not mean that there would be no code change. As we will soon see in
the querying documents, and later in Understanding model relaonships, it's not that simple
and straighorward.
Using nder methods
This is where the real fun begins! We shall start seeing dierent ways to search among
objects. Both, MongoMapper and Mongoid try to adhere to the standard querying interface
as much as possible.
Finders are rounes that return the objects as part of the result. Both MongoMapper and
Mongoid implement the standard querying interface.
Using nd method
The find method nds the object with the specied ID:
person = Person.find('4ef4ab59fed0eb8962000002')
It's interesng to see that the MongoDB object ID is _id while for Ruby
it is id. Both can be used interchangeably.
Using the rst and last methods
As the name suggests, we can get the rst and the last objects with these methods as follows:
Person.first # => The first object.
Person.last # => The last object.
Using the all method
As the name suggests, this method fetches all the objects. We can optionally pass it some
selection criteria too. This is done as follows:
Person.all
Or
Person.all(:age => 33)
So, what happens if we have 1 million person objects and we fire Person.all? Does this
mean all 1 million objects are fetched at once? No. MongoDB internally uses a cursor to fetch
objects in batches, so only one batch is pulled from the server at a time. The batch size
depends on the server and driver settings.
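The batching idea can be sketched in plain Ruby. FakeCursor below is a hypothetical stand-in written for this illustration only; it is not the driver's cursor class, and the batch size of 4 is an arbitrary choice. It counts how many "round trips" are needed to serve the documents a caller actually asks for.

```ruby
# Hypothetical stand-in for a database cursor; the batch size is arbitrary for this sketch.
class FakeCursor
  include Enumerable

  attr_reader :round_trips

  def initialize(documents, batch_size)
    @documents = documents
    @batch_size = batch_size
    @round_trips = 0
  end

  # Yields documents one by one, but "fetches" them one batch at a time.
  def each
    return enum_for(:each) unless block_given?

    @documents.each_slice(@batch_size) do |batch|
      @round_trips += 1
      batch.each { |doc| yield doc }
    end
  end
end

cursor = FakeCursor.new((1..10).map { |i| { "_id" => i } }, 4)
puts cursor.first(3).length   # three documents were asked for
puts cursor.round_trips       # and a single batch was enough to serve them
```

Asking for only the first few documents stops after the first batch, which is why firing all on a huge collection does not have to load everything into memory.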
Using MongoDB criteria
Criteria are proxy objects or intermediate results. These are not queries that are fired on the
database immediately, which is why they are called criteria. We can chain criteria. When
all criteria are in place and we really need the data, the final query is fired and documents
are fetched from the database. This has immense advantages while programming in Ruby.
In Rails, these are called scopes (and in earlier versions they were called
named scopes).
We saw the use of all earlier. Mongoid treats all as a criterion, while
MongoMapper resolves it immediately; that is, all returns an array.
Executing conditional queries using where
This is the most frequently used criterion:
Person.where(:age => 33)
This looks uncannily similar to the all method we have seen earlier. However, the result
from where is entirely different from all.
Time for action – fetching using the where criterion
When we want to fetch (and chain) results, we use the where criterion. For example, if we
have a web application and there are different filters, such as age and name, we can chain
these criteria easily in a Ruby application as shown next:
people = Person.where(:age.gt => 15)
people = people.where(:name => /saw/i)
=> #<Person _id: BSON::ObjectId('4ef4ab59fed0eb8962000002'), age: 33,
born_at: Fri, 23 Dec 2011 16:24:57 UTC +00:00, born_on: Sat, 23 Dec
1972, height: 5.1, interests: ["Soccer", "Movies"], is_alive: true,
name: "Tom Sawyer">
What just happened?
We not only saw how criteria work but also the different selection criteria syntax. Let's
analyze this in detail.
MongoMapper uses Plucky, a gem for managing proxy objects. It
basically creates a lambda based on the selection criteria. Then we
can chain these lambda instances together and get a result.
This same functionality in Mongoid is available in the
Mongoid::Criteria object. This is one of the key internal
differences between MongoMapper and Mongoid.
Take a look at the following code:
people = Person.where(:age.gt => 15)
people = people.where(:name => /saw/i)
The previous code returns a criterion object. If we are using MongoMapper, this would
return a Plucky object:
=> #<Plucky::Query age: {"$gt"=>15}, transformer: #<Proc:0x1d8cab0@/
Users/gautam/.rvm/gems/ruby-1.9.2-p290/gems/mongo_mapper-0.10.1/lib/
mongo_mapper/plugins/querying.rb:79 (lambda)>>
If we use Mongoid, the same code would return a Mongoid::Criteria object:
=> #<Mongoid::Criteria
selector: {:age=>{"$gt"=>15}},
options: {},
class: Person,
embedded: false>
It's important to remember that the database query has not been fired yet.
Notice the construct :age.gt => 15. This is the short form of writing
:age => { "$gt" => 15 } and this means "age greater than 15".
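To make the shorthand less magical, here is a minimal sketch of how such a construct could be expanded into a plain selector hash. FieldOperator and expand_selector are hypothetical names invented for this example, and reopening Symbol is shown purely for illustration; the real mappers ship their own, more complete implementations.

```ruby
# Hypothetical value object pairing a field name with a MongoDB operator.
FieldOperator = Struct.new(:field, :operator)

# Illustration only: reopening Symbol is one way such a shorthand could be offered.
class Symbol
  def gt
    FieldOperator.new(self, "$gt")
  end

  def lt
    FieldOperator.new(self, "$lt")
  end
end

# Expands shorthand keys into the plain selector hash that MongoDB understands.
def expand_selector(conditions)
  conditions.each_with_object({}) do |(key, value), selector|
    if key.is_a?(FieldOperator)
      # Several operators on the same field are merged into one nested hash.
      (selector[key.field] ||= {})[key.operator] = value
    else
      selector[key] = value
    end
  end
end

p expand_selector(:age.gt => 15, :name => /saw/i)
```

The call to :age.gt simply returns an object that remembers the field and the operator, and the expansion step turns it back into the long form.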
Now let's analyze the next line. This makes things very interesting!
people = Person.where(:age.gt => 15)
people = people.where(:name => /saw/i)
The people criterion is now "chained" with another criterion. If we use MongoMapper,
this is what we see of the people object now:
=> #<Plucky::Query age: {"$gt"=>15}, name: /saw/i, transformer:
#<Proc:0x1d86778@/Users/gautam/.rvm/gems/ruby-1.9.2-p290/gems/mongo_
mapper-0.10.1/lib/mongo_mapper/plugins/querying.rb:79 (lambda)>>
Did you noce the second line of code:
people = people.where(:name => /saw/i)
We have chained where to the earlier people criterion. Also notice that name: /saw/i
is now part of the selection criterion. If we use Mongoid, this would look like the following:
=> #<Mongoid::Criteria
selector: {:age=>{"$gt"=>15}, :name=>/saw/i},
options: {},
class: Person,
embedded: false>
It's interesng to know that the query has sll not been red. Only when all the criteria are
fullled, will the objects be fetched from the database. This is unlike an SQL query, which
directly fetches results; this is instead more ecient as we resolve the enre scope of the
selecon before fetching objects.
Noce the /saw/i construct. This is a case-insensive regular
expression search for any name that has saw in it, such as Sawyer!
Revisiting limit, skip, and offset
We have seen the use of limit, skip, and offset earlier in Chapter 4, Working Out Your
Way with Queries. Now, we shall see how simple it is to set them from MongoMapper or
Mongoid. It is done as follows:
Person.where(:age.gt => 15).limit(5)
Paginaon is an excellent example of this. This chains criteria to ensure that at most ve
results are returned in the results set.
Person.all.skip(5).limit(5) # Page 2 with 5 elements
Person.all.skip(10).limit(5) # Page 3 with 5 elements
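The skip values above follow a simple rule: skip is the page number minus one, multiplied by the page size. A minimal sketch, assuming 1-based page numbers; page_window and paginate are hypothetical helper names, and a plain array stands in for the criteria object.

```ruby
# Hypothetical helper: translate a 1-based page number into skip/limit values.
def page_window(page, per_page)
  raise ArgumentError, "page must be 1 or greater" if page < 1

  { skip: (page - 1) * per_page, limit: per_page }
end

# Plain-array equivalent of criteria.skip(n).limit(m), for illustration only.
def paginate(records, page, per_page)
  window = page_window(page, per_page)
  records.drop(window[:skip]).first(window[:limit])
end

p page_window(2, 5)
p paginate((1..12).to_a, 3, 5)
```

With a helper like this, the values returned by page_window can be passed straight to skip and limit on either mapper.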
Understanding model relationships
Now we shall see dierent types of object relaons. They are as follows :
One-to-many relaon
Many-to-many relaon
One-to-one relaon
Polymorphic relaons
The one to many relation
Let's get back to Sodibee! Let's assume that one book has one author. In a relaonship
statement, this means, "An Author has many books" and "A book belongs to one author".
We write a relaonship exactly like this.
Time for action – relating models
We shall see how we can set up relations in both MongoMapper as well as Mongoid.
Using MongoMapper
As we know, the author model is in the app/models/author.rb file and book is in the
app/models/book.rb file:
class Author
  include MongoMapper::Document
  key :name, String
  many :books
end
class Book
  include MongoMapper::Document
  key :name, String
  key :publisher, String
  key :published_on, Date
  belongs_to :author
end
Using Mongoid
The le locaons remain the same, it's only the syntax that changes as follows:
class Author
include Mongoid::Document
field :name, type: String
has_many :books
end
class Book
include Mongoid::Document
field :name, type: String
field :publisher, type: String
field :published_on, type: Date
belongs_to :author
end
Let's now create some books and authors. This object creation code remains the same,
irrespective of which data mapper we use. We create books and authors as follows:
irb> charles = Author.create(name: "Charles Dickens")
=> #<Author _id: BSON::ObjectId('4ef5a7eafed0eb8c7d000001'),
name: "Charles Dickens">
irb> b = Book.create(name: "Oliver Twist", published_on:
Date.parse("1983-12-23"), publisher: "Dover Publications", author: charles)
=> #<Book _id: BSON::ObjectId('4ef5a888fed0eb8c7d000002'), author_id:
BSON::ObjectId('4ef5a7eafed0eb8c7d000001'), name: "Oliver Twist",
published_on: Fri, 23 Dec 1983, publisher: "Dover Publications">
What just happened?
many is a method in MongoMapper that takes the relation (also called the association) as a
parameter. Its equivalent in Mongoid is has_many.
belongs_to is a reverse relation that tells us who the parent is.
As with all relations, the child references the parent. This means the book document has an
author_id field.
In SQL, it's a rule of thumb that the foreign key resides with the child table.
Similarly, the reference resides in the child document in MongoDB.
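The following sketch uses plain hashes as stand-ins for documents to show what "the reference resides in the child" means in practice. All helper names (create_author, create_book, books_of, author_of) are hypothetical, and the IDs are random hex strings rather than real BSON ObjectIds.

```ruby
require "securerandom"

# Plain hashes standing in for MongoDB documents; IDs are random hex strings, not ObjectIds.
AUTHORS = []
BOOKS = []

def create_author(name)
  author = { "_id" => SecureRandom.hex(12), "name" => name }
  AUTHORS << author
  author
end

# The child document stores the parent's _id under author_id.
def create_book(name, author)
  book = { "_id" => SecureRandom.hex(12), "name" => name, "author_id" => author["_id"] }
  BOOKS << book
  book
end

# "An author has many books": find the children whose author_id matches.
def books_of(author)
  BOOKS.select { |book| book["author_id"] == author["_id"] }
end

# "A book belongs to one author": follow the reference back to the parent.
def author_of(book)
  AUTHORS.find { |author| author["_id"] == book["author_id"] }
end

charles = create_author("Charles Dickens")
oliver = create_book("Oliver Twist", charles)
puts books_of(charles).map { |book| book["name"] }.inspect
puts author_of(oliver)["name"]
```

Notice that the author document never changes when a book is added; both directions of the relation are answered from the single author_id stored on the book.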
Let's look at the book creaon code in more detail:
irb> b = Book.create (name: "Oliver Twist", published_on: Date.
parse("1983-12-23"), publisher: "Dover Publications", author: charles)
=> #<Book _id: BSON::ObjectId('4ef5a888fed0eb8c7d000002'), author_id:
BSON::ObjectId('4ef5a7eafed0eb8c7d000001'), name: "Oliver Twist",
published_on: Fri, 23 Dec 1983, publisher: "Dover Publications">
Noce, that we have passed author: charles, a variable which references the author
object. However, when the object is created we see author_id: BSON::ObjectId(..)
The many-to-many relation
Let's introduce the Category model here. A book can have many categories and a category
can have many books.
Time for action – categorizing books
As always, we shall now see how MongoMapper achieves a many-to-many relation first and
then how Mongoid does the same.
MongoMapper
We are adding a new model—app/models/category.rb. This is done as follows:
class Category
  include MongoMapper::Document
  key :name, String
  key :book_ids, Array
  many :books, in: :book_ids
end
class Book
  include MongoMapper::Document
  key :title, String
  key :publisher, String
  key :published_on, Date
  belongs_to :author
end
Mongoid
The following code shows how we do this using Mongoid:
class Category
  include Mongoid::Document
  field :name, type: String
  has_and_belongs_to_many :books
end
class Book
  include Mongoid::Document
  field :title, type: String
  field :publisher, type: String
  field :published_on, type: Date
  belongs_to :author
  has_and_belongs_to_many :categories
end
Here is another area where MongoMapper and Mongoid differ in the internal
implementation. Notice that when using MongoMapper, the Book model has
no changes. This means we cannot access the categories of a book from the Book
object directly. We shall see this in more detail.
MongoMapper has only a one-way association for many-to-many.
Mongoid maintains the inverse relation, that is, it updates both
documents. A plus one for Mongoid!
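The difference between a one-way and a two-way association can be sketched with two plain Ruby classes. The add_one_way and add_two_way methods are hypothetical names for this illustration; they only mimic which ID arrays end up being updated, not how either mapper persists them.

```ruby
# Hypothetical plain-Ruby models; the *_ids arrays mirror what the mappers persist.
class Category
  attr_reader :id, :name, :book_ids

  def initialize(id, name)
    @id = id
    @name = name
    @book_ids = []
  end

  # One-way: only the category remembers the book.
  def add_one_way(book)
    @book_ids << book.id unless @book_ids.include?(book.id)
  end

  # Two-way: both documents are updated, so either side can be navigated.
  def add_two_way(book)
    add_one_way(book)
    book.category_ids << id unless book.category_ids.include?(id)
  end
end

class Book
  attr_reader :id, :name, :category_ids

  def initialize(id, name)
    @id = id
    @name = name
    @category_ids = []
  end
end

oliver = Book.new(1, "Oliver Twist")
fiction = Category.new(10, "Fiction")
drama = Category.new(11, "Drama")

fiction.add_one_way(oliver)
puts oliver.category_ids.inspect   # the book knows nothing about Fiction

drama.add_two_way(oliver)
puts oliver.category_ids.inspect   # the book now references Drama
```

With the one-way version, finding the categories of a book means scanning every category; with the two-way version the book carries the answer itself.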
Accessing many-to-many with MongoMapper
First create a few categories as follows:
irb> fiction = Category.create(name: "Fiction")
=> #<Category _id: BSON::ObjectId('4ef5b159fed0eb8d9c00000a'), book_
ids: [], name: "Fiction">
irb> drama = Category.create(name: "Drama")
=> #<Category _id: BSON::ObjectId('4ef5b231fed0eb8df5000005'), book_
ids: [], name: "Drama">
Now, let's associate our book with these categories as follows:
irb> fiction.books << Book.first
irb> fiction.save!
So far so good! We should be able to retrieve this relation too. This is done as shown next:
irb> fiction.books
=> [#<Book _id: BSON::ObjectId('4ef5a888fed0eb8c7d000002'), author_
id: BSON::ObjectId('4ef5a7eafed0eb8c7d000001'), name: "Oliver Twist",
published_on: Fri, 23 Dec 1983, publisher: "Dover Publications">]
In MongoMapper, we cannot nd the categories of a book object.
We have to look via the Category model only, as the inverse
relaon is not supported yet.
Accessing many-to-many relations using Mongoid
Let's create a few categories again as follows:
irb> fiction = Category.create(name: "Fiction")
=> #<Category _id: 4e86e4cbfed0eb0be0000012, _type: nil, name:
"Fiction", book_ids: []>
irb> drama = Category.create(name: "Drama")
=> #<Category _id: 4e86e4d9fed0eb0be0000013, _type: nil, name:
"Drama", book_ids: []>
Noce the book_ids aribute. It is present because of the has_and_belongs_to_many
statement. Now let's associate the books and categories as follows:
irb> fiction.books << Book.first
That's it! Now let's check the relation by fetching it as follows:
irb> fiction.books.first
=> #<Book _id: 4e86e45efed0eb0be0000010, _type: nil, title: nil,
publisher: "Dover Publications", published_on: 2002-12-30 00:00:00
UTC, author_id: BSON::ObjectId('4e86e4b6fed0eb0be0000011'), category_
ids: [BSON::ObjectId('4e86e4cbfed0eb0be0000012')], name: "Oliver
Twist">
Looks good! However, let's go one step further than MongoMapper.
irb> Book.first.categories
=> [#<Category _id: 4e86e4cbfed0eb0be0000012, _type: nil, name:
"Fiction", book_ids: [BSON::ObjectId('4e86e45efed0eb0be0000010')]> ]
What just happened?
I would give this round to Mongoid. We created many-to-many relations in both
MongoMapper and Mongoid. However, Mongoid maintains the inverse relation!
So, if we were using MongoMapper, the following call gives an error:
irb> Book.first.categories
NoMethodError: undefined method 'categories' for #<Book:0x1d63fd4>
from: (method_missing)
This would not happen if we were using Mongoid.
When we write many :books in the model, the many method
defines a new method called books, which references the association.
As the many-to-many relation is one-sided in MongoMapper, we have
not declared any association in the book model for categories.
Hence, the method_missing error.
One additional point to be mentioned here is that in MongoMapper,
we save information to an array, not a relation. So, the object has to be
explicitly saved. In Mongoid, we use an association to save the relation,
so we do not need to call save explicitly on the object.
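How a class-level declaration such as many :books can end up defining an instance method is easy to sketch with define_method. TinyAssociations is a hypothetical module written for this illustration; MongoMapper's real implementation is far richer, but the mechanism that explains the NoMethodError is the same.

```ruby
# Hypothetical mini-mapper; MongoMapper's real implementation is far richer.
module TinyAssociations
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Declaring `many :books` defines a `books` reader on instances.
    def many(association_name)
      define_method(association_name) do
        @associations ||= {}
        @associations[association_name] ||= []
      end
    end
  end
end

class Category
  include TinyAssociations
  many :books
end

class Book
  include TinyAssociations
  # No `many :categories` here, so no `categories` method exists.
end

fiction = Category.new
fiction.books << "Oliver Twist"
puts fiction.books.inspect

begin
  Book.new.categories
rescue NoMethodError => error
  puts "NoMethodError: #{error.name}"
end
```

Because Book never declares the association, nothing ever defines categories on it, and the call fails exactly as in the irb session above.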
The one-to-one relation
Let's add a BookDetail model to Sodibee. The BookDetail model contains information
about the number of pages, the cost, the binding style, among others.
Using MongoMapper
We will now add the new model app/models/book_detail.rb.
In Rails, the BookDetail model is stored in the book_detail.rb
file (snake case).
We can add the BookDetail model using MongoMapper as follows:
class Book
  include MongoMapper::Document
  key :title, String
  key :publisher, String
  key :published_on, Date
  belongs_to :author
  one :book_detail
end
class BookDetail
  include MongoMapper::Document
  key :page_count, Integer
  key :price, Float
  key :binding, String
  key :isbn, String
  belongs_to :book
end
Using Mongoid
Now we will extend the book model and add the new book_detail.rb as follows:
class Book
  include Mongoid::Document
  field :title, type: String
  field :publisher, type: String
  field :published_on, type: Date
  belongs_to :author
  has_and_belongs_to_many :categories
  has_one :book_detail
end
class BookDetail
  include Mongoid::Document
  field :page_count, type: Integer
  field :price, type: Float
  field :binding, type: String
  field :isbn, type: String
  belongs_to :book
end
Time for action – adding book details
Let's add book details for our book now. It's the same for both MongoMapper and Mongoid.
The following code shows you how to do it:
irb> oliver = Book.first
=> #<Book _id: 4e86e45efed0eb0be0000010, _type: nil, title: nil,
publisher: "Dover Publications", published_on: 2002-12-30 00:00:00
UTC, author_id: BSON::ObjectId('4e86e4b6fed0eb0be0000011'), category_
ids: [BSON::ObjectId('4e86e4cbfed0eb0be0000012')], name: "Oliver
Twist">
irb> oliver.create_book_detail(page_count: 250, price: 10, binding:
"standard", isbn: "124sdf23sd")
=> #<BookDetail _id: 4ef5bdaafed0eb8ed7000002, _type: nil,
page_count: 250, price: 10.0, binding: "standard", isbn: "124sdf23sd",
book_id: BSON::ObjectId('4e86e45efed0eb0be0000010')>
What just happened?
We created a BookDetail object. That was obvious, wasn't it? However, a closer look at
this and we learn something new as follows:
irb> oliver.create_book_detail(page_count: 250, price: 10,
When we have only a direct single association (or relation), we build it using the create_
prefix. In the earlier case of a many-to-many relation, if we want to add a new
category, we could do something similar to the following:
irb> oliver.categories.create(name: "New Theater")
This would create a new category and associate that category with the Book object.
Have a go hero – create the other models
Create the Book, Author, and Category objects. Then associate them!
Understanding polymorphic relations
Before we even see how this is done using MongoMapper or Mongoid, it's important to
understand the basic concept of polymorphic relations.
Polymorphic means multiple forms or multiple behaviors. When we use it in the context of a
database, we do mean multiple forms of the object. Let's see an example.
"Abstract base objects" in technical terms and "Generic common nouns" in layman's terms
are ideal examples for explaining polymorphic relations.
For example, a vehicle could mean a two-wheeler, three-wheeler, a car, a truck or even a
space shuttle! A vehicle has at least one driver, so we have a relation between a vehicle
and its driver. Let's assume that a vehicle has only one driver. A driver has different skills.
For example he could be a cyclist, an astronaut, or an F1 driver! So, how do we map these
different types of driver profiles?
Implementing polymorphic relations the wrong way
If we are using a relaonal database, we can create a table called vehicles. We map all
aributes of a vehicle as columns in the table. So, we have all elds of a vehicle (right from a
cycle to a space shule) mapped in columns and then populate only the relevant elds. We
also keep a type column, which signies what the vehicle type is—cycle, car, space shule
among others.
This is crazy because we could end up with a table having a few thousand columns! Wrong,
wrong, wrong!
You could argue that using a document database like MongoDB could alleviate this problem
— because it is schema free. So, we could create a collecon called vehicles and we could
map dierent elds in a document and keep going unl we can. The type eld idenes the
type of the vehicle. However, this is sll not a praccal or a scalable approach and degrades
performance as data increases. Considering that a document has a limited size.
Implementing polymorphic relations the correct way
There are two types of polymorphic relations:
Single Collection Inheritance (SCI)
Basic polymorphic relations
We shall study both of them in detail. After that, we shall see how to choose the right
approach. Let's study them first.
Single Collection Inheritance
This is very similar to the inheritance of standard object-oriented programming. See the
following diagram for the inheritance hierarchy for drivers:
[Diagram: Driver inheritance hierarchy]
Driver (name : string, age : int; accelerate(), brake(), turn())
  AeroSpace (gForce : float): Astronaut, Pilot
  Marine (can_swim : boolean): ShipDriver, SubmarineDriver
  Terrestrial (license : boolean): BikeDriver, CarDriver
Subclass methods shown in the diagram: eject(), reverse(), climb()
Time for action – managing the driver entities
Let's see the code for this. First let's create the generic Driver model as follows:
# app/models/driver.rb
class Driver
  include Mongoid::Document
  field :name, type: String
  field :age, type: Integer
  field :address, type: String
  field :weight, type: Float
end
This is prey much straighorward. Now let's see the AeroSpace, Terrestrial, and
Marine classes. They are shown next:
# app/models/terrestrial.rb
class Terrestrial < Driver
field :license, type: Boolean
Ruby DataMappers: Ruby and MongoDB Go Hand in Hand
[ 126 ]
end
# app/models/marine.rb
class Marine < Driver
field :can_swim, type: Boolean
end
# app/model/aero_space.rb
class AeroSpace < Driver
field :gforce, type: Float
end
Here we simply inherit from the Driver class. Let's dive deeper. Let's create the Pilot,
Astronaut, and other lower-level classes as follows:
# app/models/pilot.rb
class Pilot < AeroSpace
end
# app/models/astronaut.rb
class Astronaut < AeroSpace
end
# app/models/ship_driver.rb
class ShipDriver < Marine
end
# app/models/submarine_driver.rb
class SubmarineDriver < Marine
end
# app/models/car_driver.rb
class CarDriver < Terrestrial
end
# app/models/bike_driver.rb
class BikeDriver < Terrestrial
end
Now let's create some objects as follows:
irb> Pilot.create(name: "Gautam")
=> #<Pilot _id: 4ef9a410fed0eb977d000002, _type: "Pilot", name:
"Gautam", age: nil, address: nil, weight: nil, gforce: nil>
irb> CarDriver.create(name: "Car Gautam")
=> #<CarDriver _id: 4ef9b206fed0eb9824000001, _type: "CarDriver",
name: "Car Gautam", age: nil, address: nil, weight: nil, license: nil>
irb> ShipDriver.create(name: "Ship Gautam")
=> #<ShipDriver _id: 4ef9b21afed0eb9824000002, _type: "ShipDriver",
name: "Ship Gautam", age: nil, address: nil, weight: nil, can_swim:
nil>
irb> Marine.count
=> 1
> Marine.first
=> #<ShipDriver _id: 4ef9b21afed0eb9824000002, _type: "ShipDriver",
name: "Ship Gautam", age: nil, address: nil, weight: nil, can_swim:
nil>
> Terrestrial.count
=> 1
> Terrestrial.first
=> #<CarDriver _id: 4ef9b206fed0eb9824000001, _type: "CarDriver",
name: "Car Gautam", age: nil, address: nil, weight: nil, license: nil>
irb> Driver.count
=> 3
What just happened?
Using Single Collecon Inheritance, we can nd out how dierent types of drivers form
dierent levels of specializaon.
Let's create a few objects as follows:
irb> Pilot.create(name: "Gautam")
=> #<Pilot _id: 4ef9a410fed0eb977d000002, _type: "Pilot", name:
"Gautam", age: nil, address: nil, weight: nil, gforce: nil>
irb> CarDriver.create(name: "Car Gautam")
=> #<CarDriver _id: 4ef9b206fed0eb9824000001, _type: "CarDriver",
name: "Car Gautam", age: nil, address: nil, weight: nil, license: nil>
irb> ShipDriver.create(name: "Ship Gautam")
=> #<ShipDriver _id: 4ef9b21afed0eb9824000002, _type: "ShipDriver",
name: "Ship Gautam", age: nil, address: nil, weight: nil, can_swim:
nil>
Here we created a Pilot, a ShipDriver, and a CarDriver object, all in the standard
way of creating objects. However, we can also access these objects in different ways.
> Marine.first
=> #<ShipDriver _id: 4ef9b21afed0eb9824000002, _type: "ShipDriver",
name: "Ship Gautam", age: nil, address: nil, weight: nil, can_swim:
nil>
Remember that we never created a Marine object. However, when we try to fetch the first
Marine object, it works! Notice that even the type of object fetched is not a Marine but a
ShipDriver object. What's going on? We wanted to fetch the first Marine object and it
returned a ShipDriver object!
This is polymorphism in action. The Marine class behaves in different ways depending on
the object it represents. In other words, the Marine class has a polymorphic relation with
its subclasses.
Going deeper into this:
irb> Driver.count
=> 3
We created a Pilot, a ShipDriver, and a CarDriver, never a plain Driver, yet the Driver
count is 3 because all of them are stored in the same collection.
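The behavior seen in this session can be reproduced with a small in-memory sketch. The DRIVERS array plays the role of the single collection, and the create, all, count, and first class methods are simplified stand-ins written for this illustration, not Mongoid's implementation. The key assumption, taken from the _type attribute in the output above, is that each document records the name of its concrete class.

```ruby
# Hypothetical in-memory "collection": every driver document lands in this one array.
DRIVERS = []

class Driver
  attr_reader :name

  def initialize(name)
    @name = name
  end

  # Store the concrete class name in _type, as the irb output shows.
  def self.create(name)
    DRIVERS << { "_type" => self.name, "name" => name }
    new(name)
  end

  # A class sees documents typed as itself or as any of its subclasses.
  def self.all
    DRIVERS.map { |doc| [Object.const_get(doc["_type"]), doc] }
           .select { |klass, _doc| klass <= self }
           .map { |klass, doc| klass.new(doc["name"]) }
  end

  def self.count
    all.length
  end

  def self.first
    all.first
  end
end

class AeroSpace < Driver; end
class Marine < Driver; end
class Terrestrial < Driver; end
class Pilot < AeroSpace; end
class ShipDriver < Marine; end
class CarDriver < Terrestrial; end

Pilot.create("Gautam")
CarDriver.create("Car Gautam")
ShipDriver.create("Ship Gautam")

puts Marine.first.class   # a ShipDriver, although we asked Marine
puts Marine.count
puts Driver.count
```

Querying through a parent class is just a filter on _type that also accepts the subclasses, and each document is rebuilt as an instance of the class it was saved as.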
Basic polymorphic relations
Now let's see a dierent way of managing polymorphic relaons. Let's consider the vehicles.
There are dierent types of vehicles—all having totally dierent properes but all are
vehicles nevertheless. So, SCI may not be a good choice for a space shule and a bike,
as they are enrely dierent vehicles!
Choosing SCI or basic polymorphism.
What you need to consider is the number of collecons you want. If you
want all objects to reside in one collecon use SCI. If you want objects to
reside in dierent collecons use basic polymorphism.
In other words, in case the polymorphism is data-centric (that is, if objects
have a lot of dierent properes or data), use basic polymorphism.
If the polymorphism is more funconality-centric (that is, if objects have
similar properes but dierent funcons) use SCI.
Time for action – creating vehicles using basic polymorphism
Let's design the Vehicle model:
# app/models/vehicle.rb
class Vehicle
  include Mongoid::Document
  belongs_to :resource, :polymorphic => true
  field :terrain, type: String
  field :cost, type: Float
  field :weight, type: Float
  field :max_speed, type: Float
end
This is the main polymorphic class. We now use this class in other models.
Unlike SCI, each model is independent, but can choose to be a part
of Vehicle. It has its own identity and does not inherit from any
parent model.
Let's create a few objects. The code to create a Bike model is as follows:
# app/models/bike.rb
class Bike
  include Mongoid::Document
  has_one :vehicle, :as => :resource
  field :gears, type: Integer
  field :has_handle, type: Boolean
  field :cubic_capacity, type: Float
end
The code to create a Ship model is as follows:
# app/models/ship.rb
class Ship
  include Mongoid::Document
  has_one :vehicle, :as => :resource
  field :is_military, type: Boolean
  field :is_cruise, type: Boolean
  field :missile_capable, type: Boolean
  field :anti_aircraft, type: Boolean
  field :number_engines, type: Integer
end
The code to create a Submarine model is as follows:
# app/models/submarine.rb
class Submarine
  include Mongoid::Document
  has_one :vehicle, :as => :resource
  field :max_depth, type: Float
  field :is_nuclear, type: Boolean
  field :missile_capable, type: Boolean
end
The code to create a SpaceShuttle model is as follows:
# app/models/space_shuttle.rb
class SpaceShuttle
  include Mongoid::Document
  has_one :vehicle, :as => :resource
  field :boosters, type: Integer
  field :launch_location, type: String
end
The code to create an Aeroplane model is as follows:
# app/models/aeroplane.rb
class Aeroplane
  include Mongoid::Document
  has_one :vehicle, :as => :resource
  field :seating, type: Integer
  field :max_altitude, type: Integer
  field :wing_span, type: Float
end
The code to create a Car model is as follows:
# app/models/car.rb
class Car
  include Mongoid::Document
  has_one :vehicle, :as => :resource
  field :windows, type: Integer
  field :seating, type: Integer
  field :bhp, type: Float
end
Here, you see that each model has a bunch of properties that are different from each other but
all basically fall under the Vehicle category. One of the advantages of basic polymorphism is
that it's easy to enter and exit from this pattern. It's very easy to incorporate an existing model
into a polymorphic pattern and equally easy to remove an existing model from one. We just
add or remove the relationship to the polymorphic model.
Now let's build objects as follows:
irb> ship = Ship.new(is_military: true)
=> #<Ship _id: 4f042c53fed0ebc45b000003, _type: "Ship", is_military:
true, is_cruise: nil, missile_capable: nil, anti_aircraft: nil,
number_engines: nil>
irb> vehicle = Vehicle.create(resource: ship)
=> #<Vehicle _id: 4f042c87fed0ebc481000002, _type: "Vehicle",
resource_type: "Ship", resource_id: BSON::ObjectId('4f042c53fed0ebc4
5b000003'), terrain: nil, cost: nil, weight: nil, max_speed: nil>
What just happened?
We created a Ship object and then associated it to Vehicle. Let's have a closer look at this
in the following code:
irb> vehicle = Vehicle.create(resource: ship)
=> #<Vehicle _id: 4f042c87fed0ebc481000002, _type: "Vehicle",
resource_type: "Ship", resource_id: BSON::ObjectId('4f042c53fed0ebc4
5b000003'), terrain: nil, cost: nil, weight: nil, max_speed: nil>
Noce the resource_id and resource_type elds, they dene the resource that the
vehicle represents. To get actual informaon about the vehicle, we have to lookup the
Ship object.
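The lookup through resource_type and resource_id can be sketched in plain Ruby. This is only an illustration of the idea, not how Mongoid implements it: the DB hash and the insert, create_vehicle, and resource_for helpers are hypothetical names invented for this sketch, and collections are keyed by model name for simplicity.

```ruby
require "securerandom"

# Hypothetical in-memory database: one hash per collection, keyed by document id.
# For simplicity each collection is keyed by its model name ("Ship", "Vehicle").
DB = Hash.new { |db, collection| db[collection] = {} }

# Insert a document into a collection and return it.
def insert(collection, attrs = {})
  doc = { "_id" => SecureRandom.hex(12) }.merge(attrs)
  DB[collection][doc["_id"]] = doc
  doc
end

# Create a vehicle that records which document, in which collection, it represents.
def create_vehicle(resource_type, resource, attrs = {})
  pointer = { "resource_type" => resource_type, "resource_id" => resource["_id"] }
  insert("Vehicle", attrs.merge(pointer))
end

# Follow resource_type and resource_id back to the actual document.
def resource_for(vehicle)
  DB[vehicle["resource_type"]][vehicle["resource_id"]]
end

ship = insert("Ship", "is_military" => true)
bike = insert("Bike", "gears" => 5)

ship_vehicle = create_vehicle("Ship", ship, "terrain" => "water")
bike_vehicle = create_vehicle("Bike", bike, "terrain" => "land")

puts ship_vehicle["resource_type"]              # Ship
puts resource_for(ship_vehicle)["is_military"]  # true
puts resource_for(bike_vehicle)["gears"]        # 5
```

All vehicles sit in one collection, while the ships and bikes they point at stay in their own collections. That separation is what distinguishes basic polymorphism from SCI.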
This two-step process could have been done in one step itself, as follows:
irb> Vehicle.create(resource: Ship.create(is_military: true))
=> #<Vehicle _id: 4f042de8fed0ebc4c5000004, _type: "Vehicle",
resource_type: "Ship", resource_id: BSON::ObjectId('4f042de8fed0ebc
4c5000003'), terrain: nil, cost: nil, weight: nil, max_speed: nil>
Remember that we cannot do this the other way round:
irb> ship = Ship.create(:vehicle => Vehicle.create)
=> #<Ship _id: 4f042dd0fed0ebc4c5000002, _type: "Ship", is_military:
nil, is_cruise: nil, missile_capable: nil, anti_aircraft: nil, number_
engines: nil>
irb> Vehicle.last
=> #<Vehicle _id: 4f042dd0fed0ebc4c5000001, _type: "Vehicle",
resource_type: nil, resource_id: nil, terrain: nil, cost: nil, weight:
nil, max_speed: nil>
irb> Vehicle.create(:resource => Ship.create)
When the rst command is run, the Vehicle object is created rst, so the Ship object
cannot be assigned as the resource. That is the reason the Vehicle object has resource_
type and resource_id as nil. Obvious, wasn't it?
Choosing SCI or basic polymorphism
As menoned earlier, this is the choice of single collecon or mulple collecons. It's best
shown by an example. The MongoDB collecon looks like the following for drivers and
vehicles:
> db.drivers.find()
{"_id":ObjectId("..."), "name":"Gautam", "_type":"Pilot" }
{"_id":ObjectId("..."), "name":"Gautam", "_type":"CarDriver" }
{"_id":ObjectId("..."), "name":"Gautam", "_type":"ShipDriver" }
Noce, that for the drivers collecon, the _type of objects are dierent in the same
collecon. This is SCI!
> db.vehicles.find()
{"_id":ObjectId("..."), "_type" : "Vehicle",
"resource_id" : ObjectId("4f02077dfed0ebb308000001"), "resource_type" : "Ship" }
{"_id":ObjectId("..."), "_type" : "Vehicle",
"resource_id" : ObjectId("4f020807fed0ebb308000007"), "resource_type" : "Ship" }
However, in the vehicles collection, the _type of objects is the same: Vehicle. This is
basic polymorphism.
Using embedded objects
We know what embedded objects are and we have seen them already in the previous
chapters. Now, we shall see how these are built via DataMappers. Just to recap, an
embedded document is one that resides inside a parent document. We have seen a
sample of this already; it's listed next:
book : { name: "Oliver Twist",
  ...
  reviews: [
    {
      _id: ObjectId("5e85b612fed0eb0bee000001"),
      user_id: ObjectId("8d83b612fed0eb0bee000702"),
      book_id: ObjectId("4e81b95ffed0eb0c23000002"),
      comment: "Very interesting read"
    },
    {
      _id: ObjectId("4585b612fed0eb0bee000003"),
      user_id: ObjectId("ab93b612fed0eb0bee000883"),
      book_id: ObjectId("4e81b95ffed0eb0c23000002"),
      comment: "Who is Oliver Twist?"
    }
  ]
  ...
}
In the preceding code, reviews is an array of embedded objects. How do you identify an
embedded object?
{
  _id: ObjectId("5e85b612fed0eb0bee000001"),
  user_id: ObjectId("8d83b612fed0eb0bee000702"),
  book_id: ObjectId("4e81b95ffed0eb0c23000002"),
  comment: "Very interesting read"
}
When a nested object carries its own _id (an ObjectId), it's an embedded object. Now, let's
see how we define them using DataMappers. As with all associations, these are two-way
associations.
Time for action – creating embedded objects
Let's continue our example and assume that a driver has one address and many bank
accounts. As addresses or bank accounts have hardly any relevance without a driver,
we choose to embed them into the Driver model.
Using MongoMapper
First let's revisit the Driver model as shown next:
class Driver
  include MongoMapper::Document
  one :address
  many :bank_accounts
end
Now let's see how the Address and BankAccount models are constructed. This is done
as follows:
# app/models/address.rb
class Address
  include MongoMapper::EmbeddedDocument
  key :street, String
  key :city, String
end
# app/models/bank_account.rb
class BankAccount
  include MongoMapper::EmbeddedDocument
  key :account_number, String
  key :balance, Float
end
Using Mongoid
Using Mongoid, it looks like the following:
class Driver
  include Mongoid::Document
  field :name, type: String
  ...
  embeds_one :address
  embeds_many :bank_accounts
end
And the Address and BankAccount models are written as follows:
# app/models/address.rb
class Address
  include Mongoid::Document
  field :street, type: String
  field :city, type: String
  embedded_in :driver
end
# app/models/bank_account.rb
class BankAccount
  include Mongoid::Document
  field :account_number, type: String
  field :balance, type: Float
  embedded_in :driver
end
If we try this on the Rails console, we can create Driver, Address, and BankAccount
objects. Using either of the DataMappers, we can create the objects as follows:
irb> d = Driver.first
=> #<Pilot _id: 4ef9a410fed0eb977d000002, _type: "Pilot", name:
"Gautam", age: nil, address: nil, weight: nil, gforce: nil>
irb> d.address = Address.new(street: "SB Road", city: "Pune")
=> #<Address _id: 4f0491bcfed0ebcc59000001, _type: nil, street: "SB
Road", city: "Pune">
irb> d.bank_accounts << BankAccount.new(account_number:
"1230001231225", balance: 1231.23)
=> [#<BankAccount _id: 4f0491f6fed0ebcc59000002, _type: nil, account_
number: "1230001231225", balance: 1231.23>]
irb> d.save
=> true
irb> d = Driver.first
=> #<Pilot _id: 4ef9a410fed0eb977d000002, _type: "Pilot", name:
"Gautam", age: nil, address: {"street"=>"SB Road", "city"=>"Pune", "_
id"=>BSON::ObjectId('4f0491bcfed0ebcc59000001')}, weight: nil, gforce:
nil>
irb> d.address
=> #<Address _id: 4f0491bcfed0ebcc59000001, _type: nil, street: "SB
Road", city: "Pune">
irb> d.bank_accounts
=> [#<BankAccount _id: 4f0491f6fed0ebcc59000002, _type: nil, account_
number: "1230001231225", balance: 1231.23>]
What just happened?
When we add an Address object or a BankAccount object to Driver, an object is created
but it's embedded inside the Driver object. If we see the MongoDB document, we will
notice the following:
mongo> db.drivers.findOne()
{ "_id" : ObjectId("4ef9a410fed0eb977d000002"), "_type" : "Pilot",
"address" : { "street" : "SB Road", "city" : "Pune", "_id" : ObjectId(
"4f0491bcfed0ebcc59000001") },
"name" : "Gautam",
"bank_accounts" : [
{
"account_number" : "1230001231225",
"balance" : 1231.23,
"_id" : ObjectId("4f0491f6fed0ebcc59000002")
}
]
}
Noce that address and bank_accounts are elds in the document but have ObjectId
specied in them.
Remember that you cannot create or access embedded objects without
the parent object context.
If you try to create an embedded object without any context of the document it's embedded
in, you will get an error. We'll see this in the following sections.
Using MongoMapper
irb> Address.create
NoMethodError: undefined method 'create' for Address:Class
The Address class does not have a create method. This is because it is embedded into
another object. Let's see if we can find an address (as weird as that sounds).
irb> Address.first
NoMethodError: undefined method 'first' for Address:Class
That didn't work either—and rightly so.
Using Mongoid
Mongoid gives slightly different errors from MongoMapper:
irb> Address.create
NoMethodError: undefined method 'new?' for nil:NilClass
Undened method!! That's a weird one! If we dig deeper into the Mongoid code, we see
that a model maps to a collecon and we create documents inside that collecon. Address
is not a collecon (as it's an embedded document). So, when we call create on this, it tries
to resolve that model to collecon. As there is no collecon by this name, nil is passed to
the Persistence module, resulng in the NilClass error. Not very intuive, but please
pardon Mongoid!
irb> Address.first
Mongoid::Errors::InvalidCollection: Access to the collection for
Address is not allowed since it is an embedded document, please access
a collection from the root document.
Wow! Finally we get an error that makes sense. Mongoid tells us to access the parent
document and not access the embedded document, as there is no collection named Address.
This error also gives more insight into how different the internal behavior
of Mongoid and MongoMapper is.
Reverse embedded relations in Mongoid
The reverse embedded relations for embedded documents are very important. Mongoid uses
them to resolve where these documents are to be embedded. Here are some things we
should keep in mind to avoid unforeseen behavior.
Time for action – using embeds_one without specifying embedded_in
If we only specify the embeds_one relationship in the parent but do not specify the
embedded_in relationship in the embedded model, the document will not be
embedded and no error will be raised either. Have a look at the following code:
class Driver
  include Mongoid::Document
  ...
  embeds_one :address
end
class Address
  include Mongoid::Document
  # have intentionally not put the embedded_in relation.
end
If we now try to embed the Address object into the Driver, a half-baked Driver object
gets created:
irb> d = Driver.first
=> #<Pilot _id: 4ef9a410fed0eb977d000002, _type: "Pilot", name:
"Gautam", age: nil, address: {"street"=>"SB Road", "city"=>"Pune", "_
id"=>BSON::ObjectId('4f0491bcfed0ebcc59000001')}, weight: nil, gforce:
nil>
irb> d.address = Address.new(street: "A new street")
=> #<Address _id: 4f0662c2fed0ebe0ee000002, _type: nil, street: "A
new street", city: nil>
irb> d.save
=> true
irb> Driver.first
=> #<Pilot _id: 4ef9a410fed0eb977d000002, _type: "Pilot", name:
"Gautam", age: nil, address: {"street"=>"SB Road", "city"=>"Pune", "_
id"=>BSON::ObjectId('4f0491bcfed0ebcc59000001')}, weight: nil, gforce:
nil>
What just happened?
Noce that the address has not changed in the object saved to database, even though
MongoDB says that the object was saved correctly. The reason why the address did not
change from SB Road to A new street is because when Mongoid tried to save the
embedded document, it looked for the reverse relaon and did not nd it, so that data
was ignored.
Under the cover, Mongoid treats embedded models also as Mongoid::Document.
The embedded_in method helps resolve the parent.
Time for action – using embeds_many without specifying embedded_in
Not specifying embedded_in can cause some real problems for a one-to-many
relation too. It can create new half-baked parent objects in the collection. Have a look at
the following code:
class Driver
  include Mongoid::Document
  ...
  embeds_many :bank_accounts
end
class BankAccount
  include Mongoid::Document
  # have intentionally not put the embedded_in relation.
end
Now, if we try to add BankAccounts to the Driver object, we get into trouble! This is
shown next:
irb> d = Driver.last
=> #<Driver _id: 4f06667cfed0ebe13e000001, _type: nil, name:
nil, age: nil, address: {"_id"=>BSON::ObjectId('4f066684fed0ebe1
3e000002')}, weight: nil>
irb> d.bank_accounts << BankAccount.new
=> [#<BankAccount _id: 4f06672cfed0ebe164000001, _type: nil, account_
number: nil, balance: nil>]
irb> Driver.last
=> #<Driver _id: 4f06672cfed0ebe164000001, _type: nil, name: nil,
age: nil, address: nil, weight: nil>
What just happened?
First we fetched the last Driver object as follows:
irb> d = Driver.last
=> #<Driver _id: 4f06667cfed0ebe13e000001, _type: nil, name:
nil, age: nil, address: {"_id"=>BSON::ObjectId('4f066684fed0ebe1
3e000002')}, weight: nil>
Here, we can see that it's a proper Driver object with an address embedded in it.
We also see that the Driver object has the ID 4f06667cfed0ebe13e000001.
Now, we are trying to embed a BankAccount object into the Driver bank_accounts array,
but remember that we have not specified the embedded_in relation. This is done as follows:
irb> d.bank_accounts << BankAccount.new
=> [#<BankAccount _id: 4f06672cfed0ebe164000001, _type: nil, account_
number: nil, balance: nil>]
Noce, that we rightly see the BankAccount object inserted into the bank_accounts
array. However, there is something seriously wrong in the database update:
irb> Driver.last
=> #<Driver _id: 4f06672cfed0ebe164000001, _type: nil, name: nil,
age: nil, address: nil, weight: nil>
Now, if we try to fetch the last driver object, we see a Driver object with the ID
4f06672cfed0ebe164000001. This is the object ID of the BankAccount object
we created in the earlier step. So, we have a half-baked Driver object.
Be careful! As MongoDB is a schema-free database, it will allow such
incorrect behavior to creep in, but it's only we who are to blame
when we use Mongoid incorrectly.
MongoMapper, on the other hand, treats embedded documents
differently as they are MongoMapper::EmbeddedDocuments,
so this problem does not arise.
Understanding embedded polymorphism
Yes! We can use polymorphism even for embedded documents. Why treat them
differently? We already know the concept of polymorphism. Let's extend this to
embedded documents too.
Single Collection Inheritance
Let's assume that a driver has different types of licenses: to fly, to drive a car, to drive a bike, to
drive a ship, to command a space shuttle, among others. As the license cannot exist without a
driver, we embed it into the Driver model. However, the license shows polymorphic behavior.
Time for action – adding licenses to drivers
First, let's embed licenses into the Driver model using Single Collection Inheritance. This
can be done as follows:
class Driver
  include Mongoid::Document
  field :name, type: String
  ...
  embeds_many :licenses
end
And now let's create a License model as follows:
# app/models/license.rb
class License
  include Mongoid::Document
  embedded_in :driver
end
# app/models/car_license.rb
class CarLicense < License
end
Let's see how to embed the License model into the Driver model in the following code:
irb> d = Driver.first
=> #<Pilot _id: 4ef9a410fed0eb977d000002, _type: "Pilot", name:
"Gautam", age: nil, address: {"street"=>"SB Road", "city"=>"Pune", "_
id"=>BSON::ObjectId('4f0491bcfed0ebcc59000001')}, weight: nil, gforce:
nil>
irb> d.licenses << CarLicense.new
=> [#<CarLicense _id: 4f065ed4fed0ebd605000003, _type: "CarLicense">]
irb> d.save
=> true
irb> Driver.first.licenses
=> [#<CarLicense _id: 4f065ed4fed0ebd605000003, _type: "CarLicense">]
What just happened?
We can see that the licenses array now has a CarLicense object in it. It's also interesting
to see from the MongoDB console that the ID was really embedded:
{ "_id" : ObjectId("4ef9a410fed0eb977d000002"), "_type" : "Pilot",
"address" : { "street" : "SB Road", "city" : "Pune", "_id" : ObjectId(
"4f0491bcfed0ebcc59000001") }, "bank_accounts" : [
{
"account_number" : "1230001231225",
"balance" : 1231.23,
"_id" : ObjectId("4f0491f6fed0ebcc59000002")
}
], "licenses" : [
{
"_id" : ObjectId("4f065ed4fed0ebd605000003"),
"_type" : "CarLicense"
}
], "name" : "Gautam" }
Yes it was indeed!
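The _type stored with each embedded license is what lets a DataMapper hand back a CarLicense rather than a plain License. Here is a rough plain-Ruby sketch of that idea. The load_licenses helper and the attributes reader are hypothetical names for this illustration, and this is not Mongoid's actual loading code.

```ruby
# Hypothetical stand-ins for the License models; no Mongoid involved.
class License
  attr_reader :attributes

  def initialize(attributes = {})
    @attributes = attributes
  end
end

class CarLicense < License; end

# Turn the embedded "licenses" array of a driver document back into objects,
# using each sub-document's _type to pick the class (License when it is absent).
def load_licenses(driver_doc)
  driver_doc.fetch("licenses", []).map do |attrs|
    klass = Object.const_get(attrs.fetch("_type", "License"))
    klass.new(attrs)
  end
end

driver_doc = {
  "_type" => "Pilot",
  "name" => "Gautam",
  "licenses" => [
    { "_id" => "4f065ed4fed0ebd605000003", "_type" => "CarLicense" }
  ]
}

licenses = load_licenses(driver_doc)
puts licenses.first.class           # CarLicense
puts licenses.first.is_a?(License)  # true
```

This is SCI applied inside a single parent document: one licenses array, many license types.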
Basic embedded polymorphism
Let's consider the case of insurance for drivers. Assume that drivers may or may not
have insurance. For example, suppose we say that pilots and astronauts must have travel
insurance and car drivers must have theft insurance. Bike riders don't need any insurance.
In such a case, we don't want insurance to be a part of the Driver model.
Instead, we should have the option to put it in any class that really needs it. This also means
that these insurance classes may be related to different driver subclasses. As insurance is
moot without the driver's existence, we should embed it.
Time for action – insuring drivers
Let's prepare different types of insurance as follows:
# app/models/pilot.rb
class Pilot < AeroSpace
  embeds_many :insurances, as: :insurable
end
# app/models/car_driver.rb
class CarDriver < Terrestrial
  embeds_many :insurances, as: :insurable
end
# app/models/astronaut.rb
class Astronaut < AeroSpace
  embeds_many :insurances, as: :insurable
end
And now we design the Insurance class as follows:
# app/models/insurance.rb
class Insurance
  include Mongoid::Document
  embedded_in :insurable, polymorphic: true
end
# app/models/travel_insurance.rb
class TravelInsurance < Insurance
end
# app/models/theft_insurance.rb
class TheftInsurance < Insurance
end
# app/models/fire_insurance.rb
# (assumed definition; FireInsurance is used in the console session below)
class FireInsurance < Insurance
end
Now let's provide insurance policies for our drivers as follows:
irb> p = Pilot.first
=> #<Pilot _id: 4ef9a410fed0eb977d000002, _type: "Pilot", name:
"Gautam", age: nil, address: {"street"=>"asfds", "city"=>"Pune", "_id
"=>BSON::ObjectId('4f0491bcfed0ebcc59000001')}, weight: nil, gforce:
nil>
irb> p.insurances << TravelInsurance.new
=> [#<TravelInsurance _id: 4f06ad2efed0ebe598000002, _type:
"TravelInsurance">]
irb> a = Astronaut.first
=> #<Astronaut _id: 4f069fd8fed0ebe45d000001, _type: "Astronaut",
name: nil, age: nil, address: nil, weight: nil, gforce: nil>
irb> a.insurances << TravelInsurance.new
=> [#<TravelInsurance _id: 4f06b058fed0ebe598000004, _type:
"TravelInsurance">]
irb> a.insurances << FireInsurance.new
=> [#<FireInsurance _id: 4f06ad6bfed0ebe598000003, _type:
"FireInsurance">]
irb> a.insurances
=> [#<FireInsurance _id: 4f06ad6bfed0ebe598000003, _type:
"FireInsurance">, #<TravelInsurance _id: 4f06b058fed0ebe598000004,
_type: "TravelInsurance">]
What just happened?
Let's have a closer look at the preceding commands:
irb> p = Pilot.first
=> #<Pilot _id: 4ef9a410fed0eb977d000002, _type: "Pilot", name:
"Gautam", age: nil, address: {"street"=>"asfds", "city"=>"Pune", "_id
"=>BSON::ObjectId('4f0491bcfed0ebcc59000001')}, weight: nil, gforce:
nil>
irb> p.insurances << TravelInsurance.new
=> [#<TravelInsurance _id: 4f06ad2efed0ebe598000002, _type:
"TravelInsurance">]
Here, Insurance is polymorphic. This means that the Insurance object can be embedded
in multiple parents. In this case, we have TravelInsurance (that is, a model that
inherits from Insurance) being assigned to the Pilot class:
irb> a = Astronaut.first
=> #<Astronaut _id: 4f069fd8fed0ebe45d000001, _type: "Astronaut",
name: nil, age: nil, address: nil, weight: nil, gforce: nil>
irb> a.insurances << TravelInsurance.new
=> [#<TravelInsurance _id: 4f06b058fed0ebe598000004, _type:
"TravelInsurance">]
Now, we have the TravelInsurance object being embedded in the Astronaut class. This
shows us the polymorphic nature of the Insurance embedded object – it can be embedded
in different parents.
Have a go hero
Why don't you try and assign TheftInsurance to CarDriver?
Choosing whether to embed or to associate documents
This is indeed somemes a dilemma. While modeling data, if you see that the child document
cannot exist without the parent object and if you are relavely sure that you would not need to
search for the child objects directly, you could embed them.
For the UML savvy, a composition relation is a good candidate for embedding.
When in doubt, do not embed!
So, what happens if you embed an object and realize later that you need to process
embedded objects? Or maybe the relation was wrong and it should not have been embedded?
Don't worry! The following are a couple of options you have:
Change the code from embed to association. As MongoDB is schema free, new
objects will automatically pick up the relation.
Fire queries on the embedded objects if required. But, this may not be a good
solution as it would mean unnecessary calls for even basic lookups.
Mongoid or MongoMapper – the verdict
It's neutral! Stick to either Mongoid or MongoMapper, not both at the same time.
My personal preference is Mongoid as it's closer to the ActiveModel relations than
MongoMapper.