Puppet 5 Beginners Guide 3rd Edition

Puppet 5 Beginner's Guide
Third Edition
Go from newbie to pro with Puppet 5
John Arundel
Copyright © 2017 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system,
or transmitted in any form or by any means, without the prior written permission of the
publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the
information presented. However, the information contained in this book is sold without
warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers
and distributors will be held liable for any damages caused or alleged to be caused directly or
indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the
companies and products mentioned in this book by the appropriate use of capitals.
However, Packt Publishing cannot guarantee the accuracy of this information.
First published: April 2013
Second edition: May 2017
Third edition: October 2017
Production reference: 1031017
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78847-290-6
John Arundel
Jo Rhett
Acquisition Editor
Ben Renow-Clarke
Project Editor
Alish Firasta
Content Development Editor
Monika Sangwan
Technical Editors
Bhagyashree Rai
Gaurav Gavas
Copy Editor
Gladson Monteiro
Mariammal Cheyar
Kirk D'Penha
Producon Coordinator
Arvindkumar Gupta
Cover Work
Arvindkumar Gupta
About the Author
John Arundel is a DevOps consultant, which means he helps people build world-class
web operations teams and infrastructure and has fun doing it. He was formerly a
senior operations engineer at global telco Verizon, designing resilient, high-performance
infrastructures for major corporations such as Ford, McDonald's, and Bank of America. He is
now an independent consultant, working closely with selected clients to deliver web-scale
performance and enterprise-grade resilience on a startup budget.
He likes writing books, especially about Puppet (Puppet 2.7 Cookbook and Puppet 3
Cookbook are available from Packt as well). It seems that at least some people enjoy reading
them, or maybe they just like the pictures. He also provides training and coaching on Puppet
and DevOps, which, it turns out, is far harder than simply doing the work himself.
Off the clock, he is a medal-winning competitive rifle and pistol shooter and a decidedly
uncompetitive piano player. He lives in a small cottage in Cornwall, England and believes,
like Cicero, that if you have a garden and a library, then you have everything you need.
You may like to follow him on Twitter at @bitfield.
My grateful thanks are due to Jo Rhett, who made innumerable improvements and
suggestions to this book, and whose Puppet expertise and clarity of writing I can only strive
to emulate. Also to the original Puppet master, Luke Kanies, who created a configuration
management tool that sucks less, and my many other friends at Puppet. Many of the key
ideas in this book came from them and others including Przemyslaw 'SoboL' Sobieski,
Peter Bleeck, and Igor Galić.
The techniques and examples in the book come largely from real production codebases, of
my consulting clients and others, and were developed with the indispensable assistance of
my friends and colleagues Jon Larkowski, Justin Domingus, Walter Smith, Ian Shaw, and Mike
Thomas. Special thanks are also due to the Perseids Project at Tufts University, and most of
all to the inestimable Bridget Almas, who patiently read and tested everything in the book
several times and made many valuable suggestions, not to mention providing continuous
moral support, love, and guidance throughout the writing process. This book is for her.
About the Reviewer
Jo Rhe is a DevOps architect with more than 25 years of experience conceptualizing
and delivering large-scale Internet services. He creates automaon and infrastructure
to accelerate deployment and minimize outages.
Jo has been using, promong, and enhancing conguraon management systems for over
20 years. He builds improvements and plugins for Puppet, Mcollecve, Chef, Ansible, Docker,
and many other DevOps tools.
Jo is the author of the following books:
Learning Puppet 4 by O'Reilly
Learning MCollecve by O'Reilly
Instant Puppet 3 Starter by Packt Publishing
I'd like to thank the Puppet community for their never-ending inspiraon
and support.
Table of Contents
Preface
Chapter 1: Getting started with Puppet
Why do we need Puppet anyway?
Keeping the configuration synchronized
Repeating changes across many servers
Self-updating documentation
Version control and history
Why not just write shell scripts?
Why not just use containers?
Why not just use serverless?
Configuration management tools
What is Puppet?
Resources and attributes
Puppet architectures
Getting ready for Puppet
Installing Git and downloading the repo
Installing VirtualBox and Vagrant
Running your Vagrant VM
Troubleshooting Vagrant
Summary
Chapter 2: Creating your first manifests
Hello, Puppet – your first Puppet manifest
Understanding the code
Modifying existing files
Dry-running Puppet
How Puppet applies the manifest
Creating a file of your own
Managing packages
How Puppet applies the manifest
Exercise
Querying resources with the puppet resource
Services
Getting help on resources with puppet describe
The package-file-service pattern
Notifying a linked resource
Resource ordering with require
Summary
Chapter 3: Managing your Puppet code with Git
What is version control?
Tracking changes
Sharing code
Creating a Git repo
Making your first commit
How often should I commit?
Branching
Distributing Puppet manifests
Creating a GitHub account and project
Pushing your repo to GitHub
Cloning the repo
Fetching and applying changes automatically
Writing a manifest to set up regular Puppet runs
Applying the run-puppet manifest
The run-puppet script
Testing automatic Puppet runs
Managing multiple nodes
Summary
Chapter 4: Understanding Puppet resources
Files
The path attribute
Managing whole files
Ownership
Permissions
Directories
Trees of files
Symbolic links
Packages
Uninstalling packages
Installing specific versions
Installing the latest version
Installing Ruby gems
Installing gems in Puppet's context
Using ensure_packages
Services
The hasstatus attribute
The pattern attribute
The hasrestart and restart attributes
Users
Creating users
The user resource
The group resource
Managing SSH keys
Removing users
Cron resources
Attributes of the cron resource
Randomizing cron jobs
Removing cron jobs
Exec resources
Automating manual interaction
Attributes of the exec resource
The user attribute
The onlyif and unless attributes
The refreshonly attribute
The logoutput attribute
The timeout attribute
How not to misuse exec resources
Summary
Chapter 5: Variables, expressions, and facts
Introducing variables
Using Booleans
Interpolating variables in strings
Creating arrays
Declaring arrays of resources
Understanding hashes
Setting resource attributes from a hash
Introducing expressions
Meeting Puppet's comparison operators
Introducing regular expressions
Using conditional expressions
Making decisions with if statements
Choosing options with case statements
Finding out facts
Using the facts hash
Running the facter command
Accessing hashes of facts
Referencing facts in expressions
Using memory facts
Discovering networking facts
Providing external facts
Creating executable facts
Iterating over arrays
Using the each function
Iterating over hashes
Summary
Chapter 6: Managing data with Hiera
Why Hiera?
Data needs to be maintained
Settings depend on nodes
Operating systems differ
The Hiera way
Setting up Hiera
Adding Hiera data to your Puppet repo
Troubleshooting Hiera
Querying Hiera
Typed lookups
Types of Hiera data
Single values
Boolean values
Arrays
Hashes
Interpolation in Hiera data
Using lookup()
Using alias()
Using literal()
The hierarchy
Dealing with multiple values
Merge behaviors
Data sources based on facts
What belongs in Hiera?
Creating resources with Hiera data
Building resources from Hiera arrays
Building resources from Hiera hashes
The advantages of managing resources with Hiera data
Managing secret data
Setting up GnuPG
Adding an encrypted Hiera source
Creating an encrypted secret
How Hiera decrypts secrets
Editing or adding encrypted secrets
Distributing the decryption key
Summary
Chapter 7: Mastering modules
Using Puppet Forge modules
What is the Puppet Forge?
Finding the module you need
Using r10k
Understanding the Puppetfile
Managing dependencies with generate-puppetfile
Using modules in your manifests
Using puppetlabs/mysql
Using puppetlabs/apache
Using puppet/archive
Exploring the standard library
Safely installing packages with ensure_packages
Modifying files in place with file_line
Introducing some other useful functions
The pry debugger
Writing your own modules
Creating a repo for your module
Writing the module code
Creating and validating the module metadata
Tagging your module
Installing your module
Applying your module
More complex modules
Uploading modules to Puppet Forge
Summary
Chapter 8: Classes, roles, and profiles
Classes
The class keyword
Declaring parameters to classes
Automatic parameter lookup from Hiera data
Parameter data types
Available data types
Content type parameters
Range parameters
Flexible data types
Defined resource types
Type aliases
Managing classes with Hiera
Using include with lookup()
Common and per-node classes
Roles and profiles
Roles
Profiles
Summary
Chapter 9: Managing files with templates
What are templates?
The dynamic data problem
Puppet template syntax
Using templates in your manifests
Referencing template files
Inline templates
Template tags
Computations in templates
Conditional statements in templates
Iteration in templates
Iterating over Facter data
Iterating over structured facts
Iterating over Hiera data
Working with templates
Passing parameters to templates
Validang template syntax 160
Rendering templates on the command line 161
Legacy ERB templates 162
Summary 163
Chapter 10: Controlling containers 165
Understanding containers 166
The deployment problem 166
Opons for deployment 167
Introducing the container 167
What Docker does for containers 168
Deployment with Docker 169
Building Docker containers 169
The layered lesystem 170
Managing containers with Puppet 170
Managing Docker with Puppet 171
Installing Docker 171
Running a Docker container 172
Stopping a container 173
Running mulple instances of a container 174
Managing Docker images 174
Building images from Dockerles 175
Managing Dockerles 176
Building dynamic containers 178
Conguring containers with templates 178
Self-conguring containers 179
Persistent storage for containers 181
Host-mounted volumes 181
Docker volumes 182
Networking and orchestraon 184
Connecng containers 184
Container orchestraon 187
What is orchestraon? 187
What orchestraon tools are available? 188
Running Puppet inside containers 188
Are containers mini VMs or single processes? 189
Conguring containers with Puppet 189
Containers need Puppet too 190
Summary 190
Table of Contents
[ viii ]
Chapter 11: Orchestrang cloud resources 191
Introducing the cloud 192
Automang cloud provisioning 192
Using CloudFormaon 192
Using Terraform 193
Using Puppet 193
Seng up an Amazon AWS account 193
Creang an AWS account 194
Creang an IAM policy 194
Creang an IAM user 195
Storing your AWS credenals 197
Geng ready to use puppetlabs/aws 197
Creang a key pair 198
Installing the puppetlabs/aws module 199
Installing the AWS SDK gem 199
Creang EC2 instances with Puppet 199
Choosing an Amazon Machine Image (AMI) 200
Creang the EC2 instance 200
Accessing your EC2 instance 202
VPCs, subnets, and security groups 202
The ec2_securitygroup resource 203
The ec2_instance resource 204
Managing custom VPCs and subnets 205
Creang an instance in a custom VPC 205
The ec2_vpc resource 206
The ec2_vpc_internet_gateway resource 206
The ec2_vpc_routetable resource 207
The ec2_vpc_subnet resource 208
Other AWS resource types 209
Provisioning AWS resources from Hiera data 209
Iterang over Hiera data to create resources 210
Cleaning up unused resources 212
Summary 213
Chapter 12: Pung it all together 215
Geng the demo repo 216
Copying the repo 216
Understanding the demo repo 217
The control repo 217
Module management 217
Classes 218
Table of Contents
[ ix ]
Roles
Profiles
Users and access control
SSH configuration
Sudoers configuration
Time zone and clock synchronization
Puppet configuration
The bootstrap process
Adapting the repo for your own use
Configuring users
Adding per-node data files and role classes
Modifying the bootstrap credentials
Bootstrapping a new node
Bootstrapping a Vagrant VM
Bootstrapping physical or cloud nodes
Using other distributions and providers
Summary
The beginning
Index
Preface
There are many bad ways to write a technical book. One simply rehashes the official
documentation. Another walks the reader through a large and complex example, which
doesn't necessarily do anything useful, except show how clever the author is. Yet another
exhaustively sets out every available feature of the technology, and every possible way you
can use them, without much guidance as to which features you'll really use, or which are
best avoided.
Like you, I read a lot of technical books as part of my job. I don't need a paraphrase of the
documentation: I can read it online. I also don't want huge blocks of code for something that
I don't need to do. And I certainly don't want an uncritical exposition of every single feature.
What I do want is for the author to give me a cogent and readable explanation of how the
tool works, in enough detail that I can get started using it straight away, but not so much
detail that I get bogged down. I want to learn about features in the order in which I'm
likely to use them, and I want to be able to start building something that runs and delivers
business value from the very first chapters.
That's what you can expect from this book. Whether you're a developer, a system
administrator, or merely Puppet-curious, you're going to learn Puppet skills you can put into
practice right away. Without going into lots of theory or background detail, I'll show you
how to install packages and config files, create users, set up scheduled jobs, provision cloud
instances, build containers, and so on. Every example deals with something real and practical
that you're likely to need in your work, and you'll see the complete Puppet code to make it
happen, along with step-by-step instructions for what to type and what output you'll see. All
the examples are available in a GitHub repo for you to download and adapt.
After each exercise, I'll explain in detail what each line of code does and how it works, so that
you can adapt it to your own purposes, and feel confident that you understand everything
that's happened. By the end of the book, you will have all the skills you need to do real,
useful, everyday work with Puppet, and there's a complete demo Puppet repository you can
use to get your infrastructure up and running with minimum effort.
So let's get started.
What this book covers
Chapter 1, Geng started with Puppet, introduces Puppet and gets you up and running with
the Vagrant virtual machine that accompanies this book.
Chapter 2, Creang your rst manifests, shows you how Puppet works, and how to write
code to manage packages, les, and services.
Chapter 3, Managing your Puppet code with Git, introduces the Git version control tool,
shows you how to create a repository to store your code, and how to distribute it to your
Puppet-managed nodes.
Chapter 4, Understanding Puppet resources, goes into more detail about the package,
file, and service resources, as well as introducing resources to manage users, SSH keys,
scheduled jobs, and commands.
Chapter 5, Variables, expressions, and facts, introduces Puppet's variables, data types,
expressions, and condional statements, shows you how to get data about the node using
Facter, and how to create your own custom facts.
Chapter 6, Managing data with Hiera, explains Puppet's key-value database and how to use
it to store and retrieve data, including secrets, and how to create Puppet resources from
Hiera data.
Chapter 7, Mastering modules, teaches you how to install ready-to-use modules from the
Puppet Forge using the r10k tool, introduces you to four key modules including the standard
library, and shows you how to build your own modules.
Chapter 8, Classes, roles, and proles, introduces you to classes and dened resource types,
and shows you the best way to organize your Puppet code using roles and proles.
Chapter 9, Managing les with templates, shows you how to build complex conguraon
les with dynamic data using Puppet's EPP template mechanism.
Chapter 10, Controlling containers, introduces Puppet's powerful new support for
Docker containers, and shows you how to download, build, and run containers using
Puppet resources.
[ xiii ]
Chapter 11, Orchestrang cloud resources, explains how you can use Puppet to provision
cloud servers on Amazon AWS, and introduces a fully-automated cloud infrastructure based
on Hiera data.
Chapter 12, Pung it all together, takes you through a complete example Puppet
infrastructure that you can download and modify for your own projects, using ideas
from all the previous chapters.
What you need for this book
You'll need a reasonably modern computer system and access to the Internet. You won't
need to be a Unix expert or an experienced sysadmin; I'll assume you can install software,
run commands, and edit files, but otherwise I'll explain everything you need as we go.
Who this book is for
The main audience for this book are those who are new to Puppet, including system
administrators and developers who are looking to manage computer server systems for
configuration management. No prior programming or system administration experience is
assumed. However, if you have used Puppet before, you'll get a thorough grounding in all the
latest features and modules, and I hope you'll still find plenty of new things to learn.
In this book, you will find a number of styles of text that distinguish between different
kinds of information. Here are some examples of these styles, and an explanation of
their meaning.
Code words in text, database table names, folder names, filenames, file extensions,
pathnames, dummy URLs, user input, and Twitter handles are shown as follows:
"Puppet can manage files on a node using the file resource"
A block of code is set as follows:
file { '/tmp/hello.txt':
  ensure  => file,
  content => "hello, world\n",
}
When we wish to draw your attention to a particular part of a code block, the relevant lines
or items are set in bold:
file { '/tmp/hello.txt':
  ensure  => file,
  content => "hello, world\n",
}
Any command-line input or output is written as follows:
sudo puppet apply /vagrant/examples/file_hello.pp
Notice: Compiled catalog for ubuntu-xenial in environment production
in 0.07 seconds
New terms and important words are shown in bold. Words that you see on the screen, in
menus or dialog boxes for example, appear in the text like this: "In the AWS console, select
VPC from the Services menu".
Warnings or important notes appear in a box like this.
Tips and tricks appear like this.
Reader feedback
Feedback from our readers is always welcome. Let us know what you think about this book—
what you liked or disliked. Reader feedback is important for us as it helps us develop titles
that you will really get the most out of.
To send us general feedback, simply e-mail feedback@packtpub.com, and mention the
book's title in the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or
contributing to a book, see our author guide at www.packtpub.com/authors.
Customer support
Now that you are the proud owner of a Packt book, we have a number of things to help you
to get the most from your purchase.
Downloading the example code
You can download the example code les for this book from your account at
http://www.packtpub.com. If you purchased this book elsewhere, you can visit
http://www.packtpub.com/support and register to have the les e-mailed
directly to you.
You can download the code les by following these steps:
1. Log in or register to our website using your e-mail address and password.
2. Hover the mouse pointer on the SUPPORT tab at the top.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box.
5. Select the book for which you're looking to download the code les.
6. Choose from the drop-down menu where you purchased this book from.
7. Click on Code Download.
You can also download the code les by clicking on the Code Files buon on the book's
webpage at the Packt Publishing website. This page can be accessed by entering the book's
name in the Search box. Please note that you need to be logged in to your Packt account.
Once the le is downloaded, please make sure that you unzip or extract the folder using the
latest version of:
WinRAR / 7-Zip for Windows
Zipeg / iZip / UnRarX for Mac
7-Zip / PeaZip for Linux
The code bundle for the book is also hosted on GitHub at the following URLs:
You can use the code bundle on GitHub from the Packt Publishing repository as well:
We also have other code bundles from our rich catalog of books and videos available at
https://github.com/PacktPublishing/. Check them out!
Although we have taken every care to ensure the accuracy of our content, mistakes do
happen. If you find a mistake in one of our books (maybe a mistake in the text or the code),
we would be grateful if you could report this to us. By doing so, you can save other readers
from frustration and help us improve subsequent versions of this book. If you find any errata,
please report them by visiting http://www.packtpub.com/submit-errata, selecting
your book, clicking on the Errata Submission Form link, and entering the details of your
errata. Once your errata are verified, your submission will be accepted and the errata will
be uploaded to our website or added to any list of existing errata under the Errata section of
that title.
To view the previously submitted errata, go to https://www.packtpub.com/books/
content/support and enter the name of the book in the search field. The required
information will appear under the Errata section.
Piracy of copyrighted material on the Internet is an ongoing problem across all media.
At Packt, we take the protection of our copyright and licenses very seriously. If you come
across any illegal copies of our works in any form on the Internet, please provide us with the
location address or website name immediately so that we can pursue a remedy.
Please contact us at copyright@packtpub.com with a link to the suspected pirated material.
We appreciate your help in protecting our authors and our ability to bring you valuable content.
If you have a problem with any aspect of this book, you can contact us at
questions@packtpub.com, and we will do our best to address the problem.
Getting started with Puppet
For a list of all the ways technology has failed to improve the quality of life,
please press three.
—Alice Kahn
In this chapter, you'll learn about some of the challenges of managing configuration on
servers, some common solutions to these problems, and how automation tools such as
Puppet can help. You'll also learn how to download the GitHub repository containing all of
the source code and examples in this book, how to set up your own Vagrant virtual machine
to run the code, and how to download and install Puppet.
Whether you're a system administrator, a developer who needs to wrangle servers from time
to time, or just someone who's annoyed at how long it takes to deploy a new app, you'll have
come across the kind of problems Puppet is designed to solve.
Why do we need Puppet anyway?
Managing applicaons and services in producon is hard work, and there are a lot of steps
involved. To start with, you need some servers to serve the services. Luckily, these are readily
available from your local cloud provider, at low, low prices. So you've got a server, with a
base operang system installed on it, and you can log into it. So now what? Before you can
deploy, you need to do a number of things:
Add user accounts and passwords
Congure security sengs and privileges
Install all the packages needed to run the app
Customize the conguraon les for each of these packages
Create databases and database user accounts; load some inial data
Congure the services that should be running
Deploy the app code and stac assets
Restart any aected services
Congure the machine for monitoring
That's a lot to do—and for the next server you build, you'll need to do the exact same things
all over again. There's something not right about that. Shouldn't there be an easier soluon
to this problem?
Wouldn't it be nice if you could write an executable specicaon of how the server should
be set up, and you could apply it to as many machines as you liked?
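As a rough illustration of what such an executable specification might look like, here is a minimal Puppet sketch covering a few of the tasks listed above. The user name, package, file path, and file content are hypothetical, chosen only for the example; a real server would need its own values.

```puppet
# Hypothetical example: every name, path, and value here is an assumption.

# Add a user account
user { 'deploy':
  ensure     => present,
  managehome => true,
  shell      => '/bin/bash',
}

# Install a package needed to run the app
package { 'nginx':
  ensure => installed,
}

# Customize a configuration file for that package
file { '/etc/nginx/conf.d/app.conf':
  ensure  => file,
  content => "# app settings go here\n",
  require => Package['nginx'],
  notify  => Service['nginx'],
}

# Keep the service running, and restart it when the config file changes
service { 'nginx':
  ensure => running,
  enable => true,
}
```

Applying the same manifest on each new server with puppet apply should bring every one of them to the same described state, rather than relying on someone repeating the steps by hand.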
Keeping the conguration synchronized
Seng up servers manually is tedious. Even if you're the kind of person who enjoys tedium,
though, there's another problem to consider. What happens the next me you set up a
server, a few weeks or months later?
Your careful notes will no longer be up to date with reality. While you were on vacation, the
developers installed a couple of new libraries that the app now depends on. I guess they
forgot to tell you! They are under a lot of schedule pressure, of course. You could send out
a sternly worded email demanding that people update the build document whenever they
change something, and people might even comply with that. But even if they do update the
documentation, no-one actually tests the new build process from scratch, so when you come
to do it, you'll find it doesn't work anymore. Turns out that if you just upgrade the database
in place, it's fine, but if you install the new version on a bare server, it's not.
Also, since the build document was updated, a new version of a critical library was released
upstream. Because you always install the latest version as part of the build, your new server
is now subtly different to the old one. This will lead to subtle problems which will take you
three days, or three bottles of whiskey, to debug.
By the me you have four or ve servers, they're all a lile dierent. Which is the
authoritave one? Or are they all slightly wrong? The longer they're around, the more they
will dri apart. You wouldn't run four or ve dierent versions of your app code at once, so
what's up with that? Why is it acceptable for server conguraon to be in a mess like this?
Wouldn't it be nice if the state of conguraon on all your machines could be regularly
checked and synchronized with a central, standard version?
Repeating changes across many servers
Humans just aren't good at accurately repeang complex tasks over and over; that's why we
invented robots. It's easy to make mistakes, miss things out, or be interrupted and lose track
of what you've done.
Changes happen all the me, and it becomes increasingly dicult to keep things up to date
and in sync as your infrastructure grows. Again, when you make a change to your app code,
you don't go and make that change manually with a text editor on each server. You change
it once and roll it out everywhere. Isn't your rewall setup just as much part of your code as
your user model?
Wouldn't it be nice if you only had to make changes in one place, and they rolled out to
your whole network automacally?
Self-updating documentation
In real life, we're too busy to stop every five minutes and document what we just did.
As we've seen, that documentation is of limited use anyway, even if it's kept fanatically
up to date.
The only reliable documentation, in fact, is the state of the servers themselves. You can look
at a server to see how it's configured, but that only applies while you still have the machine.
If something goes wrong and you can't access the machine, or the data on it, your only
option is to reconstruct the lost configuration from scratch.
Wouldn't it be nice if you had a clear, human-readable build procedure which was
independent of your servers, and was guaranteed to be up to date, because the servers
are actually built from it?
Version control and history
When you're making manual, ad hoc changes to systems, you can't roll them back to a point
in time. It's hard to undo a whole series of changes; you don't have a way of keeping track of
what you did and how things changed.
This is bad enough when there's just one of you. When you're working in a team, it gets even
worse, with everybody making independent changes and getting in each other's way.
When you have a problem, you need a way to know what changed and when, and who did it.
And you also need to be able to set your configuration back to any previously stable state.
Wouldn't it be nice if you could go back in time?
Why not just write shell scripts?
Many people manage configuration with shell scripts, which is better than doing it manually,
but not much. Some of the problems with shell scripts include the following:
Fragile and non-portable
Hard to maintain
Not easy to read as documentation
Very site-specific
Not a good programming language
Hard to apply changes to existing servers
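To make the contrast concrete, here is a small sketch of the same task expressed both ways. The directory path, owner, and mode are assumptions made up for the example. The shell version (shown in comments) has to spell out each step, while the Puppet version only describes the end state and leaves it to Puppet to work out what, if anything, needs changing on a given server.

```puppet
# A shell script spells out the steps, and re-runs them every time:
#   mkdir -p /opt/myapp
#   chown root:root /opt/myapp
#   chmod 0755 /opt/myapp
#
# In Puppet you declare the desired state instead. The path, owner, and
# mode below are hypothetical values for illustration.
file { '/opt/myapp':
  ensure => directory,
  owner  => 'root',
  group  => 'root',
  mode   => '0755',
}
```

Because the resource describes a state rather than a sequence of commands, the same declaration can be applied to a freshly built server or to one that has been running for years.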
Why not just use containers?
Containers! Is there any word more thrilling to the human soul? Many people feel as though
containers are going to make configuration management problems just go away. This feeling
rarely lasts beyond the first few hours of trying to containerize an app. Yes, containers
make it easy to deploy and manage software, but where do containers come from? It
turns out someone has to build and maintain them, and that means managing Dockerfiles,
volumes, networks, clusters, image repositories, dependencies, and so on. In other words,
configuration. There is an axiom of computer science which I just invented, called The Law
of Conservation of Pain. If you save yourself pain in one place, it pops up again in another.
Whatever cool new technology comes along, it won't solve all our problems; at best, it will
replace them with refreshingly different problems.
Yes, containers are great, but the truth is, container-based systems require even more
configuration management. You need to configure the nodes that run the containers,
build and update the container images based on a central policy, create and maintain the
container network and clusters, and so on.
Why not just use serverless?
If containers are powered by magic pixies, serverless architectures are pure fairy dust. The
promise is that you just push your app to the cloud, and the cloud takes care of deploying,
scaling, load balancing, monitoring, and so forth. Like most things, the reality doesn't quite
live up to the marketing. Unfortunately, serverless isn't actually serverless: it just means your
business is running on servers you don't have direct control over, plus, you have higher fixed
costs because you're paying someone else to run them for you. Serverless can be a good way
to get started, but it's not a long-term solution, because ultimately, you need to own your
own configuration.
Configuration management tools
Configuration management (CM) tools are the modern, sensible way to manage
infrastructure as code. There are many such tools available, all of which operate more or
less the same way: you specify your desired configuration state, using editable text files and
a model of the system's resources, and the tool compares the current state of each node
(the term we use for configuration-managed servers) with your desired state and makes any
changes necessary to bring it in line.
As with most unimportant things, there is a great deal of discussion and argument on
the Internet about which CM tool is the best. While there are significant differences in
approaches and capabilities between different tools, don't let that obscure the fact that
using a tool of any sort to manage configuration is much better than trying to do it by hand.
That said, while there are many CM tools available, Puppet is an excellent choice. No other
tool is more powerful, more portable, or more widely adopted. In this book, I'm going to
show you what makes Puppet so good and the things that only Puppet can do.
What is Puppet?
Puppet is two things: a language for expressing the desired state (how your nodes should be
configured), and an engine that interprets code written in the Puppet language and applies it
to the nodes to bring about the desired state.
What does this language look like? It's not exactly a series of instructions, like a shell script or
a Ruby program. It's more like a set of declarations about the way things should be. Have a
look at the following example:
package { 'curl':
  ensure => installed,
}
In English, this code says, "The curl package should be installed." When you apply this
manifest (Puppet programs are called manifests), the tool will do the following:
1. Check the list of installed packages on the node to see if curl is already installed.
2. If it is, do nothing.
3. If not, install it.
Here's another example of Puppet code:
user { 'bridget':
  ensure => present,
}
This is Puppet language for the declaration, "The bridget user should be present."
(The keyword ensure means "the desired state of the resource is..."). Again, this results
in Puppet checking for the existence of the bridget user on the node, and creating it if
necessary. This is also a kind of documentation that expresses human-readable statements
about the system in a formal way. The code expresses the author's desire that Bridget should
always be present.
So you can see that the Puppet program (the Puppet manifest) for your configuration is a
set of declarations about what things should exist, and how they should be configured.
You don't give commands, like "Do this, then do that". Rather, you describe how things
should be, and let Puppet take care of making it happen. These are two quite different
kinds of programming. One kind (so-called procedural style) is the traditional model used by
languages such as C, Python, shell, and so on. Puppet's is called declarative style because you
declare what the end result should be, rather than specify the steps to get there.
This means that you can apply the same Puppet manifest repeatedly to a node and the end
result will be the same, no matter how many times you apply the manifest. It's better to
think of Puppet manifests as a kind of specification, or declaration, rather than as a program
in the traditional sense.
Resources and attributes
Puppet lets you describe configuration in terms of resources (types of things that can exist,
such as users, files, or packages) and their attributes (appropriate properties for the type of
resource, such as the home directory for a user, or the owner and permissions for a file). You
don't have to get into the details of how resources are created and configured on different
platforms. Puppet takes care of it.
The power of this approach is that a given manifest can be applied to different nodes, all
running different operating systems, and the results will be the same everywhere.
Puppet architectures
It's worth noting that there are two different ways to use Puppet. The first way, known as
agent/master architecture, uses a special node dedicated to running Puppet, which all other
nodes contact to get their configuration.
The other way, known as stand-alone Puppet or masterless, does not need a special Puppet
master node. Puppet runs on each individual node and does not need to contact a central
location to get its configuration. Instead, you use Git, or any other way of copying files to the
node, such as SFTP or rsync, to update the Puppet manifests on each node.
Both stand-alone and agent/master architectures are officially supported by Puppet. It's
your choice which one you prefer to use. In this book, I will cover only the stand-alone
architecture, which is simpler and easier for most organizations, but almost everything in the
book will work just the same whether you use agent/master or stand-alone Puppet.
To set up Puppet with an agent/master architecture, consult the
official Puppet documentation.
Getting ready for Puppet
Although Puppet is inherently cross-platform and works with many different operating
systems, for the purposes of this book, I'm going to focus on just one operating system,
namely the Ubuntu 16.04 LTS distribution of Linux, and the most recent version of Puppet,
Puppet 5. However, all the examples in the book should work on any recent operating system
or Puppet version with only minor changes.
You will probably find that the best way to read this book is to follow along with the
examples using a Linux machine of your own. It doesn't matter whether this is a physical
server, desktop or laptop, cloud instance, or a virtual machine. I'm going to use the popular
Vagrant software to run a virtual machine on my own computer, and you can do the same.
The public GitHub repository for this book contains a Vagrantfile, which you can use to get up
and running with Puppet in just a few steps.
Installing Git and downloading the repo
To get a copy of the repo that accompanies this book, follow these steps:
1. Browse to https://git-scm.com/downloads
2. Download and install the right version of Git for your operating system.
3. Run the following command:
git clone https://github.com/bitfield/puppet-beginners-guide-3.git
Geng started with Puppet
[ 8 ]
Installing VirtualBox and Vagrant
If you already have a Linux machine or cloud server you'd like to use for working through the
examples, skip this section and move on to the next chapter. If you'd like to use VirtualBox
and Vagrant to run a local virtual machine (VM) on your computer to use with the examples,
follow these instructions:
1. Browse to https://www.virtualbox.org/
2. Download and install the right version of VirtualBox for your operating system
3. Browse to https://www.vagrantup.com/downloads.html
4. Select the right version of Vagrant for your operating system: OS X, Windows,
and so on
5. Follow the instructions to install the software
Running your Vagrant VM
Once you have installed Vagrant, you can start the Puppet Beginner's Guide virtual machine:
1. Run the following commands:
cd puppet-beginners-guide-3
scripts/start_vagrant.sh
Vagrant will begin downloading the base box. Once that has booted, it will install
Puppet. This may take a while, but once the installation is complete, the virtual
machine will be ready to use.
2. Connect to the VM with the following command:
vagrant ssh
3. You now have a command-line shell on the VM. Check that Puppet is installed
and working by running the following command (you may get a different version
number, which is fine):
puppet --version
If you're using Windows, you may need to install the PuTTY software to
connect to your VM. There is some helpful advice about using Vagrant on
Windows at:
Troubleshooting Vagrant
If you have any problems running the VM, look for help on the VirtualBox or Vagrant
websites. In particular, if you have an older machine, you may see a message like the
following:
VT-x/AMD-V hardware acceleration is not available on your system. Your
64-bit guest will fail to detect a 64-bit CPU and will not be able to
Your computer may have a BIOS setting to enable 64-bit hardware virtualization (depending
on the manufacturer, the trade name for this is either VT-x or AMD-V). Enabling this feature
may fix the problem. If not, you can try the 32-bit version of the Vagrant box instead. Edit the
file named Vagrantfile in the Git repository, and comment out the following line with a
leading # character:
config.vm.box = "ubuntu/xenial64"
Uncomment the following line by removing the leading # character:
# config.vm.box = "ubuntu/xenial32"
Now re-run the scripts/start_vagrant.sh command.
In this chapter, we looked at the various problems that configuration management tools
can help solve, and how Puppet in particular models the aspects of system configuration.
We checked out the Git repository of example code for this book, installed VirtualBox and
Vagrant, started the Vagrant VM, and ran Puppet for the first time.
In the next chapter, we'll write our first Puppet manifests, get some insight into the structure
of Puppet resources and how they're applied, and learn about the package, file, and
service resources.
Creating your first manifests
Beginnings are such delicate times.
—Frank Herbert, 'Dune'
In this chapter, you'll learn how to write your first manifest with Puppet, and how to put
Puppet to work configuring a server. You'll also understand how Puppet compiles and applies
a manifest. You'll see how to use Puppet to manage the contents of files, how to install
packages, and how to control services.
Creang your rst manifests
[ 12 ]
Hello, Puppet – your first Puppet manifest
The first example program in any programming language, by tradition, prints hello,
world. Although we can do that easily in Puppet, let's do something a little more ambitious,
and have Puppet create a file on the server containing that text.
On your Vagrant box, run the following command:
sudo puppet apply /examples/file_hello.pp
Notice: Compiled catalog for ubuntu-xenial in environment production
in 0.07 seconds
Notice: /Stage[main]/Main/File[/tmp/hello.txt]/ensure: defined content
as '{md5}22c3683b094136c3398391ae71b20f04'
Notice: Applied catalog in 0.01 seconds
We can ignore the output from Puppet for the moment, but if all has gone well, we should
be able to run the following command:
cat /tmp/hello.txt
hello, world
Understanding the code
Let's look at the example code to see what's going on (run cat /examples/file_hello.pp,
or open the file in a text editor):
file { '/tmp/hello.txt':
  ensure  => file,
  content => "hello, world\n",
}
The code term file begins a resource declaration for a file resource. A resource is some
bit of configuration that you want Puppet to manage: for example, a file, user account, or
package. A resource declaration follows this pattern:
RESOURCE_TYPE { TITLE:
  ATTRIBUTE => VALUE,
  ...
}
Resource declaraons will make up almost all of your Puppet manifests, so it's important to
understand exactly how they work:
RESOURCE_TYPE indicates the type of resource you're declaring; in this case, it's a
file.
TITLE is the name that Puppet uses to identify the resource internally. Every
resource must have a unique title. With file resources, it's usual for this to be the
full path to the file: in this case, /tmp/hello.txt.
The remainder of this block of code is a list of attributes that describe how the resource
should be configured. The attributes available depend on the type of the resource. For a file,
you can set attributes such as content, owner, group, and mode, but one attribute that
almost every resource supports is ensure.
Again, the possible values for ensure are specific to the type of resource. In this case, we
use file to indicate that we want a regular file, as opposed to a directory or symlink:
ensure => file,
Next, to put some text in the file, we specify the content attribute:
content => "hello, world\n",
The content attribute sets the contents of a file to a string value you provide. Here, the
contents of the file are declared to be hello, world, followed by a newline character (in
double-quoted Puppet strings, we write the newline character as \n).
Note that content specifies the entire content of the file; the string you provide will replace
anything already in the file, rather than be appended to it.
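Since ensure => file asks for a regular file "as opposed to a directory or symlink", it may help to see what the other two cases look like. This is only a sketch: the paths /tmp/hello_dir and /tmp/hello_link are names I've invented for illustration, and they are not part of the example repo.

```puppet
# A directory instead of a regular file (hypothetical path).
file { '/tmp/hello_dir':
  ensure => directory,
}

# A symbolic link pointing at the file from the earlier example (hypothetical link name).
file { '/tmp/hello_link':
  ensure => link,
  target => '/tmp/hello.txt',
}
```

The resource type is still file in both cases; only the value of ensure (and, for a link, the target attribute) changes.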
Modifying existing files
What happens if the file already exists when Puppet runs and it contains something else?
Will Puppet change it?
sudo sh -c 'echo "goodbye, world" >/tmp/hello.txt'
cat /tmp/hello.txt
goodbye, world
sudo puppet apply /examples/file_hello.pp
cat /tmp/hello.txt
hello, world
The answer is yes. If any attribute of the file, including its contents, doesn't match the
manifest, Puppet will change it so that it does.
Creang your rst manifests
[ 14 ]
This can lead to some surprising results if you manually edit a file managed by Puppet. If
you make changes to a file without also changing the Puppet manifest to match, Puppet will
overwrite the file the next time it runs, and your changes will be lost.
So it's a good idea to add a comment to files that Puppet is managing: something like the
following:
# This file is managed by Puppet - any manual edits will be lost
Add this to Puppet's copy of the file when you first deploy it, and it will remind you and
others not to make manual changes.
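One way to do this, when the file's text lives in the manifest itself, is to make the warning the first line of the content string. The following is a rough sketch based on the earlier hello.txt example, and it assumes the file's format treats lines starting with # as comments:

```puppet
file { '/tmp/hello.txt':
  ensure  => file,
  # The first line of the file is the warning; the real content follows it.
  content => "# This file is managed by Puppet - any manual edits will be lost\nhello, world\n",
}
```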
Dry-running Puppet
Because you can't necessarily tell in advance what applying a Puppet manifest will change on
the system, it's a good idea to do a dry run first. Adding the --noop flag to puppet apply
will show you what Puppet would have done, without actually changing anything:
sudo sh -c 'echo "goodbye, world" >/tmp/hello.txt'
sudo puppet apply --noop /examples/file_hello.pp
Notice: Compiled catalog for ubuntu-xenial in environment production
in 0.04 seconds
Notice: /Stage[main]/Main/File[/tmp/hello.txt]/content: current_value
{md5}7678..., should be {md5}22c3... (noop)
Puppet decides whether or not a file resource needs updating, based on its MD5 hash
sum. In the previous example, Puppet reports that the current value of the hash sum for
/tmp/hello.txt is 7678..., whereas according to the manifest, it should be 22c3....
Accordingly, the file will be changed on the next Puppet run.
If you want to see what change Puppet would actually make to the file, you can use the
--show_diff option:
sudo puppet apply --noop --show_diff /examples/file_hello.pp
Notice: Compiled catalog for ubuntu-xenial in environment production
in 0.04 seconds
Notice: /Stage[main]/Main/File[/tmp/hello.txt]/content:
--- /tmp/hello.txt 2017-02-13 02:27:13.186261355 -0800
+++ /tmp/puppet-file20170213-3671-2yynjt 2017-02-13
02:30:26.561834755 -0800
@@ -1 +1 @@
-goodbye, world
+hello, world
These opons are very useful when you want to make sure that your Puppet manifest will
aect only the things you're expecng it to—or, somemes, when you want to check if
something has been changed outside Puppet without actually undoing the change.
How Puppet applies the manifest
Here's how your manifest is processed. First, Puppet reads the manifest and the list of
resources it contains (in this case, there's just one resource), and compiles these into a
catalog (an internal representation of the desired state of the node).
Puppet then works through the catalog, applying each resource in turn:
1. First, it checks if the resource exists on the server. If not, Puppet creates it. In the
example, we've declared that the file /tmp/hello.txt should exist. The first time
you run sudo puppet apply, this won't be the case, so Puppet will create the file
for you.
2. Then, for each resource, it checks the value of each attribute in the catalog against
what actually exists on the server. In our example, there's just one attribute:
content. We've specified that the content of the file should be hello, world\n.
If the file is empty or contains something else, Puppet will overwrite the file with
what the catalog says it should contain.
In this case, the file will be empty the first time you apply the catalog, so Puppet will write
the string hello, world\n into it.
We'll go on to examine the file resource in much more detail in later chapters.
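The attribute-by-attribute check is easier to picture when a resource has more than one attribute. Here is a sketch of the same file with owner, group, and mode added. The values root and 0644 are assumptions chosen for illustration, not settings from the example repo.

```puppet
file { '/tmp/hello.txt':
  ensure  => file,
  content => "hello, world\n", # compared against the file's current contents
  owner   => 'root',           # assumed owner, checked independently
  group   => 'root',           # assumed group, checked independently
  mode    => '0644',           # assumed permissions, checked independently
}
```

If only the permissions on the server differ from the manifest, Puppet should correct just the mode and leave the contents alone.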
Creating a file of your own
Create your own manifest file (you can name it anything you like, so long as the file extension
is .pp). Use a file resource to create a file on the server with any contents you like.
Apply the manifest with Puppet and check that the file is created and contains the text you
specified.
Edit the file directly and change the contents, then re-apply Puppet and check that it changes
the file back to what the manifest says it should contain.
Creang your rst manifests
[ 16 ]
Managing packages
Another key resource type in Puppet is the package. A major part of configuring servers
by hand involves installing packages, so we will also be using packages a lot in Puppet
manifests. Although every operating system has its own package format, and different
formats vary quite a lot in their capabilities, Puppet represents all these possibilities with
a single package type. If you specify in your Puppet manifest that a given package should
be installed, Puppet will use the appropriate package manager commands to install it on
whatever platform it's running on.
As you've seen, all resource declarations in Puppet follow this form:
RESOURCE_TYPE { TITLE:
  ATTRIBUTE => VALUE,
  ...
}
package resources are no different. The RESOURCE_TYPE is package, and the only
attribute you usually need to specify is ensure, and the only value it usually needs to take is
installed:
package { 'cowsay':
  ensure => installed,
}
Try this example:
sudo puppet apply /examples/package.pp
Notice: Compiled catalog for ubuntu-xenial in environment production
in 0.52 seconds
Notice: /Stage[main]/Main/Package[cowsay]/ensure: created
Notice: Applied catalog in 29.53 seconds
Let's see whether cowsay is installed:
cowsay Puppet rules!
 _______________
< Puppet rules! >
 ---------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
Now that's a useful package!
How Puppet applies the manifest
The tle of the package resource is cowsay, so Puppet knows that we're talking about a
package named cowsay.
The ensure aribute governs the installaon state of packages: unsurprisingly, installed
tells Puppet that the package should be installed.
As we saw in the earlier example, Puppet processes this manifest by examining each
resource in turn and checking its aributes on the server against those specied in the
manifest. In this case, Puppet will look for the cowsay package to see whether it's installed.
It is not, but the manifest says it should be, so Puppet carries out all the necessary acons to
make reality match the manifest, which here means installing the package.
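Because ensure governs the installation state, values other than installed express other desired states. As a rough sketch (each resource is meant as a separate illustration, and the choice of packages is mine rather than something from the example repo):

```puppet
# Desired state: the package should NOT be on the system.
package { 'cowsay':
  ensure => absent,
}

# Desired state: the newest version the package manager can find should be installed.
package { 'curl':
  ensure => latest,
}
```

In both cases Puppet does the same thing as before: it compares the node's current state with the declared one, and acts only if they differ.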
It's sll early on in the book, but you can already do a great deal with Puppet!
If you can install packages and manage the contents of les, you can get a very
long way towards seng up any kind of server conguraon you might need. If
you were to stop reading right here (which would be a shame, but we're all busy
people), you would sll be able to use Puppet to automate a large part of the
conguraon work you will encounter. But Puppet can do much more.
Create a manifest that uses the package resource to install any soware you nd useful
for managing servers. Here are some suggesons: tmux, sysdig, atop, htop, and dstat.
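If you'd like a starting point for this exercise, here is one possible sketch. It relies on the fact that a resource title can be an array, which declares one resource per element with the same attributes. Which packages you list is up to you; I've picked three of the suggestions arbitrarily.

```puppet
# Declares three package resources at once, all with ensure => installed.
package { ['tmux', 'htop', 'dstat']:
  ensure => installed,
}
```

You could equally write three separate package declarations; the result on the node should be the same.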
Querying resources with the puppet resource
If you want to see what version of a package Puppet thinks you have installed, you can use
the puppet resource tool:
puppet resource package openssl
package { 'openssl':
  ensure => '1.0.2g-1ubuntu4.8',
}
puppet resource TYPE TITLE will output a Puppet manifest representing the current
state of the named resource on the system. If you leave out TITLE, you'll get a manifest for
all the resources of the type TYPE. For example, if you run puppet resource package,
you'll see the Puppet code for all the packages installed on the system.
Creang your rst manifests
[ 18 ]
puppet resource even has an interactive configuration feature. To use
it, run the following command:
puppet resource -e package openssl
If you run this, Puppet will generate a manifest for the current state of the
resource, and open it in an editor. If you now make changes and save it,
Puppet will apply that manifest to make changes to the system. This is a
fun little feature, but it would be rather time-consuming to do your entire
configuration this way.
The third most important Puppet resource type is the service: a long-running process that
either does some continuous kind of work, or waits for requests and then acts on them.
For example, on most systems, the sshd process runs all the time and listens for SSH login
attempts.
Puppet models services with the service resource type. The service resources look like
the following example (you can find this in service.pp in the /examples/ directory. From
now on, I'll just give the filename of each example, as they are all in the same directory):
service { 'sshd':
  ensure => running,
  enable => true,
}
The ensure parameter governs whether the service should be running or not. If its value is
running, then as you might expect, Puppet will start the service if it is not running. If you
set ensure to stopped, Puppet will stop the service if it is running.
Services may also be set to start when the system boots, using the enable parameter. If
enable is set to true, the service will start at boot. If, on the other hand, enable is set to
false, it will not. Generally speaking, unless there's a good reason not to, all services should
be set to start at boot.
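For the opposite case, a service that should be neither running now nor started at boot, the same two attributes are simply reversed. This is a sketch only; apache2 is a service name I've assumed for illustration, so substitute whatever service you actually want switched off.

```puppet
service { 'apache2': # hypothetical service name
  ensure => stopped, # stop it if it is currently running
  enable => false,   # and don't start it when the system boots
}
```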
Getting help on resources with puppet describe
If you're struggling to remember all the different attributes of all the different resources,
Puppet has a built-in help feature that will remind you. Run the following command, for
example:
puppet describe service
This will give a description of the service resource, along with a complete list of attributes
and allowed values. This works for all built-in resource types as well as many provided
by third-party modules. To see a list of all the available resource types, run the following
command:
puppet describe --list
The package-file-service pattern
It's very common for a given piece of software to require these three Puppet resource
types: the package resource installs the software, the file resource deploys one or
more configuration files required for the software, and the service resource runs
the software itself.
Here's an example using the MySQL database server (package_file_service.pp):
package { 'mysql-server':
  ensure => installed,
  notify => Service['mysql'],
}

file { '/etc/mysql/mysql.cnf':
  source => '/examples/files/mysql.cnf',
  notify => Service['mysql'],
}

service { 'mysql':
  ensure => running,
  enable => true,
}
The package resource makes sure the mysql-server package is installed.
The config file for MySQL is /etc/mysql/mysql.cnf, and we use a file resource to copy
this file from the Puppet repo so that we can control MySQL settings.
Finally, the service resource ensures that the mysql service is running.
Notifying a linked resource
You might have noticed a new attribute, called notify, in the file resource in the previous
example:
file { '/etc/mysql/mysql.cnf':
  source => '/examples/files/mysql.cnf',
  notify => Service['mysql'],
}
What does this do? Imagine you've made a change to the mysql.cnf file and applied
this change with Puppet. The updated file will be written to disk, but because the mysql
service is already running, it has no way of knowing that its config file has changed.
Therefore, your changes will not actually take effect until the service is restarted.
However, Puppet can do this for you if you specify the notify attribute on the file
resource. The value of notify is the resource to notify about the change, and what that
involves depends on the type of resource that's being notified. When it's a service, the
default action is to restart the service. (We'll find out about the other options in Chapter 4,
Understanding Puppet resources.)
Usually, with the package-file-service pattern, the file notifies the service, so whenever
Puppet changes the contents of the file, it will restart the notified service to pick up the
new configuration. If there are several files that affect the service, they should all notify
the service, and Puppet is smart enough to only restart the service once, however many
dependent resources are changed.
The name of the resource to notify is specified as the resource type, capitalized, followed by
the resource title, which is quoted and within square brackets: Service['mysql'].
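As an illustration of several files notifying one service, here is a sketch with a second config file added to the MySQL example. The second file's path and its source are hypothetical names I've made up; only mysql.cnf comes from the example repo.

```puppet
file { '/etc/mysql/mysql.cnf':
  source => '/examples/files/mysql.cnf',
  notify => Service['mysql'],
}

# Hypothetical second config file; both the path and the source are assumptions.
file { '/etc/mysql/conf.d/custom.cnf':
  source => '/examples/files/custom.cnf',
  notify => Service['mysql'],
}

service { 'mysql':
  ensure => running,
  enable => true,
}
```

If both files change in the same Puppet run, the mysql service should still be restarted only once.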
Resource ordering with require
In the package-file-service example, we declared three resources: the mysql-server
package, the /etc/mysql/mysql.cnf file, and the mysql service. If you think about it,
they need to be applied in that order. Without the mysql-server package installed, there
will be no /etc/mysql/ directory to put the mysql.cnf file in. Without the package or the
config file, the mysql service won't be able to run.
A perfectly reasonable question to ask is, "Does Puppet apply resources in the same order
in which they're declared in the manifest?" The answer is usually yes, unless you explicitly
specify a different order, using the require attribute.
All resources support the require attribute, and its value is the name of another resource
declared somewhere in the manifest, specified in the same way as when using notify.
Here's the package-file-service example again, this time with the resource ordering specified
explicitly using require (package_file_service_require.pp):
package { 'mysql-server':
  ensure => installed,
}

file { '/etc/mysql/mysql.cnf':
  source  => '/examples/files/mysql.cnf',
  notify  => Service['mysql'],
  require => Package['mysql-server'],
}

service { 'mysql':
  ensure  => running,
  enable  => true,
  require => [Package['mysql-server'], File['/etc/mysql/mysql.cnf']],
}
You can see that the mysql.cnf resource requires the mysql-server package. The mysql
service requires both the other resources, listed as an array within square brackets.
When resources are already in the right order, you don't need to use require, as Puppet
will apply the resources in the order you declare them. However, it can be useful to specify
an ordering explicitly, for the benefit of those reading the code, especially when there are
lots of resources in a manifest file.
In older versions of Puppet, resources were applied in a more or less arbitrary order, so it
was much more important to express dependencies using require. Nowadays, you won't
need to use it very much, and you'll mostly come across it in legacy code.
In this chapter, we've seen how a manifest is made up of Puppet resources. You've learned
how to use Puppet's file resource to create and modify files, how to install packages
using the package resource, and how to manage services with the service resource.
We've looked at the common package-file-service pattern and seen how to use the
notify attribute on a resource to send a message to another resource indicating that its
configuration has been updated. We've covered the use of the require attribute to make
dependencies between resources explicit, when necessary.
You've also learned to use puppet resource to inspect the current state of the system
according to Puppet, and puppet describe to get command-line help on all Puppet
resources. To check what Puppet would change on the system without actually changing it,
we've introduced the --noop and --show_diff options to puppet apply.
In the next chapter, we'll see how to use the version control tool Git to keep track of your
manifests, we'll get an introduction to fundamental Git concepts, such as the repo and the
commit, and you'll learn how to distribute your code to each of the servers you're going to
manage with Puppet.
Managing your Puppet code with Git
We define ourselves by our actions. With each decision, we tell ourselves and
the world who we are.
—Bill Watterson
In this chapter, you'll learn how to use the Git version control system to manage your Puppet
manifests. I'll also show you how to use Git to distribute the manifests to multiple nodes, so
that you can start managing your whole network with Puppet.
What is version control?
If you're already familiar with Git, you can save some reading by skipping ahead to the
Creating a Git repo section. If not, here's a gentle introduction.
Even if you're the only person who works on a piece of source code (for example, Puppet
manifests), it's still useful to be able to see what changes you made, and when. For example,
you might realize that you introduced a bug at some point in the past, and you need to
examine exactly when a certain file was modified and exactly what the change was. A version
control system lets you do that, by keeping a complete history of the changes you've made
to a set of files over time.
Tracking changes
When you're working on code with others, you also need a way to communicate with the
rest of the team about your changes. A version control tool such as Git not only tracks
everyone's changes, but lets you record a commit message, explaining what you did and
why. The following example illustrates some aspects of a good commit message:
Summarize changes in around 50 characters or less

More detailed explanatory text, if necessary. Wrap it to about 72
characters or so. In some contexts, the first line is treated as
the subject of the commit and the rest of the text as the body.
The blank line separating the summary from the body is critical
(unless you omit the body entirely); various tools like `log`,
`shortlog`, and `rebase` can get confused if you run the two together.

Explain the problem that this commit is solving. Focus on why you
are making this change as opposed to how (the code explains that).
Are there side effects or other unintuitive consequences of this
change? Here's the place to explain them.

Further paragraphs come after blank lines.

 - Bullet points are okay, too

 - Typically a hyphen or asterisk is used for the bullet, preceded
   by a single space, with blank lines in between, but conventions
   vary here

If you use an issue tracker, put references to them at the bottom,
like this:

Resolves: #123
See also: #456, #789
This example is taken from Chris Beams' excellent blog post, How to
Write a Git Commit Message.
Of course, you won't often need such a long and detailed message;
most of the time, a single line will suffice. However, it's better to give
more information than less.
Git also records when the change happened, who made it, what files were changed, added,
or deleted, and which lines were added, altered, or removed. As you can imagine, if you're
trying to track down a bug, and you can see a complete history of changes to the code, that's
a big help. It also means you can, if necessary, roll back the state of the code to any point in
history and examine it.
You might think this introduces a lot of extra complication. In fact, it's very simple. Git keeps
out of your way until you need it, and all you have to do is write a commit message when
you decide to record changes to the code.
Sharing code
A set of les under Git version control is called a repository, which is usually equivalent to a
project. A Git repository (from now on, just repo) is also a great way to distribute your code
to others, whether privately or publicly, so that they can use it, modify it, contribute changes
back to you, or develop it in a dierent direcon for their own requirements. The public
GitHub repo for this book which we looked at in Chapter 1, Geng started with Puppet
is a good example of this. You'll be able to use this repo for working through examples
throughout the book, but you can also use it for help and inspiraon when building Puppet
manifests for your own infrastructure.
Because Git is so important for managing Puppet code, it's a good idea to get familiar with
it, and the only way to do that is to use it for real. So let's start a new Git repo we can use to
experiment with.
Creating a Git repo
It's very easy to create a Git repo. Follow these steps:
1. Make a directory to hold your versioned files using the following command:
mkdir puppet
2. Now run the following commands to turn the directory into a Git repo:
cd puppet
git init
Initialized empty Git repository in /home/ubuntu/puppet/.git/
Making your rst commit
You can change the les in your repo as much as you like, but Git will not know about the
changes unl you make what's called a commit. You can think of a commit as being like
a snapshot of the repo at a parcular moment, but it also stores informaon about what
changed in the repo since the previous commit. Commits are stored forever, so you will
always be able to roll back the repo to the state it was in at a certain commit, or show
what les were changed in a past commit and compare them to the state of the repo
at any other commit.
Let's make our rst commit to the new repo:
1. Because Git records not only changes to the code, but also who made them, it needs
to know who you are. Set your identification details for Git (use your own name and
email address, unless you particularly prefer mine) using the following commands:
git config --global user.name "John Arundel"
git config --global user.email john@bitfieldconsulting.com
2. It's tradional for Git repos to have a README le, which explains what's in the
repo and how to use it. For the moment, let's just create this le with a placeholder
echo "Watch this space... coming soon!" >README.md
3. Run the following command:
git status
On branch master
Initial commit
Untracked files:
(use "git add <file>..." to include in what will be committed)
README.md
nothing added to commit but untracked files present (use "git add"
to track)
4. Because we've added a new file to the repo, changes to it won't be tracked by Git
unless we explicitly tell it to. We do this by using the git add command, as follows:
git add README.md
5. Git now knows about this file, and changes to it will be included in the next commit.
We can check this by running git status again:
git status
On branch master
Initial commit
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: README.md
6. The le is listed under Changes to be committed, so we can now actually make
the commit:
git commit -m 'Add README file'
[master (root-commit) ee21595] Add README file
1 file changed, 1 insertion(+)
create mode 100644 README.md
7. You can always see the complete history of commits in a repo by using the git log
command. Try it now to see the commit you just made:
git log
commit ee215951199158ef28dd78197d8fa9ff078b3579
Author: John Arundel <john@bitfieldconsulting.com>
Date: Tue Aug 30 05:59:42 2016 -0700
Add README file
How often should I commit?
A common pracce is to commit when the code is in a consistent, working state, and
have the commit include a set of related changes made for some parcular purpose. So,
for example, if you are working to x bug number 75 in your issue-tracking system, you
might make changes to quite a few separate les and then, once you're happy the work is
complete, make a single commit with a message such as:
Make nginx restart more reliable (fixes issue #75)
On the other hand, if you are making a large number of complicated changes and you are not
sure when you'll be done, it might be wise to make a few separate commits along the way,
so that if necessary you can roll the code back to a previous state. Commits cost nothing, so
when you feel a commit is needed, go ahead and make it.
Branching
Git has a powerful feature called branching, which lets you create a parallel copy of the code
(a branch) and make changes to it independently. At any time, you can choose to merge
those changes back into the master branch. Or, if changes have been made to the master
branch in the meantime, you can incorporate those into your working branch and carry on.
This is extremely useful when working with Puppet, because it means you can switch a single
node to your branch while you're testing it and working on it. The changes you make won't
be visible to other nodes which aren't on your branch, so there's no danger of accidentally
rolling out changes before you're ready.
Once you're done, you can merge your changes back into master and have them roll out to
all nodes.
Similarly, two or more people can work independently on their own branches, exchanging
individual commits with each other and with the master branch as they choose. This is a very
flexible and useful way of working.
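The branch-and-merge workflow described above can be tried out safely in a throwaway repo. The following is a minimal sketch rather than a recommended layout: the branch name testing, the user details, and the temporary directory are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: everything happens in a temporary directory.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

# Remember the default branch name (master in this book's examples).
base=$(git symbolic-ref --short HEAD)

# First commit on the default branch.
echo "first" > README.md
git add README.md
git commit -q -m 'Add README file'

# Create a working branch and commit a change there.
git checkout -q -b testing
echo "second" >> README.md
git commit -q -a -m 'Expand README'

# Switch back and merge the working branch.
git checkout -q "$base"
git merge -q testing
```

Because the default branch did not change while the testing branch was in use, Git simply moves it forward to the same commit (a fast-forward merge).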
For more informaon about Git branching, and indeed about Git in
general, I recommend the excellent book 'Pro Git', by Sco Chacon and
Ben Straub, published by Apress. The whole book is available for free at:
Distributing Puppet manifests
So far in this book we've only applied Puppet manifests to one node, using puppet apply
with a local copy of the manifest. To manage several nodes at once, we need to distribute
the Puppet manifests to each node so that they can be applied.
There are several ways to do this, and as we saw in Chapter 1, Getting started with Puppet,
one approach is to use the agent/master architecture, where a central Puppet master server
compiles your manifests and distributes the catalog (the desired node state) to all nodes.
Another way to use Puppet is to do without the master server altogether, and use Git
to distribute manifests to client nodes, which then run puppet apply to update their
configuration. This stand-alone Puppet architecture doesn't require a dedicated Puppet
master server, and there's no single point of failure.
Both agent/master and stand-alone architectures are officially supported by Puppet, and
it's possible to change from one to the other if you decide you need to. The examples in
this book were developed with the stand-alone architecture, but will work just as well with
agent/master if you prefer it. There is no difference in the Puppet manifests, language, or
structure; the only difference is in the way the manifests are applied.
All you need for a stand-alone Puppet architecture is a Git server which each node can
connect to in order to clone the repo. You can run your own Git server if you like, or use a public
Git hosting service such as GitHub. For ease of explanation, I'm going to use GitHub for this
example setup.
In the following secons, we'll create a GitHub account, push our new Puppet repo to
GitHub, and then set up our virtual machine to automacally pull any changes from the
GitHub repo and apply them with Puppet.
Creating a GitHub account and project
If you already have a GitHub account, or you're using another Git server, you can skip this
section.
1. Browse to https://github.com/
2. Enter the username you want to use, your email address, and a password.
3. Choose the Unlimited public repositories for free plan.
4. GitHub will send you an email to verify your email address. When you get the email,
click on the vericaon link.
5. Select Start a project.
6. Enter a name for your repo (I suggest puppet, but it doesn't matter).
7. Free GitHub accounts can only create public repos, so select Public.
Be careful what information you put into a public Git repo, because
it can be read by anybody. Never put passwords, login credentials,
private keys, or other confidential information into a repo like this
unless it is encrypted. We'll see how to encrypt secret information
in your Puppet repo in Chapter 6, Managing data with Hiera.
8. Click Create repository.
9. GitHub will show you a page of instructions about how to initialize or import code
into your new repository. Look for the https URL which identifies your repo; it will
be something like https://github.com/pbgtest/puppet.git.
Pushing your repo to GitHub
You're now ready to take the Git repo you created locally earlier in this chapter and push it to
GitHub so that you can share it with other nodes.
1. In your repo directory, run the following commands. After git remote add
origin, specify the URL to your GitHub repo:
git remote add origin YOUR_REPO_URL
git push -u origin master
2. GitHub will prompt you for your username and password:
Username for 'https://github.com': pbgtest
Password for 'https://pbgtest@github.com':
Counting objects: 3, done.
Writing objects: 100% (3/3), 262 bytes | 0 bytes/s, done.
Total 3 (delta 0), reused 0 (delta 0)
To https://github.com/pbgtest/puppet.git
* [new branch] master -> master
Branch master set up to track remote branch master from origin.
3. You can check that everything has worked properly by visiting the repo URL in your
browser.
Cloning the repo
In order to manage multiple nodes with Puppet, you will need a copy of the repo on each
node. If you have a node you'd like to manage with Puppet, you can use it in this example.
Otherwise, use the Vagrant box we've been working with in previous chapters.
Run the following commands (replace the argument to git clone with the URL of your
own GitHub repo, but don't lose the production at the end):
cd /etc/puppetlabs/code/environments
sudo mv production production.sample
sudo git clone YOUR_REPO_URL production
Cloning into 'production'...
remote: Counting objects: 3, done.
remote: Total 3 (delta 0), reused 3 (delta 0), pack-reused 0
Unpacking objects: 100% (3/3), done.
Checking connectivity... done.
How does this work? The standard place for Puppet manifests in a production environment
is the /etc/puppetlabs/code/environments/production/ directory, so that's
where our cloned repo needs to end up. However, the Puppet package installs some sample
manifests in that directory, and Git will refuse to clone into an existing directory that isn't
empty, so we move that directory out of the way with the mv production production.sample
command. The git clone command then recreates that directory, but this time it contains
our manifests from the repo.
Fetching and applying changes automatically
In a stand-alone Puppet architecture, each node needs to automatically fetch any changes
from the Git repo at regular intervals, and apply them with Puppet. We can use a simple shell
script for this, and there's one in the example repo (/examples/files/run-puppet.sh):
#!/bin/bash
cd /etc/puppetlabs/code/environments/production && git pull
/opt/puppetlabs/bin/puppet apply manifests/
We will need to install this script on the node to be managed by Puppet, and create a
cron job to run it regularly (I suggest every 15 minutes). Of course, we could do this work
manually, but isn't this book partly about the advantages of automation? Very well, then:
let's practice what we're preaching.
Writing a manifest to set up regular Puppet runs
In this secon, we'll create the necessary Puppet manifests to install the run-puppet script
on a node and run it regularly from cron:
1. Run the following commands to create the required directories in your Puppet repo:
cd /home/ubuntu/puppet
mkdir manifests files
2. Run the following command to copy the run-puppet script from the examples/
directory:
cp /examples/files/run-puppet.sh files/
3. Run the following command to copy the run-puppet manifest from the
examples/ directory:
cp /examples/run-puppet.pp manifests/
4. Add and commit the files to Git with the following commands:
git add manifests files
git commit -m 'Add run-puppet script and cron job'
git push origin master
Your Git repo now contains everything you need to automatically pull and apply changes on
your managed nodes. In the next section, we'll see how to set up this process on a node.
You might have noced that every me you push les to your GitHub repo, Git
prompts you for your username and password. If you want to avoid this, you can
associate an SSH key with your GitHub account. Once you've done this, you'll be
able to push without having to re-enter your credenals every me. For more
informaon about using an SSH key with your GitHub account see this arcle:
Applying the run-puppet manifest
Having created and pushed the manifest necessary to set up automatic Puppet runs, we now
need to pull and apply it on the target node.
In the cloned copy of your repo in /etc/puppetlabs/code/environments/
production, run the following commands:
sudo git pull
sudo puppet apply manifests/
Notice: Compiled catalog for localhost in environment production in
0.08 seconds
Notice: /Stage[main]/Main/File[/usr/local/bin/run-puppet]/ensure:
defined content as '{md5}83a6903e69564bcecc8fd1a83b1a7beb'
Notice: /Stage[main]/Main/Cron[run-puppet]/ensure: created
Notice: Applied catalog in 0.07 seconds
You can see from Puppet's output that it has created the /usr/local/bin/run-puppet
script and the run-puppet cron job. This will now run automatically every 15 minutes, pull
any new changes from the Git repo, and apply the updated manifest.
The run-puppet script
The run-puppet script does the following two things in order to automatically update the
target node:
1. Pull any changes from the Git server (git pull).
2. Apply the manifest (puppet apply).
Our Puppet manifest in run-puppet.pp deploys this script to the target node, using a file
resource, and then sets up a cron job to run it every 15 minutes, using a cron resource.
We haven't met the cron resource before, but we will cover it in more detail in Chapter 4,
Understanding Puppet resources.
Managing your Puppet code with Git
[ 34 ]
For now, just note that the cron resource has a name (run-puppet), which is just for the
benefit of us humans, to remind us what it does, and it also has a command to run and hour
and minute attributes to control when it runs. The value */15 tells cron to run the job
every 15 minutes.
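Since the manifest itself isn't reproduced here, the following is a sketch of what a run-puppet.pp along these lines might look like. The resource titles, the script's installed path, and the */15 schedule come from the text and the Puppet output above; the source path and the mode value are assumptions, so check them against the copy in the example repo.

```puppet
# Sketch of a manifest that sets up regular Puppet runs.

# Install the run-puppet script on the node.
# Assumption: the script is read from the cloned repo's files/ directory.
file { '/usr/local/bin/run-puppet':
  source => '/etc/puppetlabs/code/environments/production/files/run-puppet.sh',
  mode   => '0755', # assumption: the script needs to be executable
}

# Run the script every 15 minutes from cron.
cron { 'run-puppet':
  command => '/usr/local/bin/run-puppet',
  hour    => '*',
  minute  => '*/15',
}
```

The cron resource's title (run-puppet) is what appears after Puppet Name in the generated crontab, which is how Puppet recognizes the entry on later runs.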
Testing automatic Puppet runs
To prove that the automatic Puppet run works, make a change to your manifest which
creates a file (/tmp/hello.txt, for example). Commit and push this change to Git. Wait 15
minutes, and check your target node. The file should be present. If not, something is broken.
To troubleshoot the problem, try running sudo run-puppet manually. If this works,
check that the cron job is correctly installed by running sudo crontab -l. It should look
something like the following:
# HEADER: This file was autogenerated at 2017-04-05 01:46:03 -0700 by puppet.
# HEADER: While it can still be managed manually, it is definitely not recommended.
# HEADER: Note particularly that the comments starting with 'Puppet Name' should
# HEADER: not be deleted, as doing so could cause duplicate cron jobs.
# Puppet Name: run-puppet
*/15 * * * * /usr/local/bin/run-puppet
Managing multiple nodes
You now have a fully automated stand-alone Puppet infrastructure. Any change that
you check in to your Git repo will be automatically applied to all nodes under Puppet
management. To add more nodes to your infrastructure, follow these steps for each
new node:
1. Install Puppet (not necessary if you're using the Vagrant box).
2. Clone your Git repo (as described in the Cloning the repo section).
3. Apply the manifest (as described in the Applying the run-puppet manifest section).
You might be wondering how to tell Puppet how to apply different manifests to different
nodes. For example, you might be managing two nodes, one of which is a web server and
the other a database server. Naturally, they will need different resources.
We'll learn more about nodes and how to control the application of resources to different
nodes in Chapter 8, Classes, roles, and profiles, but first, we need to learn about Puppet's
resources and how to use them. We'll do that in the next chapter.
Summary
In this chapter, we introduced the concepts of version control, and the essentials of Git in
particular. We set up a new Git repo, created a GitHub account, pushed our code to it, and
cloned it on a node. We wrote a shell script to automatically pull and apply changes from
the GitHub repo on any node, and a Puppet manifest to install this script and run it regularly
from cron.
In the next chapter, we'll explore the power of Puppet resources, going into more detail
about the Puppet file, package, and service resources we've already encountered,
and introducing three more important resource types: user, cron, and exec.
Understanding Puppet resources
Perplexity is the beginning of knowledge.
—Khalil Gibran
We've already met three important types of Puppet resources: package, file, and
service. In this chapter, we'll learn more about these, plus other important resource
types for managing users, groups, SSH keys, cron jobs, and arbitrary commands.
Files
We saw in Chapter 2, Creating your first manifests that Puppet can manage files on a node
using the file resource, and we looked at an example which sets the contents of a file to a
particular string using the content attribute. Here it is again (file_hello.pp):
file { '/tmp/hello.txt':
  content => "hello, world\n",
}
The path attribute
We've seen that every Puppet resource has a title (a quoted string followed by a colon). In
the file_hello example, the title of the file resource is '/tmp/hello.txt'. It's easy
to guess that Puppet is going to use this value as the path of the created file. In fact, path is
one of the attributes you can specify for a file, but if you don't specify it, Puppet will use
the title of the resource as the value of path.
Managing whole les
While it's useful to be able to set the contents of a le to a short text string, most les we're
likely to want to manage will be too large to include directly in our Puppet manifests. Ideally,
we would put a copy of the le in the Puppet repo, and have Puppet simply copy it to the
desired place in the lesystem. The source aribute does exactly that (file_source.pp):
file { '/etc/motd':
source => '/examples/files/motd.txt',
To try this example with your Vagrant box, run the following commands:
sudo puppet apply /examples/file_source.pp
cat /etc/motd
The best software in the world only sucks. The worst software is
significantly worse than that.
-Luke Kanies
(From now on, I won't give you explicit instructions on how to run the examples; just apply
them in the same way using sudo puppet apply as shown here. All the examples in this
book are in the examples/ directory of the GitHub repo, and I'll give you the name of the
appropriate file for each example, such as file_source.pp.)
Why do we have to run sudo puppet apply instead of just puppet
apply? Puppet has the permissions of the user who runs it, so if Puppet
needs to modify a file owned by root, it must be run with root's
permissions (which is what sudo does). You will usually run Puppet as
root because it needs those permissions to do things like installing
packages, modifying config files owned by root, and so on.
The value of the source attribute can be a path to a file on the node, as here, or an HTTP
URL, as in the following example (file_http.pp):
file { '/tmp/README.md':
source => 'https://raw.githubusercontent.com/puppetlabs/puppet/
Although this is a handy feature, bear in mind that every time you add an external
dependency like this to your Puppet manifest, you're adding a potential point of failure.
Wherever you can, use a local copy of a file instead of having Puppet fetch
it remotely every time. This particularly applies to software which needs to
be built from a tarball downloaded from a website. If possible, download
the tarball and serve it from a local webserver or file server. If this isn't
practical, using a caching proxy server can help save time and bandwidth
when you're building a large number of nodes.
Ownership
On Unix-like systems, files are associated with an owner, a group, and a set of permissions
to read, write, or execute the file. Since we normally run Puppet with the permissions of the
root user (via sudo), the files Puppet manages will be owned by that user:
ls -l /etc/motd
-rw-r--r-- 1 root root 109 Aug 31 04:03 /etc/motd
Oen, this is just ne, but if we need the le to belong to another user (for example, if that
user needs to be able to write to the le), we can express this by seng the owner aribute
file { '/etc/owned_by_ubuntu':
ensure => present,
owner => 'ubuntu',
ls -l /etc/owned_by_ubuntu
-rw-r--r-- 1 ubuntu root 0 Aug 31 04:48 /etc/owned_by_ubuntu
You can see that Puppet has created the file and its owner has been set to ubuntu. You can
also set the group ownership of the file using the group attribute (file_group.pp):
file { '/etc/owned_by_ubuntu':
  ensure => present,
  owner  => 'ubuntu',
  group  => 'ubuntu',
}
ls -l /etc/owned_by_ubuntu
-rw-r--r-- 1 ubuntu ubuntu 0 Aug 31 04:48 /etc/owned_by_ubuntu
Note that this me we didn't specify either a content or source aribute for the le, but
simply ensure => present. In this case, Puppet will create a le of zero size.
Permissions
Files on Unix-like systems have an associated mode which determines access permissions
for the file. It governs read, write, and execute permissions for the file's owner, any user
in the file's group, and other users. Puppet supports setting permissions on files using the
mode attribute. This takes an octal value (base 8, indicated by a leading 0 digit), with each
digit representing a field of 3 binary bits: the permissions for owner, group, and other,
respectively. In the following example, we use the mode attribute to set a mode of 0644
("read and write for the owner, read-only for the group, and read-only for other users") on a
file (file_mode.pp):
file { '/etc/owned_by_ubuntu':
  ensure => present,
  owner  => 'ubuntu',
  mode   => '0644',
}
This will be quite familiar to experienced system administrators, as the octal values for file
permissions are exactly the same as those understood by the Unix chmod command. For
more informaon, run the command man chmod.
Creang or managing permissions on a directory is a common task, and Puppet uses the
file resource to do this too. If the value of the ensure aribute is directory, the le will
be a directory (file_directory.pp):
file { '/etc/config_dir':
ensure => directory,
As with regular files, you can use the owner, group, and mode attributes to control access to
directories.
Trees of les
We've already seen that Puppet can copy a single le to the node, but what about a whole
directory of les, possibly including subdirectories (known as a le tree)? The recurse
aribute will take care of this (file_tree.pp):
file { '/etc/config_dir':
source => '/examples/files/config_dir',
recurse => true,
ls /etc/config_dir/
1 2 3
When recurse is true, Puppet will copy all the files and directories (and their
subdirectories) in the source directory (/examples/files/config_dir/ in this example)
to the target directory (/etc/config_dir/).
If the target directory already exists and has files in it, Puppet will not interfere
with them, but you can change this behavior using the purge attribute. If this is
true, Puppet will delete any files and directories in the target directory which
are not present in the source directory. Use this attribute with care.
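As a sketch of how purge combines with recurse, the file_tree example could be extended as follows. This variant is not one of the book's example files; it simply adds the purge attribute to the resource shown earlier.

```puppet
# Keep /etc/config_dir an exact mirror of the source directory.
file { '/etc/config_dir':
  ensure  => directory,
  source  => '/examples/files/config_dir',
  recurse => true,
  # Anything in the target that is not in the source will be deleted.
  purge   => true,
}
```

Note that purge only takes effect when recurse is also set. Before applying something like this to a real directory, it's worth running puppet apply with the --noop flag to see what would be removed.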
Symbolic links
Another common requirement for managing les is to create or modify a symbolic link
(known as a symlink, for short). You can have Puppet do this by setting ensure => link on
the file resource and specifying the target attribute (file_symlink.pp):
file { '/etc/this_is_a_link':
  ensure => link,
  target => '/etc/motd',
}
ls -l /etc/this_is_a_link
lrwxrwxrwx 1 root root 9 Aug 31 05:05 /etc/this_is_a_link -> /etc/motd
Packages
We've already seen how to install a package using the package resource, and this is all you
need to do with most packages. However, the package resource has a few extra features
which may be useful.
Uninstalling packages
The ensure aribute normally takes the value installed in order to install a package, but
if you specify absent instead, Puppet will remove the package if it happens to be installed.
Otherwise, it will take no acon. The following example will remove the apparmor package
if it's installed (package_remove.pp):
package { 'apparmor':
ensure => absent,
By default, when Puppet removes packages, it leaves in place any files managed by the
package. To purge all the files associated with the package, use purged instead of absent.
Installing specic versions
If there are mulple versions of a package available to the system's package manager,
specifying ensure => installed will cause Puppet to install the default version (usually
the latest). But, if you need a specic version, you can specify that version string as the value
of ensure, and Puppet will install that version (package_version.pp):
package { 'openssl':
ensure => '1.0.2g-1ubuntu4.8',
It's a good idea to specify an exact version whenever you manage packages
with Puppet, so that all the nodes will get the same version of a given
package. Otherwise, if you use ensure => installed, they will just
get whatever version was current at the time they were built, leading to a
situation where different nodes have different package versions.
When a newer version of the package is released, and you decide it's time to upgrade to it,
you can update the version string specified in the Puppet manifest and Puppet will upgrade
the package everywhere.
Installing the latest version
On the other hand, if you specify ensure => latest for a package, Puppet will make
sure that the latest available version is installed every time the manifest is applied. When a
new version of the package becomes available, it will be installed automatically on the next
Puppet run.
This is not generally what you want when using a package repository that's
not under your control (for example, the main Ubuntu repository). It means
that packages will be upgraded at unexpected times, which may break your
application (or at least result in unplanned downtime). A better strategy is
to tell Puppet to install a specific version which you know works, and test
upgrades in a controlled environment before rolling them out to production.
If you maintain your own package repository and control the release of new packages to
it, ensure => latest can be a useful feature: Puppet will update a package as soon as
you push a new version to the repo. If you are relying on upstream repositories, such as
the Ubuntu repositories, it's better to manage the version number directly by specifying an
explicit version as the value of ensure.
Installing Ruby gems
Although the package resource is most often used to install packages using the normal
system package manager (in the case of Ubuntu, that's APT), it can install other kinds of
packages as well. Library packages for the Ruby programming language are known as gems.
Puppet can install Ruby gems for you using the provider => gem attribute (package_
package { 'ruby':
  ensure => installed,
}

package { 'puppet-lint':
  ensure   => installed,
  provider => gem,
}
puppet-lint is a Ruby gem and therefore we have to specify provider => gem for this
package so that Puppet doesn't think it's a standard system package and try to install it via
APT. Since the gem provider is not available unless Ruby is installed, we install the ruby
package rst, then the puppet-lint gem.
The puppet-lint tool, by the way, is a good thing to have installed. It will check your
Puppet manifests for common style errors and make sure they comply with the official
Puppet style guide. Try it now:
puppet-lint /examples/lint_test.pp
WARNING: indentation of => is not properly aligned (expected in column
11, but found it in column 10) on line 2
In this example, puppet-lint is warning you that the => arrows are not lined up vertically,
which the style guide says they should be:
file { '/tmp/lint.txt':
  ensure  => file,
  content => "puppet-lint is your friend\n",
}
When puppet-lint produces no output, the file is free of lint errors.
Installing gems in Puppet's context
Puppet itself is written at least partly in Ruby, and makes use of several Ruby gems. To
avoid any conflicts with the version of Ruby and gems which the node might need for other
applications, Puppet packages its own version of Ruby and associated gems under the /opt/
puppetlabs/ directory. This means you can install (or remove) whichever system version of
Ruby you like and Puppet will not be affected.
However, if you need to install a gem to extend Puppet's capabilities in some way, then doing
it with a package resource and provider => gem won't work. That is, the gem will be
installed, but only in the system Ruby context, and it won't be visible to Puppet.
Fortunately, the puppet_gem provider is available for exactly this purpose. When you use
this provider, the gem will be installed in Puppet's context (and, naturally, won't be visible
in the system context). The following example demonstrates how to use this provider:
package { 'r10k':
  ensure   => installed,
  provider => puppet_gem,
}
To see the gems installed in Puppet's context, use Puppet's own
version of the gem command with the following path:
/opt/puppetlabs/puppet/bin/gem list
Using ensure_packages
To avoid potenal package conicts between dierent parts of your Puppet code or
between your code and third-party modules, the Puppet standard library provides a useful
wrapper for the package resource, called ensure_packages(). We'll cover this in detail in
Chapter 7, Mastering modules.
Services
Although services are implemented in a number of varied and complicated ways at the
operating system level, Puppet does a good job of abstracting away most of this with
the service resource and exposing just the two attributes of services which you most
commonly need to manage: whether they're running (ensure) and whether they start at
boot time (enable). We covered the use of these in Chapter 2, Creating your first manifests,
and most of the time, you won't need to know any more about service resources.
However, you'll occasionally encounter services which don't play well with Puppet, for a
variety of reasons. Sometimes, Puppet is unable to detect that the service is already running
and keeps trying to start it. Other times, Puppet may not be able to properly restart the
service when a dependent resource changes. There are a few useful attributes for service
resources which can help resolve these problems.
The hasstatus attribute
When a service resource has the aribute ensure => running aribute, Puppet needs
to be able to check whether the service is, in fact, running. The way it does this depends on
the underlying operang system. On Ubuntu 16 and later, for example, it runs systemctl
is-active SERVICE. If the service is packaged to work with systemd, that should be just
ne, but in many cases, parcularly with older soware, it may not respond properly.
If you nd that Puppet keeps aempng to start the service on every Puppet run, even
though the service is running, it may be that Puppet's default service status detecon isn't
working. In this case, you can specify the hasstatus => false aribute for the service
service { 'ntp':
  ensure    => running,
  enable    => true,
  hasstatus => false,
}
When hasstatus is false, Puppet knows not to try to check the service status using the
default system service management command, and instead, will look in the process table for
a running process which matches the name of the service. If it finds one, it will infer that the
service is running and take no further action.
The pattern attribute
Somemes, when using hasstatus => false, the service name as dened in Puppet
doesn't actually appear in the process table, because the command that provides the service
has a dierent name. If this is the case, you can tell Puppet exactly what to look for using the
pattern aribute.
If hasstatus is false and pattern is specied, Puppet will search for the value of
pattern in the process table to determine whether or not the service is running. To nd the
paern you need, you can use the ps command to see the list of running processes:
ps ax
Find the process you're interested in and pick a string which will match only the name of
that process. For example, if it's ntpd, you might specify the pattern aribute as ntpd
service { 'ntp':
  ensure    => running,
  enable    => true,
  hasstatus => false,
  pattern   => 'ntpd',
}
The hasrestart and restart attributes
When a service is noed (for example, if a file resource uses the notify aribute
to tell the service its cong le has changed, a common paern which we looked at in
Chapter 2, Creang your rst manifests), Puppet's default behavior is to stop the service,
then start it again. This usually works, but many services implement a restart command
in their management scripts. If this is available, it's usually a good idea to use it: it may be
faster or safer than stopping and starng the service. Some services take a while to shut
down properly when stopped, for example, and Puppet may not wait long enough before
trying to restart them, so that you end up with the service not running at all.
If you specify hasrestart => true for a service, then Puppet will try to send a restart
command to it, using whatever service management command is appropriate for the current
platform (systemctl, for example, on Ubuntu). The following example shows the use of
hasrestart (service_hasrestart.pp):
service { 'ntp':
  ensure     => running,
  enable     => true,
  hasrestart => true,
}
To further complicate things, the default system service restart command may not work,
or you may need to take certain special actions when the service is restarted (disabling
monitoring notifications, for example). You can specify any restart command you like for
the service using the restart attribute (service_custom_restart.pp):
service { 'ntp':
  ensure  => running,
  enable  => true,
  restart => '/bin/echo Restarting >>/tmp/debug.log && systemctl restart ntp',
}
In this example, the restart command writes a message to a log file before restarting the
service in the usual way, but it could, of course, do anything you need it to. Note that the
restart command is only used when Puppet restarts the service (generally because it
was notified by a change to some config file). It's not used when starting the service from
a stopped state. If Puppet finds the service has stopped and needs to start it, it will use the
normal system service start command.
In the extremely rare event that the service cannot be stopped or started using the default
service management command, Puppet also provides the stop and start attributes so that
you can specify custom commands to stop and start the service, just the same way as with
the restart attribute. If you need to use either of these, though, it's probably safe to say
that you're having a bad day.
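A minimal sketch of what this might look like follows. The service name and script paths are
hypothetical, standing in for some legacy application that ships its own control scripts:

```puppet
# Hypothetical legacy service with its own control scripts.
# The name 'legacy-app' and the /opt/legacy-app paths are assumptions for illustration.
service { 'legacy-app':
  ensure    => running,
  hasstatus => false,
  pattern   => 'legacy-app',
  start     => '/opt/legacy-app/bin/start.sh',
  stop      => '/opt/legacy-app/bin/stop.sh',
}
```

Here hasstatus => false and pattern tell Puppet to check the process table, as described
earlier, while start and stop supply the commands to use in place of the platform defaults.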
A user on Unix-like systems does not necessarily correspond to a human person who logs
in and types commands, although it sometimes does. A user is simply a named entity that
can own files and run commands with certain permissions and that may or may not have
permission to read or modify other users' files. It's very common, for sound security reasons,
to run each service on a system with its own user account. This simply means that the
service runs with the identity and permissions of that user.
For example, a web server will oen run as the www-data user, which exists solely to own
les the web server needs to read and write. This limits the danger of a security breach via the
web server, because the aacker would only have the www-data user's permissions, which
are very limited, rather than the root user's, which can modify any aspect of the system. It
is generally a bad idea to run services exposed to the public Internet as the root user. The
service user should have only the minimum permissions it needs to operate the service.
Given this, an important part of system conguraon involves creang and managing users,
and Puppet's user resource provides a model for doing just that. Just as we saw with
packages and services, the details of implementaon and the commands used to manage
users vary widely from one operang system to another, but Puppet provides an abstracon
which hides those details behind a common set of aributes for users.
Creating users
The following example shows a typical user and group declaration in Puppet (user.pp):
group { 'devs':
  ensure => present,
  gid    => 3000,
}

user { 'hsing-hui':
  ensure => present,
  uid    => '3001',
  home   => '/home/hsing-hui',
  shell  => '/bin/bash',
  groups => ['devs'],
}
The user resource
The tle of the resource is the username (login name) of the user; in this example, hsing-
hui. The ensure => present aribute says that the user should exist on the system.
The uid aribute needs a lile more explanaon. On Unix-like systems, each user has an
individual numerical id, known as the uid. The text name associated with the user is merely
a convenience for those (mere humans, for example) who prefer strings to numbers. Access
permissions are in fact based on the uid and not the username.
Why set the uid aribute? Oen, when creang users manually, we don't
specify a uid, so the system will assign one automacally. The problem with this
is that if you create the same user (hsing-hui, for example) on three dierent
nodes, you may end up with three dierent uids. This would be ne as long as
you have never shared les between nodes, or copied data from one place to
another. But in fact, this happens all the me, so it's important to make sure that
a given user's uid is the same across all the nodes in your infrastructure. That's
why we specify the uid aribute in the Puppet manifest.
The home aribute sets the user's home directory (this will be the current working directory
when the user logs in, if she does log in, and also the default working directory for cron jobs
that run as the user).
The shell aribute species the command-line shell to run when the user logs in
interacvely. For humans, this will generally be a user shell, such as /bin/bash or /bin/
sh. For service users, such as www-data, the shell should be set to /usr/sbin/nologin
(on Ubuntu systems), which does not allow interacve access, and prints a message saying
This account is currently not available. All users who do not need to log in
interacvely should have the nologin shell.
If the user needs to be a member of certain groups, you can pass the groups attribute an
array of the group names (just devs in this example).
Although Puppet supports a password attribute for user resources, I don't advise you
to use it. Service users don't need passwords, and interactive users should be logging in
with SSH keys. In fact, you should configure SSH to disable password logins altogether (set
PasswordAuthentication no in sshd_config).
The group resource
The tle of the resource is the name of the group (devs). You need not specify a gid
aribute but, for the same reasons as the uid aribute, it's a good idea to do so.
Managing SSH keys
I like to have as few interacve logins as possible on producon nodes, because it reduces
the aack surface. Fortunately, with conguraon management, it should rarely be
necessary to actually log in to a node. The most common reasons for needing an interacve
login are for system maintenance and troubleshoong, and for deployment. In both cases
there should be a single account named for this specic purpose (for example, admin or
deploy), and it should be congured with the SSH keys of any users or systems that need to
log in to it.
Puppet provides the ssh_authorized_key resource to control the SSH keys associated
with a user account. The following example shows how to use ssh_authorized_key
to add an SSH key (mine, in this instance) to the ubuntu user on our Vagrant VM:
ssh_authorized_key { 'john@bitfieldconsulting.com':
  user => 'ubuntu',
  type => 'ssh-rsa',
}
The tle of the resource is the SSH key comment, which reminds us who the key belongs
to. The user aribute species the user account which this key should be authorized for.
The type aribute idenes the SSH key type, usually ssh-rsa or ssh-dss. Finally, the
key aribute sets the key itself. When this manifest is applied, it adds the following to the
ubuntu user's authorized_keys le:
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAIEA3ATqENg+GWACa2BzeqTdGnJhNoBer8x6pf
A user account can have mulple SSH keys associated with it, and anyone holding one of the
corresponding private keys and its passphrase will be able to log in as that user.
Removing users
If you need to have Puppet remove user accounts (for example, as part of an employee
leaving process), it's not enough to simply remove the user resource from the Puppet
manifest. Puppet will ignore any users on the system that it doesn't know about, and it
certainly will not remove anything it finds on the system that isn't mentioned in the Puppet
manifest; that would be extremely undesirable (almost everything would be removed). So
we need to retain the user declaration for a while, but set the ensure attribute to absent:
user { 'godot':
  ensure => absent,
}
Once Puppet has run everywhere, you can remove the user resource if you like, but it does
no harm to simply leave it in place, and in fact, it's a good idea to do this, unless you can
verify manually that the user has been deleted from every affected system.
If you need to prevent a user logging in, but want to retain the account
and any files owned by the user, for archival or compliance purposes, you
can set their shell to /usr/sbin/nologin. You can also remove any
ssh_authorized_key resources associated with their account, and set
the purge_ssh_keys attribute to true on the user resource. This will
remove any authorized keys for the user that are not managed by Puppet.
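Put together, that approach might look something like this sketch. The home directory is an
assumption, included so that Puppet knows where to look for the user's authorized_keys file:

```puppet
# Sketch: lock out an account while keeping it and its files.
# The home path is an assumed value for illustration.
user { 'godot':
  ensure         => present,
  home           => '/home/godot',
  shell          => '/usr/sbin/nologin',
  purge_ssh_keys => true,
}
```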
Cron resources
Cron is the mechanism on Unix-like systems which runs scheduled jobs, sometimes known
as batch jobs, at specified times or intervals. For example, system housekeeping tasks, such
as log rotation or checking for security updates, are run from cron. The details of what to run
and when to run it are kept in a specially formatted file called crontab (short for cron table).
Puppet provides the cron resource for managing scheduled jobs, and we saw an example of
this in the run-puppet manifest we developed in Chapter 3, Managing your Puppet code
with Git (run-puppet.pp):
cron { 'run-puppet':
  command => '/usr/local/bin/run-puppet',
  hour    => '*',
  minute  => '*/15',
}
The tle run-puppet idenes the cron job (Puppet writes a comment to the crontab
le containing this name to disnguish it from other manually-congured cron jobs). The
command aribute species the command for cron to run, and the hour and minute specify
the me (*/15 is a cron syntax, meaning "every 15 minutes").
For more informaon about cron and the possible ways to specify the
mes of scheduled jobs, run the command man 5 crontab.
Attributes of the cron resource
The cron resource has a few other useful attributes which are shown in the following
example (cron.pp):
cron { 'cron example':
  command     => '/bin/date +%F',
  user        => 'ubuntu',
  environment => ['MAILTO=admin@example.com', 'PATH=/bin'],
  hour        => '0',
  minute      => '0',
  weekday     => ['Saturday', 'Sunday'],
}
The user aribute species who should run the cron job (if none is specied, the job runs as
root). If the environment aribute is given, it sets any environment variables the cron job
might need. A common use for this is to email any output from the cron job to a specied
email address, using the MAILTO variable.
As before, the hour and minute aributes set the me for the job to run, while you can
use the weekday aribute to specify a parcular day, or days, of the week. (The monthday
aribute works the same way, and can take any range or array of values between 1-31 to
specify the day of the month.)
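For instance, a job meant to run once a month could be sketched as follows. The command
path is hypothetical:

```puppet
# Runs at 06:30 on the first day of every month.
# The command path is a hypothetical example.
cron { 'monthly report':
  command  => '/usr/local/bin/monthly-report',
  hour     => '6',
  minute   => '30',
  monthday => '1',
}
```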
One important point about cron scheduling is that the default value for any
schedule attribute is *, which means all allowed values. For example, if you
do not specify an hour attribute, the cron job will be scheduled with an
hour of *, meaning that it will run every hour. This is generally not what you
want. If you do want it to run every hour, specify hour => '*' in your
manifest, but otherwise, specify the particular hour it should run at. The
same goes for minute. Accidentally leaving out the minute attribute and
having a job run sixty times an hour can have amusing consequences, to say
the least.
Randomizing cron jobs
If you run a cron job on many nodes, it's a good idea to make sure the job doesn't run
everywhere at the same time. Puppet provides a built-in function fqdn_rand() to help with
this; it provides a random number up to a specified maximum value, which will be different
on each node, because the random number generator is seeded with the node's hostname.
If you have several such jobs to run, you can also supply a further seed value to the
fqdn_rand() function, which can be any string and which will ensure that the value is
different for each job (fqdn_rand.pp):
cron { 'run daily backup':
  command => '/usr/local/bin/backup',
  minute  => '0',
  hour    => fqdn_rand(24, 'run daily backup'),
}

cron { 'run daily backup sync':
  command => '/usr/local/bin/backup_sync',
  minute  => '0',
  hour    => fqdn_rand(24, 'run daily backup sync'),
}
Because we gave a dierent string as the second argument to fqdn_rand for each cron job,
it will return a dierent random value for each hour aribute.
The range of values returned by fqdn_rand() includes 0, but does not include the
maximum value you specify. So, in the previous example, the values for hour will be
between 0 and 23, inclusive.
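The same technique works for any schedule attribute. As a sketch, the following variation
also spreads the job across the minutes of the hour; the seed strings are arbitrary:

```puppet
# fqdn_rand(60, ...) yields a value from 0 to 59, suitable for minute.
# The seed strings are arbitrary; any distinct strings would do.
cron { 'run daily backup':
  command => '/usr/local/bin/backup',
  minute  => fqdn_rand(60, 'run daily backup minute'),
  hour    => fqdn_rand(24, 'run daily backup hour'),
}
```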
Removing cron jobs
Just as with user resources, or any type of resource, removing the resource declaration from
your Puppet manifest does not remove the corresponding configuration from the node. In
order to do that you need to specify ensure => absent on the resource.
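For example, to remove the backup sync job declared earlier, a declaration along these lines
should be enough:

```puppet
# Removes the job that Puppet previously added to the crontab.
cron { 'run daily backup sync':
  ensure => absent,
}
```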
Exec resources
While the other resource types we've seen so far (file, package, service, user,
ssh_authorized_key, and cron) have modeled some concrete piece of state on the node,
such as a file, the exec resource is a little different. An exec allows you to run any arbitrary
command on the node. This might create or modify state, or it might not; anything you can
run from the command line, you can run via an exec resource.
Automating manual interaction
The most common use for an exec resource is to simulate manual interaction on the
command line. Some older software is not packaged for modern operating systems,
and needs to be compiled and installed from source, which requires you to run certain
commands. The authors of some software have also not realized, or don't care, that users
may be trying to install their product automatically and have install scripts which prompt for
user input. This can require the use of exec resources to work around the problem.
Attributes of the exec resource
The following example shows an exec resource for building and installing an imaginary piece
of software (exec.pp):
exec { 'install-cat-picture-generator':
  cwd     => '/tmp/cat-picture-generator',
  command => '/tmp/cat-picture/generator/configure && /usr/bin/make install',
  creates => '/usr/local/bin/cat-picture-generator',
}
The tle of the resource can be anything you like, though, as usual with Puppet resources it
must be unique. I tend to name exec resources aer the problem they're trying to solve, as
in this example.
The cwd aribute sets the working directory where the command will be run (current
working directory). When installing soware, this is generally the soware source directory.
The command aribute gives the command to run. This must be the full path to the
command, but you can chain several commands together using the shell && operator. This
executes the next command only if the previous one succeeded, so in the example, if the
configure command completes successfully, Puppet will go on to run make install,
otherwise, it will stop with an error.
If you apply this example, Puppet will give you an error like the following:
Error: /Stage[main]/Main/Exec[install-cat-picture-
generator]/returns: change from notrun to 0 failed:
Could not find command '/tmp/cat-picture/generator/
This is expected because the specied command does not, in fact, exist. In your
own manifests, you may see this error if you give the wrong path to a command,
or if the package that provides the command hasn't been installed yet.
The creates aribute species a le which should exist aer the command has been run.
If this le is present, Puppet will not run the command again. This is very useful because
without a creates aribute, an exec resource will run every me Puppet runs, which is
generally not what you want. The creates aribute tells Puppet, in eect, "Run the exec
only if this le doesn't exist."
Let's see how this works, imagining that this exec is being run for the rst me. We assume
that the /tmp/cat-picture/ directory exists and contains the source of the cat-
picture-generator applicaon.
1. Puppet checks the creates aribute and sees that the /usr/local/bin/cat-
picture-generator le is not present; therefore, the exec resource must be run.
2. Puppet runs the /tmp/cat-picture-generator/configure && /usr/bin/
make install command. As a side eect of these commands, the /usr/local/
bin/cat-picture-generator le is created.
3. Next me Puppet runs, it again checks the creates aribute. This me /usr/
local/bin/cat-picture-generator exists, so Puppet does nothing.
This exec resource will never be applied again so long as the file specified in the creates
attribute exists. You can test this by deleting the file and applying Puppet again. The exec
resource will be triggered and the file recreated.
Make sure that your exec resources always include a creates attribute
(or a similar control attribute, such as onlyif or unless, which we'll
look at later in this chapter). Without this, the exec command will be run
every time Puppet runs, which is almost certainly not what you want.
Note that building and installing soware from source is not a recommended pracce for
producon systems. It's beer to build the soware on a dedicated build server (perhaps
using Puppet code similar to this example), create a system package for it, and then use
Puppet to install that package on producon nodes.
The user attribute
If you don't specify a user aribute for an exec resource, Puppet will run the command as
the root user. This is oen appropriate for installing system soware or making changes to
the system conguraon, but if you need the command to run as a parcular user, specify
the user aribute, as in the following example (exec_user.pp):
exec { 'say-hello':
  command => '/bin/echo Hello, this is `whoami` >/tmp/hello-ubuntu.txt',
  user    => 'ubuntu',
  creates => '/tmp/hello-ubuntu.txt',
}
This will run the specied command as the ubuntu user. The whoami command returns
the name of the user running it, so when you apply this manifest, the le /tmp/hello-
ubuntu.txt will be created with the following contents:
Hello, this is ubuntu
As with the earlier example, the creates aribute prevents Puppet from running this
command more than once.
The onlyif and unless attributes
Suppose you only want an exec resource to be applied under certain conditions. For
example, a command which processes incoming data files only needs to run if there are data
files waiting to be processed. In this case, it's no good adding a creates attribute; we want
the existence of a certain file to trigger the exec, not prevent it.
The onlyif attribute is a good way to solve this problem. It specifies a command for Puppet
to run, and the exit status from this command determines whether or not the exec will be
applied. On Unix-like systems, commands generally return an exit status of zero to indicate
success and a non-zero value for failure. The following example shows how to use onlyif in
this way (exec_onlyif.pp):
exec { 'process-incoming-cat-pictures':
command => '/usr/local/bin/cat-picture-generator --import /tmp/
onlyif => '/bin/ls /tmp/incoming/*',
The exact command isn't important here, but let's assume it's something that we would only
want to run if there are any files in the /tmp/incoming/ directory.
The onlyif aribute species the check command which Puppet should run rst, to
determine whether or not the exec resource needs to be applied. If there is nothing in the
/tmp/incoming/ directory, then ls /tmp/incoming/* will return a non-zero exit status.
Puppet interprets this as failure, so does not apply the exec resource.
On the other hand, if there are les in the /tmp/incoming/ directory, the ls command
will return success. This tells Puppet the exec resource must be applied, so it proceeds to
run the /usr/local/bin/cat-picture-generator command (and we can assume this
command deletes the incoming les aer processing).
You can think of the onlyif aribute as telling Puppet, "Run the exec resource only if this
command succeeds."
The unless aribute is exactly the same as onlyif but with the opposite sense. If you
specify a command to the unless aribute, the exec will always be run unless the
command returns a zero exit status. You can think of unless as telling Puppet, "Run the
exec resource unless this command succeeds."
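As a sketch of unless in use, suppose our imaginary cat-picture-generator needs to initialize
a database once. The --init-db flag and the database path are invented for this example:

```puppet
# Hypothetical: the --init-db flag and database path are assumptions.
# test -s succeeds if the file exists and is not empty, so the
# exec is skipped once the database has been created.
exec { 'initialize-cat-picture-database':
  command => '/usr/local/bin/cat-picture-generator --init-db',
  unless  => '/usr/bin/test -s /var/lib/cat-pictures/db.sqlite',
}
```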
When you apply your manifest, if you see an exec resource running every time which
shouldn't be, check whether it specifies a creates, unless, or onlyif attribute. If it
specifies the creates attribute, it may be looking for the wrong file; if the unless or
onlyif command is specified, it may not be returning what you expect. You can see what
command is being run and what output it generates by running sudo puppet apply with
the -d (debug) flag:
sudo puppet apply -d exec_onlyif.pp
Debug: Exec[process-incoming-cat-pictures](provider=posix): Executing
check '/bin/ls /tmp/incoming/*'
Debug: Executing: '/bin/ls /tmp/incoming/*'
Debug: /Stage[main]/Main/Exec[process-incoming-cat-pictures]/onlyif: /
The refreshonly attribute
It's quite common to use exec resources for one-off commands, such as rebuilding a
database, or setting a system-tunable parameter. These generally only need to be triggered
once, when a package is installed, or occasionally, when a config file is updated. If an exec
resource needs to run only when some other Puppet resource is changed, we can use the
refreshonly attribute to do this.
If refreshonly is true, the exec will never be applied unless another resource triggers
it with notify. In the following example, Puppet manages the /etc/aliases file
(which maps local usernames to email addresses), and a change to this file triggers
the execution of the command newaliases, which rebuilds the system alias database:
file { '/etc/aliases':
  content => 'root: john@bitfieldconsulting.com',
  notify  => Exec['newaliases'],
}

exec { 'newaliases':
  command     => '/usr/bin/newaliases',
  refreshonly => true,
}
When this manifest is applied for the rst me, the /etc/aliases resource causes a
change to the le's contents, so Puppet sends a notify message to the exec resource. This
causes the newaliases command to be run. If you apply the manifest again, you will see
that the aliases le is not changed, so the exec is not run.
While the refreshonly aribute is occasionally extremely useful, over-use
of it can make your Puppet manifests hard to understand and debug, and it can
also be rather fragile. Felix Frank makes this point in a blog post, Friends Don't
Let Friends Use Refreshonly:
"With the exec resource type considered the last ditch, its refreshonly
parameter should be seen as especially outrageous. To make an exec
resource t into Puppet's model beer, you should use [the creates,
onlyif, or unless] parameters instead." Refer to:
Note that you don't need to use the refreshonly attribute in order to make the exec
resource notifiable by other resources. Any resource can notify an exec resource in order to
make it run; however, if you don't want it to run unless it's notified, use refreshonly.
By the way, if you actually want to manage email aliases on a node,
use Puppet's built-in mailalias resource. The previous example is
just to demonstrate the use of refreshonly.
The logoutput attribute
When Puppet runs shell commands via an exec resource, the output is normally hidden
from us. However, if the command doesn't seem to be working properly, it can be very useful
to see what output it produced, as this usually tells us why it didn't work.
The logoutput attribute determines whether Puppet will log the output of the exec
command along with the usual informative Puppet output. It can take three values: true,
false, or on_failure.
If logoutput is set to on_failure (which is the default), Puppet will only log command
output when the command fails (that is, returns a non-zero exit status). If you never want to
see command output, set it to false.
Sometimes, however, the command returns a successful exit status but does not appear
to do anything. Setting logoutput to true will force Puppet to log the command output
regardless of exit status, which should help you figure out what's going on.
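Here is a minimal sketch of an exec with full output logging turned on. The command, its
flag, and the file it creates are hypothetical:

```puppet
# Hypothetical command, used only to illustrate logoutput.
exec { 'import-cat-pictures':
  command   => '/usr/local/bin/cat-picture-generator --import-all',
  creates   => '/var/lib/cat-pictures/imported',
  logoutput => true,
}
```

Once you have found the problem, you would probably remove the attribute again, or set it
back to on_failure, to keep the Puppet output uncluttered.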
The timeout attribute
Somemes, commands can take a long me to run, or never terminate at all. By default,
Puppet allows an exec command to run for 300 seconds, at which point Puppet will
terminate it if it has not nished. If you need to allow a lile longer for the command
to complete, you can use the timeout aribute to set this. The value is the maximum
execuon me for the command in seconds.
Seng a timeout value of 0 disables the automac meout altogether and allows the
command to run forever. This should be the last resort, as a command which blocks or hangs
could stop Puppet's automac runs altogether if no meout is set. To nd a suitable value for
timeout, try running the command a few mes and choose a value which is perhaps twice
as long as a typical run. This should avoid failures caused by slow network condions, for
example, but not block Puppet from running altogether.
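For example, if a hypothetical import command typically takes about five minutes, you might
allow it ten:

```puppet
# Hypothetical long-running command; 600 seconds is roughly
# twice its assumed typical run time of five minutes.
exec { 'run-slow-import':
  command => '/usr/local/bin/slow-import',
  creates => '/var/lib/slow-import/done',
  timeout => 600,
}
```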
How not to misuse exec resources
The exec resource can do anything to the system that you could do from the command line.
As you can imagine, such a powerful tool can be misused. In theory, Puppet is a declarative
language: the manifest specifies the way things should be, and it is up to Puppet to take the
necessary actions to make them so. Manifests are therefore what computer scientists call
idempotent: the system is always in the same state after the catalog has been applied, and
however many times you apply it, it will always be in that state.
The exec resource rather spoils this theorecal picture, by allowing Puppet manifests to
have side-eects. Since your exec command can do anything, it could, for example, create
a new 1 GB le on disk with a random name, and since this will happen every me Puppet
runs, you could rapidly run out of disk space. It's best to avoid commands with side-eects
like this. In general, there's no way to know from within Puppet exactly what changes to a
system were caused by an exec resource.
Commands run via exec are also somemes used to bypass Puppet's exisng resources. For
example, if the user resource doesn't do quite what you want for some reason, you could
create a user by running the adduser command directly from an exec. This is also a bad
idea, since by doing this you lose the declarave and cross-plaorm nature of Puppet's built-
in resources. exec resources potenally change the state of the node in a way that's invisible
to Puppet's catalog.
In general, if you need to manage a concrete aspect of system state which
isn't supported by Puppet's built-in resource types, you should think
about creating a custom resource type and provider to do what you want.
This extends Puppet to add a new resource type, which you can then use
to model the state of that resource in your manifests. Creating custom
types and providers is an advanced topic and not covered in this book,
but if you want to know more, consult the Puppet documentation:
You should also think twice before running complex commands via exec, especially
commands which use loops or conditionals. It's a better idea to put any complicated logic in
a shell script (or, even better, in a real programming language), which you can then deploy
and run with Puppet (avoiding, as we've said, unnecessary side-effects).
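One way to sketch this approach is with a file resource that deploys the script and an exec
that runs it once. The script name, its source path, and the marker file it is assumed to
create are all hypothetical:

```puppet
# Deploy a script, then run it once.
# All paths here are assumptions for illustration; the script is
# assumed to create the marker file when it finishes successfully.
file { '/usr/local/bin/setup-cat-pictures':
  source => '/examples/files/setup-cat-pictures.sh',
  mode   => '0755',
}

exec { 'setup-cat-pictures':
  command => '/usr/local/bin/setup-cat-pictures',
  creates => '/var/lib/cat-pictures/.setup-complete',
  require => File['/usr/local/bin/setup-cat-pictures'],
}
```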
As a maer of good Puppet style, every exec resource should have at least
one of creates, onlyif, unless, or refreshonly specied, to stop it
from being applied on every Puppet run. If you nd yourself using exec just
to run a command every me Puppet runs, make it a cron job instead.
We've explored Puppet's file resource in detail, covering file sources, ownership,
permissions, directories, symbolic links, and file trees. We've learned how to manage
packages by installing specific versions, or the latest version, and how to uninstall packages.
We've covered Ruby gems, both in the system context and Puppet's internal context. Along
the way, we met the very useful puppet-lint tool.
We have looked at service resources, including the hasstatus, pattern, hasrestart,
restart, stop, and start attributes. We've learned how to create users and groups,
manage home directories, shells, UIDs, and SSH authorized keys. We saw how to schedule,
manage, and remove cron jobs.
Finally, we've learned all about the powerful exec resource, including how to run arbitrary
commands, and how to run commands only under certain conditions, or only if a specific
file is not present. We've seen how to use the refreshonly attribute to trigger an exec
resource when other resources are updated, and we've explored the useful logoutput and
timeout attributes of exec resources.
In the next chapter, we'll find out how to represent data and variables in Puppet manifests,
including strings, numbers, Booleans, arrays, and hashes. We'll learn how to use variables
and conditional expressions to determine which resources are applied, and we'll also learn
about Puppet's facts hash and how to use it to get information about the system.
Variables, expressions, and facts
It is impossible to begin to learn that which one thinks one already knows.
In this chapter, you will learn about Puppet variables and data types, expressions, and
conditional statements. You will also learn how Puppet manifests can get data about the
node using Facter, find out which are the most important standard facts, and see how to
create your own external facts. Finally, you will use Puppet's each function to iterate over
arrays and hashes, including Facter data.
Introducing variables
A variable in Puppet is simply a way of giving a name to a particular value, which we could
then use wherever we would use the literal value (variable_string.pp):
$php_package = 'php7.0-cli'
package { $php_package:
  ensure => installed,
}
The dollar sign ($) tells Puppet that what follows is a variable name. Variable names must
begin with a lowercase letter or an underscore, though the rest of the name can also contain
uppercase letters or numbers.
A variable can contain different types of data; one such type is a String (like php7.0-cli),
but Puppet variables can also contain Number or Boolean values (true or false). Here are
a few examples (variable_simple.pp):
$my_name = 'Zaphod Beeblebrox'
$answer = 42
$scheduled_for_demolition = true
Using Booleans
Strings and numbers are straightforward, but Puppet also has a special data type to
represent true or false values, which we call Boolean values, after the logician George
Boole. We have already encountered some Boolean values in Puppet resource attributes:
service { 'sshd':
  ensure => running,
  enable => true,
}
The only allowed values for Boolean variables are the literal values true and false, but
Boolean variables can also hold the values of conditional expressions (expressions whose
value is true or false), which we'll explore later in this chapter.
Chapter 5
You might be wondering what type the value running is in the previous
example. It's actually a string, but a special, unquoted kind of string called
a bare word. Although it would be exactly the same to Puppet if you used a
normal quoted string 'running' here, it's considered good style to use bare
words for attribute values which can only be one of a small number of words
(for example, the ensure attribute on services can only take the values
running or stopped). By contrast, true is not a bare word but a Boolean
value, and it is not interchangeable with the string 'true'. Always use the
unquoted literal values true or false for Boolean values.
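To make the distinction concrete, here is a minimal sketch that uses a bare word for ensure and a real Boolean for enable. The variable name $start_at_boot is hypothetical, chosen only for illustration:

```puppet
# Hypothetical variable holding a Boolean (note: no quotes)
$start_at_boot = true

service { 'sshd':
  ensure => running,        # bare word: the same to Puppet as the string 'running'
  enable => $start_at_boot, # Boolean value, not the string 'true'
}
```

Keeping the Boolean in a variable like this means the same value can later be reused in a conditional expression.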
Interpolating variables in strings
It's no good being able to store something in a variable if you can't get it out again, and
one of the most common ways to use a variable's value is to interpolate it in a string.
When you do this, Puppet inserts the current value of the variable into the contents of the
string, replacing the name of the variable. String interpolation looks like this (string_
$my_name = 'John'
notice("Hello, ${my_name}! It's great to meet you!")
When you apply this manifest, the following output is printed:
Notice: Scope(Class[main]): Hello, John! It's great to meet you!
To interpolate (that is, to insert the value of) a variable in a string, prefix its name with a $
character and surround the name with curly braces, as in ${my_name}. This tells Puppet to replace the variable's
name with its value in the string.
We sneaked a new Puppet function, notice(), into the previous
example. It has no effect on the system, but it prints out the value of its
argument. This can be very useful for troubleshooting problems or finding
out what the value of a variable is at a given point in your manifest.
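Interpolation is not limited to notice(); it works anywhere a double-quoted string is accepted, including resource titles and attribute values. The following is a small sketch under that assumption. The variable $app_name and the path under /tmp are hypothetical:

```puppet
# Hypothetical application name, used only for illustration
$app_name = 'myapp'

# The variable's value is inserted into both the title and the content
file { "/tmp/${app_name}.conf":
  ensure  => file,
  content => "# Configuration for ${app_name}\n",
}
```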
Creating arrays
A variable can also hold more than one value. An Array is an ordered sequence of values,
each of which can be of any type. The following example creates an array of Integer values:
$heights = [193, 120, 181, 164, 172]
$first_height = $heights[0]
You can refer to any individual element of an array by giving its index number in square
brackets, where the first element is index [0], the second is [1], and so on. (If you find this
confusing, you're not alone, but it may help to think of the index as representing an offset
from the beginning of the array. Naturally, then, the offset of the first element is 0.)
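As a quick illustration of zero-based indexing, this sketch reuses the $heights array from the example and prints two of its elements:

```puppet
$heights = [193, 120, 181, 164, 172]

# Index 0 is the first element, index 2 is the third
notice("First height: ${heights[0]}")  # 193
notice("Third height: ${heights[2]}")  # 181
```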
Declaring arrays of resources
You already know that in Puppet resource declarations, the title of the resource is usually
a string, such as the path to a file or the name of a package. You might well ask, "What
happens if you supply an array of strings as the title of a resource instead of a single string?
Does Puppet create multiple resources, one for each element in the array?" Let's try an
experiment where we do exactly that with an array of package names and see what happens
(resource_array.pp):
$dependencies = [
  'php7.0-cgi', 'php7.0-cli', 'php7.0-common', 'php7.0-gd',
  'php7.0-json', 'php7.0-mcrypt', 'php7.0-mysql', 'php7.0-soap',
]
package { $dependencies:
  ensure => installed,
}
If our intuition is right, applying the previous manifest should give us a package resource for
each package listed in the $dependencies array, and each one should be installed. Here's
what happens when the manifest is applied:
sudo apt-get update
sudo puppet apply /examples/resource_array.pp
Notice: Compiled catalog for ubuntu-xenial in environment production in 0.68 seconds
Notice: /Stage[main]/Main/Package[php7.0-cgi]/ensure: created
Notice: /Stage[main]/Main/Package[php7.0-cli]/ensure: created
Notice: /Stage[main]/Main/Package[php7.0-common]/ensure: created
Notice: /Stage[main]/Main/Package[php7.0-gd]/ensure: created
Notice: /Stage[main]/Main/Package[php7.0-json]/ensure: created
Notice: /Stage[main]/Main/Package[php7.0-mcrypt]/ensure: created
Notice: /Stage[main]/Main/Package[php7.0-mysql]/ensure: created
Notice: /Stage[main]/Main/Package[php7.0-soap]/ensure: created
Notice: Applied catalog in 56.98 seconds
Giving an array of strings as the title of a resource results in Puppet creating multiple
resources, all identical except for the title. You can do this not just with packages, but also
with files, users, or, in fact, any type of resource. We'll see some more sophisticated ways of
creating resources from data in Chapter 6, Managing data with Hiera.
Why did we run sudo apt-get update before applying the manifest?
This is the Ubuntu command to update the system's local package catalog from
the upstream servers. It's always a good idea to run this before installing any
package to make sure you're installing the latest version. In your production
Puppet code, of course, you can run this via an exec resource.
Understanding hashes
A hash, also known as a dictionary in some programming languages, is like an array, but
instead of just being a sequence of values, each value has a name (variable_hash.pp):
$heights = {
  'john'    => 193,
  'rabiah'  => 120,
  'abigail' => 181,
  'melina'  => 164,
  'sumiko'  => 172,
}
notice("John's height is ${heights['john']}cm.")
The name for each value is known as the key. In the previous example, the keys of this hash
are john, rabiah, abigail, melina, and sumiko. To look up the value of a given key, you
put the key in square brackets after the hash name: $heights['john'].
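The key does not have to be a literal string; it can also come from a variable. A minimal sketch, where $person is a hypothetical variable name:

```puppet
$heights = {
  'john'   => 193,
  'rabiah' => 120,
}

# Hypothetical variable holding the key we want to look up
$person = 'rabiah'
notice("${person} is ${heights[$person]}cm tall.")
```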
Puppet style note
Did you spot the trailing comma on the last hash key-value pair and
the last element of the array in the previous example? Although the
comma isn't strictly required, it's good style to add one. The reason is
that it's very common to want to add another item to an array or hash,
and if your last item already has a trailing comma, you won't have to
remember to add one when extending the list.
Setting resource attributes from a hash
You might have noticed that a hash looks a lot like the attributes of a resource: it's a
one-to-one mapping between names and values. Wouldn't it be convenient if, when
declaring resources, we could just specify a hash containing all the attributes and their
values? As it happens, you can do just that (hash_attributes.pp):
$attributes = {
  'owner' => 'ubuntu',
  'group' => 'ubuntu',
  'mode'  => '0644',
}
file { '/tmp/test':
  ensure => present,
  *      => $attributes,
}
The * character, cheerfully named the attribute splat operator, tells Puppet to treat the
specified hash as a list of attribute-value pairs to apply to the resource. This is exactly
equivalent to specifying the same attributes directly, as in the following example:
file { '/tmp/test':
  ensure => present,
  owner  => 'ubuntu',
  group  => 'ubuntu',
  mode   => '0644',
}
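One practical benefit of the splat operator is that a single hash can supply the same attributes to several resources. The following sketch assumes two hypothetical file paths and a hash named $common:

```puppet
# Shared attributes, defined once ($common is a hypothetical name)
$common = {
  'owner' => 'ubuntu',
  'group' => 'ubuntu',
  'mode'  => '0644',
}

file { '/tmp/test1':
  ensure => present,
  *      => $common,
}

file { '/tmp/test2':
  ensure => present,
  *      => $common,
}
```

If the ownership or mode needs to change later, only the hash has to be edited.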
Introducing expressions
Variables are not the only things in Puppet that have a value. Expressions also have a value.
The simplest expressions are just literal values:
'Oh no, not again.'
You can combine numeric values with arithmetic operators, such as +, -, *, and /, to create
arithmetic expressions, which have a numeric value, and you can use these to have Puppet
do calculations (expression_numeric.pp):
$value = (17 * 8) + (12 / 4) - 1
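Working through that expression by hand: 17 * 8 is 136 and 12 / 4 is 3, so the result is 136 + 3 - 1 = 138. The sketch below prints it, and also shows that dividing one integer by another in Puppet gives an integer result:

```puppet
$value = (17 * 8) + (12 / 4) - 1
notice($value)     # 138

# Integer division discards the remainder; use a float to keep it
notice(7 / 2)      # 3
notice(7.0 / 2)    # 3.5
```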
The most useful expressions, though, are those that evaluate to true or false, known as
Boolean expressions. The following is a set of examples of Boolean expressions, all of which
evaluate to true (expression_boolean.pp):
notice(9 < 10)
notice(11 > 10)
notice(10 >= 10)
notice(10 <= 10)
notice('foo' == 'foo')
notice('foo' in 'foobar')
notice('foo' in ['foo', 'bar'])
notice('foo' in { 'foo' => 'bar' })
notice('foo' =~ /oo/)
notice('foo' =~ String)
notice(1 != 2)
Meeting Puppet's comparison operators
All the operators in the Boolean expressions shown in the previous example are known
as comparison operators, because they compare two values. The result is either true or
false. These are the comparison operators Puppet provides:
== and != (equal, not equal)
>, >=, <, and <= (greater than, greater than or equal to, less than, less than or equal to)
A in B (A is a substring of B, A is an element of the array B, or A is a key of the hash B)
A =~ B (A is matched by the regular expression B, or A is a value of data type B.
For example, the expression 'hello' =~ String is true, because the value
'hello' is of type String.)
Introducing regular expressions
The =~ operator tries to match a given value against a regular expression. A regular
expression (regular in the sense of constituting a pattern or a rule) is a special kind of
expression which specifies a set of strings. For example, the regular expression /a+/
describes the set of all strings that contain one or more consecutive as: a, aa, aaa, and so
on, as well as all strings which contain such a sequence among other characters. The slash
characters // delimit a regular expression in Puppet.
When we say a regular expression matches a value, we mean the value is one of the set of
strings specified by the regular expression. The regular expression /a+/ would match the
string aaa or the string Aaaaargh!, for example.
The following example shows some regular expressions that match the string foo:
$candidate = 'foo'
notice($candidate =~ /foo/) # literal
notice($candidate =~ /f/) # substring
notice($candidate =~ /f.*/) # f followed by zero or more characters
notice($candidate =~ /f.o/) # f, any character, o
notice($candidate =~ /fo+/) # f followed by one or more 'o's
notice($candidate =~ /[fgh]oo/) # f, g, or h followed by 'oo'
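A regular expression match is itself a Boolean expression, so it can drive a decision in a manifest. The following is a sketch only; the variable $server_name and the naming pattern are hypothetical:

```puppet
# Hypothetical server name, used only for illustration
$server_name = 'web01'

# ^ and $ anchor the match to the whole string; \d+ means one or more digits
if $server_name =~ /^web\d+$/ {
  notice("${server_name} looks like a web server")
} else {
  notice("${server_name} does not match the web server pattern")
}
```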
Regular expressions are more-or-less a standard language for expressing
string patterns. It's a complicated and powerful language, which really
deserves a book of its own (and there are several), but suffice it to say for
now that Puppet's regular expression syntax is the same as that used in the
Ruby language. You can read more about it in the Ruby documentation at:
Using conditional expressions
Boolean expressions, like those in the previous example, are useful because we can use
them to make choices in the Puppet manifest. We can apply certain resources only if a given
condition is met, or we can assign an attribute one value or another, depending on whether
some expression is true. An expression used in this way is called a conditional expression.
Making decisions with if statements
The most common use of a conditional expression is in an if statement. The following
example shows how to use if to decide whether to apply a resource (if.pp):
$install_perl = true
if $install_perl {
  package { 'perl':
    ensure => installed,
  }
} else {
  package { 'perl':
    ensure => absent,
  }
}
You can see that the value of the Boolean variable $install_perl governs whether or not
the perl package is installed. If $install_perl is true, Puppet will apply the following
resource:
package { 'perl':
  ensure => installed,
}
If, on the other hand, $install_perl is false, the resource applied will be:
package { 'perl':
  ensure => absent,
}
You can use if statements to control the application of any number of resources or, indeed,
any part of your Puppet manifest. You can leave out the else clause if you like; in that case,
when the value of the conditional expression is false, Puppet will do nothing.
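For instance, a version of the earlier example without the else clause might look like the sketch below. With $install_perl set to false, no package resource is added to the catalog at all, so Puppet neither installs nor removes perl:

```puppet
$install_perl = false

# No else clause: when the condition is false, nothing is applied
if $install_perl {
  package { 'perl':
    ensure => installed,
  }
}
```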
Choosing options with case statements
The if statement allows you to take a yes/no decision based on the value of a Boolean
expression. But if you need to make a choice among more than two options, you can use a
case statement instead (case.pp):
$webserver = 'nginx'
case $webserver {
  'nginx': {
    notice("Looks like you're using Nginx! Good choice!")
  }
  'apache': {
    notice("Ah, you're an Apache fan, eh?")
  }
  'IIS': {
    notice('Well, somebody has to.')
  }
  default: {
    notice("I'm not sure which webserver you're using!")
  }
}
In a case statement, Puppet compares the value of the expression to each of the cases listed
in order. If it finds a match, the corresponding resources are applied. The special case called
default always matches, and you can use it to make sure that Puppet will do the right thing
even if none of the other cases match.
Finding out facts
It's very common for Puppet manifests to need to know something about the system they're
running on, for example, its hostname, IP address, or operating system version. Puppet's
built-in mechanism for getting system information is called Facter, and each piece of
information provided by Facter is known as a fact.
Using the facts hash
You can access Facter facts in your manifest using the facts hash. This is a Puppet variable
called $facts which is available everywhere in the manifest, and to get a particular fact, you
supply the name of the fact you want as the key (facts_hash.pp):
On the Vagrant box, or any Linux system, this will return the value Linux.
In older versions of Puppet, each fact was a distinct global variable, like this:
You will still see this style of fact reference in some Puppet code, though it is now deprecated
and will eventually stop working, so you should always use the $facts hash instead.
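As a minimal sketch of the two styles side by side, assuming the fact in question is kernel (which reports Linux on a Linux system):

```puppet
# Preferred: look the fact up by name in the $facts hash
notice($facts['kernel'])

# Older, deprecated style: the fact as a top-scope variable
notice($::kernel)
```

Both lines print the same value, but only the first form makes it obvious to a reader that the value comes from Facter.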
Running the facter command
You can also use the facter command to see the value of particular facts, or just see what
facts are available. For example, running facter os on the command line will show you the
hash of available OS-related facts:
facter os
{
  architecture => "amd64",
  distro => {
    codename => "xenial",
    description => "Ubuntu 16.04 LTS",
    id => "Ubuntu",
    release => {
      full => "16.04",
      major => "16.04"
    }
  },
  family => "Debian",
  hardware => "x86_64",
  name => "Ubuntu",
  release => {
    full => "16.04",
    major => "16.04"
  },
  selinux => {
    enabled => false
  }
}
You can also use the puppet facts command to see what facts will be available to Puppet
manifests. This will also include any custom facts defined by third-party Puppet modules (see
Chapter 7, Mastering modules, for more information about this).
Accessing hashes of facts
As in the previous example, many facts actually return a hash of values, rather than a single
value. The value of the $facts['os'] fact is a hash with the keys architecture, distro,
family, hardware, name, release, and selinux. Some of those are also hashes; it's
hashes all the way down!
As you know, to access a particular value in a hash, you specify the key name in square
brackets. To access a value inside a hash, you add another key name in square brackets after
the first, as in the following example (facts_architecture.pp):
You can keep on appending more keys to get more and more specific information (facts_
Key fact
The operating system major release is a very handy fact and one you'll
probably use often:
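Based on the structure of the facter os output shown earlier, nested lookups might look like the following sketch. The values in the comments are those from that output and will differ on other systems:

```puppet
# One level down: a key of the os hash
notice($facts['os']['architecture'])       # amd64 on the box shown above

# Two levels down: keys of the nested distro and release hashes
notice($facts['os']['distro']['codename']) # xenial on the box shown above
notice($facts['os']['release']['major'])   # 16.04 on the box shown above
```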
Referencing facts in expressions
Just as with ordinary variables or values, you can use facts in expressions, including
conditional expressions (fact_if.pp):
if $facts['os']['selinux']['enabled'] {
  notice('SELinux is enabled')
} else {
  notice('SELinux is disabled')
}
Although conditional expressions based on facts can be useful, an even better
way of making decisions based on facts in your manifests is to use Hiera, which
we'll cover in the next chapter. For example, if you find yourself writing an if or
case statement which chooses different resources depending on the operating
system version, consider using a Hiera query instead.
Using memory facts
Another useful set of facts is that relating to the system memory. You can find out the total
physical memory available, and the amount of memory currently used, as well as the same
figures for swap memory.
One common use for this is to configure applications dynamically based on the amount
of system memory. For example, the MySQL parameter innodb_buffer_pool_size
specifies the amount of memory allocated to database query cache and indexes, and it
should generally be set as high as possible ("as large a value as practical, leaving enough
memory for other processes on the node to run without excessive paging", according to
the documentation). So you might decide to set this to three-quarters of total memory (for
example), using a fact and an arithmetic expression, as in the following snippet (fact_
$buffer_pool = $facts['memory']['system']['total_bytes'] * 3/4
Key fact
The total system memory fact will help you calculate configuration
parameters which vary as a fraction of memory:
Discovering networking facts
Most applications use the network, so you'll find Facter's network-related facts very useful
for anything to do with network configuration. The most commonly used facts are the system
hostname, fully qualified domain name (FQDN), and IP address (fact_networking.pp):
notice("My hostname is ${facts['hostname']}")
notice("My FQDN is ${facts['fqdn']}")
notice("My IP is ${facts['networking']['ip']}")
Key fact
The system hostname is something you'll need to refer to often in your
Providing external facts
While the built-in facts available to Puppet provide a lot of important information, you can
make the $facts hash even more useful by extending it with your own facts, known as
external facts. For example, if nodes are located in different cloud providers, each of which
requires a slightly different networking setup, you could create an external fact called cloud to
document this. You can then use this fact in manifests to make decisions.
Puppet looks for external facts in the /opt/puppetlabs/facter/facts.d/ directory.
Try creating a file in that directory called facts.txt with the following contents (fact_
A quick way to do this is to run the following command:
sudo cp /examples/fact_external.txt /opt/puppetlabs/facter/facts.d
The cloud fact is now available in your manifests. You can check that the fact is working by
running the following command:
sudo facter cloud
To use the fact in your manifest, query the $facts hash just as you would for a built-in fact:
case $facts['cloud'] {
  'aws': {
    notice('This is an AWS cloud node')
  }
  'gcp': {
    notice('This is a Google cloud node')
  }
  default: {
    notice("I'm not sure which cloud I'm in!")
  }
}
You can put as many facts in a single text file as you like, or you can have each fact in a
separate file: it doesn't make any difference. Puppet will read all the files in the facts.d/
directory and extract all the key=value pairs from each one.
Text files work well for simple facts (those that return a single value). If your external facts
need to return structured data (arrays or hashes, for example), you can use a YAML or JSON
file instead to do this. We'll be learning more about YAML in the next chapter, but for now, if
you need to build structured external facts, consult the Puppet documentation for details.
It's common to set up external facts like this at build time, perhaps as part of an automated
bootstrap script (see Chapter 12, Putting it all together, for more about the bootstrap
Creating executable facts
External facts are not limited to static text files. They can also be the output of scripts or
programs. For example, you could write a script that calls a web service to get some data,
and the result would be the value of the fact. These are known as executable facts.
Executable facts live in the same directory as other external facts (/opt/puppetlabs/
facter/facts.d/), but they are distinguished by having the execute bit set on their files
(recall that files on Unix-like systems each have a set of bits indicating their read, write, and
execute permissions) and they also can't be named with .txt, .yaml, or .json extensions.
Let's build an executable fact which simply returns the current date, as an example:
1. Run the following command to copy the executable fact example into the external
fact directory:
sudo cp /examples/date.sh /opt/puppetlabs/facter/facts.d
2. Set the execute bit on the file with the following command:
sudo chmod a+x /opt/puppetlabs/facter/facts.d/date.sh
3. Now test the fact:
sudo facter date
Here is the script which generates this output (date.sh):
#!/bin/bash
echo "date=`date +%F`"
Note that the script has to output date= before the actual date value. This is because Facter
expects executable facts to output a list of key=value pairs (just one such pair, in this case).
The key is the name of the fact (date), and the value is whatever is returned by `date
+%F` (the current date in ISO 8601 format). You should use ISO 8601 format (YYYY-MM-DD)
whenever you need to represent dates, by the way, because it's not only the international
standard date format, but it is also unambiguous and sorts alphabetically.
As you can see, executable facts are quite powerful because they can return any information
which can be generated by a program (the program could make network requests or
database queries, for example). However, you should use executable facts with care, as
Puppet has to evaluate all external facts on the node every time it runs, which means
running every script in /opt/puppetlabs/facter/facts.d.
If you don't need the information from an executable fact to be
regenerated every time Puppet runs, consider running the script from a
cron job at longer intervals and having it write output to a static text file
in the facts directory instead.
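One way to arrange this is with a Puppet cron resource. The following is a sketch under stated assumptions rather than a definitive recipe: the resource title and the date.txt file name are hypothetical, and it assumes the date.sh executable fact has been removed so that only one source defines the date fact.

```puppet
# Hypothetical cron job that refreshes a static external fact once a day.
# The % character is special in crontab entries, so it is escaped as \%.
cron { 'refresh-date-fact':
  command => '/bin/echo "date=$(/bin/date +\%F)" > /opt/puppetlabs/facter/facts.d/date.txt',
  user    => 'root',
  hour    => '0',
  minute  => '5',
}
```

Facter then only has to read a small text file on each run, instead of executing a script.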
Iterating over arrays
Iteration (doing something repeatedly) is a useful technique in your Puppet manifests to
avoid lots of duplicated code. For example, consider the following manifest, which creates
several files with identical properties (iteration_simple.pp):
file { '/usr/local/bin/task1':
  content => "echo I am task1\n",
  mode    => '0755',
}
file { '/usr/local/bin/task2':
  content => "echo I am task2\n",
  mode    => '0755',
}
file { '/usr/local/bin/task3':
  content => "echo I am task3\n",
  mode    => '0755',
}
You can see that each of these resources is identical, except for the task number: task1,
task2, and task3. Clearly, this is a lot of typing and should you later decide to change the
properties of these scripts (for example, moving them to a different directory), you'll have to
find and change each one in the manifest. For three resources, this is already annoying, but
for thirty or a hundred resources it's completely impractical. We need a better solution.
Using the each function
Puppet provides the each function to help with just this kind of situation. The each function
takes an array and applies a block of Puppet code to each element of the array. Here's
the same example we saw previously, only this time using an array and the each function:
$tasks = ['task1', 'task2', 'task3']
$tasks.each | $task | {
  file { "/usr/local/bin/${task}":
    content => "echo I am ${task}\n",
    mode    => '0755',
  }
}
Now this looks more like a computer program! We have a loop, created by the each
function. The loop goes round and round, creating a new file resource for each element of
the $tasks array. Let's look at a schematic version of an each loop:
ARRAY.each | ELEMENT | {
  BLOCK
}
The following list describes the components of the each loop:
ARRAY can be any Puppet array variable or literal value (it could even be a call to
Hiera that returns an array). In the previous example, we used $tasks as the array.
ELEMENT is the name of a variable which will hold, each time round the loop, the
value of the current element in the array. In the previous example, we decided to
name this variable $task, although we could have called it anything.
BLOCK is a section of Puppet code. This could consist of a function call, resource
declarations, include statements, conditional statements: anything which you can
put in a Puppet manifest, you can also put inside a loop block. In the previous
example, the only thing in the block was the file resource, which creates /usr/
Iterating over hashes
The each function works not only on arrays, but also on hashes. When iterating over a hash,
the loop takes two ELEMENT parameters: the first is the hash key, and the second is the
value. The following example shows how to use each to iterate over a hash resulting from a
Facter query (iteration_hash.pp):
$nics = $facts['networking']['interfaces']
$nics.each | String $interface, Hash $attributes | {
  notice("Interface ${interface} has IP ${attributes['ip']}")
}
The list of interfaces returned by $facts['networking']['interfaces'] is a hash,
where the key is the name of the interface (for example, lo for the local loopback
interface) and the value is a hash of the interface's attributes (including the IP address,
netmask, and so on). Applying the manifest in the previous example gives this result (on my
Vagrant box):
sudo puppet apply /examples/iteration_hash.pp
Notice: Scope(Class[main]): Interface enp0s3 has IP
Notice: Scope(Class[main]): Interface lo has IP
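The same two-parameter form works on any hash, not just Facter data. As a small sketch, here it is applied to a hash literal like the $heights example from earlier in the chapter. The parameter names $person and $height are arbitrary, and the type annotations are optional:

```puppet
$heights = {
  'john'   => 193,
  'rabiah' => 120,
}

# First parameter receives the key, second receives the value
$heights.each | String $person, Integer $height | {
  notice("${person} is ${height}cm tall")
}
```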
In this chapter, we've gained an understanding of how Puppet's variable and data type
system works, including the basic data types: Strings, Numbers, Booleans, Arrays, and
Hashes. We've seen how to interpolate variables in strings and how to quickly create sets
of similar resources using an array of resource names. We've learned how to set common
attributes for resources using a hash of attribute-value pairs and the attribute splat operator.
We've seen how to use variables and values in expressions, including arithmetic expressions,
and explored the range of Puppet's comparison operators to generate Boolean expressions.
We've used conditional expressions to build if…else and case statements and had a brief
introduction to regular expressions.
We've learned how Puppet's Facter subsystem supplies information about the node via the
facts hash and how to use facts in our own manifests and in expressions. We've pointed out
some key facts, including the operating system release, the system memory capacity, and the
system hostname. We've seen how to create custom external facts, such as a cloud fact,
and how to dynamically generate fact information using executable facts.
Finally, we've learned about iteration in Puppet using the each function and how to create
multiple resources based on data from arrays or hashes, including Facter queries.
In the next chapter, we'll stay with the topic of data and explore Puppet's powerful Hiera
database. We'll see what problems Hiera solves, look at how to set up and query Hiera, how
to write data sources, how to create Puppet resources directly from Hiera data, and also how
to use Hiera encryption to manage secret data.
Managing data with Hiera
What you don't know can't hurt me.
—Edward S. Marshall
In this chapter, you will learn why it's useful to separate your data and code. You will see how
to set up Puppet's built-in Hiera mechanism, how to use it to store and query configuration
data, including encrypted secrets such as passwords, and how to use Hiera data to create
Puppet resources.
Why Hiera?
What do we mean by configuration data? There will be lots of pieces of information in your
manifests which we can regard as configuration data: for example, the values of all your
resource attributes. Look at the following example:
package { 'puppet-agent':
  ensure => '5.2.0-1xenial',
}
The preceding manifest declares that version 5.2.0-1xenial of the puppet-agent
package should be installed. But what happens when a new version of Puppet is released?
When you want to upgrade to it, you'll have to find this code, possibly deep in multiple levels
of directories, and edit it to change the desired version number.
Data needs to be maintained
Multiply this by all the packages managed throughout your manifest, and there's
already a problem. But this is just one piece of data that needs to be maintained, and there
are many more: the times of cron jobs, the email addresses for reports to be sent to, the
URLs of files to fetch from the web, the parameters for monitoring checks, the amount of
memory to configure for the database server, and so on. If these values are embedded in
code in hundreds of manifest files, you're setting up trouble for the future.
How can you make your config data easy to find and maintain?
Settings depend on nodes
Mixing data with code makes it harder to find and edit that data. But there's another
problem. What if you have two nodes to manage with Puppet, and there's a config value
which needs to be different on each of them? For example, they might both have a cron job
to run the backup, but the job needs to run at a different time on each node.
How can you use different values for different nodes, without having lots of complicated
logic in your manifest?
Operating systems differ
What if you have some nodes running Ubuntu 16, and some on Ubuntu 18? As you'll know if
you've ever had to upgrade the operating system on a node, things change from one version
to the next. For example, the name of the database server package might have changed from
mysql-server to mariadb-server.
How can you find the right value to use in your manifest depending on what operating
system the node is running?
Chapter 6
The Hiera way
What we want is a kind of central database in Puppet where we can look up configuration
settings. The data should be stored separately from Puppet code, and it should be easy to find
and edit values. It should be possible to look up values with a simple function call in Puppet
code or templates. Further, we need to be able to specify different values depending on
things like the hostname of the node, the operating system, or potentially anything else.
We would also like to be able to enforce a particular data type for values, such as String or
Boolean. The database should do all of this work for us, and just return the appropriate value
to the manifest where it's needed.
Fortunately, Hiera does exactly this. Hiera lets you store your config data in simple text files
(actually, YAML, JSON, or HOCON files, which use popular structured text formats), and it
looks like the following example:
test: 'This is a test'
consul_node: true
apache_worker_factor: 100
apparmor_enabled: true
In your manifest, you query the database using the lookup() function, as in the following
example (lookup.pp):
file { lookup('backup_path', String):
  ensure => directory,
}
The arguments to lookup are the name of the Hiera key you want to retrieve (for example
backup_path), and the expected data type (for example String).
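The lookup function also accepts optional third and fourth arguments: a merge behavior and a default value to return when the key is not found. The following sketch assumes a hypothetical fallback path of /var/backups; it is not part of the book's example data:

```puppet
# 'first' is the standard merge behavior (use the first value found);
# '/var/backups' is a hypothetical default used if backup_path is not set
$backup_path = lookup('backup_path', String, 'first', '/var/backups')

file { $backup_path:
  ensure => directory,
}
```

Without a default, a lookup of a missing key causes the catalog compilation to fail, which is often what you want for required settings.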
Setting up Hiera
Hiera needs to know one or two things before you can start using it, which are specified in
the Hiera configuration file, named hiera.yaml (not to be confused with Hiera data
files, which are also YAML files; we'll find out about those later in this chapter). Each Puppet
environment has its own local Hiera config file, located at the root of the environment
directory (for example, for the production environment, the local Hiera config file would be
Managing data with Hiera
[ 84 ]
Hiera can also use a global config file located at /etc/puppetlabs/puppet/hiera.yaml,
which takes precedence over the per-environment file, but the Puppet documentation
recommends you only use this config layer for certain exceptional purposes, such as
temporary overrides; all your normal Hiera data and configuration should live at the
environment layer.
The following example shows a minimal hiera.yaml file (hiera_minimal.config.yaml):
---
version: 5

defaults:
  datadir: data
  data_hash: yaml_data

hierarchy:
  - name: "Common defaults"
    path: "common.yaml"
YAML files begin with three dashes and a newline (---). This is part of the YAML format, not
a Hiera feature; it's the syntax indicating the start of a new YAML document.
The most important setting in the defaults section is datadir. This tells Hiera in which
directory to look for its data files. Conventionally, this is in a data/ subdirectory of the
Puppet manifest directory, but you can change this if you need to.
Large organizations may find it useful to manage Hiera data files
separately to Puppet code, perhaps in a separate Git repo (for
example, you might want to give certain people permission to edit
Hiera data, but not Puppet manifests).
The hierarchy section is also interesting. This tells Hiera which files to read for its data and
in which order. In the example only Common defaults is defined, telling Hiera to look for
data in a file called common.yaml. We'll see later in this chapter what else you can do with
the hierarchy section.
Adding Hiera data to your Puppet repo
Your Vagrant VM is already set up with a suitable Hiera config and the sample data file, in the
/etc/puppetlabs/code/environments/pbg directory. Try it now by running the following
command:
sudo puppet lookup --environment pbg test
--- This is a test
We haven't seen the --environment switch before, so it's time to briefly
introduce Puppet environments. A Puppet environment is a directory
containing a Hiera config file, Hiera data, and a set of Puppet manifests: in other
words, a complete, self-contained Puppet setup. Each environment lives in
a named directory under /etc/puppetlabs/code/environments.
The default environment is production, but you can use any environment
you like by giving the --environment switch to the puppet lookup
command. In the example, we are telling Puppet to use the
/etc/puppetlabs/code/environments/pbg directory.
When you come to add Hiera data to your own Puppet environment, you can use the
example hiera.yaml and data files as a starting point.
Troubleshooting Hiera
If you don't get the result This is a test, your Hiera setup is not working properly. If
you see the warning Config file not found, using Hiera defaults, check that
your Vagrant box has an /etc/puppetlabs/code/environments/pbg directory. If not,
destroy and re-provision your Vagrant box with:
vagrant destroy
If you see an error like the following, it generally indicates a problem with the Hiera data file:
Error: Evaluation Error: Error while evaluating a Function Call,
(/etc/puppetlabs/code/environments/pbg/hiera.yaml): did not find
expected key while parsing a block mapping at line 11 column 5 at
line 1:8 on node ubuntu-xenial
If this is the case, check the syntax of your Hiera data files.
Querying Hiera
In Puppet manifests, you can use the lookup() function to query Hiera for the specified key
(you can think of Hiera as a key-value database, where the keys are strings, and values can be
any type).
In general, you can use a call to lookup() anywhere in your Puppet manifests you might
otherwise use a literal value. The following code shows some examples of this (lookup2.pp):
notice("Apache is set to use ${lookup('apache_worker_factor', Integer)} workers")

unless lookup('apparmor_enabled', Boolean) {
  exec { 'apt-get -y remove apparmor': }
}

notice('dns_allow_query enabled: ', lookup('dns_allow_query', Boolean))
To apply this manifest in the example environment, run the following command:
sudo puppet apply --environment pbg /examples/lookup2.pp
Notice: Scope(Class[main]): Apache is set to use 100 workers
Notice: Scope(Class[main]): dns_allow_query enabled: true
Typed lookups
As we've seen, lookup() takes a second parameter which specifies the expected type of the
value to be retrieved. Although this is optional, you should always specify it, to help catch
errors. If you accidentally look up the wrong key, or mistype the value in the data file, you'll
get an error like this:
Error: Evaluation Error: Error while evaluating a Function Call,
Found value has wrong type, expects a Boolean value, got String at
/examples/lookup_type.pp:1:8 on node ubuntu-xenial
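The type you pass to lookup() can be more specific than a bare String or Boolean. The following sketch uses keys from the chapter's sample data file; the narrower, parameterized types are assumptions about what a manifest might reasonably expect, not something taken from the book's examples:

```puppet
# Sketch: keys are from the sample data; the stricter types are illustrative.
$workers     = lookup('apache_worker_factor', Integer[1])      # 1 or greater
$monitor_ips = lookup('monitor_ips', Array[String])            # every element a String
$cobbler     = lookup('cobbler_config', Hash[String, Boolean]) # String keys, Boolean values

notice("Apache is set to use ${workers} workers")
```

The tighter the type, the earlier a typo in the data shows up as a compilation error instead of a misconfigured node.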
Types of Hiera data
As we've seen, Hiera data is stored in text files, structured using the format called YAML Ain't
Markup Language, which is a common way of organizing data. Here's another snippet from
our sample Hiera data file, which you'll find at
/etc/puppetlabs/code/environments/pbg/data/common.yaml on the VM:
syslog_server: ''
monitor_ips:
  - ''
  - ''
  - ''
  - ''
cobbler_config:
  manage_dhcp: true
  pxe_just_once: true
There are actually three different kinds of Hiera data structures present: single values,
arrays, and hashes. We'll examine these in detail in a moment.
Single values
Most Hiera data consists of a key associated with a single value, as in the previous example:
syslog_server: ''
The value can be any legal Puppet value, such as a String, as in this case, or it can be an
Integer:
apache_worker_factor: 100
Boolean values
You should specify Boolean values in Hiera as either true or false, without surrounding
quotes. The YAML parser is fairly liberal in what it reads as a Boolean value: unquoted true,
on, or yes are read as a true value, and unquoted false, off, or no as a false value. A quoted
value such as 'true' is a String, however, and will fail a Boolean typed lookup. For clarity,
stick to the following format:
consul_node: true
When you use lookup() to return a Boolean value in your Puppet code, you can use it as
the conditional expression in, for example, an if statement:
if lookup('is_production', Boolean) {
  # resources for production nodes go here
}
Arrays
Usefully, Hiera can also store an array of values associated with a single key:
monitor_ips:
  - ''
  - ''
  - ''
  - ''
The key (monitor_ips) is followed by a list of values, each on its own line and preceded by
a hyphen (-). When you call lookup('monitor_ips', Array) in your code, the values
will be returned as a Puppet array.
Hashes
As we saw in Chapter 5, Variables, expressions, and facts, a hash (also called a dictionary
in some programming languages) is like an array where each value has an identifying name
(called the key), as in the following example:
cobbler_config:
  manage_dhcp: true
  pxe_just_once: true
Each key-value pair in the hash is listed, indented, on its own line. The cobbler_config
hash has two keys, manage_dhcp and pxe_just_once. The value associated with each of
those keys is true.
When you call lookup('cobbler_config', Hash) in a manifest, the data will be
returned as a Puppet hash, and you can reference individual values in it using the normal
Puppet hash syntax, as we saw in Chapter 5, Variables, expressions, and facts (lookup_
$cobbler_config = lookup('cobbler_config', Hash)
$manage_dhcp = $cobbler_config['manage_dhcp']
$pxe_just_once = $cobbler_config['pxe_just_once']
if $pxe_just_once {
  notice('pxe_just_once is enabled')
} else {
  notice('pxe_just_once is disabled')
}
Since it's very common for Hiera data to be a hash of hashes, you can retrieve values from
several levels down in a hash by using the following "dot notation" (lookup_hash_dot.pp):
$web_root = lookup('cms_parameters.static.web_root', String)
notice("web_root is ${web_root}")
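To see how the two styles relate, here is a small sketch that reads the same value from the cobbler_config hash in both ways. It assumes the cobbler_config data from the sample data file and is not one of the book's example manifests:

```puppet
# Sketch: fetch the whole hash and index into it...
$cobbler_config = lookup('cobbler_config', Hash)
$via_hash       = $cobbler_config['pxe_just_once']

# ...or ask Hiera for the nested value directly with dot notation.
$via_dot = lookup('cobbler_config.pxe_just_once', Boolean)

notice("pxe_just_once via hash: ${via_hash}, via dot notation: ${via_dot}")
```

Dot notation is convenient when you need a single nested value; fetching the whole hash makes more sense when you need several of its keys.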
Interpolation in Hiera data
Hiera data is not restricted to literal values; it can also include the value of Facter facts or
Puppet variables, as in the following example:
backup_path: "/backup/%{facts.hostname}"
Anything within the %{} delimiters inside a quoted string is evaluated and interpolated by
Hiera. Here, we're using the dot notation to reference a value inside the $facts hash.
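From the manifest's point of view, an interpolated value looks like any other string. The following sketch assumes the backup_path line shown above is present in the Hiera data:

```puppet
# Sketch: assumes the data line  backup_path: "/backup/%{facts.hostname}"
# Hiera fills in the fact before returning the value, so the manifest simply
# receives a finished string (for a node whose hostname is web1, /backup/web1).
$backup_path = lookup('backup_path', String)

notice("Backups for ${facts['hostname']} go to ${backup_path}")
```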
Using lookup()
Helpfully, you can also interpolate Hiera data in Hiera data, by using the lookup() function
as part of the value. This can save you repeating the same value many times, and can make
your data more readable, as in the following example (also from hiera_sample.yaml):
ips:
  home: ''
  office1: ''
  office2: ''
firewall_allow_list:
  - "%{lookup('ips.home')}"
  - "%{lookup('ips.office1')}"
  - "%{lookup('ips.office2')}"
This is much more readable than simply listing a set of IP addresses with no indication of
what they represent, and it prevents you accidentally introducing errors by updating a value
in one place but not another. Use Hiera interpolation to make your data self-documenting.
Using alias()
When you use the lookup() function in a Hiera string value, the result is always a string.
This is fine if you're working with string data, or if you want to interpolate a Hiera value into
a string containing other text. However, if you're working with arrays, hashes, or Boolean
values, you need to use the alias() function instead. This lets you re-use any Hiera data
structure within Hiera, just by referencing its name:
firewall_allow_list:
  - "%{lookup('ips.home')}"
  - "%{lookup('ips.office1')}"
  - "%{lookup('ips.office2')}"
vpn_allow_list: "%{alias('firewall_allow_list')}"
Don't be fooled by the surrounding quotes: it may look as though vpn_allow_list will be
a string value, but because we are using alias(), it will actually be an array, just like the
value it is aliasing (firewall_allow_list).
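One way to convince yourself of this is to look up both keys with an Array type. This is a sketch, assuming the firewall_allow_list and vpn_allow_list data shown above:

```puppet
# Sketch: both lookups are type-checked as arrays of strings. If
# vpn_allow_list were a plain string, the second lookup would fail.
$firewall_allow_list = lookup('firewall_allow_list', Array[String])
$vpn_allow_list      = lookup('vpn_allow_list', Array[String])

if $vpn_allow_list == $firewall_allow_list {
  notice('vpn_allow_list is the same array as firewall_allow_list')
}
```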
Using literal()
Because the percent character (%) tells Hiera to interpolate a value, you might be wondering
how to specify a literal percent sign in data. For example, Apache uses the percent sign in
its configuration to refer to variable names like %{HTTP_HOST}. To write values like these
in Hiera data, we need to use the literal() function, which exists only to refer to a literal
percent character. For example, to write the value %{HTTP_HOST} as Hiera data, we would
need to write:
%{literal('%')}{HTTP_HOST}
You can see a more complicated example in the sample Hiera data file:
comment: "Force WWW"
rewrite_cond: "%{literal('%')}{HTTP_HOST} !^www\\. [NC]"
rewrite_rule: "^(.*)$ https://www.%{literal('%')}{HTTP_HOST}%{literal('%')}{REQUEST_URI} [R=301,L]"
The hierarchy
So far, we've only used a single Hiera data source (common.yaml). Actually, you can have as
many data sources as you like. Each usually corresponds to a YAML file, and they are listed
in the hierarchy section of the hiera.yaml file, with the highest-priority source first and
the lowest last:
hierarchy:
  - name: "Host-specific data"
    path: "nodes/%{facts.hostname}.yaml"
  - name: "OS release-specific data"
    path: "os/%{facts.os.release.major}.yaml"
  - name: "OS distro-specific data"
    path: "os/%{facts.os.distro.codename}.yaml"
  - name: "Common defaults"
    path: "common.yaml"
In general, though, you should keep as much data as possible in the common.yaml file,
simply because it's easier to find and maintain data if it's in one place, rather than scattered
through several files.
For example, if you have some Hiera data which is only used on the monitor node, you
might be tempted to put it in a nodes/monitor.yaml file. But, unless it has to override
some settings in common.yaml, you'll just be making it harder to find and update. Put
everything in common.yaml that you can, and reserve other data sources only for overrides
to common values.
Dealing with multiple values
You may be wondering what happens if the same key is listed in more than one Hiera data
source. For example, imagine the first source contains the following:
consul_node: false
Also, assume that common.yaml contains:
consul_node: true
What happens when you call lookup('consul_node', Boolean) with this data? There
are two different values for consul_node in two different files, so which one does Hiera
return?
The answer is that Hiera searches data sources in the order they are listed in the hierarchy
section; that is to say, in priority order. It returns the first value found, so if there are multiple
values, only the value from the first (that is, highest-priority) data source will be returned
(that's the "hierarchy" part).
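A short sketch makes the outcome concrete. It assumes exactly the data described above: consul_node is false in a higher-priority source that applies to this node, and true in common.yaml:

```puppet
# Sketch: with the data described above, the higher-priority value (false)
# is returned and the value in common.yaml is never consulted.
if lookup('consul_node', Boolean) {
  notice('consul_node is true: the value came from common.yaml')
} else {
  notice('consul_node is false: the higher-priority source won')
}
```

On a node where the higher-priority source does not define consul_node at all, the same manifest would take the first branch instead.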
Merge behaviors
We said in the previous section that if there is more than one value matching the specified
key, the first matching data source takes priority over the others. This is the default behavior,
and this is what you'll usually want. However, sometimes you may want lookup() to return
the union of all the matching values found, throughout the hierarchy. Hiera allows you to
specify which of these strategies it should use when multiple values match your lookup.
This is called a merge behavior, and you can specify which merge behavior you want as the
third argument to lookup(), after the key and data type (lookup_merge.pp):
notice(lookup('firewall_allow_list', Array, 'unique'))
The default merge behavior is called first, and it returns only one value, the first found. By
contrast, the unique merge behavior returns all the values found, as a flattened array, with
duplicates removed (hence unique).
If you are looking up hash data, you can use the hash merge behavior to return a merged
hash containing all the keys and values from all matching hashes found. If Hiera finds two
hash keys with the same name, only the value of the first will be returned. This is known
as a shallow merge. If you want a deep merge (that is, one where matching hashes will be
merged at all levels, instead of just the top level) use the deep merge behavior.
If this all sounds a bit complicated, don't worry. The default merge behavior is probably what
you want most of the time, and if you should happen to need one of the other behaviors
instead, you can read more about it in the Puppet documentation.
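The four merge behaviors described in this section can be compared side by side in a short sketch. The file names and IP addresses in the comments are hypothetical data invented for illustration, and the results noted are what the descriptions above lead us to expect on a node named web1:

```puppet
# Sketch with hypothetical data (file names and values are assumptions):
#
#   nodes/web1.yaml:  firewall_allow_list: ['10.0.0.1', '10.0.0.2']
#   common.yaml:      firewall_allow_list: ['10.0.0.2', '10.0.0.3']

# 'first' (the default): only the highest-priority value,
# expected here to be ['10.0.0.1', '10.0.0.2'].
$first = lookup('firewall_allow_list', Array, 'first')

# 'unique': every value found, flattened, with duplicates removed,
# expected here to contain 10.0.0.1, 10.0.0.2, and 10.0.0.3.
$unique = lookup('firewall_allow_list', Array, 'unique')

# 'hash' merges only the top-level keys of matching hashes;
# 'deep' also merges nested hashes at every level.
$shallow = lookup('cms_parameters', Hash, 'hash')
$deep    = lookup('cms_parameters', Hash, 'deep')

notice("first: ${first}, unique: ${unique}")
```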
Data sources based on facts
The hierarchy mechanism lets you set common default values for all situations (usually in
common.yaml), but override them in specific circumstances. For example, you can set a data
source in the hierarchy based on the value of a Puppet fact, such as the hostname:
  - name: "Host-specific data"
    path: "nodes/%{facts.hostname}.yaml"
Hiera will look up the value of the specified fact and search for a data file with that name in
the nodes/ directory. In the previous example, if the node's hostname is web1, Hiera will
look for the data file nodes/web1.yaml in the Hiera data directory. If this file exists and
contains the specified Hiera key, the web1 node will receive that value for its lookup, while
other nodes will get the default value from common.
Note that you can organize your Hiera data les in subdirectories
under the main data/ directory if you like, such as data/nodes/.
Another useful fact to reference in the hierarchy is the operating system major version or
codename. This is very useful when you need your manifest to work on more than one release
of the operating system. If you have more than a handful of nodes, migrating to the latest OS
release is usually a gradual process, upgrading one node at a time. If something has changed
from one version to the next that affects your Puppet manifest, you can use the
os.distro.codename fact to select the appropriate Hiera data, as in the following example:
  - name: "OS-specific data"
    path: "os/%{facts.os.distro.codename}.yaml"
Alternatively, you can use the os.release.major fact:
  - name: "OS-specific data"
    path: "os/%{facts.os.release.major}.yaml"
For example, if your node is running Ubuntu 16.04 Xenial, Hiera will look for a data file
named os/xenial.yaml (if you're using os.distro.codename) or os/16.04.yaml (if
you're using os.release.major) in the Hiera data directory.
For more information about facts in Puppet, see Chapter 5, Variables, expressions, and facts.
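This brings us back to the database package problem from the start of the chapter. The following is a sketch only: database_package is a hypothetical key, and which release uses which package name is an assumption made for the sake of the example:

```puppet
# Sketch with hypothetical data (key name and file contents are assumptions):
#
#   data/common.yaml:     database_package: 'mysql-server'
#   data/os/xenial.yaml:  database_package: 'mariadb-server'
#
# With an OS-specific source listed above common.yaml in the hierarchy,
# Xenial nodes get the override and all other nodes get the common default.
package { lookup('database_package', String):
  ensure => installed,
}
```

The manifest itself contains no operating system logic at all; the choice of value lives entirely in the data.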
What belongs in Hiera?
What data should you put in Hiera, and what should be in your Puppet manifests? A good
rule of thumb about when to separate data and code is to ask yourself what might change
in the future. For example, the exact version of a package is a good candidate for Hiera data,
because it's quite likely you'll need to update it in the future.
Another characteristic of data that belongs in Hiera is that it's specific to your site or
company. If you take your Puppet manifest and give it to someone else in another company
or organization, and she has to modify any values in the code to make it work at her site,
then those values should probably be in Hiera. This makes it much easier to share and re-use
code; all you have to do is edit some values in Hiera.
If the same data is needed in more than one place in your manifests, it's also a good idea for
that data to be stored in Hiera. Otherwise, you have to either repeat the data, which makes it
harder to maintain, or use a global variable, which is bad style in any programming language,
and especially so in Puppet.
If you have to change a data value when you apply your manifests on a different operating
system, that's also a candidate for Hiera data. As we've seen in this chapter, you can use the
hierarchy to select the correct value based on facts, such as the operating system or version.
One other kind of data that belongs in Hiera is parameter values for classes and modules;
we'll see more about that in Chapter 7, Mastering modules.
Creating resources with Hiera data
When we started working with Puppet, we created resources directly in the manifest using
literal attribute values. In this chapter, we've seen how to use Hiera data to fill in the title
and attributes of resources in the manifest. We can now take this idea one step further and
create resources directly from Hiera queries. The advantage of this method is that we can
create any number of resources of any type, based purely on data.
Building resources from Hiera arrays
In Chapter 5, Variables, expressions, and facts, we learned how to use Puppet's each
function to iterate over an array or hash, creating resources as we go. Let's apply this
technique to some Hiera data. In our first example, we'll create some user resources from a
Hiera array.
Run the following command:
sudo puppet apply --environment pbg /examples/hiera_users.pp
Notice: /Stage[main]/Main/User[katy]/ensure: created
Notice: /Stage[main]/Main/User[lark]/ensure: created
Notice: /Stage[main]/Main/User[bridget]/ensure: created
Notice: /Stage[main]/Main/User[hsing-hui]/ensure: created
Notice: /Stage[main]/Main/User[charles]/ensure: created
Here's the data we're using (from the
/etc/puppetlabs/code/environments/pbg/data/common.yaml file):
users:
  - 'katy'
  - 'lark'
  - 'bridget'
  - 'hsing-hui'
  - 'charles'
And here's the code which reads it and creates the corresponding user instances
(hiera_users.pp):
lookup('users', Array[String]).each | String $username | {
  user { $username:
    ensure => present,
  }
}
Combining Hiera data with resource iteration is a powerful idea. This short manifest could
manage all the users in your infrastructure, without you ever having to edit the Puppet code
to make changes. To add new users, you need only edit the Hiera data.
Building resources from Hiera hashes
Of course, real life is never quite as simple as a programming language example. If you were
really managing users with Hiera data in this way, you'd need to include more data than just
their names: you'd need to be able to manage shells, UIDs, and so on, and you'd also need to
be able to remove the users if necessary. To do that, we will need to add some structure to
the Hiera data.
Run the following command:
sudo puppet apply --environment pbg /examples/hiera_users2.pp
Notice: Compiled catalog for ubuntu-xenial in environment pbg in 0.05
Notice: /Stage[main]/Main/User[katy]/uid: uid changed 1001 to 1900
Notice: /Stage[main]/Main/User[katy]/shell: shell changed '' to '/bin/
Notice: /Stage[main]/Main/User[lark]/uid: uid changed 1002 to 1901
Notice: /Stage[main]/Main/User[lark]/shell: shell changed '' to '/bin/
Notice: /Stage[main]/Main/User[bridget]/uid: uid changed 1003 to 1902
Notice: /Stage[main]/Main/User[bridget]/shell: shell changed '' to '/
Notice: /Stage[main]/Main/User[hsing-hui]/uid: uid changed 1004 to
Notice: /Stage[main]/Main/User[hsing-hui]/shell: shell changed '' to
Notice: /Stage[main]/Main/User[charles]/uid: uid changed 1005 to 1904
Notice: /Stage[main]/Main/User[charles]/shell: shell changed '' to '/
Notice: Applied catalog in 0.17 seconds
The first difference from the previous example is that instead of the data being a simple
array, it's a hash of hashes:
users2:
  'katy':
    ensure: present
    uid: 1900
    shell: '/bin/bash'
  'lark':
    ensure: present
    uid: 1901
    shell: '/bin/sh'
  'bridget':
    ensure: present
    uid: 1902
    shell: '/bin/bash'
  'hsing-hui':
    ensure: present
    uid: 1903
    shell: '/bin/sh'
  'charles':
    ensure: present
    uid: 1904
    shell: '/bin/bash'
Here's the code which processes that data (hiera_users2.pp):
lookup('users2', Hash, 'hash').each | String $username, Hash $attrs | {
  user { $username:
    * => $attrs,
  }
}
Each of the keys in the users2 hash is a username, and each value is a hash of user
attributes such as uid and shell.
When we call each on this hash, we specify two parameters to the loop instead of one:
| String $username, Hash $attrs |
As we saw in Chapter 5, Variables, expressions, and facts, when iterating over a hash, these
two parameters receive the hash key and its value, respectively.
Inside the loop, we create a user resource for each element of the hash:
user { $username:
  * => $attrs,
}
You may recall from the previous chapter that the * operator (the attribute splat operator)
tells Puppet to treat $attrs as a hash of attribute-value pairs. So the first time round
the loop, with user katy, Puppet will create a user resource equivalent to the following:
user { 'katy':
  ensure => present,
  uid    => 1900,
  shell  => '/bin/bash',
}
Every time we go round the loop with the next element of users2, Puppet will create another
user resource with the specified attributes.
The advantages of managing resources with Hiera data
The previous example makes it easy to manage users across your network without having
to edit Puppet code: if you want to remove a user, for example, you would simply change
her ensure attribute in the Hiera data to absent. Although each of the users happens to
have the same set of attributes specified, this isn't essential; you could add any attribute
supported by the Puppet user resource to any user in the data. Also, if there's an attribute
whose value is always the same for all users, you need not list it in the Hiera data for every
user. You can add it as a literal attribute value of the user resource inside the loop, and thus
every user will have it.
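As a sketch of that last point, here is the same loop with one literal attribute added alongside the splat. The choice of managehome is an assumption made for illustration; it is not part of the book's hiera_users2.pp example:

```puppet
# Sketch: managehome is set for every user as a literal attribute. It must
# not also appear in the Hiera data for any user, because Puppet rejects an
# attribute that is set both literally and through the splat.
lookup('users2', Hash, 'hash').each | String $username, Hash $attrs | {
  user { $username:
    managehome => true,
    *          => $attrs,
  }
}
```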
This makes it easier to add and update users on a routine basis, but there are other
advantages too: for example, you could write a simple web application which allowed HR
staff to add or edit users using a browser interface, and it would only need to output a YAML
file with the required data. This is much easier and more robust than trying to generate
Puppet code automatically. Even better, you could pull user data from an LDAP or Active
Directory (AD) server and put it into Hiera YAML format for input into this manifest.
This is a very powerful and flexible technique, and of course you can use it to manage any
kind of Puppet resource: files, packages, Apache virtual hosts, MySQL databases. Anything
you can do with a resource you can do with Hiera data and each. You can also use Hiera's
override mechanism to create different sets of resources for different nodes, roles, or
operating systems.
However, you shouldn't over-use this technique. Creating resources from Hiera data adds
a layer of abstraction which makes it harder to understand the code for anyone trying to
read or maintain it. With Hiera, it can also be difficult to work out from inspection exactly
what data the node will get in a given set of circumstances. Keep your hierarchy as simple
as possible, and reserve the data-driven resources trick for situations where you have a
large and variable number of resources which you need to update frequently. In Chapter 11,
Orchestrating cloud resources, we'll see how to use the same technique to manage cloud
instances, for example.
Managing secret data
Puppet often needs to know your secrets; for example, passwords, private keys, and
other credentials need to be configured on the node, and Puppet must have access to this
information. The problem is how to make sure that no-one else does. If you are checking this
data into a Git repo, it will be available to anybody who has access to the repo, and if it's a
public GitHub repo, everybody in the world can see it.
Clearly, it's essential to be able to encrypt secret data in such a way that Puppet can decrypt
it on individual nodes where it's needed, but it's indecipherable to anybody who does
not have the key. The popular GnuPG encryption tool is a good choice for this. It lets you
encrypt data using a public key which can be distributed widely, but only someone with the
corresponding private key can decrypt the information.
Hiera has a pluggable backend system which allows it to support various different ways of
storing data. One such backend is called hiera-eyaml-gpg, which allows Hiera to use a
GnuPG-encrypted data store. Rather than encrypting a whole data file, hiera-eyaml-gpg
lets you mix encrypted and plaintext data in the same YAML file. That way, even someone
who doesn't have the private key can still edit and update the plaintext values in Hiera data
files, although the encrypted data values will be unreadable to them.
Setting up GnuPG
First, we'll need to install GnuPG and create a key pair for use with Hiera. The following
instructions will help you do this:
1. Run the following command:
sudo apt-get install gnupg rng-tools
2. Once GnuPG is installed, run the following command to generate a new key pair:
gpg --gen-key
3. When prompted, select the RSA and RSA key type:
Please select what kind of key you want:
(1) RSA and RSA (default)
(2) DSA and Elgamal
(3) DSA (sign only)
(4) RSA (sign only)
Your selection? 1
4. Select a 2,048 bit key size:
RSA keys may be between 1024 and 4096 bits long.
What keysize do you want? (2048) 2048
5. Enter 0 for the key expiry time:
Key is valid for? (0) 0
Key does not expire at all
Is this correct? (y/N) y
6. When prompted for a real name, email address, and comment for the key, enter
whatever is appropriate for your site:
Real name: Puppet
Email address: puppet@cat-pictures.com
You selected this USER-ID:
"Puppet <puppet@cat-pictures.com>"
Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit? o
7. When prompted for a passphrase, just hit Enter (the key can't have a passphrase,
because Puppet won't be able to supply it).
It may take a few moments to generate the key, but once this is complete, GnuPG will print
out the key fingerprint and details (yours will look different):
pub 2048R/40486112 2016-09-30
Key fingerprint = 6758 6CEE D221 7AA0 8369 FF3A FEC1 0055 4048
uid Puppet <puppet@cat-pictures.com>
sub 2048R/472954EB 2016-09-30
This key is now stored in your GnuPG keyring, and Hiera will be able to use it to encrypt and
decrypt your secret data on this node. We'll see later in the chapter how to distribute this
key to other nodes managed by Puppet.
Adding an encrypted Hiera source
A Hiera source using GPG-encrypted data needs a couple of extra parameters. Here's the
relevant section from the example hiera.yaml file:
  - name: "Secret data (encrypted)"
    lookup_key: eyaml_lookup_key
    path: "secret.eyaml"
    options:
      gpg_gnupghome: '/home/ubuntu/.gnupg'
As with normal data sources, we have a name and a path to the data file, but we also need
to specify the lookup_key function, which in this case is eyaml_lookup_key, and set
options['gpg_gnupghome'] to point to the GnuPG directory, where the decryption
key lives.
Creating an encrypted secret
You're now ready to add some secret data to your Hiera store.
1. Create a new empty Hiera data file with the following commands:
cd /etc/puppetlabs/code/environments/pbg
sudo touch data/secret.eyaml
2. Run the following command to edit the data file using the eyaml editor (which
automatically encrypts the data for you when you save it). Instead of
puppet@cat-pictures.com, use the email address that you entered when you created your
GPG key.
sudo /opt/puppetlabs/puppet/bin/eyaml edit --gpg-always-trust \
  --gpg-recipients=puppet@cat-pictures.com data/secret.eyaml
3. If the system prompts you to select your default editor, choose the editor you prefer.
If you're familiar with Vim, I recommend you choose that, but otherwise, you will
probably find nano the easiest option. (You should learn Vim, but that's a subject for
another book.)
4. Your selected editor will be started with the following text already inserted in
the file:
#| This is eyaml edit mode. This text (lines starting with #| at the top of the
#| file) will be removed when you save and exit.
#| - To edit encrypted values, change the content of the
#| block (or DEC(<num>)::GPG[]!).
#| WARNING: DO NOT change the number in the parentheses.
#| - To add a new encrypted value copy and paste a new block from
#| appropriate example below. Note that:
#| * the text to encrypt goes in the square brackets
#| * ensure you include the exclamation mark when you copy and
#| * you must not include a number when adding a new block
#| e.g. DEC::PKCS7[]! -or- DEC::GPG[]!
5. Enter the following text below the commented message, exactly as shown, including
the beginning three hyphens:
---
test_secret: DEC::GPG[This is a test secret]!
6. Save the file and exit the editor.
7. Run the following command to test that Puppet can read and decrypt your secret:
sudo puppet lookup --environment pbg test_secret
--- This is a test secret
How Hiera decrypts secrets
To prove to yourself that the secret data is actually encrypted, run the following command to
see what it looks like in the data file on disk:
cat data/secret.eyaml
test_secret: ENC[GPG,hQEMA4+8DyxHKVTrAQf/QQPL4zD2kkU7T+FhaEdptu68RA
Of course, the actual ciphertext will be different for you, since you're using a different
encryption key. The point is, though, the message is completely scrambled. GnuPG's
encryption algorithms are extremely strong; even using every computer on Earth
simultaneously, it would take (on average) many times the current age of the Universe to
unscramble data encrypted with a 2,048-bit key. (Or, to put it a different way, the chances of
decrypting the data within a reasonable amount of time are many billions to one.)
When you reference a Hiera key such as test_secret in your manifest, what happens
next? Hiera consults its list of data sources configured in hiera.yaml. The first source in
the hierarchy is secret.eyaml, which contains the key we're interested in (test_secret).
Here's the value:
ENC[GPG,hQEMA4 … EEU4cw==]
The ENC tells Hiera that this is an encrypted value, and the GPG identifies which type of
encryption is being used (hiera-eyaml supports several encryption methods, of which GPG
is one). Hiera calls the GPG subsystem to process the encrypted data, and GPG searches the
keyring to find the appropriate decryption key. Assuming it finds the key, GPG decrypts the
data and passes the result back to Hiera, which returns it to Puppet, and the result is the
plaintext:
This is a test secret
The beauty of the system is that all of this complexity is hidden from you; all you have to
do is call the function lookup('test_secret', String) in your manifest, and you get
the answer.
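For instance, a manifest might write the decrypted value into a file that only root can read. This is a sketch, not one of the book's examples: the target path is hypothetical, and test_secret is the key created earlier in this section:

```puppet
# Sketch: the target path is a hypothetical example.
file { '/root/test_secret.txt':
  ensure    => file,
  owner     => 'root',
  group     => 'root',
  mode      => '0600',
  show_diff => false, # keep the secret out of logged file diffs
  content   => lookup('test_secret', String),
}
```

Remember that once decrypted, the secret is ordinary plaintext as far as the manifest is concerned, so it is worth restricting file permissions and avoiding anything that would print the value in logs.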
Editing or adding encrypted secrets
If the secret data is stored in encrypted form, you might be wondering how to edit it when
you want to change the secret value. Fortunately, there's a way to do this. Recall that when
you first entered the secret data, you used the following command:
sudo /opt/puppetlabs/puppet/bin/eyaml edit --gpg-always-trust \
  --gpg-recipients=puppet@cat-pictures.com data/secret.eyaml
If you run the same command again, you'll find that you're looking at your original plaintext
(along with some explanatory comments):
test_secret: DEC(1)::GPG[This is a test secret]!
You can edit the This is a test secret string (make sure to leave everything else
exactly as it is, including the DEC::GPG[]! delimiters). When you save the file and close the
editor, the data will be re-encrypted using your key, if it has changed.
Don't remove the (1) in parentheses after DEC; it tells Hiera that this is an existing secret,
not a new one. As you add more secrets to this file, they will be identified with increasing
numbers.
For convenience of editing, I suggest you make a shell script, called something like
/usr/local/bin/eyaml_edit, which runs the eyaml edit command. There's an example on
your Vagrant box, at /examples/eyaml_edit.sh, which you can copy to /usr/local/bin
and edit (as before, substitute the gpg-recipients email address with the one
associated with your GPG key):
/opt/puppetlabs/puppet/bin/eyaml edit --gpg-always-trust --gpg-recipients=puppet@cat-pictures.com /etc/puppetlabs/code/environments/
Now, whenever you need to edit your secret data, you can simply run the following
command:
sudo eyaml_edit
To add a new secret, add a line like this:
new_secret: DEC::GPG[Somebody wake up Hicks]!
When you save and quit the editor, the newly-encrypted secret will be stored in the data file.
Distributing the decryption key
Now that your Puppet manifests use encrypted Hiera data, you'll need to make sure that
each node running Puppet has a copy of the decryption key. Export the key to a text file
using the following command (use your key's email address, of course):
sudo sh -c 'gpg --export-secret-key -a puppet@cat-pictures.com >key.
Copy the key.txt file to any nodes which need the key, and run the following command to
import it:
sudo gpg --import key.txt
sudo rm key.txt
Make sure that you delete all copies of the text file once you have imported the key.
Important note
Because all Puppet nodes have a copy of the decryption key, this method only
protects your secret data from someone who does not have access to the
nodes. It is still considerably better than putting secret data in your manifests
in plaintext, but it has the disadvantage that someone with access to a node
can decrypt, modify, and re-encrypt the secret data. For improved security you
should use a secrets management system where the node does not have the key,
and Puppet has read-only access to secrets. Some options here include Vault,
from HashiCorp, and Summon, from Conjur.
In this chapter we've outlined some of the problems with maintaining configuration data in
Puppet manifests, and introduced Hiera as a powerful solution. We've seen how to configure
Puppet to use the Hiera data store, and how to query Hiera keys in Puppet manifests using
lookup().
We've looked at how to write Hiera data sources, including string, array, and hash data
structures, and how to interpolate values into Hiera strings using lookup(), including
Puppet facts and other Hiera data, and how to duplicate Hiera data structures using
alias(). We've learned how Hiera's hierarchy works, and how to configure it using the
hiera.yaml file.
We've seen how our example Puppet infrastructure is configured to use Hiera data, and
demonstrated the process by looking up a data value in a Puppet manifest. In case of
problems, we also looked at some common Hiera errors, and we've discussed rules of
thumb about when to put data into Hiera.
We've explored using Hiera data to create resources, using an each loop over an array or
hash. Finally, we've covered using encrypted data with Hiera, using the hiera-eyaml-gpg
backend, and we've seen how to create a GnuPG key and use it to encrypt a secret value, and
retrieve it again via Puppet. We've explored the process Hiera uses to find and decrypt secret
data, developed a simple script to make it easy to edit encrypted data files, and outlined a
basic way to distribute the decryption key to multiple nodes.
In the next chapter, we'll look at how to find and use public modules from Puppet Forge;
how to use public modules to manage software including Apache, MySQL, and archive files;
how to use the r10k tool to deploy and manage third-party modules; and how to write and
structure your own modules.
Chapter 7
Mastering modules
There are no big problems, there are just a lot of little problems.
—Henry Ford
In this chapter you'll learn about Puppet Forge, the public repository for Puppet modules,
and you'll see how to install and use third-party modules from Puppet Forge, using the
r10k module management tool. You'll see examples of how to use three important Forge
modules: puppetlabs/apache, puppetlabs/mysql, and puppet/archive. You'll
be introduced to some useful functions provided by puppetlabs/stdlib, the Puppet
standard library. Finally, working through a complete example, you'll learn how to develop
your own Puppet module from scratch, how to add appropriate metadata for your module,
and how to upload it to Puppet Forge.
Using Puppet Forge modules
Although you could write your own manifests for everything you want to manage, you can
save yourself a lot of time and effort by using public Puppet modules wherever possible. A
module in Puppet is a self-contained unit of shareable, reusable code, usually designed to
manage one particular service or piece of software, such as the Apache web server.
What is the Puppet Forge?
The Puppet Forge is a public repository of Puppet modules, many of them officially
supported and maintained by Puppet and all of which you can download and use.
You can browse the Forge at the following URL:
One of the advantages of using a well-established tool like Puppet is that there are a large
number of mature public modules available, which cover the most common software you're
likely to need. For example, here is a small selection of the things you can manage with
public modules from Puppet Forge:
MySQL/PostgreSQL/SQL Server
Amazon AWS
Git repos
Firewalls (via iptables)
Finding the module you need
The Puppet Forge home page has a search bar at the top. Type what you're looking for into
this box, and the website will show you all the modules which match your search keywords.
Often, there will be more than one result, so how do you decide which module to use?
The best choice is a Puppet Supported module, if one is available. These are officially
supported and maintained by Puppet, and you can be confident that supported modules
will work with a wide range of operating systems and Puppet versions. Supported modules
are indicated by a yellow SUPPORTED flag in search results, or you can browse the list of all
supported modules at the following URL:
The next best option is a Puppet Approved module. While not officially supported, these
modules are recommended by Puppet and have been checked to make sure they follow best
practices and meet certain quality standards. Approved modules are indicated by a green
APPROVED flag in search results, or you can browse the list of all approved modules at the
following URL:
Assuming that a Puppet-Supported or Puppet-Approved module is not available, another
useful way to choose modules is by looking at the number of downloads. Selecting the Most
Downloads tab on the Puppet Forge search results page will sort the results by downloads,
with the most popular modules first. The most-downloaded modules are not necessarily the
best, of course, but they're usually a good place to start.
It's also worth checking the latest release date for modules. If the module you're looking at
hasn't been updated in over a year, it may be better to go with a more actively-maintained
module, if one is available. Clicking on the Latest release tab will sort search results by the
most recently updated.
You can also filter search results by operating system support and Puppet version
compatibility; this can be very useful for finding a module that works with your system.
Having chosen the module you want, it's time to add it to your Puppet infrastructure.
Using r10k
In the past, many people used to download Puppet Forge modules directly and check a copy
of them into their codebase, effectively forking the module repo (and some still do this).
There are many drawbacks to this approach. One is that your codebase becomes cluttered
with code that is not yours, and this can make it difficult to search for the code you want.
Another is that it's difficult to test your code with different versions of public modules,
without creating your own Git branches, redownloading the modules, and so on. You also
won't get future bug fixes and improvements from the Puppet Forge modules unless you
manually update your copies. In many cases, you will need to make small changes or fixes
to the modules to use them in your environment, and your version of the module will then
diverge from the upstream version, storing up maintenance problems for the future.
A much beer approach to module management, therefore, is to use the r10k tool,
which eliminates these problems. Instead of downloading the modules you need directly
and adding them to your codebase, r10k installs your required modules on each Puppet-
managed node, using a special text le called a Puppeile. r10k will manage the contents
of your modules/ directory, based enrely on the Puppeile metadata. The module code
is never checked into your codebase, but always downloaded from the Puppet Forge when
requested. So you can stay up to date with the latest releases if you want, or pin each
module to a specied version which you know works with your manifest.
r10k is the de facto standard module manager for Puppet deployments, and we'll be using it
to manage modules throughout the rest of this book.
In this example, we'll use r10k to install the puppetlabs/stdlib module. The Puppetfile
in the example repo contains a list of all the modules we'll use in this book. Here it is (we'll
look more closely at the syntax in a moment):
forge 'http://forge.puppetlabs.com'
mod 'garethr/docker', '5.3.0'
mod 'puppet/archive', '1.3.0'
mod 'puppet/staging', '2.2.0'
mod 'puppetlabs/apache', '2.0.0'
mod 'puppetlabs/apt', '3.0.0'
mod 'puppetlabs/aws', '2.0.0'
mod 'puppetlabs/concat', '4.0.1'
mod 'puppetlabs/docker_platform', '2.2.1'
mod 'puppetlabs/mysql', '3.11.0'
mod 'puppetlabs/stdlib', '4.17.1'
mod 'stahnma/epel', '1.2.2'
mod 'pbg_ntp',
  :git => 'https://github.com/bitfield/pbg_ntp.git',
  :tag => '0.1.4'
Follow these steps:
1. Run the following commands to clear out your modules/ directory, if there's
anything in it (make sure you have backed up anything here you want to keep):
cd /etc/puppetlabs/code/environments/pbg
sudo rm -rf modules/
2. Run the following command to have r10k process the example Puppetfile here and
install your requested modules:
sudo r10k puppetfile install --verbose
r10k downloads all the modules listed in the Puppetfile into the modules/ directory. All
modules in this directory will be automatically loaded by Puppet and available for use in your
manifests. To test that the stdlib module is correctly installed, run the following command:
sudo puppet apply --environment pbg -e "notice(upcase('hello'))"
Notice: Scope(Class[main]): HELLO
The upcase funcon, which converts its string argument to uppercase, is part of the stdlib
module. If this doesn't work, then stdlib has not been properly installed. As in previous
examples, we're using the --environment pbg switch to tell Puppet to look for code,
modules, and data in the /etc/puppetlabs/code/environments/pbg directory.
Understanding the Puppetle
The example Puppeile begins with the following:
forge 'http://forge.puppetlabs.com'
The forge statement species the repository where modules should be retrieved from.
There follows a group of lines beginning with mod:
mod 'garethr/docker', '5.3.0'
mod 'puppet/archive', '1.3.0'
mod 'puppet/staging', '2.2.0'
The mod statement species the name of the module (puppetlabs/stdlib) and the
specic version of the module to install (4.17.0).
Managing dependencies with generate-puppetle
r10k does not automacally manage dependencies between modules. For example,
the puppetlabs/apache module depends on having both puppetlabs/stdlib and
puppetlabs/concat installed. r10k will not automacally install these for you unless you
specify them, so you also need to include them in your Puppeile.
However, you can use the generate-puppetfile tool to nd out what dependencies you
need so that you can add them to your Puppeile.
1. Run the following command to install the generate-puppetfile gem:
sudo gem install generate-puppetfile
2. Run the following command to generate the Puppetfile for a list of specified
modules (list all the modules you need on the command line, separated by spaces):
generate-puppetfile puppetlabs/docker_platform
Installing modules. This may take a few minutes.
Your Puppetfile has been generated. Copy and paste between the
forge 'http://forge.puppetlabs.com'
# Modules discovered by generate-puppetfile
mod 'garethr/docker', '5.3.0'
mod 'puppetlabs/apt', '3.0.0'
mod 'puppetlabs/docker_platform', '2.2.1'
mod 'puppetlabs/stdlib', '4.17.1'
mod 'stahnma/epel', '1.2.2'
3. Run the following command to generate a list of updated versions and dependencies
for an existing Puppetfile:
generate-puppetfile -p /etc/puppetlabs/code/environments/pbg/
This is an extremely useful tool both for finding dependencies you need to specify in your
Puppetfile and for keeping your Puppetfile up to date with the latest versions of all the
modules you use.
Using modules in your manifests
Now that we know how to find and install public Puppet modules, let's see how to use them.
We'll work through a few examples, using the puppetlabs/mysql module to set up a
MySQL server and database, using the puppetlabs/apache module to set up an Apache
website, and using puppet/archive to download and unpack a compressed archive. After
you've tried out these examples, you should feel quite confident in your ability to find an
appropriate Puppet module, add it to your Puppetfile, and deploy it with r10k.
Using puppetlabs/mysql
Follow these steps to run the puppetlabs/mysql example:
1. If you've previously followed the steps in the Using r10k section, the required
module will already be installed. If not, run the following commands to install it:
cd /etc/puppetlabs/code/environments/pbg
sudo r10k puppetfile install
2. Run the following command to apply the manifest:
sudo puppet apply --environment=pbg /examples/module_mysql.pp
Notice: Compiled catalog for ubuntu-xenial in environment pbg in
0.89 seconds
Notice: /Stage[main]/Mysql::Server::Config/File[/etc/mysql]/
ensure: created
Notice: /Stage[main]/Mysql::Server::Config/File[/etc/mysql/
conf.d]/ensure: created
Notice: /Stage[main]/Mysql::Server::Config/File[mysql-config-
file]/ensure: defined content as '{md5}44e7aa974ab98260d7d013a2087
Notice: /Stage[main]/Mysql::Server::Install/Package[mysql-server]/
ensure: created
Notice: /Stage[main]/Mysql::Server::Root_password/Mysql_
user[root@localhost]/password_hash: password_hash changed '' to
Notice: /Stage[main]/Mysql::Server::Root_password/File[/root/.
my.cnf]/ensure: defined content as '{md5}4d59f37fc8a385c9c50f8bb32
Notice: /Stage[main]/Mysql::Client::Install/Package[mysql_client]/
ensure: created
Notice: /Stage[main]/Main/Mysql::Db[cat_pictures]/Mysql_
database[cat_pictures]/ensure: created
Notice: /Stage[main]/Main/Mysql::Db[cat_pictures]/Mysql_
user[greebo@localhost]/ensure: created
Notice: /Stage[main]/Main/Mysql::Db[cat_pictures]/Mysql_
grant[greebo@localhost/cat_pictures.*]/ensure: created
Notice: Applied catalog in 79.85 seconds
Let's take a look at the example manifest (module_mysql.pp). The first part installs the
MySQL server itself, by including the class mysql::server:
# Install MySQL and set up an example database
include mysql::server
The mysql::server class accepts a number of parameters, most of which we needn't
worry about for now, but we would like to set a couple of them for this example. Although
you can set the values for class parameters directly in your Puppet manifest code, just as
you would for resource attributes, I'll show you a better way to do it: using Hiera's automatic
parameter lookup mechanism.
We menoned briey in Chapter 6, Managing data with Hiera, that Hiera can
supply values for class and module parameters, but how does it work, exactly?
When you include a class x which takes a parameter y, Puppet automacally
searches Hiera for any keys matching the name x::y. If it nds one, it uses
that value for the parameter. Just as with any other Hiera data, you can use the
hierarchy to set dierent values for dierent nodes, roles, or operang systems.
In this example, our parameters are set in the example Hiera data le (/etc/puppetlabs/
mysql::server::root_password: 'hairline-quotient-inside-tableful'
mysql::server::remove_default_accounts: true
The root_password parameter, as you'd expect, sets the password for the MySQL root
user. We also enable remove_default_accounts, which is a security feature. MySQL
ships with various default user accounts for testing purposes, which should be turned off in
production. This parameter disables these default accounts.
Note that although we've specified the password in plain text for
the purposes of clarity, in your production manifests, this should
be encrypted, just as with any other credentials or secret data
(see Chapter 6, Managing data with Hiera).
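For comparison, here is a sketch of what it would look like to set the same two parameters directly in manifest code rather than through Hiera's automatic parameter lookup. This uses Puppet's resource-like class declaration syntax; it is shown only to illustrate the alternative and is not how the example repo does it.

```puppet
# Resource-like class declaration: the parameter values are written
# directly in the manifest instead of being looked up from Hiera.
class { 'mysql::server':
  root_password           => 'hairline-quotient-inside-tableful',
  remove_default_accounts => true,
}

# With the values stored in Hiera under the keys
# mysql::server::root_password and mysql::server::remove_default_accounts,
# the manifest would instead contain only:
#   include mysql::server
```

Keeping the values in Hiera means the manifest stays the same for every node, while the data can vary by node, role, or operating system.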
Next comes a resource declaraon:
mysql::db { 'cat_pictures':
user => 'greebo',
password => 'tabby',
host => 'localhost',
grant => ['SELECT', 'UPDATE'],
As you can see, this looks just like the built-in resources we've used before, such as the file
and package resources. In effect, the mysql module has added a new resource type to
Puppet: mysql::db. This resource models a specific MySQL database: cat_pictures in
our example.
The title of the resource is the name of the database, in this case, cat_pictures. There
follows a list of attributes. The user, password, and host attributes specify that the user
greebo should be allowed to connect to the database from localhost using the password
tabby. The grant attribute specifies the MySQL privileges that the user should have:
SELECT and UPDATE on the database.
When this manifest is applied, Puppet will create the cat_pictures database and
set up the greebo user account to access it. This is a very common pattern for Puppet
manifests which manage an application: usually, the application needs some sort of database
to store its state, and user credentials to access it. The mysql module lets you configure this
very easily.
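To illustrate how the same pattern extends to other databases, here is a sketch of a second mysql::db resource that grants a user read-only access. The database name, user name, and password are hypothetical, chosen purely for illustration; the attributes are the same ones used in the cat_pictures example.

```puppet
include mysql::server

# Hypothetical second database with a read-only reporting user.
# Only the SELECT privilege is granted, so this user cannot modify data.
mysql::db { 'cat_statistics':
  user     => 'reporter',
  password => 'example-password',
  host     => 'localhost',
  grant    => ['SELECT'],
}
```

As with the root password, a real password like this belongs in encrypted Hiera data rather than in the manifest itself.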
So we can now see the general principles of using a Puppet Forge module:
1. We add the module and its dependencies to our Puppetfile and deploy it using r10k
2. We include the class in our manifest, supplying any required parameters as Hiera data
3. Optionally, we add one or more resource declarations of a custom resource type
defined by the module (in this case, a MySQL database)
Almost all Puppet modules work in a similar way. In the next section, we'll look at some key
modules which you're likely to need in the course of managing servers with Puppet.
Using puppetlabs/apache
Most applicaons have a web interface of some kind, which usually requires a web server,
and the venerable Apache remains a popular choice. The puppetlabs/apache module not
only installs and congures Apache, but also allows you to manage virtual hosts (individual
websites, such as the frontend for your applicaon).
Here's an example manifest which uses the apache module to create a simple virtual host
serving an image le (module_apache.pp):
include apache
apache::vhost { 'cat-pictures.com':
  port          => '80',
  docroot       => '/var/www/cat-pictures',
  docroot_owner => 'www-data',
  docroot_group => 'www-data',
}
file { '/var/www/cat-pictures/index.html':
  content => "<img
  owner   => 'www-data',
  group   => 'www-data',
}
Follow these steps to apply the manifest:
1. If you've previously followed the steps in the Using r10k section, the required
module will already be installed. If not, run the following commands to install it:
cd /etc/puppetlabs/code/environments/pbg
sudo r10k puppetfile install
2. Run the following command to apply the manifest:
sudo puppet apply --environment=pbg /examples/module_apache.pp
3. To test the new website, point your browser to http://localhost:8080 (for Vagrant
users; if you're not using the Vagrant box, browse to port 80 on the server you're
managing with Puppet).
You should see a picture of a happy cat.
Let's go through the manifest and see how it works in detail.
1. It starts with the include declaration, which actually installs Apache on the server:
include apache
2. There are many parameters you could set for the apache class, but in this example,
we only need to set one, and as with the other examples, we set it using Hiera data
in the example Hiera file:
apache::default_vhost: false
This disables the default Apache 2 Test Page virtual host.
3. Next comes a resource declaraon for an apache::vhost resource, which creates
an Apache virtual host or website.
apache::vhost { 'cat-pictures.com':
port => '80',
docroot => '/var/www/cat-pictures',
docroot_owner => 'www-data',
docroot_group => 'www-data',
The tle of the resource is the domain name which the virtual host will respond to
(cat-pictures.com). The port tells Apache which port to listen on for requests.
The docroot idenes the pathname of the directory where Apache will nd the
website les on the server. Finally, the docroot_owner and docroot_group
aributes specify the user and group which should own the docroot/ directory.
4. Finally, we create an index.html le to add some content to the website, in this
case, an image of a happy cat.
file { '/var/www/cat-pictures/index.html':
content => "<img
owner => 'www-data',
group => 'www-data',
Note that port 80 on the Vagrant box is mapped to port 8080 on your local
machine, so browsing to http://localhost:8080 is the equivalent
of browsing directly to port 80 on the Vagrant box. If for some reason you
need to change this port mapping, edit your Vagrantfile (in the Puppet
Beginner's Guide repo) and look for the following line:
config.vm.network "forwarded_port", guest: 80, host:
Change these sengs as required and run the following command on your
local machine in the PBG repo directory:
vagrant reload
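Because each apache::vhost resource describes one website, serving a second site from the same server is a matter of declaring another resource. The following is a sketch only: the domain name and docroot are hypothetical, and the attribute values mirror the style of the cat-pictures.com example above (other versions of the apache module may expect different value types).

```puppet
include apache

# Hypothetical second site served by the same Apache instance.
# Apache chooses between virtual hosts on the same port by the
# domain name in the request.
apache::vhost { 'dog-pictures.example.com':
  port          => '80',
  docroot       => '/var/www/dog-pictures',
  docroot_owner => 'www-data',
  docroot_group => 'www-data',
}
```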
Using puppet/archive
While installing soware from packages is a common task, you'll also occasionally need to
install soware from archive les, such as a tarball (a .tar.gz le) or ZIP le. The puppet/
archive module is a great help for this, as it provides an easy way to download archive les
from the Internet, and it can also unpack them for you.
In the following example, we'll use the puppet/archive module to download and unpack
the latest version of the popular WordPress blogging soware. Follow these steps to apply
the manifest:
1. If you've previously followed the steps in the Using r10k secon, the required
module will already be installed. If not, run the following commands to install it:
cd /etc/puppetlabs/code/environments/pbg
sudo r10k puppetfile install
2. Run the following command to apply the manifest:
sudo puppet apply --environment=pbg /examples/module_archive.pp
Notice: Compiled catalog for ubuntu-xenial in environment
production in 2.50 seconds
Notice: /Stage[main]/Main/Archive[/tmp/wordpress.tar.gz]/ensure:
download archive from https://wordpress.org/latest.tar.gz to /tmp/
wordpress.tar.gz and extracted in /var/www with cleanup
Unlike the previous modules in this chapter, there's nothing to install with archive, so we
don't need to include the class itself. All you need to do is declare an archive resource.
Let's look at the example in detail to see how it works (module_archive.pp):
archive { '/tmp/wordpress.tar.gz':
  ensure       => present,
  extract      => true,
  extract_path => '/var/www',
  source       => 'https://wordpress.org/latest.tar.gz',
  creates      => '/var/www/wordpress',
  cleanup      => true,
}
1. The title gives the path to where you want the archive file to be downloaded
(/tmp/wordpress.tar.gz). Assuming you don't need to keep the archive file after it's
been unpacked, it's usually a good idea to put it in /tmp.
2. The extract attribute determines whether or not Puppet should unpack the
archive; this should usually be set to true.
3. The extract_path aribute species where to unpack the contents of the archive.
In this case, it makes sense to extract it to a subdirectory of /var/www/, but this
will vary depending on the nature of the archive. If the archive le contains soware
which will be compiled and installed, for example, it may be a good idea to unpack it
in /tmp/, so that the les will be automacally cleaned up aer the next reboot.
4. The source aribute tells Puppet where to download the archive from, usually (as
in this example) a web URL.
5. The creates aribute works exactly the same way as creates on an exec
resource, which we looked at in Chapter 4, Understanding Puppet resources. It
species a le which unpacking the archive will create. If this le exists, Puppet
knows the archive has already been unpacked, so it does not need to unpack it
6. The cleanup aribute tells Puppet whether or not to delete the archive le once
it has been unpacked. Usually, this will be set to true, unless you need to keep the
archive around or unless you don't need to unpack it in the rst place.
Once the le has been deleted by cleanup, Puppet won't redownload the
archive le /tmp/wordpress.tar.gz the next me you apply the manifest,
even though it has ensure => present. The creates clause tells Puppet
that the archive has already been downloaded and extracted.
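The same set of attributes applies to other archive formats. As a sketch, here is how a ZIP file might be handled; the URL and paths are hypothetical placeholders, and the example assumes that /var/www already exists and that a suitable unzip tool is installed on the node.

```puppet
# Hypothetical example: download and unpack a ZIP file of static assets.
archive { '/tmp/assets.zip':
  ensure       => present,
  extract      => true,
  extract_path => '/var/www',
  # Placeholder URL, not a real download location.
  source       => 'https://example.com/assets.zip',
  # Assumes the ZIP file unpacks into an assets/ directory; once that
  # exists, Puppet treats the archive as already extracted.
  creates      => '/var/www/assets',
  cleanup      => true,
}
```

The important thing is that creates names something the archive actually produces; otherwise Puppet would download and unpack the file on every run.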
Exploring the standard library
One of the oldest-established Puppet Forge modules is puppetlabs/stdlib, the official
Puppet standard library. We looked at this briefly earlier in the chapter when we used it as
an example of installing a module with r10k, but let's look more closely now and see what
the standard library provides and where you might use it.
Rather than managing some specific software or file format, the standard library aims to
provide a set of functions and resources which could be useful in any piece of Puppet code.
Consequently, well-written Forge modules use the facilities of the standard library rather
than implementing their own utility functions which do the same thing.
You should do the same in your own Puppet code: when you need a particular piece of
functionality, check the standard library first to see if it solves your problem rather than
implementing it yourself.
Before trying the examples in this section, make sure the stdlib module is installed by
following these steps: If you've previously followed the steps in the Using r10k section, the
required module will already be installed. If not, run the following commands to install it:
cd /etc/puppetlabs/code/environments/pbg
sudo r10k puppetfile install
Safely installing packages with ensure_packages
As you know, you can install a package using the package resource, like this (package.pp):
package { 'cowsay':
  ensure => installed,
}
But what happens if you also install the same package in another class in a different part of
your manifest? Puppet will refuse to run, with an error like this:
Error: Evaluation Error: Error while evaluating a Resource Statement,
Duplicate declaration: Package[cowsay] is already declared in file /
examples/package.pp:1; cannot redeclare at /examples/package.pp:4 at /
examples/package.pp:4:1 on node ubuntu-xenial
If both of your classes really require the package, then you have a problem. You could create
a class which simply declares the package, and then include that in both classes, but that is
a lot of overhead for a single package. Worse, if the duplicate declaration is in a third-party
module, it may not be possible, or advisable, to change that code.
What we need is a way to declare a package which will not cause a conflict if that package
is also declared somewhere else. The standard library provides this facility in the
ensure_packages() function. Call ensure_packages() with an array of package names, and
they will be installed if they are not already declared elsewhere (package_ensure.pp):
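A minimal sketch of such a call, assuming the stdlib module is installed and reusing the cowsay package from the earlier example, might look like this:

```puppet
# Declare the cowsay package only if nothing else in the catalog has
# already declared it, avoiding a duplicate declaration error.
ensure_packages(['cowsay'])
```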
To apply this example, run the following command:
sudo puppet apply --environment=pbg /examples/package_ensure.pp
You can try all the remaining examples in this chapter in the same way. Make sure you supply
the --environment=pbg switch to puppet apply, as the necessary modules are only
installed in the pbg environment.
If you need to pass addional aributes to the package resource, you can supply them
in a hash as the second argument to ensure_packages(), like this (package_ensure_
'ensure' => 'latest',
Why is this beer than using the package resource directly? When you declare the same
package resource in more than one place, Puppet will give an error message and refuse
to run. If the package is declared by ensure_packages(), however, Puppet will run
Since it provides a safe way to install packages without resource conicts, you should always
use ensure_packages() instead of the built-in package resource. It is certainly essenal
if you're wring modules for public release, but I recommend you use it in all your code.
We'll use it to manage packages throughout the rest of this book.
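To make the two-argument form concrete, here is a sketch of passing an attribute hash to ensure_packages(), again assuming the stdlib module is installed and using cowsay as the example package:

```puppet
# The second argument is a hash of attributes applied to each package
# resource that ensure_packages() creates.
ensure_packages(['cowsay'], { 'ensure' => 'latest' })
```

The hash keys are ordinary package resource attributes, so anything you could write inside a package declaration can be supplied here.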
Modifying les in place with le_line
Oen, when managing conguraon with Puppet, we would like to change or add a
parcular line to a le, without incurring the overhead of managing the whole le with
Puppet. Somemes it may not be possible to manage the whole le in any case, as another
Puppet class or another applicaon may be managing it. We could write an exec resource
to modify the le for us, but the standard library provides a resource type for exactly this
purpose: file_line.
Here's an example of using the file_line resource to add a single line to a system cong
le (file_line.pp):
file_line { 'set ulimits':
path => '/etc/security/limits.conf',
line => 'www-data - nofile 32768',
If there is a possibility that some other Puppet class or applicaon may need to modify the
target le, use file_line instead of managing the le directly. This ensures that your class
won't conict with any other aempts to control the le.
You can also use file_line to find and modify an existing line, using the match attribute:
file_line { 'adjust ulimits':
path => '/etc/security/limits.conf',
line => 'www-data - nofile 9999',