Cloud Foundry: The Definitive Guide
Develop, Deploy, and Scale
Duncan C. E. Winn
Beijing • Boston • Farnham • Sebastopol • Tokyo
ISBN: 978-1-491-93243-8
Cloud Foundry: The Definitive Guide
by Duncan C. E. Winn
Copyright © 2017 Duncan Winn. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are
also available for most titles (http://oreilly.com/safari). For more information, contact our corporate/insti‐
tutional sales department: 800-998-9938 or corporate@oreilly.com.
Editors: Brian Anderson and Virginia Wilson
Production Editor: Melanie Yarbrough
Copyeditor: Octal Publishing, Inc.
Proofreader: Christina Edwards
Indexer: Judy McConville
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Rebecca Demarest
May 2017: First Edition
Revision History for the First Edition
2017-05-22: First Release
The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Cloud Foundry: The Definitive Guide,
the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.
While the publisher and the author have used good faith efforts to ensure that the information and
instructions contained in this work are accurate, the publisher and the author disclaim all responsibility
for errors or omissions, including without limitation responsibility for damages resulting from the use of
or reliance on this work. Use of the information and instructions contained in this work is at your own
risk. If any code samples or other technology this work contains or describes is subject to open source
licenses or the intellectual property rights of others, it is your responsibility to ensure that your use
thereof complies with such licenses and/or rights.
To my daughters, Maya and Eva.
Dream BIG. The world awaits you...
Table of Contents
Foreword
Preface
1. The Cloud-Native Platform
    Why You Need a Cloud-Native Platform
    Cloud-Native Platform Concepts
    The Structured Platform
    The Opinionated Platform
    The Open Platform
    Summary
2. Concepts
    Undifferentiated Heavy Lifting
    The Cloud Operating System
    Do More
    The Application as the Unit of Deployment
    Using the cf push Command to Deploy
    Staging
    Self-Service Application Life Cycle
    The Twelve-Factor Contract
    Release Engineering through BOSH
    Built-In Resilience and Fault Tolerance
    Self-Healing Processes
    Self-Healing VMs
    Self-Healing Application Instance Count
    Resiliency Through Availability Zones
    Aggregated Streaming of Logs and Metrics
    Security
    Distributed System Security
    Environmental Risk Factors for Advanced Persistent Threats
    Challenge of Minimal Change
    The Three Rs of Enterprise Security
    UAA Management
    Organizations and Spaces
    Orgs
    Spaces
    Resource Allocation
    Domains, Hosts, and Routes
    Route
    Domains
    Context Path–Based Routing
    Rolling Upgrades and Blue/Green Deployments
    Summary
3. Components
    Component Overview
    Routing via the Load Balancer and GoRouter
    User Management and the UAA
    The Cloud Controller
    System State
    The Application Life-Cycle Policy
    Application Execution
    Diego
    Garden and runC
    Metrics and Logging
    Metron Agent
    Loggregator
    Messaging
    Additional Components
    Stacks
    A Marketplace of On-Demand Services
    Buildpacks and Docker Images
    Infrastructure and the Cloud Provider Interface
    The Cloud Foundry GitHub Repository
    Summary
4. Preparing Your Cloud Foundry Environment
    Installation Steps
    Non-technical Considerations
    Team Structure: Platform Operations for the Enterprise
    Deployment Topology
    Cloud Foundry Dependencies and Integrations
    IaaS and Infrastructure Design
    Designing for Resilience
    Sizing and Scoping the Infrastructure
    Setting Up an AWS VPC
    Jumpbox
    Networking Design and Routing
    Using Static IPs
    Subnets
    Security Groups
    Setting Up the Load Balancer
    Setting Up Domains and Certificates
    Summary
5. Installing and Configuring Cloud Foundry
    Installation Steps
    Installing Cloud Foundry
    Changing Stacks
    Growing the Platform
    Validating Platform Integrity in Production
    Start with a Sandbox
    Production Verification Testing
    Logical Environment Structure
    Pushing Your First App
    Summary
6. Diego
    Why Diego?
    A Brief Overview of How Diego Works
    Essential Diego Concepts
    Action Abstraction
    Composable Actions
    Layered Architecture
    Interacting with Diego
    CAPI
    Staging Workflow
    The CC-Bridge
    Logging and Traffic Routing
    Diego Components
    The BBS
    Diego Cell Components
    The Diego Brain
    The Access VM
    The Diego State Machine and Workload Life Cycles
    The Application Life Cycle
    Task Life Cycle
    Additional Components and Concepts
    The Route-Emitter
    Consul
    Application Life-Cycle Binaries
    Putting It All Together
    Summary
7. Routing Considerations
    Routing Primitives
    Routes
    Hostnames
    Domains
    Context Path Routing
    Routing Components Overview
    Routing Flow
    Route-Mapping Flow
    Load Balancer Considerations
    Setting Request Header Fields
    WebSocket Upgrades
    The PROXY Protocol
    TLS Termination and IPSec
    GoRouter Considerations
    Routing Table
    Router and Route High Availability
    Router Instrumentation and Logging
    Sticky Sessions
    The TCPRouter
    TCP Routing Management Plane
    TCPRouter Configuration Steps
    Route Services
    Route Service Workflow
    Route Service Use Cases
    Summary
8. Containers, Containers, Containers
    What Is a Container?
    Container Fervor
    Linux Containers
    Namespaces
    CGroups
    Disk Quotas
    Filesystems
    Container Implementation in Cloud Foundry
    Why Garden?
    OCI and runC
    Container Scale
    Container Technologies (and the Orchestration Challenge)
    Summary
9. Buildpacks and Docker
    Why Buildpacks?
    Why Docker?
    Buildpacks Explained
    Staging
    Detect
    Compile
    Release
    Buildpack Structure
    Modifying Buildpacks
    Overriding Buildpacks
    Using Custom or Community Buildpacks
    Forking Buildpacks
    Restaging
    Packaging and Dependencies
    Buildpack and Dependency Pipelines
    Summary
10. BOSH Concepts
    Release Engineering
    Why BOSH?
    The Cloud Provider Interface
    Infrastructure as Code
    Creating a BOSH Environment
    Single-Node versus Distributed BOSH
    BOSH Lite
    BOSH Top-Level Primitives
    Stemcells
    Releases
    Deployments
    BOSH 2.0
    Cloud Configuration
    BOSH Links
    Orphaned Disks
    Addons
    Summary
11. BOSH Releases
    Release Overview
    Cloud Foundry BOSH Release
    BOSH Director BOSH Release
    Anatomy of a BOSH Release
    Jobs
    Packages
    Src, Blobs, and Blobstores
    Packaging a Release
    Compilation VMs
    Summary
12. BOSH Deployments
    YAML Files
    Understanding YAML Syntax
    Deployment Manifests
    Director UUID and Deployment Name
    Release Names
    Stemcell
    Instance Groups
    Properties
    Update
    Credentials
    Summary
13. BOSH Components and Commands
    The BOSH Director
    Director Blobstore
    Director Task, Queue, and Workers
    Director Database
    Director Registry
    BOSH Agent
    Errand
    The Command Line Interface
    The Cloud Provider Interface
    Health Monitor
    Resurrector
    Message Bus (NATS)
    Creating a New VM
    Disk Creation
    Networking Definition
    The BOSH CLI v2
    Basic BOSH Commands
    Summary
14. Debugging Cloud Foundry
    Cloud Foundry Acceptance Tests
    Logging
    Typical Failure Scenarios
    Configuration Failures
    Infrastructure Failures
    Release Job Process Failure
    Scenario One: The App Is Not Reachable
    Scenario Two: Network Address Translation Instance Deleted (Network Failure)
    Scenario Three: Security Group Misconfiguration That Blocks Ingress Traffic
    Scenario Four: Invoking High Memory Usage That Kills a Container
    Scenario Five: Route Collision
    Scenario Six: Release Job Process Failures
    Scenario Seven: Instance Group Failure
    Summary
15. User Account and Authentication Management
    Background Information
    OAuth 2.0
    UAA Documentation
    UAA Release
    UAA Responsibilities
    Securing Cloud Foundry Components and API Endpoints
    Securing Service Access for Apps
    UAA Architecture and Configuration Within Cloud Foundry
    Instance Groups Governed by the UAA
    UAA Instance Groups
    UAA Database
    UAA Runtime Components
    UAA Logging and Metrics
    Keys, Tokens, and Certificate Rotation
    User Import
    Roles and Scopes
    Scopes
    Roles
    Summary
16. Designing for Resilience, Planning for Disaster
    High Availability Considerations
    Extending Cloud Foundry’s Built-In Resiliency
    Resiliency Through Multiple Cloud Foundry Deployments
    Resiliency Through Pipelines
    Data Consistency Through Services
    HA IaaS Configuration
    AWS Failure Boundaries
    vCenter Failure Boundaries
    Backup and Restore
    Restoring BOSH
    Bringing Back Cloud Foundry
    Validating Platform Integrity in Production
    Start with a Sandbox
    Production Verification Testing
    Summary
17. Cloud Foundry Roadmap
    v3 API
    Multiple Droplets per App
    Multiple Apps per Droplet (Process Types)
    Tasks
    Diego Scheduling
    Cell Rebalancing
    Boulders
    Tracing
    Containers
    Network Shaping
    Container Snapshots
    Container-to-Container Networking
    Traffic Resiliency
    Buildpacks and Staging
    Multibuildpacks
    Post-Staging Policy or Step
    Compiler-Less Rootfs and Stemcells
Foreword
When we think of transformative innovation, it’s easy for our minds to grasp the tan‐
gible and overt technologies—the television, the personal computer, the smartphone.
These inventions are visible, material commodities that serve a physical purpose in
our lives. These technologies start small and then eventually gain widespread adop‐
tion, at which point they change the way we interact and engage with the technology
—and often with the world around us. When we talk about strides in technology to
most people, these are the gadgets they envision: separate objects that can be picked
up, plugged in, and turned off.
But those of us who are quietly enabling digital transformation across multiple
industries know what innovation can look like. It can be invisible and intangible
—the velocity behind a high dive into the pool. The operators and developers of the
world no longer reside in the technology aisle. You are leading the change at every
kind of company across every industry. It’s one thing to demonstrate how a printing
press increases the pace of printing papers exponentially. It’s another thing entirely to
explain how a platform that is not visible can transform a company’s
ability to compete in a quickly changing marketplace. This book is a resource for you
as you lead the digital revolution within your organization.
This is undoubtedly a technical book devoted to the underlying concepts of Cloud
Foundry and how it works, but it is emblematic of something larger at play. The
author, Duncan Winn, has spent a career helping customers achieve more with tech‐
nology. Most recently, at Pivotal, he helped customers implement, deploy, and get
apps up and running on Cloud Foundry. He saw such incredible results that he took
it upon himself to begin the project of cataloging, explaining, and evangelizing the
technology. Duncan saw the monumental benefit of Cloud Foundry to everyone,
from the business itself right down to the developer. He saw how cloud-native appli‐
cation, architecture, and development are driving and accelerating digital innovation,
and that Cloud Foundry was the invisible platform that could take this process from
days to minutes.
Cloud Foundry is dedicated to improving the ability of developers to code and deploy
new applications. The collaborative nature of this open source project facilitates
cooperative, interactive creation, driving innovation. A platform that expedites the
deployment of applications does so with the understanding that an iterative approach
to development enables a user-first mentality. Cloud Foundry’s support of continu‐
ous delivery empowers developers to iterate applications based on user feedback,
eliminating the need for late-night adjustments during limited change windows. It
minimizes risk around release failure because incremental change is easier to perform
and less drastic. In short, Cloud Foundry makes a developer’s job faster and easier.
Cloud Foundry is the standard for application platforms with the noble vision of uni‐
fying the market for enterprise software. Cloud Foundry: The Definitive Guide is an
integral rulebook for building the software of the future and maintaining the
momentum of digital transformation across industries. The power of open source is
self-evident in the potency of Cloud Foundry, with a commitment to sharing and
continuous innovation.
It is my hope that you use this book as your digital transformation encyclopedia.
Read it, revisit it, learn from it, and challenge it. Cloud Foundry is for you.
— Abby Kearns
Executive Director of
Cloud Foundry Foundation
Preface
Cloud Foundry is a platform that helps you develop and deploy applications and
tasks with velocity. Velocity, as a vector quantity, is different from raw speed because
it is direction aware. In our case, direction is based on user feedback. Application
velocity allows you to adopt an iterative approach to development through repeatedly
gaining fast feedback from end users. Ultimately, this approach allows you to align
your products, user requirements, and expectations. This book covers Cloud Foun‐
dry’s technical concepts, providing a breakdown of the various platform components
and how they interrelate. It also will walk you through a typical setup of BOSH (a
release-engineering tool chain) and Cloud Foundry, and unpack the broader consid‐
erations of adopting Cloud Foundry for enterprise workloads.
Like all distributed systems, Cloud Foundry involves various levels of complexity.
Complexity is fine if it is well defined. Cloud Foundry does an excellent job defining
its internal complexity by providing explicit boundaries between each of its compo‐
nents. For the most part, this removes the need for the platform operator to deal with
Cloud Foundry’s internal complexity. However, there are still many other factors to
consider when running Cloud Foundry; for example, choosing your underlying
infrastructure, defining your networking architecture, and establishing your resil‐
iency requirements. These concerns are environmental, relating to the ecosystem in
which Cloud Foundry resides. Getting these concerns right is essential for reliably
handling business-critical workloads. Cloud Foundry: The Definitive Guide aims to
tackle these environmental considerations and decision points that are required for
configuring and running Cloud Foundry at scale.
In addition to unpacking the various considerations required for running the tech‐
nology, this book also explores the concepts surrounding Cloud Foundry. My goal is
to provide the necessary content for understanding the following:
• Cloud Foundry’s underlying concepts
•How Cloud Foundry works, including the flow of communication between the
distributed components
•How to begin deploying BOSH and Cloud Foundry, including the key configura‐
tion decision points
An understanding of how Cloud Foundry works is vital if you are running business-
critical applications, services, and workloads on the platform. Understanding the role
of each component along with the flow of communication is vital for troubleshooting
platform issues.
Who Should Read This Book
My hope is that you are reading this book because you already believe Cloud Foundry
will enable you to deliver software with higher velocity. If you are unfamiliar with the
high-level concepts of Cloud Foundry and what it enables you to achieve, I suggest
(shameless plug, I know) that you begin by reading Cloud Foundry: The Cloud-Native
Platform. The purpose of that book is to provide an understanding of why you should
care about using a platform to achieve application velocity.
This book is primarily aimed at Cloud Foundry operators who are responsible for
installing, configuring, and operating Cloud Foundry. Cloud Foundry significantly
simplifies the operational concerns of running applications and services. For exam‐
ple, imagine not having to provision infrastructure for every new project, and sys‐
tematically repaving all your deployed machines via rolling upgrades with zero
downtime. Empowering developers to deliver software with significantly less com‐
plexity and overhead is powerful. However, configuring and running Cloud Foundry
itself can involve some complexity.
The team responsible for operating Cloud Foundry is often known as the platform
operations team. This team’s responsibility includes deploying and operating Cloud
Foundry so that application developers can take advantage of its capabilities.
Typical expertise required for the platform operations team is very broad. A list of
required roles and skills is discussed at length in “Team Structure: Platform Opera‐
tions for the Enterprise” on page 44. The list is diverse because Cloud Foundry lever‐
ages different technology layers across your infrastructure. Many platform operators
have expertise with a subset of these disciplines. The challenge of adopting technol‐
ogy that touches so many varied layers is understanding each layer and how it should
be configured to interact with the others. An additional objective of this book is
therefore to enable platform operators to quickly expand their areas of expertise, and
to gain understanding of the salient points of any unfamiliar territory.
The approaches in this text come from real experience with numerous companies
from many different industries. The companies I have worked with all have
embarked on a journey toward establishing Cloud Foundry at scale. If you are look‐
ing for a way of introducing Cloud Foundry into your development and deployment
pipeline, this book is for you.
Cloud Foundry also makes developers’ lives easier, abstracting away the middleware,
OS, and infrastructure concerns so that they can focus on just their application and
desired backing services. Using Cloud Foundry is straightforward and sufficiently
covered in the Cloud Foundry documentation. Therefore, developer usage of Cloud
Foundry is not a focus of this book. With that said, many developers I have worked
with find Cloud Foundry’s technology interesting and often desire a deeper under‐
standing of the operational aspects detailed in this book.
This is a technically focused book intended for platform operators. Therefore, you
should have some of the following basic system administrative skills:
• Know how to open a terminal window in your OS of choice
• Know how to install software tools such as command-line interfaces (CLIs)
•Know how to use secure shell (SSH) and work with virtual machines (VMs)
•Know how to work with source code from GitHub by both downloading and
installing it (Mac users—Homebrew should be your go-to tool here)
When I talk about specific Cloud Foundry tools such as BOSH, I will link you to an
appropriate download location (often a GitHub repository). You can then follow the
instructions in the linked repositories. For Mac users, you can also install most Cloud
Foundry tools via Homebrew.
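For example, installing the two CLIs used throughout this book might look like the following. This is a minimal sketch; the cloudfoundry/tap Homebrew tap and formula names reflect the community tap at the time of writing and may change:

# Install the Cloud Foundry CLI and the BOSH CLI via Homebrew (Mac)
$ brew install cloudfoundry/tap/cf-cli
$ brew install cloudfoundry/tap/bosh-cli

# Verify the installations
$ cf --version
$ bosh --version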
Why I Wrote This Book
As a platform architect for Pivotal, I have worked with numerous companies from
various industries to help them install and configure Cloud Foundry. Like most plat‐
form operators, I began with knowledge of a subset of the technology; everything else
I learned on the job. In my early days with Cloud Foundry, there were two key things
that would have helped me:
•An understanding of the critical configuration considerations for both the plat‐
form and the underlying distributed infrastructure
•A reference architecture detailing the rationale and trade-offs for all implementa‐
tion decisions
To address the first point, Cloud Foundry has forged a fantastic collaborative com‐
munity from numerous companies and industries. I have been fortunate to work
alongside an incredibly talented team with a diverse skill set, both within Pivotal and
from other companies within the Cloud Foundry community. The reason I wrote this
book is to document the best practices and considerations I have learned through
working with Cloud Foundry.
Regarding the second point, as a consultant working across numerous industries, I
see the same issues and questions coming up with every new engagement. It is there‐
fore my hope that this book will explain the basic reference architecture for Cloud
Foundry deployments, including detailing the rationale and trade-offs for all imple‐
mentation decisions.
A Word on Cloud-Native Platforms
Cloud Foundry is a cloud-native platform. Such platforms are designed to do more
for you so that you can focus on what is important: delivering applications that
directly affect your business. Specifically, cloud-native platforms are designed to do
more (including reliably and predictably running and scaling applications) on top of
potentially unreliable cloud-based infrastructure. If you are unfamiliar with the high-
level concepts of Cloud Foundry and what it enables you to achieve, you should begin
by reading Cloud Foundry: The Cloud-Native Platform.
Online Resources
There are some great online references that you should familiarize yourself with as
you embark on your Cloud Foundry journey:
•The Cloud Foundry Foundation
•Bosh.io
• The cf-deployment GitHub repository
• Cloud Foundry’s continuous integration tool Concourse
Conventions Used in This Book
The following typographical conventions are used in this book:
Italics
Indicates new terms, URLs, email addresses, filenames, and file extensions.
Constant width
Used for program listings, as well as within paragraphs to refer to program ele‐
ments such as variable or function names, databases, data types, environment
variables, statements, and keywords.
Constant width bold
Shows commands or other text that should be typed verbatim by the user.
Constant width italics
Shows text that should be replaced with user-supplied values or by values deter‐
mined by context.
This icon signifies a tip, suggestion, or general note.
This icon indicates a warning or caution.
This icon indicates an item to take note of.
Sidebar
Sidebars are used to provide some additional context to the main text.
Command prompts always start with $, for example:
$ cf push
O’Reilly Safari
Safari (formerly Safari Books Online) is a membership-based
training and reference platform for enterprise, government,
educators, and individuals.
Members have access to thousands of books, training videos, Learning Paths, interac‐
tive tutorials, and curated playlists from over 250 publishers, including O’Reilly
Media, Harvard Business Review, Prentice Hall Professional, Addison-Wesley Pro‐
fessional, Microsoft Press, Sams, Que, Peachpit Press, Adobe, Focal Press, Cisco
Press, John Wiley & Sons, Syngress, Morgan Kaufmann, IBM Redbooks, Packt,
Adobe Press, FT Press, Apress, Manning, New Riders, McGraw-Hill, Jones & Bartlett,
and Course Technology, among others.
For more information, please visit http://oreilly.com/safari.
How to Contact Us
Please address comments and questions concerning this book to the publisher:
O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)
To comment or ask technical questions about this book, send email to bookquestions@oreilly.com.
For more information about our books, courses, conferences, and news, see our web‐
site at http://www.oreilly.com.
Find us on Facebook: http://facebook.com/oreilly
Follow us on Twitter: http://twitter.com/oreillymedia
Watch us on YouTube: http://www.youtube.com/oreillymedia
Acknowledgments
One of the things I love about Cloud Foundry is its community. It is genuinely col‐
laborative, and many people within the community have invested both time and
expertise helping to shape the content and accuracy of this book. A brief section is
not enough to encapsulate the extent to which my friends, family, and colleagues
have helped me, but I will most certainly mention their names. Due to the breadth of
support and the time it took to write this book, I have a sinking feeling that I’ve
missed someone really important, in which case I apologize.
Various product managers and subject matter experts were incredibly generous with
their time, both upfront to go deep on specific topics, and later on reviewing the rele‐
vant sections at length. In chapter order, I would like to thank: David Sabeti and Evan
Farrar on Chapter 5; Eric Malm and Brandon Shroyer on Diego; Shannon Coen on
Routing; Will Pragnell, Glyn Normington, and Julian Friedman on Containers; Ben
Hale on Buildpacks; Dmitriy Kalinin on BOSH; Dan Higham on Debugging; Allen
Duet and Mark Alston on Logging; Sree Tummidi and Filip Hanik on UAA; Haydon
Ryan and Sean Keery on HA and DR; and Dieu Cao on the final Summary.
Numerous colleagues provided incredibly valuable input and tech reviews, including
Matthew Stine, James Bayer, Onsi Fakhouri, Robert Mee, Graham Winn, Amit
Gupta, Ramiro Salas, Ford Donald, Merlin Glynn, Shaozhen Ding, John Calabrese,
Caleb Washburn, Ian Zink, Keith Strini, Shawn Neal, Rohit Kelapure, Dave Wallraff,
David Malone, Christopher Umbel, Rick Farmer, Stu Radnidge, Stuart Charlton, Jim
Park, Alex Ley, Daniel Jones, and Dr. Nick Williams—along with many folks at
Stark and Wayne.
Most of the material in this book was derived from experiences in the trenches, and
there are many people who have toughed it out in those trenches alongside me. Sev‐
eral of the people already mentioned belong in this category, but in addition, I would
like to thank Raghvender Arni, Mark Ruesink, Dino Cicciarelli, Joe Fitzgerald, and
Matt Russell for their superb guidance and for establishing excellent teams in which I
had the good fortune to work.
Thanks also to my good friend Vinodini Murugesan for her excellent editing. Special
thanks to my mother, who always invested in my education, and to my father, who
provided tireless coaching and feedback throughout my career; you both inspired me
to do what I love to the best of my ability. And, finally, and most importantly, thanks
to my wonderful wife, Tanya Winn, for her endless understanding and support in all
my endeavors.
CHAPTER 1
The Cloud-Native Platform
Cloud Foundry is a platform for running applications, tasks, and services. Its purpose
is to change the way applications, tasks, and services are deployed and run by signifi‐
cantly reducing the develop-to-deployment cycle time.
As a cloud-native platform, Cloud Foundry directly uses cloud-based infrastructure
so that applications running on the platform can be infrastructure unaware. Cloud
Foundry provides a contract between itself and your cloud-native apps to run them
predictably and reliably, even in the face of unreliable infrastructure.
If you need a brief summary of the benefits of the Cloud Foundry platform, this
chapter is for you. Otherwise, feel free to jump ahead to Chapter 2.
Why You Need a Cloud-Native Platform
To understand the business reasons for using Cloud Foundry, I suggest that you
begin by reading Cloud Foundry: The Cloud-Native Platform, which discusses the
value of Cloud Foundry and explores its overall purpose from a business perspective.
Cloud Foundry is an “opinionated” (more on this later in the chapter), structured
platform that imposes a strict contract between the following:
• The infrastructure layer underpinning it
• The applications and services it supports
Cloud-native platforms do far more than provide developers self-service resources
through abstracting infrastructure. Chapter 2 discusses at length their inbuilt fea‐
tures, such as resiliency, log aggregation, user management, and security. Figure 1-1
shows a progression from traditional infrastructure to Infrastructure as a Service
(IaaS) and on to cloud-native platforms. Through each phase of evolution, the value
line rises due to increased abstraction. Your responsibility and requirement to config‐
ure various pieces of the software, middleware, and infrastructure stack in support of
your application code diminish as the value line rises. The key is that cloud-native
platforms are designed to do more for you so that you can focus on delivering appli‐
cations with business value.
Figure 1-1. Cloud-native platform evolution
Cloud-Native Platform Concepts
In the Preface, I pointed out that Cloud Foundry’s focus is not so much what a plat‐
form is or what it does, but rather what it enables you to achieve. It has the potential
to make the software build, test, deploy, and scale cycle significantly faster. It removes
many of the hurdles involved in deploying software, making it possible for you to
release software at will.
Specifically, here’s what the Cloud Foundry platform offers:
Services as a higher level of abstraction above infrastructure
Cloud Foundry provides a self-service mechanism for the on-demand deploy‐
ment of applications bound to an array of provisioned middleware and routing
services. This benefit removes the management overhead of both the middleware
and infrastructure layer from the developer, significantly reducing the
development-to-deployment time.
Containers
Cloud Foundry runs all deployed applications in containers. You can deploy
applications as container images or as standalone apps containerized by Cloud
Foundry. This provides flexibility. Companies already established with Docker
can deploy existing Docker images to Cloud Foundry. However, containerizing
applications on the user’s behalf offers additional productivity and operational
benefits because the resulting container image is built from known and vetted
platform components. This approach allows you to run your vulnerability scans
against your trusted artifacts once per update. From this point, only the applica‐
tion source code requires additional vulnerability scanning on a per deployment
basis. Essentially, there is less to check on a per deployment basis because all of
your supporting artifacts have already been vetted.
Agile and automation
You can use Cloud Foundry as part of a CI/CD pipeline to provision environ‐
ments and services on demand as the application moves through the pipeline to a
production-ready state. This helps satisfy the key Agile requirement of getting
code into the hands of end users when required.
Cultural shift to DevOps
Cross-cutting concerns is a well-understood concept by developers. Adopting
Cloud Foundry is ideally accompanied by a cultural shift to DevOps, meaning
that you need to break down traditional walls, team silos, and ticket-based hand-
offs to get the most benefit from it.
Microservices support
Cloud Foundry supports microservices through providing mechanisms for inte‐
grating and coordinating loosely coupled services. To realize the benefits of
microservices, a platform is required to provide additional supporting capabili‐
ties; for example, Cloud Foundry provides applications with capabilities such as
built-in resilience, application authentication, and aggregated logging.
Cloud-native application support
Cloud Foundry provides a contract against which applications can be developed.
This contract makes doing the right thing simple and will result in better applica‐
tion performance, management, and resilience.
Not all cloud-native platforms are the same. Some are self-built and pieced together
from various components; others are black-boxed and completely proprietary. The
Cloud Foundry cloud-native platform has three defining characteristics: it is struc‐
tured, opinionated, and open. I’ll examine each of these traits in the following sec‐
tions.
The Structured Platform
Within the platform space, two distinct architectural patterns have emerged: struc‐
tured and unstructured:
•Structured platforms provide built-in capabilities and integration points for key
concerns such as enterprise-wide user management, security, and compliance.
With these kinds of platforms, everything you need to run your applications
should be provided in a repeatable way, regardless of what infrastructure you run
on. Cloud Foundry is a perfect example of a structured platform.
•Unstructured platforms have the flexibility to define a bespoke solution at a
granular level. An example of an unstructured platform would involve a “build
your own platform” approach with a mix of cloud-provided services and home‐
grown tools, assembled for an individual company.
Structured platforms focus on simplifying the overall operational model. Rather than
integrating, operating, and maintaining numerous individual components, the plat‐
form operator just deals with the one platform. Structured platforms remove all the
undifferentiated heavy lifting: tasks that must be done—for example, service discov‐
ery or application placement—but that are not directly related to revenue-generating
software.
Although structured platforms are often used for building new cloud-native applica‐
tions, they also support legacy application integration where it makes sense to do so,
allowing for a broader mix of workloads than traditionally anticipated. The struc‐
tured approach provides a much faster “getting started” experience with lower overall
effort required to operate and maintain the environment.
The Opinionated Platform
When you look at successful software, the greatest and most widely adopted technol‐
ogies are incredibly opinionated. What this means is that they are built on, and
adhere to, a set of well-defined principles employing best practices. They are proven
to work in a practical way and reflect how things can and should be done when not
constrained by the baggage of technical debt. Opinions produce contracts to ensure
applications are constrained to do the right thing.
Platforms are opinionated because they make specific assumptions and optimizations
to remove complexity and pain from the user. Opinionated platforms are designed to
be consistent across environments, with every feature working as designed out of the
box. For example, the Cloud Foundry platform provides the same user experience
when deployed over different IaaS layers and the same developer experience regard‐
less of the application language. Opinionated platforms such as Cloud Foundry can
still be configurable and extended, but not to the extent that the nature of the plat‐
form changes. Platforms should have opinions on how your software is deployed,
run, and scaled, but not where an application is deployed; this means that, with
respect to infrastructure choice, applications should run anywhere.
The Open Platform
Cloud Foundry is an open platform. It is open on three axes:
•It allows a choice of IaaS layer to underpin it (Google Cloud Platform [GCP],
Amazon Web Services [AWS], Microsoft Azure, VMware vSphere, OpenStack,
etc.).
•It allows for a number of different developer frameworks, polyglot languages,
and application services (Ruby, Go, Spring, etc.).
•It is open-sourced under an Apache 2 license and governed by a multi-
organization foundation.
Closed platforms can be proprietary and often focus on a specific problem. They
might support only a single infrastructure, language, or use case. Open platforms
offer choice where it matters.
Summary
Cloud Foundry is an opinionated, structured, and open platform. As such, it is:
•built on, and adheres to, a set of well-defined principles employing best practices.
•constrained to do the right thing for your application, based on defined con‐
tracts.
•consistent across different infrastructure/cloud environments.
•configurable and extendable, but not to the degree that the nature of the platform
changes.
For the developer, Cloud Foundry provides a fast “on rails” development and deploy‐
ment experience. For the operator, it reduces operational effort through providing
built-in capabilities and integration points for key enterprise concerns such as user
management, security, and self-healing.
CHAPTER 2
Concepts
This chapter explains the core concepts underpinning Cloud Foundry. Understand‐
ing these concepts paints a complete picture of why and how you should use the plat‐
form. These concepts include the need to deal with undifferentiated heavy lifting and
why cloud-based operating systems are essential in protecting your cloud investment.
This chapter also touches on the philosophical perspectives behind Cloud Foundry
with its opinionated do more approach. Operational aspects, including release engi‐
neering through BOSH, and built-in resilience and fault tolerance are also intro‐
duced. Finally, some of the core capabilities of the platform beyond container
orchestration are introduced, including the aggregated streaming of logs and metrics
and User Account and Authentication (UAA) management.
Undifferentiated Heavy Lifting
Cloud Foundry is a platform for running applications and one-off tasks. The essence
of Cloud Foundry is to provide companies with the speed, simplicity, and control
they need to develop and deploy applications. It achieves this by undertaking many of
the burdensome boilerplate responsibilities associated with delivering software.
These types of responsibilities are referred to as undifferentiated heavy lifting, tasks
that must be done—for example, container orchestration or application placement—
but that are not directly related to the development of revenue-generating software.
The following are some examples of undifferentiated heavy lifting:
•Provisioning VMs, OSs, middleware, and databases
• Application runtime configuration and memory tuning
• User management and SSO integration
•Load balancing and traffic routing
• Centralized log aggregation
• Scaling
• Security auditing
•Providing fault tolerance and resilience
• Service discovery
•Application placement and container creation and orchestration
• Blue/green deployments with the use of canaries
If you do not have a platform to abstract the underlying infrastructure and provide
the aforementioned capabilities, this additional burden of responsibility remains
yours. If you are spending significant time and effort building bespoke environments
for shipping software, refocusing investment back into your core business will pro‐
vide a huge payoff. Cloud Foundry allows enterprises to refocus effort back into the
business by removing as much of the undifferentiated heavy lifting as possible.
The Cloud Operating System
As an application platform, Cloud Foundry is infrastructure-agnostic, sitting on top
of your infrastructure of choice. As depicted in Figure 2-1, Cloud Foundry is effec‐
tively a cloud-based operating system that utilizes cloud-based resources, which are
hidden and abstracted away from the end user. As discussed in Chapter 1, in the
same way that the OS on your phone, tablet, or laptop abstracts the underlying physi‐
cal compute resource, Cloud Foundry abstracts the infrastructure’s compute resource
(specifically virtual storage, networking, RAM, and CPU). The net effect is that Cloud
Foundry serves both as a standard and efficient way to deploy applications and serv‐
ices across different cloud-computing environments. Conversely, if you are stuck
with directly using IaaS–specific APIs, it requires knowledge of the developer pat‐
terns and operations specific to the underlying IaaS technology, frequently resulting
in applications becoming tightly coupled to the underlying infrastructure.
Figure 2-1. Cloud Foundry layers forming a cloud-based OS
Do More
Historically, the long pole of application delivery, the part on the critical path that
blocks progress, has been the IT department. This results in a concept I call server
hugging, whereby developers hold on to (and hug) a plethora of VMs just in case they
need them again someday.
Nowadays, businesses no longer need to be constrained by lengthy IT processes or
organizational silos. Cloud Foundry provides a contractual promise to allow busi‐
nesses to move with velocity and establish a developer–feedback loop so that they can
tightly align products to user expectations. With Cloud Foundry, product managers
get their business back and IT engineers can focus on more interesting issues and get
to eat dinner at home.
Platforms are concerned not only with providing environments and middleware for
running applications. For example, Cloud Foundry takes on the responsibility of
keeping applications up and running in the face of failures within the system. It also
provides security, user administration, workload scheduling, and monitoring capabil‐
ities. Onsi Fakhouri, Pivotal’s Vice President of Research and Development, famously
tweeted this haiku:
Here is my source code,
run it on the cloud for me.
I do not care how!
Onsi’s quote captures the essence of Cloud Foundry’s do more capability. Cloud
Foundry is about doing more on behalf of both the developer and operator so that
they can focus on what really differentiates the business. This characteristic is seen all
throughout the Cloud Foundry ecosystem. You can take a similar approach with
BOSH, Cloud Foundry’s release-engineering system, and state, “Here are my servers,
make them a Cloud Foundry. I do not care how!”
The Application as the Unit of Deployment
Traditionally, deploying application code required provisioning and deploying VMs,
OSs, and middleware to create a development environment for the application to run
in. After that environment was provisioned, it required patching and ongoing main‐
tenance. New environments were then created as the application moved through the
deployment pipeline.
Early incarnations of platforms centered on middleware: defining complex topology
diagrams of application servers, databases, and messaging engines into which you
could drop your application. When this topology diagram (or blueprint) was defined,
you then specified some additional configuration such as IP addresses and ports to
bring the defined topology and applications into existence. Although this was a step
in the right direction, from a developer’s perspective there was still a layer of com‐
plexity that you needed to configure for each deployment.
Cloud Foundry differs from traditional provisioning and orchestration engines in a
fundamental way: it uses middleware and infrastructure directly, allowing stream‐
lined development through self-service environments. Developers can build, deploy,
run, and scale applications on Cloud Foundry without having to be mindful of the
specific underlying infrastructure, middleware, and container implementation.
Cloud Foundry allows the unit of deployment, i.e., what you deploy to run your appli‐
cation, to be isolated to just the application itself. Even though there are some bene‐
fits to encapsulating both your application and dependencies as a precomposed
container image, I still believe it is more secure and more efficient to keep just the
application as the unit of deployment and allow the platform to handle the remaining
concerns. The trade-offs between both approaches are discussed further in Chapter 9;
however, the benefit of Cloud Foundry is that it supports both approaches. Build‐
packs are discussed at length in that chapter, but for now, it is enough to know that
buildpacks provide the framework and runtime support for your applications. A spe‐
cific buildpack is used to package your application with all of its dependencies. The
resulting staged application is referred to as a droplet.
On-boarding developers is easy; they can deploy applications to Cloud Foundry
using existing tool chains with little to no code modification. It enables the developer
to remove the cost and complexity of configuring infrastructure for their applica‐
tions. Using a self-service model, developers can deploy and scale applications
without being directly locked into the IaaS layer.
Because developers no longer need to concern themselves with, for example, which
application container to use, which version of Java, and which memory settings or
garbage-collection (GC) policy to employ, they can simply push their applications to
Cloud Foundry, and the applications run. This allows developers to focus on deliver‐
ing applications that offer business value. Applications can then be bound to a wide
set of backing services that are available on demand.
Units of Deployment
The phrase “the application is the unit of deployment” is used lib‐
erally. Applications as the sole unit of currency has changed with
the emergence of Diego, Cloud Foundry’s new runtime. Cloud
Foundry now supports both applications running as long running
processes (LRPs) and discrete “run once” tasks such as Bash scripts
and Cron-like jobs. Diego LRPs are also referred to as application
instances, or AIs. What you deploy, be it an actual app or just a
script, is not important. The key takeaway is the removal of the
need for deploying additional layers of technology.
Using the cf push Command to Deploy
Cloud Foundry provides several ways for a user to interact with it, but the principal
avenue is through its CLI. The CLI command most often referenced by the
Cloud Foundry community is $ cf push.
You use the cf push command to deploy your application. It has demonstrably
improved the deployment experience. From the time you run cf push to the point
when the application is available, Cloud Foundry performs the following tasks:
• Uploads and stores application files
• Examines and stores application metadata
• Stages the application by using a buildpack to create a droplet
•Selects an appropriate execution environment in which to run the droplet
• Starts the AI and streams logs to the Loggregator
This workflow is explored in more depth in Chapter 6.
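To make this concrete, here is a minimal sketch of a first deployment using the CLI. The API endpoint, org, space, and application name are placeholders for your own environment:

# Target a Cloud Foundry API endpoint, org, and space
$ cf login -a https://api.example.com -o my-org -s development

# Deploy the application from the current directory
$ cf push my-app

# Inspect the running application instances and recent logs
$ cf app my-app
$ cf logs my-app --recent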
Staging
Although it is part of the cf push workflow, staging is a core Cloud Foundry con‐
cept. Cloud Foundry allows users to deploy a prebuilt Docker image or an application
artifact (source code or binaries) that has not yet been containerized. When deploy‐
ing an application artifact, Cloud Foundry will stage the application on a machine or
VM known as a Cell, using everything required to compile and run the apps locally,
including the following:
• The OS stack on which the application runs
•A buildpack containing all languages, libraries, dependencies, and runtime serv‐
ices the app uses
The staging process results in a droplet that the Cell can unpack, compile, and run.
You can then run the resulting droplet (as in the case of a Docker image) repeatedly
over several Cells. The same droplet runs the same app instances over multiple Cells
without incurring the cost of staging every time a new instance is run. This ability
provides deployment speed and confidence that all running instances from the same
droplet are identical.
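The two paths into the platform look like this in practice. This is a sketch only; the application names, buildpack, and image reference are illustrative:

# Push application source; Cloud Foundry stages it with a buildpack to produce a droplet
$ cf push my-app -b java_buildpack

# Push a prebuilt Docker image instead, bypassing buildpack staging
$ cf push my-docker-app --docker-image example/my-image:latest

In both cases, the result is a unit that Cells can run repeatedly and identically.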
Self-Service Application Life Cycle
In most traditional scenarios, the application developer and application operator typ‐
ically perform the following:
•Develop an application
• Deploy application services
• Deploy an application and connect (bind) it to application services
• Scale an application, both up and down
•Monitor an application
• Upgrade an application
This application life cycle is in play until the application is decommissioned and
taken offline. Cloud Foundry simplifies the application life cycle by offering self-
service capabilities to the end user. Adopting a self-service approach removes hand-
offs and potentially lengthy delays between teams. For example, the ability to deploy
an application, provision and bind applications to services, scale, monitor, and
upgrade are all offered by a simple call to the platform.
With Cloud Foundry, as mentioned earlier, the application or task itself becomes the
single unit of deployment. Developers just push their applications to Cloud Foundry,
and those applications run. If developers require multiple instances of an application
to be running, they can use cf scale to scale the application to N AIs.
Cloud Foundry removes the cost and complexity of configuring infrastructure and
middleware per application. Using a self-service model, users can do the following:
• Deploy applications
•Provision and bind additional services, such as messaging engines, caching solu‐
tions, and databases
• Scale applications
•Monitor application health and performance
• Update applications
•Delete applications
Deploying and scaling applications are completely independent operations. This pro‐
vides the flexibility to scale at will, without the cost of having to redeploy the applica‐
tion every time. Users can simply scale an application with a self-service call to the
platform. Through commercial products such as Pivotal Cloud Foundry, you can set
up autoscaling policies for dynamic scaling of applications when they meet certain
configurable thresholds.
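As a hedged sketch of these self-service calls, the following commands cover provisioning, binding, and scaling; the service name and plan are placeholders for whatever your marketplace offers:

# Browse the marketplace and provision a backing service
$ cf marketplace
$ cf create-service p-mysql 100mb my-db

# Bind the service to the application and restage so it picks up the new credentials
$ cf bind-service my-app my-db
$ cf restage my-app

# Scale to four instances with 1 GB of memory each
$ cf scale my-app -i 4 -m 1G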
Removing the infrastructure, OS, and middleware configuration concerns from
developers allows them to focus all their effort on the application instead of deploy‐
ing and configuring supporting technologies. This keeps the development focus
where it needs to be, on the business logic that generates revenue.
The Twelve-Factor Contract
An architectural style known as cloud-native applications has been established to
describe the design of applications specifically written to run in a cloud environment.
These applications avoid some of the antipatterns that were established in the client-
server era, such as writing data to the local filesystem. Those antipatterns do not work
as well in a cloud environment because, for example, local storage is ephemeral given
that VMs can move between different hosts. The Twelve-Factor App explains the 12
principles underpinning cloud-native applications.
Platforms offer a set of contracts to the applications and services that run on them.
These contracts ensure that applications are constrained to do the right thing. Twelve
Factor can be thought of as the contract between an application and a cloud-native
platform.
There are benefits to adhering to a contract that constrains things correctly. Twitter is
a great example of a constrained platform. You can write only 140 characters, but
that constraint becomes an extremely valuable feature of the platform. You can do a
lot with 140 characters coupled with the rich features surrounding that contract. Sim‐
ilarly, platform contracts are born out of previously tried-and-tested constraints; they
are enabling and make doing the right thing—good developer practices—easy for
developers.
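As one small illustration of that contract, twelve-factor applications read configuration from the environment rather than from files baked into the deployed artifact. The variable name and value here are hypothetical:

# Supply configuration through the environment rather than hardcoding it
$ cf set-env my-app ORDERS_SERVICE_URL https://orders.example.com
$ cf restage my-app

# Review the environment Cloud Foundry presents to the application
$ cf env my-app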
[1] The terms VM and machine are used interchangeably because BOSH can deploy to multiple infrastructure environments, ranging from containers to VMs, right down to configuring physical servers.
Release Engineering through BOSH
In addition to developer concerns, the platform provides responsive IT operations,
with full visibility and control over the application life cycle, provisioning, deploy‐
ment, upgrades, and security patches. Several other operational benefits exist, such as
built-in resilience, security, centralized user management, and better insights through
capabilities like aggregated metrics and logging.
Rather than integrating, operating, and maintaining numerous individual compo‐
nents, the platform operator deals only with the platform. Structured platforms han‐
dle all the aforementioned undifferentiated heavy lifting tasks.
The Cloud Foundry repository is structured for use with BOSH. BOSH is an open
source tool chain for release-engineering, deployment, and life cycle management.
Using a YAML (YAML Ain’t Markup Language) deployment manifest, BOSH creates
and deploys (virtual) machines[1] on top of the targeted computing infrastructure and
then deploys and runs software (in our case Cloud Foundry and supporting services)
on to those created machines. Many of the benefits to operators are provided through
using BOSH to deploy and manage Cloud Foundry. BOSH is often overlooked as just
another component of Cloud Foundry, but it is the bedrock of Cloud Foundry and a
vital piece of the ecosystem. It performs monitoring, failure recovery, and software
updates with zero-to-minimal downtime. Chapter 10 discusses BOSH at length.
Rather than utilizing a bespoke integration of a variety of tools and techniques that
provide solutions to individual parts of the release-engineering goal, BOSH is
designed to be a single tool covering the entire set of requirements of release engi‐
neering. BOSH enables software deployments to be:
• Automated
• Reproducible
•Scalable
• Monitored with self-healing failure recovery
• Updatable with zero-to-minimal downtime
BOSH translates intent into action through repeatability: every provisioned release is identical and repeatable. This removes the challenge of configuration drift and the sprawl of snowflake servers.
BOSH configures infrastructure through code. By design, BOSH tries to abstract
away the differences between infrastructure platforms (IaaS or physical servers) into
a generalized, cross-platform description of your deployment. This provides the ben‐
efit of being infrastructure agnostic (as far as possible).
BOSH performs monitoring, failure recovery, software updates, and patching with
zero-to-minimal downtime. Without such a release-engineering tool chain, all these
concerns remain the responsibility of the operations team. A lack of automation
exposes the developer to unnecessary risk.
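As a rough sketch of that workflow (the environment alias, file names, and deployment name below are illustrative, not prescriptive), a BOSH-managed deployment is driven by a handful of CLI commands:

$ bosh upload-stemcell my-stemcell.tgz           # the base OS image used for every VM
$ bosh upload-release cf-release.tgz             # a versioned, self-contained software release
$ bosh -e my-env -d cf deploy cf-manifest.yml    # converge the deployment to the manifest's desired state
$ bosh -e my-env -d cf vms                       # inspect the resulting VMs and their health

Rerunning the same deploy with the same manifest, stemcell, and releases converges the system to an identical state, which is what makes the process repeatable.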
Built-In Resilience and Fault Tolerance
A key feature of Cloud Foundry is its built-in resilience and self-healing, which is based on control theory. Control theory is a branch
of engineering and mathematics that uses feedback loops to control and modify the
behavior of a dynamic system. Resiliency is about ensuring that the actual system
state (the number of running applications, for example) matches the desired state at
all times, even in the event of failures. Resiliency is an essential but often costly com‐
ponent of business continuity.
Cloud Foundry automates the recovery of failed applications, components, and pro‐
cesses. This self-healing removes the recovery burden from the operator, ensuring
speed of recovery. Cloud Foundry, underpinned by BOSH, achieves resiliency and
self-healing through:
•Restarting failed system processes
• Recreating missing or unresponsive VMs
• Deployment of new AIs if an application crashes or becomes unresponsive
•Application striping across availability zones (AZs) to enforce separation of the
underlying infrastructure
•Dynamic routing and load balancing
Cloud Foundry deals with application orchestration and placement focused on even
distribution across the infrastructure. The user should not need to worry about how
the underlying infrastructure runs the application beyond having equal distribution
across different resources (known as availability zones). The fact that multiple copies
of the application are running with built-in resiliency is what matters.
Cloud Foundry provides dynamic load balancing. Application consumers use a route
to access an application; each route is directly bound to one or more applications in
Cloud Foundry. When an application runs multiple instances, Cloud Foundry balances the load across those instances, dynamically updating its routing table. Dead application routes are auto‐
matically pruned from the routing table, with new routes added when they become
available.
Without these capabilities, the operations team is required to continually monitor
and respond to pager alerts from failed apps and invalid routes. By replacing manual
interaction with automated, self-healing software, applications and system compo‐
nents are restored quickly with less risk and downtime. The resiliency concern is sat‐
isfied once, for all applications running on the platform, as opposed to developing
customized monitoring and restart scripts per application. The platform removes the
ongoing cost and associated maintenance of bespoke resiliency solutions.
Self-Healing Processes
Traditional infrastructure as code tools do not check whether provisioned services are
up and running. BOSH has strong opinions on how to create your release, forcing
you to create a monitor script for the process. If a BOSH-deployed component has a
process that dies, the monitor script will try to restart it.
Self-Healing VMs
BOSH has a Health Monitor and Resurrector. The Health Monitor uses status and
life cycle events to monitor the health of VMs. If the Health Monitor detects a prob‐
lem with a VM, it can trigger an alert and invoke the Resurrector. The Resurrector
automatically recreates VMs identified by the Health Monitor as missing or unre‐
sponsive.
Self-Healing Application Instance Count
Cloud Foundry runs the application transparently, taking care of the application life
cycle. If an AI dies for any reason (e.g., because of a bug in the application) or a VM
dies, Cloud Foundry self-heals by starting new instances to maintain the desired capacity of running AIs. It achieves this by monitoring how many instances of
each application are running. The Cell manages its AIs, tracks started instances, and
broadcasts state messages. When Cloud Foundry detects a discrepancy between the
actual number of running instances versus the desired number of available AIs, it
takes corrective action and initiates the deployment of new AIs. To ensure resiliency
and fault tolerance, you should run multiple AIs for a single application. The AIs will
be distributed across multiple Cells for resiliency.
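For example, assuming an application named my-app, you could request and verify multiple AIs as follows (the instance count is arbitrary):

$ cf scale my-app -i 3    # ask the platform to keep three instances running
$ cf app my-app           # shows requested versus actual instance state

If an instance crashes, the reported state changes and Cloud Foundry starts a replacement instance to restore the desired count.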
Resiliency Through Availability Zones
Finally, Cloud Foundry supports the use of availability zones (AZs). As depicted in
Figure 2-2, you can use AZs to enforce separation of the underlying infrastructure.
For example, when running on AWS, you can directly map Cloud Foundry AZs to
different AWS AZs. When running on vCenter, you can map Cloud Foundry AZs to
different vCenter Cluster and resource-pool combinations. Cloud Foundry can then
deploy its components across the AZs. When you deploy multiple AIs, Cloud Foun‐
dry will distribute them evenly across the AZs. If, for example, a rack of servers fails
and brings down an entire AZ, the AIs will still be up and serving traffic in the
remaining AZs.
Figure 2-2. Application resiliency through Cloud Foundry AZs
Aggregated Streaming of Logs and Metrics
Cloud Foundry provides insight into both the application and the underlying plat‐
form through aggregated logging and metrics. The logging system within Cloud
Foundry is known as the Loggregator. It is the inner voice of the system, telling the
operator and developer what is happening. It is used to manage the performance,
health, and scale of running applications and the platform itself, via the following:
•Logs provide visibility into behavior; for example, application logs can be used to
trace through a specific call stack.
•Metrics provide visibility into health; for example, container metrics can include
memory, CPU, and disk-per-app instance.
Insights are obtained through storing and analyzing a continuous stream of aggrega‐
ted, time-ordered events from the output streams of all running processes and back‐
ing services. Application logs are aggregated and streamed to an endpoint via Cloud
Foundry’s Loggregator Firehose. Logs from the Cloud Foundry system components
can also be made available and processed through a separate syslog drain. Cloud
Foundry produces both the application and system logs to provide a holistic view to
the end user.
Figure 2-3 illustrates how application logs and syslogs are separated as streams, in
part to provide isolation and security between the two independent concerns, and in
part due to consumer preferences. Generally speaking, app developers do not want to
wade through component logs to resolve an app-specific issue. Developers can trace
the log flow from the frontend router to the application code from a single log file.
Figure 2-3. The Loggregator system architecture used for aggregating application logs
and metrics
In addition to logs, metrics are gathered and streamed from system components.
Operators can use metrics information to monitor an instance of Cloud Foundry.
Furthermore, Cloud Foundry events capture specific occurrences, such as when an application is started or stopped. The benefits of aggregated log, metric, and event streaming
include the following:
• You can stream logs to a single endpoint.
•Streamed logs provide timestamped outputs per application.
•Both application logs and system-component logs are aggregated, simplifying
their consumption.
•Metrics are gathered and streamed from system components.
•Operators can use metrics information to monitor an instance of Cloud Foun‐
dry.
•You can view logs from the command line or drain them into a log management
service such as an ELK stack (Elasticsearch, Logstash, and Kibana), Splunk, or
PCF Metrics.
•Viewing events is useful when debugging problems. For example, it is useful to
be able to correlate an app instance event (like an app crash) to the container’s
specific metrics (high memory prior to crash).
Without a platform, implementing an aggregated log and metrics-streaming solution involves bespoke engineering to orchestrate and aggregate the streaming of both syslog and
application logs from every component within a distributed system into a central
server. Using a platform removes the ongoing cost and associated maintenance of
bespoke logging solutions.
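As a brief, illustrative sketch (the application name and syslog endpoint are placeholders), developers typically consume this stream via the cf CLI, and operators can forward it to an external system through a syslog drain:

$ cf logs my-app --recent      # dump recently buffered logs for an app
$ cf logs my-app               # tail the live, aggregated log stream
$ cf create-user-provided-service my-drain -l syslog://logs.example.com:514
$ cf bind-service my-app my-drain   # forward the app's logs to the external drain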
Security
For enterprises working with cloud-based infrastructure, security is a top concern.
Security teams usually have the strongest initial objections to Cloud Foundry because it works in a way that is unfamiliar to established enterprise security practice. However, in my experience, as soon as these teams understand the strength of Cloud Foundry's security posture, they become some of your strongest champions.
Distributed System Security
Cloud Foundry offers significant security benefits over traditional approaches to
deploying applications because it allows you to strengthen your security posture
once, for all applications deployed to the platform. However, securing distributed
systems involves inherent complexity. For example, think about these issues:
•How much effort is required to automatically establish and apply network traffic
rules to isolate components?
•What policies should be applied to automatically limit resources in order to
defend against denial-of-service (DoS) attacks?
•How do you implement role-based access controls (RBAC) with in-built auditing
of system access and actions?
•How do you know which components are potentially affected by a specific vul‐
nerability and require patching?
•How do you safely patch the underlying OS without incurring application down‐
time?
These issues are standard requirements for most systems running in corporate data
centers. The more custom engineering you use, the more you need to secure and
patch that system. Distributed systems increase the security burden because there are
more moving parts, and with the advances in container technology, new challenges
arise, such as “How do you dynamically apply microsegmentation at the container
layer?” Additionally, when it comes to rolling out security patches to update the sys‐
tem, many distributed systems suffer from configuration drift—namely, the lack of
consistency between supposedly identical environments. Therefore, when working
with complex distributed systems (specifically any cloud-based infrastructure), envi‐
ronmental risk factors are intensified.
The Challenge of Configuration Drift
Deployment environments (such as staging, quality assurance, and
production) are often complex and time-consuming to construct
and administer, producing the ongoing challenge of trying to man‐
age configuration drift to maintain consistency between environ‐
ments and VMs. Reproducible consistency through release-
engineering tool chains such as Cloud Foundry’s BOSH addresses
this challenge.
Environmental Risk Factors for Advanced Persistent Threats
Malware known as advanced persistent threats (APTs) needs three risk factors in
order to thrive:
1. Time
2. Leaked or misused credentials
3. Misconfigured and/or unpatched software
Given enough time, APTs can observe, analyze, and learn what is occurring within
your system, storing away key pieces of information at will. If APTs obtain creden‐
tials, they can then further access other systems and data such as important ingress
points into your protected data layer. Finally, unpatched software vulnerabilities pro‐
vide APTs the freedom to further exploit, compromise, and expose your system.
Challenge of Minimal Change
There has been a long-held belief that if enterprises deliver software with velocity, the trade-off is a weaker security posture and increased risk. Therefore, traditionally, many enterprises have relied on a concept of minimal change to mitigate risk, at the cost of reduced velocity. Security teams establish strict and restrictive policies in an attempt to minimize the injection of new vulnerabilities. This is evident in ticketing systems for basic configuration changes to middleware and databases, long-lived transport layer security (TLS) credentials, static firewall rules, and the numerous security policies to which applications must adhere.
Minimal change becomes compounded by complexity of the environment. Because
machines are difficult to patch and maintain, environmental complexity introduces a
significant lag between the time a vulnerability is discovered and the time a machine
is patched, be it months, or worse, even years in some production enterprise environ‐
ments.
2. The three Rs of enterprise security is a phrase coined in an article by Justin Smith, a cloud identity and security expert. I strongly suggest that if you're interested in enterprise security, you read the full article titled The Three R's of Enterprise Security.
The Three Rs of Enterprise Security
These combined risk factors provide a perfect ecosystem in which APTs can flourish, and minimal change creates an environment in which all three factors are likely to occur. Cloud Foundry inverts the traditional enterprise security model by focusing on the three Rs of enterprise security: rotate, repave, repair.2
1. Rotate the credentials frequently so that they are valid only for short periods of
time.
2. Repave (rebuild) servers and applications from a known good state to cut down
on the amount of time an attack can live.
3. Repair vulnerable software as soon as updates are available.
For the three Rs to be effective in minimizing the APT risk factors, you need to
implement them repeatedly at high velocity. For example, data center credentials can
be rotated hourly, servers and applications can be rebuilt several times a day, and
complete vulnerability patching can be achieved within hours of patch availability.
With this paradigm in mind, faster now equates to a safer and stronger security pos‐
ture.
Additional Cloud Foundry Security
Cloud Foundry, along with BOSH and continuous integration (CI) tooling, provides the capabilities to make the three Rs' security posture a reality. Additionally, Cloud Foundry protects you from security threats by automatically applying the following security controls to isolate applications and data:
•Manages software-release vulnerability by using new Cloud Foundry releases
created with timely updates to address code issues
•Manages OS vulnerability by using a new OS created with the latest security
patches
•Implements RBACs, applying and enforcing roles and permissions to ensure that
users of the platform can view and affect only the resources to which they have
been granted access
•Secures both code and the configuration of an application within a multitenant
environment
•Deploys each application within its own self-contained and isolated container‐
ized environment
• Prevents possible DoS attacks through resource starvation
•Provides an operator audit trail showing all operator actions applied to the plat‐
form
•Provides a user audit trail recording all relevant API invocations of an applica‐
tion
•Implements network traffic rules (security groups) to prevent system access to
and from external networks, production services, and between internal compo‐
nents
BOSH and the underlying infrastructure expand the security posture further by handling data-at-rest encryption through the infrastructure layer, usually via a device mechanism or filesystem-level support. For example, BOSH can use AWS EBS (Elastic Block Store) volume encryption for persistent disks.
Because every component within Cloud Foundry is created with the same OS image,
Cloud Foundry eases the burden of rolling out these OS and software-release updates
by using BOSH. BOSH redeploys updated VMs, component by component, to ensure
zero-to-minimal downtime. This ultimately removes patching and updating concerns
from the operator and provides a safer, more resilient way to update Cloud Foundry
while keeping applications running. It is now totally possible to rebuild every VM in
your data center from a known good state, as desired, with zero application down‐
time.
In addition, you can rebuild and redeploy the applications themselves from a known
good release, upon request, with zero downtime. These rebuilding, repairing, and
redeploying capabilities ensure that the patch turnaround time for the entire stack is
as fast and as encompassing as possible, reaching every affected component across
the stack with minimal human intervention. Cadence is limited only by the time it
takes to run the pipeline and commit new code.
In addition to patching, if for any reason a component becomes compromised, you
can instantly recreate it by using a known and clean software release and OS image,
and move the compromised component into a quarantine area for further inspection.
There are additional detailed technical aspects that further improve security; for
example, using namespaces for all containerized processes. I suggest reviewing the
individual components for a more detailed understanding of how components such
as Garden or the UAA help to further increase the security posture.
UAA Management
Role-based access defines who can use the platform and how. Cloud Foundry uses
RBAC, with each role granting permissions to a specific environment the user is tar‐
geting. All collaborators target an environment with their individual user accounts
associated with a role that governs what level and type of access the user has within
that environment. Cloud Foundry’s UAA is the central identity management service
for both users and applications. It supports federated login, Lightweight Directory
Access Protocol (LDAP), Security Assertion Markup Language (SAML), SSO, and
multifactor authentication. UAA is a powerful component for strengthening your
security posture for both user and application authentication. Chapter 15 looks at
UAA in more detail.
Organizations and Spaces
Most developers are familiar with using VMs for development and deployment.
Cloud Foundry is a virtualization layer (underpinned by containers) on top of a vir‐
tualization layer underpinned by VMs. Therefore, users do not have direct access to a
specific machine or VM; rather, they simply access a logical partition of resources to
deploy their apps.
To partition and allocate resources, Cloud Foundry uses logical boundaries known as
Organizations (Orgs) and Spaces. Orgs contain one or more Spaces. Users can belong
to any number of Orgs and/or Spaces, and users can have unique roles and permis‐
sions in each Org or Space to which they belong.
Orgs and Spaces provide the following:
• Logical separation and assignment of Cloud Foundry resources
• Isolation between different teams
•Logical isolation of development, test, staging, and production environments
For some enterprise customers with traditional silos, defining their required Orgs
and Spaces can at first seem complex. Ideally, development teams should have
autonomy to create and manage their own Spaces, as required. For development
teams embracing microservices, the best approach is to organize teams by the big-A
application—meaning a group of related (small-a) applications or services that can
collectively and appropriately be grouped together, often referred to as bulkheading.
Ultimately, you know your business and how your developers work, so use these logi‐
cal structures to provide meaningful working environments and pipelines for your
developers.
Orgs
An Org is the top level of separation. A logical mapping could be to your business
unit, a big-A application, or some other reasonable bounded context. When working
with large enterprises that might have 200 developers in a business unit, I normally
try to shift their thinking to the “Two-Pizza Team” model for application develop‐
ment. However, the actual number of developers within an Org or Space should not
be a contentious point if it maps well to the physical organization and does not
impede development or deployment.
Spaces
Every application and service is scoped to a Cloud Foundry Space. A Space provides a
shared location that a set of users can access for application development, deploy‐
ment, and maintenance. Every Space belongs to one Org. Each Org contains at least
one Space but could contain several, and therefore can contain a broader set of col‐
laborators than a single Space.
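As a minimal sketch, assuming an Org named my-org and a developer account alice, this structure is typically created and targeted as follows:

$ cf create-org my-org
$ cf create-space development -o my-org
$ cf target -o my-org -s development
$ cf set-space-role alice my-org development SpaceDeveloper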
Environment Variables for Properties
The cf push command is the user saying to Cloud Foundry, “Here
is my application artifact; run it on the cloud for me. I do not care
how!”
The “I do not care how” needs explaining. There are properties you
should care about, and these properties are configured by environ‐
ment variables. Cloud Foundry uses environment variables to
inform a deployed application about its environment. Environ‐
ment variables include the following:
•How much memory to use
•What routes should be bound to the application
•How many instances of an application should be run
•Additional app-specific environment properties
The Cloud Foundry user can configure all of these properties.
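A short, illustrative example of setting these properties follows; the flag values, hostname, domain, and variable name are placeholders:

$ cf push my-app -m 512M -i 3 -n my-app-name -d my-business.com   # memory, instance count, and route
$ cf set-env my-app FEATURE_FLAG true                             # app-specific environment property
$ cf restage my-app                                               # apply the new environment
$ cf env my-app                                                   # inspect the resulting environment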
Spaces are more of a developer concern. I believe there should be a limit to the number of developers in a Space, because a level of trust is required due to the scope of shared resources and exposed environment variables that reside at the
Space level. For the hyper security-conscious who have no trust between application
teams, one Space per application is the only way forward. In reality, a Space for a big-
A application might be more appropriate.
Colocation and Application Interactions
When considering the logical separations of your Cloud Foundry
deployment, namely what Orgs and Spaces to construct, it is
important to consider your application-to-application and
application-to-services interactions. Although this consideration is
more of a microservices consideration, an understanding of appli‐
cation and service boundaries is beneficial in understanding any
colocation requirements. An example of this would be an applica‐
tion needing to access a corporate service in a specific data center
or network. These concerns and their subsequent impacts become
more noticeable at scale. You will need to design, understand, and
document service discovery and app-to-app dependency, and addi‐
tional frameworks such as Spring Cloud can significantly help here.
Resource Allocation
In addition to defining team structures, you can use Orgs and Spaces for assigning
appropriate resources to each team.
Collaborators in an Org share the following:
• Resource quota
• Applications
•Services availability
• Custom domains
Domains, Hosts, and Routes
To enable traffic from external clients, applications require a specific URL known as a route. A route is a URL composed of a domain and an optional host as a prefix. The host in this context is the portion of the URL referring to the application or applications, such as these:
• my-app-name is the host prefix
• my-business.com is the domain
• my-app-name.my-business.com is the route
Route
Application consumers use a route to access an application. Each route is directly
bound to one or more applications in Cloud Foundry. When running multiple
instances, Cloud Foundry automatically load balances application traffic across mul‐
tiple AIs through a component called the GoRouter. Because individual AIs can come
and go for various reasons (scaling, app deletion, app crash), the GoRouter dynami‐
cally updates its routing table. Dead routes are automatically pruned from the routing
table and new routes are added when they become available. Dynamic routing is a
powerful feature. Traditional manual route table maintenance can be slow because it
often requires submitting tickets to update or correct domain name server (DNS) or
load balancer components. Chapter 7 discusses these various routing concepts fur‐
ther.
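For illustration, assuming the domain my-business.com is already registered to your Org, a route can be created and mapped with the cf CLI like this:

$ cf create-route development my-business.com --hostname my-app-name
$ cf map-route my-app my-business.com --hostname my-app-name
$ cf routes    # list routes and the applications they are bound to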
Domains
Domains provide a namespace from which to create routes. Cloud Foundry uses
domains within routes to direct requests to specific applications. You can also regis‐
ter and use a custom domain, known as an owned domain. Domains are associated
with Orgs and are not directly bound to applications. Domains can be shared, mean‐
ing that they are registered to multiple Orgs, or private, registered to only one Org.
Owned domains are always private.
Context Path–Based Routing
A context path in a URL extends a top-level route with additional context so as to
route the client to either specific application functionality or a different application.
For example, http://my-app-name.my-business.com can be extended to http://my-app-
name.my-business.com/home to direct a client to a homepage.
In the preceding example, your clients can reach the application via my-app-
name.my-business.com. Therefore, if a client targets that route and uses a different
context path, it will still reach only a single application. For example, http://my-app-name.my-business.com/home and http://my-app-name.my-business.com/somewhereelse will both be routed by the GoRouter to your app my-app-name.
This approach works if all the functionality under the route my-app-name.my-
business.com can be served by a single app. However, when using microservices, there
is a need to have a unified top-level route that can be backed by a number of micro‐
services. Each service uses the same top-level domain but can be individually reached
by different paths in the URL. The microservices collectively serve all supported
paths under the domain my-app-name.my-business.com. With context path–based
routing, you can independently scale those portions of your app that are experiencing
high/low traffic.
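A sketch of context path-based routing with the cf CLI might look like the following, assuming two separately deployed services sharing one top-level route:

$ cf map-route catalog-service my-business.com --hostname my-app-name --path catalog
$ cf map-route checkout-service my-business.com --hostname my-app-name --path checkout

Requests to my-app-name.my-business.com/catalog and /checkout are then routed to the respective applications, and each can be scaled independently.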
Rolling Upgrades and Blue/Green Deployments
As discussed in “Security” on page 19, both the applications running on the platform
and the platform itself allow for rolling upgrades and zero-downtime deployment
through a distributed consensus.
You can update applications running on the platform with zero downtime through a
technique known as blue/green deployments.
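A common blue/green sequence, sketched here with placeholder names, switches traffic between two versions of an application by remapping routes:

$ cf push my-app-green -n my-app-temp -d my-business.com              # deploy the new version on a temporary route
$ cf map-route my-app-green my-business.com --hostname my-app-name    # add the production route to the new version
$ cf unmap-route my-app-blue my-business.com --hostname my-app-name   # drain traffic away from the old version
$ cf delete my-app-blue -f                                            # remove the old version once verified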
Summary
This chapter walked you through the principal concepts of Cloud Foundry. For
example, you should now understand the meaning of a cloud OS, and the importance
of the twelve-factor contract. The primary premise of Cloud Foundry is to enable the
application development-to-deployment process to be as fast as possible. Cloud
Foundry, underpinned by the BOSH release-engineering tool chain, achieves this by
doing more on your behalf. Cloud Foundry provides the following:
•Built-in resiliency through automated recovery and self-healing of failed applica‐
tions, components, and processes
• Built-in resiliency through striping applications across different resources
•Authentication and authorization for both users and applications, with the addi‐
tion of RBAC for users
•Increased security posture with the ability to rotate credentials and repave and
repair components
•The ability to update the platform with zero downtime via rolling upgrades
across the system
•Speed in the deployment of apps with the ability to connect to a number of serv‐
ices via both platform-managed service brokers and services running in your
existing IT infrastructure
•Built-in management and operation of services for your application, such as met‐
rics and log aggregation, monitoring, autoscaling, and performance management
Now that you understand both the concepts and capabilities of Cloud Foundry, you
are ready to learn about the individual components that comprise a Cloud Foundry
deployment.
1. The terms VM and machine are used interchangeably because BOSH can leverage and deploy to multiple infrastructure environments ranging from containers, to VMs, right down to configuring physical servers.
CHAPTER 3
Components
This chapter explores the details of Cloud Foundry components. If you are keen to
begin your journey by deploying Cloud Foundry, you are free to jump ahead to
Chapter 4 and refer to this chapter at a later time.
Cloud Foundry is a distributed system involving several different components. Dis‐
tributed systems balance their processing loads over multiple networked machines.1
They are optimized between efficiency and resiliency against failures. Cloud Foundry
is comprised of a modular distributed architecture with discrete components utilized
for dedicated purposes.
Distributed Systems
As a distributed system, each Cloud Foundry component has a well-defined responsi‐
bility. The different components interact with one another to achieve a common goal.
Distributed systems achieve component interaction through communicating via mes‐
sages and using central data stores for system-wide state coordination. There are sev‐
eral benefits to using a distributed component model, such as the ability to scale a
single component in isolation, or the ability to change one component without
directly affecting another.
It is important for the operator to understand what comprises the Cloud Foundry
distributed system. For example, some components are responsible for system state
and you need to back them up. Other components are responsible for running your
applications, and therefore you most probably want more than one instance of those
2. System state, along with configuration, is the most critical part of your environment; everything else is just wiring. Processes can come and go, but your system state and configuration must maintain integrity.
components to remain running to ensure resiliency. Ultimately, understanding these
components and where their boundaries of responsibility lie is vital when it comes to
establishing concerns such as resiliency and disaster recovery.
In this chapter, you will learn about the following:
1. The core components, including their purpose and boundary of responsibility
2. The flow of communication and interaction between components
3. The components responsible for state2
Component Overview
The Cloud Foundry components are covered in detail in the Cloud Foundry docu‐
mentation. The Cloud Foundry code base is being rapidly developed by a growing
open source team of around 250 developers. Any snapshot in time is going to change
almost immediately. This book focuses not on the specific implementations that are
subject to change, but on the purpose and function of each component. The imple‐
mentation details change; the underlying patterns often remain.
We can group by function the components that make up Cloud Foundry into differ‐
ent layers. Table 3-1 lists these layers.
Table 3-1. Cloud Foundry component layers

Layer: Components
Routing: GoRouter, TCPRouter, and external load balancer (a)
Authentication and user management: User Access and Authentication Management
Application life cycle and system state: Cloud Controller, Diego's core components (e.g., BBS and Brain)
App storage and execution: blobstore (including app artifacts/droplets and the Application Life-Cycle Binaries), Diego Cell (Garden and runC)
Services: Service Broker, User Provided Service
Messaging: NATS messaging bus
Metrics and logging: Loggregator (including Doppler and the Firehose)

(a) The external load balancer is not a Cloud Foundry component; it fronts the traffic coming into Cloud Foundry.
Figure 3-1 provides a visual representation of these components.
Figure 3-1. Cloud Foundry component layers
To discuss the core components and their role within Cloud Foundry, we will take a
top-down approach beginning with components that handle traffic coming into
Cloud Foundry.
Routing via the Load Balancer and GoRouter
All HTTP-based traffic first enters Cloud Foundry from an external load balancer
fronting Cloud Foundry. The load balancer is primarily used for routing traffic to the
GoRouter.
Load Balancer Preference
The choice of load balancer is up to you. Infrastructure hosted on AWS often uses Amazon's Elastic Load Balancer (ELB). On-premises deployments, such as those on vSphere or OpenStack, take advantage of existing enterprise load balancers such as F5's BIG-IP.
The load balancer can either handle Secure Sockets Layer (SSL) decryption and then route traffic on to the GoRouter, or pass the SSL connection on to the GoRouter for SSL decryption.
3. The GoRouter supports Sticky Session configuration if required.
The GoRouter receives all incoming HTTP traffic from the load balancer. The term
“router” can be misleading to networking engineers who expect routers to implement
specific networking standards. Conceptually, the router should be treated as a reverse
proxy, responsible for centrally managing and routing all incoming HTTP(S) traffic
to the appropriate component. Traffic will typically be passed on to either an applica‐
tion or the Cloud Controller:
•Application users target their desired applications via a dedicated domain. The
GoRouter will route application traffic to the appropriate AI running on a Diego
Cell. If multiple AIs are running, the GoRouter will round-robin traffic across
the AIs to distribute the workload.3
•Cloud Foundry users address the Cloud Controller: Cloud Foundry’s API known
as the CAPI. Some client traffic will go directly from the GoRouter to the UAA;
however, most UAA calls are initiated from the Cloud Controller.
The GoRouter periodically queries Diego, Cloud Foundry’s container runtime sys‐
tem, for information on the location of the Cells and containers on which each appli‐
cation is currently running.
Applications require a route for external traffic to access them. The GoRouter uses a
routing table to keep track of the available applications. Because applications can
have multiple AIs, all with a single route, each route has an associated array of
host:port entries. The host is the Diego Cell machine running the application. The
GoRouter regularly recomputes new routing tables based on the Cell’s IP addresses
and the host-side port numbers for the Cell’s containers.
Routing is an important part of Cloud Foundry. Chapter 7 discusses the GoRouter
and routing in general in greater detail.
User Management and the UAA
As traffic enters Cloud Foundry, it needs to be authenticated. Cloud Foundry’s UAA
service is the central identity management service for managing:
• Cloud Foundry developers
•Application clients/end users
• Applications requiring application-to-application interactions
The UAA is an OAuth2 authorization server that issues access tokens for client
applications to use when they act on behalf of Cloud Foundry users; for example,
when they request access to platform resources. The UAA is based on the most up-
to-date security standards like OAuth, OpenID Connect, and System for Cross-
domain Identity Management (SCIM). It authenticates platform users via their Cloud
Foundry credentials. When users register an account with the Cloud Foundry plat‐
form, the UAA acts as the user identity store, retaining user passwords in the UAA
database. The UAA can also act as an SSO service. It has endpoints for managing user
accounts and registering OAuth2 clients as well as various other management func‐
tions. In addition, you can configure the UAA’s user-identity store to either store
user information internally or connect to an external user store through LDAP or
SAML. Here are a couple of examples:
•Users can use LDAP credentials to gain access to the Cloud Foundry platform
instead of registering a separate account.
•Operators can use SAML to connect to an external user store in order to enable
SSO for users who want to access Cloud Foundry.
The UAA has its own database known as the UAADB. Like all databases responsible
for system state, this is a critical component and you must make backups. Chapter 15
looks at the UAA in more detail.
The Cloud Controller
The Cloud Controller exposes Cloud Foundry’s REST API. Users of Cloud Foundry
target the Cloud Controller so that they can interact with the Cloud Foundry API
(CAPI). Clients directly interact with the Cloud Controller for tasks such as these:
• Pushing, staging, running, updating, and retrieving applications
• Pushing, staging, and running discrete one-off tasks
You can interact with the Cloud Controller in the following three ways:
• A scriptable CLI
• Language bindings (currently Java)
• Integration with development tools (IDEs) to ease the deployment process
You can find a detailed overview of the API commands at API Docs. The V3 API
extends the API to also include Tasks in addition to application LRPs (long-running processes).
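For example, in addition to the standard CLI commands, you can issue raw requests against the CAPI with cf curl; the endpoints shown are common listing calls and serve only as illustration:

$ cf curl /v2/apps             # list applications visible to the current user
$ cf curl /v2/organizations    # list Orgs as recorded by the Cloud Controller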
The Cloud Controller is responsible for the System State and the Application Life-
Cycle Policy.
System State
The Cloud Controller uses two components for storing state: a blobstore and a data‐
base known as the CCDB.
The Cloud Controller blobstore
To store large binary files such as application artifacts and staged application drop‐
lets, Cloud Foundry’s Cloud Controller uses a blobstore that holds different types of
artifacts:
•Application code packages—unstaged files that represent an application
• Resource files
•Buildpacks
• Droplets and other container images
Resource files are uploaded to the Cloud Controller and then cached in the blobstore
with a unique secure hash algorithm (SHA) so as to be reused without reuploading
the file. Before uploading all the application files, the Cloud Foundry CLI issues a
resource match request to the Cloud Controller to determine if any of the application
files already exist in the resource cache. When the application files are uploaded, the
Cloud Foundry CLI omits files that exist in the resource cache by supplying the result
of the resource-match request. The uploaded application files are combined with the
files from the resource cache to create the complete application package.
Droplets are the result of taking an app package and staging it via a processing build‐
pack to create an executable binary artifact. The blobstore uses FOG so that it can use
abstractions like Amazon Simple Storage Service (Amazon S3) or a mounted network
file system (NFS) for storage.
The CCDB
The Cloud Controller, via its CCDB, also maintains records of the logical hierarchical
structure including available orgs, spaces, apps, services, service instances, user roles,
and more. It maintains users’ information and how their roles map to orgs and
spaces.
The Application Life-Cycle Policy
The Cloud Controller is responsible for the Application Life-Cycle Policy. Conceptu‐
ally, the Cloud Controller is Cloud Foundry’s CPU that drives the other components.
For example, when you use the cf push command to push an application or task to
Cloud Foundry, the Cloud Controller stores the raw application bits in its blobstore,
4. The OCI is an open governance structure for the express purpose of creating open industry standards around container formats and runtime. For more information, see https://www.opencontainers.org/.
creates a record to track the application metadata in its database, and directs the
other system components to stage and run the application.
The Cloud Controller is application and task centric. It implements all of the object
modeling around running applications (handling permissions, buildpack selection,
service binding, etc.). The Cloud Controller is concerned with policy; for example,
“run two instances of my application,” but the responsibility of orchestration and exe‐
cution has been passed to Diego.
Continuous Delivery Pipelines
Many companies choose to interact with Cloud Foundry through a continuous delivery pipeline such as Concourse.ci, in which automated tooling, rather than a human, drives the CAPI. This approach reduces human error and offers deployment repeatability.
Application Execution
The components responsible for executing your applications and tasks include Diego,
Garden (a container management API), and an Open Container Initiative (OCI)–
compatible4 backend container implementation (like runC).
Diego
Diego is the container runtime architecture for Cloud Foundry. Whereas the Cloud
Controller is concerned with policy, it is Diego that provides the scheduling, orches‐
tration, and placement of applications and tasks. Diego is designed to keep applica‐
tions available by constantly monitoring their states and reconciling the actual system
state with the expected state by starting and stopping processes as required. Chapter 6
covers Diego at length.
Garden and runC
Garden is a platform-agnostic Go API for container creation and management. Gar‐
den has pluggable backends for different platforms and runtimes, including Linux,
Windows, and runC, an implementation of the OCI specification. Chapter 8 looks at
containers in detail.
Metrics and Logging
As discussed in “Aggregated Streaming of Logs and Metrics” on page 17, system logs
and metrics are continually generated and streamed from every component. In addition, each running AI produces its own continual stream of application logs. Cloud
Foundry’s logging system is known as the Loggregator. The Loggregator aggregates
component metrics and application logs into a central location using a Metron agent.
Metron Agent
The Metron agent comes from the Loggregator subsystem and resides on every VM.
It gathers a mix of metrics and log statistics from the Cloud Foundry components; for
example, the Metron agent gathers application logs from the Cloud Foundry Diego
hosts known as Cells. Operators can use the collected component logs and metrics to
monitor a Cloud Foundry deployment. These metrics and logs can be forwarded by a
syslog forwarder onto a syslog drain. It is possible to drain syslogs into multiple con‐
sumers. You can set up as many syslog “sinks” for an application as you want. Each
sink will receive all messages.
Metron has the job of forwarding application logs and component metrics to the
Loggregator subsystem by taking traffic from the various emitter sources (Cells in the
case of apps) and routing that logging traffic to one or more Loggregator compo‐
nents. An instance of the Metron agent runs on each VM in a Cloud Foundry system
and logs are therefore co-located on the emitter sources.
Loggregator
The Loggregator (log aggregator) system continually streams logging and metric
information. The Loggregator’s Firehose provides access to application logs, con‐
tainer metrics (memory, CPU, and disk-per-app instance), some component metrics,
and component counter/HTTP events. If you want to see firsthand the output from
the Loggregator, you can invoke the CF CLI command $ cf logs APP, as demon‐
strated in the following example:
2017-02-14T13:10:32.260-08:00 [RTR/0] [OUT] twicf-signup.cfapps.io -
[2017-02-14T21:10:32.250+0000] "GET /favicon.ico HTTP/1.1" 200 0 946
"https://twicf-signup.cfapps.io/"... "10.10.66.187:26208" "10.10.147
.77:60076" x_forwarded_for:"71.202.60.71" x_forwarded_proto:"https"
vcap_request_id:"2c28e2fe-54f7-48d7-5f0b-aaead5ab5c7c" response_time
:0.009104115 app_id:"ff073944-4d18-4c73-9441-f1a4c4bb4ca3" app_index:"0"
The Firehose does not provide component logs. Component logs are retrieved
through an rsyslog drain.
Messaging
Most component machines communicate with one another internally through HTTP
and HTTPS protocols. Temporary messages and data are captured in two locations:
•A Consul server stores the longer-lived control data such as component IP
addresses and distributed locks that prevent components from duplicating
actions.
•Diego’s bulletin board system (BBS) stores a real-time view of the system, includ‐
ing the more frequently updated and disposable data such as Cell and application
status, unallocated work, and heartbeat messages.
The Route-Emitter component still uses the NATS messaging system, a lightweight
distributed publish–subscribe message queuing system, to broadcast the latest rout‐
ing tables to the routers.
Additional Components
In addition to the aforementioned core system components, there are other compo‐
nents that comprise the Cloud Foundry ecosystem.
Stacks
A stack is a prebuilt root filesystem (rootfs). Stacks are used along with droplets (the
output of buildpack staging). They provide the container filesystem used for running
applications.
Cells can support more than one stack if configured correctly; however, a Cell must
ensure buildpack and stack compatibility. For example, a Windows “stack” cannot
run on Linux VMs. Therefore, to stage or run a Linux app, a Cell running a Linux
stack must be available (and have free memory).
A Marketplace of On-Demand Services
Applications often depend on additional backing services such as databases, caches,
messaging engines, or third-party APIs. Each Cloud Foundry deployment has the
concept of a marketplace. The Cloud Foundry marketplace is a platform extension
point. It exposes a set of services that are available for developers to use in support of
running applications. Developers do not build applications in isolation. Applications
often require additional middleware services such as data persistence, search, cach‐
ing, graphing, messaging, API management, and more.
The platform operator exposes additional services to the marketplace through service
brokers, route services, and user-provided services. The marketplace provides Cloud
Foundry users with the self-service, on-demand provisioning of additional service
instances. The platform operator can expose different service plans to different Cloud
Foundry Orgs. Developers are able to view and create service instances only for ser‐
vice plans that have been configured to be visible for their targeted Org and Space.
You can make service plans public and visible to all, or private to limit service visibil‐
ity.
A service can offer different service plans to provide varying levels of resources or
features for the same service. An example service plan is a database service offering
small, medium, or large plans with differing levels of concurrent connections and
storage sizes. The provisioned service provides a unique set of credentials that can be
used to bind and connect an application to the service.
Services are not limited to providing middleware; they encompass anything that enables development teams to move quickly, including, for example, GitHub, Pivotal Tracker, CI services, and route services. For
example, you can expose to the marketplace any application running on the platform
that offers a service for others to consume. One advantage of this approach is that the
service broker plans can prepopulate datastores with a specific schema or set of data
(such as a sample customer set required for unit testing). Another example could be a
service broker plan to provide specific preconfigured templates for apps.
Service brokers
Developers can provision service instances and then bind those instances to an appli‐
cation via a service broker responsible for providing the service instance.
A service broker interacts with the Cloud Controller to provision a service instance.
Service brokers advertise a catalog of service offerings and service plans (e.g., a single-
node MySQL plan or a clustered multinode MySQL plan). A service broker imple‐
ments the CAPI to provide the user with the following:
• List service offerings
•Provision (create) and deprovision (delete) service instances
• Enable applications to bind to, and unbind from, the service instances
In general, provision reserves service resources (e.g., creates a new VM) and bind
delivers the required information for accessing the resource to an application. The
reserved resource is known, in Cloud Foundry parlance, as a service instance.
The service instance is governed by the broker author. What a service instance repre‐
sents can vary not just by service, but also by plan. For example, an instance could be
a container, a VM, or a new table and user in an existing database. Plans could offer a
single database on a multitenant server, a dedicated datastore cluster, or simply an
account and specific configuration on a running Cloud Foundry application. The key
concern is that the broker implements the required API to interact with the Cloud
Controller.
User-provided services
In addition to Cloud Foundry-managed service instances, operators can expose exist‐
ing services via user-provided services. This allows established services such as a cus‐
tomer database to be bound to Cloud Foundry applications.
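To illustrate both models, here is a minimal sketch; the service offering, plan, application, and credential values are placeholders and depend on what your marketplace exposes:

$ cf marketplace                                 # list available service offerings and plans
$ cf create-service p-mysql 100mb orders-db      # provision a service instance via its broker
$ cf bind-service orders-app orders-db           # inject the instance's credentials into the app
$ cf create-user-provided-service legacy-crm -p '{"uri":"https://crm.internal.example.com"}'
$ cf bind-service orders-app legacy-crm          # bind an existing, externally managed service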
Buildpacks and Docker Images
How do you run applications inside Cloud Foundry? Applications can simply be
pushed through the following CF CLI command:
$ cf push
There are currently two defined artifact types that you can cf push:
•A standalone application
•A prebuilt Docker image (that could contain additional runtime and middleware
dependencies)
You can push standalone applications either as a prebuilt artifact, such as a WAR/JAR file, or, in some cases, as raw source code, such as a link to a Git remote.
Because a standalone application is not already part of a container image, when it is
pushed to Cloud Foundry the buildpack process runs a compile phase. This involves
compiling any source code and packaging the application and runtime dependencies
into an executable container image. The buildpack process is also responsible for
constructing the application runtime environment, deploying the application, and,
finally, starting the required processes.
Buildpacks are a core link in the chain of the Cloud Foundry deployment process if
you are deploying only an application artifact (e.g., JAR, WAR, Ruby, or Go source).
The buildpack automates the following:
•The detection of an application framework
• The application compilation (known in Cloud Foundry terminology as staging)
• Running the application
The officially supported buildpacks are listed at http://docs.cloudfoundry.org/buildpacks/index.html. This list includes Ruby, Java (including other JVM-based lan‐
guages), Go, Python, PHP, Node.js, and the binary and staticfile buildpacks.
Numerous additional community buildpacks exist. You can extend buildpacks or
create new ones for specific language, framework, and runtime support. For example,
5. NAT is only required if you are using nonroutable addresses.
the reason you might extend the Java buildpack (JBP) is to add support for additional
application servers or a specific monitoring agent.
Buildpacks take your application artifact and containerize it into what Cloud Foun‐
dry calls a droplet. However, if you already have an OCI-compatible container image
such as a Docker image, you can use cf push to move that directly to Cloud Foundry
in order to run it. Containerization is the delivery mechanism for applications. This
is true whether you push an application that Cloud Foundry containerizes into a
droplet, or you push a prebuilt Docker image.
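As an illustrative comparison (the application names, artifact path, buildpack, and image are placeholders):

$ cf push my-java-app -p build/libs/my-app.jar -b java_buildpack   # stage an artifact with a buildpack into a droplet
$ cf push my-docker-app --docker-image example/my-image:1.0        # run a prebuilt Docker/OCI image directly

In the first case the buildpack produces a droplet; in the second, Cloud Foundry schedules the supplied image as-is.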
Infrastructure and the Cloud Provider Interface
Cloud Foundry relies on existing cloud-based infrastructure. The underlying infra‐
structure will have implementation-specific details. For example, vSphere’s vCenter
deals with clusters and resource pools, whereas AWS deals with regions and AZs.
There are, however, some fundamental capabilities that need to be available and set
up prior to installing Cloud Foundry:
• Networks and subnets (typically a /22 private network)
• VMs with specified CPU and memory requirements
•Storage for VMs
• File server or blobstore
•DNS, certificates, and wildcard domains
• Load balancer to pass traffic into the GoRouter
• NAT5 for traffic flowing back to the load balancer
Cloud Foundry abstracts away infrastructure-specific implementations through the
use of a cloud provider interface (CPI). Chapter 10 examines this topic further.
The Cloud Foundry GitHub Repository
Cloud Foundry uses the git system on GitHub to version-control all source code,
documentation, and other resources such as buildpacks. Currently, the integrated
Cloud Foundry code base can be located at cf-deployment. To check out Cloud
Foundry code from GitHub, use the master branch because it points to the most
recent stable final release.
Summary
By design, Cloud Foundry is a distributed system involving several components,
grouped into the following functional layers:
•Routing for handling application and platform user traffic
•Authentication and user management
•Application life cycle and system state through the Cloud Controller and Diego’s
BBS
•Container runtime and app execution through Diego
• Services via the service marketplace
• Messaging
• Metrics and logging
Decoupling Cloud Foundry into a set of services allows each individual function to
grow and evolve as required. Each Cloud Foundry component has a well-defined
responsibility, and the different components interact with one another and share
state in order to achieve a common goal. This loose coupling is advantageous:
•You can scale individual components in isolation
• You can swap and replace components to extend the platform capability
• You can promote a pattern of reuse
Now that you are aware of the components that comprise Cloud Foundry, you are
ready to begin creating a Cloud Foundry instance. Chapter 4 defines the prerequisites
for installing Cloud Foundry.
CHAPTER 4
Preparing Your Cloud Foundry Environment
This chapter explores the steps you must perform prior to bootstrapping BOSH and
deploying Cloud Foundry. Critically, Cloud Foundry is not a “one size fits all” tech‐
nology, and therefore, you must make some decisions at the outset prior to installing
the platform. It is important that you understand the key concerns and decision
points that define your environment, including:
•Installation steps
• Non-technical considerations
• Cloud Foundry dependencies and integrations
• IaaS and infrastructure design
• Networking design and routing
This chapter assumes that you are familiar with the Cloud Foundry components dis‐
cussed in Chapter 3.
Installation Steps
Following are general steps for deploying Cloud Foundry:
1. Create and configure your IaaS environment, including all the periphery infra‐
structure that Cloud Foundry requires, such as networks, security groups, blob‐
stores, and load balancers.
2. Set up any additional external enterprise services such as LDAP, syslog endpoints
or monitoring, and metrics dashboards.
3. Deploy the BOSH Director.
1. Deployment and configuration steps are significantly easier to manage if using a CI pipeline such as Concourse.ci.
4. Create an IaaS/infrastructure-specific BOSH configuration such as cloud config‐
uration.
5. Create a deployment manifest to deploy Cloud Foundry.
6. Integrate Cloud Foundry with the required enterprise services (via your deploy‐
ment manifest).
7. Deploy Cloud Foundry.1
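To give a flavor of steps 3 through 7, the BOSH CLI drives most of this process; the file names, environment alias, and deployment name below are illustrative rather than prescriptive:

$ bosh create-env bosh.yml --state state.json --vars-store creds.yml   # step 3: bootstrap the BOSH Director
$ bosh -e my-env update-cloud-config cloud-config.yml                  # step 4: IaaS-specific cloud configuration
$ bosh -e my-env -d cf deploy cf-deployment.yml                        # steps 5 through 7: deploy Cloud Foundry from a manifest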
The rest of this chapter explores the necessary considerations for each step. Before we
dive into those topics, let’s address the non-technical considerations.
Non-technical Considerations
Before addressing the technical points, it is worth establishing two critical things:
• The team structure required for installing Cloud Foundry
• The required deployment topology of Cloud Foundry
These two concerns are especially pertinent when deploying Cloud Foundry for a
large enterprise. Both concerns go hand in hand because the “where and how” of
deploying Cloud Foundry should always be coupled with the team responsible for
operating it.
Team Structure: Platform Operations for the Enterprise
Cloud Foundry is generally deployed either on a team-by-team basis with one instal‐
lation per business unit, or via a central Platform Operations team who deploy and
operate a centralized platform that other DevOps-centric teams can utilize.
In my experience while working with companies helping them to
install and run Cloud Foundry, I came across a decentralized
deployment model only once. For that one company, it worked
well; however, most enterprises choose to establish a centralized
Platform Operations team.
Even with a cultural shift toward DevOps, organizations structure teams in a variety
of different ways. Choose the team structure that works best for you. Whichever var‐
iant you choose, the Platform Operator’s overall responsibility typically includes the
following roles:
• Networking administrator
• Storage administrator
• System administrator
• IaaS administrator
• Software development
•Security
• QA and performance testing
• Release management
•Project management
These are not necessarily nine exclusive roles; individuals might combine a number
of the preceding capabilities. For example, an IaaS administrator might also have
storage experience. Most of these roles—networking and security, for example—are
more pertinent at the outset when you’re setting up the platform and operational
processes. It is often sufficient just to maintain a regular point of contact for ongoing
expertise in these areas. A point of contact must still be responsive! If it takes a ticket
and three days of waiting to get a network change, your operating model is broken.
If Cloud Foundry is deployed in your data center instead of on a hosted virtual infra‐
structure, the Platform Operations team will need to work closely with the teams
responsible for the physical infrastructure. This is both to help facilitate capacity
planning and because an understanding and appreciation of the hardware capabilities
underpinning the IaaS layer is required to ensure a sufficient level of availability and
scalability. If Cloud Foundry is deployed to a hosted virtual infrastructure such as
GCP, Microsoft Azure, or AWS, your Platform Operations team still needs to work
closely with the teams responsible for networking so as to establish a direct network
connection between the cloud provider and your enterprise.
There is a software development role in the mix because it is necessary to define and
understand the application requirements and best practices based on Twelve-Factor
applications. Additionally, the Platform Operations team needs to appreciate the
application services required to support the cloud-native applications running on the
platform. Developers often have specific requirements for choosing a particular tech‐
nology appropriate for their application. This need can be magnified in a world of
microservices in which each service should be free to use an appropriate backing
technology so that it can grow and evolve independently from other services. The
Platform Operations team should work directly with other business-capability teams
to ensure that they offer a rich and defined portfolio of platform services. You can
host these services directly on the platform, where it makes sense to do so, instead of
operating and connecting them adjacent to the platform.
2. All companies I have worked with have adopted separate preproduction and production instances. From a technical standpoint, if you do not require strict networking segregation, and you size the environment for all workloads, then separate instances between preproduction and production are not required. However, the clean separation of concerns between the two environments is appealing for most companies, especially when considering additional concerns such as performance testing.
Deployment Topology
How many Cloud Foundry instances should you deploy? There are a number of fac‐
tors to consider when addressing this question. For example, ask yourself the follow‐
ing:
•Do you need one instance for the entire company or one instance per organiza‐
tion?
•Should you have one instance for preproduction workloads and a separate
instance for production apps?
•Do you need to isolate applications or data due to regulatory reasons such as
PCI, NIST, or HIPAA compliance?
•Do you need high availability of Cloud Foundry itself (data center-level redun‐
dancy)?
There is no single answer; businesses have different requirements and companies
adopt different approaches. However, the decision can be guided by the following
considerations:
•A single instance, or set of instances (e.g., a sandbox; preproduction and produc‐
tion) for the entire company is a conservative and reasonable starting point. As
you progress with your Cloud Foundry journey, an additional instance or set of
instances per organization might be required if an organization mandates spe‐
cific customizations (e.g., dedicated internal services that must be isolated from
other departments at the network layer).
•Of primary importance, a sandbox environment is always essential for the Plat‐
form Operator to test out new releases without risking developer or production
downtime.
•An instance for pre-production and a separate instance for production might be
required if strict network segregation to backed services (such as a production
database) is mandated.2
•Separate instances for non-regulated and regulated apps might be required if
dealing with industry regulations such as PCI or HIPAA compliance. A new feature
known as isolation segments might alleviate this requirement over time, as
discussed further in “Isolation Segments” on page 285.
•Two instances, one in each data center, might be required to allow for data-
center failover or active–active geodispersed application deployments.
Taking the aforementioned points into consideration, for a scenario in which Cloud
Foundry is centrally managed, I tend to see five deployments for a company that has
data centers in two locations (e.g., East and West regions):
•Two environments in a West data center—pre-production and production
instances
•Two environments in an East data center—pre-production and production
instances
•A sandbox environment for the Platform Operator
The importance of the sandbox environment should not be underestimated. Changes
to the Cloud Foundry environment can occur from several aspects, such as the fol‐
lowing:
• Changing or upgrading the network architecture
•Upgrading the IaaS and networking layer
• Upgrading Cloud Foundry
• Upgrading application services
When applying updates across the stack, there is always the potential, no matter how
small, of breaking the underlying infrastructure, the platform itself, or the apps or
services running on the platform. From the Platform Operator’s perspective, both
production and preproduction environments are essentially production systems
requiring 100 percent uptime. Companies strive to avoid both application and devel‐
oper downtime and there is a financial impact if any downtime occurs to either
group. Therefore, it is vital for the Platform Operator to have a safe environment that
can potentially be broken as new changes are tested, without taking developers or
production applications offline. It is worth pointing out that although there is always
a risk when updating software, Cloud Foundry and BOSH provide a unique set of
capabilities, such as instance group canaries and rolling deployments, that all help mitigate
risks and deliver exceptionally reliable updates.
Cloud Foundry Dependencies and Integrations
Before you can deploy and operate Cloud Foundry, you must ensure that all prereq‐
uisite dependencies are in place. These dependencies include concerns such as provi‐
sioning the IaaS and configuring a load balancer. Additionally, you can integrate
Cloud Foundry with other enterprise services such as syslog endpoints, SSO solu‐
tions, and metrics dashboards.
Following are the minimum external dependencies:
•Configured IaaS and infrastructure environment with available administrator
credentials
• Configured networking (subnets and security groups)
•Defined storage policy and an additional NFS- or Amazon S3–compatible blob‐
store (for both the BOSH blobstore and Cloud Foundry blobstore)
• External load balancers set up to point to GoRouter IP addresses
•DNS records set up, including defining appropriate system, app, and any other
required wildcard domains along with SSL certificates
Additional integration considerations based on enterprise services may require the
following:
• SAML, LDAP, or SSO configured for use with Cloud Foundry where required.
•A syslog endpoint such as Splunk or ELK (Elasticsearch, Logstash, and Kibana) is
available to receive component syslog information.
•System monitoring and metrics dashboards such as DataDog set up to receive
system metrics.
•An application performance management (APM) tool such as Dynatrace, New‐
Relic, or AppDynamics set up for receiving application metrics.
This section focuses only on the dependencies and integration
points required for getting started with Cloud Foundry. However,
before installing BOSH and Cloud Foundry, it is still worth consid‐
ering any services and external dependencies with which Cloud
Foundry might need to integrate. This is important because if, for
example, Cloud Foundry requires access to an external production
database, additional considerations such as network access and the
latency of roundtrip requests might arise. Ultimately, where your
vital backing services reside might be the key deciding factor in
where you deploy Cloud Foundry.
IaaS and Infrastructure Design
Before installing Cloud Foundry, you must correctly configure the underpinning
infrastructure.
The first technical decision to make is what IaaS or infrastructure you should use to
underpin Cloud Foundry. Through the use of the BOSH release tool chain, Cloud
Foundry is designed to run on any IaaS provider that has a supported CPI. Refer to
3. An example of such a deployment is documented at the Pivotal blog.
the BOSH documentation for an up-to-date list of supported CPIs. As of this writing,
there are BOSH CPIs for the following infrastructures:
• Google Cloud Platform (GCP)
• AWS
• Azure
•OpenStack
• vSphere’s vCenter
• vSphere’s vCloudAir
•Photon
• RackHD
• Your local machine (for deploying BOSH Lite)
Your infrastructure decision is often made based on the following three factors:
•Does your company desire to have an on-premises (vSphere or OpenStack) or a
cloud-based (AWS, Azure, GCP) infrastructure? What on-premises infrastruc‐
ture management exists today, and what is your desired future state?
• How long will it take to provision new infrastructure (if managed internally)?
• What is the long-term cost model (both capital and operational expenditure)?
With regard to the time to provision new infrastructure, public cloud providers such
as AWS, GCP, or Azure become appealing for many companies because you can
quickly spin up new infrastructure and then scale on demand. I have been on six-
week engagements for which installing the hardware into the company’s data centers
required four weeks. Time lags such as these are painful; thus utilizing public clouds
can accelerate your startup experience. Some companies choose to adopt both a pub‐
lic and private model to provide resiliency through dual deployments.3
From this point on, as a reference architecture, this chapter explores a Cloud Foun‐
dry installation on AWS. I selected AWS as the target IaaS for this book because it is
accessible both to corporations and individuals who might not have access to other
supported IaaS environments.
BOSH Lite
If you would like to begin installing Cloud Foundry and using BOSH but do not want
to incur the cost of running several VMs, I suggest using BOSH Lite. BOSH Lite is a
local development environment for BOSH using containers in a Vagrant box. The
BOSH Director that comes with BOSH Lite uses a Garden CPI, which uses containers
to emulate VMs. The usage of containers makes it an excellent choice for local devel‐
opment, testing, and general BOSH exploration because you can deploy the entire cf-
deployment into containers all running on a single VM. This is a great environment
to try Cloud Foundry, but be mindful that because everything is running in a single
VM, it is suitable only for experimentation, not production workloads. BOSH Lite is
discussed further in “BOSH Lite” on page 175.
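As a rough sketch of the workflow at the time of writing (the repository layout and default Director address may have changed since), you can bring up BOSH Lite locally with Vagrant:
$ git clone https://github.com/cloudfoundry/bosh-lite
$ cd bosh-lite
$ vagrant up --provider=virtualbox    # starts the single VM that hosts the Director
$ bosh alias-env lite -e 192.168.50.4 # default BOSH Lite Director address; depending on
                                      # your CLI version, you may also need the Director's CA certificate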
Designing for Resilience
As discussed in “Built-In Resilience and Fault Tolerance” on page 15, Cloud Foundry
promotes different levels of built-in resilience and fault tolerance. The lowest layer of
resilience within a single installation is achieved through the use of AZs to drive anti‐
affinity of components and applications to physical servers.
Cloud Foundry AZs (assuming they have been configured correctly) ensure that mul‐
tiple instances of Cloud Foundry components and the subsequent apps running on
Cloud Foundry all end up on different physical hosts. An important point here is that
if you desire to use AZs for high availability (HA), you will need a minimum of three
AZs (due to the need to maintain quorum for components that are based on the Raft
consensus algorithm). Chapter 16 offers more information on designing for resil‐
ience.
The Cloud Foundry deployment example discussed in Chapter 5
uses bosh-bootloader and cf-deployment. bosh-bootloader will
configure the infrastructure on your behalf and will try and config‐
ure three AZs by default (assuming the IaaS you deploy to can sup‐
port three distinct regions).
Sizing and Scoping the Infrastructure
Correctly sizing infrastructure for a specific Cloud Foundry installation can be chal‐
lenging, especially if you cannot accurately forecast how many apps are going to be
deployed to the platform. What’s more, this challenge becomes compounded because
a single application might require several instances due to a combination of the fol‐
lowing:
• Running multiple instances for resilience
•Running multiple instances over different spaces throughout the application life
cycle
•Running multiple instances for concurrency performance
For example, as depicted in Figure 4-1, a single app can often require a minimum of
seven AIs when running in an active–active setting:
• Two instances in development (blue and green versions)
• One instance in the test or the CI/CD pipeline
•Four instances in production (two instances in each active–active data center)
Figure 4-1. Typical multiregion, multi-instance deployment with user-interaction flow
In addition, a typical application involves more than one process; for example, it
might have a UI layer, controller/model layer, a service layer, and data layer. You can
quickly begin to see that accurate upfront sizing can become complex as the number
of running instances starts to scale.
Therefore, sizing and capacity planning is generally considered to be a “day two”
concern that should be informed, driven, and actioned via metrics and real data, as
opposed to hypothesis prior to deploying Cloud Foundry. A key advantage is that
Cloud Foundry scales well at every layer, from deploying more AIs to deploying
more Cloud Foundry resources, right down to upgrading the IaaS and the physical
infrastructure underpinning the IaaS.
When sizing infrastructure for Cloud Foundry, I strongly suggest leaving sufficient
headroom for growth. However, there is a danger that by doing this you can end up
with an initial IaaS capacity that drastically exceeds what you actually require. For
4. These reference architectures and tooling are designed for Pivotal Cloud Foundry. The rationale behind these resources holds true for all Cloud Foundry deployments with the exception of AI Packs and the Ops Manager VM, which can be discounted if you are not using Pivotal Cloud Foundry.
this reason, I tend to move away from explicit sizing and focus on general scoping.
Scoping is based on a set of predefined concerns. Through scoping, it is possible to
define the underlying infrastructure, and it becomes easier to be more explicit as to
what resources will be required.
Scoping deployments on AWS is relatively straightforward because you can pick your
desired AWS region, create and configure a new virtual private cloud (VPC) [along
with subnets, security groups, etc.], and then map your Cloud Foundry AZs to the
AWS region AZs. When deploying Cloud Foundry to physical infrastructure, you
need to size your infrastructure in advance. For example, most vSphere customers
have a minimum vSphere cluster configuration of three hosts. Two-host clusters are
not ideal if 50% of the cluster is taken up by vCenter HA demands, and a single-host
cluster is meaningless from a vSphere perspective. vSphere host sizes usually contain
256 GB, 384 GB, or 512 GB of RAM. Taking into account the requirement of three
AZs based on three vSphere clusters, this implies that the total provisioned RAM is
2.25 to 4.5 TB, which is often drastically more than the initial requirement. The same
logic holds true for CPU cores.
For an excellent reference point, the Pivotal Professional Services
team has put together two fantastic resources to help structure and
size your infrastructure:4
•For establishing the base architectural design, you can review
the reference architecture.
•For sizing your base architecture for the required workload,
there is a sizing tool—the PCF sizing tool—based on some
fundamental scoping questions.
After you define your architectural footprint and then size the infrastructure based
on your anticipated workload, the next key requirement is to understand the implica‐
tions of your applications operating under peak load. For example, a retail app on a
critical sale day might require more instances to deal with concurrency concerns.
Additional instances might require additional Cell resources. Throughput of traffic to
the additional apps might require additional GoRouters to handle the extra concur‐
rent connections. When designing your routing tier to handle the required load, you
can review the Cloud Foundry routing performance page.
5. Diego is the subsystem responsible for app placement and the container life cycle.
Cell sizing
Cell sizing is of particular importance because Cells run your applications. This is
where most of your expansion will be when your AIs begin to increase. Table 4-1 lists
typical sizing for a Cell.
Table 4-1. Cloud Foundry example Cell configuration
Resource Sizing
AI average RAM 1.5 GB
AI average storage 2 GB
Cell instance type AWS m3.2xlarge (or vSphere eight, 4-core vCPUs)
Cell mem size 64 GB
Cell ephemeral storage size 64 GB
Overcommitting Memory
Overcommitting memory is configurable in BOSH. However, you
need to have a crystal-clear profile of your workloads if you over‐
commit memory because you can run out of memory in cata‐
strophic ways if you calculate this wrong. Memory is relatively
inexpensive, so it is best to avoid this.
A typical three-AZ deployment is often initially set up with six Cells in total: two
Cells per AZ. You can always increase the size of your Cells or add additional Cells at
a later date.
A key consideration when choosing Cell size is that if the Cells are too small and the
average app size too high, you risk wasting space. For example, imagine an 8 GB Cell,
with 2 GB apps. When you take into account headroom for the host OS, you will be
able to fit only three 2 GB apps on a single Cell. This means that you are wasting well
over 1 GB of RAM per Cell; as you replicate this pattern over several Cells, the wasted
RAM begins to accumulate to a significant amount.
Conversely, if your Cells are too large, you risk unnecessary churn if a Cell fails. For
example, imagine a 128 GB Cell hosting two hundred fifty-five 512 MB apps.
Although the app density is great, if for some reason the Cell fails, Cloud Foundry5
will need to subsequently replace 255 apps in one go, in addition to any other work‐
load coming onto the platform. Although this volume of replacement most likely will
be fine, replicating this pattern across several concurrent Cell failures—for example,
due to an AZ failure—can cause unnecessary churn. Although Cloud Foundry can
cope with a huge amount of churn when replacing applications, if all of the Cells are
running at maximum capacity (as in the scenario stated), rescheduling the apps will
not work until you add additional Cells. However, Cloud Foundry is eventually con‐
sistent. Even for a situation in which Cloud Foundry fails to place the apps, it will
eventually fix itself and reach a consistent state when additional resources are added.
Based on the preceding two illustrations, 32 GB to 64 GB Cells (for an average app
size of between 512 MB and 2 GB) is a good sweet spot to work with.
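As a quick back-of-the-envelope check (the 10 percent host headroom used here is an assumption, not a fixed rule), you can estimate how many average-sized AIs fit on a candidate Cell:
# How many 2 GB AIs fit on a 64 GB Cell, reserving roughly 10% for the host OS and Diego processes?
$ echo $(( (64 * 90 / 100) / 2 ))
28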
cf-deployment’s manifest generation capability provides default
Cell sizes that can be reconfigured after you generate the deploy‐
ment manifest.
Instance group replication
HA requirements are discussed in detail in “High Availability Considerations” on
page 263. For deployment of a single Cloud Foundry foundation, you are generally con‐
cerned with defining three AZs and removing single points of failure through
instance group replication.
When using AZs, you can ensure that there are very few (or zero) single points of
failure. In addition to running a multiple Cell instance group as previously described,
you should also run multiple instances of the following:
• MySQL (for BBS, UAA, and CCDBs)
•NATS
• Diego (BBS, Brain, AccessVM, CC-Bridge, Cells)
• Routing (GoRouter, Consul, RouteEmitter)
• Cloud Controller (including the CC-Worker)
• UAA
•Doppler server and Loggregator traffic controller
Strictly speaking, most of the aforementioned instance groups
require only two AZs. Clustered components such as MySQL,
Consul, and etcd require quorum and thus require three AZs. For
simplicity, I recommend replicating all components across three
AZs if you have enough IaaS capacity.
Therefore, a single Cloud Foundry deployment spanning three AZs should have at
least two instances of each of the aforementioned components that do not require quorum. As
discussed previously, any component that uses the Raft consensus algorithm (MySQL,
Consul, etcd) should have three instances to maintain a quorum. These instance
groups always need an odd number of instances deployed, but you get no additional
benefit from having more than three instances. For the other instance groups in the
list, at least two instances are required to ensure HA and to allow for rolling
upgrades. However, you might want a higher number of instances in order to handle
additional capacity and concurrency.
For example, because the GoRouter is on the critical path for application traffic, it
might be considered necessary to have two GoRouters per AZ to ensure that you can
handle peak traffic if an AZ goes offline. Because the GoRouters load-balance
between all the backend instances, the GoRouter is able to span AZs to reach the
desired AI. Therefore, if two GoRouters can handle peak workload, adding an addi‐
tional instance per AZ is not necessary.
Clock Global remains the only singleton. It is not a critical job; it is used for Cloud
Controller cleanup, and BOSH resurrection should be sufficient to recover this instance.
Beyond that, the only singleton jobs remaining would be the internal NFS, the HAProxy,
and the MySQL proxy. The NFS can and should be replaced by an external blobstore such
as Amazon S3. The HAProxy and MySQL proxy are singletons by default; however, you can
and should use an enterprise load balancer, in which case the MySQL proxy can have multiple
instances (e.g., one per AZ) when used in conjunction with that external load balancer.
If you’re using more than one Cloud Foundry AZ, any singleton instance group must
be placed into a specific AZ. Other components that can run with several instances
can and should be balanced across multiple AZs.
bosh-bootloader will try to configure three AZs by default (assum‐
ing that the IaaS to which you deploy will support three distinct
regions). cf-deployment will then deploy multiple instances of cer‐
tain instance groups across the three AZs.
Setting Up an AWS VPC
Each IaaS that you use will have its own unique set of dependencies and configura‐
tion steps. For example, before setting up an AWS VPC, you will need the following:
•An AWS account that can accommodate the minimum resource requirements
for a Cloud Foundry installation.
•The appropriate region selected within your AWS account (this example uses
US-west).
•The AWS CLI installed on your operator machine, configured with user creden‐
tials that have administrative access to your AWS account.
6. bosh-bootloader will create this key-pair for you.
•Sufficiently high instance limits (or no instance limits) on your AWS account.
Installing Cloud Foundry might require more than the default 20 concurrent
instances.
•A key-pair to use with your Cloud Foundry deployment. This key-pair allows
you to configure SSH access to your instances. The key-pair name should be
uploaded to the NAT instance upon VPC creation.6
•A certificate for your chosen default domain.
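For example, you could satisfy the CLI and key-pair prerequisites as follows (a hedged sketch; the key-pair name cf-keypair and the region are illustrative):
$ aws configure        # supply your access key, secret key, and region (e.g., us-west-2)
$ aws ec2 create-key-pair --key-name cf-keypair \
    --query 'KeyMaterial' --output text > cf-keypair.pem
$ chmod 400 cf-keypair.pem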
If you are deploying a highly distributed Cloud Foundry on AWS,
built for HA, you will need to file a ticket with Amazon to ensure
that your account can launch more than the default 20 instances.
In the ticket, ask for a limit of 50 t2.micro instances and 20 c4.large
instances in the region you are using. You can check the limits on
your account by visiting the EC2 Dashboard on the AWS console;
in the Navigation pane on the left, click Limits.
You can bootstrap VPC setup via bosh-bootloader, as described in “Installing Cloud
Foundry” on page 67. Table 4-2 lists the external IaaS-related dependencies that you
should take into consideration when manually setting up an AWS VPC.
Table 4-2. Cloud Foundry AWS VPC dependencies
Item
Access key ID
AWS secret key
VPC ID
AWS security group name
Region
Key-pair name (used for VM-to-VM communication and SSH)
DNS IP(s)
NTP IP(s)
SSH password (if applicable)
VPC to data center connection
If you are not using bosh-bootloader, there are some excellent instructions on setting
up an AWS IaaS environment ready to install Cloud Foundry at prepare-aws.
Figure 4-2 presents an example of a typical direct VPC setup.
Figure 4-2. Setting up the VPC
A strong word of caution. The Identity and Access Management
(IAM) policy offered by BOSH.io is sufficient to get you started,
but for corporate or production environments, it is highly recom‐
mended you set a very restrictive policy to limit unnecessary user
access. In addition, when setting up the AWS security group
(egress and ingress network rules), it is highly insecure to run any
production environment with 0.0.0.0/0 as the source or to make
any BOSH management ports publicly accessible.
Jumpbox
A jumpbox VM (or bastion VM) provides layered security because it acts as a single
access point for your control plane. It is the jumpbox that accesses the BOSH Direc‐
tor and subsequent deployed VMs. For operator resilience, there should be more
than one jumpbox. Allowing access through the firewall only to the jumpboxes and
disabling direct access to the other VMs is a common security measure.
Using a jumpbox from your VPC to deploy BOSH and Cloud Foundry can help navi‐
gate past network connectivity issues, given that most corporate networks will have
firewall rules preventing SSH connections from a VPC to a local machine.
Instead of using SSH to connect to the jumpbox and run commands there, newer CLIs
support the SOCKS5 protocol. SOCKS5 routes packets between a client and a server
through a proxy server. This means that you can set up an SSH tunnel from your workstation
to the jumpbox, forwarded to a local port. By setting an environment variable prior
to executing the CLI, the CLI running on your workstation will route its traffic through
the jumpbox. This provides the advantage that you do not need to install anything on the
jumpbox, yet you still have access to the jumpbox’s network space.
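For example, a sketch of this workflow might look like the following (the jumpbox address, key, and local port are illustrative; the BOSH v2 CLI honors the BOSH_ALL_PROXY variable, and other CLIs typically honor https_proxy):
# Open a SOCKS5 tunnel from your workstation to the jumpbox on local port 9999
$ ssh -D 9999 -fNC jumpbox@<JUMPBOX-IP> -i jumpbox.pem
# Route subsequent BOSH CLI traffic through the tunnel
$ export BOSH_ALL_PROXY=socks5://localhost:9999
$ bosh -e my-bosh env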
You can create a jumpbox by using $ bosh create-env. There are also tools in the
Cloud Foundry community to help manage, instantiate, and configure your jump‐
box. One such tool is Jumpbox.
Networking Design and Routing
As with all networking, many configurations are possible. The key concern with net‐
working is to design a network topology that suits your business while factoring in
any specific security requirements. The most appropriate network configuration for
Cloud Foundry is established by treating the Cloud Foundry ecosystem as a virtual
appliance. Doing so establishes the expectation that all Cloud Foundry components
are allowed to talk among themselves. Adopting this virtual appliance mindset will
help articulate and shape the networking strategy that could stand in contrast to
more traditional networking policies. Security and networking policies should be in
place and adhered to for sound reasons, not because “that’s the way we’ve always
done it.”
The principal networking best practice is to deploy Cloud Foundry to run on a trus‐
ted, isolated network, typically only accessible via an external SSL terminating load
balancer that fronts Cloud Foundry. Although it is possible to run Cloud Foundry on
any network—for example, where every deployed VM is accessible via its own IP—
nearly all of the projects that I have worked on are deployed into a private network
utilizing NAT for outbound traffic.
Before setting up your VPC, you need to consider the VPC-to-data center connec‐
tion. If you are running your VPC in an isolated manner in which all external calls
use NAT to reach out to your corporate network, the VPC to data center connection
is not important. However, the VPC to data center connection becomes important if
you are treating the VPC as an extension of your corporate network (through capa‐
bilities such as AWS Direct Connect) because the available network range can have a bearing
on your VPC classless interdomain routings (CIDRs). The important concern in this
scenario is to avoid IP collision with existing resources on the network by defining a
portion of the existing network that is free and can be strictly reserved for Cloud
Foundry. There are arguments for and against using NAT from the VPC because
some companies desire the components within the Cloud Foundry network to
remain explicitly routable. However, it is worth noting that if you consider the net‐
work as a single address space, you still can use NAT when required, but you cannot
“unNAT” a VPC if your VPC has overlapping addresses with the corporate network.
For each AWS subnet used, you should gather and document the information listed
in Table 4-3. You will use this information in your deployment manifest if you are
creating it from scratch.
Table 4-3. Network dependencies
Network dependencies
AWS network name
VPC subnet ID
Subnet (CIDR range)
Excluded IP ranges
DNS IP(s)
Gateway
A best practice is to co-locate the services in a separate network from Cloud Foundry
and allow a full range of communication (bidirectionally) between the two segments
(the services and Cloud Foundry). You also might choose to have one network per
service and do the same. However, white-listing ports can make the process very
painful, and, for most companies, this extra precaution is regarded as unnecessary.
The general recommendation is to allow full communication among all Cloud Foun‐
dry–managed components at the network-segment level. There is no reason to white-
list ports if you trust all components. This approach is often regarded as entirely
reasonable given that all of the components are under the same management domain.
For a simple installation, the Cloud Foundry documentation provides recommenda‐
tions on basic security group egress and ingress rules.
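As an illustration only (the security group ID and CIDR range are placeholders; follow the documented rules for the full set), you can add an ingress rule with the AWS CLI like so:
# Allow HTTPS into the load balancer/GoRouter security group from a trusted range
$ aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 443 --cidr 10.0.0.0/8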
Using Static IPs
The following Cloud Foundry components (referred to in BOSH parlance as instance
groups) and IaaS components require static IPs or resolution via external DNS:
• Load balancer (static IP for HAProxy, VIP for F5, etc.)
• GoRouter (depending on the IaaS, some CPIs will resolve this for you)
• NATS
•Consul/etcd
•Database such as a MySQL cluster (Relational Database Service [RDS] referenced
by DNS)
Static IPs are required for the aforementioned components because they are used by
other downstream instance groups. For example, Consul and etcd, being apex
instance groups, require static IPs; all other instance groups can then be resolved
through external DNS, or internally through Consul’s internal DNS resolution. For
example, the following Cloud Foundry instance groups can have IP resolution via
internal DNS:
•Diego components (BBS, Brain, etc.)
•Cloud Controller
• Cell
• GoRouter (depending on the IaaS, some CPIs will resolve this for you)
• Routing API
• Loggregator
•WebDAV, nfs_server (or S3 blobstore referenced by DNS)
• etcd
Subnets
Network design is dependent on a number of factors. As a minimum, it is recommended
that you create a network with both a public and a private subnet. With a public subnet,
you get an internet gateway with no NAT; thus, every VM that needs internet access will
require a public IP. With the private subnet, you can use a NAT gateway or NAT
instance. By default, using bosh-bootloader will provision a management subnet and
then a separate subnet per AZ. This is a basic configuration to get you started. How‐
ever, there is a typical pattern of subnetting your network that has emerged as a com‐
mon practice for production environments (see also Figure 4-3):
• A management subnet for the BOSH Director and or jumpbox
•A dedicated subnet for the core Cloud Foundry components (which also could
contain the BOSH Director)
•A dedicated subnet for the Diego Cells (so as to scale independently as apps
scale)
•A dedicated internet protocol security (IPsec) subnet for services that require the
use of IPSec
• A dedicated subnet for services that do not use IPSec
The first three networks can be on an IPSec-enabled network if required, but it is
important to ensure that you have a non-IPSec subnet for services that do not or can‐
not use IPSec.
Figure 4-3. Subnet architecture example
Security Groups
Security group requirements—with respect to egress and ingress rules—are defined
in the Cloud Foundry documentation. It is worth noting that, when setting up the
security group inbound rules, you will define SSH from your current IP. If you are on
a corporate network, your IP will change as you hop on and off it. Therefore, if you
are trying to use $ bosh create-env to update your environment, and you have a
previously created security group policy, you might not have authority to connect to
the Director VM as you try to establish a new connection. After the VM is fully set
up, you can remove the “connect from MyIP” rules for SSH because they are no
longer required.
bosh-bootloader will establish and configure the networking
requirements (static IPs, subnets, security groups) on your behalf,
across your VPC AZs.
Setting Up the Load Balancer
Application end users access applications via a generic domain such as myapp.com. A
DNS is consulted as part of the path to the application; this is done to route traffic on
to the correct Cloud Foundry installation.
Most Cloud Foundry deployments need more than one instance of the GoRouter for
HA. However, DNS targets a single IP, often referred to as a VIP (virtual IP). Conse‐
quently, you need another device in front of the GoRouter to take requests on that IP
and balance them across the available routers. In addition to balancing traffic, the
load balancer (ELB in the case of AWS) can be used to terminate SSL or it can pass
SSL on to the GoRouter for termination.
bosh-bootloader will create your AWS elastic load balancer and
subsequent security groups.
Setting Up Domains and Certificates
Regardless of the installation approach you choose, your Cloud Foundry installation
will need a registered wildcard domain for its default domain. You will need this reg‐
istered domain when configuring your SSL certificate and Cloud Controller. For
more information, read the AWS documentation on Creating a Server Certificate.
When setting up a single Cloud Foundry deployment, it is recommended that you
have at least two default wildcard domains: a system domain and a separate applica‐
tion domain, as demonstrated here:
•*.system.cf.com
•*.apps.cf.com
The system domain allows Cloud Foundry to know about its internal components
because Cloud Foundry itself runs some of its components as applications; for exam‐
ple, the UAA. Developers use the application domain to access their applications.
If Cloud Foundry had only one combined system and app domain, there would be no
separation of concerns. A developer could register an app domain name that implies it
was a potential system component. This can cause confusion to the Cloud Foundry
operator as to what are system applications and what are developer-deployed applica‐
tions. Even worse, if an app is mapped to an existing system-component route such
as api.mycf.com, the Cloud Foundry environment can become unusable. This is dis‐
cussed in “Scenario Five: Route Collision” on page 241. By enforcing a separation of
system and app domains, it is far more likely that you will avoid this scenario. There‐
fore, it is recommended that you always have at least one default system and default
app domain per environment.
It is also recommended that you use unique system and application domains, as
opposed to deploying Cloud Foundry with a system domain that is in the app’s
domain list. For instance, you should use the preceding example; do not use the
following:
•*.system.apps.cf.com
•*.apps.cf.com
You will need an SSL certificate for your Cloud Foundry domains. This can be a self-
signed certificate, but you should use a self-signed certificate only for testing and
development. A certificate should be obtained from a certificate authority (CA) for
use in production.
A quick and effective way to get started is the AWS Certificate Manager. This allows
you to create SSL certificates (multidomain Subject Alternative Name [SAN], wildcard,
etc.) for free if you are using the AWS ELB. This makes it very easy to get the “green
padlock” for Cloud Foundry on AWS environments. After it is authorized, a certifi‐
cate will be issued and stored on the AWS ELB, allowing for secure HTTPS connec‐
tions to your ELB.
You can use bosh-bootloader to generate and upload your AWS key and certificate.
Alternatively, if you want to generate your certificate manually, you can use openssl
as follows:
# assumes an existing private key <YOUR_KEY.pem>; for example, you could
# generate one with: openssl genrsa -out <YOUR_KEY.pem> 2048
$ openssl req -sha256 -new -key <YOUR_KEY.pem> -out <YOUR_KEY_csr.pem>
$ openssl x509 -req -days 365 -in <YOUR_KEY_csr.pem> \
    -signkey <YOUR_KEY.pem> -out <YOUR_CERT.pem>
If you are not using bosh-bootloader, you will need to add your key manually and
upload the certificate to your VPC, as follows:
$ aws iam upload-server-certificate --server-certificate-name <YOUR_CERT_NAME> \
--certificate-body file://<YOUR_CERT.pem> \
--private-key file://<YOUR_KEY.pem>
One final point: be sure to register your chosen domain name in Route53 by creating
the appropriate record to point to your ELB.
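If you prefer to script this step, a hypothetical Route53 record for the wildcard app domain might look like the following (the hosted-zone ID, domain, and ELB DNS name are placeholders):
$ aws route53 change-resource-record-sets --hosted-zone-id Z0000000000000 \
    --change-batch '{
      "Changes": [{
        "Action": "UPSERT",
        "ResourceRecordSet": {
          "Name": "*.apps.cf.com.",
          "Type": "CNAME",
          "TTL": 300,
          "ResourceRecords": [{"Value": "my-cf-elb-123456789.us-west-2.elb.amazonaws.com"}]
        }
      }]
    }'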
Summary
Cloud Foundry is a complex distributed system that requires forethought prior to
deploying. How you configure and deploy Cloud Foundry becomes even more
important if you intend to use your environment for production applications. Appro‐
priate effort and assessment at the outset can help mitigate challenges such as future
resource contention or infrastructure and component failure. This chapter explored
both the technical and non-technical topics that require upfront consideration prior
to deploying Cloud Foundry.
Although upfront sizing and architectural assessment is vital, keep in mind that
timely progression toward an end state is always better than stagnation due to indeci‐
sion. The benefit of Cloud Foundry’s rolling-upgrade capability is that it offers you
the freedom to modify your environment at any point in the future.
Now that you are aware of the prerequisites, considerations, and decision points that
are required prior to installing Cloud Foundry, Chapter 5 walks you through a Cloud
Foundry installation.
CHAPTER 5
Installing and Configuring Cloud Foundry
This chapter explores the steps required for bootstrapping BOSH and installing the
Cloud Foundry BOSH release. The general topic of how to install Cloud Foundry is
nuanced and varied, depending on the approach taken. Nonetheless, there are some
fundamental patterns required to achieve a new deployment of Cloud Foundry, and
the steps for deploying Cloud Foundry are broadly the same regardless of the IaaS
platform or the specific tooling used.
This chapter walks you through setting up Cloud Foundry along with the following
key concerns and decision points:
• Using bosh-bootloader
• Installing Cloud Foundry
•Growing the platform
• Validating platform integrity in production
• Logical environment structure (Orgs and Spaces)
• Deploying an application
This chapter assumes that you are familiar with the Cloud Foundry components dis‐
cussed in Chapter 3. It also assumes that you are familar with the basics of Cloud
Foundry’s release-engineering tool chain, BOSH (a recursive acronym meaning
BOSH outer shell). I will provide you with all of the required BOSH commands for
provisioning Cloud Foundry; however, if you would like a deeper overview of BOSH,
feel free to jump ahead to Chapter 10.
1. The IaaS environment is created and configured by bosh-bootloader for standard configurations. If you desire a different configuration, you will need to manually create the IaaS environment.
2. Deployment and configuration steps are significantly easier to manage if you're using a CI pipeline such as Concourse.ci.
The Canonical Approach to Bootstrapping Cloud Foundry
Until recently, there was no canonical way of getting up and run‐
ning with BOSH and Cloud Foundry. Many open source
approaches exist, such as Stark and Wayne’s genesis. Commercial
products also exist such as Pivotal Cloud Foundry, which make it
easy to install and configure BOSH, Cloud Foundry, and additional
backing services. The Cloud Foundry community has recently
added additional tooling and BOSH functionality to help bootstrap
a Cloud Foundry environment.
Installation Steps
In “Installation Steps” on page 43, I described the prerequisites for installing Cloud
Foundry:
1. Create and configure the IaaS environment (including any networks, security
groups, blobstores, and load balancers).1
2. Set up the required external dependencies; this includes additional enterprise
services (such as LDAP, syslog endpoints, monitoring and metrics dashboards).
After the prerequisites are in place, here is the next set of steps to actually install
Cloud Foundry and BOSH:
1. Deploy the BOSH Director.
2. Create an IaaS/infrastructure-specific BOSH configuration such as cloud config‐
uration.
3. Create a deployment manifest to deploy Cloud Foundry.
4. Integrate Cloud Foundry with the required enterprise services (via the deploy‐
ment manifest).
5. Deploy Cloud Foundry.2
The rest of this chapter explores the necessary considerations for each step.
Installing Cloud Foundry
To install Cloud Foundry, you need an infrastructure (in our case, an AWS VPC) and
a BOSH Director. bosh-bootloader was created to bootstrap both of these concerns.
A single BOSH environment consists of both the Director and the deployments that
it orchestrates. Therefore, to create an empty BOSH environment, we first need a
Director. The Director VM includes all necessary BOSH components that will be
used to manage different IaaS resources.
Bootstrapping just the Director was solved by the BOSH CLI, through:
$ bosh create-env
create-env provides only the Director; there still remains the requirement to provision
and configure the VPC, subnets, security groups, ELB, databases, and blobstores.
You can install both BOSH and the required IaaS environment using bosh-
bootloader. Bosh-bootloader is a command-line utility for setting up Cloud Foundry
(and Concourse) on an IaaS. Under the hood, bosh-bootloader uses the BOSH CLI to
set up the BOSH Director.
Before using bosh-bootloader, be sure to do the following:
• Create and change into a clean directory by running the following commands:
$ mkdir bosh-bootloader && cd bosh-bootloader
•Download the latest stable release and add it to a directory on your PATH, as in
this example (assuming /usr/local/bin is on your PATH):
$ wget https://github.com/cloudfoundry/bosh-bootloader/releases/download/v2.3.0/bbl-v2.3.0_osx
$ chmod +x bbl-v2.3.0_osx
$ mv bbl-v2.3.0_osx /usr/local/bin/bbl
•Add the correct inline policy to your AWS user (see the bosh-bootloader reposi‐
tory for details)
•Export any required environment variables (see repo for details), such as the fol‐
lowing:
$ export BBL_AWS_ACCESS_KEY_ID=<YOUR ACCESS KEY>
$ export BBL_AWS_SECRET_ACCESS_KEY=<YOUR SECRET KEY>
$ export BBL_AWS_REGION=<YOUR AWS DEPLOYMENT REGION>
Now you are going to focus on bosh-bootloader:
1. To set up the AWS VPC and deploy the BOSH Director, run the following:
$ bbl up
You should now have a BOSH Director residing in a new VPC with the appro‐
priate subnets, security groups, and NAT in place.
The BOSH team recommends updating your BOSH environ‐
ment frequently to be sure you’re using the latest version of
BOSH. The sequence of steps that follows assumes that you
are using BOSH 2.0.
2. Pull out the Director IP, ca-cert, username, and password from the bbl CLI and
then log in to your BOSH Director. BOSH requires a ca-cert because the BOSH
v2 CLI connects over HTTPS to the Director.
A typical approach is to alias the BOSH environment (Director IP address) and
log in as follows:
$ bosh alias-env my-bosh -e <YOUR-BOSH-IP>
Using environment '<YOUR-BOSH-IP>' as user 'user-********'
$ bosh -e my-bosh --ca-cert <(bbl director-ca-cert) \
    login --user $(bbl director-username) \
    --password $(bbl director-password)
Alternatively, you might find it advantageous to set the Director credentials as
environment variables and log in as follows:
$ export BOSH_CLIENT=$(bbl director-username)
$ export BOSH_CLIENT_SECRET=$(bbl director-password)
$ export BOSH_ENVIRONMENT=$(bbl director-address)
$ export BOSH_CA_CERT=$(bbl director-ca-cert)
# check if the above environment is set up correctly
$ bosh env
# At this point you can login to your BOSH Director
$ bosh login
You need only provide the credentials the first time you log in to BOSH.
For any of the aforementioned bbl query commands to work,
you need to run them from the same directory as your bbl-
state.json file. This file is created in the directory where you
first ran bbl up.
3. Create the ELB using the key and certificate you created in “Setting Up Domains
and Certificates” on page 62:
$ bbl create-lbs --type cf --cert <YOUR-cert.pem> --key <YOUR-KEY.pem>
This command updates your bbl-state.json. Save this file in a secure location
such as Lastpass after this step.
Cloud Foundry approaches deploying the platform in a unique way. Open
source software often resides in a single repository and can be distributed as
either a single binary or a set of binaries.
The Cloud Foundry cf-deployment does not ship as a single binary or even a set
of binaries that you first compile and then deploy. Instead, cf-deployment is a set of
software packages pulled from other Git repositories. These software packages
are automatically compiled from source as part of the BOSH deployment pro‐
cess.
As part of the deployment process, BOSH sets up separate, temporary VMs to
compile packages and automatically store the results for subsequent distribution.
These VMs are known as compilation VMs.
Before beginning a new installation of Cloud Foundry, be sure to use the latest
stable BOSH v2 CLI obtained from http://bosh.io.
4. BOSH provides a way to capture all OS dependencies as one image, known as a
stemcell (discussed later in “Stemcells” on page 176). Upload the appropriate stem‐
cell for your IaaS environment; for example:
$ bosh -e my-bosh upload-stemcell \
https://bosh.io/d/stemcells/bosh-aws-xen-hvm-ubuntu-trusty-go_agent
5. Deploy Cloud Foundry. You can obtain the canonical manifest for deploying
Cloud Foundry from the cf-deployment Git repository. In time, this repo will
also contain tooling for aiding deployment.
To deploy Cloud Foundry, run the following:
$ bosh -e my-bosh -d cf deploy cf-deployment/cf-deployment.yml \
    --vars-store env-repo/deployment-vars.yml \
    -v system_domain=<YOUR-CFDomain.com>
The cf-deployment.yml manifest requires two additional parameter types:
• Environment-specific data; as of this writing, this is just your system domain
• Sensitive configuration information such as credentials
You should propagate all other configurations by using option fields. For addi‐
tional information, see cf-deployment.
To import these datasets, the BOSH CLI currently uses the --vars-store flag.
This flag reads in .yml files and extracts the values present in those files to fill
out the template represented by cf-deployment. The BOSH v2 CLI generates
this .yml file with all of the necessary variables to populate the cf-deployment
manifest. In the command in the previous example, the --vars-store env-
repo/deployment-vars.yml -v system_domain=$SYSTEM_DOMAIN generates a
deployment-vars.yml file based on custom variables such as your Cloud Foun‐
dry system domain. Note that for production deployments, you should use
config-server instead of --vars-store.
cf-deployment contains all required BOSH releases for
deploying Cloud Foundry; for example:
releases:
- name: capi
  url: https://bosh.io/d/github.com/cloudfoundry/capi-release?v=1.15.0
  version: 1.15.0
  sha1: 2008137d5bb71e701cedba96cb363e1bfbdebd45
The BOSH CLI will inspect the release URL and then
retrieve and upload releases to the Director’s blobstore.
There is no requirement to explicitly upload a release to the
Director. Instead of linking to a tarball on bosh.io, as in the
example here, you can link to a Git repository that then
links to a release blobstore. This detail is explained further
in “Packaging a Release” on page 199.
6. Use the following command to view your deployed environment:
$ bosh instances --ps #view everything deployed
Figure 5-1 shows the output.
One important point: be sure to register your chosen wildcard
domain names in a DNS such as Route53, and create the
appropriate records to point to your load balancer.
Figure 5-1. Deployed instance groups that comprise Cloud Foundry
7. Target the cf api domain and then log in as follows:
$ cf api api.<YOUR-CFDomain.com> --skip-ssl-validation
8. Log in via cf login. For the administrator username and password, use the following
values from the deployment-vars.yml file:
   user: Pull uaa_scim_users_admin_name out of deployment-vars.yml (this currently defaults to admin).
   password: Pull uaa_scim_users_admin_password out of deployment-vars.yml.
The final deployment should look something like that shown in Figure 5-2.
Figure 5-2. bbl and cf-deployment topology on AWS spanning two AZs (three AZs are
preferable, but at the time of writing only two AZs are available in us-west)
Changing Stacks
As we discussed in “Stacks” on page 37, a stack is a prebuilt rootfs used along with
droplets to provide the container filesystem used for running applications. The fol‐
lowing Cloud Foundry command lists all the stacks available in a deployment:
$ cf stacks
To change a stack and restage an application, use this command:
$ cf push APPNAME -s STACKNAME
Growing the Platform
bosh-bootloader provides a reasonable, default configuration for getting started with
Cloud Foundry. However, over time, your deployment and platform requirements
are likely to grow in scope and external integration points. You will also need to
upgrade the platform to take advantage of both security updates and new features.
As a distributed platform with many moving parts, Cloud Foundry supports rolling
upgrades, which make it possible for you to make changes to platform configuration
and underlying infrastructure. If you need additional or bigger Cells, you can resize
them and/or increase their number and then simply redeploy Cloud Foundry using
BOSH’s ability to perform rolling upgrades. If you need additional IaaS capacity, you
can grow your infrastructure to support your existing Cloud Foundry deployment.
You can also deploy additional Cloud Foundry environments, as required.
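For example, one hedged way to add Cells with the BOSH v2 CLI is an operations file that raises the Diego Cell instance count (the instance group name diego-cell follows cf-deployment conventions and might differ in your manifest):
# scale-cells.yml -- bump the number of Diego Cells to six
- type: replace
  path: /instance_groups/name=diego-cell/instances
  value: 6
You would then redeploy with something like bosh -e my-bosh -d cf deploy cf-deployment/cf-deployment.yml -o scale-cells.yml, and BOSH performs a rolling update of the existing deployment.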
The key is that you must make some important design decisions at the outset, but
you are not locked into a static configuration and are free to grow and develop your
Cloud Foundry environment, as needed.
Validating Platform Integrity in Production
This section discusses how to validate the health and integrity of your production
environment. After you have successfully deployed Cloud Foundry, you should run
the smoke tests and Cloud Foundry acceptance tests (CATS) to ensure that your
environment is working properly. You should also maintain a dedicated sandbox to
test any changes prior to altering any development or production environment.
Start with a Sandbox
If you run business-critical workloads on a production instance of Cloud Foundry,
you should consider using a sandbox or staging environment prior to making any
production platform changes. This approach allows you to test any production apps
in an isolated environment before rolling out the platform changes (be it an infra‐
3. cf-smoke-tests are part of cf-deployment and will run as a one-off errand.
4. cf-acceptance-tests are part of cf-deployment and will run as a one-off errand.
structure upgrade, stemcell change, Cloud Foundry release upgrade, buildpack
change, or service upgrade) to production.
The sandbox environment should mirror the production environment as closely as
possible. It should also contain a representation of the production apps and a mock
up of the production services to ensure that you validate the health of the applica‐
tions running on the platform. An example set of application tests could include the
following:
• Use the cf push command to push the app
• Bind the application to one or more services
• Start an app
• Target the app on a restful endpoint and validate the response
• Target the app on a restful endpoint to write to a given data service
•Target the app on a restful endpoint to read the written value
• Generate and log a unique string to validate application logging
• Stop an app
• Delete an app
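A minimal sketch of these checks using the cf CLI might look like the following (the app, service, endpoint, and domain names are illustrative):
$ cf push my-test-app
$ cf bind-service my-test-app my-test-db
$ cf restart my-test-app
$ curl -f https://my-test-app.apps.cf.com/health       # validate a RESTful endpoint
$ cf logs my-test-app --recent | grep "unique-test-string"
$ cf stop my-test-app
$ cf delete my-test-app -f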
This suite of tests should be designed to exercise the core user-facing functionality,
including the applications interacting with any backing services.
Running these sorts of tests against each Cloud Foundry instance on a CI server with
a metrics dashboard is the desired approach for ease of repeatability. Not only do you
get volume testing for free (e.g., you can easily fill up a buildpack cache that way), you
can publish the dashboard URL to your platform consumers and stakeholders alike.
Tying these tests up to alerting/paging systems is also more efficient than paging peo‐
ple due to IaaS-level failures.
Production Verification Testing
Before making the production environment live, test the behavior of your platform
by running the following:
• cf-smoke-tests (part of cf-deployment, run as a one-off errand) to ensure core Cloud Foundry functionality is working
• cf-acceptance-tests (also part of cf-deployment, run as a one-off errand) to test Cloud Foundry behavior and component integration in more detail (see the example that follows this list)
• Your own custom acceptance tests against the applications and services you have written, including any established or customized configuration (this ensures that the established application behavior does not break)
• External monitoring against your deployed apps
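With cf-deployment, the smoke tests and acceptance tests above are deployed as BOSH errands, so a typical verification run looks something like this (errand and deployment names can vary by deployment, so treat this as a sketch):
$ bosh -d cf run-errand smoke-tests
$ bosh -d cf run-errand cf-acceptance-tests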
After you make the environment live, it is still important to identify any unintended
behavior changes. Therefore, the ongoing periodic running of your own acceptance
tests in production is absolutely recommended. For example, it is possible that you
might uncover a problem with the underlying infrastructure that can be noticed only
by repeatedly running acceptance tests in production.
Production Configuration Validation
cf-acceptance-tests (CATS) was never designed for live configura‐
tion validation. It is a developer workflow for engineers building
Cloud Foundry to verify that changes have not adversely affected
the platform. As a precheck, CATS is a valuable suite of tests to
run. The Cloud Foundry engineering team is currently working on
a production configuration validation test harness. If, in the mean‐
time, you need ongoing production validation, you can run your
own custom acceptance tests against the applications and services
you have written, including any established or customized configu‐
ration. Due to the way CATS modifies global state across all run‐
ning apps, running CATS against a live production environment is
not recommended.
Logical Environment Structure
Earlier, in “IaaS and Infrastructure Design” on page 48, we described the need to both
design and provision the infrastructure and IaaS layer to support a single Cloud
Foundry foundation. After deploying Cloud Foundry, the next task is to divide the
IaaS resources into logical environments for various teams, products, and users to
utilize. These logical environments within Cloud Foundry are known as Orgs and
Spaces.
As discussed in “Organizations and Spaces” on page 23, Orgs and Spaces provide a
way to group users together for management purposes. All members of an Org share
the same resource quota plan, services availability, and custom domain. Quota plans
are associated with Orgs. A quota will apply to all the activities within a particular organization. (For a current view on what is governed by quota plans, check out the Cloud Foundry documentation on Creating and Modifying Quota Plans.)
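For instance, you could define and assign an org-level quota with the cf CLI along these lines (the quota name, org name, and limits are illustrative, and flags vary slightly between CLI versions):
$ cf create-quota trial-quota -m 10G -i 1G -r 100 -s 10
$ cf set-quota my-org trial-quota
$ cf quotas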
Each Org will have at least one Space but can have multiple Spaces. Every application
and service is scoped to a Space. Spaces provide a shared location for application
development, deployment, and maintenance, and users will have specific Space-
related roles. All members of a Space have access to any application environment
variables configured in the Space. Therefore, the Space members must have a strict
level of trust among one another.
Orgs and Spaces allow you to create a logical multitenant environment within your
Cloud Foundry deployment. They give you a level of abstraction with which you can
define who can do what in a particular environment. They also provide a way of allo‐
cating resources and governing chargeback.
You can define Orgs and Spaces any way that you like. Typically, Orgs are defined
around constructs such as lines of business or particular projects and initiatives.
Spaces are generally defined in a couple of different ways:
•Typically, bigger organizations are broken down into “two-pizza teams,” with
each team owning its own Space. Usually, an individual team is responsible for
developing an isolated component or components such as a specific microser‐
vice.
• When deploying through a pipeline, it is often useful to have a development space, test space, staging space, and production space for the applications moving into production (see the example that follows this list).
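For example, a pipeline-oriented layout for a single team might be created as follows (the org and space names are illustrative):
$ cf create-org team-a
$ cf target -o team-a
$ cf create-space development
$ cf create-space test
$ cf create-space staging
$ cf create-space production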
There is no right or wrong way to structure your Orgs and Spaces; an important
point to keep in mind is that you are free to alter your logical Org and Space bound‐
aries at any point.
To begin, you can use the default Org and simply create a Space, as demonstrated
here:
$ cf create-space developer
Creating space developer in org default_org as admin...
OK
Assigning role SpaceManager to user admin in org default_org /
space developer as admin...
OK
Assigning role SpaceDeveloper to user admin in org default_org /
space developer as admin...
OK
You can then target the new developer space:
$ cf target -o "default_org" -s "developer"
Pushing Your First App
You can push many different apps to Cloud Foundry. If you do not have one on
hand, you can download spring-music.
$ git clone https://github.com/scottfrederick/spring-music
$ cd spring-music
The Spring Music repository contains a sample app manifest. An app manifest provides an easy way to define and capture any required command-line arguments and application metadata. This is important because you can source-control the arguments you used when deploying your app. You do not need to deploy your app with an app manifest; if you prefer, you can provide any required arguments via the command line instead. The Spring Music manifest looks like this:
---
applications:
- name: spring-music
memory: 1G
random-route: true
path: build/libs/spring-music.jar
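Assuming a reasonably recent cf CLI, the same settings could instead be supplied as command-line flags; the manifest simply captures them for you:
$ cf push spring-music -m 1G --random-route -p build/libs/spring-music.jar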
Therefore, to deploy the app, all you need to do is compile the code by
using ./gradlew assemble and then use cf push:
$ ./gradlew assemble
$ cf push
The cf push command should return a URL that you can then use to access your
deployed app.
Summary
This chapter walked you through an end-to-end installation of Cloud Foundry using
cf-deployment and bosh-bootloader. If you followed all the steps, you should have
accomplished the following:
• Deployed the BOSH Director and created IaaS-specific components using bosh-bootloader
• Deployed a working instance of Cloud Foundry using BOSH and cf-deployment
• Successfully validated platform integrity by running the smoke tests and CATS
• Set up an Org and Space
• Pushed an application
Now that you are up and running with Cloud Foundry, Chapter 6 introduces you to
the underlying concepts and operational aspects of Diego.
CHAPTER 6
Diego
Diego is the container runtime architecture for Cloud Foundry. It is responsible for
managing the scheduling, orchestration, and running of containerized workloads.
Essentially, it is the heart of Cloud Foundry, running your applications and one-off
tasks in containers, hosted on Windows and Linux backends. Most Cloud Foundry
users (e.g., developers) do not interact with Diego directly. Developers interact only
with Cloud Foundry’s API, known as CAPI. However, comprehending the Diego
container runtime is essential for Platform Operators because, as an operator, you are
required to interact with Diego for key considerations such as resilience requirements
and application troubleshooting. Understanding Diego is essential for understanding
how workloads are deployed, run, and managed. This understanding also provides
you with an appreciation of the principles underpinning container orchestration.
This chapter explores Diego’s concepts and components. It explains the purpose of
each Diego service, including how the services interrelate as state changes throughout
the system.
Implementation Changes
It is important to understand the fundamental concepts of the
Diego system. The specific technical implementation is less conse‐
quential because it is subject to change over time. What Diego does
is more important than how it does it.
Why Diego?
Residing at the core of Cloud Foundry, Diego handles the scheduling, running, and
monitoring of tasks and long-running processes (applications) that reside inside
managed containers. Diego extends the traditional notion of running applications to
scheduling and running two types of processes:
Task
A process that is guaranteed to be run at most once, with a finite duration. A
Task might be an app-based script or a Cloud Foundry “job”; for example, a stag‐
ing request to build an application droplet.
Long-running process (LRP)
An LRP runs continuously and may have multiple instances. Cloud Foundry dic‐
tates to Diego the desired number of instances for each LRP, encapsulated as
DesiredLRPs. All of these desired instances are run and represented as actual
LRPs known as ActualLRPs. Diego attempts to keep the correct number of
ActualLRPs running in the face of any network partitions, crashes, or other fail‐
ures. ActualLRPs terminate only when they crash or when they are explicitly stopped or killed. A typical example of an ActualLRP is an instance of an applica‐
tion running on Cloud Foundry.
Cloud Foundry is no longer solely about the application as a unit of work. The addi‐
tion of running one-off tasks in isolation opens up the platform to a much broader
set of workloads; for example, running Bash scripts or Cron-like jobs to process a
one-off piece of data. Applications can also spawn and run tasks for isolated compu‐
tation. Tasks can also be used for local environmental adjustments to ensure that
applications adhere to service-level agreements (SLAs).
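From a developer's perspective, a one-off Task is typically launched against an existing application with the cf CLI; for example (availability of the command depends on your CLI and Cloud Controller API versions, and the app name and command shown are illustrative):
$ cf run-task my-app "bin/rails db:migrate" --name migrate
$ cf tasks my-app        # list tasks and their states for the app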
The required scope for what Cloud Foundry can run is ever increasing as more work‐
loads migrate to the platform. Tasks and LRPs, along with the new TCPRouter (dis‐
cussed in “The TCPRouter” on page 134), have opened up the platform to
accommodate a much broader set of workloads. In addition to traditional web-based
applications, you can now consider Cloud Foundry for the following:
•Internet of Things (IoT) applications such as aggregating and processing device
data
• Batch applications
• Applications with application tasks
• Computational numerical modeling
• Reactive streaming applications
• TCP-based applications (see the TCP route example following this list)
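For TCP-based applications, assuming TCP routing has been enabled for your deployment, a Platform Operator can expose an app on a TCP route roughly as follows (the domain and app names are illustrative):
$ cf create-shared-domain tcp.example.com --router-group default-tcp
$ cf map-route my-tcp-app tcp.example.com --random-port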
By design, Diego is agnostic to preexisting Cloud Foundry components such as the
Cloud Controller. This separation of concerns is compelling. Being agnostic to both
client interaction and runtime implementation has allowed for diverse workloads
with composable backends. For example, Diego has generalized the way container
image formats are handled; Diego’s container management API is Garden.
The Importance of Container Management
For a container image (such as a Docker image or other OCI-compatible image) to
be instantiated and run as an isolated process, it requires a container-management
layer that can create, run, and manage the container process. The Docker company
provides its own Linux Container manager for running Docker images. Other
container-management technologies also exist, such as rkt and runC. As a more gen‐
eralized container management API, Cloud Foundry uses Garden to support a variety
of container technologies. Garden is discussed further in “Why Garden?” on page 150.
Through Garden, Diego can now support any container image format that the Gar‐
den API supports. Diego still supports the original droplet plus a stack combination
but can now accommodate other image formats; for example, OCI-compatible
images such as Docker or Rocket. In addition, Diego has added support for running
containers on any Garden-based container technology, including Linux and
Windows-based container backends that implement the Garden API. Figure 6-1
illustrates Diego’s ability to support multiple application artifacts and container
image formats.
Figure 6-1. Developer interaction, cf pushing different application artifacts to Cloud
Foundry
A Brief Overview of How Diego Works
Container scheduling and orchestration is a complex topic. Diego comprises several
components, and each component comprises one or more microservices. Before get‐
ting into detail about these components and services, it is worth taking a moment to
introduce the end-to-end flow of the Diego component interactions. At this point, I
will begin to introduce specific component names for readability. The individual
component responsibilities will be explained in detail in “Diego Components” on
page 97.
At its core, Diego executes a scheduler. Scheduling is a method by which work, speci‐
fied by some means, is assigned to resources that attempt to undertake and complete
that work.
A Tip on Scheduling
Schedulers are responsible for using resources in such a way so as
to allow multiple users to share system resources effectively while
maintaining a targeted quality of service. Scheduling is an intrinsic
part of the execution model of a distributed system. Scheduling
makes it possible to have distributed multitasking spread over dif‐
ferent nodes, with a centralized component responsible for pro‐
cessing. Within OSs, this processing unit is referred to as a CPU.
Within Cloud Foundry, Diego acts as a centralized processing unit
for all scheduling requirements.
Diego clients—in our case, the Cloud Controller (via a bridge component known as
the CC-Bridge)—submit, update, and retrieve Tasks and LRPs to the BBS. The BBS
service is responsible for Diego’s central data store and API. Communication from
the Cloud Controller to the BBS is via a remote procedure call (RPC)–style API
implemented through Google protocol buffers.
The scheduler within Diego is governed by Diego’s Brain component. The Brain’s
orchestration function, known as the Auctioneer service, retrieves information from
the BBS and optimally distributes Tasks and LRPs to the cluster of Diego Cell
machines (typically VMs). The Auctioneer distributes its work via an auction process
that queries Cells for their capacity to handle the work and then sends work to the
Cells. After the Auctioneer assigns work to a Cell, the Cell’s Executor process creates
a Garden container and executes the work encoded in the Task/LRP. This work is
encoded as a generic, platform-independent recipe of composable actions (described
in “Composable Actions” on page 85). Composable actions are the actual actions that
run within a container; for example, a RunAction that runs a process in the con‐
tainer, or a DownloadAction that fetches and extracts an archive into the container.
To assist in setting up the environment for the process running within the container,
Application Life-Cycle Binaries (e.g., Buildpacks) are downloaded from a file server
that is responsible for providing static assets.
Application Life-Cycle Binaries
Staging is the process that takes an application and composes an
executable binary known as a droplet. Staging is discussed further
in “Staging” on page 159. The process of staging and running an
application is complex and filled with OS and container
implementation-specific requirements. These implementation-
specific concerns have been encapsulated in a collection of binaries
known collectively as the Application Life-Cycle. The Tasks and
LRPs produced by the CC-Bridge download the Application Life-
Cycle Binaries from a blobstore (in Cloud Foundry’s case, the
Cloud Controller blobstore). These Application Life-Cycle Binaries
are helper binaries used to stage, start, and health-check Cloud
Foundry applications. You can read more about Application Life-
Cycle Binaries in “Application Life-Cycle Binaries” on page 112.
Diego ensures that the actual LRP (ActualLRP) state matches the desired LRP (Desir‐
edLRP) state through interaction with its client (the CC-Bridge). Specific services on
the CC-Bridge are responsible for keeping the Cloud Controller and Diego synchron‐
ized, ensuring domain freshness. (These services, along with the concept of domain freshness, are discussed further in “The CC-Bridge” on page 93.)
The BBS also provides a real-time representation of the state of the Diego cluster
(including all DesiredLRPs, running ActualLRP instances, and in-flight Tasks). The
Brain’s Converger periodically analyzes snapshots of this representation and corrects
discrepancies, ensuring that Diego is eventually consistent. This is a clear example of
Diego using a closed feedback loop to ensure that its view of the world is accurate.
Self-healing is the first essential feature of resiliency; a closed feedback loop ensures
that eventual consistency continues to match the ever-changing desired state.
Diego sends real-time streaming logs for Tasks/LRPs to the Loggregator system.
Diego also registers its running LRP instances with the GoRouter via the Route-
Emitter, ensuring external web traffic can be routed to the correct container.
Essential Diego Concepts
Before exploring Diego’s architecture and component interaction, you need to be
aware of two fundamental Diego concepts:
• Action abstraction
• Composable actions
As work flows through the distributed system, Diego components describe their
actions using different levels of abstraction. Diego can define different abstractions
because the architecture is not bound by a single entity; for example, a single mono‐
lithic component with a static data schema. Rather, Diego consists of distributed
components that each host one or more microservices. Although microservices
architecture is no panacea, if designed correctly for appropriate use cases, microservi‐
ces offer significant advantages by decoupling complex interactions. Diego’s micro‐
services have been composed with a defined boundary and are scoped to a specific
component. Furthermore, Diego establishes well-defined communication flows
between its component services. This is vital for a well-designed system.
Components versus Services
Components like the Cell, Brain, and CC-Bridge are all names of VMs where Diego
services are located. The location of those services is mutable, and depending on the
deployment setup, some components can have more than one service residing on
them. When describing the Diego workflow, consider the Diego components as
though they were locations of specific services rather than the actual actors. For
example, it makes sense to talk about distributing work to the Cells because a Cell is
where the container running your workload will reside. However, the Cell does not
actually create the container; it is currently the Rep service that is responsible for ini‐
tiating container creation. Theoretically, the Rep could reside on some other compo‐
nent such as the BBS. Similarly, the Auctioneer is the service responsible for
distributing work; the Brain is just the component VM on which the Auctioneer is
running. In BOSH parlance, components are referred to as instance groups and serv‐
ices are referred to as release jobs.
Action Abstraction
Because each microservice is scoped to a specific Diego component, each service is
free to express its work using its own abstraction. This design is incredibly powerful
because bounded abstractions offer an unprecedented degree of flexibility. Abstrac‐
tion levels move from coarse, high-level abstractions to fine-grained implementations
as work moves through the system. For example, work can begin its journey at the
Cloud Controller as an app, but ultimately all work ends up as a scheduled process
running within a created container. This low-level implementation of a scheduled
process is too specific to be hardwired into every Diego component. If it were hard‐
wired throughout, the distributed system would become incredibly complex for end
users and brittle to ongoing change. The abstraction boundaries provide two key
benefits:
• The freedom of a plug-and-play model
• A higher-level concern for Diego clients
Plug-and-play offers the freedom to replace key parts of the system when required,
without refactoring the core services. A great example of this is a pluggable container
implementation. Processes can be run via a Docker image, a droplet plus a stack, or
in a Windows container. The required container implementation (both image format
and container backend) can be plugged into Diego as required without refactoring
the other components.
Moreover, clients of Diego have high-level concerns. Clients should not need to be
concerned with underlying implementation details such as how containers (or, more
specifically, process isolation) are created in order to run their applications. Clients
operate at a much higher level of abstraction, imperatively requesting “run my appli‐
cation.” They are not required to care about any of the underlying implementation
details. This is one of the core benefits of utilizing a platform to drive application
velocity. The more of the undifferentiated heavy lifting you allow the platform to
undertake, the faster your business code will progress.
Composable Actions
At the highest level of abstraction, the work performed by a Task or LRP is expressed
in terms of composable actions, exposed via Diego’s public API. As described earlier,
composable actions are the actual actions that run within a container; for example, a
RunAction that runs a process in the container, or a DownloadAction that fetches
and extracts an archive into the container.
Conceptually each composable action implements a specific instruction. A set of
composable actions can then be used to describe an explicit imperative action, such
as “stage my application” or “run my application.” (The available actions are documented in the Cloud Foundry BBS Release on GitHub.) Composable actions are, by and
large, hidden from Cloud Foundry users. Cloud Foundry users generally interact only
with the Cloud Controller. The Cloud Controller (via the CC-Bridge) then interacts
with Diego through Diego’s composable actions. However, even though as a Platform
Operator you do not interact with composable actions directly, it is essential that you
understand the available composable actions when it comes to debugging Cloud
Foundry. For example, the UploadAction might fail due to misconfigured blobstore
credentials, or a TimeoutAction might fail due to a network partition.
Composable actions include the following:
1. RunAction runs a process in the container.
2. DownloadAction fetches an archive (.tgz or .zip) and extracts it into the con‐
tainer.
3. UploadAction uploads a single file, in the container, to a URL via POST.
4. ParallelAction runs multiple actions in parallel.
5. CodependentAction runs multiple actions in parallel and will terminate all code‐
pendent actions after any single action exits.
6. SerialAction runs multiple actions in order.
7. EmitProgressAction wraps another action and logs messages when the wrapped
action begins and ends.
8. TimeoutAction fails if the wrapped action does not exit within a time interval.
9. TryAction runs the wrapped action but ignores errors generated by the wrapped
action.
Because composable actions are a high-level Diego abstraction, they describe generic
activity, not how the activity is actually achieved. For example, the UploadAction
describes uploading a single file to a URL; it does not specify that the URL should be
the Cloud Controller’s blobstore. Diego, as a generic execution environment, does
not care about the URL; Cloud Foundry, as a client of Diego, is the entity responsible
for defining that concern. This concept ties back to the action abstraction discussed
previously, allowing Diego to remain as an independently deployable subsystem, free
from specific end-user concerns.
So, how do composable actions relate to the aforementioned Cloud Foundry Tasks
and LRPs? Consider the steps involved when the Cloud Controller issues a run com‐
mand for an already staged application. To bridge the two abstractions, there is a
Cloud Foundry-to-Diego bridge component known as the Cloud Controller Bridge
(CC-Bridge). This essential function is discussed at length in “The CC-Bridge” on
page 93. For now, it is sufficient to know that the CC-Bridge knows how to take the
various resources (e.g., a droplet, metadata, and blobstore location) that the Cloud
Controller provides, coupled with a desired application message. Then, using these
composable actions, the CC-Bridge directs Diego to build a sequence of composable
actions to run the droplet within the container, injecting the necessary information
provided by the Cloud Controller. For example, the specific composable action
sequence for a run command will be:
• DownloadAction to download the droplet from the CC blobstore into a specified location inside the container.
• DownloadAction to download the set of static plugin (AppLifeCycle) binaries from a file server into a specified location within the container.
• RunAction to run the start command in the container, with the correct parame‐
ters. RunAction ensures that the container runs the code from the droplet using
the helper (AppLifeCycle) binaries that correctly instantiate the container envi‐
ronment.
Applications are broken into a set of Tasks such as “stage an app” and “run an app.”
All Diego Tasks will finally result in a tree of composable actions to be run within a
container.
Layered Architecture
Diego comprises a number of microservices residing on several components.
Diego is best explained by initially referring to the diagram in Figure 6-2.
Figure 6-2. Diego components
We can group the components broadly, as follows:
•A Cloud Foundry layer of user-facing components (components with which you,
as a platform user, will directly interact)
•The Diego Container Runtime layer (components with which the core Cloud
Foundry components interact)
Each Diego component is a single deployable BOSH machine (known as an instance
group) that can have any number of machine instances. Although there can be multi‐
ple instances of each instance group to allow for HA and horizontal scaling, some
instance groups require a global lock to ensure that only one instance is allowed to
make decisions.
Diego BOSH Releases
As with other Cloud Foundry components, you can deploy and run the Diego BOSH
release as a standalone system. Having the property of being independently deploya‐
ble is powerful. The engineering team responsible for the Diego subsystem has a
smaller view of the distributed system allowing decoupling from other Cloud Foun‐
dry subsystems and faster iterative development. It also means that the release-
engineering team can adopt more holistic granular testing in a heterogeneous
environment of different subsystems. Because each Cloud Foundry subsystem can be
deployed in isolation, more complex upgrade migration paths can now be verified
because older versions of one subsystem, such as the Postgres release, can be
deployed alongside the latest Diego release. (You can find more information on Diego online, including the Diego BOSH release repository.)
We will begin exploring Diego by looking at the user-facing Cloud Foundry compo‐
nents that act as a client to Diego, and then move on to the specific Diego compo‐
nents and services.
Interacting with Diego
Cloud Foundry users do not interact with Diego directly; they interact with the Cloud
Foundry user-facing components, which then interact with Diego on the user’s
behalf. Here are the Cloud Foundry user-facing components that work in conjunc‐
tion with Diego:
•CAPI components: the Cloud Foundry API (Cloud Controller and the CC-
Bridge)
• The logging system defined by the Loggregator and Metron agents
• Routing (GoRouter, TCPRouter, and the Route-Emitter)
Collectively, these Cloud Foundry components are responsible for the following:
• Application policy
• Uploading application artifacts, droplets, and metadata to a blobstore
•Traffic routing and handling application traffic
• Logging
• User management, including end-user interaction via the Cloud Foundry API commands. (User management is primarily handled by the User Account and Authentication (UAA) service, which does not directly interact with Diego but remains an essential piece of the Cloud Foundry ecosystem.)
Diego seamlessly hooks into these different Cloud Foundry components to run appli‐
cations and tasks, route traffic to your applications, and allow the retrieval of
required logs. With the exception of the CC-Bridge, these components were dis‐
cussed at length in Chapter 3.
CAPI
Diego Tasks and LRPs are submitted to Diego via a Diego client. In Cloud Foundry’s
case, the Diego client is the CAPI, exposed by the Cloud Controller. Cloud Foundry
users interact with the Cloud Controller through the CAPI. The Cloud Controller
then interacts with Diego’s BBS via the CC-Bridge, a Cloud Foundry-to-Diego trans‐
lation layer. This interaction is depicted in Figure 6-3.
Figure 6-3. API interaction from the platform user through to Diego
The Cloud Controller provides REST API endpoints for Cloud Foundry users to
interact with Cloud Foundry for commands including the following:
• Pushing, staging, running, and updating applications
• Pushing and running discrete one-off tasks
• Creating users, Orgs, Spaces, Routes, Domains and Services, and so on
• Retrieving application logs
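You can also exercise CAPI endpoints directly; for example, cf curl sends an authenticated request to the Cloud Controller on your behalf:
$ cf curl /v2/info
$ cf curl /v2/apps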
The Cloud Controller (discussed in “The Cloud Controller” on page 33) is concerned
with imperatively dictating policy to Diego, stating “this is what the user desires;
Diego, make it so!”; for example, “run two instances of this application.” Diego is
responsible for orchestrating and executing the required workload. It deals with
orchestration through a more autonomous subsystem at the backend. For example,
Diego deals with the orchestration of the Cells used to fulfill a workload request
through an auction process governed by the Auctioneer.
This design means that the Cloud Controller is not coupled to the execution
machines (now known as Cells) that run your workload. The Cloud Controller does
not talk to Diego directly; instead, it talks only to the translation component, the CC-
Bridge, which translates the Cloud Controller’s app-specific messages to the more
generic Diego language of Tasks and LRPs. The CC-Bridge is discussed in-depth in
“The CC-Bridge” on page 93. As just discussed, this abstraction allows the Cloud
Foundry user to think in terms of apps and tasks, while allowing each Diego service
to express its work in a meaningful abstraction that makes sense to that service.
Staging Workflow
To better understand the Cloud Controller interaction with Diego, we will explore
what happens during a staging request. Exploring staging introduces you to two
Diego components:
• The BBS: Diego’s database that exposes the Diego API.
• Cells: the execution machines responsible for running applications in containers.
These two components are discussed at length later in this chapter. Understanding
the staging process provides you with a clear picture of how Cloud Foundry inter‐
prets the $ cf push and translates it into a running ActualLRP instance. Figure 6-4
provides an overview of the process.
Figure 6-4. Interaction between Cloud Foundry’s Cloud Controller components and
Diego, while staging and running an application
The numbered list that follows corresponds to the callout numbers in Figure 6-4 and
provides an explanation of each stage:
1. A developer/Platform Operator uses the Cloud Foundry command-line tool to
issue a cf push command.
2. The Cloud Foundry command-line tool instructs the Cloud Controller to create
a record for the application and sends over the application metadata (e.g., the
app name, number of instances, and the required buildpack, if specified).
3. The Cloud Controller stores the application metadata in the CCDB.
4. The Cloud Foundry command-line tool uploads the application files (such as
a .jar file) to the Cloud Controller.
5. The Cloud Controller stores the raw application files in the Cloud Controller
blobstore.
6. The Cloud Foundry command-line tool issues an app start command (unless a
no-start argument was specified).
7. Because the app has not already been staged, the Cloud Controller, through the
CC-Bridge, instructs Diego to stage the application.
8. Diego, through an auction process, chooses a Cell for staging and sends the stag‐
ing task to the Cell.
9. The staging Cell downloads the required life-cycle binaries that are hosted on the
Diego file server, and then uses the instructions in the buildpack to run the stag‐
ing task in order to stage the application.
10. The staging Cell streams the output of the staging process (to the Loggregator) so that
the developer can troubleshoot application staging problems.
11. The staging Cell packages the resulting staged application into a tarball (.tar file)
called a droplet and uploads it to the Cloud Controller blobstore.
12. Diego reports to the Cloud Controller that staging is complete. In addition, it
returns metadata about the application back to the CCDB.
13. The Cloud Controller (via CC-Bridge) issues a run AI command to Diego to run
the staged application.
14. Diego, through an auction process, chooses a Cell to run an LRP instance as an
ActualLRP.
15. The running Cell downloads the application droplet directly from the Cloud
Controller blobstore (the ActualLRP has a Cloud Controller URL for the asset).
16. The running Cell downloads the Application Life-Cycle Binaries hosted by the
Diego file server and uses these binaries to create an appropriate container and
then starts the ActualLRP instance.
17. Diego reports the status of the application to the Cloud Controller, which peri‐
odically receives the count of running instances and any crash events.
18. The Loggregator log/metric stream goes straight from the Cell to the Loggregator
system (not via the CC-Bridge). The application logs can then be obtained from
the Loggregator through the CF CLI.
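For example, once the application is running, the aggregated log stream for it can be retrieved through the cf CLI:
$ cf logs spring-music --recent    # dump the recent buffered logs
$ cf logs spring-music             # tail the live log stream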
Diego Staging
The preceding steps explore only staging and running an app from
Cloud Foundry’s perspective. The steps gloss over the interaction
of the internal Diego services. After exploring the remaining Diego
components and services, we will explore staging an LRP from
Diego’s perspective. Analyzing staging provides a concise way of
detailing how each service interacts, and therefore it is important
for you to understand.
The CC-Bridge
The CC-Bridge (Figure 6-5) is a special component comprised of four microservices.
It is a translation layer designed to interact both with the Cloud Controller and
Diego’s API, which is exposed by Diego’s BBS. The BBS is discussed shortly in “The
BBS” on page 98.
Figure 6-5. The CC-Bridge
The components on the CC-Bridge are essential for establishing domain freshness.
Domain freshness means two things:
1. The actual state reflects the desired state.
2. The desired state is always understood.
Domain freshness is established through a combination of self-healing and closed
feedback loops.
The Future of the CC-Bridge Component
For now, you can think of the CC-Bridge as a translation layer that
converts the Cloud Controller’s domain-specific requests into
Diego’s generic Tasks and LRPs. Eventually, Cloud Foundry’s
Cloud Controller might be modified to communicate directly with
the BBS, making the CC-Bridge redundant. Either way, the func‐
tion of the four services that currently reside on the CC-Bridge will
still be required no matter where they reside.
CC-Bridge services translate the Cloud Controller’s domain-specific requests of stage
and run applications into Diego’s generic language of LRP and Tasks. In other words,
Diego does not explicitly know about applications or staging tasks; instead, it just
knows about a Task or LRP that it has been requested to execute. The CC-Bridge
services include the following:
Stager
The Stager handles staging requests.
CC-Uploader
A file server to serve static assets to the Cloud Controller’s blobstore.
Nsync
This service is responsible for handling domain freshness from the Cloud Con‐
troller to Diego.
TPS
This service is responsible for handling domain freshness from Diego to the
Cloud Controller.
Stager
The Stager handles staging requests from the Cloud Controller. It translates these
requests into generic Tasks and submits the Tasks to Diego’s BBS. The Stager also
instructs the Cells (via BBS Task auctions) to inject the platform-specific Application
Life-Cycle Binary into the Cell to perform the actual staging process. Application
Life-Cycle Binaries provide the logic to build, run, and health-check the application.
(You can read more about them in “Application Life-Cycle Binaries” on page 112.)
After a task is completed (successfully or otherwise), the Stager sends a response to
the Cloud Controller.
CC-Uploader
The CC-Uploader acts as a file server to serve static assets such as droplets and build
artifacts to the Cloud Controller’s blobstore. It mediates staging uploads from the
Cell to the Cloud Controller, translating a simple generic HTTP POST request into
the complex correctly formed multipart-form upload request that is required by the
Cloud Controller. Droplet uploads to the CC-Uploader are asynchronous, with the
CC-Uploader polling the Cloud Controller until the asynchronous UploadAction is
completed.
Nsync and TPS
Nsync and TPS are the two components responsible for handling domain freshness,
matching desired and actual state between Diego and the Cloud Controller. They are
effectively two sides of the same coin:
•The Nsync primarily retrieves information from the Cloud Controller. It is the
component responsible for constructing the DesiredLRP that corresponds with
the application request originating from the Cloud Controller.
•The TPS primarily provides feedback information from Diego to the Cloud Con‐
troller.
Both components react to events via their listener process, and will periodically check
state validity via their respective bulker/watcher processes.
Figure 6-6 shows the high-level component interaction between the Cloud Control‐
ler, Diego’s Nsync, and Converger components right through to the Cell. The figure
also illustrates the convergence from a DesiredLRP stored in the BBS to an
ActualLRP running on a Cell.
Figure 6-6. High-level process interaction involving domain freshness, convergence, and
ActualLRP placement into a Cell’s container
Domain Freshness
Diego’s design has built-in resilience for its domain of responsibility—namely, run‐
ning Tasks and LRPs. Diego’s eventual-consistency model periodically compares the
desired state (the set of DesiredLRPs) to the actual state (the set of ActualLRPs) and
takes action to keep the actual and desired state synchronized. To achieve eventual
consistency in a safe way, Diego must ensure that its view of the desired state is com‐
plete and up to date. Consider a rare scenario in which Diego’s database has crashed
and requires repopulating. In this context, Diego correctly knows about the
ActualLRPs but has an invalid view of the DesiredLRPs, so it would be catastrophic
for Diego to shut down all ActualLRPs in a bid to reconcile the actual and desired
state.
To mitigate this effect, it is the Diego client’s responsibility to repeatedly inform
Diego of the desired state. Diego refers to this as the “freshness” of the desired state,
or domain freshness. Diego consumers explicitly mark desired state as fresh on a
domain-by-domain basis. Failing to do so will prevent Diego from taking actions to
ensure eventual consistency (Diego will not stop extra instances if there is no corre‐
sponding desired state). To maintain freshness, the client typically supplies a time-to-
live (TTL) and attempts to bump the freshness of the domain before the TTL expires
(thus verifying that the contents of Diego’s DesiredLRP are up to date). It is possible
to opt out of this by updating the freshness with no TTL so that freshness will never
expire, allowing Diego to always perform all of its eventual-consistency operations.
Only destructive operations, performed during an eventual-consistency convergence
cycle, will override freshness; Diego will continue to start and stop instances when
explicitly instructed to do so.
The Nsync is responsible for keeping Diego “in sync” with the Cloud Controller. It
splits its responsibilities between two independent processes: a bulker and a listener.
Let’s look at each of them:
Nsync-Listener
The Nsync-Listener is a service that responds to DesiredLRP requests from the
Cloud Controller. It actively listens for desired app requests and, upon receiving
a request, either creates or updates the DesiredLRP via a record in the BBS data‐
base. This is the initial mechanism for dictating desired state from the Cloud
Controller to Diego’s BBS.
Nsync-Bulker
The Nsync-Bulker is focused on maintaining the system’s desired state, periodi‐
cally polling the Cloud Controller and Diego’s BBS for all DesiredLRPs to ensure
that the DesiredLRP state known to Diego is up to date. This component pro‐
vides a closed feedback loop, ensuring that any change of desired state from the
Cloud Controller is reflected on to Diego.
The process status reporter (TPS) is responsible for reporting Diego’s status; it is
Diego’s “hall monitor.” It splits its responsibilities between two independent pro‐
cesses: listener and watcher submodules. Here’s what each one does:
TPS-Listener
The TPS-Listener provides the Cloud Controller with information about cur‐
rently running ActualLRP instances. It responds to the Cloud Foundry CLI
requests for cf apps and cf app <my-app-name>.
TPS-Watcher
The TPS-Watcher monitors ActualLRP activity for crashes and reports them to
the Cloud Controller.
Logging and Traffic Routing
To conclude our review of the Cloud Foundry layer of user-facing components, let's
look at logging.
Diego supports streaming logs from applications to Cloud Foundry's Loggregator system and provides support for routing traffic to applications via the routing
subsystem. With the combined subsystems—Diego, Loggregator, and the routing
subsystem—we have everything we need to do the following:
• Run any number of applications as a single user
• Route traffic to the LRPs
•Stream logs from the LRPs
The Loggregator aggregates and continually streams log and event data. Diego uses
the Loggregator’s Metron agent to provide real-time streaming of logs for all Tasks
and LRPs in addition to the streaming of logs and metrics for all Diego components.
The routing system routes incoming application traffic to ActualLRPs running within
Garden containers on Diego Cells. (The Loggregator and routing subsystem were dis‐
cussed in Chapter 3.)
Diego Components
At the time of writing, there are four core Diego component machines:
1. BBS (Database)
2. Cell
3. Brain
4. Access (an external component)
The functionality provided by these components is broken up into a number of
microservices running on their respective component machines.
The BBS
The BBS manages Diego’s database by maintaining an up-to-date cache of the state of
the Diego cluster including a picture-in-time of all DesiredLRPs, running ActualLRP
instances, and in-flight Tasks. Figure 6-7 provides an overview of the CC–Bridge-to-
BBS interaction.
Figure 6-7. The BBS interaction with the Cloud Controller via the CC-Bridge
The BBS provides Diego’s internal components and external clients with a consistent
API to carry out the following:
1. Query and update the system’s shared state (the state machine)
2. Trigger operations that execute the placement of Tasks and LRPs
3. View the datastore underpinning the state machine
For example, certain operations will cause effects. Consider creating a new LRP in the
system. To achieve this, the CC-Bridge communicates to the BBS API endpoints.
These endpoints will save any required state in the BBS database. If you specify a
DesiredLRP with N instances, the BBS will automatically trigger the required actions
for creating N ActualLRPs. This is the starting point of the state machine. Because
understanding the state machine is important for debugging the platform, we cover it
in more detail in “The Diego State Machine and Workload Life Cycles” on page 107.
Communication to the BBS is achieved through Google protocol buffers (protobufs).
Protocol Buffers
Protocol buffers (protobufs) provide an efficient way to marshal
and unmarshal data. They are a language-neutral, platform-
neutral, extensible mechanism for serializing structured data, simi‐
lar in concept to eXtensible Markup Language (XML) but smaller,
faster, and much less verbose.
The Diego API
The BBS provides an RPC-style API for core Diego components (Cell reps and Brain)
and any external clients (SSH proxy, CC-Bridge, Route-Emitter). The BBS endpoint
can be accessed only by Diego clients that reside on the same “private” network; it is
not publicly routable. Diego uses dynamic service discovery between its internal
components. Diego clients (the CC-Bridge) will look up the active IP address of the
BBS using internal service discovery. Here are the main reasons why the Diego end‐
point is not accessible via an external public route through the GoRouter:
•Clients are required to use TLS for communication with the BBS. The GoRouter
is currently not capable of establishing or passing through a TLS connection to
the backend.
•Clients are required to use mutual TLS authentication. To talk to the BBS, the
client must present a certificate signed by a CA that both the client and BBS rec‐
ognize.
The BBS encapsulates access to the backing database and manages data migrations,
encoding, and encryption. The BBS imperatively directs Diego’s Brain to state “here is
a task with a payload, find a Cell to run it.” Diego clients such as the CC-Bridge, the
Brain, and the Cell all communicate with the BBS. The Brain and the Cell are both
described in detail later in this chapter.
State versus Communication
The BBS is focused on data as opposed to communication. Putting state into an even‐
tually consistent store (such as etcd, clustered MySQL, or Consul) is an effective way
to build a distributed system because everything is globally visible. Eventual consis‐
tency is a data problem, not a communication problem. Tools like etcd and Consul,
which operate as a quorum, handle consistency for you.
It is important that Diego does not store inappropriate data in a component designed
for state. For instance, a “start auction” announcement is not a statement of truth; it
is a transient piece of communication representing intent. Therefore, transient infor‐
mation should be communicated by messaging (i.e., NATS or HTTP—although
direct communication over HTTPS is preferred). Communication via messaging is
temporary and, as such, it can be lossy. For Diego auctions, loss is acceptable. Diego
can tolerate loss because of eventual consistency and the closed feedback loops
described earlier.
For visibility into the BBS, you can use a tool called Veritas.
The Converger process
The Converger is a process responsible for keeping work eventually consistent.
Eventual Consistency
Eventual consistency is a consistency model used for maintaining
the integrity of stateful data in a distributed system. To achieve
high availability, a distributed system might have several “copies”
of the data-backing store. Eventual consistency informally guaran‐
tees that, if no new updates are made to a given item of data, even‐
tually all requests for that item will result in the most recently
updated value being returned. To learn more about HA, see “High
Availability Considerations” on page 263.
The Converger is a process that currently resides on the Brain instance group. It is
important to discuss now, because it operates on the BBS periodically and takes
actions to ensure that Diego attains eventual consistency. Should the Cell fail cata‐
strophically, the Converger will automatically move the missing instances to other
Cells. The Converger maintains a lock in the BBS to ensure that only one Converger
performs convergence. This is primarily for performance considerations because
convergence should be idempotent.
The Converger uses the converge methods in the runtime-schema/BBS to ensure
eventual consistency and fault tolerance for Tasks and LRPs. When converging LRPs,
the Converger identifies which actions need to take place to bring the DesiredLRP
state and ActualLRP state into accord. Two actions are possible:
• If an instance is missing, a start auction is sent.
•If an extra instance is identified, a stop message is sent to the Cell hosting the
additional instance.
In addition, the Converger watches for any potentially missed messages. For example,
if a Task has been in the PENDING state for too long, it is possible that the request to
hold an auction for the Task never made it to the Auctioneer. In this case, the Con‐
verger is responsible for resending the auction message. Periodically the Converger
sends aggregate metrics about DesiredLRPs, ActualLRPs, and Tasks to the Loggrega‐
tor.
Resilience with RAFT
Whatever the technology used to back the BBS (etcd, Consul, clus‐
tered MySQL), it is likely to be multinode to remove any single
point of failure. If the backing technology is based on the Raft con‐
sensus algorithm, you should always ensure that you have an odd
number of instances (three at a minimum) to maintain a quorum.
Diego Cell Components
Cells are where applications run. The term application is a high-level construct; Cells
are concerned with running desired Tasks and LRPs. Cells comprise a number of subcomponents (Rep/Executor/Garden; see Figure 6-8) that deal with running
and maintaining Tasks and LRPs. One Cell typically equates to a single VM, as gov‐
erned by the CPI in use. You can scale-out Cells both for load and resilience con‐
cerns.
Figure 6-8. The Cell processes gradient from the Rep to the Executor through to Garden
and its containers
There is a specificity gradient across the Rep, Executor, and Garden. The Rep is con‐
cerned with Tasks and LRPs, and knows the details about their life cycles. The Execu‐
tor knows nothing about Tasks and LRPs but merely knows how to manage a
collection of containers and run the composable actions in these containers. Garden,
in turn, knows nothing about actions and simply provides a concrete implementation
of a platform-specific containerization technology that can run arbitrary commands
in containers.
Rep
The Rep is the Cell’s API endpoint. It represents the Cell and mediates all communi‐
cation between the BBS and Brain—the Rep is the only Cell component that commu‐
nicates with the BBS. This single point of communication is important to understand;
by using the Rep for all communication back to the Brain and BBS, the other Cell
components can remain implementation independent. The Rep is also free to be
reused across different types of Cells. The Rep is not concerned with specific con‐
tainer implementations; it knows only that the Brain wants something run. This
means that specific container technology implementations can be updated or swap‐
ped out and replaced at will, without forcing additional changes within the wider dis‐
tributed system. The power of this plug-and-play ability should not be
underestimated. It is an essential capability for upgrading the system with zero-
downtime deployments.
Specifically, the Rep does the following (see also Figure 6-9):
•Participates in the Brain’s Auctioneer auctions to bid for Tasks and LRPs. It bids
for work based on criteria such as its capacity to handle work, and then subse‐
quently tries to accept what is assigned.
•Schedules Tasks and LRPs by asking its in-process Executor to create a container
to run generic action recipes in the newly created container.
•Repeatedly maintains the presence of the Cell in the BBS. Should the Cell fail cat‐
astrophically, the BBS will invoke the Brain’s Converger to automatically move
the missing instances to another Cell with availability.
•Ensures that the set of Tasks and ActualLRPs stored in the BBS are synchronized
with the active containers that are present and running on the Cell, thus com‐
pleting the essential feedback loop between the Cell and the BBS.
Figure 6-9. The BBS, auction, and Cell feedback loop
The Cell Rep is responsible for getting the status of a container periodically from
Garden via its in-process Executor and reporting that status to the BBS database.
There is only one Rep running on every Diego Cell.
Executor
The Executor is a logical process inside the Rep. Its remit is “Let me run that for you.” (The name is a conceptual adaptation of the earlier container technology LMCTFY, which stands for Let Me Contain That For You.) The Executor still resides in a separate repository, but it is not a separate job; it is part of the Rep.
The Executor does not know about the Task versus LRP distinction. It is primarily
responsible for implementing the generic Executor “composable actions,” as dis‐
cussed in “Composable Actions” on page 85. Essentially, all of the translation
between the Rep and Garden is encapsulated by the Executor: the Executor is a gate‐
way adapter from the Rep to the Garden interface. The Rep deals with simplistic
specifications (execute this tree of composable actions), and the Executor is in charge
of actually interacting with Garden to, for example, make the ActualLRP via the life
cycle objects. Additionally, the Executor streams stdout and stderr to the Metron agent running on the Cell. These log streams are then forwarded to the Loggregator.
Garden
Cloud Foundry’s container implementation, Garden, is separated into the API imple‐
mentation and the actual container implementation. This separation between API
and actual containers is similar to how Docker Engine has both an API and a con‐
tainer implementation based on runC.
Garden is a container API. It provides a platform-independent client and server to manage Garden-compatible container backends such as runC. It is backend agnostic, defining an
interface to be implemented by container-runners (e.g., Garden-Linux, Garden-
Windows, libcontainer, runC). The backend could be anything as long as it under‐
stands requests through the Garden API and is able to translate those requests into
actions.
Container Users
By default, all applications run as the vcap user within the con‐
tainer. This user can be changed with a runAction, the composable
action responsible for running a process in the container. This
composable action allows you to specify, among other settings, the
user. This means Diego’s internal composable actions allow pro‐
cesses to run as any arbitrary user in the container. That said, the
only users that really make sense are distinguished unprivileged
users known as vcap and root. These two users are provided in the
cflinuxfs2 rootfs. For Buildpack-based apps, Cloud Foundry always
specifies the user to be vcap.
Chapter 8 looks at Garden in greater detail.
The Diego Brain
We have already discussed the Brain’s Converger process. The Brain (Figure 6-10) is
also responsible for running the Auctioneer process. Auctioning is the key compo‐
nent of Diego’s scheduling capability. There are two components in Diego that par‐
ticipate in auctions:
The Auctioneer
Responsible for holding auctions whenever a Task or LRP needs to be scheduled
The Cell’s Rep
Represents a Cell in the auction by making bids for work and, if picked as the
winner, running the Task or LRP
Figure 6-10. The Cell–Brain interaction
Horizontal Scaling for Controlling Instance Groups
Each core component (known in BOSH parlance as an instance
group) is deployed on a dedicated machine or VM. There can be
multiple instances of each instance group, allowing for HA and
horizontal scaling.
Some instance groups, such as the Auctioneer, are essentially state‐
less. However, it is still important that only one instance is actively
making decisions. Diego allows for lots of potential running
instances in order to establish HA, but only one Auctioneer can be
in charge at any one period in time. Within a specific instance
group, the instance that is defined as “in charge” is specified by a
global lock. Finding out which specific instance is in charge is
accomplished through an internal service discovery mechanism.
When holding an auction, the Auctioneer communicates with the Cell Reps via
HTTP. The auction process decides where Tasks and ActualLRP instances are run
(remember that a client can dictate that one LRP requires several ActualLRP instan‐
ces for availability). The Auctioneer maintains a lock in the BBS such that only one
Auctioneer may handle auctions at any given time. The BBS determines which Auc‐
tioneer is active from a lock record (effectively the active Auctioneer holding the
lock). When the BBS is at a point at which it wants to assign a payload to a Cell for
execution, the BBS directs the Brain’s Auctioneer by requesting, “here is a task with a
payload, find a Cell to run it.” The Auctioneer asks all of the Reps what they are cur‐
rently running and what their current capacity is. Reps proactively bid for work and
the Auctioneer uses the Reps’ responses to make a placement decision.
Scheduling Algorithms
At the core of Diego is a distributed scheduling algorithm designed
to orchestrate where work should reside. This distribution algo‐
rithm is based on several factors such as existing Cell content and
app size. Other open source schedulers exist, such as Apache
Mesos or Google’s Kubernetes. Diego is optimized specifically for
application and task workloads. The supply–demand relationship
for Diego differs from the likes of Mesos. For Mesos, all worker
Cells report, "I am available to take N pieces of work" and Mesos
decides where the work goes. In Diego, an Auctioneer says, "I have
N pieces of work, who wants them?" Diego’s worker Cells then join
an auction, and the winning Cell of each auctioned-off piece of
work gets that piece of work. Mesos’ approach is supply driven, or
a “reverse-auction,” and Diego’s approach is demand driven.
A classic optimization problem in distributed systems is that there is a small lag
between the time the system realizes it is required to make a decision (e.g., task place‐
ment) and the time when it takes action on that decision. During this lag the input
criteria that the original decision was based upon might have changed. The system
needs to take account of this to optimize its work-placement decisions.
Consequently, there are currently two auction actions for LRPs: LRPStartAuction
and LRPStopAuction. Let’s look at each:
LRPStartAuctions
These occur when LRPs need to be assigned somewhere to run. Essentially, one
of Diego’s Auctioneers is saying, “We need another instance of this app. Who
wants it?”
LRPStopAuctions
These occur when there are too many LRP instances running for a particular application. In this case, the Auctioneer is saying, “We have too many instances of this app at index X. Who wants to remove one?”
The Cell that wins the auction either starts or stops the requested LRP.
Simulations
Diego includes simulations for testing the auction orchestration. Diego values the
tight feedback loops achieved with the auction algorithms. This feedback allows the
engineers to know how Diego internals are working and performing. It is especially
valuable for dealing with order dependency: where “A” must run before “B.” Simulat‐
ing this means it can be tested and reasoned over more quickly.
Simulations can be run either in-process or across multiple processes. Unit tests are great for isolation, integration tests exercise the typical usage paths, and simulation testing provides an extra layer of performance testing on top of both.
The Access VM
The access VM contains the file server and the SSH proxy services.
File server
The file server serves static assets used by various Diego components. In particular, it
provides the Application Life-Cycle Binaries to the Cells.
The SSH proxy
Diego supports SSH access to ActualLRP instances. This feature provides direct
access to the application for tasks such as viewing application logs or inspecting the
state of the container filesystem. The SSH proxy is a stateless routing tier. The pri‐
mary purpose of the SSH proxy is to broker connections between SSH clients and
SSH servers running within containers. The SSH proxy is a lightweight SSH daemon
that supports the following:
• Command execution
• Secure file copy via SCP
• Secure file transfer via SFTP
• Local port forwarding
• Interactive shells, providing a simple and scalable way to access containers associated with ActualLRPs
The SSH proxy hosts the user-accessible SSH endpoint so that Cloud Foundry users
can gain SSH access to containers running ActualLRPs. The SSH proxy is responsible
for the following:
• SSH authentication
• Policy enforcement
• Access controls
After a user successfully authenticates with the proxy, the proxy attempts to locate
the target container, creating the SSH session with a daemon running within the con‐
tainer. It effectively creates a “man-in-the-middle” connection with the client, bridg‐
ing two SSH sessions:
• A session from the client to the SSH proxy
• A session from the SSH proxy to the container
After both sessions have been established, the proxy will manage the communication
between the user’s SSH client and the container’s SSH daemon.
The daemon is self-contained and has no dependencies on the container root filesys‐
tem. It is focused on delivering basic access to ActualLRPs running in containers and
is intended to run as an unprivileged process; interactive shells and commands will
run as the daemon user. The daemon supports only one authorized key and is not
intended to support multiple users. The daemon is available on Diego’s file server. As
part of the application life cycle bundle, Cloud Foundry’s LRPs will include a down‐
loadAction to acquire the binary and then a runAction to start it.
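From a developer’s perspective, access through the SSH proxy is usually driven by the cf CLI. The following is a sketch of common interactions, with myapp as an assumed app name:
$ cf enable-ssh myapp           # allow SSH access to this app, if not already enabled
$ cf ssh myapp -i 0             # open an interactive shell in app instance 0
$ cf ssh myapp -c "ls -la app"  # run a single command instead of an interactive shell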
The Diego State Machine and Workload Life Cycles
Diego’s semantics provide clients with the ability to state, “Here is my workload: I
want Diego to keep it running forever. I don’t care how.” Diego ensures this request
becomes a reality. If for some reason an ActualLRP crashes, Diego will reconcile
desired and actual state back to parity. These life cycle concerns are captured by
Diego’s state machine. It is important that you understand the state machine should
you need to debug the system. For example, if you notice many of your ActualLRPs
remain in UNCLAIMED state, it is highly likely that your Cells have reached capacity
and require additional resources.
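If you have operator access to a Diego VM, you can inspect ActualLRP states directly against the BBS. This sketch assumes the cfdot tool is present on the VM (it ships with recent Diego releases) and that jq is installed; the JSON paths are based on the BBS ActualLRP representation and may differ between versions:
$ cfdot actual-lrp-groups | jq -r '.instance.state' | sort | uniq -c
# counts ActualLRPs by state; a large UNCLAIMED count suggests the Cells are out of capacity
$ cfdot cells | jq .
# shows each Cell's advertised presence and remaining capacity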
Stateful and Globally Aware Components
When it comes to state, Diego is comprised of three types of components: stateful
components, stateless components, and stateless globally aware components. To
describe the difference between the three components, consider the differences
between the Cell, the BBS, and the Auctioneer.
A Cell has a global presence, but it is effectively stateless. Cells advertise themselves
over HTTP to the BBS via a service discovery mechanism. They do not provide the
entire picture of their current state; they simply maintain a presence in the BBS via
recording a few static characteristics such as the AZ they reside in and their IP
address. Maintaining a presence in the BBS allows the Auctioneer to contact the Cell
at the recorded location in order to try to place work.
The BBS retains the global state associated with managing the persistence layer. The BBS also has the responsibility of managing migrations, whether data migrations, schema migrations, or API migrations reflected in the schema. For this reason, the BBS is an essential component to back up.
The Auctioneer is required to be globally aware; it has a global responsibility but is
not directly responsible for system state. When a component has a global responsibil‐
ity, there should only ever be one instance running at any one time.
An understanding of the state machine (see Figure 6-11) and how it relates to the app
and task life cycles is essential for understanding where to begin with debugging a
specific symptom.
Figure 6-11. Diego’s state machine
The responsibility for state belongs, collectively, to several components. The partition
of ownership within the distributed system is dependent on the task or LRP (app) life
cycle.
The Application Life Cycle
The application life cycle is as follows:
1. When a client expresses the desire to run an application, the request results in
Diego’s Nsync process creating an ActualLRP record.
2. An ActualLRP has a state field (a process globally unique identifier [GUID] with
an index) recorded in the BBS. The ActualLRP begins its life in an UNCLAIMED
state, resulting in the BBS passing it over to the Auctioneer process.
3. Work collectively is batched up and then distributed by the Auctioneer. When
the Auctioneer has a batch of work that requires allocating, it looks up all the
running Cells through service discovery and individually asks each Cell (Rep) for
a current snapshot of its state, including how much capacity the Cell has to
receive extra work. Auction activity is centrally controlled. It is like a team leader
assigning tasks to team members based on their desire for the work and their
capacity to perform it. The batch of work is broken up and distributed appropri‐
ately across the Cells.
4. An auction is performed and the ActualLRP is placed on a Cell. The Cell’s Rep
immediately transfers the ActualLRP state to CLAIMED. If placement is success‐
ful, the LRP is now RUNNING and the Rep now owns this record. If the
ActualLRP cannot be placed (if a Cell is told to do work but cannot run that
work for some reason), it reenters the auction process in an UNCLAIMED state.
The Cell rejects the work and the Auctioneer can retry the work later.
5. The Cell’s response on whether it has the capacity to do the work should not
block actually starting the work. Therefore, the “perform” request from the Auc‐
tioneer instructs the Cell to try to undertake the work. The Cell then attempts to
reserve capacity within the Executor’s state management, identifying the extra
resource required for the work.
6. The Cell quickly reports success or failure back to the Auctioneer. Upon success,
the Cell then reserves this resource (as quickly as possible) so as not to advertise
reserved resources during future auctions. The Executor is now aware that it has
the responsibility for the reserved work and reports this back to the Rep, which
knows the specifics of the Diego state machine for Tasks and LRPs.
7. Based on the current state that the Executor is reporting to the Rep, the Rep is
then going to make decisions about how to progress workloads through their life
cycles. In general, if anything is in reserved state, the Rep states to the Executor:
"start running that workload.” At that point, the Executor takes the container
specification and creates a container in Garden.
8. The Auctioneer is responsible for the placement of workloads across the entire
cluster of Cells. The Converger is responsible for making sure the desired and
actual workloads are reconciled across the entire cluster of Cells. If the
ActualLRP crashes, it is placed in a CRASHED state and the Rep moves the state
ownership back to the BBS because the ActualLRP is no longer running. When
the Rep undertakes its own local convergence cycle, trying to converge the actual
running states in its Garden containers with its representation within the BBS,
the Rep will discover the ActualLRP CRASHED state. The Rep then looks for the
residual state that might still reside from its management of that ActualLRP.
Even if the container itself is gone, the Executor might still have the container
represented in a virtual state. The virtual state’s “COMPLETED” state informa‐
tion might provide a clue as to why the ActualLRP died (e.g., it may have been
placed in failure mode). The Rep then reports to the BBS that the ActualLRP has
crashed. The BBS will attempt three consecutive restarts and then the restart pol‐
icy will begin to back off exponentially, attempting subsequent restarts after a
delayed period.
9. The Brain’s Converger runs periodically looking for “CRASHED” ActualLRPs.
Based on the number of times it has crashed (which is also retained in the BBS),
the Converger will pass the LRP back to the Auctioneer to resume the
ActualLRP. The Converger also deals with the mass of UNCLAIMED LRPs, moving them through the CLAIMED and RUNNING states. The Converger maps the state (held in the BBS) onto the desired LRPs. If the BBS desires five instances but the Reps report only four records, the Converger will create the fifth record in the BBS to kick-start its placement.
10. There is a spectrum of responsibilities that extends from the Converger to the
BBS. Convergence requires a persistence-level convergence, including the
required cleanup process. There is also a business-model convergence in which
we can strip away any persistence artifacts and deal with the concepts we are
managing—and Diego ensures that the models of these concepts are in harmony.
The persistence layer always happens in the BBS, but it is triggered by the Con‐
verger running a convergence loop.
Task Life Cycle
Tasks in Diego also undergo a life cycle. This life cycle is encoded in the Task’s state
as follows:
PENDING
When a task is first created, it enters the PENDING state.
CLAIMED
When successfully allocated to a Diego Cell, the Task enters the CLAIMED state
and the Task’s Cell_id is populated.
RUNNING
The Task enters the RUNNING state when the Cell begins to create the container
and run the defined Task action.
COMPLETED
Upon Task completion, the Cell annotates the TaskResponse (failed, failure_rea‐
son, result), and the Task enters the COMPLETED state.
Upon Task completion, it is up to the consumer of Diego to acknowledge and resolve
the completed Task, either via a completion callback or by deleting the Task. To dis‐
cover if a Task is completed, the Diego consumer must either register a comple‐
tion_callback_url or periodically poll the API to fetch the Task in question. When the
Task is being resolved, it first enters the RESOLVING state and is ultimately removed
from Diego. Diego will automatically reap Tasks that remain unresolved after two
minutes.
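From the consumer side, the Cloud Controller is the Diego client that creates and resolves Tasks on your behalf. With a recent cf CLI you can observe this life cycle for app tasks; the following is a sketch, with myapp and the migration command as assumed examples:
$ cf run-task myapp "bin/rails db:migrate" --name migrate   # ask Diego to run a one-off Task
$ cf tasks myapp              # list tasks and their states as reported by the Cloud Controller
$ cf terminate-task myapp 1   # cancel task ID 1 if it should not run to completion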
Additional Components and Concepts
In addition to its core components, Diego also makes use of the following:
• The Route-Emitter
• Consul
• Application Life-Cycle Binaries
The Route-Emitter
The Route-Emitter is responsible for registering and unregistering the ActualLRPs
routes with the GoRouter. It monitors DesiredLRP state and ActualLRP state via the
information stored in the BBS. When a change is detected, the Route-Emitter emits
route registration/unregistration messages to the router. It also periodically emits the
entire routing table to the GoRouter. You can read more about routing in Chapter 7.
Consul
Consul is a highly available and distributed service discovery and key-value store.
Diego uses it currently for two reasons:
• It provides dynamic service registration and load balancing via internal DNS resolution.
• It provides a consistent key-value store for maintenance of distributed locks and component presence. For example, the active Auctioneer holds a distributed lock to ensure that other Auctioneers do not compete for work. Each Cell’s Rep maintains its presence in Consul so that Consul can maintain a correct view of the world. The Converger also maintains a global lock in Consul.
To provide DNS, Consul uses a cluster of services. For services that require DNS resolution, a Consul agent is co-located with the hosting Diego component. The consul-agent job adds 127.0.0.1 as the first entry in the nameserver list, and the consul-agent that is co-located on the Diego component VM serves DNS for Consul-registered services on 127.0.0.1:53. When a component tries to resolve an entry in the Consul domain, it therefore checks 127.0.0.1 first. This reduces the number of component hops involved in DNS resolution. Consul allows for effective intercomponent communication.
Other services that expect external DNS resolution also need a reference to the exter‐
nal DNS server to be present in /etc/resolv.conf.
Like all RAFT stores, if Consul loses quorum, it may require manual intervention.
Therefore, a three-Consul-node cluster is required, preferably spanning three AZs. If
you restart a node, when it comes back up, it will begin talking to its peers and replay
the RAFT log to get up to date and synchronized with all the database history. It is
imperative to ensure that a node is fully back up and has rejoined the cluster prior to
taking a second node down; otherwise, when the second node goes offline, you might
lose quorum. BOSH deploys Consul via a rolling upgrade to ensure that each node is
fully available prior to bringing up the next.
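To verify DNS resolution on a Diego component VM, you can check the nameserver ordering and query the local Consul agent directly. This is a sketch; the service name shown is a hypothetical example of a Consul-registered internal service, not a guaranteed entry:
$ cat /etc/resolv.conf
# 127.0.0.1 should appear as the first nameserver entry
$ dig @127.0.0.1 some-component.service.cf.internal
# queries the co-located consul-agent for a Consul-registered service name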
Application Life-Cycle Binaries
Diego aims to be platform agnostic. All platform-specific concerns are delegated to
two types of components:
• The Garden backend
• The Application Life-Cycle Binaries
The process of staging and running an application is complex. These concerns are
encapsulated in a set of binaries known collectively as the Application Life-Cycle
Binaries. There are different Application Life-Cycle Binaries depending on the con‐
tainer image (see also Figure 6-12):
• Buildpack-Application Life Cycle implements a traditional buildpack-based life cycle.
• Docker-Application Life Cycle implements a Docker-based OCI-compatible life cycle.
• Windows-Application Life Cycle implements a life cycle for .NET applications on Windows.
Figure 6-12. The Application Life-Cycle Binaries
Each of the aforementioned Application Life Cycles provides a set of binaries that
manage a specific application type. For the Buildpack-Application Life Cycle, there
are three binaries:
• The Builder stages a Cloud Foundry application. The CC-Bridge runs the Builder as a Task on every staging request. The Builder performs static analysis on the application code and performs any required preprocessing before the application is first run.
• The Launcher runs a Cloud Foundry application. The CC-Bridge sets the Launcher as the Action on the app’s DesiredLRP. The Launcher executes the user’s start command with the correct system context (working directory, environment variables, etc.).
• The Healthcheck performs a status check of the running ActualLRP from within the container. The CC-Bridge sets the Healthcheck as the Monitor action on the app’s DesiredLRP.
The Stager Task produced by the CC-Bridge downloads the appropriate Application
Life-Cycle Binaries and runs them to invoke life cycle scripts such as stage, start, and
health-check in the ActualLRP.
This is a pluggable module for running OS-specific components. Because the life
cycle is OS specific, the OS is an explicitly specified field required by the LRP. For
example, the current Linux setting references the cflinuxfs2 rootfs. In addition to the
rootfs, the only other OS-specific component that is explicitly specified is the back‐
end container type.
RunAction and the Launcher Binary
When Diego invokes start-command for an app (ActualLRP), it does so as a RunAc‐
tion. A Task or LRP RunAction invokes a process that resides under the process iden‐
tifier (PID) namespace of the container within which it is invoked. The RunAction
process is not like a simple bash -c "start-command". It is run through the launcher
binary, which adds additional environmental setup for the process before the process
calls exec to spawn bash -c "start-command". The launcher is invoked after the
container is created and is focused on setting up the environment for the start-
command process (e.g., the launcher sets appropriate environmental variables, the
working directory, and all other environment concerns inside the container). This
environment setup is not very extensive; for example, for buildpack apps, it changes
HOME and TMPDIR, puts some per-instance information in VCAP_APPLICA‐
TION, and then sources any scripts the buildpack placed in .profile.d. When the
launcher runs, it uses the exec system call to invoke the start command (and so disappears). When you review the process tree within the Garden container, you can observe that after the launcher finishes running and setting up the environment, the top-level process is the container’s init process (the first daemon process started when the container boots), and your application runs as a child of that init process.
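You can observe this process tree yourself by opening a shell inside a running container. A minimal sketch, assuming an app named myapp with SSH enabled:
$ cf ssh myapp
# then, inside the container:
$ ps -ef --forest
# the top-level process is the container's init process; the app's start command appears
# as a child, because the launcher exec'd into it and is no longer present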
In addition to the Linux life cycle, Diego also supports a Windows life cycle and a
Docker life cycle. The Docker life cycle (to understand how to stage a Docker image)
is based on metadata from the Cloud Controller, based on what type of app we want
to run. As part of the information for “stage these app bits,” there will be some indica‐
tion of what branching the Stager is required to undertake. The binaries have tried to
ensure that, as much as possible, these specifications are shared.
Putting It All Together
We discussed what happens when staging an application in “Staging Workflow” on
page 90. Specifically, we discussed the following interactions:
•The CLI and the Cloud Controller
• The Cloud Controller and CC-Bridge
• The CC-Bridge and Diego, including Diego Cells
Up to this point, we have glossed over the interaction of the internal Diego compo‐
nents. This section discusses the interaction between the Diego components during
staging LRPs. This section assumes you are staging an app using the buildpack Appli‐
cation Life-Cycle Binaries as opposed to pushing a prebuilt OCI-compatible image.
Staging takes application artifacts and buildpacks and produces a binary droplet arti‐
fact coupled with metadata. This metadata can be anything from hosts and route
information to the detected buildpack and the default start command. Essentially,
any information that comes out of the entire buildpack compilation and release pro‐
cess can be encapsulated via the buildpack’s metadata.
The internals of Diego know nothing about buildpacks. The Cloud
Controller, via the Application Life-Cycle Binaries, provides all the
required buildpacks to run in a container. The Cloud Controller
can be conservative on what it downloads; if you specify a build‐
pack (e.g., the JBP), only that buildpack will be downloaded. If no
buildpack is specified, the Cloud Controller will download all
buildpacks. Additionally, individual Cells can cache buildpacks so
that they do not need to be repeatedly downloaded from the Cloud
Controller. Chapter 9 looks at buildpacks in greater detail.
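To limit what the Cloud Controller downloads during staging, you can name a buildpack explicitly at push time. A sketch, assuming the buildpack names shown are installed in, or reachable from, your Cloud Foundry instance:
$ cf buildpacks                    # list the buildpacks installed in this Cloud Foundry
$ cf push myapp -b java_buildpack  # stage using a single named buildpack
$ cf push myapp -b https://github.com/cloudfoundry/go-buildpack.git  # or a buildpack by URL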
The steps for staging are as follows:
1. The staging process begins with a cf push request from the Cloud Foundry CLI
to the Cloud Controller (CAPI). Diego’s role in the process occurs when the
Cloud Controller instructs Diego (via the CC-Bridge) to stage the application. All
Tasks and LRPs are submitted to the CC-Bridge via Cloud Foundry’s Cloud
Controller. The Cloud Controller begins the staging process by sending a “stage
app bits” request as a task.
2. The CC-Bridge’s Stager picks up and handles the “stage app bits” request. The
Stager constructs a staging message and forwards it to the BBS. Thus, the Stager
represents a transformation function.
3. The first step of the BBS is to store the task information. When the task request is
stored in the BBS, the BBS is responsible for validating it. At this stage the task
request is only stored; no execution of the task has taken place.
4. Diego is now at a point at which it wants to assign a payload (the Task) to a Cell
that is best suited to run it. The BBS determines which of the Brain’s Auctioneers
is active by looking for the Auctioneer that currently holds the lock record. The
BBS communicates to the Auctioneer, directing it to find a Cell to run the Task.
5. The Auctioneer optimally distributes Tasks and LRPs to the cluster of Diego
Cells via an auction involving the Cell Reps. The Auctioneer asks all of the Reps
what they are currently running. It uses the Reps’ responses to make a placement
decision. After it selects a Cell, it directs the chosen Cell to run the desired Task.
You can configure it so that this auction is done every time you push an applica‐
tion; however, you can also batch auctions for performance to reduce the auction
overhead.
There is additional scope for sharding this auction process by AZ. It is inappro‐
priate to cache the auction results because state is changing all the time; for
example, a task might complete or an ActualLRP might crash. To look up which
Cells are registered (via their Reps), the Auctioneer communicates with the BBS
to get the shared system state. Reps report to BBS directly to inform BBS of their
current state. When the Auctioneer is aware of the available Reps, it contacts all
Reps directly.
6. The chosen Rep is assigned the Task. After a Task/LRP is assigned to a Cell, that
Cell will try to allocate containers based on its internal accounting. Inside the
Rep, there is a gateway to Garden (the Executor). (We introduced Garden in
“Garden” on page 103 and it is discussed further in Chapter 8.) The Rep runs
Tasks/ActualLRPs by asking its in-process Executor to create a container to run
generic action recipes. The Executor creates a Garden container and executes the
work encoded in the Task/ActualLRP. This work is encoded as a generic,
platform-independent recipe of composable actions (we discussed these in
“Composable Actions” on page 85). If the Cell cannot perform the Task, it
responds to the Auctioneer, announcing that it was unable to run the requested
work. Payloads that are distributed are a batch of Tasks rather than one Task per
request. This approach reduces chatter and allows the Cell to attempt all reques‐
ted tasks and report back any Task it was unable to accomplish.
7. The staging Cell uses the instructions in the buildpack and the staging task to
stage the application. It obtains the buildpack through the buildpack Application
Life-Cycle Binaries via the file server.
8. Assuming that the task completes successfully, the staging task will result in a
droplet, and Garden reports that the container has completed all processes. This
information bubbles up through the Rep and the Task is now marked as being in
a completed state. When a Task is completed it can call back to the BBS using the
callback in the Task request. The callback URL then calls back to the Stager so
that the Stager knows that the task is complete (Stagers are stateless, so the call‐
back will return to any Stager). The callback is the process responsible for
uploading the metadata from Cell to Stager, and the Stager passes the metadata
back to the Cloud Controller.
9. The Cloud Controller also provides information as to where to upload the drop‐
let. The Cell that was responsible for staging the droplet can asynchronously
upload the droplet back to the Cloud Controller blobstore. The droplet goes back to the Cloud Controller blobstore via the Cell’s Executor: the Executor uploads the droplet to the file server, and the file server asynchronously uploads the droplet to the Cloud Controller’s blobstore, regularly polling to find out when the upload has been completed.
10. Additionally the staging Cell streams the output of the staging process so that the
developer can troubleshoot application staging problems. After this staging pro‐
cess has completed successfully, the Cloud Controller subsequently issues a “run
application droplet command” to Diego to run the staged application.
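You can watch much of this flow from the developer side, because the staging Cell streams its output back through the logging system. A sketch, with myapp as an assumed app name:
$ cf push myapp
# the CLI prints the staging output (buildpack detection, compilation, droplet upload) as it streams
$ cf logs myapp --recent
# staging log lines are typically tagged with the STG source; running app lines with APP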
From a developer’s perspective, you issue a command to Cloud Foundry and Cloud
Foundry will run your app. However, as discussed at the beginning of this chapter, as
work flows through the distributed system, Diego components describe their actions
using different levels of abstraction. Internal interactions between the Cloud Control‐
ler and Diego’s internal components are abstracted away from the developer. They
are, however, important for an operator to understand in order to troubleshoot any
issues.
Summary
Diego is a distributed system that allows you to run and scale any number of applications and tasks in containers across a cluster of Cells. Here are Diego’s major characteristics and attributes:
• It is responsible for running and monitoring OCI-compatible images, standalone applications, and tasks deployed to Cloud Foundry.
• It is responsible for container scheduling and orchestration.
• It is agnostic to both client interaction and runtime implementation.
• It ensures applications remain running by reconciling desired state with actual state through establishing eventual consistency, self-healing, and closed feedback loops.
• It has a generic execution environment made up of composable actions, and a composable backend to allow the support of multiple different Windows- and Linux-based workloads.
When you step back and consider the inherent challenges with any distributed sys‐
tem, the solutions to the consistency and orchestration challenges provided by Diego
are extremely elegant. Diego has been designed to make the container runtime sub‐
system of Cloud Foundry modular and generic. As with all distributed systems,
Diego is complex. There are many moving parts and the communication flows
between them are not trivial. Complexity is fine, however, if it is well defined within
bounded contexts. The Cloud Foundry team has gone to great lengths to design
explicit boundaries for the Diego services and their interaction flows. Each service is
free to express its work using its own abstraction, and ultimately this allows for a
modular composable plug-and-play system that is easy both to use and operate.
CHAPTER 7
Routing Considerations
This chapter looks at Cloud Foundry’s routing mechanisms in more detail. User-
facing apps need to be accessed by a URL, often referred to in Cloud Foundry par‐
lance as a route. End users target the URL for the app that they want to access. The
app then hopefully returns the correct response. However, there is often a lot more
going on behind that simple request–response behavior.
Operators can use routing mechanisms for reasons such as to provide additional
security, ease deployment across a microservices architecture, and avoid downtime
during upgrades through well-established techniques such as deploying canaries and
establishing blue/green deployments. For these reasons, an understanding of Cloud
Foundry’s routing mechanisms along with an appreciation of the routing capabilities
is an important operational concern. Additionally, understanding how different
Cloud Foundry components dynamically handle routing is important for debugging
platform- or app-routing issues.
Routing Primitives
The Cloud Foundry operator deals with the following:
• Routes
• Hostnames
• Domains
• Context paths
• Ports
The Cloud Foundry documentation explores these concepts at length. This chapter
explores the key considerations for establishing routing best practices. We begin with
a brief introduction to the terms and then move on to the routing mechanisms and
capabilities.
Routes
To enable traffic from external clients, apps require a specific URL, known as a route.
For example, developers can create a route by mapping the route myapp.shared-cf-
domain.com to the app myapp.
You can construct routes via a combination of the following:
• Domain
• Host
• Port
•Context path
Route construction is explained further in a later section.
Each Cloud Foundry instance can have a single default domain and further addi‐
tional domains that can be shared across organizations (Orgs) in a single Cloud
Foundry instance. Routes are then based on those domains. Routes belong to a Space,
and only apps in the same Space as that route can be mapped to it. A developer of
one Space cannot create or use a route if it already exists in another Space. For this
reason, many developers place app-name-${random-word} in their route to ensure
that their app route is unique during the dev/test phase.
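The cf CLI can generate that uniqueness for you rather than you choosing a random word by hand. A sketch:
$ cf push myapp --random-route
# appends a generated word to the hostname, producing something like myapp-<random-word>.shared-cf-domain.com
$ cf routes
# lists the routes in the current Space and the apps mapped to them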
One app, one route, multiple app instances
You can map an individual app to either a single route or, if desired, multiple routes.
Because apps can have multiple app instances (ActualLRPs), all accessed by the single
route, each route has an associated array of host:port entries stored in a routing table
on the GoRouter. Figure 7-1 shows that the host is the Diego Cell machine running
the LRP in a container and the port corresponds to a dedicated host port for that con‐
tainer. The router regularly recomputes new routing tables based on the Cell IP
addresses and the host-side port numbers for the containers, as illustrated in
Figure 7-1. In a cloud-based distributed environment, both desired and actual state
can rapidly change; thus, it is important to dynamically update routes both periodi‐
cally and immediately in response to state changes.
Figure 7-1. Cell-to-container port mapping
One app, multiple routes
You can also map an individual app to multiple routes, granting multiple URLs
access to that app. This capability is illustrated in Figure 7-2.
Figure 7-2. One app mapped to two different routes
Mapping more than one route to an app can be a valuable feature for establishing
techniques such as blue/green deployments. You can read more about blue/green
deployments in the Cloud Foundry Documentation.
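As a brief illustration of the blue/green pattern, the sketch below assumes two deployed versions of the same app (myapp-blue and myapp-green) and the shared domain shared-cf-domain.com:
$ cf push myapp-green -n myapp-green                                # deploy the new version on its own route
$ cf map-route myapp-green shared-cf-domain.com --hostname myapp    # add green to the live route
$ cf unmap-route myapp-blue shared-cf-domain.com --hostname myapp   # drain traffic away from blue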
Several apps, one route
In addition to being able to map all identical app instances to a single route, as depic‐
ted in Figure 7-3, you can also map independent apps to a single route. This results in
the GoRouter load-balancing requests for the route across all instances of all mapped
apps, as demonstrated in Figure 7-3. This feature is also important for enabling the
use of blue/green and canary deployment strategies. It is also used when dealing with
different apps that must work collectively with a single entry point; for example,
microservices architecture (discussed shortly).
Figure 7-3. Routing mechanism allowing for several apps mapped to the same route
Hostnames
Cloud Foundry provides the option of creating a route with a hostname. A hostname
is a name that can be explicitly used to specify an app, as shown in the following
code:
$ cf create-route my-space shared-domain.com --hostname myapp
This creates the unique route myapp.shared-domain.com comprising the host that is
prepended on to the shared domain.
At this stage, all we have done is reserved the route so that the route is not used in
another Space. The app is only routable by this route after it is mapped to the route,
as in the following:
$ cf map-route myapp shared-domain.com --hostname myapp
Note that although this route is created for the Space my-space, the Space is not fea‐
tured in the route name.
Routes created for shared domains must always use a hostname. Alternatively, you
can create a route without a hostname. This approach creates a route for the domain
itself and is permitted for private domains only. You can create private domains as
follows:
$ cf create-route my-space private-domain.com
This example creates a route in the Space my-space from domain private-
domain.com. After configuring your DNS, Cloud Foundry will route requests for
http(s)://private-domain.com or any context path under that URL (e.g., private-
domain.com/app1) to apps that are mapped to that route or context path. Any subdo‐
main (e.g., *foo*.private-domain.com) will fail unless additional routes are
specified for that subdomain and then mapped to a subsequent app.
You can use wildcard routes here as a catch-all (e.g., *.private-domain.com); for
example, to serve a custom 404 page or a specific homepage.
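Creating such a catch-all is done with a wildcard hostname on the private domain. This is a sketch; my-404-app is a hypothetical app, and the asterisk is quoted to prevent shell expansion:
$ cf create-route my-space private-domain.com --hostname "*"
$ cf map-route my-404-app private-domain.com --hostname "*"
# requests for any otherwise-unmatched subdomain of private-domain.com now reach my-404-app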
Domains
Cloud Foundry’s use of the terms domain, shared domain, and private domain differ
from their common use:
• Domains provide a namespace from which to create routes.
•Shared domains are available to users in all Orgs, and every Cloud Foundry
instance requires a single default shared domain.
•Private domains allow users to create routes for privately registered domain
names.
As discussed in “Hostnames” on page 122, by default an app is assigned a route comprising the hostname my-app and the app domain apps.cf-domain.com, resulting in the route my-app.apps.cf-domain.com.
The presence of a domain in Cloud Foundry indicates that requests for any route created from that domain will be routed to a specific Cloud Foundry instance. This provision requires DNS to be configured to resolve the domain name to the IP address of a load balancer that fronts traffic entering Cloud Foundry. (The enterprise load balancer you use is your choice; Cloud Foundry is not opinionated about the load-balancing strategy that fronts it.)
The recommended practice is to have a wildcard canonical name (CNAME) that you
can use as a base domain for other subdomains. An example of a wildcard CNAME is
*.cf-domain.com.
To use a subdomain of your registered domain name with apps on Cloud Foundry,
configure the subdomain as a CNAME record with your DNS provider, pointing at
any shared domain offered in Cloud Foundry.
When installing Cloud Foundry, it is good practice to have a subsequent system
domain and one or more app domains; for example, system.cf-domain.com for your
system domain, and apps.cf-domain.com for (one of) your app domain(s). Multiple
app domains can be advantageous and are discussed further later.
The system domain allows Cloud Foundry to receive requests for and send commu‐
nication between its internal components (like the UAA and Cloud Controller).
Cloud Foundry itself can run some of its components as apps. For example, a service
broker can deploy an app. The app domain guarantees that requests for routes based
off that domain will go to a specific Cloud Foundry instance.
If we had only one combined system and app domain, there would be no separation of concerns. A developer could register an app route whose name implies that it is a system component. This can cause confusion for the Cloud Foundry operator with
respect to what are system apps and what are developer-deployed apps. Moreover,
mapping an arbitrary app to a system component can cause fundamental system fail‐
ures, as we will explore in “Scenario Five: Route Collision” on page 241. For these rea‐
sons, it is recommended that you always have at least one default system domain and
default app domain per environment.
All system components should register routes that are extensions of the system
domain; for example, login.system.cf-domain.com, uaa.system.cf-domain.com,
doppler.system.cf-domain.com, and api.system.cf-domain.com.
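Operators manage these domains with the cf CLI. The following sketch assumes admin (or Org manager) permissions and reuses the example domain names from this section:
$ cf create-shared-domain apps.cf-domain.com      # an app domain available to every Org
$ cf create-domain my-org internal.cf-domain.com  # a private domain owned by a single Org
$ cf domains                                      # list the domains visible to the targeted Org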
Using Wildcard Domains
For an operator not to have to create a new certificate for every app, Cloud Foundry
depends on wildcard domains. Wildcard SSL certificates (referred to as “certs”) are
extremely beneficial for rapid app development and deployment. You will need to
generate wildcard certificates based on your chosen domains for use in Cloud Foun‐
dry. You can either generate these certificates using your own CA, or through any
other tool for generating certificates, such as Cloud Foundry’s “Generate Self-Signed
Certificate” tool.
Certs are necessary for the platform to operate, but this does not mean that your
external load balancer must use these wildcard domains and/or wildcard certificates.
However, it is worth mentioning that if you do not use wildcard certs at your load
balancer, every time you want to deploy a new app, you will need to set up a new
DNS record and then generate a new certificate for that app and install that cert on
your load balancer. This approach adds time and an additional burden that delays the
velocity of pushing new apps to Cloud Foundry. To put it bluntly, not using wildcard domains in your TLS certificates defeats one of the main reasons for using Cloud Foundry: self-service speed.
Multiple app domains
There are some advantages to using multiple app domains. For example, an operator
might want to establish a dedicated app domain with a dedicated cert and VIP. If you
issue a certificate for a critical app on a dedicated app domain, and for some reason
that certificate becomes compromised, you have the flexibility of revoking just that
certificate without affecting all of your other apps that are on a different app domain.
Context Path Routing
Context path routing allows for routing to be based not only on the route domain
name (essentially the host header), but also the path specified in the route’s URL. The
GoRouter inspects the URL for additional context paths and, upon discovery, can
then route requests to different apps based on that path. Here are a couple of exam‐
ples:
• myapp.mycf-domain.com/foo can be mapped to the foo app.
• myapp.mycf-domain.com/bar can be mapped to the bar app.
This is important when dealing with a microservices architecture. With microservices, a single “big-A” app can be composed of a suite of smaller microservice apps, as shown in Figure 7-4. The smaller applications often require the same single top-
level route myapp.mycf-domain.com to offer a single entry point for the user. Context
path routing allows different microservices apps (e.g., foo and bar), all served by the
same parent route, to provide support for different paths in the URL, based on their
unique context path.
With context path-based routing, you can also independently scale up or down those
portions of your big-A app that are being heavily utilized.
Figure 7-4. Routing using a single route and context paths to target a specific app
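Context paths are specified when creating or mapping a route. A sketch, using the hypothetical foo and bar apps from the example above:
$ cf create-route my-space mycf-domain.com --hostname myapp --path foo
$ cf map-route foo mycf-domain.com --hostname myapp --path foo
$ cf map-route bar mycf-domain.com --hostname myapp --path bar
# requests to myapp.mycf-domain.com/foo reach the foo app; /bar reaches the bar app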
Routing Components Overview
There are several components involved in the flow of ingress Cloud Foundry traffic.
We can broadly group these as follows:
• Routing tier (load balancer, GoRouter, TCPRouter)
• The control plane and user management (Cloud Controller and UAA)
• The app components (Cells and the SSH proxy)
Figure 7-5 provides a high-level view of the components.
Figure 7-5. Routing components and communication flow
Let’s take a closer look at each of these components:
Load balancer
All HTTP-based traffic first enters Cloud Foundry from an external load bal‐
ancer fronting Cloud Foundry. The load balancer is primarily used for traffic
routing to Cloud Foundry routers.
GoRouter
The GoRouter receives all incoming HTTP(S) traffic from the load balancer. The GoRouter also receives WebSocket requests and performs the HTTP-to-WebSocket upgrade to establish a persistent TCP connection to the backend.
TCPRouter
The TCPRouter receives all incoming (non-HTTP) TCP traffic from the load
balancer.
Cloud Controller and the UAA
Operators address the Cloud Controller through Cloud Foundry’s API. As part
of this flow, identity management is provided by the UAA.
Cells and the SSH proxy
App users target their desired apps via a dedicated hostname and/or domain combination. The GoRouter will route app traffic to the appropriate app instance (ActualLRP) running on a Diego Cell. If multiple app instances are running, the GoRouter will round-robin traffic across the app instances to distribute the workload. App users can also SSH to an app’s container running on a host Cell via the SSH proxy service.
Routing Flow
All traffic enters Cloud Foundry from an external load balancer. The load balancer
routes the traffic as follows:
• HTTP/HTTPS and WebSocket traffic to the GoRouter
• (Non-HTTP) TCP traffic to the TCPRouter
App traffic is routed from the routers to the required app. If you’re running multiple
app instances (ActualLRPs), the routers will load-balance the traffic across the run‐
ning ActualLRPs. If an app requires the user to authenticate, you can redirect
requests to the UAA’s login server. Upon authentication, the user is passed back to
the original app. The Cloud Controller provides an example of this behavior.
Platform users target the Cloud Controller. Requests come in from the load balancer
through the GoRouter and hit the CAPI. If the user has yet to log in, requests are
redirected to the UAA for authentication. Upon authentication, the user is redirected
back to the Cloud Controller.
Route-Mapping Flow
When you create a route in the routing table (either directly via Cloud Foundry map-
route command or indirectly via cf push), the Route-Emitter is listening to events
in Diego’s BBS and notices all newly created routes. It takes the route-mapping info
(of Cell host:port) and then dynamically updates the route mapping in the routing
table. Any additional changes—for example, a deleted or moved app—will also result
in the emitter updating the routing table. We discuss route mapping and the Route-
Emitter further in “Routing Table” on page 131.
Load Balancer Considerations
Although the choice of load balancer is yours to make, there are some specific con‐
siderations required:
• Setting the correct request header fields.
• Determining where to terminate SSL.
• Configuring the load balancer to handle HTTP upgrades to WebSockets. If these upgrade requests are currently being passed on to the TCPRouter, ideally you should avoid that and route WebSocket upgrades through the GoRouter instead.
Setting Request Header Fields
When a client connects to a web server through an HTTP proxy or load balancer, it is possible to identify the originating IP address and protocol by setting the X-Forwarded-For and X-Forwarded-Proto request header fields, respectively. These
headers must be set on the load balancer that fronts the traffic coming into Cloud
Foundry. HTTP traffic passed from the GoRouter to an app will include these head‐
ers. If an app wants to behave differently based on the transport protocol used, it can
inspect the headers to determine whether traffic was received over HTTP or HTTPS.
X-Forwarded-For
X-Forwarded-For (XFF) provides the IP address of the originating client request. For
example, an XFF request header for a client with an IP address of 203.0.56.67 would
be as follows:
X-Forwarded-For: 203.0.56.67
If you did not use XFF, connections through the router would reveal only the origi‐
nating IP address of the router itself, effectively turning the router into an anonymiz‐
ing service and making the detection and prevention of abusive access significantly
more difficult. The usefulness of XFF depends on the GoRouter truthfully reporting
the original host IP address. If your load balancer terminates TLS upstream from the
GoRouter, it must append these headers to the requests forwarded onto the
GoRouter.
X-Forwarded-Proto
X-Forwarded-Proto (XFP) identifies the client protocol (HTTP or HTTPS) used
from the client to connect to the load balancer. The scheme is HTTP if the client
made an insecure request, or HTTPS if the client made a secure request. For example,
an XFP for a request that originated from the client as an HTTPS request would be as
follows:
X-Forwarded-Proto: https
As with most client-server architectures, the GoRouter access logs contain only the
protocol used between the GoRouter and the load balancer; they do not contain the
protocol information used between the client and the load balancer. XFP allows the
router to determine this information. The load balancer stores the protocol used
between the client and the load balancer in the XFP request header and passes the
header along to the router.
XFP is important because you can configure apps to reject insecure requests by
inspecting the header for the HTTP scheme. This header is as important, or even
more so, for system components than for apps. The UAA, for example, will reject all
login attempts if this header is not set.
WebSocket Upgrades
WebSockets is a protocol providing bidirectional communication over a single, long-
lived TCP connection. It is commonly implemented by web clients and servers. Web‐
Sockets are initiated via HTTP as an upgrade request. The GoRouter supports
WebSocket upgrades, holding the TCP connection open with the selected app
instance.
Supporting WebSockets is important because the Firehose (the endpoint of all aggre‐
gated and streamed app logs) is a WebSockets endpoint that streams all event data
originating from a Cloud Foundry deployment. To support WebSockets, operators
must configure their load balancer to pass WebSockets requests through as opaque
TCP connections. WebSockets are also vital for app log streaming, allowing develop‐
ers to view their app logs.
Some load balancers are unable to support listening for both HTTP and TCP traffic
on the same port. Take, for example, ELB offered by AWS. ELB can listen on a port
in either HTTP(s) mode or TCP mode. To pass through a WebSocket request, ELB
must be in TCP mode. However, if ELB is terminating TLS requests on 443 and
appending the XFF and XFP headers, it must be in HTTP mode. Therefore, ELB can‐
not handle WebSockets on the same port. In this scenario, you can do the following:
• Configure your load balancer to listen for WebSocket requests on a nonstandard port (e.g., 8443) and then forward WebSocket requests to this port in TCP mode to the GoRouter on port 80 or 443. App clients must make WebSockets upgrade requests to this port.
• Add a second load balancer listening in TCP mode on standard port 80. Configure DNS with a dedicated hostname for use with WebSockets that resolves to the new load balancer serving port 80 in TCP mode.
The PROXY Protocol
As just described, WebSockets require a TCP connection; however, when using TCP
mode, load balancers will not add the XFF HTTP protocol headers, so you cannot
identify your clients. Another solution for client identification is to use the Proxy
protocol. This protocol allows your load balancer to add the Proxy protocol header so
that your apps can still identify your clients even when you use TCP mode at the load
balancer.
Another scenario is to terminate TLS with a component that does not support HTTP
and operates only in TCP mode. Therefore, an HTTP connection is then passed on to
the GoRouter.
A point to note is that some load balancers in TCP mode will not give you HTTP
multiplexing and pipelining. This could cause a performance problem unless you
have a content delivery network (CDN) in front.
TLS Termination and IPSec
Although Cloud Foundry is a distributed system, conceptually we can consider it as a
software appliance. It is designed to sit in a dedicated network with defined egress
and ingress firewall rules. For this reason, if the load balancer sits within or on the
edge of the private network, it can handle TLS decryption and then route traffic
unencrypted to the GoRouter.
However, if the load balancer is not dedicated to Cloud Foundry and it is located on a
general-purpose corporate network, it is possible to pass the TLS connection on to
the GoRouter for decryption. To implement this, you must configure your load bal‐
ancer to re-sign the request between the load balancer and the GoRouter using your
wildcard certificate. You will also need to configure the GoRouter with your Cloud
Foundry certificates.
There might be situations in which you require encryption directly back to the app
and data layer. For these scenarios, you can use the additional IPSec BOSH add-on
that provides encrypted traffic between every component machine.
GoRouter Considerations
The GoRouter serves HTTP(S) traffic only. HTTP(S) connections to apps from the
outside world are accepted only on ports 80 or 443. (Protocol upgrade requests for
WebSockets are also acceptable.)
All router logic is contained in a single process. This approach removes unnecessary
latency introduced through interprocess communication. Additionally, with full con‐
trol over every client connection, the router can more easily allow for connection
upgrades to WebSockets and other types of traffic (e.g., HTTP tunneling and proxy‐
ing via HTTP CONNECT).
Routing Table
The router uses a routing table to keep track of available apps. This table contains an
up-to-date list of all the routes to the Cells and containers that are currently running
ActualLRPs. As described earlier, you can map multiple routes to an app and map
multiple apps to a route. The routing table keeps track of this mapping. It provides
the source of truth for all routing, dynamically checking for and pruning dead routes
to avoid 404 errors.
Diego uses its Route-Emitter component to consume event streams from the Diego
Database—the BBS—and then pushes the route updates to the router. Additionally,
the Route-Emitter performs a bulk lookup operation against its database every 20
seconds to fetch all the desired and actual routes.
The GoRouter then recomputes a new routing table based on the IP addresses of each
Cell machine and the host-side port numbers for the Cell’s containers. This ensures
that the routing table information is up to date in the event that an app fails.
Router and Route High Availability
GoRouters should be clustered both for resiliency and for handling a large number of
concurrent client connections.
When GoRouters come on line, they send router.start messages informing Route-
Emitters that they are running. Route-Emitters are monitoring desired and actual
LRP events in the BBS to establish and map routes to app instances. They compute
the routing table and send this table to the GoRouter via NATS at regular intervals.
This ensures that new GoRouters update their routing table and synchronize with
existing GoRouters. Routes will be pruned from the routing table if an app connec‐
tion goes stale. To maintain an active route, the route must be updated by default at
least every two minutes.
An important implementation consideration for the GoRouter is that because it uses
NATS, it must be brought online after the NATS component in order to function
properly.
Router Instrumentation and Logging
Like the other Cloud Foundry components, the GoRouter provides logging through
its Metron agent. In addition, a /routes endpoint returns the entire routing table as
JSON. Because of the nature of the data present in /routes, the endpoint is served on port 8080 and requires HTTP basic authentication credentials. These credentials are obtained from the deployment manifest under the router job:
status:
  password: some_password
  port: 8080
  user: some_user
The credentials can also be obtained from the GoRouter VM at /var/vcap/jobs/
gorouter/config/gorouter.yml.
Each route contains an associated array of host:port entries, which is useful for
debugging:
$ curl -vvv "http://some_user:some_password@127.0.0.1:8080/routes"
In addition to the routing table endpoint, the GoRouter offers a healthcheck end‐
point on /health:
$ curl -v "http://10.0.32.15:8080/health"
This is particularly useful when performing healthchecks from a load balancer. This
endpoint does not require credentials and should be accessed at port 8080. Because
load balancers typically round-robin the GoRouters, by regularly checking the GoR‐
outer health, they can avoid sending traffic to GoRouters that are temporarily not
responding.
You can configure the GoRouter logging levels in the Cloud Foundry deployment
manifest. The meanings of the router’s log levels are as follows:
fatal
An error has occurred that makes the current request unserviceable; for example, the router cannot bind to its TCP port, or a Cloud Foundry component has published invalid data to the GoRouter.
warn
An unexpected state has occurred. For example, the GoRouter tried to publish
data that could not be encoded as JSON.
info, debug
An expected event has occurred. For example, a new Cloud Foundry component
was registered with the GoRouter, and the GoRouter has begun to prune routes
for stale containers.
Sticky Sessions
For compatible apps, the GoRouter supports sticky sessions (aka session affinity) for
incoming HTTP requests.
When multiple app instances are running, sticky sessions will cause requests from a
particular client to always reach the same app instance. This makes it possible for
apps to store session data specific to a user session. Generally, this approach is not
good practice; however, for some select pieces of data such as discrete and lightweight
user information, it can be a pragmatic approach.
Sticky Sessions
A single app can have several instances running concurrently.
Functional use of the local filesystem is limited to local caching
because filesystems provided to apps are ephemeral unless you use
a filesystem service. By default, changes to the filesystem are not
preserved between app restarts, nor are they synchronized or
shared between multiple app instances.
This means Cloud Foundry does not natively maintain or replicate
HTTP session data across app instances, and all cached session
data will be discarded if the app instance hosting the sticky session
is terminated. If you require session data to be saved, it must be
offloaded to a backing service that offers data persistence.
To support sticky sessions, apps must return a JSESSIONID cookie in their respon‐
ses.
If an app returns a JSESSIONID cookie to a client request, the GoRouter appends an
additional VCAP_ID cookie to the response, which contains a unique identifier for
the app instance. On subsequent client requests, the client provides both the JSES‐
SIONID and VCAP_ID cookies, allowing the GoRouter to forward client requests
back to the same app instance.
If the app instance identified by the VCAP_ID is no longer available, the GoRouter
attempts to route the request to a different instance of the app. If the GoRouter finds
a healthy instance of the app, it initiates a new sticky session.
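You can observe this behavior directly by inspecting the response cookies. The
following is an illustrative sketch only; the hostname is an arbitrary example, and the
exact name of the GoRouter's affinity cookie can vary between releases:
$ curl -v https://myapp.mycf-domain.com/ 2>&1 | grep -i 'set-cookie'
# look for the app's JSESSIONID cookie plus the additional affinity cookie
# appended by the GoRouter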
The TCPRouter
Support for non-HTTP workloads on Cloud Foundry is provided by the TCPRouter.
The TCPRouter allows operators to offer TCP routes to app developers based on res‐
ervable ports.
When pushing an app mapped to a TCP route:
$ cf push myapp -d tcp.mycf-domain.com --random-route
the response from the Cloud Controller includes a port associated with the TCP
route. Client requests to these ports will be routed to apps running on Cloud Foun‐
dry through a layer 4 protocol-agnostic routing tier.
Both HTTP and TCP routes will be directed to the same app port, identified by envi‐
ronment variable $PORT.
The developer experience for TCP routing is similar to previous routing-related
workflows. For example, the developer begins by discovering a domain that supports
TCP routing by running cf domains, which shows whether a specific domain has been
enabled for TCP routing. Enabling a domain for TCP routing involves setting up DNS
for that domain to point to the load balancers, which in turn point to the TCPRouters.
The discovered domain gives a developer an indication that requests for routes cre‐
ated from that domain will be forwarded to apps mapped to that route. It also pro‐
vides a namespace allowing operators to control access for one domain or another.
After you choose your domain, you can then create a route from that domain via the
usual cf create-route and cf map-route commands. However, the cf push experience is
streamlined because the appropriate route will be configured for the app simply by
selecting a TCP domain. For example, to create a TCP route for the app myapp using
the domain tcp-example-domain.com, you can run the following:
$ cf push myapp -d tcp-example-domain.com --random-route
TCP routes are different from HTTP routes because they do not use hostnames;
instead, routing decisions are based on a port. For each TCP route, we reserve a port
on the TCPRouter. This requires clients of apps that receive TCP app traffic to sup‐
port these arbitrary ports.
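If you prefer an explicit port rather than a random one, you can create and map the
TCP route yourself. This is a hedged sketch; the space name, domain, and port are
arbitrary examples, and the chosen port must fall within the platform's reserved port
range:
$ cf create-route my-space tcp-example-domain.com --port 60001
$ cf map-route myapp tcp-example-domain.com --port 60001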
TCP Routing Management Plane
The TCP routing management plane (see Figure 7-5) has similar functionality to the
HTTP routing management plane. There is a Route-Emitter listening to events in
Diego’s BBS. For example, whenever a new app is created or an app is moved or
scaled, the BBS is updated.
The emitter detects these events, constructs the routing table, and then sends this
table on to the routing API. The routing API effectively replaces the need for NATS;
it maintains the routing table and then makes the configuration available across a tier
of TCPRouter instances. Therefore, TCPRouters receive their configuration from the
routing API and not via NATS. Both the TCPRouter and the Route-Emitter receive
their configuration from both periodic bulk fetches and real-time server-sent events.
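If you want to see what the routing API holds, you can query it directly. This is only a
rough sketch under several assumptions: the internal routing API address and port, the
availability of a token with the routing.routes.read scope, and the
/routing/v1/tcp_routes endpoint are all environment- and version-dependent:
$ export TOKEN="$(cf oauth-token)"
$ curl -H "Authorization: ${TOKEN}" \
    "http://routing-api.service.cf.internal:3000/routing/v1/tcp_routes"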
TCP routing introduces some complexity through additional NAT involving differ‐
ent ports at different tiers, as illustrated in Figure 7-6. There is a route port to which
clients send requests. This port is reserved on the TCPRouter when you create a TCP
route. Behind the scenes, the TCPRouter makes a translation between that route port
and the app instances. Containers include app ports (that default to 8080). These
ports are not directly accessible via the TCPRouter, because containers are running in
a Cell providing an additional NAT for the container. Therefore, the ports made
known to the TCPRouter are the Cell ports (the backend port).
Figure 7-6. TCPRouter port mappings from the load balancer through to the app
TCPRouter Configuration Steps
Here are the deployment steps required for configuring TCP routing:
1. Choose a domain name from which developers will create their TCP routes.
2. Configure DNS to point to that domain name via the load balancer.
3. Choose how many TCP routes the platform should support based on the
reserved ports on the TCPRouter.
4. Configure the load balancer to listen on the port range and then forward
requests for that port range and domain to the TCPRouters. (Make sure your port
range accounts for sufficient capacity, because every TCP connection requires a
dedicated port from your reserved port range.)
5. Configure the TCPRouter group within Cloud Foundry with the same port
range.
6. Create a domain associated with that TCPRouter group.
7. Configure a quota to entitle Orgs to create TCP routes.
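Steps 6 and 7 map to cf CLI commands along the following lines. This is a hedged
sketch; the domain, router group, quota name, and port count are illustrative, and the
flag names reflect the cf CLI at the time of writing:
$ cf create-shared-domain tcp.mycf-domain.com --router-group default-tcp
$ cf update-quota default --reserved-route-ports 50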
Route Services
Apps often have additional requirements over and above traditional middleware
components such as databases, caches, and message buses. These additional
requirements include tasks such as authentication, special firewalling, or rate limiting.
Traditionally, these burdens have been placed on the developer and
app operator to build additional (nonbusiness) capabilities into the app or directly
use and configure some other external capability such as an edge caching appliance.
The route services capability makes it possible for developers to select a specific route
service from the marketplace (in a similar fashion to middleware services) and insert
that service into the app request path. They offer a new point of integration and a
new class of service.
As seen in Figure 7-7, route services give you the ability to dynamically insert a com‐
ponent (in the case of Figure 7-7, Apigee) into the network path as traffic flows to
apps. Traditionally, developers had to file a ticket to get a new load balancer configu‐
ration for additional firewall settings, or IT had to manually insert additional net‐
work components for things like rate limiting. Route services now offer these
additional capabilities dynamically via integration with the router.
Figure 7-7. The Route Service showing the path of application user traffic to an app,
accessing an API Proxy Service (in this case provided by Apigee)
Unlike middleware marketplace services that are bound to an app, route services are
bound to a route for a specific app. New requests to that app can then be modified via
a route service. Just like the middleware services in the marketplace, the route service
might not necessarily be Cloud Foundry, or BOSH–deployed and managed. For
example, a route service could be the following:
• An app running on Cloud Foundry
• A BOSH-, Puppet-, or Chef-deployed component
• Some other external enterprise service provided by a third party such as Apigee
Route Service Workflow
All requests arrive (1) via the external load balancer, which passes traffic on to the
GoRouter (2). The GoRouter checks for a bound route service for a route, and if no
service exists, it will simply pass traffic on to the appropriate LRP (3). If a route ser‐
vice does exist for the route, the router will then pass that traffic on to the service that
is bound to the route.
Before passing the request on to the service, the router generates an encrypted short-
lived message to include both the requested route and the route service GUID. The
router then appends this message to the request header and forwards the request to
the bound route service. After the specific route service has undertaken its work (e.g.,
header modification or rate limiting) the service can do one of two things:
• Respond directly to the request (e.g., serve an access-denied message if acting as
an app firewall)
• Pass the traffic on to the app
The route service passes traffic to the app (4) by resolving via DNS back to the load
balancer (5), then to the router (6), and then to the app. The response traffic then
follows that same flow backward to return a response to the client (7/8/9/10/11/12).
This allows the service to do further modification on the returned response body if
required. Figure 7-8 provides an overview of the architecture.
Figure 7-8. Route service workflow showing the redirection of app traffic to the route
service before being directed back to the app via the load balancer (note that the return
flow retraces the same path in reverse)
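From a developer's perspective, attaching a route service is a two-step operation:
register the service (here as a user-provided service) and then bind it to a route. The
service name, route service URL, domain, and hostname below are all arbitrary
examples:
$ cf create-user-provided-service my-rate-limiter \
    -r https://rate-limiter.mycf-domain.com
$ cf bind-route-service mycf-domain.com my-rate-limiter --hostname myapp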
Route Service Use Cases
You can consider any use cases that can be on the request path as eligible for a route
service. Here are some examples:
• Gateway use cases such as rate limiting, metering, and caching
• Security use cases such as authentication, authorization, auditing, fraud detection,
and network sniffing
• Analytics use cases such as monetization, chargeback, and utilization
• Mobile backend as a service (MBaaS) use cases such as push notifications and data
services
In line with the rest of Cloud Foundry, the key goal of routing services is to increase
app velocity. Without the developer self-service that route services provide, most
organizations are left with the pain of ticketing systems and extra configuration to
achieve these use-case capabilities.
Summary
The core premise of Cloud Foundry is to allow apps to be deployed with velocity and
operated with ease. The routing abstractions and mechanisms within Cloud Foundry
have been designed and implemented to support that premise:
• Routing is an integral part of deploying and operating apps.
• Cloud Foundry provides a rich set of abstractions and mechanisms for supporting
fast deployment, rolling upgrades, and other complicated routing requirements.
• Establishing the most appropriate routing architecture is essential for app security,
resiliency, and updatability.
• With the introduction of the TCPRouter and additional route services, the platform
can take on more diverse workloads with broader, more granular routing
requirements.
CHAPTER 8
Containers, Containers, Containers
Containers, as a concept, are not a new technology. However, in recent years, there has
been rapid adoption of new container-based technologies such as Docker, Garden,
and Rocket. Many organizations regard containers as a key enabler for adopting tech‐
nologies such as microservices-based architectures or continuous delivery pipelines.
Therefore, containers have become a critical part of the digital transformation strat‐
egy of most companies.
Some companies I work with establish a mandate to adopt containers but cannot
articulate a specific use case or problem that would be solved by adopting containers.
Others believe containers will help them deploy apps quicker but cannot explain why,
or they believe containers will provide better utilization but have not profiled their
existing infrastructure utilization. The use of container technology absolutely can
provide significant benefits, but it is essential to understand how those benefits are
achieved in order to derive the most from the technology.
The Meaning of “Container”
Like the term “platform,” the term container is also overloaded.
This chapter is not about traditional app server containers such as
Tomcat; it is about OS–level containers such as runC.
What Is a Container?
Despite their huge popularity, there still are a lot of misconceptions about containers.
Principally, in the Linux world at least, containers are not a literal entity; they are a
logical construct. Strictly speaking, a container is nothing more than a controlled user
process. Container technology typically takes advantage of OS kernel features to con‐
strain and control various resources, and isolate and secure the relevant container‐
ized processes. In addition, the term “container” conflates various concepts, which
adds to the confusion.
Containers have two key elements:
Container Images
These package a repeatable runtime environment (encapsulating your app and
all of its dependencies, including the filesystem) in a format that is self-
describing and portable, allowing for images to be moved between container
hosts. The self-describing nature of container images specifies instructions on
how they should be run, but they are not explicitly self-executable, meaning that
they cannot run without a container management solution and container run‐
time. However, regardless of the container contents, any compliant container
runtime should be able to run the container image without requiring extra
dependencies.
Container management
Often referred to as container engine, the management layer typically uses OS
kernel features (e.g., Linux kernel primitives such as control groups and name‐
spaces) to run a container image in isolation, often within a shared kernel space.
Container-management engines typically expose a management API for user
interaction and utilize a backend container runtime such as runC or runV that is
responsible for building and running an isolated containerized process.
Container Terminology
The challenge in discussing containers is that implementation ter‐
minology can mean different things to different people, and it can
be implementation-specific. I have tried to be as generic as possible
in my description, but it is important to note that different terms
like engine or backend can have meanings relating to a specific
technology implementation.
Container images can be created through a concept of containerization: the notion of
packaging up a filesystem, runtime dependencies, and any other required technology
artifacts to produce a single encapsulated binary image. You can then port these
images around and run them in different container backends via the container API/
management layer. The container backend implementation is host-specific; for
example, the term Linux container is a reference to the Linux technology (originally
based on LXC) for running containerized images. Currently, Linux containers are by
far the most widely adopted container technology. For this reason, the rest of this
chapter focuses on Linux containers to explain the fundamental container concepts.
Container Fervor
Why have containers become so popular so quickly? Containers offer three distinct
advantages over traditional VMs:
• Speed and efficiency, because they are lightning fast to create
• Greater resource consolidation, especially if you overcommit resources
• App stack portability
Because containers use a slice of a prebuilt machine or VM (the terms "VM" and
"machine" are used interchangeably here because containers can run in both
environments), they are generally regarded as significantly faster to create than a new
VM. Effectively, to create a new container, you simply fork and isolate a process.
Containers also allow for a greater degree of resource consolidation because you can
run several container instances in isolation on a single host machine, all managed by a
single OS kernel.
In addition, containers have enabled a new era of app-stack portability because apps
and dependencies developed to run in a container backend can easily be packaged
into a single container image (usually containing a tarball with some additional meta‐
data). You can then deploy and run container images in several different environ‐
ments. Container images make it easy to efficiently ship deltas, and therefore moving
whole images between different host machines becomes practical. App-stack porta‐
bility is one of the key reasons why containers have become so popular. Container
images have become a key enabler for trends such as DevOps and CD by enabling
both the app artifacts and all of the related runtime dependencies to migrate
unchanged, as a layered binary image, through a CI pipeline into production. This
has provided a unified approach to delivering software into production as opposed to
the old and defunct “it worked on my machine” approach. Chapter 9 discusses this
unified approach and its various merits in further detail.
Increased deployment efficiency becomes paramount when deploying apps using a
microservices architecture because there are more moving parts and more overall
churn. Container-based infrastructure has therefore become a natural choice for
teams using a microservices architecture.
Containers, as with all technology, are a means to an end. It is not technology itself
that is important; it is how you take advantage of it that is key. For example, we dis‐
cussed using container images to propagate apps through a pipeline and into produc‐
tion. However, the pipeline itself can also effectively use containers. For example, the
Concourse CI controls pipeline inputs so that results are always repeatable. Rather
than sharing state, every task runs in its own container, thus controlling its own
dependencies. Containers are created and then destroyed every time a task is run,
ensuring repeatability by avoiding build “pollution.” Concourse’s use of containers is
a perfect example of how we can use container technology to provide a clear tangible
benefit over more traditional approaches that experience polluted build-up due to a
pattern of VM reuse.
Linux Containers
Linux containers provide a way of efficiently running an isolated user process. As just
discussed, strictly speaking, Linux containers do not exist in a purely literal sense:
there are no specific Linux kernel container features. Existing kernel features are
combined together to produce the behavior associated with containers, but Linux
containers themselves remain a high-level abstract concept.
The essence of container abstraction is to run processes in an isolated environment.
Linux container technologies use lower Linux primitives and kernel features to pro‐
duce container attributes such as the required process isolation. Other Unix OSs
implement containers at the OS kernel level; for example, BSD Jails or Solaris Zones.
How do containers differ from VMs? There are two key elements:
• Where processes are run
• What is actually run
Although the container backend (the runtime) can, technically, be backed by a VM,
traditionally speaking, containers are fundamentally different in concept. A VM vir‐
tualizes the entire machine and then runs the kernel and device drivers that are then
isolated to that VM. This approach provides superb isolation; however, historically,
at least, VMs have been considered relatively slow and expensive to create. Contain‐
ers, on the other hand, all share the same kernel within a host machine, with isolation
achieved by using various kernel features to secure processes from one another. Cre‐
ating a container amounts to forking a process within an existing host machine. This
is orders of magnitude faster than instantiating a traditional VM.
Containers versus VMs
The container versus VM debate is blurring in many ways. Special‐
ized, minimal, single-address-space machine images known as uni‐
kernels allow for fast VM instantiation. You can replace container
backends like runC with VM equivalents such as runV. At the end
of the day, the important concern is not what runs your container‐
ized process, but that your process is being run with the appropri‐
ate isolation guarantees and resource constraints.
The core Linux primitives and kernel features that produce container attributes
include the following:
• Namespaces to enforce isolation between containers
• Control groups to enforce resource sharing between containers
We look at both of these kernel features in more detail a little later in the chapter. It is
worth keeping in mind that the typical Cloud Foundry developer or operator does
not require a deep understanding of containers, because the Cloud Foundry platform
handles the container creation and orchestration concerns. However, gaining a
deeper understanding is valuable for both the curious and the security-minded oper‐
ator.
Namespaces
Namespaces provide isolation. They offer a way of splitting kernel resources by wrap‐
ping global system resources in an abstraction. The process within a namespace sees
only a subset (its own set) of those kernel resources. This produces the appearance of
the namespaced process having its own isolated instance of the global resource.
Changes to a resource governed by a namespace are visible only to other processes
that are members of that namespace; they cannot be seen by processes outside of the
namespace.
A process can be placed into the following namespaces:
PID
These processes view other processes running inside the same (or a child) name‐
space.
network
These processes have their own isolated view of the network stack.
cgroup
These processes have a virtualized view of their CGroups root directories.
mount
These processes have their own view of the mount table; mounts and unmounts
do not affect processes in other namespaces.
uts
These processes have their own hostname and domain name.
ipc
These processes can communicate only with other processes within the same
namespace via system-level IPCs (interprocess communication).
user
These processes have their own user and group IDs. Host users and groups are
remapped to local users and groups within the user namespace.
Take, for example, the PID namespace. Processes are always structured as a tree with
a single root parent process. A Linux host uses PID 1 for the root process. All the
other processes are eventually parented (or grandparented) by PID 1.
A container’s first process—which might have a PID 123 in the host—will appear as
PID 1 within the container. All other processes in the container are similarly map‐
ped. Processes that are not in the container’s PID namespace do not appear to exist to
the namespaced process. Effectively, both the namespaced process and the host have
different views of the same PID. It is the kernel that provides this mapping.
Upon container creation, a clone of the container process is created within the newly
created namespace. This cloned process becomes PID 1 in the newly created name‐
space. If that process then makes a kernel call asking “what is my Process ID,” the ker‐
nel does the mapping for the process transparently. The process is unaware it is
running within a PID namespace.
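You can demonstrate this outside of Cloud Foundry with the unshare utility from
util-linux. This is an illustrative sketch only and requires appropriate privileges on the
host:
$ sudo unshare --pid --fork --mount-proc /bin/bash
# inside the new PID namespace, the shell believes it is PID 1:
$ ps -ef
# from another terminal on the host, the same shell appears under its real,
# much higher PID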
As another example, take user namespaces. The host views a user namespaced pro‐
cess with, for example, UID 4000. With user namespaces, the process can ask “what is
my User ID”? If the process is running as root user, the namespaced response from
the kernel will be UID 0 (root). However, the kernel has explicitly mapped the name‐
spaced process such that the process thinks it is root. The host still knows the process
as, for example, UID 4000. If the process attempts to open a file owned by the host's
root user, the kernel maps the namespaced process back to UID 4000, checks that UID
against the host filesystem (where the file is owned by UID 0), and correctly denies the
process access to that host file because of invalid permissions.
The preceding is just an illustration. With the abstraction of con‐
tainers in Cloud Foundry, host files are not even visible to try to
open them. For other container scenarios outside of Cloud Foun‐
dry, there can be, however, some value in joining only a subset of
namespaces. For example, you might join the same network name‐
space to another process but not the mount namespace. This would
allow you to achieve an independent container image that shares
just the network of another container. This approach is known as a
sidecar container.
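The user-namespace remapping described above can also be demonstrated with
unshare; again, this is purely illustrative and depends on the host allowing
unprivileged user namespaces:
$ unshare --user --map-root-user /bin/bash
# inside the namespace, the process believes it is root:
$ id
# from the host, the same process is still owned by the original unprivileged user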
Security through namespaces
When running processes in a dedicated VM, the responsibility for sharing physical
resources is pushed down to the hypervisor. Containers that run on a single host are
at the mercy of the kernel for security concerns, not an isolated hypervisor. Even
though container isolation is achieved through namespaces, the namespaces still
share the underlying resource.
Cloud Foundry provides multilayered security for containers. Principally, Garden—
Cloud Foundry’s container creation and management API—does not allow processes
to run as root (the host’s root). Garden can run two types of containers:
• Privileged containers, which have some root privileges (useful for testing Garden
itself)
• Unprivileged containers, which are secured as much as possible; for example,
processes run as a pseudo root user, not the host root
For tighter security, Cloud Foundry recommends that everything be run in unprivi‐
leged containers. A buildpack-built app will never run as root; it will always run as
the Cloud Foundry created user vcap. In addition, for buildpack-built apps, Cloud
Foundry uses a trusted secure rootfs provided by the platform. The Cloud Foundry
engineering team has hardened the rootfs to remove exploits that could enable a user
to become root. Building containers from the same known and trusted rootfs
becomes a powerful tool for release engineering, as discussed in “Why BOSH?” on
page 170.
Isolating the use of the root user is not unique to containers; this is generally how
multiuser systems are secured. The next layer of security is containerization itself.
The act of containerization uses Linux namespaces to ensure each container has its
own view of system resources. For example, when it comes to protecting app data,
each container has its own view of its filesystem with no visibility of other containers’
files.
For Linux, conceptually, the kernel is unaware that a process is running in a con‐
tainer. (Remember, there is no actual container, only an isolated process.) Without
the use of namespaces, any non-namespaced kernel call that provides access to a host
resource could allow one process to directly affect another. Namespaces therefore
provide additional security for processes running in a shared kernel.
A point to be aware of is that Docker containers can run as root and, therefore, if not
mitigated, could potentially compromise the underlying host. To address this vulner‐
ability, Cloud Foundry uses a namespaced root for Docker images, so a Docker
container still cannot read arbitrary host memory. Assuming that PID and user
namespaces are in place, a containerized process should never be able to read another
process's RAM.
Data Locality
Data is generally regarded to be the principal attack surface as
opposed to simply another app process. Therefore, it is critical to
maintain data isolation between processes. Container technology
uses namespaces to isolate system resources such as filesystem
access. Arguably, containers are still less secure than a dedicated
VM because VMs are not sharing memory at the OS level. The
only way to access another process’s isolated memory would be to
break out of the VM on to the host hypervisor. With containers, if
a file descriptor is left open, because container processes reside in
the same OS, that file descriptor becomes more exposed. For the
Cloud Foundry model, this should not be a major issue, because,
generally speaking, data is not stored on the local filesystem but
rather in a backing service.
CGroups
Namespaces provide containers with an isolated view of the host; however, they are
not responsible for enforcing resource allocation between containers. A Linux kernel
feature known as control groups (CGroups) is used for resource control, accounting,
and isolation (CPU, memory, disk I/O, network, etc.). CGroups enforce fair sharing
of system resources such as CPU and RAM between processes. You can apply
CGroups to a collection of processes; in our case, a namespaced set of container pro‐
cesses.
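As an illustration of the underlying mechanism (not how Garden configures it), you
can create a CGroup by hand on a host with the cgroup v1 hierarchy mounted at
/sys/fs/cgroup; on newer hosts using the unified cgroup v2 hierarchy, the filenames
differ:
$ sudo mkdir /sys/fs/cgroup/memory/demo
# limit the group to 256 MB of memory:
$ echo $((256 * 1024 * 1024)) | sudo tee /sys/fs/cgroup/memory/demo/memory.limit_in_bytes
# place the current shell (and its children) into the group:
$ echo $$ | sudo tee /sys/fs/cgroup/memory/demo/cgroup.procs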
Disk Quotas
Resource limits (rlimits) define things such as how many files can be opened or how
many processes can be run. However, the Linux kernel does not provide any way to
limit disk usage; therefore, disk quotas have been established. Disk quotas were
originally based on user IDs. However, because a process in a Docker container can
run as root, that process could create a new user to get around disk quotas, given that
every new user is provisioned with its own fresh quota. This loophole allows disk usage
to keep growing. As a result, today disk quotas tend to use a layered (copy-on-write)
filesystem for Linux. This allows Linux to scale for the storage that will be available,
with the ability to administer and manage the filesystem with a clean interface and a
clear view of what is being used.
Filesystems
The filesystem image format comes from either a preexisting container image (e.g., if
you use Docker) or it will be Cloud Foundry–created based on available stemcells.
Cloud Foundry–created containers are known as trusted containers because of their
use of a hardened rootfs. Trusted containers use a single-layer filesystem known as a
stack. The stack works in combination with a buildpack to support apps running in
containers.
A stack is a prebuilt root filesystem (rootfs). Stacks support a spe‐
cific OS; for example, Linux-based filesystems require /usr
and /bin directories at their root. Cloud Foundry app machines
(Diego Cells) can support multiple stacks.
Upon creation, every trusted container uses the same base image (or one of a small
set of base images). The container manager in Cloud Foundry, Garden, can create
multiple containers from its base images. Cloud Foundry uses a layered filesystem. If
every newly created container required a new filesystem, around 700 MB would have
to be copied from the base image. This would take multiple seconds or even minutes,
even on a fast solid-state drive (SSD), and would result in unnecessary wasted storage.
The layered approach means that a container filesystem can be instantiated almost
instantly, because each container starts with a read-only view of the shared base
filesystem.
Here’s how it works:
1. On every Cell, there resides a tarball containing the container’s rootfs.
2. When a Cell starts, it untars the rootfs (for argument’s sake, to /var/rootfs).
3. When Garden creates a new container, it takes in a parameter called rootfs (pass‐
ing in /var/rootfs) and imports the contents of that directory into a layered
filesystem graph.
4. For this new container and only this container, Garden then makes a layer on top
of this rootfs; this layer is the resulting filesystem for this container.
5. When a second container is created, Garden recognizes that the base layer is
already in place and creates a sibling layer (a second layer) on top of the base rootfs
for the new container.
6. The host has a tree structure consisting of a single base rootfs for all containers,
with each container having a layer on top of that base. All containers are there‐
fore siblings of one another.
7. On a write, the container copies that write to a layer above the base image.
8. When the container does a read, it first checks the top level and then goes down
for the read-only content.
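The copy-on-write behavior in steps 4 through 8 can be illustrated with the kernel's
overlay filesystem. Garden's actual implementation has changed over time, so this is
only a sketch of the general idea, with arbitrary paths:
$ mkdir -p /tmp/upper /tmp/work /var/container-rootfs
$ sudo mount -t overlay overlay \
    -o lowerdir=/var/rootfs,upperdir=/tmp/upper,workdir=/tmp/work \
    /var/container-rootfs
# reads fall through to the shared, read-only /var/rootfs layer;
# writes are copied up into /tmp/upper, leaving the base untouched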
Untrusted containers (containers such as Docker that can run as a pseudo root) also
often use a layered filesystem. A Docker container image is slightly more complex
than just a single filesystem. Docker images are constructed from a Dockerfile, a
script containing a set of instructions for building a Docker image (similar in spirit to
a Vagrantfile used for building VMs). Every line in the script becomes a layer stored
upon the previous layer. The Docker image can then build up multiple filesystem lay‐
ers on top of one another. In addition to the layered filesystem, Docker images usu‐
ally also contain metadata such as environment variables and entry points. Cloud
Foundry must apply a quota to the rootfs to stop users from pushing containers with,
for example, 100 GB of MapReduce data.
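To see how each Dockerfile instruction becomes a stored layer, you can inspect an
image's layer history (the image name here is only an example):
$ docker history ubuntu:16.04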
Container Implementation in Cloud Foundry
There is currently a degree of confusion and misinformation in the marketplace with
respect to containers. This is largely because, as described earlier in “What Is a Con‐
tainer?” on page 141, the term “container” conflates various concepts. Moreover, the
terminology used to describe container concepts tends to differ based on specific
implementations. For example, when people refer to running containers, what they
are really describing is the running of containers plus a number of other things such
as package management, distribution, networking, and container orchestration. Con‐
tainers are of limited value if considered in isolation because there are additional con‐
cerns surrounding them that must be addressed in order to run production
workloads at scale; for example, container orchestration, clustering, resiliency, and
security.
Cloud Foundry’s container manager, Garden, runs container processes using the
runC backend runtime, which is a CLI tool for spawning and running containers.
When Cloud Foundry uses the Garden API to make a container (running a process
within runC), a couple of additional things happen:
File/volume system management
Containers require a filesystem. Whether that comes from Cloud Foundry’s
rootfs or a Docker image on Docker Hub or some other Docker registry, broadly
speaking, the container engine sets up a volume on disk that a container can use
as its root filesystem. This mechanism provides filesystem isolation. The filesys‐
tem is a path on a disk and Garden imperatively tells the container, “make this
filesystem path the root filesystem of the container.”
Networking
runC has no opinions about the network. The runC API allows Garden to specify
a container that should be run within a network namespace. Therefore, Cloud
Foundry provides additional code that will set up interfaces and assign IPs for
each container so that it can provide each container with its own IP in order for
it to be addressable.
Why Garden?
Garden offers some key advantages when used with Cloud Foundry. First, because
Garden uses runC for the container runtime, it allows both Docker images and
buildpack-staged containers to be run in Cloud Foundry. Therefore, the use of Gar‐
den does not preclude the use of Docker images.
The primary reason why Garden is the right choice for Cloud Foundry is that good
architecture (especially, complex distributed architecture) must support change. Gar‐
den provides a platform-neutral, lightweight container abstraction so that it can be
backed by multiple backends. Currently, Cloud Foundry Garden supports both a
Linux backend and a Windows backend. This allows Cloud Foundry to support a
wider array of apps. For example, Windows-based .NET apps can run on Cloud
Foundry along with other apps running in Linux containers such as .NET Core, Java,
Go, and Ruby.
The Garden API contains a set of interfaces that each platform-specific backend must
implement. These interfaces contain methods to perform the following actions:
• Create/delete containers
• Apply resource limits to containers
• Open and attach network ports to containers
• Copy files to and from containers
• Run processes within containers, streaming back stdout and stderr data
• Annotate containers with arbitrary metadata
• Snapshot containers for zero-downtime redeploys
In Diego, the Garden API is currently implemented by the following:
• Garden-runC (Garden backed by runC), which provides a Linux-specific
implementation of the Garden interface.
• Garden-Windows, which provides a Windows-specific implementation of the
Garden interface.
OCI and runC
As discussed earlier in “Linux Containers” on page 144, Linux achieves container-like
behavior through isolation, resource sharing, and resource limits. The first well-
known Linux container implementation was LXC. Both Docker and Cloud Foundry’s
first container manager, Warden, originally used LXC and then built container man‐
agement capability on top. Both Docker and Cloud Foundry have since moved to
employing runC as the technology that spawns and runs the container process.
runC is the reference implementation of the runtime specification defined by the
Open Container Initiative (OCI), which originated as the Open Container Project
(OCP). The OCI is an open-governance structure for creating open industry standards
around container formats and runtimes.
OCI has standardized backend formats and provided the runC reference implemen‐
tation, which many higher-level systems have now adopted (e.g., Garden and
Docker). runC was established primarily through Docker extracting its non-Docker-
specific container creation library (libcontainer) for reuse.
Container implementation should not be a concern to the developer, because it is just
an implementation detail. Developers are more productive by focusing on their app,
and Cloud Foundry makes it possible for them to keep their focus on the higher level
of abstraction of apps and tasks. So, with this in mind, why focus on runC, which is a
specific implementation? Why is runC important?
The answer is unification for the good of all! Container fervor is exploding. Various
technologies (such as VMs and containers) are, in some cases, blending together, and
in other cases (such as container orchestration), pulling in opposite directions. In this
complex, emerging, and fast-moving space, standards serve to unite the common
effort around a single goal while leaving enough room for differentiation, depending
on the use case. This promise of containers as a source of lightweight app portability
requires the establishment of certain standards around both format and runtime.
These standards must do the following:
• Not be bound to a specific product, project, or commercial vendor
• Be portable across a wide variety of technologies and IaaS (including OSs,
hardware, CPU architectures, etc.)
The OCI specification defines a bundle: a directory containing a root filesystem plus
a JSON configuration that describes how the container should be run, stating the
following:
• It should have this path as the rootfs.
• It should run this process, within these namespaces.
It is the bundle that should become truly portable. Users should be able to push this
OCI bundle to any number of OCI-compatible container environments and expect
the container to always run as designed. The debate on the use of a specific container
technology such as Garden versus Docker Engine is less important because both of
these technologies support the same image format and use runC. These technologies
are just an implementation detail that a platform user never sees nor need ever be
concerned about. Most container orchestration tools are increasingly agnostic about
what actually runs the container backend. For example, if you want to swap out runC
for runV—a hypervisor-based runtime for OCI—this should not affect the running
of your OCI container bundle.
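As a concrete illustration, the following sketch creates and runs an OCI bundle with
runC. It assumes runC is installed and that a root filesystem has already been extracted
into the bundle's rootfs directory; the paths and container name are arbitrary:
$ mkdir -p /tmp/mybundle/rootfs
# populate rootfs/, for example by exporting an existing container image into it
$ cd /tmp/mybundle
$ runc spec          # writes a default config.json describing the bundle
$ sudo runc run demo # runs the containerized process defined in config.json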
Container Scale
Cloud Foundry is well suited for apps that can scale out horizontally via process-
based scaling. Cloud Foundry can also accommodate apps that need to scale verti‐
cally to some extent (such as increased memory or disk size); however, this limit is
bound by the size of your containers and ultimately the size of your container host.
One app should not consume all of the available RAM or disk space on a host
machine. Generally speaking, apps that scale memory vertically do so because they
hold some state data in memory. Apps that scale disk space vertically do so because
they write data to a local disk. As a best practice, you should avoid holding excessive
state in memory or writing user data to local disk, or you should look to minimize
these and offload them to a dedicated backing service such as a cache or database
wherever possible.
Container Technologies (and the Orchestration Challenge)
All mainstream container technologies allow you to deploy container images (at the
moment, generally in Docker image format, although other standards exist) in a
fairly similar way. Therefore, as mentioned earlier, the key concern is not the stand‐
ardized backend implementation, but the user experience of the container technol‐
ogy.
You should not run containers in isolation. Low-level container technology alone is
insufficient for dealing with scale and production environment concerns. Here are a
few examples:
• If your container dies (along with your production app), who will notice, and
how will it be restarted?
• How can you ensure that your multiple container instances will be equally
distributed across your VMs and AZs?
• How will you set up and isolate your networks on demand to limit access to
specific services?
For these reasons, container orchestration, container clustering, and various other
container technologies have been established. As a side effect of the explosive growth
in the container ecosystem over recent years, there is a growing list of container
orchestration tools including Kubernetes, Docker Swarm, Amazon EC2 Container
Service, and Apache Mesos. In addition, some companies still invest in in-house solu‐
tions for container orchestration.
Container orchestration is a vital requirement for running containers at scale.
Orchestration tools handle the spawning and management of the various container
processes within a distributed system. Container orchestration manages the con‐