Csmo Field Guide

Download the current version of the
IBM Cloud Service Management &
Operations Field Guide
https://ibm.biz/csmo-field-guide

© Copyright International Business Machines Corporation 2018, 2019. US
Government Users Restricted Rights - Use, duplication or disclosure restricted
by GSA ADP Schedule Contract with IBM Corp.

Cloud service management &
operations
Most people who say they are doing DevOps are doing
mostly dev and very little ops. Cloud service management
and operations is about designing, implementing, and
continuously improving the operations management
processes you use in your enterprise. Cloud service
management and operations is organized into personas
who do the work, processes that define what work is
needed and how it is performed, and tools to enable and
support these activities.
KEEP THE OPS IN DEVOPS

Enable agile for operations. Implement agile and
continuous delivery practices for operations in the same
way you do for development.
Refine for the cloud. Revisit the activities of plan, design,
deliver, operate, and control, then transform them to
better fit the needs of cloud-based operations.
Realize the benefits. Support applications in the cloud to
ensure an “always on” experience for your customers.

What’s inside?
This field guide provides a high-level overview of cloud
service management and operations.
LEARN IT
A summary of the
concepts.

GET STARTED
Considerations for moving
ops into your development
process.

LEARN IT

IBM’s unique approach
After an application is pushed to production, it must be
managed. Cloud service management and operations
addresses the operational aspect of your application and
services. Applications are monitored to ensure availability
and performance according to service level agreements. As
methods to develop, test, and release new functions become
more agile, service management must also transform to
support this paradigm shift.
REINVENT YOUR CLOUD OPERATIONS
Organize your team. Create dedicated DevOps teams with
full lifecycle responsibility from design to development to
operations and global Site Reliability Engineering teams, to
ensure availability, stability, and growth.
Streamline your processes. Adapt service management
processes to work in the context of DevOps automation and
continuous delivery.
Choose and use your tools. Adopt tools and methodologies,
such as ChatOps, to enable collaboration and rapid
restoration of service.
Build your culture. Change your existing culture to embrace
blameless post-mortems and agile operations.
Learn more
Check out IBM Cloud Service Management architecture.
https://ibm.biz/csmo-guide-ibm


Redefine service management

Work with IBM to improve your operations practices.

LEARN IT

Slow is the new down
It’s all about the customer’s experience. Customers demand fast
service along with the rapid delivery of new products and features.
If your mobile app or website is slow and does not perform, your
site might as well be down. Your customer will take their business
elsewhere.
TARGET 24/7/365 AVAILABILITY
Shift left. Use automation to test and deploy your applications as early
in your development cycle as possible.
Test your apps. Run your automated tests as part of your DevOps
pipeline - every time you deploy.
Test APIs and health checks. Ensure that the APIs and health checks
used by your app are accurate and available.
The 4 golden signals. To ensure you detect an issue before it causes an
outage, prioritize monitoring of the four golden signals: latency, traffic,
error rate, and saturation.
Monitor what is important. Provide observability by instrumenting
the application and services. Extend the management and monitoring
functionality of containers through sidecars and a service mesh
framework such as Istio. Most importantly, monitor the service as it is
experienced by the end-user.
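
To make the four golden signals concrete, here is a minimal sketch of a threshold check. The class name, field names, and threshold values are illustrative assumptions and do not come from this guide or from any particular monitoring product; tune them to your own service level agreements.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GoldenSignals:
    latency_ms: float    # response time, for example the 95th percentile
    traffic_rps: float   # requests per second, kept for context
    error_rate: float    # fraction of failed requests, 0.0 to 1.0
    saturation: float    # fraction of capacity in use, 0.0 to 1.0


# Hypothetical thresholds, not recommendations.
THRESHOLDS = {"latency_ms": 500.0, "error_rate": 0.01, "saturation": 0.80}


def breached_signals(signals: GoldenSignals) -> List[str]:
    """Return the names of the signals that exceed their threshold."""
    breaches = []
    if signals.latency_ms > THRESHOLDS["latency_ms"]:
        breaches.append("latency")
    if signals.error_rate > THRESHOLDS["error_rate"]:
        breaches.append("error rate")
    if signals.saturation > THRESHOLDS["saturation"]:
        breaches.append("saturation")
    return breaches


sample = GoldenSignals(latency_ms=750.0, traffic_rps=120.0,
                       error_rate=0.002, saturation=0.85)
print(breached_signals(sample))
```

Traffic has no threshold in this sketch; it is recorded so that the other three signals can be read in context.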
Learn more
Check out the Manage practices of the Cloud Innovate method.
https://ibm.biz/csmo-guide-slow


Performance is critical

Customers expect your app to perform on demand.

GET STARTED

Relevance of
Service Management
Originally developed in the 1980s, the Information Technology
Infrastructure Library (ITIL) is one of several competing standards
for IT Service development and management. ITIL is comprehensive
and has proven its worth in defining key processes and their
relationships.
LEARN FROM THE PAST. ADAPT FOR THE FUTURE.
Is ITIL relevant now? With modernization and an understanding of
where to deviate, the premise and concepts of ITIL are absolutely
still relevant.
Proven practices. Jumpstart your own agile IT operational
processing using proven patterns, including process descriptions,
feeds, and outputs defined in ITIL.
Adapt to modern team structures. Break down the silos and adapt
ITIL to integrated, cross-function team structures inherent in agile
DevOps teams.
Adopt an agile approach. Especially in a hybrid environment,
traditional approaches (like ITIL) need to meet and integrate with
Learn more
Check out Service management for IT and cloud services.
https://ibm.biz/csmo-guide-itil


Bring the old into the new.

GET STARTED

Incident management
Incident management restores your services as quickly as possible
by using a first-responder team that is equipped with automation
and well-defined runbooks. To maintain the best possible levels of
service quality and availability, your team performs sophisticated
monitoring to detect issues early, before the service is affected.
DETECT ISSUES EARLY. QUICKLY RESTORE APP SERVICES.
Enable monitoring with notifications. Detect outages and
performance saturations and alert your subject matter experts
(SMEs) when something is going wrong.
Analyze the incident. Use event management to correlate events,
remove noise, and show actionable alerts enriched with additional
context.
Plan and collaborate. Use ChatOps to enable SMEs across multiple
domains to collaborate, isolate the incident, and identify an effective
response.
Resolve the incident. Respond by fixing the problem and informing
stakeholders of progress and resolution.
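
The "Analyze the incident" step can be sketched as a small correlation routine that collapses repeated events into one alert per resource and check. The event fields (`resource`, `check`, `severity`, `time`) are assumptions made for this illustration; real event management tools use their own schemas and far richer correlation rules.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def correlate(events: List[dict]) -> List[dict]:
    """Collapse repeated events for the same resource and check into one alert."""
    groups: Dict[Tuple[str, str], List[dict]] = defaultdict(list)
    for event in events:
        groups[(event["resource"], event["check"])].append(event)

    alerts = []
    for (resource, check), group in groups.items():
        alerts.append({
            "resource": resource,
            "check": check,
            "count": len(group),                            # how noisy the source was
            "first_seen": min(e["time"] for e in group),    # enrichment: when it started
            "severity": max(e["severity"] for e in group),  # worst severity observed
        })
    # Most severe first, so responders see the most urgent alert at the top.
    return sorted(alerts, key=lambda alert: alert["severity"], reverse=True)


events = [
    {"resource": "checkout", "check": "latency", "severity": 2, "time": 100},
    {"resource": "checkout", "check": "latency", "severity": 3, "time": 130},
    {"resource": "catalog", "check": "disk", "severity": 1, "time": 110},
]
for alert in correlate(events):
    print(alert)
```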
Learn more
Check out the Incident Management subdomain.
https://ibm.biz/csmo-guide-incident


Resolve incidents fast!

Quickly find and fix your problems to
minimize customer impact.

GET STARTED

ChatOps
ChatOps integrates development tools, operations tools, and
processes into a collaboration platform so that teams can efficiently
communicate and easily manage the flow of their work. The solution
maintains a timeline of team communication that provides a record
and keeps everyone up to date, avoiding information overload.
EXTEND THE POWER AT YOUR FINGERTIPS
Simplify your processes. Streamline collaboration and increase
visibility into others’ actions by pushing information to problem
solvers, instead of ping-ponging issues and working hard to find
information.
Integrate your tools. Integrate Service Management and DevOps
tools into the chat platform so the team can concentrate on solving
problems without disruptive context switches and lengthy hand-offs.
Automate everything. Increase velocity with bots that answer
questions and remotely execute commands, resulting in fewer
meetings, less repetition, less manual work and more teamwork and
reuse.
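
As a rough illustration of a bot that answers questions and runs commands, the sketch below dispatches chat messages to registered handlers. The `!` prefix, the command names, and the replies are hypothetical; a real bot would call your chat platform and your operations tools instead of returning placeholder strings.

```python
from typing import Callable, Dict

# Registry of chat commands, filled in by the decorator below.
COMMANDS: Dict[str, Callable[[str], str]] = {}


def command(name: str):
    """Register a function as the handler for a chat command."""
    def register(func: Callable[[str], str]) -> Callable[[str], str]:
        COMMANDS[name] = func
        return func
    return register


@command("status")
def status(service: str) -> str:
    # Placeholder: a real bot would query the monitoring tool here.
    return f"{service}: status lookup requested"


@command("runbook")
def runbook(service: str) -> str:
    # Placeholder: a real bot would link to or trigger the actual runbook.
    return f"{service}: see runbook 'restart-{service}'"


def handle_message(text: str) -> str:
    """Parse a message such as '!status checkout' and run the matching command."""
    if not text.startswith("!"):
        return ""  # ordinary conversation, nothing for the bot to do
    name, _, argument = text[1:].partition(" ")
    handler = COMMANDS.get(name)
    if handler is None:
        return f"Unknown command: {name}"
    return handler(argument.strip())


print(handle_message("!status checkout"))
```

Because every command and reply appears in the channel, the chat history doubles as the record of what was done during an incident.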

Learn more
See how effective teams use Slack for team communications.
https://ibm.biz/csmo-guide-chatops


Opt for ChatOps!

Through collaboration tools, teams can get to know each
other a little better, work together more efficiently and
even have more fun at work!

GET STARTED

Problem management
People expect cloud services to always be available and to improve
continuously. That expectation is the driver to eliminate repeated issues. You must
fix the right issue the first time, as repeated problems lead to a loss
of faith in your application. Problem management means getting to
the root cause of a system degradation or unavailability. Apply the 5
Whys technique to quickly discover the root cause of a problem.
BLAMELESS POST INCIDENT REVIEW
Know when to dig deeper. Perform root cause analysis when issues
occur more than once, when an outage could affect many users, or
when the system is not working as designed.
Apply the 5 Whys. State the issue and discuss why it happened.
Agree on the answer, then ask why again. Repeat until you get to the
root cause of the problem and take action to address the root cause.
Remember, in complex landscapes there may be more than one root
cause.
Set response standards. Deliver the initial root cause response in
24 hours and the final findings within 5 days. Standards create a
sense of urgency and ensure the correct focus on the problem.
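
The response standards above translate directly into due dates. This is a minimal sketch that assumes the clock starts when the incident is recorded and that "5 days" means calendar days; the guide does not specify either point, so adjust both to your own policy.

```python
from datetime import datetime, timedelta

INITIAL_RESPONSE = timedelta(hours=24)  # initial root cause response
FINAL_FINDINGS = timedelta(days=5)      # final findings (calendar days assumed)


def rca_deadlines(incident_time: datetime) -> dict:
    """Return the due dates for the initial response and the final findings."""
    return {
        "initial_response_due": incident_time + INITIAL_RESPONSE,
        "final_findings_due": incident_time + FINAL_FINDINGS,
    }


deadlines = rca_deadlines(datetime(2019, 1, 25, 13, 0))
print(deadlines["initial_response_due"])
print(deadlines["final_findings_due"])
```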

Learn more
Check out Root-cause analysis using the 5 Whys technique.
https://ibm.biz/csmo-guide-problem


Don’t assign blame.

Use the 5 Whys iteratively until you find the
root cause of the problem.

GET STARTED

Operational readiness
When apps fail and it takes excessive time to determine the root
cause and restore service, customers get frustrated. You want
to ensure your customers are delighted. An assessment of your
organization’s operational readiness answers three questions: what
needs to change? how significant is the change? and what are the
expected benefits? You can then use these answers to identify the
gaps you need to close.
REVIEW. ANALYZE. IMPROVE. REPEAT.
Assess where you are. Engage in an operational readiness review
to examine all key operational processes and determine the as-is
versus the to-be state.
Determine where you need to be. There are cost and risk tradeoffs
inherent in all processes. Assess each process to determine where
you need to be.
Improve and assess continuously. Identify gaps where processes
don’t meet minimum requirements, and put plans in place to address
them. As you mature and needs change over time, repeat the whole
process regularly.
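
One simple way to picture the as-is versus to-be comparison is a gap calculation over maturity scores. The process names and the numeric maturity scale below are assumptions made for illustration only; they are not part of any IBM assessment.

```python
from typing import Dict, List, Tuple


def readiness_gaps(as_is: Dict[str, int], to_be: Dict[str, int]) -> List[Tuple[str, int]]:
    """Return (process, gap) pairs for processes below target, largest gap first."""
    gaps = {}
    for process, target in to_be.items():
        current = as_is.get(process, 0)  # a process that was not assessed counts as 0
        if current < target:
            gaps[process] = target - current
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores on an assumed scale of 0 (absent) to 5 (optimized).
as_is = {"incident management": 2, "problem management": 1, "change management": 3}
to_be = {"incident management": 4, "problem management": 4, "change management": 3}
print(readiness_gaps(as_is, to_be))
```

Running the same comparison after each improvement cycle shows whether the gaps are actually closing.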
Learn more
Check out the IBM Cloud Service Management offering.
https://ibm.biz/csmo-guide-readiness


Learn where your gaps are

Assess where you are and determine where you need to be.

GET STARTED

Build to Manage
The Build to Manage principles mandate that the developers use a
set of standards and solutions to make the application manageable
and ensure that the application will meet service level objectives.
BUILD MANAGEABLE APPS FROM THE START
Standards for manageability. Expose manageable features using
Build to Manage principles so you can scale the management of
loosely coupled applications developed in different ways by different
teams.
Shift left. More than ever before, application developers have a
larger responsibility to develop application management capabilities
into their application. They are the ones who know how to create
runbooks and how to analyze logs and traces to identify and solve issues.
Observability. Develop applications with APIs that can report
health, metrics and other application status to your management
platform.
Test like it runs. Every step in the application lifecycle must be
accompanied by an equivalent automated test.
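
The observability principle can be sketched as a function that an application exposes through a health API. The report layout (`status`, `checks`, `uptime_seconds`) and the check names are assumptions for this example, not a format defined by the Build to Manage principles.

```python
import time
from typing import Callable, Dict

START_TIME = time.time()


def health_report(checks: Dict[str, Callable[[], bool]]) -> dict:
    """Run each dependency check and summarize the application's health."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A check that crashes is reported as failed rather than hidden.
            results[name] = False
    return {
        "status": "UP" if all(results.values()) else "DOWN",
        "checks": results,
        "uptime_seconds": round(time.time() - START_TIME, 1),
    }


# Hypothetical checks; real ones would ping a database, a queue, and so on.
report = health_report({"database": lambda: True, "message_queue": lambda: False})
print(report["status"], report["checks"])
```

A management platform can poll such a report, and the same function can be called from an automated test, in the spirit of "Test like it runs."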
Learn more
Check out Build to Manage principles.
https://ibm.biz/csmo-guide-b2m


Manage from the start

Build manageability into your app from the start.

GET STARTED

Site reliability engineering
(SRE)
SREs are engineers who specialize in reliability and have the right mix
of knowledge and skills in software and systems. They are responsible
for analyzing business needs, determining problems, advising on
design, and building, testing, deploying, changing, and maintaining a
well-engineered information system. SREs often work hand-in-hand with
development scrum team members.
EMBRACE RISK IN A CONTROLLED FASHION
Strengthen the infrastructure. Engage in an operational readiness
review to examine all key operational processes and determine the
as-is versus the to-be state.
Automate everything. SREs use automation to provide the reliability,
resiliency, and availability aspects of an application. SREs don’t stop
at automation; they engineer the problem away.
Manage risk using an error budget. The SRE team defines the
quality of a service and manages the velocity and frequency of
changes allowed based on this service level objective (SLO).
Get to the root of the problem. SREs ensure outages do not recur
by conducting blameless post mortems to get to the root cause of
problems and identify a balanced action plan so that technical debt
doesn’t grow disproportionately.
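
An error budget is commonly computed as the share of requests that may fail while still meeting the SLO. The sketch below uses that request-based reading, which is an assumption and not a formula given in this guide; the SLO value and request counts are made up for illustration.

```python
def error_budget(slo: float, total_requests: int, failed_requests: int) -> dict:
    """Compare observed failures with the failures the SLO allows."""
    allowed_failures = (1.0 - slo) * total_requests
    remaining = allowed_failures - failed_requests
    return {
        "allowed_failures": allowed_failures,
        "remaining_failures": remaining,
        # While budget remains, the team can keep releasing changes;
        # once it is spent, the focus shifts to stability work.
        "can_release": remaining > 0,
    }


# Hypothetical numbers: a 99.9% SLO over one million requests.
budget = error_budget(slo=0.999, total_requests=1_000_000, failed_requests=400)
print(round(budget["remaining_failures"]), budget["can_release"])
```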
Learn more
Check out the IBM Cloud Service Management architecture.
https://ibm.biz/csmo-guide-sre


How reliable is it?

An engineering-oriented approach to operations, driven
by data.

GET STARTED

Culture changes
Historically, organizational models encouraged domain specific
processes that limited visibility, shared responsibility, and placed
boundaries around teams. Emerging models include new roles and
responsibilities that demand cultural changes. As with any significant
change, the senior leadership must understand, support and
help drive the organizational changes and find ways to make the
change successful.
CULTURAL CHANGE STARTS AT THE TOP
Blame free environment. Encourage understanding without blame
so others can learn from mistakes without fear of consequences.
Remove organizational silos. The team owns the deliverables
through the entire lifecycle.
Iterate. Create minimum viable products (MVPs) and experiment to
gather feedback and provide a delightful user experience.
Rigid engineering. Fail forward and continue to work on the next
version.
Transparency. Share your data, share your knowledge, and give
people a voice.
Learn more
Check out the Cloud Innovate method Culture practices.
https://ibm.biz/csmo-guide-culture


Build an agile Ops team

Manage risk by removing organizational silos and delivering value
at increased speed.

GET STARTED

New roles for operations in
the cloud
When you move to the cloud, the resulting culture change requires
modifications to the structure and roles of your project teams. Some
team members can play multiple roles and groups might be merged
to create a cohesive, diverse squad. When forming the ops side of
your DevOps team, consider the addition of several new roles.
BUILD YOUR OPS TEAM TO FIX PROBLEMS FAST
First responder. Evaluates problems and assigns priority and
urgency. This team member is empowered and skilled to solve
problems, collaborating with others when needed.
Incident commander. Manages the investigation, communication,
and resolution of major incidents.
Subject matter expert. Applies the deep technical skills required to
resolve new and unique application issues.
Site reliability engineer. Takes operational responsibility to support
applications running on the cloud.

Learn more
Check out the Roles in a squad practice.
https://ibm.biz/csmo-guide-roles


Diversity is a strength

Ensure the solution does not incur technical debt.

GET STARTED

The Ops in DevOps
DevOps is a set of practices that automate the processes between
development and IT operations teams. The concept is founded
on building a culture of collaboration between development and
operations teams that historically functioned in relative silos. The
promised benefits include increased trust, faster software releases,
the ability to solve critical issues quickly, and better management of
unplanned work.
DEV+OPS = TRUST, QUALITY, FASTER VELOCITY
Configuration management & infrastructure as code. Create code
to automate operational tasks, and operating system and host
configurations. Using code makes configuration changes repeatable.
Monitoring & logging. Monitor metrics and logs to see how issues
impact the user experience. Be proactive and fix things before users
are aware an issue exists.
Communication & collaboration. Use tools and automation,
including chat applications, issue or project tracking systems, and
wikis to keep everyone informed.
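
To show why expressing configuration as code makes changes repeatable, here is a minimal sketch that compares a desired state with the current state and applies only the differences. The setting names are hypothetical, and real infrastructure-as-code tools manage far more than a flat dictionary.

```python
def plan_changes(current: dict, desired: dict) -> dict:
    """Return only the settings whose current value differs from the desired one."""
    return {key: value for key, value in desired.items() if current.get(key) != value}


def apply_config(current: dict, desired: dict) -> dict:
    """Return a new configuration with the planned changes applied."""
    updated = dict(current)
    updated.update(plan_changes(current, desired))
    return updated


# Hypothetical host settings.
current = {"max_connections": 100, "log_level": "debug"}
desired = {"max_connections": 200, "log_level": "info", "tls": True}

first_run = apply_config(current, desired)
print(plan_changes(current, desired))    # three settings need to change
print(plan_changes(first_run, desired))  # nothing left to change on a second run
```

Running the same code a second time plans no changes, which is what makes the result repeatable across hosts and environments.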

Learn more
Check out Building a DevOps culture and team.
https://ibm.biz/csmo-guide-devops


Teamwork is critical for
success!

Remove the silos. Work as a unified DevOps team.

GET STARTED

IBM Cloud Garage for cloud
management
Your ideas plus IBM’s proven expertise equal great solutions on a
global scale. IBM service management subject matter experts have
extensive experience with managing applications running on the
cloud – private, hybrid and public. Cloud service management and
operations reduces the cost of delivering cloud services and helps
support and justify your investments.
WHEN YOU DON’T KNOW, ASK THE EXPERTS.
Start with an MVP. Understand how to manage and operate on the
cloud. Build your service management minimum viable product
(MVP) and roadmap.
Work with IBM SMEs. Leverage the expertise of IBM SMEs who
have extensive experience defining the processes and using the
tools needed to operate and manage your applications and cloud
environment.
Implement and repeat. Implement your MVP to demonstrate
success quickly. Choose the next MVP on your roadmap and
continue your operations transformation journey.
Learn more
Check out the Garage Offering for Cloud Management.
https://ibm.biz/csmo-guide-garage


Design for manageability

Apply the Cloud Innovate method to cloud service
management & operations.

Notes:
Reinvent your Cloud operations
https://ibm.com/cloud/garage/services/manage-offering

Get your Cloud service
management and
operations badge
https://ibm.biz/csmo-badge

Explore the IBM Cloud Service
Management Architecture
https://www.ibm.com/cloud/garage/architectures/serviceManagementArchitecture


Get the Book:
“The Cloud Adoption
Playbook”, available on
amazon.com

https://ibm.biz/cloud-adoptionplaybook

Read the IBM App
modernization field guide!!!

https://www.ibm.com/cloud/garage/content/culture/app-modernizationfield-guide/

Notices
© Copyright International Business Machines Corporation 2018, 2019.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult your
local IBM representative for information on the products and services currently available in your area. Any reference
to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or
service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM
intellectual property right may be used instead. However, it is the user’s responsibility to evaluate and verify the
operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The
furnishing of this document does not grant you any license to these patents. You can send license inquiries, in
writing, to:
IBM Director of Licensing
IBM Corporation
North Castle Drive, MD-NC119
Armonk, NY 10504-1785
US
The following paragraph does not apply to the United Kingdom or any other country where such provisions are
inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION
“AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED
TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR
PURPOSE. Some jurisdictions do not allow disclaimer of express or implied warranties in certain transactions;
therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may make
improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time
without notice.
Statements regarding IBM’s future direction or intent are subject to change or withdrawal without notice, and
represent goals and objectives only.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corp.,
registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other
companies. A current list of IBM trademarks is available on the web at “Copyright and trademark information” at
www.ibm.com/legal/copytrade.shtml.

IBM CLOUD SERVICE MANAGEMENT
& OPERATIONS

© 2018, 2019 IBM CORPORATION


