Cisco Ucsm Nagios User Guide

User Manual: Pdf

Open the PDF directly: View PDF .
Page Count: 25

Download
Open PDF In Browser	View PDF

Cisco UCSM Plugin and Addon
For Nagios Core

User Guide
October 8, 2015

Table of Content
1.

OVERVIEW ............................................................................................................................................... 1
1.1
1.2

2.

ACRONYMS AND ABBREVIATIONS .................................................................................................................... 1
SYSTEM REQUIREMENTS ................................................................................................................................ 1

DEPLOYING THE SOLUTION .................................................................................................................. 3
2.1
2.2
2.3

3.

INSTALL PATHS ............................................................................................................................................ 3
INSTALL NAGIOS MONITORING PLUGIN............................................................................................................ 3
AUTO DISCOVERY NAGIOS ADD-ON ................................................................................................................ 4

FEATURES ............................................................................................................................................... 6
3.1
3.2
3.3
3.4

4.

MAPS VIEW ................................................................................................................................................. 6
SERVICE VIEW .............................................................................................................................................. 6
DETAIL FAULT VIEW ..................................................................................................................................... 7
FAULT FOR OPERATIONAL POWER STATE ........................................................................................................ 8

MONITORING PLUGIN ............................................................................................................................. 9
4.1
4.2

5.

PLUGIN SCRIPT ............................................................................................................................................ 9
PLUGIN CLI EXAMPLE ................................................................................................................................. 11

AUTO DISCOVERY ADDON ................................................................................................................... 16
5.1
5.2

6.

WORKING WITH AUTO DISCOVERY ............................................................................................................... 16
ADD SERVICE ............................................................................................................................................. 18

CUSTOMIZING MONITORING PLUGIN .................................................................................................. 19
6.1
6.2
6.3
6.4

CUSTOMIZE INVENTORY INFORMATION .......................................................................................................... 19
CUSTOMIZE STATISTICS INFORMATION .......................................................................................................... 20
CUSTOMIZE FAULT INFORMATION ................................................................................................................. 23
SKIPPING FAULTS ....................................................................................................................................... 23

7.

UNINSTALL ............................................................................................................................................ 23

8.

KNOWN CAVEATS ................................................................................................................................. 23
8.1

Table
Table
Table
Table
Table
Table
Table

FREQUENT SERVICE TIMEOUTS ..................................................................................................................... 23

1
2
3
4
5
6
7

Figure
Figure
Figure
Figure
Figure
Figure
Figure
Figure
Figure
Figure
Figure

List of Tables

: Acronyms and Abbreviations ........................................................................................... 1
: CSV to CLI parameter mapping ........................................................................................ 4
: Cisco UCSM Fault Severity to Nagios State Mapping ................................................... 7
: Plugin Argument Parameters ......................................................................................... 10
: Host and Service Mapping ............................................................................................... 17
: Auto discovery CLI options ............................................................................................. 17
: Auto discovery CFG file options .................................................................................... 18

List of Figures

1 : UCS Domain Map in Nagios .............................................................................................. 6
2 : UCS Service Overview in Nagios .................................................................................... 6
3 : UCS Service View in Nagios ............................................................................................. 7
4 : UCS Fault Details View in Nagios ................................................................................... 8
5 : UCS Fault View for Operation Power State in Nagios ............................................... 8
6 : Custom Service under domain host ............................................................................ 19
7 : Custom Service with its own host ............................................................................... 19
8 : Custom Inventory Information ..................................................................................... 20
9 : Performance Data ........................................................................................................... 21
10 : Graph plotted using performance data ................................................................... 23
11 : Custom fault details ..................................................................................................... 23

User Guide

Overview

1. Overview
Data center administrators have been using Nagios for more than a decade now and it has
emerged as one of the favorite open source tool for the Data Center monitoring.
Nagios is an open source computer system monitoring, network monitoring and infrastructure
monitoring software application. Nagios offers monitoring and alerting services for servers,
switches, applications, and services.
The solution provides end-user with two primary components. The first is the Nagios monitoring
plugin script which will provide end-user with the capability of monitoring the UCS domains.
The second is an add-on to the Nagios, which will provide end-user with the capability to auto
discover UCS domains.

1.1 Acronyms and Abbreviations
The following table describes the acronyms and abbreviations used in the document.

Abbreviation

Translation

DNS

Domain Name Server

FQDN

Fully Qualified Domain Name

UCS

Unified Computing System

FI

Fabric Interconnect

IOM

I/O Module

FEX

Fabric Extender

NAGIOS_HOME

Nagios install directory.

NAGIOS_ETC_DIR

Nagios etc directory path

NAGIOS_PLUGIN_DIR

Nagios plugin directory path

NAGIOS_LOGOS_DIR

Nagios logos image directory
Table 1 : Acronyms and Abbreviations

1.2 System Requirements
The Nagios must meet the below mentioned minimum requirements for this solution to work





Operating System – Nagios supported Linux server.
o http://www.nagios.org/about/propaganda/distros
Nagios Core
o http://www.nagios.org/download/core/thanks/?t=1398749242
Latest Nagios Plugins
o http://www.nagios.org/download/plugins/
Latest UCS Python SDK 0.8.3 or higher
o https://communities.cisco.com/docs/DOC-36899

1

User Guide

Deploying the solution

2. Deploying the solution


The solution is provided in the tar gzip format which can be easily extracted in any of the Nagios
supported Linux distributions.



The tar gzip file extracts to a folder named cisco_nagios and contains following three files



INSTALL – This file guides end user on how to install the solution in an existing Nagios installation



cisco-ucs-nagios-x.x.x.tar.gz – This tar gzip contains the Cisco UCS monitoring plugin and
autodiscovery add-on tar gzip.



installer.py – This is installer script which uses the cisco-ucs-nagios-x.x.x.tar.gz tar gzip and
installs the solution as per the user environment.

2.1 Install paths
Installation requires that you know the paths to the following locations and they will
depend on your Nagios installation as your environment. Please check with your Nagios
administrator for more information.
Listed below are typical install locations and directories for different Linux distributions


For Debian/SUSE
o

The Nagios home directory
NAGIOS_HOME=/etc/nagios3

o

Nagios etc directory that has nagios configuration files
NAGIOS_ETC_DIR=/etc/nagios3

o

Nagios plugin directory that has all the Nagios plugin
NAGIOS_PLUGIN_DIR=/usr/lib/nagios/plugins

o

Nagios logos directory
The logos directory is generally a part of CGI configuration file and the root path to the logos
directory is denoted by ‘physical_html_path'.
Appending 'images/logos' to the value of the above variable provides us the logos
directory path for Nagios.
The cgi.cfg file can be found in NAGIOS_ETC_DIR
NAGIOS_LOGOS_DIR=/usr/share/nagios3/htdocs/images/logos/



In other Linux variants, typical paths can be
NAGIOS_HOME=/usr/local/nagios
NAGIOS_ETC_DIR=/usr/local/nagios/etc
NAGIOS_PLUGIN_DIR=/usr/local/nagios/libexec
NAGIOS_LOGOS_DIR=/usr/local/nagios/share/images/logos/

2.2 Install Nagios Monitoring Plugin
Following are the steps for installing the Nagios monitoring plugin.
a. Extract the installation tar gzip file in a temporary location.
# tar zxvf cisco-ucs-nagios-x.x.x.tar.gz
b. Now run the installer, which should be present in the extracted folder.
# ./installer.py
c. Installer auto detects various install paths and prompt with default options for installing this plugin.

3

User Guide

Deploying the solution

d. Installer also updates the configuration files which are required for the working of this plugin. It
prompts and creates the backups of all the files which will be modified in this process
e. By default installer will install the monitoring plugin along with the auto discovery scripts. In
case, only monitoring plugin is to be installed then use the '--plugin' option
# ./installer.py --plugin

2.3 Auto Discovery Nagios Add-on
1. In the autodiscovery directory, add/Update UCSHostInfo.csv with the UCS domain
IP/FQDN and login credentials. User can also use the CLI parameters if a single
domain or a range of IPs needs to be discovered.
Below is the parameter mapping of CSV file to CLI parameters:

CSV Parameter
HostName
User
Password
Port
NoSSL(True/False)
Proxy URL

CLI Parameter
-H, --host
-u, --user
-p, --password
-n, --port
--NoSsl
--proxy

Table 2 : CSV to CLI parameter mapping
The servers that are defined in this CSV/CLI will be discovered and added to the Nagios
for monitoring.
Example CSV:
,,,,,
10.65.183.10,admin,password,80,True
10.65.183.5,admin,password,80,True,http://proxy.ip.com:8080
10.65.183.5-10,admin,password

The HostName, User and Password fields are mandatory for the auto discovery to
discover the UCS domain.
User can provide IP range in the hostname. Auto-Discovery script allows range
definition by passing “–“in the fourth octet. For all IPs in that range, connection
parameters will be same i.e. the username, password, port, SSL and proxy data if
applicable.
As given in the above CSV entries, there is an entry “10.65.183.5-10”.
This range will be expanded by Auto-discovery script as shown below:
10.65.183.5,admin,password
10.65.183.6,admin,password
...
...
10.65.183.10, admin, password

Note:
1. In case user password contains any special character then it has to be
provided in double quotes.

Example
10.65.183.16,admin,"pass,word"
4

User Guide

Deploying the solution

10.65.183.16,admin,"My_password"
2. In case user password contains "(double quotes) then it has to be
escaped by another "(double quotes).
Example
If password is my"password then we will write the same in csv file

as
10.65.183.16,admin,"my""password"
3. In case user is a domain user then the user field should be
defined as “ucs-\”
Example
xxx.xxx.xxx.xxx,"ucs-somedomain\user",Test12345
4. Giving IP range is allowed only for IPv4 addresses.
2. Now run the auto discovery script



If using CSV file:
#./NagiosAutoDiscoveryUCS.py



If using CLI parameters:
#./NagiosAutoDiscoveryUCS.py


–H



-u  -p

Note:
CLI parameters will be given preference over CSV file, i.e. if the Host Parameter
via CLI is given then script will skip reading the CSV file.

5

User Guide

Features

3. Features
Once the installation is complete and auto discovery is executed, user can now see the
components of the Cisco UCS domains which are discovered.

3.1 Maps View
The discovered UCS domain will be displayed in the Map section as shown below.

Figure 1 : UCS Domain Map in Nagios

3.2 Service View
The monitoring plugin for UCSM not only monitors the health status of the components in the UCS
domain but also provides the relevant inventory information in case no fault has occurred on the
given component.
The default monitoring service for the rack server checks for all the faults that may have occurred
for the given server.

Figure 2 : UCS Service Overview in Nagios

6

User Guide

Features

Figure 3 : UCS Service View in Nagios
Based on the UCSM fault information the plugin decides on the Nagios service state.
The following table shows the mapping of the severity levels of the Cisco UCSM faults to Nagios
States.

UCSM Fault Severity

Nagios States

Critical and Major

CRITICAL

Minor and Warning

WARNING

Info and Cleared

OK

Table 3 : Cisco UCSM Fault Severity to Nagios State Mapping
Note:
User can modify this mapping by updating the below lists present in the plugin
configuration file. Any fault severity which is not mentioned in this list will be considered
in Nagios “OK” state.
# Define User Mapping for Nagios Critical and Warning
# with UCS fault states. It is a regex based mapping
NAGIOS_CRITICAL=critical|major
NAGIOS_WARNING=minor|warning

3.3 Detail Fault View
The monitoring plugin for Cisco UCSM will fetch the relevant faults details for a given dn or
class.
For example a fault as major on the chassis will be depicted as critical and the fault details
will be shown on the service state information page of the Nagios service.

7

User Guide

Features

Figure 4 : UCS Fault Details View in Nagios

3.4 Fault for Operational Power State
The plugin has the capability to detect and show faults for the UCS blades and Rack
servers depending on their Operating Power State.
When a service check is run on the blade or rack unit, the plugin internally validates the
power state of that blade and returns a fault message if it’s in an unhealthy condition.
This validation of power state is only carried if the blade/rack server is in associated
condition i.e. the blade has some service profile attached to it. If the blade is not
associated the plugin will skip the power state validation process and treat it as healthy.
The fault state of a blade is controlled by below entries in the “cisco_ucs_nagios.cfg” file:
POW_STATE_CRITICAL= failed|degraded|error|unknown|not-supported
POW_STATE_WARNING= off|offline|offduty|power-save|test
POW_STATE_HEALTHY= on|online|ok

User can configure different states as CRITICAL, WARNING and HEALTHY by
modifying this list.
The plugin will show a fault if the operating state is from “CRITICAL” or “WARNING” list
otherwise will not display anything if it’s in healthy state.

Figure 5 : UCS Fault View for Operation Power State in Nagios

8

User Guide

Monitoring Plugin

4. Monitoring Plugin
4.1 Plugin Script
As per the Nagios standards, the Cisco UCS Nagios monitoring plugin takes multiple standard
inputs like the host information, connection information and service status criteria. The plugin is
named as “cisco_ucs_nagios” and can take the following cli inputs
Argument Name
-H/ --host
-n/--port
--NoSsl
-u /--user

Description
IP/FQDN of a UCS domain
UCS domain http/https port
If defined this flag will turn off the secure
communication (SSL) with the UCS domain
UCS domain login user name.
Note: In case user is a domain user then the user
field should be defined as
“ucs-\”
User needs to prefix the domain with "ucs-"

Remarks
Required
Optional
Optional
Required

Example
"ucs-QALAB\admin"
-p/--password

UCS domain login password

--passEnc

Base64 encoded password for internal framework
communication. Services added by auto discovery
uses this variable.

-t/-type

-a/ --attribute

Query type, i.e class or dn.
Example
-t class or
--type dn.
Value of query class or dn.
Example
if -t was dn then -q should be a dn like
–q sys/chassis-1,sys/chassis-1/blade-1
If -t was class then -q should be
--query EquipmentBlade or
–q EquipmentFan
Attribute that user want to fetch value for. User can
either provide just the attribute name or user can
provide the attribute name and the user given name
which will be displayed on the Nagios web UI.
So, user can either provide just the
-a  or
--attribute :
If  is not provided then the  will be displayed in the Nagios web UI along
with its value.
If  is provided the web UI will
display the  along with the
attribute value.

-w/ --warn

User defined Warning level

-c/ --critical

User defined Critical level

-r/ --regex

Regular Expression for matching the pass condition
which will result in service status in Nagios as ‘OK’
else it will be ‘CRITICAL’
Proxy URL for connecting to the UCS domains.

-q/--query

--proxy

Required, if --passEnc is
not provided.Either of the
one should be provided.
Required, if -p or -password is not provided.
Either of the one should
be provided.
Required

Required

In case user provide -r,-w
or –c options then it is
required else optional.

In case user provides a –
c option then this is
required else optional
In case user provides a –
w option then this is
required else optional.
Optional

Optional

9

User Guide

Monitoring Plugin

Example
--proxy http://:
--proxy http://user:pass@:
-R / --inHierarchical

If specified this will provide a hierarchical overall
health status for all the elements under the given
class or dn. The information that user may want
from this option can be controlled via
CLASS_FILTER_LIST parameter defined in the plugin
configuration file.

Optional

--verbose

If specified it will work with inHierarchical flag and
will provide a detailed status information for all the
subcomponents which may be there in the provided
dn or class.
If specified it will work with inHierarchical flag and
will look for fault details under the given class or dn.
It is quick way for checking the overall status of the
given dn or class
Provide
a
filter
string
in
the
format
attribute:value. This filter is only valid for type
class and will apply on the class provided in the CLI.
User can provide wildcard filter which uses standard
regular expression syntax.
Example
--filter dn:sys/chassis-1/blade-1
--filter dn:sys/chassis-1/blade-1/*
--filter dn:^sys/chassis-1$
If defined it will print the xml communication
between the plugin and UCS domain. It is also
helpful for detailed debugging in case of any error
that may have occurred while using this CLI. Use
this for debug purpose only.
If specified it will try and reuse an existing UCSM
connection for a specified user.
If the connection does not exist, then a session will
be created and left for other processes to reuse this
connection again.
Else, if not defined, plugin will create a new user
session every time and will destroy this after each
run.
If this flag is specified this will provide performance
data for the Nagios to use to plot graphs. This flag
can only be used when -a option is used.
If specified, it will print the current Cisco UCSM
Nagios Monitoring plugin version.
NOTE: Any other options will be ignored.
Prints the help content and the plugin input
arguments supported

Optional

-F / --faultDetails

-f / --filter

--debug

-S/--useSharedSession

--getPerfStats

--version

-h/--help

Optional

Optional

Optional

Optional

Optional

Optional

Optional

Table 4 : Plugin Argument Parameters
There are multiple ways in which this script can work. For example in a conventional way, user can
provide a range for warning or critical values and based on the given values the plugin script can
decide the service state.

# cisco_ucs_nagios -u  -p  -H  -t
class -q equipmentFanStats -a speed -w 3000 -c 4000
Else user can also provide a regular expression as OK or CRITICAL criteria.

# cisco_ucs_nagios -u  -p  -H  -t dn
-q sys/chassis-1/blade-2/health-led -a color -r green

10

User Guide

Monitoring Plugin

By default the script uses the Cisco UCSM faults as the basis for returning the service state. Here
user can just pass a dn or class as query and the plugin script will return CRITICAL, WARNING or
OK as per the faults found on that dn or class.

# cisco_ucs_nagios -u  -p  -H  -t dn
-q sys/chassis-1/blade-2
So based on the query it will fetch all the related faults and if this query has a critical fault then the
plugin script will return the service as CRITICAL.

In case there is no fault in the query passed then Nagios plugin script will fetch the
relevant inventory information and will display the same on the Nagios web UI or CLI.
Note: In one of the CLI combination where end user passes the ‘--inHierarchical’ flag with ‘-verbose’ flag, user may get lot of information as per the query passed.
To help end user with limiting the required information we have provided a filter variable named
CLASS_FILTER_LIST where end user can provide name of those sub classes that user want the
information for.
So for example, for ComputeBlade class there are number of subclasses like
BiosVfConsoleRedirection, ComputeBoard,
MemoryArray,
MemoryUnit,
BiosUnit,
MgmtController, AdaptorHostEthIfFsm etc to name some.
User may only be interested in say ComputeBoard, MemoryArray and MemoryUnit then these
classes can be defined in this filter list and the plugin will then only display the status information
only related to these three classes.

4.2 Plugin CLI Example
Below are some examples of different CLI options that can be used for fetching different type of
information and status for a given query (dn or class).
CLI (DN as input) – This will provide status for only the given DN
# cisco_ucs_nagios -u  -p  -H  -t "dn" -q "sys/chassis1"
Output
sys/chassis-1:OK - partNumber : 68-4777-02,serial : FOX1721GVH5

CLI (Class as input) – This will provide the status for all chassis objects in given UCS domain.
# cisco_ucs_nagios -u  -p  -H  -t "class" -q
"EquipmentChassis"
Output
sys/chassis-1:CRITICAL CRITICAL - sys/chassis-1-Power state on chassis 1 is redundancy-failed
sys/chassis-2:CRITICAL CRITICAL - sys/chassis-2-Power state on chassis 2 is redundancy-failed
==== Fault # 1 ====
Dn : sys/chassis-1/fault-F0408
Descr : Power state on chassis 1 is redundancy-failed
severity : major
Cause : power-problem
Type : environmental
Created : 2013-09-16T23:42:17.258

11

User Guide

Monitoring Plugin

==== Fault # 2 ====
Dn : sys/chassis-2/fault-F0408
Descr : Power state on chassis 2 is redundancy-failed
severity : major
Cause : power-problem
Type : environmental
Created : 2013-08-19T18:32:16.878

CLI (with –a, -w and –c) – Here the end user can provide a warning and a critical value for a given
attribute. Based on these inputs the plugin will return the service status as per the attribute value.
# cisco_ucs_nagios -u  -p  -H  -t "class" -q
"EquipmentFanStats" -a SpeedAvg -w 3600 -c 3700
Output
Overall Status : CRITICAL WARNING - sys/chassis-1/fan-module-1-1/fan-1/stats - SpeedAvg : 3696
CRITICAL - sys/chassis-1/fan-module-1-1/fan-2/stats - SpeedAvg : 3828
OK - sys/chassis-1/fan-module-1-2/fan-1/stats - SpeedAvg : 3520
CRITICAL - sys/chassis-1/fan-module-1-2/fan-2/stats - SpeedAvg : 3740
WARNING - sys/chassis-1/fan-module-1-3/fan-1/stats - SpeedAvg : 3608
WARNING - sys/chassis-1/fan-module-1-3/fan-2/stats - SpeedAvg : 3696
OK - sys/chassis-1/fan-module-1-4/fan-1/stats - SpeedAvg : 3564
CRITICAL - sys/chassis-1/fan-module-1-4/fan-2/stats - SpeedAvg : 3740
WARNING - sys/chassis-1/fan-module-1-5/fan-1/stats - SpeedAvg : 3652
CRITICAL - sys/chassis-1/fan-module-1-5/fan-2/stats - SpeedAvg : 3828
OK - sys/chassis-1/fan-module-1-6/fan-1/stats - SpeedAvg : 3476
WARNING - sys/chassis-1/fan-module-1-6/fan-2/stats - SpeedAvg : 3696
WARNING - sys/chassis-1/fan-module-1-7/fan-1/stats - SpeedAvg : 3608
CRITICAL - sys/chassis-1/fan-module-1-7/fan-2/stats - SpeedAvg : 3784
OK - sys/chassis-1/fan-module-1-8/fan-1/stats - SpeedAvg : 3520
OK - sys/chassis-1/fan-module-1-8/fan-2/stats - SpeedAvg : 3564

CLI (with –a and –r) – The end user can provide a regular expression for a given attribute and
based on these inputs the plugin will return if the service status is OK or in a CRITICAL state.
Output
# cisco_ucs_nagios -u  -p  -H  -t "dn" -q sys/switchA/slot-1/switch-ether/port-7 -a operState -r up
CRITICAL - sys/switch-A/slot-1/switch-ether/port-7 - operState : sfp-not-present

CLI (with -a and --getPerfStats) – The end user can provide the getPerfStats flag with attribute
option. When this flag is set then the CLI will return the performance data appended to the other
output via a pipeline “|”.
Output
# cisco_ucs_nagios -u "admin" -p "Nbv12345" -H "10.65.183.5" -t "dn" -q "sys/switch-A/fanmodule-1-2/fan-2/stats" -a "speed" –getPerfStats

12

User Guide

Monitoring Plugin

OK - sys/switch-A/fan-module-1-2/fan-2/stats - speed : 9215|speed=9215

CLI (with –a, -w, -c and --getPerfStats) – Here the end user can provide a warning and a critical
value for a given attribute. Based on these inputs the plugin will return the service status as per
the attribute value. With getPerfStats flag the attribute value and the warning and critical values
are used to return the performance data.
Output
# cisco_ucs_nagios -u "admin" -p "Nbv12345" -H "10.65.183.5" -t "dn" -q "sys/switch-A/fanmodule-1-2/fan-2/stats" -a "speed" -w 10000 -c 12000 --useSharedSession –getPerfStats
OK - sys/switch-A/fan-module-1-2/fan-2/stats - speed : 9215|speed=9215;10000;12000;
CLI (with --inHierarchical) – This will provide an overall hierarchical overview of health status for
the given query
# cisco_ucs_nagios -u  -p  -H  -t "dn" -q
"sys/chassis-1" --inHierarchical
Output
Overall Health Status:CRITICAL EquipmentChassis (sys/chassis-1) - CRITICAL - Power state on chassis 1 is redundancy-failed
*** Hierarchical Fault Filtering ON ***
Please Check CLASS_FILTER_LIST property.

CLI (with –inHierarchical and --verbose) – This will provide a detailed status for a given query
and the output details can be controlled via CLASS_FILTER_LIST property as detailed in Section 5
note.
# cisco_ucs_nagios -u  -p  -H  -t "dn" -q "sys/chassis1/blade-2" –inHierarchical --verbose

Output
Overall Health Status:OK ComputeBlade (sys/chassis-1/blade-2)- OK
LsbootDef (sys/chassis-1/blade-2/boot-policy)- OK
ComputeBoard (sys/chassis-1/blade-2/board)- OK
MemoryArray (sys/chassis-1/blade-2/board/memarray-1)- OK
MemoryUnit (sys/chassis-1/blade-2/board/memarray-1/mem-12)- OK
MemoryUnit (sys/chassis-1/blade-2/board/memarray-1/mem-11)- OK
MemoryUnit (sys/chassis-1/blade-2/board/memarray-1/mem-10)- OK
MemoryUnit (sys/chassis-1/blade-2/board/memarray-1/mem-9)- OK
MemoryUnit (sys/chassis-1/blade-2/board/memarray-1/mem-8)- OK
MemoryUnit (sys/chassis-1/blade-2/board/memarray-1/mem-7)- OK
MemoryUnit (sys/chassis-1/blade-2/board/memarray-1/mem-6)- OK
MemoryUnit (sys/chassis-1/blade-2/board/memarray-1/mem-5)- OK
MemoryUnit (sys/chassis-1/blade-2/board/memarray-1/mem-4)- OK
MemoryUnit (sys/chassis-1/blade-2/board/memarray-1/mem-3)- OK
MemoryUnit (sys/chassis-1/blade-2/board/memarray-1/mem-2)- OK
… (text truncated)
*** Hierarchical Fault Filtering ON ***

13

User Guide

Monitoring Plugin

Please Check CLASS_FILTER_LIST property

CLI (with –inHierarchical and --faultDetails) – This will fetch all the components which are faulty
(WARNING or CRITICAL) with their fault details. And will display state OK with inventory
information for the parent class if no fault has been found in its hierarchy.
# cisco_ucs_nagios -u  -p  -H  -t "class" -q
"computeBlade" –inHierarchical --faultDetails
Output
Overall Health Status:WARNING ComputeBoard (sys/chassis-1/blade-1/board) - WARNING - TCA: computeMbPowerStats
consumedPower, current-value = 131.745239, raised above esc value 75.000000
sys/chassis-1/blade-2 -OK - TotalMemory : 32768,AssignedToDn : org-root/lsChassis_1_Blade_2,PartNumber : 74-7334-02,NumOfCpus : 2,NumOfCores : 12

==== Fault Detail ====
Dn : sys/chassis-1/blade-1/board/fault-F35962
Descr : TCA: computeMbPowerStats consumedPower, current-value = 131.745239, raised above
esc value 75.000000
severity : warning
Cause : threshold-crossed
Type : server
Created : 2015-09-18T14:59:25.678

*** Hierarchical Fault Filtering ON ***
Please Check CLASS_FILTER_LIST property.
CLI (with --filter) – With this option, user can provide a class attribute as a filter string to the
plugin CLI. This option helps in reducing the monitoring scope of the plugin.
Like for example, if user wants to monitor ‘processorUnit’ health for all the blades in chassis 1,
then user can define the plugin cli as
# cisco_ucs_nagios -u  -p  -H  -t class -q
processorUnit --filter dn:sys/chassis-1/*
Output
sys/chassis-1/blade-1/board/cpu-1:OK - Cores : 4,Model : Intel(R) Xeon(R) CPU
2.27GHz,CPU Speed(Mhz) : 2.266000

E5520 @

sys/chassis-1/blade-6/board/cpu-2:OK - Cores : 4,Model : Intel(R) Xeon(R) CPU
2.27GHz,CPU Speed(Mhz) : 2.266000

E5520 @

sys/chassis-1/blade-6/board/cpu-1:OK - Cores : 4,Model : Intel(R) Xeon(R) CPU
2.27GHz,CPU Speed(Mhz) : 2.266000

E5520 @

The filter uses wildcard filtering hence user can provide standard regular expression syntax which
can be used to fetch the desired results.
Another example can be to fetch health status for chassis 1.
As there can be more than one chassis in a UCS domain hence a simple filter like
“dn:sys/chassis-1” may end up matching all chassis-10,chassis-11…chassis-19.
In any such cases it is recommended that user should anchor the filter in between ^ and $
expression, like “dn:^sys/chassis-1$”. This filter will only match chassis-1 now.
# cisco_ucs_nagios -u  -p  -H  -t class -q
equipmentChassis --filter dn:^sys/chassis-1$
Output
sys/chassis-1:CRITICAL CRITICAL - sys/chassis-1-Current connectivity for chassis 1 does not match discovery policy:
unsupported-connectivity

14

User Guide

Monitoring Plugin

CRITICAL - sys/chassis-1-Power state on chassis 1 is redundancy-failed
(text truncated)
Although the above result can easily be achieved by using query type as ‘dn’ and query as
‘sys/chassis-1’.
# cisco_ucs_nagios -u  -p  -H  -t dn -q sys/chassis-1
Output
sys/chassis-1:CRITICAL CRITICAL - sys/chassis-1-Current connectivity for chassis 1 does not match discovery policy:
unsupported-connectivity
CRITICAL - sys/chassis-1-Power state on chassis 1 is redundancy-failed
(text truncated)
NOTE: Filter option only work with query type class..

15

User Guide

Auto Discovery Addon

5. Auto Discovery Addon
5.1 Working With Auto Discovery
Currently auto discovery addon creates host and services in the Nagios system as
per the details provide in the table below.
Host

UCS Domain

Chassis

Services
Ping
UCS
Domain
virtual IP

Check
Chassis

Check PSU

Fex

Check Fex

FI

Check FI

IOM

Check IOM

Blade Server

Check Blade
Server

Check CPU

Check
Memory

Rack Server

Check
Server

Rack

Check CPU

Check
Memory
Check PSU

Service Details
This service will check the ping to the given UCS domain.

This service checks overall faults status of the UCS chassis
and will return Nagios state accordingly. If there are no faults
on the UCS chassis then it will display the inventory
information for the same.
This service checks just the faults that may have occurred on
the chassis PSU’s and will return Nagios state accordingly. If
there are no faults on the chassis PSU(s) then it will display
the inventory information for the same.
This service checks overall faults status of the UCS FEX and
its sub components. Based on this service will return Nagios
state accordingly. If there are no faults on the UCS FEX then
it will display the inventory information for the same.
This service checks overall faults status of the UCS FI switch
and its sub components. Based on this service will return
Nagios state accordingly. If there are no faults on the UCS FI
switch then it will display the inventory information for the
same.
This service checks overall faults status of the UCS Chassis
IOM module and its sub components. Based on this service
will return Nagios state accordingly. If there are no faults on
the UCS Chassis IOM module then it will display the
inventory information for the same.
This service checks overall faults status of the UCS blade
server and its sub components. Based on this service will
return Nagios state accordingly. If there are no faults on the
UCS blade server then it will display the inventory
information for the same.
This service checks all the faults that may have occurred on
the blade CPU(s) and will return Nagios state accordingly. If
there are no faults on the blade CPU(s) then it will display
the inventory information for the same.
This service checks overall faults status of the memory array
units and its sub components. Based on this service will
return Nagios state accordingly. If there are no faults on the
memory array unit then it will display the inventory
information for the same.
This service checks overall faults status of the UCSM
managed C-series rack server and its sub components.
Based on this service will return Nagios state accordingly. If
there are no faults on the UCSM managed rack server then it
will display the inventory information for the same.
This service checks all the faults that may have occurred on
the rack CPU(s) and will return Nagios state accordingly. If
there are no faults on the rack CPU(s) then it will display the
inventory information for the same.
This service checks overall faults status of the memory array
units and its sub components. Based on this service will
return Nagios state accordingly. If there are no faults on the
UCS rack memory array then it will display the inventory
information for the same.
This service checks just the faults that may have occurred on

16

User Guide

Auto Discovery Addon

the rack PSU(s) and will return Nagios state accordingly. If
there are no faults on the PSU(s) then it will display the
inventory information for the same.

Table 5 : Host and Service Mapping
The auto-discovery add-on script can either be manually invoked or can be added to
a cron job for periodic inventory checks. This script can take the following optional
inputs from the end user.
Argument Name
-f / --csvFile

-r /--restartService

-H / --Host

-u / --user

Description
User can provide an absolute path with the UCS host csv file name. If this option
is not provided then by default the path is taken as the current working directory
and filename is UCSHostInfo.csv
This option provides end user with the flexibility of restarting the Nagios service
after the auto discovery is finished. The default is not to restart the Nagios
service. If user wants to restart the service he can just pass this option in the
CLI.
IP/FQDN of a UCS domain.
Note: If this option is given the script will skip the CSV file and only discover the
Host provided by this option.
UCS domain login user name.
Note: In case user is a domain user then the user field should be defined as
“ucs-\”
User needs to prefix the domain with "ucs-"
Example
"ucs-QALAB\admin"

-p / --password
-n / --port
--NoSsl
--proxy

-D/--disablehostgroup-creation
-S, --singleHost
--debug

--version
-h/--help

Specifies UCS users password to login to the server.
This specifies the UCS Manager http/https port.
If defined this flag will turn off the secure communication (SSL) with the UCS
domain.
User can specifies a proxy url that contains proxy connection and optionally
authentication information.
Example
--proxy http://:
--proxy http://user:pass@:
This option provides user a way to disable default host groups creation via autodiscovery.
If defined this flag will disable multiple host creation and will just create one host
i.e UCS Domain and put all discovered services under it.
If defined it will print the xml communication between the plugin and UCS
domain. It is also helpful for detailed debugging in case of any error that may
have occurred while using this CLI. Use this for debug purpose only.
If specified, it will print the current Cisco UCSM Nagios Auto Discovery version.
NOTE: Any other options will be ignored.
Prints the help content and the cli input arguments supported.

Table 6 : Auto discovery CLI options

Argument Name
ASSOCIATED_BLADES_ONLY

Description
This flag toggles the Auto-Discovery add-on
behavior to discover the blades which are
associated with some service-profile.
If set to “FALSE” then all blades will be discovered.
If set to “TRUE” then blades with service profile will
be discovered and others will be skipped.

REMOVE_OLD_DISCOVERY

This flag in the configuration file sets the add-on

17

User Guide

Auto Discovery Addon

script behavior to either remove old discovered
device or keep old discoveries and update only the
new UCS domains.
If this flag is set to False, then add-on will not
remove the previously discovered objects.
If this flag is set to True, then the old discoveries will
be removed. This is the default behavior.

Table 7 : Auto discovery CFG file options
If the script is invoked without using the “-r /--restartService” cli options then, at the
end of the discovery process it will prompt end user for input on restarting the Nagios
service.

5.2 Add Service
As an advance feature in the auto discovery add-on, user can define his own services around the
UCS components.
This can be done by editing the 'NagiosAutoDiscoveryUCS.cfg' and updating the different service
variables which are defined as ‘Service_’ suffixed by the class name of the components.
For example:
 Service_EquipmentChassis
 Service_ComputeBlade
 Service_EquipmentFex
 Service_EquipmentNetworkElementFanStats
Here user can provide his own service name and service class or dn which is restricted to sub
classes or dn of the above said classes. Optionally, user can also provide various cli options that
user want to pass to the monitoring plugin script.
This value should be in the following format
: ,::
For example user can update the NagiosAutoDiscoveryUCS.cfg with the following custom service
list like
Service_ComputeRackUnit = Fault Status:ComputeRackUnit:"--inHierarchical -faultDetails", Processors:processorUnit, Memory:memoryArray,
Adaptor:adaptorUnit
Service_EquipmentNetworkElementFanStats=Check Fan
Stats:EquipmentNetworkElementFanStats:"--useSharedSession -a speed -w 10000 -c
12000 --getPerfStats"

If user wants to get these services to be discovered by Auto-Discovery AddOn, then the
Class Name should be added in the “DISCOVERY_CLASS_LIST” present in the same
configuration file. This entry needs to be made when user is adding a new entry of
“Service_” and wants these services to be discovered.
#This List defines the Class list for which Services need to be
discovered.
#DISCOVERY_CLASS_LIST=EquipmentChassis,ComputeBlade,EquipmentFex,…,
DISCOVERY_CLASS_LIST=EquipmentChassis,ComputeBlade,...,EquipmentNetwork
ElementFanStats

18

User Guide

Customizing Monitoring Plugin

By default all the services created via new “Service_” will be kept under the Domain host.
So, now when the auto discovery is executed again the following list of services will
appear in the Nagios web UI.

Figure 6 : Custom Service under domain host
If user wants that these new services should have their own host and should not be
placed under domain. Then an entry for the class should be made under
“HOST_CLASS_LIST” parameter present in the same configuration file.
#This List defines the Class Names which can have Nagios Host Created.
HOST_CLASS_LIST=EquipmentChassis,ComputeBlade, EquipmentFex
,...,

Now on re-running the auto-discovery process these new services will be placed under
a new host of that class.

Figure 7 : Custom Service with its own host

6. Customizing Monitoring Plugin
6.1 Customize Inventory information
Another feature in the plugin is a provision wherein user can select the required valid
attribute(s) for a given class and provide a user friendly name to these attribute that user
wants to see on the Nagios Web UI. This can be configured via the configuration file
“cisco_ucs_nagios.cfg” which is found in the same location as that of the monitoring
plugin.
For fetching Inventory attributes from the class user needs to provide "Inv_" as prefix
followed by the class name as the variable name and the list of attributes as its value.
So the configuration property string will be of the following type
Inv_=,:,

Here user can also customize the attribute name that user wants to see on the Nagios
Web UI. So for example, if the attribute name is say ‘OperPower’ and user wants that to be
seen as say ‘Power(W)’, then user can define the attribute first followed by a colon “:” and
the name that user wants to see on the Nagios Web UI.

19

User Guide

Customizing Monitoring Plugin

Like OperPower:Power(W)
So a complete example for a class ‘ComputeRackUnit’, the entry may look like
Inv_ComputeRackUnit=Serial,Uuid,Model,Vendor,OperPower:Power(W),TotalMemory:Memo
ry(MB),NumOfCores:Cores,NumOfCpus:CPUs

Figure 8 : Custom Inventory Information

6.2 Customize Statistics Information
The plugin provides user with flexibility to select the required valid attribute(s) for a given
class to be used as performance data. This can be configured via the configuration file
“cisco_ucs_nagios.cfg” which is found in the same location as that of the monitoring
plugin.
One could add entries in this configuration file for getting performance stats for specific
“Class” by adding at least one of its attribute.
The basic format of the entry is as shown below:
Stats_= ;;;;;

: It’s the name of the class for which the stats needs to be generated.
The plugin will look for this entry and read the given parameters.
: This is a MANDATORY parameter. This will be one of the valid
attribute from the selected class. This attribute should return a numeric value as graphs
are plotted against the numeric values only.
One could also give an optional name to this attribute by writing this name after “:”. If this
optional name is given then this will be shown as the label instead of the “attribute
name”. Below is an example for it.
Stats_MemoryArray=CurrCapacity:”Current Capacity(MBs)”
 =  : 

: Unit of measurement. It’s the unit associated to the value of the attribute. This
field is OPTIONAL and can be left blank. It can have one of the following values.
a. no unit specified - assume a number (int or float) of things (eg, users,
processes, load averages)
b. s - seconds (also us, ms)
c. % - percentage
d. B - bytes (also KB, MB, TB)
e. c - a continuous counter (such as bytes transmitted on an interface)
Note:
Allowed Unit of measurement is controlled by “STATS_UOM_LIST” parameter
present in configuration file. User can update the list according to the use.
#User can append more "Unit Of Measurements" which they want to allow in
getting performance statistics.
STATS_UOM_LIST =%,s,us,ms,c,B,KB,MB,TB

20

User Guide

Customizing Monitoring Plugin

,,, : These are OPTIONAL parameters. These parameters
can either be a numeric value or a parameter of the class returning some numeric
number. All these parameters should be in same “Unit of Measurement” as that of label.
If any of these value is not present or user does not want to set any parameter for them
then he can leave these fields blank.
Warn – It sets the warning threshold in graphing the stats for that attribute.
Crit – it sets the Critical level threshold in graphing the stats for that attribute.
Min – This field sets the minimum possible value for the selected attribute.
Max – This field sets the maximum possible value for the selected attribute.
Below are few possible ways to write Stats Class definition in the configuration file.
# Only attribute defined all optional parameters skipped.
Stats_ProcessorUnit=Speed
#Optional name for attribute has been mentioned.
Stats_MemoryArray=CurrCapacity:Current Capacity(MB);B;;;;maxCapacity
#Few optional parameter skipped
Stats_EquipmentNetworkElementFanStats=Speed;;;;speedMin;speedMax

#All optional parameters provided.
Stats_EquipmentNetworkElementFanStats=Speed;c;10000;20000;speedMin;speedMax
Now when Nagios service is called and uses one of these Stats “class”, then with the
normal inventory related data, the plugin will also return the listed attribute as
performance stats.
Below is the CLI output of a service call:
# ./cisco_ucs_nagios - u  -p  -H  -t
class -q EquipmentNetworkElementFanStats --filter dn:sys/switch-A/fan-module-11/fan-1/stats
sys/switch-A/fan-module-1-1/fan-1/stats:OK - Speed : 9981,airflowDirection :
FrontToBack,speedMin : 10953,speedMax :
11226|Speed=9981;10000.0;12000.0;;;
Here the attribute “Speed” after ‘|’ is the performance stat for this service. When such a
service is run in Nagios, then this performance stat is stored in historical information
database which then a third party graphing tool uses to populate graphs.
On Nagios GUI “Performance Data” field in the service gets populated when this service
is run.

Figure 9 : Performance Data
Note:


The plugin follows Nagios generic guidelines for generating performance data.



User can install any third party graphing plugin from Nagios Communities to
populate graphs by using performance data.

21

User Guide

Customizing Monitoring Plugin

Figure 10 : Graph plotted using performance data

6.3 Customize fault information
User can also control the attributes user wants to see in the fault details by editing the monitoring
plugin configuration file 'cisco_ucs_nagios.cfg' and updating the ‘FaultInst’ variable with the
required attribute names.
So for example user can have the following list of attributes which user wants to see as a part of the
fault details
FaultInst=Dn,Descr,severity,Cause,Type,Created

Figure 11 : Custom fault details

6.4 Skipping Faults
With this feature user can define the faults user wants to skip from the Nagios monitoring
service.
For example if user wants to skip faults with severity as ‘info’ then this can be defined
as the value for the configuration variable SKIP_FAULT_LIST in the monitoring plugin
configuration file 'cisco_ucs_nagios.cfg'.
The format of value to this variable is of type
:,:.
Example
SKIP_FAULT_LIST=Lc:suppressed,Type:fsm,Severity:info,Severity:condition

Note:
Skipping fault based on “Description” field is not advisable as it might contain some
special characters which might not let the fault to be skipped when a comparison is done

22

User Guide

7. Uninstall
To uninstall the Cisco UCSM Nagios integration, follow the step as mentioned below
a. Extract the installation tar gzip file in a temporary location.
# tar zxvf cisco-ucs-nagios-x.x.x .tar.gz
b. Now run the installer with '--uninstall' option
# ./installer.py --uninstall
NOTE: Uninstaller will prompt before trying to stop Nagios service.
Uninstallation process requires Nagios services to be stopped first.

8. Known Caveats
8.1 Frequent Service timeouts
If user has a large UCSM domain with more than 600 services and is seeing frequent service
timeout, then it is recommended that the user should check for network related issues. For
example ping timeouts, high network latencies, etc.
If the above doesn’t help, then user may try and tweak the following parameters in Nagios
configuration file ‘nagios.cfg’ to check if this resolves the issue.

Service Check Timeout
Format: service_check_timeout=
Example: service_check_timeout=600
This is the maximum number of seconds that Nagios will allow service checks to run. If the
network is slow or UCSM is slow in responding to the xml requests then it is recommended to
increase this value and check the results.
It may be a case that increasing only this value may not help.It is then recommended that user
should use this value in conjunction with ‘max_concurrent_check’ option.
Maximum Concurrent Service Checks
Format: max_concurrent_checks=
Example: max_concurrent_checks=50
In case of slow Nagios host or network or UCSM responding slowly to the xml requests, user is
recommended to keep this value to a minimum. This option will run minimum number of
concurrent services on the Nagios host thereby stabilizing the system and in turn handling the
frequent service time out issue.
More details on the above Nagios configuration options can be found at the following link
http://nagios.sourceforge.net/docs/3_0/configmain.html

23

Source Exif Data:

File Type                       : PDF
File Type Extension             : pdf
MIME Type                       : application/pdf
PDF Version                     : 1.5
Linearized                      : No
Page Count                      : 25
Language                        : en-US
Tagged PDF                      : Yes
Author                          : pratripa
Creator                         : Microsoft® Word 2010
Create Date                     : 2015:10:08 15:34:32+05:30
Modify Date                     : 2015:10:08 15:34:32+05:30
Producer                        : Microsoft® Word 2010

EXIF Metadata provided by EXIF.tools

Cisco Ucsm Nagios User Guide

Navigation menu

Versions of this User Manual:

Views

Navigation