Monitoring the Cisco UCS Manager
eG Enterprise v6

Restricted Rights Legend
The information contained in this document is confidential and subject to change without notice. No
part of this document may be reproduced or disclosed to others without the prior permission of eG
Innovations Inc. eG Innovations Inc. makes no warranty of any kind with regard to the software and
documentation, including, but not limited to, the implied warranties of merchantability and fitness for
a particular purpose.
Trademarks
Microsoft Windows, Windows NT, Windows 2000, Windows 2003 and Windows 2008 are either
registered trademarks or trademarks of Microsoft Corporation in United States and/or other countries.
The names of actual companies and products mentioned herein may be the trademarks of their
respective owners.
Copyright
©2015 eG Innovations Inc. All rights reserved.

Table of Contents
MONITORING THE CISCO UCS MANAGER
1.1 UCS CHASSIS LAYER
1.1.1 Chassis IO Modules Test
1.1.2 Chassis Fans Test
1.1.3 Chassis Details Test
1.1.4 Chassis Fan Modules Test
1.1.5 Chassis IO Module Backplane Ports Test
1.1.6 Chassis PSUs Test
1.1.7 Chassis IO Module Fabric Ports Test
1.2 THE NETWORK LAYER
1.3 THE FABRIC INTERCONNECTS LAYER
1.3.1 Fabric Interconnect PSUs Test
1.3.2 Fabric Interconnect Ethernet Ports Test
1.3.3 Fabric Interconnect Fans Test
1.3.4 Fabric Interconnect FC Ports Test
1.3.5 Fabric Interconnect Details Test
1.3.6 LAN Cloud Port Channels Test
1.3.7 LAN Cloud PC Ethernet Ports
1.4 THE BLADES LAYER
1.4.1 Blade Overview Test
1.4.2 Blade Processors Test
1.4.3 Blade Motherboard Test
1.4.4 Blade Memory Arrays Test
1.4.5 Blade NICs Test
CONCLUSION

Table of Figures
Figure 1: The architecture of the Cisco UCS
Figure 2: Layer model of the Cisco UCS Manager
Figure 3: The tests mapped to the UCS Chassis layer
Figure 4: The detailed diagnosis of the Configuration state measure of the Chassis I/O Modules Test
Figure 5: The detailed diagnosis of the Overall status measure of the Chassis Fans test
Figure 6: A Cisco UCS Blade Server Chassis
Figure 7: The detailed diagnosis of the Administrative state measure of the Chassis Details test
Figure 8: The detailed diagnosis of the Overall status measure of the Chassis Fan Modules test
Figure 9: The detailed diagnosis of the Overall status measure of the Chassis PSUs test
Figure 10: The detailed diagnosis of the Overall status measure of the Chassis I/O Module Fabric Ports Test
Figure 11: The tests mapped to the Network layer
Figure 12: The tests mapped to the Fabric Interconnects layer
Figure 13: The detailed diagnosis of the Overall status measure of the Fabric Interconnect PSUs test
Figure 14: The detailed diagnosis of the Overall status measure of the Fabric Interconnect Uplink Ethernet Ports test
Figure 15: The detailed diagnosis of the Overall status measure of the Fabric Interconnect Fans test
Figure 16: The detailed diagnosis of the Fabric Interconnect Uplink FC Ports test
Figure 17: The detailed diagnosis of the Overall status measure of the Fabric Interconnect Details test
Figure 18: The detailed diagnosis of the Overall status measure of the LAN Cloud Port Channels Test
Figure 19: The detailed diagnosis of the Overall Status measure of the LAN Cloud PC Ethernet Ports test
Figure 20: The tests mapped to the Blades layer

Chapter 1

Monitoring the Cisco UCS Manager
The Cisco Unified Computing System (UCS) is a data center computing solution composed of
computing hardware, virtualization software, switching fabric, and management software. The idea
behind the system is to reduce total cost of ownership and improve scalability by integrating the
different components into a cohesive platform that can be managed as a single unit. Just-In-Time
deployment of resources and 1:N redundancy are also possible with a system of this type.
Figure 1 depicts the architecture of Cisco UCS.


Figure 1: The architecture of the Cisco UCS
The computing component of the UCS is available in two versions: the B-Series (a modular package
consisting of a powered chassis and full- or half-slot blade servers) and the C-Series rackmount servers
(which can be used with or without UCS, or mixed with blade UCS systems). Both form factors use
the same standard components seen throughout the industry, including Intel Nehalem processors and
DIMM memory. The servers are distinctive for supporting Converged Network Adapters (CNAs), Port
Virtualization, and, in some models, the Catalina chipset (ASICs that expand the number of memory
sockets that can be connected to a single memory bus).
Besides the blade servers and chassis, the other core components of the Cisco UCS are as follows:

• UCS manager: Cisco UCS Manager implements policy-based management of the server and
network resources. Network, storage, and server administrators all create service profiles,
allowing the manager to configure the servers, adapters, and fabric extenders with appropriate
isolation, quality of service (QoS), and uplink connectivity. It also provides APIs for integration
with existing data center systems management tools. An XML interface allows the system to
be monitored or configured by upper-level systems management tools.

• UCS fabric interconnect: Provides networking and management for attached blades and chassis
with 10 GigE and FCoE. All attached blades are part of a single management domain. Deployed in
redundant pairs, the 20-port and 40-port models offer centralized management with Cisco UCS
Manager software and virtual machine optimized services with support for VN-Link.

• Cisco Fabric Manager: Manages storage networking across all Cisco SAN and unified fabrics
with control of FC and FCoE. Offers unified discovery of all Cisco Data Center 3.0 devices as
well as task automation and reporting. Enables IT to optimize for quality-of-service (QoS)
levels, and provides performance monitoring, federated reporting, troubleshooting tools, and
discovery and configuration automation.

• Fabric extenders: Connect the fabric to the blade server enclosure with 10 Gigabit Ethernet
connections, simplifying diagnostics, cabling, and management. The fabric extender is
similar to a distributed line card and also manages the chassis environment (the power supply,
fans and blades), so separate chassis management modules are not required. Each UCS
chassis can support up to two fabric extenders for redundancy.
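
As a rough illustration of the XML interface mentioned in the UCS manager item above, the sketch below builds the kind of login request a monitoring tool might send before it starts collecting data. It assumes the aaaLogin method of the Cisco UCS Manager XML API and the /nuova endpoint path; the host name, the credentials, and all function names are hypothetical placeholders, and no request is actually sent.

```python
import xml.etree.ElementTree as ET


def build_login_request(user: str, password: str) -> str:
    """Return an aaaLogin document (method name assumed from the UCS XML API)."""
    element = ET.Element("aaaLogin", {"inName": user, "inPassword": password})
    return ET.tostring(element, encoding="unicode")


def api_url(host: str, ssl: bool = True) -> str:
    """Build the XML API URL; the /nuova path is an assumption of this sketch."""
    scheme = "https" if ssl else "http"
    return f"{scheme}://{host}/nuova"


def extract_cookie(response_xml: str) -> str:
    """Read the session cookie (outCookie attribute) from a login response."""
    return ET.fromstring(response_xml).attrib["outCookie"]


if __name__ == "__main__":
    # Placeholder host and read-only credentials, for illustration only.
    print(api_url("ucs.example.com"))
    print(build_login_request("readonly-user", "secret"))
```

A read-only account is sufficient for this kind of session, since monitoring only queries the state of the managed components.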
The health of the Cisco UCS platform hence largely relies on how the blade chassis, the blade servers,
the fabric interconnects and extenders are functioning. This implies that issues in the availability or
operability of one or more of these components, or any unexpected power/thermal/voltage failures they
may encounter, can degrade the overall performance of the Cisco UCS. In order to avoid this, the
health and operational efficiency of the integral components of the platform should be continuously
monitored, and issues proactively reported.
eG Enterprise provides a 100% web-based Cisco UCS Manager monitoring model that periodically
monitors the Cisco UCS manager, discovers the chassis, I/O modules, blades, and fabric interconnects
managed by the UCS manager, and determines the current status of each of these components.

Figure 2: Layer model of the Cisco UCS Manager
Each layer of Figure 2 is mapped to a series of tests that instantly capture current/potential
abnormalities in the state and functioning of the core components managed by the Cisco UCS
manager, and alert administrators to the same. With the help of the metrics collected by these tests,
administrators can find quick and accurate answers to the following queries:

• Are all I/O modules (i.e., fabric extenders) operating normally? Is any I/O module in a
degraded/powered-off/inoperable state currently? If so, which one is it?
• Is any I/O module experiencing any critical performance issues now?
• What are the power/voltage/thermal states of the I/O modules?
• Is any I/O module missing?
• Is the temperature of all I/O modules normal? Is any I/O module experiencing abnormal
temperatures?
• Is any fan inoperable? In which chassis does this fan exist?
• Does any fan operate at abnormal speeds?
• Is any fan experiencing any performance failures?
• Have non-recoverable problems occurred in the power/thermal/voltage states of any fan?
• How is the overall health of the chassis? Is any chassis in an inoperable state currently?
• Is any chassis license-insufficient?
• Are the power/thermal/voltage states of all chassis normal?
• Is any chassis receiving/transmitting more power than it can handle?
• Which fan module is currently in an inoperable state?
• Which fan module is behaving abnormally?
• Are all backplane ports healthy?
• Have any operational/performance issues been detected in any of the PSUs in the chassis?
• Which PSU is receiving voltage over 210 volts and emitting voltage over 12 volts?
• Are the fabric interconnects operating normally?
• Do the fabric interconnects have enough CPU and memory resources at their disposal? Is any
fabric interconnect experiencing CPU/memory contention?
• Are the PSUs of the fabric interconnects operating normally?
• Is the power/voltage input and output of the PSUs within acceptable limits?
• Have any uplink Ethernet ports failed?
• Which uplink Ethernet port is seeing very high traffic?
• Are the fans of all fabric interconnects operating normally?
• Is any uplink Fibre Channel port in an abnormal state?
• Are there any disabled uplink Fibre Channel ports?
• Is any Fibre Channel port seeing very high traffic?
• Is any Fibre Channel port experiencing too many errors in transmission?
• Are the blade servers in a chassis healthy?
• Is any blade server unavailable?
• Is the power state/slot state of the blade servers OK?
• Are the blade servers utilizing memory optimally? Is any blade server over-utilizing
memory?
• Is the motherboard of any blade server consuming power/current excessively?
• Is the temperature of the motherboard normal? If not, then which side of the motherboard is
experiencing abnormal temperatures - the front or the rear?
• Is the temperature of any memory array of any blade server very high?
The sections that follow will discuss each layer of Figure 2 of this document.

1.1 UCS Chassis Layer

The Cisco UCS server chassis and its components are part of the Cisco Unified Computing System.
The Cisco UCS server chassis system consists of the following components:

• Cisco UCS server chassis
• Cisco UCS blade servers: up to eight half-width or four full-width blade servers, each
containing two CPUs and holding up to two hard drives
• Cisco UCS I/O Module: up to two I/O modules, each providing four ports of 10-Gb Ethernet,
Cisco Data Center Ethernet, and Fibre Channel over Ethernet (FCoE) connection to the fabric
interconnect
• A number of SFP+ choices from copper to fiber
• Power supplies: up to four 2500 Watt hot-swappable power supplies
• Power Distribution Unit
• Fan modules: eight hot-swappable fan modules
As a problem in the chassis system can affect the overall performance of the Cisco UCS platform, you
need to shield the chassis and its integral components from permanent physical or operational
damage. To achieve this, you need proactive updates on probable threats to the health of the chassis
system; these updates will enable you to initiate corrective measures before it is too late. The tests
mapped to this layer provide you with such problem updates.
With the help of these tests, you can keep an eye on the status of each chassis managed by the Cisco
UCS manager and also its core components such as the fabric extenders, fan modules, power
supplies, etc., and quickly detect abnormalities.


Figure 3: The tests mapped to the UCS Chassis layer

1.1.1 Chassis IO Modules Test

The Cisco UCS chassis contains I/O Modules or Fabric Extenders that allow the blade servers in the
chassis to communicate with Cisco UCS Fabric Interconnects. The chassis supports up to two I/O
Modules, each with four I/O ports.
The Cisco UCS Fabric Extenders bring the unified fabric into the blade server enclosure, providing 10
Gigabit Ethernet connections between blade servers and the fabric interconnect, simplifying
diagnostics, cabling, and management.


The Cisco UCS Fabric Extenders extend the I/O fabric between the Cisco UCS Fabric Interconnects and
the Cisco Blade Server Chassis, enabling a lossless and deterministic Fibre Channel over Ethernet
(FCoE) fabric to connect all blades and chassis together. Since the fabric extender is similar to a
distributed line card, it does not do any switching and is managed as an extension of the fabric
interconnects. This approach removes switching from the chassis, reducing overall infrastructure
complexity and enabling the Cisco Unified Computing System to scale to many chassis without
multiplying the number of switches needed, reducing TCO and allowing all chassis to be managed as a
single, highly available management domain.
The Cisco UCS Fabric Extenders also manage the chassis environment (the power supply and fans as
well as the blades) in conjunction with the Fabric Interconnects. Therefore, separate chassis
management modules are not required.
Cisco UCS Fabric Extenders fit into the back of the Cisco UCS Chassis. Each Cisco UCS Chassis can
support up to two Fabric Extenders, enabling increased capacity as well as redundancy.
This test monitors the overall health of each of the I/O Modules present in every chassis managed by
the Cisco UCS manager, and in the process, promptly alerts you to abnormalities in the power,
thermal, and voltage states of the modules and to sudden spikes in the ambient/ASIC temperature of
the modules. This way, defective I/O modules come to light.

Purpose: Monitors the overall health of each of the I/O Modules present in every chassis
managed by the Cisco UCS manager, and in the process, promptly alerts you to
abnormalities in the power, thermal, and voltage states of the modules, or a sudden
increase in the ambient/ASIC temperature of the modules

Target of the test: A Cisco UCS manager

Agent deploying the test: A remote agent

Configurable parameters for the test:

1. TESTPERIOD - How often should the test be executed
2. HOST - The IP address of the host for which the test is being configured.
3. PORT - The port at which the specified HOST listens. By default, this is NULL.
4. UCS USER and UCS PASSWORD - Provide the credentials of a user with at least
read-only privileges to the target Cisco UCS manager.
5. CONFIRM PASSWORD - Confirm the password by retyping it here.
6. SSL - By default, the Cisco UCS manager is SSL-enabled. Accordingly, the SSL
flag is set to Yes by default.
7. WEB PORT - By default, in most virtualized environments, Cisco UCS manager
listens on port 80 (if not SSL-enabled) or on port 443 (if SSL-enabled) only. This
implies that, by default, the eG agent connects to port 80 if the SSL flag above is
set to No, and to port 443 if the SSL flag is set to Yes. Accordingly, the WEB PORT
parameter is set to the value "default" by default.
In some environments however, the default ports 80 or 443 might not apply. In
such a case, against the WEB PORT parameter, you can specify the exact port at
which the Cisco UCS manager in your environment listens, so that the eG agent
communicates with that port for collecting metrics from the Cisco UCS manager.
8. DETAILED DIAGNOSIS - To make diagnosis more efficient and accurate, the eG
Enterprise suite embeds an optional detailed diagnostic capability. With this
capability, the eG agents can be configured to run detailed, more elaborate tests
as and when specific problems are detected. To enable the detailed diagnosis
capability of this test for a particular server, choose the On option. To disable
the capability, click on the Off option.
The option to selectively enable/disable the detailed diagnosis capability will be
available only if the following conditions are fulfilled:

• The eG manager license should allow the detailed diagnosis capability
• Both the normal and abnormal frequencies configured for the detailed
diagnosis measures should not be 0.
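
The interplay between the SSL flag and the web port parameter described above can be summarized in a few lines of code. The following is only a sketch of the documented defaulting rule, not the eG agent's actual implementation; the function name and the string encodings of the parameter values are assumptions made for illustration.

```python
def resolve_web_port(ssl: str = "Yes", web_port: str = "default") -> int:
    """Return the port an agent would connect to for the given settings.

    ssl      -- "Yes" or "No", mirroring the SSL flag (hypothetical encoding)
    web_port -- "default" or an explicit port number given as text
    """
    if web_port.strip().lower() != "default":
        # An explicitly configured port always takes precedence.
        return int(web_port)
    # Otherwise fall back to 443 for SSL-enabled managers and 80 for the rest.
    return 443 if ssl.strip().lower() == "yes" else 80


if __name__ == "__main__":
    print(resolve_web_port())               # SSL on, default port
    print(resolve_web_port(ssl="No"))       # SSL off, default port
    print(resolve_web_port("Yes", "8443"))  # explicit port overrides the default
```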

Outputs of the test: One set of results for each I/O module in each chassis managed by the Cisco UCS
manager being monitored

Measurements made by the test (each entry lists the Measurement, its Measurement Unit where one
applies, and its Interpretation):

Configuration state: Indicates the current configuration status of this I/O module present in this
chassis.

The States reported by this measure and their corresponding numeric equivalents are
described in the table below:

Numeric Value    State
0                Un-initialized
1                Un-acknowledged
2                Unsupported-connectivity
3                Ok
4                Removing

Note:
By default, this measure reports the above-mentioned States while indicating the
configuration status of the I/O module in this chassis. However, in the graph of this
measure, states will be represented using the corresponding numeric equivalents, i.e.,
0 to 4.
The detailed diagnosis of this measure provides the Time, ID, PID, Side, Chassis
ID, Fabric ID, Revision, Serial Number and Vendor attributes for each I/O module.
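
Because graphs of this measure show only the numeric equivalents, it can help to translate a plotted value back into its State. The snippet below is a minimal sketch of such a lookup for the Configuration state table above; the class and function names are hypothetical, and the hyphenation of the state names is assumed.

```python
from enum import IntEnum


class ConfigurationState(IntEnum):
    """Numeric equivalents of the Configuration state measure (0 to 4)."""
    UN_INITIALIZED = 0
    UN_ACKNOWLEDGED = 1
    UNSUPPORTED_CONNECTIVITY = 2
    OK = 3
    REMOVING = 4


def describe(value: int) -> str:
    """Translate a graphed numeric value back into its State label."""
    try:
        name = ConfigurationState(value).name
    except ValueError:
        return f"Unrecognized ({value})"
    # UN_INITIALIZED -> Un-initialized, OK -> Ok, and so on.
    return name.replace("_", "-").capitalize()


if __name__ == "__main__":
    for plotted in (3, 1, 9):
        print(plotted, describe(plotted))
```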


Overall status: Indicates the overall status of this I/O module present in this chassis.

The States reported by this measure and their corresponding numeric equivalents are
described in the table below:

Numeric Value    State
0                Unknown
1                Operable
2                Inoperable
3                Degraded
4                Powered-off
5                Power-problem
6                Removed
7                Voltage-problem
8                Thermal-problem
9                Performance-problem
10               Accessibility-problem
11               Identity-unestablishable
12               Bios-post-timeout
13               Disabled
51               Fabric-conn-problem
52               Fabric-unsupported-conn
81               Config
82               Equipment-problem
83               Decommissioning
84               Chassis-limit-exceeded
100              Not-supported
101              Discovery
102              Discovery-failed
103              Identify
104              Post-failure
105              Upgrade-problem
106              Peer-comm-problem
107              Auto-upgrade

Note:
By default, this measure reports the above-mentioned States while indicating the status
of the I/O module in this chassis. However, in the graph of this measure, states will be
represented using the corresponding numeric equivalents only.
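
Since the numeric equivalents of the Overall status measure are not contiguous, a graph of this measure is easiest to read when each change in value is translated back into state names. The sketch below shows one way to do that for a series of sampled values; the sample data and function names are hypothetical, and the hyphenated state names are assumed from the table above.

```python
OVERALL_STATUS = {
    0: "Unknown", 1: "Operable", 2: "Inoperable", 3: "Degraded",
    4: "Powered-off", 5: "Power-problem", 6: "Removed",
    7: "Voltage-problem", 8: "Thermal-problem", 9: "Performance-problem",
    10: "Accessibility-problem", 11: "Identity-unestablishable",
    12: "Bios-post-timeout", 13: "Disabled", 51: "Fabric-conn-problem",
    52: "Fabric-unsupported-conn", 81: "Config", 82: "Equipment-problem",
    83: "Decommissioning", 84: "Chassis-limit-exceeded",
    100: "Not-supported", 101: "Discovery", 102: "Discovery-failed",
    103: "Identify", 104: "Post-failure", 105: "Upgrade-problem",
    106: "Peer-comm-problem", 107: "Auto-upgrade",
}


def state_name(value):
    """Look up a state label, tolerating values missing from the table."""
    return OVERALL_STATUS.get(value, f"Unrecognized ({value})")


def state_changes(samples):
    """Return (position, old_state, new_state) for every change in the series."""
    changes = []
    for i in range(1, len(samples)):
        if samples[i] != samples[i - 1]:
            changes.append((i, state_name(samples[i - 1]), state_name(samples[i])))
    return changes


if __name__ == "__main__":
    # Hypothetical series of graphed values for one I/O module.
    for change in state_changes([1, 1, 3, 3, 2]):
        print(change)
```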


Operability: Indicates the current operating state of this I/O module present in this chassis.

The States reported by this measure and their corresponding numeric equivalents are
described in the table below:

Numeric Value    State
0                Unknown
1                Operable
2                Inoperable
3                Degraded
4                Powered-off
5                Power-problem
6                Removed
7                Voltage-problem
8                Thermal-problem
9                Performance-problem
10               Accessibility-problem
11               Identity-unestablishable
12               Bios-post-timeout
13               Disabled
51               Fabric-conn-problem
52               Fabric-unsupported-conn
81               Config
82               Equipment-problem
83               Decommissioning
84               Chassis-limit-exceeded
100              Not-supported
101              Discovery
102              Discovery-failed
103              Identify
104              Post-failure
105              Upgrade-problem
106              Peer-comm-problem
107              Auto-upgrade

Note:
By default, this measure reports the above-mentioned States while indicating the
operability of an I/O module in this chassis. However, in the graph of this measure,
states will be represented using their corresponding numeric equivalents only.


Performance state: Indicates the current performance status of this I/O module present in this
chassis.

The States reported by this measure and their corresponding numeric equivalents are
described in the table below:

Numeric Value    State
0                Unknown
1                Ok
2                Upper-non-recoverable
3                Upper-critical
4                Upper-non-critical
5                Lower-non-critical
6                Lower-critical
7                Lower-non-recoverable
100              Not-supported

Note:
By default, this measure reports the above-mentioned States while indicating the
performance state of an I/O module. However, in the graph of this measure,
states will be represented using the corresponding numeric equivalents.
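
The threshold-style states in the table above combine a direction (upper or lower) with a severity. As a hedged illustration, the sketch below splits each numeric value into those two parts and picks the most severe value from a set of readings. The severity ranking is an assumption inferred from the state names, not something this manual defines, and all identifiers are hypothetical.

```python
from typing import NamedTuple, Optional


class Threshold(NamedTuple):
    direction: Optional[str]  # "upper", "lower", or None when not applicable
    severity: str


PERFORMANCE_STATE = {
    0: Threshold(None, "unknown"),
    1: Threshold(None, "ok"),
    2: Threshold("upper", "non-recoverable"),
    3: Threshold("upper", "critical"),
    4: Threshold("upper", "non-critical"),
    5: Threshold("lower", "non-critical"),
    6: Threshold("lower", "critical"),
    7: Threshold("lower", "non-recoverable"),
    100: Threshold(None, "not-supported"),
}

# Assumed ordering: a higher rank means a more urgent condition.
SEVERITY_RANK = {
    "ok": 0, "not-supported": 0, "unknown": 1,
    "non-critical": 2, "critical": 3, "non-recoverable": 4,
}


def worst(values):
    """Return the numeric state with the highest assumed severity."""
    return max(values, key=lambda v: SEVERITY_RANK[PERFORMANCE_STATE[v].severity])


if __name__ == "__main__":
    readings = [1, 4, 3]  # hypothetical values from three I/O modules
    value = worst(readings)
    print(value, PERFORMANCE_STATE[value])
```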


Power state: Indicates the current power status of this I/O module in this chassis.

The State values reported by this measure and their corresponding numeric
equivalents are described in the table below:

Numeric Value    State
0                Unknown
1                On
2                Test
3                Off
4                Online
5                Offline
6                Offduty
7                Degraded
8                Power-save
9                Error
10               Not-supported

Note:
By default, this measure reports the above-mentioned States while indicating the power
state of an I/O module. However, in the graph of this measure, states will be
represented using their corresponding numeric equivalents only.


Presence state: Indicates the current state of this I/O module in this chassis.

The State values reported by this measure and their corresponding numeric
equivalents are described in the table below:

Numeric Value    State
0                Unknown
1                Empty
10               Equipped
11               Missing
12               Mismatch
13               Equipped-not-primary
20               Equipped-identity-unestablishable
30               Inaccessible
40               Unauthorized
100              Not-supported

Note:
By default, this measure reports the above-mentioned States while indicating the
current state of the I/O module in this chassis. However, in the graph of this
measure, states will be represented using their corresponding numeric equivalents
only.
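
One practical use of the Presence state values is answering the question of whether any I/O module is missing. The following sketch filters a set of per-module readings for the Missing state; the module identifiers and readings are hypothetical sample data, and the function name is an assumption made for illustration.

```python
from typing import Dict, List

PRESENCE_STATE = {
    0: "Unknown", 1: "Empty", 10: "Equipped", 11: "Missing", 12: "Mismatch",
    13: "Equipped-not-primary", 20: "Equipped-identity-unestablishable",
    30: "Inaccessible", 40: "Unauthorized", 100: "Not-supported",
}


def missing_modules(readings: Dict[str, int]) -> List[str]:
    """Return the sorted identifiers of modules whose presence state is Missing."""
    return sorted(
        module
        for module, value in readings.items()
        if PRESENCE_STATE.get(value) == "Missing"
    )


if __name__ == "__main__":
    # Hypothetical readings keyed by a made-up chassis/module identifier.
    sample = {"chassis-1/iom-1": 10, "chassis-1/iom-2": 11, "chassis-2/iom-1": 10}
    print(missing_modules(sample))
```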


Thermal state: Indicates the current thermal state of this I/O module present in this chassis.

The State values reported by this measure and their corresponding numeric
equivalents are described in the table below:

Numeric Value    State
0                Unknown
1                Ok
2                Upper-non-recoverable
3                Upper-critical
4                Upper-non-critical
5                Lower-non-critical
6                Lower-critical
7                Lower-non-recoverable
100              Not-supported

Note:
By default, this measure reports the above-mentioned States while indicating the
thermal state of the I/O modules in this chassis. However, in the graph of this
measure, states will be represented using the corresponding numeric equivalents
only.

Voltage state: Indicates the current voltage state of this I/O module present in this chassis.

The State values reported by this measure and their corresponding numeric
equivalents are described in the table below:

Numeric Value    State
0                Unknown
1                Ok
2                Upper-non-recoverable
3                Upper-critical
4                Upper-non-critical
5                Lower-non-critical
6                Lower-critical
7                Lower-non-recoverable
100              Not-supported

Note:
By default, this measure reports the above-mentioned States while indicating the
voltage state of the I/O module in this chassis. However, in the graph of this
measure, states will be represented using their corresponding numeric equivalents
only.
Ambient temperature: Indicates the current ambient temperature of this I/O module present
in this chassis.
Measurement unit: Celsius
Interpretation: An abnormal temperature may cause severe damage to the I/O modules.

ASIC temperature: Indicates the current temperature of the ASIC (Application-Specific
Integrated Circuit) in this I/O module present in this chassis.
Measurement unit: Celsius
Interpretation: An application-specific integrated circuit (ASIC) is an integrated circuit (IC)
customized for a particular use, rather than intended for general-purpose use.
If an ASIC registers an abnormal temperature, it may severely affect the
operations of the I/O module in which that ASIC operates.

The detailed diagnosis of the Configuration state measure provides the Time, ID, PID, Side, Chassis ID,
Fabric ID, Revision, Serial Number and Vendor attributes for each I/O module.

Figure 4: The detailed diagnosis of the Configuration state measure of the Chassis I/O Modules Test

1.1.2 Chassis Fans Test

A Cisco Blade Server Chassis contains the following components:

• Cisco UCS Fabric Extenders: up to two fabric extenders (FEX), each providing four ports of
10-Gigabit Ethernet, Cisco Data Center Ethernet, and Fibre Channel over Ethernet (FCoE)
• SFP+ transceiver choices that include copper and fiber optic
• Power supply units: up to four 2500 W hot-swappable power supply units
• Fan modules: eight hot-swappable fan modules
• Cisco UCS Blade Servers: up to eight half-width blade servers or four full-width blade servers,
each holding RAID-capable hard drives
This test monitors the overall health of each fan present in each chassis managed by the Cisco UCS
manager, and proactively alerts users to the following:

• Fans that are in an abnormal operational state;
• Fans that are in a critical performance/thermal/voltage state;
• Fans in a degraded/errored power state;
• Fans operating at abnormal speeds.

Purpose: Monitors the overall health of each fan module present in each chassis managed by
the Cisco UCS manager, and proactively alerts users to the following:

• Fans that are in an abnormal operational state;
• Fans that are in a critical performance/thermal/voltage state;
• Fans in a degraded/errored power state;
• Fans operating at abnormal speeds.

Target of the test: A Cisco UCS manager

Agent deploying the test: A remote agent

Configurable parameters for the test:

1. TESTPERIOD – How often the test should be executed.
2. HOST – The IP address of the host for which the test is being configured.
3. PORT – The port at which the specified HOST listens. By default, this is NULL.
4. UCS USER and UCS PASSWORD – Provide the credentials of a user with at least
read-only privileges to the target Cisco UCS manager.
5. CONFIRM PASSWORD – Confirm the password by retyping it here.
6. SSL – By default, the Cisco UCS manager is SSL-enabled. Accordingly, the SSL
flag is set to Yes by default.
7. WEB PORT – By default, in most virtualized environments, Cisco UCS manager
listens on port 80 (if not SSL-enabled) or on port 443 (if SSL-enabled) only. This
implies that, by default, the eG agent connects to Cisco UCS manager on port 80 if
the SSL flag above is set to No, and on port 443 if the SSL flag is set to Yes.
Accordingly, the WEBPORT parameter is set to default by default.
In some environments, however, the default ports 80 or 443 might not apply. In
such a case, specify against the WEBPORT parameter the exact port at which the
Cisco UCS manager in your environment listens, so that the eG agent communicates
with that port for collecting metrics from the Cisco UCS manager.
8. DETAILED DIAGNOSIS – To make diagnosis more efficient and accurate, the eG
Enterprise suite embeds an optional detailed diagnostic capability. With this
capability, the eG agents can be configured to run detailed, more elaborate tests
as and when specific problems are detected. To enable the detailed diagnosis
capability of this test for a particular server, choose the On option. To disable
the capability, choose the Off option.
The option to selectively enable/disable the detailed diagnosis capability will be
available only if the following conditions are fulfilled:

• The eG manager license should allow the detailed diagnosis capability
• Neither the normal nor the abnormal frequency configured for the detailed
diagnosis measures should be 0.
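
The way the SSL flag and the WEB PORT parameter combine can be illustrated with a short Python sketch. This is not the eG agent's code: the function names resolve_web_port and base_url are hypothetical, and the only rules encoded are the ones stated above (port 443 when SSL is Yes, port 80 when SSL is No, and an explicit port overriding both).

```python
def resolve_web_port(ssl_enabled: bool, web_port: str = "default") -> int:
    """Return the port an agent would use under the WEB PORT rules described above.

    Hypothetical helper for illustration; not part of eG Enterprise.
    """
    if web_port.strip().lower() == "default":
        # Default behaviour: 443 when SSL is enabled, 80 otherwise.
        return 443 if ssl_enabled else 80
    port = int(web_port)  # raises ValueError for non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port


def base_url(host: str, ssl_enabled: bool, web_port: str = "default") -> str:
    """Build the scheme://host:port prefix implied by the SSL and WEB PORT settings."""
    scheme = "https" if ssl_enabled else "http"
    return f"{scheme}://{host}:{resolve_web_port(ssl_enabled, web_port)}"


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-only address used as a placeholder.
    print(base_url("192.0.2.10", ssl_enabled=True))
    print(base_url("192.0.2.10", ssl_enabled=False, web_port="8080"))
```

Keeping the port resolution in one function means that a non-standard port only has to be supplied once, in the same way that a single WEBPORT value applies to the whole test.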

Outputs of the test: One set of results for each fan in each chassis managed by the Cisco UCS
manager being monitored

Measurements made by the test (Measurement, Measurement Unit, Interpretation):

Overall status:
Indicates the overall status of this fan present in this chassis.

Interpretation: The States reported by this measure and their corresponding numeric equivalents
are described in the table below:

Numeric Value    State
0                Unknown
1                Operable
2                Inoperable
3                Degraded
4                Powered-off
5                Power-problem
6                Removed
7                Voltage-problem
8                Thermal-problem
9                Performance-problem
10               Accessibility-problem
11               Identity-unestablishable
12               Bios-post-timeout
13               Disabled
51               Fabric-conn-problem
52               Fabric-unsupported-conn
81               Config
82               Equipment-problem
83               Decommissioning
84               Chassis-limit-exceeded
100              Not-supported
101              Discovery
102              Discovery-failed
103              Identify
104              Post-failure
105              Upgrade-problem
106              Peer-comm-problem
107              Auto-upgrade

The detailed diagnosis of this measure provides the Time, ID, PID, Module, Revision, Serial
Number, Tray and Vendor attributes for each fan in each chassis.

Operability:
Indicates the current operational state of this fan present in this chassis.

Interpretation: The States reported by this measure and their corresponding numeric equivalents
are described in the table below:

Numeric Value    State
0                Unknown
1                Operable
2                Inoperable
3                Degraded
4                Powered-off
5                Power-problem
6                Removed
7                Voltage-problem
8                Thermal-problem
9                Performance-problem
10               Accessibility-problem
11               Identity-unestablishable
12               Bios-post-timeout
13               Disabled
51               Fabric-conn-problem
52               Fabric-unsupported-conn
81               Config
82               Equipment-problem
83               Decommissioning
84               Chassis-limit-exceeded
100              Not-supported
101              Discovery
102              Discovery-failed
103              Identify
104              Post-failure
105              Upgrade-problem
106              Peer-comm-problem
107              Auto-upgrade

Note:
By default, this measure reports the above-mentioned States while indicating the operability
status of a fan. However, in the graph of this measure, states will be represented using their
corresponding numeric equivalents only.
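
Because graphs show only the numeric equivalents, it can help to translate a plotted value back to its State when reading graph data. The Python sketch below encodes the Operability table above as a dictionary. The names OPERABILITY_STATES and state_name are hypothetical and are not part of eG Enterprise; the fallback text for unlisted values is also an assumption.

```python
# Numeric equivalents and States, as listed in the Operability table above.
OPERABILITY_STATES = {
    0: "Unknown",
    1: "Operable",
    2: "Inoperable",
    3: "Degraded",
    4: "Powered-off",
    5: "Power-problem",
    6: "Removed",
    7: "Voltage-problem",
    8: "Thermal-problem",
    9: "Performance-problem",
    10: "Accessibility-problem",
    11: "Identity-unestablishable",
    12: "Bios-post-timeout",
    13: "Disabled",
    51: "Fabric-conn-problem",
    52: "Fabric-unsupported-conn",
    81: "Config",
    82: "Equipment-problem",
    83: "Decommissioning",
    84: "Chassis-limit-exceeded",
    100: "Not-supported",
    101: "Discovery",
    102: "Discovery-failed",
    103: "Identify",
    104: "Post-failure",
    105: "Upgrade-problem",
    106: "Peer-comm-problem",
    107: "Auto-upgrade",
}


def state_name(value: int) -> str:
    """Translate a graphed numeric value back to its State name.

    Values missing from the table are reported as unrecognized (an assumption
    made for this sketch) rather than raising an error.
    """
    return OPERABILITY_STATES.get(value, f"Unrecognized ({value})")


if __name__ == "__main__":
    for sample in (1, 8, 84):
        print(sample, state_name(sample))
```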

Performance state:
Indicates the current performance status of this fan present in this chassis.

Interpretation: The States reported by this measure and their corresponding numeric equivalents
are described in the table below:

Numeric Value    State
0                Unknown
1                Ok
2                Upper-non-recoverable
3                Upper-critical
4                Upper-non-critical
5                Lower-non-critical
6                Lower-critical
7                Lower-non-recoverable
100              Not-supported

Note:
By default, this measure reports the above-mentioned States while indicating the performance
status of a fan. However, in the graph of this measure, states will be represented using their
corresponding numeric equivalents only.
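
The Performance state table above can be expressed as an enumeration so that numeric values read from a graph are checked against a fixed set of States. The Python sketch below is illustrative only. The class name ThresholdState and the function is_critical_or_worse are hypothetical, and grouping the critical and non-recoverable States together as "critical or worse" is an assumption of this sketch, not a rule stated by the product.

```python
from enum import IntEnum


class ThresholdState(IntEnum):
    """States from the Performance state table, keyed by numeric equivalent."""
    UNKNOWN = 0
    OK = 1
    UPPER_NON_RECOVERABLE = 2
    UPPER_CRITICAL = 3
    UPPER_NON_CRITICAL = 4
    LOWER_NON_CRITICAL = 5
    LOWER_CRITICAL = 6
    LOWER_NON_RECOVERABLE = 7
    NOT_SUPPORTED = 100


# Assumption for this sketch: critical and non-recoverable States are grouped.
_CRITICAL_OR_WORSE = frozenset({
    ThresholdState.UPPER_NON_RECOVERABLE,
    ThresholdState.UPPER_CRITICAL,
    ThresholdState.LOWER_CRITICAL,
    ThresholdState.LOWER_NON_RECOVERABLE,
})


def is_critical_or_worse(value: int) -> bool:
    """Return True when the numeric value maps to a critical or non-recoverable State.

    ThresholdState(value) raises ValueError for a number that is not in the table.
    """
    return ThresholdState(value) in _CRITICAL_OR_WORSE


if __name__ == "__main__":
    for sample in (1, 3, 4, 7):
        print(sample, ThresholdState(sample).name, is_critical_or_worse(sample))
```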

Power state:
Indicates the current power status of this fan present in this chassis.

Interpretation: The States reported by this measure and their corresponding numeric equivalents
are described in the table below:

Numeric Value    State
0                Unknown
1                On
2                Test
3                Off
4                Online
5                Offline
6                Offduty
7                Degraded
8                Power-save
9                Error
10               Not-supported

Note:
By default, this measure reports the above-mentioned States while indicating the power status
of a fan. However, in the graph of this measure, states will be represented using their
corresponding numeric equivalents only.

Presence state:
Indicates whether this fan currently exists in this chassis or not.

Interpretation: The State values reported by this measure and their corresponding numeric
equivalents are described in the table below:

Numeric Value    State
0                Unknown
1                Empty
10               Equipped
11               Missing
12               Mismatch
13               Equipped-not-primary
20               Equipped-identity-unestablishable
30               Inaccessible
40               Unauthorized
100              Not-supported

Note:
By default, this measure reports the above-mentioned States while indicating the current state
of a fan. However, in the graph of this measure, states will be represented using their
corresponding numeric equivalents only.

Thermal state:
Indicates the current thermal state of this fan present in this chassis.

Interpretation: The State values reported by this measure and their corresponding numeric
equivalents are described in the table below:

Numeric Value    State
0                Unknown
1                Ok
2                Upper-non-recoverable
3                Upper-critical
4                Upper-non-critical
5                Lower-non-critical
6                Lower-critical
7                Lower-non-recoverable
100              Not-supported

Note:
By default, this measure reports the above-mentioned States while indicating the thermal state
of a fan. However, in the graph of this measure, states will be represented using their
corresponding numeric equivalents only.

Voltage state:
Indicates the current voltage state of this fan present in this chassis.

Interpretation: The State values reported by this measure and their corresponding numeric
equivalents are described in the table below:

Numeric Value    State
0                Unknown
1                Ok
2                Upper-non-recoverable
3                Upper-critical
4                Upper-non-critical
5                Lower-non-critical
6                Lower-critical
7                Lower-non-recoverable
100              Not-supported

Note:
By default, this measure reports the above-mentioned States while indicating the voltage state
of a fan. However, in the graph of this measure, states will be represented using their
corresponding numeric equivalents only.

Speed:
Indicates the speed at which this fan currently operates.

Measurement Unit: RPM

Interpretation: Ideally, the speed of the fans must be within normal limits.

The detailed diagnosis of the Overall status measure reveals the Time, ID, PID, Module, Revision,
Serial Number, Tray and Vendor attributes for each fan in each chassis.

Figure 5: The detailed diagnosis of the Overall status measure of the Chassis Fans test

1.1.3 Chassis Details Test

The Cisco UCS 5100 Series Blade Server Chassis is a scalable and flexible blade server chassis for data
centers. The chassis can house up to eight half-width Cisco UCS B-Series Blade Servers and can
accommodate both half- and full-width blade form factors. Four single-phase, hot-swappable power
supplies are accessible from the front of the chassis. These power supplies are 92 percent efficient and
can be configured to support nonredundant, N+1 redundant, and grid-redundant configurations. The
rear of the chassis contains eight hot-swappable fans, four power connectors (one per power supply),
and two I/O bays for Cisco UCS 2104XP I/O modules. A passive midplane provides up to 20 Gbps of
I/O bandwidth per server slot and up to 40 Gbps of I/O bandwidth for two slots.

Figure 6: A Cisco UCS Blade Server Chassis
A Cisco UCS can support multiple chassis, each with two fabric extenders for redundancy.
By running periodic health checks on each chassis managed by a Cisco UCS manager, you can
promptly identify the following:

• The chassis that is currently in an abnormal operational state;
• The insufficiently licensed chassis;
• Empty/missing chassis;
• The chassis that is experiencing serious power failures;
• The chassis with fans that are in a critical thermal state;
• The chassis that is handling unusually high input and output power.

Purpose: Runs periodic health checks on each chassis supported by a Cisco UCS to promptly
identify the following:

• The chassis that is currently in an abnormal operational state;
• The insufficiently licensed chassis;
• Empty/missing chassis;
• The chassis that is experiencing serious power failures;
• The chassis with fans that are in a critical thermal state;
• The chassis that is handling unusually high input and output power

Target of the test: A Cisco UCS manager

Agent deploying the test: A remote agent

Configurable parameters for the test:

1. TESTPERIOD – How often the test should be executed.
2. HOST – The IP address of the host for which the test is being configured.
3. PORT – The port at which the specified HOST listens. By default, this is NULL.
4. UCS USER and UCS PASSWORD – Provide the credentials of a user with at least
read-only privileges to the target Cisco UCS manager.
5. CONFIRM PASSWORD – Confirm the password by retyping it here.
6. SSL – By default, the Cisco UCS manager is SSL-enabled. Accordingly, the SSL
flag is set to Yes by default.
7. WEB PORT – By default, in most virtualized environments, Cisco UCS manager
listens on port 80 (if not SSL-enabled) or on port 443 (if SSL-enabled) only. This
implies that, by default, the eG agent connects to Cisco UCS manager on port 80 if
the SSL flag above is set to No, and on port 443 if the SSL flag is set to Yes.
Accordingly, the WEBPORT parameter is set to default by default.
In some environments, however, the default ports 80 or 443 might not apply. In
such a case, specify against the WEBPORT parameter the exact port at which the
Cisco UCS manager in your environment listens, so that the eG agent communicates
with that port for collecting metrics from the Cisco UCS manager.
8. DETAILED DIAGNOSIS – To make diagnosis more efficient and accurate, the eG
Enterprise suite embeds an optional detailed diagnostic capability. With this
capability, the eG agents can be configured to run detailed, more elaborate tests
as and when specific problems are detected. To enable the detailed diagnosis
capability of this test for a particular server, choose the On option. To disable
the capability, choose the Off option.
The option to selectively enable/disable the detailed diagnosis capability will be
available only if the following conditions are fulfilled:

• The eG manager license should allow the detailed diagnosis capability
• Neither the normal nor the abnormal frequency configured for the detailed
diagnosis measures should be 0.
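
The two conditions that govern the DETAILED DIAGNOSIS option amount to a simple logical AND. The Python sketch below shows that check. It is illustrative only: the function name and its parameters are hypothetical, and it assumes that the frequencies are available as plain integers.

```python
def detailed_diagnosis_option_available(
    license_allows_dd: bool,
    normal_frequency: int,
    abnormal_frequency: int,
) -> bool:
    """Return True when both conditions listed above are satisfied.

    Hypothetical helper: the license must permit detailed diagnosis, and
    neither the normal nor the abnormal frequency may be 0.
    """
    frequencies_set = normal_frequency != 0 and abnormal_frequency != 0
    return license_allows_dd and frequencies_set


if __name__ == "__main__":
    print(detailed_diagnosis_option_available(True, 1, 1))
    print(detailed_diagnosis_option_available(True, 0, 1))
    print(detailed_diagnosis_option_available(False, 1, 1))
```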

Outputs of the test: One set of results for each chassis managed by the Cisco UCS manager being
monitored

Measurements made by the test (Measurement, Measurement Unit, Interpretation):

Administrative state:
Indicates the current administrative status of this chassis.

Interpretation: The States reported by this measure and their corresponding numeric equivalents
are described in the table below:

Numeric Value    State
1                Acknowledged
2                Re-acknowledged
3                Decommission
4                Remove

Note:
By default, this measure reports the above-mentioned States while indicating the administrative
state of a chassis. However, in the graph of this measure, states will be represented using
their corresponding numeric equivalents only.
The detailed diagnosis of this measure provides the Time, ID, PID, Module, Revision, Serial
Number, Tray and Vendor attributes for each chassis.

Configuration state:
Indicates the current configuration state of this chassis.

Interpretation: The States reported by this measure and their corresponding numeric equivalents
are described in the table below:

Numeric Value    State
0                Un-initialized
1                Unacknowledged
2                Unsupported-connectivity
3                Ok
4                Removing

Note:
By default, this measure reports the above-mentioned States while indicating the configuration
state of a chassis. However, in the graph of this measure, states will be represented using
their corresponding numeric equivalents only.

License state:
Indicates the current license status of this chassis.

Interpretation: The States reported by this measure and their corresponding numeric equivalents
are described in the table below:

Numeric Value    State
0                Unknown
1                License-ok
2                License-insufficient

Note:
By default, this measure reports the above-mentioned States while indicating the license state
of a chassis. However, in the graph of this measure, states will be represented using their
corresponding numeric equivalents only.

Overall status:
Indicates the overall status of this chassis.

Interpretation: The States reported by this measure and their corresponding numeric equivalents
are described in the table below:

Numeric Value    State
0                Unknown
1                Operable
2                Inoperable
3                Degraded
4                Powered-off
5                Power-problem
6                Removed
7                Voltage-problem
8                Thermal-problem
9                Performance-problem
10               Accessibility-problem
11               Identity-unestablishable
12               Bios-post-timeout
13               Disabled
51               Fabric-conn-problem
52               Fabric-unsupported-conn
81               Config
82               Equipment-problem
83               Decommissioning
84               Chassis-limit-exceeded
100              Not-supported
101              Discovery
102              Discovery-failed
103              Identify
104              Post-failure
105              Upgrade-problem
106              Peer-comm-problem
107              Auto-upgrade

Note:
By default, this measure reports the above-mentioned States while indicating the overall status
of a chassis. However, in the graph of this measure, states will be represented using their
corresponding numeric equivalents only.

Operability:
Indicates the current operating state of this chassis.

Interpretation: The States reported by this measure and their corresponding numeric equivalents
are described in the table below:

Numeric Value    State
0                Unknown
1                Operable
2                Inoperable
3                Degraded
4                Powered-off
5                Power-problem
6                Removed
7                Voltage-problem
8                Thermal-problem
9                Performance-problem
10               Accessibility-problem
11               Identity-unestablishable
12               Bios-post-timeout
13               Disabled
51               Fabric-conn-problem
52               Fabric-unsupported-conn
81               Config
82               Equipment-problem
83               Decommissioning
84               Chassis-limit-exceeded
100              Not-supported
101              Discovery
102              Discovery-failed
103              Identify
104              Post-failure
105              Upgrade-problem
106              Peer-comm-problem
107              Auto-upgrade

Note:
By default, this measure reports the above-mentioned States while indicating the operability
state of a chassis. However, in the graph of this measure, states will be represented using
their corresponding numeric equivalents only.

Power state:
Indicates the current power status of this chassis.

Interpretation: The States reported by this measure and their corresponding numeric equivalents
are described in the table below:

Numeric Value    State
0                Unknown
1                Ok
2                Failed
3                Input-failed
4                Input-degraded
5                Output-failed
6                Output-degraded
7                Redundancy-failed
8                Redundancy-degraded

Note:
By default, this measure reports the above-mentioned States while indicating the power status
of a chassis. However, in the graph of this measure, states will be represented using their
corresponding numeric equivalents only.

Presence state:
Indicates the current status of this chassis.

Interpretation: The State values reported by this measure and their corresponding numeric
equivalents are described in the table below:

Numeric Value    State
0                Unknown
1                Empty
10               Equipped
11               Missing
12               Mismatch
13               Equipped-not-primary
20               Equipped-identity-unestablishable
30               Inaccessible
40               Unauthorized
100              Not-supported

Note:
By default, this measure reports the above-mentioned States while indicating the current state
of a chassis. However, in the graph of this measure, states will be represented using their
corresponding numeric equivalents only.

Thermal state:
Indicates the current thermal state of this chassis.

Interpretation: The State values reported by this measure and their corresponding numeric
equivalents are described in the table below:

Numeric Value    State
0                Unknown
1                Ok
2                Upper-non-recoverable
3                Upper-critical
4                Upper-non-critical
5                Lower-non-critical
6                Lower-critical
7                Lower-non-recoverable
100              Not-supported

Note:
By default, this measure reports the above-mentioned States while indicating the thermal state
of a chassis. However, in the graph of this measure, states will be represented using their
corresponding numeric equivalents only.

Input power:
Indicates the current input power of this chassis.

Measurement Unit: Watts

Interpretation: An abnormally high or low power may cause serious damage to the hardware
components of the chassis. Therefore, the value of this measure should be low.

Output power:
Indicates the current output power of this chassis.

Measurement Unit: Watts

Interpretation: Ideally, the value of this measure should be low.

The detailed diagnosis of the Administrative state measure provides the Time, ID, PID, Module,
Revision, Serial Number, Tray and Vendor attributes for each chassis.

Figure 7: The detailed diagnosis of the Administrative state measure of the Chassis Details test

1.1.4 Chassis Fan Modules Test

The Cisco UCS Blade server chassis contains eight hot-swappable fan modules. These fan modules
ensure that the internals of the chassis always receive adequate air flow and the temperature within
the chassis is maintained at acceptable levels at all times. Snags in the functioning of the fan module
can hence hamper air flow, which in turn may have disastrous effects on the health of the other
chassis components.
By periodically monitoring the availability, overall health, operational state, and exhaust
temperature of each fan module, you can promptly detect abnormalities in the operations of the
module and initiate speedy remedial measures. This test does just that.

Purpose: Periodically monitors the availability, overall health, operational state, and the
exhaust temperature of each fan module, and promptly detects abnormalities in the
operations of the module, so that remedial measures can be swiftly initiated

Target of the test: A Cisco UCS manager

Agent deploying the test: A remote agent

Configurable parameters for the test:

1. TESTPERIOD – How often the test should be executed.
2. HOST – The IP address of the host for which the test is being configured.
3. PORT – The port at which the specified HOST listens. By default, this is NULL.
4. UCS USER and UCS PASSWORD – Provide the credentials of a user with at least
read-only privileges to the target Cisco UCS manager.
5. CONFIRM PASSWORD – Confirm the password by retyping it here.
6. SSL – By default, the Cisco UCS manager is SSL-enabled. Accordingly, the SSL
flag is set to Yes by default.
7. WEB PORT – By default, in most virtualized environments, Cisco UCS manager
listens on port 80 (if not SSL-enabled) or on port 443 (if SSL-enabled) only. This
implies that, by default, the eG agent connects to Cisco UCS manager on port 80 if
the SSL flag above is set to No, and on port 443 if the SSL flag is set to Yes.
Accordingly, the WEBPORT parameter is set to default by default.
In some environments, however, the default ports 80 or 443 might not apply. In
such a case, specify against the WEBPORT parameter the exact port at which the
Cisco UCS manager in your environment listens, so that the eG agent communicates
with that port for collecting metrics from the Cisco UCS manager.
8. DETAILED DIAGNOSIS – To make diagnosis more efficient and accurate, the eG
Enterprise suite embeds an optional detailed diagnostic capability. With this
capability, the eG agents can be configured to run detailed, more elaborate tests
as and when specific problems are detected. To enable the detailed diagnosis
capability of this test for a particular server, choose the On option. To disable
the capability, choose the Off option.
The option to selectively enable/disable the detailed diagnosis capability will be
available only if the following conditions are fulfilled:

• The eG manager license should allow the detailed diagnosis capability
• Neither the normal nor the abnormal frequency configured for the detailed
diagnosis measures should be 0.

Outputs of the test: One set of results for each fan module available in each chassis managed by
the Cisco UCS manager being monitored

Measurements made by the test (Measurement, Measurement Unit, Interpretation):

Overall status:
Indicates the overall status of this fan module present in this chassis.

Interpretation: The States reported by this measure and their corresponding numeric equivalents
are described in the table below:

Numeric Value    State
0                Unknown
1                Operable
2                Inoperable
3                Degraded
4                Powered-off
5                Power-problem
6                Removed
7                Voltage-problem
8                Thermal-problem
9                Performance-problem
10               Accessibility-problem
11               Identity-unestablishable
12               Bios-post-timeout
13               Disabled
51               Fabric-conn-problem
52               Fabric-unsupported-conn
81               Config
82               Equipment-problem
83               Decommissioning
84               Chassis-limit-exceeded
100              Not-supported
101              Discovery
102              Discovery-failed
103              Identify
104              Post-failure
105              Upgrade-problem
106              Peer-comm-problem
107              Auto-upgrade

Note:
By default, this measure reports the above-mentioned States while indicating the overall status
of a fan module. However, in the graph of this measure, states will be represented using their
corresponding numeric equivalents only.
The detailed diagnosis of this measure provides the Time, ID, PID, Module, Revision, Serial
Number, Tray and Vendor attributes for the fan module.

Operability:
Indicates the current operating state of this fan module in this chassis.

Interpretation: The States reported by this measure and their corresponding numeric equivalents
are described in the table below:

Numeric Value    State
0                Unknown
1                Operable
2                Inoperable
3                Degraded
4                Powered-off
5                Power-problem
6                Removed
7                Voltage-problem
8                Thermal-problem
9                Performance-problem
10               Accessibility-problem
11               Identity-unestablishable
12               Bios-post-timeout
13               Disabled
51               Fabric-conn-problem
52               Fabric-unsupported-conn
81               Config
82               Equipment-problem
83               Decommissioning
84               Chassis-limit-exceeded
100              Not-supported
101              Discovery
102              Discovery-failed
103              Identify
104              Post-failure
105              Upgrade-problem
106              Peer-comm-problem
107              Auto-upgrade

Note:
By default, this measure reports the above-mentioned States while indicating the operating
state of a fan module. However, in the graph of this measure, states will be represented using
their corresponding numeric equivalents only.

Performance state:
Indicates the current performance state of this fan module in this chassis.

Interpretation: The States reported by this measure and their corresponding numeric equivalents
are described in the table below:

Numeric Value    State
0                Unknown
1                Ok
2                Upper-non-recoverable
3                Upper-critical
4                Upper-non-critical
5                Lower-non-critical
6                Lower-critical
7                Lower-non-recoverable
100              Not-supported

Note:
By default, this measure reports the above-mentioned States while indicating the performance
state of a fan module. However, in the graph of this measure, states will be represented using
their corresponding numeric equivalents only.

Power state:
Indicates the current power state of this fan module in this chassis.

Interpretation: The States reported by this measure and their corresponding numeric equivalents
are described in the table below:

Numeric Value    State
0                Unknown
1                On
2                Test
3                Off
4                Online
5                Offline
6                Offduty
7                Degraded
8                Power-save
9                Error
10               Not-supported

Note:
By default, this measure reports the above-mentioned States while indicating the power state
of a fan module. However, in the graph of this measure, states will be represented using their
corresponding numeric equivalents only.

Presence state:
Indicates whether or not this fan module currently exists in this chassis.

Interpretation: The States reported by this measure and their corresponding numeric equivalents
are described in the table below:

Numeric Value    State
0                Unknown
1                Empty
10               Equipped
11               Missing
12               Mismatch
13               Equipped-not-primary
20               Equipped-identity-unestablishable
30               Inaccessible
40               Unauthorized
100              Not-supported

Note:
By default, this measure reports the above-mentioned States while indicating the existence of
a fan module in a chassis. However, in the graph of this measure, states will be represented
using their corresponding numeric equivalents only.

Thermal state:
Indicates the current thermal state of this fan module present in this chassis.

Interpretation: The States reported by this measure and their corresponding numeric equivalents
are described in the table below:

Numeric Value    State
0                Unknown
1                Ok
2                Upper-non-recoverable
3                Upper-critical
4                Upper-non-critical
5                Lower-non-critical
6                Lower-critical
7                Lower-non-recoverable
100              Not-supported

Note:
By default, this measure reports the above-mentioned States while indicating the current
thermal state of a fan module. However, in the graph of this measure, states will be
represented using their corresponding numeric equivalents only.

Exhaust temperature:
Indicates the current exhaust temperature of the fans present in this fan module in this
chassis.

Interpretation: Ideally, the value of this measure should be low, as an abnormal temperature
can cause damage to the fans in a module.

The detailed diagnosis of the Overall status measure provides the Time, ID, PID, Module, Revision,
Serial Number, Tray and Vendor attributes for the fan module.

Figure 8: The detailed diagnosis of the Overall status measure of the Chassis Fan Modules test

1.1.5 Chassis IO Module Backplane Ports Test

The Cisco UCS chassis supports eight blade slots, and each blade has two Intel Xeon "Nehalem"
processors and up to 96GB of RAM. The chassis also has two SAS drive slots and a RAID controller,
plus a connection to the backplane. The chassis is responsible for providing support infrastructure to
blades via the backplane connection.
A backplane is a circuit board (usually a printed circuit board) that connects several connectors in
parallel to each other, so that each pin of each connector is linked to the same relative pin of all the
other connectors forming a computer bus. It is used as a backbone to connect several printed circuit
boards together to make up a complete computer system.
In Cisco UCS, all network traffic flows over FCoE directly from the chassis backplane to an FI (Fabric
Interconnect) device.
To make sure that the blades in the chassis receive prompt and uninterrupted networking services,
you need to frequently check whether the backplane ports of the chassis are available and operational.
The Chassis IO Module Backplane Ports test makes this verification possible. At pre-configured intervals,
this test monitors the health of each of the backplane ports in every I/O module of a chassis, and
reports whether they are operational or not. Backplane ports experiencing errors, hardware failures,
or software failures can thus be identified quickly and accurately.

Purpose: Monitors the health of each of the backplane ports in every I/O module of a chassis,
and reports whether they are operational or not

Target of the test: A Cisco UCS manager

Agent deploying the test: A remote agent

Configurable parameters for the test:

1. TESTPERIOD – How often the test should be executed.
2. HOST – The IP address of the host for which the test is being configured.
3. PORT – The port at which the specified HOST listens. By default, this is NULL.
4. UCS USER and UCS PASSWORD – Provide the credentials of a user with at least
read-only privileges to the target Cisco UCS manager.
5. CONFIRM PASSWORD – Confirm the password by retyping it here.
6. SSL – By default, the Cisco UCS manager is SSL-enabled. Accordingly, the SSL
flag is set to Yes by default.
7. WEB PORT – By default, in most virtualized environments, Cisco UCS manager
listens on port 80 (if not SSL-enabled) or on port 443 (if SSL-enabled) only.
Accordingly, if the SSL flag above is set to No, the eG agent connects to Cisco
UCS manager using port 80 by default, and if the SSL flag is set to Yes, the
agent connects via port 443 by default. This is why the WEBPORT parameter is
set to default by default.
In some environments however, the default ports 80 or 443 might not apply. In
such a case, against the WEBPORT parameter, specify the exact port at which the
Cisco UCS manager in your environment listens, so that the eG agent
communicates with that port for collecting metrics from the Cisco UCS manager.
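
The port selection described for the SSL and WEBPORT parameters can be sketched as follows. This is only an illustration in Python, not eG agent code: the function names resolve_web_port and base_url are hypothetical, and the sketch assumes that the literal value default is what WEBPORT holds when no explicit port has been specified.

```python
from typing import Union


def resolve_web_port(ssl: bool, webport: Union[int, str] = "default") -> int:
    """Return the port to contact, following the SSL / WEBPORT rules above.

    Hypothetical helper: 'default' means 443 when SSL is Yes, 80 when SSL is No.
    """
    if isinstance(webport, str) and webport.strip().lower() == "default":
        return 443 if ssl else 80
    port = int(webport)  # an explicit port overrides the SSL-based default
    if not 1 <= port <= 65535:
        raise ValueError("WEBPORT out of range: {}".format(port))
    return port


def base_url(host: str, ssl: bool, webport: Union[int, str] = "default") -> str:
    """Build the scheme://host:port prefix implied by the three parameters."""
    scheme = "https" if ssl else "http"
    return "{}://{}:{}".format(scheme, host, resolve_web_port(ssl, webport))
```

For example, base_url("10.0.0.5", True) yields an https URL on port 443, while base_url("10.0.0.5", True, 8443) keeps SSL but uses the explicitly configured port.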

Outputs of the test: One set of results for each backplane port in each I/O module of every Cisco UCS
chassis managed by the Cisco UCS manager being monitored

Measurements made by the test:

Measurement: Overall status
Description: Indicates the overall status of this backplane port.
Interpretation: The States reported by this measure and their corresponding numeric equivalents are
described in the table below:

    Numeric Value    State
    0                Indeterminate
    1                Up
    2                Admin-down
    3                Link-down
    4                Failed
    5                No-license
    6                Link-up
    7                Hardware-failure
    8                Software-failure
    9                Error-disabled
    10               Sfp-not-present

Note:
By default, this measure reports the above-mentioned States while indicating the overall
health of a backplane port. However, in the graph of this measure, states will be
represented using their corresponding numeric equivalents only.
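
Since the graph of the Overall status measure shows only numbers, it can help to translate graphed values back into state names. The following Python sketch does this using the table above; the names BACKPLANE_PORT_STATES and decode_series are hypothetical, and the handling of values outside the table is an assumption made for the example.

```python
from typing import Iterable, List

# Numeric value -> State, copied from the Overall status table above.
BACKPLANE_PORT_STATES = {
    0: "Indeterminate",
    1: "Up",
    2: "Admin-down",
    3: "Link-down",
    4: "Failed",
    5: "No-license",
    6: "Link-up",
    7: "Hardware-failure",
    8: "Software-failure",
    9: "Error-disabled",
    10: "Sfp-not-present",
}


def decode_series(values: Iterable[int]) -> List[str]:
    """Translate graphed numeric values into their state names.

    Values that do not appear in the table are labelled rather than dropped
    (an assumption of this sketch).
    """
    return [
        BACKPLANE_PORT_STATES.get(v, "Unrecognized ({})".format(v))
        for v in values
    ]
```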

1.1.6 Chassis PSUs Test

A Cisco UCS Blade Server Chassis can be provided with up to four 2500 Watt hot-swappable power
supplies.
As issues in the power supply units can adversely impact the performance of the blades in a chassis,
administrators need to promptly detect power-related issues and rectify them before any irreparable
damage is done. This test aids in the timely detection of the following anomalies related to PSUs:

• Abnormalities in the overall PSU health;
• Operational deficiencies;
• Critical performance setbacks;
• Unrecoverable power/thermal/voltage failures;
• Disturbing rise in temperature;
• Input/output voltage, current, and power that exceed permissible limits.

Purpose: Aids in the timely detection of the following anomalies related to PSUs:

• Abnormalities in the overall PSU health;
• Operational deficiencies;
• Critical performance setbacks;
• Unrecoverable power/thermal/voltage failures;
• Disturbing rise in temperature;
• Input/output voltage, current, and power that exceed permissible limits.

Target of the test: A Cisco UCS manager

Agent deploying the test: A remote agent

Configurable parameters for the test:

1. TESTPERIOD – How often the test should be executed.
2. HOST – The IP address of the host for which the test is being configured.
3. PORT – The port at which the specified HOST listens. By default, this is NULL.
4. UCS USER and UCS PASSWORD – Provide the credentials of a user with at least
read-only privileges to the target Cisco UCS manager.
5. CONFIRM PASSWORD – Confirm the password by retyping it here.
6. SSL – By default, the Cisco UCS manager is SSL-enabled. Accordingly, the SSL
flag is set to Yes by default.
7. WEB PORT – By default, in most virtualized environments, Cisco UCS manager
listens on port 80 (if not SSL-enabled) or on port 443 (if SSL-enabled) only.
Accordingly, if the SSL flag above is set to No, the eG agent connects to Cisco
UCS manager using port 80 by default, and if the SSL flag is set to Yes, the
agent connects via port 443 by default. This is why the WEBPORT parameter is
set to default by default.
In some environments however, the default ports 80 or 443 might not apply. In
such a case, against the WEBPORT parameter, specify the exact port at which the
Cisco UCS manager in your environment listens, so that the eG agent
communicates with that port for collecting metrics from the Cisco UCS manager.
8. DETAILED DIAGNOSIS – To make diagnosis more efficient and accurate, the eG
Enterprise suite embeds an optional detailed diagnostic capability. With this
capability, the eG agents can be configured to run detailed, more elaborate tests
as and when specific problems are detected. To enable the detailed diagnosis
capability of this test for a particular server, choose the On option. To disable
the capability, click on the Off option.
The option to selectively enable/disable the detailed diagnosis capability will be
available only if the following conditions are fulfilled:

• The eG manager license should allow the detailed diagnosis capability
• Both the normal and abnormal frequencies configured for the detailed
diagnosis measures should not be 0.
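
The two conditions above can be read as a simple check. The Python sketch below illustrates that reading, taking "should not be 0" to mean that neither frequency may be 0. The function and argument names are hypothetical and do not correspond to actual eG Enterprise settings.

```python
def detailed_diagnosis_toggle_available(
    license_allows_dd: bool,
    normal_frequency: int,
    abnormal_frequency: int,
) -> bool:
    """Return True when the On/Off option for DETAILED DIAGNOSIS would be offered.

    Assumed reading of the conditions: the license must permit detailed
    diagnosis, and neither the normal nor the abnormal frequency may be 0.
    """
    frequencies_set = normal_frequency != 0 and abnormal_frequency != 0
    return license_allows_dd and frequencies_set
```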

Outputs of the test: One set of results for each PSU in each chassis managed by the Cisco UCS manager
being monitored

Measurements made by the test:

Measurement: Overall status
Description: Indicates the overall status of this PSU in this chassis.
Interpretation: The States reported by this measure and their corresponding numeric equivalents are
described in the table below:

    Numeric Value    State
    0                Unknown
    1                Operable
    2                Inoperable
    3                Degraded
    4                Powered-off
    5                Power-problem
    6                Removed
    7                Voltage-problem
    8                Thermal-problem
    9                Performance-problem
    10               Accessibility-problem
    11               Identity-unestablishable
    12               Bios-post-timeout
    13               Disabled
    51               Fabric-conn-problem
    52               Fabric-unsupported-conn
    81               Config
    82               Equipment-problem
    83               Decommissioning
    84               Chassis-limit-exceeded
    100              Not-supported
    101              Discovery
    102              Discovery-failed
    103              Identify
    104              Post-failure
    105              Upgrade-problem
    106              Peer-comm-problem
    107              Auto-upgrade

Note:
By default, this measure reports the above-mentioned States while indicating the overall
status of a PSU. However, in the graph of this measure, states will be represented
using their corresponding numeric equivalents only.
The detailed diagnosis of this measure provides the Time, ID, PID, Revision, Serial
Number and Vendor attributes for the PSU.

Measurement: Operability
Description: Indicates the current operating state of this PSU in this chassis.
Interpretation: The States reported by this measure and their corresponding numeric equivalents are
described in the table below:

    Numeric Value    State
    0                Unknown
    1                Operable
    2                Inoperable
    3                Degraded
    4                Powered-off
    5                Power-problem
    6                Removed
    7                Voltage-problem
    8                Thermal-problem
    9                Performance-problem
    10               Accessibility-problem
    11               Identity-unestablishable
    12               Bios-post-timeout
    13               Disabled
    51               Fabric-conn-problem
    52               Fabric-unsupported-conn
    81               Config
    82               Equipment-problem
    83               Decommissioning
    84               Chassis-limit-exceeded
    100              Not-supported
    101              Discovery
    102              Discovery-failed
    103              Identify
    104              Post-failure
    105              Upgrade-problem
    106              Peer-comm-problem
    107              Auto-upgrade

Note:
By default, this measure reports the above-mentioned States while indicating the
operational state of a PSU. However, in the graph of this measure, states will be
represented using their corresponding numeric equivalents only.

Measurement: Performance state
Description: Indicates the current performance state of this PSU in this chassis.
Interpretation: The States reported by this measure and their corresponding numeric equivalents are
described in the table below:

    Numeric Value    State
    0                Unknown
    1                Ok
    2                Upper-nonrecoverable

Measurement: Power state
Description: Indicates the current power state of this PSU in this chassis.
Interpretation: The States reported by this measure and their corresponding numeric equivalents are
described in the table below:

    Numeric Value    State
    0                Unknown
    1                Ok
    2                Upper-nonrecoverable

Measurement: Presence state
Description: Indicates the current state of this PSU in this chassis.
Interpretation: The States reported by this measure and their corresponding numeric equivalents are
described in the table below:

    Numeric Value    State
    0                Unknown
    1                Ok
    2                Upper-nonrecoverable

Measurement: Thermal state
Description: Indicates the current thermal state of this PSU in this chassis.
Interpretation: The States reported by this measure and their corresponding numeric equivalents are
described in the table below:

    Numeric Value    State
    0                Unknown
    1                Ok
    2                Upper-nonrecoverable

Measurement: Voltage state
Description: Indicates the current voltage state of this PSU in this chassis.
Interpretation: The States reported by this measure and their corresponding numeric equivalents are
described in the table below:

    Numeric Value    State
    0                Unknown
    1                Ok
    2                Upper-nonrecoverable

Measurement: Internal temperature
Description: Indicates the current internal temperature of this PSU in this chassis.
Measurement Unit: Celsius
Interpretation: A high temperature is a cause for concern, as it may cause severe damage to the
PSUs, which in turn may degrade the performance of the blade server chassis.

Measurement: Input210v
Description: Indicates the current input voltage of this PSU in this chassis.
Measurement Unit: Volts
Interpretation: Any value higher than 210 volts could indicate a problem condition that may
require further investigation.

Measurement: Output12v
Description: Indicates the current output voltage of this PSU in this chassis.
Measurement Unit: Volts
Interpretation: Any value higher than 12 volts could indicate a problem condition that may
require further investigation.

Measurement: Output3v3
Description: Indicates the current output voltage of this PSU in this chassis.
Measurement Unit: Volts
Interpretation: Any value higher than 3.3 volts could indicate a problem condition that may
require further investigation.

Measurement: Output current
Description: Indicates the output current of this PSU in this chassis.
Measurement Unit: Amps
Interpretation: Ideally, the value of this measure should be low. A sudden/consistent increase in this
value could warrant an investigation.

Measurement: Output power
Description: Indicates the output power of this PSU in this chassis.
Measurement Unit: Watts
Interpretation: Ideally, the value of this measure should be low. A sudden/consistent increase in this
value could warrant an investigation.
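
The voltage guidance for the Input210v, Output12v and Output3v3 measures amounts to comparing each reading against a fixed limit. The Python sketch below illustrates this; the limits are the ones quoted for these measures, while the names PSU_VOLTAGE_LIMITS and voltage_findings and the shape of the readings dictionary are assumptions made for the example.

```python
from typing import Dict, List

# Limits in volts, as quoted in the interpretation of each measure.
PSU_VOLTAGE_LIMITS = {
    "Input210v": 210.0,
    "Output12v": 12.0,
    "Output3v3": 3.3,
}


def voltage_findings(readings: Dict[str, float]) -> List[str]:
    """List the voltage measures whose reading is higher than its limit.

    Measures missing from 'readings' are skipped; a reading equal to the
    limit is not flagged, since only values higher than it are of concern.
    """
    findings = []
    for measure, limit in PSU_VOLTAGE_LIMITS.items():
        value = readings.get(measure)
        if value is not None and value > limit:
            findings.append("{}: {} V exceeds {} V".format(measure, value, limit))
    return findings
```

A flagged measure is only a prompt for further investigation, in line with the interpretation given for these measures.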

The detailed diagnosis of the Overall status measure provides the Time, ID, PID, Revision, Serial
Number and Vendor attributes for the PSU.

Figure 9: The detailed diagnosis of the Overall status measure of the Chassis PSUs test

1.1.7 Chassis IO Module Fabric Ports Test

A typical Cisco UCS system supports up to two I/O modules, each configured with four ports of 10-Gb
Ethernet, Cisco Data Center Ethernet, and Fibre Channel over Ethernet (FCoE) connection to the fabric
interconnect. Since the I/O module acts as a bridge between the UCS blades and the fabric
interconnect, all Ethernet connections to the fabric interconnect will get suspended if one or more
ports are rendered unavailable or non-operational for a brief period. It is hence imperative that
administrators be promptly alerted when the I/O module ports start behaving abnormally, so that
remedial measures can be initiated immediately to avoid a prolonged port outage. This test
monitors the overall health and availability of each of the ports in every I/O module, and sends out
proactive alerts on potential performance anomalies.

Purpose: Monitors the overall health and availability of each of the ports in every I/O module,
and sends out proactive alerts to potential performance anomalies

Target of the test: A Cisco UCS manager

Agent deploying the test: A remote agent

Configurable parameters for the test:

1. TESTPERIOD – How often the test should be executed.
2. HOST – The IP address of the host for which the test is being configured.
3. PORT – The port at which the specified HOST listens. By default, this is NULL.
4. UCS USER and UCS PASSWORD – Provide the credentials of a user with at least
read-only privileges to the target Cisco UCS manager.
5. CONFIRM PASSWORD – Confirm the password by retyping it here.
6. SSL – By default, the Cisco UCS manager is SSL-enabled. Accordingly, the SSL
flag is set to Yes by default.
7. WEB PORT – By default, in most virtualized environments, Cisco UCS manager
listens on port 80 (if not SSL-enabled) or on port 443 (if SSL-enabled) only.
Accordingly, if the SSL flag above is set to No, the eG agent connects to Cisco
UCS manager using port 80 by default, and if the SSL flag is set to Yes, the
agent connects via port 443 by default. This is why the WEBPORT parameter is
set to default by default.
In some environments however, the default ports 80 or 443 might not apply. In
such a case, against the WEBPORT parameter, specify the exact port at which the
Cisco UCS manager in your environment listens, so that the eG agent
communicates with that port for collecting metrics from the Cisco UCS manager.
8. DETAILED DIAGNOSIS – To make diagnosis more efficient and accurate, the eG
Enterprise suite embeds an optional detailed diagnostic capability. With this
capability, the eG agents can be configured to run detailed, more elaborate tests
as and when specific problems are detected. To enable the detailed diagnosis
capability of this test for a particular server, choose the On option. To disable
the capability, click on the Off option.
The option to selectively enable/disable the detailed diagnosis capability will be
available only if the following conditions are fulfilled:

• The eG manager license should allow the detailed diagnosis capability
• Both the normal and abnormal frequencies configured for the detailed
diagnosis measures should not be 0.

Outputs of the test: One set of results for each fabric port in each I/O module of every chassis managed
by the Cisco UCS manager being monitored

Measurements made by the test:

Measurement: Overall status
Description: Indicates the overall status of this port in this I/O module.
Interpretation: The States reported by this measure and their corresponding numeric equivalents are
described in the table below:

    Numeric Value    State
    0                Indeterminate
    1                Up
    2                Admin-down
    3                Link-down
    4                Failed
    5                No-license
    6                Link-up
    7                Hardware-failure
    8                Software-failure
    9                Error-disabled
    10               Sfp-not-present

Note:
By default, this measure reports the above-mentioned States while indicating the overall
status of a port. However, in the graph of this measure, states will be represented
using their corresponding numeric equivalents only.
The detailed diagnosis of this measure reports the Time, ID, Slot ID, Chassis ID, Fabric ID,
Port Type, Role Type, Network Type, Transport Type and Peer details of the I/O
module fabric ports.

Measurement: Acknowledged state
Description: Indicates the current acknowledgment status of this port in this I/O module.
Interpretation: The States reported by this measure and their corresponding numeric equivalents are
described in the table below:

    Numeric Value    State
    1                Un-initialized
    2                Un-acknowledged
    3                Unsupported-connectivity
    4                Ok
    5                Removing

Note:
By default, this measure reports the above-mentioned States while indicating the
acknowledgement state of a port. However, in the graph of this measure, states will be
represented using their corresponding numeric equivalents only.

Measurement: Discovery state
Description: Indicates the current discovered status of this port in this I/O module.
Interpretation: The States reported by this measure and their corresponding numeric equivalents are
described in the table below:

    Numeric Value    State
    0                Absent
    1                Present
    2                Mis-connect
    3                Missing
    4                New

Note:
By default, this measure reports the above-mentioned States while indicating the
discovery state of a port. However, in the graph of this measure, states will be
represented using their corresponding numeric equivalents only.
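
When the Acknowledged state and Discovery state of a fabric port are read together, the numeric values can be modelled as enumerations. The Python sketch below is an illustration only: the class and function names are hypothetical, and treating the combination of Ok and Present as the settled condition is an assumption of this example rather than a rule stated for this test.

```python
from enum import IntEnum


class AckState(IntEnum):
    """Numeric values of the Acknowledged state measure."""
    UN_INITIALIZED = 1
    UN_ACKNOWLEDGED = 2
    UNSUPPORTED_CONNECTIVITY = 3
    OK = 4
    REMOVING = 5


class DiscoveryState(IntEnum):
    """Numeric values of the Discovery state measure."""
    ABSENT = 0
    PRESENT = 1
    MIS_CONNECT = 2
    MISSING = 3
    NEW = 4


def port_is_settled(ack: int, discovery: int) -> bool:
    """True when the port is acknowledged (Ok) and discovered (Present).

    Values outside the two tables raise ValueError, because the enum
    constructors reject numbers they do not define.
    """
    return (
        AckState(ack) is AckState.OK
        and DiscoveryState(discovery) is DiscoveryState.PRESENT
    )
```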

The detailed diagnosis of the Overall status measure reports the Time, ID, Slot ID, Chassis ID, Fabric
ID, Port Type, Role Type, Network Type, Transport Type and Peer details of the I/O module fabric
ports.

Figure 10: The detailed diagnosis of the Overall status measure of the Chassis I/O Module Fabric Ports Test

1.2 The Network Layer

Using the tests mapped to this layer, determine the availability of the Cisco UCS manager over the
network, and quickly isolate latencies while establishing a network connection with the Cisco UCS
manager.

Figure 11: The tests mapped to the Network layer
Since the Network test mapped to this layer has already been dealt with in the Monitoring Unix and
Windows Servers document, let us proceed to take a look at the Fabric Interconnects layer.

1.3 The Fabric Interconnects Layer

A core part of the Cisco Unified Computing System, the Cisco UCS Fabric Interconnects provide both
network connectivity and management capabilities to all attached blades and chassis. The Cisco UCS
Fabric Interconnects offer line-rate, low-latency, lossless 10 Gigabit Ethernet and Fibre Channel over
Ethernet (FCoE) functions.
The interconnects provide the management and communication backbone for the Cisco UCS Blades
and UCS Blade Server Chassis. All chassis, and therefore all blades, attached to the interconnects
become part of a single, highly available management domain. In addition, by supporting unified
fabric, the Cisco UCS Fabric Interconnects provide both the LAN and SAN connectivity for all blades
within their domain.
Typically deployed in redundant pairs, fabric interconnects provide uniform access to both networks
and storage, eliminating the barriers to deploying a fully virtualized environment.
This layer monitors the fabric interconnects and their critical hardware components such as the PSUs,
the uplink and FC ports, and fans, and proactively alerts administrators to potential hardware failures
and operational issues experienced by the fabric interconnects. This way, the layer helps ensure the
continuous availability of the interconnects, and thus helps avoid disruption in communication for
the blades and the blade server chassis.

Figure 12: The tests mapped to the Fabric Interconnects layer

1.3.1 Fabric Interconnect PSUs Test

The Cisco UCS Fabric Interconnect is provided with two front-end slots to support Power Supply
Units. The failure of a power supply unit, if not redressed promptly, can cause short to prolonged
breaks in the availability of the interconnects. Moreover, a sudden and steep rise in the
power/voltage/current handled by a PSU may not only damage that PSU, but also the
associated fabric interconnect. To avoid such adversities, the PSUs supported by each fabric
interconnect should be periodically monitored.
This test monitors the overall health of each PSU supported by every fabric interconnect and promptly
reports abnormalities such as operational issues experienced by the PSUs, critical PSU failures, serious
errors in the power/thermal/voltage state of each PSU, and inexplicable surges in the input
power/voltage/current of a PSU.

Purpose: Monitors the overall health of each PSU supported by every fabric interconnect and
promptly reports abnormalities such as operational issues experienced by the PSUs,
critical PSU failures, serious errors in the power/thermal/voltage state of each PSU,
and inexplicable surges in the input power/voltage/current of a PSU

Target of the test: A Cisco UCS manager

Agent deploying the test: A remote agent

Configurable parameters for the test:

1. TESTPERIOD – How often the test should be executed.
2. HOST – The IP address of the host for which the test is being configured.
3. PORT – The port at which the specified HOST listens. By default, this is NULL.
4. UCS USER and UCS PASSWORD – Provide the credentials of a user with at least
read-only privileges to the target Cisco UCS manager.
5. CONFIRM PASSWORD – Confirm the password by retyping it here.
6. SSL – By default, the Cisco UCS manager is SSL-enabled. Accordingly, the SSL
flag is set to Yes by default.
7. WEB PORT – By default, in most virtualized environments, Cisco UCS manager
listens on port 80 (if not SSL-enabled) or on port 443 (if SSL-enabled) only.
Accordingly, if the SSL flag above is set to No, the eG agent connects to Cisco
UCS manager using port 80 by default, and if the SSL flag is set to Yes, the
agent connects via port 443 by default. This is why the WEBPORT parameter is
set to default by default.
In some environments however, the default ports 80 or 443 might not apply. In
such a case, against the WEBPORT parameter, specify the exact port at which the
Cisco UCS manager in your environment listens, so that the eG agent
communicates with that port for collecting metrics from the Cisco UCS manager.
8. DETAILED DIAGNOSIS – To make diagnosis more efficient and accurate, the eG
Enterprise suite embeds an optional detailed diagnostic capability. With this
capability, the eG agents can be configured to run detailed, more elaborate tests
as and when specific problems are detected. To enable the detailed diagnosis
capability of this test for a particular server, choose the On option. To disable
the capability, click on the Off option.
The option to selectively enable/disable the detailed diagnosis capability will be
available only if the following conditions are fulfilled:

• The eG manager license should allow the detailed diagnosis capability
• Both the normal and abnormal frequencies configured for the detailed
diagnosis measures should not be 0.

Outputs of the test: One set of results for each PSU in each fabric interconnect managed by the Cisco
UCS manager being monitored

Measurements made by the test:

Measurement: Overall status
Description: Indicates the overall status of this PSU in this interconnect.
Interpretation: The States reported by this measure and their corresponding numeric equivalents are
described in the table below:

    Numeric Value    State
    0                Unknown
    1                Operable
    2                Inoperable
    3                Degraded
    4                Powered-off
    5                Power-problem
    6                Removed
    7                Voltage-problem
    8                Thermal-problem
    9                Performance-problem
    10               Accessibility-problem
    11               Identity-unestablishable
    12               Bios-post-timeout
    13               Disabled
    51               Fabric-conn-problem
    52               Fabric-unsupported-conn
    81               Config
    82               Equipment-problem
    83               Decommissioning
    84               Chassis-limit-exceeded
    100              Not-supported
    101              Discovery
    102              Discovery-failed
    103              Identify
    104              Post-failure
    105              Upgrade-problem
    106              Peer-comm-problem
    107              Auto-upgrade

Note:
By default, this measure reports the above-mentioned States while indicating the overall
status of a PSU. However, in the graph of this measure, states will be represented
using their corresponding numeric equivalents only.
The detailed diagnosis of this measure provides the Time, ID, PID, Revision, Serial
Number and Vendor attributes of the Fabric Interconnect PSU.

Measurement: Operability
Description: Indicates the current operating state of this PSU in this fabric interconnect.
Interpretation: The States reported by this measure and their corresponding numeric equivalents are
described in the table below:

    Numeric Value    State
    0                Unknown
    1                Operable
    2                Inoperable
    3                Degraded
    4                Powered-off
    5                Power-problem
    6                Removed
    7                Voltage-problem
    8                Thermal-problem
    9                Performance-problem
    10               Accessibility-problem
    11               Identity-unestablishable
    12               Bios-post-timeout
    13               Disabled
    51               Fabric-conn-problem
    52               Fabric-unsupported-conn
    81               Config
    82               Equipment-problem
    83               Decommissioning
    84               Chassis-limit-exceeded
    100              Not-supported
    101              Discovery
    102              Discovery-failed
    103              Identify
    104              Post-failure
    105              Upgrade-problem
    106              Peer-comm-problem
    107              Auto-upgrade

Note:
By default, this measure reports the above-mentioned States while indicating the
operational state of a PSU. However, in the graph of this measure, states will be
represented using their corresponding numeric equivalents only.

Measurement: Performance state
Description: Indicates the current performance state of this PSU in this fabric interconnect.
Interpretation: The States reported by this measure and their corresponding numeric equivalents are
described in the table below:

    Numeric Value    State
    0                Unknown
    1                Ok
    2                Upper-nonrecoverable
    3                Upper-critical
    4                Upper-noncritical
    5                Lower-noncritical
    6                Lower-critical
    7                Lower-nonrecoverable
    100              Not-supported

Note:
By default, this measure reports the above-mentioned States while indicating the
performance state of a PSU. However, in the graph of this measure, states will be
represented using their corresponding numeric equivalents only.

Measurement: Power state
Description: Indicates the current power state of this PSU in this fabric interconnect.
Interpretation: The States reported by this measure and their corresponding numeric equivalents are
described in the table below:

    Numeric Value    State
    0                Unknown
    1                On
    2                Test
    3                Off
    4                Online
    5                Offline
    6                Offduty
    7                Degraded
    8                Power-save
    9                Error
    10               Not-supported

Note:
By default, this measure reports the above-mentioned States while indicating the power
state of a PSU. However, in the graph of this measure, states will be represented
using their corresponding numeric equivalents only.

Measurement: Presence state
Description: Indicates the current state of this PSU in this fabric interconnect.
Interpretation: The States reported by this measure and their corresponding numeric equivalents are
described in the table below:

    Numeric Value    State
    0                Unknown
    1                Empty
    10               Equipped
    11               Missing
    12               Mismatch
    13               Equipped-not-primary
    20               Equipped-identity-unestablishable
    30               Inaccessible
    40               Unauthorized
    100              Not-supported

Note:
By default, this measure reports the above-mentioned States while indicating the
current state of a PSU. However, in the graph of this measure, states will be
represented using their corresponding numeric equivalents only.
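
Because the numeric values of the Presence state measure are not contiguous, a step on the graph (say, from 10 to 11) is easier to read once both values are translated. The Python sketch below illustrates this with a lookup keyed by value; the names PRESENCE_STATES, presence_name and describe_transition are hypothetical, and labelling values outside the table as unrecognized is an assumption of the example.

```python
# Numeric value -> State for the Presence state measure.
PRESENCE_STATES = {
    0: "Unknown",
    1: "Empty",
    10: "Equipped",
    11: "Missing",
    12: "Mismatch",
    13: "Equipped-not-primary",
    20: "Equipped-identity-unestablishable",
    30: "Inaccessible",
    40: "Unauthorized",
    100: "Not-supported",
}


def presence_name(value: int) -> str:
    """Look up a state by value; gaps such as 2..9 are not valid states."""
    return PRESENCE_STATES.get(value, "Unrecognized ({})".format(value))


def describe_transition(previous: int, current: int) -> str:
    """Describe a change between two consecutive graphed values."""
    return "{} -> {}".format(presence_name(previous), presence_name(current))
```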

Measurement: Thermal state
Description: Indicates the current thermal state of this PSU in this fabric interconnect.
Interpretation: The States reported by this measure and their corresponding numeric equivalents are
described in the table below:

    Numeric Value    State
    0                Unknown
    1                Ok
    2                Upper-nonrecoverable
    3                Upper-critical
    4                Upper-noncritical
    5                Lower-noncritical
    6                Lower-critical
    7                Lower-nonrecoverable
    100              Not-supported

Note:
By default, this measure reports the above-mentioned States while indicating the
current thermal state of a PSU. However, in the graph of this measure, states will be
represented using their corresponding numeric equivalents only.

Measurement: Voltage state
Description: Indicates the current voltage state of this PSU in this fabric interconnect.
Interpretation: The States reported by this measure and their corresponding numeric equivalents are
described in the table below:

    Numeric Value    State
    0                Unknown
    1                Ok
    2                Upper-nonrecoverable
    3                Upper-critical
    4                Upper-noncritical
    5                Lower-noncritical
    6                Lower-critical
    7                Lower-nonrecoverable
    100              Not-supported

Note:
By default, this measure reports the above-mentioned States while indicating the
current voltage state of a PSU. However, in the graph of this measure, states will be
represented using their corresponding numeric equivalents only.
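
The Thermal state and Voltage state measures share one set of values, and each non-Ok state name carries two pieces of information: which side of the range was crossed (Upper or Lower) and how far (noncritical, critical or nonrecoverable). The Python sketch below merely splits a state name into those two parts. The names THRESHOLD_STATES and split_threshold_state are hypothetical, and no severity ordering beyond what the names themselves say is implied.

```python
from typing import Optional, Tuple

# Numeric value -> State, shared by the Thermal state and Voltage state measures.
THRESHOLD_STATES = {
    0: "Unknown",
    1: "Ok",
    2: "Upper-nonrecoverable",
    3: "Upper-critical",
    4: "Upper-noncritical",
    5: "Lower-noncritical",
    6: "Lower-critical",
    7: "Lower-nonrecoverable",
    100: "Not-supported",
}


def split_threshold_state(value: int) -> Tuple[Optional[str], str]:
    """Return (side, level) for a threshold state, or (None, name) otherwise.

    Raises KeyError for values that are not in the table.
    """
    name = THRESHOLD_STATES[value]
    side, sep, level = name.partition("-")
    if sep and side in ("Upper", "Lower"):
        return side.lower(), level
    return None, name
```

For instance, a graphed value of 6 splits into the lower side and the critical level, whereas 1 and 100 have no side and are returned unchanged.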

Measurement: Input current
Description: Indicates the input current received by this PSU in this fabric interconnect.
Measurement Unit: Amps
Interpretation: An abnormally high or low value of current may cause severe damage to the Fabric
Interconnect PSUs.

Measurement: Input power
Description: Indicates the input power received by this PSU in this fabric interconnect.
Measurement Unit: Watts
Interpretation: An abnormally high or low value of input power may cause severe damage to the
Fabric Interconnect PSUs.

Measurement: Input voltage
Description: Indicates the input voltage received by this PSU in this fabric interconnect.
Measurement Unit: Volts
Interpretation: An abnormally high or low value of input voltage may cause severe damage to the
fabric interconnect PSUs.

The detailed diagnosis of the Overall status measure provides the Time, ID, PID, Revision, Serial
Number and Vendor attributes of the fabric interconnect PSU.

Figure 13: The detailed diagnosis of the Overall status measure of the Fabric Interconnect PSUs test

1.3.2 Fabric Interconnect Ethernet Ports Test

The Cisco UCS fabric interconnect includes the following key Ethernet port types:

• Server Ports - Server ports handle data traffic between the fabric interconnect and the adapter
cards on the servers. You can only configure server ports on the fixed port module. Expansion
modules do not include server ports.

• Uplink Ethernet Ports - Uplink Ethernet ports handle Ethernet traffic between the UCS fabric
interconnect and the next layer of the network. All network-bound Ethernet traffic is pinned to
one of these ports. You can configure uplink Ethernet ports on either the fixed module or an
expansion module.

• Appliance Ports - The Appliance port is intended for connecting Ethernet-based storage arrays
(such as those serving iSCSI or NFS services) directly to the Fabric Interconnect. A port
configured as an Appliance Port will not be selected to receive broadcast/multicast traffic from
the Ethernet fabric, and VLAN support can be configured on it independently of the other
Uplink ports.

• FCoE Storage Ports - The FCoE Storage Port type provides functionality similar to the Appliance
Port type, while extending FCoE protocol support beyond the Fabric Interconnect. Note that
this is not intended for an FCoE connection to another FCF (FCoE Forwarder). Only direct
connection of FCoE storage devices (such as those produced by NetApp and EMC) is
supported. When an Ethernet port is configured as an FCoE Storage Port, traffic is expected to
arrive without a VLAN tag. The Ethernet headers will be stripped away and a VSAN tag will be
added to the FC frame.
In addition, the fabric interconnect supports Monitoring Ethernet Ports, and Ethernet ports that have not
yet been configured to perform any function and are hence still UnConfigured Ethernet Ports.
This test enables you to run frequent health checks on these ports so that you can quickly identify
non-operational, overloaded, or slow ports. Whenever Ethernet traffic slows down, you can use this
information to figure out which Ethernet port is responsible for it. Moreover, in times of heavy traffic,
this information will enable you to decide whether additional ports need to be configured using the
expansion module for handling the load.

Purpose

Enables you to run frequent health checks on the different types of Ethernet ports so that you can
quickly identify non-operational, overloaded, or slow ports

Target of the test

A Cisco UCS manager

Agent deploying the test

A remote agent

Configurable parameters for the test

1. TESTPERIOD – How often the test should be executed.
2. HOST – The IP address of the host for which the test is being configured.
3. PORT – The port at which the specified HOST listens. By default, this is NULL.
4. UCS USER and UCS PASSWORD – Provide the credentials of a user with at least
read-only privileges to the target Cisco UCS manager.
5. CONFIRM PASSWORD – Confirm the password by retyping it here.
6. SSL – By default, the Cisco UCS manager is SSL-enabled. Accordingly, the SSL
flag is set to Yes by default.
7. WEB PORT – By default, in most virtualized environments, Cisco UCS manager
listens on port 80 (if not SSL-enabled) or on port 443 (if SSL-enabled) only.
Accordingly, the eG agent connects to port 80 if the SSL flag above is set to No,
and to port 443 if the SSL flag is set to Yes. This is why the WEBPORT parameter
is set to default by default.
In some environments, however, the default ports 80 or 443 might not apply. In
such a case, specify against the WEBPORT parameter the exact port at which the
Cisco UCS manager in your environment listens, so that the eG agent communicates
with that port for collecting metrics from the Cisco UCS manager.
8. SHOW OVERALL STATUS – By default, this flag is set to Yes. This means that the
test reports the Overall status measure for every Ethernet port regardless of its
Administrative state, i.e., even if the Administrative state of the port is
Disabled. If this flag is set to No instead, then this test will report the
Overall status of only those Ethernet ports that are currently in an Enabled
Administrative state.
9. DETAILED DIAGNOSIS – To make diagnosis more efficient and accurate, the eG
Enterprise suite embeds an optional detailed diagnostic capability. With this
capability, the eG agents can be configured to run detailed, more elaborate tests
as and when specific problems are detected. To enable the detailed diagnosis
capability of this test for a particular server, choose the On option. To disable
the capability, click on the Off option.
The option to selectively enable/disable the detailed diagnosis capability will be
available only if the following conditions are fulfilled:

• The eG manager license should allow the detailed diagnosis capability.
• Both the normal and abnormal frequencies configured for the detailed
diagnosis measures should be non-zero.
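
The way the SSL flag and the WEB PORT setting combine can be illustrated with a small sketch. This is not eG agent code; the function name `resolve_web_port` and the constants are hypothetical, and the sketch only assumes the behaviour described above (default means 80 without SSL and 443 with SSL, anything else is taken as an explicit port).

```python
from typing import Union

DEFAULT_HTTP_PORT = 80    # used when the SSL flag is No
DEFAULT_HTTPS_PORT = 443  # used when the SSL flag is Yes


def resolve_web_port(ssl_enabled: bool, web_port: Union[str, int] = "default") -> int:
    """Return the port a monitoring client would connect to (hypothetical helper)."""
    # "default" defers to the SSL flag, as described for the WEB PORT parameter.
    if isinstance(web_port, str) and web_port.strip().lower() == "default":
        return DEFAULT_HTTPS_PORT if ssl_enabled else DEFAULT_HTTP_PORT
    # Any other value is treated as an explicitly configured port number.
    port = int(web_port)
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid port: {port}")
    return port


print(resolve_web_port(ssl_enabled=True))           # default with SSL
print(resolve_web_port(ssl_enabled=False))          # default without SSL
print(resolve_web_port(ssl_enabled=True, web_port="8443"))
```

An explicit port overrides the SSL-based default in this sketch, which mirrors the statement that a custom port can be specified when 80 or 443 do not apply.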

Outputs of the test

One set of results for each Ethernet port managed by the Cisco UCS manager being
monitored

Measurements made by the test

Measurement | Measurement Unit | Interpretation

Administrative state:
Indicates the current administrative status of this uplink Ethernet port in this fabric interconnect.

Interpretation: This measure reports either Enabled or Disabled as the administrative status of the
Fabric Interconnect Uplink Ethernet ports. The states and their corresponding numeric equivalents are
shown in the table below:

Numeric Value    State
1                Enabled
2                Disabled

Note:
By default, this measure reports the above-mentioned States while indicating the administrative
status of a Fabric Interconnect Uplink Ethernet port. However, in the graph of this measure,
states will be represented using their numeric equivalents only - i.e., 1 or 2.

Overall status:
Indicates the overall status of this uplink Ethernet port in this fabric interconnect.

Interpretation: The States reported by this measure and their corresponding numeric equivalents are
described in the table below:

Numeric Value    State
0                Indeterminate
1                Up
2                Admin-down
3                Link-down
4                Failed
5                No-license
6                Link-up
7                Hardware-failure
8                Software-failure
9                Error-disabled
10               Sfp-not-present

Note:
By default, this measure reports the above-mentioned states while indicating the overall status of
an uplink Ethernet port. However, in the graph of this measure, states will be represented using
their corresponding numeric equivalents only.
The detailed diagnosis of this measure provides the Time, ID, Slot ID, Port Type, Role Type,
Transport Type, Network Type, MAC and Mode attributes for the Ethernet ports.

Operational speed:
Indicates the current operating speed of this uplink Ethernet port in this fabric interconnect.

Interpretation: The values reported by this measure and their corresponding numeric equivalents are
described in the table below:

Numeric Value    Measure Value
0                Indeterminate
1                1Gbps
2                10Gbps
3                20Gbps
4                40Gbps

Note:
By default, this measure reports the above-mentioned Measure Values while indicating the
operational speed of an uplink Ethernet port. However, in the graph of this measure, the speed will
be represented using the corresponding numeric equivalents only.

Broadcast packets received:
Indicates the number of broadcast packets received by this uplink Ethernet port during the last
measurement period.
Measurement Unit: Number

Broadcast packets transmitted:
Indicates the number of broadcast packets transmitted by this uplink Ethernet port during the last
measurement period.
Measurement Unit: Number

Interpretation (both broadcast measures): In computer networking, broadcasting refers to
transmitting a packet that will be received by every device on the network. Broadcasting can be
performed as a high-level operation in a program (for example, broadcasting in the Message Passing
Interface), or it may be a low-level networking operation (for example, broadcasting on Ethernet).
Comparing the value of these measures across all the uplink Ethernet ports will point you to the
port that is handling the maximum broadcast traffic.

Jumbo packets received:
Indicates the number of jumbo packets received by this uplink Ethernet port during the last
measurement period.
Measurement Unit: Number

Jumbo packets transmitted:
Indicates the number of jumbo packets transmitted by this uplink Ethernet port during the last
measurement period.
Measurement Unit: Number

Interpretation (both jumbo packet measures): In computer networking, jumbo frames are Ethernet
frames with more than 1500 bytes of payload. Conventionally, jumbo frames can carry up to 9000
bytes of payload, but variations may exist. In the event of a network slowdown, you can compare the
value of these measures across all the uplink Ethernet ports to quickly isolate the port that is
overloaded with jumbo packets.

Multicast packets received:
Indicates the number of multicast packets received by this uplink Ethernet port during the last
measurement period.
Measurement Unit: Number

Multicast packets transmitted:
Indicates the number of multicast packets sent by this uplink Ethernet port during the last
measurement period.
Measurement Unit: Number

Interpretation (both multicast measures): In computer networking, multicast is the delivery of a
message or information to a group of destination computers simultaneously in a single transmission
from the source, with copies created automatically in other network elements, such as routers, only
when the topology of the network requires it. In the event of a network slowdown, you can compare
the value of these measures across all the uplink Ethernet ports to quickly isolate the port that is
overloaded with multicast packets.

Data received:
Indicates the amount of data received by this uplink Ethernet port during the last measurement
period.
Measurement Unit: MB

Data transmitted:
Indicates the amount of data transmitted by this uplink Ethernet port during the last measurement
period.
Measurement Unit: MB

Interpretation (both data measures): Compare the value of these measures across all Ethernet ports
to determine which port is handling the maximum data traffic.

Packets received:
Indicates the number of packets received by this uplink Ethernet port during the last measurement
period.
Measurement Unit: Number

Packets transmitted:
Indicates the number of packets transmitted by this uplink Ethernet port during the last
measurement period.
Measurement Unit: Number

Interpretation (both packet measures): Compare the value of these measures across all Ethernet
ports to determine which port is handling the maximum packet traffic.

Unicast packets received:
Indicates the number of unicast packets received by this uplink Ethernet port during the last
measurement period.
Measurement Unit: Number

Unicast packets transmitted:
Indicates the number of unicast packets transmitted by this uplink Ethernet port during the last
measurement period.
Measurement Unit: Number

Interpretation (both unicast measures): Unicast is the term used to describe communication where a
piece of information is sent from one point to another point. In this case there is just one sender
and one receiver. Compare the value of these measures across all Ethernet ports to determine which
port is handling the maximum unicast packet traffic.

The detailed diagnosis of the Overall status measure provides the Time, ID, Slot ID, Port Type, Role
Type, Transport Type, Network Type, MAC and Mode attributes for the ethernet ports.
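
The cross-port comparison recommended for the traffic measures can be sketched as follows. This is only an illustration: the port names, the sample values, and the helper `busiest_port` are hypothetical, and the sketch assumes the per-port measure values for one measurement period have already been exported into a dictionary.

```python
from typing import Dict, Iterable, Tuple


def busiest_port(samples: Dict[str, Dict[str, float]],
                 measures: Iterable[str]) -> Tuple[str, float]:
    """Return the port with the highest combined value of the given measures."""
    names = list(measures)
    if not samples:
        raise ValueError("no port samples supplied")
    # Sum the selected measures per port; a missing measure counts as 0.
    totals = {
        port: sum(values.get(name, 0.0) for name in names)
        for port, values in samples.items()
    }
    port = max(totals, key=totals.get)
    return port, totals[port]


# Invented values (in MB) for three hypothetical ports.
samples = {
    "port-1": {"Data received": 120.0, "Data transmitted": 80.0},
    "port-2": {"Data received": 950.0, "Data transmitted": 610.0},
    "port-3": {"Data received": 15.0, "Data transmitted": 22.0},
}

print(busiest_port(samples, ["Data received", "Data transmitted"]))
```

The same helper can be pointed at the broadcast, multicast, jumbo, or unicast measures by changing the list of measure names.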

Figure 14: The detailed diagnosis of the Overall status measure of the Fabric Interconnect Uplink Ethernet Ports test

1.3.3 Fabric Interconnect Fans Test

The Cisco UCS fabric interconnects have two slots on the front of the chassis for fan modules.
Each fan module houses six fans, so the two modules together provide the chassis with 12 fans. Use
this test to closely monitor the availability, overall health, and performance of each of these fans
and to report anomalies, so that you can promptly initiate measures to ensure that adequate air flow
is available in the fabric interconnects.

Purpose

Closely monitors the availability, overall health, and performance of each of the
fans in the fabric interconnects and reports anomalies

Target of the test

A Cisco UCS manager

Agent deploying the test

A remote agent

Configurable parameters for the test

1. TESTPERIOD – How often the test should be executed.
2. HOST – The IP address of the host for which the test is being configured.
3. PORT – The port at which the specified HOST listens. By default, this is NULL.
4. UCS USER and UCS PASSWORD – Provide the credentials of a user with at least
read-only privileges to the target Cisco UCS manager.
5. CONFIRM PASSWORD – Confirm the password by retyping it here.
6. SSL – By default, the Cisco UCS manager is SSL-enabled. Accordingly, the SSL
flag is set to Yes by default.
7. WEB PORT – By default, in most virtualized environments, Cisco UCS manager
listens on port 80 (if not SSL-enabled) or on port 443 (if SSL-enabled) only.
Accordingly, the eG agent connects to port 80 if the SSL flag above is set to No,
and to port 443 if the SSL flag is set to Yes. This is why the WEBPORT parameter
is set to default by default.
In some environments, however, the default ports 80 or 443 might not apply. In
such a case, specify against the WEBPORT parameter the exact port at which the
Cisco UCS manager in your environment listens, so that the eG agent communicates
with that port for collecting metrics from the Cisco UCS manager.
8. DETAILED DIAGNOSIS – To make diagnosis more efficient and accurate, the eG
Enterprise suite embeds an optional detailed diagnostic capability. With this
capability, the eG agents can be configured to run detailed, more elaborate tests
as and when specific problems are detected. To enable the detailed diagnosis
capability of this test for a particular server, choose the On option. To disable
the capability, click on the Off option.
The option to selectively enable/disable the detailed diagnosis capability will be
available only if the following conditions are fulfilled:

• The eG manager license should allow the detailed diagnosis capability.
• Both the normal and abnormal frequencies configured for the detailed
diagnosis measures should be non-zero.
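
The two conditions that govern the availability of the DETAILED DIAGNOSIS option can be written as a single check. The sketch below is illustrative only; the function and parameter names are hypothetical and do not correspond to any eG Enterprise API.

```python
def detailed_diagnosis_selectable(license_allows_dd: bool,
                                  normal_frequency: int,
                                  abnormal_frequency: int) -> bool:
    """True when the On/Off option would be offered, per the two conditions above."""
    # Condition 1: the eG manager license permits detailed diagnosis.
    # Condition 2: neither configured frequency is 0.
    return license_allows_dd and normal_frequency != 0 and abnormal_frequency != 0


print(detailed_diagnosis_selectable(True, 1, 1))
print(detailed_diagnosis_selectable(True, 0, 1))
```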

Outputs of the test

One set of results for each fan in each fabric interconnect managed by the Cisco
UCS manager being monitored

Measurements made by the test

Measurement | Measurement Unit | Interpretation

Overall status:
Indicates the overall status of this fan in this fabric interconnect.

Interpretation: The States reported by this measure and their corresponding numeric equivalents are
described in the table below:

Numeric Value    State
0                Unknown
1                Operable
2                Inoperable
3                Degraded
4                Powered-off
5                Power-problem
6                Removed
7                Voltage-problem
8                Thermal-problem
9                Performance-problem
10               Accessibility-problem
11               Identity-unestablishable
12               Bios-post-timeout
13               Disabled
51               Fabric-conn-problem
52               Fabric-unsupported-conn
81               Config
82               Equipment-problem
83               Decommissioning
84               Chassis-limit-exceeded
100              Not-supported
101              Discovery
102              Discovery-failed
103              Identify
104              Post-failure
105              Upgrade-problem
106              Peer-comm-problem
107              Auto-upgrade

Note:
By default, this measure reports the above-mentioned States while indicating the overall status of
a fan. However, in the graph of this measure, states will be represented using their corresponding
numeric equivalents only.
The detailed diagnosis of this measure provides the Time, ID, PID, Module, Revision, Serial Number,
Tray and Vendor attributes for each fan in the fabric interconnect.

Operability:
Indicates the current operating state of this fan in this fabric interconnect.

Interpretation: The States reported by this measure and their corresponding numeric equivalents are
described in the table below:

Numeric Value    State
0                Unknown
1                Operable
2                Inoperable
3                Degraded
4                Powered-off
5                Power-problem
6                Removed
7                Voltage-problem
8                Thermal-problem
9                Performance-problem
10               Accessibility-problem
11               Identity-unestablishable
12               Bios-post-timeout
13               Disabled
51               Fabric-conn-problem
52               Fabric-unsupported-conn
81               Config
82               Equipment-problem
83               Decommissioning
84               Chassis-limit-exceeded
100              Not-supported
101              Discovery
102              Discovery-failed
103              Identify
104              Post-failure
105              Upgrade-problem
106              Peer-comm-problem
107              Auto-upgrade

Note:
By default, this measure reports the above-mentioned States while indicating the operational state
of a fan. However, in the graph of this measure, states will be represented using their
corresponding numeric equivalents only.

Performance state:
Indicates the current performance state of this fan in this fabric interconnect.

Interpretation: The States reported by this measure and their corresponding numeric equivalents are
described in the table below:

Numeric Value    State
0                Unknown
1                Ok
2                Upper-nonrecoverable
3                Upper-critical
4                Upper-noncritical
5                Lower-noncritical
6                Lower-critical
7                Lower-nonrecoverable
100              Not-supported

Note:
By default, this measure reports the above-mentioned States while indicating the performance state
of a fan. However, in the graph of this measure, states will be represented using their
corresponding numeric equivalents only.

Power state:
Indicates the current power state of this fan in this fabric interconnect.

Interpretation: The States reported by this measure and their corresponding numeric equivalents are
described in the table below:

Numeric Value    State
0                Unknown
1                On
2                Test
3                Off
4                Online
5                Offline
6                Offduty
7                Degraded
8                Power-save
9                Error
10               Not-supported

Note:
By default, this measure reports the above-mentioned States while indicating the power state of a
fan. However, in the graph of this measure, states will be represented using their corresponding
numeric equivalents only.

Presence state:
Indicates the current state of this fan in this fabric interconnect.

Interpretation: The States reported by this measure and their corresponding numeric equivalents are
described in the table below:

Numeric Value    State
0                Unknown
1                Empty
10               Equipped
11               Missing
12               Mismatch
13               Equipped-not-primary
20               Equipped-identity-unestablishable
30               Inaccessible
40               Unauthorized
100              Not-supported

Note:
By default, this measure reports the above-mentioned States while indicating the current state of a
fan. However, in the graph of this measure, states will be represented using their corresponding
numeric equivalents only.

Thermal state:
Indicates the current thermal state of this fan in this fabric interconnect.

Interpretation: The States reported by this measure and their corresponding numeric equivalents are
described in the table below:

Numeric Value    State
0                Unknown
1                Ok
2                Upper-nonrecoverable
3                Upper-critical
4                Upper-noncritical
5                Lower-noncritical
6                Lower-critical
7                Lower-nonrecoverable
100              Not-supported

Note:
By default, this measure reports the above-mentioned States while indicating the current thermal
state of a fan. However, in the graph of this measure, states will be represented using their
corresponding numeric equivalents only.

Voltage state:
Indicates the current voltage state of this fan in this fabric interconnect.

Interpretation: The States reported by this measure and their corresponding numeric equivalents are
described in the table below:

Numeric Value    State
0                Unknown
1                Ok
2                Upper-nonrecoverable
3                Upper-critical
4                Upper-noncritical
5                Lower-noncritical
6                Lower-critical
7                Lower-nonrecoverable
100              Not-supported

Note:
By default, this measure reports the above-mentioned States while indicating the current voltage
state of a fan. However, in the graph of this measure, states will be represented using their
corresponding numeric equivalents only.

The detailed diagnosis of the Overall status measure provides the Time, ID, PID, Module, Revision,
Serial Number, Tray and Vendor attributes for each fan in the fabric interconnect.
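
Because graphs of the state measures show only numeric equivalents, it can help to translate them back into state names when reading exported values. The sketch below does this for the Performance, Thermal and Voltage state measures, which share one numeric-to-state table in this section. The helper names are hypothetical, and treating only Ok as normal is an assumption made for the illustration, not a rule stated by the product.

```python
from typing import Dict

# Numeric equivalents shared by the Performance, Thermal and Voltage state
# measures, transcribed from the tables in this section.
THRESHOLD_STATES: Dict[int, str] = {
    0: "Unknown",
    1: "Ok",
    2: "Upper-nonrecoverable",
    3: "Upper-critical",
    4: "Upper-noncritical",
    5: "Lower-noncritical",
    6: "Lower-critical",
    7: "Lower-nonrecoverable",
    100: "Not-supported",
}


def decode_state(value: int, table: Dict[int, str] = THRESHOLD_STATES) -> str:
    """Translate a numeric graph value into its state name."""
    return table.get(value, f"Unrecognized ({value})")


def abnormal_states(graph_values: Dict[str, int]) -> Dict[str, str]:
    """Return the measures whose decoded state is anything other than Ok (assumption)."""
    decoded = {measure: decode_state(v) for measure, v in graph_values.items()}
    return {measure: state for measure, state in decoded.items() if state != "Ok"}


# Invented graph values for one fan.
fan = {"Performance state": 1, "Thermal state": 3, "Voltage state": 1}
print(abnormal_states(fan))
```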

Figure 15: The detailed diagnosis of the Overall status measure of the Fabric Interconnect Fans test

1.3.4 Fabric Interconnect FC Ports Test

The Cisco UCS fabric interconnect includes the following key Fibre Channel (FC) port types:

• Uplink FC Ports: Uplink Fibre Channel ports handle FCoE traffic between the fabric interconnect
and the next layer of the network. All network-bound FCoE traffic is pinned to one of these
ports. If one or more of these ports are not operable, or if traffic congestion occurs on any of
them, significant latencies can be noticed in the FCoE communication between the
corresponding interconnect and the network. To avoid this, you need to constantly observe the
operational status, overall health, and the traffic flowing to and from each of the FC ports on
every fabric interconnect, spot abnormalities quickly, and fix them before it is too late. This
test enables you to do just that.

• Storage FC Ports: The Storage FC port type allows for the direct attachment of an FC storage
device to one of the native FC ports on the fabric interconnect expansion modules. As with the
FCoE Storage port type, the FC frames arriving on these ports are expected to be untagged, so
these ports cannot be connected to an MDS FC switch or a similar device. Each Storage FC port is
assigned a VSAN number to keep the traffic separated within the UCS Unified Fabric. When used in
this way, the fabric interconnect does not provide any FC zoning configuration capabilities; all
devices within a particular VSAN will be allowed, at least at the FC switching layer (FC2), to
communicate with each other.

In addition, the fabric interconnect supports Monitoring FC Ports, as well as FC ports that have not
yet been configured to perform any function and are hence still UnConfigured FC Ports.
The test runs frequent health checks on each of the FC ports in every fabric interconnect, and turns
the spotlight on overloaded ports, non-operational ports, and ports that are operating at a slow pace.

Purpose

Runs frequent health checks on each of the FC ports in every fabric interconnect,
and turns the spotlight on overloaded ports, non-operational ports, and ports that
are operating at a slow pace

Target of the test

A Cisco UCS manager

Agent deploying the test

A remote agent

Configurable parameters for the test

1. TESTPERIOD – How often the test should be executed.
2. HOST – The IP address of the host for which the test is being configured.
3. PORT – The port at which the specified HOST listens. By default, this is NULL.
4. UCS USER and UCS PASSWORD – Provide the credentials of a user with at least
read-only privileges to the target Cisco UCS manager.
5. CONFIRM PASSWORD – Confirm the password by retyping it here.
6. SSL – By default, the Cisco UCS manager is SSL-enabled. Accordingly, the SSL
flag is set to Yes by default.
7. WEB PORT – By default, in most virtualized environments, Cisco UCS manager
listens on port 80 (if not SSL-enabled) or on port 443 (if SSL-enabled) only.
Accordingly, the eG agent connects to port 80 if the SSL flag above is set to No,
and to port 443 if the SSL flag is set to Yes. This is why the WEBPORT parameter
is set to default by default.
In some environments, however, the default ports 80 or 443 might not apply. In
such a case, specify against the WEBPORT parameter the exact port at which the
Cisco UCS manager in your environment listens, so that the eG agent communicates
with that port for collecting metrics from the Cisco UCS manager.
8. SHOW OVERALL STATUS – By default, this flag is set to Yes. This means that the
test reports the Overall status measure for every FC port regardless of its
Administrative state, i.e., even if the Administrative state of the port is
Disabled. If this flag is set to No instead, then this test will report the
Overall status of only those FC ports that are currently in an Enabled
Administrative state.
9. DETAILED DIAGNOSIS – To make diagnosis more efficient and accurate, the eG
Enterprise suite embeds an optional detailed diagnostic capability. With this
capability, the eG agents can be configured to run detailed, more elaborate tests
as and when specific problems are detected. To enable the detailed diagnosis
capability of this test for a particular server, choose the On option. To disable
the capability, click on the Off option.
The option to selectively enable/disable the detailed diagnosis capability will be
available only if the following conditions are fulfilled:

• The eG manager license should allow the detailed diagnosis capability.
• Both the normal and abnormal frequencies configured for the detailed
diagnosis measures should be non-zero.
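
The effect of the SHOW OVERALL STATUS flag can be pictured as a filter over the ports for which the Overall status measure is reported. The following is a minimal sketch under that reading; the `PortSample` class, its field names, and the port identifiers are hypothetical and are not part of the eG agent.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class PortSample:
    port_id: str
    admin_state: str     # "Enabled" or "Disabled"
    overall_status: str  # for example "Up" or "Admin-down"


def ports_reporting_overall_status(ports: Iterable[PortSample],
                                   show_overall_status: bool = True) -> List[PortSample]:
    """Select the ports for which Overall status would be reported."""
    if show_overall_status:
        # Flag set to Yes (the default): report for every port.
        return list(ports)
    # Flag set to No: report only for administratively Enabled ports.
    return [p for p in ports if p.admin_state == "Enabled"]


ports = [
    PortSample("fc-1", "Enabled", "Up"),
    PortSample("fc-2", "Disabled", "Admin-down"),
]
print([p.port_id for p in ports_reporting_overall_status(ports, show_overall_status=False)])
```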

Outputs of the test

One set of results for each FC port in every fabric interconnect managed by the
Cisco UCS manager being monitored

Measurements made by the test

Measurement | Measurement Unit | Interpretation

Administrative state:
Indicates the current administrative status of this FC port in this fabric interconnect.

Interpretation: This measure reports either Enabled or Disabled as the administrative status of a
port. The states and their corresponding numeric equivalents are shown in the table below:

Numeric Value    State
1                Enabled
2                Disabled

Note:
By default, this measure reports the above-mentioned States while indicating the administrative
status of an FC port. However, in the graph of this measure, states will be represented using their
numeric equivalents only - i.e., 1 or 2.

Overall status:
Indicates the overall status of this FC port in this fabric interconnect.

Interpretation: The States reported by this measure and their corresponding numeric equivalents are
described in the table below:

Numeric Value    State
0                Indeterminate
1                Up
2                Admin-down
3                Link-down
4                Failed
5                No-license
6                Link-up
7                Hardware-failure
8                Software-failure
9                Error-disabled
10               Sfp-not-present

Note:
By default, this measure reports the above-mentioned states while indicating the overall status of
an FC port. However, in the graph of this measure, states will be represented using their
corresponding numeric equivalents only.
The detailed diagnosis of this measure provides the Time, ID, Slot ID, Port Type, Network Type,
Transport Type, WWPN and Mode attributes for each FC port.

Negotiated speed:
Indicates the current operating speed of this FC port in this fabric interconnect.

Interpretation: The values reported by this measure and their corresponding numeric equivalents are
described in the table below:

Numeric Value    Measure Value
0                Indeterminate
1                1Gbps
2                10Gbps

Note:
By default, this measure reports the above-mentioned Measure Values while indicating the
operational speed of an FC port. However, in the graph of this measure, the speed will be
represented using the corresponding numeric equivalents only.

Data received:
Indicates the amount of data received by this FC port during the last measurement period.
Measurement Unit: MB

Data transmitted:
Indicates the amount of data sent by this FC port during the last measurement period.
Measurement Unit: MB

Interpretation (both data measures): Compare the value of these measures across all FC ports to
determine which port is handling the maximum data traffic.

Packets received:
Indicates the number of packets received by this FC port during the last measurement period.
Measurement Unit: Number

Packets transmitted:
Indicates the number of packets transmitted by this FC port during the last measurement period.
Measurement Unit: Number

Interpretation (both packet measures): Compare the value of these measures across all FC ports to
determine which port is handling the maximum packet traffic.

Crc received:
Indicates the number of Cyclic Redundancy Check (CRC) errors that occurred during data transfer
through this FC port during the last measurement period.
Measurement Unit: Errors
Interpretation: CRC, or Cyclic Redundancy Check, is a process that helps in identifying any errors
that might occur during the data transmission process. Data is usually transmitted in small blocks,
and a CRC value is assigned to each block and transmitted along with it. This CRC value is verified
at the destination to ensure that it matches the CRC value transmitted from the source. A CRC error
occurs when the two values (source and destination) do not match and the test fails. The main
benefit of CRC is that it helps you ensure that the data you have received or downloaded is not
damaged or corrupt.
By comparing the value of this measure across all FC ports, you can accurately identify the most
error-prone FC port.

Error received:
Indicates the total number of errors received by this FC port during the last measurement period.
Measurement Unit: Errors

Error transmitted:
Indicates the total number of errors transmitted by this FC port during the last measurement
period.
Measurement Unit: Errors

Discard error received:
Indicates the total amount of data that was discarded during reception of data by this FC port
since the last measurement period.
Measurement Unit: MB

Discard error transmitted:
Indicates the total amount of data that was discarded during data transmission through this FC port
since the last measurement period.
Measurement Unit: MB

Too long error received:
Indicates the total number of errors that occurred when data of a large size was received by this
FC port during the last measurement period.
Measurement Unit: Errors
Interpretation: Ideally, the value of this measure should be low. A high value is indicative of
many errors during data reception. To identify the most error-prone port, compare the value of this
measure across FC ports.

Too short error received:
Indicates the total number of errors that occurred due to truncated or corrupt data received by
this FC port during the last measurement period.
Measurement Unit: Errors
Interpretation: Ideally, the value of this measure should be low. A high value is indicative of
many errors during data transmission. To identify the most error-prone port, compare the value of
this measure across FC ports.

Signal losses:
Indicates the signal losses that occurred on this FC port during data transmission and reception in
the last measurement period.
Measurement Unit: Errors
Interpretation: Ideally, the value of this measure should be 0.

Synchronize losses:
Indicates the losses that occurred due to synchronization of this FC port with other components
during the last measurement period.
Measurement Unit: Errors
Interpretation: Ideally, the value of this measure should be 0.

Link failures:
Indicates the link failures that occurred between this FC port and the blade server chassis during
the last measurement period.
Measurement Unit: Errors
Interpretation: Ideally, the value of this measure should be 0.

The detailed diagnosis of the Overall status measure provides the Time, ID, Slot ID, Port Type,
Network Type, Transport Type, WWPN and Mode attributes for each FC port.
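
The CRC check described for the Crc received measure can be illustrated in a few lines. The sketch uses `zlib.crc32` from the Python standard library purely to demonstrate the compute-at-source, verify-at-destination idea; it does not represent how the fabric interconnect hardware computes or counts CRC errors, and the helper names are hypothetical.

```python
import zlib
from typing import Tuple


def attach_crc(block: bytes) -> Tuple[bytes, int]:
    """Sender side: compute a CRC value and send it along with the block."""
    return block, zlib.crc32(block)


def crc_matches(block: bytes, transmitted_crc: int) -> bool:
    """Receiver side: recompute the CRC and compare it with the transmitted value."""
    return zlib.crc32(block) == transmitted_crc


block, crc = attach_crc(b"storage payload")

# An intact block passes the check.
print(crc_matches(block, crc))

# Flip one bit to simulate corruption in transit; the check now fails,
# which is what would be counted as a CRC error.
corrupted = bytes([block[0] ^ 0x01]) + block[1:]
print(crc_matches(corrupted, crc))
```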

Figure 16: The detailed diagnosis of the Fabric Interconnect Uplink FC Ports test

1.3.5 Fabric Interconnect Details Test

Since fabric interconnects provide both network connectivity and management capabilities for the
Cisco UCS system, an inoperable or resource-intensive fabric interconnect can disrupt the
communication backbone for the blade servers and the blade server chassis of the system. Likewise,
real and potential threats to the health of the interconnect hardware (e.g., PSUs, mainboards, fans)
can also result in significant latencies in network traffic flow over the interconnects. With the help of
this test, you can keep track of the operational status and resource usage of the fabric interconnects,
and also be alerted to sudden spikes in the temperature of the PSUs, mainboards, and fans supported
by each interconnect.

Purpose: With the help of this test, you can keep track of the operational status and resource
usage of the fabric interconnects, and also be alerted to sudden spikes in the
temperature of the PSUs, mainboards, and fans supported by each interconnect.

Target of the test: A Cisco UCS manager

Agent deploying the test: A remote agent

Configurable parameters for the test:
1. TESTPERIOD – How often the test should be executed.
2. HOST – The IP address of the host for which the test is being configured.
3. PORT – The port at which the specified HOST listens. By default, this is NULL.
4. UCS USER and UCS PASSWORD – Provide the credentials of a user with at least
read-only privileges to the target Cisco UCS manager.
5. CONFIRM PASSWORD- Confirm the password by retyping it here.
6. SSL - By default, the Cisco UCS manager is SSL-enabled. Accordingly, the SSL
flag is set to Yes by default.
7. WEB PORT - By default, in most virtualized environments, Cisco UCS manager
listens on port 80 (if not SSL-enabled) or on port 443 (if SSL-enabled) only. This
implies that while monitoring Cisco UCS manager, the eG agent, by default,
connects to port 80 if the SSL flag above is set to No, and to port 443 if the
SSL flag is set to Yes. Accordingly, the WEBPORT parameter carries the value
default unless you change it.
In some environments however, the default ports 80 or 443 might not apply. In
such a case, against the WEBPORT parameter, you can specify the exact port at
which the Cisco UCS manager in your environment listens, so that the eG agent
communicates with that port for collecting metrics from the Cisco UCS manager.
8. DETAILED DIAGNOSIS - To make diagnosis more efficient and accurate, the eG
Enterprise suite embeds an optional detailed diagnostic capability. With this
capability, the eG agents can be configured to run detailed, more elaborate tests
as and when specific problems are detected. To enable the detailed diagnosis
capability of this test for a particular server, choose the On option. To disable
the capability, click on the Off option.
The option to selectively enable/disable the detailed diagnosis capability will be
available only if the following conditions are fulfilled:

• The eG manager license should allow the detailed diagnosis capability.
• Both the normal and abnormal frequencies configured for the detailed
diagnosis measures should not be 0.
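
The way the SSL and WEBPORT parameters interact can be sketched in a few lines of Python. This is only an illustration of the documented defaults; the function name resolve_web_port and its argument names are hypothetical and are not part of the eG agent.

```python
def resolve_web_port(ssl_enabled: bool, web_port: str = "default") -> int:
    """Return the port an agent would connect to, following the documented defaults.

    ssl_enabled mirrors the SSL flag (Yes/No); web_port mirrors the WEBPORT
    parameter, which is either the literal value "default" or an explicit port.
    """
    if web_port.strip().lower() == "default":
        # Documented default: port 443 when SSL is enabled, port 80 otherwise.
        return 443 if ssl_enabled else 80
    port = int(web_port)
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid port: {port}")
    return port
```

With the SSL flag set to Yes and WEBPORT left at default, the sketch resolves to 443; an explicit WEBPORT value such as 8443 overrides both defaults.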

Outputs of the test: One set of results for each fabric interconnect managed by the Cisco UCS manager
being monitored

Measurements made by the test (Measurement, Measurement Unit, Interpretation):

Overall status: Indicates the overall status of this fabric interconnect.
Interpretation: The states reported by this measure and their corresponding numeric equivalents are described in the table below:

Numeric Value    State
0                Unknown
1                Operable
2                Inoperable

Note: By default, this measure reports the above-mentioned states while indicating the overall status of a fabric interconnect. However, in the graph of this measure, states will be represented using their corresponding numeric equivalents only.
The detailed diagnosis of this measure provides the Time, Name, PID, Revision, Serial Number and Vendor attributes of each fabric interconnect.

Load: Indicates the percentage of CPU utilized by this fabric interconnect.
Measurement unit: Percent
Interpretation: A high value is indicative of excessive CPU usage, and is a cause for concern.

Available memory: Indicates the amount of memory available with this fabric interconnect.
Measurement unit: MB
Interpretation: A low value may indicate a memory bottleneck.

Cached memory: Indicates the memory allotted for cache (frequently used main memory locations) in this fabric interconnect.
Measurement unit: MB

Total memory: Indicates the total memory of this fabric interconnect.
Measurement unit: MB

Fan control inlet1: Indicates the temperature of fan 1 of this fabric interconnect.
Measurement unit: Celsius
Interpretation: A low value is desired for this measure.

Fan control inlet2: Indicates the temperature of fan 2 of this fabric interconnect.
Measurement unit: Celsius
Interpretation: A low value is desired for this measure.

Fan control inlet3: Indicates the temperature of fan 3 of this fabric interconnect.
Measurement unit: Celsius
Interpretation: A low value is desired for this measure.

Fan control inlet4: Indicates the temperature of fan 4 of this fabric interconnect.
Measurement unit: Celsius
Interpretation: A low value is desired for this measure.

Mainboard outlet1: Indicates the temperature of the mainboard 1 of this fabric interconnect.
Measurement unit: Celsius
Interpretation: A low value is desired for this measure.

Mainboard outlet2: Indicates the temperature of the mainboard 2 of this fabric interconnect.
Measurement unit: Celsius
Interpretation: A low value is desired for this measure.

PSU control inlet1: Indicates the temperature of power supply unit 1 of this fabric interconnect.
Measurement unit: Celsius
Interpretation: A low value is desired for this measure.

PSU control inlet2: Indicates the temperature of power supply unit 2 of this fabric interconnect.
Measurement unit: Celsius
Interpretation: A low value is desired for this measure.

The detailed diagnosis of the Overall status measure provides the Time, Name, PID, Revision, Serial
Number and Vendor attributes of each fabric interconnect.

Figure 17: The detailed diagnosis of the Overall status measure of the Fabric Interconnect Details test
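
When the numeric values of this test are consumed outside the eG console (for example, from an exported graph), they have to be translated back into states. The following is a minimal sketch of that translation, together with the share of memory that is still free; the names FI_OVERALL_STATUS, describe_status and memory_free_percent are hypothetical helpers and not part of eG Enterprise.

```python
# Numeric equivalents of the Overall status measure, as listed in the table above.
FI_OVERALL_STATUS = {0: "Unknown", 1: "Operable", 2: "Inoperable"}


def describe_status(numeric_value: int) -> str:
    """Translate a graphed numeric value back into its state name."""
    return FI_OVERALL_STATUS.get(numeric_value, f"Unrecognized ({numeric_value})")


def memory_free_percent(available_mb: float, total_mb: float) -> float:
    """Express the Available memory measure as a percentage of Total memory."""
    if total_mb <= 0:
        raise ValueError("total memory must be positive")
    return 100.0 * available_mb / total_mb
```

For instance, a fabric interconnect reporting 512 MB available out of 2048 MB total has 25 percent of its memory free; what counts as too low is left to the administrator.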

1.3.6 LAN Cloud Port Channels Test

You can aggregate a number of uplink Ethernet ports by configuring them as a port channel, so that
traffic between your upstream LAN switch and the Cisco UCS fabric interconnect is forwarded over the
member ports of the port channel as a single aggregated link.
This test auto-discovers the port channels configured on each Fabric Interconnect and reports the
overall health, operational speed, and VLAN status of each port channel. With the help of this test,
problematic and slow port channels can be identified.

Purpose: Auto-discovers the port channels configured on each Fabric Interconnect and reports
the overall health, operational speed, and VLAN status of each port channel. With
the help of this test, problematic and slow port channels can be identified.

Target of the test: A Cisco UCS manager

Agent deploying the test: A remote agent

Configurable parameters for the test:
1. TESTPERIOD – How often the test should be executed.
2. HOST – The IP address of the host for which the test is being configured.
3. PORT – The port at which the specified HOST listens. By default, this is NULL.
4. UCS USER and UCS PASSWORD – Provide the credentials of a user with at least
read-only privileges to the target Cisco UCS manager.
5. CONFIRM PASSWORD- Confirm the password by retyping it here.
6. SSL - By default, the Cisco UCS manager is SSL-enabled. Accordingly, the SSL
flag is set to Yes by default.
7. WEB PORT - By default, in most virtualized environments, Cisco UCS manager
listens on port 80 (if not SSL-enabled) or on port 443 (if SSL-enabled) only. This
implies that while monitoring Cisco UCS manager, the eG agent, by default,
connects to port 80 if the SSL flag above is set to No, and to port 443 if the
SSL flag is set to Yes. Accordingly, the WEBPORT parameter carries the value
default unless you change it.
In some environments however, the default ports 80 or 443 might not apply. In
such a case, against the WEBPORT parameter, you can specify the exact port at
which the Cisco UCS manager in your environment listens, so that the eG agent
communicates with that port for collecting metrics from the Cisco UCS manager.
8. DETAILED DIAGNOSIS - To make diagnosis more efficient and accurate, the eG
Enterprise suite embeds an optional detailed diagnostic capability. With this
capability, the eG agents can be configured to run detailed, more elaborate tests
as and when specific problems are detected. To enable the detailed diagnosis
capability of this test for a particular server, choose the On option. To disable
the capability, click on the Off option.
The option to selectively enable/disable the detailed diagnosis capability will be
available only if the following conditions are fulfilled:

• The eG manager license should allow the detailed diagnosis capability.
• Both the normal and abnormal frequencies configured for the detailed
diagnosis measures should not be 0.

Outputs of the test: One set of results for each port channel configured on every fabric interconnect
managed by the Cisco UCS manager being monitored

Measurements made by the test (Measurement, Measurement Unit, Interpretation):

Overall status: Indicates the current overall status of this port channel.
Interpretation: The values reported by this measure and their corresponding numeric values are described in the table below:

Measure Value       Numeric Value
Indeterminate       0
Up                  1
Admin-down          2
Link-down           3
Failed              4
No-license          5
Link-up             6
Hardware-failure    7
Software-failure    8
Error-disabled      9
Sfp-not-present     10

The detailed diagnosis of this measure provides the complete details of a port channel, such as, the ID of the port channel, the ID of the Fabric Interconnect for which it is configured, the Type, the Port type, the flow control policy, the transport, and the port channel name.
Note: By default, this measure reports the above-mentioned Measure Values while indicating the overall status of a port channel. However, in the graph of this measure, port channel status will be represented using their numeric equivalents only.

Administrative state: Indicates the current administrative status of this port channel.
Interpretation: The values that this measure can report and the numeric values that correspond to the measure values have been detailed in the table below:

Measure Value    Numeric Value
Enabled          1
Disabled         2

Note: By default, this measure reports the above-mentioned Measure Values while indicating the administrative status of a port channel. However, in the graph of this measure, states will be represented using the corresponding numeric equivalents only.

Administrative speed: Indicates the current administrative speed of this port channel.
Measurement unit: Number
Interpretation: The values that this measure can report and their corresponding numeric values are available in the table below:

Measure Value    Numeric Value
1 Gbps           1
10 Gbps          2
20 Gbps          3
40 Gbps          4

Note: By default, this measure reports the above-mentioned Measure Values while indicating the administrative speed of a port channel. However, in the graph of this measure, speed will be represented using the corresponding numeric values only.

Operational speed: Indicates the current operating speed of this port channel.
Measurement unit: Number
Interpretation: The values reported by this measure and their corresponding numeric equivalents are described in the table below:

Measure Value    Numeric Value
1 Gbps           1
10 Gbps          2
20 Gbps          3
40 Gbps          4

Note: By default, this measure reports the above-mentioned Measure Values while indicating the operational speed of a port channel. However, in the graph of this measure, the speed will be represented using the corresponding numeric equivalents only.

VLAN status: Indicates the current VLAN status of this port channel.
Interpretation: The values this measure can report and their corresponding numeric values have been listed in the table below:

Numeric Value    Measure Value
0                OK
1                Missing-primary

Note: By default, this measure reports the above-mentioned Measure Values while indicating the VLAN status of a port channel. However, in the graph of this measure, the VLAN status will be represented using the corresponding numeric equivalents only.

The detailed diagnosis of the Overall status measure provides the complete details of a port channel,
such as, the ID of the port channel, the ID of the Fabric Interconnect for which it is configured, the
Type, the Port type, the flow control policy, the transport, and the name.


Figure 18: The detailed diagnosis of the Overall status measure of the LAN Cloud Port Channels Test
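
Because both speed measures are graphed as numeric codes rather than as Gbps figures, comparing them requires mapping the codes back to speeds first. The sketch below shows one way to do so; the names SPEED_GBPS and runs_below_configured_speed are hypothetical, and treating a port channel whose operational speed is below its administrative speed as "slow" is an assumption made for illustration only.

```python
# Numeric values reported by the Administrative speed and Operational speed
# measures, mapped to the link speed in Gbps (taken from the tables above).
SPEED_GBPS = {1: 1, 2: 10, 3: 20, 4: 40}


def runs_below_configured_speed(admin_value: int, oper_value: int) -> bool:
    """Return True if the operational speed is lower than the administrative speed.

    Both arguments are the numeric codes shown in the measure graphs.
    Flagging such a port channel as "slow" is an assumption of this sketch.
    """
    return SPEED_GBPS[oper_value] < SPEED_GBPS[admin_value]
```

A port channel with an administrative speed code of 2 (10 Gbps) and an operational speed code of 1 (1 Gbps) would be flagged, while one reporting code 2 for both would not.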

1.3.7 LAN Cloud PC Ethernet Ports

You can aggregate a number of uplink Ethernet ports by configuring them as a port channel, so that
traffic between your upstream LAN switch and the Cisco UCS fabric interconnect is forwarded over the
member ports of the port channel as a single aggregated link.
This test auto-discovers the Ethernet ports aggregated in each port channel on every Fabric
Interconnect and reports the overall health, operational speed, and VLAN status of each Ethernet port.
With the help of this test, problematic and slow ports can be identified.

Purpose: Auto-discovers the Ethernet ports aggregated in each port channel on every Fabric
Interconnect and reports the overall health, operational speed, and VLAN status of
each Ethernet port. With the help of this test, problematic and slow ports can be
identified.

Target of the test: A Cisco UCS manager

Agent deploying the test: A remote agent

Configurable parameters for the test:
1. TESTPERIOD – How often the test should be executed.
2. HOST – The IP address of the host for which the test is being configured.
3. PORT – The port at which the specified HOST listens. By default, this is NULL.
4. UCS USER and UCS PASSWORD – Provide the credentials of a user with at least
read-only privileges to the target Cisco UCS manager.
5. CONFIRM PASSWORD- Confirm the password by retyping it here.
6. SSL - By default, the Cisco UCS manager is SSL-enabled. Accordingly, the SSL
flag is set to Yes by default.
7. WEB PORT - By default, in most virtualized environments, Cisco UCS manager
listens on port 80 (if not SSL-enabled) or on port 443 (if SSL-enabled) only. This
implies that while monitoring Cisco UCS manager, the eG agent, by default,
connects to port 80 if the SSL flag above is set to No, and to port 443 if the
SSL flag is set to Yes. Accordingly, the WEBPORT parameter carries the value
default unless you change it.
In some environments however, the default ports 80 or 443 might not apply. In
such a case, against the WEBPORT parameter, you can specify the exact port at
which the Cisco UCS manager in your environment listens, so that the eG agent
communicates with that port for collecting metrics from the Cisco UCS manager.
8. DETAILED DIAGNOSIS - To make diagnosis more efficient and accurate, the eG
Enterprise suite embeds an optional detailed diagnostic capability. With this
capability, the eG agents can be configured to run detailed, more elaborate tests
as and when specific problems are detected. To enable the detailed diagnosis
capability of this test for a particular server, choose the On option. To disable
the capability, click on the Off option.
The option to selectively enable/disable the detailed diagnosis capability will be
available only if the following conditions are fulfilled:

• The eG manager license should allow the detailed diagnosis capability.
• Both the normal and abnormal frequencies configured for the detailed
diagnosis measures should not be 0.

Outputs of the test: One set of results for each Ethernet port in each port channel configured on every
fabric interconnect managed by the Cisco UCS manager being monitored

Measurements made by the test (Measurement, Measurement Unit, Interpretation):

Overall status: Indicates the current overall status of this port.
Interpretation: The values reported by this measure and their corresponding numeric values are described in the table below:

Measure Value          Numeric Value
Unknown                0
Up                     1
Down                   2
Error-misconfigured    3

The detailed diagnosis of this measure provides the complete details of a port, such as, the ID of the port, the slot ID, the Fabric Interconnect for which the port has been configured, the Type, the Port type, the flow control policy, the transport, and the port, locale, and name.
Note: By default, this measure reports the above-mentioned Measure Values while indicating the overall status of a port. However, in the graph of this measure, port status will be represented using their numeric equivalents only.

Administrative state: Indicates the current administrative status of this port.
Interpretation: The values that this measure can report and the numeric values that correspond to the measure values have been detailed in the table below:

Measure Value    Numeric Value
Enabled          1
Disabled         2

Note: By default, this measure reports the above-mentioned Measure Values while indicating the administrative status of a port. However, in the graph of this measure, states will be represented using the corresponding numeric equivalents only.

The detailed diagnosis of the Overall status measure provides the complete details of a port, such as,
the ID of the port, the slot ID, the Fabric Interconnect for which the port has been configured, the
Type, the Port type, the flow control policy, the transport, and the port, locale, and name.

Figure 19: The detailed diagnosis of the Overall Status measure of the LAN Cloud PC Ethernet Ports test

1.4 The Blades Layer

The Cisco UCS B-Series Blade Servers are crucial building blocks of the Cisco Unified Computing
System, delivering scalable and flexible computing for a datacenter.
The Cisco UCS B-Series Blade Servers are based on industry-standard server technologies and
provide:

• Up to two Intel Xeon Series 5500 multicore processors
• Two optional front-accessible, hot-swappable SAS hard drives
• Support for up to two dual-port mezzanine card connections for up to 40 Gbps of redundant
I/O throughput
• Industry-standard double-data-rate 3 (DDR3) memory
• Remote management through an integrated service processor that also executes policies
established in Cisco UCS Manager software
• Local keyboard, video, and mouse (KVM) access through a front console port on each server
• Out-of-band access by remote KVM, Secure Shell (SSH) Protocol, and virtual media (vMedia)
as well as Intelligent Platform Management Interface (IPMI)

Since these blade servers are the heart of the Cisco UCS system, even a brief non-availability or
non-operability of these servers, or sporadic hardware-related issues they encounter, will have an adverse
impact on the overall performance of the Cisco UCS system. Using the tests mapped to this layer,
administrators can closely observe the changes in the status of the blade servers, and promptly detect
deviations, so that the problems can be resolved before they affect the Cisco UCS system as a whole.


Figure 20: The tests mapped to the Blades layer

1.4.1 Blade Overview Test

Blade servers are the core components of the Cisco UCS system. Unavailable/inoperable blade servers
can hence bring the entire system to a standstill. Using this test, you can continuously monitor the
overall health, operability, and availability of each blade server in each chassis managed by the Cisco
UCS manager, and be alerted to anomalies as soon as they occur, so that you can take the required
corrective actions before your mission-critical services begin to suffer. In addition, the test also
captures critical power and thermal failures experienced by the blade servers, and takes stock of the
hardware (such as processors, cores, NICs, etc.) supporting the operations of the blade server.

Purpose: Continuously monitors the overall health, operability, and availability of each blade
server in each chassis managed by the Cisco UCS manager, and alerts
administrators to anomalies as soon as they occur, so that the required corrective
actions can be taken before mission-critical services begin to suffer.

Target of the test: A Cisco UCS manager

Agent deploying the test: A remote agent

Configurable parameters for the test:
1. TESTPERIOD – How often the test should be executed.
2. HOST – The IP address of the host for which the test is being configured.
3. PORT – The port at which the specified HOST listens. By default, this is NULL.
4. UCS USER and UCS PASSWORD – Provide the credentials of a user with at least
read-only privileges to the target Cisco UCS manager.
5. CONFIRM PASSWORD- Confirm the password by retyping it here.
6. SSL - By default, the Cisco UCS manager is SSL-enabled. Accordingly, the SSL
flag is set to Yes by default.
7. WEB PORT - By default, in most virtualized environments, Cisco UCS manager
listens on port 80 (if not SSL-enabled) or on port 443 (if SSL-enabled) only. This
implies that while monitoring Cisco UCS manager, the eG agent, by default,
connects to port 80 if the SSL flag above is set to No, and to port 443 if the
SSL flag is set to Yes. Accordingly, the WEBPORT parameter carries the value
default unless you change it.
In some environments however, the default ports 80 or 443 might not apply. In
such a case, against the WEBPORT parameter, you can specify the exact port at
which the Cisco UCS manager in your environment listens, so that the eG agent
communicates with that port for collecting metrics from the Cisco UCS manager.
8. DETAILED DIAGNOSIS - To make diagnosis more efficient and accurate, the eG
Enterprise suite embeds an optional detailed diagnostic capability. With this
capability, the eG agents can be configured to run detailed, more elaborate tests
as and when specific problems are detected. To enable the detailed diagnosis
capability of this test for a particular server, choose the On option. To disable
the capability, click on the Off option.
The option to selectively enable/disable the detailed diagnosis capability will be
available only if the following conditions are fulfilled:

• The eG manager license should allow the detailed diagnosis capability.
• Both the normal and abnormal frequencies configured for the detailed
diagnosis measures should not be 0.

Outputs of the test: One set of results for each blade server in each chassis managed by the Cisco UCS
manager being monitored

Measurements made by the test (Measurement, Measurement Unit, Interpretation):

Overall status: Indicates the overall status of this blade server in this chassis.
Interpretation: The states reported by this measure and their corresponding numeric equivalents are described in the table below:

Numeric Value    State
0                Indeterminate
1                Unassociated
10               Ok
11               Discovery
12               Config
13               Unconfig
14               Power-off
15               Restart
20               Maintenance
21               Test
29               Compute-mismatch
30               Compute-failed
31               Degraded
32               Discovery-failed
33               Config-failure
34               Unconfig-failed
35               Test-failed
36               Maintenance-failed
40               Removed
41               Disabled
50               Inaccessible
60               Thermal-problem
61               Power-problem
62               Voltage-problem
63               Inoperable
101              Decommissioning
201              Bios-restore
202              Cmos-reset
203              Diagnostics
204              Diagnostics-failed

Note: By default, this measure reports the above-mentioned states while indicating the overall status of a blade server. However, in the graph of this measure, states will be represented using their corresponding numeric equivalents only.
The detailed diagnosis of this measure provides the Time, Slot ID, chassis ID, PID, Revision, Serial Number, Vendor, Name, UUID, Service Profile and Original UUID attributes for this blade server.

Administrative state: Indicates the current administrative state of this blade server loaded in this chassis.
Interpretation: This measure reports either In-service or Out-of-service as the administrative state of the blade servers. The numeric equivalents corresponding to these states are shown in the table below:

Numeric Value    State
1                In-service
2                Out-of-service

Note: By default, this measure reports the above-mentioned states while indicating the administrative state of a blade server. However, in the graph of this measure, states will be represented using their corresponding numeric equivalents only.


Association state: Indicates the current associative state of this blade server loaded in this chassis, i.e., indicates whether the blade server is associated with the service profile that is preconfigured in the Cisco UCS Manager.
Interpretation: A service profile represents a logical view of a single blade server, without needing to know exactly which blade you are talking about. The profile object contains the server personality (identity and network information). The profile can then be associated with a single blade at a time.

Cisco UCS Manager uses service profiles to provision the blade servers and their I/O properties. The Cisco Unified Computing System has a form factor-neutral architecture, allowing administrators to centrally manage Cisco UCS blade servers or rack-mount servers, or incorporate both within a single management domain. Service profiles are created by server, network, and storage administrators and are stored in the Cisco UCS Fabric Interconnects. Infrastructure policies needed to deploy applications, such as power and cooling, security, identity, hardware health, and Ethernet and storage networking, are encapsulated in the service profile. The policies coordinate and automate element management at every layer of the hardware stack, including RAID levels, BIOS settings, firmware revisions and settings, adapter identities and settings, VLAN and VSAN network settings, network quality of service (QoS), and data center connectivity. Cisco UCS Manager provides granular Cisco Unified Computing System visibility for higher-level management tools from BMC, CA, HP, IBM, and others, providing exceptional alignment of infrastructure management with OS and application requirements.

This measure reports the associative state of the blade servers and their numeric equivalents as shown in the table:

Numeric Value    State
0                None
1                Associated
2                Removing
3                Failed

Note: By default, this measure reports the above-mentioned states while indicating the associative state of a blade server. However, in the graph of this measure, states will be represented using their corresponding numeric equivalents only.

Availability state: Indicates the current availability status of this blade server in this chassis.
Interpretation: This measure reports either Available or Unavailable as the availability status of the blade servers. The states and their corresponding numeric equivalents are shown in the table below:

Numeric Value    State
0                Unavailable
1                Available

Note: By default, this measure reports the above-mentioned states while indicating the availability state of a blade server. However, in the graph of this measure, states will be represented using their corresponding numeric equivalents only.

Checkpoint state: Indicates the current checkpoint status of this blade server loaded in this chassis.
Interpretation: The states reported by this measure and their corresponding numeric equivalents are described in the table below:

Numeric Value    State
0                Unknown
1                Removing
2                Shallow-checkpoint
3                Deep-checkpoint
4                Discovered

Note: By default, this measure reports the above-mentioned states while indicating the checkpoint state of a blade server. However, in the graph of this measure, states will be represented using their corresponding numeric equivalents only.

Discovery state: Indicates the current discovery status of this blade server loaded in this chassis.
Interpretation: The states reported by this measure and their corresponding numeric equivalents are described in the table below:

Numeric Value    State
0                Undiscovered
1                In-progress
2                Malformed-fru-info
3                Fru-not-ready
4                Insufficiently-equipped
8                Failed
16               Complete
32               Retry
64               Throttled
128              Illegal-fru
129              Fru-identity-indeterminate
130              Fru-state-indeterminate
131              Diagnostics-in-progress
132              Efidiagnostics-in-progress
133              Diagnostics-failed
134              Diagnostics-complete

Note: By default, this measure reports the above-mentioned states while indicating the discovery state of a blade server. However, in the graph of this measure, states will be represented using their corresponding numeric equivalents only.

Operability: Indicates the current operating state of this blade server loaded in this chassis.
Interpretation: The states reported by this measure and their corresponding numeric equivalents are described in the table below:

Numeric Value    State
0                Unknown
1                Operable
2                Inoperable
3                Degraded
4                Powered-off
5                Power-problem
6                Removed
7                Voltage-problem
8                Thermal-problem
9                Performance-problem
10               Accessibility-problem
11               Identity-unestablishable
12               Bios-post-timeout
13               Disabled
51               Fabric-conn-problem
52               Fabric-unsupported-conn
81               Config
82               Equipment-problem
83               Decommissioning
84               Chassis-limit-exceeded
100              Not-supported
101              Discovery
102              Discovery-failed
103              Identify
104              Post-failure
105              Upgrade-problem
106              Peer-comm-problem
107              Auto-upgrade

Note: By default, this measure reports the above-mentioned states while indicating the operational state of a blade server. However, in the graph of this measure, states will be represented using their corresponding numeric equivalents only.

Power state: Indicates the current power status of this blade server loaded in this chassis.
Interpretation: The states reported by this measure and their corresponding numeric equivalents are described in the table below:

Numeric Value    State
0                Unknown
1                On
2                Test
3                Off
4                Online
5                Offline
6                Offduty
7                Degraded
8                Power-save
9                Error
10               Not-supported

Note: By default, this measure reports the above-mentioned states while indicating the power state of a blade server. However, in the graph of this measure, states will be represented using their corresponding numeric equivalents only.

Slot state: Indicates the current slot status of this blade server loaded in this chassis.
Interpretation: The states reported by this measure and their corresponding numeric equivalents are described in the table below:

Numeric Value    State
0                Unknown
1                Empty
10               Equipped
11               Missing
12               Mismatch
13               Equipped-not-primary
20               Equipped-identity-unestablishable
21               Mismatch-identity-unestablishable
30               Inaccessible
40               Unauthorized

Note: By default, this measure reports the above-mentioned states while indicating the slot state of a blade server. However, in the graph of this measure, states will be represented using their corresponding numeric equivalents only.

Effective memory:
Indicates the amount of memory that can be effectively used by this blade server present
in this chassis.
Measurement unit: MB
Interpretation: Ideally, the value of this measure should be high.

Total memory:
Indicates the total memory available in this blade server present in this chassis.
Measurement unit: MB

Number of processors:
Indicates the number of Central Processing Units (CPUs) available in this blade server
loaded in this chassis.
Measurement unit: Number

Number of cores:
Indicates the total number of cores available on all the CPUs that are installed in this
blade server in this chassis.
Measurement unit: Number

Number of cores enabled:
Indicates the number of processor cores that are enabled in this blade server in this
chassis.
Measurement unit: Number

Number of threads:
Indicates the number of threads that can run simultaneously on this blade server in this
chassis.
Measurement unit: Number
Interpretation: This measure should be equal to either the number of cores, or twice the
number of cores if hyperthreading is supported and enabled.
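
The relationship between the thread count and the core count described for the Number of threads measure can be checked with a few lines of code. The sketch below is only an illustration of that rule; the function name thread_count_is_consistent is hypothetical and is not part of eG Enterprise.

```python
def thread_count_is_consistent(threads, cores):
    """Return True if threads equals cores, or twice cores (hyperthreading)."""
    return threads in (cores, 2 * cores)
```

A blade server reporting 8 cores and 16 threads passes this check, whereas one reporting 8 cores and 12 threads does not and may be worth a closer look.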

Number of adapters:
Indicates the number of adapters available in this blade server in this chassis.
Measurement unit: Number

Number of NICs:
Indicates the number of physical Ethernet network interface cards (NICs) available in this
blade server in this chassis.
Measurement unit: Number

Number of HBAs:
Indicates the number of physical host bus adapters (HBAs) available in this blade server.
Measurement unit: Number

1.4.2 Blade Processors Test

The Cisco UCS B-Series Blade Servers support up to two Intel Xeon Series 5500 multicore processors.
If the temperature of a processor suddenly soars, or if an unexpectedly high current flows through
a processor, one or more internal components of the processor can be damaged. This can suspend
not only the processor's operations, but also those of the blade server that depends on it. It is
hence imperative to keep tabs on the temperature and current changes experienced by each of the
processors of a blade server. Using this test, you can periodically check the temperature and input
current of each of the processors supported by a blade server, and promptly detect abnormalities (if
any).

Purpose
Periodically checks the temperature and input current of each of the processors
supported by a blade server, and promptly detects abnormalities (if any)

Target of the test
A Cisco UCS manager

Agent deploying the test
A remote agent

Configurable parameters for the test

1. TESTPERIOD – How often the test should be executed.
2. HOST – The IP address of the host for which the test is being configured.
3. PORT – The port at which the specified HOST listens. By default, this is NULL.
4. UCS USER and UCS PASSWORD – Provide the credentials of a user with at least
read-only privileges to the target Cisco UCS manager.
5. CONFIRM PASSWORD – Confirm the password by retyping it here.
6. SSL – By default, the Cisco UCS manager is SSL-enabled. Accordingly, the SSL
flag is set to Yes by default.
7. WEB PORT – By default, in most virtualized environments, Cisco UCS manager
listens on port 80 (if not SSL-enabled) or on port 443 (if SSL-enabled) only.
Accordingly, the WEB PORT parameter is set to default by default. With this
setting, the eG agent connects to port 80 if the SSL flag above is set to No,
and to port 443 if the SSL flag is set to Yes.
In some environments, however, the default ports 80 or 443 might not apply. In
such a case, specify against the WEB PORT parameter the exact port at which the
Cisco UCS manager in your environment listens, so that the eG agent
communicates with that port for collecting metrics from the Cisco UCS manager.
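
The port selection described for the SSL and WEB PORT parameters can be summarized in code. The following is a minimal sketch of that logic under the assumption that the WEB PORT value is either the word default or a port number; the function resolve_web_port is hypothetical and does not represent how the eG agent is actually implemented.

```python
def resolve_web_port(ssl_enabled, web_port="default"):
    """Pick the port to connect to, following the rules described above.

    ssl_enabled: True if the SSL flag is set to Yes, False if it is No.
    web_port:    the WEB PORT setting, either "default" or an explicit port.
    """
    if str(web_port).strip().lower() == "default":
        # Default behaviour: 443 when SSL-enabled, 80 otherwise.
        return 443 if ssl_enabled else 80
    port = int(web_port)
    if not 1 <= port <= 65535:
        raise ValueError("WEB PORT must be between 1 and 65535")
    return port
```

With the SSL flag set to Yes and WEB PORT left at default, the sketch returns 443; an explicit value such as 8443 overrides the default regardless of the SSL flag.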

Outputs of the test
One set of results for each processor supported by every blade server in each
chassis being monitored

Measurements made by the test

CPU temperature:
Indicates the current temperature of this processor.
Measurement unit: Celsius
Interpretation: A low value is ideal for this measure. A sudden and significant increase
in this value could be a cause for concern.

Input current:
Indicates the input current received by this processor.
Measurement unit: Amps
Interpretation: Ideally, the value of this measure should be low. A sudden and significant
increase in this value could damage the internal components of the processor.
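
The manual does not state what counts as a sudden and significant increase, so any numeric limit is an assumption. The sketch below merely illustrates one way to flag such jumps between successive readings of CPU temperature or Input current; the function name sudden_increases and the min_jump parameter are hypothetical and are unrelated to the thresholds configured in eG Enterprise.

```python
def sudden_increases(samples, min_jump):
    """Return the indices of readings that exceed the previous reading
    by at least min_jump (an assumed, user-chosen limit)."""
    return [
        i for i in range(1, len(samples))
        if samples[i] - samples[i - 1] >= min_jump
    ]
```

For the readings 45, 46, 47, 62, 63 and an assumed min_jump of 10, only the fourth reading (index 3) is flagged.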

1.4.3 Blade Motherboard Test

Issues in the motherboard can have an adverse impact on the performance levels delivered by a blade
server. This test monitors the health of the motherboard of each blade server loaded in each chassis
managed by a Cisco UCS manager, and reveals the following:

• Is the motherboard consuming power excessively?
• Are the current and voltage inputs received by the motherboard in excess of its capacity?
• Are the temperatures in the front and rear panels of the motherboard normal?
• If the temperature of the rear panel is very high, then which rear panel is contributing to this
abnormality - the left or the right rear panel?

Purpose

Monitors the health of the motherboard of each blade server loaded in a chassis

Target of the test
A Cisco UCS manager

Agent deploying the test
A remote agent

Configurable parameters for the test

1. TESTPERIOD – How often the test should be executed.
2. HOST – The IP address of the host for which the test is being configured.
3. PORT – The port at which the specified HOST listens. By default, this is NULL.
4. UCS USER and UCS PASSWORD – Provide the credentials of a user with at least
read-only privileges to the target Cisco UCS manager.
5. CONFIRM PASSWORD – Confirm the password by retyping it here.
6. SSL – By default, the Cisco UCS manager is SSL-enabled. Accordingly, the SSL
flag is set to Yes by default.
7. WEB PORT – By default, in most virtualized environments, Cisco UCS manager
listens on port 80 (if not SSL-enabled) or on port 443 (if SSL-enabled) only.
Accordingly, the WEB PORT parameter is set to default by default. With this
setting, the eG agent connects to port 80 if the SSL flag above is set to No,
and to port 443 if the SSL flag is set to Yes.
In some environments, however, the default ports 80 or 443 might not apply. In
such a case, specify against the WEB PORT parameter the exact port at which the
Cisco UCS manager in your environment listens, so that the eG agent
communicates with that port for collecting metrics from the Cisco UCS manager.

Outputs of the test
One set of results for each blade server loaded in each chassis managed by the
Cisco UCS manager being monitored

Measurements made by the test

Consumed power:
Indicates the total power consumed by the motherboard of this blade server.
Measurement unit: Watts
Interpretation: An unusually high value could be a cause for concern.

Input current:
Indicates the input current received by the motherboard of this blade server.
Measurement unit: Amps
Interpretation: Ideally, the value of this measure should be low. A sudden and significant
increase in this value could damage the motherboard.

Input voltage:
Indicates the input voltage received by the motherboard of this blade server.
Measurement unit: Volts
Interpretation: Ideally, the value of this measure should be low. A sudden and significant
increase in this value could damage the motherboard.

Front temperature:
Indicates the temperature of the front panel of the motherboard of this blade server.
Measurement unit: Celsius
Interpretation: A very high temperature indicates that the motherboard is overheated.

Rear temperature:
Indicates the temperature of the rear panel of the motherboard of this blade server.
Measurement unit: Celsius
Interpretation: A very high temperature indicates that the motherboard is overheated.

Rear temperature right:
Indicates the temperature of the right rear panel of the motherboard of this blade server.
Measurement unit: Celsius
Interpretation: A very high temperature indicates that the motherboard is overheated.

Rear temperature left:
Indicates the temperature of the left rear panel of the motherboard of this blade server.
Measurement unit: Celsius
Interpretation: A very high temperature indicates that the motherboard is overheated.

1.4.4 Blade Memory Arrays Test

This test monitors the temperature of each of the memory arrays of the blade servers loaded in a
chassis, and reports any abnormal increase in temperature.

Purpose

Monitors the temperature of each of the memory arrays of the blade servers loaded
in a chassis, and reports any abnormal increase in temperature

Target of the test
A Cisco UCS manager

Agent deploying the test
A remote agent

Configurable parameters for the test

1. TESTPERIOD – How often the test should be executed.
2. HOST – The IP address of the host for which the test is being configured.
3. PORT – The port at which the specified HOST listens. By default, this is NULL.
4. UCS USER and UCS PASSWORD – Provide the credentials of a user with at least
read-only privileges to the target Cisco UCS manager.
5. CONFIRM PASSWORD – Confirm the password by retyping it here.
6. SSL – By default, the Cisco UCS manager is SSL-enabled. Accordingly, the SSL
flag is set to Yes by default.
7. WEB PORT – By default, in most virtualized environments, Cisco UCS manager
listens on port 80 (if not SSL-enabled) or on port 443 (if SSL-enabled) only.
Accordingly, the WEB PORT parameter is set to default by default. With this
setting, the eG agent connects to port 80 if the SSL flag above is set to No,
and to port 443 if the SSL flag is set to Yes.
In some environments, however, the default ports 80 or 443 might not apply. In
such a case, specify against the WEB PORT parameter the exact port at which the
Cisco UCS manager in your environment listens, so that the eG agent
communicates with that port for collecting metrics from the Cisco UCS manager.

Outputs of the test
One set of results for each memory array of every blade server loaded in each
chassis managed by the Cisco UCS manager being monitored

Measurements made by the test

Array temperature:
Indicates the current temperature of the memory array present in this blade server.
Measurement unit: Celsius
Interpretation: A very high temperature could indicate that the memory array is overheated.

1.4.5 Blade NICs Test

This test auto-discovers the NICs (Network Interface Cards) supported by the UCS Blade
servers, monitors the overall health, operational state, and load on each NIC, and promptly
notifies administrators when an NIC suddenly switches to an abnormal state, becomes
overloaded, or encounters errors while sending/receiving data over the network. This way, you
can easily isolate problematic, over-used, and error-prone NICs.

Purpose

Auto-discovers the NICs (Network Interface Cards) supported by the UCS Blade
servers, monitors the overall health, operational state, and load on each NIC, and
promptly notifies administrators when an NIC suddenly switches to an abnormal
state, becomes overloaded, or encounters errors while sending/receiving data over
the network

Target of the test
A Cisco UCS manager

Agent deploying the test
A remote agent

Configurable parameters for the test

1. TESTPERIOD – How often the test should be executed.
2. HOST – The IP address of the host for which the test is being configured.
3. PORT – The port at which the specified HOST listens. By default, this is NULL.
4. UCS USER and UCS PASSWORD – Provide the credentials of a user with at least
read-only privileges to the target Cisco UCS manager.
5. CONFIRM PASSWORD – Confirm the password by retyping it here.
6. SSL – By default, the Cisco UCS manager is SSL-enabled. Accordingly, the SSL
flag is set to Yes by default.
7. WEB PORT – By default, in most virtualized environments, Cisco UCS manager
listens on port 80 (if not SSL-enabled) or on port 443 (if SSL-enabled) only.
Accordingly, the WEB PORT parameter is set to default by default. With this
setting, the eG agent connects to port 80 if the SSL flag above is set to No,
and to port 443 if the SSL flag is set to Yes.
In some environments, however, the default ports 80 or 443 might not apply. In
such a case, specify against the WEB PORT parameter the exact port at which the
Cisco UCS manager in your environment listens, so that the eG agent
communicates with that port for collecting metrics from the Cisco UCS manager.
8. DETAILED DIAGNOSIS – To make diagnosis more efficient and accurate, the eG
Enterprise suite embeds an optional detailed diagnostic capability. With this
capability, the eG agents can be configured to run detailed, more elaborate tests
as and when specific problems are detected. To enable the detailed diagnosis
capability of this test for a particular server, choose the On option. To disable
the capability, choose the Off option.
The option to selectively enable/disable the detailed diagnosis capability will be
available only if the following conditions are fulfilled:

• The eG manager license should allow the detailed diagnosis capability.
• Neither the normal nor the abnormal frequency configured for the detailed
diagnosis measures should be 0.
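
The two conditions above can be read as a single boolean rule. The sketch below expresses that rule for illustration only; the function and parameter names are hypothetical and do not correspond to any eG Enterprise setting or API.

```python
def detailed_diagnosis_option_available(license_allows_dd,
                                        normal_frequency,
                                        abnormal_frequency):
    """Return True only if the license permits detailed diagnosis and
    neither configured frequency is 0, as the two conditions require."""
    return (bool(license_allows_dd)
            and normal_frequency != 0
            and abnormal_frequency != 0)
```

If either frequency is 0, or the license does not allow the capability, the sketch returns False, mirroring the case where the On/Off option is not offered.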

Outputs of the test
One set of results for each NIC supported by every blade server loaded in each
chassis managed by the Cisco UCS manager being monitored

Measurements made by the test

Overall status:
Indicates the current state of this NIC.

The values reported by this measure and their corresponding numeric values are
described in the table below:

Measure Value                Numeric Value
Unknown                      0
Operable                     1
Inoperable                   2
Degraded                     3
Powered-off                  4
Power-problem                5
Removed                      6
Voltage-problem              7
Thermal-problem              8
Performance-problem          9
Accessibility-problem        10
Identity-unestablishable     11
Bios-post-timeout            12
Disabled                     13
Fabric-conn-problem          51
Fabric-unsupported-conn      52
Config                       81
Equipment-problem            82
Decommissioning              83
Chassis-limit-exceeded       84
Not-supported                100
Discovery                    101
Discovery-failed             102
Identify                     103
Post-failure                 104
Upgrade-problem              105
Peer-comm-problem            106
Auto-upgrade                 107

The detailed diagnosis of this measure provides the complete details of an NIC
such as its ID, Vendor, vNIC, PCIE Address, MAC, Original MAC, Purpose, Name, and
Type.

Note:
By default, this measure reports the Measure Values listed in the table above to
indicate the overall state of an NIC. However, in the graph of this measure,
states will be represented using their corresponding numeric equivalents only.

Operability:
Indicates the current operational state of this NIC.

The values reported by this measure and their corresponding numeric values are
described in the table below:

Measure Value                Numeric Value
Unknown                      0
Operable                     1
Inoperable                   2
Degraded                     3
Powered-off                  4
Power-problem                5
Removed                      6
Voltage-problem              7
Thermal-problem              8
Performance-problem          9
Accessibility-problem        10
Identity-unestablishable     11
Bios-post-timeout            12
Disabled                     13
Fabric-conn-problem          51
Fabric-unsupported-conn      52
Config                       81
Equipment-problem            82
Decommissioning              83
Chassis-limit-exceeded       84
Not-supported                100
Discovery                    101
Discovery-failed             102
Identify                     103
Post-failure                 104
Upgrade-problem              105
Peer-comm-problem            106
Auto-upgrade                 107

Note:
By default, this measure reports the Measure Values listed in the table above to
indicate the operational state of an NIC. However, in the graph of this measure,
states will be represented using their corresponding numeric equivalents only.

Administrative state:
Indicates the current administrative state of this NIC.

The values reported by this measure and their corresponding numeric values are
described in the table below:

Measure Value                  Numeric Value
Enabled                        0
Reset-connectivity-active      1
Reset-connectivity-passive     2
Reset-connectivity             3

Note:
By default, this measure reports the Measure Values listed in the table above to
indicate the administrative state of an NIC. However, in the graph of this measure,
states will be represented using their corresponding numeric equivalents only.

Discovery state:
Indicates the current discovery state of this NIC.

The values reported by this measure and their corresponding numeric values are
described in the table below:

Measure Value    Numeric Value
Absent           0
Present          1
Mis-connect      2
Missing          3
New              4

Note:
By default, this measure reports the Measure Values listed in the table above to
indicate the discovery state of an NIC. However, in the graph of this measure,
states will be represented using their corresponding numeric equivalents only.

Presence state:
Indicates the current presence state of this NIC.

The values reported by this measure and their corresponding numeric values are
described in the table below:

Measure Value                         Numeric Value
Unknown                               0
Empty                                 1
Equipped                              10
Missing                               11
Mismatch                              12
Equipped-not-primary                  13
Equipped-identity-unestablishable     20
Mismatch-identity-unestablishable     21
Inaccessible                          30
Unauthorized                          40
Not-supported                         100

Note:
By default, this measure reports the Measure Values listed in the table above to
indicate the presence state of an NIC. However, in the graph of this measure,
states will be represented using their corresponding numeric equivalents only.

Data received:
Indicates the amount of data received by this NIC during the last measurement period.
Measurement unit: MB

Data transmitted:
Indicates the amount of data transmitted by this NIC during the last measurement period.
Measurement unit: MB

Interpretation (Data received and Data transmitted): These measures are good indicators of
the load handled by an NIC. By comparing the value of each measure across NICs, you can
quickly identify which NIC is experiencing heavy data traffic, and when: while receiving
data or while transmitting data.

Packets received:
Indicates the number of packets received by this NIC during the last measurement period.
Measurement unit: Packets

Packets transmitted:
Indicates the number of packets sent by this NIC during the last measurement period.
Measurement unit: Packets

Interpretation (Packets received and Packets transmitted): These measures are good
indicators of the load handled by an NIC. By comparing the value of each measure across
NICs, you can quickly identify which NIC is experiencing heavy data traffic, and when:
while receiving data or while transmitting data.

Dropped packets received:
Indicates the number of packets that were dropped while being received by this NIC during
the last measurement period.
Measurement unit: Packets

Dropped packets transmitted:
Indicates the number of packets that were dropped while being transmitted by this NIC
during the last measurement period.
Measurement unit: Packets

Errors received:
Indicates the errors encountered by this NIC while receiving data during the last
measurement period.
Measurement unit: Errors

Errors transmitted:
Indicates the errors encountered by this NIC while transmitting data during the last
measurement period.
Measurement unit: Errors

Interpretation (Errors received and Errors transmitted): Ideally, the value of both these
measures should be 0. A non-zero value indicates that one or more errors have occurred on
an NIC. If these measure values increase with time, you may want to compare the value of
each of these measures across NICs to quickly zero in on the error-prone NICs and
understand when the maximum number of errors occurred on those NICs: while transmitting
data or while receiving it.
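
The comparison across NICs suggested for the error measures can be sketched in a few lines. The example below is an illustration only: it assumes that the Errors received and Errors transmitted values have already been collected into a dictionary, and the function name rank_error_prone_nics and the NIC names used with it are hypothetical.

```python
def rank_error_prone_nics(error_counts):
    """Rank NICs that reported errors, most errors first.

    error_counts maps a NIC name to a tuple of
    (errors_received, errors_transmitted) for one measurement period.
    Returns a list of (nic, total_errors, direction) tuples, where
    direction says where most of the errors occurred.
    """
    ranked = []
    for nic, (received, transmitted) in error_counts.items():
        total = received + transmitted
        if total == 0:
            continue  # ideal case: no errors on this NIC
        if received > transmitted:
            direction = "receiving"
        elif transmitted > received:
            direction = "transmitting"
        else:
            direction = "both"
        ranked.append((nic, total, direction))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked
```

Given hypothetical counts of (0, 0), (5, 1) and (2, 9) for three NICs, the first is omitted, and the third is listed ahead of the second because it reported more errors, mostly while transmitting.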

Chapter 2
Conclusion

This document has described in detail the monitoring paradigm used and the measurement
capabilities of the eG Enterprise suite of products with respect to Cisco UCS Manager. For details of how
to administer and use the eG Enterprise suite of products, refer to the user manuals.
We will be adding new measurement capabilities into the future versions of the eG Enterprise suite. If
you can identify new capabilities that you would like us to incorporate in the eG Enterprise suite of
products, please contact support@eginnovations.com. We look forward to your support and
cooperation. Any feedback regarding this manual or any other aspects of the eG Enterprise suite can
be forwarded to feedback@eginnovations.com.



Source Exif Data:
File Type                       : PDF
File Type Extension             : pdf
MIME Type                       : application/pdf
PDF Version                     : 1.5
Linearized                      : No
Page Count                      : 151
Language                        : en-US
Title                           : Manual
Author                          : Geetha
Creator                         : Microsoft® Word 2010
Create Date                     : 2015:02:10 21:45:07+05:30
Modify Date                     : 2015:02:10 21:45:07+05:30
Producer                        : Microsoft® Word 2010