
ibm.com/redbooks
Front cover
Implementing and Managing
InfiniBand Coupling Links
on IBM System z
Frank Kyne
Hua Bin Chu
George Handera
Marek Liedel
Masaya Nakagawa
Iain Neville
Christian Zass
Concepts, terminology, and supported topologies
Planning, migration, and implementation guidance
Performance information
International Technical Support Organization
Implementing and Managing InfiniBand Coupling Links
on IBM System z
January 2014
SG24-7539-03
© Copyright International Business Machines Corporation 2008, 2012, 2014. All rights reserved.
Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule
Contract with IBM Corp.
Fourth Edition (January 2014)
This edition applies to the InfiniBand features that are available on IBM System z servers.
Note: Before using this information and the product it supports, read the information in “Notices” on
page vii.
Contents
Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Authors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Now you can become a published author, too! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .x
Comments welcome. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Stay connected to IBM Redbooks publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Summary of changes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
January 2014, Fourth Edition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
March 2012, Third Edition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
Chapter 1. Introduction to InfiniBand on System z . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Objective of this book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 InfiniBand architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2.1 Physical layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 IBM System z InfiniBand implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3.1 Host channel adapters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3.2 Processor-specific implementations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 InfiniBand benefits. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.5 The importance of an efficient coupling infrastructure . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.5.1 Coupling link performance factors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.5.2 PSIFB 12X and 1X InfiniBand links. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.6 Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.7 Structure of this book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Chapter 2. InfiniBand technical description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.1 InfiniBand connectivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.2 InfiniBand fanouts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2.1 Adapter types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.3 Fanout plugging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.3.1 Fanout plugging rules for zEnterprise 196 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.3.2 Fanout plugging rules for zEnterprise 114 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.3.3 Fanout plugging rules for System z10 EC. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.3.4 Fanout plugging rules for System z10 BC. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.4 Adapter ID assignment and VCHIDs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.4.1 Adapter ID assignment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.4.2 VCHID - Virtual Channel Identifier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.5 InfiniBand coupling links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.5.1 12X PSIFB coupling links on System z9. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.5.2 12X PSIFB coupling links on System z10. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.5.3 PSIFB Long Reach coupling links on System z10 . . . . . . . . . . . . . . . . . . . . . . . . 31
2.5.4 12X PSIFB coupling links on z196 and z114 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.5.5 Long Reach PSIFB coupling links on zEnterprise 196 and 114 . . . . . . . . . . . . . . 33
2.5.6 PSIFB coupling links and Server Time Protocol . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.6 InfiniBand cables. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Chapter 3. Preinstallation planning. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.1 Planning considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.2 CPC topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.2.1 Coexistence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.2.2 Supported coupling link types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.3 Hardware and software prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.3.1 Hardware prerequisites. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.3.2 Software prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.4 Considerations for Server Time Protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.4.1 Considerations for STP with PSIFB coupling links . . . . . . . . . . . . . . . . . . . . . . . . 45
3.5 Multisite sysplex considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.6 Planning for future nondisruptive growth. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.7 Physical and logical coupling link capacity planning . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.7.1 Availability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.7.2 Connectivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.7.3 Capacity and performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.8 Physical Coupling link addressing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.9 Cabling considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
Chapter 4. Migration planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.1 Migration considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.1.1 Connectivity considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.2 Introduction to the scenario notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.3 Scenario 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.4 Scenario 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.5 Scenario 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.6 Scenario 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.7 Scenario 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
4.8 Concurrent switch between IFB modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
Chapter 5. Performance considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
5.1 Introduction to performance considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
5.2 Our measurements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
5.3 Our configuration. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
5.4 Testing background. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
5.4.1 z/OS LPAR configurations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
5.4.2 CF configurations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
5.4.3 Workloads used for our measurements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
5.4.4 Run-time test composition. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
5.4.5 Measurement summaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
5.5 Simplex performance measurements results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
5.5.1 Measurements on z10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
5.5.2 Measurements on z196 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
5.6 ISC and PSIFB 1X performance measurements results. . . . . . . . . . . . . . . . . . . . . . . 140
5.7 SM Duplex performance measurements results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
5.8 Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
Chapter 6. Configuration management. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
6.1 Configuration overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
6.2 PSIFB link support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
6.2.1 PSIFB connectivity options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
6.3 Sample configuration with PSIFB links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
6.4 Defining your configuration to the software and hardware . . . . . . . . . . . . . . . . . . . . . 161
6.4.1 Input/output configuration program support for PSIFB links . . . . . . . . . . . . . . . . 161
6.4.2 Defining PSIFB links using HCD. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
6.4.3 Defining timing-only PSIFB links. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
6.4.4 IOCP sample statements for PSIFB links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
6.4.5 Using I/O configuration data to document your coupling connections . . . . . . . . 177
6.5 Determining which CHPIDs are using a port. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
6.6 Cabling documentation considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
6.7 Dynamic reconfiguration considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
6.8 CHPID Mapping Tool support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
Chapter 7. Operations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
7.1 Managing your InfiniBand infrastructure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
7.2 z/OS commands for PSIFB links. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
7.2.1 z/OS CF-related commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
7.3 Coupling Facility commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
7.4 Hardware Management Console and Support Element tasks . . . . . . . . . . . . . . . . . . 211
7.4.1 Display Adapter IDs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
7.4.2 Determining the CHPIDs that are associated with an AID/port. . . . . . . . . . . . . . 214
7.4.3 Toggling a CHPID on or offline using HMC. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
7.4.4 Displaying the status of a CIB link (CPC view) . . . . . . . . . . . . . . . . . . . . . . . . . . 219
7.4.5 Display the status of a logical CIB link (Image view). . . . . . . . . . . . . . . . . . . . . . 221
7.4.6 View Port Parameters panel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
7.4.7 Useful information from the Channel Problem Determination display. . . . . . . . . 224
7.4.8 System Activity Display. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
7.5 PSIFB Channel problem determination. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
7.5.1 Checking that the link is physically working . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
7.5.2 Verifying that the physical connections match the IOCDS definitions. . . . . . . . . 229
7.5.3 Setting a coupling link online . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
7.6 Environmental Record Editing and Printing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
Appendix A. Resource Measurement Facility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
Resource Measurement Facility overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
Introduction to performance monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
Introduction to RMF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
Interactive reporting with RMF Monitor III . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
RMF Postprocessor reporting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
Appendix B. Processor driver levels. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
Driver level cross-reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
Appendix C. Link buffers and subchannels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
Capacity planning for coupling links overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
Subchannels and link buffers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
Appendix D. Client experience. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
Overview of the client experience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
Large production sysplex . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
Exploiting InfiniBand for link consolidation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
Related publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
IBM Redbooks publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
Other publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
Online resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
How to get IBM Redbooks publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
Help from IBM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
Notices
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult
your local IBM representative for information on the products and services currently available in your area. Any
reference to an IBM product, program, or service is not intended to state or imply that only that IBM product,
program, or service may be used. Any functionally equivalent product, program, or service that does not
infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to
evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The
furnishing of this document does not grant you any license to these patents. You can send license inquiries, in
writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.
The following paragraph does not apply to the United Kingdom or any other country where such
provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION
PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR
IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of
express or implied warranties in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may make
improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time
without notice.
Any references in this information to non-IBM websites are provided for convenience only and do not in any
manner serve as an endorsement of those websites. The materials at those websites are not part of the
materials for this IBM product and use of those websites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring
any obligation to you.
Any performance data contained herein was determined in a controlled environment. Therefore, the results
obtained in other operating environments may vary significantly. Some measurements may have been made
on development-level systems and there is no guarantee that these measurements will be the same on
generally available systems. Furthermore, some measurements may have been estimated through
extrapolation. Actual results may vary. Users of this document should verify the applicable data for their
specific environment.
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm the
accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the
capabilities of non-IBM products should be addressed to the suppliers of those products.
This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to the names and addresses used by an actual business
enterprise is entirely coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the sample
programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore,
cannot guarantee or imply reliability, serviceability, or function of these programs.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines
Corporation in the United States, other countries, or both. These and other IBM trademarked terms are
marked on their first occurrence in this information with the appropriate symbol (® or ™), indicating US
registered or common law trademarks owned by IBM at the time this information was published. Such
trademarks may also be registered or common law trademarks in other countries. A current list of IBM
trademarks is available on the Web at http://www.ibm.com/legal/copytrade.shtml
The following terms are trademarks of the International Business Machines Corporation in the United States,
other countries, or both:
CICS®
DB2®
FICON®
GDPS®
Global Technology Services®
IBM®
IMS™
MVS™
OS/390®
Parallel Sysplex®
Redbooks®
Redbooks (logo) ®
Resource Link®
Resource Measurement Facility™
RMF™
System z10®
System z9®
System z®
System/390®
WebSphere®
z/OS®
z/VM®
z10™
z9®
zEnterprise®
The following terms are trademarks of other companies:
Intel, Intel logo, Intel Inside logo, and Intel Centrino logo are trademarks or registered trademarks of Intel
Corporation or its subsidiaries in the United States and other countries.
Microsoft and the Windows logo are trademarks of Microsoft Corporation in the United States, other
countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Other company, product, or service names may be trademarks or service marks of others.
Preface
This IBM® Redbooks® publication provides introductory, planning, migration, and
management information about InfiniBand coupling links on IBM System z® servers.
The book will help you plan and implement the migration from earlier coupling links (ISC3 and
ICB4) to InfiniBand coupling links. It provides step-by-step information about configuring
InfiniBand connections. Information is also provided about the performance of InfiniBand links
compared to other link types.
This book is intended for systems programmers, data center planners, and systems
engineers. It introduces and explains InfiniBand terminology to help you understand the
InfiniBand implementation on System z servers. It also serves as a basis for configuration
planning and management.
Authors
This book was produced by a team of specialists from around the world working at the IBM
International Technical Support Organization (ITSO), Poughkeepsie Center.
Frank Kyne is an Executive IT Specialist and Project Leader at the IBM International
Technical Support Organization, Poughkeepsie Center. He writes extensively and teaches
IBM classes worldwide on all areas of IBM Parallel Sysplex® and high availability. Before
joining the ITSO 13 years ago, Frank worked in IBM Ireland as an IBM MVS™ Systems
Programmer.
Hua Bin Chu is an Advisory I/T Specialist in China. He has five years of experience with
IBM Global Technology Services® and in supporting clients of large System z products. His
areas of expertise include IBM z/OS®, Parallel Sysplex, System z high availability solutions,
IBM GDPS®, and IBM WebSphere® MQ for z/OS.
George Handera has more than 30 years of data processing experience, ranging across
application development, DB2/MQ Subsystem support, performance management, systems
architecture, and capacity roles at Aetna. He has also worked independently, creating and
selling the copyrights to several mainframe products. George presents at a variety of user
group conferences with a performance-oriented focus related to new hardware offerings,
specialty engines, and coupling technology options and their impact on WebSphere MQ and
IBM DB2® services.
Marek Liedel is a System z IT Specialist in the TSCC Hardware FE System z center in
Mainz, Germany. He worked for 10 years as a Customer Engineer for large banking and
insurance customers and has a total of 16 years of experience in supporting System z clients.
Since 2002, Marek has held a degree as a Certified Engineer in the data processing
technology domain. His areas of expertise include MES installations, HMC/SE code, and
client support in the solution assurance process.
Masaya Nakagawa is a Senior IT Specialist in IBM Japan. He has 12 years of experience in
technical support at the IBM Advanced Technical Support and Design Center. His areas of
expertise include System z, Parallel Sysplex, and z/OS UNIX. Masaya has supported several
projects for mission-critical large systems for IBM clients in Japan and Asia.
Iain Neville is a Certified Consulting IT Specialist with IBM United Kingdom. He has 19 years
of experience in System z technical support and consultancy. His areas of expertise include
Parallel Sysplex, z/OS, IBM FICON®, Server Time Protocol (STP), and System z high
availability solutions. Iain’s other responsibilities include pre-sales System z technical
consultancy with numerous large financial institutions across the UK.
Christian Zass is a System z IT Specialist working in the TSCC Hardware FE System z
center in Germany and at EPSG European FE System z in France. He has 10 years of
experience working with and supporting System z clients. Christian’s areas of expertise
include System z servers and Telematics engineering.
Thanks to the following people for their contributions to this project:
Rich Conway
International Technical Support Organization, Poughkeepsie Center
Friedrich Beichter
IBM Germany
Connie Beuselinck
Noshir Dhondy
Pete Driever
Rich Errickson
Nicole Fagen
Robert Fuga
Steve Goss
Gary King
Phil Muller
Glen Poulsen
Dan Rinck
Donna Stenger
David Surman
Ambrose Verdibello
Barbara Weiler
Brian Zerba
IBM US
Thanks also to the authors of the original edition of this document:
Dick Jorna
IBM Netherlands
Jeff McDonough
IBM US
Special thanks to Bob Haimowitz of the International Technical Support Organization,
Poughkeepsie Center, for his tireless patience and support of this residency.
Now you can become a published author, too!
Here's an opportunity to spotlight your skills, grow your career, and become a published
author - all at the same time! Join an ITSO residency project and help write a book in your
area of expertise, while honing your experience using leading-edge technologies. Your efforts
will help to increase product acceptance and customer satisfaction, as you expand your
network of technical contacts and relationships. Residencies run from two to six weeks in
length, and you can participate either in person or as a remote resident working from your
home base.
Find out more about the residency program, browse the residency index, and apply online at:
ibm.com/redbooks/residencies.html
Comments welcome
Your comments are important to us.
We want our books to be as helpful as possible. Send us your comments about this book or
other IBM Redbooks publications in one of the following ways:
Use the online Contact us review IBM Redbooks publications form found at:
ibm.com/redbooks
Send your comments in an email to:
redbooks@us.ibm.com
Mail your comments to:
IBM Corporation, International Technical Support Organization
Dept. HYTD Mail Station P099
2455 South Road
Poughkeepsie, NY 12601-5400
Stay connected to IBM Redbooks publications
Find us on Facebook:
http://www.facebook.com/IBMRedbooks
Follow us on twitter:
http://twitter.com/ibmredbooks
Look for us on LinkedIn:
http://www.linkedin.com/groups?home=&gid=2130806
Explore new Redbooks publications, residencies, and workshops with the IBM Redbooks
publications weekly newsletter:
https://www.redbooks.ibm.com/Redbooks.nsf/subscribe?OpenForm
Stay current on recent Redbooks publications with RSS Feeds:
http://www.redbooks.ibm.com/rss.html
Summary of changes
This section describes the technical changes made in this edition of the book. This edition
might also include minor corrections and editorial changes that are not identified.
Summary of Changes
for SG24-7539-03
for Implementing and Managing InfiniBand Coupling Links on IBM System z
as created or updated on January 27, 2014.
January 2014, Fourth Edition
This revision adds information about the enhancements introduced with the IBM zEC12 to
provide more information about the InfiniBand infrastructure in operator commands and
IBM RMF™ reports.
Note that the whole book was not updated to include information about the zEC12 and zBC12
servers. For information about the InfiniBand support on those servers refer to the IBM
Redbooks documents IBM zEnterprise BC12 Technical Guide, SG24-8138 and IBM
zEnterprise EC12 Technical Guide, SG24-8049.
New information
Added recommendation about when to specify seven subchannels for a CF link CHPID,
and when to specify 32.
Changed information
Table 1-2 on page 9 was updated to remove IBM z9® and add IBM zEC12 and IBM
zBC12.
Table 1-3 on page 11 was updated to add the expected response times for zEC12.
Table 3-2 on page 42 was updated to reflect the recommended driver and microcode
levels for zEC12 and zBC12.
“Connecting PSIFB links between z196 and later processors” on page 169 was updated to
reflect changes in the default number of subchannels for PSIFB CHPIDs in HCD.
Appendix B, “Processor driver levels” on page 245 was updated to include the driver levels
for IBM zEC12 and IBM zBC12.
Because the InfiniBand support on zEC12 and zBC12 is similar to that on z196 and z114,
minor changes have been made to the text throughout the book to include zEC12 and
zBC12.
March 2012, Third Edition
This revision is a significant rework of the previous edition of this Redbooks document. It
reflects the latest InfiniBand-related announcements at the time of writing. In addition, it
reflects IBM experience with the use of, and migration to, InfiniBand links since their
announcement.
New information
A chapter describing common migration scenarios has been added.
Information about the relative performance of InfiniBand coupling links, compared to other
coupling link types, has been added.
Information has been added about the IBM zEnterprise® 196 and IBM zEnterprise 114
processors.
The July 2011 announcements added:
A new high-performance protocol for PSIFB 12X links.
More subchannels and link buffers for PSIFB 1X links.
Four ports on PSIFB 1X adapters.
The focus of the book has altered to concentrate more on the use of InfiniBand for
coupling and STP in System z servers, with less focus on the other possible uses of
InfiniBand.
At the time of writing, IBM System z9® has been withdrawn from marketing. Therefore,
information about adding InfiniBand to System z9 has been removed. However,
information about the considerations for System z9 as part of an InfiniBand migration
scenario has been retained or enhanced.
Changed information
There are numerous changes throughout the book to reflect software or hardware
changes, or new guidance based on client experiences.
Deleted information
Much of the information about the z9 generation of servers has been removed because
upgrades to those servers have been withdrawn from marketing.
Various information about the InfiniBand architecture has been removed to reflect the
focus in this book on the use of InfiniBand links for coupling.
Chapter 1. Introduction to InfiniBand on
System z
In this chapter, we introduce the InfiniBand architecture and technology and discuss the
advantages that InfiniBand brings to a Parallel Sysplex environment compared to earlier
coupling technologies. InfiniBand is a powerful and flexible interconnect technology designed
to provide connectivity for large server infrastructures, and it plays a vital role in the
performance, availability, and cost-effectiveness of your Parallel Sysplex.
In this chapter, we discuss the following topics:
InfiniBand architecture
IBM System z InfiniBand implementation
Advantages of InfiniBand
The importance of an efficient coupling infrastructure
Terminology
Note: This document reflects the enhancements that were announced for
IBM zEnterprise 196 on July 12, 2011 and delivered with Driver Level 93.
Any z196 that is using Driver 86 must be migrated to Driver 93 before an upgrade to add
HCA3 adapters can be applied. Therefore, this document reflects the rules and capabilities
of z196 CPCs at Driver 93a or later.
a. For more information about Driver levels, see Appendix B, “Processor driver levels” on page 245.
1.1 Objective of this book
This book is a significant update to a previous introductory edition. Since that edition was
published several years ago, IBM has made many announcements related to InfiniBand on
System z. We also have more experience with implementing InfiniBand in large production
environments. We provide that information here so that all clients can benefit from those that
have gone before them. Finally, the focus of this book has changed somewhat, with less
emphasis on InfiniBand architecture and more focus on how InfiniBand is used in a System z
environment.
1.2 InfiniBand architecture
The use, management, and topology of InfiniBand links is significantly different from
traditional coupling links, so a brief explanation of InfiniBand architecture is useful before
continuing on to the rest of the book.
InfiniBand background and capabilities
In 1999, two competing I/O standards called Future I/O (developed by Compaq, IBM, and
Hewlett-Packard) and Next Generation I/O (developed by Intel, Microsoft, and Sun) merged
into a unified I/O standard called InfiniBand. The InfiniBand Trade Association (IBTA) is the
organization that maintains the InfiniBand specification. The IBTA is led by a steering
committee staffed by members of these corporations. The IBTA is responsible for compliance
testing of commercial products, a list of which can be found at:
http://www.infinibandta.org/content/pages.php?pg=products_overview
InfiniBand is an industry-standard specification that defines an input and output architecture
that can be used to interconnect servers, communications infrastructure equipment, storage,
and embedded systems. InfiniBand is a true fabric architecture that leverages switched,
point-to-point channels with data transfers up to 120 Gbps, both in chassis backplane
applications and through external copper and optical fiber connections.
InfiniBand addresses the challenges that IT infrastructures face. Specifically, InfiniBand can
help you in the following ways:
Superior performance
InfiniBand provides superior latency performance and products, supporting up to
120 Gbps connections.
Reduced complexity
InfiniBand allows for the consolidation of multiple I/Os on a single cable or backplane
interconnect, which is critical for blade servers, data center computers and storage
clusters, and embedded systems.
Highest interconnect efficiency
InfiniBand was developed to provide efficient scalability of multiple systems. InfiniBand
provides communication processing functions in hardware, thereby relieving the processor
of this task, and it enables full resource utilization of each node added to the cluster.
In addition, InfiniBand incorporates Remote Direct Memory Access (RDMA), which is an
optimized data transfer protocol that further enables the server processor to focus on
application processing. RDMA contributes to optimal application processing performance
in server and storage clustered environments.
Reliable and stable connections
InfiniBand provides reliable end-to-end data connections. This capability is implemented in
hardware. In addition, InfiniBand facilitates the deployment of virtualization solutions that
allow multiple applications to run on the same interconnect with dedicated application
partitions.
1.2.1 Physical layer
The physical layer specifies the way that the bits are put on the wire in the form of symbols,
delimiters, and idles. The InfiniBand architecture defines electrical, optical, and mechanical
specifications for this technology. The specifications include cables, receptacles, and
connectors and how they work together, including how they need to behave in certain
situations, such as when a part is hot-swapped.
Physical lane
InfiniBand is a point-to-point interconnect architecture developed for today’s requirements for
higher bandwidth and the ability to scale with increasing bandwidth demand. Each link is
based on a two-fiber 2.5 Gbps bidirectional connection for an optical (fiber cable)
implementation or a four-wire 2.5 Gbps bidirectional connection for an electrical (copper
cable) implementation. This 2.5 Gbps connection is called a physical lane.
Each lane supports multiple transport services for reliability and multiple prioritized virtual
communication channels. Physical lanes are grouped into link widths of one physical lane (1X),
four physical lanes (4X), eight physical lanes (8X), or 12 physical lanes (12X).
InfiniBand currently defines bandwidths at the physical layer. It negotiates the use of:
Single Data Rate (SDR), delivering 2.5 Gbps per physical lane
Double Data Rate (DDR), delivering 5.0 Gbps per physical lane
Quadruple Data Rate (QDR), delivering 10.0 Gbps per physical lane
Bandwidth negotiation determines the bandwidth of the interface on both sides of the link to
determine the maximum data rate (frequency) achievable based on the capabilities of either
end and interconnect signal integrity.
In addition to the bandwidth, the number of lanes (1X, 4X, 8X, or 12X) is negotiated, which is
a process in which the maximum achievable bandwidth is determined based on the
capabilities of either end.
Combining the bandwidths with the number of lanes gives the link or signaling rates that are
shown in Table 1-1.
Table 1-1 Interface width and link ratings
Width   Single Data Rate      Double Data Rate (a)    Quadruple Data Rate
1X      2.5 Gbps              5.0 Gbps                10 Gbps (1 GBps)
4X      10.0 Gbps (1 GBps)    20.0 Gbps (2 GBps)      40 Gbps (4 GBps)
8X      20.0 Gbps (2 GBps)    40.0 Gbps (4 GBps)      80 Gbps (8 GBps)
12X     30.0 Gbps (3 GBps)    60.0 Gbps (6 GBps)      120 Gbps (12 GBps)
a. All InfiniBand coupling links on IBM z10™ and later CPCs use Double Data Rate.
Links use 8b/10b encoding (every 10 bits sent carry 8 bits of data), so that the useful data
transmission rate is four-fifths the signaling or link rate (signaling and link rate equal the raw
bit rate). Therefore, the 1X single, double, and quad rates carry 2 Gbps, 4 Gbps, or 8 Gbps of
useful data, respectively.
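
To make the arithmetic concrete, the following Python sketch (ours, not taken from the IBM documentation) derives the link rates shown in Table 1-1 and the corresponding useful data rates from the per-lane rates and the 8b/10b encoding overhead.

PER_LANE_GBPS = {"SDR": 2.5, "DDR": 5.0, "QDR": 10.0}   # signaling rate per physical lane
WIDTHS = (1, 4, 8, 12)                                  # 1X, 4X, 8X, and 12X link widths

def link_rate_gbps(width, rate):
    # Raw signaling (link) rate: number of lanes times the per-lane rate.
    return width * PER_LANE_GBPS[rate]

def data_rate_gbps(width, rate):
    # Useful data rate: 8b/10b encoding carries 8 data bits in every 10 bits sent.
    return 0.8 * link_rate_gbps(width, rate)

for rate in ("SDR", "DDR", "QDR"):
    for width in WIDTHS:
        link = link_rate_gbps(width, rate)
        data = data_rate_gbps(width, rate)
        # For example, 12X DDR: 60 Gbps link rate and 48 Gbps (6 GBps) of useful data.
        print(f"{width:>2}X {rate}: {link:5.1f} Gbps link rate, "
              f"{data:5.1f} Gbps ({data / 8:.1f} GBps) of data")

The 6 GBps result for a 12X IB-DDR link matches the figure quoted for the HCA2-O and HCA3-O adapters later in this chapter.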
In this book we use the following terminology:
Data rate This is the data transfer rate expressed in bytes where one byte equals
eight bits.
Signaling rate This is the raw bit rate expressed in bits.
Link rate This is equal to the signaling rate expressed in bits.
We use the following terminology for link ratings. Notice that the terminology is a mix of
standard InfiniBand phrases and implementation wording:
12X IB-SDR
This uses 12 lanes for a total link rate of 30 Gbps. It is a point-to-point connection with a
maximum length of 150 meters.
12X IB-DDR
This uses 12 lanes for a total link rate of 60 Gbps. It is a point-to-point connection with a
maximum length of 150 meters.
1X IB-SDR LR (Long Reach)
This uses one lane for a total link rate of 2.5 Gbps. It supports an unrepeated distance of
up to 10 km1, or up to 175 km2 with a qualified DWDM solution.
1X IB-DDR LR (Long Reach)
This uses one lane for a total link rate of 5 Gbps. It supports an unrepeated distance of up
to 10 km 1, or up to 175 km2 with a qualified DWDM solution.
The link and physical layers are the interface between the packet byte stream of higher layers
and the serial bit stream of the physical media. Physically, you can implement the media as 1,
4, 8, or 12 physical lanes. The packet byte stream is striped across the available physical
lanes and encoded using the industry-standard 8b/10b encoding method that is also used by
Ethernet, FICON (Fibre Channel CONnection), and Fibre Channel.
Virtual lanes
InfiniBand allows for multiple independent data streams over the same physical link, which
are called virtual lanes (VLs). VLs are separate logical flows with their own buffering. They
allow more efficient and speedier communications between devices because no buffer or task
can slow down the communication on the physical connection. InfiniBand supports up to 16
virtual lanes (numbered 0 to 15).
Important: The quoted link rates are only theoretical. The message architecture, link
protocols, CF utilization, and CF MP effects make the effective data rate lower than these
values.
1 RPQ 8P2340 may be used to increase the unrepeated distance to up to 20 km.
2 The supported repeated distance can vary by DWDM vendor and specific device and features.
Note: There is no relationship between the number of CHPIDs associated with an
InfiniBand port and the number of lanes that will be used. You can potentially assign
16 CHPIDs to a port with only one lane, or assign only one CHPID to a port with 12 lanes,
and the signals will be spread over all 12 lanes.
1.3 IBM System z InfiniBand implementation
As you can see, InfiniBand is an industry architecture. It is supported by many vendors, and
each vendor might have its own unique way of implementing or exploiting it. This section
describes how InfiniBand is implemented on IBM System z CPCs.
1.3.1 Host channel adapters
Host channel adapters (HCAs) are physical devices in processors and I/O equipment that
create and receive packets of information. The host channel adapter is a programmable
Direct Memory Access (DMA) engine that is able to initiate local memory operations. The
DMA engine offloads costly memory operations from the processor, because it can access
system memory directly for reading and writing independently from the central processor.
This enables the transfer of data with significantly less CPU overhead. The CPU initiates the
transfer and switches to other operations while the transfer is in progress. Eventually, the CPU
receives an interrupt after the transfer operation has been completed.
A channel adapter has one or more ports. Each port has its own set of transmit and receive
buffers that enable the port to support multiple simultaneous send and receive operations. For
example, the host channel adapter ports provide multiple communication interfaces by
providing separate send and receive queues for each CHPID. Figure 1-1 shows a schematic
view of the host channel adapter.
A host channel adapter provides an interface to a host device and supports “verbs” defined to
InfiniBand. Verbs describe the service interface between a host channel adapter and the
software that supports it. Verbs allow the device driver and the hardware to work together.
[Figure 1-1 schematic: a channel adapter containing a DMA engine and a transport layer attached to memory, with multiple ports; each port has its own send and receive queues mapped to virtual lanes (VLs).]
Figure 1-1 Host channel adapter
Note: There is no relationship between virtual lanes and CHPIDs. The fact that you can
have up to 16 of each is coincidental.
1.3.2 Processor-specific implementations
System z10 and subsequent CPCs exploit InfiniBand for internal connectivity and
CPC-to-CPC communication. System z9 CPCs only use InfiniBand for CPC-to-CPC
communication. Fabric components, such as routers and switches, are not supported in these
environments, although qualified DWDMs can be used to extend the distance of 1X InfiniBand
links.
System z CPCs take advantage of InfiniBand technology in the following ways:
On System z10 and later CPCs, for CPC-to-I/O cage connectivity, InfiniBand, which
includes the InfiniBand Double Data Rate (IB-DDR) infrastructure, replaces the Self-Timed
Interconnect (STI) features found in prior System z CPCs.
Parallel Sysplex InfiniBand (PSIFB) 12X links (both IFB and IFB3 mode) support
point-to-point connections up to 150 meters (492 feet).
Parallel Sysplex InfiniBand (PSIFB) Long Reach links (also referred to as 1X links) support
point-to-point connections up to 10 km unrepeated (up to 20 km with RPQ 8P2340), or up
to 175 km when repeated through a Dense Wave Division Multiplexer (DWDM) and
normally replace InterSystem Channel (ISC-3). The PSIFB Long Reach feature is not
available on System z9.
Server Time Protocol (STP) signals are supported on all types of PSIFB links.
Host channel adapter types on System z CPCs
System z CPCs provide a number of host channel adapter types for InfiniBand support:
HCA1-O A host channel adapter that is identified as HCA1-O (Feature Code
(FC) 0167) provides an optical InfiniBand connection on System z9³.
HCA1-O is used in combination with the 12X IB-SDR link rating to
provide a link rate of up to 3 GBps.
HCA2-C A host channel adapter that is identified as HCA2-C (FC 0162)
provides a copper InfiniBand connection from a book to I/O cages and
drawers on a System z10, zEnterprise 196, or zEnterprise 114.
HCA2-O A host channel adapter that is identified as HCA2-O (FC 0163)
provides an optical InfiniBand connection.
Note: At the time of writing, System z9 CPCs are withdrawn from marketing, meaning that
if you have a z9 and it does not already have InfiniBand adapters on it, it is no longer
possible to purchase those adapters from IBM. For this reason, this book focuses on
IBM System z10® and later CPCs.
Note: InfiniBand is used for both coupling links and internal processor-to-I/O cage (and
process-to-drawer) connections in System z10 and later CPCs.
However, you do not explicitly order the InfiniBand fanouts that are used for
processor-to-I/O cage connections; the number of those fanouts that you need will depend
on the I/O configuration of the CPC. Because the focus of this book is on the use of
InfiniBand links for coupling and STP, we do not go into detail about the use of InfiniBand
for internal connections.
3 InfiniBand cannot be used to connect one System z9 CPC to another z9 CPC. Only connection to a later CPC type
is supported.
HCA2-O supports connection to:
HCA1-O For connection to System z9.
HCA2-O For connection to System z10 or later CPCs.
HCA3-O For connection to zEnterprise 196 and later CPCs.
HCA2-O is used in combination with the 12X IB-DDR link rating to
provide a link rate of up to 6 GBps between z10 and later CPCs and
up to 3 GBps when connected to a z9 CPC.
HCA2-O LR⁴ A host channel adapter that is identified as HCA2-O LR (FC 0168)
provides an optical InfiniBand connection for long reach coupling links
for System z10 and later CPCs. HCA2-O LR is used in combination
with the 1X IB-DDR link rating to provide a link rate of up to 5 Gbps. It
automatically scales down to 2.5 Gbps (1X IB-SDR) depending on the
capability of the attached equipment. The PSIFB Long Reach feature
is available only on System z10 and later CPCs.
HCA3-O A host channel adapter that is identified as HCA3-O (FC 0171)
provides an optical InfiniBand connection to System z10 and later
CPCs. A HCA3-O port can be connected to a port on another HCA3-O
adapter, or a HCA2-O adapter. HCA3 adapters are only available on
zEnterprise 196 and later CPCs.
HCA3-O adapters are used in combination with the 12X IB-DDR link
rating to provide a link rate of up to 6 GBps. A port on a HCA3-O
adapter can run in one of two modes:
IFB mode This is the same mode that is used with HCA2-O adapters and
offers equivalent performance.
IFB3 mode This mode is only available if the HCA3-O port is connected to
another HCA3-O port and four or fewer CHPIDs are defined to
share that port. This mode offers improved performance compared
to IFB mode5.
HCA3-O LR A host channel adapter that is identified as HCA3-O LR (FC 0170)
provides an optical InfiniBand connection for long reach coupling links.
The adapter is available for z196, z114, and later CPCs, and is used to
connect to:
HCA2-O LR For connection to z10, and z196.
HCA3-O LR For connection to z196, z114, and later CPCs.
HCA3-O LR is used in combination with the 1X IB-DDR link rating to
provide a link rate of up to 5 Gbps. It automatically scales down to 2.5
Gbps (1X IB-SDR) depending on the capability of the attached
equipment. This adapter also provides additional ports (four ports
versus two ports on HCA2-O LR adapters).
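
The adapter descriptions above amount to a small compatibility matrix. As a hedged summary, the following Python sketch encodes the connectivity and IFB3-mode rules described in this section; the data structure and function names are ours, purely for illustration.

COMPATIBLE = {
    # 12X adapters
    "HCA1-O":    {"HCA2-O"},                      # z9 only; z9-to-z9 connections are not supported
    "HCA2-O":    {"HCA1-O", "HCA2-O", "HCA3-O"},  # z10 and later
    "HCA3-O":    {"HCA2-O", "HCA3-O"},            # z196 and later; cannot attach to a z9
    # 1X (Long Reach) adapters
    "HCA2-O LR": {"HCA2-O LR", "HCA3-O LR"},      # z10 and later
    "HCA3-O LR": {"HCA2-O LR", "HCA3-O LR"},      # z196/z114 and later
}

def can_connect(a, b):
    # True if a coupling link between the two adapter types is supported.
    return b in COMPATIBLE.get(a, set())

def runs_in_ifb3_mode(a, b, chpids_on_port):
    # IFB3 mode requires HCA3-O ports on both ends and four or fewer CHPIDs
    # defined to share the port; otherwise the link runs in IFB mode.
    return a == b == "HCA3-O" and chpids_on_port <= 4

assert can_connect("HCA2-O", "HCA1-O")        # 12X link back to a System z9
assert not can_connect("HCA3-O", "HCA1-O")    # HCA3-O cannot connect to a z9
assert runs_in_ifb3_mode("HCA3-O", "HCA3-O", 4)
assert not runs_in_ifb3_mode("HCA3-O", "HCA3-O", 5)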
1.4 InfiniBand benefits
System z is used by enterprises in different industries in different ways. It is probably fair to
say that no two mainframe environments are identical. System z configurations span from
4 HCA2-O LR adapters are still available for z10. However, they have been withdrawn from marketing for z196. On
z196, HCA3-O LR functionally replaces HCA2-O LR.
5 The maximum bandwidth for a HCA3-O link is 6 GBps, regardless of the mode. IFB3 mode delivers better response
times through the use of a more efficient protocol.
sysplexes with over 100,000 MIPS to configurations with only one or two CPCs. Various
configurations intensively exploit sysplex capabilities for data sharing and high availability.
Others exploit simply the resource sharing functions. There are enterprises that run a single
sysplex containing both production and non-production systems. Others have multiple
sysplexes, each with a different purpose.
InfiniBand addresses the requirements of all these configurations. Depending on your
configuration and workload, one or more InfiniBand attributes might particularly interest you.
The benefits that InfiniBand offers compared to previous generation of System z coupling
technologies are listed here:
The ability to have ICB4 levels of performance for nearly all in-data-center CPC coupling
connections.
ICB4 links are limited to 10 meters, meaning that the maximum distance between
connected CPCs is limited to about 7 meters. As a result, many installations wanting to
use ICB4 links were unable to because of physical limitations on how close the CPCs
would be located to each other.
InfiniBand 12X links can provide performance similar to or better than ICB4 links, and yet
support distances of up to 150 meters. It is expected that InfiniBand 12X links will be
applicable to nearly every configuration where the CPCs being connected are in the same
data center. This is designed to result in significant performance (and overhead)
improvements for any installation that was forced to use ISC links in the past.
The ability to provide coupling connectivity over large distances with performance that is
equivalent to or better than ISC3 links, but with significantly fewer links.
HCA2-O LR and HCA3-O LR 1X links on z196 and later support either 7 or 32
subchannels and link buffers6 per CHPID, depending on the Driver level of the CPCs at
both ends of the link. For long-distance sysplexes, the use of 32 subchannels means that
fewer links are required to provide the same number of subchannels and link buffers than
is the case with ISC3 links. And if 64 subchannels (two CHPIDs with 32 subchannels
each) are not sufficient, additional CHPIDs can be defined to use the same link (in the
past, the only way to add CHPIDs was to add more physical links).
For a long-distance sysplex, the ability to deliver the same performance with fewer links
might translate to fewer DWDM ports or fewer dark fibers for unrepeated links. Also, fewer
host channel adapters might be required to deliver the same number of subchannels. Both
of these characteristics can translate into cost savings. (The sketch at the end of this list
illustrates the subchannel arithmetic.)
The ability to more cost effectively handle peak CF load situations.
Because InfiniBand provides the ability to assign multiple CHPIDs to a single port, you can
potentially address high subchannel utilization or high path busy conditions by adding
more CHPIDs (and therefore more subchannels) to a port. This is a definition-only
change; no additional hardware is required, and there is no financial cost associated with
assigning another CHPID to an InfiniBand port.
The IBM experience has been that many clients with large numbers of ICB4 links do not
actually require that much bandwidth. The reason for having so many links is to provide
more subchannels to avoid delays caused by all subchannels or link buffers being busy
during workload spikes. You might find that the ability to assign multiple CHPIDs to an
InfiniBand port means that you actually need fewer InfiniBand ports than you have ICB4
links today.
6 The relationship between subchannels and link buffers is described in Appendix C, “Link buffers and subchannels”
on page 247.
Every Parallel Sysplex requires connectivity from the z/OS systems in the sysplex to the
CFs being used by the sysplex. Link types prior to InfiniBand cannot be shared across
sysplexes, meaning that every sysplex required its own set of links.
Although InfiniBand does not provide the ability to share CHPIDs across multiple
sysplexes, it does provide the ability to share links across sysplexes. Because InfiniBand
supports multiple CHPIDs per link, multiple sysplexes can each have their own CHPIDs on
a shared InfiniBand link. For clients with large numbers of sysplexes, this can mean
significant savings in the number of physical coupling links that must be provided to deliver
the required connectivity.
zEnterprise 196 and later support larger numbers of CF link CHPIDs (increased to 128
CHPIDs from the previous limit of 64 CHPIDs). The InfiniBand ability to assign multiple
CHPIDs to a single link helps you fully exploit this capability7.
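
As a rough, illustrative sketch of the link-consolidation arithmetic mentioned in this list, the following Python fragment compares how many physical links are needed to provide a given number of subchannels to a remote CF. The helper names, the two-CHPID default, and the figure of seven subchannels per ISC3 CHPID are our assumptions based on the descriptions in this chapter and in Appendix C.

from math import ceil

SUBCH_PER_ISC3_CHPID = 7        # assumption: traditional CF link CHPIDs provide 7 subchannels
SUBCH_PER_PSIFB_1X_CHPID = 32   # z196 and later at the appropriate driver level

def isc3_links_needed(target_subchannels):
    # One CHPID per ISC3 link, so one link for every 7 subchannels.
    return ceil(target_subchannels / SUBCH_PER_ISC3_CHPID)

def psifb_1x_links_needed(target_subchannels, chpids_per_link=2):
    # Several CHPIDs can share one PSIFB link, each contributing its own subchannels.
    return ceil(target_subchannels / (chpids_per_link * SUBCH_PER_PSIFB_1X_CHPID))

# Example: providing 64 subchannels to a remote CF takes ten ISC3 links, but only
# one PSIFB 1X link carrying two CHPIDs with 32 subchannels each.
print(isc3_links_needed(64), psifb_1x_links_needed(64))   # prints: 10 1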
1.5 The importance of an efficient coupling infrastructure
Efficient systems must provide a balance between CPU performance, memory bandwidth
and capacity, and I/O capabilities. However, semiconductor technology evolves much faster
than I/O interconnect speeds, which are governed by mechanical, electrical, and
speed-of-light limitations, thus increasing the imbalance and limiting overall system
performance. This imbalance suggests that I/O interconnects must change to maintain
balanced system performance.
Each successive generation of System z CPC is capable of performing more work than its
predecessors. To keep up with the increasing performance, it is necessary to have an
interconnect architecture that is able to satisfy the I/O interconnect requirements that go
along with it. InfiniBand offers a powerful interconnect architecture that by its nature is better
able to provide the necessary I/O interconnect to keep the current systems in balance.
Table 1-2 highlights the importance that link technology plays in the overall performance and
efficiency of a Parallel Sysplex. The cells across the top indicate the CPC where z/OS is
running. The cells down the left side indicate the type of CPC where the CF is running and the
type of link that is used to connect z/OS to the CF.
Table 1-2   Coupling z/OS CPU cost

CF/Host           z10 BC   z10 EC   z114   z196   zBC12   zEC12
z10 BC ISC3       16%      18%      17%    21%    19%     24%
z10 BC 1X IFB     13%      14%      14%    17%    18%     19%
z10 BC 12X IFB    12%      13%      13%    16%    16%     17%
z10 BC ICB4       10%      11%      NA     NA     NA      NA
z10 EC ISC3       16%      17%      17%    21%    19%     24%
z10 EC 1X IFB     13%      14%      14%    17%    17%     19%
z10 EC 12X IFB    11%      12%      12%    14%    14%     16%
z10 EC ICB4       9%       10%      NA     NA     NA      NA
z114 ISC3         16%      18%      17%    21%    19%     24%
z114 1X IFB       13%      14%      14%    17%    17%     19%
z114 12X IFB      12%      13%      12%    15%    15%     17%
z114 12X IFB3     NA       NA       10%    12%    12%     13%
z196 ISC3         16%      17%      17%    21%    19%     24%
z196 1X IFB       13%      14%      13%    16%    16%     18%
z196 12X IFB      11%      12%      11%    14%    14%     15%
z196 12X IFB3     NA       NA       9%     11%    10%     12%
zBC12 1X IFB      14%      15%      14%    18%    17%     20%
zBC12 12X IFB     13%      13%      12%    15%    14%     17%
zBC12 12X IFB3    NA       NA       10%    11%    11%     12%
zEC12 1X IFB      13%      13%      13%    16%    16%     18%
zEC12 12X IFB     11%      11%      11%    13%    13%     15%
zEC12 12X IFB3    NA       NA       9%     10%    10%     11%

These values are based on 9 CF requests per second per MIPS.
The XES Synch/Async heuristic algorithm effectively caps overhead at about 18%.

7 The number of physical coupling links that you can install depends on your CPC model and the number of books
that are installed.
To determine the z/OS CPU cost associated with running z/OS on a given CPC and using a
CF on a given CPC, find the column that indicates the CPC your z/OS is on, and the row that
contains your CF and the type of link that is used. For example, if you are running z/OS on a
z10 EC, connected to a z10 EC CF using ISC3 links and performing 9 CF requests per
second per MIPS, the overhead is 17%.
The overhead reflects the percent of available CPC cycles that are used to communicate with
the CF. A given CF with a given link type will deliver a certain average response time. For a
given response time, a faster z/OS CPC is able to complete more instructions in that amount
of time than a slower one. Therefore, as you move z/OS to a faster CPC, but do not change
the CF configuration, the z/OS CPU cost (in terms of “lost” CPU cycles) increases. Using the
table, you can see that upgrading the z/OS CPC from a z10 EC to a faster CPC (a z196)
increases the cost to 21%8.
To keep the z/OS CPU cost at a consistent level, you also need to reduce the CF response
time by a percent that is similar to the percent increase in the z/OS CPU speed. The most
effective way to address this is by improving the coupling technology. In this example, if you
upgrade the CF to a z196 with the same link type (ISC3), the cost remains about the same
(21%). Replacing the ISC3 links with 12X IFB links further reduces the response time,
resulting in a much larger reduction in the cost, to 14%. And replacing the ISC3 links with 12X
IFB3 links reduces the cost further, to 11%.
These z/OS CPU cost numbers are based on a typical data sharing user profile of 9 CF
requests per MIPS per second. The cost scales with the number of requests. For example, if
your configuration drives 4.5 CF requests per MIPS per second, the cost is 50% of the
numbers in Table 1-2 on page 9.
8 In practice, the XES heuristic algorithm effectively caps overhead at about 18% by converting longer-running
synchronous CF requests to be asynchronous.
To further illustrate the relationship between coupling link types and response times,
Table 1-3 contains information about expected response times for different link types and
different types of requests on z10, z196, and zEC12 CPCs.
Table 1-3   Expected CF synchronous response time ranges (microseconds)

                      ISC3    PSIFB 1X   ICB-4   PSIFB 12X IFB   PSIFB 12X IFB3   ICP
zEC12
  Lock request        20-30   12-16      N/A     10-14           5-8              2-6
  Cache/List request  25-40   14-24      N/A     13-17           7-9              4-8
z196
  Lock request        20-30   14-17      N/A     10-14           5-8              2-8
  Cache/List request  25-40   16-25      N/A     14-18           7-9              4-9
z10
  Lock request        20-30   14-18      8-12    11-15           N/A              3-8
  Cache/List request  25-40   18-25      10-16   15-20           N/A              6-10
These represent average numbers. Many factors (distance, CF CPU utilization, link utilization,
and so on) can affect the actual performance. However, you can see a pattern in this table
that is similar to Table 1-2 on page 9; that is, faster link types deliver reduced response
times, and those reduced response times can decrease the z/OS CPU cost of using a CF with
that link type.
1.5.1 Coupling link performance factors
Note that there is a difference between speed (which is typically observed through the CF
service times) and bandwidth.
Consider the example of a 2-lane road and a 6-lane highway. A single car can travel at the
same speed on both roads. However, if 1000 cars are trying to traverse both roads, they will
traverse the highway in much less time than on the narrow road. In this example, the
bandwidth is represented by the number of lanes on the road. The speed is represented by
the ability of the car to travel at a given speed (because the speed of light is the same for all
coupling link types that exploit fiber optic cables9).
To take the analogy a step further, the time to traverse the road depends partially on the
number of lanes that are available on the entry to the highway. After the traffic gets on to the
highway, it will tend to travel at the speed limit. However, if many cars are trying to get on the
highway and there is only a single entry lane, there will be a delay for each car to get on the
highway. Similarly, the time to get a large CF request (a 64 KB DB2 request, for example) into
a low-bandwidth link will be significantly longer than that required to place the same request
into a higher bandwidth link.
9 The speed of light in a fiber is about 2/3 of the speed of light in a vacuum. The speed of a signal in a copper
coupling link (ICB4, for example) is about 75% of the speed of light in a vacuum.
Therefore, the “performance” of a coupling link is a combination of:
The type of requests being sent on the link (large or small, or some specific mix of
short-running or long-running).
The bandwidth of the link (this becomes more important for requests with large amounts of
data, or if there is a very large volume of requests).
The technology in the card or adapter that the link is connected to.
The distance between the z/OS system and the CF.
The number of buffers associated with the link.
Another aspect of the performance of CF requests that must be considered is how many CF
requests your systems issue, and how that affects the cost of using the CF. Changing from
one link type to another is expected to result in a change in response times. How that
change affects your systems depends to a large extent on the number and type of requests
that are being issued.
If your CF is processing 1000 requests a second and the synchronous response time
decreases by 10 microseconds, that represents a savings of .01 seconds of z/OS CPU time
per second across all the members of the sysplex, which is a change that is unlikely to even
be noticeable.
However, if your CF processes 200,000 synchronous requests a second and the synchronous
response time improves by just half that amount (5 microseconds), that represents a saving of
one second of z/OS CPU time per second; that is, a savings of one z/OS engine’s worth of
capacity.
Using this example, you can see that the impact of CF response times on z/OS CPU
utilization is heavily influenced by the number and type of requests being sent to the CF; the
larger the number of requests, the more important the response time is.
This section illustrates the importance of using the best performing coupling links possible.
However, the coupling link connectivity in many enterprises has not kept up with changes to
the z/OS CPCs, resulting in performance that is less than optimal.
As you migrate from your current link technology to InfiniBand links, you are presented with
an ideal opportunity to create a coupling infrastructure that delivers the optimum
performance, flexibility, availability, and financial value. The primary objective of this book is to
help you make the best of this opportunity.
1.5.2 PSIFB 12X and 1X InfiniBand links
As stated previously, there are two bandwidths available for System z InfiniBand links: 12X
and 1X.
12X links support a maximum distance of 150 meters. It is expected that anyone with a
need to connect two CPCs within a single data center is able to exploit 12X InfiniBand
links.
1X links support larger distances, and therefore are aimed at enterprises with a need to
provide coupling connectivity between data centers.
The InfiniBand enhancements announced in July 2011 further differentiate the two types of
links. The new HCA3-O 12X adapters were enhanced to address the high bandwidth/low
response time needs that typically go with a sysplex that is contained in a single data center.
Specifically, they support a new, more efficient, protocol that enables reduced response
times, and the ability to process a larger number of requests per second.
The InfiniBand 1X adapters were enhanced in a different way. Sysplexes that span large
distances often experience high response times, resulting in high subchannel and link buffer
utilization. However, because each subchannel and link buffer can only handle one CF
request at a time, the utilization of the fiber between the two sites tends to be quite low.
To alleviate the impact of high subchannel and link buffer utilization, Driver 93 delivered the
ability to specify 32 subchannels and link buffers per CHPID for 1X links on z196 and later
CPCs. This provides the ability to process more requests in parallel without requiring
additional physical links. Additionally, because of the greatly increased capability to handle
more concurrent requests on each CHPID, the HCA3-O LR adapters have four ports rather
than two. This allows you to connect to more CPCs with each adapter, while still supporting
more concurrent requests to each CPC than was possible with the previous two-port adapter.
1.6 Terminology
Before the availability of InfiniBand coupling links, there was a one-to-one correspondence
between CF link CHPIDs and the actual link. As a result, terms such as link, connection, port,
and CHPID tended to be used interchangeably. However, because InfiniBand supports the
ability to assign multiple CHPIDs to a single physical link, it becomes much more important to
use the correct terminology. To avoid confusion, the following list describes how common
terms are used in this book:
CF link Before InfiniBand links, there was a one-to-one correspondence
between CF links and CF link CHPIDs. As a result, the terms were
often used interchangeably. However, given that InfiniBand technology
supports multiple CHPIDs sharing a given physical connection, it is
important to differentiate between CF link CHPIDs and CF links. In this
book, to avoid confusion, we do not use the term “CF link” on its own.
CF link CHPID A CF link CHPID is used to communicate between z/OS and CF, or
between two CFs. A CF link CHPID can be associated with one, and
only one, coupling link. However, an InfiniBand coupling link can have
more than one CF link CHPID associated with it.
Coupling link When used on its own, “coupling link” is used generically to describe
any type of link that connects z/OS-to-CF, or CF-to-CF, or is used
purely for passing STP timing signals. It applies to all link types:
PSIFB, ICB4, ISC3, and ICP.
Timing-only link This is a link that is used to carry only STP signals between CPCs.
CPCs that are in the same Coordinated Timing Network (CTN) must
be connected by some type of coupling link. If either of the CPCs
connected by a coupling link contains a CF LPAR, the CHPIDs
associated with all links between those CPCs must be defined in
hardware configuration definition (HCD) as Coupling Link CHPIDs. If
neither CPC contains a CF LPAR, the CHPIDs must be defined as
timing-only link CHPIDs. You cannot have both coupling links and
timing-only links between a given pair of CPCs.
Port A port is a receptacle on an HCA adapter into which an InfiniBand
cable is connected. There is a one-to-one correspondence between
ports and InfiniBand links. Depending on the adapter type, an
InfiniBand adapter will have either two or four ports.
Note: IBM recommends specifying seven subchannels per CHPID for coupling links
between CPCs in the same site. For links that will span sites, it is recommended to specify
32 subchannels per CHPID.
PSIFB coupling links This refers generically to both 1X and 12X PSIFB links.
12X InfiniBand links This refers generically to both IFB and IFB3-mode 12X links.
1X InfiniBand links This refers generically to HCA2-O LR and HCA3-O LR links.
12X IFB links This refers to InfiniBand links connected to HCA1-O adapters,
HCA2-O adapters, or HCA3-O adapters when running in IFB mode.
12X IFB3 links This refers to InfiniBand links where both ends are connected to
HCA3-O adapters, and that are operating in IFB3 mode.
Gbps or GBps The convention is that the bandwidth of ISC3 and 1X PSIFB links is
described in terms of Gigabits per second, and the bandwidth of ICB4
and 12X PSIFB links is described in terms of Gigabytes per second.
Subchannel In the context of coupling links, a subchannel is a z/OS control block
that represents a link buffer. Each z/OS LPAR that shares a CF link
CHPID will have one subchannel for each link buffer associated with
that CHPID.
Link buffer Every CF link CHPID has a number of link buffers associated with it.
The number of link buffers will be either 7 or 32, depending on the
adapter type, the Driver level of the CPC, and the type of adapter and
driver level of the other end of the associated coupling link. Link
buffers reside in the link hardware.
For more information about the relationship between subchannels and
link buffers, see Appendix C, “Link buffers and subchannels” on
page 247.
System z server This refers to any System z CPC that contains either a z/OS LPAR or
CF LPAR or both and, in the context of this book, supports InfiniBand
links.
zEC12 This is the short form of zEnterprise System zEC12.
zBC12 This is the short form of zEnterprise System zBC12.
z196 This is the short form of zEnterprise System 196.
z114 This is the short form of zEnterprise System 114.
zEnterprise server This refers to both zEnterprise System 196 and zEnterprise
System 114.
z10 or System z10 This refers to both System z10 EC and System z10 BC.
z9 or System z9 This refers to both System z9 EC and System z9 BC.
CPC Many different terms have been used to describe the device that is
capable of running operating systems such as z/OS or IBM z/VM®,
namely, server, CPU, CEC, CPC, machine, and others. For
consistency, we use the term “CPC” throughout this book. One
exception is in relation to STP, where we continue to use the term
“server” to be consistent with the terminology on the Hardware
Management Console (HMC), the Support Element (SE), and the STP
documentation.
1.7 Structure of this book
The objective of this book is to help you successfully implement InfiniBand links in a System z
environment. To this end, the following chapters are provided:
Chapter 2, “InfiniBand technical description” on page 17 describes the InfiniBand
hardware.
Chapter 3, “Preinstallation planning” on page 37 describes the information that you need
as you plan for the optimum InfiniBand infrastructure for your configuration.
Chapter 4, “Migration planning” on page 63 provides samples of what we believe are the
most common migration scenarios for clients moving to InfiniBand links.
Chapter 5, “Performance considerations” on page 121 provides information about the
results of a number of measurements we conducted, to compare the relative performance
of the various coupling link technologies.
Chapter 6, “Configuration management” on page 155 provides information to help you
successfully define the configuration you want using HCD.
Chapter 7, “Operations” on page 189 provides information to help you successfully
manage an InfiniBand configuration.
The following Redbooks provide information that supplements the information provided in this
book:
IBM System z Connectivity Handbook, SG24-5444
Server Time Protocol Planning Guide, SG24-7280
Server Time Protocol Implementation Guide, SG24-7281
Server Time Protocol Recovery Guide, SG24-7380
IBM System z10 Enterprise Class Technical Guide, SG24-7516
IBM System z10 Business Class Technical Overview, SG24-7632
IBM zEnterprise 196 Technical Guide, SG24-7833
IBM zEnterprise 114 Technical Guide, SG24-7954
IBM zEnterprise EC12 Technical Guide, SG24-8049
IBM zEnterprise BC12 Technical Guide, SG24-8138
Chapter 2. InfiniBand technical description
In this chapter, we describe the technical implementation of the InfiniBand technology on IBM
zEnterprise and System z processors.
zEnterprise and System z CPCs use InfiniBand technology for interconnecting CPCs in a
Parallel Sysplex environment.
We discuss the following topics:
InfiniBand connectivity
InfiniBand fanouts
Fanout plugging
Adapter ID assignment and VCHIDs
InfiniBand coupling links
InfiniBand cables
2.1 InfiniBand connectivity
zEnterprise and System z CPCs benefit from the high speed and low latency offered by
InfiniBand technology. This technology provides improved reliability, scalability, and
performance, which are all attributes important in a Parallel Sysplex.
Because we are dealing with increased I/O data rates and faster CPCs, we need a way to
connect two CPCs with a faster and more flexible interconnection. Also, as enterprises move
from Sysplex Timers to STP for time coordination, interconnectivity between the CPCs in the
Coordinated Timing Network (CTN) is required. InfiniBand provides all of this functionality.
Figure 2-1 provides an overview of the InfiniBand coupling implementation options that are
available with InfiniBand on zEnterprise and System z CPCs. The connectivity options are:
Any-to-any coupling and timing-only 12X IFB mode connectivity between zEC12, zBC12,
z196, z114, and z10 CPCs.
Any-to-any coupling and timing-only 12X IFB3 mode connectivity between zEC12, zBC12,
z196, and z114 CPCs.
Any-to-any coupling and timing-only 1X connectivity between zEC12, zBC12, z196, z114,
and z10 CPCs (optical link - long reach).
Figure 2-1 InfiniBand connectivity options
Note that Parallel Sysplex InfiniBand (PSIFB) coupling link channel-path identifiers (CHPIDs)
can be shared or spanned across logical partitions and channel subsystems on zEnterprise
and System z CPCs. However, the total number of CF link CHPIDs (including ICP, ISC, and
InfiniBand CHPIDs) must not exceed 128 (64 on CPC generations prior to z196). See 3.1,
“Planning considerations” on page 38 for more information.
Note: Whenever the text refers to coupling links or coupling link fanouts, this also applies
to STP timing-only links.
Table 2-1 provides more information about the InfiniBand options that are available on
zEnterprise and System z CPCs. We describe these options in more detail in subsequent
sections.
Table 2-1   Available InfiniBand options for zEnterprise and System z CPCs

Fanout type    Description   System z9       System z10      zEC12, zBC12,          Maximum distance    Data link rate
                                                             z196, z114
HCA1-O         12X IB-SDR    Optical         N/A             N/A                    150 meters          3 GBps
                             coupling link                                          (492 feet)
HCA2-O         12X IB-DDR    N/A             Optical         Optical                150 meters          6 GBps
               12X IB-SDR                    coupling link   coupling link          (492 feet)          3 GBps
HCA2-O LR a    1X IB-DDR     N/A             Optical         Optical coupling       10 km               5 Gbps
               1X IB-SDR                     coupling link   link (carry forward)   (6.2 miles) b       2.5 Gbps
HCA2-C c       12X IB-DDR    N/A             Copper          Copper                 1 - 3.5 meters      6 GBps
                                             I/O cage link   I/O cage link          (3.2 - 11.4 feet)
HCA3-O         12X IB-DDR    N/A             N/A             Optical                150 meters          6 GBps
                                                             coupling link d        (492 feet)
HCA3-O LR      1X IB-DDR     N/A             N/A             Optical                10 km a             5 Gbps
               1X IB-SDR                                     coupling link          (6.2 miles) b       2.5 Gbps

a. RPQ 8P2340 for extended distance is available on client request.
b. An extended distance of 175 km (108 miles) is supported with DWDM. The data rate (DDR or SDR) depends on
the capability of the attached equipment.
c. These are only used for internal connections within the CPC. HCA2-C fanouts cannot be used for STP or for
connecting to a CF. They are only included here for completeness.
d. The improved IFB3 protocol can be used if two HCA3-O fanouts are connected.
2.2 InfiniBand fanouts
This section describes the various InfiniBand fanouts that are offered on the zEnterprise and
System z CPCs.
There are six fanout types that are based on InfiniBand technology:
HCA1-O (z9 only).
HCA2-O (z196, z114, and z10. They can be carried forward to a zEC12 or a zBC12 on an
upgrade).
HCA2-O LR (Orderable on z10 only - but can be carried forward to z196, z114, zEC12, or
zBC12 on an MES).
HCA2-C (z196, z114, and z10 - depending on the ordered I/O configuration. They cannot
be used for timing links or coupling to a CF.)
HCA3-O (z114, z196, zBC12, and zEC12).
HCA3-O LR (z114, z196, zBC12, and zEC12).
Note: InfiniBand links can be used to connect a z9 to a z10 or later. However, they cannot
be used to connect two z9 CPCs to each other.
Each PSIFB fanout has either two ports or four ports (for the HCA3-O LR) to connect an
optical cable (see Figure 2-2 and Figure 2-3 on page 21).
2.2.1 Adapter types
This section provides further information about the different types of InfiniBand coupling link
adapters:
HCA1-O
This fanout is only available on System z9. It provides interconnectivity for Parallel Sysplex
and STP connections between zEnterprise 196, zEnterprise 114, System z10, and
System z9.
The fanout has two optical multifiber push-on (MPO) ports. The link operates at a
maximum speed of 3 GBps.
HCA1-O fanouts can be connected only to HCA2-O fanouts on z196, z114, and z10
CPCs. Ports on the HCA1-O fanout are exclusively used for coupling links or STP
timing-only links and cannot be used for any other purpose.
HCA2-O
This fanout, shown in Figure 2-2, can be ordered on z196, z114, and z10 CPCs. It
provides interconnectivity for Parallel Sysplex and STP connections between these CPCs,
and supports connection to a HCA1-O fanout on a z9.
Figure 2-2 HCA2-O fanout and HCA2-O LR fanout
The fanout has two optical MPO ports. The link can operate at either double data rate (if
connected to another HCA2-O or a HCA3-O) or single data rate (if connected to an
HCA1-O).
Ports on the HCA2-O fanout are used exclusively for coupling links or STP timing-only
links and cannot be used for any other purpose.
HCA2-O fanouts can also be carried forward during an MES upgrade from a System z10
to a zEnterprise system.
Note: HCA1-O, HCA2-O, HCA2-O LR, HCA3-O, and HCA3-O LR adapters are used
exclusively by InfiniBand coupling and timing-only links. Throughout this document, we
refer to them as PSIFB fanouts.
Note: z9-to-z9 PSIFB link connections are not supported.
HCA2-O LR
This fanout (also shown in Figure 2-2 on page 20) can only be ordered on System z10.
HCA2-O LR fanouts can be carried forward to z196, z114, zEC12, or zBC12 by way of an MES upgrade.
It provides interconnectivity for Parallel Sysplex and STP connections between zEC12,
zBC12, z196, z114, and z10.
The fanout has two optical Small Form-Factor Pluggable (SFP) ports. The link operates at
either 5 Gbps or 2.5 Gbps depending on the capability of the attached equipment.
Ports on the HCA2-O LR fanout are used exclusively for coupling links or STP timing-only
links and cannot be used for any other purpose.
HCA2-C
This fanout is available on z196, z114, and z10 CPCs, depending on the I/O configuration.
The number of HCA2-C fanouts is not chosen by you; it is determined by the number and
type of I/O cards that are ordered. HCA2-C fanouts are only relevant to coupling from the
perspective that the number of HCA2-C fanouts that are installed can have an impact on
the number of fanouts that can be used for coupling.
The HCA2-C fanout provides the connection between the CPC complex and the IFB-MP
cards that are installed in the I/O cages or drawers. Ports on the HCA2-C fanout are
exclusively used for I/O interfaces and cannot be used for any other purpose.
HCA3-O
This fanout (shown on the left side in Figure 2-3) is only available on z196 and later CPCs.
It provides interconnectivity for Parallel Sysplex and STP connections.
The fanout has two optical MPO ports. Each link operates at 6 GBps.
If the connection is made between two HCA3-O fanouts and there are no more than four
CHPIDs per port defined, the connection will automatically use the improved IFB3 protocol
mode.
Ports on the HCA3-O fanout are exclusively used for coupling links or STP timing-only
links and cannot be used for any other purpose.
Figure 2-3 HCA3-O and HCA3-O LR fanout
Note: On zEnterprise or later CPCs, the only reason for ordering HCA2-O fanouts is if
you need to connect to a z9 CPC using PSIFB. In all other situations, order HCA3-O (or
HCA3-O LR) fanouts.
HCA3-O LR
This fanout (shown on the right side in Figure 2-3 on page 21) is only available on z196
and later CPCs. It provides interconnectivity for Parallel Sysplex and STP connections
between zEC12, zBC12, z196, z114, and z10 CPCs.
The fanout has four optical Small Form-Factor Pluggable (SFP) ports. Each link operates
at either 5 Gbps or 2.5 Gbps depending on the capability of the attached equipment.
Ports on the HCA3-O LR fanout are exclusively used for coupling links or STP timing-only
links and cannot be used for any other purpose.
Table 2-2 summarizes the InfiniBand interconnectivity options.
Table 2-2   InfiniBand interconnectivity options

            HCA1   HCA2 12X    HCA2 1X   HCA3 12X     HCA3 1X
HCA1        No     Yes         No        No           No
HCA2 12X    Yes    Yes (IFB)   No        Yes (IFB)    No
HCA2 1X     No     No          Yes       No           Yes
HCA3 12X    No     Yes (IFB)   No        Yes (IFB3)   No
HCA3 1X     No     No          Yes       No           Yes
2.3 Fanout plugging
This section describes the fanout plugging rules for the zEnterprise and System z CPCs.
2.3.1 Fanout plugging rules for zEnterprise 196
With the introduction of the zEnterprise CPCs, there are now six fanout types available.
Depending on the number of I/O cages or drawers installed and the I/O domains in use, you
have different numbers of fanout slots available for use as coupling or timing-only links. A
maximum of 16 coupling link fanouts are supported for a z196.
Note that a z196 CPC that is fully populated for I/O connectivity has a maximum of 12 HCA
slots remaining for coupling link fanouts. It has six I/O drawers or up to three I/O cages
installed and uses up to 24 I/O domains. To connect each of the domains, you need 24 I/O
interfaces, which can be provided by 12 I/O interface fanouts. This means that only 12 fanout
slots are left to install coupling link fanouts (either HCA2-Os, HCA2-O LRs, HCA3-Os,
HCA3-O LRs, or any combination of the four).
The fanout plugging rules vary and are dependent upon the z196 model. Figure 2-4 on
page 23 can serve as a reference, but use the IBM configuration tool to determine the correct
allocation.
Note: For the fanout plugging rules for zEC12 or zBC12, refer to the IBM Redbooks
documents IBM zEnterprise EC12 Technical Guide, SG24-8049, or IBM zEnterprise BC12
Technical Guide, SG24-8138.
Figure 2-4 z196 fanout positions
Positions D1 and D2 are not available on z196 models M66 and M80. For model M49, the
positions D1 and D2 in the book positions LG10 and LG15 are not available.
For more information about this topic, see IBM zEnterprise 196 Technical Guide, SG24-7833.
2.3.2 Fanout plugging rules for zEnterprise 114
The z114 CPC has two hardware models. In the hardware model M05, only the first CPC
drawer is installed with a maximum of four fanout slots. The hardware model M10 has both
CPC drawers installed and provides up to eight fanout slots; see Figure 2-5 on page 24.
Depending on the number of installed I/O drawers, a different number of I/O interconnection
fanouts are used. This can range from zero I/O interconnection fanouts for a dedicated
stand-alone Coupling Facility model, through a Model M05 with a maximum of four legacy I/O
drawers (where all four fanout slots are used for I/O interconnection fanouts), to a model M10
with three I/O drawers (where a maximum of six fanout slots will be used for I/O
interconnection fanouts). So, depending on the model and the I/O connectivity that is
required, there will be zero to eight fanout slots available to install coupling link fanouts (either
HCA2-Os, HCA2-O LRs, HCA3-Os, HCA3-O LRs, or any combination of these).
Figure 2-5 zEnterprise 114 fanout positions
For more information, refer to IBM zEnterprise 114 Technical Guide, SG24-7954.
2.3.3 Fanout plugging rules for System z10 EC
System z10 supports three fanout types.
The HCA2-C fanout provides the I/O interfaces. Depending on the number of I/O cages
installed and the I/O domains in use, you have different numbers of fanout slots available for
use as coupling links. A fully populated CPC has three I/O cages installed and uses 21 I/O
domains. To connect each of them, you need 24 I/O interfaces, which can be provided by 12
HCA2-C fanouts. That means a maximum of 12 fanout slots are available to install coupling
link fanouts (either HCA2-Os, HCA2-O LRs, or MBAs, or any combination of the three).
Depending on the System z10 model, the plugging rules for fanouts vary. Figure 2-6 on
page 25 can serve as a reference, but use the IBM configuration tool to determine the correct
allocation.
Figure 2-6 System z10 EC fanout positions
Positions D1 and D2 are not available in z10 models E56 and E64. For model E40, the
positions D1 and D2 in the book positions LG10 and LG15 are not available.
More information about the plugging rules for System z10 EC is available in IBM System z10
Enterprise Class Technical Guide, SG24-7516.
2.3.4 Fanout plugging rules for System z10 BC
The System z10 BC offers the possibility to work without any I/O drawers and can act as a
dedicated Coupling Facility CPC without the need for HCA2-C fanouts. Depending on the
number of installed I/O drawers, a different number of HCA2-C fanouts for I/O connections
are needed.
A fully-populated CPC has four I/O drawers with a total of eight I/O domains installed and
uses four fanout slots for I/O connections; see Figure 2-7. A CPC without an I/O drawer
installed will have six fanouts available for coupling connections. So there will be between two
and six fanout slots available to install coupling link fanouts (either HCA2-Os, HCA2-O LRs, or
MBAs, or any combination of these).
Figure 2-7 System z10 BC fanout positions
For more information about the plugging rules for System z10 BC, see IBM System z10
Business Class Technical Overview, SG24-7632.
2.4 Adapter ID assignment and VCHIDs
This section describes the assignment of the Adapter IDs (AIDs) and explains the relationship
between the AID, the virtual channel path identifier (VCHID), and the channel path identifier
(CHPID). A solid understanding of these concepts facilitates the management and
maintenance of PSIFB links from the HMC or SE.
2.4.1 Adapter ID assignment
An adapter ID (AID) is assigned to every PSIFB link fanout at installation time. It is unique for
the CPC. There is only one AID per fanout, so all ports on the fanout share the same AID. The
adapter ID is:
A number between 00 and 1F on z196, z10 EC, and z9 EC
A number between 00 and 0B on z114
A number between 00 and 05 on z10 BC
A number between 08 and 0F on a z9 BC
In the input/output configuration program (IOCP) or hardware configuration definition (HCD),
the AID and port number are used to connect the assigned CHPID to the physical location of
the fanout.
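As a simple illustration, the following IOCP-style sketch shows the kind of CHPID statements involved, with the AID and PORT keywords tying two CF link CHPIDs to the two ports of one fanout. The CHPID numbers, LPAR names, and AID value are hypothetical; in practice, you would build these definitions through HCD, which also generates the corresponding connection, control unit, and device definitions when the CHPIDs at the two ends of each link are connected.

   * Two CF link CHPIDs on the fanout with AID 09, one on each port
    CHPID PATH=(CSS(0),90),SHARED,PARTITION=((LPAR1,LPAR2)),AID=09,PORT=1,TYPE=CIB
    CHPID PATH=(CSS(0),91),SHARED,PARTITION=((LPAR1,LPAR2)),AID=09,PORT=2,TYPE=CIB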
There are distinct differences between zEnterprise systems, System z10, and System z9 for
the assignment and handling of the AID; for example:
For z196, the AID is bound to the serial number of the fanout. If the fanout is moved, the
AID moves with it. For newly-built systems or newly-installed books, you can determine
the AID from Table 2-3.
Table 2-3 Initial AID number assignment for zEnterprise 196
Note: For adapter and VCHID information specific to zEC12 or zBC12, refer to the IBM
Redbooks documents IBM zEnterprise EC12 Technical Guide, SG24-8049, or IBM
zEnterprise BC12 Technical Guide, SG24-8138.
Adapter location   Fourth book   First book   Third book   Second book
D1                 00            08           10           18
D2                 01            09           11           19
D3                 N/A           N/A          N/A          N/A
D4                 N/A           N/A          N/A          N/A
D5                 02            0A           12           1A
D6                 03            0B           13           1B
D7                 04            0C           14           1C
D8                 05            0D           15           1D
D9                 06            0E           16           1E
DA                 07            0F           17           1F

Note: The fanout positions D3 and D4 are reserved for Functional Service Processor
(FSP) cards and cannot be used for fanouts. Also, positions D1 and D2 are not
available in zEnterprise 196 models M66 and M80. For model M49, the positions D1
and D2 in the book positions LG10 and LG15 are not available.
For zEnterprise 114, the AID is bound to the serial number of the fanout. If the fanout is
moved, the AID moves with it. For newly-built systems you can determine the AID from
Table 2-4.
Table 2-4   Initial AID number assignment for zEnterprise 114

Fanout position    D1   D2   D3    D4    D5    D6    D7   D8
CEC Drawer 1 AID   08   09   N/A   N/A   N/A   N/A   0A   0B
CEC Drawer 2 AID   00   01   N/A   N/A   N/A   N/A   02   03

Note: The fanout positions D3, D4, D5, and D6 are reserved for the Functional Service
Processors and Oscillator cards and cannot be used for fanouts.
For z10 EC, the AID is bound to the serial number of the fanout. If the fanout is moved, the
AID moves with it. For newly-built systems or newly-installed books, you can determine
the AID from Table 2-5.
Table 2-5 Initial AID number assignment for System z10 EC
Adapter location   Fourth book   First book   Third book   Second book
D1                 00            08           10           18
D2                 01            09           11           19
D3                 N/A           N/A          N/A          N/A
D4                 N/A           N/A          N/A          N/A
D5                 02            0A           12           1A
D6                 03            0B           13           1B
D7                 04            0C           14           1C
D8                 05            0D           15           1D
D9                 06            0E           16           1E
DA                 07            0F           17           1F
Note: The fanout positions D3 and D4 are reserved for Functional Service Processor
(FSP) cards and cannot be used for fanouts. Also, positions D1 and D2 are not
available in System z10 models E56 and E64. For model E40 the positions D1 and D2
in the book positions LG10 and LG15 are not available.
For z10 BC, the AID is bound to the serial number of the fanout. If the fanout is moved, the
AID moves with it. For newly-built systems, you can determine the AID from Table 2-6.
Table 2-6   Initial AID number assignment for System z10 BC

Fanout position   D1    D2    D3   D4   D5   D6   D7   D8   D9    DA
AID               N/A   N/A   00   01   02   03   04   05   N/A   N/A

Note: The fanout positions D1, D2, D9, and DA are reserved for the Functional Service
Processors and Oscillator cards and cannot be used for fanouts.
The AID for z9 is bound to the physical fanout position. If the fanout is moved to another
slot, the AID changes for that specific fanout, and it might be necessary to adjust the
input/output configuration data set (IOCDS).
The AID assignments are listed in the Physical Channel ID (PCHID) Report for new CPCs and
miscellaneous equipment specification (MES) upgrades, and in the Hardware Management
Console (HMC) and Support Element (SE) panels after installation. See 3.7, “Physical and
logical coupling link capacity planning” on page 50 for an example of a PCHID Report.
2.4.2 VCHID - Virtual Channel Identifier
A physical channel identifier (PCHID) normally has a one-to-one relationship between the
identifier and a physical location in the machine; see Figure 2-8 for an example.
Figure 2-8 The PCHID refers to the physical location
However, PCHIDs in the range 0700 to 07FF lack the one-to-one relationship between the
identifier and a physical location, either because they do not have a physical card (like
Internal Coupling connections (ICPs)), or because they are administered through different
identifiers (as for PSIFB links, with the AIDs). No one-to-one relationship is possible
because more than one CHPID can be defined for a physical location. Therefore, these are
sometimes referred to as Virtual Channel Path Identifiers (VCHIDs). Note that the SE and
HMC still refer to these as PCHIDs; see Figure 2-9 for an example.
Figure 2-9 The VCHID refers to the physical location of the HCA and Port
VCHIDs for IC channels have been implemented for several generations of System z CPCs.
However, prior to the introduction of PSIFB links, there was no requirement for the operators
to interact with the VCHID objects on the SE. With the introduction of PSIFB, there might now
be situations where you have to query or manage the VCHID objects.
To manage PSIFB links, the SE and HMC tasks have been changed to handle the VCHIDs
that support the physical hardware. In the SE Channel task, the VCHID for every CHPID that
is associated with a PSIFB link is shown, and all traditional channel operations can be carried
out against each VCHID. The AID and port that the VCHID is assigned to can be found under
the channel details for each VCHID (see Figure 2-9 for an example).
VCHIDs are assigned automatically by the system and are not defined by you in the IOCDS.
A VCHID is also not permanently tied to an AID. Therefore the VCHID assignment can
change after a Power-On Reset (POR) if the hardware configuration changed (if an HCA was
added or removed, for example). Due to the automatic assignment of the VCHID at every
POR, the client or SSR needs to make sure that the correlation for the channel that they
intend to manipulate has not changed. The VCHID that is currently associated with a coupling
CHPID can be found by issuing an MVS D CF,CFNM=xxxx command for the associated CF.
2.5 InfiniBand coupling links
This section discusses the various PSIFB coupling links that are available for each individual
zEnterprise and System z CPC from the CPC point of view.
2.5.1 12X PSIFB coupling links on System z9
An HCA1-O fanout is installed in a fanout slot in the front of a z9 book and takes an empty slot
or the place of one of the previous MBAs. It supports the 12X IB-SDR link option and is used
to connect a System z9 to a zEnterprise or a System z10. The fanout has two optical MPO
ports and the order increment for the HCA1-O fanout is always one complete fanout with both
ports enabled.
The point-to-point coupling link connection is established by connecting the HCA1-O fanout to
a zEnterprise or System z10 HCA2-O fanout through a 50-micron OM3 (2000 MHz-km)
multimode fiber optic cable. A HCA1-O fanout can only be connected to a HCA2-O fanout.
The cable contains 12 lanes (two fibers per lane, one each for transmit and receive); 24 fibers
in total. The maximum supported length for these connections is 150 meters (492 feet). The
link bandwidth is 3 GBps for a single data rate connection (SDR) and is auto-negotiated.
Each HCA1-O fanout has an AID that is bound to the physical position in which it is installed.
That means that if you move the fanout to another position, the AID changes and you need to
adjust the AID in the IOCP or HCD. See 2.4, “Adapter ID assignment and VCHIDs” on
page 26 for more information.
It is possible to define up to 16 CHPIDs per fanout (AID), which can be freely distributed
across both ports.
A maximum of 8 HCA1-O fanouts per book are supported on a System z9, providing a total of
16 ports. Regardless of how many fanouts are installed, the maximum combined total of 64
CF link CHPIDs per CPC applies.
2.5.2 12X PSIFB coupling links on System z10
An HCA2-O fanout is installed in a fanout slot in the front of the System z10 CPC cage (or
CPC drawer for the z10 BC). It supports the 12X IB-DDR link option. This link connects to a
zEnterprise, a System z10, or a System z9 CPC, in a point-to-point coupling link connection.
Note: For coupling link information specific to zEC12 or zBC12, refer to the IBM Redbooks
documents IBM zEnterprise EC12 Technical Guide, SG24-8049, or IBM zEnterprise BC12
Technical Guide, SG24-8138.
Note: At the time of writing, upgrades for System z9 CPCs have been withdrawn from
marketing.
Important: For optimal performance, define no more than four CIB CHPIDs per port.
Note: HCA1-O adapters cannot be carried forward on an upgrade to a z10 or z196.
Upgrades to either of those CPCs require replacing the HCA1 adapters with HCA2 or
HCA3 adapters.
The fanout has two optical MPO ports (see Figure 2-2 on page 20). The order increment for
the HCA2-O fanout is always one complete fanout with both ports enabled.
The connection is established by connecting the HCA2-O to the other system’s PSIFB fanout
(either an HCA1-O for a System z9, a HCA2-O for a zEnterprise or a System z10, or a
HCA3-O for a zEnterprise) through a 50-micron OM3 (2000 MHz-km) multimode fiber optic
cable. The cable contains 12 lanes (two fibers per lane, one each for transmit and receive); 24
fibers in total. The maximum supported length for these connections is 150 meters (492 ft).
The maximum bandwidth is 6 GBps for a double data rate connection (DDR) or 3 GBps for a
single data rate connection (SDR) and is auto-negotiated.
Each HCA2-O fanout has an AID, which is bound to the HCA serial number. See 2.4,
“Adapter ID assignment and VCHIDs” on page 26 for more information.
It is possible to define up to 16 CHPIDs per fanout (AID), which can be freely distributed
across both ports.
A maximum of 16 fanouts for InfiniBand coupling are supported on a z10 EC. All 16 can be
HCA2-O fanouts providing a total of 32 ports. On a z10 BC, a maximum of 6 HCA2-O fanouts
are supported, providing a total of 12 ports. Regardless of the number of fanouts installed or
used, the maximum of 64 CF link CHPIDs per CPC still applies (including IC, Inter-Cluster
Bus-4 (ICB4), active InterSystem Coupling Facility-3 (ISC3), and PSIFB).
2.5.3 PSIFB Long Reach coupling links on System z10
An HCA2-O LR fanout is installed in a fanout slot in the front of the z10 CPC cage (or CPC
drawer for the z10 BC). It supports the 1X IB-DDR LR link option. This link connects either to
a zEnterprise or to a z10 CPC in a point-to-point coupling link connection. The fanout has two
optical SFP ports, and the order increment for the HCA2-O LR fanout is always one complete
fanout with both ports enabled.
The connection is established by connecting the HCA2-O LR to the other system’s
HCA2-O LR or HCA3-O LR port through a 9-micron single mode fiber optic cable. The cable
contains one lane with one fiber for transmit and one for receive. The maximum supported
length for these connections is 10 km1 (6.2 miles) unrepeated and 175 km (108 miles) when
repeated through a DWDM. The maximum bandwidth is 5 Gbps for a double data rate
connection (DDR) or 2.5 Gbps for a single data rate connection (SDR) and is auto-negotiated.
Each HCA2-O LR fanout has an AID, which is bound to the HCA serial number. See 2.4,
“Adapter ID assignment and VCHIDs” on page 26 for more information.
It is possible to define up to 16 CHPIDs per fanout (AID), which can be freely distributed
across both ports.
Important: For optimal performance, define no more than four CIB CHPIDs per port.
Note: The InfiniBand link data rate of 6 GBps or 3 GBps does not represent the
performance of the link. The actual performance depends on many factors, such as
latency through the adapters, cable lengths, and the type of workload.
1 Refer to RPQ 8P2340 for information about extending the unrepeated distance to 20 km.
A maximum of 16 fanouts for coupling are supported on a z10 EC. All 16 can be HCA2-O LR
fanouts providing a total of 32 ports. On a z10 BC, a maximum of 6 HCA2-O LR fanouts are
supported, providing a total of 12 ports. Nevertheless, the maximum value of 64 CF link
CHPIDs per system (including IC, Inter-Cluster Bus-4 (ICB-4), active InterSystem Coupling
Facility-3 (ISC-3), and PSIFB) still applies.
2.5.4 12X PSIFB coupling links on z196 and z114
An HCA3-O or HCA2-O fanout is installed in a fanout slot in the front of the z196 CPC cage
(or CPC drawer for the z114). It supports the 12X IB-DDR link option. This link connects to a
zEnterprise, a System z10, or a System z9 CPC, in a point-to-point coupling link connection.
The fanout has two optical MPO ports (see Figure 2-2 on page 20), and the order increment
for the HCA3-O or HCA2-O fanout is always one complete fanout with both ports enabled.
The connection is established by connecting the HCA3-O or HCA2-O to the other system’s
PSIFB fanout (either an HCA1-O for a z9, a HCA2-O for a zEnterprise or z10, or a HCA3-O
for a zEnterprise) through a 50-micron OM3 (2000 MHz-km) multimode fiber optic cable. The
cable contains 12 lanes (two fibers per lane, one each for transmit and receive); 24 fibers in
total. The maximum supported length for these connections is 150 meters (492 ft). The
maximum bandwidth is 6 GBps for a double data rate connection (DDR) or 3 GBps for a
single data rate connection (SDR) and is auto-negotiated.
With the introduction of the HCA3-O fanout, it is possible to utilize an improved IFB protocol
which is called IFB3. This new protocol provides improved service times for the PSIFB 12X
link. However, certain conditions must be met to utilize it:
The IFB3 protocol will only be used when both ends of the link are connected to an
HCA3-O fanout.
The IFB3 protocol will only be used if a maximum of four CHPIDs are defined per HCA3-O
fanout port for all logical partitions (LPAR) combined.
For example, IFB3 mode will be used in the following situations:
Four CHPIDs are assigned to a HCA3-O port, and all four CHPIDs are shared across
z/OS LPARs that are in the same sysplex.
Four CHPIDs are assigned to a HCA3-O port, and each CHPID is in use by a different
sysplex.
Four CHPIDs are assigned to a HCA3-O port. The CHPIDs are defined as SPANNED,
and are shared across z/OS LPARs in multiple CSSs.
IFB3 mode will not be used in the following cases:
More than four CHPIDs are assigned to the port.
More than four CHPIDs are assigned to the port but some of them are offline, bringing
the number of online CHPIDs below five; because the number of defined CHPIDs is what
counts, the port will still run in IFB mode.
Important: For optimal performance, it is best to avoid defining more than four CHPIDs
per port. However, if the link is being used to provide connectivity for sysplexes with low
levels of Coupling Facility activity over greater distances, it might be acceptable to assign
more than four CHPIDs per port to be able to utilize a greater number of subchannels.
Note: The InfiniBand link data rate of 5 Gbps or 2.5 Gbps does not represent the
performance of the link. The actual performance depends on many factors, such as
latency through the adapters, cable lengths, and the type of workload.
The PSIFB link will automatically detect if the given requirements are met and will
auto-negotiate the use of the IFB3 protocol. The two ports of an HCA3-O fanout are able to
work in different protocol modes. It is possible to determine from the Support Element which
protocol is currently being used on any given HCA3-O port. See “Analyze Channel
Information option” on page 226 for more information.
Each HCA3-O or HCA2-O fanout has an AID, which is bound to the HCA serial number. See
2.4, “Adapter ID assignment and VCHIDs” on page 26 for more information.
It is possible to define up to 16 CHPIDs per fanout (AID), which can be freely distributed
across both ports. For optimum performance, especially when using HCA3-O links, do not
define more than four CHPIDs per port. However, if the link is being used to provide
connectivity for sysplexes with low levels of Coupling Facility activity, it might be acceptable to
assign more than four CHPIDs per port.
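The following IOCP-style sketch shows a port definition that stays within the IFB3 limit: exactly four CIB CHPIDs assigned to port 1 of an HCA3-O fanout with AID 0C. All names, CHPID numbers, and AID and port values are hypothetical, and IFB3 mode also assumes that the other end of the link is an HCA3-O port that meets the same condition. If a fifth CHPID were defined on this port, the port would run in IFB mode instead.

   * Four CIB CHPIDs on HCA3-O AID 0C, port 1 (four or fewer allows IFB3 mode)
    CHPID PATH=(CSS(0),C0),SHARED,PARTITION=((PRD1,PRD2)),AID=0C,PORT=1,TYPE=CIB
    CHPID PATH=(CSS(0),C1),SHARED,PARTITION=((PRD1,PRD2)),AID=0C,PORT=1,TYPE=CIB
    CHPID PATH=(CSS(0),C2),SHARED,PARTITION=((TST1,TST2)),AID=0C,PORT=1,TYPE=CIB
    CHPID PATH=(CSS(0),C3),SHARED,PARTITION=((TST1,TST2)),AID=0C,PORT=1,TYPE=CIB
   * Defining a fifth CHPID on this port would cause it to fall back to IFB mode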
A maximum of 16 fanouts for coupling are supported on a z196. All 16 can be HCA3-O
fanouts, HCA2-O fanouts, or a mix of both, providing a total of 32 ports. On z114, a maximum
of eight HCA3-O fanouts, HCA2-O fanouts, or a mix of both, is supported, providing a total of
16 ports. Even though the maximum number of CF link CHPIDs was raised to 128 for
zEnterprise systems, remember that this includes IC, InterSystem Coupling Facility-3 (ISC3),
and PSIFB connections.
2.5.5 Long Reach PSIFB coupling links on zEnterprise 196 and 114
An HCA3-O LR or HCA2-O LR fanout is installed in a fanout slot in the front of the z196 CPC
cage (or CPC drawer for the z114). It supports the 1X IB-DDR LR link option. This link
connects to either a zEnterprise or a z10 CPC in a point-to-point coupling link connection.
The fanout has four optical SFP ports for HCA3-O LR or two optical SFP ports for HCA2-O
LR, and the order increment for the HCA3-O LR fanout is always one complete fanout with all
ports enabled. The HCA2-O LR fanout can no longer be ordered for a zEnterprise CPC. It can
only be carried forward through an MES from a z10 or a z196 at Driver level 86.
The connection is established by connecting the HCA3-O LR or HCA2-O LR to the other
system’s HCA3-O LR or HCA2-O LR port through a 9-micron single mode fiber optic cable.
The cable contains one lane with one fiber for transmit and one for receive. The maximum
supported length for these connections is 10 km2 (6.2 miles) unrepeated and 175 km (108
miles) when repeated through a DWDM. The maximum bandwidth is 5 Gbps for a double data
rate connection (DDR) or 2.5 Gbps for a single data rate connection (SDR) and is
auto-negotiated.
Important: In the case where a dynamic I/O configuration change results in an IFB
protocol mode change on an HCA3-O port, the physical port will automatically perform a
reinitialization. This will result in all defined CHPIDs on this port being toggled offline
concurrently and then back online. As a result, all connectivity to any connected Coupling
Facilities and STP through this port will be lost for a short period of time.
This means that you must ensure that all your CFs are connected through at least two
physical links, and that any change you make is not going to affect more than one port.
Note: The InfiniBand link data rate of 6 GBps or 3 GBps does not represent the
performance of the link. The actual performance depends on many factors, such as
latency through the adapters, cable lengths, and the type of workload.
2 Refer to RPQ 8P2340 for information about extending the unrepeated distance to 20 km.
Each HCA3-O LR and HCA2-O LR fanout has an AID, which is bound to the HCA serial
number. See 2.4, “Adapter ID assignment and VCHIDs” on page 26 for more information.
It is possible to define up to 16 CHPIDs per fanout (AID), which can be freely distributed
across the ports (two on an HCA2-O LR, four on an HCA3-O LR).
An overall maximum of 16 coupling link fanouts is supported on a z196, and 12 of those can
be long reach coupling link fanouts. Therefore, a maximum of 48 long reach coupling ports
is available if all 12 long reach fanouts installed are HCA3-O LRs.
On z114, a maximum of eight HCA3-O LR fanouts, HCA2-O LR fanouts, or a mix of both, is
supported, providing a total of up to 32 ports.
Even though the maximum number of CF link CHPIDs was raised to 128 for zEnterprise
systems, remember that this limit includes IC, InterSystem Coupling Facility-3 (ISC3), and
PSIFB connections.
2.5.6 PSIFB coupling links and Server Time Protocol
External PSIFB coupling links can also be used to pass time synchronization signals using
Server Time Protocol (STP). Therefore, you can use the same coupling links to exchange
both time synchronization information and Coupling Facility messages in a Parallel Sysplex.
See 3.4, “Considerations for Server Time Protocol” on page 45 for more information.
Note that all the links between a given pair of CPCs must be defined as coupling links or as
timing-only links; you cannot have a mix of coupling and timing-only links between one pair of
CPCs.
2.6 InfiniBand cables
Two cable types are used for PSIFB connections in the zEnterprise, System z10 and
System z9 environment.
The cable types are:
Standard 9 µm single mode fiber optic cable (see Figure 2-10 on page 35) with LC Duplex
connectors for PSIFB LR (1X) links.
Note: The InfiniBand link data rate of 5 Gbps or 2.5 Gbps does not represent the
performance of the link. The actual performance depends on many factors, such as
latency through the adapters, cable lengths, and the type of workload.
Note: To avoid a single point of failure, use at least two physical connections for all CPCs
that are connected using InfiniBand, and spread those connections over multiple adapters
(different AIDs).
Note: Fiber optic cables and cable planning, labeling, and placement are client
responsibilities for new installations and upgrades.
Figure 2-10 Single mode fiber optic cable with LC Duplex connectors
The 50-micron OM3 (2000 MHz-km) multimode fiber cable with MPO connectors (see
Figure 2-11) for PSIFB 12X links.
The 50-micron OM3 fiber cable is an InfiniBand Trade Association (IBTA) industry
standard cable. It has one pair of fibers per lane (24 fibers in total) for a 12X connection.
Figure 2-11 Optical InfiniBand cable, including TX and RX labels
The sender and receiver are clearly marked with either RX or TX, and the connectors are
keyed. Also, on all field-replaceable units (FRUs) using IFB optical modules, the keys face
upward, the Transmitter module (TX) is on the left side, and the Receiver module (RX) is
on the right side.
To avoid problems or lengthy problem determination efforts, use the IBM part numbers to
order cables that are designed to provide 99.999% availability.
Important: The fiber optic modules and cable connectors are sensitive to dust and debris.
Component and cable dust plugs must always cover unused ports and cable ends.
Chapter 3. Preinstallation planning
In this chapter, we provide information to assist you with planning for the installation of
InfiniBand coupling links on IBM zEC12, zBC12, zEnterprise 196, zEnterprise 114, and
System z10 CPCs.
3.1 Planning considerations
To ensure that your migration to InfiniBand coupling links takes place as transparently as
possible, having a well-researched and documented implementation plan is critical. To assist
you in creating your plan, we discuss the following topics in this chapter:
Planning the topology of the CPC types you will connect using PSIFB coupling links
There are differences in the implementation of PSIFB on z196, z114, z10, and z9.
Planning for the hardware and software prerequisites
In addition to the minimum hardware and software levels that are necessary to install and
define the PSIFB coupling links, there are restrictions on the type of CPCs that can coexist
in a Parallel Sysplex.
Considerations for Server Time Protocol
In addition to its use to connect z/OS and CF LPARs, InfiniBand also provides connectivity
for the members of a Coordinated Timing Network.
Planning for connectivity in a sysplex that spans multiple data centers
InfiniBand provides potential performance and cost-saving advantages over ISC3 links for
connecting a sysplex that spans many kilometers.
Planning for future growth with minimal disruption
Because stand-alone CFs do not have the ability to do a dynamic reconfiguration, adding
coupling links normally requires a POR of the CF CPC. However, changes introduced in
z/OS 1.13, together with advance planning, provide the ability to add coupling link capacity
without requiring an outage of the affected CF LPAR.
Determining how many physical coupling links and logical CHPIDs you require
The number of coupling links you require reflects your connectivity requirements (how
many CPCs must be connected to each other and how many sysplexes span those
CPCs), availability requirements (making sure that there are no single points of failure),
and capacity and performance requirements.
Preparing the information that you need to define your configuration to hardware
configuration definition (HCD)
This involves planning your adapter IDs (AIDs) for the PSIFB coupling links and the
channel path identifiers (CHPIDs) that will be assigned to them. The InfiniBand host
channel adapters (HCAs) are represented by an identifier called an AID, and multiple
CHPIDs can be defined to the ports associated with the AID.
Planning your cabling requirements
PSIFB coupling links might require new cable types and connectors, depending on the
type of links that were in use before the InfiniBand links.
3.2 CPC topology
PSIFB links are available on IBM zEC12, zBC12, z196, z114, z10, and z9 CPCs. They
support the use of coupling and timing-only links between these CPCs.
The full list of supported link types for these CPCs is contained in Table 3-1 on page 41.
3.2.1 Coexistence
You must consider the requirements for CPC coexistence when implementing PSIFB coupling
links. The z196 and z114 CPCs can only coexist in a Parallel Sysplex or a Coordinated Timing
Network (CTN) with z10 (EC or BC) and z9 (EC or BC) CPCs. You must remove any earlier
CPCs (such as z990, z890, z900, and z800) from the Parallel Sysplex or CTN or replace
them with a supported CPC before you can add a z196 or z114 to the sysplex. This statement
applies regardless of the type of coupling link that is being used.
Figure 3-1 illustrates the supported coexistence environments and the types of supported
coupling links for z196, z10, and z9 CPCs. For detailed hardware prerequisites, see 3.3.1,
“Hardware prerequisites” on page 42.
Figure 3-1 Coupling link configuration options and supported coexistence for zEnterprise 196/114
Note: z9 CPCs cannot be coupled to other z9 CPCs using PSIFB links.
Figure 3-2 shows the supported coexistence environments and the types of supported
coupling links for zEC12, zBC12, z196, z114, and z10 CPCs. Remember that a Parallel
Sysplex or a CTN can only contain three consecutive generations of System z servers, so the
guidelines that are related to upgrading or removing older CPCs that apply to z196 and z114
also apply to zEC12 and zBC12.
Figure 3-2 Coupling link configuration options and supported coexistence for zEC12 and zBC12
Also, remember that z196 and later do not support IBM Sysplex Timer connection, so you
must have commenced your migration to STP mode prior to installing your first z196 or later
CPC, and must plan on completing the migration before removing the last pre-z196 CPC.
Also, z196 and later CPCs do not support ICB4 links. If your current coupling infrastructure
uses ICB4 links, you must migrate to InfiniBand before installing your first z196 or later CPC if
you want to connect that CPC to the other CPCs in your configuration.
3.2.2 Supported coupling link types
IBM z196 and later CPCs support up to 128 CF link CHPIDs. Table 3-1 on page 41
summarizes the maximum number of physical coupling links that are supported for each CPC
type.
Statement Of Direction: The zEnterprise 196 and zEnterprise 114 will be the last
generation of System z CPCs to support ordering of ISC3 links. However, it will be possible
to carry forward ISC3 links on an upgrade from earlier CPC generations.
zEC12 and zBC12 also support carry forward of ISC3 links on an upgrade. However, this is
the last generation of System z CPC that will support this capability.
Table 3-1 Maximum number of coupling links supported

Link type                         Maximum supported links (a)
                                  zEC12  zBC12  z196(b)  z114   z10 EC(c)  z10 BC  z9 EC  z9 BC
IC                                32     32     32       32     32         32      32     32
ISC3                              48(d)  32     48       48     48         48      48     48
ICB4                              n/a    n/a    n/a      n/a    16         12      16     16
ICB3                              n/a    n/a    n/a      n/a    n/a        n/a     16     16
HCA1-O (12X IFB)(e)               n/a    n/a    n/a      n/a    n/a        n/a     16     12
HCA2-O LR (1X IFB)(f)             32     12     32(g)    12(h)  32(i)      12      n/a    n/a
HCA2-O (12X IFB)                  32     16     32(g)    16(h)  32(i)      12      n/a    n/a
HCA3-O LR (1X IFB)(j)             64     32     48(k)    32(l)  n/a        n/a     n/a    n/a
HCA3-O (12X IFB & 12X IFB3)(j)    32     16     32(k)    16(l)  n/a        n/a     n/a    n/a
Max External Links(m)             104    72     104(n)   72(n)  64(o)      64      64     64
Max Coupling CHPIDs(p)            128    128    128      128    64         64      64     64

a. Maximum number of coupling links combined (ICB, PSIFB, ISC3, and IC) for all System z10 and z9 CPCs is 64. The limit for zEnterprise CPCs has been increased to 128.
b. Maximum of 56 PSIFB links on z196.
c. Maximum of 32 PSIFB links and ICB4 links on z10 EC. ICB4 links are not supported on model E64.
d. zEC12 only supports ISC3, HCA2-O, and HCA2-O LR adapters when carried forward as part of an upgrade.
e. HCA1-O is only available on System z9 (withdrawn from marketing in June 2010).
f. HCA2-O LR adapters are still available for z10; however, they have been withdrawn from marketing for z196 and z114. On z196 and z114, HCA3-O LR functionally replaces HCA2-O LR.
g. A maximum of 16 HCA2-O LR and HCA2-O coupling links are supported on the z196 model M15.
h. A maximum of 8 HCA2-O LR and HCA2-O coupling links are supported on the z114 model M05.
i. A maximum of 16 HCA2-O LR and HCA2-O coupling links are supported on the z10 model E12.
j. HCA3-O and HCA3-O LR are not available on System z10 or System z9 CPCs.
k. A maximum of 32 HCA3-O LR and 16 HCA3-O coupling links are supported on the z196 model M15.
l. A maximum of 16 HCA3-O LR and 8 HCA3-O coupling links are supported on the z114 model M05.
m. Maximum external links is the maximum total number of physical link ports (does not include IC).
n. The number of maximum external links is dependent on the number of HCA fanout slots available. The maximum of 104 external links (72 for z114) can only be achieved with a combination of ISC3, HCA3-O LR, and HCA3-O links.
o. Maximum number of external links for all z10 and z9 CPCs is 64.
p. Maximum coupling CHPIDs defined in IOCDS include IC and multiple CHPIDs defined on PSIFB links.
Important: Be aware of the following points:
For z196 and zEC12, the maximum number of external links and the maximum number of
CHPIDs that can be defined are 104 and 128, respectively.
For z114 and zBC12, the maximum number of external links and the maximum number of
CHPIDs that can be defined are 72 and 128, respectively.
For z10 and z9, the maximum number of coupling links and the maximum number of
CHPIDs that can be defined are both 64.
The use of internal coupling (IC) links and assigning multiple CHPIDs to a PSIFB link can
cause the CHPID limit to be reached before the maximum number of physical links.
Supported CHPID types for PSIFB links
All PSIFB coupling link CHPIDs are defined to HCD as type CIB. They conform to the general
rules for Coupling Facility peer channels (TYPE=CFP, TYPE=CBP, TYPE=ICP, or
TYPE=CIB).
You can configure a CFP, CBP, ICP, or CIB CHPID as:
A dedicated CHPID to a single LPAR.
A dedicated reconfigurable CHPID that can be configured to only one CF LPAR at a time,
but that can be dynamically moved to another CF LPAR in the same CSS.
A shared CHPID that can be concurrently used by LPARs in the same CSS to which it is
configured.
A spanned CHPID that can be concurrently used by LPARs in more than one CSS.
For further details, see 6.4.1, “Input/output configuration program support for PSIFB links” on
page 161.
3.3 Hardware and software prerequisites
In this section, we discuss the hardware and software prerequisites for implementing PSIFB
links on the z9 and later CPCs.
3.3.1 Hardware prerequisites
The implementation of PSIFB coupling links requires a zEnterprise, z10, or z9 CPC. The
base prerequisites for PSIFB are satisfied by the base zEnterprise or z10 models at general
availability. The recommended microcode levels are highlighted as part of the order process.
When installing an MES, the required microcode levels are documented in the MES
installation instructions. When installing a new machine with HCA adapters installed, there is
a minimum code requirement documented in the appropriate Solution Assurance Product
Review (SAPR) guide that is available to your IBM representative.
Additional hardware system area (HSA) is required to support the HCA1-O fanout on a z9
CPC. For z9, a power-on reset (POR) is required when the first PSIFB feature is installed;
however, this is not necessary on later CPCs.
See 2.2, “InfiniBand fanouts” on page 19 for a detailed description of the InfiniBand fanouts
that are offered on zEnterprise and System z CPCs.
Bring your CPCs to at least the Driver and bundle levels shown in Table 3-2 prior to moving to
InfiniBand. CPCs older than z9 are not supported in the same Parallel Sysplex or CTN as
z196 or z114. CPCs older than z10 are not supported in the same Parallel Sysplex or CTN as
zEC12 or zBC12.
Table 3-2 Recommended minimum hardware service levels

CPC                 Recommended Driver   Recommended minimum bundle level
z9                  67L                  63
z10                 79                   51(a)
z196 GA1(b)         86                   38
z196 GA2 and z114   93                   13(c)
zEC12 GA1           12                   38
zEC12 GA2, zBC12    15                   6B

a. Minimum level required to couple z10 HCA2 adapters to HCA3 adapters is Bundle 46.
b. Minimum required to couple HCA2 adapters to HCA3 adapters.
c. If the CPC will have both HCA2 12X and HCA3 12X links to the same CF, install Bundle 18 or later.
3.3.2 Software prerequisites
PSIFB links are supported by z/OS 1.7 and later releases. Several releases might require
additional program temporary fixes (PTFs) in support of PSIFB.
The information necessary to identify the required service is available in the following
Preventive Service Planning (PSP) buckets:
2097DEVICE for z10 EC
2098DEVICE for z10 BC
2817DEVICE for z196
2818DEVICE for z114
2827DEVICE for zEC12
2828DEVICE for zBC12
Rather than using the PSP buckets, however, we suggest using the SMP/E REPORT
MISSINGFIX command in conjunction with the FIXCAT type of HOLDDATA. The PSP
upgrades and the corresponding FIXCAT names are shown in Table 3-3. Because of the
relationship between STP and InfiniBand, we have included the STP FIXCATs in the table1.
Table 3-3 PSP bucket upgrades and FIXCAT values for z/OS CPCs
1 For information about the available FIXCATs and how to download them, see the following site:
http://www.ibm.com/systems/z/os/zos/features/smpe/fix-category.html
CPC Upgrade FIXCAT value
zEC12 2827DEVICE IBM.Device.Server.zEC12-2827
IBM.Device.Server.zEC12-2827.ParallelSysplexInfiniBandCoupling
IBM.Device.Server.zEC12-2827.ServerTimeProtocol
zBC12 2828DEVICE IBM.Device.Server.zBC12-2828
IBM.Device.Server.zBC12-2828.ParallelSysplexInfiniBandCoupling
IBM.Device.Server.zBC12-2828.ServerTimeProtocol
z196 2817DEVICE IBM.Device.Server.z196-2817
IBM.Device.Server.z196-2817.ParallelSysplexInfiniBandCoupling
IBM.Device.Server.z196-2817.ServerTimeProtocol
z114 2818DEVICE IBM.Device.Server.z114-2818
IBM.Device.Server.z114-2818.ParallelSysplexInfiniBandCoupling
IBM.Device.Server.z114-2818.ServerTimeProtocol
z10 EC 2097DEVICE IBM.Device.Server.z10-EC-2097
IBM.Device.Server.z10-EC-2097.ParallelSysplexInfiniBandCoupling
IBM.Device.Server.z10-EC-2097.ServerTimeProtocol
z10 BC 2098DEVICE IBM.Device.Server.z10-BC-2098
IBM.Device.Server.z10-BC-2098.ParallelSysplexInfiniBandCoupling
IBM.Device.Server.z10-BC-2098.ServerTimeProtocol
You must review the PSP information or REPORT MISSINGFIX output from SMP/E early in
the planning process to allow time for ordering any necessary software maintenance and then
rolling it out to all the members of all your sysplexes. Example 3-1 shows a sample REPORT
MISSINGFIX output for z196.
Example 3-1 Sample REPORT MISSINGFIX output for InfiniBand
MISSING FIXCAT SYSMOD REPORT FOR ZONE ZOSTZON
HOLD MISSING HELD _____RESOLVING SYSMOD____
FIX CATEGORY FMID CLASS APAR SYSMOD NAME STATUS RECEIVED
_______________ _______ _______ _______ _______ _______ ______ _______
IBM.Device.server.z196-2817.ParallelSysplexInfiniBandCoupling
HBB7750 AA25400 HBB7750 UA43825 GOOD YES
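If you want to produce this report yourself, a minimal SMP/E job might look like the following sketch. It is a sketch only: the job card, the global CSI data set name, and the target zone name (ZOSTZON, as used in Example 3-1) are placeholders that you must adapt to your environment, and the FIXCAT values shown are the z196 entries from Table 3-3.

//MISSFIX  JOB (ACCT),'MISSING FIX',CLASS=A,MSGCLASS=H
//REPORT   EXEC PGM=GIMSMP
//SMPCSI   DD   DSN=SMPE.GLOBAL.CSI,DISP=SHR
//SMPCNTL  DD   *
  SET BOUNDARY(GLOBAL).
  REPORT MISSINGFIX ZONES(ZOSTZON)
    FIXCAT(
   IBM.Device.Server.z196-2817.ParallelSysplexInfiniBandCoupling,
   IBM.Device.Server.z196-2817.ServerTimeProtocol).
/*

Any SYSMODs that the resulting report flags as missing must be ordered and installed before you begin the migration.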
Additionally, we recommend that all System z clients subscribe to the IBM System z Red Alert
service. For more information, see the following subscription site:
https://www14.software.ibm.com/webapp/set2/sas/f/redAlerts/home.html
If you want to exploit the 32 subchannels per CHPID functionality, ensure that both your
hardware and HCD support levels are correct. Figure 3-3 shows the CPC options that are
presented after the HCD PTF to support 32 subchannels per coupling CHPID has been
installed. See Chapter 6, “Configuration management” on page 155 for more information.
Figure 3-3 HCD processor type support levels for zEnterprise CPCs
(The panel lists processor type-models 2817-M32, 2817-M49, 2817-M66, and 2817-M80, each with support level “XMP, 2817 support, SS 2, 32 CIB CF LINKS”.)
3.4 Considerations for Server Time Protocol
Server Time Protocol (STP) provides a coordinated time source for systems connected
through coupling links. It replaces the Sysplex Timer (9037) as the time source for
interconnected systems.
STP uses coupling links to transmit time signals between interconnected systems. All types of
PSIFB coupling links (on supported CPCs) can be used to transmit STP timing signals in a
Coordinated Timing Network. STP interconnectivity using other types of coupling links (ISC3,
ICB4) are also possible on CPCs that support them.
3.4.1 Considerations for STP with PSIFB coupling links
STP communication is at the CPC level, not the LPAR level. This means that STP
communication is not dependent on any particular LPAR being available or even activated.
Also, the speed of STP links (ISC3, ICB4, or PSIFB) has no impact on STP performance or
timing accuracy. However, you need to carefully consider the STP CPC roles and coupling
link connectivity when planning the links for your Coordinated Timing Network (CTN).
Server roles and connectivity
There are several server roles within an STP-only CTN. These are defined using the “System
(Sysplex) Time” icon on the HMC:
Current Time Server (CTS)
The Current Time Server is the active stratum 1 server and provides the time source in an
STP-only CTN. Only the Preferred Time Server or Backup Time Server can operate as the
CTS.
Preferred Time Server (PTS)
The server that has preference to operate as the CTS and stratum 1 server of the
STP-only CTN is assigned the role of Preferred Time Server. This server requires
connectivity to the Backup Time Server and the Arbiter (if present).
Backup Time Server (BTS)
Although this is an optional role, it is strongly recommended. This server will take over as
the CTS and Stratum 1 server in recovery conditions or as part of a planned maintenance
operation to the PTS. The BTS requires connectivity to the PTS and Arbiter (if present).
Note: The ability to define 32 subchannels for PSIFB links in HCD and input/output
configuration program (IOCP) is provided by APARs OA32576 and OA35579. Those
APARs changed the default number of subchannels for all CF link CHPIDs to 32. APAR
OA36617 subsequently changed the default back to seven. You should only specify 32
subchannels if the link will span two sites.
Review REPORT MISSINGFIX output and PSP information to ensure that you have the
latest required service.
Note: Connection to Sysplex Timer (ETR) is not supported on z196 and later.
The use of STP is required for time synchronization in a Coordinated Timing Network
containing these CPCs.
Arbiter
This is an optional role, although it is strongly recommended when three or more servers
participate in the CTN. The Arbiter provides additional validation of role changes for
planned or unplanned events that affect the CTN. The Arbiter is a stratum 2 server that
should have connectivity to both the PTS and the BTS.
Alternate servers
Any time a PTS, BTS, or Arbiter is going to be removed from service, move that role to
another member of the CTN. Make sure that the alternate server has the same connectivity as
the server that it is replacing.
For more information, refer to Important Considerations for STP server role assignments,
available on the web at the following site:
http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP101833
All other servers
The remaining servers in the CTN should have two failure-isolated links to the PTS and
the BTS, and also to the alternate locations for those roles.
Figure 3-4 Sample STP server roles and connectivity with coupling links
Figure 3-4 shows a CTN with the following elements:
There are two stand-alone CFs:
C1 is the PTS and CTS.
C2 is the BTS.
P1, P2, P3, and P4 all contain z/OS systems participating in the Parallel Sysplex.
P1 is the Arbiter.
The PTS, BTS, and Arbiter must be connected to each other.
Additional coupling links have been defined between the PTS and BTS for STP purposes
only.
There might be a requirement to configure links between two servers in the CTN that have
only z/OS LPARs defined. With no Coupling Facility LPARs at either end of the link, the links
must be defined as timing-only links. A timing-only link is shown in Figure 3-5 between
P3 and P4.
Figure 3-5 Sample STP server roles and connectivity with coupling and timing-only links
Note: Be aware of the following points:
A timing-only link can only be defined when the CPCs at either end of the link contain
no CF LPARs.
A mixture of timing-only and coupling links is not allowed between the same pair of
servers. HCD will not allow this definition.
The PTS, BTS, and Arbiter must be connected to each other.
Define at least two failure-isolated coupling links or timing-only links between each pair
of servers for redundancy.

To ensure the highest levels of availability, it is vital that all servers in the CTN are able to
receive timing signals from the CTS or a stratum 2 server at all times. The best way to avoid
connectivity failures that impact STP is to ensure that every server is connected to the PTS
and the BTS (and their backups) by two failure-isolated coupling links. STP is designed to
issue a warning message if it detects that it only has one path to the CTS. However, STP is
not aware of the underlying InfiniBand infrastructure; it only sees CHPIDs. So, if there are two
online CHPIDs to the CTS, but both of those CHPIDs are using the same InfiniBand link, STP
will not be aware of that and therefore not issue a warning message.
Also note that STP’s ability to use a coupling link for timing signals is independent of any
LPARs. So, for example, following the successful completion of a power-on reset, all coupling
CHPIDs should be available for STP use, regardless of whether any LPARs are activated or
not. Similarly, if the LPAR that owns a coupling link that is currently being used by STP is
deactivated, that does not stop STP from using that link. For additional information about STP,
see Server Time Protocol Planning Guide, SG24-7280.
3.5 Multisite sysplex considerations
Extended distance sysplex connectivity is provided by ISC3 and PSIFB 1X links. The
maximum unrepeated distance of both these link types is 10 km (20 km with RPQ2).
Distances greater than this require a Dense Wave Division Multiplexer (DWDM).
Careful planning is needed to ensure there are redundant and diverse fiber routes between
sites to avoid single points of failure on a fiber trunk.
IBM supports only those DWDM products qualified by IBM for use in high availability,
multi-site sysplex solutions, such as GDPS. The latest list of qualified DWDM vendor products
can be found on the IBM Resource Link® website at the following link:
https://www.ibm.com/servers/resourcelink
To transmit timing information, STP can choose any defined coupling link or timing-only link
that is online between two IBM zEnterprise or System z CPCs; you cannot limit STP to only
certain CHPIDs. This means that you must not configure the links over a mixture of qualified
and unqualified equipment. Doing so might result in timing problems where CPCs might
become unsynchronized without your knowledge. When coupling links or timing-only links are
configured over DWDM equipment, all links must use specific DWDM hardware (optical
modules, transponders, TDM modules) with interface cards qualified by IBM for the STP
protocol.
2 The relevant RPQs are 8P2197, 8P2263, and 8P2340, depending on the link type and CPC type.
Important: Check the qualification letters in detail to determine your precise requirements
and how best to address those requirements:
Selecting a qualified WDM vendor does not necessarily mean that the selected WDM
model is qualified.
Selecting a qualified WDM model does not mean that a specific release level is
qualified.
A vendor’s WDM model might be qualified for ISC3 or PSIFB 1X links for the transfer of CF
requests, but not qualified for those same link types when STP messages are also being
exchanged.
Ensure that the vendor qualification letter is inspected carefully by a DWDM technical
expert. The letter specifies model number, release level, interface modules qualified,
protocols, application limitations and so on.
The maximum supported distance is not the same for all vendors, and changes over
time as new devices, features, and capabilities are introduced and qualified.
For more information about qualified WDMs, search on the keywords “qualified WDM” on
the Redbooks website at the following link:
http://www.redbooks.ibm.com
Coupling Facility response times at extended distances
Coupling Facility (CF) performance needs especially careful consideration when the CF is
located at a significant distance away from the connected z/OS or CF CPC.
As the distance between the z/OS and CF CPCs increases, the speed of light through the
fiber becomes the dominant factor in the response time, adding around 10 microseconds per
km for the round trip to the CF. For example, a CF located 20 km away adds roughly
200 microseconds to every request. This directly increases the service time, and results in
most synchronous requests being converted to asynchronous.
Because the subchannel and link buffer that are used for the CF request are allocated for the
entire duration of the service time, the increased service times caused by the distance result
in high subchannel and link buffer usage and potentially more subchannel delay and Path
Busy events. There are two attributes of InfiniBand links that can be particularly beneficial to
long-distance sysplexes:
On zEnterprise CPCs using Driver 93 and later, 1X InfiniBand links support either 7 or 32
subchannels per CHPID, compared to the 7 subchannels per CHPID that were supported
previously.
This means that more CF requests can be active concurrently on each CHPID, reducing
instances of requests being delayed because all subchannels or link buffers are busy.
The ability to define multiple CHPIDs to a single PSIFB link can help because additional
CHPIDs (which provide more subchannels) can be added without having to add more
physical links.
With ISC links, you were limited to seven subchannels per link, and a maximum of eight links
between a z/OS and a CF, giving a maximum of 56 subchannels. With PSIFB 1X links prior to
Driver 93, you were still able to have only seven subchannels per CHPID. However, you were
able to have your eight CHPIDs to the CF spread over just two physical links, which is a
significant decrease in the number of required adapters and DWDM ports. With PSIFB 1X
links and Driver 93, the same eight CHPIDs support 256 subchannels (eight CHPIDs with 32
subchannels each).
For more information about the relationships between subchannels and link buffers and how
they are used, see Appendix C, “Link buffers and subchannels” on page 247.
Find additional reference information in the following documents:
Considerations for Multisite Sysplex Data Sharing, SG24-7263
IBM System z Connectivity Handbook, SG24-5444
Server Time Protocol Planning Guide, SG24-7280
3.6 Planning for future nondisruptive growth
If your CF LPARs reside in the same CPC as z/OS or IBM z/VM, it is possible to add links to
the CF dynamically by using the dynamic reconfiguration support provided by these operating
systems. This means that as your configuration and capacity requirements increase in the
future, you can add more physical links and more CHPIDs dynamically. But if your CF LPARs
reside in a CPC with no operating system (no z/OS or z/VM LPARs), there is currently no way
to add links or CHPIDs to that CPC without a POR.
For coupling links prior to InfiniBand, you were able to define and install a link in the CF CPC
at one time, and then add a corresponding link in the z/OS CPC later. A POR was required to
add the link to the CF CPC. But when the link is later added to the z/OS CPC, a dynamic
reconfiguration can be used to make the link available to LPARs on the z/OS CPC, and no
POR is required on the CF CPC. This means that when the link is installed in the z/OS CPC,
you are immediately able to use it without interrupting CF operations.
However, prior to z/OS 1.13, when you define an InfiniBand CHPID, both the CF and the z/OS
CHPIDs must be defined and must be connected to each other3. And because you must
specify the AID when you define the InfiniBand CHPID, you cannot define the CHPID unless
the link was already installed or an order had been placed and the eConfig report providing
the AID is available. This effectively meant that you had to install the InfiniBand adapter in the
CF and z/OS CPCs at the same time.
This restriction is alleviated starting with z/OS 1.13 (and rolled back to z/OS 1.10 with APAR
OA29367). HCD will now accept an AID of an asterisk (*), meaning that you can install the
adapter in the CF CPC now and assign the real AID, but for the z/OS end you use a
placeholder AID of *. Then, when the adapter is subsequently installed in the z/OS CPC, you
replace * with the real AID and perform a dynamic reconfiguration. When the reconfiguration
completes, the new coupling link can be used without having to perform another POR on the
CF CPC.
This support and its use is discussed in more detail in “Overdefining CIB links” on page 160.
3.7 Physical and logical coupling link capacity planning
All sysplex clients will have to migrate to InfiniBand over the next few years as they move to
newer CPC generations. An InfiniBand infrastructure is significantly different from a
pre-InfiniBand infrastructure and will involve replacing all your previous coupling infrastructure
with InfiniBand links.
Therefore, the migration presents an ideal opportunity to ensure that the infrastructure that
you configure will deliver the best possible availability, performance, and flexibility, in the most
cost-effective manner.
To obtain the optimum benefit from this opportunity, rather than simply replacing your existing
configuration with an equivalent number of PSIFB links, you can go through an exercise to
determine the most appropriate configuration for your environment.
In this section we discuss the various aspects of your coupling infrastructure, and help you
identify the number of physical InfiniBand links and the connectivity that you need. We then
discuss the best way to configure that hardware for optimum performance.
3.7.1 Availability
Because the Coupling Facility is the heart of your Parallel Sysplex, it is vital that all systems in
the sysplex have access to it at all times. If one member of the sysplex loses all access to a
Coupling Facility, it is likely that all the contents of the CF will be moved to another CF,
meaning that the capacity of that CF will be lost to the sysplex until connectivity can be
restored. This also potentially creates a single point of failure if all structures now reside in the
same CF.
3 HCD will not build a production IODF if there are uncoupled CIB CHPIDs defined.
To avoid this situation, there are guidelines to follow when planning the availability aspects of
your PSIFB infrastructure:
Always configure at least two physical links between each pair of connected CPCs.
It is important to have no single points of failure in the connection between z/OS and the
CFs. This means using more than one physical link, and distributing those links over
multiple HCAs.
Additionally, if the CPC contains more than one book, distribute the links to each
connected CPC over multiple books (both in the z/OS CPC and in the CPC containing the
CF). Plan availability by BOOK/HCA/PORT.
For example, if you decide to have two CF link CHPIDs and two PSIFB links, and multiple
books are available in your CPC, your two CHPIDs can be mapped as follows:
CHPID 00 - BOOK 0, HCA1, PORT 1
CHPID 01 - BOOK 1, HCA2, PORT 1
In a single book system this is not possible, so spread your links across multiple HCAs.
When planning your physical configuration, be aware that different CPC types support
differing numbers of fanouts. For large configurations, it might be necessary to configure
multiple books to provide the number of required fanouts. Refer to the footnotes in
Table 3-1 on page 41 for more details.
Additionally, depending on your availability requirements, you might decide to add a
second book for availability rather than purely for capacity reasons.
The CHPID Mapping Tool (CMT) does not provide a CHPID availability mapping function
for InfiniBand links. However, it will validate that your manual mapping does not contain
any intersects. To understand the process better, refer to “CHPID Mapping Tool support”
on page 183.
In addition to their use for providing connectivity between z/OS and CF LPARs, coupling links
are also used to provide connectivity for STP signaling. For a CPC to be part of a multi-CPC
sysplex, it must be able to send and receive timing signals to and from other members of the
timing network. When using STP, these signals are sent and received over coupling links, so
configure each CPC so that it has two failure-isolated connections to the PTS, the BTS, and
to any CPC that might take over those roles during an outage. For more information, see
the Redbooks publications Server Time Protocol Planning Guide, SG24-7280, and Server
Time Protocol Recovery Guide, SG24-7380.
The number of physical links that you require between each pair of CPCs will reflect a
balance of availability and performance. InfiniBand is a powerful and flexible interconnect
architecture where only significantly high volume workloads are likely to require more than
two physical links between connecting CPCs. However, for availability reasons, every pair of
connected CPCs should have at least two failure-isolated physical links, regardless of the
bandwidth requirements.
Note: At the time of writing, the various functions in z/OS that check for single points of
failure are unaware of the relationship between CHPIDs and the underlying InfiniBand
infrastructure.
If two CHPIDs are online to the CF, those functions will assume that those CHPIDs do
not represent a single point of failure.
This places extra responsibility on you to ensure that you design a robust configuration
and to ensure on an ongoing basis that no real single points of failure exist.
Figure 3-6 shows two configurations. The configuration on the left side has two CPCs, with
one HCA fanout on each CPC. Although this provides connectivity and sufficient bandwidth,
both links are connected to the same adapter, meaning that all communication between the
two CPCs will be lost if that adapter were to fail. The preferred solution is to install two fanouts
in each CPC and use one port in each adapter, as shown in the configuration on the right
side.
Figure 3-6 Configuring for availability
3.7.2 Connectivity
When determining the number of PSIFB links you require on a CPC, one of the
considerations is the number of other CPCs that it will need to connect to. Prior to InfiniBand
links, the number of links was driven by the number of CPCs that needed to be
interconnected (the physical configuration), the performance considerations, and the number
of sysplexes (the logical configuration), because pre-InfiniBand coupling links cannot be
shared between multiple sysplexes.
Important: Configure a minimum of two physical InfiniBand links connected to different
HCAs between every pair of connected CPCs.
Defining multiple CHPIDs on a single physical PSIFB coupling link does not satisfy the
high availability recommendation. You must implement multiple physical links, on multiple
HCAs, to avoid single points of failure.
Prior to InfiniBand, each z/OS CPC required at least two links (for availability reasons) per
sysplex for each connected CF. If you had one sysplex with two z/OS CPCs and two CF CPCs,
and were using System Managed Duplexing, you needed at least ten links. This is shown in
the configuration diagram in Figure 3-7.
Figure 3-7 Pre-InfiniBand configuration with one sysplex
Additionally, if you want to provide STP connectivity between the two z/OS CPCs (which is
highly recommended in a configuration like this), an additional two timing-only links are
required between that pair of CPCs.
Also remember that if System Managed Duplexing is being exploited, you need a pair of links
for each pair of CF LPARs4. So two sysplexes, with both sysplexes using System Managed
Duplexing, require at least twenty links, as shown in Figure 3-8.
Figure 3-8 Pre-InfiniBand configuration with two sysplexes
And that is with just four CPCs, two sysplexes, and a minimum number of links between each
z/OS and its CFs. Imagine a configuration with eight sysplexes (as some clients have). Or four
or more coupling links from each z/OS CPC for each sysplex. As the number of CPCs,
systems, coupling links, and CFs increases, the physical connectivity requirement increases
dramatically.
Because InfiniBand supports the ability to share physical links across multiple sysplexes, the
number of sysplexes is less likely to be a factor in the number of coupling links you require5.
4 Prior to InfiniBand, CF-to-CF links used for System Managed Duplexing cannot be shared by more than one
sysplex.
5 Although PSIFB links can be shared between sysplexes, CF link CHPIDS cannot. If you have a significantly high
number of sysplexes, you might find that the number of sysplexes drives a requirement for more physical links
because of the limit on the number of CHPIDs that can be assigned to an HCA.
Figure 3-9 shows how the use of InfiniBand simplifies the configuration by supporting multiple
sysplexes over (in this case) just two InfiniBand links between each pair of CPCs. This
example has twice as many sysplexes as the configuration shown in Figure 3-8 on page 54,
and yet it only has half as many coupling links.
Figure 3-9 InfiniBand support for multiple sysplexes
At some point, the capacity or connectivity requirements of the sysplexes might drive a need
for more links. However, in most cases, the required number of links reflects the number of
CPCs to be connected and the required capacity, and not the number of sysplexes.
Summary: After you have mapped out the links you will need for availability and STP
connectivity, verify that connectivity is sufficient to connect every z/OS LPAR to each CF in
the same sysplex.
Providing timing-only links between the z/OS CPCs is highly recommended because it
provides additional flexibility in the placement of the STP roles during outages.

3.7.3 Capacity and performance
The third aspect of planning your InfiniBand connectivity is to ensure that the number of links
you plan will provide acceptable performance.
There are two aspects to how coupling links contribute to performance:
Capacity If you are trying to send 1 GB of data to a cache structure in the CF
every second, your coupling link infrastructure must be capable of
handling that volume of traffic. This is typically referred to as
bandwidth: the larger the bandwidth, the more data can be moved in a
given amount of time.
Response time The response time for a synchronous CF request consists of:
1) Potentially having to wait for an available link buffer or subchannel.
2) Latency within the hardware before the request is placed on the link.
3) The size of the request and the bandwidth of the link.
4) The distance between the z/OS LPAR and the CF LPAR.
5) The speed and utilization of the CF.
Items 2 and 3 in this list are directly impacted by the link technology.
Different coupling link types support different bandwidths. For example, 12X InfiniBand links
have a larger bandwidth than 1X InfiniBand links. This means that a 12X link can process
more requests per second than a 1X link (all other things being equal). The relationship
between bandwidth and response time is discussed in 1.5.1, “Coupling link performance
factors” on page 11.
IBM RMF does not provide any information about the bandwidth utilization of coupling
CHPIDs. For 12X InfiniBand links, especially when running in IFB3 mode, we do not believe
that utilization of the link bandwidth is a concern. More important is to ensure that utilization of
the subchannels and link buffers does not exceed the guideline of 30%.
Because 1X links have a significantly smaller bandwidth and they support more subchannels
per CHPID, it is possible in various situations that link bandwidth utilizations can reach high
levels before subchannel or link buffer utilization reaches or exceeds the 30% threshold. This
is most likely to be observed if the links are used in conjunction with large numbers of large
cache or list requests. In this situation, significantly elongated response times will be
observed when the request rate increases. Distributing the load across more physical links
can result in a reduction in response times if high-bandwidth utilization is the problem.
From a response time perspective, there are several considerations:
Large requests can experience shorter response times on 12X links than on 1X links. The
response time difference can be expected to be less pronounced for small requests (lock
requests, for example).
HCA3-O adapters running in IFB3 mode deliver significantly reduced response times
compared to IFB mode. However, those improvements are only available on the 12X
(HCA3-O) links, not on the 1X (HCA3-O LR) links.
Another potential source of delay is if all subchannels or link buffers are busy when z/OS tries
to start a new request. This is most often seen with workloads that generate CF requests in
bursts. The obvious solution to this problem is to add more subchannels and link buffers.
Prior to InfiniBand links, the only way to add more subchannels and link buffers6 was to add
more physical links. However, because InfiniBand supports the ability to assign multiple
CHPIDs to a single physical link, you can add subchannels and link buffers by simply adding
more CHPIDs to the existing links (at no financial cost). This capability is especially valuable
for extended distances and configurations that exploit System Managed Duplexing because
both of these cause increased subchannel utilization.
Adapter types
Generally speaking, the adapter generation (HCA1, HCA2, or HCA3) that you can install is
determined by the CPC that the link will be installed in. The one exception, perhaps, is z196,
where you can install either HCA3-O (12X) adapters or HCA2-O (12X) adapters. When you
have a choice like this, install the most recent adapter type (HCA3-O, in this case).
6 If you are not familiar with the relationship between coupling links and link buffers and subchannels, review
Appendix C, “Link buffers and subchannels” on page 247 before proceeding with this section.
In relation to selecting between 1X and 12X adapters:
If your sysplex is contained within one data center, and is unlikely to be extended over
distances larger than 150 meters, then 12X links are likely to be the most appropriate for
you.
If your sysplex spans multiple sites, 1X links support larger distances, potentially up to
175 km with a DWDM, and therefore provide a greater degree of flexibility.
Additionally, in certain situations, HCA3-O LR adapters might be attractive because they
provide more connectivity (four ports per adapter instead of two) for each adapter.
Number of CHPIDs per link
InfiniBand supports up to 16 CHPIDs per adapter. You have flexibility to distribute those
CHPIDs across the available ports on the adapter in whatever manner is most appropriate
for you.
HCA2-O 12X adapters operate at optimum efficiency when there are not more than eight
CHPIDs assigned to the adapter. Given that there are two ports on an HCA2-O adapter, this
results in the recommendation of having not more than four CHPIDs on each port of an
HCA2-O adapter if your objective is to maximize throughput.
The more efficient IFB3 protocol is used when HCA3-O (12X) ports with four or fewer CHPIDs
assigned are connected to another HCA3-O port. This results in significantly improved
response time and the ability to process more requests per second.
Therefore, for both HCA2-O and HCA3-O adapters, if the best response time and maximum
throughput for a particular CF is important to you, ensure that the ports that are used to
connect to that CF are not defined with more than four CHPIDs.
Alternatively, if your use of the CF is such that optimum response times are not critical to your
enterprise, you can define up to 16 CHPIDs to a single port. To obtain the full benefit from all
ports on the adapter, however, you will probably want to aim for a more even distribution of
CHPIDs across the ports on the card.
When planning for the number of CHPIDs you need to connect a z/OS LPAR to a CF, it is
valuable to consider the number of CHPIDs you are using today. Note the following
considerations:
If you are replacing ISC links with any type of InfiniBand links, you can see a dramatic
decrease in response times. The reduced response times probably means that you will not
require as many links to connect to the CF as you have today.
If you are replacing ICB4 links with HCA3-O links running in IFB3 mode, you are likely to
see improved response times.
Remember that the determination of whether a port runs in IFB or IFB3 mode is based on
the total number of CHPIDs, across all sysplexes, that are defined on the port. For
information about determining how many CHPIDs are defined to the port, see 6.5,
“Determining which CHPIDs are using a port” on page 179.
Regardless of the type of link you are migrating from, you are unlikely to require more
CHPIDs than you have today unless your current configuration is experiencing high
numbers of subchannel busy and Path Busy events.
If your CFs reside in a CPC that does not contain a z/OS or a z/VM, remember that adding
CHPIDs in the future will require a POR of the CF CPC because it is not possible to do a
dynamic reconfiguration on that CPC. For that reason, you might consider defining more
CHPIDs than you actually need to avoid a POR in the future. However, consider this
course of action only if the total number of CHPIDs defined on the InfiniBand port does not
exceed four.
If your configuration contains one or more performance-sensitive production sysplexes
and a number of less important sysplexes, consider the number of CHPIDs that you want
to assign to each HCA3-O port. For example, if you have a production sysplex and plan to
assign two of its CHPIDs to a given HCA3-O port, and assuming that the port at the other
end of the link is also on an HCA3-O adapter, that port will run in IFB3 mode.
Now consider your other sysplexes. Although the CF load that they generate might be
insignificant, adding more CHPIDs to that HCA3-O port results in more than four CHPIDs
being defined for that port, and the performance observed by the production sysplex will
be impacted because the port will now run in IFB mode.
In a situation like this, it might be better to keep the number of CHPIDs defined on any
HCA3-O port being used by a production sysplex to four or fewer, and have a larger
number of CHPIDs on ports that are being used by the other sysplexes.
An example configuration is shown in Figure 3-10. In this case, there is one production
sysplex (P), two test sysplexes (T), one development sysplex (D), and one system
programmer sandbox sysplex (S). The production sysplex is using port 2 on each adapter.
Because those ports only have two CHPIDs defined to them, they run in IFB3 mode. All
the other sysplexes are sharing port 1 on the two adapters. Because there are more than
four CHPIDs defined to those ports, they will run in IFB mode. However, the performance
difference is less likely to be important to those sysplexes.
Figure 3-10 Separating production and non-production sysplexes
3.8 Physical Coupling link addressing
PSIFB coupling link fanouts are identified by an Adapter ID (AID). This is different from
channels installed in an I/O cage or drawer (and ISC3 and ICB4 links), which are identified by
a physical channel identifier (PCHID) number that relates to the physical location.
The AID is used to associate CHPIDs with PSIFB coupling links in a similar way that PCHIDs
are used to define CHPIDs for other types of coupling links. However, when you look at the
channel list on the Support Element (SE), rather than seeing AIDs, you will still see Virtual
Channel Identifiers (VCHIDs) in the address range from 0700 to 07FF. To determine the
VCHID that is currently associated with a given coupling CHPID7, issue a D CF MVS
command; the output shows the VCHID for each CHPID that is connected to the CF.
7 VCHIDs are assigned to CHPIDs at IML time or when a dynamic I/O reconfiguration is performed, so always verify
the CHPID-to-VCHID relationship before performing any operation on a VCHID.
Figure 3-11 shows a sample PSIFB channel detail information display from the SE. This
display is for VCHID 0700, which is associated with CHPID 80 in CSS 2. The AID is 0B and
the port is 1. We describe the assignment of AIDs and VCHIDs in 2.4, “Adapter ID
assignment and VCHIDs” on page 26.
Figure 3-11 PSIFB Channel Detail information from SE
Note: The panel still refers to the term PCHID for this Virtual Channel Identifier. This can
be ignored.
You can find the AID assignments for each fanout in the PCHID report. This report is provided
by your IBM technical representative for a new CPC or for miscellaneous equipment
specification (MES) upgrades to an existing CPC.
Example 3-2 shows part of a PCHID report for a z196 model M80. In this example, you can
see that there are four adapters in the first book (location 01). The adapters are installed in
location D7/D8/D9/DA in each case. The AID that is assigned to the first adapter in the first
book (location D7) is 04.
Example 3-2 Sample PCHID REPORT showing AID assignments
CHPIDSTART
15879371 PCHID REPORT Jun 07,2011
Machine: 2817-M80 NEW1
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Source Cage Slot F/C PCHID/Ports or AID Comment
01/D7 A25B D701 0163 AID=04
01/D8 A25B D801 0171 AID=05
01/D9 A25B D901 0171 AID=06
01/DA A25B DA01 0170 AID=07
06/D7 A25B D706 0163 AID=0C
06/D8 A25B D806 0171 AID=0D
06/D9 A25B D906 0171 AID=0E
06/DA A25B DA06 0170 AID=0F
10/D7 A25B D710 0163 AID=14
10/D8 A25B D810 0171 AID=15
10/D9 A25B D910 0171 AID=16
10/DA A25B DA10 0170 AID=17
15/D7 A25B D715 0163 AID=1C
15/D8 A25B D815 0171 AID=1D
15/D9 A25B D915 0171 AID=1E
15/DA A25B DA15 0170 AID=1F
Legend:
Source Book Slot/Fanout Slot/Jack
A25B Top of A frame
0163 HCA2
0171 HCA3 O
0170 HCA3 O LR
There is an important difference in how AIDs are handled for different CPC types when
changes to the machine configuration are made.
For System z9 CPCs, the AID is determined by its physical location, much like a PCHID. If
a Host channel adapter (HCA) is moved from one slot location on a processor book to
another slot location, it will assume the AID that is assigned to the new physical location.
For System z10 CPCs or later, when a PSIFB HCA is moved from one fanout slot location
to another fanout slot location, the AID moves with it (the AID is retained).
These differences illustrate the importance of referring to the PCHID report detail.
Figure 3-12 shows the HCD Channel Path List and several CIB CHPIDs. If you page right
(F20) the AID detail is displayed. Here we can see that CHPID 8C is assigned to AID 09 and
port number 1.
Figure 3-12 HCD Channel Path List
Channel Path List
Command ===> _______________________________________________ Scroll ===> PAGE
Select one or more channel paths, then press Enter. To add, use F11.
Processor ID : SCZP301 CSS ID : 2
1=A21 2=A22 3=A23 4=A24 5=A25
6=* 7=* 8=A28 9=* A=*
B=* C=* D=* E=* F=A2F
I/O Cluster --------- Partitions 2x ----- PCHID
/ CHPID Type+ Mode+ Mngd Name + 1 2 3 4 5 6 7 8 9 A B C D E F AID/P
_ 7E FC SPAN No ________ a a a a a # # a # # # # # # _ 253
_ 7F FC SPAN No ________ a a a a a # # a # # # # # # _ 1F3
_ 8C CIB SHR No ________ a a _ _ _ # # _ # # # # # # a 09/1
_ 8D CIB SHR No ________ a a _ _ _ # # _ # # # # # # a 1A/1
_ 98 CIB SHR No ________ a a _ _ _ # # _ # # # # # # a 0C/1
_ 99 CIB SHR No ________ a a _ _ _ # # _ # # # # # # a 0C/2
Detailed information about defining your InfiniBand infrastructure in HCD is contained in
Chapter 6, “Configuration management” on page 155.
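For reference, a coupled CIB CHPID definition like the one for CHPID 8C in Figure 3-12 ultimately appears in the IOCP input as a CHPID statement of TYPE=CIB that carries the AID and PORT values. The following is a sketch only: the partition access list is taken from the figure, but the remote system name (CSYSTEM) and the remote CHPID (CPATH) are illustrative assumptions, and column-72 continuation characters are omitted for readability.

*  Illustrative sketch only - not taken from a real IODF
CHPID PATH=(CSS(2),8C),SHARED,
      PARTITION=(A21,A22,A2F),
      TYPE=CIB,AID=09,PORT=1,
      CSYSTEM=CF01,CPATH=(CSS(0),85)

The CSYSTEM value must match the LSYSTEM name of the CPC at the other end of the link, and CPATH identifies the connected CIB CHPID on that CPC; HCD generates both values when you connect the two CIB CHPIDs to each other (see also the tip about LSYSTEM in 3.9, “Cabling considerations”).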
3.9 Cabling considerations
PSIFB links utilize industry-standard optical fiber cables:
The HCA2-O and HCA3-O (12X IB-DDR) features and the HCA1-O (12X IB-SDR) feature
require an OM3 50 micron multimode optical fiber cable with 12 fiber pairs (a total of 24
optical fibers).
The maximum cable length for the HCA2-O, HCA3-O, and HCA1-O features is 150 meters
(492 feet). This provides more flexibility for physical placement of CPCs in the data center
than an ICB-4 implementation.
The HCA2-O LR and HCA3-O LR (1X IB-DDR) features require a 9 micron single mode
optical fiber cable with one fiber pair. This is the same type of cable, and the same
connectors that are used by ISC3 links.
The maximum cable length for the HCA3-O LR and HCA2-O LR features is 10 km (6.2
miles).
See 2.6, “InfiniBand cables” on page 34 for additional cabling information.
Tip: A CPC must have an established local system name (LSYSTEM) to define a
connected CIB CHPID. HCD will default to the Central Processor Complex (CPC) name
that is specified for the Processor ID. Define a name for LSYSTEM that will be carried over
from one CPC to its replacement (that is, without any machine type information).
If the LSYSTEM parameter is changed (because of a machine upgrade from z10 to z196, for
example), the remote systems at the other end of the PSIFB connection might need a
dynamic activate or a power-on reset (for a stand-alone CF) for their definitions to reflect the
new name of the replaced CPC.
The LSYSTEM value is only used in PSIFB CHPID definitions, so any changes you make
to your LSYSTEM naming convention are unlikely to affect anything else. This is discussed
further in “LSYSTEM” on page 162.
Note: Clearly label both ends of all cables to avoid confusion and expedite problem
determination.
Chapter 4. Migration planning
The migration from earlier sysplex connectivity solutions to InfiniBand connectivity options
provides you with the opportunity to completely refresh your coupling infrastructure, resulting
in improved availability, flexibility, and performance. However, because it involves the eventual
replacement of all your existing coupling links, you must plan the migration carefully,
especially if you need to complete the migration without impacting application availability.
It is safe to say that every enterprise’s sysplex environment is unique. It might range from a
small 1-CPC sysplex in a single computer room to a large n-way sysplex using structure
duplexing technology spread over many kilometers. Some enterprises are able to schedule a
window to perform a disruptive migration; others need to perform the migration without any
outage to their core applications.
This chapter describes several of the most common migration scenarios. They have been
tested and documented to help you plan your specific migration. All scenarios focus on the
basic steps to successfully complete the migration. You will need to adapt each scenario to
your specific installation, depending on the number of coupling links, Coupling Facility CPCs,
involved logical partitions, and number of System z servers in your configuration.
The following topics and scenarios are presented in this chapter:
Migration considerations
Connectivity considerations
Introduction to the scenario notation
Scenario 1: Disruptive CPC upgrade, PSIFB already in use (ICFs)
Scenario 2: Migration to PSIFB with no POR (ICFs)
Scenario 3: Concurrent CPC and link upgrade
Scenario 4: Concurrent PSIFB implementation (stand-alone CFs)
Scenario 5: Concurrent migration from IFB to IFB3 mode
Scenario 6: Concurrent switch between IFB modes
4.1 Migration considerations
You must carefully plan for the migration to PSIFB coupling links. Introducing PSIFB links into
an existing environment requires a thorough understanding of the different System z CPC
implementations of Coupling Facility technology. When implementing a new CPC (such as a
z196 or z114), you need to consider migration to InfiniBand as part of the overall migration plan. A
conservative approach aimed at providing easy fallback and minimizing application disruption
is likely to consist of multiple steps.
We assume that you have familiarized yourself with the technical aspects of InfiniBand
technology on IBM zEnterprise and System z servers as described in Chapter 2, “InfiniBand
technical description” on page 17. Further, we assume that you have finished the planning for
the implementation of the InfiniBand technology in your specific environment by taking into
account all considerations listed in Chapter 3, “Preinstallation planning” on page 37.
This chapter focuses on the actual implementation steps, providing sample step-by-step
guidance for each of the most common implementation scenarios. Table 4-1 contains a list of
the scenarios discussed in this chapter.
Table 4-1 Sample migration scenarios

Section   Description                                                        Page
4.3       Scenario 1 - Disruptive CPC upgrade, PSIFB already in use (ICFs)   68
4.4       Scenario 2 - Migration to PSIFB with no POR (ICFs)                 76
4.5       Scenario 3 - Concurrent CPC and link upgrade                       86
4.6       Scenario 4 - Concurrent PSIFB implementation (stand-alone CFs)     95
4.7       Scenario 5 - Concurrent migration from IFB to IFB3 mode            108
4.8       Concurrent switch between IFB modes                                112
Depending on your environment and migration goals, you might need to use several of the
migration scenarios, one after the other. The following example shows how you might need to
combine a number of scenarios to get from where you are now to where you want to end up.
Example implementation steps
The sysplex environment consists of several System z10 CPCs, currently connected using
ICB4 links. Your company has decided to upgrade to the latest generation of zEnterprise
servers, and you are responsible for ensuring that the migration is carried out with no
application outages. In this case, you need to use several of the sample scenarios.
First, you need to add the InfiniBand adapters to your existing System z10 CPCs; see 4.4,
“Scenario 2” on page 76, “Migration to PSIFB with no POR (ICFs)”.
Next, you need to upgrade the z10s to z196s. There are a number of options for that
migration:
4.3, “Scenario 1” on page 68, “Disruptive CPC upgrade, PSIFB already in use (ICFs)”
4.5, “Scenario 3” on page 86, “Concurrent CPC and link upgrade” or
4.6, “Scenario 4” on page 95, “Concurrent PSIFB implementation (stand-alone CFs)”,
depending on your configuration.
Finally, you want to achieve the performance benefits of IFB3 mode, and this will involve
replacing your HCA2-O adapters with HCA3-O adapters. 4.7, “Scenario 5” on page 108,
“Concurrent migration from IFB to IFB3 mode” provides an example of how such a
migration can be completed without impacting application availability.
Although this multistep process might seem complex, it can help you progress through a
number of technology changes without impacting your application availability (depending on
which migration option you select). Alternatively, if you are able to plan for an application
outage, it might be possible to consolidate a number of changes into a single outage, thereby
reducing the total number of steps that are required.
4.1.1 Connectivity considerations
In the migration scenarios we sometimes connect one version of an HCA adapter (HCA2, for
example) to a different version (HCA3). The ability to mix and match adapters like this
depends on the CPC generations that are to be connected. For example, a z10 with an
HCA2-O fanout can be connected to a zEC12 with HCA3-O fanout; however, the link will run
in IFB mode rather than in IFB3 mode.
Table 4-2 lists which 12X InfiniBand HCA fanouts can be interconnected, along with specific
considerations.
Table 4-2 12X connectivity options

                                       z9          z10         zEC12/zBC12   zEC12/zBC12
Server (Link type) (a)                 (HCA1-O)    (HCA2-O)    z196/z114     z196/z114
                                                               (HCA2-O)      (HCA3-O) (b)
z9 (HCA1-O)                            NO          YES         YES           NO
z10 (HCA2-O)                           YES         YES         YES           YES
zEC12/zBC12 z196/z114 (HCA2-O)         YES         YES         YES           YES
zEC12/zBC12 z196/z114 (HCA3-O) (b)     NO          YES         YES           YES

a. Refer to 2.2, “InfiniBand fanouts” on page 19 and 2.5, “InfiniBand coupling links” on page 30 for
further detail regarding HCA fanouts.
b. HCA3-O fanouts support two modes, IFB and IFB3. The improved 12X IFB3 mode can only be
used if the link is implemented between two HCA3-O fanouts and a maximum of four CHPIDs
are defined to the HCA3-O ports at each end of the link.

Table 4-3 provides similar information for the 1X InfiniBand fanouts.

Table 4-3 1X connectivity options

                                          z9          z10            zEC12/zBC12       zEC12/zBC12
Server (Link type) (a)                    (N/A) (b)   (HCA2-O LR)    z196/z114         z196/z114
                                                                     (HCA2-O LR) (c)   (HCA3-O LR)
z9 (N/A) (b)                              N/A         N/A            N/A               N/A
z10 (HCA2-O LR)                           N/A         YES            YES               YES
zEC12/zBC12 z196/z114 (HCA2-O LR) (c)     N/A         YES            YES               YES
zEC12/zBC12 z196/z114 (HCA3-O LR)         N/A         YES            YES               YES

a. Subchannels and link buffers need to be considered during the migration phase; see
Appendix C, “Link buffers and subchannels” on page 247.
b. Long-reach HCA fanouts are not supported on System z9.
c. HCA2-O LR adapters are still available for z10, but they have been withdrawn from marketing
for z196 and z114. On z196 and z114, the replacement is HCA3-O LR. However, you still might
have HCA2-O LR HCAs on a z196 if they were carried forward during a CPC upgrade.
4.2 Introduction to the scenario notation
To make it easier to understand and compare the various scenarios, we present each one
using the same structure. After a brief description of the scenario, a diagram of the starting
configuration is presented. The target configuration of each scenario is also shown in a
diagram, typically at the end of the steps in the scenario. Depending on the complexity of the
scenario, intermediate configurations might be shown as well.
The descriptions refrain from using actual PCHID or CHPID numbers because they will be
different for each client. Instead, the relevant links are numbered and referred to accordingly,
where necessary.
Different kinds of coupling link types are used in each scenario. For the legacy links we might
have ISC3 links, ICB4 links, or a mixture of both link types. The links will be updated to the
most appropriate PSIFB link types; for example, an ISC3 link is replaced by a 1x PSIFB link
and an ICB4 link is replaced by a 12x PSIFB link. However, the handling of the various link
types is identical, so the link type used in each scenario can be replaced to suit your
configuration.
Table 4-4 lists the systems used in our examples. Note that the CPC name-to-CPC
relationship might change in a scenario where the current CPC is going to be replaced by
another one.
Table 4-4 CPCs used in migration scenarios

CPC                Model   CPC name (a)   LPAR types
System z10 EC      E26     CPC1           z/OS, CF
System z10 EC      E40     CPC2           z/OS, CF
System z9 EC       S18     CPC3           z/OS, CF
zEnterprise 196    M32     CPC4           z/OS, CF
zEnterprise 196    M32     CPC5           z/OS, CF
System z10 EC      E26     CF01           CF only
System z10 EC      E26     CF02           CF only

a. CPC name that is specified for the CPC ID in the IOCDS.
Note: To avoid making the scenarios overly complex, the number of links used in each
scenario (and shown in the diagrams) represents the minimum number of links that are
recommended to avoid any single point of failure.
If you use more than the minimum number of links, the scenario steps will still be the same,
except that the steps that deal with managing the links themselves will have to be repeated
based on the number of links actually in use.
The LPARs and their associated names from Table 4-5 are used in our examples. Depending
on the scenario, the LPARs might reside in the same or in different CPCs.
Table 4-5 LPARs used in migration scenarios

LPAR Type   Name
z/OS        ZOS1, ZOS2
CF          CF1, CF2, CF3, CF4
Disruptive versus concurrent migration scenarios
Whenever we describe a migration scenario as “concurrent”, we mean that all z/OS LPARs
on the involved CPCs (and therefore all core applications) must remain available throughout
the migration. In these scenarios we assume that it is acceptable to move coupling structures
between CF LPARs, and also to deactivate a CF LPAR, as long as at least one CF LPAR
remains connected to all systems in the sysplex.
Note that if your critical applications exploit sysplex data sharing and dynamic workload
routing, you have greater flexibility, because the availability of your critical applications is no
longer tied to the availability of particular z/OS LPARs. This means that you can use a
scenario that we describe as disruptive without impacting your application availability.
If all involved CPCs, and therefore all z/OS LPARs and all applications, need to be shut
down, the scenario is considered to be disruptive. Likewise, if the core applications are not
configured for sysplex-wide data sharing, the scenario needs to be considered disruptive.
The terms “disruptive” and “concurrent” can be interpreted differently from one enterprise to
another; there is no absolute definition, because a scenario that you consider disruptive
might be considered concurrent by someone else. Therefore, it is important that you define
your migration objectives as clearly as possible so that you can match them to one of our
migration scenarios.
4.3 Scenario 1
Disruptive CPC upgrade, PSIFB already in use (ICFs)
There are two servers installed at the start of this scenario: a z9 EC and a z10 EC. Both
CPCs already have InfiniBand fanouts installed and are connected with 12x PSIFB links. The
z9 EC is going to be upgraded to a z196 server. Two LPARs, ZOS1 and CF1, are in use and
reside in the z9 EC. Two more LPARs, ZOS2 and CF2, reside in the z10 EC. Figure 4-1 on
page 69 shows the configuration at the start of the project.
In this case, the installation has already planned for a complete outage to allow for an
upgrade to the computer room environment (for recabling and power maintenance), so they
are going to take the opportunity of the outage to upgrade the z9. Because both CPCs will be
down, this is considered to be a disruptive scenario where all LPARs on both CPCs are being
stopped. However, if the installation prefers, it is possible to complete this migration without
impacting the systems on the z10; that type of migration is described in 4.5, “Scenario 3” on
page 86.
Note: The scenarios in this document reflect the current best practice guidance for CF
outages as described in the white paper titled Best Practices: Upgrading a Coupling
Facility, which is available at the following website:
http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP101905
Tip: Regardless of which scenario applies to your situation, at the start of the migration it is
useful to clearly map which LPAR and CPC each coupling CHPID connects to, for both CF
and z/OS LPARs.
Also, if the migration will consist of a number of configuration changes, document the
expected configuration at the start of each step. This might take a little time, but it will prove
invaluable in helping to avoid human error (and potential outages) caused by configuring the
wrong CHPID online or offline.
Figure 4-1 Scenario 1: Starting configuration
In this scenario, the z9 system uses HCA1-O fanouts for the PSIFB coupling connections.
The z10 system has HCA2-O fanouts installed. The upgrade of the z9 is done as a “frame roll
MES”, which means that the actual frames of the z9 will be replaced with the new frames of
the z196, but the serial number will stay the same. Due to the nature of a frame roll MES, the
CPC that is being upgraded will always have to be shut down.
In a frame roll MES, several I/O cards or fanouts might be transferred to the new frame.
However, in this specific case, no InfiniBand fanouts are carried forward because the z196
does not support the HCA1-O fanouts. Therefore, new HCA fanouts are ordered. To be better
positioned for future upgrades, order new HCA3 fanouts for the z196 because they are not
only compatible with the HCA2 fanouts in the z10, but will also allow utilization of the
improved IFB3 mode when the client upgrades their z10 CPC to a z196 in the future.
Following are the steps in this migration:
1. Update hardware configuration definition (HCD) for the new z196 and create the z196
IOCDS.
The new IOCDS for the z196 is generated and saved to the z9. The IOCDS will be carried
forward during the upgrade to the z196 by the System Service Representative (SSR). This
is the most convenient way to install an IOCDS on an upgraded CPC.
Note: The process described here, and in subsequent scenarios, might seem intricate.
However, the objective is to shut down everything in an orderly manner, ensuring that the
subsequent restart will be clean and will complete in a timely manner.
In this first scenario, we include steps that are not strictly necessary if one or both CPCs
are going to be powered-down. However, we included the steps in an effort to have more
consistency between the various scenarios presented in this chapter.
2. Write the changed IOCDS for the z10 EC.
Depending on the z196 definitions in HCD, the IOCDS for the z10 EC might need to be
updated. If the LSYSTEM name of the z196 will be different than the LSYSTEM name of
the z9, and/or the CHPID numbers for the HCA3 adapters in the z196 will be different than
the corresponding numbers in the z9, then the z10 IOCDS will need to be updated.
If you use the same CHPIDs and CSSs on the new PSIFB links that you used before on
the coupling links, and the same LSYSTEM name, then no IOCDS change is required in
relation to the coupling links. For more information, see “LSYSTEM” on page 162.
3. The CFRM policy is updated.
Because the z9 EC is being upgraded to a z196, the CFRM policy needs to be updated.
The changes that you need to make depend on whether you plan to use the same name
for the z196 CF. The CPC serial number will remain the same because this is an upgrade
MES, but the CPC type is going to change and that needs to be reflected in the CFRM
policy as well.
The structure sizes have to be updated as well, because the new z196 will have a different
Coupling Facility Control Code (CFCC) level than the z9. To determine the correct sizes,
use either CFSizer or the Sizer tool available on the CFSizer site. An alternate process is
described in the document titled “Determining Structure Size Impact of CF Level
Changes”, available on the web at:
http://www.redbooks.ibm.com/abstracts/tips0144.html?Open
Save the updated policy with a new name, for example, “newpol”. This gives you the ability
to fall back to the previous CFRM policy if problems are encountered.
Note that the new policy will not actually be activated until after the upgrade is complete.
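As an illustration, the renamed policy can be defined with the IXCMIAPU administrative data
utility. The following sketch uses hypothetical serial number, partition, dump space, and
structure values; a real policy must contain definitions for every CF and structure in the
sysplex:
//DEFPOL   JOB ,'CFRM POLICY UPDATE'
//POLDEF   EXEC PGM=IXCMIAPU
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DATA TYPE(CFRM) REPORT(YES)
  DEFINE POLICY NAME(NEWPOL) REPLACE(YES)
    CF NAME(CF1)
      TYPE(002817)
      MFG(IBM)
      PLANT(02)
      SEQUENCE(000000012345)
      PARTITION(0E)
      CPCID(00)
      DUMPSPACE(2000)
    STRUCTURE NAME(ISGLOCK)
      INITSIZE(33792)
      SIZE(33792)
      PREFLIST(CF1,CF2)
/*
The TYPE value reflects the machine type of the replacement CPC (2817 for a z196), which is
one of the changes that this step requires.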
4. Quiesce all work on all z/OS LPARs in preparation for stopping z/OS and powering down
both CPCs.
5. Set the CF in the z9 EC into maintenance mode.
As the first step to emptying the CF1 CF on the z9 EC, set the CF into maintenance mode
by using the following z/OS command:
SETXCF START,MAINTMODE,CFNM=CF1
6. Move CF structures from the CF in the z9 EC to the CF in the z10 EC.
Reallocate the structures into the CF LPAR on the z10 EC using the following command:
SETXCF START,REALLOCATE
It is important that you empty the CF in the CPC that is being upgraded before you shut
down the sysplex. If you do not do this, information about the structures in the z9 CF will
be retained in the Coupling Facility Resource Management (CFRM) Couple Data Set and
cannot be removed.
7. Check that all structures have been moved.
Determine if any structures have remained in the CF by using this command:
D XCF,CF,CFNAME=CF1
Note: If you are running z/OS 1.12 or higher, verify that the reallocation will have the
desired effect by using this command first:
D XCF,REALLOCATE,TEST
Address any problem that is detected before you reallocate the structures.
If any structure did not move, check whether application-specific protocols are needed and
use these to move (or rebuild) the structure. Repeat this step until all structures have been
moved.
For more information about moving structures out of a CF, refer to the sections titled
“Removing structures from a coupling facility for shutdown” and “Removing a structure
from a coupling facility” in z/OS MVS Setting Up a Sysplex, SA22-7625.
8. Set the z9 EC CF logically offline to all z/OS LPARs.
Placing CF1 in MAINTMODE and verifying that the REALLOCATE command was
successful can ensure that no structures are left in CF1. However, to be sure that no active
structures remain in the CF, it is prudent to take the CF logically offline to all connected
z/OS systems before shutting it down.
This is achieved using the following command in each z/OS system:
V PATH(CF1,xx),OFFLINE (you will have to add the UNCOND option for the last CHPID)
This has the effect of stopping the issuing z/OS from being able to use the named CF. If
this z/OS is still connected to any structure in the CF, the command will fail.
Note that this command does not make the named CHPIDs go offline; it is simply access
to the CF that is removed. The CHPIDs will be taken offline later. If you issue a
D CF,CFNM=CF1 command at this time, you will see that each CHPID still shows as being
physically ONLINE, but logically OFFLINE.
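For example, if ZOS2 reaches CF1 over two CHPIDs (the CHPIDs A0 and A1 here are
hypothetical), the sequence on that system would be similar to the following, with UNCOND
required only for the last remaining path:
V PATH(CF1,A0),OFFLINE
V PATH(CF1,A1),OFFLINE,UNCOND
D CF,CFNM=CF1
The display at the end confirms that the paths to CF1 are now logically offline while still
physically online.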
9. Shut down CF1.
Next you need to deactivate the CF1 LPAR. If the CF is empty, use the SHUTDOWN command
from the CF1 console on the HMC. The advantage of using a SHUTDOWN command,
compared to deactivating the LPAR using the HMC DEACTIVATE function, is that the
SHUTDOWN command will fail if the CF is not empty. The end result of the SHUTDOWN is
effectively the same as the end result of performing a DEACTIVATE, so it is not necessary
to perform a DEACTIVATE of a CF LPAR that was successfully shut down using the
SHUTDOWN command.
If the CF still contains one or more structures, but the decision has been made that the
instance of the structure does not need to be moved or recovered, the SHUTDOWN command
will not complete successfully, so DEACTIVATE the CF LPAR instead.
10.Stop ZOS1 and deactivate the ZOS1 LPAR.
The z/OS LPAR on the z9 is now shut down (using V XCF,sysname,OFFLINE). When that
system has entered a wait state, the LPAR should be deactivated.
All LPARs on the z9 should now be in a state that is ready for the CPC to be powered
down.
Note: If you are running z/OS 1.12 or higher, review the reallocation report to determine
why a structure was not moved by using this command:
D XCF,REALLOCATE,REPORT
If you have a z/OS 1.13 or higher system in the sysplex, you might find that system
provides more helpful messages to explain why a structure is not being moved to the
CF you expect.
Note: The paths to the CF can be brought logically back online by using the
VARY PATH(cfname,chpid) command or by performing an IPL of the z/OS system.
11.STP-related preparation.
Note the current STP definitions and the STP roles (CTS, PTS, BTS, and Arbiter) before
any changes are made, so they can be reinstated later.
For more details refer to the Redbooks document Server Time Protocol Recovery Guide,
SG24-7380, and the white paper Important Considerations for STP and Planned
Disruptive Actions, available on the web at:
http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP102019
To better protect a sysplex against potential operational problems, STP was changed so
that it prevents the shutdown of a zEnterprise or System z10 CPC that has any STP role
assigned. This change is documented in the white paper Important Considerations for
STP server role assignments, available on the web at:
http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP101833
As a result of this change, you need to remove the STP server roles in the STP
configuration panel before the CPCs are shut down. Remove the Arbiter role first (not
applicable in this scenario) and then the BTS role1.
To implement the role changes, check the setting of the option called “Only allow the
server(s) specified above to be in the CTN”2 in the STP network configuration tab on the
HMC:
If this option is currently selected:
It must be cleared so that the BTS role can be removed from the z9 (because no
server role assignment changes can be made if this option is enabled).
Remove the BTS role from the z9.
If this option is not selected:
Remove the BTS role from the z9.
Note the original setting of this option because it is required in step 21.
When the z9 is upgraded to a z196, the STP definitions on that CPC will need to be
updated to specify the name of the CTN that the CPC is to join. If you remove the z9 from
the CTN prior to powering it down, you will be able to configure offline all coupling links
between the two CPCs because they are no longer being used by STP.
To remove the z9 from the CTN, log on to the z9 SE, go into the System (Sysplex) Time
option, select the STP Configuration tab, and blank out the Coordinated timing network ID
field.
Finally, select the “Only allow the server(s) specified above to be in the CTN” option on the
HMC so that after the POR, the CTN configuration will be remembered.
1 The PTS should always be the last CPC to be shut down, and the first CPC to be restarted, so ensure that the CPC
that is not being upgraded is the one that is the PTS. Therefore, prior to making any other STP-related changes,
make the CPC that is not being upgraded the PTS if that is not already the case; in this example, that is the z10 EC.
If you need to make changes to the STP configuration to achieve this, work those changes into the process
described here.
2 Enabling this option causes the CTN’s timing and configuration settings to be saved so that they will not need to be
re-entered after a loss of power or a planned POR of the servers.
Note: Remove the STP server roles only if the complete CTN will be shut down. If your
CTN contains other CPCs that will continue to work, the STP server roles need to be
reassigned to the still-active CPCs.
12.Configure offline all paths to the z9 EC CPC.
When all LPARs on the z9 EC are down, configure offline all coupling links to that CPC. To
do this, issue the following command for each CHPID going to CF1, for each z/OS LPAR
that is communicating with CF1:
CONFIG CHP(xx),OFFLINE
Because the CF1 LPAR is down at this point, it should not be necessary to use the
CONFIG CHP(xx),OFFLINE,UNCOND command for the last CHPID from each z/OS LPAR to
CF1.
Note that each z/OS LPAR might use different CHPIDs to connect to CF1, so ensure that
you configure the appropriate CHPIDs for each LPAR (in our example, there is only one
z/OS at this point, ZOS2).
After issuing all the CONFIG CHP commands, issue a D CF,CFNM=CF1 command from each
connected z/OS to ensure that all the CHPIDs to that CF are now both logically and
physically offline to that LPAR.
When the CONFIG OFFLINE command is issued for a z/OS CHPID, the z/OS end of the link
will be taken offline; however, the CF end will remain online. In this example, that is not an
issue because the CPC containing CF1 is going to be replaced.
If there are any CF-to-CF links between CF2 and CF1, issue the following command on
CF2 to take those CHPIDs offline:
CON xx OFF
Finally, use the z10 SE to check if there are any remaining coupling links online to the z9.
Toggle offline any such VCHIDs or PCHIDs at this point, using the HMC.
13.Shut down z/OS LPARs on z10.
The next step is to prepare the z10 for power-down to enable the computer room
maintenance.
Use your normal procedures to shut down the z/OS LPARs on the z10 (in our example,
ZOS2). After the z/OS systems have been stopped, deactivate those LPARs.
14.Deactivate the CF2 CF LPAR.
Prior to stopping the CF1 CF, any structures in that CF were moved to the CF2 CF. When
the z/OS LPARs are shut down, various structures will remain in the CF2 CF. For that
reason, the CF SHUTDOWN command will fail. Therefore, simply deactivate the CF2 LPAR.
15.Power down the CPCs.
Power down the z9 now.
Then power down the z10 EC.
16.The SSR starts the upgrade of the z9 EC.
17.The z10 is powered on and activated with the updated IOCDS (if an IOCDS update was
necessary); however, no LPARs are activated yet.
Note that if the IOCDS was updated as part of the z9 upgrade, then all CHPIDs will be
online at the end of the activation. However, if the IOCDS was not updated, then any
CHPIDs that were taken offline as part of the shutdown will be offline after the activation
completes.
18.After the SSR is finished with the z9 upgrade, the z196 is powered on and activated with
the new IOCDS3. Again, no LPARs should be activated yet.
3 If a CPC is activated with an IOCDS that was not previously used, all CHPIDs will be brought online as part of the
activation, regardless of the status of the CHPIDs before the CPC was powered down.
19.Ensure that there is an available timing link between the z10 and the z196.
Use the z10 SE to ensure there is at least one coupling CHPID online from the z10 to the
z196 so that STP can exchange time signals between the two CPCs.
20.Add CTN ID to the z196.
In this step and the next one, we make changes to the STP configuration, so disable the
“Only allow the server(s) specified above to be in the CTN” option on the z10 HMC before
proceeding.
Because the z196 is considered to be a new CPC by STP, you must log on to the z196
HMC, select the System (Sysplex) Time option, and enter the CTN ID on the STP
Configuration tab.
After the CTN ID of the z196 has been added, use the STP panels on the z10 and z196 to
ensure that the change was successful and that both CPCs are now in the CTN.
21.STP roles reassignment.
Define or reassign the STP roles that were removed or reassigned in step 11.
After all required STP changes have been made, return the “Only allow the server(s)
specified above to be in the CTN” option to the original setting as noted in step 11.
22.z10 CF LPAR is activated.
The sequence in which the LPARs are activated needs to be controlled.
Activate the CF2 LPAR in the z10. The CF1 CF is not activated yet because we want to be
able to control the allocation of structures in that CF. Even though CF1 was in
maintenance mode prior to the power-down, changing the CPC type from z9 to z196 will
cause that to be reset. We want to place it back in maintenance mode before the CF
connects to the systems in the sysplex.
23.Activate z/OS LPARs.
After CF2 has successfully initialized (and you have verified this using the CF console),
activate and load the z/OS LPARs.
24.Activate the new CFRM policy with the following command:
SETXCF START,POLICY,TYPE=CFRM,POLNAME=newpol
Note that this will make z/OS aware of the newly-upgraded CF. However, because the CF1
LPAR has not been activated yet, z/OS will not be able to place any structures in that CF.
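If you want to confirm which CFRM policy is now active before continuing, a display similar to
the following can be used:
D XCF,POLICY,TYPE=CFRM
The output identifies the name of the policy that is currently started for CFRM.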
25.Place CF1 back in maintenance mode.
Use the following command to place CF1 in maintenance mode:
SETXCF START,MAINTMODE,CFNM=CF1
26.Activate CF1 CF.
Now that CF1 has been placed in maintenance mode, it can be initialized without fear of
structures immediately being placed in it.
27.Bring paths to CF1 online.
If the z10 IOCDS was updated as part of the upgrade, all CHPIDs will be online now, so
you can skip to the next step.
If the z10 IOCDS was not updated as part of the upgrade, the coupling CHPIDs from the
z10 to CF1 will still be offline. To make CF1 accessible to the z10 z/OS system, you now
need to bring all those CHPIDs back online. Verify that none of the CHPIDs used to
connect to CF1 have changed as a result of the upgrade. Configure the CHPIDs online
using the following command:
CONFIG CHP(xx),ONLINE
After all the CONFIG commands have been issued, verify that all CHPIDs have the expected
status using the D CF,CFNM=CF1 command.
If CF-to-CF paths were in use prior to the upgrade, the CF2 end of those paths will need to
be brought online again. Do this using the following command from the CF2 console:
CON xx ON for each CF-to-CF CHPID.
Verify that the CHPIDs were successfully brought online by issuing the DISP CHP ALL
command on the CF console.
28.Make CF1 available for use again.
Before structures can be moved into CF1, you must take it out of maintenance mode. To
do this, issue the following command:
SETXCF STOP,MAINTMODE,CFNM=CF1
29.Move all the structures that normally reside in CF1 back into that CF.
If you are running z/OS 1.12 or later, issue the following command to ensure that all
structures will successfully move into the correct CF as defined by their PREFLIST:
D XCF,REALLOCATE,TEST
If any errors or warnings are reported, identify the reason and address them.
Complete the process of placing all structures in the correct CF by issuing the following
command:
SETXCF START,REALLOCATE
30.After all steps are completed, the final configuration is as shown in Figure 4-2.
Figure 4-2 Scenario 1: Final configuration
4.4 Scenario 2
Migration to PSIFB with no POR (ICFs)
This scenario describes how to migrate from pre-InfiniBand coupling links to InfiniBand
coupling links without requiring an outage of any of the LPARs that are using those links. This
scenario consists of a two-CPC configuration with one z/OS and one CF LPAR in each CPC.
There are two ICB4 links in place and they are to be replaced with PSIFB links.
The migration objective is to implement the new coupling link technology without any
interruption to our z/OS LPARs.
In this scenario, the InfiniBand technology is added to an existing sysplex configuration of
System z10 CPCs. This scenario can also be used as an interim step in migrating your
environment to the latest zEnterprise generation without taking an outage to your sysplex.
Because the zEnterprise generation does not support ICB4 coupling link technology,
implementing the InfiniBand technology on your System z10 CPCs prior to the upgrade to
zEnterprise might (depending on your sysplex configuration) be the only way to complete the
migration concurrently.
We describe two ways to achieve the objectives:
A concurrent migration, using two additional CF LPARs.
This is described in “Option 1 - Concurrent migration using two additional CF LPARs” on
page 77.
This option performs the migration by creating another CF LPAR on each of the two CPCs.
It requires updating the CFRM policy and moving your structures into the new CF LPARs.
This might be the preferred option if you already have the maximum number of links
between z/OS and your CFs, or if you want to have an interim step during the migration
where you have an opportunity to compare the performance of the different coupling link
technologies.
A concurrent migration, adding the new PSIFB links alongside the existing CF links.
This is described in “Option 2 - Concurrent migration by adding PSIFB links alongside the
existing CF links” on page 82.
This option accomplishes the migration by performing several dynamic activates on both
CPCs to add the new InfiniBand links, and subsequently to remove the legacy links. This
might be the preferred option if you have fewer than the maximum number of links (eight) to
your CFs, or if you do not want to temporarily use more resources for another set of CF
LPARs. In this case, you do not need to update the CFRM policies or move any structures.
Note that this option makes it more difficult to perform a performance comparison between
the two technologies.
Note: This example can be used for any type of coupling link. For instance, if you want to
migrate ISC3 links to InfiniBand links, the methodology is the same.
Which of the two methods is the best for you depends on your sysplex configuration and your
migration objectives. Regardless of which option you choose to perform this migration,
Figure 4-3 shows the starting configuration.
Figure 4-3 Scenario 2: Starting configuration
In this scenario, two z10 EC CPCs, CPC1 and CPC2, are installed. Both CPCs have ICB4
fanouts installed and are connected using those links. The new PSIFB fanouts will be
installed alongside the ICB4 fanouts. Then the connections will be concurrently moved to the
new PSIFB links. After the PSIFB environment has been validated, the ICB4 fanouts will be
removed from the CPCs. CPC1 contains two LPARs, ZOS1 and CF1, and CPC2 contains
ZOS2 and CF2.
Option 1 - Concurrent migration using two additional CF LPARs
The advantages of this option are listed here:
The maximum number of coupling CHPIDs between a z/OS LPAR and a CF is eight. If the
configuration already has eight links to the CF, then migrating to InfiniBand links by
defining new CF LPARs and moving the structures to those new CFs might be more
attractive than having to remove some of the current CF links to free CHPIDs that would
then be used for the new InfiniBand links.
For more information about using this methodology, see Appendix D, “Client experience”
on page 253.
This option provides the ability to compare the performance of the old and new
configuration, and to be able to quickly move back to the original configuration in case of
any problems.
Only two dynamic activates are required.
To achieve the objective of a concurrent migration, create two new CF LPARs called CF3 and
CF4. These are then connected to the z/OS LPARs using the new InfiniBand fanouts. When
the new CFs are up and available, the structures can be moved to them either together, or in
a controlled manner. At the end of the scenario the old CF LPARs can be removed.
Migration steps, using two additional CF LPARs:
1. Starting point.
The current configuration is shown in Figure 4-3 on page 77.
2. SSR starts the upgrade.
The SSR starts the concurrent MES to add the new 12x InfiniBand fanouts on both z10
ECs. When the fanouts have been installed, the required cabling is installed.
3. Update IOCDS on CPC1 and define the new LPAR profile.
The new IOCDS for CPC1 is generated and activated through the IOCDS dynamic
activate function to make the new CF3 LPAR and the 12x PSIFB coupling links available
for use, and to add the z/OS LPARs and the CF3 LPAR to the access list of those new
links. In addition, a new LPAR profile has to be created on CPC1 to allocate resources to
the new CF LPAR.
4. Update IOCDS on CPC2.
The new IOCDS for the CPC2 is generated and activated through the IOCDS dynamic
activate function to make the new CF4 LPAR and the 12x PSIFB coupling links available
for use, and to add the z/OS LPARs and the CF4 LPAR to the access list of those new
links. In addition, a new LPAR profile has to be created on CPC2 to allocate resources to
the new CF LPAR.
Note: In this example, we are not using System-Managed Structure Duplexing
(SM duplexing). If you do use this function, you need to consider the CF-to-CF link
requirements if you want to maintain the duplex relationship as you move the structures to
the new CFs.
For an example illustrating where SM duplexing is being used, see 4.6, “Scenario 4” on
page 95.
Note: This methodology requires that you have additional ICF engines and additional
memory available so that you can have all four CF LPARs up and running during the
migration.
ICF engines can be added concurrently to the LPARs if there are sufficient ICFs defined as
Reserved Processors in the LPAR profile. If you do not have unused ICFs for use by the
new CF LPARs, one option is to use Capacity on Demand (CoD) engines. Refer to
Redbooks document IBM System z10 Capacity on Demand, SG24-7504, and discuss your
options with your IBM sales representative.
Links that have been added to a CF LPAR using dynamic activate can be configured online
by the CF without any interruption to the CF activity.
Additional physical memory can be concurrently installed on the CPC ahead of time if
required. Note, however, that storage cannot be added to, or removed from, a CF LPAR
without a DEACTIVATE followed by a REACTIVATE.
5. Update CFRM policy to add new CF LPAR in CPC1.
To move the structures from the old CF LPAR CF1 to the new CF LPAR CF3, the new CF
is added to the CFRM policy and the structure preference lists are modified accordingly.
For example, the CFRM policy definition for a structure might currently look like this:
STRUCTURE NAME(ISGLOCK)
INITSIZE(33792)
SIZE(33792)
PREFLIST(CF1,CF2)
To make CF3 the logical replacement for CF1, update the statements to look like this:
STRUCTURE NAME(ISGLOCK)
INITSIZE(33792)
SIZE(33792)
PREFLIST(CF1,CF3,CF2)
Update the policy with a new name “newpol1” for fallback reasons. After the policy has
been updated, the new policy is activated with the following command:
SETXCF START,POLICY,TYPE=CFRM,POLNAME=newpol1
6. New CF LPAR activation including new links on CPC1.
The CF3 LPAR on CPC1 is now activated and all the defined PSIFB links to that CF
should be brought online to both z/OS LPARs. For more details see 7.3, “Coupling Facility
commands” on page 202.
At this point, verify that you have the desired connectivity for the new CF LPAR.
You can use the HMC (see 7.2, “z/OS commands for PSIFB links” on page 191) and the
following z/OS command to check the connectivity:
RO *ALL,D CF,CFNAME=CF3
7. Move structures to CF3 CF.
After the new PSIFB links are brought online and it is verified that the new CF has the
desired connectivity, you can move the structures. To prepare for emptying out the current
CF, CF1, it is set to maintenance mode using the following command:
SETXCF START,MAINTMODE,CFNM=CF1
With CF1 now in maintenance mode, the REALLOCATE command causes all the
structures to be moved from CF1 to CF3 (assuming that is how you set up your preference
lists):
SETXCF START,REALLOCATE
Note: The IOCDS for each CPC can be prepared before the MES is started. However,
only activate it after the SSR has finished the installation of the new PSIFB fanouts.
If the new IOCDS is activated before the hardware is installed, the IOCDS definitions
will point to hardware that does not yet exist in the CPCs.
Note: If you are running z/OS 1.12 or higher, use the following command to test the
reallocation first:
D XCF,REALLOCATE,TEST
Address any problem that is detected before you reallocate the structures.
Alternatively, you might want to move the structures in a more controlled manner (one by
one), which might be appropriate for more sensitive structures. In that case, you need to
use the following set of commands:
SETXCF START,MAINTMODE,CFNM=CF1
SETXCF START,REBUILD,STRNAME=”structure name”
8. Check that all structures have been moved.
Determine if any structures are still in CF1 by using this command:
D XCF,CF,CFNAME=CF1
If any structure did not move, check whether application-specific protocols are needed and
use these to move (or rebuild) the structure. Repeat this step until all structures have been
moved.
9. Middle step: test and decide if you will proceed with the migration.
Now you are at the middle step of the migration. Figure 4-4 shows that both the old and the
new CF LPARs are defined and connected, and the type of connection to each CF. At this
point, both ICB4 and 12x InfiniBand links are available.
If you want to compare the performance of the different link types, this is the ideal
opportunity to do that because each system has access to both ICB4-connected and
HCA2-connected CFs, and structures can be moved between the CFs to measure and
compare the response times.
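For example, after taking CF1 out of maintenance mode again (SETXCF
STOP,MAINTMODE,CFNM=CF1), an individual structure can be moved back and forth
between the ICB4-connected and InfiniBand-connected CFs and its service times compared
using RMF. The structure name below is a placeholder:
SETXCF START,REBUILD,STRNAME=strname,LOC=OTHER
D XCF,STR,STRNAME=strname
The display confirms which CF the structure is allocated in after each move.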
Figure 4-4 Scenario 2: Middle step of the adding additional CF LPARs scenario
Note: If you are running z/OS 1.12 or higher, review the reallocation report to more
easily determine why a structure was not moved by using this command:
D XCF,REALLOCATE,REPORT
10.New CFRM policy definition and activation for CPC2.
Assuming that everything goes well with the migration to CF3, the next step is to replace
CF2 with CF4. This requires that the CFRM policy is updated again and activated. To allow
for fallback to the previous policy, the updated policy is given a new name, “newpol2”. The
preference list is also updated with the new CF4 information as was done earlier to add
CF3 to the PREFLISTs.
After the updates have been made, the new CFRM policy is activated using the following
command:
SETXCF START,POLICY,TYPE=CFRM,POLNAME=newpol2
11.New CF LPAR activation including new links on CPC2.
The CF4 LPAR on CPC2 is now activated and all the defined PSIFB links are configured
online. For more details see 7.3, “Coupling Facility commands” on page 202.
At this point, verify that you have the desired connectivity for the new CF LPAR. You can
use the HMC (see 7.2, “z/OS commands for PSIFB links” on page 191) and the following
z/OS commands to check the connectivity:
RO *ALL,D CF,CFNAME=CF4
RO *ALL,D XCF,CF,CFNAME=CF4
12.Structure move to CF4.
To move the contents of CF2 to the new CF (CF4), the current CF (CF2) is first set to
maintenance mode using the following command:
SETXCF START,MAINTMODE,CFNM=CF2
With CF2 in maintenance mode, the structures can now be moved using the following
command:
SETXCF START,REALLOCATE
If you want to move the structures in a more controlled way, you can move them one by
one. This might be useful for more performance-sensitive structures. In this case, use the
following set of commands:
SETXCF START,MAINTMODE,CFNM=CF2
SETXCF START,REBUILD,STRNAME=”structure name”
13.Check that all structures have been moved.
Determine if any structures have remained in the old CF by using this command:
D XCF,CF,CFNAME=CF2
If any structure did not move, check whether application-specific protocols are needed and
use these to move (or rebuild) the structure. Repeat this step until all structures have been
moved and CF1 and CF2 are both empty.
14.Clean up the configuration.
At this point, the ICB4 links and the old CF LPARs CF1 and CF2 are no longer in use and
can be removed. The first step is to remove those CFs from the CFRM policy. Then
configure the ICB4 links offline and deactivate the old CF LPARs (if not already done).
Note: If you are running z/OS 1.12 or higher, review the reallocation report to determine
why a structure was not moved by using this command:
D XCF,REALLOCATE,REPORT
Finally, generate a new IOCDS for each CPC that does not contain those resources
(assuming that they are no longer needed). The final configuration is shown in Figure 4-6
on page 86.
Option 2 - Concurrent migration by adding PSIFB links alongside the
existing CF links
The benefits of this option (compared to adding a second set of CF LPARs) are listed here:
If System Managed Duplexing is being used, the migration can be completed without the
complexity of having to add additional CF-to-CF links.
No interruption to any CF activity is required.
No CF policy or preference list changes are required.
Alternatively, this methodology has possible disadvantages compared to the first option:
More IOCDS dynamic activations are needed.
This method is more difficult to pursue if the configuration already has eight CHPIDs
between each z/OS and the CF.
It is more difficult to compare the performance between the old configuration and the new
configuration.
To achieve the objective of a concurrent migration, this option uses several dynamic activates
to introduce the new PSIFB links and remove the legacy CF links in several small steps. The
number of dynamic activates required depends mainly on the number of links that are
currently in use.
If there are, for example, six ICB4 links in use, and a minimum of four are needed to handle
the workload (assuming the maintenance window is set for a low utilization time), you can add
two PSIFB links (which will bring you to the maximum of eight links to the CF) with the first
IOCDS change.
In the second step, you remove four ICB4 links, leaving you with four active links (two PSIFB
and two ICB4). In the third step, you add four more PSIFB CHPIDs to the PSIFB links that
were added in the first step (so there are two physical links, each with three CHPIDs). In the
last step, you then remove the last two ICB4 links.
In the scenario described here, to avoid making it overly complex, we use a configuration with
only two links of each type. Therefore, the scenario might need to be adjusted to fit your
actual configuration by repeating several of the steps as many times as necessary.
The steps for the documented scenario are the following: first, we dynamically add two PSIFB
links between the z/OS LPARs (ZOS1 and ZOS2) and the CF LPARs CF1 and CF2. There will
be a mix of two ICB4 and two PSIFB links in the middle step of the migration. In the second
step, we remove the two ICB4 links from the configuration.
By using this method, you need to perform several dynamic IOCDS changes. However, there
will be no interruption to the CF activity (although we still advise migrating the coupling links in
a period of low CF activity), and, if you use duplexing for any structures, duplexing does not
have to be stopped.
The downside of this scenario is that it is not as easy to measure the performance of each link
type separately in the middle step when both link types are in use. Therefore, we suggest that
you carry out any desired performance measurements in advance in a test environment.
You can also try using the VARY PATH command to remove one or other set of links from use
while you perform measurements, but you need to be careful that such activity does not leave
you with a single point of failure.
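As a sketch, assuming that CHPIDs B0 and B1 are the ICB4 paths from a z/OS system to
CF2 (the CHPID values are hypothetical), the ICB4 paths could be varied out of use for a
measurement interval:
V PATH(CF2,B0),OFFLINE
V PATH(CF2,B1),OFFLINE
After the measurement completes, bring the paths back into use:
V PATH(CF2,B0),ONLINE
V PATH(CF2,B1),ONLINE
Because the PSIFB paths to CF2 remain online throughout, the UNCOND option is not
needed.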
Concurrent migration by adding PSIFB links alongside existing CF links
1. Starting point.
Our sample configuration is shown in Figure 4-3 on page 77.
2. SSR starts MES.
The SSR starts the MES to add the new HCA2-O 12x InfiniBand fanouts on both z10 ECs.
Cabling is also done between both CPCs to connect these fanouts.
3. IOCDS change on CPC1 and CPC2.
The new IOCDSs for CPC1 and CPC2 are generated and dynamically activated. They
include the definitions for the two new PSIFB links.
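The dynamic activation itself is typically driven from the HCD panels. As a sketch, it can also
be performed with the z/OS ACTIVATE command (the IODF suffix shown is hypothetical);
validate the change first, then activate it, and confirm the result:
ACTIVATE IODF=A1,TEST
ACTIVATE IODF=A1
D IOS,CONFIG
Whichever method is used, the new IOCDS also needs to be written to the CPC so that the
change persists across a later POR.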
4. Configure the new PSIFB links on the CF LPARs online.
The new PSIFB links are configured online on CF LPARs CF1 and CF2. Make sure that
you configure online all links, namely, those that are connecting the z/OS LPARs to the CF
LPARs, and any links that might be needed between the CF LPARs.
This is done with the following CFCC command (where xx stands for the CHPID) on the
CF console:
CON xx ONLINE
The status of the links can be verified with the CFCC command:
DISPLAY CHP ALL
See Chapter 7, “Operations” on page 189 for more details.
5. Configure the new PSIFB links online on the z/OS LPARs.
Configuring the CF end of the links online does not necessarily cause the z/OS end of the
links to come online. To ensure the z/OS CHPIDs come online, issue the following
command for each CHPID on each system that is connected to the CF:
CF CHP(xx),ONLINE
See Chapter 7, “Operations” on page 189 for more details.
6. Verify that the new PSIFB links are online.
Check the link status between the z/OS and CF LPARs and between the two CF LPARs
with the following z/OS commands:
RO *ALL,D CF,CFNM=CF1
RO *ALL,D CF,CFNM=CF2
Make sure that you check the response from each system to ensure that all expected links
are online. Also check the “REMOTELY CONNECTED COUPLING FACILITIES” section of the
output to ensure that all the expected CF-to-CF links are online. For more detailed
information, see 7.2, “z/OS commands for PSIFB links” on page 191.
Note: The IOCDS for each CPC can be prepared ahead of the MES, but only activate it
after the SSR finishes the installation of the new PSIFB fanouts. Otherwise, the IOCDS
definitions will point to hardware that does not yet exist in the CPCs.
Note: It is also possible to use the “Configure Channel Path On/Off” function on the
HMC or SE to toggle the channels online.
7. Middle step: decide if you will go ahead.
Now you are at the middle step of our scenario.
Figure 4-5 shows the configuration at this point. You are now using the legacy links and
the new PSIFB links between CPC1 and CPC2.
Figure 4-5 Scenario 2: Middle step of the dynamic activation implementation
8. Configure the legacy ICB4 links on the z/OS LPARs offline.
Assuming that everything is working as expected, the next step is to remove the ICB4
links. The legacy ICB4 links are configured offline on the ZOS1 and ZOS2 LPARs. This is
done with the following z/OS command (where xx stands for the CHPID):
CF CHP(xx),OFFLINE
This command must be issued on each z/OS system, and for each ICB4 link.
See Chapter 7, “Operations” on page 189 for more details.
Note: As mentioned, it is difficult to compare the performance between the old and new
configuration while we are using both coupling link technologies together. Perform
testing in a separate test environment with production levels of loads before the start of
the production migration to determine whether the new PSIFB links are capable of
taking over the complete workload from the ICB4 links with acceptable response times.
Note: It is also possible to use the “Configure Channel Path On/Off” function on the
HMC or SE to toggle the CHPIDs offline.