User Manual: GC28-0725-2, OS/VS2 System Programming Library: MVS Diagnostic Techniques, Release 3.7 (September 1978)
Page Count: 564
GC28-0725-2
File No. S370-37

Systems

OS/VS2 System
Programming Library:
MVS Diagnostic Techniques
Release 3.7

Includes Selectable Units:
Scheduler Improvements
Supervisor Performance #1
Supervisor Performance #2
Service Data Improvements
JES2 Release 4.1
3838 Vector Processing Subsystem Support
Dumping Improvements
Attached Processor System for Models 158/168
Hardware Recovery Enhancements
Interactive Problem Control System (IPCS)


VS2.03.804
VS2.03.805
VS2.03.807
VS2.03.817
5752-825
5752-829
5752-833
5752-847
5752-855
5752-857


Third Edition (September, 1978)
This is a major revision of, and obsoletes, GC28-0725-1, incorporating changes released in the
following System Library Supplement:
Interactive Problem Control System (IPCS)   5752-857   GD23-0095-0 (dated March 31, 1978)

See the Summary of Amendments following the Contents for a summary of the changes that
have been made to this manual. A vertical line to the left of the text or illustration indicates
a technical change made in this edition; revision bars are not used, however, to indicate changes
made in previous editions, technical newsletters, or supplements.
This edition applies to release 3.7 of OS/VS2 and to all subsequent releases of OS/VS2 until
otherwise indicated in new editions or Technical Newsletters. Changes are continually made
to the information herein; before using this publication in connection with the operation of
IBM systems, consult the latest IBM System/370 Bibliography, GC20-0001, for the editions
that are applicable and current.
Publications are not stocked at the address given below; requests for IBM publications
should be made to your IBM representative or to the IBM branch office serving your
locality.
A form for reader's comments is provided at the back of this publication. If the form has
been removed, comments may be addressed to IBM Corporation, Publications Development,
Department D58, Building 706-2, PO Box 390, Poughkeepsie, NY 12602. IBM may use
or distribute any of the information you supply in any way it believes appropriate without
incurring any obligation whatever. You may, of course, continue to use the information
you supply.
© Copyright International Business Machines Corporation 1976, 1977, 1978

Guide for Using This Publication

The following is a list of the requirements for using this publication.
• This publication contains information for the following Selectable Units:
Scheduler Improvements - SU4
Supervisor Performance #1 - SU5
Supervisor Performance #2 - SU7
Service Data Improvements - SU17
JES2 Release 4.1 - SU25
3838 Vector Processing Subsystem Support - SU29
Dumping Improvements - SU33
Attached Processor System for Models 158/168 - SU47
Hardware Recovery Enhancements - SU55
Interactive Problem Control System (IPCS) - SU57
• To use this publication, you must have installed at least SUs 4, 5, 7, 17, 25
(if you are a JES2 user), 33, and 55.
• The implied date of this publication, for the purpose of adding new
supplements/TNLs, is September 30, 1978. Always use the page with the latest
date (shown at the top of the page) when adding pages from different supplements/TNLs.


Preface

This publication describes diagnostic techniques and guidelines for isolating
problems on MVS systems. It is intended for the use of system programmers and
analysts who understand MVS internal logic and who are involved in resolving MVS
system problems.
This publication is intended for use only in debugging. None of the information
contained herein should be construed as defining a programming interface.

Organization and Contents
This publication stresses a three-step debugging approach:

1. Identifying the external symptom of the problem.
2. Gathering relevant data from system data areas in order to isolate the problem
to the component level.

3. Analyzing the component to determine the cause of the problem.
In support of this approach, the publication has been reorganized into three
basic parts consisting of five sections and three appendixes as follows:
Part 1

Section 1. General Introduction -- Describes the debugging approach that is used
and defines the external symptoms that are used to identify a system problem.
Section 2. Important Considerations Unique to MVS -- Describes concepts and
functions that should be understood prior to undertaking system diagnosis.
Included are: global system analysis, system execution modes and status saving,
locking, use of recovery work areas, effects of MP, trace analysis, debugging hints,
and general data gathering techniques.
Section 3. Diagnostic Materials Approach -- Provides guidelines for obtaining and
analyzing storage dumps of data areas affected by the problem.


Part 2
Section 4. Symptom Analysis Approach -- Describes how to identify an external
symptom (loop, wait state, TP problem, performance degradation, or incorrect
output), and provides an analysis procedure for determining what kind of problem
is causing the symptom.
Section 5. Component Analysis -- Describes the operating characteristics and
recovery procedures of selected system components and provides debugging
techniques for determining the cause of a problem that has been isolated to a
particular component.

Part 3
Appendixes
Appendix A -- Describes the flow of various MVS processes.
Appendix B -- Provides a step-by-step approach to analyzing a stand-alone dump.
Appendix C -- Contains definitions of abbreviations used throughout the publication.


Referenced Publications
The following publications either are referenced in this publication or provide
related reading:

GA22-7000   System/370 Principles of Operation
GA27-3093   Synchronous Data Link Control General Information
GC34-2006   OS/VS2 MVS Interactive Problem Control System (IPCS) User's Guide and Reference
GC28-0772   OS/VS Environmental Recording Editing and Printing (EREP) Program
OS/VS2 System Programming Library:
GC28-0681   Initialization and Tuning Guide
GC28-0628   Supervisor
GC28-0627   Job Management
GC28-0674   Service Aids
GC28-0677   SYS1.LOGREC Error Recording
GC28-0751 and GC28-0752   Debugging Handbook (2 volumes)
GC28-0703   JES3 Debugging Guide
GC30-2051   OS/VS2 TCAM System Programmer's Guide, TCAM Level 10
GC30-3040   OS/VS TCAM Debugging Guide, TCAM Level 10
GC27-0023   OS/VS2 MVS VTAM Debugging Guide
Operator's Library:
GC38-0229   OS/VS2 MVS System Commands
GC23-0007   OS/VS2 MVS JES2 Commands
GC23-0008   OS/VS2 MVS JES3 Commands
GC27-6997   VTAM Network Operating Procedures
GC30-3037   OS/VS TCAM Level 10
OS/VS Message Library:
GC38-1002   VS2 System Messages
GC38-1008   VS2 System Codes
GY30-3012   3704/3705 Program Reference Handbook
SY26-3823   OS/VS2 I/O Supervisor Logic
SY28-0623   OS/VS2 System Initialization Logic
SY26-3825   OS/VS2 VSAM Logic
SY26-3826   OS/VS2 Catalog Management Logic
SY27-7267   OS/VS2 VTAM Data Areas
SY35-0010   OS/VS2 Access Method Services Logic
SY28-0621   OS/VS2 VTAM Logic
SY28-0713 through SY28-0719   OS/VS2 System Logic Library (7 volumes)
SY35-0011   OS/VS2 CVOL Processor Logic
SY24-6000   OS/VS2 MVS JES2 Logic
SY26-3834   OS/VS2 VIO Logic
SY28-0612   OS/VS2 MVS JES3 Logic
SY30-3032   OS/VS2 TCAM Level 10 Logic
SY30-3013   IBM 3704 and 3705 Communications Controllers NCP/VS Logic
SYB8-0606   OS/VS2 Data Areas (microfiche)
GC30-3004   3704/3705 Communications Controllers Principles of Operation
GC30-3008   IBM 3704/3705 Communications Controllers Emulation Program Generation and Utilities Guide and Reference Manual
GC30-3007   IBM 3704/3705 Communications Controllers NCP/VS Generation and Utilities Guide and Reference Manual

Contents

Section 1. General Introduction   1.1.1
  Basic MVS Problem Analysis Techniques   1.1.1
  IPCS - Interactive Problem Control System   1.1.4
Section 2. Important Considerations Unique to MVS   2.1.1
  Global System Analysis   2.1.3
    Global Indicators that Determine the Current System State   2.1.3
    Work Queues, TCBs and Address Space Analysis   2.1.6
      TCB Summary   2.1.6
      SRB Dispatching Queues   2.1.7
      Address Space Analysis   2.1.7
      Task Analysis   2.1.8
      Summary   2.1.10
  System Execution Modes and Status Saving   2.2.1
    System Execution Modes   2.2.1
      Task Mode   2.2.1
      SRB Mode   2.2.2
      Physically Disabled Mode   2.2.2
      Locked Mode   2.2.3
    Determining Execution Mode From a Stand-alone Dump   2.2.4
    Locating Status Information in a Storage Dump   2.2.5
      Task/SRB Mode Interruptions   2.2.5
        Locally Locked Task Suspension   2.2.6
      SRB Suspension   2.2.7
  Locking   2.3.1
    Classes of Locks   2.3.1
    Types of Locks   2.3.2
    Locking Hierarchy   2.3.3
    Determining Which Locks Are Held On a Processor   2.3.4
    Content of Lockwords   2.3.5
    How to Find Lockwords   2.3.5
    Results of Requests for Unavailable Locks   2.3.7
  Use of Recovery Work Areas for Problem Analysis   2.4.1
    SYS1.LOGREC Analysis   2.4.2
      Listing the SYS1.LOGREC Data Set   2.4.2
      SYS1.LOGREC Records   2.4.3
      Important Considerations About SYS1.LOGREC Records   2.4.13
    SYS1.LOGREC Recording Control Buffer   2.4.14
      Formatting the LOGREC Buffer   2.4.15
      Finding the LOGREC Recording Control Buffer   2.4.15
      Format of the LOGREC Recording Control Buffer   2.4.15
    FRR Stacks   2.4.17
    Extended Error Descriptor (EED)   2.4.19
    RTM2 Work Area (RTM2WA)   2.4.19
      Formatted RTM Control Blocks   2.4.19
    System Diagnostic Work Area (SDWA) Use in RTM2   2.4.20
  Effects of Multi-Processing On Problem Analysis   2.5.1
    Features of an MP Environment   2.5.1
    MP Dump Analysis   2.5.2
      Data Areas Associated With the MP Environment   2.5.3
      Parallelism   2.5.4
      General Hints for MP Dump Analysis   2.5.6
    Inter-Processor Communication   2.5.7
      Direct Services   2.5.8
      Remote Pendable Services   2.5.9
      Remote Immediate Services   2.5.10
    MP Debugging Hints   2.5.16


  MVS Trace Analysis   2.6.1
    Trace Entries   2.6.1
    Trace Examples   2.6.3
    Notes for Traces   2.6.5
    Tracing Procedure   2.6.5
    Cautionary Notes   2.6.7
  Miscellaneous Debugging Hints   2.7.1
    Alternate CPU Recovery (ACR) Problem Analysis   2.7.1
    Pattern Recognition   2.7.3
      Low Storage Overlays   2.7.4
      Common Bad Addresses   2.7.5
    OPEN/CLOSE/EOV ABENDs   2.7.5
    Debugging Machine Checks   2.7.6
    Debugging Problem Program Abend Dumps   2.7.11
    Debugging from Summary SVC Dumps   2.7.14
      SUMDUMP Output for SVC-Entry SDUMP   2.7.14
      SUMDUMP Output for Branch-Entry SDUMP   2.7.16
    Started Task Control ABEND and Reason Codes   2.7.18
    SWA Manager Reason Codes   2.7.19
  Additional Data Gathering Techniques   2.8.1
    Using the CHNGDUMP, DISPLAY DUMP, and DUMP Commands   2.8.2
    How to Print Dumps   2.8.2
    How to Automatically Establish System Options for SVC Dump   2.8.5
    How to Copy PRDMP Tapes   2.8.5
    How to Rebuild SYS1.UADS   2.8.6
    How to Print SYS1.DUMPxx   2.8.7
    How to Clear SYS1.DUMPxx Without Printing   2.8.7
    How to Print the SYS1.COMWRITE Data Set   2.8.8
    How to Print an LMOD Map of a Module   2.8.8
    How to Re-create SYS1.STGINDEX   2.8.9
    Software LOGREC Recording   2.8.9
    Using the PSA as a Patch Area   2.8.10
    Using the SLIP Command   2.8.10
      Designing an Effective SLIP Trap   2.8.12
    Enabling the PER Hardware to Monitor Storage Locations   2.8.15
    System Stop Routine   2.8.17
    Using the MVS Trace to Monitor Storage   2.8.18
    How To Expand the Trace Table   2.8.18

Section 3. Diagnostic Materials Approach   3.1.1
  Standalone Dumps   3.1.3
  SVC Dumps   3.1.5
    How to Change the Contents of an SVC Dump Issued by an Individual
      Recovery Routine   3.1.6
    SDUMP Parameter List   3.1.7
  SYSABENDs, SYSMDUMPs, and SYSUDUMPs   3.1.9
    Software-Detected Errors   3.1.9
    Hardware-Detected Errors   3.1.10

Section 4. Symptom Analysis Approach   4.1.1
  Waits   4.1.3
    Characteristics of Enabled Waits   4.1.3
    Characteristics of Disabled Waits   4.1.4
    Analysis Approach for Disabled Waits   4.1.5
    Analysis Approach for Enabled Waits   4.1.7
      Stage 1: Preliminary Global System Analysis   4.1.8
      Stage 2: Key Subsystem Analysis   4.1.10
      Stage 3: System Analysis   4.1.15
  Loops   4.2.1
    Common Loop Situations   4.2.1
    Analysis Procedure   4.2.2

  TP Problems   4.3.1
    Message Flow Through the System   4.3.1
    Types of Traces   4.3.3
      EP Mode Traces   4.3.4
      NCP Mode Traces   4.3.5
    Trace Output Under Normal Conditions   4.3.7
      Example 1: VTAM I/O Trace   4.3.7
      Example 2: VTAM and GTF Traces   4.3.12
      Notes on Examples 1 and 2   4.3.27
      Summary   4.3.28
      VTAM Buffer Trace Modification   4.3.29
        VTAM I/O Trace (RNIO) Modification   4.3.29
        Other Tracing Methods   4.3.30
  Performance Degradation   4.4.1
    Operator Commands   4.4.1
    Dump Analysis Areas   4.4.2
  Incorrect Output   4.5.1
    Initial Analysis Steps   4.5.1
    Isolating the Component   4.5.1
    Analyzing System Functions   4.5.2
    Summary   4.5.3

Section 5. Component Analysis   5.1.1
  Dispatcher   5.1.3
    Important Dispatcher Entry Points   5.1.3
    Dispatchable Units and Sequencing of Dispatching   5.1.4
    Dispatchability Tests   5.1.10
    Miscellaneous Notes About the Dispatcher   5.1.12
    Dispatcher Recovery Considerations   5.1.13
    Dispatcher Error Conditions   5.1.14
  IOS   5.2.1
    Front-End Processing   5.2.1
    Back-End Processing   5.2.1
    IOS Problem Analysis   5.2.1
      IOS Abend Codes   5.2.4
      Loops   5.2.4
      IOS Wait States   5.2.5
    General Hints for IOS Problem Analysis   5.2.6
    Error Recovery Procedures (ERPs)   5.2.8
      IOS and ERP Processing   5.2.8
      Identifying ERP Module Names   5.2.9
      How ERP Transfers Control   5.2.9
      Abnormal End Appendages   5.2.10
      Retry/Restart the Channel Program   5.2.11
      Error Interpreter   5.2.11
      ERP Messages and Logging   5.2.12
      Intercept Conditions   5.2.13
      Unit Check on Sense Command   5.2.13
      Compound Errors   5.2.13
      Diagnostic Approach   5.2.14
  Program Manager   5.3.1
    Functional Description   5.3.1
      Program Manager Organization   5.3.1
      Program Manager Control Blocks   5.3.1
      Program Manager Queues   5.3.2
      Queue Validation   5.3.4
      System Initialization   5.3.5


    Basic Functional Flow   5.3.5
      LINK   5.3.5
      ATTACH   5.3.8
      XCTL   5.3.8
      LOAD   5.3.11
      DELETE   5.3.11
      Exit Resource Manager   5.3.11
      SYNCH   5.3.12
      IDENTIFY   5.3.12
      Abend Resource Manager   5.3.13
      806 Abend   5.3.14
      APF Authorization   5.3.14
      Module Subpools   5.3.19
    Fetch/Program Manager Work Area (FETWK)   5.3.19
    RB Extended Save Area (RBEXSAVE)   5.3.20
  VSM   5.4.1
    Address Space Initialization   5.4.3
    Step Initialization/Termination   5.4.5
    Virtual Storage Allocation   5.4.6
    GETMAIN's Functional Recovery Routine   5.4.8
    VSM Cell Pool Management   5.4.10
    Miscellaneous Debugging Hints   5.4.10
  Real Storage Manager (RSM)   5.5.1
    Major RSM Control Blocks   5.5.1
      PCB   5.5.3
      SPCT   5.5.5
      PFTE   5.5.6
    Page Stealing   5.5.6
      Reclaim   5.5.8
      Relate   5.5.8
    RSM Recovery   5.5.9
    RSM Debugging Tips   5.5.12
    Converting a Virtual Address to a Real Address   5.5.13
      Example: Converting a Virtual Address to a Real Address   5.5.15
  Auxiliary Storage Manager (ASM)   5.6.1
    Component Functional Flow   5.6.2
      Saving an LG   5.6.2
      Requesting I/O   5.6.3
      Requesting Swap I/O   5.6.4
    Component Operating Characteristics   5.6.4
      System Mode   5.6.4
      Address Space, Task, and SRB Structure   5.6.6
      Storage Considerations   5.6.6
      MP Considerations   5.6.6
      Interfaces With Other Components   5.6.7
      Register Conventions   5.6.7
      Footprints and Traces   5.6.7
    General Debugging Approach   5.6.8
      Paging Interlocks   5.6.8
      Incorrect Pages   5.6.9
        Finding the LSID for a Given Page   5.6.10
        Finding LSIDs of VIO Data Sets   5.6.10
        Locate PART and PAT Bit   5.6.12
        Converting a Slot Number to a Full Seek Address   5.6.14
      Unusable Paging Data Sets   5.6.15
      Page/Swap Data Set Errors   5.6.17
      Error Analysis Suggestions   5.6.18
      Validity Checking   5.6.19

ASM Serialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
SALLOC Lock . . . . . . . . . . . • . • . . . . . . . . . . . . . . . . . . . .
ASM Class Locks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..
Local Lock of Current Address Space . . . . . . . . . . . . . . . . . . . . .
Compare and Swap (CS) Serialization . . . . . . . . . . . . . . . . . . . ..
Serialization via Control Block Queues. . . . . . . . . . . . . . . . . . . . .
Recovery Considerations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Recovery Traces. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..
Recovery Structure. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..
Recovery as a Debugging Tool. . . . . . . . . . . . . . . . . . . . . . . . . . ..
Recovery Footprints. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..
FRR/ESTAE Work Areas . . . . . . . . . . . . . . . . . . . . . . . . . . ..
SDWA Variable Reco"rding Area. . . . . . . . . . . . . . . • . . . . . . . ..
ASM Diagnostic Aids. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..
COD ABEND Meanings for ASM . . . . . . . . . . . . . . . . . . . . . . . . . .
ASM Recovery Control Blocks . . . . . . . . . . . . . . . . . . . . . . . . . ..
ASM Tracking Area (ATA) .. .' . . . . . . . . . . . . . . . . . . . . . . . .
Recovery Audit Trail Path (EPATH) . . . . . • . . . . . . . . . . . . . . . .
Additional ASM Data Areas. . . . . . . . . . . . . . . . . . . . . . . . . . . . .
BSHEADER. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..
BUFCONBK . . . . . . . . . • • • . . . • . • . . . . . . . . . . . . . . . . . .
DSNLIST . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..
·MSGBUFER . . . . . . . . . . . . . . • . . . . . . . . . . . . . . . . . . . . .
System Resources Manager (SRM) . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
SRM Objectives. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Address Space States . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
SRM Indicators. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..
System Indicators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . •
Individual User Indicators. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Other Indicators. . . . . . . . . . . . . • . . . . . . . . . . . . . . . . . . . . . .
SRM Error Recovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Module Entry Point Summaries . . . . . . . . . . • . . . . . • . . . . • . . . . . . .
IRARMINT - SRM Interface Routine • . . . . . . • . . . . . . . . . . . . . .
IRARMEVT - SRM SYSEVENT Router . • . . . . . . . . . . . . . . . • . . .
IRARMSTM - Storage Management Routine. . . . . . . . . . . . . . . . . . .
IRARMSRV - SRM Service Routine . . . . . . . . . . . . . . . . . . . . . . • .
IRARMERR - SRM's Functional Recovery Routine . . . . . . . . . . . . . . .
IRARMCPM - Processor Management . . . . . . . . . . . . . . . . . . . . . .
IRARMIOM - I/O Management . . . • . • • . . . • . . . . . . . . . . . . . • •
IRARMRMR - Resource Manager . . . . . . . . . . . . . . . . . . . . . . . .
IRARMCTL - SRM Control Algorithms • . . . . . . . . . . . • . . . . . . . .
IRARMWAR - Workload Activity Recording . . . . . . . . . . . . . . . . . . .
IRARMWLM - SRM Workload Manager . . . . . . . . . . . . . . . . . . . . . .
VTAM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
VTAM's Relationship With MVS . . . . . . . . . . . . . . . . . . . . . . . .
Processing Work Through VTAM. . . . . . . . . . . . . . . . . . . . . . . . . . ..
VTAM Function Management Control Block (FMCB) . . . . . . . . . . . . . . . .
VTAM Operating Characteristics . . . . . . . . . . . . . . . . . . . . . . .
Module Naming Convention . . . . . . . . . . . . . . . . . . . . . . . . .
Address Space Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Locking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
VTAM Recovery/Termination . . . . . . . . . . . . . . . . . . . . . . . . . .
VTAM Debugging . . . . . . . . . . . . . . • . • . . . . • . . . • . . . . . . . . . . .
Waits . . . . . . . . . . . . . . . . . • . • • . . • . . . . . . . • . . . . . . . . . '.
Program Checks . . . . . • . . . . . . • • . • . . . . . . . . . . . . . . . . . . . .
Miscellaneous Hints on VTAM • . . . . . . . . . • • . . . . . . . • . . . . . . . . .
VSAM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Record Management . . . . . . . . . . . . . . • . . . . . . . . . . . . . . . . • . . .
RPL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
PLH. . . . . . . . . . . . . . . . . . . . . . . . • • . • . . . . . . . . . . • . . . .
BUFC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

5.6.19
5.6.19
5.6.20
5.6.21
5.6.21
5.6.22
5.6.22
5.6.23
5.6.23
5.6.24
5.6.24
5.6.24
5.6.25
5.6.25
5.6.26
5.6.26
5.6.26
5.6.29
5.6.32
5.6.32
5.6.33
5.6.33
5.6.34
5.7.1
5.7.1
5.7.2
5.7.3
5.7.3
5.7.6
5.7.8
5.7.8
5.7.8
5.7.9
5.7.9
5.7.9
5.7.10
5.7.10
5.7.11
5.7.12
5.7.13
5.7.13
5.7.15
5.7.16
5.8.1
5.8.1
5.8.2
5.8.5
5.8.6
5.8.6
5.8.6
5.8.7
5.8.8
5.8.10
5.8.11
5.8.15
5.8.15
5.9.1
5.9.1
5.9.1
5.9.2
5.9.3

Contents xiii

Record Management Debugging Aids . . . . . . . . . . . . . . . . . . . . . . .
Open/Close/End-of-Volume. . . . . . . .• . . . . . . . . . . . • . . . . . • • . . .
O/C/EOV Debugging Aids. . . . . . . . • . . . . . . . . . . . . . . . . . . . . . . .
I/O Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
I/O Manager Debugging .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Catalog Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . • . .
Major Registers and Control Blocks. . . . . . . . . • . . . . . . . . . . . . . • . . .
How to Find Registers . . . . . . . . • . . . . . . . . . . . . . . . . . . . . . . .
Major Registers . . . . . . • . . • . . . . . . . . . . . . • . . . . . . . . . . . . .
Major Control Blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Module Structure. • . • . . . . . . . . . . . . . . . . . . . . . . . . . . • . . . . . .
VSAM Catalog Recovery Logic . . . . . . . . . . . . . . . . . . . . . . . . .
Establishing/Releasing a Recovery Environment . . . . . . . . . . . . . . . . .
Maintaining a Pushdown List End Mark . . . . . . . . . . . • . . . . • . . . • .
Tracking GETMAIN/FREEMAIN Activity . . . . . . . . . . . . . . . . . . . .
CMS Function Gate . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Recovery Routine Functions . . . . • . . . • . . . • . . . . . . . . . . . . . . . . .
Diagnostic Output . . . . . . . . . . . . . . . . . . . . . . . . . . . • . . . . . .
Backout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Drop Catalog Orientation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Storage Freeup . . . . . . • . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
DEFINE/DELETE Backout . . . . . . . . . . . . . . . • . . . . . . . . . . . . .
Debugging Aids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Allocation/Unallocation . • . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Functional Description. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Allocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Unallocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . • .
Batch Initialization and Control. . . . . . . . . . . . . . . . . . . . . . . . . . .
Dynamic Initialization and Control. . . . . . . . . . . . . . . . . . . . . . . . .
JFCB Housekeeping . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Common Allocation . . . • . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Fixed Device Allocation . • . • • • . . . . . . . . . . . . . . . . . . . . . . .
TP Allocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Generic Allocation . . . . . . . . • . . . • . . . . . . . . . • . . . . . . . . .
Recovery Allocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ..
Common Unallocation . • . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Volume Mount and Verify. . . • . . . . . . . • . . . . . . . . . . . . . . . . . .
General Debugging Aids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Allocation Module Naming Conventions. . . . . . . . . . . . . . . . . • . . . .
Registers and Save Areas. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Common Allocation Control Block Processing • . . . . . . . . . . . . . . . . .
ESTAE Processing . • . . • . . . . . . . .. . . . . . • . . . . • • . . . . . . . . •
Debugging Hints. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Allocation Serialization . . . . . . . . . . . . . . . . . . . . . . . . . .
Subsystem Allocation Serialization. . . . . . . . . . . . . . . . . . . . . . . . .
Device Selection Problems (Non-Abend) . . . . . . . . . . . . . . . . . . . . . .
Address Space Termination. . . . . . . . . . . . . . . . . . . . . . . . . . . • .
0B0 Abend . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
0C4 Abend in IEFAB4FC, or Loop in IEFDB413 . . . . . . . . . . . . . . . . .
Volume Mount and Verify (VM&V) Waiting Mechanism . . . . . . . • . . . . .
Allocation/Unallocation Reason Codes. . . . . • . . . • . • . . . . • . . . . . . . .
Common and Batch Allocation and JFCB Housekeeping Reason Codes . . . .
Common and Batch Unallocation Reason Codes . . . . . . . . . . . . . . . . .
Dynamic Allocation Reason Codes. . . • . . . • . . . . . . . . . . . . . . . . .
JES2
Job Processing Through JES2 . . . . . . . . . . • . . . . . . . . . . . . . . . . . . .
Input . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Conversion. . . . . . . . . . . . . . . • . . . . . . . . . . . . . . . . . . . . . . .
Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . • . . . . . . . . . . .
Purge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

xiv

OS/VS2 System Programming Library: MVS Diagnostic Techniques

5.9.3
5.9.6
5.9.7
5.9.8
5.9.9
5.10.1
5.10.1
5.10.1
5.10.2
5.10.2
5.10.9
5.10.10
5.10.10
5.10.10

5.10.11
5.10.11
5.10.12
5.10.12
5.10.13
5.10.13
5.10.13
5.10.14
5.10.15
5.11.1
5.11.1
5.11.2
5.11.2
5.11.2
5.11.3
5.11.3
5.11.4
5.11.4
5.11.4
5.11.5
5.11.5
5.11.5
5.11.5
5.11.6
5.11.6
5.11.6
5.11.7
5.11.10
5.11.11
5.11.11
5.11.12
5.11.12
5.11.13
5.11.13
5.11.13
5.11.14
5.11.16
5.11.16
5.11.19
5.11.19
5.12.1
5.12.1
5.12.1
5.12.1
5.12.1
5.12.1
5.12.2

JES2 Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.12.2
HASJES20 Program Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.12.2
HASJES20 Module Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.12.3
HASP Control Table (HCT) . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 5.12.4
HASPSSSM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.12.6
Subsystem Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.12.7
Dispatcher Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.12.9
$WAIT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.12.9
$$POST . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.12.10
JES2 WAIT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.12.10
Dispatcher Queue Structure. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.12.10
JES2 Error Services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.12.11
Disastrous Error Routine. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.12.11
JES2 ESTAE Routine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.12.13
Catastrophic Error Routine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.12.13
JES2 Exit Routine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.12.13
Input/Output Error Logging Routine . . . . . . . . . . . . . . . . . . . . . . . . 5.12.14
JES2 $DEBUG Functions In a Multi-Access Spool Configuration . . . . . . . . . 5.12.14
Initialization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.12.15
Read . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.12.15
Write . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.12.15
Release . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.12.16
Miscellaneous Hints on JES2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.12.16
Starting JES2 - Enqueue Wait on STCQUE . . . . . . . . . . . . . . . . . . . . 5.12.16
Subsystem Interface (SSI) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.13.1
System Initialization Processing . . . . . . . . . . . . . . . . . . . . . . . 5.13.1
Subsystem Interface Major Control Blocks . . . . . . . . . . . . . . . . . . . 5.13.2
Requesting Subsystem Services . . . . . . . . . . . . . . . . . . . . . . . . 5.13.5
Invoking the Subsystem Interface . . . . . . . . . . . . . . . . . . . . . . 5.13.5
Logic Flow Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.13.7
Notifying a Single Subsystem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.13.7
Notifying All Active Subsystems . . . . . . . . . . . . . . . . . . . . . . . . . . 5.13.8
Debugging Hints. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . 5.13.9
Recovery Termination Manager (RTM) . . . . . . . . . . . . . . . . . . . . . . 5.14.1
Functional Description. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.14.1
Work Areas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.14.1
Major RTM Modules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.14.1
Process Flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.14.2
Hardware Error Processing.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.14.2
Normal Task Termination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.14.4
Abnormal Task Termination. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.14.5
Retry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.14.6
Cancel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.14.7
FORCE Command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.14.8
Address-Space Termination . . . . . . . . . . . . . . . . . . . . . . . . . . 5.14.9
Error ID . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.14.10
SVC Dump Debugging Aids . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.14.11
Important SVC Dump Entry Points . . . . . . . . . . . . . . . . . . . . . . . . 5.14.11
BRANCH=YES Option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.14.11
BRANCH=NO Option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.14.11
SVC Dump Error Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.14.12
SYS1.LOGREC Entries Produced for SVC Dump Errors . . . . . . . . . . . . . . 5.14.12
Fixed Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.14.12
Variable Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.14.13
Control Blocks Used to Debug SVC Dump Errors . . . . . . . . . . . . . . . . 5.14.14
Address Space Control Block (ASCB) . . . . . . . . . . . . . . . . . . . . 5.14.14
Recovery Termination Control Table (RTCT) . . . . . . . . . . . . . . . . . 5.14.14
SVC Dump Work Area (SDWORK) . . . . . . . . . . . . . . . . . . . . . . . . 5.14.14
Summary Dump Work Area (SMWK) . . . . . . . . . . . . . . . . . . . . . . . 5.14.14
Resource Cleanup for SVC Dump. . . . .. . . . . . . . . . . . . . . . . . . . 5.14.15

Contents xv

Communications Task . . . . • . . . . . . . . . . . . . . • . . . . . . . . . . . . • . . .
Functional Description. . . . . . . . . . . . . • . . . . • . . . . . . . . • . . • . . .
Communications Task Control Blocks. . . • . . . . . . . . . . . . . . . . . • . ..
Debugging Hints. . . . . . . . . • . .. . . . . • . . • • . . . . . . . . . . . . . • . .
Console Not Responding to Attention. . . . . . . • . . . . . . . . . . • . . . .
Enabled Wait State. . • . . • . . . . • . . . . . . . . . . . . . . . . . . . . . . .
Disabled Wait State . . . . . . . . . • . . . . . . . . . . • . . . . . . . • . . . . .
Messages or Replies Lost. . . . . . . . • . . . • . . . . . . . . . . . . . . . . . .
No Messages on One Console . . . • . . . . . . . . . . . . . . . . . . . . • . . .
Messages Routed to Wrong Console. • . . . . . . . . . • . . . . . . . . . . • . .
Truncated Messages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Console Switching . . . . . . • . . . . . . . . . • . . . . . . . . . . . . . . . . •
DIDOCS Trace Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
DIDOCS-In-Operation Indicator . . . . . . . . • . . . . . • . . . . . . . . . . .
DIDOCS Locking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Appendix A: Process Flows . . . . . . . . . . . . . . . . • • . . . . . . . . . • . . . • .
RSM Processing for Page Faults. . . . . . . . • . . . . . . . . . . . . . . . . . . . . ..
IEAVPIX Tests . . . . . . . . . • . . . . . • . . . . . . . . . • . . . . . • . . . . ..
IEAVGF A Tests. . . . . . . . . . . . . . . . . . . . • . • . . . • . . . . . . . . . . .
IEAVPIOP Tests . . . . . . . . . . . . . • . . . . . . . . . . . . . . . . . . . . • . .
IEAVIOCP Tests . . . . . . . . . . . . • . . . . . . . . • . . . . . • • . . . . . • . .
Swapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Swap-In Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Swap-Out Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
EXCP/IOS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
GETMAIN/FREEMAIN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
GETMAIN Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
FREEMAIN Processing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
VTAM Process. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
TSO. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . • .
Time Sharing Initialization . . . . . . . . . . . . . . . . . . . . . . . . .
LOGON Processing. . . . . . . . . • . . . . . . . . . . . . . . . . . . . . . • . . . .
LOGON Scheduling Diagnostic Aids . . . . . . . . . . . . . . . . . . . . . . . .
TSO Line Drop Processing . . . . . . . . . . . . . . . . . . . . • . . . . . • • . . . .
TMP and Command Processor Interface . . . . . . . . . . . . . . . . . . . . .
TSO Command Processor Recovery . . . . . . . . . . . . . . . . . . . . . . . . . .
TSO Terminal I/O Overview. . . . . . . . . . . . . • . . . . . . . . . . . . . . . . .
Terminal Output Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Terminal Input Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . • . . . . .
TSO/TIOC Terminal I/O Diagnostic Techniques . . . . . . . . . . . . . . . . . . .
TSO Attention Processing . . . . . . . . . • . . . . . . . . . . . . . . . . . . . . . .

5.15.1
5.15.2
5.15.4
5.15.6
5.15.6
5.15.6
5.15.7
5.15.7
5.15.8
5.15.8
5.15.9
5.15.9
5.15.9
5.15.10
5.15.10
A.1.1
A.1.3
A.1.3
A.1.3
A.1.6
A.1.6
A.2.1
A.2.1
A.2.3
A.3.1
A.4.1
A.4.1
A.4.2
A.5.1
A.6.1
A.6.1
A.6.4
A.6.12
A.6.14
A.6.17
A.6.21
A.6.23
A.6.24
A.6.25
A.6.26
A.6.27

Appendix B: Stand-alone Dump Analysis . . . . . . . . . . . . . . . . . . . . B.1.1
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B.1.1
Analysis Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B.1.7
Appendix C: Abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . . C.1.1
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . I.1.1

xvi

OS/VS2 System Programming Library: MVS Diagnostic Techniques

Figures

Figure 2-1.
Figure 2-2.
Figure 2-3.
Figure 2-4.
Figure 2-5.
Figure 2-6.
Figure 2-7.
Figure 2-8.
Figure 2-9.
Figure 2-10.
Figure 2-11.

Figure 2-12.
Figure 2-13.
Figure 2-14.
Figure 2-15.
Figure 2-16.
Figure 2-17.
Figure 2-18.
Figure 4-1.
Figure 4-2.
Figure 4-3.
Figure 4-4.
Figure 4-5.
Figure 5-1.
Figure 5-2.
Figure 5-3.
Figure 5-4.
Figure 5-5.
Figure 5-6.
Figure 5-7.
Figure 5-8.
Figure 5-9.
Figure 5-10.
Figure 5-11.
Figure 5-12.
Figure 5-13.
Figure 5-14.
Figure 5-15.
Figure 5-16.
Figure 5-17.
Figure 5-18.
Figure 5-19.
Figure 5-20.
Figure 5-21.
Figure 5-22.
Figure 5-23.
Figure 5-24.
Figure 5-25.
Figure 5-26.
Figure 5-27.
Figure 5-28.
Figure 5-29.
Figure 5-30.
Figure 5-31.
Figure 5-32.
Figure 5-33.

Definition and Hierarchy of MVS Locks .
Bit Map to Show Locks Held on a Processor
Classification and Location of Locks.
SYS1.LOGREC Software Incident Record 1.
SYS1.LOGREC Software Incident Record 2.
SYS1.LOGREC Software Incident Record 3.
Format of the LOGREC Recording Control Buffer
Format of Records Within the LOGREC Recording Control Buffer
SIGP Return Codes.
External Call (XC) Process Flow .
Emergency Signal (EMS) Process Flow
How to Locate the Trace Table
Types of Trace Entries .
MVS Trace of a Page Fault Without I/O
MVS Trace of a Page Fault With I/O .
GTF Trace of a Page Fault Without I/O
GTF Trace of a Page Fault With I/O .
Trace Example of PER Hardware Monitoring
Summary of EP and UCP Mode Traces
VTAM I/O Trace Example.
VT AM and GTF Traces Example .
JES2 Commands for Status Information.
System Use of Hardware Components
Global SRB Queue Structure and Control Block Relationships
Local SRB Queue Structure and Control Block Relationships.
Dispatcher Processing Overview
lOS Processing Overview
Major lOS and EXCP Control Block Relationships
Program Manager Modules.
Program Manager Control Blocks and Work Areas
Program Manager Queues .
IEAVNP05 Initialization
New PRB Initialization - LINK
New RB Initialization - XCTL
XCTL RB Manipulation
CDE Initialization by IDENTIFY.
Module Search Sequence for LINK, ATTACH, XCTL and LOAD.
Module Search Sequence of Private Libraries
CDE Allocation .
VSM's View of MVS Storage
VSM's Control Block Usage
VSM's Global Data Area
SDWAVRA Error Indicators
VSM Cell Pool Management
Major RSM Control Blocks and Their Functions
Relationship of Critical RSM Control Blocks
Page Stealing Process Flow.
Converting Virtual Addresses to Real Addresses.
Relationship of Important ASM Control Blocks.
Locating an LSID From an LPID .
Relating the Virtual Address to the PART and PAT
Page/Swap Data Set Error Action Matrix.
SRM Control Block Overview .
SRM Module/Entry Point Cross Reference
VTAM Control Block Structure
Several RPHs Waiting for the Same Lock.

2.3.2
2.3.4
2.3.6
2.4.4
2.4.7
2.4.11
2.4.16
2.4.16
2.5.8
2.5.12
2.5.14
2.6.1
2.6.2
2.6.3
2.6.3
2.6.4
2.6.4
2.8.16
4.3.3
4.3.8
4.3.14
4.4.2
4.4.3
5.1.5
5.1.7
5.1.9
5.2.2
5.2.3
5.3.2
5.3.3
5.3.3
5.3.6
5.3.7
5.3.9
5.3.10
5.3.13
5.3.15
5.3.16
5.3.17
5.4.2
5.4.4
5.4.7
5.4.9
5.4.11
5.5.1
5.5.2
5.5.7
5.5.14
5.6.5
5.6.11
5.6.13
5.6.17
5.7.4
5.7.20
5.8.3
5.8.9

Figures

xvii

Figure 5-34.
Figure 5-35.
Figure 5-36.
Figure 5-37.
Figure 5-38.
Figure 5-39.
Figure 5-40.
Figure 5-41.
Figure 5-42.
Figure 5-43.
Figure 5-44.
Figure 5-45.
Figure 5-46.
Figure 5-47.
Figure 5-48.
Figure 5-49.
Figure 5-50.
Figure 5-51.
Figure 5-52.
Figure 5-53.
Figure 5-54.
Figure A-1.
Figure A-2.
Figure A-3.
Figure A-4.
Figure A-5.
Figure A-6.
Figure A-7.
Figure A-8.
Figure A-9.
Figure A-10.
Figure A-11.
Figure A-12.
Figure A-13.
Figure B-1.

xviii

Sample Storage Pool Dump
Queueing of RPHs While Waiting for Storage
Relationship of the Six Major Functions of Allocation/Unallocation .
Common Allocation Input.
Common Allocation Control Blocks After Construction of Volunit
Table and EDLs. .
VM& V Control Block Structure .
HASJES20 Module Map
Locating the JES2 Module Directory in HASPNUC.
HCT Major Vector Fields .
The Subsystem Vector Table .
HASPSSSM - HASJES20 - OS/VS2 Relationship .
Formal Subsystem-Interface Vectors.
JES2 Queue Control Fields.
JES2 Processor Control Element Relationships .
Example Dump of JES2 Processor Queue Chains
Major JES2 Control Blocks.
Subsystem Interface Control Block Usage
Control Block Structure for Invoking Subsystem Interface
Finding the SSIB for a Job When SSOB Pointer is Zero
Sequence of Communications Task Processing .
Communications Task Control Block Structure.
Page Fault Process Flow
Swap-In Process Flow .
Swap-Out Process Flow.
IOS/EXCP Process Flow
VTAM SEND Process Flow
Overview of Logon Processing.
TCAM Organization After a TSO Logon .
Logon Work Area
LOGON Work Area Bits That Indicate the Currently Executing Module.
LOGON Scheduling Post Codes
Overview of TSO Line Drop Process.
Summary of Command Processor Recovery Activity
TSO Attention Flow
Standalone Dump Analysis Flowchart

OS/VS2 System Programming Library: MVS Diagnostic Techniques

5.8.13
5.8.14
5.11.1
5.11.8
5.11.9
5.11.15
5.12.3
5.12.4
5.12.5
5.12.6
5.12.6
5.12.8
5.12.9
5.12.11
5.12.12
5.12.17
5.13.4
5.13.6
5.13.6
5.15.3
5.15.5
A.1.4
A.2.2
A.2.4
A.3.2
A.5.2
A.6.2
A.6.7
A.6.9
A.6.12
A.6.13
A.6.15
A.6.22
A.6.28
B.1.6

Summary of Amendments
for GC28-0725-2
VS2 Release 3.7
Changes have been made throughout this publication to reflect a Service Update to
OS/VS2 Release 3.7 and to include the following topics:

Diagnostic Aids Information
Information from OS/VS2 System Logic Library, Volume 7, SY28-0719, was
added in the following topics:
• Started task control (STC) abend and reason codes.
• Scheduler work area (SWA) manager reason codes.
• Auxiliary storage manager (ASM) diagnostic aids and serialization information.
• Allocation/unallocation reason codes.
• TSO logon scheduling.
• Communications task overview and diagnostic aids.
• DIDOCS diagnostic aids.

Also, diagnostic aids information was added for:
• Error recovery procedures (ERPs).
• Converting virtual addresses to real addresses.
• JES2 miscellaneous hints.

Interactive Problem Control System (IPCS), SU57
Overview information was added for IPCS.

Miscellaneous Changes
Throughout the text:
• Minor technical and editorial changes were made.
• References to DSS (dynamic support system) were removed.
• References to EREP0 were changed to EREP1 (environmental recording editing
and printing).

Summary of Amendments

xix

xx

OS/VS2 System Programming Library: MVS Diagnostic Techniques

Section 1. General Introduction

This section introduces basic MVS problem analysis and provides an overview of the
interactive problem control system (IPCS).

Basic MVS Problem Analysis Techniques
Problem isolation and determination are significantly more complex in MVS than
in previous operating systems because of:

• Enabled System Design, which has made the internal and environmental
status-saving functions more extensive than those of previous systems.

• Multiprocessing (MP), which potentially allows the execution of code in
sequences not encountered in a uniprocessing (UP) environment. MP can also
cause contention for serially reusable resources. (In this manual, MP refers to
multiprocessing on both multiprocessors and attached processors.)

• Locking Mechanism, which facilitates the enabled system design and
multiprocessing functions and maintains data integrity.

• Subsystems, which are responsible for processing work requested from the
system. They maintain their own work queues, control block structures, and
dispatching mechanisms - all of which must be understood in order to
effectively pursue problems in the MVS operating system.

• Software Recovery, which attempts to keep the system available despite errors.

• The large number of new components, which provide new functions and whose
internal logic must be understood for effective problem determination.

As a result of this complexity, MVS problem solvers have made two adjustments
in their diagnostic outlook:
• Rather than learning the system logic at an instruction or module level, they
have learned the system in terms of component interactions at the interface
level.

• They have learned that the most effective problem analysis at a system level is
obtained from a disciplined, almost formal, diagnostic approach.

Section 1. General Introduction

1.1.1

Section 1: General Introduction (continued)
This publication contains those debugging techniques and guidelines that have
proven the most useful to problem solvers with several years experience in
analyzing MVS system problems. These techniques are presented in terms of a
debugging "approach" that can be summarized in three steps:
1. Identifying the external symptom of the problem.

2. Gathering relevant data from system data areas in order to isolate the problem
to a component.

3. Analyzing the component to determine the cause of the problem.

The most important step in this approach is often the first - correctly
identifying the external symptom of a problem. To do this, it is best to get a
description of the problem as it was perceived by an eyewitness. You will want a
description that provides a context from which to start, such as:
"System is looping; can't get in from console."
"Job abended with 213."

"I/O error on 251."
"Console locked out."
"Terminal hung, keyboard locked."
"System in wait, nothing running."
"Bad output."
"Job won't cancel."
"System degrading. Very slow."
"System died."
"0C4 in component abc."
The list is endless, of course. Your objective is to fit one (or more) of these
descriptions to one of the following external symptoms.
• Enabled wait - The system is not executing any work and when it takes
interrupts, nothing happens. Something appears to be stuck.

• Disabled wait - The system freezes with a disabled PSW that has the wait bit
on. This can be either an explicit and intentional disabled wait or a situation
that occurs because the PSW area has been overlaid. Unfortunately, the latter
is more often the case.

• Disabled loop - This is normally a small (fewer than 50 instructions) loop in
disabled code.

• Enabled loop - This is normally a large loop in enabled code (and may
include disabled portions - loops as a result of interrupts).

1.1.2

OS/VS2 System Programming Library: MVS Diagnostic Techniques


Section 1. General Introduction (continued)

• Program check - The program is automatically cancelled by the system,
usually because of improper specification or incorrect use of instructions or
data in the program. The program check message gives the location of the
failing operation and the condition code. If a SYSABEND, SYSMDUMP, or
SYSUDUMP DD statement was included in the JCL for the job, a dump of
the problem program will be taken.

• ABEND - The system issues an SVC 13 with a specific code from 1 to 4095
to indicate an abnormal situation.

• Incorrect output - The system is not producing expected output. Incorrect
output can be categorized as: missing records, duplicate records, or invalid
data that has sequence errors, incorrect values, format errors, or meaningless
data. If a program has apparently executed successfully, incorrect results will
not be detected until the data is used at some future time.

• Performance degradation - A bottleneck or system failure (hardware or
software) has severely degraded job execution and throughput.

• TP problem - A problem, usually detected by the operator or terminal user,
that indicates malfunctions are affecting one or more terminals, lines, etc.

The chapters in Section 4 (Symptom Analysis Approach) will help you identify
these symptoms. The main rule at this stage of your analysis is to proceed
carefully. When first screening a problem, do not assume too much. Don't even
assume that the original eyewitness description was correct. Keep all initial
information about the problem as a reference for your later analysis.
In the course of identifying the correct external symptom, you will begin
gathering data that will lead you to other sections of the publication. Specific data
gathering techniques are contained in Sections 2 and 3. Section 2 describes the
major MVS debugging areas such as LOGREC records and recovery work areas.
Section 3 describes how to use a storage dump effectively as your main source of
diagnostic material.
Eventually you should have gathered enough data to isolate the problem to a
particular component or process. Section 5 and Appendix A provide techniques
for analyzing system components and processes so that you can determine the
cause of the problem. Appendix B contains a step-by-step procedure that can be
used as a guide for analyzing a stand-alone dump.

Note: Before you begin using this publication for problem analysis, scan
through it to find out where the various types of information are located.
Depending on your current debugging skill level, various sections will be more
important than others.
Always keep in mind that trouble-shooting a system of the internal complexity
of MVS is not always an "If A, then B" procedure. The guidelines and techniques
presented in this publication define "generally" what the analyst will discover. The
nature of the debugging process is such that the problem solver does not perform
the same analysis for every problem.

Section 1. General Introduction

1.1.3

Section 1: General Introduction (continued)

IPCS - Interactive Problem Control System
The interactive problem control system (IPCS) provides MVS installations with
expanded capabilities for diagnosing software failures and facilities for managing
problem information and status.
IPCS includes facilities for:
• Online examination of storage dumps.
• Analysis of key MVS system components and control blocks.
• Online management of a directory of software problems that have occurred in
the user's system.
• Online management of a directory of problem-related data, such as dumps or
the output of service aids.
IPCS runs as a command processor under TSO, allowing the user to make use
of existing TSO facilities from IPCS, including the ability to create and execute
command procedures (CLISTs) containing the IPCS command and its subcommands.
IPCS supports three forms of MVS storage dumps:

• High-speed stand-alone dumps produced by AMDSADMP.
• Virtual dumps produced by MVS SDUMP on SYS1.DUMP data sets.
• Virtual dumps produced by MVS SDUMP on data sets specified by the
SYSMDUMP DD statement.
Dumps on data sets specified by the SYSABEND or SYSUDUMP DD statements cannot be analyzed using the IPCS facilities.
For information about IPCS, refer to the OS/VS2 MVS Interactive Problem
Control System (IPCS) User's Guide and Reference.


OS/VS2 System Programming Library: MVS Diagnostic Techniques


Section 2. Important Considerations Unique to MVS

This section describes concepts and functions that are unique to the MVS environment and useful to problem analysis. It also contains miscellaneous debugging
hints and general data gathering techniques.
The chapters in this section are:
• Global System Analysis
• System Execution Modes and Status Saving
• Locking
• Use of Recovery Work Areas in Problem Analysis
• Effects of Multi-Processing on Problem Analysis
• MVS Trace Analysis
• Miscellaneous Debugging Hints
• Additional Data Gathering Techniques


Global System Analysis

In trying to isolate a problem to an internal symptom, a global system analysis
often uncovers enough data to provide a starting point for the actual problem
isolation and debugging. This chapter discusses the main considerations the analyst
should be aware of when analyzing a stand-alone dump, including:
• The system areas that should be inspected to understand the current system
state at the time of a dump
• The system areas that should be examined to understand the current state of
the work in the system and the current disposition of storage and tasks

Global Indicators That Determine the Current System State
The following areas should be examined to help determine the current state of the
system:
1. PSA - occupies the first 4K bytes of real storage for each processor. Note that
absolute 0 is not used during normal system operation on a machine with the
MP feature - this is true whether the system is operating in MP or UP. (The
one exception is a control program that is system generated with
ACRCODE=NO.) During NIP processing the PSA(s) for the processor(s) are
initialized and the prefix register(s) are initialized to point to them.

Special Notes About Stand-alone Dumps:
• Before taking a stand-alone dump, it is necessary to perform a STORE
STATUS operation. This hardware facility does not use prefixing;
instead it stores values such as the current PSW, registers, CPU timer, and
clock comparator in the unprefixed PSA (the one used before NIP
initialized the prefix register) at absolute address 100. The dump program
subsequently saves these values and, in an MP environment, issues a
SIGP instruction to the other processor requesting a STORE STATUS
operation. As a result, these values in the unprefixed PSA are overlaid
by the second processor's values.
Therefore, in an MP environment the status in the unprefixed PSA is
always that of the non-IPLed processor, not the one on which the
stand-alone dump was IPLed.
• In a machine not equipped with the MP feature and therefore without
prefixing, the IPLing of the stand-alone dump program causes low storage
(0-X'18') to be overlaid with CCWs. You should be aware of this and not
consider it as a low storage overlay.


• In an MP environment, the STORE STATUS operation must be performed
only from the processor to be IPLed for the stand-alone dump program.
• IPLing the stand-alone dump program twice causes the storage dump to
contain a dump of itself because it was read in for the first IPL. This
causes the dump program to overlay a certain portion of the nucleus
(generally starting at X'7000') and the general purpose registers to contain
values associated with the stand-alone dump program and not MVS.
• If the operator does not issue the STORE STATUS instruction before
IPLing a stand-alone dump, the message "ONLY GENERAL PURPOSE
REGS VALID" appears on the formatted dump. The PSW, control
registers, etc., are not included. This greatly hampers the debugger's
task.

2. Registers and PSW - The print dump program formats the current PSW and
the general, floating point, and control registers associated with each processor.
From these, you can determine the program executing on each processor.
If the current PSW is 070E0000 00000000 and the GPRs are all 0, you are
in the no-work wait condition, which indicates no ready work is available
for this processor to execute. If there is or should be work remaining, an
invalid wait condition results. (Refer to the chapter on "Waits" in Section 4.)
If the registers are not equal to zero and the PSW does not contain the wait
bit (X'0002'), there is an active program. If the wait task is dispatched, the
system is in the no-work wait condition.
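The no-work wait test above can be collected into a few lines (a sketch in Python, assuming the PSW and registers have already been extracted from the dump; in the S/370 EC-mode PSW the wait bit is bit 14, mask X'0002' in the first halfword):

```python
def classify_psw(psw, gprs):
    """Classify a processor's state from its dumped PSW and registers.

    psw  -- the 8-byte current PSW as an integer (e.g. 0x070E000000000000)
    gprs -- the 16 general-purpose register values
    """
    wait = (psw >> 48) & 0x0002          # bit 14 of the PSW is the wait bit
    if psw == 0x070E000000000000 and all(r == 0 for r in gprs):
        return "no-work wait"            # enabled wait, no ready work
    if wait:
        return "wait state"              # invalid if work should remain
    return "active program"
```

A PSW of 070E0000 00000000 with zeroed registers is reported as the no-work wait, exactly as described above.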

3. ILC/CC - location X'84' for external interrupts; location X'88' for SVC
interrupts; location X'8C' for program interrupts. These fields indicate the
last type of interrupt associated with each interrupt class for each processor.
The work active when each interrupt occurred is represented by the old PSWs
at locations: X'18' (external); X'20' (SVC); X'28' (program). Common
contents of these fields are:
X'84' - 00001004   clock comparator
        00001005   CPU timer
        00001201   SIGP-emergency signal
        00001202   SIGP-external call

X'88' - 000200xx   where xx is the SVC number. This field should be
                   inspected for unusual SVCs such as:
        1  - WAIT:    can indicate an enabled wait situation
        D  - ABEND:   can indicate program error processing
        F  - ERREXCP: can indicate a problem in I/O error processing
        10 - PURGE:   can indicate a problem in the swap process
        38 - ENQ:     can indicate a resource contention problem
        4F - STATUS:  can indicate a non-dispatchability problem
X'8C' - 000X0011   indicates a page fault interrupt. Anything other than
                   a code of 11 is highly suspect and must be inspected
                   further. Also with a code of 11, the program check
                   old PSW (location X'28') must be enabled (mask
                   X'07') because disabled page faults are not allowed in
                   MVS and it is an error if one occurs.
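These interruption-code checks lend themselves to a small decoder (a sketch; the table and function names are hypothetical, the code values are the ones listed above):

```python
EXT_CODES = {0x1004: "clock comparator", 0x1005: "CPU timer",
             0x1201: "SIGP-emergency signal", 0x1202: "SIGP-external call"}
SVC_NAMES = {0x01: "WAIT", 0x0D: "ABEND", 0x0F: "ERREXCP",
             0x10: "PURGE", 0x38: "ENQ", 0x4F: "STATUS"}

def decode_x84(word):
    """External interrupt code at PSA + X'84'."""
    return EXT_CODES.get(word & 0xFFFF, "other external interrupt")

def decode_x88(word):
    """SVC interrupt field at PSA + X'88', format 000200xx."""
    svc = word & 0xFF
    return SVC_NAMES.get(svc, "SVC %X" % svc)

def decode_x8c(word):
    """Program interrupt code at PSA + X'8C'; X'11' is a page fault."""
    code = word & 0xFF
    return "page fault" if code == 0x11 else "suspect: code %X" % code
```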


4. PSA + X'204' (CPU ID).
5. PSA + X'210' (address of LCCA - 1 per processor) - The LCCA contains many
of the status-saving areas that were located in low storage in previous systems.
It is used for software environment saving and indications. The registers
associated with each of the interrupts you find in the PSA are saved in this
area. In addition, the system mode indicators for each processor are
maintained in the LCCA.
6. PSA + X'224' (PSAAOLD) - This is the address of the ASCB of the work last
dispatched on each processor. This field indicates the address space that is
currently executing.
7. PSA + X'21C' (PSATOLD) - This is the address of the TCB of the work last
dispatched on each processor. This field in conjunction with PSAAOLD isolates to
a task within an address space. Note: PSATOLD=0 when SRBs are dispatched.

8. PSA + X'228' (PSASUPER) - This is a field of bits that represent various
supervisory functions in the system. If a loop is suspected, these bits should
be checked in an attempt to isolate the looping process.
Note: Because of SRM timer processing in MVS, the external first level
interrupt handler bit (X'20') or the dispatcher bit (X'04') may be set in this
field even in the enabled wait situation.
9. PSA + X'2F8' (PSAHLHI) - This field indicates the current locks held on
each processor. Knowing which locks are held helps isolate the problem,
especially in a loop situation. By determining the lock holders you can
isolate the current process. (See the chapter on "Locking" later in this
section.)


10. PSA + X'380' (PSACSTK) - This is the address of the active recovery stack
which contains the addresses of the recovery routines to be routed control in
case of an error. If the address is other than X'C00' (normal stack), the type
of stack (for example, program check FLIH or restart FLIH) is meaningful,
especially in the loop situation.
By searching the normal stack (X'C00') and associating the recovery
routines with active mainline routines you may get an idea of the current process.
This is true only if the pointer to the current entry is not X'C34', which would
indicate an empty recovery stack.
Note: If a loop is suspected, the first word following each routine address in
the current stack should be scanned. An X'80' indicates that routine is in
control. A X'40' indicates that routine is in control and that it is a nested
recovery routine.
If X'10' into the stack is non-zero, also check for an SDWA address at X'44'
into the active stack. This block is mapped by the SDWA DSECT and is
described in the Debugging Handbook. (RTCA and SDWA are different names
for the same control block.) If an SDWA address is present, an error has
occurred and it can be related to the problem you are analyzing. If trapping
via RTM's SLIP facility, the registers at entry to RTM are contained in this area.
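The scan the note describes can be sketched as follows (a sketch only: the pairing of each routine address with its following flag word is passed in as tuples, an assumption made for illustration; the X'80'/X'40' flag meanings are the ones given above):

```python
def scan_frr_stack(entries):
    """entries: (routine_address, following_flag_word) pairs taken from
    the current FRR stack; returns the routines marked in control."""
    findings = []
    for addr, flags in entries:
        top = (flags >> 24) & 0xFF   # first byte of the word after the address
        if top & 0x40:
            findings.append((addr, "in control, nested recovery routine"))
        elif top & 0x80:
            findings.append((addr, "in control"))
    return findings
```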

At this point you should understand each processor's current activity, any
possible errors that have been detected by recovery, and the current system
state or mode.

Work Queues, TCBs and Address Space Analysis
Examine the following areas to help determine the current state of work in the
system.

TCB Summary
The TCB summary report, produced by AMDPRDMP (print dump program),
contains a summary of the address spaces and their associated tasks. A quick scan
of the completion (CMP) field for each task reveals any abnormal terminations
that have occurred. Discovery of an error completion code warrants further
investigation as to the cause. Remember, however, that these codes are residual
and the job or task might have recovered from the problem.
Also investigate multiple abnormal completion codes which all relate to the same
area of the system, or many tasks that all have the same completion code. These
completion codes can all relate to one area of the system and perhaps to the problem
you are investigating. Again, LOGREC should provide further documentation in an
error situation such as this.

Once you understand the system's history from a trace, LOGREC, and error
viewpoint, you should examine the work to be done as your next step to understanding the problem.

SRB Dispatching Queues
The print dump program formats the SRB dispatching queues. Elements on any of
these queues should be investigated, especially in cases where no work appears to
be progressing through the system.
Elements on the global or local service manager queues (GSMQ/LSMQ) can
indicate that the dispatcher has not received control since these SRBs were
scheduled. This is an unusual condition that should be investigated. It can also
indicate that the CVT anchors for these queues have been inadvertently altered.
This again is an error condition.
Elements on the GSPLs/LSPLs should be explained. It is possible the dump was
taken before the SRB routines were able to execute. But it more likely indicates
some other system problem such as an enabled wait or disabled loop. If there
are SRBs on an LSPL, you should determine if the associated address space
is swapped into storage and, if it is not, why not. (Possible causes are real frame
shortage or a problem in the paging/swapping mechanism.) Again this is an indication of a potential system problem. The chapter on "Waits" in Section 4 and the
chapter on "Dispatcher" in Section 5 contain additional information on the
dispatching queues.
If, at this point, you can isolate the problem to a component, refer to the
"Component Analysis" for that component in Section 5. The chapter on "Waits"
in Section 4 should prove helpful if you have isolated to a problem in the system.

Address Space Analysis
If you have isolated the error to a given address space or wish to determine the
state of a given address space, analyze the ASCB.
Important indicators in the ASCB are:
• ASCBLOCK (ASCB + X'80') - to determine the specific state of the local lock.
If it contains 7FFFFFFF or FFFFFFFF (the lock suspend/interrupt IDs),
refer to the chapter on "Locking" later in this section for an explanation.

Note: When holding a suspend lock, code can only be suspended because it
attempts to obtain an unavailable higher suspend lock or because of a page fault.
To find the reason for the suspension, refer to the discussion of Task Analysis
later in this chapter and to the chapter on "Locking" later in this section.
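The possible states of the lockword can be captured in one classifier (a sketch; the "owner ID" wording for other non-zero values anticipates the "Locking" chapter, and the function name is hypothetical):

```python
def local_lock_state(ascblock):
    """Interpret the ASCBLOCK word at ASCB + X'80'."""
    if ascblock == 0x00000000:
        return "local lock not held"
    if ascblock == 0x7FFFFFFF:
        return "suspend ID - lock requestor suspended"
    if ascblock == 0xFFFFFFFF:
        return "interrupt ID - locally locked work interrupted"
    return "held, owner ID %08X" % ascblock
```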

• ASCBEWST (ASCB + X'48') - to determine the TOD clock value when the
address space last executed. This field helps you determine how long an
address space has been swapped-out. By subtracting this field (middle four
digits) from the last timer value in the MVS trace table and converting to
seconds, you can discover the approximate swap-out time. (See the
chapter "MVS Trace Analysis" later in this section.)
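The conversion works because bit 51 of the System/370 TOD clock represents one microsecond, so one low-order unit of the middle four bytes (bits 16-47) is 2**4 = 16 microseconds. A sketch of the arithmetic (hypothetical function name):

```python
def approx_swapout_seconds(trace_timer_mid4, ascbewst_mid4):
    """Approximate seconds between the last trace-table timer value and
    ASCBEWST, given the middle four bytes of each TOD value as integers.
    One unit of bits 16-47 of the TOD clock equals 16 microseconds."""
    return (trace_timer_mid4 - ascbewst_mid4) * 16 / 1_000_000
```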

• ASCBRCTF (ASCB + X'66') and ASCBFLG1 (ASCB + X'67') - current status
of the address space.
• ASCBASXB (ASCB + X'6C') - pointer to the ASXB that anchors the TCBs.
• ASCBSRBS (ASCB + X'76') - number of SRBs active (currently executing or
suspended) in the address space.
• ASCBOUCB (ASCB + X'90') - pointer to the OUCB, which is helpful when
determining why an address space is swapped out.
• ASCBFMCT (ASCB + X'78') - number of real frames currently occupied by
the address space.
• ASCBTCBS (ASCB + X'7C') - number of ready TCBs.
• ASCBCPUS (ASCB + X'20') - number of processors running tasks in this
address space.

Task Analysis
Once you understand the ASCB you should analyze the associated task structure.
Once again, scan the TCBs associated with your address space and look for an
abnormal completion field. While doing so, check the RB structure for each task.
Remember that the region control task, dump task, and started task control are
represented by the first three TCBs. "Normally" they will be waiting during
task execution. If one of them is not, you should determine why.
Assuming the first three TCBs are not obvious problem areas, continue
inspecting the remaining TCBs. You are trying to explain each RB. Starting with
the last RB created (the first RB, pointed to by the TCB + 0), determine what work
is represented. If work is waiting, find out why.

Note: The master scheduler address space has system task TCBs that differ from
other address spaces. Refer to the diagrams for Master Scheduler Initialization, Start
Initiator, and Job Execution in the topic "General System Flow" in the Debugging
Handbook, Volume 1 for details of the TCB structures.

The RBOPSW indicates the issuer of an explicit WAIT. If an explicit WAIT
is not obvious, consider the following suspension possibilities and their associated
key indicators:
1. If ASCBLOCK = X'7FFFFFFF' or X'FFFFFFFF', the status (registers and
PSW) of the suspended or interrupted task is saved in the IHSA (ASCB + X'6C'
points to ASXB; ASXB + X'20' points to IHSA). The reason for suspension
is important. If it is for a lock, find out what address space or task owns that
lock and what the owner's state is. (The chapter on "Locking" later in this
section shows how to determine lock owners.) If it is for a page fault, find
out the state of that page fault. Note also that while the RBTRANS field
points to the address that caused the page fault, the RBWCF is 0.

Note: If a task owned the local lock at the time of the suspension or interrupt,
the TCB active indicators and the TCBCPUID (last processor on which this task
was dispatched) are set on. If no TCB in the task structure has these indicators
set, you can assume an SRB owned the lock. If no SRBs are on the CMS
suspend queue, the suspension is probably the result of a page fault.
An SRB can be suspended because of a page fault or a request for an
unavailable suspend lock. The save area for the suspended SRB is the SSRB
(see the Debugging Handbook). If suspended for page fault processing, the
SSRB is pointed to by the corresponding PCB + X'1C'. PCBs are generally chained
together and anchored in two locations: (1) the RSMHDR for local address
space page faults; (2) the PVT for page faults caused by referencing commonly
addressable storage. Note that if real frames were not available when the page
fault occurred, even local page faults are queued from the PVT on the defer
queue (PVTGFADF, PVT + X'754'). For a CMS lock request, the SSRB is on
the CMS lock suspended queue. See the chapter on "Waits" in Section 4 for
details on how to locate the SSRB. For local lock suspensions, the SSRBs are
chained together on a queue anchored in the ASCB (ASCB + X'84').
A locked TCB can be suspended for the same reasons as an SRB. The save
area is the IHSA (described in the Debugging Handbook). The IHSA is valid
during a page fault if the corresponding PCB + 8 flag is on, indicating the lock
was held at the time of the page fault. Also, the TCBLLH (TCB + X'114')
is set to X'01' if the task was locally locked at the time of the page fault.
The IHSA is valid for a CMS lock suspension if the ASCB is on the CMS
lock suspend queue at label CMSASBF in IEANUC01. The TCB can be
suspended because of a page fault while holding both the local and CMS locks.
One way to tell is that the ASCB + X'67' flag for the CMS lock is turned on and
the ASCB address is in the CMS lockword.

2. If ASCBLOCK = X'00000000' and the memory/task is waiting, the status
is saved in the RB/TCB. (See the chapter on "System Execution Modes and
Status Saving" later in this section.)
3. Suspended SRBs can cause bottlenecks. The chapter on "System Execution
Modes and Status Saving" can aid in locating any suspended SRBs that relate to
the address space. Note: Do not spend time looking for them unless other facts
about the problem indicate a potential problem in this area.

By far the most important consideration in task analysis is the RB structure of
each task. Generally if you have isolated the problem to an address space, RB
analysis shows a potential problem in the way of:
• Long RB chains
• Contention caused by an ENQ (SVC 38) request
• Page fault waits
• I/O waits
• Abnormal termination processing, that is, SVC D RB
Once you have analyzed the RB structure you might want to go back and further
analyze the TCBs. Following are additional important fields in the TCB:
1. TCBFLGS (TCB + X'1D') - indicators of how the system currently considers
this task.
2. TCBGRS (TCB + X'30') - general purpose registers (0-15) saved when a
TYPE 1 SVC is issued or for an interruption for a non-locked task.
3. TCBSCNDY (TCB + X'AC') - additional system indicators for this task that
help to determine why this task is not executing.
4. TCBRTWA (TCB + X'E0') - pointer to the RTM2 work area (mapped in the
Debugging Handbook) which contains information similar to the SDWA but
also data for RTM processing.

Summary
This chapter contains major considerations you must be aware of when
analyzing a stand-alone dump in MVS. A disciplined approach is important; resist
the tendency to go off on tangents upon finding the first unexplainable condition.
After gathering all the facts, try to resolve the "cause and effect" situations you are
bound to uncover. Generally, at this point you will have isolated the error and can
start a detailed component/process analysis.


System Execution Modes and Status Saving

MVS differs significantly from previous operating systems by having multiple
execution modes. Status is saved and restored from many different locations
depending upon the execution mode at the time control was lost. This chapter
explains those modes and how they affect problem analysis.

System Execution Modes
MVS has four execution modes:
1. Task mode
2. SRB mode
3. Physically disabled mode
4. Locked mode

Code always executes in one of these modes or, in certain cases, in a combination
of modes. For instance, code running in task or SRB mode can also be either
locally locked or physically disabled.

Task Mode
Task mode describes code that is executing in the system because the dispatcher
selected work from the task control block (TCB) chain. To start execution, the
dispatcher sets up the environment (registers and PSW) and then passes control to
the code to be executed. The registers and PSW are found in one of two places:
1. In the TCB at TCBGRS (TCB + X'30'), which is a register save area used when
unlocked, enabled TCB mode work is interrupted. The PSW is obtained from
the request block (RB) that is found through the TCB + 0.
2. In the IHSA (interrupt handler save area), which is used to save registers when
locally locked task mode code is interrupted. The IHSA is found through
ASXB + X'20'; the ASXB is found through ASCB + X'6C'. The PSW for locally
locked tasks is obtained from the IHSA.

Task mode is probably the most common execution mode. All programs given
control via ATTACH, LINK, and XCTL operate in this mode.


SRB Mode
SRB (service request block) mode describes code that is executing in the system
because the dispatcher finds an SRB on one of the SRB queues. SRB set-up is
started by the SCHEDULE macro. SCHEDULE is an in-line macro that places the
requestor-furnished SRB on one of two service queues, local or global, depending
on the requestor's specification. These queues can be found from the CVT at
CVTGSMQ (CVT+X'264'), which contains the address of the global service
manager queue, or at CVTLSMQ (CVT+X'268'), which contains the address of the
local service manager queue. Whenever the dispatcher finds work on either queue,
the SRBs are moved to the corresponding system priority list queue. The global
system priority list queue (GSPL), which contains globally scheduled SRBs, is
found from the CVT at CVTGSPL (CVT+X'26C').
There is also one local system priority list queue (LSPL) per address space.
Each LSPL, which is found from the ASCB at ASCBSPL (ASCB+X'1C'), contains all
SRBs locally scheduled by the requestor and also those SRBs that were globally
scheduled when the targeted address space was swapped out.
SRBs are selected from these LSPLs by the dispatcher in order to start execution.
The dispatcher loads registers 0, 1, 14, and 15 from information in the SRB and
builds the PSW. The PSW key and address are the responsibility of the scheduler
of the SRB and are specified in the SRB. SRB mode has the characteristics of
being enabled, supervisor state, key requested, and non-preemptable. Non-preemptable means that the interrupt handler should return control to the
interrupted service routine (code running under SRB mode). However, service
routines can be suspended because of a page fault or because a lock (CMS or local)
is unavailable.

Physically Disabled Mode
Disabled mode is reserved for high-priority system code whose function is the
manipulation of critical system queues and data areas. It is usually combined with
supervisor state and key 0 in the PSW, and assures that the routine running disabled
is able to complete its function before losing control. It is restricted to just a
few modules in MVS (for example, interrupt handlers, the dispatcher, and
programs holding a global spin lock).
Physically disabled mode is used for one of two reasons:
1. To assure that data remains static while the code is referencing or updating the
data.
2. To assure that non-reentrant code does not lose control while performing
critical system functions. For example, IOS must run disabled while enqueueing
and dequeueing requests to UCBs and while updating UCBs at the start and end
of I/O operations.

In the MVS system, physical disablement on a system basis because of MP must
be accompanied by locking in order to guarantee serialization. MVS disabled code
is also always accompanied by either a global spin lock or code executing under a
"super bit". The "super bits" are located in each processor's PSA (X'228').
They are used primarily for recovery reasons - they allow RTM to recognize that
a disabled supervisory function was in control at the time of error even though
global locks were not held. This indicates that FRR recovery processing should
be initiated by RTM.
Note that type 1 SVCs do not execute disabled in MVS. Instead they are
entered with the local lock. Thus they are considered to be task mode physically
enabled, holding the local lock.

Locked Mode
Locked mode describes code executing in the system while owning a lock. (See
the chapter on "Locking" later in this section.) A lock can be requested during any
execution mode (SRB, TCB, physically disabled).
Status saving while in a locked mode requires unique considerations from the
system. An example is a program that invokes a type 1 SVC, such as EXCP
or WAIT, that executes in locked mode. When a type 1 SVC is enabled, it
can be interrupted. However, if the SVC is interrupted, the registers cannot be
saved in the TCB because it is being used to save registers active at the time of the
SVC request for return to the requestor. Therefore, status must be saved elsewhere.
For programs executing in locked mode, status is saved according to the
conditions surrounding the programs, as follows:

Locally locked task is interrupted. A new area, the interrupt handler save
area (IHSA), has been defined in MVS to contain the status when a locally locked
task is interrupted. The IHSA is found from the ASCB + X'6C', which points to
the ASXB; the ASXB + X'20' points to the IHSA.
Locally locked SRB is interrupted. When locally locked SRBs are interrupted,
there is no problem because SRBs are non-preemptable. The registers and PSW are
saved in the LCCA. When the system has handled the interrupt, the SLIHs return
to the FLIHs, the status is restored from the LCCA, and control is returned to the
interrupted SRB routine.
Locally locked SRB is suspended. Locally locked SRBs that are suspended must
have their status saved in a unique area. The process that suspends an SRB is
responsible for obtaining an SSRB (suspended SRB), which will contain the
interrupted status and will also serve as the control block used to reschedule the
service routine once the reason for suspension has been resolved. See "Locating
Status Information in a Storage Dump" later in this chapter for a detailed
description of how to find these SSRBs.


Determining Execution Mode from a Stand-alone Dump
Knowing the system's execution mode at the time a stand-alone dump was taken is
important in analyzing a disabled coded wait state or a loop. The following areas
may help determine the mode of execution:
LCCA Indicators - There are two bytes of important dispatcher flags at
LCCA + X'21C'. At location X'21D', the LCCADSRW flag is
turned on just prior to any LPSW (Load PSW) for a global
SRB, a local SRB, or task dispatch. For a global SRB, the
LCCAGSRB and LCCASRBM flags are also set on. For a
local SRB, only the LCCASRBM flag is set on in addition to
LCCADSRW.
PSA Indicators
• Super Bits - Flags in the supervisor control word located at PSA +
X'228' indicate whether the dump was taken while
in one of the interrupt handlers or the dispatcher.
• Recovery Stack - If the first two words of the RTM stack vector table
(PSA + X'380') are not equal, then control is in one of the
interrupt handlers or the dispatcher. Compare the address
at PSA + X'380' with each entry in the FRR stack vector
table starting at PSA + X'384' to determine the owner of the
active stack. (See the chapter on "Use of Recovery Work
Areas for Problem Analysis" later in this section for stack
vector table analysis.)
• Current Work - PSA + X'218' contains the addresses of the new TCB, old
TCB, new ASCB, and old ASCB consecutively in a four-word
area. If the system is in SRB mode, the address of the old
TCB equals 0. If the addresses of the new and old ASCBs are
not equal, then the stand-alone dump was taken between the
time that an address space switch was requested and the time
the dispatcher dispatched an address space or a global SRB was
dispatched. In all cases, the old TCB and ASCB indicate the
current work.
• Locks - The PSA also contains the lock indicators. (See the chapter on
"Locking" later in this section for a description of how to
determine the lock mode.)

ASCB Indicators - The following ASCB locations help determine execution
mode:
X'1C'     Address of the local service priority list,
          which contains SRBs queued for dispatching.
X'66-67'  RCT flags.
X'72-73'  Non-dispatchability flags.
X'76'     Count of SRBs dispatched in this address
          space.
X'7C'     Number of ready TCBs in this address space.
X'80'     Local lock (see the chapter on "Locking"
          later in this section for how to interpret this
          field when not 0).
X'84'     Address of the SRB suspend queue for
          unavailable local lock requestors.

Keep in mind that mixed modes frequently occur. For
example, a local SRB can obtain a lock, be interrupted, and
the stand-alone dump taken while disabled in the I/O
supervisor. Depending on the system mode at the time of
the interrupt, a task's status (registers, PSW, etc.) can be saved
in one of several places.
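The indicators above combine into a rough first-pass mode determination (a sketch; the note labels are informal, and mixed modes mean these notes are hints rather than a verdict):

```python
def first_pass_mode(old_tcb, new_ascb, old_ascb, psasuper, stack_v0, stack_v1):
    """Rough execution-mode notes from the PSA indicators described above.

    old_tcb            -- old TCB address from the PSA + X'218' area (0 => SRB mode)
    new_ascb, old_ascb -- new/old ASCB addresses from the same area
    psasuper           -- super bits word at PSA + X'228'
    stack_v0, stack_v1 -- first two words of the stack vector at PSA + X'380'
    """
    notes = ["SRB mode" if old_tcb == 0 else "task mode"]
    if psasuper:
        notes.append("in an interrupt handler or the dispatcher (super bit on)")
    if stack_v0 != stack_v1:
        notes.append("a non-normal FRR stack is active")
    if new_ascb != old_ascb:
        notes.append("an address space switch was pending")
    return notes
```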

Locating Status Information in a Storage Dump
Status information is located in a storage dump depending on the conditions
under which it was saved.

• Task and SRB Mode Interruptions: Status saving is required whenever the
code gives up control, whether voluntarily or involuntarily. Initial status
is saved by the first level interrupt handler (FLIH) as follows:
SVC FLIH (task mode only) - Initially:
registers saved at LCCA+X'380' (LCCASGPR)
Then for Type 1 and Type 4 SVCs:
registers moved to TCB+X'30' (TCBGRS)
PSW moved from PSA to requestor's RB
Then for Type 2, 3, and 4 SVCs:
Registers moved to SVRB
PSW moved from PSA to requestor's RB
I/O FLIH - Initially:
registers saved at LCCA+X'1C0' (LCCAGPGR)
PSW saved at LCCA+X'200' (LCCAIOPS)
Then for unlocked tasks:
registers moved to TCB
PSW moved to RB

Section 2: Important Considerations Unique to MVS

For locked tasks (CMS or local):
registers moved to IHSA
(ASCB+X'6C' -> ASXB; ASXB+X'20' -> IHSA)
PSW moved to IHSA
For SRBs:

registers remain in LCCA
PSW remains in LCCA

External FLIH - Initially:
registers saved at LCCA+X'A0' (LCCAXGR1)

Then for recursion purposes:
registers moved to LCCA+X'E0' (LCCAXGR2)
PSW is in PSA+X'240' (PSAEXPS1)

If first recursion:
registers moved from LCCA+X'A0' (LCCAXGR1)
to LCCA+X'120' (LCCAXGR3)
PSW is in PSA+X'248' (PSAEXPS2)

If second recursion:
registers moved to LCCA+X'A0' (LCCAXGR1),
where they stay
PSW is in PSA+X'18' (FLCEOPSW)

Note: Subsequent status manipulation for tasks and SRBs is the same as for the
I/O FLIH (that is, the movement from LCCA to TCB or IHSA is identical).

Program check - Initially:
registers saved at LCCA+8 (LCCAPGR1)

Then:
registers moved to LCCA+X'48' (LCCAPGR2)
PSW is in LCCA+X'88' (LCCAPPSW)

For page faults that require I/O the following occurs:
Unlocked tasks:

registers moved to TCB
PSW moved to RB

Locked tasks:

registers moved to IHSA
PSW moved to IHSA

SRBs:

Are suspended: see "SRB Suspension" later in this
chapter.

Note: For SRB code, status is not moved from the LCCA save areas. SRBs are
non-preemptable and are given control back immediately, with the
status being restored from the LCCA.
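The pointer chain described above for locked tasks (ASCB to ASXB to IHSA) can be followed mechanically against a dump. The following is a minimal sketch in Python that treats the dump as a byte buffer; the helper names and the fabricated addresses are illustrative, not from the manual:

```python
import struct

def fullword(dump, addr):
    """Fetch a big-endian fullword from a simulated dump buffer."""
    return struct.unpack_from(">I", dump, addr)[0]

def ihsa_address(dump, ascb_addr):
    """Follow ASCB+X'6C' -> ASXB, then ASXB+X'20' -> IHSA, where a
    locked task's registers and PSW are saved."""
    asxb_addr = fullword(dump, ascb_addr + 0x6C)
    return fullword(dump, asxb_addr + 0x20)

# Fabricated mini-dump: ASCB at X'1000', ASXB at X'2000', IHSA at X'3000'.
dump = bytearray(0x4000)
struct.pack_into(">I", dump, 0x1000 + 0x6C, 0x2000)
struct.pack_into(">I", dump, 0x2000 + 0x20, 0x3000)
print(hex(ihsa_address(dump, 0x1000)))  # 0x3000
```

The same two-fetch pattern applies whenever a control block anchor (here the ASCB) leads to a save area through an intermediate block.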

• Locally Locked Task Suspension: Status saving is the same as for locked task
interruptions (described earlier under "I/O FLIH") except that the IHSA also
contains the floating point registers, the FRR stacks, and the PSW. The
ASCBLOCK field is updated to contain X'7FFFFFFF'.

OS/VS2 System Programming Library: MVS Diagnostic Techniques

System Execution Modes and Status Saving (continued)

• SRB Suspension: An SRB can be suspended in two cases. If a service routine
encounters a page fault and a page-in is required, then the SRB routine must
give up control. In that event, an SSRB (suspended SRB) must be obtained and
the status saved in that control block. Then the SSRB is queued from the page
control block (PCB) in the real storage manager. When the paging I/O
completes, the SSRB is re-queued to the local service priority list (LSPL)
where it is found later by the dispatcher. The SSRB must be obtained
because the original SRB was not retained after the dispatch. Status saved in an
SSRB must include the current FRR stack.
The second case of SRB suspension is an unconditional request for an
unavailable lock. Status saving for SRB suspension for a lock differs from the
page fault case, where the SSRB is queued and control returns after the
redispatch of the SSRB. For a request for a local lock that is unavailable, the
SSRB is queued from the ASCB. For a request for an unavailable CMS lock,
the SSRB is queued on the CMS suspend queue header. (For more detail see the
chapter on "Locking" later in this section.) In both cases of SRB suspension,
resumption is at the appropriate entry in the lock manager to try to
acquire the lock. Upon release of the CMS lock by the holder, any SSRBs are
rescheduled. Upon release of the local lock by the holder, the first SSRB that
was suspended is given the local lock and rescheduled.
Suspend SRB queues can be summarized:

Page Faults
PCB is chained from PVTCIOQF (at PVT+X'75C') for a common area page
and from RSMLIOQ (at RSMHD+X'24') for a private area page.
PCB+X'1C' points to SSRB.
Local Lock Requests
SSRB is queued from ASCBLSQH (ASCB+X'84').
CMS Lock Requests

SSRB is queued from the CMS SRB suspend queue in IEAVESLA, as
shown:

PSALITA (PSA + X'2FC') -> LIT; LIT + 0 -> IEAVESLA

IEAVESLA:
   +0      DISP LOCK
   +4      SALLOC LOCK
   +8      SRM LOCK
   +X'C'   00000000
   +X'10'  CMS LOCK           (CMS lockword)
   +X'14'  CMS SUSPEND Q HDR  (queue header for SRBs and ASCBs
                               suspended for CMS)


Locking

Serialization of resources to provide data integrity and protection is a necessary
function of operating systems. In pre-MVS systems, resource serialization was
accomplished by physical disablement and by the ENQ/DEQ component. Physical
disablement controls only one processor and thus, in MP systems, does not
guarantee serialization.
To meet these serialization requirements, the locking facility provides:
• Serialization in a tightly-coupled MP system
• Serialization across address spaces for common resources
• Serialization within address spaces
A central lock manager acquires and maintains all locks. Use of the lock
manager is restricted to key 0 programs running in supervisor state, which prevents
unauthorized problem programs from interfering with the serialization process.
The lock manager is located in the nucleus in CSECT IEAVELK.

Classes of Locks
MVS locks are divided into two classes:

• Global Locks, which protect serially reusable resources related to more than
one address space. These resources provide system-wide services or use
control information in the common area. Examples of resources protected by
global locks are UCBs and dispatcher control blocks.

• Local Locks, which protect serially reusable resources assigned to a particular
address space. When a task or SRB holds a local lock, the queues and control
blocks serialized by that lock can be used only by the task or SRB holding the
lock.

Figure 2-1 defines the MVS locks. All MVS locks, except the local lock, are
global locks.


Name      Description

DISP      Global dispatcher lock - serializes all functions associated with the
          dispatching queues.

ASM       Auxiliary storage management lock - serializes the auxiliary storage
          resources.

SALLOC    Space allocation lock - serializes real storage management (RSM)
          resources, virtual storage management (VSM) global resources, and
          some auxiliary storage management (ASM) resources.

IOSYNCH   I/O supervisor synchronization lock - serializes the IOS purge function
          and other IOS resources.

IOSCAT    IOS channel availability table lock - serializes the IOS processor-
          related save area.

IOSUCB    IOS unit control block lock - serializes access and updates to the unit
          control blocks. There is one lock per UCB.

IOSLCH    IOS logical channel queue lock - serializes access and updates to the
          IOS logical channel queues. There is one lock per channel queue.

SRM       System resources manager lock - serializes use of the SRM control
          blocks and associated data.

CMS       Cross memory services lock - serializes on more than one address space
          where this serialization is not provided by one or more of the other
          global locks. Provides global serialization when enablement is required.

LOCAL     Local storage lock - serializes functions and storage within a local
          address space. There is one lock per address space.

Note: Locks are listed in hierarchical order, with DISP being the highest lock in the
hierarchy.

Figure 2-1. Definition and Hierarchy of MVS Locks

Types of Locks
Two types of locks exist. The type determines what happens when a processor
makes an unconditional request for a lock that is unavailable. The types are:

• Spin locks - prevent the requesting processor from doing any work until the
lock is cleared by the other processor. The requesting processor enters a loop
in the lock manager (IEAVELK) that keeps testing the lock until the other
processor releases it. As soon as the resource is free, the first processor can
obtain the resource and continue processing.

• Suspend locks - prevent the requesting program from doing work until the
lock is available, but allow the processor to continue doing other work. The
request is queued by suspending the requesting task or SRB, and the requesting
processor is dispatched to do other work. Upon release of the lock, the highest
priority queued requestor is given control of the lock, except in the case of the
local lock. Upon release of the local lock, the first SSRB that was suspended is
given the lock and rescheduled.
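The spin-lock protocol can be pictured using the lockword convention described later in this chapter (X'00000000' when free, X'40' plus the processor number when held). The following Python model is a rough sketch; the function names and the bounded loop are invented for illustration (a real processor spins indefinitely while disabled):

```python
def try_acquire(lockword, cpu):
    """Conditional request: returns (acquired, new_lockword).
    Models a compare-and-swap of 0 -> X'40'+cpu."""
    if lockword == 0x00000000:
        return True, 0x40 + cpu
    return False, lockword

def spin_acquire(lockword_cell, cpu, max_spins=1000):
    """Unconditional request: loop, as IEAVELK does, until the
    other processor clears the lockword."""
    for _ in range(max_spins):
        ok, new = try_acquire(lockword_cell[0], cpu)
        if ok:
            lockword_cell[0] = new
            return True
    return False  # real MVS would still be spinning at this point

cell = [0x00000000]
print(spin_acquire(cell, cpu=1), hex(cell[0]))  # True 0x41
```

A suspend lock differs only in what happens on failure: instead of looping, the requestor's status is saved (TCB/IHSA or SSRB) and the processor is redispatched.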


Combining classes and types of locks provides three categories of locks:

Global Spin Lock, which is used primarily to provide serialization in MP systems.
While code is executing under a global spin lock, it is physically disabled. An
unconditional request for an unavailable lock causes the processor to spin in the
lock manager. Upon release of the global spin lock, the looping processor acquires
ownership and returns control to the requestor.
The global spin locks supported by MVS are: DISP, SALLOC, ASM, IOSYNCH,
IOSCAT, IOSUCB, IOSLCH, and SRM.

Local Suspend Lock, which is used to serialize resources within an address space.
There is one local suspend lock per address space and it is located in the ASCB.
An unconditional request for the local lock when it is not available causes the
suspension of the requesting task or SRB until the lock is released.
Global Suspend Lock, which is used to serialize resources that are commonly
addressable from any address space. The requestor remains physically enabled
while owning the lock. The CMS (cross memory services) lock is the only
supported global suspend lock. The local lock must be held in order to obtain
the CMS lock. An unconditional request for the CMS lock when it is unavailable
causes suspension of the requesting task or SRB.

Locking Hierarchy
To prevent a deadlock between processors, MVS locks are arranged in a hierarchy,
and a processor may unconditionally request only locks higher in the hierarchy
than locks that it currently holds. The locking hierarchy is the order in which the
locks are listed in Figure 2-1, with DISP being the highest lock in the hierarchy.
Some locks are single system locks (for example, DISP), and some locks are
multiple locks in which there is more than one lock within the lock level (for
example, IOSUCB). For those global lock levels that have more than one lock, a
processor may only hold one lock of each level. For example, if a processor holds
an IOSUCB lock, it may not request a different IOSUCB lock.
The local lock must be held by the caller when requesting the CMS lock. Also,
the local lock cannot be released while holding the CMS lock.
It is not necessary to obtain all locks in the hierarchy up to the highest lock
needed. Only the needed locks have to be obtained, but in hierarchical sequence.
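The hierarchy rule lends itself to a simple table-driven check. Below is a minimal sketch, with the hierarchy taken from Figure 2-1 (lowest to highest); the function name is invented for illustration:

```python
# Hierarchy from Figure 2-1, listed lowest to highest.
HIERARCHY = ["LOCAL", "CMS", "SRM", "IOSLCH", "IOSUCB",
             "IOSCAT", "IOSYNCH", "SALLOC", "ASM", "DISP"]

def may_request_unconditionally(held, wanted):
    """True only if 'wanted' is higher in the hierarchy than every
    lock currently held. Because equal rank is not higher, this also
    rejects a second lock of the same multiple-lock level (e.g. a
    different IOSUCB lock while one IOSUCB lock is held)."""
    w = HIERARCHY.index(wanted)
    return all(HIERARCHY.index(h) < w for h in held)

print(may_request_unconditionally({"LOCAL"}, "CMS"))      # True
print(may_request_unconditionally({"CMS"}, "LOCAL"))      # False
print(may_request_unconditionally({"IOSUCB"}, "IOSUCB"))  # False
```

Skipping levels is permitted, as the text notes: holding only LOCAL, a routine may request SALLOC directly, provided any intervening locks it later needs are acquired in hierarchical sequence.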


Determining Which Locks Are Held On a Processor
To diagnose certain MVS problems, such as wait states and performance
degradation, it is necessary to determine the lock status of the system as well as
the back-up of work caused by lock contention.
Locks held by a particular processor are indicated in the processor's PSA
(prefixed save area). There is a bit map in the PSA which the lock manager
checks when a request is made for a lock. This map is called PSAHLHI (PSA
highest lock held indicator). Each bit corresponds to a particular lock in the
hierarchy. The bits are in the same order as the hierarchy so that the low-order bit
corresponds to the lowest lock in the lock hierarchy. When a bit is on, it means
that lock is held by the corresponding processor. Figure 2-2 shows the bit
assignments.
(Note: When a holder of a CMS or local lock is suspended, the corresponding bit in
the PSAHLHI field is reset to 0 even though the lock is still held.)
PSAHLHI (location X'2F8' in PSA)

Lock            Byte X'2F8'   Byte X'2FA'
DISP            10            00
ASM             08            00
SALLOC          04            00
IOSYNCH         02            00
IOSCAT          01            00
IOSUCB          00            80
IOSLCH          00            40
(not assigned)  00            20
(not assigned)  00            10
(not assigned)  00            08
SRM             00            04
CMS             00            02
LOCAL           00            01

Figure 2-2. Bit Map to Show Locks Held on a Processor



Content of Lockwords
Each lock is represented by a lockword that defines the availability and status of
the lock. The contents of lockwords differ according to the type of lock they
describe:
Global Spin Lockword

X'00000000' - Lock is available.
X'00000040' - Lock is held on processor 0.
X'00000041' - Lock is held on processor 1.
Global Suspend Lockword (CMS Lock)

X'00000000' - Lock is available.
X'OOxxxxxx' - ASCB address of owner of lock. If an address space owned the
CMS lock but was interrupted or suspended, the ASCBCMSH flag
in ASCBFLG 1 is turned on and the CMS lock-held bit in
PSAHLHI is turned off until the address space is redispatched.
The ASCB address remains in the CMS lock until it is released.
Local Suspend Lockword (Local Lock)

X'00000000' - Lock is available.
X'00000040' - Lock is held on processor 0.
X'00000041' - Lock is held on processor 1.
X'7FFFFFFF' - Task or SRB suspended while holding the lock. The reason for
suspension is either a page fault or an unconditional request for
the CMS lock while it was unavailable.
X'FFFFFFFF' - Task or SRB holding the local lock was suspended or interrupted
but is now dispatchable. The reasons for this state are:
• A page fault has been resolved for a locked task or SRB.
• The CMS lock, at one time unavailable, is now available.
• A higher priority address space was given control over this
locked task.
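The local lockword values listed above can be classified mechanically when scanning ASCBs in a dump. A minimal sketch (the function name is invented; the processor-number arithmetic assumes the X'40'+n convention shown above):

```python
def local_lockword_state(word):
    """Interpret the local lockword (ASCB+X'80') per the values above."""
    if word == 0x00000000:
        return "available"
    if word == 0x7FFFFFFF:
        return "holder suspended (page fault or wait for the CMS lock)"
    if word == 0xFFFFFFFF:
        return "holder was suspended or interrupted, now dispatchable"
    return "held on processor %d" % (word - 0x40)

print(local_lockword_state(0x00000041))  # held on processor 1
print(local_lockword_state(0x7FFFFFFF))  # holder suspended (...)
```

A nonzero value in any address space's ASCBLOCK is a signal to also inspect the suspend queue anchored at ASCB+X'84', as described under "Results of Requests for Unavailable Locks" later in this chapter.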

How To Find Lockwords
Lockwords for single system locks are located in a table called IEAVESLA
(PSA + X'2FC' points to the lock interface table (LIT); LIT + 0 points to
IEAVESLA). They can also be located at the label IEAVESLA in a NUCMAP.
Lockwords for multiple system locks are supplied by the requestor of the lock.
The addresses of these are placed in the PSA for each processor at locations
X'284' to X'298'.


The locations of all the lockwords are shown in Figure 2-3. Note that all
lockwords must reside in fixed common storage.

                                                                  Location of
                                                                  Address of
Lock                                                              Lock (when
Name      Class    Type      Number of Locks      Location of Lock      actually held)

DISP      Global   Spin      1                    IEAVESLA+0*
ASM       Global   Spin      1 per ASID           ASMHD+X'14'
SALLOC    Global   Spin      1                    IEAVESLA+4*
IOSYNCH   Global   Spin      1                    IOCOM+X'38'           PSA+X'28C'
IOSCAT    Global   Spin      1                    IOCOM+X'30'           PSA+X'290'
IOSUCB    Global   Spin      1 per UCB            UCB-8                 PSA+X'294'
IOSLCH    Global   Spin      1 per LCH            LCH+8                 PSA+X'298'
SRM       Global   Spin      1                    IEAVESLA+8*
CMS       Global   Suspend   1                    IEAVESLA+X'10'*
LOCAL     Local    Suspend   1 per address space  ASCB+X'80'            PSA+X'284'

*PSA+X'2FC' points to the lock interface table; the lock interface table +0 points
to IEAVESLA.

Figure 2-3. Classification and Location of Locks


Results of Requests for Unavailable Locks
Global Spin Locks - An unconditional request for a global spin lock results in a
disabled loop in IEAVELK. In this case, register 11
contains the address of the requested lock and register 14
contains the address of the requestor.

Local Locks - Tasks requesting an unavailable local lock are suspended. In
each case, the request block old PSW (RBOPSW) is set
to re-enter the lock manager, and the registers are saved in
the TCB. Note: The dispatcher will not dispatch any task
in the address space other than the holder of the lock until
the lock is released.
SRBs requesting an unavailable local lock are suspended.
In each case, the lock manager obtains an SSRB and places
the GPRs and the current FRR stack there.

Notes:
1. The FRR stack can be used to help recreate the process
leading up to the point of suspension by interpreting the
recovery routines that are currently active. SSRBs for
local lock suspensions can be found by inspecting the
local lock suspend queue anchored in the ASCB from
field ASCBLSQH (ASCB+X'84'). SSRBs are obtained
from SQA (SP 245). SSRBs on the local lock suspend
queue are chained together at SRB+4.
2. When interrogating a given address space, if the
ASCBLOCK field is not 0, check the ASCBLSQH to
determine the SRB work being delayed in this address
space because of lock contention.
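Note 2 above amounts to walking the suspend queue anchored at ASCBLSQH. The sketch below does this against a simulated dump buffer; the helper names and fabricated addresses are illustrative, but the offsets (ASCB+X'84' anchor, forward chain at SSRB+4) are those given in the notes:

```python
import struct

def fullword(dump, addr):
    """Fetch a big-endian fullword from a simulated dump buffer."""
    return struct.unpack_from(">I", dump, addr)[0]

def local_lock_suspend_queue(dump, ascb_addr):
    """Collect SSRB addresses queued from ASCBLSQH (ASCB+X'84'),
    chained through the word at SSRB+4; the chain ends at 0."""
    ssrbs, cur = [], fullword(dump, ascb_addr + 0x84)
    while cur != 0:
        ssrbs.append(cur)
        cur = fullword(dump, cur + 4)
    return ssrbs

# Fabricated dump: ASCB at X'1000'; two SSRBs at X'5000' and X'5100'.
dump = bytearray(0x6000)
struct.pack_into(">I", dump, 0x1000 + 0x84, 0x5000)
struct.pack_into(">I", dump, 0x5000 + 4, 0x5100)  # SSRB+4 -> next SSRB
print([hex(a) for a in local_lock_suspend_queue(dump, 0x1000)])
```

Each SSRB found this way carries the suspended routine's GPRs and FRR stack, which (per Note 1) can be interpreted to reconstruct the process leading up to the suspension.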
CMS Lock

Tasks unconditionally requesting the CMS lock when it is
unavailable are suspended. For each task:
• GPRs are saved in the IHSA, which is pointed to from
ASXB + X'20'.

• The resume PSW is set to re-enter the lock manager.

• The ASCB is queued on the CMS suspend queue. (The
first element of the CMS suspend queue is anchored in
CSECT IEAVESLA + X'14'; this anchor points to either
an SSRB or an ASCB which is suspended for the CMS
lock. There is only one queue for suspended CMS lock
requesters.) Note: When a NUCMAP is not available,
locate IEAVESLA through PSA + X'2FC', which
contains the address of the lock interface table; the
lock interface table + X'0' contains the address of
IEAVESLA.

The address spaces suspended on the CMS lock are
represented by the ASCBs on the CMS suspend queue.
The ASCBs are chained together at the field ASCBCMSF
(forward pointer).
Note: When an ASCB is on the CMS suspend queue, the
ASCBLOCK contains X'7FFFFFFF'.
When the CMS lock is released, the ASCBLOCK is
changed to X'FFFFFFFF', which indicates that work was
interrupted but it is now ready to be resumed.
SRBs unconditionally requesting the CMS lock when it
is unavailable are suspended. For each SRB, the lock
manager:
• Obtains an SSRB from SQA
• Saves GPRs and the FRR stack in the SSRB
• Sets ASCBLOCK to X'7FFFFFFF'
• Chains the SSRB on the CMS SRB suspend queue
located in IEAVESLA (IEAVESLA + X'14')

Note: Since there is only one queue for suspended CMS
lock requesters, the SSRBs and ASCBs are chained on the
CMS suspend queue using either ASCBCMSF (ASCB +
X'C') or SRBFLNK (SSRB + 4). There are no backward
pointers. Thus the CMS suspend queue could have the
following appearance:
PSALITA (PSA+X'2FC') -> LIT; LIT + 0 -> IEAVESLA

IEAVESLA:
   +0      DISP LOCK
   +4      SALLOC LOCK
   +8      SRM LOCK
   +X'10'  CMS LOCK
   +X'14'  CMS SUSPEND Q HDR

CMS SUSPEND Q HDR -> SSRB -> SSRB -> ASCB -> ASCB -> 0
                     (+4)    (+4)    (+C)    (+C)

Each SSRB points to the next element through SSRB + 4 (SRBFLNK), and
each ASCB points to the next element through ASCB + X'C' (ASCBCMSF).

Use of Recovery Work Areas For Problem Analysis

Recovery processing, which is unique to MVS, enhances the reliability of the
operating system. When an error occurs, "active recovery" is given control, one
routine at a time, in an attempt to isolate the error to a unit of work. Recovery
terminates that work instead of the entire operating system and then continues
normal system operation. This process occurs whether the error is in the system or
an application.
Because system operation is not halted at the point of error, the resulting storage
dumps represent system status sometime after the original error(s). Often the
system can encounter numerous errors, fully recover, and continue. At other times
it can be a recovery failure that causes the system to cease operations and to take
a stand-alone dump. In either case, the obvious problem and its associated tracks
have been covered over. This makes the back-tracking process extremely difficult.
However, experience has shown that although recovery causes this difficulty, it
can very often provide valuable clues for the problem analyst. This chapter points
out important recovery areas and explains how they can be used in the debugging
process.

CAUTION: Recovery is not designed to aid the problem solver; it is designed as
a means by which the system can prevent total loss. Because recovery maintains
system status information, its work areas often provide the same information to
the analyst. However, once recovery is invoked, the system is in a tenuous
position; it is attempting to maintain operation despite an error. It is possible that
the recovery process itself can encounter the same error or bad data. Most often
this is not the case; the system does recover and continues normal operation.
But the possibility of recursive errors in the recovery process does exist, in which
case the new error becomes of prime consideration. If you are dependent on
internal recovery control blocks and queues, be aware of this possibility. Don't
get caught following a chain of blocks left by some subsequent or unrelated
problem in the belief that it will help your own error-finding efforts. This danger
is most prevalent when you use recovery work areas without following the normal
work-related debugging techniques. Do not immediately use the RTM2 work area
without analyzing the Task/RB structure and associated indicators.
The following work areas should be used carefully and only after traditional
techniques have failed. The exceptions to this rule are:
• When the dump is taken as a result of a trap (for example, SLIP) and the analyst
understands that the current status at the time of error can only be found by
using the recovery save areas.
• When there are problems in the recovery process itself.
In other instances, be aware of the total environment so that what you discover
in these areas bears some relationship to the problem you are analyzing. These
areas are of great importance if used with understanding.


SYS1.LOGREC Analysis
For effective problem analysis, use the information in SYS1.LOGREC
to understand the error history of the system. Because of recovery
processing, MVS does not halt operation when an error occurs. Dump analysis
must be performed using a snapshot of storage as it appears sometime after the
error and recovery have occurred; therefore, some type of recording mechanism is
needed in order to trace the error.
The entries in SYS1.LOGREC provide information about a potential problem.
This is the most informative data about the error that you receive. The
SYS1.LOGREC entries serve as a diagnostic trace of the problem encountered by
the operating system; they usually provide a history of events leading up to a
system incident. Use this information to understand system problems, the recovery
actions that are taken as a result of these problems, and the outcome of the
recovery attempt.
Often more than one record exists for the same software incident. You must
be able to relate these records in the proper sequence and understand the progress
of recovery the various records indicate. Knowing the errors that have occurred
since the last IPL helps you understand the system behavior and explains
your findings at dump analysis time.
In stand-alone dump analysis you should always inspect the in-storage LOGREC
buffer for entries that recovery routines have made but which were not written to
the SYS1.LOGREC data set because of a system problem. Very often it is these
records that are the key to the problem solution. (There is a discussion of
LOGREC buffer analysis later in this chapter.)
Information that is written by recovery routines to the SYS1.LOGREC data set
is used primarily to monitor incidents both when retry is attempted and when
percolation to the next recovery routine takes place.
Generally, functional recovery routines (FRRs) will write a SYS1.LOGREC
record whenever they are entered. The default for ESTAE routines, however, is to
not write a record. This means that unless the ESTAE routine specifically requests
recording, no SYS1.LOGREC record will be built.

Listing the SYS1.LOGREC Data Set
To get a listing of the SYS1.LOGREC data set, use the IFCEREP1 service aid as
described in OS/VS Environmental Recording Editing and Printing (EREP)
Program. (The JCL required to print the SYS1.LOGREC data set is contained in
the chapter "Additional Data Gathering" later in this section. It is important to
obtain both an event history and a full report. The event history (EVENT=Y
parameter on the EXEC statement) prints an abstract for all records in
chronological order. This allows the analyst to recreate the sequence of events.)
IFCEREP1 formats the standard area, the first X'194' bytes of each SDWA, into
a series of titles, each followed by pertinent data found in the standard area.
IFCEREP1 will put the variable area, the last X'6C' bytes of each SDWA, in an
alphameric or hexadecimal format, whichever is specified. This variable area is
used by the recovery routines to construct messages and to provide data that often
contains valuable debugging information.
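The SDWA layout just described can be sketched as follows. This is a minimal model, not IFCEREP1 itself; the 12-bit extraction assumes the conventional abend-code packing of the fullword at SDWA+X'4' (flag byte, 12-bit system code, 12-bit user code), which matches the X'000E3000' value visible in the record in Figure 2-4:

```python
import struct

STANDARD_LEN = 0x194  # standard area, formatted by IFCEREP1 into titles
VARIABLE_LEN = 0x6C   # variable area, recorded by the recovery routine

def system_completion_code(sdwa):
    """Extract the system completion code from the fullword at SDWA+X'4'
    (e.g. X'000E3000' -> X'0E3')."""
    return (struct.unpack_from(">I", sdwa, 4)[0] >> 12) & 0xFFF

def variable_area(sdwa):
    """The last X'6C' bytes, printed alphameric or hex by IFCEREP1."""
    return sdwa[STANDARD_LEN:STANDARD_LEN + VARIABLE_LEN]

# Fabricated SDWA carrying a 0E3 abend code, as in Figure 2-4.
sdwa = bytearray(STANDARD_LEN + VARIABLE_LEN)
struct.pack_into(">I", sdwa, 4, 0x000E3000)
print(hex(system_completion_code(sdwa)))  # 0xe3
```

This is the same extraction you perform by eye when reading the unformatted SDWA hex at the bottom of a LOGREC record.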
There are five different types of software incidents for which the failure is
written to SYS1.LOGREC. They are:
1. ABEND (SVC 13)
2. Invalid SVC
3. MCH software recovery attempt
4. Program check
5. Restart key depressed

SYS1.LOGREC Records
This section contains examples and explanations of three different types of error
records that you can obtain from SYS1.LOGREC.

SYS1.LOGREC Software Incident Record 1
Figure 2-4 is an example of the data that is recorded in SYS1.LOGREC when a
software (source) entry is recorded as the result of an SVC 13. The following
explanations are called out by Notes A-E in the example:
Note A: The CSECT name is IDAVBPPI; it can be found in module IDDWI.
IDAVBPRI is the FRR that processed the error under consideration.
The EC PSW indicates that SVC D was issued at location X'F4BB64'.
Note B: Approximately midway into the formatted record shown, you find
more specific information about why this particular LOGREC entry was
made. Note B points out three bits that reflect the status of the system
at the time this failure was detected:
• SVC was issued by a locked or SRB routine.
• Logically disabled (physically disabled, locked, or SRB) routine
was in control.
• Type 1 SVC routine was in control.
Note C: For this LOGREC record there is no formatted entry for the system
completion code. Only a portion of the recorded software incidents are
assigned a system completion code. A system completion code can be
found at X'04' bytes into the SDWA, which can be found unformatted
at the bottom of the record. Also, if the cause of a record is an abend
SVC, a completion code is contained in register 1 under "Regs at time
of error". The system completion code for this failure is 0E3.


Figure 2-4. SYS1.LOGREC Software Incident Record 1 (Parts 1 and 2)

[The printed figure is a two-page IFCEREP1-formatted LOGREC record that is
not legible in this copy. It shows a software (SVC 13) record entry, including
the ERRORID, jobname, failing module/CSECT and FRR names, the EC PSW and
registers at the time of error, machine check and lock indicators, the
lockwords, the dump characteristics (SDATA/PDATA options and address ranges),
and a hexadecimal dump of the SDWA, with callouts for Notes A through E
discussed in the text.]
Use of Recovery Work Areas For Problem Analysis (continued)
Given the name of the module involved in the error, you can determine
the id of the failing component by using the "Module Summary" section
of the Debugging Handbook. This summary also names the corresponding
PLM for each component. Component microfiche numbers are found in
the "Component Summary" section of the Debugging Handbook, Volume 1.
Note D: The "Diagnostic Aids" section of the OS/VS2 VIO Logic describes the
diagnostic output for module IDAVBPPI. It explains that the recovery
routine sets starting and ending addresses for the DSPCT header and
the BUFC in the SDWADPSL field of the SDWA. A diagnostic message
is then built in the variable recording area in the SDWA (at X'194'). This
message is formatted in the LOGREC record under the 'User Variable
EBCDIC Data' field, just above the unformatted SDWA. (Also see Note P.)
Note E: The entries in the 'Dump Characteristics' section of this LOGREC record
reflect the SDATA and PDATA options specified by the recovery routine
for a SYSABEND, SYSMDUMP, or SYSUDUMP. All recovery routines
can specify exactly what portions of storage are dumped. In addition, the
recovery routines can specify a list of storage ranges that are to be
dumped. In the dump for the failure in this example, the only area of
storage displayed would be the SWA. A range of addresses would also be
included. Range 1 is from CA7B28 to CA7D38; range 2 is from CA1F08
to CA1F2C.
In summary, from studying this LOGREC entry you find that
the module IDAVBPPI has detected an error and has issued a 0E3
ABEND, with a return code of 4. At the time of the failure, the system
was logically disabled and a type 1 SVC was in control. SVC 13 was
issued while the system was logically disabled, which is why the LOGREC
entry was written. A functional recovery routine, module IDAVBPRI,
was given control and tried to recover from the error. It was unsuccessful,
so it dumped the scheduler work area (SWA) and two sections of storage
where the DSPCT and an important parameter list were located. The
module then percolated to the next higher FRR in the stack. Note that
the 'Recovery Return Code' field (SDWA X'FC') = 00; this indicates
percolation. A code of '04' indicates that retry was requested.

SYS1.LOGREC Software Incident Record 2
Figure 2-5 is another example of the type of data recorded in
SYS1.LOGREC when a software incident occurs. Compare this example with
record 1 in order to understand the different types of information that you can
obtain from SYS1.LOGREC.
First compare the time stamp at the top of this record with that in record 1.
These times are either identical or just a fraction of a second apart whenever
the system is percolating through FRRs.

2.4.6

OS/VS2 System Programming Library: MVS Diagnostic Techniques

Figure 2-5. SYS1.LOGREC Software Incident Record 2 (Part 2 of 2): record header fields, status bits, dump characteristics, user variable EBCDIC data, and the hex dump of the record

The following explanations refer to Notes F-I in Figure 2-5.
Note F: Look at the status bits that appear approximately midway through the
example. One additional bit has been turned on in this entry that was off
in record 1. This indicates that this routine received control through
percolation (SDWA + X'EA' = X'10'). This indicator (that is, SDWA +
X'EA') is not set when recovery processing goes from the last active FRR
to the current ESTAE.
Note G: The name of the FRR in this example is IDDWIFRR. The completion
code and the register contents are the same as in record 1.
Note H: Look at the 'User Variable EBCDIC Data' field. This area gives the
location of two more control blocks that can be used in determining
exactly what failed. These two control blocks are:
• IOB, located at address X'CB2B00'
• VDSCB, located at address X'CA7E70'
Note I: Compare the IOB and VDSCB addresses to the ranges specified in the
'Dump Ranges Area'. The VDSCB does not fall in one of these ranges
because it is part of the SWA. Using the 'Diagnostic Aids' section of
OS/VS2 VIO Logic, you can identify the other two dump ranges that are
printed. Included in these ranges are the current channel program and
the DEB.

Section 2: Important Considerations Unique to MVS

2.4.9


SYS1.LOGREC Software Incident Record 3
Figure 2-6 illustrates a SYS1.LOGREC software (source) entry that has been
recorded as a result of a program check (type). The following explanations refer to
Notes J-N in Figure 2-6.
Note J: Because there is no completion code in register 1 for this type of entry,
look for the completion code in SDWA+X'4' (in this case OC9).
Note K: Check the status bits. They confirm the fact that the failure was a
program check that occurred while an enabled RB was in control.
Note L: The 'Dump Characteristics' bits are on only if the functional recovery
routine issues a SETRP macro with the DUMP=YES operand. This macro
uses the SDWA to contain its dump options and these are the fields
formatted in the LOGREC entry. Functional recovery routines can also
take dumps by issuing the SDUMP macro. The SDUMP macro uses a
different area for its dump options. You might receive a dump of certain
failures even though the LOGREC 'Dump Characteristics' are zeros.
Check the byte at displacement X'4' into the SDWA. This flag is turned
on if a dump was requested by a SETRP, CALLRTM, or ABEND macro.
As a general rule, ESTAE routines are the most common users of the
DUMP and DUMPOPT operands of the SETRP macro. Since the OC9
abend code in this LOGREC entry was for a problem program (an enabled
RB in control), a dump would also be taken if the job had a SYSUDUMP,
SYSMDUMP, or SYSABEND DD statement in its JCL.
Note M: There is a dump associated with this failure because location SDWA+4
(X'80') indicates a request for a dump. This can be seen from the
unformatted record.
Note N: For this entry, the data in the variable recording area (at X'194' under
'Hex Dump of Record') is not formatted under 'User Variable EBCDIC
Data'. This data is formatted by specifying an option (see Note P) in the
individual recovery routine.
Note P: The two bytes at SDWA + X'190' specify the length of the variable
recording area that starts at X'194'.
In the two bytes at SDWA + X'192': the first byte specifies how the
routine wants its data in the variable recording area printed (X'80' for
unformatted hexadecimal, X'40' for hexadecimal and formatted under
'User Variable EBCDIC Data'); the second byte gives the length of the
data. It is often helpful while reading LOGREC entries to refer to the
SDWA layout in the Debugging Handbook for additional information
about individual bit settings.
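When scanning a raw SDWA programmatically, the Note P fields can be decoded with a short helper. The sketch below (Python) assumes a flat byte image of the SDWA; the function name is illustrative and not part of any IBM-supplied tool:

```python
import struct

def parse_variable_recording_area(sdwa: bytes):
    """Decode the SDWA variable recording area described in Note P.

    SDWA + X'190' (2 bytes): length of the variable recording area
    starting at X'194'.  At X'192', the first byte says how the data is
    printed (X'80' = unformatted hex, X'40' = also formatted under
    'User Variable EBCDIC Data'); the second byte is the data length.
    """
    area_len = struct.unpack_from(">H", sdwa, 0x190)[0]
    fmt_byte, data_len = sdwa[0x192], sdwa[0x193]
    data = sdwa[0x194:0x194 + data_len]
    formatted = bool(fmt_byte & 0x40)  # printed under 'User Variable EBCDIC Data'?
    return area_len, formatted, data
```
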


Figure 2-6. SYS1.LOGREC Software Incident Record 3 (Part 2 of 2): record header fields, PSWs and registers at the time of error, lock status, dump characteristics, and the hex dump of the record


Important Considerations About SYS1.LOGREC Records
As shown in the three incident records, LOGREC records are mostly SDWAs that
the system supplies, plus variable user data areas that the individual recovery
routines supply.
Following are some special considerations pertaining to specific portions of
LOGREC entries:

• Jobname - If the jobname is "NONE-FRR", this indicates that the record was
generated by an SRB's FRR (functional recovery routine) or the current
ASCB was invalid.

• "BC mode PSW at Time of Error, of Last RB" - You can ignore these fields.
• "EC PSW from EST AE RB (0 for ESTAI)" - This field has the following
possible meanings:
a. If the ESTAE is associated with an RB level other than the one encountering
the error, this is the PSW at the time that the RB level associated with the
ESTAE last gave up control. Note: If this is the case, the "RB of ESTAE
Not in Control" flag should also be set.
If the ESTAE is associated with the RB level in error, the PSW is equal
to the "EC PSW at Time of ABEND" because the last time the RB level gave
up control was when the error occurred.

b. If the record was generated by an FRR, this is the PSW used to pass control
to the FRR and is therefore the address of the FRR.
c. If the record was generated by an FRR (that is, a locked/disabled routine is in
control, or the system is in SRB mode), and the "EC PSW at Time of
ABEND" is equal to the EC PSW from ESTAE RB, this is a system-generated
record.
• "Regs of RB Level of ESTAE Exit or Zero for EST AI":
a. If the EST AE exit is associated with the RB level that encountered the error,
these registers are the same as "Regs at Time of Error".
b. If the EST AE is associated with an RB level other than the one encountering
the error, then these are the registers at the time that RB last gave up control.


c. If this is an FRR-generated record, the two sets of registers are identical.
However, if the FRR or ESTAE has updated the registers for retry, these
registers are the new, updated registers.
• "SVC by Locked or SRB Routine" - This indicator can be misleading.
A forced SVC 13, which is often the way FRR-protected code passes control to
recovery, also causes this flag to be set if the SVC occurred in locked,
disabled, or SRB mode. Although the flag is set, this situation is not a key
error indication in itself. The analyst must investigate why the issuing routine
invoked SVC 13.
• Error Identifier
This field, as described in recovery termination management (Section 5),
contains pertinent information regarding the error described by this
SYSl.LOGREC entry, and provides a correlation to other SYSl.LOGREC
entries. Related software and MCH records have the same sequence (SEQ)
number that allows the correlation of records written in a particular recovery
path (that is, FRR and/or ESTAE percolation, or MCH and subsequent software
entries). For locked, disabled, or SRB routines, the processor identifier (CPU)
indicates the processor on which the routine was running when it encountered
an error. A zero processor identifier indicates that the record was written by an
ESTAE routine (that is, the processor identifier is not uniquely identifiable
because the ESTAE routine may be executing on a processor other than the
mainline). ASID indicates the current ASID at the time of the error. TIME
indicates the time that the ERRORID was generated. It is normally very close
to the time that the record was written, as indicated in the first line of the
record. TIME can be used to chronologically order related SYS1.LOGREC
entries that contain the same SEQ number. This ordering is useful in
reconstructing the environment as it was at the time of the error.
If an SVC dump is taken, the ERRORID, as it appears in the SYS1.LOGREC
record, will also appear in the SVC dump output and associated IEA911I
message. Do not be concerned if the ERRORID sequence numbers seem to
have an increment of more than one. Although the RTM adds one
to the sequence number of each unique entry (not percolation or recursion),
there may be no associated recording of the error; thus, the sequence number
is updated internally but is not always externally written.

As shown above, the SYS1.LOGREC data set is a vital tool in debugging. At
times, the information in the LOGREC printout can be used to describe the
entire problem situation. A search of RETAIN for the CSECT, FRR, and abend
code will often identify the problem as a known one.

SYS1.LOGREC Recording Control Buffer
This is one of the most important areas to be used when analyzing problems in
MVS. The previous discussion of LOGREC records analysis generally applies to the
in-storage LOGREC buffer as well.



This buffer serves as the intermediate storage location for error data after
recovery processing has completed but before the data reaches
SYS1.LOGREC. The physical I/O is done from this buffer. Its real significance
is in the error history it displays. Also, any records in the buffer that have not
reached SYSl.LOGREC are almost certainly related to the problem you are trying
to solve.

Formatting the LOGREC Buffer
The in-storage LOGREC buffer can be formatted by specifying the LOGDATA verb
under AMDPRDMP. This verb causes the entries still in the buffer to be formatted
in the same manner as those printed from SYS1.LOGREC. For detailed information
on how to invoke the AMDPRDMP service aid, see OS/VS2 SPL: Service Aids.


Finding the LOGREC Recording Control Buffer
The CVT + X'23C' points to the RTCT (recovery termination control table);
and RTCT + X'20' points to the RTMRCB (LOGREC recording control buffer).
The buffer always resides in SQA on a page boundary, is 4K bytes in length, and is
generally located just beyond the trace table. Scanning the EBCDIC portion of
the dump following the trace table usually leads you to a series of module/job
names that are part of the individual records.
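The pointer chain above can be followed mechanically when you have a dump in hand. A minimal sketch (Python) is shown below; it assumes a flat storage image in which virtual addresses map directly to offsets, which is a simplification (a real dump reader must translate addresses through its own storage map), and the helper names are hypothetical:

```python
import struct

def fetch_word(storage: bytes, addr: int) -> int:
    """Fetch a big-endian fullword at a given address in a flat image."""
    return struct.unpack_from(">I", storage, addr)[0]

def find_logrec_buffer(storage: bytes, cvt_addr: int) -> int:
    """Follow CVT + X'23C' -> RTCT, then RTCT + X'20' -> RTMRCB."""
    rtct = fetch_word(storage, cvt_addr + 0x23C)
    return fetch_word(storage, rtct + 0x20)
```
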

Format of the LOGREC Recording Control Buffer
The LOGREC recording control buffer is a "wrap-table" similar to the MVS trace
table. The entries are variable in size. The latest entries are the most significant,
especially if they have not yet been written to SYS1.LOGREC. Knowing the areas
of the system that have encountered errors and the actions of their associated
recovery routines (information obtained from SYS1.LOGREC and the LOGREC
recording control buffer) helps provide an overall understanding of the
environment you are about to investigate. Figure 2-7 shows the format of the
buffer and Figure 2-8 shows the format of individual records within the buffer.



Buffer layout (offsets into the buffer):

+0      RCBBUFB - start of record area
+4      RCBBUFE - end of record area
+8      RCBFREE - next available space
+C      RCBFLNG - number of bytes available
+10     RCBDUM - dummy displacement
X'40'   SRB used to post the recording task in the master address space
        in order to write the record to SYS1.LOGREC
X'50'   Missing record header - this record shows the number of times
        space was requested but was not available. If the record contains
        a counter or is present in SYS1.LOGREC, you have a good
        indication of a recovery loop.
X'58'   Processor serial number
X'59'   LCNT - missing record count
X'5E'   FLGS - SRB-in-use flag
        RCBTLNG - total buffer length
X'60'   = first possible record header

Figure 2-7. Format of the LOGREC Recording Control Buffer
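Given the header fields of Figure 2-7, a quick summary of the buffer's state can be computed. A hedged sketch (Python) follows; treating LCNT as a one-byte counter and RCBFLNG as the count of free bytes within the record area are assumptions:

```python
import struct

def rcb_summary(buf: bytes) -> dict:
    """Summarize the recording control buffer header (per Figure 2-7).

    Offsets: RCBBUFB (+0), RCBBUFE (+4), RCBFREE (+8), RCBFLNG (+C),
    and the missing-record count LCNT at +X'59'.
    """
    bufb, bufe, free, flng = struct.unpack_from(">4I", buf, 0)
    return {
        "record_area": (bufb, bufe),          # RCBBUFB / RCBBUFE
        "next_free": free,                    # RCBFREE
        "bytes_available": flng,              # RCBFLNG
        "bytes_in_use": (bufe - bufb) - flng, # assumes flng counts free bytes
        "missed_records": buf[0x59],          # nonzero hints at a recovery loop
    }
```
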

Record header layout (offsets into the record):

+0      length of record (2 bytes)
+2      record type
+3      options
+8      ASID for POST
+C      ECB
+10     reserved
        followed by the actual record

Record type flags:
X'80'   This record wraps around from the end of the buffer space back
        through the beginning.
X'40'   This record is to go to SYS1.LOGREC.
X'20'   This record is a WTO.

Option flags:
X'08'   Record not buffered; the address of the record exists at X'10'.
X'04'   The recording requestor is to be posted when the record is written.
X'01'   Record is ready to be written. If not set, the record is still being
        constructed.

Note: The beginning of the actual record + X'20' is the start of the SDWA for software
records. The SDWA contains software diagnostic information at the time of the error
and is mapped in the Debugging Handbook.

Figure 2-8. Format of Records Within the LOGREC Recording Control Buffer



FRR Stacks
The FRR (functional recovery routine) stacks are often useful for understanding
the latest processes on the processors. Entries are added and deleted dynamically as
processing occurs. The PSA + X'380' contains the pointer to the current stack.
The format is described in the Data Areas section of the Debugging Handbook under
FRRs. Experience has shown that the normal stack (located at X'C00' in each
PSA) is perhaps the most useful, although all stacks have been beneficial on
occasion.
The FRR stack + X'C' points to the current recovery stack entry. (Unless the
FRR stack + X'C' matches FRR stack + 0, in which case no recovery is present on
the stack.) This entry + 0 points to the recovery routine that is to gain control in
case of error. The entry + 4 contains flags used for RTM processing; a X'80'
indicates this FRR is currently in control, a X'40' indicates a nested FRR is
currently in control. The next 24 bytes serve as a work area for the mainline
function associated with the FRR pointed to by this entry. This parameter area
may contain footprints useful to your debugging efforts. The previous entry in the
stack (X'20' bytes in front of the current) represents the next most current
recovery routine. Only the current and previous entries are valid. The stacks do
contain residual information associated with recovery that was previously active but
is no longer valid. You should not rely on any information beyond the current
entry.
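The current-entry arithmetic described above can be sketched as follows (Python, over a flat storage image where virtual addresses equal offsets, an assumption that real dump addressing does not satisfy; the helper name is illustrative):

```python
import struct

def current_frr_entry(storage: bytes, stack_addr: int):
    """Return (routine_addr, flags_byte) for the current FRR stack entry.

    FRR stack + X'C' points to the current entry; if it matches the word
    at FRR stack + 0, no recovery is present on the stack.  Entry + 0 is
    the recovery routine address; entry + 4 holds RTM flags (X'80' = FRR
    in control, X'40' = nested FRR in control).
    """
    word = lambda a: struct.unpack_from(">I", storage, a)[0]
    cur = word(stack_addr + 0xC)
    if cur == word(stack_addr):
        return None                     # empty stack
    return word(cur), storage[cur + 4]
```
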
Also consider the case where:
A gains control and establishes recovery;
A passes control to B;
B establishes recovery, performs its function, deletes recovery, and passes
control to C;
C establishes recovery and subsequently encounters an error.
The FRR stack will contain entries for module A's and C's recovery routines.
There is no indication from the FRR stack that B was ever involved in the process
although it might have contributed to or even caused the error. The debugger gains
an insight into the process but is not presented with the exact flow. Although you
can get an idea of the general process or flow, do not make assumptions based
solely on the FRR stack contents.
If you have trapped a specific problem, the stacks often contain valuable
information. The same is true of a stand-alone dump taken because of a suspected
loop. If RTIW + 0 (at FRR stack + X'10') is not zero, the FRR stack contains
current, valid data. Following are some of the more valuable fields in the FRR
stacks from a debugging viewpoint:


1. FRR stack + X'10' - RTM1 work area (RTIW)
In the case of an error, the RTIW + 2 (FRR stack + X'12') field indicates the
error type as follows:

1  - program check
2  - restart key
3  - SVC error (SVC was issued while in locked, disabled, or SRB mode)
4  - DAT error
5  - machine check
10 - paging I/O error
11 - abnormal termination
12 - branch entry to abnormal termination (compatibility interface)
13 - cross memory abnormal termination
15 - memory termination
20 - MCH (machine check handler)

2. RTIW + X'34' (FRR stack + X'44') - address of system diagnostic work area
(SDWA)
If no pointers can be found, the SDWA for each supervisor FRR stack can be
found at X'20' bytes past the start of the last entry in the respective stack.
(FRR + 4 points to the last entry.) The SDWA for disabled errors on the normal
stack is at X'330' bytes past the start of the last entry on the restart stack.
(PSA + X'3B8' points to the restart stack.)
3. RTIW + X'40' (FRR stack + X'50') - mode at entry to RTM1

X'80' - supervisor control mode (PSASUPER not zero)
X'40' - physically disabled mode
X'20' - global spin lock held
X'10' - global suspend lock held
X'08' - local lock held
X'04' - type 1 SVC mode
X'02' - SRB mode
X'01' - unlocked task mode

This is the system mode at the time of entry to RTM1. The mode may
change as processing continues through recovery; the current mode is at RTIW
+ X'41' (FRR stack + X'51').
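Decoding the mode byte by hand is error-prone; a small table-driven helper makes the bits readable. A sketch (Python) using the bit assignments listed above:

```python
# Bit assignments for the RTIW + X'40' mode byte, in high-to-low order.
MODE_BITS = [
    (0x80, "supervisor control mode"),
    (0x40, "physically disabled"),
    (0x20, "global spin lock held"),
    (0x10, "global suspend lock held"),
    (0x08, "local lock held"),
    (0x04, "type 1 SVC mode"),
    (0x02, "SRB mode"),
    (0x01, "unlocked task mode"),
]

def decode_rtm1_mode(mode_byte: int) -> list:
    """Expand the RTIW + X'40' mode byte into readable mode names."""
    return [name for bit, name in MODE_BITS if mode_byte & bit]
```
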



Extended Error Descriptor (EED)
The extended error descriptor (EED) passes error information between RTM1 and
RTM2 and also between successive schedules of RTM1. The EED address is found
at RTIW + X'3C' (FRR stack + X'4C'), at TCBRTM12 (TCB + X'104'), or in the
RTM2 SVRB at X'7C'. The EED is generally not present because RTM2 releases
it early in its processing. The EED is described in the Debugging Handbook as part
of the RTIW. Important EED fields are:
EED + 0 - pointer to next EED

EED + 4 (byte 0) - description of contents of the rest of the EED:
1 - software EED
2 - dump parameters
3 - hardware EED
4 - errorid EED

For a software EED:
EED + X'C'  - registers 0-15
EED + X'4C' - PSW/Instruction Length Code (ILC)/Translation Exception
Address (TEA) at time of error
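For a software EED, the fields above can be pulled out with a short parser. This Python sketch assumes a flat byte image of the EED and treats the PSW as the 8 bytes at X'4C'; the function name is illustrative:

```python
import struct

EED_TYPES = {1: "software", 2: "dump parameters", 3: "hardware", 4: "errorid"}

def parse_software_eed(eed: bytes) -> dict:
    """Pull the error registers and PSW out of a software EED image.

    Offsets per the text: +0 next-EED pointer, +4 (byte 0) content type,
    +X'C' registers 0-15, +X'4C' the error PSW.
    """
    next_eed = struct.unpack_from(">I", eed, 0)[0]
    kind = EED_TYPES.get(eed[4], "unknown")
    regs = list(struct.unpack_from(">16I", eed, 0xC))
    psw = eed[0x4C:0x54]
    return {"next": next_eed, "kind": kind, "regs": regs, "psw": psw}
```
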

RTM2 Work Area (RTM2WA)
This is the work area used by RTM2 to control abend processing. Registers,
PSW, abend code, etc. at the time of the error are recorded in the RTM2WA.
This area is often useful for debugging purposes and is described in the Debugging
Handbook under RTM2WA. This work area can be found through TCB + X'E0',
or RTM2 SVRB + X'80'.

Formatted RTM Control Blocks

RTM control blocks are formatted either by AMDPRDMP as a TCB exit with the
FORMAT, PRINT CURRENT, and PRINT JOBNAMES control statements, or with
the ERR option under SNAP/ABEND. With the exception of the RTCT, the
formatted control blocks are all TCB-related, and are formatted only when they are
associated with the TCB. The formatted control blocks are:
• RTCT (recovery termination control table) - formatted with the first TCB of
the current address space on the processor on which the dump was initiated.
(This control block is formatted only by AMDPRDMP.)
• FRRS (functional recovery routine stack) - has the RTIW embedded within
it and is formatted with the current TCB if the local lock is held. (This control
block is formatted only by AMDPRDMP and it is mutually exclusive of the IHSA.)


• IHSA (interrupt handler save area) - has the FRR stack saved within it and is
formatted with the TCB pointed to by the IHSA, if the address space was
interrupted or suspended while the TCB was holding the local lock. (This
control block is formatted only by AMDPRDMP and it is mutually exclusive of
the FRRS.)
• RTM2WA (RTM2 work area) - formatted if the TCB pointer to it is not zero.
• ESA (extended save area of the SVRB) bit summary - formatted only if the
RTM2WA formatted successfully and the related SVRB could be located.
• SDWA (system diagnostic work area) - formats the registers at the time of error
only if the ESA formatted successfully and the SDWA could be located.
• EED (extended error descriptor block) - formatted if the TCB or RTIW pointer
to it is not zero.
• SCB (STAE control block) - formatted under AMDPRDMP for abend tasks
only. It is formatted under SNAP/ABEND whenever the TCB pointer to it is
not zero.

System Diagnostic Work Area (SDWA) Use in RTM2
This work area is used to pass information to ESTAE recovery routines. It is
found by: SVRB + X'80' points to RTM2WA; RTM2WA + X'D4' points to the
SDWA. Also, register 1 contains the address of the SDWA when the recovery
routines are entered.


Effects Of Multiprocessing On Problem Analysis

The multiprocessing (MP) capability of MVS allows two processors to share real storage using one control program. (MP refers to multiprocessing on both multiprocessors and attached processors.) MVS also functions on a uniprocessor configuration, which may be only one processor configured out of what is otherwise an MP system. In MP mode, each processor has addressability to all of main storage and executes under the control of one set of supervisor routines. Because various queue structures must be processed in a serial fashion, interlocking facilities are implemented in both the hardware and software to allow serialization of portions of the control program where conflicts may arise. Queue structures that don't require serialization are processed in parallel, that is, without regard to the other processor.

Features of an MP Environment
The main features of a multiprocessing configuration are:
PSA - Each processor has a unique real storage frame, called a prefixed save area (PSA), referenced with addresses from 0 to 4K. Its location in real storage is determined by the processor's prefix register.
Inter-Processor Communication - Malfunction alerts (MFA) are automatically generated by failing processors before entering the check-stop state. Other inter-processor signaling is accomplished with the SIGP instruction. (This feature is discussed in detail later in this chapter.)
VARY Command - Performs three functions: (1) dynamically adds or removes a processor from the configuration; (2) dynamically increases or decreases the amount of usable real storage; (3) controls the availability of channels and devices.
QUIESCE Command - Quiesces the system so that I/O pools or two-channel switches or both can be reconfigured.
Locking - Access to various supervisory services is serialized by means of a
software locking structure.
Dispatching - Assures that highest-priority ready work is processed by available
processors.
PTLB (purge translation lookaside buffer) - When an entry is to be invalidated
in a page or segment table, the translation lookaside buffer (TLB) on every
processor must be purged before permitting subsequent references to the
corresponding virtual address.
Timing - The TOD clocks must be synchronized among the configured processors.

Section 2: Important Considerations Unique to MVS

RMS - When components of the hardware system fail, it becomes the responsibility of the recovery management support (RMS) to help define the extent of the damage.
Compare and Swap - Two instructions assure interlocked update operations. They
are Compare and Swap (CS) and Compare Double and Swap (CDS). References to
storage for these instructions are interlocked the same way as the Test and Set (TS)
instruction.
IOS - IOS has the ability to initiate I/O activity to a device from whichever processor has an available path.
ACR - When one processor fails in an MP configuration, the alternate CPU recovery (ACR) function attempts to take the failing processor offline so that system operation can continue with the remaining processor. (See "Miscellaneous Debugging Hints.")
CPU Affinity - The ability to force a job step to execute on a particular processor is a feature of MVS. (For example, because an emulator feature is generally installed on only one of the processors in an MP environment, processor affinity will force the execution of programs that require this feature to the proper processor.)

MP Dump Analysis
Experience with MVS has shown that there are comparatively few bugs unique to MP. Usually, problems encountered in an MP environment could also be discovered in a UP environment. The increased interaction (parallelism) between software components in an MP environment tends to increase the probability of hitting bugs that are not unique to MP. Thus, the odds are that the dump you are trying to debug could also occur on a UP configuration.
The first step of MP dump analysis is to determine conclusively that it is an MP dump. To do this, you must find the common system data area (CSD). The CSD address is located at offset X'294' in the CVT. The halfword CSDCPUOL, at offset X'A' in the CSD, gives the number of processors currently active. If this number is two, you are looking at an MP dump. For the rest of this discussion, we will assume that CSDCPUOL=2.
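This check can be mechanized when the dump is available as a byte buffer. In the sketch below, Python is used as executable pseudocode; the offsets X'294' and X'A' are from the text, while the buffer layout, the fabricated addresses, and the helper name active_processors are invented for illustration.

```python
CVT_CSD_PTR = 0x294    # CVT + X'294' -> CSD address (from the text)
CSD_CSDCPUOL = 0x0A    # halfword: number of processors currently active

def active_processors(dump, cvt):
    """Chase CVT -> CSD and return the CSDCPUOL halfword."""
    csd = int.from_bytes(dump[cvt + CVT_CSD_PTR:cvt + CVT_CSD_PTR + 4], "big")
    return int.from_bytes(dump[csd + CSD_CSDCPUOL:csd + CSD_CSDCPUOL + 2], "big")

# Fabricated dump: CVT at 0x400, CSD at 0x800, two processors online.
dump = bytearray(0x1000)
cvt = 0x400
dump[cvt + 0x294:cvt + 0x298] = (0x800).to_bytes(4, "big")   # CVT -> CSD
dump[0x800 + 0x0A:0x800 + 0x0C] = (2).to_bytes(2, "big")     # CSDCPUOL = 2
print("MP dump" if active_processors(dump, cvt) == 2 else "UP dump")
```

The same technique extends to the other CSD fields mentioned next, such as the CSDACR byte at offset X'16'.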
Several other fields in the CSD are informative. For example, the byte CSDACR, at offset X'16', indicates whether or not ACR is in progress. ACR in progress (X'FF' in CSDACR) indicates that one of the processors in the configuration is becoming inactive. If this is the case, the problem may be the result of a failure during ACR processing, and the MP dump will probably present at least two problems:

1. A hardware failure causing ACR to be invoked.
2. A failure during ACR processing. (See the discussion on ACR processing in the "Miscellaneous Debugging Hints" chapter later in this section.)


Data Areas Associated With the MP Environment
There are several processor-related areas with which you should be familiar:
1. The PCCA (physical configuration communication area)
2. The LCCA (logical configuration communication area)
3. The PSA (prefixed save area)
There is a set of these control blocks for each processor, located as follows:
CVT + X'2FC' points to the PCCAVT (contains the address of a PCCA for each processor)
CVT + X'300' points to the LCCAVT (contains the address of an LCCA for each processor)
PCCA + X'18' points to the virtual address of the PSA for that processor
PCCA + X'1C' points to the real address of the PSA for that processor
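The vector-table chase above can be sketched as follows. Python is used as executable pseudocode; the offsets X'2FC', X'300', and X'18' are from the text, while the assumption of one fullword slot per processor in each vector table, the fabricated dump layout, and the helper names are invented for illustration.

```python
def fetch(dump, addr):
    """Fetch a 4-byte big-endian pointer from the dump buffer."""
    return int.from_bytes(dump[addr:addr + 4], "big")

def per_processor_blocks(dump, cvt, cpu_count):
    """Yield (cpu, pcca, lcca, psa_virtual) for each configured processor."""
    pccavt = fetch(dump, cvt + 0x2FC)          # CVT + X'2FC' -> PCCAVT
    lccavt = fetch(dump, cvt + 0x300)          # CVT + X'300' -> LCCAVT
    for cpu in range(cpu_count):
        pcca = fetch(dump, pccavt + 4 * cpu)   # one fullword slot per processor
        lcca = fetch(dump, lccavt + 4 * cpu)
        psa = fetch(dump, pcca + 0x18)         # PCCA + X'18' -> PSA (virtual)
        yield cpu, pcca, lcca, psa

# Fabricated one-processor dump to exercise the chase.
dump = bytearray(0x2000)
cvt = 0x100
dump[cvt + 0x2FC:cvt + 0x300] = (0x500).to_bytes(4, "big")   # -> PCCAVT
dump[cvt + 0x300:cvt + 0x304] = (0x600).to_bytes(4, "big")   # -> LCCAVT
dump[0x500:0x504] = (0x1000).to_bytes(4, "big")              # PCCA for CPU 0
dump[0x600:0x604] = (0x1400).to_bytes(4, "big")              # LCCA for CPU 0
dump[0x1018:0x101C] = (0x1800).to_bytes(4, "big")            # PSA virtual addr
blocks = list(per_processor_blocks(dump, cvt, 1))
print(blocks)
```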
The PSA is the "low storage area" (first 4K bytes of storage) and it contains, among other things, the hardware-assigned storage locations. System/370 Principles of Operation details the prefixing mechanism the hardware uses to reassign a block of real storage for each processor to a different block in absolute main storage. Prefixing permits processors to share main storage and operate concurrently.
The PCCA contains information about the physical facilities associated with its processor; the LCCA contains save areas for use by the first level interrupt handlers (FLIHs). The need for processor-unique areas arises, for example, because external interrupts could occur simultaneously on each processor, and therefore a processor-related area must exist for status saving by the external FLIH. Such areas are in the processor's LCCA. After locating these control blocks, you can determine several things about the status of each processor:
• The PSWs at the time of the last program, I/O, SVC, external, and machine check interrupts for each processor (PSA)
• The general purpose registers at each interrupt (LCCA)
• The mode (SRB or task) of each processor (LCCA)
• The last program interrupt on each processor (PSA)
• The address of the device causing the last I/O interrupt on each processor (PSA)
In addition, a work/save area vector table (WSAVTC), pointed to at LCCA + X'218', is associated with each processor. This vector table contains pointers to processor-related work/save areas. For example, there is a large save area for use by ACR, which is pointed to in the processor's WSAVTC. It is important to be aware of the existence of these processor-related areas because GTF, SRM, ACR, IOS, etc., use them; but you must narrow your problem to one of these processes (such as GTF, SRM, etc.) before the information in the associated work/save areas becomes helpful.


Parallelism
The most important characteristic of the MVS MP capability is parallelism. In looking at MP dumps, you must always remember that any two processes can run in parallel and reference the same main storage locations. As a result, queue structures and common data areas are vulnerable. In order to preserve their integrity, the system must insure that they are accessed serially. The resources that must be serialized in order to guarantee their integrity are called serially reusable resources (SRRs). The use of shared resources is undoubtedly the key item to be kept in mind in debugging an MP dump. There are various mechanisms available for serializing SRRs:
• ENQ/DEQ
• WAIT/POST
• Disablement
• Locking
• Compare and Swap (CS) instruction
• Non-dispatchability
• Test and Set (TS) instruction
• RESERVE/RELEASE

Obviously all users of a particular SRR must use the same serialization mechanism. The integrity of an SRR is not enhanced if one user employs locking and another uses ENQ/DEQ. The point is to understand the processes going on in both processors at the time of the failure. The processor on which the failure occurred may not be the one that caused the problem.
Use of the work/save areas pointed to from the ASXB is a good example. These areas are serialized with the local lock. The following diagram shows what could happen if the same address space is running on both processors and one of the processes involved fails to serialize properly.
PROCESSOR 0                              PROCESSOR 1

     .                                   Gets local lock
     .                                   Branch enters validity check routine
Branch enters validity check routine          .
     .                                   Releases local lock
     .                                        .

In this example, assume that the process executing on processor 0 fails to get the local lock before it branch enters the system validity check routine. The validity check routine uses the local lock to serialize one of the save areas mentioned above in order to save the caller's registers. The registers saved by the validity check routine on processor 1 can be overlaid by the registers saved by the validity check routine on processor 0. Thus, the failure would be encountered on processor 1, but the processor 0 process would be the one that caused the failure.

OI/NI (OR Immediate and AND Immediate) instructions also illustrate this phenomenon. These instructions take more than one machine cycle to complete (that is, the operand is fetched, altered, and then stored). In previous operating systems, physical disablement and UP environments were enough to insure the completion of one instruction before another was executed. In MVS, with multiple processors, this is no longer true.
For example, suppose processor 0 issues OI and the operand has been fetched. Before processor 0 stores the changed byte, processor 1 executes the fetch cycle of an NI instruction to change a different bit in the same byte. Now, processor 0 stores the original status plus the OI change; subsequently the NI instruction completes, which erases the effect of the OI on the same byte. In MVS, locking is used to solve some of the problems arising from such multi-cycle instructions. When locking is not an appropriate solution, the CS instruction is. CS serializes the word containing the byte against the other processor. The point is that in debugging an MP dump, both processors must be considered because interaction between processes and shared resources is generally the key to solving the problem.
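The lost-update window, and the way a CS-style retry loop closes it, can be modeled deterministically. Python is used here as executable pseudocode; the bit values and the compare_and_swap helper are invented for illustration — the real CS is a single atomic hardware instruction, not a compare followed by a store.

```python
# Model of the OI/NI window described above. The shared byte starts with
# bit B'00000100' on; processor 0's OI sets B'10000000' and processor 1's
# NI turns B'00000100' off. Each instruction is modeled as fetch/alter/store.
byte = 0b0000_0100

p0_fetch = byte                   # processor 0: OI fetch cycle
p1_fetch = byte                   # processor 1: NI fetch cycle, before P0 stores
byte = p0_fetch | 0b1000_0000     # processor 0 stores the OI result
byte = p1_fetch & ~0b0000_0100    # processor 1's store erases the OI change
assert byte == 0b0000_0000        # processor 0's update is lost

# A CS-style loop closes the window: the store succeeds only if the operand
# is still what was fetched, so an interfering update forces a refetch.
def compare_and_swap(cell, old, new):
    if cell[0] == old:            # in hardware this compare-and-store is atomic
        cell[0] = new
        return True
    return False

cell = [0b0000_0100]
p0_old = cell[0]                                         # processor 0 fetches
compare_and_swap(cell, cell[0], cell[0] & ~0b0000_0100)  # processor 1 completes
assert not compare_and_swap(cell, p0_old, p0_old | 0b1000_0000)  # P0 must retry
p0_old = cell[0]
compare_and_swap(cell, p0_old, p0_old | 0b1000_0000)     # retry succeeds
assert cell[0] == 0b1000_0000     # both updates survive
```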
When a program serializes an SRR incorrectly, other programs can alter the SRR before the first program completes its update. The other programs may be running on the other processor, or they may have received control on the same processor because the first program was pre-empted (for example, SRB suspension because of a page fault) before completing its update. Proving that a problem resulted from incorrect serialization is accomplished by finding both the "other" program and the window (an interval in which a program opens a serialization exposure is called a window).
The system trace table can sometimes be used to find potential "other" programs. If the occurrence of the error has not been overlaid in the trace table, it may be possible to reconstruct the series of events leading up to the failure by:
1. Listing all events on that processor, in order, using the logical processor address field in each event's trace entry
2. Making a similar list of all of the events on the other processor
3. Comparing the two lists to see if the processes executing in parallel on the processors are altering a common resource
Try to relate these two processes to the serialization problem that caused the dump. The existence of the window is confirmed by reading the code that alters the state of the SRR and finding where the two programs serialize improperly.
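Steps 1 through 3 can be sketched as follows. Python is used as executable pseudocode; the trace-entry shape with seq, cpu, and resource fields is an invented stand-in for real trace entries, which would first have to be extracted from the dump.

```python
def split_by_processor(entries):
    """Steps 1 and 2: one time-ordered list of events per logical processor."""
    lists = {}
    for e in sorted(entries, key=lambda e: e["seq"]):
        lists.setdefault(e["cpu"], []).append(e)
    return lists

def common_resources(lists):
    """Step 3: resources touched by more than one processor in the window."""
    touched = {}
    for cpu, events in lists.items():
        for e in events:
            touched.setdefault(e["resource"], set()).add(cpu)
    return {r for r, cpus in touched.items() if len(cpus) > 1}

# Fabricated three-entry trace: both processors touch the same save area.
trace = [
    {"seq": 1, "cpu": 0, "resource": "save_area"},
    {"seq": 2, "cpu": 1, "resource": "save_area"},
    {"seq": 3, "cpu": 1, "resource": "tqe"},
]
print(common_resources(split_by_processor(trace)))
```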


General Hints For MP Dump Analysis
The following is a list of general hints to help you analyze an MP dump.
1. The use of PRIORITY and DPRTY parameters no longer ensures the order in which tasks are dispatched. First, the SRM, when attempting to handle resources, can allow a task or job with a lower DPRTY to run prior to a job with a higher priority. Second, as the dispatcher dispatches tasks to both processors, tasks of different priority may be executing on both processors simultaneously.
2. The CHAP (change priority) SVC does not ensure that tasks are dispatched in the expected order when dispatching on two processors.
3. Attached tasks can execute at the same time as the mother task on different processors. Therefore, if both tasks reference the same data, serialization of the data is required.
4. Any references made to system control blocks that change dynamically after IPL must be serialized to preserve the integrity of the data. The serialization technique for the data item must match that employed by the system.
5. Tasks can be redispatched on a different processor from the one on which they were previously operating. Therefore, do not use storage from 0-4K because redispatch on a different processor results in different data being referenced.
6. If subpools are shared between tasks, users must serialize the use of any data in the subpools common to the two tasks.
7. SRBs can be dispatched on either processor unless they are scheduled with affinity for a particular processor.
8. Asynchronous appendages can operate simultaneously with the task on the other processor.
9. Recovery routines can run on either processor, not necessarily the one on which the error was detected.
10. STATUS STOP does not prevent SRBs from being added to the local queue; it merely quiesces the address space after any currently executing or suspended SRBs have completed.
11. When access methods allow sharing of data sets between tasks in the same address space, access to the data sets must be serialized between the tasks.


Inter-Processor Communication
MVS uses the inter-processor communication OPC) function in doing its interprocessor related work. The IPC function uses the SIGP (Signal Processor) instruction to provide the necessary hardware interface between the MP-configured
processors. This instruction provides twelve distinct functions. Two of these
functions are augmented by the control program to request services of the other
processor; external call (XC) and emergency signal (EMS) which are SIGP codes
02 and 03, respectively. Thus, there are two classes of IPC services:

1. Direct - These services are defined for those control program functions that require the modification or sensing of the physical state of one of the processors. Ten of the twelve SIGP functions are defined as IPC direct services:

   Function                       Function Code
   sense                          01
   start                          04
   stop                           05
   restart                        06
   initial program reset          07
   program reset                  08
   stop and store status          09
   initial microprogram load      0A
   initial processor reset        0B
   processor reset                0C

   Note: Codes 0A, 0B, and 0C are not valid on a Model 158.
2. Remote - These services are defined for those control program functions that require the execution of a software function on one of the processors. The two remaining SIGP functions, external call (XC) and emergency signal (EMS), provide the hardware interface and interruption mechanism to initiate the desired program on the proper processor. The remote service function is provided in two categories:
   • Pendable service - uses the XC function of SIGP
   • Immediate service - uses the EMS function of SIGP
   When processor A issues a SIGP (XC or EMS) instruction to processor B, a request for an interrupt becomes pending in processor B for the external class. If external interrupts are disabled in the current PSW for processor B, the interrupt is not taken. If the PSW for processor B is enabled, then separate mask bits for XC and EMS are interrogated in control register 0. Interrupts are taken one at a time for those requests enabled in the control register. If processor B is disabled, processor B keeps pending at most one XC and one EMS request. Multiple XC requests can pend simultaneously. Each specific XC request is encoded in a physical configuration communication area (PCCA) buffer associated with the receiving processor.
Both the direct and remote services may be used to initiate the desired function on any of the processors physically attached via the MP feature, including the processor the request is initiated on.

Direct Services
The direct service function consists of a macro instruction (DSGNL) and a SIGP issuing routine (IEAVEDR). The DSGNL macro generates an in-line sequence of instructions that:
1. Loads general register 0 with one of the ten SIGP function codes used to perform the desired hardware action
2. Loads general register 1 with the address of the specified processor's physical configuration communication area (PCCA)
3. Loads general register 15 with the address of IEAVEDR
4. BALRs 14,15
Upon return from IEAVEDR, register 15 contains a return code indicating the status of the request. If the return code is 8, register 0 contains sense information about the receiving processor as shown in Figure 2-9.

Return Code of 8:

   Register 0
   Bit       Meaning
   0         Equipment check
   1-23      Reserved
   24        External call pending
   25        Stopped
   26        Operator intervening
   27        Check stop
   28        Not ready
   29        Reserved
   30        Invalid order
   31        Receiver check

The other return codes are:

   0 - SIGP instruction successfully initiated. The function is not
       necessarily completed upon return to the caller.
   4 - SIGP function not completed because the path to the addressed processor
       was busy or the addressed processor was in a state where it could not
       accept and respond to the function code.
   12 - Not operational, that is, the specified processor is either not
       installed or is not configured into the system or is powered off.
   16 - SIGP unsuccessful. Processor is a uniprocessor and does not have SIGP
       sending and receiving capabilities.

Figure 2-9. SIGP Return Codes
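Decoding the register 0 status bits by hand is error-prone because bit 0 is the leftmost bit of the word. The sketch below names every bit that is on; Python is used as executable pseudocode, the bit assignments are those of Figure 2-9, and the function name is invented for illustration.

```python
# Bit meanings for register 0 when the SIGP return code is 8 (Figure 2-9).
SIGP_STATUS_BITS = {
    0: "Equipment check",
    24: "External call pending",
    25: "Stopped",
    26: "Operator intervening",
    27: "Check stop",
    28: "Not ready",
    30: "Invalid order",
    31: "Receiver check",
}

def decode_sigp_status(reg0):
    """Name every status bit set in the 32-bit register 0 value."""
    return [name for bit, name in SIGP_STATUS_BITS.items()
            if reg0 & (1 << (31 - bit))]      # bit 0 is the leftmost bit

# A stopped processor with an external call pending:
print(decode_sigp_status(0x000000C0))
```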


Remote Pendable Services
The remote pendable services function (external call) consists of a macro instruction (RPSGNL) and a routine (IEAVERP) which are used to invoke the execution of a specified program on a specific processor. This service is used by supervisor state, zero protection key functions that are not dependent upon the completion of the specified service in order to continue their processing. The RPSGNL macro generates an in-line instruction sequence that:
1. Loads register 0 with a code identifying one of the services to be initiated
2. Loads register 1 with the address of the PCCA of the processor on which the service is to be initiated
3. Loads register 15 with the address of IEAVERP
4. BALRs 14,15

Upon return, register 15 contains a return code. If the return code is 8, register 0 contains sense information (see Figure 2-9). There are currently six functions that can be initiated via external call:
1. Switch - specifies that the service routine (IEAVEMS1) used by the memory/task switch function is to be executed.
2. SIO - specifies that the IOS start I/O routine (IECIPC) is to be executed on the specified processor.
3. RQCHECK - specifies that the timer supervisor TQE check service routine (IEAPRQCK) is to be executed. This routine ensures that the top TQE on the real-time queue is being timed.
4. GTFCRM - specifies that the GTF service routine (AHLSTCLS) that modifies the Monitor Call (MC) control registers is to be executed.
5. MODE - specifies that the recovery management services (RMS) service routine (IGFPEXI2) that modifies the RMS-oriented control registers is to be executed.
6. MFITCH - specifies that the MF/1 service routine (pointed to by CVT + X'320') is to be executed. This routine executes TCH (Test Channel) instructions on the processor to which the channels are attached.

The remote pendable services routine (IEAVERP) sets the appropriate code in the external call buffer of the receiving processor's PCCA (offset X'84') as follows:

   SWITCH     X'80'
   SIO        X'40'
   RQCHECK    X'20'
   GTFCRM     X'10'
   MODE       X'04'
   MFITCH     X'02'

Then IEAVERP sets the external call (XC) function code (X'02') in register 0 and uses the DSGNL macro instruction to cause the SIGP instruction to be issued. The receiving processor will take an external interrupt when it becomes enabled for such interrupts. The external FLIH determines that the interrupt was an XC and passes control to the XC SLIH. The XC SLIH locates the XC buffer (X'84') in its PCCA, determines the function requested, and branches (BAL) to the appropriate routine. Refer to Figure 2-10 for the XC process flow.
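Because each sub-function is a distinct bit, several XC requests can be pending in the buffer at once. The SLIH's scan-and-dispatch logic can be sketched as follows; Python is used as executable pseudocode, the bit codes and routine names are those given in the text, and the xc_slih helper and its return shape are invented for illustration (the real SLIH clears each bit with compare and swap and BALs to the routine).

```python
# Sub-function bits in the external call buffer (PCCA + X'84') and the
# routine each one drives, per the text.
XC_FUNCTIONS = [
    (0x80, "SWITCH",  "IEAVEMS1"),
    (0x40, "SIO",     "IECIPC"),
    (0x20, "RQCHECK", "IEAPRQCK"),
    (0x10, "GTFCRM",  "AHLSTCLS"),
    (0x04, "MODE",    "IGFPEXI2"),
    (0x02, "MFITCH",  "CVTMFRTR"),
]

def xc_slih(buffer_byte):
    """Model of the XC SLIH: turn each pending bit off, note the routine."""
    dispatched = []
    for mask, name, routine in XC_FUNCTIONS:
        if buffer_byte & mask:
            buffer_byte &= ~mask          # done with compare and swap for real
            dispatched.append((name, routine))
    return buffer_byte, dispatched

# SWITCH and RQCHECK pending at once; the buffer ends up clear.
print(xc_slih(0x80 | 0x20))
```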

Remote Immediate Services
The remote immediate services function consists of a macro instruction, RISGNL, and a routine, IEAVERI, which are used, like the remote pendable services, to cause the execution of a specified program on any of the online MP-configured processors. However, the immediate service differs from the pendable service in two important ways:
• The processors in an MP configuration are enabled for the emergency signal (EMS) interrupt at times when the processors are not enabled for the external call interrupt. In particular, EMS interrupts are enabled when the processor is in the "window spin" state in which all other asynchronous interrupts (except machine checks and malfunction alerts) are disabled. This "window spin" state is entered by a routine, such as the lock manager, when a point is reached in its processing that requires an action on the other processor in order for processing to continue. The "window spin" state specifically allows either the malfunction alert or EMS interrupts that are used to trigger the alternate CPU recovery (ACR) function to be accepted and processed.
• An immediate service routine can be requested to execute serially or in parallel with the function requesting the service. That is, IEAVERI will spin while waiting for the designated processor to signal either that the receiving routine has completed execution (serial) or that the receiving routine has been given control (parallel).

Some of the functions that can be initiated via EMS are:
• HIO - A Halt I/O command is issued to the designated device by the receiving processor.
• ACR Function - The receiving processor helps the sending processor recover from a failure by alternate CPU recovery procedures.
• Clock Synchronization - TOD clocks are adjusted so the same value is in each clock.
• PTLB - The receiving processor purges its translation lookaside buffer (TLB).

The remote immediate services macro, RISGNL, generates an in-line sequence of instructions that:
1. Loads register 0 with the PARALLEL/SERIAL indication
2. Loads register 1 with the address of the PCCA of the processor on which the service is to be executed
3. Loads register 11 with the address of a parameter list to be passed to the service routine
4. Loads register 12 with the entry point address of the service routine to be executed
5. Loads register 15 with the address of IEAVERI
6. BALRs 14,15

As for direct and remote pendable services, upon return register 15 contains a return code. Register 0 contains sense information in case the return code was eight. (See Figure 2-9.)
IEAVERI builds the emergency signal buffer in the sending processor's own PCCA at offset X'88', sets the EMS function code X'03' in register 0, and issues the DSGNL macro to cause the SIGP to be issued. The receiving processor will take an external interrupt when it becomes enabled. The external FLIH determines that the interrupt is an EMS and routes control to the EMS SLIH. The SLIH locates the EMS buffer of the sender and, for a parallel request, the SLIH turns off the parallel bit and calls the receiving routine. For a serial request, the receiving routine is given control, and, upon completion, the serial bit is turned off. During this interrupt handling process, the sending processor is in the window spin state until the serial or parallel bit is turned off. Figure 2-11 shows the EMS process flow.
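The serial handshake — the sender spins on a bit that only the receiving routine turns off — can be modeled with two threads. Python is used as executable pseudocode; the EmsBuffer class and the event standing in for the SIGP/EMS interrupt are invented for illustration.

```python
import threading
import time

class EmsBuffer:
    """Toy model of the EMS buffer: the sender spins until the receiver
    turns the serial bit off (the 'window spin' state)."""
    def __init__(self):
        self.serial_bit = False

def sender(buf, receiver_ready):
    buf.serial_bit = True          # build the buffer; mark the request serial
    receiver_ready.set()           # stand-in for SIGP: signal the receiver
    while buf.serial_bit:          # window spin until the routine completes
        time.sleep(0.001)

def receiver(buf, receiver_ready, log):
    receiver_ready.wait()                  # stand-in for the EMS interrupt
    log.append("receiving routine ran")    # serial: run the routine first ...
    buf.serial_bit = False                 # ... then turn the serial bit off

buf, ready, log = EmsBuffer(), threading.Event(), []
t = threading.Thread(target=receiver, args=(buf, ready, log))
t.start()
sender(buf, ready)
t.join()
print(log)
```

A parallel request differs only in that the bit is turned off before the receiving routine is called, so the sender resumes as soon as the routine has control.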


SENDING PROCESSOR

IEAVERP (invoked via the RPSGNL macro expansion)

Input registers:
   R0  - function code (X'80' SWITCH, X'40' SIO, X'20' RQCHECK,
         X'10' GTFCRM, X'04' MODE, X'02' MFITCH)
   R1  - receiving processor's PCCA address
   R14 - return address
   R15 - IEAVERP entry point

1. Disables (STOSM) external and I/O interrupts; sets up (see Note 1).
2. Is the receiving processor online? If not, returns.
3. Turns on the external call sub-function code in the external call buffer
   in the receiving processor's PCCA (compare and swap on).
4. Sets the external call function code, X'02', in register 0.
5. Issues DSGNL (0),(1).
6. Checks return codes. If R.C. = 8 and the status is external call pending,
   sets return code = 0. Restores the caller's status and returns to the
   caller.

IEAVEDR

Input registers:
   R0  - function code (X'02')
   R1  - receiving processor's PCCA address
   R14 - return address
   R15 - IEAVEDR entry point

1. Disables (STOSM) external and I/O interrupts; sets up (see Note 1).
2. Establishes the SIGP registers:
   a. Physical processor address: R2 = PCCACPUA, based on R1
   b. Parameter register: R1 = 0
   c. Function code: R3 = R0
   Issues SIGP R1,R2,0(R3).
3. Checks the condition code:
   CC2 - busy - retries (see Note 2)
   CC1 - equipment check, operator intervening, receiver check - retries
         within limits
   CC1 - all others - R.C. 8
   CC3 - R.C. 8 (see Note 3)
   CC0 - R.C. 0
4. Returns to IEAVERP.

Return registers:
   R0  - status bits
   R14 - return address
   R15 - return code

Note: R.C. 8 means status bits are set in register 0.

Figure 2-10. External Call (XC) Process Flow (Part 1 of 2)

RECEIVING PROCESSOR

External FLIH - determines that the interrupt is an external call.

Input registers:
   R2  - FLIH return address
   R10 - external call SLIH entry address

External Call SLIH

1. Turns on the active bit.
2. Locates the external call buffer (PSA -> PCCA).
3. If the buffer equals 0, returns to the FLIH.
4. Determines the subfunction requested (compare and swap the bit off) and
   BALs (register 14) to the appropriate routine:
   X'80' SWITCH   - IEAVEMS1
   X'40' SIO      - IECIPC
   X'20' RQCHECK  - IEAPRQCK
   X'10' GTFCRM   - AHLSTCLS
   X'08' RESERVED
   X'04' MODE     - IGFPEXI2
   X'02' MFITCH   - (routine pointed to by CVTMFRTR)
5. Turns off the active indicator and returns to the external FLIH (BR 2).

Notes:
1. Turns on the active indicator; saves the caller's registers; establishes
   addressability.
2. Disables/enables spin: (1) turns on the SPIN indicator; (2) enables for
   MFA and emergency signal interrupts; (3) disables; (4) turns off the SPIN
   indicator.
3. If CC = 3 and yet the processor is logically online, a SIGP hardware
   failure may exist. A "Soft ACR" option is available to the system
   operator to reconfigure to a UP system.

Figure 2-10. External Call (XC) Process Flow (Part 2 of 2)

SENDING PROCESSOR

IEAVERI (invoked via the RISGNL macro expansion)

Input registers:
   R0  - parallel/serial indication (bit 0 - parallel; bit 1 - serial;
         bit 31 - RMS indicator)
   R1  - receiving processor's PCCA address
   R11 - receiving routine's parameter address
   R12 - receiving routine's entry point
   R14 - return address
   R15 - IEAVERI entry point

1. Disables (STOSM) external and I/O interrupts; sets up (see Note 1).
2. Is the receiving processor online? If not, R.C. = 4.
3. Builds the emergency signal buffer in its own PCCA:
   a. Turns on the parallel or serial indicator.
   b. Places the receiving routine's entry point, the receiving routine's
      parameter address, and the receiving processor's address in the buffer.
4. Sets the emergency signal function code, X'03', in register 0 and issues
   DSGNL (0),(1).
5. Checks the return codes (successful or unsuccessful).
6. For a serial request, spins until the serial bit is off; for a parallel
   request, spins until the parallel bit is off (see Note 2).
7. Restores the caller's status and returns to the caller.

IEAVEDR operates as shown in Figure 2-10, except that the function code is
X'03' and CC3 yields R.C. 12 (see Note 3).

Figure 2-11. Emergency Signal (EMS) Process Flow (Part 1 of 2)

RECEIVING PROCESSOR

External FLIH - determines that the interrupt is an emergency signal.

Input registers:
   R2  - FLIH return address
   R10 - EMS SLIH entry address

Emergency Signal SLIH

1. Turns on the active bit.
2. Locates the EMS buffer of the sender: CVT -> PCCAVT (processor ID) ->
   PCCA.
3. If the RMS indicator is on, calls ACR.
4. If the receiving processor ID equals this processor ID, returns to the
   FLIH.
5. Determines whether the request is serial or parallel.
   Parallel: turns off the parallel bit, then calls the receiving routine.
   Serial: calls the receiving routine; upon completion, turns off the
   serial bit.
6. Turns off the active indicator.
7. Returns to the FLIH.

Notes:
1. Turns on the active indicator; saves the caller's registers; establishes
   addressability.
2. Disables/enables spin: (1) turns on the SPIN indicator; (2) enables for
   MFA and emergency signal interrupts; (3) disables; (4) turns off the SPIN
   indicator.
3. If CC = 3 and yet the processor is logically online, a SIGP hardware
   failure may exist. A "Soft ACR" option is available to the system
   operator to reconfigure to a UP system.

Figure 2-11. Emergency Signal (EMS) Process Flow (Part 2 of 2)


MP Debugging Hints
1. Apparent disabled loop in IEAVERI on processor A.
   This is probably caused when processor A sends an EMS to processor B, but the receiving routine on processor B has not yet turned off the serial or parallel bit in processor A's PCCA. Thus, processor A is in the "window spin" state in IEAVERI.
   To find what processor A wanted processor B to do, locate processor A's PCCA:
   CVT + X'2FC' points to the PCCAVT
   PCCAVT + 4 x (CPU ID for processor A) points to processor A's PCCA
   PROCESSOR A's PCCA

   X'88'  RISP flag byte (X'80' for a parallel request; X'40' for a serial
          request)
   X'8C'  Receiving routine PARM address
   X'90'  Receiving routine EP address
   X'94'  Receiving processor's PCCA address
By locating the proper PCCA (in this case processor A's), you can determine
whether the EMS request was parallel or serial, the entry point, and, therefore,
the name of the receiving routine. Although this information tells quite a bit
about the current process on processor A, the real problem is most likely on
processor B. Three past experiences can help determine the state of processor B.
• Processor B, if disabled for EMS interrupts, would never take the EMS
interrupt; therefore the receiving routine would never get control and the
parallel or serial bit would never get turned off.
• There could be a hardware problem with the SIGP circuitry. For example,
if IEAVERI got condition code 0 as a result of issuing the SIGP instruction on processor A, but the SIGP was never received on processor B, there
would be a loop in IEAVERI.
• Processor B was stopped in order to take a stand-alone dump. Before the
dump program was IPLed or processor A was stopped, processor A issued
the EMS for page invalidation. Thus, when the dump occurred, processor
A was looping while waiting for the page invalidation to complete. So it
appeared that processor A's looping was the problem when actually it was
caused by a previously-identified problem on processor B.
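The pointer chain in this hint can be sketched as a small dump-reading helper. This is only an illustration against a raw dump image held in memory: the offsets (CVT pointer at low-storage X'10', PCCAVT at CVT+X'2FC', a 4-byte PCCA slot per CPUID, and the RISP flag/PARM/EP words at PCCA+X'88'/X'8C'/X'90') come from the text, while the function names and the 4*CPUID slot indexing are assumptions of the sketch.

```python
import struct

def read_word(dump, addr):
    """Fetch a 4-byte big-endian word from a raw dump image."""
    return struct.unpack_from(">I", dump, addr)[0]

def ems_request(dump, cpuid):
    """Decode a pending EMS request from a processor's PCCA.

    Walks CVT (low storage X'10') -> PCCAVT (CVT+X'2FC') -> PCCA
    (4-byte slot per CPUID, an assumption), then reads the RISP flag
    byte and the PARM/EP words described in the text.
    """
    cvt = read_word(dump, 0x10)
    pccavt = read_word(dump, cvt + 0x2FC)
    pcca = read_word(dump, pccavt + 4 * cpuid)
    flags = dump[pcca + 0x88]
    kind = "parallel" if flags & 0x80 else "serial" if flags & 0x40 else "none"
    return {
        "kind": kind,
        "parm": read_word(dump, pcca + 0x8C),
        "ep": read_word(dump, pcca + 0x90),
    }
```

Given the request kind and the receiving routine's entry point, the module map of the dump then identifies the receiving routine by name.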

2.5.16

OS/VS2 System Programming Library: MVS Diagnostic Techniques

2. Locate external call buffers.
The external call buffer is located at offset X'84' in the PCCA. Normally, the
buffer is clear, but it is worthwhile to check to make sure that there is no
XC work to process, as indicated by the request codes below:
PCCA + X'84' request codes:

X'80'  SWITCH
X'40'  SIO
X'20'  RQCHECK
X'10'  GTFCRM
X'08'  RESERVED
X'04'  MODE
X'02'  MF1TCH

The code is set in the receiving processor's PCCA so that a bit on in processor
B's PCCA, for example, means that processor A initiated the request.
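The request-code byte can be decoded mechanically. The bit names below restate the table above; the helper function itself is purely illustrative.

```python
# Request-code bits in the external call buffer at PCCA+X'84',
# as tabulated in the text.
XC_CODES = {
    0x80: "SWITCH",
    0x40: "SIO",
    0x20: "RQCHECK",
    0x10: "GTFCRM",
    0x08: "RESERVED",
    0x04: "MODE",
    0x02: "MF1TCH",
}

def pending_xc_work(code_byte):
    """Return the names of all external call requests set in the byte.

    An empty list means the buffer is clear: no XC work outstanding.
    """
    return [name for bit, name in sorted(XC_CODES.items(), reverse=True)
            if code_byte & bit]
```

Remember that the bits are set in the receiving processor's PCCA, so a nonzero result names work the other processor asked this one to do.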
3. Determining Which Processor Has I/O Capability
The processor attribute bits, PCCAATTR, are located at offset X'178' in the
PCCA. If bit 1 (PCCAIO) is 1, then this processor has I/O capability, which
means that this processor has at least one channel logically online.

Bit 1 is set to 0 by:
IEAVNIPO: For each processor that has no channels physically online.
(Note: For Model 158 and Model 168 AP systems, PCCAIO=0
for the attached processing unit.)
IEEVCPU:

When the last channel of a processor is varied offline.

Bit 1 is set to 1 by:
IEAVNIPO: For each processor that has channels physically online.
IEEVWKUP: When a processor is varied online and it has channels physically
online.
When the first channel of a processor is varied online.



Bit 1 is referenced by:
IGFPTERM: When searching for a live processor; if that processor has I/O
capability (PCCAIO=1), a SIGP EMS is issued to that processor.
IGFPTSIG: When processing an EMS received from a failing processor.
When invoked during system termination, if executing on a
processor with I/O capability, IGFPTSIG writes to LOGREC and
the console.
IGFPXMFA: When processing an MFA received from a failing processor. If
executing on a processor that has I/O capability, IGFPXMFA
invokes ACR.
IEAVTACR: If PCCAIO=1 for the failing processor, IEAVTACR invokes I/O
restart to handle outstanding I/O.
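As a sketch, the PCCAIO test reduces to a single bit check. Note that S/370 numbers bits from the high-order end, so "bit 1" of a byte is mask X'40'; the offset is the one given in the text, and the function name is an assumption.

```python
PCCAATTR_OFFSET = 0x178  # processor attribute byte in the PCCA

def has_io_capability(pcca_bytes):
    """Test bit 1 (PCCAIO) of PCCAATTR.

    S/370 bit numbering counts from the high-order end, so bit 1 is
    mask X'40'. A 1 means at least one channel is logically online.
    """
    return bool(pcca_bytes[PCCAATTR_OFFSET] & 0x40)
```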


MVS Trace Analysis

This chapter reviews the trace formats found in VS2 storage dumps. The
MVS trace (similar to the OS trace) and the GTF trace are available both in
system-initiated dumps (SNAP) and in stand-alone dumps. There are formatting
routines for most combinations. The trace table entry format can be found in the
"Data Areas" section (see TTE - Trace Table Entry) and the "Dump and Trace
Formats" section of the Debugging Handbook.
The information in this chapter is provided to assist you in reviewing the various
formats as you will see them in a storage dump. The page fault path is used as the
vehicle for describing these formats in the following examples and descriptions.

Trace Entries
To have these entries formatted in a SYSUDUMP/SYSABEND/SYSMDUMP, the
installation must specify SDATA=(TRT) in the SYS1.PARMLIB members or use
the CHNGDMP command.

Note: SYSMDUMP produces a machine-readable dump; AMDPRDMP must be
used to print it. AMDPRDMP does not format the system trace table.
For unformatted trace table entries, the system queue area (SQA) must have
been printed. Use location X'54' as shown in Figure 2-12 to locate the trace table.
Remember that 'TRACE ON' was required at IPL time. (Note that if GTF is active,
the system trace is turned off.)
Entry Pointers
Loc X'54' -> current, first, and last entry pointers

[The hexadecimal SQA listing of this figure cannot be reproduced here; the column annotations are given below.]

where:
a - address column in SQA
b - PSW, or device address/CAW if an SIO operation
c - variable; see TTE in the Debugging Handbook
d - CPU ID: 0040 for processor 0; 0041 for processor 1
e - ASID: 0001 is Master Scheduler; 0002 is usually JES; 0000 is Dummy Task or N/A
f - TCB address
g - Timer value

Figure 2-12. How to Locate the Trace Table
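A trace entry can be split into the fields annotated above with a few lines of code. This sketch assumes the layout implied by Figure 2-12 - a X'20'-byte entry whose last three words hold the CPU ID and ASID halfwords, the TCB address, and the timer value - and the function name is illustrative.

```python
import struct

def parse_tte(entry):
    """Split one X'20'-byte trace table entry into the fields annotated
    in Figure 2-12. Positions are inferred from the figure: the PSW
    occupies the first two words, words 2-4 are variable, and the last
    three words hold CPU ID/ASID, TCB address, and timer value.
    """
    w = struct.unpack(">8I", entry)
    return {
        "psw": (w[0], w[1]),
        "variable": w[2:5],
        "cpuid": w[5] >> 16,
        "asid": w[5] & 0xFFFF,
        "tcb": w[6],
        "timer": w[7],
    }
```

Applied to one of the lines in the figure (CPU ID 0040, ASID 0004, TCB 00CEC410), the helper returns exactly the d/e/f/g annotations.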


If low address storage is overlaid and the trace table pointer (X'54') is lost, you
can locate the trace table (which is in the SQA) by searching through the high
address range of common storage. Each trace entry is X'20' bytes in length and
begins in the extreme left-hand column of a storage dump. Once you locate a
pattern of X'07' and X'04' combinations, you have found the trace table.
If location X'54' has not been overlaid, then it will point to the control information for the trace; this information is directly in front of the actual table.
The trace routine places an entry (record) type indicator in the fifth position of
the PSW and moves the interrupt code in to make the PSW appear as BC mode.
Figure 2-13 illustrates and explains each of the trace entry types.
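The X'07'/X'04' pattern search can be sketched as follows, assuming a raw dump image in memory. The run-length threshold is an arbitrary choice of the sketch, not something the manual specifies.

```python
ENTRY_LEN = 0x20  # each trace entry is X'20' bytes

def find_trace_table(dump, start, end, run=8):
    """Scan a storage range for a run of X'20'-byte entries that each
    begin with X'07' or X'04' - the PSW-mask pattern the text says
    marks the trace table. Returns the address of the first entry of
    the first such run, or None. `run` is an assumed threshold for how
    many consecutive matching entries count as a find.
    """
    count = 0
    for addr in range(start, end, ENTRY_LEN):
        if dump[addr] in (0x07, 0x04):
            count += 1
            if count == run:
                return addr - (run - 1) * ENTRY_LEN
        else:
            count = 0
    return None
```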
Position 5

[The hexadecimal trace listing of this figure cannot be reproduced here; the annotated entry types it illustrates are described below.]

Fifth digit in first word is 0: an SIO entry for device 250. The CAW address is
'60F8'. The CSW is residual from the previous I/O interrupt. The IOSB address
is 'FF5194'.

Fifth digit = 1: an external type. This entry has an interrupt code of X'1004', so
it was generated by a clock comparator interrupt.

Fifth digit = 2: an SVC interrupt. An SVC '6F' was issued from location F63284
(minus the interruption length code, ILC). Variable fields are registers 15, 0,
and 1.

Fifth digit = 3: a program interrupt. Interrupt code of X'11' is a page exception.
Word four is the referenced translation exception address (TEA).

Fifth digit = 4: an SRB dispatch. The address in the PSW (1C760) is the entry
point address. Word 3 contains the ASID to be dispatched. This illustrates the
scheduling of POST status after an I/O interrupt.

Fifth digit = 5: an I/O interrupt. The device address (250) has been moved into
the PSW. Words 3 and 4 are the CSW with the channel end/device end.

Fifth digit = 6: SRB redispatch. SRBs can be suspended because of lock contention
or a page fault. The address in the PSW is the return address to the lock manager
or the instruction that caused a page fault.

Fifth digit = 7: task dispatch. Interrupt code is from the last task interrupt. If the
interrupt code is 0, it is a return from SVC 0 or the first dispatch of this request
block (RB) for the task.

Figure 2-13. Types of Trace Entries
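The record-type indicator lends itself to a small lookup. The table below restates the fifth-digit meanings from Figure 2-13; the helper name is illustrative.

```python
# Fifth hex digit of the first PSW word -> trace entry type,
# per Figure 2-13.
TTE_TYPES = {
    0x0: "SIO",
    0x1: "external interrupt",
    0x2: "SVC interrupt",
    0x3: "program interrupt",
    0x4: "SRB dispatch",
    0x5: "I/O interrupt",
    0x6: "SRB redispatch",
    0x7: "task dispatch",
}

def tte_type(first_word):
    """Classify a trace entry by the fifth hex digit of its first word
    (the record-type indicator the trace routine stores in position 5)."""
    digit = (first_word >> 12) & 0xF
    return TTE_TYPES.get(digit, "unknown")
```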

[The formatted GTF listing of this figure cannot be reproduced here; the annotated sequence of events it shows is listed below.]

PGM 017  The page fault. VPA = address of fault.
SRB      The dispatch of ASM's part monitor routine in master's address space.
SIO 353  The Start I/O to page-in the requested page.
DSP      The dispatch of any ready work while the page-in I/O is in progress.
         In this case, there is no ready work, so the wait task is dispatched.
IO 353   The I/O interrupt from the paging device. ASM's disabled interrupt
         exit (DIE) routine gets control.
SRB      The dispatch of RSM's IEAVIOCP page-in completion processor, to
         validate the page table entry and post the faulter as ready to run.
DSP      The faulter resumed where it left off.

Figure 2-17. GTF Trace of a Page Fault With I/O


Notes for Traces
The trace provides a history of some of the events that led to a storage dump.
Trace interpretation is one of the most important aspects of debugging.

Tracing Procedure
When attempting to recreate the process that was occurring on the processor(s)
when the dump was taken, start at the last entry in the trace table (identified either
by the trace header or by the highest clock value in the last column) and scan upward. While scanning, look for unexpected events. These include:
• Unit check, unit exceptions on I/O devices
• Non-CC=0 on SIOs
• Non-type-11 program checks
• SVC D, 33 (see number 6 under "Cautionary Notes" later in this chapter)
• Malfunction alerts (X'1200' external interrupt)
• Entries that show both processors executing the same code, as indicated by the
ICs (instruction counters) in the entries
• Large time gaps in the TOD clock value
• An MP environment with only one processor doing anything

These entries indicate a potential for errors. Do not be distracted if you
discover an entry of this type. Record the incident for future use. Then continue
scanning back through the trace and try to determine what was happening in the
system that might have caused the failure. Remember to conduct the scan by unique
processor. Separate the processes that occur on each processor and watch for any
obvious interactions in the processes.
You can further subdivide the activity by address space (as depicted by
ASID) or by task (TCB address; remember to stay under the same ASID). As you
recreate the situation, remember that you are relating individual
entries to real events that must occur in order to accomplish work. Do not be
distracted. For example, do not look for an I/O interrupt just because you see an
SIO. The two events should be associated, but you should also determine the
following:
• Why the I/O is occurring
• Whether the I/O is related to the process, address space, task, page fault,
etc. that you are concerned with
• Whether the I/O completion should trigger another event. This is the way work is
accomplished in MVS, that is, events triggering more events. As you become
familiar with trace coding you learn to expect this "event causing" sequence.
Certain sequences occur very frequently; you learn to recognize these and
to look for less familiar sequences.

As you are searching trace entries, watch for repeating patterns, which can
indicate a loop in the system. These patterns can appear as constantly repeating
ICs (generally the case in a tight enabled loop), or as a repeating sequence of
entries (often the case in a process loop, such as an ERP constantly retrying an
I/O operation). Note that in the latter case, other entries from other processes
can intervene periodically in the trace table, especially in an MP environment.
If you reach a point in the trace analysis where you are somewhat comfortable
with the processes you are uncovering and recreating, and you feel you have a
fair understanding of the activity in the system, pause. Try to understand
what you have found. Is there any way you can relate your findings to the reason
you have taken the dump in the first place? Do the unexpected events have
anything to do with the problem, or are they unrelated to the problem? It can
happen that the events you have discovered are unrelated to the problem causing
the dump and you have exhausted the scope of the trace. In this case, you probably
have to go into the system and study the address space and task structures,
queues, and global data areas in order to zero in on the problem.
However, if the events you have discovered are related to the problem causing
the dump, you must then attempt to isolate the erroneous process. Try to
understand how the unexpected events relate to the process. Look on both
sides of the event: did the event trigger the bad process, or is it a result of the
bad process?

It is also necessary in trace analysis in MVS to understand whether you
are looking at the primary error or at some secondary problem. Is this
a mainline failure or a failure because of a problem in the recovery? Also, you
must decide if the problem is caused by a previous error from which the system has
recovered. Always be sure that it was not something several pages earlier in the
trace that caused recovery to be activated and eventually led to the current problem. If this is the case you must now decide which error to pursue. The original
error is probably more important; however, much of the required information
might be lost because of recovery and the subsequent recovery failure. Also keep in
mind that if you must attack the secondary error condition, your search of the
dump and the recovery areas can often uncover information about the first error.
The trace is one of the most useful tools available for back-tracking through a
problem sequence. You must use it in conjunction with system control blocks and
indicators in order to recreate the error sequence. This is still true in MVS despite
the fact that the trace contains less information than in previous systems.
In MVS, SVC calls have been greatly reduced because of branch entry
logic for both transfer of control and supervisor services. This means that trace
entries are not provided as in previous operating systems. Also, many significant
events, such as lock acquisition and release, SRB scheduling, and SIGP issuance,
are not traced. Because of these MVS considerations, you must be able to understand the processes and interpret the trace table rather than just read it.


Cautionary Notes
Listed below are some items the problem solver should understand when analyzing
an MVS trace table.

1. I/O Processing:
• Much I/O is accomplished in MVS by the branch entry interface to IOS and
without the use of SVC 0 (EXCP). Therefore, you often find I/O
entries (SIO/I/O interrupt) that are not accompanied by SVC 0.
• Back-end I/O processing in MVS generally results in an SRB schedule of
IECVPST. This trace entry should appear soon after an I/O interrupt. The
register 1 slot will contain the IOSB address. The IOSB is the key to
tracking the I/O request.

2. Timer Value:
The last field of each trace entry contains the middle four bytes of the
eight-byte TOD clock at the time the entry was made. The clock can be of
considerable importance when trace entries and various system fields (such as
the ASCB or LCCA, which also contain TOD clock values) are used to determine how much time has elapsed between significant events. The last digit
represents a value that is increased every 16 microseconds, and the fourth
digit represents a value that is increased approximately every second.
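Since the low-order digit of the timer field ticks in 16-microsecond units, elapsed time between two entries is a single multiplication. This sketch ignores wraparound of the 32-bit field, an assumption worth noting when entries are far apart.

```python
def tte_elapsed_us(timer_a, timer_b):
    """Microseconds between two trace-entry timer values.

    The timer field is the middle four bytes of the TOD clock; its
    low-order unit corresponds to 16 microseconds, per the text.
    Wraparound of the 32-bit field is ignored here.
    """
    return (timer_b - timer_a) * 16
```

Note that one second is 62,500 such units, which is why roughly the fourth hex digit of the field rolls over about once per second.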

3. Enabled Wait State:
Because of recovery, the end symptom of many problems is an enabled wait
state. For tracing, the wait state presents particular problems in MVS. SRM
maintains a timer interval that causes a clock comparator interrupt (code
X'1004') approximately every 1/2 second. These external interrupts are
recorded in the trace table. You then see the re-dispatch of the no-work
wait task followed by another clock comparator interrupt, and so on. Even
though this occurs, the sequence is not repeatedly traced. In addition, in an
MP environment there are external calls (code X'1202') issued between the two
processors requesting that the receiver look for ready work. These calls will be
followed by a re-dispatch of the no-work wait on the receiving processor. In
short, the wait state is a combination of dispatches of the no-work wait task,
clock comparator interrupts, and SIGP external calls. The IC (instruction
counter) will always be 0. At approximately 12- or 13-second intervals, an
SRB is dispatched in the master scheduler address space to run a section of
SRM in order to gather system statistics. When the SRB has completed, the
no-work wait task is again dispatched.
All this extraneous activity causes the trace to wrap around and overlay the
important trace entries of the events that led up to the enabled wait state.


4. MP Activity:
The communication between the two processors in the MP environment is
traced as the external interrupts are accepted by the receiving processor. An
external interrupt code of X'1201' is an emergency signal; an external
interrupt code of X'1202' is an external call. (The previous chapter, "Effects
of MP on Problem Analysis," explains this communication process.)

5. Trace Currency:
Various processes that occur in MVS turn off the MVS trace. The most
prominent of these are GTF and SVC dump. Determine if the trace was
running when the dump occurred: if you are unaware that the trace was not
running when the dump was taken, you might go off on a fruitless chase and
lose considerable time. The trace was still active when the dump occurred if
the CVT+X'191' value = X'FA'.

Note: When SVC dump turns off the MVS trace, it sets bit 0 on in the CPU
identifier (offset X'14') in the current trace table entry.

6. SVC D Entries:
SVC D is the means by which termination is invoked. In previous operating
systems, SVC D meant abnormal termination. This is not always true in MVS.
RTM2 is the mechanism for normal end-of-task processing as well as for
abnormal termination; RTM2 is invoked via SVC D. Consequently, SVC D
for normal termination is a valid situation and is traced. You
can determine whether SVC D implies normal or abnormal termination by
inspecting the register 1 slot associated with the SVC D entry. If the first
byte contains a X'08', RTM2 is being invoked for normal termination and this
is not an error situation.
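The register 1 test described above is a one-liner; the function name is illustrative.

```python
def svc_d_is_normal(reg1):
    """Distinguish normal from abnormal termination for an SVC D trace
    entry: per the text, X'08' in the first (high-order) byte of the
    register 1 slot means RTM2 was invoked for normal end-of-task.
    """
    return (reg1 >> 24) & 0xFF == 0x08
```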

7. Important events not traced:
MVS design prevents locked, disabled, or SRB code from issuing SVCs. The
SVC FLIH abnormally terminates such code if it issues an SVC. Note that in this
case, the erroneous SVC invocation is not traced. Also note that locked, disabled, or SRB code that issues SVC D does so as a means of entering RTM1;
this is a common technique used by IBM SCP code in order to invoke recovery.
RTM indicators show SVC error, but the real problem is why the SVC D
was issued.


8. Unit exception I/O interrupt on a 3705 communications controller:
The presence of unit exception conditions from the 3705 is a common
occurrence while running VTAM. This is a normal situation and should not be
considered erroneous. The host processor has issued a set of read commands to
the 3705, and the channel program has been terminated before all the reads
have completed because the NCP did not have enough data to satisfy each read
CCW.

9. GETMAIN, FREEMAIN - SVC X'A', SVC X'78':
For SVC X'A', inspect the register 1 slot of the associated trace entry. A value
of X'80' in the high-order byte indicates GETMAIN; a value of X'00' indicates
FREEMAIN. SVC X'78' uses a code in register 15 (see the Debugging
Handbook). If a GETMAIN is indicated, the register 1 slot of the associated
re-dispatch of the SVC-issuing code can be used to locate the storage allocated
by the GETMAIN process.

10. A GETMAIN for X'3CC' bytes is often seen soon after an SVC D is issued:
This is RTM2's request for storage for an RTM2WA. By locating the
re-dispatch of RTM2 and inspecting the register 1 slot, you can locate the
RTM2WA.


Miscellaneous Debugging Hints

This chapter is a collection of miscellaneous debugging hints to aid the problem
solver in specific situations not covered elsewhere in this book. It includes the
following topics:
• Alternate CPU Recovery Problem Analysis
• Pattern Recognition
• OPEN/CLOSE/EOV ABENDs
• Debugging Machine Checks
• Debugging Problem Program ABEND Dumps
• Debugging from Summary SVC Dumps
• Started Task Control ABEND and Reason Codes
• SWA Manager Reason Codes

Alternate CPU Recovery (ACR) Problem Analysis
Alternate CPU recovery (ACR) is the process by which MVS dynamically adjusts
to the unexpected failure of a processor in a multiprocessing (MP) configuration.
ACR is initiated by the failing processor. If the failing processor's hardware detects
the failure, it issues a malfunction alert (MFA) external signal to the other
processor. If the failing processor generates the severe machine check interrupt
(recursive or invalid logout) type, the machine check interrupt handler will initiate
ACR via the SIGP instruction, emergency signal (EMS) operand, which generates
an external interrupt on the receiving processor.
When the running processor detects that a failing processor is requesting ACR,
it places X'FF' in the CSDACR byte (CSD+X'16') in the CSD control block. The
byte is restored to X'00' after ACR is complete.
ACR works in three phases: pre-processing, intermediate, and post-processing.
Pre-processing is the initialization phase: the running processor copies the
PSA and normal functional recovery routine (FRR) stacks of both processors
and places them in the area pointed to from their respective LCCA's WSACACR
pointer. The WSACACR pointer is located at X'10' beyond the area pointed to
by LCCACPUS. Additionally, LCCAs are marked so that in both processors'
LCCAs, LCCADCPU points to the LCCA of the failing processor and LCCARCPU
points to the LCCA of the running processor. By means of the LCCACPUA field
in the LCCA, you can determine which processor has failed and which is still
running.
Note that in a storage dump, the physical PSA of the failed processor is the
same as it was when the processor decided that ACR should be initiated. The
normal FRR stack, pointers to other FRR stacks, locks, PSASUPER bits, etc., all
reflect the state of the processor at the time it failed. This will be useful for solving
problems in the recovery initiated for the process on the failed processor.

The ACR intermediate phase gets control from the MVS dispatcher or lock
manager global spin lock routine. In this phase, ACR switches from the process
on one logical processor to the process on the other logical processor. This
switching continues until the RTM1 recovery (routing to FRRs) completes on
behalf of the process on the failed processor. At this point, the ACR post-processing phase is entered.
ACR post-processing consists of cleanup activities performed by other
components and by ACR. Post-processing invokes I/O restart (IECVRSTI) to
initialize the channel reconfiguration hardware (CRH) function on a Model 168
or to mark outstanding I/O from the failed processor with a permanent error, which
then initiates error recovery processing via error recovery procedures (ERPs).
Console switch is invoked via POST. Additionally, the system resources manager
(SRM) is notified of the loss of the processor. Finally, ACR performs additional
cleanup activities and sets the CSDACR flag to X'00'.
Historically, the parts of the ACR process that have had software problems are
the FRRs (written by component developers to protect particular mainline
functions) and the ERPs (device-dependent routines). The mainline ACR routine
(IEAVTACR) is basic and has been quite free of problems.
Note: The I/O error processing invoked during the ACR process has caused many
of the problems discovered to date. Of significant importance is EXCP I/O error
processing. The following flow describes the non-CRH situation for an MVS
158 MP system.


1. I/O restart (IECVRSTI) determines all devices that have outstanding requests
at the time of a machine check.
2. IECVRSTI simulates an I/O interrupt for each device with a channel control
check and interface control check (X'00000000 00060000') and sets the
pseudo interrupt bit in the IRT (IRTPINT bit at X'02' in IRTENVR). This
prevents IOS from interfacing with the channel check handler (CCH).
3. IECVRSTI passes control to IOS via the I/O FLIH.
4. IOS sets the IOSCOD field in the IOSB to X'74' and schedules IECVPST.
5. IECVPST routes control to the abnormal exit routine.
6. For an EXCP, the EXCP compatibility interface routine receives control.
7. EXCP converts the X'74' to X'7F' in the IOB.
8. EXCP branches to the abnormal end appendage.
9. The abnormal end appendage returns to EXCP, which returns to IECVPST.
10. IECVPST invokes normal ERP processing.
11. If no path remains to a device, subsequent I/O requests (either ERP retry or
normal new requests) are intercepted by IOS and flagged with IOSCOD = X'51',
and IECVPST is scheduled.
12. IECVPST routes control to the abnormal exit routine.
13. For EXCP requests, the abnormal exit is again the EXCP compatibility
interface routine.

14. EXCP converts the X'51' to a X'41' (permanent error) in the IOB and enters
the abnormal end appendage.
15. The abnormal end appendage returns to EXCP; EXCP returns to IECVPST,
which enters the termination routine.
The important point in the above discussion is that EXCP changes the ACR
completion codes to conventional error post codes.
The most frequent I/O problems have been:
• ERP abnormal end appendages not coded for a 0 CCW address in the CSW.
• ERP abnormal end appendages not recognizing that the last path to a device
has been lost (as with asymmetric I/O) and thus going into an I/O retry loop.

Pattern Recognition
When analyzing a dump you should always be aware of the possibility of a storage
overlay. System incidents in MVS are often caused by storage overlays that
destroy data, control blocks, or executable code. The results of such an overlay
vary. For example:
• The system detects the problem and issues an abnormal completion code, yet
the error can be isolated to an address space.
• Referencing the data or instructions can cause an immediate error such as a
specification or op-code exception.
• The bad data can be used to reference a second location, which then causes an
evident error.
When you recognize that the contents of a storage location are invalid and
subsequently recognize the bit pattern as a certain control block or piece of data,
you generally can identify the erroneous process/component and start a detailed
analysis. This section discusses pattern recognition and potential causes of storage
overlays, and points out common patterns that aid the debugger.
Once you recognize an overlay, analyze the bit pattern. If you do not recognize
the pattern at all, try to determine the extent of the damaged area. Look at the
data on both sides of the obviously bad areas. See if the length is familiar; that is,
can you relate the length to a known control block length, data size, MVC length,
etc.? If so, check various offsets to determine their contents and, if you recognize
some, try to determine the exact control block/data. Even if you do not recognize
the pattern, take one more step. Can you determine the offset from some base (X)
that would have to be used in order to create the bit pattern? If so, the fact that
there is a certain bit pattern at a certain offset (Y) can be helpful. For example,
a BALR register value (X'40D21C58') at an offset X'C' can indicate that a program
is using this storage for a register save area (perhaps caused by a bad register 13).
Another field in the same overlaid area might trigger recognition.
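When the dump is available as a file, a scan of this kind can be mechanized. The sketch below is illustrative only (the function, the sample bytes, and the base address are not from the manual); it flags fullwords whose high-order byte contains "1" bits over a non-zero 24-bit address:

```python
import struct

def scan_for_balr_values(storage, base=0):
    """Scan dumped S/370 storage for fullwords that resemble BAL/BALR
    register values: a non-zero high-order byte (ILC, CC, and mask bits
    captured by BALR) over a non-zero 24-bit address."""
    hits = []
    for off in range(0, len(storage) - 3, 4):
        word = struct.unpack(">I", storage[off:off + 4])[0]
        if (word >> 24) != 0 and (word & 0xFFFFFF) != 0:
            hits.append((base + off, word))
    return hits

# A fullword X'40D21C58' at offset X'C' of an overlaid area suggests
# a register save area (return address saved at offset 12):
area = bytes(12) + bytes.fromhex("40D21C58")
print([(hex(o), hex(w)) for o, w in scan_for_balr_values(area)])
# [('0xc', '0x40d21c58')]
```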

Section 2: Important Considerations Unique to MVS


Miscellaneous Debugging Hints (continued)
Look at the overlaid area and scan for familiar addresses such as device addresses,
UCB addresses, and BAL/BALR register values (a fullword with the high-order byte
containing some "1" bits). If you find any of these, try to determine what components
or modules are involved or what control blocks contain these addresses.
Repetition of a pattern can indicate a bad process. If you can recognize the bad
data you might be able to relate that data to the component or module that is
causing the error. This provides a starting point for further analysis.

Low Storage Overlays
Low storage is a common location for storage overlays. The following should be
noted:
• Location X'10' (CVT pointer) should contain a nucleus address. This location
is refreshed by the program check first level interrupt handler and so is often
valid when adjacent locations are bad.
• Location X'14' should always be 0.
• Locations X'18' through X'3F' (old PSWs) should always contain valid PSWs.
The mask (first byte) of each PSW should be X'07', with the exception of
X'30', which can contain X'00', X'04', or X'07'.
• Location X'4C' should be equal to location X'10'.
• Locations X'58' through X'7F' (new PSWs) should contain valid PSWs.

If any of the above statements is not true, consider a low storage overlay.
Further analysis is required to determine the cause. Also consider
that, on a non-prefixed machine, the low storage locations described above can be
overlaid by CCWs for the stand-alone dump program, starting at location X'10'.
Do not consider this an error situation.
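The checks above lend themselves to a small routine run against the first X'80' bytes of a dump. The offsets below are the ones quoted in the text; the function name and sample dump contents are an illustrative sketch only:

```python
import struct

def check_low_storage(psa):
    """Apply the low-storage sanity rules listed above to the first
    X'80' bytes of a dump; returns a list of suspicious findings."""
    word = lambda off: struct.unpack(">I", psa[off:off + 4])[0]
    findings = []
    if word(0x14) != 0:
        findings.append("location X'14' is not zero")
    for off in range(0x18, 0x40, 8):          # the old PSWs
        mask = psa[off]
        if not (mask == 0x07 or (off == 0x30 and mask in (0x00, 0x04))):
            findings.append("old PSW at X'%X' has mask X'%02X'" % (off, mask))
    if word(0x4C) != word(0x10):
        findings.append("location X'4C' does not equal location X'10'")
    return findings

clean = bytearray(0x80)
for off in range(0x18, 0x40, 8):
    clean[off] = 0x07                                # mask byte X'07'
clean[0x10:0x14] = (0x00FE1000).to_bytes(4, "big")   # fabricated CVT pointer
clean[0x4C:0x50] = (0x00FE1000).to_bytes(4, "big")   # copy at X'4C'
print(check_low_storage(bytes(clean)))               # []
```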
Two common low storage problems are:
• A register save area starting at location X'30'. This can happen when an area
of the system saves register status in a TCB at location 0. Or it can be caused
by a routine using PSATOLD for a TCB address when the system is in SRB
mode; this is indicated by PSATOLD=0.
• An SRB/IOSB combination starting at location X'0'. This can be caused by
a problem in the IOS storage manager. The contents vary depending upon
how many control blocks the code has initialized. Points to consider are:

1. The two blocks might point to each other (X'1C' into each).
2. An ASCB address might be at location 8.
3. Addresses of IECVEXCP routines might be at X'68' and/or X'6C'.

Common Bad Addresses
Three common bad addresses are:
• X'C0000', and this address plus some offset. These are generally the result of
some code using 0 as the base register for a control block and subsequently
loading a pointer from 0 plus an offset, thereby picking up the first half of a
PSW in the PSA.
Look for storage overlays in first level interrupt handlers or in code pointed
to by the old PSW. These overlays result when 0 plus an offset causes the
second half (IC) of a PSW to be used as a pointer.
• X'C00', X'C34', X'C50', X'C54', X'C5C', X'C7C', and other pointers to
fields in the normal FRR stack. Routines often lose the contents of a
register during a SETFRR macro expansion and illegally use the address of
the 24-byte work area returned from the expansion.
• Register save areas. Storage might be overlaid by code doing an STM (Store
Multiple) instruction with a bad register save area address. In this case, the
registers saved are often useful in determining the component or module at
fault.

OPEN/CLOSE/EOV ABENDs
When a dump shows an abend issued from O/C/EOV, the key area to start
your diagnosis in is the RTM2 work area. The failing TCB has a pointer
(at TCB+ X'EO') to this area. This work area contains information current
at the time of the abend, the most important being the register contents.
Register 4 points to the current O/C/EOV work area. This work area is built by
IFG0RR0A during problem determination and contains key information about the
problem: the JFCB, IOB, DEB, and other pertinent fields are all saved in the work
area for use later by the recovery routines. The O/C/EOV work area is documented
on microfiche in each O/C/EOV module.
The module in control at the time of the abend can be determined from the
"Where To Go" (WTG) table, which is pointed to by register 6 in the RTM2 work
area. The WTG table is contained within another work area called the O/C work
area. IFG0RR0A saves a copy of the current DCB in this work area. If multiple
DCBs are involved, the prefix to the DCB work area points to another DCB
work area. These DCB areas are laid out precisely like a DCB. All these work
areas and their prefixes are documented at the end of every O/C/EOV module in
the microfiche.
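Given a raw dump that begins at virtual address 0, the TCB+X'E0' chain described above can be followed mechanically. A minimal sketch: only the X'E0' offset comes from the text; the function names and sample addresses are illustrative:

```python
import struct

def fetch_pointer(dump, addr):
    """Read a fullword at `addr` in a dump that begins at virtual
    address 0, keeping only the 24-bit address part."""
    return struct.unpack(">I", dump[addr:addr + 4])[0] & 0xFFFFFF

def rtm2wa_from_tcb(dump, tcb_addr):
    """Follow the TCB+X'E0' pointer to the RTM2 work area."""
    return fetch_pointer(dump, tcb_addr + 0xE0)

dump = bytearray(0x2000)
dump[0x10E0:0x10E4] = (0x001800).to_bytes(4, "big")  # fabricated pointer
print(hex(rtm2wa_from_tcb(bytes(dump), 0x1000)))     # 0x1800
```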

In an MVS environment, O/C/EOV must build these work areas rather than rely
on what is in real storage at the time of the dump. The main task is to find these
areas and interpret their fields using microfiche. A quick way to find these work
areas is to find subpool 230 in the dump. All O/C/EOV data is in this subpool.
Assuming you have all the pertinent information about the failure, the problem
becomes the same as an O/C/EOV problem in OS. One more point: built into the
code is message IEC999I. This message indicates that there is a problem in the
O/C/EOV code that cannot be determined. While you may be able to circumvent
this problem, you should also submit an APAR for it.

Debugging Machine Checks
The machine check interruption is the hardware's method of informing the MVS
control program that it has detected a hardware malfunction. Machine checks vary
considerably in their impact on software processing. Some machine checks notify
software that the processor detected and corrected a hardware problem that
required no software recovery action (software calls these errors soft errors). Hard
errors are hardware problems detected by a processor that require software-initiated
action for damage repair. Hard errors also require software recovery to verify the
integrity of the process that experienced the failure. Obviously, if there are
software problems after a machine check, it is more likely that the machine check
was a hard error. It is important to get a feeling for which software components
are affected by particular hardware failures.
The machine check interrupt code (MCIC), located in the PSA, describes the
error causing the interrupt. The following discussion shows how to find MCICs and
how to interpret them for subsequent software processing. Machine checks can be
found in a LOGREC buffer (LRB), the SYS1.LOGREC data set, or in the storage
area used as a buffer prior to writing records to SYS1.LOGREC (see the discussion
of SYS1.LOGREC analysis in the "Recovery Work Areas" chapter earlier in this
section). Also, a pointer to the LRB that describes the last machine check that
occurred on a processor can be found in that processor's PCCA at PCCALRBV
(PCCA+X'A0').
The LRB contains the machine check interrupt code (MCIC), except when:
• The machine check old PSW is zero. The MCIC is also zero. The
LRBMTCKS bit (field LRBTERM at LRB+X'20') is turned on by software.
• The MCIC is zero and the machine check old PSW is non-zero. The LRBMTINV
bit (field LRBTERM at LRB+X'20') is turned on by software.

The MCIC is the principal driver of software processing after a machine check.
It must be examined to determine the actions that MVS should take. The MCIC
contains bits describing the conditions that caused the interrupt. Note that more
than one failing condition can be described by a machine check at one time.
Software performs repair processing for each condition found; software recovery
processing is initiated if any hard error conditions are found (except in the cases
described on the following pages).
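The bit-by-bit processing described above can be illustrated with a small decoder. This is only a sketch: it covers just the subclass bits discussed in this section, taking bit 0 as the leftmost bit of a 64-bit MCIC, and is not a complete MCIC definition:

```python
# Hard/soft classification for the MCIC subclass bits described below.
MCIC_BITS = {
    0: ("system damage", "hard"),
    1: ("instruction processing damage", "hard"),
    2: ("system recovery", "soft"),
    3: ("timer damage", "soft"),
    4: ("timing facility damage", "soft"),
    5: ("external damage", "soft"),
    7: ("degradation", "soft"),
}

def decode_mcic(mcic):
    """Return (condition, severity) for every tabulated bit that is on;
    more than one condition can be present in a single machine check."""
    return [info for bit, info in sorted(MCIC_BITS.items())
            if (mcic >> (63 - bit)) & 1]

print(decode_mcic((1 << 63) | (1 << 61)))
# [('system damage', 'hard'), ('system recovery', 'soft')]
```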

Because hard errors require FRR and ESTAE processing, identifying a hard
error is important. Important MCIC bits are listed below, with a description
of their hardware significance and impact on software. A handy MCIC reference
matrix, containing additional machine check and ensuing action-taken
information appears at the back of this section.
Bit 0 (System damage) - The processor is still useable, but damage occurred
while the processor was in the process of changing PSWs or otherwise changing
system control, and thus has lost the associated process or interrupt. Software
recovery routines (FRRs) are entered for this hard error.
Bit 1 (Instruction processing damage) - The processor is still useable but an
instruction has failed to operate as intended. Software recovery is initiated for
this hard error, unless the backed-up bit is on with storage error or key error
uncorrected on refreshable storage (see Bit 16 description).
Bit 2 (System recovery) - The processor detected and corrected a potential
hardware problem. The interrupted process is completely restored by software
for this soft error; no repair is performed and no recovery routines are entered.
Bit 3 (Timer damage) - The interval timer at PSA location X'50' has failed.
Because MVS does not use this timer, this failure is ignored (indicated as a soft
error).
Bit 4 (Timing facility damage) - Damage has occurred to the CPU timer, clock
comparator, or time-of-day clock. The particular clock facility that is damaged
is described by MCIC bits 46 and 47. A first failure to a facility results in an
attempt to reuse it. Subsequent failures result in taking the facility offline
(described in the PCCA fields PCCATODE, PCCACCE, or PCCAINTE). If no clock
of a particular type remains in the system, any task which requests timing using
that type of clock is sent through software recovery. This is treated as a soft error
for the process current on the processor at the time of the interrupt.
Bit 5 (External damage) - Damage has occurred to a unit external to the
processor. MVS expects more information in a channel check I/O interrupt.
This is treated as a soft error.
Bit 7 (Degradation) - The system has detected that elements of the high-speed
buffer (cache) or translation look-aside buffer have had bit (parity) errors. The
bad elements are automatically reconfigured out of the buffer. Once a predefined
threshold of degradation machine checks is reached, the buffer and the translation
look-aside buffer are reset, thus making the entire buffer available again. This
threshold has a default value of 3 which can be changed by the operator via the
MODE command. Until then, the system might perform at a reduced rate because
of increased storage access time (cache element deletion) or increased time to
translate virtual addresses (because of translation look-aside buffer element
deletion). However, because no damage has been done to any software process or
data, this soft error is merely recorded in SYS1.LOGREC. The system state at the
time of the error is re-established, ignoring the occurrence of the buffer bit error.
It is treated as a soft error and no software recovery is initiated.

Bit 8 (Warning) - Damage is imminent; there is a cooling loss or a power drop,
etc. Software determines if the error is transient or permanent. If it is transient,
the warning interrupt is treated as a soft error. If permanent, an attempt is
made to invoke the power warning feature software, to record the system state
at the time of this hard error.
Bit 16 (Storage error uncorrected) - There is a block in storage with a double bit
error that is located at the real, prefixed address stored in PSA location X'F8'. If
the frame's page is refreshable, that is, unchanged, pageable, and in the current
address space, it is marked invalid so a future reference will cause a fresh copy
to be paged into a new frame. (Note: More than one error can occur before the
page goes offline.) In all cases, an attempt is made to take the damaged frame
offline (unless the frame is in the nucleus). For unchanged nucleus frames, the
page is refreshed from a copy paged-out at NIP time. When a storage error
uncorrected condition occurs in conjunction with a system recovery or external
damage error, it is treated as a soft error and no recovery routines are entered. If
the storage error occurs in conjunction with instruction processing damage when
the backed-up bit (bit 14) and storage logical validity bit (bit 31) are on, and the
frame's page is refreshable, the error is treated as soft and no recovery routines
are entered.

Any other occurrences of storage error uncorrected are treated as hard errors
and software recovery is initiated for the error.
Bit 17 (Storage error corrected) - A single-bit storage error was detected and
successfully corrected by hardware. Software treats this error as a soft error. This
error sometimes appears in conjunction with system recovery (bit 2).
Bit 18 (Storage key error uncorrected) - Hardware has detected a bit error in a
storage key. Software attempts to reset the storage key to its original value. If the
key is successfully reset, and the storage key error occurs in conjunction with
instruction processing damage when the backed-up bit (bit 14) and the storage
logical validity bit (bit 31) are on, the error is treated as soft and no recovery
routines are entered. When the storage key error occurs in conjunction with a
system recovery or external damage error, it is also treated as a soft error and
no recovery routines are entered. Change bits are set to one in case the frames have
been altered. Any other occurrences of storage key error are treated as hard
errors and software recovery is initiated for the error.

In addition to these error description bits there are other MCIC fields that
describe the time-of-occurrence of the machine check interrupt, or the validity of
the registers, PSW, and other data logged out during the machine check interruption
process.
The two time-of-occurrence bits are bits 14 and 15. The backed-up bit (bit 14),
when set to 1, indicates that the machine check occurred before actual damage
occurred. The delayed bit (bit 15) is set to 1 when the processor has been disabled
for one or more of the interrupt conditions described in the MCIC. The processor
had been processing after damage was detected.


Validity bits describe the validity of the associated field logged out during the
machine check interrupt. If a validity bit is 0, the associated data logged out is
incorrect. Validity bits are:
• Bit 20 (PSW EMWP mask validity)
• Bit 21 (masks and key validity)
• Bit 22 (program mask and condition code validity)
• Bit 23 (instruction address of machine check old PSW validity)
• Bit 24 (failing storage address validity)
• Bit 25 (region code validity)
• Bit 27 (floating point register validity)
• Bit 28 (general purpose register validity)
• Bit 29 (control register validity)
• Bit 30 (processor model-dependent logout validity)
• Bit 46 (processor timer validity)
• Bit 47 (clock comparator validity)

Additionally, the storage logical validity bit (bit 31 set to 1) indicates that all store
operations (that were to occur before the machine check interrupt) have
completed.
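Checking the validity bits can likewise be mechanized. A sketch, using the bit positions listed above (names abbreviated; bit 0 again taken as the leftmost MCIC bit):

```python
# Validity-bit positions from the list above; if a bit is 0 the
# corresponding logged-out field must not be trusted.
VALIDITY_BITS = {
    20: "PSW EMWP mask", 21: "masks and key",
    22: "program mask and condition code",
    23: "instruction address of machine check old PSW",
    24: "failing storage address", 25: "region code",
    27: "floating point registers", 28: "general purpose registers",
    29: "control registers", 30: "model-dependent logout",
    46: "processor timer", 47: "clock comparator",
}

def invalid_fields(mcic):
    """Names of logged-out fields whose validity bit is 0."""
    return [name for bit, name in sorted(VALIDITY_BITS.items())
            if not (mcic >> (63 - bit)) & 1]

all_valid = sum(1 << (63 - b) for b in VALIDITY_BITS)
print(invalid_fields(all_valid & ~(1 << (63 - 28))))
# ['general purpose registers']
```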

The following chart attempts to show the action taken for each error condition.
For example: in column 6 the condition involves recursive machine checks, or a
check stop, or invalid logout. The condition originated on either a Model 158 or
a Model 168 attached processor system, and did not involve the APU. The action
taken resulted in a disabled wait. Where multiple errors do exist, appropriate repair
action is taken for all errors, and recovery action is taken for the most severe error.
With the exception of I/O reserve outstanding, the status of each of the
conditions can be determined from examination of MCH SYS1.LOGREC records.
[The machine check condition/action matrix that appears here did not survive
text extraction and is not reproducible. Its condition rows cover: recursion,
check stop, invalid logout; the MCIC subclass (system damage, instruction
processing damage, system recovery, timer damage, clock damage, external
damage, degradation, warning, storage error uncorrected, storage error
corrected, key error, key error unresetable); time of occurrence (backed up,
delayed); validity (PSW, failing storage address, registers, logout, storage
logical, CPU timer, clock comparator); storage location (pageable, nucleus,
LSQA/SQA, fixed, V=R, outside current memory) and storage state (changed,
unchanged); processor configuration (UP, MP, AP, 158, 168, APU, I/O);
reserve outstanding; and first or second occurrence. Its action rows cover:
reset timing component; mark CPU timer, clock comparator, or TOD clock
permanently damaged; invoke PWF if available; activate CRH; take frame
offline immediately or when available; invalidate page table entry; refresh
the nucleus page; resume at MCOPSW; repair SPF key; take processor offline;
enter RTM for recovery; record; restartable wait; disabled wait.
Key: X = condition must be present; O = condition must not be present;
© = the action is the same no matter which condition represents the
situation (possible loss of job).]


Debugging Problem Program Abend Dumps
The following steps may provide some initial assistance in this debugging process:
1. Locate the RTM2 work area (RTM2WA), which is pointed to by the TCBRTWA
field in the TCB and the ESART2WA field in the abend SVRB. It provides a
summary of the abend as follows:
Name        Offset  Explanation
RTM2CC      1D      Abend completion code.
RTM2ABNM    8C      Abending program name. This is the name of a
                    load module or an external entry point (ALIAS)
                    in the load module.
RTM2ABEP    94      Abending program address (the beginning of the
                    load module or an ALIAS in the load module).
RTM2EREG    3C      Registers at time of error.
RTM2APSW    7C      EC PSW at time of error.
RTM2ILC1    85      Instruction length code for PSW at time of error.
RTM2ERAS    36C     Error ASID.
RTM2TRCU    37C     Address of current trace entry for saved system
                    trace table.
RTM2TRFS    380     Address of first trace entry for saved system
                    trace table.
RTM2TRLS    384     Address of last trace entry for saved system
                    trace table.
RTM2ERRA    B4      Error type.

Notes:
• The RTM2ABNM and RTM2ABEP fields do not contain information about
the abending program if an SVC has abended.
• In a recursive abend (an abend occurring while the original abend is
being processed by an ESTAE or other recovery routine), more than one
RTM2WA may be created, and the RTM2PREV or RTM2PRWA field points
to other RTM2WAs associated with the problem. The system diagnostic
work area (SDWA) is pointed to by the RTM2RTCA field during recovery
routine processing, and has register contents at time of error stored in the
SDWAGRSV field. These register contents may differ from those in the
RTM2WA after a recursive abend.


2. To find the abend code and its explanation, look at the completion code at
the top of the abend dump. A user completion code is printed as a 4-digit
decimal number and a system completion code is printed as a 3-digit
hexadecimal number.
If the user code is non-zero, a user program has specified the completion code
in an abend macro instruction. Looking up the name of the abending program
in the RTM2WA, and investigating why the program would issue this completion
code, should lead directly to the cause of the error in the user program.
Usually the system code is non-zero. This indicates that a system routine
issued the abend but a problem program might indirectly have caused the
abnormal termination. For example, a problem program might have branched to
an invalid storage address, specified an invalid parameter on a macro instruction,
or requested too much storage space.
Often the explanation of the system code gives enough information to
determine the cause of the termination. The explanations of system completion
codes, along with a short description of the action for the programmer to take to
correct the error, are contained in OS/VS Message Library: VS2 System Codes.
A summary of system codes is in the Debugging Handbook Volume 1.
Note: Completion codes are not printed at the top of abend dumps that are
formatted with the AMDPRDMP service aid. System completion codes can be
found in the third to fifth digits (00xxx000) of the abend completion code in
the RTM2 work area. User completion codes are located in the sixth to eighth
digits (00000xxx) of the abend code in the RTM2 work area, and in this case
are in 3-digit hexadecimal form.
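The digit positions described in this note can be extracted with simple shifts and masks; a sketch (the sample code value is illustrative):

```python
def split_completion_code(code):
    """Split a 32-bit abend completion code (00sssuuu) into its
    3-hex-digit system part and 3-hex-digit user part."""
    return (code >> 12) & 0xFFF, code & 0xFFF

system, user = split_completion_code(0x000C4000)   # an S0C4 abend
print("S%03X" % system, "U%04d" % user)            # S0C4 U0000
```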

3. To find the name of the abending program look in the RTM2 work area. System
routines usually start with the letters A or I; and module prefixes for system
routines are listed in the Debugging Handbook Volume 1.
Note: If the RTM2 work area is not available, or if the name of the abending
program is not given in the RTM2 work area, the routine name can be obtained
from the request blocks (RBs) that are formatted in the dump. If the ABEND
dump was taken to a data set (or to SYSOUT) specified with a SYSABEND,
SYSMDUMP, or SYSUDUMP DD statement, the last two RBs are SVRBs for
the SNAP and SYNCH SVCs used to take the dump. The SVC numbers can be
checked by obtaining the hexadecimal SVC number from the interruption code
of the WC-L-IC field in the RB. The Debugging Handbook contains a list of SVC
numbers. The SNAP SVC is hexadecimal '33', and the SYNCH SVC is
hexadecimal '0C'. The RB for the program that caused the abend is immediately
before these two RBs.

CSECTs within load modules in the private area of an address space can be
located using a linkedit map produced by the AMBLIST service aid. CSECTs
in load modules in the nucleus, FLPA, or PLPA can be located using a nucleus
or link pack area map, also produced by AMBLIST.


4. To find the instruction that caused a program interrupt (program check)
completion code (OCx) in a problem program, examine the PSW at the time of
error. It is at the top of the abend dump, in the RTM2 work area, and in the
RB for the program that caused the abend. The instruction address field in
the PSW contains the address of the next instruction to be executed.
The length of the abend-causing instruction is printed following the
instruction length code's title 'ILC' at the top of some abend dumps. It is
also located in the RTM2ILC1 field (see the RTM2 work area), and is formatted
in the third and fourth digits (00xx0000) of the WC-L-IC field in the PRB.
The address of the instruction that caused the termination can be found by
subtracting the instruction length from the address in the PSW.
Subtract the program address found in the RTM2WA (and in the last PRB)
from the instruction address. The resulting offset can be used to find the
matching instruction in the abending program's assembler listing for this CSECT.
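The arithmetic in step 4 is just two subtractions; a sketch with fabricated addresses:

```python
def failing_instruction(psw_next_addr, ilc, program_base):
    """Back up from the PSW's next-instruction address by the
    instruction length, then form the offset into the abending
    program for use against the assembler listing."""
    failing_addr = psw_next_addr - ilc
    return failing_addr, failing_addr - program_base

addr, offset = failing_instruction(0xA35C12, 4, 0xA35800)
print(hex(addr), hex(offset))   # 0xa35c0e 0x40e
```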
5. To find the cause of a program interrupt, check the explanation of the system
completion code and the instruction that caused the interrupt. Also check the
registers from the time of error, which are saved in the RTM2WA and in the
SVRB following the RB for the program that caused the abend. The
formatted save area trace can be used to check the input to the failing CSECT.
6. To find the cause of an abend code from an SVC or from a system I/O
routine, check the explanation of the system completion code, then find the last
instruction executed in the failing program and examine the related SVC and I/O
entries in the trace table or GTF trace records.
The last PRB in the formatted RBs has a PSW field containing the address
of the instruction following the instruction that issued the SVC. For I/O
requests, check the entry point address ('EPA') field in the last PRB. The
formatted save area trace gives the address of the I/O routine branched to,
and the return address in that save area is the address of the last instruction
executed in the failing program.
The trace information can be checked for SVC entries that match the
formatted SVRBs, or for I/O entries issued from addresses in the failing program.
The trace information is formatted in the dump if the installation has specified
it as a dump option. If the system trace table is not formatted, look in the
RTM2 work area for pointers to the copy of the system trace table that was
saved from the time of the error. Location X'54', which is the FLCTRACE
field in the prefixed save area (PSA), points to the system trace table header.
The system trace table is frequently overlaid with entries for other system
activity by the time the dump is produced.
If the dump contains trace records, begin at the most recent entry and
proceed backwards to locate the most recent SVC entry indicating the problem
state. From this entry, proceed forward in the table. Examine each entry for an
error that could have terminated the SVC or I/O system routine. The format of
system trace table entries is described in the Debugging Handbook under the
heading 'TTE Trace Table Entry.' The format of GTF trace records is also
described in the Debugging Handbook.


Debugging from Summary SVC Dumps
The summary dump area formatted by the SUMDUMP option of SDUMP should
contain the most current data relevant to the problem present in the dump. It is
strongly recommended that the SUMDUMP output be reviewed prior to
investigating the usual portions of the dump. The SUMDUMP option provides
different output for SVC and branch entries. For example, branch entries
generally dump PSA, LCCA, and PCCA control blocks, and SVC entries generally
dump RTM2WA control blocks. Each output type is indicated by the header
"- - - - tttt - - - - RECORD ID X'nnnn'," where tttt is the title for the type of
SUMDUMP output, and nnnn is the hexadecimal record identifier assigned to
the type. The record id values are described in the table below. They are also
described by the IHASMDLR mapping macro in the Debugging Handbook.

SUMDUMP Output for SVC-Entry SDUMP
The following table summarizes the SUMDUMP output types for an SVC entry to
SDUMP:
SVC-ENTRY TABLE

Record ID                              Mapping    Fields used to Dump
Dec  Hex   Title                       Macro      PSW or Register Areas
 4   4     TRACE TABLE                 TTE
46   2E    SUMLIST RANGE
48   30    REGISTER AREA                          RTM2EREG
49   31    PSW AREA                               RTM2NXT1
53   35    NORMAL DATA END
57   39    RTM2 WORK AREA              IHARTM2A
58   3A    RTM2WA TRACE TAB            TTE
60   3C    ASID INFO

For an SVC entry to SDUMP, the SUMDUMP output can contain information
that is not available in the remainder of the SVC dump if options such as region,
LSQA, nucleus, and LPA were not specified in the dump parameters.
For each address space that is dumped, the SUMDUMP output is preceded
by a header with the ASID, plus the jobname and stepname for the last task
created in the address space. The SUMDUMP output contains RTM2 work
areas for tasks in address spaces that are dumped. Many of the fields in the
RTM2WA provide valuable debugging information. (See "Debugging Problem
Program ABEND Dumps" for more details.)

Each RTM2WA is followed by 'RTM2WA TRACE TAB' output (record id
X'3A') if there is a copy of the system trace table associated with the RTM2WA
(the RTM2TRCU, RTM2TRFS, and RTM2TRLS fields are non-zero). The current
entry in the trace table copy is pointed to by RTM2TRCU (offset 37C) in the
associated RTM2 work area. System trace table entries are mapped by the TTE
(Trace Table Entry) section in the Debugging Handbook.
Each RTM2WA is also followed by 'PSW AREA' output (record id X'31').
A PSW area, consisting of the instruction pointed to by the RTM2NXT1 field in
the EC PSW saved in the RTM2WA, and the preceding instruction with length
from the RTM2ILC1 field, is dumped if the instructions can be accessed.
After information for all RTM2WAs associated with a task is dumped, 'PSW
AREA' (record id X'31') and 'REGISTER AREA' (record id X'30') output appears.
This consists of 2K of storage before and after each valid unique address pointed to
by the PSW and the registers from the time of the error (the RTM2NXT1 and
RTM2EREG fields) from all the RTM2 work areas. Up to 32 unique addresses can
be dumped for each task. Register addresses less than 2K are not dumped because
they are considered to be counters. If the storage that is 2K before and after an
address cannot be accessed, a length of 300 bytes is tried. If that amount of storage
cannot be accessed, the address's record entry appears with a zero length.
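The selection rule described in this paragraph can be sketched as follows (the function name and sample register values are illustrative, and the 300-byte fallback for inaccessible storage is omitted):

```python
def register_dump_ranges(addresses, radius=2048, limit=32):
    """Select unique register/PSW addresses (values under 2K are
    treated as counters and skipped, at most `limit` kept) and return
    the storage range `radius` bytes on either side of each."""
    unique = []
    for addr in addresses:
        if addr >= 2048 and addr not in unique and len(unique) < limit:
            unique.append(addr)
    return [(max(0, a - radius), a + radius) for a in unique]

regs = [0x40, 0x7F123, 0x7F123, 0x1000]   # 0x40 looks like a counter
print([(hex(lo), hex(hi)) for lo, hi in register_dump_ranges(regs)])
# [('0x7e923', '0x7f923'), ('0x800', '0x1800')]
```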
'TRACE TABLE' output (record id X'04') appears if the first address space
dumped has no trace table saved in an RTM2 work area and the system trace was
active. The output includes the header (pointers to the current, first, and last
entries) and the entries in the system trace table. System trace table entries are
mapped by the trace table entry (TTE) described in the Debugging Handbook.
'SUMLIST RANGE' output (record id X'2E') appears at the beginning of the
SUMDUMP output if the SUMLIST keyword was specified in the SDUMP macro
instruction.

SUMDUMP Output for Branch-Entry SDUMP
The following table summarizes the SUMDUMP output types from a branch entry
to SDUMP:
BRANCH-ENTRY TABLE

Record ID                              Mapping    Fields used to Dump
Dec  Hex   Title                       Macro      PSW or Register Areas
 1   1     PCCA                        IHAPCCA
 2   2     LCCA                        IHALCCA
 3   3     PSA                         IHAPSA     FLCIOPSW, FLCPOPSW,
                                                  FLCEOPSW, FLCROPSW
 4   4     TRACE TABLE                 TTE
 5   5     FRRSTACK                    IHAYSTAK
 6   6     GWSA PAGE IO ERR
 7   7     GWSA GET/FREEMAIN
 8   8     GWSA RSM
 9   9     GWSA RSM SUSPEND
10   A     GWSA MEM SWITCH
11   B     GWSA STATUS
12   C     GWSA SRM
13   D     GWSA MEM TERM
14   E     GWSA ENQ/DEQ
15   F     GWSA STOP/RESTRT
16   10    GWSA IEAVESC0
17   11    CWSA LOW-LVL CMN
18   12    CWSA GTF
19   13    CWSA SRM
20   14    CWSA TIMER
21   15    CWSA ACR
22   16    CWSA RTM/MACHK
23   17    CWSA IOS FLIH
24   18    CWSA DISPATCHER
25   19    CWSA MF/1
26   1A    CWSA ABTERM
27   1B    CWSA I/O RESTART
28   1C    CWSA STATUS
29   1D    CWSA SUPR REPAIR
30   1E    CWSA RTM-CCH
31   1F    LWSA LOW-LVL CMN
32   20    LWSA VALID'Y CHK
33   21    LWSA RTM
34   22    LWSA SDUMP
35   23    LWSA ABTERM
36   24    LWSA CIRB
37   25    LWSA STG2 EXT EF
38   26    LWSA EXIT (SVC3)
39   27    LWSA POST
40   28    LWSA WAIT
41   29    LWSA STATUS
42   2A    LWSA STAE
43   2B    LWSA EVENTS
44   2C    LWSA RSM
45   2D    LWSA ASCB CHAP
46   2E    SUMLIST RANGE
47   2F    INT HANDLER SA              IHAIHSA    IHSAGPRS
48   30    REGISTER AREA
49   31    PSW AREA
50   32    GBL WSA VEC TABL            IHAWSAVT
                                       (WSAVTG)
51   33    CPU WSA VEC TABL            IHAWSAVT
                                       (WSAVTC)
52   34    LCL WSA VEC TABL            IHAWSAVT
                                       (WSAVTL)
53   35    NORMAL DATA END
54   36    CWSA ASM DIE
55   37    CWSA ASM SRB-I/O
56   38    SDWA                        IHASDWA    SDWAGRSV
60   3C    ASID INFO

The SUMDUMP output for a branch entry to SDUMP might not match the data
that is at the same addresses in the remainder of the dump. The reason for this is
that the SUMDUMP is taken at the entry to SDUMP, and while the processor is
disabled for interrupts. The system data in the remainder of the dump is often
changed because other system activity occurs before the dump is complete. The
SUMDUMP output is preceded by a header with the ASID for the failing address
space.


From a branch entry into SDUMP, the SUMLIST range and trace table output is
handled similarly to that from an SVC entry. However, SUMLIST addresses must
point to areas that are paged in or they cannot be dumped.

The PSA, LCCA, and PCCA are dumped for each alive processor (record ids
X'03', X'02', and X'01' respectively).

The interrupt handler save area (IHSA - record id X'2F') is dumped for the
current address space. This save area includes the current FRR stack for suspended
address spaces.

The system diagnostic work area (SDWA - record id X'38') is dumped for the
current error if the RTM1 work area is currently valid and being used.

Unique register contents are obtained from the IHSA and the current SDWA.
Each unique register value is used as an address and storage is dumped from 2K plus
and minus this address for a total of 4K each. These 'Register Areas' are printed
with record id X'30'.

The Super FRR Stacks (record id X'05'), including RTM1 work areas, are
dumped.

The global, local, and processor work save area vector tables (record ids X'32',
X'34', and X'33' respectively) are dumped. The save areas pointed to by these save
area vector tables are also dumped. The branch-entry table at the beginning of this
description lists the record ids for each work save area.

2K of storage on either side of the address portion of the I/O old PSW, the
program check old PSW, the external old PSW, and the restart old PSW saved in
the PSA for all processors, is dumped. These 'PSW Areas' are printed with record
id X'31'.

Note: The SUMDUMP output from a branch entry to SDUMP only contains areas
that were already paged in when the SUMDUMP was taken.

Started Task Control ABEND and Reason Codes

In case of an irreparable error, the started task control (STC) routines issue these
ABEND codes:

0B8 - An error occurred while STC routines were processing a START,
      MOUNT, or LOGON command. In each case, the command task is
      terminated; for a START or MOUNT command, the STC routines
      issue message IEE824I.
      The following error codes can appear in register 15 at the time of the
      ABEND:

      04 - Module IEEPRWI2 or IEFJSWT detected an invalid command
           code in the CSCB; the command code was incorrect for a
           START, MOUNT, or LOGON command.

      08 - Module IEESB605 invoked IEFAB4FC (an Allocation
           routine) to build a TIOT for the START, MOUNT, or
           LOGON task; IEFAB4FC returned control to IEESB605 with
           a return code indicating failure.

      12 - Module IEESB605 invoked IEFJSWT (an STC routine) to
           write the internal JCL text for the START, MOUNT, or
           LOGON command into a system data set; IEFJSWT returned
           control to IEESB605 with a return code indicating that it
           failed in its attempt to open the data set.

0B9 - Module IEESB605 invoked the master subsystem via the subsystem
      interface to determine whether a START command was issued to start
      a subsystem; an error occurred during master subsystem processing.
      The command task is terminated; for a START or MOUNT command,
      IEESB605 issues message IEE824I.

0BA - Module IEESB605 invoked the master subsystem via the subsystem
      interface to determine whether a START command was issued to start
      a subsystem; an error occurred during subsystem interface processing.
      The command task is terminated; for a START or MOUNT command,
      IEESB605 issues message IEE824I.

SWA Manager Reason Codes

In case of an irreparable error, the SWA manager routines issue a 0B0 ABEND.
Before abending, both object modules IEFQB550 and IEFQB555 place a code in
register 15 indicating the exact cause of the error.
These are the error codes that can appear in register 15:

04 - The routine that called SWA manager requested an invalid function.

08 - The routine that called SWA manager passed an invalid SWA virtual
     address (SVA). Either the SVA does not point to the beginning of a SWA
     prefix or the SWA prefix has been destroyed.

0C - A SWA manager routine has attempted to read a record not yet written
     into SWA.

10 - Either IEFQB550 (move mode module) has attempted to read or write
     a block which is not 176 bytes, or IEFQB555 (locate mode module) has
     attempted to assign a block with a specified length of 0 or a negative
     number.

14 - The routine that called SWA manager has specified an invalid count
     field. For move mode, an invalid count is 0 for a READ, WRITE, or
     ASSIGN function; an invalid count for WRITE/ASSIGN is 00.

18 - The routine that called SWA manager by issuing the QMNGRIO macro
     instruction specified both or neither of the READ and WRITE options.

1C - The routine that called SWA manager was attempting to write into a
     SWA block for the first time; it either passed a nonexistent ID or failed
     to pass one at all.

20 - IEFQB555 has attempted to write a block using an invalid pointer to
     the block.


OS/VS2 System Programming Library: MVS Diagnostic Techniques

Additional Data Gathering Techniques

This chapter describes additional techniques for gathering data and circumventing
certain system problems. The superzaps should be checked out before they are
applied to your system. Displacements vary according to release level and PTF
activity.

The examples were deliberately kept simple and are designed to illustrate a
technique rather than to be practical in themselves.

CAUTION: Extreme care must be used when you are considering a system
alteration in order to gather additional data about a problem. None of the
superzaps described in this chapter should be applied before the system programmer
has verified the logic being zapped and the trap logic itself. Remember, if any one
location or offset within the module or trap changes, all offsets and base registers
must be verified.
This chapter contains the following topics:

• Using the CHNGDUMP, DISPLAY DUMP, and DUMP Commands
• How to Print Dumps
• How to Automatically Establish System Options for SVC Dump
• How to Copy PRDMP Tapes
• How to Rebuild SYS1.UADS
• How to Print SYS1.DUMPxx
• How to Clear SYS1.DUMPxx Without Printing
• How to Print the SYS1.COMWRITE Data Set
• How to Print an LMOD Map of a Module
• How to Re-create SYS1.STGINDEX
• Software LOGREC Recording
• Using the PSA as a Patch Area
• Using the SLIP Command
• Enabling the PER Hardware to Monitor Storage Locations
• System Stop Routine
• Using the MVS Trace to Monitor Storage
• How to Expand the Trace Table


Using the CHNGDUMP, DISPLAY DUMP and DUMP Commands

A dump obtained from MVS contains those storage areas specified in the dump
request and those defined as system defaults in SYS1.PARMLIB for SYSABEND,
SYSMDUMP, and SYSUDUMP. Normal system defaults are:

SYSABEND:   CB, ENQ, TRT, ALLPA, SPLS, LSQA, PSW, REGS, SA, DM,
            IO, and ERR

SYSMDUMP:   LSQA, NUC, RGN, SQA, SWA, and TRT

SYSUDUMP:   CB, ENQ, TRT, ALLPA, SPLS, PSW, REGS, SA, DM, IO,
            and ERR

There are no defaults for an SVC dump other than SQA, ALLPSA, and
SUMDUMP, which are assumed by the dump program if the options NOSQA,
NOALLPSA, and NOSUM are not specified.
The CHNGDUMP operator command is used to dynamically alter the options
specified originally by SYS1.PARMLIB or by previous CHNGDUMP
commands. Dump mode may be set to ADD, OVER, or NODUMP. System action
for each setting is:

ADD    - merges the options specified on the dump request with the options in
         the system dump options list.

OVER   - ignores the options specified in the dump request and uses only the
         options in the dump options list.

NODUMP - ignores the request and does not dump.
To determine the current system dump options, use the DISPLAY DUMP,
OPTIONS command. If an error is made while specifying the CHNGDUMP
command, the system rejects the command and issues an error message.

The topic "How to Automatically Establish System Options for SVC Dump",
which appears later in this chapter, describes how to issue the CHNGDUMP
command during IPL. See Operator's Library: OS/VS2 MVS System Commands
for the format of the CHNGDUMP command.
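As an illustration of the ADD setting (the option list here is only a sample; the
full command syntax is in Operator's Library: OS/VS2 MVS System Commands),
the following command merges RGN and TRT into the SVC dump options list:

   CD SET,SDUMP=(RGN,TRT),ADD

Specifying OVER instead of ADD would instead cause the system to ignore the
options on subsequent dump requests and use only the options in the dump
options list.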
The DUMP command must be used carefully if the desired dump is to be
obtained. For instance, the following typical error can occur when requesting
a dump. The operator enters DUMP COMM=(title). The system responds with
message IEE094 requesting the dump parameters. If the operator replies 'U'
to this message, the system dumps the current address space which is the
master scheduler address space. The operator must reply with ASID, Jobname,
or TSOname. See Operator's Library: OS/VS2 MVS System Commands for the
format of the DUMP command.
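To illustrate the correct exchange (the reply identifier 01 is arbitrary here, and
JOBNAME is one of the three acceptable operand forms):

   DUMP COMM=(HANG IN DVTRTN)
     ... system issues message IEE094 requesting the dump parameters ...
   R 01,JOBNAME=DVTRTN

Replying with a jobname (or an ASID or TSO userid) ensures that the intended
address space, not the master scheduler's, is dumped.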

How to Print Dumps

The PRDMP control statements can be used to minimize the size of the output
produced from a stand-alone dump and still keep the number of reruns to a
minimum. This section discusses the DD statements and control statements used
in the following example:

//ASIDDMP  JOB MSGLEVEL=1
//         EXEC PGM=AMDPRDMP
//PRINTER  DD SYSOUT=A
//SYSPRINT DD SYSOUT=A
//TAPE     DD UNIT=TAPE,LABEL=(1,NL),VOL=SER=ABCTPE,DISP=OLD
//SYSUT1   DD UNIT=251,SPACE=(TRK,(400,20)),DISP=NEW
//* PRINT STORAGE=ASID(X)=(X,X,X,X,X,X) IS PROPER FORMAT
CVTMAP
CPUDATA
SUMMARY
QCBTRACE
SUMDUMP
LPAMAP
FORMAT
EDIT
PRINT CURRENT,SQA
PRINT STORAGE=ASID(X)=(xxxx,xxxx,xxxx,xxxx)
PRINT JOBNAME=(jobnames)
PRINT REAL=(xxxx,xxxx)
ASMDATA
END

The PRINTER DD statement defines the output data set for the dump itself. It
should be directed to a SYSOUT class as shown.

The SYSPRINT DD statement defines the data set for PRDMP messages, etc.

The TAPE DD statement defines the input data set to PRDMP. It can define one
of the SYS1.DUMPxx data sets, a stand-alone dump tape, or a GTF output data set
on either tape or DASD.

The SYSUT1 DD statement defines work space to PRDMP. It can be used to
define the input data set. It is not required if the input data set is defined by the
TAPE DD statement. It does, however, significantly enhance the performance of
PRDMP when it is used in conjunction with the TAPE DD statement and when
the input is a tape data set.

The SPACE parameter is determined by the size of the dump. Generally, 5
cylinders, 95 tracks, or 285 4104-byte records should be specified for each megabyte
of real storage dumped by SADMP.
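As a worked example of that rule of thumb: a stand-alone dump of a machine
with 4 megabytes of real storage calls for about 4 x 5 = 20 cylinders, so a SYSUT1
allocation such as the following (the secondary quantity of 5 cylinders is an
arbitrary cushion) would be a reasonable starting point:

   //SYSUT1   DD UNIT=SYSDA,DISP=NEW,SPACE=(CYL,(20,5))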
Control Statements

The placement of the control statements determines the sequence in which the
dump is printed. Refer to the "Dump and Trace Formats" section of the
Debugging Handbook for examples of how these statements format a dump.

The following statements should be included in every run of PRDMP:

SUMMARY - defines and prints the dump ranges of the dump, active processor,
active tasks, etc.

CVTMAP - formats the CVT and can be an aid in finding other significant control
blocks in the system.

CPUDATA - formats the CSD, PSA, PCCA, and LCCA for each active processor.


QCBTRACE - formats the ENQ/DEQ control blocks in use at the time the dump
was taken.

SUMDUMP - locates and prints the summary dump data provided by SVC dump.
It should be used on all SVC dumps.

LPAMAP - provides a listing of the modules on the link pack area list. It
identifies the entry point address of those modules and their length. It does
not identify SVC modules since they are found through the SVC table.

The FORMAT statement can produce voluminous data depending on the number
of address spaces defined at the time the dump is taken. However, it should be
included in the initial run of PRDMP because it produces the formatted TCB
summary showing the abend completion codes for each TCB in the system and
the global and local SPLs.

The EDIT statement should also be included in every initial run of PRDMP. It
formats and prints the GTF buffers (that is, all internal trace buffers or those
external trace buffers that have not been written to the TRACE data set) if GTF
is active at the time the dump is taken. If GTF is not active, only an error
message is printed. The OS trace is not valid if GTF is running.
The PRINT statement can be used several ways:

•  PRINT CURRENT,SQA - should be included in the initial run of
   PRDMP. It formats and prints the address space and task-related control
   blocks of the address space active at the time the dump is taken. SQA
   should be printed for the valuable data it contains, such as the trace table
   and logrec buffers. PRINT CURRENT prints only the current address
   space of the processor from which the SADMP program was IPLed.

•  PRINT NUC,CSA - should not be included in the initial run of
   PRDMP because of the volume of data it produces. Once a problem is
   suspected in this area, the PRDMP program should be rerun specifying
   only these parameters.

•  PRINT STORAGE=ASID(x)=(xxxx,xxxx) - should not be included in
   the initial run of PRDMP. Once a problem is isolated to an address space
   or a range of storage addresses, rerun PRDMP specifying only these
   parameters. Several ASIDs and several address ranges can be requested
   with one run of PRDMP. PRDMP does not duplicate address ranges for
   every ASID but prints all storage dumped (NUC, CSA, SWA, LPA in
   storage) if only ASIDs are specified without address ranges. PRINT
   STORAGE is useful for printing SVC dumps. See the discussion "How
   to Print SYS1.DUMPxx" later in this chapter.

•  PRINT JOBNAME=(jobnames) - produces output equivalent to PRINT
   CURRENT except it prints the private address space of the job(s) requested.
   It should not be used for the initial run of PRDMP unless the jobname is
   known from another source, such as the system log.

•  PRINT REAL=(xxxx,xxxx) - prints real storage in specified address
   range pairs. Use this option only when the system cannot find adequate
   data to format the dump.

ASMDATA - formats and prints all ASM control blocks. It produces voluminous
data and should not be run until an ASM failure is suspected.

How to Automatically Establish System Options for SVC Dump

A potential problem is that the SVC dumps written to the SYS1.DUMPxx data sets
contain only those address ranges that the FRR or ESTAE routine passes to SDUMP.
When these dumps are subsequently printed by PRDMP, the PRDMP formatting
program might not find sufficient data to format the dump properly. This can make it
difficult to find data in an SVC dump and it can provide erroneous indicators to the
problem solver.

The CHNGDUMP command can be used to alter the SVC dump system options
and provide a complete dump. The following job updates the COMMND00
member of SYS1.PARMLIB to issue the CHNGDUMP command automatically at
IPL time. The CHNGDUMP command can also be entered by the operator. (See
Operator's Library: OS/VS2 MVS System Commands for a description of the
CHNGDUMP command.)
//UPDAT    JOB (5,5),MSGLEVEL=1,REGION=100K
//         EXEC PGM=IEBUPDTE
//SYSPRINT DD SYSOUT=A
//SYSUT1   DD UNIT=SYSDA,VOL=SER=SYSRES,DISP=OLD,DSN=SYS1.PARMLIB
//SYSUT2   DD UNIT=SYSDA,VOL=SER=SYSRES,DISP=OLD,DSN=SYS1.PARMLIB
//SYSIN    DD DATA
./ REPL NAME=COMMND00,LIST=ALL
./ NUMBER NEW1=10,INCR=20
COM='TRACE ON'
COM='CD SET,SDUMP=(PSA,NUC,SQA,LSQA,RGN,TRT),Q=YES,ADD'
./ ENDUP

How to Copy PRDMP Tapes

It is sometimes necessary to copy dump tapes to supply another location with a
copy of the dump while retaining your own. It is particularly useful to be able to
supply a dump tape with an APAR.

A simple way to do this is to use PRDMP as a copy program. Define the input
tape with the TAPE DD statement and the output tape with the SYSUT2 DD
statement. It is also possible to put several dumps on one tape or take one dump
from a multiple dump tape by manipulating the file number in the LABEL
parameter. The following example shows how this is done:

//ASIDDMP  JOB MSGLEVEL=1
//         EXEC PGM=AMDPRDMP
//PRINTER  DD SYSOUT=A
//SYSPRINT DD SYSOUT=A
//TAPE     DD UNIT=TAPE,LABEL=(2,NL),VOL=SER=DMPIN,DISP=OLD
//SYSUT2   DD UNIT=TAPE,LABEL=(,NL),VOL=SER=DMPOUT,DISP=(NEW,KEEP)
//SYSIN    DD *
END
/*

After copying a PRDMP tape, a quick run through PRDMP to verify that the CVT
can be formatted and printed will prove that the copy was successful.

//ADMP     JOB MSGLEVEL=1
//         EXEC PGM=AMDPRDMP
//PRINTER  DD SYSOUT=A
//SYSPRINT DD SYSOUT=A
//TAPE     DD UNIT=TAPE,LABEL=(1,NL),VOL=SER=DMPTPE,DISP=OLD
//SYSUT1   DD UNIT=SYSDA,SPACE=(TRK,(400,20)),DISP=NEW
CVTMAP
END
/*

How to Rebuild SYS1.UADS

The loss of the SYS1.UADS data set can significantly impact a TSO environment.
However, it is possible to run the TMP as a batch job and recreate SYS1.UADS in
the background. The following is an example of a job that has been run
successfully to scratch and recreate a SYS1.UADS data set.

//BLDUADS  JOB MSGLEVEL=1
//         EXEC PGM=IEFBR14
//DD2      DD VOL=SER=SYSRES,DISP=(OLD,DELETE),UNIT=3330,
//         DSN=SYS1.UADS
//         EXEC PGM=IKJEFT01
//SYSPRINT DD SYSOUT=A
//SYSUADS  DD DSN=SYS1.UADS,DISP=(NEW,KEEP),SPACE=(800,(20,9,30)),
//         UNIT=3330,VOL=SER=SYSRES,DCB=(RECFM=FB,DSORG=PO,LRECL=80,
//         BLKSIZE=800)
//SYSLBC   DD DSN=SYS1.BRODCAST,DISP=SHR
//SYSIN    DD *
ACCOUNT
SYNC
ADD (USER01 TSOTEST * IKJACCNT) UNIT(SYSDA) ACCT OPER JCL MOUNT
ADD (USER02 TSOTEST * IKJACCNT) UNIT(SYSDA) ACCT OPER JCL MOUNT
ADD (USER03 TSOTEST * IKJACCNT) UNIT(SYSDA) ACCT OPER JCL MOUNT
ADD (USER04 TSOTEST * IKJACCNT) UNIT(SYSDA) ACCT OPER JCL MOUNT
ADD (USER05 TSOTEST * IKJACCNT) UNIT(SYSDA) ACCT OPER JCL MOUNT
ADD (USER06 TSOTEST * IKJACCNT) UNIT(SYSDA) ACCT OPER JCL MOUNT
ADD (USER07 TSOTEST * IKJACCNT) UNIT(SYSDA) ACCT OPER JCL MOUNT
ADD (USER08 TSOTEST * IKJACCNT) UNIT(SYSDA) ACCT OPER JCL MOUNT
ADD (USER09 TSOTEST * IKJACCNT) UNIT(SYSDA) ACCT OPER JCL MOUNT
ADD (USER0A TSOTEST * IKJACCNT) UNIT(SYSDA) ACCT OPER JCL MOUNT
ADD (USER0B TSOTEST * IKJACCNT) UNIT(SYSDA) ACCT OPER JCL MOUNT
ADD (USER0C TSOTEST * IKJACCNT) UNIT(SYSDA) ACCT OPER JCL MOUNT
LIST (*)
END
/*


How to Print SYS1.DUMPxx

See the discussion under "How To Print Dumps" earlier in this chapter to define
the control statements required. The same rules apply except in this case the TAPE
DD statement points to one of the SYS1.DUMPxx data sets. These are cataloged
data sets and require no further definition.

Be aware that the dump data sets contain only those address ranges passed to
SVC dump by the dump requestor and might not contain sufficient data for
PRDMP to properly format all requested control blocks.

Because SVC dumps usually contain a limited number of address ranges, printing
the entire SYS1.DUMPxx data set is feasible and assures that all the information
about the problem will be available.

See the next topic "How To Clear SYS1.DUMPxx Without Printing" for a
description of how to clear the dump data sets for reuse. Note: Printing the dump
data sets does not clear them as it did on previous systems.
The following example shows how to print SYS1.DUMP00:

//ASIDDMP  JOB MSGLEVEL=1
//         EXEC PGM=AMDPRDMP
//PRINTER  DD SYSOUT=A
//SYSPRINT DD SYSOUT=A
//TAPE     DD DSN=SYS1.DUMP00,DISP=OLD
//SYSUT1   DD UNIT=SYSDA,DISP=NEW,SPACE=(CYL,(10,5))
SUMMARY
CVTMAP
CPUDATA
SUMDUMP
LPAMAP
PRINT STORAGE
/*

How To Clear SYS1.DUMPxx Without Printing

In previous systems, printing the dump data set also cleared it and made it available
for reuse. In MVS this is no longer true. The dump data sets can be cleared at
'SPECIFY SYSTEM PARAMETERS' time during IPL. They can also be cleared
and made available for reuse by using PRDMP to copy the data set to tape with
the SYSUT2 DD statement pointing to the output data set. This must be a separate
job step from printing the dump. If it has been determined that the SYS1.DUMPxx
data set need not be saved, it can be cleared and made available for reuse by
running PRDMP with the SYSUT2 DD statement defined as DUMMY. The
following example shows how to clear SYS1.DUMP00. See the example in the
discussion "How to Copy PRDMP Tapes" earlier in this chapter for how to define
the SYSUT2 DD statement to unload the SYS1.DUMPxx data sets.


//ASIDDMP  JOB MSGLEVEL=1
//         EXEC PGM=AMDPRDMP
//PRINTER  DD SYSOUT=A
//SYSPRINT DD SYSOUT=A
//TAPE     DD DSN=SYS1.DUMP00,DISP=OLD
//SYSUT2   DD DUMMY
//SYSIN    DD *
END
/*

How To Print The SYS1.COMWRITE Data Set

The following job will format and print the TCAM SYS1.COMWRITE data set.
Note that the PARM field in the EXEC statement defines the traces to be
formatted and printed. See OS/VS TCAM Debugging Guide Level 10 for more
information on the use of the SYS1.COMWRITE data set.

//COMWRITE JOB MSGLEVEL=1
//STEP1    EXEC PGM=IEDQXA,PARM='STCB,IOTR,BUFF'
//SYSPRINT DD SYSOUT=A
//SYSUT1   DD DSN=SYS1.COMWRITE,DISP=SHR
/*

How To Print An LMOD Map Of a Module

The following job produces a module cross-reference of the nucleus, module
IEFW21SD, and a link pack area map. In addition, AMBLIST produces an IDR
listing or a complete hexadecimal dump of an object module. If you include the
RELOC parameter, the cross-reference listing is based at the address where the
module is loaded in LPA.

Note that the JCL must contain a DD statement for every data set containing a
module you referenced in the control card section.

For more information about AMBLIST, see OS/VS2 System Programming
Library: Service Aids.

//AMBLIST  JOB MSGLEVEL=1
//         EXEC PGM=AMBLIST
//SYSLIB   DD DSN=SYS1.LPALIB,DISP=OLD
//LOADLIB  DD DSN=SYS1.NUCLEUS,DISP=OLD
//SYSPRINT DD SYSOUT=A
//SYSIN    DD *
LISTLOAD OUTPUT=XREF,MEMBER=IEANUC01,DDN=LOADLIB
LISTLPA
LISTLOAD OUTPUT=XREF,MEMBER=IEFW21SD
/*


How To Re-Create SYS1.STGINDEX

It is possible for the SYS1.STGINDEX data set to be destroyed because of system
failure or operator intervention during an IPL with the coldstart (CLPA,CVIO)
option. Loss of this data set prevents warm starting the system or restarting jobs
using VIO data sets.

The following job has been run successfully to recreate this data set. Remember
to change the VOLUME and CYLINDERS parameters to apply to your system.

//STGINDEX JOB MSGLEVEL=1
//         EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=A
//VOL      DD DISP=OLD,UNIT=3330,VOL=SER=SYSRES
//SYSIN    DD *
  DEFINE SPACE(VOL(SYSRES) FILE(VOL) CYL(7))
  DEFINE CLUSTER(NAME(SYS1.STGINDEX) VOLUME(SYSRES) CYLINDERS(7) -
    KEYS(128) BUFFERSPACE(5120) RECORDSIZE(2041 2041) REUSE) -
    DATA(CONTROLINTERVALSIZE(2048)) -
    INDEX(CONTROLINTERVALSIZE(1024))
/*

Software LOGREC Recording

The following JCL defines a two-step job. The first step prints an event history
report for all SYS1.LOGREC records. The second step formats each software,
IPL, and EOD record individually. The event history report is printed as a result of
the EVENT=Y parameter on the EXEC statement of the first step. It can be a very
useful tool to the problem solver because it prints the records in the same sequence
they were recorded and therefore shows the interaction between hardware error
records and software error records.

//EREP     JOB MSGLEVEL=1
//EREPA    EXEC PGM=IFCEREP1,PARM='EVENT=Y,ACC=N',REGION=128K
//SERLOG   DD DSN=SYS1.LOGREC,DISP=SHR
//TOURIST  DD SYSOUT=A
//EREPPT   DD SYSOUT=A,DCB=BLKSIZE=133
//EREPB    EXEC PGM=IFCEREP1,PARM='TYPE=SIE,PRINT=PS,ACC=N',REGION=128K
//SERLOG   DD DSN=SYS1.LOGREC,DISP=SHR
//TOURIST  DD SYSOUT=A
//EREPPT   DD SYSOUT=A,DCB=BLKSIZE=133
/*

See the discussion on LOGREC analysis in the "Use of Recovery Work Areas"
chapter earlier in this section for an explanation of its use and for examples of the
output produced.


Using The PSA As a Patch Area

There are two areas in the PSA reserved for future expansion. They can be used
for quick implementation of a trap without having to consider base registers. They
are X'410' - X'BFF' and X'E54' - X'FFF'. Both of these areas are frequently
used in examples throughout this chapter.

CAUTION: Use extreme care when you use this method. Patches should be made
only to disabled code unless the patch is completely reentrant. Saving registers and
data in the PSA while the system is enabled could produce unpredictable results,
especially in an MP environment where more than one PSA exists and the code
could be interrupted and subsequently redispatched on the other processor.
Extreme care must be used when considering a system alteration in order to gather
additional data about a problem. No superzaps should be applied before the
system programmer has verified the logic being zapped and the trap logic itself.
Remember, if any one location or offset within the module or trap changes, all
offsets and base registers must be verified.

Using the SLIP Command

SLIP (serviceability level indication processing) provides a way of getting information
from RTM prior to ESTAE or FRR recovery processing. This is in addition to
the information ordinarily supplied by dumping services during abnormal
termination. The SLIP command, usually entered by a system programmer, either at the
console or via the input stream, can also reside in the COMMNDxx parmlib
member. The SLIP command's purpose is to establish SLIP definitions of
the error circumstances under which interception of an error is to occur, and
of the action to be taken following the interception.

As long as enough system queue area storage is available, SLIP definitions may
be established at any time. The recovery termination manager (RTM) compares
the SLIP definitions with the dynamic system conditions at the time of the error.
If RTM detects a match, the requested action is taken.
The ACTION keyword has the following options:

•  ACTION=SVCD indicates that an SVC dump will be scheduled for the current
   ASID. This is the default option if ACTION is not specified. SDUMP parameters
   in this case are: SUM, SQA, RGN, TRT, LPA, CSA, and NUC.

   One of the advantages of this dump over one taken by a recovery
   routine is that nothing has been done to correct the error situation. Although
   the bulk of the SVC dump is not taken until later, the summary dump portion
   preserves as much volatile data as possible. An SVC dump also contains more
   data than a SYSABEND or SYSUDUMP, and because it is machine readable, it
   can, if necessary, be copied onto a tape to accompany an APAR, or used with
   interactive dump display programs. The biggest advantage is in situations
   where no dump was occurring.

•  ACTION=WAIT indicates that the system will be placed in a 01B wait state.
   At this time, the operator can find the save area where the stop/restart routine
   (IEESTPRS) saves the caller's (IEAVTSLP) registers. Register 2 contains the
   address of the RTM work area for the error. This is either IHAFRRS (RTM1)
   or IHARTM2A (RTM2). Register 4 contains the address of the SLIP control
   element (SCE), which contains the id for this trap.

   The save area can be located from the CVT as follows:

   CVT --(X'2AC' CVTSPSA)--> IHAWSAVT --(X'24' WSAGREST)--> Save Area (registers 0-14)

•  ACTION=NODUMP indicates that SLIP is to set a flag in the RTM work area
   which is checked by the dump programs ABEND/SNAP and SVC dump. If the
   bit is on, all dump requests are ignored. Because the bit is in the RTM work
   area, only dumps requested during processing of this error by RTM or its
   subroutines (FRR and ESTAE) are suppressed. Should the error involve recursive
   entry into RTM, the bit setting is propagated to the next RTM work area.
   This action is useful for preventing dumps that may not be needed (X22,
   X37, etc.) because accompanying messages provide sufficient information. It
   can also be used to prevent duplicate dumps for known problems which have
   already been documented.
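   For instance, a trap to suppress the dumps ordinarily produced for a known,
   already-documented X22 abend might be set as follows (the completion code
   here is only illustrative; the operand syntax follows the SLIP examples
   elsewhere in this chapter):

      SLIP SET,COMP=X22,ACTION=NODUMP,END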
•  ACTION=IGNORE indicates that the system will not do any further SLIP
   processing, and that normal system recovery will continue. This option is normally
   chosen for known errors. For example, if trapping 0C4 completion codes and
   SLIP SET,COMP=0C4,ACTION=IGNORE,LPAMOD=MODX,END is entered
   after SLIP SET,COMP=0C4,A=SVCD,END had been issued, it results in
   dumps for all 0C4 errors except those in module MODX. The ACTION=
   IGNORE command must be issued after the original command because trap
   conditions are checked LIFO.
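   Pictured as a console sequence, the two definitions quoted in the text would
   be entered in this order:

      SLIP SET,COMP=0C4,A=SVCD,END                       (entered first)
      SLIP SET,COMP=0C4,ACTION=IGNORE,LPAMOD=MODX,END    (entered second)

   Because trap conditions are checked LIFO, the IGNORE definition is examined
   first, so 0C4 errors in MODX are ignored and all other 0C4 errors fall through
   to the SVCD trap.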

It is also possible to display information about SLIP definitions by using the
DISPLAY command at the operator's console. For details concerning operand
usage and entering the SLIP and DISPLAY commands, see Operator's Library:
OS/VS2 MVS System Commands. The following is provided to demonstrate a
typical application of the SLIP command:

Obtaining a Dump with Queue Control Blocks and Elements
An error in the DEQ SVC routine is suspected because whenever program DVTRTN
executes, it abnormally terminates even though its parameter list is correct. The
resulting abend dump does not include queue control blocks and queue elements.
To get a dump that does include this information, issue the following SLIP command:
SLIP SET,ID=QELS,COMP=X30,ERRTYPE=ABEND,JSPGM=DVTRTN,END

ID identifies this SLIP definition as "QELS"; COMP specifies the applicable
system completion code; ERRTYPE specifies that an abend condition must exist
for this error interception; JSPGM identifies "DVTRTN" as the job step program
that must be executing for this error interception; END denotes the end of this
SLIP command.

Designing an Effective SLIP Trap
The design of a SLIP trap requires knowledge of the error conditions and what
makes the error unique. An effective trap should catch only the intended error.
To do this, the description should be as specific as possible.
The best way to design a trap is from a dump of the error. In the case of the
NODUMP action, a dump should be available. In other cases, an approximate dump
(one taken near the time of the error) or one without sufficient information to
debug might be available. The following chart lists several SLIP keywords and
indicates the data area fields that SLIP compares them with.

It should be understood that SLIP operates as a subroutine within the RTM.
SLIP is called from either RTM1 or RTM2, depending on whether the error
environment allowed FRR or only ESTAE recovery respectively. The level of RTM
in control affects the data areas available. The calls to SLIP are prior to calls to any
error recovery routines; therefore it is possible that the data areas contained in a
dump may have been changed since SLIP examined them. This is especially true of
the COMP keyword value. Many recovery routines change the abend completion
code to make it more specific. For example, a system service that receives a bad
address from a user parameter list will get an 0C4 which it converts to its own
completion code meaning a bad parameter list.

2.8.12

OS/VS2 System Programming Library: MVS Diagnostic Techniques


Additional Data Gathering Techniques (continued)

SLIP Keywords and Corresponding Data Areas
Note: There may be several RTM2 work areas pointed to by the TCB if several
abends occurred. The oldest one (last on the queue) is probably the best one to use.

ERRTYP
(RTM1)
In RT1TENPT of the RTM1 work area is the number indicating the reason for
entry into RTM1:

    1=PROG     2=REST     3=SVCERR    4=DAT
    5=MACH    10=PGIO    15=MEMTERM

(RTM2)
The reason for entry into RTM2 is indicated by flags in the RTM2 work area as
follows:

    RTM2MCHK=MACH       RTM2SVCE=SVCERR
    RTM2PCHK=PROG       RTM2TEXC=DAT
    RTM2RKEY=RESTART    RTM2PGIO=PGIO
    RTM2SVCD=ABEND      RTM2EOM=MEMTERM
    RTM2ABTM=ABEND

MODE
System mode at error time is indicated in the MODEBYTE as follows:
    1... ....    MODESUPR    Supervisor control
    .1.. ....    MODEDIS     Physically disabled
    ..1. ....    MODEGSPN    Global spin lock held
    ...1 ....    MODEGSUS    Global suspend lock held
    .... 1...    MODELOC     Locally locked
    .... .1..    MODETYP1    Type 1 SVC
    .... ..1.    MODESRB     SRB mode
    .... ...1    MODETCB     Task mode (unlocked)
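As a quick illustration of this bit layout, the following sketch (modern Python, not MVS code; only the flag names come from the assignments shown above) decodes a MODEBYTE value into its flag names:

```python
# Bit assignments for MODEBYTE, taken from the table above.
MODE_BITS = [
    (0x80, "MODESUPR"),  # supervisor control
    (0x40, "MODEDIS"),   # physically disabled
    (0x20, "MODEGSPN"),  # global spin lock held
    (0x10, "MODEGSUS"),  # global suspend lock held
    (0x08, "MODELOC"),   # locally locked
    (0x04, "MODETYP1"),  # type 1 SVC
    (0x02, "MODESRB"),   # SRB mode
    (0x01, "MODETCB"),   # task mode (unlocked)
]

def decode_modebyte(value):
    """Return the names of the MODEBYTE flags set in `value`."""
    return [name for mask, name in MODE_BITS if value & mask]

# decode_modebyte(0x41) -> ["MODEDIS", "MODETCB"]
```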

Section 2: Important Considerations Unique to MVS

(RTM1)
The MODEBYTE value is contained in RT1WMODE. The PSW from SDWAEC1
is used for PP, Super, SKey, and PKey states. The SDWASTAF bit is used for
RECV.
(RTM2)
In the ESAMODE field (SVRB + X'B1') of the SVRB pointed to by RTM2VRBC
are bits mapped by MODEBYTE as indicated above. For the PSW values, SLIP
uses the RBOPSW field of the RB preceding the SVRB.
The RTM2RECR bit must be on for RECV, and in the previous RTM2 work
area the RTM2XIP bit plus the SCBINUSE bit of the SCB pointed to by
RTM2NSCBN must be on.

COMP
(RTM1)
In the SDWA, field SDWACMPC contains the original value.
(RTM2)
The RTM2CC field contains the original value for each work area.

JOBNAME
(RTM1 and RTM2)
In the ASCB, fields ASCBJBNI or ASCBJBNS point to the job name for either
initiated or started jobs.

JSPGM
This keyword does not apply to errors which enter RTM1, so if it is specified, the
trap is limited to RTM2 type errors only.

PVTMOD (RTM2 only)
LPAMOD
ADDRESS
The address used for these keywords is obtained from the same PSW used when
checking values for the MODE keyword. Additionally, PVTMOD applies to RTM2
type errors only and restricts the trap accordingly. The module name for PVTMOD
is compared with those in the CDE list for the jobstep TCB of the current address
space.


ASID
(RTM1)
Both SDWAFMID and SDWAASID are checked.
(RTM2)
Both RTM2FMID and ASCBASID are checked.

Enabling The PER Hardware To Monitor Storage Locations
A convenient place to hook the system is in the MVS trace table's common
prologue code in IEAVTRCE. All interrupts and dispatcher entries enter this code.
Therefore a modification here will enter this trap after every interrupt and
before the dispatcher dispatches or redispatches any TCB or SRB. The trap in the
examples below was inserted in module IEAVTRCE three instructions after the
label STR in place of the code that normally stores the timer value in the trace
table.
This trap does not stop the system but traces in the MVS trace the PSWs that
alter a specified storage location. To stop the system, a branch from the program
check FLIH can be made to a patch area, and a test can be made for the interrupt
code of X'80' with a branch equal to a trap to stop the system. In the system
dump, the instruction that performs the modification is pointed to by X'98'
in the PSA.
Care should be used with this diagnostic aid since degradation occurs in
proportion to the number of interrupts taken. Only use it to monitor a section of
storage which is never modified or only infrequently modified. Once the trap
is in, there is no need to re-IPL to remove it. Manually storing a word of zeros
in control register 9 prevents further interrupts.

Following is an example of the PER hardware trap to be applied by superzap.
NAME IEANUC01 IEAVTRCE
VER 03A0 B2058000,4780B02C,D70380028002,D203C01C8002,947FC014
REP 03A0 47F00608,07000700,07000700,07000700,07000700,07000700
NAME IEANUC01 IEAVTRTS
VER 0796 82001078
REP 0796 47F00600
NAME IEANUC01 IEAVFX00
VER 0600 00000000,00000000,00000000,00000000,00000000,00000000,00000000
VER 061C 00000000,00000000,00000000,00000000,00000000,00000000,00000000
REP 0600 96401078,82001078                             TURN ON PER BEFORE ENTERING FRR
REP 0608 96400300                                      ALWAYS TURN ON PER FOR DISPATCHER
REP 060C 4700B032,92F0060D                             BUT SET THE FIRST TIME SWITCH FOR THE REST
REP 0614 96400058,96400060,96400068,96400070,96400078  SET THE NEW PSW
REP 0628 B79B0630                                      LOAD FUNCTION CODE, LOW AND HIGH RANGE
REP 062C 47F0B032                                      RETURN TO MAINLINE
REP 0630 XX000000                                      FUNCTION CODE IN HIGH ORDER
REP 0634 XXXXXXXX                                      LOW RANGE *
REP 0638 XXXXXXXX                                      HIGH RANGE


*Note: To check a word in storage starting at 9F41C, for example:
Low range address = 9F41C
High range address = 9F41F
To check a byte, use the same address in low and high.
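The range rule in the note above can be sketched as a one-line helper (illustrative modern Python, not part of the zap; the function name is invented):

```python
# To watch `length` bytes starting at `addr`, the PER low range is that
# address and the high range is the address of the last byte covered.
def per_range(addr, length=4):
    """Return (low, high) PER range covering `length` bytes at `addr`."""
    return addr, addr + length - 1

low, high = per_range(0x9F41C)            # fullword at 9F41C -> (0x9F41C, 0x9F41F)
byte_low, byte_high = per_range(0x9F41C, length=1)   # byte: low == high
```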

Because the switch is in the PSA, the control registers and NEW PSWs are
initialized on both processors in an MP environment. However, they are set only
once and not each time through the routine.
The example in Figure 2-18 shows trace entries using the storage alteration mask
(function code X'20'). The interrupt address is the address of the instruction that
modified the monitored storage.

PGM  OLD PSW 470C3080 A0009A0A  R15/R0 00009970 00DF467A  R1 00FF0B08  IDS 00400002  TCB 00000000  TME 9BD80401
PGM  OLD PSW 470C1080 E0034126  R15/R0 00014C20 00DF467A  R1 00FF837C  IDS 00400002  TCB 00000000  TME 9BD80439
PGM  OLD PSW 470C3080 A000BEF8  R15/R0 0000BEB8 00DF467A  R1 00FF837C  IDS 00400002  TCB 00000000  TME 9BD8053F

Figure 2-18. Trace Example of PER Hardware Monitoring

On occasion it might be necessary to monitor when only one address space is active.
One way of doing this is to change the previous superzap example at address 060E
from B032 to 0640 and include the following superzap. This superzap turns PER
on only if the specified address space is active.
NAME IEANUC01 IEAVFX00
VER 0640 00000000,00000000,00000000,00000000,00000000,00000000,00000000
REP 0640 58D00224      GET CURRENT ASCB
REP 0644 48D0D024      GET CURRENT ASID
REP 0648 49D00664      IS THIS MY ASID?
REP 064C 47800658      YES - GO TURN PER ON
REP 0650 B7990660      TURN PER OFF
REP 0654 47F0B032      RETURN TO MAINLINE
REP 0658 B7990630      TURN PER ON
REP 065C 47F0B032      RETURN TO MAINLINE
REP 0660 00000000      THIS WORD IN CR9 TURNS OFF PER
REP 0664 xxxx          ASID TO BE MONITORED


Caution: Extreme care must be used when considering a system alteration in order
to gather additional data about a problem. No superzaps should be applied before the
system programmer has verified the logic being zapped and the trap logic itself.
Remember, if any one location or offset within the module or trap changes, all
offsets and base registers must be verified.

System Stop Routine
On occasion it is necessary to stop the system and take a stand-alone dump to fully
document a problem. Loading a wait state PSW is sufficient on a uniprocessor.
Stopping only one processor on an MP system is not adequate. This routine will stop
an MVS MP or UP system. The caller must be supervisor state and key zero. The
wait state code you wish displayed is placed at location X'756'. This trap also
moves the wait state PSW to storage location zero and loads the PSW from there to
prevent inadvertent restarts when the trap is hit.
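The ordering matters: on an MP system the other processor must be stopped before the wait PSW is loaded. A minimal model of that decision flow (illustrative modern Python; the action strings are invented, and a two-processor configuration is assumed, as in the zap below):

```python
# Hypothetical model of the stop routine's ordering: on a uniprocessor the
# trap simply loads the wait PSW; on an MP system it SIGP-stops the *other*
# processor first, then loads the wait PSW on the processor running the trap.
def stop_sequence(my_cpu, is_mp):
    """Return the ordered actions the trap would take (two-CPU model)."""
    if not is_mp:
        return ["load wait PSW"]
    other = 1 - my_cpu            # the processor that must be stopped first
    return ["SIGP stop CPU %d" % other, "load wait PSW"]
```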

"~

NAME IEANUC01 IEAVFX00
VER 0700 36F'00'
REP 0700 ACFC074E          DISABLE
REP 0704 900F0758          SAVE REGISTERS
REP 0708 58F00010          GET CVT POINTER
REP 070C 58E0F294          GET CSD POINTER
REP 0710 91C0E008          TEST IF MP
REP 0714 47E00744          NO, JUST LOAD WAIT PSW
REP 0718 41200000          SET REG 2 TO CPU 0
REP 071C 41300001          SET REG 3 TO CPU 1
REP 0720 48400204          GET CPU ADDRESS
REP 0724 1244              TEST FOR CPU 0
REP 0726 4770073C          NO, STOP CPU 0 FIRST
REP 072A AE030009          YES, STOP CPU 1 FIRST
REP 072E 4760072A          SPIN TIL CC=0
REP 0732 D20700000750      MOVE THE WAIT PSW TO ZERO
REP 0738 82000000          LOAD WAIT STATE ON CPU 0
REP 073C AE020009          SIGP STOP CPU 0
REP 0740 4760073C          SPIN TIL CC=0
REP 0744 D20700000750      MOVE THE WAIT PSW TO ZERO
REP 074A 82000000,0000     LOAD WAIT STATE ON CPU 1
REP 0750 000E0000,0000DEAD WAIT PSW
REP 0758 00000000          SAVE AREA

Caution: Extreme care must be used when considering a system alteration in order
to gather additional data about a problem. No superzaps should be applied before the
system programmer has verified the logic being zapped and the trap logic itself.
Remember, if any one location or offset within the module or trap changes, all
offsets and base registers must be verified.


Using The MVS Trace To Monitor Storage


The MVS trace code in module IEAVTRCE is an excellent place to hook the
system to monitor system operation and branch to a trap routine. Three
instructions past label STR in IEAVTRCE is the code which stores the timer
values in the trace table. All trace entries pass through this code. Overlaying this
code allows you to monitor any place in the system as it runs disabled, key zero
and supervisor state. It must be understood that this code is physically disabled
and therefore the trap must not page fault. Also no reference can be made to
private area addresses since the trap can receive control in any address space. For
larger patches a branch from this code to a patch area in the PSA is possible. At
entry to this code, register 12 (C) points to the trace entry. This code normally
stores the timer value located at X'1C' into the trace table. Storing a word at
register 12 + X'1C' would allow dynamic monitoring of that word in storage if
addressability is obtained. The other seven words of the trace table are built
within the trace entry code for each trace type. Monitoring for more than one
word entails changing all entries.
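As a rough modern analogy of this technique, the sketch below (Python; nothing here is MVS code) models a fixed-size trace table in which each entry records the current value of one monitored word, the way the overlaid store would; old entries wrap around, as in the real table:

```python
from collections import deque

class WordTrace:
    """Fixed-size trace table; oldest entries are overwritten (wrap-around)."""
    def __init__(self, entries=400):        # 400 entries, like the X'190' default
        self.table = deque(maxlen=entries)

    def record(self, event, monitored_word):
        # In the real hook, the monitored word replaces the stored timer value.
        self.table.append((event, monitored_word))

trace = WordTrace(entries=4)
for i, event in enumerate(["SIO", "I/O", "PGM", "DSP", "EXT"]):
    trace.record(event, 0x1000 + i)
# The table now holds only the 4 most recent entries; "SIO" has wrapped off.
```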
To eliminate certain trace entry types, it is only necessary to put a branch
instruction 07FB at the entry point for that entry.
Caution: Location X'10' cannot be monitored with this trap because the PCFLIH
refreshes location X'10' before it branches to the trace routine. Extreme care must
be used when considering a system alteration in order to gather additional data
about a problem. No superzaps should be applied before the system programmer has
verified the logic being zapped and the trap logic itself. Remember, if any one
location or offset within the module or trap changes, all offsets and base registers
must be verified.

How To Expand The Trace Table
Use the following zap to force trace on during NIP processing.
NAME IEEVWAIT IEEVWAIT
VER 0194 4710
REP 0194 47FO

To increase the size of the trace table, you may zap module IEAVNIP0 at label
NVTTRACE to a greater value. It defaults to X'190' (400 decimal). Do not
exceed a value of X'400' for the size of the trace table; 806-4 and 0C4 abends can
occur when the link pack area directory is accessed.
NAME IEANUC01 IEAVNIP0
VER 2EC0 0190
REP 2EC0 XXXX          WHERE XXXX IS THE NEW VALUE DESIRED

Caution: Extreme care must be used when considering a system alteration in order
to gather additional data about a problem. No superzaps should be applied before the
system programmer has verified the logic being zapped and the trap logic itself.
Remember, if any one location or offset within the module or trap changes, all
offsets and base registers must be verified.


Section 3. Diagnostic Materials Approach
",,,

\
/

This section provides guidelines for analyzing storage dumps to find which data
areas were affected by the error and to isolate internal symptoms of the problem.
The three chapters in this section are:
• Stand-alone Dumps

• SVC Dumps
• SYSABENDs, SYSMDUMPs, and SYSUDUMPs


Stand-alone Dumps

The stand-alone dump provides the problem solver with a larger quantity of data
than system-initiated dumps because it contains areas that belong
to the entire operating system rather than just a single address space or component.
One of the major problems for the analyst is finding the important data for his
problem and then isolating the problem area. Once this isolation is achieved, the
debugger uses unique system/component techniques to gain further insight into the
exact cause of the problem.
This chapter points out where to look in a stand-alone dump to determine
various problem symptoms. The general approach is to analyze a stand-alone dump
to find out what the system is doing (or not doing). Important areas will be
described and possible reasons for their current state/contents will be explained.
The analysis starts at the global system level and, by gathering data and gaining an
understanding of the environment, works down to the address space and task level.
The experienced problem solver realizes that under certain conditions it may
be necessary or advantageous to omit interpreting various areas. For example,
if during system operation he observes that a given segment of the system (such
as VTAM) is not functioning (other areas appear okay - jobs are executing,
SYSIN/SYSOUT is appearing, etc.), he may decide to take a stand-alone dump.
In this case, the current state of the system is probably not important. He
would not be interested in current PSW, registers, etc.; he would be interested
only in the address spaces that are using VTAM and the state of the TP network.
The dump is not taken for a problem that is "active" now, but to give the analyst
data with which to determine a problem that appears to have originated some
time ago. The point is that knowing why the dump was taken will often govern
which, if any, of the stand-alone dump areas are of significance for a given problem.
Information contained in the chapter on "Waits" in Section 4 can be used as a
supplement to the following discussions. (Also, a step-by-step approach to
analyzing a stand-alone dump is contained in Appendix B of this manual.)
To analyze a stand-alone dump, you should always ask the following questions:

1. Why was the dump taken?
   Console sheets/logs are very important in stand-alone dump analysis. They are
   often the key to solving "enabled wait" situations and may present valuable
   information about system activity prior to taking the dump. Messages
   concerning I/O errors, condition code=3, SVC dumps, abnormal job
   terminations, device mounts, etc. should be thoroughly investigated to
   determine if they could possibly contribute to the problem you are tracking.
   The dump title gives an indication of the problem's external signs or, possibly,
   a specific situation that must be investigated, such as "VTAM NOT
   FUNCTIONING."


2. What is the current state of the system?
   Examine the available global data areas to determine what the system is
   currently doing. The "Global System Analysis" chapter in Section 4 aids
   in this process. Remember that at this point, you are gathering information and
   trying to understand the system environment in order to isolate the internal
   symptom; you are not ready yet to debug.

3. Has your global analysis isolated the problem to an internal symptom?
   If so, refer to the discussion of that symptom in Section 4 of this manual.

4. What previous errors have occurred within the system; could they possibly
   have any effect on your current problem?
   The interpretation of SYS1.LOGREC and the in-storage LOGREC buffers is
most important in determining error history. See the chapter on "Use of
Recovery Work Areas" in Section 2.

5. What is the recent system activity?
   The chapter on "MVS Trace Analysis" in Section 2 aids in trace table
   interpretation.

6. What is the work status within the system?
   Your objective is to determine if the system has for some reason not completed
   all scheduled work. Determining what that work is and why it is not
   progressing can provide insight into the problem as well as answer some
   questions that may have arisen during an earlier analysis. Understanding the
   major control block structure and work queue status should aid in determining
   the possible source of the error. Refer to the discussion of "Work Queues and
   Address Space Status" in the "Global System Analysis" chapter of Section 2.

At this point, you should have gathered enough data to have a definition of the
internal problem symptom. You should also have considerable information about
the system's state, error history, and job status. You should refer to the
appropriate chapter in Section 4 "Symptom Analysis Approach" or, if you have
isolated the error to a component or process, Section 5 or Appendix A,
respectively.


SVC Dumps

SVC dumps (invoked by the SDUMP macro) are usually taken as a result of an
entry into a functional recovery routine (FRR) or ESTAE routine. The component recovery routine specifies the address that will be dumped.
The "Component Analysis" chapters in Section 5 should help you identify what
areas of the system were dumped and what they contain.
The SVC dump is taken asynchronously and the global data areas (PSA, LCCA,
PCCA, etc.) usually contain no relevant data except in cases where overlays,
machine checks, channel checks, etc., have occurred.
SDUMP options SQA, ALLPSA, and SUMDUMP are the defaults for all requests.
The SUMDUMP option of SDUMP provides a summary dump within an SVC dump.
There is a twofold purpose for this. First, since dump requests from disabled,
locked, or SRB-mode routines cannot be handled by SVC dump immediately,
system activity destroys much useful diagnostic data. With SUMDUMP, copies of
selected data areas are saved at the time of the request and then included in the
SVC dump when it is taken. Second, SUMDUMP provides a means of dumping
many predefined data areas simply by specifying one option.
The data areas saved in SUMDUMP can be printed out by using the
AMDPRDMP control statement SUMDUMP. This summary dump data is not
mixed with the SVC dump because in most cases it is chronologically out of step.
Instead, each data area selected in the summary dump is separately formatted
and identified.
For information on print dump program changes needed to print the summary
dump, and multiple address-space output from SVC dump, see OS/VS2 System
Programming Library: Service Aids.
The RTM2WA pointed to by the TCB upon whose behalf the dump is being
taken is the most valid system status indicator available. The dump task is usually
the current task; the task upon whose behalf the dump is being taken will contain
a completion code in the TCB completion code field. It is possible for the ESTAE
routine to issue SVC D itself, in which case the current task is also the failing task.

Because of MVS recovery (retry and percolation), the SVC dump may be only
part of the documentation at the problem solver's disposal. The problem solver
should attempt to obtain:
1. The system log for the time the dump was taken to ascertain if:
   • Any other SVC dumps were taken before or after the one he is
     investigating.
   • Any task subsequently abended. If so, a system dump that displays other
     areas of storage that have meaningful data may be available.
2. The LOGREC formatted listing for the time immediately preceding the time of
   the SVC dump. If the component analysis procedure fails to determine the
   cause of the problem, analyze the dump as you would a stand-alone dump.
   Keep in mind that the information obtained via the CPUDATA option on
   AMDPRDMP is probably meaningless. Refer to the "Global System Analysis"
   chapter in Section 2 for information on how to do a task analysis of available
   address-space-related control blocks.

Keep in mind that the system has detected the error and has attempted recovery,
at least on a system basis. Therefore, there will be a good indication of the type
(internal symptom) of error (loop, abend, program check, etc.) that caused the
problem. (See Section 4, "Symptom Analysis Approach.")

How to Change the Contents of an SVC Dump Issued by an Individual Recovery
Routine
At times, SVC dump contents are not sufficient to solve a problem. The most
convenient way to change the contents is the CHNGDUMP command. It can be
used to establish system options to be added to the options on each SDUMP
request, or to totally override the SDUMP options. See "Using the CHNGDUMP
Command" in Section 2. If you do not want to affect all SVC dumps or if storage
lists are involved, you may wanttochange the parameter list in a particular ESTAE
exi t instead.
You can usually find the name of the recovery routine by looking at the user
data (or title) on the SVC dump printout. If not, search the ESTAE's PRB for the
virtual address of the SDUMP SVC instruction.
The following description of SDUMP's parameter list can help you decide which
bits will provide the data you want. The SDUMP macro expansion generates the
parameter list and puts the address of the list in register 1.


SDUMP Parameter List

Offset 0 - option flags:
    1... ....    user-supplied DCB=
    .1.. ....    BUFFER=YES
    ..1. ....    user-specified STORAGE= or LIST=
    ...1 ....    user-specified HDR= or HDRAD=
    .... 1...    user-specified ECB=
    .... .1..    user-specified ASID=
    .... ..1.    QUIESCE=YES
    .... ...1    BRANCH=YES

Offset 1 - option flags:
    1... ....    indicates SDUMP (as opposed to SNAP)
    .1.. ....    indicates a SYSMDUMP request
    ..1. ....    indicates enhanced SVC dump
    ...1 ....    user-specified ASIDLST=
    .... 1...    user-specified SUMLIST=
    others       reserved

Offset 2 - SDATA options:
    1... ....    ALLPSA
    .1.. ....    PSA
    ..1. ....    NUC
    ...1 ....    SQA
    .... 1...    LSQA
    .... .1..    RGN
    .... ..1.    LPA
    .... ...1    TRT (MVS trace table)

Offset 3 - more SDATA options:
    1... ....    CSA
    .1.. ....    SWA
    ..1. ....    SUMDUMP
    ...1 ....    NOSUMDUMP
    .... 1...    NOALLPSA
    .... .1..    NOSQA
    others       reserved

Offset 4     DCB address
Offset 8     address of storage list
Offset C     address of header record
Offset 10    address of ECB
Offset 14    caller's ASID
Offset 16    target ASID of scheduled dump
Offset 18    address of ASID list
Offset 1C    address of summary dump storage list
Offset 20    address of SYSMDUMP 4K SQA area
Offset 24    address of SYSMDUMP CSA work area
Offset 28    length of header record (less than 100)
Offset 29    header record (will appear as title)
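To make the SDATA bit layout concrete, here is a small decoding sketch (modern Python, illustrative only; the bit names are taken from the layout shown above, but the decoder itself is not an MVS interface):

```python
# Bit names for the SDATA option bytes at offsets 2 and 3, high-order bit first.
SDATA_BYTE2 = ["ALLPSA", "PSA", "NUC", "SQA", "LSQA", "RGN", "LPA", "TRT"]
SDATA_BYTE3 = ["CSA", "SWA", "SUMDUMP", "NOSUMDUMP", "NOALLPSA", "NOSQA"]

def decode_sdata(byte2, byte3):
    """Return the SDATA option names selected by the two flag bytes."""
    opts = [name for i, name in enumerate(SDATA_BYTE2) if byte2 & (0x80 >> i)]
    opts += [name for i, name in enumerate(SDATA_BYTE3) if byte3 & (0x80 >> i)]
    return opts

# The defaults for all requests are SQA, ALLPSA, and SUMDUMP:
# decode_sdata(0x90, 0x20) -> ["ALLPSA", "SQA", "SUMDUMP"]
```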


SYSABENDs, SYSMDUMPs, and SYSUDUMPs

SYSABENDs, SYSMDUMPs, and SYSUDUMPs are produced by the system when a
job abnormally terminates and a SYSABEND, SYSMDUMP, or SYSUDUMP DD
statement was included in the JCL for the terminating step. In an MVS system, the
output produced is dependent on parameters supplied in the SYS1.PARMLIB
members IEAABD00, IEADMR00, and IEADMP00 for SYSABENDs,
SYSMDUMPs, and SYSUDUMPs, respectively. See OS/VS2 System Programming
Library: Initialization and Tuning Guide for the IBM-supplied defaults and options
that are available.

If the IBM defaults are used, a hexadecimal dump of LSQA is produced when
the SYSABEND DD statement is specified. MVS systems do not dump the nucleus
or SQA as a default for SYSABEND or SYSUDUMPs. SYSMDUMP defaults
include NUC and SQA.
With a SYSABEND, SYSMDUMP, or SYSUDUMP, the system has detected the
error and therefore provided a starting point (such as a job step completion code)
for analysis. The analyst should always look at the JCL and allocation messages
that accompany the dump. The allocation messages contain error messages that can
sometimes be helpful. There will also be a JES2 job log that shows the operator
messages and responses that relate to the job. The error messages also contain
valuable information about the error and should always be investigated.
SYSABEND, SYSMDUMP, and SYSUDUMP errors can generally be divided into
two categories: software-detected errors and hardware-detected errors.

Software-Detected Errors
Software-detected errors are those in which one or more of the following occurs:
• A module detects an invalid control block queue.
• A called module returns with a bad return code.
• A program check occurs in system code and a recovery routine changes the
  program check to a completion code and abnormally terminates the task.
The best approach for a software-detected error is:

1. Use the JES2 job log and allocation messages to investigate all error messages
   produced. (Refer to the appropriate Message manual to determine the causes
   and corrective action of each message.)
2. Check the abend code defined in the dump. (Refer to OS/VS Message Library:
   VS2 System Codes to determine causes and corrective actions of the code.)
   Some abend codes define problem determination areas that can be used to
   help define the problem.


3. In the event that sufficient data is not available in the Messages and Codes
   manuals to resolve the problem, the analyst can go directly to the program
   listing. The diagnostic sections of most PLMs contain a message/module and
   abend/module cross-reference. Once the correct module has been located,
   the program listing (supplied in the system microfiche) helps to define the
   problem.

SYSABENDs, SYSMDUMPs, and SYSUDUMPs normally do not produce
system-related data areas other than those which are formatted. Because of this
and the fact that error recovery will attempt to reconstruct invalid control block
chains before terminating the task, any error that does not occur in the private area
may be difficult to resolve from a SYSABEND, SYSMDUMP, or SYSUDUMP alone.
Because of the recovery and percolation aspects of MVS, the SYSABEND,
SYSMDUMP, or SYSUDUMP could be the end result of an earlier system error. If
so, the analyst should determine if any LOGREC entries were made pertaining to
this task and if any SVC dumps were taken while this task was running. The system
error is normally reflected in either the LOGREC entries, the dump data sets, or
both.

Hardware-Detected Errors
A hardware-detected error is a program check that is not intercepted by a recovery
routine. This is identified by a system completion code of X'OCx' where x is the
program check type. For this type of error, the analyst needs to know the address
of the module where the program check occurred, and the register contents when
the program check occurred. The best place to locate this information is in the
RTM2WA that is pointed to by the abending TCB.
Given the registers and PSW at the time of the error, the analyst should
determine the module that program checked by using the load list or link edit maps
of the program. (If the module is outside the private area, a NUCMAP or LPA map
may be necessary.) Then he should examine the program listing for the module
until the cause of the program check is defined.
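The lookup just described (map the error PSW address to a containing module and an offset into it) can be sketched as follows; the module names and addresses here are invented for illustration:

```python
import bisect

def find_module(load_map, psw_addr):
    """load_map: sorted list of (start_addr, name) pairs from a load map.
    Returns (name, offset_into_module) for the module containing psw_addr,
    or None if the address precedes every module in the map."""
    starts = [start for start, _ in load_map]
    i = bisect.bisect_right(starts, psw_addr) - 1
    if i < 0:
        return None
    start, name = load_map[i]
    return name, psw_addr - start

# Hypothetical load map entries, sorted by load address:
modules = [(0x6000, "MODA"), (0x7400, "MODB"), (0x8200, "MODC")]
# find_module(modules, 0x7462) -> ("MODB", 0x62)
```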


Section 4. Symptom Analysis Approach

This section describes how to correctly identify an external symptom, and provides
an analysis procedure for determining what kind of problem is causing the
symptom.
Each external symptom is described in a separate chapter, as follows:
• Waits
• Loops
• TP Problems
• Performance Degradation
• Incorrect Output


Waits

Wait states may be either enabled or disabled. The characteristics of each type
are described below.

Characteristics of Enabled Waits
Enabled waits have traditionally been the most difficult problem to analyze because
of the lack of an obvious failure. The enabled wait provides no indication of error
other than that the system apparently has nothing to do. In fact the enabled wait
has been accurately described as an end symptom of a problem with no obvious
causes. The task of determining the possible cause is left to the debugger. Other
types of software failures - abends, program checks, loops, messages - provide
a starting point for analysis; that is, software or hardware has indicated a violation
of interfaces or data integrity and has halted the erroneous process at the point of
error. The enabled wait provides none of these.

Note: The subsystem design of many components includes a dispatching
mechanism and internal control block structure not generally recognized by the
operating system. When these subsystems (for example, VTAM, TCAM, JES2)
malfunction, work through these components is often halted. Because of the
critical nature of these processes, external signs of the problem are often detectable.
Within this debugging discussion, these problems are often treated as wait states,
that is, the system may be capable of running batch work, but the TP network
appears "hung-up." This general discussion of analysis-approach applies for
problems such as "permanently" swapped-out address spaces, TP network hung, and
no batch running.
The advantage is that the external symptoms may allow
you to more easily isolate the problem component or at least a starting point - it
may be obvious that TCAM is not responding, or that JES2 is not processing
input.
Experience has shown that in MVS a much greater percentage of re-IPL
situations are caused by enabled waits than in previous systems. One reason for this
characteristic of MVS is software recovery. Software recovery attempts to repair
the damage caused by a failure and allow the system to continue meaningful
operation. The general philosophy of recovery is to isolate the error to a job,
terminate the job, and allow the system to continue. This philosophy dictates that
under certain conditions innocent work may be forcefully terminated.

Software recovery obviously may cause the termination of some critical process
which in turn causes dependent processes to wait indefinitely. For example,
assume that while processing a page fault, an error occurred during the I/O
interruption processing; software recovery was invoked and subsequently caused a
cleanup of the bad control blocks, but did not post the I/O requestor. It is possible
that the paging mechanism will wait indefinitely for the missing interrupt. This
in turn could cause a problem program to wait indefinitely for the paging operation
to complete. The end result is no work accomplished and also no external problem
symptom, although a problem clearly exists. The debugger must find the
bottleneck - the paging exception - and subsequently back-track enough to
determine why the bottleneck still exists. Very often, this back-tracking requires
analysis of several components in order to determine the original cause.

Characteristics of Disabled Waits
Situations can develop during execution of the MVS system that require the
software to abruptly terminate the system by loading a disabled PSW with the wait
bit set to 1. In previous systems, this occurred much more frequently than it does
in MVS because, in MVS, many of these situations were removed from the code and
replaced with software error recovery. However, a few cases still remain that cause
this symptom. To understand these situations better, refer to the "Wait State
Codes" section of OS/VS Message Library: VS2 System Codes.
A more critical situation for the analyst is a disabled wait that is caused when
data areas containing PSWs referenced by the dispatcher or hardware are overlaid
and subsequently fetched for use in an LPSW. This often occurs when a PSA overlay condition exists, that is, the low storage PSWs fetched by the hardware have
been inadvertently overlaid by a program running in supervisor state key 0. Other
data areas, such as PRBs, may contain PSWs used by the dispatcher and are also
potential sources of the disabled wait state. Bad LPSWs are difficult to track down.
The most common MVS uses of the LPSW instruction are:
• hardware loading from low storage for an interruption-processing sequence
• dispatcher loading from X'300' into the PSA
• RTM (IEAVTRTS) passing control to FRRs
• the system termination routine
• SVC FLIH and I/O FLIH LPSWs.

Storage overlays resulting in wait state PSWs are approached in the same manner
as other storage overlays. The important step is to realize the storage overlay has
occurred, then re-create the process that was possibly responsible. The discussion of
pattern recognition in the chapter "Miscellaneous Debugging Hints" in Section 2
should be helpful.


Analysis Approach for Disabled Waits
The following is a list of objectives that provides a systematic approach to
analyzing a disabled wait.
Objective 1 - Determine positively that an actual disabled wait condition exists.
Is the PSW the type that is used when MVS loads an explicit wait or is this an overlaid PSW with the wait bit on?
Analysis - Examine the current PSW contained in the dump according to the
technique described in the chapter "Standalone Dumps" in Section 3. The PSA
overlay should also be analyzed to determine if key PSWs have been overlaid.
If the PSW shows an explicit wait, look up the wait state code in OS/VS Message
Library: VS2 System Codes to find what conditions could cause the explicit wait.
You may need to do some extra analyzing before the condition can be related to a
component. (Note: No further analysis for explicit wait situations is discussed
in this book.)
If the PSW suggests an overlaid PSA or some other error source, proceed to
Objective 3; otherwise proceed to Objective 2.
If, for any reason, the current PSW is not formatted in the dump, the last PSW
shown in the trace table, location X'300' (used by the dispatcher), or low storage
should be examined as possible sources of the last PSW.

Objective 2 - Determine if the situation has been improperly diagnosed as a
disabled wait. This will eliminate a situation in which a locked console is
diagnosed as a disabled wait.
Analysis - In previous operating systems, the operator's inability to communicate
with the system through the console was an external indication of a disabled wait
condition. In MVS, this same external symptom is often not a true disabled wait.
Console communication is dependent upon other services of the operating system,
such as paging, and the I/O subsystem. A problem in any of these services often
terminates console activity and causes an apparent "disabled wait" situation, when
the PSW does not actually reflect a disabled wait.
If the current PSW is not disabled for external and I/O interrupts or if the wait
bit (X'0002') in the PSW is not set to one (PSW = X'070E0000 00000000'), you
should proceed to either the "Enabled Wait Analysis" topic later in this chapter or
to the chapter on "Loops" later in this section.
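The PSW test in this objective can be sketched as a small classifier over the 64-bit EC-mode PSW. This is a minimal sketch: the function names are illustrative, and only the mask and wait bits named in the text (I/O mask, external mask, wait state) are examined.

```python
# Sketch: classify a System/370 EC-mode PSW value taken from a dump.
# Bit numbering follows the EC-mode PSW layout: bit 6 = I/O interrupt mask,
# bit 7 = external interrupt mask, bit 14 = wait state bit.
# Helper names are illustrative, not part of any MVS service.

def psw_bit(psw, n):
    """Return bit n (0 = leftmost) of a 64-bit PSW value."""
    return (psw >> (63 - n)) & 1

def classify_psw(psw):
    """Rough classification used in Objectives 1 and 2."""
    wait = psw_bit(psw, 14)
    io_enabled = psw_bit(psw, 6)
    ext_enabled = psw_bit(psw, 7)
    if not wait:
        return "not waiting"          # suspect a loop instead
    if io_enabled and ext_enabled:
        return "enabled wait"         # the X'070E0000 00000000' no-work wait
    return "disabled wait"            # explicit wait code or overlaid PSW

# The no-work wait PSW from the text classifies as an enabled wait:
print(classify_psw(0x070E000000000000))
```

A PSW with the wait bit on but the I/O and external masks off (for example, one carrying an explicit wait state code in its address field) classifies as a disabled wait.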

Objective 3 - Once you know that the disabled PSW is the result of an overlay in
low storage or in another data area, you must gather specific data about the overlay.
Ask such questions as: What was the damage to the PSW? When did the overlay
most likely occur? Where did the PSW come from?


Analysis - It is important to try to find out how the PSW was overlaid - was it a
byte, an entire word or doubleword, a single bit, or was a large portion of the
surrounding area destroyed along with the PSW? (The discussion of Pattern
Recognition in the chapter "Miscellaneous Debugging Hints" in Section 2 will help
you determine this.) Much of this analysis depends on your experience and
familiarity with the normal data for the subject PSW and the surrounding area.
You should try to gather enough data to know, for example, that "n" bytes were
overlaid beginning at location xyz.
Also, examine the trace table, if available, and try to determine when the PSW
was probably last valid. Look for interrupts and unusual conditions in the trace
entries to try to reconstruct the process(es) leading up to the incorrect PSW.

If the trace indicates the overlay occurred after the most recent trace entry, the
registers are important because they may show recent BALs and BALRs and
they may contain the address of a routine or control block that was used to overlay
the subject PSW. This is actually a good situation because it will not take long to
relate the overlay to some bad pointer in a control block and, hopefully, your
analysis will proceed to a specific component.
If the overlay occurred several trace entries earlier, determine a possible save
area that might contain the registers that were active at the time of the overlay by
examining interrupt entries or dispatch entries in the trace table.
If there is no trace table, it is almost impossible to define when the overlay
occurred. You might try to analyze, for example, TCB save areas, hoping for a
clue as to when the overlay occurred and to gather information concerning the
problem. However, this process is basically undefined and undisciplined. In most
cases, a trap for the overlay can be generated at this point and used as soon as
possible.

Objective 4 - Determine which component most likely caused the overlay and
choose a likely set of modules from that component to analyze at an instruction
level. Determine which data area field contains the bad address and who set up
the field.

Analysis - As mentioned earlier, by using the registers and trace table it is possible
to identify which code actually overlaid the PSW, but the source of the error must
still be found. This mostly involves screening code to reconstruct the path which
caused the overlay and locating the data that generated the bad address. At this
point, you want to learn which module set the bad field so you can start backtracking.
Shortcuts are possible according to the analyst's familiarity with the modules
that are involved. Certainly the main objective should be to decide which component is most likely responsible and then to proceed to the discussion of that
component's analysis (in Section 5).


Analysis Approach for Enabled Waits
It is most important that you understand the actions that must take place in
order to accomplish work in the operating system. This requires a basic understanding of the key system processes in MVS - paging, I/O, dispatching, locking,
WAIT/POST, ENQ/DEQ, VTAM, TCAM, SRM, JES2/3. These areas of the system
are responsible for directing work through MVS; a malfunction in any one may
cause global system problems. Several, if not all, must be investigated in order to
determine why work is not progressing.
This investigation requires a disciplined approach. The relationships of component interfaces and their mutual dependencies must be understood. With this in
mind, the debugger should proceed to gather information about the various
processes and try to integrate his findings with his other information and assumptions
about the problem, always trying to isolate one cause of the bottleneck. He must
avoid the tendency to guess, assume, and go off on tangents once the first
irregular item is uncovered. Instead, he should continue to gather known facts
and piece them together in some logical pattern that recreates the situation.
In the vast majority of wait state cases, more than one key process will appear
backlogged. The challenge is to determine how these problem processes relate
and which is the fundamental cause of the wait situation. After you gather the
facts and understand the bottlenecks, you must answer one question:
If I "pull the cork" on this given bottleneck, will all the other intertwined situations
resolve themselves? In every problem there is only one bottleneck for which the
answer to this question is "yes". The other problems are consequences of this key
process's failure to complete its designed function. Isolating the process is half
your battle; the other half is determining the cause of this one process's failure.
Following is a suggested disciplined approach for the problem solver who is
approaching a system wait problem. The approach involves three distinct
stages of problem analysis:
Stage 1 - Preliminary global system understanding, including
• system externals
• current system state
• LOGREC analysis
• trace analysis
• determining the reason for waiting
Stage 2 - Key subsystem analysis - an in-depth analysis of the MVS components
that are responsible for accomplishing work.
Stage 3 - System analysis - using the information gathered in Stages 1 and 2, the
problem solver must "step back", get perspective about the known
facts by piecing them together in a logical fashion, and isolate the error
to a process, component, module, etc.
This approach is described in detail in the following sections.


Stage 1: Preliminary Global System Analysis
1. System Externals - Completely understand the system externals of the
situation. Console sheets and the system log should be inspected.
• For any enabled wait (operators call it "system hung") find out if a
display requests command was issued. (Lack of operator action can cause
system bottlenecks.)
• Often many pages of console sheets must be investigated to uncover
operational problems and explain events uncovered in the dump.
Scanning provides a feeling for the events, jobs, requests, etc. leading up to
the problem.
• Make sure all DDR SWAP requests, I/O error messages, SQA shortage
messages, etc. can be explained.
• Always take the time to examine these external areas because a small
effort here could save many hours of detailed dump analysis. Do not overlook obvious items such as a MOUNT PENDING message in the console
log that can cause system problems.

2. Current System State - Investigate fully the current situation as depicted by
the dump.
For enabled waits, the PSW should equal X'070E0000 00000000' (often called
the "no-work" wait) or there should be a considerable recurrence of the
no-work wait in the OS trace table - see the chapter on "MVS Trace Analysis"
in Section 2. If this is not the case, use the disabled wait analysis approach
(earlier in this chapter).
If the PSW indicates the no-work wait situation, you have an enabled wait.
You should now check other global system data area indicators to get the
whole picture. Following are key global indicators:
• There should be no bit set in the PSASUPER field (PSA+X'228'). If
there is, some supervisor routine should be in control. This situation
can indicate incomplete processing by the associated routine. All
possibilities should be pursued until the situation can be explained.
• Because of SRM timer/analysis processing, even when the system is in the
enabled wait situation, the state of the processor at the very instant the
dump was taken can indicate, via the "super bits" or locks indicator
(PSAHLHI), that some process was occurring. You must determine in
this case that these fields being set is normal and continue with wait
analysis. If the fields cannot be explained, you have isolated the error.

• There should be no locks held, as indicated by PSAHLHI on either
processor. This situation is similar to the one described just above. You
must try to discover the owner of the lock and determine why it is still
held despite the fact that the system is waiting. Often the purpose of the
lock will provide insight as to who the owner might be. The chapter on
"Locking" in Section 2 should be of help in your analysis.

3. LOGREC Analysis - Determine if key components have encountered
difficulty; determine previous errors encountered by the system. This can be
accomplished by inspecting SYS1.LOGREC as well as the in-storage LOGREC
buffer. Errors encountered in any of the key processes noted earlier (RSM,
ASM, IOS, JES2/3, SRM, ENQ/DEQ, VTAM, etc.) may provide further
information. If you do find an error associated with any of these areas,
determine whether it could lead to the bottleneck.
The LOGREC records generally contain the names of the error-encountering
routines and often the job on whose behalf the system was
processing at the time of the error. If the routine names are not present, you
may have to use system maps and the PSW/register information in the
LOGREC records in order to associate errors with components. The discussion
of LOGREC analysis in the "Use of Recovery Work Areas" chapter in Section
2 should be helpful in your analysis.

4. Trace Analysis - Determine the last activity within the system.
Because of SRM's timer processing, the trace table for most wait conditions is not useful. However, on the rare occasion that the system has been
stopped or if for some reason the trace is not overlaid with timer interrupts
(X'1004' external interrupt entries), the trace should be analyzed to ensure
normal processing, for example, page faults are being processed, I/O is being
accomplished. Be suspicious of large (relative to most entries) time gaps
in the trace table. If the table has not wrapped-around, process re-creation
may be of some use in determining what the system was doing up to the point
of incident. (The chapter on "MVS Trace Analysis" in Section 2 should be
helpful.)
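The "large relative to most entries" time-gap test above can be sketched as follows. The input format (timestamp, entry-type pairs already extracted from the trace table) and the threshold of ten times the median gap are illustrative assumptions, not part of the trace-table layout.

```python
# Sketch: flag suspicious time gaps between trace-table entries.
# `entries` is assumed to be a list of (timestamp, entry_type) pairs pulled
# from the dump in order; the threshold factor is an arbitrary choice.

def find_gaps(entries, factor=10):
    """Return index pairs (i, i+1) whose gap is large relative to most
    entries: here, more than `factor` times the median inter-entry gap."""
    times = [t for t, _ in entries]
    gaps = [b - a for a, b in zip(times, times[1:])]
    if not gaps:
        return []
    median = sorted(gaps)[len(gaps) // 2]
    return [(i, i + 1) for i, g in enumerate(gaps) if median and g > factor * median]
```

A flagged pair marks the point in the table where activity stalled, which is where process re-creation should begin.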

5. Determine the reason for waiting - Once it has been determined that the
system is waiting, it is always useful to determine what the various address
spaces or jobs are waiting for. This is accomplished by inspecting and scanning
the various tasks and their associated RB structure in a formatted stand-alone
dump. Remember the RCT, started task control (STC)/LOGON, and dump
task may all be waiting in each address space - this is normal. The question
you should ask is: Why are the subtasks below the STC/LOGON waiting?
Generally in an active system more than one address space will be waiting
for the same or similar resource in a problem situation. Therefore, as you
scan and analyze address space status, look for suspensions in common
modules (RB resume PSWs containing similar addresses):

• Many tasks in page-fault wait can indicate the paging or I/O mechanism
is faulty.
• The PVT can indicate a real frame shortage.
• Many tasks in terminal I/O wait can indicate something is wrong with
the TP access method or some part of the network.
• Several resume PSWs pointing into the ENQ/DEQ routine, IEAVENQ1,
can indicate an ENQ resource contention problem.

In general, be on the look-out. Try to compare and relate the system
activities as you encounter them. Often more than one process or address
space is held up because of a common bottleneck. It may be a global resource
required by more than one address space, for example, a lock or data set. It is
important that the exact cause be determined.
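The scan for suspensions in common modules might be sketched like this. The input format (ASID, resume-PSW address pairs collected while scanning the dump) and the rounding to 4K pages are assumptions made for illustration.

```python
# Sketch: group RB resume-PSW instruction addresses of waiting tasks to find
# common suspension points.  Several tasks suspended in the same module will
# usually share a page of resume-PSW addresses.

from collections import Counter

def common_wait_points(rb_psws, round_to=0x1000):
    """Count waiters per 4K page of resume-PSW address; return the pages
    shared by more than one waiter, most common first."""
    pages = Counter(addr & ~(round_to - 1) for _, addr in rb_psws)
    return [(hex(page), n) for page, n in pages.most_common() if n > 1]
```

A page that collects many waiters points at a common module (for example, resident ENQ code), which is the lead to pursue.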

Stage 2: Key Subsystem Analysis
As part of this investigation, if nothing can be easily determined from a cursory
address space scan, you may have to delve into the key components. Following are
some highlights of the important and potentially suspect areas:

1. I/O Subsystem - Check for unprocessed I/O requests; bottlenecks in the I/O
process will almost always log-jam the system. Since IOS is the central facility
for controlling I/O operations, I/O problems should always be suspected in an
enabled wait condition. Therefore, the IOS component and its associated
queues should be analyzed early in the subsystem analysis stage of debugging.
Two important IOS queues and control blocks will indicate whether problems
exist in the I/O process:
• Logical channel queues (LCH) contain lists of elements for I/O requests.
If these queues (pointed to by the CVT + X'8C') are not empty in a waiting
system, IOS must be further investigated.
• Unit control blocks (UCBs) are a logical representation of each I/O
device containing I/O active indicators at offset 6/7. If any indicator
is set, this device must be further investigated. This condition can
indicate either a hardware or software problem.
Both the queued (LCH) and active (UCB/IOQE) requests must be further
investigated to determine the associated requestors and what effect their I/O
not being serviced will have on system operation (for example, if paging I/O or
console I/O is not being serviced, the system will usually stop).
The UCB contains indicators for DDR, intervention required, and missing
interrupt handler processing. Any such indication must be further investigated.
An ENQ on the SYSZEC16 resource is an indication of a waiting condition
generally associated with swapping. The swapping process cannot complete
until active I/O finishes. In a quiesced system, an ENQ on this resource must
be further investigated.
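The UCB check above can be sketched as a simple filter over UCB images pulled from the dump. The input format (device number paired with the raw UCB bytes) is an assumption for illustration; only the offset 6/7 indicators named in the text are tested.

```python
# Sketch: flag devices with I/O outstanding by testing the UCB "active"
# indicator bytes at offsets +6 and +7, as described in the text.
# `ucbs` is assumed to be a list of (device_number, raw_bytes) pairs.

def active_devices(ucbs):
    """Return device numbers whose UCB indicator bytes at +6/+7 are
    nonzero - each such device must be investigated further."""
    return [dev for dev, image in ucbs if image[6] or image[7]]
```

Each device this returns warrants the follow-up described above: find the requestor and decide what its unserviced I/O is holding up.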


2. Paging Mechanism - Check for unserviced page faults. ASM, RSM, and SRM
are closely related and depend upon each other to maintain real storage, the
swapping process, and page fault resolution. If, when you determined the
reason for waiting as described in stage 1, you discovered several page fault
wait conditions, be suspicious. Some key indicators in determining page
fault waits are:
• ASCBLOCK = X'7FFFFFFF' - indicates suspension while holding the
local lock. If in task mode at the time of suspension, the resume PSW
instruction address (saved at IHSA + X'10') should be checked. When the
instruction address = RBRTRAN (-C offset), it indicates
the task is suspended while it waits for a page fault resolution. The page
fault occurred when a new module (paged-out) was referenced. If in SRB
mode at the time of suspension, an SSRB will be queued from a PCB. The
anchor for these PCBs is the RSMHDR (private area page fault) or the PVT
(common area page fault).
• ASCBLOCK = 00000000 indicates no locks are held. The RB structure
can reveal the same situation as described above for RBOPSW instruction
address = RBRTRAN or RBXWAIT=0 and, in addition, an RB wait count
= 1. If you find several tasks in this state, check the dump for the page
represented by RBRTRAN. Is it in storage? (Remember for private area
addresses to be sure that the address space you are investigating is printed.)
If the page is not in storage you may have a potential paging problem.
Again, if in SRB mode at page fault time, the SSRB must be found to
determine more about the process.

If you believe paging is a potential problem, check the PVTAFC (available
frame count). A "low" value may indicate a frame shortage. While "low" is
difficult to define, the value should certainly be above the PVTAFCOK value
(PVT+X'6'). Beyond this, "low" is influenced by the sizes of the working sets of the
address spaces in your system. The working set size for each address space is
contained in the associated SRM user control block (OUCB). This count
(plus an SRM constant of 10) is the number of frames required to swap-in the
corresponding address space. If enough frames are not available, the address
space will remain swapped-out.
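The frame-count reasoning above can be sketched as follows. The inputs (PVTAFC, PVTAFCOK, and per-address-space working-set counts) are assumed to have been read from the dump already; the constant of 10 is the SRM constant mentioned in the text, and the function name is illustrative.

```python
# Sketch of the frame-shortage test described in the text.
# pvtafc       - available frame count (PVTAFC)
# pvtafcok     - the threshold value at PVT+X'6' (PVTAFCOK)
# working_sets - assumed list of (asid, working_set_frames) pairs from OUCBs

SRM_SWAPIN_CONSTANT = 10  # frames added to the working set for swap-in

def frame_shortage(pvtafc, pvtafcok, working_sets):
    """If PVTAFC is at or below PVTAFCOK, report a shortage outright;
    otherwise return the ASIDs whose swap-in requirement (working set
    plus the SRM constant) exceeds the available frame count."""
    if pvtafc <= pvtafcok:
        return "shortage: PVTAFC at or below PVTAFCOK"
    return [asid for asid, ws in working_sets
            if ws + SRM_SWAPIN_CONSTANT > pvtafc]
```

Address spaces this returns are the ones that will remain swapped-out for lack of frames.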
ASM maintains a count of the number of paging requests received and the
number for which processing has completed in the ASMVT. If these counts
are not equal, ASM is backed-up and page faults have not been resolved. This
can be caused by an I/O problem or some internal ASM problem. The ASM
Component Analysis chapter in Section 5 describes the work queues in the
paging activity reference table entries (PARTEs). Finding unprocessed work on
these queues will aid in determining whether ASM is the problem component.
But again be careful: you are still gathering data about the wait state. Your
purpose now is not to debug ASM - it may not be the problem. Note the
apparent ASM problems and continue your investigation. Later when you
piece together your findings and find the real source of the problem, detailed
debugging and logic flow will be required.


3. ENQ/DEQ - Check for unresolvable resource contention. Finding an
ENQ/DEQ interlock and determining what work is being held up because of
this interlock can provide important information about the overall problem.
The QCBTRACE option of AMDPRDMP provides a formatted structure of
the resources and the work that is in contention for them. Determining who
owns the resources and the current status of the owners (if swapped-out,
why? or if waiting, for what?) often provides important clues in understanding the bottleneck.
Also in your scanning process, you should be on the alert for address spaces
that contain subtasks (usually below the STC/LOGON level) with multiple
RB levels, and with the lowest RB containing a resume PSW with an address
somewhere within ENQ code (nucleus resident) and with the RB wait count
RBWCF = 1. The previous RB should be an RB with the ENQ
SVC (SVC X'38') indication in the "WC-LIC" portion of the RB prefix (-4
offset). This indicates that this task and probably the address space are
suspended because of an unsatisfied ENQ request. If several address
spaces or tasks are found in this state you should find out why. The
QCBTRACE facility of AMDPRDMP can be most helpful. An illustration follows:
Investigation of QCBTRACE data shows many requests backed-up on
resource A. The analyst notes this and determines what ASID or TCB owns
resource A at this time (in this example, ASID 9). The other resources
represented in the QCBTRACE are now scanned. If ASID 9 is backed-up
behind someone else (ASID 10) waiting for another resource (B), you must now
determine ASID 10's status with respect to other resources, including resource
A. Essentially you are looking for cases where:
• An address space has resource A and is waiting for resource B and a
second address space has resource B and is waiting for resource A. This
indicates a deadlock. You must determine the faulty process. In this
case you have probably isolated the error to the ENQ process and the way
it is being used. You must analyze the task structure of each address
space to determine how this situation occurred. Do not forget the
SYS1.LOGREC buffers. They may contain clues like errors in ENQ/DEQ
or one of the tied-up address spaces (jobs). Faulty recovery should be
suspected if the latter is the case.
It may be that a job requests control (via ENQ) of a resource and subsequently encounters a software error. The task's associated recovery
gains control and "recovers" from the error but does not dequeue (DEQ),
and therefore does not release the resource. Eventually, the contention
for this resource, depending on its importance, could cause severe
problems.

• An address space has control of a resource and a lot of address spaces are
queued-up behind this address space. In this case, you must find out why
the holder is not releasing the resource. Also know your system. It is not
unusual to see activity on the master catalog resource: "SYSIGGV1 - Master Catalog Name." But be suspicious of most resources. Determine
from the holder's task structure what process it is attempting. Determine
whether the address space is waiting or swapped-out and why. If it is not
waiting or swapped-out, check the non-dispatchability bits and the
possibility that the address space is looping.
This second case is much more likely to be a sign of some other system
problem. Your clue is what is preventing the holder's execution; this will
point you to another process which must be investigated and may lead to
the detection of the final problem.
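The deadlock case described in the first bullet (one address space holds A and waits for B while another holds B and waits for A) amounts to a cycle in the waits-for relation, which can be sketched as a small walk over QCBTRACE-style data. The two input maps are assumed to have been read from the formatted dump; their shape is an assumption for illustration.

```python
# Sketch: detect an ENQ deadlock cycle from QCBTRACE-style data.
# `holds` maps resource name -> owning ASID.
# `waits` maps ASID -> the resource it is queued behind.

def find_deadlock(holds, waits):
    """Follow the waits-for chain from each waiting ASID; revisiting an
    ASID already on the chain means a cycle, e.g. ASID 9 holds A and waits
    for B while ASID 10 holds B and waits for A."""
    for start in waits:
        seen, asid = [], start
        while asid in waits and waits[asid] in holds:
            if asid in seen:
                return seen[seen.index(asid):]   # the deadlocked ASIDs
            seen.append(asid)
            asid = holds[waits[asid]]            # who owns what asid wants
    return None
```

A result of None means no cycle: the backlog is a simple queue behind one holder, which is the second case above.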

Note: When analyzing a dump of a quiesced system you should be suspicious
of "unusual" ENQ resource names - resources that should not be a contention
factor in a quiesced system. The presence of these names should be understood
and explained because they very often will point you to the problem area.
Common resource names are:
"SYSZEC16 - PURGE" - Can indicate a problem in the I/O process related
to the resource holder's address space; can indicate a bottleneck in the
swapping process.
"SYSZVARY - x" - Indicates the reconfiguration component has been
invoked - why is it not completing?

4. Dispatching - Determine if there is work to do in the system. A common
trouble indicator is an MVS dispatching queue containing elements that
indicate work is ready to execute in a waiting system. The GSMQ, LSMQ,
GSPL, and each LSPL should be empty. (The chapter on "System Modes and
Status Saving" in Section 2 contains details of these queues and how to find
them). Generally it is not a problem in the dispatching mechanism itself but
merely an error indication. Often the most useful information is just that
'yes, there is work.' Why is it not being dispatched? Is there a problem in
some other area of the system? Is the address space swapped out? Yes,
there may be a real storage problem delaying swap-in. Or perhaps SRM has not
been told to swap-in the address space via a "user-ready" SYSEVENT. In
short, investigate the OUCB for the address space you are concerned with.
Another useful point is to find out what problems could arise if this work
were not dispatched. Investigating the queued work will indicate what would
be accomplished if this work were executed. This is usually important because
it can clear up much of the "smoke" you may be encountering in your overall
system investigation.
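The queue check above can be sketched as a trivial filter. The queue contents are assumed to have been extracted from the dump already; the dictionary-of-lists input shape is an assumption for illustration.

```python
# Sketch: report dispatching queues that are not empty in a waiting system.
# `queues` is assumed to map queue names (GSMQ, LSMQ, GSPL, and the per-
# address-space LSPLs) to the lists of element addresses found on them.

def undispatched_work(queues):
    """Return the names of non-empty dispatching queues.  In a waiting
    system these should all be empty; a non-empty one is usually an error
    indication elsewhere, not a dispatcher bug."""
    return [name for name, elements in queues.items() if elements]
```

Each name this returns answers "yes, there is work" and sends you on to ask why that work is not being dispatched.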

Likewise, investigate the task structure. Generally, you can ask the same
questions as above, but you must look in different places for the key
indicators. Among the most important indicators are:
• The ASCB, which contains a count of ready TCBs in the memory
• The TCB non-dispatchability flags
• The RTM work area, which contains status at time of error
• The RB structure. Look for long RB chains or unusual SVCs and
interrupt codes. Look for page fault waits.
Again, use this information to lead you to processes or problems that
hold-up the system.
5. Locking - Determine if there is a locking conflict. The locking mechanism
causes system bottlenecks when it is not used properly. The global spin locks
cause obvious problem symptoms such as one processor spinning in the
lock manager (IEAVELK) in an MP environment. (In a UP environment,
global spin locks are generally not a problem unless a lockword or interface is
overlaid or bad, causing a disabled spin. The enabled locks (local/CMS) are
generally the problem ones.) The chapter "Locking" in Section 2 describes in
detail the considerations with which you should be concerned. Elements on
the CMS/local suspend queues may indicate a problem. The technique you
adopt to resolve the conflicts is exactly the same as the ENQ interlock or log-jam
situation.

6. Teleprocessing - Determine if the TP network is responding. Problems in the
TP network often manifest themselves as waiting network or waiting
terminals, even waiting systems. The chapter "TP Problems" in Section 4
contains a detailed description of TP problem analysis. The VTAM and TCAM
chapters in Section 5 contain techniques for VTAM and TCAM problem
analysis.
An important fact for the problem solver here is that these are
subsystems. As such, they maintain their own control blocks, queues, and
dispatching mechanisms. They are responsible for work being processed once
it enters the subsystem and they often have little direct dependency on MVS.
That is, normal MVS problem indicators will not generally solve the problem.
You must understand the subsystem's work-processing mechanism in order to
be an effective analyst. For example, VTAM has its own address space with a
number of tasks used primarily for network start-up, shut-down, and operator
commands. In most VTAM problems, a look at the VTAM address space will
show these tasks are waiting. However, this is normal when no operator
processing is required. Even though VTAM is waiting, this is not the place to
be distracted. Again, remember this VTAM task structure, put it aside as part
of your information gathering, and then proceed to the analysis of VTAM's
internal work queues as described in the VTAM chapter of Section 5.


7. Console Communications - Determine whether console communication is
possible. The system can appear or actually prove to be waiting because the
operator is not able to communicate with MVS. This could be the sign of
a problem almost anywhere in MVS, but it often indicates an error in the
communications task or its associated processing.
The communications task (comm task) runs as a task in the master
scheduler's address space and is usually represented by the third TCB in the
formatted portion of the stand-alone dump and identified by a X'FD' in the
TCBTID field (TCB+X'EE'). By inspecting the RB structure associated with
this task, you can determine the current status. It is not unusual to
find one RB with a resume PSW address in the LPA and an RB wait count of
one. If more than one RB is chained from the TCB and you were not able to
enter commands, analyze the RB structure because this is not a normal
condition.
The key control block is the unit control module (UCM) which is located
in the nucleus. CVTCUCB (CVT+X'64') points to the base UCM. The base
UCM-4 contains the address of the UCM MCS prefix and the base UCM-8
contains the address of the UCM extension. From the UCM you can determine
the status of the various consoles. The following should be considered and can
warrant further investigation:
• Important WTORs are outstanding.
• An out-of-buffer (WQEs, OREs) situation exists.
• There are unusual flags in the UCM.
• There is a full-screen condition.
• There is a console out of ready.

Remember that comm task processing is dependent on the rest of the operating
system. Most likely, some external service or process has caused comm task to
back-up, and this possibility should be investigated. Remember the debug
process: gather all the facts, then proceed with analysis.

Stage 3: System Analysis
At this point you should have a detailed understanding of the system and its key
components. You should know which components or processes are back-logged
and, correspondingly, what work (jobs) is not being processed by the system
because of these back-logs. You must now stand back from the problem.

Answer this question: Which of these problems and situations can be related
to or attributed to each other? For example, if I/O is queued for the paging
devices (indicated by IOQEs on the LCHs associated with the paging devices' DCBs)
and you also found several address spaces are in "page-fault wait", you can now relate
these findings. And if one of these address spaces performed an ENQ for a resource
and did not yet DEQ because of the page-fault suspension, it is very likely other
address spaces are also backlogged behind this job's processing. Initially your
ENQ/DEQ analysis showed the problem, but at this point you can attribute the
ENQ contention problem to the page-fault suspension problem that you have
already attributed to the I/O problem.
This process must be repeated for all the potential error situations you uncovered
in your investigation. Do not forget to use the system indicators in your
attempt to arrive at the source of the problem. And most importantly, ask
yourself: If I unplug this bottleneck, will all the other intertwined situations
resolve themselves? In the previous example, resolving the ENQ situation will
allow the work queued in the ENQ/DEQ component to execute, but the "page-fault
waiting" job will still be hung. That is, ENQ/DEQ is not the problem to pursue.
Indeed, if you resolve the I/O problem, this page fault is resolved, the DEQ will be
performed, and all work in the system will resume normal operation. Yes, the I/O
problem is the important consideration in this case. The I/O problem is the one
that must be pursued. When this problem is resolved, the enabled wait state
condition has been resolved. Global system areas, recovery work areas, LOGREC
analysis, and IOS component analysis will be necessary to further isolate, and
eventually solve, the problem.

4.1.16

OS/VS2 System Programming Library: MVS Diagnostic Techniques

Loops

Loops are defined as disabled or enabled, depending upon their external
appearances. A disabled loop can be recognized externally by a solid system
light and the inability to communicate with the system through the consoles (that
is, no input or output). Usually, a disabled loop indicates a hardware
and/or software malfunction. There are, however, several cases in MVS in which a
disabled loop is purposely used and is not an error indication. These cases are
discussed later in this chapter.
An enabled loop is generally much larger than a disabled loop. Observed from
the console it appears as a bottleneck: the system seems to be slowing down
periodically, suggesting performance degradation. The operator may notice that
a particular job remains in the system for a long time and does not terminate.

Common Loop Situations
There are two common loop situations:

1. Two processors of an MP environment communicate via the signal processor
   (SIGP) instruction. Often the SIGP-issuing processor enters a disabled loop
   until the receiving processor either accepts the SIGP-caused interrupt or
   performs the operation requested by the issuing processor. This loop serializes
   the processors in the MP configuration. The SIGP-issuing processor loops in a
   nucleus-resident module, IEAVERI.

   Often during an MP dump analysis you will find that one processor was in
   this loop. This is not an error if:

   • The operator pushed the STOP button on one processor and not the other to
     investigate a suspected problem.
   • The receiving processor is disabled for external interrupts, thereby preventing
     the SIGP-issuing processor from proceeding.

   If this situation continues for an extended period, it means there is a system
   problem, but the loop is a result of that problem and is not an error itself. Most
   often, the other processor's activities must be analyzed to determine the problem.
   For a more detailed discussion of MP communication, refer to the chapter
   "Effects of MP on Problem Analysis" in Section 2.

2. The lock manager (IEAVELK), which resides in the nucleus and controls the
   locking mechanism of MVS, contains a section of code that enters a disabled
   loop when a global spin lock is requested but is not available. On a UP this is
   an invalid condition and always signifies an overlaid lockword or invalid
   lockword address. On an MP system, this usually indicates that the other processor
   is holding the lock and not releasing it. But it may indicate an overlaid
   lockword; if not, the problem is definitely on the other processor. In either case,
   register 11 contains the pointer to the requested lockword and register 14
   contains the address of the requestor. Check the value in the lockword. Valid
   values are a fullword of zeros, or three bytes of zeros and the logical processor
   address in the fourth byte. Any other bit configuration will cause the system
   to spin in a disabled loop and signifies an overlaid lockword or invalid
   lockword address. If the lockword is not valid, it is necessary to identify who
   overlaid the lockword. It is possible that the lockword was overlaid in
   conjunction with some other problem. Again, since the disabled loop may not be
   the problem but a symptom of a possible error on the other processor,
   determine why the requested lock is not available. For a detailed discussion of
   "Locking," see Section 2.
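The lockword validity rule just stated (a fullword of zeros, or three zero bytes with the holder's logical processor address in the fourth byte) can be expressed as a simple check. The sketch below is modern Python for illustration only, not MVS code; the function names are inventions of this example.

```python
def lockword_is_valid(lockword: int) -> bool:
    """A spin lockword is valid when its high-order three bytes are zero:
    either the whole word is zero (lock free) or the fourth byte holds
    the logical processor address of the holder.  Anything else means an
    overlaid lockword or an invalid lockword address."""
    return (lockword & 0xFFFFFF00) == 0

def describe_lockword(lockword: int) -> str:
    """Summarize a lockword value the way the text suggests reading it."""
    if not lockword_is_valid(lockword):
        return "overlaid or invalid - identify who overlaid it"
    if lockword == 0:
        return "free"
    return "held by logical processor %02X" % lockword
```

A value such as X'C1C2C3C4' (EBCDIC "ABCD") fails the check immediately and points to a storage overlay rather than lock contention.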

Analysis Procedure
Generally for loop analysis, you will have a stand-alone dump if the operator
considered the problem serious enough to re-IPL the system, or an SVC dump,
SYSUDUMP, SYSMDUMP, or SYSABEND dump (provided by the software recovery) if
the operator pressed the RESTART key in order to break the apparent loop. For
the SVC dump, SYSUDUMP, SYSMDUMP, and SYSABEND dumps there is an
abnormal completion code of X'071' associated with the looping task of a job if the
RESTART key was pressed when the program was actually looping. In addition, a
formatted SYS1.LOGREC listing should be available.

Before you can determine what problem is causing the loop, you must determine
first that a loop really exists, and second whether it is enabled or disabled.

First, verify that a loop exists. The disabled loop situation is fairly
straightforward. The PSW contains a disabled mask (X'40' or X'00') and all other
system activity will have stopped.
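The PSW mask test above can be sketched as a bit check on the first PSW byte. This is modern Python for illustration only; the bit assignments assume an EC-mode PSW in which bits 6 and 7 of byte 0 are the I/O and external interrupt masks, which is consistent with the enabled mask X'07' and disabled masks discussed in this chapter.

```python
IO_MASK  = 0x02   # bit 6 of PSW byte 0: I/O interrupts allowed
EXT_MASK = 0x01   # bit 7 of PSW byte 0: external interrupts allowed

def psw_is_enabled(byte0: int) -> bool:
    """True when the PSW system mask allows both I/O and external
    interrupts - the 'enabled' state; a mask such as X'40' or X'00'
    is disabled."""
    return (byte0 & (IO_MASK | EXT_MASK)) == (IO_MASK | EXT_MASK)
```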
Recognizing that there is an enabled loop is often the most difficult step.
Enabled loops are often quite large and may encompass several distinct operations
(I/O events, SVCs, module linkage, etc.). Because the loop is enabled, it is often
interrupted, preempted, and eventually resumed many times. This makes it
difficult to recognize the loop pattern. Following are some indicators of a potential
enabled loop:
• The current PSW has an enabled mask, X'07', in the first byte and the
  instruction address portion is not zero. This alone does not prove there is a
  loop, but the information may help your analysis of the problem later.

• The MVS trace table shows a repetitive pattern of events, for example, SVCs
  issued from the same virtual addresses, or dispatcher entries for virtual
  addresses that are relatively close together. Determine if the entries are related
  to the same address space by using the ASID field (offset X'16' into the trace
  entry). If so, you can now examine the task and control block structure
  indicated by the trace entries. The chapter on "MVS Trace Analysis" in
  Section 2 should prove helpful.

• Many tasks (TCBs) or address spaces (ASIDs) appear to be bottlenecked
  waiting for some resource(s). This can be determined by using the
  QCBTRACE option for AMDPRDMP and analyzing the output. If there
  appears to be a bottleneck, determine what job owns the resource(s) and what
  that job is currently doing. It may be that the job that acquired the resource(s)
  is in an infinite enabled loop; therefore, when other jobs request the same
  resource(s), their requests cannot be satisfied, which eventually causes a major
  performance throughput problem. See the chapter on "System Execution
  Modes and Status Saving" in Section 2 for how to recreate the job's current
  status. A reconstruction of the PSW and registers helps you to determine if
  there was an enabled loop.

• TCB/RB structure analysis. Look for unusual or long RB structures chained
  from TCBs. These may indicate a loop that includes several levels of supervisor
  linkage.
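The repetitive-pattern indicator above amounts to counting how often the same (event, address, ASID) combination recurs in a trace-table extract. The sketch below is modern Python for illustration; the dictionary layout of a trace entry is invented for this example and does not match the real trace entry format.

```python
from collections import Counter

def repetitive_pattern(entries, threshold=5):
    """Count (event, address, asid) triples in a trace-table extract and
    return those repeating 'threshold' or more times.  Repetition is a
    hint of an enabled loop, not proof - correlate it with the PSW,
    registers, and control block structure as the text describes."""
    counts = Counter((e["event"], e["addr"], e["asid"]) for e in entries)
    return [key for key, n in counts.items() if n >= threshold]
```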

Enabled Loop Exception:
The system resources manager (SRM) of MVS constantly monitors resources,
gathers data, and analyzes the system. SRM uses a timer interrupt
approximately every .4 seconds in order to gather its statistics. This timer interrupt
occurs even when the system is in an enabled wait condition. Because of
this, the enabled wait is often referred to by operators as an enabled
loop. (They observe the "WAIT" indicator from the console, followed by
a burst of activity (SRM processing), followed by the "WAIT" indicator, etc.
It may even be possible to enter certain operator commands.) However, this
is really an enabled wait condition and analysis should proceed according
to the discussion on "Enabled Waits" in the "Waits" chapter earlier in this
section.

The dump you are analyzing may show the MVS trace table containing a
no-work wait (070E0000 00000000) PSW followed by a timer interrupt,
SRB dispatch, MP communication, etc. This pattern indicates an enabled
wait condition, not an enabled loop. (See the "Pattern Recognition" topic
in the "Miscellaneous Debugging Hints" chapter in Section 2.)
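The tell-tale sign described above, the no-work wait PSW recurring between bursts of SRM activity, can be checked mechanically against the PSW values extracted from a trace table. The sketch below is modern Python for illustration only; the list-of-PSWs input and the threshold are inventions of this example.

```python
NO_WORK_WAIT_PSW = 0x070E000000000000  # the no-work wait PSW cited in the text

def looks_like_enabled_wait(trace_psws, min_waits=3):
    """If the trace repeatedly shows the no-work wait PSW between bursts
    of activity (SRM timer pops roughly every .4 seconds), treat the
    symptom as an enabled wait, not an enabled loop, and analyze it per
    the 'Enabled Waits' discussion."""
    return sum(1 for psw in trace_psws if psw == NO_WORK_WAIT_PSW) >= min_waits
```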

Once you have determined the type of loop, the following analysis procedure
should help determine what problem is causing the loop.


Objective 1 - Who is looping?
The PSW and registers saved at the time of the dump indicate the active work. (See
the chapter "Global System Analysis" in Section 2.) The register save areas in the
LCCA/PSA indicate important environmental data at the time of the last I/O
interrupt, external interrupt, etc. (See the chapter "System Execution Modes and
Status Saving" in Section 2.)

The PSA indicators contain valid information about disabled loops. Also
remember the recovery areas active at the time of the loop are valid and may
provide hints as to the current process. (See the chapter "Use of Recovery Work
Areas" in Section 2.)

Unlike the disabled loop situation, the enabled loop may not have the current
registers associated with it. This is true if the loop was interrupted and new
processing was initiated before the dump was taken. For the enabled loop, find the
current registers and status from the ASCB/ASXB/TCB/RB structure and the
associated save areas (for example, IHSA). The chapter "System Execution Modes
and Status Saving" in Section 2 will be helpful for this phase.

Objective 2 - What is the system mode?

It is important to know whether the system is in SRB or task mode and the
implications of these modes. In all cases of true disabled loops, the PSW, LCCA,
and PCCA contain valid status indicators such as the last dispatched routine
(PSA+X'300'). The old PSWs reflect the last interrupt status. The register save
areas in the LCCA are valid. The LCCA+X'21D' set to 1 indicates SRB mode; set
to 0 indicates task mode. The ASCB NEW/OLD and TCB NEW/OLD pointers
reflect the current task. (Note: If the TCB OLD pointer is zero, the system is in
SRB mode or possibly in supervisor mode - that is, dispatcher or supervisor
recovery.) The discussion in the "System Execution Modes and Status Saving"
chapter in Section 2 and the "Dispatcher" chapter in Section 5 are useful.

By scanning the MVS trace table, you will be able to determine system events
leading up to the loop. See the chapter on "MVS Trace Analysis" in Section 2.

SYS1.LOGREC and the in-storage LOGREC buffer may contain indications
of previous occurrences of the loop (records with X'071' completion codes) or
records of previous errors in the currently looping process that could possibly
contribute to the current loop. See the "Locking" chapter and the discussion
on LOGREC in the "Use of Recovery Work Areas" chapter in Section 2.


Objective 3 - What is the extent of the loop and why is the system looping?

Using the current PSW and the global data areas in combination with the general
purpose registers, you should be able to determine the extent of the loop. One
register often contains the key to a loop-causing value. Try to isolate that one
register. It may be necessary to inspect the actual object/source code to determine
the basic logic in case there is an encoded loop that is supposed to end when a
certain value is reached. If that value cannot be reached for some reason, the loop
will not end.
Isolating the cause of the loop is important in loop analysis. Once the cause is
isolated, you can proceed the same as with a system-detected error such as a program check.

Objective 4 - Determine the cause of the error - how is the value that is causing
the loop developed?

To determine how the bad value was developed, it is necessary to work back through
the logic leading to the loop. Be aware of bad control blocks. Look at the bad value
itself and the areas from which it was developed. Try to determine if the value is
the result of a storage overlay or if it was calculated from bad logic. See the
"Pattern Recognition" topic in the "Miscellaneous Debugging Hints" chapter of
Section 2 to help make this determination.
In addition to bad control blocks and data fields, consider the loop control
mechanism used for encoded loops. A common cause of problems is a BCT
instruction whose loop control register contains a negative value.
Scanning the active registers at the time of the dump often aids in discovering
this type of problem.
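Why a negative BCT count is so damaging is worth seeing numerically: BCT subtracts one from the register and branches back while the result is nonzero, so a register that starts at zero or a negative value must wrap through the entire 32-bit range before the loop falls through. The sketch below models this in modern Python for illustration; it computes the iteration count analytically rather than simulating billions of decrements.

```python
def bct_iterations(initial: int) -> int:
    """How many times the body of a BCT-controlled loop executes.
    BCT decrements the register, then branches while the result is
    nonzero, so a positive count runs the body that many times, while
    zero or a negative value wraps through the full 32-bit range."""
    u = initial & 0xFFFFFFFF          # the register holds 32 bits
    return u if u != 0 else 2 ** 32   # 0 decrements to -1 and keeps going
```

A register that should have held 3 but was overlaid with -1 turns a three-pass loop into roughly 4.29 billion passes, which from the console is indistinguishable from an infinite loop.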
Figuring out how the erroneous field could possibly contain the value it does is
the most challenging part of the process. Again, the contents of the field often
provide the clue to determining the error-causing process.
Also, consider how serialization is accomplished for the field in question. Is it
possible for both processors to be updating the field simultaneously? The MVS
trace helps you recreate recent processes, but you also must understand the modes
and structure of the code that updates the field. (Your work in Objective 2 should
be helpful.)
It is possible that the code setting up the field was physically interrupted and,
because it was non-reentrant or the logic was faulty, another process updated the
field or control fields and subsequently caused the first process to encounter
unexpected data.


TP Problems

A common problem in teleprocessing (TP) environments is incorrect data, which
may affect one terminal or an entire component. The symptoms include no
data, wrong data, or too much data, but the general problem symptom is that
something is wrong with one or more messages. The problem is usually not tied
directly to a component or access method, as a program check would be; often an
error message is from a component not directly causing the problem.
Typical symptoms are:
• An error response from an application that suggests incorrect data was entered
from a terminal, when in fact the data was correct
• A "hung" terminal - the system will not respond
• Wait states, in which message traffic gradually dies off
• Incorrect characters in a message (the data may be going into or out of the
system)
This chapter discusses TP problem analysis in the following topics:
• Message Flow Through the System
• Types of Traces
• Trace Output Under Normal Conditions

Message Flow Through the System
Data exchanged between programs in the system and terminals follows a route
through several components. The first step in solving "typical" problems is to
determine where along that route something is happening incorrectly.
By far the most valuable tools for doing this are the traces in the various
components. To use the traces effectively it is necessary to understand how
messages flow through the system. For example, consider a message from a TCAM
application to a TCAM terminal. The path might go from the application program
buffer to a TCAM queue data set, to a TCAM buffer while TCAM processes it,
over the channel into the 3705, then into an NCP buffer, and finally over a
communications line to a terminal. Traces allow the message to be checkpointed
at certain spots along the path; therefore, understanding the path is vital
to knowing what traces to use and what you should see for a message that flows
correctly.

To use traces effectively you must also understand how components refer to
terminals or lines and how they communicate with each other regarding these
terminals and lines. Terminals or lines are identified in traces by a line control
block (LCB), a logical name, a network address, a polling/addressing sequence,
or a subchannel address. Not only must these relationships be known in order to
use multiple traces, but certain correspondences must be correct in order for data
to move through the network.
When using traces, the general approach to a problem of incorrect data is to
track the data flow from a point where everything was all right to a point where the
messages stopped or were incorrect. Messages that are flowing correctly can be
used to establish time relationships between different traces. Then each message
can be followed along its route past each checkpoint, with the goal of isolating
a gap between two checkpoints where the message stopped or became bad. The
next step is to focus on this narrower area to learn what is wrong.
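The checkpoint-by-checkpoint narrowing just described can be sketched as a walk along the route, stopping at the first checkpoint where the traces no longer show the message. This is modern Python for illustration; the checkpoint names and the sets-of-sightings input are inventions of this example.

```python
def find_gap(checkpoints, sightings):
    """Given the ordered checkpoints along a message's route and the set
    of checkpoints where the traces actually show the message, return
    the (last-seen, first-missing) pair bracketing where it vanished,
    or None if the message made it all the way through."""
    last_seen = None
    for point in checkpoints:
        if point in sightings:
            last_seen = point
        else:
            return (last_seen, point)
    return None
```

The returned pair is the "narrower area" on which to focus; for instance, seen in the TCAM queue data set but missing from the TCAM buffer trace points at TCAM's buffer handling rather than the NCP or the line.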
If a message stops, what is wrong or what is missing? How does the flow up to
that point compare with a normal flow? You must understand what resources
and what processes are required for a message to move from where it appeared
last to where it should have appeared next. What buffers and/or control blocks
would have been used? Were they available? A single terminal or all terminals
may encounter a "wait state", and it is necessary to dig into the component to
determine what processing has taken place and what condition or resource is
preventing further processing. The TCAM Debugging Guide should be referenced
for problem isolation in TCAM.

If a part of the data moving through the system becomes bad, the traces should
isolate a component or an interface over which it was transformed. Comparison
with normal message flow will indicate whether any change at all should have
taken place. If no change should have occurred, an overlay ("clobber") or incorrect
pointers to data buffers may be the problem. The exact amount and positioning
of bad data should be determined, for it might provide an obvious correlation with
other known variables such as a buffer length. If some transformation normally
occurs to a message, the controlling process under which it is performed must
be examined. What could cause an incorrect transformation of data? Examples
are translate/edit tables or mappings from one resource name into another name,
such as mapping the logical network name into the network address.


Types of Traces
The following is a summary of EP and NCP mode traces and their relationships to
each other. For more information on these traces refer to:

• IBM 3704 and 3705 Communication Controllers Emulation Program
  Generation and Utilities Guide and Reference Manual
• IBM 3704 and 3705 Communication Controllers Network Control Program/VS
  Generation and Utilities Guide and Reference Manual
• OS/VS2 MVS VTAM Debugging Guide
• IBM 3704 and 3705 Program Reference Handbooks
• OS/VS2 TCAM System Programmer's Guide, Level 10
• OS/VS TCAM Debugging Guide, Level 10
Figure 4-1 shows a summary of EP and NCP mode traces.

Figure 4-1. Summary of EP and NCP Mode Traces
(The figure is a diagram of the trace points along each path. EP mode - BTAM or
TCAM application, IOS, EP subchannels, 3705 EP or PEP: TCAM I/O trace, GTF
SIO and I/O trace, EP line trace level 3, EP line trace level 2. NCP mode -
application, TCAM, IOS, native subchannels, 3705: TCAM dispatcher subtask
trace, TCAM buffer trace, GTF SIO and I/O trace, GTF PIU trace, NCP channel
adapter trace, NCP line trace.)


EP Mode Traces
The following describes the six EP mode traces:
1. 3705 Emulator Program line trace - Any or all emulator subchannels can be
   traced in the 3705. Each character over a line (level 2 interrupt) and/or each
   interaction with the channel (level 3 interrupt) for data or status transfer can
   be traced. The trace is activated via the 3705 panel or via the EP (emulator
   program) service aid, Dynadump. Trace data is retrieved from a 3705 dump or
   via Dynadump. All of the 3705 storage above the relatively small emulator
   program itself is used for a trace table (this can be a significant amount in a
   large 3705).

   If a problem can be isolated to one or two lines and can be detected quickly,
   an in-storage trace (in the 3705) is usually sufficient. The trace can be stopped
   before critical information is lost because of wrapping. If all lines must be
   traced, if there are a good number of high-speed lines with constant activity
   such as polling, or if the problem is not externally detectable, then Dynadump
   should be used to dump trace data as it is created.

2. PEP Emulator line trace - This traces EP lines in a PEP (partitioned emulator
   program). The size of the trace table is set at PEP generation time and is fixed.
   The size of the trace table compared to the amount of storage used for tracing
   in an EP makes wrapping a much more serious problem. Dynadump is
   extremely useful when tracing EP lines in a PEP.
3. GTF I/O and SIO trace - All emulator subchannels can be traced. SIOs, I/O
   interrupts, CAWs, CSWs, and SIO condition codes are traced. No CCWs or data
   are traced. Data can be traced to an external file and selectively reduced by time
   and subchannel address. SIOs and I/O interrupts are directly correlated with
   channel activity seen in a 3705 EP line trace. Refer to OS/VS2 System
   Programming Library: Service Aids for details on activating and printing this
   trace.

4. TCAM EP mode line I/O interrupt trace table - This traces I/O interrupts on
   EP or 2701/2/3 lines. It is specified by the TCAM INTRO macro or by
   operator response at TCAM startup.
5. TCAM dispatcher subtask trace - This records the flow of all TCAM-dispatched
   elements. It is specified, as is the TCAM I/O trace, by the INTRO macro or by
   operator responses. It is useful in determining the flow of activity within TCAM.
   It contains a history of TCAM buffers, LCBs, QCBs, etc., that are processed by
   the various TCAM subtasks.

6. TCAM buffer trace - This traces buffers for specified lines as they are
   processed by a message handler (MH). The lines are the same lines as those
   specified by the I/O interrupt trace or TPIO trace.


NCP Mode Traces
The following describes the NCP mode traces:
1. NCP line trace - This traces all characters on a given TP line (level 2 interrupt)
   in the 3705. Only one line can be traced at a time. There is one four-byte entry
   for each character interrupt in the 3705, including a one-byte time field. The
   time field wraps after 25.5 seconds, but it is useful for seeing delays and when
   time-out periods expire. This trace is controlled through access method operator
   commands. Data is shipped to VTAM from the NCP and traced via GTF, so
   GTF must be active for the USR trace when line trace data is required.
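Because the one-byte time field wraps after roughly 25.5 seconds, delays between two entries should be computed with modulo arithmetic. The sketch below uses modern Python for illustration; the 0.1-second tick and the modulo-256 field width are assumptions chosen to be consistent with the wrap interval stated above, and the result is only meaningful if less than one full wrap elapsed between the two entries.

```python
TICK = 0.1      # assumed tick size, consistent with a ~25.5 second wrap
MODULUS = 256   # the time field is one byte

def elapsed_seconds(t1: int, t2: int) -> float:
    """Elapsed time between two one-byte NCP line trace time stamps,
    valid only when less than one wrap separates the entries."""
    return ((t2 - t1) % MODULUS) * TICK
```

This correctly handles a later entry whose raw stamp is numerically smaller because the counter wrapped between the two entries.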
2. NCP channel adapter trace - This traces the NCP's interaction with the channel
   for status and data transfer. There is one 32-byte entry for each level 3 NCP
   channel adapter interrupt. The data stays in a static trace table within the 3705
   and is retrieved via a 3705 dump. The trace option is included in an NCP by
   altering the TRACE= operand in the SYSCG006 macro (in SYS1.MAC3705).
   Refer to the section "Channel Adapter Trace" in IBM 3704 and 3705
   Communications Controller Network Control Program/VS Logic, and to the
   3704/3705 Program Reference Handbook.

If the NCP uses a Type 1 channel adapter, the trace includes all data
   transferred over the channel. If the NCP uses a Type 2 or Type 3 channel
   adapter (CA), the trace does not include data. The trace is most useful for
   problems involving NCP ABENDs where the last activity on the channel is of
   interest. Because the trace wraps very fast and cannot be written out
   dynamically, it is not useful for finding what happened over a period of time.
3. GTF SIO and I/O trace - This traces SIO and I/O interrupts for a 3705. Its
main value is in showing what SIOs and I/O interrupts take place in relation to
the RNIO trace (which is discussed next). The RNIO trace shows data (PIUs)
in and out. The SIO and I/O traces can then show attention interrupts and how
many PIUs were transferred at a time and if there are problems in timing or in
"coat-tailing." Coat-tailing is the ability to bring PIUs into the system from the
3705 on the same I/O operation used to write other data out to the 3705.
4. GTF RNIO trace, or VTAM I/O trace - This shows the header and first few
bytes of data of each PIU coming into or going out of VTAM. It is sent to
GTF under the RNIO trace option. Each entry is time-stamped. The tracing is
done in channel-end appendages as soon as the I/O operation that transfers the
data is complete.

5. VTAM buffer trace - This is a VTAM trace sent to GTF under the USR trace
   option. The trace is performed at two points in VTAM.

   a) TPIOS. Traces are labeled TPIOS IN/OUT REMOTE. Application
      jobname, destination and origin node names, feedback data block (FDB),
      feedback status block (FSB) for input operations, PIU header and text are
      traced.

      For TPIOS OUT REMOTE entries, the transmission header is not exactly
      as it will appear when I/O is finally performed. Sequence number and
      length fields, plus some other bits in the TH (transmission header) may be
      filled in after the TPIOS trace. Use the RNIO trace to see exactly how the
      PIU header was sent to the 3705.

   b) Control Layer. Traces are labeled C/L IN and C/L OUT. Application
      jobname, destination and origin node names, and text are traced
      approximately at the time of the transfer of a message from the
      application's buffer to VTAM's buffer, and vice-versa.

   Note: VTAM traces such as I/O trace (RNIO) and buffer trace (USR) must
   be started and stopped by a VTAM operator command for each terminal
   or node in the network that is to be traced. There is no higher level
   operation available.

6. TCAM dispatcher subtask trace - This trace is described in item 5 under "EP
   Mode Traces" earlier in this chapter.

7. TCAM buffer trace - This trace is similar to the TCAM buffer trace for EP
   lines described at the end of the section "EP Mode Traces" earlier in this
   chapter.

8. TCAM PIU trace - This traces path information units (PIUs) for a line, a
   line group, or an NCP.


Trace Output Under Normal Conditions
The following sections illustrate how some normal situations are seen in traces.
Understanding normal processing cannot be over-emphasized, for it is often a
comparison between traces such as these and a trace of an error that reveals the
key to a problem.

Example 1: VTAM I/O Trace
The first example (Figure 4-2) shows only a VTAM I/O trace (RNIO) for:
1. the activation of a PEP (it is incidental that it's a PEP),
2. the activation and connection of a 3600 logical unit,
3. data exchange between an application and the LU, and then
4. disconnection and deactivation of the LU and the PEP.

Trace Entries:
• 1 - 9 show the PEP activation and initialization.
• 10 - 15 are line-activates for start-stop lines.
• 16 - 21 are line-activates for some BSC lines.
• 24 - 32 show the activation of the link, controller, and LU. Operator
  VARY commands are used to activate a 3600 cluster controller
  and logical unit. Unlike old-device lines, the SDLC link is not
  activated until a device on the link needs to be activated.
  An application program is then started that opens (via
  OPNDST) the LU.
• 33 - 37 show both the connection and first message to the LU.
• 38 - 42 show another link activation and controller contact; the controller
  and its LUs are not being traced, however, so none of the
  subsequent activation is shown. (Every node to be traced must
  have a VTAM command issued.) The PEP itself is being RNIO
  traced, which is the reason its activation is shown.
• 43 - 52 show data exchange between the LU and the application.
• 53 - 56 show the disconnection from issuing a CLSDST.
• 57 - end show the PEP deactivation of the SNA devices, the S/S and BSC
  lines, and the PEP itself.

Note entries 8 - 9 and 22 - 23 in Figure 4-2: this trace was made with a level
3.0 PEP, which does not support VTAM's attempt to alter the channel attention
delay; therefore, error responses are received. VTAM, through the NCP or PEP,
immediately activates all old-device lines (S/S and BSC) in NCP mode. The SDLC
devices are system generated with ISTATUS=INACTIVE and therefore, no SDLC
links are activated. (The VTAM messages IST.....I PEP736A ACTIVE would
have appeared on the console after trace entry 23.)

Figure 4-2. VTAM I/O Trace Example (Part 1 of 4)
(Each RNIO entry in the trace listing shows the ASCB address, CPU, jobname,
time stamp, direction (IN or OUT), and the PIU header and first bytes of data.)

Figure 4-2. VTAM I/O Trace Example (Part 2 of 4)

EXTERNAL TRACE - DD TAPE

[Trace listing not reproduced.]

Figure 4-2. VTAM I/O Trace Example (Part 3 of 4)

EXTERNAL TRACE - DD TAPE

[Trace listing not reproduced.]

Figure 4-2. VTAM I/O Trace Example (Part 4 of 4)

TP Problems (continued)

Example 2: VTAM and GTF Traces
The second example (Figure 4-3) shows all of the VTAM-GTF traces for parts of
the process shown in the previous example. The TPIOS buffer trace, control
layer (C/L) buffer trace, RNIO trace, and NCP line trace are illustrated in the
order they occur.

Trace Entries:
•  1-17  - show the PEP activation, etc.
•  18-29 - are activations for start-stop lines.
•  30-55 - show the SDLC link activation; they also show controller and LU
           activation and connection.
•  56    - is the first C/L record, the first data received from the application.
•  56-76 - show the message flow between the application and the LU.
•  77-93 - show the NCP line trace in relation to the data in the PIUs. The
           exact placement of line trace records relative to RNIO and buffer
           trace records cannot be depended upon; in general, most or all of
           the line trace that shows receipt of a message in the NCP will
           precede the inbound host traces for that message. The RNIO
           records are omitted from this section of the trace: there
           should be RNIO records for each PIU, plus one for each line trace
           entry showing a buffer of line trace data coming into VTAM from
           the NCP.
•  84-122 - show the last data exchanges, the disconnection and deactivation of
           the 3600, and deactivation of the SDLC link.
The RNIO trace entries in this example can be matched exactly
with the entries of the previous example. This example shows how
messages can be followed through the network. The GTF I/O and
SIO traces are not shown in this example. Running a single terminal
at a low message rate, as in this example, would cause almost every
RNIO trace to be preceded by several I/O - SIO entries. The
sequences are usually as follows:
Outbound:
   GTF  SIO
   GTF  I/O - CE, DE
   GTF  RNIO OUT

Inbound:
   GTF  I/O - ATTN
   GTF  SIO
   GTF  I/O - CE, DE, UE
   GTF  RNIO IN
OS/VS2 System Programming Library: MVS Diagnostic Techniques
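As an illustrative aid (not part of the original manual), the usual outbound and inbound entry sequences can be written down and checked mechanically; the mnemonic strings below are assumed shorthand for parsed GTF records, not real GTF output:

```python
# Sketch only: the expected GTF entry sequences for one outbound and one
# inbound operation, as described in the text. The mnemonic strings are
# hypothetical stand-ins for parsed GTF trace records.

OUTBOUND = ("SIO", "I/O CE,DE", "RNIO OUT")
INBOUND = ("I/O ATTN", "SIO", "I/O CE,DE,UE", "RNIO IN")

def classify(entries):
    """Classify a short run of trace entries as an outbound or inbound
    sequence, or return None if it matches neither pattern."""
    run = tuple(entries)
    if run == OUTBOUND:
        return "outbound"
    if run == INBOUND:
        return "inbound"
    return None
```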

All SIO and I/O entries are for the native subchannel address of
the 3705. With more message traffic, different sequences may be
seen. Very few problems have occurred in the area of the VTAM-NCP
channel interface, but a brief explanation is included to avoid
confusion.
The following GTF I/O, SIO, and RNIO sequences are possible:

1. Coat-tailing - data is returned from the NCP on the same I/O
   operation used to send data out, as follows:

   GTF  SIO (WRITE CCWs with READ CCW appended)
   GTF  I/O CE, DE, UE (UE because some but not all READ
        CCWs were used)
   RNIO OUT
   RNIO IN
   RNIO IN

The number of PIUs transferred out and in can vary. If the
maximum number of PIUs is sent in (see the MAXBFRU
operand of the HOST macro in NCP generation), then the I/O
interruption has a status of just CE, DE.
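The ending-status rule above can be sketched as follows (an illustration, not VTAM code; `maxbfru` stands for the value coded on the MAXBFRU operand):

```python
def read_ending_status(pius_received, maxbfru):
    """Sketch of the channel-end status for the read side of an I/O
    operation: CE,DE alone when every READ CCW was used (the maximum
    number of PIUs came in), CE,DE,UE when some READ CCWs went unused."""
    if pius_received == maxbfru:
        return ["CE", "DE"]
    return ["CE", "DE", "UE"]
```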
2. More than the maximum number of PIUs were in the NCP ready
   to be sent to VTAM (assumes the maximum is 3):

   GTF  I/O ATTN
   GTF  SIO (READ CCW string)
   GTF  I/O CE, DE, ATTN
   RNIO IN
   RNIO IN
   RNIO IN
   GTF  SIO (READ CCW string)
   GTF  I/O CE, DE, UE
   RNIO IN

   The first read operation ends normally with an attention,
   indicating more data in the NCP. This avoids an extra interrupt
   just to present an attention.
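The multi-read case can be simulated with a small sketch (illustrative only; the status strings are assumptions modeled on the sequence above):

```python
def drain_ncp(pius_in_ncp, maxbfru):
    """Simulate draining queued PIUs from the NCP: each read moves up to
    maxbfru PIUs; ATTN is appended to CE,DE when PIUs remain (so no
    separate attention interrupt is needed), and UE is appended on the
    final read if some READ CCWs went unused."""
    interruptions = []
    while pius_in_ncp > 0:
        moved = min(maxbfru, pius_in_ncp)
        pius_in_ncp -= moved
        status = ["CE", "DE"]
        if pius_in_ncp > 0:
            status.append("ATTN")  # more data still queued in the NCP
        elif moved < maxbfru:
            status.append("UE")    # not all READ CCWs were used
        interruptions.append(status)
    return interruptions
```

With four PIUs queued and a maximum of three, this yields CE,DE,ATTN followed by CE,DE,UE, matching the sequence shown in item 2.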

Section 4: Symptom Analysis Approach

4.3.13

EXTERNAL TRACE - DD TAPE

[Trace listing not reproduced.]

Figure 4-3. VTAM and GTF Traces (Part 1 of 13)

EXTERNAL TRACE - DD TAPE

[Trace listing not reproduced.]

Figure 4-3. VTAM and GTF Traces (Part 2 of 13)

EXTERNAL TRACE - DD TAPE

[Trace listing not reproduced.]

Figure 4-3. VTAM and GTF Traces (Part 3 of 13)

EXTERNAL TRACE - DD TAPE

[Trace listing not reproduced.]

Figure 4-3. VTAM and GTF Traces (Part 4 of 13)

EXTERNAL TRACE - DD TAPE

[Trace listing not reproduced.]

Figure 4-3. VTAM and GTF Traces (Part 6 of 13)

EXTERNAL TRACE - DD TAPE

[Trace listing not reproduced.]

Figure 4-3. VTAM and GTF Traces (Part 7 of 13)

EXTERNAL TRACE - DD TAPE

[Trace listing not reproduced.]

Figure 4-3. VTAM and GTF Traces (Part 8 of 13)

EXTERNAL TRACE - DD TAPE

[Trace listing not reproduced.]

Figure 4-3. VTAM and GTF Traces (Part 9 of 13)

EXTERNAL TRACE - DD TAPE

[Trace listing not reproduced.]

Figure 4-3. VTAM and GTF Traces (Part 10 of 13)

[External trace listing (DD tape) not reproduced; the original figure is not legible in this copy.]

Figure 4-3. VTAM and GTF Traces (Part 11 of 13)
[External trace listing (DD tape) not reproduced; the original figure is not legible in this copy.]

Figure 4-3. VTAM and GTF Traces (Part 12 of 13)

[External trace listing (DD tape) not reproduced; the original figure is not legible in this copy.]

Figure 4-3. VTAM and GTF Traces (Part 13 of 13)


TP Problems (continued)

Notes on Examples 1 and 2
1. Mappings of the data in the various trace entries are not included. Blocks such as
FDB and FSB are described in VTAM manuals, OS/VS2 VTAM Logic and
OS/VS2 VTAM Data Areas.
PIU formats can be found in the 3704/3705 Program Reference Handbook.
Those PIUs that accomplish a function other than transfer of data between an
application and a terminal have a network command in the RU (as shown in
entry 3 of the examples). Most are detailed in the "Network Commands"
section of the 3704/3705 Program Reference Handbook. Network commands
that the NCP must process are shown in more detail in the "Network
Commands" appendix in IBM 3704 and 3705 Communications Controller
Network Control Program/VS Logic.
For a full understanding of the line trace entries, refer to 3704/3705
Communications Controller Principles of Operation for ICW field definitions.
SDLC commands and N(R) - N(S) processing can be seen in the trace examples
(at entries 83-93). The IBM 3704 and 3705 Program Reference Handbook
section "SDLC Commands and Responses", and IBM Synchronous Data Link
Control General Information may be useful.
2. Data flow between the application and the LU is on an exception-response-only
basis. Other PIUs request positive FME acknowledgement, and the FMEs can be
seen in the trace examples (at entry 8).
3. No pacing is used; it was not included in the NCP definition.
4. Note that the line trace shows the outbound PIU (see entry 82) changed from
the FID1 TH (transmission header) format that was transferred to the NCP,
into a FID2 format TH that is sent to a cluster controller (see entry 83). Also
note that the NCP segmented the PIU during transmission to the controller.
The length of the PIU was greater than the MAXDATA operand for the
controller in the NCP generation (66 in this NCP), so it was broken into two
segments. The second segment is sent (starting in entry 87).
5. At the beginning of each PIU transmission, there are three or four flags set
(X'7E') because a temporary "superzap" was made in the NCP at the time the
trace was made. Normally only one flag would be sent.

Section 4: Symptom Analysis Approach

4.3.27


Summary
When symptoms are intermittent or confusing, the debugger should be aware that
there is some variable present that he has not recognized. In such situations,
assumptions are dangerous, yet it is very easy and common to focus immediately
on the wrong process as the location of a problem and assume that other steps
must be correct. Using the traces in the TP components should help the debugger
to make fewer assumptions.

A less obvious benefit of these traces is their educational value. NCP line traces,
for example, illustrate from actual examples the workings of TP line protocols and
their use with different devices. These protocols are important when "what if"
situations need to be projected and when line errors or terminal errors can be
translated into some of the none-too-obvious external symptoms that sometimes
result. Then, symptoms may be seen later in terms of possible component errors
and traces or traps can be used to confirm suspicions.
As discussed earlier, an operator command turns on (or off) a trace for one
node. Sometimes, many or all terminals of a certain class or on a set of lines need
to be traced. Depending on when an error is occurring, or on the connection
design of a network, tracing sometimes should start as soon as the NCP is activated.
A trace can be started when the NCP is activated (see Operator's Library: VTAM
Network Operating Procedures or Operator's Library: OS/VS TCAM Level 10),
but if several must be started, the following technique has proven useful.
In the NCP definition, code INITEST=YES on the PCCU macro, but do not put
an INITEST DD card in the VTAM procedure. Upon activation of the NCP, VTAM
asks the operator if he wants to bypass initial test. At this point the network is
defined inside VTAM so traces can be started by operator command, but the NCP
has not yet been loaded. Start as many traces as desired and reply to bypass initial
test. After your reply, the NCP will activate. This technique was used to trace the
initialization sequences shown in the first trace example.

4.3.28

OS/VS2 System Programming Library: MVS Diagnostic Techniques


VTAM Buffer Trace Modification
Many operator commands are required if all nodes are to be traced. However,
VTAM modules can be superzapped to trace unconditionally. The examples that
follow are intended only to illustrate a technique for gathering information. They
may be applied differently to suit individual situations.
Various techniques can be used for a VTAM buffer trace. Buffer trace for a
node is indicated by bits in the RDTE (resource definition table entry) and in
the FMCB (function management control block) that describe a session. Module
ISTOCCFB creates the FMCB and, by temporarily modifying it to create all
FMCBs with trace bits on, causes buffer trace to always be done.
NAME ISTOCCRT ISTOCCFB
ver  08BA 9110A015        test RDTE trace bit
ver  08BE 47E0B8AE
ver  08C2 9604E020        turn on in FMCB
rep  08BE 4700            cause fall-thru

Alternately, each traced path can be "superzapped" in the logic that checks a
trace bit. For buffer trace, four "zaps" are required: for C/L and TPIOS buffer
trace, and each IN and OUT. By zapping at these locations, selectivity is possible,
and alternate control can be introduced by changing the logic from checking the
trace bit in FMCB to checking some other global indicator, such as a flag in the
PSA. This technique is illustrated in the following discussion of I/O trace.
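The effect of these zaps is simply to force the conditional branch that follows the trace-bit test to fall through, so the instruction that sets the trace bits always executes. A minimal Python sketch (not from the manual) of the System/370 BC (branch on condition) mask semantics involved, where mask bits 8, 4, 2, 1 select condition codes 0 through 3:

```python
def bc_branch_taken(mask: int, cc: int) -> bool:
    """S/370 Branch on Condition: mask bits 8,4,2,1 select CC 0,1,2,3."""
    return bool(mask & (8 >> cc))

# The original instruction 47E0... is BC with mask X'E'. After TM (91...)
# tests the trace bit, CC=0 means the bit was off, and the branch skips
# the OI that would set the trace flags.
assert bc_branch_taken(0xE, 0) is True    # trace bit off: branch around OI
# The zap replaces 47E0 with 4700, i.e. mask 0, which never branches,
# so control always falls through to the OI regardless of the test.
assert bc_branch_taken(0x0, 0) is False
```

This is why a single two-byte "rep" is enough to make the trace unconditional.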

VTAM I/O Trace (RNIO) Modification
The same alternatives apply to the I/O trace that applied to the VTAM buffer trace.
To create every NCB (node control block) with the RNIO trace indicator on (so
that everything is traced), the ZAP is:
NAME ISTOCCRT ISTOCCFB
ver  06BA 91087015        check flag in RDTE
ver  06BE 47E0B6AE
ver  06C2 9201A01D        turn on flag in NCB
rep  06BE 4700            cause fall-thru

Using the second approach of altering the checking at the time of the tracing,
two zaps are required. The following zap will cause tracing of all inbound PIUs if a
byte in the PSA (location X'xxx') is set non-zero:
NAME ISTZCEAB ISTZFMIB
ver  68 9500301D          check NCB flag
rep  68 95000xxx          check low-core flag instead
ver  7A 9500401D          check NCB flag
rep  7A 95000xxx          check low-core

Two paths exist within ISTZFMIB, so two locations must be changed.
For outbound PIUs, the logic in VTAM is slightly more complex. An indication of
whether or not to perform RNIO trace is transferred from the NCB to the PIU's
buffer area before the buffer is queued for output. However, the trace is not
performed until the I/O is complete, so although the indicator is used at that time,
VTAM also checks to see if the GTF RNIO trace is still active. Therefore, in
addition to the following "zap," the GTF RNIO trace must be active before VTAM
attempts to make the trace entry:
NAME ISTZCEAB ISTZFFDB
ver  D2 91406011 47E0yyyy     check trace flag
rep  D2 95000xxx 4780yyyy     check low-core instead

Other Tracing Methods
If there is a single point to be made in this section, it is to minimize assumptions
about what or how data is travelling through a network. There are other sources
of information besides standard traces that can provide snapshots of messages at
various points en route. An application program may log every message received
and sent. It may be important to know exactly at what point in the flow the
logging occurs, but in general the log can be used as another trace point when
access method traces have been followed as far toward the application as they go.

Access method or component buffers may sometimes be used to see if a
message got as far as a buffer, or to see what form a message had when it was put
into a buffer by a component. Dumps of buffer areas and dumps of TCAM queues,
from disk or main storage, would be used in place of traces. The limitation here
is buffer or queue reuse, which often creates confusion when half of the message
to be examined is found but critical information has been lost because
of reuse. Nevertheless, these sources can be valuable. In the NCP, for example,
because only one line can be traced, buffers usually provide the last snapshot of a
message as it appeared before going to a terminal. Status indicators in buffer
headers often can be used to tell how a message was processed; if the buffer is still
in use, then backtracking to find a work element or process that refers to the buffer
can provide the key to understanding why a message is stuck. The following
example shows such a case.


Assume a heavy-running 3600 network in which a few logical units do not
receive a response message after input is entered. The problem is intermittent and
strikes any LU any time over a period of several hours. The GTF VTAM traces
are run for all LUs during a typical run and when one LU fails to receive a
response message, traces are stopped and all network components dumped.

It is not possible (without writing a user exit) to print the GTF trace for selected
terminals, so the entire trace is printed for a short period surrounding the time of
failure. Activity on the problem LU shows the last input message came into VTAM
and the application and a response message was sent all the way out to the 3705.
Line trace is not used because only one line can be traced and the problem line
is not predictable.

Other variations are tried with the LU; it can be closed (via CLSDST) and
reconnected and run, so a hardware problem is unlikely. No MDR records in
SYS1.LOGREC indicate an error on the line. At this point the problem is
isolated to the NCP or beyond in the network. Using other indicators (in this case
the NCP's logical unit block, LUB), an analysis of the message path from the time
the NCP receives it shows that the last outbound PIU did not go out over the TP
line. (NCP keeps the sequence number of the last PIU sent and the number in the
GTF trace is the next higher sequence number than the one in the LUB.) The NCP
buffers are searched and the missing PIU is found intact in a buffer. The problem is
isolated to the NCP; the buffer is still in use, and indicators in the buffer header
reveal how much processing was done in the NCP; this leads eventually to the bug
in the NCP.
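The sequence-number reasoning in this example reduces to a one-line check. A hypothetical Python sketch, with field names invented for illustration (the manual works from a dump of the NCP's LUB and the printed GTF trace, not from code):

```python
def piu_stuck_in_ncp(lub_last_sent: int, gtf_outbound_seq: int) -> bool:
    """The NCP's LUB holds the sequence number of the last PIU sent on
    the line. If the outbound PIU seen in the GTF trace carries the next
    higher sequence number, that PIU reached the NCP but never went out."""
    return gtf_outbound_seq == lub_last_sent + 1

# The trace shows PIU X'18' outbound, but the LUB says X'17' was the
# last one actually sent: the PIU is stuck somewhere in the NCP.
assert piu_stuck_in_ncp(0x17, 0x18)
assert not piu_stuck_in_ncp(0x18, 0x18)   # numbers match: PIU did go out
```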



Performance Degradation

This chapter describes how to investigate performance degradation problems.
It is not intended to serve as a tuning guide or as a reference for general performance analysis (which should be performed through SMF, GTF, etc.).
The following points should be considered when a problem is suspected in the
operating system itself or in the manner in which applications use the operating
system.

Operator Commands
When a bottleneck or system failure, hardware or software, is degrading
throughput, the following operator commands can help identify the source of
degradation and possibly eliminate it.
D A,L

Displays current system status. A job step with a name of STARTING
indicates initiation has not successfully completed. Also, if a job step
is marked with an 'S,' it is considered swapped out. Other jobs may
be queuing behind these jobs in an allocation/deallocation path.

D R,L

Displays any outstanding requests. Operator action is required (for
example, to mount a volume). Other jobs may need to wait until
action has been taken.

D M

Displays configuration information. The loss of a hardware component
(for example, a channel) may have been noted on a hard copy console
and missed by the operator.

If a resource queue "snooper" program exists, it should be started and its output examined to find any ENQ bottlenecks. If no such program is available, take a
dump of an address space, the nucleus, and request SQA. The PRDMP service aid
(with the QCBTRACE option) can then be used to print the dump so the resource
queue can be examined.
Use the job entry subsystem display commands to find the status of jobs,
queues, printer setups, requirements of SYSOUT data sets, etc. to find reasons
why JES2 is not able to schedule work. Some JES2 commands that may be useful
are shown in Figure 4-4.

Section 4: Symptom Analysis Approach

4.4.1


$D J1-9999      Status of jobs, started tasks, or time-sharing users.
   S1-9999      If a range of jobs has been held they may be
   T1-9999      released using $AJ.

$AJ             Release jobs.

$DF             Status of output forms queue.
$DU,PRTS        Status of printer setup characteristics.
$TPRTn          Change setup to needs of queue output.

$LJ1-9999,H     List held SYSOUT data sets.
$OJ1            Release held SYSOUT data sets.

$DO             Display queue.
$AO             Release queue.

$AA             Release all jobs held by a $HA command.

Figure 4-4. JES2 Commands for Status Information

If the use of previous commands does not make it obvious why JES2 is not
scheduling work, take a dump of the JES2 address space. Print the SYS1.DUMPxx
data set to help determine the problem.
Find the number of the IPS member that should be active and issue T IPS=nn to
ensure that it is active. Print the IPS member in SYS1.PARMLIB and analyze the
IPS for an explanation of degraded service. Then, enter the W command to
print the system log to obtain the history of system execution.
Figure 4-5 shows important hardware components used by the system that
should be understood when a degradation problem is suspected.

Dump Analysis Areas
The following areas in a storage dump may provide a starting point for further
analysis. Problems in these areas may indicate a bug or some unexpected use of
system services.
1. ENQ/DEQ - A check of ENQ/DEQ's processing queues may indicate contention problems. The in-use blocks are anchored in the CVT at CVT+X'280'
(CVTFQCB, first QCB element) and at CVT+X'284' (CVTLQCB, last QCB
element). A queue of many QELs off a particular major or minor QCB should
be explained. An indication of a possible problem is a mixture of shared
and exclusive requests intertwined for one resource. The state (running/
waiting/swapped-out/etc.) of the holder of the resource should be determined.



[Flowchart not reproduced; the original figure is not legible in this copy. It relates
symptoms such as dominant jobs, major contributors, DASD seek analysis, and
concurrent analysis to the primary tools: SMF, MF/1, GTF, and a hardware monitor.]
Figure 4-5. System Use of Hardware Components

Also check the free elements. The ENQ/DEQ global save area, mapped in
IEAVENQ1, contains six addresses, each of which points to the first element
of a free queue. The ENQ/DEQ global save area is found through
CVT+X'2AC' (CVTSPSA), which points to the global save area table; the
global save area table +X'20' points to the ENQ/DEQ global save area.
There are multiple queues, each containing blocks of all one size. These blocks
occupy SQA. Merging the free elements with the in-use elements should
provide an indication of ENQ SQA fragmentation. Because fixed storage is
involved, the fragmentation may be reducing the number of frames available
for paging.
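The "intertwined shared and exclusive requests" pattern called out above can be checked mechanically once the QEL chain has been read out of a dump. A minimal Python sketch using a hypothetical, simplified QEL representation (the real SQA control blocks are mapped in the ENQ/DEQ data areas, not shown here):

```python
def contention_suspect(qels):
    """Flag a resource whose QEL chain is long and mixes shared and
    exclusive requests - the intertwined pattern described in the text."""
    kinds = {q["excl"] for q in qels}
    return len(qels) > 1 and len(kinds) > 1

# Mock QEL chain off one minor QCB: exclusive and shared requests mixed.
chain = [{"asid": 0x15, "excl": True},
         {"asid": 0x22, "excl": False},
         {"asid": 0x09, "excl": True}]
assert contention_suspect(chain)
assert not contention_suspect([{"asid": 1, "excl": True}])   # lone holder
```

A long chain that is all-shared or all-exclusive may still be worth explaining; this only isolates the specific mixed pattern the text singles out.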
2. IOS storage manager queues should be inspected. The anchors for the various
pools (small, medium, and large block pools) are located at the end of
IECVSMGR at external symbol IECVSHDR, which should show up in a
NUCMAP. Generally there should be one 2K page of small blocks (used for
IOQEs) and one 4K page of medium blocks (used for RQEs). Examine the
large blocks in detail. If the system was quiesced, there should be two 4K
pages of large blocks and all blocks should be on the free queue. Many heavily-loaded systems require 8-10 pages of large blocks. If the actual number is
much higher than this, determine the ASID that each in-use block is assigned
to (the two bytes at block address-8 contain the ASID). System
address spaces can have many blocks, but any user address space with a large
number of blocks should be explained.
Common problems are: I/O loop, I/O errors, and storage not being freed at
I/O termination time. These page frames occupy real storage, which depletes
the pool of available real storage and possibly causes excessive paging.


3. Check page frame table entries (PFTEs) for large fix counts. The
CVT+X'164' contains the address of the PVT; PVT+X'C' contains the address
of the apparent PFTE origin - you must index several hundred bytes
(X'10' times the number of pages in the nucleus). Large fix counts may
indicate a page fix macro loop, or page fix without page free. Frames
allocated to a private area space may indicate a user error. Try to analyze the
contents of the page for a clue as to who is page fixing without page freeing.
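A scan for "large fix counts" over the PFTE array is a simple filter once the counts have been read from a dump. A hedged Python sketch; the threshold is an arbitrary illustration, not a value from the text:

```python
def suspicious_pftes(fix_counts, threshold=50):
    """Return the real-frame addresses whose PFTE fix count is
    inordinately large, which may indicate a FIX loop or a page
    fixed without a matching free (threshold is illustrative)."""
    return [frame for frame, n in fix_counts.items() if n >= threshold]

# Mock fix counts keyed by real frame address.
counts = {0x12F000: 1, 0x241000: 312, 0x3A7000: 0}
assert suspicious_pftes(counts) == [0x241000]
```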

4. Check PFTEs for bad frames caused by hardware storage errors that rendered
these frames unusable (in PFTE+X'C', the X'04' flag should be set if this is the
case). Contact hardware personnel to determine if a machine malfunction has
occurred.

5. CDE (contents directory entry). These blocks represent modules loaded
into virtual storage; CDEs reside in SQA and the queue is anchored at
CVTQLPAQ (CVT+X'BC'). The loaded module's name and starting address
reside in the CDE. Those with starting addresses less than the value in
CVTLPDIA (CVT+X'168') were members of either an IEAFIXnn list or an
IEALPAnn list. For members of these lists, CDEs are built by NIP and they occupy
real, fixed storage even when the module is not in use. If fixed storage or
fragmentation is a problem, moving these modules to LPA can provide a
partial solution.


6. The BLDL table, pointed to by the nucleus external symbols IEARESBL
and/or IEARESBS, should be checked. The address(es) should be less than the
value at CVTNUCB, the upper nucleus boundary. If not, try changing the
BLDL=nn system initialization parameter to BLDLF=nn. This will cause the
BLDL list to occupy real storage at all times. If the number of entries is less
than 93, one frame is used.

7. In a quiesced system, the number of paging requests received should equal the
number of paging requests completed by ASM. The fields ASMIORQR
(ASMVT+X'1D0') and ASMIORQC (ASMVT+X'1D4') in the ASMVT
represent the number of requests received and completed,
respectively. The difference between the two counts represents requests not
completed. A large number of uncompleted requests can indicate ASM is either
not processing at all or is taking considerable time for each operation.
Examine the PAT (page allocation table) to determine whether the page data
sets are almost full. Also examine ASMERRS (ASMVT+X'74'), PAREFLGS
(PART+X'8'), and the IOSB for paging requests (IOSCOD=X'41') to determine
if I/O errors have occurred and the data sets are no longer in use.

8. CSA use should be examined. If SQA is depleted, requests are filled from
CSA. This can be determined by inspecting the SQA DQE (descriptor queue
element):
• the CVT+X'230' points to the GDA (global data area)
• the GDA+X'18' points to the SQA SPQE (subpool queue element)
• the SPQE+X'4' points to the SQA DQE.
The DQEs are chained together. If more than one DQE exists for the SQA,
it has expanded into the CSA. This causes the frame to be fixed. Also,
CSA users often page fix. In this case fragmentation, if present, could cause
performance degradation.
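The "more than one DQE means SQA expanded into CSA" test is a chain walk and a count. A Python sketch using a hypothetical chain representation (the real DQE chaining is a control-block pointer, simplified here to a lookup table):

```python
def sqa_expanded_into_csa(first_dqe, dqe_next):
    """Walk the chained DQEs for the SQA subpool; more than one DQE
    means SQA has expanded into CSA, as described in the text."""
    count, dqe = 0, first_dqe
    while dqe is not None:
        count += 1
        dqe = dqe_next.get(dqe)   # follow the chain pointer
    return count > 1

assert not sqa_expanded_into_csa("dqe1", {})                 # single DQE
assert sqa_expanded_into_csa("dqe1", {"dqe1": "dqe2"})       # expanded
```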
9. Possible real frame shortage can be indicated by inordinately large counts in
the PVT fields: PVTRSQA (a count of the number of times the SQA reserved
frame was allocated), and PVTDFRS (a count of the number of times real
frame allocation was deferred because of a lack of frame availability). These
counts by themselves mean little, but can be of some use when analyzing an
overall problem.



Incorrect Output

The problem of missing, unexpected, or erroneous output is one of the most
difficult. This incorrect output might take the form of a message on the console
log or in SYSOUT, or an incorrect total in a report. There is usually very little
documentation that assists the debugger in analyzing incorrect output.

Initial Analysis Steps
To resolve the problem of missing or incorrect output the analyst must have a
complete understanding of the job environment. There is no fast, clear cut
approach to these errors. This section only tries to assist your thought processes
as you begin to work on a problem of this type.
There are four basic categories of incorrect output: missing, unexpected,
erroneous, or a combination of these. The steps in resolving the problem must
take the category into account.
Initially, consider the following steps:
1. Gather all possible documentation. You will probably need additional
information as you begin to understand the problem in more detail.
2. Consider all recent hardware and software changes to the system and to the
application(s) if relevant. A change to an application that updates a data base
affects all other data base users.
3. Remember that output requires input. Consider the possibility of bad input.
4. Consider whether the problem is associated with some new function or
application. Most incorrect output errors occur in the installation and test
phase.

Isolating the Component
Next, attempt to locate the component causing the error. Do this by thinking
through the flow. Listed below are some questions that might assist you.
• Is the problem related to a user function or application? If yes, have there
been recent changes or is testing still in progress?
• Is the job control language correct? Have there been recent changes to the
JCL?
• Have any user exits been added or modified?
• Have any user supervisor calls (SVCs) been added or modified?
• Are there operator interactions that could affect the input/output?
• From which access method or function is the output expected? Some
examples are: JES, VSAM, BTAM, TCAM, and WTO.
• Was RJE involved in the input and/or output?
• Was there any cross-address-space communication involved in the data
movement? In MVS, most telecommunication requires data passing between
address spaces.
• Is there any evidence of I/O error activity? Refer to the console log and
LOGREC data.
• Do you have a storage dump, or should you obtain one? See the chapter on
"Additional Data Gathering" in Section 2.
• Would a trace be helpful in understanding the flow? Consider tracing the
activity with GTF.

Many of the above questions have to be answered in order to get a better understanding of the problem area. In many cases, the problem has to be recreated
with various traces or traps. These questions help to determine what data is
needed to solve the problem.

Analyzing System Functions
To solve an incorrect output problem, you must understand the mode of
operation and the processes required to accomplish the function in question.
The first question must be the following: where does the output originate?
Then you must be able to verify that the activity did occur. There must be some
means for understanding the path the data should take from the origin to the final
location (device).
Consider the following example:
1. A TSO user invokes his program which should write a message to the terminal and then wait.
2. The program waits after the I/O but no message appears.
3. What are the system functions involved?
   a. A language translator and the linkage editor that created the load module.
   b. OPEN code necessary to complete the link between the device and the user PUT macro.
   c. TSO TIOC flow. The user issues PUT which branches to the TIOC module IGG019T4. This module issues TPUT. What is the TPUT path through TIOC?
   d. TSO TIOC interfaces with TCAM. What is the data path through TCAM?
   e. TCAM interfaces with the I/O supervisor. Can evidence be found of the SIO? What types of trace would be helpful?
In this example it may be necessary to take a series of dumps to resolve where the message was lost. But first be certain that the correct message is in the correct buffer at the time of the user PUT macro.

OS/VS2 System Programming Library: MVS Diagnostic Techniques

Incorrect Output (continued)
It could be necessary to apply this type of thinking all the way down to the CSECT level.

Summary
In analyzing incorrect output, there are two key points. The first is that a better understanding of the system flow is probably required for this type of problem than for any other. The second point is that it is very important to be able to obtain the correct documentation at the correct time.
Note: The chapter on TP (teleprocessing) problem analysis earlier in this section
provides some specific steps for analyzing incorrect output in the TP environment.
Many of the techniques in that chapter can be applied to incorrect output
analysis.

Section 4: Symptom Analysis Approach


Section 5: Component Analysis

This section describes the operating characteristics and recovery procedures of 15
system components and provides debugging techniques for determining the cause
of an error that has been isolated to a component.
The components described in this section are contained in the following
chapters:
• Dispatcher
• IOS
• Program Manager
• VSM
• RSM
• ASM
• SRM
• VTAM
• VSAM
• Catalog Management
• Allocation/Unallocation
• JES2
• SSI
• RTM
• Communications Task


Dispatcher

For effective problem analysis, it is important to understand how work is processed
by the MVS system. The MVS dispatcher plays a large role in processing work by
controlling the initiating of all work within the system. An understanding of the
dispatcher's processing and control block structure is imperative for the debugger.
This chapter describes the following items about the MVS dispatcher:
• Important dispatcher entry points
• Dispatchable units and sequence of dispatching
• Dispatchability tests
• Dispatcher recovery considerations
• Dispatcher error conditions

Important Dispatcher Entry Points
The dispatcher's main entry points are the following:

IEA0DS - Entered disabled, key 0, supervisor state, no locks held.

This entry point is called by the following:
• Exit prologue (IEAVEEXP), when control is not returned to the issuer of an SVC.
• Lock manager (IEAVELK), when it is suspending a task that unconditionally requested a local lock that was unavailable.
• Program check FLIH (IEAVEPC), when a TCB or SRB was suspended because of a page fault that required I/O or because no frames were available.
• External FLIH.
• RTM (recovery termination manager).
IEAPDS7 - Entered disabled, key 0, supervisor state, no locks held.

This entry point is called by I/O FLIH and by SVC FLIH when the SVC requires a local lock that is not available.
IEAPDS6 - Entered disabled, key 0, supervisor state, no locks held.

This entry point is called by RTM on an EOT (end of task) condition.
IEAPDS2 - Entered disabled, key 0, supervisor state, the dispatcher lock held.

This entry point is called by the lock manager (IEAVELK), when suspending an SRB that has requested the local or CMS lock, and when suspending an address space that has requested the CMS lock.

IEAPDSRT - Entered enabled or disabled, any key, supervisor state, no locks held.

This entry point is the termination return address for all SRBs.

DSJSTCSR - Job step timing subroutine. Calculates and accumulates job step timing.

This entry point is called by the following:
• Lock manager (IEAVELK), when the common suspend routine of the lock manager is suspending an SRB or locally locked TCB because of a lock request or a page fault suspension.
• Dispatcher. The dispatcher calls this subroutine internally when it is saving the status of a previously executed unit of work.
• Timer SLIH. The timer SLIH calls this subroutine before it gives control to SRM.

Dispatchable Units and Sequence of Dispatching
This section describes the unique dispatchable units of work and the queues where they are located. The dispatchable units are described below and are listed according to the priority with which they are dispatched.

1. Special Exit

A special exit is made known to the dispatcher by a unique flag setting in the LCCADSF1 (LCCA + X'21C') field. The LCCADSF1 bits and the exits they indicate are:
Bit         Exit
LCCAACR     ACR
LCCAVCPU    Vary CPU
LCCATIMR    Timer Recovery

The dispatcher enters these exits via a branch.
2. Global SRBs

IEAGSMQ is the header for the global SRB staging queue. If it is not zero, it points to the global SRB queue. (See Figure 5-1.) Requestors use the SCHEDULE macro to compare and swap global SRBs onto the queue. The dispatcher obtains the DISP lock and removes the SRBs from the queue with the compare and swap (CS) instruction. The dispatcher then calls CSECT IEAVESC0 at entry IEAVESC1 in order to move the SRBs to the appropriate priority level (0 or 4) on the GSPL (global system priority list) queue.

IEAGSPL is the global SRB dispatching queue. The queue is divided into non-quiescable (priority level 4) and system level (priority level 0) SRBs. The dispatcher removes the SRB from the GSPL queue, updates the PSAAOLD with the SRBASCB address, loads its STOR value, and dispatches the SRB. PSAANEW is not updated.
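The compare-and-swap protocol just described can be modeled in modern terms. The following Python sketch is illustrative only (the actual logic is S/370 assembler built around the CS instruction); the class and function names are invented for the sketch.

```python
# Illustrative Python model (not IBM code): SCHEDULE pushes an SRB onto
# the staging queue header with compare-and-swap; the dispatcher later
# detaches the whole chain with one CS and files each SRB at its global
# system priority level (0 or 4).

class SRB:
    def __init__(self, name, priority):
        assert priority in (0, 4)          # 0 = system, 4 = non-quiescable
        self.name = name
        self.priority = priority
        self.next = None

class Header:
    """Stands in for IEAGSMQ; None plays the role of a zero header."""
    def __init__(self):
        self.top = None

def compare_and_swap(holder, field, old, new):
    """Single-word CS: store new only if the word still holds old."""
    if getattr(holder, field) is old:
        setattr(holder, field, new)
        return True
    return False

def schedule(hdr, srb):
    """SCHEDULE macro logic: retry the push until the CS succeeds."""
    while True:
        old = hdr.top
        srb.next = old
        if compare_and_swap(hdr, "top", old, srb):
            return

def dispatcher_drain(hdr, gspl):
    """Dispatcher logic: detach the entire staging queue with one CS,
    then move each SRB to its priority level on the GSPL."""
    while True:
        old = hdr.top
        if compare_and_swap(hdr, "top", old, None):
            break
    while old is not None:
        nxt = old.next
        gspl[old.priority].append(old)
        old = nxt
```

The retry loops matter on real hardware because another processor can win the race between the load and the CS; in this single-threaded sketch each CS succeeds on the first try.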

Figure 5-1. Global SRB Queue Structure and Control Block Relationships
(The figure shows the global SRB staging queue, headed by IEAGSMQ (CVTGSMQ, CVT + X'264'), and the global SRB dispatching queue, headed by IEAGSPL (CVTGSPL, CVT + X'26C'), together with PSAANEW, PSAAOLD, the ASCB, and ASCBSTOR; the 0 and 4 in the SRBs represent the system priority level.)

3. Local SRBs

IEALSMQ is the header for the local SRB staging queue. If it is not zero, it points to the local SRB queue. (See Figure 5-2.) Requestors use the SCHEDULE macro to compare and swap local SRBs onto the queue. The dispatcher tests this queue if it cannot find any special exits or global SRBs to dispatch. If this queue is not empty, the dispatcher obtains the DISP lock and removes the entire queue with compare and swap instructions. The dispatcher then calls CSECT IEAVESC0 at entry IEAVESC2 in order to move the SRBs to the appropriate priority level (0 or 4) on the LSPL. IEAVESC2 also notifies SRM via the SYSEVENT macro if the address space is swapped out. Memory switch is then invoked to direct the dispatcher to the highest priority work. (Note that no work is dispatched. The SRBs are simply moved to the appropriate dispatching queues (ASCBSPLs).)

Figure 5-2. Local SRB Queue Structure and Control Block Relationships
(The figure shows the local SRB staging queue, headed by IEALSMQ (CVTLSMQ, CVT + X'268') with each SRB pointing to its SRBASCB, after a user request to schedule local SRBs; and, after the dispatcher has determined there are SRBs to be processed, the SRBs moved to the appropriate ASCB level at ASCBSPL (ASCB + X'1C').)

4. Address Space Dispatcher

This is not actually a unique dispatchable unit of work, but rather an anchor for the real dispatchable units of work (that is, local SRBs or TCBs). The address space dispatcher is entered to select the next address space (memory) in which work will be dispatched. If an address space is dispatchable, the priority of dispatching within the address space is the following:
a) Local SRBs
b) Local Supervisor (locally locked, interrupted work)
c) TCBs

If the dispatcher finds any SRBs on the LSPL (pointed to by the ASCBSPL), the top SRB is dequeued and dispatched. If there are no SRBs on the local SPL queue, the local lock is tested for the interrupt id, X'FFFFFFFF'. If the interrupt id is in the local lock, the id is changed to the current CPU ID via compare and swap, and the status (FRRs, GPRs, FRR stack, CPU timer value, PSATOLD, PSATNEW, and resume PSW) is restored from the IHSA (Interrupt Handler Save Area). The ASCBASXB points to the ASXB; ASXBIHSA (ASXB + X'20') in turn points to the IHSA. Status is saved in the IHSA when a locally-locked program is interrupted and control is switched away from it because there is higher priority work to handle.

The dispatcher does a compare and swap to obtain the local lock:
• if the local lock is available and the number of ready TCBs exceeds the number of processors active in the address space,
• or if the ASCBS3S bit (ASCB + X'67') indicates that there is work for the Stage 3 Exit Effector to process.

If the dispatcher is successful in obtaining the lock, it will go to the Stage 3 exit effector, if necessary, and then select the first dispatchable TCB that is not active on another processor.

The dispatcher may dispatch the above units (SRBs, supervisor, TCBs) without going through the memory dispatcher if the address space was current when the dispatcher was entered and if there was no indication that a memory switch was required (PSAANEW = PSAAOLD).
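The local lock inspection described above amounts to a small state check on one word. This Python sketch is illustrative, not IBM code; the hexadecimal lock-word values come from this chapter, but the function name and action strings are invented.

```python
# Illustrative Python model (not IBM code): the dispatcher's inspection
# of the local lock word (ASCBLOCK) and the action each state implies.

INTERRUPT_ID = 0xFFFFFFFF
SUSPEND_ID = 0x7FFFFFFF
FREE = 0x00000000

def examine_local_lock(ascblock, my_cpuid, ready_work=True):
    """Return (new lock-word value, action) for one lock-word state."""
    if ascblock == INTERRUPT_ID:
        # CS the CPU id into the lock, then restore status from the IHSA.
        return my_cpuid, "restore-status-from-IHSA"
    if ascblock == SUSPEND_ID:
        # Only a re-scheduled, previously suspended SRB may proceed here.
        return ascblock, "bypass-unless-rescheduled-SRB"
    if ascblock == my_cpuid:
        return ascblock, "restore-GPRs-and-PSW-from-IHSA"
    if ascblock == FREE and ready_work:
        # CS the CPU id into ASCBLOCK and dispatch in this address space.
        return my_cpuid, "dispatch-ready-work"
    return ascblock, "bypass"          # held by another processor
```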
5. Wait Task

The wait task is dispatched when the dispatcher reaches the bottom of the ASCB ready queue and can find no ready work after a recursive search of the SRB queues and the ready queue.

Figure 5-3 provides an overview of the processing sequence through the MVS dispatcher.

Figure 5-3. Dispatcher Processing Overview
(The figure is a flowchart of dispatcher processing: indicating SRB mode, updating PSAANEW and PSAAOLD, restoring status and registers or the PSW from the IHSA via LPSW, getting the highest ready TCB, and, after a recursive search of the SRB queue and the ready queue verifies that no ready work exists, dispatching the wait task.)

Dispatchability Tests
The dispatcher conducts the following dispatchability checks:

SRB Tests

Test*                         Condition

1. ASCBRCTF//ASCBOUT          Address space swapped out.
2. CSDSCFL1//CSDSYSND         System non-dispatchable.
   (a) ASCBFLG2//ASCBXMPT     (a) If the system is non-dispatchable, the SRB must have been scheduled to an exempt address space.
3. ASCBDSP1//(any bit on)     Address space non-dispatchable.
4. ASCBFLG2//ASCBSNQS         All SRBs stopped.
5. ASCBSSRB                   System level SRBs stopped. (This does not apply to NONQ SRBs.)
6. SRBCPAFF                   Does SRB have affinity to this processor? (PCCACAFM defines the current processor.)
7. SSRBFLG1//SSRBLLH          If set, compare and swap (via CS instruction) CPUID into the local lock word, ASCBLOCK, which has the suspend ID in it (X'7FFFFFFF').

*Format of test description is "field//bit within field."
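Each entry in the table is a "field//bit" check against a control-block flag byte. A minimal Python sketch of two of these checks follows; it is illustrative only, and the bit masks are invented (the real equates are defined in the MVS mapping macros, not here).

```python
# Illustrative Python model (not IBM code): two "field//bit"
# dispatchability checks from the table above, with invented masks.

ASCBOUT = 0x80     # invented mask: address space swapped out
ASCBSNQS = 0x40    # invented mask: all SRBs stopped

def srb_is_dispatchable(ascbrctf, ascbflg2):
    """Apply tests 1 and 4: swap status, then SRB-stop status."""
    if ascbrctf & ASCBOUT:
        return False, "address space swapped out"
    if ascbflg2 & ASCBSNQS:
        return False, "all SRBs stopped"
    return True, "dispatchable"
```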


Address Space and Task Tests
The following address space test criteria must be met before the task dispatcher gets
control.

Address Space Tests           Condition

1. ASCBDSP1//ASCBNOQ          The ASCB is not on the ready queue. The address space will not be dispatched.
2. ASCBDSP1//ASCBFAIL         The ASCB is in failure mode and in the process of being terminated. The address space will not be dispatched.
3. CSDSCFL1//CSDSYSND         System non-dispatchable.
   (a) ASCBFLG2//ASCBXMPT     (a) If the system is non-dispatchable, the SRB must have been scheduled to an exempt address space.
4. LOCAL LOCK//ASCBLOCK
   Suspend ID (X'7FFFFFFF')   Cannot process the address space unless an SRB owned the local lock and was suspended and is now re-scheduled to be dispatched.
   Interrupt ID (X'FFFFFFFF') Compare and swap CPUID into the local lock and restore the status (FPRs, GPRs, FRR stack, CPU timer value, PSATOLD, PSATNEW, and resume PSW) from the IHSA.
   Own CPUID                  Restore GPRs (general purpose registers) and PSW from the IHSA.
   Free (X'00000000')         If ready work is in the address space, compare and swap (via CS instruction) the CPUID into ASCBLOCK.
   Other CPUID                Bypass the address space.
5. ASCBFLG1//ASCBS3S          Go to the task dispatcher and interface with the Stage 3 exit effector.
6. ASCBTCBS > ASCBCPUS        There are more TCBs ready than there are processors currently executing in the address space; the address space can be dispatched.

After these six tests indicate that the dispatcher should dispatch in an address space, the following task indicators are tested.

Task Tests                    Condition

1. RBWCF                      RB must not be waiting.
2. TCBFLGS4                   TCB primary non-dispatchability flags must not be set.
3. TCBFBYT1//TCBACTN          If the TCB is active, it must be a redispatch situation; otherwise, this TCB is active on the other processor (TCBCCPUI).
4. TCBAFFN                    TCB affinity, if any, must match this processor's physical address (which is located in PCCACAFM).

Miscellaneous Notes about the Dispatcher
1. You can determine the last dispatch by examining the PSW at location X'300'. The TOD of the last dispatch is located at LCCADTOD (LCCA + X'258').
2. The dispatcher sets the following mode indicators before dispatching work.
   a. For a global SRB - LCCADSF2//LCCASRBM, LCCAGSRB, and LCCADSRW; PSATNEW/PSATOLD = 0's
   b. For a local SRB - LCCADSF2//LCCASRBM and LCCADSRW; PSATNEW/PSATOLD = 0's
   c. For a task - LCCADSF2//LCCADSRW; PSATNEW/PSATOLD ≠ 0's (they contain the TCB address)


Dispatcher Recovery Considerations
Dispatcher recovery is designed to record information about the error, reconstruct critical dispatching queues, and retry in order to continue normal dispatching functions. The data that the dispatcher records in the system diagnostic work area (SDWA) is the following:

Fixed Data:
SDWAMODN - IEAVEDS0, dispatcher module name
SDWACSCT - IEAVEDS0, dispatcher CSECT name
SDWAREXN - IEAVEDSR, dispatcher recovery routine

Variable Data:
SDWAURAL - Seven full words of data as follows:
PSAHLHI            - Locks held at time of error.
ASCBLOCK           - Value of the local lock word for the current address space at the time of error.
LCCASPLJ           - SRB queue journal word. Contains the address of the top SRB on the staging queue when dequeued by the dispatcher and passed to IEAVESC0.
PSAAOLD            - Current ASCB address at the time of error.
Control Register 1 - Value of STOR (CR1) at the time of error.
PSATOLD            - Current TCB address at the time of error.
LCCADSF1           - Dispatcher flag bytes that were on at the time of error.

If the dispatcher lock was held at the time of error, the following recovery routines are called by the dispatcher recovery routine:
• IEAVESCR - Schedule recovery routine; it recovers SRB queues.
• IEAVEQV3 - Verifies, and possibly reconstructs, the ASCB ready queue.
• IEAVEGAS - Verifies each ASCB on the ready queue.

If the local lock was held by the dispatcher, the error was not a DAT (dynamic address translation) error, and the current ASCBSTOR value equaled the CR1 value, then the following recovery routines are invoked by the dispatcher:
• IEAVEEER - Exit effector recovery routine (if ASCBS3S is on).
• IEAVEQV3 - Verifies, and possibly reconstructs, the TCB queue.
• IEAVETCB - Verifies each TCB on the TCB queue.

Note: The queue verification routine, IEAVEQV3, also records error information in the SDWAURAL about any changes to the queue structure.

By removing elements that have been overlaid (or "clobbered") from the queue, the dispatcher recovery routine attempts to keep the system up at the cost of a particular user, job, address space, etc. There is a certain exposure in this philosophy because the element that has been lost might have owned a critical system resource or might be a critical function in itself (for example, a TCB that represents the user's main application program). Once the element is lost, there might be no indication that it was a critical resource (a valid control block, for example) or that it owned a critical resource.

Dispatcher Error Conditions
• The abend C0D is issued from CSECT IEAVESC0 when a local SRB is scheduled to an invalid ASCB.
• Program check interrupts (usually of the page, addressing, or segment exception variety) occur when:
  - PSAANEW is overlaid and the dispatcher attempts to switch address spaces to the value in PSAANEW
  - PSALCCAV or PSAPCCAV values are overlaid
  - The CVT pointer is overlaid
  - The ASCB ready queue is overlaid
  - The TCB queue or the TCBRBP field is overlaid


IOS

The purpose of the I/O supervisor (IOS) is to provide a central facility to control and conduct I/O activity through the operating system. The structure of IOS in MVS is somewhat different from that of previous operating systems. In MVS, IOS "front end processing" is responsible for device control and I/O initiation; IOS "back end processing" is responsible for processing interrupts, providing sense information in error situations, and scheduling the posting of the I/O requestor at completion time. (Figure 5-4 provides an overview of IOS front-end and back-end processing. Figure 5-5 shows the major IOS and EXCP control block relationships.)

Front-End Processing
The major portion of the I/O process (the queueing of I/O requests and starting them) is contained in CSECT IECIOSCN (microfiche name IECIOSAM), which is called the channel scheduler. The channel scheduler is invoked through an interface provided by the STARTIO macro via a branch entry. The channel scheduler assumes that all channel program translation and page fixing of buffers and CCWs is performed by the caller. The control block interface is the SRB/IOSB combination, which must be non-pageable and commonly addressable from any address space (that is, SQA and fixed CSA). The channel scheduler operates in physically disabled mode. Invokers (called "drivers") of the channel scheduler include EXCP, VSAM block processor, VTAM TPIOS, and PCI fetch; they are identified by the driver ids located in the IOSB+4.

Back-End Processing
When IOS is invoked for an I/O interrupt, processing starts in the I/O first level interrupt handler (FLIH), which branches to an entry point, IECINT, within the channel scheduler. Back-end IOS executes physically disabled in the address space that is active on the processor at the time of the I/O interrupt. IOS then schedules the SRB/IOSB to the address space of the requestor. The module IECVPST (post status) receives control under the SRB and interfaces with the driver's special exits and termination routines (channel end, abnormal end appendages). Figure 5-4 shows an overview of the IOS process using EXCP as the I/O driver.

IOS Problem Analysis
Problems in the I/O process can cause three symptoms:
1. Abend codes
2. Loops
3. Wait states
These symptoms are discussed in the following sections.

Figure 5-4. IOS Processing Overview
(The figure traces front-end processing - the user's SVC 0, the EXCP driver, IECVSMGR getting storage for the SRB/IOSB, TCCW, BEB, FIXLIST, and RQE, IECVTCCW performing CCW translation and fixing, IECVSMGR getting storage for the IOQE (I/O queue element), and IECIOSCN - and back-end processing - IECINT (IOS) scheduling the EXCP driver via the SRB/IOSB, the POST appendage interface, IECVTCCW retranslation and unfixing, and IECVSMGR freeing the blocks and the IOQE before return to the user.)

Figure 5-5. Major IOS and EXCP Control Block Relationships
(The figure shows the major IOS control block relationships - the CVT locating the LCH, whose first and last words point to the first and last IOQE, the IOQE pointing to the SRB/IOSB, and the IOSB pointing to the UCB (with its prefix and common UCB extension), the EWA, and the TCCW - and the major EXCP control block relationships - the DCB, DEB, and IOB, the ASCB and TCB, the RQE, and the virtual and real channel programs (CCW1, CCW2, ... CCWn).)

IOS ABEND Codes
IOS abends are generally caused by an invalid control block. The error can be caught by validity checking or it can cause a program check. The recovery routines, generally FRRs, receive control on a program check. For either a validity check or a program check, the error is converted to an abend code. EXCP FRR processing saves the abend code and the relevant status (that is, error PSW and error registers) at the time of error in the EXCP problem determination area, which is pointed to by the TCB (X'C0'). IOS abend codes are documented in OS/VS Message Library: VS2 System Codes. The EXCP problem determination area is documented in OS/VS2 I/O Supervisor Logic.

Note: During abend processing, the EXCP problem determination areas are not freed. When you find the area pointed to by the TCB, scan that area for previously-obtained areas to help with IOS analysis.

Loops
If an invalid control block is passed to IOS and it is not caught by the validity check routines, a loop is often the result. The traditional problem has been caused by the storage manager (IECVSMGR) passing a bad address back to a requestor. Consequently, the requestor initializes the bad block and overlays or clobbers some valuable piece of storage. On occasion the bad address passed by IECVSMGR is 0. The fact that most of the I/O process runs in supervisor state, key 0 means that the PSA can be overlaid. This usually causes a program check loop whenever any type of interrupt is subsequently received by the processor.

At this point, pattern recognition is important to determine whether the storage manager has been involved in the problem. (Pattern recognition is discussed in the "Miscellaneous Debugging Hints" chapter in Section 2.) Try to determine whether 0 has been used as the address of an SRB/IOSB or EWA control block. The first X'A0' bytes of the PSA may be affected. The routine responsible for this could be an IOS driver or recovery routine. Look for addresses of exit routines which are pointed to by the IOSB; they give an indication of the driver and, potentially, some idea of the process. Remember that the hardware stores the current PSW as an old PSW (at locations X'18' - X'40') if any interrupt occurs. Therefore these locations may not look bad.

The main thing to keep in mind is that generally IECVSMGR is not the cause of the problem. For performance optimization reasons, the storage manager has minimal validity checking and thus trusts that the invoker is operating correctly. Historically the cause of this type of problem is that the same block is freed twice, which causes the storage manager's free queue to contain invalid pointers. Often this double freeing has occurred some time earlier, which makes the recreation of the erroneous process very difficult. Extensive analysis and piecing are required. Multiple dumps may help provide the pieces necessary to recognize a pattern or common occurrence. Or, a trap might have to be devised.
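A toy model shows why a double free is so damaging: the block lands on the free queue twice, so two later requests are handed the same storage. This Python sketch is illustrative only; the real storage manager chains real storage blocks and, as noted above, does minimal validity checking.

```python
# Toy model (not IBM code) of a LIFO free queue with no validity
# checking, mirroring the storage manager's performance-oriented design.

class Pool:
    def __init__(self, blocks):
        self.free = list(blocks)       # free queue, top of queue first

    def getmain(self):
        return self.free.pop(0)        # hand out the top free block

    def freemain(self, block):
        # No check that the block is already on the queue: a double
        # free goes undetected, exactly the failure described above.
        self.free.insert(0, block)
```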


If there is evidence of a recent error in the I/O process, searching the in-storage LOGREC buffer or SYS1.LOGREC records for an IOS error helps recreate the process. Generally the IOS recovery routines attempt to free control blocks and might inadvertently free one that has just been freed. Try to determine if there is any way that the channel scheduler or I/O driver and its associated exits could have freed blocks before or after recovery processing. In a retry situation, normal termination procedures could have freed a block that was already freed by recovery. Again, traps might be required.

IOS WAIT States
Another problem is an enabled wait state with work remaining for IOS to accomplish. To analyze a wait state, it is necessary to determine the current status of IOS. To determine current IOS status, scan the UCBs for valid IOQEs in UCBIOQ (UCB-4). The IOQE is valid if UCBPST (UCB+6, bit X'20') is on. The IOQE address is valid only when it is active. Understand that once a block is freed, it is generally reused quickly when a subsequent request for an I/O operation is encountered. Because of this, it is very uncommon to find a significant IOQE pointed to by the UCB prefix once IOS has returned the block. The block usually represents another request. If the UCB pointer in the IOSB pointed to by the IOQE does not equal the address of the UCB you started with, the blocks have been reused and the data is invalid.
Additionally, the IOQEs can be found in the storage manager areas. These are located by CVT+X'7C', which points to IOCOM; IOCOM+X'24' points to module IECVSMGR. Label IECVSHDR is an external symbol for the storage pool headers for small blocks (IOQEs). These are followed by the pool headers for medium (RQEs) and large (SRB/IOSBs, BEB, TCCW, ERPWA, fix lists) blocks. The pool headers are 16 bytes long and the last word points to segment headers for 2K bytes (small block) or 4K bytes (medium and large blocks) of storage. The IOQE+5 contains an allocated indicator. If all X'3C' bits are on, the block is allocated and, in the case of IOQEs, represents I/O requests that are started or that have been requested by a driver but have not been started because of a busy or not ready condition (UCBFLA).
After the storage manager (medium and large) blocks are found, notice their 8-byte prefixes, the first halfword of which contains the ASID of the address space to which the block is allocated. Note that the ASID is 0 when the block is not allocated and in special cases such as when unsolicited device ends are not associated with any address space. Scanning these prefixes for an ASID that matches the problem address space can help in finding blocks associated with I/O requests related to that address space. Medium and large blocks that contain a X'17' in the fourth byte of the prefix are not allocated. A value of X'75' for medium blocks, and X'76' for large blocks, indicates that they are currently allocated. (Note that the third byte of this first word of the prefix is unused.)

The IOQE points to the associated IOSBs, which contain information about the channel programs and pointers to the requestor's control blocks.
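The prefix scan described above can be sketched as follows. This Python model is illustrative only; the offsets follow the text (ASID in the first halfword of the 8-byte prefix, allocation code in the fourth byte), but the function names and the 16-byte test-block size are invented for the sketch.

```python
# Illustrative Python sketch (not IBM code) of scanning storage-manager
# block prefixes for an ASID of interest.  Codes per the text: X'17'
# means free; X'75' (medium) and X'76' (large) mean allocated.

import struct

def make_prefix(asid, code):
    """Build one 16-byte test block whose prefix carries asid and code."""
    return struct.pack(">H", asid) + b"\x00" + bytes([code]) + b"\x00" * 12

def scan_prefixes(storage, block_size, target_asid):
    """Return offsets of allocated blocks owned by target_asid."""
    hits = []
    for off in range(0, len(storage), block_size):
        prefix = storage[off:off + 8]
        asid = struct.unpack(">H", prefix[0:2])[0]   # owning ASID
        code = prefix[3]                             # allocation code
        if code == 0x17:                             # not allocated
            continue
        if code in (0x75, 0x76) and asid == target_asid:
            hits.append(off)
    return hits
```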

In general, UCBs and associated IOQEs/IOSBs indicate active I/O. Any flag bits set in the UCB+6/7 help identify the status of the requestor. Also, investigate UCB flags indicating the quiesce option, DAVV (direct access volume verification) processing, I/O restart, missing interrupt handler (MIH), or message pending.

Another place to look is the LCHs (logical channel queues). When a STARTIO macro is issued, if both the channel and device are available, IOS attempts to issue the SIO instruction. If any bit in UCBFLA (UCB+6) is on, the device is considered busy. The TCH instruction is used to determine if the channel is busy. If either is busy, the IOQE for the request is queued to the LCH. This queue then indicates all requests that have been accepted for processing but for which either no SIO has yet been issued or an SIO was issued but a non-zero condition code was received. The first LCH is pointed to by CVT+X'8C'. Each LCH is X'20' bytes long. UCBLCI (UCB+X'A') is an index to the LCH for the given UCB. Each LCH is a double-headed, single-threaded queue of IOQEs. The LCH+0 points to the first IOQE and LCH+4 points to the last, or only, IOQE. If LCH+0 is all F's or 0's, the queue is empty, in which case there are no requests for that channel. The IOQEs themselves are linked with IOQELNK (IOQE+0). IOQEIOSB (IOQE+8) points to the IOSB for the request it represents. Note that IOQENQ (IOQE+4, bit X'40') must be on for all IOQEs on the LCH.
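Walking an LCH by hand follows the pointers just named. This Python sketch is illustrative only; in a dump you would chase real addresses, not objects.

```python
# Illustrative Python sketch (not IBM code) of walking one LCH.  Per the
# text: LCH+0 heads the chain, entries link through IOQELNK, IOQEIOSB
# points to each request's IOSB, and IOQENQ (bit X'40' at IOQE+4) must
# be on for every IOQE queued to an LCH.

IOQENQ = 0x40

class IOQE:
    def __init__(self, iosb, flags=IOQENQ):
        self.link = None               # IOQELNK (IOQE+0)
        self.flags = flags             # flag byte at IOQE+4
        self.iosb = iosb               # IOQEIOSB (IOQE+8)

def walk_lch(first):
    """Collect the IOSBs on the queue and flag any IOQE whose IOQENQ
    bit is off, a sign of queue damage."""
    iosbs, damaged = [], []
    q = first
    while q is not None:
        if not (q.flags & IOQENQ):
            damaged.append(q)
        iosbs.append(q.iosb)
        q = q.link
    return iosbs, damaged
```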

General Hints For IOS Problem Analysis
1. Save areas. IOS does not use save areas in the standard manner. When registers are saved, the order is often 0-15 at offset 0 into the save area. If the local lock is obtained (as is generally the case), IECVPST, the first module to execute in the user's address space after an I/O interrupt, uses the local lock save area (ASXBFSLA at ASXB+X'24') to pass the address of the local lock save area to the exit routines. An exception is I/O interrupt processing for a paging pack, where a storage manager or ASM area is used. Basic IOS uses the IOS save area (LCCA+X'218' points to the CPU work save area vector table (WSAVTC); WSAVTC+X'18' points to the IOS save area). This save area is also passed to DIE (Disabled Interrupt Exit) routines. Also, the TCCW control block contains a save area. EXCP passes the address of the associated TCCW+X'48' (in register 13) to appendages for use as a save area.
2. EXCP back-end processing does all the interfacing to the traditional appendages. In MVS, appendages are entered in SRB mode, physically enabled, and with register 13 containing the address of a save area. It is EXCP's responsibility to map the IOSB to the IOB to maintain compatibility. Also, on return from the appendage, EXCP re-maps the IOB to the IOSB.

3. The EWA (ERP work area) can be important in problem analysis. The IOSB+X'34' points to the EWA, which contains information, including sense data, passed to the ERPs from IOS as well as work areas and counters for the ERPs. The ERPIB, which is useful for channel errors, is contained in the EWA. See the topic "Error Recovery Procedures (ERPs)" later in this section for a description of ERP processing. Several problems have been uncovered where ERPs constantly retry an I/O operation that constantly fails. The EWA can contain the number of retries and other control information helpful in determining the reason why. EWAs often contain the retry CCWs.

4. The LCCA of each processor contains an IRT (IOS recovery table). IOS uses various fields in the IRT to checkpoint its progress. The IRT also contains pointers to the active control blocks on whose behalf IOS is processing.

5. Two IOSB flags (IOSEX, IOSERR) are used to control error processing. For a permanent error the general flow is:
• Abnormal or normal exit entered with IOSCOD=7F, IOSEX=1, IOSE

Figure 5-9. IEAVNP05 Initialization
(The figure shows IEAVNP05 building the fixed LPA (via IEAFIXxx) and putting CDEs on the ALPAQ, noting that some modules can now be in both the pageable and fixed LPAs; building the modified LPA (via IEALPAxx), fetching modules and enqueueing CDEs on the bottom of the ALPAQ only if the modules are not already in the fixed LPA; using IEALOD00 to build "permanent" CDEs for specified pageable LPA modules; and, if CLPA is specified, building the pageable LPA and its directory for all modules on SYS1.LPALIB. The ALPAQ resides in SQA; the figure also shows the LPDEs and CDEs, and modules from SVCLIB and LINKLIB.)

Program Manager (continued)
If a useable copy of the requested module is not immediately available, the
requestor's program manager SVRB is put into a wait state and enqueued on the
SVRB suspend queue (SSQ). The SVRB is dequeued and posted out of its wait
when the desired module becomes available. For "not in storage" suspends,
module IEAVLK01 posts all SVRBs queued on a CDE's SSQ when it successfully
completes a module fetch. Each of these SVRBs then restarts the LINK request
essentially from the beginning at entry point IEAQCS02 in module IEAVLK00.
For the serially reuseable case, module IEAVLK02 posts the top SVRB on a CDE's
SSQ when the PRB that was using the module represented by the CDE exits. In
this case, execution resumes in module IEAVLK00 at entry point IEAQCS03. The
logic at this entry point assumes the requested module is in storage and
immediately available.
Once a module becomes available to a request, the module-use count in the CDE
is increased by one. This use count is decreased by one when the current
requestor no longer needs the module.
Next, LINK processing gets storage for a PRB out of subpool 253. The PRB
is initialized (including setting the RBOPSW to point to the entry point of the
requested module) and enqueued on the current TCB's RB queue. It is enqueued
after the program manager SVRB, but before the linking module's RB. The
program manager then exits, thus causing the requested load module to gain
control next. (See Figure 5-10.)

PRB - How initialized by IEAVLK00 for LINK (and ATTACH)

Field       Contents
RBPREFIX    zero
RBSIZE      13 double words
RBSTAB1     zero
RBSTAB2     from PM SVRB except RBATTN=0
RBCDFLGS    zero
RBCDE1      +requested CDE (may be a minor)   (Note 1)
RBOPSW-LH   from caller's RB (or AABCODOO)    (Note 2)
RBOPSW-RH   module entry point from CDENTPT
RBPGMQ      from PM SVRB
RBWCF       from PM SVRB
RBLINKB     +caller PRB (or +TCB if ATTACH)
RBGRSAVE    from PM SVRB

Notes:
1.  RBCDE1 will point to the CDE containing the requested load module name.
    This may be a minor CDE. CDRRBP of the major CDE, however, will point to
    the new PRB. Field CDRRBP in a minor CDE has no meaning.
2.  If ATTACH, RBOPSW (left half) is set to AABCODOO where:
    AA = from current PM PSW
    B  = from TCB protect key (TCBPKF)
    C  = X'C' if TCBFSM = 1; X'D' otherwise
    D  = from PICA if there is one, else 0

Figure 5-10. New PRB Initialization - LINK

Section 5. Component Analysis

5.3.7


ATTACH
When the ATTACH service routine completes the initialization of the requested
daughter TCB, it gives control to LINK in order to establish the first PRB for the
daughter TCB. ATTACH simulates the SVC FLIH by creating a program manager
SVRB under the daughter TCB and then causing the daughter to branch enter
module IEAVLKOO at entry point IEAQCS01. Processing is essentially the same as
for LINK except for APF considerations which are explained later.

XCTL
Module IEAVLK00 gets control from the SVC FLIH at entry point IGC007 when
the XCTL SVC (SVC 7) is issued. With XCTL, unlike LINK, the first function of
module IEAVLK00 is to establish the new RB. The method used depends on the
type of caller, as follows:
•  If the caller is an SVRB, the caller's SVRB is reused for the new module. It
   remains in the TCB RB queue in the same position as it was when IEAVLK00
   got control.

•  If the caller is an IRB, storage is obtained from subpool 255 for a new PRB.
   The new PRB is then enqueued on the TCB RB queue between the IRB and
   the program manager SVRB.

•  If the caller is a PRB, storage is obtained for a new PRB from subpool 255
   and then it is enqueued upon the TCB RB queue following the program manager
   SVRB. The caller's PRB is then put on top of the queue. The program
   manager then issues the EXIT SVC (SVC 3), forcing the caller's PRB, since it
   is now on top of the queue, through exit processing. This results in the
   storage for the caller's old module being freed before the new module is
   obtained. The program manager then resumes execution at entry point
   IEAQCS02 in module IEAVLK00.

Figure 5-11 shows how the new PRB (SVRB in the case where the caller is an
SVRB) is initialized for an XCTL. Figure 5-12 shows how the new RB is enqueued
in the TCB RB queue before the program manager locates the new load module.
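The RB-queue shuffling just described for an XCTL issued by a PRB can be
sketched with a toy model. This is plain Python, purely illustrative (MVS
itself is assembler, and none of these names exist in the system); the TCB RB
queue is modeled as a list with the top of the queue at index 0:

```python
# Toy sketch (not MVS code) of XCTL-by-PRB RB-queue manipulation.

def xctl_from_prb(rb_queue):
    """rb_queue starts as [PM SVRB, XCTL-issuing PRB, calling RB, ...]."""
    old_prb = rb_queue[1]
    # Enqueue the new PRB following the program manager SVRB...
    rb_queue.insert(2, "new PRB")
    # ...then put the caller's PRB on top of the queue.
    rb_queue.remove(old_prb)
    rb_queue.insert(0, old_prb)
    # SVC 3 (EXIT) forces the top RB (the old PRB) through exit processing,
    # so the old module's storage is freed before the new module is fetched.
    rb_queue.pop(0)
    return rb_queue

print(xctl_from_prb(["PM SVRB", "old PRB", "calling RB"]))
# -> ['PM SVRB', 'new PRB', 'calling RB']
```

The final state matches the "after the SVC 3" picture: the program manager
SVRB on top, then the new PRB, then the original calling RB.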
The next function in the XCTL process is to locate the desired module. If the
caller is an SVRB, the module is searched for via the ALPAQ; if it is not found,
it is searched for via the PLPAD. If it is not found in either the ALPAQ or the
PLPAD, an 806 abend is generated. If the load module is found, final
initialization in the RB is completed and the program manager exits. The
following exceptions to normal processing occur when an SVRB issues an XCTL
macro (they are made for performance reasons):

•  Only the ALPAQ and PLPAD are searched.

•  If the CDE on the ALPAQ is found useable, the use count is not increased.

•  If an LPDE in the PLPAD is found useable, no CDE is built or enqueued on
   the ALPAQ. Furthermore, the RBCDE1 field is made to point to the LPDE
   rather than to a CDE.

If the caller is not an SVRB, the requested load module is located as it is in
LINK. Once found, initialization is completed on the already existing PRB and
return is made to the caller.

How initialized by IEAVLK00 for XCTL

RB field    Caller is a PRB       Caller is an IRB      Caller is an SVRB
RBPREFIX    zero                  zero                  left as is
RBSIZE      from caller PRB       17 double words       left as is
RBSTAB1     from caller PRB       zero                  left as is except
                                                        RBTRSVRB (Note 1)
RBSTAB2     from caller PRB       from caller IRB       left as is
                                  except RBFDYN=1
RBCDFLGS    zero                  zero                  left as is
RBCDE1      +requested CDE        +requested CDE        +CDE or +LPDE
RBOPSW-LH   from caller PRB       from caller IRB       left as is
RBOPSW-RH   +module entry point   +module entry point   +module entry point
RBPGMQ      zero                  zero                  left as is
RBWCF       from caller PRB       from caller IRB       left as is
RBLINKB     +caller's caller RB   +calling IRB          left as is
RBGRSAVE    from caller PRB       from caller IRB       left as is

Note:
1.  Bit RBTRSVRB indicates (for a dump routine) the location of the load
    module. It will be set to 0 if the module was located via a CDE on the
    ALPAQ. It will be set to 1 if the module was located in the pageable LPA.

Figure 5-11. New RB Initialization - XCTL

[Figure 5-12 (diagram) shows the TCB RB queue manipulation for XCTL.
XCTL by a PRB is shown in three states. At start: program manager SVRB,
XCTL-issuing PRB, XCTL-issuing PRB's calling RB. Before the SVC 3:
XCTL-issuing PRB, program manager SVRB, new PRB, XCTL-issuing PRB's calling
RB. After the SVC 3: program manager SVRB, new PRB, XCTL-issuing PRB's
calling RB; execution resumes at IEAQCS02. XCTL by an IRB: the new PRB is
enqueued between the IRB and the program manager SVRB, ahead of the calling
RB. XCTL by an SVRB: the XCTL-issuing SVRB is reused as the new SVRB.]

Figure 5-12. XCTL RB Manipulation

Program Manager (continued)

LOAD
Module IEAVLK00 is called by the SVC FLIH at entry point IGC008 when the
LOAD SVC (SVC 8) is issued. For a LOAD request, the TCB's load list is first
searched for an LLE representing a useable copy of the requested module. If found,
the LLE total responsibility count is increased by one. In addition, if the caller is
in supervisor state and/or key 0-7, the system responsibility count is updated. A
separate system count is maintained to prevent a non-system user from deleting a
module loaded by a system routine.
If the load list does not yield a useable copy of the requested module, the
module is located and CDEs are manipulated as explained earlier for LINK. The
final step for LINK processing is the building of the PRB. For LOAD, however, no
PRB is built; instead, an LLE is built and enqueued at the top of the TCB's load
list queue. This LLE points to the CDE (whether it be on the JPQ or the
ALPAQ) of the requested module. The total responsibility count is initialized to
one, and the system responsibility count to zero or, if a system request, to one.

DELETE
Module IEAVLK00 is called by the SVC FLIH at entry point IGC009 when the
DELETE SVC (SVC 9) is issued. Since the module to be deleted must have been
previously loaded by the same task, IEAVLK00 searches the TCB's load list queue
for the module. If it is not found, the program manager exits with a return
code of 4.
If the module is found, the total responsibility count in the LLE is decreased by
one. The system responsibility count is also decreased by one if the DELETE was
issued by a system program. Finally, the use count in the CDE is decreased by one.
The LLE is dequeued and freed if the total responsibility count goes to zero. If
the use count in the CDE also goes to zero, routine CDHKEEP in module
IEAVLK02 is called. This routine frees the CDE and all its minor CDEs, the
associated extent list, and the module itself. Once control is returned to
IEAVLK00, the program manager exits.
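The count bookkeeping that LOAD and DELETE perform can be modeled with a
small sketch. This is hypothetical Python, not MVS code; the attribute names
are stand-ins for the responsibility counts in the LLE and the use count in
the CDE:

```python
# Hypothetical model of LOAD/DELETE responsibility-count bookkeeping.

class LLE:
    def __init__(self):
        self.total_count = 0     # total responsibility count
        self.system_count = 0    # system responsibility count

def load(lle, cde_use, system_caller):
    """LOAD: bump the LLE counts and the CDE use count."""
    lle.total_count += 1
    if system_caller:            # supervisor state and/or key 0-7
        lle.system_count += 1
    return cde_use + 1

def delete(lle, cde_use, system_caller):
    """DELETE: decrement the counts; report what may be freed."""
    lle.total_count -= 1
    if system_caller:
        lle.system_count -= 1
    cde_use -= 1
    free_lle = lle.total_count == 0          # dequeue and free the LLE
    free_module = free_lle and cde_use == 0  # CDHKEEP frees CDE and module
    return cde_use, free_lle, free_module
```

The separate system count models why a non-system user cannot delete a module
loaded by a system routine: the total count stays above zero until the system
requestor issues its own DELETE.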

Exit Resource Manager
Module IEAVLK02 is called by the exit prologue at entry point IEAPPGMX
whenever a PRB exits. Its purpose is to clean up the program resources that
were being used by the PRB. First, the program manager decreases by one the
use count in the CDE being used by the PRB.

If the module is serially reuseable and there are SVRBs suspended on the CDE's
SSQ, the top SVRB is posted so it can begin using the module.

If the CDE's use count goes to zero, then the CDE, all its minor CDEs, the
extent list, and the module itself are freed. When the module is freed (by
subroutine CDHKEEP) it is freed from:

•  Subpool 0, if bit CDSPZ is 1

•  Subpool 251, if bit CDSPZ is 0 and bit CDJPA is 1

•  Subpool 252, if bit CDSPZ is 0 and bit CDJPA is 0

(See the discussion of "Module Subpools" later in this chapter.)
If the exiting PRB is the last in the TCB's RB queue, IEAVLK02 also does
end-of-task cleanup. This consists of cleaning up and freeing all LLEs
remaining on the TCB's load list queue.

SYNCH
Module IEAVLK00 is called by the SVC FLIH at entry point IGC012 when the
SYNCH SVC (SVC 12) is issued. SYNCH essentially uses the tail end of LINK
processing to build and enqueue a PRB for the user exit. No module searching,
CDEs, LLEs, etc. are involved.

IDENTIFY
Module IEAVIDOO is called by the SVC FLIH at entry point IGC041 when the
IDENTIFY SVC (SVC 41) is issued.
IDENTIFY builds a minor CDE for the requested name and entry point. The
CDE is enqueued on the JPQ or ALPAQ following the major CDE that represents
the module containing the entry point. One exception to this is if the
requestor is not authorized (not supervisor state, not in a system key, and not
executing in an APF-authorized step) and the embedded entry point is in a
module from an APF-authorized library. In this case, for integrity reasons, a
major CDE for the embedded entry point is built and enqueued on the JPQ. Since
the CDE is initialized to represent the module as not coming from an authorized
library, no authorized user is allowed to use this user-defined entry point.
Module IEAVID00 also accommodates the OS/LOADER with special processing.
When the OS/LOADER issues the IDENTIFY SVC, it has loaded a module into
subpool 0 and built an extent list, and now wants the module to be represented
by a major CDE and extent list built and enqueued on the JPQ. This request is
called a "major request" and is indicated when register 0 contains 0 upon entry
to IEAVID00. Register 1 contains a pointer to the module name and extent list.
Figure 5-13 illustrates CDE initialization by IDENTIFY.

                           Normal Request
             -----------------------------------------------
CDE Field    Normal            Non-authorized           Major Request
                               requestor and module
                               from an APF-authorized
                               library
CDCHAIN      (behind major)    (top of JPQ)             (top of JPQ)
CDRRBP       zero              zero                     zero
CDNAME       as per input      as per input             as per input
CDENTPT      as per input      as per input             as per input
CDXLMJP      +major CDE        zero                     +XL (at end of CDE)
CDUSE        zero              zero                     zero
CDNIP        as in major CDE   0                        0
CDNIC        0                 0                        0
CDREN        1                 as in major CDE          0
CDSER        1                 as in major CDE          0
CDNFN        0                 0                        0
CDMIN        1                 0                        0
CDJPA        0                 1                        0
CDNLR        1                 as in major CDE          1
CDSPZ        0                 0                        1
CDXLE        0                 0                        1
CDRLC        0                 0                        0
CDOLY        0                 1                        0
CDSYSLIB     0                 0                        0
CDAUTH       as in major CDE   0                        0

Figure 5-13. CDE Initialization by IDENTIFY

ABEND Resource Manager
Module IEAVLK02 is called by RTM at entry point IEAPPGMA under two
circumstances: when a TCB is going to abnormally terminate, and when a program
manager SVRB is going to be forced through exit processing because of a
recovery retry.
When IEAVLK02 is called, its function is to clean up CDE SVRB suspend
queues. If the current TCB has any program manager SVRB on an SVRB suspend
queue for any CDE on the JPQ, the SVRB is dequeued. Furthermore, when a TCB
is going to abnormally terminate, if any CDE on the JPQ has the CDNIC bit on
and a program manager SVRB on the abending TCB's RB queue is responsible for
fetching the module into virtual storage, all other SVRBs waiting for the
module are posted and the CDE is dequeued and freed.


806 ABEND
If the program manager cannot locate a load module in response to a LINK,
ATTACH, XCTL, or LOAD request, it issues an 806 abend. Two key areas in
the program manager should be understood if an unexpected 806 abend occurs
or if the program manager uses a copy of a module that was not anticipated. These
are (1) the module search sequence or search order and (2) the criteria used in
determining whether or not a module already in virtual storage is useable.
1.  Search Sequence
    For a LOAD request, CDEs located on the task's load list queue are first
    searched for a useable module. If this search fails, the search sequence
    for LOAD is then the same as it is for LINK, ATTACH, and XCTL.
    The search sequence for LINK, ATTACH, XCTL, and LOAD (if the LLE scan
    is unsuccessful) is shown in Figures 5-14 and 5-15.

2.  Useability Criteria
    When searching for a module, the program manager looks for a CDE already
    enqueued on the JPQ or ALPAQ with a CDNAME the same as the requested
    name. If a matching name is found and the CDE is on the ALPAQ, the module
    is immediately available to the requestor because all these CDEs represent
    modules that are reentrant and from APF-authorized libraries. If the CDE
    is on the JPQ, however, certain tests have to be made to determine if the
    module represented by the CDE can be used by the requestor. The routine
    CDALLOC (CDE Allocation) performs this testing. The CDE with the matching
    name is the input to CDALLOC. Output is a return code indicating the
    useability of the associated module. Figure 5-16 describes the tests and
    actions taken by CDALLOC.

APF Authorization
The program manager performs two APF-related functions. The first function
determines whether or not a job step is APF-authorized when the job step TCB
is attached. The second function prevents any authorized program from
accessing, via LINK, ATTACH, XCTL, or LOAD, a module that is not from an
APF-authorized library.


[Figure 5-14 (flowchart) shows the module search sequence for LINK, ATTACH,
XCTL, and LOAD: search CDEs via the TCB's load list queue; search the JPQ;
search the private libraries (see Figure 5-15); search the ALPAQ; search the
PLPAD; then search SVCLIB and LINKLIB. Note: for XCTL, if the caller is an
SVRB, only the ALPAQ and PLPAD are searched.

Order of CDEs on the ALPAQ:
First:  modules activated from the pageable LPA - newest modules first.
Second: modules listed in IEALOD00 - in inverse order of specification in
        the list.
Third:  modules in fix lists - in inverse order of specification in the
        lists; the lists are also in inverse order of their specification.
Fourth: modules in MLPA lists - in inverse order of specification in the
        lists; the lists are also in inverse order of their specification.]

Figure 5-14. Module Search Sequence for LINK, ATTACH, XCTL and LOAD
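The Figure 5-14 search order can be condensed into a sketch. This is
illustrative Python, not MVS code; each "place" is a dictionary standing in
for the real queue or library, and the load-list step applies only to LOAD
(callers for LINK/ATTACH/XCTL would pass an empty load list):

```python
# Illustrative sketch of the Figure 5-14 module search order.

def find_module(name, load_list, jpq, private_libs, alpaq, plpad,
                svclib, linklib, xctl_from_svrb=False):
    if xctl_from_svrb:
        # Performance path: only the ALPAQ and PLPAD are searched.
        places = [("ALPAQ", alpaq), ("PLPAD", plpad)]
    else:
        places = [("load list", load_list),   # LOAD only, else empty
                  ("JPQ", jpq),
                  ("private libraries", private_libs),
                  ("ALPAQ", alpaq),
                  ("PLPAD", plpad),
                  ("SVCLIB", svclib),
                  ("LINKLIB", linklib)]
    for where, table in places:
        if name in table:
            return where
    raise LookupError("806 abend: module not found")

print(find_module("MYMOD", {}, {}, {}, {"MYMOD": object()}, {}, {}, {}))
# -> ALPAQ
```

The `xctl_from_svrb` flag models the exception noted in the figure: an SVRB
issuing XCTL never causes a private-library or LINKLIB search.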

[Figure 5-15 (flowchart) shows the module search sequence for private
libraries. Depending on whether a DCB was supplied and on the value of the
Z byte, the search goes through one of: the library via the given DCB; all
job/step/task libraries; the current job/step/task libraries; or the parent
job/step/task libraries via the Z byte (Z byte = 1 versus Z byte > 1). The
search then continues with the ALPAQ.]

Figure 5-15. Module Search Sequence of Private Libraries

CDALLOC return codes:
 0: Module not available via JPQ
 4: Module is immediately available
 8: Module not available - continue JPQ search
12: Module not immediately available - suspend requestor until the module
    is no longer in use
16: Module not immediately available - suspend requestor until the fetch
    is complete

Module condition via the input CDE, by type of request:

Requestor is authorized* (any request type):
  Module from a non-APF-authorized library ..................... 8
  Module from an APF-authorized library ... same as a non-authorized request

Requestor is non-authorized - LOAD:
  Module being fetched (CDNIC=1) ............................... 16
  Module in storage (CDNIC=0), no load restrictions (CDNLR=1):
    Reentrant or serially reuseable ............................ 4
    Nonreuseable, fetched by the program manager ............... 4
    Nonreuseable, loaded by the OS/LOADER ...................... 0
  Load only (CDNLR=0):
    USECT = 0 .................................................. 4
    USECT > 0 .................................................. 0

Requestor is non-authorized - LINK, ATTACH, XCTL:
  Module being fetched (CDNIC=1) ............................... 16
  Module in storage (CDNIC=0), no load restrictions (CDNLR=1):
    Reentrant (CDREN=1) ........................................ 4
    Serially reuseable, in use (CDRRBP not 0) .................. 12
    Serially reuseable, not in use (CDRRBP=0) .................. 4
    Nonreuseable, used (CDNFN=1) ............................... 8
    Nonreuseable, never used (CDNFN=0) ......................... 4
  Load only (CDNLR=0) .......................................... 406 ABEND

*In supervisor state, in a system key, or as part of an APF-authorized step

Figure 5.16. CDE Allocation
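A few of these useability tests can be sketched as follows. This is a heavily
simplified, hypothetical rendering of a subset of the CDALLOC paths, using the
return codes from the Figure 5-16 legend; it is not the real routine, and the
boolean parameters are stand-ins for the CDE bits:

```python
# Simplified, hypothetical sketch of a few CDALLOC decision paths.

def check_useability(on_alpaq, authorized, from_apf_lib, being_fetched):
    if on_alpaq:
        # CDEs on the ALPAQ represent reentrant modules from
        # APF-authorized libraries: immediately available.
        return 4
    # The CDE is on the JPQ and must be vetted.
    if authorized and not from_apf_lib:
        return 8          # continue the JPQ search
    if being_fetched:     # CDNIC = 1
        return 16         # suspend until the fetch is complete
    return 4              # this sketch assumes the remaining tests pass
```

The point of the sketch is the ordering: the authorization/library test comes
before everything else on the JPQ path, so an authorized requestor never
suspends waiting for a copy it is not allowed to use.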

1.  Establishing APF Authorization

    An APF-authorized job step is executing if bit JSBAUTH is on in the JSCB.
    This bit is turned on by the program manager if the following conditions
    exist when LINK is called by ATTACH:

    •  It must be a job step ATTACH. The program manager considers it a job
       step ATTACH if field TCBJSTCB in the attached TCB points to itself and
       if there is a JSCB for the step, indicated by a non-zero TCBJSCB field.

    •  The load module being attached must have been link edited with an APF
       authorization code of 1. This is indicated to the program manager when
       bit PDSAPF is on in the module's directory entry.

    •  The load module being attached must be from an APF-authorized library.
       This is determined by FETCH and indicated to the program manager by
       bit WKAUTH being on in the FETWK.

    In summary, a job step is APF-authorized if the first program executed in
    the step is both from an APF-authorized library and link edited with an
    APF authorization code of one.
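The three conditions above reduce to a simple predicate. The sketch below is
illustrative Python, not MVS code; the booleans stand in for the control block
bits named in the text:

```python
# Sketch (not MVS code) of the JSBAUTH decision at job step ATTACH.

def job_step_apf_authorized(tcbjstcb_points_to_self, tcbjscb_nonzero,
                            pdsapf_on, wkauth_on):
    job_step_attach = tcbjstcb_points_to_self and tcbjscb_nonzero
    # AC=1 at link-edit time (PDSAPF) and fetch from an APF-authorized
    # library (WKAUTH) must both hold.
    return job_step_attach and pdsapf_on and wkauth_on

print(job_step_apf_authorized(True, True, True, False))  # -> False
```

Note that either condition failing alone (library not authorized, or AC not 1)
is enough to leave the step unauthorized.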
2.  306 ABEND

    An authorized program is one that is executing in supervisor state, or
    with a system protect key (0-7), or as part of an APF-authorized job step.
    An authorized program must LINK to, ATTACH, LOAD, and XCTL to modules
    exclusively from APF-authorized libraries. The program manager issues an
    abend code of 306 if the only useable copy of a module requested by an
    authorized program is in a non-APF-authorized library.
    When a load module is fetched into virtual storage, FETCH indicates to the
    program manager, via the FETWK bit WKAUTH, whether it is (bit on) or is
    not (bit off) from an APF-authorized library. If the requested module is
    already in virtual storage, the program manager determines whether or not
    it is from an APF-authorized library by examining the CDE bit CDSYSLIB.
    If it is on, the module can be used by an authorized program.
    Bit CDSYSLIB = 1 if the associated module is from an APF-authorized
    library, except in the following cases:

    •  The bit = 0 if the module is reentrant but is still fetched into
       subpool 251 because of TSO TEST considerations (see the following
       discussion of "Module Subpools").

    •  The bit = 0 when IDENTIFY creates a major CDE because a non-authorized
       user indicates an embedded entry point in a module from an
       APF-authorized library.

Module Subpools
All modules represented by CDEs on the ALPAQ are loaded into the pageable LPA,
the fixed LPA, or the modified LPA. These modules are never freed.
Modules represented by CDEs on the JPQ, however, are freed by the program
manager and can occupy storage in subpool 0, subpool 251, or subpool 252.
Modules loaded by the OS/LOADER always use subpool 0. This is indicated by
bit CDSPZ being set to one.
When bit CDSPZ is zero, modules fetched by the program manager use subpool
251 if bit CDJPA is set on, or subpool 252 if bit CDJPA is set off.
Reentrant modules from APF-authorized libraries are always fetched into
subpool 252 if the requestor is a system program (a program in supervisor
state or with a system key). Reentrant modules from APF-authorized libraries
requested by non-system programs are also fetched into subpool 252 provided
TSO TEST is not running (TCB bit TCBTCP=0). All other modules are fetched
into subpool 251.
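The subpool selection rules above can be sketched as a small function. This
is illustrative Python, not MVS code; the parameters are stand-ins for the
conditions named in the text:

```python
# Sketch (not MVS code) of module subpool selection for CDEs on the JPQ.

def module_subpool(loaded_by_os_loader, reentrant, apf_library,
                   system_requestor, tso_test_running):
    if loaded_by_os_loader:          # bit CDSPZ = 1
        return 0
    if reentrant and apf_library:
        # Subpool 252 unless a non-system requestor is under TSO TEST.
        if system_requestor or not tso_test_running:
            return 252
    return 251                       # bit CDJPA on
```

The TSO TEST exception is the case mentioned under "306 ABEND": a reentrant
module forced into subpool 251 also gets CDSYSLIB = 0.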

FETCH/Program Manager Work Area (FETWK)
Module IEAVLK01 obtains the FETWK (mapped by DSECT IHAFETWK) from
subpool 253 when a load module is to be fetched into virtual storage. A
pointer to the FETWK is placed in RBCSWORK of the program manager SVRB. The
FETWK is logically divided into three areas. The first area, up to but not
including field WKADDR, is used exclusively by FETCH as a work area. The
second area, up to but not including WKPREFX, is used as a work area by the
program manager. Field WKIOADDR and bits WKAUTH and WKSYSREQ in this area are
also addressed by FETCH, as follows:

•  WKIOADDR is always set to zero by the program manager. This instructs
   FETCH to do the GETMAIN for the load module.

•  WKAUTH is set to one by FETCH to tell the program manager when a load
   module is from an APF-authorized library.

•  WKSYSREQ is set to one by the program manager to tell FETCH that the
   requesting program is in supervisor state and/or system key. FETCH uses
   this indication to bypass DEB checking.

The third area of the FETWK, starting with WKPREFX, is the BLDL area. This
area contains the directory entry used by FETCH to obtain the requested
module. The directory entry is placed there by the program manager either by
moving a caller-supplied directory entry into the area or by issuing a BLDL.

RB Extended Save Area (RBEXSAVE)
The 48-byte extended save area (RBEXSAVE at RB+X'60') of the program
manager SVRB is used by the program manager as a work area. This area, along
with the FETWK, is a key area in analyzing program manager problems.
RBCSNAME (at RB+X'60') contains the module name requested by the caller, and
RBCSDE (at RB+X'68') points to a copy of the caller-supplied directory entry
if one was supplied. If EP or EPLOC is specified, then this pointer is zero.
Other key areas of the extended save area are RBCSWORK (at RB+X'74'), which
points to the FETWK if the FETWK was obtained, and bit RBCSSYSR
(RB+X'70' = X'40'), which is on if the caller is a system program.


VSM

Virtual storage management (VSM) is responsible for allocating virtual
storage, keeping track of what is allocated and, for certain subpools,
recording to whom it is allocated. These services are performed both for the
system and for problem programs. (See Figure 5-17.)
The following are the five basic functions that VSM performs:

Function                         Module       Comments

Nucleus initialization (NIP)     IEAVNP08     IPL parameters are: SQA=,
                                              CSA=, REAL=, VRREGN=
Address space initialization     IEAVGCAS     Called by memory create
Step initialization/termination  IEAVPRTO     GETPART/FREEPART
Virtual storage allocation       IEAVGM00     GETMAIN/FREEMAIN
Cell pool management             IEAVGTCL     GETCELL (get cell)
                                 IEAVFRCL     FREECELL (free cell)
                                 IEAVBLDP     BLDCPOOL (build cell pool)
                                 IEAVDELP     DELCPOOL (delete cell pool)

The nucleus initialization program (NIP) is not discussed in this book. The
remaining VSM functions, and the related FRRs (functional recovery routines),
are described in the following topics:

• Address Space Initialization
• Step Initialization/Termination
• Virtual Storage Allocation
• VSM Cell Pool Management
• Miscellaneous Debugging Hints


[Figure 5-17 (diagram) shows virtual storage management's view of MVS
storage. The common area (which cannot be released via FREEMAIN) contains:
the SQA (SP 245 - key 0, not fetch protected); the LPA; and the CSA
(SP 231/241/227/228/239); SQA and CSA can be intermixed on a 4K basis within
64K boundaries. The user area (private address space) contains: the LSQA
(SP 253 - key 0, not fetch protected, not pageable, AQE for task; SP 254 -
key 0, not fetch protected, not pageable, AQE for any job step task; SP 255 -
key 0, not fetch protected, not pageable); the SWA (SP 236, SP 237 - key 1,
not fetch protected, pageable); SP 229 - user key, fetch protected, pageable;
and SP 230 - user key, not fetch protected, pageable. These can be
intermixed, but they are not allowed to cross into the region subpools:
SP 252 - key 0, not fetch protected, pageable; SP 251 - user key, fetch
protected, pageable; and SP 0 - SP 127 - user key, fetch protected, pageable.
The top of SP 0-127, 251, and 252 is CURRGNTP in the local data area (LDA).
Below these are the system region (16K, chained out of the RCT's TCB; TCBPQE
at TCB+X'98') and the nucleus (FREEMAIN cannot be issued for the nucleus).]

Figure 5-17. Virtual Storage Management's View of MVS Storage

Address Space Initialization
The create address space module (IEAVGCAS) initializes the VSM address space.
IEAVGCAS creates the local data area (LDA). The LDA is the key anchor block
and VSM save area for space allocation within an address space. IEAVGCAS
builds all the blocks labeled "C" in Figure 5-18. IEAVGCAS also builds the
initial allocation of 16-byte LSQA elements for VSM's local cell pool.
GETMAIN then allocates VSM's internal control blocks from this pool.
IEAVGCAS also contains VSM's task termination and address space
termination resource managers. The task termination routine frees all
non-shared space anchored in the TCB. (See Figure 5-18.) The address space
termination routine frees any WAIT or POST elements of a V=R (virtual=real)
job that are associated with the terminating address space and are chained
out of VSM's GDA (global data area). These V=R WAIT/POST elements exist only
when a job is waiting for V=R address space.
IEAVGCAS's functional recovery routine (FRR) is IEAVCARR. IEAVCARR is
divided into the following three sections, corresponding to those of
IEAVGCAS:

1.  Entry IEAVCARR, which protects the create address space portion of
    IEAVGCAS.

2.  Entry IEAVTTRR, which protects the task termination portion of
    IEAVGCAS.

3.  Entry IEAVFARR, which protects the address space termination portion of
    IEAVGCAS.

IEAVCARR does not place data in the variable recording area of the SDWA
(STAE diagnostic work area). It does invoke SDUMP and either retries at the
beginning of the function that detected the error or continues with
termination.

[Figure 5-18 (diagram) shows virtual storage management's control block
usage. ASCBLDA in the current ASCB points to the LDA in the LSQA. The LDA
anchors (among others): the SPQE for subpool 0, with its DQE and FQE chains
(8-byte FQEs); LDASRPQE (Release 3 only); LCLCELL, VSM's pool of 16-byte
LSQA cells for VSM's internal control blocks; the dummy PQE; and the PQE for
the address space, with its FBQE chains. The current TCB anchors: TCBSWA,
the SPQE chain for subpools 236 and 237 (SWA); TCBMSS; TCBUKYSP, the SPQE
chain for subpools 229 and 230; TCBAQE, the AQE chain (AQEs will be for
SP 254 for a job step task, or for SP 253 if not a job step TCB); and TCBPQE,
which points to the PQE for the system region (16K, except in the Master
Scheduler's address space) for a non-V=R job, or to the PQE for the V=R user
region (built on V=R GETPART, exists only for a V=R job), each with its FBQE
chains. The SPQE chain for subpools 0-127, 251, and 252 is also shown, with
DQE and FQE chains (16-byte FQEs). Blocks marked (C) are built by IEAVGCAS.
Note: Updates of all control blocks and queues in this figure are serialized
by the local lock.]

Figure 5-18. Virtual Storage Management's Control Block Usage

Step Initialization/Termination (IEAVPRTO - GETPART/FREEPART)
IEAVPRTO is invoked by IEAVGM00 (GETMAIN/FREEMAIN) via a branch entry
as a result of a GETMAIN/FREEMAIN request from an initiator for subpools 242
(V=R) or 247 (V=V). IEAVPRTO does not return to IEAVGM00; it returns
directly to the initiator.
IEAVPRTO performs four functions, as follows:

1.  For a V=V GETPART request:
    Sets TCBPQE to point to the dummy address space partition queue
    element (PQE) that was created at address space initialization time.
    Calls the initiator exit routine IEALIMIT in order to establish
    LDALIMIT, which is the value used by GETMAIN as an upper limit for
    problem program subpool GETMAIN requests. OS/VS2 System Programming
    Library: Supervisor contains a discussion of LDALIMIT, REGION=, and
    variable GETMAIN requests.

2.  For a V=R GETPART request:
    Performs IEALIMIT processing as described above.
    Attempts to obtain V=R space by interrogating V=R FBQEs chained from
    the GDA.

    •  If unsuccessful:
       - Creates a V=R wait element
       - Chains the V=R wait element from the GDA
       - Indicates the V=R wait element is waiting for space

    •  If successful:
       - Interfaces with RSM's (real storage manager) IEAVEQR to obtain
         real frames
       - Builds the V=R dummy PQE, V=R PQE, and V=R FBQEs, and updates
         TCBPQE

3.  For a V=V FREEPART request:
    Frees all task-related space by calling FREEMAIN, freeing one subpool
    at a time.
    Frees SPQEs and task-related subpools.
    Sets TCBPQE=0.

4.  For a V=R FREEPART request:
    Performs the same functions as for V=V FREEPART.
    Returns space to V=R FBQEs chained from the GDA.
    Attempts to satisfy V=R waiting requests by posting the waiting
    initiator. The waiting initiator reissues the request; IEAVPRTO will
    move the WAIT elements to the POST queue anchored in the GDA. This POST
    element is freed by GETPART when the initiator's new request is
    received.

IEAVPRTO's FRR, IEAVGPRR, does not use the variable recording area of the
SDWA. It attempts a retry back into IEAVPRTO based on footprints set in the
FRR's six-word parameter area. Refer to the IEAVPRTO code (microfiche) for
a detailed description of this FRR parameter area.

Virtual Storage Allocation (lEA VGMOO - GETMAINJFREEMAIN)
lEAVGMOO satisfies all GETMAIN requests for virtual storage. The control block
structure it uses is shown in Figures 5-18 and 5-19. All GETMAIN processing for the
private area subpools and all associated control blocks are serialized by the local
lock. All common area sub pools and associated control blocks are serialized by the
SALLOC lock.
A detailed process flow through GETMAIN for a virtual storage allocation request
can be found in Appendix A in the GETMAIN/FREEMAIN process flow description.
In debugging GETMAIN, the most important information is contained in the
original request for virtual storage. This information can be obtained from the
trace table, from a parameter list passed by the problem program code, or
sometimes from the LDA (local data area).
The LDA cannot always be relied upon to provide information about the request
unless the system is stopped immediately. Unless you insert a code or SLIP trap
and take a stand-alone dump on the error, the LDA is overlaid by another request
during the dumping procedure.
If an immediate stop has been obtained upon encountering an error, the most
useful information in the LDA is obtained from the flags in the LDARQSTA
(LDA + X'10') field. The LDARQSTA contains the current request status.
Compare the flags in this field to the request and determine if the two are
consistent. Then check through the control block chains (the LDA and GDA chains
that are set up when subpools are requested) to find out why the abend or error
occurred.
LDARQSTA (LDA+X'10')

Offset  Bits        Meaning

0       1... ....   4096-byte Request from Subpool 253/254
        .1.. ....   An AQE is needed
        ..1. ....   Fetch Protected Subpool
        ...1 ....   Authorized User Key Subpool
        .... 1...   SWA Subpool
        .... .1..   LSQA Subpool
        .... ..1.   CSA Subpool
        .... ...1   SQA Subpool

1                   SVC Number

2       1... ....   Subpool 254 Requester has TCBABGM on
        .1.. ....   Explicit V=V Region reached
        ..1. ....   Variable Request, Pass 2
        ...1 ....   SQA or LSQA Expansion
        .... 1...   Deferred Error I/O Flag
        .... .1..   FMAINB or MRELEASE Request
        .... ..1.   GETMAINB Request
        .... ...1   Branch Entry

3       ..1. ....   Subpool FREEMAIN
        ...1 ....   Supervisor Mode (if zero)
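When reading LDARQSTA out of a dump, the flag bytes can be decoded mechanically. The following is an illustrative C sketch only: the symbolic names and the helper function are inventions for this example; the bit values are the ones listed in the table above.

```c
#include <assert.h>

/* Bit values for LDARQSTA byte 2 (LDA + X'12'), as listed above.
   The symbolic names are illustrative; the manual gives only the
   bit meanings. */
enum {
    RQ2_TCBABGM  = 0x80,  /* subpool 254 requester has TCBABGM on */
    RQ2_EXPLVV   = 0x40,  /* explicit V=V region reached          */
    RQ2_VARPASS2 = 0x20,  /* variable request, pass 2             */
    RQ2_EXPAND   = 0x10,  /* SQA or LSQA expansion                */
    RQ2_DEFERIO  = 0x08,  /* deferred error I/O flag              */
    RQ2_FMAINB   = 0x04,  /* FMAINB or MRELEASE request           */
    RQ2_GETMAINB = 0x02,  /* GETMAINB request                     */
    RQ2_BRENTRY  = 0x01   /* branch entry                         */
};

/* Was the current request a branch-entry GETMAIN? */
int is_branch_getmain(unsigned char byte2)
{
    return (byte2 & RQ2_BRENTRY) && (byte2 & RQ2_GETMAINB);
}
```

A consistency check like this mirrors the manual's advice: compare the flags in LDARQSTA against what the failing request should have set.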

OS/VS2 System Programming Library: MVS Diagnostic Techniques

Note: The GDA is always at the very end of SQA;
X'FFFFC8' in Release 2, X'FFFFC0' in
Release 3.
The figure shows the GDA, located via CVTGDA, with these fields:

CSAPQEP    CSA PQE pointer
VRPQEP     V=R PQE pointer
PASTART    Private area start
PASIZE     Private area size
SQASPQEP   SQA SPQE pointer
SQASPLFT   SQA space left (includes CSA available for SQA expansion)
VRPOSTQ    V=R post elements (V=R virtual is now available; the initiator
           is posted to re-issue the request)
VRWAITQ    V=R wait elements (jobs waiting for V=R virtual space)
PFSTCPAB   Permanent cell pool CPABs (the CPAB table is shown in
           Figure 5-21)
CSASPQEP   CSA SPQE pointer
GLBLCELL   (Release 3 only) VSM's pool of 16-byte SQA cells for VSM's
           internal control blocks
GBLCELCT   Count of free cells

Note: Updates of all the control blocks and queues in this figure,
except PFSTCPAB, are serialized by the SALLOC lock.
PFSTCPAB is read only after NIP.

Figure 5-19. Virtual Storage Management's Global Data Area (GDA)


GETMAIN's Functional Recovery Routine - IEAVGFRR
Whenever GETMAIN's FRR (IEAVGFRR) finds an error in a queue, an entry is
made in the SDWA variable recording area, SDWAVRA (SDWA + X'194'), to
indicate the error that has been found, its location, and the corrective action taken.
Each entry is added at the next available location and the length of the
user-supplied data is increased (field SDWAURAL, SDWA + X'193'). The high-order
byte (byte 0) of the first word contains FF if the space in the variable
recording area was exceeded and data entries were lost. The low-order byte (byte
3) of the first word contains a digit indicating the type of error that caused
IEAVGFRR to get control. This digit comes from FRRBRNDX (branch index
FRR) in the LDA, where it is set up by IEAVGM00. FRRBRNDX is at X'1F' into the
GETMAIN/FREEMAIN work area (GMFMWKAR); GMFMWKAR is the portion
of the LDA that is used by IEAVGM00 as a work area. It is mapped at the end of
this chapter.
The second word of the SDWA variable recording area contains the LDA field
LDARQSTA at the time of error. The contents of LDARQSTA are described in
the previous topic "Virtual Storage Allocation (IEAVGM00 - GETMAIN/
FREEMAIN)".
The next 16 words usually contain the registers (ordered 0-15) at the time
IEAVGM00 was entered. These registers are useful for debugging problems that
occur on branch entry requests. Register 14 contains the caller's return address.
The remaining SDWAVRA entries consist of one to three words each. The first
word of each entry has a code in the high-order byte and a data length in the
low-order byte. If this length is 0, there is no additional data for this entry. A
length of 4 or 8 indicates one or two additional words of data. These data words
always contain the address of the affected field or control block. These are all
shown in the table in Figure 5-20. Control blocks are checked to determine if they
are in the correct subpool, for example, SQA or LSQA; queues are checked for
validity.


Code  Data    Data Addresses               Explanation
      Length  First          Second

01    4       PVTLCSA                      PVTLCSA is incorrect - it will remain
                                           unchanged.

02    4       PASTRT                       PASTRT in GDA is incorrect - it is
                                           reset using PVT.

03    4       PASIZE                       PASIZE in GDA is incorrect - it is
                                           reset using PVT.

04    0                                    All three sources of CSA start addresses
                                           (GDA, PVT, CSA PQE) disagree - no
                                           change will be made.

05    4       PVTHQSA                      PVTHQSA is incorrect - it will remain
                                           unchanged.

06    4       Bad TCB                      TCB pointer is not valid - no queue
              pointer                      repairing is attempted.

B1    4       SPQE with                    Next SPQE pointer is incorrect - the
              bad pointer                  SPQE queue is truncated (by setting
                                           the bad pointer to zero).

B2    4       SQA SPQE                     SQA SPQE flagword is incorrect - it
                                           will remain unchanged.

B3    4       LSQA SPQE                    LSQA SPQE flagword is incorrect - it
                                           will remain unchanged.

B4    4       SPQE                         Next SPQE pointer = 0, but the last SPQE
                                           flag is not on - the last SPQE flag is
                                           set on.

C1    4       Cell with                    Next cell address is incorrect - the
              bad pointer                  cell pool chain is truncated.

D1    4       SQA SPQE                     First SQA DQE pointer (in SPQE) is
                                           incorrect; it points outside SQA so all
                                           of SQA may be lost - it will remain
                                           unchanged.

D2    8       DQE with bad   Bad DQE      DQE pointer is not in SQA or LSQA -
              pointer        address      DQE queue is truncated (by setting
                                           the bad pointer to zero).

D3    4       DQE                          DQE area address or length is incorrect
                                           - this DQE is removed from the queue.

D4    8       Current        Overlapped   Current DQE area overlaps a previous
              DQE            DQE          DQE - current DQE is removed from
                                           the queue.

D5    8       Current        Previous     Circular DQE queue - queue is
              DQE            DQE          truncated after previous DQE.

D6    4       DQE                          First SQA DQE area address or length
                                           is incorrect - address and length are
                                           corrected.

F1    4       FQE                          Incorrect type flag in FQE - the flag
                                           is corrected.

F2    4       DQE or FBQE                  Next FQE pointer is incorrect (if SQA
              with bad                     or LSQA, then the previous FQE length
              pointer                      could be too large) - FQE queue is
                                           truncated (by setting the bad pointer
                                           to zero).

F3    4       FQE                          Incorrect (too long) length in FQE -
                                           FQE queue is truncated.

Figure 5-20. SDWAVRA Error Indicators


VSM Cell Pool Management
VSM's cell pool management consists of the following functions:

Module      Macro      Function

IEAVGTCL    GETCELL    Gets a cell from a preformatted pool of cells

IEAVFRCL    FREECELL   Frees a cell to a preformatted pool of cells

IEAVBLDP    BLDCPOOL   Builds a pool of formatted cells

IEAVDELP    DELCPOOL   Deletes a pool of formatted cells

The key to the VSM cell pool management function is the cell pool anchor block
(CPAB). The layout of the cell pools is shown in Figure 5-21. Note that the
permanent cell pools have IDs that are negative, for example, RSM01
(X'D9E2D401'), while the dynamic cell pools have IDs that are the address of the
first CPAB divided by 4 (shifted right 2).
The four VSM cell pool management modules are small enough that processing
can be determined from studying the CPAB mapping along with the code.
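The ID convention above can be checked arithmetically when examining a pool in a dump. This is an illustrative sketch (the function names are invented; X'D9E2D4' is "RSM" in EBCDIC, followed by the pool number X'01'):

```c
#include <assert.h>
#include <stdint.h>

/* Permanent cell pool IDs have the high-order bit on (they look like
   negative fullwords), e.g. RSM01 = X'D9E2D401'. */
int cpid_is_permanent(uint32_t cpid)
{
    return (cpid & 0x80000000u) != 0;
}

/* A dynamic pool's ID is the address of its first CPAB shifted right 2.
   A 24-bit address fits in the low three bytes, so the result can never
   have the high-order bit on; dynamic IDs are therefore always positive. */
uint32_t cpid_for_dynamic_pool(uint32_t first_cpab_addr)
{
    return first_cpab_addr >> 2;
}
```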

Miscellaneous Debugging Hints
1. Most common problems with GETMAIN/FREEMAIN occur in the interface
   processing. There is usually a bad TCB or ASCB address, or the local lock
   is not held upon entry. The ASCB is used only to find the LDA, which is in
   the same location in all address spaces except the master scheduler's.

   Note: On a branch entry to GETMAIN, register 7 contains the address of the
   ASCB; however, on return from the branch entry, register 7 no longer
   contains this address.

2. You can catch callers who do not hold the local lock on entering GETMAIN
   within routine CSPCHK. To do this, test for all of the following conditions:
   • Not NIP (CVTNIP)
   • Not GLBRANCH entry (SALHELD flag at offset X'1E' in the LDA)
   • Not GETMB or FREEMB (offset X'10' in the LDA)
   • Local lock not held (PSAHLHI)
   • Not in the address space creation process (ASXBTCBS not equal to 0)

3. A valid GETMAIN/FREEMAIN that is issued for zero bytes is treated as a
   no-op.

4. The routine CSPCHK is a good place for a GETMAIN/FREEMAIN trap to
   occur because CSPCHK is called for every request during the beginning of
   IEAVGM00 processing.


The figure shows: the CVT (CVTGDA at CVT+X'230') points to the GDA, and
GDA+X'30' (PFSTCPAB) points to the permanent CPAB (cell pool anchor block)
table. Four pools are currently in use: SRB00, RSM01, RM103, and RT104.
Each CPAB contains a pointer to the next CPAB for its pool and anchors the
pool's cells; dynamic pools are chained from the permanent CPABs. A cell
contains the CPID when it is in use, or a pointer to the next available cell
when it is free.

Figure 5-21. VSM Cell Pool Management


5. GETMAIN makes few queue manipulation errors. If IEAVGM00 rejects a
   request, it is usually because the caller made an error on the interface;
   message IEA700I is issued.

6. Subpool queue elements (SPQEs) are not freed even on a subpool FREEMAIN,
   and multiple keys for a problem program subpool on the same TCB are not
   allowed. This can result in a problem if a user changes TCBPFK. The
   following is an example of such a situation:

   Set TCB key     TCBPFK=6
                   GETMAIN SP 1    Causes SPQE to be built; storage is in
                                   key 6
                   FREEMAIN SP 1   SPQE for SP 1 is not destroyed
   Change TCB key  TCBPFK=8
                   GETMAIN SP 1    IEAVGM00 uses the old SPQE and assigns
                                   storage in key 6

7. On branch entry to GETMAIN, IEAVGM00 saves registers at field BRANCHSV
   in the LDA and turns on the BRENTRY flag at offset X'10' in the LDA. To
   be sure these saved registers are for the request in question, it is
   necessary to stop the system via a trap. (See the "Using the SLIP Command"
   and "System Stop Routine" topics in the chapter "Additional Data Gathering
   Techniques" in Section 2.)

8. Since the GDA occupies the last few bytes of storage, a random store at
   -4 or -8 overlays the GDA.

9. MVS has added a new register-type GETMAIN/FREEMAIN SVC and branch
   entry: SVC 120. The parameters differ from those of SVC 10 as follows:

   Register 1      Zero for a GETMAIN; address to be freed for FREEMAIN.

   Register 15     Bytes 0 and 1: Reserved, set to 0.
   (SVC only)      Byte 2: Subpool ID
                   Byte 3: Flag values:
                     Bits 0-4    Reserved, set to 0
                     Bit 5 = 0   Doubleword boundary
                           = 1   Page boundary
                     Bit 6 = 0   Conditional request
                           = 1   Unconditional request
                     Bit 7 = 0   GETMAIN
                           = 1   FREEMAIN

   For the branch entry to SVC 120 processing, register 15 contains the
   entry point address and register 3 is used for the parameters. Register 3
   is set up the same as register 15 above with one exception: byte 1 is the
   protect key for subpools 227-231 and subpool 241. The address that was
   obtained via GETMAIN is returned in register 1 as in SVC 10.
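The register 15 parameter word for SVC 120 can be sketched as follows. The helper and macro names are invented for illustration; the bit values follow IBM bit numbering, where bit 0 is the high-order bit of the byte, so byte 3's bits 5, 6, and 7 are X'04', X'02', and X'01'.

```c
#include <assert.h>
#include <stdint.h>

#define R15_PAGEBDY 0x04  /* byte 3, bit 5 = 1: page boundary  */
#define R15_UNCOND  0x02  /* byte 3, bit 6 = 1: unconditional  */
#define R15_FREE    0x01  /* byte 3, bit 7 = 1: FREEMAIN       */

/* Build register 15 for an SVC 120 request: bytes 0-1 zero,
   byte 2 = subpool ID, byte 3 = flags. */
uint32_t svc120_r15(uint8_t subpool, uint8_t flags)
{
    return ((uint32_t)subpool << 8) | flags;
}
```

For the branch entry, the same layout would go in register 3 (with the protect key in byte 1 for subpools 227-231 and 241).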

10. Some GETMAIN failures are recorded in an information list contained in the
    nucleus. This list is similar to a trace table and is pointed to by CVTQMSG
    (CVT + X'10C'). Each entry contains data such as: ABEND code, ASCB
    address, TCB address, register 14 if GETMAIN was branch entered, and the
    parameters passed. Refer to the DSECT INFO LIST in module IEAVGM00 for
    additional information.

GMFMWKAR (IN LDA AT +X'18')

Offsets, in hex, are from the start of the LDA (fields whose offsets are not
legible in the original figure are listed in order without one):

18   ABNDATA (variable data)
1C   MSGLEN | FREESW | LOCKFLAG | FRRBRNDX
20   REGSAVE - save area used for SRM and RSM (18 fullwords)
     MSAVE - save area used for MRELEASE (16 fullwords)
E8   GNOTSAVE - save area used for GNOTSAT (16 fullwords)
     GFSMFSVE/CSPCKSAV (SMF core routines save area/CSPCHK save area)
     LOCKSAVE (overlaps into GFSMFSVE and MPTRS)
     MPTRS (previous and next pointers save area)
     DUMYDQE - dummy DQE for L/SQA expansion (4 fullwords)
     TEMPDQE - temporary DQE for FMCOMMON (4 fullwords)
     DUMFBQE - dummy FBQE for MRELEASE (4 fullwords)
134  SAV911 - save area for registers 9-11 (branch entry)
140  LASTSAVE (last list entry)
144  MINMAX - maximum and minimum length for a variable request
14C  LASTLSTA (last list entry address)
150  LSTINDEX (index for a list request)
154  FMARCAS (pointer to areas to be freemained)
158  PARMLDA (flag byte; values below)
15C  CLOPREV (previous FQE to closest)
160  CLOSEST (closest-in-size FQE)
164  LARGESTA (largest available)
168  LARGEST (largest available FBQE)
16C  SAVESIZE (size of multiple of 4K core)
170  ENDADD (end address)
174  STRTADD (start address)
178  LENSAVE (save area for length list pointer)
17C  FRRPARM (FRR parameter area address)
180  DIFF/SAVEPQE (difference/PQE pointer in the FBQE search)
184  FIXSTART (starting address to clear)
188  FIXEND (ending address to clear)
18C  NOTSATSV (length pointer used by GNOTSAT)
190  NOTSATSI (LDARQSTA save area for GNOTSAT)
194  SAVESEG (address of multiple of 4K core)
198  LARSOFAR (largest available in FBQE)
19C  RSTRTADD (rounded start address)
1A0  RENDADD (rounded end address)
1A4  VPFP (find page address to be used)
     DQESAVE - save DQE pointer and previous DQE pointer
     FMSAVE (save return register for FREEMAIN)
     PREVFQSV (save area for previous FQE pointer)
     FQESAVE (save area for FQE)
     SPQESAVE (save area for SPQE)
     SVRLB (save area for RLB)
     SVSIZE (save area for rounded size)
     DQESAVE1 - save DQE info when using FMAINB
     FMSAVE1 (save return register in FMAINB)
     FQESAVE1 (save FQE info in FMAINB)
     PREVFQS1 (save previous FQE in FMAINB)
     SPQSAVE1 (save SPQE in FMAINB)
     SVRLB1 (save RLB for FMAINB)
     SVSIZE1 (save rounded size for FMAINB)
     SAVSVTSV (save LDARQSTA in FMAINB)
     FQENXTSV (FQE next save area)
     OLDFQELN (old FQE length)
     NEWFQELN (new FQE length)
     SEGTEST (end segment test area)
     CODE1 | CLEARSW | GMFMSW | FETCH | OUTSW | CODE2 | SAVFRESW | SPID |
     LSPQEKEY | RQSTRKEY | SAVSPID | SAVSPID2 (one-byte fields; values below)

MSGLEN - reason code and length of the variable data

FREESW
  X'80'  FREEMAIN in progress
  X'40'  Length has been incremented
  X'20'  Address has been decremented
  X'10'  Not first DQE (for L/SQA)
  X'08'  FQE was below freed area
  X'04'  Further page release needed

LOCKFLAG
  X'02'  SALLOC lock obtained
  X'01'  SALLOC lock already held

FRRBRNDX
  X'07'  Subpool FREEMAIN, AQE area not in DQE
  X'06'  Page release return code of 1
  X'05'  SALLOC obtain return code not 0 or 4
  X'04'  On L/SQA expansion, GFRECORE failed
  X'03'  FINDPAGE return code not 0 or 4
  X'02'  Create segment return code greater than 0
  X'01'  SALLOC release return code greater than 0
  X'00'  Unexpected error, see status

PARMLDA
  X'80'  Global request (GLBRANCH or MRELEASE on a global request)
  X'40'  SALLOC lock obtained by GM/FM
  X'04'  First flag byte in FRR parm
  X'00'  LDA address in FRR parm

CODE1 - save area for option code
  X'C0'  List indicator (mixed if list)
  X'20'  Conditional indicator
  X'10'  Mask for page boundary
  X'04'  SVC 120 page boundary request
  X'02'  SVC 120 unconditional request
  X'01'  SVC 120 FREEMAIN request

CLEARSW - clear switch for GFRECORE
  X'01'  FQECPB indicator on in FQE

GMFMSW - GM/FM switch for MRELEASE
  X'04'  First-time switch for MRELEASE
  X'02'  Indicates FM for FBQE
  X'01'  Indicates GM for FBQE

FETCH - key and fetch protect
  X'08'  Fetch protect on

OUTSW - switch for out of real/virtual
  X'00'  Real indicator for OUTSW
  X'FF'  Virtual indicator for OUTSW

CODE2 - save area for option code
SAVFRESW - save FREESW in FMAINB
SPID - SPID for MRELEASE
LSPQEKEY - protect key from current SPQE
RQSTRKEY - requester key (or key = PARM)
SAVSPID - save SPID for FREEMAIN
SAVSPID2 - SPID for messages

Real Storage Manager (RSM)

The real storage manager (RSM) manages the real storage of the system. To do
this, it divides all potentially pageable real storage into 4K-byte frames. Within
RSM, the page frame table entry (PFTE) describes each frame according to its type,
current use, or most recent use.
The current or last state of a request for RSM pageable services is described by
the page control block (PCB) within RSM: the requestor supplies information
about his request and RSM reformats this data into a PCB. As the request is
processed, RSM adds other internal RSM information to the PCB.
RSM is a queue-driven component. Both PFTEs and PCBs are queued based on
their current state. Simply stated, frames that can be used immediately are queued
on the available frame queue; their PFTEs describe the frame's last use. Similarly,
free request elements are queued on the FIFO PCB free queue; these PCBs describe
the final state of previously processed requests. (This historical nature of PCBs is
often useful in problem analysis.) To manipulate these control blocks and manage
the queues, RSM has a PFTE manager (IEAVPFTE) and a PCB manager
(IEAVPCB). Besides being queued, PFTEs are located in a contiguous table starting
at (PVTPFTP) + (PVTFPFN) and ending at (PVTPFTP) + (PVTLPFN). PCBs,
however, are obtained (via GETMAIN) in groups and are spread out in SQA. They
can be found only by following queue pointers.

Major RSM Control Blocks
RSM's major control blocks are the PFTE, PCB, page table entry (PGTE), external
page table entry (XPTE), paging vector table (PVT), RSM header (RSMHDR), and
swap control table (SPCT). An RSM service routine called find page (IEAVFP)
locates the PGTE and XPTE control blocks. The table in Figure 5-22 lists the
control block functions.

Control Block   Function

PFTE            Describes the last use of a frame

PCB             Describes the current or last state of a request

PGTE, XPTE      Describe the current real frame and virtual page relationship
                for a particular virtual address

PVT, RSMHDR     Basically these are RSM anchors and work areas

SPCT            Related only to swapping, it describes the RSM requirements
                necessary to swap in an address space (the swap-out process
                formats the SPCT)

Figure 5-22. Major RSM Control Blocks and Their Functions


Real Storage Manager (RSM) (continued)
Only the leftmost 12 bits of either a real or virtual address are needed to
describe a specific real frame or virtual page (a modulo 4K-byte real and virtual
addressing scheme). These 12-bit numbers are multiplied by 16 to form block
numbers; for example, VBN0 and RBN0 are four-digit, hexadecimal, virtual and
real block numbers. Also, note the following:
• PGTEs contain RBNx values.
• The contents of PVTPFTP plus RBN0 is the address of the PFTE for the frame
  whose real addresses are RBN000 through RBNFFF.
Of all the RSM control blocks, the most critical are the PCB, PFTE, and SPCT.
The important fields in each block are described below. Figure 5-23 shows the
relationship among the blocks.
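The block-number arithmetic above can be illustrated in C. This is a sketch under stated assumptions: the helper names are invented, and addresses are the 24-bit values used at this level of MVS.

```c
#include <assert.h>
#include <stdint.h>

/* RBN0: the leftmost 12 bits of a 24-bit real address, multiplied by 16
   to form a four-digit hex block number ending in 0. */
uint32_t rbn0(uint32_t real_addr)
{
    return (real_addr >> 12) << 4;
}

/* The PFTE for the frame containing real_addr is at PVTPFTP + RBN0.
   The multiply-by-16 works because PFTEs are spaced 16 bytes apart
   in the contiguous PFTE table. */
uint32_t pfte_addr(uint32_t pvtpftp, uint32_t real_addr)
{
    return pvtpftp + rbn0(real_addr);
}
```

Every real address from RBN000 through RBNFFF yields the same RBN0, and therefore the same PFTE, which is the point of the manual's second bullet.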

The figure shows a PCB pointing to its PGTE (in the PGT, indexed by the
virtual segment and page indexes that make up VBN0), to its XPTE (in the
XPT), and to its AIA, whose AIARPN field holds the RBN0. The contents of
PVTPFTP plus RBN0 equal the address of the PFTE.

Figure 5-23. Relationship of Critical RSM Control Blocks
PCB (Page Control Block)
Important fields in the PCB are:
+0 PCBCQN - indicates the current queue location of this PCB as follows:

   X'10' - PCB is not currently in use. It is queued on the PCB free queue
           anchored in the PVT.

   X'18' - PCB is currently waiting for frame allocation to occur. It is
           queued on the PCB defer queue anchored in the PVT.

   X'20' - PCB represents a common area I/O operation. Actual physical I/O
           may or may not be complete. It is queued on the PCB common-I/O
           queue anchored in the PVT.

   X'88' - PCB represents a private area I/O operation. Actual physical I/O
           may or may not be complete. It is queued on the PCB local-I/O
           queue anchored in the RSMHDR for the address space indicated by
           PCBASCB; ASCBRSM points to the RSMHDR.

   X'FF' - PCB is probably in use. The not-queued state means only that the
           PCB is not on the primary forward/backward chain of the
           above-mentioned major PCB queues. It can be a related PCB, a root
           PCB, or an associated PCB.

+8 PCBFL1: PCBSRBMD=X'20' - PCBSRB is the address of a page-fault-suspended
           SSRB. The use of this address is the only means of locating
           page-fault-suspended SRBs.
           PCBROOT=X'04' - PCBRTCA is the address of a root PCB. Root PCBs
           are only valid if their PCBCQN field is X'FF'.

+9 PCBRTPA - When the PCBROOT bit is on, this contains the address of a PCB
           that controls a block page operation.

+X'D' PCBRLPA - The address of a chain of PCBs for the same PCBVBN/PCBRBN.
           The related chain of PCBs are dequeued PCBs that are chained via
           the PCBRLPA field (not via PCBFQP/PCBBQP).

+X'10' PCBFL2: PCBRESET=X'10' - The function indicated by the PCB has been
           suspended for a page fault because no frames were available or
           paging I/O had to be completed before redispatching the page
           faulter. PCBASCB, PCBRTPA, and PCBSRB define the ASCB, TCB, and
           RB to be RESET when PCBSRBMD is 0. When PCBSRBMD is 1, PCBASCB
           and PCBSRB define the ASCB and SSRB that will be RESET.

+X'11' PCBXPTA - Is either 0 or the address of the XPTE.

+X'15' PCBPGTA - Is either 0 or the address of the PGTE.

+X'18' PCBRBN - This value, when added to the address in PVTPFTP, gives the
           address of the associated PFTE.

+X'1A' PCBVBN - This field is often zero; when it is zero, the operation has
           either been NOPed with page I/O still in progress, or the I/O is
           complete and the PCB is only serving a scheduling/tracking
           function. The operation is considered to be complete when
           PCBVBN=0; no other paging request should be able to relate to it;
           that is, it cannot be found via an equal compare on PCBVBN. When
           PCBVBN is zero, its previous value can be determined from the
           AIARPN field in the AIA. The AIA is the last 28 bytes of the PCB.

The following information about roots is useful to the problem solver:
• Root PCBs can generally be recognized because most of the PCB is still zero.
• The SPCT points to active roots for SWAP; RSMSPCT in the RSMHDR points
  to the SPCT.
• V=R region roots are queued from PVTVROOT in the PVT.
• Vary offline roots are queued from PVTOROOT in the PVT.
• PAGE FIX and PAGE LOAD roots can only be found via PCBRTPA of the
  queued FIX/LOAD PCBs.
For non-root PCBs: PCBCQN, PCBFL1, PCBFL2, and PCBFL3 are the key
fields. They describe the current state of the paging request for which the non-root
PCB was last used.


SPCT (Swap Control Table)
The SPCT is mapped in modules IEAVSOUT, IEAVSWIN, IEAVCSEG, and
IEAVITAS. Space for the SPCT is obtained via GETMAIN and is initialized in
IEAVITAS. As segments are created, IEAVCSEG updates the SPCT. IEAVSOUT
initializes the SPCT with the pages that make up the working set (such as LSQA
and fixed pages, plus recently referenced pages). IEAVSWIN uses the information
IEAVSOUT put in the SPCT in order to start up a previously swapped-out address
space.
The first portion of the SPCT contains the address of the swap root PCB
(SPCTSWRT); the number of fixed and LSQA entries in this SPCT
(SPCTFIX and SPCTLSQA); the number of segment entries and the number of
active segment entries (SPCTNSEG and SPCTSSEG); and the working set size
(SPCTWSSZ). The flags at offset X'A' are defined as follows:

X'80'  Swap-in in progress
X'40'  Swap-out in progress
X'20'  Paging was purged during swap-out
X'10'  There is at least one fix entry with a fix count greater than 255
X'08'  Page data set used for LSQA
X'04'  Swap-out requested by IEAVEQRP

The next portion of the SPCT (SPCTSWAP) is the SPCT extension and is 56
(decimal) bytes long. It contains a maximum of six fix swap entries or eight
LSQA swap entries, or a combination of the two. In a combination, LSQA entries
precede all fix entries. LSQA entries are six bytes each and fix entries are eight
bytes each. Both entries contain the following flags in the first byte:

X'80'  LSID in this entry is valid.
X'40'  This is an LSQA entry.
X'20'  The VBN in this entry is for common.
X'10'  The page was flagged defer release at swap time.

This flag byte is followed by a three-byte LSID and a two-byte VBN. If the
entry is for LSQA, there is nothing more, but if the entry is a fix entry, the next
two bytes contain the fix count. The last portion of the SPCT contains a variable
number of six-byte segment entries. The first byte is the segment number and it is
followed by the address of the page table. The next two-byte field (SPCTBITM)
is a 16-bit map indicating which pages are to be brought in at swap-in time.
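Walking the SPCT extension follows directly from the layout above: one flag byte, a three-byte LSID, a two-byte VBN, plus a two-byte fix count for fix entries. The sketch below is illustrative only; the structure and function names are invented.

```c
#include <assert.h>
#include <stddef.h>

#define SPCT_LSIDVAL 0x80  /* LSID in this entry is valid          */
#define SPCT_LSQA    0x40  /* this is an LSQA entry                */
#define SPCT_COMMON  0x20  /* the VBN in this entry is for common  */
#define SPCT_DEFREL  0x10  /* flagged defer release at swap time   */

/* Count LSQA and fix entries in a raw extension area.  LSQA entries
   are 6 bytes (flag, 3-byte LSID, 2-byte VBN); fix entries add a
   2-byte fix count for 8 bytes.  LSQA entries precede all fix
   entries, but this scan does not depend on that ordering. */
void spct_count(const unsigned char *p, size_t len, int *nlsqa, int *nfix)
{
    size_t i = 0;
    *nlsqa = *nfix = 0;
    while (i < len) {
        if (p[i] & SPCT_LSQA) { (*nlsqa)++; i += 6; }
        else                  { (*nfix)++;  i += 8; }
    }
}
```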


PFTE (Page Frame Table Entry)
Important fields in the PFTE are:
PFTIRRG - Indicates the format of the first word of the PFTE. This bit is
located in PFTFLAG2 at offset X'D' and is a X'10'. If it is on, the
first word of the PFTE is mapped as PFTPGID and contains a VIO
LGN and RPN. If PFTIRRG is off, the first word of the PFTE is
mapped as PFTASID and PFTVBN. An ASID of X'FFFF' indicates
a common area page. Note that a VIO LGN can be the same as an
address space ASID; address space ASIDs and LGNs are seldom
the same but could be.

PFTPCBSI - Indicates there is a PCB on an I/O queue for this page; there can be
a string of related PCBs for this page. This bit is located in
PFTFLAG1 at offset X'C' and is a X'08'. This bit is turned off by
the process that validates the page when the I/O completes, or, for
output I/O, after the I/O completes but before the PFTE is sent to the
free queue. Note that I/O queues sometimes contain several
"no-op" PCBs; these appear to point back to a frame and its
associated PFTE. When a PCB is made into a "no-op," PFTPCBSI
is turned off and the association between that PCB and that frame
and its associated PFTE is broken. These "no-op" PCBs are either
output PCBs with incomplete I/O or input PCBs with complete I/O.
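The two mappings of the PFTE first word can be sketched as a decode. This is an illustrative model (function name invented); the bit value and the X'FFFF' common-area convention are from the text above.

```c
#include <assert.h>
#include <stdint.h>

#define PFTIRRG 0x10   /* in PFTFLAG2 at PFTE offset X'D' */

/* Interpret the PFTE first word.  If PFTIRRG is on, the word is
   PFTPGID (a VIO LGN and RPN), so the page cannot be a common area
   page.  Otherwise the word is PFTASID/PFTVBN, and an ASID of
   X'FFFF' in the high halfword marks a common area page. */
int pfte_is_common(uint32_t first_word, uint8_t pftflag2)
{
    if (pftflag2 & PFTIRRG)
        return 0;                        /* VIO page, not common      */
    return (first_word >> 16) == 0xFFFF; /* PFTASID in high halfword  */
}
```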

Page Stealing
Figure 5-24 shows the flow of the page stealing process. The circled numbers in the
figure correspond to the notes below.

(1) IEAVRFR scans the local frame queue (LFQ) or common frame queue
    (CFQ); the queue it scans is determined by the parameter list received
    from SRM.

(2) IEAVRFR checks the hardware reference bits and then updates the
    unreferenced interval count (UIC). IEAVRFR orders the LFQ and the
    CFQ so that the PFTEs with the highest UICs are at the top of the queue.
    The queues are in descending order, with zero UICs at the bottom.

(3) Frames are selected to be stolen based on their UIC and pageability;
    that is, fixed/LSQA/bad pages, and pages that are V=R allocated, cannot
    be stolen.

(4) and (5) IEAVRFR calls a common routine, FREEPAGE, to invalidate selected
    pages and build a PCB for the page-out process if the page is changed.
    If the frame queue from which frames are being selected does not
    correspond to the current address space or the CFQ, IEAVRFR must
    schedule an SRB (STEAL) to the appropriate address space in order to get
    to the PGTE in LSQA. Finally, IEAVRFR calls ASM to start output paging.

(6) and (7) Entry IEAVRFRA scans the LFQ of the address space it is scheduled
    into. If PFTSTEAL=1 and if a frame is still stealable and has not been
    referenced since "Select," IEAVRFRA sets the steal flag. FREEPAGE is
    then called to steal the frame. After the frame queue has been scanned,
    ASM is called and given a string of AIAs.

The figure shows: SRM (IRARMSTM) branches to IEAVRFR with the RFR parameter
list (flags and the ASCB address, or A(0) for a common area steal; the
parameter list is in module IRARMSTM at label RFRLST1). IEAVRFR (Select)
obtains queue headers, selects frames to be stolen, stops scanning the frame
queue when the UIC is less than the criteria number, calls FREEPAGE or
schedules a steal SRB, and starts accumulated I/O. IEAVRFR (Free Page)
invalidates the PGTE and builds a PCB if the page is changed. Entry IEAVRFRA
(Steal) locates frames with PFTSTEAL=1, calls FREEPAGE if they are still
stealable, and starts accumulated I/O. ILRPAGIO starts page-out I/O.

Figure 5-24. Page Stealing Process Flow
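The selection rule in notes (2) and (3) — order by UIC and steal only pageable frames with high enough counts — can be sketched as follows. The structures are hypothetical models; the real scan works over the PFTE queues described earlier.

```c
#include <assert.h>

/* Model of the per-frame state that drives steal selection. */
struct frame_model {
    int uic;        /* unreferenced interval count             */
    int fixed;      /* fixed/LSQA/bad/V=R allocated: not stealable */
};

/* Return the index of the best steal candidate: the highest UIC at
   or above the criteria number, skipping unstealable frames.
   Returns -1 if no frame qualifies (the scan would stop). */
int pick_steal(const struct frame_model *f, int n, int criteria)
{
    int best = -1, i;
    for (i = 0; i < n; i++) {
        if (f[i].fixed || f[i].uic < criteria)
            continue;
        if (best < 0 || f[i].uic > f[best].uic)
            best = i;
    }
    return best;
}
```

In RSM itself the queues are already ordered by descending UIC, so the scan can simply stop at the first PFTE whose UIC falls below the criteria number.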


Reclaim
Reclaim is an RSM function that revalidates a page/real frame pair that was
previously invalidated. IEAVGFA performs the reclaim for the normal case after
a page fault on an address space or common area virtual address. IEAVAMSI
handles the VIO case.
In the virtual address case, IEAVGFA handles work as follows:
1. PCBVBN is used to locate the PGTE.
2. The PGTE is used to obtain the last-used RBN0 value.
3. The RBN0 is used to address the PFTE.
4. PFTIRRG is checked to determine if the first word of the PFTE is in
   PFTPGID or PFTASID/PFTVBN format.
5. If PFTIRRG=0, PFTVBN is compared to PCBVBN.
6. If the VBNs match and the VBN is in the common area, the reclaim is
   successful. If the VBN is in the private area and PFTASID matches ASCBASID
   (which PCBASCB points to), the private area reclaim is successful.
In the VIO case, IEAVAMSI handles work as follows:
1. IEAVAMSI is supplied with both an RBN0 and a DSPID.
2. The RBN0 is used to address the PFTE.
3. PFTIRRG is checked to determine if PFTPGID is in PGID format.
4. If PFTIRRG=1, PFTPGID must match the DSPID; if it matches, the reclaim is
   successful.
When reclaim fails, normal frame allocation paths are followed just as though the
page had never been in storage.
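Once steps 1-4 have located the PFTE, the virtual-address reclaim decision in steps 5 and 6 reduces to the comparison below. This is a model only (the function name is invented); the real code works through the PGTE and PFTE.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the step-5/step-6 reclaim test, with PFTIRRG already
   checked and found off: the frame is reclaimable if the VBNs match
   and either the page is in the common area or the frame's ASID
   matches the faulter's ASID. */
int reclaim_ok(uint16_t pftvbn, uint16_t pcbvbn,
               uint16_t pftasid, uint16_t ascbasid, int vbn_is_common)
{
    if (pftvbn != pcbvbn)
        return 0;                   /* step 5: VBNs must match   */
    if (vbn_is_common)
        return 1;                   /* common area reclaim       */
    return pftasid == ascbasid;     /* private area reclaim      */
}
```

When this test fails, RSM falls through to normal frame allocation, exactly as the last sentence above states.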

Relate
Relate is an RSM function that associates independently generated page requests
(PCBs) for the same virtual address. When the physical action required to satisfy
one of these requests (I/O or frame allocation) is completed, all related requests are
also satisfied. A PCB-related chain is produced for all cases except the VIO data
set. The same modules that do reclaim, IEAVGFA and IEAVAMSI, handle the
relate process, which only follows after a successful reclaim.
In the virtual address case, IEAVGFA handles work as follows:
1. The relate function is invoked for one of two cases:
   • The reclaim function has successfully completed and PFTPCBSI is on,
     indicating page I/O is in progress; a PCB I/O queue is searched.
   • The XPTDEFER bit is on, indicating that the previous PCBs have been
     delayed because frames were not available. The GFA defer queue will be
     searched to do the relate function.

2. The search argument is PCBVBN in all cases except that of the GFA defer
   queue; in that case PCBASCB and PCBVBN are the search arguments.
3. When the correct queued PCB is located, the current PCB is inserted in the
   related PCB chain between the queued PCB and the previous, first-related PCB.
In the VIO data set case, IEAVAMSI handles work as follows:
1. The PCB local I/O queue is scanned for a match on PCBRBN because PCBVBN
   is always set to 0 for move-out PCBs. If PCBRBN matches, PCBVAM must be
   on.
2. When the correct PCB is found, it is updated with the information the I/O
   completion portion of RSM needs to place the page of the VIO data set in the
   new window location (this is not necessarily a new page).

RSM Recovery
RSM recovery consists of a SETFRR at all major entry points to RSM:
• The issuer of the SETFRR places the address of the FRR in PVTPRCA.
• Each SETFRR returns a six-word parameter list in the recovery communications area (RCA).
• RSM has only one FRR - IEAVRCV.

• The IHARCA macro maps the RCA; this macro can be found in most RSM
modules.
• IEAVPSI contains the RCA macro in assembler language format.
Whenever an unexpected error or C0D abend occurs, the RCA is copied into
SDWAVRA. The CSECT ID and the module-entered flag in the RCA can be used
to determine the path taken through RSM to the point of error. To determine this
path, you must understand the RSM flow and know which module issues SETFRR.
The following RSM modules issue SETFRR at their main entry point:
IEAVAMSI
IEAVPIOP
IEAVSOUT
IEAVCSEG
IEAVPIX
IEAVSQA
IEAVEQR
IEAVPRSB
IEAVSWIN
IEAVIOCP
IEAVPSI
IEAVTERM
IEAVITAS
IEAVRCF
IEAVRELS at IEAVRELV
IEAVPIOI
IEAVRCV
IEAVPRSB at IEAVPRSR (entry point)
IEAVRFR
RSM's FRR does not attempt complex recovery. Its main objective is to record
the error and issue an SDUMP. It has some special logic based on where the error
occurred, as follows:

Error Occurred In - FRR Action

IEAVEQRI or IEAVRCFI - Restore registers for return to IEAVPFTE.
IEAVPIX - Attempt to reset page faulter.
IEAVSIRT - "Memterm" swapping in address space.
IEAVSWIN - "Memterm" swapping in address space.
IEAVPIOI - Retry for cleanup or "Memterm."
IEAVINV - Set "GO" indicator and PTLB, or retry to PTLB.
IEAVPSI - If error occurred while checking input parameters, set abend of 171.
Other - If there is a non-zero retry address, retry; otherwise continue with termination.

Recursion is not allowed. The PVT and PFTEs are dumped on the SDUMP.

The following reason codes are put into the RCARCRD field when real storage
management issues an abend with a code of C0D. All C0D abends are retried at
the next sequential instruction.

Real Storage Management ABEND Reason Codes

Code (hex) - Meaning

01 - Findpage, translate real to virtual, or the LRA instruction returned
an unexpected code for a segment, page, or frame whose existence
was implied by some RSM control block or function. Findpage,
translate real to virtual, or LRA is assumed to be correct.

02 - A GETCELL or FREECELL for the RSM cell pool failed. If
FREECELL, the error is ignored; if GETCELL, asynchronous retry
is attempted where possible.

03 - A FREEMAIN failed for space originally obtained by RSM or VSM
using GETMAIN. The error is ignored.

04 - The return code from ASM (ILRSWAP, ILRPAGIO, or ILRTRPAG)
indicates an invalid request. The recovery action taken by RSM
varies with the type of request, but the RSM function being performed
is usually terminated if ASM resources were being requested,
or continued if ASM resources were being returned.

05 - A GETMAIN for RSM control block space was unsuccessful. The
function for which the space was required is terminated.

06 - An attempt was made to release a lock which was not held. RSM
tables might be damaged due to the loss of serialization. RSM
attempts to continue normal operation.

07 - RSM control information indicated a PCB for a page should exist on
an I/O active queue or on the defer queue, but searching of the
queue(s) failed to find the PCB. It is assumed the control information
is in error and no such PCB exists.

08 - The existence of a V=R or offline root PCB was implied but no
appropriate PCB could be found on the V=R or offline root queue.
The error is ignored and indicators are reset.

09 - Swap-in's XMPOST error exit was entered, so restore will not run.
The target address space is terminated.

0A - An incorrect fix count was detected in a PFTE. The count is
adjusted to the expected value.

0B - The interprocessor communication service routine (RISGNL) could
not signal another processor as requested by IEAVINV. The
condition is ignored and normal operation continues.

0C - IEAVPIOP has discovered an undefined combination of I/O
completion status flags in the AIA after a page-in or page-out. The
condition is treated as an I/O error.

0D - IEAVDSEG was requested to destroy a non-existent or common
area segment. The request is denied.

0E - A PCB was required but none were available. The routine needing a
PCB is terminated.

0F - The attempt to complete processing of a previously deferred
FREEMAIN release has failed.

10 - An FOE could not be found on the specified TCB's fix ownership
list.

11 - An internal RSM invocation of the PGOUT function was
unsuccessful. The page remains in real storage.

12 - A swap (in or out) was requested for an address space that already has a
swap in progress, or no SPCT exists for the address space to be
swapped. The request is denied.

13 - Swap-in could not re-establish the address space due to missing or
incorrect control information (SPCT or PCBs). The address space is
abnormally terminated.

14 - An internal invocation of PGFREE failed. The error is ignored.

15 - Swap-out has detected an inconsistency in the SPCT fix entries it
has created. The error is suppressed and recovery attempted.

16 - ASCBCHAP could not enqueue or dequeue an ASCB during a swap-in
or swap-out operation. The address space is terminated.

17 - Swap-out has detected an error in the allocated frame count
(ASCBFMCT) for the address space. If possible, the count is
corrected and the swap-out continued; otherwise, the swap-out is
suppressed.

18 - No SPCT segment entry could be found for a segment whose
existence was implied by other RSM control information. The error
is ignored and the SPCT update is skipped.

19 - An internal RSM function issued a return code which was either
invalid or not applicable. System action depends on the nature of
the function.

1A - Swap-in detected a common area page that was not obtained using
GETMAIN among the input working set. The page is not made
available to the incoming address space. Some other address space
must have freed the page using FREEMAIN while the current one
was swapped out. Probable user error.

1B - During an attempt to free the frames backing a V=R region, it has
been determined that the virtual space is not backed by real storage,
or that the virtual-to-real mapping is not 1 to 1.

1C - IEAVPSI attempted to fix the ECB for a page service that will
complete asynchronously, but IEAVFXLD returned a code
indicating the fix was not accomplished.

1D - A PCB marked I/O complete (indicating that it was previously
processed by IEAVPIOP) has been passed to IEAVPIOP by ASM.

1E - A software error has been found in the AIA passed from ASM to
RSM for an I/O request. Possible errors are:
• The AIA contains data inconsistent with previous AIAs.
• The original input chain (to ASM) was invalid.
• The LSID in the XPTE was invalid.
• The LPID in the XPTE was invalid.
• A hardware I/O error occurred on a pageout PCB (this should not
occur).

1F - An invalid real storage address was returned to IEAVPRSB at entry
point IEAVPRSR.

21 - IEAVPFTE detected a discrepancy in the SQA reserve queue count
controls. Use of the SQA reserve queue is discontinued until after
re-IPL. RSM attempts to continue normal operation.

22 - IEAVTERM has found an FOE fix count that is greater than the fix count
in the corresponding PFTE. The PFTE fix count is not changed, but the
FOE is freed.

RSM Debugging Tips

1. Because the PCB free queue is a FIFO queue, it represents recent history in
RSM. Start your search of the PCB free queue with the youngest PCB
(PVTFPCBL) and look for the appropriate VBN in the PCBVBN or AIARPN.
This approach often reveals what has most recently happened to the page in
question.

2. Whenever the system wants to break the logical connection between the PCB
and the page, it sets PCBVBN to zero. Therefore, look at AIARPN to
determine what VBN the PCB was associated with (AIARPN=PCBVBN/16).


3. The PVT contains several work/save areas that belong to a unique module.
These are often useful to determine the last thing a module did.

4. At any time, there should never be more than one input PCB with a given
PCBVBN on the I/O-active or GFA-deferred PCB queues. Output PCBs are
never related.

5. The XPTVIOLP flag can be confusing. If it is on, XPTXAV must be on.
SVAUX=1 means the LPID in the XPT is a VIO LPID and not an address
space LPID.
When a page with XAV=1 and SVAUX=1 is stolen, it is paged out under an
address space LPID and SVAUX is set to zero. If the next operation on the
page is a VIO move-out, RSM tells ASM to logically transfer the address space
LPID contents to the VIO LPID contents.

6. It is sometimes useful to observe the AIANXAIA pointers in PCBs on the PCB
free queue. These pointers probably indicate the order in which I/O completed
for a group of requests.

7. To help diagnose a C0D abend, the PVTDUMP bit (byte 0, bit 7 of the PVT)
can be turned on (using superzap) to cause the RSM FRR to dump the PVT,
PFT, SQA, and current LSQA data areas.

Converting a Virtual Address to a Real Address
A virtual address contains the segment number in the first byte, the page number in
the next four bits, and the page displacement in the remaining twelve bits (that is,
sspddd - segment, page, displacement). The ASCB for the address space points to
the RSMHD. The first word (RSMVSTO) of the RSMHD is the virtual address of
the segment table (SGT). Multiply the segment number (ss) by the length of a
segment table entry (4) to locate the SGT entry (SGTE). The SGTE contains the
real address of the page table (PGT).

A real address consists of a real block number (RBN) in the first twelve bits and
a page displacement in the remaining twelve bits (that is, rrrddd - RBN,
displacement). The RBN portion of the real address of the PGT is concatenated with zero
(RBN||0) to form an index into the page frame table (PFT). This index is added to
the apparent origin of the page frame table (PVTPFTP) in order to obtain the
virtual address of the page frame table entry (PFTE). The PFTE identifies the
frame that contains the page in which the page table resides.

The second half of the first word of the PFTE is the virtual block number
(VBN). The VBN is concatenated with the displacement portion of the real address
of the page table to form the virtual address of the page table (VBN||ddd).
Multiply the page number (p) of the virtual address being converted by the length
of a page table entry (2) to locate the PGT entry (PGTE). The PGTE contains the
RBN portion of the real address that corresponds with the initial virtual address.
This RBN is concatenated with the displacement portion of the initial virtual
address to obtain the desired real address (RBN||ddd).
Figure 5-25 shows the relationship of the control blocks used to convert a virtual
address to a real address.

Given a virtual address - find the corresponding real address.

Definitions:
Virtual address = sspddd = VBN||ddd
  ss  - segment number
  p   - page number
  ddd - displacement within page
  VBN - virtual block number
Real address = RBN||ddd
  RBN - real block number
  ddd - displacement within page

1. Find the real address of the page table (RBN||d'd'd'): ASCBRSM points to
the RSMHD; RSMVSTO + (4*ss) locates the SGTE. The SGTE contains the real
address of the page table.

2. Convert the real address of the page table to a virtual address (VBN||d'd'd'):
PVTPFTP + RBN||0 locates the PFTE. The PFTE contains the VBN portion of the
virtual address of the page table.

3. Find the RBN portion of the real address: the virtual address of the page
table + (2*p) locates the PGTE. The PGTE contains the RBN portion of the
desired real address.

4. Concatenate the displacement portion of the virtual address (ddd) with the real
block number (RBN) to form the real address that corresponds to the given
virtual address: real address = RBN||ddd.

Figure 5-25. Converting Virtual Addresses to Real Addresses

Example: Converting a Virtual Address to a Real Address
This example shows how a virtual address of A9EC0 was converted to a real address.
The values used in this example (such as ASCBRSM = FC7380) were obtained from
a sample dump.

Given: Virtual address = A9EC0 (sspddd)
  ss  = 0A  (segment number)
  p   = 9   (page number)
  ddd = EC0 (displacement within page)

Step 1: Find the real address of the page table (PGT).
  ASCBRSM = FC7380 (address of RSMHD)
  RSMVSTO = 89FC00 (address of SGT)

    89FC00 (RSMVSTO)
  +     28 (4*ss)
    89FC28 (address of SGTE)

  SGTE = F0307F20
  Real address of PGT = 307F20

Step 2: Convert the real address of the PGT to a virtual address.
  RBN = 307, so RBN||0 = 3070
  d'd'd' = F20
  PVTPFTP = 78760

    78760 (PVTPFTP)
  +  3070 (RBN||0)
    7B7D0 (address of PFTE)

  PFTVBN = 87B0
  Virtual address of the PGT = VBN||d'd'd' = 87BF20

Step 3: Find the RBN portion of the real address.

    87BF20 (virtual address of PGT)
  +     12 (2*p)
    87BF32 (virtual address of PGTE)

  PGTE = 3811
  RBN portion = 381

Step 4: Form the real address for the sample.
  Real address = RBN||ddd = 381EC0
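The conversion procedure can be sketched in Python. This is a schematic aid for following a dump by hand, not IBM code: a plain dictionary stands in for storage, the masks assume the 24-bit sspddd layout described above, and only the fields the walk touches are modeled.

```python
# Schematic sketch of the virtual-to-real conversion described above.
# A dictionary maps addresses to the SGTE, PFTVBN, and PGTE values
# that would be read from a dump.

def virtual_to_real(vaddr, rsmvsto, pvtpftp, storage):
    """Translate a 24-bit virtual address (sspddd) to a real address."""
    ss = vaddr >> 16            # segment number (first byte)
    p = (vaddr >> 12) & 0xF     # page number (next four bits)
    ddd = vaddr & 0xFFF         # displacement within page

    # Step 1: SGTE = SGT origin + 4*ss; it holds the real address of the PGT.
    sgte = storage[rsmvsto + 4 * ss]
    pgt_real = sgte & 0xFFFFFF          # drop the high-order flag byte

    # Step 2: index the PFT with RBN||0 to get the VBN of the PGT's frame.
    rbn0 = (pgt_real >> 12) << 4        # RBN concatenated with zero
    pftvbn = storage[pvtpftp + rbn0]
    pgt_virt = ((pftvbn >> 4) << 12) | (pgt_real & 0xFFF)

    # Step 3: PGTE = virtual PGT address + 2*p; it holds the final RBN.
    pgte = storage[pgt_virt + 2 * p]
    rbn = pgte >> 4

    # Step 4: concatenate the RBN with the original displacement.
    return (rbn << 12) | ddd
```

With the sample dump values (SGTE at 89FC28, PFTVBN at 7B7D0, PGTE at 87BF32), translating A9EC0 reproduces the 381EC0 result worked out above.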


Auxiliary Storage Manager (ASM)

The auxiliary storage manager (ASM) controls all system direct access storage that
is allocated for virtual address space paging and for virtual input/output (VIO)
data sets. ASM supports the dynamic paging requirements of the real storage
manager (RSM) and the data set storage and retrieval requirements of the virtual
block processor (VBP). For MVS paging, ASM has the responsibility of selecting
the auxiliary storage location (slot), maintaining the slot/page mapping, and
coordinating the slot/frame transfer.

The auxiliary storage manager consists of four sections:
• I/O control
• I/O subsystem
• VIO control
• VIO group operators.

I/O control is the link between RSM and the I/O subsystem for paging requests,
and between RSM and the I/O supervisor (IOS) for swapping requests. RSM
initiates all swapping requests; the I/O is executed by the I/O subsystem and IOS.
I/O control accepts the paging/swapping requests from RSM, determines the type
of I/O to be done and when it can be started, and notifies RSM when the I/O is
completed. I/O control also records the auxiliary storage locations of all virtual
pages.

The I/O subsystem communicates with IOS to cause the physical transfer of
data between real and auxiliary storage. It allocates auxiliary storage slots, builds
paging channel programs, passes them to IOS for execution, and processes I/O
completions.

VIO control coordinates all the ASM processing required to support VIO data
sets (called logical groups by ASM). Operations on a logical group are classified
as group operations and page operations. A group operation is not allowed to
process while another group operation or page operation is processing for a
logical group. The virtual block processor (VBP) initiates group-related operations
and VIO control passes them to the VIO group operators to be processed.
RSM initiates page-related operations and I/O control and VIO control jointly
process them.

The VIO group operators maintain the logical group information that VBP
requires. The VBP group operators perform all processing necessary to create,
save, restore, and delete a logical group. These operators are invoked only by
VIO control as a result of requests from VBP.

Modules (CSECTs) belonging to each section are:

I/O Control: ILRPAGIO, ILRPAGCM, ILRSWAP, ILRSWPDR, ILRFRSLT
I/O Subsystem: ILRPTM, ILRSRT, ILRCMP, ILRMSG00
VIO Control: ILRPOS, ILRGOS, ILRVIOCM, ILRSRBC, ILRJTERM
VIO Group Operators: ILRACT, ILRSAV, ILRRLG, ILRTMRLG, ILRVSAMI

Component Functional Flow
ASM provides seven functional services. The first four are invoked by the use of
the ILRCALL macro, the remaining three via BALR:

• ASSIGN LG obtains a logical group identifier from ASM and creates a logical
group for a VIO data set.
• SAVE preserves the status of a logical group for recovery at a later time.
• ACTIVATE places a logical group into active status after it has been saved and
the saved status of the group is desired. (Used for step restart of VIO data
sets.)
• RELEASE LOGICAL GROUP deletes an entire logical group; this allows ASM
to reuse all slots associated with that logical group (VIO data set).
• TRANSFER PAGE moves the logical slot identifier (LSID) for a page from an
address space to a VIO logical group.
• REQUEST I/O transfers page-sized blocks between real storage and ASM's
auxiliary storage.
• REQUEST SWAP I/O transfers LSQA between real storage and ASM's auxiliary
storage. Page-size blocks are transferred if page data sets are used. Swap-set
size (up to 12 pages) blocks are transferred if swap data sets are used.

The following descriptions track three of these services through the component:
SAVE, which is similar to ASSIGN LG, ACTIVATE, and RELEASE LOGICAL GROUP;
REQUEST I/O; and REQUEST SWAP I/O.

Saving an LG
SAVE requests ASM to write the ASPCT containing the slot numbers (LSIDs) of a
VIO data set to SYS1.STGINDEX. ILRGOS receives control from VBP in task
mode with an ASM control area (ACA) containing the LGN of the VIO data set as
input. ILRGOS builds an ASM control element (ACE), queues it to the logical
group entry (LGE) process queue (LGEPROCQ) for that LGN, and calls ILRSAV.

ILRSAV calls ILRVSAMI, which calls VSAM to write the ASPCT to the
SYS1.STGINDEX data set. An 'S' symbol is returned by ILRVSAMI. (The 'S'
symbol is part of the VSAM key used to save this ASPCT and can be used to
uniquely identify the ASPCT for an activate request.) ILRSAV puts the 'S' symbol
in the ACE and returns to ILRGOS. ILRGOS copies it into the ACA, frees the
ACE, and returns to VBP.

Requesting I/O
RSM calls ILRPAGIO for I/O requests. An ASM I/O request area (AIA), or string
of AIAs, describes the request. ILRPAGIO determines if the request is for a VIO
page and, if it is, calls ILRPOS to process it. Otherwise, ILRPAGIO continues to
process the request.

For write requests, the previous slot for this page is freed. For read requests,
the LSID is obtained from the extended page table entry (XPTE) and put into the
AIA. The AIA is queued to the ASM staging queue (ASMSTAGQ) and ILRQIOE is
called.

ILRQIOE builds an I/O request element (IOE) for each AIA on the staging
queue, and queues IOEs to the paging activity reference table (PART) header or
to a PART entry. Each PART entry represents a paging data set and controls
activity on the data set. Since a read request is for a particular data set, the read
IOE is queued to the PART entry identified by the PART index contained in the
LSID. Write IOEs are queued to the PART header because the data set to be used
is still unknown.

If an SRB is not already scheduled for ILRPTM, ILRQIOE schedules one. The
PART monitor (ILRPTM) scans the PART entries for work and available resources
(I/O control blocks) to process the IOEs. For each PART entry with I/O to be
done, the slot sort routine (ILRSRT) is called. ILRSRT allocates slots for writes
interspersed between reads (to minimize arm movement), builds PCCW
chains, and issues a STARTIO macro to initiate IOS processing.

When I/O completes, IOS calls ASM's disabled interrupt exit (DIE) routine
(ILRCMPDI, an entry point in ILRCMP). ILRCMPDI checks for errors, and if
one occurred, returns to IOS indicating that the I/O should be handled by the
post status IOS routine and ASM's appendages (ILRCMPAE and ILRCMPNE).
If the I/O is successful, ILRCMPDI calls page completion (ILRPAGCM).
ILRPAGCM calls VIO completion (ILRVIOCM) if the I/O is for a VIO page. If it is
a non-VIO write request, ILRPAGCM takes the LSID that ILRSRT put into the
AIA and puts it in the XPTE for the page in the correct address space. The AIA
is then returned to RSM (IEAVPIOP).
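ILRQIOE's routing of IOEs can be sketched as follows. This is an illustrative sketch with assumed structures, not IBM code: reads go to the PART entry named by the PART index in the LSID, while writes go to the PART header because the target data set is not yet chosen. The LSID layout used here (PART index in the high-order byte of a four-byte value) is a simplifying assumption for the example.

```python
from collections import deque

class Part:
    """Stand-in for the PART: a header queue plus per-data-set entries."""
    def __init__(self, n_entries):
        self.header_ioes = deque()                   # write IOEs, data set unknown
        self.entry_ioes = [deque() for _ in range(n_entries)]

    def queue_ioe(self, aia):
        ioe = {"aia": aia}                           # IOE built from the AIA
        if aia["write"]:
            self.header_ioes.append(ioe)             # writes: PART header
        else:
            part_index = aia["lsid"] >> 24           # reads: entry from the LSID
            self.entry_ioes[part_index].append(ioe)

part = Part(n_entries=4)
part.queue_ioe({"write": True, "lsid": 0})
part.queue_ioe({"write": False, "lsid": 0x02000153})  # PART index 2
```

In the real component, a PART-monitor SRB (ILRPTM) would then drain these queues, calling ILRSRT to assign slots to the write IOEs as it goes.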


Requesting Swap I/O
RSM calls ILRSWAP with a chain of AIAs for either a swap-in or swap-out
request. The following discussion traces a swap-out operation.

ILRSWAP separates the non-LSQA AIAs from the LSQA AIAs and calls
ILRPAGIO to process the non-LSQA pages as a regular request I/O function.
The LSQA AIAs are queued to the ASM header of the address space (ASHSWAPQ).
If there were no non-LSQA AIAs, ILRSWAP immediately calls ILRSLSQA to
process the LSQA AIAs. Otherwise, ILRSWAP returns to RSM.

As non-LSQA AIAs complete, ILRPAGCM is given control (see "Requesting I/O").
When all non-LSQA AIAs have completed, ILRPAGCM calls ILRSLSQA to process
the LSQA AIAs. ILRSLSQA, called by ILRPAGCM or ILRSWAP, calls ILRPAGIO
to process the LSQA AIAs if there are no available swap sets. Otherwise,
ILRSLSQA assigns swap sets and initializes swap channel control workareas
(SCCWs) for all the AIAs queued to ASHSWAPQ. A count of LSQA pages
(ASHSWPCT) is incremented for each AIA. The completed SCCWs are chained to
the swap activity reference table (SART) entry SCCW queue (SRESCCW). If an
SRB is not already scheduled for the swap driver (ILRSWPDR), ILRSLSQA schedules
one. ILRSWPDR searches each SART entry for a non-zero SCCW queue, chains the
SCCWs to an IORB for that data set, and issues a STARTIO macro to initiate I/O
processing. Completed I/O is handled by ILRCMPDI as in the "Requesting I/O"
function, and ILRPAGCM is called. ILRPAGCM processes LSQA AIAs by putting
the LSID for each page into the SPCT control block for this address space, putting
the AIA on the capture queue (ASHCAPQ), and decreasing the swap count
(ASHSWPCT) by 1. When the swap count is 0, ILRPAGCM returns all the AIAs
on the capture queue to RSM (IEAVSWPC module).
Figure 5-26 shows the relationships among the important ASM control blocks.
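The capture-queue bookkeeping at the end of the swap-out can be sketched as follows. This is an illustrative sketch with assumed structures, not IBM code: each completed LSQA AIA is pushed on the capture queue and the swap count is decremented; when the count reaches zero, the whole captured chain goes back to RSM at once.

```python
# Hypothetical sketch of ILRPAGCM's LSQA swap-out completion logic.
# `ash` stands in for the ASM header; the dictionary keys mirror
# ASHCAPQ and ASHSWPCT. `return_to_rsm` stands in for the call to
# RSM's IEAVSWPC module.

def lsqa_complete(ash, aia, return_to_rsm):
    aia["next"] = ash["capture_queue"]       # push the AIA on ASHCAPQ
    ash["capture_queue"] = aia
    ash["swap_count"] -= 1                   # ASHSWPCT
    if ash["swap_count"] == 0:
        return_to_rsm(ash["capture_queue"])  # hand the whole chain to RSM
        ash["capture_queue"] = None
```

Nothing is returned piecemeal: RSM sees either no completed LSQA AIAs or all of them, which matches the all-or-nothing way a swap-out must complete.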

Component Operating Characteristics
The following topics discuss characteristics of ASM's operating environment.

System Mode
ASM uses the SALLOC lock in most page and swap processing in I/O control
modules. I/O control modules interface directly with RSM, the principal user
of the SALLOC lock. The SALLOC is held throughout processing, including the
calls to the VIO control routines ILRPOS and ILRVIOCM. The local lock is
used during assign and release logical group requests processed by ILRGOS and
ILRRLG.

[The original figure is a diagram; only its labels survive in this copy. It traces
page fault processing: a non-VIO AIA flows from the STARTIO macro through
ILRPAGIO, which creates an IOE from the AIA and works with the PART,
PCCWs, and related control blocks; ILRPOS (the page operation starter) uses the
LPID to find the LPME by way of the ASCB, LGVT, and LGEs; an LSID is shown
as a flags byte followed by a slot number.]

Figure 5-26. Relationship of Important ASM Control Blocks

An ASM class lock exists for each active address space (lockword in ASMHD).
The ASM class lock is used by the VIO control modules. ILRPTM uses an ASM
class lock (lockword in the PART) to serialize the IOE write queues. The
ILRCMPDI entry of ILRCMP (ASM's DIE routine) runs in physically disabled
mode since it is running under the I/O interrupt handler. The rest of the ASM modules
simply run in task or SRB mode, using compare and swap instructions where
necessary.

For additional information on locking, refer to the topic "ASM Serialization"
later in this section.

Address Space, Task, and SRB Structure
I/O control modules receive control in the address space of the caller with the
exception of ILRSWPDR, which is an SRB executing in master's address space.
Note also that ILRPAGCM transfers to the correct address space (TRAS) to update
the external page table entry (XPTE), which is in LSQA.
I/O subsystem modules run in SRB mode under master's address space, except
for the ILRCMPDI entry of ILRCMP and the modules it calls, which execute in the
address space interrupted by the I/O completion.
VIO group operator modules, as well as ILRGOS (VIO control module), are
tasks (locked mode) executing in the address space associated with the VIO
requests. ILRTMRLG runs in task mode, but in master's address space.
VIO control modules ILRPOS and ILRVIOCM receive control in the address
space of the caller. ILRSRBC executes in SRB mode in the address space
associated with the VIO requests.

Storage Considerations
ASM maintains four cell pools for its internal control blocks. These cell pools are
pushdown stacks and the elements at the top of the cell pools represent the last
control blocks used by ASM. There are three expandable cell pools for work areas:
ACEs, BWKs, and SWKs. The IOE cell pool is not expandable. The cell pools are
anchored in the ASMVT and the control blocks reside in SQA. The ASMVT is
in the nucleus, but most of the other ASM control blocks are in SQA. One
exception is the ASPCT, which resides in the LSQA of the associated address space.

MP Considerations
ASM takes advantage of MP by allowing both the I/O subsystem and the
ILRSWPDR module of I/O control to execute concurrently on both processors.
This is achieved through extensive use of compare and swap logic. An individual
PART entry or SART entry is 'flag' locked by the processing processor, but ASM
can process a request for another entry on the second processor.


Interfaces With Other Components
Four other components interface with ASM:
• RSM with I/O control for page and swap I/O requests, and with VIO control
for transfer page requests.
• VBP with VIO control for assign, save, activate, and release logical group
requests.
• IOS with the I/O subsystem and the ILRSWPDR routine of I/O control to process
I/O requests.
• VSAM with the VIO group operators to handle I/O to SYS1.STGINDEX.
RSM and VBP call ASM, and ASM calls IOS and VSAM.

Register Conventions
ASM modules adhere to the following register conventions when calling other
ASM modules. There are some exceptions where certain addresses are not required.
REGISTER:
0  - Parameter register, if required.
1  - Parameter register, if required.
2  - RSMHD address for the current address space or the address space
     identified by an input parameter in register 0 or 1. The ASMHD is
     addressable as part of the RSMHD.
3  - ASMVT address.
4  - Address of ATA or EPATH currently active for recovery tracking.
13 - Address of register save area, if required.
14 - Return address.
15 - Entry point address.

The I/O subsystem does not use the ASMHD and therefore does not maintain
register 2 convention.

Footprints and Traces
The most useful traces of ASM processing are its control blocks and queues,
because they document the movement of requests through ASM.
The processor work/save area vector table (WSAVT), which is pointed to by
LCCACPUS, will point to the work/save areas for the last I/O processed on the processor.
WSACASMD points to the 256-byte save/work area for ILRCMPDI (ASM's
DIE routine). WSACASMS points to two contiguous 256-byte save/work areas, the
first for ILRPTM and the second for ILRSRT. The first one is also used by
ILRSWPDR and ASM's I/O appendages (ILRCMPAE, ILRCMPNE, and ILRCMP).

ASMVT contains save areas for ASM's other I/O-related modules. ASMBWKPC
is a pool of work areas used by VIO-related modules (ILRGOS, ILRACT, ILRSAV,
and ILRRLG). Bits in the X'01' byte of the ASMVT indicate whether the IPL was
a cold, quick, or warm start.
The LGE process queues (LGEPROCQ) contain AIAs and ACEs in process, or
waiting for processing, for VIO requests.

If the PARTE is locked, the PART monitor (ILRPTM) has called or is about to call
the slot sort routine (ILRSRT). If ASMTMECB (ASMVT + X'AB') is a posted ECB,
ILRTMRLG is or was about to process the task portion of a release logical group
request.
When an ASM-locked or SRB-mode routine is processing, its functional
recovery routine is on the current FRR stack. The first word of the parameter
area passed to the FRR contains a one-byte id of the ASM module that established
the FRR, followed by three bytes of flags indicating the ASM module or entry
point in process at the time of the error. The different ids are discussed in
"Recovery Footprints."
When ASM's I/O completion module encounters the first bad slot, an error
record is built with its address at X'14' into the ASMVT. It contains the LSIDs of
the unusable slots. The first three words in the record are the address of the current
entry filled, beginning address of the record, and ending address of the record. An
entry contains one byte of flags and the three-byte LSID. If bit 0 is on, the error
occurred on a swap data set. If bit 4 is on, there was a read error. I/O error counts
are found in the ASMVT, PART entries, and SART entries. ASMERRS (ASMVT +
X'7C') is the total of error slots found on local page data sets. PARERRCT (PART
entry + X'1B') and SRERRCNT (SART entry + X'1B') are the error slots
encountered on the particular data set represented by the entry.
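Decoding one entry of that error record can be sketched as follows. This is a hypothetical helper, not IBM code: the field names are illustrative, and the only assumptions taken from the text are the layout (one flag byte, then a three-byte LSID) and the two flag bits, with bits numbered from the high-order end in the usual IBM convention.

```python
# Hypothetical decoder for one 4-byte entry in ASM's bad-slot error
# record: flags byte + 3-byte LSID. Bit 0 (high-order, 0x80) marks a
# swap data set error; bit 4 (0x08) marks a read error.

def decode_error_entry(entry):
    """entry: 4 bytes -> dict with the decoded flags and the LSID."""
    flags = entry[0]
    return {
        "swap_data_set": bool(flags & 0x80),   # bit 0 (high-order)
        "read_error": bool(flags & 0x08),      # bit 4
        "lsid": int.from_bytes(entry[1:4], "big"),
    }

e = decode_error_entry(bytes([0x88, 0x01, 0x02, 0x03]))
```

A dump reader would apply this to each entry between the record's beginning and current-entry addresses to list the unusable slots.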
In the ASMVT there are two counts, ASMIORQR (ASMVT + X'2B') and
ASMIORQC (ASMVT + X'2C'), which contain both the number of I/O requests
ASM has received and the number completed. If more requests have been received
than completed and the system is waiting, there is something wrong with ASM or
IOS.

General Debugging Approach
This description helps isolate paging problems, the most difficult problems to
debug. Paging problems (not all of which are ASM problems) fall into two main
groups - paging interlocks and incorrect or duplicate pages.

Paging Interlocks
Paging interlocks result in an enabled wait state. The following indicators hint
that the enabled wait is a paging interlock:
• The I/O request counts in the ASMVT (ASMIORQR and ASMIORQC) are not equal.

• The ASMIOCNT contains a count of the number of I/O requests sent to IOS but
not received by ILRCMP. It is necessary to determine why paging requests
received by ASM are not completing. To learn more about these requests, it is
necessary to follow I/O control block chains.
• The ASMSRBCT field indicates whether ASM's SRB for ILRPTM (the ASM PART
monitor) has been scheduled. If this field is not zero, ASM's SRB has been
scheduled but not dispatched. It is necessary to determine why the SRB has
not been dispatched.
The blocks discussed here are in the Debugging Handbook. To find the I/O
request blocks for a given page space, start with the PART entry. The PART entry
points to the first IORB.

There is one IORB for each page data set on a disk, and four for each page
data set on a drum. The first bit of the fourth byte indicates whether or not ASM
has passed the IOSB to IOS. If the bit is 0, the IORB/IOSB is available. If the bit is
1, the IORB/IOSB is in use. The IORB points to the IOSB and to the first of a
queue of PCCWs. For an active I/O request, the third word into the PCCW points
to the associated AIA, and the second word points to the next PCCW on the chain.

If the request has been sent to IOS and not returned, it is necessary to trace IOS
processing. If I/O processing has caused a page fault or a request for an enabled
lock, the interlock is probably explained. Either ASM could not get the resources
to handle the page fault, the page is already in use and this request is backed up
behind the previous one, or the holder of the lock has page-faulted and the page
fault cannot be resolved.

Incorrect Pages
It is almost impossible to determine from a dump what caused the wrong page to be
written or read. At best, a dump provides clues as to which general area is causing
the problem. Intensive code reviews are then necessary to find it. Frequently,
traps must be applied to narrow the area further.
The following paragraphs contain descriptions of how to find various pieces of
useful information. There is no attempt to describe how to use them because
there is no general method.
It is first necessary to determine which page contains bad data and whether the
whole page or only part of it is bad. If possible, also determine which page has
overlaid the bad page. If only part of the page is bad, the error probably occurred
while handling a track overflow record to or from an alternate track. Check for
an invalid first or last part of a page. ASM is unlikely to be the cause of invalid
data in the middle of the page.
Incorrect pages cause a system failure when the page is used by a system task or
by a routine holding a critical system resource. The invalid page is more likely to
cause an address space to fail because of program checks that result from invalid
data. These failures are rarely attributed to invalid pages.

Section 5: Component Analysis

5.6.9

Scan the SYS1.LOGREC data set for any improbable program checks and obtain
any associated dumps. Multiple versions of the same problem are helpful in
suggesting a pattern for the error. For example, the error might only occur for the
second page of LSQA or only on a page associated with an overflow record.

Finding the LSID for a Given Page
A virtual address contains the segment number in the first byte, the page number in
the next four bits, and the page displacement in the remaining bits, in the form
sspddd (segment, page, displacement). The ASCB for the address space points to
the RSMHD. The first word of the RSMHD is the virtual address of the segment
table. Multiply the segment number (ss) by the length of the segment table entry
(4) to locate the correct entry. It contains the real address of the page table (PGT).
Convert this address to a virtual address. Then locate the correct extended page
table entry (XPTE) by adding 16 times the length of the page table entry (2), and
adding the page number (p) times the length of a XPTE (12).
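The table-walking arithmetic just described can be sketched as follows (an illustrative sketch only, not part of ASM; the function and variable names are invented, and only the entry lengths come from the text):

```python
def xpte_offsets(vaddr):
    """Split a virtual address of the form sspddd and compute the offsets
    used to walk from the segment table to the XPTE, per the text:
    SGT entry = 4 bytes, PGT entry = 2 bytes, XPTE = 12 bytes,
    16 page table entries per segment."""
    ss = (vaddr >> 16) & 0xFF     # segment number: first byte
    p = (vaddr >> 12) & 0xF       # page number: next four bits
    ddd = vaddr & 0xFFF           # displacement: remaining bits
    sgte_offset = ss * 4              # offset of the SGT entry
    xpte_offset = 16 * 2 + p * 12     # past 16 PGTEs, then p XPTEs
    return ss, p, ddd, sgte_offset, xpte_offset

# Sample address 07A12C: segment 07, page A, displacement 12C.
print(["%X" % v for v in xpte_offsets(0x07A12C)])  # -> ['7', 'A', '12C', '1C', '98']
```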

The XPTE contains information about the status of this page on auxiliary
storage. If either the XPTVALID or XPTVIOLP flag is on, there is a slot assigned
to this page. If XPTVALID is on, the LSID (slot identifier) is in the XPTE. If the
page is duplexed, two LSIDs are in the XPTE (one for each slot). If the XPTVIOLP
flag is on, an LPID, instead of an LSID, is in the XPTE. To relate the LPID to an
LSID, see the following topic "Finding LSIDs of VIO Data Sets".
Finding LSIDs of VIO Data Sets
The ASPCT is used to record the auxiliary storage locations (LSIDs) of VIO data
set pages. Only a 1088 byte base ASPCT is created at ASSIGN LGN time. This
ASPCT can handle up to 1 megabyte of VIO data set space. If more than 1 megabyte of VIO space is used, the ASPCT is expanded as follows:
1. For each 256 megabytes of space up to 1 billion bytes, an ASST extension
is built.

2. For each megabyte of space, an LPME extension is built.
Each ASST or LPME extension requires 1088 bytes of storage. Each ASST
extension contains a vector table of LPME extension addresses. The ASPCT
(base and all extensions) resides in the LSQA of the associated address space.
The LPID is eight bytes. The first four bytes contain an LGID, logical group
(VIO data set) identifier. The second four bytes contain a relative page number
(RPN).


When given an LGID, there are two methods to locate an ASPCT:
1. The ASCB (of the desired address space) points to the RSMHD. The
ASMHD is part of the RSMHD. ASHLGEQ in the ASMHD is the queue of
LGEs (active VIO data sets) related to this address space. Searching through
the address space's ASHLGE queue, one of the LGEs will have an LGELGID
field that matches this LGID. This same LGE has the address of the needed
ASPCT (LGEASPCT).
2. Another way to locate an ASPCT from an LGID is to follow the CVT to the
LGVT (CVTASMVT, ASMLGVT). Using the LGID as an index, locate the
appropriate LGVT entry. The LGVT entry contains the address of the LGE
that contains the address of the needed ASPCT.
With the appropriate ASPCT, now use the RPN portion of the LPID as an index
to locate the LPME containing the associated LSID.
Figure 5-27 shows the pointers and control blocks described in the following
paragraphs.
If A' and AA are both zero, use the LL to index ASPLPMES in the ASPCT
base for the LPME containing the LSID.
Otherwise, use A' to index ASPASSTP for the address of the appropriate ASST
extension. Use AA to index the ASPSECTA of the ASST extension for the address
of the appropriate LPME extension. And use LL to index the ASPSECTA of the
LPME extension for the LPME containing the LSID.
The LSID is the slot identifier for this page of the VIO data set. This LSID can
be related to the ASM control blocks PART and PAT and to the actual paging
device. See the following topic "Locate PART and PAT Bit".

[Figure: Part 1 shows the LPID; its RPN portion (bytes 0-3) supplies three index
fields, here called A', AA, and LL:
A'  indexes the base ASPCT, ASPASSTP.
AA  indexes the ASST extension, ASPSECTA.
LL  indexes the LPME extension, ASPSECTA.
Part 2 shows the lookup chain: the ASPCT base (header, ASPASSTP, ASPLPMES),
the ASPCT ASST extension (header, ASPSECTA), and the ASPCT LPME extension
(header, ASPSECTA), with A', AA, and LL indexing each level in turn to reach the
LPME containing the LSID.]

Figure 5-27. Locating An LSID From An LPID (Parts 1 and 2)
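The three-level indexing shown in Figure 5-27 can be sketched as follows (an illustrative sketch only; the dictionary layout merely stands in for the ASPCT, ASST extension, and LPME extension control blocks and is not their actual format):

```python
def find_lpme(aspct, a1, aa, ll):
    """Walk the ASPCT structure of Figure 5-27 to the LPME holding the LSID.
    aspct is a stand-in dict:
      aspct['ASPLPMES'] - LPMEs in the base ASPCT (first megabyte of VIO space)
      aspct['ASPASSTP'] - ASST extension addresses, indexed by A'
    Each ASST extension's 'ASPSECTA' holds LPME extension addresses (index AA);
    each LPME extension's 'ASPSECTA' holds LPMEs (index LL)."""
    if a1 == 0 and aa == 0:
        return aspct['ASPLPMES'][ll]          # base ASPCT covers this page
    asst_ext = aspct['ASPASSTP'][a1]          # A' -> ASST extension
    lpme_ext = asst_ext['ASPSECTA'][aa]       # AA -> LPME extension
    return lpme_ext['ASPSECTA'][ll]           # LL -> LPME containing the LSID

# Tiny demonstration structure (two levels populated).
demo = {'ASPLPMES': ['LSID-0', 'LSID-1'],
        'ASPASSTP': [None,
                     {'ASPSECTA': [None,
                                   {'ASPSECTA': ['LSID-X', 'LSID-Y']}]}]}
print(find_lpme(demo, 0, 0, 1), find_lpme(demo, 1, 1, 0))
```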

Locate PART and PAT Bit
Suppose LSID 0004E3B0 was found in the XPTE that represents the sample
address 07A12C:
PART entry index is 04.
Relative byte address (RBA) is E3B0.
The PART has one entry for each page data set, each having a pointer to its
PAT. The PAT is a cylinder bit mapping of this page data set. PATCYLMW is
the number of words that map a cylinder. PATCYLSZ, slots per cylinder, is the
number of significant bits in each cylinder mapping.
For device 2305-2:
PATCYLMW is 1
PATCYLSZ is 1A (26).
To locate the bit in the PAT map for slot E3B0 (58288):

1. address of map word = (address of PATMAP) + ((58288/26) x PATCYLMW
x (bytes in a word)) = (address of PATMAP) + 8964
2. bit in the map word (origin 0) = 58288//26 = 22.
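The two steps above can be sketched as follows (an illustrative sketch; pat_bit is an invented name, and the split of the LSID into a PART entry index and a slot number follows the example at the start of this topic):

```python
def pat_bit(lsid, patcylmw, patcylsz):
    """Locate the PAT map bit for a slot, per the text: PATCYLMW is the
    number of words mapping one cylinder, PATCYLSZ the number of slots
    (significant bits) per cylinder.  Returns the PART entry index, the
    byte offset of the map word into PATMAP, and the bit number (origin 0)."""
    part_index = lsid >> 16        # e.g. LSID 0004E3B0 -> PART entry 04
    slot = lsid & 0xFFFF           # e.g. slot E3B0 (58288)
    cylinder = slot // patcylsz                # which cylinder's map word
    word_offset = cylinder * patcylmw * 4      # 4 bytes in a word
    bit = slot % patcylsz                      # remainder selects the bit
    return part_index, word_offset, bit

# 2305-2: PATCYLMW = 1, PATCYLSZ = X'1A' (26).
print(pat_bit(0x0004E3B0, 1, 26))  # -> (4, 8964, 22)
```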

Figure 5-28 shows the control blocks involved in relating a virtual address to
the PART and PAT.

[Figure: PSA+X'10' (FLCCVT) points to the CVT; CVT+X'2C0' (CVTASMVT)
points to the ASMVT; ASMPART points to the PART. Each PART entry (PARTE)
points to its PAT header and bit map (PAREPATP) and to its EDB (PAREEDB);
the EDB (EDBLPMBA) points to the LPMB. PSA+X'224' (PSAAOLD) points to
the current ASCB; ASCBRSM points to the RSMHD, whose first word (RSMVSTO)
locates the segment table (SGT); an SGTE locates a PGT of 16 PGTEs followed
by 16 12-byte XPTEs. The sample 12-byte XPTE contains:
40 80 00 04 E3 B0 00 00 00 00
that is, LSID 0004E3B0 at offset 2.]

Figure 5-28. Relating the Virtual Address to the PART and PAT


Converting a Slot Number to Full Seek Address
The full seek address can be used to read the record from the disk and determine
exactly what it contains.
The PART entry points to the AMB extent descriptor block (EDB) for the data
set. The EDB and its associated LPMB(s) describe the data set on the device. The
EDB consists of an 8-byte header followed by entries, one for each extent. The
second byte of the header contains the number of entries and the next two bytes
contain the length of an entry.
The relative byte address (RBA) is calculated by multiplying the slot number by
4K. The extent containing this slot is found by comparing the RBA to the low and
high RBAs in each extent. These are found at X'C' and X'10' in the EDB entry.
The second word of the entry thus found points to an LPMB.
To find the relative cylinder, subtract the low RBA of the extent from the RBA
and divide by the allocated unit size (LPMB + 4). To find the relative track, take
the remainder in the division just performed and divide it by the bytes per track
(LPMB + 8).
The remainder of the bytes-per-track division is the first step toward finding the
record number. Add 4095 to the remainder, divide the result by 4096, and add 1.
The result is the R of MBBCCHHR.
To find the CCHH, multiply the relative cylinder by the number of tracks per
allocated unit (LPMB + X'10') and add the relative track (as computed by the
method just shown) and the starting track from the EDB entry + 8. Divide this
result by the number of tracks per cylinder (LPMB + X'12'). The quotient is the
CC and the remainder is the HH.

For example, when given an RBA of X'E3B0' (58288), calculate the
MBBCCHHR for device 2305-2.
(Reference IDAEDB and IDALPMB macros for fields.)

M      = extent number = 0
BB     = 0000
Rel CC = (RBA - EDBLORBA)/LPMAUSZ
       = (58,288 - 0)/53,248 = 1
Rel HH = ((RBA - EDBLORBA)//LPMAUSZ)/LPMBPTRK
       = 5040/13,312 = 0
CC     = (Rel CC * LPMTRRAU + Rel HH + EDBSTTRK)/LPMTPC
       = (1 * 4 + 0 + 8)/8 = 1st cylinder
HH     = (Rel CC * LPMTRRAU + Rel HH + EDBSTTRK)//LPMTPC
       = 12//8 = 4th track
R      = [((((RBA - EDBLORBA)//LPMAUSZ)//LPMBPTRK) + LPMBLKSZ - 1)/LPMBLKSZ] + 1
       = ((5040 + 4095)/4096) + 1 = 3rd record

Therefore: MBBCCHHR = 0000000001000403
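The worked example can be reproduced with the following sketch (an illustration only; mbbcchhr and its parameter names are invented stand-ins for the LPMB and EDB fields referenced in the text, where '/' denotes the quotient and '//' the remainder):

```python
def mbbcchhr(rba, edblorba, lpmausz, lpmbptrk, lpmtrrau, lpmtpc, edbsttrk,
             extent=0, blksz=4096):
    """Convert an RBA within one extent to a full seek address MBBCCHHR,
    following the arithmetic in the text.  Field values would come from
    the EDB entry and LPMB for the page data set."""
    rel = rba - edblorba
    rel_cc, rem = divmod(rel, lpmausz)     # relative cylinder, leftover bytes
    rel_hh, rem = divmod(rem, lpmbptrk)    # relative track, bytes into track
    tracks = rel_cc * lpmtrrau + rel_hh + edbsttrk
    cc, hh = divmod(tracks, lpmtpc)        # cylinder (CC) and head (HH)
    r = (rem + blksz - 1) // blksz + 1     # record number (R) on the track
    return '%02X%04X%04X%04X%02X' % (extent, 0, cc, hh, r)

# Worked example from the text: RBA X'E3B0' (58288) on a 2305-2.
print(mbbcchhr(0xE3B0, 0, 53248, 13312, 4, 8, 8))  # -> 0000000001000403
```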

Unuseable Paging Data Sets
Certain types of I/O errors received at completion of I/O indicate that ASM is
either unable to, or would be ill-advised to, access a particular auxiliary storage
data set any longer. ILRCMPAE, an entry in ILRCMP, receives these errors and
marks the data set as unuseable. For page data sets, the DSBD flag in the PART
entry is turned on. For swap data sets, the DSBD flag in the SART entry is turned
on. Both flags are X'0A' into the respective entries, and are set to X'04'.
ILRMSG00 is then called to determine whether ASM can continue processing
without the data set.
If ASM is unable to continue processing, ILRMSG00 issues message ILR008W
and terminates the system with a X'02E' wait state.
At this point, a stand-alone dump should be taken to determine which of the
above conditions occurred. The console sheet, if available, might also help because
ASM may have previously issued message(s) ILR009I.
If ASM is able to continue processing without the unuseable data set, message
ILR009I is written to the operator. This message indicates which volume contains
the unuseable data set. If this message occurs, use the DUMP command to take
an SVC dump of master's address space to determine what error occurred. The
options specified should include NUC and SQA.
To determine from the dump what error occurred, the PART or SART entry for
the unuseable data set and the IOSB for the failing request must be located. Use
the AMDPRDMP service aid (print dump) with the ASMDATA control statement to
print the dump. One of the following conditions occurred on the data set to make
it unuseable:
• If the number of write errors (X'18' into the entry: PARERRCT or
SRERRCNT) is 175, ASM has stopped using the data set because it has incurred
too many write errors (one way for this to happen is if the data set was not
formatted).
• If the completion code (X'0D') in the IOSB is a X'51', ASM has stopped using
the data set because there is no longer a path to the device (this could happen as
a result of an ACR condition).

• If the completion code in the IOSB is a X'6D', ASM has stopped using the data
set because the channel or the device has become non-operational.


• If the completion code in the IOSB is a X'41', the device status in the IOSB
(offset X'1B') is X'02', and the sense data in the IOSB (offset X'2A') is
X'1000', ASM has stopped using the data set because an equipment check
occurred.
• If the completion code in the IOSB is a X'41' and the channel status in the
IOSB (offset X'19') is X'08', X'04', or X'02', ASM has stopped using the data
set because a channel check occurred.
The system is terminated only if this unuseable data set (or several unuseable
data sets) caused one of the following conditions:
• The unuseable paging data set contains PLPA pages and the duplex data set,
if any, is already unuseable or full.
• The unuseable paging data set is the duplex data set and not all PLPA pages
are accessible (that is, the PLPA paging data set or a data set containing PLPA
pages is unuseable).
• The unuseable paging data set is the last local paging data set.

Page/Swap Data Set Errors
Figure 5-29 shows the messages issued and the actions taken by the ASM I/O
subsystem for various error conditions with the page and swap data sets.
Duplex      Error Conditions               Message(s)   ASM Action Taken
Status                                     Issued

Duplexing   PLPA Full,
Active        Common *Available            ILR005I      Spill to Common
              Common **Unavailable         ILR010I      Duplex Only
            PLPA Bad                       ILR009I,
                                           ILR010I      Duplex Only
            Common Full,
              PLPA Available               ILR006I      Spill to PLPA
              PLPA Unavailable             ILR010I      Duplex Only
            Common Bad                     ILR009I,
                                           ILR010I      Duplex Only
            Duplex Full,
              PLPA or Common Available     ILR007I      Suspend Duplexing
              PLPA and Common
                Unavailable                ILR008W      Wait X'03C'
            Duplex Bad,
              PLPA and Common Available    ILR007I      Suspend Duplexing
              PLPA or Common Full          ILR007I      Suspend Duplexing
              PLPA or Common Bad           ILR008W      Wait X'02E'
              PLPA and Common
                Unavailable                ILR008W      Wait X'02E'

Duplexing   PLPA Full                      ILR005I      Spill to Common
Not         PLPA Bad                       ILR008W      Wait X'02E'
Active      Common Full                    ILR006I      Spill to PLPA
            Common Bad                     ILR008W      Wait X'02E'
            PLPA and Common Full           ILR008W      Wait X'03C'

In Either   Local Bad                      ILR009I      Stop Writes to Bad
Case                                                    Data Set
            Last Local Bad                 ILR008W      Wait X'02E'
            Swap Bad                       ILR009I      Stop Swap-outs to
                                                        Bad Data Set
            Last Swap Bad                  ILR009I      All Swap-outs Done
                                                        to Page Data Sets

* Available - Data Set Neither Full Nor Bad
**Unavailable - Data Set Either Full or Bad

Figure 5-29. Page/Swap Data Set Error Action Matrix


Error Analysis Suggestions
The following are some guidelines for determining ASM problems:
• Print the dump specifying ASMDATA as a control statement to AMDPRDMP.

• Check SYS1.LOGREC and the LOGREC buffer to see if ASM's mainline has
abended. If it has, a request might have been lost or mishandled.

• Check the trace table for recent ASM activity. The key trace table entries are
SRB dispatches for ILRPTM (address of SRB in ASMPSRB, X'58' into ASMVT),
or ILRSWPDR (address of SRB in SARSRBP, X'30' into SART). Also look for
schedules of the post status SRB closely following an interrupt for ASM I/O
(CSW points to the nucleus area), which could be temporary or permanent I/O
errors coming to ILRCMP or one of its entry points.
• Check for outstanding I/O requests and determine the status of the I/O by
looking at the UCB and IOSB.
• Check for I/O errors on the paging packs, either on the error record (X'14'
into the ASMVT), or on SYS1.LOGREC.
• Scan the ASMHD's LGE process queues (LGEPROCQ) for current VIO
activity. Determine the extent of ASM processing for these LGEs. Determine
the logical group for which a VIO group operator has been requested.
• Scan the PART entries for the PAREFSIP flag which indicates that the PART
entry is locked and that slot sort or part monitor should be processing. Check
the PART and PARTE IOE queues for requests waiting for I/O. Also scan the
SART for the SRELOCK flag indicating that ILRSWPDR should be processing.
• If you are interested in a specific request, find the request on ASM queues and
determine the extent of ASM processing for the request. For an I/O request,
convert the virtual page number to an LSID.

• Scan the BWK and SWK cell pool for a work area that is not chained to another
work area (offset 0). An unchained work area indicates current ASM processing
or a lost work area.
• Check for suspended ILRPTM or ILRSWPDR SRBs by scanning the PCB I/O
queues (pointed to by the RSMHD and the PVT) for a suspended SRB whose
address matches the ILRPTM or ILRSWPDR SRB's address. Although this
situation should not occur, it does occur occasionally.


Validity Checking
ASM is a nucleus-resident, performance-oriented component. For this reason, there
is minimal validity checking in mainline code. In addition, few of ASM's problems
can be attributed to invalid control blocks; this is probably because ASM
communicates only with other system components. In both mainline and recovery
code, critical global control blocks such as the ASMVT, PART, and SART are used
without any validity checking. ASM's recovery routines do validity check control
blocks (and queues of these control blocks) that represent work to be processed by
ASM. Some of these control blocks are the ACE, AIA, LGE, and PCCW. In most
cases, if a control block fails the validity check, it is no longer used by ASM. The
only exception is the IORB-IOSB-SRB combination, which is refreshed.

ASM Serialization
Serialization of ASM processing is done using the SALLOC and ASM global locks,
the local lock of the current address space, compare-and-swap (CS) logic, and
control block queueing.

SALLOC Lock
ASM uses the SALLOC lock to serialize most page and swap processing in I/O
control. The I/O control modules interface directly with RSM, the principal user
of SALLOC, either as the called routine or the calling routine. The SALLOC is
held throughout processing, including calls to the VIO ILRPOS and completion
routines. The SALLOC is used to serialize most processing of:
XPTEs    - complete coverage.
PCB/AIAs - complete coverage, except AIA noted below.
SPCTs    - complete coverage.
SART     - complete coverage, except where noted below.
SATs     - complete coverage.

Specific areas of other control blocks serialized by the SALLOC lock are:
ASMVT - Work save areas.
        I/O control section fields:
          Flags ASMDUPLX
                ASMNOCWQ
                ASMCALLQ
                ASMNODPX
                ASMPLPAF
                ASMCOMMF
          LGVT pointer
        Non-VIO slot allocated count.
        Expansion of ASM pools.
ASMHD - I/O control flags.
        Swap and page counters.
        Swap queue.
ASCB  - Non-VIO slot allocated count.
LGVT  - Available LGVTE queue.
        Expansion of the LGVT.
PART  - Count of local page data sets.

Modules whose processing is serialized by the SALLOC lock are:
ILRPAGIO - complete coverage, held by caller.
ILRPAGCM - complete coverage, obtained at entry.
ILRFRSLT - complete coverage, except the ILRFRSL1 entry point, where the
           caller may or may not hold the lock. The lock is not obtained by
           this module; it is held only if held by the caller.
ILRSWAP  - complete coverage, held by caller.
ILRPTM   - only obtained to process data set full conditions for non-local
           page data sets.
ILRCMP   - only obtained to process I/O completion error conditions that
           may require operator notification.
ILRMSG00 - complete coverage for main entry point, held by caller.
ILRPOS   - complete coverage, held by caller (except for the ILRTRANS
           entry point).
ILRVIOCM - complete coverage, held by caller.
ILRGOS   - only obtained for LGVT processing and GETMAIN/FREEMAIN
           requests.
ILRPGEXP - only obtained to adjust the SART to reflect the addition of a new
           swap data set and to update the count of local page data sets in
           the PART.
ILRTERMR - obtained when referencing the above control blocks.
ILRPEX   - obtained when expanding an ASM pool.

ASM Class Locks
The ASM lock is a global spin class lock. A lockword must be provided when
obtaining or releasing an ASM class lock. A class lock exists for each active address
space. The lockword is in the ASMHD. It is used by the VIO controller modules.
A class lock is also defined for the PART write queues, with its lockword in the
PART header. This lock serializes the four FIFO IOE write queues in the PART.
The address space class locks serialize processing of the following control blocks:

AIA   - VIO controller flags, LPID field.
ASMHD - VIO controller flags, LGE queue base pointer.
ASCB  - VIO slot allocation count.
LGE   - complete coverage.
ACE   - complete coverage.
ASPCT - complete coverage while group operations are in progress.
        Group operations and page operations cannot be executed in
        parallel. VIO controller processing of the LGE process queue
        provides this serialization.

The address space class locks serialize processing in the following modules:
ILRGOS   - partial, obtained when processing the above control blocks.
ILRPOS   - complete coverage.
ILRSRBC  - partial, obtained when searching the LGE queue and LGE process
           queues.
ILRVIOCM - complete coverage.
ILRJTERM - partial, obtained when adding ACEs to the LGE process queue.

Local Lock of Current Address Space
The local lock is used by VIO controller and VIO group operator modules to
serialize certain VIO related operations. It is used by ILRGOS (held on entry) and
ILRJTERM (obtained) to serialize release LG requests with the internal ASM
deactivate function used to clean up VIO logical groups for a terminating job. The
local lock is also used by most VIO-related modules to allow use of branch entry
GETMAIN, rather than the SVC route.
Compare and Swap (CS) Serialization
Certain modules of ASM run without locks, requiring CS serialization of pointers,
flags, and counts. Where routines running with the locks change fields used by
unlocked routines, CS must be used. The I/O subsystem and VIO group operators
run unlocked and are the principal users of compare and swap. Control blocks
serialized via CS include:
PART  - a special CS lock exists for each PARTE, controlled by the PART
        monitor. This lock is used mainly for execution control. Most
        fields are still serialized by CS. The IOE write queues are the
        exception described above.
PATs  - complete coverage.
ASMVT - I/O subsystem and group operator sections:
          I/O error count.
          unreserved slot count.
          pool controllers.
          VIO slot count.
SART  - a special CS lock exists in each SARTE to serialize swap driver
        processing of the swap data sets. Other fields updated by the swap
        driver or by I/O completion processing of the I/O subsystem are
        updated with CS.

The ASM modules that run without locks, using CS to serialize control block fields
are:
ILRSWPDR
ILRPTM
ILRSRT
ILRCMP
ILRSAV
ILRACT
ILRRLG
ILRTMRLG
ILRVSAMI

Serialization via Control Block Queues
Certain ASM control blocks are serialized via their available queues. The blocks
are kept on available queues and removed when needed. While in use the block is
so marked and associated with a specific operation and/or control block. Control
blocks included in this category are PCCWs, IORBs, and SCCWs.
The ASPCT is a special case. VIO control enforces the rule that page and group
operations cannot be performed in parallel for a given logical group and its ASPCT.
This is controlled by the LGE process queue. While paging operations are being
performed, the ASPCT is serialized via the ASM class lock of the owning address
space. While a group operation is in progress, ASPCT serialization is maintained
by the ACE for the group operation that is on the LGE process queue. This ACE
prevents any other processing of the ASPCT until the group operation completes.

Recovery Considerations
The philosophy of ASM's recovery is to allow the system and ASM to continue
processing. To accomplish this, the first step in ASM's recovery routines is to
validity check any control block or queue that might have been affected by the
error, for example, the AIAs on the ASMSTAGQ. This allows future ASM
processing to proceed without error. The second step in ASM's recovery is to
notify ASM's caller that an error has occurred. In a few instances where ASM is
directly invoked by RSM (such as ILRPAGIO or ILRSWAP), ASM recovery
attempts to retry and return to RSM with a failing return code. When an error occurs
during ASM processing that runs asynchronously, ASM recovery queues the failing
request for eventual return to RSM. When an error occurs during ASM processing
of a VIO group operator request, ASM recovery cleans up its resources and allows
the associated task to terminate.


Recovery Traces
A dump of SYS1.LOGREC is a prerequisite to debugging ASM problems. ASM's
recovery always records the SDWA to the SYS1.LOGREC data set. It is the most
convenient way of determining that recovery has been invoked. The recovery
routine ID in the SDWA indicates which recovery routine was invoked.
ASM has a number of system abend completion codes ('08x' series) that are
always set up for retry. The purpose of these ABENDs is to record to
SYS1.LOGREC logical errors that have occurred in ASM's mainline processing.

Recovery Structure
ASM has eight recovery routines for ASM mainline:
• ILRIOFRR is an FRR that provides recovery for ILRPAGIO, ILRPOS,
ILRPAGCM, and ILRVIOCM. It also acts as a router, giving control to the
routines in ILRSWP01.
• ILRSWP01 contains recovery routines for ILRSWPDR and ILRSWAP.
• ILRSRT01 is an FRR that provides recovery for part monitor (ILRPTM) and
slot sort (ILRSRT).
• ILRCMP01 is an FRR that provides recovery for the I/O completion routine
(ILRCMP).
• ILRGOS01 is both an FRR and an ESTAE that provides recovery for ILRGOS,
for the group operators ILRSAV, ILRACT, and ILRRLG, and for ILRVSAMI.
• ILRTMI01 is the ESTAE that provides recovery for ILRTMRLG and for its path
through ILRVSAMI.
• ILRSRB01 is an FRR that provides recovery for ILRSRBC.
• ILRFRR01 is a validity check routine called by most of the recovery routines.
Additional recovery routines are:
• TERMRFRR is an FRR that is an entry point into, and provides recovery for,
ILRTERMR.
• ILRJTM01 is an FRR that is an entry point into, and provides recovery for,
ILRJTERM.
• ILRMSG01 is an FRR that is an entry point into, and provides recovery for,
ILRMSG00.


• ESTAER is an ESTAE that is the entry point into and provides recovery for
ILRPGEXP.
• ESTAEXIT is an ESTAE that is an entry point into and provides recovery for
ILRPREAD.

Recovery As a Debugging Tool
Recovery has a beneficial effect on problem solving primarily because having it
invoked isolates the problem to a specific area of ASM. If there is a paging
interlock or duplicate page problem subsequent to an abend in ASM, the two are
probably related, and the first error provides information useful in debugging
the second problem.
Recovery ignores invalid control blocks and truncates some of ASM's internal
queues in order to allow ASM to continue processing. Therefore, recovery will
cover up valid problems that cause code overlays in ASM and other system
components.
The primary culprit in covering up errors is the non-historical nature of ASM
resource queues, which results in rapid reuse of critical control blocks. The only
valuable information left by the recovery is the SDWA with its variable recording
area in the SYS1.LOGREC data set. At the very least, this record provides
sufficient information to trap the problem when it recurs.

Recovery Footprints
FRR/ESTAE Work Areas

ILRATA and ILREPATH are mapping macros that define the areas required by
ASM modules to provide tracking information for the FRRs and ESTAEs.
• ILRATA defines the six-word parameter area passed to the ASM routine issuing
the SETFRR macro, or it defines the parameter area passed to the ASM routine
issuing the ESTAE macro. It contains a module ID in the first byte, flags in the
next three bytes, and four words which have module-dependent contents.
The IDs of the ASM modules are:
ILRPAGIO ('01'X)     ILRSWPDR ('05'X)     ILRCMPDI ('09'X)
ILRPAGCM ('02'X)     ILRGOS   ('06'X)     ILRCMPNE ('0A'X)
ILRSWAP  ('03'X)     ILRPTM   ('07'X)     ILRCMPAE ('0B'X)
ILRTRPAG ('04'X)     ILRSRBC  ('08'X)     ILRCMP   ('0C'X)

• ILREPATH defines a variable-length area containing any additional recovery
audit-trail data required for recovery by ASM recovery routines. The address of
the EPATH, if present, is in the ATA. There are four variations of the EPATH
area.

The formats of ILRATA (ASM tracking area - ATA) and ILREPATH
(recovery audit trail area - EPATH) are described later in this chapter in the
topic "ASM Recovery Control Blocks".

SDWA Variable Recording Area
ASM uses the SDWA variable recording area (SDWAVRA) to save the
contents of the ATA (and EPATH, if present) upon entry to some of the recovery
routines. This preserves the original state of the error before recovery took place.
ILRIOFRR saves the ATA. ILRGOS01, ILRCMP01, and ILRSRT01 save the ATA
and EPATH. ILRTMI01 saves only the EPATH.

ASM Diagnostic Aids
This section contains diagnostic aids that are helpful in debugging problems in
ASM. Topics included are:
• C0D ABEND Meanings for ASM
• ASM Recovery Control Blocks
• Additional ASM Data Areas


C0D ABEND Meanings for ASM
An RSM routine has found one of the following conditions which should not occur
and has set the appropriate return code in register 15:
RC 4  - The count of available swap sets for a specific swap data set is non-zero
        but no available swap sets could be found.
RC 8  - The total count of available swap sets is non-zero but none of the swap
        data sets contain available swap sets.
RC 12 - The group operations starter has returned from one of the group
        operators but the ACE is not the first one on the LGE queue.
RC 16 - The memory termination resource manager for ASM has found that
        LSQA is not available for an address space that is abnormally terminating
        for one of the following reasons:
        1. the address space is not swapped in
        2. the address space is in the process of being swapped in
        3. the RSMLSQA frame queue is unusable.
RC 20 - The ASM SRB controller has found an AIA or ACE on the LGE process
        queue which does not have the LPID converted flag on.
A software error record is written to SYS1.LOGREC and recovery processing
continues.

ASM Recovery Control Blocks
During error recovery and cleanup processing, the ASM recovery routines communicate with other routines by using the ASM tracking area (ATA) and recovery
audit trail area (EPATH).

ASM Tracking Area (ATA)
The ATA contains information necessary for the recovery or cleanup processing
performed by the ASM recovery routines. The ATA is mapped to the six-word
work area returned by SETFRR when an FRR is established. For task mode
routines, the ATA is mapped to the parameter area that is passed via the ESTAE
macro.


The mapping macro name is: ILRATA.

Disp  Name        Size    Description
0     ATA         24      ASM Tracking Area
0     ATAMODID    1       ID of module establishing recovery routine.
      ATAMPGIO      01      ILRPAGIO module ID.
      ATAMPGCM      02      ILRPAGCM module ID.
      ATAMSWAP      03      ILRSWAP module ID.
      ATAMTRPG      04      ILRTRPAG module ID.
      ATAMSWPD      05      ILRSWPDR module ID.
      ATAMGOS       06      ILRGOS module ID.
      ATAMPTM       07      ILRPTM module ID.
      ATAMSRBC      08      ILRSRBC module ID.
      ATAMCMPD      09      ILRCMPDI module ID.
      ATAMCMPN      0A      ILRCMPNE module ID.
      ATAMCMPA      0B      ILRCMPAE module ID.
      ATAMCMP       0C      ILRCMP module ID.
1     ATASFLGS    3       Bit map representing logical sections of ASM
                          routines; set to 1 on entry, set to 0 on exit.
      ATAQIOE       800000  ILRQIOE flag.
      ATASLSQA      400000  ILRSLSQA flag.
      ATASCOMP      200000  SWAPCOMP flag.
      ATAVIOCM      100000  ILRVIOCM flag.
      ATAPCOMP      080000  PAGECOMP flag.
      ATAPOS        040000  ILRPOS flag.
      ATAPAGIO      020000  ILRPAGIO flag.
      ATAPAGCM      010000  ILRPAGCM flag.
      ATASWAP       008000  ILRSWAP flag.
      ATATRPAG      004000  ILRTRPAG flag.
      ATASWPDR      002000  ILRSWPDR flag.
      ATASRT        001000  ILRSRT flag.
The remaining flags are reserved.
ATARFLGS
ATACNVRT

2
8000

ATASGNST
ATASCCWP
ATABADPK

4000
2000
1000

Other recovery flags.
ILRSLSQA flag-converting between forward
chained AlA's and lateral chained AT A's.
ILRSLSQA flag-in ASSIGNSET subroutine.
ILRSLSQA flag-in SCCWPROC subroutine.
ILRCMPAE flag-in BADPACK subroutine.


Disp  Name      Size    Description

                        The remaining flags are reserved.
6     ATARCRSN  1       Recursion flags:
        ATARCRF1  80      Recursion flag - function 1.
        ATARCRF2  40      Recursion flag - function 2.
        ATARCRF3  20      Recursion flag - function 3.
        ATARCRF4  10      Recursion flag - function 4.
        ATARCRF5  08      Recursion flag - function 5.
        ATARCRF6  04      Recursion flag - function 6.
        ATARCRF7  02      Recursion flag - function 7.
        ATARCRF8  01      Recursion flag - function 8.
7     ATARCODE  1       Reason code for ASM-issued ABENDs.

The mapping of the remaining four words is dependent on the recovery routine involved.
For the recovery routine ILRIOFRR:

Disp  Name      Size    Description

8     ATAWORDS  16      Maximum size of four-word area.
8     ATAAIA    4       Address of in-process AIA.
8     ATAACE    4       Address of in-process ACE.
C     ATAASCB   4       Address of in-process ASCB, or TRAS'd-to address space.
C     ATALGE    4       Address of in-process LGE.
C     ATAAIAQ   4       Address of AIA queue.

For the recovery routine ILRSWP01:

Disp  Name      Size    Description

8     ATACLEAR  16      Definition allowing next four words to be cleared.
8     ATAAIA    4       Address of in-process AIA.
C     ATASARTE  4       Address of SART entry.
10    ATASCCW   4       Address of in-process SCCW.
14    ATAIORB   4       Address of in-process IORB.

For the recovery routine ILRGOS01:

Disp  Name      Size    Description

8     ATAWORKA  4       Address of work-area cell.
C     ATAEPATH  4       Address of EPATH.

For the recovery routine ILRSRT01:

Disp  Name      Size    Description

8     ATAWORKA  4       Address of PTM work-area cell.
C     ATAEPATH  4       Address of EPATH.

For the recovery routine ILRSRB01:

Disp  Name      Size    Description

8     ATAAIACE  4       Address of in-process AIA/ACE.
C     ATAAIAQ   4       Address of AIA queue.
10    ATAACEQ   4       Address of ACE queue.
14    ATAEPATH  4       Address of EPATH.

For the recovery routine ILRCMP01:

Disp  Name      Size    Description

8     ATAIOSB   4       Address of in-process IOSB.
C     ATAPCCWQ  4       Queue of PCCWs to be put back on PCCW available queue.
10    ATACOMPQ  4       Queue of AIAs to be returned to ILRPAGCM.
14    ATACPCCW  4       Address of in-process PCCW, not on IORB queue and not
                        on ATAPCCWQ.

For the recovery routine ILRJTM01:

Disp  Name      Size    Description

8     ATASAVE   4       Address of register save area.
8     ATAACEQ   4       Address of ACE queue.

For the recovery routine TERMRFRR:

Disp  Name      Size    Description

8     ATARMPL   4       Address of RMPL, resource manager parameter list.
C     ATAWORKA  4       Address of work area.

Recovery Audit Trail Area (EPATH)

The EPATH is a communication area between the mainline routine and its corresponding recovery routine. The EPATH is necessary when the six-word ATA is not large enough to accommodate the data to be tracked. The mapping of the EPATH is dependent on the recovery routine or mainline routine including the macro.

EPATH for ILRPTM, ILRSRT, and recovery routine ILRSRT01:

Disp  Name      Size    Description

0     EPAPARM   4       Address of parameter list.
4     EPAIOEIP  4       Address of IOE currently being processed.
8     EPAIOEQP  4       Address of first IOE on 'WORK' read IOE queue.
C     EPAFFIOE  4       Address of first IOE on free IOE internal queue.
10    EPALFIOE  4       Address of last IOE on free IOE internal queue.
14    EPAWRTQ   4       Address of write queue from which last write IOEs
                        removed.
18    EPAWTPAT  4       Address of SCYLWRT which is used to update current
                        CYL Map.
1C    EPACYLA   4       Address of current CYL Map.
20    EPAMSPAD  4       Address of 2-word parameter list for ILRMSGOQ. Also
                        serves as a switch for ILRPTM.
24    EPAWRTCT  2       Number of writes prepared for current CYL.
26    EPACPUID  2       Processor locking count for current part monitor
                        processing.

EPATH for VIO group operators and their recovery routines - ILRGOS, ILRSAV, ILRRLG, ILRACT, ILRVSAMI, ILRGOS01, ILRTMRLG, ILRTMI00, ILRTMI01, ILRSRBC, and ILRSRB01. ILRGOS01 is the recovery routine for ILRGOS, which calls ILRSAV, ILRRLG, and ILRACT, which call ILRVSAMI. ILRTMI01 is the recovery routine for ILRTMRLG, which calls ILRVSAMI and ILRTMI00. ILRSRB01 is the recovery routine for ILRSRBC, which calls ILRRLG. The first section is common because of the use of ILRVSAMI. The second section is dependent on the recovery routine involved.

Disp  Name      Size    Description

0     EPAOWKA   4       Group operator's or ILRTMRLG's workarea address.
4     EPAVWKA   4       ILRVSAMI workarea address; also points to RPL in
                        workarea.
4     EPATMWKA  4       ILRTMI00 workarea address.
4     EPASWKA   4       ILRSRBC workarea address.
8     EPAAASP   4       Address of active ASPCT.
8     EPADSLST  4       Address of data set name list storage.
C     EPABASP   4       Address of buffer ASPCT.
C     EPATMIBA  4       Base address value for ILRTMI00.
10    EPARASP   4       Address of retrieved ASPCT.
10    EPATMACB  4       Address of storage used to build ACB for STGINDEX in
                        ILRTMI00.
14    EPARTYRG  4       Address of 15-word save area containing retry
                        registers R0-R14 for record-only abends.
14    EPABKSLT  4       Backing slots, only used for assign processing.

Disp  Name      Size    Description

18    EPAFLAG1  1       Recovery flags:
        EPAVSAMI  X'80'   ILRVSAMI currently processing.
        EPAGRPOP  X'70'   One of the group operators processing.
        EPARLG    X'40'   ILRRLG is currently processing.
        EPASAVE   X'20'   ILRSAV is currently processing.
        EPAACT    X'10'   ILRACT is currently processing.
        EPAACASR  X'08'   Activate or assign request.
        EPAASGN   X'04'   Assign processing - backing slots count (ASMBKSLT)
                          has been updated.
        EPAUNSAV  X'02'   Mark slots unsaved in active ASPCT.
                  X'01'   Reserved.
19    EPAFLAG2  1       Recovery flags:
        EPATMXIT  X'80'   ILRTMI00 completed processing.
        EPAWARM   X'40'   ILRTMI00 warm start is processing.
        EPACOLD   X'20'   ILRTMI00 CVIOSTRT is processing.
        EPABUILD  X'10'   ILRTMI00 BUILDSNL is processing.
        EPAMAST   X'08'   Master scheduler initialization has been posted.
        EPATMI    X'04'   ILRTMI00 is currently processing.
        EPARECUR  X'02'   Recursion indicator for retry into mainline
                          ILRTMRLG.
                  X'01'   Reserved.

For ILRGOS01, ILRSAV, ILRACT, ILRRLG, ILRSRBC, and ILRSRB01:

Disp  Name      Size    Description

1A    EPALSIZE  2       Size of LGVT expansion.
1C    EPALGVTP  4       New LGVT address for LGVT expansion in ILRGOS.
20    EPALGEP   4       Logical group entry for request being processed.
24    EPASRB    4       Address of SRB for SRB controller.
28    EPAACE    4       Address of current ACE being processed.
2C    EPARBASP  4       Address of rebuilt ASPCT (LSQA).
30    EPARSIZE  2       LSQA block storage size for rebuilt ASPCT.
32              2       Reserved.


For ILRTMI01, ILRTMRLG, and ILRTMI00:

Disp  Name      Size    Description

1A              2       Reserved.
1C    EPAACE    4       Address of ACE currently being processed.
1C    EPAMSECB  4       Address of master scheduler initialization ECB.
20    EPATMRSV  4       Address of ILRTMRLG save area.
24    EPAABEND  4       Retry address for record-only abends.
24    EPATMIRT  4       Current retry address for failure in ILRTMI00.
28    EPATPART  4       Address of TPARTBLE while in ILRTMI00.

Additional ASM Data Areas
The following four ASM data areas (BSHEADER, BUFCONBK, DSNLIST, and
MSGBUFER) are not contained in OS/VS2 Data Areas. For debugging ASM,
BSHEADER (bad slot record) may be especially helpful.
BSHEADER

Acronym:    BSHEADER
Full Name:  ASM error record (bad slots).
Macro ID:   None.
Size:       1024 bytes.
Function:   Trace table of the last 253 slots that ASM has found to be bad.
            Patterns of bad LSIDs can indicate where and what paging data sets
            are having difficulties.
Location:   Pointed to by ASMVT (ASMEREC).

Offset   Length  Name      Description

0 (0)    4       BSCURR    Current bad slot entry filled.
4 (4)    4       BSFIRST   Beginning address of table.
8 (8)    4       BSLAST    End address of table.
12 (C)   1012    BSLIST    253 four-byte bad slot identifiers (LSIDs).

BSLIST entry:

0 (0)    1       BSFLAG
                   1... ....  BSSPLSID  If 1, LSID entry is swap;
                                        if 0, LSID entry is page.
                   .1.. ....  BSRDLSID  If 1, LSID entry is for a read error;
                                        if 0, LSID entry is for a write error.
1 (1)    3       BSTABNTY  LSID that is bad.

BUFCONBK

Acronym:    BUFCONBK
Full Name:  VSAM buffer control block.
Macro ID:   None.
Size:       12 bytes.
Function:   Queue VIO group operation for later processing until VSAM
            resources are available.
Location:   Pointed to by ASMVT (ASMGOSQS).

Offset   Length  Name      Description

0 (0)    4       BUFCHAIN  Pointer to next BUFCONBK.
4 (4)    4       BUFASCB   Pointer to ASCB.
8 (8)    4       BUFACE    Pointer to ACE.

DSNLIST

Acronym:    DSNLIST.
Full Name:  Data Set Name List (ASM).
Macro ID:   None.
Size:       44 times the number of possible page/swap data sets. There are two
            DSNLISTs, one for page data sets and one for swap data sets.
Function:   Make data set names available in non-fixed (pageable) storage.
Location:   Pointed to by PART (PARTDSNL) for page data sets, and by SART
            (SARDSNL) for swap data sets.

Offset   Length  Name      Description

0 (0)    44      DSNENTRY  Data set name, left-justified and padded with
                           blanks.

MSGBUFER

Acronym:    MSGBUFER.
Full Name:  ASM message buffer.
Macro ID:   None.
Size:       376 bytes.
Function:   Ensure that a WTOR with LOGREC request will have a buffer to use.
Location:   Pointed to by ASMVT (ASMMSGBF).

Offset     Length  Name      Description

0 (0)      4       MSGCURR   Pointer to current buffer used.
4 (4)      4       MSGFIRST  Pointer to first buffer.
8 (8)      4       MSGLAST   Pointer to last buffer.
12 (C)     4       MSGTERM   Pointer to special termination buffer.
16 (10)    240     MSGBFRS   Three 80-byte buffers.
256 (100)  120     MSGTBFR   Special termination buffer.

System Resources Manager (SRM)

The system resources manager (SRM) is a component of the MVS control program. It determines which of all active address spaces should be given access to system resources, and the rate at which each address space is allowed to consume the resources.

An installation controls the MVS system primarily through the SRM. The evaluations and resulting decisions made by the SRM are dependent on the constants and parameters with which it is provided. The reader should understand the philosophy inherent in the use of these constants and parameters, so that their use will produce the desired effect. Part 3 of OS/VS2 System Programming Library: Initialization and Tuning Guide provides the background information necessary to understand the controls available through the SRM, and the implementation of these controls.

SRM Objectives

The SRM bases its decisions on two fundamental objectives:

1. To distribute system resources among individual address spaces in accordance with the installation's response, turnaround, and work priority requirements.

2. To achieve optimal system throughput through use of system resources.

An installation specifies its requirements for the first objective in a member of SYS1.PARMLIB called the installation performance specification (IPS). Through the IPS, the installation divides its types of work into distinct groups called domains, assigns relative importance to each domain, and specifies the desired performance characteristics for each address space within these domains. A secondary input to the SRM is another member of parmlib, the OPT member. Through a combination of IPS and OPT parameters, an installation can exercise a degree of control over system throughput characteristics.
When the need arises, trade-offs can be made between SRM's objectives. That is, the installation can specify whether, and under what circumstances, throughput considerations take priority over turnaround requirements. The SRM attempts to ensure optimal use of system resources by periodically monitoring and balancing resource utilization. If resources are under-utilized, the SRM attempts to increase the system load. If, on the other hand, resources are over-utilized, the SRM attempts to reduce the system load or to shift commitments to low-usage resources such as the processor, logical channels, auxiliary storage, and pageable real storage.

Address Space States

The SRM recognizes address spaces as being in one of three general states. Each state corresponds in concept to a queue on which SRM places the SRM user control block (OUCB) which describes the address space. These three states are:

1. In   - The working set of an address space in this state occupies real
          storage.

2. Wait - The working set of an address space in this state does not occupy
          real storage. It has been swapped out because it cannot be put into
          execution.

3. Out  - The working set of an address space in this state does not occupy
          real storage; however, the address space is capable of executing and
          can be considered for swapping in.

It is important to recognize that the correspondence between these states and presence on the associated queue is not precise; an address space can be in transit between two states (for example, it may be in the process of being swapped out). Thus, presence on a particular queue might not exactly mirror the physical state of affairs. Further, these classes are necessarily broad, and SRM recognizes subclasses; this is especially true among address spaces belonging to the "In" class. The use of the swap transition flags, in conjunction with the presence of an OUCB on a particular queue, mirrors the exact physical state of an address space. For wait state analysis, the exact state of given address spaces is important. If you can determine precisely what state SRM considers the various address spaces to be in, and the reasons why, you will gain insight for further analysis. The OUCB is the primary address-space-related control block in which much of the above information can be found.
In the OUCBQFL field (OUCB + X'10'), when the OUCBGOB bit is on, the SRM's OUCB repositioning routine is to be invoked. The destination of this pending OUCB repositioning is indicated by the following bit settings:

1. OUCBOUT='0'B                  - The OUCB will be placed on the "In" queue.

2. OUCBOUT='1'B and OUCBOFF='1'B - The OUCB will be placed on the "Wait"
                                   queue.

3. OUCBOUT='1'B and OUCBOFF='0'B - The OUCB will be placed on the "Out" queue.

When the repositioning is completed, the OUCBGOB bit is turned off; the setting of the OUCBOUT and OUCBOFF bits indicates the location of the OUCB.

The setting of the swap transition flags for swap-out processing occurs in the following order:

1. If swap-out is initiated successfully, the OUCBGOO bit is set.
2. At quiesce-complete time, the repositioning of the OUCB takes place.
3. At swap-out-complete time, the OUCBGOO bit is turned off.

The setting of the swap transition flags for swap-in processing occurs in the following order:

1. If swap-in is initiated successfully, the OUCBGOI bit is set.
2. At restore-complete time, the repositioning of the OUCB takes place and the OUCBGOI bit is turned off.

SRM Indicators
It is helpful to understand how SRM views the total MVS system, as well as the
individual address spaces. This understanding can assist you in further problem
analysis, especially of enabled wait state situations. A discussion of some of the
SRM system and individual user indicators follows. Figure 5-30 shows the
relationships among important SRM control blocks and queues.
A study of several counters and flags aids in further understanding of SRM
processing. The counters and flags that pertain to the entire system are located in
the SRM constants module (IRARMCNS), which resides in the nucleus. The
counters and flags that pertain to a specific user are found in that user's OUCB.

System Indicators

The SRM control table (RMCT) is located at the start of module IRARMCNS. This address is found at the CVT + X'25C'. Generally, when SRM is in control, the address of the RMCT is contained in register 2. In the module IRARMCNS, the following fields provide information concerning SRM's current processing:

MCTAVQ1   (RMCT + X'1D8'; bit 2)
          This bit indicates that the count of available pages has fallen
          below the PVTAFCLO value, so the real storage manager (RSM) has
          called SRM to steal pages in order to increase the count of
          available pages. If this bit is on, it could indicate a normal
          condition.

MCTSQA1   (RMCT + X'1D8'; bit 0)
          Indicates that the number of available SQA pages is critically low.
          If MCTSMS1 (RMCT + X'1D9'; bit 4) is 1, the operator was notified of
          this situation.

MCTSQA2   (RMCT + X'1D8'; bit 1)
          Indicates that the number of available SQA pages has fallen below a
          second, more critical threshold than the one noted above. If MCTSMS2
          (RMCT + X'1D9'; bit 5) is 1, the operator was notified of this
          situation.

MCTASM1   (RMCT + X'1D9'; bit 0)
          Indicates that the SRM has detected that less than 30% of all local
          slots are available. The SRM has informed the operator of this fact
          and has taken appropriate action to relieve the shortage.

Figure 5-30. SRM Control Block Overview (Part 1 of 2)

[The original figure is a diagram: the CVT (+X'25C') points to the RMCT at the start of IRARMCNS, which contains the SRM tables and entry points. The RMCT anchors the CCT (CPU usage information), ICT (I/O usage information), MCT (storage usage information), RMPT (swap analysis parameters), RMCA (resource control information), RMEX (external entry point descriptors), RMSB (subroutine entry points), WMST (IPS information), RLCT (logical channel information), WAST (workload activity specifications), WAMT (workload activity information for MF/1), the TMQE queue of timed actions, the action queue and deferred action queue of OUCBs, the algorithm request bits and immediate algorithm request bits, the service request work area (RQSV), the DMDT (domain descriptor table), and the EPAT, EPDT, and EPST tables of RMEP entry point descriptors. The "In", "Out", and "Wait" OUCB queues are anchored by INQE, OTQE, and WTQE.]

Figure 5-30. SRM Control Block Overview (Part 2 of 2)

[The original figure is a diagram: the RRPA (recovery parameters) records register 0 and register 1 contents on entry (ASID, PGN, SYSEVENT code, and input parameter address), a pointer to the RMCT, and the RMEP entry point descriptor of the routine most recently entered. The WMST points to and indexes into the PGVT/PGDT (performance group descriptor table) and the POVT/PODT (performance objective table), with entry location similar to that for the domain descriptor table. The DMVT points to the DMDT (domain descriptor table), one entry per domain. Each ASCB points (at ASCBOUCB) to its OUCB (SRM user statistics) and, through the ASXB, to the OUXB (SRM user statistics, swapped) and OUSB (SRM user statistics, temporary); the IMCB holds I/O measurement information.]
MCTAMS2   (RMCT + X'1D9'; bit 3)
          Indicates that the SRM has detected that less than 15% of the total
          local auxiliary storage slots are available. The SRM has informed
          the operator of the slot shortage, and has taken appropriate action
          to relieve the shortage.

MCTFAVQ   (RMCT + X'1D8'; bit 3)
          This bit indicates that the count of fixed pages in the system is
          above the threshold value, PVTMAXFX, and the real storage manager
          (RSM) has called SRM to swap out the users responsible for the
          shortage of pageable frames. If MCTFX1 (RMCT + X'1D9'; bit 6) is 1,
          the operator was informed of this situation.

RCVUICA   (RMCT + X'21E'; halfword)
RCVCPUA   (RMCT + X'220'; halfword)
RCVASMQA  (RMCT + X'224'; halfword)
          These values are the system contention indicators that the resource
          monitor examined for the last interval. They represent, in the order
          given, the average unreferenced interval count (UIC), the average
          processor utilization, and the average ASM queue length. Based on
          these values, the target MPL for a domain is altered.

RMCAINUS  (RMCT + X'29E'; halfword)
          Indicates the count of address spaces currently residing in storage.
          This count includes non-swappable address spaces. If this count is
          high, look at the next field.

CCVENQCT  (RMCT + X'138'; halfword)
          Indicates the count of address spaces currently residing in storage
          and marked non-swappable because they are holding ENQ resources that
          other address spaces want.

Individual User Indicators

The SRM user control block (OUCB) contains flags and counters that provide information about a specific user. There is one OUCB for each address space, pointed to by ASCBOUCB (ASCB + X'90'). The following fields help in the understanding of specific user characteristics.

OUCBMWT   (OUCB + X'15'; bit 7)
          If this bit is on, the SRM has detected that this user has not been
          dispatched, but was occupying storage, for at least a certain number
          of seconds. (This interval is processor-model dependent.) The user
          will be swapped out until the dispatcher informs SRM that the
          address space has work to do.

OUCBAXS   (OUCB + X'12'; bit 5)
          When this bit is on, the user has been swapped out of storage
          because the user's address space was obtaining auxiliary storage
          slots at the fastest rate in the system when an ASM slot shortage
          occurred.

OUCBENQ   (OUCB + X'11'; bit 6)
          A different address space has tried to ENQ on a resource held by
          this address space. This user is treated as non-swappable for an
          installation-defined time period.

OUCBYFL   (OUCB + X'12') See specific bit designations below:
          • Bit 1 - indicates that the user was created via a START command.
          • Bit 2 - indicates that the user was created via a TSO LOGON
            command.
          • Bit 3 - indicates that the user was created via a MOUNT command.

OUCBFXS   (OUCB + X'12'; bit 7)
          When this bit is on, it indicates that the user has been swapped out
          of storage because the user's address space had allocated to it the
          greatest number of fixed frames when a pageable frame shortage
          occurred.

OUCBJSAS  (OUCB + X'17'; bit 1)
          When this bit is on, it indicates that, at the time of job select
          processing for this user, there was an auxiliary slot shortage. This
          user's initiation is being delayed until the shortage is relieved.

OUCBJSFS  (OUCB + X'17'; bit 0)
          When this bit is on, it indicates that there was a pageable frame
          shortage at the time of job select processing for this user. This
          user's initiation is being delayed until the shortage is relieved.

OUCBSRC   (OUCB + X'25'; 1 byte)
          This field contains a code describing why this user was last
          swapped out. The codes are:
          01 - Terminal output wait
          02 - Terminal input wait
          03 - Long wait
          04 - Auxiliary storage shortage
          05 - Real storage shortage
          06 - Detected wait
          07 - Reqswap or transwap SYSEVENT issued
          08 - ENQ exchange by swap analysis
          09 - Exchange based on recommendation values by swap analysis
          0A - Unilateral swapout by swap analysis

OUCBRDY   (OUCB + X'56'; bit 0)
          This flag indicates that ready work became available for this
          address space which was swapped out due to a long wait. The address
          space is now capable of executing and is a candidate for swap-in.

Other Indicators

The SRM domain descriptor table can be useful in pinpointing a problem involving SRM's MPL control. Mapping of the table can reveal why a user is kept out of main storage, why erratic response time occurs, and other user and system information.

SRM Error Recovery

SRM maintains two functional recovery routines (FRRs) that are located in IRARMERR. One FRR (recovery routine 1 - RR1) gets control whenever errors occur after SRM is branch-entered by a routine that holds a lock higher in the lock hierarchy than the SRM lock. The other FRR (recovery routine 2 - RR2) gets control whenever errors occur and SRM is running with the SRM lock. If it is suspected that SRM is entering error recovery and a stop is necessary at the time of error, RMRR2INT is a subroutine common to both RR1 and RR2.

Recovery routine 1 (RR1) retries if a retry routine exists. If no routine exists, or if the error recurs, RR1 percolates the error.

With recovery routine 2 (RR2), many special situations such as the following are first checked:

• Is RMF active and should it be terminated?
• Is SET IPS active and should the abend code be converted?
• Is the OUCB valid and should the abend code be converted?

Then RR2 retries if a retry routine exists. If no retry routine exists, or if the error recurs, RR2 percolates the error.

Module Entry Point Summaries

Figure 5-31 shows a cross reference between SRM modules and entry points. Descriptions of selected SRM modules and entry points are:

IRARMINT - SRM Interface Routine

IGC095   - SVC entry point to SRM.

IRARMI00 - Branch entry point to SRM.
           Handle all external SYSEVENTs.

IRARMI48 - Branch entry point to the SRM.
           Handle the internal SYSEVENT (48).

IRARMI01 - Entry point from IRARMEVT or IRARMCTL.
           Return to the SYSEVENT issuer.

IRARMI10 - Entry point from IRARMEVT.
           Abend a user of the SRM.

IRARMEVT - SRM SYSEVENT Router

IRARMEVT - SYSEVENT processor.
           Begin to process the indicated SYSEVENT.

IRARMXVT - SYSEVENT retry.
           Prepare a retry of a SYSEVENT that had incurred a system error.

IRARMDEL - Synchronize address-space delete processing.

IRARMIPS - Set new IPS.
           Invoke IRARMSET to establish a new IPS.

IRARMVXB - Synchronize OUXB deletion at swapout-completion time.

IRARMSTM - Storage Management Routine

IRARMPR1 - Page Replacement Normal Processing.
           Examine each user in main storage and the system pageable area,
           and call RSM real frame replacement to update UICs for each user.

IRARMPRS - Page Replacement Real Page Shortage Force Steal.
           Steal as many pages as required to relieve a real page frame
           shortage. The steal decision is made at entry IRARMMS2. The oldest
           unreferenced pages are stolen first.

IRARMMS2 - Real Page Shortage Prevention.
           Calculate the number of frames necessary to reach the O.K.
           threshold, and schedule IRARMPRS processing (if a real page
           shortage exists). Inform the operator of users that have the
           greatest number of fixed frames and direct the swaps of these
           users (if a pageable real page shortage exists).

IRARMMS6 - Main Storage Occupancy Long Wait Detection.
           Discover users who have gone into long wait without notifying SRM.
           Swap out such users, if swappable.

IRARMASM - Auxiliary Storage Shortage Monitoring.
           Monitor the extent of auxiliary storage allocation. If auxiliary
           pages are in short supply, inform the operator and direct swaps of
           users who are most rapidly acquiring auxiliary storage slots.

IRARMSQA - SQA Shortage Message Writer.
           Inform operator of system queue area shortages.

STEAL    - Internal STM Steal Subroutine.
           Add users to the RFR interface list until full, then call the RSM
           real frame replacement (RFR) routine (via IRARMI03) and record the
           number of pages stolen.

IRARMSRV - SRM Service Routine.

IRARMI02 - Interface to ASCB CHAP.

IRARMI03 - Interface to RSM's real page frame replacement.

IRARMI04 - Obtain or free SQA storage.

IRARMI05 - Requeue SRM TQE routine.

IRARMI06 - Cross-memory post entry point.

IRARMI07 - SWAP SRB SCHEDULE routine.

IRARMI09 - RECORD entry point.

IRARMI08 - Set a return code of 16 in register 15 and return. (Dummy routine)

IRARMERR - SRM's Functional Recovery Routine.

IRARMRR1 - Functional recovery for globally-locked entries (entries to SRM in
           which the SRM lock could not be obtained). Retry the failing SRM
           routine when possible. Otherwise, percolate the error.

IRARMRR2 - Functional recovery for non-globally-locked entries (entries to
           SRM in which the SRM lock was obtained). Validate queues and clean
           up. Retry the failing routine if possible; otherwise percolate the
           error.

RMRR2RTY - Return to RTM indicating retry.

RMRR2PER - Return to RTM indicating percolation.

RMRR2INT - FRR initialization.

RMRR2VLD - Validate control blocks.

RMRR2GST - Release the dispatcher lock in order to call IRARMI04.

RMRR2CKQ - Verify the location of an OUCB.

RMRR1VFB - Verify addresses.

RMRR2REQ - OUCB enqueue routine entry point.

RMRR2SPR - Return with the return code in register 15.

IRARMCPM - Processor Management.

IRARMAP1 - Automatic Priority Group Reorder Processing.
           Recompute dispatching priorities for all APG users in main storage.
           Invoke ASCBCHAP for each user whose dispatching priority has
           changed.

IRARMEQ1 - ENQ/DEQ Algorithm ENQ Time Monitoring.
           Stop giving extra CPU service to users with ENQHOLD SYSEVENTs
           outstanding who have already received their guaranteed processor
           service.

IRARMCL0 - Processor Load Balancing User Swap Processing.
           Compute user processor usage profile at QSCECMP SYSEVENT.

IRARMCL1 - Processor Utilization Monitoring.
           Compute processor utilization variables for processor load
           balancing and resource management algorithms.

IRARMCL3 - Processor Load Balancing User Swap Evaluation.
           Produce a numerical recommendation value that reflects the
           desirability of swapping a user based on processor utilization.

System Resources Manager (SRM) (continued)
CHAP

~

CPLRVSWF

- IRARMCPM Internal Wait Factor Computation Subroutine.
Compute system wait factor for CPU load balancing
recommendation value.

CPUWAIT

- IRARMCPM Internal Wait Time and Processor Utilization
Compute Subroutine.

IRARMCPM Internal Chapping Subroutine.
Search queue for APG users with changed dispatching priorities.
Put them in a list and call ASCBCHAP.

Compu te accumulated system wait time total for all processors
and compute recent processor utilization.
CPUTLCK

IRARMCPM Internal Processor Utilization Checking Routine.
Ensure that the computed processor utilization percentage falls
between 0 and 100 percent. If 100 percent and lowest priority
user has not been dispatched, set to 101 percent.

NEWDP

IRARMCPM Internal APG Computation Routine.
Compute mean time to wait and a new dispatching priority for
the APG user.
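NEWDP's mean-time-to-wait idea can be illustrated with a sketch. (Python for illustration; the bucketing scheme, names, and parameters below are assumptions, not the actual algorithm.)

```python
def newdp(apg_base, cpu_used, wait_count, slice_unit=1.0, levels=4):
    """Illustrative mean-time-to-wait (MTTW) dispatching-priority sketch.

    Assumption: users that consume little CPU between waits (I/O-bound)
    keep a high dispatching priority; CPU-bound users are lowered.
    The bucketing below is invented for illustration.
    """
    # Mean CPU time consumed per wait; treated as infinite if the user
    # never voluntarily waited (purely CPU-bound).
    mttw = cpu_used / wait_count if wait_count else float("inf")
    # Map MTTW onto one of `levels` priority steps below the APG base.
    if mttw == float("inf"):
        step = levels - 1
    else:
        step = min(int(mttw / slice_unit), levels - 1)
    return apg_base - step
```

An I/O-bound user thus floats to the top of the APG range, while a CPU-bound user settles to the bottom.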

IRARMIOM

- I/O Management.

IRARMIL0

- I/O Load Balancing User I/O Monitoring.
Compute I/O usage profile for all swappable problem-state users.

IRARMIL1

- I/O Load Balancing Logical Channel Utilization Monitoring.
Compute channel utilization values for I/O load balancing, page
replacement algorithms, and the device allocation SYSEVENT.

IRARMIL3

I/O Load Balancing User Swap Evaluation.
Compute numerical recommendation value that reflects
desirability of swapping a user based on logical channel
utilization.

IRARMIL4

I/O Load Balancing IMCB Deletion Routine.
At the end of the user job step, clean up the control blocks used
in monitoring a heavy I/O user.

LCHUSE

Internal I/O Subroutine.
Compute logical channel utilization, request rate, and I/O load
balancing recommendation value computation factor.

OS/VS2 System Programming Library: MVS Diagnostic Techniques

IRARMRMR

Resource Manager

IRARMRM1

Resource Monitor Periodic Monitoring.
Accumulate, at periodic sample intervals, several system resource
contention indicators and the number of ready users for each
domain.

IRARMRM2 - Resource Monitor MPL Adjustment Processing.
Compute the average system resource utilization and determine if
the system MPL should be raised or lowered.

IRARMCTL

- SRM Control Algorithms.

IRARMCTL

- Mainline Control Processing.
Transfer to deferred user action processing (IRARMCEN) and
then to the algorithm request routine (IRARMCEL).

IRARMCEN

Deferred User Action Processing.
Examine the OUCBACN field of the OUCBs on the action queue
and route control to all routines whose request bits have been set
in that field. Dequeue each OUCB after its indicated actions
have been performed.

IRARMCEL

- Algorithm Request Routine.
Examine the RMCTALR and RMCTALA fields in the RMCT.
Route control (via IRARMCRT) to each algorithm whose
request bit has been set in either of the two fields. Reset the
individual request bit after each algorithm completes.

IRARMCET

- Periodic Entry Point Scheduler.
Accept timer interrupts, schedule the algorithms currently due
for execution, and requeue the SRM timer element to permit
interrupts again when the next algorithm is due for execution.

IRARMCED

- SRB-Dispatched Original Entry Processor.
Receive control under an SRB scheduled by the dispatcher and
set up an entry to the mainline of SRM (IRARMCEN) by invoking
SYSEVENT 48.

IRARMCQT

- Periodically-Invoked Entry Point Rescheduler.
Accept a request to reschedule the execution of a periodically-invoked
algorithm, and requeue the corresponding RMEP block
on the timed entry queue.

IRARMCRD - SRB Scheduling Routine.
Accept a request to schedule the SRM SRB which, if available, is
scheduled to obtain the SRM lock.


IRARMCRL

- Algorithm Scheduling Routine.
Accept requests for an algorithm to be run. Turn on the bit in
the RMCTALA or RMCTALR associated with the algorithm.

IRARMCRN - Action Request Routine.
Accept requests for an action requiring the SRM lock. If the
SRM lock is held, control is given to the action immediately via a
routing routine. If the SRM lock is not held, the bit is set in the
OUCBACN field of the OUCB associated with the requesting
user, to indicate that the action requested is deferred.

IRARMCRT

- Entry Point Table Scanner.
Accept an invocation bit pattern and an entry point table
address. Compare the bit pattern to invocation flags in the entry
point table entries. When a match is found, invoke the routine
identified by the entry point.

IRARMCRY

User Swap Request Receiving Routine.
Accept a request for a user swap and check to see if such a swap
is already in progress. Route control to IRARMCSO or
IRARMCSI if a swap is not in progress and the SRM lock is held.

IRARMCSI

User Swap-In Request.
Accept a swap-in request, allocate an OUXB for the user, and
initiate the swap-in.

IRARMCSO

- User Swap-Out Request.
Accept a swap-out request and post the region control task's
quiesce routine to initiate the swap-out.

IRARMRPS

OUCB Repositioning Routine.
Dequeue an OUCB and requeue it at the end of the queue
specified in its OUCBQFL field.

IRARMWMY

Periodic Entry Point Requeuing Routine.
Requeue all of the members on the timed algorithm queue and
adjust all the time-due fields.

IRARMCAP

- Swap Analysis Algorithm.
Attempt to keep the multiprogramming level (MPL) at its target
level in each domain by performing user swaps.

IRARMCPI

- Select Swap-In Candidate Subroutine.
Scan the OUT queue for the user in a particular domain with the
highest recommendation value.

IRARMCPO

Select Swap-Out Candidate Subroutine.
Scan the IN queue for the user in a particular domain with the
lowest recommendation value.
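Taken together, IRARMCAP, IRARMCPI, and IRARMCPO form a simple control loop. A minimal sketch, in Python for illustration (the data layout and names are invented, not IBM's):

```python
def swap_analysis(mpl, target, in_queue, out_queue):
    """Sketch of the IRARMCAP swap-analysis loop for one domain.

    mpl: current multiprogramming level; target: desired MPL.
    in_queue/out_queue: lists of (name, recommendation_value) tuples.
    Returns the list of swap actions taken. Illustrative only.
    """
    actions = []
    # Below target: swap in the OUT-queue user with the highest
    # recommendation value (the IRARMCPI selection rule).
    while mpl < target and out_queue:
        user = max(out_queue, key=lambda u: u[1])
        out_queue.remove(user)
        in_queue.append(user)
        mpl += 1
        actions.append(("swap-in", user[0]))
    # Above target: swap out the IN-queue user with the lowest
    # recommendation value (the IRARMCPO selection rule).
    while mpl > target and in_queue:
        user = min(in_queue, key=lambda u: u[1])
        in_queue.remove(user)
        out_queue.append(user)
        mpl -= 1
        actions.append(("swap-out", user[0]))
    return actions
```

The recommendation values themselves come from IRARMCVL, described below.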


IRARMCVL

- User Swap Evaluation Routine.
Compute a numerical value representing the recommendation for
a user to be swapped in. This recommendation value is the sum
of the user's workload level and the recommendations of the I/O
and processor resource managers.

IRARMWAR - Workload Activity Recording
IRARMWR1

- Workload Activity Recording Initialization Subroutine.
Construct and initialize the workload activity measurement
table (WAMT) in the buffer (storage from SQA obtained by
MF/1 and input with SYSEVENT 45).

IRARMWR2

- Workload Activity Recording WAMT Initialization Subroutine.
Build the WAMT in a format suitable for updating by the SRM.

IRARMWR3

- SRM Workload Activity Recording Data Collection Subroutine.
Move the contents of the WAMT into a collection buffer capable
of containing the data. (Note that the buffer is obtained by
MF/1 from LSQA, storage key 0, and must be fixed in storage.)
If the IPS has not been changed, add to the collected data the
transaction data for the current in-storage interval for each
in-storage address space with an active transaction, re-initialize
the data collection buffer for the next collection interval, and
calculate the workload level for each performance group period
that contains transaction data.

IRARMWR4 - SRM Workload Activity Recording Transaction Data Update
Subroutine.
Add the service and transaction active time to the appropriate
WAMT performance group period accumulator in the data
collection buffer.
IRARMWR5

SRM Workload Activity Recording Workload Level Calculation
Subroutine.
Calculate the workload level for each WAMT performance group
period entry in which a transaction has been accumulated during
the data collection interval. Note that for those WAMT entries
in which the service rate calculated can be associated with
multiple workload levels, or is zero (even though at least one
transaction has been active during the data collection interval),
the negative value of the workload level is calculated to
indicate an estimated value to MF/1.

IRARMWR6

SRM Workload Activity Recording Transaction End Update
Subroutine.
Add the transaction elapsed time to the appropriate WAMT
performance group period accumulator and count the number
of transactions that terminated during the current data collection interval.


IRARMWR7 - SRM Workload Activity Recording WAMT Entry Determination
Subroutine.
Obtain addressability to the WAMT performance group period
entry used to accumulate user transaction information.
IRARMWR8

SRM Workload Activity Recording.
Terminate workload activity data collection whenever an IPS
change occurs.

IRARMWLM - SRM Workload Manager
IRARMWM1

- Workload Manager Service Calculator Routine.
Calculate the amount of service provided to a user since
the beginning of the current workload manager measurement
for that user. Service is calculated using the following
equation:

Service = (MP/K) + (CT/K) + EI

where:

T = the TCB processor time elapsed for the current interval.

K = the time required to execute 10,000 instructions
(dependent on the processor model).

M = the MSO service coefficient scaled by 1/50.

P = the number of page-seconds used by the user.

C = the processor service coefficient.

E = the EXCP count for this interval.

I = the I/O service coefficient.
This routine calculates each of the three service factors and the
total service for the user for the interval.
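In modern notation the service calculation is a straightforward weighted sum. The sketch below (Python, with an invented parameter order) mirrors the equation above:

```python
def service_units(T, K, M, P, C, E, I):
    """Compute SRM service for an interval: Service = (MP/K) + (CT/K) + EI.

    T: TCB processor time for the interval
    K: time to execute 10,000 instructions (processor-model dependent)
    M: MSO service coefficient scaled by 1/50
    P: page-seconds used by the user
    C: processor service coefficient
    E: EXCP count for the interval
    I: I/O service coefficient
    """
    mso_service = (M * P) / K   # main-storage occupancy factor
    cpu_service = (C * T) / K   # processor factor
    io_service = E * I          # I/O factor
    return mso_service + cpu_service + io_service
```

Note that K divides both the storage and processor terms, so service accumulates at a rate independent of the processor model.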
IRARMWM2 - Swappable User Evaluation Routine.
Scan the in-storage queue and the out-of-storage-but-ready
queue, and evaluate each swappable user,
assigning each his current workload level.
IRARMWM3 - Individual User Evaluation Routine.
Evaluate a swappable user on the in queue or the out
queue, assigning a current workload level.
IRARMWM4 - Workload Manager Workload Level Calculator Subroutine.
Accept a service rate and a performance objective, and
calculate the corresponding workload level.


IRARMWM5 - Workload Manager Update Performance Group Period Subroutine.
Test whether a user has accumulated enough service/time
to be assigned to a new performance group period. If so,
adjust the pointers that indicate the performance group
period, the performance objective, APG priority, and the
domain applicable to the transaction current for the user.
Note that the frequency (resolution) at which the test for
period-end is made depends on how often IRARMWM5 is
called for any given user.
IRARMWM7 - WLM Recommendation Calculation Routine.
Calculate a workload manager recommendation value for
a user, based on the service that was received and on the
performance objective currently associated with the user.
Users who have not yet received an amount of service equal to
their interval service value (ISV) specification while in storage
are given a recommendation value boost. The boost gives
preferential treatment to users in their ISV as compared to
users not in their ISV, or to users between job steps.
IRARMHIT

- Workload Manager User Ready SYSEVENT Swap-In Scheduling
Routine.
Reposition the now-ready user from the wait queue to the out
queue. Receive control as the result of a decision to apply
swap-in processing to a now-ready user.

IRARMWMI

- Workload Manager In Storage Interval Change Subroutine.
Update the transaction accumulators with the service
and the time received by the user during the preceding
in-storage interval.

IRARMWMJ

- Routine to Determine the Scope of Applicability of Analysis to
a User.
Examine the current swap status and the performance
specification for a user. Indicate if the resource manager
algorithms are applicable to this user.

IRARMWMK - WLM Dontswap/Okswap User Analysis Routine.
Calculate the current service and ensure that the user is in
the correct performance group period. Set applicable
algorithm indicators based on the new swap status of the
user.


IRARMWMN - Workload Manager Transaction Start Routine.
This routine receives control as the result of a SYSEVENT that
has been defined by the workload manager to signify that a new
transaction should be started for that user. If the user is not in
storage, a flag is set to cause the IRARMWMN routine to be
reentered during the swap-in of the user. Otherwise, any existing
transaction is stopped by calling IRARMWMO, and the user
transaction fields are reset to reflect the new transaction
being started.
IRARMWMO - Workload Manager Transaction Stop Routine.
This routine receives control as the result of a SYSEVENT that
has been specified by the workload manager as defining the end
of any current user transaction. If a new transaction is to be
created for the user, IRARMWMO indicates the end of the
current transaction. If the next user event is known,
IRARMWMO leaves the transaction accumulated values for later
resumption of the transaction. In any case, IRARMWMO causes
the preceding time and service to be properly recorded for the
current transaction.
IRARMWMQ - Workload Manager Quiesce Completed SYSEVENT Processing
Routine.
This routine receives control when a user has stopped executing
and is being swapped out so that the workload manager can
record the service given that user while he was in storage. The
workload manager determines if a user event caused the
swap-out, and flags the user to indicate whether previous service
is to be considered when the user is next swapped in.
IRARMWMR - Workload Manager Restore Completed SYSEVENT Processing
Routine.
This routine receives control when a user has been swapped in
and is ready to begin executing. The workload manager sets up
the fields used to calculate the service rate received by the user
during the forthcoming in-storage residency period.
IRARMSET

- Set to New IPS Non-Resident Action Routine.
Replace the internal IPS currently in use by the SRM with a
new IPS. All references to the old IPS in the SRM's control
blocks are resolved with offsets or addresses in the new
one.

IEEMB812

- Set IPS Processor.

IEEMB812

- Open PARMLIB. Process the IPS
parameter of the SET command.

IRARMRDR - Obtain a buffer and
read records from PARMLIB.
IRARMWTR - Write a message to the system log.

IRARMIPS

- SRM List Processor.

IRARMIPS

- Scan the IPS list in the SYS1.PARMLIB
member, and if valid, build control blocks
containing the IPS information.

IRARMFRE - Free the obsolete IPS tables.

IRARMOPT - Scan the IEAOPTxx member of PARMLIB.

IEAVNP10

- SRM Initialization.

IEAVNP10

1. Initialize constants in SRM tables.
2. Initialize sysgened address spaces for the
SRM.
3. Process the APG, OPT, and IPS system
parameters.

IRARMRDR - Obtain a buffer and read a record from
SYS1.PARMLIB.
IEEDISPD

Display Domain Processor.
Write a console display of entries in the domain descriptor
table to a target console.

IEE8603D

SETDOMAIN Command Processor.
Process the SETDMN command by altering the domain
descriptor table.


[Figure 5-31 (Part 1 of 2) is a cross-reference matrix relating SRM entry
points to the modules that contain them. The matrix columns (module names)
and the X marks did not survive extraction. The entry points listed in
Part 1 are: CHAP, CPLRVSWF, CPUTLCK, CPUWAIT, IGC095, IRARMAP1, IRARMASM,
IRARMCAP, IRARMCED, IRARMCEL, IRARMCEN, IRARMCET, IRARMCL0, IRARMCL1,
IRARMCL3, IRARMCPI, IRARMCPO, IRARMCQT, IRARMCRD, IRARMCRL, IRARMCRN,
IRARMCRT, IRARMCRY, IRARMCSI, IRARMCSO, IRARMCVL, IRARMDEL, IRARMEQ1,
IRARMFRE, IRARMHIT, IRARMI00, IRARMI01, IRARMI02, IRARMI03, IRARMI04,
IRARMI05, IRARMI06, IRARMI07, IRARMI09, IRARMI10, IRARMI48, IRARMIL0,
IRARMIL1, IRARMIL3, IRARMIL4, IRARMIPS, IRARMMS2, IRARMMS6, IRARMNQP.]

Figure 5-31. SRM Module/Entry Point Cross Reference (Part 1 of 2)

[Figure 5-31 (Part 2 of 2): the matrix columns and X marks did not survive
extraction. The entry points listed in Part 2 are: IRARMOPT, IRARMPR1,
IRARMPR5, IRARMRDR, IRARMRM1, IRARMRM2, IRARMRPS, IRARMRR1, IRARMRR2,
IRARMR16, IRARMSQA, IRARMUXB, IRARMWM1, IRARMWM2, IRARMWM3, IRARMWM4,
IRARMWM5, IRARMWM7, IRARMWMI, IRARMWMJ, IRARMWMK, IRARMWMN, IRARMWMO,
IRARMWMQ, IRARMWMR, IRARMWMY, IRARMWR1, IRARMWR2, IRARMWR3, IRARMWR4,
IRARMWR5, IRARMWR6, IRARMWR7, IRARMWR8, IRARMWTR, IRARMXVT, IRARMXTL,
LCHUSE, NEWDP, RMRR1CKQ, RMRR2GST, RMRR2INT, RMRR2PER, RMRR2REQ,
RMRR2RTY, RMRR2SPR, RMRR2VFB, RMRR2VLD, STEAL.]

Figure 5-31. SRM Module/Entry Point Cross Reference (Part 2 of 2)


VTAM

This chapter outlines the important aspects of VTAM problem analysis. It is
important that the problem solver have some understanding of how VTAM
works.
The following publications provide important VTAM structure, logic, control
block format, and debugging information:

• OS/VS2 VTAM Logic
• OS/VS2 System Programming Library: VTAM
• OS/VS2 VTAM Data Areas
• OS/VS2 MVS VTAM Debugging Guide
VTAM is a subsystem in itself. For VTAM problem determination, it is
especially important to understand how work progresses through VTAM via its
internal dispatching mechanism, process scheduling services (PSS).
Some of the VTAM concepts that are discussed in this section are:
• Process scheduling services (PSS)
• VTAM's Relationship With MVS
• Processing Work Through VTAM (PABs, FMCBs)
• VTAM Locking
• VTAM Recovery/Termination
• VTAM Debugging

VTAM's Relationship with MVS
VTAM has its own address space to manage the network control program (NCP)
network. Under VTAM's main task, the following services are performed:
• VT AM initiation and termination.
• VARY, DISPLAY, and MODIFY network operator commands.
• An access method control block (ACB) is opened to VTAM so that VTAM
services can be used to communicate with an NCP.
The VARY processor obtains and releases IBM 3704/3705 communications
controllers through the system dynamic allocation services. A waiting subtask is
posted to build an NCP resource definition table (RDT) which provides a table
definition of the network. Another subtask is attached to actually load the 370x
NCP. This procedure allows multiple 370x's to be activated concurrently.
If TOLTEP and NETSOL are selected, each has its own subtasks operating in
VTAM's address space. Each is also connected to VTAM with an OPEN ACB.


Because the VTAM address space owns the 370x, IOS schedules global SRBs to
this address space for POST STATUS processing. Normally, however, VTAM uses
a disabled interrupt exit (DIE) to run its channel end appendage. The DIE
schedules SRBs (physically located in the I/O buffer) into the application
program's address space to run the posting process. POST STATUS is used only to
handle error situations or when RNIO is being traced with GTF active.
VTAM operates in the application program's address space when a service is
requested by the application program. Local SRBs are used for all VTAM I/O
processing to terminals or logical units. Other VTAM services such as
OPNDST/CLSDST are run under an IRB from the task that opened the VTAM
ACB. VTAM exits, ACB and request parameter list (RPL), are given control
when VTAM issues a SYNCH under the IRB. This means that a VTAM exit runs
as a parameter request block (PRB) under the task that opened the VTAM ACB.
VTAM macros (for I/O or other services) can be issued from these exits; however,
if the SYN option is used on the macro, a serialization bottleneck can result.
As seen from this explanation, VTAM's address space is not used for normal
I/O activity. In analyzing VTAM problems, do not be concerned if several
tasks are waiting in VTAM's address space. These tasks are the operator control,
NCP communication, and initiation/termination tasks, and are normally waiting
in VTAM's address space.

Processing Work Through VTAM
Following is an explanation of the dispatching mechanism and the associated key
control blocks that the problem solver should understand.
VTAM satisfies an application program's request by executing a series of
processes. Examples of processes are control layer and TPIOS; each process is a
discrete piece of work.
Each process is represented by a process anchor block (PAB) which is four
words long and serves as a serialization mechanism for a resource. See Figure 5-32.
A PAB always resides within some larger control block called a major control
block, such as an FMCB or an ACDEB. A process is always executed for a
particular terminal, logical unit, or option as defined by the major control block
containing the PAB.


[Figure 5-32 is a diagram of the VTAM control block structure. It shows a
major control block containing a PAB; the PAB's work element (WEL) queue,
chain field, offset/flag word, and DVT pointer; the PST and MPST; and the
RPH (pointed to by register 1) within the component recovery area (CRA),
which contains the PSS CRR (ISTAPCRR) and the process CRR.]

Figure 5-32. VTAM Control Block Structure


The first word of a PAB contains a work element pointer. A work element is a
parameter list for the process. A request parameter list (RPL) and a logical
channel program block (LCPB) are examples of work elements. The high-order
bit (byte O,X'80') of this first word is a gate bit which indicates that a
work element has been queued to the PAB. The gate bit serves as a serialization
mechanism; as more work elements are queued to the PAB, the gate bit prevents
rescheduling of the PAB until it can handle the work. The gate bit is needed to
prevent double scheduling of the PAB, because for many VTAM processes the
process scheduling service (PSS) dequeues the work element before it gives the
process control.
The TPQUE macro is always used to queue work elements. This in-line macro
checks the gate bit to determine if scheduling is required and, if so, executes an
inner macro, TPSCHED.
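The gate-bit protocol can be sketched as follows. (Python stands in for the in-line assembler macro; the class and field names are invented for illustration.)

```python
class PAB:
    """Sketch of TPQUE's gate-bit serialization (illustrative, not IBM code)."""

    def __init__(self):
        self.gate = False   # high-order bit of PAB word 1: work is queued
        self.work = []      # queued work elements (RPLs, LCPBs, ...)
        self.schedules = 0  # number of times TPSCHED was driven

    def tpque(self, element):
        """Queue a work element; schedule the PAB only if the gate was off."""
        self.work.append(element)
        if not self.gate:
            # Gate off: this is the first element since the last dispatch,
            # so the PAB must be scheduled (the inner TPSCHED macro).
            self.gate = True
            self.schedules += 1
        # Gate already on: the PAB is already scheduled; driving TPSCHED
        # again would double-schedule it, so the element is simply queued.
```

Note how a burst of TPQUE calls drives TPSCHED only once; the gate stays on until PSS dispatches the process and dequeues the work.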
The second word of the PAB is the PAB chain field. As a general convention,
PSS and its macros (for example, TPQUE) use the second word of any control
block as a chain field. The end of the chain is indicated by X'80000000' in the
chain field. The PAB chain field is used to chain the PAB to some queue, for
instance, a dispatching queue. The chain field's high-order bit is a gate bit. The
gate bit indicates that the PAB has been scheduled for dispatching.
Following are the three ways to schedule a PAB for dispatch:
• While running under a VTAM process, queue the PAB to the PABQ in the
request parameter header (RPH). The PAB will be dispatched when the
current process completes.
• If not running as a VT AM process, queue the PAB to the process scheduling
table (PST) for task-related work, or to the memory process scheduling
table (MPST) for address-space related or cross-address-space related
work.
• DIRECT scheduling causes an SRB, with the PAB address as a parameter, to be
scheduled to a special PSS entry point. TPIOS uses this method to initiate
inbound processing from the DIE.
Note that if the PAB chain gate is off while the work element gate is on, the PAB is
probably suspended. A TPSCHED macro is required to reactivate the process.
The third word in the PAB contains the PAB offset and the destination vector
table (DVT) pointer.


The PAB offset is used to locate the beginning of the major control block. It
is necessary to locate the beginning of the major control block because there is a
PSS convention that uses the third word of the major control block as a pointer
to the process scheduling table (PST). The fourth word of the PST points to the
memory process scheduling table (MPST), which is related to a particular address
space. The PAB offset then provides a means to identify task and address space
relationships for a given PAB. As a rule, the PST is used to schedule processes to
run under the IRB of a particular task, while the MPST is used for scheduling a
local SRB into the address space.
The PAB DVT pointer points to the beginning of a module list, that is, a list of
addresses that are entry points to the modules to be given control during the
process. Because the DVT defines a whole process that is to be executed, many
PABs will have DVT pointers to the same DVT. The next entry in the DVT to be
given control is kept in the request parameter header (RPH); the RPH is updated
each time the TPESC macro is used to pass control to the next module in the DVT.
The fourth and last word of the PAB contains a byte of flags and a pointer to
the request parameter header (RPH). The flag byte contains scheduling indicators
for PSS and a bit to indicate whether or not PSS should dequeue the work element
for the process. The RPH pointer is set by PSS when the PAB is to be dispatched
and is reset to zero when the process completes. Register 1 always points to the
RPH when the process is given control. All of the information relating to the
process is stored in the RPH. This information includes such items as pointers to
the work element (RPHWEA), the PST (RPHTSKID), a resume address
(RPHRESMA) and a register save area (RPHWORK) if the process is suspended,
and back pointers to the PAB (RPHMAJCB). The RPH resides within the
component recovery area (CRA).

VTAM Function Management Control Block (FMCB)
The function management control block (FMCB) is the primary control block
used in controlling I/O processing between an application program and a
destination node (terminal, component, logical unit (LU), etc.). This block usually
contains the most information when a problem develops in the I/O processing to a
particular node. The FMCB is created at OPNDST time; at least one FMCB exists
for each open connection. All FMCBs for an application are chained together
(at offset X'4') out of the application's ACDEB (at offset X'40'). In addition to
the application FMCBs, VTAM maintains FMCBs for such things as dial-in lines
and cluster control units. For logical units, there is also an SSCP FMCB chained
from VTAM's ACDEB that is used for network control.
The FMCB contains the PABs that control processing through control layer
(inbound/outbound) and TPIOS (outbound). Although there is a PAB in the
FMCB for TPIOS inbound, the PAB in the DNCB is normally used to control it.
In addition to the PABs, the FMCB contains many flags and indicators and some
queue headers. These flags and headers are described in the OS/VS2 Data Areas
(microfiche).


The wait queues at offset X'110' and X'114' of the FMCB are important in
debugging. These fields are used to queue logical channel program blocks (LCPBs)
that have had channel programs built and queued to be shipped out to the 370x
or local 3270. The LCPBs are dequeued from the wait queues when the
requested operation completes. Expect to see read-type operations queued to
the wait queue because these operations do not complete until data is entered
and received from the associated terminal. However, if write or control type
operations are not completing, investigate the situation further.

VTAM Operating Characteristics
The following topics describe characteristics of VTAM's operating environment.

Module Naming Convention
Each VTAM module name indicates the type of processing that it performs.
Following are the major VTAM module naming groups and the processes
associated with them:

Module Group    Process

ISTAICxx        Application program interface
ISTAPCxx        Process scheduling services (PSS - VTAM's dispatching
                mechanism)
ISTDCCxx        Basic and common control layer
ISTRCCxx        Record format control layer
ISTZxxxx        TPIOS
ISTORFxx        Storage management
ISTOCCxx        OPNDST, CLSDST, OPEN, CLOSE
ISTINxxx        SSCP (VARY, DISPLAY)
ISTRAMxx        Task termination and address space termination resource
                manager
ISTSDCxx        SYSDEF

Address Space Usage
VTAM's modules reside in the nucleus, the VTAM address space private area, and
the LPA. The nucleus contains the attention handling routine and type 1 SVC
routine. The private area contains modules for initialization, termination, initial
command processing, SYSDEF, and NETSOL. The LPA contains all of the other
VTAM modules.
Most of the VTAM control blocks are located in the CSA. The data buffers as
well as the majority of the control blocks occupy the 11 buffer pools that are
allocated at VTAM initialization within the CSA.


Locking
Because VTAM uses a number of SRBs and TCBs, it is important to serialize
VTAM's internal use of shared resources (that is, to prevent simultaneous update
of a control block by two different processes). The VTAM locking structure
accomplishes this serialization. The VTAM locking structure is an internal VTAM
function not visible to the user or MVS. VTAM's locking structure is totally
independent of the MVS locking structure.
In storage, the locks exist as full words in various areas of the VTAM control
blocks. The lockword is organized as follows:

[Lockword layout: the first byte is a count of the holders of this lock;
the next 22 bits chain the RPHs (request parameter headers) of any
processes waiting for this lock; the low-order bit, when on, indicates
that the lock is held exclusively.]

Each lock is defined as being of a certain lock level. This allows a lock to be
maintained according to a predetermined hierarchy. These lock levels are usually
not of significance to the debugger except that he needs to know that they exist
so he can interpret the lock level bits in the component recovery area (CRA) as
described below.
Locks are obtained and released by using VTAM internal macro instructions.
The access to the locks is controlled as follows:
• A shared request for a lock that is free or held as shared (with no outstanding
exclusive requests) is honored immediately.
• An exclusive request for a lock that is held as shared is queued until all current
shared requests are released.
• Any request for a lock that is held as exclusive, or has an exclusive request
outstanding, is queued until the exclusive use is complete.
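These three rules can be condensed into a single grant test. (Python sketch; the flattened lock representation is invented for illustration and does not match the packed lockword format.)

```python
def can_grant(holders, held_exclusive, exclusive_waiter, want_exclusive):
    """Decide whether a new VTAM lock request can be honored immediately.

    holders: current count of lock holders
    held_exclusive: the lock is currently held exclusively
    exclusive_waiter: an exclusive request is already queued
    want_exclusive: the new request is for exclusive use
    Illustrative only.
    """
    if holders == 0 and not exclusive_waiter:
        return True   # free lock: any request is honored immediately
    if held_exclusive or exclusive_waiter:
        # Exclusive use, or a queued exclusive request, forces the
        # new request to wait its turn.
        return False
    # Lock is held shared with no exclusive activity: only another
    # shared request can be honored immediately.
    return not want_exclusive
```

A request that cannot be granted queues its RPH on the lockword's chain, which is exactly the structure Figure 5-33 illustrates.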
The locks held by a process are indicated by the lock level bits in the CRA
(at offset X'8'). The pointers to the various locks are located at X'C' through
X'30' of the CRA. The pointers to the locks are filled in when a lock request is
made; therefore, only the locks currently held have valid pointers. Locks are held
only for the duration of a VTAM process; all locks must have been released when
a process exits.


Example of a locking situation:

[The CRA lock accounting word at CRA+X'8' is shown with the bit for a
level 4 lock turned on, and the level 4 lock pointer at CRA+X'18'
pointing to the lock.]

If the CRA lock accounting word appeared as above, it would mean that a level
4 lock was held by the process currently active. Offset X'18' (level 4 lock pointer)
of the CRA contains the pointer to the lock in question. Refer to OS/VS2 Data
Areas (microfiche) for details.
RPHs that are waiting for a lock will be queued onto that lock. Multiple RPHs
waiting for the same lock will be chained together. This relationship is shown in
Figure 5-33.
Summary of VTAM Locking: The main concern of the debugger regarding
locks is that a process can be forced to wait because it cannot obtain a
lock. The lock is unavailable because it is held by some other process. This
situation is reflected by an active CRA with a resume address that points to
code that follows a lock request. Lay out the CRAs, etc., as shown in Figure 5-33,
investigate those processes that are waiting, determine what lock is being
requested, locate the current lock holder, and determine why the lock has not been
released.

VTAM Recovery/Termination
VTAM recovery/termination is accomplished by means of STAE, ESTAE, FRR,
and resource manager routines.
The exact recovery action attempted by VTAM depends on conditions at the
time of the errors. However, for debugging purposes, the basic functions
of VTAM are the following:
• To record the SDWA in SYS1.LOGREC.
• To take an SDUMP.
• To terminate the application program or VTAM. (Note: if the error occurs in
the VTAM address space, VTAM generally attempts to simulate a normal
shutdown.)

5.8.8   OS/VS2 System Programming Library: MVS Diagnostic Techniques

VTAM (continued)
[Figure 5-33 body: five CRAs in LPBUF, each containing the address of a lock, a lock accounting word, and an RPH at offset X'34'. RPHs with a zero resume address at offset X'10' are not waiting; RPHs with a non-zero resume address are waiting and are chained from the lockwords in storage.]
Non-waiting RPH (CRA 1) holds the lock that RPH 3 (CRA 3) and RPH 4 (CRA 4) are
waiting for. Non-waiting RPH 2 (CRA 2) holds a lock no RPHs are waiting for.
Waiting RPH 3 (CRA 3) holds a lock that no RPH is waiting for. Waiting RPH 4
(CRA 4) holds a lock that RPH 5 (CRA 5) is waiting for.

Figure 5-33. Several RPHs Waiting for the Same Lock

The termination of VTAM or VTAM applications causes the VTAM resource
managers to get control. The resource manager routines clean up the VTAM
resources allocated to the terminating task or address space.
VTAM recovery/termination functions affect debugging in the following ways:

1. A dump and an SDWA in SYS1.LOGREC are provided for the error condition. If
the error was in a VTAM application address space, VTAM and other VTAM
applications continue to run. This allows you to debug certain problems
without having a major impact on the installation's operation.
2. Subsequent errors can occur in termination, in VTAM, or in other VTAM
applications as a result of the original, undetected error. If this is the
case, a dump can be very difficult to understand because cleanup was attempted
or performed on behalf of the original error. Always be aware of any problems
that have occurred prior to the particular problem being diagnosed. To detect
these previous problems, inspect the in-storage LOGREC buffer and
SYS1.LOGREC.
In addition to the major recovery action described above, there are other
recovery actions:
• A failure during command interpretation (but not during command execution)
results in the loss of the current operator command, but continued availability
of the operator control function.
• Failure during various SSCP functions can result in the immediate termination of
VTAM without simulating a normal shutdown.
• Failure in the storage management services (SMS) modules, ISTORFBA and
ISTORFBD, results in a failure of the storage request, but does not cause
termination. The module requesting SMS service is informed of this action by
return codes.
• Authorized path entry/exit errors are retried or the RPL is posted with an error
indication.

VTAM Debugging
Because VTAM is a large component that interacts with other components and
application programs, when you debug VTAM you must look at a number of
factors besides the storage dump. Begin the debugging process by
considering the operating environment and all the conditions that could have led
to the suspected error.

Following are some items you should look at when attempting to solve VTAM
problems:
• Console sheet
• GTF traces (especially for VTAM I/O activity)

• SYS1.LOGREC entries for program checks or MDR records
• NCP generation listing
• PTF level of the system

Waits
VTAM waits can affect the following groups:
• The entire VTAM component and all VTAM applications
• One or more applications only
• VTAM network operator commands only (possibly only VARY)
• One or more terminals only
Also, a wait can occur when VTAM will not halt.
VTAM process scheduling services (PSS) routines control the flow of work
through VTAM by performing an internal VTAM dispatching function. In
debugging, the most important control block in determining the status of the
dispatching activity and the wait states is the request parameter header (RPH),
which is located within the component recovery area (CRA).
If a VTAM internal process is waiting for a VTAM lock, for storage to become
available, or for another process to complete, this condition is reflected by an
active CRA. CRAs are found in the LPBUF buffer pool. (See Figure "How to
Locate the CRA" in OS/VS2 MVS VTAM Debugging Guide.) This pool is located
as shown in Figure 5-34. The RPH is located X'34' bytes into the CRA; it can be
recognized by the string X'016C' in the first halfword. Offset X'10' into the RPH
is the RPHRESMA field; this field contains zeros if the RPH is not waiting, or a
resume address if it is waiting.
Once you find a waiting RPH, the best way to determine why it is waiting is
to find the module at the address in the resume address field, and then look at
the module listing. Unless the wait is for a lack of buffers (which can be resolved
by increasing the number of buffers), further analysis is necessary to determine
why the process is not being posted or why a lock is not being freed.
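The RPH test just described can be expressed as a small sketch. This is an illustration over a mocked byte buffer, not a real dump reader; the offsets (RPH at CRA+X'34', eyecatcher X'016C', RPHRESMA at RPH+X'10') come from the text, and big-endian fields are assumed as on System/370.

```python
# Sketch: given the bytes of a CRA, decide whether its RPH is waiting.
import struct

RPH_OFFSET = 0x34   # RPH within the CRA
RPHRESMA = 0x10     # resume-address field within the RPH
EYECATCHER = 0x016C

def waiting_resume_address(cra: bytes):
    """Return the resume address if the RPH is waiting, else None."""
    rph = cra[RPH_OFFSET:]
    if struct.unpack_from(">H", rph, 0)[0] != EYECATCHER:
        raise ValueError("no X'016C' eyecatcher - probably not an active CRA")
    resume = struct.unpack_from(">I", rph, RPHRESMA)[0]
    return resume or None
```

A scan of LPBUF would apply this to each CRA and report the non-zero resume addresses for follow-up against the module listings.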

RPHs waiting for a VTAM lock are queued onto that lock. Multiple
RPHs waiting for the same lock are chained together (as shown in Figure 5-33).
If a process holds any locks, the lock level bits at offset X'8' in the CRA indicate
the level of the lock(s) being held. Pointers to the various locks are located at
offsets X'C' through X'30' of the CRA. Note that although all these pointers can
contain addresses, only the pointers to the locks held or requested during the
dispatching of the current process are valid.
A VTAM internal process can often be waiting for storage. VTAM routines
obtain and release buffers by using VTAM internal macro instructions. These
macro instructions branch to the VTAM storage management modules that control
the buffer pools allocated at VTAM start time. Because the number of buffer
pools is specified at VTAM initialization and is constant, it is possible to encounter
a shortage condition in unusual situations. See the section on tuning VTAM in
OS/VS2 System Programming Library: VTAM for information on specifying the
proper storage pool values, the threshold value effect, and slowdown processing.
To determine if a buffer pool is in a slowdown state, do the following:

• Locate the buffer pool control blocks (BPCBs) as shown in Figure 5-34.
• Look at offset X'10' of each BPCB. If the X'40' bit is on, the buffer pool is in
a slowdown state. (The BPCBs are located in contiguous storage and can thus
be scanned quickly.)
If a dump has been taken because of a wait-type problem in VTAM, and the
dump shows a buffer pool in slowdown, you can usually conclude that a buffer
shortage has caused the wait problem. If you increase the number of buffers in the
appropriate pool, this usually eliminates the problem. However, the problem
should be investigated further to determine if a VTAM logic error has caused
the buffers to be wasted and thus depleted.
VTAM routines that request buffers can choose to wait or not to wait if there
are not enough buffers available to fulfill the buffer request.
If a routine chooses to wait (and most do) when buffers are unavailable, the
process is represented by an active CRA with a non-zero resume address.
In addition to the slowdown bit being on in the BPCB, the RPH for the process
is queued onto the BPCB. The queue headers in the BPCB are located at
offsets X'18' and X'1C'; X'18' is for queuing priority requests and X'1C' is for
queuing normal requests. Figure 5-35 represents the queuing of RPHs and Figure
5-34 shows how to find the BPCBs. Any address in either of the queue headers of
a BPCB indicates a buffer problem with that pool. The RPHs queued from the
BPCB have a resume address that points to code following the buffer request.
Examine the routine in question to determine if an error in the code has caused the
buffer problem or if the condition exists because the buffer specification was too
small.
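The two BPCB checks above (the slowdown bit and the queue headers) can be sketched as follows. This is an illustration over raw bytes, with the offsets taken from the text; the surrounding layout of the BPCB is not described here and is not assumed.

```python
# Sketch: decode the diagnostic fields of one BPCB.
# Offsets from the text: slowdown bit X'40' at BPCB+X'10',
# priority/normal RPH queue headers at X'18' and X'1C'.
import struct

def bpcb_state(bpcb: bytes) -> dict:
    slowdown = bool(bpcb[0x10] & 0x40)
    prio_q, norm_q = struct.unpack_from(">II", bpcb, 0x18)
    return {
        "slowdown": slowdown,
        # any non-zero queue header means RPHs are waiting on this pool
        "rphs_waiting": prio_q != 0 or norm_q != 0,
    }
```

Because the BPCBs are contiguous, a dump reader could apply this check in a loop over all eleven pools.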

[Figure 5-34 body: a hexadecimal storage dump showing the ATCVT address at location X'408', the beginning of the ATCVT, the BPDTY with its BPEs, and the LPBUF BPCB, annotated with the LPBUF BPENT address and the beginning and ending addresses of the pool.]
Notes:
1. Locate the address of the ATCVT (VTAM communication vector table) by going to
absolute storage location X'408'.
2. Go to the specified address to locate the ATCVT.
3. Locate the address of the BPDTY (buffer pool directory) by indexing into the
ATCVT a value of X'80'.
4. Go to the specified address to locate the BPDTY.
5. Locate the BPEs (buffer pool entries) by indexing into the BPDTY a value of X'90'.
6. The BPEs contain the name of the pool in the first 4 bytes, the length of the pool
element in the second 2 bytes of the second word, and the address of the BPCB
(buffer pool control block) in the third 4 bytes. Each BPE is X'10' bytes long.
(Note that the example shows the LPBUF.)

Figure 5-34. Sample Storage Pool Dump
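The chain in the notes above can be sketched over a mocked flat "storage" buffer. This is an illustration, not a dump reader: the big-endian fullword pointers and the in-storage pool-name comparison are assumptions for the sketch, while the offsets (X'408', ATCVT+X'80', BPDTY+X'90', X'10'-byte BPEs with the BPCB address in bytes 8-11) come from the notes.

```python
# Sketch: walk location X'408' -> ATCVT -> BPDTY -> BPEs and return the
# BPCB address for a named pool (pool names are EBCDIC in the first 4 bytes).
import struct

def word(storage: bytes, addr: int) -> int:
    """Fetch a big-endian fullword at the given mock address."""
    return struct.unpack_from(">I", storage, addr)[0]

def find_bpcb(storage: bytes, pool_name: bytes, npools: int = 11):
    atcvt = word(storage, 0x408)
    bpdty = word(storage, atcvt + 0x80)
    first_bpe = bpdty + 0x90
    for i in range(npools):
        entry = first_bpe + i * 0x10
        if storage[entry:entry + 4] == pool_name:
            return word(storage, entry + 8)   # BPCB address, bytes 8-11
    return None
```

The default of 11 pools matches the count given later under "Miscellaneous Hints On VTAM".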


[Figure 5-35 body: a BPCB whose queue header at X'18' or X'1C' points to a chain of waiting RPHs within active CRAs in LPBUF; each waiting RPH has a non-zero resume address at offset X'10'.]

Figure 5-35. Queuing of RPHs While Waiting for Storage
If a routine chooses not to wait when buffers are unavailable, return codes
notify the routine of the lack of buffers. There are no specific flags that are always
set to indicate that the request was rejected. Therefore, you cannot easily
determine if a particular routine requested buffers but did not get them. However,
you can tell that there is a buffer problem because this is usually indicated by
the slowdown bit being on in one of the BPCBs.
Usually there is an active CRA for problems that are described as VTAM
waits. However, for some problems (for example, one or more terminals waiting),
a dump might be obtained that has no active CRAs. The best place to start with a
problem such as this one is to locate the FMCB/DNCB for the terminal(s) in
question. The FMCB/DNCB control blocks contain the following:
• Various flags
• PABs to control the inbound and outbound request processing
• Queues of outstanding requests
Investigate further any work elements found unprocessed on the PABs or queues
of these control blocks.
To find the FMCB/DNCB for a particular node name, follow this chain:
• Location X'408' points to the ATCVT.
• ATCVT + X'C' points to the QAB.
• QAB + X'8' points to the first RDT segment.
• RDT + X'4C' points to the next RDT segment.

RDTs are segmented tables and each segment contains information about a
major node as defined in SYS1.VTAMLST. Each segment contains entries for the
groups, lines, clusters, terminals, components, etc., for the major node, as shown
below:

[Figure: an RDT segment containing an entry for the NCP name followed by entries for each subelement name.]

The node name is in the first eight bytes of each RDT entry. Stop chaining
through the RDT segments when the major node name (that is, NCP or LBUILD) is
found; then scan down the RDT entries of this major node until the node in
question is found. Offset X'28' of the appropriate RDT entry points to the DNCB
(offset X'28' from the beginning of the name). DNCB + X'10' points to the FMCB.
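The segment chain and the final two hops can be sketched over a mocked storage buffer. This is an illustration: the offsets come from the chain and text above, while the big-endian pointer format and a zero forward pointer ending the segment chain are assumptions of the sketch.

```python
# Sketch: follow X'408' -> ATCVT, ATCVT+X'C' -> QAB, QAB+X'8' -> first RDT
# segment, then RDT+X'4C' to each next segment; and resolve an RDT entry
# to its DNCB (entry+X'28') and FMCB (DNCB+X'10').
import struct

def word(storage: bytes, addr: int) -> int:
    return struct.unpack_from(">I", storage, addr)[0]

def rdt_segments(storage: bytes) -> list[int]:
    """Return the chain of RDT segment addresses (zero pointer assumed at end)."""
    atcvt = word(storage, 0x408)
    qab = word(storage, atcvt + 0x0C)
    seg = word(storage, qab + 0x08)
    segs = []
    while seg:
        segs.append(seg)
        seg = word(storage, seg + 0x4C)
    return segs

def fmcb_for_entry(storage: bytes, entry: int) -> int:
    dncb = word(storage, entry + 0x28)   # DNCB pointer, X'28' from the name
    return word(storage, dncb + 0x10)    # DNCB+X'10' -> FMCB
```

Scanning the 8-byte node names within each segment would complete the search described in the text.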
Another way to find the FMCB/DNCB for a process that is waiting (but has an
active CRA and an RPH with a non-zero resume address) is to look at the module
that is waiting and determine its register usage. Find the registers that were saved
at RPH + X'28'. See the module listing for the order in which the registers were
saved.

Program Checks
The information generally available on VTAM program checks is in the SDWA and
the SDUMP that the ESTAEs or FRRs provide. This information is used in the
normal manner to determine the cause of the program check. However, there can
be cases where more exact or timely information is required. This additional
information might have to be obtained through the use of traces or traps. Traps at
the entry point of the FRRs or ESTAEs (ISTAPC61 and ISTAPC62, in particular)
are often useful.

Miscellaneous Hints On VTAM
1. VTAM waits can occur because of buffer depletion. Such a situation
usually occurs just after VTAM is installed and before the actual buffer
requirements are determined. Running GTF (with the USR trace option) at
this time can be helpful because VTAM creates an SMS trace record whenever
a storage request is queued. Because VARY ACTIVATE/INACTIVATE
of an NCP puts the heaviest stress on the buffer pools, start GTF
before an NCP is activated.


2. VTAM places warmstart copies of major nodes into the data set
SYS1.VTAMOBJ the first time that a node is activated. These warmstart
copies are used for subsequent activations of the node. If a node definition is
changed in SYS1.VTAMLST, be sure to scratch the corresponding member
in SYS1.VTAMOBJ to ensure that the new definition is used by VTAM. Also
scratch the members of SYS1.VTAMOBJ after PTFs have been installed
because some of the bit definitions might have changed.
3. Most VTAM control blocks are in the 11 VTAM buffer pools. By simply
scanning the buffer pools and looking for unusual conditions you can often
uncover many of the problems. Each buffer in the buffer pools is preceded by
a two-word buffer header. The high-order bit of the first word indicates whether
the buffer is allocated or available: on = allocated, off = available. The address
portion of the first word points to the module that last released the buffer. The
second word contains a pointer to the buffer pool control block.
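The buffer header just described can be decoded with a short sketch. The 24-bit address mask is an assumption based on System/370 addressing of this era; the flag bit and word layout come from the text.

```python
# Sketch: decode the two-word header that precedes each buffer.
# Word 1: high-order bit = allocated/available, address bits = module
# that last released the buffer. Word 2: BPCB address.
import struct

def decode_buffer_header(header: bytes) -> dict:
    w1, bpcb = struct.unpack(">II", header[:8])
    return {
        "allocated": bool(w1 & 0x80000000),
        "last_releaser": w1 & 0x00FFFFFF,  # 24-bit address (S/370 assumption)
        "bpcb": bpcb,
    }
```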


VSAM

The virtual storage access method (VSAM) consists of three major subcomponents:
• Record management
• Open/close/end-of-volume
• I/O manager

Record Management
Record management processing produces no messages. Problem determination
normally begins with an examination of the request parameter list (RPL). If a
physical error occurs and the user has provided a large enough message area
(pointed to by RPLERMSA), VSAM (IDA019R5) builds a SYNADAF-type record
in that area for the user to examine. For both logical and physical errors, VSAM
sets return codes in the RPL.

RPL
Three fields in the RPL are used to indicate an error:

1. RPLERREG — (RPL + X'D') a one-byte value which is also returned in
   register 15 after a request:
   0  — request completed normally
   8  — a logical error occurred
   12 — a physical error occurred

2. RPLCMPON — (RPL + X'E') a one-byte value that indicates which
   component was being processed at the time of the error if
   the request involved alternate indexes. This value also
   indicates whether upgrading was valid or was incorrect
   because of the error.

   CODE   COMPONENT         STATUS OF UPGRADE
   0      base cluster      valid
   1      base cluster      might be incorrect
   2      alternate index   valid
   3      alternate index   might be incorrect
   4      upgrade set       valid
   5      upgrade set       might be incorrect

3. RPLERRCD — (RPL + X'F') a one-byte value describing the error (see the
   Diagnostic Aids section of OS/VS2 VSAM Logic).
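The three fields above can be pulled out of a dumped RPL with a short sketch. This is an illustration, not VSAM code; it reads only the single-byte offsets listed above.

```python
# Sketch: interpret RPLERREG (RPL+X'D'), RPLCMPON (RPL+X'E'),
# and RPLERRCD (RPL+X'F') from the raw bytes of an RPL.
COMPONENT = {
    0: ("base cluster", "valid"),
    1: ("base cluster", "might be incorrect"),
    2: ("alternate index", "valid"),
    3: ("alternate index", "might be incorrect"),
    4: ("upgrade set", "valid"),
    5: ("upgrade set", "might be incorrect"),
}

def rpl_error(rpl: bytes) -> dict:
    erreg, cmpon, errcd = rpl[0x0D], rpl[0x0E], rpl[0x0F]
    kind = {0: "normal", 8: "logical error", 12: "physical error"}.get(erreg)
    return {"kind": kind, "component": COMPONENT.get(cmpon), "code": errcd}
```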

Other important fields in the RPL are:
RPLREQ    — (+X'02') request type
RPLPLHPT  — (+X'04') pointer to the PLH
RPLECB    — (+X'08') ECB or pointer to the ECB
RPLDACB   — (+X'18') pointer to the ACB
RPLAREA   — (+X'20') pointer to the user's record area
RPLARG    — (+X'24') pointer to the user's search argument
RPLOPTCD  — (+X'28') two bytes of option flags
RPLDDDD   — (+X'40') last successful request's RBA value
            (returned to the user by VSAM)

PLH
Once the information in the RPL has been evaluated, the next block to examine is
the placeholder (PLH). The PLH contains current information about the request,
including positioning and pointers to associated control blocks such as buffer
control blocks (BUFCs) and the I/O management block (IOMB).
The following fields are important for understanding the request:
PLHFLG1   — (+X'02') status flags
PLHFLG2   — (+X'03') status flags
PLHEFLGS  — (+X'04') two bytes of exception flags
PLHFLG3   — (+X'06') status flags
PLHAFLG3  — (+X'07') status flags
PLHCRPL   — (+X'14') pointer to the current RPL
PLHDBUFC  — (+X'34') pointer to the current data BUFC
PLHDIOB   — (+X'4C') pointer to the IOMB
PLHRETO   — (+X'74') halfword offset into the register 14 pushdown save area.
            If the halfword at +X'76' is zero, PLHRETO is an
            offset from +X'78' into a 14-word save area and
            points to the next available word. If the halfword at
            +X'76' is not zero, then it is the offset from +X'78' to
            the beginning of a 20-word save area at the end of the
            PLH, and PLHRETO is an offset from +X'78' into
            that save area.
PLHIBUFC  — (+X'BC') pointer to the current index BUFC
PLHIXSPL  — (+X'C8') 32-byte index search parameter list (IXSPL)
            containing information about the results of the last
            index search
PLHKEYPT  — (+X'F8') pointer to the current key value or relative record
            number
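The PLHRETO arithmetic is the fiddliest part of the list above, so here is a worked sketch of it. This is an illustration of the rule as stated, not VSAM code; the mock PLH is just a byte buffer.

```python
# Sketch: resolve the register-14 pushdown save area from PLHRETO
# (halfword at PLH+X'74') and the halfword at PLH+X'76'.
import struct

def save_area_info(plh_addr: int, plh: bytes) -> dict:
    reto = struct.unpack_from(">H", plh, 0x74)[0]
    ext = struct.unpack_from(">H", plh, 0x76)[0]
    # ext == 0: the 14-word save area starts at PLH+X'78'.
    # ext != 0: ext is the offset from +X'78' to a 20-word save area
    #           at the end of the PLH.
    return {
        "area_start": plh_addr + 0x78 + ext,
        "area_words": 14 if ext == 0 else 20,
        # either way PLHRETO is an offset from PLH+X'78'
        "next_word": plh_addr + 0x78 + reto,
    }
```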


BUFC
The buffer control block (BUFC) contains function codes, status indicators, and
relative byte address (RBA) values describing the associated buffer.
BUFCFLG1  — (+X'01') BUFC status flags
BUFCIOFL  — (+X'02') I/O status flags
BUFCDDDD  — (+X'08') RBA for input if BUFCVAL is on
BUFCORBA  — (+X'0C') RBA for output if BUFCMW is on
BUFCBAD   — (+X'14') pointer to the associated buffer

During record management processing, register usage is as follows:
R1 — RPL pointer
R2 — PLH pointer
R3 — pointer to the access method block (AMB) of the component being
     processed
R4 — BUFC pointer
Use the register 14 save area in the PLH to find the path taken by a request through
record management.

Record Management Debugging Aids
It is not always desirable to cause program checks as a method of getting dumps,
because some applications have sophisticated error recovery routines that can
possibly change the environment. It is preferable to get documentation of the
error before such routines get control, and then allow these routines to do their
cleanup function after the dump is taken. The following code is an example of a
console-activated communications vector table (CVT) trap for record management
errors that causes the failing application to loop, allowing a console dump to be
taken. Following the dump, the trap can be deactivated, allowing the application
to continue processing. The code can be inserted into CSECT IDA019R1 at label
'POSTRPL', label 'POSTRPL2', and the patch area at the end of the module.

NAME IDA019L1 IDA019R1
VER POSTRPL    950C,100D
VER POSTRPL2   1851,9101,1028
VER PATCH      0000,0000           X'84' bytes of patch area
REP POSTRPL    45E0,Bxxx           branch to PATCH1
REP POSTRPL2   1851,45E0,Bxxx      branch to PATCH2
REP PATCH1     58F0,0010,          point to CVT
LOOP1          9102,F108,          is trap activated?
               4780,Bxxx,          no, go to EXIT1
               D500,F10A,100D,     compare error type
               4770,Bxxx,          no, go to EXIT1
               D500,F10B,100F,     compare error code
               4770,Bxxx,          no, go to EXIT1
               47F0,Bxxx,          yes, go to LOOP1. Loop until trap bit
                                   in CVT is turned off.
EXIT1          950C,100D,          restore instruction
               07FE,               branch back inline
PATCH2         58F0,0010,          point to CVT
LOOP2          9102,F108,          is trap activated?
               4780,Bxxx,          no, go to EXIT2
               D500,F10A,100D,     compare error type
               4770,Bxxx,          no, go to EXIT2
               D500,F10B,100F,     compare error code
               4770,Bxxx,          no, go to EXIT2
               47F0,Bxxx,          yes, go to LOOP2. Loop until trap bit
                                   in CVT is turned off.
EXIT2          9101,1028,          restore instruction
               07FE                branch back inline

To activate the trap, set CVT + X'10A'-X'10B' to logical error (X'08xx'), where xx
is the error code (RPLERRCD), or to physical error (X'0C00'). Then 'OR' on bit 6
(X'02') in CVT + X'108', taking care to leave the other bits in that byte
undisturbed. After the loop occurs and a console dump of the failing address space
has been taken, turn off bit 6 in CVT + X'108' to deactivate the trap and allow the
application to continue processing. Be sure that the dump taken includes the
region, SQA, and CSA. Note that when using the trap for physical errors the
RPLERRCD is X'00' at the point of the trap because VSAM has not yet gone to
IDA019R5. Physical errors caused by unit check (for example, incorrect length or
no record found on a search ID) require that the I/O supervisor block (IOSB) be
examined. To get a dump with the IOSB still valid, a trap can be inserted into
nucleus CSECT IDA121A4 (abnormal end appendage) at label 'PERMERR'. Since
this is in the nucleus, the trap can be set from the console. (See I/O Manager
Debugging.)
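The activation and deactivation rules above reduce to a few byte operations, sketched here as an illustration over a mock CVT buffer (the real trap is armed from the console, not from a program):

```python
# Sketch: arm/disarm the record-management CVT trap described above.
def arm_trap(cvt: bytearray, logical: bool, errcd: int = 0) -> None:
    cvt[0x10A] = 0x08 if logical else 0x0C   # error type the patch compares
    cvt[0x10B] = errcd if logical else 0x00  # RPLERRCD for logical errors
    cvt[0x108] |= 0x02                       # bit 6 on; other bits untouched

def disarm_trap(cvt: bytearray) -> None:
    cvt[0x108] &= ~0x02 & 0xFF               # lets the looping request continue
```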

Record management error codes (RPLERRCD) are described in the Diagnostic
Aids section of OS/VS2 VSAM Logic. It is useful to know which module sets
each error and the name of each error, so that you can find where it is set in
the module via the cross reference.
Error Code (hex)   Symbolic Name   Module (IDA019xx)

Logical
04   RPLEODER   RD, RR, RY, R2, R4, R8
08   RPLDUP     RA, RQ, RX, R4
0C   RPLSEQCK   RA, RR, RX, R4
10   RPLNOREC   RA, RR, RY
14   RPLEXCL    RF, RY, R2, R8
18   RPLNOMNT   RW, RY, R2, R5
1C   RPLNOEXT   RE, RF, RM, R5, R8
20   RPLINRBA   RA, R8
24   RPLNOKR    RM
28   RPLNOVRT   RG, RU, RX
2C   RPLINBUF   RR, RT, RY, R4, R8
40   RPLNOPLH   RU, RX, R1
44   RPLINACC   RQ, R4, R8
48   RPLINKEY   R1, R8
4C   RPLINADR   R1, R8
50   RPLERSER   RL, RX, R8
54   RPLINLOC   RQ, R1, R4, R8
58   RPLNOPTR   RD, RR, R4, R8
5C   RPLINUPD   RQ, RX, R4, R8
60   RPLKEYCH   RL, RX
64   RPLDLCER   RL, RQ
68   RPLINVP    RA, RR, RY, RX, R1, R4, R8
6C   RPLINLEN   RL, RQ, RU, R4, R8
70   RPLKEYLC   R1
74   RPLINLRQ   RR, R4, R8
78   RPLINTCB   RP
84   RPLSRLOC   RT
88   RPLARSRK   RT
8C   RPLSRISG   R4
90   RPLNBRCD   RX
94   RPLNXPTR   RU
98   RPLNOBFR   RY
C0   RPLIRRNO   RQ, RR
C4   RPLRRADR   R1
C8   RPLPAACI   RX
CC   RPLPUTBK   RQ, R4
D0   RPLINVEQ   RP


Physical
04   RPLRDERD   RS
08   RPLRDERI   RS
0C   RPLRDERS   RS
10   RPLWTERD   RS
14   RPLWTERI   RS
18   RPLWTERS   RS

Record management processing sometimes requires serialization of internal
resources. When the needed resource can be acquired, processing proceeds
normally. However, when another request has control of the resource, the request
is deferred. As each request completes, a scan is made for requests that have
been deferred. If the resource has become available, the deferred request is
restarted. While a request is deferred, PLHDRPND is set in the PLH and
PLHDRRSC points to the resource byte to be tested for availability.

Open/Close/End-Of-Volume
O/C/EOV documents errors by means of error messages and access method control
block (ACB) return codes. The codes returned in the ACB (ACBERFLG) are
explained in the Diagnostic Aids section of OS/VS2 VSAM Logic,
along with an indication of the modules that set each error. In the cross
reference of the modules, these error codes have the symbolic name OPERRddd,
where ddd is the decimal error code. The most significant problem determination
feature of O/C/EOV, however, is its message facility. The following messages are
issued:
MSGIEC070I — END OF VOLUME
MSGIEC161I — OPEN
MSGIEC251I — CLOSE
MSGIEC252I — CLOSE (TYPE=T)
The messages contain both problem codes (symbolic PPddd) and function
codes (symbolic PDFddd). The problem codes that describe the error are
explained with each message in VS2 System Messages. The function codes are
described best in the Diagnostic Aids section of OS/VS2 VSAM Logic, along with
the module that was performing the function at the time of the error.


O/C/EOV Debugging Aids
There is a built-in trap for O/C/EOV (see the Caution later in this topic). There are
two bits involved. Bit 4 (X'08') at CVT + X'108' can be OR'd on (being careful to
leave the other bits in that byte undisturbed) to cause an abend dump (U888) when
the message is issued. Bit 6 (X'02') at CVT + X'10A', when turned on, prevents the
freeing of module work areas. When both these bits are on, the U888 dump
produced contains the module work area for every module gone through in the open
path. There is a discussion in the Diagnostic Aids section of OS/VS2 VSAM Logic
on finding the work areas in the dump and a diagram showing how the work areas
are chained together.

GTF trace is also available for debugging. If GTF is active for TRACE=USR
at the time of the error, VSAM Open (IDA0192P) writes user records FFF and
FF5 containing the VSAM control blocks at the time of the failure. The
standard OPEN work area trace is also available by coding AMP='TRACE' on the
DD statement.
The following ENQs are issued by O/C/EOV:

Major Name   Minor Name          Modules     Reason

SYSVSAM      NNNCCCCB (Note 1)   IDA0192A    The 'B' or busy ENQ is used
                                 IDA0200T    to serialize the modification
                                 IDA0231T    of the control block chains
                                 IDA0557A    by allowing only one of the
                                             functions (OPEN, CLOSE,
                                             TCLOSE, or END OF
                                             VOLUME) to process the data
                                             set. This resource is held for
                                             the life of the function.

SYSVSAM      NNNCCCCI (Note 1)   IDA0192A    The 'I' ENQ is issued for each
                                             component of a data set being
                                             opened for input processing.
                                             DEQ is issued when the data
                                             set is closed.

SYSVSAM      NNNCCCCO (Note 1)   IDA0192A    The 'O' ENQ is issued for each
                                             component of a data set being
                                             opened for output processing.
                                             DEQ is issued when the data set
                                             is closed.

Note: If the data set is opened for both input and output, both the 'I' and 'O'
resources will be held for each component.
Note 1: In the minor name, NNN = the 3-byte CI number of the component's
catalog record, and CCCC = the 4-byte catalog ACB address.
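The minor-name construction in Note 1 can be sketched as follows. The one-byte suffix and the EBCDIC encoding of it are assumptions for illustration; the 3-byte CI number and 4-byte ACB address come from the note.

```python
# Sketch: compose a SYSVSAM minor name - 3-byte CI number, 4-byte
# catalog ACB address, then a suffix ('B' busy, 'I' input, 'O' output).
import struct

def sysvsam_minor(ci_number: int, acb_addr: int, suffix: str) -> bytes:
    if suffix not in ("B", "I", "O"):
        raise ValueError("suffix must be 'B', 'I', or 'O'")
    nnn = ci_number.to_bytes(3, "big")
    cccc = struct.pack(">I", acb_addr)
    return nnn + cccc + suffix.encode("cp037")  # EBCDIC, as on MVS
```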

When a VSAM (non-catalog) ACB is opened, data extent blocks (DEBs) are
constructed and chained as follows:
• A DEB containing the data set ACB address at DEB + X'18' is chained on the
DEB chain of the current TCB. This DEB is referred to as the 'dummy' DEB.
Its purpose is to allow abend to close the VSAM data set if abnormal
termination occurs.
• A DEB containing the component access method block (AMB) address at
DEB + X'18' is chained on the DEB chain of the jobstep TCB for each
component being opened. These are the 'real' DEBs and are the ones actually
used by VSAM processing.
When an ACB is being opened for DSNAME or DDNAME sharing and the data
set is already open, the ACB is just connected to the existing control block
structure and only the 'dummy' DEB is built and chained on the current TCB.
Caution: When using the O/C/EOV trap be aware that:
• If the bit is turned on to prevent the freeing of work areas and the job causes
many calls to O/C/EOV, the region size may have to be increased to prevent
ABEND80A.
• JOBCATs and STEPCATs are opened under the initiator TCB. The work
area core is owned by the initiator TCB. If this core is not freed because the
CVT debug bit is on, the initiator may get an ABEND20A when it issues
FREEMAIN for subpool 247 at job termination.

I/O Manager
I/O management includes the following modules:
IDA019R3  — Problem state I/O driver; a CSECT of LPA load module IDA019L1
IGC121    — Supervisor state I/O driver (SIOD); a CSECT in the nucleus
IDA121A2  — Actual block processor (ABP); a CSECT in the nucleus
IDA121A3  — Channel end appendage; a CSECT in the nucleus
IDA121A4  — Abnormal end appendage; a CSECT in the nucleus
The drivers and the ABP translate requests for access to the contents of control
intervals into requests for reading and writing physical records. They also build the
channel program to be passed to IOS.


I/O Manager Debugging
The combination of the I/O management block (IOMB), the I/O supervisor
block (IOSB), and the service request block (SRB) is used by I/O management to
control the processing of a request. The PLH (PLHIOB) points to the IOMB. The
IOMB points to the IOSB (IOMIOSB), which in turn points to the SRB (IOSSRB).
For debugging unit checks (for example: no record found, incorrect length,
channel program check, channel protection check) the best place to trap for a
dump is at label 'PERMERR' in nucleus CSECT IDA121A4.


Catalog Management

Catalog management manages system requests for references and updates to the
master catalog. The following description of catalog management includes these
topics:
• Major Registers and Control Blocks
• Module Structure
• VSAM Catalog Recovery Logic
• Debugging Hints

Major Registers and Control Blocks
This section describes the major catalog management registers and control blocks,
shows how each can be located, and describes those control block fields and flags
that have proven to be useful in debugging.

How to Find Registers
Catalog management runs under control of an SVRB. The registers are saved across
supervisor-assisted linkages and interruptions in the standard ways. Depending
upon the nature of the problem, the registers can usually be found in one of the
following areas:
• For abends, registers are stored in RTM's SVRB and SDWA.
• For program checks, registers are stored in RTM's SVRB, the SDWA, and the
LCCA.
• For catalog-management-issued type 2, 3, and 4 SVCs, registers are stored in
the successor SVRB.
• For waits, registers are stored in the TCB.
The registers stored in any of these areas will be the registers that existed when
the code that was running under a catalog SVRB gave up control. These registers
will either be the registers of one of the three catalog management routines or the
registers of a routine that was branch-entered by catalog management. If register
11 points to the CCA (identifiable via a X'ACCA' in the first word), the registers
probably belong to IGG0CLA1; register 12 will be the base register for the CSECT
last in control. Otherwise, if register 11 is a base register, the code that it references
may be inspected to determine the routine in control. If the routine in control is
one that was branch-entered by catalog management, then catalog management's
registers may have been saved in a standard area pointed to by register 13.


Catalog Management (continued)

Major Registers
IGC0002F
Register 11 — Base register
Register 12 — Work area pointer

IGG0CLA1
Register 11 — CCA pointer
Register 12 — Base register (current CSECT)
Register 13 — Register save push down list pointer (see CCAREGS) or
              standard save area pointer

IGG0CLCA
Register 11 — Base register
Register 12 — Work area pointer

Major Control Blocks
The control blocks described in this section (AMCBS, PCCB, ACB, CAXWA,
CTGPL and CCA) are those that are most useful from a debugging standpoint. The
AMCBS and PCCB are useful in locating the control block structures for open
catalogs. The ACB and CAXWA relate to a particular catalog or catalog recovery
area (CRA) data set. The CTGPL and CCA relate to a particular catalog request.

AMCBS
The AMCBS (access method control block structure) is essentially a VSAM vector
table. It is constructed within the SQA during early NIP processing (IEAVNP11)
and resides there throughout the life of the system. The AMCBS is found through
CVT+X'100' (field CVTCBSP). Major fields in the AMCBS are:

Field - Description

CBSACB - Pointer to the master catalog's ACB.

CBSCMP - Pointer to the IGGOCLA1 load module.

CBSCAXCN - CAXWA chain pointer. The CAXWAs of all currently open
VSAM catalogs are included in this chain. The master catalog's
CAXWA is the last CAXWA in this chain.

OS/VS2 System Programming Library: MVS Diagnostic Techniques

PCCB
A PCCB (private catalog control block) connects a VSAM user catalog to a
particular initiator or job step. A PCCB is constructed (in SWA) for each user
catalog opened during the life of a job step. PCCBs are chained together to form
an initiator or job-step-oriented PCCB chain. Generally, PCCBs are freed by step
termination. A PCCB is not required for the master catalog.
PCCBs are located through the TCB: TCB+X'B4' (field TCBJSCB) points to the
JSCB; JSCB+X'15C' (field JSCBACT) points to the active JSCB; the active
JSCB+X'CC' (field JSCBPCC) points to the first PCCB. PCCBs are chained via
PCCNEXTP.
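The pointer chasing just described can be sketched as follows. The TCB, JSCB, and JSCBPCC offsets are the ones given above; the storage model and the PCCNEXTP offset (taken here as 0 within the PCCB) are assumptions for illustration only.

```python
TCBJSCB = 0xB4    # TCB + X'B4'   -> JSCB
JSCBACT = 0x15C   # JSCB + X'15C' -> active JSCB
JSCBPCC = 0xCC    # active JSCB + X'CC' -> first PCCB
PCCNEXTP = 0x0    # offset of the next-PCCB pointer (assumed for this sketch)

def load(storage, addr):
    """Fetch a fullword from a dump modeled as a dict; unset words read as 0."""
    return storage.get(addr, 0)

def pccb_chain(storage, tcb):
    """Yield each PCCB address on the job step's chain (ends at the 0 pointer)."""
    jscb = load(storage, tcb + TCBJSCB)
    active = load(storage, jscb + JSCBACT)
    pccb = load(storage, active + JSCBPCC)
    while pccb != 0:
        yield pccb
        pccb = load(storage, pccb + PCCNEXTP)
```

A toy dump with a two-PCCB chain would be walked as `list(pccb_chain(storage, tcb_addr))`.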
Major fields in a PCCB are:

Field - Description

PCCACRO - PCCB identifier ('PCCB').

PCCNEXTP - Pointer to the next PCCB. This field is 0 if it is the last PCCB.

PCCACBP - Pointer to the catalog's ACB.

PCCDSNAM - Catalog's name.

PCCTGCON - Catalog's alias name.

Major flags in a PCCB are:

Flag - Description

PCCSTEPC - The catalog was specified to the job step through the use of a
JOBCAT or STEPCAT DD card.

PCCACTIV - The catalog is allocated and active.

PCOSCVOL - The catalog is an OS CVOL.

ACB
There is one ACB (access method control block) for each open VSAM catalog or
CRA. The ACB is created by the routine that opens the data set. Catalog and
CRA ACBs generally reside in the CSA.
An ACB can be located in the following ways:
1. The master catalog's ACB can be located from the AMCBS (CBSACB).
2. A particular user catalog's ACB can be located either via the CAXWA chain or
via the PCCB chain. To locate the ACB via the CAXWA chain, inspect the
CAXCNAM field of each CAXWA in turn until the desired catalog name is
found. The first CAXWA is pointed to by the AMCBS (CBSCAXCN). The
CAXWAs are chained via CAXCHN. When the desired CAXWA is found, it
points to the desired ACB (CAXACB).

To locate the ACB via the PCCB chain, inspect the PCCDSNAM and
PCCTGCON fields of each PCCB in turn until the desired catalog name or alias
name is found. The first PCCB is pointed to by the job step's active JSCB
(JSCBPCC). The PCCBs are chained via PCCNEXTP. When the desired PCCB
is found, it points to the desired ACB via PCCACBP.
3. A particular CRA's ACB can be located as follows:
a. Find the owning catalog's ACB (via steps 1 or 2).
b. Find the owning catalog's CAXWA (pointed to by ACBUAPTR).
c. Find the first CRA's ACB (pointed to by CAXCRACB).
d. Find the first CRA's CAXWA (pointed to by the CRA ACB's ACBUAPTR
field at ACB+X'40').
e. Inspect the CAXVOLID field for the desired CRA volume serial number.

f. If the desired CRA's ACB has not yet been found, then search the
remaining CAXWAs in the CRA CAXWA chain. Inspect the
CAXVOLID field of each remaining CRA CAXWA in turn until the desired
CRA volume serial number is found. The remaining CRA CAXWAs are
chained to the first CRA CAXWA (and to each other) via CAXCHN. When
the desired CRA CAXWA is located, it points to the desired CRA ACB
via CAXCRACB.
4. The ACB representing the VSAM catalog that is currently being processed by a
particular catalog request can be located via the CCA (CCAACB).
5. The ACB representing the CRA that is currently being processed by a particular
catalog request can be located via the CCA (CCARAACB).
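Step 2 above, finding a user catalog's ACB by name through the CAXWA chain, can be sketched like this. Control blocks are modeled as Python dicts, which is purely illustrative; field names come from the text.

```python
def find_acb_by_catalog_name(first_caxwa, name):
    """Walk the CAXWA chain (CAXCHN) comparing CAXCNAM until the desired
    catalog name is found, then return that CAXWA's ACB pointer (CAXACB)."""
    caxwa = first_caxwa
    while caxwa is not None:
        if caxwa["CAXCNAM"] == name:
            return caxwa["CAXACB"]
        caxwa = caxwa["CAXCHN"]   # None marks the end of the chain here
    return None

# Toy chain: a user catalog chained ahead of the master catalog (which is
# last on the chain, as the AMCBS description above notes).
master = {"CAXCNAM": "MASTER.CAT", "CAXACB": "acb-master", "CAXCHN": None}
user = {"CAXCNAM": "USER.CAT", "CAXACB": "acb-user", "CAXCHN": master}
```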
Major fields in the ACB are:

Field - Description

ACBID - Control block identifier (X'A0').

ACBAMBL - Pointer to the VSAM record management control block structure.
This set of control blocks is built at OPEN time, resides in CSA,
and consists of those control blocks required to support a KSDS
(catalog) or an ESDS (CRA).

ACBERFLG - Error code stored by OPEN or CLOSE when the operation is
unsuccessful.

ACBUAPTR - Pointer to the CAXWA.

Major flags in the ACB are:

Flag - Description

ACBCAT - ACB represents a catalog.

ACBSCRA - ACB represents a CRA that has been opened for catalog
management use.

ACBVCRA - ACB represents a CRA that has been opened for use by an access
method services (AMS) utility function.

CAXWA
There is one CAXWA (catalog ACB extended work area) for each open catalog or
CRA. The CAXWA is created during the OPEN process (either before the OPEN
or by the catalog OPEN routines). CAXWAs generally reside in the CSA. The
CAXWA is pointed to by the ACB (field ACBUAPTR). See step 3 for locating the
ACB under the heading "ACB" earlier in this chapter. Major fields in the
CAXWA are:

Field - Description

CAXID - Control block identifier (X'CA').

CAXCHN - Pointer to the next CAXWA in the CAXWA chain. This is 0 if
it is the last CAXWA in the chain.

CAXACT - Count of the number of job steps for which this catalog is
currently open.

CAXACB - Pointer to the catalog ACB.

CAXVCB - Pointer to the catalog's or CRA's VCB.

CAXRPL - Pointer to a pool of RPLs. This pool is obtained at OPEN time
and resides in CSA. (Note: This field is not used in CRA
CAXWAs. CRA RPLs are included within the owning catalog's
RPL pool.)

CAXCNAM - Catalog name (for catalog CAXWA only).

CAXVOLID - CRA volume serial number (for CRA CAXWA only).

CAXCRACB - For a catalog CAXWA: pointer to the first CRA ACB.
For a CRA CAXWA: pointer to the CRA ACB.

Major flags in the CAXWA are:

Flag - Description

CAXBLD - The catalog or CRA is in the process of being created.

CAXOPN - The catalog or CRA is being opened.

CAXCLS - The catalog or CRA is being closed.

CAXEOV - The catalog or CRA is being extended.

CAXMCT - The CAXWA represents the master catalog.

CAXF2DT - The catalog has been deleted.

CAXF2NDD - Unable to OPEN or CLOSE - DDNAME not found.

CAXF2NCR - Unable to OPEN or CLOSE - insufficient main storage.

CAXF2IOE - Unable to OPEN or CLOSE - I/O error.

CAXF2REC - The catalog is a recoverable catalog (catalog CAXWA only).

CTGPL
The CTGPL (catalog parameter list) is built by the routines that issue SVC 26 to
represent the desired catalog management request. The storage area where this
block resides varies and is controlled by the building routine. When a caller issues
SVC 26, the caller's registers are saved in the SVRB under which catalog
management operates. Register 1 of this SVRB's register save area points to the CTGPL.
The CTGPL may also be located via the CCA (CCACPL). Note: At times, catalog
management processing uses CCACPL as a pointer to an internal CTGPL. Therefore,
you should be careful when you use this pointer to locate the caller's CTGPL.
Major fields in the CTGPL are:

Field - Description

CTGOPT1
CTGOPT2
CTGOPT3
CTGOPT4
CTGOPTNS
CTGTYPE - These fields contain the codes and flags that indicate the type of
function requested.

CTGENT - Pointer to the entry name or CI number (for types of requests
other than DEFINE or ALTER).

CTGFVT - Pointer to the field vector table (FVT) for DEFINE and
ALTER requests.

CTGCAT - Pointer to an area that indicates the specific catalog (if any) to be
used in processing this request. The area may contain either the
catalog name or a pointer to the catalog's ACB. If no specific
catalog is indicated, CTGCAT will be 0.

CTGWKA - Pointer to the work area. In general, catalog management stores
the requested information into this area.

CTGNOFLD - Number of FPL pointers in CTGFIELD.

CTGFIELD - An array of 4-byte FPL pointers. The FPLs describe the data
fields that the request is to process.

CCA
The CCA (catalog communications area) is the main VSAM catalog work area. It
is built upon entry to the VSAM catalog processor and freed just before exit. The
CCA resides in subpool 252 of the caller's address space. Register 11 points to the
CCA.
Major fields in the CCA are:

Field - Description

CCAID - Control block identifier (X'ACCA').

CCAPROB - Error data - consists of a CSECT ID (2 bytes), reason code
(1 byte), and error code (1 byte).

CCATCB - Pointer to the caller's TCB.

CCACPL - Pointer to the CTGPL.

CCAACB - Pointer to the ACB of the catalog that is currently being
processed.

CCAURAB - Pointer to the record area block (RAB) of the record area
currently in use.

CCASRCH - Search argument for I/O requests.

CCARxREC - Pointer to record area x. (There are six record areas, record
area 0 through record area 5; x indicates the number of the
record area in question.)

CCARPL1 - Pointer to the RPL that is currently assigned to this request.

CCAEQDQ - An ENQ/DEQ parameter list that is used when VSAM catalog
management issues the RESERVE macro.

CCAMSSPL - A GETMAIN/FREEMAIN parameter list that the VSAM
catalog processor uses for most GETMAIN/FREEMAINs.

CCACMS - Pointer to the catalog management services work area
(CMSWA); it is used only for DELETE, ALTER, DEFINE,
and LISTCATALOG requests.

CCAREGS - An array of small (12-byte) register save areas. When a VSAM
catalog processor routine calls a lower level (nested) routine,
the contents of registers 12-14 are saved in the next save area
by the routine that is called. Registers 12 and 14 contain the
calling routine's base address and return address, respectively.
Register 13 is used to maintain position within the array.
Each time register 13 is saved, it points to the preceding save
area. During a lower level routine's processing, register 13
points to the current save area (that is, the area containing the
caller's registers). When a lower level routine exits, registers
12-14 are restored, which causes register 13 to be automatically
switched (the preceding save area becomes the
current save area). Whenever VSAM catalog processor routines
branch-enter external routines, they pass a standard 72-byte
save area to the external routine. This is accomplished by
increasing register 13 by 12 during the process of setting up
the linking conventions for the branch and link. (The 72 bytes


Field - Description

CCAREGS (continued) - that follow the current save area are used as the standard save
area. (Note: The register contents stored within this array
can be used in debugging to identify predecessor routines and
modules.)
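The CCAREGS push-down convention described above can be modeled with a small sketch. The number of entries and the Python class are illustrative only, not the real layout; the point is how restoring register 13 automatically switches back to the preceding save area.

```python
class CCARegs:
    """Toy model of the CCAREGS push-down array: each entry holds the
    caller's registers 12-14; register 13 tracks position in the array."""

    def __init__(self, entries=8):
        self.areas = [None] * entries
        self.r13 = -1            # index of the current save area; -1 = empty

    def call(self, caller_r12, caller_r14):
        """On a nested call, the called routine saves the caller's R12 (base),
        R13 (previous position), and R14 (return address) in the next area."""
        self.r13 += 1
        self.areas[self.r13] = (caller_r12, self.r13 - 1, caller_r14)

    def ret(self):
        """On exit, restoring R12-R14 automatically switches R13 back to the
        preceding save area; returns the caller's (R12, R14)."""
        r12, prev_r13, r14 = self.areas[self.r13]
        self.r13 = prev_r13
        return r12, r14
```

Two nested calls followed by two returns walk register 13 down and back up the array, which is what lets the recovery routine identify predecessor routines.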

CCARAACB - Pointer to the ACB of the CRA that is currently being
processed, or zero.

CCARARPL - Pointer to the RPL that is currently assigned to this request
for CRA I/O use.

Major flags in the CCA are:

Flag - Description

CCAFLG1-4 - Miscellaneous processing control flags.

CCARPLX - I/O option flags:
00.. ...0  PUT direct
00.. ...1  PUT sequential
01.. ....  ERASE
1... ...0  GET direct
1... ...1  GET key equal to or greater than
..0. ....  Use the record area pointed to indirectly by CCAURAB
..1. ....  Use record area 0
...0 ....  Addressed or CI operation
...1 ....  Keyed operation
.... 0...  Update operation
.... 1...  Non-update operation
.... .0..  Check for errors
.... .1..  Bypass error checking
.... ..0.  505-byte low-key range record
.... ..1.  47-byte high-key range record

CCAFLG9 - Miscellaneous CRA processing flags.

CCARVFG1 - Miscellaneous recovery (ESTAE) control flags.
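The CCARPLX bit assignments above can be decoded mechanically. The sketch below treats bit 0 as the leftmost (X'80') bit, as in the table; it is an illustrative decoder, not catalog code.

```python
def decode_ccarplx(flags: int) -> list:
    """Translate a CCARPLX option byte into the meanings tabulated above
    (bit 0 = X'80', the leftmost bit)."""
    out = []
    if flags & 0x80:                                  # 1... ....: GET
        out.append("GET key >=" if flags & 0x01 else "GET direct")
    elif flags & 0x40:                                # 01.. ....: ERASE
        out.append("ERASE")
    else:                                             # 00.. ....: PUT
        out.append("PUT sequential" if flags & 0x01 else "PUT direct")
    out.append("record area 0" if flags & 0x20 else "area via CCAURAB")
    out.append("keyed" if flags & 0x10 else "addressed/CI")
    out.append("non-update" if flags & 0x08 else "update")
    out.append("bypass error checking" if flags & 0x04 else "check errors")
    out.append("47-byte high-key" if flags & 0x02 else "505-byte low-key")
    return out
```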


Module Structure
Catalog management is packaged into three load modules. These modules are the
following:

1. IGC0002F - Catalog Controller

2. IGGOCLA1 - VSAM Catalog Processor

3. IGGOCLCA - CVOL Processor

This set of modules resides within SYS1.LPALIB and can be viewed as a type 4 SVC
routine consisting of three load modules. Catalog management receives control
via SVC 26 and operates under an SVRB. Control is passed between the three load
modules via XCTL. Each load module establishes its own ESTAE routine. A brief
description of each load module follows.

1. IGC0002F - Catalog Controller
The function of this module is to translate (map) interfaces. The module
logically consists of a front end and a back end.
The front end receives control from the SVC SLIH whenever SVC 26 is
issued. Register 1 points either to an OS CAMLIST or a VSAM CTGPL.
If register 1 points to an OS CAMLIST, the OS request is translated into an
appropriate VSAM request (a CTGPL is constructed). Control is then passed
to IGGOCLA1.
The back end receives control (at EP IGG0102F) from IGGOCLA1 upon
completion of a VSAM request for a VSAM catalog. It determines if the
original request was an OS CAMLIST request and if so, it translates the
CTGPL output and the IGGOCLA1 return code into appropriate CAMLIST
format. It then returns control to the issuer of SVC 26. For a more detailed
description of this module, see OS/VS2 Catalog Management Logic.

2. IGGOCLA1 - VSAM Catalog Processor
IGGOCLA1 is a large load module that consists of many CSECTs and
procedures. Control is passed between the various procedures via CALLs.
This module relates a request to a specific catalog and also determines the
catalog type. If the catalog is an OS CVOL, IGGOCLA1 passes control to
the CVOL processor (IGGOCLCA). Otherwise, IGGOCLA1 accesses the
VSAM catalog and performs the function indicated by the CTGPL. When
the function is completed, IGGOCLA1 exits by passing control to the back
end of IGC0002F. For a detailed description of VSAM catalog management,
see OS/VS2 Catalog Management Logic.

3. IGGOCLCA - CVOL Processor
IGGOCLCA is a load module that consists of several CSECTs and procedures.
Control is passed between the various procedures via CALLs. This module
translates CTGPL requests into OS catalog requests and accesses OS CVOLs
to perform the indicated function. Upon completion of processing, this
module returns control to the issuer of SVC 26. For a detailed description
of this module, see OS/VS2 CVOL Processor Logic.


VSAM Catalog Recovery Logic
This section describes how mainline VSAM catalog management supports recovery
and also how its recovery routine works.
Mainline VSAM catalog management does the following:
• Establishes/releases the recovery environment
• Maintains a pushdown list end mark
• Tracks GETMAIN/FREEMAIN activity
• Maintains a CMS (catalog management services) function gate

Establishing/Releasing a Recovery Environment
To establish or release a recovery environment, the following actions occur:
1. Subfunction BLDCCA in module IGGOCLC9 issues a branch entry to ESTAE
to establish the recovery environment. This is done immediately after storage
has been obtained for the CCA via GETMAIN.
2. When BLDCCA completes the initialization of the CCA, it sets RVCCAV to
indicate that the CCA is now valid.
3. Subfunction IGGPRCLU (request cleanup) in module IGGOCLC9 performs
the following:
• Indicates that the CCA is no longer valid (RVCCAV = off)
• Frees any GETMAIN/FREEMAIN tracking spill blocks that may exist
• Branch enters ESTAE to remove the recovery environment

Maintaining a Pushdown List End Mark
A pushdown list end mark is maintained so that the ESTAE recovery routine can
reliably locate the last pushdown list entry. This enables the recovery routine to
determine:
1. The address at which the last call to a nested subfunction was issued.
2. The routine to which this call was directed.
There is an instruction in the exit procedure code contained within each CSECT to
ensure that the first byte following the last active entry contains an end-of-list
marker. (Note that X'00' and X'FF' are considered end-of-list markers.)
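The end-mark scan that the recovery routine performs can be sketched as below. The 12-byte entry size matches CCAREGS; the flat byte-buffer model and the assumption of at least one active entry are simplifications.

```python
END_MARKS = (0x00, 0xFF)   # both bytes are treated as end-of-list markers

def last_active_entry(pushdown: bytes, entry_len: int = 12) -> int:
    """Return the index of the last active entry: the convention above
    guarantees that the first byte AFTER the last active entry is X'00'
    or X'FF'.  Assumes at least one active entry (a simplification)."""
    i = 0
    while pushdown[(i + 1) * entry_len] not in END_MARKS:
        i += 1
    return i
```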


Tracking GETMAIN/FREEMAIN Activity
GETMAIN/FREEMAIN tracking provides the recovery routine with the
information it needs to automatically issue FREEMAINs against those areas of main
storage that have been acquired and not yet freed by VSAM catalog management.
The GETMAIN/FREEMAIN tracking function is implemented as follows:
1. A 256-byte contiguous area is defined in the CCA. The area consists of:
   a. A 248-byte tracking buffer.
   b. A single entry GETMAIN/FREEMAIN length list (four bytes) with the
   high-order byte initialized to X'80' and the low-order three bytes defined
   as CCAMNLEN.
   c. The GETMAIN/FREEMAIN address word (CCAMNADR).
2. The ?GETMS and ?FREEMS macros generate code that:
   a. Tracks the operation. This is accomplished by an MVC instruction that
   traces the GETMAIN/FREEMAIN length and address by pushing it
   (shifting it left) to the bottom (low address) of the 248-byte tracking area.
   b. Checks for a full tracking buffer. If the buffer is full, a spill routine
   (IGGPARFS) is called before the tracking MVC instruction is issued. This
   spill routine:
   (1) Issues GETMAIN to obtain a 256-byte spill buffer.
   (2) Chains this buffer to the end of the spill buffer chain.
   (Note: Chain anchor words are located in the CCA.)
   (3) Copies the CCA tracking buffer into the new spill buffer.
   (4) Clears the CCA tracking buffer.
   c. If the ?GETMS macro call is specified with CLASS(S) for storage (global),
   sets a flag (MNATSCLS) in the first byte of the two-word trace entry to
   indicate this. Refer to the description of CCAMNCAT, a work area that is
   located at CCA+X'308', contained in OS/VS2 Catalog Management Logic.
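Steps 1 and 2 above amount to a bounded trace buffer that spills to chained blocks when full. A scaled-down sketch follows; entry counts and the spill mechanics are simplified, and the method name borrowed from IGGPARFS is just a label here.

```python
class GetmainTracker:
    """Toy model of the CCA tracking buffer plus spill-block chain."""

    def __init__(self, capacity=4):          # the real buffer is 248 bytes
        self.buffer = []                     # stands in for the CCA buffer
        self.spill_chain = []                # chained spill buffers
        self.capacity = capacity

    def track(self, length, address):
        """Push one (length, address) trace entry, spilling first if full."""
        if len(self.buffer) == self.capacity:
            self._iggparfs()                 # called before the tracking MVC
        self.buffer.append((length, address))

    def _iggparfs(self):
        """Spill routine: get a new block, chain it, copy, clear the buffer."""
        self.spill_chain.append(list(self.buffer))
        self.buffer.clear()

    def all_entries(self):
        """Scan order used by storage freeup: spill blocks first, then buffer."""
        return [e for blk in self.spill_chain for e in blk] + self.buffer
```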

CMS Function Gate
The CMS function gate assists the recovery routine in determining if DEFINE or
DELETE backout action is required. This gate is represented by a bit (RVCMSFG)
in field CCARVFG1. The bit is turned on by the CMS driver (IGGPCDVR in
module IGGOCLAT) immediately after a successful return from the check
authorization function. The bit is reset upon entry to the CMS cleanup function
(IGGPCCLN in module IGGOCLAT).


Recovery Routine Functions
The VSAM catalog processor recovery routine is labelled IGGPCMRR (CSECT
IGGOCLA9). This recovery routine is entered from MVS's recovery termination
manager (RTM) whenever an error or interruption occurs either in VSAM catalog
management or in any successor routine that VSAM catalog management can cause
to receive control. A pointer to the STAE diagnostic work area (SDWA) is passed
as input to IGGPCMRR. IGGPCMRR performs the following functions.
(Functions 2-13 are performed only when the CCA is marked valid, that is,
RVCCAV = ON.)
1. Retrieves the CCA pointer from the SDWA and puts it into register 11.
2. Saves the RTM return address in CCAR14S.
3. Saves the SDWA pointer in CCASDWAP.
4. Produces diagnostic output.
5. Initializes register 13 to point to the first register save area.
6. Cleans up RPLs (if required).
7. Determines if backout is to be performed.
8. Checkpoints the CCR (if required).
9. Drops catalog orientation.
10. Frees storage (using GETMAIN/FREEMAIN tracking information).
11. Frees GETMAIN/FREEMAIN tracking spill blocks (if any exist).
12. Performs DEFINE/DELETE backout (if applicable).
13. Restores the RTM return address and the SDWA pointer.
14. Frees the CCA.
15. Returns to RTM indicating that RTM should continue with termination.
The following sections describe the more complex of these recovery routine
functions in greater detail.

Diagnostic Output (Function 4)
Diagnostic output is produced except in those situations where the recovery
routine is invoked only for clean-up type functions, such as CANCEL. Diagnostic
output can be produced in two forms:
1. Information is placed in a variable recording area (SDWAVRA) within the
SDWA. This data is written to the SYS1.LOGREC data set as part of an entry
describing the error.
This variable data is formatted as follows:

Byte    Length  Description of Data
0(0)    8       VSAM catalog processor module name - 'IGGOCLA1'
8(8)    3       IGGOCLA1's entry point address
11(B)   8       Procedure name of the last-called routine
19(13)  3       Address of the last-called routine
22(16)  8       Procedure name of the routine that called the last-called
                routine
30(1E)  3       Address of the CALL to the last-called routine
33(21)  4       The characters 'CPL='
37(25)  28      A copy of the user's CTGPL

2. An SDUMP is taken (if allowed by the system).
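The SDWAVRA layout above maps directly onto fixed offsets (decimal, with hex in parentheses). A sketch of a parser for it follows; character fields are returned as raw bytes, and EBCDIC decoding is left out for simplicity.

```python
def parse_catalog_vra(vra: bytes) -> dict:
    """Slice the 65-byte variable recording area per the offsets tabulated
    above (each field's offset is the previous offset plus its length)."""
    return {
        "module_name":      vra[0:8],    # 'IGGOCLA1'
        "entry_point":      vra[8:11],   # 3-byte address
        "last_called_name": vra[11:19],
        "last_called_addr": vra[19:22],
        "caller_name":      vra[22:30],
        "call_addr":        vra[30:33],
        "eyecatcher":       vra[33:37],  # the characters 'CPL='
        "ctgpl_copy":       vra[37:65],  # 28-byte copy of the user's CTGPL
    }
```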

Backout (Function 7)
Backout is performed for DEFINE or DELETE requests (except for DEFINE or
DELETE catalog requests) when the CMS function gate is active (RVCMSFG =
ON). When backout is to be performed, a switch (RVESBOR) is set. The backout
function (Function 12) is described later in this chapter.

Drop Catalog Orientation (Function 9)
This function uses the normal IGGPRPLF subfunction to perform the RPL
freeup/DEQ functions.

Storage Freeup (Function 10)
This function frees all the storage (with the exception of the CCA and any existing
tracking spill blocks) that has been acquired and is still owned by the current
VSAM catalog management request. Storage freeup is done as follows:
1. The GETMAIN/FREEMAIN tracking data is scanned starting at the first spill
block (if any) and following the chain of spill blocks. When the last spill block
has been processed, the scan continues with the first valid entry in the CCA
tracking buffer. This first scan selects and eliminates paired entries; a paired
entry consists of two entries with matching storage addresses, which indicate
that the storage area in question has already been freed.
2. The tracking data is scanned again. During this second scan, each valid
remaining entry is processed as follows:
   a. The length and address of the storage to be freed are extracted from the
   entry.
   b. The subpool is determined from a switch setting within the entry.
   c. A ?FREEMS macro is issued to free the main storage. This macro specifies
   "RFR(NO)" to prevent recursive tracking.

DEFINE/DELETE Backout (Function 12)
This function attempts to preserve catalog integrity by cleaning up partially-completed
DEFINE or DELETE operations. It uses the normal DELETE function
to accomplish this. The switch indicating that backout is required is tested. If this
switch is on, the following actions are performed:

1. A backout work area is obtained.

2. A DELETE CTGPL is constructed in the backout work area. This CTGPL is
set up to cause a DELETE of the object that was being defined (with DEFINE)
or deleted (with DELETE) when the error occurred.

3. The CCA is rebuilt as follows:
   a. CCACPL, CCASZ, CCATCB, CCASDWAP, CCAR14S, and CCARVFG1
   are saved (in the backout work area).
   b. The complete CCA is cleared.
   c. The previously-saved fields (with the exception of CCACPL) are restored.
   d. CCACPL is initialized to point to the CTGPL that was built in the
   backout work area.
   e. CCAID, CCAURAB, CCAR0REC through CCAR5REC, CCAEDXFF,
   CCAMNPTR, CCAMNLLP, CCAMNLL, and register 13 are
   reinitialized to their original values.
   f. CCAF2SYS is set on.
   g. RVESBOR is set on to indicate that backout is in control.

4. The CMS driver (IGGPCDVR) is invoked, which then invokes the DELETE
function; when the DELETE action is complete, control is returned to the
recovery routine.

5. The CCR is checkpointed (if required).

6. Catalog orientation is dropped (via a call to IGGPRPLF).

7. CCACPL is restored.

8. The backout work area is freed.

9. Any spill blocks acquired during the backout process are freed.


Debugging Aids
The control block structures for the VSAM catalog reside in the CSA. There is a
built-in communications vector table (CVT) debug word which allows you to get
a console dump at the time of the failure. This word is located at CVT+X'108'
and is examined by module IGGOCLC9 at the end of each catalog request.
Following are the contents of the CVT debug word:

Byte 0 (X'108'):
bits 0-3 must remain unchanged
bit 4 - not used by catalog
bit 5=1 causes message IEC331I to be issued when the condition
specified in byte 1 (X'109') is met. IEC331I contains
the name of the catalog module which detected the
error.
bit 6 - not used by catalog
bit 7=1 prevents the catalog FRR (IGGOCLA9) from freeing the
catalog communications area (CCA) so that it is
available in the dump.

Byte 1 (X'109'): Condition for which the action specified at location X'10A-10B'
is to be taken:
X'01' - take action at the end of every catalog request
X'02' - take action for any non-zero catalog return code
X'03' - take action for return codes other than those
considered to be "normal". (The following are
considered to be normal return codes - X'00, 0B,
24, 2B, 2C, 4C, BC' and reason codes X'2B, BC,
and F0'.)
X'04' to X'FF' - take action only when the catalog return code
equals the value in this byte.

Bytes 2 and 3 (X'10A-10B'): Action to be taken on the above condition:
X'07FE' - return immediately to inline catalog
code and continue processing. This
setting, in conjunction with bit 5 of
byte 0, causes no action other than
message IEC331I.
X'07FF' - causes a loop at CVT+X'10A'
to allow a console dump of the failing
ASID. To break the job out of the loop,
either cancel the job or set these bytes
to X'07FE' to continue processing.
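The condition/action logic of the debug word can be sketched as a decoder. The "normal" return-code set is the one listed above; byte-0 flag handling is omitted, and the 4-byte word is modeled as a plain byte string for illustration.

```python
NORMAL_CODES = {0x00, 0x0B, 0x24, 0x2B, 0x2C, 0x4C, 0xBC}
ACT_MESSAGE_ONLY = 0x07FE   # return inline; message only (with byte-0 bit 5)
ACT_LOOP_FOR_DUMP = 0x07FF  # loop at CVT+X'10A' to allow a console dump

def debug_word_fires(debug_word: bytes, return_code: int) -> bool:
    """Evaluate byte 1 (the condition) against a catalog return code."""
    cond = debug_word[1]
    if cond == 0x01:
        return True                         # every catalog request
    if cond == 0x02:
        return return_code != 0             # any non-zero return code
    if cond == 0x03:
        return return_code not in NORMAL_CODES
    if cond >= 0x04:
        return return_code == cond          # exact match on this byte
    return False

def debug_word_action(debug_word: bytes) -> int:
    """Bytes 2-3 hold the action taken when the condition is met."""
    return int.from_bytes(debug_word[2:4], "big")
```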

When message IEC331I appears by itself, use the above CVT trap to get a dump
of the failure. When messages IEC331I, IEC332I, and IEC333I appear together,
the error is the result of a call to record management. Message IEC333I contains
the record management return code in the form Lxxx (for logical error) or Pxxx
(for physical error) where xxx = decimal return code. In these cases use the CVT
trap discussed earlier in the Record Management Debugging Aids section of the
VSAM component analysis.
In situations where an attempt to open a VSAM catalog results in message
IEC161I 004-080, it is difficult to determine the exact nature of the problem
because there are many conditions which can cause this error. The best place
to trap for a dump is at label 'CAPERR' in modules IFG0191X and IFG0191Y.
Register 14 at that point will be in the calling routine which detected the failure.

It is sometimes necessary to examine the records in the catalog as part of the
problem analysis. The following is an example of the access method services job
necessary for this.

//PRINT    EXEC PGM=IDCAMS
//STEPCAT  DD DSN=catalogname,DISP=SHR
//DD1      DD DSN=catalogname,DISP=SHR
//SYSPRINT DD SYSOUT=A
//SYSIN    DD *
  PRINT INFILE(DD1)
/*
The following ENQs are issued for catalog processing:

Major Name  Minor Name     Modules                 Reason
SYSIGGV1    MCATOPEN       IGGOCLAC,               Open master catalog
                           IGGOCLAD
SYSIGGV2    catalogname    IGGOCLA3                Assign RPL processing
SYSIGGV2    catalogname    IGGOCLA3                Component recovery area (CRA)
                                                   orientation
SYSVTOC     volser         IGGOCLBU                Read/Write format 4 DSCB
SYSZCAXW    CAXW           IDACAT11,               Open, close, or delete
                           IDACAT12,               catalog request
                           IGGOCLBG
SYSZPCCB    PCCB           IGGOCLA3                While building PCCB for
                                                   catalog open
SYSZTIOT    asid           IDACAT11,               Open and close of catalog
                           IDACAT12,
                           IGGOCLAD
IEZIGGV3    addr of caxwa  IGGOCLA3                While CAXWA RPL count is
                                                   being altered

Allocation/Unallocation

This section is divided into four parts. Part one provides a description of the six
major functional areas of allocation/unallocation and the way in which they
interrelate. Parts two, three, and four contain general debugging aids, debugging
hints, and reason codes.

Functional Description
Figure 5-36 illustrates the control-flow discussion that is presented in the following
paragraphs.

Figure 5-36. The Relationship of the Six Major Functions of Allocation/Unallocation
(Figure not reproduced; it shows batch and dynamic initialization and control,
the JFCB housekeeping routine, common allocation, common unallocation, and
volume mount and verify.)


Allocation
The flow through allocation following either batch initialization or dynamic
initialization is the same:
• Batch/dynamic initialization and control invokes JFCB housekeeping
• Batch/dynamic initialization and control then invokes common allocation
• Common allocation invokes volume mount and verify (if volume unloading or
mounting is needed).

Unallocation
At batch/dynamic unallocation, the control flow is as follows:
• Batch/dynamic initialization and control invokes common unallocation
• Common unallocation invokes volume mount and verify (if any volume
unloading is needed).
• Batch initialization and control invokes volume mount and verify (if volume
unloading is needed).

Batch Initialization and Control
Batch initialization and control uses the following control blocks:
• Job control table (JCT)
• Step control table (SCT)
• Linkage control table (LCT)
• Job step control block (JSCB)

The SCT is needed to locate the chain of step I/O tables (SIOTs) and job file
control blocks (JFCBs) in the scheduler work area (SWA). A SIOT and its
corresponding JFCB are constructed by the converter/interpreter for each DD
statement in a job step's JCL. Allocation allocates one step at a time. The SIOTs
and JFCBs for a step are read by batch initialization and control when initializing
for the allocation or unallocation of a step. At step initiation, space for the task
I/O table (TIOT) is obtained, and the JSCB is initialized to point at the top of the
chain of data set association blocks (DSABs), which are actually constructed by
common allocation. At job step allocation, the SIOTs and JFCBs are passed as
the main input, first to JFCB housekeeping, and then to common allocation.
At job step unallocation, the SIOTs and JFCBs are passed as the main input to
common unallocation. At the end of the job, batch initialization and control uses
a volume unload table (VUT) to determine those private volumes that belong to
the ending job and that are to be unloaded. Unloading is done by volume mount
and verify (VM&V).


Dynamic Initialization and Control
When dynamic initialization and control is invoked, the job step's SIOTs and
JFCBs must be read. This is done only for the first dynamic allocation during a
given job step. The caller's parameters are syntax- and validity-checked and used to
build a SIOT and JFCB, just as for a DD statement. Existing allocations (represented
by an existing DSAB and TIOT entry) are used where possible to satisfy the
request. If the requested data set is already allocated, certain information is copied
from the SIOT and JFCB of the existing allocation to those of the new allocation.
By using the existing allocation, invocation of JFCB housekeeping and common
allocation is avoided. If an existing allocation cannot be used to satisfy the dynamic
request, the SIOT and JFCB built by dynamic initialization and control are used,
first as input to JFCB housekeeping, then to common allocation. After common
allocation completes, the SIOT(s) representing the request is chained to the step's
other SIOTs.

If dynamic unallocation is being requested, the parameters must be syntax- and
validity-checked. The correct SIOT is located and passed to common
unallocation.

JFCB Housekeeping
The major input to JFCB housekeeping is the SIOT chain, each SIOT having an
associated JFCB. JFCB housekeeping completes needed information about either
batch or dynamic allocation requests that was not placed in SIOTs and JFCBs by
the converter/interpreter. Allocation parameters that JFCB housekeeping completes
are the name, volume, unit, DCB, and disposition of the data set. Before
processing these parameters, JFCB housekeeping, using dynamic allocation,
allocates to the initiator's task control block (TCB) any STEPCAT DD or
JOBCAT DD statements. A private catalog control block (PCCB) is built for each
such catalog allocated, and all SIOTs are processed, one at a time. This
JOBCAT/STEPCAT processing takes place in a batch environment only. Information
for a request is placed in the JFCB housekeeping work area as a SIOT/JFCB
pair, which is processed and reinitialized for each SIOT. If volume information was not
specified for an old data set, the passed data set information (PDI) is searched
(only in a batch environment) in the SWA to locate volume and unit information.
If not found, or if the data set name is a generation data group (GDG) single
name, a catalog LOCATE is issued to obtain the volume and unit information. If
volume reference is specified in the SIOT, either the referenced data set is located
in the PDI or via catalog LOCATE, or the SIOT/JFCB of the referenced DD statement
is found. The source of volume and unit information is recorded in the JFCB
housekeeping work area; the information is then retrieved and placed into the
SIOT/JFCB being processed. A DCB reference to a cataloged data set is resolved by
LOCATE and OBTAIN. A DCB reference to a DD statement is resolved by going
to the JFCB of the referenced DD statement and then issuing an OBTAIN. Finally,
disposition-related information is entered into the SIOT/JFCB.

Section 5. Component Analysis

5.11.3


Common Allocation
Common allocation receives as input the SIOTs and JFCBs of allocation requests.
For requests that do not require a unit to be allocated, namely, DUMMY, VIO, and
subsystem requests, DSAB and TIOT entries are built and the SIOT is marked "allocated."
For each request requiring units, a list of eligible devices called the eligible device
list (EDL) is constructed, and pointed to by the requestor's SIOT. An entry is
built into the volunit table representing each volume/unit required. Inter-DD
relationships are represented primarily by setting fields in the VU table for use by
the remainder of common allocation.
The remainder of common allocation is divided into:
• Fixed Device Allocation
• TP Allocation
• Generic Allocation
• Recovery Allocation

Common allocation control invokes each of these functions in the order
indicated.
If all requests have been allocated, any requests needing volumes mounted have
volume mount and verify (VM&V) RBs chained to their SIOTs. These VM&V
RBs are chained to each other and passed to VM&V as input. VM&V mounts the
necessary volumes.

Fixed Device Allocation
Allocation for any request that can be allocated to a volume on a permanently-
resident or reserved DASD uses fixed device allocation. The allocation of a
request (VU entry) involves:
• The selection of the device
• The building of the DSAB (pointed to by a SIOT)
• The building of a TIOT entry (pointed to by a DSAB)
• Setting indicators in the unit control block (UCB) of the selected device
• Invoking DADSM
• Demounting incorrect volumes (except in the case of fixed device allocation)
• Scheduling a mount (by building a VM&V request block (VM&V RB) if a
volume must be mounted)

TP Allocation
This is a small specialized operation for teleprocessing lines. TP lines, once
allocated, remain allocated whether online or not, and cannot be reallocated.

Generic Allocation
Generic allocation attempts to allocate the remaining requests that were not
allocated by previous processes. Requests for tapes, demountable direct access
volumes, graphics devices, and unit record devices are not considered until generic
allocation. A special set of tables, the generic allocation tables, are built to
represent the units eligible for each request (VU entry). These tables are used
throughout generic and recovery allocation. Generic allocation processes
requests not sequentially but on the basis of generic device type. The order in
which generic device types are chosen is determined by a table, built at SYSGEN
time, called the device preference table.

Recovery Allocation
Requests left unallocated by previous steps are allocated by recovery allocation.
The main functions of recovery allocation are to interface with the operator to
request that offline devices be brought online, and, once online, to allocate these
devices to unallocated VU entries.

Common Unallocation
The input to common unallocation is a chain of RBs, each of which points to a
SIOT to be unallocated. Disposition processing uses the SIOT/JFCB and common
unallocation RB to give the data set a disposition. Units allocated to each SIOT
are unallocated by using the TIOT entry. Private tape volumes are unloaded, and
the VUT is updated with volume serials to indicate which of the job's volumes
were left mounted at unallocation time but need demounting by batch
initialization and control at end of job. Data sets are released (dequeued) by using the
data set enqueue table to determine if the data set's last use in the job is in the
current step. All volumes used by a step are released by a generic dequeue if
unallocation is for a step. In the dynamic unallocation environment, only the
subject request's volumes are dequeued.

Volume Mount and Verify
Volume mount and verify (VM&V) mounts, verifies, and unloads volumes. VM&V
is driven by a chain of VM&V request blocks. A VM&V count table is built in
which the numbers of mount, verify, and unload requests are maintained.
In mounting and verifying direct access volumes, VM&V builds a mount
verification communication area (MVCA) in CSA. This contains a pointer to an
MVCA extension (MVCAX), which VM&V builds in the user region. The MVCAX
contains a device-end ECB and UCB pointers for each device for which a mount has
been issued. After issuing mounts and building the MVCA/MVCAX blocks, VM&V
waits for the device-end ECB in the MVCAX. Whenever a device-end occurs on a
unit that VM&V is waiting for, a nucleus routine (IEFVPOST) posts the device-end
ECBs in all MVCAXs. Any VM&V that is waiting looks at all UCBs being waited
for. Volume serials are read and verified when the devices become ready.

Volume unloading is accomplished for DASD by issuing an unload message to
the operator and clearing volume-related data from the UCB. For tape volume
unloading, a physical rewind/unload operation is also performed. Virtual volume
unloading is accomplished by issuing an unload SVC (SVC 126) and clearing
volume-related data from the UCB.

General Debugging Aids
Described here in general terms are the following:
• Allocation Module Naming Conventions
• Registers and Save Areas
• Common Allocation Control Block Processing
• ESTAE Processing

Allocation Module Naming Conventions
All allocation module names have the following format:

  IEF_B4__

IEF indicates the module is a scheduler module. The fourth character has the
following meaning:

• If A, the module is part of common allocation, common unallocation, JFCB
housekeeping, or volume mount and verify.
• If B, the module is part of batch allocation or batch unallocation.
• If D, the module is part of dynamic allocation or dynamic unallocation.
B4 identifies the module as a part of allocation. The last two characters are a
unique module identifier.

Registers and Save Areas
Allocation follows standard register saving and usage conventions. Register 13 is
used as a save area pointer, register 14 as a return address, and register 15 as a
branch address. Register save areas are chained in the standard manner.
Since allocation is coded completely in top-down fashion, it is a simple matter
to find the flow of control leading to the current point of processing by tracing
back through the save areas. All allocation modules have identifiers just after the
beginning of the module, which contain the module name in EBCDIC. A graphic
representation of control flow can be found under "Allocation/Unallocation" in
"Module-to-Module Control Flow" of Volume 6 of OS/VS2 System Logic Library.

Space for the allocation save areas is obtained in a unique manner, which can be
of help in debugging. On entry to allocation, a 4K block of space is obtained from
subpool 230. This block is used to contain the save area and data area for each
module called, until the block is full, at which time another 4K block is obtained.
Save areas of modules that have been given control but then returned are still
valid (that is, not freed) if the 4K block in which they were placed has not been
freed. Allocation does not keep the address of a control block in any particular
register. Register 13 always points at the save area of the module in control.
Register 12 is usually the base register of the module in control.

Common Allocation Control Block Processing
This section graphically describes the control blocks used by common allocation
and explains how these control blocks reflect allocation processing. Figure 5-37
shows the control blocks which are input to common allocation. Data set association
blocks (DSABs) and their associated task input/output table (TIOT) entries
are shown as input. Note that DSABs exist only if common allocation was called
by dynamic allocation. When batch allocation calls common allocation, there are
no DSABs, but there is a DSAB queue descriptor block (QDB).
The first major step in common allocation processing is the construction of the
allocation work area (ALCWA). Following this, requests that do not require
units, such as DUMMY and SYSOUT DD requests, are allocated. A DSAB and
TIOT entry are built for each of these requests as they are allocated. SIOTETIO is
initialized to point to the DSAB whenever it is created for a given SIOT. Bit
SIOTALCD is set to 1 whenever a request (SIOT) is fully allocated.
After allocating these requests, the volunit table (VU table) is created to
represent the unit requirements of remaining (unallocated) SIOTs. In addition,
an eligible devices list (EDL) is created for each remaining SIOT. The EDL
contains the unit control block (UCB) pointers to all UCBs representing devices
eligible for allocation to the SIOT. (A device is "eligible" at this point whether
online or offline, either logically or physically.) Figure 5-38 shows the relationship
of the ALCWA, SIOTs, etc., after the VU table and EDLs are built. The first SIOT on
the chain (SIOT A) represents a SYSOUT DD statement that has already been
allocated. The second SIOT on the chain (SIOT B) represents a SIOT that requires
one or more units. It is shown to have two volunit entries, which indicates the total
number of units that can be allocated to that SIOT. SVOLUNNO in the SIOT
contains the number of VU entries for a SIOT. (Note that the total number of
units allocated to a request can exceed the number of units requested. This
happens, for example, if a specifically requested volume is found to be mounted
with the permanently-resident mount attribute.)


[Figure 5-37. Common Allocation Input. Diagram not reproduced; it shows the
problem program JSCB (JSCDSABQ at +X'140') pointing to the DSAB QDB, DSABs
with their TIOT entries and UCB pointers, the virtual address of the first SIOT
to allocate, and the chain of SIOTs (one per data set) with their associated
JFCBs and JFCB extensions (JFCBXs).]


[Figure 5-38. Common Allocation Control Blocks After Construction of Volunit
Table and EDLs. Diagram not reproduced; it shows the ALCWA pointing to the
first SIOT; SIOT 'A' (an already-allocated SYSOUT request, SIOTALCD = X'02'
at +X'2B') with its DSAB and TIOT entry; and SIOT 'B' with its EDL
(SIOTEDLP at +X'88'), its two VU entries (SVOLUNAD at +X'8C'), and
SIOTNPTR = 0 at +X'98'.]

Common allocation processing is reflected by the status of each request's SIOT and
VU entries. As each VU entry requiring a unit is allocated, bit VOLALOC (bit 0
(X'80') at +7 into the VU entry) is set on. Bit VDEVREQD (bit 2 (X'20') at +7
into the VU entry), if on, indicates that the VU entry requires a unit. Once all VU
entries with VDEVREQD=1 for a given SIOT are allocated and VOLALOC=1, the
SIOT is marked allocated by setting on SIOTALCD (bit 6 (X'02') at +X'2B' into the
SIOT).
As each unit is allocated to a request, that allocation is reflected in (1) the
unit's UCB, by setting UCBALOC (bit 4 (X'08') at +3 in the UCB) on, and in (2) the
request's TIOT entry, by placing the UCB pointer into field TIOUCBP in the TIOT
entry. (TIOUCBP is at +X'10' into a TIOT entry for the first unit allocated, at
+X'14' for the second, etc.) The first time a VU entry for a SIOT is allocated, a
DSAB and TIOT entry are created. For subsequent VU entries allocated to a SIOT,
the DSAB and TIOT entries are updated.

ESTAE Processing
All of allocation is protected from abends by ESTAE processing. Only one ESTAE
is issued during allocation. The batch allocation ESTAE exit routine, IEFAB4E4,
performs a retry, causing routine IEFAB4E3 to get control. IEFAB4E3 returns to
the initiator with a failure return code, causing the initiator to fail the job. All
other ESTAE exit routines percolate to the next higher level of ESTAE protection.
In a batch unallocation environment, this causes the initiator to terminate.
When an abend occurs in a batch environment, message IEF197I "SYSTEM
ERROR DURING ALLOCATION/UNALLOCATION" is issued to SYSOUT by
ESTAE processing. If the abend occurs in batch allocation or a routine called by
batch allocation, such as JFCB housekeeping, message IEF197I is issued to
the job's SYSOUT. If the abend occurs during batch unallocation, the same
message goes to the initiator's SYSOUT.
An SVC dump is always taken if an abend occurs while allocation is in control.


Debugging Hints
Hints for debugging specific problem areas are described here, including:
• Allocation Serialization
• Device Selection Problems (Non-Abend)
• 0B0 Abend
• 0C4 Abend in IEFAB4FC, or Loop in IEFDB413
• Volume Mount and Verify (VM&V) Waiting Mechanism

Allocation Serialization
Allocation serializes on several types of resources. This has resulted in deadlocks
between job steps when a programming change caused incorrect serialization.
Both dynamic allocation and JFCB housekeeping enqueue on data set names.
Dynamic allocation enqueues on non-temporary data set names before calling
JFCB housekeeping. JFCB housekeeping enqueues on real data set names when
it finds, via LOCATE, that the specified data set name is an alias; the fully-
qualified names of GDG single requests (found via LOCATE); the individual names
in a generation data group; and the data set names of temporary, non-VIO data
sets. (The initiator enqueues all non-temporary names of JCL-specified data sets
before a job starts.) Data set names are dequeued by unallocation, either batch
or dynamic, in the last step in which the data set is referenced.
Common allocation enqueues on the volume serials of all specific volume requests
except for direct-access volumes that are either permanently resident or reserved.
This is done after the allocation of permanently resident or reserved direct-access
volumes, that is, following fixed device allocation. The volume serials of
demountable volumes allocated to non-specific volume requests are enqueued either
when the volume is allocated (if the volume is already mounted) or when
the volume is mounted (if allocation mounts and verifies it). (When there is a non-
specific request for tape, OPEN enqueues the tape-volume serial numbers because
allocation only waits for direct-access volumes to be mounted.) Before actually
allocating a device, common allocation serializes the status of devices by enqueuing
on several resources, all with the major name SYSIEFSD. The minor names and
functions serialized are as follows:
1. Q4 - to serialize device allocation with VARY offline processing, which is
actually done by common allocation
2. CHNGDEVS - to serialize device allocation with device unloading done by
the UNLOAD operator command and JES3
3. DDRDA - to serialize device allocation with dynamic device reconfiguration
(DDR) processing of direct access devices
4. DDRTPUR - to serialize device allocation with DDR processing of tape and
unit record devices

These four resources are enqueued for shared use by allocation and for exclusive
use by the other functions. Within common allocation, these resources, with the
exception of Q4, are dequeued when allocation must wait on an allocation recovery
WTOR or on an allocation group.
Allocation serializes, via an internal mechanism, the processing of all devices
except direct access devices containing non-demountable (permanently mounted
or reserved) volumes. The serialization unit is an allocation group. This
serialization is done to serialize the device allocation in one address space with that in
another. Group serialization is exclusive; that is, it prevents an allocation in a
given address space from considering the same device that an allocation in another
address space is considering. All allocations serialize on groups in the same order;
this order is specified at sysgen and is represented in the csect PREFTAB, which is
part of allocation load module IEFW21SD. PREFTAB is simply a list of generic
device types.
To serialize changes to a specific UCB, allocation and unallocation always obtain
the local and CMS locks before setting fields in the UCB.
Dynamic allocation serializes with itself so that only one dynamic allocation
may proceed in an address space. This is done by an enqueue for exclusive use on
major name SYSZTIOT, the minor name consisting of the 2-byte ASID and 4-byte
address of the DSAB QDB.

Subsystem Allocation Serialization
Allocation does not serialize when processing subsystem data set requests, but
provides the capability whereby a subsystem may serialize its own requests if
it so desires. The mechanism to do this is the subsystem allocation sequence
table (SAST). A skeletal SAST is built during subsystem interface initialization
to define the order in which subsystems are to be invoked for the allocation
of subsystem data sets. During common allocation processing, the subsystem
requests are sorted by subsystem. Using the sequence defined by the SAST, all
requests for a given subsystem are passed to that subsystem for allocation
before the next subsystem's requests are processed. Thus a subsystem can serialize
its allocation processing in order to prevent deadlocks.

Device Selection Problems (Non-Abend)
The device selection logic of common allocation is heavily dependent on the
eligible devices table (EDT), which is built at SYSGEN. The EDT describes the unit
eligibility of any unit name that may be specified either via JCL or dynamic
request. Users have in the past tried to modify the EDT without doing either a
full or an I/O SYSGEN. Modification of the EDT can result in incorrect allocation,
for example, allocation of a 3330 request to a 2314, or failure of a request or job
step with no error indicated. If such a device selection error occurs after
modification of an EDT, the modification is suspect and should be carefully verified by
consulting the EDT descriptions in the OS/VS2 System Logic Library section on
Data Areas, and/or the EDT mapping in OS/VS2 Data Areas (microfiche), via
mapping macro IEFZB421.


Address Space Termination
When an address space is being abnormally terminated, the allocation address space
termination routine, IEFAB4E5, gets control. This routine releases any allocation
groups held by the address space and unallocates any non-shareable units allocated
to the address space. Non-shareable units include all units except shareable direct-
access devices. The ASID of the address space allocated to a non-shareable unit is
at +X'E' (halfword) in the common UCB extension.

0B0 Abend
0B0 abends have occurred in allocation more than once. The code is issued by the
SWA manager, which handles the reading, writing, and assigning of SWA records.
Allocation requests all these functions of the SWA manager. Two situations cause
allocation to receive a 0B0 abend from the SWA manager:
1. The address of a SWA record to be read or written, on behalf of allocation, has
been overlaid. Allocation usually obtains a SWA virtual address (SVA) to read
or write from another SWA record. When such an SVA has been overwritten by
a scheduler subcomponent, a 0B0 abend may occur.
2. A 0B0 abend will occur when allocation assigns an SVA for a record and then
uses the SVA to attempt to read the record without first having written the
record.

0C4 Abend in IEFAB4FC, or Loop in IEFDB413
This error always occurs when the device type in a UCB is changed from one
generic type to another, and a JCL statement or dynamic request specifies that
particular unit. If this error occurs, it can be diagnosed as follows:
1. Find the device type (+X'10') in the UCB of the specific unit.
2. In the EDT (a CSECT that is well mapped in its assembly at SYSGEN), find the
look-up entry representing the device type in the UCB. If the requested unit is
not among the units represented by the look-up entry, the problem is that the
device type in the UCB was changed.


Volume Mount and Verify (VM&V) Waiting Mechanism
Volume mount and verify must wait for direct access volumes to be mounted so
that the labels can be verified, and so that allocation can enqueue the volume
serials for non-specific volume requests and obtain space (for new data set
requests). In order to allow several allocations to be waiting simultaneously,
the control block structure shown in Figure 5-39 is set up by VM&V.
Each address space waiting for at least one direct access volume to be mounted
has its own mount verification control area (MVCA), MVCA extension
(MVCAX), one or more mount entries, and, in each mount entry, one or more
UCB entries. Each MVCAX contains an ECB. When an allocation is waiting for a
direct-access volume to be mounted, VM&V waits for this ECB on behalf of the
allocation. The MVCA chain is anchored in the allocation/termination
communication area (ATCA) in the nucleus. The ATCA is pointed to by location
CVTQMWR in the CVT. All devices on which allocation waits for a device end
(volume mount) have the scheduler attention table index placed in their
UCBs (at +3 in the common UCB extension). The index is X'0C'.
Any destruction of the MVCA/MVCAX structure causes one or more
allocations to wait "permanently." The wait is not truly permanent, however, because
VM&V also waits for (in a batch environment) the cancel ECB (in the CSCB, the
command scheduling control block), which is posted when the operator cancels
a job. In a dynamic environment, VM&V waits for a WTOR ECB, in which case
the operator can, via reply, cancel the single mount but not the job.


[Figure 5-39. VM&V Control Block Structure. Diagram not reproduced; it shows
the MVCA chain anchored in the ATCA in the nucleus, the MVCAs in CSA
(subpool 241), and, for each waiting address space (subpool 230), an MVCAX
containing the device-end ECB and one or more mount entries, each with its
device entries.]


Allocation/Unallocation Reason Codes
The reason codes listed here are divided into three groups:
• Reason codes set by batch and common allocation modules and by JFCB
housekeeping modules
• Reason codes set by unallocation modules
• Reason codes set by dynamic allocation modules

Common and Batch Allocation and JFCB Housekeeping Reason Codes
The reason codes set by common and batch allocation and by JFCB housekeeping
are divided into step-related reason codes and DD-related reason codes.
The following are DD-related error reason codes set by allocation and JFCB
housekeeping modules and placed in the SIOTRSNC field of the SlOT. The reason
codes serve as an index into message module IEFBB4M3. The prologue of
IEFBB4M3 lists the modules which detect the error conditions.
Reason Code
1
2

3
4
5
6
7
8
9
10

11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32

33
34
35
36
37

Dynamic Allocation
Error Reason Code
1700
0244
0210
020C
0458
0214
021C
0480
0224
0398
4714

47A8
47AC
reserved
039C
0228
4704
4708
470C
4710
4714
4718

4734
4738
reserved
4740
reserved

Message

Meaning

IEF212I
IEF371I
IEF211I
IEF211I
IEF365I
IEF702I
IEF221I
IEF210I
IEF195I
IEF192I
IEF194I
IEF246I
IEF721I
IEF372I
IEF318I
IEF719I
IEF720I
IEF688I

Data set not found.
Telecommunication device is not accessible.
Unable to ENQ on data set name.
Unable to ENQ on data set name.
Referenced data set name is GDG ALL.
Unable to allocate.
Invalid backward reference to a step.
Invalid UNIT parameter.
Maximum number of devices for statement exceeded.
Not enough eligible devices.
Volume sequence number incorrect.
Insufficient space on storage volumes.
Protection conflict in ISAM requests (SU 32 only).
VOL=REF to unresolved DD.
UNIT=AFF to new direct data set.
Data set previously defined (SU 32 only).
User not authorized to define this data set (SU 32 only).
NULLFILE and DSNAME conflict in ISAM concatenation.

IEF245I
IEF474I
IEF253I
IEF254I
IEF193I
IEF256I
IEF257I
IEF258I
IEF260I
IEF261I
IEF262I
IEF263I
IEF264I
IEF266I
IEF140I

Inconsistent unit name and volser.
Unit or volume in use by system task.
Duplicate data set name on direct access volume.
Insufficient space in VTOC.
Space not obtained because of I/O error.
Absolute track not available.
Space requested not available.
Invalid record length in SPACE parameter.
Incorrect DSORG or DISP.
No prime area request for ISAM data set.
Prime area must be requested before overflow area.
Space request not on cylinder boundary.
Duplication of DSNAME element.
Invalid JFCB or partial DSCB pointer.
Directory space request too large.

IEF273I

Invalid user label request.

* - means that the error cannot be set in dynamic allocation.


Reason Code
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63

Dynamic Allocation
Error Reason Code
474C

476C
4780

035C
0390
0394
0218
0494
022C
0214
0220
4794
4798
479C

64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90

0490
17FF
022C
024C
0250
03AO
04A4
0484
7700
04A8

91
92
93
94
95
96

04AC
04BO
7704
Reserved
04B4
03A4

0230
0488
048C
47A4
0214
0240
04B8
04BC
0234

0470
046C

Message

Meaning

IEF127I
IEF128I
IEF129I
IEF130I
IEF131I
IEF132I
IEF133I
IEF134I
IEF135I
IEF136I
IEF267I
IEF145I
IEF141I
IEF143I
IEF366I
IEF219I
IEF286I
IEF466I

No SPACE parameter or zero space request at ABSTR 0.
Invalid request for ISAM index.
Multivolume index request.
DSNAME element wrong.
Multivolume OVFLOW request.
CYL and ABSTR conflict in SPACE parameter.
CYL and CONTIG conflict in SPACE parameter.
Subparameter wrong in SPACE parameter.
Zero primary space request.
Index area requested twice.
Space request for directory larger than primary space request.
Space request not ABSTR for DOS volume.
Index request did not precede prime request.
Last concatenated DD card unnecessary or invalid.
Relative GDG generation number contains syntax error.
GDG group name exceeds 35 characters.
DISP field incompatible with data set name.
Unable to recover from DADSM failure.
Mounting required but not allowed.
Can't access SYSCATLG data set on CVOL.
Volume on ineligible permanently resident or reserved device.
Units required not available - waiting not allowed.
Volumes required not available - waiting not allowed.
Data sets overlap in VTOC.
DOS split cylinder data sets overlap.
Possible VTOC error.
VTOC error on second or later volume of ISAM prime data set.
Same unit requested twice - conflicts exist.
Permanently resident or reserved volume on requested unit.
Volume containing pattern DSCB not mounted.
Pattern DSCB record not found in VTOC.
New data set requested on DOS stacked pack format volume.
Can't wait for offline devices.
Requested device is a console.
MSS not initialized.
MSS select error.
More units required for demand request.
Invalid JOBCAT or STEPCAT parameters.
Invalid data set name for JOBCAT or STEPCAT.

IEF704I
IEF475I
IEF467I
IEF485I
IEF476I
IEF477I
IEF478I
IEF479I
IEF481I
IEF482I
IEF217I
IEF218I
IEF703I
IEF483I
IEF726I
IEF725I
IEF484I
IEF493I
IEF492I
reserved
IEF480I
reserved
IEF701I
IEF213I
IEF687I
IEF752I
IEF752I
IEF752I
IEF752I
IEF752I
IEF752I
IEF753I

Unauthorized requestor of subsystem data set.
Invalid destination requested.
Error changing allocation assignments.
Error processing cataloged data set.
Requested volume mounted on JES3-managed unit.


The request for a subsystem data set was failed by the subsystem
attempting to allocate the request.

IEF754I
IEF755I
IEF756I

A SUBSYS parameter specified a subsystem which does not support
the allocation of subsystem data sets.
The subsystem requested on a SUBSYS parameter was not operational.
The subsystem requested on a SUBSYS parameter does not exist.
A system error occurred in allocating a subsystem data set.

IEF740I
IEF741I

Data set/volume could not be RACF protected - RACF not active (SU 32 only).
Protect request failed - invalid data set/volume specification (SU 32 only).

• - means that the error cannot be set in dynamic allocation.

The following are step-related error reason codes set by allocation and JFCB
housekeeping modules in an area pointed to by the allocation work area (ALCWA).
With the exception of reason code 1, the reason codes serve as an index into
message module IEFBB4M2. The prologue of IEFBB4M2 lists the modules which
detect the error condition. Reason code 1 is set by IEFAB469 and is returned to
dynamic allocation.
Reason Code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30

Dynamic Allocation
Error Reason Code
023C
0204
0220
0484
0238
0220
049C
0474
0248
0450
172C
1718
670C
0478
047C
0214
0490
0468

0498
04AO
024C
0250
03AO
04A4
7700

Message
IEF180I
IEF713I
reserved
IEF251I
IEF240I
IEF485I
IEF714I
IEF473I
IEF716I
IEF491I
IEF363I
IEF364I
IEF367I
IEF465I
IEF456I
IEF700I
IEF701I
IEF361I
IEF362I
IEF202I
IEF202I
IEF715I
IEF717I
IEF718I
IEF751I
IEF751I
IEF751I
IEF751I
IEF751I

Meaning
Catalog not mounted.
GETMAIN error.
MSS volume not available.
Job cancelled.
No space in TIOT.
Volumes not available and waiting not allowed.
MSS volume not defined.
System Resources Manager error.
Unable to mount MSS volume.
Number of DDs exceeds 1635.
Not enough storage for processing cataloged data set.
Permanent I/O error processing cataloged data set.
I/O error obtaining pattern DSCB.
Unable to allocate subsystem data set.
Error issuing ESTAE macro.
Environment changed - no longer able to allocate.
Error changing allocation assignments.
Unable to allocate private catalog.
Unable to unallocate private catalog.
Step not run because of condition codes.
Step not run because of condition codes.
MSS volume inaccessible.
Specified virtual volume group (VVGRP) name does not exist.
Space or virtual volume group (VVGRP) required for nonspecific MSS request.

The job was failed in allocation by a subsystem processing a request to allocate
one or more subsystem data sets.

* - means that the error cannot be set in dynamic allocation.

5.11.18

OS/VS2 System Programming Library: MVS Diagnostic Techniques

Allocation/Unallocation (continued)

Common and Batch Unallocation Reason Codes

The following reason codes are set by common and batch unallocation modules.
Reason codes 1, 2, and 4 serve as an index into message module IEFBB4M5. Reason
code 3 does not result in a message; it is returned to dynamic allocation.
Reason Code

1
2
3
4

Message

IEF468I
IEF469I
IEF724I
IEF456I

Meaning

GETMAIN error.
Data sets not released.
Volumes not released (dynamic allocation only).
Step catalogs not allocated (warm start only).
Error issuing ESTAE macro.

Module Setting

IEFBB410, IEFBB414, IEFBB416, IEFAB4A0
IEFAB4A0, IEFAB4A6
IEFAB4A0, IEFAB4A8
IEFBB410, IEFAB4A0
IEFAB4A2

In addition, IEFAB4A2 (disposition processor) receives return codes returned by
the data management catalog and scratch functions (called by IEFAB4A2 to perform disposition processing). If the allocation is dynamic, these return codes are
returned to dynamic allocation as reason codes in a field in the unallocation request
block. For batch allocation, the return code is converted to a code for a disposition
message.
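The two paths described above can be sketched as follows. This is a hedged illustration only: the return-code values, the mapping table, and the function name are invented placeholders, not the actual catalog/scratch codes or IEFAB4A2 logic.

```python
# Hypothetical sketch of the two disposition paths: a catalog/scratch
# return code either flows back to dynamic allocation unchanged as a
# reason code, or is converted to a disposition-message code for batch.
# The mapping below is invented for illustration.
DISPOSITION_MSG = {0: "CATLGD", 4: "NOT CATLGD", 8: "NOT DELETED"}

def handle_return_code(rc, dynamic):
    """Route a data management return code: back to the requestor for
    dynamic allocation, or into a disposition message code for batch."""
    if dynamic:
        return ("reason_code", rc)   # placed in the unallocation request block
    return ("message", DISPOSITION_MSG.get(rc, "DISPOSITION FAILED"))
```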

Dynamic Allocation Reason Codes
For a description of dynamic allocation reason codes, refer to the topics "Informational
Reason Codes" and "Error Reason Codes" in OS/VS2 System Programming Library:
Job Management.


JES2

JES2 is a job entry subsystem for OS/VS2 MVS. An overview of the JES2
structure is presented in this section. For detailed information on JES2 structure,
logic, and control block formats, see OS/VS2 JES2 Logic. A partial list of major
JES2 control blocks showing storage location and primary use may be seen in
Figure 5-49 at the end of this section.
JES2 is a subsystem that runs as an operator-started job in a separate address
space. It provides input and output spooling for local and remote unit record
devices, and simplified batch scheduling. A subsystem support module, provided
by JES2 and located in the pageable link pack area (PLPA), is used to communicate with other system components in performing job selection and execution. JES2 may be connected to as many as seven other JES2 subsystems via the
multi-access spool direct access storage devices.

Job Processing Through JES2
JES2 job processing is divided into the following five major phases:
Input
Jobs are read into the system from online card readers, remote terminals, and
internal reader interfaces (TSO LOGONs, TSO-submitted jobs, system tasks,
or jobs presented to the internal reader from other sources). These jobs are then
entered into a priority queue to await processing by the next stage.
Conversion
As soon as the converter is available, the JCL for a job is passed through the
converter, scanned for syntax errors, and converted into internal text. Any jobs
having JCL errors will bypass execution and be queued for output processing
immediately. Those jobs that successfully complete conversion are queued by
priority, within class, to await an eligible initiator for execution.

Execution
Jobs are selected by priority, within class, for an eligible initiator. Input cards are
supplied as required to the executing program. Output records are received and
written onto JES2 spool devices. At the completion of execution, the job is placed
in a queue to await output processing.
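The "selected by priority, within class" rule can be sketched roughly as one queue per class, with an initiator walking its eligible classes in preference order. This is an illustrative model only; the class names, priorities, and structure below are invented, not JES2's actual queue format.

```python
import heapq

# Illustrative sketch of priority-within-class job selection (all names
# and values invented). An initiator takes the highest-priority queued
# job in the first of its eligible classes that has work.
class JobQueue:
    def __init__(self):
        self._queues = {}   # class -> heap of (-priority, seq, job)
        self._seq = 0       # FIFO tiebreak within equal priority

    def add(self, job_class, priority, job):
        self._seq += 1
        heapq.heappush(self._queues.setdefault(job_class, []),
                       (-priority, self._seq, job))

    def select(self, eligible_classes):
        # class order reflects the initiator's preference
        for c in eligible_classes:
            heap = self._queues.get(c)
            if heap:
                return heapq.heappop(heap)[2]
        return None
```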

Output
The print data sets created during execution, and messages created during earlier
stages, are printed. The punch data sets are punched.


Purge
Upon completion of all processing required for the job, the direct access space
acquired by JES2 for the job and all JES2 resources associated with the job are
released.

JES2 Structure
JES2 consists of two basic modules: HASJES20, which operates in the JES2 address
space and provides the subsystem's job processing functions; and HASPSSSM, which
is located in the PLPA and provides the interface between the operating system and the
HASJES20 programs.

HASJES20 Program Structure
The HASJES20 module is made up of seven tasks that perform JES2 job processing. The JES2 main task provides the basic functions of reading and spooling
job input, converting JCL, selecting jobs for execution from the JES2 job queue,
receiving and outputting job output, and job cleanup, all accomplished with a set of
programs called basic functional processors. These processors are supported by
another set of programs, the control processors, which provide subsystem control
and JES2 facilities. Both sets of processors use numerous subroutines called
control service programs.
The heart of HASJES20 is the dispatcher, which schedules and dispatches the
various processors under the single TCB of the main task. Since the main task cannot afford to go into a wait state, any JES2 programs that have the potential for
waiting are isolated as subtasks. There are six JES2 subtasks:
1. Conversion subtask - links to the OS/VS2 converter.
2. Image loader subtask - loads universal character set (UCS) and forms
control buffer (FCB) images.
3. System management facility (SMF) subtask - issues SVC 83 to write
accounting records for the main task.
4. Communications subtask - issues SVC 34 and SVC 35 for main task operator
communications.
5. SNA subtask - initializes JES2 use of the VTAM interface with OPEN ACB.
6. Dynamic spool allocation subtask - initializes JES2's spool volumes
(SYS1.HASPACE and SYS1.HASPCKPT).



HASJES20 Module Structure

HASJES20, shown in Figure 5-40, consists of ten source modules which contain
the JES2 main-flow job processing code and associated directories. The HASPINIT
module is loaded in the JES2 address space to initialize the subsystem; after initialization HASPINIT is deleted. HASPRDR, HASPXEQ, and HASPPRPU contain the
functional and control processors needed to effect the major job processing steps,
along with related specialized subroutines and subtasks. The first 4K of HASPNUC
is fixed in JES2 memory since it contains the system routines which provide
support to the other modules. HASPNUC also contains the HCT and the JES2 module
directory (see Figure 5-41). HASPRTAM contains all the access method and line
management functions to support both bisynchronous and systems network
architecture (SNA) remote job entry terminals; communication functions are
isolated to this module. All processing associated with JES2 multi-access spool
systems is contained in HASPMISC, along with spool initialization, checkpoint, and
job purge operations.
JES2 Address Space (low end to high end):

HASPNUC
• HCT
• Dispatcher
• I/O Supervisor
• Service Routines
• Service Processors
• Module Directory

HASPRDR, HASPXEQ
• Conversion Processor
• Conversion Subtask
• Execution Processor
• Time Excession Processor
• Process SYSOUT Processor

HASPPRPU
• Output Processor
• Print/Punch Processor
• Image Loading Subtask

HASPACCT
• SMF Subtask

HASPMISC
• Purge Processor
• Checkpoint Processor
• Track Group Allocation Subroutine
• Priority Aging Processor
• Warm Start Processor
• Dynamic Spool Allocation Subtask

HASPCON
• Service Routines
• Communication Subtask

HASPRTAM
• RTAM Service Routines (BSC and SNA)
• Line Manager Processor (BSC and SNA)
• Remote Console Processor
• VTAM Subtask (ACB OPEN, CLOSE)
• API Routines

HASPCOMM
• Command Processor

HASPINIT
(Deleted after subsystem initialization)

Figure 5-40. HASJES20 Module Map

[Figure 5-41, not reproduced here, shows HASPNUC with the HCT at its start:
R11 (BASE 1) addresses the HCT, and the field at offset 0008 in the HCT addresses
the module map, which in turn covers the JES2 modules (HASPXEQ, HASPINIT,
HASPCOMM, HASPACCT, and the others).]

Figure 5-41. Locating the JES2 Module Directory in HASPNUC
HASP Control Table (HCT)

The global directory for HASJES20 is the HASP control table (HCT), which is
found in HASPNUC. (Figure 5-42 shows the major vector fields of the HCT.) In
HASJES20, R11 may be used to locate the HCT. Eight bytes into the HCT is the
address of the JES2 module map, which contains a symbolic name and VCON entry
for each of the JES2 modules.


[Figure 5-42, not fully reproduced here, shows the major vector fields of the HCT,
located from R11 in HASJES20. With Release 4.1 and Release 4.0 offsets, it lists,
among others: $HASPMAP at offset 0008 (address of the JES2 module directory),
entries for the HASP dispatcher and service routines, TCB and ECB addresses for
the subtasks, the address of the SSVT ($SSVT), the control block directory,
configuration constraints, operating constraints, internal constraints, control
fields, processor PCE addresses, $CURPCE (current PCE address), dispatcher event
control fields, processor queue addresses, and the checkpoint record.]

Figure 5-42. HCT Major Vector Fields


HASPSSSM

Located in PLPA, HASPSSSM interfaces directly with the operating system through
the formal subsystem interface (SSI) to provide job scheduling, data management
(SYSIN and SYSOUT), and operator communications. HASPSSSM contains
function routines which are invoked through the use of vectors in the subsystem
vector table (SSVT) shown in Figure 5-43. The vectors are used by the operating
system to invoke functions which are defined by the IEFJSSOB macro expansion.
Additional SSVT vectors are used by the HASJES20 module to provide services to
the rest of the JES2 system. During execution of functions represented by the
SSVT vectors, additional vectors are set into data extent blocks (DEBs) and access
method control blocks (ACBs) for data management support. In the performance of
its functions, HASPSSSM makes requests for services to the HASJES20 module
running under the JES2 TCB as well as to the operating system. The module is
entered in the privileged state. The storage relationship of HASPSSSM and
HASJES20 to the operating system is shown in Figure 5-44.

[Figure 5-43, not reproduced here, shows the SSVT layout: a function map and
pointers to function routines and service routines (the portion common to all
subsystems), followed by a JES2 extension containing queue heads and spool
control fields.]

Figure 5-43. The Subsystem Vector Table

[Figure 5-44, not reproduced here, shows the storage relationship: the SSVT and
related control blocks in common storage, addressed from the user region (STC
tasks, TSU tasks, and batch job tasks) through HASPSSSM; HASJES20 with the HCT
and its supporting code in the JES2 region; and SVC 111 in the nucleus.]

Figure 5-44. HASPSSSM - HASJES20 - OS/VS2 Relationship

Subsystem Interface

MVS interfaces formally with JES2 by building a subsystem options block
(SSOB) and issuing the IEFSSREQ macro. This formal subsystem interface is
shown in Figure 5-45. The subsystem is entered by indexing the SSVT for the
entry pointer into HASPSSSM. HASPSSSM performs the requested function,
communicating with HASJES20 by cross-memory post if necessary, and returns.
The SSVT, located in CSA subpool 241, contains the pointers necessary for
MVS/JES2 and HASPSSSM/HASJES20 communication. Together with the HCT, the major
control block in HASJES20, the SSVT forms the central directory for JES2.
The subsystem interface is used for the following:
A. Job scheduling and control functions
1. Job selection
2. Job deletion (termination)
3. Re-enqueue job
4. Request job identification
5. Return job identification
6. End of memory
7. End of task
B. Data set access method functions
1. Allocation
2. Open - activates the following interfaces:
a. GET/PUT/PUT ENDREQ/NOTE/POINT
b. End of block (SVC 111)
3. Checkpoint
4. Restart
5. Close
6. Unallocation

C. TSO/external writer communications
1. Process SYSOUT
2. CANCEL
3. STATUS
4. User identification validity check

D. Operator communications
1. Command processing (SVC 34)
2. Write to operator (SVC 35)
In addition to the formal SSOB interfaces, the following miscellaneous subsystem interfaces are defined:
1. Exit from the OS/VS2 converter
2. Unsolicited device end
3. Privilege status from the program properties table.
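The SSOB-driven entry can be sketched abstractly: the request block carries a function code, and the subsystem indexes a vector of routines. The function codes, routine names, and return codes below are invented placeholders, not the real IEFJSSOB definitions.

```python
# Minimal sketch of SSVT-style function dispatch (all codes and names
# are placeholders). The caller builds a request block and the router
# below plays the role IEFSSREQ plays for an SSOB.
def select_job(ssob):
    ssob["result"] = "job selected"

def end_of_task(ssob):
    ssob["result"] = "end-of-task cleanup done"

SSVT_VECTORS = {1: select_job, 7: end_of_task}   # function code -> routine

def iefssreq(ssob):
    """Route a request block to the routine vectored for its function."""
    routine = SSVT_VECTORS.get(ssob["function"])
    if routine is None:
        return 4          # invented "function not supported" return code
    routine(ssob)
    return 0
```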


[Figure 5-45, not reproduced here, shows the formal subsystem interface flow:
the user module builds an SSOB (header and extension), with register 1 pointing
to the SSOB, register 0 involved in addressing, and the SSCT and SSIB locating
the subsystem, then issues IEFSSREQ; IEFSSREQ validates the request, sets up the
SSIB, and enters the subsystem; HASPSSSM, with register 11 addressing the SSVT
and register 13 the save area, performs the function (referencing the SJB) and
returns.]

Figure 5-45. Formal Subsystem-Interface Vectors

Dispatcher Structure
The JES2 dispatcher allocates processor time to the JES2 main task processors. Each
processor is represented by a control block called a processor control element
(PCE). When a processor is eligible for dispatching, its PCE is on a dispatcher
queue called the $READY queue. When a processor is waiting on an event, it is
ineligible for dispatching. If the processor is waiting for a resource, its PCE is
chained to the designated resource wait queue; if the processor is waiting for a
specific event, its PCE is queued to itself via a specific event wait field, PCEEWF,
in the PCE. The currently active processor's PCE is at the top of the $READY
queue and is addressed by the $CURPCE field of the HCT. Major queue and event-control fields in the HCT and SSVT are shown in Figure 5-46.

$WAIT
A processor that is currently active remains so until it issues a $WAIT macro
instruction, at which time the dispatcher is entered at entry point $WAIT (for
a specific event) or $WAITR (for a general resource). The dispatcher continues
to dispatch eligible processors from the $READY queue until the queue is found
to be empty. At this time control is passed to the dispatcher's resource posting
routine, which looks for waiting PCE's that have been posted for events and are
therefore eligible for dispatching. All eligible PCE's are moved to the $READY
queue, and control passes back to the dispatcher.
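This dispatch-until-empty, then-repost cycle can be sketched as follows. All names and structures here are invented stand-ins for the $READY queue, $WAIT, and the resource posting routine, not JES2's actual control blocks.

```python
from collections import deque

# Minimal sketch of the dispatching cycle described above: a processor
# runs until it gives up control with wait(); the dispatcher drains the
# ready queue, then its posting step moves any posted waiters back onto
# it for the next cycle. (All names invented.)
class Event:
    def __init__(self):
        self.posted = False

class Dispatcher:
    def __init__(self):
        self.ready = deque()        # stands in for the $READY queue
        self.waiting = []           # (processor, event) pairs

    def wait(self, proc, event):    # processor issues its "$WAIT"
        self.waiting.append((proc, event))

    def cycle(self):
        ran = []
        while self.ready:           # dispatch until the queue is empty
            proc = self.ready.popleft()
            proc(self)              # the processor may call wait()
            ran.append(proc)
        # resource-posting step: posted waiters become eligible again
        still_waiting = []
        for proc, event in self.waiting:
            if event.posted:
                self.ready.append(proc)
            else:
                still_waiting.append((proc, event))
        self.waiting = still_waiting
        return ran
```

Note that a newly posted processor is only moved to the ready queue; it runs on the next pass, mirroring how control returns to the dispatcher after the posting routine.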
[Figure 5-46, not reproduced here, shows the JES2 queue control fields. In the
SSVT (in the PLPA): $SVECF, the $$POST event control field (offset 0204), followed
by the $$POST elements for HASJES20 processors (offsets 0208-02FC). In the HCT
(in the JES2 address space), with Release 4.1 and Release 4.0 offsets: the PCE
addresses for the JES2 processors, $CURPCE (current PCE address), $HASPECF
(master event control field), the wait queue header addresses, $EWOABIT (queue
header for PCEs awaiting any post), and $READY (queue header for PCEs eligible
for dispatching).]

Figure 5-46. JES2 Queue Control Fields


$$POST
The JES2 dispatcher can be notified of work from within its own address space
by the $POST macro. In addition, the dispatcher can be notified of work from
other address spaces or from subtasks within its address space by the $$POST
macro, which causes a HASPSSSM interface routine to cross-memory post the
JES2 main task. In this case, the dispatcher post promulgation routine, which
receives control when the resource posting routine runs out of work, propagates
event posts from the SSVT fields used by HASPSSSM interface routines to the
HCT fields in the JES2 address space. There the resource posting routine can pick
them up and mark the corresponding processors eligible for dispatching. Control
then returns to the dispatcher.

JES2 WAIT
When the JES2 dispatcher determines that there is no more work to be done, it
issues an MVS WAIT macro, and waits to be posted for more work. When a
$$POST macro is issued, the dispatcher post promulgation routine receives control
and transfers the event notifications to the HCT, where they are picked up by the
resource posting routine. The corresponding PCE's are transferred to the $READY
queue.
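The $$POST hand-off can be sketched with two flag areas and a wake-up event. This is a hedged model only: the class, field names, and the use of a Python lock and event are invented stand-ins for the SSVT fields, the HCT fields, and the MVS cross-memory POST.

```python
import threading

# Hedged sketch of the $$POST flow (structures invented): an outside
# poster sets a flag in a commonly addressable area (standing in for
# the SSVT) and wakes the main task; the promulgation step then copies
# the flags into the main task's own area (standing in for the HCT),
# where the resource posting routine would pick them up.
class MainTask:
    def __init__(self):
        self.ssvt_flags = set()        # set from other address spaces/subtasks
        self.hct_flags = set()         # seen only by the main task
        self.lock = threading.Lock()
        self.work = threading.Event()  # stands in for the MVS POST/WAIT pair

    def dollar_dollar_post(self, event_name):
        with self.lock:                # "$$POST" from outside the main task
            self.ssvt_flags.add(event_name)
        self.work.set()                # wake the waiting main task

    def promulgate(self):              # post promulgation routine
        with self.lock:
            self.hct_flags |= self.ssvt_flags
            self.ssvt_flags.clear()
        self.work.clear()
```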

Dispatcher Queue Structure
The dispatcher queues are double headed and double threaded. Each PCE (as shown
in Figure 5-47) has a chain field to the following PCE entry and one to the
preceding entry on the queue. In the special case of the first PCE (referred to as
PCE zero), the preceding entry field points to the queue headers, offset so that the
queue header appears to be a PCE itself. The last PCE has a following entry field
which points back to the queue header. The queue header itself is double, with
pointers to the first and last PCE's in the chain. An empty queue has both queue
header fields pointing to itself, offset to appear as PCE zero. A PCE that is not on
a queue has both its preceding and following entry fields pointing to its origin.
In addition to the chain fields, each PCE has a PCEEWF field, which contains
information about the type of event the processor is waiting for. Figure 5-48
provides an example of a dump of JES2 processor queue chains.
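The double-headed, double-threaded structure above can be sketched with the header doubling as "PCE zero", so an empty queue points at itself. The class and field names are invented; real PCEs use offset chain fields, not Python objects.

```python
# Sketch of the double-headed, double-threaded PCE queue (names invented).
# The header plays the role of PCE zero: an empty queue has both header
# fields pointing at the header itself, and an off-queue node points to
# itself.
class Node:
    def __init__(self, name=None):
        self.name = name
        self.next = self           # following-PCE chain field
        self.prev = self           # preceding-PCE chain field

class PceQueue:
    def __init__(self):
        self.head = Node("header") # stands in for the offset queue header

    def is_empty(self):
        return self.head.next is self.head

    def enqueue(self, node):       # add at the end of the chain
        last = self.head.prev
        last.next = node
        node.prev = last
        node.next = self.head
        self.head.prev = node

    def dequeue(self):             # remove the first node on the queue
        if self.is_empty():
            return None
        node = self.head.next
        self.head.next = node.next
        node.next.prev = self.head
        node.next = node.prev = node   # off-queue: points to itself
        return node
```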


[Figure 5-47, not reproduced here, shows the PCE chaining relationships: the
queue header in the HCT is offset so that it appears to be a PCE itself (PCE
zero); each PCE points to the following and preceding entries; an empty queue has
both header fields pointing back at the header; and a PCE not on any queue has
both chain fields pointing to itself.]

Figure 5-47. JES2 Processor Control Element Relationships

JES2 Error Services
The following routines make up the JES2 error services:
• Disastrous Error Routine
• JES2 ESTAE Routine
• Catastrophic Error Routine
• JES2 Exit Routine
• Input/Output Error-Logging Routine

Disastrous Error Routine
This routine is entered at entry point $DSTERR in HASPNUC whenever a physical
I/O error occurs, or whenever a logical error is detected when reading a job control
table (JCT) or an input/output table (IOT). The symbol and module names are
moved into the message from the $DISTERR macro expansion. A $WTO is issued to
notify the operator of the error, and control is returned to the calling processor.
The message to the operator is as follows:
$HASP096 DISASTROUS ERROR AT SYMBOL symbol IN MODULE module


[Figure 5-48, not reproduced here, is an example dump of the JES2 processor
queue chains, showing a processor queue ($EWQ1) and a JES2 processor control
element. The annotations on the dump note: a queue is empty if its first and last
pointers point to the queue address minus 48 (hex); PCEPCEA points to the next
PCE waiting on the queue; PCEPCEB points to the processor queue address minus 48
if this is the first PCE on the queue; PCENEXT points to the next PCE in the
chain of all PCEs; register 15 in the PCE is the resume point when the processor
is dispatched; and a PCEID of 8107 identifies a local printer.]

Figure 5-48. Example Dump of JES2 Processor Queue Chains
JES2 should be quiesced and restarted as soon as practicable in order to
recover any direct-access space that might have been lost as a result of the error.

JES2 ESTAE Routine
This routine is entered at entry point $ABEND in HASPNUC whenever JES2
abends for any reason. The catastrophic error routine is called with an error code
of ABND, and control is passed to the JES2 exit routine.

Catastrophic Error Routine
This routine is entered at entry point $ERRORTN in HASPNUC whenever an
unrecoverable error is discovered by JES2. Register 0 contains the address of a
three- or four-character left-justified error code. The four-byte error code field
is moved into the operator message, which is then written on the operator's
console (using a WTO macro instruction) as follows:
$HASP095 JES2 SYSTEM CATASTROPHIC ERROR. CODE=code.
Control is then passed to the JES2 exit routine. The error codes and their
meanings are listed in OS/VS Message Library: VS2 System Codes.

JES2 Exit Routine
This routine is entered from the catastrophic error routine whenever JES2 is to
terminate under abnormal circumstances, and whenever a $P JES2 command is
successfully executed. When entered from the catastrophic error routine, the
following WTOR message is issued:
$HASP098 ENTER TERMINATION OPTION
The routine waits for the operator to respond with one of the following
replies:
• EXIT
• PURG
• DUMP text

If the reply is EXIT, the subsystem vector table (SSVT) termination complete
flag and the $SVPOSTP byte are set on. Control is returned to the system: in the
case of JES2 error detection, by an SVC 3 instruction with register 15 set to 24;
or in the case of a JES2 task abend, by a branch to the location in register 14.


If the reply is PURG, the routine attempts to clean up commonly addressable
control blocks. If the subsystem is the primary subsystem: the UCB attention
index values are set to zero; tasks waiting for CANCEL/STATUS, process-SYSOUT,
and storage cell expansion queues are posted; a system management facility
(SMF) record may optionally be written; and JES2 subtasks are terminated and
detached. Control is then returned to the system as with the EXIT option.
If the reply is DUMP, a $DUMP macro instruction is executed with the text
(if any) used as the header. Processing continues as with the PURG reply.

If entry to the routine is through the normal execution of the $P JES2
command, processing is the same as with the PURG option for abnormal
terminations, except that control is returned to the system by an SVC 3 instruction with register 15 set to zero.

Input/Output Error Logging Routine
This routine is entered whenever an unrecoverable input/output error occurs on a
JES2 spooling volume, or whenever a line error occurs which may require the
attention of the operator. A message to the operator is generated as follows:
• The channel status, channel command code, sense information, track address,
and line status are retrieved from the lOB (pointed to by register 1) and
formatted.
• The unit address and volume serial are obtained from the UCB.
• The device name (if applicable) is acquired from the device control table
(DCT).
The format of the message is described in OS/VS Message Library:
VS2 System Codes.

JES2 $DEBUG Functions in a Multi-Access Spool Configuration
JES2 systems in a multi-access spool configuration share a single job queue, job
output table, master track group map, and remote message spooling queues, all
of which are kept on the JES2 checkpoint record. In addition, the checkpoint
record contains shared system queue elements (QSEs) and other miscellaneous
information needed for inter-system control. The checkpoint record is allocated
to one processor at a time. Access to any part is controlled by a system's JES2
checkpoint processor, found in the HASPMISC module. The processor has four
major sections:
• Initialization
• Read
• Write
• Release



Initialization
Initialization is executed once, when the processor is first activated by the JES2
dispatcher. If the debug option has been selected (&DEBUG=YES), storage is
obtained, if possible, for debug copies of the job queue and the job output table
(JOT). If sufficient storage is not available, message $HASP452 is issued and
processing continues.

Read

This is executed at the beginning of each shared queue ownership period by systems
in a multi-access spool configuration. If parameter &DEBUG=YES, the job queue
and JOT areas are compared with copies saved just prior to the last checkpoint
write. A mismatch indicates an invalid alteration to these shared queue areas and
JES2 is terminated with a K01 catastrophic error. This step is skipped for
the first read following initialization. All checkpoint records are read from DASD.
A lockout warning timer (parameter &WARNTIM) is started and the read is started
by $EXCP. IOS performs the actual hardware reserve of the checkpoint device.
The processor then waits for read completion or timer expiration.
If the timer expires before read completion, warning message $HASP260 is
issued and the warning timer is restarted. The message is issued repeatedly, at
warning intervals, until read completion occurs.
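The wait-with-repeated-warnings pattern can be sketched abstractly. The function, its parameters, and the simulated clock below are invented for illustration; the real processor waits on I/O completion or the &WARNTIM timer, not a polling loop.

```python
# Illustrative sketch of the lockout-warning pattern: wait for a
# completion condition, and at each warning interval count one
# $HASP260-style warning until the condition is met. (All names and
# the polling structure are invented stand-ins for the timer logic.)
def wait_with_warnings(done, tick, warn_interval, max_ticks):
    """done: callable -> bool; tick: advances simulated time one unit.
    Returns how many warnings would have been issued."""
    warnings = 0
    elapsed = 0
    while not done() and elapsed < max_ticks:
        tick()
        elapsed += 1
        if elapsed % warn_interval == 0:
            warnings += 1      # the warning message would be issued here
    return warnings
```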

If a permanent read error occurs, JES2 is terminated with a K02 catastrophic
error.
After a successful read completion, the time stamps in this system's QSE, and in
any QSE for which the $ESYS command had been entered, are compared with a
time stamp saved in the HCT. A mismatch indicates that another system has
illegally taken ownership of a QSE owned by this system, or that the reserve
mechanism (hardware or IOS) has failed to prevent simultaneous access to the
multi-access spool checkpoint records. JES2 is terminated with a K03 catastrophic
error.

Write
This is executed repeatedly as a loop, in response to various requests by other
processors or timers. In a multi-access spool environment, the loop operates only
during an ownership or hold period.



If parameter &DEBUG=YES, the saved copies of the job queue and JOT areas
are compared with the current job queue and JOT areas. If a record has been
modified, but its corresponding checkpoint control byte does not indicate so,
JES2 is terminated with a K05 abend. A K05 abend indicates failure to issue a
$QCKPT or $#CKPT macro prior to executing a $WAIT macro, after modifying
the job queue or JOT.
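The write-time debug check can be sketched as follows. The function and data shapes are invented placeholders; the real check compares checkpoint records against their saved copies and control bytes.

```python
# Hedged sketch of the &DEBUG=YES write-time check (structures
# invented): any record that differs from its saved copy must have its
# checkpoint control byte set; an unflagged change is the condition
# described as driving the K05 abend.
def find_unflagged_changes(saved, current, ckpt_flags):
    """Return indices of records changed without their control byte set."""
    bad = []
    for i, (old, new) in enumerate(zip(saved, current)):
        if old != new and not ckpt_flags[i]:
            bad.append(i)
    return bad
```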
The current hardware time-of-day (TOD) clock is recorded in the HCT, along
with this system's QSE and a QSE for which any $ESYS command had been
entered. In multi-access spool systems, these stamps are verified following the
next read operation to ensure the integrity of QSE ownership.
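The stamp verification can be sketched as a simple comparison of saved and re-read values. The function and field names are invented; the real check compares TOD clock stamps recorded in the HCT against the QSEs read back from the checkpoint.

```python
# Sketch of the QSE ownership check (names invented): before the write,
# a time stamp is saved for each QSE this system owns; after the next
# read, any stamp that no longer matches means another system has
# overwritten a QSE it should not own.
def verify_ownership(saved_stamps, read_stamps):
    """Return the QSE ids whose stamps changed while we held ownership."""
    return [qse for qse, stamp in saved_stamps.items()
            if read_stamps.get(qse) != stamp]
```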

If parameter &DEBUG=YES, copies of the job queue and JOT areas to be
written are saved. In multi-access spool systems, these are used prior to the
next read operation to detect invalid updates to shared but not owned information.
The write is started by $EXCP and the processor waits for completion. If a
permanent error occurs, JES2 is terminated with a K04 catastrophic error.
Following a successful completion, the $CKPTACT bit in the HCT is cleared.

Release
Release is executed only by systems in a multi-access spool configuration, at the
end of each shared queue ownership period.

Miscellaneous Hints on JES2
Starting JES2 - Enqueue Wait on STCQUE
The installation can choose the option to manually start JES2 by changing
MSTRJCL with AMASPZAP. When MSTRJCL is changed, JES2 parameters are
entered on an operator-issued START command (that must be issued before MVS
processing can occur). If the operator misspells JES2 on the START command
(such as entering JES), a wait occurs with no indication other than STC
(IEESB605) being exclusively enqueued on SYSIEFSD STCQUE behind
IEEVWAIT, which does not release the resource until JES2 is initialized. Also note
that the CSCB (command scheduling control block) pointed to by the parameter list to
IEFJSWT is not formatted in the dump.
Therefore, if you encounter an enqueue wait on STCQUE, check that the
START command is entered correctly. This also holds true for any normally
started tasks (such as mounts or installation started tasks) which cannot be started
until the primary job entry subsystem is started.


JES2 (continued)

Organizational

  SSVT    Subsystem Vector Table
          Storage: CSA, subpool 241
          Use: Contains system pointers and parameters for HASPSSSM
          interface routines.

  HCT     HASP Control Table
          Storage: JES2 address space, HASPNUC module
          Use: Major directory for HASJES20. Contains queue headers,
          control block pointers, module and entry pointers, and system
          parameters.

Processor Management

  PCE     Processor Control Element
          Storage: JES2 address space
          Use: Unit of the JES2 dispatcher. Has associated work space
          and save areas for JES2 processors.

Buffer Management

  BUFFER  Buffer
          Storage: JES2 address space
          Use: Basic building block for JES2 control blocks (JCT, IOT,
          special).

Job Management (Transient)

  JCT     Job Control Table
          Storage: SYS1.HASPACE; user address space during XEQ
          Use: Primary job-oriented control block. Contains accounting
          information and pointers to other job information.

  IOT     Input Output Table
          Storage: SYS1.HASPACE; user address space during XEQ
          Use: Contains job DASD information and PDDBs for input/output
          data sets.

  PDDB    Peripheral Data Definition Block
          Storage: SYS1.HASPACE; user address space during XEQ
          Use: Describes a job input or output data set.

  OCT     Output Control Table
          Storage: SYS1.HASPACE; user address space during XEQ
          Use: Contains output control records (OCBs) to describe data
          output records (forms, route, etc.).

Job Management (Resident)

  JQE     Job Queue Element
          Storage: in JQB (JES2 address space)
          Use: Represents a job in process. Resides on the appropriate
          job queue chain.

  JQB     Job Queue Buffer
          Storage: SYS1.HASPCKPT and JES2 address space
          Use: Contains the job queue chain headers, JQEs, and I/O
          parameters for checkpointing.

  JOT     Job Output Table
          Storage: SYS1.HASPCKPT and JES2 address space
          Use: Central control block for all JES2 output. Contains
          three kinds of JOEs.

  JOE     Job Output Element
          Storage: in JOT (JES2 address space)
          Use: Represents an output data set by units of work,
          characteristics of the data set, and class of output.

Job Management (Miscellaneous)

  SJB     Subsystem Job Block
          Storage: CSA, subpool 231
          Use: Represents a job in process to OS/VS2; used by HASPSSSM
          interface routines.

  SDB     Subsystem Data Block
          Storage: user address space, subpool 229
          Use: Used by HASPSSSM to control processing of a data set
          using the HASP access method (HAM).

  PIT     Partition Information Table
          Storage: CSA
          Use: Completely describes a JES2 logical partition, its job
          classes, and its current state.

  CAT     Class Attribute Table
          Storage: JES2 address space
          Use: Describes the attributes of a job class.

  SCAT    SYSOUT Class Attribute Table
          Storage: in SSVT space
          Use: Describes output classes by print, punch, plot, etc.
          characteristics.

Unit Management

  DCT     Device Control Table
          Storage: JES2 address space
          Use: Represents a unit record device or RJE line. Contains
          all the information necessary to set up EXCP.

  RAT     Remote Attribute Table
          Storage: JES2 address space
          Use: Consists of one entry per remote device, containing the
          attributes of the device.

Multi-System Management

  QSE     Shared Queue Control Element
          Storage: SYS1.HASPCKPT and JES2 address space (JQB)
          Use: One per system of a multi-access spool environment,
          containing identification and cross-system communication
          parameters.

Figure 5-49. Major JES2 Control Blocks


Subsystem Interface (SSI)

In the course of HASP and ASP installation, hooks were put into the OS and SVS
operating systems to establish an interface. With the job entry subsystem (JES),
an interface was designed to eliminate the need for these hooks.
The subsystem interface (SSI) is primarily used to communicate with the job
entry subsystem (either JES2 or JES3), but is flexible enough to communicate
with any subsystem.

System Initialization Processing
At SYSGEN, the names of the primary job entry subsystem and of any secondary
subsystems are listed on the SCHEDULR macro and put in the job entry subsystem
names table (CSECT IEFJESNT). Alternatively, secondary subsystems may be
specified in the subsystem names table in load module IEFJSSNT.
The master scheduler base initialization module (IEEVIPL) gives control to the
subsystem interface initialization module (IEFJSINT). This module builds a
subsystem communication vector table (SSCVT) for each unique name in the JES
name table and in the subsystem names table. The SSCVTs are chained together
with the primary job entry subsystem SSCVT first, and the master subsystem
SSCVT second. These are followed by (in the same order as in their respective
tables) the SSCVTs in the JES names table and in the subsystem names table.
IEFJSINT also puts the name of the primary job entry subsystem in the JES
control table (JESCT). The subsystem vector table (SSVT) for the master
subsystem is built and initialized. The subsystem allocation sequence table (SAST)
is built for later use in allocating subsystem data sets. IEFJSINT then returns
control to IEEVIPL.
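The chaining order that IEFJSINT builds can be modeled with a small sketch. This is plain Python used as pseudocode; the list structure, function name, and the sample subsystem names are illustrative assumptions, not the actual IEFJSINT data structures:

```python
# Illustrative model of the SSCVT chain order built by IEFJSINT:
# the primary JES first, the master subsystem (MSTR) second, then the
# remaining entries from the JES names table and the subsystem names
# table, in the order they appear in their respective tables.

def build_sscvt_chain(primary_jes, jes_names_table, subsystem_names_table):
    chain = [primary_jes, "MSTR"]
    for name in jes_names_table + subsystem_names_table:
        if name not in chain:          # one SSCVT per unique name
            chain.append(name)
    return chain

# Hypothetical tables: JES2 is primary, JES3 is a secondary JES,
# IRLM is a secondary subsystem from IEFJSSNT.
chain = build_sscvt_chain("JES2", ["JES2", "JES3"], ["IRLM"])
```

Note that the primary JES appears only once even though it is also listed in the JES names table, matching the "each unique name" rule in the text.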
The job entry subsystem builds and initializes its own SSVT when the system
is initialized. All other subsystems must do likewise. A subsystem can be
initialized as follows:
• By being started (for example: START JES2) or,
• By having an initialization routine specified in the subsystem names table.
Additional subsystem initialization processing is performed by module IEEMB860.
This module has two subsystem initialization functions. The first is to issue
operator messages for errors that occurred during subsystem interface initialization;
the second is to LINK to the subsystem initialization routines specified in
IEFJSSNT.


Operator messages are issued by IEEMB860 rather than IEFJSINT because
IEFJSINT executes before the communications task has been initialized. Message
IEE730I is issued to indicate that a duplicate subsystem has been specified in the
subsystem names table. A subsystem is a duplicate if it is:
• A respecification of the primary job entry subsystem
• A respecification of the MSTR subsystem, or
• A respecification of a subsystem that has been initialized previously.
Message IEE858I is issued if the subsystem names table (IEFJSSNT) could not
be found; message IEE859I is issued for each subsystem initialization routine
which could not be found. It is the responsibility of the subsystem initialization
routine to inform the operator of, and recover from, errors in that routine. If the
subsystem initialization routine fails to recover from these errors, the next entry
in IEFJSSNT is processed and the failing subsystem may not be completely
initialized.

Subsystem Interface Major Control Blocks
Subsystem interface's major control blocks are the JES control table (JESCT),
the subsystem communications vector table (SSCVT), the subsystem vector table
(SSVT), the subsystem information block (SSIB), the subsystem options block
(SSOB), and the extension to the SSOB, or function dependent area. The
following table summarizes each of these control blocks, which are described in the
Debugging Handbook.


Control block: JESCT
  Created by: SYSGEN   Key: 0   Subpool: nucleus   Size: 44 bytes
  Pointed to by: CVT   Mapping macro: IEFJESCT
  Function: Contains information needed by the subsystem interface and
  addresses of scheduler routines.

Control block: SAST
  Created by: IEFJSINT   Key: 0   Subpool: 241   Size: Note 1
  Pointed to by: JESCT   Mapping macro: IEFJSAST
  Function: Defines the order in which subsystems will be invoked to
  allocate subsystem data sets.

Control block: SSCVT
  Created by: IEFJSINT   Key: 0   Subpool: 241   Size: 24 bytes
  Pointed to by: JESCT   Mapping macro: IEFJSCVT
  Function: Identifies each subsystem defined to the system and points
  to the SSVT for each subsystem.

Control block: SSVT
  Created by: the subsystem owning the SSVT, at initialization of the
  subsystem   Key: any   Subpool: any (determined by the subsystem)
  Size: Note 2   Pointed to by: SSCVT   Mapping macro: IEFJSSVT
  Function: Contains the indications of the functions of a subsystem
  and the addresses of the routines that perform those functions.

Control block: SSIB
  Created by: the user of the subsystem interface   Key: any
  Subpool: user's subpool   Size: 36 bytes
  Pointed to by: SSOB, JSCB   Mapping macro: IEFJSSIB
  Function: Identifies the subsystem to the subsystem interface and
  passes information between the subsystem and its caller.

Control block: SSOB
  Created by: the user of the subsystem interface   Key: any
  Subpool: user's subpool   Size: 20 bytes
  Pointed to by: SSWA, IEL   Mapping macro: IEFJSSOB
  Function: The parameter list for the subsystem interface.

Control block: Function dependent area
  Created by: the user of the subsystem interface   Key: any
  Subpool: user's subpool   Size: variable
  Pointed to by: SSOB   Mapping macro: IEFJSSOB
  Function: Passes information to the function of the subsystem the
  user wishes to invoke.

Notes:
1. The SAST size is 8 bytes plus 12 * (the number of subsystems in the SSCVT chain).
2. The SSVT size is 260 bytes plus 4 * (the number of functions supported by the subsystem).
Minimum size is 264 bytes; maximum size is 1284 bytes.
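Notes 1 and 2 give the sizes as formulas; a quick sketch (Python, purely illustrative) shows the arithmetic:

```python
def sast_size(num_subsystems):
    # Note 1: 8 bytes plus 12 * (number of subsystems in the SSCVT chain)
    return 8 + 12 * num_subsystems

def ssvt_size(num_functions):
    # Note 2: 260 bytes plus 4 * (number of functions supported by the
    # subsystem); minimum 264 bytes, maximum 1284 bytes (256 words)
    return max(264, min(1284, 260 + 4 * num_functions))
```

For example, a chain of three subsystems gives a 44-byte SAST, and a subsystem supporting a single function gets the minimum 264-byte SSVT.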

Control Block Usage is shown in Figures 5-50 and 5-51.


Figure 5-50. Subsystem Interface Control Block Usage
(The figure shows location X'10' pointing to the CVT; CVT field CVTJESCT
at X'128' points to the JESCT, which contains JESSSREQ at X'14',
JESSSCT at X'18', JESPJESN at X'1C', and JESSASTA at X'30'. JESSSCT
locates the SSCVT chain: each SSCVT contains 'SSCT' at X'0', SSCTSCTA
at X'4' pointing to the next SSCVT, SSCTSNAM at X'8', and SSCTSSVT at
X'10' pointing to the subsystem's SSVT. The SSVT contains the 256-byte
function matrix SSVTCOD followed, at X'104', by the function pointer
matrix SSVTRTN, which can be a maximum of 256 words.)


Requesting Subsystem Services
To request subsystem services, a system routine enters the correct function code
(see "Subsystem Interface Summary" in OS/VS2 System Logic Library)
in the subsystem options block (SSOB), and the name of the desired subsystem in
the subsystem information block (SSIB). The IEFSSREQ macro is then issued,
causing control to pass to the subsystem interface routine IEFJSREQ. The
specified function code and subsystem name indicate to the interface routine
which subsystem routine is to receive control.

Invoking the Subsystem Interface
Storage is acquired for the SSIB, the SSOB, and, if required, the function
dependent area of the SSOB. The following entries are made in the SSOB header:

SSOBID   - 'SSOB'
SSOBLEN  - The length of the SSOB header.
SSOBFUNC - The function ID of the function to be invoked.
SSOBSSIB - The address of the SSIB, or zero. Zero means that the
           life-of-job SSIB is to be used. Its address is in the active
           JSCB, field JSCBSSIB. The request will thus be directed to
           the subsystem that started the initiator under which the
           job is running. (See Figure 5-52.)
SSOBINDV - The address of the function dependent area or, if not
           needed by the function, zero.

The following entries are made in the SSIB:

SSIBID   - 'SSIB'
SSIBLEN  - The length of the SSIB.
SSIBSSNM - The name of the subsystem to which the request is being made.
SSIBJBID
SSIBDEST - If the function requires these fields.

The entries made in the function dependent area are:

length   - The length of the function dependent area (first halfword).
*        - Any fields required by the function.
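The field settings above can be sketched as follows. Python dictionaries stand in for the control blocks here; the `build_request` helper, the sizing of the function dependent area, and the sample job name are illustrative assumptions, not the real control block layouts:

```python
def build_request(function_id, subsystem_name, func_area_fields=None):
    """Illustrative model of building an SSOB, SSIB, and optional
    function dependent area, mirroring the fields listed above."""
    func_area = None
    if func_area_fields is not None:
        # First "halfword" is the area length (hypothetical sizing)
        func_area = {"length": 4 + 2 * len(func_area_fields),
                     **func_area_fields}
    ssib = {"SSIBID": "SSIB", "SSIBLEN": 36, "SSIBSSNM": subsystem_name}
    ssob = {"SSOBID": "SSOB",
            "SSOBLEN": 20,              # length of the SSOB header
            "SSOBFUNC": function_id,    # function ID to be invoked
            "SSOBSSIB": ssib,           # zero would mean life-of-job SSIB
            "SSOBINDV": func_area}      # zero if the function needs none
    return ssob

# Hypothetical CANCEL-style request (function code 2) to JES2
req = build_request(2, "JES2", {"jobname": "PAYROLL"})
```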


Figure 5-51. Control Block Structure for Invoking Subsystem Interface
(Register 1 points to a one-word parameter list that points to the SSOB
header: 'SSOB' at X'0', SSOBLEN and SSOBFUNC at X'4', the SSIB pointer
at X'8', SSOBRETN at X'C', and SSOBINDV at X'10'. The SSIB contains
'SSIB', SSIBLEN, and SSIBSSNM; the function dependent area (SSOB
extension) has its length in the first halfword, with the remaining
contents depending on the type of function.)

Figure 5-52. Finding the SSIB for a Job When the SSOB Pointer is Zero
(TCB field TCBJSCB at X'B4' points to the JSCB; field JSCBSACT locates
the active JSCB, whose JSCBSSIB field points to the life-of-job SSIB
('SSIB', SSIBLEN, SSIBSSNM). Note: the active JSCB may be the same as
the JSCB addressed by TCBJSCB.)
Register 1 points to a one-word parameter list which points to the SSOB.
(See Figure 5-51).
Macro IEFSSREQ is invoked, which passes control to the routines that handle the
subsystem interface request. The communications vector table (CVT) and the
JES control table (JESCT) must be mapped if IEFSSREQ is invoked.
The subsystem interface returns a code in register 15. Possible return
codes are:

 0 - Successful completion; the request went to the subsystem
 4 - Subsystem does not support this function
 8 - Subsystem exists, but is not active
12 - Subsystem does not exist
16 - Function not completed; disastrous error
20 - Logical error (such as invalid SSOB format or incorrect length)

The field SSOBRETN in the SSOB contains a return code from the subsystem if
the request was successful. The return code depends on the function being invoked
(see the SSOB description in the Debugging Handbook).

Logic Flow Examples
This section provides an overall logic flow from a task making a request, through
the subsystem interface to the subsystem, and then back to the task. Two
examples are described.

Notifying a Single Subsystem
1. A task (TSO/cancel) wants to inform JES2 that a job is to be canceled.
2. The task creates an SSOB, an SSIB, and a function dependent area.
   a. The SSOB is filled in. A function code of 2 is used. (See OS/VS2 System
      Logic Library, Volume 3, for a complete function code list.)
   b. The SSIB is filled in. The subsystem name is JES2.
   c. The function dependent area is filled in with the information that
      the subsystem needs for this type of request.
3. Macro IEFSSREQ is invoked, which branches to module IEFJSREQ
   (IEFJSREQ's address is in the JESCT). Register 1 points to a parameter list
   which points to the SSOB.
4. IEFJSREQ checks:
   a. Are the pointers to the SSOB and SSIB valid? If not, return with a return
      code of 16 in register 15.
   b. Are the formats of the SSOB and SSIB correct? If not, return with a
      return code of 20 in register 15.
   c. Find the requested subsystem's SSCVT. If it is not found, return with a
      return code of 12 in register 15.
   d. Find the requested subsystem's SSVT. If it is not found, return with a
      return code of 8 in register 15.
   e. Is the requested function code valid? If not, return with a return code
      of 16 in register 15.
   f. Is the requested function code supported by the requested subsystem?
      If not, return with a return code of 4 in register 15.
   g. Index into the SSVT and get the address of the function routine.
   h. Branch to the function routine. Register 0 = address of the SSCVT;
      register 1 = address of the SSOB.
5. Module HASPSSSM at label HOSCANC receives control. It is the function
   routine for JES2 for function code 2 (CANCEL request).
   a. Process the request and place a return code in the SSOB (SSOBRETN).
   b. Return codes for this function code are as follows:
       0 - CANCEL completed.
       4 - Job name not found.
       8 - Invalid JOBNAME/JOB ID combination.
      12 - Job not canceled; duplicate jobnames and no job ID given.
      20 - Job not canceled; job is on the output queue.
      24 - Job ID with invalid syntax for subsystem.
      28 - Invalid CANCEL request; cannot cancel an active TSO user or a
           started task.
6. Control is then returned to the requesting task directly from the function
   routine. The task then examines register 15 and SSOBRETN and acts
   accordingly.
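The checks in step 4 amount to a short validation-and-dispatch sequence. A sketch follows (Python; the dictionaries standing in for the SSCVT chain and SSVT, the `jsreq` and `cancel_routine` names, and the assumption that function codes run from 1 to 256 are all illustrative, not the real IEFJSREQ implementation):

```python
def jsreq(ssob, ssib, sscvts, max_func=256):
    """Model of IEFJSREQ's checks; returns the register 15 return code."""
    if ssob is None or ssib is None:
        return 16                      # invalid SSOB/SSIB pointer
    if ssob.get("SSOBID") != "SSOB" or ssib.get("SSIBID") != "SSIB":
        return 20                      # format (logical) error
    sscvt = sscvts.get(ssib["SSIBSSNM"])
    if sscvt is None:
        return 12                      # subsystem does not exist
    ssvt = sscvt.get("SSVT")
    if ssvt is None:
        return 8                       # subsystem exists but is not active
    code = ssob["SSOBFUNC"]
    if not 1 <= code <= max_func:
        return 16                      # invalid function code
    routine = ssvt.get(code)
    if routine is None:
        return 4                       # function not supported
    routine(sscvt, ssob)               # R0 = SSCVT, R1 = SSOB
    return 0

def cancel_routine(sscvt, ssob):       # stands in for HASPSSSM's HOSCANC
    ssob["SSOBRETN"] = 0               # 0 = CANCEL completed

sscvts = {"JES2": {"SSVT": {2: cancel_routine}}}
ssob = {"SSOBID": "SSOB", "SSOBFUNC": 2}
ssib = {"SSIBID": "SSIB", "SSIBSSNM": "JES2"}
r15 = jsreq(ssob, ssib, sscvts)
```

Note how the interface result (register 15) and the subsystem result (SSOBRETN) are distinct, matching step 6.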

Notifying All Active Subsystems
1. A task wants to notify all active subsystems of a WTO message.
2. The task creates an SSOB and a function dependent area. No SSIB need be
   created if the task's life-of-job SSIB has the master subsystem's name (MSTR)
   in it. If it does not, and that SSIB is used, only one subsystem would be
   notified. The SSOB and the function dependent area are filled in. A function
   code of 9 is used. (A list of all function codes is in OS/VS2 System Logic
   Library, Volume 3.)


3. Macro IEFSSREQ is invoked, which branches to IEFJSREQ (its address is in the
   JESCT). Register 1 points to a parameter list which points to the SSOB.
4. IEFJSREQ checks:
   a. Are the pointers to the SSOB and SSIB valid? If not, return with a return
      code of 16 in register 15.
   b. Are the formats of the SSOB and SSIB correct? If not, return with a
      return code of 20 in register 15.
   c. Find the requested subsystem's SSCVT. If it is not found, return with a
      return code of 12 in register 15.
   d. Find the requested subsystem's SSVT. If it is not found, return with a
      return code of 8 in register 15.
   e. Is the requested function code valid? If not, return with a return code
      of 16 in register 15.
   f. Is the requested function code supported? If not, return with a return
      code of 4 in register 15.
   g. Index into the SSVT and get the address of the function routine.
   h. Branch to the function routine. Register 0 = address of the SSCVT;
      register 1 = address of the SSOB.
5. The SSIB points to the master subsystem, so module IEFJRASP is the function
   routine that receives control.
   a. IEFJRASP makes a copy of the SSIB.
   b. For each SSCVT, the name of the subsystem is copied into the SSIB copy.
      (The master subsystem's SSCVT is skipped.) IEFSSREQ is then invoked for
      each subsystem.
   c. The highest return code from the subsystems is placed in the requesting
      task's SSOB, and the lowest return code from the subsystem interface is
      put in register 15.
6. Control is then returned to the requesting task directly from the function
   routine. The task then examines register 15 and SSOBRETN and acts
   accordingly.
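Step 5's broadcast logic (highest subsystem return code kept in SSOBRETN, lowest interface return code in register 15) can be sketched as follows. This is an illustrative Python model of IEFJRASP, not its actual logic; the per-subsystem outcomes and the `interface_call` callback are assumptions standing in for the nested IEFSSREQ invocations:

```python
def broadcast(ssob, sscvts, interface_call):
    """Model of IEFJRASP: invoke every subsystem except MSTR with a
    copy of the SSIB, keeping the highest subsystem return code in
    SSOBRETN and returning the lowest interface return code."""
    highest_ssob_rc = 0
    lowest_r15 = None
    for name in sscvts:
        if name == "MSTR":             # the master subsystem is skipped
            continue
        ssib_copy = {"SSIBID": "SSIB", "SSIBSSNM": name}
        r15, ssob_rc = interface_call(ssob, ssib_copy)
        highest_ssob_rc = max(highest_ssob_rc, ssob_rc)
        lowest_r15 = r15 if lowest_r15 is None else min(lowest_r15, r15)
    ssob["SSOBRETN"] = highest_ssob_rc
    return 0 if lowest_r15 is None else lowest_r15

# Hypothetical per-subsystem outcomes: (interface RC, subsystem RC)
outcomes = {"JES2": (0, 4), "IRLM": (4, 0)}
ssob = {"SSOBFUNC": 9}                 # WTO broadcast function code
r15 = broadcast(ssob, {"MSTR": {}, "JES2": {}, "IRLM": {}},
                lambda ssob, ssib: outcomes[ssib["SSIBSSNM"]])
```

In this sample, one subsystem accepts the request while the other does not support the function, so SSOBRETN reflects the worst subsystem result while register 15 reflects the best interface result.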

Debugging Hints
• Paging must be possible at the time the subsystem interface is entered, since
  the code for the subsystem interface may not already be paged in at the time
  the call is made.
• For the same reason, the processor must not be physically disabled.
• The mapping macro IEFJSSVT maps the SSVT. Only the master subsystem's
  SSVT matches the mapping exactly. JES2 and JES3 SSVTs have additional
  material appended to the end of the area mapped by IEFJSSVT. For JES3,
  the mapping macro is IATYSVT. For the contents of the JES2 SSVT, refer to
  OS/VS2 JES2 Logic.


• Some functions requested at the master subsystem cause the function to be
  broadcast to every active subsystem. These function codes are:
   4 - Notify the subsystem of end-of-task.
   8 - Notify the subsystem of end-of-address space.
   9 - Notify the subsystem of a WTO message.
  10 - Notify the subsystem of an operator command.
  14 - Notify the subsystem of a delete operator message (DOM).
  32 - Notify the subsystem of a failing START command.
• Function code 9 is used in an SSOB with the pointer to the SSIB always zero.
  This causes the SSIB pointer in the JSCB to be used. If that SSIB is for the
  master subsystem, the request is given to every active subsystem. If the
  SSIB is not for the master subsystem, the request is given to only the
  subsystem named in the SSIB.
• If a subsystem verification request (function code 15) is made to the master
  subsystem (field SSIBSSNM in the SSIB contains 'MSTR'), and the name in
  the SSIBJBID field of the SSIB is not that of a job entry subsystem, then
  upon return from the subsystem interface, field SSIBSSNM will contain
  the name of the primary job entry subsystem. A job entry subsystem is
  defined as a subsystem that can provide its own sysout services. This is
  indicated by bit SSCTUPSS being off in the subsystem's SSCVT.


Recovery Termination Manager (RTM)

The recovery termination manager (RTM) cleans up system resources when a task
or address space terminates, either normally or abnormally.

Functional Description
Logically, RTM consists of four related processes:
1. RTM1 attempts recovery for software or hardware errors; it is entered via the
   CALLRTM macro instruction issued by supervisory routines. Functional
   recovery routines (FRRs) are processed in this logical phase.
2. RTM2 performs normal and abnormal task termination for both system and
   problem program routines. The ABEND macro (SVC 13) requests RTM2 services.
3. Address space termination provides normal and abnormal address space
   termination for supervisory routines. The CALLRTM macro instruction is used
   to request this function.
4. RTM support functions, such as error recording, formatting of dumps*, and
   creating recovery control blocks for error exit processing.

*Note: RTM generates an error id that ensures that information recorded in
SYS1.LOGREC concerning a problem can be readily correlated with SVC dump
information concerning the same problem. See "Error Id" later in this topic.

Work Areas
For details of RTM work areas see "Use of Recovery Work Areas for Problem
Solving" in Section 2.

Major RTM Modules
RTM1, which is part of the nucleus, comprises four modules:
1. IEAVTRT1 - RTM entry point processor
2. IEAVTRTM - RTM1 mainline
3. IEAVTRTS - system recovery manager
4. IEAVTRTR - RTM1 recovery routines


RTM2, which resides in the link pack area (LPA), is entered via SVC 13. The
mainline for RTM2 comprises the following three modules:
1. IEAVTRT2 - initialization
2. IEAVTRTC - controller
3. IEAVTRTE - exit handler

Other important RTM2 modules are:
• IEAVTAS1 - pre-exit processing
• IEAVTAS2 - post-exit processing
• IEAVTAS3 - control recovery
• IEAVTSKT - task termination purges
• IEAVTMRM - RTM2 resource manager
• IEAVTRML - installation resource manager list

Process Flow
The following charts depict the process flow for:
• Hardware error processing
• Normal end-of-task termination
• Abnormal end-of-task termination
• Retry
• Cancel
• Address-space termination

Hardware Error Processing
Depicted here is the processing for a hard machine check in a global routine
that has FRR recovery. It shows the interfaces and control flow between the
machine check handler and RTM1 for both hardware error processing and the
resulting software recovery attempt by the FRR. It indicates that software
recovery continues in task mode because, in this example, the FRR does not
recover the error.
The use of extended error descriptors (EEDs) allows the LOGREC buffer to be
available for further possible machine checks and is the mechanism for passing
information to RTM1 and RTM2. The information in the global system diagnostic
work area (SDWA) used by RTM1 recovery was obtained from the EEDs. RTM2
obtains an SDWA, but also uses the EEDs as its source of error data to be passed
to recovery routines.
RTM1 uses the RTM processor-related work save area (WSACRTMK) to alter
the registers and the PSW that MCH reloads, thereby determining whether MCH
resumes the interrupted process (soft error) or reenters RTM1 for software
recovery (hard error).


(The flow chart shows the processing for a machine check in a global
routine that has established an FRR. MCH, processing a storage check,
invokes RTM1 for software repair via CALLRTM TYPE=MACHCK. IEAVTRT1 sets
up the environment for the MACHCK entry, and IEAVTRTM calls IEAVTRTH,
the hardware error processor. IEAVTRTH preserves the hardware data from
the LOGREC buffer in EEDs (RTM's internal control blocks), along with
the registers and PSW at the time of the machine check; calls the
appropriate repair routine; records the hardware error to LOGREC;
establishes the environment for re-entry to RTM1 in WSACRTMK; and
returns to its caller (MCH) with a pointer to WSACRTMK. MCH loads the
registers and PSW from WSACRTMK, causing re-entry to RTM1 (type MACHCK
RE-ENTRY) for software recovery. IEAVTRT1 sets up the environment for
the MACHCK re-entry, and IEAVTRTS attempts system recovery, since the
error occurred in a global routine: it routes control to the FRR to
attempt recovery for the routine that suffered the machine check. In
this example the FRR percolates; RTM1 records the error in the SDWA,
returns with a continue-with-termination indicator, sets up the task
for entry to RTM2 by altering the RBOPSW (TCBRTM12 locates the EEDs),
and exits to the dispatcher. The task is eventually dispatched and
executes the SVC 13, which causes RTM2 task recovery/termination
services to be invoked.)


Normal Task Termination
EXIT and parts of RTM2 make up this function. The flow shows how EXIT is
entered and then reentered to complete task termination; it also provides a
perspective of RTM2 functions related to normal termination of a task.
(The flow chart shows normal task termination. A task issues SVC 3,
entering IGC003 (EXIT). EXIT determines the task's eligibility for
normal task termination (the exit was issued by the last RB, and TCBEOT
is zero) and issues SVC 13 to pass control to RTM2. The RTM2 work area
(RTM2WA) holds communications for processing within the RTM2 load
module. IEAVTRTC, finding no abnormal conditions to handle, passes
control to the task termination processor IEAVTSKT, which frees
resources by linking to RTM2 and user-defined resource managers (all
resource managers defined in IEAVTRML, plus the system resource
managers), passing the resource managers parameter list (RMPL). RTM2
then sets the PRB resume PSW to point to an SVC 3 instruction, sets the
end-of-task indicator in the TCB (TCBEOT), and indicates the proper
control flow in the RTM2 work area: if ASXBTCBS indicates one task is
left in the memory, address space termination is necessary, so the task
is set non-dispatchable and CALLRTM TYPE=MEMTERM is issued to schedule
an SRB that initiates address space termination processing; if only
normal task termination is needed, RTM2 branches to exit prologue to
get rid of the SVRB and frees the RTM2 work area. EXIT (IGC003) is then
reentered; since the end-of-task indicator (TCBEOT) has been set, it
BALRs to the resource managers for cleanup of the task (IEAVTSBP
dequeues and frees SCBs owned by the RB or task, IEAPPGMX frees
programs, IEAQSPET in IEAVGCAS frees storage, and IEAVEEDO frees TCB
and RB storage and dequeues the TCB). EXIT then either schedules the
end-of-task exit routine for the task or posts the mother task if it
was attached with the ECB operand, and exits to the dispatcher. Normal
task termination is then complete.)

Abnormal Task Termination
Shown here is the logic flow during abnormal termination of a non-critical nature.
If the error is not recoverable at a particular task level, that task and its subtasks
are removed. If the scope of the abend is "Step," then the entire job step is
removed. Optionally, serviceability information (dumps and software error
records) is supplied to the user.

(The flow chart shows abnormal task termination. On entry from ABEND,
RTM2 (IEAVTRT2) obtains, initializes, and queues the RTM2 work area and
saves a copy of the trace table. IEAVTRTC validity checks and processes
the dump options, and control is returned from IEAVTAS1 after exit
processing. IEAVTRTC then ABTERMs the subtasks, waits for the subtasks
in RTM2 to complete, sets the subtasks non-dispatchable, and purges the
resources through the resource managers. IEAVTRT2 updates the RB queue
for exit, and a normal exit is taken.)

Retry
Shown here is the flow through RTM2 when processing a potentially recoverable
error. The recovery exit is supplied environmental data that describes the error
(for example, completion code, register contents, PSW, and system state at the
time of error) to aid in diagnosing the error. To retry, the resume PSW in each
request block (RB) up to and including the retry RB is modified. The retry
address supplied by the exit is placed in the resume PSW field of the retrying RB,
and all RBs between the retry RB and the RTM2 RB have their resume PSW set to
either exit prologue or SVC 3. When RTM2 eventually returns to the system,
supervisor-assisted linkage causes the retry address in the retry RB to be given
control.
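The resume-PSW manipulation can be sketched as follows. Python is used as pseudocode here: the RB chain is a simple list ordered oldest first, and the string "EXIT-PROLOGUE" stands for the exit prologue/SVC 3 address; none of these names are the real control block fields:

```python
def set_up_retry(rb_chain, retry_index, retry_address):
    """Illustrative model of RTM2 retry setup: the retrying RB gets the
    exit-supplied retry address, and every RB between it and the RTM2
    RB (the last element) resumes at exit prologue or an SVC 3, so the
    stack unwinds down to the retry RB when RTM2 returns."""
    rb_chain[retry_index]["resume_psw"] = retry_address
    for rb in rb_chain[retry_index + 1:-1]:   # RBs between retry RB and RTM2 RB
        rb["resume_psw"] = "EXIT-PROLOGUE"
    return rb_chain

# Hypothetical chain: retrying PRB, an intervening SVRB, the RTM2 SVRB
chain = [{"resume_psw": "OLD"}, {"resume_psw": "OLD"}, {"resume_psw": "RTM2"}]
set_up_retry(chain, 0, "RETRYADDR")
```

When the supervisor later resumes each RB in turn, the intervening RBs simply exit, and control finally reaches the retry address in the retrying RB.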

(The flow chart shows the retry path through RTM2. IEAVTRTC processes
and validity checks the dump options, selects an exit (SCB), obtains
and initializes the SDWA, and performs I/O requests and blocks
asynchronous exits if requested. The user exit, given control via
IEAVTAS1, diagnoses the error and selects options; control returns from
IEAVTAS1 via IEAVTRTC to IEAVTAS2, which tracks the SDWA, records if
requested, and saves the dump options. IEAVTRT2 routes control to
IEAVTRTE, which frees the saved copy of the trace table if available
(RTM2TRTB10), frees the RTM2 work area, clears the TCB flags, and
branches to the exit prologue.)

Cancel
Shown here is the flow of control through RTM when a job is canceled. The
CANCEL request is indicated by specific completion codes set in the TCB by
RTM1 (code = 'x22'). The CANCEL process is distinctive in that it is considered
a strictly unrecoverable situation. Normal termination procedures are abandoned
in favor of creating an express path through termination. However, term exits
are given control.
(The flow chart shows CANCEL processing. RTM1 indicates the CANCEL
request, and IEAVTABD processes the EEDs and handles SDUMP/SLIP
considerations. IEAVTRTC sets the subtasks non-dispatchable and
processes the subtasks and the current task, setting abend bits,
halting I/O, and purging resources. It determines the type of dump
(SYSABEND or SYSMDUMP), processes the dump data set for the current
task and SNAPs, finds the daughters and SNAPs them if not SYSMDUMP, and
resets the TCB flags in the current task and daughters. IEAVTRTC
initiates term exit processing until all term exits have been entered.
Task termination (IEAVTSKT) is then initiated until each subtask has
exited: the deepest subtask is found and detached, and resources are
purged through the installation resource managers and the IBM resource
managers. The saved trace table is freed if available, and a normal
exit is taken through exit prologue (IEAVEEXP).)

FORCE Command
The FORCE command is designed to remove a job or TSO user from the system
after the CANCEL command has failed to do so. For example, a job is writing to a
DASD unit when the unit is suddenly made unavailable to the system; in this case,
the CANCEL command is frequently unable to remove the job. If CANCEL does
fail to remove the job, then FORCE can be used. However, FORCE does not use
normal termination or normal cleanup routines, and is intended to be used only as
an alternative to another IPL.
When FORCE is issued, the job's address space is terminated and any task
running in the address space is terminated. If a job is running under an initiator,
the initiator is also terminated.
FORCE processing is dependent on the recovery termination manager (RTM),
and on the command scheduling control block (CSCB), which contains a new bit
definition in CHAFORCE. When the FORCE command is entered on a console
having system authority, control is given to the CANCEL-FORCE processor, which
verifies that the command syntax is correct. The processor then scans the CSCB
chain to see if the job exists and is cancelable. A bit in the job's CSCB is then
checked to see if a CANCEL has been issued for this job. If not, a call is made to
the message module to issue message IEE838I - 'CANCELABLE - ISSUE CANCEL
BEFORE FORCE', and control is returned to the system. If CANCEL has been
issued, a CALLRTM TYPE=MEMTERM is issued. The message module is called to
issue message IEE301I - 'FORCE COMMAND ACCEPTED', and control is then
returned to the system. If an error is found in the command syntax, or the job was
either not found or was non-cancelable, the message module issues an appropriate
message and control is returned to the system.
The FORCE processor uses the current CANCEL serialization code. The CSCB
chain resource is serialized via ENQUEUE on SYSIEFSD Q10. Because the holder
of CSCB Q10 must be non-swappable while holding the resource, the FORCE
command processor issues a SYSEVENT DONTSWAP before issuing the
ENQUEUE on Q10.
For additional information concerning the use of FORCE, see Operator's
Library: OS/VS2 MVS System Commands.
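The decision sequence above (CSCB scan, cancelability check, prior-CANCEL check) can be sketched as follows. This is an illustrative Python model only: the CSCB is a dictionary, the job names are hypothetical, and message IEE838I is assumed for the garbled message number in the text:

```python
def force_command(jobname, cscb_chain):
    """Model of the CANCEL-FORCE processor's decision logic for FORCE."""
    cscb = cscb_chain.get(jobname)
    if cscb is None or not cscb.get("cancelable", False):
        return "error message"      # job not found or non-cancelable
    if not cscb.get("cancel_issued", False):
        # IEE838I 'CANCELABLE - ISSUE CANCEL BEFORE FORCE' (assumed ID)
        return "IEE838I"
    # Here the real processor issues CALLRTM TYPE=MEMTERM
    return "IEE301I"                # 'FORCE COMMAND ACCEPTED'

chain = {"PAYROLL": {"cancelable": True, "cancel_issued": True},
         "TSOU1":   {"cancelable": True, "cancel_issued": False}}
```

The two-step protocol (CANCEL first, then FORCE) is what prevents FORCE from bypassing normal cleanup when CANCEL would have sufficed.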


Address-Space Termination
The process of terminating an address space (memory) cannot be isolated to one
task, module, or logical unit of code. The many parts of the process, depicting
control flow and data flow, are shown here.
BALR

Since the MEMTERM process circumvents all TASK recovery and
TASK resource manager processing,
its use is restricted to a select group
of routines which can determine that
task recovery and resource manager
clean up is either not warranted or
will not successfully operate in the
address space being terminated.
It therefore is restricted to the
following users:

ATMI

\/\ ,

IEAVTRTI

CALLRTM
TVPE=MEMTERM
ASIOm
COMPCOO-O (normal)
+0 (abnormal)

Global SRB dispatcher

Via branch table go to
'TVPE' processor.
TVPE=MEMTERM

Address space
termination SRB
Post RTCTMECB
This activates the
address space
termlntlon task In
the master address
space.

IEAVTRTM
Put the ASCB of the
address space to be
termi nated on the address
apace queue.

1)

2)

3)

4)

5)

6)

Paging supervisor when It de·
termlnel that It cannot swap In the
LSQA for an address space.
Address sPace create when It
determines that an address space
cannot be initialized.
The RTM or the supervisor can·
trol F R R when they determine
that uncorrectable tranSlation
errors are occurring In·the
address space.
The RTM2 when It determines
that task recovery and termination
cannot take place in the current
address space:
The RCT when it determines that
the address space is permanently
deadlocked.

The RTM2 when all tasks in the
address space have terminated
(IEAVTRTE). This is the only
reQuestor of normal address sPace

termination (that

05

2

queue of

addr.., spec.(,)
to be terminated.

3

The auxiliary storage management
recovery routine when It suffers an
indeterminate error from which it
cannot recover I while handling a
swap-In or swap-out request.
8) The auxiliary storage management
recovery routine when It determines that uncorrectable transla"
tlon errors are occurring while ASM
is uSing the control register of
another address space to uPdate the
address space's LSQA.
7)

9)

SVC

34

Store the completion
code in tha ASCB with
matching ASIC (or
current).

Dispatcher
(IEAVEDSO)

Schedule the SRB to post
the address space termination task in the master

address space (use of
the SR B routine is
serialized by compare and
swap).
IEAVTRTI
Return to the caller.

in r.spons. to a FORCE

command.
10)

COMPCOD=O).

VTIOC In response to an FSTOP
rePly.

Step 1
Step 2
Steps 3.4
Steps 5.6.7

Note: Since callers 4. 5, and 6 above are task-related and running in the
address space to be terminated, they will set themselves non-dlspatchable
after Issuance of CALLRTM.

Identify the requesters
The request format
Initiate the request
Process the request

RTCT

I

RTCTFASB
ASCBQptr

L~

~.7 ASCB

I
~________~:A:S:ID::::~'

/

, next ASCB

~

Reside~ess

~

r-'-"\

r-------,
f~t!~~~~i.

I

I

I

2

I

3

I
I

4

Get the local lock _
patcher lock.

5

Free locks.

L _ _ _ _ _ --.J

6

Can SVC I/O purge.
Purge 110 for that
address spee•.

7

CaU RSM (real .torage
management)
to free aU real and
au~iliary storage.

poSted for work.

I

CMS lock _

dis-

J..

MEMTERM
OPtions

I
I

Address space
terminator processor taSk

~

IEAVTMTR

~ 1
2

Set th.e address Ipace Indicator by ASCB
non-dilpatchable.

'I :~~~tl~;aSt~~~d~tler L..J
IPLl. It remains I
I inactive until
I
I

I

to dequeued ASCBl

Dequeue the last ASCB on the address space
termination queue (QUEUE MOOIFICATION SERIALIZED via compare and swap.)

I

I

CD rt

to dequeued
ASCB

Rt

I~
I~
'~_1__________I_EA_V_T_M_T_C____________-4

t:::::::::J

ReSIdent task

It

Rt

RTCTMECB

space termination
controller task in master address space

I
~
I

RO

POST

3

MP-W.lt for task and SRB activity for this
addreilipace to stop In other procellor.

4

Set RO to point to this
terminating address
space's ASCB.
I ndicete the
MEMTERM options
in RI.
Issue SVC 13 - to
invoke the services of
RTM2.

RTM2

SVC 13

..

.

Perform
address
space
purges

Return

to caller.

EXIT to the dispatcher.

l

~
~IGCOOOIF

SR 14

~
~IEAVTERM

8

Atteeh a subtask to handle remainder of
purges for the address space (pass ASCB
in Rl).

9

If the address space termination ASCB
queue pointer is not zero, do processing
steps

o

to

(£)

-

A TTACH

Legend:
for the n•• t ASCB.

Otherwlle, the talk waltl for work (walt on

RTCTMECBl.

______ Pointer

~ Control

flow

~Dataflow
WAIT
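The queue-and-post pattern at the heart of MEMTERM initiation can be sketched as follows. This is a simplified model, not the IEAVTRTI/IEAVTMTR code: a `threading.Lock` stands in for the compare-and-swap serialization, and a `threading.Event` stands in for the RTCTMECB post.

```python
import threading

# Sketch of the MEMTERM queuing pattern: requestors push ASCBs onto a
# termination queue and post an ECB; a single task in the master address
# space drains the queue.

class MemtermQueue:
    def __init__(self):
        self._lock = threading.Lock()
        self._queue = []              # ASCBs awaiting termination
        self.ecb = threading.Event()  # stands in for RTCTMECB

    def callrtm_memterm(self, ascb, compcod):
        ascb["compcod"] = compcod     # completion code stored in the ASCB
        with self._lock:              # serialize queue modification
            self._queue.append(ascb)
        self.ecb.set()                # "post" the termination task

    def dequeue_last(self):
        """The termination task removes the last ASCB queued, as step 5 describes."""
        with self._lock:
            return self._queue.pop() if self._queue else None
```

Serializing the queue and funneling all processing through one task in the master address space is what lets termination proceed even when the dying address space itself is unusable.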



Error ID
Error ID ensures that problem information recorded in SYS1.LOGREC can be
easily correlated with SVC dump information concerning the same problem. The
error id function is invoked whenever the RTM is entered to process an error
condition. The RTM determines if the entry is to process a recursive or a new
error, a new error being one unrelated to a previous error.

If an error occurs during the processing of a previous error, the error id has the
same sequence number as the original error, but is given a new time stamp. In this
way, the sequence numbers show that the errors are related, and the time stamps
show the history of error processing.
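The sequence-number rule described above can be sketched in Python. This is an illustrative model of the rule only (the real error id also carries CPU and ASID fields); the injectable `clock` parameter is a testing convenience, not part of the original design.

```python
import itertools
import time

# Sketch of the error-id rule: a new, unrelated error gets a fresh sequence
# number, while a recursive error reuses the original sequence number but
# receives a new time stamp.

_seq = itertools.count(1)

def new_error_id(related_to=None, clock=time.time):
    if related_to is None:
        seq = next(_seq)          # new error: fresh sequence number
    else:
        seq = related_to["seq"]   # recursion: keep the original sequence number
    return {"seq": seq, "time": clock()}
```

Matching sequence numbers thus group related errors, while the time stamps order them into a history of the recovery attempts.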
The RTM generates an entirely new error id:
1. Upon entry to RTM1 for a machine-check error (module IEAVTRTH).
2. Upon entry to RTM1 in SLIH mode for non-recursive error processing.
3. Upon entry to RTM2 when there has been no previous error processing in
RTM1. Control passed to RTM2 by RTM1 does not result in a new error id if
RTM1 has already generated one.
RTM1 maintains the error id in either the SDWA or an EED. RTM2 maintains
the error id in its work area, RTM2WA. At an appropriate point in the error
processing, the error id is moved to the SDUMP work area (pointed to by
RTCTSDWK), where it is stored until processed by SDUMP. The correct error id is
passed to SYS1.LOGREC when a software or hard machine check error record is
written by RTM. Soft machine check error records do not contain an error id
because no subsequent software recovery takes place following a "soft" error.
IFCEREP1 recognizes and prints the error id in the LOGREC software or machine
check record. AMDPRDMP recognizes and prints the error id as part of the header
information for an SVC dump, formatted as follows:
ERRORID FOR THIS DUMP = SEQyyyyy CPUzz ASIDaaaa TIMEhh.mm.ss.t

where:
yyyyy       represents the sequence number of the error id
zz          represents the error id logical processor
aaaa        represents the error id ASID
hh.mm.ss.t  represents the error id time stamp converted to read as hours,
            minutes, seconds, and tenths of seconds

If an error id is not available for a dump (indicated to AMDPRDMP by zeroes in
the error id header field), the message "NO ERRORID ASSOCIATED WITH THIS
DUMP" is printed where the error id is normally found.
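When post-processing formatted dump output, the header line above can be picked apart mechanically. The following sketch parses the AMDPRDMP format shown; the field widths and the regular expression are inferred from the text in this section, not taken from IBM documentation.

```python
import re

# Sketch: parse the AMDPRDMP error-id header line into its fields.

def parse_errorid(line):
    m = re.match(
        r"ERRORID FOR THIS DUMP = SEQ(\w+) CPU(\w+) ASID(\w+) TIME([\d.]+)",
        line)
    if m is None:
        return None   # e.g. "NO ERRORID ASSOCIATED WITH THIS DUMP"
    seq, cpu, asid, ts = m.groups()
    return {"seq": seq, "cpu": cpu, "asid": asid, "time": ts}
```

A `None` result corresponds to the "no error id" case, where the header field contains zeroes and the substitute message is printed instead.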
To further increase the usefulness of error id, message IEA911A is changed to
include the error id when an SVC dump is taken. The message reads:
IEA911A COMPLETE/PARTIAL DUMP
ON SYS1.DUMPxx/UNIT=ddd
ERRORID=SEQyyyyy CPUzz ASIDaaaa TIMEhh.mm.ss.t

where xx and ddd have the same meaning as in the current message.



SVC Dump Debugging Aids
The SVC dump function of RTM is invoked when the SDUMP macro is issued. SVC
dump produces dumps of system errors on a SYS1.DUMPxx or user-defined data
set. SVC dump also produces abend dumps requested by SYSMDUMP DD
statements.
Items that are important for you to understand when debugging errors in SVC
dump processing are described in the following topics:
• Important SVC Dump Entry Points
• SVC Dump Error Conditions
• SYSl.LOGREC Entries Produced for SVC Dump Errors
• Control Blocks Used to Debug SVC Dump Errors
• Resource Cleanup for SVC Dump

Important SVC Dump Entry Points
The BRANCH= parameter on the SDUMP macro determines the SVC dump entry
points and mainline processing to be used.
BRANCH=YES Option
Entry point IEAVTSDX is used for branch-entry SVC dumps. IEAVTSDX creates
a summary dump in a real storage buffer (if the SUMDUMP option is requested on
the SDUMP macro), schedules one or more SRBs to invoke dump task (IEAVTSDT)
processing for the requested address spaces, and then returns control to the caller.
The branch-entry option is requested by many FRR routines and some ESTAE
routines. This option is also requested when ACTION=SVCD is specified on the
SLIP command.

BRANCH=NO Option
Entry point IEAVAD00 is used for the SVC entry to SVC dump. For scheduled
dump requests (ASID or ASIDLST is specified on the SDUMP macro), IEAVAD00
calls IEAVTSDX which schedules one or more SRBs to invoke dump task
(IEAVTSDT) processing for the requested address spaces, and then returns control
to the caller. For synchronous dump requests (ASID and ASIDLST are not
specified on the SDUMP macro), IEAVAD00 processes the dump and then returns
to the caller.
The SVC entry is requested by many ESTAE routines. It is also requested by
the DUMP command (as a scheduled dump), and by the abend dump processor
(IEAVTABD) for SYSMDUMP DD statements (as a synchronous dump).



SVC Dump Error Conditions
If the SVC dump function encounters an unexpected abend during its processing,
it produces a software SYS1.LOGREC record and, if possible, continues taking
the dump.
Expected program checks can occur when SVC dump is checking whether a
virtual page that is to be dumped is valid and assigned. These program checks
do not result in SYS1.LOGREC entries.
SVC dump issues abends 133 and 233 if it detects an unauthorized caller or
invalid input parameter. In these cases, LOGREC entries are not created and retry
is not attempted.
SVC dump issues a COD abend for some unexpected errors during its processing.
In this case, retry is attempted.

SYS1.LOGREC Entries Produced for SVC Dump Errors
The best starting place for debugging SVC dump problems is the SYS1.LOGREC
entries contained in the in-storage SYS1.LOGREC buffer or in SYS1.LOGREC,
because a dump of the SVC dump problem is generally not available. (SVC dump
does not take a dump of its own problems.)
Many SVC dump problems can be debugged from the SYS1.LOGREC entries
alone. However, more complex problems may require a stand-alone dump that
can be taken after a SLIP trap with ACTION=WAIT has matched. These problems
include loops and failures to free critical system resources.
Fixed Data
The fixed data that SVC dump places in the system diagnostic work area (SDWA)
for recording on SYS1.LOGREC is:
SDWAMODN - Load module name (generally IGC0005A, which is the SVC 51
           load module in SYS1.LPALIB).

SDWACSCT - CSECT (microfiche) name, which can be any SVC dump
           module name. For details of SVC dump module functions,
           interfaces, and flow, refer to OS/VS2 System Logic Library.

SDWAREXN - Recovery routine name, which is given as a label. This label
           is not always within the failing CSECT shown in SDWACSCT.

The following table shows the label of the recovery routine, the name of the
containing CSECT, and a description of the recovery processing.
Label      CSECT     Description

DTESTAE1   IEAVTSDT  ESTAE routine for scheduled SVC dumps that are
                     executing under the dump task (IEAVTSDT) in the
                     requested address space. IEAVTSDT also establishes
                     SDESTAEX which can percolate to DTESTAE1.

SCHFRR     IEAVTSDX  FRR routine for branch-entry to SVC dump and
                     for the timer disabled interrupt element (DIE)
                     used to free SVC dump's real storage buffer. This
                     FRR is established by IEAVTSDX.

SDESTAEX   IEAVAD00  ESTAE routine for mainline SVC dump processing.
                     This ESTAE is established by IEAVAD00 and
                     IEAVTSDT.

SDFRRRTN   IEAVAD00  FRR routine for mainline SVC dump processing.
                     This FRR is established by SVC dump modules when
                     a lock is held and a retry is needed in the locked
                     state. IEAVAD00 and IEAVTSDT are the main
                     users of this FRR.

SUMFRFRR   IEAVTSSD  FRR routine for SUMFRR routine processing.
                     This FRR is established by the SUMFRR routine.

SUMFRR     IEAVTSSD  FRR routine for summary dump processing invoked
                     for branch-entered SVC dumps. This FRR is
                     established by IEAVTSSD. If SUMFRR abends,
                     SUMFRFRR receives control; and if it percolates,
                     SCHFRR receives control.

Variable Data
The variable data that SVC dump places in the SDWA for recording to
SYS1.LOGREC is:

SDWAVRA - Contains the 24-byte recovery routine parameter area if
          DTESTAE1, SDESTAEX, SDFRRRTN, SUMFRFRR, or
          SUMFRR is the recovery routine name in SDWAREXN. This
          area contains bits that indicate the resources held, other status
          bits, the retry address, the base register value, and the address
          of the SVC dump work area (ERRWKADR at X'8'). The
          contents of the parameter area are mapped by the IHASDERR
          macro. The common name of the work area is ERRWORK.

To obtain the offset into the failing module, subtract the base register field in
ERRWORK (ERBASE1 at X'C') from the address in the failing PSW (found in
the SDWANXT1 field at X'6C').
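The offset calculation described above is a single subtraction; the values below are illustrative, not taken from a real dump.

```python
# Offset of the failure within the failing module: PSW address minus the
# base register value saved in ERRWORK.

def failing_offset(psw_address, base_register):
    return psw_address - base_register

# Example: PSW address X'00A3F41C', module base register X'00A3F000'.
assert failing_offset(0x00A3F41C, 0x00A3F000) == 0x41C
```

The resulting offset (here X'41C') is what you would match against the module's microfiche listing to locate the failing instruction.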



Control Blocks Used to Debug SVC Dump Errors
The following control blocks contain key information that can be used to debug
problems in SVC dump routines.
• Address Space Control Block (ASCB)
• Recovery Termination Control Table (RTCT)
• SVC Dump Work Area (SDWORK)
• Summary Dump Work Area (SMWK)

Address Space Control Block (ASCB)
The ASCB contains the address of the TCB for the SVC dump task (IEAVTSDT)
in the ASCBDUMP field (at offset X'60'). In this TCB, the TCBEXSVC bit (low-order bit at X'CC') is set on while the SVC dump task is executing.

Recovery Termination Control Table (RTCT)
The RTCT is pointed to by the CVTRTMCT field (at X'23C') in the CVT. It
contains SVC dump information including status bits, an array that describes the
SYS1.DUMPxx data sets, and an array that contains information for the address
spaces to be dumped.
SVC Dump Work Area (SDWORK)
The SDWORK is pointed to by the RTCTSDPL field (at X'9C') in the RTCT. It
contains most of the reentrant storage used by SVC dump including register save
areas, CCWs, and the I/O buffer that contains the 4104-byte SVC dump records
before they are written to the dump data set.
Summary Dump Work Area (SMWK)
The SMWK is pointed to by the RTCTSDSW field (at X'B4') in the RTCT and
contains fields used when a summary SVC dump was requested or defaulted (via
the SUMDUMP option on the SDUMP macro). It includes counter fields that show
how many real frames are used for the real storage buffer that holds the summary
dump created for branch-entry callers of SVC dump. The count of real frames held
(field SMWKFRHD at X'C6') is zeroed after the summary dump is written to the
dump data set and the frames returned to RSM.



Resource Cleanup for SVC Dump
Resource cleanup performed by SVC dump includes: setting the system dispatchable, setting tasks dispatchable, freeing the summary dump real storage buffer,
deleting the TQE for the storage buffer, restarting the system trace, writing end-of-file on the dump data set, dequeueing the dump data set, and turning off indicators
that an SVC dump is in progress. These resources are cleaned up by SVC dump's
mainline processing or recovery routines. In special cases, the following routines
also perform resource cleanup.
If an address space terminates during SVC dump processing, SVC dump's
MEMTERM exit (IEAVTSDR) cleans up the resources related to that address
space (such as setting tasks dispatchable). If the address space was the last to be
processed, then all resources are cleaned up and the SVC dump in-progress indicators (high-order bits in the CVTSDBF (at X'24C') and RTCTSDPL (at X'9C')
fields) are turned off so that additional dumps can be taken.
SVC dump also uses a timer DIE exit that is contained in module IEAVTSDX
at label SCHDIE. This exit ensures that the SVC dump real storage buffer is
returned to RSM if SVC dump encounters an error during processing (such as a
loop).



Communications Task

The communications task (comm task) handles communications between console
operators and the system (user programs and system routines).
The types of communications that the communications task handles are:
• Operator commands from a console as a result of an attention interrupt (on local
devices).
• Output to the operator caused by the write-to-operator (WTO), write-to-operator with reply (WTOR), and delete-operator-message (DOM) macro
instructions.
• External interrupts that are caused by the operator pressing the interrupt key on
the operator control panel. The communications task switches the master
console's function to an alternate console.
• Automatic console switching from a failing console to its alternate when an
unrecoverable I/O error occurs.
• Console switching as the result of the VARY CHANNEL, VARY CPU, or VARY
MSTCONS command.
• Console switching as a result of a processor failure in a multiprocessing system as
a part of alternate CPU recovery (ACR).
Before processing WTO, WTOR, and DOM macro requests, the communications
task passes control to the job entry subsystem (JES) responsible for the job issuing
the request. The JES exit routine may suppress the message, or modify the message
text or routing code.
Multiple console support (MCS) is a standard feature that supports up to 99
consoles. With MCS, messages can be routed to up to 15 different functional areas,
according to the type of information in the message.
Device independent display operator console support (DIDOCS) is an optional
feature that provides uniform console services for various display consoles.



Functional Description
The wait service routine (IEAVMQWR) determines the functions to be performed
by the communications task. It is given control by the dispatcher (supervisor
control routine IEAVEDS0) after one of the communications task's event control
blocks (ECBs) has been posted.
Upon each entry to the wait service routine, the entire list of communications
task's ECBs is tested from top to bottom in priority sequence. The posted ECB
identifies the service that will be performed by the communications task. As each
service is completed, control is returned to the wait service routine and the entire
list of ECBs is again tested for an active ECB. When no active ECBs are found, the
wait service routine issues the WAIT macro which places the communications task
in the wait state until the next communications task ECB is posted.
In addition to testing for posted ECBs, the wait service routine checks other
indicators (represented by control bits). The communications task ECBs and
control bits are located in the unit control module (UCM) and unit control module
entries (UCMEs).
Figure 5-53 lists and describes the ECBs and control bits in the sequence that the
wait service routine makes the tests.
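The top-to-bottom scan described above can be sketched as a priority poll: after any service completes, scanning restarts from the highest-priority ECB, and the task waits only when nothing is posted. This is a simplified model, not the IEAVMQWR logic itself; the `wait` callback is a hypothetical stand-in for the WAIT macro.

```python
# Sketch of the communications task dispatch loop over its ECB list.

def run_comm_task(ecbs, services, wait):
    """ecbs: dict of name -> posted flag, ordered highest priority first.
    services: name -> handler. wait(): block for the next post; in this
    sketch, wait() returning False ends the loop."""
    while True:
        for name in ecbs:             # always scan from the top of the list
            if ecbs[name]:
                ecbs[name] = False    # consume the post
                services[name]()      # perform the requested service
                break                 # then restart the scan from the top
        else:
            if not wait():            # no posted ECB: WAIT for the next post
                return
```

Restarting the scan after every service is what gives higher-priority work (such as console recovery) precedence over queued message output.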


ECB or
Control Bit

Function

UCMARECB
(in UCM)

Alternate CPU recovery - the process of switching from one processor
to another in multiple processor configurations. The communications
task switches consoles as required.

UCMXECB
(in UCM)

External interrupt - switches the master console functions from the
current master console to the next available alternate console. This
results when the operator presses the interrupt key on the console
control panel.

UCMAECB
(in UCM)

Attention interrupt - prepares the console (from which the interrupt
was received) to accept operator input.

UCMECB
(in UCME)

I/O processing complete - indicates a message has been sent to or
received from a console. Results from the interrupt that an I/O device
causes after performing each I/O operation.

UCMPF
(bit in UCME)

Console output pending - indicates a message is queued and ready for
some console. Results if (1) one message is queued for several consoles,
or (2) a console is busy when a WTO or WTOR message is queued for
that console.

UCMSYSJ
(bit in UCM)

Hardcopy output pending - indicates a message is queued for hardcopy
output and ready to be sent to a data set.

Note: Before the following ECB is processed, the communications task tests the WQEs
and may issue message IEA405E (80% of WQEs in use), IEA404A (limit of WQEs
reached), or IEA406I (shortage relieved).
UCMOECB
(in UCM)

Queue message for output - prepares the message posted by the WTO
or WTOR macro for output to the appropriate consoles.

UCMSYSI
(bit in UCM)

Cleanup WQE chain - eliminates WOEs marked for deletion by system
functions (such as task termination).

UCMDECB
(in UCM)

Delete operator message - indicates that a DOM macro has been issued
to (1) delete a WTOR message that the operator has not responded to,
or (2) delete a WTO message when the issuer has determined that the
requested action was performed.

UCMNPECB
(in UCM prefix)

Write NIP messages to buffer - indicates that NIP messages stored
during nucleus initialization can be written.

Figure 5-53. Sequence of Communications Task Processing


Communications Task Control Blocks

The following control blocks are used by the communications task:
UCM

Unit control module - created at system generation. Contains pointers
to the control blocks and routines that support the communications
task.

UCME

Unit control module entry - created at system generation for each
generated device. Contains information about the device including
attributes, pointer to the UCB, I/O ECB and message queue for the
device.

WQE

Write queue element - created for each WTO or WTOR request.
Contains information about the request including message text and
routing code.

ORE

Operator reply element - created for each WTOR. Contains information about the reply portion of a WTOR request including the buffer
to receive the reply.

CQE

Console queue element - created for each console that is to receive a
message. Contains information about messages queued to particular
consoles.

ElL

Event indication list - created at system generation. Contains pointers
to the various ECBs in the UCM and UCME.

RDCM

Resident display control module - created at system generation.
Contains information about a display console.

TDCM

Pageable display control module - created at system generation.
Contains DIDOCS work and save areas, pointers to related modules,
and the screen image.

CXSA

Communications extended save area - used to communicate among
communications task modules.

Refer to Figure 5-54 for the relationship of these control blocks.



The CVTCUCB field of the CVT points to the UCM; the communications task TCB
and the EIL (which contains pointers to the ECBs in the UCM and UCMEs) are
also anchored from it. Within the UCM:

- UCMWTOQ points to the first WQE and UCMWQEND points to the last WQE on
  the WQE chain; each WQE is chained to the next through WQELKPA and holds
  the message text in WQETXT.
- UCMRPYQ points to the first ORE; each ORE is chained to the next through
  ORELKP, and its OREWQE field holds the address of the associated WQE (the
  reply buffer is ORERPY or OREOPBUF).
- UCMVEA and UCMVEL delimit the UCME chain (first through last UCME). Each
  UCME's UCMOUTQ field points to the console's CQE groups, whose entries
  point to the WQEs queued for that console.

For a display console, the RDCM points (at DCMADTRN, word 6) to the TDCM,
and the TDCM points (at DCMCXSVE) to the CXSA.

Figure 5-54. Communications Task Control Block Structure



Debugging Hints
Hints for debugging various problems are described in this topic.

Console Not Responding to Attention

If a console is not responding to an attention interrupt, check the following:
• The console attention processor (IEAVVCRA) may not be posting the attention
ECB (UCMAECB) in the UCM. The communications task will not process the
attention interrupt until the attention ECB is posted. This normally occurs
when the console is inactive (UCMUF indicator in the UCME is off), a CLOSE
is pending for the device (UCMCF indicator in the UCME is on), or the device
does not support interrupts (UCMIF indicator in the UCME is off).
• The attention processor may not be setting the attention pending indicator
(UCMAF in the UCME) on for the console causing the interrupt. It is turned on
when the attention ECB (UCMAECB) is posted.

• If the attention pending (UCMAF) and busy (UCMBF) indicators in the UCME
are both on, the attention interrupt will not be processed until an I/O processing
complete interruption is received from the console. I/O processing is performed
by a specific device support processor (DSP). The busy indicator (UCMBF) is
turned on while the console is waiting for the completion of an I/O operation
and is turned off when the I/O completion operation is processed.

Enabled Wait State
If the communications task is in an enabled wait state, check the following:

Normal Case: The communications task has no work to do; that is, no
communications task ECBs have been posted. Check the following ECBs (see
Figure 5-53 for descriptions and locations of the ECBs).
UCMXECB   UCMARECB   UCMAECB   UCMNPECB   UCMOECB   UCMECB   UCMDECB

WQE Limit Reached: The system limit for WQEs or OREs has been reached
(indicated by message IEA404A).
• Check the following fields in the UCM:
UCMWQNR - indicates the current number of WQEs in the system.
UCMWQLM - indicates how many WQEs can be built.
UCMRQNR - indicates the current number of OREs in the system.
UCMRQLM - indicates how many OREs can be built.

• Check the following indicators in the UCM prefix:

UCMSYSI - indicates that cleanup of the WQE chain is needed; that is,
          eliminate WQEs marked for deletion. This indicator is checked
          by the wait service (IEAVMQWR) and device service
          (IEAVMDSV) routines; and it is set on by the DOM processing
          (IEAVMDOM), wait service (IEAVMQWR), console queueing
          (IEAVMWSV), multiple-line processing (IEAVMWTO), and
          WTO/WTOR processing (IEAVVWTO) routines.

UCMSYSJ - indicates that at least one message needs to be sent to the
          hardcopy log. Possibly, the WQE space is filled with WQEs
          (messages) that need to be sent to the hardcopy log. This
          indicator is referenced by the wait service (IEAVMQWR) and
          device service (IEAVMDSV) routines, and it is set on by the
          wait service (IEAVMQWR) and console switching (IEAVSWCH)
          routines.

UCMSYSM - indicates a failure in a composite console. This indicator is
          used by the console switching (IEAVSWCH) routine.

UCMSYSO - indicates a dummy attention interrupt. This indicator is
          checked by the wait service (IEAVMQWR) routine. It is set on
          by the WTO/WTOR processing (IEAVVWTO) routine when the
          system log is not available and a WTL (write-to-log) is changed
          to a WTO macro.

Disabled Wait State
The communications task issues only one wait state code, code 007. This code is
issued during nucleus initialization when a master console is not available to the
system. See wait state code 007 in OS/VS2 System Initialization Logic.

Messages or Replies Lost
Messages and replies can be lost or routed incorrectly if the WQE, ORE, or CQE
control blocks are not chained correctly.
• To ensure that the WQE chain is intact, check the following:
In the UCM, check fields:
UCMWTOQ - points to the first WQE on the chain.
UCMWQEND - points to the last WQE on the chain.
In each WQE, check:
WQELKPA - points to the next WQE on the chain.
WQEORE  - indicates that an ORE exists for this WQE.

• To ensure that the ORE chain is intact, check the following:
In the UCM, the UCMRPYQ field points to the first ORE.
In each ORE, check:
ORELKP - points to the next ORE on the chain.
OREWQE - points to the WQE associated with this ORE.
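The WQE/ORE cross-link check described above lends itself to a mechanical walk. The sketch below models the control blocks as dictionaries; the key names are illustrative stand-ins for the real ORELKP, OREWQE, and WQEORE fields.

```python
# Sketch of an ORE-chain integrity check: walk the OREs from the UCMRPYQ
# anchor and verify that each one cross-links to a WQE that claims an ORE.

def check_ore_chain(ucmrpyq):
    """Return True if every ORE points at a WQE that records having an ORE."""
    ore = ucmrpyq
    while ore is not None:
        wqe = ore["orewqe"]            # WQE associated with this ORE
        if wqe is None or not wqe["has_ore"]:
            return False               # cross-link broken: the reply may be lost
        ore = ore["orelkp"]            # next ORE on the chain
    return True
```

An empty chain is considered intact; a broken cross-link is exactly the situation in which replies are lost or misrouted.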


• To ensure that the CQE chain is intact, check the following:
- In the UCME (for each console), the UCMOUTQ field points to the first
group of six CQEs.
- In each group of CQEs, the CQEWQEA field in the last (sixth) CQE points
to the next group of CQEs on the chain.

Note: Each CQE is one word; one byte for control bits, and three bytes for
a pointer. The CQEs are built in groups of six. The first five CQEs point to
WQEs and the sixth points to the next group of CQEs.
- In each group of CQEs, the CQEWQEA fields in the first five CQEs point to
their associated WQEs.
- In each CQE, the CQEFLAG byte contains the control bits.
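The group-of-six structure in the note above can be walked as follows. This is a sketch only: each group is modeled as a six-element list whose first five entries stand in for CQEWQEA pointers to WQEs and whose sixth entry stands in for the pointer to the next group.

```python
# Sketch: collect the WQEs queued for a console by walking its CQE groups.

def collect_wqes(first_group):
    """first_group: list of 6 entries; entries 0-4 are WQE refs (or None),
    entry 5 is the next group (or None). Returns the queued WQEs in order."""
    wqes, group = [], first_group
    while group is not None:
        for cqe in group[:5]:          # first five CQEs point to WQEs
            if cqe is not None:
                wqes.append(cqe)
        group = group[5]               # sixth CQE points to the next group
    return wqes
```

If this walk loses messages or loops, the CQE chain is the damaged structure to examine in the dump.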

No Messages on One Console
If messages are not being received on a specific console, check the following:
• The device busy indicator (UCMBF) in the failing console's UCME may be on.
A message is not processed until an I/O processing complete interruption is
received from the console. I/O processing is performed by a specific device
support processor (DSP). The busy indicator (UCMBF) is turned on while the
console is waiting for the completion of an I/O operation and is turned off when
the I/O completion operation is processed.
• If the console is not busy, ensure that the CQE chain for the console is intact.
(Refer to the previous topic "Messages or Replies Lost".)

• If the CQE chain is valid, then check for unusual status in the failing console's
UCME and UCB.

Messages Routed to Wrong Console
The console queueing routine (IEAVMWSV) queues messages for specific consoles
and builds the CQE chain. If messages are routed to the wrong console, then:
• Ensure that the CQE chain is correct for the failing console. (Refer to the
previous topic "Messages or Replies Lost".)
• Check the routing codes for each console. The UCMRTCD field in each
console's UCME defines the routing codes for the respective consoles.
• Check the routing codes for the messages that are being incorrectly routed:
- In the WTO/WTOR WQE, the WQEROUT field contains the routing codes
for the message.
- In a major multiple-line WQE (for MLWTO), the WMJMRTC field contains
the routing codes.


Truncated Messages

If message text is being truncated (the length of the message text is shortened),
then:
• The message may exceed the maximum allowable bytes for console messages.
• The console operator may have requested that time stamps and/or job names
appear with the messages. Check the following indicators in the UCME for the
failing console:
UCMDISPI

indicates that messages are to appear with both time stamps and
job names.

UCMDISPJ -

indicates that only job names are to appear with messages.

Console Switching
Console switching is performed by the IEAVSWCH routine for the following error
conditions:
• An I/O error occurs on a console. The failing console's function is automatically
switched to its alternate (or, if none available, to the master console). Check the
I/O interrupt ECB (UCMECB) in the failing console's UCME. Note that
successful I/O completion is indicated by X'7F' in the first byte of the ECB.
• An abnormal termination in the device support processor (DSP) that services the
failing console. The failing console's function is automatically switched to its
alternate (or, if none available, to the master console). Check the appropriate
DSP in load module IGC0007B.
• A processor failure in a multiprocessing environment as a part of alternate CPU
recovery (ACR). Consoles are switched as required. Check the alternate CPU
recovery ECB (UCMARECB) in the UCM.

DIDOCS Trace Table
A DIDOCS trace table exists in the pageable DCM (display control module
IEETDCM) beginning at field DCMTRACE. The trace table contains the identifiers
of up to 16 of the last DIDOCS modules to receive control on the console
represented by the pageable DCM.
After each DIDOCS module receives control, it places a two-byte identifier in
the trace table. The first byte of the identifier states whether the module is an
"E" module (such as IEECVETA) or an "F" module (such as IEECVFTA). The
second byte of the identifier is the last character in the module name. For
example, the identifier for IEECVETA is "EA" and the identifier for IEECVFT1 is
"F1". (An exception to this rule occurs during DIDOCS recovery processing.
Entries to the ESTAE routine in IEECVET1 are indicated by the identifier "ES".)

Section 5: Component Analysis

5.15.9

When DIDOCS is entered for the first time to perform an operation, the first
DIDOCS module to receive control (module IEECVET1) places two bytes of
asterisks in the trace table before it stores its identifier. The asterisks signal the
beginning of a DIDOCS operation.
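The trace-table behavior just described can be modeled as follows; this is an illustrative sketch only (the class and method names are invented), showing the two-byte identifier scheme and the asterisk marker that starts each operation.

```python
# Hedged sketch (names invented): a DIDOCS-style trace table holding the
# two-byte identifiers of the last 16 modules to receive control. The
# first module of an operation stores "**" before its own identifier.

class DidocsTrace:
    SLOTS = 16

    def __init__(self):
        self.entries = [b"  "] * self.SLOTS
        self.next = 0              # index of the slot to fill next (wraps)

    def _store(self, ident: bytes):
        self.entries[self.next] = ident
        self.next = (self.next + 1) % self.SLOTS

    def module_entered(self, name: str, first_of_operation=False):
        # Identifier: 'E' or 'F' (the module-type letter in position 6 of
        # the name) plus the last character of the module name, e.g.
        # IEECVETA -> "EA", IEECVFT1 -> "F1".
        if first_of_operation:
            self._store(b"**")     # asterisks mark the start of an operation
        self._store((name[5] + name[-1]).encode())
```

Scanning such a table backward from the current slot in a dump shows the most recent module flow for the console.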

DIDOCS-In-Operation Indicator
At offset X'11F' in a console's pageable DCM (display control module IEETDCM)
is a field labeled DCMMCSST. When DIDOCS is processing, bit DCMUSE (X'80')
in DCMMCSST is set on. This bit remains on during any SVC processing initiated
by DIDOCS (SVC 34, GETMAIN, FREEMAIN, and EXCP). DIDOCS turns the
bit off when DIDOCS exits (via BR 14).

DIDOCS Locking
DIDOCS uses two fields (CSAXB and CSAXC) in the communications extended
save area (CXSA) to control locking during DIDOCS operations.
The two fields are used as follows:
• When the lock is available:
Field CSAXB contains the address of the subroutine that obtains the lock.
Field CSAXC contains the address of a BR 14 instruction.
• After a DIDOCS module obtains the lock, the subroutine that obtains the lock:
Sets field CSAXB to the address of a BR 14 instruction.
Sets field CSAXC to the address of the subroutine that releases the lock.
• After the DIDOCS module releases the lock, the subroutine that releases the
lock:
Resets field CSAXB to the address of the subroutine that obtains the lock.
Resets field CSAXC to the address of a BR 14 instruction.
When the lock is already held by a DIDOCS module (field CSAXB contains the
address of a BR 14), any attempt by another DIDOCS module to obtain the lock
results in a no-operation (NOP).
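The CSAXB/CSAXC technique above amounts to a lock implemented as two swapped routine addresses, where obtaining a held lock falls through harmlessly. A minimal sketch (names invented; a plain function stands in for the BR 14 instruction):

```python
# Hedged sketch of the CSAXB/CSAXC scheme: two slots hold either a real
# routine or a no-op (the BR 14 equivalent). Obtaining or releasing the
# lock swaps the slots, so calling "obtain" while the lock is already
# held executes the no-op, exactly as the text describes.

def _nop():                            # stands in for a BR 14 instruction
    pass

class DidocsLock:
    def __init__(self):
        self.csaxb = self._obtain      # obtain-lock subroutine
        self.csaxc = _nop              # release slot is a no-op when free

    def _obtain(self):
        self.csaxb = _nop              # further obtain attempts become NOPs
        self.csaxc = self._release

    def _release(self):
        self.csaxb = self._obtain      # lock is available again
        self.csaxc = _nop

    def held(self) -> bool:
        return self.csaxb is _nop
```

A caller always branches through CSAXB to obtain and CSAXC to release; no separate "is it held?" test is needed, because the state is encoded in which routine each field points to.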


Appendix A: Process Flows

This appendix describes the flow of various MVS processes. These processes are
described in the following chapters:
• RSM Processing for Page Faults
• Swapping
• EXCP/IOS
• GETMAIN/FREEMAIN
• VTAM Process
• TSO


RSM Processing for Page Faults

This chapter describes the important aspects of the RSM component's page-fault
processing. Figure A-1 outlines the major functions in this processing.
During page-fault processing, several important tests are made. The following
describes what these tests are, where they are made, and what they mean during
the course of the RSM page-fault process.

IEAVPIX Tests
IEAVPIX performs the following:
• Checks the PGTE to ensure that PGTPVM is still on after the SALLOC lock
has been obtained. This is done because in an MP environment the other
processor might have validated the PGTE (turned PGTPVM off) between the
time this processor page-faulted and the time the SALLOC lock was obtained.
• Checks the PGTE to ensure that PGTPAM is on. If it is not, this is a logical
protection violation.

IEAVGFA Tests
IEAVGFA performs the following:
• Checks the XPTE to ensure that XPTDEFER is off. If it is on, some paging
request (PCB) for this page has been deferred. (The PCB is on the GFA defer
queue anchored in the PVT.) Normally, the fact that a paging request is
currently outstanding is indicated by PFTPCBSI, but in the defer case there is
no frame and therefore no PFTE is yet associated with the request.
• Checks to see if RBNx from the PGTE is non-zero. If it is non-zero, the last
used PFTE can be located. Once that PFTE is located, you can determine if
the frame has been used for another purpose since last backing the requested
VBN.
• Checks the XPTE for XPTFREL. If XPTFREL is on, the RBNx of the PGTE
is zero and there is a paging request (PCB) on an I/O queue for this page that
must be "related to." Because only a swap produces this condition, which
applies only to stage 2 working set pages, this bit can only be on in private
area XPTEs.
• If reclaim is successful, checks PFTONAVQ. If it is on, the reclaimed frame is
available immediately; that is, no page I/O can be in progress for it. If it is
off, the frame must be on either a local queue or the common queue and a
PCB must be on an I/O queue.


Figure A-1 (Part 1 of 2) shows the page fault process flow; circled numbers in
the figure indicate the sequence of processing:
• PIC 11: IEAVEPC determines the type of program interrupt; IEAVFP locates
the PGTE and XPTE.
• IEAVPIX formats the page fault PCB.
• IEAVPCB allocates a PCB from the PCB free queue.
• IEAVGFA performs frame allocation.
• IEAVPFTE moves the PFTE from the AFQ to the LFQ or CFQ.
• IEAVPCB queues the PCB on an I/O queue.
• IEAVSUSP (physically located in IEAVEPC) suspends the page-faulting
TCB/SRB and allocates an SSRB for an SRB-mode page fault.
• IEAVEDSO saves status and selects the next unit of work to run.

Figure A-1. Page Fault Process Flow (Part 1 of 2)


Figure A-1 (Part 2 of 2) continues the flow after the paging I/O is started;
circled numbers in the figure indicate the sequence of processing:
• ILRPAGIO passes the AIA (ASM's request element) to ASM. If ASM is not
active, it schedules an SRB to the ASM monitor.
• ILRPAGCM, the ASM I/O-complete processor (obtains the SALLOC lock), is
normally given control via IECIOSCN.
• IEAVPIOP runs with the SALLOC lock held in the address space interrupted
at I/O completion. It schedules an SRB to IEAVIOCP, validates the PGTE for
common area pages, and marks the PCB I/O complete (zeroes PCBVBN for
common area PCBs).
• IEAVIOCP runs in the page-faulting address space in SRB mode with the
SALLOC lock and sometimes also with the local lock; it validates private
area PGTEs.
• IEAVRSET (physically located in IEAVEPC) sets the suspended TCB/RB
dispatchable or schedules the SSRB.
• IEAVPCB returns the PCB to the PCB free queue.

Figure A-1. Page Fault Process Flow (Part 2 of 2)

IEAVGFA also performs the following:
• Checks to see if PFTPCBSI is zero. If it is, there is an inconsistency and
IEAVGFA issues a COD abend to record the error. If PFTPCBSI is on, the
VBN value is used to select the PCB I/O queue to be searched and a PCB
relate function is performed.
• If an old frame with or without I/O in progress cannot be found, a frame is
selected from the front of the AFQ. The PFTE is filled in and it is queued on
either the common or the local frame queue. The XPTE (XPTXAV) is then
checked to see if the paging data sets contain a copy of this virtual page.
If XPTXAV is on, a page-in operation is started; if it is off, the frame is
cleared to zeroes.
• If the AFQ is empty, the request is deferred by placing the PCB on the PCB
defer queue (PVTGFADF) of the PVT. The XPTDEFER flag is set in the
XPTE.
• If a page-in is needed, the RBN of the allocated frame is placed in the AIA
(which is always physically adjacent to the PCB) and the AIA is passed to
ASM. Processing then exits as shown by steps 9, 10, and 11 in Figure A-1.

IEAVPIOP Tests
IEAVPIOP receives control from ASM and is passed the AIA when I/O has
completed. IEAVPIOP checks for an I/O error and marks the PCB I/O complete.
If necessary, IEAVPIOP indicates an I/O error in the PCB. IEAVPIOP checks
PCBFREAL to determine if the reason for the page-in still exists.
If PCBFREAL=1, the page-in has been NOPed for some reason (such as
FREEMAIN) and the frame is sent to the AFQ. If PCBFREAL=0, the PGTE must
be validated. IEAVPIOP validates common area PGTEs but must schedule
IEAVIOCP to validate private area PGTEs because they are in the LSQA of the
page-faulting address space. If IEAVPIOP validates the common area PGTEs,
PCBVBN is set to 0 to prevent a second validation by IEAVIOCP. IEAVIOCP will
be scheduled if PCBRESET=1. PCBRESET is still one unless the PCB has
been NOPed.
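The decision IEAVPIOP makes at I/O completion can be summarized as follows. This is a simplified illustrative model of the flags described above, not system code; the function name is invented.

```python
# Hedged sketch of the IEAVPIOP decisions described in the text (flag
# names follow the text; the logic is a simplification).

def pio_complete_action(pcb_freal: int, common_area: bool) -> str:
    """Decide what happens to a page-in's frame at I/O completion."""
    if pcb_freal:
        # PCBFREAL=1: the page-in was NOPed (e.g. by FREEMAIN),
        # so the frame goes back to the available frame queue.
        return "send frame to AFQ"
    if common_area:
        # IEAVPIOP validates common-area PGTEs itself and zeroes
        # PCBVBN so IEAVIOCP does not validate a second time.
        return "validate PGTE, zero PCBVBN"
    # Private-area PGTEs live in the faulting address space's LSQA,
    # so IEAVIOCP must be scheduled into that address space.
    return "schedule IEAVIOCP"
```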

IEAVIOCP Tests
IEAVIOCP runs in SRB mode and gets the local lock according to SRBPARM.
SRBPARM is set earlier by IEAVOPBR (a subroutine of IEAVPIOP) if
IEAVIOCP will need the local lock. IEAVOPBR is called from several places
in RSM; its sole function is to determine if IEAVIOCP will need the local
lock and to schedule IEAVIOCP.


IEAVIOCP searches the local and common PCB queues looking for I/O-complete
PCBs. Once found, IEAVIOCP calls IEAVRSET for any I/O-complete PCBs
with PCBRESET=1. The reset function (IEAVRSET in IEAVEPC) is responsible
for making the suspended work (TCB/SSRB) redispatchable. IEAVIOCP
validates the PGTE for any I/O-complete PCB with a non-zero PCBVBN, with
PCBFREAL=0, and without an I/O error (PCBIOERR=0). When this is done,
IEAVIOCP returns the PCB to the free queue.
Because IEAVIOCP is queue-driven, it might not be able to get the local lock
when it requests it. In such a case, it can be held in suspension by a page
faulter whose PCB is on the queue IEAVIOCP is working on. Therefore, up
to two SRBs can be scheduled for IEAVIOCP at one time. If IEAVIOCP does
not hold the local lock and discovers an I/O-complete PCB that needs to be
reset and for which reset requires the local lock (PCBLLHLD=0,
PCBSRBMD=0, PCBPEX=1, an unlocked TCB page fault), it can call
IEAVOPBR to reschedule itself (exit to dispatcher). IEAVIOCP continues
its scan of the PCB queues, doing any work possible before it exits to the
dispatcher.


Swapping

This chapter describes the major considerations and decisions of the swapping
processes (swap-in and swap-out).

Swap-in Process
The numbers in the following descriptions correlate to the circled numbers in
Figure A-2.

1-3. SRM schedules IEAVSWIN and passes it the address of the ASCB
in SRBPARM. IEAVSWIN obtains working-set size (SPCTWSSZ)
+1 PCBs. It then scans the SPCT LSQA entries and fills in a PCB
for each entry. Next IEAVSWIN scans the fix entries. For private
area fix entries, it builds a stage one PCB. For common area fixes,
it adds the SPCT fix count to the PFTE fix count. For common
area fixes not in storage, it builds a PCB. Next, IEAVSWIN scans the
SPCT segment entries and builds a PCB for each bit map entry. It
then returns unused PCBs to the PCB free queue and calls
IEAVGFA. If enough frames are not available for the stage one
pages, IEAVGFA returns a code of eight to IEAVSWIN and sets
PCBRETRY. IEAVSWIN notifies SRM via a SYSEVENT SWINFL
to try the swap-in later.
4. IEAVGFA allocates frames for both stage one and stage two PCBs
and then calls ASM to start swap-in I/O.
5. After swap-in I/O completes, the IEAVSWIN root exit IEAVSIRT
is called by IEAVPIOP with stage one PCBs chained from the root
PCB. IEAVSIRT does the following:
• Updates PFTFXCT if any fix counts are greater than 255
• Sets ASCBSTO
• Fills in SGTEs in non-translate mode
• Fills in PGTEs in non-translate mode
6. IEAVSIRT calls IEAVPCB to free the root and all stage one PCBs.
7. IEAVSIRT calls ASCBCHAP to put the ASCB back on the ASCB
queue.
8. IEAVSIRT calls status to start both quiesceable and non-quiesceable
SRBs.


Figure A-2 shows the swap-in process flow:
• IRARMCSI: SRM schedules the swap-in.
• IEAVPCB: SWIN gets SPCTWSSZ+1 PCBs.
• IEAVSWIN (mainline): executes in SRB mode in the master scheduler's
address space; builds PCBs and gets frames allocated.
• IEAVGFA: allocates stage 1 and stage 2 frames.
• ILRSWAP: starts swap-in paging I/O.
• When stage 1 I/O completes: ILRPAGCM (normally executes in the address
space interrupted at the completion of I/O), then IEAVPIOP decrements the
SWIN root count and calls the root exit when the count reaches zero.
• IEAVSWIN entry IEAVSIRT (root exit): rebuilds the segment and page
tables. Private area stage 1 PCBs are chained out of PCBRWRK1 and
PCBRWRK2 on entry to the root exit.
• IEAVPCB: frees the root and stage 1 PCBs.
• IEAVEACO (ASCBCHAP): places the ASCB on the dispatching queue.
• IGG079 (entry IGC07903): status starts SRBs.
• An SRB is scheduled to the swapped-in address space; IEAVSWIN entry
SWINPOST posts RCT to restore the address space.

Figure A-2. Swap-In Process Flow

IEAVSIRT obtains an SRB from the RSM cell pool and schedules an
entry point in IEAVSWIN (SWINPOST) into the swapped-in address
space so it can post the region control task. SWINPOST posts
RCT's ASCBQECB to restore the address space.
Note that stage two frames are allocated at the same time as stage
one frames. The XPTFREL flag is on in each stage two PCB's
corresponding XPTE. Then, if a page fault or other request reaches
IEAVGFA prior to stage two I/O complete, IEAVGFA can relate
the request to the ongoing I/O (see the chapter on RSM in Section 4 for
a discussion of RSM's relate functions). IEAVIOCP sets the
XPTFREL flag to zero and fills in the PGTRSA field when stage two
I/O completes. IEAVSOUT sets to 0009 all PGTEs for which it
made a bit map entry in the SPCT.

Swap-Out Process

IEAVAR02
SRM (IRARMCSO) posts the region control task (RCT) to swap out the address
space. RCT is responsible for:
• IOS purge processing: I/O requests that have been requested or started are
purged or quiesced, respectively.
• Halting all tasks in the address space with the exception of its own task.
• Preventing quiesceable SRBs from executing.

IEAVSOUT
The numbers in the following descriptions correlate to the circled numbers in
Figure A-3.
1. IEAVSOUT receives control from RCT and calls STATUS (IEAVSSNQ) to
stop non-quiesceable SRBs.
2. IEAVSOUT gets enough PCBs to page out every private area page in the
address space plus one to be used as a swap-out root.
3. IEAVSOUT clears the swap control table (SPCT) LSQA, fix entries
(SPCTSWPE), and all bits in the bit maps in the segment entries (SPCTSEGE).
Prior to this, the SPCT reflects the status of the address space at the last
swap-out. SPCTSEGEs provide a mechanism to check how many and which
segments are obtained via GETMAIN in an address space because there is a
SPCTSEGE for each private area segment that is obtained by GETMAIN.


Figure A-3 shows the swap-out process flow: the region control task
(IEAVAR02) invokes IEAVSOUT, which stops non-quiesceable SRBs, gets
ASCBFMCT+1 PCBs (one extra for the root), initializes the SPCT, builds
LSQA entries in the SPCT, builds fix entries in the SPCT from FOEs,
initializes the PCBs including the root, purges paging I/O, completes stage 1
PCBs (LSQA and fixed), completes working-set PCBs (changed private area),
frees unused PCBs, schedules IEAVPIOI to the master scheduler's address
space, and returns to RCT. After the SRB is dispatched in the master
scheduler's address space: IEAVEACO (ASCBCHAP) removes the ASCB from
the dispatching queue, IEAVINV issues PTLB, and ILRSWAP starts the swap
I/O. PCBs are on the local queue (RSMLIOQ) when received by PIOI.

Figure A-3. Swap-Out Process Flow


IEAVSOUT next initializes a PCB for each changed page on the local frame
queue and sets a bit in the bit map (SPCTBITM) for all pages that are not to
be stolen. The steal is based on a comparison of a criterion number passed
by SRM in OUXBSTC to PFTUIC.
IEAVSOUT returns any unused PCBs to the free queue. This marks where
on the free queue the swap-out began.
IEAVSOUT schedules an SRB for IEAVPIOI, releases the SALLOC lock, and
returns to RCT (IEAVAR02), which waits for ASCBQECB to be posted by
swap-in (IEAVSWIN). Because release of the SALLOC lock enables the processor,
an address space is often swapped out before RCT has gotten a chance to
wait. When analyzing a stand-alone dump, you will see the following if the
above case is true:
• The RCT is dispatchable.
• There is no wait count in RBWCF.
• There are no frames allocated to storage (ASCBFMCT=0).
• The address space is not on the ASCB dispatchability queue.
Do not consider this situation a problem.
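The working-set selection described above can be sketched as follows. This is an illustrative model only: the text says the steal compares SRM's criterion (OUXBSTC) with the frame's PFTUIC, but the direction of the comparison is an assumption here; pages whose unreferenced interval count is below the criterion are assumed to stay in the working set.

```python
# Hedged sketch of the swap-out steal decision (assumed direction: a
# page is kept, i.e. its SPCTBITM bit set, when its PFTUIC is below
# the criterion SRM passed in OUXBSTC; other pages are stolen).

def working_set_pages(frames, ouxbstc):
    """frames: list of (virtual_page_number, pftuic) pairs.
    Returns the page numbers whose SPCTBITM bit would be set."""
    return [vpn for vpn, pftuic in frames if pftuic < ouxbstc]
```

With a criterion of 3, pages unreferenced for 0 or 2 intervals stay in the working set while a page unreferenced for 5 intervals is stolen.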

IEAVPIOI

IEAVPIOI receives control in the master scheduler's address space. It calls
ASCBCHAP to remove the ASCB from the dispatching queue, calls ASM with the
string of AIAs passed to it from IEAVSOUT via the SRBPARM field, calls
IEAVINV to PTLB, and exits.


IEAVSOUT builds a two-byte LSQA entry for each frame on the LSQA
frame queue.
IEAVSOUT builds a four-byte fix entry for each page (private or common)
that has an FOE on any TCB in this address space. The fix count is added
into the fix entry SPCTSWPE. Note that fixes done without a TCB address
supplied do not have FOEs.
IEAVSOUT initializes a root PCB to zero and places the address of
IEAVSORT in PCBRGOTO. It initializes the remaining PCBs, which might
be used to swap out a page. (The original figure at this point shows the hex
layout of a partially initialized swap-out PCB and its adjacent AIA: flag
bytes, A(ROOT) for root and output, the PCBFREAL swap-out flag,
A(ASCB), and the AIA swap-out and write fields.)
IEAVSOUT purges paging for this address space on the common PCB I/O
queue, local PCB I/O queue, and the GFA deferred queue. The processing is
to post users waiting on fixes, reset page faulters, and to NO-OP the PCB.
(Fix entries are made for PCBs found for zero TCB fixes.) The NO-OP
process makes the PCB look like a cancelled page load PCB; that is, no
notification (RESET/POSTING) is to be done and the frame is to be freed.
PCBs on the GFA defer queue are removed. The only exceptions here are for
zero TCB fixes for which no entries could be made in the SPCT (GETMAIN
for SQA failed). These PCBs remain unchanged and the fixed frame remains
fixed throughout the swap.
IEAVSOUT fills RBNs and VBNs into PCBs for each LSQA or fix entry now
in the SPCT. Even unchanged fixed or LSQA pages are paged out.


EXCP/IOS

Figure A-4 is an overview of the I/O process through MVS using EXCP as the driver.
The following outline correlates to this process.
1. Problem program issues GET/PUT (implied wait).
2. Problem program branches to access method.
3. Access method issues SVC 0 (EXCP) or SVC 114 (EXCPVR) to the EXCP
front end.
4. EXCP front end:
   a. Validates request.
   b. Builds RQE.
   c. Queues related requests.
   d. If a VIO data set, goes to window intercept processor.
   e. Builds SRB/IOSB.
   f. If a virtual user, gets TCCW and BEB.
   g. Branches to PAGE FIX appendage (if specified and not a V=R region).
   h. Branch returns.
   i. If EXCPVR request, fixes pages from PAGE FIX appendage.
   j. Fixes DEB for V=R user if not already fixed.
   k. If a DASD device, branches to END OF EXTENT appendage (if seek
      address is out of specified extent).
   l. Branch returns.
   m. Branches to START I/O appendage if specified.
   n. Branch returns.
   o. If virtual user: translates CCWs, fixes pages for buffers, and builds IDAL.
   p. Issues START I/O macro (branch entry to IOS front end).
5. IOS front end:
   a. Builds IOQ.
   b. Selects physical path (channel scheduling).
   c. If path available, adds prefix CCWs and issues SIO; otherwise, queues
      IOQ on LCH.
   d. Restarts all queued I/Os to available channels.
   e. Branch returns to EXCP front end and branch returns from EXCP front
      end to problem program WAIT.
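Step 5c's path selection can be sketched as a simple queueing model; this is an illustrative sketch only (the class and method names are invented), showing the start-or-queue decision and the channel-restart step that drains the LCH.

```python
# Hedged sketch of IOS channel scheduling as outlined in step 5: an IOQ
# is started on an available path (SIO) or queued on the logical channel
# queue (LCH); channel restart starts queued requests as paths free up.

from collections import deque

class LogicalChannel:
    def __init__(self):
        self.lch = deque()           # IOQs waiting for a path
        self.path_busy = False

    def start_io(self, ioq) -> str:
        if self.path_busy:
            self.lch.append(ioq)     # no path available: queue IOQ on LCH
            return "queued on LCH"
        self.path_busy = True        # path available: issue SIO
        return "SIO issued"

    def io_complete(self) -> str:
        # Channel restart: start the next queued IOQ, if any.
        if self.lch:
            self.lch.popleft()       # the path stays busy with this IOQ
            return "restarted queued IOQ"
        self.path_busy = False
        return "path idle"
```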


Figure A-4 shows the IOS/EXCP process flow and the system state at each
stage. The user's GET/PUT runs enabled, in problem program state, user key,
under the TCB; the access method branches to the EXCP front end (SVC 0 for
EXCP, SVC 114 for EXCPVR), which runs enabled, supervisor state, key 0,
under the TCB with the local lock. The front end validates the request, builds
the RQE and SRB/IOSB, branches to the PAGE FIX, end-of-extent, and SIO
appendages as required, translates CCWs for virtual users, and issues the
START I/O macro (branch entry to the IOS front end). The IOS front end,
disabled, supervisor state, key 0, builds the IOQ, selects a physical path
(channel scheduling), prefixes the CCWs and issues the SIO instruction or
queues the IOQ on the LCH, restarts all queued I/Os to available channels,
and returns. The I/O interrupt enters the IOS back end through the FLIH;
the disabled interrupt exit (DIE) maps the IOSB/IOB and, if PCI and V=R or
EXCPVR, branch enters the PCI appendage; the back end queues type 3
related requests, schedules POST STATUS (global SRB), performs channel
restart, and returns to the FLIH. IOS POST STATUS, running enabled,
supervisor state, key 0, under an SRB, branch enters the EXCP back end for
appendage processing (PCI, CE, ABE) and for termination: the back end maps
the IOSB/IOB, starts related requests, schedules the ERP on error, frees
control blocks, posts the ECB (ECB=ECBX) named on the user's WAIT, and
exits to the dispatcher.

Figure A-4. IOS/EXCP Process Flow


6. IOS back end (entry from I/O FLIH) entered as a result of an I/O interrupt.
   a. If DIE is specified:
      (1) TRAS (translates address space, to get addressability to control
          blocks in originating address space).
      (2) Branch enters DIE.
      (3) If PCI and V=R or EXCPVR, maps IOSB to IOB and branch enters
          PCI appendage.
      (4) PCI processing.
      (5) Branch returns to DIE.
      (6) Maps IOB to IOSB.
      (7) Queues type 3 related requests.
      (8) Branch returns to IOS front end.
      (9) TRAS (returns to addressability at time of interrupt).
   b. Schedules POST STATUS [global] (means POST STATUS will be entered
      via dispatcher).
   c. Branches to channel restart to start queued IOQEs on LCHs.
   d. Returns to FLIH.
   e. If system was in SRB mode, loads PSW for SRB or returns to dispatcher.
7. IOS POST STATUS (scheduled from IOS back end).
   a. If PCI, CE, or ABE appendages specified:
      (1) Branch enters EXCP back end.
      (2) Maps IOSB to IOB.
      (3) Branch enters appropriate appendage.
      (4) Appendage processing.
      (5) Branch returns to EXCP back end.
      (6) Maps IOB to IOSB.
      (7) Branch returns to IOS POST STATUS.
   b. If error, schedules ERP. (See 8.)
   c. Branches to EXCP back end for termination processing:
      (1) Maps IOSB to IOB.
      (2) Starts related requests.
      (3) Unfixes buffer pages.
      (4) Posts ECB (the one after the GET/PUT).
      (5) Exits to the dispatcher.


8. ERP interface.
   a. If IBM ERP, get ERP work area.
   b. If DASD (IECVDERP), branch to ERP.
   c. If non-DASD, schedule ERP loader (IECVERPL) under SIRB.
      Use stage II exit effector to queue the SRB to ASXBFSRB. Set the
      stage II exit effector switch in the ASCB.


GETMAIN/FREEMAIN

This chapter describes the processing for virtual storage requests in terms
of GETMAIN processing and FREEMAIN processing. The flow through the
GETMAIN/FREEMAIN process is complicated and the VSM control block
structure should be understood prior to following this process. This process flow
is not intended to explain exactly how GETMAIN/FREEMAIN works but to
provide an understanding of the important considerations of virtual storage
management, how the important control blocks are manipulated, and the common
subroutines of VSM.

GETMAIN Processing
The following describes the processing required to satisfy a given GETMAIN.
1. A problem program issues an SVC 10 GETMAIN for subpool 0 for 256 bytes.
2. GETMAIN (entry at IGC010) saves the TCB address in the LDA, sets
IEAVGM00's FRR (module IEAVGFRR), sets up the length and subpool
ID for the common processing routines, saves the caller's mode in LDARQSTA,
and goes to the common GETMAIN routine, GMCOMM1.
3. GMCOMM1 goes to routine CSPCHK to find the SPQE for the requested
subpool. CSPCHK is a key routine for defining the characteristics of various
subpools. For subpool 0, CSPCHK searches the TCBMSS chain. If no SPQE
is found, CSPCHK returns a zero for the address of the SPQE and saves the
address of the previous SPQE on the chain in SPQESAVE.
GMCOMM1 then calls routine GSPQESPC to get a 16-byte element to build
and chain in an SPQE for the requested subpool. The 16-byte blocks for
internal control blocks are obtained via GETMAINC (a simple GETCELL
function).
4. GSPQESPC passes control to (label) ROUND where the request is rounded
up to a doubleword boundary.
5. GMCOMM calls GFRECORE to search the FQEs pointed to by each DQE for
the appropriate subpool. A best-fit algorithm is used to find the smallest free
element large enough to satisfy the request. Exception: LSQA/SQA requests
for 4096 bytes or less are not satisfied across page boundaries because the
request can be for page or segment tables that must reside in contiguous real
storage.
6. If storage is found in an FQE, GFRECORE calls GFQEUPDT to maintain and
update the FQE chain. (Control is passed to step 9.)
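The best-fit search of step 5, including the LSQA/SQA page-boundary exception, can be sketched as follows. This is an illustrative model only: the function names echo the labels in the text, the FQE chain is modeled as (address, size) pairs, and the boundary test is a simplification that rejects a free element whose allocation would cross a 4K page.

```python
# Hedged sketch of GFRECORE's best-fit search: pick the smallest free
# element large enough for the doubleword-rounded request; LSQA/SQA
# requests of 4096 bytes or less must not span a 4K page boundary
# (page/segment tables need contiguous real storage).

def round8(n):                          # label ROUND: doubleword rounding
    return (n + 7) & ~7

def best_fit(fqes, length, lsqa_or_sqa=False):
    """fqes: list of (address, size). Returns the chosen (address, size),
    or None, meaning 4K blocks must be obtained (G4KSRCH)."""
    need = round8(length)
    best = None
    for addr, size in fqes:
        if size < need:
            continue
        if lsqa_or_sqa and need <= 4096:
            # Simplified boundary test: reject elements whose start
            # would make the allocation cross a 4K page boundary.
            if (addr // 4096) != ((addr + need - 1) // 4096):
                continue
        if best is None or size < best[1]:
            best = (addr, size)
    return best
```

A 30-byte request is rounded to 32 and takes a 32-byte free element over a 64-byte one, while an LSQA request that would straddle a page boundary falls through to 4K-block allocation.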


7. If storage is not found in an FQE, GFRECORE determines the number of
4K blocks that are required and calls G4KSRCH to satisfy the request.
8. G4KSRCH performs the following functions:
   a. Calls FBQSRCH to search the appropriate FBQE chain to find 4K bytes
      of free space. (For problem program subpools, TCBPQE points to the PQE,
      which points to the FBQE.) Once found, FBQSRCH removes the space from
      the FBQE and, if the FBQE is empty, frees it via an internal FREEMAIN
      (FMAINB) or an internal freecell (FMAINC).
   b. Acquires a DQE and chains it onto the DQE chain anchored in the SPQE.
   c. Calls RSM (IEAVFP1) to locate the page table entry (PGTE) and the
      external page table entry (XPTE) of the new 4K block. Then at label
      SETUPPTE it initializes both the GETMAIN-assigned flag (PGTPAM) in the
      PGTE and the XPTPROT (protection key) in the XPTE (+0). Note: This
      is the only place XPTPROT is set.
   d. Updates the SMF region usage fields of the TCT (task control table).
   e. Creates an FQE and chains it from the DQE that was just built.
   f. Returns to GMCOMM1.
9. GMCOMM1 places the address of the allocated storage in register 1 and sets
the return code. Then GMCOMM1 performs housekeeping of any areas
chained from FMAREAS in the LDA, deletes the FRR, and passes control to
the EXIT prologue.

FREEMAIN Processing
The following is a logic flow of the FREEMAIN process when a problem program
issues an SVC 10 requesting 256 bytes from subpool 0.
1. Upon entry at IGC010, FREEMAIN:
   a. Saves the TCB address in the LDA.
   b. Establishes the FRR (IEAVGFRR).
   c. Saves the caller's mode in LDARQSTA.
   d. Sets up the length and subpool ID for common processing.
   e. Passes control to FMCOMM1.
2. FMCOMM1 passes control to FMCOM because the request is not to free an
entire subpool. FMCOM calls CSPCHK to locate the SPQE. The associated
DQEs are searched to locate the one DQE that describes the area to be freed.


3. Label QELOCATE ensures that the area is not already described in an FQE
(if it is, the requestor is abnormally terminated). Subroutine CREATFQE
obtains a 16-byte element for an FQE, then builds the FQE and adds it to the
proper FQE chain. Note: If possible, FQEs are combined if the new free
space is adjacent to free space described by an existing FQE.
4. If less than 4K bytes are freed, FREEMAIN has completed its task and control
is passed to the EXIT prologue.
5. a. If all space described by a DQE has become free, FREEMAIN frees the
      FQE and DQE and notifies RSM (IEAVRELV) that a page(s) can be
      released.
   b. If a virtual page is freed, FREEMAIN frees the FQE (and adjusts the DQE
      if the free pages exist at either end of the described area) and notifies RSM
      (IEAVRELV) to release the page(s).
   c. If the free page exists in the middle of the area described by the DQE,
      FREEMAIN obtains a new DQE and the two DQEs will now describe the
      area (essentially the area has been split into two parts). FREEMAIN
      updates the associated FQEs and notifies RSM (IEAVRELV) to release
      the page(s).
   Note: RSM invalidates the PGTE(s) for the associated pages being freed
   and calls ASM to release the auxiliary storage copy of the page. If a page
   table has become completely free, IEAVGM00 is passed the PGT address,
   which is queued from a field in the LDA (FMAREAS) to be freed at exit
   time. FMAREAS is really a list of items no longer required to describe
   virtual storage.
6. After restructuring the DQEs, MRELEASE returns virtual space to the
appropriate FBQEs. If possible, MRELEASE places 4K blocks of storage in an
existing FBQE; if not, it builds a new FBQE and includes it in the existing
FBQE chain.
7. FREEMAIN returns to FMCOMM1A, which performs FMAREAS bookkeeping,
deletes the FRR, and returns to the caller.

Note: FMAREAS anchors a one-way chain of areas to be freed. The area
itself contains the address of the next area at offset +0 and the subpool's ID
and length at offset +4. These areas are not freed immediately by IEAVGM00
because freeing them might cause register save area overlays on the double
recursion into FREEMAIN processing.
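The FQE-combining rule in step 3 can be sketched as follows; this is an illustrative model only (the function name is invented, and FQEs are modeled as (address, size) pairs rather than real chained control blocks).

```python
# Hedged sketch of CREATFQE's merge rule: a newly freed area is combined
# with an existing FQE when the two are adjacent; otherwise a new FQE is
# added to the chain.

def add_free(fqes, addr, size):
    """fqes: list of (address, size). Free the area at addr of the given
    size, coalescing with an adjacent FQE where possible."""
    for i, (a, s) in enumerate(fqes):
        if a + s == addr:              # new space follows an existing FQE
            fqes[i] = (a, s + size)
            return fqes
        if addr + size == a:           # new space precedes an existing FQE
            fqes[i] = (addr, s + size)
            return fqes
    fqes.append((addr, size))          # not adjacent: build a new FQE
    return fqes
```

Freeing 64 bytes at address 256 after a (0, 256) free element simply grows that element to (0, 320); a non-adjacent area gets its own FQE.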


VTAM Process

The following shows the logic flow through the VTAM component into
IOS and out to the 3705 when an application issues a SEND request. This
description includes the major module flow and the control blocks required in
order to process the request. Note that this is a general processing flow;
additional modules not shown can be entered depending on options and device
type. Figure A-5 illustrates the system modes at various stages of the VTAM
processing.
1. The application program issues the VTAM SEND macro, passing an RPL
(request parameter list), which points to the data that is to be sent.
2. The SEND macro branches to a VTAM interface routine (ISTAICIR).
3. ISTAICIR determines that this is a non-authorized request and issues the
VTAM SVC. This is a type 1 SVC (SVC 124).
4. The type 1 SVC routine (ISTAPC22) obtains an MQL (MPST queueing
element), places the address of the user RPL in it, and issues the TPQUE macro
to queue the MQL to the TPIO PAB for the application's address space.
5. The TPQUE macro (normally) issues the TPSCHED macro in order to
schedule the TPIO PAB.
6. The TPSCHED macro invokes ISTAPC32, which queues the TPIO PAB to the
memory process scheduling table (MPST) and schedules an SRB to execute
ISTAPC55.
7. ISTAPC32 returns to ISTAPC22.
8. ISTAPC22 issues a type 1 exit back to ISTAICIR.
9. ISTAICIR determines if the request was synchronous or asynchronous. If it
was synchronous, it issues the WAIT macro. If it was asynchronous, it returns
control to the application program.
10. When the SRB is dispatched, ISTAPC55 dequeues the TPIO PAB from the
MPST, obtains a component recovery area (CRA) from the large pageable
(LP) pool, and passes control to ISTAPC57.
11. ISTAPC57 formats the request parameter header (RPH) (within the CRA),
dequeues the MQL from the TPIO PAB, and passes control to ISTAPC23.
12. ISTAPC23 releases the MQL, obtains a copy RPL (CRPL) from the CRPL
pool, and copies the user RPL into it.
13. ISTAPC23 then issues the TPQUE macro to queue the CRPL to the control
layer outbound PAB in the appropriate FMCB and schedules control layer
processing.


[Figure A-5 is a diagram of the VTAM SEND process flow across three
system modes: the application's address space in task mode, the application's
address space in SRB mode (entered via the VS2 dispatcher), and any address
space in disabled mode (entered on the I/O interrupt), with exits back to the
VS2 dispatcher at each stage.]

Figure A-5. VTAM SEND Process Flow


14. ISTAPC23 then issues the VTAM TPEXIT macro, which passes control to
ISTAPC31.

15. ISTAPC31 recognizes that there is more work to do (control layer processing)
and passes control to ISTAPC57.

16. ISTAPC57 reformats the RPH (within the same CRA) for processing by the
control layer.

17. ISTAPC57 then passes control to the control layer (ISTDCC00).

18. ISTDCC00 recognizes that this is a SEND request, obtains a logical channel
program block (LCPB) from the CRPL pool, and invokes ISTRCC22.

19. ISTRCC22 sets up the logical channel command words (LCCWs) in the
LCPB from the options in the CRPL and issues the TPQUE macro to queue
the LCPB to the TPIOS outbound PAB in the FMCB and schedule TPIOS
processing.

20. ISTRCC22 then passes control to ISTDCC00, which issues the TPEXIT macro.

21. The TPEXIT macro passes control to ISTAPC31, which recognizes that there
is more work to do (TPIOS processing) and passes control to ISTAPC57.

22. ISTAPC57 reformats the RPH (within the same CRA) for TPIOS processing
and passes control to ISTZAFIB.

23. Within TPIOS, ISTZDFAO allocates the fixed I/O buffer; ISTZDFCO and
ISTZDFDO move the user data to the I/O buffer.

24. Once the data is moved from the user's buffer, TPIOS invokes a routine
(ISTRCFYO) which calls ISTAICPT. ISTAICPT copies the CRPL back to the
user's RPL, frees the CRPL, and POSTs the ECB complete.

25. ISTRCFYO then frees the LCPB and returns control to TPIOS.

26. TPIOS then invokes ISTZEMBB, which obtains the UCB lock for the 3705
and checks the ICNCB (intermediate controller node control block) to see
if there is an active channel program currently executing for the 3705.

27. If the 3705 is busy, ISTZEMBB queues the I/O buffer to the ICNCB write
queue, releases the UCB lock, and returns to TPIOS. (Go to step 29.)

28. If the 3705 is not busy, ISTZEMBB calls ISTZEMAB, which issues the
STARTIO macro to IOS and then returns to ISTZEMBB, which returns to
TPIOS. The IOSB, which is the interface to IOS, physically resides within the
ICNCB.

29. After ISTZEMBB returns to TPIOS, TPIOS issues the TPEXIT macro, which
invokes ISTAPC31.


30. ISTAPC31 recognizes that there is nothing more to do and calls ISTAPC58.

31. ISTAPC58 frees the CRA and exits to the VS2 dispatcher.

32. Sometime later, an I/O interrupt occurs as a result of the write channel
program completing.

33. IOS passes control to the VTAM DIE (disabled interrupt exit) (ISTZFM3B).

34. ISTZFM3B frees the I/O buffer and returns to IOS, indicating that POST
STATUS should not be scheduled.

35. IOS exits to the dispatcher.
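Steps 26 through 28 are a classic device-serialization pattern: take a lock, test a busy indicator, and either start the I/O or queue the work for the interrupt side to pick up. The sketch below models only that pattern (Python; the ICNCB write queue, the UCB lock, and the busy test come from the text, while the class, method names, and buffer strings are invented for illustration):

```python
import threading

class Icncb:
    """Toy model of the 3705 busy test made under the UCB lock
    (steps 26-28): start the I/O if the channel is idle, otherwise
    queue the buffer on the ICNCB write queue."""
    def __init__(self):
        self.ucb_lock = threading.Lock()
        self.channel_active = False
        self.write_queue = []
        self.started = []               # buffers handed to STARTIO

    def send(self, buf):
        with self.ucb_lock:             # obtain the UCB lock
            if self.channel_active:     # 3705 busy: queue and return
                self.write_queue.append(buf)
            else:                       # 3705 idle: issue STARTIO
                self.channel_active = True
                self.started.append(buf)

    def io_complete(self):
        with self.ucb_lock:             # interrupt side: drain the queue
            if self.write_queue:
                self.started.append(self.write_queue.pop(0))
            else:
                self.channel_active = False

node = Icncb()
node.send("buf1")                       # idle, so buf1 is started
node.send("buf2")                       # busy, so buf2 is queued
node.io_complete()                      # buf1 done, buf2 started
print(node.started, node.write_queue)
```

Because both the sender and the completion path test and update the state under the same lock, a buffer can never be both queued and started, which is exactly what the UCB lock guarantees in the real flow.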


TSO

Following are some of the more important processes involved with the
TSO/TIOC/TCAM interface portion of MVS. The processes are:
• Time Sharing Initialization
• LOGON Processing
• TSO Line Drop Processing
• TMP and Command Processor Interface
• TSO Command Processor Recovery
• TSO Terminal I/O Overview
• TSO/TIOC Terminal I/O Diagnostic Techniques
• TSO Attention Processing

Time Sharing Initialization
The system operator issues the MODIFY command (F TCAM,TS=START) to
initialize the time sharing system. Terminal I/O control (TIOC) logic is
documented in OS/VS TCAM Level 10 Logic.
The major functions that occur during time sharing initialization are:

1. The SYS1.PARMLIB member IKJPRMxx is read to determine the TIOC buffer
size and number, the maximum number of time sharing users allowed to be
logged on at one time, and thresholds for the maximum number of TIOC
buffers a single user can use at one time.

2. The main control block for the time sharing system (the TIOC reference table,
TIOCRPT) is initialized. This control block points to the free queue of TIOC
buffers and has status flags indicating whether the system is in an LWAIT (out
of TIOC buffers). The TIOCRPT also points to a pool of terminal status
blocks.

3. The pool of terminal status blocks (TSBs) is built. The number of TSBs is
determined by the maximum user parameter in IKJPRMxx. A TSB is
assigned to a user during logon processing. The TSB connects the ASCB of
the user to the terminal-name table entry of the terminal. From the
terminal-name table entry, TCAM can locate the terminal table entry for the
user and hence the address of the destination QCB. The TSB contains input and
output queues for TIOC buffers that are used by the time sharing user.
The TSB also contains status indicators that record whether the user is in an
input wait (TGET issued and no TIOC buffer on TSB input queue) or an
output wait (maximum number of TIOC buffers used for output).


[Figure A-6 (Part 1) is a diagram of the flow when a terminal user issues
LOGON: in the TCAM address space, TIOC logon (IEDAY3) issues SVC 34
'LOGON' and POSTs the TIOC synchronization routines (IEDAYL and
IEDAYLL); started task control then ATTACHes the LOGON processor,
which receives control via XCTL (STC) in the new user's address space.
Details of this process are shown in part 2 of this figure.]

Figure A-6. Overview of Logon Processing (Part 1 of 2)


[Figure A-6 (Part 2) is a diagram of LOGON scheduling: logon initialization
(IKJEFLA) receives control via XCTL from STC, invokes the installation
exit, and LINKs to the logon scheduler router (IKJEFLB), which ATTACHes
the logon monitor (IKJEFLC). The monitor LINKs to logon verification
(IKJEFLE). The scheduler calls the job scheduling subroutine (IEESB605,
STC), which CALLs the logon information routine (IKJEFLH) and XCTLs to
the pre-TMP exit (IKJEFLJ); the initiator then ATTACHes the TMP, which
issues the "READY" message.]

Figure A-6. Overview of Logon Processing (Part 2 of 2)


4. The TIOC buffer pool is built. The number and size of the buffers is
determined from IKJPRMxx. If no parmlib member was specified on the
MODIFY TCAM command, SYS1.PARMLIB is searched for the default
parmlib member name, IKJPRM00. If this member is not found, standard
default values are used.

5. The 'TSO HAS BEEN INITIALIZED' message is issued (via WTO).
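The member lookup in step 4 is a two-level fallback: the IKJPRMxx member named on the MODIFY command, then the default member IKJPRM00, then built-in default values. A sketch of that selection order (Python; the member names come from the text, while the function, the dictionary model of parmlib, and the parameter values are invented for illustration):

```python
# Fallback order for TIOC buffer parameters (step 4): the IKJPRMxx member
# named on the MODIFY TCAM command, then the default member IKJPRM00,
# then hard-wired standard defaults. Parmlib is modeled as a dictionary.
BUILTIN_DEFAULTS = {"BUFSIZE": 132, "BUFFERS": 24}   # illustrative values

def tioc_parameters(parmlib, requested_member=None):
    if requested_member and requested_member in parmlib:
        return parmlib[requested_member]   # member named on MODIFY
    if "IKJPRM00" in parmlib:              # default member name
        return parmlib["IKJPRM00"]
    return BUILTIN_DEFAULTS                # standard default values

parmlib = {"IKJPRM00": {"BUFSIZE": 80, "BUFFERS": 48}}
print(tioc_parameters(parmlib, "IKJPRM07"))   # member missing: IKJPRM00 used
print(tioc_parameters({}, None))              # no members: built-in defaults
```

The same three-step probe applies to any "named member, default member, hard-wired values" configuration search.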

LOGON Processing

The major functions of LOGON processing are:

1. TCAM handles line I/O and routes the buffer to the TSO message
handler. The message handler routes the buffer to various functional
routines. One of these is logon.

2. The logon routine receives control from the TSO message handler as a result
of the expansion of the LOGON macro. Logon routes the buffer to TSINPUT
so that logon scheduling may retrieve it via a TGET SVC. TIOC logon then
issues an SVC 34 to notify the master scheduler that logon processing should
be started. TIOC then issues QTIP 10 to initialize control blocks. Note: QTIP
is the TIOC code invoked when SVC 101 is issued. It performs functions
related to communication between the TCAM and TSO user address spaces.
The specific function it is to perform is indicated by an entry code (for
example, QTIP 10). A table of entry codes, their callers, the functions
performed, and the modules that provide the function is contained in OS/VS
TCAM Level 10 Logic.

QTIP issues an XMPOST to inform the master scheduler that TIOC
initialization is complete and that memory create may begin. TIOC then
returns to the message handler for final buffer disposition. If logon
fails or is terminated, TIOC is notified so that the appropriate error
message can be issued.

3. TSINPUT invokes QTIP to move the contents of the TCAM buffer to the
TIOC buffers. This data can then be accessed using TGET services.

4. The master scheduler recognizes that a logon has been requested and attaches
TIOC synchronization. This routine waits until QTIP signals with a post that
memory create can begin. Once an address space has been initialized for the
logon request, the region control task is the first task to be dispatched.

5. Region control establishes an ESTAE routine, attaches the dump task, attaches
started task control, and waits for one of the following:

• An attention request signaled by TIOC via XMPOST
• A swap request signaled by SRM
• A termination request


6. Started task control recognizes that logon is requested and passes control to
logon initialization (IKJEFLA).

7. Logon initialization opens the UADS and broadcast data sets, initializes
control blocks, and calls logon scheduling (IKJEFLB).

8. The logon load module contains four service modules. One, IKJEFLPO,
contains the default values for the number of seconds requested between
'LOGON PROCEEDING' messages and the number of logon attempts allowed
before automatic logoff. Both values are sysgen options on the TSO macro.
The logon scheduler attaches the logon monitor (IKJEFLC). The scheduler
and monitor now begin parallel processing. WAITs and POSTs are used when
synchronization is required.

9. The logon monitor (IKJEFLC) builds the environment control table (ECT),
sets the first element of the input stack to indicate terminal input, and links to
logon verification.

10. Logon verification (IKJEFLE) calls the user's pre-prompt exit if it was coded.
Logon verification makes the following checks:
• Determines (via ENQ) if the userid is in use.
• Checks the user's password, account number, and procedure name.
• Checks the performance group requested in the LOGON command.

Logon verification prompts the user for missing parameters if required
parameters do not have defaults in the UADS. After all required parameters
have been obtained, verification builds the JOB and EXEC statement images
for the session. The EXEC statement contains the name of a logon procedure
specified in the UADS or the LOGON command.

11. Logon verification posts the logon scheduler when the parameters are complete
and the job can be scheduled. The scheduler's job now is to cause the
broadcast messages to be listed at the terminal at the same time that the user's
job is being scheduled. To do this, it posts the monitor task and then XCTLs to the
initiator, passing it the JCL that has been created.

12. The logon monitor regains control when signaled by the logon scheduler,
attaches the LISTBC command processor to write broadcast messages to the
terminal, and then waits for a post from a special initiator logon routine.
This post signals that final processing can be completed.

13. The initiator uses the TSO internal reader to send the logon job to JES2.
JES2 reads the user's procedure from the procedure library specified by the
&TSD job class parameter and changes the JCL to internal text. This is placed
on the spool data set. Once this processing has completed, the initiator
requests the user's job by ID and completes initiation and allocation. Initiation
finally gives control to a special TSO routine (the pre-TMP exit, IKJEFLJ). This
routine posts the logon monitor and issues a WAIT. The logon monitor then
terminates. This causes the initiator task to regain control. The logon monitor
is then detached. Once the monitor is detached, the initiator attaches the
TMP and waits.


14. The TMP (specified as IKJEFT01 in the LOGON PROC on the EXEC
statement) performs initialization and then issues a PUTGET to write the
'READY' message and request a command from the user. This PUTGET
results in a TPUT to send READY and a TGET to request terminal input.
The user is now in an input wait. This signals SRM to perform a swap-out
until input is available.

Figure A-7 shows TCAM's organization after a TSO logon. The following are
detailed descriptions of the logon process including information on control block
manipulation. The numbers in parentheses correlate to the numbers in the
preceding summary of the logon process.

TIOC Logon Processing (2):

• Checks the maximum user count in TIOCRPT.
• Issues SVC 34 'LOGON'.
• Places the returned ASID in the QCB for this line.
• Calls QTIP (entry 10) to find and initialize the TSB.
  - Puts the TSB address in the ASCB for the user's address space.
  - Puts the ASCB address in the TSB.
  - Updates the user count.
  - Puts the UCB address in the TSB.
  - XMPOSTs 'TIOC SYNC'.
• Sets the QCB to indicate TSO.
• Passes the logon message buffer to the TSINPUT QCB (which is now available to
system logon processing via GETLINE).


[Figure A-7 is a diagram of TCAM's organization after a TSO logon: common
storage holds the CVT, ASVT, ASCBs, and the TS buffers (with data); TCAM's
address space holds the MCP, the message handler (MH), the TCAM buffers,
and the TSINPUT QCB, which is marked TSO and carries the user's ASID.]

Figure A-7. TCAM Organization After a TSO Logon


Logon Initialization (IKJEFLA) (7): Logon initialization uses the address
of the ASCB as input and does the following:
• Ensures the SYS1.UADS and SYS1.BRODCAST data sets are allocated.
• Gets the LWA (logon work area) from the LSQA. (See Figure A-8.)
• Puts the LWA address in the ASXB.
• Gets the JSEL (job scheduling entrance list) from the LSQA.
• Puts the CSCB and ASCB addresses in the JSEL.
• Gets the JSXL (job scheduling exit list) from the LSQA.
• Puts the LWA address in the JSXL. The JSXL contains pointers to the PRE-TMP,
POST-TMP, and PRE-FREEPART exits.
• Puts the JSXL address in the JSEL.
• Gets the UPT (user profile table) from subpool 230.
• Issues BLDL for the installation exit routine (Release 2 only).
• Gets the PSCB (protected step control block) from subpool 230.
• Puts the PSCB address in the LWA.
• Puts the UPT address in the PSCB.
• Gets the re-logon buffer from subpool 230.
• Puts the re-logon buffer address in the PSCB.
• Calls the logon scheduler router.

Logon Scheduler Router (IKJEFLB) (8):
• Frees subpool 0.
• Attaches the logon monitor.
• Posts the monitor with the 'schedule' code.
• Waits for the 'what to do' post from the monitor.

Logon Monitor (IKJEFLC) (9):
• Switches the storage key to '8'.
• Gets the STAX work area from subpool 1.
• Gets the ECT (environment control table) from subpool 1.
• Puts the ECT address in the LWA.
• Invokes the STACK macro (input is to come from the terminal).
• Gets the new CSCB (command scheduler control block) from the SQA.
• Sets the CSCB to indicate the job is:
  - swappable
  - a terminal job
  - cancellable
  - TSO


[Figure A-8 is a diagram of the logon work area: the ASXB (at offset X'14')
points to the LWA, which contains the LOGON, PROMPT, and SCHED ECBs
and pointers to the PSCB, ECT, RLGB, and UPT.]

*The logon work area (IKJEFLWA) is a 148-byte area that is created by IKJEFLA
and is pointed to by the ASXB and JSXL. It contains control block pointers, entrance
lists, and parameter lists that are required for logon/logoff.

Figure A-8. Logon Work Area


• Gets the local and CMS locks.
• Puts the CSCB address in the ASCB.
• Frees the local and CMS locks.
• Calls MGCR to remove the old CSCB from the chain.
• Puts the new CSCB pointer in the JSCB and JSEL.
• Calls MGCR to add the new CSCB to the chain.
• Issues the STAX macro to set up attention handling.
• Links to logon verification.

Logon Verification (IKJEFLE) (10):
• Calls the installation exit (if necessary).
• Issues GETLINE or uses the installation-supplied buffer containing the logon
parameters.
• Calls the command scan service routine to ensure that input is the LOGON
or LOGOFF command (assumes LOGON).
• Calls PARSE for logon parameter parsing.
• Indicates no password required for the UADS.
• Issues ENQ on the UADS (prevents the ACCT CP from changing the UADS).
• Opens the UADS.
• Issues FIND for the userid member (the userid is taken from the logon parameter).
• Places the userid in the PSCB.
• Posts the logon scheduler.
• Waits for the post from the logon scheduler.

Logon Scheduler (IKJEFLB):
• Enqueues (via ENQ) on SYSIKJUA.USERID.
• Posts logon verification.
• Waits for logon verification.

Logon Verification (IKJEFLE):
• Dequeues (via DEQ) from the UADS.
• Puts the userid in the CSCB.
• Puts the userid in the ASCB.
• Enqueues (via ENQ) on the UADS.
• Finds the userid member.
• Dequeues (via DEQ) from the UADS.
• Reads the UADS.
• Issues CHECK.


• Places the parameter in the proper control block.
• Places the password in the TSB.
• Places the procname in the CSCB.
• Places the region size in the PSCB.
• Informs SRM of the performance group.
• Builds the JCL:
//USERID JOB 'account#',REGION=region size
//procname EXEC procname,PERFORM=performance group
• Issues the 'LOGON IN PROGRESS' message to the terminal.
• Closes the UADS.
• Clears 'NO PASSWORD' in the JSCB.
• Dequeues (via DEQ) from the UADS.
• Posts the logon scheduler to schedule the session. (11)
• Waits for the logon scheduler.
• Sends the broadcast messages (via the information routine). (12)
• Issues the 'LOGON IN PROGRESS' messages until posted by the initiator.
• Frees subpool 78.

Logon Scheduler (IKJEFLB) (11):
• Sets up the interface to JSS.
• Posts the logon monitor.
• XCTLs to JSS (initiator).

Job Scheduling Subroutine (IEESB605) (13):
• Calls the PRE-TMP exit.

PRE-TMP Exit (IKJEFLJ):
• Posts the monitor task to terminate.
• Moves the PSCB from (unaccountable) subpool 230 storage to (accountable)
subpool 252 storage. The PSCB address is placed in the active JSCB.
• Moves the UPT and the re-logon buffer to subpool 0 (allows updating by CPs).
• Returns to the initiator.

Initiator:
• Attaches the TMP (PARM='xxx ... ' is passed).

TMP (14):
• Issues the "READY" message.
• Requests terminal input.


LOGON Scheduling Diagnostic Aids

The following two figures contain information that can be used for diagnosing
problems that occur during logon scheduling.

Field Name         Name of               Common Name of Module
and Contents       Executing Module

LWAINX1  =1        IKJEFLD               Installation Exit (written by installation)
LWALA    =1        IKJEFLA               LOGON Initialization
LWALB    =1        IKJEFLB               LOGON Scheduling
LWALC    =1        IKJEFLC               LOGON Monitor
LWALE    =1        IKJEFLE               LOGON/LOGOFF Verification
LWALEA   =1        IKJEFLEA              Parse/Scan Interface
LWALI    =1        IKJEFLI               Installation Interface
LWALH    =1        IKJEFLH               LOGON Synchronizer
LWALL    =1        IKJEFLL               LOGOFF Processing
LWALGM   =1        IKJEFLGM              LOGON Message Handler
LWALJ    =1        IKJEFLJ               Pre-attach Exit
LWALK    =1        IKJEFLK               Post-attach Exit
LWALG    =1        IKJEFLG               Attention Exit
LWALGB   =1        IKJEFLGB              LOGON Monitor Recovery
LWALS    =1        IKJEFLS               LOGON Scheduling Recovery and Retry
LWALTBC  =1        IKJEFLH               Mail and Notices Processing
LWAMCK   =1        IKJEFLGB              ABEND was a machine check
LWAPCK   =1        IKJEFLGB              ABEND was a program check
LWAPHASE =0        Any LOGON module      LOGON/LOGOFF Verification
                   except IKJEFLH
LWAPHASE =1        IKJEFLH               LOGON Synchronizer
LWAPSW             IKJEFLGB              Console Restart key depressed
LWATNBT            IKJEFLG               Attention Routine

Figure A-9. LOGON Work Area Bits That Indicate the Currently Executing Module
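When reading a dump, Figure A-9 is used backwards: given which LWA indicator bits are on, name the module that was executing. A tiny lookup in that spirit (Python; the bit names and module names come from the figure, while the function and the dictionary are illustrative, and only a few of the figure's entries are reproduced):

```python
# Map LOGON work area indicator bits (Figure A-9) to the executing module.
# Only a subset of the figure's entries is shown here.
LWA_BITS = {
    "LWALA": "IKJEFLA",     # LOGON Initialization
    "LWALB": "IKJEFLB",     # LOGON Scheduling
    "LWALC": "IKJEFLC",     # LOGON Monitor
    "LWALE": "IKJEFLE",     # LOGON/LOGOFF Verification
    "LWALJ": "IKJEFLJ",     # Pre-attach Exit
}

def executing_modules(lwa_flags):
    """Given the set of LWA bits found on in a dump, list the modules
    they indicate, in a stable (sorted) order."""
    return sorted(LWA_BITS[f] for f in lwa_flags if f in LWA_BITS)

print(executing_modules({"LWALC", "LWALE"}))
```

Several bits can be on at once (for example, the monitor bit stays on while verification runs under it), so a set-to-list mapping models the dump reality better than a single-value lookup.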


Module    Module    Location   Post   Condition of              Action Taken by
Issuing   Being     of         Code   Module Issuing            Module Being
POST      Posted    ECB               POST                      Posted

IKJEFLB   IKJEFLC   field      16     Ready to invoke the job   Invoke the LOGON
                    LWASECB           scheduling subroutine     information routine
                    in LWA            (IEESB605).               (IKJEFLH).

                               24     Terminating for LOGOFF    Perform clean-up
                                      or for unusual            operations and
                                      termination of the        terminate.
                                      LOGON monitor
                                      (IKJEFLC).

IKJEFLC   IKJEFLB   field      12     Termination or            Issue DEQ on the user
                    LWAPECB           attention requested.      identification.
                    in LWA

IKJEFLE   IKJEFLB   field      16     Verified and processed    Schedule a terminal
                    LWAPECB           the LOGON                 session.
                    in LWA            parameters.

                               24     Processing a LOGOFF       Terminate.
                                      command.

                               8      Authorized the user       Issue ENQ on the user
                                      identification.           identification.

                               12     Error processing.         Issue DEQ on the user
                                                                identification.

IKJEFLJ   IKJEFLH   field      20     Detects that the          Finish LISTBC
                    LWASECB           initiator is ready to     processing; return
                    in LWA            attach the TMP.           to the caller.

IKJEFLH   IKJEFLJ   field      20     Finished LISTBC           Terminate so the
                    LWAPECB           processing.               initiator can attach
                    in LWA                                      the TMP.

Figure A-10. LOGON Scheduling Post Codes


TSO Line Drop Processing

The following description corresponds to the overview of line drop processing
shown in Figure A-11.

IEDAYH (Part of TCAM MCP):

• Gets control from the TCAM dispatcher when either of the following occurs:
  - A hang up on a monitoring channel program or a message generation.
  - Each input or output message ends.
• Tests for and handles several kinds of errors. If it discovers the line has dropped,
it begins terminating the user. Each of the following is considered a line drop:
  - Entry because of a hang up on a monitoring channel program or a message
    generation
  - A 3705 control unit error, indicated in the SCB (station control block)
  - A permanent terminal error, indicated in the SCB
  - A countable error for which an appropriate number of retries have been done,
    indicated in the SCB
• If a line drops, issues a QTIP 4 (SVC 101, entry code 4).

QTIP 4 (IEDAYHH):

• Sets TSBHUNG=1.
• Issues QTIP 28 to free the TCAM buffers.
• If the reconnect time limit is 0 (in TIOCRPT), branch enters SIC (system-initiated
cancel, IKJEFLF) with code 622; upon return, returns to the caller.
• For a non-zero RECONLIM:
  - Sets TSBMINL equal to the reconnect time limit.
  - If TIOCTECB (in TIOCRPT) is posted, increases the value in TSBMINL
    by one. Otherwise, posts TIOCTECB (which IEDAY802, running as a
    subtask of TCAM, is waiting for).
• Returns.


[Figure A-11 is a diagram of a line drop in the TSO environment: in the TCAM
address space, IEDAYH in the TCAM MCP invokes QTIP 4 (SVC 101), which
POSTs IEDAY802 (a subtask of TCAM) and calls SIC (IKJEFLF); SIC
SCHEDULEs an SRB into the user's address space, where IKJL4T00 POSTs
the initiator, which issues SVC 34; SVC 34 issues CALLRTM to ABEND the
TMP and its command processor. Upon return, the initiator continues with
normal logoff.]

Figure A-11. Overview of TSO Line Drop Process


IEDAY802 (subtask of TCAM): Keeps track of users whose lines have dropped
and, if the time limit expires before they come back, terminates the address space.
IEDAY802 does the following:
• Waits for TIOCTECB.
• Sets the one-minute timer.
• Invokes QTIP 27 (IEDAY88, SVC 101, entry 27), which scans the TSBs for
TSBHUNG=1 and TSBMINL not zero.
  - If so, QTIP 27 decreases TSBMINL by 1.
  - If TSBMINL is now 0, QTIP 27 branch enters SIC (system-initiated cancel)
    with code 622.
  - QTIP 27 returns a code of 0 if any users have time left or a code of 4 if all
    users have been cancelled.
• If the return code is 0, IEDAY802 goes to the one-minute timer.
• If the return code is 4, IEDAY802 waits for TIOCTECB.

SIC (system-initiated cancel):

IKJEFLF schedules an SRB in the address space to be terminated, passes a
completion code (622 for line drop), and returns to the caller.

IKJL4T00 runs under the SRB scheduled by IKJEFLF and gets control the next
time the address space is dispatched. IKJL4T00 does the following:
• If the TMP is in control, skips to the POST.
• Issues STATUS STOP for TCB= (IWAIT/OWAIT dispatchability bits).
• Issues QTIP 24, which sets TSBCANC=1 and removes a WAIT for other address
spaces TPUTing to this user.
• POSTs the cancel ECB in the CSCB (IKJL4T00 branch enters POST with
completion code 622). The initiator (IEFSD263) waits for this ECB while the TMP
is in control.
• If the TMP is not in control, issues STATUS START for the logon scheduler and
monitor tasks.
• Exits.
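The QTIP 27 scan above is a per-minute countdown over hung terminals. A compact model of one scan pass (Python; TSBHUNG, TSBMINL, the 622 completion code, and the 0/4 return codes come from the text, while the function, the dictionary layout of a TSB, and the user IDs are illustrative):

```python
# One QTIP 27 pass: for every TSB with TSBHUNG=1 and TSBMINL not zero,
# decrement TSBMINL; when it reaches zero the user is cancelled via SIC
# with code 622. Return 0 if any user still has time left, else 4.
def qtip27_scan(tsbs, cancelled):
    time_left = False
    for tsb in tsbs:
        if tsb["TSBHUNG"] and tsb["TSBMINL"] != 0:
            tsb["TSBMINL"] -= 1
            if tsb["TSBMINL"] == 0:
                cancelled.append((tsb["userid"], 622))  # SIC, code 622
            else:
                time_left = True
    return 0 if time_left else 4

tsbs = [
    {"userid": "USER1", "TSBHUNG": True,  "TSBMINL": 2},
    {"userid": "USER2", "TSBHUNG": True,  "TSBMINL": 1},
    {"userid": "USER3", "TSBHUNG": False, "TSBMINL": 0},  # not hung: skipped
]
cancelled = []
print(qtip27_scan(tsbs, cancelled))   # USER1 still has time left -> 0
print(qtip27_scan(tsbs, cancelled))   # everyone done -> 4
print(cancelled)
```

A return code of 4 is what sends IEDAY802 back to its long WAIT on TIOCTECB instead of resetting the one-minute timer.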

Initiator (IEFSD263):
• Waits for the CANCEL ECB and ATTACH ECB of the TMP task.
• When the CANCEL ECB is posted, issues SVC 34 to abnormally terminate the
user.


SVC 34:

• Issues CALLRTM, which sets the resume PSW of the TMP task to point to an
SVC D instruction and forces the TMP task to be dispatchable.

SVC D (RTM2):

• Oversees the termination of the TMP task and all daughter tasks.
• When the TMP task terminates, its attach ECB is posted, giving the initiator
control again.

Initiator:

Processing continues the same as for normal logoff except:
• IKJEFLK, the POST-TMP exit module, issues QTIP 24.
• IKJEFLC issues the "session cancelled" message before the logon scheduler
XCTLs to the STC termination.
• If the line drops, IEDAY8, the TIOC resource manager, does not force the
remaining messages out.

TMP and Command Processor Interface

The following is a description of the TMP and command processor flow.

1. The TMP is attached by the initiator as a result of a logon command from a
terminal user or the execution of a batch job. Logon initialization establishes
the STAE environment to handle abends and the STAX exit to handle
attention interrupts.

2. The TMP mainline routine receives control and determines which buffer to
obtain. This can be either:
a. The logon buffer (from PARM= on the EXEC statement of the logon
procedure)
b. The command buffer, as a result of a PUTGET
c. The buffer obtained by the attention prolog
d. The buffer obtained by the STAI exit

3. If the current input is the command buffer, the TMP must check for five
special cases as follows:
a. PUTGET is responsible for checking for a '?' in the first buffer position in
response to a mode message. When one is detected, PUTGET immediately
issues the next available second-level message. The TMP should never
receive a '?' in a buffer, but if the user enters a blank followed by a '?',
PUTGET passes the buffer through to the TMP.
b. A null line.
c. TEST command without operands.
d. TIME command.
e. If scan determines that the data in the buffer is not one of these special
cases and that the data begins with an alphabetic character and is less
than eight bytes, the TMP issues an ATTACH for the command name.
Prior to ATTACH processing, a search is conducted (through MLPA, LPA,
joblib, LNKLSTxx, respectively) to assure a successful ATTACH. If the
ATTACH is not successful, the TMP assumes a CLIST and attaches the
EXEC CP to search the user's command procedure library. If the TMP
does not locate either a command or a command procedure whose name
is the same as that found in the input buffer, a 'COMMAND NOT
FOUND' message is issued to the terminal.

4. If the command processor was attached, the TMP waits on an ECB list
containing the following ECBs:
a. STAI ECB: The TMP's STAI exit routine posts this when a command
processor abnormally terminates and does not recover with its own STAE
routine.
b. Attention exit ECB: The TMP's attention exit routine posts this when it
gains control. It gains control when the user enters an attention interrupt
and the TMP exit is the current level exit. For more details, see the
discussion of "TSO attention processing" later in this chapter.
c. STOP/MODIFY ECB: This ECB is posted if a STOP userid is requested
by the system operator.
d. Command processor ECB: This is the ECB specified in the attach of the
command processor. It is posted when the processor terminates.

5. If the command processor ECB is posted, the TMP repeats step 2 to
determine what action to take.

6. If the attention exit or STAI ECB is posted, the TMP does one of the
following:
a. If a blank followed by a '?' was entered in response to the mode message,
the TMP sends second-level messages to the terminal.
b. If a null line was entered, the TMP returns control to the command processor.
If an attention interrupt occurred, the TMP continues normal processing.
If an abend occurred, the TMP takes a dump.
c. If TEST was entered without operands, the TMP links to TEST and places
the interrupted command processor under test control. When TEST
processing is ended, the TMP detaches the current command and prompts
the user with a 'READY' to enter a new command.
d. If the TIME command was entered, the TMP displays the current time and
prompts the user for a new command. In this case, the user can exercise
any of the preceding options or enter a new command.
e. If the user enters a new command or exercises one of the preceding
options, the TMP detaches the current command and issues a PUTGET
requesting new input.
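The TMP's main loop in steps 4 through 6 is a wait on a list of ECBs, with the action taken depending on which ECB was posted. The sketch below models only that dispatch structure (Python; the four ECB names come from the text, while the function, the action strings, and the scan order are illustrative; the order shown is not a documented precedence):

```python
# Model of the TMP waiting on an ECB list (step 4): whichever ECB is
# found posted determines the action taken (steps 5 and 6).
ECB_ACTIONS = {
    "STAI":      "take dump or recover",        # command processor abended
    "ATTENTION": "run attention processing",    # user hit the attention key
    "STOP":      "terminate for operator STOP", # STOP userid from operator
    "CP":        "get next command buffer",     # command processor ended
}

def tmp_wait(posted_ecbs):
    """Return the action for the first posted ECB, scanning the list
    in a fixed (illustrative) order; None means keep waiting."""
    for name in ("STAI", "ATTENTION", "STOP", "CP"):
        if name in posted_ecbs:
            return ECB_ACTIONS[name]
    return None

print(tmp_wait({"CP"}))
print(tmp_wait({"ATTENTION", "CP"}))   # illustrative scan order applies
```

When debugging a hung TMP, the question is exactly the one this sketch poses: which of the four ECBs, if any, has been posted, and which one is the task actually waiting on.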


The following common control blocks are used for communication among the
TMP, command processors, and service routines (PUTGET, PARSE, etc.):

IKJTMPWA (TMP Work Area)

Created by: IKJEFT01
Length: 1076 bytes
Pointed to by: TMPWAPTR, WORKAPTR

Function: Provides communication among TMP modules. Contains register save
areas, parameter lists for TEST and the TMP, ABEND exit routines, and mappings
of macros commonly used by TMP modules, including the TPL mapping and
pointers to the CPPL, ECT, PSCB, and UPT.

IKJCPPL (Command Processor Parameter List)

Created by: IKJEFT01
Length: 16 bytes
Pointed to by: Register 1 (CPPL), CBUF

Function: Provides parameters for the command processor. Contains pointers
to the UPT, PSCB, and ECT.

IKJECT (Environment Control Table)

Created by: IKJEFT01
Length: 40 bytes
Pointed to by: TPL, CPPL

Function: Provides communication among the TMP, CP, and service routines.
Contains the current command and subcommand names, pointers to work areas
(IOWA) and second-level message (SMSG) chains, and return codes.

IKJPSCB (Protected Step Control Block)

Created by: IKJEFLA
Length: 72 bytes
Pointed to by: LWA, CPPL

Function: Contains information from the UADS, control bits, and accounting
data for the user ID, together with the user ID itself and pointers to the RLGB
and UPT. (This accounting data is controlled by the installation via the
ACCOUNT command.)

IKJRLGB (Re-Logon Buffer)

Created by: IKJEFLA
Length: 264 bytes
Pointed to by: PSCB

Function: Contains the LOGON/LOGOFF command entered at the terminal at
the end of the session, and a pointer to the ECT.

IKJUPT (User Profile Table)

Created by: IKJEFLA
Length: 24 bytes
Pointed to by: PSCB, CPPL

Function: Contains information stored in the UADS that is used by
LOGON/LOGOFF, the TMP, and the command processors, such as the user
environmental switches, the line delete and character delete characters, and the
DSNAME prefix. (This information is all controlled by the installation via the
PROFILE command.)

TSO Command Processor Recovery
The following describes IBM's TSO command processors. Figure A-12 summarizes
their recovery activity.

ACCOUNT
The STAE exit routine for ACCOUNT flushes the input stack and posts the
ACCOUNT ECB before returning to continue abend processing. ACCOUNT
attaches the HELP command processor, specifying for a STAI exit routine the
same name as the STAE exit routine.

EDIT
The ESTAE exit routine for EDIT flushes the input stack, stops automatic line
prompting, and frees any acquired storage still remaining. The EDIT work area,
mapped by IKJEBECA, can be located in a dump to obtain certain data on the
EDIT session. The pointer to the communication area is passed between routines
in register 0. By convention, most routines keep the pointer in register 9 during
execution. A description of IKJEBECA can be found in the data areas microfiche
(OS/VS2 Data Areas).

LOGON
The ESTAE exit routine for LOGON dequeues from the userid, closes the UADS
data set, and detaches IKJEFLC. The LOGON work area, mapped by IKJEFLWA,
can be located in the dump (field ASXBLWA in the ASXB) to obtain certain
information on the session. A description of IKJEFLWA can be found in the data
areas microfiche.
LOGON also has an ESTAI exit routine, which dequeues from the userid,
closes the UADS data set, cancels the attention exit, and frees subpools 1 and 78.

OPERATOR
The STAE exit routine for OPERATOR stops all active monitor functions if
the abend is caused by a DETACH with STAE. OPERATOR also has a STAI
exit routine that has the same name as the STAE exit routine.
The SVC 100 parameter list, mapped by IKJEFFIB and passed to the
OPERATOR command processor, can be located in the dump and certain data on
the session can be obtained. A description of IKJEFFIB can be found in the data
areas microfiche.



OUTPUT
Before returning to continue abend processing, the ESTAE exit routine for
OUTPUT closes any data sets that are being processed. The OUTPUT work
area, mapped by IKJOCMTB, can be located in a dump (while OUTPUT is in
control) and certain data on the session can be obtained.
OUTPUT attaches the HELP command processor, specifying a STAI exit routine.
The STAI exit routine simply returns to continue abend processing.

SUBMIT
The SUBMIT command processor runs under the STAI environment established by
SVC 100. This STAI routine closes the INTRDR data set before it returns to
continue abend processing. The SVC 100 parameter list, mapped by IKJEFFIB
and passed to the SUBMIT command processor, can be located in the dump and
certain data on the session can be obtained. A description of IKJEFFIB can be
found in the data areas microfiche.
Command
Processor     STAE/ESTAE    STAI/ESTAI    Messages

ACCOUNT       STAE          STAI          IKJ56554I
EDIT          ESTAE                       IKJ56554I, IKJ56451I, IKJ56452I,
                                          IKJ56406I (see Note 1)
LOGON         ESTAE         ESTAI         IKJ56452I
OPERATOR      STAE          STAI          IKJ55004I (see Note 2)
OUTPUT        ESTAE         STAI          IKJ55004I, IKJ56318I (see Note 3)
SUBMIT                      STAI          IKJ56294I

(The original figure also has RETRY, SDUMP, and LOGREC columns; the SDUMP
and LOGREC activity is summarized in Notes 2 and 3 below.)

Notes:
1. Abend codes B37, D37, and E37 point to IKJ52427I, IKJ52428I; the others point to
IKJ52422I. If the data set is modified, abend codes point to IKJ52555I.
2. SDUMP is issued for all abends except for DETACH with STAE, codes 437, 913, and
422.
3. LOGREC is written to except for DETACH with STAE.
4. An effective trapping and problem solving technique for TSO command processors is to
stop the error processing in the appropriate error recovery routine.

Figure A-12. Summary of Command Processor Recovery Activity



TSO Terminal Input/Output Overview
Terminal I/O flow is divided into two parts: input flow and output flow. This
overview highlights each at the SVC level.
TS/TCAM uses the services of three SVCs to communicate between the user's
address space and the TCAM address space:

1. TGET/TPUT (SVC 93): The TMP and command processors issue this SVC to
   move data from the user's buffer to an interface buffer in CSA (TIOC buffer).

2. QTIP (SVC 101): This SVC is a set of multipurpose routines that perform
   functions for both the user address space and the TCAM address space. For
   example, QTIP is used by TCAM to move data from a TCAM buffer to an
   interface (TIOC) buffer and is also used by TGET/TPUT to move data from a
   user's buffer to a TIOC buffer.

3. STCC (SVC 94): This SVC is a set of routines used to update TCAM control
   blocks from the user's address space. For example, the user can use the
   TERMINAL command to change a terminal characteristic. This is
   communicated to TCAM via SVC 94.

TS/TCAM data flow also requires a logical connection between a terminal, a
line, and an address space. This is accomplished as follows:
•  The terminal macro in the user's MCP establishes the connection between a
   terminal name and its destination (destination QCB).

•  At TCAM initialization, OPEN establishes the connection between the
   destination and a physical terminal (a line control block is connected to the
   terminal name table via an index into the table).

•  Logon processing establishes the connection between the destination QCB and
   the user's address space (the destination QCB contains the ASID of the user,
   and the user's terminal status block (TSB) contains an index to the TCAM
   terminal name table). Also, a user's TSB and ASCB point to each other. The
   station's control block contains the address of the TSINPUT QCB.

Terminal I/O flow also requires the use of two special TCAM subtasks:
TSINPUT and TSOUTPUT. TSOUTPUT acts as the router for all messages
coming from time sharing users. TSOUTPUT is responsible for editing output
messages as it moves the data from the time sharing interface buffers (TIOC
buffers) in CSA to the TCAM buffers in the TCAM address space. Once
TSOUTPUT has moved data to the TCAM buffers, the buffer is routed to the
output side of the message handler and then written to the terminal.
TSOUTPUT also runs as a subroutine of TCAM. TSOUTPUT is the first
subroutine in control of the disk I/O QCB in a TCAM system that supports time
sharing.



Terminal Output Flow
Assume that a user has logged on, the TMP has been initialized, and a PUTGET has
been issued by the TMP to put out a 'READY' message and request input from the
terminal user. The following now occurs:
1. The TMP invokes the services of the PUTGET service routine, which issues a
   TPUT and then a TGET (both SVC 93s). TPUT performs the following basic
   functions:

   a. Obtains a TIOC buffer from the pool of free buffers. If a buffer is not
      available or the user has passed the output buffer limit (OWAITHI
      parameter in IKJPRM00), the user is placed in an output wait (the
      appropriate flag is set in the TSB).

   b. If a buffer is available, the 'READY' message is moved from the user's
      buffer to the TIOC buffer.

   c. The user's terminal status block is placed on TCAM's asynchronous ready
      queue. (A special element at TSB + X'40' is used.)

   d. An XMPOST is done to alert TCAM.

   e. Control is returned to PUTGET.

2. When the TCAM address space is dispatched and the MCP TCB regains control,
   TCAM searches its asynchronous ready queue and discovers the user's TSB.
   However, because this is a TS/TCAM system, TSOUTPUT receives control
   instead of the disk I/O routine. TSOUTPUT performs the following functions:

   a. Builds TCAM buffers from basic TCAM buffer units.

   b. Uses QTIP services to move the TIOC buffer from the TSB header queue
      (queue of complete output messages) to the TSB output trailer queue
      (queue of TIOC buffers being moved).

   c. Uses special TIOC edit routines (not QTIP) to move and edit data from the
      TIOC buffer to the TCAM buffer.

   d. Once the data has been moved into the TCAM buffers, the TCAM buffers
      are routed to the output side of the message handler and are then written
      to the terminal. After the message is successfully written, the TIOC
      buffers are freed via a subsequent call to TSOUTPUT.
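The decision sequence of step 1 can be modeled in present-day pseudocode (Python is used here purely as notation). The buffer-pool shape, queue names, and the OWAITHI value of 2 are illustrative assumptions, not TIOC internals:

```python
# Illustrative model of TPUT (steps 1a-1e): move a message from the user's
# buffer into a TIOC buffer in CSA and alert TCAM. All names here are
# hypothetical; only the decision sequence mirrors the text above.
from collections import deque

OWAITHI = 2  # assumed output-buffer limit (OWAITHI parameter in IKJPRM00)

class TSB:
    def __init__(self):
        self.output_queue = deque()  # TIOC buffers awaiting TSOUTPUT
        self.owait = False           # the "output wait" flag in the TSB

def tput(tsb, free_pool, async_ready_queue, user_buffer):
    """Return True if the message was queued, False if the user enters OWAIT."""
    # 1a: no free TIOC buffer, or the user is over the buffer limit -> OWAIT
    if not free_pool or len(tsb.output_queue) >= OWAITHI:
        tsb.owait = True
        return False
    # 1b: move the message from the user's buffer to a TIOC buffer
    tioc_buffer = free_pool.pop()
    tioc_buffer["data"] = user_buffer
    tsb.output_queue.append(tioc_buffer)
    # 1c: place the TSB on TCAM's asynchronous ready queue
    async_ready_queue.append(tsb)
    # 1d: an XMPOST would alert TCAM here; 1e: control returns to PUTGET
    return True
```

A second TPUT issued after the pool is exhausted would return the OWAIT case, matching step 1a.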



Terminal Input Flow
The following process can run in parallel with step 2 in the preceding section,
"Terminal Output Flow." It starts when control is returned to PUTGET as
described at the end of step 1 in that section.
1. PUTGET issues a TGET to obtain input. TGET (SVC 93) performs the
   following functions:

   a. Checks to determine if there is an input buffer on the user's terminal
      status input queue. TCAM normally allows users of remote terminals to
      enter input while the current input is being processed. Therefore, it is
      possible that input could be 'stacked' and an input buffer found on the
      TSB input queue. However, TCAM does not allow local devices to 'stack'
      input. In this case, assume a local device and no buffer on the TSB input
      queue.

   b. Therefore, TGET notifies SRM that an input WAIT has been entered
      and sets the appropriate flag in the TSB (IWAIT condition).

   c. SRM eventually performs a swap-out on the user.

2. The user now enters a new command at the display station and hits 'ENTER'.
   TCAM handles the interrupt and associates it, via the LCB, with a terminal
   name table index, terminal table entry, and destination QCB.

3. The TCAM buffer is routed to the input side of the appropriate message
   handler (determined from the DCB for the line). The message handler
   normally translates the data from line code to EBCDIC. The message handler
   must locate the destination QCB of the terminal that issued the message and
   also check that the terminal is logged on to time sharing. If it is logged on,
   the message handler routes the buffer to TSINPUT as the common input
   destination for all time sharing messages.

4. TSINPUT performs the following functions:

   a. From the ASID value in the terminal's destination QCB, TSINPUT
      determines which address space should receive a particular message.

   b. TSINPUT obtains a TIOC buffer from the free buffer pool. If no TIOC
      buffers are available, the TCAM buffer is chained from a special queue in
      the TSINPUT QCB until TIOC buffers are made available. In this case,
      the time sharing system is placed in an LWAIT (out of TIOC buffers).

   c. If a TIOC buffer is available, TSINPUT uses the services of QTIP to move
      data from the TCAM buffer to the TIOC buffer. Most line control
      characters and all 3270 buffer control characters are edited out of the
      message during this move.

   d. SRM is notified that the user is no longer in an input wait and may be
      swapped in.

   e. The TCAM buffer is routed to the buffer disposition routine for final
      processing.

5. Once the TCAM buffer has been freed and final cleanup has been performed
   on the line, TCAM searches for additional work on the work-to-do queues.
   If there is none, TCAM enters a wait.

6. Once SRM has swapped in the user, TGET regains control. Using QTIP,
   TGET moves the data from the TIOC buffer to the user's buffer.
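The TGET half of the exchange reduces to a small check-or-wait pattern. The sketch below models steps 1a-1c and step 6 only; the queue shape and the notification callback are assumptions made for illustration:

```python
# Illustrative model of TGET: check the TSB input queue for stacked input;
# if nothing is stacked, flag an input wait (IWAIT) so SRM can swap the
# user out. This is notation for the flow above, not TIOC code.
from collections import deque

def tget(tsb_input_queue, notify_srm):
    """Return stacked input if present; otherwise enter IWAIT and return None."""
    if tsb_input_queue:                 # 1a: an input buffer is already stacked
        return tsb_input_queue.popleft()
    notify_srm("IWAIT")                 # 1b: tell SRM an input wait was entered
    return None                         # 1c: SRM will eventually swap the user out
```

A remote terminal with stacked input takes the first branch; the local-device case in the text takes the IWAIT path until TSINPUT queues a buffer and the user is swapped back in (step 6).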

TSO/TIOC Terminal I/O Diagnostic Techniques
For terminal hangs or interlocks involving TSO terminal I/O, a good place to start
is at the TSB and TIOCRPT. The TSBs are physically contiguous and adjacent
to the TIOCRPT (all in CSA), as shown below:

[Diagram: the TCX (TCAM CVT extension) at offset +24 points to the TIOCRPT
(reference pointer table); the TSBs follow the TIOCRPT.]
TIOCRPT is described in the Debugging Handbook. TSB is described in OS/VS2
Data Areas (microfiche). TIOC is described in OS/VS TCAM Level 10 Logic.
TSBOWIP and TSBWOWIP are used to serialize TPUTs to a user. TSBOWIP is
set at the start of a TPUT SVC, while that SVC holds the local and CMS locks. If
another TPUT is issued before OWIP is reset, then WOWIP is set and the issuer of
the second TPUT is put in OWAIT.
The task that has "seized the TSB" (that is, set OWIP) can be determined by
checking TSBCTCB. (TSBTJIP and TSBTJOW serve approximately the same
function for cross-memory TPUTs.)
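The OWIP/WOWIP serialization just described can be sketched as a tiny state machine. The field names mirror the text; the class itself and its methods are hypothetical, meant only to make the seize/wait/release sequence concrete:

```python
# Sketch of TSBOWIP/TSBWOWIP serialization: the first TPUT sets OWIP and
# records its task in TSBCTCB; a second TPUT arriving before OWIP is reset
# sets WOWIP and must OWAIT. Not IBM code -- an illustrative model only.
class TsbSerializer:
    def __init__(self):
        self.owip = False    # TSBOWIP: a TPUT is in progress for this user
        self.wowip = False   # TSBWOWIP: a second TPUT is waiting
        self.ctcb = None     # TSBCTCB: the task that "seized the TSB"

    def begin_tput(self, tcb):
        """Return True if this TPUT may proceed, False if it must OWAIT."""
        if self.owip:
            self.wowip = True     # someone else holds the TSB
            return False
        self.owip = True          # seize the TSB (done under local + CMS locks)
        self.ctcb = tcb
        return True

    def end_tput(self):
        self.owip = False         # reset OWIP; a waiter may now proceed
        self.ctcb = None
```

In a dump, the analogous check is manual: OWIP set plus WOWIP set means TSBCTCB names the seizing task and some other issuer is sitting in OWAIT.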

TSO Attention Processing
The following section summarizes the processing of TSO attentions. The numbers in
parentheses correlate to the numbers in Figure A-13.
TCAM Channel End Appendage (1):
•  Ensures TCAM is active.
•  Finds the element associated with this terminal.
•  Places the element on the asynchronous queue.
•  The TCAM dispatcher merges the asynchronous queue with the ready queue and
   gives control to the message handler.
•  TCAM recognizes the following forms of terminal attention interrupts:
   -  I/O attention interrupt for a 2741, which is checked in the line end
      appendage.
   -  Two separate interrupts for the 3270: (1) a keyboard-invoked I/O
      attention interrupt, followed by (2) an I/O complete interrupt for
      the read issued by TCAM in response to the first interrupt.
   -  A user character string for a simulated attention, which is checked by the
      SIMATTN routine.

[Figure A-13 flowchart: a hardware attention enters at (1) the TCAM channel
end appendage, and a simulated attention at (2) the message control program
(MCP); both flow to (3) the message handler, then to (4) the TIOC attention
handler, (5) the QTIP attention handler, and (6) the RCT (time sharing),
which drives (7) the RCT attention scheduler, (8) the selected user STAX
exit (established when the user issues STAX), and (9) the RCT attention
exit.]

Figure A-13. TSO Attention Flow


Simulated Attention (2):
The message control program (MCP) reads input from the terminal the same as it
does for normal operation. It then passes the message to the message handler.

Message Handler (MH) (3):
•  Checks for the following conditions and calls TIOC if any exist:
   -  Terminal input (character string)
   -  PA1 function key
   -  Terminal output lines

TIOC Attention Handler (4):
•  Ensures TSO is active.
•  Gets the user's TSB.
•  Checks if the attention was caused by a deleted line.
•  Invokes QTIP (TIOC/TSO interface).

QTIP Attention Handler (5):
•  Checks if the user has issued any STAX macros.
•  Ensures the number of unprocessed attentions does not exceed the number of
   active STAXs (causes '!'=TOO MANY ATTENTIONS or
   '!'=ATTENTION ACCEPTED to be printed at the terminal).
•  Posts the RCT to schedule the user's attention exit.
•  Purges input and output message queues to/from the user, except ASID type
   messages.

RCT (Region Control Task) (6):
•  Waits for:
   -  Termination
   -  QUIESCE/RESTORE
   -  Attention


RCT Attention Scheduler (7):
•  Cancels previously-scheduled attentions that have not been executed.
•  Determines the current attention level requested.
•  Disables any affected tasks.
•  If OBUF and/or IBUF was specified on the STAX macro, issues TPUT and/or
   TGET.

User STAX Exit (8):
•  User defined.

RCT Attention Exit (9):
•  Enables any affected tasks.
•  Checks for another attention pending.
•  RCT enters a wait.

TSO APAR Documentation
TSO APAR documentation should include:
•  Terminal input and output.
•  SYSUDUMP or stand-alone dump, as appropriate.
•  Information about how the system differs from the PID release in the TSO area:
   -  PTF list.
   -  Information about non-IBM commands that appear in terminal output.
   -  Description of any TMP modifications.
   -  Description of applicable installation exits (LOGON, SUBMIT, etc.).
•  A listing of the logon procedure, with a list of member names in STEPLIBs,
   if any.


Appendix B: Stand-alone Dump Analysis

This appendix contains a procedure that has been used successfully in stand-alone
dump analysis. It is part of the course material in Field Engineering classes that
teach MVS problem determination. This procedure does not attempt to cover all
situations, but it can be used as a guide through major status areas until you become
thoroughly familiar with the system.

Overview
Stand-alone dumps are generally taken by the operator when he detects:
•  That the system has stopped in a solid wait state with a wait state code.
•  What appears to him to be a system loop.
•  That the system is not running or is running slowly.
Usually the 'Title From Dump' reflects what the operator thought happened.
Before becoming too involved in the problem itself, it is a good practice to get
some feel for the status of the system at the time the dump was taken. Some
valuable system status indicators can be obtained from the formatted portion of the
dump under "System Summary" (produced by the SUMMARY control statement)
and from the CSD, PSA, LCCA, and PCCA (produced by the CPUDATA control
statement). Although it is seldom that any one indicator definitely points out the
problem, when all indicators are noted and analyzed, a pattern might emerge that
points the problem solver to the proper area for further investigation.
The enabled wait generally occurs as a result of the lack of some critical system
resource. If the PRINT statement of PRDMP is used, PRDMP identifies the
current task. If the current task is the wait task, the message "Current Task = Wait
Task" will appear.
If it appears you have an enabled wait condition, read the chapter on "Waits"
in Section 4 of this book before proceeding with your analysis.
The system can appear to be, or actually prove to be, bottlenecked because the
operator cannot communicate with MVS. This is the sign of a problem almost
anywhere in MVS, but an error in the communication task or its associated
processing might be the direct cause. The communication task runs as a task in the
master scheduler's address space, usually represented by the third TCB in the
formatted portion of the stand-alone dump; it is identified by a X'FD' in the
TCBTID field (TCB+X'EE'). By inspecting the RB structure associated with this
task, you can determine the current status. It is not unusual to find one RB with a
resume PSW address in the LPA and an RB wait count of one. If more than one RB
is chained from the TCB and you could not enter commands, analyze the RB
structure, as this is not a normal condition.

Appendix B: Stand-alone Dump Analysis

B.1.1

Appendix B: Stand-alone Dump Analysis (continued)
Remember that communications task processing is very dependent on the rest
of the operating system. Probably some external service or process has caused the
communications task to back up, and this possibility should be investigated.
For the system to continue execution, the major components must be
operational. If any critical system component, such as the master scheduler, ASM,
JES2, or TCAM for TSO, terminates abnormally and fails to recover, the system
cannot continue normal operation. Usually this can be determined from the records
in SYS1.LOGREC. However, check the TCB summary in the formatted section for
completion codes.
The presence of a TCB completion code does not positively identify the
associated task as being inoperative. It is possible that the completion code is
residual and the task has recovered. The presence of a completion code makes
the task suspect, however, and should be investigated.
Unless the operator STORE STATUS command was issued before taking the
dump, or the "Title from Dump" reflects a WSC (wait state code), it can be difficult
to determine if a WSC exists and what it is if it does.
If, however, the WSC PSW is dispatched by NIP during IPL, it is generally
located in one of two places:
•  In the MCH new PSW, if a program check occurred prior to RTM initialization.
•  In the nucleus vector table (NVT + X'E0'), in the case of a system-detected
   error during the NIP process.
The other WSCs (they are few in number) issued by the system are dispatched
by the master scheduler, the communications task, and ASM. The current address
space should identify who loaded the WSC PSW. WSC PSWs are issued when the
system determines that it cannot continue. They are usually preceded by other
error indicators that should be investigated along with the WSC.

Note: A valid WSC always looks like: X'00020000 00000xxx'
A disabled wait normally has a wait state code associated with it. If so, the
messages and codes should contain a problem description.
If there is no wait state code, the trace table should indicate the last sequence
of events leading to the wait state condition. Probably a bad PSW (wait bit on) has
been loaded.

If no valid WSC exists and if the PSW reflects the wait bit, is disabled, and the
STORE STATUS registers are not equal to zero, suspect: a user or Field
Engineering trap or a SLIP trap (with a wait state code of X'01B'), a bad branch, or
system damage. Examine the trace table and attempt to define the events that led
up to the wait condition. Was the last entry an SRB dispatch or an SVC or I/O
interrupt? Using the PSW address, determine the entry point of the routine if
possible.
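The note above gives the exact shape of a valid wait-state PSW: X'00020000 00000xxx'. That pattern can be tested mechanically when eyeballing a dump. The helper below is a modern illustrative aid, not part of any IBM service tool:

```python
# Test a 64-bit PSW value against the valid wait-state PSW pattern
# X'00020000 00000xxx' and extract the wait state code if it matches.
def valid_wsc(psw):
    """Return the wait state code, or None if psw is not a valid WSC PSW."""
    if (psw >> 32) != 0x00020000:      # high word must be exactly X'00020000'
        return None
    low = psw & 0xFFFFFFFF
    if low & 0xFFFFF000:               # only the low 12 bits may be nonzero
        return None
    return low & 0xFFF                 # the wait state code itself
```

Any other bit configuration, per the surrounding text, points at a trap, a bad branch, or system damage rather than a deliberate wait-state load.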


The PSA is a system area whose status indicators are dynamically changing. The
status indicators reflect the condition of the system the instant the dump was
taken. Taken out of context, they can be misleading.
Therefore, find out if the operator entered a STORE STATUS command, and
keep in mind the status could have been stored any time and not necessarily just
before the dump.

Note: The best evidence that the operator issued STORE STATUS is the content
of 'Current Registers and PSW at Time of Dump.' This is because, although the
stored status PSW is put at PSA+X'100' and the registers are put at
PSA+X'160'-X'1FF', the SADMP program reads this area as the current PSW and
registers and writes them to the dump data set. On a UP, the formatted current
data will be the same as in the PSA. On an MP system, however, the SADMP
program issues a SIGP to the other processor to store status. The STORE STATUS
command always stores in the normal PSA at location zero. This means that the
normal PSA will contain the registers and PSW from the other processor. If the
SADMP program did not save the STORE STATUS data before issuing the SIGP
instruction to the other processor, the data from the operator's STORE STATUS
command would be overlaid and the contents lost.
Also note that on an MP system there are three PSAs and the AMDPRDMP
program formats all of them for you. The normal PSA is used only during NIP
(and SADMP). Always be sure you are looking at the right PSA when you
are analyzing the PSA contents.
If PSW+X'01' = X'xE' or X'x2', the PSW is a wait PSW. If PSW+X'00' = X'04',
the system was disabled. If PSW+X'00' = X'07', the system was enabled.
Determine whether the PSW contains a WSC or an address. Then determine what
key the PSW reflects. PSA+X'101' = X'xC' or X'xE', where the x = key, as
follows:

0      supervisor
1      scheduler/JES2/JES3
5      IOS, data management, actual block processor, O/C/EOV
6      TCAM/VTAM
7      IMS
8      virtual problem program
9-F    V=R problem program
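The byte tests and the key table above can be combined into one decoding step. The helper below is an illustrative aid for reading PSW bytes out of a dump by hand; it is not an IBM service aid, and the dictionary layout is an assumption:

```python
# Decode the PSW bytes described in the text: byte 0 = X'04' means disabled
# and X'07' enabled; a low nibble of X'2' or X'E' in byte 1 marks a wait PSW;
# the high nibble of byte 1 (PSA+X'101') is the storage key.
PSW_KEYS = {
    0x0: "supervisor",
    0x1: "scheduler/JES2/JES3",
    0x5: "IOS, data management, actual block processor, O/C/EOV",
    0x6: "TCAM/VTAM",
    0x7: "IMS",
    0x8: "virtual problem program",
}

def decode_psw(byte0, byte1):
    key = byte1 >> 4
    if key in PSW_KEYS:
        component = PSW_KEYS[key]
    elif 0x9 <= key <= 0xF:
        component = "V=R problem program"
    else:
        component = "unassigned key"
    return {
        "enabled": byte0 == 0x07,            # X'04' would mean disabled
        "wait": (byte1 & 0x0F) in (0x2, 0xE),
        "key": component,
    }
```

For example, bytes X'04' and X'0E' decode as a disabled wait PSW in supervisor key.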

Check the PSA for a low storage overlay. Critical fields are the CVT pointer at
X'10', the new PSW locations at X'58'-X'78' and at location X'00', and the
trace table pointer at location X'54'.
Keep in mind that the CVT pointer at location X'10' is constantly refreshed and
the old PSWs are constantly updated by the hardware. They could have been
overlaid at one time and still look okay in the dump from an MP system.

In a SADMP on a UP, locations X'00' through X'18' are always overlaid by the
IPL CCWs and PSW from the IPL of SADMP itself. These locations never contain
valid data.


If the PSW reflects the wait bit and does not have a zero address, and if the
STORE STATUS registers are zero, check location X'300'. Is it equal to the wait
state PSW? If so, it is possible some task scheduled a bad SRB. Examine the trace
table for the SRB dispatch. Register 0's position in the trace table is a pointer to
the SRB. The previous address space before the SRB dispatch is the possible
scheduler of the SRB. Another possibility is an overlaid RB or LCCA. What does
the last entry in the trace table reflect - SRB or task dispatch? Make sure that
the trace table was not stopped by the dump task. Check for an X'80' in the high
order byte of the CPUID field.
Loops can be either disabled or enabled. The best way of determining which has
occurred is by noting the address of the loop if the operator recorded it before
taking the SADMP.
Recorded addresses that fall within the SRM code are usually not indicative of
a loop because this code is entered periodically as a result of a timer interrupt. This
signifies, however, that the system does enable for interrupts and you can
treat the error as an enabled loop. Caution: If the only addresses the operator
furnished are in the timer or SRM code, check that it is not really an enabled wait
condition. The typical disabled loop is quite short, whereas the enabled loop covers
a wide range of addresses. Be careful that the recorded addresses that may reflect
a short loop are not a 'loop within a loop.' Scan the trace table and try to
determine if a pattern of activity exists. Look for SIOs to the same device, SVCs
from the same address, program checks occurring frequently for other than page
faults, or any repetitive activity. If no pattern exists, try to correlate the last trace
entry with what you already know about the loop (for example, I/O interrupts, a
loop in an lOS or SRB dispatch, and a loop in the nucleus in some routine which is
entered via an SRB).
The enabled loop usually reflects a wide range of addresses and can even span
address spaces between a user and the system address spaces. An examination of
the trace table usually shows some pattern of activity that is recognizable as a
loop.
Be especially suspicious of an SVC 0D or SVC 0A for the same size area, SVC 33,
SVC 4C, and SIOs to the same device with the same IOSB address in register 1.
Trace table entries with SVC 0D and/or SVC 33 in a stand-alone dump usually
mean that some task is abending and the system is attempting to recover and
purge the task from the system.


If any address within the loop points to the lock manager (module IEAVELK),
the problem is probably caused by someone requesting an unavailable spin lock.
On a UP, this is an invalid condition and always signifies an overlaid lockword. On
an MP system, this signifies that the other processor is holding the lock and failing
to release it. There is a strong possibility that this indicates an overlaid lockword
also. If not, the problem is on the other processor. In either case, register 11 can
point to the lockword requested and register 14 is the address of the requestor.
Check the value in the lockword. Valid values are a fullword of zeros or three bytes
of zeros and the CPUID in the fourth byte. Any other bit configuration causes the
system to spin in a disabled loop and signifies an overlaid lockword. Register 12
always contains the bit mask to check the locks-held table in the PSA.
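The lockword rules just stated translate directly into a check that can be applied to a value lifted from a dump. This is an illustrative helper only; the set of valid CPUID byte values is an assumption for the example, not a fixed architectural list:

```python
# Classify a 32-bit spin lockword per the rules above: all zeros = free;
# three bytes of zeros with a recognized CPUID in the fourth byte = held;
# anything else = overlaid (and will cause a disabled spin loop).
def classify_lockword(word, valid_cpuids=(0x40, 0x41)):
    """Return 'free', 'held', or 'overlaid' for a dumped lockword value."""
    if word == 0:
        return "free"
    if (word & 0xFFFFFF00) == 0 and (word & 0xFF) in valid_cpuids:
        return "held"          # held by the processor whose CPUID appears
    return "overlaid"          # invalid bit configuration -> suspect an overlay
```

An 'overlaid' result means the next question is who overlaid it, possibly in conjunction with some other problem, as the text goes on to say.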
If the lockword is overlaid, you must identify who overlaid it. It is possible that
the lockword was overlaid in conjunction with some other problem.
This procedure is designed to aid the problem solver and to supplement the
diagnostic procedures he has developed over the years. Its main purpose is to call
attention to the new serviceability features within MVS and provide an index into
the correct component analysis procedures in Section 5 of this manual. Once again,
the component analysis procedures are there as hints and helps rather than to provide
a structured approach to all problems.


[Figure B-1 flowchart: a series of yes/no checks (for example, on
PSA+X'2F8') keyed to the numbered notes in the analysis procedure that
follows.]

Figure B-1. Stand-alone Dump Analysis Flowchart
Analysis Procedure
The following explanations correlate to the "Notes" in Figure B-1.

Note 0 - Dummy Task?
The enabled wait generally occurs as a result of the lack of some critical
system resource.
If the PRINT statement of PRDMP is used properly (see the chapter
"Additional Data Gathering Techniques" in Section 2), the message
"CURRENT TASK = WAIT TASK" appears in the formatted portion
of the dump.
PSA + X'218/21C' is the new/old TCB pointer. PSA + X'220/224' is
the new/old ASCB pointer.
The ASCB with an ASID=0 is the dummy ASCB. If the dummy task is
the current task, go to the TCB summary before going to the next
block and check whether any task has an error completion code. If any
TCBs are abending, continue at Point A and start with the top three
address spaces if they have a completion code.
If the ASCB is the dummy ASCB but the TCB new/old pointers are
zero, then take the "no" path and check for SRB mode.

Note 1 - System Enabled for I/O?
Is bit 6 on in the current PSW?
Is control register 2 correctly loaded?
The current status of the system is in the PSA if a STORE
STATUS command was entered before the dump was taken.

Note 2 - Dispatchable Work to be Done?
1. One of the first places to check for system dispatchability is the
   common system data area (CSD). For example, CSD+C=40 indicates
   that most of the system is non-dispatchable. This bit can be set by
   SDUMP. Is any address space abending and in the process of
   taking an SDUMP? Check the TCB summary for completion codes.
2. Dispatchable work within an address space is indicated by:
   ASCB+80 = 00000000 or FFFFFFFF
   ASCB+66, 67, 72, 73 = 00
   ASCB+7C = some value, or
   ASCB+1C = the service priority list has an SRB queued.


3. The JES2/JES3 address space can contain work that should be passed
   to a waiting initiator or an interface that has an address space for SYSIN
   or SYSOUT data.
4. Dispatchable work at the system level is indicated by SRBs queued to
the service manager queue and the global service priority list.
For (1), you must determine who set the bit on, who should have reset it,
and why the bit was set. It might be necessary to trap on the setting of this
bit.
For (2), a 7FFFFFFF in the lockword keeps the dispatcher from
dispatching. Most of the flags show the reason for not dispatching.
For (3), check the JES control blocks more closely.
For (4), determine why the dispatcher is not functioning. See the
"Dispatcher" chapter in Section 5 of this manual.

Note 3 - Enqueue Lockout?
As in other systems, an exclusive enqueue prevents other tasks from
using the same resource. However, in MVS, locks are now used
frequently instead of an enqueue.
1. Use the QCB format function to print the QCBs and check for
   exclusive enqueues.
2. The CVT+X'280' points to the first major QCB.
   Major QCB+8 points to the first minor QCB.
   Minor QCB+0 points to the next minor QCB.
   Minor QCB+8 points to the first QEL.
3. Any QEL reflecting exclusive control or reserve status prevents any
   other task from using that resource. Any QEL reflecting shared
   status prevents any task requesting exclusive control
   from using that resource.
4. The Debugging Handbook defines some of the major and minor
   ENQ names.
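The QCB/QEL chain walk in steps 2 and 3 is mechanical enough to sketch. The model below stands in for the dump control blocks with nested dictionaries (an assumption for illustration) and flags minor QCBs where an exclusive holder coexists with other requesters, the candidate lockout of step 3:

```python
# Walk major QCB -> minor QCB -> QEL chains (modeled as nested dicts) and
# report minors that have an exclusive QEL plus at least one other waiter.
def find_enqueue_lockouts(major_qcbs):
    suspects = []
    for major in major_qcbs:                   # chain anchored at CVT+X'280'
        for minor in major["minors"]:          # chained from major QCB+8
            qels = minor["qels"]               # chained from minor QCB+8
            if any(q["exclusive"] for q in qels) and len(qels) > 1:
                suspects.append((major["name"], minor["name"]))
    return suspects
```

Each suspect pair names the major and minor resource whose holder should be examined next (the Debugging Handbook defines many of the names).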

Note 4 - Incomplete I/O?
Label IECVSHDR in IEANUCO 1 points to a pool of cells used by lOS
to build the 10Q (I/O queue element). The 10Qs are found in two
places:

B.I.8

OS!VS2 System Programming Library: MVS Diagnostic Techniques

Appendix B: Stand-alone Dump Analysis (continued)
1. An IOQ chained to the UCB indicates an I/O operation is in
progress or has completed on that device. The flag bytes at UCB+6
determine the current state of the device. The device is available
when the flag byte is zero.
No request for this device should be chained to the LCH during an
enabled wait.
2. The IOQs are chained to the logical channel queues (LCH) if the
I/O operation has been requested but not started.
The LCH is pointed to by the CVT+X'8C'. The entry for each
logical channel is 20 bytes long. At X'00' into each entry is a
pointer to the first IOQ queued for that logical channel. The
presence of IOQs on any logical channel is immediately suspect
when examining an enabled wait state dump. An empty queue (no
requests) is indicated by a word of FFFFFFFF in the LCH at
X'00'.
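The LCH scan described above can be sketched as a small Python routine. The empty-queue sentinel (FFFFFFFF at entry+0) and the 20-byte entry length follow the text; the table location and contents are fabricated:

```python
import struct

EMPTY = 0xFFFFFFFF  # an LCH entry whose first word is X'FFFFFFFF' has no requests

def word(dump, addr):
    return struct.unpack_from(">I", dump, addr)[0]

def busy_channels(dump, lch, nchan, entry_len=20):
    """Indexes of logical channels that still have IOQs queued."""
    return [i for i in range(nchan)
            if word(dump, lch + i * entry_len) != EMPTY]

# Fabricated LCH table at offset 0: four channels, all empty,
# then an IOQ left queued on channel 2 - suspect in an enabled wait.
dump = bytearray(0x100)
for i in range(4):
    struct.pack_into(">I", dump, i * 20, EMPTY)
struct.pack_into(">I", dump, 2 * 20, 0x12345)
print(busy_channels(dump, 0, 4))
```

Any index this returns during an enabled wait marks a logical channel whose queued requests deserve a closer look.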

Note 5 - Is Any Task in a Page Wait?
Check the TCB RBs for a wait count not equal to zero.
RB+X'1C' = wait count
RB-8 ≠ X'40' (FLAG1)

Note 6 - Explicit Wait in System Code?
Does the address in the PSW fall within the nucleus or LPA code?
Compare the address with a NUCMAP or LPA map.
Check the load list and CDEs for system modules that have been loaded
into the private area.

Note 7 - Real Storage Okay?
If a task remains in a page wait, it could indicate a shortage of page
frames or a real storage failure.
The control blocks that contain status about the use of real storage are:
1. Page vector table (PVT)
PVT+4 = available frame count
PVT+X'24' = free PCB count
PVT+X'140' = deferred for lack of free page frames
PVT+X'148' = requests sent to ASM


2. Page frame table (PFT)
Shows use of each frame of real storage available for paging.

Note 8 - Is Auxiliary Storage Okay?
If tasks are in a page wait and real storage is not a problem, the
trouble could be within the auxiliary storage manager (ASM).
ASM status indicators are:
1. ASMVT+X'28' = the number of paging I/O requests received
2. ASMVT+X'2C' = the number of paging I/O requests completed
3. ASMVT+X'50' = the number of started I/O requests that have
not completed
4. ASMVT+X'54' = indicates whether the ASM's SRB for ILRPTM
(ASM PART monitor) has been scheduled
If the number of paging I/O requests completed is equal to the number
of paging I/O requests received, the ASM has no outstanding work.
However, if these counts differ, check the other status indicators for
the following:
1. If I/O requests have been started but not completed, determine
what has happened to the I/O.
2. If ASM's SRB for ILRPTM has been scheduled, determine what the
dispatcher has done with the SRB.
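The counter comparison in Note 8 reduces to a three-way decision. A minimal Python sketch, using the ASMVT offsets from the text over a fabricated dump buffer (the counter values are invented):

```python
import struct

def asm_backlog(dump, asmvt):
    """Interpret the three ASMVT counters described in Note 8."""
    received  = struct.unpack_from(">I", dump, asmvt + 0x28)[0]
    completed = struct.unpack_from(">I", dump, asmvt + 0x2C)[0]
    started   = struct.unpack_from(">I", dump, asmvt + 0x50)[0]
    if received == completed:
        return "no outstanding ASM work"
    if started:
        return "I/O started but not completed - follow the I/O"
    return "requests received but never started - check ASM queues"

# Fabricated ASMVT at offset 0: 10 requests received, 7 completed,
# none currently started - the work was lost inside ASM.
dump = bytearray(0x60)
struct.pack_into(">II", dump, 0x28, 10, 7)
print(asm_backlog(dump, 0))
```

The third outcome is the case Note 22 pursues; the second leads to the IOS checks in Notes 9 and 16.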

Note 9 - Is IOS Okay?
If the number of started I/O requests that have not completed
(ASMVT+X'50') is zero, then IOS has completely processed all
the I/O that ASM has started.

Note 10 - Interrupted SRB or TCB?
The condition that caused the SRB to be suspended has been resolved.
The suspended SRB (SSRB) is queued on the SPL at the non-quiesceable
level.
The condition that caused the TCB holding the local lock or local
and CMS locks to be suspended has been resolved. The save area to be
restored upon dispatching is the IHSA.
A TCB holding the local lock or local and CMS locks has been
interrupted by a higher priority task. The save area used for redispatching is the IHSA. See the chapter "Dispatcher" in Section
5 or the chapter "System Execution Modes and Status Saving" in
Section 2 of this manual.


Note 11 - Not RTM?
Without the detection of a failure by MVS, which would have caused
entry into RTM, check the following. If the stand-alone dump reflects
the same current task, this could be normal operation or the task could
be in a loop. Check the following for status information:
LCCA
PSA
PCCA
Trace table
TCB
RB/SVRB
If no failure information is found (the system appears to be running
normally), the problem might be that another task or address space
should be running and is unable to. Check the following for status
information:

1. Check each address space that is expected to be running to find
out why it is not running. The information about each address
space and task within that address space can be found in: ASCB,
ASXB, TCB, and RB/SVRB.
2. Or, check the total system to find out why other work is not
being run. Check the status of the system resources:
ENQ lockout of data sets
I/O failures
RSM or ASM failure
Waits in system code for other system resources (such as buffers)
If you are checking other than the current task, the TCBs could be
dispatchable, but not yet dispatched. If the task is non-dispatchable
(non-dispatchability bits on in the TCB), this can indicate an error
situation. Or the task could be simply waiting (indicated by a wait
count in the current RB). Check the dispatchability flags in the
following control blocks for status of this task or select another address
space or task and continue at Point A.
Status information can be found in: ASCB, ASXB, TCB, and
RB/SVRB.

If this is a system dump, the TCB belongs to a non-abending sister,
mother, or daughter task. Find the task that has a completion code
(by checking the TCB summary) and continue at Point A.


Note 12 - RTM2, Yes.
The most important place to find information about abend codes is
OS/VS Message Library: VS2 System Codes.
The RTM2 work area address is stored by RTM2 in TCB+X'E0'. Every
system dump (SYSABEND/SYSMDUMP/SYSUDUMP) should have at
least one TCB with an RTM2WA address at TCB+X'E0'. The error
indicators contained in the RTM2WA are described in the Debugging
'Handbook.
If an ESTAE routine is in control when an error occurs, RTM builds an
SDWA (described in the Debugging Handbook) and places its address
at the RTM2WA+X'D4'.
Additional information about the failure may be found in the LOGREC
buffer. RTM2WA+X'38' points to the RTCT; RTCT+X'20' points to the
LOGREC buffer.
If recursion occurs during RTM processing, other RTM2WAs may exist.
If other work areas were obtained, the last one is pointed to by
the TCB+X'E0'. The last RTM2WA points to the previous
work area (RTM2WA+X'168', X'16C', X'170'). If there is no space in LSQA
to build the work area, SQA is used.
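Following the recursion chain described above is a short pointer walk. A hedged Python sketch over a fabricated dump: the TCB+X'E0' anchor and the back pointers are from the text, but only the X'168' back pointer is modeled here, and the addresses are invented:

```python
import struct

def rtm2wa_chain(dump, tcb):
    """Follow RTM2 work areas from TCB+X'E0', newest first."""
    chain = []
    wa = struct.unpack_from(">I", dump, tcb + 0xE0)[0]
    while wa:
        chain.append(wa)
        # back pointer to the previous work area (only X'168' modeled)
        wa = struct.unpack_from(">I", dump, wa + 0x168)[0]
    return chain

# Fabricated dump: TCB at 0, a recursion of two RTM2WAs.
dump = bytearray(0x1000)
struct.pack_into(">I", dump, 0xE0, 0x800)          # TCB+X'E0' -> newest RTM2WA
struct.pack_into(">I", dump, 0x800 + 0x168, 0x600) # newest -> previous
print([hex(w) for w in rtm2wa_chain(dump, 0)])
```

The first element is the most recent failure; earlier elements record the errors it recursed from.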
Normally the RTM2WA is obtained from LSQA. It is therefore unique
to each address space. If you are looking at a stand-alone dump, be sure
that the area you are looking at belongs to the failing address space.
If the abending task is one of several abending tasks, it is important
to decide which task to look at first. There could be several failures or
one failure causing all the others. Any failure in the system address
spaces (JES2, master scheduler) is important because it might have
caused the user address spaces to terminate.
For the system to continue execution, the major components must
be operational. If any of the critical system components (master
scheduler, ASM, JES2, TCAM for TSO, etc.) abend and fail to
recover, the system cannot continue normal operation. Usually this
can be determined from the records in logrec. However, check the
TCB summary in the format section for completion codes.
The presence of a TCB completion code does not positively identify the
associated task as being inoperative. It is possible that the completion
code is residual and the task has recovered. The presence of a completion code makes the task suspect, however, and it should be investigated.


Simplify your choice of address spaces by using:
• SYSl.LOGREC external and internal entries
• Console sheets
• Trace table or GTF (check for SVC D or program check entries)
Once you have selected an address space and TCB, continue at Point A.
(Check Section 5 for the component analysis of the involved
component.)
In addition to the RTM2WA, status indicators related to the problem
can be found in:
• Trace table
• ESTAE control block (SCB)
• RB/SVRB
• TCB

Note 13 - Local Lock Only?
The current ASCB+X'C' contains the CPU ID. The current
TCB+X'110' also contains the CPU ID. The loop is within this task.
Status is saved (if a STORE STATUS was done) in:
• PSA
• LCCA
• Current stack
• Local SDWA (ASCB+X'6C') - if the task abended while holding
the lock
• Trace table
• In-storage LOGREC buffer

Is this task looping in the lock manager's code? Check the
PSA+X'228' and LCCA+X'20C'. If the task is looping and this is an
MP system, the other processor could be causing the loop by not freeing
a spin lock that it is currently holding. Note: The failure to free or
obtain a lock can be caused by the lockword being overlaid on either
an MP or UP.

If both processors of an MP are looping in lock manager code, then the
failure could be in that code. If only one processor is in lock manager code,
then the failure is likely to be in the processor currently holding the lock.
Within the lock manager code, register 12 contains the bit mask for
the locks-held table in the PSA (PSA+X'2F8'). Register 11 can
contain the address of the lockword itself, and register 14 contains the
return address of the requestor.


Where is the task looping? Why doesn't it free the locks? Is RTM
involved with this task? If it is, continue at Point A.
See the chapters on "Locking" and "Effects of Multi-Processing on
Problem Analysis" in Section 2 of this manual.

Note 14 - Local Lock Plus Another Lock.
The current ASCB+X'C' contains the CPU ID. The current TCB+X'110'
contains the CPU ID. The loop is within this task if only the local and
CMS locks are held. The loop could be a spin loop waiting for the
other processor to release a global lock. In this case, determine why
the lock has not been released.
Status indicators can be found in the following areas (if a STORE
STATUS was done):
• PSA
• LCCA
• Current stack
• Local SDWA (ASXB+X'6C') - if the task abended while holding
local and CMS locks
• Trace table
• In-storage LOGREC buffer
See Note 13 for additional information. Also see the chapters
"Locking" and "Effects of Multiprocessing on Problem Analysis" in
Section 2 of this manual.

Note 15 - Global Lock Held.
A global lock loop in an MP system could be normal. The spin loop
continues until the global lock is released by the other processor.
Determine why the other processor has not released the lock.
Error status indicators can be found in the following areas if a STORE
STATUS was done:
• PSA (current PSW)
• LCCA
• Current stack
• Global SDWA (if there was an abend failure while the global lock
was held)

The global SDWA for the super stacks is located at the respective super
stack+X'254'. For the normal stack, the global SDWA immediately
follows the RESTART super stack SDWA.


Now continue at Point A in the procedure. See Note 13 for additional
information. Also see the chapters "Locking" and "Effects of
Multiprocessing on Problem Analysis" in Section 2 of this manual.

Note 16 - IOS Not Okay.
Check the requests sent to IOS from the auxiliary storage manager
(ASM). Control blocks containing information are:
1. PART (paging activity reference table) - One entry
per page data set. Each PART entry contains a pointer
to an IORB (I/O request block) at X'1C' and a pointer
to a UCB at X'2C'.
2. IORB contains the following I/O related data:
IORB+X'1' = number of IORBs for this page data set
IORB+X'3' = indicates whether IORB is in use
IORB+X'4' = pointer to next IORB for this page data set
IORB+X'8' = pointer to the first PCCW
IORB+X'C' = pointer to the IOSB.

Refer to the Component Analysis section for additional IOS
status indicators.
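Walking the IORB chain from a PART entry, as item 2 describes, can be sketched in Python. The offsets (PART entry+X'1C', IORB+X'3' in-use flag, IORB+X'4' next pointer) are from the text; the dump layout and contents are fabricated:

```python
import struct

def iorbs_in_use(dump, part_entry):
    """Addresses of in-use IORBs chained from one PART entry."""
    busy = []
    iorb = struct.unpack_from(">I", dump, part_entry + 0x1C)[0]
    while iorb:
        if dump[iorb + 3]:   # IORB+X'3' nonzero: this IORB is in use
            busy.append(iorb)
        iorb = struct.unpack_from(">I", dump, iorb + 4)[0]  # next IORB
    return busy

# Fabricated PART entry at offset 0 with two chained IORBs,
# the first in use, the second idle and ending the chain.
dump = bytearray(0x200)
struct.pack_into(">I", dump, 0x1C, 0x100)        # PART entry+X'1C' -> first IORB
dump[0x100 + 3] = 1                              # first IORB busy
struct.pack_into(">I", dump, 0x100 + 4, 0x140)   # -> second IORB (idle)
print([hex(a) for a in iorbs_in_use(dump, 0)])
```

Each busy IORB found this way leads on to its PCCW chain (IORB+X'8') and IOSB (IORB+X'C').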

Note 17 - Suspended SRB or TCB With Lock Held.
An SRB can be suspended because of a page fault or a request for a
CMS lock when it is being held by another processor. The save area
for the suspended SRB is the SSRB. If interrupted by a page fault,
the SSRB is pointed to by the corresponding PCB+X'1C'.
For a CMS lock request, the SSRB is on the CMS lock-suspended
queue, which can be located in IEANUC01 at label CMSSRBF. (See
AMBLIST of IEANUC01.)
A locked TCB can be suspended for the same reasons as an SRB. The
save area is the IHSA (described in the Debugging Handbook). The
IHSA is valid during a page fault if the corresponding PCB+8 flag is on.
The IHSA is valid for a CMS lock suspend if the ASCB is on the CMS
lock suspend queue in IEANUC01 at label CMSASBF.
The TCB can be suspended because of a page fault while holding the
local lock and the CMS lock. A difference would be that the
ASCB+X'67' flag for the CMS lock is turned on. See the chapter
"Dispatcher" in Section 5 and the chapter "System Execution
Modes and Status Saving" in Section 2 of this manual.


Note 18 - Not RTM2.
The presence of a TCB completion code does not positively identify
the associated task as being inoperative. It is possible that the
completion code is residual and the task has recovered. The presence
of a completion code makes the task suspect, however, and it should be
investigated.
The save areas have been released. The status of the error has been
written to SYS1.LOGREC. Continue at Point A with other TCBs in
the dump. Another abending task is likely. If this is a stand-alone
dump, it very likely has the needed LOGREC entry in the in-storage
buffer. CVT+X'23C' points to the RTCT; RTCT+X'20' points to the
LOGREC buffer.

Note 19 - Real Storage Not Okay.
If page waits seem to be caused by the lack of real frames, check their
usage. The PFT contains information about each frame currently being
used. Important items to check are:
Which ASID holds the most real storage?
What are the frames being used for?
Is it valid that they be held, or is there a problem with the freeing of
the frames?
Status information might be found in the PVT, PFT, and RSMHD and
ASCB (X'98') for the ASID that is holding all the frames.
See the "RSM" chapter in Section 5 of this manual for more information about RSM.

Note 20 - IOS Okay.
Either something was missed along the way or the failure is in one of
the following areas:
• The IOS interrupt handler has failed to schedule the SRB/IOSB to
the address space.
• The dispatcher has not handled the SRB correctly.
• POST has not functioned properly.
Information on these errors might be found in the trace table or the
in-storage LOGREC buffers.


Note 21 - RTM1 Involved.
If there is an address at TCB+X'104' there might be two problems to
resolve:

• The failure that caused the system to enter RTM initially.
• A loop between RTM1 and RTM2, since the pointer at
TCB+X'104' normally lasts for only a short time.
The pointer at TCB+X'104' is the EED (described under RTIW in
the Debugging Handbook). This data area is used to pass information from RTM1 to RTM2. Once RTM2 receives control, the
information is moved to the RTM2 work area and the EED is
deleted. Therefore, because of its short life span, the presence of
an EED is unusual.
A SLIP trap may be required to solve the RTM loop. This loop is of
course the most important problem.
If the loop is in the current task, check these status indicators:

• LCCA
• PSA
• Current stack
• RTM1WA
• RTM2WA
• SDWA pointed to by RTM1WA
• EEDs
• LOGREC buffer
• Trace table

If the loop is not in the current task, all the indicators above except the
LCCA, PSA, and current stack are valid. The current FRR stack is also
a valid status indicator. Remember that all disabled or locally locked
code runs under the protection of an FRR routine.

Check the current stack pointer at PSA+X'380'. If the current stack
pointer points to a super FRR stack, it is almost certain that system
damage has occurred.
The normal stack at X'C00' contains a record of FRR activity for the
current address space. Location X'C0C' is the pointer to the current
entry on the normal FRR stack. An address of X'C0C' or
X'C34' indicates an empty stack. Any address between X'C54' and
X'E34' indicates that the system is currently under FRR protection,
and the first word in each FRR entry is a pointer to the FRR routine.
Because the FRR routine is usually embedded within the routine it
protects, identifying the FRR routine identifies the "looper."

The second word in each entry contains an indicator in the first byte.
A X'80' indicates that this routine is in control. A X'40' indicates that
this nested recovery routine is in control. If any entry on the stack
points to RTM or ABDUMP's FRR, it is almost certain that system
damage has occurred in a SADMP. This is normal in an SVC dump.
If there is an address at either X'C44' or X'C48', there has been an
entry into RTM1 and an RTCA (SDWA) has been obtained. The loop
could be occurring in the FRR routine itself. The first word in the
FRR stack entry points to the FRR routine. The SDWA (pointed to
by X'C44' or X'C48') is the input passed to the FRR. Examine the
code for the FRR and the module, and consider the input passed to
it in the SDWA, to gain some insight into the cause of the loop.
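The normal-FRR-stack test above can be condensed into a Python sketch. The boundary values (X'C0C'/X'C34' empty, X'C54'-X'E34' active) are taken from the text; the low-storage contents here are fabricated:

```python
import struct

def frr_state(low):
    """Classify the normal FRR stack from its X'C0C' current pointer."""
    cur = struct.unpack_from(">I", low, 0xC0C)[0]
    if cur in (0xC0C, 0xC34):
        return None                        # empty stack: no FRR coverage
    if 0xC54 <= cur <= 0xE34:
        # first word of the current entry points at the FRR routine itself
        return struct.unpack_from(">I", low, cur)[0]
    raise ValueError("stack pointer out of range - suspect an overlay")

# Fabricated low storage with one active FRR entry at X'C54'.
low = bytearray(0x1000)
struct.pack_into(">I", low, 0xC0C, 0xC54)
struct.pack_into(">I", low, 0xC54, 0x00FE10)   # FRR routine address
print(hex(frr_state(low)))
```

A returned routine address identifies the protected (and likely looping) code; a pointer outside the documented range is itself evidence of damage.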

Note 22 - Auxiliary Storage Not Okay.
If the count of I/O requests received (ASMVT+X'28') differs from
the count of I/O requests completed (ASMVT+X'2C'), and the
number of started I/O requests that have not completed (ASMVT+X'50')
is zero, locate those paging I/O requests (represented by an AIA) that
ASM has received but not completed. Control blocks containing
information are:
1. AIA-X'28' = part of the PCB which contains RSM-related data
2. ASMVT+X'20-24' = queue of AIAs waiting for IOEs
3. The I/O request element (IOE) which points to the AIA is queued
to one of the following PART queues:
PART+X'30-34' = common write queue
PART+X'38-3C' = spill write queue
PART+X'40-44' = duplex write queue
PART+X'48-4C' = local write queue
Each PART entry contains an unsorted read queue (X'C') and
a sorted read queue (X'30').
4. Each active IORB (PART entry+X'1C') contains a chain of PCCWs
(IORB+X'8'). Each of these PCCWs points to an AIA (PCCW+X'8').
5. If the AIA cannot be found by the above means (that is, it was lost
by ASM), the PCB/AIA may be found on the common I/O queue
(PVT+X'75C-760') or one of the local I/O queues (RSMHD+X'1C-20').
For further information, see ASM's "General Debugging Approach"
in Section 5.


Note 23 - Local SRB Mode.
This indicates a loop (or enabled wait) within a single address space.
The SRB code cannot be pre-empted. If a loop occurs in the SRB
routine, no higher priority task can be dispatched.
For an MP system there is a second possibility. Determine if the loop
is in the lock manager code. If so, see notes 13, 14, and 15 for
additional information. Continue at Point A.

Status Indicators
• Trace table.
• PSA (current PSW).
• LCCA.
• Current stack.
• RTM1WA (SDWA) - if an abend occurred during SRB processing.
• ASCB.
• RTM1WA+X'38' points to an SDWA obtained via GETMAIN
(if RTM1WA+X'40' = 10).
• RTM1WA+X'34' points to a local SDWA if the GETMAIN for
the SDWA failed.
Note: If the system is an MP and the loop is in the lock manager
code, then the other processor might be at fault. See notes 13, 14,
and 15 for additional information. Continue at Point A.

Status Indicators
• PSA (current PSW).
• LCCA.
• Current stack.
• RTM1WA (SDWA) - if a failure occurred during SRB processing.
• Trace table.
• RTM1WA+X'38' points to an SDWA obtained via GETMAIN (if
RTM1WA+X'40' = 10).
• RTM1WA+X'34' points to a local SDWA if the GETMAIN failed.
See the chapter "Dispatcher" in Section 5. Also see the chapters
"Locking," "System Execution Modes and Status Saving," and
"Effects of MP on Problem Analysis" in Section 2 of this manual.


Note 24 - Global SRB Mode.
This indicates an enabled loop (or enabled wait) within a single address
space.
The SRB code cannot be pre-empted. If a loop occurs in the SRB
routine, no higher priority task can be dispatched.
For an MP system there is a second possibility. Determine if the loop
is in the lock manager code. If so, see notes 13, 14, and 15 for
additional information. Continue at Point A.
Status Indicators
• Trace table.
• PSA (current PSW).
• LCCA.
• Current stack.
• RTM1WA (SDWA) - if an abend occurred during SRB processing.
• ASCB.
• RTM1WA+X'38' points to an SDWA obtained via GETMAIN
(if RTM1WA+X'40' = 10).
• RTM1WA+X'34' points to a local SDWA if the GETMAIN failed.
Note: If this is an MP system and the loop is in the lock manager
code, then the other processor might be at fault. See notes 13, 14,
and 15 for additional information. Continue at Point A.
Status Indicators
• PSA (current PSW).
• LCCA.
• Current stack.
• RTM1WA (SDWA) - if a failure occurred during SRB processing.
• Trace table.
• RTM1WA+X'38' points to an SDWA obtained via GETMAIN
(if RTM1WA+X'40' = 10).
• RTM1WA+X'34' points to a local SDWA if the GETMAIN failed.
See the chapter "Dispatcher" in Section 5. Also see the chapters
"Locking," "System Execution Modes and Status Saving," and
"Effects of MP on Problem Analysis" in Section 2 of this manual.


Note 25 - Wait in User Code.
This could be normal operation for an explicit wait (SVC 1) issued by
a user routine. Determine if the event waited upon has completed.
Check the TCB non-dispatchability flags to determine the reason. The
flags normally indicate the area of the problem. For example, if Flags4
= X'04', this indicates a VARY or QUIESCE command is in process on
an MP system; Flags5 = X'80' means the task was terminated.

Note 26 - Non-enabled System.
A disabled wait normally has a wait state code associated with it. If so,
the messages and codes should contain a problem description.
If there is no wait state code, the trace table should indicate the last
sequence of events leading to the wait state condition. Probably a bad
PSW (wait bit on) has been loaded.

Status Indicators
• LCCA
• PSA
• Current stack
• Trace table
• In-storage LOGREC buffer
If no valid WSC exists, if the PSW reflects the wait bit and is disabled,
and if the STORE STATUS registers are not equal to zero, suspect
a user/FE trap, a SLIP trap (wait state code 01B), a bad branch, or
system damage. Examine the trace table and attempt to define events
that lead up to the wait condition. Was the last entry an SRB dispatch
or an SVC or I/O interrupt? Using the PSW address, determine the
entry point of the routine if possible and go to the chapter "MVS Trace
Analysis" in Section 2 of this manual.
If the wait state occurs during system initialization, see the NIP vector
table for error information. If the system is in a disabled loop,
determine what code is in control and why it is not returning to the
enabled state.
A disabled loop in the lock manager on an MP system could be okay.
Read notes 13, 14, and 15. A disabled loop in the SIGP processor on
an MP system could be okay. (The other processor should turn
off its PCCA's parallel/serial bit.)
If the system is looping (no wait bit), follow the SRB mode path.
Check if RTM is involved and if it is, go to Point A.


Note 27 - Dispatchable Work Available.

If the system is dispatchable and an address space has dispatchable
work, the following are possible causes:
• The dispatcher is not functioning.
• CPU affinity may have been requested.
• JES2 might not be sending work to the initiators. In this case,
take a closer look at JES2.
See the chapter "Dispatcher" in Section 5 of this manual to
determine why the dispatcher is not functioning properly.

Note 28 - Enqueue Lockout.
Determine why the top task of a series of exclusive enqueues is not
running or has not dequeued from the resource.
Note: It is valid for the top task to be swapped out. If it does not get
swapped back in, then the failure might be in the system resource
manager (SRM).

Note 29 - Incomplete I/O.
This is a probable hardware error. See the "IOS" chapter in
Section 5 to determine the status of I/O.

Note 30 - Explicit Wait in System Code.
Check the program listings (on microfiche) for the reason for the
wait. Then determine which resource is being waited upon.
Once the resource is identified, determine if the wait should have been
satisfied. If the wait appears to be a normal operation, continue at
Point A for this TCB.
If the last thing done before the wait was an SVC 23 (WTO), related
information can be found in the UCM base, prefix UCM, UCM
extension, and the chain of used WQEs.


Note 31 - System Analysis.
If the failing task or component is not known, continue on the "yes"
path of the flowchart.
To determine status about a TCB without doing a total system analysis,
continue on the "no" path of the flowchart.
For a complete system analysis, start with low storage. Check the PSA
for a low storage overlay. Critical fields are the CVT pointer at X'10',
the new PSW locations at X'58-78' and at location X'00',
and the trace table pointer at location X'54'. Be especially critical of
the interrupt handler new PSWs. Any change to any new PSW will
cause the next interrupt handler for that event to be dispatched in the
wrong mode or key or to the wrong address. Subsequent results can be
very unpredictable.
Keep in mind that the CVT pointer at location X'10' is constantly
refreshed and the old PSWs are constantly updated by the hardware.
They could have been overlaid at one time and still look okay in the
dump from an MP system.
In a SADMP on a UP, locations X'00' through X'18' are always overlaid by the IPL CCWs and PSW from the IPL of SADMP itself. They
will never contain valid data.
Other important fields in the PSA are as follows.
The interrupt code for the various classes of interrupts are located at:
•

X'84' external interrupt

•

X'88' SVC interrupt

•

X'8C' program interrupt

These fields indicate the last type of interrupt associated with each
interrupt class for each processor.
PSA+X'210' - address of the LCCA (1 per processor). The LCCA contains many of the status-saving areas that were located in low storage in
previous systems. It is used for software environment saving and
indicators. The registers associated with each of the interrupts you
have discovered in the PSA are saved in this area. In addition, the
system mode indicators for each processor are maintained in the LCCA.
The ASCB and TCB NEW/OLD pointers in the PSA (locations
X'218-227') indicate the currently dispatched task. Note: PSATOLD
can equal zero if an SRB is dispatched.

PSA+X'228' - PSASUPER. This is a field of bits that represent
various supervisory functions in the system. If a loop is suspected,
check these fields to isolate the looping process.
PSA+X'2F8' - PSAHLHI. This field indicates the current locks held
on each processor. Knowing which locks are held may help isolate
the problem, especially in a loop situation. By determining the lock
holders you can isolate the current process.
PSA+X'380' - PSACSTK. This is the address of the active recovery
stack that contains the addresses of the recovery routines to which
control will be routed in case of an error. If the address is other than
X'C00' (normal stack), determining the type of stack (for example,
program check FLIH, restart FLIH) should aid in debugging the loop
situation.
Another thing to consider in systems analysis is the possibility of a
storage overlay of some critical system code such as lOS or GETMAIN.
Because of the recovery aspects of MVS (percolation and retry),
evidence of storage overlays can often be found in the LOGREC
recording buffers. To find the LOGREC recording buffers:
CVT+X'23C' = pointer to the recovery termination control table
(RTCT).
RTCT+X'20' = pointer to the recording buffers.
The recording buffer (LRB)+0 = pointer to the first entry.
The recording buffer (LRB)+4 = pointer to the last entry.
The recording buffer (LRB)+8 = pointer to the next available buffer.

Each buffer entry for a software record begins with X'408x' or X'428x',
where x = the release number. Each software entry is approximately
X'200' bytes long. The first X'20' bytes are header information and
contain the CPU ID and serial, the time and date, and the JOBNAME
if the entry was made from an ESTAE routine. This is followed by an
SDWA as defined in the Debugging Handbook.
Identify the last entry. Are there entries following it? If so, the buffer
might have been wrapped and it no longer contains the earliest entry.
It is a good idea to have the SYS1.LOGREC records for the time
leading up to the dump. Scan the trace table for SVC 4C. This
represents a call to the LOGREC recording task and identifies a record
being written to SYS1.LOGREC. If SVC 4Cs appear in the trace,
it is certain that there are SYS1.LOGREC records that may more
closely define the problem. (See the discussion of LOGREC records in
the chapter "Use of Recovery Work Areas for Problem Analysis" in
Section 2 of this manual.)
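Scanning the in-storage buffer for software records, as described above, can be sketched in Python. The X'408x'/X'428x' header pattern is from the text; treating every entry as exactly X'200' bytes is a simplification (the text says "approximately"), and the buffer contents are fabricated:

```python
import struct

def software_records(buf, first, next_avail, reclen=0x200):
    """Offsets of software-record headers between first and next_avail."""
    hits = []
    off = first
    while off < next_avail:
        hdr = struct.unpack_from(">H", buf, off)[0]
        if hdr & 0xFDF0 == 0x4080:   # mask accepts X'408x' and X'428x'
            hits.append(off)
        off += reclen
    return hits

# Fabricated buffer: software records at 0 and 0x400,
# something else (not a software record) at 0x200.
buf = bytearray(0x600)
struct.pack_into(">H", buf, 0x000, 0x4083)
struct.pack_into(">H", buf, 0x200, 0x1234)
struct.pack_into(">H", buf, 0x400, 0x4281)
print([hex(o) for o in software_records(buf, 0, 0x600)])
```

A real scan would also honor the wrap condition noted above: entries after the "last entry" pointer may be older records that survived the wrap.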


As a general approach, follow the flow of FRR activity from the last
entry backwards until a pattern is recognizable or the first entry is
found.
If the abend codes relate to a particular component, refer to that
component's analysis procedure in Section 5 of this manual.

If you can define a function that is consistently failing (lOS, a program
check, etc.), examine the trace table for evidence of successful completion of this function. If the function completed successfully, the
search for the function that caused the overlay is narrowed to those
functions appearing in the trace between the last successful completion
and the first evidence of error. This should at least narrow the search
to the address space and task level.
Analyze the contents of the overlaid storage. If it appears to contain
registers, determine what data areas or modules the registers are
pointing at. This helps to identify the failing code.

If there is no evidence of a storage overlay, return to your system
analysis at the beginning of Note 31.
If a storage overlay exists, further examination of the reported
problem is usually non-productive until the cause of the system damage
is explained.
It might be necessary to build a trap to identify the cause of the overlay.
The chapter "Additional Data Gathering Techniques" in Section 2 of
this manual helps in building such a trap.


Appendix C: Abbreviations

ABP     Actual block processor
ACA     ASM control area
ACB     Access method control block
ACE     ASM control element
ACP     Automatic command processing
ACR     Alternate CPU recovery
ACT     Account control table
ADA     Automatic data area
AFQ     Available frame queue
AIA     ASM I/O request area
ALCWA   Allocation work area
ALPAQ   Active link pack area queue
AMB     Access method block
AMBL    AMB list
AMCBS   Access method control block structure
AMDSB   Access method data statistics block
AP      Attached processor
APF     Authorized program facility
APG     Automatic priority group
ASCB    Address space control block
ASID    Address space identification
ASM     Auxiliary storage manager
ASMHD   Auxiliary storage management header
ASMVT   ASM vector table
ASPCT   Auxiliary storage page correspondence table
ASST    Address space sector table
ASVT    Address space vector table
ASXB    Address space extension block
ATA     ASM tracking area
AVT     TCAM address vector table

BPCB    Buffer pool control block
BUFC    Buffer control area

CA      Control area or channel adapter
CAW     Channel address word
CAXWA   Catalog ACB extended work area
CCA     Catalog communications area
CCH     Channel check handler
CCW     Channel command word
CDE     Contents directory entry
CFQ     Common frame queue
CHAP    Change priority
CI      Control interval
CIDF    Control interval definition field
CMB     Console message buffer
CMP     Completion field

CMS       Cross memory services or catalog management services
CMSWA     CMS work area
CPA       Channel program area
CPAB      Cell pool anchor block
CPB       Channel program block
CPPL      Command processor parameter list
CPU       Central processing unit
CPUID     CPU identification
CQE       Console queue element
CRA       Component recovery area
CSA       Common storage area
CSCB      Command scheduling control block
CSD       Common system data area
CTGPL     Catalog parameter list
CVT       Communications vector table
CXSA      Communications extended save area

DAT       Dynamic address translation
DAVV      Direct access volume verification
DCB       Data control block
DCM       Display control module
DCT       Device control table
DDRCOM    Dynamic device reconfiguration communication table
DE        Directory entry
DEB       Data extent block
DECB      Data event control block
DIDOCS    Device independent display operators console support
DIE       Disable interrupt exit
DIR       Deferred incident record
DMDT      Domain descriptor table
DMVT      Domain vector table
DQE       Descriptor queue element
DRQ       Data ready queue
DSAB      Data set association block
DSCB      Data set control block
DSPCT     Data set page correspondence table
DVT       Destination vector table

ECB       Event control block
ECC       Error checking and correction
ECT       Environment control table
EDB       Extent descriptor block
EDL       Eligible device list
EED       Extended error descriptor
EIL       Event indication list
EIP       EXCP intercept processor
EMS       Emergency signal
EOA       End of address
EP        Emulator program
EPATH     Error path (recovery audit trail area)
EPS       External page storage

ERP       Error recovery procedures
ERPIB     Error recovery procedures interface block
ESTAE     Extended STAE
ESTAI     Extended STAI
EVNT      Event table
EWA       Common ERP work area

FBQE      Free block queue element
FDB       Feedback data block
FETWK     Fetch work area
FIFO      First in first out
FLIH      First level interrupt handler
FMCB      VTAM function management control block
FOE       Fixed ownership element
FOT       Fixed ownership table
FQE       Free queue element
FRR       Functional recovery routine
FRRS      FRR stack
FSB       Feedback status block
FVT       Field vector table

GDA       Global data area
GPR       General purpose register
GSMQ      Global service manager queue
GSPL      Global service priority list
GSR       Global shared resource
GTF       Generalized Trace Facility

HIR       Hardware instruction retry

IC        Instruction counter
ICNCB     Intermediate controller node control block
IHSA      Interrupt handler save area
ILC/CC    Instruction length code/condition code
IOB       Input/output block
IOE       I/O request element
IOMB      I/O management block
IOQE      I/O queue element
IORB      I/O request block
IOSB      I/O supervisor block
IOT       I/O table
IOWA      I/O work area
IPC       Inter-processor communication
IPCS      Interactive problem control system
IPL       Initial program load
IPS       Installation performance specifications
IQE       Interrupt queue element
IRB       Interrupt request block
IRT       IOS recovery table


JCL       Job control language
JCT       Job control table
JES       Job Entry Subsystem
JESCT     JES control table
JFCB      Job file control block
JFCBX     Job file control block extension
JOE       Job output element
JOT       Job output table
JPQ       Job pack queue
JQE       Job queue element
JSCB      Job step control block
JSEL      Job scheduling entry list
JSXL      Job scheduling exit list

KSDS      Key-sequenced data set

LCB       TP line control block
LCCA      Logical configuration communication area
LCCAVT    Logical configuration communication area vector table
LCH       Logical channel queue
LCPB      Logical channel program block
LCT       Linkage control table
LDA       Local data area
LFQ       Local frame queue
LG        Logical group
LGF       Line group block
LGCB      Logical group control block
LGE       Logical group entry
LGN       Logical group number
LGVT      Logical group vector table
LGVTE     Logical group vector table entry
LIFO      Last in first out
LIT       Lock interface table
LLE       Load list element
LLQ       Load list queue
LPA       Link pack area
LPDE      Link pack directory entry
LPID      Logical page identifier
LPME      Logical to physical mapping entry (or logical page mapping entry)
LRB       Logrec buffer
LSID      Logical slot ID
LSMQ      Local service manager queue
LSPL      Local service priority list
LSQA      Local system queue area
LVB       NCP logical unit block
LWA       Logon work area

MCH       Machine check handler
MCIC      Machine check interrupt code
MCP       Message control program
MCS       Multiple console support


MFA       Malfunction alert
MH        Message handler
MIH       Missing interrupt handler
MLPA      Modified link pack area
MP        Multiprocessing
MPST      Memory process scheduling table
MSS       Mass storage subsystem
MVS       Multiple Virtual Storage
MWA       Module work area

NCB       VTAM node control block
NCP       Network Control Program
NIP       Nucleus initialization program

OCR       Output control record
OCT       Output control table
OPWA      Open work area
ORE       Operator reply element
OUCB      SRM user control block
OUSB      SRM user swappable block
OUXB      SRM user extension block

PAB       Process anchor block
PART      Paging activity reference table
PARTE     PART entry
PAT       Page allocation table
PCB       Page control block
PCCA      Physical configuration communication area
PCCAVT    PCCA vector table
PCCB      Private catalog control block
PCCW      Paging channel command work area
PCE       Processor control element
PDDB      Peripheral data definition block
PDS       Partitioned data set
PEP       Partitioned emulator program
PER       Program event recording
PFT       Page frame table
PFTE      Page frame table entry
PGT       Page table
PGTE      Page table entry
PICA      Program interrupt control area
PIE       Program interrupt element
PIT       Partition information table
PIU       Physical information unit
PLH       Place holder
PLPA      Pageable link pack area
PLPAD     PLPA directory
PQE       Partition queue element
PRB       Program request block
PSA       Prefixed save area

PSAHLHI   PSA highest lock held indicator
PSCB      Protected step control block
PSS       Process scheduling service
PST       Process scheduling table
PSW       Program status word
PTLB      Purge translation lookaside buffer
PVT       Paging vector table
PVTAFC    PVT available frame count
PWKA      Paging work area

QAB       Queue anchor block
QCB       Queue control block
QEL       Queue element

RACF      Resource Access Control Facility (Program Product)
RB        Request block
RBA       Relative byte address
RBN       Real block number
RCB       Resource control block
RCT       Region control task
RDCM      Resident display control module
RDF       Record definition field
RDT       Resource definition table
RDTE      Resource definition table entry
RIM       Resource initialization module
RJE       Remote job entry
RMCT      Resource manager control table
RMF       Resource Management Facility (Program Product)
RMS       Recovery management support
RPH       Request parameter header
RPL       Request parameter list
RQE       Request queue element
RSM       Real storage manager
RSMHD     RSM header
RTAM      Remote terminal access method
RTCA      Recovery termination control area
RTCT      Recovery termination control table
RTM       Recovery termination manager
S/A       Stand-alone (dump program)
SART      Swap activity reference table
SAST      Subsystem allocation sequence table
SAT       Swap allocation table
SCCW      Swap channel control work area
SCT       Step control table
SDWA      System diagnostic work area
SGT       Segment table
SGTE      Segment table entry
SIC       System initiated cancel
SIGP      Signal processor
SIO       Start input/output

SIOT      Step I/O table
SLIH      Second level interrupt handler
SMF       System measurement facility
SMS       Storage management services
SNA       Systems Network Architecture
SPCT      Swap control table
SPQE      Subpool queue element
SQA       System queue area
SRB       Service request block
SRM       System resources manager
SRR       Serially reusable resource
SSCP      System services control point
SSCVT     Subsystem communications vector table
SSI       Subsystem interface
SSIB      Subsystem identification block
SSOB      Subsystem options block
SSQ       SVRB suspend queue
SSRB      Suspended service request block
SSVT      Subsystem vector table
STAE      Specify task abnormal exit
STAI      Subtask abend intercept
STC       Started task control
STCB      Subtask control block
STM       Store Multiple instruction
STOR      Segment table origin register
SVC       Supervisor call
SVRB      Supervisor request block
SWA       Scheduler work area
TCAM      Telecommunications Access Method
TCB       Task control block
TCH       Test channel
TCX       TCAM
TDCM      Pageable display control module
TEA       Translation exception address
TH        Transmission header
TIOC      Terminal I/O coordinator
TIOT      Task input/output table
TLB       Translation lookaside buffer
TMC       Task mode controller
TME       Task mode element
TMP       Terminal monitor program
TOD       Time of day
TSB       Terminal status block
TSO       Time Sharing Option
TTE       Trace table entry

UADS      User attribute data sets
UCB       Unit control block
UCM       Unit control module



UCME      Unit control module entry
UIC       Unreferenced interval count
UPT       User profile table

VBN       Virtual block number
VBP       Virtual block processor
VDSCB     Virtual data set control block
VIO       Virtual I/O
VSAM      Virtual Storage Access Method
VSM       Virtual storage management
VTAM      Virtual Telecommunications Access Method
VTOC      Volume table of contents
VUT       Volume unload table

WAST      Workload activity specification table
WMST      Workload manager specification table
WQE       Write queue element
WTQE      Wait queue element
XL        Extent list
XPTE      External page table entry


Index

abbreviations, list of C.1.3
abend codes
ASM 08x series 5.6.14
C0D in ASM 5.6.19
started task control 2.7.19
SWA manager 2.7.20
symptoms of IOS problems 5.2.4
0B0 in Allocation 5.11.13
0C4 in Allocation 5.11.13
306 abend in program manager 5.3.18
806 abend in program manager 5.3.14
abend dump debugging 2.7.11
abend resource manager 5.3.13
abnormal end appendages
with ERPs 5.2.10
abnormal task termination (RTM) 5.14.5
ACB (access method control block)
how to locate 5.10.3
major fields in 5.10.4
major flags in 5.10.4
ACCOUNT command processor A.6.19
ACR (see alternate CPU recovery)
active recovery stack 2.1.6
additional data gathering techniques 2.8.1
addresses, commonly bad 2.7.5
address space
analysis 2.1.7
ASM's 5.6.5
blocks 4.4.4
dispatchable work in B.1.7
dispatcher's 5.1.8
initialization 5.4.3
OUCB queues 5.7.6
states 5.7.2
termination 5.14.9
tests made by dispatcher 5.1.11
allocation
of SRM device 5.7.5
of virtual storage 5.4.6
allocation/unallocation
abends
0B0 5.11.13
0C4 5.11.13
address space termination 5.11.13
allocation
common 5.11.4
fixed device 5.11.4
generic 5.11.5
module naming conventions 5.11.6
recovery 5.11.5
serialization 5.11.11
TP 5.11.4
work area 5.11.7
batch initialization 5.11.2
data set association block (DSAB) 5.11.7
device selection 5.11.12
dynamic initialization 5.11.3
ESTAE processing 5.11.10
JFCB housekeeping 5.11.3
job control table (JCT) 5.11.2
job step control block (JSCB) 5.11.2

allocation/unallocation (continued)
linkage control table (LCT) 5.11.2
reason codes 5.11.16
step control table (SCT) 5.11.2
unallocation
common 5.11.5
dynamic 5.11.3
volume mount and verify (VM&V) 5.11.5
definition 2.5.2
initiated via EMS 2.5.10
problem analysis 2.7.1
AMCBS, major fields in 5.10.2
AMDPRDMP
control cards 2.8.2
example of use of data 4.1.2
how to copy tapes 2.8.5
QCBTRACE option
function 2.8.4
use for loop analysis 4.2.3
use for wait analysis 4.1.2
APF authorization 5.3.14, 5.3.18
appendages, abnormal end
with ERPs 5.2.10
ASCB (address space control block)
analysis 2.1.7
ASM (auxiliary storage manager)
address space structure 5.6.6
cell pools 5.6.6
component
functional flow 5.6.2
operating characteristics 5.6.4
control blocks 5.6.19
converting a slot number to full seek address 5.6.10
C0D abend 5.6.19
diagnostic aids 5.6.18
error analysis suggestion 5.6.12
finding the LSID for a given page 5.6.9
footprints and traces 5.6.7
FRR/ESTAE work areas 5.6.15
general debugging approach 5.6.8
incorrect pages 5.6.9
interfaces with other components 5.6.7
MP considerations 5.6.6
page/swap data set errors 5.6.12
paging interlocks 5.6.8
recovery
as a debugging tool 5.6.15
considerations 5.6.13
footprints 5.6.15
structure 5.6.14
traces 5.6.14
register conventions 5.6.7.0
requesting I/O 5.6.3
requesting swap I/O 5.6.4
saving an LG 5.6.2
SDWA variable recording area 5.6.16
serialization 5.6.13.0
SRB structure 5.6.4
storage considerations 5.6.4
system mode 5.6.4
task structure 5.6.4


ASM (auxiliary storage manager) (continued)
unusable paging data sets 5.6.11
validity checking 5.6.13
ATA (ASM tracking area) 5.6.19
ATTACH (program manager function) 5.3.8,5.3.15
attention processing (TSO) A.6.25
attention, console
not responding 5.15.6
audit trail area (EPATH) 5.6.22
auxiliary storage manager (see ASM)
backout (for DEFINE/DELETE) 5.10.13
batch initialization 5.11.2
BLDL table analysis 4.4.5
BPCBs (buffer pool control blocks) 5.8.12
BSHEADER data area 5.6.25
BUFCONBK data area 5.6.25
buffer
emergency signal 2.5.11
external call 2.5.17
LOGREC 2.4.14
translation lookaside 2.5.1
VTAM buffer pools 5.8.16
VTAM buffer trace 4.3.6,4.3.29
cancel process (RTM) 5.14.7
catalog communications area (see CCA)
catalog management
backout 5.10.13
CMS function gate 5.10.11
component analysis 5.10.1
debugging aids 5.10.15
diagnostic output 5.10.12
establishing/releasing a recovery environment 5.10.10
how to find registers 5.10.1
maintaining a pushdown list end mark 5.10.10
major control blocks 5.10.2
major registers 5.10.2
module structure 5.10.9
recovery routine functions 5.10.12
tracking GETMAIN/FREEMAIN activity 5.10.11
VSAM catalog recovery logic 5.10.10
catalog parameter list (CTGPL), major fields 5.10.6
CAXWA
major fields 5.10.5
major flags 5.10.6
CCA (catalog communication area)
major fields 5.10.7
major flags 5.10.7
CDE (contents directory entry)
allocation 5.3.17
analysis 4.4.4
initialization by IDENTIFY 5.3.12
order of on ALPAQ 5.3.15
cell pool anchor block (see CP AB)
cell pool management
VSM 5.4.10
channel program
with ERPs 5.2.10
channel scheduler, invoked for IOS 5.2.1
CHNGDUMP command
to change SDUMP contents 2.8.2, 3.1.6
to override SVCDUMP parameters 2.8.5
C/L IN, OUT traces
definition 4.3.6
example 4.3.12
class locks
with ASM 5.6.13.2


CMS function gate 5.10.11
CMS lock 2.3.2
CMS lockword
contents 2.3.5
requests for unavailable 2.3.7
suspend queues 2.3.7
command processor
and TMP interface A.6.15
parameter list A.6.17
COMM task, current status 4.1.15
(see also communications task)
common
allocation 5.11.2
storage area (see CSA)
unallocation 5.11.2
communications task
control blocks 5.15.4
debugging hints 5.15.6
description 5.15.1
sequence of processing 5.15.3
compare and swap
serialization with ASM 5.6.13.3
completion codes in IOSB for ASM errors 5.6.11
console
messages 5.15.9
not responding to attention 5.15.6
switching 5.15.10
contents directory entries (see CD E)
control layer A.5.1
converting virtual to real addresses 5.5.14
CPAB 5.4.10
CQE control block 5.15.4
CSA (common storage area)
analysis of use of 4.4.5
use by TCAM A.6.7
CTGPL (catalog parameter list), major fields 5.10.6
current recovery stack (see FRR stacks)
CVOL processor 5.10.9
CXSA control block 5.15.4
DASD ERPs 5.2.14
data gathering techniques 2.8.1
data sets
page/swap errors 5.6.12
DEFINE/DELETE backout 5.10.14
DELETE (function of program manager) 5.3.11
DIDOCS
in-operation indicator 5.15.11
locking 5.15.12
trace table 5.15.11
disabled loop (see loops)
disabled mode 2.2.2
disabled wait (see waits)
DISP lock
description 2.3.2
recovery routines when held 5.1.3
dispatchable units of work
in an address space B.1.7
priority and location 5.1.4
dispatchability tests
address space 5.1.11
SRB 5.1.10
task 5.1.12
dispatcher
component analysis 5.1.3
determining the last dispatch 5.1.12
dispatchability tests 5.1.10
error conditions 5.1.14


dispatcher (continued)
important entry points 5.1.3
processing overview 5.1.9
recovery considerations 5.1.13
DISPLAY DUMP command 2.8.2
DSNLIST data area 5.6.26
dump analysis
areas 3.5.6
MP 2.5.2
stand-alone 3.1.3, B.1.1
tracing procedure 2.6.5
dumps
how to copy tapes 2.8.5
how to print 2.8.2
sample storage pool dump 5.8.13
DUMP command 2.8.2,5.14.12

FRR stacks, important field contents 2.4.17, B.1.17
functional recovery routine (see FRR)
GDA (global data area) for VSM 5.4.7
GETMAIN/FREEMAIN
GETMAIN FRR 5.4.8
indication in trace table 2.6.9
process flow A.4.1
SVC 120 5.4.12
virtual storage allocation 5.4.6
GETPART/FREEPART 5.4.5
global data area for VSM 5.4.7
global indicators of current system state 2.1.3
global locks
definition 2.3.1
error status B.1.14
spin locks
definition 2.3.2
content of lockword 2.3.5
suspend locks
definition 2.3.2
content of lockword 2.3.5
global SRBs
control block relationships 5.1.5
dispatching 5.1.4
mode indicators set by dispatcher 5.1.12
queue structure 5.1.5
status indicators B.l.19
global service priority list 2.2.2 (5740-XE1)
global system analysis (chapter) 2.1.3
GSMQ/LSMQ 2.1.7
GSPLs/LSPLs 2.1.7, 2.2.2
GTF (generalized trace facility)
I/O and SIO trace (EP) 4.3.4
I/O and SIO trace (NCP) 4.3.5
output examples 2.8.17
RNIO trace 4.3.5
trace examples 2.6.3

EDIT command processor A.6.19
EED, important fields 2.4.18
ElL control block 5.15.4
Emergency Signal instruction (see EMS)
EMS (function of SIGP)
definition 2.5.7, 2.5.10
process flow 2.5.14
enabled loop (see loops)
enabled loop exception 4.2.3
enabled wait (see waits)
enabling PER hardware 2.8.18
ENQ/DEQ
analysis for enabled waits 4.1.12
analysis for performance degradation
common ENQ resource names 4.1.13
enqueue lockout B.1.8
global save area 4.4.4
EP mode traces 4.3.4
EPATH (error path) 5.6.22
ERPs (error recovery procedures)
abnormal end appendages 2.7.3, 5.2.10
description 5.2.8
diagnostic approach 5.2.17
EWA (ERP work area) 5.2.6
traps 5.2.16
error id 5.14.10
error interpreter table 5.2.11
error recovery procedures (see ERPs)
ESTAE/ESTAI
ASM work areas 5.6.15
processing, allocation 5.11.10
EWA (ERP work area) 5.2.6
EXCP major control block relationships 5.2.3
EXCP/IOS process flow A.3.1
execution modes (see system mode)
exit resource manager 5.3.11
explicit waits 2.1.8, B.1.9
extended error descriptor (EED) 2.4.18,5.14.2
external call (XC function of SIGP)
description 2.5.7,2.5.9
process flow 2.5.12
FETCH, program manager work area (FETWK) 5.3.19
FMCB/DNCB, how to find for a node 5.8.14
FORCE command 5.14.8
FORMAT statement (of PRDMP) 2.8.4
formatted RTM control blocks 2.4.19
formatting (LOGREC buffer) 2.4.15
FRR (functional recovery routine)
ASM's 5.6.14
ASM's FRR work areas 5.6.15
GETMAIN's 5.4.8
RSM's 5.5.9
SRM's 5.7.10

hardware-detected errors, analysis 3.1.10
hierarchy of locks 2.3.2

IDENTIFY (function of program manager) 5.3.12
IEAVGFA tests by RSM A.1.3
IEAVIOCP tests by RSM A.1.6
IEAVPIOP tests by RSM A.1.6
IEAVPIX tests by RSM A.1.3
IEAVSWIN A.2.1
IHSA 2.2.1
IEAVTABD 5.14.12
IEAVTSDT 5.14.11
ILC/CC important field contents 2.1.4
Incorrect Output (chapter) 4.5.1
analyzing system functions 4.5.2
initial analysis 4.5.1
isolating the component 4.5.1
in-operation indicator
DIDOCS 5.15.11
Installation Performance Specification (IPS) 5.7.1
inter-processor communication 2.5.7
interactive problem control system (IPCS) 1.1.4
intercept condition
ERPs 5.2.13
interrupts, PSA fields B.1.23
I/O
capability in MP 2.5.17
incomplete B.1.8
problems in enabled waits 4.1.10
requesting (ASM) 5.6.3
requesting swap 5.6.4
trace entries 2.6.7


I/O (continued)

VTAM I/O trace (see VTAM)
IOB (see IOMB)
I/O manager
debugging 5.9.9
modules 5.9.8
IOMB 5.9.9
I/O request, information in PLH 5.9.2
IOS (I/O Supervisor)
ABEND codes 5.2.4
back-end processing 5.2.1, A.3.3
component analysis 5.2.1
ERP processing 5.2.8
EXCP/IOS process flow A.3.1
front-end processing 5.2.1, A.3.1
general hints 5.2.6
loops 5.2.4
major control block relationships 5.2.3
POST STATUS A.3.3
problem analysis 5.2.1
processing overview 5.2.2
save areas 5.2.6
storage manager queues 4.4.4
VTAM interaction A.5.1
wait states 5.2.5
IOSB flags 5.2.7
IOSCAT lock 2.3.2,2.3.6
IOSLCH lock 2.3.2, 2.3.6
IOSUCB lock 2.3.2, 2.3.6
IOSYNCH lock 2.3.2, 2.3.6
IPC (see inter-processor communication)
IPCS 1.1.4
JES2 (job entry subsystem)
&DEBUG parameter 5.12.14
&WAIT macro 5.12.9
control blocks 5.12.17
conversion 5.12.1
dispatcher 5.12.9
queue structure 5.12.10
error routines
catastrophic 5.12.13
disastrous 5.12.11
ESTAE 5.12.13
exit 5.12.13
I/O error logging 5.12.14
execution 5.12.1
HASP Control Table (HCT) 5.12.4
HASPSSSM 5.12.6
multi-access spool configuration 5.12.14
initialization 5.12.15
read 5.12.15
release 5.12.16
write 5.12.15
operator commands for status information 4.4.2
output 5.12.1
processor control element (PCE) 5.12.9
purge 5.12.2
structure 5.12.2
subsystem interface 5.12.7
JFCB housekeeping 5.11.3
job control table (JCT) 5.11.2
LCCA indicator 2.2.4
LCH queues, analysis for enabled waits 4.1.10
LDA, important flags 5.4.6
LG, saving 5.6.2
line drop (TSO processing) A.6.12
linkage control table (LCT) 5.11.2
LINK (function of program manager)
description 5.3.5
module search sequence 5.3.15


LMOD map, how to print 2.8.8
LOAD (function of program manager)
description 5.3.11
module search sequence 5.3.15
local lock
definition 2.3.1
dispatcher recovery routines 5.1.13
lockword contents 2.3.5
lockword location 2.3.6
requests for unavailable 2.3.7
suspend (definition) 2.3.3
local SRBs
control block relationship 5.1.7
dispatching 5.1.6
dispatching priority in address space 5.1.8
mode indicators set by dispatcher 5.1.12
queue structure 5.1.7
status indicators B.1.19
locating status information in a storage dump 2.2.5
locked mode
definition 2.2.3
status saving during execution in 2.2.3
locking (chapter) 2.3.1
lock interface table (IEAVESLA) 2.3.5
locks (see also lockwords)
classes 2.3.1,2.3.6
determining which held on a processor 2.3.4
hierarchy 2.3.2
location of 2.3.6
PSAHLSI bits 2.3.4
requests for unavailable 2.3.7
table of definitions 2.3.2
types 2.3.2
VTAM locking 5.8.7
with ASM 5.6.4,5.6.19
with DIDOCS 5.15.2
lockwords
contents of 2.3.5
how to find 2.3.5
LOGDATA verb 2.4.15
logging, ERPs 5.2.12
logical groups
assigning 5.6.2
releasing 5.6.2
logon
command processor A.6.19
diagnostic aids A.6.11.0
initialization A.6.8
monitor A.6.8
post codes A.6.11.0
process overview A.6.1
scheduler A.6.10
scheduler router A.6.8
verification A.6.10
work area A.6.9, A.6.11.0
LOGREC
analysis 2.4.2
buffer, recording control 2.4.14
for debugging SVC dump 5.14.12
formatting 2.4.15
how to print 2.8.9
listing LOGREC data set 2.4.2
record examples 2.4.3
recording control buffer 2.4.14
loops
common loops 4.2.1
disabled
apparent in IEAVERI 2.5.16
definition 4.2.1
intentional 4.2.1
PSASUPER bits to check 2.1.5


loops (continued)
system mode 4.2.4
enabled
definition 4.2.1
exception 4.2.3
in Lock Manager code B.1.19
symptoms of IOS problems 5.2.4
low storage overlays 2.7.4
LPAMAP (statement in PRDMP) 2.8.4
LPSW, common uses of 4.1.4
LSID, finding
for a page 5.6.9
for VIO 5.6.10.0
LSMQ 2.1.7
LSPL 2.1.7,2.2.2

open/close/end-of-volume (see O/C/EOV)
OPERATOR command processor A.6.19
operator commands
for status information 4.4.2
to identify performance degradation 4.4.1
ORE control block 5.15.4
other tracing methods 4.3.30
OUCB (SRM user control block)
important fields 5.7.2
OUTPUT command processor A.6.20
overlays, storage
cause of wait state PSWs 4.1.4
how to locate in trace table 2.6.2
in low storage 2.7.4
pattern recognition 2.7.3

machine checks
debugging 2.7.6
interrupt code (MCIC) 2.7.6
reference matrix 2.7.10
message flow
through the system 4.3.1
trace examples 4.3.12
messages
ERPs 5.2.12
lost 5.15.8
routed wrong 5.15.9
miscellaneous debugging hints (chapter) 2.7.1
module search sequence
for LINK, ATTACH, XCTL, LOAD 5.3.15
of private libraries 5.3.16
module subpools 5.3.19
MP (multiprocessing)
activity in trace table 2.6.8
ASM's use of 5.6.6
associated data areas 2.5.3
debugging hints 2.5.16
dump analysis 2.5.2
effects on problem analysis 2.5.1
features of MP environment 2.5.1
parallelism 2.5.4
PSA analysis B.1.3
remote pendable services 2.5.9
remote immediate services 2.5.10
SIGP instruction 2.5.7
system stop routine 2.8.20
MSGBUFER data area 5.6.26
multiprocessing (see MP)
multi-access spool configuration 5.12.14
MVS trace (see trace, trace table)

PAB (process anchor block) 5.8.2
page control block (see PCB)
page fault
process flow A.1.3
Reclaim 5.5.8
status saving 2.2.6
trace examples 2.6.3
waits 4.1.10
page frame table entries (see PFTE)
page stealing . 5.5.6
page waits B.1.9
page/swap data set errors 5.6.12
paging
finding the LSID 5.6.9
incorrect pages 5.6.9
interlocks 5.6.8
process 5.5.6
unusable data sets 5.6.11
paging requests, analysis 4.4.5
parallelism 2.5.4
PART/PAT bit, locating 5.6.10.3
pattern recognition 2.7.3
PCB (page control block)
important fields in 5.5.3
swap-out A.2.5
use in debugging A.2.5
PCCB major fields and flags 5.10.3
PEP emulator line trace 4.3.4
performance degradation
chapter on 4.4.1
dump analysis areas 4.4.2
operator commands to identify 4.4.1
PER hardware
enabling to monitor storage 2.8.18
trace example 2.8.19
PFTE (page frame table entries)
analysis 4.4.4
important fields 5.5.6
PGTE, RSM tests on A.1.3
physically disabled mode 2.2.2
PIU (physical information unit)
format 4.3.27
tracing inbound/outbound 4.3.30
PLH (place holder) 5.9.2
post codes, LOGON A.6.11.0
PRB initialization 5.3.7
PRDMP (see AMDPRDMP)
PRE-TMP exit A.6.11
PRINT statement (in AMDPRDMP), use of 2.8.4
printer ERP 5.2.15
private libraries, module search sequence 5.3.16
process flows
page faults (RSM processing) A.1.3

NCP (network control program)
activating several NCP traces 4.3.28
channel adapter traces 4.3.5
line trace
definition 4.3.5
example 4.3.12
node trace 4.3.5
normal stack 2.1.6,2.4.17
normal task termination 5.14.4
no-work wait (see also enabled waits) 4.1.8
O/C/EOV (open/close/end-of-volume)
abends 2.7.5
DEB chaining 5.9.8
debugging aids 5.9.7
ENQs issued by 5.9.7
messages 5.9.6
online problem analysis 1.1.4


process flows (continued)
EXCP/IOS A.3.1
GETMAIN/FREEMAIN A.4.1
swapping A.2.1
TSO A.6.1
VTAM A.5.1
program checks
example of LOGREC entry 2.4.10
interrupts 5.1.14
VTAM 5.8.15
Program Manager
APF authorization 5.3.14
ATTACH 5.3.8
CDE
allocation 5.3.17
component analysis 5.3.1
control blocks 5.3.1,5.3.3
DELETE 5.3.11
exit resource manager 5.3.11
FETCH/program manager work area 5.3.19
functional description 5.3.1
functional flow 5.3.5
IDENTIFY 5.3.12
LINK 5.3.5
LOAD 5.3.11
module description 5.3.2
module search sequence
for LINK, ATTACH, XCTL, LOAD 5.3.15
of private libraries 5.3.16
module subpools 5.3.19
organization 5.3.1
process anchor block (PAB) 5.8.2
queue validation 5.3.4
queues, description 5.3.2
RB extended save area 5.3.20
SYNCH 5.3.12
system initialization 5.3.5
XCTL 5.3.8
806 ABEND 5.3.14
PSA (prefixed save area)
analysis on MP systems B.1.13
contents of important fields 2.1.5
indicators 2.2.4
interrupt indicator B.1.23
using as a patch area 2.8.10
used to determine current system state 2.1.3,2.2.4
PSW (program status word)
analysis 2.1.4
wait state B.1.2
pushdown list end mark, maintaining 5.10.10
QCBTRACE (AMDPRDMP option)
when to use 2.8.4
use for loop analysis 4.2.3
use for wait analysis 4.1.12
QTIP
attention handler A.6.27
processing A.6.4
RB (request block)
analysis 2.1.9
extended save area (RBEXSAVE) 5.3.20
manipulation by XCTL 5.3.10
new RB initialization for XCTL 5.3.9
RCB (recording control buffer) 2.4.14
RCT (region control task)
attention exit A.6.28
attention scheduler A.6.28
functions A.6.27
RDCM (resident display control module)
con trol block 5.15.4


real addresses, converting 5.5.14
real frame shortage, indicators 4.4.5
real storage manager (see RSM)
reason codes
allocation 5.11.16
started task control 2.7.19
SWA manager 2.7.20
reclaim (function of RSM) 5.5.8
record management
debugging aids 5.9.3
processing 5.9.1
recovery audit trail (ASM) 5.6.22
recovery stack 2.2.4
recovery work areas, use of 2.4.1
register conventions
ASM 5.6.7.0
Relate (function of RSM) 5.5.8
replies, lost 5.15.8
requesting
I/O (ASM) 5.6.3
swap I/O (ASM) 5.6.4
retry process (RTM) 5.14.6
retry/restart
with ERPs 5.2.10
RMCT (SRM control table)
system indicators 5.7.3
RNIO trace example 4.3.12
RPHs (request parameter headers)
location of 5.8.11
queuing while waiting for storage 5.8.14
waiting for the same lock 5.8.9
RPL error fields 5.9.1
RSM (real storage manager)
abend reason codes 5.5.10
component analysis 5.5.1
debugging tips 5.5.12
major control blocks 5.5.1
page fault processing A.1.3
page stealing process 5.5.6
Reclaim 5.5.8
Recovery 5.5.9
Relate 5.5.8
RTM (recovery termination manager)
cancel 5.14.7
error id 5.14.10
FORCE command 5.14.8
extended error descriptor (EED) 5.14.2
hardware error processing 5.14.2
major RTM modules 5.14.1
process flow 5.14.2
retry 5.14.6
RTM1 5.14.1
RTM2 5.14.2
stack vector table 2.2.4
system diagnostic work area (SDWA) 5.14.2
termination
abnormal task 5.14.5
address space 5.14.9
normal task 5.14.4
use in producing SVC dump 5.14.11
RTM2WA
definition 2.4.19
status information A.11-A.12
SALLOC lock 2.3.2,2.3.6
with ASM 5.2.13.0
SCHEDULE macro 2.2.2
scheduler work area (see SWA manager)
SDUMPs
analysis 3.1.5
how to change contents of 3.1.6
parameter list 3.1.6


SDWA (system diagnostic work area)
data recorded by dispatcher
5.1.13
use by Catalog Management 5.10.1
use by SYS1.LOGREC 2.4.3,2.4.6-2.4.10
use in FRR stack 2.4.18
SDWAVRA (SDWA variable recording area)
entries 2.4.10,5.4.8
error indicators 5.4.9
use by ASM 5.6.16
use by catalog management 5.10.12
sense command
with ERPs 5.2.13
serialization, ASM 5.6.13.0
SIC (system-initiated cancel) A.6.14
SIGP (signal processor) instruction 2.5.7
EMS function 2.5.10
return codes 2.5.8
XC function 2.5.9
SLIP command, using 2.8.11
SLIP keywords 2.8.11
SLIP trap design 2.8.12
slot number, converting to full seek address 5.6.10
software-detected errors, analysis 3.1.9
software incidents
examples 2.4.3
types 2.4.3
SPCT (swap control table)
format of 5.5.5
important fields in 5.5.5
special exits, dispatching 5.1.4
spin locks, definition 2.3.2
SRB (see also local and global SRBs)
dispatching queues 2.1.7,5.1.4
global 5.1.4
local 5.1.6
locally locked interrupted/suspended 2.2.3
mode 2.2.2
suspension 2.1.9,2.2.7,2.3.7
tests made by dispatcher 5.1.10
SRM (system resources manager)
address space states 5.7.2
control algorithms 5.7.12
entry point summaries 5.7.8
error recovery 5.7.8
functional recovery routine 5.7.10
indicators 5.7.3
interface routine 5.7.8
I/O management 5.7.12
objectives 5.7.1
processor management 5.7.11
resource manager 5.7.12
service routine 5.7.10
storage management routine 5.7.9
SYSEVENT router 5.7.9
system (RMCT) indicators 5.7.3
user (OUCB) indicators 5.7.6
workload activity recording 5.7.14
workload manager 5.7.15
SSI (subsystem interface)
function codes 5.13.10
function dependent area 5.13.5
initialization processing 5.13.1
JES control table (JESCT) 5.13.2
logic flow examples 5.13.7
major control blocks 5.13.2
requesting services 5.13.5
return codes 5.13.8
subsystem
communications vector table (SSCVT) 5.13.1
information block (SSIB) 5.13.1

subsystem (continued)
options block (SSOB) 5.13.1
vector table (SSVT) 5.13.1
stand-alone dump
analysis B.l.1
procedure B.l.7
chapter on 3.1.3
debugging SVC dump 5.14.11
determining system mode from 2.2.4
how to print 2.8.2
special notes 2.1.3
started task control (see STC)
status information, locating in storage dump 2.2.5
STATUS STOP SRB 2.5.6
STC (started task control)
abend codes 2.7.19
reason codes 2.7.19
step initiation/termination 5.4.5
SUBMIT command processor A.6.20
subpools for modules 5.3.19
SUMDUMP output 2.7.14
summary dump 2.7.14
super bits (see PSASUPER)
superzaps
to expand trace table 2.8.21
to force tracing during NIP processing 2.8.21
to modify trace table to monitor low storage 2.8.20
to stop MP system 2.8.21
to trace all inbound PIUs 4.3.30
to trace all outbound PIUs 4.3.30
VTAM buffer trace modification 4.3.29
VTAM I/O trace modification 4.3.29
suspend locks, definition 2.3.2
suspended
locally locked tasks 2.2.6
SRB status 2.1.9
SRB/task with lock held B.1.15
tasks or address space caused by unsatisfied ENQ
request 4.1.12
task status 2.1.8
SVC D entries in trace table 2.6.8
SVC dumps
analysis 3.1.5
debugging of
control blocks, use of 5.14.14
fixed data, use in 5.14.12
procedure 5.14.12
recovery routines, use in 5.14.14
SLIP traps, use in 5.14.12
SYS1.LOGREC, use in 5.14.12
variable data, use in 5.14.14
variable data offset determination 5.14.14
how to override parameters 2.8.5
IEAVTSDT, dump task for 5.14.11
invocation of
branch entry 5.14.11
DUMP Command 5.14.12
IEAVTABD 5.14.12
producing
RTM, use in 5.14.11
SYSMDUMP DD, use in 5.14.11
SWA (scheduler work area) manager
reason codes 2.7.20
swap-in process A.2.1
swap transition flags 5.7.2
swapping
process flow A.2.1
swap-out process A.2.3

Index

1.1.7

swap-out PCB A.2.5
SWIN (IEAVSWIN) A.2.1
SYNCH (function of Program Manager) 5.3.12
SYSABENDs
analysis approach 3.1.9
hardware-detected errors 3.1.10
software-detected errors 3.1.9
system degradation (see performance degradation)
system diagnostic work area (see SDWA)
system execution modes and status saving 2.2.1
system hung (see enabled waits)
system options for SVCDUMP 2.8.5
system modes
at entry to RTMI 2.4.18
determining from Stand-alone dump 2.2.4
locked mode 2.2.3
physically disabled mode 2.2.2
SRB mode 2.2.2
task mode 2.2.1
system resources manager (see SRM)
system stop routine 2.8.20
SYSABENDs 3.1.9
SYSMDUMPs 3.1.9,5.14.1
SYSUDUMPs, analysis approach 3.1.9
SYSZEC16-PURGE 4.1.13
SYSZVARY-x 4.1.3
SYS1.COMWRITE data set, how to print 2.8.8
SYS1.DUMP
how to clear without printing 2.8.7
how to print 2.8.7
SYS1.LOGREC (see LOGREC)
SYS1.STGINDEX, how to recreate 2.8.9
SYS1.UADS, how to rebuild 2.8.6

traces (see also trace table)
activating several NCP traces 4.3.28
analysis of 2.6.1
currency 2.6.8
EP mode 4.3.4
events not traced 2.6.8
examples 2.6.3,4.3.7
interpreting 2.6.5
NCP mode 4.3.5
other tracing methods 4.3.30
output under normal conditions 4.3.7
summary of 4.3.3
to monitor storage 2.8.21
types of 4.3.3
trace table
cautionary notes 2.6.7
how to expand 2.8.21
how to locate 2.6.1
how to modify to monitor low storage 2.8.20
types of entries 2.6.1
with DIDOCS 5.15.11
traps, ERPs 5.2.16
TSO (time sharing option)
APAR documentation A.6.28
attention processing A.6.25
command processor recovery A.6.19
line drop processing A.6.12
message handler A.6.27
overview of logon processing A.6.2
process flow A.6.1
terminal I/O flow A.6.21
time sharing initialization A.6.1
TSO/TIOC terminal I/O diagnostic techniques A.6.24

tape ERP 5.2.15
task
analysis 2.1.8
locally locked interrupted 2.2.3
locally locked suspended 2.2.6,2.3.5,2.3.7,4.1.11
mode indicators set by dispatcher 5.1.12
RB structure 2.1.9
tests made by dispatcher 5.1.12
TCAM
address space A.6.7
buffer trace (EP) 4.3.4
buffer trace (NCP) 4.3.6
channel end appendage A.6.25
dispatcher subtask trace 4.3.6
EP mode line I/O interrupt trace table 4.3.4
organization after a TSO logon A.6.7
PIU trace 4.3.6
subtask trace 4.3.4
TIOC logon processing A.6.6
TSO terminal I/O diagnostic techniques A.6.24
TCB (task control block)
analysis 2.1.8
dispatching priority in address space 5.1.8
summary report 2.1.6
suspended with lock held B.1.5
TDCM (pageable display control module)
control block 5.15.4
teleprocessing (see TP)
timer value in trace table 2.6.7
time sharing and TCAM data flow A.6.21
TMP (terminal monitor program) A.6.6
EP mode 4.3.4
typical problems 4.3.1
TMP/command processor
interface A.6.15
work area A.6.17
TPIOS buffer trace, example 4.3.12
TPIOS IN/OUT REMOTE trace 4.3.6

UCB, analysis for enabled waits 4.1.10
UCM (unit control module)
control block 5.15.4
UCME (UCM entry)
control block 5.15.4
unit check
with ERPs 5.2.13
use of recovery work areas for problem analysis 2.4.1

validity bits for machine checks 2.7.9
virtual addresses, converting 5.5.14
virtual storage access method (see VSAM)
virtual storage manager (see VSM)
virtual telecommunications access method (see VTAM)
volume mount & verify (VM&V) 5.11.5
VSM (virtual storage manager)
address space initialization 5.4.3
allocation 5.4.6
basic functions 5.4.1
cell pool management 5.4.10
control block usage 5.4.4
debugging hints 5.4.10
GETMAIN/FREEMAIN process flow A.4.1
global data areas (GDA) 5.4.7
step initialization/termination 5.4.5
view of MVS storage 5.4.2
VSAM (virtual storage access method)
component analysis 5.9.1
I/O manager debugging 5.9.9
O/C/EOV debugging aids 5.9.7
O/C/EOV messages 5.9.6
record management
buffer control block (BUFC) 5.9.3
debugging aids 5.9.3
error codes 5.9.5
placeholder (PLH) 5.9.2
request parameter list (RPL) 5.9.1

OS/VS2 System Programming Library: MVS Diagnostic Techniques

VTAM (virtual telecommunications access method)
address space usage 5.8.6
component analysis 5.8.1
control block structure 5.8.3
debugging 5.8.10
function management control block (FMCB) 5.8.5
how work is processed 5.8.2
locating FMCB/DNCB for a node 5.8.14
locking 5.8.7
miscellaneous hints 5.8.15
module naming conventions 5.8.6
operating characteristics 5.8.6
process flow A.5.1
program checks 5.8.15
recovery/termination 5.8.8
relationship with MVS 5.8.1
sample storage pool dump 5.8.13
SEND process flow A.5.2
VTAM buffer trace
definition 4.3.6
modification 4.3.29
VTAM GTF trace example 4.3.12
VTAM I/O trace
definition 4.3.5
example 4.3.7
modification 4.3.29
waits
chapter on 4.1.3
disabled
analysis approach 4.1.5
characteristics of 4.1.4
locked console exception 4.1.5
with communications task 5.15.7
enabled
analysis approach 4.1.7
analysis via trace table 2.6.7
characteristics of 4.1.3

waits (continued)
with communications task 5.15.6
enabled loop exception 4.2.3
explicit 2.1.8, B.1.9
for VTAM buffer depletion 5.8.15
indications of paging interlocks 5.6.8
in user code B.1.21
in VTAM 5.8.11
no-work wait 4.1.8
OUCB analysis 5.7.6
page fault waits 4.1.10
page waits B.1.9
record management debugging aids 5.9.3
wait state PSW B.1.2
wait task, dispatching of 5.1.8
window spin 2.5.10
working set sizes 4.1.11
work area bits
logon scheduler A.6.11.0
work queues, TCBs, address space analysis 2.1.6
WQE control block 5.15.4
XC (SIGP external call)
definition 2.5.9
process flow 2.5.12
XCTL (function of program manager)
description 5.3.8
module search sequence 5.3.15
new RB initialization 5.3.9
RB manipulation 5.3.10
zaps (see superzaps)
0B0 abend 5.11.13
306 abend 5.3.18
3705 EP line trace 4.3.4
806 abend 5.3.14


READER'S
COMMENT
FORM

OS/VS2 System Programming Library:
MVS Diagnostic Techniques
GC28-0725-2

This manual is part of a library that serves as a reference source for systems analysts, programmers,
and operators of IBM systems. This form may be used to communicate your views about this
publication. They will be sent to the author's department for whatever review and action, if any,
is deemed appropriate.
IBM may use or distribute any of the information you supply in any way it believes appropriate
without incurring any obligation whatever. You may, of course, continue to use the information
you supply.
Note: Copies of IBM publications are not stocked at the location to which this form is addressed.
Please direct any requests for copies of publications, or for assistance in using your IBM system,
to your IBM representative or to the IBM branch office serving your locality.
Possible topics for comments are:
Clarity    Accuracy    Completeness    Organization    Coding    Retrieval    Legibility

If comments apply to a Selectable Unit, please provide the name of the Selectable Unit ________

If you wish a reply, give your name and mailing address:

Please circle the description that most closely describes your occupation.
(Q) Customer Install Mgr.    (S) IBM System Eng.      (U) System Consult.    (X) System Analyst
(P) Prog. Sys. Rep.          (A) System Analyst       (Y) System Prog.       (B) System Prog.
(Z) Applica. Prog.           (F) System Oper.         (C) Applica. Prog.     (D) Dev. Prog.
(I) I/O Oper.                (R) Comp. Prog.          (L) Term. Oper.        (G) System Oper.
(J) I/O Oper.                (E) Ed. Dev. Rep.        (N) Cust. Eng.         (T) Tech. Staff Rep.
Number of latest Newsletter associated with this publication: _ _ _ _ _ _ _ _ _ _ _ _ __
Thank you for your cooperation. No postage stamp necessary if mailed in the U.S.A. (Elsewhere,
an IBM office or representative will be happy to forward your comments.)

GC28-0725-2

Reader's Comment Form

Fold and tape

Fold and tape

Please Do Not Staple

NO POSTAGE
NECESSARY
IF MAILED
IN THE
UNITED STATES

BUSINESS REPLY MAIL
FIRST CLASS    PERMIT NO. 40    ARMONK, N.Y.
POSTAGE WILL BE PAID BY ADDRESSEE:

International Business Machines Corporation
Department 058, Building 706-2
PO Box 390
Poughkeepsie, New York 12602

Fold and tape

Please Do Not Staple

Fold and tape

International Business Machines Corporation
Data Processing Division
1133 Westchester Avenue, White Plains, N.Y. 10604

IBM World Trade Americas/Far East Corporation
Town of Mount Pleasant, Route 9, North Tarrytown, N.Y., U.S.A. 10591

IBM World Trade Europe/Middle East/Africa Corporation
360 Hamilton Avenue, White Plains, N.Y., U.S.A. 10601
